GitHub user leo79901 closed a discussion: Issue with Fenced

Hi,
  I have configured the host's HA provider and enabled Out-of-band Management 
(OOBM). I have also tested the OOBM functionality and confirmed that it is 
working properly. 

  The HA state of the host was initially `Available`.

  However, when I cut the power to one host, the host enters the `Fencing` 
state within 1 minute and remains in that state for 1 to 3 hours or longer. 
After that, the state changes to `Fenced`, and the VMs on the host are 
restarted on other hosts by HA.

At the same time, I saw this on ShapeBlue's website:
```
Fencing – the Host-HA framework is trying to Fence the host by issuing OOBM job
```

Yes, I have seen these logs. ACS tries to shut down the host via OOBM but 
fails. There are 2100 attempts while the host is down. 

```
2024-12-03 14:36:30,219 WARN  [o.a.c.h.t.BaseHATask] (pool-4-thread-82:ctx-af691a2f) (logid:670480bb) Exception occurred while running FenceTask on a resource: org.apache.cloudstack.ha.provider.HAFenceException: OBM service is not configured or enabled for this host idnode10
org.apache.cloudstack.ha.provider.HAFenceException: OBM service is not configured or enabled for this host idnode10
        at org.apache.cloudstack.kvm.ha.KVMHAProvider.fence(KVMHAProvider.java:100)
        at org.apache.cloudstack.kvm.ha.KVMHAProvider.fence(KVMHAProvider.java:42)
        at org.apache.cloudstack.ha.task.FenceTask.performAction(FenceTask.java:42)
        at org.apache.cloudstack.ha.task.BaseHATask$1.call(BaseHATask.java:86)
        at org.apache.cloudstack.ha.task.BaseHATask$1.call(BaseHATask.java:83)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: com.cloud.utils.exception.CloudRuntimeException: Out-of-band Management action [OFF] on Host {"id":14,"name":"idnode10","type":"Routing","uuid":"b570ac0f-aad5-4f90-a62e-46e710db3763"} failed with error: Set Chassis Power Control to Down/Off failed: Command not supported in present state

        at org.apache.cloudstack.outofbandmanagement.OutOfBandManagementServiceImpl.executePowerOperation(OutOfBandManagementServiceImpl.java:437)
        at jdk.internal.reflect.GeneratedMethodAccessor144.invoke(Unknown Source)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        ... 21 more
```
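
To check how often the fence attempt fails while the host is down, a small script along these lines could count the `FenceTask` warnings per minute. This is only a sketch: the regular expression is based on the single WARN entry shown above, it assumes each log entry sits on one line in the actual management server log (the email wrapping above is not in the file), and `count_fence_failures` is a made-up helper name, not part of CloudStack.

```python
import re
from collections import Counter

# Pattern derived from the one sample WARN entry above (an assumption, not a
# documented log format): timestamp, WARN level, BaseHATask logger, and the
# "running FenceTask" message.
FENCE_FAILURE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2},\d{3} WARN\s+"
    r"\[o\.a\.c\.h\.t\.BaseHATask\].*"
    r"Exception occurred while running FenceTask"
)


def count_fence_failures(lines):
    """Return a Counter mapping 'YYYY-MM-DD HH:MM' to failed FenceTask attempts."""
    per_minute = Counter()
    for line in lines:
        match = FENCE_FAILURE.match(line)
        if match:
            # group(1) is the timestamp truncated to the minute
            per_minute[match.group(1)] += 1
    return per_minute
```

It could be used by passing an open log file, for example `count_fence_failures(open(path))` where `path` points at the management server log, and then summing the counter's values to compare against the number of attempts seen.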

Then the status changes to `Fenced`:

```
2024-12-03 12:28:53,340 WARN  [c.c.a.AlertManagerImpl] (pool-4-thread-139:ctx-efd70258) (logid:60f5a05f) alertType=[7] dataCenterId=[1] podId=[1] clusterId=[null] message=[Host is down, name: idnode10 (id:14), availability zone: Zone1, pod: Pod].
2024-12-03 12:28:53,354 DEBUG [c.c.h.HighAvailabilityManagerImpl] (pool-4-thread-139:ctx-efd70258) (logid:60f5a05f) Notifying HA Mgr of to restart vm 50-i-2-50-VM
2024-12-03 12:28:53,357 DEBUG [c.c.h.HighAvailabilityManagerImpl] (pool-4-thread-139:ctx-efd70258) (logid:60f5a05f) HA schedule restart
2024-12-03 12:28:53,468 INFO  [c.c.h.HighAvailabilityManagerImpl] (pool-4-thread-139:ctx-efd70258) (logid:60f5a05f) Schedule vm for HA:  VM instance {"id":50,"instanceName":"i-2-50-VM","type":"User","uuid":"94c170b3-9780-43aa-ab0c-eaaf56fec475"}
2024-12-03 12:28:53,468 DEBUG [c.c.h.HighAvailabilityManagerImpl] (pool-4-thread-139:ctx-efd70258) (logid:60f5a05f) Wakeup workers HA
2024-12-03 12:28:53,477 INFO  [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-4:ctx-e9d23278 work-3326) (logid:a1f6d350) Processing work HAWork[3326-HA-50-Running-Investigating]
2024-12-03 12:28:53,477 DEBUG [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-4:ctx-e9d23278 work-3326) (logid:a1f6d350) RESTART with HAWORK
2024-12-03 12:28:53,489 INFO  [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-4:ctx-e9d23278 work-3326) (logid:a1f6d350) HA on VM instance {"id":50,"instanceName":"i-2-50-VM","type":"User","uuid":"94c170b3-9780-43aa-ab0c-eaaf56fec475"}
2024-12-03 12:28:53,498 DEBUG [c.c.r.ResourceState] (pool-4-thread-139:ctx-efd70258) (logid:60f5a05f) Resource state update: [id = 14; name = idnode10; old state = Enabled; event = InternalEnterMaintenance; new state = Maintenance]
2024-12-03 12:28:53,500 WARN  [c.c.a.AlertManagerImpl] (pool-4-thread-139:ctx-efd70258) (logid:60f5a05f) alertType=[30] dataCenterId=[1] podId=[1] clusterId=[null] message=[HA Fencing of host id=14, in dc id=1 performed].
2024-12-03 12:28:53,510 DEBUG [o.a.c.h.HAManagerImpl] (HA-Worker-4:ctx-e9d23278 work-3326) (logid:a1f6d350) HA: Host [14] is fenced.
```


## So my questions are:
**Why does the fencing process take such a long time?**
**How can I make the HA process quicker?**

GitHub link: https://github.com/apache/cloudstack/discussions/10026

----
This is an automatically sent email for users@cloudstack.apache.org.