GitHub user leo79901 closed a discussion: Issue with Fenced
Hi, I have configured the host's HA provider and enabled Out-of-band Management (OOBM). I have also tested the OOBM functionality and am sure it works properly. The host's HA state was initially `Available`. However, when I cut the power to one host, the host enters the `Fencing` state within 1 minute and stays in that state for one to three hours. After that, the state changes to `Fenced`, and HA restarts the host's VMs on other hosts.

I also found this description on ShapeBlue's website:

```
Fencing – the Host-HA framework is trying to Fence the host by issuing OOBM job
```

Yes, I have seen these logs. ACS tries to shut down the host through OOBM but fails. There were 2100 attempts while the host was down:

```
2024-12-03 14:36:30,219 WARN [o.a.c.h.t.BaseHATask] (pool-4-thread-82:ctx-af691a2f) (logid:670480bb) Exception occurred while running FenceTask on a resource: org.apache.cloudstack.ha.provider.HAFenceException: OBM service is not configured or enabled for this host idnode10
org.apache.cloudstack.ha.provider.HAFenceException: OBM service is not configured or enabled for this host idnode10
    at org.apache.cloudstack.kvm.ha.KVMHAProvider.fence(KVMHAProvider.java:100)
    at org.apache.cloudstack.kvm.ha.KVMHAProvider.fence(KVMHAProvider.java:42)
    at org.apache.cloudstack.ha.task.FenceTask.performAction(FenceTask.java:42)
    at org.apache.cloudstack.ha.task.BaseHATask$1.call(BaseHATask.java:86)
    at org.apache.cloudstack.ha.task.BaseHATask$1.call(BaseHATask.java:83)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: com.cloud.utils.exception.CloudRuntimeException: Out-of-band Management action [OFF] on Host {"id":14,"name":"idnode10","type":"Routing","uuid":"b570ac0f-aad5-4f90-a62e-46e710db3763"} failed with error: Set Chassis Power Control to Down/Off failed: Command not supported in present state
    at org.apache.cloudstack.outofbandmanagement.OutOfBandManagementServiceImpl.executePowerOperation(OutOfBandManagementServiceImpl.java:437)
    at jdk.internal.reflect.GeneratedMethodAccessor144.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    ... 21 more
```

Then the state changes to `Fenced`:

```
2024-12-03 12:28:53,340 WARN [c.c.a.AlertManagerImpl] (pool-4-thread-139:ctx-efd70258) (logid:60f5a05f) alertType=[7] dataCenterId=[1] podId=[1] clusterId=[null] message=[Host is down, name: idnode10 (id:14), availability zone: Zone1, pod: Pod].
2024-12-03 12:28:53,354 DEBUG [c.c.h.HighAvailabilityManagerImpl] (pool-4-thread-139:ctx-efd70258) (logid:60f5a05f) Notifying HA Mgr of to restart vm 50-i-2-50-VM
2024-12-03 12:28:53,357 DEBUG [c.c.h.HighAvailabilityManagerImpl] (pool-4-thread-139:ctx-efd70258) (logid:60f5a05f) HA schedule restart
2024-12-03 12:28:53,468 INFO [c.c.h.HighAvailabilityManagerImpl] (pool-4-thread-139:ctx-efd70258) (logid:60f5a05f) Schedule vm for HA: VM instance {"id":50,"instanceName":"i-2-50-VM","type":"User","uuid":"94c170b3-9780-43aa-ab0c-eaaf56fec475"}
2024-12-03 12:28:53,468 DEBUG [c.c.h.HighAvailabilityManagerImpl] (pool-4-thread-139:ctx-efd70258) (logid:60f5a05f) Wakeup workers HA
2024-12-03 12:28:53,477 INFO [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-4:ctx-e9d23278 work-3326) (logid:a1f6d350) Processing work HAWork[3326-HA-50-Running-Investigating]
2024-12-03 12:28:53,477 DEBUG [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-4:ctx-e9d23278 work-3326) (logid:a1f6d350) RESTART with HAWORK
2024-12-03 12:28:53,489 INFO [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-4:ctx-e9d23278 work-3326) (logid:a1f6d350) HA on VM instance {"id":50,"instanceName":"i-2-50-VM","type":"User","uuid":"94c170b3-9780-43aa-ab0c-eaaf56fec475"}
2024-12-03 12:28:53,498 DEBUG [c.c.r.ResourceState] (pool-4-thread-139:ctx-efd70258) (logid:60f5a05f) Resource state update: [id = 14; name = idnode10; old state = Enabled; event = InternalEnterMaintenance; new state = Maintenance]
2024-12-03 12:28:53,500 WARN [c.c.a.AlertManagerImpl] (pool-4-thread-139:ctx-efd70258) (logid:60f5a05f) alertType=[30] dataCenterId=[1] podId=[1] clusterId=[null] message=[HA Fencing of host id=14, in dc id=1 performed].
2024-12-03 12:28:53,510 DEBUG [o.a.c.h.HAManagerImpl] (HA-Worker-4:ctx-e9d23278 work-3326) (logid:a1f6d350) HA: Host [14] is fenced.
```

## So my questions are:

**Why does the fencing process take such a long time?**

**How can I make the HA process quicker?**

GitHub link: https://github.com/apache/cloudstack/discussions/10026
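To put numbers on the delay (for example, the 2100 failed attempts mentioned above), the management-server log can be scanned for the `FenceTask` warnings and compared against the `HA: Host [14] is fenced.` entry. The Python sketch below is a minimal, non-authoritative example that assumes the log format shown in this post: a `yyyy-MM-dd HH:mm:ss,SSS` timestamp at the start of each entry, stack-trace frames on their own lines, and the exact message wording quoted above. `summarize_fencing` and `summarize_log_file` are hypothetical helper names, not part of CloudStack.

```python
import re
from datetime import datetime

# Leading timestamp of a management-server log entry, e.g. "2024-12-03 14:36:30,219".
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")
# Message text copied from the FenceTask warning quoted in this post (assumed stable).
FENCE_FAIL = "Exception occurred while running FenceTask"


def parse_ts(line):
    """Return the entry's timestamp, or None for lines without one (stack-trace frames)."""
    m = TS_RE.match(line)
    return datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f") if m else None


def summarize_fencing(lines, host_name, host_id):
    """Hypothetical helper: count failed fence attempts for one host and time them
    against the 'HA: Host [<id>] is fenced.' entry.

    Assumes the lines cover a single fencing episode for this host.
    """
    # Word boundaries keep "idnode1" from matching "idnode10" and vice versa.
    host_re = re.compile(rf"\bhost {re.escape(host_name)}\b")
    fenced_marker = f"HA: Host [{host_id}] is fenced"
    attempts, first_failure, fenced_at = 0, None, None
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue  # continuation line, e.g. "at org.apache..." frames
        if FENCE_FAIL in line and host_re.search(line):
            attempts += 1
            if first_failure is None:
                first_failure = ts
        elif fenced_at is None and fenced_marker in line:
            fenced_at = ts
    duration = None
    if first_failure is not None and fenced_at is not None:
        duration = fenced_at - first_failure  # a datetime.timedelta
    return {"attempts": attempts, "first_failure": first_failure,
            "fenced_at": fenced_at, "duration": duration}


def summarize_log_file(path, host_name, host_id):
    """Hypothetical convenience wrapper that reads a log file from disk."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return summarize_fencing(fh, host_name, host_id)
```

For example, `summarize_log_file("/var/log/cloudstack/management/management-server.log", "idnode10", 14)` would report how many fence attempts failed for `idnode10` and how long passed before it was marked fenced; that path is an assumed default location and may differ on your installation.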