GitHub user leo79901 closed a discussion: Issue with Fenced
Hi,
I have configured the host's HA provider and enabled Out-of-band Management
(OOBM). I have also tested the OOBM functionality and am sure that it is
working properly.
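One way to re-test OOBM for the affected host outside the HA framework is the CloudStack API command `issueOutOfBandManagementPowerAction` with `action=STATUS`. The Python sketch below only builds the signed request URL. It assumes the standard API-key signing scheme: parameters sorted by name, the query string lower-cased, HMAC-SHA1 with the secret key, then Base64. The endpoint and keys are placeholders, and the HTTP call itself is left out.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def signed_url(endpoint: str, params: dict, api_key: str, secret_key: str) -> str:
    """Build a signed CloudStack API URL (sketch of the API-key signing scheme)."""
    all_params = dict(params, apiKey=api_key, response="json")
    # Sort parameters by name and URL-encode each value (spaces become %20).
    query = "&".join(
        f"{key}={quote(str(value), safe='')}"
        for key, value in sorted(all_params.items())
    )
    # The signature is an HMAC-SHA1 over the lower-cased query string.
    digest = hmac.new(secret_key.encode(), query.lower().encode(), hashlib.sha1).digest()
    signature = quote(base64.b64encode(digest).decode(), safe="")
    return f"{endpoint}?{query}&signature={signature}"
```

For this host it would be called as `signed_url("http://<management-server>:8080/client/api", {"command": "issueOutOfBandManagementPowerAction", "hostid": "b570ac0f-aad5-4f90-a62e-46e710db3763", "action": "STATUS"}, api_key=..., secret_key=...)`. Running it after cutting power shows what the BMC reports at the moment fencing is attempted.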
The host's HA state was initially `Available`.
However, when I cut the power to one host, it enters the `Fencing` state
within 1 minute and stays there for roughly 1 to 3 hours. After that, the state
changes to `Fenced`, and HA restarts the host's VMs on other hosts.
I also found this description on ShapeBlue's website:
```
Fencing – the Host-HA framework is trying to Fence the host by issuing OOBM job
```
Yes, I have seen these logs. ACS (Apache CloudStack) tries to shut down the
host through OOBM but fails; there are 2100 attempts while the host is down.
```
2024-12-03 14:36:30,219 WARN [o.a.c.h.t.BaseHATask] (pool-4-thread-82:ctx-af691a2f) (logid:670480bb) Exception occurred while running FenceTask on a resource: org.apache.cloudstack.ha.provider.HAFenceException: OBM service is not configured or enabled for this host idnode10
org.apache.cloudstack.ha.provider.HAFenceException: OBM service is not configured or enabled for this host idnode10
    at org.apache.cloudstack.kvm.ha.KVMHAProvider.fence(KVMHAProvider.java:100)
    at org.apache.cloudstack.kvm.ha.KVMHAProvider.fence(KVMHAProvider.java:42)
    at org.apache.cloudstack.ha.task.FenceTask.performAction(FenceTask.java:42)
    at org.apache.cloudstack.ha.task.BaseHATask$1.call(BaseHATask.java:86)
    at org.apache.cloudstack.ha.task.BaseHATask$1.call(BaseHATask.java:83)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: com.cloud.utils.exception.CloudRuntimeException: Out-of-band Management action [OFF] on Host {"id":14,"name":"idnode10","type":"Routing","uuid":"b570ac0f-aad5-4f90-a62e-46e710db3763"} failed with error: Set Chassis Power Control to Down/Off failed: Command not supported in present state
    at org.apache.cloudstack.outofbandmanagement.OutOfBandManagementServiceImpl.executePowerOperation(OutOfBandManagementServiceImpl.java:437)
    at jdk.internal.reflect.GeneratedMethodAccessor144.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    ... 21 more
```
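To check how many fence attempts were made and over what time span, the management-server log can be scanned for `FenceTask` failures. This is a minimal Python sketch, assuming the log is read unwrapped from `/var/log/cloudstack/management/management-server.log` (each record's header and message on one line, unlike the mail-wrapped excerpt) and that the host name appears on the failure line, as it does here.

```python
import re
from datetime import datetime

# Timestamp at the start of a log record, e.g. "2024-12-03 14:36:30,219".
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")
FENCE_FAILURE = "Exception occurred while running FenceTask"


def fence_failures(lines, host=None):
    """Count failed FenceTask runs and return (count, first, last) timestamps.

    If `host` is given, only failure lines naming that host (as a whole word)
    are counted.
    """
    host_re = re.compile(rf"\b{re.escape(host)}\b") if host else None
    stamps = []
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match or FENCE_FAILURE not in line:
            continue  # stack-trace lines and unrelated records are skipped
        if host_re and not host_re.search(line):
            continue
        stamps.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f"))
    if not stamps:
        return 0, None, None
    return len(stamps), stamps[0], stamps[-1]
```

Called with the log lines and `host="idnode10"`, it returns the attempt count, which can be compared with the 2100 attempts mentioned in the post, plus the first and last failure timestamps, whose difference roughly bounds how long the host sat in `Fencing`.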
Then the status changes to `Fenced`:
```
2024-12-03 12:28:53,340 WARN [c.c.a.AlertManagerImpl] (pool-4-thread-139:ctx-efd70258) (logid:60f5a05f) alertType=[7] dataCenterId=[1] podId=[1] clusterId=[null] message=[Host is down, name: idnode10 (id:14), availability zone: Zone1, pod: Pod].
2024-12-03 12:28:53,354 DEBUG [c.c.h.HighAvailabilityManagerImpl] (pool-4-thread-139:ctx-efd70258) (logid:60f5a05f) Notifying HA Mgr of to restart vm 50-i-2-50-VM
2024-12-03 12:28:53,357 DEBUG [c.c.h.HighAvailabilityManagerImpl] (pool-4-thread-139:ctx-efd70258) (logid:60f5a05f) HA schedule restart
2024-12-03 12:28:53,468 INFO [c.c.h.HighAvailabilityManagerImpl] (pool-4-thread-139:ctx-efd70258) (logid:60f5a05f) Schedule vm for HA: VM instance {"id":50,"instanceName":"i-2-50-VM","type":"User","uuid":"94c170b3-9780-43aa-ab0c-eaaf56fec475"}
2024-12-03 12:28:53,468 DEBUG [c.c.h.HighAvailabilityManagerImpl] (pool-4-thread-139:ctx-efd70258) (logid:60f5a05f) Wakeup workers HA
2024-12-03 12:28:53,477 INFO [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-4:ctx-e9d23278 work-3326) (logid:a1f6d350) Processing work HAWork[3326-HA-50-Running-Investigating]
2024-12-03 12:28:53,477 DEBUG [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-4:ctx-e9d23278 work-3326) (logid:a1f6d350) RESTART with HAWORK
2024-12-03 12:28:53,489 INFO [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-4:ctx-e9d23278 work-3326) (logid:a1f6d350) HA on VM instance {"id":50,"instanceName":"i-2-50-VM","type":"User","uuid":"94c170b3-9780-43aa-ab0c-eaaf56fec475"}
2024-12-03 12:28:53,498 DEBUG [c.c.r.ResourceState] (pool-4-thread-139:ctx-efd70258) (logid:60f5a05f) Resource state update: [id = 14; name = idnode10; old state = Enabled; event = InternalEnterMaintenance; new state = Maintenance]
2024-12-03 12:28:53,500 WARN [c.c.a.AlertManagerImpl] (pool-4-thread-139:ctx-efd70258) (logid:60f5a05f) alertType=[30] dataCenterId=[1] podId=[1] clusterId=[null] message=[HA Fencing of host id=14, in dc id=1 performed].
2024-12-03 12:28:53,510 DEBUG [o.a.c.h.HAManagerImpl] (HA-Worker-4:ctx-e9d23278 work-3326) (logid:a1f6d350) HA: Host [14] is fenced.
```
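This excerpt interleaves two operations, distinguished by their `logid` values. A small Python sketch that groups records by `logid` makes each operation readable end to end; the header regex is an assumption based only on the record format visible in these excerpts and may need adjusting for other log layouts.

```python
import re
from collections import defaultdict

# Record header as seen in these excerpts:
# "<date> <time> LEVEL [logger] (thread) (logid:xxxxxxxx) message"
RECORD = re.compile(
    r"^(?P<ts>\S+ \S+) +(?P<level>[A-Z]+) +\[(?P<logger>[^\]]+)\] "
    r"\((?P<thread>[^)]*)\) \(logid:(?P<logid>[0-9a-f]+)\) (?P<msg>.*)$"
)


def group_by_logid(lines):
    """Group (timestamp, level, logger, message) tuples by logid, in file order."""
    groups = defaultdict(list)
    for line in lines:
        match = RECORD.match(line)
        if match:  # continuation lines (stack frames, wrapped text) are ignored
            groups[match["logid"]].append(
                (match["ts"], match["level"], match["logger"], match["msg"])
            )
    return dict(groups)
```

Applied to the unwrapped records of this excerpt, `60f5a05f` collects the host-down alert, the restart scheduling for VM 50, the `Maintenance` transition, and the `alertType=[30]` fencing alert, while `a1f6d350` collects the HA worker processing VM 50 and the final `HA: Host [14] is fenced.` line.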
## So my questions are:
**Why does the fencing process take such a long time?**
**How can I make the HA process quicker?**
GitHub link: https://github.com/apache/cloudstack/discussions/10026