Jon, I'm not an expert on this particular implementation, but obviously your host needs power so that its IPMI/BMC/iLO/iDRAC/etc. controller can be contacted and the host fenced. Redundant PSUs fed from different power sources are expected (the de facto standard in production).
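One rough way to see this from the management server side is to check whether the BMC answers at all, which it cannot do once the whole chassis has lost power. The Python below is a minimal sketch, not what CloudStack itself runs: it sends an RMCP/ASF Presence Ping to UDP port 623, assuming the packet layout from the DMTF ASF specification. The function names `build_presence_ping` and `bmc_responds` are hypothetical.

```python
import socket


def build_presence_ping(tag: int = 0) -> bytes:
    """Build an RMCP/ASF Presence Ping (assumed layout per the DMTF ASF spec)."""
    # RMCP header: version 0x06, reserved, sequence 0xFF (no ACK), class 0x06 (ASF)
    rmcp_header = bytes([0x06, 0x00, 0xFF, 0x06])
    # ASF message: IANA enterprise 4542 (0x000011BE), type 0x80 (Presence Ping),
    # message tag, reserved, data length 0
    asf_message = bytes([0x00, 0x00, 0x11, 0xBE, 0x80, tag & 0xFF, 0x00, 0x00])
    return rmcp_header + asf_message


def bmc_responds(host: str, port: int = 623, timeout: float = 2.0) -> bool:
    """Return True only if a Presence Pong comes back before the timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_presence_ping(), (host, port))
            data, _ = sock.recvfrom(512)
        except OSError:  # timeout, ICMP port unreachable, DNS failure, ...
            return False
    # A Presence Pong carries ASF message type 0x40 at byte offset 8.
    return len(data) > 8 and data[8] == 0x40
```

For example, `bmc_responds("192.0.2.10")` (a placeholder address standing in for the node's BMC) should return False for a node with no power at all. That is consistent with the "Unable to establish IPMI v2 / RMCP+ session" error in the log: with no standby power the BMC cannot answer, so the OOBM OFF action, and with it fencing, cannot succeed. A False result is only a hint, since some powered BMCs also ignore presence pings.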
Kind regards,
Andrija

On Mon, 4 Mar 2019 at 12:19, Jon Marshall <jms....@hotmail.co.uk> wrote:
>
> I have KVM Host HA enabled and power is lost to one of the compute nodes.
> The host has its state marked as Alert and the HA state goes from
> Degraded to Suspect to Fencing.
>
> The problem is that the host is never fenced because there is no power to
> it, so none of the OOBM commands work, which means the VMs are never migrated.
>
> From the management server logs:
>
> 2019-03-04 11:02:48,288 WARN [o.a.c.h.t.BaseHATask] (pool-6-thread-9:null) (logid:d0a19f20) Exception occurred while running FenceTask on a resource: org.apache.cloudstack.ha.provider.HAFenceException: OOBM service is not configured or enabled for this host dcp-cscn2.local
> org.apache.cloudstack.ha.provider.HAFenceException: OOBM service is not configured or enabled for this host dcp-cscn2.local
>     at org.apache.cloudstack.kvm.ha.KVMHAProvider.fence(KVMHAProvider.java:99)
>     at org.apache.cloudstack.kvm.ha.KVMHAProvider.fence(KVMHAProvider.java:42)
>     at org.apache.cloudstack.ha.task.FenceTask.performAction(FenceTask.java:42)
>     at org.apache.cloudstack.ha.task.BaseHATask$1.call(BaseHATask.java:86)
>     at org.apache.cloudstack.ha.task.BaseHATask$1.call(BaseHATask.java:83)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: com.cloud.utils.exception.CloudRuntimeException: Out-of-band Management action (OFF) on host (b53122bc-1446-4ffd-a179-e363ad0d541f) failed with error: Get Auth Capabilities error
> Error issuing Get Channel Authentication Capabilities request
> Error: Unable to establish IPMI v2 / RMCP+ session
>
>     at org.apache.cloudstack.outofbandmanagement.OutOfBandManagementServiceImpl.executePowerOperation(OutOfBandManagementServiceImpl.java:423)
>     at sun.reflect.GeneratedMethodAccessor225.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     ... 21 more
>
>
> This raises the question of how this is meant to work for a host whose
> power has failed.
>
>
> If I turn off KVM Host HA and change the ping interval to 30 and the ping
> timeout to 2, the VMs fail over to another host within 5 minutes.
>
> I understand what Host HA is meant for, but it seems it doesn't work for a
> host that has failed because of a power loss.
>
> Jon
>

--
Andrija Panić
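Jon's ping settings can be turned into a detection time with a little arithmetic. A sketch, assuming (as CloudStack's global settings describe them) that `ping.interval` is in seconds and `ping.timeout` is a multiplier applied to it, with stock defaults of 60 and 2.5; `agent_timeout_seconds` is a hypothetical helper name.

```python
# Hypothetical helper. Assumes ping.interval is in seconds and ping.timeout
# is a multiplier applied to it before an agent is considered timed out.
def agent_timeout_seconds(ping_interval: float, ping_timeout: float) -> float:
    """Seconds without a successful ping before the agent is treated as down."""
    return ping_interval * ping_timeout


if __name__ == "__main__":
    print(agent_timeout_seconds(60, 2.5))  # assumed defaults -> 150.0
    print(agent_timeout_seconds(30, 2))    # Jon's settings   -> 60
```

Under that assumption the agent is declared unreachable after about 60 seconds instead of 150. The rest of the roughly five minutes Jon observed goes to steps this sketch does not model.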