Clearly the management server doesn't realize the instance on the failed host is not running... but the host is in Alert state and powered down, and missing NFS heartbeats.

2017-12-23 14:57:52,427 DEBUG [c.c.h.Status] (AgentTaskPool-10:ctx-694feb6c) (logid:160220c5) Transition:[Resource state = Enabled, Agent event = AgentDisconnected, Host id = 4, name = r62-i122-36-01.domain.com]
2017-12-23 14:58:24,487 DEBUG [c.c.c.CapacityManagerImpl] (CapacityChecker:ctx-66fbe484) (logid:1f53cd63) Found 1 VMs on host 4
2017-12-23 14:58:24,495 DEBUG [c.c.c.CapacityManagerImpl] (CapacityChecker:ctx-66fbe484) (logid:1f53cd63) Found 0 VM, not running on host 4

Next step?

On Sat, Dec 23, 2017 at 9:49 AM, Jean-Francois Nadeau <the.jfnad...@gmail.com> wrote:

> I'd really like to get to the bottom of this. It does sound like the
> behavior mentioned in
> https://issues.apache.org/jira/browse/CLOUDSTACK-5582
> but that should be long fixed.
>
> One suspect log entry (possibly unrelated) I noticed is this recurring
> exception in the manager logs:
>
> ERROR [c.c.v.UserVmManagerImpl] (UserVm-ipfetch-3:ctx-d4c44c2b)
> (logid:16dd70ad) Caught the Exception in VmIpFetchTask
>
> Which I guess is caused by the use of an external DHCP, so the manager
> fails to determine a running VM's IP. Which brings me to my next
> question... how is a VM marked for HA actually monitored?
>
> On Sat, Dec 23, 2017 at 3:38 AM, Eric Green <eric.lee.gr...@gmail.com>
> wrote:
>
>> If all else fails, change its state to the correct state in the MySQL
>> database and restart the management service. Sadly that is the only
>> way I could do it when my Cloudstack got confused and stuck an
>> instance in an intermediate state where I couldn't do anything with
>> it.
>>
>> On Dec 22, 2017 at 9:09 AM, Jean-Francois Nadeau
>> <the.jfnad...@gmail.com> wrote:
>>
>> Good morning,
>>
>> New to ACS and doing a POC with 4.10 on CentOS 7 and KVM.
>>
>> I'm trying to recover VMs after a host failure (powered off from OOB).
>>
>> Primary storage is NFS and IPMI is configured for the KVM hosts. Zone
>> is advanced mode with VLAN separation, and I created a shared network
>> with no services since I wish to use an external DHCP.
>>
>> First, say I don't have a compute offering with HA enabled and a KVM
>> host goes down... I can't put it in maintenance mode while it is down,
>> and disabling it has no effect on the state of the lost VMs. The VM
>> stays in running state according to the manager. What should I do to
>> force a restart on the remaining healthy hosts?
>>
>> Then I enabled IPMI on all KVM hosts and attempted the same experiment
>> with a compute offering with HA enabled. Same result. The manager does
>> see the host as disconnected and powered off but takes no action. I am
>> certainly missing something here. Please help!
>>
>> Regards,
>>
>> Jean-Francois