Clearly the management server doesn't realize that the instance on the failed
host is not running, even though the host is in Alert state, powered down,
and missing NFS heartbeats.

2017-12-23 14:57:52,427 DEBUG [c.c.h.Status]
(AgentTaskPool-10:ctx-694feb6c) (logid:160220c5) Transition:[Resource state
= Enabled, Agent event = AgentDisconnected, Host id = 4, name =
r62-i122-36-01.domain.com]
2017-12-23 14:58:24,487 DEBUG [c.c.c.CapacityManagerImpl]
(CapacityChecker:ctx-66fbe484) (logid:1f53cd63) Found 1 VMs on host 4
2017-12-23 14:58:24,495 DEBUG [c.c.c.CapacityManagerImpl]
(CapacityChecker:ctx-66fbe484) (logid:1f53cd63) Found 0 VM, not running on
host 4
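
To cross-check those CapacityChecker lines against the rest of the log, a
minimal Python sketch could pull out the per-host counts. The regex, the
function name and the result keys are my own, and the message format is only
assumed from the excerpt above:

```python
import re

# Matches CapacityChecker messages such as "Found 1 VMs on host 4" and
# "Found 0 VM, not running on host 4" (format assumed from the excerpt).
PATTERN = re.compile(r"Found (\d+) VMs?(, not running)? on host (\d+)")

def vm_counts(log_text):
    """Return {host_id: {"on_host": n, "not_running": m}} from log text."""
    # The log lines are wrapped in the mail, so collapse whitespace first.
    flat = " ".join(log_text.split())
    counts = {}
    for count, not_running, host in PATTERN.findall(flat):
        key = "not_running" if not_running else "on_host"
        counts.setdefault(int(host), {})[key] = int(count)
    return counts
```

For the excerpt above this should report one VM still counted on host 4 and
none counted as not running, which matches what the manager believes.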

What should the next step be?

On Sat, Dec 23, 2017 at 9:49 AM, Jean-Francois Nadeau <
the.jfnad...@gmail.com> wrote:

> I'd really like to get to the bottom of this. It does sound like the
> behavior described in
> https://issues.apache.org/jira/browse/CLOUDSTACK-5582, but that should have
> been fixed long ago.
>
> One suspect log entry (it may be unrelated) that I noticed is this
> recurring exception in the manager logs:
>
> ERROR [c.c.v.UserVmManagerImpl] (UserVm-ipfetch-3:ctx-d4c44c2b)
> (logid:16dd70ad) Caught the Exception in VmIpFetchTask
>
> I guess this is caused by the use of an external DHCP server, so the manager
> fails to determine the IP of a running VM. That brings me to my next
> question: how is a VM marked for HA actually monitored?
>
>
> On Sat, Dec 23, 2017 at 3:38 AM, Eric Green <eric.lee.gr...@gmail.com>
> wrote:
>
>> If all else fails, change its state to the correct state in the MySQL
>> database and restart the management service. Sadly, that is the only way I
>> could do it when my CloudStack got confused and stuck an instance in an
>> intermediate state where I couldn't do anything with it.
>>
>> On Dec 22, 2017 at 9:09 AM, Jean-Francois Nadeau <the.jfnad...@gmail.com>
>> wrote:
>>
>> Good morning,
>>
>> I'm new to ACS and am doing a POC with 4.10 on CentOS 7 and KVM.
>>
>> I'm trying to recover VMs after a host failure (powered off from OOB).
>>
>> Primary storage is NFS and IPMI is configured for the KVM hosts. The zone
>> is in advanced mode with VLAN separation, and I created a shared network
>> with no services since I wish to use an external DHCP server.
>>
>> First, say I don't have a compute offering with HA enabled and a KVM host
>> goes down. I can't put it in maintenance mode while it is down, and
>> disabling it has no effect on the state of the lost VMs. The VMs stay in
>> the Running state according to the manager. What should I do to force a
>> restart on the remaining healthy hosts?
>>
>> Then I enabled IPMI on all KVM hosts and attempted the same experiment
>> with a compute offering with HA enabled. Same result. The manager does see
>> the host as disconnected and powered off but takes no action. I am
>> certainly missing something here. Please help!
>>
>> Regards,
>>
>> Jean-Francois
>>
>
>
