Hi,

I looked at the logs, and the reason host vmh-02 wasn't restarted is that the PM (power management) restart failed through both of the other hosts (vmh-01 and vmh-03) with this error:

  Test Failed, [Powering off machine @ IPMI:10.9.1.11...Failed

So we couldn't restart the HA VMs on other hosts, because we could not be sure that host vmh-02 was really down.

I also noticed that even getting the PM status of host vmh-02 is problematic. The fence agent returned this message:

  Power Management test failed for Host vmh-02.Done

but it also reported the operation as successful. This looks very suspicious!

Could you please execute the following command from vmh-01 or vmh-03 to test the PM agent on vmh-02?

  fence_ipmilan -a <IP> -l <USER> -p <PASSWORD> -o status -v -P

where <IP>, <USER> and <PASSWORD> are the values valid for vmh-02.

Could you please also send us vdsm.log from vmh-01 and vmh-03, so we can investigate the details of the fence agent execution failures?

Thanks a lot

Martin Perina

----- Original Message -----
> From: "Siddharth Patil" <siddha...@patil.co.uk>
> To: users@ovirt.org
> Sent: Wednesday, February 11, 2015 5:17:28 PM
> Subject: Re: [ovirt-users] Fenced hosts VM's never migrate
>
> On 11/02/15, 4:34 PM, Omer Frenkel wrote:
> >
> >
> > ----- Original Message -----
> >> From: "Tim Macy" <mac...@gmail.com>
> >> To: users@ovirt.org
> >> Sent: Tuesday, February 10, 2015 6:55:31 PM
> >> Subject: [ovirt-users] Fenced hosts VM's never migrate
> >>
> >> I have a 3 host cluster setup with HA enabled and fencing enabled and it
> >> appears to be working properly. Executing power management stop, start,
> >> and restart work along with host shutdown/restart following a simulated
> >> crash. When network is pulled a proxy is chosen and it powers off the
> >> downed host, and then restarts it. Since the network is still down it
> >> repeats the following in events:
> >> "Host kvm01 is not responding. It will stay in Connecting state for a
> >> grace period of 162 seconds and after that an attempt to fence the host
> >> will be issued."
> >>
> >> The real problem here is that the VM's on the host that has failed never
> >> migrate to a new host and remain down until the network is reconnected.
> >>
> >
> > once the host is powered off by the proxy, HA vms will be started (not
> > migrated) on other host,
> > if there are resources for it..
> > if you have HA vms that are not started although there is another host
> > available for it,
> > it might be a bug, can you please attach engine.log from the time of the
> > failure?
> >
> >> We have tested this with back-end storage on gluster and NFS with the same
> >> result. This is on oVirt Engine Version: 3.5.1.1-1.el6. Hosts are on
> >> CentOS 7 and the Engine is standalone on CentOS 6.6.
> >>
> >> _______________________________________________
> >> Users mailing list
> >> Users@ovirt.org
> >> http://lists.ovirt.org/mailman/listinfo/users
> >>
>
> I've had the exact same problem during testing yesterday. The HA VMs
> never restarted on the other available hosts. The only difference is
> that we're using iSCSI storage backend.
>
> oVirt Engine Version: 3.5.1.1-1.el6 (hosted engine)
> Host: CentOS 6.6
>
> Engine logs are attached.
>
> Thanks,
> Siddharth
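
P.S. If it is easier to run the fence_ipmilan status check from a script and capture its output for the list, here is a minimal sketch in Python. It only wraps the command from the reply, with the same flags and nothing added. The function names are hypothetical, and the sketch assumes fence_ipmilan is installed and on the PATH of the host you run it from (vmh-01 or vmh-03). The helper that masks the password is there only so the command line can be pasted into a mail safely.

```python
import shlex
import subprocess


def build_status_command(ip, user, password):
    """Return the argv for the status check quoted in the reply."""
    # Flags are copied verbatim from the email; no extra options are added.
    return ["fence_ipmilan", "-a", ip, "-l", user, "-p", password,
            "-o", "status", "-v", "-P"]


def run_status_check(ip, user, password, timeout=60):
    """Run the check and return (exit_code, combined stdout/stderr text)."""
    # Assumption: fence_ipmilan is on PATH; otherwise FileNotFoundError is raised.
    result = subprocess.run(
        build_status_command(ip, user, password),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
        timeout=timeout,
    )
    return result.returncode, result.stdout


def redacted(cmd):
    """Render an argv list as one shell-style line with the password masked."""
    shown = list(cmd)
    shown[shown.index("-p") + 1] = "REDACTED"
    return " ".join(shlex.quote(part) for part in shown)


# Example use (placeholder values, replace with the ones valid for vmh-02):
#   code, output = run_status_check("<IP>", "<USER>", "<PASSWORD>")
#   print(redacted(build_status_command("<IP>", "<USER>", "<PASSWORD>")))
#   print("exit code:", code)
#   print(output)
```

Sending the exit code together with the verbose output should help, since the engine reported a failed test and a successful operation at the same time.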