Hi,

The whole engine.log, including the time of the shutdown (performed at around 9:19):
http://pastebin.com/cdY9uTkJ
vdsm.log of host01 (the host which kept running and took over the engine), split into 3 uploads because of pastebin's 512 kB limit:
1: http://pastebin.com/dr9jNTek
2: http://pastebin.com/cuyHL6ne
3: http://pastebin.com/7x2ZQy1y

Michael

On 09/21/2015 03:00 PM, Martin Perina wrote:
> Hi,
>
> could you please post the whole engine.log (from the time you turned off
> the host with the engine VM) and also vdsm.log from both hosts?
>
> Thanks
>
> Martin Perina
>
> ----- Original Message -----
>> From: "Michael Hölzl" <m...@ins.jku.at>
>> To: users@ovirt.org
>> Sent: Monday, September 21, 2015 10:27:08 AM
>> Subject: [ovirt-users] HA - Fencing not working when host with engine gets
>> shutdown
>>
>> Hi all,
>>
>> we are trying to set up an oVirt environment with two hosts, both
>> connected to an iSCSI storage device, a hosted engine and power
>> management configured over iLO. So far it seems to work fine in our
>> test setup, and starting/stopping VMs works smoothly with proper
>> scheduling between the hosts. So we wanted to test HA for the VMs and
>> started to manually shut down a host while VMs were still running on
>> it (to simulate a power failure or a kernel panic).
>> The expected outcome was that all VMs with HA enabled would be
>> booted again. This works if the failing host does not have the engine
>> running. If the host running the hosted engine VM is shut down, that
>> host goes into the "Not Responsive" state and all VMs end up in an
>> unknown state. However, the engine itself starts correctly on the
>> second host and seems to try to fence the other host (as expected).
>> These are the events we get in the Open Virtualization Manager:
>> 1. Host hosted_engine_2 is non responsive
>> 2. Host hosted_engine_1 from cluster Default was chosen as a proxy to
>>    execute Status command on Host hosted_engine_2.
>> 3. Host hosted_engine_2 became non responsive. It has no power
>>    management configured. Please check the host status, manually reboot it,
>>    and click "Confirm Host Has Been Rebooted"
>> 4. Host hosted_engine_2 is not responding. It will stay in Connecting
>>    state for a grace period of 124 seconds and after that an attempt to
>>    fence the host will be issued.
>>
>> Event 4 keeps recurring every 3 minutes. Complete engine.log file
>> during engine boot up: http://pastebin.com/D6xS3Wfy
>> So the host detects that the machine is not responding and wants to
>> fence it. But although the host has power management configured over
>> iLO, the engine thinks that it does not. As a result, the second host
>> does not get fenced and the VMs are not migrated to the running machine.
>> There are also a lot of timeout exceptions in the log files, but I guess
>> that is because the host cannot connect to the other machine.
>>
>> Has anybody faced similar problems with HA, or does anyone have a clue
>> what the problem might be?
>>
>> Thanks,
>> Michael
>>
>>
>> ----
>> oVirt version: 3.5.4
>> Hosted engine VM OS: CentOS 6.5
>> Host machines OS: CentOS 7
>>
>> P.S. We should also note that we had problems with the fence_ipmilan
>> command at the beginning. We received the message "Unable to
>> obtain correct plug status or plug is not available," whenever
>> fence_ipmilan was called. However, fence_ilo4 worked. So we now use a
>> simple script in place of fence_ipmilan that calls fence_ilo4 and
>> passes the arguments through.
>> _______________________________________________
>> Users mailing list
>> Users@ovirt.org
>> http://lists.ovirt.org/mailman/listinfo/users
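The P.S. in the quoted message mentions a simple script that stands in for fence_ipmilan and hands its arguments to fence_ilo4. That script was not posted, so the following is only a minimal sketch of what such a wrapper could look like. It assumes that fence_ilo4 is on the PATH and accepts the same options the engine passes to fence_ipmilan (either on the command line or as key=value lines on stdin); the helper name `build_command` and the install name check are illustrative, not part of oVirt or fence-agents.

```python
#!/usr/bin/env python
"""Hypothetical stand-in for fence_ipmilan that delegates to fence_ilo4.

Assumption: fence_ilo4 is on PATH and understands the same options the
engine passes to fence_ipmilan (command-line flags or key=value lines
on stdin).
"""
import os
import sys

WRAPPER_NAME = "fence_ipmilan"  # name this script is assumed to be installed under
TARGET_AGENT = "fence_ilo4"     # agent that worked against the iLO in this setup


def build_command(argv):
    """Return the argv list for the delegated agent, forwarding all options."""
    # argv[0] is the wrapper's own path; everything after it is passed through unchanged.
    return [TARGET_AGENT] + list(argv[1:])


def main(argv):
    cmd = build_command(argv)
    try:
        # execvp replaces this process: fence_ilo4 inherits stdin/stdout/stderr,
        # and its exit status is what the caller (vdsm) sees.
        os.execvp(cmd[0], cmd)
    except OSError as exc:
        sys.stderr.write("failed to run %s: %s\n" % (cmd[0], exc))
        sys.exit(1)


# Delegate only when invoked under the wrapper's name, so the file can be
# imported or tested without exec'ing anything.
if __name__ == "__main__" and os.path.basename(sys.argv[0]) == WRAPPER_NAME:
    main(sys.argv)
```

Using `os.execvp` instead of starting a child process keeps the wrapper transparent: any options arriving on stdin go straight to fence_ilo4, and no exit code has to be translated.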