Dear Doron,

I haven't collected the logs from the tests, but I will gladly redo the case and get back to you as soon as possible. This feature is the main reason I chose oVirt in the first place over other virtualization environments. Could you please tell me which logs I should focus on besides the engine log? vdsm, perhaps, or other relevant logs?
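In the meantime, when I redo the test I plan to pull out just the window around the power cut with something like the sketch below. It assumes the default log locations on an oVirt 3.1 setup (/var/log/ovirt-engine/engine.log on the engine machine, /var/log/vdsm/vdsm.log on each hypervisor) and that each log line carries a "YYYY-MM-DD HH:MM:SS" timestamp somewhere in it:

    #!/usr/bin/env python
    # Extract log lines that fall inside a time window around the outage.
    # Assumed default locations for an oVirt 3.1 setup; adjust as needed.
    import re
    from datetime import datetime

    LOGS = [
        "/var/log/ovirt-engine/engine.log",  # on the engine machine
        "/var/log/vdsm/vdsm.log",            # on each hypervisor
    ]

    START = datetime(2013, 1, 11, 14, 30)    # window around the power cut
    END = datetime(2013, 1, 11, 15, 0)

    # Both logs embed a "YYYY-MM-DD HH:MM:SS" timestamp in each line,
    # though not always at the start, so search rather than slice.
    TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")

    def in_window(line):
        match = TIMESTAMP.search(line)
        if match is None:
            return True  # keep continuation lines and tracebacks
        stamp = datetime.strptime(match.group(0), "%Y-%m-%d %H:%M:%S")
        return START <= stamp <= END

    for path in LOGS:
        try:
            with open(path) as log:
                for line in log:
                    if in_window(line):
                        print("%s: %s" % (path, line.rstrip()))
        except IOError:
            pass  # that log does not exist on this machine

Running it on the engine machine and on each hypervisor should give me the slices for the relevant period.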
Regards,
Alex

--
Sent from phone.

On 13.01.2013, at 09:56, Doron Fediuck <dfedi...@redhat.com> wrote:

> From: "Alexandru Vladulescu" <avladule...@bfproject.ro>
> To: "users" <users@ovirt.org>
> Sent: Friday, January 11, 2013 2:47:38 PM
> Subject: [Users] Testing High Availability and Power outages
>
> Hi,
>
> Today I started testing the High Availability features and the fencing
> mechanism on my oVirt 3.1 installation (from the dreyou repos), running
> on 3 x CentOS 6.3 hypervisors.
>
> As I reported yesterday in a previous email thread, the migration
> priority queue cannot be increased in this version (a bug), so I decided
> to test what the official documentation says about the High Availability
> cases.
>
> This is a disaster scenario waiting to happen if one hypervisor suffers
> a power outage or hardware problem and the VMs running on it do not
> migrate to other spare resources.
>
> The official documentation on ovirt.org says the following:
>
> High availability
> Allows critical VMs to be restarted on another host in the event of
> hardware failure, with three levels of priority, taking into account
> resiliency policy.
> Resiliency policy to control high availability VMs at the cluster level.
> Supports application-level high availability with supported fencing
> agents.
>
> As well as in the architecture description:
>
> High Availability - restart guest VMs from failed hosts automatically
> on other hosts
>
> So the testing went like this: one VM running a Linux guest, with the
> "High Available" check box ticked and "Priority for Run/Migration
> queue:" set to Low. Under Host, "Any Host in Cluster" is selected,
> without "Allow VM migration only upon Admin specific request" checked.
>
> My environment:
>
> Configuration: 2 x hypervisors (same cluster/hardware configuration);
> 1 x hypervisor also acting as a NAS (NFS) server (different
> cluster/hardware configuration)
>
> Actions: Cut the power to one of the hypervisors in the 2-node cluster
> while the VM was running on it. This translates to a power outage.
>
> Results: The hypervisor node that suffered the outage shows up in the
> Hosts tab with status Non Responsive, and the VM has a question mark
> and cannot be powered off or controlled in any way (it is stuck).
>
> In the log console in the GUI, I get:
>
> Host Hyper01 is non-responsive.
> VM Web-Frontend01 was set to the Unknown status.
>
> There was nothing I could do besides clicking "Confirm Host has been
> rebooted" on Hyper01; after that, the VM starts on Hyper02 with a cold
> reboot.
>
> The log console changes to:
>
> Vm Web-Frontend01 was shut down due to Hyper01 host reboot or manual fence
> All VMs' status on Non-Responsive Host Hyper01 were changed to 'Down' by
> admin@internal
> Manual fencing for host Hyper01 was started.
> VM Web-Frontend01 was restarted on Host Hyper02
>
> I would like your take on this problem. Reading the documentation and
> feature pages on the official website, I supposed this would be an
> automatic mechanism, driven by some sort of vdsm and engine fencing
> action. Am I missing something here?
>
> Thank you for your patience reading this.
>
> Regards,
> Alex.
>
> Hi Alex,
> Can you share with us the engine's log from the relevant time period?
>
> Doron
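PS: Before redoing the test I will also double-check whether power management (the fencing agents) is actually configured on the hosts, since the automatic fencing flow depends on it. Below is a minimal sketch of that check, assuming the ovirtsdk Python bindings, with a placeholder engine URL and credentials; the getter names follow the SDK's generated conventions, so they are worth verifying against the exact SDK version:

    #!/usr/bin/env python
    # Sketch: list each host's status and whether power management is
    # configured, via the oVirt Python SDK (ovirtsdk, 3.x series).
    # URL and credentials are placeholders; getter names follow the
    # generated SDK conventions, so verify them against your version.
    from ovirtsdk.api import API

    api = API(url="https://engine.example.com/api",  # placeholder URL
              username="admin@internal",
              password="secret",                     # placeholder password
              insecure=True)                         # lab setup, no CA check

    for host in api.hosts.list():
        pm = host.get_power_management()
        pm_enabled = pm is not None and pm.get_enabled()
        print("%s: status=%s power_management=%s"
              % (host.get_name(), host.get_status().get_state(), pm_enabled))

    api.disconnect()

If power management comes back False for a host, the engine has no agent to fence it with automatically, which would be consistent with the manual "Confirm Host has been rebooted" step I had to take.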
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users