Dear Doron,

I haven't collected the logs from the tests, but I will gladly redo the case and get back to you as soon as possible. This feature is the main reason I chose oVirt in the first place over other virtualization environments. Could you please tell me which logs I should focus on besides the engine log? vdsm, perhaps, or other relevant logs?
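In the meantime, when I redo the test I plan to pull out just the window around the power cut with something like the sketch below. It assumes the default log locations on an oVirt 3.1 setup (/var/log/ovirt-engine/engine.log on the engine machine, /var/log/vdsm/vdsm.log on each hypervisor) and that each log line carries a "YYYY-MM-DD HH:MM:SS" timestamp somewhere in it:

    #!/usr/bin/env python
    # Extract log lines that fall inside a time window around the outage.
    # Assumed default locations for an oVirt 3.1 setup; adjust as needed.
    import re
    from datetime import datetime

    LOGS = [
        "/var/log/ovirt-engine/engine.log",  # on the engine machine
        "/var/log/vdsm/vdsm.log",            # on each hypervisor
    ]

    START = datetime(2013, 1, 11, 14, 30)    # window around the power cut
    END = datetime(2013, 1, 11, 15, 0)

    # Both logs embed a "YYYY-MM-DD HH:MM:SS" timestamp in each line,
    # though not always at the start, so search rather than slice.
    TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")

    def in_window(line):
        match = TIMESTAMP.search(line)
        if match is None:
            return True  # keep continuation lines and tracebacks
        stamp = datetime.strptime(match.group(0), "%Y-%m-%d %H:%M:%S")
        return START <= stamp <= END

    for path in LOGS:
        try:
            with open(path) as log:
                for line in log:
                    if in_window(line):
                        print("%s: %s" % (path, line.rstrip()))
        except IOError:
            pass  # that log does not exist on this machine

Running it on the engine machine and on each hypervisor should give me the slices for the relevant period.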
Regards,
Alex

--
Sent from phone.

On 13.01.2013, at 09:56, Doron Fediuck <dfedi...@redhat.com> wrote:

> From: "Alexandru Vladulescu" <avladule...@bfproject.ro>
> To: "users" <users@ovirt.org>
> Sent: Friday, January 11, 2013 2:47:38 PM
> Subject: [Users] Testing High Availability and Power outages
>
> Hi,
>
> Today I started testing the High Availability features and the fencing
> mechanism on my oVirt 3.1 installation (from the dreyou repos), running
> on 3 x CentOS 6.3 hypervisors.
>
> As I reported yesterday in a previous email thread, the migration
> priority queue cannot be increased in this version (a bug), so I decided
> to test what the official documentation says about the High Availability
> cases.
>
> This is a disaster scenario waiting to happen if one hypervisor suffers
> a power outage or hardware problem and the VMs running on it do not
> migrate to other spare resources.
>
> The official documentation on ovirt.org says the following:
>
> High availability
> Allows critical VMs to be restarted on another host in the event of
> hardware failure, with three levels of priority, taking into account
> resiliency policy.
> Resiliency policy to control high availability VMs at the cluster level.
> Supports application-level high availability with supported fencing
> agents.
>
> As well as in the architecture description:
>
> High Availability - restart guest VMs from failed hosts automatically
> on other hosts
>
> So the testing went like this: one VM running a Linux guest, with the
> "High Available" check box ticked and "Priority for Run/Migration
> queue:" set to Low. Under Host, "Any Host in Cluster" is selected,
> without "Allow VM migration only upon Admin specific request" checked.
>
> My environment:
>
> Configuration: 2 x hypervisors (same cluster/hardware configuration);
> 1 x hypervisor also acting as a NAS (NFS) server (different
> cluster/hardware configuration)
>
> Actions: Cut the power to one of the hypervisors in the 2-node cluster
> while the VM was running on it. This translates to a power outage.
>
> Results: The hypervisor node that suffered the outage shows up in the
> Hosts tab with status Non Responsive, and the VM has a question mark
> and cannot be powered off or controlled in any way (it is stuck).
>
> In the log console in the GUI, I get:
>
> Host Hyper01 is non-responsive.
> VM Web-Frontend01 was set to the Unknown status.
>
> There was nothing I could do besides clicking "Confirm Host has been
> rebooted" on Hyper01; after that, the VM starts on Hyper02 with a cold
> reboot.
>
> The log console changes to:
>
> Vm Web-Frontend01 was shut down due to Hyper01 host reboot or manual fence
> All VMs' status on Non-Responsive Host Hyper01 were changed to 'Down' by
> admin@internal
> Manual fencing for host Hyper01 was started.
> VM Web-Frontend01 was restarted on Host Hyper02
>
> I would like your take on this problem. Reading the documentation and
> feature pages on the official website, I supposed this would be an
> automatic mechanism, driven by some sort of vdsm and engine fencing
> action. Am I missing something here?
>
> Thank you for your patience reading this.
>
> Regards,
> Alex.
>
> Hi Alex,
> Can you share with us the engine's log from the relevant time period?
>
> Doron
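PS: Before redoing the test I will also double-check whether power management (the fencing agents) is actually configured on the hosts, since the automatic fencing flow depends on it. Below is a minimal sketch of that check, assuming the ovirtsdk Python bindings, with a placeholder engine URL and credentials; the getter names follow the SDK's generated conventions, so they are worth verifying against the exact SDK version:

    #!/usr/bin/env python
    # Sketch: list each host's status and whether power management is
    # configured, via the oVirt Python SDK (ovirtsdk, 3.x series).
    # URL and credentials are placeholders; getter names follow the
    # generated SDK conventions, so verify them against your version.
    from ovirtsdk.api import API

    api = API(url="https://engine.example.com/api",  # placeholder URL
              username="admin@internal",
              password="secret",                     # placeholder password
              insecure=True)                         # lab setup, no CA check

    for host in api.hosts.list():
        pm = host.get_power_management()
        pm_enabled = pm is not None and pm.get_enabled()
        print("%s: status=%s power_management=%s"
              % (host.get_name(), host.get_status().get_state(), pm_enabled))

    api.disconnect()

If power management comes back False for a host, the engine has no agent to fence it with automatically, which would be consistent with the manual "Confirm Host has been rebooted" step I had to take.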
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users