Re: [Users] two node ovirt cluster with HA

Eli Mesika Thu, 30 Jan 2014 03:03:48 -0800


----- Original Message -----
> From: "Tareq Alayan" <[email protected]>
> To: [email protected], "Karli Sjöberg" <[email protected]>
> Cc: [email protected]
> Sent: Monday, January 27, 2014 2:43:29 PM
> Subject: Re: [Users] two node ovirt cluster with HA
> 
> Hi,
> 
> Power management makes use of special *dedicated* hardware in order to
> restart hosts independently of the host OS. The engine connects to a power
> management device using a *dedicated* network IP address.
> The engine is capable of rebooting hosts that have entered a
> non-operational or non-responsive state,


Non-operational relates to storage issues, so the host will not be restarted
by PM in that case.
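
To make the distinction concrete, here is a minimal sketch in Python. It is not engine code: `HostStatus`, `should_fence` and the `pm_configured` flag are hypothetical names used only for illustration. It assumes, as discussed in this thread, that only a non-responsive host is a candidate for a PM restart.

```python
from enum import Enum


class HostStatus(Enum):
    # Hypothetical status names, modelled on the states named in this thread.
    UP = "up"
    NON_OPERATIONAL = "non-operational"  # e.g. a storage problem
    NON_RESPONSIVE = "non-responsive"    # health checks are not answered


def should_fence(status: HostStatus, pm_configured: bool) -> bool:
    """Return True only when a PM restart would be attempted.

    Assumption: a non-operational host is left alone, because the engine
    can still talk to it and the problem is on the storage side.
    """
    return pm_configured and status is HostStatus.NON_RESPONSIVE
```

With this sketch, `should_fence(HostStatus.NON_OPERATIONAL, True)` is false, while `should_fence(HostStatus.NON_RESPONSIVE, True)` is true.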

> The abilities provided by all power management devices are: check
> status, start, stop and recycle (restart)...

Only status, start and stop; restart is implemented as
stop -> wait for off status -> start -> wait for on status.
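
The restart sequence can be sketched roughly as below, in Python. This is only an illustration under assumptions: the `stop`, `start` and `get_status` callables, the "on"/"off" status strings, the retry count and the `FakePowerDevice` class are all hypothetical and do not reflect any real fence agent API.

```python
import time
from typing import Callable


def wait_for_status(get_status: Callable[[], str], wanted: str,
                    retries: int = 10, delay: float = 0.0) -> bool:
    """Poll the status action until it reports `wanted` or retries run out."""
    for _ in range(retries):
        if get_status() == wanted:
            return True
        time.sleep(delay)
    return False


def restart(stop: Callable[[], None], start: Callable[[], None],
            get_status: Callable[[], str],
            retries: int = 10, delay: float = 0.0) -> bool:
    """Restart built from the three basic actions: status, start, stop."""
    stop()
    if not wait_for_status(get_status, "off", retries, delay):
        return False  # host never confirmed off, so do not start it
    start()
    return wait_for_status(get_status, "on", retries, delay)


class FakePowerDevice:
    """In-memory stand-in for a power management device (illustration only)."""

    def __init__(self) -> None:
        self.state = "on"
        self.log = []

    def stop(self) -> None:
        self.log.append("stop")
        self.state = "off"

    def start(self) -> None:
        self.log.append("start")
        self.state = "on"

    def status(self) -> str:
        self.log.append("status")
        return self.state
```

For example, `dev = FakePowerDevice()` followed by `restart(dev.stop, dev.start, dev.status)` walks through stop, a status check, start and a second status check. If the off status is never confirmed, the sketch gives up without issuing start.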

> 
> In the case of non-responsive host: all of the VMs that are currently
> running on that host can also become non-responsive. However, the
> non-responsive host keeps locking the VM hard disk for all VMs it is
> running. Attempting to start a VM on a different host and assigning the
> second host write privileges for the virtual machine hard disk image can
> cause data corruption.
> Rebooting allows the engine to assume that the lock on a VM hard disk
> image has been released.
> The engine can know for sure that the problematic host has been rebooted
> via the power management device and then it can start a VM from the
> problematic host on another host without risking data corruption.
> Important note: A virtual machine that has been marked highly-available
> can not be safely started on a different host without the certainty that
> doing so will not cause data corruption.
> 
> N-joy,
> 
> --Tareq
> 
> 
> 
> On 01/27/2014 02:05 PM, Dafna Ron wrote:
> > I am adding Tareq for the Power Management implementation.
> >
> > Dafna
> >
> >
> > On 01/27/2014 11:48 AM, Karli Sjöberg wrote:
> >> On Mon, 2014-01-27 at 11:11 +0000, Dafna Ron wrote:
> >>> Powering off the host will never trigger vm migration.
> >>> As far as engine is concerned it just lost connection to the host, but
> >>> has no way of telling if the host is down or if a router is down.
> >> Can't it at least check with power management if the Host status is down
> >> first?
> >>
> >> I mean, if the network is down there will be no response from either PM
> >> or Host. But if PM is up and can tell you that the Host is down, sounds
> >> rather clear cut to me...
> >>
> >> Seems to me the VMs would be restarted sooner if the flow was altered
> >> to first check with PM if it's a network or Host issue, and if it is a Host
> >> issue, immediately restart the VMs on another Host, instead of waiting for
> >> a potentially problematic Host to boot up eventually.
> >>
> >> /K
> >>
> >>> Since VMs can continue running on the host even if the engine has no
> >>> access to it, starting the VMs on the second host can cause split brain
> >>> and data corruption.
> >>>
> >>> The way the engine knows what's going on is by sending health check
> >>> queries to vdsm.
> >>> Power management will try to reboot a host when the health checks sent
> >>> to vdsm are not answered.
> >>> So... if the engine gets no reply and has no way of rebooting the host,
> >>> the host status will be changed to Non-Responsive and the VMs will be
> >>> unknown, because the engine has no way of knowing what's happening with
> >>> the VMs.
> >>> Since a reboot of the host will kill the VMs running on it, this will
> >>> never cause any VM migration but... along with the High-Availability VM
> >>> feature, you will be able to have some of the VMs restarted on the
> >>> second host after the host reboot (and that is only if Power Management
> >>> was confirmed as successful).
> >>>
> >>> VM migration is only triggered when:
> >>> 1. Cluster configuration states that the VM should be migrated in case
> >>> of failure.
> >>> 2. Engine has access to the host, so the failure is on the storage
> >>> side and not the host side.
> >>> 3. The VMs are not actively writing (although there might be a new RFE
> >>> for it).
> >>>
> >>> hope this clears things up
> >>>
> >>> Dafna
> >>>
> >>>
> >>>
> >>> On 01/27/2014 10:11 AM, Andrew Lau wrote:
> >>>> Hi,
> >>>>
> >>>> Have you got power management enabled?
> >>>>
> >>>> That's the fencing feature required for the engine to ensure that the
> >>>> host is actually offline. It won't resume any other VMs to prevent
> >>>> potential VM corruption (eg. VM running on multiple hosts).
> >>>>
> >>>> Andrew.
> >>>>
> >>>> On Jan 27, 2014 5:12 PM, "Jaison peter" <[email protected]> wrote:
> >>>>
> >>>>      Hi all,
> >>>>
> >>>>      I was setting up a two-node oVirt cluster with the oVirt engine on
> >>>>      a separate node. I completed the configuration and tested VM live
> >>>>      migrations without any issues. Then, to check cluster HA, I
> >>>>      powered down one host and expected the VMs running on that host to
> >>>>      be migrated to the other one. But nothing happened: the engine
> >>>>      detected the host as unreachable and marked it as non-operational,
> >>>>      and the VMs that ran on that host went to 'unknown' state. Is it
> >>>>      not possible to set up a fully HA oVirt cluster with two nodes? Or
> >>>>      is it a problem with my configuration? Please advise.
> >>>>
> >>>>      Thanks & Regards
> >>>>
> >>>>      Alex
> >>>>
> >>>>      _______________________________________________
> >>>>      Users mailing list
> >>>>      [email protected] <mailto:[email protected]>
> >>>>      http://lists.ovirt.org/mailman/listinfo/users
> >>>>
> >>>>
> >>>>
> >>>
> >>> --
> >>> Dafna Ron
> >>
> >>
> >
> >
> 
