----- Original Message -----
> From: "Tareq Alayan" <[email protected]>
> To: [email protected], "Karli Sjöberg" <[email protected]>
> Cc: [email protected]
> Sent: Monday, January 27, 2014 2:43:29 PM
> Subject: Re: [Users] two node ovirt cluster with HA
>
> Hi,
>
> Power management makes use of special *dedicated* hardware in order to
> restart hosts independently of host OS. The engine connects to a power
> management device using a *dedicated* network IP address.
> The engine is capable of rebooting hosts that have entered a
> non-operational or non-responsive state,
non-operational is related to storage issues, so the Host will not be restarted by PM in this case.

> The abilities provided by all power management devices are: check
> status, start, stop and recycle (restart)...

Only status, start and stop; restart is implemented as stop -> wait for off status -> start -> wait for on status.

>
> In the case of non-responsive host: all of the VMs that are currently
> running on that host can also become non-responsive. However, the
> non-responsive host keeps locking the VM hard disk for all VMs it is
> running. Attempting to start a VM on a different host and assign the
> second host write privileges for the virtual machine hard disk image can
> cause data corruption.
> Rebooting allows the engine to assume that the lock on a VM hard disk
> image has been released.
> The engine can know for sure that the problematic host has been rebooted
> via the power management device and then it can start a VM from the
> problematic host on another host without risking data corruption.
> Important note: A virtual machine that has been marked highly-available
> can not be safely started on a different host without the certainty that
> doing so will not cause data corruption.
>
> N-joy,
>
> --Tareq
>
>
>
> On 01/27/2014 02:05 PM, Dafna Ron wrote:
> > I am adding Tareq for the Power Management implementation.
> >
> > Dafna
> >
> >
> > On 01/27/2014 11:48 AM, Karli Sjöberg wrote:
> >> On Mon, 2014-01-27 at 11:11 +0000, Dafna Ron wrote:
> >>> Powering off the host will never trigger vm migration.
> >>> As far as engine is concerned it just lost connection to the host, but
> >>> has no way of telling if the host is down or if a router is down.
> >> Can't it at least check with power management if the Host status is down
> >> first?
> >>
> >> I mean, if the network is down there will be no response from either PM
> >> or Host. But if PM is up and can tell you that the Host is down, sounds
> >> rather clear cut to me...
> >>
> >> Seems to me the VM's would be restarted sooner if the flow was altered
> >> to first check with PM if it's a network or Host issue, and if Host
> >> issue, immediately restart VM's on another Host, instead of waiting for
> >> a potentially problematic Host to boot up eventually.
> >>
> >> /K
> >>
> >>> since vm's can continue running on the host even if engine has no
> >>> access to it, starting the vm's on the second host can cause split
> >>> brain and data corruption.
> >>>
> >>> The way that the engine knows what's going on is by sending health check
> >>> queries to the vdsm.
> >>> Power management will try to reboot a host when the health checks to
> >>> vdsm are not answered.
> >>> So... if engine gets no reply and has no way of rebooting the host, the
> >>> host status will be changed to Non-Responsive and the vm's will be
> >>> unknown because engine has no way of knowing what's happening with the
> >>> vm's.
> >>> Since reboot of the host will kill the vm's running on it - this will
> >>> never cause any vm migration but... along with the High-Availability vm
> >>> feature, you will be able to have some of the vm's re-started on the
> >>> second host after the host reboot (and that is only if Power Management
> >>> was confirmed as successful).
> >>>
> >>> VM migration is only triggered when:
> >>> 1. Cluster configuration states that the vm should be migrated in case
> >>>    of failure
> >>> 2. Engine has access to the host - so the failure is on the storage
> >>>    side and not the host side.
> >>> 3. the vms are not actively writing (although there might be a new RFE
> >>>    for it).
> >>>
> >>> hope this clears things up
> >>>
> >>> Dafna
> >>>
> >>>
> >>>
> >>> On 01/27/2014 10:11 AM, Andrew Lau wrote:
> >>>> Hi,
> >>>>
> >>>> Have you got power management enabled?
> >>>>
> >>>> That's the fencing feature required for the engine to ensure that the
> >>>> host is actually offline.
> >>>> It won't resume any other VMs, to prevent potential VM corruption
> >>>> (e.g. a VM running on multiple hosts).
> >>>>
> >>>> Andrew.
> >>>>
> >>>> On Jan 27, 2014 5:12 PM, "Jaison peter" <[email protected]> wrote:
> >>>>
> >>>>     Hi all,
> >>>>
> >>>>     I was setting up a two node ovirt cluster with ovirt engine on a
> >>>>     separate node. I completed the configuration and tested VM live
> >>>>     migrations without any issues. Then for checking cluster HA I
> >>>>     powered down one host and expected vms running on that host to be
> >>>>     migrated to the other one. But nothing happened: Engine detected
> >>>>     the host as unreachable and marked it as non-operational, and the
> >>>>     vms that ran on that host went to 'unknown state'. Is it not
> >>>>     possible to set up a fully HA ovirt cluster with two nodes? Or
> >>>>     else is that my configuration problem? Please advise.
> >>>>
> >>>>     Thanks & Regards
> >>>>
> >>>>     Alex
> >>>>
> >>>>     _______________________________________________
> >>>>     Users mailing list
> >>>>     [email protected]
> >>>>     http://lists.ovirt.org/mailman/listinfo/users
> >>>
> >>> --
> >>> Dafna Ron
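
For illustration, the restart sequence described in this thread (stop, wait for off status, start, wait for on status) and the rule that highly available VMs are only restarted elsewhere once power management has confirmed success could be sketched roughly as follows. This is a minimal Python sketch and not oVirt engine code: FakePowerDevice, wait_for, restart_host and vms_safe_to_restart are hypothetical names, and the retry count and delay are arbitrary assumptions.

```python
import time
from typing import Callable, List, Tuple


class FakePowerDevice:
    """Hypothetical stand-in for a power management device (not an oVirt API).

    It only offers the three primitives named in the thread: status, start, stop.
    """

    def __init__(self, powered_on: bool = True) -> None:
        self._on = powered_on

    def status(self) -> str:
        return "on" if self._on else "off"

    def start(self) -> None:
        self._on = True

    def stop(self) -> None:
        self._on = False


def wait_for(check: Callable[[], bool], retries: int = 5, delay: float = 0.0) -> bool:
    """Poll check() up to `retries` times; return True as soon as it holds."""
    for _ in range(retries):
        if check():
            return True
        time.sleep(delay)
    return False


def restart_host(device) -> bool:
    """Restart composed of primitives: stop -> wait off -> start -> wait on."""
    device.stop()
    if not wait_for(lambda: device.status() == "off"):
        # Power-off was never confirmed, so the host may still hold disk locks.
        return False
    device.start()
    return wait_for(lambda: device.status() == "on")


def vms_safe_to_restart(vms: List[Tuple[str, bool]], fence_confirmed: bool) -> List[str]:
    """Return names of highly available VMs that may be started on another host.

    `vms` is a list of (name, is_highly_available) pairs. Nothing is returned
    unless power management reported success, since the old host could
    otherwise still be writing to the VM disks.
    """
    if not fence_confirmed:
        return []
    return [name for name, is_ha in vms if is_ha]
```

A caller would pass the result of restart_host(device) as fence_confirmed, so that a host whose power state could not be confirmed never leads to a second copy of a VM being started.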

