----- Original Message -----
> From: "Jaison peter" <[email protected]>
> To: "Eli Mesika" <[email protected]>
> Cc: [email protected], "Tareq Alayan" <[email protected]>
> Sent: Tuesday, January 28, 2014 7:33:35 AM
> Subject: Re: [Users] two node ovirt cluster with HA
>
> Thank you all for your valuable feedback.
>
> Can you please specify some of the supported fencing devices in ovirt?
For oVirt 3.4: apc, apc_snmp, bladecenter, cisco_ucs, drac5, drac7, eps, hpblade, ilo, ilo2, ilo3, ilo4, ipmilan, rsa, rsb, wti

> On Mon, Jan 27, 2014 at 9:10 PM, Eli Mesika <[email protected]> wrote:
>
> > ----- Original Message -----
> > > From: "Tareq Alayan" <[email protected]>
> > > To: "Andrew Lau" <[email protected]>, "Eli Mesika" <[email protected]>
> > > Cc: [email protected], "Karli Sjöberg" <[email protected]>, [email protected]
> > > Sent: Monday, January 27, 2014 2:59:02 PM
> > > Subject: Re: [Users] two node ovirt cluster with HA
> > >
> > > Adding Eli.
> >
> > I just want to summarize the requirement as I understand it:
> >
> > In the case that a Host that is running HA VMs and has PM configured is
> > turned off manually:
> >
> > 1) The non-responsive treatment should be modified to check Host status
> >    via PM agent
> > 2) If Host is off, HA VMs will attempt to run on another host ASAP
> > 3) The host status should be set to DOWN
> > 4) No attempt to restart vdsm (soft fencing) or restart the host (hard
> >    fencing) will be done
> >
> > Is the above correct? If so, an RFE on that can be opened.
> >
> > > On 01/27/2014 02:50 PM, Andrew Lau wrote:
> > > > Hi,
> > > >
> > > > I think he was asking what if the power management device reported
> > > > that the host was powered off. Then VMs should be brought back up as
> > > > being off would essentially be the same as running a power
> > > > cycle/reboot?
> > > >
> > > > Another example I'm seeing is what happens if the whole host loses
> > > > power and its power management device then becomes unavailable (ie.
> > > > not reachable) then you're stuck in the case where it requires manual
> > > > intervention.
> > > >
> > > > I would be interested to potentially see something like a timeout on
> > > > those problematic VMs (eg. if nothing was read or written after x amount
> > > > of time) then you could consider the host as offline? I guess then
> > > > that adds a lot of risk..
> > > >
> > > > On Mon, Jan 27, 2014 at 11:43 PM, Tareq Alayan <[email protected]> wrote:
> > > >
> > > > Hi,
> > > >
> > > > Power management makes use of special *dedicated* hardware in
> > > > order to restart hosts independently of the host OS. The engine
> > > > connects to a power management device using a *dedicated* network
> > > > IP address.
> > > > The engine is capable of rebooting hosts that have entered a
> > > > non-operational or non-responsive state.
> > > > The abilities provided by all power management devices are: check
> > > > status, start, stop and recycle (restart)...
> > > >
> > > > In the case of a non-responsive host: all of the VMs that are
> > > > currently running on that host can also become non-responsive.
> > > > However, the non-responsive host keeps locking the VM hard disk
> > > > for all VMs it is running. Attempting to start a VM on a different
> > > > host and assign the second host write privileges for the virtual
> > > > machine hard disk image can cause data corruption.
> > > > Rebooting allows the engine to assume that the lock on a VM hard
> > > > disk image has been released.
> > > > The engine can know for sure that the problematic host has been
> > > > rebooted via the power management device and then it can start a
> > > > VM from the problematic host on another host without risking data
> > > > corruption.
> > > > Important note: A virtual machine that has been marked
> > > > highly-available can not be safely started on a different host
> > > > without the certainty that doing so will not cause data corruption.
> > > >
> > > > N-joy,
> > > >
> > > > --Tareq
> > > >
> > > > On 01/27/2014 02:05 PM, Dafna Ron wrote:
> > > >
> > > > I am adding Tareq for the Power Management implementation.
> > > >
> > > > Dafna
> > > >
> > > > On 01/27/2014 11:48 AM, Karli Sjöberg wrote:
> > > >
> > > > On Mon, 2014-01-27 at 11:11 +0000, Dafna Ron wrote:
> > > >
> > > > Powering off the host will never trigger vm migration.
> > > > As far as engine is concerned it just lost connection to the host, but
> > > > has no way of telling if the host is down or if a router is down.
> > > >
> > > > Can't it at least check with power management if the Host status is down
> > > > first?
> > > >
> > > > I mean, if the network is down there will be no response from either PM
> > > > or Host. But if PM is up and can tell you that the Host is down, sounds
> > > > rather clear cut to me...
> > > >
> > > > Seems to me the VM's would be restarted sooner if the flow was altered
> > > > to first check with PM if it's a network or Host issue, and if Host
> > > > issue, immediately restart VM's on another Host, instead of waiting for
> > > > a potentially problematic Host to boot up eventually.
> > > >
> > > > /K
> > > >
> > > > since vm's can continue running on the host even if engine has no access
> > > > to it, starting the vm's on the second host can cause split brain and
> > > > data corruption.
> > > >
> > > > The way that the engine knows what's going on is by sending health check
> > > > queries to the vdsm.
> > > > Power management will try to reboot a host when the health checks to
> > > > vdsm are not answered.
> > > > So... if engine gets no reply and has no way of rebooting the host, the
> > > > host status will be changed to Non-Responsive and the vm's will be
> > > > unknown because engine has no way of knowing what's happening with the
> > > > vm's.
> > > > Since reboot of the host will kill the vm's running on it - this will
> > > > never cause any vm migration but...
> > > > along with the High-Availability vm feature, you will be able to have
> > > > some of the vm's re-started on the second host after the host reboot
> > > > (and that is only if Power Management was confirmed as successful).
> > > >
> > > > VM migration is only triggered when:
> > > > 1. Cluster configuration states that the vm should be migrated in case
> > > >    of failure
> > > > 2. Engine has access to the host - so the failure is on the storage side
> > > >    and not the host side.
> > > > 3. the vms are not actively writing (although there might be a new RFE
> > > >    for it).
> > > >
> > > > hope this clears things up
> > > >
> > > > Dafna
> > > >
> > > > On 01/27/2014 10:11 AM, Andrew Lau wrote:
> > > >
> > > > Hi,
> > > >
> > > > Have you got power management enabled?
> > > >
> > > > That's the fencing feature required for the engine to ensure that the
> > > > host is actually offline. It won't resume any other VMs to prevent
> > > > potential VM corruption (eg. VM running on multiple hosts).
> > > >
> > > > Andrew.
> > > >
> > > > On Jan 27, 2014 5:12 PM, "Jaison peter" <[email protected]> wrote:
> > > >
> > > > Hi all,
> > > >
> > > > I was setting up a two node ovirt cluster with ovirt engine on a
> > > > separate node. I completed the configuration and tested VM live
> > > > migrations without any issues. Then for checking cluster HA I
> > > > powered down one host and expected vms running on that host to be
> > > > migrated to the other one. But nothing happened, Engine detected
> > > > host as unreachable and marked it as non-operational and the vms that
> > > > ran on that host went to 'unknown state'. Is it not possible to set up
> > > > a fully HA ovirt cluster with two nodes?
> > > > Or else is that my configuration problem? Please advise.
> > > >
> > > > Thanks & Regards
> > > >
> > > > Alex
> > > >
> > > > _______________________________________________
> > > > Users mailing list
> > > > [email protected]
> > > > http://lists.ovirt.org/mailman/listinfo/users
> > > >
> > > > --
> > > > Dafna Ron
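
To make the flow discussed in this thread concrete, here is a minimal Python sketch of the decision the engine would take for a host that stopped answering vdsm health checks, following the four-point summary above. It is only an illustration under stated assumptions: `PowerStatus`, `handle_non_responsive_host` and the action strings are hypothetical names invented for this example, not oVirt engine code or API. Only the fence agent names come from the list given for oVirt 3.4.

```python
from enum import Enum

# Fence agent types listed in this thread for oVirt 3.4.
SUPPORTED_FENCE_AGENTS = frozenset({
    "apc", "apc_snmp", "bladecenter", "cisco_ucs", "drac5", "drac7", "eps",
    "hpblade", "ilo", "ilo2", "ilo3", "ilo4", "ipmilan", "rsa", "rsb", "wti",
})


class PowerStatus(Enum):
    """Hypothetical result of a 'check status' call to the PM device."""
    ON = "on"
    OFF = "off"
    UNREACHABLE = "unreachable"  # the PM device itself did not answer


def handle_non_responsive_host(pm_status, ha_vms):
    """Return the ordered actions for a host that no longer answers vdsm.

    Hypothetical helper: the action names are labels for this sketch only.
    """
    if pm_status is PowerStatus.OFF:
        # PM confirms the host is off, so its disk locks are gone:
        # set the host DOWN and restart HA VMs elsewhere, with no fencing.
        return ["set_host_down"] + [f"restart_ha_vm:{vm}" for vm in ha_vms]
    if pm_status is PowerStatus.ON:
        # Host has power but is silent: restart vdsm (soft fencing), then
        # reboot the host (hard fencing) if it still does not respond.
        return ["soft_fence_restart_vdsm", "hard_fence_reboot_host"]
    # PM is unreachable too: the engine cannot tell a dead host from a dead
    # network, so starting the VMs elsewhere would risk data corruption.
    return ["set_host_non_responsive", "mark_vms_unknown",
            "wait_for_manual_intervention"]
```

Calling `handle_non_responsive_host(PowerStatus.OFF, ["vm1"])` yields the "host is off" branch from the summary. The `UNREACHABLE` branch corresponds to the case where the whole host, including its PM device, loses power and manual intervention is required.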

