Sent from my iPhone
> On 27 Jan 2014, at 16:40, "Eli Mesika" <[email protected]> wrote:
>
>
>
> ----- Original Message -----
>> From: "Tareq Alayan" <[email protected]>
>> To: "Andrew Lau" <[email protected]>, "Eli Mesika" <[email protected]>
>> Cc: [email protected], "Karli Sjöberg" <[email protected]>, [email protected]
>> Sent: Monday, January 27, 2014 2:59:02 PM
>> Subject: Re: [Users] two node ovirt cluster with HA
>>
>> Adding Eli.
>
> I just want to summarize the requirement as I understand it:
>
> In the case that a host that is running HA VMs and has PM configured is
> turned off manually:
>
> 1) The non-responsive treatment should be modified to check host status via
> the PM agent
> 2) If the host is off, HA VMs will attempt to run on another host ASAP
> 3) The host status should be set to DOWN
> 4) No attempt to restart vdsm (soft fencing) or restart the host (hard
> fencing) will be done
>
> Is the above correct? If so, an RFE on that can be opened.

Spot on, that's exactly what I was trying to say! I'd very much like to see an RFE for that.

/K

>
>>
>>
>>> On 01/27/2014 02:50 PM, Andrew Lau wrote:
>>> Hi,
>>>
>>> I think he was asking what if the power management device reported
>>> that the host was powered off. Then VMs should be brought back up, as
>>> being off would essentially be the same as running a power cycle/reboot?
>>>
>>> Another example I'm seeing is what happens if the whole host loses
>>> power and its power management device then becomes unavailable (i.e.
>>> not reachable); then you're stuck in the case where it requires manual
>>> intervention.
>>>
>>> I would be interested to potentially see something like a timeout on
>>> those problematic VMs (e.g. if nothing was read or written after x amount
>>> of time), then you could consider the host as offline? I guess then
>>> that adds a lot of risk..
>>>
>>>
>>> On Mon, Jan 27, 2014 at 11:43 PM, Tareq Alayan <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> Power management makes use of special *dedicated* hardware in
>>> order to restart hosts independently of the host OS. The engine
>>> connects to a power management device using a *dedicated* network
>>> IP address.
>>> The engine is capable of rebooting hosts that have entered a
>>> non-operational or non-responsive state.
>>> The abilities provided by all power management devices are: check
>>> status, start, stop and recycle (restart)...
>>>
>>> In the case of a non-responsive host: all of the VMs that are
>>> currently running on that host can also become non-responsive.
>>> However, the non-responsive host keeps locking the VM hard disk
>>> for all VMs it is running. Attempting to start a VM on a different
>>> host and assigning the second host write privileges for the virtual
>>> machine hard disk image can cause data corruption.
>>> Rebooting allows the engine to assume that the lock on a VM hard
>>> disk image has been released.
>>> The engine can know for sure that the problematic host has been
>>> rebooted via the power management device and then it can start a
>>> VM from the problematic host on another host without risking data
>>> corruption.
>>> Important note: A virtual machine that has been marked
>>> highly-available cannot be safely started on a different host
>>> without the certainty that doing so will not cause data corruption.
>>>
>>> N-joy,
>>>
>>> --Tareq
>>>
>>>
>>>
>>>
>>> On 01/27/2014 02:05 PM, Dafna Ron wrote:
>>>
>>> I am adding Tareq for the Power Management implementation.
>>>
>>> Dafna
>>>
>>>
>>> On 01/27/2014 11:48 AM, Karli Sjöberg wrote:
>>>
>>> On Mon, 2014-01-27 at 11:11 +0000, Dafna Ron wrote:
>>>
>>> Powering off the host will never trigger vm migration.
>>> As far as the engine is concerned, it just lost connection to the host,
>>> but has no way of telling if the host is down or if a router is down.
>>>
>>> Can't it at least check with power management if the Host status is
>>> down first?
>>>
>>> I mean, if the network is down there will be no response from either PM
>>> or Host. But if PM is up and can tell you that the Host is down, sounds
>>> rather clear cut to me...
>>>
>>> Seems to me the VMs would be restarted sooner if the flow was altered
>>> to first check with PM if it's a network or Host issue, and if Host
>>> issue, immediately restart VMs on another Host, instead of waiting for
>>> a potentially problematic Host to boot up eventually.
>>>
>>> /K
>>>
>>> Since VMs can continue running on the host even if the engine has no
>>> access to it, starting the VMs on the second host can cause split brain
>>> and data corruption.
>>>
>>> The way that the engine knows what's going on is by sending health check
>>> queries to vdsm.
>>> Power management will try to reboot a host when the health checks to
>>> vdsm are not answered.
>>> So... if the engine gets no reply and has no way of rebooting the host,
>>> the host status will be changed to Non-Responsive and the VMs will be
>>> unknown, because the engine has no way of knowing what's happening with
>>> the VMs.
>>> Since a reboot of the host will kill the VMs running on it, this will
>>> never cause any VM migration, but... along with the High-Availability VM
>>> feature, you will be able to have some of the VMs restarted on the
>>> second host after the host reboot (and that is only if Power Management
>>> was confirmed as successful).
>>>
>>> VM migration is only triggered when:
>>> 1. Cluster configuration states that the VM should be migrated in case
>>> of failure
>>> 2. Engine has access to the host, so the failure is on the storage side
>>> and not the host side.
>>> 3. The VMs are not actively writing (although there might be a new RFE
>>> for it).
>>>
>>> Hope this clears things up
>>>
>>> Dafna
>>>
>>>
>>>
>>> On 01/27/2014 10:11 AM, Andrew Lau wrote:
>>>
>>> Hi,
>>>
>>> Have you got power management enabled?
>>>
>>> That's the fencing feature required for the engine to ensure that the
>>> host is actually offline. It won't resume any other VMs to prevent
>>> potential VM corruption (e.g. VM running on multiple hosts).
>>>
>>> Andrew.
>>>
>>> On Jan 27, 2014 5:12 PM, "Jaison peter" <[email protected]> wrote:
>>>
>>> Hi all,
>>>
>>> I was setting up a two node ovirt cluster with ovirt engine on a
>>> separate node. I completed the configuration and tested VM live
>>> migrations without any issues. Then for checking cluster HA I
>>> powered down one host and expected VMs running on that host to be
>>> migrated to the other one. But nothing happened; Engine detected the
>>> host as unreachable and marked it as non-operational, and the VMs that
>>> ran on that host went to 'unknown state'. Is it not possible to set up
>>> a fully HA ovirt cluster with two nodes? Or else is that my
>>> configuration problem? Please advise.
>>>
>>> Thanks & Regards
>>>
>>> Alex
>>>
>>> _______________________________________________
>>> Users mailing list
>>> [email protected]
>>> http://lists.ovirt.org/mailman/listinfo/users
>>>
>>> --
>>> Dafna Ron
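
For anyone reading this thread later, the proposed change to the non-responsive treatment (check the PM agent first, and skip fencing if the host is confirmed off) could be sketched roughly as below. This is only an illustration in Python, not oVirt engine code: the names PowerStatus, Plan and plan_non_responsive are made up for this example, and the action strings simply restate what was said in the thread.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class PowerStatus(Enum):
    """Hypothetical result of a power management 'check status' query."""
    ON = "on"
    OFF = "off"
    UNREACHABLE = "unreachable"  # the PM device itself did not answer


@dataclass
class Plan:
    """What the engine would do next (illustrative only)."""
    host_status: str
    actions: List[str] = field(default_factory=list)


def plan_non_responsive(pm_configured: bool,
                        pm_status: PowerStatus,
                        ha_vms: List[str]) -> Plan:
    """Decide how to treat a host that stopped answering health checks."""
    if not pm_configured or pm_status is PowerStatus.UNREACHABLE:
        # The host state cannot be confirmed, so restarting its VMs
        # elsewhere would risk split brain and data corruption.
        return Plan("NON_RESPONSIVE", ["wait for manual intervention"])

    if pm_status is PowerStatus.OFF:
        # Proposed behaviour: PM confirms the host is off, so mark it DOWN
        # and restart HA VMs elsewhere without any soft or hard fencing.
        return Plan("DOWN", [f"start {vm} on another host" for vm in ha_vms])

    # PM reports the host as powered on: keep the fencing flow described
    # in the thread (restart vdsm first, then restart the host).
    return Plan("NON_RESPONSIVE", [
        "restart vdsm (soft fencing)",
        "restart host via PM (hard fencing) if vdsm still does not answer",
    ])
```

For example, plan_non_responsive(True, PowerStatus.OFF, ["vm1"]) gives a plan with host status DOWN and a single action to start vm1 on another host. The case Andrew raised, where the host and its PM device lose power together, falls into the first branch and still needs manual intervention.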

