----- Original Message -----
> From: "Perry Myers" <pmy...@redhat.com>
> To: "xrx" <xrx-ov...@xrx.me>, "Ryan O'Hara" <roh...@redhat.com>, "Andrew Beekhof" <abeek...@redhat.com>
> Cc: users@ovirt.org
> Sent: Saturday, March 3, 2012 3:16:02 PM
> Subject: Re: [Users] oVirt/RHEV fencing; a single point of failure
>
> On 03/03/2012 11:52 AM, xrx wrote:
> > Hello,
> >
> > I was worried about the high availability approach taken by
> > RHEV/oVirt. I had read the thread titled "Some thoughts on
> > enhancing High Availability in oVirt" but couldn't help but feel
> > that oVirt is missing basic HA while its developers are considering
> > adding (and in my opinion unneeded) complexity with service
> > monitoring.
>
> Service monitoring is a highly desirable feature, but for the most
> part (today) people achieve it by running service monitoring in a
> layered fashion.
>
> For example, running the RHEL HA cluster stack on top of VMs on RHEV
> (or Fedora Clustering on top of oVirt VMs).
>
> So we could certainly skip providing service HA as an integral
> feature of oVirt and continue to leverage Pacemaker-style service HA
> as a layered option instead.
>
> In the past I've gotten the impression that tighter integration and a
> single UI/API for managing both VM and service HA was desirable.
>
> > It all comes down to fencing. Picture this: 3 HP hypervisors
> > running RHEV/oVirt with iLO fencing. Say hypervisor A runs 10 VMs,
> > all of which are set to be highly available. Now suppose that
> > hypervisor A has a power failure or an iLO failure (I've seen it
> > happen more than once with a batch of HP DL380 G6s). Because RHEV
> > would not be able to fence the hypervisor, as its iLO is
> > unresponsive, those 10 HA VMs that were halted are NOT moved to
> > other hypervisors automatically.
> >
> > I suggest that oVirt concentrate on supporting multiple fencing
> > devices as a development priority. SCSI persistent reservation
> > based fencing would be an ideal secondary, if not primary, fencing
> > method; it would be easy for users to set up, since SANs generally
> > support it, and it is proven to work well, as seen in Red Hat
> > clusters.
>
> Completely agree here. The Pacemaker/rgmanager cluster stacks already
> support an arbitrary number of fence devices per host, to cover both
> redundant power supplies and redundant fencing devices. In order to
> provide resilient service HA, fixing this would be a prerequisite
> anyhow. I've cc'd Andrew Beekhof from the Pacemaker/stonith_ng side,
> since I think it might be useful to model the fencing for oVirt
> similarly to how Pacemaker/stonith_ng does it. Perhaps there's even
> some code that could be reused for this as well.
>
> As for SCSI-3 PR based fencing... the trouble here has been that the
> fence_scsi script provided in fence-agents is Perl based, and we were
> hesitant to drag Perl into the list of required things on oVirt Node
> (and in general).
>
> On the other hand, fence_scsi might not be the right level of
> granularity for oVirt's SCSI-3 PR based fencing anyhow. Perhaps it
> would be better to just have vdsm call the sg_persist commands
> directly.
>
> I've cc'd Ryan O'Hara, who wrote fence_scsi and knows a fair bit
> about SCSI-3 PR. If oVirt is interested in pursuing this, perhaps he
> can be of assistance.
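For what it's worth, having vdsm drive sg_persist directly shouldn't
need much code. A rough sketch of what I mean is below -- not actual
vdsm code, the device path and key handling are made up, and real code
would need proper error handling and key bookkeeping -- but it shows
the register / preempt-and-abort pattern that fence_scsi implements in
Perl, done with plain sg_persist calls from Python:

    # Sketch only: SCSI-3 PR "fencing" by preempting the victim's key
    # on a shared LUN, using sg_persist instead of the Perl fence_scsi
    # agent. DEVICE and the key scheme are hypothetical.
    import subprocess

    DEVICE = "/dev/mapper/fence-lun"   # hypothetical shared LUN
    PROUT_TYPE = "5"                   # write exclusive, registrants only

    def _sg_persist(*args):
        """Run sg_persist against DEVICE and raise on failure."""
        cmd = ["sg_persist"] + list(args) + [DEVICE]
        return subprocess.check_output(cmd)

    def register(my_key):
        """Register this host's key with the LUN (once, at activation)."""
        _sg_persist("--out", "--register", "--param-sark=%s" % my_key)

    def read_keys():
        """Return the raw sg_persist listing of registered keys."""
        return _sg_persist("--in", "--read-keys")

    def fence(my_key, victim_key):
        """Preempt-and-abort the victim's registration. Once its key is
        removed the victim can no longer write to the LUN, which is the
        actual 'fence' in SCSI-3 PR terms."""
        _sg_persist("--out", "--preempt-abort",
                    "--param-rk=%s" % my_key,
                    "--param-sark=%s" % victim_key,
                    "--prout-type=%s" % PROUT_TYPE)

The nice part is that this needs nothing on the host beyond sg3_utils,
so it avoids the Perl dependency question entirely.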
There's also sanlock, which plays a role here. In the past we required
some form of fencing action, but once sanlock is integrated it provides
another path.

> > I have brought up this point about fencing being a single point of
> > failure in RHEV with a Red Hat employee (Mark Wagner) during the
> > RHEV virtual event; but he said that it is not. I don't see how it
> > isn't; one single loose iLO cable and the VMs are stuck until there
> > is manual intervention.
>
> Agreed. This is something that should be easily fixed in order to
> provide greater HA.
>
> That being said, I still think more tightly integrated service HA is
> a good idea as well.
>
> Perry

_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users