On Wed, Jun 20, 2012 at 9:38 PM, Luca Lesinigo <[email protected]> wrote:
> Hello list.
>
> I'm a happy user of pacemaker-1.0 + corosync + DRBD with the usual
> active/passive dual-node setup.
> It doesn't do live migration (we also run DRBD master/slave and not dual
> master) and gets funny under heavy I/O load (VM check script unresponsive,
> the cluster thinks a node has problems, starts to migrate things around,
> etc. - could be entirely my config's fault, but that's another story), but
> it has been serving me well for some years now. We were 'early adopters'
> of pacemaker on Gentoo Linux back when it wasn't even included in the
> distro (we collaborated with a Gentoo dev who was putting together what
> would become the current ebuilds for the cluster stack).
>
> Now we're thinking of upgrading to true external shared storage, and our
> target would be something like the Dell MD3220 array; basically it's a
> shared SAS unit that presents LUNs to all (up to 4) attached servers[*].
> That also means all LUNs are always available for concurrent read & write
> on all nodes.
>
> We are also targeting Xen as hypervisor (because that's what we're already
> using) and Ubuntu 12.04 LTS as the servers' operating system / Domain-0
> (because we're already familiar with it and because of its 5-year
> support). Ideally we won't have any other physical server to manage the
> cluster "from the outside", so a general-purpose operating system on the
> nodes is a must (as opposed to things like vSphere or maybe XenCluster,
> but I don't know the latter really well).
>
> I would implement node fencing using the lights-out IPMI management; it's
> on a separate network and every node in the cluster has access to every
> IPMI board of the other nodes.
>
> We'd like to have a rock-solid system, and live migration of virtual
> machines is a must.
>
> In the past I knew pacemaker wasn't able to live migrate resources, but
> some research suggests that it is now possible.
> So I'm reaching out to this list to ask:
> - if I can get pacemaker-1.1.6 to live migrate Xen VMs
Yes.

> - if anyone already has experience in doing that over shared SAS
>   infrastructure and how it works in production

I've seen people doing it with DRBD; if that can handle it, then a SAS
unit should too.

> - I assume user-commanded live migration (with all nodes up and running)
>   shouldn't pose any problem

Not a good idea, at least not if you're talking about xen commands.
That's basically an internal split-brain condition...
- the admin (through various constraints) has told the cluster to place
  the VM on nodeX.
- the admin (via a CLI) has also told the VM to move to nodeY.
Both cannot be true :-)

The solution here is to tell the cluster to move the VM to nodeY, which
it will do via migration if possible.

> - also a failed-node migration (not live, of course) should work ok

Yep, just a normal failure.

> - what could happen if all SAS links between a single node and the
>   storage stop working?
>   (ie, storage array working, storage management IP responding, node
>   working, but node can't access actual LUNs)

What you can do is have a pretend resource (with dummy start/stop
actions) that monitors the SAS links and updates a node attribute (pick
whatever name you like). You can then have a constraint that restricts
the VM to nodes which have that particular attribute set. Whether we can
live migrate in that situation is unclear and might need testing.

> Thank you for any help and any experience you have to report, it will be
> really appreciated.
>
> [*] we'd like to use shared SAS instead of iSCSI because it's simpler
> and should give better performance, given that we know we won't ever
> grow over 4 nodes for a single storage array. I'm doing some research on
> both the SAS vs iSCSI side and on the software stack side (it's actually
> this email) before getting to the final choice between the two.
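For what it's worth, the attribute-based approach could look roughly like
the sketch below. This is only a rough illustration, not a tested config:
the resource name "vm1", the attribute name "sas-ok", and the xmfile path
are all made up, and it assumes the crm shell on a pacemaker 1.1.x stack.

```shell
# Hypothetical sketch only - "vm1", "sas-ok" and the xmfile path are
# invented names; adapt to your setup and test before relying on it.
#
# Xen VM resource with live migration enabled (crm configure syntax):
#
#   primitive vm1 ocf:heartbeat:Xen \
#       params xmfile="/etc/xen/vm1.cfg" \
#       meta allow-migrate="true" \
#       op monitor interval="30s"
#
# Location rule keeping the VM off any node whose SAS attribute is
# missing or zero:
#
#   location vm1-needs-sas vm1 \
#       rule -inf: not_defined sas-ok or sas-ok eq 0

# Your SAS-monitoring "pretend resource" would then set the attribute on
# the local node from its monitor action:
attrd_updater -n sas-ok -v 1    # SAS links healthy on this node
attrd_updater -n sas-ok -v 0    # SAS links down; the cluster moves vm1 away
```

The -inf rule means a node that loses its SAS paths (or never reported
the attribute) simply becomes ineligible, and the cluster relocates the
VM. As said above, whether that relocation can be a live migration when
the source node has already lost storage access would need testing.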
>
> --
> Luca Lesinigo

_______________________________________________
Pacemaker mailing list: [email protected]
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
