On Wed, Jun 20, 2012 at 9:38 PM, Luca Lesinigo <[email protected]> wrote:
> Hello list.
>
> I'm a happy user of pacemaker-1.0 + corosync + DRBD with the usual 
> active/passive dual node setup.
> It doesn't do live migration (we also run drbd master/slave and not dual 
> master) and gets funny under heavy I/O load (VM check script unresponsive, 
> the cluster thinks a node has problems, starts to migrate things around, etc 
> etc. - could be entirely my config's fault but that's another story), but it 
> has served me well for some years now. We were 'early adopters' of 
> pacemaker on Gentoo Linux back when it wasn't even included in the distro (we 
> did collaborate with a Gentoo dev who was putting together what would become 
> the current ebuilds for the cluster stack).
>
> Now we're thinking of upgrading to true external shared storage and our target 
> would be something like the Dell MD3220 array, basically it's a shared SAS 
> unit and it presents LUNs to all (up to 4) attached servers[*]. That also 
> means all LUNs are always available for concurrent read&write to all nodes.
>
> We are also targeting Xen as hypervisor (because that's what we're already 
> using) and Ubuntu 12.04 LTS as the server's operating system / Domain-0 
> (because we're already familiar with it and because of its 5 year support). 
> Ideally we won't have any other physical server to manage the cluster "from 
> the outside" so a general-purpose operating system on the nodes is a must (as 
> opposed to things like vSphere or maybe XenCluster, but I don't know the 
> latter really well).
>
> I would implement node fencing using the lights-out IPMI management, it's on 
> a separate network and every node in the cluster has access to every IPMI 
> board of the other nodes.
>
> We'd like to have a rock solid system and live migration of virtual machines 
> is a must.
>
> In the past I knew pacemaker wasn't able to live migrate resources but some 
> research suggests that it is now possible.
> So I'm reaching out to this list to ask:
> - if I can get pacemaker-1.1.6 to live migrate Xen VMs

yes

> - if anyone already has experience in doing that over shared SAS 
> infrastructure and how it works in production

I've seen people do it with DRBD; if that can handle it, then a shared
SAS unit should too.

> - I assume user-commanded live migration (with all nodes up and running) 
> shouldn't pose any problem

not a good idea, at least not if you're talking about Xen commands.
that's basically an internal split-brain condition...
- the admin (through various constraints) has told the cluster to
place the VM on nodeX.
- the admin (via a CLI) has also told the VM to move to nodeY

Both cannot be true :-)
The solution here is to tell the cluster to move the VM to nodeY,
which it will do via migration if possible.
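
As a rough illustration of "tell the cluster" rather than Xen, here is a
minimal Python sketch that builds a crm_resource call. The resource name
"vm1" and the wrapper functions are hypothetical, and the option spellings
(--resource, --move, --node) are an assumption that varies between
Pacemaker releases, so check `crm_resource --help` on your version. For the
move to happen as a live migration, the VM resource is assumed to have the
allow-migrate meta attribute set to true.

```python
import subprocess


def build_move_command(resource, target_node):
    """Return the argv asking Pacemaker to move `resource` to `target_node`.

    Option names are an assumption; older releases spell some of them
    differently.
    """
    return ["crm_resource", "--resource", resource,
            "--move", "--node", target_node]


def move_resource(resource, target_node, dry_run=True):
    """Print-friendly dry run by default; run the command when dry_run=False."""
    argv = build_move_command(resource, target_node)
    if dry_run:
        # Only show what would be executed.
        return " ".join(argv)
    # Returns the exit status of crm_resource.
    return subprocess.call(argv)


if __name__ == "__main__":
    # "vm1" and "nodeY" are placeholder names.
    print(move_resource("vm1", "nodeY"))
```

A move like this works by adding a location constraint, so that constraint
should be cleared once the VM no longer needs to be pinned to nodeY.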

> - also a failed-node migration (not live, of course) should work ok

yep, just a normal failure.

> - what could happen if all SAS links between a single node and the storage 
> stop working?
> (ie, storage array working, storage management IP responding, node working, 
> but node can't access actual LUNs)

what you can do is have a pretend resource (with dummy start/stop
actions) that monitors the SAS links and updates a node attribute
(pick whatever name you like).
you can then have a constraint that restricts the VM to nodes which
have that particular attribute set.
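
The monitor half of that idea could look roughly like the Python sketch
below. It is only an illustration under stated assumptions: the attribute
name "sas_ok", the device paths, and the read-one-block probe are all
hypothetical, and a real resource agent would wrap this in the usual OCF
start/stop/monitor actions. The attribute is published with attrd_updater
using its long --name and --update options.

```python
import os
import subprocess

ATTR_NAME = "sas_ok"  # hypothetical attribute name, pick whatever you like


def lun_readable(device_path):
    """Return True if the device can be opened and read from.

    Caveat: a read may be served from the page cache, so a production
    check would need a stronger probe (e.g. direct I/O or multipath state).
    """
    try:
        fd = os.open(device_path, os.O_RDONLY)
    except OSError:
        return False
    try:
        os.read(fd, 512)
        return True
    except OSError:
        return False
    finally:
        os.close(fd)


def attribute_value(device_paths, probe=lun_readable):
    """Return "1" only if there is at least one LUN and all are reachable."""
    if not device_paths:
        return "0"
    return "1" if all(probe(p) for p in device_paths) else "0"


def build_attrd_command(value):
    """Return the argv that publishes the node attribute on the local node."""
    return ["attrd_updater", "--name", ATTR_NAME, "--update", value]


def report(device_paths):
    """Probe the LUNs and push the result to the cluster."""
    return subprocess.call(build_attrd_command(attribute_value(device_paths)))
```

The matching constraint would then be a location rule on the VM with a
score of -INFINITY wherever sas_ok is not defined or is 0, the same pattern
commonly used with the ping resource agent's attribute.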

whether we can live migrate in that situation is unclear and might need testing.

>
> Thank you for any help and any experience you have to report, it will be 
> really appreciated.
>
> [*] we'd like to use shared SAS instead of iSCSI because it's simpler and 
> should give better performance, given that we know we won't ever grow over 4 
> nodes for a single storage array. I'm doing some research on both the SAS vs 
> iSCSI side and on the software stack side (it's actually this email) before 
> getting to the final choice between the two.
>
> --
> Luca Lesinigo
> _______________________________________________
> Pacemaker mailing list: [email protected]
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

