Hi,

On Fri, Jan 6, 2012 at 1:10 PM, Christian Parpart <tra...@gentoo.org> wrote:
> Hey all,
>
> I am also about to evaluate whether or not Pacemaker+Corosync is the
> way to go for our infrastructure.
>
> We currently have about 45 physical nodes (plus about 60 more virtual
> containers) with a static, historically grown setup of services.
>
> I am now to restructure this historically grown system into something
> clean and well maintainable, with HA and scalability in mind (there is
> no hurry, we've some time to design it).
>
> So here is what we mainly have or will have:
>
> -> HAproxy (tcp/80, tcp/443, master + (hot) failover)
> -> HTTP frontend server(s) (doing SSL and static files; in case of
>    performance issues -> clone resource)
> -> Varnish (backend accelerator)
> -> HAproxy (load-balancing the backend app)
> -> Rails (app nodes, clones)
> ----------------------------------------------------------------
> - sharded memcache cluster (5 nodes), no failover currently (memcache
>   cannot replicate :( )
> - Redis nodes
> - MySQL (3 nodes: active master, master, slave)
> - Solr (1 master, 2 slaves)
> - resque (many nodes)
> - NFS file storage pool (master/slave DRBD + ext3 currently; we want
>   to use GFS2/OCFS2, however)
>
> Now, I have read a lot about people saying a Pacemaker cluster should
> not exceed 16 nodes, and many others saying this statement is
> bullsh**. While I now lean more towards the latter, I still want to
> know:
>
> Is it still wise to build a single Pacemaker/Corosync-driven cluster
> out of all the services above?
There was a question related to large cluster performance which might be
worth reading [1].

As far as Pacemaker is concerned, it has no (theoretical) upper limit on
the number of resources it can handle [1]; however, the lower part of the
stack (messaging and membership) does have such a limit. With Corosync,
IIRC, it was ~32 nodes (the maximum number used for testing scenarios),
and I haven't seen anyone come forward and say they've achieved more than
32 nodes in a single cluster using Corosync.

Now comes the question: do you really need to have all of the nodes in
the same cluster? Because if the answer is yes, you need to consider the
following:

- Any kind of resource failure is managed by relaying the information to
  the DC, which then makes the decisions and sends out the actions to be
  taken. Roughly translated, this means that for any kind of failure the
  DC will be more heavily loaded than any other node while performing the
  required actions. Consider that this node (in your scenario) will also
  hold resources, which may be affected by the load, so you may want to
  consider not allowing resources to run on the DC (see the first sketch
  below).

- Working with resources (listing, modifying, etc.) from nodes that are
  not the DC will create additional overhead. Not to mention that you
  will need to use something other than the crm shell, as on such a large
  cluster it will be a performance hit, and that is just for listing
  resources (see the second sketch below).

- The network layer will have overhead just for CIB synchronization.
  Even though the process uses diffs, it is still a lot of traffic just
  for this, which in turn affects the timeouts on Corosync. You can tune
  those, but then you need to take into account the increased timeouts on
  resources. So: [more nodes] => [increased traffic] => [tune network
  timeouts] => [tune individual timeouts per resource] => [are you sure?]
  => [y/n]. (A corosync.conf sketch follows below.)

- Now to the above add STONITH (as you should; see the last sketch
  below), and then consider the administrative overhead for the entire
  solution.

The point is that it's not feasible to put everything in one
Pacemaker+Corosync cluster, from many perspectives, even if technically
it could be done. Of course, this is just my point of view on the matter;
I would recommend splitting the full 45 nodes into smaller clusters based
on purpose.

The way services talk to one another is relevant mostly at the network
layer, so knowing you can contact the MySQL servers on an IP address, or
a list of IP addresses, is the same whether MySQL is in a cluster or not.
It's still about A contacting B, more or less; I'm trying to simplify the
view of the matter.

> One question I also have is: when Pacemaker is managing your resources
> and migrates one resource from one host (because this one went down)
> to another, then this service should actually be able to access all
> data on that node, too.
> Which leads to the assumption that you have to install *everything* on
> every node, to actually be able to start anything anywhere (depending
> on where Pacemaker is about to put it and the scores the admin has
> defined).

Yes, if you have 10 services available in a cluster and you allow all
services to be started on any one node, then all nodes must have all of
the 10 services configured and functional. That is why I suggested
splitting the cluster by purpose; this way, on the MySQL nodes you
install and configure what is necessary for MySQL, but you don't have to
do the same on the rest of the nodes.

One other thing: as I see it, you want an N-to-N cluster, with any one
service being able to run on any node and to fail over to any node.
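To make the bullets above a bit more concrete, here are a few sketches.
Node names, addresses and values in them are placeholders; adjust them to
your environment.

On the DC point: there is no option to force which node becomes DC (it is
elected), but you can at least keep one node free of resources, so that
if it does become DC it only does the policy work. A minimal sketch,
assuming a hypothetical node called mgmt-node:

  # keep the node from running any resources at all
  crm node standby mgmt-node

  # or, per resource, forbid a specific resource from running there
  crm configure location l-web-not-on-mgmt p_web -inf: mgmt-node

A node in standby can still be elected DC, which is what you want here:
it does the decision making without also carrying resources.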
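On inspecting resources without going through the crm shell, the
lower-level tools are lighter, for example:

  # one-shot cluster status instead of an interactive session
  crm_mon -1

  # list the configured resources
  crm_resource -L

  # dump only the resources section of the CIB as XML
  cibadmin -Q -o resources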
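On the Corosync timeouts, the tuning happens in the totem section of
corosync.conf. The numbers below only illustrate which knobs are
involved; they are not a recommendation and have to be derived from your
own network and node count:

  totem {
          version: 2
          # how long (ms) to wait for the token before declaring it lost
          token: 5000
          # how many token retransmits are attempted before forming a
          # new configuration
          token_retransmits_before_loss_const: 10
          # how long (ms) to wait for consensus before starting a new
          # membership round (must be at least 1.2 * token)
          consensus: 6000
  }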
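And on STONITH, this is roughly what one IPMI-based fencing device per
node could look like (hypothetical addresses and credentials):

  crm configure primitive st-node01 stonith:external/ipmi \
      params hostname=node01 ipaddr=10.0.0.101 userid=admin \
             passwd=secret interface=lan \
      op monitor interval=60s

  # make sure the device does not run on the node it is meant to fence
  crm configure location l-st-node01 st-node01 -inf: node01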
Back to the N-to-N scenario: consider all of the services that need
coordinated access to data, and now consider that any node in the cluster
could possibly run such a service. That in turn means all the nodes need
access to the same shared data, so you're talking about a GFS2/OCFS2
cluster spanning 45 nodes.

I know I have a knack for stating the obvious, but people often say one
thing and think another; when you reply with what they themselves said,
hearing it from someone else suddenly sheds a different light on the
matter.

Bottom line: split the nodes into clusters that match a common purpose.
There's bound to be more input on the matter; this is just my opinion.

HTH,
Dan

[1] http://oss.clusterlabs.org/pipermail/pacemaker/2012-January/012639.html

> Many thanks for your thoughts on this,
> Christian.

--
Dan Frincu
CCNA, RHCE

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org