Hi Steve,

We tried to set up a large-scale corosync cluster, but found that when the node count exceeds 32, the cluster is unstable: a node joining can break the existing cluster and force a re-configuration. In my recent test, the break/re-configuration happened when nodes 46, 51, 59, 60, and 62 tried to join the cluster. Sometimes "service corosync stop" also drives the node to 100% CPU usage, from which it cannot recover; I have to restart the node.
BTW, I am curious about the token loss timeout setting. My understanding is that the token loss timeout exists to detect a lost node or a network partition in the ring. But if the Totem protocol depends on passing a token around the ring, shouldn't the scale of the cluster be relevant to the choice of token loss timeout, and so affect the failure detection time? (See the totem sketch after the quoted thread below.) I will do more investigation and testing to see how corosync scales to a 64-node cluster.

Thanks
Javen

2010/1/13 Steven Dake <[email protected]>
> Untested at this time.
>
> Feel free to try and report your experiences.
>
> I have tested 48 nodes on physical hardware and things work quite well
> with a 1 sec token timeout and 5 second consensus timeout.
>
> Regards
> -steve
>
> On Tue, 2010-01-12 at 12:10 +0800, Javen Wu wrote:
> > Hi Folks,
> >
> > I just realized that Corosync has a limitation of 32 nodes as a
> > maximum.
> > Is it possible to extend the limitation to support 64 nodes? Any
> > technical barrier?
> >
> > thanks
> > --
> > Javen Wu
> > _______________________________________________
> > Openais mailing list
> > [email protected]
> > https://lists.linux-foundation.org/mailman/listinfo/openais

--
Javen Wu
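For concreteness, a minimal totem stanza matching the timeouts Steve mentions might look like the sketch below. Values are in milliseconds, the interface addresses are placeholders for illustration only, and the exact semantics of each directive are described in corosync.conf(5):

    totem {
            version: 2
            # Declare token loss after 1 second without seeing the token
            token: 1000
            # Allow 5 seconds to reach consensus on a new membership
            consensus: 5000
            interface {
                    ringnumber: 0
                    # Placeholder addresses; adjust for your network
                    bindnetaddr: 192.168.1.0
                    mcastaddr: 226.94.1.1
                    mcastport: 5405
            }
    }

My question above, restated against this config: since the token's rotation time around the ring grows with the number of nodes, presumably a fixed token value that works at 32 nodes may need to be raised for 64, at the cost of slower failure detection.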
