Hi,

I'm also setting up a highly available Lustre system. I configured HA pairs for the OSSes and MDSes and redundant Corosync rings (two separate rings: IB and Eth), and Stonith is enabled.
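To make the setup described above concrete, here is a minimal sketch of a two-ring totem section for /etc/corosync/corosync.conf, written as a small Python helper so the token timeout is easy to vary. Everything specific in it is an assumption for illustration: the network addresses, multicast addresses, and ports are placeholders, the IB ring is assumed to run over IPoIB, and `render_totem` is a hypothetical helper name, not part of corosync. The directive names (`token`, `rrp_mode`, `ringnumber`, `bindnetaddr`, `mcastaddr`, `mcastport`) are the ones documented in corosync.conf(5) for corosync 1.x/2.x.

```python
# Sketch only: renders a totem section with one interface block per ring.
# All addresses and ports below are placeholders, not values from this thread.

DEFAULT_RINGS = (
    # (bindnetaddr, mcastaddr, mcastport)
    ("10.0.0.0", "226.94.1.1", 5405),     # ring 0: assumed IPoIB network
    ("192.168.0.0", "226.94.1.2", 5407),  # ring 1: assumed Ethernet network
)


def render_totem(token_ms=1000, rings=DEFAULT_RINGS):
    """Return the text of a corosync.conf totem section.

    token_ms is the token timeout in milliseconds; 1000 is the default
    mentioned in this thread.
    """
    lines = [
        "totem {",
        "    version: 2",
        f"    token: {token_ms}",
    ]
    # rrp_mode is only meaningful when more than one ring is configured.
    if len(rings) > 1:
        lines.append("    rrp_mode: passive")
    for ringnumber, (bindnetaddr, mcastaddr, mcastport) in enumerate(rings):
        lines += [
            "    interface {",
            f"        ringnumber: {ringnumber}",
            f"        bindnetaddr: {bindnetaddr}",
            f"        mcastaddr: {mcastaddr}",
            f"        mcastport: {mcastport}",
            "    }",
        ]
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example: a longer token timeout than the 1000 ms default.
    print(render_totem(token_ms=5000))
```

The 5000 ms in the example call is arbitrary. A larger token value makes the cluster slower to declare a node dead, so it trades failover time for tolerance of short stalls.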
The current configuration seems to work fine; however, yesterday we had a problem: 4 OSSes were rebooted by Stonith. I suspect that Corosync missed a heartbeat because of a kernel or corosync hang rather than a network problem. I will try the "renice" solution you proposed.

I have also been thinking that I could increase the "token" timeout value in /etc/corosync/corosync.conf to tolerate short "hiccups". Did you specify a value for this parameter, or did you leave the default of 1000 ms?

Marco

On 2012-10-31 03:43, Hall, Shawn wrote:
> Thanks for the replies.  We've worked on the HA and have it to a
> satisfactory point where we can put it into production.  We broke it
> into an MDS pair and 4 groups of 4 OSS nodes.  From our perspective,
> it's actually easier to manage groups of 4 than groups of 2, since
> it's half as many configurations to keep track of.
>
> After splitting the cluster into 5 pieces it has become much more
> responsive and stable.  It's more difficult to manage than one large
> cluster, but the stability is obviously worth it.  We've been
> performing heavy load testing and have not been able to "break" the
> cluster.  We did a few more things to get to this point:
>
> - Lowered the nice value of the corosync process to make it more
>   responsive under load and prevent a node from getting kicked out
>   due to unresponsiveness.
> - Increased vm.min_free_kbytes to give TCP/IP w/ jumbo frames room to
>   move around.  Without this, certain nodes would have low memory
>   issues related to networking and would get stonithed due to
>   unresponsiveness.
>
> Thanks,
> Shawn
>
> -----Original Message-----
> From: Charles Taylor [mailto:tay...@hpc.ufl.edu]
> Sent: Wednesday, October 24, 2012 3:33 PM
> To: Hall, Shawn
> Cc: lustre-discuss@lists.lustre.org
> Subject: Re: [Lustre-discuss] Large Corosync/Pacemaker clusters
>
> FWIW, we are running HA Lustre using corosync/pacemaker.  We broke our
> OSSs and MDSs out into individual HA *pairs*.  Thought about other
> configurations but it was our first step into corosync/pacemaker so we
> decided to keep it as simple as possible.  Seems to work well.  I'm
> not sure I would attempt what you are doing, though it may be
> perfectly fine.  When HA is a requirement, it probably makes sense to
> avoid pushing the limits of what works.
>
> Doesn't really help you much other than to provide a data point with
> regard to what other sites are doing.
>
> Good luck and report back.
>
> Charlie Taylor
> UF HPC Center
>
> On Oct 19, 2012, at 12:52 PM, Hall, Shawn wrote:
>
>> Hi,
>>
>> We're setting up fairly large Lustre 2.1.2 filesystems, each with 18
>> nodes and 159 resources all in one Corosync/Pacemaker cluster as
>> suggested by our vendor.  We're getting mixed messages from our
>> vendor and others on how large a Corosync/Pacemaker cluster will
>> work well.
>>
>> 1. Are there Lustre Corosync/Pacemaker clusters out there of this
>>    size or larger?
>> 2. If so, what tuning needed to be done to get it to work well?
>> 3. Should we be looking more seriously into splitting this
>>    Corosync/Pacemaker cluster into pairs or sets of 4 nodes?
>>
>> Right now, our current configuration takes a long time to start/stop
>> all resources (~30-45 mins), and failing back OSTs puts a heavy load
>> on the cib process on every node in the cluster.  Under heavy IO
>> load, many of the nodes will show as "unclean/offline" and many OST
>> resources will show as inactive in crm status, despite the fact that
>> every single MDT and OST is still mounted in the appropriate place.
>> We are running 2 corosync rings, each on a private 1 GbE network.
>> We have a bonded 10 GbE network for the LNET.
>>
>> Thanks,
>> Shawn
>> _______________________________________________
>> Lustre-discuss mailing list
>> Lustre-discuss@lists.lustre.org
>> http://lists.lustre.org/mailman/listinfo/lustre-discuss