token: 45000
token_retransmits_before_loss_const: 45

On Wed, 2010-05-19 at 08:39 +0200, Alain.Moulle wrote:
> Hi Steven
> In fact, I first posted this question on the Pacemaker ML,
> but there is no way in Pacemaker to increase this time, and
> I think that is normal, as the "cluster manager" part is provided
> by corosync, which manages the heartbeat.
> My concern is to increase this time substantially, even up to values
> such as 45s. This is not a problem for the applications I have to manage,
> but 10s is really a big problem for me in the case of a network
> problem which leads to silence on the heartbeat for a while.
> So, based on your experience, which parameters do you
> think I can try to increase to get this 45s timeout?
> Thanks a lot.
> Regards
> Alain
>
> On Mon, 2010-05-17 at 08:25 +0200, Alain.Moulle wrote:
> >
> > > > Hi again,
> > > >
> > > > I've checked the corosync.conf man page and seen many parameters
> > > > around token timers etc., but I can't see how to increase the heartbeat
> > > > timeout. When testing, it turns out that the timeout is between 10s and 12s
> > > > before a node decides to fence another one in the cluster (when, for
> > > > example, I force an ifdown eth0 on this node to simulate a heartbeat
> > > > failure).
> > > > But I can't see which parameter(s) to tune in corosync.conf to increase
> > > > these 10 or 12s ...
> > > >
> > > > Any tip would be appreciated...
> > > > Thanks
> > > > Alain
> > > >
>
> > Alain,
> >
> > I don't have a direct answer to your question. Corosync detects a
> > failure of any node in "token" msec. I have not measured how long
> > qpid/fencing/pacemaker/rgmanager/gfs/ocfs/etc take to operate on this
> > notification. This delta between failure detection and recovery would
> > be a good question to potentially ask on the pacemaker ml.
> >
> > In my test environments I run at token = 1000 msec. Totem can be tuned
> > to lower values, but under a heavy network load, may falsely detect a
> > node failure.
> >
> > Most products that use Corosync ship with a 10000 msec (10 sec) or larger
> > token value to offer the least chance of false node failure detection.
> >
> > The token timer is just one consideration, however. The
> > "token_retransmits_before_loss_const" defaults to 4. This may be too
> > low in lossy or heavily loaded networks. A higher value for this
> > configuration option produces a bit more load but more resilient behavior.
> >
> > Regards
> > -steve
> >
> >
> >
> _______________________________________________
> Openais mailing list
> [email protected]
> https://lists.linux-foundation.org/mailman/listinfo/openais
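
For reference, here is a minimal sketch of where the two settings suggested at the top of this message would sit in corosync.conf, assuming the standard totem section. The file path and the "version: 2" line are the usual defaults rather than something stated in this thread, and any totem settings already in the file (interface and so on) are assumed to stay as they are. The total time before fencing also depends on the layers above Corosync, as noted earlier in the thread, so treat the values as a starting point to test rather than a guaranteed 45s.

```
# /etc/corosync/corosync.conf (fragment; path assumed to be the usual default)
totem {
    version: 2

    # Time in milliseconds without receiving a token before a node
    # failure is declared (45000 msec = 45 sec).
    token: 45000

    # Number of token retransmit attempts before the token is
    # considered lost (the default is 4, per the thread above).
    token_retransmits_before_loss_const: 45

    # ... existing interface and other totem settings unchanged ...
}
```

The same values should be set on every node in the cluster, and corosync needs to be restarted for them to take effect.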
