token: 45000
token_retransmits_before_loss_const: 45
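
For context, here is a minimal sketch of where these two settings would sit, assuming they go into the existing totem section of corosync.conf and that every other totem option (interface, etc.) stays as you already have it. The "version: 2" line and the comments are only illustrative; the same values should be set on every node.

```
totem {
    version: 2

    # Assumption: keep your existing interface { ... } and other
    # totem options here unchanged.

    # Token timeout in milliseconds (45000 msec = 45 sec).
    token: 45000

    # Number of token retransmits attempted before the token is
    # declared lost (the default is 4).
    token_retransmits_before_loss_const: 45
}
```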

 On Wed, 2010-05-19 at 08:39 +0200, Alain.Moulle wrote:
> Hi Steven,
> In fact, I first posted this question on the Pacemaker ML,
> but there is no way in Pacemaker to increase this time, and
> I think that is normal, as the "cluster manager" part is provided
> by corosync, which manages the heartbeat.
> My concern is to increase this time considerably, even up to values
> such as 45s. This is not a problem for the applications I have to manage,
> but 10s is really a big problem for me in case of a network
> problem that leads to silence on the heartbeat for a while.
> So, based on your experience, which parameters do you
> think I can try increasing to get this 45s timeout?
> Thanks a lot.
> Regards
> Alain
> > On Mon, 2010-05-17 at 08:25 +0200, Alain.Moulle wrote:
> >   
> > > > Hi again,
> > > > 
> > > > I've checked the corosync.conf man page and seen many parameters
> > > > around token timers etc., but I can't see how to increase the heartbeat
> > > > timeout. When testing, the timeout is between 10s and 12s
> > > > before a node decides to fence another one in the cluster (for
> > > > example, when I force an ifdown eth0 on this node to simulate a heartbeat
> > > > failure).
> > > > But I can't see which parameter(s) to tune in corosync.conf to increase
> > > > these 10 or 12s...
> > > > 
> > > > Any tip would be appreciated...
> > > > Thanks
> > > > Alain
> > >     
> > 
> > Alain,
> > 
> > I don't have a direct answer to your question.  Corosync detects a
> > failure of any node in "token" msec.  I have not measured how long
> > qpid/fencing/pacemaker/rgmanager/gfs/ocfs/etc take to operate on this
> > notification.  This delta between failure detection and recovery would
> > be a good question to potentially ask on the pacemaker ml.
> > 
> > In my test environments I run at token = 1000 msec.  Totem can be tuned
> > to lower values, but under a heavy network load, may falsely detect a
> > node failure.
> > 
> > Most products that use Corosync ship with a 10000msec (10sec) or larger
> > token value to minimize the chance of falsely detecting a node failure.
> > 
> > The token timer is just one consideration, however.  The
> > "token_retransmits_before_loss_const" defaults to 4.  This may be too
> > low on lossy or heavily loaded networks.  A higher value for this
> > configuration option produces a bit more load but more resilient behavior.
> > 
> > Regards
> > -steve
> > 
> > 
> >   
> _______________________________________________
> Openais mailing list
> [email protected]
> https://lists.linux-foundation.org/mailman/listinfo/openais
