On Tue, 2009-12-15 at 17:37 -0700, hj lee wrote:
> Hi,
> 
> Thanks for the response. The default token that comes with
> openais-0.80.5 is 5000, which seems too big! Here are the default parameters
> that come with openais-0.80.5, downloaded from
> http://download.opensuse.org/repositories/server:/ha-clustering/CentOS_5/i386/.
> I want to detect a node power failure within 500 msec. A small percentage of
> detection errors is OK. Do you think it's possible to detect it within 500 msec?
> 

500 msec should work but depends on your environment.  More details
below.

The timers fire close to the values specified (500 msec plus the tick
time of your kernel).  Most kernels have a tick time of 10 msec or less
(this is set by the kernel's HZ value).  I am not sure about tickless
kernels.
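
As a rough illustration of that arithmetic, here is a minimal Python
sketch.  The function name is made up for this example, and the model
(token timeout plus one kernel tick) is only the simplification described
above, not a statement of how the timers are implemented.

```python
def detection_bound_ms(token_ms, hz):
    """Approximate upper bound on failure detection time.

    Hypothetical helper: assumes the timer fires at most one kernel
    tick (1000 / HZ milliseconds) after the configured token timeout.
    """
    tick_ms = 1000.0 / hz
    return token_ms + tick_ms


if __name__ == "__main__":
    # Common HZ values; a 500 msec token timeout is the case discussed here.
    for hz in (100, 250, 1000):
        print("HZ=%d -> about %.1f msec" % (hz, detection_bound_ms(500, hz)))
```

With HZ=100 the tick is 10 msec, so a 500 msec token gives roughly 510
msec under this model.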

I regularly run corosync with a 200 msec token timeout.  It works very
well as long as corosync is scheduled regularly.  Realtime processes and
kernel misbehavior (such as the kernel spinning in a loop with a spinlock
held) can prevent corosync from being scheduled in a timely fashion, so
it all depends on your deployment environment.
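
One rough way to get a feel for scheduling delays on a given machine is
to measure how late a short sleep wakes up.  The sketch below is only an
assumption-laden probe: the function name is hypothetical, and it
measures an ordinary Python process, not corosync itself, so treat the
numbers as a hint rather than a guarantee.

```python
import time


def wakeup_overshoot_ms(interval_ms=10.0, samples=50):
    """Sleep repeatedly and record how late each wakeup was, in msec."""
    overshoots = []
    for _ in range(samples):
        start = time.monotonic()
        time.sleep(interval_ms / 1000.0)
        elapsed_ms = (time.monotonic() - start) * 1000.0
        # Clamp at zero so clock granularity cannot yield a negative value.
        overshoots.append(max(0.0, elapsed_ms - interval_ms))
    return overshoots


if __name__ == "__main__":
    results = wakeup_overshoot_ms()
    print("worst wakeup overshoot: %.2f msec" % max(results))
```

If the worst overshoot seen under realistic load is a large fraction of
the token timeout you want, a very low timeout is likely to cause false
failure detections.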

The kernel option CONFIG_PREEMPT helps, since it lets corosync preempt
other long-running processes in the system.
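
To check whether a running kernel was built with that option, something
like the following Python sketch may help.  It assumes the distribution
installs the kernel configuration as /boot/config-<release>, which is a
common convention but not universal; the helper names are hypothetical.

```python
import platform


def kernel_option_enabled(config_text, option):
    """Return True if `option` is set to y in kernel .config-style text."""
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as comments: "# CONFIG_X is not set".
        if line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name == option:
            return value == "y"
    return False


def read_boot_config():
    """Read /boot/config-<release> (assumed location), or return None."""
    path = "/boot/config-" + platform.release()
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return None


if __name__ == "__main__":
    text = read_boot_config()
    if text is None:
        print("kernel config not found at the assumed location")
    else:
        print("CONFIG_PREEMPT:", kernel_option_enabled(text, "CONFIG_PREEMPT"))
```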

Regards
-steve
> Thanks
> hj
> 
> ----------- default parameters that come with openais-0.80.5 --------------
>         # How long before declaring a token lost (ms)
>         token:          5000
> 
>         # How many token retransmits before forming a new configuration
>         token_retransmits_before_loss_const: 10
> 
>         # How long to wait for join messages in the membership protocol (ms)
>         join:           1000
> 
>         # How long to wait for consensus to be achieved before starting
>         # a new round of membership configuration (ms)
>         consensus:      2500
> 
>         # Turn off the virtual synchrony filter
>         vsftype:        none
> 
>         # Number of messages that may be sent by one processor on
>         # receipt of the token
>         max_messages:   20
> 
>         # Stagger sending the node join messages by 1..send_join ms
>         send_join: 45
> 
> 
> 
> On Tue, Dec 15, 2009 at 3:37 PM, Steven Dake <[email protected]> wrote:
>         
>         On Tue, 2009-12-15 at 14:49 -0700, hj lee wrote:
>         > Hi,
>         >
>         > I have a simple two-node cluster with pacemaker-1.0.5 and
>         > openais-0.80.5. If I pull the power cable on one of the two
>         > nodes, how long does the other node take to detect that the
>         > node went away? With the default parameters that come with
>         > openais-0.80.5, it takes 5 sec from power-off to promote (I have
>         > a simple multi-state clone running). I want to shorten this time
>         > as much as I can. What is the best way to do that? Which
>         > parameters can I adjust to reduce this time?
>         >
>         
>         
>         The "token" parameter controls how long it takes to detect a
>         failure within the network.  It should default to 1000msec
>         (1 second) but can be run at lower values depending on your system.
>         
>         I am not certain why you see 5 seconds for full recovery with
>         pacemaker.  Andrew may be able to address.
>         
>         Regards
>         -steve
>         > Thanks
>         > hj
>         > _______________________________________________
>         > Openais mailing list
>         > [email protected]
>         > https://lists.linux-foundation.org/mailman/listinfo/openais
>         
> 
> 
> 
> -- 
> Dream with longterm vision!
> kerdosa

