Re: [Openais] Question regarding cluster fail-over time and GATHER states

Herwin Kleinjan Mon, 15 Mar 2010 01:39:45 -0700

> -----Original Message-----
> From: Steven Dake [mailto:[email protected]]
> Sent: Friday, 12 March 2010 17:19
> To: Herwin Kleinjan
> Cc: [email protected]
> Subject: Re: [Openais] Question regarding cluster fail-over time and GATHER states
> 
> On Fri, 2010-03-12 at 09:44 +0100, Herwin Kleinjan wrote:
> > Hello,
> >
> > Currently we are looking into possibilities to speed up the fail-over
> > process on our dual node cluster. This is a RHEL 5.4 cluster running on
> > HP ProLiant servers with iLO based power fencing. For shared storage we
> > use a fiber based storage array.
> >
> > There are some parts where fail-over time might be improved, one of them
> > relating to openais or its configuration. During testing whenever one
> > node is failing or its power is disconnected, the other node detects this
> > and the fail-over process is started:
> >
> > Mar 12 08:46:02 donald01 openais[5301]: [TOTEM] The token was lost in the OPERATIONAL state.
> > Mar 12 08:46:02 donald01 openais[5301]: [TOTEM] Receive multicast socket recv buffer size (288000 bytes).
> > Mar 12 08:46:02 donald01 openais[5301]: [TOTEM] Transmit multicast socket send buffer size (288000 bytes).
> > Mar 12 08:46:02 donald01 openais[5301]: [TOTEM] entering GATHER state from 2.
> > Mar 12 08:46:07 donald01 openais[5301]: [TOTEM] entering GATHER state from 0.
> > Mar 12 08:46:07 donald01 openais[5301]: [TOTEM] Creating commit token because I am the rep.
> > Mar 12 08:46:07 donald01 openais[5301]: [TOTEM] Saving state aru 44 high seq received 44
> > Mar 12 08:46:07 donald01 openais[5301]: [TOTEM] Storing new sequence id for ring 74
> > Mar 12 08:46:07 donald01 openais[5301]: [TOTEM] entering COMMIT state.
> > Mar 12 08:46:07 donald01 openais[5301]: [TOTEM] entering RECOVERY state.
> > Mar 12 08:46:07 donald01 openais[5301]: [TOTEM] position [0] member 10.227.180.101:
> > Mar 12 08:46:07 donald01 openais[5301]: [TOTEM] previous ring seq 112 rep 10.227.180.101
> > Mar 12 08:46:07 donald01 openais[5301]: [TOTEM] aru 44 high delivered 44 received flag 1
> > Mar 12 08:46:07 donald01 openais[5301]: [TOTEM] Did not need to originate any messages in recovery.
> > Mar 12 08:46:07 donald01 openais[5301]: [TOTEM] Sending initial ORF token
> > Mar 12 08:46:07 donald01 openais[5301]: [CLM  ] CLM CONFIGURATION CHANGE
> > Mar 12 08:46:07 donald01 openais[5301]: [CLM  ] New Configuration:
> > Mar 12 08:46:07 donald01 openais[5301]: [CLM  ]         r(0) ip(10.227.180.101)
> > Mar 12 08:46:07 donald01 openais[5301]: [CLM  ] Members Left:
> > Mar 12 08:46:07 donald01 kernel: dlm: closing connection to node 2
> > Mar 12 08:46:07 donald01 openais[5301]: [CLM  ]         r(0) ip(10.227.180.102)
> > Mar 12 08:46:07 donald01 clurgmgrd[7525]: <info> State change: donald02 DOWN
> >
> > Mar 12 08:46:07 donald01 openais[5301]: [CLM  ] Members Joined:
> > Mar 12 08:46:07 donald01 openais[5301]: [CLM  ] CLM CONFIGURATION CHANGE
> > Mar 12 08:46:07 donald01 openais[5301]: [CLM  ] New Configuration:
> > Mar 12 08:46:07 donald01 openais[5301]: [CLM  ]         r(0) ip(10.227.180.101)
> > Mar 12 08:46:07 donald01 openais[5301]: [CLM  ] Members Left:
> > Mar 12 08:46:07 donald01 openais[5301]: [CLM  ] Members Joined:
> > Mar 12 08:46:07 donald01 openais[5301]: [SYNC ] This node is within the primary component and will provide service.
> > Mar 12 08:46:07 donald01 openais[5301]: [TOTEM] entering OPERATIONAL state.
> > Mar 12 08:46:07 donald01 openais[5301]: [CLM  ] got nodejoin message 10.227.180.101
> > Mar 12 08:46:07 donald01 openais[5301]: [CPG  ] got joinlist message from node 1
> >
> >
> > As you can see from the above /var/log/messages excerpt there is a 5
> > second time frame at the beginning in which apparently nothing is
> > happening (08:46:02-08:46:07). I was wondering how I could reduce or
> > remove this delay so that the fail-over process will be done more quickly.
> >
> > My current /etc/ais/openais.conf is still the installed default:
> > totem {
> >     version: 2
> >     secauth: off
> >     threads: 0
> >     interface {
> >             ringnumber: 0
> >             bindnetaddr: 192.168.2.0
> >             mcastaddr: 226.94.1.1
> >             mcastport: 5405
> >     }
> > }
> >
> > logging {
> >     debug: off
> >     timestamp: on
> > }
> >
> > amf {
> >     mode: disabled
> > }
> >
> >
> > From /etc/cluster/cluster.conf some lines relating to openais:
> > <cman expected_votes="1" two_node="1" hello_timer="1" deadnode_timeout="3"/>
> > <totem token="3000"/>
> >
> > I suspect that further fine-tuning of additional openais configuration
> > parameters will do the trick, but I am not sure. If more information is
> > needed please let me know; any useful advice would be greatly appreciated!
> >
> > Best regards,
> > Herwin
> >
> When a cluster is started with cman, /etc/ais/openais.conf is not used.
> 
> While changing timing parameters is not supported by Red Hat, they can
> be modified via overrides.  The 5 second time window you see is a result
> of the "consensus" timeout parameter.
> 
> consensus should be at minimum 2* token.
> 
> You might try
> <totem token="500" consensus="1500" retransmits_before_loss_const="8"/>
> 
> With qdisk, there may be other implications on timer settings.
> 
> You may have to override retransmits_before_loss_const as well.  The
> token timeout is divided by retransmits_before_loss_const and used to
> calculate the token_retransmit parameter.  The smallest value for any
> timer can be 30 milliseconds because of limitations of the Linux timer
> implementation.
> 
> I believe retransmits_before_loss_const is something like 20 for cman,
> so in this case 500/20 = 25 msec (less than 30 msec) which will cause
> aisexec to fail to start.
> 
> A safe value for retransmits_before_loss_const might be 8-10.
> 
> Again, not supported by Red Hat support, and YMMV.
> 
> Let us know how it goes.
> 
> Timer parameters need to be the same on all nodes.
> 
> Regards
> -steve
> >
> > _______________________________________________
> > Openais mailing list
> > [email protected]
> > https://lists.linux-foundation.org/mailman/listinfo/openais
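
The timer arithmetic in the quoted message can be sketched in a few lines of Python. This is only an illustration of the two rules as stated there (token divided by retransmits_before_loss_const must not fall below 30 ms, and consensus should be at least 2 * token); the function and constant names are hypothetical, the parameter name is spelled as in the message, and the daemon's actual calculation may differ in detail.

```python
# Sketch of the timer rules described in the quoted message. The function and
# constant names are hypothetical and are not part of openais or cman.

MIN_TIMER_MS = 30         # smallest usable timer value, per the message
MIN_CONSENSUS_FACTOR = 2  # "consensus should be at minimum 2 * token"


def token_retransmit_ms(token_ms, retransmits_before_loss_const):
    """Approximate token_retransmit as token / retransmits_before_loss_const."""
    return token_ms / retransmits_before_loss_const


def check_totem_timers(token_ms, consensus_ms, retransmits_before_loss_const):
    """Return a list of problems; an empty list means neither rule is violated."""
    problems = []
    retransmit = token_retransmit_ms(token_ms, retransmits_before_loss_const)
    if retransmit < MIN_TIMER_MS:
        problems.append(
            f"token_retransmit would be {retransmit:g} ms, "
            f"below the {MIN_TIMER_MS} ms minimum"
        )
    if consensus_ms < MIN_CONSENSUS_FACTOR * token_ms:
        problems.append(
            f"consensus {consensus_ms} ms is less than "
            f"{MIN_CONSENSUS_FACTOR} * token ({MIN_CONSENSUS_FACTOR * token_ms} ms)"
        )
    return problems


if __name__ == "__main__":
    # token=500 with a retransmit constant of 20 gives 500 / 20 = 25 ms.
    print(check_totem_timers(500, 1500, 20))
    # The suggested override from the message: 500 / 8 = 62.5 ms.
    print(check_totem_timers(500, 1500, 8))
```

Running the same check against any candidate values before editing cluster.conf gives a quick sanity test, but it does not replace testing on the cluster itself.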


Thanks for your reply Steve!

I have followed your recommendations and ended up with the following
configuration in /etc/cluster/cluster.conf:

        <cman expected_votes="1" two_node="1" hello_timer="1" deadnode_timeout="3"/>
        <logging syslog_facility="local4">
                <logger ident="CPG" debug="on"/>
                <logger ident="CMAN" debug="on"/>
        </logging>
        <totem token="500" consensus="1000" retransmits_before_loss="10"/>

And this seems to work. Failover time is now about 16-20 seconds, which
includes fencing the failed node, assigning IP addresses and mounting the
filesystem required by the cluster service. However, I had expected to see
more logging in /var/log/messages because of the options in <logging/>, but
the output is the same as without these options.
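
To compare the membership-recovery part of the failover before and after a timer change, the gap between the "token was lost" and "entering OPERATIONAL state" lines can be read from syslog. The following is a minimal sketch, assuming the standard syslog timestamp prefix seen in this thread; the helper names are hypothetical, and it measures only the totem recovery window, not fencing or service start.

```python
from datetime import datetime

# Hypothetical helpers for illustration; not part of any cluster tool.


def syslog_time(line):
    """Parse the 'Mar 12 08:46:02' prefix of a syslog line (no year present)."""
    return datetime.strptime(line[:15], "%b %d %H:%M:%S")


def recovery_seconds(lines,
                     start_marker="The token was lost",
                     end_marker="entering OPERATIONAL state"):
    """Seconds between the first start_marker line and the next end_marker line."""
    start = None
    for line in lines:
        if start is None and start_marker in line:
            start = syslog_time(line)
        elif start is not None and end_marker in line:
            return (syslog_time(line) - start).total_seconds()
    return None  # markers not found in this order


# Three lines taken from the log excerpt earlier in this thread.
SAMPLE_LOG = [
    "Mar 12 08:46:02 donald01 openais[5301]: [TOTEM] The token was lost in the OPERATIONAL state.",
    "Mar 12 08:46:07 donald01 openais[5301]: [TOTEM] entering GATHER state from 0.",
    "Mar 12 08:46:07 donald01 openais[5301]: [TOTEM] entering OPERATIONAL state.",
]

if __name__ == "__main__":
    print(recovery_seconds(SAMPLE_LOG))
```

Because syslog timestamps have one-second resolution, the result is only accurate to about a second, which is coarse once token is reduced to 500 ms.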

While adding the consensus parameter and thinking a little more about the
token parameter, I wondered how these timers relate to the hello_timer and
deadnode_timeout parameters in the <cman /> line. I have googled quite
extensively but could not find any detailed documentation or explanation of
how these parameters should be configured for different use cases. Any
recommended reading (either online or in a book) would also be appreciated.

Best regards,
Herwin

