Re: [Openais] Question regarding cluster fail-over time and GATHER states

Steven Dake Wed, 24 Mar 2010 15:50:54 -0700

You might send this question to [email protected].  The component in
question is maintained by that project.


Regards
-steve

On Mon, 2010-03-15 at 10:45 +0100, Herwin Kleinjan wrote:
> > -----Original Message-----
> > From: Herwin Kleinjan [mailto:[email protected]]
> > Sent: Monday 15 March 2010 9:37
> > To: [email protected]
> > Cc: [email protected]
> > Subject: RE: [Openais] Question regarding cluster fail-over time and GATHER states
> > 
> > > -----Original Message-----
> > > From: Steven Dake [mailto:[email protected]]
> > > Sent: Friday 12 March 2010 17:19
> > > To: Herwin Kleinjan
> > > Cc: [email protected]
> > > Subject: Re: [Openais] Question regarding cluster fail-over time and GATHER states
> > >
> > > On Fri, 2010-03-12 at 09:44 +0100, Herwin Kleinjan wrote:
> > > > Hello,
> > > >
> > > > Currently we are looking into possibilities to speed up the fail-over
> > > > process on our dual node cluster. This is a RHEL 5.4 cluster running on
> > > > HP Proliant servers with iLO based power fencing. For shared storage we
> > > > use a fiber based storage array.
> > > >
> > > > There are some parts where fail-over time might be improved, one of them
> > > > relating to openais or its configuration. During testing whenever one
> > > > node is failing or its power is disconnected, the other node detects this
> > > > and the fail-over process is started:
> > > >
> > > > Mar 12 08:46:02 donald01 openais[5301]: [TOTEM] The token was lost in the OPERATIONAL state.
> > > > Mar 12 08:46:02 donald01 openais[5301]: [TOTEM] Receive multicast socket recv buffer size (288000 bytes).
> > > > Mar 12 08:46:02 donald01 openais[5301]: [TOTEM] Transmit multicast socket send buffer size (288000 bytes).
> > > > Mar 12 08:46:02 donald01 openais[5301]: [TOTEM] entering GATHER state from 2.
> > > > Mar 12 08:46:07 donald01 openais[5301]: [TOTEM] entering GATHER state from 0.
> > > > Mar 12 08:46:07 donald01 openais[5301]: [TOTEM] Creating commit token because I am the rep.
> > > > Mar 12 08:46:07 donald01 openais[5301]: [TOTEM] Saving state aru 44 high seq received 44
> > > > Mar 12 08:46:07 donald01 openais[5301]: [TOTEM] Storing new sequence id for ring 74
> > > > Mar 12 08:46:07 donald01 openais[5301]: [TOTEM] entering COMMIT state.
> > > > Mar 12 08:46:07 donald01 openais[5301]: [TOTEM] entering RECOVERY state.
> > > > Mar 12 08:46:07 donald01 openais[5301]: [TOTEM] position [0] member 10.227.180.101:
> > > > Mar 12 08:46:07 donald01 openais[5301]: [TOTEM] previous ring seq 112 rep 10.227.180.101
> > > > Mar 12 08:46:07 donald01 openais[5301]: [TOTEM] aru 44 high delivered 44 received flag 1
> > > > Mar 12 08:46:07 donald01 openais[5301]: [TOTEM] Did not need to originate any messages in recovery.
> > > > Mar 12 08:46:07 donald01 openais[5301]: [TOTEM] Sending initial ORF token
> > > > Mar 12 08:46:07 donald01 openais[5301]: [CLM  ] CLM CONFIGURATION CHANGE
> > > > Mar 12 08:46:07 donald01 openais[5301]: [CLM  ] New Configuration:
> > > > Mar 12 08:46:07 donald01 openais[5301]: [CLM  ]         r(0) ip(10.227.180.101)
> > > > Mar 12 08:46:07 donald01 openais[5301]: [CLM  ] Members Left:
> > > > Mar 12 08:46:07 donald01 kernel: dlm: closing connection to node 2
> > > > Mar 12 08:46:07 donald01 openais[5301]: [CLM  ]         r(0) ip(10.227.180.102)
> > > > Mar 12 08:46:07 donald01 clurgmgrd[7525]: <info> State change: donald02 DOWN
> > > > Mar 12 08:46:07 donald01 openais[5301]: [CLM  ] Members Joined:
> > > > Mar 12 08:46:07 donald01 openais[5301]: [CLM  ] CLM CONFIGURATION CHANGE
> > > > Mar 12 08:46:07 donald01 openais[5301]: [CLM  ] New Configuration:
> > > > Mar 12 08:46:07 donald01 openais[5301]: [CLM  ]         r(0) ip(10.227.180.101)
> > > > Mar 12 08:46:07 donald01 openais[5301]: [CLM  ] Members Left:
> > > > Mar 12 08:46:07 donald01 openais[5301]: [CLM  ] Members Joined:
> > > > Mar 12 08:46:07 donald01 openais[5301]: [SYNC ] This node is within the primary component and will provide service.
> > > > Mar 12 08:46:07 donald01 openais[5301]: [TOTEM] entering OPERATIONAL state.
> > > > Mar 12 08:46:07 donald01 openais[5301]: [CLM  ] got nodejoin message 10.227.180.101
> > > > Mar 12 08:46:07 donald01 openais[5301]: [CPG  ] got joinlist message from node 1
> > > >
> > > >
> > > > As you can see from the above /var/log/messages excerpt there is a 5
> > > > second time frame at the beginning in which apparently nothing is
> > > > happening (08:46:02-08:46:07). I was wondering how I could reduce or
> > > > remove this delay so that the fail-over process will be done more quickly.
> > > >
> > > > My current /etc/ais/openais.conf is still the installed default:
> > > > totem {
> > > >         version: 2
> > > >         secauth: off
> > > >         threads: 0
> > > >         interface {
> > > >                 ringnumber: 0
> > > >                 bindnetaddr: 192.168.2.0
> > > >                 mcastaddr: 226.94.1.1
> > > >                 mcastport: 5405
> > > >         }
> > > > }
> > > >
> > > > logging {
> > > >         debug: off
> > > >         timestamp: on
> > > > }
> > > >
> > > > amf {
> > > >         mode: disabled
> > > > }
> > > >
> > > >
> > > > From /etc/cluster/cluster.conf some lines relating to openais:
> > > > <cman expected_votes="1" two_node="1" hello_timer="1" deadnode_timeout="3"/>
> > > > <totem token="3000"/>
> > > >
> > > > I suspect that fine-tuning additional openais configuration parameters
> > > > will do the trick, but I am not sure. If more information is needed
> > > > please let me know. Any useful advice would be greatly appreciated!
> > > >
> > > > Best regards,
> > > > Herwin
> > > >
> > > When a cluster is started with cman, /etc/ais/openais.conf is not used.
> > >
> > > While changing timing parameters is not supported by Red Hat, they can
> > > be modified via overrides.  The 5 second time window you see is a result
> > > of the "consensus" timeout parameter.
> > >
> > > consensus should be at least 2 * token.
> > >
> > > You might try
> > > <totem token="500" consensus="1500" retransmits_before_loss_const="8"/>
> > >
> > > With qdisk, there may be other implications on timer settings.
> > >
> > > You may have to override retransmits_before_loss_const as well.  The
> > > token timeout is divided by retrans_before_loss and used to calculate
> > > the token_retransmit parameter.  The smallest value for any timer can be
> > > 30 milliseconds because of limitations of the Linux timer
> > > implementation.
> > >
> > > I believe retransmits_before_loss_const is something like 20 for cman,
> > > so in this case 500/20 = 25 msec (less than 30 msec) which will cause
> > > aisexec to fail to start.
> > >
> > > A safe value for retransmits_before_loss might be 8-10.
> > >
> > > Again, not supported by Red Hat support, and YMMV.
> > >
> > > Let us know how it goes.
> > >
> > > Timer parameters need to be the same on all nodes.
> > >
> > > Regards
> > > -steve
> > > >
> > > > _______________________________________________
> > > > Openais mailing list
> > > > [email protected]
> > > > https://lists.linux-foundation.org/mailman/listinfo/openais
> > 
> > Thanks for your reply Steve!
> > 
> > I have followed your recommendations and ended up with the following
> > configuration in /etc/cluster/cluster.conf:
> > 
> >         <cman expected_votes="1" two_node="1" hello_timer="1" deadnode_timeout="3"/>
> >         <logging syslog_facility="local4">
> >                 <logger ident="CPG" debug="on"/>
> >                 <logger ident="CMAN" debug="on"/>
> >         </logging>
> >         <totem token="500" consensus="1000" retransmits_before_loss="10"/>
> > 
> > And this seems to work. Failover time is now about 16-20 seconds, which
> > includes fencing the failed node, assigning IP addresses and mounting the
> > filesystem required by the cluster service. However, I had expected to see
> > more logging in /var/log/messages because of the options in <logging/>, but
> > it is the same as without these options...
> > 
> > While adding the consensus parameter and thinking a little more about
> > the token parameter, I wondered how these timers relate to the hello_timer
> > and deadnode_timeout parameters in the <cman /> line. I have googled quite
> > extensively but could not find any detailed documentation or explanation of
> > how these parameters could or should be configured for different use cases.
> > Any reading (either online or in a book) that you could recommend would
> > also be appreciated.
> > 
> > Best regards,
> > Herwin
> 
> *Doh* please ignore the remark on logging, I forgot to restart the syslog
> facility...
> 
> What I did see now though is that cluster service monitoring scripts are
> only called once every 10 secs. I tried to lower that to 5 secs (as that
> would be the minimum allowed value) by changing parameters in ip.sh, fs.sh
> and script.sh in /usr/share/cluster, but somehow it won't accept these new
> values and the checks still run every 10 secs... Suggestions anyone?
> 
> Best regards,
> Herwin
> 
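
For anyone wanting to measure the same kind of delay in their own logs, a minimal Python sketch follows. It finds the largest pause between consecutive syslog lines, such as the 08:46:02 to 08:46:07 window between the two GATHER entries in the quoted excerpt. The names parse_ts and largest_gap, and the year argument, are assumptions for illustration (syslog timestamps carry no year); none of this is part of openais or cman.

```python
from datetime import datetime

def parse_ts(line, year=2010):
    """Parse the leading 'Mon DD HH:MM:SS' syslog timestamp of a line.

    The year is an assumption supplied by the caller, since classic
    syslog lines do not include one.
    """
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")

def largest_gap(lines):
    """Return (gap_seconds, line_before, line_after) for the longest pause."""
    stamps = [parse_ts(line) for line in lines]
    # Index of the line that follows the longest pause.
    i = max(range(1, len(stamps)), key=lambda k: stamps[k] - stamps[k - 1])
    return (stamps[i] - stamps[i - 1]).total_seconds(), lines[i - 1], lines[i]

# A few lines taken from the quoted /var/log/messages excerpt.
SAMPLE = [
    "Mar 12 08:46:02 donald01 openais[5301]: [TOTEM] The token was lost in the OPERATIONAL state.",
    "Mar 12 08:46:02 donald01 openais[5301]: [TOTEM] entering GATHER state from 2.",
    "Mar 12 08:46:07 donald01 openais[5301]: [TOTEM] entering GATHER state from 0.",
    "Mar 12 08:46:07 donald01 openais[5301]: [TOTEM] entering COMMIT state.",
]

if __name__ == "__main__":
    gap, before, after = largest_gap(SAMPLE)
    print(f"{gap:.0f} s between:\n  {before}\n  {after}")
```

Running the same function before and after changing the totem settings gives a quick way to compare how long the node sits in the GATHER state.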

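The timer rules from the quoted reply can be checked with a few lines of Python before editing cluster.conf. This is only a sketch under the simplified rules as stated there: token divided by retransmits_before_loss gives the token retransmit interval, no timer may be below 30 msec, and consensus should be at least 2 * token. The real openais calculation may differ in detail, and check_totem_timers is a hypothetical helper, not an openais or cman API.

```python
MIN_TIMER_MS = 30  # smallest usable timer value, per the quoted reply

def check_totem_timers(token_ms, consensus_ms, retransmits_before_loss):
    """Return (token_retransmit_ms, problems) for proposed totem settings.

    Assumes token_retransmit is simply token / retransmits_before_loss,
    as described in the thread; the actual implementation may differ.
    """
    problems = []
    token_retransmit_ms = token_ms / retransmits_before_loss
    if token_retransmit_ms < MIN_TIMER_MS:
        problems.append(
            f"token_retransmit {token_retransmit_ms:g} msec is below "
            f"{MIN_TIMER_MS} msec"
        )
    if consensus_ms < 2 * token_ms:
        problems.append(
            f"consensus {consensus_ms} msec is below 2 * token "
            f"({2 * token_ms} msec)"
        )
    return token_retransmit_ms, problems

if __name__ == "__main__":
    # The suggested override, then the same token with a divisor of 20.
    for args in [(500, 1500, 8), (500, 1500, 20)]:
        retransmit, problems = check_totem_timers(*args)
        print(args, retransmit, problems or "ok")
```

With token=500 and a divisor of 20 the check flags the 25 msec retransmit interval, which matches the failure case described in the reply; a divisor of 8 or 10 stays above the limit.
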
_______________________________________________
Openais mailing list
[email protected]
https://lists.linux-foundation.org/mailman/listinfo/openais
