On 16/10/15 12:37 PM, Vallevand, Mark K wrote:
> Fencing, yes.  I have pcmk-redirect for each node in cluster.conf.

Do you have stonith configured (and tested!) in Pacemaker as well?

> I run with default cman settings for corosync.  No totem clause.  That gives 
> the 20s detection.  Not sure what the defaults really are.
> I added <totem token="1000" token_retransmits_before_loss_const="5" /> to 
> cluster.conf and get about a 5s detection.
> 
> The corosync man page says:
>        token  This timeout specifies in milliseconds until a token loss is
>               declared after not receiving a token.  This is the time spent
>               detecting a failure of a processor in the current
>               configuration.  Reforming a new configuration takes about 50
>               milliseconds in addition to this timeout.
> 
>               The default is 1000 milliseconds.
> 
>        token_retransmit
>               This timeout specifies in milliseconds after how long before
>               receiving a token the token is retransmitted.  This will be
>               automatically calculated if token is modified.  It is not
>               recommended to alter this value without guidance from the
>               corosync community.
> 
>               The default is 238 milliseconds.
> 
>        hold   This timeout specifies in milliseconds how long the token
>               should be held by the representative when the protocol is
>               under low utilization.  It is not recommended to alter this
>               value without guidance from the corosync community.
> 
>               The default is 180 milliseconds.
> 
>        token_retransmits_before_loss_const
>               This value identifies how many token retransmits should be
>               attempted before forming a new configuration.  If this value
>               is set, retransmit and hold will be automatically calculated
>               from retransmits_before_loss and token.
> 
>               The default is 4 retransmissions.
> 
> I don't know what cman actually sets these to, but they aren't these values.
> And they aren't the values in the cman man page either, which says this:

Maybe it's changed by the Ubuntu packagers? I don't know; I don't use
Debian or Ubuntu.

>               Cman uses different defaults for some of the corosync
>               parameters listed in corosync.conf(5).  If you wish to use a
>               non-default setting, they can be configured in cluster.conf
>               as shown above.  Cman uses the following default values:
> 
>                 <totem
>                   vsftype="none"
>                   token="10000"
>                   token_retransmits_before_loss_const="20"
>                   join="60"
>                   consensus="4800"
>                   rrp_mode="none"
>                   <!-- or rrp_mode="active" if altnames are present >
>                 />
>                
> So, it looks like setting the corosync parameters in cluster.conf has some 
> effect.  Cman seems to pass them to corosync.

Yes. When using cman, never configure corosync directly; only use
cluster.conf, as you did.
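
For reference, here is a minimal sketch of where these pieces sit in
cluster.conf. It assumes a two-node cluster. The cluster name, node names,
config_version and the fence device name "pcmk" are placeholders, and the
totem values are simply the ones you tested, not a recommendation:

```xml
<?xml version="1.0"?>
<!-- Sketch only: cluster name, node names, config_version and the
     device name "pcmk" are placeholders, not values from this thread. -->
<cluster name="mycluster" config_version="2">
  <!-- Totem overrides live here; cman passes them on to corosync. -->
  <totem token="1000" token_retransmits_before_loss_const="5"/>
  <clusternodes>
    <clusternode name="node1" nodeid="1">
      <fence>
        <!-- Hand fence requests over to Pacemaker's stonith. -->
        <method name="pcmk-redirect">
          <device name="pcmk" port="node1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node2" nodeid="2">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="node2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="pcmk" agent="fence_pcmk"/>
  </fencedevices>
</cluster>
```

The pcmk-redirect method only forwards the fence request, so a working
stonith resource still has to exist in Pacemaker for anything to actually
be fenced.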

> Onward!
> 
> 
> Regards.
> Mark K Vallevand   [email protected] 
> <mailto:[email protected]> 
> Never try and teach a pig to sing: it's a waste of time, and it annoys the 
> pig.
> 
> 
> -----Original Message-----
> From: Digimer [mailto:[email protected]] 
> Sent: Friday, October 16, 2015 11:18 AM
> To: Cluster Labs - All topics related to open-source clustering welcomed
> Subject: Re: [ClusterLabs] Cluster node loss detection.
> 
> On 16/10/15 11:40 AM, Vallevand, Mark K wrote:
>> Thanks.  I wasn't completely aware of corosync's role in this.  I see new 
>> things in the docs every time I read them.
>>
>> I looked up the corosync settings at one time and did it again:
>>      token loss 3000ms
>>      retransmits 10
>> So 30s.  Redid my simple testing and got detection times of 22s, 26s, and 
>> 25s using very crude methods.
>> Any warnings about setting these values to something else?
>> We require our customers to use an isolated, private network for cluster 
>> communications.  All taken care of in our instructions and cluster 
>> configuration scripts.  Network traffic will not be a factor.  So, I'm 
>> thinking 1000ms and 5 retransmits as an experiment.
> 
> That is very high. I think the default is something like 236ms x 4 losses.
> 
> You do have fencing, right?
> 
>> I was pretty sure that DLM was just being informed by clustering, but I 
>> needed to ask.
>>
>> Again, thanks.
> 
> 


-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?

_______________________________________________
Users mailing list: [email protected]
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
