On 16/10/15 12:37 PM, Vallevand, Mark K wrote:
> Fencing, yes. I have pcmk-redirect for each node in cluster.conf.

Do you have stonith configured (and tested!) in Pacemaker as well?

> I run with default cman settings for corosync. No totem clause. That gives
> the 20s detection. Not sure what the defaults really are.
> I added <totem token="1000" token_retransmits_before_loss_const="5" /> to
> cluster.conf and get about a 5s detection.
>
> The corosync man page says:
> token  This timeout specifies in milliseconds until a token loss is
>        declared after not receiving a token. This is the time spent
>        detecting a failure of a processor in the current configuration.
>        Reforming a new configuration takes about 50 milliseconds in
>        addition to this timeout.
>
>        The default is 1000 milliseconds.
>
> token_retransmit
>        This timeout specifies in milliseconds after how long before
>        receiving a token the token is retransmitted. This will be
>        automatically calculated if token is modified. It is not
>        recommended to alter this value without guidance from the
>        corosync community.
>
>        The default is 238 milliseconds.
>
> hold   This timeout specifies in milliseconds how long the token should
>        be held by the representative when the protocol is under low
>        utilization. It is not recommended to alter this value without
>        guidance from the corosync community.
>
>        The default is 180 milliseconds.
>
> token_retransmits_before_loss_const
>        This value identifies how many token retransmits should be
>        attempted before forming a new configuration. If this value is
>        set, retransmit and hold will be automatically calculated from
>        retransmits_before_loss and token.
>
>        The default is 4 retransmissions.
>
> But, I don't know what cman sets these to. But, they aren't these values.
> And, they aren't the values in the cman man page, which says this:

Maybe it's changed by the ubuntu packagers? I don't know, I don't use
debian or ubuntu.

> Cman uses different defaults for some of the corosync
> parameters listed in corosync.conf(5).
> If you wish to use a non-default setting, they can be configured in
> cluster.conf as shown above.
> Cman uses the following default values:
>
> <totem
>     vsftype="none"
>     token="10000"
>     token_retransmits_before_loss_const="20"
>     join="60"
>     consensus="4800"
>     rrp_mode="none"
>     <!-- or rrp_mode="active" if altnames are present >
> />
>
> So, it looks like setting the corosync parameters in cluster.conf has some
> effect. Cman seems to pass them to corosync.

Yes, never configure corosync directly when using cman, only use
cluster.conf, as you did.

> Onward!
>
>
> Regards.
> Mark K Vallevand   [email protected]
> <mailto:[email protected]>
> Never try and teach a pig to sing: it's a waste of time, and it annoys the
> pig.
>
> THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY
> MATERIAL and is thus for use only by the intended recipient. If you received
> this in error, please contact the sender and delete the e-mail and its
> attachments from all computers.
>
>
> -----Original Message-----
> From: Digimer [mailto:[email protected]]
> Sent: Friday, October 16, 2015 11:18 AM
> To: Cluster Labs - All topics related to open-source clustering welcomed
> Subject: Re: [ClusterLabs] Cluster node loss detection.
>
> On 16/10/15 11:40 AM, Vallevand, Mark K wrote:
>> Thanks. I wasn't completely aware of corosync's role in this. I see new
>> things in the docs every time I read them.
>>
>> I looked up the corosync settings at one time and did it again:
>>     token loss 3000ms
>>     retransmits 10
>> So 30s. Redid my simple testing and got detection times of 22s, 26s, and
>> 25s using very crude methods.
>> Any warnings about setting these values to something else?
>> We require our customers to use an isolated, private network for cluster
>> communications. All taken care of in our instructions and cluster
>> configuration scripts. Network traffic will not be a factor. So, I'm
>> thinking 1000ms and 5 retransmits as an experiment.
>
> That is very high.
> I think the default is something like 236ms x 4 losses.
>
> You do have fencing, right?
>
>> I was pretty sure that DLM was just being informed by clustering, but I
>> needed to ask.
>>
>> Again, thanks.
>>
>>
>> Regards.
>> Mark K Vallevand   [email protected]
>> <mailto:[email protected]>
>> Never try and teach a pig to sing: it's a waste of time, and it annoys the
>> pig.
>
>

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?

_______________________________________________
Users mailing list: [email protected]
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
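
For reference, here is a rough sketch of where the totem line discussed in this thread would sit in cluster.conf when cman is paired with Pacemaker. The token and token_retransmits_before_loss_const values are the ones Mark reported testing. Everything else is assumed for illustration: the cluster name, config_version, node names, and the fence device name "pcmk" are placeholders, and the fence_pcmk layout follows the usual pcmk-redirect pattern rather than anything shown in this thread. Treat it as an outline, not a complete or validated configuration.

```xml
<?xml version="1.0"?>
<!-- Hypothetical example: name, config_version and node names are placeholders -->
<cluster config_version="2" name="examplecluster">
  <!-- Values from the thread; Mark saw about 5s detection with these -->
  <totem token="1000" token_retransmits_before_loss_const="5"/>
  <clusternodes>
    <clusternode name="node1" nodeid="1">
      <fence>
        <!-- pcmk-redirect passes fencing requests on to Pacemaker's stonith -->
        <method name="pcmk-redirect">
          <device name="pcmk" port="node1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node2" nodeid="2">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="node2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <!-- Assumed device name; fence_pcmk is the agent that redirects to Pacemaker -->
    <fencedevice name="pcmk" agent="fence_pcmk"/>
  </fencedevices>
</cluster>
```

As noted in the thread, the totem settings belong in cluster.conf rather than in corosync.conf when cman is in use. config_version normally has to be incremented each time the file changes.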
