Thanks. I wasn't completely aware of corosync's role in this. I see new
things in the docs every time I read them.
I had looked up the corosync settings once before, and I checked them again:
token loss 3000ms
retransmits 10
So 30s (3000ms x 10). I redid my simple testing and got detection times of
22s, 26s, and 25s using very crude methods.
Are there any caveats about changing these values?
We require our customers to use an isolated, private network for cluster
communications. All taken care of in our instructions and cluster
configuration scripts. Network traffic will not be a factor. So, I'm thinking
1000ms and 5 retransmits as an experiment.
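A minimal sketch of the arithmetic above, assuming the back-of-envelope model used in this thread (token timeout multiplied by the retransmit count). The function name is made up for illustration, and corosync's real detection time also depends on other totem timers such as consensus, so treat the result as a rough estimate only.

```python
def estimated_detection_ms(token_ms: int, retransmits: int) -> int:
    """Thread's rough model: token timeout x retransmit count (an assumption)."""
    return token_ms * retransmits


# Current settings quoted above, and the proposed experiment.
current_ms = estimated_detection_ms(3000, 10)
proposed_ms = estimated_detection_ms(1000, 5)

# Crude measurements reported above, in seconds.
measured_s = [22, 26, 25]
mean_measured_s = sum(measured_s) / len(measured_s)

print(f"current estimate:  {current_ms / 1000:.0f}s")
print(f"proposed estimate: {proposed_ms / 1000:.0f}s")
print(f"measured mean:     {mean_measured_s:.1f}s")
```

The measured times come in under the 30s estimate, which suggests the simple product is not the whole story.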
I was pretty sure that DLM was just being informed by clustering, but I needed
to ask.
Again, thanks.
Regards.
Mark K Vallevand [email protected]
Never try and teach a pig to sing: it's a waste of time, and it annoys the pig.
THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY
MATERIAL and is thus for use only by the intended recipient. If you received
this in error, please contact the sender and delete the e-mail and its
attachments from all computers.
-----Original Message-----
From: Digimer [mailto:[email protected]]
Sent: Friday, October 16, 2015 10:04 AM
To: Cluster Labs - All topics related to open-source clustering welcomed
Subject: Re: [ClusterLabs] Cluster node loss detection.
On 16/10/15 10:51 AM, Vallevand, Mark K wrote:
> It looks like it takes 20s for a cluster to detect that a node has been
> lost.
Loss is detected by corosync, and it declares loss after X lost totem
tokens, each token being declared lost after Y milliseconds. By default,
node loss should be detected in about 1 second of no network traffic,
but you need to check corosync's settings.
> The detection seems to correlate to dlm reporting its lost connection to
> the node.
Negative. DLM is informed when a node is declared lost and blocks until
fenced/stonithd tells it that the peer has been successfully fenced,
after which it reaps lost locks and recovers.
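The ordering described above can be sketched as a toy state machine. This is only an illustration of the sequence (loss reported, block, fence confirmed, recover); the class and method names are hypothetical and do not correspond to the real DLM or fenced interfaces.

```python
from enum import Enum, auto


class DlmState(Enum):
    RUNNING = auto()     # normal lock traffic
    BLOCKED = auto()     # a peer was declared lost; waiting on fencing
    RECOVERING = auto()  # fencing confirmed; reaping the lost peer's locks


class ToyDlm:
    """Toy model of the sequence described above, not the real DLM."""

    def __init__(self):
        self.state = DlmState.RUNNING
        self.lost_peer = None

    def node_lost(self, peer):
        # The membership layer reports the loss; DLM only reacts to it.
        self.lost_peer = peer
        self.state = DlmState.BLOCKED

    def fence_confirmed(self, peer):
        # Only a confirmed fence of the lost peer allows recovery to start.
        if self.state is DlmState.BLOCKED and peer == self.lost_peer:
            self.state = DlmState.RECOVERING

    def recovery_done(self):
        # Lost locks have been reaped; resume normal operation.
        if self.state is DlmState.RECOVERING:
            self.lost_peer = None
            self.state = DlmState.RUNNING


dlm = ToyDlm()
dlm.node_lost("node2")
blocked_state = dlm.state
dlm.fence_confirmed("node2")
recovering_state = dlm.state
dlm.recovery_done()
print(blocked_state.name, recovering_state.name, dlm.state.name)
```

In this model DLM never detects anything itself, so a DLM message that appears at the same moment as the node loss is a consequence of detection, not its cause.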
> Not sure if correlation is causation.
Correlation.
> Anyway, can someone tell me where that 20s might be coming from and if
> it is adjustable?
>
> Ubuntu 12.04 LTS
> pacemaker 1.1.10
> cman 3.1.7
> corosync 1.4.6
>
> Thanks!
>
>
>
> Regards.
> Mark K Vallevand [email protected]
> Never try and teach a pig to sing: it's a waste of time, and it annoys
> the pig.
>
> THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY
> MATERIAL and is thus for use only by the intended recipient. If you
> received this in error, please contact the sender and delete the e-mail
> and its attachments from all computers.
This suffix has zero legal bearing, just saying. Anything posted to this
list is 100% open and public.
--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?
_______________________________________________
Users mailing list: [email protected]
http://clusterlabs.org/mailman/listinfo/users
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org