Re: [ClusterLabs] Cluster node loss detection.

2015-10-21 Thread Jan Pokorný
On 16/10/15 12:51 -0400, Digimer wrote:
> On 16/10/15 12:37 PM, Vallevand, Mark K wrote:
>> So, it looks like setting the corosync parameters in cluster.conf
>> has some effect. Cman seems to pass them to corosync.
>
> Yes, never configure corosync directly when using cman, only use
> cluster.con

Re: [ClusterLabs] Cluster node loss detection.

2015-10-16 Thread Vallevand, Mark K
:09 PM
To: Cluster Labs - All topics related to open-source clustering welcomed
Subject: Re: [ClusterLabs] Cluster node loss detection.

We know. We've worked out our application-specific answer to split brain. But, proper fencing is on our to-do list. Currently we only deploy 2-node systems.

Re: [ClusterLabs] Cluster node loss detection.

2015-10-16 Thread Vallevand, Mark K
ering welcomed
Subject: Re: [ClusterLabs] Cluster node loss detection.

On 16/10/15 01:14 PM, Vallevand, Mark K wrote:
> No stonith configured. Not explicitly anyway.
> Does that factor into this somehow?

Yes, you will eventually have a split-brain. All fencing in cman does with 'fe

Re: [ClusterLabs] Cluster node loss detection.

2015-10-16 Thread Digimer
> -----Original Message-----
> From: Digimer [mailto:li...@alteeve.ca]
> Sent: Friday, October 16, 2015 11:51 AM
> To: Cluster Labs - All topics related to open-source c

Re: [ClusterLabs] Cluster node loss detection.

2015-10-16 Thread Vallevand, Mark K
-----Original Message-----
From: Digimer [mailto:li...@alteeve.ca]
Sent: Friday, October 16, 2015 11:51 AM
To: Cluster Labs - All topics related to open-source clustering welcomed
Subject: Re: [ClusterLabs] Cluster node loss detection.

On 16/10/15 12:37 PM, Vallevand, M

Re: [ClusterLabs] Cluster node loss detection.

2015-10-16 Thread Digimer
ransmits_before_loss_const="20"
> join="60"
> consensus="4800"
> rrp_mode="none"
> From: Digimer [mailto:li...@alteeve.ca]
> Sent: Friday, October 16, 2015 11:18 AM
>

Re: [ClusterLabs] Cluster node loss detection.

2015-10-16 Thread Vallevand, Mark K
-----Original Message-----
From: Digimer [mailto:li...@alteeve.ca]
Sent: Friday, October 16, 2015 11:18 AM
To: Cluster Labs - All topics related to open-source clustering welcomed
Subject: Re: [ClusterLabs] Cluster node loss detection.

On 16/10/15 11:4

Re: [ClusterLabs] Cluster node loss detection.

2015-10-16 Thread Digimer
On 16/10/15 11:40 AM, Vallevand, Mark K wrote:
> Thanks. I wasn't completely aware of corosync's role in this. I see new
> things in the docs every time I read them.
>
> I looked up the corosync settings at one time and did it again:
> token loss 3000ms
> retransmits 10
> So 30s. R

Re: [ClusterLabs] Cluster node loss detection.

2015-10-16 Thread Vallevand, Mark K
-----Original Message-----
From: Vallevand, Mark K [mailto:mark.vallev...@unisys.com]
Sent: Friday, October 16, 2015 10:41 AM
To: Cluster Labs - All topics related to open-source clustering welcomed
Subject: Re: [ClusterLabs] Cluster node loss detection.

Thanks. I wasn't completely aware of co

Re: [ClusterLabs] Cluster node loss detection.

2015-10-16 Thread Vallevand, Mark K
-----Original Message-----
From: Digimer [mailto:li...@alteeve.ca]
Sent: Friday, October 16, 2015 10:04 AM
To: Cluster Labs - All topics related to open-source clustering welcomed
Subject: Re: [ClusterLabs] Cluster node loss detection.

On 16/10/15 10:51 AM, Val

Re: [ClusterLabs] Cluster node loss detection.

2015-10-16 Thread Digimer
On 16/10/15 10:51 AM, Vallevand, Mark K wrote:
> It looks like it takes 20s for a cluster to detect that a node has been
> lost.

Loss is detected by corosync, and it declares loss after X lost totem tokens, each token being declared lost after Y milliseconds. By default, node loss should be detect
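
The rule of thumb in this reply (loss is declared after X lost tokens, each declared lost after Y milliseconds) can be sketched as simple arithmetic. The function and parameter names below are hypothetical, and the figures 3000 ms and 10 are the ones quoted elsewhere in this thread. This is only the thread's simplified model, not a statement of how corosync actually computes its timers, so treat the result as a rough estimate.

```python
def estimated_detection_ms(token_timeout_ms: int, lost_tokens: int) -> int:
    """Rough estimate per the thread's model: X lost tokens times Y ms per token.

    Both names are illustrative; they are not corosync or cman option names.
    """
    if token_timeout_ms < 0 or lost_tokens < 0:
        raise ValueError("timeout and token count must be non-negative")
    return token_timeout_ms * lost_tokens


# Figures quoted in the thread: token loss 3000 ms, retransmits 10.
estimate_ms = estimated_detection_ms(3000, 10)
print(f"{estimate_ms / 1000:.0f}s")  # prints "30s"
```

Under this model, lowering either figure shortens the estimate proportionally; whether a given cluster behaves that way should be checked against its logs.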

[ClusterLabs] Cluster node loss detection.

2015-10-16 Thread Vallevand, Mark K
It looks like it takes 20s for a cluster to detect that a node has been lost. The detection seems to correlate to dlm reporting its lost connection to the node. Not sure if correlation is causation. Anyway, can someone tell me where that 20s might be coming from and if it is adjustable? Ubuntu