Re: [ClusterLabs] Antw: Re: Antw: Re: OCFS2 on cLVM with node waiting for fencing timeout

2016-10-13 Thread Ken Gaillot
On 10/13/2016 03:36 AM, Ulrich Windl wrote:
> That's what I'm talking about: If 1 of 3 nodes is rebooting (or the cluster 
> is split-brain 1:2), the single node CANNOT continue due to lack of quorum, 
> while the remaining two nodes can. Is it still necessary to wait for 
> completion of stonith?

If the 2 nodes have working communication with the 1 node, then the 1
node will leave the cluster in an orderly way, and fencing will not be
involved. In that case, yes, quorum is used to prevent the 1 node from
starting services until it rejoins the cluster.

However, if the 2 nodes lose communication with the 1 node, they cannot
be sure it is functioning well enough to respect quorum. In this case,
they have to fence it. DLM has to wait for the fencing to succeed to be
sure the 1 node is not messing with shared resources.


_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Re: Antw: Re: OCFS2 on cLVM with node waiting for fencing timeout

2016-10-13 Thread Eric Ren

Hi,

On 10/13/2016 04:36 PM, Ulrich Windl wrote:

> >>> Eric Ren schrieb am 13.10.2016 um 09:48 in Nachricht
> <73f764d0-75e7-122f-ff4e-d0b27dbdd...@suse.com>:
> [...]
> >> When assuming node h01 still lived when communication failed, wouldn't
> >> quorum prevent h01 from doing anything with DLM and OCFS2 anyway?
> > Not sure I understand you correctly. By default, losing quorum will make
> > DLM stop service.
>
> That's what I'm talking about: If 1 of 3 nodes is rebooting (or the cluster is
> split-brain 1:2), the single node CANNOT continue due to lack of quorum, while
> the remaining two nodes can. Is it still necessary to wait for completion of
> stonith?
Quorum and fencing completion are separate conditions, both of which are checked
before DLM starts providing service again. FYI:


https://github.com/renzhengeek/libdlm/blob/master/dlm_controld/cpg.c#L603



> > See `man dlm_controld`:
> > ```
> > --enable_quorum_lockspace 0|1
> >      enable/disable quorum requirement for lockspace operations
> > ```
>
> Does not exist in SLES11 SP4...

Well, I think it's better to keep the default behavior. Otherwise, it's
dangerous when split-brain happens.
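As a side note, on systems whose dlm_controld does support this option, it can be passed on the daemon command line or, with newer dlm releases, set in a configuration file. The file path and key=value syntax below are an assumption and should be checked against the local `man dlm.conf`:

```
# /etc/dlm/dlm.conf -- assumed location; verify with `man dlm.conf`
# Disable the quorum requirement for lockspace operations. As noted
# above, this is dangerous in a split-brain; the default (1) is safer.
enable_quorum_lockspace=0
```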


Eric







[ClusterLabs] Antw: Re: Antw: Re: OCFS2 on cLVM with node waiting for fencing timeout

2016-10-13 Thread Ulrich Windl
>>> Eric Ren  schrieb am 13.10.2016 um 09:48 in Nachricht
<73f764d0-75e7-122f-ff4e-d0b27dbdd...@suse.com>:
[...]
>> When assuming node h01 still lived when communication failed, wouldn't 
>> quorum prevent h01 from doing anything with DLM and OCFS2 anyway?
> Not sure I understand you correctly. By default, losing quorum will make
> DLM stop service.

That's what I'm talking about: If 1 of 3 nodes is rebooting (or the cluster is 
split-brain 1:2), the single node CANNOT continue due to lack of quorum, while 
the remaining two nodes can. Is it still necessary to wait for completion of 
stonith?

> See `man dlm_controld`:
> ```
> --enable_quorum_lockspace 0|1
> enable/disable quorum requirement for lockspace operations
> ```

Does not exist in SLES11 SP4...

Ulrich


