Re: [Cluster-devel] fence daemon problems

2012-10-09 Thread Lon Hohberger
On 10/03/2012 12:55 PM, Dietmar Maurer wrote: The intention of that is to prevent an inquorate node/partition from killing a quorate group of nodes that are running normally. e.g. if a 5 node cluster is partitioned into 2/3 or 1/4. You don't want the 2 or 1 node group to fence the 3 or 4 nodes

Re: [Cluster-devel] fence daemon problems

2012-10-09 Thread Lon Hohberger
On 10/03/2012 12:44 PM, David Teigland wrote:
> You might be able to assign different numbers of votes to reduce the likelihood of everyone losing quorum.
(Late to thread - here's an example of what David is talking about):
Node    Votes
-----   -----
node1   1
node2   2
node3   3
node4   5
Total:
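The effect of unequal votes can be sketched in a few lines of Python. This is only an illustration, not code from fenced or cman: it assumes a simple strict-majority quorum rule, and `is_quorate` and `dead_splits` are hypothetical helper names. The vote total is computed from the four values listed above. The script counts the two-way partitions of a four-node cluster in which neither side is quorate.

```python
from itertools import combinations


def is_quorate(partition_votes, total_votes):
    """Assumed rule: a partition is quorate if it holds a strict majority of votes."""
    return partition_votes > total_votes // 2


def dead_splits(votes):
    """Return every two-way partition in which neither side is quorate."""
    nodes = sorted(votes)
    total = sum(votes.values())
    first, rest = nodes[0], nodes[1:]
    dead = []
    # Always put `first` on side A so each split is counted once;
    # k is the number of other nodes joining it (side B keeps at least one node).
    for k in range(len(rest)):
        for others in combinations(rest, k):
            side_a = (first,) + others
            side_b = tuple(n for n in nodes if n not in side_a)
            votes_a = sum(votes[n] for n in side_a)
            votes_b = total - votes_a
            if not is_quorate(votes_a, total) and not is_quorate(votes_b, total):
                dead.append((side_a, side_b))
    return dead


equal = {"node1": 1, "node2": 1, "node3": 1, "node4": 1}
weighted = {"node1": 1, "node2": 2, "node3": 3, "node4": 5}

print("equal votes, splits with no quorate side:", len(dead_splits(equal)))
print("weighted votes, splits with no quorate side:", len(dead_splits(weighted)))
```

Under this assumed rule an odd vote total means a two-way split can never tie. Splits into three or more groups can still leave every group inquorate, which is why weighting only reduces the likelihood.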

Re: [Cluster-devel] fence daemon problems

2012-10-03 Thread David Teigland
On Wed, Oct 03, 2012 at 04:55:55PM +, Dietmar Maurer wrote: > > The difficult cases, which I think you're seeing, are partitions where > > no group has quorum, e.g. 2/2. In this case we do nothing, and the > > user has to resolve it by resetting some of the nodes > > The problem with that is

Re: [Cluster-devel] fence daemon problems

2012-10-03 Thread Dietmar Maurer
> The intention of that is to prevent an inquorate node/partition from killing a
> quorate group of nodes that are running normally. e.g. if a 5 node cluster is
> partitioned into 2/3 or 1/4. You don't want the 2 or 1 node group to fence
> the 3 or 4 nodes that are fine.
sure, I understand that.
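The rule quoted above can be sketched as follows, assuming one vote per node and a strict-majority quorum. `is_quorate` and `may_fence` are hypothetical names for illustration, not functions from fenced.

```python
def is_quorate(partition_votes, total_votes):
    """Assumed rule: a partition is quorate if it holds a strict majority of votes."""
    return partition_votes > total_votes // 2


def may_fence(partition_sizes):
    """For each partition (one vote per node), report whether it may fence the others."""
    total = sum(partition_sizes)
    return [is_quorate(size, total) for size in partition_sizes]


# The 2/3 and 1/4 splits from the message, plus the 2/2 case raised elsewhere in the thread.
for split in ([2, 3], [1, 4], [2, 2]):
    print(split, may_fence(split))
```

In the 2/3 and 1/4 cases only the larger group is allowed to fence. In the 2/2 case neither group is, which matches the "we do nothing" behaviour described elsewhere in the thread.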

Re: [Cluster-devel] fence daemon problems

2012-10-03 Thread David Teigland
On Wed, Oct 03, 2012 at 04:26:35PM +, Dietmar Maurer wrote: > > I guess you're talking about the dlm_tool ls output? > > Yes. > > > The "fencing" there > > means it is waiting for fenced to finish fencing before it starts dlm > > recovery. > > fenced waits for quorum. > > So who actually s

Re: [Cluster-devel] fence daemon problems

2012-10-03 Thread Dietmar Maurer
> I guess you're talking about the dlm_tool ls output?
Yes.
> The "fencing" there means it is waiting for fenced to finish fencing before it starts dlm recovery.
> fenced waits for quorum.
So who actually starts fencing when the cluster is not quorate? rgmanager?

Re: [Cluster-devel] fence daemon problems

2012-10-03 Thread David Teigland
On Wed, Oct 03, 2012 at 04:12:10PM +, Dietmar Maurer wrote: > > Yes, it's a stateful partition merge, and I think /var/log/messages should > > have > > mentioned something about that. When a node is partitioned from the > > others (e.g. network disconnected), it has to be cleanly reset before

Re: [Cluster-devel] fence daemon problems

2012-10-03 Thread Dietmar Maurer
> Yes, it's a stateful partition merge, and I think /var/log/messages should > have > mentioned something about that. When a node is partitioned from the > others (e.g. network disconnected), it has to be cleanly reset before it's > allowed back. "cleanly reset" typically means rebooted. If it

Re: [Cluster-devel] fence daemon problems

2012-10-03 Thread Dietmar Maurer
> Subject: Re: [Cluster-devel] fence daemon problems > > On Wed, Oct 03, 2012 at 09:25:08AM +, Dietmar Maurer wrote: > > So the observed behavior is expected? > > Yes, it's a stateful partition merge, and I think /var/log/messages should > have > mention

Re: [Cluster-devel] fence daemon problems

2012-10-03 Thread David Teigland
On Wed, Oct 03, 2012 at 09:25:08AM +, Dietmar Maurer wrote: > So the observed behavior is expected? Yes, it's a stateful partition merge, and I think /var/log/messages should have mentioned something about that. When a node is partitioned from the others (e.g. network disconnected), it has t

Re: [Cluster-devel] fence daemon problems

2012-10-03 Thread Dietmar Maurer
> I observe strange problems with fencing when a cluster loses quorum for a > short time. > > After regaining quorum, fenced reports 'wait state messages', and the whole > cluster is blocked waiting for fenced. Just found the following in fenced/cpg.c: /* This is how we deal with cpg