Alan

So you don't like the idea of a "quorum timeout" on start either?

I'm thinking of the "use case" where a multi-node cluster is spread over
two sites and there is a problem at one site making all the nodes of one
site unavailable.

Not only would you want the cluster to recover/continue, but if you
need to restart the cluster in this event, it is a PITA to have to
temporarily reconfigure the quorum just to get it to start.
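
(By "temporarily reconfigure" I mean something like the following; just
a rough sketch, and the exact corosync-quorumtool options may differ
between versions:

        # on the surviving site, lower the expected votes so quorum can
        # be reached with the nodes that are actually reachable
        corosync-quorumtool -e 1

or, before starting, dropping expected_votes in corosync.conf:

        quorum {
                provider: corosync_votequorum
                expected_votes: 1
        }

and then remembering to put it all back once the failed site returns.)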

With a "time out" value that is set relatively high, the cluster would
still be able to function (without tweaking the configuration) whilst
still protecting against premature resource starting.
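
Purely to illustrate the idea (this is hypothetical syntax, no such
option exists today), I'm picturing something along the lines of:

        quorum {
                provider: corosync_votequorum
                expected_votes: 2
                # hypothetical: if full quorum has not been seen within
                # this many seconds of start-up, allow whatever nodes
                # are present to proceed
                quorum_startup_timeout: 300
        }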

Darren

On Tue, 2010-05-11 at 09:17 -0700, Alan Jones wrote:

> I have fixed constraints to design for: allowing the cluster to
> continue when one of two nodes fails but preventing it from
> starting with only one node.
> To implement this in a non-intrusive way I'm considering implementing
> a votequorum device, similar to the proposed qdisk, that would add a
> vote after a 2->1 transition.
> The calculation of quorum in corosync appears to be ignored by
> pacemaker, which looks only at the votes and expected votes.
> If people on the list can give me feedback on this proposal, I'd
> appreciate it.
> Alan
> 
> 
> On Tue, May 11, 2010 at 1:00 AM, Darren Thompson
> <[email protected]> wrote:
> 
>         I'm familiar with other clustering software and the more
>         traditional approach is to have a quorum requirement (to stop
>         the first node started from grabbing all the cluster
>         resources) but to implement a nominal (and hopefully
>         configurable) timeout, after which the quorum requirement is
>         lifted, allowing the cluster resources to run on whatever
>         nodes are available at that point.
>         
>         This seems a reasonable and pragmatic compromise to these
>         mutually exclusive requirements and I don't imagine that would
>         be difficult to code.
>         
>         Darren 
>         
>         
>         
>         
>         
>         On Tue, 2010-05-11 at 08:10 +0100, Christine Caulfield wrote:
>         > 
>         > On 10/05/10 23:22, Alan Jones wrote:
>         > > Putting the expected votes to one in both corosync and pacemaker
>         > > allows the cluster to start with one node (not what I want).
>         > 
>         > Sorry, but you can't have it both ways. Either the cluster is
>         > allowed to run with 1 node or it isn't. There is no rule that says
>         > "I want the cluster to run with only one node ONLY if there were
>         > previously 2 nodes and one died, but not if they were booted at
>         > different times". Though we do accept patches ;-)
>         > 
>         > Chrissie
>         > 
>         > > Unfortunately, it also does not allow the cluster to continue
>         > > with 1 node after a failure because pacemaker remembers the two
>         > > node cluster and increases its expected votes.
>         > > The idea of quorum does not seem to be closely coupled between
>         > > corosync and pacemaker. Running with expected votes of two, I
>         > > halted a node and then used corosync-quorumtool to set the
>         > > surviving node's votes to two.  Now corosync says it has quorum
>         > > and pacemaker says it does not; i.e. the resources are not able
>         > > to run.
>         > > To sum up - as far as pacemaker behavior goes, the two_node
>         > > option does not seem to do anything.  Further, if I plan to do
>         > > quorum logic in corosync for the behavior I want, I will also
>         > > need to explore how to get pacemaker to use it.
>         > > Any comments are welcome.
>         > > Alan
>         > >
>         > > On Mon, May 10, 2010 at 12:17 AM, Christine Caulfield
>         > > <[email protected] <mailto:[email protected]>> wrote:
>         > >
>         > >     On 08/05/10 01:02, Alan Jones wrote:
>         > >
>         > >         I'd like to modify the quorum behavior to require 2 nodes
>         > >         to start the cluster but allow it to continue with only 1
>         > >         node after a failure.
>         > >         It seemed that the two_node option used with the
>         > >         votequorum provider might provide what I'm looking for
>         > >         (corosync.conf section below).
>         > >         However, I'm getting the first behavior (requiring 2
>         > >         nodes to start) without the second (continue with only
>         > >         1 node).
>         > >         Should I provide a votequorum device to add another vote
>         > >         after a failure?
>         > >         Any other ideas?
>         > >         Alan
>         > >         ---
>         > >         quorum {
>         > >                  provider: corosync_votequorum
>         > >                  expected_votes: 2
>         > >                  votes: 1
>         > >                  two_node: 1
>         > >         }
>         > >
>         > >
>         > >
>         > >     expected_votes should be set to 1 if you're using the two_node
>         > >     option. If you set it to 2, then it will always need both
>         > >     nodes to be up ... as you've discovered ;-)
>         > >
>         > >     Chrissie
>         > >
>         > >
>         > 
>         > _______________________________________________
>         > Openais mailing list
>         > [email protected]
>         > https://lists.linux-foundation.org/mailman/listinfo/openais
> 
> 
_______________________________________________
Openais mailing list
[email protected]
https://lists.linux-foundation.org/mailman/listinfo/openais
