Re: [ClusterLabs] Antw: Re: questions about startup fencing

2017-12-05 Thread Jan Pokorný
On 05/12/17 10:01 +0100, Tomas Jelinek wrote:
> The first attempt to fix the issue was to put nodes into standby mode with
> --lifetime=reboot:
> https://github.com/ClusterLabs/pcs/commit/ea6f37983191776fd46d90f22dc1432e0bfc0b91
> 
> This didn't work for several reasons. One of them was that, back then, there
> was no reliable way to set standby mode with --lifetime=reboot for more than
> one node in a single step. (This may have been fixed in the meantime.) There
> were, however, other serious reasons for not putting the nodes into standby,
> as explained by Andrew:
> - it [putting the nodes into standby first] means shutdown takes longer (no
> node stops until all the resources stop)
> - it makes shutdown more complex (== more fragile), e.g. ...
> - it could result in pcs waiting forever for resources to stop
>   - if a stop fails and the cluster is configured to start at boot, then the
> node will get fenced and happily run resources when it returns
> (because all the nodes are up so we still have quorum)

Isn't a one-off stop of a cluster, without actually disabling the cluster
software from running on boot, rather antithetical?

And besides, isn't this resurrection scenario also possible with the
current parallel (hence race-prone) stop in such a case, anyway?
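
For illustration only, and assuming a pcs version that supports these
subcommands, "actually disabling cluster software to run on boot" would be
something along these lines, so that a node fenced during the stop cannot
come back and happily run resources on its own:

    # keep corosync/pacemaker from starting automatically on boot, on all nodes
    pcs cluster disable --all

    # or, equivalently, per node via systemd
    systemctl disable corosync pacemaker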

> - only potentially benefits resources that have no (or very few) dependants
> and can stop quicker than it takes pcs to get through its "initiate parallel
> shutdown" loop (which should be rather fast since there is no ssh connection
> setup overheads)
> 
> So we ended up with just stopping pacemaker in parallel:
> https://github.com/ClusterLabs/pcs/commit/1ab2dd1b13839df7e5e9809cde25ac1dbae42c3d

-- 
Jan (Poki)




Re: [ClusterLabs] Antw: Re: questions about startup fencing

2017-12-05 Thread Jehan-Guillaume de Rorthais
On Tue, 5 Dec 2017 10:05:03 +0100
Tomas Jelinek  wrote:

> Dne 4.12.2017 v 17:21 Jehan-Guillaume de Rorthais napsal(a):
> > On Mon, 4 Dec 2017 16:50:47 +0100
> > Tomas Jelinek  wrote:
> >   
> >> Dne 4.12.2017 v 14:21 Jehan-Guillaume de Rorthais napsal(a):  
> >>> On Mon, 4 Dec 2017 12:31:06 +0100
> >>> Tomas Jelinek  wrote:
> >>>  
>  Dne 4.12.2017 v 10:36 Jehan-Guillaume de Rorthais napsal(a):  
> > On Fri, 01 Dec 2017 16:34:08 -0600
> > Ken Gaillot  wrote:
> > 
> >> On Thu, 2017-11-30 at 07:55 +0100, Ulrich Windl wrote:  
> >>>
> >>>
>  Kristoffer Gronlund  wrote:  
> > Adam Spiers  writes:
> >
> >> - The whole cluster is shut down cleanly.
> >>
> >> - The whole cluster is then started up again.  (Side question:
> >> what
> >>  happens if the last node to shut down is not the first to
> >> start up?
> >>  How will the cluster ensure it has the most recent version of
> >> the
> >>  CIB?  Without that, how would it know whether the last man
> >> standing
> >>  was shut down cleanly or not?)  
> >
> > This is my opinion, I don't really know what the "official"
> > pacemaker
> > stance is: There is no such thing as shutting down a cluster
> > cleanly. A
> > cluster is a process stretching over multiple nodes - if they all
> > shut
> > down, the process is gone. When you start up again, you
> > effectively have
> > a completely new cluster.  
> 
>  Sorry, I don't follow you at all here.  When you start the cluster
>  up
>  again, the cluster config from before the shutdown is still there.
>  That's very far from being a completely new cluster :-)  
> >>>
> >>> The problem is you cannot "start the cluster" in pacemaker; you can
> >>> only "start nodes". The nodes will come up one by one. As opposed (as
> >>> I had said) to HP Service Guard, where there is a "cluster formation
> >>> timeout". That is, the nodes wait for the specified time for the
> >>> cluster to "form". Then the cluster starts as a whole. Of course that
> >>> only applies if the whole cluster was down, not if a single node was
> >>> down.  
> >>
> >> I'm not sure what that would specifically entail, but I'm guessing we
> >> have some of the pieces already:
> >>
> >> - Corosync has a wait_for_all option if you want the cluster to be
> >> unable to have quorum at start-up until every node has joined. I don't
> >> think you can set a timeout that cancels it, though.
> >>
> >> - Pacemaker will wait dc-deadtime for the first DC election to
> >> complete. (if I understand it correctly ...)
> >>
> >> - Higher-level tools can start or stop all nodes together (e.g. pcs has
> >> pcs cluster start/stop --all).  
> >
> > Based on this discussion, I have some questions about pcs:
> >
> > * how is it shutting down the cluster when issuing "pcs cluster stop
> > --all"?  
> 
>  First, it sends a request to each node to stop pacemaker. The requests
>  are sent in parallel which prevents resources from being moved from node
>  to node. Once pacemaker stops on all nodes, corosync is stopped on all
>  nodes in the same manner.  
> >>>
> >>> What if, for some external reason, one node is slower (load, network,
> >>> whatever) than the others and starts reacting late? Sending queries in
> >>> parallel doesn't feel safe enough with regard to all the race conditions
> >>> that can occur at the same time.
> >>>
> >>> Am I missing something?
> >>>  
> >>
> >> If a node gets the request later than others, some resources may be
> >> moved to it before it starts shutting down pacemaker as well. Pcs waits
> >> for all nodes to shutdown pacemaker before it moves to shutting down
> >> corosync. This way, quorum is maintained the whole time pacemaker is
> >> shutting down and therefore no services are blocked from stopping due to
> >> lack of quorum.  
> > 
> > OK, so if admins or RAs expect to start under the same conditions the
> > cluster was shut down in, we have to take care of the shutdown ourselves by
> > hand. Disabling the resources before shutting down might be the best option
> > in that situation, as the CRM will take care of switching things off
> > correctly in a proper transition.
> 
> My understanding is that pacemaker takes care of switching off things 
> correctly in a proper transition on its shutdown. So there should be no 
> extra care needed. Pacemaker developers, however, need to confirm that.

Sure, but then, the resource would move away from the node if some other
node(s) (with appropriate 

Re: [ClusterLabs] Antw: Re: questions about startup fencing

2017-12-05 Thread Tomas Jelinek

Dne 4.12.2017 v 17:21 Jehan-Guillaume de Rorthais napsal(a):

On Mon, 4 Dec 2017 16:50:47 +0100
Tomas Jelinek  wrote:


Dne 4.12.2017 v 14:21 Jehan-Guillaume de Rorthais napsal(a):

On Mon, 4 Dec 2017 12:31:06 +0100
Tomas Jelinek  wrote:
   

Dne 4.12.2017 v 10:36 Jehan-Guillaume de Rorthais napsal(a):

On Fri, 01 Dec 2017 16:34:08 -0600
Ken Gaillot  wrote:
  

On Thu, 2017-11-30 at 07:55 +0100, Ulrich Windl wrote:


 

Kristoffer Gronlund  wrote:

Adam Spiers  writes:
 

- The whole cluster is shut down cleanly.

- The whole cluster is then started up again.  (Side question:
what
     happens if the last node to shut down is not the first to
start up?
     How will the cluster ensure it has the most recent version of
the
     CIB?  Without that, how would it know whether the last man
standing
     was shut down cleanly or not?)


This is my opinion, I don't really know what the "official"
pacemaker
stance is: There is no such thing as shutting down a cluster
cleanly. A
cluster is a process stretching over multiple nodes - if they all
shut
down, the process is gone. When you start up again, you
effectively have
a completely new cluster.


Sorry, I don't follow you at all here.  When you start the cluster
up
again, the cluster config from before the shutdown is still there.
That's very far from being a completely new cluster :-)


The problem is you cannot "start the cluster" in pacemaker; you can
only "start nodes". The nodes will come up one by one. As opposed (as
I had said) to HP Service Guard, where there is a "cluster formation
timeout". That is, the nodes wait for the specified time for the
cluster to "form". Then the cluster starts as a whole. Of course that
only applies if the whole cluster was down, not if a single node was
down.


I'm not sure what that would specifically entail, but I'm guessing we
have some of the pieces already:

- Corosync has a wait_for_all option if you want the cluster to be
unable to have quorum at start-up until every node has joined. I don't
think you can set a timeout that cancels it, though.

- Pacemaker will wait dc-deadtime for the first DC election to
complete. (if I understand it correctly ...)

- Higher-level tools can start or stop all nodes together (e.g. pcs has
pcs cluster start/stop --all).


Based on this discussion, I have some questions about pcs:

* how is it shutting down the cluster when issuing "pcs cluster stop
--all"?


First, it sends a request to each node to stop pacemaker. The requests
are sent in parallel which prevents resources from being moved from node
to node. Once pacemaker stops on all nodes, corosync is stopped on all
nodes in the same manner.


What if, for some external reason, one node is slower (load, network,
whatever) than the others and starts reacting late? Sending queries in
parallel doesn't feel safe enough with regard to all the race conditions
that can occur at the same time.

Am I missing something?
   


If a node gets the request later than others, some resources may be
moved to it before it starts shutting down pacemaker as well. Pcs waits
for all nodes to shutdown pacemaker before it moves to shutting down
corosync. This way, quorum is maintained the whole time pacemaker is
shutting down and therefore no services are blocked from stopping due to
lack of quorum.


OK, so if admins or RAs expect to start under the same conditions the cluster
was shut down in, we have to take care of the shutdown ourselves by hand.
Disabling the resources before shutting down might be the best option in that
situation, as the CRM will take care of switching things off correctly in a
proper transition.


My understanding is that pacemaker takes care of switching off things 
correctly in a proper transition on its shutdown. So there should be no 
extra care needed. Pacemaker developers, however, need to confirm that.




That's fine by me, as a cluster shutdown should be part of a controlled
procedure. I suppose I have to update my online docs now.

Thank you for your answers!





Re: [ClusterLabs] Antw: Re: questions about startup fencing

2017-12-05 Thread Tomas Jelinek

Dne 4.12.2017 v 23:17 Ken Gaillot napsal(a):

On Mon, 2017-12-04 at 22:08 +0300, Andrei Borzenkov wrote:

04.12.2017 18:47, Tomas Jelinek пишет:

Dne 4.12.2017 v 16:02 Kristoffer Grönlund napsal(a):

Tomas Jelinek  writes:



* how is it shutting down the cluster when issuing "pcs
cluster stop
--all"?


First, it sends a request to each node to stop pacemaker. The
requests
are sent in parallel which prevents resources from being moved
from node
to node. Once pacemaker stops on all nodes, corosync is stopped
on all
nodes in the same manner.


* any race condition possible where the cib will record only
one
node up before
     the last one shut down?
* will the cluster start safely?


That definitely sounds racy to me. The best idea I can think of
would be
to set all nodes except one in standby, and then shutdown
pacemaker
everywhere...



What issues does it solve? Which node should be the one?

How do you get the nodes out of standby mode on startup?


Is --lifetime=reboot valid for cluster properties? It is accepted by
crm_attribute and actually stores the value as a transient attribute.


standby is a node attribute, so lifetime does apply normally.



Right, I forgot about this.

I was dealing with 'pcs cluster stop --all' back in January 2015, so I 
don't remember all the details anymore. However, I was able to dig out 
the private email thread where stopping a cluster was discussed with 
pacemaker developers including Andrew Beekhof and David Vossel.


Originally, pcs stopped nodes in parallel in such a way that each node
stopped pacemaker and then corosync independently of the other nodes. This
caused a loss of quorum while the cluster was stopping, as nodes hosting
fast-stopping resources disconnected from corosync sooner than nodes hosting
slow-stopping resources. With quorum missing, some resources could not be
stopped and the cluster stop failed. This is covered here:

https://bugzilla.redhat.com/show_bug.cgi?id=1180506

The first attempt to fix the issue was to put nodes into standby mode 
with --lifetime=reboot:

https://github.com/ClusterLabs/pcs/commit/ea6f37983191776fd46d90f22dc1432e0bfc0b91

This didn't work for several reasons. One of them was that, back then, there
was no reliable way to set standby mode with --lifetime=reboot for more than
one node in a single step. (This may have been fixed in the meantime.) There
were, however, other serious reasons for not putting the nodes into standby,
as explained by Andrew:
- it [putting the nodes into standby first] means shutdown takes longer 
(no node stops until all the resources stop)

- it makes shutdown more complex (== more fragile), e.g. ...
- it could result in pcs waiting forever for resources to stop
  - if a stop fails and the cluster is configured to start at boot, 
then the node will get fenced and happily run resources when it returns 
(because all the nodes are up so we still have quorum)
- only potentially benefits resources that have no (or very few) 
dependants and can stop quicker than it takes pcs to get through its 
"initiate parallel shutdown" loop (which should be rather fast since 
there is no ssh connection setup overheads)


So we ended up with just stopping pacemaker in parallel:
https://github.com/ClusterLabs/pcs/commit/1ab2dd1b13839df7e5e9809cde25ac1dbae42c3d
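
For illustration only, a rough shell sketch of the two orderings discussed
above; pcs talks to its own node daemon rather than using ssh, and the node
names are placeholders, so this only shows the shape of the operations:

    # Old (problematic) behaviour: each node stops pacemaker and then corosync
    # on its own, so fast nodes leave corosync while slow nodes are still
    # stopping resources, and the remaining members can lose quorum mid-shutdown.
    for n in node1 node2 node3; do
        ssh "$n" 'systemctl stop pacemaker && systemctl stop corosync' &
    done
    wait

    # Current behaviour: stop pacemaker everywhere first (quorum is held the
    # whole time resources are being stopped), then stop corosync everywhere.
    for n in node1 node2 node3; do
        ssh "$n" 'systemctl stop pacemaker' &
    done
    wait
    for n in node1 node2 node3; do
        ssh "$n" 'systemctl stop corosync' &
    done
    wait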

I hope this sheds light on why pcs stops clusters the way it does, and shows
that standby was considered but rejected for good reasons.


Regards,
Tomas



Re: [ClusterLabs] Antw: Re: questions about startup fencing

2017-12-04 Thread Ken Gaillot
On Mon, 2017-12-04 at 22:08 +0300, Andrei Borzenkov wrote:
> 04.12.2017 18:47, Tomas Jelinek пишет:
> > Dne 4.12.2017 v 16:02 Kristoffer Grönlund napsal(a):
> > > Tomas Jelinek  writes:
> > > 
> > > > > 
> > > > > * how is it shutting down the cluster when issuing "pcs
> > > > > cluster stop
> > > > > --all"?
> > > > 
> > > > First, it sends a request to each node to stop pacemaker. The
> > > > requests
> > > > are sent in parallel which prevents resources from being moved
> > > > from node
> > > > to node. Once pacemaker stops on all nodes, corosync is stopped
> > > > on all
> > > > nodes in the same manner.
> > > > 
> > > > > * any race condition possible where the cib will record only
> > > > > one
> > > > > node up before
> > > > >     the last one shut down?
> > > > > * will the cluster start safely?
> > > 
> > > That definitely sounds racy to me. The best idea I can think of
> > > would be
> > > to set all nodes except one in standby, and then shutdown
> > > pacemaker
> > > everywhere...
> > > 
> > 
> > What issues does it solve? Which node should be the one?
> > 
> > How do you get the nodes out of standby mode on startup?
> 
> Is --lifetime=reboot valid for cluster properties? It is accepted by
> crm_attribute and actually stores the value as a transient attribute.

standby is a node attribute, so lifetime does apply normally.
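
For example (a sketch only; the node name is a placeholder):

    # put a node into standby as a transient attribute, i.e. it is cleared
    # when the node leaves and rejoins the cluster
    crm_attribute --node node1 --name standby --update on --lifetime reboot

    # clear it again without waiting for the node to leave
    crm_attribute --node node1 --name standby --delete --lifetime reboot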
-- 
Ken Gaillot 



Re: [ClusterLabs] Antw: Re: questions about startup fencing

2017-12-04 Thread Andrei Borzenkov
04.12.2017 18:47, Tomas Jelinek пишет:
> Dne 4.12.2017 v 16:02 Kristoffer Grönlund napsal(a):
>> Tomas Jelinek  writes:
>>

 * how is it shutting down the cluster when issuing "pcs cluster stop
 --all"?
>>>
>>> First, it sends a request to each node to stop pacemaker. The requests
>>> are sent in parallel which prevents resources from being moved from node
>>> to node. Once pacemaker stops on all nodes, corosync is stopped on all
>>> nodes in the same manner.
>>>
 * any race condition possible where the cib will record only one
 node up before
     the last one shut down?
 * will the cluster start safely?
>>
>> That definitely sounds racy to me. The best idea I can think of would be
>> to set all nodes except one in standby, and then shutdown pacemaker
>> everywhere...
>>
> 
> What issues does it solve? Which node should be the one?
> 
> How do you get the nodes out of standby mode on startup?

Is --lifetime=reboot valid for cluster properties? It is accepted by
crm_attribute and actually stores the value as a transient attribute.



Re: [ClusterLabs] Antw: Re: questions about startup fencing

2017-12-04 Thread Tomas Jelinek

Dne 4.12.2017 v 14:21 Jehan-Guillaume de Rorthais napsal(a):

On Mon, 4 Dec 2017 12:31:06 +0100
Tomas Jelinek  wrote:


Dne 4.12.2017 v 10:36 Jehan-Guillaume de Rorthais napsal(a):

On Fri, 01 Dec 2017 16:34:08 -0600
Ken Gaillot  wrote:
   

On Thu, 2017-11-30 at 07:55 +0100, Ulrich Windl wrote:


  

Kristoffer Gronlund  wrote:

Adam Spiers  writes:
  

- The whole cluster is shut down cleanly.

- The whole cluster is then started up again.  (Side question:
what
    happens if the last node to shut down is not the first to
start up?
    How will the cluster ensure it has the most recent version of
the
    CIB?  Without that, how would it know whether the last man
standing
    was shut down cleanly or not?)


This is my opinion, I don't really know what the "official"
pacemaker
stance is: There is no such thing as shutting down a cluster
cleanly. A
cluster is a process stretching over multiple nodes - if they all
shut
down, the process is gone. When you start up again, you
effectively have
a completely new cluster.


Sorry, I don't follow you at all here.  When you start the cluster
up
again, the cluster config from before the shutdown is still there.
That's very far from being a completely new cluster :-)


The problem is you cannot "start the cluster" in pacemaker; you can
only "start nodes". The nodes will come up one by one. As opposed (as
I had said) to HP Service Guard, where there is a "cluster formation
timeout". That is, the nodes wait for the specified time for the
cluster to "form". Then the cluster starts as a whole. Of course that
only applies if the whole cluster was down, not if a single node was
down.


I'm not sure what that would specifically entail, but I'm guessing we
have some of the pieces already:

- Corosync has a wait_for_all option if you want the cluster to be
unable to have quorum at start-up until every node has joined. I don't
think you can set a timeout that cancels it, though.

- Pacemaker will wait dc-deadtime for the first DC election to
complete. (if I understand it correctly ...)

- Higher-level tools can start or stop all nodes together (e.g. pcs has
pcs cluster start/stop --all).


Based on this discussion, I have some questions about pcs:

* how is it shutting down the cluster when issuing "pcs cluster stop
--all"?


First, it sends a request to each node to stop pacemaker. The requests
are sent in parallel which prevents resources from being moved from node
to node. Once pacemaker stops on all nodes, corosync is stopped on all
nodes in the same manner.


What if, for some external reason, one node is slower (load, network, whatever)
than the others and starts reacting late? Sending queries in parallel doesn't
feel safe enough with regard to all the race conditions that can occur at the
same time.

Am I missing something?



If a node gets the request later than others, some resources may be 
moved to it before it starts shutting down pacemaker as well. Pcs waits 
for all nodes to shutdown pacemaker before it moves to shutting down 
corosync. This way, quorum is maintained the whole time pacemaker is 
shutting down and therefore no services are blocked from stopping due to 
lack of quorum.




Re: [ClusterLabs] Antw: Re: questions about startup fencing

2017-12-04 Thread Tomas Jelinek

Dne 4.12.2017 v 16:02 Kristoffer Grönlund napsal(a):

Tomas Jelinek  writes:



* how is it shutting down the cluster when issuing "pcs cluster stop --all"?


First, it sends a request to each node to stop pacemaker. The requests
are sent in parallel which prevents resources from being moved from node
to node. Once pacemaker stops on all nodes, corosync is stopped on all
nodes in the same manner.


* any race condition possible where the cib will record only one node up before
the last one shut down?
* will the cluster start safely?


That definitely sounds racy to me. The best idea I can think of would be
to set all nodes except one in standby, and then shutdown pacemaker
everywhere...



What issues does it solve? Which node should be the one?

How do you get the nodes out of standby mode on startup? Sure, 'pcs
cluster start --all' could do that, if it is used to start the cluster,
that is. What if you start the cluster by restarting the nodes? Or by
starting corosync and pacemaker via systemd without using pcs? Or by any
other method?


There is no reliable way to get nodes out of standby/maintenance mode on 
start, so we must stick to simple pacemaker shutdown.


Moreover, even if pcs is used, how do we know a node was put into 
standby because the whole cluster was stopped and not because a user set 
it to standby manually for whatever reason?




Re: [ClusterLabs] Antw: Re: questions about startup fencing

2017-12-04 Thread Klaus Wenninger
On 12/04/2017 04:02 PM, Kristoffer Grönlund wrote:
> Tomas Jelinek  writes:
>
>>> * how is it shutting down the cluster when issuing "pcs cluster stop --all"?
>> First, it sends a request to each node to stop pacemaker. The requests 
>> are sent in parallel which prevents resources from being moved from node 
>> to node. Once pacemaker stops on all nodes, corosync is stopped on all 
>> nodes in the same manner.
>>
>>> * any race condition possible where the cib will record only one node up 
>>> before
>>>the last one shut down?
>>> * will the cluster start safely?
> That definitely sounds racy to me. The best idea I can think of would be
> to set all nodes except one in standby, and then shutdown pacemaker
> everywhere...

Do you really mean standby, or rather maintenance, to keep resources
from switching to the still-alive nodes during shutdown?
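
Roughly, for illustration (the node name is a placeholder; there is also a
per-node maintenance attribute besides the cluster-wide property shown here):

    # standby: the node stays in the cluster but its resources are moved away
    crm_attribute --node node1 --name standby --update on

    # maintenance-mode: resources stay where they are but become unmanaged,
    # so the cluster neither moves nor stops them
    pcs property set maintenance-mode=true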

Regards,
Klaus



Re: [ClusterLabs] Antw: Re: questions about startup fencing

2017-12-04 Thread Kristoffer Grönlund
Tomas Jelinek  writes:

>> 
>> * how is it shutting down the cluster when issuing "pcs cluster stop --all"?
>
> First, it sends a request to each node to stop pacemaker. The requests 
> are sent in parallel which prevents resources from being moved from node 
> to node. Once pacemaker stops on all nodes, corosync is stopped on all 
> nodes in the same manner.
>
>> * any race condition possible where the cib will record only one node up 
>> before
>>the last one shut down?
>> * will the cluster start safely?

That definitely sounds racy to me. The best idea I can think of would be
to set all nodes except one in standby, and then shutdown pacemaker
everywhere...

-- 
// Kristoffer Grönlund
// kgronl...@suse.com



Re: [ClusterLabs] Antw: Re: questions about startup fencing

2017-12-04 Thread Jehan-Guillaume de Rorthais
On Mon, 4 Dec 2017 12:31:06 +0100
Tomas Jelinek  wrote:

> Dne 4.12.2017 v 10:36 Jehan-Guillaume de Rorthais napsal(a):
> > On Fri, 01 Dec 2017 16:34:08 -0600
> > Ken Gaillot  wrote:
> >   
> >> On Thu, 2017-11-30 at 07:55 +0100, Ulrich Windl wrote:  
> >>>
> >>>  
>  Kristoffer Gronlund  wrote:  
> > Adam Spiers  writes:
> >  
> >> - The whole cluster is shut down cleanly.
> >>
> >> - The whole cluster is then started up again.  (Side question:
> >> what
> >>    happens if the last node to shut down is not the first to
> >> start up?
> >>    How will the cluster ensure it has the most recent version of
> >> the
> >>    CIB?  Without that, how would it know whether the last man
> >> standing
> >>    was shut down cleanly or not?)  
> >
> > This is my opinion, I don't really know what the "official"
> > pacemaker
> > stance is: There is no such thing as shutting down a cluster
> > cleanly. A
> > cluster is a process stretching over multiple nodes - if they all
> > shut
> > down, the process is gone. When you start up again, you
> > effectively have
> > a completely new cluster.  
> 
>  Sorry, I don't follow you at all here.  When you start the cluster
>  up
>  again, the cluster config from before the shutdown is still there.
>  That's very far from being a completely new cluster :-)  
> >>>
> >>> The problem is you cannot "start the cluster" in pacemaker; you can
> >>> only "start nodes". The nodes will come up one by one. As opposed (as
> >>> I had said) to HP Service Guard, where there is a "cluster formation
> >>> timeout". That is, the nodes wait for the specified time for the
> >>> cluster to "form". Then the cluster starts as a whole. Of course that
> >>> only applies if the whole cluster was down, not if a single node was
> >>> down.  
> >>
> >> I'm not sure what that would specifically entail, but I'm guessing we
> >> have some of the pieces already:
> >>
> >> - Corosync has a wait_for_all option if you want the cluster to be
> >> unable to have quorum at start-up until every node has joined. I don't
> >> think you can set a timeout that cancels it, though.
> >>
> >> - Pacemaker will wait dc-deadtime for the first DC election to
> >> complete. (if I understand it correctly ...)
> >>
> >> - Higher-level tools can start or stop all nodes together (e.g. pcs has
> >> pcs cluster start/stop --all).  
> > 
> > Based on this discussion, I have some questions about pcs:
> > 
> > * how is it shutting down the cluster when issuing "pcs cluster stop
> > --all"?  
> 
> First, it sends a request to each node to stop pacemaker. The requests 
> are sent in parallel which prevents resources from being moved from node 
> to node. Once pacemaker stops on all nodes, corosync is stopped on all 
> nodes in the same manner.

What if, for some external reason, one node is slower (load, network, whatever)
than the others and starts reacting late? Sending queries in parallel doesn't
feel safe enough with regard to all the race conditions that can occur at the
same time.

Am I missing something?



Re: [ClusterLabs] Antw: Re: questions about startup fencing

2017-12-04 Thread Jehan-Guillaume de Rorthais
On Fri, 01 Dec 2017 16:34:08 -0600
Ken Gaillot  wrote:

> On Thu, 2017-11-30 at 07:55 +0100, Ulrich Windl wrote:
> > 
> >   
> > > Kristoffer Gronlund  wrote:  
> > > > Adam Spiers  writes:
> > > >   
> > > > > - The whole cluster is shut down cleanly.
> > > > > 
> > > > > - The whole cluster is then started up again.  (Side question:
> > > > > what
> > > > >   happens if the last node to shut down is not the first to
> > > > > start up?
> > > > >   How will the cluster ensure it has the most recent version of
> > > > > the
> > > > >   CIB?  Without that, how would it know whether the last man
> > > > > standing
> > > > >   was shut down cleanly or not?)  
> > > > 
> > > > This is my opinion, I don't really know what the "official"
> > > > pacemaker
> > > > stance is: There is no such thing as shutting down a cluster
> > > > cleanly. A
> > > > cluster is a process stretching over multiple nodes - if they all
> > > > shut
> > > > down, the process is gone. When you start up again, you
> > > > effectively have
> > > > a completely new cluster.  
> > > 
> > > Sorry, I don't follow you at all here.  When you start the cluster
> > > up
> > > again, the cluster config from before the shutdown is still there.
> > > That's very far from being a completely new cluster :-)  
> > 
> > The problem is you cannot "start the cluster" in pacemaker; you can
> > only "start nodes". The nodes will come up one by one. As opposed (as
> I had said) to HP Service Guard, where there is a "cluster formation
> > timeout". That is, the nodes wait for the specified time for the
> > cluster to "form". Then the cluster starts as a whole. Of course that
> > only applies if the whole cluster was down, not if a single node was
> > down.  
> 
> I'm not sure what that would specifically entail, but I'm guessing we
> have some of the pieces already:
> 
> - Corosync has a wait_for_all option if you want the cluster to be
> unable to have quorum at start-up until every node has joined. I don't
> think you can set a timeout that cancels it, though.
> 
> - Pacemaker will wait dc-deadtime for the first DC election to
> complete. (if I understand it correctly ...)
> 
> - Higher-level tools can start or stop all nodes together (e.g. pcs has
> pcs cluster start/stop --all).

Based on this discussion, I have some questions about pcs:

* how is it shutting down the cluster when issuing "pcs cluster stop --all"?
* any race condition possible where the cib will record only one node up before
  the last one shut down?
* will the cluster start safely?

IIRC, crmsh does not implement a full cluster shutdown, only shutting down one
node at a time. Is that because Pacemaker has no way to shut down the whole
cluster by stopping all resources everywhere while forbidding failovers in the
process?

Is it required to run a bunch of "pcs resource disable <resource>" commands
before shutting down the cluster?
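
For illustration, the approach being asked about would look roughly like
this; "<resource-id>" is just a placeholder:

    # disable every top-level resource first, so nothing is moved or restarted
    # while the cluster is going down
    pcs resource disable <resource-id>      # repeated for each resource
    pcs cluster stop --all

    # later, once the cluster has been started again
    pcs cluster start --all
    pcs resource enable <resource-id>       # repeated for each resource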

Thanks,
-- 
Jehan-Guillaume de Rorthais
Dalibo



Re: [ClusterLabs] Antw: Re: questions about startup fencing

2017-12-01 Thread Ken Gaillot
On Thu, 2017-11-30 at 07:55 +0100, Ulrich Windl wrote:
> 
> 
> > Kristoffer Gronlund  wrote:
> > > Adam Spiers  writes:
> > > 
> > > > - The whole cluster is shut down cleanly.
> > > > 
> > > > - The whole cluster is then started up again.  (Side question:
> > > > what
> > > >   happens if the last node to shut down is not the first to
> > > > start up?
> > > >   How will the cluster ensure it has the most recent version of
> > > > the
> > > >   CIB?  Without that, how would it know whether the last man
> > > > standing
> > > >   was shut down cleanly or not?)
> > > 
> > > This is my opinion, I don't really know what the "official"
> > > pacemaker
> > > stance is: There is no such thing as shutting down a cluster
> > > cleanly. A
> > > cluster is a process stretching over multiple nodes - if they all
> > > shut
> > > down, the process is gone. When you start up again, you
> > > effectively have
> > > a completely new cluster.
> > 
> > Sorry, I don't follow you at all here.  When you start the cluster
> > up
> > again, the cluster config from before the shutdown is still there.
> > That's very far from being a completely new cluster :-)
> 
> The problem is you cannot "start the cluster" in pacemaker; you can
> only "start nodes". The nodes will come up one by one. As opposed (as
> I had said) to HP Service Guard, where there is a "cluster formation
> timeout". That is, the nodes wait for the specified time for the
> cluster to "form". Then the cluster starts as a whole. Of course that
> only applies if the whole cluster was down, not if a single node was
> down.

I'm not sure what that would specifically entail, but I'm guessing we
have some of the pieces already:

- Corosync has a wait_for_all option if you want the cluster to be
unable to have quorum at start-up until every node has joined. I don't
think you can set a timeout that cancels it, though.

- Pacemaker will wait dc-deadtime for the first DC election to
complete. (if I understand it correctly ...)

- Higher-level tools can start or stop all nodes together (e.g. pcs has
pcs cluster start/stop --all).
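
For reference, a minimal sketch of the pieces above; the dc-deadtime value is
only a placeholder, and the exact syntax depends on your versions:

    # corosync.conf, quorum section: no quorum at start-up until every node
    # has joined at least once
    quorum {
        provider: corosync_votequorum
        wait_for_all: 1
    }

    # Pacemaker cluster property: how long to wait for the first DC election
    pcs property set dc-deadtime=2min

    # Higher-level tools: start/stop every node from one place
    pcs cluster start --all
    pcs cluster stop --all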

> > 
> > > When starting up, how is the cluster, at any point, to know if
> > > the
> > > cluster it has knowledge of is the "latest" cluster?
> > 
> > That was exactly my question.
> > 
> > > The next node could have a newer version of the CIB which adds
> > > yet
> > > more nodes to the cluster.
> > 
> > Yes, exactly.  If the first node to start up was not the last man
> > standing, the CIB history is effectively being forked.  So how is
> > this
> > issue avoided?
> 
> Quorum? "Cluster formation delay"?
> 
> > 
> > > The only way to bring up a cluster from being completely stopped
> > > is to
> > > treat it as creating a completely new cluster. The first node to
> > > start
> > > "creates" the cluster and later nodes join that cluster.
> > 
> > That's ignoring the cluster config, which persists even when the
> > cluster's down.
> > 
> > But to be clear, you picked a small side question from my original
> > post and answered that.  The main questions I had were about
> > startup
> > fencing :-)
-- 
Ken Gaillot 

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org