On Tue, 05 Dec 2017 08:59:55 -0600 Ken Gaillot <kgail...@redhat.com> wrote:
> On Tue, 2017-12-05 at 14:47 +0100, Ulrich Windl wrote:
> > Tomas Jelinek <tojel...@redhat.com> wrote on 04.12.2017 at 16:50
> > in message <3e60579c-0f4d-1c32-70fc-d207e0654...@redhat.com>:
> > > On 4.12.2017 at 14:21, Jehan-Guillaume de Rorthais wrote:
> > > > On Mon, 4 Dec 2017 12:31:06 +0100, Tomas Jelinek
> > > > <tojel...@redhat.com> wrote:
> > > > > On 4.12.2017 at 10:36, Jehan-Guillaume de Rorthais wrote:
> > > > > > On Fri, 01 Dec 2017 16:34:08 -0600, Ken Gaillot
> > > > > > <kgail...@redhat.com> wrote:
> > > > > > > On Thu, 2017-11-30 at 07:55 +0100, Ulrich Windl wrote:
> > > > > > > >
> > > > > > > > > Kristoffer Gronlund <kgronl...@suse.com> wrote:
> > > > > > > > > > Adam Spiers <aspi...@suse.com> writes:
> > > > > > > > > >
> > > > > > > > > > > - The whole cluster is shut down cleanly.
> > > > > > > > > > >
> > > > > > > > > > > - The whole cluster is then started up again.
> > > > > > > > > > >   (Side question: what happens if the last node to
> > > > > > > > > > >   shut down is not the first to start up? How will
> > > > > > > > > > >   the cluster ensure it has the most recent version
> > > > > > > > > > >   of the CIB? Without that, how would it know
> > > > > > > > > > >   whether the last man standing was shut down
> > > > > > > > > > >   cleanly or not?)
> > > > > > > > > >
> > > > > > > > > > This is my opinion, I don't really know what the
> > > > > > > > > > "official" pacemaker stance is: There is no such
> > > > > > > > > > thing as shutting down a cluster cleanly. A cluster
> > > > > > > > > > is a process stretching over multiple nodes - if
> > > > > > > > > > they all shut down, the process is gone. When you
> > > > > > > > > > start up again, you effectively have a completely
> > > > > > > > > > new cluster.
> > > > > > > > >
> > > > > > > > > Sorry, I don't follow you at all here. When you start
> > > > > > > > > the cluster up again, the cluster config from before
> > > > > > > > > the shutdown is still there. That's very far from
> > > > > > > > > being a completely new cluster :-)
> > > > > > > >
> > > > > > > > The problem is you cannot "start the cluster" in
> > > > > > > > pacemaker; you can only "start nodes". The nodes will
> > > > > > > > come up one by one. As opposed (as I had said) to HP
> > > > > > > > Service Guard, where there is a "cluster formation
> > > > > > > > timeout". That is, the nodes wait for the specified
> > > > > > > > time for the cluster to "form". Then the cluster starts
> > > > > > > > as a whole. Of course that only applies if the whole
> > > > > > > > cluster was down, not if a single node was down.
> > > > > > >
> > > > > > > I'm not sure what that would specifically entail, but I'm
> > > > > > > guessing we have some of the pieces already:
> > > > > > >
> > > > > > > - Corosync has a wait_for_all option if you want the
> > > > > > >   cluster to be unable to have quorum at start-up until
> > > > > > >   every node has joined. I don't think you can set a
> > > > > > >   timeout that cancels it, though.
> > > > > > >
> > > > > > > - Pacemaker will wait dc-deadtime for the first DC
> > > > > > >   election to complete. (if I understand it correctly ...)
> > > > > > >
> > > > > > > - Higher-level tools can start or stop all nodes together
> > > > > > >   (e.g. pcs has pcs cluster start/stop --all).
> > > > > >
> > > > > > Based on this discussion, I have some questions about pcs:
> > > > > >
> > > > > > * how is it shutting down the cluster when issuing "pcs
> > > > > >   cluster stop --all"?
> > > > >
> > > > > First, it sends a request to each node to stop pacemaker. The
> > > > > requests are sent in parallel, which prevents resources from
> > > > > being moved from node to node. Once pacemaker stops on all
> > > > > nodes, corosync is stopped on all nodes in the same manner.
> > > >
> > > > What if, for some external reason (load, network, whatever),
> > > > one node is slower than the others and starts reacting later?
> > > > Sending queries in parallel doesn't feel safe enough with
> > > > regard to all the race conditions that can occur at the same
> > > > time.
> > > >
> > > > Am I missing something?
> > >
> > > If a node gets the request later than others, some resources may
> > > be moved to it before it starts shutting down pacemaker as well.
> > > Pcs waits
> >
> > I think that's impossible due to the ordering of corosync: If a
> > standby is issued, and a resource migration is the consequence,
> > every node will see the standby before it sees any other config
> > change. Right?
>
> pcs doesn't issue a standby, just a shutdown.
>
> When a node needs to shut down, it sends a shutdown request to the
> DC, which sets a "shutdown" node attribute, which tells the policy
> engine to get all resources off the node.
>
> Once all nodes have the "shutdown" node attribute set, there is
> nowhere left for resources to run, so they will be stopped rather
> than migrated. But if the resources are quicker than the attribute
> setting, they can migrate before that happens.
>
> pcs doesn't issue a standby for the reasons discussed elsewhere in
> the thread.
>
> To get a true atomic shutdown, we'd have to introduce a new crmd
> request for "shutdown all" that would result in the "shutdown"
> attribute being set for all nodes in one CIB modification.

Does it mean we could set the "shutdown" node attribute on all nodes
by hand using cibadmin? I suppose this would force the CRM to compute
the shutdown of everything in a single transition, wouldn't it?
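
For illustration, here is a minimal, untested sketch of the kind of
by-hand approach I have in mind, using crm_attribute rather than raw
cibadmin XML for brevity. It assumes "--lifetime reboot" writes the
transient attribute into the status section (where the "shutdown"
attribute normally lives), the node names are placeholders, and the
value mimics the epoch timestamp I believe crmd uses. This is not a
supported way to stop a cluster, just an illustration of the question:

    #!/bin/sh
    # Untested sketch: mimic what crmd does when a node is asked to
    # shut down, by setting the transient "shutdown" attribute on
    # every node. Node names are placeholders; this is not a
    # replacement for "pcs cluster stop --all".

    stamp=$(date +%s)   # value crmd normally uses: epoch of the request

    for node in node1 node2 node3; do
        # "--lifetime reboot" targets the transient (status section)
        # attributes rather than the permanent node attributes.
        crm_attribute --node "$node" --name shutdown \
                      --update "$stamp" --lifetime reboot
    done

Though I guess this is still one attribute update per node rather than
the single CIB modification you describe, so the same race would remain
unless the whole status section could be changed in one cibadmin call.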