On Fri, 01 Dec 2017 16:34:08 -0600 Ken Gaillot <kgail...@redhat.com> wrote:
> On Thu, 2017-11-30 at 07:55 +0100, Ulrich Windl wrote:
> >
> > > > > Kristoffer Gronlund <kgronl...@suse.com> wrote:
> > > > Adam Spiers <aspi...@suse.com> writes:
> > > >
> > > > > - The whole cluster is shut down cleanly.
> > > > >
> > > > > - The whole cluster is then started up again. (Side question:
> > > > >   what happens if the last node to shut down is not the first
> > > > >   to start up? How will the cluster ensure it has the most
> > > > >   recent version of the CIB? Without that, how would it know
> > > > >   whether the last man standing was shut down cleanly or not?)
> > > >
> > > > This is my opinion, I don't really know what the "official"
> > > > pacemaker stance is: There is no such thing as shutting down a
> > > > cluster cleanly. A cluster is a process stretching over multiple
> > > > nodes - if they all shut down, the process is gone. When you
> > > > start up again, you effectively have a completely new cluster.
> > >
> > > Sorry, I don't follow you at all here. When you start the cluster
> > > up again, the cluster config from before the shutdown is still
> > > there. That's very far from being a completely new cluster :-)
> >
> > The problem is you cannot "start the cluster" in pacemaker; you can
> > only "start nodes". The nodes will come up one by one. As opposed (as
> > I had said) to HP Service Guard, where there is a "cluster formation
> > timeout". That is, the nodes wait for the specified time for the
> > cluster to "form". Then the cluster starts as a whole. Of course that
> > only applies if the whole cluster was down, not if a single node was
> > down.
>
> I'm not sure what that would specifically entail, but I'm guessing we
> have some of the pieces already:
>
> - Corosync has a wait_for_all option if you want the cluster to be
>   unable to have quorum at start-up until every node has joined. I
>   don't think you can set a timeout that cancels it, though.
>
> - Pacemaker will wait dc-deadtime for the first DC election to
>   complete. (if I understand it correctly ...)
>
> - Higher-level tools can start or stop all nodes together (e.g. pcs has
>   pcs cluster start/stop --all).

Based on this discussion, I have some questions about pcs:

* How does it shut down the cluster when "pcs cluster stop --all" is
  issued?
* Is there any possible race condition where the CIB records only one
  node as up before the last one shuts down?
* Will the cluster start safely?

IIRC, crmsh does not implement a full cluster shutdown, only the shutdown
of one node at a time. Is that because Pacemaker has no way to shut down
the whole cluster by stopping all resources everywhere while forbidding
failovers in the process?

Is it required to run a series of "pcs resource disable <rid>" commands
before shutting down the cluster?

Thanks,
--
Jehan-Guillaume de Rorthais
Dalibo

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org