On 8/12/19 3:24 PM, Klaus Wenninger wrote:
> On 8/12/19 2:30 PM, Yan Gao wrote:
>> Hi Klaus,
>>
>> On 8/12/19 1:39 PM, Klaus Wenninger wrote:
>>> On 8/9/19 9:06 PM, Yan Gao wrote:
>>>> On 8/9/19 6:40 PM, Andrei Borzenkov wrote:
>>>>> 09.08.2019 16:34, Yan Gao wrote:
>>>>>> Hi,
>>>>>>
>>>>>> With disk-less sbd, it's fine to stop the cluster service on all of
>>>>>> the cluster nodes at the same time.
>>>>>>
>>>>>> But if the nodes are stopped one by one, for example with a 3-node
>>>>>> cluster, after stopping the 2nd node, the only remaining node resets
>>>>>> itself with:
>>>>>>
>>>>> That is sort of documented in the SBD manual page:
>>>>>
>>>>> --><--
>>>>> However, while the cluster is in such a degraded state, it can
>>>>> neither successfully fence nor be shutdown cleanly (as taking the
>>>>> cluster below the quorum threshold will immediately cause all remaining
>>>>> nodes to self-fence).
>>>>> --><--
>>>>>
>>>>> SBD in shared-nothing mode is basically always in such a degraded state
>>>>> and cannot tolerate loss of quorum.
>>>> Well, the context here is that it loses quorum *expectedly* since the
>>>> other nodes gracefully shut down.
>>>>
>>>>>> Aug 09 14:30:20 opensuse150-1 sbd[1079]: pcmk: debug:
>>>>>> notify_parent: Not notifying parent: state transient (2)
>>>>>> Aug 09 14:30:20 opensuse150-1 sbd[1080]: cluster: debug:
>>>>>> notify_parent: Notifying parent: healthy
>>>>>> Aug 09 14:30:20 opensuse150-1 sbd[1078]: warning: inquisitor_child:
>>>>>> Latency: No liveness for 4 s exceeds threshold of 3 s (healthy servants:
>>>>>> 0)
>>>>>>
>>>>>> I can think of ways to manipulate quorum with last_man_standing and
>>>>>> potentially also auto_tie_breaker, not to mention that
>>>>>> last_man_standing_window would also be a factor... But is there a
>>>>>> better solution?
>>>>>>
>>>>> The lack of a cluster-wide shutdown mode has been mentioned more than
>>>>> once on this list. I guess the only workaround is to use higher-level
>>>>> tools which basically just try to stop the cluster on all nodes at
>>>>> once. That is still susceptible to a race condition.
>>>> Gracefully stopping nodes one by one on purpose is still a reasonable
>>>> need though ...
>>> If you do the teardown as e.g. pcs does it - first tear down the
>>> pacemaker instances and then corosync/sbd - it is at least possible
>>> to tear down the pacemaker instances one by one without risking a
>>> reboot due to quorum loss.
>>> With a fairly current sbd that includes
>>> - https://github.com/ClusterLabs/sbd/commit/824fe834c67fb7bae7feb87607381f9fa8fa2945
>>> - https://github.com/ClusterLabs/sbd/commit/79b778debfee5b4ab2d099b2bfc7385f45597f70
>>> - https://github.com/ClusterLabs/sbd/commit/a716a8ddd3df615009bcff3bd96dd9ae64cb5f68
>>> this should be pretty robust, although we are still thinking
>>> (probably together with some heartbeat to pacemakerd that assures
>>> pacemakerd is checking the liveness of its sub-daemons properly)
>>> about a cleaner way to detect a graceful pacemaker shutdown.
>> These are all good improvements, thanks!
>>
>> But in this case the remaining node is not shutting down yet, or rather
>> it's intentionally not being shut down :-) The loss of quorum is expected,
>> and so is following no-quorum-policy, but a self-reset is probably too much?
> Hmm ... not sure if I can follow ...
> If you shut down solely pacemaker one by one on all nodes
> and these shutdowns are considered graceful then you are
> not gonna experience any reboots (e.g. 3-node cluster).
> Afterwards you can shut down corosync one by one as well
> without experiencing reboots, as without the cib connection
> sbd isn't gonna check for quorum anymore (all resources are
> down, so no need to reboot in case of quorum loss - extra
> care has to be taken with unmanaged resources, but that
> isn't specific to sbd).

I meant that if users would like to shut down only 2 out of the 3 nodes in
the cluster and keep the last one online and alive, that's simply not
possible for now, even though the loss of quorum is expected (see the
sketches below for what I mean).
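
Just to confirm we mean the same procedure: on a systemd-based setup the
teardown order you describe would be roughly the following, run on each node
in turn (only a sketch; sbd usually stops together with corosync):

    # Step 1: stop pacemaker on every node, one by one. These shutdowns are
    # graceful, so sbd has no reason to self-fence on the resulting loss of
    # quorum.
    systemctl stop pacemaker

    # Step 2: once pacemaker is down everywhere, stop corosync (and with it
    # sbd) on each node. Without the cib connection sbd no longer checks
    # quorum.
    systemctl stop corosync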
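
And the quorum manipulation I mentioned earlier would be something along
these lines in corosync.conf - just a sketch, I haven't verified that this
combination actually keeps a single remaining node alive as wanted:

    quorum {
        provider: corosync_votequorum
        expected_votes: 3
        # recalculate quorum as nodes leave gracefully
        last_man_standing: 1
        # wait time (ms) before the quorum recalculation, default 10000
        last_man_standing_window: 10000
        # should allow going from 2 remaining nodes down to 1
        auto_tie_breaker: 1
    }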
Regards,
  Yan