On Mon, May 28, 2018 at 10:47 AM, Klaus Wenninger <kwenn...@redhat.com> wrote:
> On 05/28/2018 09:43 AM, Klaus Wenninger wrote:
>> On 05/26/2018 07:23 AM, Andrei Borzenkov wrote:
>>> On 25.05.2018 14:44, Klaus Wenninger wrote:
>>>> On 05/25/2018 12:44 PM, Andrei Borzenkov wrote:
>>>>> On Fri, May 25, 2018 at 10:08 AM, Klaus Wenninger <kwenn...@redhat.com> wrote:
>>>>>> On 05/25/2018 07:31 AM, 井上 和徳 wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I am checking the watchdog function of SBD (without a shared block-device).
>>>>>>> In a two-node cluster, if the cluster is stopped on one node, the watchdog is
>>>>>>> triggered on the remaining node.
>>>>>>> Is this the designed behavior?
>>>>>> SBD without a shared block-device doesn't really make sense on a two-node cluster.
>>>>>> The basic idea is - e.g. in the case of a networking problem - that the cluster
>>>>>> splits up into a quorate and a non-quorate partition.
>>>>>> The quorate partition stays up, while SBD guarantees a reliable watchdog-based
>>>>>> self-fencing of the non-quorate partition within a defined timeout.
>>>>> Does it require no-quorum-policy=suicide, or does it decide completely
>>>>> independently? I.e., would it also fire with no-quorum-policy=ignore?
>>>> Eventually it will fire in any case, but no-quorum-policy decides how long
>>>> that takes. In the case of suicide the inquisitor will immediately stop
>>>> tickling the watchdog. In all other cases the pacemaker-servant will stop
>>>> pinging the inquisitor, which makes the servant time out after a default of
>>>> 4 seconds, and then the inquisitor will stop tickling the watchdog.
>>>> But that is only relevant if Corosync doesn't have 2-node enabled.
>>>> See the comment below for that case.
>>>>
>>>>>> This idea of course doesn't work with just 2 nodes.
>>>>>> Taking quorum info from the 2-node feature of corosync (which automatically
>>>>>> switches on wait-for-all) doesn't help in this case but would instead lead
>>>>>> to split-brain.
>>>>> So what you are saying is that SBD ignores quorum information from corosync
>>>>> and makes its own decisions based on a pure node count. Do I understand that
>>>>> correctly?
>>>> Yes, but that is only true for this case where Corosync has 2-node enabled.
>>>> In all other cases (be it clusters with more than 2 nodes, or clusters with
>>>> just 2 nodes but without 2-node enabled in Corosync) the pacemaker-servant
>>>> takes quorum-info from pacemaker, which will probably come directly from
>>>> Corosync nowadays.
>>>> But as said, if 2-node is configured with Corosync everything is different:
>>>> the node-counting is then actually done by the cluster-servant, and it is
>>>> this one (instead of the pacemaker-servant) that will stop pinging the
>>>> inquisitor if it doesn't count more than 1 node.
>>>>
>>> Is it conditional on having no shared device, or does it just check the two_node
>>> value? If it always behaves this way, even with a real shared device present, it
>>> means sbd is fundamentally incompatible with two_node, and that had better be
>>> mentioned in the documentation.
>> If you are referring to counting the nodes instead of taking quorum-info from
>> pacemaker in the case of 2-node configured with corosync, that is universal.
>>
>> And actually the reason why it is there is to be able to use sbd with a single
>> disk on 2 nodes having 2-node enabled.
>>
>> Imagine quorum-info from corosync/pacemaker being used in that case:
>> Imagine a cluster (node-a & node-b). node-a loses connection to the network and
>> to the shared storage.
>> node-a will still receive positive quorum from corosync, as it has already seen
>> the other node (since it is up). This will make it ignore the loss of the disk
>> (survive on pacemaker).
>> node-b is quorate as well, sees the disk, uses the disk to fence node-a, and
>> will after a timeout assume node-a to be down -> split-brain.
> Seeing the disk will prevent the reboot, if that is what was missing for you.
Yes, this was not exactly clear. Thank you!

>>
>>>> That all said, I've just realized that setting 2-node in Corosync shouldn't
>>>> really be dangerous anymore, although it doesn't make the cluster especially
>>>> useful either in the case of SBD without disk(s).
>>>>
>>>> Regards,
>>>> Klaus
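
For anyone finding this thread later: the "2-node" setting discussed above is corosync's
two_node option in the quorum section of corosync.conf. A minimal sketch (values
illustrative; see votequorum(5)) would look like this:

    quorum {
        provider: corosync_votequorum
        # two_node implicitly enables wait_for_all unless it is explicitly overridden
        two_node: 1
    }

With two_node: 1 each node keeps quorum after losing its peer (once both nodes have been
seen), which is exactly why sbd falls back to counting nodes itself in that configuration.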
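
The disk-based vs. watchdog-only distinction is controlled by whether SBD_DEVICE is set
in sbd's environment file. A sketch, assuming the file lives at /etc/sysconfig/sbd (it is
/etc/default/sbd on some distributions) and using a placeholder device path:

    # /etc/sysconfig/sbd
    SBD_PACEMAKER=yes
    SBD_WATCHDOG_DEV=/dev/watchdog
    # watchdog timeout (seconds) used when running without a disk; default is 5
    SBD_WATCHDOG_TIMEOUT=5
    # watchdog-only (diskless) mode: leave SBD_DEVICE empty
    SBD_DEVICE=""
    # disk-based mode instead: point SBD_DEVICE at the shared LUN, e.g.
    # SBD_DEVICE="/dev/disk/by-id/<shared-disk>"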
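
On the pacemaker side, watchdog-only sbd additionally needs stonith-watchdog-timeout to
be set, and no-quorum-policy determines how quickly the watchdog fires after quorum is
lost, as Klaus explains above. A sketch using the pcs shell (crmsh users would use
"crm configure property" instead); the 10s value is just an example, commonly chosen as
roughly twice SBD_WATCHDOG_TIMEOUT:

    pcs property set stonith-enabled=true
    # required for watchdog-only sbd; must comfortably exceed SBD_WATCHDOG_TIMEOUT
    pcs property set stonith-watchdog-timeout=10s
    # 'suicide' makes the inquisitor stop feeding the watchdog immediately on quorum
    # loss; other values add the ~4-second servant timeout mentioned above
    pcs property set no-quorum-policy=suicide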