On 02/11/2019 09:49 AM, Fulong Wang wrote:
> Thanks Yan,
>
> You gave me more valuable hints on the SBD operation!
> Now, I can see the verbose output after the service restart.
>
> > Be aware since pacemaker integration (-P) is enabled by default, which
> > means despite the sbd failure, if the node itself was clean and
> > "healthy" from pacemaker's point of view and if it's in the cluster
> > partition with the quorum, it wouldn't self-fence -- meaning a node just
> > being unable to fence doesn't necessarily need to be fenced.
>
> > As described in the sbd man page, "this allows sbd to survive temporary
> > outages of the majority of devices. However, while the cluster is in
> > such a degraded state, it can neither successfully fence nor be shutdown
> > cleanly (as taking the cluster below the quorum threshold will
> > immediately cause all remaining nodes to self-fence). In short, it will
> > not tolerate any further faults. Please repair the system before
> > continuing."
>
> Yes, I can see that "pacemaker integration" was enabled in my sbd
> config file by default.
> So, you mean that in some sbd failure cases, if the node is considered
> "healthy" from pacemaker's point of view, it still wouldn't self-fence.
>
> Honestly speaking, I didn't get your point here. I have
> "no-quorum-policy=ignore" set in my setup and it's a two-node
> cluster.
> Can you show me a sample situation for this?
When using sbd with two-node clusters and pacemaker integration, you
might want to check whether
https://github.com/ClusterLabs/sbd/commit/4bd0a66da3ac9c9afaeb8a2468cdd3ed51ad3377
is included in your sbd version. This is relevant when "two_node" is
configured in corosync.

Regards,
Klaus
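For context, here is a minimal sketch of the pieces being discussed: the
"two_node" flag in corosync and the pacemaker-integration switch in the
sbd sysconfig file. The device path is a placeholder and the excerpts are
illustrative only, so compare them against your own configuration:

    # /etc/corosync/corosync.conf (excerpt)
    quorum {
        provider: corosync_votequorum
        two_node: 1        # two-node mode; enables wait_for_all by default
    }

    # /etc/sysconfig/sbd (excerpt)
    SBD_DEVICE="/dev/disk/by-id/example-sbd-disk"   # placeholder path
    SBD_PACEMAKER="yes"                             # pacemaker integration (-P)

    # check which sbd build is installed, to see whether it already
    # contains the commit Klaus mentions above
    rpm -q sbd

With two_node: 1, corosync keeps the surviving partition quorate once both
nodes have been seen, which is why the behaviour of sbd's pacemaker
integration in two-node clusters deserves the extra attention Klaus points
to.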
> Many Thanks!!!
>
> Regards
> Fulong
>
> ------------------------------------------------------------------------
> *From:* Gao,Yan <[email protected]>
> *Sent:* Thursday, January 3, 2019 20:43
> *To:* Fulong Wang; Cluster Labs - All topics related to open-source
> clustering welcomed
> *Subject:* Re: [ClusterLabs] SuSE12SP3 HAE SBD Communication Issue
>
> On 12/24/18 7:10 AM, Fulong Wang wrote:
> > Yan, Klaus and everyone,
> >
> > Merry Christmas!!!
> >
> > Many thanks for your advice!
> > I added the "-v" param to "SBD_OPTS", but didn't see any apparent
> > change in the system message log. Am I looking in the wrong place?
> Did you restart all cluster services, for example by "crm cluster stop"
> and then "crm cluster start"? Basically, sbd.service needs to be
> restarted. Be aware "systemctl restart pacemaker" only restarts pacemaker.
>
> SBD daemons log to syslog. When an sbd watcher receives a "test"
> command, there should be a syslog message like this showing up:
>
> "servant: Received command test from ..."
>
> sbd won't actually do anything about a "test" command other than
> logging a message.
>
> If you are not running a recent version of sbd (maintenance update) yet,
> even a single "-v" will already make sbd quite verbose. But of course
> you could use grep.
>
> > By the way, we want to test that when the disk access paths (multipath
> > devices) are lost, sbd can fence the node automatically.
> Be aware since pacemaker integration (-P) is enabled by default, which
> means despite the sbd failure, if the node itself was clean and
> "healthy" from pacemaker's point of view and if it's in the cluster
> partition with the quorum, it wouldn't self-fence -- meaning a node just
> being unable to fence doesn't necessarily need to be fenced.
>
> As described in the sbd man page, "this allows sbd to survive temporary
> outages of the majority of devices. However, while the cluster is in
> such a degraded state, it can neither successfully fence nor be shutdown
> cleanly (as taking the cluster below the quorum threshold will
> immediately cause all remaining nodes to self-fence). In short, it will
> not tolerate any further faults. Please repair the system before
> continuing."
>
> Regards,
> Yan
>
> > What's your recommendation for this scenario?
> >
> > The "crm node fence" command did the work.
> >
> > Regards
> > Fulong
> >
> > ------------------------------------------------------------------------
> > *From:* Gao,Yan <[email protected]>
> > *Sent:* Friday, December 21, 2018 20:43
> > *To:* [email protected]; Cluster Labs - All topics related to
> > open-source clustering welcomed; Fulong Wang
> > *Subject:* Re: [ClusterLabs] SuSE12SP3 HAE SBD Communication Issue
> > First, thanks for your reply, Klaus!
> >
> > On 2018/12/21 10:09, Klaus Wenninger wrote:
> >> On 12/21/2018 08:15 AM, Fulong Wang wrote:
> >>> Hello Experts,
> >>>
> >>> I'm new to this mailing list.
> >>> Please kindly forgive me if this mail has disturbed you!
> >>>
> >>> Our company has recently been evaluating the SuSE HAE on the x86
> >>> platform.
> >>> When simulating the storage disaster failover, I found that the SBD
> >>> communication functioned normally on SuSE11 SP4 but abnormally on
> >>> SuSE12 SP3.
> >>
> >> I have no experience with SBD on SLES, but I know that handling of the
> >> logging verbosity levels has changed recently in the upstream repo.
> >> Given that it was done by Yan Gao, iirc, I'd assume it went into SLES.
> >> So changing the verbosity of the sbd daemon might get you back
> >> these logs.
> > Yes, I think that's the issue. Could you please retrieve the latest
> > maintenance update for SLE12SP3 and try? Otherwise, of course, you could
> > temporarily enable verbose/debug logging by adding a couple of "-v"
> > options to "SBD_OPTS" in /etc/sysconfig/sbd.
> >
> > But frankly, it makes more sense to manually trigger fencing, for
> > example with "crm node fence", and see if it indeed works correctly.
> >
> >> And of course you can use the list command on the other node
> >> to verify as well.
> > The "test" message in the slot might get overwritten soon by a "clear"
> > if the sbd daemon is running.
> >
> > Regards,
> > Yan
> >
> >>
> >> Klaus
> >>
> >>> The SBD device was added during the initialization of the first
> >>> cluster node.
> >>>
> >>> I have requested help from the SuSE guys, but they haven't given me
> >>> any valuable feedback yet!
> >>>
> >>> Below are some screenshots to explain what I have encountered.
> >>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >>>
> >>> On a SuSE11 SP4 HAE cluster, I ran the sbd test command as below:
> >>>
> >>> Then some information shows up in the local system message log.
> >>>
> >>> On the second node, we can confirm that the communication is normal by:
> >>>
> >>> But when I turned to a SuSE12 SP3 HAE cluster and ran the same
> >>> command as above:
> >>>
> >>> I didn't get any response in the system message log.
> >>>
> >>> "systemctl status sbd" also doesn't give me any clue on this.
> >>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >>>
> >>> What could be the reason for this abnormal behavior? Are there any
> >>> problems with my setup?
> >>> Any suggestions are appreciated!
> >>>
> >>> Thanks!
> >>>
> >>> Regards
> >>> FuLong
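To make the verbose-logging and test-message steps that Yan describes
above easier to follow, here is a rough shell sketch. The device path and
node name are placeholders, and the exact syslog location may differ on
your system:

    # /etc/sysconfig/sbd -- add one (or more) -v to increase verbosity
    SBD_OPTS="-v"

    # restart the whole cluster stack so sbd.service is restarted too;
    # "systemctl restart pacemaker" alone is not enough
    crm cluster stop
    crm cluster start

    # send a test message to the peer; sbd only logs it, nothing else happens
    sbd -d /dev/disk/by-id/example-sbd-disk message node2 test

    # on node2, the syslog should show something like
    #   "servant: Received command test from ..."
    grep -i "command test" /var/log/messages

    # inspect the slots; a "test" may already be overwritten by a "clear"
    sbd -d /dev/disk/by-id/example-sbd-disk list

    # the more meaningful check (fences node2, so only in a test cluster):
    crm node fence node2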
_______________________________________________
Users mailing list: [email protected]
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
