Re: [ClusterLabs] SuSE12SP3 HAE SBD Communication Issue

Gao,Yan Thu, 03 Jan 2019 04:43:39 -0800

On 12/24/18 7:10 AM, Fulong Wang wrote:

Yan, klaus and Everyone,
  Merry Christmas!!!



Many thanks for your advice!
I added the "-v" param in "SBD_OPTS", but didn't see any apparent change in the system message log. Am I looking in the wrong place?

Did you restart all cluster services, for example by "crm cluster stop" and then "crm cluster start"? Basically sbd.service needs to be restarted. Be aware that "systemctl restart pacemaker" only restarts pacemaker.
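
A minimal sketch of that restart sequence, assuming it is run as root on each node. The function name, the DRY_RUN helper and the final "systemctl is-active" check are illustrative additions, not something sbd or crmsh requires:

```shell
#!/bin/sh
# With DRY_RUN=1 the commands are printed instead of executed
# (hypothetical helper, only here to make the sketch safe to try).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Stop and start the whole cluster stack so that sbd.service is
# restarted too, then report whether sbd.service is active again.
restart_cluster_stack() {
    run crm cluster stop &&
    run crm cluster start &&
    run systemctl is-active sbd.service
}
```

Calling "DRY_RUN=1 restart_cluster_stack" only prints the three commands, which is a safe way to review them before running the function for real.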

SBD daemons log to syslog. When an sbd watcher receives a "test" command, a syslog message like this should show up:


"servant: Received command test from ..."

sbd doesn't actually do anything with a "test" command other than log a message.

If you are not running a recent version of sbd (maintenance update) yet, a single "-v" will already make sbd too verbose. But of course you could use grep.
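
A small sketch of the grep approach, assuming syslog ends up in /var/log/messages (adjust the path if your setup logs elsewhere). The function name is made up for illustration; the match string is the one from the message quoted above:

```shell
#!/bin/sh
# Read a syslog stream on stdin and print only the lines that an sbd
# watcher logs when it receives a "test" command.
sbd_test_lines() {
    grep -F 'Received command test from'
}

# Example use on the node that was sent the "test" message
# (the log path is an assumption):
#   sbd_test_lines < /var/log/messages
```

Using a fixed-string match (-F) keeps the filter independent of how verbose the rest of the sbd output is.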

By the way, we want to test that when the disk access paths (multipath devices) are lost, sbd can fence the node automatically.

Be aware that pacemaker integration (-P) is enabled by default. This means that despite the sbd failure, if the node itself is clean and "healthy" from pacemaker's point of view and it is in the cluster partition with quorum, it won't self-fence. In other words, a node that is merely unable to fence doesn't necessarily need to be fenced.

As described in the sbd man page, "this allows sbd to survive temporary outages of the majority of devices. However, while the cluster is in such a degraded state, it can neither successfully fence nor be shutdown cleanly (as taking the cluster below the quorum threshold will immediately cause all remaining nodes to self-fence). In short, it will not tolerate any further faults. Please repair the system before continuing."
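
To make the rule above concrete, here is a deliberately simplified model of it. This is not sbd's actual code; the function name and its four yes/no arguments are hypothetical and only encode the conditions described in this mail:

```shell
#!/bin/sh
# Returns 0 if the node would be expected to self-fence, 1 otherwise.
# Arguments (each "yes" or "no", hypothetical inputs for illustration):
#   $1  a majority of the sbd devices is reachable
#   $2  pacemaker integration (-P) is enabled
#   $3  the node is clean/healthy from pacemaker's point of view
#   $4  the node is in the cluster partition with quorum
would_self_fence() {
    devices_ok=$1 integration=$2 healthy=$3 quorate=$4

    # Devices are fine: no reason to self-fence.
    [ "$devices_ok" = yes ] && return 1

    # Devices lost, but pacemaker vouches for the node: it survives.
    if [ "$integration" = yes ] && [ "$healthy" = yes ] &&
       [ "$quorate" = yes ]; then
        return 1
    fi

    # Devices lost and nothing vouches for the node: self-fence.
    return 0
}
```

In this model, "would_self_fence no yes yes yes" returns 1 (the node stays up), which matches the multipath test scenario: pulling the disk paths alone does not get a healthy, quorate node fenced.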


Regards,
  Yan

What's your recommendation for this scenario?

The "crm node fence" did the work.

Regards
Fulong

------------------------------------------------------------------------
*From:* Gao,Yan <[email protected]>
*Sent:* Friday, December 21, 2018 20:43

*To:* [email protected]; Cluster Labs - All topics related to open-source clustering welcomed; Fulong Wang

*Subject:* Re: [ClusterLabs] SuSE12SP3 HAE SBD Communication Issue
First thanks for your reply, Klaus!

On 2018/12/21 10:09, Klaus Wenninger wrote:

On 12/21/2018 08:15 AM, Fulong Wang wrote:
Hello Experts,

I'm new to this mailing list.
Please kindly forgive me if this mail has disturbed you!
Our company is currently evaluating SuSE HAE on the x86 platform. When simulating a storage disaster fail-over, I found that the SBD communication functions normally on SuSE11 SP4 but abnormally on SuSE12 SP3.
I have no experience with SBD on SLES but I know that handling of the
logging verbosity-levels has changed recently in the upstream-repo.
Given that it was done by Yan Gao iirc I'd assume it went into SLES.
So changing the verbosity of the sbd-daemon might get you back
these logs.

Yes, I think it's the issue. Could you please retrieve the latest
maintenance update for SLE12SP3 and try? Otherwise of course you could
temporarily enable verbose/debug logging by adding a couple of "-v" into
"SBD_OPTS" in /etc/sysconfig/sbd.
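
As a sketch, the relevant fragment of /etc/sysconfig/sbd could look like the following. A single "-v" is shown here as an assumption; add more for extra verbosity, and keep any options that are already set in your file:

```shell
# /etc/sysconfig/sbd (fragment)
# Extra options passed to the sbd daemon; "-v" raises log verbosity.
SBD_OPTS="-v"
```

The change only takes effect once sbd.service itself has been restarted.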

But frankly, it makes more sense to manually trigger fencing for example
by "crm node fence" and see if it indeed works correctly.

And of course you can use the list command on the other node
to verify as well.

The "test" message in the slot might get overwritten soon by a "clear"
if the sbd daemon is running.

Regards,
    Yan


Klaus

The SBD device was added during the initialization of the first cluster node.

I have requested help from the SuSE guys, but they haven't given me any valuable feedback yet!



Below are some screenshots to explain what I have encountered.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

On a SuSE11 SP4 HAE cluster, I ran the sbd test command as below:

Then some information showed up in the local system message log.

On the second node, we can see that the communication is normal by

but when I turned to a SuSE12 SP3 HAE cluster and ran the same command as above:




I didn't get any response in the system message log.


"systemctl status sbd" also doesn't give me any clue on this.



~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

What could be the reason for this abnormal behavior? Are there any problems with my setup?

Any suggestions are appreciated!

Thanks!


Regards
FuLong


_______________________________________________
Users mailing list: [email protected]
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org



