>>> Strahil Nikolov <hunter86...@yahoo.com> wrote on 22.09.2020 at 07:23 in message
<1814286403.4657404.1600752191...@mail.yahoo.com>:
> Replace /dev/sde1 with
> /dev/disk/by-id/scsi-36000c292840d37bd13eb6be46d3af4ab-part1 :
> - in /etc/sysconfig/sbd
> - in the cib (via crm configure edit)
>
> Also, I don't see 'stonith-enabled=true', which could be your actual problem.
>
> I think you can set it via:
> crm configure property stonith-enabled=true
>
> P.S.: Consider setting the 'resource-stickiness' to '1'. Using partitions is
> not the best option, but it is better than nothing.

I think partitions are fine, especially when you have modern SAN storage where
the smallest allocatable amount is 1 GB. The other thing I'd recommend is
pre-allocating the message slots for each cluster node. Most importantly,
/dev/disk/by-id/scsi-36000c292840d37bd13eb6be46d3af4ab-part1 is probably the
device to specify.
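For reference, applying those suggestions might look roughly like the sketch
below (untested; the resource name stonith_sbd and the node names ceha03/ceha04
are taken from the configuration quoted further down, so adjust to your
environment):

  # /etc/sysconfig/sbd: point sbd at the persistent device name
  SBD_DEVICE="/dev/disk/by-id/scsi-36000c292840d37bd13eb6be46d3af4ab-part1"

  # update the fencing resource in the CIB (change devices="/dev/sde1" to the by-id path)
  crm configure edit stonith_sbd

  # pre-allocate a message slot for each cluster node on the SBD device
  sbd -d /dev/disk/by-id/scsi-36000c292840d37bd13eb6be46d3af4ab-part1 allocate ceha03
  sbd -d /dev/disk/by-id/scsi-36000c292840d37bd13eb6be46d3af4ab-part1 allocate ceha04

As far as I know, sbd only reads SBD_DEVICE at start-up, so the sysconfig
change takes effect after the cluster stack on each node has been restarted.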
Regards,
Ulrich

>
> Best Regards,
> Strahil Nikolov
>
> On Tuesday, 22 September 2020 at 02:06:10 GMT+3, Philippe M Stedman
> <pmste...@us.ibm.com> wrote:
>
> Hi Strahil,
>
> Here is the output of those commands... I appreciate the help!
>
> # crm config show
> node 1: ceha03 \
>         attributes ethmonitor-ens192=1
> node 2: ceha04 \
>         attributes ethmonitor-ens192=1
> (...)
> primitive stonith_sbd stonith:fence_sbd \
>         params devices="/dev/sde1" \
>         meta is-managed=true
> (...)
> property cib-bootstrap-options: \
>         have-watchdog=true \
>         dc-version=2.0.2-1.el8-744a30d655 \
>         cluster-infrastructure=corosync \
>         cluster-name=ps_dom \
>         stonith-enabled=true \
>         no-quorum-policy=ignore \
>         stop-all-resources=false \
>         cluster-recheck-interval=60 \
>         symmetric-cluster=true \
>         stonith-watchdog-timeout=0
> rsc_defaults rsc-options: \
>         is-managed=false \
>         resource-stickiness=0 \
>         failure-timeout=1min
>
> # cat /etc/sysconfig/sbd
> SBD_DEVICE="/dev/sde1"
> SBD_PACEMAKER=yes
> SBD_STARTMODE=always
> SBD_DELAY_START=no
> SBD_WATCHDOG_DEV=/dev/watchdog
> SBD_WATCHDOG_TIMEOUT=5
> SBD_TIMEOUT_ACTION=flush,reboot
> SBD_MOVE_TO_ROOT_CGROUP=auto
> SBD_OPTS=
>
> # systemctl status sbd
> sbd.service - Shared-storage based fencing daemon
>    Loaded: loaded (/usr/lib/systemd/system/sbd.service; enabled; vendor preset: disabled)
>    Active: active (running) since Mon 2020-09-21 18:36:28 EDT; 15min ago
>      Docs: man:sbd(8)
>   Process: 12810 ExecStart=/usr/sbin/sbd $SBD_OPTS -p /var/run/sbd.pid watch (code=exited, status=0/SUCCESS)
>  Main PID: 12812 (sbd)
>     Tasks: 4 (limit: 26213)
>    Memory: 14.5M
>    CGroup: /system.slice/sbd.service
>            ├─12812 sbd: inquisitor
>            ├─12814 sbd: watcher: /dev/sde1 - slot: 0 - uuid: 94d67f15-e301-4fa9-89ae-e3ce2e82c9e7
>            ├─12815 sbd: watcher: Pacemaker
>            └─12816 sbd: watcher: Cluster
>
> Sep 21 18:36:27 ceha03.canlab.ibm.com systemd[1]: Starting Shared-storage based fencing daemon...
> Sep 21 18:36:27 ceha03.canlab.ibm.com sbd[12810]: notice: main: Doing flush + writing 'b' to sysrq on timeout
> Sep 21 18:36:27 ceha03.canlab.ibm.com sbd[12815]: pcmk: notice: servant_pcmk: Monitoring Pacemaker health
> Sep 21 18:36:27 ceha03.canlab.ibm.com sbd[12816]: cluster: notice: servant_cluster: Monitoring unknown cluster health
> Sep 21 18:36:27 ceha03.canlab.ibm.com sbd[12814]: /dev/sde1: notice: servant_md: Monitoring slot 0 on disk /dev/sde1
> Sep 21 18:36:28 ceha03.canlab.ibm.com sbd[12812]: notice: watchdog_init: Using watchdog device '/dev/watchdog'
> Sep 21 18:36:28 ceha03.canlab.ibm.com sbd[12816]: cluster: notice: sbd_get_two_node: Corosync is in 2Node-mode
> Sep 21 18:36:28 ceha03.canlab.ibm.com sbd[12812]: notice: inquisitor_child: Servant cluster is healthy (age: 0)
> Sep 21 18:36:28 ceha03.canlab.ibm.com systemd[1]: Started Shared-storage based fencing daemon.
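(Side note: once every node points at the by-id path, it is easy to cross-check
that each cluster node actually has a slot on the device; a rough, untested
sketch, using the device path from the dump below:

  sbd -d /dev/disk/by-id/scsi-36000c292840d37bd13eb6be46d3af4ab-part1 list

The output has one line per allocated slot, e.g. "0 ceha03 clear", and should
list both ceha03 and ceha04 once the slots exist.)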
>
> # sbd -d /dev/disk/by-id/scsi-<long_uuid> dump
> [root@ceha03 by-id]# sbd -d /dev/disk/by-id/scsi-36000c292840d37bd13eb6be46d3af4ab-part1 dump
> ==Dumping header on disk /dev/disk/by-id/scsi-36000c292840d37bd13eb6be46d3af4ab-part1
> Header version     : 2.1
> UUID               : 94d67f15-e301-4fa9-89ae-e3ce2e82c9e7
> Number of slots    : 255
> Sector size        : 512
> Timeout (watchdog) : 5
> Timeout (allocate) : 2
> Timeout (loop)     : 1
> Timeout (msgwait)  : 10
> ==Header on disk /dev/disk/by-id/scsi-36000c292840d37bd13eb6be46d3af4ab-part1 is dumped
>
> Thanks,
>
> Phil Stedman
> Db2 High Availability Development and Support
> Email: pmste...@us.ibm.com
>
> From: Strahil Nikolov <hunter86...@yahoo.com>
> To: "users@clusterlabs.org" <users@clusterlabs.org>
> Date: 09/21/2020 01:41 PM
> Subject: [EXTERNAL] Re: [ClusterLabs] SBD fencing not working on my two-node cluster
> Sent by: "Users" <users-boun...@clusterlabs.org>
> ________________________________
>
> Can you provide (replace sensitive data):
>
> crm configure show
> cat /etc/sysconfig/sbd
> systemctl status sbd
> sbd -d /dev/disk/by-id/scsi-<long_uuid> dump
>
> P.S.: It is very bad practice to use "/dev/sdXYZ" as these are not permanent.
> Always use persistent names like those inside "/dev/disk/by-XYZ/ZZZZ". Also,
> SBD needs at most a 10 MB block device and yours seems unnecessarily big.
>
> Most probably /dev/sde1 is your problem.
>
> Best Regards,
> Strahil Nikolov
>
> On Monday, 21 September 2020 at 23:19:47 GMT+3, Philippe M Stedman
> <pmste...@us.ibm.com> wrote:
>
> Hi,
>
> I have been following the instructions on the following page to try and
> configure SBD fencing on my two-node cluster:
> https://documentation.suse.com/sle-ha/15-SP1/html/SLE-HA-all/cha-ha-storage-protect.html
>
> I am able to get through all the steps successfully. I am using the
> following device (/dev/sde1) as my shared disk:
>
> Disk /dev/sde: 20 GiB, 21474836480 bytes, 41943040 sectors
> Units: sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 512 bytes
> I/O size (minimum/optimal): 512 bytes / 512 bytes
> Disklabel type: gpt
> Disk identifier: 43987868-1C0B-41CE-8AF8-C522AB259655
>
> Device     Start      End  Sectors Size Type
> /dev/sde1     48 41942991 41942944  20G IBM General Parallel Fs
>
> Since I don't have a hardware watchdog at my disposal, I am using the
> software watchdog (softdog) instead. Having said this, I am able to get
> through all the steps successfully. I create the fence agent resource
> successfully, and it shows as Started in the crm status output:
>
> stonith_sbd (stonith:fence_sbd): Started ceha04
>
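(Side note: a quick way to cross-check what the fencer itself sees, which is
what the "Couldn't find anyone to fence" messages below are about, is
stonith_admin; a rough, untested sketch, assuming it is run on one of the
cluster nodes:

  stonith_admin --list-registered    # fence devices registered with pacemaker-fenced
  stonith_admin --list ceha04        # devices that claim they can fence ceha04

If the second command comes back empty on every node, the fencer has no device
it considers capable of fencing that host, which matches the errors below.)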
> The problem is when I run crm node fence ceha04 to test out fencing a host
> in my cluster. From the crm status output, I see that the reboot action has
> failed and, furthermore, in the system logs, I see the following messages:
>
> Sep 21 14:12:33 ceha04 pacemaker-controld[24146]: notice: Requesting fencing (reboot) of node ceha04
> Sep 21 14:12:33 ceha04 pacemaker-fenced[24142]: notice: Client pacemaker-controld.24146.5ff1ac0c wants to fence (reboot) 'ceha04' with device '(any)'
> Sep 21 14:12:33 ceha04 pacemaker-fenced[24142]: notice: Requesting peer fencing (reboot) of ceha04
> Sep 21 14:12:33 ceha04 pacemaker-fenced[24142]: notice: Couldn't find anyone to fence (reboot) ceha04 with any device
> Sep 21 14:12:33 ceha04 pacemaker-fenced[24142]: error: Operation reboot of ceha04 by <no-one> for pacemaker-controld.24146@ceha04.1bad3987: No such device
> Sep 21 14:12:33 ceha04 pacemaker-controld[24146]: notice: Stonith operation 3/1:4317:0:ec560474-96ea-4984-b801-400d11b5b3ae: No such device (-19)
> Sep 21 14:12:33 ceha04 pacemaker-controld[24146]: notice: Stonith operation 3 for ceha04 failed (No such device): aborting transition.
> Sep 21 14:12:33 ceha04 pacemaker-controld[24146]: warning: No devices found in cluster to fence ceha04, giving up
> Sep 21 14:12:33 ceha04 pacemaker-controld[24146]: notice: Transition 4317 aborted: Stonith failed
> Sep 21 14:12:33 ceha04 pacemaker-controld[24146]: notice: Peer ceha04 was not terminated (reboot) by <anyone> on behalf of pacemaker-controld.24146: No such device
>
> I don't know why Pacemaker isn't able to discover my fencing resource. Why
> isn't it able to find anyone to fence the host from the cluster?
>
> Any help is greatly appreciated. I can provide more details as required.
>
> Thanks,
>
> Phil Stedman
> Db2 High Availability Development and Support
> Email: pmste...@us.ibm.com

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/