Re: [ClusterLabs] SBD fencing not working on my two-node cluster

Strahil Nikolov Mon, 21 Sep 2020 17:41:47 -0700

Can you provide the output of the following (with any sensitive data replaced)?

crm configure show
cat /etc/sysconfig/sbd
systemctl status sbd
sbd -d /dev/disk/by-id/scsi-<long_uuid> dump
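
To gather the output of the commands above into one file for posting, something like the following could be used. This is only a sketch: the function name collect_output and the report file name are made up for illustration, and the sbd dump line is left out because its device path has to be filled in by hand first.

```shell
# Hypothetical helper: read one command per line from stdin, run each one,
# and append the command plus its output to the report file given as $1.
collect_output() {
    report=$1
    while IFS= read -r cmd; do
        # Skip empty lines in the command list.
        [ -n "$cmd" ] || continue
        printf '### %s\n' "$cmd" >> "$report"
        # Keep going if a command fails, but record its exit status.
        sh -c "$cmd" >> "$report" 2>&1 </dev/null \
            || printf '(exit status %s)\n' "$?" >> "$report"
        printf '\n' >> "$report"
    done
}

# Example use (the report file name is an assumption):
# collect_output sbd-report.txt <<'EOF'
# crm configure show
# cat /etc/sysconfig/sbd
# systemctl status sbd
# EOF
```

Review the resulting file and mask host names, addresses, and passwords before sending it to the list.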


P.S.: It is very bad practice to use "/dev/sdXYZ" names, as these are not 
persistent. Always use persistent names like those inside 
"/dev/disk/by-XYZ/ZZZZ". Also, SBD needs a block device of at most 10MB, and 
yours seems unnecessarily big.
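
To find the persistent name that corresponds to a "/dev/sdXYZ" device, the symlinks under /dev/disk/by-id can be compared against it. A minimal sketch, assuming GNU readlink is available; the function name persistent_names is hypothetical, and the second argument exists only so the directory can be overridden:

```shell
# Hypothetical helper: print every symlink in a by-id style directory that
# resolves to the same device node as the path given in $1.
# Usage: persistent_names /dev/sde1 [/dev/disk/by-id]
persistent_names() {
    dev=$(readlink -f "$1") || return 1
    dir=${2:-/dev/disk/by-id}
    for link in "$dir"/*; do
        # Only symlinks are of interest here.
        [ -L "$link" ] || continue
        if [ "$(readlink -f "$link")" = "$dev" ]; then
            printf '%s\n' "$link"
        fi
    done
}
```

Running persistent_names /dev/sde1 should list the names that can be used in /etc/sysconfig/sbd in place of the "/dev/sdXYZ" path.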


Most probably /dev/sde1 is your problem. 

Best Regards,
Strahil Nikolov




On Monday, 21 September 2020, 23:19:47 GMT+3, Philippe M Stedman 
<pmste...@us.ibm.com> wrote: 





Hi,

I have been following the instructions on the following page to try and 
configure SBD fencing on my two-node cluster:
https://documentation.suse.com/sle-ha/15-SP1/html/SLE-HA-all/cha-ha-storage-protect.html

I am able to get through all the steps successfully. I am using the following 
device (/dev/sde1) as my shared disk:

Disk /dev/sde: 20 GiB, 21474836480 bytes, 41943040 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 43987868-1C0B-41CE-8AF8-C522AB259655

Device     Start      End  Sectors Size Type
/dev/sde1     48 41942991 41942944  20G IBM General Parallel Fs

Since I don't have a hardware watchdog at my disposal, I am using the software 
watchdog (softdog) instead. Even so, I am able to get through all the steps: 
the fence agent resource is created successfully and shows as Started in the 
crm status output:

stonith_sbd (stonith:fence_sbd): Started ceha04

The problem occurs when I run crm node fence ceha04 to test fencing a host in 
my cluster. The crm status output shows that the reboot action has failed, and 
the system logs contain the following messages:

Sep 21 14:12:33 ceha04 pacemaker-controld[24146]: notice: Requesting fencing (reboot) of node ceha04
Sep 21 14:12:33 ceha04 pacemaker-fenced[24142]: notice: Client pacemaker-controld.24146.5ff1ac0c wants to fence (reboot) 'ceha04' with device '(any)'
Sep 21 14:12:33 ceha04 pacemaker-fenced[24142]: notice: Requesting peer fencing (reboot) of ceha04
Sep 21 14:12:33 ceha04 pacemaker-fenced[24142]: notice: Couldn't find anyone to fence (reboot) ceha04 with any device
Sep 21 14:12:33 ceha04 pacemaker-fenced[24142]: error: Operation reboot of ceha04 by <no-one> for pacemaker-controld.24146@ceha04.1bad3987: No such device
Sep 21 14:12:33 ceha04 pacemaker-controld[24146]: notice: Stonith operation 3/1:4317:0:ec560474-96ea-4984-b801-400d11b5b3ae: No such device (-19)
Sep 21 14:12:33 ceha04 pacemaker-controld[24146]: notice: Stonith operation 3 for ceha04 failed (No such device): aborting transition.
Sep 21 14:12:33 ceha04 pacemaker-controld[24146]: warning: No devices found in cluster to fence ceha04, giving up
Sep 21 14:12:33 ceha04 pacemaker-controld[24146]: notice: Transition 4317 aborted: Stonith failed
Sep 21 14:12:33 ceha04 pacemaker-controld[24146]: notice: Peer ceha04 was not terminated (reboot) by <anyone> on behalf of pacemaker-controld.24146: No such device

I don't know why Pacemaker isn't able to discover my fencing resource. Why 
isn't it able to find anyone to fence the host from the cluster?
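
For a longer log, the error and warning lines from the fencing daemons (like those in the messages above) can be pulled out with grep. This is a rough sketch: the function name fencing_failures is made up, the pattern is based only on the message format shown in this post, and the log file path in the example is an assumption that depends on the distribution.

```shell
# Hypothetical helper: print only the error and warning lines logged by
# pacemaker-fenced or pacemaker-controld in the given log file(s).
# The pattern assumes each log entry is on a single, unwrapped line.
fencing_failures() {
    grep -E 'pacemaker-(fenced|controld)\[[0-9]+\]: +(error|warning):' "$@"
}

# Example use (log location is an assumption):
# fencing_failures /var/log/messages
```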

Any help is greatly appreciated. I can provide more details as required.

Thanks,

Phil Stedman
Db2 High Availability Development and Support
Email: pmste...@us.ibm.com

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/