> -----Original Message-----
> From: Users <users-boun...@clusterlabs.org> On Behalf Of Valentin Vidic via Users
> Sent: Saturday, November 5, 2022 1:07 PM
> To: users@clusterlabs.org
> Cc: Valentin Vidić <vvi...@valentin-vidic.from.hr>
> Subject: Re: [ClusterLabs] [External] : Re: Fence Agent tests
>
> On Sat, Nov 05, 2022 at 05:20:47PM +0000, Robert Hayden wrote:
> > The OCI compute instances don't have a hardware watchdog, only the
> > software watchdog. So, when the network goes completely hung (e.g.
> > firewall-cmd panic-on), all network traffic stops which implies that
> > IO to the SBD device also stops. I do not see the software watchdog
> > take any action in response to the network hang.
>
> It seems like the watchdog is not working or is not configured with a
> correct timeout here. sbd will not refresh the watchdog if it fails to
> read from the disk, so the watchdog should eventually expire and reset
> the node.
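
As a rough illustration of the behaviour described in the quoted text, here
is a toy model in Python. It is only a sketch under assumptions: the class
and function names (SoftWatchdog, simulate) are made up for this example,
and it does not reflect how sbd or the kernel software watchdog are actually
implemented. It only shows the expected timing: once reads from the disk
stop succeeding, the timer is no longer refreshed and should fire one
watchdog timeout after the last successful refresh. The values 20 and 1 are
the watchdog and loop timeouts from the on-disk header in the setup below;
the moment at which I/O stops (30 s) is an arbitrary choice.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SoftWatchdog:
    """Toy watchdog timer: fires if it is not refreshed within `timeout` seconds."""
    timeout: float
    last_pet: float = 0.0

    def pet(self, now: float) -> None:
        # Refreshing the timer restarts the countdown.
        self.last_pet = now

    def expired(self, now: float) -> bool:
        return now - self.last_pet >= self.timeout


def simulate(timeout: float, loop: float, io_stops_at: float,
             horizon: float) -> Optional[float]:
    """Return the time at which the watchdog fires, or None if it never
    fires before `horizon`. The watchdog is refreshed once per `loop`
    seconds, but only while disk reads still succeed (t < io_stops_at)."""
    wd = SoftWatchdog(timeout=timeout)
    t = 0.0
    while t <= horizon:
        if wd.expired(t):
            return t          # hypothetical point where the node would reset
        if t < io_stops_at:   # read succeeded, so the watchdog is refreshed
            wd.pet(t)
        t += loop
    return None


if __name__ == "__main__":
    # I/O hangs at t=30; the last refresh was at t=29, so expiry is at t=49.
    print(simulate(timeout=20, loop=1, io_stops_at=30, horizon=120))
```

In this model a node whose I/O hangs should reset within roughly one
watchdog timeout, so a node that stays up much longer than that suggests the
watchdog device is not being armed, or is armed with a different timeout
than expected.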
That was my impression as well, so I may have something wrong. My
expectation was that the SBD daemon should be writing to /dev/watchdog
within 20 seconds, and that otherwise the kernel watchdog would
self-fence the node.

Here is my setup:

root:dh2vgmprepap02:ablgmprep:/root:# grep ^SBD /etc/sysconfig/sbd
SBD_DEVICE=/dev/disk/by-id/scsi-360e59ebc0f414569bcc7a5e4a6d58ccb-part1
SBD_PACEMAKER=yes
SBD_STARTMODE=always
SBD_DELAY_START=no
SBD_WATCHDOG_DEV=/dev/watchdog
SBD_WATCHDOG_TIMEOUT=5
SBD_TIMEOUT_ACTION=flush,reboot
SBD_MOVE_TO_ROOT_CGROUP=auto
SBD_OPTS=

root:dh2vgmprepap02:ablgmprep:/root:# sbd -d /dev/disk/by-id/scsi-360e59ebc0f414569bcc7a5e4a6d58ccb-part1 dump
==Dumping header on disk /dev/disk/by-id/scsi-360e59ebc0f414569bcc7a5e4a6d58ccb-part1
Header version     : 2.1
UUID               : 04096cc5-1fb8-44da-9c4f-4b6034a0fe06
Number of slots    : 255
Sector size        : 512
Timeout (watchdog) : 20
Timeout (allocate) : 2
Timeout (loop)     : 1
Timeout (msgwait)  : 40
==Header on disk /dev/disk/by-id/scsi-360e59ebc0f414569bcc7a5e4a6d58ccb-part1 is dumped

root:dh2vgmprepap02:ablgmprep:/root:# pcs stonith sbd status --full
SBD STATUS
<node name>: <installed> | <enabled> | <running>
dh2vgmprepap03: YES | YES | YES
dh2vgmprepap02: YES | YES | YES

Messages list on device '/dev/disk/by-id/scsi-360e59ebc0f414569bcc7a5e4a6d58ccb-part1':
0 dh2vgmprepap03 clear
1 dh2vgmprepap02 clear

SBD header on device '/dev/disk/by-id/scsi-360e59ebc0f414569bcc7a5e4a6d58ccb-part1':
==Dumping header on disk /dev/disk/by-id/scsi-360e59ebc0f414569bcc7a5e4a6d58ccb-part1
Header version     : 2.1
UUID               : 04096cc5-1fb8-44da-9c4f-4b6034a0fe06
Number of slots    : 255
Sector size        : 512
Timeout (watchdog) : 20
Timeout (allocate) : 2
Timeout (loop)     : 1
Timeout (msgwait)  : 40
==Header on disk /dev/disk/by-id/scsi-360e59ebc0f414569bcc7a5e4a6d58ccb-part1 is dumped

> --
> Valentin

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/