Hi! Maybe see "test-watchdog" in sbd's manual page ;-)
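
A small sketch for anyone who wants to check the numbers before running that test: the setup quoted below shows SBD_WATCHDOG_TIMEOUT=5 in /etc/sysconfig/sbd but "Timeout (watchdog) : 20" in the on-disk header, so it may help to print both side by side. The two helper names are made up for illustration and are not part of sbd; the commented-out test-watchdog call assumes the device path from the quoted config, and on a working watchdog it is expected to reset the node.

```shell
#!/bin/sh
# Hypothetical helper (not part of sbd): read `sbd -d <device> dump` output
# on stdin and print the value of the "Timeout (watchdog)" line.
dump_watchdog_timeout() {
    awk -F: '/^Timeout \(watchdog\)/ { gsub(/[[:space:]]/, "", $2); print $2; exit }'
}

# Hypothetical helper (not part of sbd): read sysconfig-style text on stdin
# and print the value assigned to SBD_WATCHDOG_TIMEOUT.
sysconfig_watchdog_timeout() {
    sed -n 's/^SBD_WATCHDOG_TIMEOUT=//p' | head -n 1
}

# Example usage on a cluster node (assumes SBD_DEVICE is set to your device):
#   sbd -d "$SBD_DEVICE" dump | dump_watchdog_timeout
#   sysconfig_watchdog_timeout < /etc/sysconfig/sbd
#
# The watchdog test itself. Only run it on a node you can afford to lose,
# since a working watchdog should reset the machine:
#   sbd -w /dev/watchdog test-watchdog
```

If the test does not reset the node, that would point at the watchdog device itself rather than at the sbd timeouts.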

Regards,
Ulrich

>>> Robert Hayden <robert.h.hay...@oracle.com> wrote on 05.11.2022 at 19:47 in
message
<sa2pr10mb44916ec50d93e8f6d8fa42eec8...@sa2pr10mb4491.namprd10.prod.outlook.com>

>> -----Original Message-----
>> From: Users <users-boun...@clusterlabs.org> On Behalf Of Valentin Vidic via Users
>> Sent: Saturday, November 5, 2022 1:07 PM
>> To: users@clusterlabs.org
>> Cc: Valentin Vidić <vvidic@valentin-vidic.from.hr>
>> Subject: Re: [ClusterLabs] [External] : Re: Fence Agent tests
>>
>> On Sat, Nov 05, 2022 at 05:20:47PM +0000, Robert Hayden wrote:
>> > The OCI compute instances don't have a hardware watchdog, only the software watchdog.
>> > So, when the network goes completely hung (e.g. firewall-cmd panic-on), all network
>> > traffic stops which implies that IO to the SBD device also stops. I do not see the software
>> > watchdog take any action in response to the network hang.
>>
>> It seems like the watchdog is not working or is not configured with a
>> correct timeout here. sbd will not refresh the watchdog if it fails to
>> read from the disk, so the watchdog should eventually expire and reset
>> the node.
>
> That was my impression as well...so I may have something wrong. My expectation was that SBD daemon
> should be writing to the /dev/watchdog within 20 seconds and the kernel watchdog would self fence.
>
> Here is my setup
> root:dh2vgmprepap02:ablgmprep:/root:# grep ^SBD /etc/sysconfig/sbd
> SBD_DEVICE=/dev/disk/by-id/scsi-360e59ebc0f414569bcc7a5e4a6d58ccb-part1
> SBD_PACEMAKER=yes
> SBD_STARTMODE=always
> SBD_DELAY_START=no
> SBD_WATCHDOG_DEV=/dev/watchdog
> SBD_WATCHDOG_TIMEOUT=5
> SBD_TIMEOUT_ACTION=flush,reboot
> SBD_MOVE_TO_ROOT_CGROUP=auto
> SBD_OPTS=
>
> root:dh2vgmprepap02:ablgmprep:/root:# sbd -d /dev/disk/by-id/scsi-360e59ebc0f414569bcc7a5e4a6d58ccb-part1 dump
> ==Dumping header on disk /dev/disk/by-id/scsi-360e59ebc0f414569bcc7a5e4a6d58ccb-part1
> Header version     : 2.1
> UUID               : 04096cc5-1fb8-44da-9c4f-4b6034a0fe06
> Number of slots    : 255
> Sector size        : 512
> Timeout (watchdog) : 20
> Timeout (allocate) : 2
> Timeout (loop)     : 1
> Timeout (msgwait)  : 40
> ==Header on disk /dev/disk/by-id/scsi-360e59ebc0f414569bcc7a5e4a6d58ccb-part1 is dumped
>
> root:dh2vgmprepap02:ablgmprep:/root:# pcs stonith sbd status --full
> SBD STATUS
> <node name>: <installed> | <enabled> | <running>
> dh2vgmprepap03: YES | YES | YES
> dh2vgmprepap02: YES | YES | YES
>
> Messages list on device '/dev/disk/by-id/scsi-360e59ebc0f414569bcc7a5e4a6d58ccb-part1':
> 0 dh2vgmprepap03 clear
> 1 dh2vgmprepap02 clear
>
>
> SBD header on device '/dev/disk/by-id/scsi-360e59ebc0f414569bcc7a5e4a6d58ccb-part1':
> ==Dumping header on disk /dev/disk/by-id/scsi-360e59ebc0f414569bcc7a5e4a6d58ccb-part1
> Header version     : 2.1
> UUID               : 04096cc5-1fb8-44da-9c4f-4b6034a0fe06
> Number of slots    : 255
> Sector size        : 512
> Timeout (watchdog) : 20
> Timeout (allocate) : 2
> Timeout (loop)     : 1
> Timeout (msgwait)  : 40
> ==Header on disk /dev/disk/by-id/scsi-360e59ebc0f414569bcc7a5e4a6d58ccb-part1 is dumped
>
>>
>> --
>> Valentin

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/