Hi, I'm running a 2-node web cluster on AlmaLinux 9 with Pacemaker 2.1.7, DRBD 9 and Corosync 3.1. I'm having trouble with promoting and mounting the DRBD device. After starting the cluster, the DRBD device does not get mounted, and an error message shows up almost immediately:

pacemaker-schedulerd[4879]: warning: Unexpected result (error: Couldn't mount device [/dev/drbd1] as /mnt/clusterfs) was recorded for start of Webcontent_FS on ...
pacemaker-schedulerd[4879]: warning: Webcontent_FS cannot run on kathie3 due to reaching migration threshold (clean up resource to allow again)

It looks as if the cluster tries to mount the device before the device is ready. The device is /dev/drbd1 and I'm trying to mount it on /mnt/clusterfs. After the error has occurred, if I run "pcs resource cleanup", the cluster is able to mount it.

The DRBD resource is named Webcontent_DRBD and the filesystem resource is named Webcontent_FS. All other resources, like httpd and the HA IPs, work like a charm.

This is the log from the start of the cluster:

Oct 22 11:48:12 kathie3 pacemaker-controld[4880]: notice: State transition S_ELECTION -> S_INTEGRATION
Oct 22 11:48:13 kathie3 pacemaker-schedulerd[4879]: notice: Actions: Start HA-IP_1 ( kathie3 )
Oct 22 11:48:13 kathie3 pacemaker-schedulerd[4879]: notice: Actions: Start HA-IP_2 ( kathie3 )
Oct 22 11:48:13 kathie3 pacemaker-schedulerd[4879]: notice: Actions: Start HA-IP_3 ( kathie3 )
Oct 22 11:48:13 kathie3 pacemaker-schedulerd[4879]: notice: Actions: Start Webcontent_DRBD:0 ( kathie3 )
Oct 22 11:48:13 kathie3 pacemaker-schedulerd[4879]: notice: Actions: Start Webcontent_FS ( kathie3 )
Oct 22 11:48:13 kathie3 pacemaker-schedulerd[4879]: notice: Actions: Start ping_fw:0 ( kathie3 )
Oct 22 11:48:13 kathie3 pacemaker-schedulerd[4879]: notice: Calculated transition 1106, saving inputs in /var/lib/pacemaker/pengine/pe-input-336.bz2
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Initiating start operation HA-IP_1_start_0 locally on kathie3
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Initiating start operation Webcontent_FS_start_0 locally on kathie3
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Initiating start operation ping_fw_start_0 locally on kathie3
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Initiating start operation Webcontent_DRBD_start_0 locally on kathie3
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Requesting local execution of start operation for HA-IP_1 on kathie3
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Requesting local execution of start operation for ping_fw on kathie3
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Requesting local execution of start operation for Webcontent_DRBD on kathie3
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Requesting local execution of start operation for Webcontent_FS on kathie3
Oct 22 11:48:13 kathie3 IPaddr2(HA-IP_1)[1682892]: INFO: Adding inet address 192.168.16.75/24 with broadcast address 192.168.16.255 to device ens3
Oct 22 11:48:13 kathie3 IPaddr2(HA-IP_1)[1682912]: INFO: Bringing device ens3 up
Oct 22 11:48:13 kathie3 Filesystem(Webcontent_FS)[1682923]: INFO: Running start for /dev/drbd1 on /mnt/clusterfs
Oct 22 11:48:13 kathie3 IPaddr2(HA-IP_1)[1682929]: INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /run/resource-agents/send_arp-192.168.16.75 ens3 192.168.16.75 auto not_used not_used
Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data: Starting worker thread (node-id 0)
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Result of start operation for HA-IP_1 on kathie3: ok
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Initiating monitor operation HA-IP_1_monitor_30000 locally on kathie3
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Requesting local execution of monitor operation for HA-IP_1 on kathie3
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Initiating start operation HA-IP_2_start_0 locally on kathie3
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Requesting local execution of start operation for HA-IP_2 on kathie3
Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data: Auto-promote failed: Need access to UpToDate data (-2)
Oct 22 11:48:13 kathie3 kernel: /dev/drbd1: Can't open blockdev
Oct 22 11:48:13 kathie3 kernel: /dev/drbd1: Can't open blockdev
Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: meta-data IO uses: blk-bio
Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: disk( Diskless -> Attaching ) [attach]
Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: Maximum number of peer devices = 1
Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data: Method to ensure write ordering: flush
Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: drbd_bm_resize called with capacity == 104854328
Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: resync bitmap: bits=13106791 words=204794 pages=400
Oct 22 11:48:13 kathie3 kernel: drbd1: detected capacity change from 0 to 104854328
Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: size = 50 GB (52427164 KB)
Oct 22 11:48:13 kathie3 Filesystem(Webcontent_FS)[1683017]: ERROR: Couldn't mount device [/dev/drbd1] as /mnt/clusterfs
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Result of start operation for Webcontent_FS on kathie3: error (Couldn't mount device [/dev/drbd1] as /mnt/clusterfs)
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Webcontent_FS_start_0@kathie3 output [ blockdev: cannot open /dev/drbd1: No data available\nmount: /mnt/clusterfs: mount(2) system call failed: No data available.\nocf-exit-reason:Couldn't mount device [/dev/drbd1] as /mnt/clusterfs\n ]
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Transition 1106 aborted by operation Webcontent_FS_start_0 'modify' on kathie3: Event failed
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Transition 1106 action 37 (Webcontent_FS_start_0 on kathie3): expected 'ok' but got 'error'
Oct 22 11:48:13 kathie3 pacemaker-attrd[4878]: notice: Setting last-failure-Webcontent_FS#start_0[kathie3] in instance_attributes: (unset) -> 1729590493
Oct 22 11:48:13 kathie3 pacemaker-attrd[4878]: notice: Setting fail-count-Webcontent_FS#start_0[kathie3] in instance_attributes: (unset) -> INFINITY
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Transition 1106 aborted by status-1-last-failure-Webcontent_FS.start_0 doing create last-failure-Webcontent_FS#start_0=1729590493: Transient attribute change
Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: bitmap READ of 400 pages took 34 ms
Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: disk( Attaching -> UpToDate ) [attach]
Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: attached to current UUID: 826E8850CF10C812
Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: Setting exposed data uuid: 826E8850CF10C812
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Result of monitor operation for HA-IP_1 on kathie3: ok
Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data stacy3: Starting sender thread (peer-node-id 1)
Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data stacy3: conn( StandAlone -> Unconnected ) [connect]
Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data stacy3: Starting receiver thread (peer-node-id 1)
Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data stacy3: conn( Unconnected -> Connecting ) [connecting]
Oct 22 11:48:13 kathie3 IPaddr2(HA-IP_2)[1683100]: INFO: Adding inet address 192.168.16.76/24 with broadcast address 192.168.16.255 to device ens3
Oct 22 11:48:13 kathie3 IPaddr2(HA-IP_2)[1683106]: INFO: Bringing device ens3 up
Oct 22 11:48:13 kathie3 IPaddr2(HA-IP_2)[1683112]: INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /run/resource-agents/send_arp-192.168.16.76 ens3 192.168.16.76 auto not_used not_used
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Result of start operation for HA-IP_2 on kathie3: ok
Oct 22 11:48:15 kathie3 pacemaker-attrd[4878]: notice: Setting pingd[kathie3] in instance_attributes: (unset) -> 1000
Oct 22 11:48:15 kathie3 pacemaker-controld[4880]: notice: Result of start operation for ping_fw on kathie3: ok
Oct 22 11:48:17 kathie3 IPaddr2(HA-IP_1)[1683126]: INFO: ARPING 192.168.16.75 from 192.168.16.75 ens3#012Sent 5 probes (5 broadcast(s))#012Received 0 response(s)
Oct 22 11:48:17 kathie3 IPaddr2(HA-IP_2)[1683130]: INFO: ARPING 192.168.16.76 from 192.168.16.76 ens3#012Sent 5 probes (5 broadcast(s))#012Received 0 response(s)
Oct 22 11:48:18 kathie3 drbd(Webcontent_DRBD)[1683138]: INFO: webcontent_data: Called drbdsetup wait-connect-resource webcontent_data --wfc-timeout=5 --degr-wfc-timeout=5 --outdated-wfc-timeout=5
Oct 22 11:48:18 kathie3 drbd(Webcontent_DRBD)[1683142]: INFO: webcontent_data: Exit code 5
Oct 22 11:48:18 kathie3 drbd(Webcontent_DRBD)[1683146]: INFO: webcontent_data: Command output:
Oct 22 11:48:18 kathie3 drbd(Webcontent_DRBD)[1683150]: INFO: webcontent_data: Command stderr:
Oct 22 11:48:19 kathie3 pacemaker-attrd[4878]: notice: Setting master-Webcontent_DRBD[kathie3] in instance_attributes: (unset) -> 1000
Oct 22 11:48:19 kathie3 pacemaker-controld[4880]: notice: Result of start operation for Webcontent_DRBD on kathie3: ok
Oct 22 11:48:19 kathie3 pacemaker-controld[4880]: notice: Initiating notify operation Webcontent_DRBD_post_notify_start_0 locally on kathie3
...

Is some kind of timeout set wrong, or what am I missing? Any suggestions are welcome.

Kind regards
fatcharly
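
P.S. To make the suspected ordering problem easier to see, here is a minimal Python sketch that picks a few lines verbatim from the log above and checks in which order the interesting events appear. The event labels and the function name first_positions are hypothetical names chosen only for this illustration. It compares line positions rather than timestamps, because the relevant entries all fall within the same second, and it does not explain why the order is what it is.

```python
import re

# A few lines copied verbatim from the log excerpt, in their original order.
LOG = """\
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Initiating start operation Webcontent_FS_start_0 locally on kathie3
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Initiating start operation Webcontent_DRBD_start_0 locally on kathie3
Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data: Auto-promote failed: Need access to UpToDate data (-2)
Oct 22 11:48:13 kathie3 Filesystem(Webcontent_FS)[1683017]: ERROR: Couldn't mount device [/dev/drbd1] as /mnt/clusterfs
Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: disk( Attaching -> UpToDate ) [attach]
Oct 22 11:48:19 kathie3 pacemaker-controld[4880]: notice: Result of start operation for Webcontent_DRBD on kathie3: ok
"""

# Hypothetical labels; each pattern only matches text that occurs in the excerpt.
EVENTS = {
    "fs_start": re.compile(r"Initiating start operation Webcontent_FS_start_0"),
    "drbd_start": re.compile(r"Initiating start operation Webcontent_DRBD_start_0"),
    "mount_failed": re.compile(r"ERROR: Couldn't mount device"),
    "disk_uptodate": re.compile(r"disk\( Attaching -> UpToDate \)"),
    "drbd_started": re.compile(r"Result of start operation for Webcontent_DRBD on \S+: ok"),
}


def first_positions(log_text):
    """Map each event label to the index of the first log line matching it."""
    positions = {}
    for index, line in enumerate(log_text.splitlines()):
        for label, pattern in EVENTS.items():
            if label not in positions and pattern.search(line):
                positions[label] = index
    return positions


positions = first_positions(LOG)
# True when the failed mount is logged before the DRBD disk reports UpToDate.
mounted_too_early = positions["mount_failed"] < positions["disk_uptodate"]
print(positions)
print("mount failed before disk was UpToDate:", mounted_too_early)
```

On these lines the failed mount is logged before the disk reaches UpToDate and well before the start of Webcontent_DRBD is reported as ok, which is what makes me think the filesystem start is simply not waiting for DRBD.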