Hi,
I'm running a 2-node web cluster on AlmaLinux 9 with Pacemaker 2.1.7, DRBD 9 and
Corosync 3.1.
I'm having trouble with promoting and mounting the DRBD device. After I start
the cluster, the DRBD device does not get mounted, and an error message appears
almost immediately:

pacemaker-schedulerd[4879]: warning: Unexpected result (error: Couldn't mount device [/dev/drbd1] as /mnt/clusterfs) was recorded for start of Webcontent_FS on ...
pacemaker-schedulerd[4879]: warning: Webcontent_FS cannot run on kathie3 due to reaching migration threshold (clean up resource to allow again)

It looks as if the cluster tries to mount the device before the device is ready.
The device is /dev/drbd1 and I'm trying to mount it on /mnt/clusterfs. After the
error has occurred, if I run "pcs resource cleanup", the cluster is able to
mount it.
The DRBD resource is named Webcontent_DRBD.
The mounted filesystem is named Webcontent_FS.
All other resources, such as httpd and the HA IPs, are working like a charm.
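
To make sense of the "migration threshold" warning, here is a rough Python
model of why the resource stays blocked until the cleanup. It is a sketch under
assumptions, not Pacemaker code: it assumes that a failed start sets the fail
count straight to INFINITY (which is what the log below shows for
fail-count-Webcontent_FS#start_0), that Pacemaker represents INFINITY as
1000000, and that a resource is banned from a node once its fail count reaches
migration-threshold (taken here to be at its default of INFINITY). The function
names are made up for illustration.

```python
# Rough model of the fail-count / migration-threshold interaction.
# Assumptions: INFINITY == 1000000 as in Pacemaker scores; a failed start sets
# the fail count to INFINITY; migration-threshold is left at its default.
INFINITY = 1_000_000


def record_start_failure(fail_counts, resource, node):
    """Hypothetical helper: a failed start pushes the fail count to INFINITY."""
    fail_counts[(resource, node)] = INFINITY


def can_run(fail_counts, resource, node, migration_threshold=INFINITY):
    """Hypothetical helper: the node is usable while fail count < threshold."""
    return fail_counts.get((resource, node), 0) < migration_threshold


def cleanup(fail_counts, resource, node):
    """Hypothetical stand-in for "pcs resource cleanup": forget the failures."""
    fail_counts.pop((resource, node), None)


if __name__ == "__main__":
    counts = {}
    record_start_failure(counts, "Webcontent_FS", "kathie3")
    print("after failed start:", can_run(counts, "Webcontent_FS", "kathie3"))
    cleanup(counts, "Webcontent_FS", "kathie3")
    print("after cleanup:", can_run(counts, "Webcontent_FS", "kathie3"))
```

In this model the cleanup only clears the counter; it does not explain why the
second start succeeds, which fits the idea that DRBD has simply finished
attaching by the time the retry happens.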

This is the log from the start of the cluster:

Oct 22 11:48:12 kathie3 pacemaker-controld[4880]: notice: State transition S_ELECTION -> S_INTEGRATION
Oct 22 11:48:13 kathie3 pacemaker-schedulerd[4879]: notice: Actions: Start      HA-IP_1               (                        kathie3 )
Oct 22 11:48:13 kathie3 pacemaker-schedulerd[4879]: notice: Actions: Start      HA-IP_2               (                        kathie3 )
Oct 22 11:48:13 kathie3 pacemaker-schedulerd[4879]: notice: Actions: Start      HA-IP_3               (                        kathie3 )
Oct 22 11:48:13 kathie3 pacemaker-schedulerd[4879]: notice: Actions: Start      Webcontent_DRBD:0     (                        kathie3 )
Oct 22 11:48:13 kathie3 pacemaker-schedulerd[4879]: notice: Actions: Start      Webcontent_FS         (                        kathie3 )
Oct 22 11:48:13 kathie3 pacemaker-schedulerd[4879]: notice: Actions: Start      ping_fw:0             (                        kathie3 )
Oct 22 11:48:13 kathie3 pacemaker-schedulerd[4879]: notice: Calculated transition 1106, saving inputs in /var/lib/pacemaker/pengine/pe-input-336.bz2
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Initiating start operation HA-IP_1_start_0 locally on kathie3
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Initiating start operation Webcontent_FS_start_0 locally on kathie3
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Initiating start operation ping_fw_start_0 locally on kathie3
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Initiating start operation Webcontent_DRBD_start_0 locally on kathie3
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Requesting local execution of start operation for HA-IP_1 on kathie3
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Requesting local execution of start operation for ping_fw on kathie3
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Requesting local execution of start operation for Webcontent_DRBD on kathie3
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Requesting local execution of start operation for Webcontent_FS on kathie3
Oct 22 11:48:13 kathie3 IPaddr2(HA-IP_1)[1682892]: INFO: Adding inet address 192.168.16.75/24 with broadcast address 192.168.16.255 to device ens3
Oct 22 11:48:13 kathie3 IPaddr2(HA-IP_1)[1682912]: INFO: Bringing device ens3 up
Oct 22 11:48:13 kathie3 Filesystem(Webcontent_FS)[1682923]: INFO: Running start for /dev/drbd1 on /mnt/clusterfs
Oct 22 11:48:13 kathie3 IPaddr2(HA-IP_1)[1682929]: INFO: /usr/libexec/heartbeat/send_arp  -i 200 -r 5 -p /run/resource-agents/send_arp-192.168.16.75 ens3 192.168.16.75 auto not_used not_used
Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data: Starting worker thread (node-id 0)
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Result of start operation for HA-IP_1 on kathie3: ok
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Initiating monitor operation HA-IP_1_monitor_30000 locally on kathie3
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Requesting local execution of monitor operation for HA-IP_1 on kathie3
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Initiating start operation HA-IP_2_start_0 locally on kathie3
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Requesting local execution of start operation for HA-IP_2 on kathie3
Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data: Auto-promote failed: Need access to UpToDate data (-2)
Oct 22 11:48:13 kathie3 kernel: /dev/drbd1: Can't open blockdev
Oct 22 11:48:13 kathie3 kernel: /dev/drbd1: Can't open blockdev
Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: meta-data IO uses: blk-bio
Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: disk( Diskless -> Attaching ) [attach]
Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: Maximum number of peer devices = 1
Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data: Method to ensure write ordering: flush
Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: drbd_bm_resize called with capacity == 104854328
Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: resync bitmap: bits=13106791 words=204794 pages=400
Oct 22 11:48:13 kathie3 kernel: drbd1: detected capacity change from 0 to 104854328
Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: size = 50 GB (52427164 KB)
Oct 22 11:48:13 kathie3 Filesystem(Webcontent_FS)[1683017]: ERROR: Couldn't mount device [/dev/drbd1] as /mnt/clusterfs
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Result of start operation for Webcontent_FS on kathie3: error (Couldn't mount device [/dev/drbd1] as /mnt/clusterfs)
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Webcontent_FS_start_0@kathie3 output [ blockdev: cannot open /dev/drbd1: No data available\nmount: /mnt/clusterfs: mount(2) system call failed: No data available.\nocf-exit-reason:Couldn't mount device [/dev/drbd1] as /mnt/clusterfs\n ]
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Transition 1106 aborted by operation Webcontent_FS_start_0 'modify' on kathie3: Event failed
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Transition 1106 action 37 (Webcontent_FS_start_0 on kathie3): expected 'ok' but got 'error'
Oct 22 11:48:13 kathie3 pacemaker-attrd[4878]: notice: Setting last-failure-Webcontent_FS#start_0[kathie3] in instance_attributes: (unset) -> 1729590493
Oct 22 11:48:13 kathie3 pacemaker-attrd[4878]: notice: Setting fail-count-Webcontent_FS#start_0[kathie3] in instance_attributes: (unset) -> INFINITY
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Transition 1106 aborted by status-1-last-failure-Webcontent_FS.start_0 doing create last-failure-Webcontent_FS#start_0=1729590493: Transient attribute change
Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: bitmap READ of 400 pages took 34 ms
Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: disk( Attaching -> UpToDate ) [attach]
Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: attached to current UUID: 826E8850CF10C812
Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: Setting exposed data uuid: 826E8850CF10C812
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Result of monitor operation for HA-IP_1 on kathie3: ok
Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data stacy3: Starting sender thread (peer-node-id 1)
Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data stacy3: conn( StandAlone -> Unconnected ) [connect]
Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data stacy3: Starting receiver thread (peer-node-id 1)
Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data stacy3: conn( Unconnected -> Connecting ) [connecting]
Oct 22 11:48:13 kathie3 IPaddr2(HA-IP_2)[1683100]: INFO: Adding inet address 192.168.16.76/24 with broadcast address 192.168.16.255 to device ens3
Oct 22 11:48:13 kathie3 IPaddr2(HA-IP_2)[1683106]: INFO: Bringing device ens3 up
Oct 22 11:48:13 kathie3 IPaddr2(HA-IP_2)[1683112]: INFO: /usr/libexec/heartbeat/send_arp  -i 200 -r 5 -p /run/resource-agents/send_arp-192.168.16.76 ens3 192.168.16.76 auto not_used not_used
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Result of start operation for HA-IP_2 on kathie3: ok
Oct 22 11:48:15 kathie3 pacemaker-attrd[4878]: notice: Setting pingd[kathie3] in instance_attributes: (unset) -> 1000
Oct 22 11:48:15 kathie3 pacemaker-controld[4880]: notice: Result of start operation for ping_fw on kathie3: ok
Oct 22 11:48:17 kathie3 IPaddr2(HA-IP_1)[1683126]: INFO: ARPING 192.168.16.75 from 192.168.16.75 ens3#012Sent 5 probes (5 broadcast(s))#012Received 0 response(s)
Oct 22 11:48:17 kathie3 IPaddr2(HA-IP_2)[1683130]: INFO: ARPING 192.168.16.76 from 192.168.16.76 ens3#012Sent 5 probes (5 broadcast(s))#012Received 0 response(s)
Oct 22 11:48:18 kathie3 drbd(Webcontent_DRBD)[1683138]: INFO: webcontent_data: Called drbdsetup wait-connect-resource webcontent_data --wfc-timeout=5 --degr-wfc-timeout=5 --outdated-wfc-timeout=5
Oct 22 11:48:18 kathie3 drbd(Webcontent_DRBD)[1683142]: INFO: webcontent_data: Exit code 5
Oct 22 11:48:18 kathie3 drbd(Webcontent_DRBD)[1683146]: INFO: webcontent_data: Command output:
Oct 22 11:48:18 kathie3 drbd(Webcontent_DRBD)[1683150]: INFO: webcontent_data: Command stderr:
Oct 22 11:48:19 kathie3 pacemaker-attrd[4878]: notice: Setting master-Webcontent_DRBD[kathie3] in instance_attributes: (unset) -> 1000
Oct 22 11:48:19 kathie3 pacemaker-controld[4880]: notice: Result of start operation for Webcontent_DRBD on kathie3: ok
Oct 22 11:48:19 kathie3 pacemaker-controld[4880]: notice: Initiating notify operation Webcontent_DRBD_post_notify_start_0 locally on kathie3
...
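
To make the ordering easier to see, here is a rough Python sketch that pulls
the start-related events for Webcontent_FS and Webcontent_DRBD, plus the drbd1
disk-state change, out of a few lines copied from the log above. The regular
expressions are my own and only match the message formats visible in this log;
this is not a general Pacemaker log parser. Read in log order, the events show
Webcontent_FS being started alongside Webcontent_DRBD and failing before drbd1
reaches UpToDate, while Webcontent_DRBD only reports its own start result at
11:48:19.

```python
import re

# A few lines copied verbatim from the log above (one log entry per line).
LOG = """\
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Initiating start operation Webcontent_FS_start_0 locally on kathie3
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Initiating start operation Webcontent_DRBD_start_0 locally on kathie3
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Result of start operation for Webcontent_FS on kathie3: error (Couldn't mount device [/dev/drbd1] as /mnt/clusterfs)
Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: disk( Attaching -> UpToDate ) [attach]
Oct 22 11:48:19 kathie3 pacemaker-controld[4880]: notice: Result of start operation for Webcontent_DRBD on kathie3: ok
"""

# Patterns for the three message types seen in this particular log.
INITIATED = re.compile(r"Initiating start operation (\w+?)_start_0")
RESULT = re.compile(r"Result of start operation for (\w+) on \S+: (\w+)")
DISK = re.compile(r"(drbd\d+): disk\( (\w+) -> (\w+) \)")


def start_timeline(log):
    """Return (timestamp, event) tuples in the order they appear in the log."""
    events = []
    for line in log.splitlines():
        timestamp = line[:15]  # syslog prefix such as "Oct 22 11:48:13"
        if m := INITIATED.search(line):
            events.append((timestamp, f"{m.group(1)} start initiated"))
        elif m := RESULT.search(line):
            events.append((timestamp, f"{m.group(1)} start result: {m.group(2)}"))
        elif m := DISK.search(line):
            events.append((timestamp, f"{m.group(1)} disk {m.group(2)} -> {m.group(3)}"))
    return events


if __name__ == "__main__":
    for timestamp, event in start_timeline(LOG):
        print(timestamp, event)
```

Since all the early events share the same one-second timestamp, it is the log
order rather than the time values that shows the mount being attempted too
early.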

Is some timeout set wrong, or what am I missing?

Any suggestions are welcome.

Kind regards

fatcharly


_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
