On 03.11.2017 15:49, Ken Gaillot wrote:
> On Thu, 2017-11-02 at 23:18 +0100, Dennis Jacobfeuerborn wrote:
>> On 02.11.2017 23:08, Dennis Jacobfeuerborn wrote:
>>> Hi,
>>> I'm setting up a redundant NFS server for some experiments but almost
>>> immediately ran into a strange issue. The drbd clone resource never
>>> promotes either of the two clones to the Master state.
>>>
>>> The state says this:
>>>
>>>  Master/Slave Set: drbd-clone [drbd]
>>>      Slaves: [ nfsserver1 nfsserver2 ]
>>>  metadata-fs (ocf::heartbeat:Filesystem): Stopped
>>>
>>> The resource configuration looks like this:
>>>
>>> Resources:
>>>  Master: drbd-clone
>>>   Meta Attrs: master-node-max=1 clone-max=2 notify=true master-max=1 clone-node-max=1
>>>   Resource: drbd (class=ocf provider=linbit type=drbd)
>>>    Attributes: drbd_resource=r0
>>>    Operations: demote interval=0s timeout=90 (drbd-demote-interval-0s)
>>>                monitor interval=60s (drbd-monitor-interval-60s)
>>>                promote interval=0s timeout=90 (drbd-promote-interval-0s)
>>>                start interval=0s timeout=240 (drbd-start-interval-0s)
>>>                stop interval=0s timeout=100 (drbd-stop-interval-0s)
>>>  Resource: metadata-fs (class=ocf provider=heartbeat type=Filesystem)
>>>   Attributes: device=/dev/drbd/by-res/r0/0 directory=/var/lib/nfs_shared fstype=ext4 options=noatime
>>>   Operations: monitor interval=20 timeout=40 (metadata-fs-monitor-interval-20)
>>>               start interval=0s timeout=60 (metadata-fs-start-interval-0s)
>>>               stop interval=0s timeout=60 (metadata-fs-stop-interval-0s)
>>>
>>> Location Constraints:
>>> Ordering Constraints:
>>>   promote drbd-clone then start metadata-fs (kind:Mandatory)
>>> Colocation Constraints:
>>>   metadata-fs with drbd-clone (score:INFINITY) (with-rsc-role:Master)
>>>
>>> Shouldn't one of the clones be promoted to the Master state
>>> automatically?
>>
>> I think the source of the issue is this:
>>
>> Nov 2 23:12:03 nfsserver1 drbd(drbd)[4673]: ERROR: r0: Called /usr/sbin/crm_master -Q -l reboot -v 10000
>> Nov 2 23:12:03 nfsserver1 drbd(drbd)[4673]: ERROR: r0: Exit code 107
>> Nov 2 23:12:03 nfsserver1 drbd(drbd)[4673]: ERROR: r0: Command output:
>> Nov 2 23:12:03 nfsserver1 lrmd[2163]: notice: drbd_monitor_60000:4673:stderr [ Error signing on to the CIB service: Transport endpoint is not connected ]
>>
>> It seems the drbd resource agent tries to use crm_master to promote the
>> clone but fails because it cannot "sign on to the CIB service". Does
>> anybody know what that means?
>>
>> Regards,
>> Dennis
>>
>
> That's odd, it should only happen if the cluster is not running, but
> then the agent wouldn't have been called.
>
> The CIB is one of the core daemons of pacemaker; it manages the cluster
> configuration and status. If it's not running, the cluster can't do
> anything.
>
> Perhaps the CIB is crashing, or something is blocking the communication
> between the agent and the CIB.

SELinux was the culprit. After disabling it, the problem went away.

Regards,
  Dennis
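
For anyone hitting the same "Error signing on to the CIB service" symptom, here is a minimal sketch for checking whether SELinux is enforcing before disabling it outright. It assumes selinuxfs is mounted at the usual /sys/fs/selinux location; the function name selinux_mode is made up for illustration and is not part of pacemaker or the drbd agent.

```python
from pathlib import Path


def selinux_mode(enforce_file="/sys/fs/selinux/enforce"):
    """Return 'enforcing', 'permissive', or 'disabled'.

    Assumption: selinuxfs is mounted at /sys/fs/selinux. If the enforce
    file is missing, SELinux is treated as disabled (or not mounted).
    """
    try:
        value = Path(enforce_file).read_text().strip()
    except (FileNotFoundError, NotADirectoryError):
        return "disabled"
    # The kernel exposes "1" for enforcing and "0" for permissive.
    return "enforcing" if value == "1" else "permissive"


if __name__ == "__main__":
    print(selinux_mode())
```

If this reports enforcing on the affected node, running `setenforce 0` switches to permissive mode until the next reboot, which may be a quicker way to confirm the diagnosis than disabling SELinux completely.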
