On Mon, Apr 11, 2022 at 1:53 PM Andrei Borzenkov <[email protected]> wrote:
>
> On 11.04.2022 19:02, Salatiel Filho wrote:
> > Hi, I am deploying pacemaker + drbd to provide high-availability
> > storage, and during troubleshooting tests I ran into strange behaviour
> > where the colocation constraint between the remaining resources and the
> > cloned group appears to be simply ignored.
> >
> > These are the constraints I have:
> >
> > Location Constraints:
> > Ordering Constraints:
> >   start DRBDData-clone then start nfs (kind:Mandatory)
> > Colocation Constraints:
> >   nfs with DRBDData-clone (score:INFINITY)
> > Ticket Constraints:
> >
> > The environment: I have a two-node cluster with a remote quorum device.
> > The test was to stop the quorum device and afterwards stop the node
> > currently running all the services (node1). The expected behaviour is
> > that the remaining node would not be able to do anything (partition
> > WITHOUT quorum) until it regains quorum. This is the output of pcs
> > status on node2 after powering off the quorum device and node1.
> >
> > Some resources have been removed from the output to make this email cleaner.
> >
> > Cluster name: storage-drbd
> > Cluster Summary:
> >   * Stack: corosync
> >   * Current DC: node2 (version 2.1.0-8.el8-7c3f660707) - partition WITHOUT quorum
> >   * Last updated: Mon Apr 11 12:28:06 2022
> >   * Last change: Mon Apr 11 12:26:10 2022 by root via cibadmin on node2
> >   * 2 nodes configured
> >   * 11 resource instances configured
> >
> > Node List:
> >   * Node node1: UNCLEAN (offline)
> >   * Online: [ node2 ]
> >
> > Full List of Resources:
> >   * fence-node1 (stonith:fence_vmware_rest): Started node2
> >   * fence-node2 (stonith:fence_vmware_rest): Started node1 (UNCLEAN)
> >   * Clone Set: DRBDData-clone [DRBDData] (promotable):
> >     * DRBDData (ocf::linbit:drbd): Master node1 (UNCLEAN)
> >     * Slaves: [ node2 ]
> >   * Resource Group: nfs:
> >     * vip_nfs (ocf::heartbeat:IPaddr2): Started node1 (UNCLEAN)
> >     * drbd_fs (ocf::heartbeat:Filesystem): Started node1 (UNCLEAN)
> >     * nfsd (ocf::heartbeat:nfsserver): Started node1 (UNCLEAN)
> >
> > Daemon Status:
> >   corosync: active/enabled
> >   pacemaker: active/enabled
> >   pcsd: active/enabled
> >
> > As expected, node2 is without quorum and waiting. The problem happened
> > when I turned node1 back on. Quorum was re-established, but the DRBD
> > master was promoted on node2 while the nfs group started on node1, even
> > though I have both a start order and a colocation constraint that should
> > make the cloned resource and the nfs group run on the same node.
> >
>
> No, you do not.
>
> > Cluster name: storage-drbd
> > Cluster Summary:
> >   * Stack: corosync
> >   * Current DC: node2 (version 2.1.0-8.el8-7c3f660707) - partition with quorum
> >   * Last updated: Mon Apr 11 12:29:08 2022
> >   * Last change: Mon Apr 11 12:26:10 2022 by root via cibadmin on node2
> >   * 2 nodes configured
> >   * 11 resource instances configured
> >
> > Node List:
> >   * Online: [ node1 node2 ]
> >
> > Full List of Resources:
> >   * fence-node1 (stonith:fence_vmware_rest): Started node2
> >   * fence-node2 (stonith:fence_vmware_rest): Started node1
> >   * Clone Set: DRBDData-clone [DRBDData] (promotable):
> >     * Masters: [ node2 ]
> >     * Slaves: [ node1 ]
> >   * Resource Group: nfs:
> >     * vip_nfs (ocf::heartbeat:IPaddr2): Started node1
> >     * drbd_fs (ocf::heartbeat:Filesystem): FAILED node1
> >     * nfsd (ocf::heartbeat:nfsserver): Stopped
> >
> > Failed Resource Actions:
> >   * drbd_fs_start_0 on node1 'error' (1): call=90, status='complete',
> >     exitreason='Couldn't mount device [/dev/drbd0] as /exports/drbd0',
> >     last-rc-change='2022-04-11 12:29:05 -03:00', queued=0ms, exec=2567ms
> >
> > Daemon Status:
> >   corosync: active/enabled
> >   pacemaker: active/enabled
> >   pcsd: active/enabled
> >
> > Can anyone explain to me why the constraints are being ignored?
> >
>
> Your order/colocation is against the start of the clone resource, not
> against the master role. If you need to order/colocate a resource against
> the master, you need to say this explicitly. Colocating/ordering against
> "start" is satisfied as soon as the cloned resource is started as a slave,
> before it gets promoted.
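
(For reference, since this is the crux: neither of the constraints quoted at
the top names a role. Assuming the resource names above, they correspond to
CIB XML roughly like the following, ids omitted; the real entries can be
dumped with "cibadmin -Q -o constraints":

  <rsc_order first="DRBDData-clone" first-action="start"
             then="nfs" then-action="start" kind="Mandatory"/>
  <rsc_colocation rsc="nfs" with-rsc="DRBDData-clone" score="INFINITY"/>

With first-action="start" and no with-rsc-role, both are met as soon as any
instance of DRBDData-clone is running as a Slave, which is why nfs was free
to start on node1 while the Master sat on node2.)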
Thanks, Andrei. I suppose these are the required constraints then:

# pcs constraint order promote DRBDData-clone then start nfs
# pcs constraint colocation add nfs with master DRBDData-clone INFINITY
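
(Two caveats, assuming the Pacemaker 2.1 packages shown in the status output,
where the promoted role is still called Master: adding those two commands does
not remove the original role-less constraints, so it is worth deleting those
first and then re-checking the result. The constraint ids below are
placeholders; the real ones are listed by "pcs constraint --full":

  # pcs constraint --full
  # pcs constraint remove <old-order-constraint-id>
  # pcs constraint remove <old-colocation-constraint-id>

After adding the role-aware versions, the order constraint should carry
first-action="promote" and the colocation should carry with-rsc-role="Master"
in the output of "cibadmin -Q -o constraints".)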
