On 02/18/2016 01:07 PM, Jeremy Matthews wrote:
> Hi,
> 
> We're having an issue with our cluster where, after a reboot of our
> system, a location constraint reappears for the ClusterIP. This causes
> a problem, because we have a daemon that checks the cluster state and
> waits until the ClusterIP is started before it kicks off our
> application. We didn't have this issue when using an earlier version
> of pacemaker. Here is the constraint as shown by pcs:
> 
> [root@g5se-f3efce cib]# pcs constraint
> Location Constraints:
>   Resource: ClusterIP
>     Disabled on: g5se-f3efce (role: Started)
> Ordering Constraints:
> Colocation Constraints:
> 
> ...and here is our cluster status with the ClusterIP being Stopped:
> 
> [root@g5se-f3efce cib]# pcs status
> Cluster name: cl-g5se-f3efce
> Last updated: Thu Feb 18 11:36:01 2016
> Last change: Thu Feb 18 10:48:33 2016 via crm_resource on g5se-f3efce
> Stack: cman
> Current DC: g5se-f3efce - partition with quorum
> Version: 1.1.11-97629de
> 1 Nodes configured
> 4 Resources configured
> 
> Online: [ g5se-f3efce ]
> 
> Full list of resources:
> 
>  sw-ready-g5se-f3efce  (ocf::pacemaker:GBmon):      Started g5se-f3efce
>  meta-data             (ocf::pacemaker:GBmon):      Started g5se-f3efce
>  netmon                (ocf::heartbeat:ethmonitor): Started g5se-f3efce
>  ClusterIP             (ocf::heartbeat:IPaddr2):    Stopped
> 
> The cluster really just has one node at this time.
> 
> I retrieve the constraint ID, remove the constraint, verify that
> ClusterIP is started, and then reboot:
> 
> [root@g5se-f3efce cib]# pcs constraint ref ClusterIP
> Resource: ClusterIP
>   cli-ban-ClusterIP-on-g5se-f3efce
> [root@g5se-f3efce cib]# pcs constraint remove cli-ban-ClusterIP-on-g5se-f3efce
> 
> [root@g5se-f3efce cib]# pcs status
> Cluster name: cl-g5se-f3efce
> Last updated: Thu Feb 18 11:45:09 2016
> Last change: Thu Feb 18 11:44:53 2016 via crm_resource on g5se-f3efce
> Stack: cman
> Current DC: g5se-f3efce - partition with quorum
> Version: 1.1.11-97629de
> 1 Nodes configured
> 4 Resources configured
> 
> Online: [ g5se-f3efce ]
> 
> Full list of resources:
> 
>  sw-ready-g5se-f3efce  (ocf::pacemaker:GBmon):      Started g5se-f3efce
>  meta-data             (ocf::pacemaker:GBmon):      Started g5se-f3efce
>  netmon                (ocf::heartbeat:ethmonitor): Started g5se-f3efce
>  ClusterIP             (ocf::heartbeat:IPaddr2):    Started g5se-f3efce
> 
> [root@g5se-f3efce cib]# reboot
> 
> ...after reboot, log in, and the constraint is back and ClusterIP has
> not started.
> 
> I have noticed in /var/lib/pacemaker/cib that the cib-x.raw files get
> created when there are changes to the cib (cib.xml). After a reboot, I
> see the constraint being added in a diff between .raw files:
> 
> [root@g5se-f3efce cib]# diff cib-7.raw cib-8.raw
> 1c1
> < <cib epoch="239" num_updates="0" admin_epoch="0"
> validate-with="pacemaker-1.2" cib-last-written="Thu Feb 18 11:44:53 2016"
> update-origin="g5se-f3efce" update-client="crm_resource"
> crm_feature_set="3.0.9" have-quorum="1" dc-uuid="g5se-f3efce">
> ---
> > <cib epoch="240" num_updates="0" admin_epoch="0"
> > validate-with="pacemaker-1.2" cib-last-written="Thu Feb 18 11:46:49 2016"
> > update-origin="g5se-f3efce" update-client="crm_resource"
> > crm_feature_set="3.0.9" have-quorum="1" dc-uuid="g5se-f3efce">
> 50c50,52
> <   <constraints/>
> ---
> >   <constraints>
> >     <rsc_location id="cli-ban-ClusterIP-on-g5se-f3efce" rsc="ClusterIP"
> > role="Started" node="g5se-f3efce" score="-INFINITY"/>
> >   </constraints>
> 
> I have also looked in /var/log/cluster/corosync.log and seen logs
> where it seems the cib is getting updated.
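There is a clue in that diff already: the new epoch was written with
update-client="crm_resource", so whatever re-adds the constraint is
going through crm_resource (pcs shells out to it for move/ban, if I
remember right). The constraint ID itself is also worth grepping for;
a rough sketch, using the paths from your output (the pattern is just
a starting point):

    # Does the constraint ID show up in the cluster log, and when?
    grep -n 'cli-ban-ClusterIP-on-g5se-f3efce' /var/log/cluster/corosync.log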
> I'm not sure if the constraint is being put back in at shutdown or at
> start up. I just don't understand why it's being put back in. I don't
> think our daemon code or other scripts are doing this, but it is
> something I could verify.
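That part should be answerable from timestamps alone: if the new
epoch's cib-last-written stamp falls after the boot time, the
constraint is being re-added on the way up rather than at shutdown. A
minimal check, assuming a stock CentOS 6 layout ("last -x" output
formatting varies by distribution):

    # When was the current CIB last written?
    grep -o 'cib-last-written="[^"]*"' /var/lib/pacemaker/cib/cib.xml

    # Recent reboots and shutdowns to compare against
    last -x reboot shutdown | head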
I would look at any scripts running around that time first. Constraints
whose IDs start with "cli-" were created by one of the CLI tools, so
something must be calling one of them. The most likely candidates are
pcs resource move/ban and crm_resource -M/--move/-B/--ban; see the
sketches at the end of this mail.

> ********************************
> 
> From "yum info pacemaker", my current version is:
> 
> Name        : pacemaker
> Arch        : x86_64
> Version     : 1.1.12
> Release     : 8.el6_7.2
> 
> My earlier version was:
> 
> Name        : pacemaker
> Arch        : x86_64
> Version     : 1.1.10
> Release     : 1.el6_4.4
> 
> I'm still using an earlier version of pcs, because the new one seems
> to have issues with python:
> 
> Name        : pcs
> Arch        : noarch
> Version     : 0.9.90
> Release     : 1.0.1.el6.centos
> 
> *******************************
> 
> If anyone has ideas on the cause or thoughts on this, anything would
> be greatly appreciated.
> 
> Thanks!
> 
> Jeremy Matthews
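As promised above, here is how those "cli-" constraints typically come
and go. This is a sketch from memory and option spellings differ a bit
across versions (in some builds --clear is spelled -U/--un-move/
--un-ban), but on a single-node cluster like yours either of the first
two commands would write a -INFINITY rsc_location named
cli-ban-ClusterIP-on-g5se-f3efce into the CIB, which is exactly what
your diff shows:

    # Either of these creates the cli-ban constraint on the current node:
    crm_resource --ban --resource ClusterIP
    pcs resource move ClusterIP    # a move with no target node bans the current one

    # And either of these removes it again:
    crm_resource --clear --resource ClusterIP
    pcs constraint remove cli-ban-ClusterIP-on-g5se-f3efce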

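As for finding the caller: on a cman/CentOS 6 box the usual suspects
are init scripts, rc.local, and cron, plus your own daemon since it
already talks to the cluster. A quick sweep (paths are a guess, adjust
to taste) for anything that shells out to the tools:

    grep -RE 'crm_resource|pcs (resource|constraint)' \
        /etc/init.d /etc/rc.local /etc/cron* /etc/rc*.d 2>/dev/null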