Re: [ClusterLabs] ClusterIP location constraint reappears after reboot

2016-03-14 Thread Ken Gaillot
On 02/22/2016 05:23 PM, Jeremy Matthews wrote:
> Thanks for the quick response again, and pardon for the delay in responding. 
> A colleague of mine and I have been trying some different things today.
> 
> But from the reboot on Friday, further below are the logs from corosync.log 
> from the time of the reboot command to the constraint being added.
> 
> I am not able to perform a "pcs cluster cib-upgrade". The version of pcs that
> I have does not have that option (just cib [filename] and cib-push
> <filename>). My versions at the time of these logs were:

I'm curious whether you were able to solve your issue.

Regarding cib-upgrade, you can use the "cibadmin --upgrade" command
instead, which is what pcs does behind the scenes. For a
better-safe-than-sorry how-to, see:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_upgrading_the_configuration
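
In rough outline, the safe sequence looks like this (a sketch only; put the
backup wherever suits you):

cibadmin --query > /tmp/cib-backup.xml    # save the current CIB first
cibadmin --upgrade                        # upgrade the config to the latest schema
cibadmin --replace --xml-file /tmp/cib-backup.xml   # only if you need to roll back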

> [root@g5se-f3efce Packages]# pcs --version
> 0.9.90
> [root@g5se-f3efce Packages]# pacemakerd --version
> Pacemaker 1.1.11
> Written by Andrew Beekhof
> 
> I think you're right in that we had a script banning the ClusterIP. It is
> called from a message daemon that we created that acts as middleware between
> the cluster software and our application. This daemon has an exit handler
> that calls a script which runs:
> 
> pcs resource ban ClusterIP $host    # where $host is the result of host=`hostname`
> 
> ...because we normally try to push the cluster IP to the other side (though in
> this case, we just have one node), but then right after that the script calls:
> 
> pcs resource clear ClusterIP
> 
> 
> ...but for some reason, it doesn't seem to result in the constraint being
> removed (see even FURTHER below, where I show a /var/log/messages snippet
> with both the constraint addition and removal; that was with an earlier
> version of pacemaker, 1.1.10-1.el6_4.4). I guess with the earlier pcs or
> pacemaker versions these logs went to /var/log/messages, whereas today they
> go to corosync.log.
> 
> I am in a bit of a conundrum: if I upgrade pcs to 0.9.149 (retrieved and
> "make install"-ed from github.com, because 0.9.139 had a pcs issue with
> one-node clusters), which has the cib-upgrade option, and then manually
> remove the ClusterIP constraint, our message daemon thinks neither side of
> the cluster is active; something to look at on our end. So it seems the
> removal of the constraint affects our daemon under the new pcs. For the time
> being, I've rolled back pcs to the above 0.9.90 version.
> 
> One other thing to mention: the timing of pacemaker's start may have been
> changed by what I found out was an edit to its init-script header (by either
> our daemon or application installation script) from "90 1" to "70 20". So in
> /etc/rc3.d there is S70pacemaker rather than S90pacemaker. I am not a Linux
> expert by any means; I guess that may affect startup, but I'm not sure about
> shutdown.
> 
> Corosync logs from the time the reboot was issued until the constraint was added:
> 
> Feb 19 15:22:22 [1997] g5se-f3efce  attrd:   notice: 
> attrd_trigger_update:  Sending flush op to all hosts for: standby (true)
> Feb 19 15:22:22 [1997] g5se-f3efce  attrd:   notice: 
> attrd_perform_update:  Sent update 24: standby=true
> Feb 19 15:22:22 [1994] g5se-f3efcecib: info: cib_process_request: 
>   Forwarding cib_modify operation for section status to master 
> (origin=local/attrd/24)
> Feb 19 15:22:22 [1994] g5se-f3efcecib: info: cib_perform_op:  
>   Diff: --- 0.291.2 2
> Feb 19 15:22:22 [1994] g5se-f3efcecib: info: cib_perform_op:  
>   Diff: +++ 0.291.3 (null)
> Feb 19 15:22:22 [1994] g5se-f3efcecib: info: cib_perform_op:  
>   +  /cib:  @num_updates=3
> Feb 19 15:22:22 [1994] g5se-f3efcecib: info: cib_perform_op:  
> ++ /cib/status/node_state[@id='g5se-f3efce']/transient_attributes[@id='g5se-f3efce']/instance_attributes[@id='status-g5se-f3efce']:
>      <nvpair id="status-g5se-f3efce-standby" name="standby" value="true"/>
> Feb 19 15:22:22 [1999] g5se-f3efce   crmd: info: 
> abort_transition_graph:Transition aborted by 
> status-g5se-f3efce-standby, standby=true: Transient attribute change (create 
> cib=0.291.3, source=te_update_diff:391, 
> path=/cib/status/node_state[@id='g5se-f3efce']/transient_attributes[@id='g5se-f3efce']/instance_attributes[@id='status-g5se-f3efce'],
>  1)
> Feb 19 15:22:22 [1999] g5se-f3efce   crmd:   notice: do_state_transition: 
>   State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC 
> cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> Feb 19 15:22:22 [1994] g5se-f3efcecib: info: cib_process_request: 
>   Completed cib_modify operation for section status: OK (rc=0, 
> origin=g5se-f3efce/attrd/24, version=0.291.3)
> Feb 19 15:22:22 [1998] g5se-f3efcepengine:   notice: update_validation:   
>   pacemaker-1.2-style configuration is also valid for pacemaker-1.3

Re: [ClusterLabs] ClusterIP location constraint reappears after reboot

2016-02-22 Thread Jeremy Matthews
Thanks for the quick response again, and pardon for the delay in responding. A 
colleague of mine and I have been trying some different things today.

But from the reboot on Friday, further below are the logs from corosync.log 
from the time of the reboot command to the constraint being added.

I am not able to perform a "pcs cluster cib-upgrade". The version of pcs that I
have does not have that option (just cib [filename] and cib-push <filename>).
My versions at the time of these logs were:

[root@g5se-f3efce Packages]# pcs --version
0.9.90
[root@g5se-f3efce Packages]# pacemakerd --version
Pacemaker 1.1.11
Written by Andrew Beekhof

I think you're right in that we had a script banning the ClusterIP. It is
called from a message daemon that we created that acts as middleware between
the cluster software and our application. This daemon has an exit handler that
calls a script which runs:

pcs resource ban ClusterIP $host    # where $host is the result of host=`hostname`

...because we normally try to push the cluster IP to the other side (though in
this case, we just have one node), but then right after that the script calls:

pcs resource clear ClusterIP


...but for some reason, it doesn't seem to result in the constraint being
removed (see even FURTHER below, where I show a /var/log/messages snippet with
both the constraint addition and removal; that was with an earlier version of
pacemaker, 1.1.10-1.el6_4.4). I guess with the earlier pcs or pacemaker
versions these logs went to /var/log/messages, whereas today they go to
corosync.log.
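
Boiled down, the exit handler amounts to something like this (a minimal
sketch; the variable name and layout are stand-ins rather than our exact code):

#!/bin/sh
# On daemon exit, ban the IP from this node to push it to the peer...
host=$(hostname)
pcs resource ban ClusterIP "$host"
# ...then clear the ban so the cli-ban constraint doesn't linger.
pcs resource clear ClusterIP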

I am in a bit of a conundrum: if I upgrade pcs to 0.9.149 (retrieved and
"make install"-ed from github.com, because 0.9.139 had a pcs issue with
one-node clusters), which has the cib-upgrade option, and then manually remove
the ClusterIP constraint, our message daemon thinks neither side of the
cluster is active; something to look at on our end. So it seems the removal of
the constraint affects our daemon under the new pcs. For the time being, I've
rolled back pcs to the above 0.9.90 version.

One other thing to mention: the timing of pacemaker's start may have been
changed by what I found out was an edit to its init-script header (by either
our daemon or application installation script) from "90 1" to "70 20". So in
/etc/rc3.d there is S70pacemaker rather than S90pacemaker. I am not a Linux
expert by any means; I guess that may affect startup, but I'm not sure about
shutdown.
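
For anyone else reading: the header in question is the chkconfig line near the
top of the init script, where the first number is the start priority and the
second the stop priority. The change would look roughly like this (the
runlevels shown are illustrative, not necessarily what is in our script):

# chkconfig: 2345 90 1     <- before: S90pacemaker at boot, K01pacemaker (stopped early) at shutdown
# chkconfig: 2345 70 20    <- after:  S70pacemaker at boot, K20pacemaker (stopped later) at shutdown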

Corosync logs from the time the reboot was issued until the constraint was added:

Feb 19 15:22:22 [1997] g5se-f3efce  attrd:   notice: attrd_trigger_update:  
Sending flush op to all hosts for: standby (true)
Feb 19 15:22:22 [1997] g5se-f3efce  attrd:   notice: attrd_perform_update:  
Sent update 24: standby=true
Feb 19 15:22:22 [1994] g5se-f3efcecib: info: cib_process_request:   
Forwarding cib_modify operation for section status to master 
(origin=local/attrd/24)
Feb 19 15:22:22 [1994] g5se-f3efcecib: info: cib_perform_op:
Diff: --- 0.291.2 2
Feb 19 15:22:22 [1994] g5se-f3efcecib: info: cib_perform_op:
Diff: +++ 0.291.3 (null)
Feb 19 15:22:22 [1994] g5se-f3efcecib: info: cib_perform_op:
+  /cib:  @num_updates=3
Feb 19 15:22:22 [1994] g5se-f3efcecib: info: cib_perform_op:
++ /cib/status/node_state[@id='g5se-f3efce']/transient_attributes[@id='g5se-f3efce']/instance_attributes[@id='status-g5se-f3efce']:
     <nvpair id="status-g5se-f3efce-standby" name="standby" value="true"/>
Feb 19 15:22:22 [1999] g5se-f3efce   crmd: info: 
abort_transition_graph:Transition aborted by 
status-g5se-f3efce-standby, standby=true: Transient attribute change (create 
cib=0.291.3, source=te_update_diff:391, 
path=/cib/status/node_state[@id='g5se-f3efce']/transient_attributes[@id='g5se-f3efce']/instance_attributes[@id='status-g5se-f3efce'],
 1)
Feb 19 15:22:22 [1999] g5se-f3efce   crmd:   notice: do_state_transition:   
State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC 
cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Feb 19 15:22:22 [1994] g5se-f3efcecib: info: cib_process_request:   
Completed cib_modify operation for section status: OK (rc=0, 
origin=g5se-f3efce/attrd/24, version=0.291.3)
Feb 19 15:22:22 [1998] g5se-f3efcepengine:   notice: update_validation: 
pacemaker-1.2-style configuration is also valid for pacemaker-1.3
Feb 19 15:22:22 [1998] g5se-f3efcepengine: info: update_validation: 
Transformation upgrade-1.3.xsl successful
Feb 19 15:22:22 [1998] g5se-f3efcepengine: info: update_validation: 
Transformed the configuration from pacemaker-1.2 to pacemaker-2.0
Feb 19 15:22:22 [1998] g5se-f3efcepengine: info: cli_config_update: 
Your configuration was internally updated to the latest version (pacemaker-2.0)
Feb 19 15:22:22 [1998] g5se-f3efcepengine:   notice: unpack_config: 
On loss of CCM Quorum: Ignore

Re: [ClusterLabs] ClusterIP location constraint reappears after reboot

2016-02-22 Thread Ken Gaillot
> Feb 19 15:22:23 [1998] g5se-f3efcepengine:   notice: unpack_config:   
>   On loss of CCM Quorum: Ignore
> Feb 19 15:22:23 [1998] g5se-f3efcepengine: info: unpack_status:   
>   Node g5se-f3efce is in standby-mode
> 
> I'm not sure how much of my original email, with Ken Gaillot's response
> embedded in it below, needs to be included.
> 
> Message: 3
> Date: Thu, 18 Feb 2016 13:37:31 -0600
> From: Ken Gaillot 
> To: users@clusterlabs.org
> Subject: Re: [ClusterLabs] ClusterIP location constraint reappears
>   after reboot
> Message-ID: <56c61d7b.9090...@redhat.com>
> Content-Type: text/plain; charset=windows-1252
> 
> On 02/18/2016 01:07 PM, Jeremy Matthews wrote:
>> Hi,
>>
>> We're having an issue with our cluster where after a reboot of our system a 
>> location constraint reappears for the ClusterIP. This causes a problem, 
>> because we have a daemon that checks the cluster state and waits until the 
>> ClusterIP is started before it kicks off our application. We didn't have 
>> this issue when using an earlier version of pacemaker. Here is the 
>> constraint as shown by pcs:
>>
>> [root@g5se-f3efce cib]# pcs constraint
>> Location Constraints:
>>   Resource: ClusterIP
>>     Disabled on: g5se-f3efce (role: Started)
>> Ordering Constraints:
>> Colocation Constraints:
>>
>> ...and here is our cluster status with the ClusterIP being Stopped:
>>
>> [root@g5se-f3efce cib]# pcs status
>> Cluster name: cl-g5se-f3efce
>> Last updated: Thu Feb 18 11:36:01 2016
>> Last change: Thu Feb 18 10:48:33 2016 via crm_resource on g5se-f3efce
>> Stack: cman
>> Current DC: g5se-f3efce - partition with quorum
>> Version: 1.1.11-97629de
>> 1 Nodes configured
>> 4 Resources configured
>>
>>
>> Online: [ g5se-f3efce ]
>>
>> Full list of resources:
>>
>> sw-ready-g5se-f3efce   (ocf::pacemaker:GBmon): Started g5se-f3efce
>> meta-data  (ocf::pacemaker:GBmon): Started g5se-f3efce
>> netmon (ocf::heartbeat:ethmonitor):Started g5se-f3efce
>> ClusterIP  (ocf::heartbeat:IPaddr2):   Stopped
>>
>>
>> The cluster really just has one node at this time.
>>
>> I retrieve the constraint ID, remove the constraint, verify that ClusterIP 
>> is started,  and then reboot:
>>
>> [root@g5se-f3efce cib]# pcs constraint ref ClusterIP
>> Resource: ClusterIP
>>   cli-ban-ClusterIP-on-g5se-f3efce
>> [root@g5se-f3efce cib]# pcs constraint remove cli-ban-ClusterIP-on-g5se-f3efce
>>
>> [root@g5se-f3efce cib]# pcs status
>> Cluster name: cl-g5se-f3efce
>> Last updated: Thu Feb 18 11:45:09 2016
>> Last change: Thu Feb 18 11:44:53 2016 via crm_resource on g5se-f3efce
>> Stack: cman
>> Current DC: g5se-f3efce - partition with quorum
>> Version: 1.1.11-97629de
>> 1 Nodes configured
>> 4 Resources configured
>>
>>
>> Online: [ g5se-f3efce ]
>>
>> Full list of resources:
>>
>> sw-ready-g5se-f3efce   (ocf::pacemaker:GBmon): Started g5se-f3efce
>> meta-data  (ocf::pacemaker:GBmon): Started g5se-f3efce
>> netmon (ocf::heartbeat:ethmonitor):Started g5se-f3efce
>> ClusterIP  (ocf::heartbeat:IPaddr2):   Started g5se-f3efce
>>
>>
>> [root@g5se-f3efce cib]# reboot
>>
>> After the reboot, I log in, and the constraint is back and ClusterIP has not
>> started.
>>
>>
>> I have noticed in /var/lib/pacemaker/cib that the cib-x.raw files get 
>> created when there are changes to the cib (cib.xml). After a reboot, I see 
>> the constraint being added in a diff between .raw files:
>>
>> [root@g5se-f3efce cib]# diff cib-7.raw cib-8.raw
>> 1c1
>> < <cib ... validate-with="pacemaker-1.2" cib-last-written="Thu Feb 18 11:44:53 2016" update-origin="g5se-f3efce" update-client="crm_resource" crm_feature_set="3.0.9" have-quorum="1" dc-uuid="g5se-f3efce">
>> ---
>> > <cib ... validate-with="pacemaker-1.2" cib-last-written="Thu Feb 18 11:46:49 2016" update-origin="g5se-f3efce" update-client="crm_resource" crm_feature_set="3.0.9" have-quorum="1" dc-uuid="g5se-f3efce">
>> 50c50,52
>> < <constraints/>
>> ---
>> > <constraints>
>> >   <rsc_location id="cli-ban-ClusterIP-on-g5se-f3efce" rsc="ClusterIP" role="Started" node="g5se-f3efce" score="-INFINITY"/>
>> > </constraints>
>>
>>
>> I have also looked in /var/log/cluster/corosync.log and seen logs where it
>> seems the cib is getting updated.

Re: [ClusterLabs] ClusterIP location constraint reappears after reboot

2016-02-22 Thread Jeremy Matthews
Thank you, Ken Gaillot, for your response. Sorry for the delayed follow-up,
but I have looked and looked at the scripts. There are a couple of scripts
that have a pcs resource ban command, but they are not executed at the time of
shutdown, which is when I've discovered that the constraint is put back in.
Our application software did not change on the system. We just updated pcs and
pacemaker (and dependencies). I had to roll back pcs because it has an issue.

Below is from /var/log/cluster/corosync.log. Any clues here as to why the
constraint might have been added? On my other system, without the pacemaker
update, the constraint is not added.

Feb 19 15:22:23 [1999] g5se-f3efce   crmd: info: do_state_transition:   
State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS 
cause=C_IPC_MESSAGE origin=handle_response ]
Feb 19 15:22:23 [1999] g5se-f3efce   crmd: info: do_te_invoke:  
Processing graph 9 (ref=pe_calc-dc-1455920543-46) derived from 
/var/lib/pacemaker/pengine/pe-input-642.bz2
Feb 19 15:22:23 [1999] g5se-f3efce   crmd:   notice: run_graph: 
Transition 9 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, 
Source=/var/lib/pacemaker/pengine/pe-input-642.bz2): Complete
Feb 19 15:22:23 [1999] g5se-f3efce   crmd: info: do_log:FSA: 
Input I_TE_SUCCESS from notify_crmd() received in state S_TRANSITION_ENGINE
Feb 19 15:22:23 [1999] g5se-f3efce   crmd:   notice: do_state_transition:   
State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS 
cause=C_FSA_INTERNAL origin=notify_crmd ]
Feb 19 15:22:23 [1998] g5se-f3efcepengine:   notice: process_pe_message:
Calculated Transition 9: /var/lib/pacemaker/pengine/pe-input-642.bz2
Feb 19 15:22:23 [1994] g5se-f3efcecib: info: cib_process_request:   
Forwarding cib_modify operation for section constraints to master 
(origin=local/crm_resource/3)
Feb 19 15:22:23 [1994] g5se-f3efcecib: info: cib_perform_op:
Diff: --- 0.291.8 2
Feb 19 15:22:23 [1994] g5se-f3efcecib: info: cib_perform_op:
Diff: +++ 0.292.0 (null)
Feb 19 15:22:23 [1994] g5se-f3efcecib: info: cib_perform_op:
+  /cib:  @epoch=292, @num_updates=0
Feb 19 15:22:23 [1994] g5se-f3efcecib: info: cib_perform_op:
++ /cib/configuration/constraints:  <rsc_location id="cli-ban-ClusterIP-on-g5se-f3efce" rsc="ClusterIP" role="Started" node="g5se-f3efce" score="-INFINITY"/>
Feb 19 15:22:23 [1994] g5se-f3efcecib: info: cib_process_request:   
Completed cib_modify operation for section constraints: OK (rc=0, 
origin=g5se-f3efce/crm_resource/3, version=0.292.0)
Feb 19 15:22:23 [1999] g5se-f3efce   crmd: info: 
abort_transition_graph:Transition aborted by 
rsc_location.cli-ban-ClusterIP-on-g5se-f3efce 'create': Non-status change 
(cib=0.292.0, source=te_update_diff:383, path=/cib/configuration/constraints, 1)
Feb 19 15:22:23 [1999] g5se-f3efce   crmd:   notice: do_state_transition:   
State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC 
cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Feb 19 15:22:23 [1998] g5se-f3efcepengine:   notice: update_validation: 
pacemaker-1.2-style configuration is also valid for pacemaker-1.3
Feb 19 15:22:23 [1998] g5se-f3efcepengine: info: update_validation: 
Transformation upgrade-1.3.xsl successful
Feb 19 15:22:23 [1998] g5se-f3efcepengine: info: update_validation: 
Transformed the configuration from pacemaker-1.2 to pacemaker-2.0
Feb 19 15:22:23 [1998] g5se-f3efcepengine: info: cli_config_update: 
Your configuration was internally updated to the latest version (pacemaker-2.0)
Feb 19 15:22:23 [1998] g5se-f3efcepengine:   notice: unpack_config: 
On loss of CCM Quorum: Ignore
Feb 19 15:22:23 [1998] g5se-f3efcepengine: info: unpack_status: 
Node g5se-f3efce is in standby-mode
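
One way I can try to bracket when the ban reappears (assuming the default
snapshot directory) is to check which numbered CIB snapshot first contains it:

grep -l 'cli-ban-ClusterIP' /var/lib/pacemaker/cib/cib-*.raw

The lowest-numbered file listed would be the first CIB version carrying the
constraint.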

I'm not sure how much of my original email, with Ken Gaillot's response
embedded in it below, needs to be included.

Message: 3
Date: Thu, 18 Feb 2016 13:37:31 -0600
From: Ken Gaillot 
To: users@clusterlabs.org
Subject: Re: [ClusterLabs] ClusterIP location constraint reappears
after reboot
Message-ID: <56c61d7b.9090...@redhat.com>
Content-Type: text/plain; charset=windows-1252

On 02/18/2016 01:07 PM, Jeremy Matthews wrote:
> Hi,
> 
> We're having an issue with our cluster where after a reboot of our system a 
> location constraint reappears for the ClusterIP. This causes a problem, 
> because we have a daemon that checks the cluster state and waits until the 
> ClusterIP is started before it kicks off our application. We didn't have this 
> issue when using an earlier version of pacemaker. Here is the constraint as 
> shown by pcs:
> 
> [root@g5se-f3efce cib]# pcs constraint
> Location Constraints:
>   Resource: ClusterIP
>     Disabled on: g5se-f3efce (role: Started)
> Ordering Constraints:
> Colocation Constraints:
> 

Re: [ClusterLabs] ClusterIP location constraint reappears after reboot

2016-02-18 Thread Ken Gaillot
On 02/18/2016 01:07 PM, Jeremy Matthews wrote:
> Hi,
> 
> We're having an issue with our cluster where after a reboot of our system a 
> location constraint reappears for the ClusterIP. This causes a problem, 
> because we have a daemon that checks the cluster state and waits until the 
> ClusterIP is started before it kicks off our application. We didn't have this 
> issue when using an earlier version of pacemaker. Here is the constraint as 
> shown by pcs:
> 
> [root@g5se-f3efce cib]# pcs constraint
> Location Constraints:
>   Resource: ClusterIP
> Disabled on: g5se-f3efce (role: Started)
> Ordering Constraints:
> Colocation Constraints:
> 
> ...and here is our cluster status with the ClusterIP being Stopped:
> 
> [root@g5se-f3efce cib]# pcs status
> Cluster name: cl-g5se-f3efce
> Last updated: Thu Feb 18 11:36:01 2016
> Last change: Thu Feb 18 10:48:33 2016 via crm_resource on g5se-f3efce
> Stack: cman
> Current DC: g5se-f3efce - partition with quorum
> Version: 1.1.11-97629de
> 1 Nodes configured
> 4 Resources configured
> 
> 
> Online: [ g5se-f3efce ]
> 
> Full list of resources:
> 
> sw-ready-g5se-f3efce   (ocf::pacemaker:GBmon): Started g5se-f3efce
> meta-data  (ocf::pacemaker:GBmon): Started g5se-f3efce
> netmon (ocf::heartbeat:ethmonitor):Started g5se-f3efce
> ClusterIP  (ocf::heartbeat:IPaddr2):   Stopped
> 
> 
> The cluster really just has one node at this time.
> 
> I retrieve the constraint ID, remove the constraint, verify that ClusterIP is 
> started,  and then reboot:
> 
> [root@g5se-f3efce cib]# pcs constraint ref ClusterIP
> Resource: ClusterIP
>   cli-ban-ClusterIP-on-g5se-f3efce
> [root@g5se-f3efce cib]# pcs constraint remove cli-ban-ClusterIP-on-g5se-f3efce
> 
> [root@g5se-f3efce cib]# pcs status
> Cluster name: cl-g5se-f3efce
> Last updated: Thu Feb 18 11:45:09 2016
> Last change: Thu Feb 18 11:44:53 2016 via crm_resource on g5se-f3efce
> Stack: cman
> Current DC: g5se-f3efce - partition with quorum
> Version: 1.1.11-97629de
> 1 Nodes configured
> 4 Resources configured
> 
> 
> Online: [ g5se-f3efce ]
> 
> Full list of resources:
> 
> sw-ready-g5se-f3efce   (ocf::pacemaker:GBmon): Started g5se-f3efce
> meta-data  (ocf::pacemaker:GBmon): Started g5se-f3efce
> netmon (ocf::heartbeat:ethmonitor):Started g5se-f3efce
> ClusterIP  (ocf::heartbeat:IPaddr2):   Started g5se-f3efce
> 
> 
> [root@g5se-f3efce cib]# reboot
> 
> After the reboot, I log in, and the constraint is back and ClusterIP has not
> started.
> 
> 
> I have noticed in /var/lib/pacemaker/cib that the cib-x.raw files get created 
> when there are changes to the cib (cib.xml). After a reboot, I see the 
> constraint being added in a diff between .raw files:
> 
> [root@g5se-f3efce cib]# diff cib-7.raw cib-8.raw
> 1c1
> < <cib ... validate-with="pacemaker-1.2" cib-last-written="Thu Feb 18 11:44:53 2016" update-origin="g5se-f3efce" update-client="crm_resource" crm_feature_set="3.0.9" have-quorum="1" dc-uuid="g5se-f3efce">
> ---
> > <cib ... validate-with="pacemaker-1.2" cib-last-written="Thu Feb 18 11:46:49 2016" update-origin="g5se-f3efce" update-client="crm_resource" crm_feature_set="3.0.9" have-quorum="1" dc-uuid="g5se-f3efce">
> 50c50,52
> < <constraints/>
> ---
> > <constraints>
> >   <rsc_location id="cli-ban-ClusterIP-on-g5se-f3efce" rsc="ClusterIP" role="Started" node="g5se-f3efce" score="-INFINITY"/>
> > </constraints>
> 
> 
> I have also looked in /var/log/cluster/corosync.log and seen logs where it 
> seems the cib is getting updated. I'm not sure if the constraint is being put 
> back in at shutdown or at start up. I just don't understand why it's being 
> put back in. I don't think our daemon code or other scripts are doing this,  
> but it is something I could verify.

I would look at any scripts running around that time first. Constraints
that start with "cli-" were created by one of the CLI tools, so
something must be calling it. The most likely candidates are pcs
resource move/ban or crm_resource -M/--move/-B/--ban.
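
If nothing obvious turns up, a blunt recursive search for the caller can help;
the paths below are just guesses at where such scripts usually live:

grep -rnE 'pcs resource (move|ban)|crm_resource' /etc/init.d /usr/local/bin /opt 2>/dev/null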

> 
> 
> From "yum info pacemaker", my current version is:
> 
> Name: pacemaker
> Arch: x86_64
> Version : 1.1.12
> Release : 8.el6_7.2
> 
> My earlier version was:
> 
> Name: pacemaker
> Arch: x86_64
> Version : 1.1.10
> Release : 1.el6_4.4
> 
> I'm still using an earlier version of pcs, because the new one seems to have
> issues with python:
> 
> Name: pcs
> Arch: noarch
> Version : 0.9.90
> Release : 1.0.1.el6.centos
> 
> ***
> 
> If anyone has ideas on the cause or thoughts on this, anything would be 
> greatly appreciated.
> 
> Thanks!
> 
> 
> 
> Jeremy Matthews




[ClusterLabs] ClusterIP location constraint reappears after reboot

2016-02-18 Thread Jeremy Matthews
Hi,

We're having an issue with our cluster where, after a reboot of our system, a
location constraint reappears for the ClusterIP. This causes a problem, because
we have a daemon that checks the cluster state and waits until the ClusterIP is
started before it kicks off our application (a rough sketch of that wait
follows the status output below). We didn't have this issue when using an
earlier version of pacemaker. Here is the constraint as shown by pcs:

[root@g5se-f3efce cib]# pcs constraint
Location Constraints:
  Resource: ClusterIP
Disabled on: g5se-f3efce (role: Started)
Ordering Constraints:
Colocation Constraints:

...and here is our cluster status with the ClusterIP being Stopped:

[root@g5se-f3efce cib]# pcs status
Cluster name: cl-g5se-f3efce
Last updated: Thu Feb 18 11:36:01 2016
Last change: Thu Feb 18 10:48:33 2016 via crm_resource on g5se-f3efce
Stack: cman
Current DC: g5se-f3efce - partition with quorum
Version: 1.1.11-97629de
1 Nodes configured
4 Resources configured


Online: [ g5se-f3efce ]

Full list of resources:

sw-ready-g5se-f3efce   (ocf::pacemaker:GBmon): Started g5se-f3efce
meta-data  (ocf::pacemaker:GBmon): Started g5se-f3efce
netmon (ocf::heartbeat:ethmonitor):Started g5se-f3efce
ClusterIP  (ocf::heartbeat:IPaddr2):   Stopped


The cluster really just has one node at this time.
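
For context, our daemon's wait for the ClusterIP boils down to a loop like
this sketch (the grep pattern and the final command are placeholders, not our
real code):

until pcs status | grep -q 'ClusterIP.*Started'; do
    sleep 5
done
start_our_application    # placeholder for kicking off our application

With the constraint present, ClusterIP never reaches Started, so the loop
never exits and our application never launches.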

I retrieve the constraint ID, remove the constraint, verify that ClusterIP is 
started,  and then reboot:

[root@g5se-f3efce cib]# pcs constraint ref ClusterIP
Resource: ClusterIP
  cli-ban-ClusterIP-on-g5se-f3efce
[root@g5se-f3efce cib]# pcs constraint remove cli-ban-ClusterIP-on-g5se-f3efce

[root@g5se-f3efce cib]# pcs status
Cluster name: cl-g5se-f3efce
Last updated: Thu Feb 18 11:45:09 2016
Last change: Thu Feb 18 11:44:53 2016 via crm_resource on g5se-f3efce
Stack: cman
Current DC: g5se-f3efce - partition with quorum
Version: 1.1.11-97629de
1 Nodes configured
4 Resources configured


Online: [ g5se-f3efce ]

Full list of resources:

sw-ready-g5se-f3efce   (ocf::pacemaker:GBmon): Started g5se-f3efce
meta-data  (ocf::pacemaker:GBmon): Started g5se-f3efce
netmon (ocf::heartbeat:ethmonitor):Started g5se-f3efce
ClusterIP  (ocf::heartbeat:IPaddr2):   Started g5se-f3efce


[root@g5se-f3efce cib]# reboot

After the reboot, I log in, and the constraint is back and ClusterIP has not
started.


I have noticed in /var/lib/pacemaker/cib that the cib-x.raw files get created 
when there are changes to the cib (cib.xml). After a reboot, I see the 
constraint being added in a diff between .raw files:

[root@g5se-f3efce cib]# diff cib-7.raw cib-8.raw
1c1
< <cib ... validate-with="pacemaker-1.2" cib-last-written="Thu Feb 18 11:44:53 2016" update-origin="g5se-f3efce" update-client="crm_resource" crm_feature_set="3.0.9" have-quorum="1" dc-uuid="g5se-f3efce">
---
> <cib ... validate-with="pacemaker-1.2" cib-last-written="Thu Feb 18 11:46:49 2016" update-origin="g5se-f3efce" update-client="crm_resource" crm_feature_set="3.0.9" have-quorum="1" dc-uuid="g5se-f3efce">
50c50,52
< <constraints/>
---
> <constraints>
>   <rsc_location id="cli-ban-ClusterIP-on-g5se-f3efce" rsc="ClusterIP" role="Started" node="g5se-f3efce" score="-INFINITY"/>
> </constraints>
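
Incidentally, crm_diff can show the same comparison as an XML patch, which is
sometimes easier to read than plain diff (usage from memory, so double-check
the options):

crm_diff --original cib-7.raw --new cib-8.raw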


I have also looked in /var/log/cluster/corosync.log and seen logs where it 
seems the cib is getting updated. I'm not sure if the constraint is being put 
back in at shutdown or at start up. I just don't understand why it's being put 
back in. I don't think our daemon code or other scripts are doing this,  but it 
is something I could verify.



>From "yum info pacemaker", my current version is:

Name: pacemaker
Arch: x86_64
Version : 1.1.12
Release : 8.el6_7.2

My earlier version was:

Name: pacemaker
Arch: x86_64
Version : 1.1.10
Release : 1.el6_4.4

I'm still using an earlier version of pcs, because the new one seems to have
issues with python:

Name: pcs
Arch: noarch
Version : 0.9.90
Release : 1.0.1.el6.centos

***

If anyone has ideas on the cause or thoughts on this, anything would be greatly 
appreciated.

Thanks!



Jeremy Matthews


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org