Re: [ClusterLabs] cluster does not detect kill on pacemaker process ?

2017-04-07 Thread Ken Gaillot
On 04/07/2017 05:20 PM, neeraj ch wrote:
> I am running it on CentOS 6.6. I am killing the "pacemakerd" process
> using kill -9.

pacemakerd is a supervisor process that watches the other Pacemaker
daemons and respawns them if they die. It is not directly responsible
for anything in the cluster, so killing it does not disrupt the cluster
in any way; it just prevents automatic recovery if one of the other
daemons dies.

When systemd is in use, systemd will restart pacemakerd if it dies, but
CentOS 6 does not have systemd (CentOS 7 does).
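
As a rough illustration (a sketch only; the process names below are from
the Pacemaker 1.1 series and may differ on other versions):

  # pacemakerd plus the child daemons it supervises
  pgrep -l 'pacemakerd|cib|crmd|lrmd|pengine|stonithd|attrd'

  # kill a child daemon -- pacemakerd should respawn it almost immediately
  kill -9 $(pidof lrmd)

  # kill the supervisor itself -- the children keep running, but on
  # CentOS 6 nothing restarts pacemakerd, so later child failures
  # would go unrecovered
  kill -9 $(pidof pacemakerd)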

> hmm, stonith is used for detection as well? I thought it was used to
> disable malfunctioning nodes. 

If you kill pacemakerd, that doesn't cause any harm to the cluster, so
that would not involve stonith.

If you kill crmd or corosync instead, that would cause the node to leave
the cluster -- it would be considered a malfunctioning node. The rest of
the cluster would then use stonith to disable that node, so it could
safely recover its services elsewhere.
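
For a test cluster like this one, a minimal fencing sketch with crmsh
might look like the following (it uses the external/ssh test agent, as
in another thread in this digest -- fine for testing, not for
production; the node names are placeholders):

  crm configure primitive st-node1 stonith:external/ssh params hostlist="node1"
  crm configure primitive st-node2 stonith:external/ssh params hostlist="node2"
  crm configure property stonith-enabled=true

  # optional sanity check: deliberately fence a node (it will reboot)
  stonith_admin --reboot node1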

> On Fri, Apr 7, 2017 at 7:58 AM, Ken Gaillot wrote:
> 
> On 04/05/2017 05:16 PM, neeraj ch wrote:
> > Hello All,
> >
> > I noticed something on our pacemaker test cluster. The cluster is
> > configured to manage an underlying database using master slave
> primitive.
> >
> > I ran a kill on the pacemaker process; all the other nodes kept showing
> > the node online. I went on to kill the underlying database on the same
> > node which would have been detected had the pacemaker on the node been
> > online. The cluster did not detect that the database on the node has
> > failed, the failover never occurred.
> >
> > I went on to kill corosync on the same node and the cluster now marked
> > the node as stopped and proceeded to elect a new master.
> >
> >
> > In a separate test, I killed the pacemaker process on the cluster DC,
> > the cluster showed no change. I went on to change CIB on a different
> > node. The CIB modify command timed out. Once that occurred, the node
> > didn't failover even when I turned off corosync on cluster DC. The
> > cluster didn't recover after this mishap.
> >
> > Is this expected behavior? Is there a solution for when OOM decides to
> > kill the pacemaker process?
> >
> > I run pacemaker 1.1.14, with corosync 1.4. I have stonith disabled and
> > quorum enabled.
> >
> > Thank you,
> >
> > nwarriorch
> 
> What exactly are you doing to kill pacemaker? There are multiple
> pacemaker processes, and they have different recovery methods.
> 
> Also, what OS/version are you running? If it has systemd, that can play
> a role in recovery as well.
> 
> Having stonith disabled is a big part of what you're seeing. When a node
> fails, stonith is the only way the rest of the cluster can be sure the
> node is unable to cause trouble, so it can recover services elsewhere.



Re: [ClusterLabs] cluster does not detect kill on pacemaker process ?

2017-04-07 Thread neeraj ch
I am running it on CentOS 6.6. I am killing the "pacemakerd" process using
kill -9.

hmm, stonith is used for detection as well? I thought it was used to
disable malfunctioning nodes.

On Fri, Apr 7, 2017 at 7:58 AM, Ken Gaillot  wrote:

> On 04/05/2017 05:16 PM, neeraj ch wrote:
> > Hello All,
> >
> > I noticed something on our pacemaker test cluster. The cluster is
> > configured to manage an underlying database using master slave primitive.
> >
> > I ran a kill on the pacemaker process; all the other nodes kept showing
> > the node online. I went on to kill the underlying database on the same
> > node which would have been detected had the pacemaker on the node been
> > online. The cluster did not detect that the database on the node has
> > failed, the failover never occurred.
> >
> > I went on to kill corosync on the same node and the cluster now marked
> > the node as stopped and proceeded to elect a new master.
> >
> >
> > In a separate test, I killed the pacemaker process on the cluster DC,
> > the cluster showed no change. I went on to change CIB on a different
> > node. The CIB modify command timed out. Once that occurred, the node
> > didn't failover even when I turned off corosync on cluster DC. The
> > cluster didn't recover after this mishap.
> >
> > Is this expected behavior? Is there a solution for when OOM decides to
> > kill the pacemaker process?
> >
> > I run pacemaker 1.1.14, with corosync 1.4. I have stonith disabled and
> > quorum enabled.
> >
> > Thank you,
> >
> > nwarriorch
>
> What exactly are you doing to kill pacemaker? There are multiple
> pacemaker processes, and they have different recovery methods.
>
> Also, what OS/version are you running? If it has systemd, that can play
> a role in recovery as well.
>
> Having stonith disabled is a big part of what you're seeing. When a node
> fails, stonith is the only way the rest of the cluster can be sure the
> node is unable to cause trouble, so it can recover services elsewhere.
>
>


Re: [ClusterLabs] [Problem] The crmd causes an error of xml.

2017-04-07 Thread renayama19661014
Hi Ken,

Thank you for the comment.

Okay!
I will wait for the fix.


Many thanks!
Hideo Yamauchi.


- Original Message -
> From: Ken Gaillot 
> To: users@clusterlabs.org
> Cc: 
> Date: 2017/4/8, Sat 05:04
> Subject: Re: [ClusterLabs] [Problem] The crmd causes an error of xml.
> 
> On 04/06/2017 08:49 AM, renayama19661...@ybb.ne.jp wrote:
>>  Hi All,
>> 
>>  I confirmed a development edition of Pacemaker.
>>   - https://github.com/ClusterLabs/pacemaker/tree/71dbd128c7b0a923c472c8e564d33a0ba1816cb5
>> 
>>  
>>  property no-quorum-policy="ignore" \
>>          stonith-enabled="true" \
>>          startup-fencing="false"
>> 
>>  rsc_defaults resource-stickiness="INFINITY" \
>>          migration-threshold="INFINITY"
>> 
>>  fencing_topology \
>>          rh73-01-snmp: prmStonith1-1 \
>>          rh73-02-snmp: prmStonith2-1
>> 
>>  primitive prmDummy ocf:pacemaker:Dummy \
>>          op start interval="0s" timeout="60s" on-fail="restart" \
>>          op monitor interval="10s" timeout="60s" on-fail="restart" \
>>          op stop interval="0s" timeout="60s" on-fail="fence"
>> 
>>  primitive prmStonith1-1 stonith:external/ssh \
>>          params \
>>          pcmk_reboot_retries="1" \
>>          pcmk_reboot_timeout="40s" \
>>          hostlist="rh73-01-snmp" \
>>          op start interval="0s" timeout="60s" on-fail="restart" \
>>          op stop interval="0s" timeout="60s" on-fail="ignore"
>> 
>>  primitive prmStonith2-1 stonith:external/ssh \
>>          params \
>>          pcmk_reboot_retries="1" \
>>          pcmk_reboot_timeout="40s" \
>>          hostlist="rh73-02-snmp" \
>>          op start interval="0s" timeout="60s" on-fail="restart" \
>>          op stop interval="0s" timeout="60s" on-fail="ignore"
>> 
>>  ### Resource Location ###
>>  location rsc_location-1 prmDummy \
>>          rule  300: #uname eq rh73-01-snmp \
>>          rule  200: #uname eq rh73-02-snmp
>> 
>>  
>> 
>>  I load the brief crm configuration above into the cluster.
>>  When I then cause a resource failure, crmd reports an error.
>> 
>>  
>>  (snip)
>>  Apr  6 18:04:22 rh73-01-snmp pengine[5214]: warning: Calculated transition 4 (with warnings), saving inputs in /var/lib/pacemaker/pengine/pe-warn-0.bz2
>>  Apr  6 18:04:22 rh73-01-snmp crmd[5215]:   error: XML Error: Entity: line 1: parser error : Specification mandate value for attribute CRM_meta_fail_count_prmDummy
>>  Apr  6 18:04:22 rh73-01-snmp crmd[5215]:   error: XML Error: rh73-01-snmp" on_node_uuid="3232238265"> CRM_meta_fail_count_prmDummy
>>  Apr  6 18:04:22 rh73-01-snmp crmd[5215]:   error: XML Error:                                                                 ^
>>  Apr  6 18:04:22 rh73-01-snmp crmd[5215]:   error: XML Error: Entity: line 1: parser error : attributes construct error
>>  Apr  6 18:04:22 rh73-01-snmp crmd[5215]:   error: XML Error: rh73-01-snmp" on_node_uuid="3232238265"> CRM_meta_fail_count_prmDummy
>>  Apr  6 18:04:22 rh73-01-snmp crmd[5215]:   error: XML Error:                                                                 ^
>>  Apr  6 18:04:22 rh73-01-snmp crmd[5215]:   error: XML Error: Entity: line 1: parser error : Couldn't find end of Start Tag attributes line 1
>>  Apr  6 18:04:22 rh73-01-snmp crmd[5215]:   error: XML Error: rh73-01-snmp" on_node_uuid="3232238265"> CRM_meta_fail_count_prmDummy
>>  Apr  6 18:04:22 rh73-01-snmp crmd[5215]:   error: XML Error:                                                                 ^
>>  Apr  6 18:04:22 rh73-01-snmp crmd[5215]: warning: Parsing failed (domain=1, level=3, code=73): Couldn't find end of Start Tag attributes line 1
>>  (snip)
>>  
>> 
>>  The XML related to the new fail count seems to have a problem.
>> 
>>  I attach pe-warn-0.bz2.
>> 
>>  Best Regards,
>>  Hideo Yamauchi.
> 
> Hi Hideo,
> 
> Thanks for the report!
> 
> This appears to be a PE bug when fencing is needed due to stop failure.
> It wasn't caught in regression testing because the PE will continue to
> use the old-style fail-count attribute if the DC does not support the
> new style, and existing tests obviously have older DCs. I definitely
> need to add some new tests.
> 
> I'm not sure why fail-count and last-failure are being added as
> meta-attributes in this case, or why incorrect XML syntax is being
> generated, but I'll investigate.
> 


Re: [ClusterLabs] Fraud Detection Check?

2017-04-07 Thread Eric Robinson
> On a serious note, I too received your e-mails without any red flags attached.

Thanks for the confirmation. I guess I'm the only one seeing those warnings. 
Maybe Office 365 has a problem with ClusterLabs. ;-)

--
Eric Robinson


Re: [ClusterLabs] Fraud Detection Check?

2017-04-07 Thread Dimitri Maziuk
On 04/07/2017 02:22 PM, Eric Robinson wrote:
>>> You guys got a thing against Office 365?
> 
>> doesn't everybody?
> 
> Fair enough. 

;)

On a serious note, I too received your e-mails without any red flags
attached.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [ClusterLabs] [Problem] The crmd causes an error of xml.

2017-04-07 Thread Ken Gaillot
On 04/06/2017 08:49 AM, renayama19661...@ybb.ne.jp wrote:
> Hi All,
> 
> I confirmed a development edition of Pacemaker.
>  - https://github.com/ClusterLabs/pacemaker/tree/71dbd128c7b0a923c472c8e564d33a0ba1816cb5
> 
> 
> property no-quorum-policy="ignore" \
> stonith-enabled="true" \
> startup-fencing="false"
> 
> rsc_defaults resource-stickiness="INFINITY" \
> migration-threshold="INFINITY"
> 
> fencing_topology \
> rh73-01-snmp: prmStonith1-1 \
> rh73-02-snmp: prmStonith2-1
> 
> primitive prmDummy ocf:pacemaker:Dummy \
> op start interval="0s" timeout="60s" on-fail="restart" \
> op monitor interval="10s" timeout="60s" on-fail="restart" \
> op stop interval="0s" timeout="60s" on-fail="fence"
> 
> primitive prmStonith1-1 stonith:external/ssh \
> params \
> pcmk_reboot_retries="1" \
> pcmk_reboot_timeout="40s" \
> hostlist="rh73-01-snmp" \
> op start interval="0s" timeout="60s" on-fail="restart" \
> op stop interval="0s" timeout="60s" on-fail="ignore"
> 
> primitive prmStonith2-1 stonith:external/ssh \
> params \
> pcmk_reboot_retries="1" \
> pcmk_reboot_timeout="40s" \
> hostlist="rh73-02-snmp" \
> op start interval="0s" timeout="60s" on-fail="restart" \
> op stop interval="0s" timeout="60s" on-fail="ignore"
> 
> ### Resource Location ###
> location rsc_location-1 prmDummy \
> rule  300: #uname eq rh73-01-snmp \
> rule  200: #uname eq rh73-02-snmp
> 
> 
> 
> I load the brief crm configuration above into the cluster.
> When I then cause a resource failure, crmd reports an error.
> 
> 
> (snip)
> Apr  6 18:04:22 rh73-01-snmp pengine[5214]: warning: Calculated transition 4 (with warnings), saving inputs in /var/lib/pacemaker/pengine/pe-warn-0.bz2
> Apr  6 18:04:22 rh73-01-snmp crmd[5215]:   error: XML Error: Entity: line 1: parser error : Specification mandate value for attribute CRM_meta_fail_count_prmDummy
> Apr  6 18:04:22 rh73-01-snmp crmd[5215]:   error: XML Error: rh73-01-snmp" on_node_uuid="3232238265">
> Apr  6 18:04:22 rh73-01-snmp crmd[5215]:   error: XML Error:   ^
> Apr  6 18:04:22 rh73-01-snmp crmd[5215]:   error: XML Error: Entity: line 1: parser error : attributes construct error
> Apr  6 18:04:22 rh73-01-snmp crmd[5215]:   error: XML Error: rh73-01-snmp" on_node_uuid="3232238265">
> Apr  6 18:04:22 rh73-01-snmp crmd[5215]:   error: XML Error:   ^
> Apr  6 18:04:22 rh73-01-snmp crmd[5215]:   error: XML Error: Entity: line 1: parser error : Couldn't find end of Start Tag attributes line 1
> Apr  6 18:04:22 rh73-01-snmp crmd[5215]:   error: XML Error: rh73-01-snmp" on_node_uuid="3232238265">
> Apr  6 18:04:22 rh73-01-snmp crmd[5215]:   error: XML Error:   ^
> Apr  6 18:04:22 rh73-01-snmp crmd[5215]: warning: Parsing failed (domain=1, level=3, code=73): Couldn't find end of Start Tag attributes line 1
> (snip)
> 
> 
> The XML related to the new fail count seems to have a problem.
> 
> I attach pe-warn-0.bz2.
> 
> Best Regards,
> Hideo Yamauchi.

Hi Hideo,

Thanks for the report!

This appears to be a PE bug when fencing is needed due to stop failure.
It wasn't caught in regression testing because the PE will continue to
use the old-style fail-count attribute if the DC does not support the
new style, and existing tests obviously have older DCs. I definitely
need to add some new tests.

I'm not sure why fail-count and last-failure are being added as
meta-attributes in this case, or why incorrect XML syntax is being
generated, but I'll investigate.
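
For anyone who wants to look at this themselves, replaying the saved
Policy Engine input is usually enough of a starting point (a sketch;
the option spellings are from the Pacemaker 1.1 tools, and the file
name is the one Hideo attached):

  # replay the transition that produced the bad graph
  crm_simulate -x pe-warn-0.bz2 -S -VV

  # if your build will not read the compressed file directly,
  # decompress it first
  bunzip2 -k pe-warn-0.bz2 && crm_simulate -x pe-warn-0 -S -VV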



Re: [ClusterLabs] Can't See Why This Cluster Failed Over

2017-04-07 Thread Ken Gaillot
On 04/07/2017 12:58 PM, Eric Robinson wrote:
> Somebody want to look at this log and tell me why the cluster failed over? 
> All we did was add a new resource. We've done it many times before without 
> any problems.
> 
> --
> 
> Apr 03 08:50:30 [22762] ha14acib: info: cib_process_request:  
>   Forwarding cib_apply_diff operation for section 'all' to master 
> (origin=local/cibadmin/2)
> Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: Diff: 
> --- 0.605.2 2
> Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: Diff: 
> +++ 0.607.0 65654c97e62cd549f22f777a5290fe3a
> Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: +  
> /cib:  @epoch=607, @num_updates=0
> Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: ++ 
> /cib/configuration/resources:   type="mysql_745"/>
> Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: ++ 
> /cib/configuration/resources:   type="mysql_746"/>
> Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: ++ 
> /cib/configuration/constraints/rsc_colocation[@id='c_clust19']/resource_set[@id='c_clust19-0']:
>   
> Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: ++ 
> /cib/configuration/constraints/rsc_colocation[@id='c_clust19']/resource_set[@id='c_clust19-0']:
>   
> Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: ++ 
> /cib/configuration/constraints/rsc_order[@id='o_clust19']/resource_set[@id='o_clust19-3']:
>   
> Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: ++ 
> /cib/configuration/constraints/rsc_order[@id='o_clust19']/resource_set[@id='o_clust19-3']:
>   
> Apr 03 08:50:30 [22762] ha14acib: info: cib_process_request:  
>   Completed cib_apply_diff operation for section 'all': OK (rc=0, 
> origin=ha14a/cibadmin/2, version=0.607.0)
> Apr 03 08:50:30 [22762] ha14acib: info: write_cib_contents:   
>   Archived previous version as /var/lib/pacemaker/cib/cib-36.raw
> Apr 03 08:50:30 [22762] ha14acib: info: write_cib_contents:   
>   Wrote version 0.607.0 of the CIB to disk (digest: 
> 1afdb9e480f870a095aa9e39719d29c4)
> Apr 03 08:50:30 [22762] ha14acib: info: retrieveCib:
> Reading cluster configuration from: /var/lib/pacemaker/cib/cib.DkIgSs 
> (digest: /var/lib/pacemaker/cib/cib.hPwa66)
> Apr 03 08:50:30 [22764] ha14a   lrmd: info: 
> process_lrmd_get_rsc_info:  Resource 'p_mysql_745' not found (17 active 
> resources)
> Apr 03 08:50:30 [22764] ha14a   lrmd: info: 
> process_lrmd_rsc_register:  Added 'p_mysql_745' to the rsc list (18 active 
> resources)
> Apr 03 08:50:30 [22767] ha14a   crmd: info: do_lrm_rsc_op:  
> Performing key=10:7484:7:91ef4b03-8769-47a1-a364-060569c46e52 
> op=p_mysql_745_monitor_0
> Apr 03 08:50:30 [22764] ha14a   lrmd: info: 
> process_lrmd_get_rsc_info:  Resource 'p_mysql_746' not found (18 active 
> resources)
> Apr 03 08:50:30 [22764] ha14a   lrmd: info: 
> process_lrmd_rsc_register:  Added 'p_mysql_746' to the rsc list (19 active 
> resources)
> Apr 03 08:50:30 [22767] ha14a   crmd: info: do_lrm_rsc_op:  
> Performing key=11:7484:7:91ef4b03-8769-47a1-a364-060569c46e52 
> op=p_mysql_746_monitor_0
> Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: Diff: 
> --- 0.607.0 2
> Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: Diff: 
> +++ 0.607.1 (null)
> Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: +  
> /cib:  @num_updates=1
> Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: ++ 
> /cib/status/node_state[@id='ha14b']/lrm[@id='ha14b']/lrm_resources:  
> 
> Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: ++
> 
>  operation="monitor" crm-debug-origin="do_update_resource" 
> crm_feature_set="3.0.9" 
> transition-key="13:7484:7:91ef4b03-8769-47a1-a364-060569c46e52" 
> transition-magic="0:7;13:7484:7:91ef4b03-8769-47a1-a364-060569c46e52" 
> call-id="142" rc-code="7" op-status="0" interval="0" last-run="1491234630" las
> Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: ++
>   
> 
> Apr 03 08:50:30 [22762] ha14acib: info: cib_process_request:  
>   Completed cib_modify operation for section status: OK (rc=0, 
> origin=ha14b/crmd/7665, version=0.607.1)
> Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: Diff: 
> --- 0.607.1 2
> Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: Diff: 
> +++ 0.607.2 (null)
> Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: +  
> /cib:  @num_updates=2
> Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: ++ 
> 

Re: [ClusterLabs] Fraud Detection Check?

2017-04-07 Thread Eric Robinson
> I've received your emails without any alteration or flagging as "fraud".
> So I don't think we're doing anything to your emails.

Good to know.

--
Eric Robinson



Re: [ClusterLabs] Fraud Detection Check?

2017-04-07 Thread Eric Robinson
>> You guys got a thing against Office 365?

> doesn't everybody?

Fair enough. 

--
Eric Robinson



Re: [ClusterLabs] Fraud Detection Check?

2017-04-07 Thread Dimitri Maziuk
On 04/07/2017 01:32 PM, Eric Robinson wrote:
> You guys got a thing against Office 365?

doesn't everybody?

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





[ClusterLabs] Can't See Why This Cluster Failed Over

2017-04-07 Thread Eric Robinson
Somebody want to look at this log and tell me why the cluster failed over? All 
we did was add a new resource. We've done it many times before without any 
problems.

--

Apr 03 08:50:30 [22762] ha14acib: info: cib_process_request:
Forwarding cib_apply_diff operation for section 'all' to master 
(origin=local/cibadmin/2)
Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: Diff: 
--- 0.605.2 2
Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: Diff: 
+++ 0.607.0 65654c97e62cd549f22f777a5290fe3a
Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: +  
/cib:  @epoch=607, @num_updates=0
Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: ++ 
/cib/configuration/resources:  
Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: ++ 
/cib/configuration/resources:  
Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: ++ 
/cib/configuration/constraints/rsc_colocation[@id='c_clust19']/resource_set[@id='c_clust19-0']:
  
Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: ++ 
/cib/configuration/constraints/rsc_colocation[@id='c_clust19']/resource_set[@id='c_clust19-0']:
  
Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: ++ 
/cib/configuration/constraints/rsc_order[@id='o_clust19']/resource_set[@id='o_clust19-3']:
  
Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: ++ 
/cib/configuration/constraints/rsc_order[@id='o_clust19']/resource_set[@id='o_clust19-3']:
  
Apr 03 08:50:30 [22762] ha14acib: info: cib_process_request:
Completed cib_apply_diff operation for section 'all': OK (rc=0, 
origin=ha14a/cibadmin/2, version=0.607.0)
Apr 03 08:50:30 [22762] ha14acib: info: write_cib_contents: 
Archived previous version as /var/lib/pacemaker/cib/cib-36.raw
Apr 03 08:50:30 [22762] ha14acib: info: write_cib_contents: 
Wrote version 0.607.0 of the CIB to disk (digest: 
1afdb9e480f870a095aa9e39719d29c4)
Apr 03 08:50:30 [22762] ha14acib: info: retrieveCib:Reading 
cluster configuration from: /var/lib/pacemaker/cib/cib.DkIgSs (digest: 
/var/lib/pacemaker/cib/cib.hPwa66)
Apr 03 08:50:30 [22764] ha14a   lrmd: info: process_lrmd_get_rsc_info:  
Resource 'p_mysql_745' not found (17 active resources)
Apr 03 08:50:30 [22764] ha14a   lrmd: info: process_lrmd_rsc_register:  
Added 'p_mysql_745' to the rsc list (18 active resources)
Apr 03 08:50:30 [22767] ha14a   crmd: info: do_lrm_rsc_op:  
Performing key=10:7484:7:91ef4b03-8769-47a1-a364-060569c46e52 
op=p_mysql_745_monitor_0
Apr 03 08:50:30 [22764] ha14a   lrmd: info: process_lrmd_get_rsc_info:  
Resource 'p_mysql_746' not found (18 active resources)
Apr 03 08:50:30 [22764] ha14a   lrmd: info: process_lrmd_rsc_register:  
Added 'p_mysql_746' to the rsc list (19 active resources)
Apr 03 08:50:30 [22767] ha14a   crmd: info: do_lrm_rsc_op:  
Performing key=11:7484:7:91ef4b03-8769-47a1-a364-060569c46e52 
op=p_mysql_746_monitor_0
Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: Diff: 
--- 0.607.0 2
Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: Diff: 
+++ 0.607.1 (null)
Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: +  
/cib:  @num_updates=1
Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: ++ 
/cib/status/node_state[@id='ha14b']/lrm[@id='ha14b']/lrm_resources:  

Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: ++  
  
Apr 03 08:50:30 [22762] ha14acib: info: cib_process_request:
Completed cib_modify operation for section status: OK (rc=0, 
origin=ha14b/crmd/7665, version=0.607.1)
Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: Diff: 
--- 0.607.1 2
Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: Diff: 
+++ 0.607.2 (null)
Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: +  
/cib:  @num_updates=2
Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: ++ 
/cib/status/node_state[@id='ha14b']/lrm[@id='ha14b']/lrm_resources:  

Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: ++  
  
Apr 03 08:50:30 [22762] ha14acib: info: cib_process_request:
Completed cib_modify operation for section status: OK (rc=0, 
origin=ha14b/crmd/7666, version=0.607.2)
Apr 03 08:50:30 [22767] ha14a   crmd:   notice: process_lrm_event:  
Operation p_mysql_745_monitor_0: not running (node=ha14a, call=142, rc=7, 
cib-update=88, confirmed=true)
Apr 03 08:50:30 [22767] ha14a   crmd:   notice: process_lrm_event:  
ha14a-p_mysql_745_monitor_0:142 [ not started\n ]
Apr 03 08:50:30 [22762]