Andrei, I set interleave=true and it does not restart any more. Thank you very much. One word from you resolved a problem that had confused me for several days 😊
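For anyone finding this thread later: a sketch (assumption, not taken verbatim from the thread) of how Andrei's suggestion can be applied to an existing configuration with pcs. The resource names are the ones used in this thread; verify the exact meta-attribute syntax against your pcs version.

```shell
# Sketch only: set interleave=true on each clone so that a dependent
# resource waits only for the clone instance on its own node, rather
# than for the whole clone set. Resource names are from this thread;
# pcs syntax may differ between versions.
pcs resource meta dlm-clone interleave=true
pcs resource meta clvmd-clone interleave=true
pcs resource meta pgsql-ha interleave=true
```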
-----Original Message-----
From: Andrei Borzenkov [mailto:arvidj...@gmail.com]
Sent: December 27, 2017 19:06
To: Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org>
Subject: Re: [ClusterLabs] Pacemaker Master restarts when Slave is added to the cluster

Usual suspect - interleave=false on clone resource.

On Wed, Dec 27, 2017 at 10:49 AM, 范国腾 <fanguot...@highgo.com> wrote:
> Hello,
>
> In my test environment, I have run into an issue with Pacemaker: when a
> new node is added to the cluster, the master node restarts. This takes
> the system out of service for a while, because there is no master node
> during the restart. Could you please help me debug this issue?
>
> I have a Pacemaker master/slave cluster as shown below. pgsql-ha is a
> resource; I copied the script from
> /usr/lib/ocf/resource.d/heartbeat/Dummy and added some simple code to
> make it support promote/demote.
>
> When I run "pcs cluster stop" on db1, db1 is stopped and db2 is still
> master.
>
> The problem: when I run "pcs cluster start" on db1, the db2 status
> changes as follows: master -> slave -> stop -> slave -> master. Why
> does db2 restart?
>
> CENTOS7:
> ======================================================
> 2 nodes and 7 resources configured
>
> Online: [ db1 db2 ]
>
> Full list of resources:
>
>  Clone Set: dlm-clone [dlm]
>      Started: [ db1 db2 ]
>  Clone Set: clvmd-clone [clvmd]
>      Started: [ db1 db2 ]
>  scsi-stonith-device (stonith:fence_scsi): Started db2
>  Master/Slave Set: pgsql-ha [pgsqld]
>      Masters: [ db2 ]
>      Slaves: [ db1 ]
>
> Daemon Status:
>   corosync: active/enabled
>   pacemaker: active/enabled
>   pcsd: active/enabled
> [root@db1 heartbeat]#
> ==========================================================
>
> /var/log/messages:
>
> Dec 27 00:52:50 db2 cib[3290]: notice: Purged 1 peers with id=1 and/or uname=db1 from the membership cache
> Dec 27 00:52:51 db2 kernel: dlm: closing connection to node 1
> Dec 27 00:52:51 db2 corosync[3268]: [TOTEM ] A new membership (192.168.199.199:372) was formed. Members left: 1
> Dec 27 00:52:51 db2 corosync[3268]: [QUORUM] Members[1]: 2
> Dec 27 00:52:51 db2 corosync[3268]: [MAIN ] Completed service synchronization, ready to provide service.
> Dec 27 00:52:51 db2 crmd[3295]: notice: Node db1 state is now lost
> Dec 27 00:52:51 db2 crmd[3295]: notice: do_shutdown of peer db1 is complete
> Dec 27 00:52:51 db2 pacemakerd[3289]: notice: Node db1 state is now lost
> Dec 27 00:52:57 db2 Doctor(pgsqld)[6671]: INFO: pgsqld monitor : 8
> Dec 27 00:53:12 db2 Doctor(pgsqld)[6681]: INFO: pgsqld monitor : 8
> Dec 27 00:53:27 db2 Doctor(pgsqld)[6746]: INFO: pgsqld monitor : 8
> Dec 27 00:53:33 db2 corosync[3268]: [TOTEM ] A new membership (192.168.199.197:376) was formed. Members joined: 1
> Dec 27 00:53:33 db2 corosync[3268]: [QUORUM] Members[2]: 1 2
> Dec 27 00:53:33 db2 corosync[3268]: [MAIN ] Completed service synchronization, ready to provide service.
> Dec 27 00:53:33 db2 crmd[3295]: notice: Node db1 state is now member
> Dec 27 00:53:33 db2 pacemakerd[3289]: notice: Node db1 state is now member
> Dec 27 00:53:33 db2 crmd[3295]: notice: do_shutdown of peer db1 is complete
> Dec 27 00:53:33 db2 crmd[3295]: notice: State transition S_IDLE -> S_INTEGRATION
> Dec 27 00:53:33 db2 pengine[3294]: notice: Calculated transition 17, saving inputs in /var/lib/pacemaker/pengine/pe-input-116.bz2
> Dec 27 00:53:33 db2 crmd[3295]: notice: Transition 17 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-116.bz2): Complete
> Dec 27 00:53:33 db2 crmd[3295]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE
> Dec 27 00:53:33 db2 stonith-ng[3291]: notice: Node db1 state is now member
> Dec 27 00:53:33 db2 attrd[3293]: notice: Node db1 state is now member
> Dec 27 00:53:33 db2 cib[3290]: notice: Node db1 state is now member
> Dec 27 00:53:34 db2 crmd[3295]: notice: State transition S_IDLE -> S_INTEGRATION
> Dec 27 00:53:37 db2 crmd[3295]: warning: No reason to expect node 2 to be down
> Dec 27 00:53:38 db2 pengine[3294]: notice: Unfencing db1: node discovery
> Dec 27 00:53:38 db2 pengine[3294]: notice: Start dlm:1#011(db1)
> Dec 27 00:53:38 db2 pengine[3294]: notice: Start clvmd:1#011(db1)
> Dec 27 00:53:38 db2 pengine[3294]: notice: Restart pgsqld:0#011(Master db2)
>
> /var/log/cluster/corosync.log:
>
> Dec 27 00:53:37 [3290] db2 cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=db2/crmd/99, version=0.60.29)
> Dec 27 00:53:37 [3290] db2 cib: info: cib_process_request: Forwarding cib_delete operation for section //node_state[@uname='db2']/lrm to all (origin=local/crmd/100)
> Dec 27 00:53:37 [3295] db2 crmd: info: do_state_transition: State transition S_FINALIZE_JOIN -> S_POLICY_ENGINE | input=I_FINALIZED cause=C_FSA_INTERNAL origin=check_join_state
> Dec 27 00:53:37 [3295] db2 crmd: info: abort_transition_graph: Transition aborted: Peer Cancelled | source=do_te_invoke:161 complete=true
> Dec 27 00:53:37 [3293] db2 attrd: info: attrd_client_refresh: Updating all attributes
> Dec 27 00:53:37 [3293] db2 attrd: info: write_attribute: Sent update 12 with 2 changes for shutdown, id=<n/a>, set=(null)
> Dec 27 00:53:37 [3293] db2 attrd: info: write_attribute: Sent update 13 with 1 changes for last-failure-pgsqld, id=<n/a>, set=(null)
> Dec 27 00:53:37 [3293] db2 attrd: info: write_attribute: Sent update 14 with 2 changes for terminate, id=<n/a>, set=(null)
> Dec 27 00:53:37 [3293] db2 attrd: info: write_attribute: Sent update 15 with 1 changes for fail-count-pgsqld, id=<n/a>, set=(null)
> Dec 27 00:53:37 [3290] db2 cib: info: cib_process_request: Forwarding cib_modify operation for section status to all (origin=local/crmd/101)
> Dec 27 00:53:37 [3290] db2 cib: info: cib_perform_op: Diff: --- 0.60.29 2
> Dec 27 00:53:37 [3290] db2 cib: info: cib_perform_op: Diff: +++ 0.60.30 (null)
> Dec 27 00:53:37 [3290] db2 cib: info: cib_perform_op: -- /cib/status/node_state[@id='2']/lrm[@id='2']
> Dec 27 00:53:37 [3290] db2 cib: info: cib_perform_op: + /cib: @num_updates=30
> Dec 27 00:53:37 [3295] db2 crmd: warning: match_down_event: No reason to expect node 2 to be down
> Dec 27 00:53:37 [3295] db2 crmd: info: abort_transition_graph: Transition aborted by deletion of lrm[@id='2']: Resource state removal | cib=0.60.30 source=abort_unless_down:343 path=/cib/status/node_state[@id='2']/lrm[@id='2'] complete=true
> Dec 27 00:53:37 [3290] db2 cib: info: cib_process_request: Completed cib_delete operation for section //node_state[@uname='db2']/lrm: OK (rc=0, origin=db2/crmd/100, version=0.60.30)
> Dec 27 00:53:37 [3290] db2 cib: info: cib_perform_op: Diff: --- 0.60.30 2
> Dec 27 00:53:37 [3290] db2 cib: info: cib_perform_op: Diff: +++ 0.60.31 (null)
> Dec 27 00:53:37 [3290] db2 cib: info: cib_perform_op: + /cib: @num_updates=31
> Dec 27 00:53:37 [3290] db2 cib: info: cib_perform_op: + /cib/status/node_state[@id='2']: @crm-debug-origin=do_lrm_query_internal
> Dec 27 00:53:37 [3290] db2 cib: info: cib_perform_op: ++ /cib/status/node_state[@id='2']: <lrm id="2"/>
>
> I use this command to create the resource:
>
> pcs resource create pgsqld ocf:heartbeat:Doctor op start timeout=60s \
>     op stop timeout=60s op promote timeout=30s op demote timeout=120s \
>     op monitor interval=15s timeout=10s role="Master" \
>     op monitor interval=16s timeout=10s role="Slave" op notify timeout=60s
> pcs resource master pgsql-ha pgsqld notify=true
> pcs constraint order start clvmd-clone then pgsql-ha
>
> _______________________________________________
> Users mailing list: Users@clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
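The mechanism behind the restart: with interleave=false (the default), the ordering constraint "start clvmd-clone then pgsql-ha" is evaluated against the whole clone set, so every pgsql-ha instance must be ordered after every clvmd instance. When db1 rejoins and clvmd:1 starts there, the already-running master pgsqld:0 on db2 has to be restarted to satisfy the set-wide ordering, which is exactly the "Restart pgsqld:0 (Master db2)" line in the pengine log. With interleave=true each instance depends only on the copy on its own node. A sketch of the same resource creation with interleave set from the start (pcs syntax as used in the thread; verify against your pcs version):

```shell
# Sketch: same commands as in the thread, with interleave=true added to
# the master resource so a copy joining on one node does not restart
# instances on other nodes.
pcs resource create pgsqld ocf:heartbeat:Doctor \
    op start timeout=60s op stop timeout=60s \
    op promote timeout=30s op demote timeout=120s \
    op monitor interval=15s timeout=10s role="Master" \
    op monitor interval=16s timeout=10s role="Slave" \
    op notify timeout=60s
pcs resource master pgsql-ha pgsqld notify=true interleave=true
pcs constraint order start clvmd-clone then pgsql-ha
```

The same meta attribute would also need to be true on dlm-clone and clvmd-clone for the whole dependency chain to stay local to each node.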