Re: [ClusterLabs] Cluster node getting stopped from other node (resending mail)
On 07/07/2015 03:58 AM, Arjun Pandey wrote:
> Hi Ken
>
> If I look at the logs on the other node around the same time, I see this. I
> can't figure out the reason based on these. Attaching the corosync.log for
> the other node as well.

I don't see anything there either. The relevant part could be earlier or
later. Are you using the same version of pacemaker and corosync on both
nodes?

> Jun 01 13:55:44 [1965] messi crmd: info: do_dc_join_offer_one: An unknown node joined - (re-)offer to any unconfirmed nodes
> Jun 01 13:55:44 [1965] messi crmd: info: join_make_offer: Making join offers based on membership 224
> Jun 01 13:55:44 [1965] messi crmd: info: join_make_offer: Skipping messi: already known 4
> Jun 01 13:55:44 [1965] messi crmd: info: join_make_offer: join-2: Sending offer to ronaldo
> Jun 01 13:55:44 [1960] messi cib: info: cib_process_request: Forwarding cib_modify operation for section status to master (origin=local/crmd/138)
> Jun 01 13:55:44 [1960] messi cib: info: cib_perform_op: Diff: --- 0.80.2 2
> Jun 01 13:55:44 [1960] messi cib: info: cib_perform_op: Diff: +++ 0.80.3 (null)
> Jun 01 13:55:44 [1965] messi crmd: info: crm_update_peer_join: join_make_offer: Node ronaldo[2] - join-2 phase 0 -> 1
> Jun 01 13:55:44 [1965] messi crmd: info: abort_transition_graph: Transition aborted: Peer Halt (source=do_te_invoke:158, 1)
> Jun 01 13:55:44 [1960] messi cib: info: cib_perform_op: + /cib: @num_updates=3
> Jun 01 13:55:44 [1960] messi cib: info: cib_perform_op: + /cib/status/node_state[@id='ronaldo']: @crmd=online, @crm-debug-origin=peer_update_callback
> Jun 01 13:55:44 [1960] messi cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=messi/crmd/138, version=0.80.3)
> Jun 01 13:55:44 [1960] messi cib: info: cib_process_request: Completed cib_modify operation for section nodes: OK (rc=0, origin=ronaldo/crmd/3, version=0.80.3)
> Jun 01 13:55:45 [1965] messi crmd: info: do_dc_join_offer_one: join-2: Processing join_announce request from ronaldo in state S_INTEGRATION
> Jun 01 13:55:45 [1965] messi crmd: info: crm_update_peer_join: do_dc_join_offer_one: Node ronaldo[2] - join-2 phase 1 -> 0
> Jun 01 13:55:45 [1965] messi crmd: info: join_make_offer: join-2: Sending offer to ronaldo
> Jun 01 13:55:45 [1965] messi crmd: info: crm_update_peer_join: join_make_offer: Node ronaldo[2] - join-2 phase 0 -> 1
> Jun 01 13:55:45 [1965] messi crmd: info: crm_update_peer_join: join_make_offer: Node messi[1] - join-2 phase 4 -> 0
> Jun 01 13:55:45 [1965] messi crmd: info: join_make_offer: join-2: Sending offer to messi
> Jun 01 13:55:45 [1965] messi crmd: info: crm_update_peer_join: join_make_offer: Node messi[1] - join-2 phase 0 -> 1
> Jun 01 13:55:45 [1965] messi crmd: info: abort_transition_graph: Transition aborted: Node join (source=do_dc_join_offer_one:236, 1)
> Jun 01 13:55:45 [1965] messi crmd: info: crm_update_peer_join: do_dc_join_filter_offer: Node messi[1] - join-2 phase 1 -> 2
> Jun 01 13:55:46 [1965] messi crmd: info: crm_update_peer_join: do_dc_join_filter_offer: Node ronaldo[2] - join-2 phase 1 -> 2
> Jun 01 13:55:46 [1965] messi crmd: info: crm_update_peer_expected: do_dc_join_filter_offer: Node ronaldo[2] - expected state is now member (was down)
> Jun 01 13:55:46 [1965] messi crmd: info: do_state_transition: State transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED cause=C_FSA_INTERNAL origin=check_join_state ]
> Jun 01 13:55:46 [1965] messi crmd: info: crmd_join_phase_log: join-2: messi=integrated
> Jun 01 13:55:46 [1965] messi crmd: info: crmd_join_phase_log: join-2: ronaldo=integrated
> Jun 01 13:55:46 [1965] messi crmd: info: do_dc_join_finalize: join-2: Syncing our CIB to the rest of the cluster
> Jun 01 13:55:46 [1965] messi crmd: info: crm_update_peer_join: finalize_join_for: Node messi[1] - join-2 phase 2 -> 3
> Jun 01 13:55:46 [1965] messi crmd: info: crm_update_peer_join: finalize_join_for: Node ronaldo[2] - join-2 phase 2 -> 3
> Jun 01 13:55:46 [1965] messi crmd: info: crm_update_peer_join: do_dc_join_ack: Node messi[1] - join-2 phase 3 -> 4
> Jun 01 13:55:46 [1965] messi crmd: info: do_dc_join_ack: join-2: Updating node state to member for messi
> Jun 01 13:55:46 [1965] messi crmd: info: erase_status_tag: Deleting xpath: //node_state[@uname='messi']/lrm
> Jun 01 13:55:46 [1960] messi cib: info: cib_process_replace: Digest matched on replace from messi: 5138b696984c7b834dd2b528dadabe
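[Editor's note: a quick way to act on both suggestions above (compare the stacks on the two nodes, and widen the log window) might look like the sketch below. The log path is the CentOS 6 default and the node names and timestamps come from this thread; treat all of those as assumptions. The filter is demonstrated against a small inline sample built from two lines quoted above, so its behaviour can be checked before pointing it at the real file.]

```shell
# Versions should match; run this on each node (messi and ronaldo here):
# rpm -q pacemaker corosync

# window <file>: print every line from the minute around the join --
# the real reason may be slightly earlier or later, per the advice above.
window() {
    grep 'Jun 01 13:55:4' "$1"
}

# Two lines quoted above, standing in for the real log:
cat > /tmp/sample.log <<'EOF'
Jun 01 13:55:44 [1965] messi crmd: info: join_make_offer: join-2: Sending offer to ronaldo
Jun 01 13:55:46 [1965] messi crmd: info: do_dc_join_ack: join-2: Updating node state to member for messi
EOF
window /tmp/sample.log

# Against the real DC log (default CentOS 6 location, an assumption):
# window /var/log/cluster/corosync.log
```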
Re: [ClusterLabs] Cluster node getting stopped from other node (resending mail)
On 06/30/2015 11:30 PM, Arjun Pandey wrote:
> Hi
>
> I am running a 2-node cluster with this config on CentOS 6.5/6.6:
>
> Master/Slave Set: foo-master [foo]
>     Masters: [ messi ]
>     Stopped: [ ronaldo ]
> eth1-CP   (ocf::pw:IPaddr): Started messi
> eth2-UP   (ocf::pw:IPaddr): Started messi
> eth3-UPCP (ocf::pw:IPaddr): Started messi
>
> where I have a multi-state resource foo running in master/slave mode, and
> the IPaddr RA is just a modified IPaddr2 RA. Additionally, I have a
> colocation constraint for the IP addresses to be colocated with the master.
>
> Sometimes when I set up the cluster, I find that one of the nodes (the
> second node that joins) gets stopped, and I find this log:
>
> 2015-06-01T13:55:46.153941+05:30 ronaldo pacemaker: Starting Pacemaker Cluster Manager
> 2015-06-01T13:55:46.233639+05:30 ronaldo attrd[25988]: notice: attrd_trigger_update: Sending flush op to all hosts for: shutdown (0)
> 2015-06-01T13:55:46.234162+05:30 ronaldo crmd[25990]: notice: do_state_transition: State transition S_PENDING -> S_NOT_DC [ input=I_NOT_DC cause=C_HA_MESSAGE origin=do_cl_join_finalize_respond ]
> 2015-06-01T13:55:46.234701+05:30 ronaldo attrd[25988]: notice: attrd_local_callback: Sending full refresh (origin=crmd)
> 2015-06-01T13:55:46.234708+05:30 ronaldo attrd[25988]: notice: attrd_trigger_update: Sending flush op to all hosts for: shutdown (0)
>
> *** This looks to be the likely reason ***
> 2015-06-01T13:55:46.254310+05:30 ronaldo crmd[25990]: error: handle_request: We didn't ask to be shut down, yet our DC is telling us too.

Hi Arjun,

I'd check the other node's logs at this time, to see why it requested the
shutdown.

> 2015-06-01T13:55:46.254577+05:30 ronaldo crmd[25990]: notice: do_state_transition: State transition S_NOT_DC -> S_STOPPING [ input=I_STOP cause=C_HA_MESSAGE origin=route_message ]
> 2015-06-01T13:55:46.255134+05:30 ronaldo crmd[25990]: notice: lrm_state_verify_stopped: Stopped 0 recurring operations at shutdown... waiting (2 ops remaining)
>
> Based on the logs, pacemaker on the active node was stopping the secondary
> cloud every time it joins the cluster. This issue seems similar to
> http://pacemaker.oss.clusterlabs.narkive.com/rVvN8May/node-sends-shutdown-request-to-other-node-error
>
> Packages used:
> pacemaker-1.1.12-4.el6.x86_64
> pacemaker-libs-1.1.12-4.el6.x86_64
> pacemaker-cli-1.1.12-4.el6.x86_64
> pacemaker-cluster-libs-1.1.12-4.el6.x86_64
> pacemaker-debuginfo-1.1.10-14.el6.x86_64
> pcsc-lite-libs-1.5.2-13.el6_4.x86_64
> pcs-0.9.90-2.el6.centos.2.noarch
> pcsc-lite-1.5.2-13.el6_4.x86_64
> pcsc-lite-openct-0.6.19-4.el6.x86_64
> corosync-1.4.1-17.el6.x86_64
> corosynclib-1.4.1-17.el6.x86_64

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
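[Editor's note: for readers reproducing the setup, the colocation constraints described above might be created along the lines below, using the pcs version listed in the thread (0.9.90). This is a hypothetical reconstruction from the status output; the definitions of the `foo` and `ocf::pw:IPaddr` resources themselves are not shown in the thread and are omitted here.]

```shell
# Hypothetical sketch (pcs 0.9 syntax); resource names taken from the
# status output above. Keep each IP on whichever node holds the foo
# master role:
pcs constraint colocation add eth1-CP with master foo-master INFINITY
pcs constraint colocation add eth2-UP with master foo-master INFINITY
pcs constraint colocation add eth3-UPCP with master foo-master INFINITY
```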
[ClusterLabs] Cluster node getting stopped from other node (resending mail)
Hi

I am running a 2-node cluster with this config on CentOS 6.5/6.6:

Master/Slave Set: foo-master [foo]
    Masters: [ messi ]
    Stopped: [ ronaldo ]
eth1-CP   (ocf::pw:IPaddr): Started messi
eth2-UP   (ocf::pw:IPaddr): Started messi
eth3-UPCP (ocf::pw:IPaddr): Started messi

where I have a multi-state resource foo running in master/slave mode, and the
IPaddr RA is just a modified IPaddr2 RA. Additionally, I have a colocation
constraint for the IP addresses to be colocated with the master.

Sometimes when I set up the cluster, I find that one of the nodes (the second
node that joins) gets stopped, and I find this log:

2015-06-01T13:55:46.153941+05:30 ronaldo pacemaker: Starting Pacemaker Cluster Manager
2015-06-01T13:55:46.233639+05:30 ronaldo attrd[25988]: notice: attrd_trigger_update: Sending flush op to all hosts for: shutdown (0)
2015-06-01T13:55:46.234162+05:30 ronaldo crmd[25990]: notice: do_state_transition: State transition S_PENDING -> S_NOT_DC [ input=I_NOT_DC cause=C_HA_MESSAGE origin=do_cl_join_finalize_respond ]
2015-06-01T13:55:46.234701+05:30 ronaldo attrd[25988]: notice: attrd_local_callback: Sending full refresh (origin=crmd)
2015-06-01T13:55:46.234708+05:30 ronaldo attrd[25988]: notice: attrd_trigger_update: Sending flush op to all hosts for: shutdown (0)

*** This looks to be the likely reason ***
2015-06-01T13:55:46.254310+05:30 ronaldo crmd[25990]: error: handle_request: We didn't ask to be shut down, yet our DC is telling us too.

2015-06-01T13:55:46.254577+05:30 ronaldo crmd[25990]: notice: do_state_transition: State transition S_NOT_DC -> S_STOPPING [ input=I_STOP cause=C_HA_MESSAGE origin=route_message ]
2015-06-01T13:55:46.255134+05:30 ronaldo crmd[25990]: notice: lrm_state_verify_stopped: Stopped 0 recurring operations at shutdown... waiting (2 ops remaining)

Based on the logs, pacemaker on the active node was stopping the secondary
cloud every time it joins the cluster. This issue seems similar to
http://pacemaker.oss.clusterlabs.narkive.com/rVvN8May/node-sends-shutdown-request-to-other-node-error

Packages used:
pacemaker-1.1.12-4.el6.x86_64
pacemaker-libs-1.1.12-4.el6.x86_64
pacemaker-cli-1.1.12-4.el6.x86_64
pacemaker-cluster-libs-1.1.12-4.el6.x86_64
pacemaker-debuginfo-1.1.10-14.el6.x86_64
pcsc-lite-libs-1.5.2-13.el6_4.x86_64
pcs-0.9.90-2.el6.centos.2.noarch
pcsc-lite-1.5.2-13.el6_4.x86_64
pcsc-lite-openct-0.6.19-4.el6.x86_64
corosync-1.4.1-17.el6.x86_64
corosynclib-1.4.1-17.el6.x86_64

Thanks in advance for your help.

Regards
Arjun