Hi,

Had another node die.

Everything looks fine on this node (demorp2) until 00:09:06; I am guessing 
corosync tried to talk to the other node and the exchange failed, at which 
point node 1 killed cman here:

Nov  1 00:08:48 demorp2 ntpd[2461]: peers refreshed
Nov  1 00:08:51 demorp2 corosync[2039]:   [TOTEM ] A processor joined or left 
the membership and a new membership was formed.
Nov  1 00:08:51 demorp2 corosync[2039]:   [CPG   ] chosen downlist: sender r(0) 
ip(10.172.218.52) ; members(old:1 left:0)
Nov  1 00:08:51 demorp2 corosync[2039]:   [MAIN  ] Completed service 
synchronization, ready to provide service.
Nov  1 00:09:05 demorp2 corosync[2039]:   [TOTEM ] A processor joined or left 
the membership and a new membership was formed.
Nov  1 00:09:05 demorp2 corosync[2039]:   [CMAN  ] quorum regained, resuming 
activity
Nov  1 00:09:05 demorp2 corosync[2039]:   [QUORUM] This node is within the 
primary component and will provide service.
Nov  1 00:09:05 demorp2 corosync[2039]:   [QUORUM] Members[2]: 1 2
Nov  1 00:09:05 demorp2 corosync[2039]:   [QUORUM] Members[2]: 1 2
Nov  1 00:09:05 demorp2 crmd[2725]:   notice: cman_event_callback: Membership 
320: quorum acquired
Nov  1 00:09:05 demorp2 crmd[2725]:   notice: crm_update_peer_state: 
cman_event_callback: Node demorp1[1] - state is now member (was lost)
Nov  1 00:09:05 demorp2 corosync[2039]:   [CPG   ] chosen downlist: sender r(0) 
ip(10.172.218.52) ; members(old:1 left:0)
Nov  1 00:09:05 demorp2 corosync[2039]:   [MAIN  ] Completed service 
synchronization, ready to provide service.
Nov  1 00:09:05 demorp2 crmd[2725]:   notice: do_state_transition: State 
transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN cause=C_FSA_INTERNAL 
origin=peer_update_callback ]
Nov  1 00:09:06 demorp2 corosync[2039]: cman killed by node 1 because we were 
killed by cman_tool or other application
Nov  1 00:09:06 demorp2 attrd[2723]:    error: pcmk_cpg_dispatch: Connection to 
the CPG API failed: Library error (2)
Nov  1 00:09:06 demorp2 attrd[2723]:     crit: attrd_cs_destroy: Lost 
connection to Corosync service!
Nov  1 00:09:06 demorp2 attrd[2723]:   notice: main: Exiting...
Nov  1 00:09:06 demorp2 attrd[2723]:   notice: main: Disconnecting client 
0xdc3020, pid=2725...
Nov  1 00:09:06 demorp2 pacemakerd[2712]:    error: pcmk_cpg_dispatch: 
Connection to the CPG API failed: Library error (2)
Nov  1 00:09:06 demorp2 pacemakerd[2712]:    error: mcp_cpg_destroy: Connection 
destroyed
Nov  1 00:09:06 demorp2 stonith-ng[2721]:    error: pcmk_cpg_dispatch: 
Connection to the CPG API failed: Library error (2)
Nov  1 00:09:06 demorp2 crmd[2725]:    error: pcmk_cpg_dispatch: Connection to 
the CPG API failed: Library error (2)
Nov  1 00:09:06 demorp2 crmd[2725]:    error: crmd_cs_destroy: connection 
terminated
Nov  1 00:09:06 demorp2 gfs_controld[2173]: cluster is down, exiting
Nov  1 00:09:06 demorp2 gfs_controld[2173]: daemon cpg_dispatch error 2
Nov  1 00:09:06 demorp2 attrd[2723]:    error: attrd_cib_connection_destroy: 
Connection to the CIB terminated...
Nov  1 00:09:06 demorp2 fenced[2098]: cluster is down, exiting
Nov  1 00:09:06 demorp2 fenced[2098]: daemon cpg_dispatch error 2
Nov  1 00:09:06 demorp2 dlm_controld[2124]: cluster is down, exiting
Nov  1 00:09:06 demorp2 dlm_controld[2124]: daemon cpg_dispatch error 2
Nov  1 00:09:06 demorp2 stonith-ng[2721]:    error: stonith_peer_cs_destroy: 
Corosync connection terminated
Nov  1 00:09:06 demorp2 cib[2720]:  warning: qb_ipcs_event_sendv: 
new_event_notification (2720-2721-11): Broken pipe (32)
Nov  1 00:09:06 demorp2 cib[2720]:  warning: cib_notify_send_one: Notification 
of client crmd/4c1076bf-8a95-4f77-b866-e1bbf5e2ceda failed
Nov  1 00:09:06 demorp2 cib[2720]:    error: pcmk_cpg_dispatch: Connection to 
the CPG API failed: Library error (2)
Nov  1 00:09:06 demorp2 cib[2720]:    error: cib_cs_destroy: Corosync 
connection lost!  Exiting.
Nov  1 00:09:06 demorp2 crmd[2725]:   notice: crmd_exit: Forcing immediate 
exit: Link has been severed (67)
Nov  1 00:09:06 demorp2 lrmd[2722]:  warning: qb_ipcs_event_sendv: 
new_event_notification (2722-2725-6): Bad file descriptor (9)
Nov  1 00:09:06 demorp2 lrmd[2722]:  warning: send_client_notify: Notification 
of client crmd/3598d3e2-600a-4f15-aae2-e087437d6213 failed
Nov  1 00:09:06 demorp2 lrmd[2722]:  warning: send_client_notify: Notification 
of client crmd/3598d3e2-600a-4f15-aae2-e087437d6213 failed
Nov  1 00:09:08 demorp2 kernel: dlm: closing connection to node 1


The other node:

It looks to me like VMware took too long to give this VM a time slice, and 
corosync responded by killing one node: fenced on demorp1 told cman to remove 
nodeid 2.


Nov  1 00:08:50 demorp1 lrmd[2433]:  warning: child_timeout_callback: 
ybrpstat_monitor_5000 process (PID 32026) timed out
Nov  1 00:08:50 demorp1 lrmd[2433]:  warning: operation_finished: 
ybrpstat_monitor_5000:32026 - timed out after 20000ms
Nov  1 00:08:51 demorp1 crmd[2436]:    error: process_lrm_event: LRM operation 
ybrpstat_monitor_5000 (17) Timed Out (timeout=20000ms)
Nov  1 00:08:52 demorp1 crmd[2436]:   notice: process_lrm_event: 
demorp1-ybrpstat_monitor_5000:17 [ Service running for 18 hours 8 minutes 30 
seconds.\n ]
Nov  1 00:08:53 demorp1 lrmd[2433]:  warning: child_timeout_callback: 
ybrpip_monitor_5000 process (PID 32033) timed out
Nov  1 00:08:53 demorp1 lrmd[2433]:  warning: operation_finished: 
ybrpip_monitor_5000:32033 - timed out after 20000ms
Nov  1 00:08:53 demorp1 crmd[2436]:    error: process_lrm_event: LRM operation 
ybrpip_monitor_5000 (22) Timed Out (timeout=20000ms)
Nov  1 00:09:05 demorp1 corosync[1748]:   [MAIN  ] Corosync main process was 
not scheduled for 16241.7002 ms (threshold is 8000.0000 ms). Consider token 
timeout increase.
Nov  1 00:09:05 demorp1 corosync[1748]:   [TOTEM ] A processor failed, forming 
new configuration.
Nov  1 00:09:05 demorp1 corosync[1748]:   [TOTEM ] Process pause detected for 
15555 ms, flushing membership messages.
Nov  1 00:09:05 demorp1 corosync[1748]:   [MAIN  ] Corosync main process was 
not scheduled for 15555.0029 ms (threshold is 8000.0000 ms). Consider token 
timeout increase.
Nov  1 00:09:05 demorp1 corosync[1748]:   [CMAN  ] quorum lost, blocking 
activity
Nov  1 00:09:05 demorp1 corosync[1748]:   [QUORUM] This node is within the 
non-primary component and will NOT provide any services.
Nov  1 00:09:05 demorp1 corosync[1748]:   [QUORUM] Members[1]: 1
Nov  1 00:09:05 demorp1 corosync[1748]:   [TOTEM ] A processor joined or left 
the membership and a new membership was formed.
Nov  1 00:09:05 demorp1 corosync[1748]:   [CMAN  ] quorum regained, resuming 
activity
Nov  1 00:09:05 demorp1 corosync[1748]:   [QUORUM] This node is within the 
primary component and will provide service.
Nov  1 00:09:05 demorp1 corosync[1748]:   [QUORUM] Members[2]: 1 2
Nov  1 00:09:05 demorp1 corosync[1748]:   [QUORUM] Members[2]: 1 2
Nov  1 00:09:05 demorp1 corosync[1748]:   [CPG   ] chosen downlist: sender r(0) 
ip(10.172.218.51) ; members(old:2 left:1)
Nov  1 00:09:05 demorp1 corosync[1748]:   [MAIN  ] Completed service 
synchronization, ready to provide service.
Nov  1 00:09:05 demorp1 crmd[2436]:   notice: process_lrm_event: LRM operation 
ybrpip_monitor_5000 (call=22, rc=0, cib-update=17, confirmed=false) ok
Nov  1 00:09:05 demorp1 crmd[2436]:   notice: peer_update_callback: Our peer on 
the DC is dead
Nov  1 00:09:05 demorp1 crmd[2436]:   notice: cman_event_callback: Membership 
320: quorum lost
Nov  1 00:09:05 demorp1 crmd[2436]:   notice: cman_event_callback: Membership 
320: quorum acquired
Nov  1 00:09:05 demorp1 crmd[2436]:   notice: do_state_transition: State 
transition S_NOT_DC -> S_ELECTION [ input=I_ELECTION 
cause=C_CRMD_STATUS_CALLBACK origin=peer_update_callback ]
Nov  1 00:09:05 demorp1 crmd[2436]:   notice: process_lrm_event: LRM operation 
ybrpstat_monitor_5000 (call=17, rc=0, cib-update=18, confirmed=false) ok
Nov  1 00:09:06 demorp1 crmd[2436]:  warning: do_log: FSA: Input I_JOIN_OFFER 
from route_message() received in state S_ELECTION
Nov  1 00:09:06 demorp1 crmd[2436]:   notice: do_state_transition: State 
transition S_ELECTION -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL 
origin=do_election_count_vote ]
Nov  1 00:09:06 demorp1 fenced[1822]: telling cman to remove nodeid 2 from 
cluster
Nov  1 00:09:06 demorp1 fenced[1822]: receive_start 2:3 add node with 
started_count 1



