Hi List,
I've been having a strange issue lately.
I have a two-node cluster with some cloned resources on it.
One of my nodes suddenly starts reporting all of its resources as down (some of them are actually running), stops logging and remains in this state forever, while still responding to crm commands.
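To make that concrete, this is the kind of check I mean (p_Samba_Server is one of the affected resources; the smbd check is just an example of verifying the daemon directly):

  crm_mon -1                          # cluster's view: resources shown as Stopped on this node
  crm_resource -r p_Samba_Server -W   # where the cluster thinks the resource is running
  pgrep -l smbd                       # ...while the daemon is in fact still alive on the node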

The curious thing is that restarting corosync/pacemaker doesn't change anything.
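By restarting I mean the usual stop/start of both daemons on the affected node, roughly (init script names as on my systems, they may differ elsewhere):

  service pacemaker stop
  service corosync stop
  service corosync start
  service pacemaker start

The node rejoins the cluster, but it comes back reporting everything down just as before.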

Here are the last lines in the log after the restart:

Feb 17 12:55:17 [609415] CLUSTER-1 crmd: notice: do_started: The local CRM is operational
Feb 17 12:55:17 [609415] CLUSTER-1 crmd: info: do_state_transition: State transition S_STARTING -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_started ]
Feb 17 12:55:17 [609409] CLUSTER-1 cib: info: cib_process_replace: Digest matched on replace from CLUSTER-2: f7cb10ecaff6cfd1661ca7ec779192b3
Feb 17 12:55:17 [609409] CLUSTER-1 cib: info: cib_process_replace: Replaced 0.238.1 with 0.238.40 from CLUSTER-2
Feb 17 12:55:17 [609409] CLUSTER-1 cib: info: cib_replace_notify: Replaced: 0.238.1 -> 0.238.40 from CLUSTER-2
Feb 17 12:55:18 [609415] CLUSTER-1 crmd: info: update_dc: Set DC to CLUSTER-2 (3.0.6)
Feb 17 12:55:19 [609411] CLUSTER-1 stonith-ng: info: stonith_command: Processed register from crmd.609415: OK (0)
Feb 17 12:55:19 [609411] CLUSTER-1 stonith-ng: info: stonith_command: Processed st_notify from crmd.609415: OK (0)
Feb 17 12:55:19 [609411] CLUSTER-1 stonith-ng: info: stonith_command: Processed st_notify from crmd.609415: OK (0)
Feb 17 12:55:19 [609415] CLUSTER-1 crmd: info: erase_status_tag: Deleting xpath: //node_state[@uname='CLUSTER-1']/transient_attributes
Feb 17 12:55:19 [609415] CLUSTER-1 crmd: info: update_attrd: Connecting to attrd... 5 retries remaining
Feb 17 12:55:19 [609415] CLUSTER-1 crmd: notice: do_state_transition: State transition S_PENDING -> S_NOT_DC [ input=I_NOT_DC cause=C_HA_MESSAGE origin=do_cl_join_finalize_respond ]
Feb 17 12:55:19 [609413] CLUSTER-1 attrd: notice: attrd_local_callback: Sending full refresh (origin=crmd)
Feb 17 12:55:19 [609409] CLUSTER-1 cib: info: cib_process_replace: Digest matched on replace from CLUSTER-2: f7cb10ecaff6cfd1661ca7ec779192b3
Feb 17 12:55:19 [609409] CLUSTER-1 cib: info: cib_process_replace: Replaced 0.238.40 with 0.238.40 from CLUSTER-2
Feb 17 12:55:21 [609413] CLUSTER-1 attrd: warning: attrd_cib_callback: Update shutdown=(null) failed: No such device or address
Feb 17 12:55:22 [609413] CLUSTER-1 attrd: warning: attrd_cib_callback: Update terminate=(null) failed: No such device or address
Feb 17 12:55:25 [609413] CLUSTER-1 attrd: warning: attrd_cib_callback: Update pingd=(null) failed: No such device or address
Feb 17 12:55:26 [609413] CLUSTER-1 attrd: warning: attrd_cib_callback: Update fail-count-p_Samba_Server=(null) failed: No such device or address
Feb 17 12:55:26 [609413] CLUSTER-1 attrd: warning: attrd_cib_callback: Update master-p_Device_drbddrv1=(null) failed: No such device or address
Feb 17 12:55:27 [609413] CLUSTER-1 attrd: warning: attrd_cib_callback: Update last-failure-p_Samba_Server=(null) failed: No such device or address
Feb 17 12:55:27 [609413] CLUSTER-1 attrd: warning: attrd_cib_callback: Update probe_complete=(null) failed: No such device or address

After that the logging on the problematic node stops.
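If I read the attrd warnings right, the updates to CLUSTER-1's transient attributes in the CIB status section are being rejected ("No such device or address"), so my next step would be to check whether a node_state entry for CLUSTER-1 exists at all and whether corosync still sees both members, roughly:

  cibadmin -Q -o status | grep node_state   # is there a node_state entry for CLUSTER-1?
  corosync-quorumtool -s                    # corosync membership/quorum as seen from this node
  crm_mon -1                                # what the DC (CLUSTER-2) currently reports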

Corosync is v2.1.0.26; Pacemaker is v1.1.8.

Regards,
Klecho
