Re: [ClusterLabs] Sudden stop of pacemaker functions

Klechomir Wed, 17 Feb 2016 05:18:44 -0800

Hi Jan,
Here is the output from your command:

attrd: 609413
cib: 609409
corosync: 608778
crmd: 609415
lrmd: 609412
pengine: 609414
pacemakerd: 609407
stonithd: 609411

Regarding using a newer version, that's what I've been thinking about,
but I've been using this combination of corosync/pacemaker for many
years on different hardware and never had a similar problem.
The main difference is that I have stonith enabled only on the problematic
cluster, but I also suspect that the node which causes this problem may
have some hardware issues.

BTW my last few tests with the newest corosync/pacemaker gave me a very
annoying delay when committing configuration changes (maybe it's a known
problem?).


Best regards,
Klecho


On 17.02.2016 14:59, Jan Pokorný wrote:

On 17/02/16 14:10 +0200, Klechomir wrote:

Having a strange issue lately.
I have a two-node cluster with some cloned resources on it.
One of my nodes suddenly starts reporting all its resources down (some of
them are actually running), stops logging and remains in this state
forever, while still responding to crm commands.

The curious thing is that restarting corosync/pacemaker doesn't change
anything.

Here are the last lines in the log after restart:

[...]
Feb 17 12:55:19 [609409] CLUSTER-1        cib:     info:
cib_process_replace:   Replaced 0.238.40 with 0.238.40 from CLUSTER-2
Feb 17 12:55:21 [609413] CLUSTER-1      attrd:  warning: attrd_cib_callback:
Update shutdown=(null) failed: No such device or address
Feb 17 12:55:22 [609413] CLUSTER-1      attrd:  warning: attrd_cib_callback:
Update terminate=(null) failed: No such device or address
Feb 17 12:55:25 [609413] CLUSTER-1      attrd:  warning: attrd_cib_callback:
Update pingd=(null) failed: No such device or address
Feb 17 12:55:26 [609413] CLUSTER-1      attrd:  warning: attrd_cib_callback:
Update fail-count-p_Samba_Server=(null) failed: No such device or address
Feb 17 12:55:26 [609413] CLUSTER-1      attrd:  warning: attrd_cib_callback:
Update master-p_Device_drbddrv1=(null) failed: No such device or address
Feb 17 12:55:27 [609413] CLUSTER-1      attrd:  warning: attrd_cib_callback:
Update last-failure-p_Samba_Server=(null) failed: No such device or address
Feb 17 12:55:27 [609413] CLUSTER-1      attrd:  warning: attrd_cib_callback:
Update probe_complete=(null) failed: No such device or address

After that the logging on the problematic node stops.

Not sure I follow; what does the following command produce:

     for i in attrd cib corosync crmd lrmd pengine pacemakerd stonithd; do \
     echo "${i}: $(pgrep ${i})"; done

?
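
A slightly more defensive variant of the loop above, as a rough sketch
only: it flags any daemon that has no matching process instead of
printing an empty PID. The function name check_daemons is made up for
illustration, and "pgrep -x" (exact process-name match) is assumed to
be available from procps.

```shell
# Sketch: print the PID(s) of each named daemon, or flag it as missing.
# check_daemons is a hypothetical helper, not part of pacemaker/corosync.
check_daemons() {
    missing=0
    for d in "$@"; do
        # -x matches the process name exactly, so "cib" does not match
        # some other process whose name merely contains "cib".
        pids=$(pgrep -x "$d" | tr '\n' ' ')
        if [ -n "$pids" ]; then
            echo "${d}: ${pids% }"
        else
            echo "${d}: NOT RUNNING"
            missing=$((missing + 1))
        fi
    done
    # Exit status is the number of daemons that were not found.
    return "$missing"
}

check_daemons attrd cib corosync crmd lrmd pengine pacemakerd stonithd \
    || echo "one or more daemons are not running"
```

The PIDs it prints can be compared against the bracketed PIDs in the
log lines (e.g. [609409] for cib) to confirm the log entries come from
the currently running processes.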

Corosync is v2.1.0.26; Pacemaker v1.1.8

Definitely try the most recent version of Pacemaker; what you are using
is 3.5 years old and plenty of fixes have landed since then.



_______________________________________________
Users mailing list: [email protected]
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
