Hi Jan,
Here is the output from your command:
attrd: 609413
cib: 609409
corosync: 608778
crmd: 609415
lrmd: 609412
pengine: 609414
pacemakerd: 609407
stonithd: 609411
Regarding a newer version, that's what I've been thinking about,
but I've been using this combination of corosync/pacemaker for many
years on different hardware and never had a similar problem.
The main difference is that I have stonith enabled only on the problematic
cluster, but I also suspect that the node which causes this problem may
have some hardware issues.
BTW, my last few tests with the newest corosync/pacemaker gave me a very
annoying delay when committing configuration changes (maybe it's a known
problem?).
Best regards,
Klecho
On 17.02.2016 14:59, Jan Pokorný wrote:
On 17/02/16 14:10 +0200, Klechomir wrote:
I've been having a strange issue lately.
I have a two-node cluster with some cloned resources on it.
One of my nodes suddenly starts reporting all its resources down (some of
them are actually running), stops logging and remains in this state
forever, while still responding to crm commands.
The curious thing is that restarting corosync/pacemaker doesn't change
anything.
Here are the last lines in the log after restart:
[...]
Feb 17 12:55:19 [609409] CLUSTER-1 cib: info:
cib_process_replace: Replaced 0.238.40 with 0.238.40 from CLUSTER-2
Feb 17 12:55:21 [609413] CLUSTER-1 attrd: warning: attrd_cib_callback:
Update shutdown=(null) failed: No such device or address
Feb 17 12:55:22 [609413] CLUSTER-1 attrd: warning: attrd_cib_callback:
Update terminate=(null) failed: No such device or address
Feb 17 12:55:25 [609413] CLUSTER-1 attrd: warning: attrd_cib_callback:
Update pingd=(null) failed: No such device or address
Feb 17 12:55:26 [609413] CLUSTER-1 attrd: warning: attrd_cib_callback:
Update fail-count-p_Samba_Server=(null) failed: No such device or address
Feb 17 12:55:26 [609413] CLUSTER-1 attrd: warning: attrd_cib_callback:
Update master-p_Device_drbddrv1=(null) failed: No such device or address
Feb 17 12:55:27 [609413] CLUSTER-1 attrd: warning: attrd_cib_callback:
Update last-failure-p_Samba_Server=(null) failed: No such device or address
Feb 17 12:55:27 [609413] CLUSTER-1 attrd: warning: attrd_cib_callback:
Update probe_complete=(null) failed: No such device or address
After that the logging on the problematic node stops.
Not sure I follow; what does the following command produce:
for i in attrd cib corosync crmd lrmd pengine pacemakerd stonithd; do \
echo "${i}: $(pgrep ${i})"; done
?
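A slightly stricter variant of the loop above, offered only as a sketch: it uses `pgrep -x` so that only exact process-name matches count, and it labels a daemon with no matching process explicitly. The function name `daemon_status` and the `NOT RUNNING` label are illustrative choices, not anything provided by Pacemaker or Corosync.

```shell
# Print "<name>: <pids>" for one daemon, or "<name>: NOT RUNNING"
# when pgrep finds no process with exactly that name.
daemon_status() {
    if pids=$(pgrep -x "$1" 2>/dev/null); then
        # $pids is left unquoted on purpose so that several PIDs
        # are joined onto a single line.
        printf '%s: %s\n' "$1" "$(echo $pids)"
    else
        printf '%s: NOT RUNNING\n' "$1"
    fi
}

for d in attrd cib corosync crmd lrmd pengine pacemakerd stonithd; do
    daemon_status "$d"
done
```

With this form, a daemon that has died shows up as an explicit marker rather than an empty value after the colon.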
Corosync is v2.1.0.26; Pacemaker v1.1.8
Definitely try a more recent version of Pacemaker; what you are using
is 3.5 years old and plenty of fixes have landed since then.
_______________________________________________
Users mailing list: [email protected]
http://clusterlabs.org/mailman/listinfo/users
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org