Hi Everyone,
So my question was: why didn't Corosync work correctly when it was started under Pacemaker?
Finally I found the answer. In short, it was caused by incorrect settings in the local name resolver.

/etc/cluster/cluster.conf contains similar settings for each node:

    <clusternode name="NODE17.local" nodeid="17">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="NODE17"/>
        </method>
      </fence>
      <altname name="NODE17.pub"/>
    </clusternode>

where
    NODE17.local points to the private IP address
    NODE17.pub points to the public IP address

Due to a configuration error on the most recently added node, NODE17, the internal name NODE17.local incorrectly resolved to the public IP (the address of NODE17.pub). In that case the command 'corosync-cfgtool -s' on NODE17 printed a status like this:

    Local node ID 17
    RING ID 0
            id      = 111.11.11.1
            Marking ringid 0 interface 111.11.11.1 FAULTY
    RING ID 1
            id      = 111.11.11.1
            status  = ring 1 active with no faults

RING ID 0 was also marked as FAULTY on all 17 nodes. After the name resolution was fixed, both rings operate as expected.

I think the error would have been resolved much more quickly if I had been using the newest pacemaker/corosync versions, since both of them have many improvements and fixes in their logging subsystems.
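The symptom above (both rings reporting the same address) can be checked for before starting the cluster. Here is a minimal sketch in Python, assuming each node is described by a (private name, public name) pair as in the clusternode/altname entries. The function name find_ring_collisions and its resolve parameter are hypothetical helpers for illustration, not part of corosync or pacemaker.

```python
import socket
from typing import Callable, Iterable, List, Tuple


def find_ring_collisions(
    nodes: Iterable[Tuple[str, str]],
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> List[str]:
    """Report nodes whose two ring names do not resolve to distinct addresses.

    `nodes` holds (ring0_name, ring1_name) pairs, e.g. the clusternode name
    and its altname. `resolve` is an assumption of this sketch: it defaults
    to the system resolver but can be swapped for a lookup table.
    """
    problems = []
    for ring0_name, ring1_name in nodes:
        try:
            ring0_ip = resolve(ring0_name)
            ring1_ip = resolve(ring1_name)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            problems.append(f"{ring0_name}/{ring1_name}: lookup failed ({exc})")
            continue
        if ring0_ip == ring1_ip:
            # Same address on both rings: the situation described in this post.
            problems.append(
                f"{ring0_name} and {ring1_name} both resolve to {ring0_ip}"
            )
    return problems
```

Running find_ring_collisions([("NODE17.local", "NODE17.pub")]) on every node (name resolution is local to each host) should return an empty list when the resolver is set up correctly. It only compares the two answers with each other, so it would not catch a name that resolves to a wrong but distinct address.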
Thanks to everyone who was able to help me, and be careful with DNS names when you're using a similar configuration :)
-- Regards Denis Gribkov
_______________________________________________ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org