Hi,

On Mon, Jul 2, 2012 at 7:47 PM, Martin de Koning <martind...@gmail.com> wrote:
> Hi all,
>
> Reasonably new to pacemaker and having some issues with corosync loading the
> pacemaker plugin after a reboot of the node. It looks like similar issues
> have been posted before but I haven't found a relevant fix.
>
> The CentOS 6.2 node was online before the reboot and restarting the corosync
> and pacemaker services caused no issues. Since the reboot and subsequent
> reboots, I am unable to get pacemaker to join the cluster.
>
> After the reboot corosync now reports the following:
> Jul 2 17:56:22 sessredis-03 corosync[1644]: [pcmk ] WARN: route_ais_message: Sending message to local.cib failed: ipc delivery failed (rc=-2)
> Jul 2 17:56:22 sessredis-03 corosync[1644]: [pcmk ] WARN: route_ais_message: Sending message to local.cib failed: ipc delivery failed (rc=-2)
> Jul 2 17:56:22 sessredis-03 corosync[1644]: [pcmk ] WARN: route_ais_message: Sending message to local.cib failed: ipc delivery failed (rc=-2)
> Jul 2 17:56:22 sessredis-03 corosync[1644]: [pcmk ] WARN: route_ais_message: Sending message to local.cib failed: ipc delivery failed (rc=-2)
> Jul 2 17:56:22 sessredis-03 corosync[1644]: [pcmk ] WARN: route_ais_message: Sending message to local.cib failed: ipc delivery failed (rc=-2)
> Jul 2 17:56:22 sessredis-03 corosync[1644]: [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
>
> The full syslog is here:
> http://pastebin.com/raw.php?i=f9eBuqUh
>
> corosync-1.4.1-4.el6_2.3.x86_64
> pacemaker-1.1.6-3.el6.x86_64
>
> I have checked the obvious, such as inter-cluster communication and
> firewall rules. It appears to me that there may be an issue with the
> Pacemaker cluster information base and not corosync. Any ideas? Can I clear
> the CIB manually somehow to resolve this?

What does "corosync-objctl | grep member" return? Can you see the same
multicast groups on all of the nodes when you run "netstat -ng"?

To clear the CIB manually, do a "rm -rfi /var/lib/heartbeat/crm/*" on the
faulty node (with corosync and pacemaker stopped), then start corosync and
pacemaker.

HTH,
Dan

>
> Cheers
> Martin
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

-- 
Dan Frincu
CCNA, RHCE
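
The CIB reset described above (stop both services, remove the files under /var/lib/heartbeat/crm, start both services again) could be sketched as a small script. This is only an illustration under assumptions: it supposes the stock CentOS 6 "service" init scripts for both corosync and pacemaker, and the DRY_RUN and CIB_DIR variables and the run/reset_cib helpers are hypothetical names introduced here. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: reset the local CIB on the faulty node only.
# Assumption: corosync and pacemaker are managed through "service" init scripts.
# DRY_RUN=1 (the default) only prints each command; set DRY_RUN=0 to run as root.
DRY_RUN="${DRY_RUN:-1}"
CIB_DIR="${CIB_DIR:-/var/lib/heartbeat/crm}"   # path taken from the advice above

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reset_cib() {
    # Both services must be stopped before the CIB files are touched.
    run service pacemaker stop
    run service corosync stop
    # -i is kept from the original command, so rm asks before each removal.
    run rm -rfi "$CIB_DIR"/*
    # Bring the membership layer up first, then pacemaker.
    run service corosync start
    run service pacemaker start
}

reset_cib
```

Running it once with the default DRY_RUN=1 shows the exact command sequence, which can be reviewed before running it for real with DRY_RUN=0 on the faulty node.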