Hello all, I'm running corosync + pacemaker under CentOS 5.5, using the packages from clusterlabs.org (so, corosync-1.2.7-1.1.el5). I'm running into a problem where corosync frequently fails to start up.
When this happens, instead of a single corosync process I see the following:

  # ps -fe | grep corosync
  root      1704     1  0 15:36 ?        00:00:00 corosync
  root      1710  1704  0 15:36 ?        00:00:00 corosync
  root      1711  1704  0 15:36 ?        00:00:00 corosync
  root      1712  1704  0 15:36 ?        00:00:00 corosync
  root      1713  1704  0 15:36 ?        00:00:00 corosync
  root      1714  1704  0 15:36 ?        00:00:00 corosync
  root      1715  1704  0 15:36 ?        00:00:00 corosync

On a working system, the results always look like this:

  # ps -fe | grep corosync
  root      2185     1  0 14:29 ?        00:00:02 corosync

And if I look at all of those with strace, all but the first are blocked on the futex() system call.

The 'crm' command claims it can't contact crmd:

  # crm status
  Connection to cluster failed: connection failed

The last message logged implies that some part of corosync successfully started up:

  2010-09-24T15:42:55.149517-04:00 myhost corosync[1704]: [MAIN ] Completed service synchronization, ready to provide service.

If I kill (-9) all the corosync processes and restart it, everything works fine.

The corosync configuration is straight from the documentation. I've put it online here if you'd like to see it:

  http://gist.github.com/595906

Your help would be tremendously appreciated. Thanks!

--
Lars

_______________________________________________
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais