Hello all,

I'm running corosync + pacemaker under CentOS 5.5, using the packages
from clusterlabs.org (so, corosync-1.2.7-1.1.el5).  I'm running into a
problem where corosync frequently fails to start up.

When this happens, instead of a single corosync process I see the following:

# ps -fe | grep corosync
root      1704     1  0 15:36 ?        00:00:00 corosync
root      1710  1704  0 15:36 ?        00:00:00 corosync
root      1711  1704  0 15:36 ?        00:00:00 corosync
root      1712  1704  0 15:36 ?        00:00:00 corosync
root      1713  1704  0 15:36 ?        00:00:00 corosync
root      1714  1704  0 15:36 ?        00:00:00 corosync
root      1715  1704  0 15:36 ?        00:00:00 corosync

On a working system, the results always look like this:

# ps -fe | grep corosync
root      2185     1  0 14:29 ?        00:00:02 corosync
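
A minimal sketch for telling the two states apart from a script, assuming the `ps -fe` column layout shown above (command name in the eighth field) and that the daemon appears as plain `corosync`. The function name `count_corosync` is hypothetical, not part of corosync or pacemaker:

```shell
#!/bin/sh
# count_corosync (hypothetical helper): reads `ps -fe` output on stdin and
# prints how many processes have "corosync" as their command name.
# Assumes the eight-column layout shown above:
#   UID PID PPID C STIME TTY TIME CMD
# Matching on field 8 skips the "grep corosync" line, whose command is "grep".
count_corosync() {
    awk '$8 == "corosync" { n++ } END { print n + 0 }'
}

# Example use:
#   ps -fe | count_corosync
```

Going by the two listings above, a result of 1 matches the working system and a larger number matches the failed startup.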

When I attach strace to each of the processes in the failed case, all but
the first are blocked in the futex() system call.

The 'crm' command claims it can't contact crmd:

  # crm status
  Connection to cluster failed: connection failed

The last message logged implies that some part of corosync
successfully started up:

2010-09-24T15:42:55.149517-04:00 myhost corosync[1704]:   [MAIN  ]
Completed service synchronization, ready to provide service.

If I kill (-9) all the corosync processes and restart corosync, everything works fine.

The corosync configuration is straight from the documentation.  I've
put it online here if you'd like to see it:

  http://gist.github.com/595906

Your help would be tremendously appreciated.  Thanks!

-- Lars
_______________________________________________
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais
