Hi Ray, thanks for the response.

I've tracked down the issue as far as some sort of conflict between the following scripts in /etc/rcS.d:

S59corosync
S70screen-cleanup
S80bootmisc.sh
S85urandom
S90console-screen.sh

With the startup sequence given above, the problem occurs close to 100% of the time (in fact, I have never seen corosync start correctly after boot).

I believe the problem revolves around the sockets that corosync uses in /var/run/*. My assumption is that one of the scripts that executes after corosync in that sequence disturbs some of the /var/run/* files that corosync needs to communicate with all of its children.  I've looked through the scripts (although not really closely) and haven't been able to find the culprit.
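One way to narrow down the culprit is to search only the scripts that sort after corosync for references to /var/run. The following is a minimal sketch, not something taken from the packages; the function name and its two parameters are made up for illustration, and it assumes the rc scripts run in plain lexical (C locale) order.

```shell
#!/bin/sh
# Hypothetical helper (not part of any package): print the start scripts in
# an rc directory that sort after a given script and mention /var/run.
#   $1 = rc directory, e.g. /etc/rcS.d
#   $2 = script to compare against, e.g. S59corosync
later_scripts_touching_var_run() {
    rc_dir=$1
    marker=$2
    for path in "$rc_dir"/S*; do
        [ -e "$path" ] || continue
        name=${path##*/}
        [ "$name" != "$marker" ] || continue
        # Keep only names that sort after the marker in the C locale.
        first=$(printf '%s\n%s\n' "$marker" "$name" | LC_ALL=C sort | head -n 1)
        if [ "$first" = "$marker" ]; then
            # grep -l prints the file name when the pattern is found.
            grep -l '/var/run' "$path" 2>/dev/null
        fi
    done
    return 0
}
```

For example, `later_scripts_touching_var_run /etc/rcS.d S59corosync` would list the candidates to read more closely. A match only shows that a script mentions /var/run, not that it is responsible.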

However, I have managed to "fix" (work-around) the problem by moving S59corosync to S95corosync.  i.e.,

mv /etc/rcS.d/S59corosync /etc/rcS.d/S95corosync

Since doing that on my system(s), I've rebooted 10 times, and corosync (and all of its children) has come up correctly all ten times.
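If the rename has to be repeated on several systems, it could be wrapped in a small guard so that it fails loudly instead of overwriting an existing link. This is only a sketch; `move_rc_link` and its parameters are hypothetical names, and the actual work is still the single `mv` shown above.

```shell
#!/bin/sh
# Hypothetical helper: rename a start link inside an rc directory, refusing
# to run if the source is missing or the target name is already taken.
#   $1 = rc directory, $2 = current name, $3 = new name
move_rc_link() {
    rc_dir=$1
    old=$2
    new=$3
    # -L also catches dangling symlinks, which -e reports as missing.
    if [ ! -e "$rc_dir/$old" ] && [ ! -L "$rc_dir/$old" ]; then
        echo "missing: $rc_dir/$old" >&2
        return 1
    fi
    if [ -e "$rc_dir/$new" ] || [ -L "$rc_dir/$new" ]; then
        echo "already exists: $rc_dir/$new" >&2
        return 1
    fi
    mv "$rc_dir/$old" "$rc_dir/$new"
}
```

Run as root, `move_rc_link /etc/rcS.d S59corosync S95corosync` performs the same rename as the command above.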

I'm moving on to another problem with the corosync init.d scripts now: "/etc/init.d/corosync stop" seems to fail 100% of the time, and I believe it to be related to timeouts (i.e. the init script simply isn't giving corosync enough time to shut down). I'll post back to this list once I have more information on that.
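To illustrate the kind of timeout handling I have in mind, here is a minimal sketch of "send TERM, then wait up to N seconds for the process to exit". It is not the logic of the packaged init script, which I have not yet examined closely; `stop_and_wait` and its parameters are hypothetical.

```shell
#!/bin/sh
# Sketch only: this is NOT the packaged init script's logic.
#   $1 = PID to stop
#   $2 = maximum number of seconds to wait for it to exit
stop_and_wait() {
    pid=$1
    timeout=$2
    # If the signal cannot be delivered, treat the process as already gone.
    kill -TERM "$pid" 2>/dev/null || return 0
    waited=0
    # kill -0 sends no signal; it only tests whether the PID still exists.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1    # still running after the timeout
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

A caller could then decide what to do when the function returns 1, for example wait longer or report the failure, rather than assuming the daemon has stopped.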

I haven't encountered the problem where corosync fails during a manual start -- it has only been automatic/on-boot starts that have caused problems, and those only when it was at /etc/rcS.d/S59corosync.

Thanks.

Ray Pitmon wrote:
Hi Remi,

I have not found a solution.  I thought about adding it to rc.local, but now I'm finding that starting the thing manually doesn't always work either (especially after a hard-shutdown by pulling the power cable).

I've found that I have to do this on a hard-shutdown:

1. Start corosync, tail syslog and notice that the processes in /usr/bin/heartbeat/ that start up (cib, lrmd, etc) are screaming that they can't get going for some reason.
2. Stop corosync (/etc/init.d/corosync stop), then manually kill all those procs since they won't quit on their own.
3. Stop corosync again, just for good measure.
4. Start corosync.  Watch the logs.  Sometimes it says a few things and exits again (with no indication in the logs why it exited).
5. Start corosync one more time, if it went away, and it runs great.
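Steps 2 through 5 above could be sketched as a script along the following lines. This is an untested illustration, not a recommended procedure: the function name, the settle delay, and the use of SIGKILL are assumptions, and only the two process names mentioned above (cib, lrmd) are listed. Step 1 (starting corosync and watching syslog) is left to the operator.

```shell
#!/bin/sh
# Sketch of the manual recovery steps; names and delays are assumptions.
#   $1 = init script, e.g. /etc/init.d/corosync
#   $2 = seconds to wait before checking that corosync stayed up (default 10)
recover_corosync() {
    init=$1
    settle=${2:-10}

    # Step 2: stop corosync, then kill the children that will not quit.
    "$init" stop || true
    # Only cib and lrmd are named in the steps; extend the list as needed.
    for proc in cib lrmd; do
        pkill -KILL -x "$proc" 2>/dev/null || true
    done

    # Step 3: stop again, for good measure.
    "$init" stop || true

    # Step 4: start corosync and give it time to settle.
    "$init" start || true
    sleep "$settle"

    # Step 5: start it one more time if it went away.
    if ! pgrep -x corosync >/dev/null 2>&1; then
        "$init" start
    fi
}
```

It would be invoked as root with something like `recover_corosync /etc/init.d/corosync 10`, while still watching the logs as described in step 4.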

So, after further reading, I decided that it probably isn't wise to start pacemaker automatically on boot-up anyway.  From what I've read, I fear I might run into a STONITH death-match.

-Ray

Remi Broemeling wrote:
Hi, Ray.

I'm in the process of playing around with the pacemaker-openais 1.0.5+hg20090813-0ubuntu2~hardy1 package on Ubuntu Hardy Heron, and I encountered the issue that you wrote about on the [email protected] mailing list.

There is no follow-up conversation on the list (at least none that I can see), so I was wondering if you had found a solution to the problem with pacemaker/corosync startup during system boot?

Thanks.


--

Remi Broemeling
Sr System Administrator

Nexopia.com Inc.
direct: 780 444 1250 ext 435
email: [email protected]
fax: 780 487 0376

www.nexopia.com

If money is the root of all evil, why do churches want it so badly?
coolsig.com