On Nov 19, 2009, at 3:03 PM, Andrew Beekhof wrote:
Another problem has appeared:
after the reboot of one server I often have a cluster partition and
both servers elect themselves DC.
Even if the partition doesn't appear just after the reboot of one
server (e.g. serverA), if I try to restart corosync on the other server
(e.g. serverB), the partition appears.
Then if I also restart corosync on the first server (serverA),
everything works fine again.
But if I restart corosync on the second server (serverB), nothing
changes and the partition appears again.
It seems to me that there is still something wrong with the first run
of corosync just after the server reboot.
I've found that it starts a bit too early by default.
Various systems seem to like messing with the network stack (Xen is
one, but there are others), which confuses corosync.
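
One way to rule out the early-start case is to hold corosync back until
the address it binds to is actually configured on an interface. The
following is only a minimal sketch under stated assumptions: the
function names, the example address 192.0.2.10, the timeout values and
the init script path are all hypothetical and need adapting to the
local setup.

```sh
#!/bin/sh
# Hypothetical helper: block until a given IPv4 address is configured
# on some local interface, or give up after a timeout.

addr_is_up() {
    # $1 = IPv4 address. Succeeds when "ip" lists it on any interface.
    # With -o, each address is on one line as "... inet ADDR/PREFIX ...".
    ip -o -4 addr show 2>/dev/null | grep -qF " $1/"
}

wait_for_addr() {
    # $1 = IPv4 address, $2 = timeout in seconds (default 60).
    addr=$1
    remaining=${2:-60}
    while [ "$remaining" -gt 0 ]; do
        if addr_is_up "$addr"; then
            return 0
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
    return 1
}

# Example use (assumed address and init script path, left commented out):
# wait_for_addr 192.0.2.10 120 && /etc/init.d/corosync start
```

Polling for the address rather than sleeping for a fixed time means the
start is tied to the state of the network stack, not to how long the
boot happened to take.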
I wrote a shell script that "manually starts" corosync 5 minutes after
the server starts and in this case the problem appears every time!
It's driving me crazy, because I can see that my script starts a while
after the server is up and I'm pretty sure everything is running!
On the other hand, if I manually start corosync just after the server
is up, everything works fine!
You're not getting addresses from a DHCP server, are you?
That's another common cause, since there can be a significant delay in
obtaining the address, which again messes with corosync.
Absolutely not!
I have two servers with static public IPs.
I also added both servers to the /etc/hosts file: in general I
followed all the guidelines I found in the documentation.
I didn't configure any fencing method, because I think that my
configuration is really simple and I don't need it.
Do you need your data though?
Do you mean it's better to configure a fencing method anyway?
Thank you very much for your help!
Giovanni
_______________________________________________
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker