In a 2-node cluster where each node has two NICs connected to disjoint 
networks, and thus 2 corosync rings, why would loss of communication on one 
ring cause cluster failover?

We have the following setup...


                      LAN A
          /====SWITCH====SWITCH====\
         /                          \
   NODE_A                            NODE_B
         \                          /
          \====SWITCH====SWITCH====/
                      LAN B


Everything on LAN A is physically separate from LAN B: different switches, 
cables, power, etc. Yet when either LAN A or LAN B suffers a failure, the 
cluster fails over. What would cause that?

This happened yesterday at 2:05 pm Pacific time. I have the corosync and 
pacemaker logs from both nodes during that timeframe, but they are 20,000+ 
lines. I can see the failover happening (because everything was going along 
normally, then the logs went nuts) but I don't understand why. Can someone tell 
me what clues I should be looking for?
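
To cut the 20,000+ lines down to something readable, a rough sketch like the one below could keep only the lines near the failover time that mention ring or membership trouble. Everything in it is an assumption rather than something taken from my logs: the function name suspicious_lines is made up, the timestamp is assumed to be syslog-style ("Mon DD HH:MM:SS") at the start of each line, and the keyword list is only a guess at the kind of messages that might matter, so it would need adjusting to the real log format.

```python
import re
from datetime import datetime, timedelta

# Assumed keywords (a guess, matched case-insensitively); edit to suit the real logs.
KEYWORDS = ("faulty", "processor failed", "new membership",
            "retransmit", "fence", "stonith")

# Assumed syslog-style timestamp at the start of each line: "Mon DD HH:MM:SS".
TS_RE = re.compile(r"^(\w{3})\s+(\d{1,2})\s+(\d{2}:\d{2}:\d{2})")


def suspicious_lines(lines, center, window_minutes=5):
    """Return lines within +/- window_minutes of `center` that contain a keyword.

    `center` is a datetime; its year is borrowed because syslog-style
    timestamps carry no year. Lines without a parsable timestamp are skipped.
    """
    earliest = center - timedelta(minutes=window_minutes)
    latest = center + timedelta(minutes=window_minutes)
    hits = []
    for line in lines:
        match = TS_RE.match(line)
        if not match:
            continue
        month, day, clock = match.groups()
        try:
            stamp = datetime.strptime(
                f"{center.year} {month} {day} {clock}", "%Y %b %d %H:%M:%S"
            )
        except ValueError:
            # The first word was not a month abbreviation, or the date was invalid.
            continue
        lowered = line.lower()
        if earliest <= stamp <= latest and any(k in lowered for k in KEYWORDS):
            hits.append(line.rstrip("\n"))
    return hits
```

It would be called with the open log file and a datetime for 14:05 on the day of the failover, once per node, so the two filtered outputs can be compared side by side. Month abbreviations are parsed with the current locale, so this assumes an English or C locale.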

--
Eric Robinson


_______________________________________________
Users mailing list: [email protected]
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
