I've been working on a script for preventing split-brain in 2-node clusters and
I would appreciate comments from everyone. If someone already has a solution
like this, let me know!

Most of my database clusters are 2-node, with each node in a geographically
separate data center. Our layout looks like the following diagram. Each server
node has three physical connections to the world. LANs A, B, C, and D are all
physically separate cable plants and cross-connects between the data centers
(using different switches, routers, power, fiber paths, etc.). This is to
ensure maximum cluster communication intelligence. LANs A and B (Corosync ring
0) are bonded at the NICs, as are LANs C and D (Corosync ring 1).
Hopefully this diagram will come through intact...

                     +----------------+
                     |  Third party   |
                     |  Web Hosting   |
                     +---+--------+---+
                         |        |
                         |        |
                   XXXXXX+XXXXXXXX+XXXXXX
                 XX                      XX
      +---------X      The Interwebs       X---------+
      |          XX                      XX          |
      |            XXXXXXXXXXXXXXXXXXXXXX            |
      | Internet                                     | Internet
      |                                              |
      |                    LAN A                     |
      |   +--------------------------------------+   |
      |   |                LAN B                 |   |
      |   |   +------------------------------+   |   |
      |   |   |                              |   |   |
  +---+---+---+----+                    +----+---+---+---+
  |                |                    |                |
  |     Node 1     |                    |     Node 2     |
  |                |                    |                |
  +-----+---+------+                    +------+---+-----+
        |   |              LAN C               |   |
        |   +----------------------------------+   |
        |                  LAN D                   |
        +------------------------------------------+

Even with all that connectivity, it is possible that something could happen to
interrupt communication between the 2 data centers, or the connectivity between
1 of the data centers and the Internet, and split brain would result. I have been
working on a way to prevent this using a concept I call a "dead drop." This
idea takes its name from the spy world, where spies cannot communicate
directly, but they are able to pass simple information and status messages to
each other by using a blind drop in a previously agreed location. Spy X makes a
mark on a tree. Later, spy Y comes by and sees the mark, and knows that spy X
is okay. He leaves a mark of his own on the tree, and later spy X sees it and
knows that spy Y is okay. Neither spy owns the tree or the land it is on.
The same idea applies here. Suppose all direct TCP/IP connectivity were to be
severed between Nodes 1 and 2, but both of them are still able to reach the
Internet. Normally, split brain would result. But SUPPOSE they were both
running scripts that use curl requests to post and retrieve simple status
messages to and from a third party web host. In other words, even though the
nodes cannot talk to each other directly, they can still leave messages at a
dead drop location for each other to read. If Node 2 were in standby mode, it
would normally switch to primary. However, if it checks the dead drop and sees
a message from Node 1 that says, "I'm still okay and communicating with
customers," then Node 2 knows not to become cluster primary. This script could
possibly be implemented as a cluster resource, with most other resources
dependent on it.
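
To make the check described above concrete, here is a minimal Python sketch of
what each node might run. It is only an illustration under stated assumptions,
not a tested cluster resource: the URL, the "<timestamp> <status>" message
format, the 30-second staleness limit, the use of HTTP PUT, and all function
names are hypothetical. It also assumes the two nodes' clocks are roughly in
sync, and it chooses to stay in standby when the dead drop itself cannot be
reached, since the standby node may then be the isolated one.

```python
import time
import urllib.request

# Hypothetical dead drop URL and timing value; neither comes from the post.
DROP_URL = "https://deaddrop.example.com/cluster1"
MAX_AGE = 30  # seconds after which a peer message counts as stale


def post_status(node, status, url=DROP_URL, now=None):
    """Leave a one-line '<timestamp> <status>' message for the peer.

    Assumes the web host accepts HTTP PUT to a per-node text file.
    """
    stamp = int(now if now is not None else time.time())
    body = f"{stamp} {status}".encode()
    req = urllib.request.Request(f"{url}/{node}.txt", data=body, method="PUT")
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status


def read_status(node, url=DROP_URL):
    """Fetch a node's last message, or None if the drop is unreachable."""
    try:
        with urllib.request.urlopen(f"{url}/{node}.txt", timeout=5) as resp:
            return resp.read().decode().strip()
    except OSError:  # covers URLError, HTTPError, and socket timeouts
        return None


def may_promote(peer_message, now, max_age=MAX_AGE):
    """Return True only if the standby node may become primary."""
    if peer_message is None:
        return False  # drop unreachable: we may be the isolated node
    try:
        stamp, status = peer_message.split(maxsplit=1)
        age = now - int(stamp)
    except ValueError:
        return False  # unreadable message: stay in standby
    if status == "OK" and age <= max_age:
        return False  # peer is alive and still serving customers
    return True  # peer reported trouble or has gone silent
```

In this sketch the primary would call post_status("node1", "OK") on a timer,
and the standby would call may_promote(read_status("node1"), time.time())
before taking over any resources.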
The dead drop needs no intelligence other than the ability to read and write
simple text files, and it can run on any third-party web host (or on multiple
web sites). It does not fill the role of a quorum device or arbitrator. The 2
nodes themselves remain in control of their own failover decisions.
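
Because the drop only stores small text files, checking several independent
web sites is cheap. The sketch below shows one hypothetical way to do that:
read the same file from each site and keep the newest message. The
"<timestamp> <status>" line format and the fetch callable (any function that
takes a URL and returns the file's text, or None on failure) are assumptions
made for illustration, not part of any existing tool.

```python
def freshest_message(urls, fetch):
    """Return (timestamp, status) for the newest valid message, or None.

    urls  -- iterable of dead drop locations to poll
    fetch -- callable returning the text stored at a URL, or None on failure
    """
    best = None
    for url in urls:
        text = fetch(url)
        if not text:
            continue  # this drop is unreachable or empty; try the next one
        parts = text.strip().split(maxsplit=1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # ignore anything that is not '<timestamp> <status>'
        stamp = int(parts[0])
        if best is None or stamp > best[0]:
            best = (stamp, parts[1])
    return best


# Example with an in-memory stand-in for three web hosts (made-up names).
drops = {"site-a": "100 OK", "site-b": None, "site-c": "250 OK\n"}
newest = freshest_message(drops, drops.get)
```

Passing the fetch function in keeps the node in charge of the decision: the
web hosts only hand back whatever text was last written to them.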

I'm SURE this has been attempted already, and I don't want to re-invent the
wheel, but I have not seen this approach anywhere. Maybe there's a good reason
for that, and it simply won't work? The arbitration solutions I have seen all
rely on a third machine that plays a complex role in arbitration.
Thoughts?
--
Eric Robinson
_______________________________________________
Users mailing list: [email protected]
http://clusterlabs.org/mailman/listinfo/users
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org