On Mon, Mar 01, 2010 at 03:34:59PM -0500, Enrique Sanchez wrote:
> Whenever there is a split brain scenario, the node with the lowest
> number survive, I am sold on that and have no argument against it, but
> when Node0 crashes, Node1 also takes a nose dive, may I know why?

Two nodes is a special and difficult case. If node0 is still
heartbeating (even though node1 has lost contact with it over the
network), node1 thinks node0 is alive, and by the lowest-number rule
node1 resets. If node0 is not heartbeating (a full crash), node1 stays
alive. As long as node0 is heartbeating, node1 has no way to know that
node0 is in trouble.

If this case is a significant problem for you, just add a third node.
With three nodes, one side of a split always has a majority, and the
majority takes precedence over the lowest-number rule.

> What is the node with the lowest number? does it have to be Node0? or
> does it mean connectivity to the lowest surviving Node?

Here it specifically means surviving nodes, i.e. the nodes visible via
heartbeat. Any node that is not heartbeating is considered dead. So if
node0 is turned off and node1 is heartbeating, node1 is the lowest
surviving node.

> I setup a test scenario with 4 nodes, 2 nodes mounting the filesystems
> and 2 other nodes just participating as network members:

For the purposes of ocfs2, nodes that have not mounted the filesystem
are invisible. Only once they mount it and start heartbeating do they
take part in quorum.

> Node0 and Node1 have network connectivity and mount the filesystems
> Node3 and Node4 are alive & on the network.

For your scenario, you essentially have the two-node quorum described
above. Nodes 3 and 4 don't participate.

> During my test (take Node0 down cold turkey) Node1 hung pretty badly,
> is this something expected??

How did you take it down? Power off? Node1 should take around 90
seconds to notice (depending on your heartbeat timeout settings), and
then it should start recovery.

Joel

-- 

"Too much walking shoes worn thin.
 Too much trippin' and my soul's worn thin.
 Time to catch a ride it leaves today
 Her name is what it means.
 Too much walking shoes worn thin."

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.bec...@oracle.com
Phone: (650) 506-8127

_______________________________________________
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users
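
The quorum rule described in this reply can be modeled in a few lines. This is a simplified sketch, not the kernel's implementation: it assumes the decision reduces to comparing the set of heartbeating nodes (nodes that have the filesystem mounted) against the set this node can reach over the network. A strict majority survives; on an even split, the side holding the lowest-numbered heartbeating node survives. The function `survives` and its parameters are hypothetical names chosen for illustration.

```python
def survives(me, heartbeating, reachable):
    """Simplified, hypothetical model of the ocfs2 quorum rule (not kernel code).

    me           -- this node's number
    heartbeating -- set of node numbers heartbeating on disk (includes me)
    reachable    -- set of node numbers this node can reach over the network
                    (includes me)
    Returns True if this node stays up, False if it would fence (reset).
    """
    # Only nodes that are both heartbeating and reachable count on our side.
    visible = reachable & heartbeating
    total = len(heartbeating)

    if 2 * len(visible) > total:
        # Strict majority: it takes precedence over the lowest-number rule.
        return True
    if 2 * len(visible) == total:
        # Even split: the half containing the lowest heartbeating node survives.
        return min(heartbeating) in visible
    # Minority side: fence.
    return False
```

For example, `survives(1, {0, 1}, {1})` models node1 in the two-node case: node0 is still heartbeating but unreachable, so node1 loses the tie-break and resets. Once node0 stops heartbeating entirely, `survives(1, {1}, {1})` returns True. In the four-node test setup, nodes 3 and 4 never mount, so they never appear in `heartbeating` and the outcome is identical to the two-node case.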