On Mon, Mar 1, 2010 at 9:40 PM, Joel Becker <joel.bec...@oracle.com> wrote:
>        Two nodes is a special and difficult case.  If node0 is still
> heartbeating, node1 thinks it is alive; by the lowest number rule, node1
> resets.  If node0 is not heartbeating (a full crash), node1 will stay
> alive.  As long as node0 is heartbeating, there is no way for node1 to
> know that node0 is having trouble.

>        If this case presents a significant problem, just add a third
> node.  Once there are three nodes, you always have a majority, which
> takes precedence over the lowest number.

>> What is the node with the lowest number? does it have to be Node0? or
>> does it mean connectivity to the lowest surviving Node?
>        Here it is specifically talking about surviving nodes; these are
> the nodes visible via heartbeat.  Any node not heartbeating is
> considered dead.  So if node0 is turned off, and node1 is heartbeating,
> node1 is considered the lowest surviving node.

>> I setup a test scenario with 4 nodes, 2 nodes mounting the filesystems
>> and 2 other nodes just participating as network members:
>        For the purposes of ocfs2, nodes that are not mounted are
> invisible.  Only once they mount the filesystem and start heartbeating
> do they take part in quorum.
>        For your scenario, you essentially have a two-node quorum as
> described above.  Nodes 3&4 don't participate.

Then I believe the Quorum rules in the documentation/FAQ should be
updated with this info.
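
For anyone updating the FAQ, the rule as explained above could be sketched
roughly as follows. This is only my reading of the explanation in this
thread, not the actual ocfs2 quorum code. The function name has_quorum and
its arguments are made up for illustration, and I am assuming that the
lowest-number rule applies only when a node can reach exactly half of the
heartbeating nodes.

```python
def has_quorum(me, heartbeating, reachable):
    """Return True if node `me` should stay up, False if it should reset.

    Hypothetical helper, not part of ocfs2.
    heartbeating: node numbers seen heartbeating on the shared disk
    reachable:    node numbers `me` can still talk to over the network
    """
    # Only heartbeating (mounted) nodes count; a node always counts itself.
    alive = set(heartbeating) | {me}
    connected = (set(reachable) | {me}) & alive

    if 2 * len(connected) > len(alive):
        return True  # strict majority wins outright
    if 2 * len(connected) == len(alive):
        # Exact tie: the side holding the lowest surviving node wins.
        return min(alive) in connected
    return False


# Two nodes, network split, both still heartbeating: node1 resets.
print(has_quorum(0, {0, 1}, {0}))  # node0
print(has_quorum(1, {0, 1}, {1}))  # node1

# Two nodes, node0 powered off (no heartbeat): node1 stays alive.
print(has_quorum(1, {1}, {1}))

# Three nodes, node0 isolated but heartbeating: the majority (1 and 2) wins.
print(has_quorum(0, {0, 1, 2}, {0}))
print(has_quorum(1, {0, 1, 2}, {1, 2}))
```

In the four-node test, only the two nodes that mounted the filesystem would
appear in heartbeating, so it reduces to the two-node case.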

>> During my test (take Node0 down cold turkey)  Node1 hung pretty badly,
>> is this something expected??
>        What did you do to take it down?  Power off?  Node1 should take
> around 90 seconds to notice (depending on your heartbeat timeout
> settings), and then it should start recovery.

I flip the power off. In almost every test, Node1 crashes as well.

I don't understand why there are no plans to add a reference IP
address to determine who is on the network and who isn't. You do have a
point that adding a third node won't break the bank if we're already
using RAC/SAP, unless we're required to get a license for that node
anyway. Running a node in idle mode seems a little wasteful, but if
that solves the problem, good. I'll give it a shot today.


Ocfs2-users mailing list