Hi Daniel,

> Which node masters the $RECOVERY resource?  

As with the mastery of any lock resource, any or all nodes can race to master 
the $RECOVERY resource at the same time.  The mastery process for recovery 
differs in a few small ways, to ensure that deadlocks don't occur and to detect 
and handle node death.

> Where is that set?

Almost all of this is done in fs/ocfs2/dlm/dlmmaster.c and the eventual master 
is set in the same way as all other lock resources, using the assert_master 
message.
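To make that concrete, here is a minimal C sketch of a lock resource recording 
its master when an assert_master-style message arrives.  Everything in it 
(struct sketch_lockres, OWNER_UNKNOWN, sketch_handle_assert_master) is 
hypothetical and far simpler than the real structures in dlmmaster.c; it only 
models the idea that the owner is taken from whichever node asserts mastery.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical sentinel for "master not yet known"; not the kernel's constant. */
#define OWNER_UNKNOWN (-1)

/* Hypothetical, stripped-down lock resource. */
struct sketch_lockres {
    char name[32];
    int owner; /* node number of the master, or OWNER_UNKNOWN */
};

static void sketch_lockres_init(struct sketch_lockres *res, const char *name)
{
    strncpy(res->name, name, sizeof(res->name) - 1);
    res->name[sizeof(res->name) - 1] = '\0';
    res->owner = OWNER_UNKNOWN;
}

/* Handle an assert_master-style message from `from_node`.
 * Returns true if the owner was recorded (or already matched), and false if
 * a different master was already known, which this sketch only reports. */
static bool sketch_handle_assert_master(struct sketch_lockres *res, int from_node)
{
    if (res->owner != OWNER_UNKNOWN && res->owner != from_node)
        return false;
    res->owner = from_node;
    return true;
}
```

The same handler is used no matter which lockres it is, which is the point: 
$RECOVERY gets its master recorded exactly like any other lock resource.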

> What happens when that node dies?

As soon as a node is seen as dead (via the heartbeat callback), cleanup occurs 
on all of the locks contained within lock resources that node mastered.  This 
includes the $RECOVERY lockres, though there is a special case in place to 
ensure that the $RECOVERY lockres is re-mastered at that point instead of being 
recovered.  Once it is remastered with the new cluster membership, it continues 
as normal.
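A rough C sketch of that node-down pass, assuming a flat array of lock resources 
and a simple state enum.  All names here are hypothetical, not the dlm's actual 
fields or functions.  It only shows the special case: a lockres mastered by the 
dead node is queued for recovery, except $RECOVERY, whose owner is cleared so it 
gets mastered again under the new membership.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sentinel for "no known master". */
#define OWNER_UNKNOWN (-1)

enum sketch_state { RES_NORMAL, RES_NEEDS_RECOVERY, RES_NEEDS_REMASTER };

/* Hypothetical, stripped-down lock resource. */
struct sketch_res {
    const char *name;
    int owner;
    enum sketch_state state;
};

/* Called when heartbeat reports `dead_node` as dead.
 * Returns how many lock resources were mastered by the dead node. */
static size_t sketch_node_down(struct sketch_res *res, size_t n, int dead_node)
{
    size_t affected = 0;

    for (size_t i = 0; i < n; i++) {
        if (res[i].owner != dead_node)
            continue;
        affected++;
        if (strcmp(res[i].name, "$RECOVERY") == 0) {
            /* Special case: forget the dead master so $RECOVERY is
             * mastered afresh instead of being recovered. */
            res[i].owner = OWNER_UNKNOWN;
            res[i].state = RES_NEEDS_REMASTER;
        } else {
            /* Ordinary lockres: its locks go through recovery. */
            res[i].state = RES_NEEDS_RECOVERY;
        }
    }
    return affected;
}
```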

> Why can dlm_pick_recovery_master
> get the EX on $RECOVERY and still not be the recovery master?

The EX lock on the $RECOVERY lockres is only used to protect the begin_reco 
message (the message which tells other nodes which node to recover and which 
will be the new master).  After that message is sent to all living nodes, the 
EX is dropped.  If a node has been waiting on the EX and does get it, it checks 
to see if the begin_reco has been sent while it was waiting.  If so, it backs 
off and lets the recovery master continue.
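Here is a single-process C sketch of that check, assuming the EX lock can be 
modeled as a holder field and the begin_reco broadcast as a write to shared 
state.  The names (struct sketch_reco, sketch_pick_recovery_master and friends) 
are made up for illustration; this is not the actual dlm_pick_recovery_master.  
The calls are sequential, so a second node calling it stands in for a node that 
was blocked on the EX until the first one dropped it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sentinel for "no node". */
#define NODE_NONE (-1)

/* Hypothetical model of the cluster-wide recovery state. */
struct sketch_reco {
    int ex_holder;   /* node holding EX on $RECOVERY, or NODE_NONE */
    int reco_master; /* set once begin_reco has been delivered */
    int dead_node;   /* node being recovered */
};

static void sketch_ex_lock(struct sketch_reco *r, int node)
{
    assert(r->ex_holder == NODE_NONE); /* EX is exclusive */
    r->ex_holder = node;
}

static void sketch_ex_unlock(struct sketch_reco *r, int node)
{
    assert(r->ex_holder == node);
    r->ex_holder = NODE_NONE;
}

/* Stand-in for sending begin_reco to all living nodes. */
static void sketch_send_begin_reco(struct sketch_reco *r, int master, int dead)
{
    r->reco_master = master;
    r->dead_node = dead;
}

/* Returns true if `self` becomes the recovery master for `dead`. */
static bool sketch_pick_recovery_master(struct sketch_reco *r, int self, int dead)
{
    bool won;

    sketch_ex_lock(r, self);
    if (r->reco_master != NODE_NONE) {
        /* begin_reco was sent while we waited for the EX: back off. */
        won = false;
    } else {
        sketch_send_begin_reco(r, self, dead);
        won = true;
    }
    /* The EX only protects begin_reco, so drop it right away. */
    sketch_ex_unlock(r, self);
    return won;
}
```

If node 1 and then node 2 call this for dead node 4, node 1 wins and sends 
begin_reco; node 2 still gets the EX, but sees reco_master already set and 
backs off, which is exactly the "got the EX but isn't the master" case.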

One note on all of this: this is NOT how we would like to do recovery going 
forward.  We just did not have a solid cluster membership service in place that 
we could use when the mastery/recovery code was written.  Once we do have a 
stable mechanism and API (stop/start/finish) to depend upon, I would like to 
rewrite the whole thing for lock-table-based mastery and much more sensible 
recovery.  As it stands, it's a brittle structure that has to continually try 
to detect node failures inline and make adjustments as recovery is ongoing, 
which is no fun.

Thanks!
-kurt


_______________________________________________
Ocfs2-devel mailing list
[email protected]
http://oss.oracle.com/mailman/listinfo/ocfs2-devel
