On 2013/5/19 10:25, Joseph Qi wrote:
> On 2013/5/18 21:26, Sunil Mushran wrote:
>> The first node that gets the lock will do the actual recovery. The others
>> will get the lock, see a clean journal, and skip the recovery. A thread
>> should never error out if it fails to get the lock. It should try and try
>> again.
>>
>> On May 17, 2013, at 11:27 PM, Joseph Qi <[email protected]> wrote:
>>
>>> Hi,
>>> Once a node goes down in the cluster, ocfs2_recovery_thread is
>>> triggered on each node. These threads then recover the down node after
>>> taking the super lock.
>>> I have several questions on this:
>>> 1) Why does each node have to run such a thread? We know that in the end
>>> only one node gets the super lock and does the actual recovery.
>>> 2) If this thread is running but an error occurs, for example taking
>>> ocfs2_super_lock fails, the thread will exit without clearing the
>>> recovery map. Will that cause other threads to keep waiting for
>>> recovery in ocfs2_wait_for_recovery?
>>>
>>
>>
> But when an error occurs and the thread goes to bail, the restart logic
> will not run. Code like below:
> ...
>     status = ocfs2_wait_on_mount(osb);
>     if (status < 0) {
>         goto bail;
>     }
>
>     rm_quota = kzalloc(osb->max_slots * sizeof(int), GFP_NOFS);
>     if (!rm_quota) {
>         status = -ENOMEM;
>         goto bail;
>     }
> restart:
>     status = ocfs2_super_lock(osb, 1);
>     if (status < 0) {
>         mlog_errno(status);
>         goto bail;
>     }
> ...
>     if (!status && !ocfs2_recovery_completed(osb)) {
>         mutex_unlock(&osb->recovery_lock);
>         goto restart;
>     }
>
>
> _______________________________________________
> Ocfs2-devel mailing list
> [email protected]
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
>
>

One more question: do we make sure dlm_recovery_thread always runs prior to
ocfs2_recovery_thread?
