On 2013/5/18 21:26, Sunil Mushran wrote:
> The first node that gets the lock will do the actual recovery. The others 
> will get the lock and see a clean journal and skip the recovery. A thread 
> should never error out if it fails to get the lock. It should try and try 
> again.
> 
> On May 17, 2013, at 11:27 PM, Joseph Qi <[email protected]> wrote:
> 
>> Hi,
>> Once a node goes down in the cluster, ocfs2_recovery_thread will be
>> triggered on each node. These threads then recover the down node by
>> taking the super lock.
>> I have several questions on this:
>> 1) Why does each node have to run such a thread? We know that, in the
>> end, one node gets the super lock and does the actual recovery.
>> 2) If this thread is running but an error occurs, for example taking
>> ocfs2_super_lock fails, the thread will exit without clearing the
>> recovery map. Will that leave other threads still waiting for
>> recovery in ocfs2_wait_for_recovery?
>>
> 
> 
But when an error occurs, the code goes to bail and the restart logic
does not run. The code looks like this:
...
        status = ocfs2_wait_on_mount(osb);
        if (status < 0) {
                goto bail;
        }

        rm_quota = kzalloc(osb->max_slots * sizeof(int), GFP_NOFS);
        if (!rm_quota) {
                status = -ENOMEM;
                goto bail;
        }
restart:
        status = ocfs2_super_lock(osb, 1);
        if (status < 0) {
                mlog_errno(status);
                goto bail;
        }
...
        if (!status && !ocfs2_recovery_completed(osb)) {
                mutex_unlock(&osb->recovery_lock);
                goto restart;
        }
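
To make the concern concrete, here is a minimal user-space sketch. It is
not the ocfs2 code: struct fake_osb, fake_super_lock() and both
recover_*() functions are made-up names for illustration, and the lock is
assumed to fail a fixed number of times before it succeeds. The first
variant bails on a lock failure and leaves the recovery map untouched,
the second keeps retrying (bounded here so the sketch terminates).

```c
#include <assert.h>

/* Hypothetical stand-in for the superblock state; NOT the ocfs2 API. */
struct fake_osb {
    int lock_failures_left;   /* times the mock lock fails before succeeding */
    int recovery_map_entries; /* dead nodes still marked as needing recovery */
};

/* Mock super lock: fails lock_failures_left times, then succeeds. */
static int fake_super_lock(struct fake_osb *osb)
{
    if (osb->lock_failures_left > 0) {
        osb->lock_failures_left--;
        return -1;
    }
    return 0;
}

/* Mirrors the quoted flow: a lock failure goes straight to bail. */
static int recover_bail_on_error(struct fake_osb *osb)
{
    int status = fake_super_lock(osb);

    if (status < 0)
        return status;             /* bail: recovery map is left as-is */

    osb->recovery_map_entries = 0; /* recovery done, map cleared */
    return 0;
}

/* Alternative: retry the lock instead of bailing on the first failure. */
static int recover_retry_on_error(struct fake_osb *osb, int max_tries)
{
    assert(max_tries > 0);

    for (int i = 0; i < max_tries; i++) {
        if (fake_super_lock(osb) == 0) {
            osb->recovery_map_entries = 0;
            return 0;
        }
    }
    return -1; /* gave up; the map is still not cleared */
}
```

With one simulated lock failure, recover_bail_on_error() returns an error
and the map still has its entry, which is the case where waiters could
stay blocked. recover_retry_on_error() clears the map once the lock
finally succeeds.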


_______________________________________________
Ocfs2-devel mailing list
[email protected]
https://oss.oracle.com/mailman/listinfo/ocfs2-devel
