Hi Junxiao,

On 2014/11/10 9:52, Junxiao Bi wrote:
> On 11/07/2014 05:27 PM, Xue jiufei wrote:
>> On 2014/11/7 15:03, Junxiao Bi wrote:
>>> On 11/07/2014 02:36 PM, Xue jiufei wrote:
>>>> Hi Junxiao,
>>>>
>>>> On 2014/11/7 13:54, Junxiao Bi wrote:
>>>>> Hi Jiufei,
>>>>>
>>>>> On 11/07/2014 11:41 AM, Xue jiufei wrote:
>>>>>> Function ocfs2_readpages() uses the nonblocking flag to avoid page lock
>>>>>> inversion. It can trigger a cluster hang because the flag
>>>>>> OCFS2_LOCK_UPCONVERT_FINISHING is not cleared if the nonblocking lock
>>>>>> cannot be granted at once. That flag prevents the dc thread from
>>>>>> downconverting, so other nodes can never acquire this lockres.
>>>>>>
>>>>>> So we should not set OCFS2_LOCK_UPCONVERT_FINISHING when receiving the
>>>>>> ast if the nonblocking lock has already returned.
>>>>>>
>>>>>
>>>>> Using OCFS2_LOCK_NONBLOCK_FINISHED can't fix this issue. Suppose two
>>>>> processes do nonblocking locks in the following order.
>>>> I am wondering how these two processes could lock in such an order.
>>>
>>> Yes, you are right. The LOCK_BUSY flag stops this race, so your fix works.
>>> But as I mentioned, I think a nonblocking lock need not do any cluster-wide
>>> locking at all: the AST is asynchronous, so by the time it arrives the
>>> nonblocking lock has usually already returned failure.
>> We added some debug information to dlm_proxy_ast_handler() and
>> dlm_send_remote_lock_request() and ran the race testcase. The result shows
>> that the probability of the AST message arriving earlier than the returned
>> status of the lock request is about 1/4.
> That may depend on how your race testcase is made.
>>
>> In addition, if a nonblocking lock does not do a cluster-wide lock,
>> ocfs2_readpages() will never succeed,
> Jiufei, can you explain why?
If node A holds an EX lock and node B calls ocfs2_readpages() without doing a
cluster-wide lock, the master will never notify node A to downconvert. So no
matter how many times node B calls ocfs2_readpages(), it will never succeed.
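For context, the nonblocking request under discussion originates in
ocfs2_readpages(). A condensed sketch of that call site, paraphrased from
fs/ocfs2/aops.c of this era, with error handling and readahead details
trimmed:

/* Condensed sketch of ocfs2_readpages(); not a verbatim copy. */
static int ocfs2_readpages(struct file *filp, struct address_space *mapping,
			   struct list_head *pages, unsigned nr_pages)
{
	struct inode *inode = mapping->host;
	int ret;

	/*
	 * Take the inode cluster lock with OCFS2_LOCK_NONBLOCK to avoid
	 * page lock inversion; if the DLM cannot grant it at once the
	 * caller gets -EAGAIN and skips the readahead rather than retry.
	 */
	ret = ocfs2_inode_lock_full(inode, NULL, 0, OCFS2_LOCK_NONBLOCK);
	if (ret)
		return ret;

	/* ... issue the reads for the page list under the lock ... */

	ocfs2_inode_unlock(inode, 0);
	return 0;
}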
> Thanks,
> Junxiao.
>
>> which is not what we want.
>>
>> Thanks,
>> Jiufei
>>>
>>> Thanks,
>>> Junxiao.
>>>>> 1. Process A issues a nonblocking lock, fails, and sets
>>>>>    OCFS2_LOCK_NONBLOCK_FINISHED.
>>>> We set the OCFS2_LOCK_BUSY flag before process A calls ocfs2_dlm_lock(),
>>>>> 2. Process B issues a nonblocking lock, fails, and sets
>>>>>    OCFS2_LOCK_NONBLOCK_FINISHED again.
>>>> so process B must wait until the ast for process A returns and clears
>>>> OCFS2_LOCK_BUSY.
>>>>
>>>>> 3. The ast for process A arrives and clears OCFS2_LOCK_NONBLOCK_FINISHED.
>>>>> 4. The ast for process B arrives and sets OCFS2_LOCK_UPCONVERT_FINISHING.
>>>>>
>>>>> So downconvert can still be blocked.
>>>>>
>>>>> To fix this issue, I think there is no need to call ocfs2_dlm_lock() for
>>>>> a nonblocking lock; checking the local lockres->l_level is enough. If
>>>>> the requested level is equal to or lower than it, the lock succeeds,
>>>>> otherwise it fails.
>>>>>
>>>>> Thanks,
>>>>> Junxiao.
>>>>>
>>>> Thanks,
>>>> Jiufei.
>>>>
>>>>>> Signed-off-by: joyce.xue <[email protected]>
>>>>>> ---
>>>>>>  fs/ocfs2/dlmglue.c | 37 +++++++++++++++++++++++++++++++------
>>>>>>  fs/ocfs2/ocfs2.h   |  6 ++++++
>>>>>>  2 files changed, 37 insertions(+), 6 deletions(-)
>>>>>>
>>>>>> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
>>>>>> index 21262f2..1ddb6f1 100644
>>>>>> --- a/fs/ocfs2/dlmglue.c
>>>>>> +++ b/fs/ocfs2/dlmglue.c
>>>>>> @@ -861,8 +861,13 @@ static inline void ocfs2_generic_handle_convert_action(struct ocfs2_lock_res *lo
>>>>>>  	 * We set the OCFS2_LOCK_UPCONVERT_FINISHING flag before clearing
>>>>>>  	 * the OCFS2_LOCK_BUSY flag to prevent the dc thread from
>>>>>>  	 * downconverting the lock before the upconvert has fully completed.
>>>>>> +	 * Do not prevent the dc thread from downconverting if a NONBLOCK lock
>>>>>> +	 * has already returned.
>>>>>>  	 */
>>>>>> -	lockres_or_flags(lockres, OCFS2_LOCK_UPCONVERT_FINISHING);
>>>>>> +	if (!(lockres->l_flags & OCFS2_LOCK_NONBLOCK_FINISHED))
>>>>>> +		lockres_or_flags(lockres, OCFS2_LOCK_UPCONVERT_FINISHING);
>>>>>> +	else
>>>>>> +		lockres_clear_flags(lockres, OCFS2_LOCK_NONBLOCK_FINISHED);
>>>>>>  
>>>>>>  	lockres_clear_flags(lockres, OCFS2_LOCK_BUSY);
>>>>>>  }
>>>>>> @@ -1324,13 +1329,12 @@ static void lockres_add_mask_waiter(struct ocfs2_lock_res *lockres,
>>>>>>  
>>>>>>  /* returns 0 if the mw that was removed was already satisfied, -EBUSY
>>>>>>   * if the mask still hadn't reached its goal */
>>>>>> -static int lockres_remove_mask_waiter(struct ocfs2_lock_res *lockres,
>>>>>> +static int __lockres_remove_mask_waiter(struct ocfs2_lock_res *lockres,
>>>>>>  				      struct ocfs2_mask_waiter *mw)
>>>>>>  {
>>>>>> -	unsigned long flags;
>>>>>>  	int ret = 0;
>>>>>>  
>>>>>> -	spin_lock_irqsave(&lockres->l_lock, flags);
>>>>>> +	assert_spin_locked(&lockres->l_lock);
>>>>>>  	if (!list_empty(&mw->mw_item)) {
>>>>>>  		if ((lockres->l_flags & mw->mw_mask) != mw->mw_goal)
>>>>>>  			ret = -EBUSY;
>>>>>> @@ -1338,6 +1342,18 @@ static int lockres_remove_mask_waiter(struct ocfs2_lock_res *lockres,
>>>>>>  		list_del_init(&mw->mw_item);
>>>>>>  		init_completion(&mw->mw_complete);
>>>>>>  	}
>>>>>> +
>>>>>> +	return ret;
>>>>>> +}
>>>>>> +
>>>>>> +static int lockres_remove_mask_waiter(struct ocfs2_lock_res *lockres,
>>>>>> +				      struct ocfs2_mask_waiter *mw)
>>>>>> +{
>>>>>> +	unsigned long flags;
>>>>>> +	int ret = 0;
>>>>>> +
>>>>>> +	spin_lock_irqsave(&lockres->l_lock, flags);
>>>>>> +	ret = __lockres_remove_mask_waiter(lockres, mw);
>>>>>>  	spin_unlock_irqrestore(&lockres->l_lock, flags);
>>>>>>  
>>>>>>  	return ret;
>>>>>> @@ -1373,6 +1389,7 @@ static int __ocfs2_cluster_lock(struct ocfs2_super *osb,
>>>>>>  	unsigned long flags;
>>>>>>  	unsigned int gen;
>>>>>>  	int noqueue_attempted = 0;
>>>>>> +	int dlm_locked = 0;
>>>>>>  
>>>>>>  	ocfs2_init_mask_waiter(&mw);
>>>>>>  
>>>>>> @@ -1481,6 +1498,7 @@ again:
>>>>>>  			ocfs2_recover_from_dlm_error(lockres, 1);
>>>>>>  			goto out;
>>>>>>  		}
>>>>>> +		dlm_locked = 1;
>>>>>>  
>>>>>>  		mlog(0, "lock %s, successful return from ocfs2_dlm_lock\n",
>>>>>>  		     lockres->l_name);
>>>>>> @@ -1514,10 +1532,17 @@ out:
>>>>>>  	if (wait && arg_flags & OCFS2_LOCK_NONBLOCK &&
>>>>>>  	    mw.mw_mask & (OCFS2_LOCK_BUSY|OCFS2_LOCK_BLOCKED)) {
>>>>>>  		wait = 0;
>>>>>> -		if (lockres_remove_mask_waiter(lockres, &mw))
>>>>>> +		spin_lock_irqsave(&lockres->l_lock, flags);
>>>>>> +		if (__lockres_remove_mask_waiter(lockres, &mw)) {
>>>>>> +			if (dlm_locked)
>>>>>> +				lockres_or_flags(lockres,
>>>>>> +						OCFS2_LOCK_NONBLOCK_FINISHED);
>>>>>> +			spin_unlock_irqrestore(&lockres->l_lock, flags);
>>>>>>  			ret = -EAGAIN;
>>>>>> -		else
>>>>>> +		} else {
>>>>>> +			spin_unlock_irqrestore(&lockres->l_lock, flags);
>>>>>>  			goto again;
>>>>>> +		}
>>>>>>  	}
>>>>>>  	if (wait) {
>>>>>>  		ret = ocfs2_wait_for_mask(&mw);
>>>>>> diff --git a/fs/ocfs2/ocfs2.h b/fs/ocfs2/ocfs2.h
>>>>>> index bbec539..7d6b7d0 100644
>>>>>> --- a/fs/ocfs2/ocfs2.h
>>>>>> +++ b/fs/ocfs2/ocfs2.h
>>>>>> @@ -144,6 +144,12 @@ enum ocfs2_unlock_action {
>>>>>>  						     * before the upconvert
>>>>>>  						     * has completed */
>>>>>>  
>>>>>> +#define OCFS2_LOCK_NONBLOCK_FINISHED (0x00001000) /* NONBLOCK cluster
>>>>>> +						   * lock has already
>>>>>> +						   * returned, do not block
>>>>>> +						   * dc thread from
>>>>>> +						   * downconverting */
>>>>>> +
>>>>>>  struct ocfs2_lock_res_ops;
>>>>>>  
>>>>>>  typedef void (*ocfs2_lock_callback)(int status, unsigned long data);
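To make the failure mode concrete, here is a small userspace toy model. It is
not kernel code; the names mirror dlmglue.c but the logic is heavily
simplified. It shows how an unanswered nonblocking request used to leave
OCFS2_LOCK_UPCONVERT_FINISHING set forever, and how the patch's
NONBLOCK_FINISHED flag releases the dc thread:

#include <stdio.h>

#define LOCK_BUSY                0x01
#define LOCK_UPCONVERT_FINISHING 0x02
#define LOCK_NONBLOCK_FINISHED   0x04

static unsigned int flags;

/* AST arrival before the patch: always marks the upconvert as finishing.
 * (The NONBLOCK_FINISHED bit did not exist then; it is simply ignored.) */
static void ast_before_patch(void)
{
	flags |= LOCK_UPCONVERT_FINISHING;
	flags &= ~LOCK_BUSY;
}

/* AST arrival with the patch: if the nonblocking caller already gave up
 * (NONBLOCK_FINISHED set), skip UPCONVERT_FINISHING entirely. */
static void ast_after_patch(void)
{
	if (!(flags & LOCK_NONBLOCK_FINISHED))
		flags |= LOCK_UPCONVERT_FINISHING;
	else
		flags &= ~LOCK_NONBLOCK_FINISHED;
	flags &= ~LOCK_BUSY;
}

/* The dc thread refuses to downconvert while an upconvert appears to be
 * finishing; before the patch no waiter is left to clear the flag, so
 * other nodes wait on this lockres forever. */
static int dc_thread_may_downconvert(void)
{
	return !(flags & LOCK_UPCONVERT_FINISHING);
}

int main(void)
{
	/* Nonblocking lock issued and returned -EAGAIN before the AST. */
	flags = LOCK_BUSY | LOCK_NONBLOCK_FINISHED;
	ast_before_patch();
	printf("before patch: dc thread may downconvert: %d\n",
	       dc_thread_may_downconvert());	/* 0 -> cluster hang */

	flags = LOCK_BUSY | LOCK_NONBLOCK_FINISHED;
	ast_after_patch();
	printf("after patch:  dc thread may downconvert: %d\n",
	       dc_thread_may_downconvert());	/* 1 -> downconvert proceeds */
	return 0;
}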
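For reference, the alternative Junxiao floated above (skip ocfs2_dlm_lock()
for nonblocking requests and test only the locally cached level) would look
roughly like the sketch below. The helper is hypothetical, not from any
posted patch; as Jiufei points out, it never sends a request to the master,
so a node holding EX is never asked to downconvert and the nonblocking
caller can starve indefinitely.

/* Hypothetical helper illustrating the local-check-only idea; the numeric
 * comparison matches how dlmglue.c compares DLM levels (NL < PR < EX). */
static int ocfs2_try_nonblock_local(struct ocfs2_lock_res *lockres, int level)
{
	unsigned long flags;
	int ret = -EAGAIN;

	spin_lock_irqsave(&lockres->l_lock, flags);
	/* Succeed only if no transition is in flight and the level we
	 * already hold covers the requested one ("equal or lower"). */
	if (!(lockres->l_flags & (OCFS2_LOCK_BUSY | OCFS2_LOCK_BLOCKED)) &&
	    level <= lockres->l_level)
		ret = 0;
	spin_unlock_irqrestore(&lockres->l_lock, flags);

	return ret;
}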
