On 11/11/2014 09:41 AM, Xue jiufei wrote:
> Hi JunXiao,
> On 2014/11/10 9:52, Junxiao Bi wrote:
>> On 11/07/2014 05:27 PM, Xue jiufei wrote:
>>> On 2014/11/7 15:03, Junxiao Bi wrote:
>>>> On 11/07/2014 02:36 PM, Xue jiufei wrote:
>>>>> Hi Junxiao
>>>>>
>>>>> On 2014/11/7 13:54, Junxiao Bi wrote:
>>>>>> Hi Jiufei,
>>>>>>
>>>>>> On 11/07/2014 11:41 AM, Xue jiufei wrote:
>>>>>>> Function ocfs2_readpages() uses the nonblocking flag to avoid page lock
>>>>>>> inversion. It can trigger a cluster hang because the flag
>>>>>>> OCFS2_LOCK_UPCONVERT_FINISHING is not cleared if the nonblocking lock
>>>>>>> cannot be granted at once. The flag prevents the dc thread from
>>>>>>> downconverting, so other nodes can never acquire this lockres.
>>>>>>>
>>>>>>> So we should not set OCFS2_LOCK_UPCONVERT_FINISHING when receiving the
>>>>>>> ast if the nonblocking lock has already returned.
>>>>>>>
>>>>>>
>>>>>> Using OCFS2_LOCK_NONBLOCK_FINISHED can't fix this issue. Suppose two
>>>>>> processes do a non-blocking lock in the following order.
>>>>> I am wondering how these two processes could lock in such an order.
>>>>
>>>> Yes, you are right. The LOCK_BUSY flag stops this race. Your fix works.
>>>> But as I mentioned, I think a non-blocking lock does not need to do
>>>> cluster-wide locking at all: the AST is asynchronous, so most of the time
>>>> the non-blocking lock has already returned failure before the AST arrives.
>>> We added some debug information to dlm_proxy_ast_handler()
>>> and dlm_send_remote_lock_request() and ran the race testcase before.
>>> The results show that the probability of the AST message arriving earlier
>>> than the returned status of the lock call is about 1/4.
>> That may depend on how your race testcase is made.
>>>
>>> In addition, if a non-blocking lock does not do a cluster-wide lock,
>>> ocfs2_readpages() will never succeed,
>> Jiufei, can you explain why?
>>
> If node A holds an EX lock and node B calls ocfs2_readpages(), which would
> not do a cluster-wide lock, the master will not notify node A to
> downconvert. So no matter how many times node B calls ocfs2_readpages(), it
> will not succeed.

Not sure whether this will happen. As I see it, if there are readahead ops on
this file, there should also be read ops, which can trigger the downconvert.
Anyway, I think we'd better not depend on this.
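To make the reported hang concrete before going further, here is a minimal
user-space model of the sequence the commit message describes: a non-blocking
attempt returns -EAGAIN, the AST lands afterwards and sets
OCFS2_LOCK_UPCONVERT_FINISHING, and nothing ever clears it. Only the two flag
names are taken from fs/ocfs2/ocfs2.h; the helpers and the single flag word
are simplified stand-ins, not the kernel code.

#include <errno.h>
#include <stdio.h>

/* Illustrative flag bits; the real definitions live in fs/ocfs2/ocfs2.h. */
#define OCFS2_LOCK_BUSY			0x001
#define OCFS2_LOCK_UPCONVERT_FINISHING	0x400

static unsigned long l_flags;

/* Non-blocking attempt: the caller gives up immediately while the
 * lockres is busy, but the DLM request was already queued, so an AST
 * will still arrive later. */
static int nonblock_lock_attempt(void)
{
	l_flags |= OCFS2_LOCK_BUSY;
	return -EAGAIN;		/* ocfs2_readpages() just skips readahead */
}

/* AST handling before the fix: UPCONVERT_FINISHING is set
 * unconditionally, although the caller that would have completed the
 * upconvert (and cleared the flag) is long gone. */
static void ast_arrives(void)
{
	l_flags |= OCFS2_LOCK_UPCONVERT_FINISHING;
	l_flags &= ~OCFS2_LOCK_BUSY;
}

/* The dc thread refuses to downconvert while an upconvert is
 * "finishing", so remote nodes wait on this lockres forever. */
static int dc_thread_may_downconvert(void)
{
	return !(l_flags & OCFS2_LOCK_UPCONVERT_FINISHING);
}

int main(void)
{
	nonblock_lock_attempt();	/* returns -EAGAIN to the caller */
	ast_arrives();			/* lands after the caller gave up */
	printf("downconvert allowed: %d\n", dc_thread_may_downconvert());
	return 0;
}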
Reviewed-by: Junxiao Bi <[email protected]>

>
>> Thanks,
>> Junxiao.
>>
>>> which is not what we wanted.
>>>
>>> Thanks,
>>> Jiufei
>>>>
>>>> Thanks,
>>>> Junxiao.
>>>>>> 1. Process A issues a non-blocking lock that fails and sets
>>>>>> OCFS2_LOCK_NONBLOCK_FINISHED.
>>>>> We set the OCFS2_LOCK_BUSY flag before process A calls ocfs2_dlm_lock(),
>>>>>> 2. Process B issues a non-blocking lock that fails and sets
>>>>>> OCFS2_LOCK_NONBLOCK_FINISHED again.
>>>>> so process B must wait until the ast for process A returns and clears
>>>>> OCFS2_LOCK_BUSY.
>>>>>
>>>>>> 3. The ast for process A arrives and clears OCFS2_LOCK_NONBLOCK_FINISHED.
>>>>>> 4. The ast for process B arrives and sets OCFS2_LOCK_UPCONVERT_FINISHING.
>>>>>>
>>>>>> So downconvert can still be blocked.
>>>>>>
>>>>>> To fix this issue, I think there is no need to call ocfs2_dlm_lock() for
>>>>>> a non-blocking lock; checking the local lockres->l_level is enough. If
>>>>>> the requested level is equal to or smaller than it, the lock succeeds;
>>>>>> otherwise it fails.
>>>>>>
>>>>>> Thanks,
>>>>>> Junxiao.
>>>>>>
>>>>> Thanks,
>>>>> jiufei.
>>>>>
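Before the quoted patch itself, here is a rough sketch of the alternative
Junxiao proposes above: serve OCFS2_LOCK_NONBLOCK requests from the cached
local level and never call ocfs2_dlm_lock(). The struct, helper, and level
constants are hypothetical stand-ins chosen only to show the shape of the
check; the comments also record Jiufei's objection about the missing
downconvert.

#include <errno.h>
#include <stdio.h>

/* Hypothetical, heavily reduced lockres: just enough to show the shape
 * of the check. The real struct ocfs2_lock_res is far richer. */
struct toy_lock_res {
	int l_level;		/* level currently granted to this node */
};

/* Illustrative lock levels, ordered so that a larger value is a
 * stronger mode (NL < PR < EX). */
enum { TOY_LM_NL = 0, TOY_LM_PR = 3, TOY_LM_EX = 5 };

/* Hypothetical non-blocking attempt that never goes to the DLM: grant
 * only when the cached level already covers the request. Note Jiufei's
 * objection: no cluster message is ever sent, so the master never asks
 * the EX holder on the other node to downconvert, and the request
 * below can fail forever. */
static int toy_nonblock_lock(struct toy_lock_res *res, int wanted)
{
	if (res->l_level >= wanted)
		return 0;	/* compatible level already held locally */
	return -EAGAIN;		/* would need a cluster-wide upconvert */
}

int main(void)
{
	struct toy_lock_res res = { .l_level = TOY_LM_NL };

	/* Node B's readahead path would keep getting -EAGAIN (-11). */
	printf("PR request: %d\n", toy_nonblock_lock(&res, TOY_LM_PR));
	return 0;
}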
>>>>>>> Signed-off-by: joyce.xue <[email protected]>
>>>>>>> ---
>>>>>>>  fs/ocfs2/dlmglue.c | 37 +++++++++++++++++++++++++++++++------
>>>>>>>  fs/ocfs2/ocfs2.h   |  6 ++++++
>>>>>>>  2 files changed, 37 insertions(+), 6 deletions(-)
>>>>>>>
>>>>>>> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
>>>>>>> index 21262f2..1ddb6f1 100644
>>>>>>> --- a/fs/ocfs2/dlmglue.c
>>>>>>> +++ b/fs/ocfs2/dlmglue.c
>>>>>>> @@ -861,8 +861,13 @@ static inline void ocfs2_generic_handle_convert_action(struct ocfs2_lock_res *lo
>>>>>>>  	 * We set the OCFS2_LOCK_UPCONVERT_FINISHING flag before clearing
>>>>>>>  	 * the OCFS2_LOCK_BUSY flag to prevent the dc thread from
>>>>>>>  	 * downconverting the lock before the upconvert has fully completed.
>>>>>>> +	 * Do not prevent the dc thread from downconverting if NONBLOCK lock
>>>>>>> +	 * had already returned.
>>>>>>>  	 */
>>>>>>> -	lockres_or_flags(lockres, OCFS2_LOCK_UPCONVERT_FINISHING);
>>>>>>> +	if (!(lockres->l_flags & OCFS2_LOCK_NONBLOCK_FINISHED))
>>>>>>> +		lockres_or_flags(lockres, OCFS2_LOCK_UPCONVERT_FINISHING);
>>>>>>> +	else
>>>>>>> +		lockres_clear_flags(lockres, OCFS2_LOCK_NONBLOCK_FINISHED);
>>>>>>>  
>>>>>>>  	lockres_clear_flags(lockres, OCFS2_LOCK_BUSY);
>>>>>>>  }
>>>>>>> @@ -1324,13 +1329,12 @@ static void lockres_add_mask_waiter(struct ocfs2_lock_res *lockres,
>>>>>>>  
>>>>>>>  /* returns 0 if the mw that was removed was already satisfied, -EBUSY
>>>>>>>   * if the mask still hadn't reached its goal */
>>>>>>> -static int lockres_remove_mask_waiter(struct ocfs2_lock_res *lockres,
>>>>>>> +static int __lockres_remove_mask_waiter(struct ocfs2_lock_res *lockres,
>>>>>>>  				      struct ocfs2_mask_waiter *mw)
>>>>>>>  {
>>>>>>> -	unsigned long flags;
>>>>>>>  	int ret = 0;
>>>>>>>  
>>>>>>> -	spin_lock_irqsave(&lockres->l_lock, flags);
>>>>>>> +	assert_spin_locked(&lockres->l_lock);
>>>>>>>  	if (!list_empty(&mw->mw_item)) {
>>>>>>>  		if ((lockres->l_flags & mw->mw_mask) != mw->mw_goal)
>>>>>>>  			ret = -EBUSY;
>>>>>>> @@ -1338,6 +1342,18 @@ static int lockres_remove_mask_waiter(struct ocfs2_lock_res *lockres,
>>>>>>>  		list_del_init(&mw->mw_item);
>>>>>>>  		init_completion(&mw->mw_complete);
>>>>>>>  	}
>>>>>>> +
>>>>>>> +	return ret;
>>>>>>> +}
>>>>>>> +
>>>>>>> +static int lockres_remove_mask_waiter(struct ocfs2_lock_res *lockres,
>>>>>>> +				      struct ocfs2_mask_waiter *mw)
>>>>>>> +{
>>>>>>> +	unsigned long flags;
>>>>>>> +	int ret = 0;
>>>>>>> +
>>>>>>> +	spin_lock_irqsave(&lockres->l_lock, flags);
>>>>>>> +	ret = __lockres_remove_mask_waiter(lockres, mw);
>>>>>>>  	spin_unlock_irqrestore(&lockres->l_lock, flags);
>>>>>>>  
>>>>>>>  	return ret;
>>>>>>> @@ -1373,6 +1389,7 @@ static int __ocfs2_cluster_lock(struct ocfs2_super *osb,
>>>>>>>  	unsigned long flags;
>>>>>>>  	unsigned int gen;
>>>>>>>  	int noqueue_attempted = 0;
>>>>>>> +	int dlm_locked = 0;
>>>>>>>  
>>>>>>>  	ocfs2_init_mask_waiter(&mw);
>>>>>>>  
>>>>>>> @@ -1481,6 +1498,7 @@ again:
>>>>>>>  			ocfs2_recover_from_dlm_error(lockres, 1);
>>>>>>>  			goto out;
>>>>>>>  		}
>>>>>>> +		dlm_locked = 1;
>>>>>>>  
>>>>>>>  		mlog(0, "lock %s, successful return from ocfs2_dlm_lock\n",
>>>>>>>  		     lockres->l_name);
>>>>>>> @@ -1514,10 +1532,17 @@ out:
>>>>>>>  	if (wait && arg_flags & OCFS2_LOCK_NONBLOCK &&
>>>>>>>  	    mw.mw_mask & (OCFS2_LOCK_BUSY|OCFS2_LOCK_BLOCKED)) {
>>>>>>>  		wait = 0;
>>>>>>> -		if (lockres_remove_mask_waiter(lockres, &mw))
>>>>>>> +		spin_lock_irqsave(&lockres->l_lock, flags);
>>>>>>> +		if (__lockres_remove_mask_waiter(lockres, &mw)) {
>>>>>>> +			if (dlm_locked)
>>>>>>> +				lockres_or_flags(lockres,
>>>>>>> +						OCFS2_LOCK_NONBLOCK_FINISHED);
>>>>>>> +			spin_unlock_irqrestore(&lockres->l_lock, flags);
>>>>>>>  			ret = -EAGAIN;
>>>>>>> -		else
>>>>>>> +		} else {
>>>>>>> +			spin_unlock_irqrestore(&lockres->l_lock, flags);
>>>>>>>  			goto again;
>>>>>>> +		}
>>>>>>>  	}
>>>>>>>  	if (wait) {
>>>>>>>  		ret = ocfs2_wait_for_mask(&mw);
>>>>>>> diff --git a/fs/ocfs2/ocfs2.h b/fs/ocfs2/ocfs2.h
>>>>>>> index bbec539..7d6b7d0 100644
>>>>>>> --- a/fs/ocfs2/ocfs2.h
>>>>>>> +++ b/fs/ocfs2/ocfs2.h
>>>>>>> @@ -144,6 +144,12 @@ enum ocfs2_unlock_action {
>>>>>>>  						     * before the upconvert
>>>>>>>  						     * has completed */
>>>>>>>  
>>>>>>> +#define OCFS2_LOCK_NONBLOCK_FINISHED (0x00001000) /* NONBLOCK cluster
>>>>>>> +						   * lock has already
>>>>>>> +						   * returned, do not block
>>>>>>> +						   * dc thread from
>>>>>>> +						   * downconverting */
>>>>>>> +
>>>>>>>  struct ocfs2_lock_res_ops;
>>>>>>>  
>>>>>>>  typedef void (*ocfs2_lock_callback)(int status, unsigned long data);
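One detail of the patch worth spelling out: lockres_remove_mask_waiter() is
split into a wrapper and a lock-held __variant so that __ocfs2_cluster_lock()
can remove the mask waiter and set OCFS2_LOCK_NONBLOCK_FINISHED inside a
single critical section, leaving no window for the AST to run between the two
steps. Below is a generic user-space sketch of that locked/unlocked helper
idiom, with hypothetical toy_* names rather than the ocfs2 code.

#include <errno.h>
#include <pthread.h>
#include <stdio.h>

/* Hypothetical resource guarded by one lock; stands in for the
 * spinlock-protected ocfs2_lock_res but shares no code with it. */
struct toy_res {
	pthread_mutex_t lock;
	unsigned long flags;
	int waiter_queued;
};

#define TOY_NONBLOCK_FINISHED 0x1

/* __variant: caller must already hold res->lock (the kernel version
 * documents this with assert_spin_locked()). */
static int __toy_remove_waiter(struct toy_res *res)
{
	if (!res->waiter_queued)
		return 0;		/* waiter already satisfied */
	res->waiter_queued = 0;
	return -EBUSY;			/* goal not reached yet */
}

/* Plain wrapper for callers that do not hold the lock. */
static int toy_remove_waiter(struct toy_res *res)
{
	int ret;

	pthread_mutex_lock(&res->lock);
	ret = __toy_remove_waiter(res);
	pthread_mutex_unlock(&res->lock);
	return ret;
}

/* The point of the split: the non-blocking bail-out path must remove
 * the waiter AND record NONBLOCK_FINISHED in one critical section, so
 * a concurrent AST cannot observe the state between the two steps. */
static int toy_nonblock_bailout(struct toy_res *res)
{
	int ret;

	pthread_mutex_lock(&res->lock);
	ret = __toy_remove_waiter(res);
	if (ret)
		res->flags |= TOY_NONBLOCK_FINISHED;
	pthread_mutex_unlock(&res->lock);
	return ret;
}

int main(void)
{
	struct toy_res res = {
		.lock = PTHREAD_MUTEX_INITIALIZER,
		.waiter_queued = 1,
	};

	printf("bailout: %d, flags: %lx\n",
	       toy_nonblock_bailout(&res), res.flags);
	printf("second remove: %d\n", toy_remove_waiter(&res));
	return 0;
}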
