Hi JunXiao,
On 2014/11/10 9:52, Junxiao Bi wrote:
> On 11/07/2014 05:27 PM, Xue jiufei wrote:
>> On 2014/11/7 15:03, Junxiao Bi wrote:
>>> On 11/07/2014 02:36 PM, Xue jiufei wrote:
>>>> Hi Junxiao
>>>>
>>>> On 2014/11/7 13:54, Junxiao Bi wrote:
>>>>> Hi Jiufei,
>>>>>
>>>>> On 11/07/2014 11:41 AM, Xue jiufei wrote:
>>>>>> Function ocfs2_readpages() use nonblocking flag to avoid page lock
>>>>>> inversion. It will trigger cluster hang because that flag
>>>>>> OCFS2_LOCK_UPCONVERT_FINISHING is not cleared if nonblocking lock
>>>>>> cannot be granted at once. The flag would prevent dc thread from
>>>>>> downconverting. So other nodes cannot acheive this lockres for ever.
>>>>>>
>>>>>> So we should not set OCFS2_LOCK_UPCONVERT_FINISHING when receiving ast
>>>>>> if nonblocking lock had already returned.
>>>>>>
>>>>>
>>>>> Use OCFS2_LOCK_NONBLOCK_FINISHED can't fix this issue. Like there are
>>>>> two processes doing non-blocking lock in the following order.
>>>> I am wondering why these two processes can lock in such order.
>>>
>>> Yes, you are right. LOCK_BUSY flag stop this race. Your fix works. But
>>> just as i mentioned, i think for non-blocking lock, no need do cluster
>>> wild locking, because AST is async, before it reached, most of the time,
>>> the non-blocking lock had returned fail.
>> We have added some debug information in function dlm_proxy_ast_handler()
>> and dlm_send_remote_lock_request(), and run race testcase before.
>> The result shows that the probability that AST message came earlier
>> than returned status of locking is about 1/4. 
> That may depend on how your race testcase is made.
>>
>> In addition, if non-blocking lock do not do cluster wide lock, 
>> ocfs2_readpages()
>> will never succeed, 
> Jiufei, can you explain why?
> 
If node A takes EX lock, and node B call ocfs2_readpages(), which do not do 
cluster
wide lock, master will not notice node A to downconvert. So even node B call 
ocfs2_readpages()
many times, it will not succeed.

> Thanks,
> Junxiao.
> 
> which is not we wanted.
>>
>> Thanks,
>> Jiufei
>>>
>>> Thanks,
>>> Junxiao.
>>>>> 1. Process A issue non-blocking lock fail, set 
>>>>> OCFS2_LOCK_NONBLOCK_FINISHED.
>>>> We had set OCFS2_LOCK_BUSY flag before process A call ocfs2_dlm_lock(),
>>>>> 2. Process B issue non-blocking lock fail, set
>>>>> OCFS2_LOCK_NONBLOCK_FINISHED again.
>>>> So process B must wait until ast for Process A returns and clear 
>>>> OCFS2_LOCK_BUSY.
>>>>
>>>>> 3. Ast for process A come, clear OCFS2_LOCK_NONBLOCK_FINISHED.
>>>>> 4. Ast for process B come, set OCFS2_LOCK_UPCONVERT_FINISHING.
>>>>>
>>>>> So down convert can still be blocked.
>>>>>
>>>>> To fix this issue, I think no need to call ocfs2_dlm_lock() for
>>>>> non-blocking lock, just check local lockres->l_level is enough, if
>>>>> request level equal or smaller than it, lock success, or lock fail.
>>>>>
>>>>> Thanks,
>>>>> Junxiao.
>>>>>
>>>> Thanks,
>>>> jiufei.
>>>>
>>>>>> Signed-off-by: joyce.xue <[email protected]>
>>>>>> ---
>>>>>>  fs/ocfs2/dlmglue.c | 37 +++++++++++++++++++++++++++++++------
>>>>>>  fs/ocfs2/ocfs2.h   |  6 ++++++
>>>>>>  2 files changed, 37 insertions(+), 6 deletions(-)
>>>>>>
>>>>>> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
>>>>>> index 21262f2..1ddb6f1 100644
>>>>>> --- a/fs/ocfs2/dlmglue.c
>>>>>> +++ b/fs/ocfs2/dlmglue.c
>>>>>> @@ -861,8 +861,13 @@ static inline void 
>>>>>> ocfs2_generic_handle_convert_action(struct ocfs2_lock_res *lo
>>>>>>           * We set the OCFS2_LOCK_UPCONVERT_FINISHING flag before 
>>>>>> clearing
>>>>>>           * the OCFS2_LOCK_BUSY flag to prevent the dc thread from
>>>>>>           * downconverting the lock before the upconvert has fully 
>>>>>> completed.
>>>>>> +         * Do not prevent the dc thread from downconverting if NONBLOCK 
>>>>>> lock
>>>>>> +         * had already returned.
>>>>>>           */
>>>>>> -        lockres_or_flags(lockres, OCFS2_LOCK_UPCONVERT_FINISHING);
>>>>>> +        if (!(lockres->l_flags & OCFS2_LOCK_NONBLOCK_FINISHED))
>>>>>> +                lockres_or_flags(lockres, 
>>>>>> OCFS2_LOCK_UPCONVERT_FINISHING);
>>>>>> +        else
>>>>>> +                lockres_clear_flags(lockres, 
>>>>>> OCFS2_LOCK_NONBLOCK_FINISHED);
>>>>>>  
>>>>>>          lockres_clear_flags(lockres, OCFS2_LOCK_BUSY);
>>>>>>  }
>>>>>> @@ -1324,13 +1329,12 @@ static void lockres_add_mask_waiter(struct 
>>>>>> ocfs2_lock_res *lockres,
>>>>>>  
>>>>>>  /* returns 0 if the mw that was removed was already satisfied, -EBUSY
>>>>>>   * if the mask still hadn't reached its goal */
>>>>>> -static int lockres_remove_mask_waiter(struct ocfs2_lock_res *lockres,
>>>>>> +static int __lockres_remove_mask_waiter(struct ocfs2_lock_res *lockres,
>>>>>>                                        struct ocfs2_mask_waiter *mw)
>>>>>>  {
>>>>>> -        unsigned long flags;
>>>>>>          int ret = 0;
>>>>>>  
>>>>>> -        spin_lock_irqsave(&lockres->l_lock, flags);
>>>>>> +        assert_spin_locked(&lockres->l_lock);
>>>>>>          if (!list_empty(&mw->mw_item)) {
>>>>>>                  if ((lockres->l_flags & mw->mw_mask) != mw->mw_goal)
>>>>>>                          ret = -EBUSY;
>>>>>> @@ -1338,6 +1342,18 @@ static int lockres_remove_mask_waiter(struct 
>>>>>> ocfs2_lock_res *lockres,
>>>>>>                  list_del_init(&mw->mw_item);
>>>>>>                  init_completion(&mw->mw_complete);
>>>>>>          }
>>>>>> +
>>>>>> +        return ret;
>>>>>> +}
>>>>>> +
>>>>>> +static int lockres_remove_mask_waiter(struct ocfs2_lock_res *lockres,
>>>>>> +                                      struct ocfs2_mask_waiter *mw)
>>>>>> +{
>>>>>> +        unsigned long flags;
>>>>>> +        int ret = 0;
>>>>>> +
>>>>>> +        spin_lock_irqsave(&lockres->l_lock, flags);
>>>>>> +        ret = __lockres_remove_mask_waiter(lockres, mw);
>>>>>>          spin_unlock_irqrestore(&lockres->l_lock, flags);
>>>>>>  
>>>>>>          return ret;
>>>>>> @@ -1373,6 +1389,7 @@ static int __ocfs2_cluster_lock(struct ocfs2_super 
>>>>>> *osb,
>>>>>>          unsigned long flags;
>>>>>>          unsigned int gen;
>>>>>>          int noqueue_attempted = 0;
>>>>>> +        int dlm_locked = 0;
>>>>>>  
>>>>>>          ocfs2_init_mask_waiter(&mw);
>>>>>>  
>>>>>> @@ -1481,6 +1498,7 @@ again:
>>>>>>                          ocfs2_recover_from_dlm_error(lockres, 1);
>>>>>>                          goto out;
>>>>>>                  }
>>>>>> +                dlm_locked = 1;
>>>>>>  
>>>>>>                  mlog(0, "lock %s, successful return from 
>>>>>> ocfs2_dlm_lock\n",
>>>>>>                       lockres->l_name);
>>>>>> @@ -1514,10 +1532,17 @@ out:
>>>>>>          if (wait && arg_flags & OCFS2_LOCK_NONBLOCK &&
>>>>>>              mw.mw_mask & (OCFS2_LOCK_BUSY|OCFS2_LOCK_BLOCKED)) {
>>>>>>                  wait = 0;
>>>>>> -                if (lockres_remove_mask_waiter(lockres, &mw))
>>>>>> +                spin_lock_irqsave(&lockres->l_lock, flags);
>>>>>> +                if (__lockres_remove_mask_waiter(lockres, &mw)) {
>>>>>> +                        if (dlm_locked)
>>>>>> +                                lockres_or_flags(lockres,
>>>>>> +                                        OCFS2_LOCK_NONBLOCK_FINISHED);
>>>>>> +                        spin_unlock_irqrestore(&lockres->l_lock, flags);
>>>>>>                          ret = -EAGAIN;
>>>>>> -                else
>>>>>> +                } else {
>>>>>> +                        spin_unlock_irqrestore(&lockres->l_lock, flags);
>>>>>>                          goto again;
>>>>>> +                }
>>>>>>          }
>>>>>>          if (wait) {
>>>>>>                  ret = ocfs2_wait_for_mask(&mw);
>>>>>> diff --git a/fs/ocfs2/ocfs2.h b/fs/ocfs2/ocfs2.h
>>>>>> index bbec539..7d6b7d0 100644
>>>>>> --- a/fs/ocfs2/ocfs2.h
>>>>>> +++ b/fs/ocfs2/ocfs2.h
>>>>>> @@ -144,6 +144,12 @@ enum ocfs2_unlock_action {
>>>>>>                                                       * before the 
>>>>>> upconvert
>>>>>>                                                       * has completed */
>>>>>>  
>>>>>> +#define OCFS2_LOCK_NONBLOCK_FINISHED (0x00001000) /* NONBLOCK cluster
>>>>>> +                                                   * lock has already
>>>>>> +                                                   * returned, do not 
>>>>>> block
>>>>>> +                                                   * dc thread from
>>>>>> +                                                   * downconverting */
>>>>>> +
>>>>>>  struct ocfs2_lock_res_ops;
>>>>>>  
>>>>>>  typedef void (*ocfs2_lock_callback)(int status, unsigned long data);
>>>>>>
>>>>>
>>>>> .
>>>>>
>>>>
>>>>
>>>
>>> .
>>>
>>
>>
> 
> .
> 



_______________________________________________
Ocfs2-devel mailing list
[email protected]
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Reply via email to