Both ASTs and BASTs can only be sent by the master, and we ensure the
master sends the ASTs before the BASTs.

Do you have the full lockres dump?

On 02/21/2012 04:36 PM, Xiaowei.hu wrote:
> Hi Sunil,
>
> I mean it executes in this way:
>
> nodeA calls ocfs2_dlm_lock() and releases the res spin lock; at this
> point A holds no spin locks.
> Then it starts to execute the proxy AST handler and processes the BAST
> request from nodeB.
> Then dlmthread flushes the BAST, and only after this does node A start
> to queue its AST in the ocfs2_dlm_lock() function.
>
> Thanks,
> Xiaowei
> On 02/22/2012 01:48 AM, Sunil Mushran wrote:
>> > bast queued and flushed,before the ast was queued
>>
>> Unlikely with o2dlm. dlmthread always sends ASTs before BASTs.
>>
>> Can you recreate the entire lockres? A full dump may yield more
>> information.
>>
>> Sunil
>>
>> On 02/20/2012 10:12 PM, xiaowei...@oracle.com wrote:
>>> I am trying to fix bug13611997. CT's machine ran into a BUG in the
>>> ocfs2dc thread: BUG_ON(lockres->l_action != OCFS2_AST_CONVERT &&
>>> lockres->l_action != OCFS2_AST_DOWNCONVERT); I analyzed the vmcore:
>>> lockres->l_action = OCFS2_AST_ATTACH and l_flags=326 (0x146, which
>>> means
>>> OCFS2_LOCK_BUSY|OCFS2_LOCK_BLOCKED|OCFS2_LOCK_INITIALIZED|OCFS2_LOCK_QUEUED).
>>> After comparing with the code, this state should only be possible
>>> during ocfs2_cluster_lock. Here is the race situation:
>>>
>>> NodeA                                               NodeB
>>> ocfs2_cluster_lock on a new lockres M
>>>   spin_lock_irqsave(&lockres->l_lock, flags);
>>>   gen = lockres_set_pending(lockres);
>>>   lockres->l_action = OCFS2_AST_ATTACH;
>>>   lockres_or_flags(lockres, OCFS2_LOCK_BUSY);
>>>   spin_unlock_irqrestore(&lockres->l_lock, flags);
>>>
>>> ocfs2_dlm_lock() finished and returned.
>>> **and lockres_clear_pending(lockres, gen, osb);
>>>                                                     requests a lock on the same lockres M
>>>                                                     It's blocked by nodeA, and an ast proxy was sent to A
>>>
>>> bast queued and flushed, before the ast was queued
>>> then the ocfs2dc was scheduled
>>> there is a chance to execute this code path:
>>> ocfs2_downconvert_thread()
>>> ocfs2_downconvert_thread_do_work()
>>> ocfs2_blocking_ast()
>>> ocfs2_process_blocked_lock()
>>> ocfs2_unblock_lock()
>>>   spin_lock_irqsave(&lockres->l_lock, flags);
>>>   if (lockres->l_flags & OCFS2_LOCK_BUSY)
>>>     ret = ocfs2_prepare_cancel_convert(osb, lockres);
>>>       BUG_ON(lockres->l_action != OCFS2_AST_CONVERT &&
>>>              lockres->l_action != OCFS2_AST_DOWNCONVERT);
>>>       the BUG() is triggered here
>>>
>>> Solution:
>>> One possible solution is to remove the lockres_clear_pending call
>>> marked by two stars and leave the clearing to the ast function. This
>>> would make sure the bast function waits for the ast, letting it
>>> clear OCFS2_LOCK_BUSY and set OCFS2_LOCK_ATTACHED first, before
>>> entering the downconvert process.
>>>
>>>
>>
>

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-devel
