Both ASTs and BASTs can only be sent by the master, and we ensure the master sends the ASTs before the BASTs.
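
As a cross-check on the l_flags value of 326 in the report quoted below, here is a minimal sketch that decodes it by hand. The flag values are my recollection of fs/ocfs2/ocfs2.h and should be verified against the tree in question; decode_lflags() and the lflag_names table are hypothetical helpers written for this note, not kernel code.

```c
#include <stddef.h>
#include <string.h>

/* ASSUMPTION: values as recalled from fs/ocfs2/ocfs2.h; verify in your tree. */
#define OCFS2_LOCK_ATTACHED      0x00000001UL
#define OCFS2_LOCK_BUSY          0x00000002UL
#define OCFS2_LOCK_BLOCKED       0x00000004UL
#define OCFS2_LOCK_LOCAL         0x00000008UL
#define OCFS2_LOCK_NEEDS_REFRESH 0x00000010UL
#define OCFS2_LOCK_REFRESHING    0x00000020UL
#define OCFS2_LOCK_INITIALIZED   0x00000040UL
#define OCFS2_LOCK_FREEING       0x00000080UL
#define OCFS2_LOCK_QUEUED        0x00000100UL

/* Hypothetical lookup table used only by this sketch. */
struct lflag_name {
	unsigned long bit;
	const char *name;
};

static const struct lflag_name lflag_names[] = {
	{ OCFS2_LOCK_ATTACHED,      "OCFS2_LOCK_ATTACHED" },
	{ OCFS2_LOCK_BUSY,          "OCFS2_LOCK_BUSY" },
	{ OCFS2_LOCK_BLOCKED,       "OCFS2_LOCK_BLOCKED" },
	{ OCFS2_LOCK_LOCAL,         "OCFS2_LOCK_LOCAL" },
	{ OCFS2_LOCK_NEEDS_REFRESH, "OCFS2_LOCK_NEEDS_REFRESH" },
	{ OCFS2_LOCK_REFRESHING,    "OCFS2_LOCK_REFRESHING" },
	{ OCFS2_LOCK_INITIALIZED,   "OCFS2_LOCK_INITIALIZED" },
	{ OCFS2_LOCK_FREEING,       "OCFS2_LOCK_FREEING" },
	{ OCFS2_LOCK_QUEUED,        "OCFS2_LOCK_QUEUED" },
};

/*
 * Write the '|'-separated names of the set bits into buf (truncating if
 * buf is too small) and return whatever bits matched no name in the table.
 */
static unsigned long decode_lflags(unsigned long flags, char *buf, size_t len)
{
	unsigned long unknown = flags;
	size_t i;

	if (len == 0)
		return unknown;
	buf[0] = '\0';

	for (i = 0; i < sizeof(lflag_names) / sizeof(lflag_names[0]); i++) {
		if (!(flags & lflag_names[i].bit))
			continue;
		unknown &= ~lflag_names[i].bit;
		if (buf[0] != '\0')
			strncat(buf, "|", len - strlen(buf) - 1);
		strncat(buf, lflag_names[i].name, len - strlen(buf) - 1);
	}
	return unknown;
}
```

Under those assumed values, 326 is 0x146 = 0x100 + 0x40 + 0x4 + 0x2, so decode_lflags(326, buf, sizeof(buf)) with a 128-byte buf yields OCFS2_LOCK_BUSY|OCFS2_LOCK_BLOCKED|OCFS2_LOCK_INITIALIZED|OCFS2_LOCK_QUEUED and returns 0. OCFS2_LOCK_ATTACHED is not set, which agrees with l_action still being OCFS2_AST_ATTACH in the vmcore.
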
Do you have the full lockres dump?

On 02/21/2012 04:36 PM, Xiaowei.hu wrote:
> Hi Sunil,
>
> I mean it executes in this way:
>
> nodeA calls ocfs2_dlm_lock() and releases the res spin lock; here A
> doesn't hold any spin locks,
> then it starts to execute the proxy ast handler and processes the bast
> request from nodeB,
> then dlmthread flushes the bast; after this, node A starts to queue its
> ast in the ocfs2_dlm_lock() function.
>
> Thanks,
> Xiaowei
> On 02/22/2012 01:48 AM, Sunil Mushran wrote:
>> > bast queued and flushed, before the ast was queued
>>
>> Unlikely with o2dlm. dlmthread always sends ASTs before BASTs.
>>
>> Can you recreate the entire lockres? A full dump may yield more
>> information.
>>
>> Sunil
>>
>> On 02/20/2012 10:12 PM, xiaowei...@oracle.com wrote:
>>> I am trying to fix bug13611997. CT's machine ran into a BUG in the
>>> ocfs2dc thread:
>>>     BUG_ON(lockres->l_action != OCFS2_AST_CONVERT &&
>>>            lockres->l_action != OCFS2_AST_DOWNCONVERT);
>>> I analyzed the vmcore: lockres->l_action = OCFS2_AST_ATTACH and
>>> l_flags=326 (which means
>>> OCFS2_LOCK_BUSY|OCFS2_LOCK_BLOCKED|OCFS2_LOCK_INITIALIZED|OCFS2_LOCK_QUEUED).
>>> After comparing with the code, this state is only possible
>>> during ocfs2_cluster_lock. Here is the race situation:
>>>
>>> NodeA                                               NodeB
>>> ocfs2_cluster_lock on a new lockres M
>>> spin_lock_irqsave(&lockres->l_lock, flags);
>>> gen = lockres_set_pending(lockres);
>>> lockres->l_action = OCFS2_AST_ATTACH;
>>> lockres_or_flags(lockres, OCFS2_LOCK_BUSY);
>>> spin_unlock_irqrestore(&lockres->l_lock, flags);
>>>
>>> ocfs2_dlm_lock() finished and returned.
>>> **and lockres_clear_pending(lockres, gen, osb);
>>>                                                     request a lock on the same lockres M
>>>                                                     It's blocked by nodeA, and an ast proxy was sent to A
>>>
>>> bast queued and flushed, before the ast was queued
>>> then the ocfs2dc was scheduled
>>> there is a chance to execute this code path:
>>> ocfs2_downconvert_thread()
>>> ocfs2_downconvert_thread_do_work()
>>> ocfs2_blocking_ast()
>>> ocfs2_process_blocked_lock()
>>> ocfs2_unblock_lock()
>>>     spin_lock_irqsave(&lockres->l_lock, flags);
>>>     if (lockres->l_flags & OCFS2_LOCK_BUSY)
>>>         ret = ocfs2_prepare_cancel_convert(osb, lockres);
>>>             BUG_ON(lockres->l_action != OCFS2_AST_CONVERT &&
>>>                    lockres->l_action != OCFS2_AST_DOWNCONVERT);
>>>             here trigger the BUG()
>>>
>>> Solution:
>>> One possible solution for this is to remove the lockres_clear_pending
>>> marked by 2 stars, and leave this clearing work to the ast function.
>>> This way, the bast function waits for the ast, letting it
>>> clear OCFS2_LOCK_BUSY and set OCFS2_LOCK_ATTACHED first, before
>>> entering the downconvert process.
>>>
>>>
>>
>
_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-devel