Hi Guozhonghua,

On 2016/1/13 19:33, Guozhonghua wrote:
> Hi, Junxiao, Jiufei,
>
> Before the function dlm_drop_lockres_ref is called, the dlm spinlock is
> unlocked. After the master node responds, the node takes the dlm spinlock
> again and determines whether the res is in use. In this window, the master
> of the res may go down and be recovered, so the res state may be
> RECOVERING, and the BUG in dlm_purge_lockres may be triggered.
>
When the master goes down, the node calls dlm_do_local_recovery_cleanup():

dlm_do_local_recovery_cleanup
	if (res->state & DLM_LOCK_RES_DROPPING_REF)
		mlog();
	else
		dlm_move_lockres_to_recovery_list()

In dlm_move_lockres_to_recovery_list(), the lockres is set to the RECOVERING
state. So a lockres with the DROPPING_REF state can not become RECOVERING,
right?
Thanks,
Jiufei

> Thanks.
> Guozhonghua.
>
>
> Date: Mon, 14 Dec 2015 10:56:18 +0800
> From: Junxiao Bi <junxiao...@oracle.com>
> Subject: Re: [Ocfs2-devel] [PATCH V2] ocfs2/dlm: fix a race between
>	purge and migration
> To: xuejiu...@huawei.com, Andrew Morton <a...@linux-foundation.org>,
>	Mark Fasheh <mfas...@suse.com>
> Cc: "ocfs2-devel@oss.oracle.com" <ocfs2-devel@oss.oracle.com>
> Message-ID: <566e2fd2.7050...@oracle.com>
> Content-Type: text/plain; charset=windows-1252
>
> On 12/11/2015 11:09 AM, Xue jiufei wrote:
>> We found a race between purge and migration while doing code review.
>> Node A puts a lockres on the purge list before receiving the migrate
>> message from node B, which is the master. Node A calls
>> dlm_mig_lockres_handler to handle this message.
>>
>> dlm_mig_lockres_handler
>>   dlm_lookup_lockres
>>   >>>>>> race window: dlm_run_purge_list may run and send a
>>          deref message to the master, then wait for the response
>>   spin_lock(&res->spinlock);
>>   res->state |= DLM_LOCK_RES_MIGRATING;
>>   spin_unlock(&res->spinlock);
>>   dlm_mig_lockres_handler returns
>>
>>   >>>>>> dlm_thread receives the response from the master for the
>>          deref message and triggers the BUG because the lockres has
>>          the DLM_LOCK_RES_MIGRATING state, with the following message:
>>
>> dlm_purge_lockres:209 ERROR: 6633EB681FA7474A9C280A4E1A836F0F:
>> res M0000000000000000030c0300000000 in use after deref
>>
>> Signed-off-by: Jiufei Xue <xuejiu...@huawei.com>
>> Reviewed-by: Joseph Qi <joseph...@huawei.com>
> Looks good.
> Reviewed-by: Junxiao Bi <junxiao...@oracle.com>
>> ---
>>  fs/ocfs2/dlm/dlmrecovery.c | 9 ++++++++-
>>  1 file changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
>> index 58eaa5c..4055909 100644
>> --- a/fs/ocfs2/dlm/dlmrecovery.c
>> +++ b/fs/ocfs2/dlm/dlmrecovery.c
>> @@ -1373,6 +1373,7 @@ int dlm_mig_lockres_handler(struct o2net_msg *msg, u32 len, void *data,
>>  	char *buf = NULL;
>>  	struct dlm_work_item *item = NULL;
>>  	struct dlm_lock_resource *res = NULL;
>> +	unsigned int hash;
>>
>>  	if (!dlm_grab(dlm))
>>  		return -EINVAL;
>> @@ -1400,7 +1401,10 @@ int dlm_mig_lockres_handler(struct o2net_msg *msg, u32 len, void *data,
>>  	/* lookup the lock to see if we have a secondary queue for this
>>  	 * already... just add the locks in and this will have its owner
>>  	 * and RECOVERY flag changed when it completes. */
>> -	res = dlm_lookup_lockres(dlm, mres->lockname, mres->lockname_len);
>> +	hash = dlm_lockid_hash(mres->lockname, mres->lockname_len);
>> +	spin_lock(&dlm->spinlock);
>> +	res = __dlm_lookup_lockres(dlm, mres->lockname, mres->lockname_len,
>> +			hash);
>>  	if (res) {
>>  		/* this will get a ref on res */
>>  		/* mark it as recovering/migrating and hash it */
>> @@ -1421,13 +1425,16 @@ int dlm_mig_lockres_handler(struct o2net_msg *msg, u32 len, void *data,
>>  				     mres->lockname_len, mres->lockname);
>>  				ret = -EFAULT;
>>  				spin_unlock(&res->spinlock);
>> +				spin_unlock(&dlm->spinlock);
>>  				dlm_lockres_put(res);
>>  				goto leave;
>>  			}
>>  			res->state |= DLM_LOCK_RES_MIGRATING;
>>  		}
>>  		spin_unlock(&res->spinlock);
>> +		spin_unlock(&dlm->spinlock);
>>  	} else {
>> +		spin_unlock(&dlm->spinlock);
>>  		/* need to allocate, just like if it was
>>  		 * mastered here normally */
>>  		res = dlm_new_lockres(dlm, mres->lockname, mres->lockname_len);
> -------------------------------------------------------------------------------------------------------------------------------------
> This e-mail and its attachments contain confidential information from H3C, which is
> intended only for the person or entity whose address is listed above. Any use of the
> information contained herein in any way (including, but not limited to, total or partial
> disclosure, reproduction, or dissemination) by persons other than the intended
> recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender
> by phone or email immediately and delete it!

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel