On 2016-7-11 10:07, Joseph Qi wrote:
> On 2016/7/10 18:03, piaojun wrote:
>> We found a BUG situation that lockres is migrated during deref
>> described below. To solve the BUG, we could purge lockres directly when
>> other node says I did not have a ref. Additionally, we'd better purge
>> lockres if master goes down, as no one will respond with deref done.
>>
>> Node 1                   Node 2(old master)           Node3(new master)
>> dlm_purge_lockres
>> send deref to N2
>>
>>                          leave domain
>>                          migrate lockres to N3
>>                                                       finish migration
>>                                                       send do assert
>>                                                       master to N1
>>
>> receive do assert msg
>> from N3, but can not
>> find lockres because
>> DROPPING_REF is set,
>> so the owner is still
>> N2.
>>
>>                          receive deref from N1
>>                          and respond -EINVAL
>>                          because lockres is migrated
>>
>> BUG when receive -EINVAL
>> in dlm_drop_lockres_ref
>>
>> Fixes: 842b90b62461d ("ocfs2/dlm: return in progress if master can not clear the refmap bit...")
>> Signed-off-by: Jun Piao <piao...@huawei.com>
> Use full patch title please.
> Others looks well.
>
> Thanks,
> Joseph
>
Good suggestion, I will fix this problem in the following [PATCH v2].

Thanks,
Jun Piao
>> ---
>>  fs/ocfs2/dlm/dlmmaster.c |  9 ++++++---
>>  fs/ocfs2/dlm/dlmthread.c | 13 +++++++++++--
>>  2 files changed, 17 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c
>> index f72e7ae..8c84641 100644
>> --- a/fs/ocfs2/dlm/dlmmaster.c
>> +++ b/fs/ocfs2/dlm/dlmmaster.c
>> @@ -2276,9 +2276,12 @@ int dlm_drop_lockres_ref(struct dlm_ctxt *dlm, struct dlm_lock_resource *res)
>>  		mlog(ML_ERROR, "%s: res %.*s, DEREF to node %u got %d\n",
>>  		     dlm->name, namelen, lockname, res->owner, r);
>>  		dlm_print_one_lock_resource(res);
>> -		BUG();
>> -	}
>> -	return ret ? ret : r;
>> +		if (r == -ENOMEM)
>> +			BUG();
>> +	} else
>> +		ret = r;
>> +
>> +	return ret;
>>  }
>>
>>  int dlm_deref_lockres_handler(struct o2net_msg *msg, u32 len, void *data,
>> diff --git a/fs/ocfs2/dlm/dlmthread.c b/fs/ocfs2/dlm/dlmthread.c
>> index 68d239b..ce39722 100644
>> --- a/fs/ocfs2/dlm/dlmthread.c
>> +++ b/fs/ocfs2/dlm/dlmthread.c
>> @@ -175,6 +175,15 @@ static void dlm_purge_lockres(struct dlm_ctxt *dlm,
>>  	     res->lockname.len, res->lockname.name, master);
>>
>>  	if (!master) {
>> +		if (res->state & DLM_LOCK_RES_DROPPING_REF) {
>> +			mlog(ML_NOTICE, "%s: res %.*s already in "
>> +				"DLM_LOCK_RES_DROPPING_REF state\n",
>> +				dlm->name, res->lockname.len,
>> +				res->lockname.name);
>> +			spin_unlock(&res->spinlock);
>> +			return;
>> +		}
>> +
>>  		res->state |= DLM_LOCK_RES_DROPPING_REF;
>>  		/* drop spinlock...  retake below */
>>  		spin_unlock(&res->spinlock);
>> @@ -203,8 +212,8 @@ static void dlm_purge_lockres(struct dlm_ctxt *dlm,
>>  		dlm->purge_count--;
>>  	}
>>
>> -	if (!master && ret != 0) {
>> -		mlog(0, "%s: deref %.*s in progress or master goes down\n",
>> +	if (!master && ret == DLM_DEREF_RESPONSE_INPROG) {
>> +		mlog(0, "%s: deref %.*s in progress\n",
>>  			dlm->name, res->lockname.len, res->lockname.name);
>>  		spin_unlock(&res->spinlock);
>>  		return;
>>
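
For readers following the thread, the control flow that the two hunks above appear to produce can be modelled in standalone user-space C. This is only a sketch, not the kernel code: the function names, the `would_bug` flag, and the numeric value of `SKETCH_DEREF_RESPONSE_INPROG` are hypothetical stand-ins, and the `ret < 0` branch for a failed send is an assumption about context that the diff does not show.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Assumed value, for illustration only; the real constant is
 * DLM_DEREF_RESPONSE_INPROG in the ocfs2 DLM headers. */
#define SKETCH_DEREF_RESPONSE_INPROG 1

/*
 * Models the patched tail of dlm_drop_lockres_ref().
 *   send_ret  - status of sending the deref message (assumed < 0 on failure)
 *   r         - status returned by the (old) master
 *   would_bug - set to true where the patched code would still call BUG()
 */
static int sketch_drop_ref_result(int send_ret, int r, bool *would_bug)
{
	int ret = send_ret;

	*would_bug = false;
	if (ret < 0) {
		/* Send failed, e.g. the master went down: report ret as is. */
	} else if (r < 0) {
		/* Master says we hold no ref: only -ENOMEM stays fatal. */
		if (r == -ENOMEM)
			*would_bug = true;
	} else {
		/* Non-negative reply (done or in progress) is passed through. */
		ret = r;
	}
	return ret;
}

/*
 * Models the patched check in dlm_purge_lockres() for a non-master node:
 * the purge is deferred only while the deref is reported as in progress.
 */
static bool sketch_purge_now(int ret)
{
	return ret != SKETCH_DEREF_RESPONSE_INPROG;
}
```

Under these assumptions, a reply of -EINVAL from the old master yields a return value of 0, so the lockres is purged directly instead of hitting BUG(). A failed send leaves a negative return value, which also no longer blocks the purge.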
_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel