On Thu, Jul 28, 2016 at 02:06:08PM -0700, Andrew Morton wrote:
> From: piaojun <piao...@huawei.com>
> Subject: ocfs2/dlm: continue to purge recovery lockres when recovery master goes down
>
> We found a dlm-blocked situation caused by continuous breakdown of
> recovery masters described below.  To solve this problem, we should purge
> recovery lock once detecting recovery master goes down.
>
> N3                      N2                   N1(reco master)
>                         go down
>                                              pick up recovery lock and
>                                              begin recoverying for N2
>
>                                              go down
>
> pick up recovery
> lock failed, then
> purge it:
> dlm_purge_lockres
>   ->DROPPING_REF is set
>
> send deref to N1 failed,
> recovery lock is not purged
>
> find N1 go down, begin
> recoverying for N1, but
> blocked in dlm_do_recovery
> as DROPPING_REF is set:
> dlm_do_recovery
>   ->dlm_pick_recovery_master
>     ->dlmlock
>       ->dlm_get_lock_resource
>         ->__dlm_wait_on_lockres_flags(tmpres,
>                 DLM_LOCK_RES_DROPPING_REF);
>
> Fixes: 8c0343968163 ("ocfs2/dlm: clear DROPPING_REF flag when the master goes down")
> Link: http://lkml.kernel.org/r/578453af.8030...@huawei.com
> Signed-off-by: Jun Piao <piao...@huawei.com>
> Reviewed-by: Joseph Qi <joseph...@huawei.com>
> Reviewed-by: Jiufei Xue <xuejiu...@huawei.com>

Reviewed-by: Mark Fasheh <mfas...@suse.de>
        --Mark

--
Mark Fasheh