Hi Alex, On 2017/11/1 15:05, alex chen wrote: > Hi Joseph and Changwei, > > It's our basic principle that the function in which may sleep can't be called > within spinlock hold.
I suppose this principle is a suggestion not a restriction. > > On 2017/11/1 9:03, Joseph Qi wrote: >> Hi Alex, >> >> On 17/10/31 20:41, alex chen wrote: >>> In the following situation, the down_write() will be called under >>> the spin_lock(), which may lead a soft lockup: >>> o2hb_region_inc_user >>> spin_lock(&o2hb_live_lock) >>> o2hb_region_pin >>> o2nm_depend_item >>> configfs_depend_item >>> inode_lock >>> down_write >>> -->here may sleep and reschedule >>> >>> So we should unlock the o2hb_live_lock before the o2nm_depend_item(), and >>> get item reference in advance to prevent the region to be released. >>> >>> Signed-off-by: Alex Chen <alex.c...@huawei.com> >>> Reviewed-by: Yiwen Jiang <jiangyi...@huawei.com> >>> Reviewed-by: Jun Piao <piao...@huawei.com> >>> --- >>> fs/ocfs2/cluster/heartbeat.c | 8 ++++++++ >>> 1 file changed, 8 insertions(+) >>> >>> diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c >>> index d020604..f1142a9 100644 >>> --- a/fs/ocfs2/cluster/heartbeat.c >>> +++ b/fs/ocfs2/cluster/heartbeat.c >>> @@ -2399,6 +2399,9 @@ static int o2hb_region_pin(const char *region_uuid) >>> if (reg->hr_item_pinned || reg->hr_item_dropped) >>> goto skip_pin; >>> >>> + config_item_get(®->hr_item); >>> + spin_unlock(&o2hb_live_lock); >>> + >> If unlock here, the iteration of o2hb_all_regions is no longer safe. >> >> Thanks, >> Joseph >> > > In local heartbeat mode, here we already found the region and will break the > loop after > depending item, we get the item reference before spin_unlock(), that means > the region will > never be released by the o2hb_region_release() until we put the item > reference after > spin_lock(&o2hb_live_lock), so we can safely iterate over the list. > In global heartbeat mode, it doesn't matter that some regions may be deleted > after > spin_unlock(), because we just pin all the active regions. But we are still iterating elements on global list *o2hb_all_regions*, right? How can we guarantee that no more elements will be attached or detached while the lock is unlocked. > > Thanks, > Alex > >>> /* Ignore ENOENT only for local hb (userdlm domain) */ >>> ret = o2nm_depend_item(®->hr_item); >>> if (!ret) { >>> @@ -2410,9 +2413,14 @@ static int o2hb_region_pin(const char *region_uuid) >>> else { >>> mlog(ML_ERROR, "Pin region %s fails with %d\n", >>> uuid, ret); >>> + config_item_put(®->hr_item); >>> + spin_lock(&o2hb_live_lock); >>> break; >>> } >>> } >>> + >>> + config_item_put(®->hr_item); >>> + spin_lock(&o2hb_live_lock); >>> skip_pin: >>> if (found) >>> break; >>> -- 1.9.5.msysgit.1 >>> >>> >> >> . >> > > _______________________________________________ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com https://oss.oracle.com/mailman/listinfo/ocfs2-devel