Hi Changwei,

Thanks for your reply.
On 2017/11/2 13:58, Changwei Ge wrote:
> On 2017/11/2 11:45, alex chen wrote:
>> Hi Changwei,
>>
>> On 2017/11/1 17:59, Changwei Ge wrote:
>>> Hi Alex,
>>>
>>> On 2017/11/1 17:52, alex chen wrote:
>>>> Hi Changwei,
>>>>
>>>> Thanks for your reply.
>>>>
>>>> On 2017/11/1 16:28, Changwei Ge wrote:
>>>>> Hi Alex,
>>>>>
>>>>> On 2017/11/1 15:05, alex chen wrote:
>>>>>> Hi Joseph and Changwei,
>>>>>>
>>>>>> It's our basic principle that a function which may sleep can't be
>>>>>> called with a spinlock held.
>>>>>
>>>>> I suppose this principle is a suggestion, not a restriction.
>>>>>
>>>>
>>>> It is a very bad idea to sleep while holding a spinlock.
>>>> If a process grabs a spinlock and goes to sleep before releasing it,
>>>> it yields control of the CPU to a second process. If the second
>>>> process also needs to grab the spinlock, it will busy-wait. On a
>>>> uniprocessor machine the second process occupies the CPU and never
>>>> allows the first process to wake up and release the spinlock, so the
>>>> second process can't continue; it is basically a deadlock.
>>>
>>> I agree. This soft lockup truly exists. This point is OK for me.
>>>
>>>>>>
>>>>>> On 2017/11/1 9:03, Joseph Qi wrote:
>>>>>>> Hi Alex,
>>>>>>>
>>>>>>> On 17/10/31 20:41, alex chen wrote:
>>>>>>>> In the following situation, down_write() will be called under
>>>>>>>> spin_lock(), which may lead to a soft lockup:
>>>>>>>>
>>>>>>>> o2hb_region_inc_user
>>>>>>>>   spin_lock(&o2hb_live_lock)
>>>>>>>>   o2hb_region_pin
>>>>>>>>     o2nm_depend_item
>>>>>>>>       configfs_depend_item
>>>>>>>>         inode_lock
>>>>>>>>           down_write
>>>>>>>>           --> here it may sleep and reschedule
>>>>>>>>
>>>>>>>> So we should unlock o2hb_live_lock before o2nm_depend_item(), and
>>>>>>>> get an item reference in advance to prevent the region from being
>>>>>>>> released.
>>>>>>>>
>>>>>>>> Signed-off-by: Alex Chen <alex.c...@huawei.com>
>>>>>>>> Reviewed-by: Yiwen Jiang <jiangyi...@huawei.com>
>>>>>>>> Reviewed-by: Jun Piao <piao...@huawei.com>
>>>>>>>> ---
>>>>>>>>  fs/ocfs2/cluster/heartbeat.c | 8 ++++++++
>>>>>>>>  1 file changed, 8 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
>>>>>>>> index d020604..f1142a9 100644
>>>>>>>> --- a/fs/ocfs2/cluster/heartbeat.c
>>>>>>>> +++ b/fs/ocfs2/cluster/heartbeat.c
>>>>>>>> @@ -2399,6 +2399,9 @@ static int o2hb_region_pin(const char *region_uuid)
>>>>>>>>  		if (reg->hr_item_pinned || reg->hr_item_dropped)
>>>>>>>>  			goto skip_pin;
>>>>>>>>
>>>>>>>> +		config_item_get(&reg->hr_item);
>>>>>>>> +		spin_unlock(&o2hb_live_lock);
>>>>>>>> +
>>>>>>> If we unlock here, the iteration of o2hb_all_regions is no longer
>>>>>>> safe.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Joseph
>>>>>>>
>>>>>>
>>>>>> In local heartbeat mode, we have already found the region and will
>>>>>> break out of the loop after depending on the item. We get the item
>>>>>> reference before spin_unlock(), which means the region will never
>>>>>> be released by o2hb_region_release() until we put the item
>>>>>> reference after spin_lock(&o2hb_live_lock), so we can safely
>>>>>> iterate over the list.
>>>>>> In global heartbeat mode, it doesn't matter that some regions may
>>>>>> be deleted after spin_unlock(), because we just pin all the active
>>>>>> regions.
>>>>>
>>>>> But we are still iterating over elements on the global list
>>>>> *o2hb_all_regions*, right? How can we guarantee that no more
>>>>> elements will be attached or detached while the lock is unlocked?
>>>>>
>>>>
>>>> In global heartbeat mode, we pin all regions when the first dependent
>>>> user appears and unpin all regions when the number of dependent users
>>>> falls to zero.
>>>> So no other region will be attached or detached after spin_unlock().
>>>
>>> Well, as for local heartbeat mode, I think we still need to protect
>>> that global list.
>>>
>>> Thanks,
>>> Changwei
>>>
>> For local heartbeat mode, as I said before, we have already found the
>> region and will break out of the loop after depending on the item, so
>> we just need to protect the region while depending on the item. The
>> global list doesn't need to be protected any more.
>
> Hi Alex,
> Um, I think this is too tricky.
>

The o2hb_live_lock mainly protects o2hb_all_regions, not the region, and
we will not iterate over the list o2hb_all_regions any more after we find
the region, so we can unlock here. If you have any other opinions, please
let me know; I would really appreciate it if you could express them in as
much detail as possible.

>>
>> I found a problem in my patch. reg->hr_item_pinned should be protected
>> by o2hb_live_lock, so we should call spin_lock(&o2hb_live_lock)
>> immediately after o2nm_depend_item().
>> I will modify the patch and resend it.
>
> Well, if you revise your patch that way, how can you keep the
> consistency between setting ::hr_item_pinned and pinning ::hr_item?
>
> What will ::hr_item_pinned stand for after your revision?
> I think this fix is fragile.
>

Firstly, ::hr_item_pinned is set in o2hb_region_pin() when mounting the
filesystem and cleared in o2hb_region_unpin() when unmounting it. Mount
and umount can only run one at a time, so ::hr_item_pinned will not be
set and cleared concurrently.
Secondly, although o2hb_region_pin() is called multiple times, the calls
are serialized in dlm_register_domain_handlers(), so ::hr_item_pinned
will not be set concurrently.
So we don't need to worry about the consistency of ::hr_item_pinned.

> Besides that, the issue reported by Fengguang seems to be a crash issue
> rather than a lockup one.
>
> Thanks,
> Changwei
>

Um, I think CONFIG_DEBUG_ATOMIC_SLEEP was defined in the kernel tested by
Fengguang, so the warning "BUG: sleeping function..." could be printed.
But normally CONFIG_DEBUG_ATOMIC_SLEEP is not defined in a released
kernel, so it is a soft lockup. You can read the function
___might_sleep() for more details.

Thanks,
Alex

>>
>> Thanks,
>> Alex
>>>>
>>>> Thanks,
>>>> Alex
>>>>>>
>>>>>> Thanks,
>>>>>> Alex
>>>>>>
>>>>>>>>  		/* Ignore ENOENT only for local hb (userdlm domain) */
>>>>>>>>  		ret = o2nm_depend_item(&reg->hr_item);
>>>>>>>>  		if (!ret) {
>>>>>>>> @@ -2410,9 +2413,14 @@ static int o2hb_region_pin(const char *region_uuid)
>>>>>>>>  		else {
>>>>>>>>  			mlog(ML_ERROR, "Pin region %s fails with %d\n",
>>>>>>>>  			     uuid, ret);
>>>>>>>> +			config_item_put(&reg->hr_item);
>>>>>>>> +			spin_lock(&o2hb_live_lock);
>>>>>>>>  			break;
>>>>>>>>  		}
>>>>>>>>  	}
>>>>>>>> +
>>>>>>>> +	config_item_put(&reg->hr_item);
>>>>>>>> +	spin_lock(&o2hb_live_lock);
>>>>>>>>  skip_pin:
>>>>>>>>  	if (found)
>>>>>>>>  		break;
>>>>>>>> --
>>>>>>>> 1.9.5.msysgit.1

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel