Re: [Cluster-devel] [BUG] fs/dlm: A possible sleep-in-atomic bug in dlm_master_lookup
On Sat, Oct 07, 2017 at 03:26:11AM +0100, Al Viro wrote: > On Sat, Oct 07, 2017 at 09:59:41AM +0800, Jia-Ju Bai wrote: > > According to fs/dlm/lock.c, the kernel may sleep under a spinlock, > > and the function call path is: > > dlm_master_lookup (acquire the spinlock) > > dlm_send_rcom_lookup_dump > > create_rcom > > dlm_lowcomms_get_buffer > > nodeid2con > > mutex_lock --> may sleep > > > > This bug is found by my static analysis tool and my code review. > > Umm... dlm_master_lookup() locking is not nice, but to trigger that > you would need a combination of > > * from_nodeid != our_nodeid (or we would've buggered off long before that > point) > * dir_nodeid == our_nodeid > * failing dlm_search_rsb_tree(&ls->ls_rsbtbl[b].keep, name, len, &r) > (success would have the lock dropped) > * succeeding dlm_search_rsb_tree(&ls->ls_rsbtbl[b].toss, name, len, &r) > * from_master being true > * r->res_master_nodeid != from_nodeid and r->res_master_nodeid == our_nodeid > (the former is follows from the latter, actually) > > The last one might or might not be impossible - I'm not familiar with dlm > guts, but it does have > log_error(ls, "from_master %d our_master", > from_nodeid); > just before that call, so it's worth a further look. dlm_send_rcom_lookup_dump() was for debugging and can be removed. It's a condition that shouldn't happen, and I'm guessing I added that to catch any evidence if it did. I'm surprised it wasn't removed in the final version of the patch, but after 5 years I don't remember what I was thinking. I've pushed a commit dropping it to linux-dlm.git next. Thanks, Dave
[Cluster-devel] [BUG] fs/dlm: A possible sleep-in-atomic bug in dlm_master_lookup
According to fs/dlm/lock.c, the kernel may sleep under a spinlock, and the function call path is: dlm_master_lookup (acquire the spinlock) dlm_send_rcom_lookup_dump create_rcom dlm_lowcomms_get_buffer nodeid2con mutex_lock --> may sleep This bug is found by my static analysis tool and my code review. Thanks, Jia-Ju Bai
Re: [Cluster-devel] [BUG] fs/dlm: A possible sleep-in-atomic bug in dlm_master_lookup
On Sat, Oct 07, 2017 at 09:59:41AM +0800, Jia-Ju Bai wrote: > According to fs/dlm/lock.c, the kernel may sleep under a spinlock, > and the function call path is: > dlm_master_lookup (acquire the spinlock) > dlm_send_rcom_lookup_dump > create_rcom > dlm_lowcomms_get_buffer > nodeid2con > mutex_lock --> may sleep > > This bug is found by my static analysis tool and my code review. Umm... dlm_master_lookup() locking is not nice, but to trigger that you would need a combination of * from_nodeid != our_nodeid (or we would've buggered off long before that point) * dir_nodeid == our_nodeid * failing dlm_search_rsb_tree(&ls->ls_rsbtbl[b].keep, name, len, &r) (success would have the lock dropped) * succeeding dlm_search_rsb_tree(&ls->ls_rsbtbl[b].toss, name, len, &r) * from_master being true * r->res_master_nodeid != from_nodeid and r->res_master_nodeid == our_nodeid (the former is follows from the latter, actually) The last one might or might not be impossible - I'm not familiar with dlm guts, but it does have log_error(ls, "from_master %d our_master", from_nodeid); just before that call, so it's worth a further look.