Re: [Cluster-devel] [BUG] fs: dlm: possible ABBA deadlock
On 2021/8/19 23:55, David Teigland wrote:
> On Thu, Aug 19, 2021 at 04:54:57PM +0800, Jia-Ju Bai wrote:
>> Hello,
>>
>> My static analysis tool reports a possible ABBA deadlock in the dlm
>> filesystem in Linux 5.10:
>>
>> dlm_recover_waiters_pre()
>>   mutex_lock(&ls->ls_waiters_mutex); --> line 5130
>>   recover_convert_waiter()
>>     _receive_convert_reply()
>>       lock_rsb()
>>         mutex_lock(&r->res_mutex); --> line 69
>>
>> dlm_recover_waiters_post()
>>   lock_rsb()
>>     mutex_lock(&r->res_mutex); --> line 69
>>   mutex_lock(&ls->ls_waiters_mutex); --> line 5307
>>
>> When dlm_recover_waiters_pre() and dlm_recover_waiters_post() are
>> concurrently executed, the deadlock can occur.
>>
>> I am not quite sure whether this possible deadlock is real and how to
>> fix it if it is real.
>> Any feedback would be appreciated, thanks :)
>
> They won't be concurrent, "pre" runs before recovery, and "post" is after.

Okay, thanks for your reply :)

Best wishes,
Jia-Ju Bai
[Cluster-devel] [BUG] fs: dlm: possible ABBA deadlock
Hello,

My static analysis tool reports a possible ABBA deadlock in the dlm
filesystem in Linux 5.10:

dlm_recover_waiters_pre()
  mutex_lock(&ls->ls_waiters_mutex); --> line 5130
  recover_convert_waiter()
    _receive_convert_reply()
      lock_rsb()
        mutex_lock(&r->res_mutex); --> line 69

dlm_recover_waiters_post()
  lock_rsb()
    mutex_lock(&r->res_mutex); --> line 69
  mutex_lock(&ls->ls_waiters_mutex); --> line 5307

When dlm_recover_waiters_pre() and dlm_recover_waiters_post() are
executed concurrently, the deadlock can occur: the first path takes
ls_waiters_mutex and then res_mutex, while the second takes the same
two locks in the opposite order.

I am not quite sure whether this possible deadlock is real and how to
fix it if it is real.
Any feedback would be appreciated, thanks :)

Reported-by: TOTE Robot

Best wishes,
Jia-Ju Bai
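[Editor's note] To illustrate the lock-order inversion the tool is flagging, here is a minimal, self-contained userspace sketch using plain pthreads. The names thread_pre/thread_post and the two mutexes are stand-ins for the reported dlm paths; this is not dlm code, only the generic ABBA pattern the report describes.

/* abba_demo.c - userspace analogue of the reported lock ordering.
 * thread_pre mimics the "pre" path (waiters_mutex, then res_mutex);
 * thread_post mimics the "post" path (res_mutex, then waiters_mutex).
 * If both run at the same time, each can end up holding one lock and
 * waiting forever for the other.  Build: cc abba_demo.c -pthread
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t waiters_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t res_mutex = PTHREAD_MUTEX_INITIALIZER;

static void *thread_pre(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&waiters_mutex);   /* lock A ...              */
	usleep(1000);                         /* widen the race window   */
	pthread_mutex_lock(&res_mutex);       /* ... then lock B         */
	printf("pre: got both locks\n");
	pthread_mutex_unlock(&res_mutex);
	pthread_mutex_unlock(&waiters_mutex);
	return NULL;
}

static void *thread_post(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&res_mutex);       /* lock B ...              */
	usleep(1000);
	pthread_mutex_lock(&waiters_mutex);   /* ... then lock A (ABBA)  */
	printf("post: got both locks\n");
	pthread_mutex_unlock(&waiters_mutex);
	pthread_mutex_unlock(&res_mutex);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, thread_pre, NULL);
	pthread_create(&b, NULL, thread_post, NULL);
	pthread_join(a, NULL);   /* hangs here if the deadlock is hit */
	pthread_join(b, NULL);
	return 0;
}

As David Teigland's reply above points out, the two dlm functions run in different phases of recovery ("pre" before, "post" after), so they are never concurrent and the inversion is not reachable in practice.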
Re: [Cluster-devel] [BUG] fs: gfs2: possible null-pointer dereferences in gfs2_rgrp_bh_get()
Thanks for the reply :)

On 2019/7/24 17:04, Steven Whitehouse wrote:
> Hi,
>
> On 24/07/2019 09:50, Jia-Ju Bai wrote:
>> In gfs2_rgrp_bh_get, there is an if statement on line 1191 to check
>> whether "rgd->rd_bits[0].bi_bh" is NULL.
>
> That is how we detect whether the rgrp has already been read in, so
> the function is skipped in the case that we've already read in the
> rgrp.
>
>> When "rgd->rd_bits[0].bi_bh" is NULL, it is used on line 1216:
>> gfs2_rgrp_in(rgd, (rgd->rd_bits[0].bi_bh)->b_data);
>
> No it isn't. See line 1196 where bi_bh is set, and where we also bail
> out (line 1198) in case it has not been set.

I overlooked the operation on line 1196... Thanks for the pointer, I am
sorry for the false report...

Best wishes,
Jia-Ju Bai
[BUG] fs: gfs2: possible null-pointer dereferences in gfs2_rgrp_bh_get()
In gfs2_rgrp_bh_get, there is an if statement on line 1191 to check
whether "rgd->rd_bits[0].bi_bh" is NULL.

When "rgd->rd_bits[0].bi_bh" is NULL, it is used on line 1216:
    gfs2_rgrp_in(rgd, (rgd->rd_bits[0].bi_bh)->b_data);
and on line 1225:
    gfs2_rgrp_ondisk2lvb(..., rgd->rd_bits[0].bi_bh->b_data);
and on line 1228:
    if (!gfs2_rgrp_lvb_valid(rgd))
Note that in gfs2_rgrp_lvb_valid(rgd), there is a statement on line 1114:
    struct gfs2_rgrp *str = (struct gfs2_rgrp *)rgd->rd_bits[0].bi_bh->b_data;

Thus, possible null-pointer dereferences may occur.

These bugs are found by a static analysis tool STCheck written by us.
I do not know how to correctly fix these bugs, so I only report them.

Best wishes,
Jia-Ju Bai
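[Editor's note] The control-flow argument in Steven Whitehouse's reply above is the key point: the pointer that is NULL-checked up front is either set by the read path or the function bails out before any dereference. A small self-contained sketch of that shape, with stand-in types and a hypothetical read_buffers() helper (not the real gfs2 structures or line numbers), may make the false positive easier to see:

/* check_then_use.c - sketch of the pattern in gfs2_rgrp_bh_get() */
#include <stdio.h>
#include <stdlib.h>

struct bh { char *b_data; };
struct bits { struct bh *bi_bh; };
struct rgrpd { struct bits rd_bits[1]; };

/* stand-in for the buffer read around line 1196, which sets bi_bh */
static int read_buffers(struct rgrpd *rgd)
{
	rgd->rd_bits[0].bi_bh = calloc(1, sizeof(struct bh));
	return rgd->rd_bits[0].bi_bh ? 0 : -1;
}

static int rgrp_bh_get(struct rgrpd *rgd)
{
	if (rgd->rd_bits[0].bi_bh)      /* ~line 1191: already read in   */
		return 0;

	if (read_buffers(rgd) != 0)     /* ~line 1196: set bi_bh ...     */
		return -1;              /* ~line 1198: ... or bail out   */

	/* ~line 1216 onward: bi_bh cannot be NULL on this path */
	printf("bi_bh is set at %p\n", (void *)rgd->rd_bits[0].bi_bh);
	return 0;
}

int main(void)
{
	struct rgrpd rgd = { { { NULL } } };
	int ret = rgrp_bh_get(&rgd);

	free(rgd.rd_bits[0].bi_bh);
	return ret ? 1 : 0;
}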
[PATCH] fs: gfs2: Fix a null-pointer dereference in gfs2_alloc_inode()
In gfs2_alloc_inode(), when kmem_cache_alloc() on line 1724 returns NULL,
ip is assigned to NULL. In this case, "return &ip->i_inode" will cause a
null-pointer dereference.

To fix this null-pointer dereference, NULL is returned when ip is NULL.

This bug is found by a static analysis tool STCheck written by us.

Signed-off-by: Jia-Ju Bai
---
 fs/gfs2/super.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 0acc5834f653..c07c3f4f8451 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -1728,8 +1728,9 @@ static struct inode *gfs2_alloc_inode(struct super_block *sb)
 		memset(&ip->i_res, 0, sizeof(ip->i_res));
 		RB_CLEAR_NODE(&ip->i_res.rs_node);
 		ip->i_rahead = 0;
-	}
-	return &ip->i_inode;
+		return &ip->i_inode;
+	} else
+		return NULL;
 }
 
 static void gfs2_free_inode(struct inode *inode)
-- 
2.17.0
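[Editor's note] The same fix is often written with an early return, which avoids growing the indentation of the success path. A minimal sketch of the function in that style, assuming only the initialization shown in the diff context; the cache name gfs2_inode_cachep and the GFP_KERNEL flag are assumptions, since neither appears in this patch:

static struct inode *gfs2_alloc_inode(struct super_block *sb)
{
	struct gfs2_inode *ip;

	/* cache name and GFP flag assumed; the early return is the point */
	ip = kmem_cache_alloc(gfs2_inode_cachep, GFP_KERNEL);
	if (!ip)
		return NULL;

	memset(&ip->i_res, 0, sizeof(ip->i_res));
	RB_CLEAR_NODE(&ip->i_res.rs_node);
	ip->i_rahead = 0;
	return &ip->i_inode;
}

Either form avoids computing &ip->i_inode from a NULL ip; the early-return form is the more common kernel idiom for allocation failures.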
[Cluster-devel] [PATCH] fs: dlm: lowcomms: Replace GFP_ATOMIC with GFP_KERNEL in receive_from_sock
receive_from_sock() is never called in atomic context.
The call chain ending up at receive_from_sock() is:
[1] receive_from_sock() <- process_recv_sockets()
process_recv_sockets() is only set as a parameter of INIT_WORK(), so it
runs in workqueue (process) context.
receive_from_sock() also calls mutex_lock(), which indicates that this
function is not called in atomic context.

Despite never being called from atomic context, receive_from_sock()
calls alloc_page() with GFP_ATOMIC, which cannot sleep and is more
likely to fail under memory pressure. GFP_ATOMIC is not necessary here
and can be replaced with GFP_KERNEL, which allows the allocator to
sleep and reclaim memory, improving the chance of a successful
allocation.

This is found by a static analysis tool named DCNS written by myself,
and I have also checked it manually.

Signed-off-by: Jia-Ju Bai <baijiaju1...@gmail.com>
---
 fs/dlm/lowcomms.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
index 4813d0e..2d4e230 100644
--- a/fs/dlm/lowcomms.c
+++ b/fs/dlm/lowcomms.c
@@ -633,7 +633,7 @@ static int receive_from_sock(struct connection *con)
 		 * This doesn't need to be atomic, but I think it should
 		 * improve performance if it is.
 		 */
-		con->rx_page = alloc_page(GFP_ATOMIC);
+		con->rx_page = alloc_page(GFP_KERNEL);
 		if (con->rx_page == NULL)
 			goto out_resched;
 		cbuf_init(&con->cb, PAGE_SIZE);
-- 
1.9.1
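[Editor's note] The rule of thumb behind this patch: GFP_KERNEL is for process context where sleeping is allowed, GFP_ATOMIC for contexts that must not sleep (interrupts, spinlock held). A hedged kernel-style sketch of the two cases; the helper names are hypothetical and this is not dlm code:

#include <linux/gfp.h>
#include <linux/mm.h>

/* Process context (e.g. a workqueue handler like process_recv_sockets):
 * sleeping is allowed, so GFP_KERNEL lets the allocator reclaim memory
 * and a single-page allocation rarely fails.
 */
static struct page *alloc_in_process_context(void)
{
	return alloc_page(GFP_KERNEL);
}

/* Interrupt or spinlock-held context: sleeping is forbidden, so
 * GFP_ATOMIC must be used; it dips into reserves, can fail, and the
 * NULL case always needs handling.
 */
static struct page *alloc_in_atomic_context(void)
{
	return alloc_page(GFP_ATOMIC);
}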
[Cluster-devel] [BUG] fs/dlm: A possible sleep-in-atomic bug in dlm_master_lookup
According to fs/dlm/lock.c, the kernel may sleep under a spinlock, and
the function call path is:

dlm_master_lookup (acquire the spinlock)
  dlm_send_rcom_lookup_dump
    create_rcom
      dlm_lowcomms_get_buffer
        nodeid2con
          mutex_lock --> may sleep

This bug is found by my static analysis tool and my code review.

Thanks,
Jia-Ju Bai
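[Editor's note] The general shape of the problem being reported, as a hedged kernel-style sketch (the lock and function names are placeholders, not the actual dlm code): a mutex_lock() reached while a spin_lock() is held may sleep, which is not allowed in atomic context.

#include <linux/spinlock.h>
#include <linux/mutex.h>

static DEFINE_SPINLOCK(table_lock);   /* placeholder spinlock            */
static DEFINE_MUTEX(conn_mutex);      /* placeholder sleeping lock       */

static void callee_may_sleep(void)
{
	mutex_lock(&conn_mutex);      /* may sleep while waiting          */
	/* ... do work ... */
	mutex_unlock(&conn_mutex);
}

static void caller_holding_spinlock(void)
{
	spin_lock(&table_lock);       /* enters atomic context            */
	callee_may_sleep();           /* sleeping here is the reported bug */
	spin_unlock(&table_lock);
}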