Re: [Cluster-devel] [BUG] fs: dlm: possible ABBA deadlock

2021-08-20 Thread Jia-Ju Bai




On 2021/8/19 23:55, David Teigland wrote:

On Thu, Aug 19, 2021 at 04:54:57PM +0800, Jia-Ju Bai wrote:

Hello,

My static analysis tool reports a possible ABBA deadlock in the dlm
filesystem in Linux 5.10:

dlm_recover_waiters_pre()
  mutex_lock(&ls->ls_waiters_mutex); --> line 5130
  recover_convert_waiter()
    _receive_convert_reply()
      lock_rsb()
        mutex_lock(&r->res_mutex); --> line 69

dlm_recover_waiters_post()
  lock_rsb()
    mutex_lock(&r->res_mutex); --> line 69
  mutex_lock(&ls->ls_waiters_mutex); --> line 5307

When dlm_recover_waiters_pre() and dlm_recover_waiters_post() are
concurrently executed, the deadlock can occur.

I am not quite sure whether this possible deadlock is real and how to fix it
if it is real.
Any feedback would be appreciated, thanks :)

They won't be concurrent: "pre" runs before recovery, and "post" runs after it.


Okay, thanks for your reply :)


Best wishes,
Jia-Ju Bai



[Cluster-devel] [BUG] fs: dlm: possible ABBA deadlock

2021-08-19 Thread Jia-Ju Bai

Hello,

My static analysis tool reports a possible ABBA deadlock in the dlm 
filesystem in Linux 5.10:


dlm_recover_waiters_pre()
  mutex_lock(&ls->ls_waiters_mutex); --> line 5130
  recover_convert_waiter()
    _receive_convert_reply()
      lock_rsb()
        mutex_lock(&r->res_mutex); --> line 69

dlm_recover_waiters_post()
  lock_rsb()
    mutex_lock(&r->res_mutex); --> line 69
  mutex_lock(&ls->ls_waiters_mutex); --> line 5307

When dlm_recover_waiters_pre() and dlm_recover_waiters_post() are 
concurrently executed, the deadlock can occur.
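
To make the ordering explicit, here is a stand-alone user-space sketch of
the two paths (plain pthreads, not dlm code; the mutex and function names
only mirror the report):

/*
 * pre_path() takes waiters_mutex then res_mutex (A then B), post_path()
 * takes them in the opposite order (B then A).  If the two really could
 * run at the same time, this is the classic ABBA deadlock.
 */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t waiters_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t res_mutex = PTHREAD_MUTEX_INITIALIZER;

static void *pre_path(void *arg)   /* mirrors dlm_recover_waiters_pre() */
{
	pthread_mutex_lock(&waiters_mutex);   /* A */
	pthread_mutex_lock(&res_mutex);       /* then B */
	pthread_mutex_unlock(&res_mutex);
	pthread_mutex_unlock(&waiters_mutex);
	return NULL;
}

static void *post_path(void *arg)  /* mirrors dlm_recover_waiters_post() */
{
	pthread_mutex_lock(&res_mutex);       /* B */
	pthread_mutex_lock(&waiters_mutex);   /* then A */
	pthread_mutex_unlock(&waiters_mutex);
	pthread_mutex_unlock(&res_mutex);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, pre_path, NULL);
	pthread_create(&b, NULL, post_path, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	printf("no deadlock this run\n");
	return 0;
}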


I am not quite sure whether this possible deadlock is real and how to 
fix it if it is real.

Any feedback would be appreciated, thanks :)

Reported-by: TOTE Robot 


Best wishes,
Jia-Ju Bai



Re: [Cluster-devel] [BUG] fs: gfs2: possible null-pointer dereferences in gfs2_rgrp_bh_get()

2019-07-24 Thread Jia-Ju Bai

Thanks for the reply :)

On 2019/7/24 17:04, Steven Whitehouse wrote:

Hi,

On 24/07/2019 09:50, Jia-Ju Bai wrote:
In gfs2_rgrp_bh_get, there is an if statement on line 1191 to check 
whether "rgd->rd_bits[0].bi_bh" is NULL.


That is how we detect whether the rgrp has already been read in, so
the rest of the function is skipped when it has.




When "rgd->rd_bits[0].bi_bh" is NULL, it is used on line 1216:
    gfs2_rgrp_in(rgd, (rgd->rd_bits[0].bi_bh)->b_data);


No it isn't. See line 1196 where bi_bh is set, and where we also bail 
out (line 1198) in case it has not been set.
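
For readers following along, here is a self-contained sketch of the control
flow being described (the types and the read_rgrp_buffer() helper are made
up for illustration; only the NULL-check / set / bail-out ordering mirrors
gfs2_rgrp_bh_get()):

#include <stdio.h>

struct buffer_head { char *b_data; };
struct rgrp_bits   { struct buffer_head *bi_bh; };
struct rgrp        { struct rgrp_bits rd_bits[1]; };

static char rgrp_data[] = "on-disk rgrp data";
static struct buffer_head backing = { rgrp_data };

static int read_rgrp_buffer(struct rgrp *rgd)      /* hypothetical helper */
{
	rgd->rd_bits[0].bi_bh = &backing;           /* "line 1196": bi_bh is set here */
	return 0;                                   /* non-zero would mean failure */
}

static const char *rgrp_bh_get(struct rgrp *rgd)
{
	if (rgd->rd_bits[0].bi_bh)                  /* "line 1191": already read in */
		return rgd->rd_bits[0].bi_bh->b_data;

	if (read_rgrp_buffer(rgd))                  /* "line 1198": bail out before */
		return NULL;                        /* bi_bh is ever dereferenced */

	/* only reached with bi_bh set, so the later uses ("line 1216" on) are safe */
	return rgd->rd_bits[0].bi_bh->b_data;
}

int main(void)
{
	struct rgrp rgd = { { { NULL } } };

	printf("%s\n", rgrp_bh_get(&rgd));
	return 0;
}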




I overlooked the operation on line 1196...
Thanks for the pointer, I am sorry for the false report...


Best wishes,
Jia-Ju Bai


[BUG] fs: gfs2: possible null-pointer dereferences in gfs2_rgrp_bh_get()

2019-07-24 Thread Jia-Ju Bai
In gfs2_rgrp_bh_get, there is an if statement on line 1191 to check 
whether "rgd->rd_bits[0].bi_bh" is NULL.

When "rgd->rd_bits[0].bi_bh" is NULL, it is used on line 1216:
    gfs2_rgrp_in(rgd, (rgd->rd_bits[0].bi_bh)->b_data);
and on line 1225:
    gfs2_rgrp_ondisk2lvb(..., rgd->rd_bits[0].bi_bh->b_data);
and on line 1228:
    if (!gfs2_rgrp_lvb_valid(rgd))

Note that in gfs2_rgrp_lvb_valid(rgd), there is a statement on line 1114:
    struct gfs2_rgrp *str = (struct gfs2_rgrp *)rgd->rd_bits[0].bi_bh->b_data;


Thus, possible null-pointer dereferences may occur.

These bugs were found by STCheck, a static analysis tool written by us.
I do not know how to correctly fix them, so I am only reporting them.


Best wishes,
Jia-Ju Bai


[PATCH] fs: gfs2: Fix a null-pointer dereference in gfs2_alloc_inode()

2019-07-24 Thread Jia-Ju Bai
In gfs2_alloc_inode(), when kmem_cache_alloc() on line 1724 returns
NULL, ip is NULL. In this case, "return &ip->i_inode" will cause a
null-pointer dereference.

To fix this null-pointer dereference, NULL is returned when ip is NULL.

This bug is found by a static analysis tool STCheck written by us.

Signed-off-by: Jia-Ju Bai 
---
 fs/gfs2/super.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 0acc5834f653..c07c3f4f8451 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -1728,8 +1728,9 @@ static struct inode *gfs2_alloc_inode(struct super_block *sb)
 		memset(&ip->i_res, 0, sizeof(ip->i_res));
 		RB_CLEAR_NODE(&ip->i_res.rs_node);
 		ip->i_rahead = 0;
-	}
-	return &ip->i_inode;
+		return &ip->i_inode;
+	} else
+		return NULL;
 }
 
 static void gfs2_free_inode(struct inode *inode)
-- 
2.17.0



[Cluster-devel] [PATCH] fs: dlm: lowcomms: Replace GFP_ATOMIC with GFP_KERNEL in receive_from_sock

2018-04-09 Thread Jia-Ju Bai
receive_from_sock() is never called in atomic context.

The call chain ending up at receive_from_sock() is:
[1] receive_from_sock() <- process_recv_sockets()
process_recv_sockets() is only used as a work handler passed to INIT_WORK(),
so it runs in process context. receive_from_sock() also calls mutex_lock(),
which confirms that it is not called in atomic context.

Despite never being called from atomic context, receive_from_sock()
calls alloc_page() with GFP_ATOMIC, which cannot sleep and therefore has
to draw on emergency reserves and fails more easily under memory pressure.
GFP_ATOMIC is not necessary here and can be replaced with GFP_KERNEL,
which is allowed to sleep and reclaim memory, improving the chance of a
successful allocation.
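
As a reference point, here is a minimal kernel-module sketch (not the dlm
code; all names are made up) of why a handler queued with INIT_WORK() may
use GFP_KERNEL: work items run in process context, where sleeping
allocations are allowed.

#include <linux/module.h>
#include <linux/workqueue.h>
#include <linux/gfp.h>

static struct work_struct demo_work;

static void demo_work_fn(struct work_struct *work)
{
	/* process context: GFP_KERNEL may sleep and reclaim memory */
	struct page *page = alloc_page(GFP_KERNEL);

	if (page)
		__free_page(page);
}

static int __init demo_init(void)
{
	INIT_WORK(&demo_work, demo_work_fn);
	schedule_work(&demo_work);
	return 0;
}

static void __exit demo_exit(void)
{
	flush_work(&demo_work);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");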

This was found by DCNS, a static analysis tool written by myself,
and I also checked it manually.

Signed-off-by: Jia-Ju Bai <baijiaju1...@gmail.com>
---
 fs/dlm/lowcomms.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
index 4813d0e..2d4e230 100644
--- a/fs/dlm/lowcomms.c
+++ b/fs/dlm/lowcomms.c
@@ -633,7 +633,7 @@ static int receive_from_sock(struct connection *con)
 * This doesn't need to be atomic, but I think it should
 * improve performance if it is.
 */
-   con->rx_page = alloc_page(GFP_ATOMIC);
+   con->rx_page = alloc_page(GFP_KERNEL);
if (con->rx_page == NULL)
goto out_resched;
	cbuf_init(&con->cb, PAGE_SIZE);
-- 
1.9.1



[Cluster-devel] [BUG] fs/dlm: A possible sleep-in-atomic bug in dlm_master_lookup

2017-10-09 Thread Jia-Ju Bai

According to fs/dlm/lock.c, the kernel may sleep while holding a spinlock.
The function call path is as follows (a stand-alone sketch of the pattern
is included after it):

dlm_master_lookup (acquires the spinlock)
  dlm_send_rcom_lookup_dump
    create_rcom
      dlm_lowcomms_get_buffer
        nodeid2con
          mutex_lock --> may sleep
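
Here is a minimal module sketch of the reported pattern (not the dlm code;
all names are made up): taking a mutex, which may sleep, while a spinlock
is held. With CONFIG_DEBUG_ATOMIC_SLEEP this produces a "sleeping function
called from invalid context" warning.

#include <linux/module.h>
#include <linux/spinlock.h>
#include <linux/mutex.h>

static DEFINE_SPINLOCK(demo_spinlock);   /* stands in for the lookup spinlock */
static DEFINE_MUTEX(demo_mutex);         /* stands in for the nodeid2con mutex */

static int __init demo_init(void)
{
	spin_lock(&demo_spinlock);       /* atomic context begins here */
	mutex_lock(&demo_mutex);         /* may sleep: invalid under a spinlock */
	mutex_unlock(&demo_mutex);
	spin_unlock(&demo_spinlock);
	return 0;
}

static void __exit demo_exit(void)
{
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");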

This bug was found by my static analysis tool and confirmed by my code review.

Thanks,
Jia-Ju Bai