Re: [Cluster-devel] [PATCH] fs: gfs2: prevent OOB access in gfs2_read_sb()
On 13/10/2020 16:26, Anant Thazhemadam wrote: In gfs2_read_sb(), if the condition (d != sdp->sd_heightsize[x - 1] || m) isn't satisfied (in the first 11 iterations), the loop continues, and begins to perform out-of-bounds access. Fix this out-of-bounds access by introducing a condition in the for loop that ensures that no more than GFS2_MAX_META_HEIGHT + 1 elements are accessed. In addition to this, if the above condition is satisfied when x = GFS2_MAX_META_HEIGHT (which = 10), and the flow of control breaks out of the loop, then an out-of-bounds access is performed again while assigning sdp->sd_heightsize[x] = ~0 (since x would be 11 now.), and also the assertion that spd->sd_max_height <= GFS2_MAX_META_HEIGHT would fail. Fix this out-of-bounds access and ensure that the assertion doesn't fail by introducing another variable "index", which keeps track of the last valid value of x (pre-increment) that can be used. That's not quite the right approach. Your analysis below is correct: the problem stems from the block size in the superblock being zeroed by the fuzzer. So the correct fix would be to add a validation check for sb_bsize (gfs2_check_sb() is lacking somewhat). Valid values are powers of 2 between 512 and the page size. Just a heads-up to avoid duplication of effort: Fox Chen (CCed) has attempted to fix this also[1], but I don't know if they plan to send another patch. [1] https://www.redhat.com/archives/cluster-devel/2020-October/msg6.html Thanks, Andy Reported-by: syzbot+a5e2482a693e6b1e4...@syzkaller.appspotmail.com Tested-by: syzbot+a5e2482a693e6b1e4...@syzkaller.appspotmail.com Signed-off-by: Anant Thazhemadam --- I have one question here (potentially a place where I suspect this patch could have a fatal flaw and might need some rework). sdp->sd_max_height = x; sdp->sd_heightsize[x] = ~0; Were these lines written with the logic that the value of x would be equal to (sdp->sd_heightsize[]'s last index filled in by the loop) + 1? Or, is the expected value of x at these lines equal to (sdp->sd_heightsize[]'s last index as filled in by the loop)? I would appreciate it if someone could clarify for me, how this would hold against the second potential out-of-bounds access I mentioned in my commit message. An additional comment (which I feel is of some significance) on this. Reproducing the crash locally, I could infer that sdp->sd_fsb2bb_shift sdp->sd_sb.sb_bsize, sdp->sd_sb.sb_bsize_shift, and sdp->sd_inptrs were all 0. This by extension also means that in gfs2_read_sb(), all the attributes whose values were determined by performing some sort of calculation involving any one of these variables all resulted in either 0 or a negative value. Simply doing the math will also show how this was also the primary reason this OOB access occured in the first place. However, I still feel that gfs2_read_sb() could do with this bit of checking, since it fundamentally prevents OOB accesses from occuring in gfs2_read_sb() in all scenarios. Anyways, coming back to my initial point. Can having values like that be considered unacceptable and as something that needs to be handled (at gfs2_fill_super() maybe?) or is this non-anomalous behaviour and okay? fs/gfs2/ops_fstype.c | 15 --- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c index 6d18d2c91add..66ee8fb06ab9 100644 --- a/fs/gfs2/ops_fstype.c +++ b/fs/gfs2/ops_fstype.c @@ -281,7 +281,7 @@ static int gfs2_read_sb(struct gfs2_sbd *sdp, int silent) { u32 hash_blocks, ind_blocks, leaf_blocks; u32 tmp_blocks; - unsigned int x; + unsigned int x, index; int error; error = gfs2_read_super(sdp, GFS2_SB_ADDR >> sdp->sd_fsb2bb_shift, silent); @@ -329,20 +329,21 @@ static int gfs2_read_sb(struct gfs2_sbd *sdp, int silent) sdp->sd_heightsize[0] = sdp->sd_sb.sb_bsize - sizeof(struct gfs2_dinode); sdp->sd_heightsize[1] = sdp->sd_sb.sb_bsize * sdp->sd_diptrs; - for (x = 2;; x++) { + for (x = 2; x <= GFS2_MAX_META_HEIGHT; x++) { u64 space, d; u32 m; - space = sdp->sd_heightsize[x - 1] * sdp->sd_inptrs; + index = x; + space = sdp->sd_heightsize[index - 1] * sdp->sd_inptrs; d = space; m = do_div(d, sdp->sd_inptrs); - if (d != sdp->sd_heightsize[x - 1] || m) + if (d != sdp->sd_heightsize[index - 1] || m) break; - sdp->sd_heightsize[x] = space; + sdp->sd_heightsize[index] = space; } - sdp->sd_max_height = x; - sdp->sd_heightsize[x] = ~0; + sdp->sd_max_height = index; + sdp->sd_heightsize[index] = ~0; gfs2_assert(sdp, sdp->sd_max_height <= GFS2_MAX_META_HEIGHT); sdp->sd_max_dents_per_leaf = (sdp->sd_sb.sb_bsize -
Re: [Cluster-devel] KASAN: slab-out-of-bounds Write in gfs2_fill_super
On 14/10/2020 13:19, Anant Thazhemadam wrote: On 30/09/20 7:52 pm, Andrew Price wrote: On 30/09/2020 13:39, syzbot wrote: Hello, syzbot found the following issue on: HEAD commit: fb0155a0 Merge tag 'nfs-for-5.9-3' of git://git.linux-nfs... git tree: upstream console output: https://syzkaller.appspot.com/x/log.txt?x=13458c0f90 kernel config: https://syzkaller.appspot.com/x/.config?x=adebb40048274f92 dashboard link: https://syzkaller.appspot.com/bug?extid=af90d47a37376844e731 Just saw this report. This seems to be the same as https://syzkaller.appspot.com/bug?extid=a5e2482a693e6b1e444b , for which I have recently sent in a fix (https://lkml.org/lkml/2020/10/13/588). Thanks. The gfs2 maintainers are probably busy but I'll review the patch. Since the "Reported-by" tag in the patch sent is for the other instance of the same crash, can we close this one as a duplicate? I expect the duplicates will get closed once a fix is in the tree so there's no need to spend time on that. Andy
Re: [Cluster-devel] general protection fault in gfs2_rgrp_dump
On 06/10/2020 13:48, syzbot wrote: Hello, syzbot found the following issue on: HEAD commit:7575fdda Merge tag 'platform-drivers-x86-v5.9-2' of git://.. git tree: upstream console output: https://syzkaller.appspot.com/x/log.txt?x=14abb7c790 kernel config: https://syzkaller.appspot.com/x/.config?x=de7f697da23057c7 dashboard link: https://syzkaller.appspot.com/bug?extid=43fa87986bdd31df9de6 compiler: clang version 10.0.0 (https://github.com/llvm/llvm-project/ c2443155a0fb245c8f17f2c1c72b6ea391e86e81) Unfortunately, I don't have any reproducer for this issue yet. IMPORTANT: if you fix the issue, please add the following tag to the commit: Reported-by: syzbot+43fa87986bdd31df9...@syzkaller.appspotmail.com gfs2: fsid=syz:syz.0: ri_addr = 20 ri_length = 1 ri_data0 = 21 ri_data = 2060 ri_bitbytes = 0 I could reproduce this by setting ri_bitbytes in the first rindex entry to 0. The bug is in the error path. Patch submitted: https://www.redhat.com/archives/cluster-devel/2020-October/msg8.html Andy start=0 len=0 offset=128 general protection fault, probably for non-canonical address 0xdc20: [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0100-0x0107] CPU: 1 PID: 19688 Comm: syz-executor.3 Not tainted 5.9.0-rc8-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:gfs2_rgrp_dump+0x3b/0x6c0 fs/gfs2/rgrp.c:2220 Code: 24 10 48 89 f3 48 89 7c 24 08 48 bd 00 00 00 00 00 fc ff df e8 06 7a 2b fe 48 89 ea 48 81 c3 00 01 00 00 48 89 d8 48 c1 e8 03 <80> 3c 28 00 74 12 48 89 df e8 97 60 6b fe 48 ba 00 00 00 00 00 fc RSP: 0018:c90009037758 EFLAGS: 00010202 RAX: 0020 RBX: 0100 RCX: 0004 RDX: dc00 RSI: 00016753 RDI: 00016754 RBP: dc00 R08: 83ddd758 R09: f52001206efa R10: f52001206efa R11: R12: 89364b22 R13: 888042e74000 R14: dc00 R15: 89364943 FS: 7fb8f261d700() GS:8880ae90() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 016a9e60 CR3: 959d9000 CR4: 001506e0 DR0: DR1: DR2: DR3: DR6: fffe0ff0 DR7: 0400 Call Trace: gfs2_consist_rgrpd_i+0xa1/0x110 fs/gfs2/util.c:422 compute_bitstructs fs/gfs2/rgrp.c:812 [inline] read_rindex_entry fs/gfs2/rgrp.c:909 [inline] gfs2_ri_update+0xb60/0x1860 fs/gfs2/rgrp.c:986 gfs2_rindex_update+0x283/0x320 fs/gfs2/rgrp.c:1032 init_inodes fs/gfs2/ops_fstype.c:792 [inline] gfs2_fill_super+0x28e7/0x3fe0 fs/gfs2/ops_fstype.c:1125 get_tree_bdev+0x3e9/0x5f0 fs/super.c:1342 gfs2_get_tree+0x4c/0x1f0 fs/gfs2/ops_fstype.c:1201 vfs_get_tree+0x88/0x270 fs/super.c:1547 do_new_mount fs/namespace.c:2875 [inline] path_mount+0x179d/0x29e0 fs/namespace.c:3192 do_mount fs/namespace.c:3205 [inline] __do_sys_mount fs/namespace.c:3413 [inline] __se_sys_mount+0x126/0x180 fs/namespace.c:3390 do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x46087a Code: b8 a6 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 ad 89 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 0f 83 8a 89 fb ff c3 66 0f 1f 84 00 00 00 00 00 RSP: 002b:7fb8f261ca88 EFLAGS: 0202 ORIG_RAX: 00a5 RAX: ffda RBX: 7fb8f261cb20 RCX: 0046087a RDX: 2000 RSI: 2100 RDI: 7fb8f261cae0 RBP: 7fb8f261cae0 R08: 7fb8f261cb20 R09: 2000 R10: R11: 0202 R12: 2000 R13: 2100 R14: 2200 R15: 20047a20 Modules linked in: ---[ end trace 8711b33583174bc7 ]--- RIP: 0010:gfs2_rgrp_dump+0x3b/0x6c0 fs/gfs2/rgrp.c:2220 Code: 24 10 48 89 f3 48 89 7c 24 08 48 bd 00 00 00 00 00 fc ff df e8 06 7a 2b fe 48 89 ea 48 81 c3 00 01 00 00 48 89 d8 48 c1 e8 03 <80> 3c 28 00 74 12 48 89 df e8 97 60 6b fe 48 ba 00 00 00 00 00 fc RSP: 0018:c90009037758 EFLAGS: 00010202 RAX: 0020 RBX: 0100 RCX: 0004 RDX: dc00 RSI: 00016753 RDI: 00016754 RBP: dc00 R08: 83ddd758 R09: f52001206efa R10: f52001206efa R11: R12: 89364b22 R13: 888042e74000 R14: dc00 R15: 89364943 FS: 7fb8f261d700() GS:8880ae90() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 016a9e60 CR3: 959d9000 CR4: 001506e0 DR0: DR1: DR2: DR3: DR6: fffe0ff0 DR7: 0400 --- This report is generated by a bot. It may contain errors. See https://goo.gl/tpsmEJ for more information about syzbot. syzbot engineers can be reached at
Re: [Cluster-devel] [PATCH] gfs2: gfs2_read_sb: put gfs2_assert inside the loop
On 03/10/2020 07:31, Fox Chen wrote: for (x = 2;; x++) { ... gfs2_assert(sdp, x <= GFS2_MAX_META_HEIGHT); <--- after ... if (d != sdp->sd_heightsize[x - 1] || m) break; sdp->sd_heightsize[x] = space; } sdp->sd_max_height = x gfs2_assert(sdp, sdp->sd_max_height <= GFS2_MAX_META_HEIGHT) <--- before Before this patch, gfs2_assert is put outside of the loop of sdp->sd_heightsize[x] calculation. When something goes wrong, So this looks related to one of the recent syzbot reports, where the "something goes wrong" is the block size in the on-disk superblock was zeroed and that leads eventually to this out-of-bounds write. The correct fix in that case would be to add a validity check for the block size in gfs2_check_sb(). Andy x exceeds the size of GFS2_MAX_META_HEIGHT, it may already crash inside the loop when sdp->sd_heightsize[x] = space tries to reach the out-of-bound location, gfs2_assert won't help here. This patch fixes this by moving gfs2_assert into the loop. We will check x value each time to see if it exceeds GFS2_MAX_META_HEIGHT. Signed-off-by: Fox Chen --- fs/gfs2/ops_fstype.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c index 6d18d2c91add..6cc32e3010f2 100644 --- a/fs/gfs2/ops_fstype.c +++ b/fs/gfs2/ops_fstype.c @@ -333,6 +333,7 @@ static int gfs2_read_sb(struct gfs2_sbd *sdp, int silent) u64 space, d; u32 m; + gfs2_assert(sdp, x <= GFS2_MAX_META_HEIGHT); space = sdp->sd_heightsize[x - 1] * sdp->sd_inptrs; d = space; m = do_div(d, sdp->sd_inptrs); @@ -343,7 +344,6 @@ static int gfs2_read_sb(struct gfs2_sbd *sdp, int silent) } sdp->sd_max_height = x; sdp->sd_heightsize[x] = ~0; - gfs2_assert(sdp, sdp->sd_max_height <= GFS2_MAX_META_HEIGHT); sdp->sd_max_dents_per_leaf = (sdp->sd_sb.sb_bsize - sizeof(struct gfs2_leaf)) /
Re: [Cluster-devel] KASAN: slab-out-of-bounds Write in gfs2_fill_super
On 30/09/2020 13:39, syzbot wrote: Hello, syzbot found the following issue on: HEAD commit:fb0155a0 Merge tag 'nfs-for-5.9-3' of git://git.linux-nfs... git tree: upstream console output: https://syzkaller.appspot.com/x/log.txt?x=13458c0f90 kernel config: https://syzkaller.appspot.com/x/.config?x=adebb40048274f92 dashboard link: https://syzkaller.appspot.com/bug?extid=af90d47a37376844e731 compiler: clang version 10.0.0 (https://github.com/llvm/llvm-project/ c2443155a0fb245c8f17f2c1c72b6ea391e86e81) syz repro: https://syzkaller.appspot.com/x/repro.syz?x=15c307d390 C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1353d58d90 Bisection is inconclusive: the issue happens on the oldest tested release. bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=106acbbb90 final oops: https://syzkaller.appspot.com/x/report.txt?x=126acbbb90 console output: https://syzkaller.appspot.com/x/log.txt?x=146acbbb90 IMPORTANT: if you fix the issue, please add the following tag to the commit: Reported-by: syzbot+af90d47a37376844e...@syzkaller.appspotmail.com gfs2: fsid=loop0: Trying to join cluster "lock_nolock", "loop0" gfs2: fsid=loop0: Now mounting FS... == BUG: KASAN: slab-out-of-bounds in gfs2_read_sb fs/gfs2/ops_fstype.c:342 [inline] BUG: KASAN: slab-out-of-bounds in init_sb fs/gfs2/ops_fstype.c:479 [inline] BUG: KASAN: slab-out-of-bounds in gfs2_fill_super+0x1db5/0x3fe0 fs/gfs2/ops_fstype.c:1096 Write of size 8 at addr 88809073d548 by task syz-executor940/6853 Bug filed for this: https://bugzilla.redhat.com/show_bug.cgi?id=1883929 Andy CPU: 1 PID: 6853 Comm: syz-executor940 Not tainted 5.9.0-rc7-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1d6/0x29e lib/dump_stack.c:118 print_address_description+0x66/0x620 mm/kasan/report.c:383 __kasan_report mm/kasan/report.c:513 [inline] kasan_report+0x132/0x1d0 mm/kasan/report.c:530 gfs2_read_sb fs/gfs2/ops_fstype.c:342 [inline] init_sb fs/gfs2/ops_fstype.c:479 [inline] gfs2_fill_super+0x1db5/0x3fe0 fs/gfs2/ops_fstype.c:1096 get_tree_bdev+0x3e9/0x5f0 fs/super.c:1342 gfs2_get_tree+0x4c/0x1f0 fs/gfs2/ops_fstype.c:1201 vfs_get_tree+0x88/0x270 fs/super.c:1547 do_new_mount fs/namespace.c:2875 [inline] path_mount+0x179d/0x29e0 fs/namespace.c:3192 do_mount fs/namespace.c:3205 [inline] __do_sys_mount fs/namespace.c:3413 [inline] __se_sys_mount+0x126/0x180 fs/namespace.c:3390 do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x446dba Code: b8 08 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 fd ad fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 0f 83 da ad fb ff c3 66 0f 1f 84 00 00 00 00 00 RSP: 002b:7fff4c56e748 EFLAGS: 0293 ORIG_RAX: 00a5 RAX: ffda RBX: 7fff4c56e7a0 RCX: 00446dba RDX: 2000 RSI: 2100 RDI: 7fff4c56e760 RBP: 7fff4c56e760 R08: 7fff4c56e7a0 R09: 7fff0015 R10: 0220 R11: 0293 R12: 0001 R13: 0004 R14: 0003 R15: 0003 Allocated by task 6853: kasan_save_stack mm/kasan/common.c:48 [inline] kasan_set_track mm/kasan/common.c:56 [inline] __kasan_kmalloc+0x100/0x130 mm/kasan/common.c:461 kmem_cache_alloc_trace+0x1e4/0x2e0 mm/slab.c:3554 kmalloc include/linux/slab.h:554 [inline] kzalloc include/linux/slab.h:666 [inline] init_sbd fs/gfs2/ops_fstype.c:77 [inline] gfs2_fill_super+0xb6/0x3fe0 fs/gfs2/ops_fstype.c:1018 get_tree_bdev+0x3e9/0x5f0 fs/super.c:1342 gfs2_get_tree+0x4c/0x1f0 fs/gfs2/ops_fstype.c:1201 vfs_get_tree+0x88/0x270 fs/super.c:1547 do_new_mount fs/namespace.c:2875 [inline] path_mount+0x179d/0x29e0 fs/namespace.c:3192 do_mount fs/namespace.c:3205 [inline] __do_sys_mount fs/namespace.c:3413 [inline] __se_sys_mount+0x126/0x180 fs/namespace.c:3390 do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 The buggy address belongs to the object at 88809073c000 which belongs to the cache kmalloc-8k of size 8192 The buggy address is located 5448 bytes inside of 8192-byte region [88809073c000, 88809073e000) The buggy address belongs to the page: page:bd4b0b2d refcount:1 mapcount:0 mapping: index:0x0 pfn:0x9073c head:bd4b0b2d order:2 compound_mapcount:0 compound_pincount:0 flags: 0xfffe010200(slab|head) raw: 00fffe010200 ea00028e5608 8880aa441b50 8880aa440a00 raw: 88809073c000 00010001 page dumped because: kasan: bad access detected Memory state around the buggy address: 88809073d400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 8880907
Re: general protection fault in gfs2_withdraw
On 29/09/2020 06:34, syzbot wrote: syzbot has bisected this issue to: commit 601ef0d52e9617588fcff3df26953592f2eb44ac Author: Bob Peterson Date: Tue Jan 28 19:23:45 2020 + gfs2: Force withdraw to replay journals and wait for it to finish bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=151d25e390 start commit: 7c7ec322 Merge tag 'for-linus' of git://git.kernel.org/pub.. git tree: upstream final oops: https://syzkaller.appspot.com/x/report.txt?x=171d25e390 console output: https://syzkaller.appspot.com/x/log.txt?x=131d25e390 kernel config: https://syzkaller.appspot.com/x/.config?x=6184b75aa6d48d66 dashboard link: https://syzkaller.appspot.com/bug?extid=50a8a9cf8127f2c6f5df syz repro: https://syzkaller.appspot.com/x/repro.syz?x=13c6a10990 C reproducer: https://syzkaller.appspot.com/x/repro.c?x=15d45ed390 Reported-by: syzbot+50a8a9cf8127f2c6f...@syzkaller.appspotmail.com Fixes: 601ef0d52e96 ("gfs2: Force withdraw to replay journals and wait for it to finish") For information about bisection process see: https://goo.gl/tpsmEJ#bisection Bug filed for this one: https://bugzilla.redhat.com/show_bug.cgi?id=1883932 Andy
Re: [Cluster-devel] general protection fault in gfs2_withdraw
On 26/09/2020 18:21, syzbot wrote: syzbot has found a reproducer for the following issue on: HEAD commit:7c7ec322 Merge tag 'for-linus' of git://git.kernel.org/pub.. git tree: upstream console output: https://syzkaller.appspot.com/x/log.txt?x=11f2ff2790 kernel config: https://syzkaller.appspot.com/x/.config?x=6184b75aa6d48d66 dashboard link: https://syzkaller.appspot.com/bug?extid=50a8a9cf8127f2c6f5df compiler: clang version 10.0.0 (https://github.com/llvm/llvm-project/ c2443155a0fb245c8f17f2c1c72b6ea391e86e81) syz repro: https://syzkaller.appspot.com/x/repro.syz?x=160fb77390 C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1104f10990 IMPORTANT: if you fix the issue, please add the following tag to the commit: Reported-by: syzbot+50a8a9cf8127f2c6f...@syzkaller.appspotmail.com gfs2: fsid=syz:syz.0: fatal: invalid metadata block bh = 2072 (magic number) function = gfs2_meta_indirect_buffer, file = fs/gfs2/meta_io.c, line = 417 gfs2: fsid=syz:syz.0: about to withdraw this file system general protection fault, probably for non-canonical address 0xdc0e: [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0070-0x0077] CPU: 0 PID: 6842 Comm: syz-executor264 Not tainted 5.9.0-rc6-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:signal_our_withdraw fs/gfs2/util.c:97 [inline] Seems that it's withdrawing in the init_inodes() path early enough (while looking up the jindex) that sdp->sd_jdesc is still NULL here: static void signal_our_withdraw(struct gfs2_sbd *sdp) { struct gfs2_glock *gl = sdp->sd_live_gh.gh_gl; struct inode *inode = sdp->sd_jdesc->jd_inode; I'm undecided as to whether the bug is that we're withdrawing that early at all, or that we're not checking for NULL there? Probably introduced by: 601ef0d52e96 gfs2: Force withdraw to replay journals and wait for it to finish Andy RIP: 0010:gfs2_withdraw+0x2b0/0xe20 fs/gfs2/util.c:294 Code: e8 03 48 89 44 24 38 42 80 3c 38 00 74 08 48 89 ef e8 34 f7 69 fe 48 89 6c 24 20 48 8b 6d 00 48 83 c5 70 48 89 e8 48 c1 e8 03 <42> 80 3c 38 00 74 08 48 89 ef e8 11 f7 69 fe 48 8b 45 00 48 89 44 RSP: 0018:c900057474f0 EFLAGS: 00010202 RAX: 000e RBX: 8880a71e RCX: 98268db4dfe86a00 RDX: 888092bb6100 RSI: RDI: 8880a71e0430 RBP: 0070 R08: 834ad50c R09: ed1015d041c3 R10: ed1015d041c3 R11: R12: 111014e3c04d R13: 8880a71e0050 R14: 8880a71e026c R15: dc00 FS: 0233b880() GS:8880ae80() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 7f74f826d6c0 CR3: a04cc000 CR4: 001506f0 DR0: DR1: DR2: DR3: DR6: fffe0ff0 DR7: 0400 Call Trace: gfs2_meta_check_ii+0x70/0x80 fs/gfs2/util.c:450 gfs2_metatype_check_i fs/gfs2/util.h:126 [inline] gfs2_meta_indirect_buffer+0x29f/0x380 fs/gfs2/meta_io.c:417 gfs2_meta_inode_buffer fs/gfs2/meta_io.h:70 [inline] gfs2_inode_refresh+0x65/0xc00 fs/gfs2/glops.c:438 inode_go_lock+0x12c/0x480 fs/gfs2/glops.c:468 do_promote+0x4db/0xcd0 fs/gfs2/glock.c:390 finish_xmote+0x907/0x1350 fs/gfs2/glock.c:560 do_xmote+0xadb/0x14c0 fs/gfs2/glock.c:686 gfs2_glock_nq+0xac3/0x14d0 fs/gfs2/glock.c:1410 gfs2_glock_nq_init fs/gfs2/glock.h:238 [inline] gfs2_lookupi+0x36f/0x4f0 fs/gfs2/inode.c:317 gfs2_lookup_simple+0xa4/0x100 fs/gfs2/inode.c:268 init_journal+0x132/0x1970 fs/gfs2/ops_fstype.c:620 init_inodes fs/gfs2/ops_fstype.c:756 [inline] gfs2_fill_super+0x2717/0x3fe0 fs/gfs2/ops_fstype.c:1125 get_tree_bdev+0x3e9/0x5f0 fs/super.c:1342 gfs2_get_tree+0x4c/0x1f0 fs/gfs2/ops_fstype.c:1201 vfs_get_tree+0x88/0x270 fs/super.c:1547 do_new_mount fs/namespace.c:2875 [inline] path_mount+0x179d/0x29e0 fs/namespace.c:3192 do_mount fs/namespace.c:3205 [inline] __do_sys_mount fs/namespace.c:3413 [inline] __se_sys_mount+0x126/0x180 fs/namespace.c:3390 do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x458e1a Code: b8 08 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 fd ad fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 0f 83 da ad fb ff c3 66 0f 1f 84 00 00 00 00 00 RSP: 002b:7ffc76f65c88 EFLAGS: 0293 ORIG_RAX: 00a5 RAX: ffda RBX: 7ffc76f65ce0 RCX: 00458e1a RDX: 2000 RSI: 2100 RDI: 7ffc76f65ca0 RBP: 7ffc76f65ca0 R08: 7ffc76f65ce0 R09: 7ffc0015 R10: R11: 0293 R12: 0809 R13: 0004 R14: 0003 R15: 0003 Modules linked in: ---[ end trace 1e62174917573e95 ]--- RIP: 0010:signal_our_withdraw fs/gfs2/util.c:97 [inline] RIP: 0010:
Re: [RFC PATCH 1/4] vfs: Create fs_context-aware mount_bdev() replacement
On 27/03/2019 11:23, David Howells wrote: Andrew Price wrote: + up_write(&s->s_umount); + blkdev_put(bdev, fc->bdev_mode); + down_write(&s->s_umount); fc->bdev should be NULLed here (or, on the way out of sget_fc() might be more appropriate) otherwise we get a double-blkdev_put() leading to NULL pointer derefs later. This happens when I mount a device twice and then unmount them, or mount it 3 times. How about the attached changes? Note that they conflict a bit with the mtd changes from where I took the destructor piece. David It fixes my bug and I see nothing unexpected from an xfstests -g quick run with my gfs2 patch on top. Looks good. Thanks, Andy --- commit 161602f963ae5a05bdb834461f7a4896286fbde0 Author: David Howells Date: Wed Mar 27 11:12:57 2019 + bdev handling fix diff --git a/fs/fs_context.c b/fs/fs_context.c index 9e22fe6aafe7..fc8fda4b9fe9 100644 --- a/fs/fs_context.c +++ b/fs/fs_context.c @@ -425,10 +425,8 @@ void put_fs_context(struct fs_context *fc) if (fc->need_free && fc->ops && fc->ops->free) fc->ops->free(fc); -#ifdef CONFIG_BLOCK - if (fc->bdev) - blkdev_put(fc->bdev, fc->bdev_mode); -#endif + if (fc->dev_destructor) + fc->dev_destructor(fc); security_free_mnt_opts(&fc->security); put_net(fc->net_ns); diff --git a/fs/super.c b/fs/super.c index 85851adb0f19..e9e678d2c37c 100644 --- a/fs/super.c +++ b/fs/super.c @@ -1211,8 +1211,17 @@ int vfs_get_super(struct fs_context *fc, EXPORT_SYMBOL(vfs_get_super); #ifdef CONFIG_BLOCK +static void fc_bdev_destructor(struct fs_context *fc) +{ + if (fc->bdev) { + blkdev_put(fc->bdev, fc->bdev_mode); + fc->bdev = NULL; + } +} + static int set_bdev_super_fc(struct super_block *s, struct fs_context *fc) { + s->s_mode = fc->bdev_mode; s->s_bdev = fc->bdev; s->s_dev = s->s_bdev->bd_dev; s->s_bdi = bdi_get(s->s_bdev->bd_bdi); @@ -1252,6 +1261,9 @@ int vfs_get_block_super(struct fs_context *fc, return PTR_ERR(bdev); } + fc->dev_destructor = fc_bdev_destructor; + fc->bdev = bdev; + /* Once the superblock is inserted into the list by sget_fc(), s_umount * will protect the lockfs code from trying to start a snapshot while * we are mounting @@ -1260,18 +1272,14 @@ int vfs_get_block_super(struct fs_context *fc, if (bdev->bd_fsfreeze_count > 0) { mutex_unlock(&bdev->bd_fsfreeze_mutex); warnf(fc, "%pg: Can't mount, blockdev is frozen", bdev); - error = -EBUSY; - goto error_bdev; + return -EBUSY; } - fc->bdev = bdev; fc->sb_flags |= SB_NOSEC; s = sget_fc(fc, test_bdev_super_fc, set_bdev_super_fc); mutex_unlock(&bdev->bd_fsfreeze_mutex); - if (IS_ERR(s)) { - error = PTR_ERR(s); - goto error_bdev; - } + if (IS_ERR(s)) + return PTR_ERR(s); if (s->s_root) { /* Don't summarily change the RO/RW state. */ @@ -1281,16 +1289,10 @@ int vfs_get_block_super(struct fs_context *fc, goto error_sb; } - /* s_umount nests inside bd_mutex during __invalidate_device(). -* blkdev_put() acquires bd_mutex and can't be called under -* s_umount. Drop s_umount temporarily. This is safe as we're -* holding an active reference. + /* Leave fc->bdev to fc_bdev_destructor() to clean up to avoid +* locking conflicts. */ - up_write(&s->s_umount); - blkdev_put(bdev, fc->bdev_mode); - down_write(&s->s_umount); } else { - s->s_mode = fc->bdev_mode; snprintf(s->s_id, sizeof(s->s_id), "%pg", bdev); sb_set_blocksize(s, block_size(bdev)); error = fill_super(s, fc); @@ -1307,11 +1309,7 @@ int vfs_get_block_super(struct fs_context *fc, error_sb: deactivate_locked_super(s); -error_bdev: - if (fc->bdev) { - blkdev_put(fc->bdev, fc->bdev_mode); - fc->bdev = NULL; - } + /* Leave fc->bdev to fc_bdev_destructor() to clean up */ return error; } EXPORT_SYMBOL(vfs_get_block_super); @@ -1521,8 +1519,13 @@ int vfs_get_tree(struct fs_context *fc) * on the superblock. */ error = fc->ops->get_tree(fc); - if (error < 0) + if (error < 0) { + if (fc->dev_destructor) { + fc->dev_destructor
Re: [RFC PATCH 1/4] vfs: Create fs_context-aware mount_bdev() replacement
Hi David, I've been testing gfs2 on top of this patch and it seems... On 19/03/2019 16:23, David Howells wrote: Create a function, vfs_get_block_super(), that is fs_context-aware and a replacement for mount_bdev(). It caches the block device pointer and file open mode in the fs_context struct so that this information can be passed into sget_fc()'s test and set functions. Signed-off-by: David Howells --- fs/fs_context.c|2 + fs/super.c | 106 include/linux/fs_context.h |6 ++ 3 files changed, 114 insertions(+) diff --git a/fs/fs_context.c b/fs/fs_context.c index 87e3546b9a52..ea027762c0b2 100644 --- a/fs/fs_context.c +++ b/fs/fs_context.c @@ -425,6 +425,8 @@ void put_fs_context(struct fs_context *fc) if (fc->need_free && fc->ops && fc->ops->free) fc->ops->free(fc); + if (fc->bdev) + blkdev_put(fc->bdev, fc->bdev_mode); doing this means... security_free_mnt_opts(&fc->security); put_net(fc->net_ns); diff --git a/fs/super.c b/fs/super.c index f27ee08fb26f..85851adb0f19 100644 --- a/fs/super.c +++ b/fs/super.c @@ -1211,6 +1211,112 @@ int vfs_get_super(struct fs_context *fc, EXPORT_SYMBOL(vfs_get_super); #ifdef CONFIG_BLOCK +static int set_bdev_super_fc(struct super_block *s, struct fs_context *fc) +{ + s->s_bdev = fc->bdev; + s->s_dev = s->s_bdev->bd_dev; + s->s_bdi = bdi_get(s->s_bdev->bd_bdi); + fc->bdev = NULL; + return 0; +} + +static int test_bdev_super_fc(struct super_block *s, struct fs_context *fc) +{ + return s->s_bdev == fc->bdev; +} + +/** + * vfs_get_block_super - Get a superblock based on a single block device + * @fc: The filesystem context holding the parameters + * @keying: How to distinguish superblocks + * @fill_super: Helper to initialise a new superblock + */ +int vfs_get_block_super(struct fs_context *fc, + int (*fill_super)(struct super_block *, + struct fs_context *)) +{ + struct block_device *bdev; + struct super_block *s; + int error = 0; + + fc->bdev_mode = FMODE_READ | FMODE_EXCL; + if (!(fc->sb_flags & SB_RDONLY)) + fc->bdev_mode |= FMODE_WRITE; + + if (!fc->source) + return invalf(fc, "No source specified"); + + bdev = blkdev_get_by_path(fc->source, fc->bdev_mode, fc->fs_type); + if (IS_ERR(bdev)) { + errorf(fc, "%s: Can't open blockdev", fc->source); + return PTR_ERR(bdev); + } + + /* Once the superblock is inserted into the list by sget_fc(), s_umount +* will protect the lockfs code from trying to start a snapshot while +* we are mounting +*/ + mutex_lock(&bdev->bd_fsfreeze_mutex); + if (bdev->bd_fsfreeze_count > 0) { + mutex_unlock(&bdev->bd_fsfreeze_mutex); + warnf(fc, "%pg: Can't mount, blockdev is frozen", bdev); + error = -EBUSY; + goto error_bdev; + } + + fc->bdev = bdev; + fc->sb_flags |= SB_NOSEC; + s = sget_fc(fc, test_bdev_super_fc, set_bdev_super_fc); + mutex_unlock(&bdev->bd_fsfreeze_mutex); + if (IS_ERR(s)) { + error = PTR_ERR(s); + goto error_bdev; + } + + if (s->s_root) { + /* Don't summarily change the RO/RW state. */ + if ((fc->sb_flags ^ s->s_flags) & SB_RDONLY) { + warnf(fc, "%pg: Can't mount, would change RO state", bdev); + error = -EBUSY; + goto error_sb; + } + + /* s_umount nests inside bd_mutex during __invalidate_device(). +* blkdev_put() acquires bd_mutex and can't be called under +* s_umount. Drop s_umount temporarily. This is safe as we're +* holding an active reference. +*/ + up_write(&s->s_umount); + blkdev_put(bdev, fc->bdev_mode); + down_write(&s->s_umount); fc->bdev should be NULLed here (or, on the way out of sget_fc() might be more appropriate) otherwise we get a double-blkdev_put() leading to NULL pointer derefs later. This happens when I mount a device twice and then unmount them, or mount it 3 times. + } else { + s->s_mode = fc->bdev_mode; + snprintf(s->s_id, sizeof(s->s_id), "%pg", bdev); + sb_set_blocksize(s, block_size(bdev)); + error = fill_super(s, fc); + if (error) + goto error_sb; + + s->s_flags |= SB_ACTIVE; + bdev->bd_super = s; + } + + BUG_ON(fc->root); Maybe BUG_ON(fc->bdev); too? Cheers, Andy
Re: [Cluster-devel] [PATCH] dlm: config: Fix ENOMEM failures in make_cluster()
On 22/03/16 17:36, Andrew Price wrote: Commit 1ae1602de0 "configfs: switch ->default groups to a linked list" left the NULL gps pointer behind after removing the kcalloc() call which made it non-NULL. It also left the !gps check in place so make_cluster() now fails with ENOMEM. Remove the remaining uses of the gps variable to fix that. Could this go through the linux-dlm tree as a fix for -rc2? The impact is that dlm_controld fails: dlm_controld[890]: 171 dlm_controld 4.0.4 started dlm_controld[890]: 171 /sys/kernel/config/dlm/cluster: mkdir failed: 12 so it's pretty important. Cheers, Andy Reviewed-by: Bob Peterson Reviewed-by: Andreas Gruenbacher Signed-off-by: Andrew Price --- fs/dlm/config.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/fs/dlm/config.c b/fs/dlm/config.c index 5191121..1669f62 100644 --- a/fs/dlm/config.c +++ b/fs/dlm/config.c @@ -343,13 +343,12 @@ static struct config_group *make_cluster(struct config_group *g, struct dlm_cluster *cl = NULL; struct dlm_spaces *sps = NULL; struct dlm_comms *cms = NULL; - void *gps = NULL; cl = kzalloc(sizeof(struct dlm_cluster), GFP_NOFS); sps = kzalloc(sizeof(struct dlm_spaces), GFP_NOFS); cms = kzalloc(sizeof(struct dlm_comms), GFP_NOFS); - if (!cl || !gps || !sps || !cms) + if (!cl || !sps || !cms) goto fail; config_group_init_type_name(&cl->group, name, &cluster_type);
[PATCH] dlm: config: Fix ENOMEM failures in make_cluster()
Commit 1ae1602de0 "configfs: switch ->default groups to a linked list" left the NULL gps pointer behind after removing the kcalloc() call which made it non-NULL. It also left the !gps check in place so make_cluster() now fails with ENOMEM. Remove the remaining uses of the gps variable to fix that. Reviewed-by: Bob Peterson Reviewed-by: Andreas Gruenbacher Signed-off-by: Andrew Price --- fs/dlm/config.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/fs/dlm/config.c b/fs/dlm/config.c index 5191121..1669f62 100644 --- a/fs/dlm/config.c +++ b/fs/dlm/config.c @@ -343,13 +343,12 @@ static struct config_group *make_cluster(struct config_group *g, struct dlm_cluster *cl = NULL; struct dlm_spaces *sps = NULL; struct dlm_comms *cms = NULL; - void *gps = NULL; cl = kzalloc(sizeof(struct dlm_cluster), GFP_NOFS); sps = kzalloc(sizeof(struct dlm_spaces), GFP_NOFS); cms = kzalloc(sizeof(struct dlm_comms), GFP_NOFS); - if (!cl || !gps || !sps || !cms) + if (!cl || !sps || !cms) goto fail; config_group_init_type_name(&cl->group, name, &cluster_type); -- 2.4.3
Re: rcu: fix hlist_bl_set_first_rcu annotation
Hi, On 03/02/13 18:39, Paul E. McKenney wrote: On Wed, Jan 30, 2013 at 07:07:57PM +, Steven Whitehouse wrote: Abhi noticed that we were getting a complaint from the RCU subsystem about access of an RCU protected list under the write side bit lock. This patch adds additional annotation to check both the RCU read lock and the write side bit lock before printing a message. Signed-off-by: Steven Whitehouse Reported-by: Abhijith Das Tested-by: Abhijith Das Looks plausible to me on first glance, copying Nick Piggin and Christoph Hellwig. If they have no objections, I will queue this. Thanx, Paul Has this had any attention yet? I'm also seeing the complaint quite frequently: [ 68.738811] === [ 68.741380] [ INFO: suspicious RCU usage. ] [ 68.748785] 3.8.0-0.rc7.git1.1.fc19.x86_64 #1 Not tainted [ 68.750841] --- [ 68.752418] include/linux/rculist_bl.h:23 suspicious rcu_dereference_check() usage! [ 68.755264] [ 68.755264] other info that might help us debug this: [ 68.755264] [ 68.758030] [ 68.758030] rcu_scheduler_active = 1, debug_locks = 0 [ 68.760316] 1 lock held by mount/476: [ 68.761896] #0: (&type->s_umount_key#38/1){+.+.+.}, at: [] sget+0x39e/0x670 [ 68.767115] [ 68.767115] stack backtrace: [ 68.769529] Pid: 476, comm: mount Not tainted 3.8.0-0.rc7.git1.1.fc19.x86_64 #1 [ 68.772095] Call Trace: [ 68.772995] [] lockdep_rcu_suspicious+0xe7/0x120 [ 68.775184] [] search_bucket+0x118/0x160 [gfs2] [ 68.777559] [] gfs2_glock_get+0x603/0x7b0 [gfs2] [ 68.780749] [] ? gfs2_glock_get+0x5/0x7b0 [gfs2] [ 68.784173] [] gfs2_glock_nq_num+0x29/0x90 [gfs2] [ 68.786551] [] gfs2_mount+0x869/0xf30 [gfs2] [ 68.788672] [] ? sched_clock_cpu+0xa8/0x100 [ 68.790739] [] ? trace_hardirqs_off+0xd/0x10 [ 68.793042] [] ? local_clock+0x5f/0x70 [ 68.794940] [] ? __mutex_unlock_slowpath+0x80/0x170 [ 68.798236] [] mount_fs+0x39/0x1b0 [ 68.800409] [] ? __alloc_percpu+0x10/0x20 [ 68.803692] [] vfs_kern_mount+0x63/0xf0 [ 68.806773] [] do_mount+0x205/0xa90 [ 68.809669] [] ? might_fault+0x5c/0xb0 [ 68.812717] [] ? strndup_user+0x4b/0xf0 [ 68.816066] [] sys_mount+0x83/0xc0 [ 68.818284] [] system_call_fastpath+0x16/0x1b It would be good to have this silenced for 3.8 but I think there's not long to go. Thanks, Andy diff --git a/include/linux/list_bl.h b/include/linux/list_bl.h index 31f9d75..2eb8855 100644 --- a/include/linux/list_bl.h +++ b/include/linux/list_bl.h @@ -125,6 +125,11 @@ static inline void hlist_bl_unlock(struct hlist_bl_head *b) __bit_spin_unlock(0, (unsigned long *)b); } +static inline bool hlist_bl_is_locked(struct hlist_bl_head *b) +{ + return bit_spin_is_locked(0, (unsigned long *)b); +} + /** * hlist_bl_for_each_entry- iterate over list of given type * @tpos: the type * to use as a loop cursor. diff --git a/include/linux/rculist_bl.h b/include/linux/rculist_bl.h index cf1244f..4f216c5 100644 --- a/include/linux/rculist_bl.h +++ b/include/linux/rculist_bl.h @@ -20,7 +20,7 @@ static inline void hlist_bl_set_first_rcu(struct hlist_bl_head *h, static inline struct hlist_bl_node *hlist_bl_first_rcu(struct hlist_bl_head *h) { return (struct hlist_bl_node *) - ((unsigned long)rcu_dereference(h->first) & ~LIST_BL_LOCKMASK); + ((unsigned long)rcu_dereference_check(h->first, hlist_bl_is_locked(h)) & ~LIST_BL_LOCKMASK); } /** -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/