Re: [Cluster-devel] [PATCH] fs: gfs2: prevent OOB access in gfs2_read_sb()

2020-10-14 Thread Andrew Price

On 13/10/2020 16:26, Anant Thazhemadam wrote:

In gfs2_read_sb(), if the condition
(d != sdp->sd_heightsize[x - 1] || m)
isn't satisfied (in the first 11 iterations), the loop continues,
and begins to perform out-of-bounds access.
Fix this out-of-bounds access by introducing a condition in the for loop
that ensures that no more than GFS2_MAX_META_HEIGHT + 1 elements are
accessed.

In addition to this, if the above condition is satisfied when
x = GFS2_MAX_META_HEIGHT (which = 10), and the flow of control breaks
out of the loop, then an out-of-bounds access is performed again while
assigning sdp->sd_heightsize[x] = ~0 (since x would be 11 now.), and
also the assertion that spd->sd_max_height <= GFS2_MAX_META_HEIGHT would
fail.
Fix this out-of-bounds access and ensure that the assertion doesn't fail
by introducing another variable "index", which keeps track of the last
valid value of x (pre-increment) that can be used.


That's not quite the right approach. Your analysis below is correct: the 
problem stems from the block size in the superblock being zeroed by the 
fuzzer. So the correct fix would be to add a validation check for 
sb_bsize (gfs2_check_sb() is lacking somewhat). Valid values are powers 
of 2 between 512 and the page size.


Just a heads-up to avoid duplication of effort: Fox Chen (CCed) has 
attempted to fix this also[1], but I don't know if they plan to send 
another patch.


[1] https://www.redhat.com/archives/cluster-devel/2020-October/msg6.html

Thanks,
Andy


Reported-by: syzbot+a5e2482a693e6b1e4...@syzkaller.appspotmail.com
Tested-by: syzbot+a5e2482a693e6b1e4...@syzkaller.appspotmail.com
Signed-off-by: Anant Thazhemadam 
---

I have one question here (potentially a place where I suspect this
patch could have a fatal flaw and might need some rework).

sdp->sd_max_height = x;
sdp->sd_heightsize[x] = ~0;

Were these lines written with the logic that the value of x would be
equal to (sdp->sd_heightsize[]'s last index filled in by the loop) + 1?
Or, is the expected value of x at these lines equal to
(sdp->sd_heightsize[]'s last index as filled in by the loop)?
I would appreciate it if someone could clarify for me, how this would
hold against the second potential out-of-bounds access I mentioned in my
commit message.

An additional comment (which I feel is of some significance) on this.
Reproducing the crash locally, I could infer that sdp->sd_fsb2bb_shift
sdp->sd_sb.sb_bsize, sdp->sd_sb.sb_bsize_shift, and sdp->sd_inptrs
were all 0.
This by extension also means that in gfs2_read_sb(), all the attributes
whose values were determined by performing some sort of calculation
involving any one of these variables all resulted in either 0 or a
negative value.
Simply doing the math will also show how this was also the primary reason
this OOB access occured in the first place.
However, I still feel that gfs2_read_sb() could do with this bit of checking,
since it fundamentally prevents OOB accesses from occuring in gfs2_read_sb()
in all scenarios.
Anyways, coming back to my initial point. Can having values like that be
considered unacceptable and as something that needs to be handled (at
gfs2_fill_super() maybe?) or is this non-anomalous behaviour and okay?

  fs/gfs2/ops_fstype.c | 15 ---
  1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index 6d18d2c91add..66ee8fb06ab9 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -281,7 +281,7 @@ static int gfs2_read_sb(struct gfs2_sbd *sdp, int silent)
  {
u32 hash_blocks, ind_blocks, leaf_blocks;
u32 tmp_blocks;
-   unsigned int x;
+   unsigned int x, index;
int error;
  
  	error = gfs2_read_super(sdp, GFS2_SB_ADDR >> sdp->sd_fsb2bb_shift, silent);

@@ -329,20 +329,21 @@ static int gfs2_read_sb(struct gfs2_sbd *sdp, int silent)
sdp->sd_heightsize[0] = sdp->sd_sb.sb_bsize -
sizeof(struct gfs2_dinode);
sdp->sd_heightsize[1] = sdp->sd_sb.sb_bsize * sdp->sd_diptrs;
-   for (x = 2;; x++) {
+   for (x = 2; x <= GFS2_MAX_META_HEIGHT; x++) {
u64 space, d;
u32 m;
  
-		space = sdp->sd_heightsize[x - 1] * sdp->sd_inptrs;

+   index = x;
+   space = sdp->sd_heightsize[index - 1] * sdp->sd_inptrs;
d = space;
m = do_div(d, sdp->sd_inptrs);
  
-		if (d != sdp->sd_heightsize[x - 1] || m)

+   if (d != sdp->sd_heightsize[index - 1] || m)
break;
-   sdp->sd_heightsize[x] = space;
+   sdp->sd_heightsize[index] = space;
}
-   sdp->sd_max_height = x;
-   sdp->sd_heightsize[x] = ~0;
+   sdp->sd_max_height = index;
+   sdp->sd_heightsize[index] = ~0;
gfs2_assert(sdp, sdp->sd_max_height <= GFS2_MAX_META_HEIGHT);
  
  	sdp->sd_max_dents_per_leaf = (sdp->sd_sb.sb_bsize -






Re: [Cluster-devel] KASAN: slab-out-of-bounds Write in gfs2_fill_super

2020-10-14 Thread Andrew Price

On 14/10/2020 13:19, Anant Thazhemadam wrote:


On 30/09/20 7:52 pm, Andrew Price wrote:

On 30/09/2020 13:39, syzbot wrote:

Hello,

syzbot found the following issue on:

HEAD commit:    fb0155a0 Merge tag 'nfs-for-5.9-3' of git://git.linux-nfs...
git tree:   upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=13458c0f90
kernel config:  https://syzkaller.appspot.com/x/.config?x=adebb40048274f92
dashboard link: https://syzkaller.appspot.com/bug?extid=af90d47a37376844e731




Just saw this report.
This seems to be the same as
     https://syzkaller.appspot.com/bug?extid=a5e2482a693e6b1e444b ,
for which I have recently sent in a fix (https://lkml.org/lkml/2020/10/13/588).


Thanks. The gfs2 maintainers are probably busy but I'll review the patch.


Since the "Reported-by" tag in the patch sent is for the other instance of the
same crash, can we close this one as a duplicate?


I expect the duplicates will get closed once a fix is in the tree so 
there's no need to spend time on that.


Andy



Re: [Cluster-devel] general protection fault in gfs2_rgrp_dump

2020-10-07 Thread Andrew Price

On 06/10/2020 13:48, syzbot wrote:

Hello,

syzbot found the following issue on:

HEAD commit:7575fdda Merge tag 'platform-drivers-x86-v5.9-2' of git://..
git tree:   upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=14abb7c790
kernel config:  https://syzkaller.appspot.com/x/.config?x=de7f697da23057c7
dashboard link: https://syzkaller.appspot.com/bug?extid=43fa87986bdd31df9de6
compiler:   clang version 10.0.0 (https://github.com/llvm/llvm-project/ 
c2443155a0fb245c8f17f2c1c72b6ea391e86e81)

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+43fa87986bdd31df9...@syzkaller.appspotmail.com

gfs2: fsid=syz:syz.0: ri_addr = 20
ri_length = 1
ri_data0 = 21
ri_data = 2060
ri_bitbytes = 0


I could reproduce this by setting ri_bitbytes in the first rindex entry 
to 0. The bug is in the error path.


Patch submitted: 
https://www.redhat.com/archives/cluster-devel/2020-October/msg8.html


Andy


start=0 len=0 offset=128
general protection fault, probably for non-canonical address 
0xdc20:  [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0100-0x0107]
CPU: 1 PID: 19688 Comm: syz-executor.3 Not tainted 5.9.0-rc8-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 
01/01/2011
RIP: 0010:gfs2_rgrp_dump+0x3b/0x6c0 fs/gfs2/rgrp.c:2220
Code: 24 10 48 89 f3 48 89 7c 24 08 48 bd 00 00 00 00 00 fc ff df e8 06 7a 2b fe 48 
89 ea 48 81 c3 00 01 00 00 48 89 d8 48 c1 e8 03 <80> 3c 28 00 74 12 48 89 df e8 
97 60 6b fe 48 ba 00 00 00 00 00 fc
RSP: 0018:c90009037758 EFLAGS: 00010202
RAX: 0020 RBX: 0100 RCX: 0004
RDX: dc00 RSI: 00016753 RDI: 00016754
RBP: dc00 R08: 83ddd758 R09: f52001206efa
R10: f52001206efa R11:  R12: 89364b22
R13: 888042e74000 R14: dc00 R15: 89364943
FS:  7fb8f261d700() GS:8880ae90() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 016a9e60 CR3: 959d9000 CR4: 001506e0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
  gfs2_consist_rgrpd_i+0xa1/0x110 fs/gfs2/util.c:422
  compute_bitstructs fs/gfs2/rgrp.c:812 [inline]
  read_rindex_entry fs/gfs2/rgrp.c:909 [inline]
  gfs2_ri_update+0xb60/0x1860 fs/gfs2/rgrp.c:986
  gfs2_rindex_update+0x283/0x320 fs/gfs2/rgrp.c:1032
  init_inodes fs/gfs2/ops_fstype.c:792 [inline]
  gfs2_fill_super+0x28e7/0x3fe0 fs/gfs2/ops_fstype.c:1125
  get_tree_bdev+0x3e9/0x5f0 fs/super.c:1342
  gfs2_get_tree+0x4c/0x1f0 fs/gfs2/ops_fstype.c:1201
  vfs_get_tree+0x88/0x270 fs/super.c:1547
  do_new_mount fs/namespace.c:2875 [inline]
  path_mount+0x179d/0x29e0 fs/namespace.c:3192
  do_mount fs/namespace.c:3205 [inline]
  __do_sys_mount fs/namespace.c:3413 [inline]
  __se_sys_mount+0x126/0x180 fs/namespace.c:3390
  do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46
  entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x46087a
Code: b8 a6 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 ad 89 fb ff c3 66 2e 0f 1f 84 00 
00 00 00 00 66 90 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 0f 83 8a 89 
fb ff c3 66 0f 1f 84 00 00 00 00 00
RSP: 002b:7fb8f261ca88 EFLAGS: 0202 ORIG_RAX: 00a5
RAX: ffda RBX: 7fb8f261cb20 RCX: 0046087a
RDX: 2000 RSI: 2100 RDI: 7fb8f261cae0
RBP: 7fb8f261cae0 R08: 7fb8f261cb20 R09: 2000
R10:  R11: 0202 R12: 2000
R13: 2100 R14: 2200 R15: 20047a20
Modules linked in:
---[ end trace 8711b33583174bc7 ]---
RIP: 0010:gfs2_rgrp_dump+0x3b/0x6c0 fs/gfs2/rgrp.c:2220
Code: 24 10 48 89 f3 48 89 7c 24 08 48 bd 00 00 00 00 00 fc ff df e8 06 7a 2b fe 48 
89 ea 48 81 c3 00 01 00 00 48 89 d8 48 c1 e8 03 <80> 3c 28 00 74 12 48 89 df e8 
97 60 6b fe 48 ba 00 00 00 00 00 fc
RSP: 0018:c90009037758 EFLAGS: 00010202
RAX: 0020 RBX: 0100 RCX: 0004
RDX: dc00 RSI: 00016753 RDI: 00016754
RBP: dc00 R08: 83ddd758 R09: f52001206efa
R10: f52001206efa R11:  R12: 89364b22
R13: 888042e74000 R14: dc00 R15: 89364943
FS:  7fb8f261d700() GS:8880ae90() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 016a9e60 CR3: 959d9000 CR4: 001506e0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at

Re: [Cluster-devel] [PATCH] gfs2: gfs2_read_sb: put gfs2_assert inside the loop

2020-10-05 Thread Andrew Price

On 03/10/2020 07:31, Fox Chen wrote:

for (x = 2;; x++) {
 ...
 gfs2_assert(sdp, x <= GFS2_MAX_META_HEIGHT);  <--- after
 ...
 if (d != sdp->sd_heightsize[x - 1] || m)
 break;
 sdp->sd_heightsize[x] = space;
}

sdp->sd_max_height = x
gfs2_assert(sdp, sdp->sd_max_height <= GFS2_MAX_META_HEIGHT) <--- before

Before this patch, gfs2_assert is put outside of the loop of
sdp->sd_heightsize[x] calculation. When something goes wrong,


So this looks related to one of the recent syzbot reports, where the 
"something goes wrong" is the block size in the on-disk superblock was 
zeroed and that leads eventually to this out-of-bounds write. The 
correct fix in that case would be to add a validity check for the block 
size in gfs2_check_sb().


Andy


x exceeds the size of GFS2_MAX_META_HEIGHT, it may already crash inside
the loop when

sdp->sd_heightsize[x] = space

tries to reach the out-of-bound
location, gfs2_assert won't help here.

This patch fixes this by moving gfs2_assert into the loop.
We will check x value each time to see if it exceeds GFS2_MAX_META_HEIGHT.

Signed-off-by: Fox Chen 
---
  fs/gfs2/ops_fstype.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index 6d18d2c91add..6cc32e3010f2 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -333,6 +333,7 @@ static int gfs2_read_sb(struct gfs2_sbd *sdp, int silent)
u64 space, d;
u32 m;
  
+		gfs2_assert(sdp, x <= GFS2_MAX_META_HEIGHT);

space = sdp->sd_heightsize[x - 1] * sdp->sd_inptrs;
d = space;
m = do_div(d, sdp->sd_inptrs);
@@ -343,7 +344,6 @@ static int gfs2_read_sb(struct gfs2_sbd *sdp, int silent)
}
sdp->sd_max_height = x;
sdp->sd_heightsize[x] = ~0;
-   gfs2_assert(sdp, sdp->sd_max_height <= GFS2_MAX_META_HEIGHT);
  
  	sdp->sd_max_dents_per_leaf = (sdp->sd_sb.sb_bsize -

  sizeof(struct gfs2_leaf)) /





Re: [Cluster-devel] KASAN: slab-out-of-bounds Write in gfs2_fill_super

2020-09-30 Thread Andrew Price

On 30/09/2020 13:39, syzbot wrote:

Hello,

syzbot found the following issue on:

HEAD commit:fb0155a0 Merge tag 'nfs-for-5.9-3' of git://git.linux-nfs...
git tree:   upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=13458c0f90
kernel config:  https://syzkaller.appspot.com/x/.config?x=adebb40048274f92
dashboard link: https://syzkaller.appspot.com/bug?extid=af90d47a37376844e731
compiler:   clang version 10.0.0 (https://github.com/llvm/llvm-project/ 
c2443155a0fb245c8f17f2c1c72b6ea391e86e81)
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=15c307d390
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1353d58d90

Bisection is inconclusive: the issue happens on the oldest tested release.

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=106acbbb90
final oops: https://syzkaller.appspot.com/x/report.txt?x=126acbbb90
console output: https://syzkaller.appspot.com/x/log.txt?x=146acbbb90

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+af90d47a37376844e...@syzkaller.appspotmail.com

gfs2: fsid=loop0: Trying to join cluster "lock_nolock", "loop0"
gfs2: fsid=loop0: Now mounting FS...
==
BUG: KASAN: slab-out-of-bounds in gfs2_read_sb fs/gfs2/ops_fstype.c:342 [inline]
BUG: KASAN: slab-out-of-bounds in init_sb fs/gfs2/ops_fstype.c:479 [inline]
BUG: KASAN: slab-out-of-bounds in gfs2_fill_super+0x1db5/0x3fe0 
fs/gfs2/ops_fstype.c:1096
Write of size 8 at addr 88809073d548 by task syz-executor940/6853


Bug filed for this:

https://bugzilla.redhat.com/show_bug.cgi?id=1883929

Andy


CPU: 1 PID: 6853 Comm: syz-executor940 Not tainted 5.9.0-rc7-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 
01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1d6/0x29e lib/dump_stack.c:118
  print_address_description+0x66/0x620 mm/kasan/report.c:383
  __kasan_report mm/kasan/report.c:513 [inline]
  kasan_report+0x132/0x1d0 mm/kasan/report.c:530
  gfs2_read_sb fs/gfs2/ops_fstype.c:342 [inline]
  init_sb fs/gfs2/ops_fstype.c:479 [inline]
  gfs2_fill_super+0x1db5/0x3fe0 fs/gfs2/ops_fstype.c:1096
  get_tree_bdev+0x3e9/0x5f0 fs/super.c:1342
  gfs2_get_tree+0x4c/0x1f0 fs/gfs2/ops_fstype.c:1201
  vfs_get_tree+0x88/0x270 fs/super.c:1547
  do_new_mount fs/namespace.c:2875 [inline]
  path_mount+0x179d/0x29e0 fs/namespace.c:3192
  do_mount fs/namespace.c:3205 [inline]
  __do_sys_mount fs/namespace.c:3413 [inline]
  __se_sys_mount+0x126/0x180 fs/namespace.c:3390
  do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46
  entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x446dba
Code: b8 08 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 fd ad fb ff c3 66 2e 0f 1f 84 00 
00 00 00 00 66 90 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 0f 83 da ad 
fb ff c3 66 0f 1f 84 00 00 00 00 00
RSP: 002b:7fff4c56e748 EFLAGS: 0293 ORIG_RAX: 00a5
RAX: ffda RBX: 7fff4c56e7a0 RCX: 00446dba
RDX: 2000 RSI: 2100 RDI: 7fff4c56e760
RBP: 7fff4c56e760 R08: 7fff4c56e7a0 R09: 7fff0015
R10: 0220 R11: 0293 R12: 0001
R13: 0004 R14: 0003 R15: 0003

Allocated by task 6853:
  kasan_save_stack mm/kasan/common.c:48 [inline]
  kasan_set_track mm/kasan/common.c:56 [inline]
  __kasan_kmalloc+0x100/0x130 mm/kasan/common.c:461
  kmem_cache_alloc_trace+0x1e4/0x2e0 mm/slab.c:3554
  kmalloc include/linux/slab.h:554 [inline]
  kzalloc include/linux/slab.h:666 [inline]
  init_sbd fs/gfs2/ops_fstype.c:77 [inline]
  gfs2_fill_super+0xb6/0x3fe0 fs/gfs2/ops_fstype.c:1018
  get_tree_bdev+0x3e9/0x5f0 fs/super.c:1342
  gfs2_get_tree+0x4c/0x1f0 fs/gfs2/ops_fstype.c:1201
  vfs_get_tree+0x88/0x270 fs/super.c:1547
  do_new_mount fs/namespace.c:2875 [inline]
  path_mount+0x179d/0x29e0 fs/namespace.c:3192
  do_mount fs/namespace.c:3205 [inline]
  __do_sys_mount fs/namespace.c:3413 [inline]
  __se_sys_mount+0x126/0x180 fs/namespace.c:3390
  do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46
  entry_SYSCALL_64_after_hwframe+0x44/0xa9

The buggy address belongs to the object at 88809073c000
  which belongs to the cache kmalloc-8k of size 8192
The buggy address is located 5448 bytes inside of
  8192-byte region [88809073c000, 88809073e000)
The buggy address belongs to the page:
page:bd4b0b2d refcount:1 mapcount:0 mapping: index:0x0 
pfn:0x9073c
head:bd4b0b2d order:2 compound_mapcount:0 compound_pincount:0
flags: 0xfffe010200(slab|head)
raw: 00fffe010200 ea00028e5608 8880aa441b50 8880aa440a00
raw:  88809073c000 00010001 
page dumped because: kasan: bad access detected

Memory state around the buggy address:
  88809073d400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  8880907

Re: general protection fault in gfs2_withdraw

2020-09-30 Thread Andrew Price

On 29/09/2020 06:34, syzbot wrote:

syzbot has bisected this issue to:

commit 601ef0d52e9617588fcff3df26953592f2eb44ac
Author: Bob Peterson 
Date:   Tue Jan 28 19:23:45 2020 +

 gfs2: Force withdraw to replay journals and wait for it to finish

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=151d25e390
start commit:   7c7ec322 Merge tag 'for-linus' of git://git.kernel.org/pub..
git tree:   upstream
final oops: https://syzkaller.appspot.com/x/report.txt?x=171d25e390
console output: https://syzkaller.appspot.com/x/log.txt?x=131d25e390
kernel config:  https://syzkaller.appspot.com/x/.config?x=6184b75aa6d48d66
dashboard link: https://syzkaller.appspot.com/bug?extid=50a8a9cf8127f2c6f5df
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=13c6a10990
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=15d45ed390

Reported-by: syzbot+50a8a9cf8127f2c6f...@syzkaller.appspotmail.com
Fixes: 601ef0d52e96 ("gfs2: Force withdraw to replay journals and wait for it to 
finish")

For information about bisection process see: https://goo.gl/tpsmEJ#bisection



Bug filed for this one:

https://bugzilla.redhat.com/show_bug.cgi?id=1883932

Andy



Re: [Cluster-devel] general protection fault in gfs2_withdraw

2020-09-28 Thread Andrew Price

On 26/09/2020 18:21, syzbot wrote:

syzbot has found a reproducer for the following issue on:

HEAD commit:7c7ec322 Merge tag 'for-linus' of git://git.kernel.org/pub..
git tree:   upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=11f2ff2790
kernel config:  https://syzkaller.appspot.com/x/.config?x=6184b75aa6d48d66
dashboard link: https://syzkaller.appspot.com/bug?extid=50a8a9cf8127f2c6f5df
compiler:   clang version 10.0.0 (https://github.com/llvm/llvm-project/ 
c2443155a0fb245c8f17f2c1c72b6ea391e86e81)
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=160fb77390
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1104f10990

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+50a8a9cf8127f2c6f...@syzkaller.appspotmail.com

gfs2: fsid=syz:syz.0: fatal: invalid metadata block
   bh = 2072 (magic number)
   function = gfs2_meta_indirect_buffer, file = fs/gfs2/meta_io.c, line = 417
gfs2: fsid=syz:syz.0: about to withdraw this file system
general protection fault, probably for non-canonical address 
0xdc0e:  [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0070-0x0077]
CPU: 0 PID: 6842 Comm: syz-executor264 Not tainted 5.9.0-rc6-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 
01/01/2011
RIP: 0010:signal_our_withdraw fs/gfs2/util.c:97 [inline]


Seems that it's withdrawing in the init_inodes() path early enough 
(while looking up the jindex) that sdp->sd_jdesc is still NULL here:


  static void signal_our_withdraw(struct gfs2_sbd *sdp)
  {
  struct gfs2_glock *gl = sdp->sd_live_gh.gh_gl;
  struct inode *inode = sdp->sd_jdesc->jd_inode;

I'm undecided as to whether the bug is that we're withdrawing that early 
at all, or that we're not checking for NULL there?


Probably introduced by:

601ef0d52e96 gfs2: Force withdraw to replay journals and wait for it to 
finish


Andy


RIP: 0010:gfs2_withdraw+0x2b0/0xe20 fs/gfs2/util.c:294
Code: e8 03 48 89 44 24 38 42 80 3c 38 00 74 08 48 89 ef e8 34 f7 69 fe 48 89 6c 24 
20 48 8b 6d 00 48 83 c5 70 48 89 e8 48 c1 e8 03 <42> 80 3c 38 00 74 08 48 89 ef 
e8 11 f7 69 fe 48 8b 45 00 48 89 44
RSP: 0018:c900057474f0 EFLAGS: 00010202
RAX: 000e RBX: 8880a71e RCX: 98268db4dfe86a00
RDX: 888092bb6100 RSI:  RDI: 8880a71e0430
RBP: 0070 R08: 834ad50c R09: ed1015d041c3
R10: ed1015d041c3 R11:  R12: 111014e3c04d
R13: 8880a71e0050 R14: 8880a71e026c R15: dc00
FS:  0233b880() GS:8880ae80() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7f74f826d6c0 CR3: a04cc000 CR4: 001506f0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
  gfs2_meta_check_ii+0x70/0x80 fs/gfs2/util.c:450
  gfs2_metatype_check_i fs/gfs2/util.h:126 [inline]
  gfs2_meta_indirect_buffer+0x29f/0x380 fs/gfs2/meta_io.c:417
  gfs2_meta_inode_buffer fs/gfs2/meta_io.h:70 [inline]
  gfs2_inode_refresh+0x65/0xc00 fs/gfs2/glops.c:438
  inode_go_lock+0x12c/0x480 fs/gfs2/glops.c:468
  do_promote+0x4db/0xcd0 fs/gfs2/glock.c:390
  finish_xmote+0x907/0x1350 fs/gfs2/glock.c:560
  do_xmote+0xadb/0x14c0 fs/gfs2/glock.c:686
  gfs2_glock_nq+0xac3/0x14d0 fs/gfs2/glock.c:1410
  gfs2_glock_nq_init fs/gfs2/glock.h:238 [inline]
  gfs2_lookupi+0x36f/0x4f0 fs/gfs2/inode.c:317
  gfs2_lookup_simple+0xa4/0x100 fs/gfs2/inode.c:268
  init_journal+0x132/0x1970 fs/gfs2/ops_fstype.c:620
  init_inodes fs/gfs2/ops_fstype.c:756 [inline]
  gfs2_fill_super+0x2717/0x3fe0 fs/gfs2/ops_fstype.c:1125
  get_tree_bdev+0x3e9/0x5f0 fs/super.c:1342
  gfs2_get_tree+0x4c/0x1f0 fs/gfs2/ops_fstype.c:1201
  vfs_get_tree+0x88/0x270 fs/super.c:1547
  do_new_mount fs/namespace.c:2875 [inline]
  path_mount+0x179d/0x29e0 fs/namespace.c:3192
  do_mount fs/namespace.c:3205 [inline]
  __do_sys_mount fs/namespace.c:3413 [inline]
  __se_sys_mount+0x126/0x180 fs/namespace.c:3390
  do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46
  entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x458e1a
Code: b8 08 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 fd ad fb ff c3 66 2e 0f 1f 84 00 
00 00 00 00 66 90 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 0f 83 da ad 
fb ff c3 66 0f 1f 84 00 00 00 00 00
RSP: 002b:7ffc76f65c88 EFLAGS: 0293 ORIG_RAX: 00a5
RAX: ffda RBX: 7ffc76f65ce0 RCX: 00458e1a
RDX: 2000 RSI: 2100 RDI: 7ffc76f65ca0
RBP: 7ffc76f65ca0 R08: 7ffc76f65ce0 R09: 7ffc0015
R10:  R11: 0293 R12: 0809
R13: 0004 R14: 0003 R15: 0003
Modules linked in:
---[ end trace 1e62174917573e95 ]---
RIP: 0010:signal_our_withdraw fs/gfs2/util.c:97 [inline]
RIP: 0010:

Re: [RFC PATCH 1/4] vfs: Create fs_context-aware mount_bdev() replacement

2019-03-27 Thread Andrew Price

On 27/03/2019 11:23, David Howells wrote:

Andrew Price  wrote:


+   up_write(&s->s_umount);
+   blkdev_put(bdev, fc->bdev_mode);
+   down_write(&s->s_umount);


fc->bdev should be NULLed here (or, on the way out of sget_fc() might be more
appropriate) otherwise we get a double-blkdev_put() leading to NULL pointer
derefs later. This happens when I mount a device twice and then unmount them,
or mount it 3 times.


How about the attached changes?  Note that they conflict a bit with the mtd
changes from where I took the destructor piece.

David


It fixes my bug and I see nothing unexpected from an xfstests -g quick 
run with my gfs2 patch on top. Looks good.


Thanks,
Andy


---
commit 161602f963ae5a05bdb834461f7a4896286fbde0
Author: David Howells 
Date:   Wed Mar 27 11:12:57 2019 +

 bdev handling fix

diff --git a/fs/fs_context.c b/fs/fs_context.c
index 9e22fe6aafe7..fc8fda4b9fe9 100644
--- a/fs/fs_context.c
+++ b/fs/fs_context.c
@@ -425,10 +425,8 @@ void put_fs_context(struct fs_context *fc)
  
  	if (fc->need_free && fc->ops && fc->ops->free)

fc->ops->free(fc);
-#ifdef CONFIG_BLOCK
-   if (fc->bdev)
-   blkdev_put(fc->bdev, fc->bdev_mode);
-#endif
+   if (fc->dev_destructor)
+   fc->dev_destructor(fc);
  
  	security_free_mnt_opts(&fc->security);

put_net(fc->net_ns);
diff --git a/fs/super.c b/fs/super.c
index 85851adb0f19..e9e678d2c37c 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1211,8 +1211,17 @@ int vfs_get_super(struct fs_context *fc,
  EXPORT_SYMBOL(vfs_get_super);
  
  #ifdef CONFIG_BLOCK

+static void fc_bdev_destructor(struct fs_context *fc)
+{
+   if (fc->bdev) {
+   blkdev_put(fc->bdev, fc->bdev_mode);
+   fc->bdev = NULL;
+   }
+}
+
  static int set_bdev_super_fc(struct super_block *s, struct fs_context *fc)
  {
+   s->s_mode = fc->bdev_mode;
s->s_bdev = fc->bdev;
s->s_dev = s->s_bdev->bd_dev;
s->s_bdi = bdi_get(s->s_bdev->bd_bdi);
@@ -1252,6 +1261,9 @@ int vfs_get_block_super(struct fs_context *fc,
return PTR_ERR(bdev);
}
  
+	fc->dev_destructor = fc_bdev_destructor;

+   fc->bdev = bdev;
+
/* Once the superblock is inserted into the list by sget_fc(), s_umount
 * will protect the lockfs code from trying to start a snapshot while
 * we are mounting
@@ -1260,18 +1272,14 @@ int vfs_get_block_super(struct fs_context *fc,
if (bdev->bd_fsfreeze_count > 0) {
mutex_unlock(&bdev->bd_fsfreeze_mutex);
warnf(fc, "%pg: Can't mount, blockdev is frozen", bdev);
-   error = -EBUSY;
-   goto error_bdev;
+   return -EBUSY;
}
  
-	fc->bdev = bdev;

fc->sb_flags |= SB_NOSEC;
s = sget_fc(fc, test_bdev_super_fc, set_bdev_super_fc);
mutex_unlock(&bdev->bd_fsfreeze_mutex);
-   if (IS_ERR(s)) {
-   error = PTR_ERR(s);
-   goto error_bdev;
-   }
+   if (IS_ERR(s))
+   return PTR_ERR(s);
  
  	if (s->s_root) {

/* Don't summarily change the RO/RW state. */
@@ -1281,16 +1289,10 @@ int vfs_get_block_super(struct fs_context *fc,
goto error_sb;
}
  
-		/* s_umount nests inside bd_mutex during __invalidate_device().

-* blkdev_put() acquires bd_mutex and can't be called under
-* s_umount.  Drop s_umount temporarily.  This is safe as we're
-* holding an active reference.
+   /* Leave fc->bdev to fc_bdev_destructor() to clean up to avoid
+* locking conflicts.
 */
-   up_write(&s->s_umount);
-   blkdev_put(bdev, fc->bdev_mode);
-   down_write(&s->s_umount);
} else {
-   s->s_mode = fc->bdev_mode;
snprintf(s->s_id, sizeof(s->s_id), "%pg", bdev);
sb_set_blocksize(s, block_size(bdev));
error = fill_super(s, fc);
@@ -1307,11 +1309,7 @@ int vfs_get_block_super(struct fs_context *fc,
  
  error_sb:

deactivate_locked_super(s);
-error_bdev:
-   if (fc->bdev) {
-   blkdev_put(fc->bdev, fc->bdev_mode);
-   fc->bdev = NULL;
-   }
+   /* Leave fc->bdev to fc_bdev_destructor() to clean up */
return error;
  }
  EXPORT_SYMBOL(vfs_get_block_super);
@@ -1521,8 +1519,13 @@ int vfs_get_tree(struct fs_context *fc)
 * on the superblock.
 */
error = fc->ops->get_tree(fc);
-   if (error < 0)
+   if (error < 0) {
+   if (fc->dev_destructor) {
+   fc->dev_destructor

Re: [RFC PATCH 1/4] vfs: Create fs_context-aware mount_bdev() replacement

2019-03-26 Thread Andrew Price

Hi David,

I've been testing gfs2 on top of this patch and it seems...

On 19/03/2019 16:23, David Howells wrote:

Create a function, vfs_get_block_super(), that is fs_context-aware and a
replacement for mount_bdev().  It caches the block device pointer and file
open mode in the fs_context struct so that this information can be passed
into sget_fc()'s test and set functions.

Signed-off-by: David Howells 
---

  fs/fs_context.c|2 +
  fs/super.c |  106 
  include/linux/fs_context.h |6 ++
  3 files changed, 114 insertions(+)

diff --git a/fs/fs_context.c b/fs/fs_context.c
index 87e3546b9a52..ea027762c0b2 100644
--- a/fs/fs_context.c
+++ b/fs/fs_context.c
@@ -425,6 +425,8 @@ void put_fs_context(struct fs_context *fc)
  
  	if (fc->need_free && fc->ops && fc->ops->free)

fc->ops->free(fc);
+   if (fc->bdev)
+   blkdev_put(fc->bdev, fc->bdev_mode);


doing this means...

  
  	security_free_mnt_opts(&fc->security);

put_net(fc->net_ns);
diff --git a/fs/super.c b/fs/super.c
index f27ee08fb26f..85851adb0f19 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1211,6 +1211,112 @@ int vfs_get_super(struct fs_context *fc,
  EXPORT_SYMBOL(vfs_get_super);
  
  #ifdef CONFIG_BLOCK

+static int set_bdev_super_fc(struct super_block *s, struct fs_context *fc)
+{
+   s->s_bdev = fc->bdev;
+   s->s_dev = s->s_bdev->bd_dev;
+   s->s_bdi = bdi_get(s->s_bdev->bd_bdi);
+   fc->bdev = NULL;
+   return 0;
+}
+
+static int test_bdev_super_fc(struct super_block *s, struct fs_context *fc)
+{
+   return s->s_bdev == fc->bdev;
+}
+
+/**
+ * vfs_get_block_super - Get a superblock based on a single block device
+ * @fc: The filesystem context holding the parameters
+ * @keying: How to distinguish superblocks
+ * @fill_super: Helper to initialise a new superblock
+ */
+int vfs_get_block_super(struct fs_context *fc,
+   int (*fill_super)(struct super_block *,
+ struct fs_context *))
+{
+   struct block_device *bdev;
+   struct super_block *s;
+   int error = 0;
+
+   fc->bdev_mode = FMODE_READ | FMODE_EXCL;
+   if (!(fc->sb_flags & SB_RDONLY))
+   fc->bdev_mode |= FMODE_WRITE;
+
+   if (!fc->source)
+   return invalf(fc, "No source specified");
+
+   bdev = blkdev_get_by_path(fc->source, fc->bdev_mode, fc->fs_type);
+   if (IS_ERR(bdev)) {
+   errorf(fc, "%s: Can't open blockdev", fc->source);
+   return PTR_ERR(bdev);
+   }
+
+   /* Once the superblock is inserted into the list by sget_fc(), s_umount
+* will protect the lockfs code from trying to start a snapshot while
+* we are mounting
+*/
+   mutex_lock(&bdev->bd_fsfreeze_mutex);
+   if (bdev->bd_fsfreeze_count > 0) {
+   mutex_unlock(&bdev->bd_fsfreeze_mutex);
+   warnf(fc, "%pg: Can't mount, blockdev is frozen", bdev);
+   error = -EBUSY;
+   goto error_bdev;
+   }
+
+   fc->bdev = bdev;
+   fc->sb_flags |= SB_NOSEC;
+   s = sget_fc(fc, test_bdev_super_fc, set_bdev_super_fc);
+   mutex_unlock(&bdev->bd_fsfreeze_mutex);
+   if (IS_ERR(s)) {
+   error = PTR_ERR(s);
+   goto error_bdev;
+   }
+
+   if (s->s_root) {
+   /* Don't summarily change the RO/RW state. */
+   if ((fc->sb_flags ^ s->s_flags) & SB_RDONLY) {
+   warnf(fc, "%pg: Can't mount, would change RO state", 
bdev);
+   error = -EBUSY;
+   goto error_sb;
+   }
+
+   /* s_umount nests inside bd_mutex during __invalidate_device().
+* blkdev_put() acquires bd_mutex and can't be called under
+* s_umount.  Drop s_umount temporarily.  This is safe as we're
+* holding an active reference.
+*/
+   up_write(&s->s_umount);
+   blkdev_put(bdev, fc->bdev_mode);
+   down_write(&s->s_umount);


fc->bdev should be NULLed here (or, on the way out of sget_fc() might be 
more appropriate) otherwise we get a double-blkdev_put() leading to NULL 
pointer derefs later. This happens when I mount a device twice and then 
unmount them, or mount it 3 times.



+   } else {
+   s->s_mode = fc->bdev_mode;
+   snprintf(s->s_id, sizeof(s->s_id), "%pg", bdev);
+   sb_set_blocksize(s, block_size(bdev));
+   error = fill_super(s, fc);
+   if (error)
+   goto error_sb;
+
+   s->s_flags |= SB_ACTIVE;
+   bdev->bd_super = s;
+   }
+
+   BUG_ON(fc->root);


Maybe BUG_ON(fc->bdev); too?

Cheers,
Andy


Re: [Cluster-devel] [PATCH] dlm: config: Fix ENOMEM failures in make_cluster()

2016-03-29 Thread Andrew Price

On 22/03/16 17:36, Andrew Price wrote:

Commit 1ae1602de0 "configfs: switch ->default groups to a linked list"
left the NULL gps pointer behind after removing the kcalloc() call which
made it non-NULL. It also left the !gps check in place so make_cluster()
now fails with ENOMEM. Remove the remaining uses of the gps variable to
fix that.


Could this go through the linux-dlm tree as a fix for -rc2? The impact 
is that dlm_controld fails:


  dlm_controld[890]: 171 dlm_controld 4.0.4 started
  dlm_controld[890]: 171 /sys/kernel/config/dlm/cluster: mkdir failed: 12

so it's pretty important.

Cheers,
Andy



Reviewed-by: Bob Peterson 
Reviewed-by: Andreas Gruenbacher 
Signed-off-by: Andrew Price 
---
  fs/dlm/config.c | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/dlm/config.c b/fs/dlm/config.c
index 5191121..1669f62 100644
--- a/fs/dlm/config.c
+++ b/fs/dlm/config.c
@@ -343,13 +343,12 @@ static struct config_group *make_cluster(struct 
config_group *g,
struct dlm_cluster *cl = NULL;
struct dlm_spaces *sps = NULL;
struct dlm_comms *cms = NULL;
-   void *gps = NULL;

cl = kzalloc(sizeof(struct dlm_cluster), GFP_NOFS);
sps = kzalloc(sizeof(struct dlm_spaces), GFP_NOFS);
cms = kzalloc(sizeof(struct dlm_comms), GFP_NOFS);

-   if (!cl || !gps || !sps || !cms)
+   if (!cl || !sps || !cms)
goto fail;

config_group_init_type_name(&cl->group, name, &cluster_type);





[PATCH] dlm: config: Fix ENOMEM failures in make_cluster()

2016-03-22 Thread Andrew Price
Commit 1ae1602de0 "configfs: switch ->default groups to a linked list"
left the NULL gps pointer behind after removing the kcalloc() call which
made it non-NULL. It also left the !gps check in place so make_cluster()
now fails with ENOMEM. Remove the remaining uses of the gps variable to
fix that.

Reviewed-by: Bob Peterson 
Reviewed-by: Andreas Gruenbacher 
Signed-off-by: Andrew Price 
---
 fs/dlm/config.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/dlm/config.c b/fs/dlm/config.c
index 5191121..1669f62 100644
--- a/fs/dlm/config.c
+++ b/fs/dlm/config.c
@@ -343,13 +343,12 @@ static struct config_group *make_cluster(struct 
config_group *g,
struct dlm_cluster *cl = NULL;
struct dlm_spaces *sps = NULL;
struct dlm_comms *cms = NULL;
-   void *gps = NULL;
 
cl = kzalloc(sizeof(struct dlm_cluster), GFP_NOFS);
sps = kzalloc(sizeof(struct dlm_spaces), GFP_NOFS);
cms = kzalloc(sizeof(struct dlm_comms), GFP_NOFS);
 
-   if (!cl || !gps || !sps || !cms)
+   if (!cl || !sps || !cms)
goto fail;
 
config_group_init_type_name(&cl->group, name, &cluster_type);
-- 
2.4.3



Re: rcu: fix hlist_bl_set_first_rcu annotation

2013-02-14 Thread Andrew Price

Hi,

On 03/02/13 18:39, Paul E. McKenney wrote:

On Wed, Jan 30, 2013 at 07:07:57PM +, Steven Whitehouse wrote:


Abhi noticed that we were getting a complaint from the RCU subsystem
about access of an RCU protected list under the write side bit lock.
This patch adds additional annotation to check both the RCU read
lock and the write side bit lock before printing a message.

Signed-off-by: Steven Whitehouse 
Reported-by: Abhijith Das 
Tested-by: Abhijith Das 


Looks plausible to me on first glance, copying Nick Piggin and Christoph
Hellwig.  If they have no objections, I will queue this.

Thanx, Paul


Has this had any attention yet? I'm also seeing the complaint quite
frequently:

[   68.738811] ===
[   68.741380] [ INFO: suspicious RCU usage. ]
[   68.748785] 3.8.0-0.rc7.git1.1.fc19.x86_64 #1 Not tainted
[   68.750841] ---
[   68.752418] include/linux/rculist_bl.h:23 suspicious 
rcu_dereference_check() usage!

[   68.755264]
[   68.755264] other info that might help us debug this:
[   68.755264]
[   68.758030]
[   68.758030] rcu_scheduler_active = 1, debug_locks = 0
[   68.760316] 1 lock held by mount/476:
[   68.761896]  #0:  (&type->s_umount_key#38/1){+.+.+.}, at: 
[] sget+0x39e/0x670

[   68.767115]
[   68.767115] stack backtrace:
[   68.769529] Pid: 476, comm: mount Not tainted 
3.8.0-0.rc7.git1.1.fc19.x86_64 #1

[   68.772095] Call Trace:
[   68.772995]  [] lockdep_rcu_suspicious+0xe7/0x120
[   68.775184]  [] search_bucket+0x118/0x160 [gfs2]
[   68.777559]  [] gfs2_glock_get+0x603/0x7b0 [gfs2]
[   68.780749]  [] ? gfs2_glock_get+0x5/0x7b0 [gfs2]
[   68.784173]  [] gfs2_glock_nq_num+0x29/0x90 [gfs2]
[   68.786551]  [] gfs2_mount+0x869/0xf30 [gfs2]
[   68.788672]  [] ? sched_clock_cpu+0xa8/0x100
[   68.790739]  [] ? trace_hardirqs_off+0xd/0x10
[   68.793042]  [] ? local_clock+0x5f/0x70
[   68.794940]  [] ? __mutex_unlock_slowpath+0x80/0x170
[   68.798236]  [] mount_fs+0x39/0x1b0
[   68.800409]  [] ? __alloc_percpu+0x10/0x20
[   68.803692]  [] vfs_kern_mount+0x63/0xf0
[   68.806773]  [] do_mount+0x205/0xa90
[   68.809669]  [] ? might_fault+0x5c/0xb0
[   68.812717]  [] ? strndup_user+0x4b/0xf0
[   68.816066]  [] sys_mount+0x83/0xc0
[   68.818284]  [] system_call_fastpath+0x16/0x1b

It would be good to have this silenced for 3.8 but I think there's not 
long to go.


Thanks,
Andy


diff --git a/include/linux/list_bl.h b/include/linux/list_bl.h
index 31f9d75..2eb8855 100644
--- a/include/linux/list_bl.h
+++ b/include/linux/list_bl.h
@@ -125,6 +125,11 @@ static inline void hlist_bl_unlock(struct hlist_bl_head *b)
__bit_spin_unlock(0, (unsigned long *)b);
  }

+static inline bool hlist_bl_is_locked(struct hlist_bl_head *b)
+{
+   return bit_spin_is_locked(0, (unsigned long *)b);
+}
+
  /**
   * hlist_bl_for_each_entry- iterate over list of given type
   * @tpos: the type * to use as a loop cursor.
diff --git a/include/linux/rculist_bl.h b/include/linux/rculist_bl.h
index cf1244f..4f216c5 100644
--- a/include/linux/rculist_bl.h
+++ b/include/linux/rculist_bl.h
@@ -20,7 +20,7 @@ static inline void hlist_bl_set_first_rcu(struct 
hlist_bl_head *h,
  static inline struct hlist_bl_node *hlist_bl_first_rcu(struct hlist_bl_head 
*h)
  {
return (struct hlist_bl_node *)
-   ((unsigned long)rcu_dereference(h->first) & ~LIST_BL_LOCKMASK);
+   ((unsigned long)rcu_dereference_check(h->first, 
hlist_bl_is_locked(h)) & ~LIST_BL_LOCKMASK);
  }

  /**



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/