Re: missing stack trace entry on NULL pointer call [was: Re: BUG: unable to handle kernel NULL pointer dereference in __generic_file_write_iter]

2019-02-28 Thread Jann Horn
On Thu, Feb 28, 2019 at 5:34 PM Jann Horn  wrote:
>
> On Thu, Feb 28, 2019 at 1:57 PM Thomas Gleixner  wrote:
> > On Thu, 28 Feb 2019, Jann Horn wrote:
> > > +Josh for unwinding, +x86 folks
> > > On Wed, Feb 27, 2019 at 11:43 PM Andrew Morton
> > >  wrote:
> > > > On Thu, 21 Feb 2019 06:52:04 -0800 syzbot 
> > > >  wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > syzbot found the following crash on:
> > > > >
> > > > > HEAD commit:4aa9fc2a435a Revert "mm, memory_hotplug: initialize 
> > > > > struct..
> > > > > git tree:   upstream
> > > > > console output: 
> > > > > https://syzkaller.appspot.com/x/log.txt?x=1101382f40
> > > > > kernel config:  
> > > > > https://syzkaller.appspot.com/x/.config?x=4fceea9e2d99ac20
> > > > > dashboard link: 
> > > > > https://syzkaller.appspot.com/bug?extid=ca95b2b7aef9e7cbd6ab
> > > > > compiler:   gcc (GCC) 9.0.0 20181231 (experimental)
> > > > >
> > > > > Unfortunately, I don't have any reproducer for this crash yet.
> > > >
> > > > Not understanding.  That seems to be saying that we got a NULL pointer
> > > > deref in __generic_file_write_iter() at
> > > >
> > > > written = generic_perform_write(file, from, 
> > > > iocb->ki_pos);
> > > >
> > > > which isn't possible.
> > > >
> > > > I'm not seeing recent changes in there which could have caused this.  
> > > > Help.
> > >
> > > +
> > >
> > > Maybe the problem is that the frame pointer unwinder isn't designed to
> > > cope with NULL function pointers - or more generally, with an
> > > unwinding operation that starts before the function's frame pointer
> > > has been set up?
> > >
> > > Unwinding starts at show_trace_log_lvl(). That begins with
> > > unwind_start(), which calls __unwind_start(), which uses
> > > get_frame_pointer(), which just returns regs->bp. But that frame
> > > pointer points to the part of the stack that's storing the address of
> > > the caller of the function that called NULL; the caller of NULL is
> > > skipped, as far as I can tell.
> > >
> > > What's kind of annoying here is that we don't have a proper frame set
> > > up yet, we only have half a stack frame (saved RIP but no saved RBP).
> >
> > That wreckage is related to the fact that the indirect calls are going
> > through __x86_indirect_thunk_$REG. I just verified on a VM with some other
> > callback NULL'ed that the resulting backtrace is not really helpful.
> >
> > So in that case generic_perform_write() has two indirect calls:
> >
> >   mapping->a_ops->write_begin() and ->write_end()
>
> Does the indirect thunk thing really make any difference? When you
> arrive at RIP=NULL, RSP points to a saved instruction pointer, just
> like when indirect calls are compiled normally.
>
> I just compiled kernels with artificial calls to a NULL function
> pointer (in prctl_set_seccomp()), with retpoline disabled, with both
> unwinders. The ORC unwinder shows a call trace with "?" everywhere
> that doesn't show the caller:
[...]
> So I think this doesn't really have anything to do with
> __x86_indirect_thunk_$REG, and the best possible fix might be to teach
> the unwinders that RIP==NULL means "pretend that RIP is *real_RSP and
> that RSP is real_RSP+8, and report *real_RSP as the first element of
> the backtrace".

Cooking up some patches now...


Re: missing stack trace entry on NULL pointer call [was: Re: BUG: unable to handle kernel NULL pointer dereference in __generic_file_write_iter]

2019-02-28 Thread Jann Horn
On Thu, Feb 28, 2019 at 1:57 PM Thomas Gleixner  wrote:
> On Thu, 28 Feb 2019, Jann Horn wrote:
> > +Josh for unwinding, +x86 folks
> > On Wed, Feb 27, 2019 at 11:43 PM Andrew Morton
> >  wrote:
> > > On Thu, 21 Feb 2019 06:52:04 -0800 syzbot 
> > >  wrote:
> > >
> > > > Hello,
> > > >
> > > > syzbot found the following crash on:
> > > >
> > > > HEAD commit:4aa9fc2a435a Revert "mm, memory_hotplug: initialize 
> > > > struct..
> > > > git tree:   upstream
> > > > console output: https://syzkaller.appspot.com/x/log.txt?x=1101382f40
> > > > kernel config:  
> > > > https://syzkaller.appspot.com/x/.config?x=4fceea9e2d99ac20
> > > > dashboard link: 
> > > > https://syzkaller.appspot.com/bug?extid=ca95b2b7aef9e7cbd6ab
> > > > compiler:   gcc (GCC) 9.0.0 20181231 (experimental)
> > > >
> > > > Unfortunately, I don't have any reproducer for this crash yet.
> > >
> > > Not understanding.  That seems to be saying that we got a NULL pointer
> > > deref in __generic_file_write_iter() at
> > >
> > > written = generic_perform_write(file, from, iocb->ki_pos);
> > >
> > > which isn't possible.
> > >
> > > I'm not seeing recent changes in there which could have caused this.  
> > > Help.
> >
> > +
> >
> > Maybe the problem is that the frame pointer unwinder isn't designed to
> > cope with NULL function pointers - or more generally, with an
> > unwinding operation that starts before the function's frame pointer
> > has been set up?
> >
> > Unwinding starts at show_trace_log_lvl(). That begins with
> > unwind_start(), which calls __unwind_start(), which uses
> > get_frame_pointer(), which just returns regs->bp. But that frame
> > pointer points to the part of the stack that's storing the address of
> > the caller of the function that called NULL; the caller of NULL is
> > skipped, as far as I can tell.
> >
> > What's kind of annoying here is that we don't have a proper frame set
> > up yet, we only have half a stack frame (saved RIP but no saved RBP).
>
> That wreckage is related to the fact that the indirect calls are going
> through __x86_indirect_thunk_$REG. I just verified on a VM with some other
> callback NULL'ed that the resulting backtrace is not really helpful.
>
> So in that case generic_perform_write() has two indirect calls:
>
>   mapping->a_ops->write_begin() and ->write_end()

Does the indirect thunk thing really make any difference? When you
arrive at RIP=NULL, RSP points to a saved instruction pointer, just
like when indirect calls are compiled normally.

I just compiled kernels with artificial calls to a NULL function
pointer (in prctl_set_seccomp()), with retpoline disabled, with both
unwinders. The ORC unwinder shows a call trace with "?" everywhere
that doesn't show the caller:

[  228.219140] BUG: unable to handle kernel NULL pointer dereference
at 
[  228.223897] #PF error: [INSTR]
[  228.224562] PGD 0 P4D 0
[  228.225119] Oops: 0010 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
[  228.226319] CPU: 1 PID: 1099 Comm: artificial_null Not tainted
5.0.0-rc8+ #299
[  228.227818] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.10.2-1 04/01/2014
[  228.229542] RIP: 0010:  (null)
[  228.230331] Code: Bad RIP value.
[  228.231011] RSP: 0018:8881d798fe88 EFLAGS: 00010246
[  228.232104] RAX: ffda RBX: 0016 RCX: a0368205
[  228.233599] RDX: dc00 RSI: 7ffde0d71168 RDI: 0042
[  228.235077] RBP: 11103af31fd4 R08: 561b50807740 R09: 0016
[  228.236557] R10:  R11:  R12: 0042
[  228.238039] R13:  R14: 7ffde0d71168 R15: 561b50807740
[  228.239517] FS:  7fe31f1cf700() GS:8881eb04()
knlGS:
[  228.241213] CS:  0010 DS:  ES:  CR0: 80050033
[  228.242411] CR2: ffd6 CR3: 0001df8b8004 CR4: 00360ee0
[  228.243886] DR0:  DR1:  DR2: 
[  228.245364] DR3:  DR6: fffe0ff0 DR7: 0400
[  228.246841] Call Trace:
[  228.247366]  ? __x64_sys_prctl+0x402/0x680
[  228.248224]  ? __ia32_sys_prctl+0x6e0/0x6e0
[  228.249106]  ? __do_page_fault+0x457/0x620
[  228.249969]  ? do_syscall_64+0x6d/0x160
[  228.250778]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[...]

whereas the FP unwinder shows this, listing prctl_set_seccomp only
with a question mark:

[   47.469957] BUG: unable to handle kernel NULL pointer dereference
at 
[   47.476973] #PF error: [INSTR]
[   47.477742] PGD 0 P4D 0
[   47.478341] Oops: 0010 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
[   47.479703] CPU: 4 PID: 1079 Comm: artificial_null Not tainted
5.0.0-rc8+ #300
[   47.481489] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.10.2-1 04/01/2014
[   47.483442] RIP: 0010:  (null)
[   47.484328] Code: Bad RIP value.
[   47.485085] RSP: 0018:8881e01f7e70 EFLAGS: 00010246
[   

Re: missing stack trace entry on NULL pointer call [was: Re: BUG: unable to handle kernel NULL pointer dereference in __generic_file_write_iter]

2019-02-28 Thread Thomas Gleixner
On Thu, 28 Feb 2019, Jann Horn wrote:
> +Josh for unwinding, +x86 folks
> On Wed, Feb 27, 2019 at 11:43 PM Andrew Morton
>  wrote:
> > On Thu, 21 Feb 2019 06:52:04 -0800 syzbot 
> >  wrote:
> >
> > > Hello,
> > >
> > > syzbot found the following crash on:
> > >
> > > HEAD commit:4aa9fc2a435a Revert "mm, memory_hotplug: initialize 
> > > struct..
> > > git tree:   upstream
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=1101382f40
> > > kernel config:  https://syzkaller.appspot.com/x/.config?x=4fceea9e2d99ac20
> > > dashboard link: 
> > > https://syzkaller.appspot.com/bug?extid=ca95b2b7aef9e7cbd6ab
> > > compiler:   gcc (GCC) 9.0.0 20181231 (experimental)
> > >
> > > Unfortunately, I don't have any reproducer for this crash yet.
> >
> > Not understanding.  That seems to be saying that we got a NULL pointer
> > deref in __generic_file_write_iter() at
> >
> > written = generic_perform_write(file, from, iocb->ki_pos);
> >
> > which isn't possible.
> >
> > I'm not seeing recent changes in there which could have caused this.  Help.
> 
> +
> 
> Maybe the problem is that the frame pointer unwinder isn't designed to
> cope with NULL function pointers - or more generally, with an
> unwinding operation that starts before the function's frame pointer
> has been set up?
> 
> Unwinding starts at show_trace_log_lvl(). That begins with
> unwind_start(), which calls __unwind_start(), which uses
> get_frame_pointer(), which just returns regs->bp. But that frame
> pointer points to the part of the stack that's storing the address of
> the caller of the function that called NULL; the caller of NULL is
> skipped, as far as I can tell.
> 
> What's kind of annoying here is that we don't have a proper frame set
> up yet, we only have half a stack frame (saved RIP but no saved RBP).

That wreckage is related to the fact that the indirect calls are going
through __x86_indirect_thunk_$REG. I just verified on a VM with some other
callback NULL'ed that the resulting backtrace is not really helpful.

So in that case generic_perform_write() has two indirect calls:

  mapping->a_ops->write_begin() and ->write_end()

Thanks,

tglx


missing stack trace entry on NULL pointer call [was: Re: BUG: unable to handle kernel NULL pointer dereference in __generic_file_write_iter]

2019-02-27 Thread Jann Horn
+Josh for unwinding, +x86 folks

On Wed, Feb 27, 2019 at 11:43 PM Andrew Morton
 wrote:
> On Thu, 21 Feb 2019 06:52:04 -0800 syzbot 
>  wrote:
>
> > Hello,
> >
> > syzbot found the following crash on:
> >
> > HEAD commit:4aa9fc2a435a Revert "mm, memory_hotplug: initialize struct..
> > git tree:   upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=1101382f40
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=4fceea9e2d99ac20
> > dashboard link: https://syzkaller.appspot.com/bug?extid=ca95b2b7aef9e7cbd6ab
> > compiler:   gcc (GCC) 9.0.0 20181231 (experimental)
> >
> > Unfortunately, I don't have any reproducer for this crash yet.
>
> Not understanding.  That seems to be saying that we got a NULL pointer
> deref in __generic_file_write_iter() at
>
> written = generic_perform_write(file, from, iocb->ki_pos);
>
> which isn't possible.
>
> I'm not seeing recent changes in there which could have caused this.  Help.

+

Maybe the problem is that the frame pointer unwinder isn't designed to
cope with NULL function pointers - or more generally, with an
unwinding operation that starts before the function's frame pointer
has been set up?

Unwinding starts at show_trace_log_lvl(). That begins with
unwind_start(), which calls __unwind_start(), which uses
get_frame_pointer(), which just returns regs->bp. But that frame
pointer points to the part of the stack that's storing the address of
the caller of the function that called NULL; the caller of NULL is
skipped, as far as I can tell.

What's kind of annoying here is that we don't have a proper frame set
up yet, we only have half a stack frame (saved RIP but no saved RBP).

> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > Reported-by: syzbot+ca95b2b7aef9e7cbd...@syzkaller.appspotmail.com
> >
> > BUG: unable to handle kernel NULL pointer dereference at 
> > #PF error: [INSTR]
> > PGD a7ea0067 P4D a7ea0067 PUD 81535067 PMD 0
> > Oops: 0010 [#1] PREEMPT SMP KASAN
> > CPU: 0 PID: 15924 Comm: syz-executor0 Not tainted 5.0.0-rc4+ #50
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> > Google 01/01/2011
> > RIP: 0010:  (null)
> > Code: Bad RIP value.
> > RSP: 0018:88804c3d7858 EFLAGS: 00010246
> > RAX:  RBX: 885aeb60 RCX: 005b
> > RDX:  RSI: 88807ec22930 RDI: 8880a59bdcc0
> > RBP: 88804c3d79b8 R08:  R09: 88804c3d7910
> > R10: 8880835ca200 R11:  R12: 005b
> > R13: 88804c3d7c98 R14: dc00 R15: 
> > FS:  7f3456db4700() GS:8880ae60() knlGS:
> > CS:  0010 DS:  ES:  CR0: 80050033
> > CR2: ffd6 CR3: 814ac000 CR4: 001406f0
> > DR0:  DR1:  DR2: 
> > DR3:  DR6: fffe0ff0 DR7: 0400
> > Call Trace:
> >   __generic_file_write_iter+0x25e/0x630 mm/filemap.c:
> >   ext4_file_write_iter+0x37a/0x1410 fs/ext4/file.c:266
> >   call_write_iter include/linux/fs.h:1862 [inline]
> >   new_sync_write fs/read_write.c:474 [inline]
> >   __vfs_write+0x764/0xb40 fs/read_write.c:487
> >   vfs_write+0x20c/0x580 fs/read_write.c:549
> >   ksys_write+0x105/0x260 fs/read_write.c:598
> >   __do_sys_write fs/read_write.c:610 [inline]
> >   __se_sys_write fs/read_write.c:607 [inline]
> >   __x64_sys_write+0x73/0xb0 fs/read_write.c:607
> >   do_syscall_64+0x1a3/0x800 arch/x86/entry/common.c:290
> >   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > RIP: 0033:0x458089
> > Code: 6d b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7
> > 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
> > ff 0f 83 3b b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
> > RSP: 002b:7f3456db3c78 EFLAGS: 0246 ORIG_RAX: 0001
> > RAX: ffda RBX: 0003 RCX: 00458089
> > RDX: 005b RSI: 2240 RDI: 0003
> > RBP: 0073bf00 R08:  R09: 
> > R10:  R11: 0246 R12: 7f3456db46d4
> > R13: 004c7450 R14: 004dce68 R15: 
> > Modules linked in:
> > CR2: 
> > ---[ end trace 5cac9d2c75a59916 ]---
> > kobject: 'loop5' (4426a409): kobject_uevent_env
> > RIP: 0010:  (null)
> > Code: Bad RIP value.
> > RSP: 0018:88804c3d7858 EFLAGS: 00010246
> > RAX:  RBX: 885aeb60 RCX: 005b
> > kobject: 'loop5' (4426a409): fill_kobj_path: path
> > = '/devices/virtual/block/loop5'
> > RDX:  RSI: 88807ec22930 RDI: 8880a59bdcc0
> > kobject: 'loop2' (b82e0c58): kobject_uevent_env
> > kobject: 'loop2' (b82e0c58): fill_kobj_path: path
> > = '/devices/virtual/block/loop2'
> > RBP: 88804c3d79b8 R08: