Re: [PATCH] x86/orc: Don't bail on stack overflow
* Josh Poimboeufwrote: > On Mon, Nov 27, 2017 at 10:38:42AM +0100, Ingo Molnar wrote: > > > > * Josh Poimboeuf wrote: > > > > > On Sat, Nov 25, 2017 at 04:16:23PM -0800, Andy Lutomirski wrote: > > > > Can you send me whatever config and exact commit hash generated this? > > > > I can try to figure out why it failed. > > > > > > Sorry, I've been traveling. I just got some time to take a look at > > > this. I think there are at least two unwinder issues here: > > > > > > - It doesn't deal gracefully with the case where the stack overflows and > > > the stack pointer itself isn't on a valid stack but the > > > to-be-dereferenced data *is*. > > > > > > - The oops dump code doesn't know how to print partial pt_regs, for the > > > case where if we get an interrupt/exception in *early* entry code > > > before the full pt_regs have been saved. > > > > > > (Andy, I'm not quite sure about your patch, and whether it's still > > > needed after these patches. I'll need to look at it later when I have > > > more time.) > > > > > > I attempted to fix both of the issues with the below patch. Thomas or > > > Ingo, can you test to see if this gets rid of the question marks? > > > > > > I can split it up into proper patches next week. I'm assuming this > > > isn't holding up the KAISER merge? > > > > It's not holding up the Kaiser merge, but good debuggability of weird > > crashes is a > > really good thing, so I constructed a changelog and picked up this patch as > > a > > single commit, and added your Signed-off-by, if that's OK with you. > > > > Will only push it out if it passes testing. > > The commit log looks good, though there's a CONFIG_FRAME_POINTER build > failure. Can you fold in this fix? > > diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c > index 7dd0d2a0d142..77835bc021c7 100644 > --- a/arch/x86/kernel/stacktrace.c > +++ b/arch/x86/kernel/stacktrace.c > @@ -102,7 +102,7 @@ __save_stack_trace_reliable(struct stack_trace *trace, > for (unwind_start(, task, NULL, NULL); !unwind_done(); >unwind_next_frame()) { > > - regs = unwind_get_entry_regs(, NULL); > + regs = unwind_get_entry_regs(); > if (regs) { > /* >* Kernel mode registers on the stack indicate an Done, thanks! Ingo
Re: [PATCH] x86/orc: Don't bail on stack overflow
* Josh Poimboeuf wrote: > On Mon, Nov 27, 2017 at 10:38:42AM +0100, Ingo Molnar wrote: > > > > * Josh Poimboeuf wrote: > > > > > On Sat, Nov 25, 2017 at 04:16:23PM -0800, Andy Lutomirski wrote: > > > > Can you send me whatever config and exact commit hash generated this? > > > > I can try to figure out why it failed. > > > > > > Sorry, I've been traveling. I just got some time to take a look at > > > this. I think there are at least two unwinder issues here: > > > > > > - It doesn't deal gracefully with the case where the stack overflows and > > > the stack pointer itself isn't on a valid stack but the > > > to-be-dereferenced data *is*. > > > > > > - The oops dump code doesn't know how to print partial pt_regs, for the > > > case where if we get an interrupt/exception in *early* entry code > > > before the full pt_regs have been saved. > > > > > > (Andy, I'm not quite sure about your patch, and whether it's still > > > needed after these patches. I'll need to look at it later when I have > > > more time.) > > > > > > I attempted to fix both of the issues with the below patch. Thomas or > > > Ingo, can you test to see if this gets rid of the question marks? > > > > > > I can split it up into proper patches next week. I'm assuming this > > > isn't holding up the KAISER merge? > > > > It's not holding up the Kaiser merge, but good debuggability of weird > > crashes is a > > really good thing, so I constructed a changelog and picked up this patch as > > a > > single commit, and added your Signed-off-by, if that's OK with you. > > > > Will only push it out if it passes testing. > > The commit log looks good, though there's a CONFIG_FRAME_POINTER build > failure. Can you fold in this fix? > > diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c > index 7dd0d2a0d142..77835bc021c7 100644 > --- a/arch/x86/kernel/stacktrace.c > +++ b/arch/x86/kernel/stacktrace.c > @@ -102,7 +102,7 @@ __save_stack_trace_reliable(struct stack_trace *trace, > for (unwind_start(, task, NULL, NULL); !unwind_done(); >unwind_next_frame()) { > > - regs = unwind_get_entry_regs(, NULL); > + regs = unwind_get_entry_regs(); > if (regs) { > /* >* Kernel mode registers on the stack indicate an Done, thanks! Ingo
Re: [PATCH] x86/orc: Don't bail on stack overflow
On Mon, Nov 27, 2017 at 10:38:42AM +0100, Ingo Molnar wrote: > > * Josh Poimboeufwrote: > > > On Sat, Nov 25, 2017 at 04:16:23PM -0800, Andy Lutomirski wrote: > > > Can you send me whatever config and exact commit hash generated this? > > > I can try to figure out why it failed. > > > > Sorry, I've been traveling. I just got some time to take a look at > > this. I think there are at least two unwinder issues here: > > > > - It doesn't deal gracefully with the case where the stack overflows and > > the stack pointer itself isn't on a valid stack but the > > to-be-dereferenced data *is*. > > > > - The oops dump code doesn't know how to print partial pt_regs, for the > > case where if we get an interrupt/exception in *early* entry code > > before the full pt_regs have been saved. > > > > (Andy, I'm not quite sure about your patch, and whether it's still > > needed after these patches. I'll need to look at it later when I have > > more time.) > > > > I attempted to fix both of the issues with the below patch. Thomas or > > Ingo, can you test to see if this gets rid of the question marks? > > > > I can split it up into proper patches next week. I'm assuming this > > isn't holding up the KAISER merge? > > It's not holding up the Kaiser merge, but good debuggability of weird crashes > is a > really good thing, so I constructed a changelog and picked up this patch as a > single commit, and added your Signed-off-by, if that's OK with you. > > Will only push it out if it passes testing. The commit log looks good, though there's a CONFIG_FRAME_POINTER build failure. Can you fold in this fix? diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c index 7dd0d2a0d142..77835bc021c7 100644 --- a/arch/x86/kernel/stacktrace.c +++ b/arch/x86/kernel/stacktrace.c @@ -102,7 +102,7 @@ __save_stack_trace_reliable(struct stack_trace *trace, for (unwind_start(, task, NULL, NULL); !unwind_done(); unwind_next_frame()) { - regs = unwind_get_entry_regs(, NULL); + regs = unwind_get_entry_regs(); if (regs) { /* * Kernel mode registers on the stack indicate an
Re: [PATCH] x86/orc: Don't bail on stack overflow
On Mon, Nov 27, 2017 at 10:38:42AM +0100, Ingo Molnar wrote: > > * Josh Poimboeuf wrote: > > > On Sat, Nov 25, 2017 at 04:16:23PM -0800, Andy Lutomirski wrote: > > > Can you send me whatever config and exact commit hash generated this? > > > I can try to figure out why it failed. > > > > Sorry, I've been traveling. I just got some time to take a look at > > this. I think there are at least two unwinder issues here: > > > > - It doesn't deal gracefully with the case where the stack overflows and > > the stack pointer itself isn't on a valid stack but the > > to-be-dereferenced data *is*. > > > > - The oops dump code doesn't know how to print partial pt_regs, for the > > case where if we get an interrupt/exception in *early* entry code > > before the full pt_regs have been saved. > > > > (Andy, I'm not quite sure about your patch, and whether it's still > > needed after these patches. I'll need to look at it later when I have > > more time.) > > > > I attempted to fix both of the issues with the below patch. Thomas or > > Ingo, can you test to see if this gets rid of the question marks? > > > > I can split it up into proper patches next week. I'm assuming this > > isn't holding up the KAISER merge? > > It's not holding up the Kaiser merge, but good debuggability of weird crashes > is a > really good thing, so I constructed a changelog and picked up this patch as a > single commit, and added your Signed-off-by, if that's OK with you. > > Will only push it out if it passes testing. The commit log looks good, though there's a CONFIG_FRAME_POINTER build failure. Can you fold in this fix? diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c index 7dd0d2a0d142..77835bc021c7 100644 --- a/arch/x86/kernel/stacktrace.c +++ b/arch/x86/kernel/stacktrace.c @@ -102,7 +102,7 @@ __save_stack_trace_reliable(struct stack_trace *trace, for (unwind_start(, task, NULL, NULL); !unwind_done(); unwind_next_frame()) { - regs = unwind_get_entry_regs(, NULL); + regs = unwind_get_entry_regs(); if (regs) { /* * Kernel mode registers on the stack indicate an
Re: [PATCH] x86/orc: Don't bail on stack overflow
* Ingo Molnarwrote: > Will only push it out if it passes testing. So far it required the small fix below for 32-bit. Thanks, Ingo --- arch/x86/kernel/dumpstack.c | 7 +++ arch/x86/kernel/process_64.c | 7 --- 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c index f7f3b5b3bc05..f214f7a6ccad 100644 --- a/arch/x86/kernel/dumpstack.c +++ b/arch/x86/kernel/dumpstack.c @@ -68,6 +68,13 @@ static void printk_stack_address(unsigned long address, int reliable, printk("%s %s%pB\n", log_lvl, reliable ? "" : "? ", (void *)address); } +void show_iret_regs(struct pt_regs *regs) +{ + printk(KERN_DEFAULT "RIP: %04x:%pS\n", (int)regs->cs, (void *)regs->ip); + printk(KERN_DEFAULT "RSP: %04x:%016lx EFLAGS: %08lx", (int)regs->ss, + regs->sp, regs->flags); +} + static void show_regs_safe(struct stack_info *info, struct pt_regs *regs) { if (on_stack(info, regs, sizeof(*regs))) diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c index 2e0e2979e34d..631e229ab428 100644 --- a/arch/x86/kernel/process_64.c +++ b/arch/x86/kernel/process_64.c @@ -61,13 +61,6 @@ __visible DEFINE_PER_CPU_USER_MAPPED(unsigned long, rsp_scratch); -void show_iret_regs(struct pt_regs *regs) -{ - printk(KERN_DEFAULT "RIP: %04lx:%pS\n", regs->cs, (void *)regs->ip); - printk(KERN_DEFAULT "RSP: %04lx:%016lx EFLAGS: %08lx", regs->ss, - regs->sp, regs->flags); -} - /* Prints also some state that isn't saved in the pt_regs */ void __show_regs(struct pt_regs *regs, int all) {
Re: [PATCH] x86/orc: Don't bail on stack overflow
* Ingo Molnar wrote: > Will only push it out if it passes testing. So far it required the small fix below for 32-bit. Thanks, Ingo --- arch/x86/kernel/dumpstack.c | 7 +++ arch/x86/kernel/process_64.c | 7 --- 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c index f7f3b5b3bc05..f214f7a6ccad 100644 --- a/arch/x86/kernel/dumpstack.c +++ b/arch/x86/kernel/dumpstack.c @@ -68,6 +68,13 @@ static void printk_stack_address(unsigned long address, int reliable, printk("%s %s%pB\n", log_lvl, reliable ? "" : "? ", (void *)address); } +void show_iret_regs(struct pt_regs *regs) +{ + printk(KERN_DEFAULT "RIP: %04x:%pS\n", (int)regs->cs, (void *)regs->ip); + printk(KERN_DEFAULT "RSP: %04x:%016lx EFLAGS: %08lx", (int)regs->ss, + regs->sp, regs->flags); +} + static void show_regs_safe(struct stack_info *info, struct pt_regs *regs) { if (on_stack(info, regs, sizeof(*regs))) diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c index 2e0e2979e34d..631e229ab428 100644 --- a/arch/x86/kernel/process_64.c +++ b/arch/x86/kernel/process_64.c @@ -61,13 +61,6 @@ __visible DEFINE_PER_CPU_USER_MAPPED(unsigned long, rsp_scratch); -void show_iret_regs(struct pt_regs *regs) -{ - printk(KERN_DEFAULT "RIP: %04lx:%pS\n", regs->cs, (void *)regs->ip); - printk(KERN_DEFAULT "RSP: %04lx:%016lx EFLAGS: %08lx", regs->ss, - regs->sp, regs->flags); -} - /* Prints also some state that isn't saved in the pt_regs */ void __show_regs(struct pt_regs *regs, int all) {
Re: [PATCH] x86/orc: Don't bail on stack overflow
* Josh Poimboeufwrote: > On Sat, Nov 25, 2017 at 04:16:23PM -0800, Andy Lutomirski wrote: > > Can you send me whatever config and exact commit hash generated this? > > I can try to figure out why it failed. > > Sorry, I've been traveling. I just got some time to take a look at > this. I think there are at least two unwinder issues here: > > - It doesn't deal gracefully with the case where the stack overflows and > the stack pointer itself isn't on a valid stack but the > to-be-dereferenced data *is*. > > - The oops dump code doesn't know how to print partial pt_regs, for the > case where if we get an interrupt/exception in *early* entry code > before the full pt_regs have been saved. > > (Andy, I'm not quite sure about your patch, and whether it's still > needed after these patches. I'll need to look at it later when I have > more time.) > > I attempted to fix both of the issues with the below patch. Thomas or > Ingo, can you test to see if this gets rid of the question marks? > > I can split it up into proper patches next week. I'm assuming this > isn't holding up the KAISER merge? It's not holding up the Kaiser merge, but good debuggability of weird crashes is a really good thing, so I constructed a changelog and picked up this patch as a single commit, and added your Signed-off-by, if that's OK with you. Will only push it out if it passes testing. Thanks, Ingo
Re: [PATCH] x86/orc: Don't bail on stack overflow
* Josh Poimboeuf wrote: > On Sat, Nov 25, 2017 at 04:16:23PM -0800, Andy Lutomirski wrote: > > Can you send me whatever config and exact commit hash generated this? > > I can try to figure out why it failed. > > Sorry, I've been traveling. I just got some time to take a look at > this. I think there are at least two unwinder issues here: > > - It doesn't deal gracefully with the case where the stack overflows and > the stack pointer itself isn't on a valid stack but the > to-be-dereferenced data *is*. > > - The oops dump code doesn't know how to print partial pt_regs, for the > case where if we get an interrupt/exception in *early* entry code > before the full pt_regs have been saved. > > (Andy, I'm not quite sure about your patch, and whether it's still > needed after these patches. I'll need to look at it later when I have > more time.) > > I attempted to fix both of the issues with the below patch. Thomas or > Ingo, can you test to see if this gets rid of the question marks? > > I can split it up into proper patches next week. I'm assuming this > isn't holding up the KAISER merge? It's not holding up the Kaiser merge, but good debuggability of weird crashes is a really good thing, so I constructed a changelog and picked up this patch as a single commit, and added your Signed-off-by, if that's OK with you. Will only push it out if it passes testing. Thanks, Ingo
Re: [PATCH] x86/orc: Don't bail on stack overflow
On Sat, 25 Nov 2017, Josh Poimboeuf wrote: > It looks a *lot* better with mine and your patches applied. It probably > would have helped Ingo and Thomas figure the problem out a lot sooner: > [1.159583] CS: 0010 DS: ES: CR0: 80050033 > [1.159583] CR2: ff083fb8 CR3: 000136f78002 CR4: > 001606e0 > [1.159583] Call Trace: > [1.159583] > [1.159583] __do_page_fault+0x4b0/0x4b0 > [1.159583] page_fault+0x2c/0x60 > [1.159583] RIP: 0010:do_page_fault+0x0/0x100 > [1.159583] RSP: :ff084120 EFLAGS: 00010012 > [1.159583] RAX: 819d0a87 RBX: 0001 RCX: > 819d0a87 > [1.159583] RDX: 1000 RSI: 0010 RDI: > ff084128 > [1.159583] RBP: 0d68 R08: 7f6d6bb24278 R09: > 0023 > [1.159583] R10: 558e0feca600 R11: 0246 R12: > 7f6d6bb203c0 > [1.159583] R13: 7f6d6bb1f880 R14: 7793bebc R15: > 0100 > [1.159583] ? native_iret+0x7/0x7 > [1.159583] page_fault+0x2c/0x60 > [1.159583] RIP: 0010:apic_timer_interrupt+0x0/0xb0 Yes. That would have pointed immediately to the right place. It'd been obvious that apic_timer_interrupt is not mapped. Thanks, tglx
Re: [PATCH] x86/orc: Don't bail on stack overflow
On Sat, 25 Nov 2017, Josh Poimboeuf wrote: > It looks a *lot* better with mine and your patches applied. It probably > would have helped Ingo and Thomas figure the problem out a lot sooner: > [1.159583] CS: 0010 DS: ES: CR0: 80050033 > [1.159583] CR2: ff083fb8 CR3: 000136f78002 CR4: > 001606e0 > [1.159583] Call Trace: > [1.159583] > [1.159583] __do_page_fault+0x4b0/0x4b0 > [1.159583] page_fault+0x2c/0x60 > [1.159583] RIP: 0010:do_page_fault+0x0/0x100 > [1.159583] RSP: :ff084120 EFLAGS: 00010012 > [1.159583] RAX: 819d0a87 RBX: 0001 RCX: > 819d0a87 > [1.159583] RDX: 1000 RSI: 0010 RDI: > ff084128 > [1.159583] RBP: 0d68 R08: 7f6d6bb24278 R09: > 0023 > [1.159583] R10: 558e0feca600 R11: 0246 R12: > 7f6d6bb203c0 > [1.159583] R13: 7f6d6bb1f880 R14: 7793bebc R15: > 0100 > [1.159583] ? native_iret+0x7/0x7 > [1.159583] page_fault+0x2c/0x60 > [1.159583] RIP: 0010:apic_timer_interrupt+0x0/0xb0 Yes. That would have pointed immediately to the right place. It'd been obvious that apic_timer_interrupt is not mapped. Thanks, tglx
Re: [PATCH] x86/orc: Don't bail on stack overflow
On Sat, Nov 25, 2017 at 10:41:15PM -0600, Josh Poimboeuf wrote: > On Sat, Nov 25, 2017 at 08:25:12PM -0800, Andy Lutomirski wrote: > > On Sat, Nov 25, 2017 at 6:40 PM, Josh Poimboeufwrote: > > > On Sat, Nov 25, 2017 at 04:16:23PM -0800, Andy Lutomirski wrote: > > >> Can you send me whatever config and exact commit hash generated this? > > >> I can try to figure out why it failed. > > > > > > Sorry, I've been traveling. I just got some time to take a look at > > > this. I think there are at least two unwinder issues here: > > > > > > - It doesn't deal gracefully with the case where the stack overflows and > > > the stack pointer itself isn't on a valid stack but the > > > to-be-dereferenced data *is*. > > > > > > - The oops dump code doesn't know how to print partial pt_regs, for the > > > case where if we get an interrupt/exception in *early* entry code > > > before the full pt_regs have been saved. > > > > > > (Andy, I'm not quite sure about your patch, and whether it's still > > > needed after these patches. I'll need to look at it later when I have > > > more time.) > > > > I haven't tested yet, but I think my patch is probably still needed. > > The issue I fixed is that unwind_start() would bail out early if sp > > was below the stack. Also: > > Makes sense, maybe both are needed. Your patch deals with a bad SP at > the beginning and mine deals with a bad SP in the middle. I was able to recreate with the config Ingo posted earlier, along with the following patch: diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h index e12168936d3f..693a20d309e3 100644 --- a/include/asm-generic/vmlinux.lds.h +++ b/include/asm-generic/vmlinux.lds.h @@ -500,7 +500,7 @@ VMLINUX_SYMBOL(__entry_text_end) = .; #define IRQENTRY_TEXT \ - ALIGN_FUNCTION(); \ + . = ALIGN(4096);\ VMLINUX_SYMBOL(__irqentry_text_start) = .; \ *(.irqentry.text) \ VMLINUX_SYMBOL(__irqentry_text_end) = .; It looks a *lot* better with mine and your patches applied. It probably would have helped Ingo and Thomas figure the problem out a lot sooner: [1.159016] PANIC: double fault, error_code: 0x0 [1.159583] CPU: 1 PID: 68 Comm: modprobe Not tainted 4.14.0-01257-g761d390195b6-dirty #19 [1.159583] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc26 04/01/2014 [1.159583] task: 880136f6bb00 task.stack: c9984000 [1.159583] RIP: 0010:page_fault+0x11/0x60 [1.159583] RSP: :ff083fc8 EFLAGS: 00010046 [1.159583] RAX: 819d0a87 RBX: 0001 RCX: 819d0a87 [1.159583] RDX: 1000 RSI: 0010 RDI: ff084078 [1.159583] RBP: 0d68 R08: 7f6d6bb24278 R09: 0023 [1.159583] R10: 558e0feca600 R11: 0246 R12: 7f6d6bb203c0 [1.159583] R13: 7f6d6bb1f880 R14: 7793bebc R15: 0100 [1.159583] FS: 7f6d6c39c5c0() GS:88013fc4() knlGS: [1.159583] CS: 0010 DS: ES: CR0: 80050033 [1.159583] CR2: ff083fb8 CR3: 000136f78002 CR4: 001606e0 [1.159583] Call Trace: [1.159583] [1.159583] __do_page_fault+0x4b0/0x4b0 [1.159583] page_fault+0x2c/0x60 [1.159583] RIP: 0010:do_page_fault+0x0/0x100 [1.159583] RSP: :ff084120 EFLAGS: 00010012 [1.159583] RAX: 819d0a87 RBX: 0001 RCX: 819d0a87 [1.159583] RDX: 1000 RSI: 0010 RDI: ff084128 [1.159583] RBP: 0d68 R08: 7f6d6bb24278 R09: 0023 [1.159583] R10: 558e0feca600 R11: 0246 R12: 7f6d6bb203c0 [1.159583] R13: 7f6d6bb1f880 R14: 7793bebc R15: 0100 [1.159583] ? native_iret+0x7/0x7 [1.159583] page_fault+0x2c/0x60 [1.159583] RIP: 0010:apic_timer_interrupt+0x0/0xb0 [1.159583] RSP: :ff0841d8 EFLAGS: 00010046 [1.159583] RAX: 0374 RBX: 558e0feca2c0 RCX: 7f6d6b85aaf0 [1.159583] RDX: 1000 RSI: 558e0feca600 RDI: [1.159583] RBP: 0d68 R08: 7f6d6bb24278 R09: 0023 [1.159583] R10: 558e0feca600 R11: 0246 R12: 7f6d6bb203c0 [1.159583] R13: 7f6d6bb1f880 R14: 7793bebc R15: 0100 [1.159583] RIP: 0033:0x7f6d6b85aaf0 [1.159583] RSP: 002b:7793bd68 EFLAGS: 0246 ORIG_RAX: 0003 [1.159583] RAX: 819d2000 RBX: 7f6d6b85aaf0 RCX: 0010 [1.159583] RDX: 00010046 RSI: ff0841d8 RDI: [1.159583] RBP:
Re: [PATCH] x86/orc: Don't bail on stack overflow
On Sat, Nov 25, 2017 at 10:41:15PM -0600, Josh Poimboeuf wrote: > On Sat, Nov 25, 2017 at 08:25:12PM -0800, Andy Lutomirski wrote: > > On Sat, Nov 25, 2017 at 6:40 PM, Josh Poimboeuf wrote: > > > On Sat, Nov 25, 2017 at 04:16:23PM -0800, Andy Lutomirski wrote: > > >> Can you send me whatever config and exact commit hash generated this? > > >> I can try to figure out why it failed. > > > > > > Sorry, I've been traveling. I just got some time to take a look at > > > this. I think there are at least two unwinder issues here: > > > > > > - It doesn't deal gracefully with the case where the stack overflows and > > > the stack pointer itself isn't on a valid stack but the > > > to-be-dereferenced data *is*. > > > > > > - The oops dump code doesn't know how to print partial pt_regs, for the > > > case where if we get an interrupt/exception in *early* entry code > > > before the full pt_regs have been saved. > > > > > > (Andy, I'm not quite sure about your patch, and whether it's still > > > needed after these patches. I'll need to look at it later when I have > > > more time.) > > > > I haven't tested yet, but I think my patch is probably still needed. > > The issue I fixed is that unwind_start() would bail out early if sp > > was below the stack. Also: > > Makes sense, maybe both are needed. Your patch deals with a bad SP at > the beginning and mine deals with a bad SP in the middle. I was able to recreate with the config Ingo posted earlier, along with the following patch: diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h index e12168936d3f..693a20d309e3 100644 --- a/include/asm-generic/vmlinux.lds.h +++ b/include/asm-generic/vmlinux.lds.h @@ -500,7 +500,7 @@ VMLINUX_SYMBOL(__entry_text_end) = .; #define IRQENTRY_TEXT \ - ALIGN_FUNCTION(); \ + . = ALIGN(4096);\ VMLINUX_SYMBOL(__irqentry_text_start) = .; \ *(.irqentry.text) \ VMLINUX_SYMBOL(__irqentry_text_end) = .; It looks a *lot* better with mine and your patches applied. It probably would have helped Ingo and Thomas figure the problem out a lot sooner: [1.159016] PANIC: double fault, error_code: 0x0 [1.159583] CPU: 1 PID: 68 Comm: modprobe Not tainted 4.14.0-01257-g761d390195b6-dirty #19 [1.159583] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc26 04/01/2014 [1.159583] task: 880136f6bb00 task.stack: c9984000 [1.159583] RIP: 0010:page_fault+0x11/0x60 [1.159583] RSP: :ff083fc8 EFLAGS: 00010046 [1.159583] RAX: 819d0a87 RBX: 0001 RCX: 819d0a87 [1.159583] RDX: 1000 RSI: 0010 RDI: ff084078 [1.159583] RBP: 0d68 R08: 7f6d6bb24278 R09: 0023 [1.159583] R10: 558e0feca600 R11: 0246 R12: 7f6d6bb203c0 [1.159583] R13: 7f6d6bb1f880 R14: 7793bebc R15: 0100 [1.159583] FS: 7f6d6c39c5c0() GS:88013fc4() knlGS: [1.159583] CS: 0010 DS: ES: CR0: 80050033 [1.159583] CR2: ff083fb8 CR3: 000136f78002 CR4: 001606e0 [1.159583] Call Trace: [1.159583] [1.159583] __do_page_fault+0x4b0/0x4b0 [1.159583] page_fault+0x2c/0x60 [1.159583] RIP: 0010:do_page_fault+0x0/0x100 [1.159583] RSP: :ff084120 EFLAGS: 00010012 [1.159583] RAX: 819d0a87 RBX: 0001 RCX: 819d0a87 [1.159583] RDX: 1000 RSI: 0010 RDI: ff084128 [1.159583] RBP: 0d68 R08: 7f6d6bb24278 R09: 0023 [1.159583] R10: 558e0feca600 R11: 0246 R12: 7f6d6bb203c0 [1.159583] R13: 7f6d6bb1f880 R14: 7793bebc R15: 0100 [1.159583] ? native_iret+0x7/0x7 [1.159583] page_fault+0x2c/0x60 [1.159583] RIP: 0010:apic_timer_interrupt+0x0/0xb0 [1.159583] RSP: :ff0841d8 EFLAGS: 00010046 [1.159583] RAX: 0374 RBX: 558e0feca2c0 RCX: 7f6d6b85aaf0 [1.159583] RDX: 1000 RSI: 558e0feca600 RDI: [1.159583] RBP: 0d68 R08: 7f6d6bb24278 R09: 0023 [1.159583] R10: 558e0feca600 R11: 0246 R12: 7f6d6bb203c0 [1.159583] R13: 7f6d6bb1f880 R14: 7793bebc R15: 0100 [1.159583] RIP: 0033:0x7f6d6b85aaf0 [1.159583] RSP: 002b:7793bd68 EFLAGS: 0246 ORIG_RAX: 0003 [1.159583] RAX: 819d2000 RBX: 7f6d6b85aaf0 RCX: 0010 [1.159583] RDX: 00010046 RSI: ff0841d8 RDI: [1.159583] RBP: 0374 R08:
Re: [PATCH] x86/orc: Don't bail on stack overflow
On Sat, Nov 25, 2017 at 08:25:12PM -0800, Andy Lutomirski wrote: > On Sat, Nov 25, 2017 at 6:40 PM, Josh Poimboeufwrote: > > On Sat, Nov 25, 2017 at 04:16:23PM -0800, Andy Lutomirski wrote: > >> Can you send me whatever config and exact commit hash generated this? > >> I can try to figure out why it failed. > > > > Sorry, I've been traveling. I just got some time to take a look at > > this. I think there are at least two unwinder issues here: > > > > - It doesn't deal gracefully with the case where the stack overflows and > > the stack pointer itself isn't on a valid stack but the > > to-be-dereferenced data *is*. > > > > - The oops dump code doesn't know how to print partial pt_regs, for the > > case where if we get an interrupt/exception in *early* entry code > > before the full pt_regs have been saved. > > > > (Andy, I'm not quite sure about your patch, and whether it's still > > needed after these patches. I'll need to look at it later when I have > > more time.) > > I haven't tested yet, but I think my patch is probably still needed. > The issue I fixed is that unwind_start() would bail out early if sp > was below the stack. Also: Makes sense, maybe both are needed. Your patch deals with a bad SP at the beginning and mine deals with a bad SP in the middle. > > -static bool stack_access_ok(struct unwind_state *state, unsigned long addr, > > +static bool stack_access_ok(struct unwind_state *state, unsigned long > > _addr, > > size_t len) > > { > > struct stack_info *info = >stack_info; > > + void *addr = (void *)_addr; > > > > - /* > > -* If the address isn't on the current stack, switch to the next > > one. > > -* > > -* We may have to traverse multiple stacks to deal with the > > possibility > > -* that info->next_sp could point to an empty stack and the address > > -* could be on a subsequent stack. > > -*/ > > - while (!on_stack(info, (void *)addr, len)) > > - if (get_stack_info(info->next_sp, state->task, info, > > - >stack_mask)) > > - return false; > > + if (!on_stack(info, addr, len) && > > + (get_stack_info(addr, state->task, info, >stack_mask))) > > + return false; > > > > return true; > > } > > This looks odd to me before and after. Shouldn't this be side-effect > free? That is, shouldn't it create a copy of info and stack_mask and > point that to get_stack_info() rather than allowing get_stack_info() > to modify the unwind state? I think the side effects are ok, but maybe stack_access_ok() should be renamed to make it clearer that it has side effects. -- Josh
Re: [PATCH] x86/orc: Don't bail on stack overflow
On Sat, Nov 25, 2017 at 08:25:12PM -0800, Andy Lutomirski wrote: > On Sat, Nov 25, 2017 at 6:40 PM, Josh Poimboeuf wrote: > > On Sat, Nov 25, 2017 at 04:16:23PM -0800, Andy Lutomirski wrote: > >> Can you send me whatever config and exact commit hash generated this? > >> I can try to figure out why it failed. > > > > Sorry, I've been traveling. I just got some time to take a look at > > this. I think there are at least two unwinder issues here: > > > > - It doesn't deal gracefully with the case where the stack overflows and > > the stack pointer itself isn't on a valid stack but the > > to-be-dereferenced data *is*. > > > > - The oops dump code doesn't know how to print partial pt_regs, for the > > case where if we get an interrupt/exception in *early* entry code > > before the full pt_regs have been saved. > > > > (Andy, I'm not quite sure about your patch, and whether it's still > > needed after these patches. I'll need to look at it later when I have > > more time.) > > I haven't tested yet, but I think my patch is probably still needed. > The issue I fixed is that unwind_start() would bail out early if sp > was below the stack. Also: Makes sense, maybe both are needed. Your patch deals with a bad SP at the beginning and mine deals with a bad SP in the middle. > > -static bool stack_access_ok(struct unwind_state *state, unsigned long addr, > > +static bool stack_access_ok(struct unwind_state *state, unsigned long > > _addr, > > size_t len) > > { > > struct stack_info *info = >stack_info; > > + void *addr = (void *)_addr; > > > > - /* > > -* If the address isn't on the current stack, switch to the next > > one. > > -* > > -* We may have to traverse multiple stacks to deal with the > > possibility > > -* that info->next_sp could point to an empty stack and the address > > -* could be on a subsequent stack. > > -*/ > > - while (!on_stack(info, (void *)addr, len)) > > - if (get_stack_info(info->next_sp, state->task, info, > > - >stack_mask)) > > - return false; > > + if (!on_stack(info, addr, len) && > > + (get_stack_info(addr, state->task, info, >stack_mask))) > > + return false; > > > > return true; > > } > > This looks odd to me before and after. Shouldn't this be side-effect > free? That is, shouldn't it create a copy of info and stack_mask and > point that to get_stack_info() rather than allowing get_stack_info() > to modify the unwind state? I think the side effects are ok, but maybe stack_access_ok() should be renamed to make it clearer that it has side effects. -- Josh
Re: [PATCH] x86/orc: Don't bail on stack overflow
On Sat, Nov 25, 2017 at 6:40 PM, Josh Poimboeufwrote: > On Sat, Nov 25, 2017 at 04:16:23PM -0800, Andy Lutomirski wrote: >> Can you send me whatever config and exact commit hash generated this? >> I can try to figure out why it failed. > > Sorry, I've been traveling. I just got some time to take a look at > this. I think there are at least two unwinder issues here: > > - It doesn't deal gracefully with the case where the stack overflows and > the stack pointer itself isn't on a valid stack but the > to-be-dereferenced data *is*. > > - The oops dump code doesn't know how to print partial pt_regs, for the > case where if we get an interrupt/exception in *early* entry code > before the full pt_regs have been saved. > > (Andy, I'm not quite sure about your patch, and whether it's still > needed after these patches. I'll need to look at it later when I have > more time.) I haven't tested yet, but I think my patch is probably still needed. The issue I fixed is that unwind_start() would bail out early if sp was below the stack. Also: > -static bool stack_access_ok(struct unwind_state *state, unsigned long addr, > +static bool stack_access_ok(struct unwind_state *state, unsigned long _addr, > size_t len) > { > struct stack_info *info = >stack_info; > + void *addr = (void *)_addr; > > - /* > -* If the address isn't on the current stack, switch to the next one. > -* > -* We may have to traverse multiple stacks to deal with the > possibility > -* that info->next_sp could point to an empty stack and the address > -* could be on a subsequent stack. > -*/ > - while (!on_stack(info, (void *)addr, len)) > - if (get_stack_info(info->next_sp, state->task, info, > - >stack_mask)) > - return false; > + if (!on_stack(info, addr, len) && > + (get_stack_info(addr, state->task, info, >stack_mask))) > + return false; > > return true; > } This looks odd to me before and after. Shouldn't this be side-effect free? That is, shouldn't it create a copy of info and stack_mask and point that to get_stack_info() rather than allowing get_stack_info() to modify the unwind state?
Re: [PATCH] x86/orc: Don't bail on stack overflow
On Sat, Nov 25, 2017 at 6:40 PM, Josh Poimboeuf wrote: > On Sat, Nov 25, 2017 at 04:16:23PM -0800, Andy Lutomirski wrote: >> Can you send me whatever config and exact commit hash generated this? >> I can try to figure out why it failed. > > Sorry, I've been traveling. I just got some time to take a look at > this. I think there are at least two unwinder issues here: > > - It doesn't deal gracefully with the case where the stack overflows and > the stack pointer itself isn't on a valid stack but the > to-be-dereferenced data *is*. > > - The oops dump code doesn't know how to print partial pt_regs, for the > case where if we get an interrupt/exception in *early* entry code > before the full pt_regs have been saved. > > (Andy, I'm not quite sure about your patch, and whether it's still > needed after these patches. I'll need to look at it later when I have > more time.) I haven't tested yet, but I think my patch is probably still needed. The issue I fixed is that unwind_start() would bail out early if sp was below the stack. Also: > -static bool stack_access_ok(struct unwind_state *state, unsigned long addr, > +static bool stack_access_ok(struct unwind_state *state, unsigned long _addr, > size_t len) > { > struct stack_info *info = >stack_info; > + void *addr = (void *)_addr; > > - /* > -* If the address isn't on the current stack, switch to the next one. > -* > -* We may have to traverse multiple stacks to deal with the > possibility > -* that info->next_sp could point to an empty stack and the address > -* could be on a subsequent stack. > -*/ > - while (!on_stack(info, (void *)addr, len)) > - if (get_stack_info(info->next_sp, state->task, info, > - >stack_mask)) > - return false; > + if (!on_stack(info, addr, len) && > + (get_stack_info(addr, state->task, info, >stack_mask))) > + return false; > > return true; > } This looks odd to me before and after. Shouldn't this be side-effect free? That is, shouldn't it create a copy of info and stack_mask and point that to get_stack_info() rather than allowing get_stack_info() to modify the unwind state?
Re: [PATCH] x86/orc: Don't bail on stack overflow
On Sat, Nov 25, 2017 at 04:16:23PM -0800, Andy Lutomirski wrote: > Can you send me whatever config and exact commit hash generated this? > I can try to figure out why it failed. Sorry, I've been traveling. I just got some time to take a look at this. I think there are at least two unwinder issues here: - It doesn't deal gracefully with the case where the stack overflows and the stack pointer itself isn't on a valid stack but the to-be-dereferenced data *is*. - The oops dump code doesn't know how to print partial pt_regs, for the case where if we get an interrupt/exception in *early* entry code before the full pt_regs have been saved. (Andy, I'm not quite sure about your patch, and whether it's still needed after these patches. I'll need to look at it later when I have more time.) I attempted to fix both of the issues with the below patch. Thomas or Ingo, can you test to see if this gets rid of the question marks? I can split it up into proper patches next week. I'm assuming this isn't holding up the KAISER merge? diff --git a/arch/x86/include/asm/kdebug.h b/arch/x86/include/asm/kdebug.h index f86a8caa561e..395c9631e000 100644 --- a/arch/x86/include/asm/kdebug.h +++ b/arch/x86/include/asm/kdebug.h @@ -26,6 +26,7 @@ extern void die(const char *, struct pt_regs *,long); extern int __must_check __die(const char *, struct pt_regs *, long); extern void show_stack_regs(struct pt_regs *regs); extern void __show_regs(struct pt_regs *regs, int all); +extern void show_iret_regs(struct pt_regs *regs); extern unsigned long oops_begin(void); extern void oops_end(unsigned long, struct pt_regs *, int signr); diff --git a/arch/x86/include/asm/unwind.h b/arch/x86/include/asm/unwind.h index e9cc6fe1fc6f..5be2fb23825a 100644 --- a/arch/x86/include/asm/unwind.h +++ b/arch/x86/include/asm/unwind.h @@ -7,6 +7,9 @@ #include #include +#define IRET_FRAME_OFFSET (offsetof(struct pt_regs, ip)) +#define IRET_FRAME_SIZE (sizeof(struct pt_regs) - IRET_FRAME_OFFSET) + struct unwind_state { struct stack_info stack_info; unsigned long stack_mask; @@ -52,6 +55,10 @@ void unwind_start(struct unwind_state *state, struct task_struct *task, } #if defined(CONFIG_UNWINDER_ORC) || defined(CONFIG_UNWINDER_FRAME_POINTER) +/* + * WARNING: The entire pt_regs may not be safe to dereference. In some cases, + * only the iret registers are accessible. Use with caution! + */ static inline struct pt_regs *unwind_get_entry_regs(struct unwind_state *state) { if (unwind_done(state)) diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c index d58dd121c0af..f7f3b5b3bc05 100644 --- a/arch/x86/kernel/dumpstack.c +++ b/arch/x86/kernel/dumpstack.c @@ -68,6 +68,21 @@ static void printk_stack_address(unsigned long address, int reliable, printk("%s %s%pB\n", log_lvl, reliable ? "" : "? ", (void *)address); } +static void show_regs_safe(struct stack_info *info, struct pt_regs *regs) +{ + if (on_stack(info, regs, sizeof(*regs))) + __show_regs(regs, 0); + else if (on_stack(info, (void *)regs + IRET_FRAME_OFFSET, + IRET_FRAME_SIZE)) { + /* +* When an interrupt or exception occurs in entry code, the +* full pt_regs might not have been saved yet. In that case +* just print the iret return frame. +*/ + show_iret_regs(regs); + } +} + void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs, unsigned long *stack, char *log_lvl) { @@ -116,8 +131,8 @@ void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs, if (stack_name) printk("%s <%s>\n", log_lvl, stack_name); - if (regs && on_stack(_info, regs, sizeof(*regs))) - __show_regs(regs, 0); + if (regs) + show_regs_safe(_info, regs); /* * Scan the stack, printing any text addresses we find. At the @@ -141,7 +156,7 @@ void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs, /* * Don't print regs->ip again if it was already printed -* by __show_regs() below. +* by show_regs_safe() below. */ if (regs && stack == >ip) goto next; @@ -177,8 +192,8 @@ void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs, /* if the frame has entry regs, print them */ regs = unwind_get_entry_regs(); - if (regs && on_stack(_info, regs, sizeof(*regs))) - __show_regs(regs, 0); + if (regs) + show_regs_safe(_info, regs); }
Re: [PATCH] x86/orc: Don't bail on stack overflow
On Sat, Nov 25, 2017 at 04:16:23PM -0800, Andy Lutomirski wrote: > Can you send me whatever config and exact commit hash generated this? > I can try to figure out why it failed. Sorry, I've been traveling. I just got some time to take a look at this. I think there are at least two unwinder issues here: - It doesn't deal gracefully with the case where the stack overflows and the stack pointer itself isn't on a valid stack but the to-be-dereferenced data *is*. - The oops dump code doesn't know how to print partial pt_regs, for the case where if we get an interrupt/exception in *early* entry code before the full pt_regs have been saved. (Andy, I'm not quite sure about your patch, and whether it's still needed after these patches. I'll need to look at it later when I have more time.) I attempted to fix both of the issues with the below patch. Thomas or Ingo, can you test to see if this gets rid of the question marks? I can split it up into proper patches next week. I'm assuming this isn't holding up the KAISER merge? diff --git a/arch/x86/include/asm/kdebug.h b/arch/x86/include/asm/kdebug.h index f86a8caa561e..395c9631e000 100644 --- a/arch/x86/include/asm/kdebug.h +++ b/arch/x86/include/asm/kdebug.h @@ -26,6 +26,7 @@ extern void die(const char *, struct pt_regs *,long); extern int __must_check __die(const char *, struct pt_regs *, long); extern void show_stack_regs(struct pt_regs *regs); extern void __show_regs(struct pt_regs *regs, int all); +extern void show_iret_regs(struct pt_regs *regs); extern unsigned long oops_begin(void); extern void oops_end(unsigned long, struct pt_regs *, int signr); diff --git a/arch/x86/include/asm/unwind.h b/arch/x86/include/asm/unwind.h index e9cc6fe1fc6f..5be2fb23825a 100644 --- a/arch/x86/include/asm/unwind.h +++ b/arch/x86/include/asm/unwind.h @@ -7,6 +7,9 @@ #include #include +#define IRET_FRAME_OFFSET (offsetof(struct pt_regs, ip)) +#define IRET_FRAME_SIZE (sizeof(struct pt_regs) - IRET_FRAME_OFFSET) + struct unwind_state { struct stack_info stack_info; unsigned long stack_mask; @@ -52,6 +55,10 @@ void unwind_start(struct unwind_state *state, struct task_struct *task, } #if defined(CONFIG_UNWINDER_ORC) || defined(CONFIG_UNWINDER_FRAME_POINTER) +/* + * WARNING: The entire pt_regs may not be safe to dereference. In some cases, + * only the iret registers are accessible. Use with caution! + */ static inline struct pt_regs *unwind_get_entry_regs(struct unwind_state *state) { if (unwind_done(state)) diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c index d58dd121c0af..f7f3b5b3bc05 100644 --- a/arch/x86/kernel/dumpstack.c +++ b/arch/x86/kernel/dumpstack.c @@ -68,6 +68,21 @@ static void printk_stack_address(unsigned long address, int reliable, printk("%s %s%pB\n", log_lvl, reliable ? "" : "? ", (void *)address); } +static void show_regs_safe(struct stack_info *info, struct pt_regs *regs) +{ + if (on_stack(info, regs, sizeof(*regs))) + __show_regs(regs, 0); + else if (on_stack(info, (void *)regs + IRET_FRAME_OFFSET, + IRET_FRAME_SIZE)) { + /* +* When an interrupt or exception occurs in entry code, the +* full pt_regs might not have been saved yet. In that case +* just print the iret return frame. +*/ + show_iret_regs(regs); + } +} + void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs, unsigned long *stack, char *log_lvl) { @@ -116,8 +131,8 @@ void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs, if (stack_name) printk("%s <%s>\n", log_lvl, stack_name); - if (regs && on_stack(_info, regs, sizeof(*regs))) - __show_regs(regs, 0); + if (regs) + show_regs_safe(_info, regs); /* * Scan the stack, printing any text addresses we find. At the @@ -141,7 +156,7 @@ void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs, /* * Don't print regs->ip again if it was already printed -* by __show_regs() below. +* by show_regs_safe() below. */ if (regs && stack == >ip) goto next; @@ -177,8 +192,8 @@ void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs, /* if the frame has entry regs, print them */ regs = unwind_get_entry_regs(); - if (regs && on_stack(_info, regs, sizeof(*regs))) - __show_regs(regs, 0); + if (regs) + show_regs_safe(_info, regs); }
Re: [PATCH] x86/orc: Don't bail on stack overflow
Can you send me whatever config and exact commit hash generated this? I can try to figure out why it failed. On Sat, Nov 25, 2017 at 3:13 PM, Thomas Gleixnerwrote: > On Sat, 25 Nov 2017, Andy Lutomirski wrote: > >> On Sat, Nov 25, 2017 at 9:28 AM, Andy Lutomirski wrote: >> > If we overflow the stack into a guard page and then try to unwind >> > it with ORC, it should work perfectly: by construction, there can't >> > be any meaningful data in the guard page because no writes to the >> > guard page will have succeeded. >> > >> > ORC seems entirely capable of unwinding in this situation, except >> > that it doesn't even try. Adjust its initial stack check so that >> > it's willing to try unwinding. >> > >> > I tested this by intentionally overflowing the task stack. The >> > result is an accurate call trace instead of a trace consisting >> > purely of '?' entries. >> > >> > Signed-off-by: Andy Lutomirski >> > --- >> > >> > Hi all- >> > >> > Ingo, this would have fixed half the debugging problem you had, I think. >> > To really nail it, we'd want some kind of magic to annotate the trace >> > so that page_fault (and async_page_fault) entries show CR2 and error_code. >> > >> > Josh, any ideas of how to do that cleanly? We could easily hard-code it >> > in the OOPS unwinder, I guess. >> >> Actually, this does pretty well. We don't get CR2, but, when I added >> an intentional bug kind of along the lines of the one you debugged, >> the intermediate page fault successfully dumps all the regs in the >> stack trace, so we get the faulting instruction *and* the registers. >> We also get ORIG_RAX, which tells us the error code. We could be >> fancy and decode that. > > It works in general, but for that case it's not much better than before > vs. the '?' entries. > > Thanks, > > tglx > > [2.556065] PANIC: double fault, error_code: 0x0 > [2.557116] CPU: 1 PID: 273 Comm: systemd-udevd Not tainted > 4.14.0-01256-g03dea81fe9f2-dirty #30 > [2.558930] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > 1.10.2-1 04/01/2014 > [2.560133] task: 880428dd8000 task.stack: c900025fc000 > [2.560729] RIP: 0010:page_fault+0x11/0x60 > [2.561122] RSP: :ff083fc8 EFLAGS: 00010046 > [2.561607] RAX: 819d0ac7 RBX: 0001 RCX: > 819d0ac7 > [2.562357] RDX: RSI: 0010 RDI: > ff084078 > [2.563027] RBP: 000b R08: R09: > 0040 > [2.563726] R10: 0018 R11: 0246 R12: > 0003 > [2.564429] R13: 55719fd7d410 R14: R15: > 03938700 > [2.565104] FS: 7f9edc0b78c0() GS:88042e44() > knlGS: > [2.565844] CS: 0010 DS: ES: CR0: 80050033 > [2.566396] CR2: ff083fb8 CR3: 000428ec4005 CR4: > 001606e0 > [2.567097] DR0: DR1: DR2: > > [2.567761] DR3: DR6: fffe0ff0 DR7: > 0400 > [2.568451] Call Trace: > [2.568704] > [2.568950] ? __do_page_fault+0x4b0/0x4b0 > [2.569348] ? page_fault+0x2c/0x60 > [2.569680] ? native_iret+0x7/0x7 > [2.570019] ? __do_page_fault+0x4b0/0x4b0 > [2.570396] ? page_fault+0x2c/0x60 > [2.570743] ? call_function_interrupt+0xc0/0xc0 > [2.571199] > [2.571422] Code: ff e8 34 b7 6a ff e9 9f 02 00 00 0f 1f 44 00 00 66 2e 0f > 1f 84 00 00 00 00 00 0f 1f 00 48 83 c4 88 f6 84 24 88 00 00 00 03 75 20 > 4a 01 00 00 48 89 e7 48 8b 74 24 78 48 c7 44 24 78 ff ff ff > [2.573192] Kernel panic - not syncing: Machine halted. > [2.573694] CPU: 1 PID: 273 Comm: systemd-udevd Not tainted > 4.14.0-01256-g03dea81fe9f2-dirty #30 > [2.574528] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > 1.10.2-1 04/01/2014 > [2.575330] Call Trace: > [2.575570] <#DF> > [2.575760] dump_stack+0x46/0x59 > [2.576120] panic+0xde/0x223 > [2.576405] df_debug+0x29/0x30 > [2.576687] do_double_fault+0x9a/0x120 > [2.577057] double_fault+0x22/0x30 > [2.577376] RIP: 0010:page_fault+0x11/0x60 > [2.55] RSP: :ff083fc8 EFLAGS: 00010046 > [2.578314] RAX: 819d0ac7 RBX: 0001 RCX: > 819d0ac7 > [2.578979] RDX: RSI: 0010 RDI: > ff084078 > [2.579666] RBP: 000b R08: R09: > 0040 > [2.580334] R10: 0018 R11: 0246 R12: > 0003 > [2.581008] R13: 55719fd7d410 R14: R15: > 03938700 > [2.581684] ? native_iret+0x7/0x7 > [2.582007] WARNING: can't dereference iret registers at ff084048 > for ip page_fault+0x11/0x60 > [2.582008] > [2.583134] > [2.583367] ?
Re: [PATCH] x86/orc: Don't bail on stack overflow
Can you send me whatever config and exact commit hash generated this? I can try to figure out why it failed. On Sat, Nov 25, 2017 at 3:13 PM, Thomas Gleixner wrote: > On Sat, 25 Nov 2017, Andy Lutomirski wrote: > >> On Sat, Nov 25, 2017 at 9:28 AM, Andy Lutomirski wrote: >> > If we overflow the stack into a guard page and then try to unwind >> > it with ORC, it should work perfectly: by construction, there can't >> > be any meaningful data in the guard page because no writes to the >> > guard page will have succeeded. >> > >> > ORC seems entirely capable of unwinding in this situation, except >> > that it doesn't even try. Adjust its initial stack check so that >> > it's willing to try unwinding. >> > >> > I tested this by intentionally overflowing the task stack. The >> > result is an accurate call trace instead of a trace consisting >> > purely of '?' entries. >> > >> > Signed-off-by: Andy Lutomirski >> > --- >> > >> > Hi all- >> > >> > Ingo, this would have fixed half the debugging problem you had, I think. >> > To really nail it, we'd want some kind of magic to annotate the trace >> > so that page_fault (and async_page_fault) entries show CR2 and error_code. >> > >> > Josh, any ideas of how to do that cleanly? We could easily hard-code it >> > in the OOPS unwinder, I guess. >> >> Actually, this does pretty well. We don't get CR2, but, when I added >> an intentional bug kind of along the lines of the one you debugged, >> the intermediate page fault successfully dumps all the regs in the >> stack trace, so we get the faulting instruction *and* the registers. >> We also get ORIG_RAX, which tells us the error code. We could be >> fancy and decode that. > > It works in general, but for that case it's not much better than before > vs. the '?' entries. > > Thanks, > > tglx > > [2.556065] PANIC: double fault, error_code: 0x0 > [2.557116] CPU: 1 PID: 273 Comm: systemd-udevd Not tainted > 4.14.0-01256-g03dea81fe9f2-dirty #30 > [2.558930] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > 1.10.2-1 04/01/2014 > [2.560133] task: 880428dd8000 task.stack: c900025fc000 > [2.560729] RIP: 0010:page_fault+0x11/0x60 > [2.561122] RSP: :ff083fc8 EFLAGS: 00010046 > [2.561607] RAX: 819d0ac7 RBX: 0001 RCX: > 819d0ac7 > [2.562357] RDX: RSI: 0010 RDI: > ff084078 > [2.563027] RBP: 000b R08: R09: > 0040 > [2.563726] R10: 0018 R11: 0246 R12: > 0003 > [2.564429] R13: 55719fd7d410 R14: R15: > 03938700 > [2.565104] FS: 7f9edc0b78c0() GS:88042e44() > knlGS: > [2.565844] CS: 0010 DS: ES: CR0: 80050033 > [2.566396] CR2: ff083fb8 CR3: 000428ec4005 CR4: > 001606e0 > [2.567097] DR0: DR1: DR2: > > [2.567761] DR3: DR6: fffe0ff0 DR7: > 0400 > [2.568451] Call Trace: > [2.568704] > [2.568950] ? __do_page_fault+0x4b0/0x4b0 > [2.569348] ? page_fault+0x2c/0x60 > [2.569680] ? native_iret+0x7/0x7 > [2.570019] ? __do_page_fault+0x4b0/0x4b0 > [2.570396] ? page_fault+0x2c/0x60 > [2.570743] ? call_function_interrupt+0xc0/0xc0 > [2.571199] > [2.571422] Code: ff e8 34 b7 6a ff e9 9f 02 00 00 0f 1f 44 00 00 66 2e 0f > 1f 84 00 00 00 00 00 0f 1f 00 48 83 c4 88 f6 84 24 88 00 00 00 03 75 20 > 4a 01 00 00 48 89 e7 48 8b 74 24 78 48 c7 44 24 78 ff ff ff > [2.573192] Kernel panic - not syncing: Machine halted. > [2.573694] CPU: 1 PID: 273 Comm: systemd-udevd Not tainted > 4.14.0-01256-g03dea81fe9f2-dirty #30 > [2.574528] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > 1.10.2-1 04/01/2014 > [2.575330] Call Trace: > [2.575570] <#DF> > [2.575760] dump_stack+0x46/0x59 > [2.576120] panic+0xde/0x223 > [2.576405] df_debug+0x29/0x30 > [2.576687] do_double_fault+0x9a/0x120 > [2.577057] double_fault+0x22/0x30 > [2.577376] RIP: 0010:page_fault+0x11/0x60 > [2.55] RSP: :ff083fc8 EFLAGS: 00010046 > [2.578314] RAX: 819d0ac7 RBX: 0001 RCX: > 819d0ac7 > [2.578979] RDX: RSI: 0010 RDI: > ff084078 > [2.579666] RBP: 000b R08: R09: > 0040 > [2.580334] R10: 0018 R11: 0246 R12: > 0003 > [2.581008] R13: 55719fd7d410 R14: R15: > 03938700 > [2.581684] ? native_iret+0x7/0x7 > [2.582007] WARNING: can't dereference iret registers at ff084048 > for ip page_fault+0x11/0x60 > [2.582008] > [2.583134] > [2.583367] ? __do_page_fault+0x4b0/0x4b0 > [2.583751] ? page_fault+0x2c/0x60
Re: [PATCH] x86/orc: Don't bail on stack overflow
On Sat, 25 Nov 2017, Andy Lutomirski wrote: > On Sat, Nov 25, 2017 at 9:28 AM, Andy Lutomirskiwrote: > > If we overflow the stack into a guard page and then try to unwind > > it with ORC, it should work perfectly: by construction, there can't > > be any meaningful data in the guard page because no writes to the > > guard page will have succeeded. > > > > ORC seems entirely capable of unwinding in this situation, except > > that it doesn't even try. Adjust its initial stack check so that > > it's willing to try unwinding. > > > > I tested this by intentionally overflowing the task stack. The > > result is an accurate call trace instead of a trace consisting > > purely of '?' entries. > > > > Signed-off-by: Andy Lutomirski > > --- > > > > Hi all- > > > > Ingo, this would have fixed half the debugging problem you had, I think. > > To really nail it, we'd want some kind of magic to annotate the trace > > so that page_fault (and async_page_fault) entries show CR2 and error_code. > > > > Josh, any ideas of how to do that cleanly? We could easily hard-code it > > in the OOPS unwinder, I guess. > > Actually, this does pretty well. We don't get CR2, but, when I added > an intentional bug kind of along the lines of the one you debugged, > the intermediate page fault successfully dumps all the regs in the > stack trace, so we get the faulting instruction *and* the registers. > We also get ORIG_RAX, which tells us the error code. We could be > fancy and decode that. It works in general, but for that case it's not much better than before vs. the '?' entries. Thanks, tglx [2.556065] PANIC: double fault, error_code: 0x0 [2.557116] CPU: 1 PID: 273 Comm: systemd-udevd Not tainted 4.14.0-01256-g03dea81fe9f2-dirty #30 [2.558930] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 [2.560133] task: 880428dd8000 task.stack: c900025fc000 [2.560729] RIP: 0010:page_fault+0x11/0x60 [2.561122] RSP: :ff083fc8 EFLAGS: 00010046 [2.561607] RAX: 819d0ac7 RBX: 0001 RCX: 819d0ac7 [2.562357] RDX: RSI: 0010 RDI: ff084078 [2.563027] RBP: 000b R08: R09: 0040 [2.563726] R10: 0018 R11: 0246 R12: 0003 [2.564429] R13: 55719fd7d410 R14: R15: 03938700 [2.565104] FS: 7f9edc0b78c0() GS:88042e44() knlGS: [2.565844] CS: 0010 DS: ES: CR0: 80050033 [2.566396] CR2: ff083fb8 CR3: 000428ec4005 CR4: 001606e0 [2.567097] DR0: DR1: DR2: [2.567761] DR3: DR6: fffe0ff0 DR7: 0400 [2.568451] Call Trace: [2.568704] [2.568950] ? __do_page_fault+0x4b0/0x4b0 [2.569348] ? page_fault+0x2c/0x60 [2.569680] ? native_iret+0x7/0x7 [2.570019] ? __do_page_fault+0x4b0/0x4b0 [2.570396] ? page_fault+0x2c/0x60 [2.570743] ? call_function_interrupt+0xc0/0xc0 [2.571199] [2.571422] Code: ff e8 34 b7 6a ff e9 9f 02 00 00 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 83 c4 88 f6 84 24 88 00 00 00 03 75 20 4a 01 00 00 48 89 e7 48 8b 74 24 78 48 c7 44 24 78 ff ff ff [2.573192] Kernel panic - not syncing: Machine halted. [2.573694] CPU: 1 PID: 273 Comm: systemd-udevd Not tainted 4.14.0-01256-g03dea81fe9f2-dirty #30 [2.574528] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 [2.575330] Call Trace: [2.575570] <#DF> [2.575760] dump_stack+0x46/0x59 [2.576120] panic+0xde/0x223 [2.576405] df_debug+0x29/0x30 [2.576687] do_double_fault+0x9a/0x120 [2.577057] double_fault+0x22/0x30 [2.577376] RIP: 0010:page_fault+0x11/0x60 [2.55] RSP: :ff083fc8 EFLAGS: 00010046 [2.578314] RAX: 819d0ac7 RBX: 0001 RCX: 819d0ac7 [2.578979] RDX: RSI: 0010 RDI: ff084078 [2.579666] RBP: 000b R08: R09: 0040 [2.580334] R10: 0018 R11: 0246 R12: 0003 [2.581008] R13: 55719fd7d410 R14: R15: 03938700 [2.581684] ? native_iret+0x7/0x7 [2.582007] WARNING: can't dereference iret registers at ff084048 for ip page_fault+0x11/0x60 [2.582008] [2.583134] [2.583367] ? __do_page_fault+0x4b0/0x4b0 [2.583751] ? page_fault+0x2c/0x60 [2.584127] ? native_iret+0x7/0x7 [2.584466] ? __do_page_fault+0x4b0/0x4b0 [2.584860] ? page_fault+0x2c/0x60 [2.585195] ? call_function_interrupt+0xc0/0xc0 [2.585621] [2.586966] Dumping ftrace buffer: [2.587254](ftrace buffer empty) [2.587534] Kernel Offset: disabled [2.587814] ---[
Re: [PATCH] x86/orc: Don't bail on stack overflow
On Sat, 25 Nov 2017, Andy Lutomirski wrote: > On Sat, Nov 25, 2017 at 9:28 AM, Andy Lutomirski wrote: > > If we overflow the stack into a guard page and then try to unwind > > it with ORC, it should work perfectly: by construction, there can't > > be any meaningful data in the guard page because no writes to the > > guard page will have succeeded. > > > > ORC seems entirely capable of unwinding in this situation, except > > that it doesn't even try. Adjust its initial stack check so that > > it's willing to try unwinding. > > > > I tested this by intentionally overflowing the task stack. The > > result is an accurate call trace instead of a trace consisting > > purely of '?' entries. > > > > Signed-off-by: Andy Lutomirski > > --- > > > > Hi all- > > > > Ingo, this would have fixed half the debugging problem you had, I think. > > To really nail it, we'd want some kind of magic to annotate the trace > > so that page_fault (and async_page_fault) entries show CR2 and error_code. > > > > Josh, any ideas of how to do that cleanly? We could easily hard-code it > > in the OOPS unwinder, I guess. > > Actually, this does pretty well. We don't get CR2, but, when I added > an intentional bug kind of along the lines of the one you debugged, > the intermediate page fault successfully dumps all the regs in the > stack trace, so we get the faulting instruction *and* the registers. > We also get ORIG_RAX, which tells us the error code. We could be > fancy and decode that. It works in general, but for that case it's not much better than before vs. the '?' entries. Thanks, tglx [2.556065] PANIC: double fault, error_code: 0x0 [2.557116] CPU: 1 PID: 273 Comm: systemd-udevd Not tainted 4.14.0-01256-g03dea81fe9f2-dirty #30 [2.558930] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 [2.560133] task: 880428dd8000 task.stack: c900025fc000 [2.560729] RIP: 0010:page_fault+0x11/0x60 [2.561122] RSP: :ff083fc8 EFLAGS: 00010046 [2.561607] RAX: 819d0ac7 RBX: 0001 RCX: 819d0ac7 [2.562357] RDX: RSI: 0010 RDI: ff084078 [2.563027] RBP: 000b R08: R09: 0040 [2.563726] R10: 0018 R11: 0246 R12: 0003 [2.564429] R13: 55719fd7d410 R14: R15: 03938700 [2.565104] FS: 7f9edc0b78c0() GS:88042e44() knlGS: [2.565844] CS: 0010 DS: ES: CR0: 80050033 [2.566396] CR2: ff083fb8 CR3: 000428ec4005 CR4: 001606e0 [2.567097] DR0: DR1: DR2: [2.567761] DR3: DR6: fffe0ff0 DR7: 0400 [2.568451] Call Trace: [2.568704] [2.568950] ? __do_page_fault+0x4b0/0x4b0 [2.569348] ? page_fault+0x2c/0x60 [2.569680] ? native_iret+0x7/0x7 [2.570019] ? __do_page_fault+0x4b0/0x4b0 [2.570396] ? page_fault+0x2c/0x60 [2.570743] ? call_function_interrupt+0xc0/0xc0 [2.571199] [2.571422] Code: ff e8 34 b7 6a ff e9 9f 02 00 00 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 83 c4 88 f6 84 24 88 00 00 00 03 75 20 4a 01 00 00 48 89 e7 48 8b 74 24 78 48 c7 44 24 78 ff ff ff [2.573192] Kernel panic - not syncing: Machine halted. [2.573694] CPU: 1 PID: 273 Comm: systemd-udevd Not tainted 4.14.0-01256-g03dea81fe9f2-dirty #30 [2.574528] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 [2.575330] Call Trace: [2.575570] <#DF> [2.575760] dump_stack+0x46/0x59 [2.576120] panic+0xde/0x223 [2.576405] df_debug+0x29/0x30 [2.576687] do_double_fault+0x9a/0x120 [2.577057] double_fault+0x22/0x30 [2.577376] RIP: 0010:page_fault+0x11/0x60 [2.55] RSP: :ff083fc8 EFLAGS: 00010046 [2.578314] RAX: 819d0ac7 RBX: 0001 RCX: 819d0ac7 [2.578979] RDX: RSI: 0010 RDI: ff084078 [2.579666] RBP: 000b R08: R09: 0040 [2.580334] R10: 0018 R11: 0246 R12: 0003 [2.581008] R13: 55719fd7d410 R14: R15: 03938700 [2.581684] ? native_iret+0x7/0x7 [2.582007] WARNING: can't dereference iret registers at ff084048 for ip page_fault+0x11/0x60 [2.582008] [2.583134] [2.583367] ? __do_page_fault+0x4b0/0x4b0 [2.583751] ? page_fault+0x2c/0x60 [2.584127] ? native_iret+0x7/0x7 [2.584466] ? __do_page_fault+0x4b0/0x4b0 [2.584860] ? page_fault+0x2c/0x60 [2.585195] ? call_function_interrupt+0xc0/0xc0 [2.585621] [2.586966] Dumping ftrace buffer: [2.587254](ftrace buffer empty) [2.587534] Kernel Offset: disabled [2.587814] ---[ end Kernel panic - not syncing:
Re: [PATCH] x86/orc: Don't bail on stack overflow
On Sat, Nov 25, 2017 at 9:28 AM, Andy Lutomirskiwrote: > If we overflow the stack into a guard page and then try to unwind > it with ORC, it should work perfectly: by construction, there can't > be any meaningful data in the guard page because no writes to the > guard page will have succeeded. > > ORC seems entirely capable of unwinding in this situation, except > that it doesn't even try. Adjust its initial stack check so that > it's willing to try unwinding. > > I tested this by intentionally overflowing the task stack. The > result is an accurate call trace instead of a trace consisting > purely of '?' entries. > > Signed-off-by: Andy Lutomirski > --- > > Hi all- > > Ingo, this would have fixed half the debugging problem you had, I think. > To really nail it, we'd want some kind of magic to annotate the trace > so that page_fault (and async_page_fault) entries show CR2 and error_code. > > Josh, any ideas of how to do that cleanly? We could easily hard-code it > in the OOPS unwinder, I guess. Actually, this does pretty well. We don't get CR2, but, when I added an intentional bug kind of along the lines of the one you debugged, the intermediate page fault successfully dumps all the regs in the stack trace, so we get the faulting instruction *and* the registers. We also get ORIG_RAX, which tells us the error code. We could be fancy and decode that.
Re: [PATCH] x86/orc: Don't bail on stack overflow
On Sat, Nov 25, 2017 at 9:28 AM, Andy Lutomirski wrote: > If we overflow the stack into a guard page and then try to unwind > it with ORC, it should work perfectly: by construction, there can't > be any meaningful data in the guard page because no writes to the > guard page will have succeeded. > > ORC seems entirely capable of unwinding in this situation, except > that it doesn't even try. Adjust its initial stack check so that > it's willing to try unwinding. > > I tested this by intentionally overflowing the task stack. The > result is an accurate call trace instead of a trace consisting > purely of '?' entries. > > Signed-off-by: Andy Lutomirski > --- > > Hi all- > > Ingo, this would have fixed half the debugging problem you had, I think. > To really nail it, we'd want some kind of magic to annotate the trace > so that page_fault (and async_page_fault) entries show CR2 and error_code. > > Josh, any ideas of how to do that cleanly? We could easily hard-code it > in the OOPS unwinder, I guess. Actually, this does pretty well. We don't get CR2, but, when I added an intentional bug kind of along the lines of the one you debugged, the intermediate page fault successfully dumps all the regs in the stack trace, so we get the faulting instruction *and* the registers. We also get ORIG_RAX, which tells us the error code. We could be fancy and decode that.