Re: [PATCH] x86/orc: Don't bail on stack overflow

2017-11-27 Thread Ingo Molnar

* Josh Poimboeuf  wrote:

> On Mon, Nov 27, 2017 at 10:38:42AM +0100, Ingo Molnar wrote:
> > 
> > * Josh Poimboeuf  wrote:
> > 
> > > On Sat, Nov 25, 2017 at 04:16:23PM -0800, Andy Lutomirski wrote:
> > > > Can you send me whatever config and exact commit hash generated this?
> > > > I can try to figure out why it failed.
> > > 
> > > Sorry, I've been traveling.  I just got some time to take a look at
> > > this.  I think there are at least two unwinder issues here:
> > > 
> > > - It doesn't deal gracefully with the case where the stack overflows and
> > >   the stack pointer itself isn't on a valid stack but the
> > >   to-be-dereferenced data *is*.
> > > 
> > > - The oops dump code doesn't know how to print partial pt_regs, for the
> > >   case where if we get an interrupt/exception in *early* entry code
> > >   before the full pt_regs have been saved.
> > > 
> > > (Andy, I'm not quite sure about your patch, and whether it's still
> > > needed after these patches.  I'll need to look at it later when I have
> > > more time.)
> > > 
> > > I attempted to fix both of the issues with the below patch.  Thomas or
> > > Ingo, can you test to see if this gets rid of the question marks?
> > > 
> > > I can split it up into proper patches next week.  I'm assuming this
> > > isn't holding up the KAISER merge?
> > 
> > It's not holding up the Kaiser merge, but good debuggability of weird 
> > crashes is a 
> > really good thing, so I constructed a changelog and picked up this patch as 
> > a 
> > single commit, and added your Signed-off-by, if that's OK with you.
> > 
> > Will only push it out if it passes testing.
> 
> The commit log looks good, though there's a CONFIG_FRAME_POINTER build
> failure.  Can you fold in this fix?
> 
> diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c
> index 7dd0d2a0d142..77835bc021c7 100644
> --- a/arch/x86/kernel/stacktrace.c
> +++ b/arch/x86/kernel/stacktrace.c
> @@ -102,7 +102,7 @@ __save_stack_trace_reliable(struct stack_trace *trace,
>   for (unwind_start(, task, NULL, NULL); !unwind_done();
>unwind_next_frame()) {
>  
> - regs = unwind_get_entry_regs(, NULL);
> + regs = unwind_get_entry_regs();
>   if (regs) {
>   /*
>* Kernel mode registers on the stack indicate an

Done, thanks!

Ingo


Re: [PATCH] x86/orc: Don't bail on stack overflow

2017-11-27 Thread Ingo Molnar

* Josh Poimboeuf  wrote:

> On Mon, Nov 27, 2017 at 10:38:42AM +0100, Ingo Molnar wrote:
> > 
> > * Josh Poimboeuf  wrote:
> > 
> > > On Sat, Nov 25, 2017 at 04:16:23PM -0800, Andy Lutomirski wrote:
> > > > Can you send me whatever config and exact commit hash generated this?
> > > > I can try to figure out why it failed.
> > > 
> > > Sorry, I've been traveling.  I just got some time to take a look at
> > > this.  I think there are at least two unwinder issues here:
> > > 
> > > - It doesn't deal gracefully with the case where the stack overflows and
> > >   the stack pointer itself isn't on a valid stack but the
> > >   to-be-dereferenced data *is*.
> > > 
> > > - The oops dump code doesn't know how to print partial pt_regs, for the
> > >   case where if we get an interrupt/exception in *early* entry code
> > >   before the full pt_regs have been saved.
> > > 
> > > (Andy, I'm not quite sure about your patch, and whether it's still
> > > needed after these patches.  I'll need to look at it later when I have
> > > more time.)
> > > 
> > > I attempted to fix both of the issues with the below patch.  Thomas or
> > > Ingo, can you test to see if this gets rid of the question marks?
> > > 
> > > I can split it up into proper patches next week.  I'm assuming this
> > > isn't holding up the KAISER merge?
> > 
> > It's not holding up the Kaiser merge, but good debuggability of weird 
> > crashes is a 
> > really good thing, so I constructed a changelog and picked up this patch as 
> > a 
> > single commit, and added your Signed-off-by, if that's OK with you.
> > 
> > Will only push it out if it passes testing.
> 
> The commit log looks good, though there's a CONFIG_FRAME_POINTER build
> failure.  Can you fold in this fix?
> 
> diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c
> index 7dd0d2a0d142..77835bc021c7 100644
> --- a/arch/x86/kernel/stacktrace.c
> +++ b/arch/x86/kernel/stacktrace.c
> @@ -102,7 +102,7 @@ __save_stack_trace_reliable(struct stack_trace *trace,
>   for (unwind_start(, task, NULL, NULL); !unwind_done();
>unwind_next_frame()) {
>  
> - regs = unwind_get_entry_regs(, NULL);
> + regs = unwind_get_entry_regs();
>   if (regs) {
>   /*
>* Kernel mode registers on the stack indicate an

Done, thanks!

Ingo


Re: [PATCH] x86/orc: Don't bail on stack overflow

2017-11-27 Thread Josh Poimboeuf
On Mon, Nov 27, 2017 at 10:38:42AM +0100, Ingo Molnar wrote:
> 
> * Josh Poimboeuf  wrote:
> 
> > On Sat, Nov 25, 2017 at 04:16:23PM -0800, Andy Lutomirski wrote:
> > > Can you send me whatever config and exact commit hash generated this?
> > > I can try to figure out why it failed.
> > 
> > Sorry, I've been traveling.  I just got some time to take a look at
> > this.  I think there are at least two unwinder issues here:
> > 
> > - It doesn't deal gracefully with the case where the stack overflows and
> >   the stack pointer itself isn't on a valid stack but the
> >   to-be-dereferenced data *is*.
> > 
> > - The oops dump code doesn't know how to print partial pt_regs, for the
> >   case where if we get an interrupt/exception in *early* entry code
> >   before the full pt_regs have been saved.
> > 
> > (Andy, I'm not quite sure about your patch, and whether it's still
> > needed after these patches.  I'll need to look at it later when I have
> > more time.)
> > 
> > I attempted to fix both of the issues with the below patch.  Thomas or
> > Ingo, can you test to see if this gets rid of the question marks?
> > 
> > I can split it up into proper patches next week.  I'm assuming this
> > isn't holding up the KAISER merge?
> 
> It's not holding up the Kaiser merge, but good debuggability of weird crashes 
> is a 
> really good thing, so I constructed a changelog and picked up this patch as a 
> single commit, and added your Signed-off-by, if that's OK with you.
> 
> Will only push it out if it passes testing.

The commit log looks good, though there's a CONFIG_FRAME_POINTER build
failure.  Can you fold in this fix?

diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c
index 7dd0d2a0d142..77835bc021c7 100644
--- a/arch/x86/kernel/stacktrace.c
+++ b/arch/x86/kernel/stacktrace.c
@@ -102,7 +102,7 @@ __save_stack_trace_reliable(struct stack_trace *trace,
for (unwind_start(, task, NULL, NULL); !unwind_done();
 unwind_next_frame()) {
 
-   regs = unwind_get_entry_regs(, NULL);
+   regs = unwind_get_entry_regs();
if (regs) {
/*
 * Kernel mode registers on the stack indicate an


Re: [PATCH] x86/orc: Don't bail on stack overflow

2017-11-27 Thread Josh Poimboeuf
On Mon, Nov 27, 2017 at 10:38:42AM +0100, Ingo Molnar wrote:
> 
> * Josh Poimboeuf  wrote:
> 
> > On Sat, Nov 25, 2017 at 04:16:23PM -0800, Andy Lutomirski wrote:
> > > Can you send me whatever config and exact commit hash generated this?
> > > I can try to figure out why it failed.
> > 
> > Sorry, I've been traveling.  I just got some time to take a look at
> > this.  I think there are at least two unwinder issues here:
> > 
> > - It doesn't deal gracefully with the case where the stack overflows and
> >   the stack pointer itself isn't on a valid stack but the
> >   to-be-dereferenced data *is*.
> > 
> > - The oops dump code doesn't know how to print partial pt_regs, for the
> >   case where if we get an interrupt/exception in *early* entry code
> >   before the full pt_regs have been saved.
> > 
> > (Andy, I'm not quite sure about your patch, and whether it's still
> > needed after these patches.  I'll need to look at it later when I have
> > more time.)
> > 
> > I attempted to fix both of the issues with the below patch.  Thomas or
> > Ingo, can you test to see if this gets rid of the question marks?
> > 
> > I can split it up into proper patches next week.  I'm assuming this
> > isn't holding up the KAISER merge?
> 
> It's not holding up the Kaiser merge, but good debuggability of weird crashes 
> is a 
> really good thing, so I constructed a changelog and picked up this patch as a 
> single commit, and added your Signed-off-by, if that's OK with you.
> 
> Will only push it out if it passes testing.

The commit log looks good, though there's a CONFIG_FRAME_POINTER build
failure.  Can you fold in this fix?

diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c
index 7dd0d2a0d142..77835bc021c7 100644
--- a/arch/x86/kernel/stacktrace.c
+++ b/arch/x86/kernel/stacktrace.c
@@ -102,7 +102,7 @@ __save_stack_trace_reliable(struct stack_trace *trace,
for (unwind_start(, task, NULL, NULL); !unwind_done();
 unwind_next_frame()) {
 
-   regs = unwind_get_entry_regs(, NULL);
+   regs = unwind_get_entry_regs();
if (regs) {
/*
 * Kernel mode registers on the stack indicate an


Re: [PATCH] x86/orc: Don't bail on stack overflow

2017-11-27 Thread Ingo Molnar

* Ingo Molnar  wrote:

> Will only push it out if it passes testing.

So far it required the small fix below for 32-bit.

Thanks,

Ingo

---
 arch/x86/kernel/dumpstack.c  | 7 +++
 arch/x86/kernel/process_64.c | 7 ---
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index f7f3b5b3bc05..f214f7a6ccad 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -68,6 +68,13 @@ static void printk_stack_address(unsigned long address, int 
reliable,
printk("%s %s%pB\n", log_lvl, reliable ? "" : "? ", (void *)address);
 }
 
+void show_iret_regs(struct pt_regs *regs)
+{
+   printk(KERN_DEFAULT "RIP: %04x:%pS\n", (int)regs->cs, (void *)regs->ip);
+   printk(KERN_DEFAULT "RSP: %04x:%016lx EFLAGS: %08lx", (int)regs->ss,
+   regs->sp, regs->flags);
+}
+
 static void show_regs_safe(struct stack_info *info, struct pt_regs *regs)
 {
if (on_stack(info, regs, sizeof(*regs)))
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 2e0e2979e34d..631e229ab428 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -61,13 +61,6 @@
 
 __visible DEFINE_PER_CPU_USER_MAPPED(unsigned long, rsp_scratch);
 
-void show_iret_regs(struct pt_regs *regs)
-{
-   printk(KERN_DEFAULT "RIP: %04lx:%pS\n", regs->cs, (void *)regs->ip);
-   printk(KERN_DEFAULT "RSP: %04lx:%016lx EFLAGS: %08lx", regs->ss,
-   regs->sp, regs->flags);
-}
-
 /* Prints also some state that isn't saved in the pt_regs */
 void __show_regs(struct pt_regs *regs, int all)
 {


Re: [PATCH] x86/orc: Don't bail on stack overflow

2017-11-27 Thread Ingo Molnar

* Ingo Molnar  wrote:

> Will only push it out if it passes testing.

So far it required the small fix below for 32-bit.

Thanks,

Ingo

---
 arch/x86/kernel/dumpstack.c  | 7 +++
 arch/x86/kernel/process_64.c | 7 ---
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index f7f3b5b3bc05..f214f7a6ccad 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -68,6 +68,13 @@ static void printk_stack_address(unsigned long address, int 
reliable,
printk("%s %s%pB\n", log_lvl, reliable ? "" : "? ", (void *)address);
 }
 
+void show_iret_regs(struct pt_regs *regs)
+{
+   printk(KERN_DEFAULT "RIP: %04x:%pS\n", (int)regs->cs, (void *)regs->ip);
+   printk(KERN_DEFAULT "RSP: %04x:%016lx EFLAGS: %08lx", (int)regs->ss,
+   regs->sp, regs->flags);
+}
+
 static void show_regs_safe(struct stack_info *info, struct pt_regs *regs)
 {
if (on_stack(info, regs, sizeof(*regs)))
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 2e0e2979e34d..631e229ab428 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -61,13 +61,6 @@
 
 __visible DEFINE_PER_CPU_USER_MAPPED(unsigned long, rsp_scratch);
 
-void show_iret_regs(struct pt_regs *regs)
-{
-   printk(KERN_DEFAULT "RIP: %04lx:%pS\n", regs->cs, (void *)regs->ip);
-   printk(KERN_DEFAULT "RSP: %04lx:%016lx EFLAGS: %08lx", regs->ss,
-   regs->sp, regs->flags);
-}
-
 /* Prints also some state that isn't saved in the pt_regs */
 void __show_regs(struct pt_regs *regs, int all)
 {


Re: [PATCH] x86/orc: Don't bail on stack overflow

2017-11-27 Thread Ingo Molnar

* Josh Poimboeuf  wrote:

> On Sat, Nov 25, 2017 at 04:16:23PM -0800, Andy Lutomirski wrote:
> > Can you send me whatever config and exact commit hash generated this?
> > I can try to figure out why it failed.
> 
> Sorry, I've been traveling.  I just got some time to take a look at
> this.  I think there are at least two unwinder issues here:
> 
> - It doesn't deal gracefully with the case where the stack overflows and
>   the stack pointer itself isn't on a valid stack but the
>   to-be-dereferenced data *is*.
> 
> - The oops dump code doesn't know how to print partial pt_regs, for the
>   case where if we get an interrupt/exception in *early* entry code
>   before the full pt_regs have been saved.
> 
> (Andy, I'm not quite sure about your patch, and whether it's still
> needed after these patches.  I'll need to look at it later when I have
> more time.)
> 
> I attempted to fix both of the issues with the below patch.  Thomas or
> Ingo, can you test to see if this gets rid of the question marks?
> 
> I can split it up into proper patches next week.  I'm assuming this
> isn't holding up the KAISER merge?

It's not holding up the Kaiser merge, but good debuggability of weird crashes 
is a 
really good thing, so I constructed a changelog and picked up this patch as a 
single commit, and added your Signed-off-by, if that's OK with you.

Will only push it out if it passes testing.

Thanks,

Ingo


Re: [PATCH] x86/orc: Don't bail on stack overflow

2017-11-27 Thread Ingo Molnar

* Josh Poimboeuf  wrote:

> On Sat, Nov 25, 2017 at 04:16:23PM -0800, Andy Lutomirski wrote:
> > Can you send me whatever config and exact commit hash generated this?
> > I can try to figure out why it failed.
> 
> Sorry, I've been traveling.  I just got some time to take a look at
> this.  I think there are at least two unwinder issues here:
> 
> - It doesn't deal gracefully with the case where the stack overflows and
>   the stack pointer itself isn't on a valid stack but the
>   to-be-dereferenced data *is*.
> 
> - The oops dump code doesn't know how to print partial pt_regs, for the
>   case where if we get an interrupt/exception in *early* entry code
>   before the full pt_regs have been saved.
> 
> (Andy, I'm not quite sure about your patch, and whether it's still
> needed after these patches.  I'll need to look at it later when I have
> more time.)
> 
> I attempted to fix both of the issues with the below patch.  Thomas or
> Ingo, can you test to see if this gets rid of the question marks?
> 
> I can split it up into proper patches next week.  I'm assuming this
> isn't holding up the KAISER merge?

It's not holding up the Kaiser merge, but good debuggability of weird crashes 
is a 
really good thing, so I constructed a changelog and picked up this patch as a 
single commit, and added your Signed-off-by, if that's OK with you.

Will only push it out if it passes testing.

Thanks,

Ingo


Re: [PATCH] x86/orc: Don't bail on stack overflow

2017-11-26 Thread Thomas Gleixner
On Sat, 25 Nov 2017, Josh Poimboeuf wrote:

> It looks a *lot* better with mine and your patches applied.  It probably
> would have helped Ingo and Thomas figure the problem out a lot sooner:

> [1.159583] CS:  0010 DS:  ES:  CR0: 80050033
> [1.159583] CR2: ff083fb8 CR3: 000136f78002 CR4: 
> 001606e0
> [1.159583] Call Trace:
> [1.159583]  
> [1.159583]  __do_page_fault+0x4b0/0x4b0
> [1.159583]  page_fault+0x2c/0x60
> [1.159583] RIP: 0010:do_page_fault+0x0/0x100
> [1.159583] RSP: :ff084120 EFLAGS: 00010012
> [1.159583] RAX: 819d0a87 RBX: 0001 RCX: 
> 819d0a87
> [1.159583] RDX: 1000 RSI: 0010 RDI: 
> ff084128
> [1.159583] RBP: 0d68 R08: 7f6d6bb24278 R09: 
> 0023
> [1.159583] R10: 558e0feca600 R11: 0246 R12: 
> 7f6d6bb203c0
> [1.159583] R13: 7f6d6bb1f880 R14: 7793bebc R15: 
> 0100
> [1.159583]  ? native_iret+0x7/0x7
> [1.159583]  page_fault+0x2c/0x60
> [1.159583] RIP: 0010:apic_timer_interrupt+0x0/0xb0

Yes. That would have pointed immediately to the right place. It'd been
obvious that apic_timer_interrupt is not mapped.

Thanks,

tglx


Re: [PATCH] x86/orc: Don't bail on stack overflow

2017-11-26 Thread Thomas Gleixner
On Sat, 25 Nov 2017, Josh Poimboeuf wrote:

> It looks a *lot* better with mine and your patches applied.  It probably
> would have helped Ingo and Thomas figure the problem out a lot sooner:

> [1.159583] CS:  0010 DS:  ES:  CR0: 80050033
> [1.159583] CR2: ff083fb8 CR3: 000136f78002 CR4: 
> 001606e0
> [1.159583] Call Trace:
> [1.159583]  
> [1.159583]  __do_page_fault+0x4b0/0x4b0
> [1.159583]  page_fault+0x2c/0x60
> [1.159583] RIP: 0010:do_page_fault+0x0/0x100
> [1.159583] RSP: :ff084120 EFLAGS: 00010012
> [1.159583] RAX: 819d0a87 RBX: 0001 RCX: 
> 819d0a87
> [1.159583] RDX: 1000 RSI: 0010 RDI: 
> ff084128
> [1.159583] RBP: 0d68 R08: 7f6d6bb24278 R09: 
> 0023
> [1.159583] R10: 558e0feca600 R11: 0246 R12: 
> 7f6d6bb203c0
> [1.159583] R13: 7f6d6bb1f880 R14: 7793bebc R15: 
> 0100
> [1.159583]  ? native_iret+0x7/0x7
> [1.159583]  page_fault+0x2c/0x60
> [1.159583] RIP: 0010:apic_timer_interrupt+0x0/0xb0

Yes. That would have pointed immediately to the right place. It'd been
obvious that apic_timer_interrupt is not mapped.

Thanks,

tglx


Re: [PATCH] x86/orc: Don't bail on stack overflow

2017-11-25 Thread Josh Poimboeuf
On Sat, Nov 25, 2017 at 10:41:15PM -0600, Josh Poimboeuf wrote:
> On Sat, Nov 25, 2017 at 08:25:12PM -0800, Andy Lutomirski wrote:
> > On Sat, Nov 25, 2017 at 6:40 PM, Josh Poimboeuf  wrote:
> > > On Sat, Nov 25, 2017 at 04:16:23PM -0800, Andy Lutomirski wrote:
> > >> Can you send me whatever config and exact commit hash generated this?
> > >> I can try to figure out why it failed.
> > >
> > > Sorry, I've been traveling.  I just got some time to take a look at
> > > this.  I think there are at least two unwinder issues here:
> > >
> > > - It doesn't deal gracefully with the case where the stack overflows and
> > >   the stack pointer itself isn't on a valid stack but the
> > >   to-be-dereferenced data *is*.
> > >
> > > - The oops dump code doesn't know how to print partial pt_regs, for the
> > >   case where if we get an interrupt/exception in *early* entry code
> > >   before the full pt_regs have been saved.
> > >
> > > (Andy, I'm not quite sure about your patch, and whether it's still
> > > needed after these patches.  I'll need to look at it later when I have
> > > more time.)
> > 
> > I haven't tested yet, but I think my patch is probably still needed.
> > The issue I fixed is that unwind_start() would bail out early if sp
> > was below the stack.  Also:
> 
> Makes sense, maybe both are needed.  Your patch deals with a bad SP at
> the beginning and mine deals with a bad SP in the middle.

I was able to recreate with the config Ingo posted earlier, along with
the following patch:

diff --git a/include/asm-generic/vmlinux.lds.h 
b/include/asm-generic/vmlinux.lds.h
index e12168936d3f..693a20d309e3 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -500,7 +500,7 @@
VMLINUX_SYMBOL(__entry_text_end) = .;

 #define IRQENTRY_TEXT  \
-   ALIGN_FUNCTION();   \
+   . = ALIGN(4096);\
VMLINUX_SYMBOL(__irqentry_text_start) = .;  \
*(.irqentry.text)   \
VMLINUX_SYMBOL(__irqentry_text_end) = .;


It looks a *lot* better with mine and your patches applied.  It probably
would have helped Ingo and Thomas figure the problem out a lot sooner:

[1.159016] PANIC: double fault, error_code: 0x0
[1.159583] CPU: 1 PID: 68 Comm: modprobe Not tainted 
4.14.0-01257-g761d390195b6-dirty #19
[1.159583] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.10.2-1.fc26 04/01/2014
[1.159583] task: 880136f6bb00 task.stack: c9984000
[1.159583] RIP: 0010:page_fault+0x11/0x60
[1.159583] RSP: :ff083fc8 EFLAGS: 00010046
[1.159583] RAX: 819d0a87 RBX: 0001 RCX: 819d0a87
[1.159583] RDX: 1000 RSI: 0010 RDI: ff084078
[1.159583] RBP: 0d68 R08: 7f6d6bb24278 R09: 0023
[1.159583] R10: 558e0feca600 R11: 0246 R12: 7f6d6bb203c0
[1.159583] R13: 7f6d6bb1f880 R14: 7793bebc R15: 0100
[1.159583] FS:  7f6d6c39c5c0() GS:88013fc4() 
knlGS:
[1.159583] CS:  0010 DS:  ES:  CR0: 80050033
[1.159583] CR2: ff083fb8 CR3: 000136f78002 CR4: 001606e0
[1.159583] Call Trace:
[1.159583]  
[1.159583]  __do_page_fault+0x4b0/0x4b0
[1.159583]  page_fault+0x2c/0x60
[1.159583] RIP: 0010:do_page_fault+0x0/0x100
[1.159583] RSP: :ff084120 EFLAGS: 00010012
[1.159583] RAX: 819d0a87 RBX: 0001 RCX: 819d0a87
[1.159583] RDX: 1000 RSI: 0010 RDI: ff084128
[1.159583] RBP: 0d68 R08: 7f6d6bb24278 R09: 0023
[1.159583] R10: 558e0feca600 R11: 0246 R12: 7f6d6bb203c0
[1.159583] R13: 7f6d6bb1f880 R14: 7793bebc R15: 0100
[1.159583]  ? native_iret+0x7/0x7
[1.159583]  page_fault+0x2c/0x60
[1.159583] RIP: 0010:apic_timer_interrupt+0x0/0xb0
[1.159583] RSP: :ff0841d8 EFLAGS: 00010046
[1.159583] RAX: 0374 RBX: 558e0feca2c0 RCX: 7f6d6b85aaf0
[1.159583] RDX: 1000 RSI: 558e0feca600 RDI: 
[1.159583] RBP: 0d68 R08: 7f6d6bb24278 R09: 0023
[1.159583] R10: 558e0feca600 R11: 0246 R12: 7f6d6bb203c0
[1.159583] R13: 7f6d6bb1f880 R14: 7793bebc R15: 0100
[1.159583] RIP: 0033:0x7f6d6b85aaf0
[1.159583] RSP: 002b:7793bd68 EFLAGS: 0246 ORIG_RAX: 
0003
[1.159583] RAX: 819d2000 RBX: 7f6d6b85aaf0 RCX: 0010
[1.159583] RDX: 00010046 RSI: ff0841d8 RDI: 
[1.159583] RBP: 

Re: [PATCH] x86/orc: Don't bail on stack overflow

2017-11-25 Thread Josh Poimboeuf
On Sat, Nov 25, 2017 at 10:41:15PM -0600, Josh Poimboeuf wrote:
> On Sat, Nov 25, 2017 at 08:25:12PM -0800, Andy Lutomirski wrote:
> > On Sat, Nov 25, 2017 at 6:40 PM, Josh Poimboeuf  wrote:
> > > On Sat, Nov 25, 2017 at 04:16:23PM -0800, Andy Lutomirski wrote:
> > >> Can you send me whatever config and exact commit hash generated this?
> > >> I can try to figure out why it failed.
> > >
> > > Sorry, I've been traveling.  I just got some time to take a look at
> > > this.  I think there are at least two unwinder issues here:
> > >
> > > - It doesn't deal gracefully with the case where the stack overflows and
> > >   the stack pointer itself isn't on a valid stack but the
> > >   to-be-dereferenced data *is*.
> > >
> > > - The oops dump code doesn't know how to print partial pt_regs, for the
> > >   case where if we get an interrupt/exception in *early* entry code
> > >   before the full pt_regs have been saved.
> > >
> > > (Andy, I'm not quite sure about your patch, and whether it's still
> > > needed after these patches.  I'll need to look at it later when I have
> > > more time.)
> > 
> > I haven't tested yet, but I think my patch is probably still needed.
> > The issue I fixed is that unwind_start() would bail out early if sp
> > was below the stack.  Also:
> 
> Makes sense, maybe both are needed.  Your patch deals with a bad SP at
> the beginning and mine deals with a bad SP in the middle.

I was able to recreate with the config Ingo posted earlier, along with
the following patch:

diff --git a/include/asm-generic/vmlinux.lds.h 
b/include/asm-generic/vmlinux.lds.h
index e12168936d3f..693a20d309e3 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -500,7 +500,7 @@
VMLINUX_SYMBOL(__entry_text_end) = .;

 #define IRQENTRY_TEXT  \
-   ALIGN_FUNCTION();   \
+   . = ALIGN(4096);\
VMLINUX_SYMBOL(__irqentry_text_start) = .;  \
*(.irqentry.text)   \
VMLINUX_SYMBOL(__irqentry_text_end) = .;


It looks a *lot* better with mine and your patches applied.  It probably
would have helped Ingo and Thomas figure the problem out a lot sooner:

[1.159016] PANIC: double fault, error_code: 0x0
[1.159583] CPU: 1 PID: 68 Comm: modprobe Not tainted 
4.14.0-01257-g761d390195b6-dirty #19
[1.159583] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.10.2-1.fc26 04/01/2014
[1.159583] task: 880136f6bb00 task.stack: c9984000
[1.159583] RIP: 0010:page_fault+0x11/0x60
[1.159583] RSP: :ff083fc8 EFLAGS: 00010046
[1.159583] RAX: 819d0a87 RBX: 0001 RCX: 819d0a87
[1.159583] RDX: 1000 RSI: 0010 RDI: ff084078
[1.159583] RBP: 0d68 R08: 7f6d6bb24278 R09: 0023
[1.159583] R10: 558e0feca600 R11: 0246 R12: 7f6d6bb203c0
[1.159583] R13: 7f6d6bb1f880 R14: 7793bebc R15: 0100
[1.159583] FS:  7f6d6c39c5c0() GS:88013fc4() 
knlGS:
[1.159583] CS:  0010 DS:  ES:  CR0: 80050033
[1.159583] CR2: ff083fb8 CR3: 000136f78002 CR4: 001606e0
[1.159583] Call Trace:
[1.159583]  
[1.159583]  __do_page_fault+0x4b0/0x4b0
[1.159583]  page_fault+0x2c/0x60
[1.159583] RIP: 0010:do_page_fault+0x0/0x100
[1.159583] RSP: :ff084120 EFLAGS: 00010012
[1.159583] RAX: 819d0a87 RBX: 0001 RCX: 819d0a87
[1.159583] RDX: 1000 RSI: 0010 RDI: ff084128
[1.159583] RBP: 0d68 R08: 7f6d6bb24278 R09: 0023
[1.159583] R10: 558e0feca600 R11: 0246 R12: 7f6d6bb203c0
[1.159583] R13: 7f6d6bb1f880 R14: 7793bebc R15: 0100
[1.159583]  ? native_iret+0x7/0x7
[1.159583]  page_fault+0x2c/0x60
[1.159583] RIP: 0010:apic_timer_interrupt+0x0/0xb0
[1.159583] RSP: :ff0841d8 EFLAGS: 00010046
[1.159583] RAX: 0374 RBX: 558e0feca2c0 RCX: 7f6d6b85aaf0
[1.159583] RDX: 1000 RSI: 558e0feca600 RDI: 
[1.159583] RBP: 0d68 R08: 7f6d6bb24278 R09: 0023
[1.159583] R10: 558e0feca600 R11: 0246 R12: 7f6d6bb203c0
[1.159583] R13: 7f6d6bb1f880 R14: 7793bebc R15: 0100
[1.159583] RIP: 0033:0x7f6d6b85aaf0
[1.159583] RSP: 002b:7793bd68 EFLAGS: 0246 ORIG_RAX: 
0003
[1.159583] RAX: 819d2000 RBX: 7f6d6b85aaf0 RCX: 0010
[1.159583] RDX: 00010046 RSI: ff0841d8 RDI: 
[1.159583] RBP: 0374 R08: 

Re: [PATCH] x86/orc: Don't bail on stack overflow

2017-11-25 Thread Josh Poimboeuf
On Sat, Nov 25, 2017 at 08:25:12PM -0800, Andy Lutomirski wrote:
> On Sat, Nov 25, 2017 at 6:40 PM, Josh Poimboeuf  wrote:
> > On Sat, Nov 25, 2017 at 04:16:23PM -0800, Andy Lutomirski wrote:
> >> Can you send me whatever config and exact commit hash generated this?
> >> I can try to figure out why it failed.
> >
> > Sorry, I've been traveling.  I just got some time to take a look at
> > this.  I think there are at least two unwinder issues here:
> >
> > - It doesn't deal gracefully with the case where the stack overflows and
> >   the stack pointer itself isn't on a valid stack but the
> >   to-be-dereferenced data *is*.
> >
> > - The oops dump code doesn't know how to print partial pt_regs, for the
> >   case where if we get an interrupt/exception in *early* entry code
> >   before the full pt_regs have been saved.
> >
> > (Andy, I'm not quite sure about your patch, and whether it's still
> > needed after these patches.  I'll need to look at it later when I have
> > more time.)
> 
> I haven't tested yet, but I think my patch is probably still needed.
> The issue I fixed is that unwind_start() would bail out early if sp
> was below the stack.  Also:

Makes sense, maybe both are needed.  Your patch deals with a bad SP at
the beginning and mine deals with a bad SP in the middle.

> > -static bool stack_access_ok(struct unwind_state *state, unsigned long addr,
> > +static bool stack_access_ok(struct unwind_state *state, unsigned long 
> > _addr,
> > size_t len)
> >  {
> > struct stack_info *info = >stack_info;
> > +   void *addr = (void *)_addr;
> >
> > -   /*
> > -* If the address isn't on the current stack, switch to the next 
> > one.
> > -*
> > -* We may have to traverse multiple stacks to deal with the 
> > possibility
> > -* that info->next_sp could point to an empty stack and the address
> > -* could be on a subsequent stack.
> > -*/
> > -   while (!on_stack(info, (void *)addr, len))
> > -   if (get_stack_info(info->next_sp, state->task, info,
> > -  >stack_mask))
> > -   return false;
> > +   if (!on_stack(info, addr, len) &&
> > +   (get_stack_info(addr, state->task, info, >stack_mask)))
> > +   return false;
> >
> > return true;
> >  }
> 
> This looks odd to me before and after.  Shouldn't this be side-effect
> free?  That is, shouldn't it create a copy of info and stack_mask and
> point that to get_stack_info() rather than allowing get_stack_info()
> to modify the unwind state?

I think the side effects are ok, but maybe stack_access_ok() should be
renamed to make it clearer that it has side effects.

-- 
Josh


Re: [PATCH] x86/orc: Don't bail on stack overflow

2017-11-25 Thread Josh Poimboeuf
On Sat, Nov 25, 2017 at 08:25:12PM -0800, Andy Lutomirski wrote:
> On Sat, Nov 25, 2017 at 6:40 PM, Josh Poimboeuf  wrote:
> > On Sat, Nov 25, 2017 at 04:16:23PM -0800, Andy Lutomirski wrote:
> >> Can you send me whatever config and exact commit hash generated this?
> >> I can try to figure out why it failed.
> >
> > Sorry, I've been traveling.  I just got some time to take a look at
> > this.  I think there are at least two unwinder issues here:
> >
> > - It doesn't deal gracefully with the case where the stack overflows and
> >   the stack pointer itself isn't on a valid stack but the
> >   to-be-dereferenced data *is*.
> >
> > - The oops dump code doesn't know how to print partial pt_regs, for the
> >   case where if we get an interrupt/exception in *early* entry code
> >   before the full pt_regs have been saved.
> >
> > (Andy, I'm not quite sure about your patch, and whether it's still
> > needed after these patches.  I'll need to look at it later when I have
> > more time.)
> 
> I haven't tested yet, but I think my patch is probably still needed.
> The issue I fixed is that unwind_start() would bail out early if sp
> was below the stack.  Also:

Makes sense, maybe both are needed.  Your patch deals with a bad SP at
the beginning and mine deals with a bad SP in the middle.

> > -static bool stack_access_ok(struct unwind_state *state, unsigned long addr,
> > +static bool stack_access_ok(struct unwind_state *state, unsigned long 
> > _addr,
> > size_t len)
> >  {
> > struct stack_info *info = >stack_info;
> > +   void *addr = (void *)_addr;
> >
> > -   /*
> > -* If the address isn't on the current stack, switch to the next 
> > one.
> > -*
> > -* We may have to traverse multiple stacks to deal with the 
> > possibility
> > -* that info->next_sp could point to an empty stack and the address
> > -* could be on a subsequent stack.
> > -*/
> > -   while (!on_stack(info, (void *)addr, len))
> > -   if (get_stack_info(info->next_sp, state->task, info,
> > -  >stack_mask))
> > -   return false;
> > +   if (!on_stack(info, addr, len) &&
> > +   (get_stack_info(addr, state->task, info, >stack_mask)))
> > +   return false;
> >
> > return true;
> >  }
> 
> This looks odd to me before and after.  Shouldn't this be side-effect
> free?  That is, shouldn't it create a copy of info and stack_mask and
> point that to get_stack_info() rather than allowing get_stack_info()
> to modify the unwind state?

I think the side effects are ok, but maybe stack_access_ok() should be
renamed to make it clearer that it has side effects.

-- 
Josh


Re: [PATCH] x86/orc: Don't bail on stack overflow

2017-11-25 Thread Andy Lutomirski
On Sat, Nov 25, 2017 at 6:40 PM, Josh Poimboeuf  wrote:
> On Sat, Nov 25, 2017 at 04:16:23PM -0800, Andy Lutomirski wrote:
>> Can you send me whatever config and exact commit hash generated this?
>> I can try to figure out why it failed.
>
> Sorry, I've been traveling.  I just got some time to take a look at
> this.  I think there are at least two unwinder issues here:
>
> - It doesn't deal gracefully with the case where the stack overflows and
>   the stack pointer itself isn't on a valid stack but the
>   to-be-dereferenced data *is*.
>
> - The oops dump code doesn't know how to print partial pt_regs, for the
>   case where if we get an interrupt/exception in *early* entry code
>   before the full pt_regs have been saved.
>
> (Andy, I'm not quite sure about your patch, and whether it's still
> needed after these patches.  I'll need to look at it later when I have
> more time.)

I haven't tested yet, but I think my patch is probably still needed.
The issue I fixed is that unwind_start() would bail out early if sp
was below the stack.  Also:

> -static bool stack_access_ok(struct unwind_state *state, unsigned long addr,
> +static bool stack_access_ok(struct unwind_state *state, unsigned long _addr,
> size_t len)
>  {
> struct stack_info *info = >stack_info;
> +   void *addr = (void *)_addr;
>
> -   /*
> -* If the address isn't on the current stack, switch to the next one.
> -*
> -* We may have to traverse multiple stacks to deal with the 
> possibility
> -* that info->next_sp could point to an empty stack and the address
> -* could be on a subsequent stack.
> -*/
> -   while (!on_stack(info, (void *)addr, len))
> -   if (get_stack_info(info->next_sp, state->task, info,
> -  >stack_mask))
> -   return false;
> +   if (!on_stack(info, addr, len) &&
> +   (get_stack_info(addr, state->task, info, >stack_mask)))
> +   return false;
>
> return true;
>  }

This looks odd to me before and after.  Shouldn't this be side-effect
free?  That is, shouldn't it create a copy of info and stack_mask and
point that to get_stack_info() rather than allowing get_stack_info()
to modify the unwind state?


Re: [PATCH] x86/orc: Don't bail on stack overflow

2017-11-25 Thread Andy Lutomirski
On Sat, Nov 25, 2017 at 6:40 PM, Josh Poimboeuf  wrote:
> On Sat, Nov 25, 2017 at 04:16:23PM -0800, Andy Lutomirski wrote:
>> Can you send me whatever config and exact commit hash generated this?
>> I can try to figure out why it failed.
>
> Sorry, I've been traveling.  I just got some time to take a look at
> this.  I think there are at least two unwinder issues here:
>
> - It doesn't deal gracefully with the case where the stack overflows and
>   the stack pointer itself isn't on a valid stack but the
>   to-be-dereferenced data *is*.
>
> - The oops dump code doesn't know how to print partial pt_regs, for the
>   case where if we get an interrupt/exception in *early* entry code
>   before the full pt_regs have been saved.
>
> (Andy, I'm not quite sure about your patch, and whether it's still
> needed after these patches.  I'll need to look at it later when I have
> more time.)

I haven't tested yet, but I think my patch is probably still needed.
The issue I fixed is that unwind_start() would bail out early if sp
was below the stack.  Also:

> -static bool stack_access_ok(struct unwind_state *state, unsigned long addr,
> +static bool stack_access_ok(struct unwind_state *state, unsigned long _addr,
> size_t len)
>  {
> struct stack_info *info = >stack_info;
> +   void *addr = (void *)_addr;
>
> -   /*
> -* If the address isn't on the current stack, switch to the next one.
> -*
> -* We may have to traverse multiple stacks to deal with the 
> possibility
> -* that info->next_sp could point to an empty stack and the address
> -* could be on a subsequent stack.
> -*/
> -   while (!on_stack(info, (void *)addr, len))
> -   if (get_stack_info(info->next_sp, state->task, info,
> -  >stack_mask))
> -   return false;
> +   if (!on_stack(info, addr, len) &&
> +   (get_stack_info(addr, state->task, info, >stack_mask)))
> +   return false;
>
> return true;
>  }

This looks odd to me before and after.  Shouldn't this be side-effect
free?  That is, shouldn't it create a copy of info and stack_mask and
point that to get_stack_info() rather than allowing get_stack_info()
to modify the unwind state?


Re: [PATCH] x86/orc: Don't bail on stack overflow

2017-11-25 Thread Josh Poimboeuf
On Sat, Nov 25, 2017 at 04:16:23PM -0800, Andy Lutomirski wrote:
> Can you send me whatever config and exact commit hash generated this?
> I can try to figure out why it failed.

Sorry, I've been traveling.  I just got some time to take a look at
this.  I think there are at least two unwinder issues here:

- It doesn't deal gracefully with the case where the stack overflows and
  the stack pointer itself isn't on a valid stack but the
  to-be-dereferenced data *is*.

- The oops dump code doesn't know how to print partial pt_regs, for the
  case where if we get an interrupt/exception in *early* entry code
  before the full pt_regs have been saved.

(Andy, I'm not quite sure about your patch, and whether it's still
needed after these patches.  I'll need to look at it later when I have
more time.)

I attempted to fix both of the issues with the below patch.  Thomas or
Ingo, can you test to see if this gets rid of the question marks?

I can split it up into proper patches next week.  I'm assuming this
isn't holding up the KAISER merge?


diff --git a/arch/x86/include/asm/kdebug.h b/arch/x86/include/asm/kdebug.h
index f86a8caa561e..395c9631e000 100644
--- a/arch/x86/include/asm/kdebug.h
+++ b/arch/x86/include/asm/kdebug.h
@@ -26,6 +26,7 @@ extern void die(const char *, struct pt_regs *,long);
 extern int __must_check __die(const char *, struct pt_regs *, long);
 extern void show_stack_regs(struct pt_regs *regs);
 extern void __show_regs(struct pt_regs *regs, int all);
+extern void show_iret_regs(struct pt_regs *regs);
 extern unsigned long oops_begin(void);
 extern void oops_end(unsigned long, struct pt_regs *, int signr);
 
diff --git a/arch/x86/include/asm/unwind.h b/arch/x86/include/asm/unwind.h
index e9cc6fe1fc6f..5be2fb23825a 100644
--- a/arch/x86/include/asm/unwind.h
+++ b/arch/x86/include/asm/unwind.h
@@ -7,6 +7,9 @@
 #include 
 #include 
 
+#define IRET_FRAME_OFFSET (offsetof(struct pt_regs, ip))
+#define IRET_FRAME_SIZE   (sizeof(struct pt_regs) - IRET_FRAME_OFFSET)
+
 struct unwind_state {
struct stack_info stack_info;
unsigned long stack_mask;
@@ -52,6 +55,10 @@ void unwind_start(struct unwind_state *state, struct 
task_struct *task,
 }
 
 #if defined(CONFIG_UNWINDER_ORC) || defined(CONFIG_UNWINDER_FRAME_POINTER)
+/*
+ * WARNING: The entire pt_regs may not be safe to dereference.  In some cases,
+ * only the iret registers are accessible.  Use with caution!
+ */
 static inline struct pt_regs *unwind_get_entry_regs(struct unwind_state *state)
 {
if (unwind_done(state))
diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index d58dd121c0af..f7f3b5b3bc05 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -68,6 +68,21 @@ static void printk_stack_address(unsigned long address, int 
reliable,
printk("%s %s%pB\n", log_lvl, reliable ? "" : "? ", (void *)address);
 }
 
+static void show_regs_safe(struct stack_info *info, struct pt_regs *regs)
+{
+   if (on_stack(info, regs, sizeof(*regs)))
+   __show_regs(regs, 0);
+   else if (on_stack(info, (void *)regs + IRET_FRAME_OFFSET,
+ IRET_FRAME_SIZE)) {
+   /*
+* When an interrupt or exception occurs in entry code, the
+* full pt_regs might not have been saved yet.  In that case
+* just print the iret return frame.
+*/
+   show_iret_regs(regs);
+   }
+}
+
 void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
unsigned long *stack, char *log_lvl)
 {
@@ -116,8 +131,8 @@ void show_trace_log_lvl(struct task_struct *task, struct 
pt_regs *regs,
if (stack_name)
printk("%s <%s>\n", log_lvl, stack_name);
 
-   if (regs && on_stack(_info, regs, sizeof(*regs)))
-   __show_regs(regs, 0);
+   if (regs)
+   show_regs_safe(_info, regs);
 
/*
 * Scan the stack, printing any text addresses we find.  At the
@@ -141,7 +156,7 @@ void show_trace_log_lvl(struct task_struct *task, struct 
pt_regs *regs,
 
/*
 * Don't print regs->ip again if it was already printed
-* by __show_regs() below.
+* by show_regs_safe() below.
 */
if (regs && stack == >ip)
goto next;
@@ -177,8 +192,8 @@ void show_trace_log_lvl(struct task_struct *task, struct 
pt_regs *regs,
 
/* if the frame has entry regs, print them */
regs = unwind_get_entry_regs();
-   if (regs && on_stack(_info, regs, sizeof(*regs)))
-   __show_regs(regs, 0);
+   if (regs)
+   show_regs_safe(_info, regs);
}
 
   

Re: [PATCH] x86/orc: Don't bail on stack overflow

2017-11-25 Thread Josh Poimboeuf
On Sat, Nov 25, 2017 at 04:16:23PM -0800, Andy Lutomirski wrote:
> Can you send me whatever config and exact commit hash generated this?
> I can try to figure out why it failed.

Sorry, I've been traveling.  I just got some time to take a look at
this.  I think there are at least two unwinder issues here:

- It doesn't deal gracefully with the case where the stack overflows and
  the stack pointer itself isn't on a valid stack but the
  to-be-dereferenced data *is*.

- The oops dump code doesn't know how to print partial pt_regs, for the
  case where if we get an interrupt/exception in *early* entry code
  before the full pt_regs have been saved.

(Andy, I'm not quite sure about your patch, and whether it's still
needed after these patches.  I'll need to look at it later when I have
more time.)

I attempted to fix both of the issues with the below patch.  Thomas or
Ingo, can you test to see if this gets rid of the question marks?

I can split it up into proper patches next week.  I'm assuming this
isn't holding up the KAISER merge?


diff --git a/arch/x86/include/asm/kdebug.h b/arch/x86/include/asm/kdebug.h
index f86a8caa561e..395c9631e000 100644
--- a/arch/x86/include/asm/kdebug.h
+++ b/arch/x86/include/asm/kdebug.h
@@ -26,6 +26,7 @@ extern void die(const char *, struct pt_regs *,long);
 extern int __must_check __die(const char *, struct pt_regs *, long);
 extern void show_stack_regs(struct pt_regs *regs);
 extern void __show_regs(struct pt_regs *regs, int all);
+extern void show_iret_regs(struct pt_regs *regs);
 extern unsigned long oops_begin(void);
 extern void oops_end(unsigned long, struct pt_regs *, int signr);
 
diff --git a/arch/x86/include/asm/unwind.h b/arch/x86/include/asm/unwind.h
index e9cc6fe1fc6f..5be2fb23825a 100644
--- a/arch/x86/include/asm/unwind.h
+++ b/arch/x86/include/asm/unwind.h
@@ -7,6 +7,9 @@
 #include 
 #include 
 
+#define IRET_FRAME_OFFSET (offsetof(struct pt_regs, ip))
+#define IRET_FRAME_SIZE   (sizeof(struct pt_regs) - IRET_FRAME_OFFSET)
+
 struct unwind_state {
struct stack_info stack_info;
unsigned long stack_mask;
@@ -52,6 +55,10 @@ void unwind_start(struct unwind_state *state, struct 
task_struct *task,
 }
 
 #if defined(CONFIG_UNWINDER_ORC) || defined(CONFIG_UNWINDER_FRAME_POINTER)
+/*
+ * WARNING: The entire pt_regs may not be safe to dereference.  In some cases,
+ * only the iret registers are accessible.  Use with caution!
+ */
 static inline struct pt_regs *unwind_get_entry_regs(struct unwind_state *state)
 {
if (unwind_done(state))
diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index d58dd121c0af..f7f3b5b3bc05 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -68,6 +68,21 @@ static void printk_stack_address(unsigned long address, int 
reliable,
printk("%s %s%pB\n", log_lvl, reliable ? "" : "? ", (void *)address);
 }
 
+static void show_regs_safe(struct stack_info *info, struct pt_regs *regs)
+{
+   if (on_stack(info, regs, sizeof(*regs)))
+   __show_regs(regs, 0);
+   else if (on_stack(info, (void *)regs + IRET_FRAME_OFFSET,
+ IRET_FRAME_SIZE)) {
+   /*
+* When an interrupt or exception occurs in entry code, the
+* full pt_regs might not have been saved yet.  In that case
+* just print the iret return frame.
+*/
+   show_iret_regs(regs);
+   }
+}
+
 void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
unsigned long *stack, char *log_lvl)
 {
@@ -116,8 +131,8 @@ void show_trace_log_lvl(struct task_struct *task, struct 
pt_regs *regs,
if (stack_name)
printk("%s <%s>\n", log_lvl, stack_name);
 
-   if (regs && on_stack(_info, regs, sizeof(*regs)))
-   __show_regs(regs, 0);
+   if (regs)
+   show_regs_safe(_info, regs);
 
/*
 * Scan the stack, printing any text addresses we find.  At the
@@ -141,7 +156,7 @@ void show_trace_log_lvl(struct task_struct *task, struct 
pt_regs *regs,
 
/*
 * Don't print regs->ip again if it was already printed
-* by __show_regs() below.
+* by show_regs_safe() below.
 */
if (regs && stack == >ip)
goto next;
@@ -177,8 +192,8 @@ void show_trace_log_lvl(struct task_struct *task, struct 
pt_regs *regs,
 
/* if the frame has entry regs, print them */
regs = unwind_get_entry_regs();
-   if (regs && on_stack(_info, regs, sizeof(*regs)))
-   __show_regs(regs, 0);
+   if (regs)
+   show_regs_safe(_info, regs);
}
 
   

Re: [PATCH] x86/orc: Don't bail on stack overflow

2017-11-25 Thread Andy Lutomirski
Can you send me whatever config and exact commit hash generated this?
I can try to figure out why it failed.

On Sat, Nov 25, 2017 at 3:13 PM, Thomas Gleixner  wrote:
> On Sat, 25 Nov 2017, Andy Lutomirski wrote:
>
>> On Sat, Nov 25, 2017 at 9:28 AM, Andy Lutomirski  wrote:
>> > If we overflow the stack into a guard page and then try to unwind
>> > it with ORC, it should work perfectly: by construction, there can't
>> > be any meaningful data in the guard page because no writes to the
>> > guard page will have succeeded.
>> >
>> > ORC seems entirely capable of unwinding in this situation, except
>> > that it doesn't even try.  Adjust its initial stack check so that
>> > it's willing to try unwinding.
>> >
>> > I tested this by intentionally overflowing the task stack.  The
>> > result is an accurate call trace instead of a trace consisting
>> > purely of '?' entries.
>> >
>> > Signed-off-by: Andy Lutomirski 
>> > ---
>> >
>> > Hi all-
>> >
>> > Ingo, this would have fixed half the debugging problem you had, I think.
>> > To really nail it, we'd want some kind of magic to annotate the trace
>> > so that page_fault (and async_page_fault) entries show CR2 and error_code.
>> >
>> > Josh, any ideas of how to do that cleanly?  We could easily hard-code it
>> > in the OOPS unwinder, I guess.
>>
>> Actually, this does pretty well.  We don't get CR2, but, when I added
>> an intentional bug kind of along the lines of the one you debugged,
>> the intermediate page fault successfully dumps all the regs in the
>> stack trace, so we get the faulting instruction *and* the registers.
>> We also get ORIG_RAX, which tells us the error code.  We could be
>> fancy and decode that.
>
> It works in general, but for that case it's not much better than before
> vs. the '?' entries.
>
> Thanks,
>
> tglx
>
> [2.556065] PANIC: double fault, error_code: 0x0
> [2.557116] CPU: 1 PID: 273 Comm: systemd-udevd Not tainted 
> 4.14.0-01256-g03dea81fe9f2-dirty #30
> [2.558930] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> 1.10.2-1 04/01/2014
> [2.560133] task: 880428dd8000 task.stack: c900025fc000
> [2.560729] RIP: 0010:page_fault+0x11/0x60
> [2.561122] RSP: :ff083fc8 EFLAGS: 00010046
> [2.561607] RAX: 819d0ac7 RBX: 0001 RCX: 
> 819d0ac7
> [2.562357] RDX:  RSI: 0010 RDI: 
> ff084078
> [2.563027] RBP: 000b R08:  R09: 
> 0040
> [2.563726] R10: 0018 R11: 0246 R12: 
> 0003
> [2.564429] R13: 55719fd7d410 R14:  R15: 
> 03938700
> [2.565104] FS:  7f9edc0b78c0() GS:88042e44() 
> knlGS:
> [2.565844] CS:  0010 DS:  ES:  CR0: 80050033
> [2.566396] CR2: ff083fb8 CR3: 000428ec4005 CR4: 
> 001606e0
> [2.567097] DR0:  DR1:  DR2: 
> 
> [2.567761] DR3:  DR6: fffe0ff0 DR7: 
> 0400
> [2.568451] Call Trace:
> [2.568704]  
> [2.568950]  ? __do_page_fault+0x4b0/0x4b0
> [2.569348]  ? page_fault+0x2c/0x60
> [2.569680]  ? native_iret+0x7/0x7
> [2.570019]  ? __do_page_fault+0x4b0/0x4b0
> [2.570396]  ? page_fault+0x2c/0x60
> [2.570743]  ? call_function_interrupt+0xc0/0xc0
> [2.571199]  
> [2.571422] Code: ff e8 34 b7 6a ff e9 9f 02 00 00 0f 1f 44 00 00 66 2e 0f 
> 1f 84 00 00 00 00 00 0f 1f 00 48 83 c4 88 f6 84 24 88 00 00 00 03 75 20  
> 4a 01 00 00 48 89 e7 48 8b 74 24 78 48 c7 44 24 78 ff ff ff
> [2.573192] Kernel panic - not syncing: Machine halted.
> [2.573694] CPU: 1 PID: 273 Comm: systemd-udevd Not tainted 
> 4.14.0-01256-g03dea81fe9f2-dirty #30
> [2.574528] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> 1.10.2-1 04/01/2014
> [2.575330] Call Trace:
> [2.575570]  <#DF>
> [2.575760]  dump_stack+0x46/0x59
> [2.576120]  panic+0xde/0x223
> [2.576405]  df_debug+0x29/0x30
> [2.576687]  do_double_fault+0x9a/0x120
> [2.577057]  double_fault+0x22/0x30
> [2.577376] RIP: 0010:page_fault+0x11/0x60
> [2.55] RSP: :ff083fc8 EFLAGS: 00010046
> [2.578314] RAX: 819d0ac7 RBX: 0001 RCX: 
> 819d0ac7
> [2.578979] RDX:  RSI: 0010 RDI: 
> ff084078
> [2.579666] RBP: 000b R08:  R09: 
> 0040
> [2.580334] R10: 0018 R11: 0246 R12: 
> 0003
> [2.581008] R13: 55719fd7d410 R14:  R15: 
> 03938700
> [2.581684]  ? native_iret+0x7/0x7
> [2.582007] WARNING: can't dereference iret registers at ff084048 
> for ip page_fault+0x11/0x60
> [2.582008]  
> [2.583134]  
> [2.583367]  ? 

Re: [PATCH] x86/orc: Don't bail on stack overflow

2017-11-25 Thread Andy Lutomirski
Can you send me whatever config and exact commit hash generated this?
I can try to figure out why it failed.

On Sat, Nov 25, 2017 at 3:13 PM, Thomas Gleixner  wrote:
> On Sat, 25 Nov 2017, Andy Lutomirski wrote:
>
>> On Sat, Nov 25, 2017 at 9:28 AM, Andy Lutomirski  wrote:
>> > If we overflow the stack into a guard page and then try to unwind
>> > it with ORC, it should work perfectly: by construction, there can't
>> > be any meaningful data in the guard page because no writes to the
>> > guard page will have succeeded.
>> >
>> > ORC seems entirely capable of unwinding in this situation, except
>> > that it doesn't even try.  Adjust its initial stack check so that
>> > it's willing to try unwinding.
>> >
>> > I tested this by intentionally overflowing the task stack.  The
>> > result is an accurate call trace instead of a trace consisting
>> > purely of '?' entries.
>> >
>> > Signed-off-by: Andy Lutomirski 
>> > ---
>> >
>> > Hi all-
>> >
>> > Ingo, this would have fixed half the debugging problem you had, I think.
>> > To really nail it, we'd want some kind of magic to annotate the trace
>> > so that page_fault (and async_page_fault) entries show CR2 and error_code.
>> >
>> > Josh, any ideas of how to do that cleanly?  We could easily hard-code it
>> > in the OOPS unwinder, I guess.
>>
>> Actually, this does pretty well.  We don't get CR2, but, when I added
>> an intentional bug kind of along the lines of the one you debugged,
>> the intermediate page fault successfully dumps all the regs in the
>> stack trace, so we get the faulting instruction *and* the registers.
>> We also get ORIG_RAX, which tells us the error code.  We could be
>> fancy and decode that.
>
> It works in general, but for that case it's not much better than before
> vs. the '?' entries.
>
> Thanks,
>
> tglx
>
> [2.556065] PANIC: double fault, error_code: 0x0
> [2.557116] CPU: 1 PID: 273 Comm: systemd-udevd Not tainted 
> 4.14.0-01256-g03dea81fe9f2-dirty #30
> [2.558930] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> 1.10.2-1 04/01/2014
> [2.560133] task: 880428dd8000 task.stack: c900025fc000
> [2.560729] RIP: 0010:page_fault+0x11/0x60
> [2.561122] RSP: :ff083fc8 EFLAGS: 00010046
> [2.561607] RAX: 819d0ac7 RBX: 0001 RCX: 
> 819d0ac7
> [2.562357] RDX:  RSI: 0010 RDI: 
> ff084078
> [2.563027] RBP: 000b R08:  R09: 
> 0040
> [2.563726] R10: 0018 R11: 0246 R12: 
> 0003
> [2.564429] R13: 55719fd7d410 R14:  R15: 
> 03938700
> [2.565104] FS:  7f9edc0b78c0() GS:88042e44() 
> knlGS:
> [2.565844] CS:  0010 DS:  ES:  CR0: 80050033
> [2.566396] CR2: ff083fb8 CR3: 000428ec4005 CR4: 
> 001606e0
> [2.567097] DR0:  DR1:  DR2: 
> 
> [2.567761] DR3:  DR6: fffe0ff0 DR7: 
> 0400
> [2.568451] Call Trace:
> [2.568704]  
> [2.568950]  ? __do_page_fault+0x4b0/0x4b0
> [2.569348]  ? page_fault+0x2c/0x60
> [2.569680]  ? native_iret+0x7/0x7
> [2.570019]  ? __do_page_fault+0x4b0/0x4b0
> [2.570396]  ? page_fault+0x2c/0x60
> [2.570743]  ? call_function_interrupt+0xc0/0xc0
> [2.571199]  
> [2.571422] Code: ff e8 34 b7 6a ff e9 9f 02 00 00 0f 1f 44 00 00 66 2e 0f 
> 1f 84 00 00 00 00 00 0f 1f 00 48 83 c4 88 f6 84 24 88 00 00 00 03 75 20  
> 4a 01 00 00 48 89 e7 48 8b 74 24 78 48 c7 44 24 78 ff ff ff
> [2.573192] Kernel panic - not syncing: Machine halted.
> [2.573694] CPU: 1 PID: 273 Comm: systemd-udevd Not tainted 
> 4.14.0-01256-g03dea81fe9f2-dirty #30
> [2.574528] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> 1.10.2-1 04/01/2014
> [2.575330] Call Trace:
> [2.575570]  <#DF>
> [2.575760]  dump_stack+0x46/0x59
> [2.576120]  panic+0xde/0x223
> [2.576405]  df_debug+0x29/0x30
> [2.576687]  do_double_fault+0x9a/0x120
> [2.577057]  double_fault+0x22/0x30
> [2.577376] RIP: 0010:page_fault+0x11/0x60
> [2.55] RSP: :ff083fc8 EFLAGS: 00010046
> [2.578314] RAX: 819d0ac7 RBX: 0001 RCX: 
> 819d0ac7
> [2.578979] RDX:  RSI: 0010 RDI: 
> ff084078
> [2.579666] RBP: 000b R08:  R09: 
> 0040
> [2.580334] R10: 0018 R11: 0246 R12: 
> 0003
> [2.581008] R13: 55719fd7d410 R14:  R15: 
> 03938700
> [2.581684]  ? native_iret+0x7/0x7
> [2.582007] WARNING: can't dereference iret registers at ff084048 
> for ip page_fault+0x11/0x60
> [2.582008]  
> [2.583134]  
> [2.583367]  ? __do_page_fault+0x4b0/0x4b0
> [2.583751]  ? page_fault+0x2c/0x60

Re: [PATCH] x86/orc: Don't bail on stack overflow

2017-11-25 Thread Thomas Gleixner
On Sat, 25 Nov 2017, Andy Lutomirski wrote:

> On Sat, Nov 25, 2017 at 9:28 AM, Andy Lutomirski  wrote:
> > If we overflow the stack into a guard page and then try to unwind
> > it with ORC, it should work perfectly: by construction, there can't
> > be any meaningful data in the guard page because no writes to the
> > guard page will have succeeded.
> >
> > ORC seems entirely capable of unwinding in this situation, except
> > that it doesn't even try.  Adjust its initial stack check so that
> > it's willing to try unwinding.
> >
> > I tested this by intentionally overflowing the task stack.  The
> > result is an accurate call trace instead of a trace consisting
> > purely of '?' entries.
> >
> > Signed-off-by: Andy Lutomirski 
> > ---
> >
> > Hi all-
> >
> > Ingo, this would have fixed half the debugging problem you had, I think.
> > To really nail it, we'd want some kind of magic to annotate the trace
> > so that page_fault (and async_page_fault) entries show CR2 and error_code.
> >
> > Josh, any ideas of how to do that cleanly?  We could easily hard-code it
> > in the OOPS unwinder, I guess.
> 
> Actually, this does pretty well.  We don't get CR2, but, when I added
> an intentional bug kind of along the lines of the one you debugged,
> the intermediate page fault successfully dumps all the regs in the
> stack trace, so we get the faulting instruction *and* the registers.
> We also get ORIG_RAX, which tells us the error code.  We could be
> fancy and decode that.

It works in general, but for that case it's not much better than before
vs. the '?' entries.

Thanks,

tglx

[2.556065] PANIC: double fault, error_code: 0x0
[2.557116] CPU: 1 PID: 273 Comm: systemd-udevd Not tainted 
4.14.0-01256-g03dea81fe9f2-dirty #30
[2.558930] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.10.2-1 04/01/2014
[2.560133] task: 880428dd8000 task.stack: c900025fc000
[2.560729] RIP: 0010:page_fault+0x11/0x60
[2.561122] RSP: :ff083fc8 EFLAGS: 00010046
[2.561607] RAX: 819d0ac7 RBX: 0001 RCX: 819d0ac7
[2.562357] RDX:  RSI: 0010 RDI: ff084078
[2.563027] RBP: 000b R08:  R09: 0040
[2.563726] R10: 0018 R11: 0246 R12: 0003
[2.564429] R13: 55719fd7d410 R14:  R15: 03938700
[2.565104] FS:  7f9edc0b78c0() GS:88042e44() 
knlGS:
[2.565844] CS:  0010 DS:  ES:  CR0: 80050033
[2.566396] CR2: ff083fb8 CR3: 000428ec4005 CR4: 001606e0
[2.567097] DR0:  DR1:  DR2: 
[2.567761] DR3:  DR6: fffe0ff0 DR7: 0400
[2.568451] Call Trace:
[2.568704]  
[2.568950]  ? __do_page_fault+0x4b0/0x4b0
[2.569348]  ? page_fault+0x2c/0x60
[2.569680]  ? native_iret+0x7/0x7
[2.570019]  ? __do_page_fault+0x4b0/0x4b0
[2.570396]  ? page_fault+0x2c/0x60
[2.570743]  ? call_function_interrupt+0xc0/0xc0
[2.571199]  
[2.571422] Code: ff e8 34 b7 6a ff e9 9f 02 00 00 0f 1f 44 00 00 66 2e 0f 
1f 84 00 00 00 00 00 0f 1f 00 48 83 c4 88 f6 84 24 88 00 00 00 03 75 20  4a 
01 00 00 48 89 e7 48 8b 74 24 78 48 c7 44 24 78 ff ff ff 
[2.573192] Kernel panic - not syncing: Machine halted.
[2.573694] CPU: 1 PID: 273 Comm: systemd-udevd Not tainted 
4.14.0-01256-g03dea81fe9f2-dirty #30
[2.574528] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.10.2-1 04/01/2014
[2.575330] Call Trace:
[2.575570]  <#DF>
[2.575760]  dump_stack+0x46/0x59
[2.576120]  panic+0xde/0x223
[2.576405]  df_debug+0x29/0x30
[2.576687]  do_double_fault+0x9a/0x120
[2.577057]  double_fault+0x22/0x30
[2.577376] RIP: 0010:page_fault+0x11/0x60
[2.55] RSP: :ff083fc8 EFLAGS: 00010046
[2.578314] RAX: 819d0ac7 RBX: 0001 RCX: 819d0ac7
[2.578979] RDX:  RSI: 0010 RDI: ff084078
[2.579666] RBP: 000b R08:  R09: 0040
[2.580334] R10: 0018 R11: 0246 R12: 0003
[2.581008] R13: 55719fd7d410 R14:  R15: 03938700
[2.581684]  ? native_iret+0x7/0x7
[2.582007] WARNING: can't dereference iret registers at ff084048 
for ip page_fault+0x11/0x60
[2.582008]  
[2.583134]  
[2.583367]  ? __do_page_fault+0x4b0/0x4b0
[2.583751]  ? page_fault+0x2c/0x60
[2.584127]  ? native_iret+0x7/0x7
[2.584466]  ? __do_page_fault+0x4b0/0x4b0
[2.584860]  ? page_fault+0x2c/0x60
[2.585195]  ? call_function_interrupt+0xc0/0xc0
[2.585621]  
[2.586966] Dumping ftrace buffer:
[2.587254](ftrace buffer empty)
[2.587534] Kernel Offset: disabled
[2.587814] ---[ 

Re: [PATCH] x86/orc: Don't bail on stack overflow

2017-11-25 Thread Thomas Gleixner
On Sat, 25 Nov 2017, Andy Lutomirski wrote:

> On Sat, Nov 25, 2017 at 9:28 AM, Andy Lutomirski  wrote:
> > If we overflow the stack into a guard page and then try to unwind
> > it with ORC, it should work perfectly: by construction, there can't
> > be any meaningful data in the guard page because no writes to the
> > guard page will have succeeded.
> >
> > ORC seems entirely capable of unwinding in this situation, except
> > that it doesn't even try.  Adjust its initial stack check so that
> > it's willing to try unwinding.
> >
> > I tested this by intentionally overflowing the task stack.  The
> > result is an accurate call trace instead of a trace consisting
> > purely of '?' entries.
> >
> > Signed-off-by: Andy Lutomirski 
> > ---
> >
> > Hi all-
> >
> > Ingo, this would have fixed half the debugging problem you had, I think.
> > To really nail it, we'd want some kind of magic to annotate the trace
> > so that page_fault (and async_page_fault) entries show CR2 and error_code.
> >
> > Josh, any ideas of how to do that cleanly?  We could easily hard-code it
> > in the OOPS unwinder, I guess.
> 
> Actually, this does pretty well.  We don't get CR2, but, when I added
> an intentional bug kind of along the lines of the one you debugged,
> the intermediate page fault successfully dumps all the regs in the
> stack trace, so we get the faulting instruction *and* the registers.
> We also get ORIG_RAX, which tells us the error code.  We could be
> fancy and decode that.

It works in general, but for that case it's not much better than before
vs. the '?' entries.

Thanks,

tglx

[2.556065] PANIC: double fault, error_code: 0x0
[2.557116] CPU: 1 PID: 273 Comm: systemd-udevd Not tainted 
4.14.0-01256-g03dea81fe9f2-dirty #30
[2.558930] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.10.2-1 04/01/2014
[2.560133] task: 880428dd8000 task.stack: c900025fc000
[2.560729] RIP: 0010:page_fault+0x11/0x60
[2.561122] RSP: :ff083fc8 EFLAGS: 00010046
[2.561607] RAX: 819d0ac7 RBX: 0001 RCX: 819d0ac7
[2.562357] RDX:  RSI: 0010 RDI: ff084078
[2.563027] RBP: 000b R08:  R09: 0040
[2.563726] R10: 0018 R11: 0246 R12: 0003
[2.564429] R13: 55719fd7d410 R14:  R15: 03938700
[2.565104] FS:  7f9edc0b78c0() GS:88042e44() 
knlGS:
[2.565844] CS:  0010 DS:  ES:  CR0: 80050033
[2.566396] CR2: ff083fb8 CR3: 000428ec4005 CR4: 001606e0
[2.567097] DR0:  DR1:  DR2: 
[2.567761] DR3:  DR6: fffe0ff0 DR7: 0400
[2.568451] Call Trace:
[2.568704]  
[2.568950]  ? __do_page_fault+0x4b0/0x4b0
[2.569348]  ? page_fault+0x2c/0x60
[2.569680]  ? native_iret+0x7/0x7
[2.570019]  ? __do_page_fault+0x4b0/0x4b0
[2.570396]  ? page_fault+0x2c/0x60
[2.570743]  ? call_function_interrupt+0xc0/0xc0
[2.571199]  
[2.571422] Code: ff e8 34 b7 6a ff e9 9f 02 00 00 0f 1f 44 00 00 66 2e 0f 
1f 84 00 00 00 00 00 0f 1f 00 48 83 c4 88 f6 84 24 88 00 00 00 03 75 20  4a 
01 00 00 48 89 e7 48 8b 74 24 78 48 c7 44 24 78 ff ff ff 
[2.573192] Kernel panic - not syncing: Machine halted.
[2.573694] CPU: 1 PID: 273 Comm: systemd-udevd Not tainted 
4.14.0-01256-g03dea81fe9f2-dirty #30
[2.574528] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.10.2-1 04/01/2014
[2.575330] Call Trace:
[2.575570]  <#DF>
[2.575760]  dump_stack+0x46/0x59
[2.576120]  panic+0xde/0x223
[2.576405]  df_debug+0x29/0x30
[2.576687]  do_double_fault+0x9a/0x120
[2.577057]  double_fault+0x22/0x30
[2.577376] RIP: 0010:page_fault+0x11/0x60
[2.55] RSP: :ff083fc8 EFLAGS: 00010046
[2.578314] RAX: 819d0ac7 RBX: 0001 RCX: 819d0ac7
[2.578979] RDX:  RSI: 0010 RDI: ff084078
[2.579666] RBP: 000b R08:  R09: 0040
[2.580334] R10: 0018 R11: 0246 R12: 0003
[2.581008] R13: 55719fd7d410 R14:  R15: 03938700
[2.581684]  ? native_iret+0x7/0x7
[2.582007] WARNING: can't dereference iret registers at ff084048 
for ip page_fault+0x11/0x60
[2.582008]  
[2.583134]  
[2.583367]  ? __do_page_fault+0x4b0/0x4b0
[2.583751]  ? page_fault+0x2c/0x60
[2.584127]  ? native_iret+0x7/0x7
[2.584466]  ? __do_page_fault+0x4b0/0x4b0
[2.584860]  ? page_fault+0x2c/0x60
[2.585195]  ? call_function_interrupt+0xc0/0xc0
[2.585621]  
[2.586966] Dumping ftrace buffer:
[2.587254](ftrace buffer empty)
[2.587534] Kernel Offset: disabled
[2.587814] ---[ end Kernel panic - not syncing: 

Re: [PATCH] x86/orc: Don't bail on stack overflow

2017-11-25 Thread Andy Lutomirski
On Sat, Nov 25, 2017 at 9:28 AM, Andy Lutomirski  wrote:
> If we overflow the stack into a guard page and then try to unwind
> it with ORC, it should work perfectly: by construction, there can't
> be any meaningful data in the guard page because no writes to the
> guard page will have succeeded.
>
> ORC seems entirely capable of unwinding in this situation, except
> that it doesn't even try.  Adjust its initial stack check so that
> it's willing to try unwinding.
>
> I tested this by intentionally overflowing the task stack.  The
> result is an accurate call trace instead of a trace consisting
> purely of '?' entries.
>
> Signed-off-by: Andy Lutomirski 
> ---
>
> Hi all-
>
> Ingo, this would have fixed half the debugging problem you had, I think.
> To really nail it, we'd want some kind of magic to annotate the trace
> so that page_fault (and async_page_fault) entries show CR2 and error_code.
>
> Josh, any ideas of how to do that cleanly?  We could easily hard-code it
> in the OOPS unwinder, I guess.

Actually, this does pretty well.  We don't get CR2, but, when I added
an intentional bug kind of along the lines of the one you debugged,
the intermediate page fault successfully dumps all the regs in the
stack trace, so we get the faulting instruction *and* the registers.
We also get ORIG_RAX, which tells us the error code.  We could be
fancy and decode that.


Re: [PATCH] x86/orc: Don't bail on stack overflow

2017-11-25 Thread Andy Lutomirski
On Sat, Nov 25, 2017 at 9:28 AM, Andy Lutomirski  wrote:
> If we overflow the stack into a guard page and then try to unwind
> it with ORC, it should work perfectly: by construction, there can't
> be any meaningful data in the guard page because no writes to the
> guard page will have succeeded.
>
> ORC seems entirely capable of unwinding in this situation, except
> that it doesn't even try.  Adjust its initial stack check so that
> it's willing to try unwinding.
>
> I tested this by intentionally overflowing the task stack.  The
> result is an accurate call trace instead of a trace consisting
> purely of '?' entries.
>
> Signed-off-by: Andy Lutomirski 
> ---
>
> Hi all-
>
> Ingo, this would have fixed half the debugging problem you had, I think.
> To really nail it, we'd want some kind of magic to annotate the trace
> so that page_fault (and async_page_fault) entries show CR2 and error_code.
>
> Josh, any ideas of how to do that cleanly?  We could easily hard-code it
> in the OOPS unwinder, I guess.

Actually, this does pretty well.  We don't get CR2, but, when I added
an intentional bug kind of along the lines of the one you debugged,
the intermediate page fault successfully dumps all the regs in the
stack trace, so we get the faulting instruction *and* the registers.
We also get ORIG_RAX, which tells us the error code.  We could be
fancy and decode that.