Re: [RFC PATCH v2 1/1] powerpc: update ppc_save_regs to save current r1 in pt_regs

2023-07-02 Thread Michael Ellerman
On Thu, 15 Jun 2023 14:40:47 +0530, Aditya Gupta wrote:
> ppc_save_regs() skips one stack frame while saving the CPU register states.
> Instead of saving current R1, it pulls the previous stack frame pointer.
> 
> When vmcores caused by direct panic call (such as `echo c >
> /proc/sysrq-trigger`), are debugged with gdb, gdb fails to show the
> backtrace correctly. On further analysis, it was found that it was because
> of mismatch between r1 and NIP.
> 
> [...]

Applied to powerpc/next.

[1/1] powerpc: update ppc_save_regs to save current r1 in pt_regs
  https://git.kernel.org/powerpc/c/b684c09f09e7a6af3794d4233ef785819e72db79

cheers


Re: [RFC PATCH v2 1/1] powerpc: update ppc_save_regs to save current r1 in pt_regs

2023-06-18 Thread Aditya Gupta

On 15/06/23 17:40, Nicholas Piggin wrote:

On Thu Jun 15, 2023 at 7:10 PM AEST, Aditya Gupta wrote:

ppc_save_regs() skips one stack frame while saving the CPU register states.
Instead of saving current R1, it pulls the previous stack frame pointer.

...

So this now saves regs as though it was an interrupt taken in the
caller, at the instruction after the call to ppc_save_regs, whereas
previously the NIP was there, but R1 came from the caller's caller
and that mismatch is what causes gdb's dwarf unwinder to go haywire.

Nice catch, and I think I follow the fix and I think I agree with it.
Before the bug was introduced, NIP also came from the grandparent.
Which is what xmon presumably wanted, but since then I think maybe it
makes more sense to just have the parent caller.

Reivewed-by: Nicholas Piggin 
Fixes: d16a58f8854b1 ("powerpc: Improve ppc_save_regs()")


Thanks for reviewing this, and providing a Fixes tag too.

Thanks
- Aditya



Re: [RFC PATCH v2 1/1] powerpc: update ppc_save_regs to save current r1 in pt_regs

2023-06-15 Thread Nicholas Piggin
On Thu Jun 15, 2023 at 7:10 PM AEST, Aditya Gupta wrote:
> ppc_save_regs() skips one stack frame while saving the CPU register states.
> Instead of saving current R1, it pulls the previous stack frame pointer.
>
> When vmcores caused by direct panic call (such as `echo c >
> /proc/sysrq-trigger`), are debugged with gdb, gdb fails to show the
> backtrace correctly. On further analysis, it was found that it was because
> of mismatch between r1 and NIP.
>
> GDB uses NIP to get current function symbol and uses corresponding debug
> info of that function to unwind previous frames, but due to the
> mismatching r1 and NIP, the unwinding does not work, and it fails to
> unwind to the 2nd frame and hence does not show the backtrace.
>
> GDB backtrace with vmcore of kernel without this patch:
>
> -
> (gdb) bt
>  #0  0xc02a53e8 in crash_setup_regs (oldregs=,
> newregs=0xc4f8f8d8) at ./arch/powerpc/include/asm/kexec.h:69
>  #1  __crash_kexec (regs=) at kernel/kexec_core.c:974
>  #2  0x0063 in ?? ()
>  #3  0xc3579320 in ?? ()
> -
>
> Further analysis revealed that the mismatch occurred because
> "ppc_save_regs" was saving the previous stack's SP instead of the current
> r1. This patch fixes this by storing current r1 in the saved pt_regs.
>
> GDB backtrace with vmcore of patched kernel:
>
> 
> (gdb) bt
>  #0  0xc02a53e8 in crash_setup_regs (oldregs=0x0, 
> newregs=0xc670b8d8)
> at ./arch/powerpc/include/asm/kexec.h:69
>  #1  __crash_kexec (regs=regs@entry=0x0) at kernel/kexec_core.c:974
>  #2  0xc0168918 in panic (fmt=fmt@entry=0xc1654a60 "sysrq 
> triggered crash\n")
> at kernel/panic.c:358
>  #3  0xc0b735f8 in sysrq_handle_crash (key=) at 
> drivers/tty/sysrq.c:155
>  #4  0xc0b742cc in __handle_sysrq (key=key@entry=99, 
> check_mask=check_mask@entry=false)
> at drivers/tty/sysrq.c:602
>  #5  0xc0b7506c in write_sysrq_trigger (file=, 
> buf=,
> count=2, ppos=) at drivers/tty/sysrq.c:1163
>  #6  0xc069a7bc in pde_write (ppos=, count= out>,
> buf=, file=, pde=0xc362cb40) at 
> fs/proc/inode.c:340
>  #7  proc_reg_write (file=, buf=, 
> count=,
> ppos=) at fs/proc/inode.c:352
>  #8  0xc05b3bbc in vfs_write (file=file@entry=0xc6aa6b00,
> buf=buf@entry=0x61f498b4f60  0x61f498b4f60>,
> count=count@entry=2, pos=pos@entry=0xc670bda0) at 
> fs/read_write.c:582
>  #9  0xc05b4264 in ksys_write (fd=,
> buf=0x61f498b4f60 , 
> count=2)
> at fs/read_write.c:637
>  #10 0xc002ea2c in system_call_exception (regs=0xc670be80, 
> r0=)
> at arch/powerpc/kernel/syscall.c:171
>  #11 0xc000c270 in system_call_vectored_common ()
> at arch/powerpc/kernel/interrupt_64.S:192
> 
>
> Signed-off-by: Aditya Gupta 

So this now saves regs as though it was an interrupt taken in the
caller, at the instruction after the call to ppc_save_regs, whereas
previously the NIP was there, but R1 came from the caller's caller
and that mismatch is what causes gdb's dwarf unwinder to go haywire.

Nice catch, and I think I follow the fix and I think I agree with it.
Before the bug was introduced, NIP also came from the grandparent.
Which is what xmon presumably wanted, but since then I think maybe it
makes more sense to just have the parent caller.

Reivewed-by: Nicholas Piggin 
Fixes: d16a58f8854b1 ("powerpc: Improve ppc_save_regs()")

Thanks,
Nick


[RFC PATCH v2 1/1] powerpc: update ppc_save_regs to save current r1 in pt_regs

2023-06-15 Thread Aditya Gupta
ppc_save_regs() skips one stack frame while saving the CPU register states.
Instead of saving current R1, it pulls the previous stack frame pointer.

When vmcores caused by direct panic call (such as `echo c >
/proc/sysrq-trigger`), are debugged with gdb, gdb fails to show the
backtrace correctly. On further analysis, it was found that it was because
of mismatch between r1 and NIP.

GDB uses NIP to get current function symbol and uses corresponding debug
info of that function to unwind previous frames, but due to the
mismatching r1 and NIP, the unwinding does not work, and it fails to
unwind to the 2nd frame and hence does not show the backtrace.

GDB backtrace with vmcore of kernel without this patch:

-
(gdb) bt
 #0  0xc02a53e8 in crash_setup_regs (oldregs=,
newregs=0xc4f8f8d8) at ./arch/powerpc/include/asm/kexec.h:69
 #1  __crash_kexec (regs=) at kernel/kexec_core.c:974
 #2  0x0063 in ?? ()
 #3  0xc3579320 in ?? ()
-

Further analysis revealed that the mismatch occurred because
"ppc_save_regs" was saving the previous stack's SP instead of the current
r1. This patch fixes this by storing current r1 in the saved pt_regs.

GDB backtrace with vmcore of patched kernel:


(gdb) bt
 #0  0xc02a53e8 in crash_setup_regs (oldregs=0x0, 
newregs=0xc670b8d8)
at ./arch/powerpc/include/asm/kexec.h:69
 #1  __crash_kexec (regs=regs@entry=0x0) at kernel/kexec_core.c:974
 #2  0xc0168918 in panic (fmt=fmt@entry=0xc1654a60 "sysrq 
triggered crash\n")
at kernel/panic.c:358
 #3  0xc0b735f8 in sysrq_handle_crash (key=) at 
drivers/tty/sysrq.c:155
 #4  0xc0b742cc in __handle_sysrq (key=key@entry=99, 
check_mask=check_mask@entry=false)
at drivers/tty/sysrq.c:602
 #5  0xc0b7506c in write_sysrq_trigger (file=, 
buf=,
count=2, ppos=) at drivers/tty/sysrq.c:1163
 #6  0xc069a7bc in pde_write (ppos=, count=,
buf=, file=, pde=0xc362cb40) at 
fs/proc/inode.c:340
 #7  proc_reg_write (file=, buf=, 
count=,
ppos=) at fs/proc/inode.c:352
 #8  0xc05b3bbc in vfs_write (file=file@entry=0xc6aa6b00,
buf=buf@entry=0x61f498b4f60 ,
count=count@entry=2, pos=pos@entry=0xc670bda0) at 
fs/read_write.c:582
 #9  0xc05b4264 in ksys_write (fd=,
buf=0x61f498b4f60 , 
count=2)
at fs/read_write.c:637
 #10 0xc002ea2c in system_call_exception (regs=0xc670be80, 
r0=)
at arch/powerpc/kernel/syscall.c:171
 #11 0xc000c270 in system_call_vectored_common ()
at arch/powerpc/kernel/interrupt_64.S:192


Signed-off-by: Aditya Gupta 
---

More information:

This problem with gdb backtrace was discovered while working on a crash
tool enhancement to improve crash analysis using gdb passthrough to be
able print function arguments and local variables inside crash tool. gdb
passthrough simply asks gdb to handle the backtrace printing, where it
was noticed that it could not print correct backtrace in some vmcores.

The changes introduced here has an implication on xmon, that it might show
one extra `xmon` frame in backtrace. By looking at older commits it seems
that originally the ppc_save_regs function was introduced as
xmon_save_regs(). But now the same function has been renamed to
ppc_save_regs() and been used in few other places as well.

Tested this patch with multiple ways of crashing:
1. direct panic call (`echo c > /proc/sysrq-trigger`)
2. null dereference/oops path (the earlier implementation of 
`sysrq_handle_crash`)
3. sys reset
4. sys reset inside qemu (to test for any regressions, that were fixed by
   commit d16a58f8854b19)

Changelog
- V2:
  - fixed bogus LR by storing caller's LR area as pointed out by Naveen and
  Nick

---
 arch/powerpc/kernel/ppc_save_regs.S | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/ppc_save_regs.S 
b/arch/powerpc/kernel/ppc_save_regs.S
index 49813f982468..a9b9c32d0c1f 100644
--- a/arch/powerpc/kernel/ppc_save_regs.S
+++ b/arch/powerpc/kernel/ppc_save_regs.S
@@ -31,10 +31,10 @@ _GLOBAL(ppc_save_regs)
lbz r0,PACAIRQSOFTMASK(r13)
PPC_STL r0,SOFTE(r3)
 #endif
-   /* go up one stack frame for SP */
-   PPC_LL  r4,0(r1)
-   PPC_STL r4,GPR1(r3)
+   /* store current SP */
+   PPC_STL r1,GPR1(r3)
/* get caller's LR */
+   PPC_LL  r4,0(r1)
PPC_LL  r0,LRSAVE(r4)
PPC_STL r0,_LINK(r3)
mflrr0
-- 
2.40.1