Re: [musl] Re: [PATCH v2] powerpc/64/signal: balance return predictor stack in signal trampoline

2021-01-22 Thread Raoni Fassina Firmino
On Fri, Jan 22, 2021 at 09:44:05AM -0500, Rich Felker wrote:
> Maybe I'm missing something but I don't see how this would break musl;
> we just inspect the PC in the mcontext, which I don't see any changes
> to and which should still point to the next instruction of the
> interrupted context. I don't have a test environment though so I'll
> have to wait for feedback from ppc users to be sure. Are there any
> further details on how it's breaking glibc?

For glibc, backtrace() compares the return-address from each stack frame
to the value of `__kernel_sigtramp_rt64` to identify the frame with the
mcontext information, but now the return-address is not the start of the
routine, but the middle of it, so it fails to catch this special frame.


o/
Raoni Fassina


Re: [PATCH v2] powerpc/64/signal: balance return predictor stack in signal trampoline

2021-01-22 Thread Raoni Fassina Firmino
On Fri, Jan 22, 2021 at 12:27:14PM +0100, AL glibc-alpha wrote:
> * Nicholas Piggin:
> 
> > diff --git a/arch/powerpc/kernel/vdso64/sigtramp.S 
> > b/arch/powerpc/kernel/vdso64/sigtramp.S
> > index a8cc0409d7d2..bbf68cd01088 100644
> > --- a/arch/powerpc/kernel/vdso64/sigtramp.S
> > +++ b/arch/powerpc/kernel/vdso64/sigtramp.S
> > @@ -6,6 +6,7 @@
> >   * Copyright (C) 2004 Benjamin Herrenschmuidt (b...@kernel.crashing.org), 
> > IBM Corp.
> >   * Copyright (C) 2004 Alan Modra (amo...@au.ibm.com)), IBM Corp.
> >   */
> > +#include  /* IFETCH_ALIGN_BYTES */
> >  #include 
> >  #include 
> >  #include 
> > @@ -14,21 +15,17 @@
> >  
> > .text
> >  
> > -/* The nop here is a hack.  The dwarf2 unwind routines subtract 1 from
> > -   the return address to get an address in the middle of the presumed
> > -   call instruction.  Since we don't have a call here, we artificially
> > -   extend the range covered by the unwind info by padding before the
> > -   real start.  */
> > -   nop
> > .balign 8
> > +   .balign IFETCH_ALIGN_BYTES
> >  V_FUNCTION_BEGIN(__kernel_sigtramp_rt64)
> > -.Lsigrt_start = . - 4
> > +.Lsigrt_start:
> > +   bctrl   /* call the handler */
> > addir1, r1, __SIGNAL_FRAMESIZE
> > li  r0,__NR_rt_sigreturn
> > sc
> >  .Lsigrt_end:
> >  V_FUNCTION_END(__kernel_sigtramp_rt64)
> > -/* The ".balign 8" above and the following zeros mimic the old stack
> > +/* The .balign 8 above and the following zeros mimic the old stack
> > trampoline layout.  The last magic value is the ucontext pointer,
> > chosen in such a way that older libgcc unwind code returns a zero
> > for a sigcontext pointer.  */
> 
> As far as I understand it, this breaks cancellation handling on musl and
> future glibc because it is necessary to look at the signal delivery
> location to see if a system call sequence has result in an action, and
> that location is no longer in user code after this change.
> 
> We have a glibc test in preparation of our change, and it started
> failing:
> 
>   Linux 5.10 breaks sigcontext_get_pc on powerpc64
>   
> 
> Isn't it possible to avoid the return predictor desynchronization by
> adding the appropriate hint?

I also caught this regression, I believe it was introduced in the kernel
5.9.

I don't know enough to comment on Florian suggestion, but I am working
on some possible fixes:

On the kernel side we can keep `__kernel_sigtramp_rt64` in the original
place after `bctrl` and add a new symbol so the kernel can jump to the
right place before `bctrl`.  This would ensure backward compatibility.

On the other side, this change exposed how fragile `backtrace()` is to
any changes in the trampoline code, which the libc has no control over
in this case. So maybe there is something that can be improved in how
backtrace decides that the return-address is to the trampoline. My fist
option is to test a range, after `__kernel_sigtramp_rt64` so see if the
address is inside the routine. This would be better if we can know the
size of the function, I know that the vdso.so has this info in the elf
but I don't know if it is exposed to the glibc.

As Nicholas mentioned in his patch, GDB seems to keep working just fine,
it is seems that GDB uses some heuristics to match the surround code of
the return address to identify that it is the trampoline code. So maybe
other option is to do something similar.


o/
Raoni Fassina


Re: [musl] Re: [PATCH v2] powerpc/64/signal: balance return predictor stack in signal trampoline

2021-01-22 Thread Raoni Fassina Firmino
On Fri, Jan 22, 2021 at 01:31:27PM -0500, Rich Felker wrote:
> On Fri, Jan 22, 2021 at 03:19:22PM -0300, Raoni Fassina Firmino wrote:
> > On Fri, Jan 22, 2021 at 09:44:05AM -0500, Rich Felker wrote:
> > > Maybe I'm missing something but I don't see how this would break musl;
> > > we just inspect the PC in the mcontext, which I don't see any changes
> > > to and which should still point to the next instruction of the
> > > interrupted context. I don't have a test environment though so I'll
> > > have to wait for feedback from ppc users to be sure. Are there any
> > > further details on how it's breaking glibc?
> > 
> > For glibc, backtrace() compares the return-address from each stack frame
> > to the value of `__kernel_sigtramp_rt64` to identify the frame with the
> > mcontext information, but now the return-address is not the start of the
> > routine, but the middle of it, so it fails to catch this special frame.
> 
> Is there a reason it's backtracing rather than just looking at the
> interrupted context (pointed to by the third argument to the signal
> handler)?

The regression is exposed in the backtrace() routine. More precisely,
when the backtrace() is called from inside a signal handling. What I
described is the way backtrace() uses to identify this special
situation. What is failling in glibc is the test for this.


o/
Raoni Fassina


[PATCH] powerpc/64/signal: Fix regression in __kernel_sigtramp_rt64 semantics

2021-02-01 Thread Raoni Fassina Firmino
Tested on powerpc64 and powerpc64le, with a glibc build and running the
affected glibc's testcase[2], inspected that glibc's backtrace() now gives
the correct result and gdb backtrace also keeps working as before.

I believe this should be backported to releases 5.9 and 5.10 as userspace
is affected in this releases.

 8< 

A Change[1] in __kernel_sigtramp_rt64 VDSO and trampoline code introduced a
regression in the way glibc's backtrace()[2] detects the signal-handler
stack frame.  Apart from the practical implications, __kernel_sigtram_rt64
was a VDSO with the semantics that it is a function you can call from
userspace to end a signal handling.  Now this semantics are no longer
valid.

I believe the aforementioned change affects all releases since 5.9.

This patch tries to fix both the semantics and practical aspect of
__kernel_sigtramp_rt64 returning it to the previous code, whilst keeping
the intended behavior from[1] by adding a new symbol to serve as the jump
target from the kernel to the trampoline. Now the trampoline has two parts,
an new entry point and the old return point.

[1] commit 0138ba5783ae0dcc799ad401a1e8ac8333790df9 ("powerpc/64/signal:
Balance return predictor stack in signal trampoline")
[2] https://lists.ozlabs.org/pipermail/linuxppc-dev/2021-January/223194.html

Fixes: 0138ba5783ae ("powerpc/64/signal: Balance return predictor stack in 
signal trampoline")
Signed-off-by: Raoni Fassina Firmino 
---
 arch/powerpc/kernel/vdso64/sigtramp.S   | 9 -
 arch/powerpc/kernel/vdso64/vdso64.lds.S | 2 +-
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/vdso64/sigtramp.S 
b/arch/powerpc/kernel/vdso64/sigtramp.S
index bbf68cd01088..f0fd8d2a9fc4 100644
--- a/arch/powerpc/kernel/vdso64/sigtramp.S
+++ b/arch/powerpc/kernel/vdso64/sigtramp.S
@@ -15,11 +15,18 @@
 
.text
 
+/* __kernel_start_sigtramp_rt64 and __kernel_sigtramp_rt64 together
+   are one function split in two parts. The kernel jumps to the former
+   and the signal handler indirectly (by blr) returns to the latter.
+   __kernel_sigtramp_rt64 needs to point to the return address so
+   glibc can correctly identify the trampoline stack frame.  */
.balign 8
.balign IFETCH_ALIGN_BYTES
-V_FUNCTION_BEGIN(__kernel_sigtramp_rt64)
+V_FUNCTION_BEGIN(__kernel_start_sigtramp_rt64)
 .Lsigrt_start:
bctrl   /* call the handler */
+V_FUNCTION_END(__kernel_start_sigtramp_rt64)
+V_FUNCTION_BEGIN(__kernel_sigtramp_rt64)
addir1, r1, __SIGNAL_FRAMESIZE
li  r0,__NR_rt_sigreturn
sc
diff --git a/arch/powerpc/kernel/vdso64/vdso64.lds.S 
b/arch/powerpc/kernel/vdso64/vdso64.lds.S
index 6164d1a1ba11..2f3c359cacd3 100644
--- a/arch/powerpc/kernel/vdso64/vdso64.lds.S
+++ b/arch/powerpc/kernel/vdso64/vdso64.lds.S
@@ -131,4 +131,4 @@ VERSION
 /*
  * Make the sigreturn code visible to the kernel.
  */
-VDSO_sigtramp_rt64 = __kernel_sigtramp_rt64;
+VDSO_sigtramp_rt64 = __kernel_start_sigtramp_rt64;

base-commit: 76c057c84d286140c6c416c3b4ba832cd1d8984e
-- 
2.26.2



Re: [PATCH] powerpc/64/signal: Fix regression in __kernel_sigtramp_rt64 semantics

2021-02-02 Thread Raoni Fassina Firmino
On Tue, Feb 02, 2021 at 05:41:35PM +1000, Nicholas Piggin wrote:
> Are you planning to update glibc to cope with this as well? Any idea 
> about musl? If so, including version numbers would be good (not that
> it's really a problem to carry this patch around).

For glibc from the beginning I planned to send a patch as well and
fortunately it got in[1] in time for yesterday's 2.33 release.  That
patch is meant to address kernels 5.9 and 5.10 but is also compatible
with older kernels and with this patch. So everyone should be compatible
with everyone else :-) (except unpatched kernels 5.9 and 5.10 with
unpatched glibcs prior to 2.33)

I don't know about musl, I took a look and maybe wrong here but didn't
find any backtrace() implementation and it seems that it uses its own
return code for the trampoline, but I am not sure.

Rich mentioned[2] that he didn't see how it would break musl and was
waiting for feedback from ppc users to be sure.

Thanks for the review :-)

o/
Raoni

[1] 
https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=5ee506ed35a2c9184bcb1fb5e79b6cceb9bb0dd1
[2] https://lists.ozlabs.org/pipermail/linuxppc-dev/2021-January/223198.html