Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid

2018-04-15 Thread Eric W. Biederman
Russell King - ARM Linux  writes:

> On Fri, Apr 13, 2018 at 12:53:49PM -0700, Linus Torvalds wrote:
>> On Fri, Apr 13, 2018 at 11:45 AM, Dave Martin  wrote:
>> >
>> > Most uses I've seen do nothing more than use the FPE_xyz value to
>> > format diagnostic messages while dying.  I struggled to find code that
>> > made a meaningful functional decision based on the value, though that's
>> > not proof...
>> 
>> Yeah. I've seen code that cares about SIGFPE deeply, but it's almost
>> invariably about some emulated environment (eg Java VM, or CPU
>> emulation).
>> 
>> And the siginfo data is basically never good enough for those
>> environments anyway on its own, so they will go and look at the actual
>> instruction that caused the fault and the register state instead,
>> because they need *all* the information.
>> 
>> The cases that use si_code are the ones that just trapped signals in
>> order to give a more helpful abort message.
>> 
>> So I could certainly imagine that si_code is actually used by somebody
>> who then decides to actuall act differently on it, but aside from
>> perhaps printing out a different message, it sounds far-fetched.
>
> Okay, in that case let's just use FPE_FLTINV.  That makes the patch
> easily back-portable for stable kernels.

If we want to I don't think  backporting 266da65e9156 ("signal: Add
FPE_FLTUNK si_code for undiagnosable fp exceptions") would be at
all difficult.

What it is changing has been stable for quite a while.  The surroundings
might change and so it might require some trivial manual fixup but I
don't expect any problems.

Not that I want to derail the consensus but if we want to backport
similar fixes for arm64 or the other architectures that wind up using
FPE_FLTUNK for their fix we would need to backport 266da65e9156 anyway.

Eric



Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid

2018-04-15 Thread Russell King - ARM Linux
On Fri, Apr 13, 2018 at 12:53:49PM -0700, Linus Torvalds wrote:
> On Fri, Apr 13, 2018 at 11:45 AM, Dave Martin  wrote:
> >
> > Most uses I've seen do nothing more than use the FPE_xyz value to
> > format diagnostic messages while dying.  I struggled to find code that
> > made a meaningful functional decision based on the value, though that's
> > not proof...
> 
> Yeah. I've seen code that cares about SIGFPE deeply, but it's almost
> invariably about some emulated environment (eg Java VM, or CPU
> emulation).
> 
> And the siginfo data is basically never good enough for those
> environments anyway on its own, so they will go and look at the actual
> instruction that caused the fault and the register state instead,
> because they need *all* the information.
> 
> The cases that use si_code are the ones that just trapped signals in
> order to give a more helpful abort message.
> 
> So I could certainly imagine that si_code is actually used by somebody
> who then decides to actuall act differently on it, but aside from
> perhaps printing out a different message, it sounds far-fetched.

Okay, in that case let's just use FPE_FLTINV.  That makes the patch
easily back-portable for stable kernels.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up


Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid

2018-04-13 Thread Dave Martin
On Fri, Apr 13, 2018 at 07:50:17PM +0100, Russell King - ARM Linux wrote:
> On Fri, Apr 13, 2018 at 07:35:38PM +0100, Dave Martin wrote:
> > If that's the case though, I don't see how a userspace testsuite is
> > hitting this code path.  Maybe I've misunderstood the context of this
> > thread.
> 
> It isn't hitting this exact case.
> 
> The userspace testsuite is hitting an entirely different case:
> 
>   kill(getpid(), SIGFPE);
> 
> As one expects, this generates a SIGFPE to the current process, which
> then inspects the siginfo structure.  Being a userspace generated
> signal, si_code is set to SI_USER, which has the value 0.
> 
> With FPE_FIXME defined to zero, as Eric has done:
> 
> enum siginfo_layout siginfo_layout(int sig, int si_code)
> {
> enum siginfo_layout layout = SIL_KILL;
> if ((si_code > SI_USER) && (si_code < SI_KERNEL)) {
> ...
> } else {
> ...
> #ifdef FPE_FIXME
> if ((sig == SIGFPE) && (si_code == FPE_FIXME))
> layout = SIL_FAULT;
> #endif
> }
> return layout;
> }
> 
> This causes siginfo_layout() to return SIL_FAULT for this userspace
> generated signal, rather than the correct SIL_KILL.
> 
> This affects which fields we copy to userspace.
> 
> SI_USER is defined to pass si_pid and si_uid to the userspace process,
> which on ARM are the first two consecutive 32-bit quantities in the
> union, which is done when siginfo_layout() returns SIL_KILL.  However,
> when SIL_FAULT is returned, we only copy si_addr in the union, which
> on ARM is just one 32-bit quantity.
> 
> Consequently, userspace sees a correct value for si_pid, and si_uid
> remains set to whatever was there in userspace.  In the case of the
> strace program, that's zero.  This means if you run the strace
> testsuite as root, the problem doesn't appear, but if you run it as
> a non-root user, it will.
> 
> So, the testsuite case has little to do with the behaviour of the VFP
> handling - it's to do with the behaviour of the kernel's signal handling.

Oh, right.  So, going back to the unhandled VFP bounce question,
is it reasonable for that to be a SIGKILL?  That avoids the question
of what si_code userspace should see, because userspace doesn't get
to see siginfo at all in that case: it's dead.

Or do we hit this in real situations that we want userspace to bail out
of more gracefully?

Cheers
---Dave


Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid

2018-04-13 Thread Dave Martin
On Fri, Apr 13, 2018 at 11:23:36AM -0700, Linus Torvalds wrote:
> On Fri, Apr 13, 2018 at 10:54 AM, Russell King - ARM Linux
>  wrote:
> >
> > FPE_FLTINV means "floating point invalid operation".  Does it really
> > cover the case where hardware has failed, or is it intended to cover
> > the case where userspace did something wrong and asked for an invalid
> > operation from the FP hardware?
> 
> Note that the number of people who actually look at the si_code is
> approximately zero.
> 
> But the ones that _do_ check the si_code are certainly not going to
> check it against a new code that they don't know about.
> 
> I suspect that if you start searching for FLT_xyz occurrences in code,
> approximately 100% of them are from the kernel code that generates
> them, not from any actual users.
> 
> So I'd be very surprised if you can find *anybody* who cares about
> that exact value (with the possible exceptions of test-suites).
> 
> Sadly, google code-search is no more. It was useful for things like that.

I've found https://codesearch.debian.net/ useful for digging into this
kind of question, though it tends to throw up a lot of false positives.

Most uses I've seen do nothing more than use the FPE_xyz value to
format diagnostic messages while dying.  I struggled to find code that
made a meaningful functional decision based on the value, though that's
not proof...

Cheers
---Dave


Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid

2018-04-13 Thread Dave Martin
On Fri, Apr 13, 2018 at 06:54:08PM +0100, Russell King - ARM Linux wrote:
> On Fri, Apr 13, 2018 at 06:08:28PM +0100, Dave Martin wrote:
> > On Fri, Apr 13, 2018 at 09:33:17AM -0700, Linus Torvalds wrote:
> > > On Fri, Apr 13, 2018 at 2:42 AM, Russell King - ARM Linux
> > >  wrote:
> > > >
> > > > Yes, it does solve the problem at hand with strace - the exact patch I
> > > > tested against 4.16 is below.
> > > 
> > > Ok, good.
> > > 
> > > > However, FPE_FLTUNK is not defined in older kernels, so while we can
> > > > fix it this way for the current merge window, that doesn't help 4.16.
> > > 
> > > I wonder if we should even bother with FPE_FLTUNK.
> > > 
> > > I suspect we might as well use FPE_FLTINV, I suspect, and not have
> > > this complexity at all. That case is not worth worrying about, since
> > > it's a "this shouldn't happen anyway" and the *real* reason will be in
> > > the kernel logs due to vfs_panic().
> > > 
> > > So it's not like this is something that the user should ever care
> > > about the si_code about.
> > 
> > Ack, my intended meaning for FPE_FLTUNK is that the fp exception is
> > either spurious or we can't tell easily (or possibly at all) which
> > FPE_XXX should be returned.  It's up to userspace to figure it out
> > if it really cares.  Previously we were accidentally returning SI_USER
> > in si_code for arm64.
> > 
> > This case on arm looks like a more serious error for which FPE_FLTINV
> > may be more appropriate anyway.
> 
> No.  The cases where we get to this point are:
> 
> 1. A trap concerning a coprocessor register transfer instruction (iow, move
>between a VFP register and ARM register.)
> 2. A trap concerning a coprocessor register load or save instruction.
> 
> (In both of these, "concerning" means that the VFP hardware provides
> such an instruction as the reason for the fault, *not* that it is the
> faulting instruction.)
> 
> 3. A combination of the exception bits (EX and DEX) on certain VFP
>implementations.
> 
> All of these can be summarised as "the hardware went wrong in some way"
> rather than "the user program did something wrong."

Although my understanding of VFP bounces is a bit hazy, I think this is
broadly in line with my assumptions.

> FPE_FLTINV means "floating point invalid operation".  Does it really
> cover the case where hardware has failed, or is it intended to cover
> the case where userspace did something wrong and asked for an invalid
> operation from the FP hardware?

So, there's an argument that FPE_FLTINV is not really correct.  My
rationale was that there is nothing correct that we can return, and
FPE_FLTINV may be no worse than the alternatives.

If we can only hit this case as the result of a hardware failure
or kernel bug though, should this be delivered as SIGKILL instead?

That's the approach I eventually followed for various exceptions
on arm64 that were theoretically delivered to userspace with si_code==0,
but really should be impossible unless and kernel and/or hardware
is buggy.

If that's the case though, I don't see how a userspace testsuite is
hitting this code path.  Maybe I've misunderstood the context of this
thread.

Cheers
---Dave


Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid

2018-04-13 Thread Dave Martin
On Fri, Apr 13, 2018 at 09:33:17AM -0700, Linus Torvalds wrote:
> On Fri, Apr 13, 2018 at 2:42 AM, Russell King - ARM Linux
>  wrote:
> >
> > Yes, it does solve the problem at hand with strace - the exact patch I
> > tested against 4.16 is below.
> 
> Ok, good.
> 
> > However, FPE_FLTUNK is not defined in older kernels, so while we can
> > fix it this way for the current merge window, that doesn't help 4.16.
> 
> I wonder if we should even bother with FPE_FLTUNK.
> 
> I suspect we might as well use FPE_FLTINV, I suspect, and not have
> this complexity at all. That case is not worth worrying about, since
> it's a "this shouldn't happen anyway" and the *real* reason will be in
> the kernel logs due to vfs_panic().
> 
> So it's not like this is something that the user should ever care
> about the si_code about.

Ack, my intended meaning for FPE_FLTUNK is that the fp exception is
either spurious or we can't tell easily (or possibly at all) which
FPE_XXX should be returned.  It's up to userspace to figure it out
if it really cares.  Previously we were accidentally returning SI_USER
in si_code for arm64.

This case on arm looks like a more serious error for which FPE_FLTINV
may be more appropriate anyway.

[...]

Cheers
---Dave


Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid

2018-04-13 Thread Linus Torvalds
On Fri, Apr 13, 2018 at 11:45 AM, Dave Martin  wrote:
>
> Most uses I've seen do nothing more than use the FPE_xyz value to
> format diagnostic messages while dying.  I struggled to find code that
> made a meaningful functional decision based on the value, though that's
> not proof...

Yeah. I've seen code that cares about SIGFPE deeply, but it's almost
invariably about some emulated environment (eg Java VM, or CPU
emulation).

And the siginfo data is basically never good enough for those
environments anyway on its own, so they will go and look at the actual
instruction that caused the fault and the register state instead,
because they need *all* the information.

The cases that use si_code are the ones that just trapped signals in
order to give a more helpful abort message.

So I could certainly imagine that si_code is actually used by somebody
who then decides to actuall act differently on it, but aside from
perhaps printing out a different message, it sounds far-fetched.

Linus


Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid

2018-04-13 Thread Russell King - ARM Linux
On Fri, Apr 13, 2018 at 07:35:38PM +0100, Dave Martin wrote:
> If that's the case though, I don't see how a userspace testsuite is
> hitting this code path.  Maybe I've misunderstood the context of this
> thread.

It isn't hitting this exact case.

The userspace testsuite is hitting an entirely different case:

kill(getpid(), SIGFPE);

As one expects, this generates a SIGFPE to the current process, which
then inspects the siginfo structure.  Being a userspace generated
signal, si_code is set to SI_USER, which has the value 0.

With FPE_FIXME defined to zero, as Eric has done:

enum siginfo_layout siginfo_layout(int sig, int si_code)
{
enum siginfo_layout layout = SIL_KILL;
if ((si_code > SI_USER) && (si_code < SI_KERNEL)) {
...
} else {
...
#ifdef FPE_FIXME
if ((sig == SIGFPE) && (si_code == FPE_FIXME))
layout = SIL_FAULT;
#endif
}
return layout;
}

This causes siginfo_layout() to return SIL_FAULT for this userspace
generated signal, rather than the correct SIL_KILL.

This affects which fields we copy to userspace.

SI_USER is defined to pass si_pid and si_uid to the userspace process,
which on ARM are the first two consecutive 32-bit quantities in the
union, which is done when siginfo_layout() returns SIL_KILL.  However,
when SIL_FAULT is returned, we only copy si_addr in the union, which
on ARM is just one 32-bit quantity.

Consequently, userspace sees a correct value for si_pid, and si_uid
remains set to whatever was there in userspace.  In the case of the
strace program, that's zero.  This means if you run the strace
testsuite as root, the problem doesn't appear, but if you run it as
a non-root user, it will.

So, the testsuite case has little to do with the behaviour of the VFP
handling - it's to do with the behaviour of the kernel's signal handling.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up


Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid

2018-04-13 Thread Linus Torvalds
On Fri, Apr 13, 2018 at 10:54 AM, Russell King - ARM Linux
 wrote:
>
> FPE_FLTINV means "floating point invalid operation".  Does it really
> cover the case where hardware has failed, or is it intended to cover
> the case where userspace did something wrong and asked for an invalid
> operation from the FP hardware?

Note that the number of people who actually look at the si_code is
approximately zero.

But the ones that _do_ check the si_code are certainly not going to
check it against a new code that they don't know about.

I suspect that if you start searching for FLT_xyz occurrences in code,
approximately 100% of them are from the kernel code that generates
them, not from any actual users.

So I'd be very surprised if you can find *anybody* who cares about
that exact value (with the possible exceptions of test-suites).

Sadly, google code-search is no more. It was useful for things like that.

Linus


Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid

2018-04-13 Thread Russell King - ARM Linux
On Fri, Apr 13, 2018 at 06:08:28PM +0100, Dave Martin wrote:
> On Fri, Apr 13, 2018 at 09:33:17AM -0700, Linus Torvalds wrote:
> > On Fri, Apr 13, 2018 at 2:42 AM, Russell King - ARM Linux
> >  wrote:
> > >
> > > Yes, it does solve the problem at hand with strace - the exact patch I
> > > tested against 4.16 is below.
> > 
> > Ok, good.
> > 
> > > However, FPE_FLTUNK is not defined in older kernels, so while we can
> > > fix it this way for the current merge window, that doesn't help 4.16.
> > 
> > I wonder if we should even bother with FPE_FLTUNK.
> > 
> > I suspect we might as well use FPE_FLTINV, I suspect, and not have
> > this complexity at all. That case is not worth worrying about, since
> > it's a "this shouldn't happen anyway" and the *real* reason will be in
> > the kernel logs due to vfs_panic().
> > 
> > So it's not like this is something that the user should ever care
> > about the si_code about.
> 
> Ack, my intended meaning for FPE_FLTUNK is that the fp exception is
> either spurious or we can't tell easily (or possibly at all) which
> FPE_XXX should be returned.  It's up to userspace to figure it out
> if it really cares.  Previously we were accidentally returning SI_USER
> in si_code for arm64.
> 
> This case on arm looks like a more serious error for which FPE_FLTINV
> may be more appropriate anyway.

No.  The cases where we get to this point are:

1. A trap concerning a coprocessor register transfer instruction (iow, move
   between a VFP register and ARM register.)
2. A trap concerning a coprocessor register load or save instruction.

(In both of these, "concerning" means that the VFP hardware provides
such an instruction as the reason for the fault, *not* that it is the
faulting instruction.)

3. A combination of the exception bits (EX and DEX) on certain VFP
   implementations.

All of these can be summarised as "the hardware went wrong in some way"
rather than "the user program did something wrong."

FPE_FLTINV means "floating point invalid operation".  Does it really
cover the case where hardware has failed, or is it intended to cover
the case where userspace did something wrong and asked for an invalid
operation from the FP hardware?

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up


Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid

2018-04-13 Thread Linus Torvalds
On Fri, Apr 13, 2018 at 2:42 AM, Russell King - ARM Linux
 wrote:
>
> Yes, it does solve the problem at hand with strace - the exact patch I
> tested against 4.16 is below.

Ok, good.

> However, FPE_FLTUNK is not defined in older kernels, so while we can
> fix it this way for the current merge window, that doesn't help 4.16.

I wonder if we should even bother with FPE_FLTUNK.

I suspect we might as well use FPE_FLTINV, I suspect, and not have
this complexity at all. That case is not worth worrying about, since
it's a "this shouldn't happen anyway" and the *real* reason will be in
the kernel logs due to vfs_panic().

So it's not like this is something that the user should ever care
about the si_code about.

> Given that the path we're talking about is unlikely to happen (as
> mentioned in my second paragraph) I still think reverting Eric's patch
> is the right way forward for older kernels.

I'd much rather get rid of that FPE_FIXME, and leave that whole mess behind.

So the attached patch seems simple and should work with 4.16 too.

Let's not leave this as some kind of nasty maintenance issue, and just
go for simple and stupid.

Hmm?

Linus
 arch/arm/include/uapi/asm/siginfo.h | 13 -
 arch/arm/vfp/vfpmodule.c|  2 +-
 2 files changed, 1 insertion(+), 14 deletions(-)

diff --git a/arch/arm/include/uapi/asm/siginfo.h b/arch/arm/include/uapi/asm/siginfo.h
deleted file mode 100644
index d0513880be21..
--- a/arch/arm/include/uapi/asm/siginfo.h
+++ /dev/null
@@ -1,13 +0,0 @@
-#ifndef __ASM_SIGINFO_H
-#define __ASM_SIGINFO_H
-
-#include 
-
-/*
- * SIGFPE si_codes
- */
-#ifdef __KERNEL__
-#define FPE_FIXME	0	/* Broken dup of SI_USER */
-#endif /* __KERNEL__ */
-
-#endif
diff --git a/arch/arm/vfp/vfpmodule.c b/arch/arm/vfp/vfpmodule.c
index 4c375e11ae95..af4ee2cef2f9 100644
--- a/arch/arm/vfp/vfpmodule.c
+++ b/arch/arm/vfp/vfpmodule.c
@@ -257,7 +257,7 @@ static void vfp_raise_exceptions(u32 exceptions, u32 inst, u32 fpscr, struct pt_
 
 	if (exceptions == VFP_EXCEPTION_ERROR) {
 		vfp_panic("unhandled bounce", inst);
-		vfp_raise_sigfpe(FPE_FIXME, regs);
+		vfp_raise_sigfpe(FPE_FLTINV, regs);
 		return;
 	}
 


Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid

2018-04-13 Thread Russell King - ARM Linux
On Thu, Apr 12, 2018 at 10:22:15AM -0700, Linus Torvalds wrote:
> On Thu, Apr 12, 2018 at 10:20 AM, Russell King - ARM Linux
>  wrote:
> >
> > This file was created to contain FPE_FIXME, by the "signal/arm: Document
> > conflicts with SI_USER and SIGFPE" commit so if we're removing it, it
> > would be better to remove the whole file.
> 
> Fair enough. I'm not going to commit that anyway since I can't test
> it, but yes, if it tests ok that sounds like the right thing to do.

Yes, it does solve the problem at hand with strace - the exact patch I
tested against 4.16 is below.

Testing this exact code path (exceptions set to VFP_EXCEPTION_ERROR)
is something that can only happen if the hardware does something stupid,
and I don't have a way of making it do that, so the code path can't be
tested.

However, FPE_FLTUNK is not defined in older kernels, so while we can
fix it this way for the current merge window, that doesn't help 4.16.
How we solve that depends what happens with Eric's patch (266da65e9156
("signal: Add FPE_FLTUNK si_code for undiagnosable fp exceptions"))
that introduces FPE_FLTUNK - and there's also the problem of NSIGFPE,
which kernel/signal.c uses in the path that selects the siginfo layout.

Given that the path we're talking about is unlikely to happen (as
mentioned in my second paragraph) I still think reverting Eric's patch
is the right way forward for older kernels.

(Note, my previous comment about the si_code initialiser was incorrect.)

diff --git a/arch/arm/include/uapi/asm/siginfo.h 
b/arch/arm/include/uapi/asm/siginfo.h
deleted file mode 100644
index d0513880be21..
--- a/arch/arm/include/uapi/asm/siginfo.h
+++ /dev/null
@@ -1,13 +0,0 @@
-#ifndef __ASM_SIGINFO_H
-#define __ASM_SIGINFO_H
-
-#include 
-
-/*
- * SIGFPE si_codes
- */
-#ifdef __KERNEL__
-#define FPE_FIXME  0   /* Broken dup of SI_USER */
-#endif /* __KERNEL__ */
-
-#endif
diff --git a/arch/arm/vfp/vfpmodule.c b/arch/arm/vfp/vfpmodule.c
index 4c375e11ae95..8a1a5e6048d2 100644
--- a/arch/arm/vfp/vfpmodule.c
+++ b/arch/arm/vfp/vfpmodule.c
@@ -28,6 +28,10 @@
 #include 
 #include 
 
+#ifndef FPE_FLTUNK
+#define FPE_FLTUNK 14
+#endif
+
 #include "vfpinstr.h"
 #include "vfp.h"
 
@@ -257,7 +261,7 @@ static void vfp_raise_exceptions(u32 exceptions, u32 inst, 
u32 fpscr, struct pt_
 
if (exceptions == VFP_EXCEPTION_ERROR) {
vfp_panic("unhandled bounce", inst);
-   vfp_raise_sigfpe(FPE_FIXME, regs);
+   vfp_raise_sigfpe(FPE_FLTUNK, regs);
return;
}
 


-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up


Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid

2018-04-12 Thread Dmitry V. Levin
On Thu, Apr 12, 2018 at 09:50:26AM -0700, Linus Torvalds wrote:
> Does this attached patch perhaps fix the ARM case?
> 
> It just uses FPE_FLTUNK as the default si_code for SIGFPE, which seems
> sane enough. And then gets rid of FPE_FIXME, which should resolve the
> nasty case.
> 
> Hmm? Entirely untested, and I didn't really look at the test-case in
> question since I can't really run it anyway.
> 
> Well, I could run it all on x86-64, but it doesn't have that FPE_FIXME
> case at all.
> 
>  Linus

>  arch/arm/include/uapi/asm/siginfo.h | 7 ---
>  arch/arm/vfp/vfpmodule.c| 4 ++--
>  2 files changed, 2 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/arm/include/uapi/asm/siginfo.h 
> b/arch/arm/include/uapi/asm/siginfo.h
> index d0513880be21..d87beeedb4c4 100644
> --- a/arch/arm/include/uapi/asm/siginfo.h
> +++ b/arch/arm/include/uapi/asm/siginfo.h
> @@ -3,11 +3,4 @@
>  
>  #include 
>  
> -/*
> - * SIGFPE si_codes
> - */
> -#ifdef __KERNEL__
> -#define FPE_FIXME0   /* Broken dup of SI_USER */
> -#endif /* __KERNEL__ */
> -
>  #endif

Looks like the whole file should go away.

> diff --git a/arch/arm/vfp/vfpmodule.c b/arch/arm/vfp/vfpmodule.c
> index 4c375e11ae95..012c6e690303 100644
> --- a/arch/arm/vfp/vfpmodule.c
> +++ b/arch/arm/vfp/vfpmodule.c
> @@ -251,13 +251,13 @@ static void vfp_panic(char *reason, u32 inst)
>   */
>  static void vfp_raise_exceptions(u32 exceptions, u32 inst, u32 fpscr, struct 
> pt_regs *regs)
>  {
> - int si_code = 0;
> + int si_code = FPE_FLTUNK;

Note that this change would affect the following code
at the end of vfp_raise_exceptions:

if (si_code)
vfp_raise_sigfpe(si_code, regs);

>   pr_debug("VFP: raising exceptions %08x\n", exceptions);
>  
>   if (exceptions == VFP_EXCEPTION_ERROR) {
>   vfp_panic("unhandled bounce", inst);
> - vfp_raise_sigfpe(FPE_FIXME, regs);
> + vfp_raise_sigfpe(si_code, regs);
>   return;
>   }
>  

To be on the safe side, I'd just change it this way:

diff --git a/arch/arm/vfp/vfpmodule.c b/arch/arm/vfp/vfpmodule.c
index 4c375e1..66a73ba 100644
--- a/arch/arm/vfp/vfpmodule.c
+++ b/arch/arm/vfp/vfpmodule.c
@@ -257,7 +257,7 @@ static void vfp_raise_exceptions(u32 exceptions, u32 inst, 
u32 fpscr, struct pt_
 
if (exceptions == VFP_EXCEPTION_ERROR) {
vfp_panic("unhandled bounce", inst);
-   vfp_raise_sigfpe(FPE_FIXME, regs);
+   vfp_raise_sigfpe(FPE_FLTUNK, regs);
return;
}

-- 
ldv


signature.asc
Description: PGP signature


Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid

2018-04-12 Thread Linus Torvalds
On Thu, Apr 12, 2018 at 10:20 AM, Russell King - ARM Linux
 wrote:
>
> This file was created to contain FPE_FIXME, by the "signal/arm: Document
> conflicts with SI_USER and SIGFPE" commit so if we're removing it, it
> would be better to remove the whole file.

Fair enough. I'm not going to commit that anyway since I can't test
it, but yes, if it tests ok that sounds like the right thing to do.

   Linus


Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid

2018-04-12 Thread Russell King - ARM Linux
On Thu, Apr 12, 2018 at 09:50:26AM -0700, Linus Torvalds wrote:
> Does this attached patch perhaps fix the ARM case?
> 
> It just uses FPE_FLTUNK as the default si_code for SIGFPE, which seems
> sane enough. And then gets rid of FPE_FIXME, which should resolve the
> nasty case.
> 
> Hmm? Entirely untested, and I didn't really look at the test-case in
> question since I can't really run it anyway.

I'll test tomorrow and let you know.

>  arch/arm/include/uapi/asm/siginfo.h | 7 ---
>  arch/arm/vfp/vfpmodule.c| 4 ++--
>  2 files changed, 2 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/arm/include/uapi/asm/siginfo.h 
> b/arch/arm/include/uapi/asm/siginfo.h
> index d0513880be21..d87beeedb4c4 100644
> --- a/arch/arm/include/uapi/asm/siginfo.h
> +++ b/arch/arm/include/uapi/asm/siginfo.h
> @@ -3,11 +3,4 @@
>  
>  #include 
>  
> -/*
> - * SIGFPE si_codes
> - */
> -#ifdef __KERNEL__
> -#define FPE_FIXME0   /* Broken dup of SI_USER */
> -#endif /* __KERNEL__ */
> -
>  #endif

This file was created to contain FPE_FIXME, by the "signal/arm: Document
conflicts with SI_USER and SIGFPE" commit so if we're removing it, it
would be better to remove the whole file.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up


Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid

2018-04-12 Thread Linus Torvalds
Does this attached patch perhaps fix the ARM case?

It just uses FPE_FLTUNK as the default si_code for SIGFPE, which seems
sane enough. And then gets rid of FPE_FIXME, which should resolve the
nasty case.

Hmm? Entirely untested, and I didn't really look at the test-case in
question since I can't really run it anyway.

Well, I could run it all on x86-64, but it doesn't have that FPE_FIXME
case at all.

 Linus
 arch/arm/include/uapi/asm/siginfo.h | 7 ---
 arch/arm/vfp/vfpmodule.c| 4 ++--
 2 files changed, 2 insertions(+), 9 deletions(-)

diff --git a/arch/arm/include/uapi/asm/siginfo.h b/arch/arm/include/uapi/asm/siginfo.h
index d0513880be21..d87beeedb4c4 100644
--- a/arch/arm/include/uapi/asm/siginfo.h
+++ b/arch/arm/include/uapi/asm/siginfo.h
@@ -3,11 +3,4 @@
 
 #include 
 
-/*
- * SIGFPE si_codes
- */
-#ifdef __KERNEL__
-#define FPE_FIXME	0	/* Broken dup of SI_USER */
-#endif /* __KERNEL__ */
-
 #endif
diff --git a/arch/arm/vfp/vfpmodule.c b/arch/arm/vfp/vfpmodule.c
index 4c375e11ae95..012c6e690303 100644
--- a/arch/arm/vfp/vfpmodule.c
+++ b/arch/arm/vfp/vfpmodule.c
@@ -251,13 +251,13 @@ static void vfp_panic(char *reason, u32 inst)
  */
 static void vfp_raise_exceptions(u32 exceptions, u32 inst, u32 fpscr, struct pt_regs *regs)
 {
-	int si_code = 0;
+	int si_code = FPE_FLTUNK;
 
 	pr_debug("VFP: raising exceptions %08x\n", exceptions);
 
 	if (exceptions == VFP_EXCEPTION_ERROR) {
 		vfp_panic("unhandled bounce", inst);
-		vfp_raise_sigfpe(FPE_FIXME, regs);
+		vfp_raise_sigfpe(si_code, regs);
 		return;
 	}
 


Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid

2018-04-12 Thread Russell King - ARM Linux
On Thu, Apr 12, 2018 at 03:49:28PM +0300, Dmitry V. Levin wrote:
> The "KERNEL BUG" diagnostics I was talking about was added to strace yesterday
> as a part of workaround commit, see
> https://github.com/strace/strace/commit/34c7794cc16e2511eda7b1d5767c655a83b17309
> Before that change the test just failed.

Ah, seeing the test case really helps to see exactly what and why it's
broken.  Yes, Eric's commit was definitely wrong and needs to be
reverted, because it incorrectly changes what happens when kill(1) is
used to deliver a SIGFPE signal to a process.

Eric, please sort this out - you have a much better handle on whether
there are any dependencies here that would need to be resolved from
a simple revert of the offending commits, but that revert must happen
because you've caused a user visible regression.

The original code _was_ safe even if it wasn't correct to the specs,
as we'd end up copying the si_addr field (as the si_pid copy) and a
zeroed field as the si_uid copy.  It was just that si_code was
technically wrong, and that's something that would be even more
dangerous to change now.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up


Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid

2018-04-12 Thread Dmitry V. Levin
On Thu, Apr 12, 2018 at 01:19:49PM +0100, Russell King - ARM Linux wrote:
> On Thu, Apr 12, 2018 at 02:03:14PM +0300, Dmitry V. Levin wrote:
> > On Thu, Apr 12, 2018 at 10:58:11AM +0100, Russell King - ARM Linux wrote:
> > > On Thu, Apr 12, 2018 at 04:34:35AM +0300, Dmitry V. Levin wrote:
> > > > A similar commit v4.16-rc1~159^2~37
> > > > ("signal/arm: Document conflicts with SI_USER and SIGFPE") must have
> > > > introduced a similar ABI regression to compat arm.
> > > 
> > > So, could you explain how can this change cause a regression?
> > > 
> > > +#define FPE_FIXME  0
> > > -   vfp_raise_sigfpe(0, regs);
> > > +   vfp_raise_sigfpe(FPE_FIXME, regs);
> > 
> > No, this hunk hasn't caused the regression, but another one did:
> > 
> > diff --git a/arch/arm/include/uapi/asm/siginfo.h 
> > b/arch/arm/include/uapi/asm/siginfo.h
> > new file mode 100644
> > index 000..d051388
> > --- /dev/null
> > +++ b/arch/arm/include/uapi/asm/siginfo.h
> > @@ -0,0 +1,13 @@
> > +#ifndef __ASM_SIGINFO_H
> > +#define __ASM_SIGINFO_H
> > +
> > +#include 
> > +
> > +/*
> > + * SIGFPE si_codes
> > + */
> > +#ifdef __KERNEL__
> > +#define FPE_FIXME  0   /* Broken dup of SI_USER */
> > +#endif /* __KERNEL__ */
> > +
> > +#endif
> > 
> > This is due to FPE_FIXME handling in kernel/signal.c
> 
> Building strace 4.22 on ARM and running the test suite reveals no
> problems with the signal_receive test, tested on both 4.14 and 4.16
> kernels - there's no "KERNEL BUG" reports in any of the test results.

https://build.opensuse.org/public/build/home:ldv_alt/openSUSE_Factory_ARM/armv7l/strace/_log
- the test just fails there with
[   50s] + uname -a
[   50s] Linux armbuild01 4.16.0-1-lpae #1 SMP PREEMPT Wed Apr 4 13:35:56 UTC 
2018 (e16f96d) armv7l armv7l armv7l GNU/Linux
...
[  570s] FAIL: signal_receive.gen
[  570s]  SIGFPE {si_signo=SIGFPE, si_code=SI_USER, si_pid=25332, 
si_uid=399} ---
[  570s] +--- SIGFPE {si_signo=SIGFPE, si_code=SI_USER, si_pid=25332, si_uid=0} 
---
[  570s] signal_receive.gen.test: failed test: ../../strace -a16 -e trace=kill 
../signal_receive output mismatch

> However, stock strace 4.22 source doesn't appear to contain the
> "KERNEL BUG" string anywhere, so this may be a Suse specific addition
> to the test:

The "KERNEL BUG" diagnostics I was talking about was added to strace yesterday
as a part of workaround commit, see
https://github.com/strace/strace/commit/34c7794cc16e2511eda7b1d5767c655a83b17309
Before that change the test just failed.

[...]
> Any ideas where the "KERNEL BUG" in Suse builds is coming from?

strace developers use OBS to test strace.git for regressions.
The build environment is provided by OBS, all the rest comes from strace.git.

> Any ideas how to test it on other architectures (iow, where can we get
> source that contains this test?)

Just use master branch of https://github.com/strace/strace
or https://gitlab.com/strace/strace (they are the same).

> Based on previous experience, unfortunately folk don't tend to report
> user ABI regressions to kernel developers, so we'd probably never know
> that there's a problem - I do think the safer thing would've been to
> leave it well alone, and just accept that we'll end up copying more
> words to userspace than is actually intended.

Well, these changes caused visible regressions in strace test suite on arm, ppc,
and sparc - this is the reason why I have reported them to kernel developers.


-- 
ldv


signature.asc
Description: PGP signature


Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid

2018-04-12 Thread Russell King - ARM Linux
On Thu, Apr 12, 2018 at 02:03:14PM +0300, Dmitry V. Levin wrote:
> On Thu, Apr 12, 2018 at 10:58:11AM +0100, Russell King - ARM Linux wrote:
> > On Thu, Apr 12, 2018 at 04:34:35AM +0300, Dmitry V. Levin wrote:
> > > A similar commit v4.16-rc1~159^2~37
> > > ("signal/arm: Document conflicts with SI_USER and SIGFPE") must have
> > > introduced a similar ABI regression to compat arm.
> > 
> > So, could you explain how can this change cause a regression?
> > 
> > +#define FPE_FIXME  0
> > -   vfp_raise_sigfpe(0, regs);
> > +   vfp_raise_sigfpe(FPE_FIXME, regs);
> 
> No, this hunk hasn't caused the regression, but another one did:
> 
> diff --git a/arch/arm/include/uapi/asm/siginfo.h 
> b/arch/arm/include/uapi/asm/siginfo.h
> new file mode 100644
> index 000..d051388
> --- /dev/null
> +++ b/arch/arm/include/uapi/asm/siginfo.h
> @@ -0,0 +1,13 @@
> +#ifndef __ASM_SIGINFO_H
> +#define __ASM_SIGINFO_H
> +
> +#include 
> +
> +/*
> + * SIGFPE si_codes
> + */
> +#ifdef __KERNEL__
> +#define FPE_FIXME  0   /* Broken dup of SI_USER */
> +#endif /* __KERNEL__ */
> +
> +#endif
> 
> This is due to FPE_FIXME handling in kernel/signal.c

Building strace 4.22 on ARM and running the test suite reveals no
problems with the signal_receive test, tested on both 4.14 and 4.16
kernels - there's no "KERNEL BUG" reports in any of the test results.
However, stock strace 4.22 source doesn't appear to contain the
"KERNEL BUG" string anywhere, so this may be a Suse specific addition
to the test:

~/src/strace-4.22$ grep -ri 'KERNEL BUG' .
./strace.1:Arguably, every instance of such behavior is a kernel bug.)
./strace.1.in:Arguably, every instance of such behavior is a kernel bug.)
./NEWS:  * Worked around a kernel bug in tracing privileged executables.
./ChangeLog:aarch64: workaround gcc+kernel bug.
./ChangeLog:tests: workaround kernel bugs in seccomp-strict.test and 
prctl-seccomp-strict.test
./ChangeLog:instead.  We don't want the testsuite failing due to kernel 
bugs.
./ChangeLog:First guess is that it's a workaround for old kernel bugs:
./ChangeLog:This kernel bug is fixed long ago. This change removes the 
workaround.

Any ideas where the "KERNEL BUG" in Suse builds is coming from?  Any
ideas how to test it on other architectures (iow, where can we get
source that contains this test?)

Based on previous experience, unfortunately folk don't tend to report
user ABI regressions to kernel developers, so we'd probably never know
that there's a problem - I do think the safer thing would've been to
leave it well alone, and just accept that we'll end up copying more
words to userspace than is actually intended.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up


Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid

2018-04-12 Thread Dmitry V. Levin
On Thu, Apr 12, 2018 at 10:58:11AM +0100, Russell King - ARM Linux wrote:
> On Thu, Apr 12, 2018 at 04:34:35AM +0300, Dmitry V. Levin wrote:
> > A similar commit v4.16-rc1~159^2~37
> > ("signal/arm: Document conflicts with SI_USER and SIGFPE") must have
> > introduced a similar ABI regression to compat arm.
> 
> So, could you explain how can this change cause a regression?
> 
> +#define FPE_FIXME  0
> -   vfp_raise_sigfpe(0, regs);
> +   vfp_raise_sigfpe(FPE_FIXME, regs);

No, this hunk hasn't caused the regression, but another one did:

diff --git a/arch/arm/include/uapi/asm/siginfo.h 
b/arch/arm/include/uapi/asm/siginfo.h
new file mode 100644
index 000..d051388
--- /dev/null
+++ b/arch/arm/include/uapi/asm/siginfo.h
@@ -0,0 +1,13 @@
+#ifndef __ASM_SIGINFO_H
+#define __ASM_SIGINFO_H
+
+#include 
+
+/*
+ * SIGFPE si_codes
+ */
+#ifdef __KERNEL__
+#define FPE_FIXME  0   /* Broken dup of SI_USER */
+#endif /* __KERNEL__ */
+
+#endif

This is due to FPE_FIXME handling in kernel/signal.c


-- 
ldv


signature.asc
Description: PGP signature


Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid

2018-04-12 Thread Russell King - ARM Linux
On Thu, Apr 12, 2018 at 04:34:35AM +0300, Dmitry V. Levin wrote:
> A similar commit v4.16-rc1~159^2~37
> ("signal/arm: Document conflicts with SI_USER and SIGFPE") must have
> introduced a similar ABI regression to compat arm.

So, could you explain how can this change cause a regression?

+#define FPE_FIXME  0
-   vfp_raise_sigfpe(0, regs);
+   vfp_raise_sigfpe(FPE_FIXME, regs);

I think you're talking garbage here - look at the damned change.
It subsitutes a definition for a constant, and vfp_raise_sigfpe()
ends up receiving exactly the same value bother before and after
the change.

The change is rather incomplete though because it should have
also changed:

int si_code = 0;

as well.

So, the commit log is accurate in this case: it _is_ about
documenting the conflicting cases between SI_USER and SIGFPE and
that bit of the change has no ABI effect.

What does slightly annoy me is the creation of uapi/asm/siginfo.h
to contain a definition that _isn't_ to be exposed as part of the
UAPI.  If it's not part of the UAPI, it doesn't belong in a UAPI
header, period.  In any case, I don't think that is exposed to
userspace.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up


Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid

2018-04-11 Thread Linus Torvalds
On Wed, Apr 11, 2018 at 6:34 PM, Dmitry V. Levin  wrote:
>
> There is a clear pattern of sneaking in ABI changes using innocently
> looking commit messages.

Yes, this siginfo stuff has been a mess.

Eric - this needs to stop. Or we need to revert all that garbage entirely.

Send a fix. And stop changing the siginfo layout or field values.

 Linus


Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid

2018-04-11 Thread Dmitry V. Levin
Hi,

On Mon, Apr 09, 2018 at 06:22:53PM +0300, Dmitry V. Levin wrote:
> There seems to be a regression in v4.16 on ppc compat very similar
> to sparc compat regression reported earlier at
> https://marc.info/?l=linux-sparc=151501500704383 .
> 
> The symptoms are exactly the same: the same signal_receive test from
> the strace test suite fails with the same diagnostics:
> https://build.opensuse.org/public/build/home:ldv_alt/openSUSE_Factory_PowerPC/ppc/strace/_log

The log is big, just look for "KERNEL BUG".

> Unfortunately, I do not have any means to investigate further,
> so just passing this information on to those who care.

OK, the faulty commit is v4.16-rc1~159^2~39
("signal/powerpc: Document conflicts with SI_USER and SIGFPE and SIGTRAP").

One might think that a commit called "Document conflicts" shouldn't
introduce an ABI regression, but this one definitely does by defining
FPE_FIXME and TRAP_FIXME in arch/powerpc/include/uapi/asm/siginfo.h
that affect siginfo_layout().

A similar commit v4.16-rc1~159^2~37
("signal/arm: Document conflicts with SI_USER and SIGFPE") must have
introduced a similar ABI regression to compat arm.

An earlier commit v4.14-rc1~60^2^2~5
("signal/sparc: Document a conflict with SI_USER with SIGFPE") introduced
a similar ABI regression to compat sparc.

There is a clear pattern of sneaking in ABI changes using innocently
looking commit messages.


-- 
ldv


signature.asc
Description: PGP signature