Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid
Russell King - ARM Linux writes: > On Fri, Apr 13, 2018 at 12:53:49PM -0700, Linus Torvalds wrote: >> On Fri, Apr 13, 2018 at 11:45 AM, Dave Martin wrote: >> > >> > Most uses I've seen do nothing more than use the FPE_xyz value to >> > format diagnostic messages while dying. I struggled to find code that >> > made a meaningful functional decision based on the value, though that's >> > not proof... >> >> Yeah. I've seen code that cares about SIGFPE deeply, but it's almost >> invariably about some emulated environment (eg Java VM, or CPU >> emulation). >> >> And the siginfo data is basically never good enough for those >> environments anyway on its own, so they will go and look at the actual >> instruction that caused the fault and the register state instead, >> because they need *all* the information. >> >> The cases that use si_code are the ones that just trapped signals in >> order to give a more helpful abort message. >> >> So I could certainly imagine that si_code is actually used by somebody >> who then decides to actuall act differently on it, but aside from >> perhaps printing out a different message, it sounds far-fetched. > > Okay, in that case let's just use FPE_FLTINV. That makes the patch > easily back-portable for stable kernels. If we want to I don't think backporting 266da65e9156 ("signal: Add FPE_FLTUNK si_code for undiagnosable fp exceptions") would be at all difficult. What it is changing has been stable for quite a while. The surroundings might change and so it might require some trivial manual fixup but I don't expect any problems. Not that I want to derail the consensus but if we want to backport similar fixes for arm64 or the other architectures that wind up using FPE_FLTUNK for their fix we would need to backport 266da65e9156 anyway. Eric
Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid
On Fri, Apr 13, 2018 at 12:53:49PM -0700, Linus Torvalds wrote: > On Fri, Apr 13, 2018 at 11:45 AM, Dave Martin wrote: > > > > Most uses I've seen do nothing more than use the FPE_xyz value to > > format diagnostic messages while dying. I struggled to find code that > > made a meaningful functional decision based on the value, though that's > > not proof... > > Yeah. I've seen code that cares about SIGFPE deeply, but it's almost > invariably about some emulated environment (eg Java VM, or CPU > emulation). > > And the siginfo data is basically never good enough for those > environments anyway on its own, so they will go and look at the actual > instruction that caused the fault and the register state instead, > because they need *all* the information. > > The cases that use si_code are the ones that just trapped signals in > order to give a more helpful abort message. > > So I could certainly imagine that si_code is actually used by somebody > who then decides to actuall act differently on it, but aside from > perhaps printing out a different message, it sounds far-fetched. Okay, in that case let's just use FPE_FLTINV. That makes the patch easily back-portable for stable kernels. -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up According to speedtest.net: 8.21Mbps down 510kbps up
Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid
On Fri, Apr 13, 2018 at 07:50:17PM +0100, Russell King - ARM Linux wrote: > On Fri, Apr 13, 2018 at 07:35:38PM +0100, Dave Martin wrote: > > If that's the case though, I don't see how a userspace testsuite is > > hitting this code path. Maybe I've misunderstood the context of this > > thread. > > It isn't hitting this exact case. > > The userspace testsuite is hitting an entirely different case: > > kill(getpid(), SIGFPE); > > As one expects, this generates a SIGFPE to the current process, which > then inspects the siginfo structure. Being a userspace generated > signal, si_code is set to SI_USER, which has the value 0. > > With FPE_FIXME defined to zero, as Eric has done: > > enum siginfo_layout siginfo_layout(int sig, int si_code) > { > enum siginfo_layout layout = SIL_KILL; > if ((si_code > SI_USER) && (si_code < SI_KERNEL)) { > ... > } else { > ... > #ifdef FPE_FIXME > if ((sig == SIGFPE) && (si_code == FPE_FIXME)) > layout = SIL_FAULT; > #endif > } > return layout; > } > > This causes siginfo_layout() to return SIL_FAULT for this userspace > generated signal, rather than the correct SIL_KILL. > > This affects which fields we copy to userspace. > > SI_USER is defined to pass si_pid and si_uid to the userspace process, > which on ARM are the first two consecutive 32-bit quantities in the > union, which is done when siginfo_layout() returns SIL_KILL. However, > when SIL_FAULT is returned, we only copy si_addr in the union, which > on ARM is just one 32-bit quantity. > > Consequently, userspace sees a correct value for si_pid, and si_uid > remains set to whatever was there in userspace. In the case of the > strace program, that's zero. This means if you run the strace > testsuite as root, the problem doesn't appear, but if you run it as > a non-root user, it will. > > So, the testsuite case has little to do with the behaviour of the VFP > handling - it's to do with the behaviour of the kernel's signal handling. Oh, right. So, going back to the unhandled VFP bounce question, is it reasonable for that to be a SIGKILL? That avoids the question of what si_code userspace should see, because userspace doesn't get to see siginfo at all in that case: it's dead. Or do we hit this in real situations that we want userspace to bail out of more gracefully? Cheers ---Dave
Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid
On Fri, Apr 13, 2018 at 11:23:36AM -0700, Linus Torvalds wrote: > On Fri, Apr 13, 2018 at 10:54 AM, Russell King - ARM Linux > wrote: > > > > FPE_FLTINV means "floating point invalid operation". Does it really > > cover the case where hardware has failed, or is it intended to cover > > the case where userspace did something wrong and asked for an invalid > > operation from the FP hardware? > > Note that the number of people who actually look at the si_code is > approximately zero. > > But the ones that _do_ check the si_code are certainly not going to > check it against a new code that they don't know about. > > I suspect that if you start searching for FLT_xyz occurrences in code, > approximately 100% of them are from the kernel code that generates > them, not from any actual users. > > So I'd be very surprised if you can find *anybody* who cares about > that exact value (with the possible exceptions of test-suites). > > Sadly, google code-search is no more. It was useful for things like that. I've found https://codesearch.debian.net/ useful for digging into this kind of question, though it tends to throw up a lot of false positives. Most uses I've seen do nothing more than use the FPE_xyz value to format diagnostic messages while dying. I struggled to find code that made a meaningful functional decision based on the value, though that's not proof... Cheers ---Dave
Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid
On Fri, Apr 13, 2018 at 06:54:08PM +0100, Russell King - ARM Linux wrote: > On Fri, Apr 13, 2018 at 06:08:28PM +0100, Dave Martin wrote: > > On Fri, Apr 13, 2018 at 09:33:17AM -0700, Linus Torvalds wrote: > > > On Fri, Apr 13, 2018 at 2:42 AM, Russell King - ARM Linux > > > wrote: > > > > > > > > Yes, it does solve the problem at hand with strace - the exact patch I > > > > tested against 4.16 is below. > > > > > > Ok, good. > > > > > > > However, FPE_FLTUNK is not defined in older kernels, so while we can > > > > fix it this way for the current merge window, that doesn't help 4.16. > > > > > > I wonder if we should even bother with FPE_FLTUNK. > > > > > > I suspect we might as well use FPE_FLTINV, I suspect, and not have > > > this complexity at all. That case is not worth worrying about, since > > > it's a "this shouldn't happen anyway" and the *real* reason will be in > > > the kernel logs due to vfs_panic(). > > > > > > So it's not like this is something that the user should ever care > > > about the si_code about. > > > > Ack, my intended meaning for FPE_FLTUNK is that the fp exception is > > either spurious or we can't tell easily (or possibly at all) which > > FPE_XXX should be returned. It's up to userspace to figure it out > > if it really cares. Previously we were accidentally returning SI_USER > > in si_code for arm64. > > > > This case on arm looks like a more serious error for which FPE_FLTINV > > may be more appropriate anyway. > > No. The cases where we get to this point are: > > 1. A trap concerning a coprocessor register transfer instruction (iow, move >between a VFP register and ARM register.) > 2. A trap concerning a coprocessor register load or save instruction. > > (In both of these, "concerning" means that the VFP hardware provides > such an instruction as the reason for the fault, *not* that it is the > faulting instruction.) > > 3. A combination of the exception bits (EX and DEX) on certain VFP >implementations. > > All of these can be summarised as "the hardware went wrong in some way" > rather than "the user program did something wrong." Although my understanding of VFP bounces is a bit hazy, I think this is broadly in line with my assumptions. > FPE_FLTINV means "floating point invalid operation". Does it really > cover the case where hardware has failed, or is it intended to cover > the case where userspace did something wrong and asked for an invalid > operation from the FP hardware? So, there's an argument that FPE_FLTINV is not really correct. My rationale was that there is nothing correct that we can return, and FPE_FLTINV may be no worse than the alternatives. If we can only hit this case as the result of a hardware failure or kernel bug though, should this be delivered as SIGKILL instead? That's the approach I eventually followed for various exceptions on arm64 that were theoretically delivered to userspace with si_code==0, but really should be impossible unless and kernel and/or hardware is buggy. If that's the case though, I don't see how a userspace testsuite is hitting this code path. Maybe I've misunderstood the context of this thread. Cheers ---Dave
Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid
On Fri, Apr 13, 2018 at 09:33:17AM -0700, Linus Torvalds wrote: > On Fri, Apr 13, 2018 at 2:42 AM, Russell King - ARM Linux > wrote: > > > > Yes, it does solve the problem at hand with strace - the exact patch I > > tested against 4.16 is below. > > Ok, good. > > > However, FPE_FLTUNK is not defined in older kernels, so while we can > > fix it this way for the current merge window, that doesn't help 4.16. > > I wonder if we should even bother with FPE_FLTUNK. > > I suspect we might as well use FPE_FLTINV, I suspect, and not have > this complexity at all. That case is not worth worrying about, since > it's a "this shouldn't happen anyway" and the *real* reason will be in > the kernel logs due to vfs_panic(). > > So it's not like this is something that the user should ever care > about the si_code about. Ack, my intended meaning for FPE_FLTUNK is that the fp exception is either spurious or we can't tell easily (or possibly at all) which FPE_XXX should be returned. It's up to userspace to figure it out if it really cares. Previously we were accidentally returning SI_USER in si_code for arm64. This case on arm looks like a more serious error for which FPE_FLTINV may be more appropriate anyway. [...] Cheers ---Dave
Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid
On Fri, Apr 13, 2018 at 11:45 AM, Dave Martin wrote: > > Most uses I've seen do nothing more than use the FPE_xyz value to > format diagnostic messages while dying. I struggled to find code that > made a meaningful functional decision based on the value, though that's > not proof... Yeah. I've seen code that cares about SIGFPE deeply, but it's almost invariably about some emulated environment (eg Java VM, or CPU emulation). And the siginfo data is basically never good enough for those environments anyway on its own, so they will go and look at the actual instruction that caused the fault and the register state instead, because they need *all* the information. The cases that use si_code are the ones that just trapped signals in order to give a more helpful abort message. So I could certainly imagine that si_code is actually used by somebody who then decides to actuall act differently on it, but aside from perhaps printing out a different message, it sounds far-fetched. Linus
Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid
On Fri, Apr 13, 2018 at 07:35:38PM +0100, Dave Martin wrote: > If that's the case though, I don't see how a userspace testsuite is > hitting this code path. Maybe I've misunderstood the context of this > thread. It isn't hitting this exact case. The userspace testsuite is hitting an entirely different case: kill(getpid(), SIGFPE); As one expects, this generates a SIGFPE to the current process, which then inspects the siginfo structure. Being a userspace generated signal, si_code is set to SI_USER, which has the value 0. With FPE_FIXME defined to zero, as Eric has done: enum siginfo_layout siginfo_layout(int sig, int si_code) { enum siginfo_layout layout = SIL_KILL; if ((si_code > SI_USER) && (si_code < SI_KERNEL)) { ... } else { ... #ifdef FPE_FIXME if ((sig == SIGFPE) && (si_code == FPE_FIXME)) layout = SIL_FAULT; #endif } return layout; } This causes siginfo_layout() to return SIL_FAULT for this userspace generated signal, rather than the correct SIL_KILL. This affects which fields we copy to userspace. SI_USER is defined to pass si_pid and si_uid to the userspace process, which on ARM are the first two consecutive 32-bit quantities in the union, which is done when siginfo_layout() returns SIL_KILL. However, when SIL_FAULT is returned, we only copy si_addr in the union, which on ARM is just one 32-bit quantity. Consequently, userspace sees a correct value for si_pid, and si_uid remains set to whatever was there in userspace. In the case of the strace program, that's zero. This means if you run the strace testsuite as root, the problem doesn't appear, but if you run it as a non-root user, it will. So, the testsuite case has little to do with the behaviour of the VFP handling - it's to do with the behaviour of the kernel's signal handling. -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up According to speedtest.net: 8.21Mbps down 510kbps up
Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid
On Fri, Apr 13, 2018 at 10:54 AM, Russell King - ARM Linux wrote: > > FPE_FLTINV means "floating point invalid operation". Does it really > cover the case where hardware has failed, or is it intended to cover > the case where userspace did something wrong and asked for an invalid > operation from the FP hardware? Note that the number of people who actually look at the si_code is approximately zero. But the ones that _do_ check the si_code are certainly not going to check it against a new code that they don't know about. I suspect that if you start searching for FLT_xyz occurrences in code, approximately 100% of them are from the kernel code that generates them, not from any actual users. So I'd be very surprised if you can find *anybody* who cares about that exact value (with the possible exceptions of test-suites). Sadly, google code-search is no more. It was useful for things like that. Linus
Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid
On Fri, Apr 13, 2018 at 06:08:28PM +0100, Dave Martin wrote: > On Fri, Apr 13, 2018 at 09:33:17AM -0700, Linus Torvalds wrote: > > On Fri, Apr 13, 2018 at 2:42 AM, Russell King - ARM Linux > > wrote: > > > > > > Yes, it does solve the problem at hand with strace - the exact patch I > > > tested against 4.16 is below. > > > > Ok, good. > > > > > However, FPE_FLTUNK is not defined in older kernels, so while we can > > > fix it this way for the current merge window, that doesn't help 4.16. > > > > I wonder if we should even bother with FPE_FLTUNK. > > > > I suspect we might as well use FPE_FLTINV, I suspect, and not have > > this complexity at all. That case is not worth worrying about, since > > it's a "this shouldn't happen anyway" and the *real* reason will be in > > the kernel logs due to vfs_panic(). > > > > So it's not like this is something that the user should ever care > > about the si_code about. > > Ack, my intended meaning for FPE_FLTUNK is that the fp exception is > either spurious or we can't tell easily (or possibly at all) which > FPE_XXX should be returned. It's up to userspace to figure it out > if it really cares. Previously we were accidentally returning SI_USER > in si_code for arm64. > > This case on arm looks like a more serious error for which FPE_FLTINV > may be more appropriate anyway. No. The cases where we get to this point are: 1. A trap concerning a coprocessor register transfer instruction (iow, move between a VFP register and ARM register.) 2. A trap concerning a coprocessor register load or save instruction. (In both of these, "concerning" means that the VFP hardware provides such an instruction as the reason for the fault, *not* that it is the faulting instruction.) 3. A combination of the exception bits (EX and DEX) on certain VFP implementations. All of these can be summarised as "the hardware went wrong in some way" rather than "the user program did something wrong." FPE_FLTINV means "floating point invalid operation". Does it really cover the case where hardware has failed, or is it intended to cover the case where userspace did something wrong and asked for an invalid operation from the FP hardware? -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up According to speedtest.net: 8.21Mbps down 510kbps up
Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid
On Fri, Apr 13, 2018 at 2:42 AM, Russell King - ARM Linux wrote: > > Yes, it does solve the problem at hand with strace - the exact patch I > tested against 4.16 is below. Ok, good. > However, FPE_FLTUNK is not defined in older kernels, so while we can > fix it this way for the current merge window, that doesn't help 4.16. I wonder if we should even bother with FPE_FLTUNK. I suspect we might as well use FPE_FLTINV, I suspect, and not have this complexity at all. That case is not worth worrying about, since it's a "this shouldn't happen anyway" and the *real* reason will be in the kernel logs due to vfs_panic(). So it's not like this is something that the user should ever care about the si_code about. > Given that the path we're talking about is unlikely to happen (as > mentioned in my second paragraph) I still think reverting Eric's patch > is the right way forward for older kernels. I'd much rather get rid of that FPE_FIXME, and leave that whole mess behind. So the attached patch seems simple and should work with 4.16 too. Let's not leave this as some kind of nasty maintenance issue, and just go for simple and stupid. Hmm? Linus arch/arm/include/uapi/asm/siginfo.h | 13 - arch/arm/vfp/vfpmodule.c| 2 +- 2 files changed, 1 insertion(+), 14 deletions(-) diff --git a/arch/arm/include/uapi/asm/siginfo.h b/arch/arm/include/uapi/asm/siginfo.h deleted file mode 100644 index d0513880be21.. --- a/arch/arm/include/uapi/asm/siginfo.h +++ /dev/null @@ -1,13 +0,0 @@ -#ifndef __ASM_SIGINFO_H -#define __ASM_SIGINFO_H - -#include - -/* - * SIGFPE si_codes - */ -#ifdef __KERNEL__ -#define FPE_FIXME 0 /* Broken dup of SI_USER */ -#endif /* __KERNEL__ */ - -#endif diff --git a/arch/arm/vfp/vfpmodule.c b/arch/arm/vfp/vfpmodule.c index 4c375e11ae95..af4ee2cef2f9 100644 --- a/arch/arm/vfp/vfpmodule.c +++ b/arch/arm/vfp/vfpmodule.c @@ -257,7 +257,7 @@ static void vfp_raise_exceptions(u32 exceptions, u32 inst, u32 fpscr, struct pt_ if (exceptions == VFP_EXCEPTION_ERROR) { vfp_panic("unhandled bounce", inst); - vfp_raise_sigfpe(FPE_FIXME, regs); + vfp_raise_sigfpe(FPE_FLTINV, regs); return; }
Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid
On Thu, Apr 12, 2018 at 10:22:15AM -0700, Linus Torvalds wrote: > On Thu, Apr 12, 2018 at 10:20 AM, Russell King - ARM Linux > wrote: > > > > This file was created to contain FPE_FIXME, by the "signal/arm: Document > > conflicts with SI_USER and SIGFPE" commit so if we're removing it, it > > would be better to remove the whole file. > > Fair enough. I'm not going to commit that anyway since I can't test > it, but yes, if it tests ok that sounds like the right thing to do. Yes, it does solve the problem at hand with strace - the exact patch I tested against 4.16 is below. Testing this exact code path (exceptions set to VFP_EXCEPTION_ERROR) is something that can only happen if the hardware does something stupid, and I don't have a way of making it do that, so the code path can't be tested. However, FPE_FLTUNK is not defined in older kernels, so while we can fix it this way for the current merge window, that doesn't help 4.16. How we solve that depends what happens with Eric's patch (266da65e9156 ("signal: Add FPE_FLTUNK si_code for undiagnosable fp exceptions")) that introduces FPE_FLTUNK - and there's also the problem of NSIGFPE, which kernel/signal.c uses in the path that selects the siginfo layout. Given that the path we're talking about is unlikely to happen (as mentioned in my second paragraph) I still think reverting Eric's patch is the right way forward for older kernels. (Note, my previous comment about the si_code initialiser was incorrect.) diff --git a/arch/arm/include/uapi/asm/siginfo.h b/arch/arm/include/uapi/asm/siginfo.h deleted file mode 100644 index d0513880be21.. --- a/arch/arm/include/uapi/asm/siginfo.h +++ /dev/null @@ -1,13 +0,0 @@ -#ifndef __ASM_SIGINFO_H -#define __ASM_SIGINFO_H - -#include - -/* - * SIGFPE si_codes - */ -#ifdef __KERNEL__ -#define FPE_FIXME 0 /* Broken dup of SI_USER */ -#endif /* __KERNEL__ */ - -#endif diff --git a/arch/arm/vfp/vfpmodule.c b/arch/arm/vfp/vfpmodule.c index 4c375e11ae95..8a1a5e6048d2 100644 --- a/arch/arm/vfp/vfpmodule.c +++ b/arch/arm/vfp/vfpmodule.c @@ -28,6 +28,10 @@ #include #include +#ifndef FPE_FLTUNK +#define FPE_FLTUNK 14 +#endif + #include "vfpinstr.h" #include "vfp.h" @@ -257,7 +261,7 @@ static void vfp_raise_exceptions(u32 exceptions, u32 inst, u32 fpscr, struct pt_ if (exceptions == VFP_EXCEPTION_ERROR) { vfp_panic("unhandled bounce", inst); - vfp_raise_sigfpe(FPE_FIXME, regs); + vfp_raise_sigfpe(FPE_FLTUNK, regs); return; } -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up According to speedtest.net: 8.21Mbps down 510kbps up
Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid
On Thu, Apr 12, 2018 at 09:50:26AM -0700, Linus Torvalds wrote: > Does this attached patch perhaps fix the ARM case? > > It just uses FPE_FLTUNK as the default si_code for SIGFPE, which seems > sane enough. And then gets rid of FPE_FIXME, which should resolve the > nasty case. > > Hmm? Entirely untested, and I didn't really look at the test-case in > question since I can't really run it anyway. > > Well, I could run it all on x86-64, but it doesn't have that FPE_FIXME > case at all. > > Linus > arch/arm/include/uapi/asm/siginfo.h | 7 --- > arch/arm/vfp/vfpmodule.c| 4 ++-- > 2 files changed, 2 insertions(+), 9 deletions(-) > > diff --git a/arch/arm/include/uapi/asm/siginfo.h > b/arch/arm/include/uapi/asm/siginfo.h > index d0513880be21..d87beeedb4c4 100644 > --- a/arch/arm/include/uapi/asm/siginfo.h > +++ b/arch/arm/include/uapi/asm/siginfo.h > @@ -3,11 +3,4 @@ > > #include > > -/* > - * SIGFPE si_codes > - */ > -#ifdef __KERNEL__ > -#define FPE_FIXME0 /* Broken dup of SI_USER */ > -#endif /* __KERNEL__ */ > - > #endif Looks like the whole file should go away. > diff --git a/arch/arm/vfp/vfpmodule.c b/arch/arm/vfp/vfpmodule.c > index 4c375e11ae95..012c6e690303 100644 > --- a/arch/arm/vfp/vfpmodule.c > +++ b/arch/arm/vfp/vfpmodule.c > @@ -251,13 +251,13 @@ static void vfp_panic(char *reason, u32 inst) > */ > static void vfp_raise_exceptions(u32 exceptions, u32 inst, u32 fpscr, struct > pt_regs *regs) > { > - int si_code = 0; > + int si_code = FPE_FLTUNK; Note that this change would affect the following code at the end of vfp_raise_exceptions: if (si_code) vfp_raise_sigfpe(si_code, regs); > pr_debug("VFP: raising exceptions %08x\n", exceptions); > > if (exceptions == VFP_EXCEPTION_ERROR) { > vfp_panic("unhandled bounce", inst); > - vfp_raise_sigfpe(FPE_FIXME, regs); > + vfp_raise_sigfpe(si_code, regs); > return; > } > To be on the safe side, I'd just change it this way: diff --git a/arch/arm/vfp/vfpmodule.c b/arch/arm/vfp/vfpmodule.c index 4c375e1..66a73ba 100644 --- a/arch/arm/vfp/vfpmodule.c +++ b/arch/arm/vfp/vfpmodule.c @@ -257,7 +257,7 @@ static void vfp_raise_exceptions(u32 exceptions, u32 inst, u32 fpscr, struct pt_ if (exceptions == VFP_EXCEPTION_ERROR) { vfp_panic("unhandled bounce", inst); - vfp_raise_sigfpe(FPE_FIXME, regs); + vfp_raise_sigfpe(FPE_FLTUNK, regs); return; } -- ldv signature.asc Description: PGP signature
Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid
On Thu, Apr 12, 2018 at 10:20 AM, Russell King - ARM Linux wrote: > > This file was created to contain FPE_FIXME, by the "signal/arm: Document > conflicts with SI_USER and SIGFPE" commit so if we're removing it, it > would be better to remove the whole file. Fair enough. I'm not going to commit that anyway since I can't test it, but yes, if it tests ok that sounds like the right thing to do. Linus
Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid
On Thu, Apr 12, 2018 at 09:50:26AM -0700, Linus Torvalds wrote: > Does this attached patch perhaps fix the ARM case? > > It just uses FPE_FLTUNK as the default si_code for SIGFPE, which seems > sane enough. And then gets rid of FPE_FIXME, which should resolve the > nasty case. > > Hmm? Entirely untested, and I didn't really look at the test-case in > question since I can't really run it anyway. I'll test tomorrow and let you know. > arch/arm/include/uapi/asm/siginfo.h | 7 --- > arch/arm/vfp/vfpmodule.c| 4 ++-- > 2 files changed, 2 insertions(+), 9 deletions(-) > > diff --git a/arch/arm/include/uapi/asm/siginfo.h > b/arch/arm/include/uapi/asm/siginfo.h > index d0513880be21..d87beeedb4c4 100644 > --- a/arch/arm/include/uapi/asm/siginfo.h > +++ b/arch/arm/include/uapi/asm/siginfo.h > @@ -3,11 +3,4 @@ > > #include > > -/* > - * SIGFPE si_codes > - */ > -#ifdef __KERNEL__ > -#define FPE_FIXME0 /* Broken dup of SI_USER */ > -#endif /* __KERNEL__ */ > - > #endif This file was created to contain FPE_FIXME, by the "signal/arm: Document conflicts with SI_USER and SIGFPE" commit so if we're removing it, it would be better to remove the whole file. -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up According to speedtest.net: 8.21Mbps down 510kbps up
Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid
Does this attached patch perhaps fix the ARM case? It just uses FPE_FLTUNK as the default si_code for SIGFPE, which seems sane enough. And then gets rid of FPE_FIXME, which should resolve the nasty case. Hmm? Entirely untested, and I didn't really look at the test-case in question since I can't really run it anyway. Well, I could run it all on x86-64, but it doesn't have that FPE_FIXME case at all. Linus arch/arm/include/uapi/asm/siginfo.h | 7 --- arch/arm/vfp/vfpmodule.c| 4 ++-- 2 files changed, 2 insertions(+), 9 deletions(-) diff --git a/arch/arm/include/uapi/asm/siginfo.h b/arch/arm/include/uapi/asm/siginfo.h index d0513880be21..d87beeedb4c4 100644 --- a/arch/arm/include/uapi/asm/siginfo.h +++ b/arch/arm/include/uapi/asm/siginfo.h @@ -3,11 +3,4 @@ #include -/* - * SIGFPE si_codes - */ -#ifdef __KERNEL__ -#define FPE_FIXME 0 /* Broken dup of SI_USER */ -#endif /* __KERNEL__ */ - #endif diff --git a/arch/arm/vfp/vfpmodule.c b/arch/arm/vfp/vfpmodule.c index 4c375e11ae95..012c6e690303 100644 --- a/arch/arm/vfp/vfpmodule.c +++ b/arch/arm/vfp/vfpmodule.c @@ -251,13 +251,13 @@ static void vfp_panic(char *reason, u32 inst) */ static void vfp_raise_exceptions(u32 exceptions, u32 inst, u32 fpscr, struct pt_regs *regs) { - int si_code = 0; + int si_code = FPE_FLTUNK; pr_debug("VFP: raising exceptions %08x\n", exceptions); if (exceptions == VFP_EXCEPTION_ERROR) { vfp_panic("unhandled bounce", inst); - vfp_raise_sigfpe(FPE_FIXME, regs); + vfp_raise_sigfpe(si_code, regs); return; }
Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid
On Thu, Apr 12, 2018 at 03:49:28PM +0300, Dmitry V. Levin wrote: > The "KERNEL BUG" diagnostics I was talking about was added to strace yesterday > as a part of workaround commit, see > https://github.com/strace/strace/commit/34c7794cc16e2511eda7b1d5767c655a83b17309 > Before that change the test just failed. Ah, seeing the test case really helps to see exactly what and why it's broken. Yes, Eric's commit was definitely wrong and needs to be reverted, because it incorrectly changes what happens when kill(1) is used to deliver a SIGFPE signal to a process. Eric, please sort this out - you have a much better handle on whether there are any dependencies here that would need to be resolved from a simple revert of the offending commits, but that revert must happen because you've caused a user visible regression. The original code _was_ safe even if it wasn't correct to the specs, as we'd end up copying the si_addr field (as the si_pid copy) and a zeroed field as the si_uid copy. It was just that si_code was technically wrong, and that's something that would be even more dangerous to change now. -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up According to speedtest.net: 8.21Mbps down 510kbps up
Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid
On Thu, Apr 12, 2018 at 01:19:49PM +0100, Russell King - ARM Linux wrote: > On Thu, Apr 12, 2018 at 02:03:14PM +0300, Dmitry V. Levin wrote: > > On Thu, Apr 12, 2018 at 10:58:11AM +0100, Russell King - ARM Linux wrote: > > > On Thu, Apr 12, 2018 at 04:34:35AM +0300, Dmitry V. Levin wrote: > > > > A similar commit v4.16-rc1~159^2~37 > > > > ("signal/arm: Document conflicts with SI_USER and SIGFPE") must have > > > > introduced a similar ABI regression to compat arm. > > > > > > So, could you explain how can this change cause a regression? > > > > > > +#define FPE_FIXME 0 > > > - vfp_raise_sigfpe(0, regs); > > > + vfp_raise_sigfpe(FPE_FIXME, regs); > > > > No, this hunk hasn't caused the regression, but another one did: > > > > diff --git a/arch/arm/include/uapi/asm/siginfo.h > > b/arch/arm/include/uapi/asm/siginfo.h > > new file mode 100644 > > index 000..d051388 > > --- /dev/null > > +++ b/arch/arm/include/uapi/asm/siginfo.h > > @@ -0,0 +1,13 @@ > > +#ifndef __ASM_SIGINFO_H > > +#define __ASM_SIGINFO_H > > + > > +#include > > + > > +/* > > + * SIGFPE si_codes > > + */ > > +#ifdef __KERNEL__ > > +#define FPE_FIXME 0 /* Broken dup of SI_USER */ > > +#endif /* __KERNEL__ */ > > + > > +#endif > > > > This is due to FPE_FIXME handling in kernel/signal.c > > Building strace 4.22 on ARM and running the test suite reveals no > problems with the signal_receive test, tested on both 4.14 and 4.16 > kernels - there's no "KERNEL BUG" reports in any of the test results. https://build.opensuse.org/public/build/home:ldv_alt/openSUSE_Factory_ARM/armv7l/strace/_log - the test just fails there with [ 50s] + uname -a [ 50s] Linux armbuild01 4.16.0-1-lpae #1 SMP PREEMPT Wed Apr 4 13:35:56 UTC 2018 (e16f96d) armv7l armv7l armv7l GNU/Linux ... [ 570s] FAIL: signal_receive.gen [ 570s] SIGFPE {si_signo=SIGFPE, si_code=SI_USER, si_pid=25332, si_uid=399} --- [ 570s] +--- SIGFPE {si_signo=SIGFPE, si_code=SI_USER, si_pid=25332, si_uid=0} --- [ 570s] signal_receive.gen.test: failed test: ../../strace -a16 -e trace=kill ../signal_receive output mismatch > However, stock strace 4.22 source doesn't appear to contain the > "KERNEL BUG" string anywhere, so this may be a Suse specific addition > to the test: The "KERNEL BUG" diagnostics I was talking about was added to strace yesterday as a part of workaround commit, see https://github.com/strace/strace/commit/34c7794cc16e2511eda7b1d5767c655a83b17309 Before that change the test just failed. [...] > Any ideas where the "KERNEL BUG" in Suse builds is coming from? strace developers use OBS to test strace.git for regressions. The build environment is provided by OBS, all the rest comes from strace.git. > Any ideas how to test it on other architectures (iow, where can we get > source that contains this test?) Just use master branch of https://github.com/strace/strace or https://gitlab.com/strace/strace (they are the same). > Based on previous experience, unfortunately folk don't tend to report > user ABI regressions to kernel developers, so we'd probably never know > that there's a problem - I do think the safer thing would've been to > leave it well alone, and just accept that we'll end up copying more > words to userspace than is actually intended. Well, these changes caused visible regressions in strace test suite on arm, ppc, and sparc - this is the reason why I have reported them to kernel developers. -- ldv signature.asc Description: PGP signature
Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid
On Thu, Apr 12, 2018 at 02:03:14PM +0300, Dmitry V. Levin wrote: > On Thu, Apr 12, 2018 at 10:58:11AM +0100, Russell King - ARM Linux wrote: > > On Thu, Apr 12, 2018 at 04:34:35AM +0300, Dmitry V. Levin wrote: > > > A similar commit v4.16-rc1~159^2~37 > > > ("signal/arm: Document conflicts with SI_USER and SIGFPE") must have > > > introduced a similar ABI regression to compat arm. > > > > So, could you explain how can this change cause a regression? > > > > +#define FPE_FIXME 0 > > - vfp_raise_sigfpe(0, regs); > > + vfp_raise_sigfpe(FPE_FIXME, regs); > > No, this hunk hasn't caused the regression, but another one did: > > diff --git a/arch/arm/include/uapi/asm/siginfo.h > b/arch/arm/include/uapi/asm/siginfo.h > new file mode 100644 > index 000..d051388 > --- /dev/null > +++ b/arch/arm/include/uapi/asm/siginfo.h > @@ -0,0 +1,13 @@ > +#ifndef __ASM_SIGINFO_H > +#define __ASM_SIGINFO_H > + > +#include > + > +/* > + * SIGFPE si_codes > + */ > +#ifdef __KERNEL__ > +#define FPE_FIXME 0 /* Broken dup of SI_USER */ > +#endif /* __KERNEL__ */ > + > +#endif > > This is due to FPE_FIXME handling in kernel/signal.c Building strace 4.22 on ARM and running the test suite reveals no problems with the signal_receive test, tested on both 4.14 and 4.16 kernels - there's no "KERNEL BUG" reports in any of the test results. However, stock strace 4.22 source doesn't appear to contain the "KERNEL BUG" string anywhere, so this may be a Suse specific addition to the test: ~/src/strace-4.22$ grep -ri 'KERNEL BUG' . ./strace.1:Arguably, every instance of such behavior is a kernel bug.) ./strace.1.in:Arguably, every instance of such behavior is a kernel bug.) ./NEWS: * Worked around a kernel bug in tracing privileged executables. ./ChangeLog:aarch64: workaround gcc+kernel bug. ./ChangeLog:tests: workaround kernel bugs in seccomp-strict.test and prctl-seccomp-strict.test ./ChangeLog:instead. We don't want the testsuite failing due to kernel bugs. ./ChangeLog:First guess is that it's a workaround for old kernel bugs: ./ChangeLog:This kernel bug is fixed long ago. This change removes the workaround. Any ideas where the "KERNEL BUG" in Suse builds is coming from? Any ideas how to test it on other architectures (iow, where can we get source that contains this test?) Based on previous experience, unfortunately folk don't tend to report user ABI regressions to kernel developers, so we'd probably never know that there's a problem - I do think the safer thing would've been to leave it well alone, and just accept that we'll end up copying more words to userspace than is actually intended. -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up According to speedtest.net: 8.21Mbps down 510kbps up
Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid
On Thu, Apr 12, 2018 at 10:58:11AM +0100, Russell King - ARM Linux wrote: > On Thu, Apr 12, 2018 at 04:34:35AM +0300, Dmitry V. Levin wrote: > > A similar commit v4.16-rc1~159^2~37 > > ("signal/arm: Document conflicts with SI_USER and SIGFPE") must have > > introduced a similar ABI regression to compat arm. > > So, could you explain how can this change cause a regression? > > +#define FPE_FIXME 0 > - vfp_raise_sigfpe(0, regs); > + vfp_raise_sigfpe(FPE_FIXME, regs); No, this hunk hasn't caused the regression, but another one did: diff --git a/arch/arm/include/uapi/asm/siginfo.h b/arch/arm/include/uapi/asm/siginfo.h new file mode 100644 index 000..d051388 --- /dev/null +++ b/arch/arm/include/uapi/asm/siginfo.h @@ -0,0 +1,13 @@ +#ifndef __ASM_SIGINFO_H +#define __ASM_SIGINFO_H + +#include + +/* + * SIGFPE si_codes + */ +#ifdef __KERNEL__ +#define FPE_FIXME 0 /* Broken dup of SI_USER */ +#endif /* __KERNEL__ */ + +#endif This is due to FPE_FIXME handling in kernel/signal.c -- ldv signature.asc Description: PGP signature
Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid
On Thu, Apr 12, 2018 at 04:34:35AM +0300, Dmitry V. Levin wrote: > A similar commit v4.16-rc1~159^2~37 > ("signal/arm: Document conflicts with SI_USER and SIGFPE") must have > introduced a similar ABI regression to compat arm. So, could you explain how can this change cause a regression? +#define FPE_FIXME 0 - vfp_raise_sigfpe(0, regs); + vfp_raise_sigfpe(FPE_FIXME, regs); I think you're talking garbage here - look at the damned change. It subsitutes a definition for a constant, and vfp_raise_sigfpe() ends up receiving exactly the same value bother before and after the change. The change is rather incomplete though because it should have also changed: int si_code = 0; as well. So, the commit log is accurate in this case: it _is_ about documenting the conflicting cases between SI_USER and SIGFPE and that bit of the change has no ABI effect. What does slightly annoy me is the creation of uapi/asm/siginfo.h to contain a definition that _isn't_ to be exposed as part of the UAPI. If it's not part of the UAPI, it doesn't belong in a UAPI header, period. In any case, I don't think that is exposed to userspace. -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up According to speedtest.net: 8.21Mbps down 510kbps up
Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid
On Wed, Apr 11, 2018 at 6:34 PM, Dmitry V. Levin wrote: > > There is a clear pattern of sneaking in ABI changes using innocently > looking commit messages. Yes, this siginfo stuff has been a mess. Eric - this needs to stop. Or we need to revert all that garbage entirely. Send a fix. And stop changing the siginfo layout or field values. Linus
Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid
Hi, On Mon, Apr 09, 2018 at 06:22:53PM +0300, Dmitry V. Levin wrote: > There seems to be a regression in v4.16 on ppc compat very similar > to sparc compat regression reported earlier at > https://marc.info/?l=linux-sparc&m=151501500704383 . > > The symptoms are exactly the same: the same signal_receive test from > the strace test suite fails with the same diagnostics: > https://build.opensuse.org/public/build/home:ldv_alt/openSUSE_Factory_PowerPC/ppc/strace/_log The log is big, just look for "KERNEL BUG". > Unfortunately, I do not have any means to investigate further, > so just passing this information on to those who care. OK, the faulty commit is v4.16-rc1~159^2~39 ("signal/powerpc: Document conflicts with SI_USER and SIGFPE and SIGTRAP"). One might think that a commit called "Document conflicts" shouldn't introduce an ABI regression, but this one definitely does by defining FPE_FIXME and TRAP_FIXME in arch/powerpc/include/uapi/asm/siginfo.h that affect siginfo_layout(). A similar commit v4.16-rc1~159^2~37 ("signal/arm: Document conflicts with SI_USER and SIGFPE") must have introduced a similar ABI regression to compat arm. An earlier commit v4.14-rc1~60^2^2~5 ("signal/sparc: Document a conflict with SI_USER with SIGFPE") introduced a similar ABI regression to compat sparc. There is a clear pattern of sneaking in ABI changes using innocently looking commit messages. -- ldv signature.asc Description: PGP signature