Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid
On Fri, Apr 13, 2018 at 07:50:17PM +0100, Russell King - ARM Linux wrote: > On Fri, Apr 13, 2018 at 07:35:38PM +0100, Dave Martin wrote: > > If that's the case though, I don't see how a userspace testsuite is > > hitting this code path. Maybe I've misunderstood the context of this > > thread. > > It isn't hitting this exact case. > > The userspace testsuite is hitting an entirely different case: > > kill(getpid(), SIGFPE); > > As one expects, this generates a SIGFPE to the current process, which > then inspects the siginfo structure. Being a userspace generated > signal, si_code is set to SI_USER, which has the value 0. > > With FPE_FIXME defined to zero, as Eric has done: > > enum siginfo_layout siginfo_layout(int sig, int si_code) > { > enum siginfo_layout layout = SIL_KILL; > if ((si_code > SI_USER) && (si_code < SI_KERNEL)) { > ... > } else { > ... > #ifdef FPE_FIXME > if ((sig == SIGFPE) && (si_code == FPE_FIXME)) > layout = SIL_FAULT; > #endif > } > return layout; > } > > This causes siginfo_layout() to return SIL_FAULT for this userspace > generated signal, rather than the correct SIL_KILL. > > This affects which fields we copy to userspace. > > SI_USER is defined to pass si_pid and si_uid to the userspace process, > which on ARM are the first two consecutive 32-bit quantities in the > union, which is done when siginfo_layout() returns SIL_KILL. However, > when SIL_FAULT is returned, we only copy si_addr in the union, which > on ARM is just one 32-bit quantity. > > Consequently, userspace sees a correct value for si_pid, and si_uid > remains set to whatever was there in userspace. In the case of the > strace program, that's zero. This means if you run the strace > testsuite as root, the problem doesn't appear, but if you run it as > a non-root user, it will. > > So, the testsuite case has little to do with the behaviour of the VFP > handling - it's to do with the behaviour of the kernel's signal handling. Oh, right. So, going back to the unhandled VFP bounce question, is it reasonable for that to be a SIGKILL? That avoids the question of what si_code userspace should see, because userspace doesn't get to see siginfo at all in that case: it's dead. Or do we hit this in real situations that we want userspace to bail out of more gracefully? Cheers ---Dave
Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid
On Fri, Apr 13, 2018 at 11:23:36AM -0700, Linus Torvalds wrote: > On Fri, Apr 13, 2018 at 10:54 AM, Russell King - ARM Linux >wrote: > > > > FPE_FLTINV means "floating point invalid operation". Does it really > > cover the case where hardware has failed, or is it intended to cover > > the case where userspace did something wrong and asked for an invalid > > operation from the FP hardware? > > Note that the number of people who actually look at the si_code is > approximately zero. > > But the ones that _do_ check the si_code are certainly not going to > check it against a new code that they don't know about. > > I suspect that if you start searching for FLT_xyz occurrences in code, > approximately 100% of them are from the kernel code that generates > them, not from any actual users. > > So I'd be very surprised if you can find *anybody* who cares about > that exact value (with the possible exceptions of test-suites). > > Sadly, google code-search is no more. It was useful for things like that. I've found https://codesearch.debian.net/ useful for digging into this kind of question, though it tends to throw up a lot of false positives. Most uses I've seen do nothing more than use the FPE_xyz value to format diagnostic messages while dying. I struggled to find code that made a meaningful functional decision based on the value, though that's not proof... Cheers ---Dave
Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid
On Fri, Apr 13, 2018 at 06:54:08PM +0100, Russell King - ARM Linux wrote: > On Fri, Apr 13, 2018 at 06:08:28PM +0100, Dave Martin wrote: > > On Fri, Apr 13, 2018 at 09:33:17AM -0700, Linus Torvalds wrote: > > > On Fri, Apr 13, 2018 at 2:42 AM, Russell King - ARM Linux > > >wrote: > > > > > > > > Yes, it does solve the problem at hand with strace - the exact patch I > > > > tested against 4.16 is below. > > > > > > Ok, good. > > > > > > > However, FPE_FLTUNK is not defined in older kernels, so while we can > > > > fix it this way for the current merge window, that doesn't help 4.16. > > > > > > I wonder if we should even bother with FPE_FLTUNK. > > > > > > I suspect we might as well use FPE_FLTINV, I suspect, and not have > > > this complexity at all. That case is not worth worrying about, since > > > it's a "this shouldn't happen anyway" and the *real* reason will be in > > > the kernel logs due to vfs_panic(). > > > > > > So it's not like this is something that the user should ever care > > > about the si_code about. > > > > Ack, my intended meaning for FPE_FLTUNK is that the fp exception is > > either spurious or we can't tell easily (or possibly at all) which > > FPE_XXX should be returned. It's up to userspace to figure it out > > if it really cares. Previously we were accidentally returning SI_USER > > in si_code for arm64. > > > > This case on arm looks like a more serious error for which FPE_FLTINV > > may be more appropriate anyway. > > No. The cases where we get to this point are: > > 1. A trap concerning a coprocessor register transfer instruction (iow, move >between a VFP register and ARM register.) > 2. A trap concerning a coprocessor register load or save instruction. > > (In both of these, "concerning" means that the VFP hardware provides > such an instruction as the reason for the fault, *not* that it is the > faulting instruction.) > > 3. A combination of the exception bits (EX and DEX) on certain VFP >implementations. > > All of these can be summarised as "the hardware went wrong in some way" > rather than "the user program did something wrong." Although my understanding of VFP bounces is a bit hazy, I think this is broadly in line with my assumptions. > FPE_FLTINV means "floating point invalid operation". Does it really > cover the case where hardware has failed, or is it intended to cover > the case where userspace did something wrong and asked for an invalid > operation from the FP hardware? So, there's an argument that FPE_FLTINV is not really correct. My rationale was that there is nothing correct that we can return, and FPE_FLTINV may be no worse than the alternatives. If we can only hit this case as the result of a hardware failure or kernel bug though, should this be delivered as SIGKILL instead? That's the approach I eventually followed for various exceptions on arm64 that were theoretically delivered to userspace with si_code==0, but really should be impossible unless and kernel and/or hardware is buggy. If that's the case though, I don't see how a userspace testsuite is hitting this code path. Maybe I've misunderstood the context of this thread. Cheers ---Dave
Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid
On Fri, Apr 13, 2018 at 09:33:17AM -0700, Linus Torvalds wrote: > On Fri, Apr 13, 2018 at 2:42 AM, Russell King - ARM Linux >wrote: > > > > Yes, it does solve the problem at hand with strace - the exact patch I > > tested against 4.16 is below. > > Ok, good. > > > However, FPE_FLTUNK is not defined in older kernels, so while we can > > fix it this way for the current merge window, that doesn't help 4.16. > > I wonder if we should even bother with FPE_FLTUNK. > > I suspect we might as well use FPE_FLTINV, I suspect, and not have > this complexity at all. That case is not worth worrying about, since > it's a "this shouldn't happen anyway" and the *real* reason will be in > the kernel logs due to vfs_panic(). > > So it's not like this is something that the user should ever care > about the si_code about. Ack, my intended meaning for FPE_FLTUNK is that the fp exception is either spurious or we can't tell easily (or possibly at all) which FPE_XXX should be returned. It's up to userspace to figure it out if it really cares. Previously we were accidentally returning SI_USER in si_code for arm64. This case on arm looks like a more serious error for which FPE_FLTINV may be more appropriate anyway. [...] Cheers ---Dave
Re: [PATCH 00/32] docs/vm: convert to ReST format
On Fri, Apr 13, 2018 at 01:55:51PM -0600, Jonathan Corbet wrote: > > I believe that keeping the mm docs together will give better visibility of > > what (little) mm documentation we have and will make the updates easier. > > The documents that fit well into a certain topic could be linked there. For > > instance: > > ...but this sounds like just the opposite...? > > I've had this conversation with folks in a number of subsystems. > Everybody wants to keep their documentation together in one place - it's > easier for the developers after all. But for the readers I think it's > objectively worse. It perpetuates the mess that Documentation/ is, and > forces readers to go digging through all kinds of inappropriate material > in the hope of finding something that tells them what they need to know. > > So I would *really* like to split the documentation by audience, as has > been done for a number of other kernel subsystems (and eventually all, I > hope). > > I can go ahead and apply the RST conversion, that seems like a step in > the right direction regardless. But I sure hope we don't really have to > keep it as an unorganized jumble of stuff... I've started on Documentation/core-api/memory.rst which covers just memory allocation. So far it has the Overview and GFP flags sections written and an outline for 'The slab allocator', 'The page allocator', 'The vmalloc allocator' and 'The page_frag allocator'. And typing this up, I realise we need a 'The percpu allocator'. I'm thinking that this is *not* the right document for the DMA memory allocators (although it should link to that documentation). I suspect the existing Documentation/vm/ should probably stay as an unorganised jumble of stuff. Developers mostly talking to other MM developers. Stuff that people outside the MM fraternity should know about needs to be centrally documented. By all means convert it to ReST ... I don't much care, and it may make it easier to steal bits or link to it from the organised documentation.
Re: [PATCH 00/32] docs/vm: convert to ReST format
Sorry for the silence, I'm pedaling as fast as I can, honest... On Sun, 1 Apr 2018 09:38:58 +0300 Mike Rapoportwrote: > My thinking was to start with mechanical RST conversion and then to start > working on the contents and ordering of the documentation. Some of the > existing files, e.g. ksm.txt, can be moved as is into the appropriate > places, others, like transhuge.txt should be at least split into admin/user > and developer guides. > > Another problem with many of the existing mm docs is that they are rather > developer notes and it wouldn't be really straight forward to assign them > to a particular topic. All this sounds good. > I believe that keeping the mm docs together will give better visibility of > what (little) mm documentation we have and will make the updates easier. > The documents that fit well into a certain topic could be linked there. For > instance: ...but this sounds like just the opposite...? I've had this conversation with folks in a number of subsystems. Everybody wants to keep their documentation together in one place - it's easier for the developers after all. But for the readers I think it's objectively worse. It perpetuates the mess that Documentation/ is, and forces readers to go digging through all kinds of inappropriate material in the hope of finding something that tells them what they need to know. So I would *really* like to split the documentation by audience, as has been done for a number of other kernel subsystems (and eventually all, I hope). I can go ahead and apply the RST conversion, that seems like a step in the right direction regardless. But I sure hope we don't really have to keep it as an unorganized jumble of stuff... Thanks, jon
Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid
On Fri, Apr 13, 2018 at 11:45 AM, Dave Martinwrote: > > Most uses I've seen do nothing more than use the FPE_xyz value to > format diagnostic messages while dying. I struggled to find code that > made a meaningful functional decision based on the value, though that's > not proof... Yeah. I've seen code that cares about SIGFPE deeply, but it's almost invariably about some emulated environment (eg Java VM, or CPU emulation). And the siginfo data is basically never good enough for those environments anyway on its own, so they will go and look at the actual instruction that caused the fault and the register state instead, because they need *all* the information. The cases that use si_code are the ones that just trapped signals in order to give a more helpful abort message. So I could certainly imagine that si_code is actually used by somebody who then decides to actuall act differently on it, but aside from perhaps printing out a different message, it sounds far-fetched. Linus
Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid
On Fri, Apr 13, 2018 at 07:35:38PM +0100, Dave Martin wrote: > If that's the case though, I don't see how a userspace testsuite is > hitting this code path. Maybe I've misunderstood the context of this > thread. It isn't hitting this exact case. The userspace testsuite is hitting an entirely different case: kill(getpid(), SIGFPE); As one expects, this generates a SIGFPE to the current process, which then inspects the siginfo structure. Being a userspace generated signal, si_code is set to SI_USER, which has the value 0. With FPE_FIXME defined to zero, as Eric has done: enum siginfo_layout siginfo_layout(int sig, int si_code) { enum siginfo_layout layout = SIL_KILL; if ((si_code > SI_USER) && (si_code < SI_KERNEL)) { ... } else { ... #ifdef FPE_FIXME if ((sig == SIGFPE) && (si_code == FPE_FIXME)) layout = SIL_FAULT; #endif } return layout; } This causes siginfo_layout() to return SIL_FAULT for this userspace generated signal, rather than the correct SIL_KILL. This affects which fields we copy to userspace. SI_USER is defined to pass si_pid and si_uid to the userspace process, which on ARM are the first two consecutive 32-bit quantities in the union, which is done when siginfo_layout() returns SIL_KILL. However, when SIL_FAULT is returned, we only copy si_addr in the union, which on ARM is just one 32-bit quantity. Consequently, userspace sees a correct value for si_pid, and si_uid remains set to whatever was there in userspace. In the case of the strace program, that's zero. This means if you run the strace testsuite as root, the problem doesn't appear, but if you run it as a non-root user, it will. So, the testsuite case has little to do with the behaviour of the VFP handling - it's to do with the behaviour of the kernel's signal handling. -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up According to speedtest.net: 8.21Mbps down 510kbps up
[PATCH v2] powerpc/sparse: fix plain integer as NULL pointer warning
Trivial fix to remove the following sparse warnings: arch/powerpc/kernel/module_32.c:112:74: warning: Using plain integer as NULL pointer arch/powerpc/kernel/module_32.c:117:74: warning: Using plain integer as NULL pointer drivers/macintosh/via-pmu.c:1155:28: warning: Using plain integer as NULL pointer drivers/macintosh/via-pmu.c:1230:20: warning: Using plain integer as NULL pointer drivers/macintosh/via-pmu.c:1385:36: warning: Using plain integer as NULL pointer drivers/macintosh/via-pmu.c:1752:23: warning: Using plain integer as NULL pointer drivers/macintosh/via-pmu.c:2084:19: warning: Using plain integer as NULL pointer drivers/macintosh/via-pmu.c:2110:32: warning: Using plain integer as NULL pointer drivers/macintosh/via-pmu.c:2167:19: warning: Using plain integer as NULL pointer drivers/macintosh/via-pmu.c:2183:19: warning: Using plain integer as NULL pointer drivers/macintosh/via-pmu.c:277:20: warning: Using plain integer as NULL pointer arch/powerpc/platforms/powermac/setup.c:155:67: warning: Using plain integer as NULL pointer arch/powerpc/platforms/powermac/setup.c:247:27: warning: Using plain integer as NULL pointer arch/powerpc/platforms/powermac/setup.c:249:27: warning: Using plain integer as NULL pointer arch/powerpc/platforms/powermac/setup.c:252:37: warning: Using plain integer as NULL pointer arch/powerpc/mm/tlb_hash32.c:127:21: warning: Using plain integer as NULL pointer arch/powerpc/mm/tlb_hash32.c:148:21: warning: Using plain integer as NULL pointer arch/powerpc/mm/tlb_hash32.c:44:21: warning: Using plain integer as NULL pointer arch/powerpc/mm/tlb_hash32.c:57:21: warning: Using plain integer as NULL pointer arch/powerpc/mm/tlb_hash32.c:87:21: warning: Using plain integer as NULL pointer arch/powerpc/kernel/btext.c:160:31: warning: Using plain integer as NULL pointer arch/powerpc/kernel/btext.c:167:22: warning: Using plain integer as NULL pointer arch/powerpc/kernel/btext.c:274:21: warning: Using plain integer as NULL pointer arch/powerpc/kernel/btext.c:285:31: warning: Using plain integer as NULL pointer arch/powerpc/include/asm/hugetlb.h:204:16: warning: Using plain integer as NULL pointer arch/powerpc/mm/ppc_mmu_32.c:170:21: warning: Using plain integer as NULL pointer arch/powerpc/platforms/powermac/pci.c:1227:23: warning: Using plain integer as NULL pointer arch/powerpc/platforms/powermac/pci.c:65:24: warning: Using plain integer as NULL pointer Also use `--fix` command line option from `script/checkpatch --strict` to remove the following: CHECK: Comparison to NULL could be written "!dispDeviceBase" #72: FILE: arch/powerpc/kernel/btext.c:160: + if (dispDeviceBase == NULL) CHECK: Comparison to NULL could be written "!vbase" #80: FILE: arch/powerpc/kernel/btext.c:167: + if (vbase == NULL) CHECK: Comparison to NULL could be written "!base" #89: FILE: arch/powerpc/kernel/btext.c:274: + if (base == NULL) CHECK: Comparison to NULL could be written "!dispDeviceBase" #98: FILE: arch/powerpc/kernel/btext.c:285: + if (dispDeviceBase == NULL) CHECK: Comparison to NULL could be written "strstr" #117: FILE: arch/powerpc/kernel/module_32.c:117: + if (strstr(secstrings + sechdrs[i].sh_name, ".debug") != NULL) CHECK: Comparison to NULL could be written "!Hash" #130: FILE: arch/powerpc/mm/ppc_mmu_32.c:170: + if (Hash == NULL) CHECK: Comparison to NULL could be written "Hash" #143: FILE: arch/powerpc/mm/tlb_hash32.c:44: + if (Hash != NULL) { CHECK: Comparison to NULL could be written "!Hash" #152: FILE: arch/powerpc/mm/tlb_hash32.c:57: + if (Hash == NULL) { CHECK: Comparison to NULL could be written "!Hash" #161: FILE: arch/powerpc/mm/tlb_hash32.c:87: + if (Hash == NULL) { CHECK: Comparison to NULL could be written "!Hash" #170: FILE: arch/powerpc/mm/tlb_hash32.c:127: + if (Hash == NULL) { CHECK: Comparison to NULL could be written "!Hash" #179: FILE: arch/powerpc/mm/tlb_hash32.c:148: + if (Hash == NULL) { ERROR: space required after that ';' (ctx:VxV) #192: FILE: arch/powerpc/platforms/powermac/pci.c:65: + for (; node != NULL;node = node->sibling) { CHECK: Comparison to NULL could be written "node" #192: FILE: arch/powerpc/platforms/powermac/pci.c:65: + for (; node != NULL;node = node->sibling) { CHECK: Comparison to NULL could be written "!region" #201: FILE: arch/powerpc/platforms/powermac/pci.c:1227: + if (region == NULL) CHECK: Comparison to NULL could be written "of_get_property" #214: FILE: arch/powerpc/platforms/powermac/setup.c:155: + if (of_get_property(np, "cache-unified", NULL) != NULL && dc) { CHECK: Comparison to NULL could be written "!np" #223: FILE: arch/powerpc/platforms/powermac/setup.c:247: + if (np == NULL) CHECK: Comparison to NULL could be written "np" #226: FILE:
Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid
On Fri, Apr 13, 2018 at 10:54 AM, Russell King - ARM Linuxwrote: > > FPE_FLTINV means "floating point invalid operation". Does it really > cover the case where hardware has failed, or is it intended to cover > the case where userspace did something wrong and asked for an invalid > operation from the FP hardware? Note that the number of people who actually look at the si_code is approximately zero. But the ones that _do_ check the si_code are certainly not going to check it against a new code that they don't know about. I suspect that if you start searching for FLT_xyz occurrences in code, approximately 100% of them are from the kernel code that generates them, not from any actual users. So I'd be very surprised if you can find *anybody* who cares about that exact value (with the possible exceptions of test-suites). Sadly, google code-search is no more. It was useful for things like that. Linus
Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid
On Fri, Apr 13, 2018 at 06:08:28PM +0100, Dave Martin wrote: > On Fri, Apr 13, 2018 at 09:33:17AM -0700, Linus Torvalds wrote: > > On Fri, Apr 13, 2018 at 2:42 AM, Russell King - ARM Linux > >wrote: > > > > > > Yes, it does solve the problem at hand with strace - the exact patch I > > > tested against 4.16 is below. > > > > Ok, good. > > > > > However, FPE_FLTUNK is not defined in older kernels, so while we can > > > fix it this way for the current merge window, that doesn't help 4.16. > > > > I wonder if we should even bother with FPE_FLTUNK. > > > > I suspect we might as well use FPE_FLTINV, I suspect, and not have > > this complexity at all. That case is not worth worrying about, since > > it's a "this shouldn't happen anyway" and the *real* reason will be in > > the kernel logs due to vfs_panic(). > > > > So it's not like this is something that the user should ever care > > about the si_code about. > > Ack, my intended meaning for FPE_FLTUNK is that the fp exception is > either spurious or we can't tell easily (or possibly at all) which > FPE_XXX should be returned. It's up to userspace to figure it out > if it really cares. Previously we were accidentally returning SI_USER > in si_code for arm64. > > This case on arm looks like a more serious error for which FPE_FLTINV > may be more appropriate anyway. No. The cases where we get to this point are: 1. A trap concerning a coprocessor register transfer instruction (iow, move between a VFP register and ARM register.) 2. A trap concerning a coprocessor register load or save instruction. (In both of these, "concerning" means that the VFP hardware provides such an instruction as the reason for the fault, *not* that it is the faulting instruction.) 3. A combination of the exception bits (EX and DEX) on certain VFP implementations. All of these can be summarised as "the hardware went wrong in some way" rather than "the user program did something wrong." FPE_FLTINV means "floating point invalid operation". Does it really cover the case where hardware has failed, or is it intended to cover the case where userspace did something wrong and asked for an invalid operation from the FP hardware? -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up According to speedtest.net: 8.21Mbps down 510kbps up
Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid
On Fri, Apr 13, 2018 at 2:42 AM, Russell King - ARM Linuxwrote: > > Yes, it does solve the problem at hand with strace - the exact patch I > tested against 4.16 is below. Ok, good. > However, FPE_FLTUNK is not defined in older kernels, so while we can > fix it this way for the current merge window, that doesn't help 4.16. I wonder if we should even bother with FPE_FLTUNK. I suspect we might as well use FPE_FLTINV, I suspect, and not have this complexity at all. That case is not worth worrying about, since it's a "this shouldn't happen anyway" and the *real* reason will be in the kernel logs due to vfs_panic(). So it's not like this is something that the user should ever care about the si_code about. > Given that the path we're talking about is unlikely to happen (as > mentioned in my second paragraph) I still think reverting Eric's patch > is the right way forward for older kernels. I'd much rather get rid of that FPE_FIXME, and leave that whole mess behind. So the attached patch seems simple and should work with 4.16 too. Let's not leave this as some kind of nasty maintenance issue, and just go for simple and stupid. Hmm? Linus arch/arm/include/uapi/asm/siginfo.h | 13 - arch/arm/vfp/vfpmodule.c| 2 +- 2 files changed, 1 insertion(+), 14 deletions(-) diff --git a/arch/arm/include/uapi/asm/siginfo.h b/arch/arm/include/uapi/asm/siginfo.h deleted file mode 100644 index d0513880be21.. --- a/arch/arm/include/uapi/asm/siginfo.h +++ /dev/null @@ -1,13 +0,0 @@ -#ifndef __ASM_SIGINFO_H -#define __ASM_SIGINFO_H - -#include - -/* - * SIGFPE si_codes - */ -#ifdef __KERNEL__ -#define FPE_FIXME 0 /* Broken dup of SI_USER */ -#endif /* __KERNEL__ */ - -#endif diff --git a/arch/arm/vfp/vfpmodule.c b/arch/arm/vfp/vfpmodule.c index 4c375e11ae95..af4ee2cef2f9 100644 --- a/arch/arm/vfp/vfpmodule.c +++ b/arch/arm/vfp/vfpmodule.c @@ -257,7 +257,7 @@ static void vfp_raise_exceptions(u32 exceptions, u32 inst, u32 fpscr, struct pt_ if (exceptions == VFP_EXCEPTION_ERROR) { vfp_panic("unhandled bounce", inst); - vfp_raise_sigfpe(FPE_FIXME, regs); + vfp_raise_sigfpe(FPE_FLTINV, regs); return; }
Re: [PATCH v9 00/24] Speculative page faults
On 14/03/2018 14:11, Michal Hocko wrote: > On Tue 13-03-18 18:59:30, Laurent Dufour wrote: >> Changes since v8: >> - Don't check PMD when locking the pte when THP is disabled >>Thanks to Daniel Jordan for reporting this. >> - Rebase on 4.16 > > Is this really worth reposting the whole pile? I mean this is at v9, > each doing little changes. It is quite tiresome to barely get to a > bookmarked version just to find out that there are 2 new versions out. > > I am sorry to be grumpy and I can understand some frustration it doesn't > move forward that easilly but this is a _big_ change. We should start > with a real high level review rather than doing small changes here and > there and reach v20 quickly. I know this would mean v10, but there has been a bunch of reviews from David Rientjes and Jerome Glisse, and I had to make many changes to address them. So I think this is time to push a v10. If you have already started a review of this v9 series, please send me your remarks so that I can compile them in this v10 asap. Thanks, Laurent.
[PATCH v2] cxl: Set the PBCQ Tunnel BAR register when enabling capi mode
Skiboot used to set the default Tunnel BAR register value when capi mode was enabled. This approach was ok for the cxl driver, but prevented other drivers from choosing different values. Skiboot versions > 5.11 will not set the default value any longer. This patch modifies the cxl driver to set/reset the Tunnel BAR register when entering/exiting the cxl mode, with pnv_pci_set_tunnel_bar(). Signed-off-by: Philippe Bergheaud--- Changelog: v2: Restrict tunnel bar setting to power9. Do not fail cxl_configure_adapter() on tunnel bar setting error. Log an info message instead, and continue configuring capi mode. --- drivers/misc/cxl/pci.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c index 83f1d08058fc..355c789406f7 100644 --- a/drivers/misc/cxl/pci.c +++ b/drivers/misc/cxl/pci.c @@ -1742,6 +1742,10 @@ static int cxl_configure_adapter(struct cxl *adapter, struct pci_dev *dev) /* Required for devices using CAPP DMA mode, harmless for others */ pci_set_master(dev); + if (cxl_is_power9()) + if (pnv_pci_set_tunnel_bar(dev, 0x0002E000ull, 1)) + dev_info(>dev, "Tunneled operations unsupported\n"); + if ((rc = pnv_phb_to_cxl_mode(dev, adapter->native->sl_ops->capi_mode))) goto err; @@ -1768,6 +1772,8 @@ static void cxl_deconfigure_adapter(struct cxl *adapter) { struct pci_dev *pdev = to_pci_dev(adapter->dev.parent); + if (cxl_is_power9()) + pnv_pci_set_tunnel_bar(pdev, 0x0002E000ull, 0); cxl_native_release_psl_err_irq(adapter); cxl_unmap_adapter_regs(adapter); -- 2.16.2
Re: [PATCH 2/2] powernv/npu: Prevent overwriting of pnv_npu2_init_contex() callback parameters
On Fri, Apr 13, 2018 at 12:02 PM, Mark Hairgrovewrote: > > > On Wed, 11 Apr 2018, Alistair Popple wrote: > >> There is a single npu context per set of callback parameters. Callers >> should be prevented from overwriting existing callback values so instead >> return an error if different parameters are passed. >> >> Signed-off-by: Alistair Popple >> --- >> arch/powerpc/include/asm/powernv.h | 2 +- >> arch/powerpc/platforms/powernv/npu-dma.c | 16 +--- >> 2 files changed, 14 insertions(+), 4 deletions(-) >> >> diff --git a/arch/powerpc/include/asm/powernv.h >> b/arch/powerpc/include/asm/powernv.h >> index dc5f6a5d4575..362ea12a4501 100644 >> --- a/arch/powerpc/include/asm/powernv.h >> +++ b/arch/powerpc/include/asm/powernv.h >> @@ -15,7 +15,7 @@ >> extern void powernv_set_nmmu_ptcr(unsigned long ptcr); >> extern struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev, >> unsigned long flags, >> - struct npu_context *(*cb)(struct npu_context *, void >> *), >> + void (*cb)(struct npu_context *, void *), >> void *priv); >> extern void pnv_npu2_destroy_context(struct npu_context *context, >> struct pci_dev *gpdev); >> diff --git a/arch/powerpc/platforms/powernv/npu-dma.c >> b/arch/powerpc/platforms/powernv/npu-dma.c >> index cb77162f4e7a..193f43ea3fbc 100644 >> --- a/arch/powerpc/platforms/powernv/npu-dma.c >> +++ b/arch/powerpc/platforms/powernv/npu-dma.c >> @@ -407,7 +407,7 @@ struct npu_context { >> bool nmmu_flush; >> >> /* Callback to stop translation requests on a given GPU */ >> - struct npu_context *(*release_cb)(struct npu_context *, void *); >> + void (*release_cb)(struct npu_context *context, void *priv); >> >> /* >>* Private pointer passed to the above callback for usage by >> @@ -705,7 +705,7 @@ static const struct mmu_notifier_ops >> nv_nmmu_notifier_ops = { >> */ >> struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev, >> unsigned long flags, >> - struct npu_context *(*cb)(struct npu_context *, void >> *), >> + void (*cb)(struct npu_context *, void *), >> void *priv) >> { >> int rc; >> @@ -763,8 +763,18 @@ struct npu_context *pnv_npu2_init_context(struct >> pci_dev *gpdev, >>*/ >> spin_lock(_context_lock); >> npu_context = mm->context.npu_context; >> - if (npu_context) >> + if (npu_context) { >> + if (npu_context->release_cb != cb || >> + npu_context->priv != priv) { >> + spin_unlock(_context_lock); >> + opal_npu_destroy_context(nphb->opal_id, mm->context.id, >> + PCI_DEVID(gpdev->bus->number, >> + gpdev->devfn)); >> + return ERR_PTR(-EINVAL); >> + } >> + >> WARN_ON(!kref_get_unless_zero(_context->kref)); >> + } >> spin_unlock(_context_lock); >> >> if (!npu_context) { >> -- >> 2.11.0 >> >> > > Reviewed-by: Mark Hairgrove > Tested-by: Mark Hairgrove > Reviewed-by: Balbir Singh
Re: [PATCH 1/2] powernv/npu: Add lock to prevent race in concurrent context init/destroy
On Wed, Apr 11, 2018 at 4:38 PM, Alistair Popplewrote: > The pnv_npu2_init_context() and pnv_npu2_destroy_context() functions are > used to allocate/free contexts to allow address translation and shootdown > by the NPU on a particular GPU. Context initialisation is implicitly safe > as it is protected by the requirement mmap_sem be held in write mode, > however pnv_npu2_destroy_context() does not require mmap_sem to be held and > it is not safe to call with a concurrent initialisation for a different > GPU. > > It was assumed the driver would ensure destruction was not called > concurrently with initialisation. However the driver may be simplified by > allowing concurrent initialisation and destruction for different GPUs. As > npu context creation/destruction is not a performance critical path and the > critical section is not large a single spinlock is used for simplicity. > > Signed-off-by: Alistair Popple > --- > arch/powerpc/platforms/powernv/npu-dma.c | 51 > ++-- > 1 file changed, 42 insertions(+), 9 deletions(-) > > diff --git a/arch/powerpc/platforms/powernv/npu-dma.c > b/arch/powerpc/platforms/powernv/npu-dma.c > index 1cbef1f9cd37..cb77162f4e7a 100644 > --- a/arch/powerpc/platforms/powernv/npu-dma.c > +++ b/arch/powerpc/platforms/powernv/npu-dma.c > @@ -34,6 +34,12 @@ > #define npu_to_phb(x) container_of(x, struct pnv_phb, npu) > > /* > + * spinlock to protect initialisation of an npu_context for a particular > + * mm_struct. > + */ > +DEFINE_SPINLOCK(npu_context_lock); > + > +/* > * Other types of TCE cache invalidation are not functional in the > * hardware. > */ > @@ -694,7 +700,8 @@ static const struct mmu_notifier_ops nv_nmmu_notifier_ops > = { > * Returns an error if there no contexts are currently available or a > * npu_context which should be passed to pnv_npu2_handle_fault(). > * > - * mmap_sem must be held in write mode. > + * mmap_sem must be held in write mode and must not be called from interrupt > + * context. > */ > struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev, > unsigned long flags, > @@ -741,7 +748,9 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev > *gpdev, > /* > * Setup the NPU context table for a particular GPU. These need to be > * per-GPU as we need the tables to filter ATSDs when there are no > -* active contexts on a particular GPU. > +* active contexts on a particular GPU. It is safe for these to be > +* called concurrently with destroy as the OPAL call takes appropriate > +* locks and refcounts on init/destroy. > */ > rc = opal_npu_init_context(nphb->opal_id, mm->context.id, flags, > PCI_DEVID(gpdev->bus->number, gpdev->devfn)); > @@ -752,8 +761,19 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev > *gpdev, > * We store the npu pci device so we can more easily get at the > * associated npus. > */ > + spin_lock(_context_lock); > npu_context = mm->context.npu_context; > + if (npu_context) > + WARN_ON(!kref_get_unless_zero(_context->kref)); > + spin_unlock(_context_lock); > + > if (!npu_context) { > + /* > +* We can set up these fields without holding the > +* npu_context_lock as the npu_context hasn't been returned to > +* the caller meaning it can't be destroyed. Parallel > allocation > +* is protected against by mmap_sem. > +*/ > rc = -ENOMEM; > npu_context = kzalloc(sizeof(struct npu_context), GFP_KERNEL); > if (npu_context) { > @@ -772,8 +792,6 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev > *gpdev, > } > > mm->context.npu_context = npu_context; > - } else { > - WARN_ON(!kref_get_unless_zero(_context->kref)); > } > > npu_context->release_cb = cb; > @@ -811,15 +829,16 @@ static void pnv_npu2_release_context(struct kref *kref) > mm_context_remove_copro(npu_context->mm); > > npu_context->mm->context.npu_context = NULL; > - mmu_notifier_unregister(_context->mn, > - npu_context->mm); > - > - kfree(npu_context); > } > > +/* > + * Destroy a context on the given GPU. May free the npu_context if it is no > + * longer active on any GPUs. Must not be called from interrupt context. > + */ > void pnv_npu2_destroy_context(struct npu_context *npu_context, > struct pci_dev *gpdev) > { > + int removed; > struct pnv_phb *nphb; > struct npu *npu; > struct pci_dev *npdev = pnv_pci_get_npu_dev(gpdev, 0); > @@ -841,7 +860,21 @@ void pnv_npu2_destroy_context(struct npu_context > *npu_context, >
Re: [PATCH] powerpc/sparse: fix plain integer as NULL pointer warning
Hi Mathieu, Thank you for the patch! Perhaps something to improve: [auto build test WARNING on powerpc/next] [also build test WARNING on v4.16 next-20180413] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Mathieu-Malaterre/powerpc-sparse-fix-plain-integer-as-NULL-pointer-warning/20180413-144954 base: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next config: powerpc-allyesconfig (attached as .config) compiler: powerpc64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0 reproduce: wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree make.cross ARCH=powerpc All warnings (new ones prefixed by >>): drivers/macintosh/via-pmu.c: In function 'pmu_present': >> drivers/macintosh/via-pmu.c:1752:9: warning: return makes integer from >> pointer without a cast [-Wint-conversion] return via; ^~~ vim +1752 drivers/macintosh/via-pmu.c 1748 1749 int 1750 pmu_present(void) 1751 { > 1752 return via; 1753 } 1754 --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: application/gzip
Re: [PATCH] cxl: Set the PBCQ Tunnel BAR register when enabling capi mode
Le 12/04/2018 à 13:06, Philippe Bergheaud a écrit : Skiboot used to set the default Tunnel BAR register value when capi mode was enabled. This approach was ok for the cxl driver, but prevented other drivers from choosing different values. Skiboot versions > 5.11 will not set the default value any longer. This patch modifies the cxl driver to set/reset the Tunnel BAR register when entering/exiting the cxl mode, with pnv_pci_set_tunnel_bar(). Signed-off-by: Philippe Bergheaud--- drivers/misc/cxl/pci.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c index 83f1d08058fc..3beff9188446 100644 --- a/drivers/misc/cxl/pci.c +++ b/drivers/misc/cxl/pci.c @@ -1742,6 +1742,9 @@ static int cxl_configure_adapter(struct cxl *adapter, struct pci_dev *dev) /* Required for devices using CAPP DMA mode, harmless for others */ pci_set_master(dev); + if ((rc = pnv_pci_set_tunnel_bar(dev, 0x0002E000ull, 1))) + goto err; + Isn't that call going to fail on older skiboot which don't support OPAL_PCI_SET_PBCQ_TUNNEL_BAR, i.e. on p8? Fred if ((rc = pnv_phb_to_cxl_mode(dev, adapter->native->sl_ops->capi_mode))) goto err; @@ -1768,6 +1771,7 @@ static void cxl_deconfigure_adapter(struct cxl *adapter) { struct pci_dev *pdev = to_pci_dev(adapter->dev.parent); + pnv_pci_set_tunnel_bar(pdev, 0x0002E000ull, 0); cxl_native_release_psl_err_irq(adapter); cxl_unmap_adapter_regs(adapter);
Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid
On Thu, Apr 12, 2018 at 10:22:15AM -0700, Linus Torvalds wrote: > On Thu, Apr 12, 2018 at 10:20 AM, Russell King - ARM Linux >wrote: > > > > This file was created to contain FPE_FIXME, by the "signal/arm: Document > > conflicts with SI_USER and SIGFPE" commit so if we're removing it, it > > would be better to remove the whole file. > > Fair enough. I'm not going to commit that anyway since I can't test > it, but yes, if it tests ok that sounds like the right thing to do. Yes, it does solve the problem at hand with strace - the exact patch I tested against 4.16 is below. Testing this exact code path (exceptions set to VFP_EXCEPTION_ERROR) is something that can only happen if the hardware does something stupid, and I don't have a way of making it do that, so the code path can't be tested. However, FPE_FLTUNK is not defined in older kernels, so while we can fix it this way for the current merge window, that doesn't help 4.16. How we solve that depends what happens with Eric's patch (266da65e9156 ("signal: Add FPE_FLTUNK si_code for undiagnosable fp exceptions")) that introduces FPE_FLTUNK - and there's also the problem of NSIGFPE, which kernel/signal.c uses in the path that selects the siginfo layout. Given that the path we're talking about is unlikely to happen (as mentioned in my second paragraph) I still think reverting Eric's patch is the right way forward for older kernels. (Note, my previous comment about the si_code initialiser was incorrect.) diff --git a/arch/arm/include/uapi/asm/siginfo.h b/arch/arm/include/uapi/asm/siginfo.h deleted file mode 100644 index d0513880be21.. --- a/arch/arm/include/uapi/asm/siginfo.h +++ /dev/null @@ -1,13 +0,0 @@ -#ifndef __ASM_SIGINFO_H -#define __ASM_SIGINFO_H - -#include - -/* - * SIGFPE si_codes - */ -#ifdef __KERNEL__ -#define FPE_FIXME 0 /* Broken dup of SI_USER */ -#endif /* __KERNEL__ */ - -#endif diff --git a/arch/arm/vfp/vfpmodule.c b/arch/arm/vfp/vfpmodule.c index 4c375e11ae95..8a1a5e6048d2 100644 --- a/arch/arm/vfp/vfpmodule.c +++ b/arch/arm/vfp/vfpmodule.c @@ -28,6 +28,10 @@ #include #include +#ifndef FPE_FLTUNK +#define FPE_FLTUNK 14 +#endif + #include "vfpinstr.h" #include "vfp.h" @@ -257,7 +261,7 @@ static void vfp_raise_exceptions(u32 exceptions, u32 inst, u32 fpscr, struct pt_ if (exceptions == VFP_EXCEPTION_ERROR) { vfp_panic("unhandled bounce", inst); - vfp_raise_sigfpe(FPE_FIXME, regs); + vfp_raise_sigfpe(FPE_FLTUNK, regs); return; } -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up According to speedtest.net: 8.21Mbps down 510kbps up