Re: [BISECTED REGRESSION] b43legacy broken on G4 PowerBook
On Tue, Jun 11, 2019 at 05:20:12PM -0500, Larry Finger wrote:
> Your first patch did not work as the configuration does not have
> CONFIG_ZONE_DMA. As a result, the initial value of min_mask always starts
> at 32 bits and is taken down to 31 with the maximum pfn minimization. When
> I forced the initial value of min_mask to 30 bits, the device worked.

Ooops, yes.  But I think we could just enable ZONE_DMA on 32-bit powerpc.
Crude enablement hack below:

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 8c1c636308c8..1dd71a98b70c 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -372,7 +372,7 @@ config PPC_ADV_DEBUG_DAC_RANGE
 
 config ZONE_DMA
 	bool
-	default y if PPC_BOOK3E_64
+	default y
 
 config PGTABLE_LEVELS
 	int
Re: [PATCH v2 8/8] habanalabs: enable 64-bit DMA mask in POWER9
On Wed, Jun 12, 2019 at 04:35:22PM +1000, Oliver O'Halloran wrote:
> Setting a 48 bit DMA mask doesn't work today because we only allocate
> IOMMU tables to cover the 0..2GB range of PCI bus addresses.

I don't think that is true upstream, and if it is we need to fix a bug in
the powerpc code.  powerpc should be falling back to treating a 48-bit dma
mask like a 32-bit one at least, that is use dynamic iommu mappings instead
of using the direct mapping.  And from my reading of
arch/powerpc/kernel/dma-iommu.c that is exactly what it does.
Re: [PATCH v2 8/8] habanalabs: enable 64-bit DMA mask in POWER9
On Wed, Jun 12, 2019 at 3:25 AM Oded Gabbay wrote:
>
> On Tue, Jun 11, 2019 at 8:03 PM Oded Gabbay wrote:
> >
> > On Tue, Jun 11, 2019 at 6:26 PM Greg KH wrote:
> > > *snip*
> >
> > Now, when I tried to integrate Goya into a POWER9 machine, I got a
> > reject from the call to pci_set_dma_mask(pdev, 48). The standard code,
> > as I wrote above, is to call the same function with 32-bits. That
> > works BUT it is not practical, as our applications require much more
> > memory mapped than 32-bits.

Setting a 48 bit DMA mask doesn't work today because we only allocate
IOMMU tables to cover the 0..2GB range of PCI bus addresses. Alexey has
some patches to expand that range so we can support devices that can't
hit the 64 bit bypass window.

You need:

This fix: http://patchwork.ozlabs.org/patch/1113506/
This series: http://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=110810

Give that a try and see if the IOMMU overhead is tolerable.

> > In addition, once you add more cards which
> > are all mapped to the same range, it is simply not usable at all.

Each IOMMU group should have a separate bus address space and separate
cards shouldn't be in the same IOMMU group. If they are then there's
something up.

Oliver
Re: sys_exit: NR -1
Paul Clarke wrote:
> What are the circumstances in which raw_syscalls:sys_exit reports "-1"
> for the syscall ID?
>
> perf  5375 [007] 59632.478528: raw_syscalls:sys_enter: NR 1 (3, 9fb888, 8, 2d83740, 1, 7)
> perf  5375 [007] 59632.478532: raw_syscalls:sys_exit:  NR 1 = 8
> perf  5375 [007] 59632.478538: raw_syscalls:sys_enter: NR 15 (11, 7ca734b0, 7ca73380, 2d83740, 1, 7)
> perf  5375 [007] 59632.478539: raw_syscalls:sys_exit:  NR -1 = 8
> perf  5375 [007] 59632.478543: raw_syscalls:sys_enter: NR 16 (4, 2401, 0, 2d83740, 1, 0)
> perf  5375 [007] 59632.478551: raw_syscalls:sys_exit:  NR 16 = 0

Which architecture? For powerpc, see:

static inline int syscall_get_nr(struct task_struct *task, struct pt_regs *regs)
{
	/*
	 * Note that we are returning an int here. That means 0xffffffff, ie.
	 * 32-bit negative 1, will be interpreted as -1 on a 64-bit kernel.
	 * This is important for seccomp so that compat tasks can set r0 = -1
	 * to reject the syscall.
	 */
	return TRAP(regs) == 0xc00 ? regs->gpr[0] : -1;
}

- Naveen
Re: [PATCH v8 2/7] x86/dma: use IS_ENABLED() to simplify the code
On 2019/6/12 13:16, Borislav Petkov wrote:
> On Thu, May 30, 2019 at 11:48:26AM +0800, Zhen Lei wrote:
>> This patch removes the ifdefs around CONFIG_IOMMU_DEFAULT_PASSTHROUGH to
>> improve readablity.
>
> Avoid having "This patch" or "This commit" in the commit message. It is
> tautologically useless.

OK, thanks.

> Also, do
>
> $ git grep 'This patch' Documentation/process
>
> for more details.
>
>> Signed-off-by: Zhen Lei
>> ---
>>  arch/x86/kernel/pci-dma.c | 7 ++-----
>>  1 file changed, 2 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
>> index dcd272dbd0a9330..9f2b19c35a060df 100644
>> --- a/arch/x86/kernel/pci-dma.c
>> +++ b/arch/x86/kernel/pci-dma.c
>> @@ -43,11 +43,8 @@
>>   * It is also possible to disable by default in kernel config, and enable with
>>   * iommu=nopt at boot time.
>>   */
>> -#ifdef CONFIG_IOMMU_DEFAULT_PASSTHROUGH
>> -int iommu_pass_through __read_mostly = 1;
>> -#else
>> -int iommu_pass_through __read_mostly;
>> -#endif
>> +int iommu_pass_through __read_mostly =
>> +	IS_ENABLED(CONFIG_IOMMU_DEFAULT_PASSTHROUGH);
>
> Let that line stick out.

OK, I will merge them on the same line.

> Thx.
Re: [PATCH v2 8/8] habanalabs: enable 64-bit DMA mask in POWER9
On Wed, Jun 12, 2019 at 1:53 AM Benjamin Herrenschmidt wrote:
>
> On Tue, 2019-06-11 at 20:22 +0300, Oded Gabbay wrote:
> > > So, to summarize:
> > > If I call pci_set_dma_mask with 48, then it fails on POWER9. However,
> > > in runtime, I don't know if its POWER9 or not, so upon failure I will
> > > call it again with 32, which makes our device pretty much unusable.
> > > If I call pci_set_dma_mask with 64, and do the dedicated configuration
> > > in Goya's PCIe controller, then it won't work on x86-64, because bit
> > > 59 will be set and the host won't like it (I checked it). In addition,
> > > I might get addresses above 50 bits, which my device can't generate.
> > >
> > > I hope this makes things more clear. Now, please explain to me how I
> > > can call pci_set_dma_mask without any regard to whether I run on
> > > x86-64 or POWER9, considering what I wrote above ?
> > >
> > > Thanks,
> > > Oded
> >
> > Adding ppc mailing list.
>
> You can't. Your device is broken. Devices that don't support DMAing to
> the full 64-bit deserve to be added to the trash pile.

Hmm... right now they are added to customers' data-centers, but what do I know ;)

> As a result, getting it to work will require hacks. Some GPUs have
> similar issues and require similar hacks, it's unfortunate.
>
> Added a couple of guys on CC who might be able to help get those hacks
> right.

Thanks :)

> It's still very fishy .. the idea is to detect the case where setting a
> 64-bit mask will give your system memory mapped at a fixed high address
> (1 << 59 in our case) and program that in your chip in the "Fixed high
> bits" register that you seem to have (also make sure it doesn't affect
> MSIs or it will break them).

MSI-X are working. Setting bit 59 doesn't apply to MSI-X transactions
(AFAICS from the PCIe controller spec we have).

> This will only work as long as all of the system memory can be
> addressed at an offset from that fixed address that itself fits your
> device addressing capabilities (50 bits in this case). It may or may
> not be the case but there's no way to check since the DMA mask logic
> won't really apply.

Understood. In the specific system we are integrated to, that is the
case - we have less than 48 bits. But, as you pointed out, it is not a
generic solution. With my H/W I can't give a generic fit-all solution
for POWER9, so I'll settle for the best that I can do.

> You might want to consider fixing your HW in the next iteration... This
> is going to bite you when x86 increases the max physical memory for
> example, or on other architectures.

Understood and taken care of.

> Cheers,
> Ben.
Re: [PATCH v3 1/3] powerpc/powernv: Add OPAL API interface to get secureboot state
Nayna Jain writes:

> From: Claudio Carvalho
>
> The X.509 certificates trusted by the platform and other information
> required to secure boot the OS kernel are wrapped in secure variables,
> which are controlled by OPAL.
>
> This patch adds support to read OPAL secure variables through
> OPAL_SECVAR_GET call. It returns the metadata and data for a given secure
> variable based on the unique key.
>
> Since OPAL can support different types of backend which can vary in the
> variable interpretation, a new OPAL API call named OPAL_SECVAR_BACKEND, is
> added to retrieve the supported backend version. This helps the consumer
> to know how to interpret the variable.

(Firstly, apologies that I haven't got around to asking about this yet!)

Are pluggable/versioned backends a good idea? There are a few things that
worry me about the idea:

 - It adds complexity in crypto (or crypto-adjacent) code, and that
   increases the likelihood that we'll accidentally add a bug with bad
   consequences.

 - Under what circumstances would we change the kernel-visible behaviour
   of skiboot? Are we expecting to change the behaviour, content or names
   of the variables in future? Otherwise the only relevant change I can
   think of is a change to hardware platforms, and I'm not sure how a
   change in hardware would lead to a change in behaviour in the kernel.
   Wouldn't Skiboot hide h/w differences?

 - If we are worried about a long-term-future change to how secure-boot
   works, would it be better to just add more get/set calls to opal at
   the point at which we actually implement the new system?

 - UEFI added EFI_VARIABLE_AUTHENTICATION_3 in a way that - as far as I
   know - didn't break backwards compatibility. Is there a reason we
   cannot add features that way instead? (It also dropped v1 of the
   authentication header.)

 - What is the correct fallback behaviour if a kernel receives a result
   that it does not expect? If a kernel expecting BackendV1 is instead
   informed that it is running on BackendV2, then it cannot access the
   secure variable at all, so it cannot load keys that are potentially
   required to successfully boot (e.g. to validate the module for a
   network card or graphics!)

Kind regards,
Daniel

> This support can be enabled using CONFIG_OPAL_SECVAR
>
> Signed-off-by: Claudio Carvalho
> Signed-off-by: Nayna Jain
> ---
> This patch depends on a new OPAL call that is being added to skiboot.
> The patch set that implements the new call has been posted to > https://patchwork.ozlabs.org/project/skiboot/list/?series=112868 > > arch/powerpc/include/asm/opal-api.h | 4 +- > arch/powerpc/include/asm/opal-secvar.h | 23 ++ > arch/powerpc/include/asm/opal.h | 6 ++ > arch/powerpc/platforms/powernv/Kconfig | 6 ++ > arch/powerpc/platforms/powernv/Makefile | 1 + > arch/powerpc/platforms/powernv/opal-call.c | 2 + > arch/powerpc/platforms/powernv/opal-secvar.c | 85 > 7 files changed, 126 insertions(+), 1 deletion(-) > create mode 100644 arch/powerpc/include/asm/opal-secvar.h > create mode 100644 arch/powerpc/platforms/powernv/opal-secvar.c > > diff --git a/arch/powerpc/include/asm/opal-api.h > b/arch/powerpc/include/asm/opal-api.h > index e1577cfa7186..a505e669b4b6 100644 > --- a/arch/powerpc/include/asm/opal-api.h > +++ b/arch/powerpc/include/asm/opal-api.h > @@ -212,7 +212,9 @@ > #define OPAL_HANDLE_HMI2 166 > #define OPAL_NX_COPROC_INIT 167 > #define OPAL_XIVE_GET_VP_STATE 170 > -#define OPAL_LAST170 > +#define OPAL_SECVAR_GET 173 > +#define OPAL_SECVAR_BACKEND 177 > +#define OPAL_LAST177 > > #define QUIESCE_HOLD 1 /* Spin all calls at entry */ > #define QUIESCE_REJECT 2 /* Fail all calls with > OPAL_BUSY */ > diff --git a/arch/powerpc/include/asm/opal-secvar.h > b/arch/powerpc/include/asm/opal-secvar.h > new file mode 100644 > index ..b677171a0368 > --- /dev/null > +++ b/arch/powerpc/include/asm/opal-secvar.h > @@ -0,0 +1,23 @@ > +/* SPDX-License-Identifier: GPL-2.0 */ > +/* > + * PowerNV definitions for secure variables OPAL API. > + * > + * Copyright (C) 2019 IBM Corporation > + * Author: Claudio Carvalho > + * > + */ > +#ifndef OPAL_SECVAR_H > +#define OPAL_SECVAR_H > + > +enum { > + BACKEND_NONE = 0, > + BACKEND_TC_COMPAT_V1, > +}; > + > +extern int opal_get_variable(u8 *key, unsigned long ksize, > + u8 *metadata, unsigned long *mdsize, > + u8 *data, unsigned long *dsize); > + > +extern int opal_variable_version(unsigned long *backend); > + > +#endif > diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h > index 4cc37e708bc7..57d2c2356eda 100644 > --- a/arch/powerpc/include/asm/opal.h > +++ b/arch/
Re: [PATCH kernel v3 0/3] powerpc/ioda2: Yet another attempt to allow DMA masks between 32 and 59
On Wed, Jun 12, 2019 at 3:06 PM Shawn Anastasio wrote: > > On 6/5/19 11:11 PM, Shawn Anastasio wrote: > > On 5/30/19 2:03 AM, Alexey Kardashevskiy wrote: > >> This is an attempt to allow DMA masks between 32..59 which are not large > >> enough to use either a PHB3 bypass mode or a sketchy bypass. Depending > >> on the max order, up to 40 is usually available. > >> > >> > >> This is based on v5.2-rc2. > >> > >> Please comment. Thanks. > > > > I have tested this patch set with an AMD GPU that's limited to <64bit > > DMA (I believe it's 40 or 42 bit). It successfully allows the card to > > operate without falling back to 32-bit DMA mode as it does without > > the patches. > > > > Relevant kernel log message: > > ``` > > [0.311211] pci 0033:01 : [PE# 00] Enabling 64-bit DMA bypass > > ``` > > > > Tested-by: Shawn Anastasio > > After a few days of further testing, I've started to run into stability > issues with the patch applied and used with an AMD GPU. Specifically, > the system sometimes spontaneously crashes. Not just EEH errors either, > the whole system shuts down in what looks like a checkstop. Any specific workload? Checkstops are harder to debug without a system in the failed state so we'd need to replicate that locally to get a decent idea what's up. > Perhaps some subtle corruption is occurring?
Re: [PATCH v2] powerpc/perf: Use cpumask_last() to determine the designated cpu for nest/core units.
Hi Leonardo,

On 6/11/19 12:17 AM, Leonardo Bras wrote:
> On Mon, 2019-06-10 at 12:02 +0530, Anju T Sudhakar wrote:
> > Nest and core imc(In-memory Collection counters) assigns a particular
> > cpu as the designated target for counter data collection. During system
> > boot, the first online cpu in a chip gets assigned as the designated cpu
> > for that chip(for nest-imc) and the first online cpu in a core gets
> > assigned as the designated cpu for that core(for core-imc). If the
> > designated cpu goes offline, the next online cpu from the same chip(for
> > nest-imc)/core(for core-imc) is assigned as the next target, and the
> > event context is migrated to the target cpu.
> >
> > Currently, cpumask_any_but() function is used to find the target cpu.
> > Though this function is expected to return a `random` cpu, this always
> > returns the next online cpu. If all cpus in a chip/core are offlined in
> > a sequential manner, starting from the first cpu, the event migration
> > has to happen for all the cpus which go offline. Since the migration
> > process involves a grace period, the total time taken to offline all
> > the cpus will be significantly high.
>
> Seems like a very interesting work.
>
> Out of curiosity, have you used 'chcpu -d' to create your benchmark?

Here I did not use chcpu to disable the cpu. I used a script which will
offline cpus 88-175 by echoing `0` to /sys/devices/system/cpu/cpu*/online.

Regards,
Anju
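For reference, the change named in the subject line amounts to roughly the
following (a sketch with made-up structure and field names, not the actual
arch/powerpc/perf/imc-pmu.c code):

#include <linux/cpumask.h>
#include <linux/perf_event.h>

/* Hypothetical stand-in for the real driver state. */
struct imc_pmu_stub {
	struct pmu base;
	cpumask_t chip_cpumask;		/* online CPUs of this chip/core */
};

/* Sketch only: with cpumask_last() the designated CPU only changes when
 * the last CPU in the mask itself goes offline, so offlining CPUs in
 * ascending order no longer migrates the event context at every step. */
static void imc_pick_new_target(struct imc_pmu_stub *pmu, int old_cpu)
{
	unsigned int target = cpumask_last(&pmu->chip_cpumask);

	if (target < nr_cpu_ids && target != (unsigned int)old_cpu)
		perf_pmu_migrate_context(&pmu->base, old_cpu, target);
}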
Re: [PATCH v2 0/4] Additional fixes on Talitos driver
Le 11/06/2019 à 18:30, Horia Geanta a écrit :
> On 6/11/2019 6:40 PM, Christophe Leroy wrote:
> > Le 11/06/2019 à 17:37, Horia Geanta a écrit :
> > > On 6/11/2019 5:39 PM, Christophe Leroy wrote:
> > > > This series is the last set of fixes for the Talitos driver.
> > > >
> > > > We now get a fully clean boot on both SEC1 (SEC1.2 on mpc885) and
> > > > SEC2 (SEC2.2 on mpc8321E) with CONFIG_CRYPTO_MANAGER_EXTRA_TESTS:
> > >
> > > I am getting below failures on a sec 3.3.2 (p1020rdb) for hmac(sha384)
> > > and hmac(sha512):
> >
> > Is that new with this series or did you already have it before ?
>
> Looks like this happens with or without this series.

Found the issue, that's in
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=b8fbdc2bc4e71b62646031d5df5f08aafe15d5ad

CONFIG_CRYPTO_DEV_TALITOS_SEC2 should be CONFIG_CRYPTO_DEV_TALITOS2 instead.

Just sent a patch to fix it.

Thanks
Christophe

> I haven't checked the state of this driver for quite some time. Since
> I've noticed increased activity, I thought it would be worth actually
> testing the changes.
>
> Are changes in patch 2/4 ("crypto: talitos - fix hash on SEC1.") strictly
> for sec 1.x or they affect all revisions?
>
> > What do you mean by "fuzz testing" enabled ? Is that
> > CONFIG_CRYPTO_MANAGER_EXTRA_TESTS or something else ?
>
> Yes, it's this config symbol.
>
> Horia
[PATCH] crypto: talitos - fix max key size for sha384 and sha512
Below commit came with a typo in the CONFIG_ symbol, leading to a
permanently reduced max key size regardless of the driver capabilities.

Reported-by: Horia Geantă
Fixes: b8fbdc2bc4e7 ("crypto: talitos - reduce max key size for SEC1")
Signed-off-by: Christophe Leroy
---
 drivers/crypto/talitos.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c
index 03b7a5d28fb0..b4c8a013f302 100644
--- a/drivers/crypto/talitos.c
+++ b/drivers/crypto/talitos.c
@@ -832,7 +832,7 @@ static void talitos_unregister_rng(struct device *dev)
  * HMAC_SNOOP_NO_AFEA (HSNA) instead of type IPSEC_ESP
  */
 #define TALITOS_CRA_PRIORITY_AEAD_HSNA	(TALITOS_CRA_PRIORITY - 1)
-#ifdef CONFIG_CRYPTO_DEV_TALITOS_SEC2
+#ifdef CONFIG_CRYPTO_DEV_TALITOS2
 #define TALITOS_MAX_KEY_SIZE		(AES_MAX_KEY_SIZE + SHA512_BLOCK_SIZE)
 #else
 #define TALITOS_MAX_KEY_SIZE		(AES_MAX_KEY_SIZE + SHA256_BLOCK_SIZE)
--
2.13.3
Re: [PATCH v2 8/8] habanalabs: enable 64-bit DMA mask in POWER9
On Wed, Jun 12, 2019 at 8:54 AM Benjamin Herrenschmidt wrote:
>
> On Tue, 2019-06-11 at 20:22 +0300, Oded Gabbay wrote:
> > > So, to summarize:
> > > If I call pci_set_dma_mask with 48, then it fails on POWER9. However,
> > > in runtime, I don't know if its POWER9 or not, so upon failure I will
> > > call it again with 32, which makes our device pretty much unusable.
> > > If I call pci_set_dma_mask with 64, and do the dedicated configuration
> > > in Goya's PCIe controller, then it won't work on x86-64, because bit
> > > 59 will be set and the host won't like it (I checked it). In addition,
> > > I might get addresses above 50 bits, which my device can't generate.
> > >
> > > I hope this makes things more clear. Now, please explain to me how I
> > > can call pci_set_dma_mask without any regard to whether I run on
> > > x86-64 or POWER9, considering what I wrote above ?
> > >
> > > Thanks,
> > > Oded
> >
> > Adding ppc mailing list.
>
> You can't. Your device is broken. Devices that don't support DMAing to
> the full 64-bit deserve to be added to the trash pile.
>
> As a result, getting it to work will require hacks. Some GPUs have
> similar issues and require similar hacks, it's unfortunate.
>
> Added a couple of guys on CC who might be able to help get those hacks
> right.
>
> It's still very fishy .. the idea is to detect the case where setting a
> 64-bit mask will give your system memory mapped at a fixed high address
> (1 << 59 in our case) and program that in your chip in the "Fixed high
> bits" register that you seem to have (also make sure it doesn't affect
> MSIs or it will break them).

Judging from the patch (https://lkml.org/lkml/2019/6/11/59) this is what
they're doing.

Also, are you sure about the MSI thing? The IODA3 spec says the only
important bits for a 64bit MSI are bits 61:60 (to hit the window) and the
lower bits that determine what IVE to use. Everything in between is
ignored so ORing in bit 59 shouldn't break anything.

> This will only work as long as all of the system memory can be
> addressed at an offset from that fixed address that itself fits your
> device addressing capabilities (50 bits in this case). It may or may
> not be the case but there's no way to check since the DMA mask logic
> won't really apply.
>
> You might want to consider fixing your HW in the next iteration... This
> is going to bite you when x86 increases the max physical memory for
> example, or on other architectures.

Yes, do this. The easiest way to avoid this sort of weird hack is to just
design the PCIe interface to the spec in the first place.
Re: [PATCH v8 2/7] x86/dma: use IS_ENABLED() to simplify the code
On Thu, May 30, 2019 at 11:48:26AM +0800, Zhen Lei wrote:
> This patch removes the ifdefs around CONFIG_IOMMU_DEFAULT_PASSTHROUGH to
> improve readablity.

Avoid having "This patch" or "This commit" in the commit message. It is
tautologically useless.

Also, do

$ git grep 'This patch' Documentation/process

for more details.

> Signed-off-by: Zhen Lei
> ---
>  arch/x86/kernel/pci-dma.c | 7 ++-----
>  1 file changed, 2 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
> index dcd272dbd0a9330..9f2b19c35a060df 100644
> --- a/arch/x86/kernel/pci-dma.c
> +++ b/arch/x86/kernel/pci-dma.c
> @@ -43,11 +43,8 @@
>   * It is also possible to disable by default in kernel config, and enable with
>   * iommu=nopt at boot time.
>   */
> -#ifdef CONFIG_IOMMU_DEFAULT_PASSTHROUGH
> -int iommu_pass_through __read_mostly = 1;
> -#else
> -int iommu_pass_through __read_mostly;
> -#endif
> +int iommu_pass_through __read_mostly =
> +	IS_ENABLED(CONFIG_IOMMU_DEFAULT_PASSTHROUGH);

Let that line stick out.

Thx.

--
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
Re: [PATCH kernel v3 0/3] powerpc/ioda2: Yet another attempt to allow DMA masks between 32 and 59
On 6/5/19 11:11 PM, Shawn Anastasio wrote: On 5/30/19 2:03 AM, Alexey Kardashevskiy wrote: This is an attempt to allow DMA masks between 32..59 which are not large enough to use either a PHB3 bypass mode or a sketchy bypass. Depending on the max order, up to 40 is usually available. This is based on v5.2-rc2. Please comment. Thanks. I have tested this patch set with an AMD GPU that's limited to <64bit DMA (I believe it's 40 or 42 bit). It successfully allows the card to operate without falling back to 32-bit DMA mode as it does without the patches. Relevant kernel log message: ``` [ 0.311211] pci 0033:01 : [PE# 00] Enabling 64-bit DMA bypass ``` Tested-by: Shawn Anastasio After a few days of further testing, I've started to run into stability issues with the patch applied and used with an AMD GPU. Specifically, the system sometimes spontaneously crashes. Not just EEH errors either, the whole system shuts down in what looks like a checkstop. Perhaps some subtle corruption is occurring?
Re: [PATCH 2/2] powerpc/64s: __find_linux_pte synchronization vs pmdp_invalidate
On Fri, 2019-06-07 at 03:56:36 UTC, Nicholas Piggin wrote: > The change to pmdp_invalidate to mark the pmd with _PAGE_INVALID broke > the synchronisation against lock free lookups, __find_linux_pte's > pmd_none check no longer returns true for such cases. > > Fix this by adding a check for this condition as well. > > Fixes: da7ad366b497 ("powerpc/mm/book3s: Update pmd_present to look at > _PAGE_PRESENT bit") > Cc: Christophe Leroy > Suggested-by: Aneesh Kumar K.V > Signed-off-by: Nicholas Piggin > Reviewed-by: Aneesh Kumar K.V Applied to powerpc fixes, thanks. https://git.kernel.org/powerpc/c/a00196a272161338d4b1d66ec69e3d57 cheers
Re: [PATCH 1/2] powerpc/64s: Fix THP PMD collapse serialisation
On Fri, 2019-06-07 at 03:56:35 UTC, Nicholas Piggin wrote: > Commit 1b2443a547f9 ("powerpc/book3s64: Avoid multiple endian conversion > in pte helpers") changed the actual bitwise tests in pte_access_permitted > by using pte_write() and pte_present() helpers rather than raw bitwise > testing _PAGE_WRITE and _PAGE_PRESENT bits. > > The pte_present change now returns true for ptes which are !_PAGE_PRESENT > and _PAGE_INVALID, which is the combination used by pmdp_invalidate to > synchronize access from lock-free lookups. pte_access_permitted is used by > pmd_access_permitted, so allowing GUP lock free access to proceed with > such PTEs breaks this synchronisation. > > This bug has been observed on HPT host, with random crashes and corruption > in guests, usually together with bad PMD messages in the host. > > Fix this by adding an explicit check in pmd_access_permitted, and > documenting the condition explicitly. > > The pte_write() change should be okay, and would prevent GUP from falling > back to the slow path when encountering savedwrite ptes, which matches > what x86 (that does not implement savedwrite) does. > > Fixes: 1b2443a547f9 ("powerpc/book3s64: Avoid multiple endian conversion in > pte helpers") > Cc: Aneesh Kumar K.V > Cc: Christophe Leroy > Signed-off-by: Nicholas Piggin > Reviewed-by: Aneesh Kumar K.V Applied to powerpc fixes, thanks. https://git.kernel.org/powerpc/c/33258a1db165cf43a9e6382587ad06e9 cheers
Re: [PATCH] powerpc/32s: fix booting with CONFIG_PPC_EARLY_DEBUG_BOOTX
On Mon, 2019-06-03 at 13:00:51 UTC, Christophe Leroy wrote:
> When booting through OF, setup_disp_bat() does nothing because
> disp_BAT are not set. By chance, it used to work because the BOOTX
> buffer is mapped 1:1 at address 0x81000000 by the bootloader, and
> btext_setup_display() sets virt addr same as phys addr.
>
> But since commit 215b823707ce ("powerpc/32s: set up an early static
> hash table for KASAN."), a temporary page table overrides the
> bootloader mapping.
>
> This 0x81000000 is also problematic with the newly implemented
> Kernel Userspace Access Protection (KUAP) because it is within user
> address space.
>
> This patch fixes those issues by properly setting disp_BAT through
> a call to btext_prepare_BAT(), allowing setup_disp_bat() to
> properly setup BAT3 for early bootx screen buffer access.
>
> Reported-by: Mathieu Malaterre
> Fixes: 215b823707ce ("powerpc/32s: set up an early static hash table for KASAN.")
> Signed-off-by: Christophe Leroy
> Tested-by: Mathieu Malaterre

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/c21f5a9ed85ca3e914ca11f421677ae9

cheers
Re: [PATCH v3] powerpc: fix kexec failure on book3s/32
On Mon, 2019-06-03 at 08:20:28 UTC, Christophe Leroy wrote:
> In the old days, _PAGE_EXEC didn't exist on 6xx aka book3s/32.
> Therefore, although __mapin_ram_chunk() was already mapping kernel
> text with PAGE_KERNEL_TEXT and the rest with PAGE_KERNEL, the entire
> memory was executable. Part of the memory (first 512kbytes) was
> mapped with BATs instead of page table, but it was also entirely
> mapped as executable.
>
> In commit 385e89d5b20f ("powerpc/mm: add exec protection on
> powerpc 603"), we started adding exec protection to some 6xx, namely
> the 603, for pages mapped via pagetables.
>
> Then, in commit 63b2bc619565 ("powerpc/mm/32s: Use BATs for
> STRICT_KERNEL_RWX"), the exec protection was extended to BAT mapped
> memory, so that really only the kernel text could be executed.
>
> The problem here is that kexec is based on copying some code into
> the upper part of memory then executing it from there in order to install
> a fresh new kernel at its definitive location.
>
> However, the code is position independent and the first part of it is
> just there to deactivate the MMU and jump to the second part. So it
> is possible to run this first part in place instead of running the
> copy. Once the MMU is off, there is no protection anymore and the
> second part of the code will just run as before.
>
> Reported-by: Aaro Koskinen
> Fixes: 63b2bc619565 ("powerpc/mm/32s: Use BATs for STRICT_KERNEL_RWX")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Christophe Leroy
> Tested-by: Aaro Koskinen

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/6c284228eb356a1ec62a704b4d232971

cheers
[PATCH 0/3] live partition migration vs cacheinfo
Partition migration often results in the platform telling the OS to
replace all the cache nodes in the device tree. The cacheinfo code has no
knowledge of this, and continues to maintain references to the
deleted/detached nodes, causing subsequent CPU online/offline operations
to get warnings and oopses.

This series addresses this longstanding issue by providing an interface
to the cacheinfo layer that the migration code uses to rebuild the
cacheinfo data structures at a safe time after migration, with
appropriate serialization vs CPU hotplug.

Nathan Lynch (3):
  powerpc/cacheinfo: add cacheinfo_teardown, cacheinfo_rebuild
  powerpc/pseries/mobility: prevent cpu hotplug during DT update
  powerpc/pseries/mobility: rebuild cacheinfo hierarchy post-migration

 arch/powerpc/kernel/cacheinfo.c           | 21 +
 arch/powerpc/kernel/cacheinfo.h           |  4
 arch/powerpc/platforms/pseries/mobility.c | 19 +
 3 files changed, 44 insertions(+)

--
2.20.1
[PATCH 2/3] powerpc/pseries/mobility: prevent cpu hotplug during DT update
CPU online/offline code paths are sensitive to parts of the device tree
(various cpu node properties, cache nodes) that can be changed as a
result of a migration. Prevent CPU hotplug while the device tree
potentially is inconsistent.

Fixes: 410bccf97881 ("powerpc/pseries: Partition migration in the kernel")
Signed-off-by: Nathan Lynch
---
 arch/powerpc/platforms/pseries/mobility.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index 88925f8ca8a0..edc1ec408589 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -9,6 +9,7 @@
  * 2 as published by the Free Software Foundation.
  */
 
+#include <linux/cpu.h>
 #include
 #include
 #include
@@ -338,11 +339,19 @@ void post_mobility_fixup(void)
 	if (rc)
 		printk(KERN_ERR "Post-mobility activate-fw failed: %d\n", rc);
 
+	/*
+	 * We don't want CPUs to go online/offline while the device
+	 * tree is being updated.
+	 */
+	cpus_read_lock();
+
 	rc = pseries_devicetree_update(MIGRATION_SCOPE);
 	if (rc)
 		printk(KERN_ERR "Post-mobility device tree update "
 				"failed: %d\n", rc);
 
+	cpus_read_unlock();
+
 	/* Possibly switch to a new RFI flush type */
 	pseries_setup_rfi_flush();

--
2.20.1
[PATCH 3/3] powerpc/pseries/mobility: rebuild cacheinfo hierarchy post-migration
It's common for the platform to replace the cache device nodes after a
migration. Since the cacheinfo code is never informed about this, it never
drops its references to the source system's cache nodes, causing it to
wind up in an inconsistent state resulting in warnings and oopses as soon
as CPU online/offline occurs after the migration, e.g.

cache for /cpus/l3-cache@3113(Unified) refers to cache for /cpus/l2-cache@200d(Unified)
WARNING: CPU: 15 PID: 86 at arch/powerpc/kernel/cacheinfo.c:176 release_cache+0x1bc/0x1d0
[...]
NIP [c002d9bc] release_cache+0x1bc/0x1d0
LR [c002d9b8] release_cache+0x1b8/0x1d0
Call Trace:
[c001fc99fa70] [c002d9b8] release_cache+0x1b8/0x1d0 (unreliable)
[c001fc99fb10] [c002ebf4] cacheinfo_cpu_offline+0x1c4/0x2c0
[c001fc99fbe0] [c002ae58] unregister_cpu_online+0x1b8/0x260
[c001fc99fc40] [c0165a64] cpuhp_invoke_callback+0x114/0xf40
[c001fc99fcd0] [c0167450] cpuhp_thread_fun+0x270/0x310
[c001fc99fd40] [c01a8bb8] smpboot_thread_fn+0x2c8/0x390
[c001fc99fdb0] [c01a1cd8] kthread+0x1b8/0x1c0
[c001fc99fe20] [c000c2d4] ret_from_kernel_thread+0x5c/0x68

Using device tree notifiers won't work since we want to rebuild the
hierarchy only after all the removals and additions have occurred and the
device tree is in a consistent state.

Call cacheinfo_teardown() before processing device tree updates, and
rebuild the hierarchy afterward.

Fixes: 410bccf97881 ("powerpc/pseries: Partition migration in the kernel")
Signed-off-by: Nathan Lynch
---
 arch/powerpc/platforms/pseries/mobility.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index edc1ec408589..b8c8096907d4 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -23,6 +23,7 @@
 #include
 #include
 #include
 #include "pseries.h"
+#include "../../kernel/cacheinfo.h"
 
 static struct kobject *mobility_kobj;
 
@@ -345,11 +346,20 @@ void post_mobility_fixup(void)
 	 */
 	cpus_read_lock();
 
+	/*
+	 * It's common for the destination firmware to replace cache
+	 * nodes.  Release all of the cacheinfo hierarchy's references
+	 * before updating the device tree.
+	 */
+	cacheinfo_teardown();
+
 	rc = pseries_devicetree_update(MIGRATION_SCOPE);
 	if (rc)
 		printk(KERN_ERR "Post-mobility device tree update "
 				"failed: %d\n", rc);
 
+	cacheinfo_rebuild();
+
 	cpus_read_unlock();
 
 	/* Possibly switch to a new RFI flush type */

--
2.20.1
[PATCH 1/3] powerpc/cacheinfo: add cacheinfo_teardown, cacheinfo_rebuild
Allow external callers to force the cacheinfo code to release all its
references to cache nodes, e.g. before processing device tree updates
post-migration, and to rebuild the hierarchy afterward.

CPU online/offline must be blocked by callers; enforce this.

Fixes: 410bccf97881 ("powerpc/pseries: Partition migration in the kernel")
Signed-off-by: Nathan Lynch
---
 arch/powerpc/kernel/cacheinfo.c | 21 +++++++++++++++++++++
 arch/powerpc/kernel/cacheinfo.h |  4 ++++
 2 files changed, 25 insertions(+)

diff --git a/arch/powerpc/kernel/cacheinfo.c b/arch/powerpc/kernel/cacheinfo.c
index 862e2890bd3d..42c559efe060 100644
--- a/arch/powerpc/kernel/cacheinfo.c
+++ b/arch/powerpc/kernel/cacheinfo.c
@@ -896,4 +896,25 @@ void cacheinfo_cpu_offline(unsigned int cpu_id)
 	if (cache)
 		cache_cpu_clear(cache, cpu_id);
 }
+
+void cacheinfo_teardown(void)
+{
+	unsigned int cpu;
+
+	lockdep_assert_cpus_held();
+
+	for_each_online_cpu(cpu)
+		cacheinfo_cpu_offline(cpu);
+}
+
+void cacheinfo_rebuild(void)
+{
+	unsigned int cpu;
+
+	lockdep_assert_cpus_held();
+
+	for_each_online_cpu(cpu)
+		cacheinfo_cpu_online(cpu);
+}
+
 #endif /* (CONFIG_PPC_PSERIES && CONFIG_SUSPEND) || CONFIG_HOTPLUG_CPU */

diff --git a/arch/powerpc/kernel/cacheinfo.h b/arch/powerpc/kernel/cacheinfo.h
index 955f5e999f1b..52bd3fc6642d 100644
--- a/arch/powerpc/kernel/cacheinfo.h
+++ b/arch/powerpc/kernel/cacheinfo.h
@@ -6,4 +6,8 @@
 extern void cacheinfo_cpu_online(unsigned int cpu_id);
 extern void cacheinfo_cpu_offline(unsigned int cpu_id);
 
+/* Allow migration/suspend to tear down and rebuild the hierarchy. */
+extern void cacheinfo_teardown(void);
+extern void cacheinfo_rebuild(void);
+
 #endif /* _PPC_CACHEINFO_H */
--
2.20.1
Re: [BISECTED REGRESSION] b43legacy broken on G4 PowerBook
On Tue, 2019-06-11 at 20:52 -0500, Larry Finger wrote:
> On 6/11/19 5:46 PM, Benjamin Herrenschmidt wrote:
> > On Tue, 2019-06-11 at 17:20 -0500, Larry Finger wrote:
> > > b43-pci-bridge 0001:11:00.0: dma_direct_supported: failed (mask =
> > > 0x3fff, min_mask = 0x5000/0x5000, dma bits = 0x1f
> >
> > Ugh ? A mask with holes in it ? That's very wrong... That min_mask is
> > bogus.
>
> I agree, but that is not likely serious as most systems will have enough
> memory that the max_pfn term will be much larger than the initial
> min_mask, and min_mask will be unchanged by the min function.

Well no... it's too much memory that is the problem. If min_mask is bogus
though it will cause problems later too, so one should look into it.

> In addition, min_mask is not used beyond this routine, and then only to
> decide if direct dma is supported. The following patch generates masks
> with no holes, but I cannot see that it is needed.

The right fix is to round up max_pfn to a power of 2, something like

	min_mask = min_t(u64, min_mask,
			 (roundup_pow_of_two(max_pfn - 1)) << PAGE_SHIFT)

> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index 2c2772e9702a..e3edd4f29e80 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -384,7 +384,8 @@ int dma_direct_supported(struct device *dev, u64 mask)
> 	else
> 		min_mask = DMA_BIT_MASK(32);
>
> -	min_mask = min_t(u64, min_mask, (max_pfn - 1) << PAGE_SHIFT);
> +	min_mask = min_t(u64, min_mask, ((max_pfn - 1) << PAGE_SHIFT) |
> +			 DMA_BIT_MASK(PAGE_SHIFT));
>
> 	/*
> 	 * This check needs to be against the actual bit mask value, so
>
> Larry
Re: [BISECTED REGRESSION] b43legacy broken on G4 PowerBook
On 6/11/19 5:46 PM, Aaro Koskinen wrote: Hi, On Tue, Jun 11, 2019 at 05:20:12PM -0500, Larry Finger wrote: It is obvious that the case of a mask smaller than min_mask should be handled by the IOMMU. In my system, CONFIG_IOMMU_SUPPORT is selected. All other CONFIG variables containing IOMMU are not selected. When dma_direct_supported() fails, should the system not try for an IOMMU solution? Is the driver asking for the wrong type of memory? It is doing a dma_and_set_mask_coherent() call. I don't think we have IOMMU on G4. On G5 it should work (I remember fixing b43 issue on G5, see 4c374af5fdee, unfortunately all my G5 Macs with b43 are dead and waiting for re-capping). You are right. My configuration has CONFIG_IOMMU_SUPPORT=y, but there is no mention of an IOMMU in the log. Larry
Re: [BISECTED REGRESSION] b43legacy broken on G4 PowerBook
On 6/11/19 5:46 PM, Benjamin Herrenschmidt wrote:
> On Tue, 2019-06-11 at 17:20 -0500, Larry Finger wrote:
> > b43-pci-bridge 0001:11:00.0: dma_direct_supported: failed (mask =
> > 0x3fff, min_mask = 0x5000/0x5000, dma bits = 0x1f
>
> Ugh ? A mask with holes in it ? That's very wrong... That min_mask is
> bogus.

I agree, but that is not likely serious as most systems will have enough
memory that the max_pfn term will be much larger than the initial min_mask,
and min_mask will be unchanged by the min function. In addition, min_mask
is not used beyond this routine, and then only to decide if direct dma is
supported. The following patch generates masks with no holes, but I cannot
see that it is needed.

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 2c2772e9702a..e3edd4f29e80 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -384,7 +384,8 @@ int dma_direct_supported(struct device *dev, u64 mask)
 	else
 		min_mask = DMA_BIT_MASK(32);
 
-	min_mask = min_t(u64, min_mask, (max_pfn - 1) << PAGE_SHIFT);
+	min_mask = min_t(u64, min_mask, ((max_pfn - 1) << PAGE_SHIFT) |
+			 DMA_BIT_MASK(PAGE_SHIFT));
 
 	/*
 	 * This check needs to be against the actual bit mask value, so

Larry
Re: [PATCH 16/16] mm: pass get_user_pages_fast iterator arguments in a structure
> On Jun 11, 2019, at 5:52 PM, Nicholas Piggin wrote:
>
> Christoph Hellwig's on June 12, 2019 12:41 am:
>> Instead of passing a set of always repeated arguments down the
>> get_user_pages_fast iterators, create a struct gup_args to hold them and
>> pass that by reference.  This leads to an over 100 byte .text size
>> reduction for x86-64.
>
> What does this do for performance? I've found this pattern can be
> bad for store aliasing detection.

Note that sometimes such an optimization can also have adverse effects due
to stack protector code that gcc emits when you use such structs. Matthew
Wilcox encountered such a case:

https://patchwork.kernel.org/patch/10702741/
Re: [PATCH 16/16] mm: pass get_user_pages_fast iterator arguments in a structure
On Tue, Jun 11, 2019 at 2:55 PM Nicholas Piggin wrote:
>
> What does this do for performance? I've found this pattern can be
> bad for store aliasing detection.

I wouldn't expect it to be noticeable, and the lack of argument reloading
etc should make up for it. Plus inlining makes it a non-issue when that
happens.

But I guess we could also at least look at using "restrict", if that ends
up helping. Unlike the completely bogus type-based aliasing rules (that we
disable because I think the C people were on some bad bad drugs when they
came up with them), restricted pointers are a real thing that makes sense.

That said, we haven't traditionally used it, and I don't know how much it
helps gcc. Maybe gcc ignores it entirely?

          Linus
Re: [PATCH 16/16] mm: pass get_user_pages_fast iterator arguments in a structure
Christoph Hellwig's on June 12, 2019 12:41 am: > Instead of passing a set of always repeated arguments down the > get_user_pages_fast iterators, create a struct gup_args to hold them and > pass that by reference. This leads to an over 100 byte .text size > reduction for x86-64. What does this do for performance? I've found this pattern can be bad for store aliasing detection. Thanks, Nick
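For readers following along, the shape being discussed is roughly the
following (a sketch; the field names are assumed from the description
above, not copied from the actual patch):

/* Sketch only: bundle the arguments every fast-GUP helper repeats into one
 * structure and pass it down by reference instead of as separate
 * parameters. */
struct gup_args {
	unsigned long	addr;		/* current user address */
	unsigned long	end;		/* end of the requested range */
	unsigned int	flags;		/* FOLL_* flags */
	struct page	**pages;	/* output array */
	int		nr;		/* pages collected so far */
};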
Re: [PATCH v2 8/8] habanalabs: enable 64-bit DMA mask in POWER9
On Tue, 2019-06-11 at 20:22 +0300, Oded Gabbay wrote: > > > So, to summarize: > > If I call pci_set_dma_mask with 48, then it fails on POWER9. However, > > in runtime, I don't know if its POWER9 or not, so upon failure I will > > call it again with 32, which makes our device pretty much unusable. > > If I call pci_set_dma_mask with 64, and do the dedicated configuration > > in Goya's PCIe controller, then it won't work on x86-64, because bit > > 59 will be set and the host won't like it (I checked it). In addition, > > I might get addresses above 50 bits, which my device can't generate. > > > > I hope this makes things more clear. Now, please explain to me how I > > can call pci_set_dma_mask without any regard to whether I run on > > x86-64 or POWER9, considering what I wrote above ? > > > > Thanks, > > Oded > > Adding ppc mailing list. You can't. Your device is broken. Devices that don't support DMAing to the full 64-bit deserve to be added to the trash pile. As a result, getting it to work will require hacks. Some GPUs have similar issues and require similar hacks, it's unfortunate. Added a couple of guys on CC who might be able to help get those hacks right. It's still very fishy .. the idea is to detect the case where setting a 64-bit mask will give your system memory mapped at a fixed high address (1 << 59 in our case) and program that in your chip in the "Fixed high bits" register that you seem to have (also make sure it doesn't affect MSIs or it will break them). This will only work as long as all of the system memory can be addressed at an offset from that fixed address that itself fits your device addressing capabilities (50 bits in this case). It may or may not be the case but there's no way to check since the DMA mask logic won't really apply. You might want to consider fixing your HW in the next iteration... This is going to bite you when x86 increases the max physical memory for example, or on other architectures. Cheers, Ben.
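A rough illustration of the hack described above (the register write is
entirely hypothetical; the real programming of Goya's PCIe controller is
device specific and not shown in this thread):

/* Sketch only: accept the full 64-bit mask, then tell the device to OR a
 * fixed high bit (bit 59 here) into every outbound DMA address so the
 * transactions land in the host bridge's bypass window. */
if (!dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64)))
	writeq(BIT_ULL(59), hdev->pcie_regs + OUTBOUND_HI_BITS);	/* hypothetical register */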
Re: [BISECTED REGRESSION] b43legacy broken on G4 PowerBook
On Tue, 2019-06-11 at 17:20 -0500, Larry Finger wrote: > b43-pci-bridge 0001:11:00.0: dma_direct_supported: failed (mask = > 0x3fff, > min_mask = 0x5000/0x5000, dma bits = 0x1f Ugh ? A mask with holes in it ? That's very wrong... That min_mask is bogus. Ben.
Re: [BISECTED REGRESSION] b43legacy broken on G4 PowerBook
Hi, On Tue, Jun 11, 2019 at 05:20:12PM -0500, Larry Finger wrote: > It is obvious that the case of a mask smaller than min_mask should be > handled by the IOMMU. In my system, CONFIG_IOMMU_SUPPORT is selected. All > other CONFIG variables containing IOMMU are not selected. When > dma_direct_supported() fails, should the system not try for an IOMMU > solution? Is the driver asking for the wrong type of memory? It is doing a > dma_and_set_mask_coherent() call. I don't think we have IOMMU on G4. On G5 it should work (I remember fixing b43 issue on G5, see 4c374af5fdee, unfortunately all my G5 Macs with b43 are dead and waiting for re-capping). A.
Re: [BISECTED REGRESSION] b43legacy broken on G4 PowerBook
On 6/11/19 1:05 AM, Christoph Hellwig wrote:
> On Mon, Jun 10, 2019 at 11:09:47AM -0500, Larry Finger wrote:
>
> What might be confusing in your output is that dev->dma_mask is a
> pointer, and we are setting it in dma_set_mask.  That is before we only
> check if the pointer is set, and later we override it.
>
> Of course this doesn't actually explain the failure.  But what is even
> more strange to me is that you get a return value from dma_supported()
> that isn't 0 or 1, as that function is supposed to return a boolean,
> and I really can't see how
>
> 	mask >= __phys_to_dma(dev, min_mask),
>
> would return anything but 0 or 1.
>
> Does the output change if you use the correct printk specifiers? i.e.
> with a debug patch like this:
>
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index 2c2772e9702a..9e5b30b12b10 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -378,6 +378,7 @@ EXPORT_SYMBOL(dma_direct_map_resource);
>  int dma_direct_supported(struct device *dev, u64 mask)
>  {
>  	u64 min_mask;
> +	bool ret;
>  
>  	if (IS_ENABLED(CONFIG_ZONE_DMA))
>  		min_mask = DMA_BIT_MASK(ARCH_ZONE_DMA_BITS);
> @@ -391,7 +392,12 @@ int dma_direct_supported(struct device *dev, u64 mask)
>  	 * use __phys_to_dma() here so that the SME encryption mask isn't
>  	 * part of the check.
>  	 */
> -	return mask >= __phys_to_dma(dev, min_mask);
> +	ret = (mask >= __phys_to_dma(dev, min_mask));
> +	if (!ret)
> +		dev_info(dev,
> +			 "%s: failed (mask = 0x%llx, min_mask = 0x%llx/0x%llx, dma bits = %d\n",
> +			 __func__, mask, min_mask, __phys_to_dma(dev, min_mask), ARCH_ZONE_DMA_BITS);
> +	return ret;
>  }
>  
>  size_t dma_direct_max_mapping_size(struct device *dev)
> diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
> index f7afdadb6770..6c57ccdee2ae 100644
> --- a/kernel/dma/mapping.c
> +++ b/kernel/dma/mapping.c
> @@ -317,8 +317,14 @@ void arch_dma_set_mask(struct device *dev, u64 mask);
>  
>  int dma_set_mask(struct device *dev, u64 mask)
>  {
> -	if (!dev->dma_mask || !dma_supported(dev, mask))
> +	if (!dev->dma_mask) {
> +		dev_info(dev, "no DMA mask set!\n");
>  		return -EIO;
> +	}
> +	if (!dma_supported(dev, mask)) {
> +		printk("DMA not supported\n");
> +		return -EIO;
> +	}
>  
>  	arch_dma_set_mask(dev, mask);
>  	dma_check_mask(dev, mask);

After I got the correct formatting, the output with this patch only gives
the following in dmesg:

b43-pci-bridge 0001:11:00.0: dma_direct_supported: failed (mask = 0x3fff, min_mask = 0x5000/0x5000, dma bits = 0x1f
DMA not supported
b43legacy-phy0 ERROR: The machine/kernel does not support the required 30-bit DMA mask

Your first patch did not work as the configuration does not have
CONFIG_ZONE_DMA. As a result, the initial value of min_mask always starts
at 32 bits and is taken down to 31 with the maximum pfn minimization. When
I forced the initial value of min_mask to 30 bits, the device worked.

It is obvious that the case of a mask smaller than min_mask should be
handled by the IOMMU. In my system, CONFIG_IOMMU_SUPPORT is selected. All
other CONFIG variables containing IOMMU are not selected. When
dma_direct_supported() fails, should the system not try for an IOMMU
solution? Is the driver asking for the wrong type of memory? It is doing a
dma_and_set_mask_coherent() call.

Larry
[Bug 203837] Booting kernel under KVM immediately freezes host
https://bugzilla.kernel.org/show_bug.cgi?id=203837 --- Comment #4 from Shawn Anastasio (sh...@anastas.io) --- I have applied Nick's patchset to 5.1.7 but the issue still occurs. As for using pdbg, I'm aware of the tool's existence but I'm not sure how I would effectively use it to diagnose this issue. If anybody has some pointers, it'd be appreciated. -- You are receiving this mail because: You are watching the assignee of the bug.
Re: [PATCH 10/16] mm: rename CONFIG_HAVE_GENERIC_GUP to CONFIG_HAVE_FAST_GUP
On 6/11/19 8:40 AM, Christoph Hellwig wrote: > We only support the generic GUP now, so rename the config option to > be more clear, and always use the mm/Kconfig definition of the > symbol and select it from the arch Kconfigs. > > Signed-off-by: Christoph Hellwig > --- > arch/arm/Kconfig | 5 + > arch/arm64/Kconfig | 4 +--- > arch/mips/Kconfig| 2 +- > arch/powerpc/Kconfig | 2 +- > arch/s390/Kconfig| 2 +- > arch/sh/Kconfig | 2 +- > arch/sparc/Kconfig | 2 +- > arch/x86/Kconfig | 4 +--- > mm/Kconfig | 2 +- > mm/gup.c | 4 ++-- > 10 files changed, 11 insertions(+), 18 deletions(-) > Looks good. Reviewed-by: Khalid Aziz
Re: [PATCH 09/16] sparc64: use the generic get_user_pages_fast code
On 6/11/19 8:40 AM, Christoph Hellwig wrote: > The sparc64 code is mostly equivalent to the generic one, minus various > bugfixes and two arch overrides that this patch adds to pgtable.h. > > Signed-off-by: Christoph Hellwig > --- > arch/sparc/Kconfig | 1 + > arch/sparc/include/asm/pgtable_64.h | 18 ++ > arch/sparc/mm/Makefile | 2 +- > arch/sparc/mm/gup.c | 340 > 4 files changed, 20 insertions(+), 341 deletions(-) > delete mode 100644 arch/sparc/mm/gup.c > Reviewed-by: Khalid Aziz
Re: [PATCH 08/16] sparc64: define untagged_addr()
On 6/11/19 8:40 AM, Christoph Hellwig wrote:
> Add a helper to untag a user pointer. This is needed for ADI support
> in get_user_pages_fast.
>
> Signed-off-by: Christoph Hellwig
> ---
>  arch/sparc/include/asm/pgtable_64.h | 22 ++++++++++++++++++++++
>  1 file changed, 22 insertions(+)

Looks good to me.

Reviewed-by: Khalid Aziz

>
> diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h
> index f0dcf991d27f..1904782dcd39 100644
> --- a/arch/sparc/include/asm/pgtable_64.h
> +++ b/arch/sparc/include/asm/pgtable_64.h
> @@ -1076,6 +1076,28 @@ static inline int io_remap_pfn_range(struct vm_area_struct *vma,
>  }
>  #define io_remap_pfn_range io_remap_pfn_range
>  
> +static inline unsigned long untagged_addr(unsigned long start)
> +{
> +	if (adi_capable()) {
> +		long addr = start;
> +
> +		/* If userspace has passed a versioned address, kernel
> +		 * will not find it in the VMAs since it does not store
> +		 * the version tags in the list of VMAs. Storing version
> +		 * tags in list of VMAs is impractical since they can be
> +		 * changed any time from userspace without dropping into
> +		 * kernel. Any address search in VMAs will be done with
> +		 * non-versioned addresses. Ensure the ADI version bits
> +		 * are dropped here by sign extending the last bit before
> +		 * ADI bits. IOMMU does not implement version tags.
> +		 */
> +		return (addr << (long)adi_nbits()) >> (long)adi_nbits();
> +	}
> +
> +	return start;
> +}
> +#define untagged_addr untagged_addr
> +
>  #include
>  #include
>
Re: [PATCH 01/16] mm: use untagged_addr() for get_user_pages_fast addresses
On 6/11/19 8:40 AM, Christoph Hellwig wrote: > This will allow sparc64 to override its ADI tags for > get_user_pages and get_user_pages_fast. > > Signed-off-by: Christoph Hellwig > --- Commit message is sparc64 specific but the goal here is to allow any architecture with memory tagging to use this. So I would suggest rewording the commit log. Other than that: Reviewed-by: Khalid Aziz > mm/gup.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/mm/gup.c b/mm/gup.c > index ddde097cf9e4..6bb521db67ec 100644 > --- a/mm/gup.c > +++ b/mm/gup.c > @@ -2146,7 +2146,7 @@ int __get_user_pages_fast(unsigned long start, int > nr_pages, int write, > unsigned long flags; > int nr = 0; > > - start &= PAGE_MASK; > + start = untagged_addr(start) & PAGE_MASK; > len = (unsigned long) nr_pages << PAGE_SHIFT; > end = start + len; > > @@ -2219,7 +2219,7 @@ int get_user_pages_fast(unsigned long start, int > nr_pages, > unsigned long addr, len, end; > int nr = 0, ret = 0; > > - start &= PAGE_MASK; > + start = untagged_addr(start) & PAGE_MASK; > addr = start; > len = (unsigned long) nr_pages << PAGE_SHIFT; > end = start + len; >
Re: [PATCH v3 06/20] docs: mark orphan documents as such
On Tue, Jun 11, 2019 at 8:05 PM Mauro Carvalho Chehab wrote: > > Em Tue, 11 Jun 2019 19:52:04 +0300 > Andy Shevchenko escreveu: > > > On Fri, Jun 7, 2019 at 10:04 PM Mauro Carvalho Chehab > > wrote: > > > Sphinx doesn't like orphan documents: > > > > > Documentation/laptops/lg-laptop.rst: WARNING: document isn't included > > > in any toctree > > > > > Documentation/laptops/lg-laptop.rst | 2 ++ > > > > > diff --git a/Documentation/laptops/lg-laptop.rst > > > b/Documentation/laptops/lg-laptop.rst > > > index aa503ee9b3bc..f2c2ffe31101 100644 > > > --- a/Documentation/laptops/lg-laptop.rst > > > +++ b/Documentation/laptops/lg-laptop.rst > > > @@ -1,5 +1,7 @@ > > > .. SPDX-License-Identifier: GPL-2.0+ > > > > > > +:orphan: > > > + > > > LG Gram laptop extra features > > > = > > > > > > > Can we rather create a toc tree there? > > It was a first document in reST format in that folder. > > Sure, but: > > 1) I have a patch converting the other files on this dir to rst: > > > https://git.linuxtv.org/mchehab/experimental.git/commit/?h=convert_rst_renames_v4.1&id=abc13233035fdfdbc5ef2f2fbd3d127a1ab15530 > > 2) It probably makes sense to move the entire dir to > Documentation/admin-guide. > > So, I would prefer to have the :orphan: here while (1) is not merged. Fine to me as long as you will drop it by the mentioned effort. -- With Best Regards, Andy Shevchenko
[PATCH] cxl: no need to check return value of debugfs_create functions
When calling debugfs functions, there is no need to ever check the
return value.  The function can work or not, but the code logic should
never do something different based on this.

Because there's no need to check, also make the return value of the
local debugfs_create_io_x64() call void, as no one ever did anything
with the return value (as they did not need to.)

Cc: Frederic Barrat
Cc: Andrew Donnellan
Cc: Arnd Bergmann
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Greg Kroah-Hartman
---
 drivers/misc/cxl/debugfs.c | 19 +++++--------------
 1 file changed, 5 insertions(+), 14 deletions(-)

diff --git a/drivers/misc/cxl/debugfs.c b/drivers/misc/cxl/debugfs.c
index 1fda22c24c93..27f3bcb7d939 100644
--- a/drivers/misc/cxl/debugfs.c
+++ b/drivers/misc/cxl/debugfs.c
@@ -26,11 +26,11 @@ static int debugfs_io_u64_set(void *data, u64 val)
 DEFINE_DEBUGFS_ATTRIBUTE(fops_io_x64, debugfs_io_u64_get, debugfs_io_u64_set,
 			 "0x%016llx\n");
 
-static struct dentry *debugfs_create_io_x64(const char *name, umode_t mode,
-					    struct dentry *parent, u64 __iomem *value)
+static void debugfs_create_io_x64(const char *name, umode_t mode,
+				  struct dentry *parent, u64 __iomem *value)
 {
-	return debugfs_create_file_unsafe(name, mode, parent,
-					  (void __force *)value, &fops_io_x64);
+	debugfs_create_file_unsafe(name, mode, parent, (void __force *)value,
+				   &fops_io_x64);
 }
 
 void cxl_debugfs_add_adapter_regs_psl9(struct cxl *adapter, struct dentry *dir)
@@ -64,8 +64,6 @@ int cxl_debugfs_adapter_add(struct cxl *adapter)
 
 	snprintf(buf, 32, "card%i", adapter->adapter_num);
 	dir = debugfs_create_dir(buf, cxl_debugfs);
-	if (IS_ERR(dir))
-		return PTR_ERR(dir);
 	adapter->debugfs = dir;
 
 	debugfs_create_io_x64("err_ivte", S_IRUSR, dir, _cxl_p1_addr(adapter, CXL_PSL_ErrIVTE));
@@ -106,8 +104,6 @@ int cxl_debugfs_afu_add(struct cxl_afu *afu)
 
 	snprintf(buf, 32, "psl%i.%i", afu->adapter->adapter_num, afu->slice);
 	dir = debugfs_create_dir(buf, afu->adapter->debugfs);
-	if (IS_ERR(dir))
-		return PTR_ERR(dir);
 	afu->debugfs = dir;
 
 	debugfs_create_io_x64("sr", S_IRUSR, dir, _cxl_p1n_addr(afu, CXL_PSL_SR_An));
@@ -129,15 +125,10 @@ void cxl_debugfs_afu_remove(struct cxl_afu *afu)
 
 int __init cxl_debugfs_init(void)
 {
-	struct dentry *ent;
-
 	if (!cpu_has_feature(CPU_FTR_HVMODE))
 		return 0;
 
-	ent = debugfs_create_dir("cxl", NULL);
-	if (IS_ERR(ent))
-		return PTR_ERR(ent);
-	cxl_debugfs = ent;
+	cxl_debugfs = debugfs_create_dir("cxl", NULL);
 
 	return 0;
 }
--
2.22.0
Re: [BISECTED REGRESSION] b43legacy broken on G4 PowerBook
On Jun 10 2019, Larry Finger wrote:

> I do not understand why the if statement returns true as neither of the
> values is zero.

That's because the format string does not make any sense.  You are
printing garbage.

> diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
> index f7afdad..ba2489d 100644
> --- a/kernel/dma/mapping.c
> +++ b/kernel/dma/mapping.c
> @@ -317,9 +317,12 @@ int dma_supported(struct device *dev, u64 mask)
>  
>  int dma_set_mask(struct device *dev, u64 mask)
>  {
> +	pr_info("mask 0x%llx, dma_mask 0x%llx, dma_supported 0x%llx\n",
> +		mask, dev->dma_mask, dma_supported(dev, mask));

None of the format directives match the type of the arguments.

Andreas.

--
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
"And now for something completely different."
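To make the point concrete, a corrected version of the quoted debug print
could look like this (a sketch only): dev->dma_mask is a u64 pointer and
dma_supported() returns an int, so they have to be dereferenced and printed
with matching specifiers.

	/* dma_mask is a pointer: dereference it; dma_supported() is an int. */
	pr_info("mask 0x%llx, dma_mask 0x%llx, dma_supported %d\n",
		mask, dev->dma_mask ? *dev->dma_mask : 0,
		dma_supported(dev, mask));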
Re: [RFC V3] mm: Generalize and rename notify_page_fault() as kprobe_page_fault()
On Tue, 2019-06-11 at 10:44 +0530, Anshuman Khandual wrote:
> On 06/10/2019 08:57 PM, Leonardo Bras wrote:
> > On Mon, 2019-06-10 at 08:09 +0530, Anshuman Khandual wrote:
> > > > > +	/*
> > > > > +	 * To be potentially processing a kprobe fault and to be allowed
> > > > > +	 * to call kprobe_running(), we have to be non-preemptible.
> > > > > +	 */
> > > > > +	if (kprobes_built_in() && !preemptible() && !user_mode(regs)) {
> > > > > +		if (kprobe_running() && kprobe_fault_handler(regs, trap))
> > > >
> > > > don't need an 'if A if B', can do 'if A && B'
> > >
> > > Which will make it a very lengthy condition check.
> >
> > Well, is there any problem line-breaking the if condition?
> >
> > 	if (A && B && C &&
> > 	    D && E )
> >
> > Also, if it's used only to decide the return value, maybe it would be
> > fine to do something like that:
> >
> > 	return (A && B && C &&
> > 		D && E );
>
> Got it. But as Dave and Matthew had pointed out earlier, the current x86
> implementation has better readability. Hence will probably stick with it.

Sure, I agree with them. It's way more readable.
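For context, the single-condition form Leonardo suggested would look
roughly like this (a sketch, not the code that was ultimately merged; the
thread settled on keeping the nested-if form for readability):

/* Short-circuit evaluation preserves the ordering constraint: kprobe_running()
 * is only reached when we are non-preemptible and in kernel mode. */
static nokprobe_inline bool kprobe_page_fault(struct pt_regs *regs,
					      unsigned int trap)
{
	return kprobes_built_in() && !preemptible() && !user_mode(regs) &&
	       kprobe_running() && kprobe_fault_handler(regs, trap);
}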
Re: [PATCH v2 8/8] habanalabs: enable 64-bit DMA mask in POWER9
On Tue, Jun 11, 2019 at 8:03 PM Oded Gabbay wrote:
>
> On Tue, Jun 11, 2019 at 6:26 PM Greg KH wrote:
> >
> > On Tue, Jun 11, 2019 at 08:17:53AM -0700, Christoph Hellwig wrote:
> > > On Tue, Jun 11, 2019 at 11:58:57AM +0200, Greg KH wrote:
> > > > That feels like a big hack.  ppc doesn't have any "what arch am I
> > > > running on?" runtime call?  Did you ask on the ppc64 mailing list?  I'm
> > > > ok to take this for now, but odds are you need a better fix for this
> > > > sometime...
> > >
> > > That isn't the worst part of it.  The whole idea of checking what I'm
> > > running to set a dma mask just doesn't make any sense at all.
> >
> > Oded, I thought I asked if there was a dma call you should be making to
> > keep this type of check from being needed.  What happened to that?  As
> > Christoph points out, none of this should be needed, which is what I
> > thought I originally said :)
> >
> > thanks,
> >
> > greg k-h
>
> I'm sorry, but it seems I can't explain what's my problem because you
> and Christoph keep mentioning the pci_set_dma_mask() but it doesn't
> help me. I'll try again to explain.
>
> The main problem specifically for Goya is that I can't call this
> function with *the same parameter* for POWER9 and x86-64, because
> x86-64 supports a dma mask of 48-bits while POWER9 supports only
> 32-bits or 64-bits.
>
> The main limitation in my Goya device is that it can generate PCI
> outbound transactions with addresses from 0 to (2^50 - 1).
> That's why when we first integrated it in x86-64, we used a DMA mask
> of 48-bits, by calling pci_set_dma_mask(pdev, 48). That way, the
> kernel ensures me that all the DMA addresses are from 0 to (2^48 - 1),
> and that address range is accessible by my device.
>
> If for some reason the x86-64 machine doesn't support 48-bits, the
> standard fallback code in ALL the drivers I have seen is to set the
> DMA mask to 32-bits. And that's how my current driver's code is
> written.
>
> Now, when I tried to integrate Goya into a POWER9 machine, I got a
> reject from the call to pci_set_dma_mask(pdev, 48). The standard code,
> as I wrote above, is to call the same function with 32-bits. That
> works BUT it is not practical, as our applications require much more
> memory mapped than 32-bits. In addition, once you add more cards which
> are all mapped to the same range, it is simply not usable at all.
>
> Therefore, I consulted with POWER people and they told me I can call
> pci_set_dma_mask with the mask as 64, but I must make sure that ALL
> outbound transactions from Goya will be with bit 59 set in the
> address. I can achieve that with a dedicated configuration I make in
> Goya's PCIe controller. That's what I did and that works.
>
> So, to summarize:
> If I call pci_set_dma_mask with 48, then it fails on POWER9. However,
> in runtime, I don't know if its POWER9 or not, so upon failure I will
> call it again with 32, which makes our device pretty much unusable.
> If I call pci_set_dma_mask with 64, and do the dedicated configuration
> in Goya's PCIe controller, then it won't work on x86-64, because bit
> 59 will be set and the host won't like it (I checked it). In addition,
> I might get addresses above 50 bits, which my device can't generate.
>
> I hope this makes things more clear. Now, please explain to me how I
> can call pci_set_dma_mask without any regard to whether I run on
> x86-64 or POWER9, considering what I wrote above ?
>
> Thanks,
> Oded

Adding ppc mailing list.

Oded
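The fallback pattern Oded refers to is, in generic form, roughly this
(a sketch, not the habanalabs driver code):

	/* Sketch only: ask for the widest mask the device can generate,
	 * drop to 32 bits if the platform rejects it. */
	rc = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(48));
	if (rc) {
		dev_warn(&pdev->dev,
			 "48-bit DMA mask rejected, falling back to 32 bits\n");
		rc = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
	}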
Re: Question - check in runtime which architecture am I running on
On Tue, Jun 11, 2019 at 5:07 PM Christoph Hellwig wrote:
> On Tue, Jun 11, 2019 at 03:30:08PM +0300, Oded Gabbay wrote:
> > Hello POWER developers,
> >
> > I'm trying to find out if there is an internal kernel API so that a
> > PCI driver can call it to check if its PCI device is running inside a
> > POWER9 machine. Alternatively, if that's not available, if it is
> > running on a machine with powerpc architecture.
>
> Your driver has absolutely no business knowing this.
>
> > I need this information as my device (Goya AI accelerator)
> > unfortunately needs a slightly different configuration of its PCIe
> > controller in case of POWER9 (it needs to set bit 59 to 1 in all
> > outbound transactions).
>
> No, it doesn't. You can query the output of dma_get_required_mask
> to optimize for the DMA addresses you get, and otherwise you simply
> set the maximum dma mask you support. That is about the control you
> get, and nothing else is a driver's business.

I don't want to conduct two discussions, as I saw you answered on my patch. I'll add the ppc mailing list to my patch.

Oded
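A minimal sketch of the approach Christoph describes, with illustrative names only (this is not code from the habanalabs driver): advertise the largest mask the device can address (50 bits for Goya, per the thread) and, if useful, inspect dma_get_required_mask() to see what the platform actually needs, instead of branching on the architecture.

#include <linux/pci.h>
#include <linux/dma-mapping.h>

/*
 * Illustrative sketch only: set the device's real limit and let the
 * platform code (IOMMU or direct mapping) decide how to satisfy it.
 */
static int example_setup_goya_dma(struct pci_dev *pdev)
{
	u64 required = dma_get_required_mask(&pdev->dev);

	dev_dbg(&pdev->dev, "platform's required DMA mask: %llx\n",
		(unsigned long long)required);

	return dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(50));
}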
Re: [PATCH v2 0/4] Additional fixes on Talitos driver
Le 11/06/2019 à 18:30, Horia Geanta a écrit :
> On 6/11/2019 6:40 PM, Christophe Leroy wrote:
>> Le 11/06/2019 à 17:37, Horia Geanta a écrit :
>>> On 6/11/2019 5:39 PM, Christophe Leroy wrote:
>>>> This series is the last set of fixes for the Talitos driver.
>>>>
>>>> We now get a fully clean boot on both SEC1 (SEC1.2 on mpc885) and
>>>> SEC2 (SEC2.2 on mpc8321E) with CONFIG_CRYPTO_MANAGER_EXTRA_TESTS:
>>>>
>>> I am getting below failures on a sec 3.3.2 (p1020rdb) for hmac(sha384)
>>> and hmac(sha512):
>>
>> Is that new with this series or did you already have it before ?
>>
> Looks like this happens with or without this series. I haven't checked the
> state of this driver for quite some time. Since I've noticed increased
> activity, I thought it would be worth actually testing the changes.
>
> Are changes in patch 2/4 ("crypto: talitos - fix hash on SEC1.") strictly
> for sec 1.x or they affect all revisions?

They are strictly for sec 1.x.

>> What do you mean by "fuzz testing" enabled ? Is that
>> CONFIG_CRYPTO_MANAGER_EXTRA_TESTS or something else ?
>>
> Yes, it's this config symbol.

Indeed SEC 2.2 only supports up to SHA-256.

Christophe

> Horia
Re: [PATCH v3 3/3] powerpc: Add support to initialize ima policy rules
On 06/11/2019 01:19 AM, Satheesh Rajendran wrote:
> On Mon, Jun 10, 2019 at 04:33:57PM -0400, Nayna Jain wrote:
> > PowerNV secure boot relies on the kernel IMA security subsystem to
> > perform the OS kernel image signature verification. Since each secure
> > boot mode has different IMA policy requirements, dynamic definition of
> > the policy rules based on the runtime secure boot mode of the system is
> > required. On systems that support secure boot, but have it disabled,
> > only measurement policy rules of the kernel image and modules are
> > defined.
> >
> > This patch defines the arch-specific implementation to retrieve the
> > secure boot mode of the system and accordingly configures the IMA
> > policy rules.
> >
> > This patch provides arch-specific IMA policies if PPC_SECURE_BOOT
> > config is enabled.
> >
> > Signed-off-by: Nayna Jain
> > ---
> >  arch/powerpc/Kconfig           | 14 +
> >  arch/powerpc/kernel/Makefile   |  1 +
> >  arch/powerpc/kernel/ima_arch.c | 54 ++
> >  include/linux/ima.h            |  3 +-
> >  4 files changed, 71 insertions(+), 1 deletion(-)
> >  create mode 100644 arch/powerpc/kernel/ima_arch.c
>
> Hi,
>
> This series failed to build against linuxppc/merge tree with
> `ppc64le_defconfig`:
>
> arch/powerpc/platforms/powernv/secboot.c:14:6: error: redefinition of 'get_powerpc_sb_mode'
>    14 | bool get_powerpc_sb_mode(void)
>       |      ^~~
> In file included from arch/powerpc/platforms/powernv/secboot.c:11:
> ./arch/powerpc/include/asm/secboot.h:15:20: note: previous definition of 'get_powerpc_sb_mode' was here
>    15 | static inline bool get_powerpc_sb_mode(void)
>       |                    ^~~
> make[3]: *** [scripts/Makefile.build:278: arch/powerpc/platforms/powernv/secboot.o] Error 1
> make[3]: *** Waiting for unfinished jobs
> make[2]: *** [scripts/Makefile.build:489: arch/powerpc/platforms/powernv] Error 2
> make[1]: *** [scripts/Makefile.build:489: arch/powerpc/platforms] Error 2
> make: *** [Makefile:1071: arch/powerpc] Error 2
> make: *** Waiting for unfinished jobs

Thanks for reporting. I have fixed it and reposted as v4. Please retry.

Thanks & Regards,
     - Nayna
[PATCH v4 2/3] powerpc/powernv: detect the secure boot mode of the system
PowerNV secure boot defines different IMA policies based on the secure boot state of the system. This patch defines a function to detect the secure boot state of the system. Signed-off-by: Nayna Jain --- arch/powerpc/include/asm/secboot.h | 21 arch/powerpc/platforms/powernv/Makefile | 2 +- arch/powerpc/platforms/powernv/secboot.c | 61 3 files changed, 83 insertions(+), 1 deletion(-) create mode 100644 arch/powerpc/include/asm/secboot.h create mode 100644 arch/powerpc/platforms/powernv/secboot.c diff --git a/arch/powerpc/include/asm/secboot.h b/arch/powerpc/include/asm/secboot.h new file mode 100644 index ..1904fb4a3352 --- /dev/null +++ b/arch/powerpc/include/asm/secboot.h @@ -0,0 +1,21 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * PowerPC secure boot definitions + * + * Copyright (C) 2019 IBM Corporation + * Author: Nayna Jain + * + */ +#ifndef POWERPC_SECBOOT_H +#define POWERPC_SECBOOT_H + +#if defined(CONFIG_OPAL_SECVAR) +extern bool get_powerpc_sb_mode(void); +#else +static inline bool get_powerpc_sb_mode(void) +{ + return false; +} +#endif + +#endif diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile index 6651c742e530..6f4af607a915 100644 --- a/arch/powerpc/platforms/powernv/Makefile +++ b/arch/powerpc/platforms/powernv/Makefile @@ -16,4 +16,4 @@ obj-$(CONFIG_PERF_EVENTS) += opal-imc.o obj-$(CONFIG_PPC_MEMTRACE) += memtrace.o obj-$(CONFIG_PPC_VAS) += vas.o vas-window.o vas-debug.o obj-$(CONFIG_OCXL_BASE)+= ocxl.o -obj-$(CONFIG_OPAL_SECVAR) += opal-secvar.o +obj-$(CONFIG_OPAL_SECVAR) += opal-secvar.o secboot.o diff --git a/arch/powerpc/platforms/powernv/secboot.c b/arch/powerpc/platforms/powernv/secboot.c new file mode 100644 index ..9199e520ebed --- /dev/null +++ b/arch/powerpc/platforms/powernv/secboot.c @@ -0,0 +1,61 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2019 IBM Corporation + * Author: Nayna Jain + * + * secboot.c + * - util function to get powerpc secboot state + */ +#include +#include +#include +#include + +bool get_powerpc_sb_mode(void) +{ + u8 secure_boot_name[] = "SecureBoot"; + u8 setup_mode_name[] = "SetupMode"; + u8 secboot, setupmode; + unsigned long size = sizeof(secboot); + int status; + unsigned long version; + + status = opal_variable_version(&version); + if ((status != OPAL_SUCCESS) || (version != BACKEND_TC_COMPAT_V1)) { + pr_info("secboot: error retrieving compatible backend\n"); + return false; + } + + status = opal_get_variable(secure_boot_name, sizeof(secure_boot_name), + NULL, NULL, &secboot, &size); + + /* +* For now assume all failures reading the SecureBoot variable implies +* secure boot is not enabled. Later differentiate failure types. +*/ + if (status != OPAL_SUCCESS) { + secboot = 0; + setupmode = 0; + goto out; + } + + size = sizeof(setupmode); + status = opal_get_variable(setup_mode_name, sizeof(setup_mode_name), + NULL, NULL, &setupmode, &size); + + /* +* Failure to read the SetupMode variable does not prevent +* secure boot mode +*/ + if (status != OPAL_SUCCESS) + setupmode = 0; + +out: + if ((secboot == 0) || (setupmode == 1)) { + pr_info("secboot: secureboot mode disabled\n"); + return false; + } + + pr_info("secboot: secureboot mode enabled\n"); + return true; +} -- 2.20.1
[PATCH v4 3/3] powerpc: Add support to initialize ima policy rules
PowerNV secure boot relies on the kernel IMA security subsystem to perform the OS kernel image signature verification. Since each secure boot mode has different IMA policy requirements, dynamic definition of the policy rules based on the runtime secure boot mode of the system is required. On systems that support secure boot, but have it disabled, only measurement policy rules of the kernel image and modules are defined. This patch defines the arch-specific implementation to retrieve the secure boot mode of the system and accordingly configures the IMA policy rules. This patch provides arch-specific IMA policies if PPC_SECURE_BOOT config is enabled. Signed-off-by: Nayna Jain --- arch/powerpc/Kconfig | 14 + arch/powerpc/kernel/Makefile | 1 + arch/powerpc/kernel/ima_arch.c | 54 ++ include/linux/ima.h| 3 +- 4 files changed, 71 insertions(+), 1 deletion(-) create mode 100644 arch/powerpc/kernel/ima_arch.c diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 8c1c636308c8..9de77bb14f54 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -902,6 +902,20 @@ config PPC_MEM_KEYS If unsure, say y. +config PPC_SECURE_BOOT + prompt "Enable PowerPC Secure Boot" + bool + default n + depends on PPC64 + depends on OPAL_SECVAR + depends on IMA + depends on IMA_ARCH_POLICY + help + Linux on POWER with firmware secure boot enabled needs to define + security policies to extend secure boot to the OS.This config + allows user to enable OS Secure Boot on PowerPC systems that + have firmware secure boot support. + endmenu config ISA_DMA_API diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile index 0ea6c4aa3a20..75c929b41341 100644 --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -131,6 +131,7 @@ ifdef CONFIG_IMA obj-y += ima_kexec.o endif endif +obj-$(CONFIG_PPC_SECURE_BOOT) += ima_arch.o obj-$(CONFIG_AUDIT)+= audit.o obj64-$(CONFIG_AUDIT) += compat_audit.o diff --git a/arch/powerpc/kernel/ima_arch.c b/arch/powerpc/kernel/ima_arch.c new file mode 100644 index ..1767bf6e6550 --- /dev/null +++ b/arch/powerpc/kernel/ima_arch.c @@ -0,0 +1,54 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2019 IBM Corporation + * Author: Nayna Jain + * + * ima_arch.c + * - initialize ima policies for PowerPC Secure Boot + */ + +#include +#include + +bool arch_ima_get_secureboot(void) +{ + bool sb_mode; + + sb_mode = get_powerpc_sb_mode(); + if (sb_mode) + return true; + else + return false; +} + +/* + * File signature verification is not needed, include only measurements + */ +static const char *const default_arch_rules[] = { + "measure func=KEXEC_KERNEL_CHECK template=ima-modsig", + "measure func=MODULE_CHECK template=ima-modsig", + NULL +}; + +/* Both file signature verification and measurements are needed */ +static const char *const sb_arch_rules[] = { + "measure func=KEXEC_KERNEL_CHECK template=ima-modsig", + "measure func=MODULE_CHECK template=ima-modsig", + "appraise func=KEXEC_KERNEL_CHECK appraise_type=imasig|modsig template=ima-modsig", +#if !IS_ENABLED(CONFIG_MODULE_SIG) + "appraise func=MODULE_CHECK appraise_type=imasig|modsig template=ima-modsig", +#endif + NULL +}; + +/* + * On PowerPC, file measurements are to be added to the IMA measurement list + * irrespective of the secure boot state of the system. Signature verification + * is conditionally enabled based on the secure boot state. 
+ */ +const char *const *arch_get_ima_policy(void) +{ + if (IS_ENABLED(CONFIG_IMA_ARCH_POLICY) && arch_ima_get_secureboot()) + return sb_arch_rules; + return default_arch_rules; +} diff --git a/include/linux/ima.h b/include/linux/ima.h index fd9f7cf4cdf5..a01df076ecae 100644 --- a/include/linux/ima.h +++ b/include/linux/ima.h @@ -31,7 +31,8 @@ extern void ima_post_path_mknod(struct dentry *dentry); extern void ima_add_kexec_buffer(struct kimage *image); #endif -#if (defined(CONFIG_X86) && defined(CONFIG_EFI)) || defined(CONFIG_S390) +#if (defined(CONFIG_X86) && defined(CONFIG_EFI)) || defined(CONFIG_S390) \ + || defined(CONFIG_PPC_SECURE_BOOT) extern bool arch_ima_get_secureboot(void); extern const char * const *arch_get_ima_policy(void); #else -- 2.20.1
[PATCH v4 1/3] powerpc/powernv: Add OPAL API interface to get secureboot state
From: Claudio Carvalho The X.509 certificates trusted by the platform and other information required to secure boot the OS kernel are wrapped in secure variables, which are controlled by OPAL. This patch adds support to read OPAL secure variables through OPAL_SECVAR_GET call. It returns the metadata and data for a given secure variable based on the unique key. Since OPAL can support different types of backend which can vary in the variable interpretation, a new OPAL API call named OPAL_SECVAR_BACKEND, is added to retrieve the supported backend version. This helps the consumer to know how to interpret the variable. This support can be enabled using CONFIG_OPAL_SECVAR Signed-off-by: Claudio Carvalho Signed-off-by: Nayna Jain --- This patch depends on a new OPAL call that is being added to skiboot. The patch set that implements the new call has been posted to https://patchwork.ozlabs.org/project/skiboot/list/?series=112868 arch/powerpc/include/asm/opal-api.h | 4 +- arch/powerpc/include/asm/opal-secvar.h | 23 ++ arch/powerpc/include/asm/opal.h | 6 ++ arch/powerpc/platforms/powernv/Kconfig | 6 ++ arch/powerpc/platforms/powernv/Makefile | 1 + arch/powerpc/platforms/powernv/opal-call.c | 2 + arch/powerpc/platforms/powernv/opal-secvar.c | 85 7 files changed, 126 insertions(+), 1 deletion(-) create mode 100644 arch/powerpc/include/asm/opal-secvar.h create mode 100644 arch/powerpc/platforms/powernv/opal-secvar.c diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h index e1577cfa7186..a505e669b4b6 100644 --- a/arch/powerpc/include/asm/opal-api.h +++ b/arch/powerpc/include/asm/opal-api.h @@ -212,7 +212,9 @@ #define OPAL_HANDLE_HMI2 166 #defineOPAL_NX_COPROC_INIT 167 #define OPAL_XIVE_GET_VP_STATE 170 -#define OPAL_LAST 170 +#define OPAL_SECVAR_GET 173 +#define OPAL_SECVAR_BACKEND 177 +#define OPAL_LAST 177 #define QUIESCE_HOLD 1 /* Spin all calls at entry */ #define QUIESCE_REJECT 2 /* Fail all calls with OPAL_BUSY */ diff --git a/arch/powerpc/include/asm/opal-secvar.h b/arch/powerpc/include/asm/opal-secvar.h new file mode 100644 index ..b677171a0368 --- /dev/null +++ b/arch/powerpc/include/asm/opal-secvar.h @@ -0,0 +1,23 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * PowerNV definitions for secure variables OPAL API. 
+ * + * Copyright (C) 2019 IBM Corporation + * Author: Claudio Carvalho + * + */ +#ifndef OPAL_SECVAR_H +#define OPAL_SECVAR_H + +enum { + BACKEND_NONE = 0, + BACKEND_TC_COMPAT_V1, +}; + +extern int opal_get_variable(u8 *key, unsigned long ksize, +u8 *metadata, unsigned long *mdsize, +u8 *data, unsigned long *dsize); + +extern int opal_variable_version(unsigned long *backend); + +#endif diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h index 4cc37e708bc7..57d2c2356eda 100644 --- a/arch/powerpc/include/asm/opal.h +++ b/arch/powerpc/include/asm/opal.h @@ -394,6 +394,12 @@ void opal_powercap_init(void); void opal_psr_init(void); void opal_sensor_groups_init(void); +extern int opal_secvar_get(uint64_t k_key, uint64_t k_key_len, + uint64_t k_metadata, uint64_t k_metadata_size, + uint64_t k_data, uint64_t k_data_size); + +extern int opal_secvar_backend(uint64_t k_backend); + #endif /* __ASSEMBLY__ */ #endif /* _ASM_POWERPC_OPAL_H */ diff --git a/arch/powerpc/platforms/powernv/Kconfig b/arch/powerpc/platforms/powernv/Kconfig index 850eee860cf2..65b060539b5c 100644 --- a/arch/powerpc/platforms/powernv/Kconfig +++ b/arch/powerpc/platforms/powernv/Kconfig @@ -47,3 +47,9 @@ config PPC_VAS VAS adapters are found in POWER9 based systems. If unsure, say N. + +config OPAL_SECVAR + bool "OPAL Secure Variables" + depends on PPC_POWERNV + help + This enables the kernel to access OPAL secure variables. diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile index da2e99efbd04..6651c742e530 100644 --- a/arch/powerpc/platforms/powernv/Makefile +++ b/arch/powerpc/platforms/powernv/Makefile @@ -16,3 +16,4 @@ obj-$(CONFIG_PERF_EVENTS) += opal-imc.o obj-$(CONFIG_PPC_MEMTRACE) += memtrace.o obj-$(CONFIG_PPC_VAS) += vas.o vas-window.o vas-debug.o obj-$(CONFIG_OCXL_BASE)+= ocxl.o +obj-$(CONFIG_OPAL_SECVAR) += opal-secvar.o diff --git a/arch/powerpc/platforms/powernv/opal-call.c b/arch/powerpc/platforms/powernv/opal-call.c index 36c8fa3647a2..0445980f294f 100644 --- a/arch/powerpc/platforms/powernv/opal-call.c +++ b/arch/powerpc/platforms/powernv/opal-call.c @@ -288,3 +288,5 @@ OPAL_CALL(opal_pci_set_pbcq_tunnel_bar,
[PATCH v4 0/3] powerpc: Enabling IMA arch specific secure boot policies
This patch set, previously named "powerpc: Enabling secure boot on powernv systems - Part 1", is part of a series that implements secure boot on PowerNV systems. In order to verify the OS kernel on PowerNV, secure boot requires X.509 certificates trusted by the platform, the secure boot modes, and several other pieces of information. These are stored in secure variables controlled by OPAL, also known as OPAL secure variables. The IMA architecture specific policy support on POWER is dependent on OPAL runtime services to access secure variables. OPAL APIs in skiboot are modified to define generic interface compatible to any backend. This patchset is consequently updated to be compatible with new OPAL API interface. This has cleaned up any EFIsms in the arch specific code. Further, the ima arch specific policies are updated to be able to support appended signatures. They also now use per policy template. Exposing the OPAL secure variables to userspace will be posted as a separate patch set, allowing the IMA architecture specific policy on POWER to be upstreamed independently. This patch set adds the following features: 1. Add support for OPAL Runtime API to access secure variables controlled by OPAL. 2. Define IMA arch-specific policies based on the secure boot state and mode of the system. On secure boot enabled PowerNV systems, the OS kernel signature will be verified by IMA appraisal. Pre-requisites for this patchset are: 1. OPAL APIs in Skiboot[1] 2. Appended signature support in IMA [2] 3. Per policy template support in IMA [3] [1] https://patchwork.ozlabs.org/project/skiboot/list/?series=112868 [2] https://patchwork.ozlabs.org/cover/1087361/. Updated version will be posted soon [3] Repo: https://kernel.googlesource.com/pub/scm/linux/kernel/git/zohar/linux-integrity Branch: next-queued-testing. Commit: f241bb1f42aa95 -- Original Cover Letter: This patch set is part of a series that implements secure boot on PowerNV systems. In order to verify the OS kernel on PowerNV, secure boot requires X.509 certificates trusted by the platform, the secure boot modes, and several other pieces of information. These are stored in secure variables controlled by OPAL, also known as OPAL secure variables. The IMA architecture specific policy support on Power is dependent on OPAL runtime services to access secure variables. Instead of directly accessing the OPAL runtime services, version 3 of this patch set relied upon the EFI hooks. This version drops that dependency and calls the OPAL runtime services directly. Skiboot OPAL APIs are due to be posted soon. Exposing the OPAL secure variables to userspace will be posted as a separate patch set, allowing the IMA architecture specific policy on Power to be upstreamed independently. This patch set adds the following features: 1. Add support for OPAL Runtime API to access secure variables controlled by OPAL. 2. Define IMA arch-specific policies based on the secure boot state and mode of the system. On secure boot enabled powernv systems, the OS kernel signature will be verified by IMA appraisal. [1] https://patchwork.kernel.org/cover/10882149/ Changelog: v4: * Fixed the build issue as reported by Satheesh Rajendran. v3: * OPAL APIs in Patch 1 are updated to provide generic interface based on key/keylen. This patchset updates kernel OPAL APIs to be compatible with generic interface. * Patch 2 is cleaned up to use new OPAL APIs. 
* Since OPAL can support different types of backend which can vary in the variable interpretation, the Patch 2 is updated to add a check for the backend version * OPAL API now expects consumer to first check the supported backend version before calling other secvar OPAL APIs. This check is now added in patch 2. * IMA policies in Patch 3 is updated to specify appended signature and per policy template. * The patches now are free of any EFIisms. v2: * Removed Patch 1: powerpc/include: Override unneeded early ioremap functions * Updated Subject line and patch description of the Patch 1 of this series * Removed dependency of OPAL_SECVAR on EFI, CPU_BIG_ENDIAN and UCS2_STRING * Changed OPAL APIs from static to non-static. Added opal-secvar.h for the same * Removed EFI hooks from opal_secvar.c * Removed opal_secvar_get_next(), opal_secvar_enqueue() and opal_query_variable_info() function * get_powerpc_sb_mode() in secboot.c now directly calls OPAL Runtime API rather than via EFI hooks. * Fixed log messages in get_powerpc_sb_mode() function. * Added dependency for PPC_SECURE_BOOT on configs PPC64 and OPAL_SECVAR * Replaced obj-$(CONFIG_IMA) with obj-$(CONFIG_PPC_SECURE_BOOT) in arch/powerpc/kernel/Makefile Claudio Carvalho (1): powerpc/powernv: Add OPAL API interface to get secureboot state Nayna Jain (2): powerpc/powernv: detect the secure boot mode of the system powerpc: Add support to initialize ima policy rules arch/powerpc/Kconfig
Re: [PATCH v3 06/20] docs: mark orphan documents as such
Em Tue, 11 Jun 2019 19:52:04 +0300 Andy Shevchenko escreveu: > On Fri, Jun 7, 2019 at 10:04 PM Mauro Carvalho Chehab > wrote: > > Sphinx doesn't like orphan documents: > > > Documentation/laptops/lg-laptop.rst: WARNING: document isn't included > > in any toctree > > > Documentation/laptops/lg-laptop.rst | 2 ++ > > > diff --git a/Documentation/laptops/lg-laptop.rst > > b/Documentation/laptops/lg-laptop.rst > > index aa503ee9b3bc..f2c2ffe31101 100644 > > --- a/Documentation/laptops/lg-laptop.rst > > +++ b/Documentation/laptops/lg-laptop.rst > > @@ -1,5 +1,7 @@ > > .. SPDX-License-Identifier: GPL-2.0+ > > > > +:orphan: > > + > > LG Gram laptop extra features > > = > > > > Can we rather create a toc tree there? > It was a first document in reST format in that folder. Sure, but: 1) I have a patch converting the other files on this dir to rst: https://git.linuxtv.org/mchehab/experimental.git/commit/?h=convert_rst_renames_v4.1&id=abc13233035fdfdbc5ef2f2fbd3d127a1ab15530 2) It probably makes sense to move the entire dir to Documentation/admin-guide. So, I would prefer to have the :orphan: here while (1) is not merged. Thanks, Mauro
Re: [PATCH v3 06/20] docs: mark orphan documents as such
On Fri, Jun 7, 2019 at 10:04 PM Mauro Carvalho Chehab wrote: > Sphinx doesn't like orphan documents: > Documentation/laptops/lg-laptop.rst: WARNING: document isn't included in > any toctree > Documentation/laptops/lg-laptop.rst | 2 ++ > diff --git a/Documentation/laptops/lg-laptop.rst > b/Documentation/laptops/lg-laptop.rst > index aa503ee9b3bc..f2c2ffe31101 100644 > --- a/Documentation/laptops/lg-laptop.rst > +++ b/Documentation/laptops/lg-laptop.rst > @@ -1,5 +1,7 @@ > .. SPDX-License-Identifier: GPL-2.0+ > > +:orphan: > + > LG Gram laptop extra features > = > Can we rather create a toc tree there? It was a first document in reST format in that folder. -- With Best Regards, Andy Shevchenko
Re: [PATCH] powerpc/32s: fix initial setup of segment registers on secondary CPU
Le 11/06/2019 à 17:47, Christophe Leroy a écrit : The patch referenced below moved the loading of segment registers out of load_up_mmu() in order to do it earlier in the boot sequence. However, the secondary CPU still needs it to be done when loading up the MMU. Reported-by: Erhard F. Fixes: 215b823707ce ("powerpc/32s: set up an early static hash table for KASAN") Cc: sta...@vger.kernel.org Signed-off-by: Christophe Leroy --- arch/powerpc/kernel/head_32.S | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S index 1d5f1bd0dacd..f255e22184b4 100644 --- a/arch/powerpc/kernel/head_32.S +++ b/arch/powerpc/kernel/head_32.S @@ -752,6 +752,7 @@ __secondary_start: stw r0,0(r3) /* load up the MMU */ + bl load_segment_registers bl load_up_mmu /* ptr to phys current thread */
Re: [PATCH v2 0/4] Additional fixes on Talitos driver
On 6/11/2019 6:40 PM, Christophe Leroy wrote: > > > Le 11/06/2019 à 17:37, Horia Geanta a écrit : >> On 6/11/2019 5:39 PM, Christophe Leroy wrote: >>> This series is the last set of fixes for the Talitos driver. >>> >>> We now get a fully clean boot on both SEC1 (SEC1.2 on mpc885) and >>> SEC2 (SEC2.2 on mpc8321E) with CONFIG_CRYPTO_MANAGER_EXTRA_TESTS: >>> >> I am getting below failures on a sec 3.3.2 (p1020rdb) for hmac(sha384) and >> hmac(sha512): > > Is that new with this series or did you already have it before ? > Looks like this happens with or without this series. I haven't checked the state of this driver for quite some time. Since I've noticed increased activity, I thought it would be worth actually testing the changes. Are changes in patch 2/4 ("crypto: talitos - fix hash on SEC1.") strictly for sec 1.x or they affect all revisions? > What do you mean by "fuzz testing" enabled ? Is that > CONFIG_CRYPTO_MANAGER_EXTRA_TESTS or something else ? > Yes, it's this config symbol. Horia
[PATCH] powerpc/32s: fix initial setup of segment registers on secondary CPU
The patch referenced below moved the loading of segment registers out of load_up_mmu() in order to do it earlier in the boot sequence. However, the secondary CPU still needs it to be done when loading up the MMU. Reported-by: Erhard F. Fixes: 215b823707ce ("powerpc/32s: set up an early static hash table for KASAN") Signed-off-by: Christophe Leroy --- arch/powerpc/kernel/head_32.S | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S index 1d5f1bd0dacd..f255e22184b4 100644 --- a/arch/powerpc/kernel/head_32.S +++ b/arch/powerpc/kernel/head_32.S @@ -752,6 +752,7 @@ __secondary_start: stw r0,0(r3) /* load up the MMU */ + bl load_segment_registers bl load_up_mmu /* ptr to phys current thread */ -- 2.13.3
Re: [PATCH v2 0/4] Additional fixes on Talitos driver
Le 11/06/2019 à 17:37, Horia Geanta a écrit : On 6/11/2019 5:39 PM, Christophe Leroy wrote: This series is the last set of fixes for the Talitos driver. We now get a fully clean boot on both SEC1 (SEC1.2 on mpc885) and SEC2 (SEC2.2 on mpc8321E) with CONFIG_CRYPTO_MANAGER_EXTRA_TESTS: I am getting below failures on a sec 3.3.2 (p1020rdb) for hmac(sha384) and hmac(sha512): Is that new with this series or did you already have it before ? What do you mean by "fuzz testing" enabled ? Is that CONFIG_CRYPTO_MANAGER_EXTRA_TESTS or something else ? Christophe alg: ahash: hmac-sha384-talitos test failed (wrong result) on test vector "random: psize=2497 ksize=124", cfg="random: inplace use_finup nosimd src_divs=[76.49%@+4002, 23.51%@alignmask+26] iv_offset=4" alg: ahash: hmac-sha512-talitos test failed (wrong result) on test vector "random: psize=27 ksize=121", cfg="random: inplace may_sleep use_digest src_divs=[100.0%@+10] iv_offset=9" Reproducibility rate is 100% so far, here are a few more runs - they might help finding a pattern: 1. alg: ahash: hmac-sha384-talitos test failed (wrong result) on test vector "random: psize=184 ksize=121", cfg="random: use_finup src_divs=[100.0%@+3988] dst_divs=[100.0%@+547] iv_offset=44" alg: ahash: hmac-sha512-talitos test failed (wrong result) on test vector "random: psize=7 ksize=122", cfg="random: may_sleep use_digest src_divs=[100.0%@+3968] dst_divs=[100.0%@+20]" 2. alg: ahash: hmac-sha384-talitos test failed (wrong result) on test vector "random: psize=6481 ksize=120", cfg="random: use_final src_divs=[100.0%@+6] dst_divs=[43.84%@alignmask+6, 56.16%@+22]" alg: ahash: hmac-sha512-talitos test failed (wrong result) on test vector "random: psize=635 ksize=128", cfg="random: may_sleep use_finup src_divs=[100.0%@+4062] dst_divs=[20.47%@+2509, 72.36%@alignmask+2, 7.17%@alignmask+3990]" 3. alg: ahash: hmac-sha384-talitos test failed (wrong result) on test vector "random: psize=2428 ksize=127", cfg="random: may_sleep use_finup src_divs=[35.19%@+18, 64.81%@+1755] dst_divs=[100.0%@+111] iv_offset=5" alg: ahash: hmac-sha512-talitos test failed (wrong result) on test vector "random: psize=4345 ksize=128", cfg="random: may_sleep use_digest src_divs=[100.0%@+2820] iv_offset=59" If you run several times with fuzz testing enabled on your sec2.2, are you able to see similar failures? Thanks, Horia
Re: [PATCH v2 0/4] Additional fixes on Talitos driver
On 6/11/2019 5:39 PM, Christophe Leroy wrote: > This series is the last set of fixes for the Talitos driver. > > We now get a fully clean boot on both SEC1 (SEC1.2 on mpc885) and > SEC2 (SEC2.2 on mpc8321E) with CONFIG_CRYPTO_MANAGER_EXTRA_TESTS: > I am getting below failures on a sec 3.3.2 (p1020rdb) for hmac(sha384) and hmac(sha512): alg: ahash: hmac-sha384-talitos test failed (wrong result) on test vector "random: psize=2497 ksize=124", cfg="random: inplace use_finup nosimd src_divs=[76.49%@+4002, 23.51%@alignmask+26] iv_offset=4" alg: ahash: hmac-sha512-talitos test failed (wrong result) on test vector "random: psize=27 ksize=121", cfg="random: inplace may_sleep use_digest src_divs=[100.0%@+10] iv_offset=9" Reproducibility rate is 100% so far, here are a few more runs - they might help finding a pattern: 1. alg: ahash: hmac-sha384-talitos test failed (wrong result) on test vector "random: psize=184 ksize=121", cfg="random: use_finup src_divs=[100.0%@+3988] dst_divs=[100.0%@+547] iv_offset=44" alg: ahash: hmac-sha512-talitos test failed (wrong result) on test vector "random: psize=7 ksize=122", cfg="random: may_sleep use_digest src_divs=[100.0%@+3968] dst_divs=[100.0%@+20]" 2. alg: ahash: hmac-sha384-talitos test failed (wrong result) on test vector "random: psize=6481 ksize=120", cfg="random: use_final src_divs=[100.0%@+6] dst_divs=[43.84%@alignmask+6, 56.16%@+22]" alg: ahash: hmac-sha512-talitos test failed (wrong result) on test vector "random: psize=635 ksize=128", cfg="random: may_sleep use_finup src_divs=[100.0%@+4062] dst_divs=[20.47%@+2509, 72.36%@alignmask+2, 7.17%@alignmask+3990]" 3. alg: ahash: hmac-sha384-talitos test failed (wrong result) on test vector "random: psize=2428 ksize=127", cfg="random: may_sleep use_finup src_divs=[35.19%@+18, 64.81%@+1755] dst_divs=[100.0%@+111] iv_offset=5" alg: ahash: hmac-sha512-talitos test failed (wrong result) on test vector "random: psize=4345 ksize=128", cfg="random: may_sleep use_digest src_divs=[100.0%@+2820] iv_offset=59" If you run several times with fuzz testing enabled on your sec2.2, are you able to see similar failures? Thanks, Horia
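As an aside for anyone wanting to reproduce one of the failing digests outside the self-test harness, the sketch below drives a single hmac(sha384) computation through the kernel's asynchronous hash API, which is the interface the talitos implementations are registered under. It is only an illustrative sketch: the function name is made up, the buffers are assumed to be kmalloc'ed so they can be scatterlisted, and whether "hmac(sha384)" actually resolves to the hardware implementation depends on algorithm priorities (requesting the driver name from the logs, e.g. "hmac-sha384-talitos", forces the selection).

#include <crypto/hash.h>
#include <linux/crypto.h>
#include <linux/err.h>
#include <linux/scatterlist.h>

/*
 * Illustrative sketch: compute one hmac(sha384) digest via the async
 * hash API. 'key' and 'msg' must be kmalloc'ed buffers; 'out' receives
 * the 48-byte digest.
 */
static int example_hmac_sha384_digest(const u8 *key, unsigned int klen,
				      const u8 *msg, unsigned int mlen,
				      u8 *out)
{
	struct crypto_ahash *tfm;
	struct ahash_request *req;
	struct scatterlist sg;
	DECLARE_CRYPTO_WAIT(wait);
	int err;

	tfm = crypto_alloc_ahash("hmac(sha384)", 0, 0);
	if (IS_ERR(tfm))
		return PTR_ERR(tfm);

	err = crypto_ahash_setkey(tfm, key, klen);
	if (err)
		goto free_tfm;

	req = ahash_request_alloc(tfm, GFP_KERNEL);
	if (!req) {
		err = -ENOMEM;
		goto free_tfm;
	}

	sg_init_one(&sg, msg, mlen);
	ahash_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
				   crypto_req_done, &wait);
	ahash_request_set_crypt(req, &sg, out, mlen);

	/* Wait for the (possibly asynchronous) hardware completion. */
	err = crypto_wait_req(crypto_ahash_digest(req), &wait);

	ahash_request_free(req);
free_tfm:
	crypto_free_ahash(tfm);
	return err;
}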
[PATCH 16/16] mm: pass get_user_pages_fast iterator arguments in a structure
Instead of passing a set of always repeated arguments down the get_user_pages_fast iterators, create a struct gup_args to hold them and pass that by reference. This leads to an over 100 byte .text size reduction for x86-64. Signed-off-by: Christoph Hellwig --- mm/gup.c | 338 ++- 1 file changed, 158 insertions(+), 180 deletions(-) diff --git a/mm/gup.c b/mm/gup.c index 8bcc042f933a..419a565fc998 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -24,6 +24,13 @@ #include "internal.h" +struct gup_args { + unsigned long addr; + unsigned intflags; + struct page **pages; + unsigned intnr; +}; + struct follow_page_context { struct dev_pagemap *pgmap; unsigned int page_mask; @@ -1786,10 +1793,10 @@ static inline pte_t gup_get_pte(pte_t *ptep) } #endif /* CONFIG_GUP_GET_PTE_LOW_HIGH */ -static void undo_dev_pagemap(int *nr, int nr_start, struct page **pages) +static void undo_dev_pagemap(struct gup_args *args, int nr_start) { - while ((*nr) - nr_start) { - struct page *page = pages[--(*nr)]; + while (args->nr - nr_start) { + struct page *page = args->pages[--args->nr]; ClearPageReferenced(page); put_page(page); @@ -1811,14 +1818,13 @@ static inline struct page *try_get_compound_head(struct page *page, int refs) } #ifdef CONFIG_ARCH_HAS_PTE_SPECIAL -static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end, -unsigned int flags, struct page **pages, int *nr) +static int gup_pte_range(struct gup_args *args, pmd_t pmd, unsigned long end) { struct dev_pagemap *pgmap = NULL; - int nr_start = *nr, ret = 0; + int nr_start = args->nr, ret = 0; pte_t *ptep, *ptem; - ptem = ptep = pte_offset_map(&pmd, addr); + ptem = ptep = pte_offset_map(&pmd, args->addr); do { pte_t pte = gup_get_pte(ptep); struct page *head, *page; @@ -1830,16 +1836,16 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end, if (pte_protnone(pte)) goto pte_unmap; - if (!pte_access_permitted(pte, flags & FOLL_WRITE)) + if (!pte_access_permitted(pte, args->flags & FOLL_WRITE)) goto pte_unmap; if (pte_devmap(pte)) { - if (unlikely(flags & FOLL_LONGTERM)) + if (unlikely(args->flags & FOLL_LONGTERM)) goto pte_unmap; pgmap = get_dev_pagemap(pte_pfn(pte), pgmap); if (unlikely(!pgmap)) { - undo_dev_pagemap(nr, nr_start, pages); + undo_dev_pagemap(args, nr_start); goto pte_unmap; } } else if (pte_special(pte)) @@ -1860,10 +1866,8 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end, VM_BUG_ON_PAGE(compound_head(page) != head, page); SetPageReferenced(page); - pages[*nr] = page; - (*nr)++; - - } while (ptep++, addr += PAGE_SIZE, addr != end); + args->pages[args->nr++] = page; + } while (ptep++, args->addr += PAGE_SIZE, args->addr != end); ret = 1; @@ -1884,18 +1888,17 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end, * __get_user_pages_fast implementation that can pin pages. Thus it's still * useful to have gup_huge_pmd even if we can't operate on ptes. 
*/ -static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end, -unsigned int flags, struct page **pages, int *nr) +static int gup_pte_range(struct gup_args *args, pmd_t pmd, unsigned long end) { return 0; } #endif /* CONFIG_ARCH_HAS_PTE_SPECIAL */ #if defined(__HAVE_ARCH_PTE_DEVMAP) && defined(CONFIG_TRANSPARENT_HUGEPAGE) -static int __gup_device_huge(unsigned long pfn, unsigned long addr, - unsigned long end, struct page **pages, int *nr) +static int __gup_device_huge(struct gup_args *args, unsigned long pfn, + unsigned long end) { - int nr_start = *nr; + int nr_start = args->nr; struct dev_pagemap *pgmap = NULL; do { @@ -1903,64 +1906,63 @@ static int __gup_device_huge(unsigned long pfn, unsigned long addr, pgmap = get_dev_pagemap(pfn, pgmap); if (unlikely(!pgmap)) { - undo_dev_pagemap(nr, nr_start, pages); + undo_dev_pagemap(args, nr_start); return 0; } SetPageReferenced(page); - pages[*nr] = page; + args->pages[args->nr++] = page; get_page(page);
[PATCH 15/16] mm: mark the page referenced in gup_hugepte
All other get_user_page_fast cases mark the page referenced, so do this here as well. Signed-off-by: Christoph Hellwig --- mm/gup.c | 1 + 1 file changed, 1 insertion(+) diff --git a/mm/gup.c b/mm/gup.c index 0733674b539d..8bcc042f933a 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -2021,6 +2021,7 @@ static int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr, return 0; } + SetPageReferenced(head); return 1; } -- 2.20.1
[PATCH 13/16] mm: move the powerpc hugepd code to mm/gup.c
While only powerpc supports the hugepd case, the code is pretty generic and I'd like to keep all GUP internals in one place. Signed-off-by: Christoph Hellwig --- arch/powerpc/Kconfig | 1 + arch/powerpc/mm/hugetlbpage.c | 72 -- include/linux/hugetlb.h | 18 mm/Kconfig| 10 + mm/gup.c | 82 +++ 5 files changed, 93 insertions(+), 90 deletions(-) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 992a04796e56..4f1b00979cde 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -125,6 +125,7 @@ config PPC select ARCH_HAS_FORTIFY_SOURCE select ARCH_HAS_GCOV_PROFILE_ALL select ARCH_HAS_KCOV + select ARCH_HAS_HUGEPD if HUGETLB_PAGE select ARCH_HAS_MMIOWB if PPC64 select ARCH_HAS_PHYS_TO_DMA select ARCH_HAS_PMEM_APIif PPC64 diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c index b5d92dc32844..51716c11d0fb 100644 --- a/arch/powerpc/mm/hugetlbpage.c +++ b/arch/powerpc/mm/hugetlbpage.c @@ -511,13 +511,6 @@ struct page *follow_huge_pd(struct vm_area_struct *vma, return page; } -static unsigned long hugepte_addr_end(unsigned long addr, unsigned long end, - unsigned long sz) -{ - unsigned long __boundary = (addr + sz) & ~(sz-1); - return (__boundary - 1 < end - 1) ? __boundary : end; -} - #ifdef CONFIG_PPC_MM_SLICES unsigned long hugetlb_get_unmapped_area(struct file *file, unsigned long addr, unsigned long len, unsigned long pgoff, @@ -665,68 +658,3 @@ void flush_dcache_icache_hugepage(struct page *page) } } } - -static int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr, - unsigned long end, int write, struct page **pages, int *nr) -{ - unsigned long pte_end; - struct page *head, *page; - pte_t pte; - int refs; - - pte_end = (addr + sz) & ~(sz-1); - if (pte_end < end) - end = pte_end; - - pte = READ_ONCE(*ptep); - - if (!pte_access_permitted(pte, write)) - return 0; - - /* hugepages are never "special" */ - VM_BUG_ON(!pfn_valid(pte_pfn(pte))); - - refs = 0; - head = pte_page(pte); - - page = head + ((addr & (sz-1)) >> PAGE_SHIFT); - do { - VM_BUG_ON(compound_head(page) != head); - pages[*nr] = page; - (*nr)++; - page++; - refs++; - } while (addr += PAGE_SIZE, addr != end); - - if (!page_cache_add_speculative(head, refs)) { - *nr -= refs; - return 0; - } - - if (unlikely(pte_val(pte) != pte_val(*ptep))) { - /* Could be optimized better */ - *nr -= refs; - while (refs--) - put_page(head); - return 0; - } - - return 1; -} - -int gup_huge_pd(hugepd_t hugepd, unsigned long addr, unsigned int pdshift, - unsigned long end, int write, struct page **pages, int *nr) -{ - pte_t *ptep; - unsigned long sz = 1UL << hugepd_shift(hugepd); - unsigned long next; - - ptep = hugepte_offset(hugepd, addr, pdshift); - do { - next = hugepte_addr_end(addr, end, sz); - if (!gup_hugepte(ptep, sz, addr, end, write, pages, nr)) - return 0; - } while (ptep++, addr = next, addr != end); - - return 1; -} diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index edf476c8cfb9..0f91761e2c53 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -16,29 +16,11 @@ struct user_struct; struct mmu_gather; #ifndef is_hugepd -/* - * Some architectures requires a hugepage directory format that is - * required to support multiple hugepage sizes. For example - * a4fe3ce76 "powerpc/mm: Allow more flexible layouts for hugepage pagetables" - * introduced the same on powerpc. This allows for a more flexible hugepage - * pagetable layout. 
- */ typedef struct { unsigned long pd; } hugepd_t; #define is_hugepd(hugepd) (0) #define __hugepd(x) ((hugepd_t) { (x) }) -static inline int gup_huge_pd(hugepd_t hugepd, unsigned long addr, - unsigned pdshift, unsigned long end, - int write, struct page **pages, int *nr) -{ - return 0; -} -#else -extern int gup_huge_pd(hugepd_t hugepd, unsigned long addr, - unsigned pdshift, unsigned long end, - int write, struct page **pages, int *nr); #endif - #ifdef CONFIG_HUGETLB_PAGE #include diff --git a/mm/Kconfig b/mm/Kconfig index 5c41409557da..44be3f01a2b2 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -769,4 +769,14 @@ config GUP_GET_PTE_L
[PATCH 14/16] mm: switch gup_hugepte to use try_get_compound_head
This applies the overflow fixes from 8fde12ca79aff ("mm: prevent get_user_pages() from overflowing page refcount") to the powerpc hugepd code and brings it back in sync with the other GUP cases. Signed-off-by: Christoph Hellwig --- mm/gup.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/mm/gup.c b/mm/gup.c index 494aa4c3a55e..0733674b539d 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -2007,7 +2007,8 @@ static int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr, refs++; } while (addr += PAGE_SIZE, addr != end); - if (!page_cache_add_speculative(head, refs)) { + head = try_get_compound_head(head, refs); + if (!head) { *nr -= refs; return 0; } -- 2.20.1
[PATCH 11/16] mm: consolidate the get_user_pages* implementations
Always build mm/gup.c, and move the nommu versions and replace the separate stubs for various functions by the default ones, with the _fast version always falling back to the slow path because gup_fast_permitted always returns false now if HAVE_FAST_GUP is not set, and we use the nommu version of __get_user_pages while keeping all the wrappers common. This also ensures the new put_user_pages* helpers are available for nommu, as those are currently missing, which would create a problem as soon as we actually grew users for it. Signed-off-by: Christoph Hellwig --- mm/Kconfig | 1 + mm/Makefile | 4 +- mm/gup.c| 476 +--- mm/nommu.c | 88 -- mm/util.c | 47 -- 5 files changed, 269 insertions(+), 347 deletions(-) diff --git a/mm/Kconfig b/mm/Kconfig index 98dffb0f2447..5c41409557da 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -133,6 +133,7 @@ config HAVE_MEMBLOCK_PHYS_MAP bool config HAVE_FAST_GUP + depends on MMU bool config ARCH_KEEP_MEMBLOCK diff --git a/mm/Makefile b/mm/Makefile index ac5e5ba78874..dc0746ca1109 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -22,7 +22,7 @@ KCOV_INSTRUMENT_mmzone.o := n KCOV_INSTRUMENT_vmstat.o := n mmu-y := nommu.o -mmu-$(CONFIG_MMU) := gup.o highmem.o memory.o mincore.o \ +mmu-$(CONFIG_MMU) := highmem.o memory.o mincore.o \ mlock.o mmap.o mmu_gather.o mprotect.o mremap.o \ msync.o page_vma_mapped.o pagewalk.o \ pgtable-generic.o rmap.o vmalloc.o @@ -39,7 +39,7 @@ obj-y := filemap.o mempool.o oom_kill.o fadvise.o \ mm_init.o mmu_context.o percpu.o slab_common.o \ compaction.o vmacache.o \ interval_tree.o list_lru.o workingset.o \ - debug.o $(mmu-y) + debug.o gup.o $(mmu-y) # Give 'page_alloc' its own module-parameter namespace page-alloc-y := page_alloc.o diff --git a/mm/gup.c b/mm/gup.c index 7328890ad8d3..fe4f205651fd 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -134,6 +134,7 @@ void put_user_pages(struct page **pages, unsigned long npages) } EXPORT_SYMBOL(put_user_pages); +#ifdef CONFIG_MMU static struct page *no_page_table(struct vm_area_struct *vma, unsigned int flags) { @@ -1100,86 +1101,6 @@ static __always_inline long __get_user_pages_locked(struct task_struct *tsk, return pages_done; } -/* - * We can leverage the VM_FAULT_RETRY functionality in the page fault - * paths better by using either get_user_pages_locked() or - * get_user_pages_unlocked(). - * - * get_user_pages_locked() is suitable to replace the form: - * - * down_read(&mm->mmap_sem); - * do_something() - * get_user_pages(tsk, mm, ..., pages, NULL); - * up_read(&mm->mmap_sem); - * - * to: - * - * int locked = 1; - * down_read(&mm->mmap_sem); - * do_something() - * get_user_pages_locked(tsk, mm, ..., pages, &locked); - * if (locked) - * up_read(&mm->mmap_sem); - */ -long get_user_pages_locked(unsigned long start, unsigned long nr_pages, - unsigned int gup_flags, struct page **pages, - int *locked) -{ - /* -* FIXME: Current FOLL_LONGTERM behavior is incompatible with -* FAULT_FLAG_ALLOW_RETRY because of the FS DAX check requirement on -* vmas. As there are no users of this flag in this call we simply -* disallow this option for now. 
-*/ - if (WARN_ON_ONCE(gup_flags & FOLL_LONGTERM)) - return -EINVAL; - - return __get_user_pages_locked(current, current->mm, start, nr_pages, - pages, NULL, locked, - gup_flags | FOLL_TOUCH); -} -EXPORT_SYMBOL(get_user_pages_locked); - -/* - * get_user_pages_unlocked() is suitable to replace the form: - * - * down_read(&mm->mmap_sem); - * get_user_pages(tsk, mm, ..., pages, NULL); - * up_read(&mm->mmap_sem); - * - * with: - * - * get_user_pages_unlocked(tsk, mm, ..., pages); - * - * It is functionally equivalent to get_user_pages_fast so - * get_user_pages_fast should be used instead if specific gup_flags - * (e.g. FOLL_FORCE) are not required. - */ -long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages, -struct page **pages, unsigned int gup_flags) -{ - struct mm_struct *mm = current->mm; - int locked = 1; - long ret; - - /* -* FIXME: Current FOLL_LONGTERM behavior is incompatible with -* FAULT_FLAG_ALLOW_RETRY because of the FS DAX check requirement on -* vmas. As there are no users of this flag in this call we simply -* disallow this option for now. -*/ - if (WARN_ON_ONCE(gup_flags & FOLL_LONGTERM))
[PATCH 12/16] mm: validate get_user_pages_fast flags
We can only deal with FOLL_WRITE and/or FOLL_LONGTERM in get_user_pages_fast, so reject all other flags. Signed-off-by: Christoph Hellwig --- mm/gup.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/mm/gup.c b/mm/gup.c index fe4f205651fd..78dc1871b3d4 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -2317,6 +2317,9 @@ int get_user_pages_fast(unsigned long start, int nr_pages, unsigned long addr, len, end; int nr = 0, ret = 0; + if (WARN_ON_ONCE(gup_flags & ~(FOLL_WRITE | FOLL_LONGTERM))) + return -EINVAL; + start = untagged_addr(start) & PAGE_MASK; addr = start; len = (unsigned long) nr_pages << PAGE_SHIFT; -- 2.20.1
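To make the calling convention these patches consolidate and validate concrete, here is a hedged sketch of a typical driver-style caller: it pins a user buffer for write access with get_user_pages_fast() and releases the pins with the put_user_pages() helper mentioned earlier in the series. All names and the error handling are illustrative; a real caller also has to cope with a short pin (fewer pages returned than requested) and with len == 0.

#include <linux/mm.h>
#include <linux/slab.h>

/* Illustrative sketch only: pin 'len' bytes of user memory at 'uaddr'. */
static int example_pin_user_buffer(unsigned long uaddr, size_t len)
{
	unsigned long first = uaddr >> PAGE_SHIFT;
	unsigned long last = (uaddr + len - 1) >> PAGE_SHIFT;
	int npages = last - first + 1;
	struct page **pages;
	int pinned;

	pages = kvmalloc_array(npages, sizeof(*pages), GFP_KERNEL);
	if (!pages)
		return -ENOMEM;

	pinned = get_user_pages_fast(uaddr, npages, FOLL_WRITE, pages);
	if (pinned < 0) {
		kvfree(pages);
		return pinned;
	}

	/* ... access the pinned pages or hand them to hardware here ... */

	put_user_pages(pages, pinned);
	kvfree(pages);
	return 0;
}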
[PATCH 10/16] mm: rename CONFIG_HAVE_GENERIC_GUP to CONFIG_HAVE_FAST_GUP
We only support the generic GUP now, so rename the config option to be more clear, and always use the mm/Kconfig definition of the symbol and select it from the arch Kconfigs. Signed-off-by: Christoph Hellwig --- arch/arm/Kconfig | 5 + arch/arm64/Kconfig | 4 +--- arch/mips/Kconfig| 2 +- arch/powerpc/Kconfig | 2 +- arch/s390/Kconfig| 2 +- arch/sh/Kconfig | 2 +- arch/sparc/Kconfig | 2 +- arch/x86/Kconfig | 4 +--- mm/Kconfig | 2 +- mm/gup.c | 4 ++-- 10 files changed, 11 insertions(+), 18 deletions(-) diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 8869742a85df..3879a3e2c511 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -73,6 +73,7 @@ config ARM select HAVE_DYNAMIC_FTRACE_WITH_REGS if HAVE_DYNAMIC_FTRACE select HAVE_EFFICIENT_UNALIGNED_ACCESS if (CPU_V6 || CPU_V6K || CPU_V7) && MMU select HAVE_EXIT_THREAD + select HAVE_FAST_GUP if ARM_LPAE select HAVE_FTRACE_MCOUNT_RECORD if !XIP_KERNEL select HAVE_FUNCTION_GRAPH_TRACER if !THUMB2_KERNEL && !CC_IS_CLANG select HAVE_FUNCTION_TRACER if !XIP_KERNEL @@ -1596,10 +1597,6 @@ config ARCH_SELECT_MEMORY_MODEL config HAVE_ARCH_PFN_VALID def_bool ARCH_HAS_HOLES_MEMORYMODEL || !SPARSEMEM -config HAVE_GENERIC_GUP - def_bool y - depends on ARM_LPAE - config HIGHMEM bool "High Memory Support" depends on MMU diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 697ea0510729..4a6ee3e92757 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -140,6 +140,7 @@ config ARM64 select HAVE_DMA_CONTIGUOUS select HAVE_DYNAMIC_FTRACE select HAVE_EFFICIENT_UNALIGNED_ACCESS + select HAVE_FAST_GUP select HAVE_FTRACE_MCOUNT_RECORD select HAVE_FUNCTION_TRACER select HAVE_FUNCTION_GRAPH_TRACER @@ -262,9 +263,6 @@ config GENERIC_CALIBRATE_DELAY config ZONE_DMA32 def_bool y -config HAVE_GENERIC_GUP - def_bool y - config ARCH_ENABLE_MEMORY_HOTPLUG def_bool y diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig index 64108a2a16d4..b1e42f0e4ed0 100644 --- a/arch/mips/Kconfig +++ b/arch/mips/Kconfig @@ -54,10 +54,10 @@ config MIPS select HAVE_DMA_CONTIGUOUS select HAVE_DYNAMIC_FTRACE select HAVE_EXIT_THREAD + select HAVE_FAST_GUP select HAVE_FTRACE_MCOUNT_RECORD select HAVE_FUNCTION_GRAPH_TRACER select HAVE_FUNCTION_TRACER - select HAVE_GENERIC_GUP select HAVE_IDE select HAVE_IOREMAP_PROT select HAVE_IRQ_EXIT_ON_IRQ_STACK diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 8c1c636308c8..992a04796e56 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -185,12 +185,12 @@ config PPC select HAVE_DYNAMIC_FTRACE_WITH_REGSif MPROFILE_KERNEL select HAVE_EBPF_JITif PPC64 select HAVE_EFFICIENT_UNALIGNED_ACCESS if !(CPU_LITTLE_ENDIAN && POWER7_CPU) + select HAVE_FAST_GUP select HAVE_FTRACE_MCOUNT_RECORD select HAVE_FUNCTION_ERROR_INJECTION select HAVE_FUNCTION_GRAPH_TRACER select HAVE_FUNCTION_TRACER select HAVE_GCC_PLUGINS if GCC_VERSION >= 50200 # plugin support on gcc <= 5.1 is buggy on PPC - select HAVE_GENERIC_GUP select HAVE_HW_BREAKPOINT if PERF_EVENTS && (PPC_BOOK3S || PPC_8xx) select HAVE_IDE select HAVE_IOREMAP_PROT diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig index 109243fdb6ec..aaff0376bf53 100644 --- a/arch/s390/Kconfig +++ b/arch/s390/Kconfig @@ -137,6 +137,7 @@ config S390 select HAVE_DMA_CONTIGUOUS select HAVE_DYNAMIC_FTRACE select HAVE_DYNAMIC_FTRACE_WITH_REGS + select HAVE_FAST_GUP select HAVE_EFFICIENT_UNALIGNED_ACCESS select HAVE_FENTRY select HAVE_FTRACE_MCOUNT_RECORD @@ -144,7 +145,6 @@ config S390 select HAVE_FUNCTION_TRACER select HAVE_FUTEX_CMPXCHG if FUTEX select HAVE_GCC_PLUGINS - select HAVE_GENERIC_GUP select 
HAVE_KERNEL_BZIP2 select HAVE_KERNEL_GZIP select HAVE_KERNEL_LZ4 diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig index 6fddfc3c9710..56712f3c9838 100644 --- a/arch/sh/Kconfig +++ b/arch/sh/Kconfig @@ -14,7 +14,7 @@ config SUPERH select HAVE_ARCH_TRACEHOOK select HAVE_PERF_EVENTS select HAVE_DEBUG_BUGVERBOSE - select HAVE_GENERIC_GUP + select HAVE_FAST_GUP select ARCH_HAVE_CUSTOM_GPIO_H select ARCH_HAVE_NMI_SAFE_CMPXCHG if (GUSA_RB || CPU_SH4A) select ARCH_HAS_GCOV_PROFILE_ALL diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig index 22435471f942..659232b760e1 100644 --- a/arch/sparc/Kconfig +++ b/arch/sparc/Kconfig @@ -28,7 +28,7 @@ config SPARC select RTC_DRV_M48T59 select RTC_SYSTOHC select HAVE_ARCH_JUMP_LABEL if SPARC64 - select HAVE_GENERIC_GUP if
[PATCH 09/16] sparc64: use the generic get_user_pages_fast code
The sparc64 code is mostly equivalent to the generic one, minus various bugfixes and two arch overrides that this patch adds to pgtable.h. Signed-off-by: Christoph Hellwig --- arch/sparc/Kconfig | 1 + arch/sparc/include/asm/pgtable_64.h | 18 ++ arch/sparc/mm/Makefile | 2 +- arch/sparc/mm/gup.c | 340 4 files changed, 20 insertions(+), 341 deletions(-) delete mode 100644 arch/sparc/mm/gup.c diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig index 26ab6f5bbaaf..22435471f942 100644 --- a/arch/sparc/Kconfig +++ b/arch/sparc/Kconfig @@ -28,6 +28,7 @@ config SPARC select RTC_DRV_M48T59 select RTC_SYSTOHC select HAVE_ARCH_JUMP_LABEL if SPARC64 + select HAVE_GENERIC_GUP if SPARC64 select GENERIC_IRQ_SHOW select ARCH_WANT_IPC_PARSE_VERSION select GENERIC_PCI_IOMAP diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h index 1904782dcd39..547ff96fb228 100644 --- a/arch/sparc/include/asm/pgtable_64.h +++ b/arch/sparc/include/asm/pgtable_64.h @@ -1098,6 +1098,24 @@ static inline unsigned long untagged_addr(unsigned long start) } #define untagged_addr untagged_addr +static inline bool pte_access_permitted(pte_t pte, bool write) +{ + u64 prot; + + if (tlb_type == hypervisor) { + prot = _PAGE_PRESENT_4V | _PAGE_P_4V; + if (write) + prot |= _PAGE_WRITE_4V; + } else { + prot = _PAGE_PRESENT_4U | _PAGE_P_4U; + if (write) + prot |= _PAGE_WRITE_4U; + } + + return (pte_val(pte) & (prot | _PAGE_SPECIAL)) == prot; +} +#define pte_access_permitted pte_access_permitted + #include #include diff --git a/arch/sparc/mm/Makefile b/arch/sparc/mm/Makefile index d39075b1e3b7..b078205b70e0 100644 --- a/arch/sparc/mm/Makefile +++ b/arch/sparc/mm/Makefile @@ -5,7 +5,7 @@ asflags-y := -ansi ccflags-y := -Werror -obj-$(CONFIG_SPARC64) += ultra.o tlb.o tsb.o gup.o +obj-$(CONFIG_SPARC64) += ultra.o tlb.o tsb.o obj-y += fault_$(BITS).o obj-y += init_$(BITS).o obj-$(CONFIG_SPARC32) += extable.o srmmu.o iommu.o io-unit.o diff --git a/arch/sparc/mm/gup.c b/arch/sparc/mm/gup.c deleted file mode 100644 index 1e770a517d4a.. --- a/arch/sparc/mm/gup.c +++ /dev/null @@ -1,340 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0 -/* - * Lockless get_user_pages_fast for sparc, cribbed from powerpc - * - * Copyright (C) 2008 Nick Piggin - * Copyright (C) 2008 Novell Inc. - */ - -#include -#include -#include -#include -#include -#include -#include - -/* - * The performance critical leaf functions are made noinline otherwise gcc - * inlines everything into a single function which results in too much - * register pressure. - */ -static noinline int gup_pte_range(pmd_t pmd, unsigned long addr, - unsigned long end, int write, struct page **pages, int *nr) -{ - unsigned long mask, result; - pte_t *ptep; - - if (tlb_type == hypervisor) { - result = _PAGE_PRESENT_4V|_PAGE_P_4V; - if (write) - result |= _PAGE_WRITE_4V; - } else { - result = _PAGE_PRESENT_4U|_PAGE_P_4U; - if (write) - result |= _PAGE_WRITE_4U; - } - mask = result | _PAGE_SPECIAL; - - ptep = pte_offset_kernel(&pmd, addr); - do { - struct page *page, *head; - pte_t pte = *ptep; - - if ((pte_val(pte) & mask) != result) - return 0; - VM_BUG_ON(!pfn_valid(pte_pfn(pte))); - - /* The hugepage case is simplified on sparc64 because -* we encode the sub-page pfn offsets into the -* hugepage PTEs. We could optimize this in the future -* use page_cache_add_speculative() for the hugepage case. 
-*/ - page = pte_page(pte); - head = compound_head(page); - if (!page_cache_get_speculative(head)) - return 0; - if (unlikely(pte_val(pte) != pte_val(*ptep))) { - put_page(head); - return 0; - } - - pages[*nr] = page; - (*nr)++; - } while (ptep++, addr += PAGE_SIZE, addr != end); - - return 1; -} - -static int gup_huge_pmd(pmd_t *pmdp, pmd_t pmd, unsigned long addr, - unsigned long end, int write, struct page **pages, - int *nr) -{ - struct page *head, *page; - int refs; - - if (!(pmd_val(pmd) & _PAGE_VALID)) - return 0; - - if (write && !pmd_write(pmd)) - return 0; - - refs = 0; - page = pmd_page(pmd) + ((addr & ~PMD_MASK) >> PAGE_SHIFT); - head = compound
[PATCH 08/16] sparc64: define untagged_addr()
Add a helper to untag a user pointer. This is needed for ADI support in get_user_pages_fast. Signed-off-by: Christoph Hellwig --- arch/sparc/include/asm/pgtable_64.h | 22 ++ 1 file changed, 22 insertions(+) diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h index f0dcf991d27f..1904782dcd39 100644 --- a/arch/sparc/include/asm/pgtable_64.h +++ b/arch/sparc/include/asm/pgtable_64.h @@ -1076,6 +1076,28 @@ static inline int io_remap_pfn_range(struct vm_area_struct *vma, } #define io_remap_pfn_range io_remap_pfn_range +static inline unsigned long untagged_addr(unsigned long start) +{ + if (adi_capable()) { + long addr = start; + + /* If userspace has passed a versioned address, kernel +* will not find it in the VMAs since it does not store +* the version tags in the list of VMAs. Storing version +* tags in list of VMAs is impractical since they can be +* changed any time from userspace without dropping into +* kernel. Any address search in VMAs will be done with +* non-versioned addresses. Ensure the ADI version bits +* are dropped here by sign extending the last bit before +* ADI bits. IOMMU does not implement version tags. +*/ + return (addr << (long)adi_nbits()) >> (long)adi_nbits(); + } + + return start; +} +#define untagged_addr untagged_addr + #include #include -- 2.20.1
[PATCH 07/16] sparc64: add the missing pgd_page definition
sparc64 only had pgd_page_vaddr, but not pgd_page. Signed-off-by: Christoph Hellwig --- arch/sparc/include/asm/pgtable_64.h | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h index 22500c3be7a9..f0dcf991d27f 100644 --- a/arch/sparc/include/asm/pgtable_64.h +++ b/arch/sparc/include/asm/pgtable_64.h @@ -861,6 +861,7 @@ static inline unsigned long pud_page_vaddr(pud_t pud) #define pud_clear(pudp)(pud_val(*(pudp)) = 0UL) #define pgd_page_vaddr(pgd)\ ((unsigned long) __va(pgd_val(pgd))) +#define pgd_page(pgd) pfn_to_page(pgd_pfn(pgd)) #define pgd_present(pgd) (pgd_val(pgd) != 0U) #define pgd_clear(pgdp)(pgd_val(*(pgdp)) = 0UL) -- 2.20.1
[PATCH 06/16] sh: use the generic get_user_pages_fast code
The sh code is mostly equivalent to the generic one, minus various bugfixes and two arch overrides that this patch adds to pgtable.h. Signed-off-by: Christoph Hellwig --- arch/sh/Kconfig | 2 + arch/sh/include/asm/pgtable.h | 37 + arch/sh/mm/Makefile | 2 +- arch/sh/mm/gup.c | 277 -- 4 files changed, 40 insertions(+), 278 deletions(-) delete mode 100644 arch/sh/mm/gup.c diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig index b77f512bb176..6fddfc3c9710 100644 --- a/arch/sh/Kconfig +++ b/arch/sh/Kconfig @@ -14,6 +14,7 @@ config SUPERH select HAVE_ARCH_TRACEHOOK select HAVE_PERF_EVENTS select HAVE_DEBUG_BUGVERBOSE + select HAVE_GENERIC_GUP select ARCH_HAVE_CUSTOM_GPIO_H select ARCH_HAVE_NMI_SAFE_CMPXCHG if (GUSA_RB || CPU_SH4A) select ARCH_HAS_GCOV_PROFILE_ALL @@ -63,6 +64,7 @@ config SUPERH config SUPERH32 def_bool "$(ARCH)" = "sh" select ARCH_32BIT_OFF_T + select GUP_GET_PTE_LOW_HIGH if X2TLB select HAVE_KPROBES select HAVE_KRETPROBES select HAVE_IOREMAP_PROT if MMU && !X2TLB diff --git a/arch/sh/include/asm/pgtable.h b/arch/sh/include/asm/pgtable.h index 3587103afe59..9085d1142fa3 100644 --- a/arch/sh/include/asm/pgtable.h +++ b/arch/sh/include/asm/pgtable.h @@ -149,6 +149,43 @@ extern void paging_init(void); extern void page_table_range_init(unsigned long start, unsigned long end, pgd_t *pgd); +static inline bool __pte_access_permitted(pte_t pte, u64 prot) +{ + return (pte_val(pte) & (prot | _PAGE_SPECIAL)) == prot; +} + +#ifdef CONFIG_X2TLB +static inline bool pte_access_permitted(pte_t pte, bool write) +{ + u64 prot = _PAGE_PRESENT; + + prot |= _PAGE_EXT(_PAGE_EXT_KERN_READ | _PAGE_EXT_USER_READ); + if (write) + prot |= _PAGE_EXT(_PAGE_EXT_KERN_WRITE | _PAGE_EXT_USER_WRITE); + return __pte_access_permitted(pte, prot); +} +#elif defined(CONFIG_SUPERH64) +static inline bool pte_access_permitted(pte_t pte, bool write) +{ + u64 prot = _PAGE_PRESENT | _PAGE_USER | _PAGE_READ; + + if (write) + prot |= _PAGE_WRITE; + return __pte_access_permitted(pte, prot); +} +#else +static inline bool pte_access_permitted(pte_t pte, bool write) +{ + u64 prot = _PAGE_PRESENT | _PAGE_USER; + + if (write) + prot |= _PAGE_RW; + return __pte_access_permitted(pte, prot); +} +#endif + +#define pte_access_permitted pte_access_permitted + /* arch/sh/mm/mmap.c */ #define HAVE_ARCH_UNMAPPED_AREA #define HAVE_ARCH_UNMAPPED_AREA_TOPDOWN diff --git a/arch/sh/mm/Makefile b/arch/sh/mm/Makefile index fbe5e79751b3..5051b38fd5b6 100644 --- a/arch/sh/mm/Makefile +++ b/arch/sh/mm/Makefile @@ -17,7 +17,7 @@ cacheops-$(CONFIG_CPU_SHX3) += cache-shx3.o obj-y += $(cacheops-y) mmu-y := nommu.o extable_32.o -mmu-$(CONFIG_MMU) := extable_$(BITS).o fault.o gup.o ioremap.o kmap.o \ +mmu-$(CONFIG_MMU) := extable_$(BITS).o fault.o ioremap.o kmap.o \ pgtable.o tlbex_$(BITS).o tlbflush_$(BITS).o obj-y += $(mmu-y) diff --git a/arch/sh/mm/gup.c b/arch/sh/mm/gup.c deleted file mode 100644 index 277c882f7489.. --- a/arch/sh/mm/gup.c +++ /dev/null @@ -1,277 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0 -/* - * Lockless get_user_pages_fast for SuperH - * - * Copyright (C) 2009 - 2010 Paul Mundt - * - * Cloned from the x86 and PowerPC versions, by: - * - * Copyright (C) 2008 Nick Piggin - * Copyright (C) 2008 Novell Inc. - */ -#include -#include -#include -#include -#include - -static inline pte_t gup_get_pte(pte_t *ptep) -{ -#ifndef CONFIG_X2TLB - return READ_ONCE(*ptep); -#else - /* -* With get_user_pages_fast, we walk down the pagetables without -* taking any locks. 
For this we would like to load the pointers -* atomically, but that is not possible with 64-bit PTEs. What -* we do have is the guarantee that a pte will only either go -* from not present to present, or present to not present or both -* -- it will not switch to a completely different present page -* without a TLB flush in between; something that we are blocking -* by holding interrupts off. -* -* Setting ptes from not present to present goes: -* ptep->pte_high = h; -* smp_wmb(); -* ptep->pte_low = l; -* -* And present to not present goes: -* ptep->pte_low = 0; -* smp_wmb(); -* ptep->pte_high = 0; -* -* We must ensure here that the load of pte_low sees l iff pte_high -* sees h. We load pte_high *after* loading pte_low, which ensures we -* don't see an older value of pte_high. *Then* we recheck pte_low, -* which ensures that we haven't picked up a chang
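For reference, the split-PTE read protocol described by the deleted comment above is the usual low/high retry loop, which this series consolidates in mm/gup.c behind CONFIG_GUP_GET_PTE_LOW_HIGH (see patch 03). A minimal sketch of that loop, assuming the pte_low/pte_high field names used in the comment:

static inline pte_t gup_get_pte(pte_t *ptep)
{
	pte_t pte;

	do {
		pte.pte_low = ptep->pte_low;
		smp_rmb();
		pte.pte_high = ptep->pte_high;
		smp_rmb();
		/* retry if pte_low changed while pte_high was being read */
	} while (unlikely(pte.pte_low != ptep->pte_low));

	return pte;
}

Because a present PTE can only go to not-present (or vice versa) with a TLB flush in between, and gup_fast runs with interrupts off to block that flush, the recheck of pte_low guarantees a consistent low/high pair.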
[PATCH 05/16] sh: add the missing pud_page definition
sh only had pud_page_vaddr, but not pud_page. Signed-off-by: Christoph Hellwig --- arch/sh/include/asm/pgtable-3level.h | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/sh/include/asm/pgtable-3level.h b/arch/sh/include/asm/pgtable-3level.h index 7d8587eb65ff..3c7ff20f3f94 100644 --- a/arch/sh/include/asm/pgtable-3level.h +++ b/arch/sh/include/asm/pgtable-3level.h @@ -37,6 +37,7 @@ static inline unsigned long pud_page_vaddr(pud_t pud) { return pud_val(pud); } +#define pud_page(pud) pfn_to_page(pud_pfn(pud)) #define pmd_index(address) (((address) >> PMD_SHIFT) & (PTRS_PER_PMD-1)) static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address) -- 2.20.1
[PATCH 03/16] mm: lift the x86_32 PAE version of gup_get_pte to common code
The split low/high access is the only non-READ_ONCE version of gup_get_pte that did show up in the various arch implemenations. Lift it to common code and drop the ifdef based arch override. Signed-off-by: Christoph Hellwig --- arch/x86/Kconfig | 1 + arch/x86/include/asm/pgtable-3level.h | 47 arch/x86/kvm/mmu.c| 2 +- mm/Kconfig| 3 ++ mm/gup.c | 51 --- 5 files changed, 52 insertions(+), 52 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 2bbbd4d1ba31..7cd53cc59f0f 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -121,6 +121,7 @@ config X86 select GENERIC_STRNCPY_FROM_USER select GENERIC_STRNLEN_USER select GENERIC_TIME_VSYSCALL + select GUP_GET_PTE_LOW_HIGH if X86_PAE select HARDLOCKUP_CHECK_TIMESTAMP if X86_64 select HAVE_ACPI_APEI if ACPI select HAVE_ACPI_APEI_NMI if ACPI diff --git a/arch/x86/include/asm/pgtable-3level.h b/arch/x86/include/asm/pgtable-3level.h index f8b1ad2c3828..e3633795fb22 100644 --- a/arch/x86/include/asm/pgtable-3level.h +++ b/arch/x86/include/asm/pgtable-3level.h @@ -285,53 +285,6 @@ static inline pud_t native_pudp_get_and_clear(pud_t *pudp) #define __pte_to_swp_entry(pte)(__swp_entry(__pteval_swp_type(pte), \ __pteval_swp_offset(pte))) -#define gup_get_pte gup_get_pte -/* - * WARNING: only to be used in the get_user_pages_fast() implementation. - * - * With get_user_pages_fast(), we walk down the pagetables without taking - * any locks. For this we would like to load the pointers atomically, - * but that is not possible (without expensive cmpxchg8b) on PAE. What - * we do have is the guarantee that a PTE will only either go from not - * present to present, or present to not present or both -- it will not - * switch to a completely different present page without a TLB flush in - * between; something that we are blocking by holding interrupts off. - * - * Setting ptes from not present to present goes: - * - * ptep->pte_high = h; - * smp_wmb(); - * ptep->pte_low = l; - * - * And present to not present goes: - * - * ptep->pte_low = 0; - * smp_wmb(); - * ptep->pte_high = 0; - * - * We must ensure here that the load of pte_low sees 'l' iff pte_high - * sees 'h'. We load pte_high *after* loading pte_low, which ensures we - * don't see an older value of pte_high. *Then* we recheck pte_low, - * which ensures that we haven't picked up a changed pte high. We might - * have gotten rubbish values from pte_low and pte_high, but we are - * guaranteed that pte_low will not have the present bit set *unless* - * it is 'l'. Because get_user_pages_fast() only operates on present ptes - * we're safe. - */ -static inline pte_t gup_get_pte(pte_t *ptep) -{ - pte_t pte; - - do { - pte.pte_low = ptep->pte_low; - smp_rmb(); - pte.pte_high = ptep->pte_high; - smp_rmb(); - } while (unlikely(pte.pte_low != ptep->pte_low)); - - return pte; -} - #include #endif /* _ASM_X86_PGTABLE_3LEVEL_H */ diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 1e9ba81accba..3f7cd11168f9 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -653,7 +653,7 @@ static u64 __update_clear_spte_slow(u64 *sptep, u64 spte) /* * The idea using the light way get the spte on x86_32 guest is from - * gup_get_pte(arch/x86/mm/gup.c). + * gup_get_pte (mm/gup.c). * * An spte tlb flush may be pending, because kvm_set_pte_rmapp * coalesces them and we are running out of the MMU lock. 
Therefore diff --git a/mm/Kconfig b/mm/Kconfig index f0c76ba47695..fe51f104a9e0 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -762,6 +762,9 @@ config GUP_BENCHMARK See tools/testing/selftests/vm/gup_benchmark.c +config GUP_GET_PTE_LOW_HIGH + bool + config ARCH_HAS_PTE_SPECIAL bool diff --git a/mm/gup.c b/mm/gup.c index 3237f33792e6..9b72f2ea3471 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -1684,17 +1684,60 @@ struct page *get_dump_page(unsigned long addr) * This code is based heavily on the PowerPC implementation by Nick Piggin. */ #ifdef CONFIG_HAVE_GENERIC_GUP +#ifdef CONFIG_GUP_GET_PTE_LOW_HIGH +/* + * WARNING: only to be used in the get_user_pages_fast() implementation. + * + * With get_user_pages_fast(), we walk down the pagetables without taking any + * locks. For this we would like to load the pointers atomically, but sometimes + * that is not possible (e.g. without expensive cmpxchg8b on x86_32 PAE). What + * we do have is the guarantee that a PTE will only either go from not present + * to present, or present to not present or both -- it will not switch to a + * completely different present page without a TLB flush in between; something + * that we are blocking by h
[PATCH 04/16] MIPS: use the generic get_user_pages_fast code
The mips code is mostly equivalent to the generic one, minus various bugfixes and an arch override for gup_fast_permitted. Note that this defines ARCH_HAS_PTE_SPECIAL for mips as mips has pte_special and pte_mkspecial implemented and used in the existing gup code. They are no-op stubs, though which makes me a little unsure if this is really right thing to do. Signed-off-by: Christoph Hellwig --- arch/mips/Kconfig | 3 + arch/mips/include/asm/pgtable.h | 3 + arch/mips/mm/Makefile | 1 - arch/mips/mm/gup.c | 303 4 files changed, 6 insertions(+), 304 deletions(-) delete mode 100644 arch/mips/mm/gup.c diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig index 70d3200476bf..64108a2a16d4 100644 --- a/arch/mips/Kconfig +++ b/arch/mips/Kconfig @@ -6,6 +6,7 @@ config MIPS select ARCH_BINFMT_ELF_STATE if MIPS_FP_SUPPORT select ARCH_CLOCKSOURCE_DATA select ARCH_HAS_ELF_RANDOMIZE + select ARCH_HAS_PTE_SPECIAL select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST select ARCH_HAS_UBSAN_SANITIZE_ALL select ARCH_SUPPORTS_UPROBES @@ -34,6 +35,7 @@ config MIPS select GENERIC_SCHED_CLOCK if !CAVIUM_OCTEON_SOC select GENERIC_SMP_IDLE_THREAD select GENERIC_TIME_VSYSCALL + select GUP_GET_PTE_LOW_HIGH if CPU_MIPS32 && PHYS_ADDR_T_64BIT select HANDLE_DOMAIN_IRQ select HAVE_ARCH_COMPILER_H select HAVE_ARCH_JUMP_LABEL @@ -55,6 +57,7 @@ config MIPS select HAVE_FTRACE_MCOUNT_RECORD select HAVE_FUNCTION_GRAPH_TRACER select HAVE_FUNCTION_TRACER + select HAVE_GENERIC_GUP select HAVE_IDE select HAVE_IOREMAP_PROT select HAVE_IRQ_EXIT_ON_IRQ_STACK diff --git a/arch/mips/include/asm/pgtable.h b/arch/mips/include/asm/pgtable.h index 4ccb465ef3f2..7d27194e3b45 100644 --- a/arch/mips/include/asm/pgtable.h +++ b/arch/mips/include/asm/pgtable.h @@ -20,6 +20,7 @@ #include #include #include +#include struct mm_struct; struct vm_area_struct; @@ -626,6 +627,8 @@ static inline pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm, #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ +#define gup_fast_permitted(start, end) (!cpu_has_dc_aliases) + #include /* diff --git a/arch/mips/mm/Makefile b/arch/mips/mm/Makefile index f34d7ff5eb60..1e8d335025d7 100644 --- a/arch/mips/mm/Makefile +++ b/arch/mips/mm/Makefile @@ -7,7 +7,6 @@ obj-y += cache.o obj-y += context.o obj-y += extable.o obj-y += fault.o -obj-y += gup.o obj-y += init.o obj-y += mmap.o obj-y += page.o diff --git a/arch/mips/mm/gup.c b/arch/mips/mm/gup.c deleted file mode 100644 index 4c2b4483683c.. --- a/arch/mips/mm/gup.c +++ /dev/null @@ -1,303 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0 -/* - * Lockless get_user_pages_fast for MIPS - * - * Copyright (C) 2008 Nick Piggin - * Copyright (C) 2008 Novell Inc. 
- * Copyright (C) 2011 Ralf Baechle - */ -#include -#include -#include -#include -#include -#include - -#include -#include - -static inline pte_t gup_get_pte(pte_t *ptep) -{ -#if defined(CONFIG_PHYS_ADDR_T_64BIT) && defined(CONFIG_CPU_MIPS32) - pte_t pte; - -retry: - pte.pte_low = ptep->pte_low; - smp_rmb(); - pte.pte_high = ptep->pte_high; - smp_rmb(); - if (unlikely(pte.pte_low != ptep->pte_low)) - goto retry; - - return pte; -#else - return READ_ONCE(*ptep); -#endif -} - -static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end, - int write, struct page **pages, int *nr) -{ - pte_t *ptep = pte_offset_map(&pmd, addr); - do { - pte_t pte = gup_get_pte(ptep); - struct page *page; - - if (!pte_present(pte) || - pte_special(pte) || (write && !pte_write(pte))) { - pte_unmap(ptep); - return 0; - } - VM_BUG_ON(!pfn_valid(pte_pfn(pte))); - page = pte_page(pte); - get_page(page); - SetPageReferenced(page); - pages[*nr] = page; - (*nr)++; - - } while (ptep++, addr += PAGE_SIZE, addr != end); - - pte_unmap(ptep - 1); - return 1; -} - -static inline void get_head_page_multiple(struct page *page, int nr) -{ - VM_BUG_ON(page != compound_head(page)); - VM_BUG_ON(page_count(page) == 0); - page_ref_add(page, nr); - SetPageReferenced(page); -} - -static int gup_huge_pmd(pmd_t pmd, unsigned long addr, unsigned long end, - int write, struct page **pages, int *nr) -{ - pte_t pte = *(pte_t *)&pmd; - struct page *head, *page; - int refs; - -
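For context on the "no-op stubs" remark above: without a software _PAGE_SPECIAL bit, pte_special() can never report true and pte_mkspecial() changes nothing, so selecting ARCH_HAS_PTE_SPECIAL mainly controls which generic code paths get built. A sketch of what such stubs look like (illustrative shape only, not a verbatim copy of the MIPS definitions):

static inline int pte_special(pte_t pte)
{
	return 0;		/* no special bit available */
}

static inline pte_t pte_mkspecial(pte_t pte)
{
	return pte;		/* nothing to set */
}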
[PATCH 02/16] mm: simplify gup_fast_permitted
Pass in the already calculated end value instead of recomputing it, and leave the end > start check in the callers instead of duplicating them in the arch code. Signed-off-by: Christoph Hellwig --- arch/s390/include/asm/pgtable.h | 8 +--- arch/x86/include/asm/pgtable_64.h | 8 +--- mm/gup.c | 17 +++-- 3 files changed, 9 insertions(+), 24 deletions(-) diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h index 9f0195d5fa16..9b274fcaacb6 100644 --- a/arch/s390/include/asm/pgtable.h +++ b/arch/s390/include/asm/pgtable.h @@ -1270,14 +1270,8 @@ static inline pte_t *pte_offset(pmd_t *pmd, unsigned long address) #define pte_offset_map(pmd, address) pte_offset_kernel(pmd, address) #define pte_unmap(pte) do { } while (0) -static inline bool gup_fast_permitted(unsigned long start, int nr_pages) +static inline bool gup_fast_permitted(unsigned long start, unsigned long end) { - unsigned long len, end; - - len = (unsigned long) nr_pages << PAGE_SHIFT; - end = start + len; - if (end < start) - return false; return end <= current->mm->context.asce_limit; } #define gup_fast_permitted gup_fast_permitted diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h index 0bb566315621..4990d26dfc73 100644 --- a/arch/x86/include/asm/pgtable_64.h +++ b/arch/x86/include/asm/pgtable_64.h @@ -259,14 +259,8 @@ extern void init_extra_mapping_uc(unsigned long phys, unsigned long size); extern void init_extra_mapping_wb(unsigned long phys, unsigned long size); #define gup_fast_permitted gup_fast_permitted -static inline bool gup_fast_permitted(unsigned long start, int nr_pages) +static inline bool gup_fast_permitted(unsigned long start, unsigned long end) { - unsigned long len, end; - - len = (unsigned long)nr_pages << PAGE_SHIFT; - end = start + len; - if (end < start) - return false; if (end >> __VIRTUAL_MASK_SHIFT) return false; return true; diff --git a/mm/gup.c b/mm/gup.c index 6bb521db67ec..3237f33792e6 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -2123,13 +2123,9 @@ static void gup_pgd_range(unsigned long addr, unsigned long end, * Check if it's allowed to use __get_user_pages_fast() for the range, or * we need to fall back to the slow version: */ -bool gup_fast_permitted(unsigned long start, int nr_pages) +static bool gup_fast_permitted(unsigned long start, unsigned long end) { - unsigned long len, end; - - len = (unsigned long) nr_pages << PAGE_SHIFT; - end = start + len; - return end >= start; + return true; } #endif @@ -2150,6 +2146,8 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write, len = (unsigned long) nr_pages << PAGE_SHIFT; end = start + len; + if (end <= start) + return 0; if (unlikely(!access_ok((void __user *)start, len))) return 0; @@ -2165,7 +2163,7 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write, * block IPIs that come from THPs splitting. */ - if (gup_fast_permitted(start, nr_pages)) { + if (gup_fast_permitted(start, end)) { local_irq_save(flags); gup_pgd_range(start, end, write ? FOLL_WRITE : 0, pages, &nr); local_irq_restore(flags); @@ -2224,13 +,12 @@ int get_user_pages_fast(unsigned long start, int nr_pages, len = (unsigned long) nr_pages << PAGE_SHIFT; end = start + len; - if (nr_pages <= 0) + if (end <= start) return 0; - if (unlikely(!access_ok((void __user *)start, len))) return -EFAULT; - if (gup_fast_permitted(start, nr_pages)) { + if (gup_fast_permitted(start, end)) { local_irq_disable(); gup_pgd_range(addr, end, gup_flags, pages, &nr); local_irq_enable(); -- 2.20.1
switch the remaining architectures to use generic GUP v3
Hi Linus and maintainers,

below is a series to switch mips, sh and sparc64 to use the generic GUP code so that we only have one codebase to touch for further improvements to this code. I don't have hardware for any of these architectures, and generally no clue about their page table management, so handle with care.

Changes since v2:
 - rebase to mainline to pick up the untagged_addr definition
 - fix the gup range check to be start <= end to catch the 0 length case
 - use pfn based version for the missing pud_page/pgd_page definitions
 - fix a wrong check in the sparc64 version of pte_access_permitted

Changes since v1:
 - fix various issues found by the build bot
 - cherry pick and use the untagged_addr helper from Andrey
 - add various refactoring patches to share more code across architectures
 - move the powerpc hugepd code to mm/gup.c and sync it with the generic gup semantics
[PATCH 01/16] mm: use untagged_addr() for get_user_pages_fast addresses
This will allow sparc64 to override its ADI tags for get_user_pages and get_user_pages_fast. Signed-off-by: Christoph Hellwig --- mm/gup.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/mm/gup.c b/mm/gup.c index ddde097cf9e4..6bb521db67ec 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -2146,7 +2146,7 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write, unsigned long flags; int nr = 0; - start &= PAGE_MASK; + start = untagged_addr(start) & PAGE_MASK; len = (unsigned long) nr_pages << PAGE_SHIFT; end = start + len; @@ -2219,7 +2219,7 @@ int get_user_pages_fast(unsigned long start, int nr_pages, unsigned long addr, len, end; int nr = 0, ret = 0; - start &= PAGE_MASK; + start = untagged_addr(start) & PAGE_MASK; addr = start; len = (unsigned long) nr_pages << PAGE_SHIFT; end = start + len; -- 2.20.1
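The generic untagged_addr() fallback is a no-op; an architecture with tagged addresses can override it so the tag bits are stripped before access_ok() and the page table walk see the address. A rough sketch of the pattern (ADDR_TAG_MASK is a placeholder, not the actual sparc64 ADI code):

/* generic fallback: no tagging, pass the address through unchanged */
#ifndef untagged_addr
#define untagged_addr(addr)	(addr)
#endif

/* hypothetical arch override: clear tag bits held in the top of the VA */
#define untagged_addr(addr)	((addr) & ~ADDR_TAG_MASK)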
[PATCH v2 3/4] crypto: talitos - eliminate unneeded 'done' functions at build time
When building for SEC1 only, talitos2_done functions are unneeded and should go away. For this, use has_ftr_sec1() which will always return true when only SEC1 support is being built, allowing GCC to drop TALITOS2 functions. Signed-off-by: Christophe Leroy --- drivers/crypto/talitos.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c index 4f03baef952b..b2de931de623 100644 --- a/drivers/crypto/talitos.c +++ b/drivers/crypto/talitos.c @@ -3401,7 +3401,7 @@ static int talitos_probe(struct platform_device *ofdev) if (err) goto err_out; - if (of_device_is_compatible(np, "fsl,sec1.0")) { + if (has_ftr_sec1(priv)) { if (priv->num_channels == 1) tasklet_init(&priv->done_task[0], talitos1_done_ch0, (unsigned long)dev); -- 2.13.3
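The point of using has_ftr_sec1() here is that it collapses to a compile-time constant when only one SEC generation is configured, so the talitos2_done branch becomes provably dead code. A sketch of the assumed shape of the helper (the real definition lives in talitos.h):

static inline bool has_ftr_sec1(struct talitos_private *priv)
{
#if defined(CONFIG_CRYPTO_DEV_TALITOS1) && defined(CONFIG_CRYPTO_DEV_TALITOS2)
	return priv->features & TALITOS_FTR_SEC1;	/* both built: runtime check */
#elif defined(CONFIG_CRYPTO_DEV_TALITOS1)
	return true;					/* SEC1-only build */
#else
	return false;					/* SEC2-only build */
#endif
}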
[PATCH v2 4/4] crypto: talitos - drop icv_ool
icv_ool is not used anymore, drop it. Fixes: 9cc87bc3613b ("crypto: talitos - fix AEAD processing") Signed-off-by: Christophe Leroy --- drivers/crypto/talitos.c | 3 --- drivers/crypto/talitos.h | 2 -- 2 files changed, 5 deletions(-) diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c index b2de931de623..03b7a5d28fb0 100644 --- a/drivers/crypto/talitos.c +++ b/drivers/crypto/talitos.c @@ -1278,9 +1278,6 @@ static int ipsec_esp(struct talitos_edesc *edesc, struct aead_request *areq, is_ipsec_esp && !encrypt); tbl_off += ret; - /* ICV data */ - edesc->icv_ool = !encrypt; - if (!encrypt && is_ipsec_esp) { struct talitos_ptr *tbl_ptr = &edesc->link_tbl[tbl_off]; diff --git a/drivers/crypto/talitos.h b/drivers/crypto/talitos.h index 95f78c6d9206..1469b956948a 100644 --- a/drivers/crypto/talitos.h +++ b/drivers/crypto/talitos.h @@ -46,7 +46,6 @@ struct talitos_desc { * talitos_edesc - s/w-extended descriptor * @src_nents: number of segments in input scatterlist * @dst_nents: number of segments in output scatterlist - * @icv_ool: whether ICV is out-of-line * @iv_dma: dma address of iv for checking continuity and link table * @dma_len: length of dma mapped link_tbl space * @dma_link_tbl: bus physical address of link_tbl/buf @@ -61,7 +60,6 @@ struct talitos_desc { struct talitos_edesc { int src_nents; int dst_nents; - bool icv_ool; dma_addr_t iv_dma; int dma_len; dma_addr_t dma_link_tbl; -- 2.13.3
[PATCH v2 2/4] crypto: talitos - fix hash on SEC1.
On SEC1, hash provides wrong result when performing hashing in several steps with input data SG list has more than one element. This was detected with CONFIG_CRYPTO_MANAGER_EXTRA_TESTS: [ 44.185947] alg: hash: md5-talitos test failed (wrong result) on test vector 6, cfg="random: may_sleep use_finup src_divs=[25.88%@+8063, 24.19%@+9588, 28.63%@+16333, 4.60%@+6756, 16.70%@+16281] dst_divs=[71.61%@alignmask+16361, 14.36%@+7756, 14.3%@+" [ 44.325122] alg: hash: sha1-talitos test failed (wrong result) on test vector 3, cfg="random: inplace use_final src_divs=[16.56%@+16378, 52.0%@+16329, 21.42%@alignmask+16380, 10.2%@alignmask+16380] iv_offset=39" [ 44.493500] alg: hash: sha224-talitos test failed (wrong result) on test vector 4, cfg="random: use_final nosimd src_divs=[52.27%@+7401, 17.34%@+16285, 17.71%@+26, 12.68%@+10644] iv_offset=43" [ 44.673262] alg: hash: sha256-talitos test failed (wrong result) on test vector 4, cfg="random: may_sleep use_finup src_divs=[60.6%@+12790, 17.86%@+1329, 12.64%@alignmask+16300, 8.29%@+15, 0.40%@+13506, 0.51%@+16322, 0.24%@+16339] dst_divs" This is due to two issues: - We have an overlap between the buffer used for copying the input data (SEC1 doesn't do scatter/gather) and the chained descriptor. - Data copy is wrong when the previous hash left less than one blocksize of data to hash, implying a complement of the previous block with a few bytes from the new request. This patch fixes it by: - Moving the second descriptor after the buffer, as moving the buffer after the descriptor would make it more complex for other cipher operations (AEAD, ABLKCIPHER) - Rebuiding a new data SG list without the bytes taken from the new request to complete the previous one. Fixes: 37b5e8897eb5 ("crypto: talitos - chain in buffered data for ahash on SEC1") Cc: sta...@vger.kernel.org Signed-off-by: Christophe Leroy --- drivers/crypto/talitos.c | 63 ++-- 1 file changed, 40 insertions(+), 23 deletions(-) diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c index 5b401aec6c84..4f03baef952b 100644 --- a/drivers/crypto/talitos.c +++ b/drivers/crypto/talitos.c @@ -336,15 +336,18 @@ static void flush_channel(struct device *dev, int ch, int error, int reset_ch) tail = priv->chan[ch].tail; while (priv->chan[ch].fifo[tail].desc) { __be32 hdr; + struct talitos_edesc *edesc; request = &priv->chan[ch].fifo[tail]; + edesc = container_of(request->desc, struct talitos_edesc, desc); /* descriptors with their done bits set don't get the error */ rmb(); if (!is_sec1) hdr = request->desc->hdr; else if (request->desc->next_desc) - hdr = (request->desc + 1)->hdr1; + hdr = ((struct talitos_desc *) + (edesc->buf + edesc->dma_len))->hdr1; else hdr = request->desc->hdr1; @@ -476,8 +479,14 @@ static u32 current_desc_hdr(struct device *dev, int ch) } } - if (priv->chan[ch].fifo[iter].desc->next_desc == cur_desc) - return (priv->chan[ch].fifo[iter].desc + 1)->hdr; + if (priv->chan[ch].fifo[iter].desc->next_desc == cur_desc) { + struct talitos_edesc *edesc; + + edesc = container_of(priv->chan[ch].fifo[iter].desc, +struct talitos_edesc, desc); + return ((struct talitos_desc *) + (edesc->buf + edesc->dma_len))->hdr; + } return priv->chan[ch].fifo[iter].desc->hdr; } @@ -1402,15 +1411,11 @@ static struct talitos_edesc *talitos_edesc_alloc(struct device *dev, edesc->dst_nents = dst_nents; edesc->iv_dma = iv_dma; edesc->dma_len = dma_len; - if (dma_len) { - void *addr = &edesc->link_tbl[0]; - - if (is_sec1 && !dst) - addr += sizeof(struct talitos_desc); - edesc->dma_link_tbl = dma_map_single(dev, addr, 
+ if (dma_len) + edesc->dma_link_tbl = dma_map_single(dev, &edesc->link_tbl[0], edesc->dma_len, DMA_BIDIRECTIONAL); - } + return edesc; } @@ -1722,14 +1727,16 @@ static void common_nonsnoop_hash_unmap(struct device *dev, struct talitos_private *priv = dev_get_drvdata(dev); bool is_sec1 = has_ftr_sec1(priv); struct talitos_desc *desc = &edesc->desc; - struct talitos_desc *desc2 = desc + 1; + struct talitos_desc *desc2 = (struct talitos_desc *) +(edesc->buf + edesc->dma_len); unmap_single_talitos_ptr(dev, &edesc->desc.ptr[5], DMA_FROM_DEVICE); if (desc->next_desc && desc->pt
[PATCH v2 1/4] crypto: talitos - move struct talitos_edesc into talitos.h
Next patch will require struct talitos_edesc to be defined earlier in talitos.c This patch moves it into talitos.h so that it can be used from any place in talitos.c Fixes: 37b5e8897eb5 ("crypto: talitos - chain in buffered data for ahash on SEC1") Cc: sta...@vger.kernel.org Signed-off-by: Christophe Leroy --- drivers/crypto/talitos.c | 30 -- drivers/crypto/talitos.h | 30 ++ 2 files changed, 30 insertions(+), 30 deletions(-) diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c index 3b3e99f1cddb..5b401aec6c84 100644 --- a/drivers/crypto/talitos.c +++ b/drivers/crypto/talitos.c @@ -951,36 +951,6 @@ static int aead_des3_setkey(struct crypto_aead *authenc, goto out; } -/* - * talitos_edesc - s/w-extended descriptor - * @src_nents: number of segments in input scatterlist - * @dst_nents: number of segments in output scatterlist - * @icv_ool: whether ICV is out-of-line - * @iv_dma: dma address of iv for checking continuity and link table - * @dma_len: length of dma mapped link_tbl space - * @dma_link_tbl: bus physical address of link_tbl/buf - * @desc: h/w descriptor - * @link_tbl: input and output h/w link tables (if {src,dst}_nents > 1) (SEC2) - * @buf: input and output buffeur (if {src,dst}_nents > 1) (SEC1) - * - * if decrypting (with authcheck), or either one of src_nents or dst_nents - * is greater than 1, an integrity check value is concatenated to the end - * of link_tbl data - */ -struct talitos_edesc { - int src_nents; - int dst_nents; - bool icv_ool; - dma_addr_t iv_dma; - int dma_len; - dma_addr_t dma_link_tbl; - struct talitos_desc desc; - union { - struct talitos_ptr link_tbl[0]; - u8 buf[0]; - }; -}; - static void talitos_sg_unmap(struct device *dev, struct talitos_edesc *edesc, struct scatterlist *src, diff --git a/drivers/crypto/talitos.h b/drivers/crypto/talitos.h index 32ad4fc679ed..95f78c6d9206 100644 --- a/drivers/crypto/talitos.h +++ b/drivers/crypto/talitos.h @@ -42,6 +42,36 @@ struct talitos_desc { #define TALITOS_DESC_SIZE (sizeof(struct talitos_desc) - sizeof(__be32)) +/* + * talitos_edesc - s/w-extended descriptor + * @src_nents: number of segments in input scatterlist + * @dst_nents: number of segments in output scatterlist + * @icv_ool: whether ICV is out-of-line + * @iv_dma: dma address of iv for checking continuity and link table + * @dma_len: length of dma mapped link_tbl space + * @dma_link_tbl: bus physical address of link_tbl/buf + * @desc: h/w descriptor + * @link_tbl: input and output h/w link tables (if {src,dst}_nents > 1) (SEC2) + * @buf: input and output buffeur (if {src,dst}_nents > 1) (SEC1) + * + * if decrypting (with authcheck), or either one of src_nents or dst_nents + * is greater than 1, an integrity check value is concatenated to the end + * of link_tbl data + */ +struct talitos_edesc { + int src_nents; + int dst_nents; + bool icv_ool; + dma_addr_t iv_dma; + int dma_len; + dma_addr_t dma_link_tbl; + struct talitos_desc desc; + union { + struct talitos_ptr link_tbl[0]; + u8 buf[0]; + }; +}; + /** * talitos_request - descriptor submission request * @desc: descriptor pointer (kernel virtual) -- 2.13.3
[PATCH v2 0/4] Additional fixes on Talitos driver
This series is the last set of fixes for the Talitos driver. We now get a fully clean boot on both SEC1 (SEC1.2 on mpc885) and SEC2 (SEC2.2 on mpc8321E) with CONFIG_CRYPTO_MANAGER_EXTRA_TESTS: [3.385197] bus: 'platform': really_probe: probing driver talitos with device ff02.crypto [3.450982] random: fast init done [ 12.252548] alg: No test for authenc(hmac(md5),cbc(aes)) (authenc-hmac-md5-cbc-aes-talitos-hsna) [ 12.262226] alg: No test for authenc(hmac(md5),cbc(des3_ede)) (authenc-hmac-md5-cbc-3des-talitos-hsna) [ 43.310737] Bug in SEC1, padding ourself [ 45.603318] random: crng init done [ 54.612333] talitos ff02.crypto: fsl,sec1.2 algorithms registered in /proc/crypto [ 54.620232] driver: 'talitos': driver_bound: bound to device 'ff02.crypto' [1.193721] bus: 'platform': really_probe: probing driver talitos with device b003.crypto [1.229197] random: fast init done [2.714920] alg: No test for authenc(hmac(sha224),cbc(aes)) (authenc-hmac-sha224-cbc-aes-talitos) [2.724312] alg: No test for authenc(hmac(sha224),cbc(aes)) (authenc-hmac-sha224-cbc-aes-talitos-hsna) [4.482045] alg: No test for authenc(hmac(md5),cbc(aes)) (authenc-hmac-md5-cbc-aes-talitos) [4.490940] alg: No test for authenc(hmac(md5),cbc(aes)) (authenc-hmac-md5-cbc-aes-talitos-hsna) [4.500280] alg: No test for authenc(hmac(md5),cbc(des3_ede)) (authenc-hmac-md5-cbc-3des-talitos) [4.509727] alg: No test for authenc(hmac(md5),cbc(des3_ede)) (authenc-hmac-md5-cbc-3des-talitos-hsna) [6.631781] random: crng init done [ 11.521795] talitos b003.crypto: fsl,sec2.2 algorithms registered in /proc/crypto [ 11.529803] driver: 'talitos': driver_bound: bound to device 'b003.crypto' v2: dropped patch 1 which was irrelevant due to a rebase weirdness. Added Cc to stable on the 2 first patches. Christophe Leroy (4): crypto: talitos - move struct talitos_edesc into talitos.h crypto: talitos - fix hash on SEC1. crypto: talitos - eliminate unneeded 'done' functions at build time crypto: talitos - drop icv_ool drivers/crypto/talitos.c | 98 drivers/crypto/talitos.h | 28 ++ 2 files changed, 69 insertions(+), 57 deletions(-) -- 2.13.3
[PATCH 28/28] powerpc/64s/exception: avoid SPR RAW scoreboard stall in real mode entry
Move SPR reads ahead of writes. Real mode entry that is not a KVM guest is rare these days, but bad practice propagates. Signed-off-by: Nicholas Piggin --- arch/powerpc/kernel/exceptions-64s.S | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index d9e531a00319..df9c3126fe08 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -183,19 +183,19 @@ END_FTR_SECTION_NESTED(ftr,ftr,943) .endif .if \hsrr mfspr r11,SPRN_HSRR0 /* save HSRR0 */ + mfspr r12,SPRN_HSRR1 /* and HSRR1 */ + mtspr SPRN_HSRR1,r10 .else mfspr r11,SPRN_SRR0 /* save SRR0 */ + mfspr r12,SPRN_SRR1 /* and SRR1 */ + mtspr SPRN_SRR1,r10 .endif - LOAD_HANDLER(r12, \label\()) + LOAD_HANDLER(r10, \label\()) .if \hsrr - mtspr SPRN_HSRR0,r12 - mfspr r12,SPRN_HSRR1 /* and HSRR1 */ - mtspr SPRN_HSRR1,r10 + mtspr SPRN_HSRR0,r10 HRFI_TO_KERNEL .else - mtspr SPRN_SRR0,r12 - mfspr r12,SPRN_SRR1 /* and SRR1 */ - mtspr SPRN_SRR1,r10 + mtspr SPRN_SRR0,r10 RFI_TO_KERNEL .endif b . /* prevent speculative execution */ -- 2.20.1
[PATCH 27/28] powerpc/64s/exception: clean up system call entry
syscall / hcall entry unnecessarily differs between KVM and non-KVM builds. Move the SMT priority instruction to the same location (after INTERRUPT_TO_KERNEL). Signed-off-by: Nicholas Piggin --- arch/powerpc/kernel/exceptions-64s.S | 25 +++-- 1 file changed, 7 insertions(+), 18 deletions(-) diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index c1075bbe4677..d9e531a00319 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -1643,10 +1643,8 @@ EXC_COMMON(trap_0b_common, 0xb00, unknown_exception) std r10,PACA_EXGEN+EX_R10(r13) INTERRUPT_TO_KERNEL KVMTEST EXC_STD 0xc00 /* uses r10, branch to do_kvm_0xc00_system_call */ - HMT_MEDIUM mfctr r9 #else - HMT_MEDIUM mr r9,r13 GET_PACA(r13) INTERRUPT_TO_KERNEL @@ -1658,11 +1656,13 @@ BEGIN_FTR_SECTION beq-1f END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE) #endif - /* We reach here with PACA in r13, r13 in r9, and HMT_MEDIUM. */ - - .if \real + /* We reach here with PACA in r13, r13 in r9. */ mfspr r11,SPRN_SRR0 mfspr r12,SPRN_SRR1 + + HMT_MEDIUM + + .if \real __LOAD_HANDLER(r10, system_call_common) mtspr SPRN_SRR0,r10 ld r10,PACAKMSR(r13) @@ -1670,24 +1670,13 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE) RFI_TO_KERNEL b . /* prevent speculative execution */ .else + li r10,MSR_RI + mtmsrd r10,1 /* Set RI (EE=0) */ #ifdef CONFIG_RELOCATABLE - /* -* We can't branch directly so we do it via the CTR which -* is volatile across system calls. -*/ __LOAD_HANDLER(r10, system_call_common) mtctr r10 - mfspr r11,SPRN_SRR0 - mfspr r12,SPRN_SRR1 - li r10,MSR_RI - mtmsrd r10,1 bctr #else - /* We can branch directly */ - mfspr r11,SPRN_SRR0 - mfspr r12,SPRN_SRR1 - li r10,MSR_RI - mtmsrd r10,1 /* Set RI (EE=0) */ b system_call_common #endif .endif -- 2.20.1
[PATCH 26/28] powerpc/64s/exception: move paca save area offsets into exception-64s.S
No generated code change. The only file change is in the bug table line numbers. Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/exception-64s.h | 17 +++-- arch/powerpc/kernel/exceptions-64s.S | 22 ++ 2 files changed, 25 insertions(+), 14 deletions(-) diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h index 79e5ac87c029..33f4f72eb035 100644 --- a/arch/powerpc/include/asm/exception-64s.h +++ b/arch/powerpc/include/asm/exception-64s.h @@ -32,22 +32,11 @@ */ #include -/* PACA save area offsets (exgen, exmc, etc) */ -#define EX_R9 0 -#define EX_R10 8 -#define EX_R11 16 -#define EX_R12 24 -#define EX_R13 32 -#define EX_DAR 40 -#define EX_DSISR 48 -#define EX_CCR 52 -#define EX_CFAR56 -#define EX_PPR 64 +/* PACA save area size in u64 units (exgen, exmc, etc) */ #if defined(CONFIG_RELOCATABLE) -#define EX_CTR 72 -#define EX_SIZE10 /* size in u64 units */ +#define EX_SIZE10 #else -#define EX_SIZE9 /* size in u64 units */ +#define EX_SIZE9 #endif /* diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 8b571a2b3d76..c1075bbe4677 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -21,6 +21,28 @@ #include #include +/* PACA save area offsets (exgen, exmc, etc) */ +#define EX_R9 0 +#define EX_R10 8 +#define EX_R11 16 +#define EX_R12 24 +#define EX_R13 32 +#define EX_DAR 40 +#define EX_DSISR 48 +#define EX_CCR 52 +#define EX_CFAR56 +#define EX_PPR 64 +#if defined(CONFIG_RELOCATABLE) +#define EX_CTR 72 +.if EX_SIZE != 10 + .error "EX_SIZE is wrong" +.endif +#else +.if EX_SIZE != 9 + .error "EX_SIZE is wrong" +.endif +#endif + /* * We're short on space and time in the exception prolog, so we can't * use the normal LOAD_REG_IMMEDIATE macro to load the address of label. -- 2.20.1
[PATCH 25/28] powerpc/64s/exception: remove pointless EXCEPTION_PROLOG macro indirection
No generated code change. File is change is in bug table line numbers. Signed-off-by: Nicholas Piggin --- arch/powerpc/kernel/exceptions-64s.S | 97 +--- 1 file changed, 45 insertions(+), 52 deletions(-) diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index b402a006cd48..8b571a2b3d76 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -334,34 +334,6 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948) std r0,GPR0(r1);/* save r0 in stackframe*/ \ std r10,GPR1(r1); /* save r1 in stackframe*/ \ - -/* - * The common exception prolog is used for all except a few exceptions - * such as a segment miss on a kernel address. We have to be prepared - * to take another exception from the point where we first touch the - * kernel stack onwards. - * - * On entry r13 points to the paca, r9-r13 are saved in the paca, - * r9 contains the saved CR, r11 and r12 contain the saved SRR0 and - * SRR1, and relocation is on. - */ -#define EXCEPTION_PROLOG_COMMON(n, area) \ - andi. r10,r12,MSR_PR; /* See if coming from user */ \ - mr r10,r1; /* Save r1 */ \ - subir1,r1,INT_FRAME_SIZE; /* alloc frame on kernel stack */ \ - beq-1f;\ - ld r1,PACAKSAVE(r13); /* kernel stack to use */ \ -1: tdgei r1,-INT_FRAME_SIZE; /* trap if r1 is in userspace */ \ - EMIT_BUG_ENTRY 1b,__FILE__,__LINE__,0; \ -3: EXCEPTION_PROLOG_COMMON_1(); \ - kuap_save_amr_and_lock r9, r10, cr1, cr0; \ - beq 4f; /* if from kernel mode */ \ - ACCOUNT_CPU_USER_ENTRY(r13, r9, r10); \ - SAVE_PPR(area, r9);\ -4: EXCEPTION_PROLOG_COMMON_2(area)\ - EXCEPTION_PROLOG_COMMON_3(n) \ - ACCOUNT_STOLEN_TIME - /* Save original regs values from save area to stack frame. */ #define EXCEPTION_PROLOG_COMMON_2(area) \ ld r9,area+EX_R9(r13); /* move r9, r10 to stackframe */ \ @@ -381,7 +353,7 @@ END_FTR_SECTION_NESTED(CPU_FTR_CFAR, CPU_FTR_CFAR, 66); \ GET_CTR(r10, area);\ std r10,_CTR(r1); -#define EXCEPTION_PROLOG_COMMON_3(n) \ +#define EXCEPTION_PROLOG_COMMON_3(trap) \ std r2,GPR2(r1);/* save r2 in stackframe*/ \ SAVE_4GPRS(3, r1); /* save r3 - r6 in stackframe */ \ SAVE_2GPRS(7, r1); /* save r7, r8 in stackframe*/ \ @@ -392,26 +364,38 @@ END_FTR_SECTION_NESTED(CPU_FTR_CFAR, CPU_FTR_CFAR, 66); \ mfspr r11,SPRN_XER; /* save XER in stackframe */ \ std r10,SOFTE(r1); \ std r11,_XER(r1); \ - li r9,(n)+1; \ + li r9,(trap)+1; \ std r9,_TRAP(r1); /* set trap number */ \ li r10,0; \ ld r11,exception_marker@toc(r2); \ std r10,RESULT(r1); /* clear regs->result */ \ std r11,STACK_FRAME_OVERHEAD-16(r1); /* mark the frame */ -#define RUNLATCH_ON\ -BEGIN_FTR_SECTION \ - ld r3, PACA_THREAD_INFO(r13); \ - ld r4,TI_LOCAL_FLAGS(r3); \ - andi. r0,r4,_TLF_RUNLATCH;\ - beqlppc64_runlatch_on_trampoline; \ -END_FTR_SECTION_IFSET(CPU_FTR_CTRL) - -#define EXCEPTION_COMMON(area, trap) \ - EXCEPTION_PROLOG_COMMON(trap, area);\ +/* + * On entry r13 points to the paca, r9-r13 are saved in the paca, + * r9 contains the saved CR, r11 and r12 contain the saved SRR0 and + * SRR1, and relocation is on. + */ +#define EXCEPTION_COMMON(area, trap) \ + andi. r10,r12,MSR_PR; /* See if coming from user */ \ + mr r10,r1; /* Save r1 */ \ + subir1,r1,INT_FRAME_SIZE; /* alloc frame on kernel stack */ \ + beq-1f;
[PATCH 24/28] powerpc/64s/exception: remove bad stack branch
The bad stack test in interrupt handlers has a few problems. For performance it is taken in the common case, which is a fetch bubble and a waste of i-cache. For code development and maintainence, it requires yet another stack frame setup routine, and that constrains all exception handlers to follow the same register save pattern which inhibits future optimisation. Remove the test/branch and replace it with a trap. Teach the program check handler to use the emergency stack for this case. This does not result in quite so nice a message, however the SRR0 and SRR1 of the crashed interrupt can be seen in r11 and r12, as is the original r1 (adjusted by INT_FRAME_SIZE). These are the most important parts to debugging the issue. The original r9-12 and cr0 is lost, which is the main downside. kernel BUG at linux/arch/powerpc/kernel/exceptions-64s.S:847! Oops: Exception in kernel mode, sig: 5 [#1] BE SMP NR_CPUS=2048 NUMA PowerNV Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted NIP: c0009108 LR: c0cadbcc CTR: c00090f0 REGS: c000fffcbd70 TRAP: 0700 Not tainted MSR: 90021032 CR: 28222448 XER: 2004 CFAR: c0009100 IRQMASK: 0 GPR00: 003d fd00 c18cfb00 c000f02b3166 GPR04: fffd 0007 fffb 0030 GPR08: 0037 28222448 c0ca8de0 GPR12: 92009032 c1ae c0010a00 GPR16: GPR20: c000f00322c0 c0f85200 0004 GPR24: fffe 000a GPR28: c000f02b391c c000f02b3167 NIP [c0009108] decrementer_common+0x18/0x160 LR [c0cadbcc] .vsnprintf+0x3ec/0x4f0 Call Trace: Instruction dump: 996d098a 994d098b 38610070 480246ed 48005518 6000 3820 718a4000 7c2a0b78 3821fd00 41c20008 e82d0970 <0981fd00> f92101a0 f9610170 f9810178 Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/exception-64s.h | 7 -- arch/powerpc/include/asm/paca.h | 2 + arch/powerpc/kernel/asm-offsets.c| 2 + arch/powerpc/kernel/exceptions-64s.S | 95 arch/powerpc/xmon/xmon.c | 2 + 5 files changed, 22 insertions(+), 86 deletions(-) diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h index dc6a5ccac965..79e5ac87c029 100644 --- a/arch/powerpc/include/asm/exception-64s.h +++ b/arch/powerpc/include/asm/exception-64s.h @@ -55,13 +55,6 @@ */ #define MAX_MCE_DEPTH 4 -/* - * EX_R3 is only used by the bad_stack handler. bad_stack reloads and - * saves DAR from SPRN_DAR, and EX_DAR is not used. So EX_R3 can overlap - * with EX_DAR. 
- */ -#define EX_R3 EX_DAR - #ifdef __ASSEMBLY__ #define STF_ENTRY_BARRIER_SLOT \ diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h index 9bd2326bef6f..e3cc9eb9204d 100644 --- a/arch/powerpc/include/asm/paca.h +++ b/arch/powerpc/include/asm/paca.h @@ -166,7 +166,9 @@ struct paca_struct { u64 kstack; /* Saved Kernel stack addr */ u64 saved_r1; /* r1 save for RTAS calls or PM or EE=0 */ u64 saved_msr; /* MSR saved here by enter_rtas */ +#ifdef CONFIG_PPC_BOOK3E u16 trap_save; /* Used when bad stack is encountered */ +#endif u8 irq_soft_mask; /* mask for irq soft masking */ u8 irq_happened;/* irq happened while soft-disabled */ u8 irq_work_pending;/* IRQ_WORK interrupt while soft-disable */ diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index 31dc7e64cbfc..4ccb6b3a7fbd 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -266,7 +266,9 @@ int main(void) OFFSET(ACCOUNT_STARTTIME_USER, paca_struct, accounting.starttime_user); OFFSET(ACCOUNT_USER_TIME, paca_struct, accounting.utime); OFFSET(ACCOUNT_SYSTEM_TIME, paca_struct, accounting.stime); +#ifdef CONFIG_PPC_BOOK3E OFFSET(PACA_TRAP_SAVE, paca_struct, trap_save); +#endif OFFSET(PACA_SPRG_VDSO, paca_struct, sprg_vdso); #else /* CONFIG_PPC64 */ #ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index ce7aad9d3840..b402a006cd48 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -351,14 +351,8 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948) subir1,r1,INT_FRAME_SIZE; /* alloc frame on kernel stack */ \ beq-1f;\
[PATCH 23/28] powerpc/64s/exception: generate regs clear instructions using .rept
No generated code change. Signed-off-by: Nicholas Piggin --- arch/powerpc/kernel/exceptions-64s.S | 29 +++- 1 file changed, 16 insertions(+), 13 deletions(-) diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index a0721c3fc097..ce7aad9d3840 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -2018,12 +2018,11 @@ BEGIN_FTR_SECTION mtmsrd r10 sync -#define FMR2(n) fmr (n), (n) ; fmr n+1, n+1 -#define FMR4(n) FMR2(n) ; FMR2(n+2) -#define FMR8(n) FMR4(n) ; FMR4(n+4) -#define FMR16(n) FMR8(n) ; FMR8(n+8) -#define FMR32(n) FMR16(n) ; FMR16(n+16) - FMR32(0) + .Lreg=0 + .rept 32 + fmr .Lreg,.Lreg + .Lreg=.Lreg+1 + .endr FTR_SECTION_ELSE /* @@ -2035,12 +2034,11 @@ FTR_SECTION_ELSE mtmsrd r10 sync -#define XVCPSGNDP2(n) XVCPSGNDP(n,n,n) ; XVCPSGNDP(n+1,n+1,n+1) -#define XVCPSGNDP4(n) XVCPSGNDP2(n) ; XVCPSGNDP2(n+2) -#define XVCPSGNDP8(n) XVCPSGNDP4(n) ; XVCPSGNDP4(n+4) -#define XVCPSGNDP16(n) XVCPSGNDP8(n) ; XVCPSGNDP8(n+8) -#define XVCPSGNDP32(n) XVCPSGNDP16(n) ; XVCPSGNDP16(n+16) - XVCPSGNDP32(0) + .Lreg=0 + .rept 32 + XVCPSGNDP(.Lreg,.Lreg,.Lreg) + .Lreg=.Lreg+1 + .endr ALT_FTR_SECTION_END_IFCLR(CPU_FTR_ARCH_206) @@ -2051,7 +2049,12 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S) * To denormalise we need to move a copy of the register to itself. * For POWER8 we need to do that for all 64 VSX registers */ - XVCPSGNDP32(32) + .Lreg=32 + .rept 32 + XVCPSGNDP(.Lreg,.Lreg,.Lreg) + .Lreg=.Lreg+1 + .endr + denorm_done: mfspr r11,SPRN_HSRR0 subir11,r11,4 -- 2.20.1
[PATCH 22/28] powerpc/64s/exception: fix indenting irregularities
Generally, macros that result in instructions being expanded are indented by a tab, and those that don't have no indent. Fix the obvious cases that go contrary to style. No generated code change. Signed-off-by: Nicholas Piggin --- arch/powerpc/kernel/exceptions-64s.S | 92 ++-- 1 file changed, 46 insertions(+), 46 deletions(-) diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 1c11a7330856..a0721c3fc097 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -269,16 +269,16 @@ END_FTR_SECTION_NESTED(ftr,ftr,943) cmpwi r10,KVM_GUEST_MODE_SKIP beq 89f .else - BEGIN_FTR_SECTION_NESTED(947) +BEGIN_FTR_SECTION_NESTED(947) ld r10,\area+EX_CFAR(r13) std r10,HSTATE_CFAR(r13) - END_FTR_SECTION_NESTED(CPU_FTR_CFAR,CPU_FTR_CFAR,947) +END_FTR_SECTION_NESTED(CPU_FTR_CFAR,CPU_FTR_CFAR,947) .endif - BEGIN_FTR_SECTION_NESTED(948) +BEGIN_FTR_SECTION_NESTED(948) ld r10,\area+EX_PPR(r13) std r10,HSTATE_PPR(r13) - END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948) +END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948) ld r10,\area+EX_R10(r13) std r12,HSTATE_SCRATCH0(r13) sldir12,r9,32 @@ -380,10 +380,10 @@ END_FTR_SECTION_NESTED(ftr,ftr,943) std r9,GPR11(r1); \ std r10,GPR12(r1); \ std r11,GPR13(r1); \ - BEGIN_FTR_SECTION_NESTED(66); \ +BEGIN_FTR_SECTION_NESTED(66); \ ld r10,area+EX_CFAR(r13); \ std r10,ORIG_GPR3(r1); \ - END_FTR_SECTION_NESTED(CPU_FTR_CFAR, CPU_FTR_CFAR, 66);\ +END_FTR_SECTION_NESTED(CPU_FTR_CFAR, CPU_FTR_CFAR, 66); \ GET_CTR(r10, area);\ std r10,_CTR(r1); @@ -802,7 +802,7 @@ EXC_REAL_BEGIN(system_reset, 0x100, 0x100) * but we branch to the 0xc000... address so we can turn on relocation * with mtmsr. */ - BEGIN_FTR_SECTION +BEGIN_FTR_SECTION mfspr r10,SPRN_SRR1 rlwinm. r10,r10,47-31,30,31 beq-1f @@ -811,7 +811,7 @@ EXC_REAL_BEGIN(system_reset, 0x100, 0x100) bltlr cr1 /* no state loss, return to idle caller */ BRANCH_TO_C000(r10, system_reset_idle_common) 1: - END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) +END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) #endif KVMTEST EXC_STD 0x100 @@ -1159,10 +1159,10 @@ END_FTR_SECTION_IFCLR(CPU_FTR_HVMODE) * * Go back to nap/sleep/winkle mode again if (b) is true. */ - BEGIN_FTR_SECTION +BEGIN_FTR_SECTION rlwinm. r11,r12,47-31,30,31 bne machine_check_idle_common - END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) +END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) #endif /* @@ -1269,13 +1269,13 @@ EXC_COMMON_BEGIN(mce_return) b . 
EXC_REAL_BEGIN(data_access, 0x300, 0x80) -SET_SCRATCH0(r13) /* save r13 */ -EXCEPTION_PROLOG_0 PACA_EXGEN + SET_SCRATCH0(r13) /* save r13 */ + EXCEPTION_PROLOG_0 PACA_EXGEN b tramp_real_data_access EXC_REAL_END(data_access, 0x300, 0x80) TRAMP_REAL_BEGIN(tramp_real_data_access) -EXCEPTION_PROLOG_1 EXC_STD, PACA_EXGEN, 1, 0x300, 0 + EXCEPTION_PROLOG_1 EXC_STD, PACA_EXGEN, 1, 0x300, 0 /* * DAR/DSISR must be read before setting MSR[RI], because * a d-side MCE will clobber those registers so is not @@ -1288,9 +1288,9 @@ EXCEPTION_PROLOG_1 EXC_STD, PACA_EXGEN, 1, 0x300, 0 EXCEPTION_PROLOG_2_REAL data_access_common, EXC_STD, 1 EXC_VIRT_BEGIN(data_access, 0x4300, 0x80) -SET_SCRATCH0(r13) /* save r13 */ -EXCEPTION_PROLOG_0 PACA_EXGEN -EXCEPTION_PROLOG_1 EXC_STD, PACA_EXGEN, 0, 0x300, 0 + SET_SCRATCH0(r13) /* save r13 */ + EXCEPTION_PROLOG_0 PACA_EXGEN + EXCEPTION_PROLOG_1 EXC_STD, PACA_EXGEN, 0, 0x300, 0 mfspr r10,SPRN_DAR mfspr r11,SPRN_DSISR std r10,PACA_EXGEN+EX_DAR(r13) @@ -1323,24 +1323,24 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX) EXC_REAL_BEGIN(data_access_slb, 0x380, 0x80) -SET_SCRATCH0(r13) /* save r13 */ -EXCEPTION_PROLOG_0 PACA_EXSLB + SET_SCRATCH0(r13) /* save r13 */ + EXCEPTION_PROLOG_0 PACA_EXSLB b tramp_real_data_access_slb EXC_REAL_END(data_access_slb, 0x380, 0x80) TRAMP_REAL_BEGIN(tramp_real_data_access_slb) -EXCEPTION_PROLOG_1 EXC_STD, PACA_EXSL
[PATCH 21/28] powerpc/64s/exception: use a gas macro for system call handler code
No generated code change. Signed-off-by: Nicholas Piggin --- arch/powerpc/kernel/exceptions-64s.S | 127 --- 1 file changed, 55 insertions(+), 72 deletions(-) diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 8a65ae64ed54..1c11a7330856 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -1615,6 +1615,7 @@ EXC_COMMON(trap_0b_common, 0xb00, unknown_exception) * without saving, though xer is not a good idea to use, as hardware may * interpret some bits so it may be costly to change them. */ +.macro SYSTEM_CALL real #ifdef CONFIG_KVM_BOOK3S_64_HANDLER /* * There is a little bit of juggling to get syscall and hcall @@ -1624,95 +1625,77 @@ EXC_COMMON(trap_0b_common, 0xb00, unknown_exception) * Userspace syscalls have already saved the PPR, hcalls must save * it before setting HMT_MEDIUM. */ -#define SYSCALL_KVMTEST \ - mtctr r13;\ - GET_PACA(r13); \ - std r10,PACA_EXGEN+EX_R10(r13); \ - INTERRUPT_TO_KERNEL;\ - KVMTEST EXC_STD 0xc00 ; /* uses r10, branch to do_kvm_0xc00_system_call */ \ - HMT_MEDIUM; \ - mfctr r9; - + mtctr r13 + GET_PACA(r13) + std r10,PACA_EXGEN+EX_R10(r13) + INTERRUPT_TO_KERNEL + KVMTEST EXC_STD 0xc00 /* uses r10, branch to do_kvm_0xc00_system_call */ + HMT_MEDIUM + mfctr r9 #else -#define SYSCALL_KVMTEST \ - HMT_MEDIUM; \ - mr r9,r13; \ - GET_PACA(r13); \ - INTERRUPT_TO_KERNEL; + HMT_MEDIUM + mr r9,r13 + GET_PACA(r13) + INTERRUPT_TO_KERNEL #endif - -#define LOAD_SYSCALL_HANDLER(reg) \ - __LOAD_HANDLER(reg, system_call_common) - -/* - * After SYSCALL_KVMTEST, we reach here with PACA in r13, r13 in r9, - * and HMT_MEDIUM. - */ -#define SYSCALL_REAL \ - mfspr r11,SPRN_SRR0 ; \ - mfspr r12,SPRN_SRR1 ; \ - LOAD_SYSCALL_HANDLER(r10) ; \ - mtspr SPRN_SRR0,r10 ; \ - ld r10,PACAKMSR(r13) ; \ - mtspr SPRN_SRR1,r10 ; \ - RFI_TO_KERNEL ; \ - b . ; /* prevent speculative execution */ #ifdef CONFIG_PPC_FAST_ENDIAN_SWITCH -#define SYSCALL_FASTENDIAN_TEST\ -BEGIN_FTR_SECTION \ - cmpdi r0,0x1ebe ; \ - beq-1f ;\ -END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE) \ - -#define SYSCALL_FASTENDIAN \ - /* Fast LE/BE switch system call */ \ -1: mfspr r12,SPRN_SRR1 ; \ - xorir12,r12,MSR_LE ;\ - mtspr SPRN_SRR1,r12 ; \ - mr r13,r9 ;\ - RFI_TO_USER ; /* return to userspace */ \ - b . ; /* prevent speculative execution */ -#else -#define SYSCALL_FASTENDIAN_TEST -#define SYSCALL_FASTENDIAN -#endif /* CONFIG_PPC_FAST_ENDIAN_SWITCH */ +BEGIN_FTR_SECTION + cmpdi r0,0x1ebe + beq-1f +END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE) +#endif + /* We reach here with PACA in r13, r13 in r9, and HMT_MEDIUM. */ -#if defined(CONFIG_RELOCATABLE) + .if \real + mfspr r11,SPRN_SRR0 + mfspr r12,SPRN_SRR1 + __LOAD_HANDLER(r10, system_call_common) + mtspr SPRN_SRR0,r10 + ld r10,PACAKMSR(r13) + mtspr SPRN_SRR1,r10 + RFI_TO_KERNEL + b . /* prevent speculative execution */ + .else +#ifdef CONFIG_RELOCATABLE /* * We can't branch directly so we do it via the CTR which * is volatile across system calls. */ -#define SYSCALL_VIRT \ - LOAD_SYSCALL_HANDLER(r10) ; \ - mtctr r10 ; \ - mfspr r11,SPRN_SRR0 ; \ -
[PATCH 20/28] powerpc/64s/exception: remove __BRANCH_TO_KVM
No generated code change. Signed-off-by: Nicholas Piggin --- arch/powerpc/kernel/exceptions-64s.S | 43 1 file changed, 18 insertions(+), 25 deletions(-) diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 013abf3ea6f6..8a65ae64ed54 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -243,29 +243,6 @@ END_FTR_SECTION_NESTED(ftr,ftr,943) #endif #ifdef CONFIG_KVM_BOOK3S_64_HANDLER - -#ifdef CONFIG_RELOCATABLE -/* - * KVM requires __LOAD_FAR_HANDLER. - * - * __BRANCH_TO_KVM_EXIT branches are also a special case because they - * explicitly use r9 then reload it from PACA before branching. Hence - * the double-underscore. - */ -#define __BRANCH_TO_KVM_EXIT(area, label) \ - mfctr r9; \ - std r9,HSTATE_SCRATCH1(r13);\ - __LOAD_FAR_HANDLER(r9, label); \ - mtctr r9; \ - ld r9,area+EX_R9(r13); \ - bctr - -#else -#define __BRANCH_TO_KVM_EXIT(area, label) \ - ld r9,area+EX_R9(r13); \ - b label -#endif - #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE /* * If hv is possible, interrupts come into to the hv version @@ -311,8 +288,24 @@ END_FTR_SECTION_NESTED(ftr,ftr,943) .else ori r12,r12,(\n) .endif - /* This reloads r9 before branching to kvmppc_interrupt */ - __BRANCH_TO_KVM_EXIT(\area, kvmppc_interrupt) + +#ifdef CONFIG_RELOCATABLE + /* +* KVM requires __LOAD_FAR_HANDLER beause kvmppc_interrupt lives +* outside the head section. CONFIG_RELOCATABLE KVM expects CTR +* to be saved in HSTATE_SCRATCH1. +*/ + mfctr r9 + std r9,HSTATE_SCRATCH1(r13) + __LOAD_FAR_HANDLER(r9, kvmppc_interrupt) + mtctr r9 + ld r9,\area+EX_R9(r13) + bctr +#else + ld r9,\area+EX_R9(r13) + b kvmppc_interrupt +#endif + .if \skip 89:mtocrf 0x80,r9 -- 2.20.1
[PATCH 19/28] powerpc/64s/exception: move head-64.h code to exception-64s.S where it is used
No generated code change. Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/exception-64s.h | 1 - arch/powerpc/include/asm/head-64.h | 252 --- arch/powerpc/kernel/exceptions-64s.S | 251 ++ 3 files changed, 251 insertions(+), 253 deletions(-) diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h index 9e6712099f7a..dc6a5ccac965 100644 --- a/arch/powerpc/include/asm/exception-64s.h +++ b/arch/powerpc/include/asm/exception-64s.h @@ -30,7 +30,6 @@ * exception handlers (including pSeries LPAR) and iSeries LPAR * implementations as possible. */ -#include #include /* PACA save area offsets (exgen, exmc, etc) */ diff --git a/arch/powerpc/include/asm/head-64.h b/arch/powerpc/include/asm/head-64.h index dc1940c94a86..a466765709a9 100644 --- a/arch/powerpc/include/asm/head-64.h +++ b/arch/powerpc/include/asm/head-64.h @@ -169,53 +169,6 @@ end_##sname: #define ABS_ADDR(label) (label - fs_label + fs_start) -/* - * Following are the BOOK3S exception handler helper macros. - * Handlers come in a number of types, and each type has a number of varieties. - * - * EXC_REAL_* - real, unrelocated exception vectors - * EXC_VIRT_* - virt (AIL), unrelocated exception vectors - * TRAMP_REAL_* - real, unrelocated helpers (virt can call these) - * TRAMP_VIRT_* - virt, unreloc helpers (in practice, real can use) - * TRAMP_KVM - KVM handlers that get put into real, unrelocated - * EXC_COMMON - virt, relocated common handlers - * - * The EXC handlers are given a name, and branch to name_common, or the - * appropriate KVM or masking function. Vector handler verieties are as - * follows: - * - * EXC_{REAL|VIRT}_BEGIN/END - used to open-code the exception - * - * EXC_{REAL|VIRT} - standard exception - * - * EXC_{REAL|VIRT}_suffix - * where _suffix is: - * - _MASKABLE - maskable exception - * - _OOL- out of line with trampoline to common handler - * - _HV - HV exception - * - * There can be combinations, e.g., EXC_VIRT_OOL_MASKABLE_HV - * - * The one unusual case is __EXC_REAL_OOL_HV_DIRECT, which is - * an OOL vector that branches to a specified handler rather than the usual - * trampoline that goes to common. It, and other underscore macros, should - * be used with care. - * - * KVM handlers come in the following verieties: - * TRAMP_KVM - * TRAMP_KVM_SKIP - * TRAMP_KVM_HV - * TRAMP_KVM_HV_SKIP - * - * COMMON handlers come in the following verieties: - * EXC_COMMON_BEGIN/END - used to open-code the handler - * EXC_COMMON - * EXC_COMMON_ASYNC - * - * TRAMP_REAL and TRAMP_VIRT can be used with BEGIN/END. KVM - * and OOL handlers are implemented as types of TRAMP and TRAMP_VIRT handlers. 
- */ - #define EXC_REAL_BEGIN(name, start, size) \ FIXED_SECTION_ENTRY_BEGIN_LOCATION(real_vectors, exc_real_##start##_##name, start, size) @@ -257,211 +210,6 @@ end_##sname: FIXED_SECTION_ENTRY_BEGIN_LOCATION(virt_vectors, exc_virt_##start##_##unused, start, size); \ FIXED_SECTION_ENTRY_END_LOCATION(virt_vectors, exc_virt_##start##_##unused, start, size) - -#define __EXC_REAL(name, start, size, area)\ - EXC_REAL_BEGIN(name, start, size); \ - SET_SCRATCH0(r13); /* save r13 */ \ - EXCEPTION_PROLOG_0 area ; \ - EXCEPTION_PROLOG_1 EXC_STD, area, 1, start, 0 ; \ - EXCEPTION_PROLOG_2_REAL name##_common, EXC_STD, 1 ; \ - EXC_REAL_END(name, start, size) - -#define EXC_REAL(name, start, size)\ - __EXC_REAL(name, start, size, PACA_EXGEN) - -#define __EXC_VIRT(name, start, size, realvec, area) \ - EXC_VIRT_BEGIN(name, start, size); \ - SET_SCRATCH0(r13);/* save r13 */\ - EXCEPTION_PROLOG_0 area ; \ - EXCEPTION_PROLOG_1 EXC_STD, area, 0, realvec, 0;\ - EXCEPTION_PROLOG_2_VIRT name##_common, EXC_STD ;\ - EXC_VIRT_END(name, start, size) - -#define EXC_VIRT(name, start, size, realvec) \ - __EXC_VIRT(name, start, size, realvec, PACA_EXGEN) - -#define EXC_REAL_MASKABLE(name, start, size, bitmask) \ - EXC_REAL_BEGIN(name, start, size); \ - SET_SCRATCH0(r13);/* save r13 */\ - EXCEPTION_PROLOG_0 PACA_EXGEN ; \ - EXCEPTION_PROLOG_1 EXC_STD, PACA_EXGEN, 1, start, bitmask ; \ - EXCEPTION_PROLOG_2_REAL name##_common, EXC_STD, 1 ; \ - EXC_REAL_END(name, start, size) - -#define EXC_VIRT_MASKABLE(name, start, size, real
[PATCH 18/28] powerpc/64s/exception: move exception-64s.h code to exception-64s.S where it is used
No generated code change. Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/exception-64s.h | 430 -- arch/powerpc/kernel/exceptions-64s.S | 431 +++ 2 files changed, 431 insertions(+), 430 deletions(-) diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h index e996ffe68cf3..9e6712099f7a 100644 --- a/arch/powerpc/include/asm/exception-64s.h +++ b/arch/powerpc/include/asm/exception-64s.h @@ -146,436 +146,6 @@ hrfid; \ b hrfi_flush_fallback -/* - * We're short on space and time in the exception prolog, so we can't - * use the normal LOAD_REG_IMMEDIATE macro to load the address of label. - * Instead we get the base of the kernel from paca->kernelbase and or in the low - * part of label. This requires that the label be within 64KB of kernelbase, and - * that kernelbase be 64K aligned. - */ -#define LOAD_HANDLER(reg, label) \ - ld reg,PACAKBASE(r13); /* get high part of &label */ \ - ori reg,reg,FIXED_SYMBOL_ABS_ADDR(label) - -#define __LOAD_HANDLER(reg, label) \ - ld reg,PACAKBASE(r13); \ - ori reg,reg,(ABS_ADDR(label))@l - -/* - * Branches from unrelocated code (e.g., interrupts) to labels outside - * head-y require >64K offsets. - */ -#define __LOAD_FAR_HANDLER(reg, label) \ - ld reg,PACAKBASE(r13); \ - ori reg,reg,(ABS_ADDR(label))@l;\ - addis reg,reg,(ABS_ADDR(label))@h - -/* Exception register prefixes */ -#define EXC_HV 1 -#define EXC_STD0 - -#if defined(CONFIG_RELOCATABLE) -/* - * If we support interrupts with relocation on AND we're a relocatable kernel, - * we need to use CTR to get to the 2nd level handler. So, save/restore it - * when required. - */ -#define SAVE_CTR(reg, area)mfctr reg ; std reg,area+EX_CTR(r13) -#define GET_CTR(reg, area) ld reg,area+EX_CTR(r13) -#define RESTORE_CTR(reg, area) ld reg,area+EX_CTR(r13) ; mtctr reg -#else -/* ...else CTR is unused and in register. 
*/ -#define SAVE_CTR(reg, area) -#define GET_CTR(reg, area) mfctr reg -#define RESTORE_CTR(reg, area) -#endif - -/* - * PPR save/restore macros used in exceptions_64s.S - * Used for P7 or later processors - */ -#define SAVE_PPR(area, ra) \ -BEGIN_FTR_SECTION_NESTED(940) \ - ld ra,area+EX_PPR(r13);/* Read PPR from paca */\ - std ra,_PPR(r1);\ -END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,940) - -#define RESTORE_PPR_PACA(area, ra) \ -BEGIN_FTR_SECTION_NESTED(941) \ - ld ra,area+EX_PPR(r13);\ - mtspr SPRN_PPR,ra;\ -END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,941) - -/* - * Get an SPR into a register if the CPU has the given feature - */ -#define OPT_GET_SPR(ra, spr, ftr) \ -BEGIN_FTR_SECTION_NESTED(943) \ - mfspr ra,spr; \ -END_FTR_SECTION_NESTED(ftr,ftr,943) - -/* - * Set an SPR from a register if the CPU has the given feature - */ -#define OPT_SET_SPR(ra, spr, ftr) \ -BEGIN_FTR_SECTION_NESTED(943) \ - mtspr spr,ra; \ -END_FTR_SECTION_NESTED(ftr,ftr,943) - -/* - * Save a register to the PACA if the CPU has the given feature - */ -#define OPT_SAVE_REG_TO_PACA(offset, ra, ftr) \ -BEGIN_FTR_SECTION_NESTED(943) \ - std ra,offset(r13); \ -END_FTR_SECTION_NESTED(ftr,ftr,943) - -.macro EXCEPTION_PROLOG_0 area - GET_PACA(r13) - std r9,\area\()+EX_R9(r13) /* save r9 */ - OPT_GET_SPR(r9, SPRN_PPR, CPU_FTR_HAS_PPR) - HMT_MEDIUM - std r10,\area\()+EX_R10(r13)/* save r10 - r12 */ - OPT_GET_SPR(r10, SPRN_CFAR, CPU_FTR_CFAR) -.endm - -.macro EXCEPTION_PROLOG_1 hsrr, area, kvm, vec, bitmask - OPT_SAVE_REG_TO_PACA(\area\()+EX_PPR, r9, CPU_FTR_HAS_PPR) - OPT_SAVE_REG_TO_PACA(\area\()+EX_CFAR, r10, CPU_FTR_CFAR) - INTERRUPT_TO_KERNEL - SAVE_CTR(r10, \area\()) - mfcrr9 - .if \kvm - KVMTEST \hsrr \vec - .endif - .if \bitmask - lbz r10,PACAIRQ
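The LOAD_HANDLER()/__LOAD_FAR_HANDLER() macros being moved depend on kernelbase being 64K aligned, as the comment says. A sketch of how a prolog typically consumes them (label names are placeholders):

  LOAD_HANDLER(r12, example_common)   /* r12 = paca->kernelbase | low 16 bits of the label's
                                         absolute address; only valid for labels within 64KB */
  mtspr SPRN_SRR0,r12                 /* rfid will resume at the handler */
  ...
  RFI_TO_KERNEL

  __LOAD_FAR_HANDLER(r12, far_label)  /* adds an addis for the high 16 bits, for labels
                                         outside head-y (more than 64KB from kernelbase) */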
[PATCH 17/28] powerpc/64s/exception: move KVM related code together
No generated code change. Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/exception-64s.h | 40 +--- 1 file changed, 21 insertions(+), 19 deletions(-) diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h index 73705421f423..e996ffe68cf3 100644 --- a/arch/powerpc/include/asm/exception-64s.h +++ b/arch/powerpc/include/asm/exception-64s.h @@ -335,18 +335,6 @@ END_FTR_SECTION_NESTED(ftr,ftr,943) #endif .endm - -#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE -/* - * If hv is possible, interrupts come into to the hv version - * of the kvmppc_interrupt code, which then jumps to the PR handler, - * kvmppc_interrupt_pr, if the guest is a PR guest. - */ -#define kvmppc_interrupt kvmppc_interrupt_hv -#else -#define kvmppc_interrupt kvmppc_interrupt_pr -#endif - /* * Branch to label using its 0xC000 address. This results in instruction * address suitable for MSR[IR]=0 or 1, which allows relocation to be turned @@ -371,6 +359,17 @@ END_FTR_SECTION_NESTED(ftr,ftr,943) mtctr r12;\ bctrl +#else +#define BRANCH_TO_COMMON(reg, label) \ + b label + +#define BRANCH_LINK_TO_FAR(label) \ + bl label +#endif + +#ifdef CONFIG_KVM_BOOK3S_64_HANDLER + +#ifdef CONFIG_RELOCATABLE /* * KVM requires __LOAD_FAR_HANDLER. * @@ -387,19 +386,22 @@ END_FTR_SECTION_NESTED(ftr,ftr,943) bctr #else -#define BRANCH_TO_COMMON(reg, label) \ - b label - -#define BRANCH_LINK_TO_FAR(label) \ - bl label - #define __BRANCH_TO_KVM_EXIT(area, label) \ ld r9,area+EX_R9(r13); \ b label +#endif +#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE +/* + * If hv is possible, interrupts come into to the hv version + * of the kvmppc_interrupt code, which then jumps to the PR handler, + * kvmppc_interrupt_pr, if the guest is a PR guest. + */ +#define kvmppc_interrupt kvmppc_interrupt_hv +#else +#define kvmppc_interrupt kvmppc_interrupt_pr #endif -#ifdef CONFIG_KVM_BOOK3S_64_HANDLER .macro KVMTEST hsrr, n lbz r10,HSTATE_IN_GUEST(r13) cmpwi r10,0 -- 2.20.1
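To see how the pieces grouped here fit together, the flow is roughly as follows (a sketch; do_kvm_N stands for the per-vector trampoline label and register detail is elided):

  /* in the exception prolog, KVMTEST: */
  lbz   r10,HSTATE_IN_GUEST(r13)    /* non-zero if we interrupted a guest */
  cmpwi r10,0
  bne   do_kvm_N                    /* take the KVM exit path */

  /* do_kvm_N (TRAMP_KVM* / KVM_HANDLER): save CFAR/PPR/r10/r12, put CR and
   * the trap number in r12, then: */
  __BRANCH_TO_KVM_EXIT(PACA_EXGEN, kvmppc_interrupt)
  /* where kvmppc_interrupt is #defined to kvmppc_interrupt_hv when HV is
   * possible, otherwise kvmppc_interrupt_pr, as in the hunk above */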
[PATCH 16/28] powerpc/64s/exception: remove STD_EXCEPTION_COMMON variants
These are only called in one place each. No generated code change. Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/exception-64s.h | 22 -- arch/powerpc/include/asm/head-64.h | 19 +-- 2 files changed, 17 insertions(+), 24 deletions(-) diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h index 6de3c393ddf7..73705421f423 100644 --- a/arch/powerpc/include/asm/exception-64s.h +++ b/arch/powerpc/include/asm/exception-64s.h @@ -555,28 +555,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_CTRL) EXCEPTION_PROLOG_COMMON_2(area);\ EXCEPTION_PROLOG_COMMON_3(trap) -#define STD_EXCEPTION_COMMON(trap, hdlr) \ - EXCEPTION_COMMON(PACA_EXGEN, trap); \ - bl save_nvgprs;\ - RECONCILE_IRQ_STATE(r10, r11); \ - addir3,r1,STACK_FRAME_OVERHEAD; \ - bl hdlr; \ - b ret_from_except - -/* - * Like STD_EXCEPTION_COMMON, but for exceptions that can occur - * in the idle task and therefore need the special idle handling - * (finish nap and runlatch) - */ -#define STD_EXCEPTION_COMMON_ASYNC(trap, hdlr) \ - EXCEPTION_COMMON(PACA_EXGEN, trap); \ - FINISH_NAP; \ - RECONCILE_IRQ_STATE(r10, r11); \ - RUNLATCH_ON;\ - addir3,r1,STACK_FRAME_OVERHEAD; \ - bl hdlr; \ - b ret_from_except_lite - /* * When the idle code in power4_idle puts the CPU into NAP mode, * it has to do so in a loop, and relies on the external interrupt diff --git a/arch/powerpc/include/asm/head-64.h b/arch/powerpc/include/asm/head-64.h index 54db05afb80f..dc1940c94a86 100644 --- a/arch/powerpc/include/asm/head-64.h +++ b/arch/powerpc/include/asm/head-64.h @@ -441,11 +441,26 @@ end_##sname: #define EXC_COMMON(name, realvec, hdlr) \ EXC_COMMON_BEGIN(name); \ - STD_EXCEPTION_COMMON(realvec, hdlr) + EXCEPTION_COMMON(PACA_EXGEN, realvec); \ + bl save_nvgprs;\ + RECONCILE_IRQ_STATE(r10, r11); \ + addir3,r1,STACK_FRAME_OVERHEAD; \ + bl hdlr; \ + b ret_from_except +/* + * Like EXC_COMMON, but for exceptions that can occur in the idle task and + * therefore need the special idle handling (finish nap and runlatch) + */ #define EXC_COMMON_ASYNC(name, realvec, hdlr) \ EXC_COMMON_BEGIN(name); \ - STD_EXCEPTION_COMMON_ASYNC(realvec, hdlr) + EXCEPTION_COMMON(PACA_EXGEN, realvec); \ + FINISH_NAP; \ + RECONCILE_IRQ_STATE(r10, r11); \ + RUNLATCH_ON;\ + addir3,r1,STACK_FRAME_OVERHEAD; \ + bl hdlr; \ + b ret_from_except_lite #endif /* __ASSEMBLY__ */ -- 2.20.1
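With the variants folded in, call sites pick between the two remaining wrappers. Illustrative uses (vector numbers and handler names are only examples, not from this patch):

  EXC_COMMON(example_trap_common, 0x700, example_sync_handler)        /* synchronous exception */
  EXC_COMMON_ASYNC(example_irq_common, 0x900, example_async_handler)  /* interrupt that can hit the
                                                                         idle task: adds FINISH_NAP and
                                                                         RUNLATCH_ON, returns via
                                                                         ret_from_except_lite */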
[PATCH 15/28] powerpc/64s/exception: move EXCEPTION_PROLOG_2* to a more logical place
No generated code change. Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/exception-64s.h | 113 --- 1 file changed, 57 insertions(+), 56 deletions(-) diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h index 0bb0310b794f..6de3c393ddf7 100644 --- a/arch/powerpc/include/asm/exception-64s.h +++ b/arch/powerpc/include/asm/exception-64s.h @@ -170,62 +170,6 @@ ori reg,reg,(ABS_ADDR(label))@l;\ addis reg,reg,(ABS_ADDR(label))@h -.macro EXCEPTION_PROLOG_2_REAL label, hsrr, set_ri - ld r10,PACAKMSR(r13) /* get MSR value for kernel */ - .if ! \set_ri - xorir10,r10,MSR_RI /* Clear MSR_RI */ - .endif - .if \hsrr - mfspr r11,SPRN_HSRR0 /* save HSRR0 */ - .else - mfspr r11,SPRN_SRR0 /* save SRR0 */ - .endif - LOAD_HANDLER(r12, \label\()) - .if \hsrr - mtspr SPRN_HSRR0,r12 - mfspr r12,SPRN_HSRR1 /* and HSRR1 */ - mtspr SPRN_HSRR1,r10 - HRFI_TO_KERNEL - .else - mtspr SPRN_SRR0,r12 - mfspr r12,SPRN_SRR1 /* and SRR1 */ - mtspr SPRN_SRR1,r10 - RFI_TO_KERNEL - .endif - b . /* prevent speculative execution */ -.endm - -.macro EXCEPTION_PROLOG_2_VIRT label, hsrr -#ifdef CONFIG_RELOCATABLE - .if \hsrr - mfspr r11,SPRN_HSRR0 /* save HSRR0 */ - .else - mfspr r11,SPRN_SRR0 /* save SRR0 */ - .endif - LOAD_HANDLER(r12, \label\()) - mtctr r12 - .if \hsrr - mfspr r12,SPRN_HSRR1 /* and HSRR1 */ - .else - mfspr r12,SPRN_SRR1 /* and HSRR1 */ - .endif - li r10,MSR_RI - mtmsrd r10,1 /* Set RI (EE=0) */ - bctr -#else - .if \hsrr - mfspr r11,SPRN_HSRR0 /* save HSRR0 */ - mfspr r12,SPRN_HSRR1 /* and HSRR1 */ - .else - mfspr r11,SPRN_SRR0 /* save SRR0 */ - mfspr r12,SPRN_SRR1 /* and SRR1 */ - .endif - li r10,MSR_RI - mtmsrd r10,1 /* Set RI (EE=0) */ - b \label -#endif -.endm - /* Exception register prefixes */ #define EXC_HV 1 #define EXC_STD0 @@ -335,6 +279,63 @@ END_FTR_SECTION_NESTED(ftr,ftr,943) std r10,\area\()+EX_R13(r13) .endm +.macro EXCEPTION_PROLOG_2_REAL label, hsrr, set_ri + ld r10,PACAKMSR(r13) /* get MSR value for kernel */ + .if ! \set_ri + xorir10,r10,MSR_RI /* Clear MSR_RI */ + .endif + .if \hsrr + mfspr r11,SPRN_HSRR0 /* save HSRR0 */ + .else + mfspr r11,SPRN_SRR0 /* save SRR0 */ + .endif + LOAD_HANDLER(r12, \label\()) + .if \hsrr + mtspr SPRN_HSRR0,r12 + mfspr r12,SPRN_HSRR1 /* and HSRR1 */ + mtspr SPRN_HSRR1,r10 + HRFI_TO_KERNEL + .else + mtspr SPRN_SRR0,r12 + mfspr r12,SPRN_SRR1 /* and SRR1 */ + mtspr SPRN_SRR1,r10 + RFI_TO_KERNEL + .endif + b . /* prevent speculative execution */ +.endm + +.macro EXCEPTION_PROLOG_2_VIRT label, hsrr +#ifdef CONFIG_RELOCATABLE + .if \hsrr + mfspr r11,SPRN_HSRR0 /* save HSRR0 */ + .else + mfspr r11,SPRN_SRR0 /* save SRR0 */ + .endif + LOAD_HANDLER(r12, \label\()) + mtctr r12 + .if \hsrr + mfspr r12,SPRN_HSRR1 /* and HSRR1 */ + .else + mfspr r12,SPRN_SRR1 /* and HSRR1 */ + .endif + li r10,MSR_RI + mtmsrd r10,1 /* Set RI (EE=0) */ + bctr +#else + .if \hsrr + mfspr r11,SPRN_HSRR0 /* save HSRR0 */ + mfspr r12,SPRN_HSRR1 /* and HSRR1 */ + .else + mfspr r11,SPRN_SRR0 /* save SRR0 */ + mfspr r12,SPRN_SRR1 /* and SRR1 */ + .endif + li r10,MSR_RI + mtmsrd r10,1 /* Set RI (EE=0) */ + b \label +#endif +.endm + + #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE /* * If hv is possible, interrupts come into to the hv version -- 2.20.1
[PATCH 14/28] powerpc/64s/exception: improve 0x500 handler code
After the previous cleanup, it becomes possible to consolidate some common code outside the runtime alternate patching. Also remove unused labels. This results in some code change, but unchanged runtime instruction sequence. Signed-off-by: Nicholas Piggin --- arch/powerpc/kernel/exceptions-64s.S | 16 1 file changed, 4 insertions(+), 12 deletions(-) diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index b8dba3fffeeb..c95dfc618a52 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -746,32 +746,24 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX) EXC_REAL_BEGIN(hardware_interrupt, 0x500, 0x100) - .globl hardware_interrupt_hv -hardware_interrupt_hv: + SET_SCRATCH0(r13) /* save r13 */ + EXCEPTION_PROLOG_0 PACA_EXGEN BEGIN_FTR_SECTION - SET_SCRATCH0(r13) /* save r13 */ - EXCEPTION_PROLOG_0 PACA_EXGEN EXCEPTION_PROLOG_1 EXC_HV, PACA_EXGEN, 1, 0x500, IRQS_DISABLED EXCEPTION_PROLOG_2_REAL hardware_interrupt_common, EXC_HV, 1 FTR_SECTION_ELSE - SET_SCRATCH0(r13) /* save r13 */ - EXCEPTION_PROLOG_0 PACA_EXGEN EXCEPTION_PROLOG_1 EXC_STD, PACA_EXGEN, 1, 0x500, IRQS_DISABLED EXCEPTION_PROLOG_2_REAL hardware_interrupt_common, EXC_STD, 1 ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) EXC_REAL_END(hardware_interrupt, 0x500, 0x100) EXC_VIRT_BEGIN(hardware_interrupt, 0x4500, 0x100) - .globl hardware_interrupt_relon_hv -hardware_interrupt_relon_hv: + SET_SCRATCH0(r13) /* save r13 */ + EXCEPTION_PROLOG_0 PACA_EXGEN BEGIN_FTR_SECTION - SET_SCRATCH0(r13) /* save r13 */ - EXCEPTION_PROLOG_0 PACA_EXGEN EXCEPTION_PROLOG_1 EXC_HV, PACA_EXGEN, 1, 0x500, IRQS_DISABLED EXCEPTION_PROLOG_2_VIRT hardware_interrupt_common, EXC_HV FTR_SECTION_ELSE - SET_SCRATCH0(r13) /* save r13 */ - EXCEPTION_PROLOG_0 PACA_EXGEN EXCEPTION_PROLOG_1 EXC_STD, PACA_EXGEN, 1, 0x500, IRQS_DISABLED EXCEPTION_PROLOG_2_VIRT hardware_interrupt_common, EXC_STD ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE) -- 2.20.1
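The consolidation relies on the usual feature-section patching: only the instructions that genuinely differ between CPU variants need to sit inside the alternative. A sketch of the resulting shape (not the full 0x500 handler):

  SET_SCRATCH0(r13)               /* common to both alternatives, hoisted out */
  EXCEPTION_PROLOG_0 PACA_EXGEN
BEGIN_FTR_SECTION
  /* HV form, patched in at boot when the feature bits are set */
FTR_SECTION_ELSE
  /* non-HV form otherwise */
ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)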
[PATCH 13/28] powerpc/64s/exception: unwind exception-64s.h macros
Many of these macros just specify 1-4 lines which are only called a few times each at most, and often just once. Remove this indirection. No generated code change. Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/exception-64s.h | 101 --- arch/powerpc/include/asm/head-64.h | 76 - arch/powerpc/kernel/exceptions-64s.S | 44 +- 3 files changed, 82 insertions(+), 139 deletions(-) diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h index 24fc0104c9d3..0bb0310b794f 100644 --- a/arch/powerpc/include/asm/exception-64s.h +++ b/arch/powerpc/include/asm/exception-64s.h @@ -226,17 +226,6 @@ #endif .endm -/* - * As EXCEPTION_PROLOG(), except we've already got relocation on so no need to - * rfid. Save CTR in case we're CONFIG_RELOCATABLE, in which case - * EXCEPTION_PROLOG_2_VIRT will be using CTR. - */ -#define EXCEPTION_RELON_PROLOG(area, label, hsrr, kvm, vec)\ - SET_SCRATCH0(r13); /* save r13 */ \ - EXCEPTION_PROLOG_0 area ; \ - EXCEPTION_PROLOG_1 hsrr, area, kvm, vec, 0 ;\ - EXCEPTION_PROLOG_2_VIRT label, hsrr - /* Exception register prefixes */ #define EXC_HV 1 #define EXC_STD0 @@ -346,12 +335,6 @@ END_FTR_SECTION_NESTED(ftr,ftr,943) std r10,\area\()+EX_R13(r13) .endm -#define EXCEPTION_PROLOG(area, label, hsrr, kvm, vec) \ - SET_SCRATCH0(r13); /* save r13 */ \ - EXCEPTION_PROLOG_0 area ; \ - EXCEPTION_PROLOG_1 hsrr, area, kvm, vec, 0 ;\ - EXCEPTION_PROLOG_2_REAL label, hsrr, 1 - #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE /* * If hv is possible, interrupts come into to the hv version @@ -415,12 +398,6 @@ END_FTR_SECTION_NESTED(ftr,ftr,943) #endif -/* Do not enable RI */ -#define EXCEPTION_PROLOG_NORI(area, label, hsrr, kvm, vec) \ - EXCEPTION_PROLOG_0 area ; \ - EXCEPTION_PROLOG_1 hsrr, area, kvm, vec, 0 ;\ - EXCEPTION_PROLOG_2_REAL label, hsrr, 0 - #ifdef CONFIG_KVM_BOOK3S_64_HANDLER .macro KVMTEST hsrr, n lbz r10,HSTATE_IN_GUEST(r13) @@ -557,84 +534,6 @@ END_FTR_SECTION_NESTED(ftr,ftr,943) std r10,RESULT(r1); /* clear regs->result */ \ std r11,STACK_FRAME_OVERHEAD-16(r1); /* mark the frame */ -/* - * Exception vectors. 
- */ -#define STD_EXCEPTION(vec, label) \ - EXCEPTION_PROLOG(PACA_EXGEN, label, EXC_STD, 1, vec); - -/* Version of above for when we have to branch out-of-line */ -#define __OOL_EXCEPTION(vec, label, hdlr) \ - SET_SCRATCH0(r13); \ - EXCEPTION_PROLOG_0 PACA_EXGEN ; \ - b hdlr - -#define STD_EXCEPTION_OOL(vec, label) \ - EXCEPTION_PROLOG_1 EXC_STD, PACA_EXGEN, 1, vec, 0 ; \ - EXCEPTION_PROLOG_2_REAL label, EXC_STD, 1 - -#define STD_EXCEPTION_HV(loc, vec, label) \ - EXCEPTION_PROLOG(PACA_EXGEN, label, EXC_HV, 1, vec) - -#define STD_EXCEPTION_HV_OOL(vec, label) \ - EXCEPTION_PROLOG_1 EXC_HV, PACA_EXGEN, 1, vec, 0 ; \ - EXCEPTION_PROLOG_2_REAL label, EXC_HV, 1 - -#define STD_RELON_EXCEPTION(loc, vec, label) \ - /* No guest interrupts come through here */ \ - EXCEPTION_RELON_PROLOG(PACA_EXGEN, label, EXC_STD, 0, vec) - -#define STD_RELON_EXCEPTION_OOL(vec, label)\ - EXCEPTION_PROLOG_1 EXC_STD, PACA_EXGEN, 0, vec, 0 ; \ - EXCEPTION_PROLOG_2_VIRT label, EXC_STD - -#define STD_RELON_EXCEPTION_HV(loc, vec, label)\ - EXCEPTION_RELON_PROLOG(PACA_EXGEN, label, EXC_HV, 1, vec) - -#define STD_RELON_EXCEPTION_HV_OOL(vec, label) \ - EXCEPTION_PROLOG_1 EXC_HV, PACA_EXGEN, 1, vec, 0 ; \ - EXCEPTION_PROLOG_2_VIRT label, EXC_HV - -#define __MASKABLE_EXCEPTION(vec, label, hsrr, kvm, bitmask) \ - SET_SCRATCH0(r13);/* save r13 */\ - EXCEPTION_PROLOG_0 PACA_EXGEN ; \ - EXCEPTION_PROLOG_1 hsrr, PACA_EXGEN, kvm, vec, bitmask ;\ - EXCEPTION_PROLOG_2_REAL label, hsrr, 1 - -#define MASKABLE_EXCEPTION(vec, label, bitmask) \ - __MASKABLE_EXCEPTION(vec, label, EXC_STD, 1, bitmask) - -#define MASKABLE_EXCEPTION_OOL(vec, label, bitmask)\ - EXCEPTION_PROLOG_1 EXC_STD, PACA_EXGEN, 1, vec, bitmask ; \ - EXCEPTION_PROLOG_2_REAL label, EXC_STD, 1 - -#define MASKABLE_EXCEPTION_HV(vec, label, bitmask) \ - __MASK
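After unwinding, a site that used to be a single STD_EXCEPTION(vec, label) invocation is spelled out with the underlying prolog macros, i.e. roughly (vec and label are the old macro's parameters):

  SET_SCRATCH0(r13)                                  /* save r13 */
  EXCEPTION_PROLOG_0 PACA_EXGEN                      /* get paca, save r9-r12 */
  EXCEPTION_PROLOG_1 EXC_STD, PACA_EXGEN, 1, vec, 0  /* PPR/CFAR/CTR/CR, KVM test, no soft-mask test */
  EXCEPTION_PROLOG_2_REAL label, EXC_STD, 1          /* set SRR0/SRR1 and rfid to the common handler */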
[PATCH 11/28] powerpc/64s/exception: Move EXCEPTION_COMMON handler and return branches into callers
The aim is to reduce the amount of indirection it takes to get through the exception handler macros, particularly where it provides little code sharing. No generated code change. Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/exception-64s.h | 26 arch/powerpc/kernel/exceptions-64s.S | 21 +++ 2 files changed, 26 insertions(+), 21 deletions(-) diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h index f19c2391cc36..cc65e87cff2f 100644 --- a/arch/powerpc/include/asm/exception-64s.h +++ b/arch/powerpc/include/asm/exception-64s.h @@ -658,31 +658,28 @@ BEGIN_FTR_SECTION \ beqlppc64_runlatch_on_trampoline; \ END_FTR_SECTION_IFSET(CPU_FTR_CTRL) -#define EXCEPTION_COMMON(area, trap, label, hdlr, ret, additions) \ +#define EXCEPTION_COMMON(area, trap, label, additions) \ EXCEPTION_PROLOG_COMMON(trap, area);\ /* Volatile regs are potentially clobbered here */ \ - additions; \ - addir3,r1,STACK_FRAME_OVERHEAD; \ - bl hdlr; \ - b ret + additions /* * Exception where stack is already set in r1, r1 is saved in r10, and it * continues rather than returns. */ -#define EXCEPTION_COMMON_NORET_STACK(area, trap, label, hdlr, additions) \ +#define EXCEPTION_COMMON_NORET_STACK(area, trap, label, additions) \ EXCEPTION_PROLOG_COMMON_1();\ kuap_save_amr_and_lock r9, r10, cr1;\ EXCEPTION_PROLOG_COMMON_2(area);\ EXCEPTION_PROLOG_COMMON_3(trap);\ /* Volatile regs are potentially clobbered here */ \ - additions; \ - addir3,r1,STACK_FRAME_OVERHEAD; \ - bl hdlr + additions #define STD_EXCEPTION_COMMON(trap, label, hdlr)\ - EXCEPTION_COMMON(PACA_EXGEN, trap, label, hdlr, \ - ret_from_except, ADD_NVGPRS;ADD_RECONCILE) + EXCEPTION_COMMON(PACA_EXGEN, trap, label, ADD_NVGPRS;ADD_RECONCILE); \ + addir3,r1,STACK_FRAME_OVERHEAD; \ + bl hdlr; \ + b ret_from_except /* * Like STD_EXCEPTION_COMMON, but for exceptions that can occur @@ -690,8 +687,11 @@ END_FTR_SECTION_IFSET(CPU_FTR_CTRL) * (finish nap and runlatch) */ #define STD_EXCEPTION_COMMON_ASYNC(trap, label, hdlr) \ - EXCEPTION_COMMON(PACA_EXGEN, trap, label, hdlr, \ - ret_from_except_lite, FINISH_NAP;ADD_RECONCILE;RUNLATCH_ON) + EXCEPTION_COMMON(PACA_EXGEN, trap, label, \ + FINISH_NAP;ADD_RECONCILE;RUNLATCH_ON); \ + addir3,r1,STACK_FRAME_OVERHEAD; \ + bl hdlr; \ + b ret_from_except_lite /* * When the idle code in power4_idle puts the CPU into NAP mode, diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 88f892167a64..63b161c23e9e 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -195,9 +195,10 @@ EXC_COMMON_BEGIN(system_reset_common) mr r10,r1 ld r1,PACA_NMI_EMERG_SP(r13) subir1,r1,INT_FRAME_SIZE - EXCEPTION_COMMON_NORET_STACK(PACA_EXNMI, 0x100, - system_reset, system_reset_exception, - ADD_NVGPRS;ADD_RECONCILE_NMI) + EXCEPTION_COMMON_NORET_STACK(PACA_EXNMI, 0x100, system_reset, + ADD_NVGPRS;ADD_RECONCILE_NMI) + addir3,r1,STACK_FRAME_OVERHEAD + bl system_reset_exception /* This (and MCE) can be simplified with mtmsrd L=1 */ /* Clear MSR_RI before setting SRR0 and SRR1. 
*/ @@ -1171,8 +1172,11 @@ hmi_exception_after_realmode: b tramp_real_hmi_exception EXC_COMMON_BEGIN(hmi_exception_common) -EXCEPTION_COMMON(PACA_EXGEN, 0xe60, hmi_exception_common, handle_hmi_exception, -ret_from_except, FINISH_NAP;ADD_NVGPRS;ADD_RECONCILE;RUNLATCH_ON) +EXCEPTION_COMMON(PACA_EXGEN, 0xe60, hmi_exception_common, + FINISH_NAP;ADD_NVGPRS;ADD_RECONCILE;RUNLATCH_ON) + addir3,r1,STACK_FRAME_OVERHEAD + bl handle_hmi_exception + b ret_from_except EXC_REAL_OOL_MASKABLE_HV(h_doorbell, 0xe80, 0x20, IRQS_DISABLED) EXC_VIRT_OOL_MASKABLE_HV(h_doorbell, 0x4e80, 0x20, 0xe80, IRQS_DISABLED) @@ -1467,9 +1471,10 @@ EXC_COMMON_BEGIN(soft_nmi_common) mr r10,r1 ld r1,PACAEMERGSP(r13) subir1,r1,INT_FRAME_SIZE - EXCEPTION_CO
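The system reset hunk above keeps the pattern used for NMI-class exceptions: switch to a dedicated emergency stack before building the frame, because the interrupted context's r1 cannot be trusted. In outline:

  mr    r10,r1                      /* keep the old r1 */
  ld    r1,PACA_NMI_EMERG_SP(r13)   /* per-CPU NMI emergency stack from the paca */
  subi  r1,r1,INT_FRAME_SIZE        /* room for the interrupt frame */
  EXCEPTION_COMMON_NORET_STACK(PACA_EXNMI, 0x100, system_reset, ADD_NVGPRS;ADD_RECONCILE_NMI)
  addi  r3,r1,STACK_FRAME_OVERHEAD  /* r3 = pt_regs for the C handler */
  bl    system_reset_exception      /* continues rather than taking the normal return path */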
[PATCH 12/28] powerpc/64s/exception: Move EXCEPTION_COMMON additions into callers
More cases of code insertion via macros that does not add a great deal. All the additions have to be specified in the macro arguments, so they can just as well go after the macro. No generated code change. Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/exception-64s.h | 42 +++--- arch/powerpc/include/asm/head-64.h | 4 +-- arch/powerpc/kernel/exceptions-64s.S | 45 +--- 3 files changed, 39 insertions(+), 52 deletions(-) diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h index cc65e87cff2f..24fc0104c9d3 100644 --- a/arch/powerpc/include/asm/exception-64s.h +++ b/arch/powerpc/include/asm/exception-64s.h @@ -635,21 +635,6 @@ END_FTR_SECTION_NESTED(ftr,ftr,943) EXCEPTION_PROLOG_1 EXC_HV, PACA_EXGEN, 1, vec, bitmask ;\ EXCEPTION_PROLOG_2_VIRT label, EXC_HV -/* - * Our exception common code can be passed various "additions" - * to specify the behaviour of interrupts, whether to kick the - * runlatch, etc... - */ - -/* - * This addition reconciles our actual IRQ state with the various software - * flags that track it. This may call C code. - */ -#define ADD_RECONCILE RECONCILE_IRQ_STATE(r10,r11) - -#define ADD_NVGPRS \ - bl save_nvgprs - #define RUNLATCH_ON\ BEGIN_FTR_SECTION \ ld r3, PACA_THREAD_INFO(r13); \ @@ -658,25 +643,22 @@ BEGIN_FTR_SECTION \ beqlppc64_runlatch_on_trampoline; \ END_FTR_SECTION_IFSET(CPU_FTR_CTRL) -#define EXCEPTION_COMMON(area, trap, label, additions) \ +#define EXCEPTION_COMMON(area, trap) \ EXCEPTION_PROLOG_COMMON(trap, area);\ - /* Volatile regs are potentially clobbered here */ \ - additions /* - * Exception where stack is already set in r1, r1 is saved in r10, and it - * continues rather than returns. + * Exception where stack is already set in r1, r1 is saved in r10 */ -#define EXCEPTION_COMMON_NORET_STACK(area, trap, label, additions) \ +#define EXCEPTION_COMMON_STACK(area, trap) \ EXCEPTION_PROLOG_COMMON_1();\ kuap_save_amr_and_lock r9, r10, cr1;\ EXCEPTION_PROLOG_COMMON_2(area);\ - EXCEPTION_PROLOG_COMMON_3(trap);\ - /* Volatile regs are potentially clobbered here */ \ - additions + EXCEPTION_PROLOG_COMMON_3(trap) -#define STD_EXCEPTION_COMMON(trap, label, hdlr)\ - EXCEPTION_COMMON(PACA_EXGEN, trap, label, ADD_NVGPRS;ADD_RECONCILE); \ +#define STD_EXCEPTION_COMMON(trap, hdlr) \ + EXCEPTION_COMMON(PACA_EXGEN, trap); \ + bl save_nvgprs;\ + RECONCILE_IRQ_STATE(r10, r11); \ addir3,r1,STACK_FRAME_OVERHEAD; \ bl hdlr; \ b ret_from_except @@ -686,9 +668,11 @@ END_FTR_SECTION_IFSET(CPU_FTR_CTRL) * in the idle task and therefore need the special idle handling * (finish nap and runlatch) */ -#define STD_EXCEPTION_COMMON_ASYNC(trap, label, hdlr) \ - EXCEPTION_COMMON(PACA_EXGEN, trap, label, \ - FINISH_NAP;ADD_RECONCILE;RUNLATCH_ON); \ +#define STD_EXCEPTION_COMMON_ASYNC(trap, hdlr) \ + EXCEPTION_COMMON(PACA_EXGEN, trap); \ + FINISH_NAP; \ + RECONCILE_IRQ_STATE(r10, r11); \ + RUNLATCH_ON;\ addir3,r1,STACK_FRAME_OVERHEAD; \ bl hdlr; \ b ret_from_except_lite diff --git a/arch/powerpc/include/asm/head-64.h b/arch/powerpc/include/asm/head-64.h index bdd67a26e959..acd94fcf9f40 100644 --- a/arch/powerpc/include/asm/head-64.h +++ b/arch/powerpc/include/asm/head-64.h @@ -403,11 +403,11 @@ end_##sname: #define EXC_COMMON(name, realvec, hdlr) \ EXC_COMMON_BEGIN(name); \ - STD_EXCEPTION_COMMON(realvec, name, hdlr) + STD_EXCEPTION_COMMON(realvec, hdlr) #define EXC_COMMON_ASYNC(name, realvec, hdlr) \ EXC_COMMON_BEGIN(name); \ - STD_EXCEPTION_COMMON_ASYNC(realvec, name, hdlr) + STD_EXCEPTION_COMMON_ASYNC(realvec, hdlr) #endif /* 
__ASSEMBLY__ */ diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 63b161c23e9e..935019529f16 100644 --- a/arch/powerpc/ke
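The shape of the change at a typical call site, before and after (vector, label and additions illustrative):

  /* before: behaviour flags packed into the macro arguments */
  EXCEPTION_COMMON(PACA_EXGEN, 0x900, example_common, FINISH_NAP;ADD_RECONCILE;RUNLATCH_ON)

  /* after: the macro only builds the frame; the rest is ordinary code */
  EXCEPTION_COMMON(PACA_EXGEN, 0x900)
  FINISH_NAP                        /* wake fully if the interrupt hit the idle nap */
  RECONCILE_IRQ_STATE(r10, r11)     /* sync software irq-mask state; may call C */
  RUNLATCH_ON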
[PATCH 10/28] powerpc/64s/exception: Make EXCEPTION_PROLOG_0 a gas macro for consistency with others
No generated code change. Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/exception-64s.h | 25 arch/powerpc/kernel/exceptions-64s.S | 24 +++ 2 files changed, 25 insertions(+), 24 deletions(-) diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h index 1d8fc085e845..f19c2391cc36 100644 --- a/arch/powerpc/include/asm/exception-64s.h +++ b/arch/powerpc/include/asm/exception-64s.h @@ -233,7 +233,7 @@ */ #define EXCEPTION_RELON_PROLOG(area, label, hsrr, kvm, vec)\ SET_SCRATCH0(r13); /* save r13 */ \ - EXCEPTION_PROLOG_0(area); \ + EXCEPTION_PROLOG_0 area ; \ EXCEPTION_PROLOG_1 hsrr, area, kvm, vec, 0 ;\ EXCEPTION_PROLOG_2_VIRT label, hsrr @@ -297,13 +297,14 @@ BEGIN_FTR_SECTION_NESTED(943) \ std ra,offset(r13); \ END_FTR_SECTION_NESTED(ftr,ftr,943) -#define EXCEPTION_PROLOG_0(area) \ - GET_PACA(r13); \ - std r9,area+EX_R9(r13); /* save r9 */ \ - OPT_GET_SPR(r9, SPRN_PPR, CPU_FTR_HAS_PPR); \ - HMT_MEDIUM; \ - std r10,area+EX_R10(r13); /* save r10 - r12 */\ +.macro EXCEPTION_PROLOG_0 area + GET_PACA(r13) + std r9,\area\()+EX_R9(r13) /* save r9 */ + OPT_GET_SPR(r9, SPRN_PPR, CPU_FTR_HAS_PPR) + HMT_MEDIUM + std r10,\area\()+EX_R10(r13)/* save r10 - r12 */ OPT_GET_SPR(r10, SPRN_CFAR, CPU_FTR_CFAR) +.endm .macro EXCEPTION_PROLOG_1 hsrr, area, kvm, vec, bitmask OPT_SAVE_REG_TO_PACA(\area\()+EX_PPR, r9, CPU_FTR_HAS_PPR) @@ -347,7 +348,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943) #define EXCEPTION_PROLOG(area, label, hsrr, kvm, vec) \ SET_SCRATCH0(r13); /* save r13 */ \ - EXCEPTION_PROLOG_0(area); \ + EXCEPTION_PROLOG_0 area ; \ EXCEPTION_PROLOG_1 hsrr, area, kvm, vec, 0 ;\ EXCEPTION_PROLOG_2_REAL label, hsrr, 1 @@ -416,7 +417,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943) /* Do not enable RI */ #define EXCEPTION_PROLOG_NORI(area, label, hsrr, kvm, vec) \ - EXCEPTION_PROLOG_0(area); \ + EXCEPTION_PROLOG_0 area ; \ EXCEPTION_PROLOG_1 hsrr, area, kvm, vec, 0 ;\ EXCEPTION_PROLOG_2_REAL label, hsrr, 0 @@ -565,7 +566,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943) /* Version of above for when we have to branch out-of-line */ #define __OOL_EXCEPTION(vec, label, hdlr) \ SET_SCRATCH0(r13); \ - EXCEPTION_PROLOG_0(PACA_EXGEN); \ + EXCEPTION_PROLOG_0 PACA_EXGEN ; \ b hdlr #define STD_EXCEPTION_OOL(vec, label) \ @@ -596,7 +597,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943) #define __MASKABLE_EXCEPTION(vec, label, hsrr, kvm, bitmask) \ SET_SCRATCH0(r13);/* save r13 */\ - EXCEPTION_PROLOG_0(PACA_EXGEN); \ + EXCEPTION_PROLOG_0 PACA_EXGEN ; \ EXCEPTION_PROLOG_1 hsrr, PACA_EXGEN, kvm, vec, bitmask ;\ EXCEPTION_PROLOG_2_REAL label, hsrr, 1 @@ -616,7 +617,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943) #define __MASKABLE_RELON_EXCEPTION(vec, label, hsrr, kvm, bitmask) \ SET_SCRATCH0(r13);/* save r13 */\ - EXCEPTION_PROLOG_0(PACA_EXGEN); \ + EXCEPTION_PROLOG_0 PACA_EXGEN ; \ EXCEPTION_PROLOG_1 hsrr, PACA_EXGEN, kvm, vec, bitmask ;\ EXCEPTION_PROLOG_2_VIRT label, hsrr diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 8680cd7da550..88f892167a64 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -109,7 +109,7 @@ EXC_VIRT_NONE(0x4000, 0x100) EXC_REAL_BEGIN(system_reset, 0x100, 0x100) SET_SCRATCH0(r13) - EXCEPTION_PROLOG_0(PACA_EXNMI) + EXCEPTION_PROLOG_0 PACA_EXNMI /* This is EXCEPTION_PROLOG_1 with the idle feature section added */ OPT_SAVE_REG_TO_PACA(PACA_EXNMI+EX_PPR, r9, CPU_FTR_HAS_PPR) @@ -266,7 +266,7 @@ EXC_REAL_BEGIN(machine_check, 0x200, 0x100) * vector
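For anyone not used to the two macro styles: the change is only in how the argument is passed and substituted, but gas macros also allow assembler-time conditionals on their arguments, which later patches in this series rely on. A minimal sketch (example_prolog is hypothetical):

  .macro example_prolog area, kvm
  std  r9,\area\()+EX_R9(r13)       /* \area is substituted at assembly time */
  .if \kvm
  /* the KVM test would only be emitted when the call site passes kvm=1 */
  .endif
  .endm

  example_prolog PACA_EXGEN, 1      /* call-site syntax: no parentheses */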
[PATCH 09/28] powerpc/64s/exception: KVM handler can set the HSRR trap bit
Move the KVM trap HSRR bit into the KVM handler, which can be conditionally applied when hsrr parameter is set. No generated code change. Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/exception-64s.h | 5 + arch/powerpc/include/asm/head-64.h | 7 ++- 2 files changed, 7 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h index 737c37d1df4b..1d8fc085e845 100644 --- a/arch/powerpc/include/asm/exception-64s.h +++ b/arch/powerpc/include/asm/exception-64s.h @@ -449,7 +449,12 @@ END_FTR_SECTION_NESTED(ftr,ftr,943) ld r10,\area+EX_R10(r13) std r12,HSTATE_SCRATCH0(r13) sldir12,r9,32 + /* HSRR variants have the 0x2 bit added to their trap number */ + .if \hsrr + ori r12,r12,(\n + 0x2) + .else ori r12,r12,(\n) + .endif /* This reloads r9 before branching to kvmppc_interrupt */ __BRANCH_TO_KVM_EXIT(\area, kvmppc_interrupt) diff --git a/arch/powerpc/include/asm/head-64.h b/arch/powerpc/include/asm/head-64.h index 518d9758b41e..bdd67a26e959 100644 --- a/arch/powerpc/include/asm/head-64.h +++ b/arch/powerpc/include/asm/head-64.h @@ -393,16 +393,13 @@ end_##sname: TRAMP_KVM_BEGIN(do_kvm_##n);\ KVM_HANDLER area, EXC_STD, n, 1 -/* - * HV variant exceptions get the 0x2 bit added to their trap number. - */ #define TRAMP_KVM_HV(area, n) \ TRAMP_KVM_BEGIN(do_kvm_H##n); \ - KVM_HANDLER area, EXC_HV, n + 0x2, 0 + KVM_HANDLER area, EXC_HV, n, 0 #define TRAMP_KVM_HV_SKIP(area, n) \ TRAMP_KVM_BEGIN(do_kvm_H##n); \ - KVM_HANDLER area, EXC_HV, n + 0x2, 1 + KVM_HANDLER area, EXC_HV, n, 1 #define EXC_COMMON(name, realvec, hdlr) \ EXC_COMMON_BEGIN(name); \ -- 2.20.1
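The effect on the reported trap number, taking a hypothetical HV interrupt at vector 0xe60 as the example (so trap 0xe62 once the 0x2 bit is in):

  sldi r12,r9,32             /* CR (saved earlier in r9) goes in the high word */
  .if \hsrr
  ori  r12,r12,(\n + 0x2)    /* HSRR (HV) entry: trap = vector | 0x2, e.g. 0xe62 */
  .else
  ori  r12,r12,(\n)          /* SRR entry: trap = vector, e.g. 0xe60 */
  .endif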
[PATCH 08/28] powerpc/64s/exception: merge KVM handler and skip variants
Conditionally expand the skip case if it is specified. No generated code change. Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/exception-64s.h | 28 +--- arch/powerpc/include/asm/head-64.h | 8 +++ arch/powerpc/kernel/exceptions-64s.S | 2 +- 3 files changed, 15 insertions(+), 23 deletions(-) diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h index 74ddcb37156c..737c37d1df4b 100644 --- a/arch/powerpc/include/asm/exception-64s.h +++ b/arch/powerpc/include/asm/exception-64s.h @@ -431,26 +431,17 @@ END_FTR_SECTION_NESTED(ftr,ftr,943) .endif .endm -.macro KVM_HANDLER area, hsrr, n +.macro KVM_HANDLER area, hsrr, n, skip + .if \skip + cmpwi r10,KVM_GUEST_MODE_SKIP + beq 89f + .else BEGIN_FTR_SECTION_NESTED(947) ld r10,\area+EX_CFAR(r13) std r10,HSTATE_CFAR(r13) END_FTR_SECTION_NESTED(CPU_FTR_CFAR,CPU_FTR_CFAR,947) - BEGIN_FTR_SECTION_NESTED(948) - ld r10,\area+EX_PPR(r13) - std r10,HSTATE_PPR(r13) - END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948) - ld r10,\area+EX_R10(r13) - std r12,HSTATE_SCRATCH0(r13) - sldir12,r9,32 - ori r12,r12,(\n) - /* This reloads r9 before branching to kvmppc_interrupt */ - __BRANCH_TO_KVM_EXIT(\area, kvmppc_interrupt) -.endm + .endif -.macro KVM_HANDLER_SKIP area, hsrr, n - cmpwi r10,KVM_GUEST_MODE_SKIP - beq 89f BEGIN_FTR_SECTION_NESTED(948) ld r10,\area+EX_PPR(r13) std r10,HSTATE_PPR(r13) @@ -461,6 +452,8 @@ END_FTR_SECTION_NESTED(ftr,ftr,943) ori r12,r12,(\n) /* This reloads r9 before branching to kvmppc_interrupt */ __BRANCH_TO_KVM_EXIT(\area, kvmppc_interrupt) + + .if \skip 89:mtocrf 0x80,r9 ld r9,\area+EX_R9(r13) ld r10,\area+EX_R10(r13) @@ -469,14 +462,13 @@ END_FTR_SECTION_NESTED(ftr,ftr,943) .else b kvmppc_skip_interrupt .endif + .endif .endm #else .macro KVMTEST hsrr, n .endm -.macro KVM_HANDLER area, hsrr, n -.endm -.macro KVM_HANDLER_SKIP area, hsrr, n +.macro KVM_HANDLER area, hsrr, n, skip .endm #endif diff --git a/arch/powerpc/include/asm/head-64.h b/arch/powerpc/include/asm/head-64.h index 4767d6c7b8fa..518d9758b41e 100644 --- a/arch/powerpc/include/asm/head-64.h +++ b/arch/powerpc/include/asm/head-64.h @@ -387,22 +387,22 @@ end_##sname: #define TRAMP_KVM(area, n) \ TRAMP_KVM_BEGIN(do_kvm_##n);\ - KVM_HANDLER area, EXC_STD, n + KVM_HANDLER area, EXC_STD, n, 0 #define TRAMP_KVM_SKIP(area, n) \ TRAMP_KVM_BEGIN(do_kvm_##n);\ - KVM_HANDLER_SKIP area, EXC_STD, n + KVM_HANDLER area, EXC_STD, n, 1 /* * HV variant exceptions get the 0x2 bit added to their trap number. */ #define TRAMP_KVM_HV(area, n) \ TRAMP_KVM_BEGIN(do_kvm_H##n); \ - KVM_HANDLER area, EXC_HV, n + 0x2 + KVM_HANDLER area, EXC_HV, n + 0x2, 0 #define TRAMP_KVM_HV_SKIP(area, n) \ TRAMP_KVM_BEGIN(do_kvm_H##n); \ - KVM_HANDLER_SKIP area, EXC_HV, n + 0x2 + KVM_HANDLER area, EXC_HV, n + 0x2, 1 #define EXC_COMMON(name, realvec, hdlr) \ EXC_COMMON_BEGIN(name); \ diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 91350b3dedde..8680cd7da550 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -1063,7 +1063,7 @@ TRAMP_KVM_BEGIN(do_kvm_0xc00) SET_SCRATCH0(r10) std r9,PACA_EXGEN+EX_R9(r13) mfcrr9 - KVM_HANDLER PACA_EXGEN, EXC_STD, 0xc00 + KVM_HANDLER PACA_EXGEN, EXC_STD, 0xc00, 0 #endif -- 2.20.1
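At the call sites the only visible difference is the trailing skip flag, for example (vector numbers illustrative):

  TRAMP_KVM(PACA_EXGEN, 0x400)       /* KVM_HANDLER ..., 0x400, 0: always take the KVM exit */
  TRAMP_KVM_SKIP(PACA_EXGEN, 0x300)  /* KVM_HANDLER ..., 0x300, 1: first check for
                                        KVM_GUEST_MODE_SKIP and, if set, restore registers
                                        and branch to kvmppc_skip_interrupt instead */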
[PATCH 07/28] powerpc/64s/exception: consolidate maskable and non-maskable prologs
Conditionally expand the soft-masking test if a mask is passed in. No generated code change. Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/exception-64s.h | 113 +-- arch/powerpc/kernel/exceptions-64s.S | 20 ++-- 2 files changed, 55 insertions(+), 78 deletions(-) diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h index e1b449e2c9ea..74ddcb37156c 100644 --- a/arch/powerpc/include/asm/exception-64s.h +++ b/arch/powerpc/include/asm/exception-64s.h @@ -234,7 +234,7 @@ #define EXCEPTION_RELON_PROLOG(area, label, hsrr, kvm, vec)\ SET_SCRATCH0(r13); /* save r13 */ \ EXCEPTION_PROLOG_0(area); \ - EXCEPTION_PROLOG_1 hsrr, area, kvm, vec ; \ + EXCEPTION_PROLOG_1 hsrr, area, kvm, vec, 0 ;\ EXCEPTION_PROLOG_2_VIRT label, hsrr /* Exception register prefixes */ @@ -305,73 +305,50 @@ END_FTR_SECTION_NESTED(ftr,ftr,943) std r10,area+EX_R10(r13); /* save r10 - r12 */\ OPT_GET_SPR(r10, SPRN_CFAR, CPU_FTR_CFAR) -#define __EXCEPTION_PROLOG_1_PRE(area) \ - OPT_SAVE_REG_TO_PACA(area+EX_PPR, r9, CPU_FTR_HAS_PPR); \ - OPT_SAVE_REG_TO_PACA(area+EX_CFAR, r10, CPU_FTR_CFAR); \ - INTERRUPT_TO_KERNEL;\ - SAVE_CTR(r10, area);\ +.macro EXCEPTION_PROLOG_1 hsrr, area, kvm, vec, bitmask + OPT_SAVE_REG_TO_PACA(\area\()+EX_PPR, r9, CPU_FTR_HAS_PPR) + OPT_SAVE_REG_TO_PACA(\area\()+EX_CFAR, r10, CPU_FTR_CFAR) + INTERRUPT_TO_KERNEL + SAVE_CTR(r10, \area\()) mfcrr9 - -#define __EXCEPTION_PROLOG_1_POST(area) \ - std r11,area+EX_R11(r13); \ - std r12,area+EX_R12(r13); \ - GET_SCRATCH0(r10); \ - std r10,area+EX_R13(r13) - -/* - * This version of the EXCEPTION_PROLOG_1 will carry - * addition parameter called "bitmask" to support - * checking of the interrupt maskable level. - * Intended to be used in MASKABLE_EXCPETION_* macros. - */ -.macro MASKABLE_EXCEPTION_PROLOG_1 hsrr, area, kvm, vec, bitmask - __EXCEPTION_PROLOG_1_PRE(\area\()) .if \kvm KVMTEST \hsrr \vec .endif - - lbz r10,PACAIRQSOFTMASK(r13) - andi. r10,r10,\bitmask - /* This associates vector numbers with bits in paca->irq_happened */ - .if \vec == 0x500 || \vec == 0xea0 - li r10,PACA_IRQ_EE - .elseif \vec == 0x900 || \vec == 0xea0 - li r10,PACA_IRQ_DEC - .elseif \vec == 0xa00 || \vec == 0xe80 - li r10,PACA_IRQ_DBELL - .elseif \vec == 0xe60 - li r10,PACA_IRQ_HMI - .elseif \vec == 0xf00 - li r10,PACA_IRQ_PMI - .else - .abort "Bad maskable vector" + .if \bitmask + lbz r10,PACAIRQSOFTMASK(r13) + andi. 
r10,r10,\bitmask + /* Associate vector numbers with bits in paca->irq_happened */ + .if \vec == 0x500 || \vec == 0xea0 + li r10,PACA_IRQ_EE + .elseif \vec == 0x900 || \vec == 0xea0 + li r10,PACA_IRQ_DEC + .elseif \vec == 0xa00 || \vec == 0xe80 + li r10,PACA_IRQ_DBELL + .elseif \vec == 0xe60 + li r10,PACA_IRQ_HMI + .elseif \vec == 0xf00 + li r10,PACA_IRQ_PMI + .else + .abort "Bad maskable vector" + .endif + + .if \hsrr + bne masked_Hinterrupt + .else + bne masked_interrupt + .endif .endif - .if \hsrr - bne masked_Hinterrupt - .else - bne masked_interrupt - .endif - - __EXCEPTION_PROLOG_1_POST(\area\()) -.endm - -/* - * This version of the EXCEPTION_PROLOG_1 is intended - * to be used in STD_EXCEPTION* macros - */ -.macro EXCEPTION_PROLOG_1 hsrr, area, kvm, vec - __EXCEPTION_PROLOG_1_PRE(\area\()) - .if \kvm - KVMTEST \hsrr \vec - .endif - __EXCEPTION_PROLOG_1_POST(\area\()) + std r11,\area\()+EX_R11(r13) + std r12,\area\()+EX_R12(r13) + GET_SCRATCH0(r10) + std r10,\area\()+EX_R13(r13) .endm #define EXCEPTION_PROLOG(area, label, hsrr, kvm, vec) \ SET_SCRATCH0(r13); /* save r13 */ \ EXCEPTION_PROLOG_0(area); \ - EXCEPTION_PROLOG_1 hsrr, area, kvm, vec ; \ + EXCEPTION_PROLOG_1 hsrr, a