Re: Onboard SD card doesn't work anymore after the 'mmc-v5.4-2' updates
Russell King - ARM Linux admin writes:
> On Tue, Oct 15, 2019 at 03:12:49PM +0200, Christian Zigotzky wrote:
>> Hello Russell,
>>
>> You asked me about "dma-coherent" in the Cyrus device tree. Unfortunately I
>> don't find the property "dma-coherent" in the dtb source files.
>>
>> Output of "fdtdump cyrus_p5020_eth_poweroff.dtb | grep dma":
>>
>> dma0 = "/soc@ffe00/dma@100300";
>> dma1 = "/soc@ffe00/dma@101300";
>> dma@100300 {
>>     compatible = "fsl,eloplus-dma";
>>     dma-channel@0 {
>>         compatible = "fsl,eloplus-dma-channel";
>>     dma-channel@80 {
>>         compatible = "fsl,eloplus-dma-channel";
>>     dma-channel@100 {
>>         compatible = "fsl,eloplus-dma-channel";
>>     dma-channel@180 {
>>         compatible = "fsl,eloplus-dma-channel";
>> dma@101300 {
>>     compatible = "fsl,eloplus-dma";
>>     dma-channel@0 {
>>         compatible = "fsl,eloplus-dma-channel";
>>     dma-channel@80 {
>>         compatible = "fsl,eloplus-dma-channel";
>>     dma-channel@100 {
>>         compatible = "fsl,eloplus-dma-channel";
>>     dma-channel@180 {
>>         compatible = "fsl,eloplus-dma-channel";
>
> Hmm, so it looks like PowerPC doesn't mark devices that are dma
> coherent with a property that describes them as such.
>
> I think this opens a wider question - what should of_dma_is_coherent()
> return for PowerPC? It seems right now that it returns false for
> devices that are DMA coherent, which seems to me to be a recipe for
> future mistakes.

Right, it seems of_dma_is_coherent() has baked in the assumption that
devices are non-coherent unless explicitly marked as coherent, which is
wrong on all, or at least most, existing powerpc systems according to Ben.

> Any ideas from the PPC maintainers?

Fixing it at the source seems like the best option to prevent future
breakage. So I guess that would mean making of_dma_is_coherent() return
true/false based on CONFIG_NOT_COHERENT_CACHE on powerpc. We could do it
like below, which would still allow the dma-coherent property to work if
it ever makes sense on a future powerpc platform.
I don't really know any of this embedded stuff well, so happy to take
other suggestions on how to handle this mess.

cheers

diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index 25aaa3903000..b96c9010acb6 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -760,6 +760,22 @@ static int __init check_cache_coherency(void)
 late_initcall(check_cache_coherency);
 #endif /* CONFIG_CHECK_CACHE_COHERENCY */
 
+#ifndef CONFIG_NOT_COHERENT_CACHE
+/*
+ * For historical reasons powerpc kernels are built with hard wired knowledge of
+ * whether or not DMA accesses are cache coherent. Additionally device trees on
+ * powerpc do not typically support the dma-coherent property.
+ *
+ * So when we know that DMA is coherent, override arch_of_dma_is_coherent() to
+ * tell the drivers/of code that all devices are coherent regardless of whether
+ * they have a dma-coherent property.
+ */
+bool arch_of_dma_is_coherent(struct device_node *np)
+{
+	return true;
+}
+#endif
+
 #ifdef CONFIG_DEBUG_FS
 struct dentry *powerpc_debugfs_root;
 EXPORT_SYMBOL(powerpc_debugfs_root);
diff --git a/drivers/of/address.c b/drivers/of/address.c
index 978427a9d5e6..3a4b2949a322 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -993,6 +993,14 @@ int of_dma_get_range(struct device_node *np, u64 *dma_addr, u64 *paddr, u64 *siz
 }
 EXPORT_SYMBOL_GPL(of_dma_get_range);
 
+/*
+ * arch_of_dma_is_coherent - Arch hook to determine if device is coherent for DMA
+ */
+bool __weak arch_of_dma_is_coherent(struct device_node *np)
+{
+	return false;
+}
+
 /**
  * of_dma_is_coherent - Check if device is coherent
  * @np:	device node
@@ -1002,8 +1010,12 @@ EXPORT_SYMBOL_GPL(of_dma_get_range);
 bool of_dma_is_coherent(struct device_node *np)
 {
-	struct device_node *node = of_node_get(np);
+	struct device_node *node;
+
+	if (arch_of_dma_is_coherent(np))
+		return true;
 
+	node = of_node_get(np);
 	while (node) {
 		if (of_property_read_bool(node, "dma-coherent")) {
 			of_node_put(node);
Re: [PATCH v9 2/8] KVM: PPC: Move pages between normal and secure memory
On Wed, Oct 23, 2019 at 03:17:54PM +1100, Paul Mackerras wrote:
> On Tue, Oct 22, 2019 at 11:59:35AM +0530, Bharata B Rao wrote:
>
> The mapping of pages in userspace memory, and the mapping of userspace
> memory to guest physical space, are two distinct things. The memslots
> describe the mapping of userspace addresses to guest physical
> addresses, but don't say anything about what is mapped at those
> userspace addresses. So you can indeed get a page fault on a
> userspace address at the same time that a memslot is being deleted
> (even a memslot that maps that particular userspace address), because
> removing the memslot does not unmap anything from userspace memory,
> it just breaks the association between that userspace memory and guest
> physical memory. Deleting the memslot does unmap the pages from the
> guest but doesn't unmap them from the userspace process (e.g. QEMU).
>
> It is an interesting question what the semantics should be when a
> memslot is deleted and there are pages of userspace currently paged
> out to the device (i.e. the ultravisor). One approach might be to say
> that all those pages have to come back to the host before we finish
> the memslot deletion, but that is probably not necessary; I think we
> could just say that those pages are gone and can be replaced by zero
> pages if they get accessed on the host side. If userspace then unmaps
> the corresponding region of the userspace memory map, we can then just
> forget all those pages with very little work.

There are 5 scenarios currently where we are replacing the device mappings:

1. Guest reset
2. Memslot free (memory unplug) (not present in this version though)
3. Converting a secure page to a shared page
4. HV touching the secure page
5. H_SVM_INIT_ABORT hcall to abort the SVM due to errors when
   transitioning to secure mode (not present in this version)

In the first 3 cases, we don't need to get the page back to the HV from
the secure side and hence skip the page-out. However currently we do
allocate a fresh page and replace the mapping with the new one.

> > However if that sounds fragile, maybe I can go back to my initial
> > design where we weren't using rmap[] to store device PFNs. That will
> > increase the memory usage but gives us an easy option to have a
> > per-guest mutex to protect concurrent page-ins/outs/faults.
>
> That sounds like it would be the best option, even if only in the
> short term. At least it would give us a working solution, even if
> it's not the best performing solution.

Sure, will avoid using rmap[] in the next version.

Regards,
Bharata.
Re: [PATCH v9 2/8] KVM: PPC: Move pages between normal and secure memory
On Tue, Oct 22, 2019 at 11:59:35AM +0530, Bharata B Rao wrote:
> On Fri, Oct 18, 2019 at 8:31 AM Paul Mackerras wrote:
> >
> > On Wed, Sep 25, 2019 at 10:36:43AM +0530, Bharata B Rao wrote:
> > > Manage migration of pages between normal and secure memory of secure
> > > guest by implementing H_SVM_PAGE_IN and H_SVM_PAGE_OUT hcalls.
> > >
> > > H_SVM_PAGE_IN: Move the content of a normal page to secure page
> > > H_SVM_PAGE_OUT: Move the content of a secure page to normal page
> > >
> > > Private ZONE_DEVICE memory equal to the amount of secure memory
> > > available in the platform for running secure guests is created.
> > > Whenever a page belonging to the guest becomes secure, a page from
> > > this private device memory is used to represent and track that secure
> > > page on the HV side. The movement of pages between normal and secure
> > > memory is done via migrate_vma_pages() using UV_PAGE_IN and
> > > UV_PAGE_OUT ucalls.
> >
> > As we discussed privately, but mentioning it here so there is a
> > record: I am concerned about this structure
> >
> > > +struct kvmppc_uvmem_page_pvt {
> > > +	unsigned long *rmap;
> > > +	struct kvm *kvm;
> > > +	unsigned long gpa;
> > > +};
> >
> > which keeps a reference to the rmap. The reference could become stale
> > if the memslot is deleted or moved, and nothing in the patch series
> > ensures that the stale references are cleaned up.
>
> I will add code to release the device PFNs when the memslot goes away.
> In fact the early versions of the patchset had this, but it
> subsequently got removed.
>
> >
> > If it is possible to do without the long-term rmap reference, and
> > instead find the rmap via the memslots (with the srcu lock held) each
> > time we need the rmap, that would be safer, I think, provided that we
> > can sort out the lock ordering issues.
>
> All paths except the fault handler access rmap[] under the srcu lock.
> Even in case of the fault handler, for those faults induced by us
> (shared page handling, releasing device pfns), we do hold the srcu
> lock. The difficult case is when we fault due to HV accessing a device
> page. In this case we come to the fault handler with mmap_sem already
> held and are not in a position to take the kvm srcu lock as that would
> lead to lock order reversal. Given that we have pages mapped in still,
> I assume the memslot can't go away while we access rmap[], so I think
> we should be ok here.

The mapping of pages in userspace memory, and the mapping of userspace
memory to guest physical space, are two distinct things. The memslots
describe the mapping of userspace addresses to guest physical
addresses, but don't say anything about what is mapped at those
userspace addresses. So you can indeed get a page fault on a
userspace address at the same time that a memslot is being deleted
(even a memslot that maps that particular userspace address), because
removing the memslot does not unmap anything from userspace memory,
it just breaks the association between that userspace memory and guest
physical memory. Deleting the memslot does unmap the pages from the
guest but doesn't unmap them from the userspace process (e.g. QEMU).

It is an interesting question what the semantics should be when a
memslot is deleted and there are pages of userspace currently paged
out to the device (i.e. the ultravisor). One approach might be to say
that all those pages have to come back to the host before we finish
the memslot deletion, but that is probably not necessary; I think we
could just say that those pages are gone and can be replaced by zero
pages if they get accessed on the host side. If userspace then unmaps
the corresponding region of the userspace memory map, we can then just
forget all those pages with very little work.

> However if that sounds fragile, maybe I can go back to my initial
> design where we weren't using rmap[] to store device PFNs. That will
> increase the memory usage but gives us an easy option to have a
> per-guest mutex to protect concurrent page-ins/outs/faults.

That sounds like it would be the best option, even if only in the
short term. At least it would give us a working solution, even if
it's not the best performing solution.

Paul.
Re: [PATCH v3 1/4] powerpc/mm: Implement set_memory() routines
Russell Currey writes:
> diff --git a/arch/powerpc/mm/pageattr.c b/arch/powerpc/mm/pageattr.c
> new file mode 100644
> index ..fe3ecbfb8e10
> --- /dev/null
> +++ b/arch/powerpc/mm/pageattr.c
> @@ -0,0 +1,60 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +/*
> + * MMU-generic set_memory implementation for powerpc
> + *
> + * Author: Russell Currey

Please don't add email addresses in new files, they just risk
bit-rotting; they're in the git log anyway.

> + *
> + * Copyright 2019, IBM Corporation.
> + */
> +
> +#include
> +#include
> +
> +#include
> +#include
> +#include
> +
> +static int change_page_attr(pte_t *ptep, unsigned long addr, void *data)
> +{
> +	int action = *((int *)data);
> +	pte_t pte_val;
> +
> +	// invalidate the PTE so it's safe to modify
> +	pte_val = ptep_get_and_clear(&init_mm, addr, ptep);
> +	flush_tlb_kernel_range(addr, addr + PAGE_SIZE);

This doesn't work if for example we're setting the text mapping we're
executing from read-only, which in principle should work. Or if another
CPU is concurrently reading from a mapping we're marking read-only.

I /think/ that's acceptable for all the current users, but I don't know
that for sure and it's not documented anywhere AFAICS. At the very
least it needs a big comment, and to be mentioned in the change log.

Also there's no locking here, or in apply_to_page_range() AFAICS. And
because we're doing clear/modify/write, two CPUs that race doing eg.
set_memory_ro() and set_memory_nx() will potentially result in some
PTEs being marked permanently invalid, if one CPU sees the other CPU's
clear of the PTE before the write.

Again I'm not sure any current callers do that, but it's a bit fragile.

I think we can fix the race at least by taking the init_mm
page_table_lock around the clear/modify/write.

> +	// modify the PTE bits as desired, then apply
> +	switch (action) {
> +	case SET_MEMORY_RO:
> +		pte_val = pte_wrprotect(pte_val);
> +		break;
> +	case SET_MEMORY_RW:
> +		pte_val = pte_mkwrite(pte_val);
> +		break;
> +	case SET_MEMORY_NX:
> +		pte_val = pte_exprotect(pte_val);
> +		break;
> +	case SET_MEMORY_X:
> +		pte_val = pte_mkexec(pte_val);
> +		break;
> +	default:
> +		WARN_ON(true);
> +		return -EINVAL;
> +	}
> +
> +	set_pte_at(&init_mm, addr, ptep, pte_val);
> +
> +	return 0;
> +}

cheers
RE: [PATCH 0/7] towards QE support on ARM
On 22/10/2019 18:18, Rasmus Villemoes wrote:
> On 22/10/2019 04.24, Qiang Zhao wrote:
> > On Mon, Oct 22, 2019 at 6:11 AM Leo Li wrote
> >
> >> Right. I'm really interested in getting this applied to my tree and
> >> make it upstream. Zhao Qiang, can you help to review Rasmus's
> >> patches and comment?
> >
> > As you know, I maintained a similar patchset removing PPC, and someone
> > told me qe_ic should be moved into drivers/irqchip/.
> > I also thought qe_ic is an interrupt controller driver and should be
> > moved into the irqchip directory.
>
> Yes, and I also plan to do that at some point. However, that's
> orthogonal to making the driver build on ARM, so I don't want to mix
> the two. Making it usable on ARM is my/our priority currently.
>
> I'd appreciate your input on my patches.

Yes, we can put this patchset in first place, ensure it can build and
work on ARM, then push another patchset to move qe_ic.

Best Regards,
Qiang
[PATCH] powerpc/boot: Fix the initrd being overwritten under qemu
When booting under OF the zImage expects the initrd address and size to
be passed to it using registers r3 and r4. SLOF (the guest firmware used
by QEMU) currently doesn't do this, so the zImage is not aware of the
initrd location. This can result in initrd corruption either through the
zImage extracting the vmlinux over the initrd, or by the vmlinux
overwriting the initrd when relocating itself.

QEMU does put the linux,initrd-start and linux,initrd-end properties
into the devicetree for the vmlinux to find the initrd. We can work
around the SLOF bug by also looking for those properties in the zImage.

Cc: sta...@vger.kernel.org
Cc: Alexey Kardashevskiy
Signed-off-by: Oliver O'Halloran
---
First noticed here:
https://unix.stackexchange.com/questions/547023/linux-kernel-on-ppc64le-vmlinux-equivalent-in-arch-powerpc-boot
---
 arch/powerpc/boot/devtree.c | 21 +
 arch/powerpc/boot/main.c    |  7 +++
 arch/powerpc/boot/of.h      | 16
 arch/powerpc/boot/ops.h     |  1 +
 arch/powerpc/boot/swab.h    | 17 +
 5 files changed, 46 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/boot/devtree.c b/arch/powerpc/boot/devtree.c
index 5d91036..ac5c26b 100644
--- a/arch/powerpc/boot/devtree.c
+++ b/arch/powerpc/boot/devtree.c
@@ -13,6 +13,7 @@
 #include "string.h"
 #include "stdio.h"
 #include "ops.h"
+#include "swab.h"
 
 void dt_fixup_memory(u64 start, u64 size)
 {
@@ -318,6 +319,26 @@ int dt_xlate_reg(void *node, int res, unsigned long *addr, unsigned long *size)
 	return dt_xlate(node, res, reglen, addr, size);
 }
 
+int dt_read_addr(void *node, const char *prop, unsigned long *out_addr)
+{
+	int reglen;
+
+	*out_addr = 0;
+
+	reglen = getprop(node, prop, prop_buf, sizeof(prop_buf)) / 4;
+	if (reglen == 2) {
+		u64 v0 = be32_to_cpu(prop_buf[0]);
+		u64 v1 = be32_to_cpu(prop_buf[1]);
+		*out_addr = (v0 << 32) | v1;
+	} else if (reglen == 1) {
+		*out_addr = be32_to_cpu(prop_buf[0]);
+	} else {
+		return 0;
+	}
+
+	return 1;
+}
+
 int dt_xlate_addr(void *node, u32 *buf, int buflen, unsigned long *xlated_addr)
 {
diff --git a/arch/powerpc/boot/main.c b/arch/powerpc/boot/main.c
index a9d2091..518af24 100644
--- a/arch/powerpc/boot/main.c
+++ b/arch/powerpc/boot/main.c
@@ -112,6 +112,13 @@ static struct addr_range prep_initrd(struct addr_range vmlinux, void *chosen,
 	} else if (initrd_size > 0) {
 		printf("Using loader supplied ramdisk at 0x%lx-0x%lx\n\r",
 		       initrd_addr, initrd_addr + initrd_size);
+	} else if (chosen) {
+		unsigned long initrd_end;
+
+		dt_read_addr(chosen, "linux,initrd-start", &initrd_addr);
+		dt_read_addr(chosen, "linux,initrd-end", &initrd_end);
+
+		initrd_size = initrd_end - initrd_addr;
 	}
 
 	/* If there's no initrd at all, we're done */
diff --git a/arch/powerpc/boot/of.h b/arch/powerpc/boot/of.h
index 31b2f5d..dc24770 100644
--- a/arch/powerpc/boot/of.h
+++ b/arch/powerpc/boot/of.h
@@ -26,22 +26,6 @@
 typedef u16 __be16;
 typedef u32 __be32;
 typedef u64 __be64;
 
-#ifdef __LITTLE_ENDIAN__
-#define cpu_to_be16(x) swab16(x)
-#define be16_to_cpu(x) swab16(x)
-#define cpu_to_be32(x) swab32(x)
-#define be32_to_cpu(x) swab32(x)
-#define cpu_to_be64(x) swab64(x)
-#define be64_to_cpu(x) swab64(x)
-#else
-#define cpu_to_be16(x) (x)
-#define be16_to_cpu(x) (x)
-#define cpu_to_be32(x) (x)
-#define be32_to_cpu(x) (x)
-#define cpu_to_be64(x) (x)
-#define be64_to_cpu(x) (x)
-#endif
-
 #define PROM_ERROR (-1u)
 
 #endif /* _PPC_BOOT_OF_H_ */
diff --git a/arch/powerpc/boot/ops.h b/arch/powerpc/boot/ops.h
index e060676..5100dd7 100644
--- a/arch/powerpc/boot/ops.h
+++ b/arch/powerpc/boot/ops.h
@@ -95,6 +95,7 @@ void *simple_alloc_init(char *base, unsigned long heap_size,
 extern void flush_cache(void *, unsigned long);
 int dt_xlate_reg(void *node, int res, unsigned long *addr, unsigned long *size);
 int dt_xlate_addr(void *node, u32 *buf, int buflen, unsigned long *xlated_addr);
+int dt_read_addr(void *node, const char *prop, unsigned long *out);
 int dt_is_compatible(void *node, const char *compat);
 void dt_get_reg_format(void *node, u32 *naddr, u32 *nsize);
 int dt_get_virtual_reg(void *node, void **addr, int nres);
diff --git a/arch/powerpc/boot/swab.h b/arch/powerpc/boot/swab.h
index 11d2069..82db2c1 100644
--- a/arch/powerpc/boot/swab.h
+++ b/arch/powerpc/boot/swab.h
@@ -27,4 +27,21 @@ static inline u64 swab64(u64 x)
 		(u64)((x & (u64)0x00ff000000000000ULL) >> 40) |
 		(u64)((x & (u64)0xff00000000000000ULL) >> 56);
 }
+
+#ifdef __LITTLE_ENDIAN__
+#define cpu_to_be16(x) swab16(x)
+#define be16_to_cpu(x) swab16(x)
+#define cpu_to_be32(x) swab32(x)
+#define be32_to_cpu(x) swab32(x)
+#define cpu_to_be64(x) swab64(x)
+#define be64_to_cpu(x) swab64(x)
Re: [PATCH v8 3/8] powerpc: detect the trusted boot state of the system
Nayna Jain writes:
> diff --git a/arch/powerpc/kernel/secure_boot.c b/arch/powerpc/kernel/secure_boot.c
> index 99bba7915629..9753470ab08a 100644
> --- a/arch/powerpc/kernel/secure_boot.c
> +++ b/arch/powerpc/kernel/secure_boot.c
> @@ -28,3 +39,16 @@ bool is_ppc_secureboot_enabled(void)
> 	pr_info("Secure boot mode %s\n", enabled ? "enabled" : "disabled");
> 	return enabled;
> }
> +
> +bool is_ppc_trustedboot_enabled(void)
> +{
> +	struct device_node *node;
> +	bool enabled = false;
> +
> +	node = get_ppc_fw_sb_node();
> +	enabled = of_property_read_bool(node, "trusted-enabled");

Also here you need:

	of_node_put(node);

> +
> +	pr_info("Trusted boot mode %s\n", enabled ? "enabled" : "disabled");
> +
> +	return enabled;
> +}

cheers
Re: [PATCH v8 1/8] powerpc: detect the secure boot mode of the system
Nayna Jain writes:
> diff --git a/arch/powerpc/kernel/secure_boot.c b/arch/powerpc/kernel/secure_boot.c
> new file mode 100644
> index ..99bba7915629
> --- /dev/null
> +++ b/arch/powerpc/kernel/secure_boot.c
> @@ -0,0 +1,30 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2019 IBM Corporation
> + * Author: Nayna Jain
> + */
> +#include
> +#include
> +#include
> +
> +bool is_ppc_secureboot_enabled(void)
> +{
> +	struct device_node *node;
> +	bool enabled = false;
> +
> +	node = of_find_compatible_node(NULL, NULL, "ibm,secvar-v1");

If this found a node then you have a node with an elevated refcount
which you need to drop on the way out.

> +	if (!of_device_is_available(node)) {
> +		pr_err("Cannot find secure variable node in device tree; failing to secure state\n");
> +		goto out;
> +	}
> +
> +	/*
> +	 * secureboot is enabled if os-secure-enforcing property exists,
> +	 * else disabled.
> +	 */
> +	enabled = of_property_read_bool(node, "os-secure-enforcing");
> +
> +out:

So here you need:

	of_node_put(node);

> +	pr_info("Secure boot mode %s\n", enabled ? "enabled" : "disabled");
> +	return enabled;
> +}

cheers
Re: [PATCH v7 00/12] implement KASLR for powerpc/fsl_booke/32
On Mon, 2019-10-21 at 11:34 +0800, Jason Yan wrote:
> On 2019/10/10 2:46, Scott Wood wrote:
> > On Wed, 2019-10-09 at 16:41 +0800, Jason Yan wrote:
> > > Hi Scott,
> > >
> > > On 2019/10/9 15:13, Scott Wood wrote:
> > > > On Wed, 2019-10-09 at 14:10 +0800, Jason Yan wrote:
> > > > > Hi Scott,
> > > > >
> > > > > Would you please take sometime to test this?
> > > > >
> > > > > Thank you so much.
> > > > >
> > > > > On 2019/9/24 13:52, Jason Yan wrote:
> > > > > > Hi Scott,
> > > > > >
> > > > > > Can you test v7 to see if it works to load a kernel at a
> > > > > > non-zero address?
> > > > > >
> > > > > > Thanks,
> > > >
> > > > Sorry for the delay. Here's the output:
> > >
> > > Thanks for the test.
> > >
> > > > ## Booting kernel from Legacy Image at 1000 ...
> > > >    Image Name:   Linux-5.4.0-rc2-00050-g8ac2cf5b4
> > > >    Image Type:   PowerPC Linux Kernel Image (gzip compressed)
> > > >    Data Size:    7521134 Bytes = 7.2 MiB
> > > >    Load Address: 0400
> > > >    Entry Point:  0400
> > > >    Verifying Checksum ... OK
> > > > ## Flattened Device Tree blob at 1fc0
> > > >    Booting using the fdt blob at 0x1fc0
> > > >    Uncompressing Kernel Image ... OK
> > > >    Loading Device Tree to 07fe, end 07fff65c ... OK
> > > > KASLR: No safe seed for randomizing the kernel base.
> > > > OF: reserved mem: initialized node qman-fqd, compatible id fsl,qman-fqd
> > > > OF: reserved mem: initialized node qman-pfdr, compatible id fsl,qman-pfdr
> > > > OF: reserved mem: initialized node bman-fbpr, compatible id fsl,bman-fbpr
> > > > Memory CAM mapping: 64/64/64 Mb, residual: 12032Mb
> > >
> > > When booting from 0400, the max CAM value is 64M. And you have a
> > > board with 12G memory, so CONFIG_LOWMEM_CAM_NUM=3 means only 192M
> > > of memory is mapped, and when the kernel is randomized in the
> > > middle of this 192M we will not have enough continuous memory for
> > > the node map.
> > >
> > > Can you set CONFIG_LOWMEM_CAM_NUM=8 and see if it works?
> >
> > OK, that worked.
>
> Hi Scott, are there any more cases that should be tested, or any more
> comments? What else needs to be done before this feature can be merged?

I've just applied it and sent a pull request.

-Scott
Pull request: scottwood/linux.git next
This contains KASLR support for book3e 32-bit.

The following changes since commit 612ee81b9461475b5a5612c2e8d71559dd3c7920:

  powerpc/papr_scm: Fix an off-by-one check in papr_scm_meta_{get,set} (2019-10-10 20:15:53 +1100)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux.git next

for you to fetch changes up to 9df1ef3f1376ec5d3a1b51a4546c94279bcd88ca:

  powerpc/fsl_booke/32: Document KASLR implementation (2019-10-21 16:09:16 -0500)

Jason Yan (12):
      powerpc: unify definition of M_IF_NEEDED
      powerpc: move memstart_addr and kernstart_addr to init-common.c
      powerpc: introduce kernstart_virt_addr to store the kernel base
      powerpc/fsl_booke/32: introduce create_kaslr_tlb_entry() helper
      powerpc/fsl_booke/32: introduce reloc_kernel_entry() helper
      powerpc/fsl_booke/32: implement KASLR infrastructure
      powerpc/fsl_booke/32: randomize the kernel image offset
      powerpc/fsl_booke/kaslr: clear the original kernel if randomized
      powerpc/fsl_booke/kaslr: support nokaslr cmdline parameter
      powerpc/fsl_booke/kaslr: dump out kernel offset information on panic
      powerpc/fsl_booke/kaslr: export offset in VMCOREINFO ELF notes
      powerpc/fsl_booke/32: Document KASLR implementation

 Documentation/powerpc/kaslr-booke32.rst       |  42 +++
 arch/powerpc/Kconfig                          |  11 +
 arch/powerpc/include/asm/nohash/mmu-book3e.h  |  11 +-
 arch/powerpc/include/asm/page.h               |   7 +
 arch/powerpc/kernel/early_32.c                |   5 +-
 arch/powerpc/kernel/exceptions-64e.S          |  12 +-
 arch/powerpc/kernel/fsl_booke_entry_mapping.S |  25 +-
 arch/powerpc/kernel/head_fsl_booke.S          |  61 +++-
 arch/powerpc/kernel/machine_kexec.c           |   1 +
 arch/powerpc/kernel/misc_64.S                 |   7 +-
 arch/powerpc/kernel/setup-common.c            |  20 ++
 arch/powerpc/mm/init-common.c                 |   7 +
 arch/powerpc/mm/init_32.c                     |   5 -
 arch/powerpc/mm/init_64.c                     |   5 -
 arch/powerpc/mm/mmu_decl.h                    |  11 +
 arch/powerpc/mm/nohash/Makefile               |   1 +
 arch/powerpc/mm/nohash/fsl_booke.c            |   8 +-
 arch/powerpc/mm/nohash/kaslr_booke.c          | 401 ++++++++++++++++++++++++++
 18 files changed, 587 insertions(+), 53 deletions(-)
 create mode 100644 Documentation/powerpc/kaslr-booke32.rst
 create mode 100644 arch/powerpc/mm/nohash/kaslr_booke.c
Re: [PATCH RFC v1 00/12] mm: Don't mark hotplugged pages PG_reserved (including ZONE_DEVICE)
Hi David,

Thanks for tackling this!

On Tue, Oct 22, 2019 at 10:13 AM David Hildenbrand wrote:
>
> This series is based on [2], which should pop up in linux/next soon:
> https://lkml.org/lkml/2019/10/21/1034
>
> This is the result of a recent discussion with Michal ([1], [2]). Right
> now we set all pages PG_reserved when initializing hotplugged memmaps.
> This includes ZONE_DEVICE memory. In case of system memory, PG_reserved
> is cleared again when onlining the memory, in case of ZONE_DEVICE memory
> never. In ancient times, we needed PG_reserved, because there was no way
> to tell whether the memmap was already properly initialized. We now have
> SECTION_IS_ONLINE for that in the case of !ZONE_DEVICE memory.
> ZONE_DEVICE memory is already initialized deferred, and there shouldn't
> be a visible change in that regard.
>
> I remember that some time ago, we already talked about stopping to set
> ZONE_DEVICE pages PG_reserved on the list, but I never saw any patches.
> Also, I forgot who was part of the discussion :)

You got me, Alex, and KVM folks on the Cc, so I'd say that was it.

> One of the biggest fears were side effects. I went ahead and audited all
> users of PageReserved(). The ones that don't need any care (patches)
> can be found below. I will double check and hope I am not missing
> something important.
>
> I am probably a little bit too careful (but I don't want to break
> things). In most places (besides KVM and vfio that are nuts), the
> pfn_to_online_page() check could most probably be avoided by an
> is_zone_device_page() check. However, I usually get suspicious when I
> see a pfn_valid() check (especially after I learned that people mmap
> parts of /dev/mem into user space, including memory without memmaps.
> Also, people could memmap offline memory blocks this way :/). As long
> as this does not hurt performance, I think we should rather do it the
> clean way.

I'm concerned about using is_zone_device_page() in places that are not
known to already have a reference to the page. Here's an audit of
current usages, and the ones I think need to be cleaned up. The "unsafe"
ones do not appear to have any protections against the device page being
removed (get_dev_pagemap()). Yes, some of these were added by me. The
"unsafe? HMM" ones need HMM eyes because HMM leaks device pages into
anonymous memory paths and I'm not up to speed on how it guarantees
'struct page' validity vs device shutdown without using
get_dev_pagemap().

smaps_pmd_entry(): unsafe
put_devmap_managed_page(): safe, page reference is held
is_device_private_page(): safe? gpu driver manages private page lifetime
is_pci_p2pdma_page(): safe, page reference is held
uncharge_page(): unsafe? HMM
add_to_kill(): safe, protected by get_dev_pagemap() and dax_lock_page()
soft_offline_page(): unsafe
remove_migration_pte(): unsafe? HMM
move_to_new_page(): unsafe? HMM
migrate_vma_pages() and helpers: unsafe? HMM
try_to_unmap_one(): unsafe? HMM
__put_page(): safe
release_pages(): safe

I'm hoping all the HMM ones can be converted to is_device_private_page()
directly and have that routine grow a nice comment about how it knows it
can always safely de-reference its @page argument.

For the rest I'd like to propose that we add a facility to determine
ZONE_DEVICE by pfn rather than page. The most straightforward way I can
think of would be to just add another bitmap to mem_section_usage to
indicate if a subsection is ZONE_DEVICE or not.

> I only gave it a quick test with DIMMs on x86-64, but didn't test the
> ZONE_DEVICE part at all (any tips for a nice QEMU setup?).
> Compile-tested on x86-64 and PPC.

I'll give it a spin, but I don't think the kernel wants to grow more
is_zone_device_page() users.
Re: [PATCH RFC v1 07/12] staging: kpc2000: Prepare transfer_complete_cb() for PG_reserved changes
On 22.10.19 19:55, Matt Sickler wrote:
>> Right now, ZONE_DEVICE memory is always set PG_reserved. We want to
>> change that.
>>
>> The pages are obtained via get_user_pages_fast(). I assume these could
>> be ZONE_DEVICE pages. Let's just exclude them as well explicitly.
>
> I'm not sure what ZONE_DEVICE pages are, but these pages are normal
> system RAM, typically HugePages (but not always).

ZONE_DEVICE, a.k.a. devmem, pages are pages that bypass the pagecache
(e.g., DAX) completely and will therefore never get swapped. These pages
are not managed by any page allocator (especially not the buddy), they
are rather "directly mapped device memory". E.g., an NVDIMM. It is
mapped into the physical address space similarly to ordinary RAM (a
DIMM). Any write to such a PFN will directly end up on the target
device. In contrast to a DIMM, the memory is persistent across reboots.

Now, if you mmap such an NVDIMM into a user space process, you will end
up with ZONE_DEVICE pages as part of the user space mapping (VMA).
get_user_pages_fast() on this memory will result in "struct pages" that
belong to ZONE_DEVICE.

This is where this patch comes into play. This patch makes sure that
there is absolutely no change once we stop setting these ZONE_DEVICE
pages PG_reserved. E.g., AFAIK, setting a ZONE_DEVICE page dirty does
not make too much sense (never swapped).

Yes, it might not be a likely setup, however, it is possible. In this
series I collect all places that *could* be affected. Whether that
change is really needed has to be decided. I can see that the two
staging drivers I have patches for might be able to just live with the
change - but then we talked about it and are aware of the change.

Thanks!

-- 
Thanks, David / dhildenb
RE: [PATCH RFC v1 07/12] staging: kpc2000: Prepare transfer_complete_cb() for PG_reserved changes
> Right now, ZONE_DEVICE memory is always set PG_reserved. We want to
> change that.
>
> The pages are obtained via get_user_pages_fast(). I assume, these could
> be ZONE_DEVICE pages. Let's just exclude them as well explicitly.

I'm not sure what ZONE_DEVICE pages are, but these pages are normal
system RAM, typically HugePages (but not always).

> Cc: Greg Kroah-Hartman
> Cc: Vandana BN
> Cc: "Simon Sandström"
> Cc: Dan Carpenter
> Cc: Nishka Dasgupta
> Cc: Madhumitha Prabakaran
> Cc: Fabio Estevam
> Cc: Matt Sickler
> Cc: Jeremy Sowden
> Signed-off-by: David Hildenbrand
> ---
>  drivers/staging/kpc2000/kpc_dma/fileops.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/staging/kpc2000/kpc_dma/fileops.c b/drivers/staging/kpc2000/kpc_dma/fileops.c
> index cb52bd9a6d2f..457adcc81fe6 100644
> --- a/drivers/staging/kpc2000/kpc_dma/fileops.c
> +++ b/drivers/staging/kpc2000/kpc_dma/fileops.c
> @@ -212,7 +212,8 @@ void transfer_complete_cb(struct aio_cb_data *acd, size_t xfr_count, u32 flags)
>  	BUG_ON(acd->ldev->pldev == NULL);
>
>  	for (i = 0 ; i < acd->page_count ; i++) {
> -		if (!PageReserved(acd->user_pages[i])) {
> +		if (!PageReserved(acd->user_pages[i]) &&
> +		    !is_zone_device_page(acd->user_pages[i])) {
>  			set_page_dirty(acd->user_pages[i]);
>  		}
>  	}
> --
> 2.21.0
[PATCH RFC v1 02/12] mm/usercopy.c: Prepare check_page_span() for PG_reserved changes
Right now, ZONE_DEVICE memory is always set PG_reserved. We want to change that. Let's make sure that the logic in the function won't change. Once we no longer set these pages to reserved, we can rework this function to perform separate checks for ZONE_DEVICE (split from PG_reserved checks). Cc: Kees Cook Cc: Andrew Morton Cc: Kate Stewart Cc: Allison Randal Cc: "Isaac J. Manjarres" Cc: Qian Cai Cc: Thomas Gleixner Signed-off-by: David Hildenbrand --- mm/usercopy.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/mm/usercopy.c b/mm/usercopy.c index 660717a1ea5c..a3ac4be35cde 100644 --- a/mm/usercopy.c +++ b/mm/usercopy.c @@ -203,14 +203,15 @@ static inline void check_page_span(const void *ptr, unsigned long n, * device memory), or CMA. Otherwise, reject since the object spans * several independently allocated pages. */ - is_reserved = PageReserved(page); + is_reserved = PageReserved(page) || is_zone_device_page(page); is_cma = is_migrate_cma_page(page); if (!is_reserved && !is_cma) usercopy_abort("spans multiple pages", NULL, to_user, 0, n); for (ptr += PAGE_SIZE; ptr <= end; ptr += PAGE_SIZE) { page = virt_to_head_page(ptr); - if (is_reserved && !PageReserved(page)) + if (is_reserved && !(PageReserved(page) || +is_zone_device_page(page))) usercopy_abort("spans Reserved and non-Reserved pages", NULL, to_user, 0, n); if (is_cma && !is_migrate_cma_page(page)) -- 2.21.0
[PATCH RFC v1 00/12] mm: Don't mark hotplugged pages PG_reserved (including ZONE_DEVICE)
This series is based on [2], which should pop up in linux/next soon: https://lkml.org/lkml/2019/10/21/1034 This is the result of a recent discussion with Michal ([1], [2]). Right now we set all pages PG_reserved when initializing hotplugged memmaps. This includes ZONE_DEVICE memory. In case of system memory, PG_reserved is cleared again when onlining the memory, in case of ZONE_DEVICE memory never. In ancient times, we needed PG_reserved, because there was no way to tell whether the memmap was already properly initialized. We now have SECTION_IS_ONLINE for that in the case of !ZONE_DEVICE memory. ZONE_DEVICE memory is already initialized deferred, and there shouldn't be a visible change in that regard. I remember that some time ago, we already talked about stopping to set ZONE_DEVICE pages PG_reserved on the list, but I never saw any patches. Also, I forgot who was part of the discussion :) One of the biggest fears was side effects. I went ahead and audited all users of PageReserved(). The ones that don't need any care (patches) can be found below. I will double check and hope I am not missing something important. I am probably a little bit too careful (but I don't want to break things). In most places (besides KVM and vfio that are nuts), the pfn_to_online_page() check could most probably be avoided by an is_zone_device_page() check. However, I usually get suspicious when I see a pfn_valid() check (especially after I learned that people mmap parts of /dev/mem into user space, including memory without memmaps. Also, people could mmap offline memory blocks this way :/). As long as this does not hurt performance, I think we should rather do it the clean way. I only gave it a quick test with DIMMs on x86-64, but didn't test the ZONE_DEVICE part at all (any tips for a nice QEMU setup?). Compile-tested on x86-64 and PPC. 
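The difference between pfn_valid() and pfn_to_online_page() that the cover letter leans on can be modelled in userspace C (all data below is invented for illustration; the real helpers consult the sparse memmap and section online flags):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy struct page: only tracks whether the page is online. */
struct page {
	bool online;
};

#define MODEL_MAX_PFN 4UL

/* pfn 0: ordinary online system RAM.
 * pfn 1: ZONE_DEVICE - a memmap exists, but the page is never online.
 * pfn 2, 3: no memmap at all (e.g. a hole, or raw /dev/mem range). */
static struct page model_memmap[MODEL_MAX_PFN] = {
	[0] = { .online = true },
	[1] = { .online = false },
};
static bool model_has_memmap[MODEL_MAX_PFN] = { true, true, false, false };

/* pfn_valid(): only says "a memmap entry exists". */
static bool pfn_valid(unsigned long pfn)
{
	return pfn < MODEL_MAX_PFN && model_has_memmap[pfn];
}

/* pfn_to_online_page(): additionally requires the page to be online,
 * which excludes ZONE_DEVICE and memory without an initialized memmap. */
static struct page *pfn_to_online_page(unsigned long pfn)
{
	if (!pfn_valid(pfn) || !model_memmap[pfn].online)
		return NULL;
	return &model_memmap[pfn];
}
```

This is why a bare pfn_valid() check makes the author suspicious: it succeeds for pfn 1 above even though the page must not be treated like buddy-managed RAM.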
Other users of PageReserved() that should be fine: - mm/page_owner.c:pagetypeinfo_showmixedcount_print() -> Never called for ZONE_DEVICE, (+ pfn_to_online_page(pfn)) - mm/page_owner.c:init_pages_in_zone() -> Never called for ZONE_DEVICE (!populated_zone(zone)) - mm/page_ext.c:free_page_ext() -> Only a BUG_ON(PageReserved(page)), not relevant - mm/page_alloc.c:has_unmovable_pages() -> Not relevant for ZONE_DEVICE - mm/page_alloc.c:pfn_range_valid_contig() -> pfn_to_online_page() already guards us - mm/mempolicy.c:queue_pages_pte_range() -> vm_normal_page() checks against pte_devmap() - mm/memory-failure.c:hwpoison_user_mappings() -> Not reached via memory_failure() due to pfn_to_online_page() -> Also not reached indirectly via memory_failure_hugetlb() - mm/hugetlb.c:gather_bootmem_prealloc() -> Only a WARN_ON(PageReserved(page)), not relevant - kernel/power/snapshot.c:saveable_highmem_page() -> pfn_to_online_page() already guards us - kernel/power/snapshot.c:saveable_page() -> pfn_to_online_page() already guards us - fs/proc/task_mmu.c:can_gather_numa_stats() -> vm_normal_page() checks against pte_devmap() - fs/proc/task_mmu.c:can_gather_numa_stats_pmd() -> vm_normal_page_pmd() checks against pmd_devmap() - fs/proc/page.c:stable_page_flags() -> The reserved bit is simply copied, irrelevant - drivers/firmware/memmap.c:release_firmware_map_entry() -> really only a check to detect bootmem. 
Not relevant for ZONE_DEVICE - arch/ia64/kernel/mca_drv.c - arch/mips/mm/init.c - arch/mips/mm/ioremap.c - arch/nios2/mm/ioremap.c - arch/parisc/mm/ioremap.c - arch/sparc/mm/tlb.c - arch/xtensa/mm/cache.c -> No ZONE_DEVICE support - arch/powerpc/mm/init_64.c:vmemmap_free() -> Special-cases memmap on altmap -> Only a check for bootmem - arch/x86/kernel/alternative.c:__text_poke() -> Only a WARN_ON(!PageReserved(pages[0])) to verify it is bootmem - arch/x86/mm/init_64.c -> Only a check for bootmem [1] https://lkml.org/lkml/2019/10/21/736 [2] https://lkml.org/lkml/2019/10/21/1034 Cc: Michal Hocko Cc: Dan Williams Cc: kvm-...@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: k...@vger.kernel.org Cc: linux-hyp...@vger.kernel.org Cc: de...@driverdev.osuosl.org Cc: xen-de...@lists.xenproject.org Cc: x...@kernel.org Cc: Alexander Duyck David Hildenbrand (12): mm/memory_hotplug: Don't allow to online/offline memory blocks with holes mm/usercopy.c: Prepare check_page_span() for PG_reserved changes KVM: x86/mmu: Prepare kvm_is_mmio_pfn() for PG_reserved changes KVM: Prepare kvm_is_reserved_pfn() for PG_reserved changes vfio/type1: Prepare is_invalid_reserved_pfn() for PG_reserved changes staging/gasket: Prepare gasket_release_page() for PG_reserved changes staging: kpc2000: Prepare transfer_complete_cb() for PG_reserved changes powerpc/book3s: Prepare kvmppc_book3s_instantiate_page() for PG_reserved changes powerpc/64s: Prepare hash_page_do_lazy_icache() for PG_reserved changes powerpc/mm: Prepare maybe_pte_to_page() for PG_reserved changes x86/mm: Prepare __ioremap_check_ram() for PG_reserved changes mm/memory_hotplug: Don't mark pages PG_reserved when initializing the memmap
[PATCH RFC v1 12/12] mm/memory_hotplug: Don't mark pages PG_reserved when initializing the memmap
Everything should be prepared to stop setting pages PG_reserved when initializing the memmap on memory hotplug. Most importantly, we stop marking ZONE_DEVICE pages PG_reserved. a) We made sure that any code that relied on PG_reserved to detect ZONE_DEVICE memory will no longer rely on PG_reserved - either by using pfn_to_online_page() to exclude them right away or by checking against is_zone_device_page(). b) We made sure that memory blocks with holes cannot be offlined and therefore also not onlined. We have quite some code that relies on memory holes being marked PG_reserved. This is now not an issue anymore. generic_online_page() still calls __free_pages_core(), which performs __ClearPageReserved(p). AFAIKS, this should not hurt. It is worth noting that the users of online_page_callback_t might see a change. E.g., until now, pages not freed to the buddy by the HyperV balloon were set PG_reserved until freed via generic_online_page(). Now, they would look like ordinarily allocated pages (refcount == 1). This callback is used by the XEN balloon and the HyperV balloon. To not introduce any silent errors, keep marking the pages PG_reserved. We can most probably stop doing that, but have to double check if there are issues (e.g., offlining code aborts right away in has_unmovable_pages() when it runs into a PageReserved(page)) Update the documentation at various places. Cc: "K. Y. 
Srinivasan" Cc: Haiyang Zhang Cc: Stephen Hemminger Cc: Sasha Levin Cc: Boris Ostrovsky Cc: Juergen Gross Cc: Stefano Stabellini Cc: Andrew Morton Cc: Alexander Duyck Cc: Pavel Tatashin Cc: Vlastimil Babka Cc: Johannes Weiner Cc: Anthony Yznaga Cc: Michal Hocko Cc: Oscar Salvador Cc: Dan Williams Cc: Mel Gorman Cc: Mike Rapoport Cc: Anshuman Khandual Suggested-by: Michal Hocko Signed-off-by: David Hildenbrand --- drivers/hv/hv_balloon.c| 6 ++ drivers/xen/balloon.c | 7 +++ include/linux/page-flags.h | 8 +--- mm/memory_hotplug.c| 17 +++-- mm/page_alloc.c| 11 --- 5 files changed, 21 insertions(+), 28 deletions(-) diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c index c722079d3c24..3214b0ef5247 100644 --- a/drivers/hv/hv_balloon.c +++ b/drivers/hv/hv_balloon.c @@ -670,6 +670,12 @@ static struct notifier_block hv_memory_nb = { /* Check if the particular page is backed and can be onlined and online it. */ static void hv_page_online_one(struct hv_hotadd_state *has, struct page *pg) { + /* +* TODO: The core used to mark the pages reserved. Most probably +* we can stop doing that now. +*/ + __SetPageReserved(pg); + if (!has_pfn_is_backed(has, page_to_pfn(pg))) { if (!PageOffline(pg)) __SetPageOffline(pg); diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c index 4f2e78a5e4db..af69f057913a 100644 --- a/drivers/xen/balloon.c +++ b/drivers/xen/balloon.c @@ -374,6 +374,13 @@ static void xen_online_page(struct page *page, unsigned int order) mutex_lock(&balloon_mutex); for (i = 0; i < size; i++) { p = pfn_to_page(start_pfn + i); + /* +* TODO: The core used to mark the pages reserved. Most probably +* we can stop doing that now. However, especially +* alloc_xenballooned_pages() left PG_reserved set +* on pages that can get mapped to user space. 
+*/ + __SetPageReserved(p); balloon_append(p); } mutex_unlock(&balloon_mutex); diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index f91cb8898ff0..d4f85d866b71 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -30,24 +30,18 @@ * - Pages falling into physical memory gaps - not IORESOURCE_SYSRAM. Trying * to read/write these pages might end badly. Don't touch! * - The zero page(s) - * - Pages not added to the page allocator when onlining a section because - * they were excluded via the online_page_callback() or because they are - * PG_hwpoison. * - Pages allocated in the context of kexec/kdump (loaded kernel image, * control pages, vmcoreinfo) * - MMIO/DMA pages. Some architectures don't allow to ioremap pages that are * not marked PG_reserved (as they might be in use by somebody else who does * not respect the caching strategy). - * - Pages part of an offline section (struct pages of offline sections should - * not be trusted as they will be initialized when first onlined). * - MCA pages on ia64 * - Pages holding CPU notes for POWER Firmware Assisted Dump - * - Device memory (e.g. PMEM, DAX, HMM) * Some PG_reserved pages will be excluded from the hibernation image. * PG_reserved does in general not hinder anybody from dumping or swapping * and is no longer required for remap_pfn_range(). ioremap might require it. * Consequently, PG_reserved for
[PATCH RFC v1 11/12] x86/mm: Prepare __ioremap_check_ram() for PG_reserved changes
Right now, ZONE_DEVICE memory is always set PG_reserved. We want to change that. We could explicitly check for is_zone_device_page(page). But looking at the pfn_valid() check, it seems safer to just use pfn_to_online_page() here, that will skip all ZONE_DEVICE pages right away. Cc: Dave Hansen Cc: Andy Lutomirski Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Ingo Molnar Cc: Borislav Petkov Cc: "H. Peter Anvin" Signed-off-by: David Hildenbrand --- arch/x86/mm/ioremap.c | 13 ++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c index a39dcdb5ae34..db6913b48edf 100644 --- a/arch/x86/mm/ioremap.c +++ b/arch/x86/mm/ioremap.c @@ -77,10 +77,17 @@ static unsigned int __ioremap_check_ram(struct resource *res) start_pfn = (res->start + PAGE_SIZE - 1) >> PAGE_SHIFT; stop_pfn = (res->end + 1) >> PAGE_SHIFT; if (stop_pfn > start_pfn) { - for (i = 0; i < (stop_pfn - start_pfn); ++i) - if (pfn_valid(start_pfn + i) && - !PageReserved(pfn_to_page(start_pfn + i))) + for (i = 0; i < (stop_pfn - start_pfn); ++i) { + struct page *page; +/* + * We treat any pages that are not online (not managed + * by the buddy) as not being RAM. This includes + * ZONE_DEVICE pages. + */ + page = pfn_to_online_page(start_pfn + i); + if (page && !PageReserved(page)) return IORES_MAP_SYSTEM_RAM; + } } return 0; -- 2.21.0
[PATCH RFC v1 10/12] powerpc/mm: Prepare maybe_pte_to_page() for PG_reserved changes
Right now, ZONE_DEVICE memory is always set PG_reserved. We want to change that. We could explicitly check for is_zone_device_page(page). But looking at the pfn_valid() check, it seems safer to just use pfn_to_online_page() here, that will skip all ZONE_DEVICE pages right away. Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: Christophe Leroy Cc: "Aneesh Kumar K.V" Cc: Allison Randal Cc: Nicholas Piggin Cc: Thomas Gleixner Signed-off-by: David Hildenbrand --- arch/powerpc/mm/pgtable.c | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c index e3759b69f81b..613c98fa7dc0 100644 --- a/arch/powerpc/mm/pgtable.c +++ b/arch/powerpc/mm/pgtable.c @@ -55,10 +55,12 @@ static struct page *maybe_pte_to_page(pte_t pte) unsigned long pfn = pte_pfn(pte); struct page *page; - if (unlikely(!pfn_valid(pfn))) - return NULL; - page = pfn_to_page(pfn); - if (PageReserved(page)) + /* +* We reject any pages that are not online (not managed by the buddy). +* This includes ZONE_DEVICE pages. +*/ + page = pfn_to_online_page(pfn); + if (unlikely(!page || PageReserved(page))) return NULL; return page; } -- 2.21.0
[PATCH RFC v1 06/12] staging/gasket: Prepare gasket_release_page() for PG_reserved changes
Right now, ZONE_DEVICE memory is always set PG_reserved. We want to change that. The pages are obtained via get_user_pages_fast(). I assume, these could be ZONE_DEVICE pages. Let's just exclude them as well explicitly. Cc: Rob Springer Cc: Todd Poynor Cc: Ben Chan Cc: Greg Kroah-Hartman Signed-off-by: David Hildenbrand --- drivers/staging/gasket/gasket_page_table.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/staging/gasket/gasket_page_table.c b/drivers/staging/gasket/gasket_page_table.c index f6d715787da8..d43fed58bf65 100644 --- a/drivers/staging/gasket/gasket_page_table.c +++ b/drivers/staging/gasket/gasket_page_table.c @@ -447,7 +447,7 @@ static bool gasket_release_page(struct page *page) if (!page) return false; - if (!PageReserved(page)) + if (!PageReserved(page) && !is_zone_device_page(page)) SetPageDirty(page); put_page(page); -- 2.21.0
[PATCH RFC v1 09/12] powerpc/64s: Prepare hash_page_do_lazy_icache() for PG_reserved changes
Right now, ZONE_DEVICE memory is always set PG_reserved. We want to change that. We could explicitly check for is_zone_device_page(page). But looking at the pfn_valid() check, it seems safer to just use pfn_to_online_page() here, that will skip all ZONE_DEVICE pages right away. Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: "Aneesh Kumar K.V" Cc: Christophe Leroy Cc: Nicholas Piggin Cc: Andrew Morton Cc: Mike Rapoport Cc: YueHaibing Signed-off-by: David Hildenbrand --- arch/powerpc/mm/book3s64/hash_utils.c | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c index 6c123760164e..a1566039e747 100644 --- a/arch/powerpc/mm/book3s64/hash_utils.c +++ b/arch/powerpc/mm/book3s64/hash_utils.c @@ -1084,13 +1084,15 @@ void hash__early_init_mmu_secondary(void) */ unsigned int hash_page_do_lazy_icache(unsigned int pp, pte_t pte, int trap) { - struct page *page; + struct page *page = pfn_to_online_page(pte_pfn(pte)); - if (!pfn_valid(pte_pfn(pte))) + /* +* We ignore any pages that are not online (not managed by the buddy). +* This includes ZONE_DEVICE pages. +*/ + if (!page) return pp; - page = pte_page(pte); - /* page is dirty */ if (!test_bit(PG_arch_1, >flags) && !PageReserved(page)) { if (trap == 0x400) { -- 2.21.0
[PATCH RFC v1 05/12] vfio/type1: Prepare is_invalid_reserved_pfn() for PG_reserved changes
Right now, ZONE_DEVICE memory is always set PG_reserved. We want to change that. KVM has this weird use case that you can map anything from /dev/mem into the guest. pfn_valid() is not a reliable check whether the memmap was initialized and can be touched. pfn_to_online_page() makes sure that we have an initialized memmap. Note that ZONE_DEVICE memory is never online (IOW, managed by the buddy). Switching to pfn_to_online_page() keeps the existing behavior for PFNs without a memmap and for ZONE_DEVICE memory. They are treated as reserved and the page is not touched (e.g., to set it dirty or accessed). Cc: Alex Williamson Cc: Cornelia Huck Signed-off-by: David Hildenbrand --- drivers/vfio/vfio_iommu_type1.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index 2ada8e6cdb88..f8ce8c408ba8 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -299,9 +299,15 @@ static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async) */ static bool is_invalid_reserved_pfn(unsigned long pfn) { - if (pfn_valid(pfn)) - return PageReserved(pfn_to_page(pfn)); + struct page *page = pfn_to_online_page(pfn); + /* +* We treat any pages that are not online (not managed by the buddy) +* as reserved - this includes ZONE_DEVICE pages and pages without +* a memmap (e.g., mapped via /dev/mem). +*/ + if (page) + return PageReserved(page); return true; } -- 2.21.0
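The rewritten check boils down to a small default rule: anything that cannot be resolved to an online page is treated as reserved. A minimal userspace sketch (not kernel code — the argument plays the role of a pfn_to_online_page() result, and `struct page` is a stand-in):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct page. */
struct page {
	bool reserved; /* PG_reserved */
};

/*
 * Model of the rewritten is_invalid_reserved_pfn(): NULL covers both
 * "no memmap at all" (e.g. raw /dev/mem) and "memmap exists but not
 * online" (ZONE_DEVICE). Both default to "treat as reserved"; only a
 * real online page defers to its PG_reserved bit.
 */
static bool is_invalid_reserved_pfn_model(const struct page *online_page)
{
	if (online_page)
		return online_page->reserved;
	return true;
}
```

The same shape is reused by the KVM patch (kvm_is_reserved_pfn()), which is why the two commit messages read almost identically.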
[PATCH RFC v1 08/12] powerpc/book3s: Prepare kvmppc_book3s_instantiate_page() for PG_reserved changes
Right now, ZONE_DEVICE memory is always set PG_reserved. We want to change that. KVM has this weird use case that you can map anything from /dev/mem into the guest. pfn_valid() is not a reliable check whether the memmap was initialized and can be touched. pfn_to_online_page() makes sure that we have an initialized memmap. Note that ZONE_DEVICE memory is never online (IOW, managed by the buddy). Switching to pfn_to_online_page() keeps the existing behavior for PFNs without a memmap and for ZONE_DEVICE memory. Cc: Paul Mackerras Cc: Benjamin Herrenschmidt Cc: Michael Ellerman Signed-off-by: David Hildenbrand --- arch/powerpc/kvm/book3s_64_mmu_radix.c | 14 -- 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c index 2d415c36a61d..05397c0561fc 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_radix.c +++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c @@ -801,12 +801,14 @@ int kvmppc_book3s_instantiate_page(struct kvm_vcpu *vcpu, writing, upgrade_p); if (is_error_noslot_pfn(pfn)) return -EFAULT; - page = NULL; - if (pfn_valid(pfn)) { - page = pfn_to_page(pfn); - if (PageReserved(page)) - page = NULL; - } + /* +* We treat any pages that are not online (not managed by the +* buddy) as reserved - this includes ZONE_DEVICE pages and +* pages without a memmap (e.g., mapped via /dev/mem). +*/ + page = pfn_to_online_page(pfn); + if (page && PageReserved(page)) + page = NULL; } /* -- 2.21.0
[PATCH RFC v1 07/12] staging: kpc2000: Prepare transfer_complete_cb() for PG_reserved changes
Right now, ZONE_DEVICE memory is always set PG_reserved. We want to change that. The pages are obtained via get_user_pages_fast(). I assume, these could be ZONE_DEVICE pages. Let's just exclude them as well explicitly. Cc: Greg Kroah-Hartman Cc: Vandana BN Cc: "Simon Sandström" Cc: Dan Carpenter Cc: Nishka Dasgupta Cc: Madhumitha Prabakaran Cc: Fabio Estevam Cc: Matt Sickler Cc: Jeremy Sowden Signed-off-by: David Hildenbrand --- drivers/staging/kpc2000/kpc_dma/fileops.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/staging/kpc2000/kpc_dma/fileops.c b/drivers/staging/kpc2000/kpc_dma/fileops.c index cb52bd9a6d2f..457adcc81fe6 100644 --- a/drivers/staging/kpc2000/kpc_dma/fileops.c +++ b/drivers/staging/kpc2000/kpc_dma/fileops.c @@ -212,7 +212,8 @@ void transfer_complete_cb(struct aio_cb_data *acd, size_t xfr_count, u32 flags) BUG_ON(acd->ldev->pldev == NULL); for (i = 0 ; i < acd->page_count ; i++) { - if (!PageReserved(acd->user_pages[i])) { + if (!PageReserved(acd->user_pages[i]) && + !is_zone_device_page(acd->user_pages[i])) { set_page_dirty(acd->user_pages[i]); } } -- 2.21.0
[PATCH RFC v1 01/12] mm/memory_hotplug: Don't allow to online/offline memory blocks with holes
Our onlining/offlining code is unnecessarily complicated. Only memory blocks added during boot can have holes. Hotplugged memory never has holes. That memory is already online. When we stop allowing to offline memory blocks with holes, we implicitly stop to online memory blocks with holes. This allows to simplify the code. For example, we no longer have to worry about marking pages that fall into memory holes PG_reserved when onlining memory. We can stop setting pages PG_reserved. Offlining memory blocks added during boot is usually not guaranteed to work either way. So stopping to do that (if anybody really used and tested this over the years) should not really hurt. For the use case of offlining memory to unplug DIMMs, we should see no change. (holes on DIMMs would be weird) Cc: Andrew Morton Cc: Michal Hocko Cc: Oscar Salvador Cc: Pavel Tatashin Cc: Dan Williams Signed-off-by: David Hildenbrand --- mm/memory_hotplug.c | 26 -- 1 file changed, 24 insertions(+), 2 deletions(-) diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 561371ead39a..7210f4375279 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -1447,10 +1447,19 @@ static void node_states_clear_node(int node, struct memory_notify *arg) node_clear_state(node, N_MEMORY); } +static int count_system_ram_pages_cb(unsigned long start_pfn, +unsigned long nr_pages, void *data) +{ + unsigned long *nr_system_ram_pages = data; + + *nr_system_ram_pages += nr_pages; + return 0; +} + static int __ref __offline_pages(unsigned long start_pfn, unsigned long end_pfn) { - unsigned long pfn, nr_pages; + unsigned long pfn, nr_pages = 0; unsigned long offlined_pages = 0; int ret, node, nr_isolate_pageblock; unsigned long flags; @@ -1461,6 +1470,20 @@ static int __ref __offline_pages(unsigned long start_pfn, mem_hotplug_begin(); + /* +* We don't allow to offline memory blocks that contain holes +* and consequently don't allow to online memory blocks that contain +* holes. 
This allows to simplify the code quite a lot and we don't +* have to mess with PG_reserved pages for memory holes. +*/ + walk_system_ram_range(start_pfn, end_pfn - start_pfn, &nr_pages, + count_system_ram_pages_cb); + if (nr_pages != end_pfn - start_pfn) { + ret = -EINVAL; + reason = "memory holes"; + goto failed_removal; + } + /* This makes hotplug much easier...and readable. we assume this for now. .*/ if (!test_pages_in_a_zone(start_pfn, end_pfn, &valid_start, @@ -1472,7 +1495,6 @@ static int __ref __offline_pages(unsigned long start_pfn, zone = page_zone(pfn_to_page(valid_start)); node = zone_to_nid(zone); - nr_pages = end_pfn - start_pfn; /* set above range as isolated */ ret = start_isolate_page_range(start_pfn, end_pfn, -- 2.21.0
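The hole check added to __offline_pages() can be modelled in userspace C: sum the System RAM pages intersecting the block and compare against the full span. The `ram_range` type and all range data here are invented for illustration; the kernel walks the real resource tree via walk_system_ram_range():

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a System RAM resource. */
struct ram_range {
	unsigned long start_pfn;
	unsigned long nr_pages;
};

/* Count the pages of [start_pfn, end_pfn) that are covered by RAM
 * ranges - the role walk_system_ram_range() plays in the patch. */
static unsigned long count_system_ram_pages(const struct ram_range *ranges,
					    int n, unsigned long start_pfn,
					    unsigned long end_pfn)
{
	unsigned long total = 0;

	for (int i = 0; i < n; i++) {
		unsigned long s = ranges[i].start_pfn;
		unsigned long e = s + ranges[i].nr_pages;

		if (s < start_pfn)
			s = start_pfn;
		if (e > end_pfn)
			e = end_pfn;
		if (e > s)
			total += e - s;
	}
	return total;
}

/* The block contains a hole iff the covered pages fall short of the
 * span - the condition on which the patch bails out with -EINVAL. */
static bool block_has_holes(const struct ram_range *ranges, int n,
			    unsigned long start_pfn, unsigned long end_pfn)
{
	return count_system_ram_pages(ranges, n, start_pfn, end_pfn)
		!= end_pfn - start_pfn;
}
```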
[PATCH RFC v1 04/12] KVM: Prepare kvm_is_reserved_pfn() for PG_reserved changes
Right now, ZONE_DEVICE memory is always set PG_reserved. We want to change that. KVM has this weird use case that you can map anything from /dev/mem into the guest. pfn_valid() is not a reliable check whether the memmap was initialized and can be touched. pfn_to_online_page() makes sure that we have an initialized memmap. Note that ZONE_DEVICE memory is never online (IOW, managed by the buddy). Switching to pfn_to_online_page() keeps the existing behavior for PFNs without a memmap and for ZONE_DEVICE memory. They are treated as reserved and the page is not touched (e.g., to set it dirty or accessed). Cc: Paolo Bonzini Cc: "Radim Krčmář" Cc: Michal Hocko Cc: Dan Williams Cc: KarimAllah Ahmed Signed-off-by: David Hildenbrand --- virt/kvm/kvm_main.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 66a977472a1c..b233d4129014 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -151,9 +151,15 @@ __weak int kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm, bool kvm_is_reserved_pfn(kvm_pfn_t pfn) { - if (pfn_valid(pfn)) - return PageReserved(pfn_to_page(pfn)); + struct page *page = pfn_to_online_page(pfn); + /* +* We treat any pages that are not online (not managed by the buddy) +* as reserved - this includes ZONE_DEVICE pages and pages without +* a memmap (e.g., mapped via /dev/mem). +*/ + if (page) + return PageReserved(page); return true; } -- 2.21.0
[PATCH RFC v1 03/12] KVM: x86/mmu: Prepare kvm_is_mmio_pfn() for PG_reserved changes
Right now, ZONE_DEVICE memory is always set PG_reserved. We want to change that. KVM has this weird use case that you can map anything from /dev/mem into the guest. pfn_valid() is not a reliable check whether the memmap was initialized and can be touched. pfn_to_online_page() makes sure that we have an initialized memmap - however, there is no reliable and fast check to detect memmaps that were initialized and are ZONE_DEVICE. Let's rewrite kvm_is_mmio_pfn() so we really only touch initialized memmaps that are guaranteed to not contain garbage. Make sure that RAM without a memmap is still not detected as MMIO and that ZONE_DEVICE that is not UC/UC-/WC is not detected as MMIO. Cc: Paolo Bonzini Cc: "Radim Krčmář" Cc: Sean Christopherson Cc: Vitaly Kuznetsov Cc: Wanpeng Li Cc: Jim Mattson Cc: Joerg Roedel Cc: Thomas Gleixner Cc: Ingo Molnar Cc: Borislav Petkov Cc: "H. Peter Anvin" Cc: KarimAllah Ahmed Cc: Michal Hocko Cc: Dan Williams Signed-off-by: David Hildenbrand --- arch/x86/kvm/mmu.c | 30 ++ 1 file changed, 18 insertions(+), 12 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 24c23c66b226..795869ffd4bb 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -2962,20 +2962,26 @@ static bool mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn, static bool kvm_is_mmio_pfn(kvm_pfn_t pfn) { + struct page *page = pfn_to_online_page(pfn); + + /* +* Online pages consist of pages managed by the buddy. Especially, +* ZONE_DEVICE pages are never online. Online pages that are reserved +* indicate the zero page and MMIO pages. +*/ + if (page) + return !is_zero_pfn(pfn) && PageReserved(pfn_to_page(pfn)); + + /* +* Anything with a valid memmap could be ZONE_DEVICE - or the +* memmap could be uninitialized. Treat only UC/UC-/WC pages as MMIO. 
+*/ if (pfn_valid(pfn)) - return !is_zero_pfn(pfn) && PageReserved(pfn_to_page(pfn)) && - /* -* Some reserved pages, such as those from NVDIMM -* DAX devices, are not for MMIO, and can be mapped -* with cached memory type for better performance. -* However, the above check misconceives those pages -* as MMIO, and results in KVM mapping them with UC -* memory type, which would hurt the performance. -* Therefore, we check the host memory type in addition -* and only treat UC/UC-/WC pages as MMIO. -*/ - (!pat_enabled() || pat_pfn_immune_to_uc_mtrr(pfn)); + return !pat_enabled() || pat_pfn_immune_to_uc_mtrr(pfn); + /* +* Any RAM that has no memmap (e.g., mapped via /dev/mem) is not MMIO. +*/ return !e820__mapped_raw_any(pfn_to_hpa(pfn), pfn_to_hpa(pfn + 1) - 1, E820_TYPE_RAM); -- 2.21.0
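The control flow of the rewritten kvm_is_mmio_pfn() is a three-way decision. Everything below is a userspace model — each field of the invented `pfn_info` struct stands in for the kernel helper named in its comment:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct page. */
struct page {
	bool reserved;            /* PG_reserved */
};

/* Flattened view of one PFN; each field models a kernel helper. */
struct pfn_info {
	struct page *online_page; /* pfn_to_online_page() result */
	bool valid;               /* pfn_valid() */
	bool zero_pfn;            /* is_zero_pfn() */
	bool uc_or_wc;            /* !pat_enabled() || pat_pfn_immune_to_uc_mtrr() */
	bool e820_ram;            /* e820__mapped_raw_any(..., E820_TYPE_RAM) */
};

static bool kvm_is_mmio_pfn_model(const struct pfn_info *p)
{
	/* Online page: reserved pages (except the zero page) are MMIO. */
	if (p->online_page)
		return !p->zero_pfn && p->online_page->reserved;

	/* Valid but not online (could be ZONE_DEVICE): only UC/UC-/WC
	 * memory types are treated as MMIO. */
	if (p->valid)
		return p->uc_or_wc;

	/* No memmap: RAM according to e820 is not MMIO, the rest is. */
	return !p->e820_ram;
}
```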
Re: [PATCH 0/7] towards QE support on ARM
On 10/18/2019 12:52 PM, Rasmus Villemoes wrote: There have been several attempts in the past few years to allow building the QUICC engine drivers for platforms other than PPC. This is (the beginning of) yet another attempt. I hope I can get someone to pick up these relatively trivial patches (I _think_ they shouldn't change functionality at all), and then I'll continue slowly working towards removing the PPC32 dependency for CONFIG_QUICC_ENGINE. Tested on an MPC8309-derived board. Rasmus Villemoes (7): soc: fsl: qe: remove space-before-tab soc: fsl: qe: drop volatile qualifier of struct qe_ic::regs soc: fsl: qe: avoid ppc-specific io accessors soc: fsl: qe: replace spin_event_timeout by readx_poll_timeout_atomic serial: make SERIAL_QE depend on PPC32 serial: ucc_uart.c: explicitly include asm/cpm.h soc/fsl/qe/qe.h: remove include of asm/cpm.h Please copy the entire series to linuxppc-dev list. We are missing 5/7 and 7/7 (see https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=137048) Christophe drivers/soc/fsl/qe/gpio.c | 30 drivers/soc/fsl/qe/qe.c | 44 +++ drivers/soc/fsl/qe/qe_ic.c| 8 ++--- drivers/soc/fsl/qe/qe_ic.h| 2 +- drivers/soc/fsl/qe/qe_io.c| 40 ++--- drivers/soc/fsl/qe/qe_tdm.c | 8 ++--- drivers/soc/fsl/qe/ucc.c | 12 +++ drivers/soc/fsl/qe/ucc_fast.c | 66 ++- drivers/soc/fsl/qe/ucc_slow.c | 38 ++-- drivers/soc/fsl/qe/usb.c | 2 +- drivers/tty/serial/Kconfig| 1 + drivers/tty/serial/ucc_uart.c | 1 + include/soc/fsl/qe/qe.h | 1 - 13 files changed, 126 insertions(+), 127 deletions(-)
Re: [PATCH 3/7] soc: fsl: qe: avoid ppc-specific io accessors
On 10/18/2019 12:52 PM, Rasmus Villemoes wrote: In preparation for allowing to build QE support for architectures other than PPC, replace the ppc-specific io accessors. Done via This patch is not transparent in terms of performance, functions get changed significantly. Before the patch: 0330 : 330: 81 43 00 04 lwz r10,4(r3) 334: 7c 00 04 ac hwsync 338: 81 2a 00 00 lwz r9,0(r10) 33c: 0c 09 00 00 twi 0,r9,0 340: 4c 00 01 2c isync 344: 70 88 00 02 andi. r8,r4,2 348: 41 82 00 10 beq 358 34c: 39 00 00 01 li r8,1 350: 91 03 00 10 stw r8,16(r3) 354: 61 29 00 10 ori r9,r9,16 358: 70 88 00 01 andi. r8,r4,1 35c: 41 82 00 10 beq 36c 360: 39 00 00 01 li r8,1 364: 91 03 00 14 stw r8,20(r3) 368: 61 29 00 20 ori r9,r9,32 36c: 7c 00 04 ac hwsync 370: 91 2a 00 00 stw r9,0(r10) 374: 4e 80 00 20 blr After the patch: 030c : 30c: 94 21 ff e0 stwu r1,-32(r1) 310: 7c 08 02 a6 mflr r0 314: bf a1 00 14 stmw r29,20(r1) 318: 7c 9f 23 78 mr r31,r4 31c: 90 01 00 24 stw r0,36(r1) 320: 7c 7e 1b 78 mr r30,r3 324: 83 a3 00 04 lwz r29,4(r3) 328: 7f a3 eb 78 mr r3,r29 32c: 48 00 00 01 bl 32c 32c: R_PPC_REL24 ioread32be 330: 73 e9 00 02 andi. r9,r31,2 334: 41 82 00 10 beq 344 338: 39 20 00 01 li r9,1 33c: 91 3e 00 10 stw r9,16(r30) 340: 60 63 00 10 ori r3,r3,16 344: 73 e9 00 01 andi. r9,r31,1 348: 41 82 00 10 beq 358 34c: 39 20 00 01 li r9,1 350: 91 3e 00 14 stw r9,20(r30) 354: 60 63 00 20 ori r3,r3,32 358: 80 01 00 24 lwz r0,36(r1) 35c: 7f a4 eb 78 mr r4,r29 360: bb a1 00 14 lmw r29,20(r1) 364: 7c 08 03 a6 mtlr r0 368: 38 21 00 20 addi r1,r1,32 36c: 48 00 00 00 b 36c 36c: R_PPC_REL24 iowrite32be Christophe
Re: [RFC PATCH] powerpc/32: Switch VDSO to C implementation.
Le 22/10/2019 à 11:01, Christophe Leroy a écrit : Le 21/10/2019 à 23:29, Thomas Gleixner a écrit : On Mon, 21 Oct 2019, Christophe Leroy wrote: This is a tentative to switch powerpc/32 vdso to generic C implementation. It will likely not work on 64 bits or even build properly at the moment. powerpc is a bit special for VDSO as well as system calls in the way that it requires setting CR SO bit which cannot be done in C. Therefore, entry/exit and fallback needs to be performed in ASM. To allow that, C fallbacks just return -1 and the ASM entry point performs the system call when the C function returns -1. The performance is rather disappointing. That's most likely because all calculations in the C implementation are based on 64 bits math and converted to 32 bits at the very end. I guess the C implementation should use 32 bits math like the assembly VDSO does as of today. gettimeofday: vdso: 750 nsec/call gettimeofday: vdso: 1533 nsec/call Small improvement (3%) with the proposed change: gettimeofday: vdso: 1485 nsec/call By inlining do_hres() I get the following: gettimeofday: vdso: 1072 nsec/call Christophe Though still some way to go. Christophe The only real 64bit math which can matter is the 64bit * 32bit multiply, i.e. static __always_inline u64 vdso_calc_delta(u64 cycles, u64 last, u64 mask, u32 mult) { return ((cycles - last) & mask) * mult; } Everything else is trivial add/sub/shift, which should be roughly the same in ASM. Can you try to replace that with: static __always_inline u64 vdso_calc_delta(u64 cycles, u64 last, u64 mask, u32 mult) { u64 res, delta = ((cycles - last) & mask); u32 dh, dl; dl = delta; dh = delta >> 32; res = mul_u32_u32(dl, mult); if (dh) res += mul_u32_u32(dh, mult) << 32; return res; } That's pretty much what __do_get_tspec does in ASM. Thanks, tglx
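tglx's suggestion is just the 64x32 multiply split into two 32x32 products; because the upper product is shifted left by 32, the split result agrees with the plain 64-bit multiply modulo 2^64. A quick userspace check (stdint types instead of the kernel's u32/u64, and `mul_u32_u32` modelled as a plain widening multiply):

```c
#include <assert.h>
#include <stdint.h>

/* The kernel's mul_u32_u32() is a 32x32->64 widening multiply. */
static uint64_t mul_u32_u32(uint32_t a, uint32_t b)
{
	return (uint64_t)a * b;
}

/* Reference: the generic vDSO delta calculation. */
static uint64_t vdso_calc_delta(uint64_t cycles, uint64_t last,
				uint64_t mask, uint32_t mult)
{
	return ((cycles - last) & mask) * mult;
}

/* The split version suggested in the thread: low and high halves of
 * the masked delta are multiplied separately; the high product lands
 * shifted by 32, so the sum wraps exactly like the reference. */
static uint64_t vdso_calc_delta_split(uint64_t cycles, uint64_t last,
				      uint64_t mask, uint32_t mult)
{
	uint64_t res, delta = (cycles - last) & mask;
	uint32_t dl = (uint32_t)delta;
	uint32_t dh = (uint32_t)(delta >> 32);

	res = mul_u32_u32(dl, mult);
	if (dh)
		res += mul_u32_u32(dh, mult) << 32;
	return res;
}
```

On a 32-bit target the point is that when `dh` is zero (the common case for small deltas) only one 32x32 multiply is executed.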
[PATCH] powerpc/powernv: Fix CPU idle to be called with IRQs disabled
Commit e78a7614f3876 ("idle: Prevent late-arriving interrupts from disrupting offline") changes arch_cpu_idle_dead to be called with interrupts disabled, which triggers the WARN in pnv_smp_cpu_kill_self.

Fix this by fixing up irq_happened after hard disabling, rather than requiring there are no pending interrupts, similarly to what was done until commit 2525db04d1cc5 ("powerpc/powernv: Simplify lazy IRQ handling in CPU offline").

Fixes: e78a7614f3876 ("idle: Prevent late-arriving interrupts from disrupting offline")
Reported-by: Paul Mackerras
Signed-off-by: Nicholas Piggin
---
 arch/powerpc/platforms/powernv/smp.c | 50 +++-
 1 file changed, 35 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/smp.c b/arch/powerpc/platforms/powernv/smp.c
index fbd6e6b7bbf2..241cfee744d9 100644
--- a/arch/powerpc/platforms/powernv/smp.c
+++ b/arch/powerpc/platforms/powernv/smp.c
@@ -146,6 +146,18 @@ static int pnv_smp_cpu_disable(void)
 	return 0;
 }
 
+static void pnv_flush_interrupts(void)
+{
+	if (cpu_has_feature(CPU_FTR_ARCH_300)) {
+		if (xive_enabled())
+			xive_flush_interrupt();
+		else
+			icp_opal_flush_interrupt();
+	} else {
+		icp_native_flush_interrupt();
+	}
+}
+
 static void pnv_smp_cpu_kill_self(void)
 {
 	unsigned int cpu;
@@ -153,13 +165,6 @@ static void pnv_smp_cpu_kill_self(void)
 	u64 lpcr_val;
 
 	/* Standard hot unplug procedure */
-	/*
-	 * This hard disables local interurpts, ensuring we have no lazy
-	 * irqs pending.
-	 */
-	WARN_ON(irqs_disabled());
-	hard_irq_disable();
-	WARN_ON(lazy_irq_pending());
 
 	idle_task_exit();
 	current->active_mm = NULL; /* for sanity */
@@ -172,6 +177,26 @@ static void pnv_smp_cpu_kill_self(void)
 	if (cpu_has_feature(CPU_FTR_ARCH_207S))
 		wmask = SRR1_WAKEMASK_P8;
 
+	/*
+	 * This turns the irq soft-disabled state we're called with, into a
+	 * hard-disabled state with pending irq_happened interrupts cleared.
+	 *
+	 * PACA_IRQ_DEC   - Decrementer should be ignored.
+	 * PACA_IRQ_HMI   - Can be ignored, processing is done in real mode.
+	 * PACA_IRQ_DBELL, EE, PMI - Unexpected.
+	 */
+	hard_irq_disable();
+	if (generic_check_cpu_restart(cpu))
+		goto out;
+	if (local_paca->irq_happened &
+			(PACA_IRQ_DBELL | PACA_IRQ_EE | PACA_IRQ_PMI)) {
+		if (local_paca->irq_happened & PACA_IRQ_EE)
+			pnv_flush_interrupts();
+		DBG("CPU%d Unexpected exit while offline irq_happened=%lx!\n",
+				cpu, local_paca->irq_happened);
+	}
+	local_paca->irq_happened = PACA_IRQ_HARD_DIS;
+
 	/*
 	 * We don't want to take decrementer interrupts while we are
 	 * offline, so clear LPCR:PECE1. We keep PECE2 (and
@@ -197,6 +222,7 @@ static void pnv_smp_cpu_kill_self(void)
 
 	srr1 = pnv_cpu_offline(cpu);
 
+	WARN_ON(!irqs_disabled());
 	WARN_ON(lazy_irq_pending());
 
 	/*
@@ -212,13 +238,7 @@ static void pnv_smp_cpu_kill_self(void)
 	 */
 	if (((srr1 & wmask) == SRR1_WAKEEE) ||
 	    ((srr1 & wmask) == SRR1_WAKEHVI)) {
-		if (cpu_has_feature(CPU_FTR_ARCH_300)) {
-			if (xive_enabled())
-				xive_flush_interrupt();
-			else
-				icp_opal_flush_interrupt();
-		} else
-			icp_native_flush_interrupt();
+		pnv_flush_interrupts();
 	} else if ((srr1 & wmask) == SRR1_WAKEHDBELL) {
 		unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER);
 		asm volatile(PPC_MSGCLR(%0) : : "r" (msg));
@@ -266,7 +286,7 @@ static void pnv_smp_cpu_kill_self(void)
 	 */
 	lpcr_val = mfspr(SPRN_LPCR) | (u64)LPCR_PECE1;
 	pnv_program_cpu_hotplug_lpcr(cpu, lpcr_val);
-
+out:
 	DBG("CPU%d coming online...\n", cpu);
 }
-- 
2.23.0
[PATCH] powerpc/powernv/prd: Allow copying partial data to user space
Allow copying partial data to user space so that the opal-prd daemon can read the message size, reallocate memory and make another read call to get the rest of the data.

Cc: Jeremy Kerr
Cc: Vaidyanathan Srinivasan
Signed-off-by: Vasant Hegde
---
 arch/powerpc/platforms/powernv/opal-prd.c | 27 ---
 1 file changed, 9 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/opal-prd.c b/arch/powerpc/platforms/powernv/opal-prd.c
index 45f4223a790f..dac9d18293d8 100644
--- a/arch/powerpc/platforms/powernv/opal-prd.c
+++ b/arch/powerpc/platforms/powernv/opal-prd.c
@@ -153,20 +153,15 @@ static __poll_t opal_prd_poll(struct file *file,
 static ssize_t opal_prd_read(struct file *file, char __user *buf,
 		size_t count, loff_t *ppos)
 {
-	struct opal_prd_msg_queue_item *item;
+	struct opal_prd_msg_queue_item *item = NULL;
 	unsigned long flags;
-	ssize_t size, err;
+	ssize_t size;
 	int rc;
 
 	/* we need at least a header's worth of data */
 	if (count < sizeof(item->msg))
 		return -EINVAL;
 
-	if (*ppos)
-		return -ESPIPE;
-
-	item = NULL;
-
 	for (;;) {
 		spin_lock_irqsave(&opal_prd_msg_queue_lock, flags);
@@ -190,27 +185,23 @@ static ssize_t opal_prd_read(struct file *file, char __user *buf,
 	}
 
 	size = be16_to_cpu(item->msg.size);
-	if (size > count) {
-		err = -EINVAL;
+	rc = simple_read_from_buffer(buf, count, ppos, &item->msg, size);
+	if (rc < 0)
 		goto err_requeue;
-	}
-
-	rc = copy_to_user(buf, &item->msg, size);
-	if (rc) {
-		err = -EFAULT;
+	if (*ppos < size)
 		goto err_requeue;
-	}
+	/* Reset position */
+	*ppos = 0;
 
 	kfree(item);
-
-	return size;
+	return rc;
 
 err_requeue:
 	/* eep! re-queue at the head of the list */
 	spin_lock_irqsave(&opal_prd_msg_queue_lock, flags);
 	list_add(&item->list, &opal_prd_msg_queue);
 	spin_unlock_irqrestore(&opal_prd_msg_queue_lock, flags);
 
-	return err;
+	return rc;
 }
 
 static ssize_t opal_prd_write(struct file *file, const char __user *buf,
-- 
2.21.0
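The daemon-side pattern the commit message describes (read the header, learn the size, reallocate, read the rest) can be sketched in userspace. Below, read_from_buffer() is a stand-in for the kernel's simple_read_from_buffer(), and the 4-byte msg_header layout is an assumption for illustration, not the real struct opal_prd_msg:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative header: not the real opal_prd message layout. */
struct msg_header {
	uint16_t type;
	uint16_t size;	/* total message size, header included */
};

/* Model of simple_read_from_buffer(): copy up to count bytes starting
 * at offset *ppos, and advance *ppos by the amount copied. */
static long read_from_buffer(void *to, size_t count, long *ppos,
			     const void *from, size_t available)
{
	size_t pos = *ppos, n;

	if (pos >= available)
		return 0;
	n = available - pos;
	if (n > count)
		n = count;
	memcpy(to, (const char *)from + pos, n);
	*ppos = pos + n;
	return (long)n;
}

/* The two-phase read: header first, then size the buffer and fetch the
 * remainder.  Returns a malloc'd copy of the whole message, or NULL. */
static char *read_whole_message(const void *kmsg, size_t ksize)
{
	struct msg_header hdr;
	long ppos = 0, n;
	char *buf;

	n = read_from_buffer(&hdr, sizeof(hdr), &ppos, kmsg, ksize);
	if (n != sizeof(hdr) || hdr.size < sizeof(hdr))
		return NULL;
	buf = malloc(hdr.size);
	memcpy(buf, &hdr, sizeof(hdr));
	n = read_from_buffer(buf + sizeof(hdr), hdr.size - sizeof(hdr),
			     &ppos, kmsg, ksize);
	if (n != (long)(hdr.size - sizeof(hdr))) {
		free(buf);
		return NULL;
	}
	return buf;
}
```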
Re: [PATCH V7] mm/debug: Add tests validating architecture page table helpers
On 10/22/2019 12:41 PM, Christophe Leroy wrote:
>
>
> On 10/21/2019 02:42 AM, Anshuman Khandual wrote:
>> This adds tests which will validate architecture page table helpers and
>> other accessors in their compliance with expected generic MM semantics.
>> This will help various architectures in validating changes to existing
>> page table helpers or addition of new ones.
>>
>> This test covers basic page table entry transformations including but not
>> limited to old, young, dirty, clean, write, write protect etc at various
>> levels along with populating intermediate entries with next page table page
>> and validating them.
>>
>> Test page table pages are allocated from system memory with required size
>> and alignments. The mapped pfns at page table levels are derived from a
>> real pfn representing a valid kernel text symbol. This test gets called
>> right after page_alloc_init_late().
>>
>> This gets built and run when CONFIG_DEBUG_VM_PGTABLE is selected along with
>> CONFIG_VM_DEBUG. Architectures willing to subscribe to this test also need to
>> select CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE which for now is limited to x86 and
>> arm64. Going forward, other architectures too can enable this after fixing
>> build or runtime problems (if any) with their page table helpers.
>>
>> Folks interested in making sure that a given platform's page table helpers
>> conform to expected generic MM semantics should enable the above config
>> which will just trigger this test during boot. Any non-conformity here will
>> be reported as a warning which would need to be fixed. This test will help
>> catch any changes to the agreed upon semantics expected from generic MM and
>> enable platforms to accommodate it thereafter.
>>
>> Cc: Andrew Morton
>> Cc: Vlastimil Babka
>> Cc: Greg Kroah-Hartman
>> Cc: Thomas Gleixner
>> Cc: Mike Rapoport
>> Cc: Jason Gunthorpe
>> Cc: Dan Williams
>> Cc: Peter Zijlstra
>> Cc: Michal Hocko
>> Cc: Mark Rutland
>> Cc: Mark Brown
>> Cc: Steven Price
>> Cc: Ard Biesheuvel
>> Cc: Masahiro Yamada
>> Cc: Kees Cook
>> Cc: Tetsuo Handa
>> Cc: Matthew Wilcox
>> Cc: Sri Krishna chowdary
>> Cc: Dave Hansen
>> Cc: Russell King - ARM Linux
>> Cc: Michael Ellerman
>> Cc: Paul Mackerras
>> Cc: Martin Schwidefsky
>> Cc: Heiko Carstens
>> Cc: "David S. Miller"
>> Cc: Vineet Gupta
>> Cc: James Hogan
>> Cc: Paul Burton
>> Cc: Ralf Baechle
>> Cc: Kirill A. Shutemov
>> Cc: Gerald Schaefer
>> Cc: Christophe Leroy
>> Cc: Ingo Molnar
>> Cc: linux-snps-...@lists.infradead.org
>> Cc: linux-m...@vger.kernel.org
>> Cc: linux-arm-ker...@lists.infradead.org
>> Cc: linux-i...@vger.kernel.org
>> Cc: linuxppc-dev@lists.ozlabs.org
>> Cc: linux-s...@vger.kernel.org
>> Cc: linux...@vger.kernel.org
>> Cc: sparcli...@vger.kernel.org
>> Cc: x...@kernel.org
>> Cc: linux-ker...@vger.kernel.org
>>
>> Tested-by: Christophe Leroy #PPC32
>> Suggested-by: Catalin Marinas
>> Signed-off-by: Andrew Morton
>> Signed-off-by: Christophe Leroy
>> Signed-off-by: Anshuman Khandual
>> ---
>
> The cover letter has the exact same title as this patch. I think a cover
> letter is not necessary for a singleton series.

Right, but it became a singleton series in this version :)

>
> The history (and any other information you don't want to include in the
> commit message) can be added here, below the '---'. That way it is in the
> mail but won't be included in the commit.

I was aware of that, but the change log here was big, hence I chose to keep it separately in a cover letter. As you said, the cover letter is probably not required anymore. Will add it here in the patch next time around.
>
>>   .../debug/debug-vm-pgtable/arch-support.txt |  34 ++
>>   arch/arm64/Kconfig                          |   1 +
>>   arch/x86/Kconfig                            |   1 +
>>   arch/x86/include/asm/pgtable_64.h           |   6 +
>>   include/asm-generic/pgtable.h               |   6 +
>>   init/main.c                                 |   1 +
>>   lib/Kconfig.debug                           |  21 ++
>>   mm/Makefile                                 |   1 +
>>   mm/debug_vm_pgtable.c                       | 388 +
>>   9 files changed, 459 insertions(+)
>>   create mode 100644 Documentation/features/debug/debug-vm-pgtable/arch-support.txt
>>   create mode 100644 mm/debug_vm_pgtable.c
>>
>> diff --git a/Documentation/features/debug/debug-vm-pgtable/arch-support.txt b/Documentation/features/debug/debug-vm-pgtable/arch-support.txt
>> new file mode 100644
>> index 000..d6b8185
>> --- /dev/null
>> +++ b/Documentation/features/debug/debug-vm-pgtable/arch-support.txt
>> @@ -0,0 +1,34 @@
>> +#
>> +# Feature name:          debug-vm-pgtable
>> +#         Kconfig:       ARCH_HAS_DEBUG_VM_PGTABLE
>> +#         description:   arch supports pgtable tests for semantics compliance
>> +#
>> +    ---
[PATCH v2 1/1] pseries/hotplug-cpu: Change default behaviour of cede_offline to "off"
From: "Gautham R. Shenoy"

Currently on PSeries Linux guests, the offlined CPU can be put to one of the following two states:
- Long term processor cede (also called extended cede)
- Returned to the hypervisor via the RTAS "stop-self" call.

This is controlled by the kernel boot parameter "cede_offline=on/off". By default the offlined CPUs enter extended cede. The PHYP hypervisor considers CPUs in extended cede to be "active" since they are still under the control of the Linux guests. Hence, when we change the SMT modes by offlining the secondary CPUs, the PURR and the RWMR SPRs will continue to count the values for offlined CPUs in extended cede as if they are online. This breaks the accounting in tools such as lparstat.

To fix this, ensure that by default the offlined CPUs are returned to the hypervisor via the RTAS "stop-self" call, by changing the default value of "cede_offline_enabled" to false.

Fixes: commit 3aa565f53c39 ("powerpc/pseries: Add hooks to put the CPU into an appropriate offline state")
Signed-off-by: Gautham R. Shenoy
---
 Documentation/core-api/cpu_hotplug.rst       |  2 +-
 arch/powerpc/platforms/pseries/hotplug-cpu.c | 12 +++-
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/Documentation/core-api/cpu_hotplug.rst b/Documentation/core-api/cpu_hotplug.rst
index 4a50ab7..5319593 100644
--- a/Documentation/core-api/cpu_hotplug.rst
+++ b/Documentation/core-api/cpu_hotplug.rst
@@ -53,7 +53,7 @@ Command Line Switches
 ``cede_offline={"off","on"}``
   Use this option to disable/enable putting offlined processors to an extended
   ``H_CEDE`` state on supported pseries platforms. If nothing is specified,
-  ``cede_offline`` is set to "on".
+  ``cede_offline`` is set to "off".
 
   This option is limited to the PowerPC architecture.

diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index bbda646..f9d0366 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -46,7 +46,17 @@ static DEFINE_PER_CPU(enum cpu_state_vals, preferred_offline_state) =
 
 static enum cpu_state_vals default_offline_state = CPU_STATE_OFFLINE;
 
-static bool cede_offline_enabled __read_mostly = true;
+/*
+ * Determines whether the offlined CPUs should be put to a long term
+ * processor cede (called extended cede) for power-saving
+ * purposes. The CPUs in extended cede are still with the Linux Guest
+ * and are not returned to the Hypervisor.
+ *
+ * By default, the offlined CPUs are returned to the hypervisor via
+ * RTAS "stop-self". This behaviour can be changed by passing the
+ * kernel commandline parameter "cede_offline=on".
+ */
+static bool cede_offline_enabled __read_mostly;
 
 /*
  * Enable/disable cede_offline when available.
-- 
1.9.4
[PATCH v2 0/1] pseries/hotplug: Change the default behaviour of cede_offline
From: "Gautham R. Shenoy"

This is v2 of the fix to change the default behaviour of cede_offline. The previous version can be found here: https://lkml.org/lkml/2019/9/12/222

The main change from v1 is that patch 2, which created a sysfs file to report and control the value of cede_offline_enabled, has been dropped.

Problem description:

Currently on PSeries Linux guests, the offlined CPU can be put to one of the following two states:
- Long term processor cede (also called extended cede)
- Returned to the hypervisor via the RTAS "stop-self" call.

This is controlled by the kernel boot parameter "cede_offline=on/off". By default the offlined CPUs enter extended cede. The PHYP hypervisor considers CPUs in extended cede to be "active" since the CPUs are still under the control of the Linux guests. Hence, when we change the SMT modes by offlining the secondary CPUs, the PURR and the RWMR SPRs will continue to count the values for offlined CPUs in extended cede as if they are online.

One of the expectations with PURR is that, for an interval of time, the sum of the PURR increments across the online CPUs of a core should equal the number of timebase ticks for that interval. This is currently not the case.
In the following data (generated using https://github.com/gautshen/misc/blob/master/purr_tb.py):

SD-PURR = Sum of PURR increments on online CPUs of that core in 1 second

SMT=off
===============================================
Core                  SD-PURR       SD-PURR
                      (expected)    (observed)
===============================================
core00 [ 0]           512000000     69883784
core01 [ 8]           512000000     88782536
core02 [ 16]          512000000     94296824
core03 [ 24]          512000000     80951968

SMT=2
===============================================
Core                  SD-PURR       SD-PURR
                      (expected)    (observed)
===============================================
core00 [ 0,1]         512000000     136147792
core01 [ 8,9]         512000000     128636784
core02 [ 16,17]       512000000     135426488
core03 [ 24,25]       512000000     153027520

SMT=4
===============================================
Core                  SD-PURR       SD-PURR
                      (expected)    (observed)
===============================================
core00 [ 0,1,2,3]     512000000     258331616
core01 [ 8,9,10,11]   512000000     274220072
core02 [ 16,17,18,19] 512000000     260013736
core03 [ 24,25,26,27] 512000000     260079672

SMT=on
===========================================================
Core                             SD-PURR       SD-PURR
                                 (expected)    (observed)
===========================================================
core00 [ 0,1,2,3,4,5,6,7]        512000000     512941248
core01 [ 8,9,10,11,12,13,14,15]  512000000     512936544
core02 [ 16,17,18,19,20,21,22,23] 512000000    512931544
core03 [ 24,25,26,27,28,29,30,31] 512000000    512923800

This patchset addresses this issue by ensuring that by default, the offlined CPUs are returned to the Hypervisor via the RTAS "stop-self" call, by changing the default value of "cede_offline_enabled" to false.

With the patches, we see that the observed value of the sum of the PURR increments across the online threads of a core in 1 second matches the number of tb-ticks in 1 second.

SMT=off
===============================================
Core                  SD-PURR       SD-PURR
                      (expected)    (observed)
===============================================
core00 [ 0]           512000000     512527568
core01 [ 8]           512000000     512556128
core02 [ 16]          512000000     512590016
core03 [ 24]          512000000     512589440

SMT=2
===============================================
Core                  SD-PURR       SD-PURR
                      (expected)    (observed)
===============================================
core00 [ 0,1]         512000000     512635328
core01 [ 8,9]         512000000     512610416
core02 [ 16,17]       512000000     512639360
core03 [ 24,25]       512000000     512638720

SMT=4
===============================================
Core                  SD-PURR       SD-PURR
                      (expected)    (observed)
===============================================
core00 [ 0,1,2,3]     512000000     512757328
core01 [ 8,9,10,11]   512000000     512727920
core02 [ 16,17,18,19] 512000000     512754712
core03 [ 24,25,26,27] 512000000     512739040

SMT=on
===============================================
Core                  SD-PURR       SD-PURR
                      (expected)    (observed)
===============================================
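The invariant these tables test can be captured in a few lines: over one second, the per-core sum of PURR increments across online threads should equal the elapsed timebase ticks (512000000/s here, matching the post-fix observations; the 512 MHz timebase frequency is inferred from the data, not stated in the mail). A sketch with a tolerance check:

```c
#include <assert.h>
#include <stdint.h>

/* Returns nonzero when the observed per-core SD-PURR is within
 * tolerance_pct percent of the expected timebase tick count for the
 * same interval, i.e. when PURR accounting behaves as the cover letter
 * says it should. */
static int purr_accounting_ok(uint64_t sd_purr_observed,
			      uint64_t tb_ticks_expected,
			      double tolerance_pct)
{
	uint64_t diff = sd_purr_observed > tb_ticks_expected ?
			sd_purr_observed - tb_ticks_expected :
			tb_ticks_expected - sd_purr_observed;

	return diff <= tb_ticks_expected * tolerance_pct / 100.0;
}
```

Applied to the numbers above, the post-fix rows pass a 1% check and the pre-fix rows fail it, which is exactly the lparstat breakage being fixed.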
Re: [PATCH 0/7] towards QE support on ARM
On 22/10/2019 04.24, Qiang Zhao wrote:
> On Mon, Oct 22, 2019 at 6:11 AM Leo Li wrote
>> Right. I'm really interested in getting this applied to my tree and
>> make it upstream. Zhao Qiang, can you help to review Rasmus's patches
>> and comment?
>
> As you know, I maintained a similar patchset removing PPC, and someone
> told me qe_ic should be moved into drivers/irqchip/.
> I also thought qe_ic is an interrupt controller driver, and should be
> moved into drivers/irqchip/.

Yes, and I also plan to do that at some point. However, that's orthogonal to making the driver build on ARM, so I don't want to mix the two. Making it usable on ARM is my/our priority currently. I'd appreciate your input on my patches.

Rasmus
Re: [PATCH 0/7] towards QE support on ARM
On 22/10/2019 00.11, Li Yang wrote: > On Mon, Oct 21, 2019 at 3:46 AM Rasmus Villemoes > wrote: >> >>> Can you try the 4.14 branch from a newer LSDK release? LS1021a should >>> be supported platform on LSDK. If it is broken, something is wrong. >> >> What newer release? LSDK-18.06-V4.14 is the latest -V4.14 tag at >> https://github.com/qoriq-open-source/linux.git, and identical to the > > That tree has been abandoned for a while, we probably should state > that in the github. The latest tree can be found at > https://source.codeaurora.org/external/qoriq/qoriq-components/linux/ Ah. FYI, googling "LSDK" gives https://lsdk.github.io as one of the first hits, and (apart from itself being a github url) that says on the front page "Disaggregated components of LSDK are available in github.". But yes, navigating to the Components tab and from there to lsdk linux one does get directed at codeaurora. >> In any case, we have zero interest in running an NXP kernel. Maybe I >> should clarify what I meant by "based on commits from" above: We're >> currently running a mainline 4.14 kernel on LS1021A, with a few patches >> inspired from the NXP 4.1 branch applied on top - but also with some >> manual fixes for e.g. the pvr_version_is() issue. Now we want to move >> that to a 4.19-based kernel (so that it aligns with our MPC8309 platform). > > We also provide 4.19 based kernel in the codeaurora repo. I think it > will be helpful to reuse patches there if you want to make your own > tree. Again, we don't want to run off an NXP kernel, we want to get the necessary pieces upstream. For now, we have to live with a patched 4.19 kernel, but hopefully by the time we switch to 5.x (for some x >= 5) we don't need to supply anything other than our own .dts and defconfig. >> Yes, as I said, I wanted to try a fresh approach since Zhao >> Qiang's patches seemed to be getting nowhere. 
Splitting the patches into >> smaller pieces is definitely part of that - for example, the completely >> trivial whitespace fix in patch 1 is to make sure the later coccinelle >> generated patch is precisely that (i.e., a later respin can just rerun >> the coccinelle script, with zero manual fixups). I also want to avoid >> mixing the ppcism cleanups with other things (e.g. replacing some >> of_get_property() by of_property_read_u32()). And the "testing on ARM" >> part comes once I get to actually building on ARM. But there's not much >> point doing all that unless there's some indication that this can be >> applied to some tree that actually feeds into Linus', which is why I >> started with a few trivial patches and precisely to start this discussion. > > Right. I'm really interested in getting this applied to my tree and > make it upstream. Zhao Qiang, can you help to review Rasmus's patches > and comment? Thanks, this is exactly what I was hoping for. Even just getting these first rather trivial patches (in that they don't attempt to build for ARM, or change functionality at all for PPC) merged for 5.5 would reduce the amount of out-of-tree patches that we (and NXP for that matter) would have to carry. I'll take the above as a go-ahead for me to try to post more patches working towards enabling some of the QE drivers for ARM. Rasmus
Re: [PATCH 1/3] PM: wakeup: Add routine to help fetch wakeup source object.
On Tue, Oct 22, 2019 at 9:51 AM Ran Wang wrote: > > Some user might want to go through all registered wakeup sources > and doing things accordingly. For example, SoC PM driver might need to > do HW programming to prevent powering down specific IP which wakeup > source depending on. So add this API to help walk through all registered > wakeup source objects on that list and return them one by one. > > Signed-off-by: Ran Wang > Tested-by: Leonard Crestez > --- > Change in v8 > - Rename wakeup_source_get_next() to wakeup_sources_walk_next(). > - Add wakeup_sources_read_lock() to take over locking job of > wakeup_source_get_star(). > - Rename wakeup_source_get_start() to wakeup_sources_walk_start(). > - Replace wakeup_source_get_stop() with wakeup_sources_read_unlock(). > - Define macro for_each_wakeup_source(ws). > > Change in v7: > - Remove define of member *dev in wake_irq to fix conflict with commit > c8377adfa781 ("PM / wakeup: Show wakeup sources stats in sysfs"), user > will use ws->dev->parent instead. > - Remove '#include ' because it is not used. > > Change in v6: > - Add wakeup_source_get_star() and wakeup_source_get_stop() to aligned > with wakeup_sources_stats_seq_start/nex/stop. > > Change in v5: > - Update commit message, add decription of walk through all wakeup > source objects. > - Add SCU protection in function wakeup_source_get_next(). > - Rename wakeup_source member 'attached_dev' to 'dev' and move it up > (before wakeirq). > > Change in v4: > - None. > > Change in v3: > - Adjust indentation of *attached_dev;. > > Change in v2: > - None. 
>
>  drivers/base/power/wakeup.c | 42 ++
>  include/linux/pm_wakeup.h   |  9 +
>  2 files changed, 51 insertions(+)
>
> diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c
> index 5817b51..8c7a5f9 100644
> --- a/drivers/base/power/wakeup.c
> +++ b/drivers/base/power/wakeup.c
> @@ -248,6 +248,48 @@ void wakeup_source_unregister(struct wakeup_source *ws)
>  EXPORT_SYMBOL_GPL(wakeup_source_unregister);
>
>  /**
> + * wakeup_sources_read_lock - Lock wakeup source list for read.

Please document the return value.

> + */
> +int wakeup_sources_read_lock(void)
> +{
> +	return srcu_read_lock(&wakeup_srcu);
> +}
> +EXPORT_SYMBOL_GPL(wakeup_sources_read_lock);
> +
> +/**
> + * wakeup_sources_read_unlock - Unlock wakeup source list.

Please document the argument.

> + */
> +void wakeup_sources_read_unlock(int idx)
> +{
> +	srcu_read_unlock(&wakeup_srcu, idx);
> +}
> +EXPORT_SYMBOL_GPL(wakeup_sources_read_unlock);
> +
> +/**
> + * wakeup_sources_walk_start - Begin a walk on wakeup source list

Please document the return value and add a note that the wakeup sources list needs to be locked for reading for this to be safe.

> + */
> +struct wakeup_source *wakeup_sources_walk_start(void)
> +{
> +	struct list_head *ws_head = &wakeup_sources;
> +
> +	return list_entry_rcu(ws_head->next, struct wakeup_source, entry);
> +}
> +EXPORT_SYMBOL_GPL(wakeup_sources_walk_start);
> +
> +/**
> + * wakeup_sources_walk_next - Get next wakeup source from the list
> + * @ws: Previous wakeup source object

Please add a note that the wakeup sources list needs to be locked for reading for this to be safe.

> + */
> +struct wakeup_source *wakeup_sources_walk_next(struct wakeup_source *ws)
> +{
> +	struct list_head *ws_head = &wakeup_sources;
> +
> +	return list_next_or_null_rcu(ws_head, &ws->entry,
> +				struct wakeup_source, entry);
> +}
> +EXPORT_SYMBOL_GPL(wakeup_sources_walk_next);
> +
> +/**
>  * device_wakeup_attach - Attach a wakeup source object to a device object.
>  * @dev: Device to handle.
>  * @ws: Wakeup source object to attach to @dev.
>
> diff --git a/include/linux/pm_wakeup.h b/include/linux/pm_wakeup.h
> index 661efa0..aa3da66 100644
> --- a/include/linux/pm_wakeup.h
> +++ b/include/linux/pm_wakeup.h
> @@ -63,6 +63,11 @@ struct wakeup_source {
> 	bool			autosleep_enabled:1;
>  };
>
> +#define for_each_wakeup_source(ws) \
> +	for ((ws) = wakeup_sources_walk_start();	\
> +	     (ws);					\
> +	     (ws) = wakeup_sources_walk_next((ws)))
> +
>  #ifdef CONFIG_PM_SLEEP
>
>  /*
> @@ -92,6 +97,10 @@ extern void wakeup_source_remove(struct wakeup_source *ws);
>  extern struct wakeup_source *wakeup_source_register(struct device *dev,
> 						    const char *name);
>  extern void wakeup_source_unregister(struct wakeup_source *ws);
> +extern int wakeup_sources_read_lock(void);
> +extern void wakeup_sources_read_unlock(int idx);
> +extern struct wakeup_source *wakeup_sources_walk_start(void);
> +extern struct wakeup_source *wakeup_sources_walk_next(struct wakeup_source *ws);
>  extern int
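A userspace analogue of the walk API may help illustrate the termination condition: walk_start() returns the entry after the list head, and walk_next() returns NULL once the (circular) list wraps back to the head. SRCU locking is omitted here, and as with the kernel loop, a non-empty list is assumed; all names are illustrative:

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular intrusive list, mirroring the kernel's list_head. */
struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct wakeup_source { int id; struct list_head entry; };

static struct list_head sources = { &sources, &sources };

static void list_add_tail_(struct list_head *new, struct list_head *head)
{
	new->prev = head->prev;
	new->next = head;
	head->prev->next = new;
	head->prev = new;
}

/* First entry after the head (assumes the list is non-empty). */
static struct wakeup_source *walk_start(void)
{
	return container_of(sources.next, struct wakeup_source, entry);
}

/* Next entry, or NULL when the iteration wraps back to the head:
 * this is the list_next_or_null behaviour used by the patch. */
static struct wakeup_source *walk_next(struct wakeup_source *ws)
{
	if (ws->entry.next == &sources)
		return NULL;
	return container_of(ws->entry.next, struct wakeup_source, entry);
}

#define for_each_ws(ws) \
	for ((ws) = walk_start(); (ws); (ws) = walk_next(ws))

/* Build a two-element list and sum the ids while walking it. */
static int demo_walk_sum(void)
{
	static struct wakeup_source a = { .id = 1 }, b = { .id = 2 };
	struct wakeup_source *ws;
	int sum = 0;

	list_add_tail_(&a.entry, &sources);
	list_add_tail_(&b.entry, &sources);
	for_each_ws(ws)
		sum += ws->id;
	return sum;
}
```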
Re: [PATCH 3/3] soc: fsl: add RCPM driver
On Tue, Oct 22, 2019 at 9:52 AM Ran Wang wrote: > > The NXP's QorIQ Processors based on ARM Core have RCPM module > (Run Control and Power Management), which performs system level > tasks associated with power management such as wakeup source control. > > This driver depends on PM wakeup source framework which help to > collect wake information. > > Signed-off-by: Ran Wang > --- > Change in v8: > - Adjust related API usage to meet wakeup.c's update in patch 1/3. > - Add sanity checking for the case of ws->dev or ws->dev->parent > is null. > > Change in v7: > - Replace 'ws->dev' with 'ws->dev->parent' to get aligned with > c8377adfa781 ("PM / wakeup: Show wakeup sources stats in sysfs") > - Remove '+obj-y += ftm_alarm.o' since it is wrong. > - Cosmetic work. > > Change in v6: > - Adjust related API usage to meet wakeup.c's update in patch 1/3. > > Change in v5: > - Fix v4 regression of the return value of wakeup_source_get_next() > didn't pass to ws in while loop. > - Rename wakeup_source member 'attached_dev' to 'dev'. > - Rename property 'fsl,#rcpm-wakeup-cells' to > '#fsl,rcpm-wakeup-cells'. > please see https://lore.kernel.org/patchwork/patch/1101022/ > > Change in v4: > - Remove extra ',' in author line of rcpm.c > - Update usage of wakeup_source_get_next() to be less confusing to the > reader, code logic remain the same. > > Change in v3: > - Some whitespace ajdustment. > > Change in v2: > - Rebase Kconfig and Makefile update to latest mainline. > > drivers/soc/fsl/Kconfig | 8 +++ > drivers/soc/fsl/Makefile | 1 + > drivers/soc/fsl/rcpm.c | 133 > +++ > 3 files changed, 142 insertions(+) > create mode 100644 drivers/soc/fsl/rcpm.c > > diff --git a/drivers/soc/fsl/Kconfig b/drivers/soc/fsl/Kconfig > index f9ad8ad..4918856 100644 > --- a/drivers/soc/fsl/Kconfig > +++ b/drivers/soc/fsl/Kconfig > @@ -40,4 +40,12 @@ config DPAA2_CONSOLE > /dev/dpaa2_mc_console and /dev/dpaa2_aiop_console, > which can be used to dump the Management Complex and AIOP > firmware logs. 
> +
> +config FSL_RCPM
> +	bool "Freescale RCPM support"
> +	depends on PM_SLEEP
> +	help
> +	  The NXP QorIQ Processors based on ARM Core have RCPM module
> +	  (Run Control and Power Management), which performs all device-level
> +	  tasks associated with power management, such as wakeup source control.
>  endmenu
> diff --git a/drivers/soc/fsl/Makefile b/drivers/soc/fsl/Makefile
> index 71dee8d..906f1cd 100644
> --- a/drivers/soc/fsl/Makefile
> +++ b/drivers/soc/fsl/Makefile
> @@ -6,6 +6,7 @@
>  obj-$(CONFIG_FSL_DPAA) += qbman/
>  obj-$(CONFIG_QUICC_ENGINE) += qe/
>  obj-$(CONFIG_CPM) += qe/
> +obj-$(CONFIG_FSL_RCPM) += rcpm.o
>  obj-$(CONFIG_FSL_GUTS) += guts.o
>  obj-$(CONFIG_FSL_MC_DPIO) += dpio/
>  obj-$(CONFIG_DPAA2_CONSOLE) += dpaa2-console.o
> diff --git a/drivers/soc/fsl/rcpm.c b/drivers/soc/fsl/rcpm.c
> new file mode 100644
> index 000..3ed135e
> --- /dev/null
> +++ b/drivers/soc/fsl/rcpm.c
> @@ -0,0 +1,133 @@
> +// SPDX-License-Identifier: GPL-2.0
> +//
> +// rcpm.c - Freescale QorIQ RCPM driver
> +//
> +// Copyright 2019 NXP
> +//
> +// Author: Ran Wang
> +
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +
> +#define RCPM_WAKEUP_CELL_MAX_SIZE	7
> +
> +struct rcpm {
> +	unsigned int	wakeup_cells;
> +	void __iomem	*ippdexpcr_base;
> +	bool		little_endian;
> +};
> +

Please add a kerneldoc comment describing this routine.

> +static int rcpm_pm_prepare(struct device *dev)
> +{
> +	int i, ret, idx;
> +	void __iomem *base;
> +	struct wakeup_source	*ws;
> +	struct rcpm		*rcpm;
> +	struct device_node	*np = dev->of_node;
> +	u32 value[RCPM_WAKEUP_CELL_MAX_SIZE + 1], tmp;
> +
> +	rcpm = dev_get_drvdata(dev);
> +	if (!rcpm)
> +		return -EINVAL;
> +
> +	base = rcpm->ippdexpcr_base;
> +	idx = wakeup_sources_read_lock();
> +
> +	/* Begin with first registered wakeup source */
> +	for_each_wakeup_source(ws) {
> +
> +		/* skip object which is not attached to device */
> +		if (!ws->dev || !ws->dev->parent)
> +			continue;
> +
> +		ret = device_property_read_u32_array(ws->dev->parent,
> +				"fsl,rcpm-wakeup", value,
> +				rcpm->wakeup_cells + 1);
> +
> +		/* Wakeup source should refer to current rcpm device */
> +		if (ret || (np->phandle != value[0])) {
> +			dev_info(dev, "%s doesn't refer to this rcpm\n",
> +
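The core of rcpm_pm_prepare() is the merge step: for every wakeup source whose device carries a "fsl,rcpm-wakeup" property referring to this RCPM's phandle, OR the per-register masks into the IPPDEXPCR image. A standalone sketch of just that step, with the register image modelled as a plain array and all names illustrative rather than taken from the driver:

```c
#include <assert.h>
#include <stdint.h>

#define RCPM_WAKEUP_CELL_MAX_SIZE 7

/* prop holds the cells of one device's "fsl,rcpm-wakeup" property:
 * prop[0] is the phandle of the RCPM node it refers to, followed by
 * wakeup_cells mask words, one per IPPDEXPCR register.  Masks from
 * devices referring to another RCPM instance are skipped, exactly the
 * "doesn't refer to this rcpm" case in the loop above. */
static void rcpm_merge_wakeup(uint32_t *ippdexpcr, unsigned int wakeup_cells,
			      uint32_t this_rcpm_phandle,
			      const uint32_t *prop)
{
	unsigned int i;

	if (wakeup_cells > RCPM_WAKEUP_CELL_MAX_SIZE)
		return;
	if (prop[0] != this_rcpm_phandle)
		return;
	for (i = 0; i < wakeup_cells; i++)
		ippdexpcr[i] |= prop[i + 1];
}
```

The real driver additionally handles endianness of the memory-mapped registers and reads the property via device_property_read_u32_array(); only the OR-accumulation is shown here.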
Re: [RFC PATCH] powerpc/32: Switch VDSO to C implementation.
Le 21/10/2019 à 23:29, Thomas Gleixner a écrit :
> On Mon, 21 Oct 2019, Christophe Leroy wrote:
>> This is a tentative to switch powerpc/32 vdso to generic C
>> implementation. It will likely not work on 64 bits or even build
>> properly at the moment.
>>
>> powerpc is a bit special for VDSO as well as system calls in the way
>> that it requires setting the CR SO bit, which cannot be done in C.
>> Therefore, entry/exit and fallback need to be performed in ASM.
>>
>> To allow that, C fallbacks just return -1 and the ASM entry point
>> performs the system call when the C function returns -1.
>>
>> The performance is rather disappointing. That's most likely because
>> all calculations in the C implementation are based on 64 bits math
>> and converted to 32 bits at the very end. I guess the C
>> implementation should use 32 bits math like the assembly VDSO does
>> as of today.
>>
>> gettimeofday:	vdso: 750 nsec/call
>> gettimeofday:	vdso: 1533 nsec/call

Small improvement (3%) with the proposed change:

gettimeofday:	vdso: 1485 nsec/call

Though still some way to go.

Christophe

> The only real 64bit math which can matter is the 64bit * 32bit
> multiply, i.e.
>
> static __always_inline
> u64 vdso_calc_delta(u64 cycles, u64 last, u64 mask, u32 mult)
> {
> 	return ((cycles - last) & mask) * mult;
> }
>
> Everything else is trivial add/sub/shift, which should be roughly the
> same in ASM.
>
> Can you try to replace that with:
>
> static __always_inline
> u64 vdso_calc_delta(u64 cycles, u64 last, u64 mask, u32 mult)
> {
> 	u64 res, delta = ((cycles - last) & mask);
> 	u32 dh, dl;
>
> 	dl = delta;
> 	dh = delta >> 32;
>
> 	res = mul_u32_u32(dl, mult);
> 	if (dh)
> 		res += mul_u32_u32(dh, mult) << 32;
>
> 	return res;
> }
>
> That's pretty much what __do_get_tspec does in ASM.
>
> Thanks,
>
> 	tglx
Re: [PATCH v4 3/3] powerpc/prom_init: Use -ffreestanding to avoid a reference to bcmp
On Mon, Oct 21, 2019 at 10:15:29PM -0700, Nathan Chancellor wrote: > On Fri, Oct 18, 2019 at 03:02:10PM -0500, Segher Boessenkool wrote: > > I think the proper solution is for the kernel to *do* use -ffreestanding, > > and then somehow tell the kernel that memcpy etc. are the standard > > functions. A freestanding GCC already requires memcpy, memmove, memset, > > memcmp, and sometimes abort to exist and do the standard thing; why cannot > > programs then also rely on it to be the standard functions. > > > > What exact functions are the reason the kernel does not use -ffreestanding? > > Is it just memcpy? Is more wanted? > > I think Linus summarized it pretty well here: > > https://lore.kernel.org/lkml/CAHk-=wi-epJZfBHDbKKDZ64us7WkF=lpufhvybmzsteo8q0...@mail.gmail.com/ GCC recognises __builtin_memcpy (or any other __builtin) just fine even with -ffreestanding. So the kernel wants a warning (or error) whenever a call to one of these library functions is generated by the compiler without the user asking for it directly (via a __builtin)? And that is all that is needed for the kernel to use -ffreestanding? That shouldn't be hard. Anything missing here? Segher
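What "asking for it directly via a __builtin" looks like in practice is simply a wrapper such as the one below. Note that whether a given call is inlined or lowered to an out-of-line memcpy is a compiler decision, which is why a freestanding environment must still provide the standard memcpy/memmove/memset/memcmp:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Explicit use of the builtin: valid under -ffreestanding on GCC and
 * clang.  Small constant sizes are typically expanded inline; larger
 * or variable sizes fall back to a call to an external memcpy, which
 * the environment has to supply. */
static void *copy_explicit(void *dst, const void *src, size_t n)
{
	return __builtin_memcpy(dst, src, n);
}
```

This is a sketch of the usage pattern under discussion, not a proposal for a kernel interface; the open question in the thread is only how to get a diagnostic when the compiler emits such a library call without the source asking for it.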
[PATCH 2/3] ocxl: Add pseries-specific code
pseries.c implements the guest-specific callbacks for the backend API. The hypervisor calls provide an interface to configure and interact with OpenCAPI devices. It matches the latest version of the 'PAPR changes' document. The following hcalls are supported: H_OCXL_CONFIG_ADAPTER Used to configure OpenCAPI adapter characteristics. H_OCXL_CONFIG_SPA Used to configure the scheduled process area (SPA) table for an OpenCAPI device. H_OCXL_GET_FAULT_STATE Used to retrieve fault information from an OpenCAPI device. H_OCXL_HANDLE_FAULT Used to respond to an OpenCAPI fault. Each of these hcalls takes a config flag parameter to allow the guest to manage the OpenCAPI device. The current values 0xf004 to 0xf007 have been chosen according to the available QEMU hcall values, which are specific to qemu / KVM-on-POWER. Two parameters (buid and config_addr) are common to all hcalls and are used to allow QEMU to recover the PCI device. Signed-off-by: Christophe Lombard --- drivers/misc/ocxl/Makefile| 1 + drivers/misc/ocxl/main.c | 4 + drivers/misc/ocxl/ocxl_internal.h | 1 + drivers/misc/ocxl/pseries.c | 450 ++ 4 files changed, 456 insertions(+) create mode 100644 drivers/misc/ocxl/pseries.c diff --git a/drivers/misc/ocxl/Makefile b/drivers/misc/ocxl/Makefile index bfdaeb232b83..3474e912c402 100644 --- a/drivers/misc/ocxl/Makefile +++ b/drivers/misc/ocxl/Makefile @@ -5,6 +5,7 @@ ocxl-y += main.o pci.o config.o file.o pasid.o mmio.o ocxl-y += link.o context.o afu_irq.o sysfs.o trace.o ocxl-y += core.o ocxl-$(CONFIG_PPC_POWERNV) += powernv.o +ocxl-$(CONFIG_PPC_PSERIES) += pseries.o obj-$(CONFIG_OCXL) += ocxl.o diff --git a/drivers/misc/ocxl/main.c b/drivers/misc/ocxl/main.c index 95df2ba4d473..bdd9ffa7f769 100644 --- a/drivers/misc/ocxl/main.c +++ b/drivers/misc/ocxl/main.c @@ -16,6 +16,10 @@ static int __init init_ocxl(void) if (cpu_has_feature(CPU_FTR_HVMODE)) ocxl_ops = _powernv_ops; +#ifdef CONFIG_PPC_PSERIES + else + ocxl_ops = _pseries_ops; +#endif rc = 
pci_register_driver(_pci_driver); if (rc) { diff --git a/drivers/misc/ocxl/ocxl_internal.h b/drivers/misc/ocxl/ocxl_internal.h index 2bdea279bdc6..c18b32df3fe5 100644 --- a/drivers/misc/ocxl/ocxl_internal.h +++ b/drivers/misc/ocxl/ocxl_internal.h @@ -104,6 +104,7 @@ struct ocxl_backend_ops { }; extern const struct ocxl_backend_ops ocxl_powernv_ops; +extern const struct ocxl_backend_ops ocxl_pseries_ops; extern const struct ocxl_backend_ops *ocxl_ops; int ocxl_create_cdev(struct ocxl_afu *afu); diff --git a/drivers/misc/ocxl/pseries.c b/drivers/misc/ocxl/pseries.c new file mode 100644 index ..1d4942d713f7 --- /dev/null +++ b/drivers/misc/ocxl/pseries.c @@ -0,0 +1,450 @@ +// SPDX-License-Identifier: GPL-2.0+ +// Copyright 2018 IBM Corp. +#include +#include "ocxl_internal.h" +#include + +#define H_OCXL_CONFIG_ADAPTER 0xf004 +#define H_OCXL_CONFIG_SPA 0xf005 +#define H_OCXL_GET_FAULT_STATE 0xf006 +#define H_OCXL_HANDLE_FAULT0xf007 + +#define H_CONFIG_ADAPTER_SETUP 1 +#define H_CONFIG_ADAPTER_RELEASE 2 +#define H_CONFIG_ADAPTER_GET_ACTAG 3 +#define H_CONFIG_ADAPTER_GET_PASID 4 +#define H_CONFIG_ADAPTER_SET_TL5 +#define H_CONFIG_ADAPTER_ALLOC_IRQ 6 +#define H_CONFIG_ADAPTER_FREE_IRQ 7 + +#define H_CONFIG_SPA_SET 1 +#define H_CONFIG_SPA_UPDATE2 +#define H_CONFIG_SPA_REMOVE3 + +static char *config_adapter_names[] = { + "UNKNOWN_OP", /* 0 undefined */ + "SETUP",/* 1 */ + "RELEASE", /* 2 */ + "GET_ACTAG",/* 3 */ + "GET_PASID",/* 4 */ + "SET_TL", /* 5 */ + "ALLOC_IRQ",/* 6 */ + "FREE_IRQ", /* 7 */ +}; + +static char *config_spa_names[] = { + "UNKNOWN_OP", /* 0 undefined */ + "SET", /* 1 */ + "UPDATE", /* 2 */ + "REMOVE", /* 3 */ +}; + +static char *op_str(unsigned int op, char *names[], int len) +{ + if (op >= len) + return "UNKNOWN_OP"; + return names[op]; +} + +#define OP_STR(op, names) op_str(op, names, ARRAY_SIZE(names)) +#define OP_STR_CA(op) OP_STR(op, config_adapter_names) +#define OP_STR_CS(op) OP_STR(op, config_spa_names) + +#define _PRINT_MSG(rc, format, ...)\ + { \ 
+ if (rc != H_SUCCESS && rc != H_CONTINUE)\ + pr_err(format, __VA_ARGS__);\ + else\ +
[PATCH 0/3] ocxl: Support for an OpenCAPI device in a QEMU guest.
This series adds support for an OpenCAPI device in a QEMU guest. It builds on top of the existing ocxl driver + http://patchwork.ozlabs.org/patch/1177999/ The ocxl module registers either a pci driver or a platform driver, based on the environment (bare-metal (powernv) or pseries). Roughly 4/5 of the code is common between the 2 types of driver: - PCI implementation - mmio operations - link management - sysfs folders - page fault and context handling The differences in implementation are essentially based on the interaction with the opal api(s) defined in the host. Several hcalls have been defined (extension of the PAPR) to: - configure the Scheduled Process Area - get specific AFU information - allocate irqs - handle page faults and process elements When the code needs to call a platform-specific implementation, it does so through an API. The powernv and pseries implementations each describe their own definition. See struct ocxl_backend_ops. It has been tested in a bare-metal and QEMU environment using the memcpy and the AFP AFUs. christophe lombard (3): ocxl: Introduce implementation-specific API ocxl: Add pseries-specific code powerpc/pseries: Fixup config space size of OpenCAPI devices arch/powerpc/platforms/pseries/pci.c | 9 + drivers/misc/ocxl/Makefile | 3 + drivers/misc/ocxl/config.c | 7 +- drivers/misc/ocxl/link.c | 31 +- drivers/misc/ocxl/main.c | 9 + drivers/misc/ocxl/ocxl_internal.h| 25 ++ drivers/misc/ocxl/powernv.c | 88 ++ drivers/misc/ocxl/pseries.c | 450 +++ 8 files changed, 603 insertions(+), 19 deletions(-) create mode 100644 drivers/misc/ocxl/powernv.c create mode 100644 drivers/misc/ocxl/pseries.c -- 2.21.0
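The backend-ops dispatch described above boils down to a classic function-pointer table selected once at init time. A stand-alone sketch of the pattern (the names mirror the series, but the function bodies and the PASID counts are invented for illustration):

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the dispatch pattern the series describes: common code calls
 * through an ops table, chosen once at init depending on whether we run
 * bare-metal (hypervisor mode -> powernv) or as a guest (pseries). */
struct ocxl_backend_ops {
	int (*get_pasid_count)(int *count);
};

static int powernv_get_pasid_count(int *count) { *count = 512; return 0; }
static int pseries_get_pasid_count(int *count) { *count = 64;  return 0; }

static const struct ocxl_backend_ops powernv_ops = { powernv_get_pasid_count };
static const struct ocxl_backend_ops pseries_ops = { pseries_get_pasid_count };

static const struct ocxl_backend_ops *ocxl_ops;

/* hv_mode stands in for cpu_has_feature(CPU_FTR_HVMODE). */
static void ocxl_init(int hv_mode)
{
	ocxl_ops = hv_mode ? &powernv_ops : &pseries_ops;
}

/* Common code never cares which backend it is talking to. */
static int ocxl_config_get_pasid_count(int *count)
{
	return ocxl_ops->get_pasid_count(count);
}
```

The payoff of this structure is that roughly 4/5 of the driver stays backend-agnostic: only the table initializer differs between powernv and pseries builds.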
[PATCH 3/3] powerpc/pseries: Fixup config space size of OpenCAPI devices
Fix up the pci config size of the OpenCAPI PCIe devices in the pseries environment. Most OpenCAPI PCIe devices have 4096 bytes of configuration space. Signed-off-by: Christophe Lombard --- arch/powerpc/platforms/pseries/pci.c | 9 + 1 file changed, 9 insertions(+) diff --git a/arch/powerpc/platforms/pseries/pci.c b/arch/powerpc/platforms/pseries/pci.c index 1eae1d09980c..3397784767b0 100644 --- a/arch/powerpc/platforms/pseries/pci.c +++ b/arch/powerpc/platforms/pseries/pci.c @@ -291,6 +291,15 @@ static void fixup_winbond_82c105(struct pci_dev* dev) DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_WINBOND, PCI_DEVICE_ID_WINBOND_82C105, fixup_winbond_82c105); +static void fixup_opencapi_cfg_size(struct pci_dev *pdev) +{ + if (!machine_is(pseries)) + return; + + pdev->cfg_size = PCI_CFG_SPACE_EXP_SIZE; +} +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_IBM, 0x062b, fixup_opencapi_cfg_size); + int pseries_root_bridge_prepare(struct pci_host_bridge *bridge) { struct device_node *dn, *pdn; -- 2.21.0
[PATCH 1/3] ocxl: Introduce implementation-specific API
The backend API (in ocxl.h) lists some low-level functions whose implementation is different on bare-metal and in a guest. Each environment implements its own functions, and the common code uses them through function pointers, defined in ocxl_backend_ops. A new powernv.c file is created to call the pnv_ocxl_ API for the bare-metal environment. Signed-off-by: Christophe Lombard --- drivers/misc/ocxl/Makefile| 2 + drivers/misc/ocxl/config.c| 7 ++- drivers/misc/ocxl/link.c | 31 +-- drivers/misc/ocxl/main.c | 5 ++ drivers/misc/ocxl/ocxl_internal.h | 24 + drivers/misc/ocxl/powernv.c | 88 +++ 6 files changed, 138 insertions(+), 19 deletions(-) create mode 100644 drivers/misc/ocxl/powernv.c diff --git a/drivers/misc/ocxl/Makefile b/drivers/misc/ocxl/Makefile index d07d1bb8e8d4..bfdaeb232b83 100644 --- a/drivers/misc/ocxl/Makefile +++ b/drivers/misc/ocxl/Makefile @@ -4,6 +4,8 @@ ccflags-$(CONFIG_PPC_WERROR)+= -Werror ocxl-y += main.o pci.o config.o file.o pasid.o mmio.o ocxl-y += link.o context.o afu_irq.o sysfs.o trace.o ocxl-y += core.o +ocxl-$(CONFIG_PPC_POWERNV) += powernv.o + obj-$(CONFIG_OCXL) += ocxl.o # For tracepoints to include our trace.h from tracepoint infrastructure: diff --git a/drivers/misc/ocxl/config.c b/drivers/misc/ocxl/config.c index 7ca0f6744125..981a3bcfe742 100644 --- a/drivers/misc/ocxl/config.c +++ b/drivers/misc/ocxl/config.c @@ -1,7 +1,6 @@ // SPDX-License-Identifier: GPL-2.0+ // Copyright 2017 IBM Corp. 
#include -#include #include #include "ocxl_internal.h" @@ -649,7 +648,7 @@ int ocxl_config_get_actag_info(struct pci_dev *dev, u16 *base, u16 *enabled, * avoid an external driver using ocxl as a library to call * platform-dependent code */ - rc = pnv_ocxl_get_actag(dev, base, enabled, supported); + rc = ocxl_ops->get_actag(dev, base, enabled, supported); if (rc) { dev_err(>dev, "Can't get actag for device: %d\n", rc); return rc; @@ -673,7 +672,7 @@ EXPORT_SYMBOL_GPL(ocxl_config_set_afu_actag); int ocxl_config_get_pasid_info(struct pci_dev *dev, int *count) { - return pnv_ocxl_get_pasid_count(dev, count); + return ocxl_ops->get_pasid_count(dev, count); } void ocxl_config_set_afu_pasid(struct pci_dev *dev, int pos, int pasid_base, @@ -715,7 +714,7 @@ int ocxl_config_set_TL(struct pci_dev *dev, int tl_dvsec) if (PCI_FUNC(dev->devfn) != 0) return 0; - return pnv_ocxl_set_TL(dev, tl_dvsec); + return ocxl_ops->set_tl(dev, tl_dvsec); } EXPORT_SYMBOL_GPL(ocxl_config_set_TL); diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c index e936a3bd5957..9f4d164180a7 100644 --- a/drivers/misc/ocxl/link.c +++ b/drivers/misc/ocxl/link.c @@ -5,7 +5,6 @@ #include #include #include -#include #include #include "ocxl_internal.h" #include "trace.h" @@ -83,7 +82,7 @@ static void ack_irq(struct ocxl_link *link, enum xsl_response r) link->xsl_fault.dsisr, link->xsl_fault.dar, reg); - pnv_ocxl_handle_fault(link->platform_data, reg); + ocxl_ops->ack_irq(link->platform_data, reg); } } @@ -146,8 +145,8 @@ static irqreturn_t xsl_fault_handler(int irq, void *data) int pid; bool schedule = false; - pnv_ocxl_get_fault_state(link->platform_data, , , -_handle, ); + ocxl_ops->get_fault_state(link->platform_data, , , + _handle, ); trace_ocxl_fault(pe_handle, dsisr, dar, -1); /* We could be reading all null values here if the PE is being @@ -282,8 +281,8 @@ static int alloc_link(struct pci_dev *dev, int PE_mask, struct ocxl_link **out_l INIT_WORK(>xsl_fault.fault_work, xsl_fault_handler_bh); 
/* platform specific hook */ - rc = pnv_ocxl_platform_setup(dev, PE_mask, _irq, ->platform_data); + rc = ocxl_ops->platform_setup(dev, PE_mask, _irq, + >platform_data); if (rc) goto err_free; @@ -298,7 +297,7 @@ static int alloc_link(struct pci_dev *dev, int PE_mask, struct ocxl_link **out_l return 0; err_xsl_irq: - pnv_ocxl_platform_release(link->platform_data); + ocxl_ops->platform_release(link->platform_data); err_free: kfree(link); return rc; @@ -344,7 +343,7 @@ static void release_xsl(struct kref *ref) list_del(>list); /* call platform code before releasing data */ - pnv_ocxl_platform_release(link->platform_data); + ocxl_ops->platform_release(link->platform_data); free_link(link); } @@ -378,8 +377,8 @@
[PATCH 2/3] Documentation: dt: binding: fsl: Add 'little-endian' and update Chassis define
By default, QorIQ SoC's RCPM register block is Big Endian. But there are some exceptions, such as LS1088A and LS2088A, which are Little Endian. So add this optional property to help identify them. Actually LS2021A and other Layerscapes won't totally follow Chassis 2.1, so separate them from the powerpc SoCs. Signed-off-by: Ran Wang Reviewed-by: Rob Herring --- Change in v8: - None. Change in v7: - None. Change in v6: - None. Change in v5: - Add 'Reviewed-by: Rob Herring ' to commit message. - Rename property 'fsl,#rcpm-wakeup-cells' to '#fsl,rcpm-wakeup-cells'. Please see https://lore.kernel.org/patchwork/patch/1101022/ Change in v4: - Adjust indentation of 'ls1021a, ls1012a, ls1043a, ls1046a'. Change in v3: - None. Change in v2: - None. Documentation/devicetree/bindings/soc/fsl/rcpm.txt | 14 ++ 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/Documentation/devicetree/bindings/soc/fsl/rcpm.txt b/Documentation/devicetree/bindings/soc/fsl/rcpm.txt index e284e4e..5a33619 100644 --- a/Documentation/devicetree/bindings/soc/fsl/rcpm.txt +++ b/Documentation/devicetree/bindings/soc/fsl/rcpm.txt @@ -5,7 +5,7 @@ and power management. Required properites: - reg : Offset and length of the register set of the RCPM block. - - fsl,#rcpm-wakeup-cells : The number of IPPDEXPCR register cells in the + - #fsl,rcpm-wakeup-cells : The number of IPPDEXPCR register cells in the fsl,rcpm-wakeup property. - compatible : Must contain a chip-specific RCPM block compatible string and (if applicable) may contain a chassis-version RCPM compatible @@ -20,6 +20,7 @@ Required properites: * "fsl,qoriq-rcpm-1.0": for chassis 1.0 rcpm * "fsl,qoriq-rcpm-2.0": for chassis 2.0 rcpm * "fsl,qoriq-rcpm-2.1": for chassis 2.1 rcpm + * "fsl,qoriq-rcpm-2.1+": for chassis 2.1+ rcpm All references to "1.0" and "2.0" refer to the QorIQ chassis version to which the chip complies. 
@@ -27,14 +28,19 @@ Chassis Version Example Chips ------ 1.0p4080, p5020, p5040, p2041, p3041 2.0t4240, b4860, b4420 -2.1t1040, ls1021 +2.1t1040, +2.1+ ls1021a, ls1012a, ls1043a, ls1046a + +Optional properties: + - little-endian : RCPM register block is Little Endian. Without it RCPM + will be Big Endian (default case). Example: The RCPM node for T4240: rcpm: global-utilities@e2000 { compatible = "fsl,t4240-rcpm", "fsl,qoriq-rcpm-2.0"; reg = <0xe2000 0x1000>; - fsl,#rcpm-wakeup-cells = <2>; + #fsl,rcpm-wakeup-cells = <2>; }; * Freescale RCPM Wakeup Source Device Tree Bindings @@ -44,7 +50,7 @@ can be used as a wakeup source. - fsl,rcpm-wakeup: Consists of a phandle to the rcpm node and the IPPDEXPCR register cells. The number of IPPDEXPCR register cells is defined in - "fsl,#rcpm-wakeup-cells" in the rcpm node. The first register cell is + "#fsl,rcpm-wakeup-cells" in the rcpm node. The first register cell is the bit mask that should be set in IPPDEXPCR0, and the second register cell is for IPPDEXPCR1, and so on. -- 2.7.4
[PATCH 3/3] soc: fsl: add RCPM driver
NXP's QorIQ processors based on ARM cores have an RCPM module (Run Control and Power Management), which performs system-level tasks associated with power management, such as wakeup source control. This driver depends on the PM wakeup source framework, which helps to collect wakeup information. Signed-off-by: Ran Wang --- Change in v8: - Adjust related API usage to meet wakeup.c's update in patch 1/3. - Add sanity checking for the case where ws->dev or ws->dev->parent is null. Change in v7: - Replace 'ws->dev' with 'ws->dev->parent' to get aligned with c8377adfa781 ("PM / wakeup: Show wakeup sources stats in sysfs") - Remove '+obj-y += ftm_alarm.o' since it is wrong. - Cosmetic work. Change in v6: - Adjust related API usage to meet wakeup.c's update in patch 1/3. Change in v5: - Fix v4 regression where the return value of wakeup_source_get_next() wasn't passed to ws in the while loop. - Rename wakeup_source member 'attached_dev' to 'dev'. - Rename property 'fsl,#rcpm-wakeup-cells' to '#fsl,rcpm-wakeup-cells'. Please see https://lore.kernel.org/patchwork/patch/1101022/ Change in v4: - Remove extra ',' in author line of rcpm.c - Update usage of wakeup_source_get_next() to be less confusing to the reader, code logic remains the same. Change in v3: - Some whitespace adjustment. Change in v2: - Rebase Kconfig and Makefile update to latest mainline. drivers/soc/fsl/Kconfig | 8 +++ drivers/soc/fsl/Makefile | 1 + drivers/soc/fsl/rcpm.c | 133 +++ 3 files changed, 142 insertions(+) create mode 100644 drivers/soc/fsl/rcpm.c diff --git a/drivers/soc/fsl/Kconfig b/drivers/soc/fsl/Kconfig index f9ad8ad..4918856 100644 --- a/drivers/soc/fsl/Kconfig +++ b/drivers/soc/fsl/Kconfig @@ -40,4 +40,12 @@ config DPAA2_CONSOLE /dev/dpaa2_mc_console and /dev/dpaa2_aiop_console, which can be used to dump the Management Complex and AIOP firmware logs. 
+ +config FSL_RCPM + bool "Freescale RCPM support" + depends on PM_SLEEP + help + The NXP QorIQ Processors based on ARM Core have RCPM module + (Run Control and Power Management), which performs all device-level + tasks associated with power management, such as wakeup source control. endmenu diff --git a/drivers/soc/fsl/Makefile b/drivers/soc/fsl/Makefile index 71dee8d..906f1cd 100644 --- a/drivers/soc/fsl/Makefile +++ b/drivers/soc/fsl/Makefile @@ -6,6 +6,7 @@ obj-$(CONFIG_FSL_DPAA) += qbman/ obj-$(CONFIG_QUICC_ENGINE) += qe/ obj-$(CONFIG_CPM) += qe/ +obj-$(CONFIG_FSL_RCPM) += rcpm.o obj-$(CONFIG_FSL_GUTS) += guts.o obj-$(CONFIG_FSL_MC_DPIO) += dpio/ obj-$(CONFIG_DPAA2_CONSOLE)+= dpaa2-console.o diff --git a/drivers/soc/fsl/rcpm.c b/drivers/soc/fsl/rcpm.c new file mode 100644 index 000..3ed135e --- /dev/null +++ b/drivers/soc/fsl/rcpm.c @@ -0,0 +1,133 @@ +// SPDX-License-Identifier: GPL-2.0 +// +// rcpm.c - Freescale QorIQ RCPM driver +// +// Copyright 2019 NXP +// +// Author: Ran Wang + +#include +#include +#include +#include +#include +#include +#include + +#define RCPM_WAKEUP_CELL_MAX_SIZE 7 + +struct rcpm { + unsigned intwakeup_cells; + void __iomem*ippdexpcr_base; + boollittle_endian; +}; + +static int rcpm_pm_prepare(struct device *dev) +{ + int i, ret, idx; + void __iomem *base; + struct wakeup_source*ws; + struct rcpm *rcpm; + struct device_node *np = dev->of_node; + u32 value[RCPM_WAKEUP_CELL_MAX_SIZE + 1], tmp; + + rcpm = dev_get_drvdata(dev); + if (!rcpm) + return -EINVAL; + + base = rcpm->ippdexpcr_base; + idx = wakeup_sources_read_lock(); + + /* Begin with first registered wakeup source */ + for_each_wakeup_source(ws) { + + /* skip object which is not attached to device */ + if (!ws->dev || !ws->dev->parent) + continue; + + ret = device_property_read_u32_array(ws->dev->parent, + "fsl,rcpm-wakeup", value, + rcpm->wakeup_cells + 1); + + /* Wakeup source should refer to current rcpm device */ + if (ret || (np->phandle != value[0])) { + dev_info(dev, "%s 
doesn't refer to this rcpm\n", + ws->name); + continue; + } + + for (i = 0; i < rcpm->wakeup_cells; i++) { + /* We can only OR related bits */ + if (value[i + 1]) { + if (rcpm->little_endian) { + tmp = ioread32(base + i * 4); +
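The driver's read-modify-OR of the IPPDEXPCRn registers, including the little-endian special case described in the binding, can be sketched in plain userspace C. The byte-buffer accessors below are stand-ins for the kernel's ioread32/iowrite32 (little-endian) and ioread32be/iowrite32be (big-endian) MMIO helpers; the register widths match the binding, everything else is illustrative:

```c
#include <assert.h>
#include <stdint.h>

/* Byte-buffer stand-ins for the MMIO accessors the driver picks between. */
static uint32_t read32_le(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static void write32_le(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)v;
	p[1] = (uint8_t)(v >> 8);
	p[2] = (uint8_t)(v >> 16);
	p[3] = (uint8_t)(v >> 24);
}

static uint32_t read32_be(const uint8_t *p)
{
	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

static void write32_be(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/* Core of rcpm_pm_prepare(): OR each wakeup source's mask cells into the
 * IPPDEXPCRn registers, honouring the register block's endianness. */
static void or_wakeup_mask(uint8_t *base, int little_endian,
			   const uint32_t *mask, unsigned int cells)
{
	for (unsigned int i = 0; i < cells; i++) {
		if (!mask[i])
			continue;	/* we can only OR related bits */
		if (little_endian)
			write32_le(base + i * 4, read32_le(base + i * 4) | mask[i]);
		else
			write32_be(base + i * 4, read32_be(base + i * 4) | mask[i]);
	}
}
```

ORing (rather than writing) the masks is what lets several wakeup sources accumulate their bits in the same register across the loop over all registered sources.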
[PATCH 1/3] PM: wakeup: Add routine to help fetch wakeup source object.
Some users might want to go through all registered wakeup sources and do things accordingly. For example, an SoC PM driver might need to do HW programming to prevent powering down a specific IP which a wakeup source depends on. So add this API to help walk through all registered wakeup source objects on that list and return them one by one. Signed-off-by: Ran Wang Tested-by: Leonard Crestez --- Change in v8: - Rename wakeup_source_get_next() to wakeup_sources_walk_next(). - Add wakeup_sources_read_lock() to take over locking job of wakeup_source_get_start(). - Rename wakeup_source_get_start() to wakeup_sources_walk_start(). - Replace wakeup_source_get_stop() with wakeup_sources_read_unlock(). - Define macro for_each_wakeup_source(ws). Change in v7: - Remove define of member *dev in wake_irq to fix conflict with commit c8377adfa781 ("PM / wakeup: Show wakeup sources stats in sysfs"), user will use ws->dev->parent instead. - Remove '#include ' because it is not used. Change in v6: - Add wakeup_source_get_start() and wakeup_source_get_stop() to align with wakeup_sources_stats_seq_start/next/stop. Change in v5: - Update commit message, add description of walk through all wakeup source objects. - Add SRCU protection in function wakeup_source_get_next(). - Rename wakeup_source member 'attached_dev' to 'dev' and move it up (before wakeirq). Change in v4: - None. Change in v3: - Adjust indentation of *attached_dev;. Change in v2: - None. drivers/base/power/wakeup.c | 42 ++ include/linux/pm_wakeup.h | 9 + 2 files changed, 51 insertions(+) diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c index 5817b51..8c7a5f9 100644 --- a/drivers/base/power/wakeup.c +++ b/drivers/base/power/wakeup.c @@ -248,6 +248,48 @@ void wakeup_source_unregister(struct wakeup_source *ws) EXPORT_SYMBOL_GPL(wakeup_source_unregister); /** + * wakeup_sources_read_lock - Lock wakeup source list for read. 
+ */ +int wakeup_sources_read_lock(void) +{ + return srcu_read_lock(_srcu); +} +EXPORT_SYMBOL_GPL(wakeup_sources_read_lock); + +/** + * wakeup_sources_read_unlock - Unlock wakeup source list. + */ +void wakeup_sources_read_unlock(int idx) +{ + srcu_read_unlock(_srcu, idx); +} +EXPORT_SYMBOL_GPL(wakeup_sources_read_unlock); + +/** + * wakeup_sources_walk_start - Begin a walk on wakeup source list + */ +struct wakeup_source *wakeup_sources_walk_start(void) +{ + struct list_head *ws_head = _sources; + + return list_entry_rcu(ws_head->next, struct wakeup_source, entry); +} +EXPORT_SYMBOL_GPL(wakeup_sources_walk_start); + +/** + * wakeup_sources_walk_next - Get next wakeup source from the list + * @ws: Previous wakeup source object + */ +struct wakeup_source *wakeup_sources_walk_next(struct wakeup_source *ws) +{ + struct list_head *ws_head = _sources; + + return list_next_or_null_rcu(ws_head, >entry, + struct wakeup_source, entry); +} +EXPORT_SYMBOL_GPL(wakeup_sources_walk_next); + +/** * device_wakeup_attach - Attach a wakeup source object to a device object. * @dev: Device to handle. * @ws: Wakeup source object to attach to @dev. 
diff --git a/include/linux/pm_wakeup.h b/include/linux/pm_wakeup.h index 661efa0..aa3da66 100644 --- a/include/linux/pm_wakeup.h +++ b/include/linux/pm_wakeup.h @@ -63,6 +63,11 @@ struct wakeup_source { boolautosleep_enabled:1; }; +#define for_each_wakeup_source(ws) \ + for ((ws) = wakeup_sources_walk_start();\ +(ws); \ +(ws) = wakeup_sources_walk_next((ws))) + #ifdef CONFIG_PM_SLEEP /* @@ -92,6 +97,10 @@ extern void wakeup_source_remove(struct wakeup_source *ws); extern struct wakeup_source *wakeup_source_register(struct device *dev, const char *name); extern void wakeup_source_unregister(struct wakeup_source *ws); +extern int wakeup_sources_read_lock(void); +extern void wakeup_sources_read_unlock(int idx); +extern struct wakeup_source *wakeup_sources_walk_start(void); +extern struct wakeup_source *wakeup_sources_walk_next(struct wakeup_source *ws); extern int device_wakeup_enable(struct device *dev); extern int device_wakeup_disable(struct device *dev); extern void device_set_wakeup_capable(struct device *dev, bool capable); -- 2.7.4
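The new walk API is essentially an iterator triple (start, next, and a for_each macro tying them together). A userspace model of its shape, with a plain singly linked list standing in for the kernel's RCU-protected list and the SRCU read-lock bracketing elided:

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model of the walk API added by this patch: start() returns the
 * first entry, next() the following one (or NULL at the end).  The kernel
 * version uses RCU list primitives and must run between
 * wakeup_sources_read_lock() and wakeup_sources_read_unlock(). */
struct wakeup_source {
	const char *name;
	struct wakeup_source *next;
};

static struct wakeup_source *ws_head;

static struct wakeup_source *wakeup_sources_walk_start(void)
{
	return ws_head;
}

static struct wakeup_source *wakeup_sources_walk_next(struct wakeup_source *ws)
{
	return ws->next;
}

#define for_each_wakeup_source(ws)			\
	for ((ws) = wakeup_sources_walk_start();	\
	     (ws);					\
	     (ws) = wakeup_sources_walk_next((ws)))

/* Build a two-entry list (names are made up) and count it. */
static int demo_count(void)
{
	static struct wakeup_source a = { "usb", NULL };
	static struct wakeup_source b = { "rtc", &a };
	struct wakeup_source *ws;
	int n = 0;

	ws_head = &b;
	for_each_wakeup_source(ws)
		n++;
	return n;
}
```

The RCPM driver in patch 3/3 is exactly such a consumer: it takes the read lock, walks every source with for_each_wakeup_source(), and inspects ws->dev->parent for each one.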
Re: [PATCH V7] mm/debug: Add tests validating architecture page table helpers
On 10/21/2019 02:42 AM, Anshuman Khandual wrote: This adds tests which will validate architecture page table helpers and other accessors in their compliance with expected generic MM semantics. This will help various architectures in validating changes to existing page table helpers or addition of new ones. This test covers basic page table entry transformations including but not limited to old, young, dirty, clean, write, write protect, etc. at various levels, along with populating intermediate entries with the next page table page and validating them. Test page table pages are allocated from system memory with the required size and alignment. The mapped pfns at page table levels are derived from a real pfn representing a valid kernel text symbol. This test gets called right after page_alloc_init_late(). This gets built and run when CONFIG_DEBUG_VM_PGTABLE is selected along with CONFIG_DEBUG_VM. Architectures willing to subscribe to this test also need to select CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE which for now is limited to x86 and arm64. Going forward, other architectures too can enable this after fixing build or runtime problems (if any) with their page table helpers. Folks interested in making sure that a given platform's page table helpers conform to expected generic MM semantics should enable the above config which will just trigger this test during boot. Any non-conformity here will be reported as a warning, which would need to be fixed. This test will help catch any changes to the agreed upon semantics expected from generic MM and enable platforms to accommodate it thereafter. 
Cc: Andrew Morton Cc: Vlastimil Babka Cc: Greg Kroah-Hartman Cc: Thomas Gleixner Cc: Mike Rapoport Cc: Jason Gunthorpe Cc: Dan Williams Cc: Peter Zijlstra Cc: Michal Hocko Cc: Mark Rutland Cc: Mark Brown Cc: Steven Price Cc: Ard Biesheuvel Cc: Masahiro Yamada Cc: Kees Cook Cc: Tetsuo Handa Cc: Matthew Wilcox Cc: Sri Krishna chowdary Cc: Dave Hansen Cc: Russell King - ARM Linux Cc: Michael Ellerman Cc: Paul Mackerras Cc: Martin Schwidefsky Cc: Heiko Carstens Cc: "David S. Miller" Cc: Vineet Gupta Cc: James Hogan Cc: Paul Burton Cc: Ralf Baechle Cc: Kirill A. Shutemov Cc: Gerald Schaefer Cc: Christophe Leroy Cc: Ingo Molnar Cc: linux-snps-...@lists.infradead.org Cc: linux-m...@vger.kernel.org Cc: linux-arm-ker...@lists.infradead.org Cc: linux-i...@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-s...@vger.kernel.org Cc: linux...@vger.kernel.org Cc: sparcli...@vger.kernel.org Cc: x...@kernel.org Cc: linux-ker...@vger.kernel.org Tested-by: Christophe Leroy #PPC32 Suggested-by: Catalin Marinas Signed-off-by: Andrew Morton Signed-off-by: Christophe Leroy Signed-off-by: Anshuman Khandual --- The cover letter has the exact same title as this patch. I think a cover letter is not necessary for a singleton series. The history (and any other information you don't want to include in the commit message) can be added here, below the '---'. That way it is in the mail but won't be included in the commit. 
.../debug/debug-vm-pgtable/arch-support.txt| 34 ++ arch/arm64/Kconfig | 1 + arch/x86/Kconfig | 1 + arch/x86/include/asm/pgtable_64.h | 6 + include/asm-generic/pgtable.h | 6 + init/main.c| 1 + lib/Kconfig.debug | 21 ++ mm/Makefile| 1 + mm/debug_vm_pgtable.c | 388 + 9 files changed, 459 insertions(+) create mode 100644 Documentation/features/debug/debug-vm-pgtable/arch-support.txt create mode 100644 mm/debug_vm_pgtable.c diff --git a/Documentation/features/debug/debug-vm-pgtable/arch-support.txt b/Documentation/features/debug/debug-vm-pgtable/arch-support.txt new file mode 100644 index 000..d6b8185 --- /dev/null +++ b/Documentation/features/debug/debug-vm-pgtable/arch-support.txt @@ -0,0 +1,34 @@ +# +# Feature name: debug-vm-pgtable +# Kconfig: ARCH_HAS_DEBUG_VM_PGTABLE +# description: arch supports pgtable tests for semantics compliance +# +--- +| arch |status| +--- +| alpha: | TODO | +| arc: | TODO | +| arm: | TODO | +| arm64: | ok | +| c6x: | TODO | +|csky: | TODO | +| h8300: | TODO | +| hexagon: | TODO | +|ia64: | TODO | +|m68k: | TODO | +| microblaze: | TODO | +|mips: | TODO | +| nds32: | TODO | +| nios2: | TODO | +|openrisc: | TODO | +| parisc: | TODO | +| powerpc: | TODO | Say ok on ppc32 +| riscv: | TODO | +|s390: | TODO | +| sh: | TODO | +| sparc:
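For readers unfamiliar with what "validating page table helpers" means concretely, the shape of such a check can be modeled with a toy pte as a bare flag word. The real helpers are arch-specific and the flag positions below are invented; only the set -> query -> clear -> query pattern mirrors what the test asserts:

```c
#include <assert.h>
#include <stdint.h>

/* Toy pte: a flag word with made-up bit positions.  Real implementations
 * encode these bits differently per architecture, which is exactly why a
 * semantics-conformance test is useful. */
typedef uint64_t pte_t;

#define TOY_PAGE_DIRTY    (1ULL << 0)
#define TOY_PAGE_ACCESSED (1ULL << 1)

static pte_t pte_mkdirty(pte_t p) { return p | TOY_PAGE_DIRTY; }
static pte_t pte_mkclean(pte_t p) { return p & ~TOY_PAGE_DIRTY; }
static pte_t pte_mkyoung(pte_t p) { return p | TOY_PAGE_ACCESSED; }
static pte_t pte_mkold(pte_t p)   { return p & ~TOY_PAGE_ACCESSED; }
static int pte_dirty(pte_t p)     { return !!(p & TOY_PAGE_DIRTY); }
static int pte_young(pte_t p)     { return !!(p & TOY_PAGE_ACCESSED); }
```

Each invariant (e.g. pte_dirty(pte_mkdirty(pte)) must hold, and pte_dirty(pte_mkclean(pte)) must not) is checked against the architecture's actual helpers at boot when the config options above are enabled.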
Re: [PATCH v9 2/8] KVM: PPC: Move pages between normal and secure memory
On Fri, Oct 18, 2019 at 8:31 AM Paul Mackerras wrote: > > On Wed, Sep 25, 2019 at 10:36:43AM +0530, Bharata B Rao wrote: > > Manage migration of pages between normal and secure memory of a secure > > guest by implementing H_SVM_PAGE_IN and H_SVM_PAGE_OUT hcalls. > > > > H_SVM_PAGE_IN: Move the content of a normal page to secure page > > H_SVM_PAGE_OUT: Move the content of a secure page to normal page > > > > Private ZONE_DEVICE memory equal to the amount of secure memory > > available in the platform for running secure guests is created. > > Whenever a page belonging to the guest becomes secure, a page from > > this private device memory is used to represent and track that secure > > page on the HV side. The movement of pages between normal and secure > > memory is done via migrate_vma_pages() using UV_PAGE_IN and > > UV_PAGE_OUT ucalls. > > As we discussed privately, but mentioning it here so there is a > record: I am concerned about this structure > > > +struct kvmppc_uvmem_page_pvt { > > + unsigned long *rmap; > > + struct kvm *kvm; > > + unsigned long gpa; > > +}; > > which keeps a reference to the rmap. The reference could become stale > if the memslot is deleted or moved, and nothing in the patch series > ensures that the stale references are cleaned up. I will add code to release the device PFNs when the memslot goes away. In fact the early versions of the patchset had this, but it subsequently got removed. > > If it is possible to do without the long-term rmap reference, and > instead find the rmap via the memslots (with the srcu lock held) each > time we need the rmap, that would be safer, I think, provided that we > can sort out the lock ordering issues. All paths except the fault handler access rmap[] under the srcu lock. Even in the case of the fault handler, for those faults induced by us (shared page handling, releasing device pfns), we do hold the srcu lock. The difficult case is when we fault due to HV accessing a device page. 
In this case we come to the fault handler with mmap_sem already held and are not in a position to take the kvm srcu lock, as that would lead to lock order reversal. Given that we still have pages mapped in, I assume the memslot can't go away while we access rmap[], so I think we should be ok here. However, if that sounds fragile, maybe I can go back to my initial design where we weren't using rmap[] to store device PFNs. That will increase the memory usage but gives us an easy option to have a per-guest mutex to protect concurrent page-ins/outs/faults. Regards, Bharata. -- http://raobharata.wordpress.com/
Re: [PATCH] powerpc/64s/exception: Fix kaup -> kuap typo
On Tue, 2019-10-22 at 17:06 +1100, Andrew Donnellan wrote: > It's KUAP, not KAUP. Fix typo in INT_COMMON macro. > > Signed-off-by: Andrew Donnellan Acked-by: Russell Currey
[PATCH] powerpc/64s/exception: Fix kaup -> kuap typo
It's KUAP, not KAUP. Fix typo in INT_COMMON macro. Signed-off-by: Andrew Donnellan --- arch/powerpc/kernel/exceptions-64s.S | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index d0018dd17e0a..46508b148e16 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -514,7 +514,7 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948) * If stack=0, then the stack is already set in r1, and r1 is saved in r10. * PPR save and CPU accounting is not done for the !stack case (XXX why not?) */ -.macro INT_COMMON vec, area, stack, kaup, reconcile, dar, dsisr +.macro INT_COMMON vec, area, stack, kuap, reconcile, dar, dsisr .if \stack andi. r10,r12,MSR_PR /* See if coming from user */ mr r10,r1 /* Save r1 */ @@ -533,7 +533,7 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948) std r10,GPR1(r1)/* save r1 in stackframe*/ .if \stack - .if \kaup + .if \kuap kuap_save_amr_and_lock r9, r10, cr1, cr0 .endif beq 101f/* if from kernel mode */ @@ -541,7 +541,7 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948) SAVE_PPR(\area, r9) 101: .else - .if \kaup + .if \kuap kuap_save_amr_and_lock r9, r10, cr1 .endif .endif -- 2.20.1