Re: [PATCH v4 3/7] PCI: Separate VF BAR updates from standard BAR updates
On Tue, Nov 29, 2016 at 08:48:26AM -0600, Bjorn Helgaas wrote:
>On Tue, Nov 29, 2016 at 03:55:46PM +1100, Gavin Shan wrote:
>> On Mon, Nov 28, 2016 at 10:15:06PM -0600, Bjorn Helgaas wrote:
>> >Previously pci_update_resource() used the same code path for updating
>> >standard BARs and VF BARs in SR-IOV capabilities.
>> >
>> >Split the VF BAR update into a new pci_iov_update_resource() internal
>> >interface, which makes it simpler to compute the BAR address (we can get
>> >rid of pci_resource_bar() and pci_iov_resource_bar()).
>> >
>> >This patch:
>> >
>> >  - Renames pci_update_resource() to pci_std_update_resource(),
>> >  - Adds pci_iov_update_resource(),
>> >  - Makes pci_update_resource() a wrapper that calls the appropriate one.
>> >
>> >No functional change intended.
>> >
>> >Signed-off-by: Bjorn Helgaas
>>
>> With below minor comments fixed:
>>
>> Reviewed-by: Gavin Shan
>>
>> >---
>> > drivers/pci/iov.c       | 49 +++
>> > drivers/pci/pci.h       |  1 +
>> > drivers/pci/setup-res.c | 13 +++-
>> > 3 files changed, 61 insertions(+), 2 deletions(-)
>> >
>> >diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
>> >index d41ec29..d00ed5c 100644
>> >--- a/drivers/pci/iov.c
>> >+++ b/drivers/pci/iov.c
>> >@@ -571,6 +571,55 @@ int pci_iov_resource_bar(struct pci_dev *dev, int resno)
>> > 		4 * (resno - PCI_IOV_RESOURCES);
>> > }
>> >
>> >+/**
>> >+ * pci_iov_update_resource - update a VF BAR
>> >+ * @dev: the PCI device
>> >+ * @resno: the resource number
>> >+ *
>> >+ * Update a VF BAR in the SR-IOV capability of a PF.
>> >+ */
>> >+void pci_iov_update_resource(struct pci_dev *dev, int resno)
>> >+{
>> >+	struct pci_sriov *iov = dev->is_physfn ? dev->sriov : NULL;
>> >+	struct resource *res = dev->resource + resno;
>> >+	int vf_bar = resno - PCI_IOV_RESOURCES;
>> >+	struct pci_bus_region region;
>> >+	u32 new;
>> >+	int reg;
>> >+
>> >+	/*
>> >+	 * The generic pci_restore_bars() path calls this for all devices,
>> >+	 * including VFs and non-SR-IOV devices.  If this is not a PF, we
>> >+	 * have nothing to do.
>> >+	 */
>> >+	if (!iov)
>> >+		return;
>> >+
>> >+	/*
>> >+	 * Ignore unimplemented BARs, unused resource slots for 64-bit
>> >+	 * BARs, and non-movable resources, e.g., those described via
>> >+	 * Enhanced Allocation.
>> >+	 */
>> >+	if (!res->flags)
>> >+		return;
>> >+
>> >+	if (res->flags & IORESOURCE_UNSET)
>> >+		return;
>> >+
>> >+	if (res->flags & IORESOURCE_PCI_FIXED)
>> >+		return;
>> >+
>> >+	pcibios_resource_to_bus(dev->bus, &region, res);
>> >+	new = region.start;
>> >+
>>
>> The bits indicating the BAR's property (e.g. memory, IO etc) are missed
>> in @new.
>
>Hmm, yes.  I omitted those because those bits are supposed to be
>read-only, per spec (PCI r3.0, sec 6.2.5.1).  But I guess it would be
>more conservative to keep them, and this shouldn't be needlessly
>different from pci_std_update_resource().
>

Yeah, agree.

>However, I don't think this code in pci_update_resource() is obviously
>correct:
>
>	new = region.start | (res->flags & PCI_REGION_FLAG_MASK);
>
>PCI_REGION_FLAG_MASK is 0xf.  For memory BARs, bits 0-3 are read-only
>property bits.  For I/O BARs, bits 0-1 are read-only and bits 2-3 are
>part of the address, so on the face of it, the above could corrupt two
>bits of an I/O address.
>
>It's true that decode_bar() initializes flags correctly, using
>PCI_BASE_ADDRESS_IO_MASK for I/O BARs and PCI_BASE_ADDRESS_MEM_MASK
>for memory BARs, but it would take a little more digging to be sure
>that we never set bits 2-3 of flags for an I/O resource elsewhere.
>

The BAR's property bits are probed from the device tree, not hardware,
on some platforms (e.g. pSeries).  Also, there is only one (property)
bit if it's a ROM BAR.  So a further check as below might be needed,
because the code (without the enhancement) should also work fine.

>How about this in pci_std_update_resource():
>
>	pcibios_resource_to_bus(dev->bus, &region, res);
>	new = region.start;
>
>	if (res->flags & IORESOURCE_IO) {
>		mask = (u32)PCI_BASE_ADDRESS_IO_MASK;
>		new |= res->flags & ~PCI_BASE_ADDRESS_IO_MASK;
>	} else {
>		mask = (u32)PCI_BASE_ADDRESS_MEM_MASK;
>		new |= res->flags & ~PCI_BASE_ADDRESS_MEM_MASK;
>	}
>

	if (res->flags & IORESOURCE_IO) {
		mask = (u32)PCI_BASE_ADDRESS_IO_MASK;
		new |= res->flags & ~PCI_BASE_ADDRESS_IO_MASK;
	} else if (resno < PCI_ROM_RESOURCE) {
		mask = (u32)PCI_BASE_ADDRESS_MEM_MASK;
		new |= res->flags & ~PCI_BASE_ADDRESS_MEM_MASK;
	} else if (resno == PCI_ROM_RESOURCE) {
		mask = ~((u32)IORESOURCE_ROM_ENABLE);
		new |= (res->flags & IORESOURCE_ROM_ENABLE);
	} else {
Re: [PATCH v4 3/7] PCI: Separate VF BAR updates from standard BAR updates
On Wed, Nov 30, 2016 at 10:20:28AM +1100, Gavin Shan wrote:
> On Tue, Nov 29, 2016 at 08:48:26AM -0600, Bjorn Helgaas wrote:
> >On Tue, Nov 29, 2016 at 03:55:46PM +1100, Gavin Shan wrote:
> >> On Mon, Nov 28, 2016 at 10:15:06PM -0600, Bjorn Helgaas wrote:
> >> >Previously pci_update_resource() used the same code path for updating
> >> >standard BARs and VF BARs in SR-IOV capabilities.
> >> >
> >> >Split the VF BAR update into a new pci_iov_update_resource() internal
> >> >interface, which makes it simpler to compute the BAR address (we can get
> >> >rid of pci_resource_bar() and pci_iov_resource_bar()).
> >> >
> >> >This patch:
> >> >
> >> >  - Renames pci_update_resource() to pci_std_update_resource(),
> >> >  - Adds pci_iov_update_resource(),
> >> >  - Makes pci_update_resource() a wrapper that calls the appropriate one,
> >> >
> >> >No functional change intended.
>
> >However, I don't think this code in pci_update_resource() is obviously
> >correct:
> >
> >	new = region.start | (res->flags & PCI_REGION_FLAG_MASK);
> >
> >PCI_REGION_FLAG_MASK is 0xf.  For memory BARs, bits 0-3 are read-only
> >property bits.  For I/O BARs, bits 0-1 are read-only and bits 2-3 are
> >part of the address, so on the face of it, the above could corrupt two
> >bits of an I/O address.
> >
> >It's true that decode_bar() initializes flags correctly, using
> >PCI_BASE_ADDRESS_IO_MASK for I/O BARs and PCI_BASE_ADDRESS_MEM_MASK
> >for memory BARs, but it would take a little more digging to be sure
> >that we never set bits 2-3 of flags for an I/O resource elsewhere.
> >
>
> The BAR's property bits are probed from the device tree, not hardware,
> on some platforms (e.g. pSeries).  Also, there is only one (property)
> bit if it's a ROM BAR.  So a further check as below might be needed,
> because the code (without the enhancement) should also work fine.

Ah, right, I forgot about that.  I didn't do enough digging :)

> >How about this in pci_std_update_resource():
> >
> >	pcibios_resource_to_bus(dev->bus, &region, res);
> >	new = region.start;
> >
> >	if (res->flags & IORESOURCE_IO) {
> >		mask = (u32)PCI_BASE_ADDRESS_IO_MASK;
> >		new |= res->flags & ~PCI_BASE_ADDRESS_IO_MASK;
> >	} else {
> >		mask = (u32)PCI_BASE_ADDRESS_MEM_MASK;
> >		new |= res->flags & ~PCI_BASE_ADDRESS_MEM_MASK;
> >	}
> >
>
> 	if (res->flags & IORESOURCE_IO) {
> 		mask = (u32)PCI_BASE_ADDRESS_IO_MASK;
> 		new |= res->flags & ~PCI_BASE_ADDRESS_IO_MASK;
> 	} else if (resno < PCI_ROM_RESOURCE) {
> 		mask = (u32)PCI_BASE_ADDRESS_MEM_MASK;
> 		new |= res->flags & ~PCI_BASE_ADDRESS_MEM_MASK;
> 	} else if (resno == PCI_ROM_RESOURCE) {
> 		mask = ~((u32)IORESOURCE_ROM_ENABLE);
> 		new |= (res->flags & IORESOURCE_ROM_ENABLE);
> 	} else {
> 		dev_warn(&dev->dev, "BAR#%d out of range\n", resno);
> 		return;
> 	}

After this patch, the only thing we OR into a ROM BAR value is
PCI_ROM_ADDRESS_ENABLE, and that's done below, only if the ROM is
already enabled.

I did update the ROM mask (to PCI_ROM_ADDRESS_MASK).  I'm not 100%
sure about doing that -- it follows the spec, but it is a change from
what we've been doing before.  I guess it should be safe because it
means we're checking fewer bits than before (only the top 21 bits for
ROMs, where we used to check the top 28), so the only possible
difference is that we might not warn about "error updating" in some
case where we used to.

I'm not really sure about the value of the "error updating" checks to
begin with, though I guess it does help us find broken devices that
put non-BARs where BARs are supposed to be.

Bjorn
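The masking rules the thread converges on can be modelled in plain
user-space C. This is an illustrative sketch, not kernel code: the PCI_*
constant values are the ones discussed above, IORESOURCE_IO stands in for
the kernel's resource-type flag, and bar_value_and_mask() is a made-up
helper name rather than a kernel interface.

```c
#include <assert.h>
#include <stdint.h>

/* Values as in include/uapi/linux/pci_regs.h / include/linux/ioport.h */
#define PCI_BASE_ADDRESS_IO_MASK	(~0x03u)	/* bits 2-31 are address */
#define PCI_BASE_ADDRESS_MEM_MASK	(~0x0fu)	/* bits 4-31 are address */
#define PCI_ROM_ADDRESS_MASK		(~0x7ffu)	/* bits 11-31 are address */
#define IORESOURCE_IO			0x00000100u	/* resource type flag */

/*
 * Compute the 32-bit value to write into a BAR and the mask of bits to
 * verify on read-back.  I/O and memory BARs keep their low read-only
 * property bits from the resource flags; a ROM BAR gets nothing ORed in
 * here (its enable bit is handled separately).
 */
static void bar_value_and_mask(uint32_t bus_start, uint32_t flags,
			       int is_rom, uint32_t *val, uint32_t *mask)
{
	if (!is_rom && (flags & IORESOURCE_IO)) {
		*mask = PCI_BASE_ADDRESS_IO_MASK;
		*val = bus_start | (flags & ~PCI_BASE_ADDRESS_IO_MASK);
	} else if (!is_rom) {
		*mask = PCI_BASE_ADDRESS_MEM_MASK;
		*val = bus_start | (flags & ~PCI_BASE_ADDRESS_MEM_MASK);
	} else {
		*mask = PCI_ROM_ADDRESS_MASK;	/* only the top 21 bits */
		*val = bus_start;
	}
}
```

For an I/O BAR only the low two property bits survive the OR, for a
memory BAR the low four, and for a ROM the read-back check covers just
the top 21 address bits, matching Bjorn's reasoning above.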
[PATCH] powerpc/radix/mm: Fixup storage key mm fault
Aneesh/Ben reported that the change to do_page_fault() needs to handle
the case where CPU_FTR_COHERENT_ICACHE is missing but we have
CPU_FTR_NOEXECUTE. In those cases the check added for SRR1_ISI_N_OR_G
might trigger a false positive. This patch checks for
CPU_FTR_COHERENT_ICACHE in addition to the MSR value.

Reported-by: Aneesh Kumar K.V
Signed-off-by: Balbir Singh
---
Applies on top of powerpc/next and I've not added a fixes tag

 arch/powerpc/mm/fault.c | 9 -
 drivers/crypto/Makefile | 1 -
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index a17029aa..eab3ded 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -392,8 +392,15 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
 	if (is_exec) {
 		/*
 		 * An execution fault + no execute ?
+		 * We need to check for CPU_FTR_COHERENT_ICACHE, since
+		 * on some variants, an NX fault is taken and
+		 * hash_page_do_lazy_icache() does the fixup.  Without the
+		 * check for CPU_FTR_COHERENT_ICACHE we could have a false
+		 * positive if we have !CPU_FTR_COHERENT_ICACHE and
+		 * CPU_FTR_NOEXECUTE
 		 */
-		if (regs->msr & SRR1_ISI_N_OR_G)
+		if (cpu_has_feature(CPU_FTR_COHERENT_ICACHE) &&
+		    (regs->msr & SRR1_ISI_N_OR_G))
 			goto bad_area;

 		/*
diff --git a/drivers/crypto/Makefile b/drivers/crypto/Makefile
index ad7250f..3c6432d 100644
--- a/drivers/crypto/Makefile
+++ b/drivers/crypto/Makefile
@@ -31,4 +31,3 @@ obj-$(CONFIG_CRYPTO_DEV_QCE) += qce/
 obj-$(CONFIG_CRYPTO_DEV_VMX) += vmx/
 obj-$(CONFIG_CRYPTO_DEV_SUN4I_SS) += sunxi-ss/
 obj-$(CONFIG_CRYPTO_DEV_ROCKCHIP) += rockchip/
-obj-$(CONFIG_CRYPTO_DEV_CHELSIO) += chelsio/
--
2.5.5
[PATCH] powerpc/opal-irqchip: Use interrupt names if present
Recent versions of OPAL will be able to provide names for the various
OPAL interrupts via a new "opal-interrupts-names" property, so let's use
them to make /proc/interrupts more informative.

This also modernises the code that fetches the interrupt array to use
the helpers provided by the generic code instead of hand-parsing the
property.

Signed-off-by: Benjamin Herrenschmidt
---
 arch/powerpc/platforms/powernv/opal-irqchip.c | 45 ---
 1 file changed, 33 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/opal-irqchip.c b/arch/powerpc/platforms/powernv/opal-irqchip.c
index 998316b..fe9b029 100644
--- a/arch/powerpc/platforms/powernv/opal-irqchip.c
+++ b/arch/powerpc/platforms/powernv/opal-irqchip.c
@@ -183,8 +183,9 @@ void opal_event_shutdown(void)
 int __init opal_event_init(void)
 {
 	struct device_node *dn, *opal_node;
-	const __be32 *irqs;
-	int i, irqlen, rc = 0;
+	const char **names;
+	u32 *irqs;
+	int i, rc = 0;

 	opal_node = of_find_node_by_path("/ibm,opal");
 	if (!opal_node) {
@@ -209,37 +210,57 @@ int __init opal_event_init(void)
 		goto out;
 	}

-	/* Get interrupt property */
-	irqs = of_get_property(opal_node, "opal-interrupts", &irqlen);
-	opal_irq_count = irqs ? (irqlen / 4) : 0;
+	/* Get opal-interrupts property and names if present */
+	rc = of_property_count_u32_elems(opal_node, "opal-interrupts");
+	if (rc < 0)
+		goto out;
+	opal_irq_count = rc;
 	pr_debug("Found %d interrupts reserved for OPAL\n", opal_irq_count);

+	irqs = kzalloc(rc * sizeof(u32), GFP_KERNEL);
+	if (WARN_ON(!irqs))
+		goto out;
+	rc = of_property_read_u32_array(opal_node, "opal-interrupts",
+					irqs, opal_irq_count);
+	if (rc < 0) {
+		pr_err("Error %d reading opal-interrupts array\n", rc);
+		goto out;
+	}
+	names = kzalloc(opal_irq_count * sizeof(char *), GFP_KERNEL);
+	of_property_read_string_array(opal_node, "opal-interrupts-names",
+				      names, opal_irq_count);
+
 	/* Install interrupt handlers */
 	opal_irqs = kcalloc(opal_irq_count, sizeof(*opal_irqs), GFP_KERNEL);
-	for (i = 0; irqs && i < opal_irq_count; i++, irqs++) {
-		unsigned int irq, virq;
+	for (i = 0; i < opal_irq_count; i++) {
+		unsigned int virq;
+		char *name;

 		/* Get hardware and virtual IRQ */
-		irq = be32_to_cpup(irqs);
-		virq = irq_create_mapping(NULL, irq);
+		virq = irq_create_mapping(NULL, irqs[i]);
 		if (!virq) {
-			pr_warn("Failed to map irq 0x%x\n", irq);
+			pr_warn("Failed to map irq 0x%x\n", irqs[i]);
 			continue;
 		}

+		if (names && names[i] && strlen(names[i]))
+			name = kasprintf(GFP_KERNEL, "opal-%s", names[i]);
+		else
+			name = kasprintf(GFP_KERNEL, "opal");
+
 		/* Install interrupt handler */
 		rc = request_irq(virq, opal_interrupt, IRQF_TRIGGER_LOW,
-				 "opal", NULL);
+				 name, NULL);
 		if (rc) {
 			irq_dispose_mapping(virq);
 			pr_warn("Error %d requesting irq %d (0x%x)\n",
-				rc, virq, irq);
+				rc, virq, irqs[i]);
 			continue;
 		}

 		/* Cache IRQ */
 		opal_irqs[i] = virq;
 	}
+	kfree(irqs);
+	kfree(names);
 out:
 	of_node_put(opal_node);
Re: [PATCH v7 3/7] powerpc/mm: Introduce _PAGE_LARGE software pte bits
On 28/11/16 17:17, Aneesh Kumar K.V wrote:
> This patch adds a new software defined pte bit. We use the reserved
> fields of the ISA 3.0 pte definition since we will only be using this
> on DD1 code paths. We can possibly look at removing this code later.
>
> The software bit will be used to differentiate between 64K/4K and 2M
> ptes. This helps in finding the page size mapping by a pte so that we
> can do efficient tlb flush.
>
> We don't support 1G hugetlb pages yet. So we add a DEBUG WARN_ON to
> catch wrong usage.

I thought we do?  In hugetlb_page_init(), don't we register sizes for
every size from 0 to MMU_PAGE_COUNT?

> Signed-off-by: Aneesh Kumar K.V
> ---
>  arch/powerpc/include/asm/book3s/64/hugetlb.h | 20 ++++
>  arch/powerpc/include/asm/book3s/64/pgtable.h |  9 +
>  arch/powerpc/include/asm/book3s/64/radix.h   |  2 ++
>  3 files changed, 31 insertions(+)
>
> diff --git a/arch/powerpc/include/asm/book3s/64/hugetlb.h b/arch/powerpc/include/asm/book3s/64/hugetlb.h
> index d9c283f95e05..c62f14d0bec1 100644
> --- a/arch/powerpc/include/asm/book3s/64/hugetlb.h
> +++ b/arch/powerpc/include/asm/book3s/64/hugetlb.h
> @@ -30,4 +30,24 @@ static inline int hstate_get_psize(struct hstate *hstate)
>  		return mmu_virtual_psize;
>  	}
>  }
> +
> +#define arch_make_huge_pte arch_make_huge_pte
> +static inline pte_t arch_make_huge_pte(pte_t entry, struct vm_area_struct *vma,
> +				       struct page *page, int writable)
> +{
> +	unsigned long page_shift;
> +
> +	if (!cpu_has_feature(CPU_FTR_POWER9_DD1))
> +		return entry;
> +
> +	page_shift = huge_page_shift(hstate_vma(vma));
> +	/*
> +	 * We don't support 1G hugetlb pages yet.
> +	 */
> +	VM_WARN_ON(page_shift == mmu_psize_defs[MMU_PAGE_1G].shift);
> +	if (page_shift == mmu_psize_defs[MMU_PAGE_2M].shift)
> +		return __pte(pte_val(entry) | _PAGE_LARGE);
> +	else
> +		return entry;
> +}
>  #endif
> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
> index 86870c11917b..6f39b9d134a2 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> @@ -26,6 +26,11 @@
>  #define _RPAGE_SW1		0x00800
>  #define _RPAGE_SW2		0x00400
>  #define _RPAGE_SW3		0x00200
> +#define _RPAGE_RSV1		0x1000UL
> +#define _RPAGE_RSV2		0x0800UL
> +#define _RPAGE_RSV3		0x0400UL
> +#define _RPAGE_RSV4		0x0200UL
> +

We use the top 4 bits and not the _SW bits?

>  #ifdef CONFIG_MEM_SOFT_DIRTY
>  #define _PAGE_SOFT_DIRTY	_RPAGE_SW3 /* software: software dirty tracking */
>  #else
> @@ -34,6 +39,10 @@
>  #define _PAGE_SPECIAL		_RPAGE_SW2 /* software: special page */
>  #define _PAGE_DEVMAP		_RPAGE_SW1
>  #define __HAVE_ARCH_PTE_DEVMAP
> +/*
> + * For DD1 only, we need to track whether the pte huge

For POWER9_DD1 only

> + */
> +#define _PAGE_LARGE	_RPAGE_RSV1
>
>  #define _PAGE_PTE		(1ul << 62)	/* distinguishes PTEs from pointers */
> diff --git a/arch/powerpc/include/asm/book3s/64/radix.h b/arch/powerpc/include/asm/book3s/64/radix.h
> index 2a46dea8e1b1..d2c5c064e266 100644
> --- a/arch/powerpc/include/asm/book3s/64/radix.h
> +++ b/arch/powerpc/include/asm/book3s/64/radix.h
> @@ -243,6 +243,8 @@ static inline int radix__pmd_trans_huge(pmd_t pmd)
>
>  static inline pmd_t radix__pmd_mkhuge(pmd_t pmd)
>  {
> +	if (cpu_has_feature(CPU_FTR_POWER9_DD1))
> +		return __pmd(pmd_val(pmd) | _PAGE_PTE | _PAGE_LARGE);
>  	return __pmd(pmd_val(pmd) | _PAGE_PTE);
>  }
>  static inline void radix__pmdp_huge_split_prepare(struct vm_area_struct *vma,
>
Re: [PATCH v7 3/7] powerpc/mm: Introduce _PAGE_LARGE software pte bits
On Wed, 2016-11-30 at 11:14 +1100, Balbir Singh wrote:
> > +#define _RPAGE_RSV1		0x1000UL
> > +#define _RPAGE_RSV2		0x0800UL
> > +#define _RPAGE_RSV3		0x0400UL
> > +#define _RPAGE_RSV4		0x0200UL
> > +
>
> We use the top 4 bits and not the _SW bits?

Correct, welcome to the discussion we've been having the last 2 weeks :-)

We use those bits because we are otherwise short on SW bits (we still
need _PAGE_DEVMAP etc...). We know P9 DD1 is supposed to ignore the
reserved bits so it's a good place holder.

Cheers,
Ben.
Re: [PATCH v11 0/8] powerpc: Implement kexec_file_load()
Hello Andrew,

On Tuesday, November 29, 2016, 13:45:18 BRST, Andrew Morton wrote:
> On Tue, 29 Nov 2016 23:45:46 +1100 Michael Ellerman wrote:
> > This is v11 of the kexec_file_load() for powerpc series.
> >
> > I've stripped this down to the minimum we need, so we can get this in
> > for 4.10.  Any additions can come later incrementally.
>
> This made a bit of a mess of Mimi's series "ima: carry the
> measurement list across kexec v10".
>
> powerpc-ima-get-the-kexec-buffer-passed-by-the-previous-kernel.patch
> ima-on-soft-reboot-restore-the-measurement-list.patch
> ima-permit-duplicate-measurement-list-entries.patch
> ima-maintain-memory-size-needed-for-serializing-the-measurement-list.patch
> powerpc-ima-send-the-kexec-buffer-to-the-next-kernel.patch
> ima-on-soft-reboot-save-the-measurement-list.patch
> ima-store-the-builtin-custom-template-definitions-in-a-list.patch
> ima-support-restoring-multiple-template-formats.patch
> ima-define-a-canonical-binary_runtime_measurements-list-format.patch
> ima-platform-independent-hash-value.patch
>
> I made the syntactic fixes but I won't be testing it.

Sorry about that. We are preparing an updated version rebased on
Michael's patches to address that.

Just to explain where v11 is coming from: kexec_file_load v11 uses a
minimal purgatory taken from kexec-lite, resulting in a purgatory object
without relocations. This avoids the problem of having to have a lot of
code to process purgatory relocations, which is the problem I was trying
to address in the past couple of versions of this patch series.

The new purgatory also doesn't need the kernel to set global variables
to tell it where the stack, TOC and OPAL entrypoint are, so that code
was dropped from setup_purgatory. The other change was to move the code
in elf_util.[ch] into kexec_elf_64.c, with no actual code change.

> > If no one objects I'll merge this via the powerpc tree.  The three
> > kexec patches have been acked by Dave Young (since forever), and have
> > been in linux-next (via akpm's tree) also for a long time.
>
> OK, I'll wait for these to appear in -next and I will await advice on

Mimi and I would like to thank you for your support and help with these
patches, Andrew.

--
Thiago Jung Bauermann
IBM Linux Technology Center
Re: [RFC] fs: add userspace critical mounts event support
On Tue, Nov 29, 2016 at 10:10:56PM +0100, Tom Gundersen wrote:
> On Tue, Nov 15, 2016 at 10:28 AM, Johannes Berg wrote:
> > My argument basically goes like this:
> >
> > First, given good drivers (i.e. using request_firmware_nowait())
> > putting firmware even for a built-in driver into initramfs or not
> > should be a system integrator decision. If they don't need the device
> > that early, it should be possible for them to delay it. Or, perhaps, if
> > the firmware is too big, etc. I'm sure we can all come up with more
> > examples of why you'd want to do it one way or another.
>
> This is how I understood the situation, but I never quite bought
> it. What is wrong with the kernel saying "you must put your module and
> your firmware together"? Sure, people may want to do things
> differently, but what is the real blocker?

0) Firmware upgrades are possible
1) Some firmware is optional
2) Firmware licenses may often not be GPLv2 compatible
3) Some firmwares may be stupidly large (remote-proc)

As such, neither built-in firmware nor shipping the firmware in the
initramfs is always reasonable.

But note that Johannes' main point was that today only a few properly
constructed drivers use the async fw request, and furthermore, given the
lack of a deterministic final-rootfs signal, his proposal was to address
the missing semantics between kernel and userspace with a firmware
kobject uevent fallback helper. This fallback kobject uevent helper
would not give a firm "not found" reply for missing files until it knows
all rootfs firmware paths are ready.

> Fundamentally, it seems to me that if a module needs firmware, it
> makes no sense to make the module available before the firmware. I'm
> probably missing something though :)

You are right, but just consider all the above.

  Luis
Re: [PATCH v11 0/8] powerpc: Implement kexec_file_load()
On Tue, 29 Nov 2016 23:45:46 +1100 Michael Ellerman wrote:

> This is v11 of the kexec_file_load() for powerpc series.
>
> I've stripped this down to the minimum we need, so we can get this in
> for 4.10.  Any additions can come later incrementally.

This made a bit of a mess of Mimi's series "ima: carry the
measurement list across kexec v10".

powerpc-ima-get-the-kexec-buffer-passed-by-the-previous-kernel.patch
ima-on-soft-reboot-restore-the-measurement-list.patch
ima-permit-duplicate-measurement-list-entries.patch
ima-maintain-memory-size-needed-for-serializing-the-measurement-list.patch
powerpc-ima-send-the-kexec-buffer-to-the-next-kernel.patch
ima-on-soft-reboot-save-the-measurement-list.patch
ima-store-the-builtin-custom-template-definitions-in-a-list.patch
ima-support-restoring-multiple-template-formats.patch
ima-define-a-canonical-binary_runtime_measurements-list-format.patch
ima-platform-independent-hash-value.patch

I made the syntactic fixes but I won't be testing it.

> If no one objects I'll merge this via the powerpc tree.  The three
> kexec patches have been acked by Dave Young (since forever), and have
> been in linux-next (via akpm's tree) also for a long time.

OK, I'll wait for these to appear in -next and I will await advice on
Re: [RFC] fs: add userspace critical mounts event support
On Wed, Nov 09, 2016 at 03:21:07AM -0800, Andy Lutomirski wrote:
> On Wed, Nov 9, 2016 at 1:13 AM, Daniel Wagner wrote:
> > [CC: added Harald]
> >
> > As Harald pointed out over a beer yesterday evening, there is at least
> > one more reason why UMH isn't obsolete. The ordering of the firmware
> > loading might be of importance. Say you want to greet the user with a
> > splash screen really early on, the graphic card firmware should be
> > loaded first. Also the automotive world has this fancy requirement that
> > the rear camera must be on the screen within 2 seconds. So controlling
> > the firmware loading order is of importance (e.g. also do not overcommit
> > the I/O bandwidth with not-so-important firmwares). A user space helper
> > is able to prioritize the requests according to the use case.
>
> That seems like a valid problem, but I don't think that UMH adequately
> solves it. Sure, loading firmware in the right order avoids a >2sec
> delay due to firmware loading, but what happens when you have a slow
> USB device that *doesn't* need firmware plugged in to your car's shiny
> USB port when you start the car?
>
> It seems to me that this use case requires explicit control over
> device probing and, if that gets added, you get your firmware ordering
> for free (just probe the important devices first).

In theory this is correct; the problem comes with the flexibility we
have created with pivot_root() and friends (another is a mount on
/lib/firmware), which enables system integrators to pick and choose the
"real rootfs" to be a few layers away from the first fs picked up by the
kernel. In providing this flexibility we did not envision, nor have we
devised, signals to enable a deterministic lookup given the requirements
such lookups might have -- in this case the requirements are that the
direct fs is ready and kosher and that all the possible paths for
firmware are ready.

As you can imagine, this race is not only an issue for firmware but a
generic issue. The generic race on the fs lookup requires a fs
availability event, and addressing fs suspend. I'll note that the race
on init is addressed today *only* by the firmware UMH (its UMH is the
kobject uevent and optionally a custom binary) by using the UMH lock.
During a cleanup by Daniel recently I realized it was bogus to take the
UMH lock if the UMH was not used; it turns out this would still expose
the direct fs lookup to a race, though. This begs the question whether
the UMH lock should be removed / shared with the other kernel UMHs, or
a generic solution provided for direct fs lookup with some requirements
specified.

This is all a mess, so I've documented each component and the issues /
ideas we've discussed so far separately: the firmware UMH (which we
should probably rebrand the "firmware kobject uevent helper" to avoid
confusion) [0], the real kernel usermode helper [1], and the new common
kernel file loader [2].

[0] https://kernelnewbies.org/KernelProjects/firmware-class-enhancements
[1] https://kernelnewbies.org/KernelProjects/usermode-helper-enhancements
[2] https://kernelnewbies.org/KernelProjects/common-kernel-loader

  Luis
Re: powerpc/ps3: Fix system hang with GCC 5 builds
On Tue, Nov 29, 2016 at 10:47:32AM -0800, Geoff Levand wrote:
> GCC 5 generates different code for this bootwrapper null check
> that causes the PS3 to hang very early in its bootup. This
> check is of limited value, so just get rid of it.
>
> Signed-off-by: Geoff Levand
> ---
>  arch/powerpc/boot/ps3-head.S | 5 -
>  arch/powerpc/boot/ps3.c      | 8 +---
>  2 files changed, 1 insertion(+), 12 deletions(-)

This is not the correct way to submit patches for inclusion in the
stable kernel tree.

Please read Documentation/stable_kernel_rules.txt for how to do this
properly.
Re: [RFC] fs: add userspace critical mounts event support
On Tue, Nov 15, 2016 at 10:28 AM, Johannes Berg wrote:
> My argument basically goes like this:
>
> First, given good drivers (i.e. using request_firmware_nowait())
> putting firmware even for a built-in driver into initramfs or not
> should be a system integrator decision. If they don't need the device
> that early, it should be possible for them to delay it. Or, perhaps, if
> the firmware is too big, etc. I'm sure we can all come up with more
> examples of why you'd want to do it one way or another.

This is how I understood the situation, but I never quite bought it.
What is wrong with the kernel saying "you must put your module and your
firmware together"? Sure, people may want to do things differently, but
what is the real blocker?

Fundamentally, it seems to me that if a module needs firmware, it makes
no sense to make the module available before the firmware. I'm probably
missing something though :)

Cheers,

Tom
Re: [PATCH v7 2/7] powerpc/mm/hugetlb: Handle hugepage size supported by hash config
On 28/11/16 17:16, Aneesh Kumar K.V wrote:
> W.r.t the hash page table config, we support 16MB and 16GB as the
> hugepage sizes. Update hstate_get_psize to handle 16M and 16G.
>
> Signed-off-by: Aneesh Kumar K.V
> ---
>  arch/powerpc/include/asm/book3s/64/hugetlb.h | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/arch/powerpc/include/asm/book3s/64/hugetlb.h b/arch/powerpc/include/asm/book3s/64/hugetlb.h
> index 499268045306..d9c283f95e05 100644
> --- a/arch/powerpc/include/asm/book3s/64/hugetlb.h
> +++ b/arch/powerpc/include/asm/book3s/64/hugetlb.h
> @@ -21,6 +21,10 @@ static inline int hstate_get_psize(struct hstate *hstate)
>  		return MMU_PAGE_2M;
>  	else if (shift == mmu_psize_defs[MMU_PAGE_1G].shift)
>  		return MMU_PAGE_1G;
> +	else if (shift == mmu_psize_defs[MMU_PAGE_16M].shift)
> +		return MMU_PAGE_16M;
> +	else if (shift == mmu_psize_defs[MMU_PAGE_16G].shift)
> +		return MMU_PAGE_16G;

Could we reorder this?  We check for 2M, 1G, 16M and 16G.  The likely
sizes are 2M and 16M.  Can we have those upfront, so that the order of
checks is 2M, 16M, 1G and 16G?

Balbir
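Balbir's suggested reordering would look something like the following
stand-alone model. This is a sketch only: the shift values in
psize_shift[] are the usual page-size shifts and stand in for the real
mmu_psize_defs[] table, and the enum values are illustrative, not the
kernel's actual MMU_PAGE_* indices.

```c
#include <assert.h>

/* Illustrative page-size indices, standing in for the kernel's MMU_PAGE_* */
enum { MMU_PAGE_2M, MMU_PAGE_16M, MMU_PAGE_1G, MMU_PAGE_16G };

/* Stand-in for mmu_psize_defs[]: the shift for each supported size */
static const unsigned int psize_shift[] = {
	[MMU_PAGE_2M]  = 21,
	[MMU_PAGE_16M] = 24,
	[MMU_PAGE_1G]  = 30,
	[MMU_PAGE_16G] = 34,
};

/* The common sizes (2M on radix, 16M on hash) are tested first */
static int hstate_get_psize(unsigned int shift)
{
	if (shift == psize_shift[MMU_PAGE_2M])
		return MMU_PAGE_2M;
	else if (shift == psize_shift[MMU_PAGE_16M])
		return MMU_PAGE_16M;
	else if (shift == psize_shift[MMU_PAGE_1G])
		return MMU_PAGE_1G;
	else if (shift == psize_shift[MMU_PAGE_16G])
		return MMU_PAGE_16G;
	return -1;	/* unsupported hugepage shift */
}
```

The only change from the quoted patch is the order of the comparisons;
the mapping from shift to page-size index is unchanged.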
[PATCH] powerpc/8xx: xmon compile fix
Signed-off-by: Nicholas Piggin
---
 arch/powerpc/xmon/xmon.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 7605455..435f5f5 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -1213,10 +1213,13 @@ bpt_cmds(void)
 {
 	int cmd;
 	unsigned long a;
-	int mode, i;
+	int i;
 	struct bpt *bp;
+#ifndef CONFIG_8xx
+	int mode;
 	const char badaddr[] = "Only kernel addresses are permitted "
 		"for breakpoints\n";
+#endif

 	cmd = inchar();
 	switch (cmd) {
--
2.10.2
Re: [PATCH] powerpc/8xx: xmon compile fix
On 29/11/2016 at 09:56, Nicholas Piggin wrote:
> Signed-off-by: Nicholas Piggin
> ---
>  arch/powerpc/xmon/xmon.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
> index 7605455..435f5f5 100644
> --- a/arch/powerpc/xmon/xmon.c
> +++ b/arch/powerpc/xmon/xmon.c
> @@ -1213,10 +1213,13 @@ bpt_cmds(void)
>  {
>  	int cmd;
>  	unsigned long a;
> -	int mode, i;
> +	int i;
>  	struct bpt *bp;
> +#ifndef CONFIG_8xx

CONFIG_8xx is deprecated (ref arch/powerpc/platforms/Kconfig.cputype).
CONFIG_PPC_8xx should be used instead.

> +	int mode;

You could also have moved this declaration inside the switch {,
something like:

	switch (cmd) {
#ifndef CONFIG_8xx
+	int mode;
	case 'd':

Christophe

>  	const char badaddr[] = "Only kernel addresses are permitted "
>  		"for breakpoints\n";
> +#endif
>
>  	cmd = inchar();
>  	switch (cmd) {
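Christophe's suggestion works because C allows a declaration at the top
of the switch body, before the first case label, where it is in scope
for every case but is never initialized by falling through. A minimal
stand-alone illustration (handle() and the command letters are made up
for the example, and CONFIG_PPC_8xx is assumed undefined here):

```c
#include <assert.h>

static int handle(int cmd)
{
	switch (cmd) {
#ifndef CONFIG_PPC_8xx
	int mode;	/* declared once, visible in the cases below */
	case 'd':
		mode = 4;
		return mode;
#endif
	case 'x':
		return 0;
	default:
		return -1;
	}
}
```

When CONFIG_PPC_8xx is defined, both the declaration and the case that
uses it compile out together, which is the effect the review comment is
after.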
Re: [PATCH] powerpc/8xx: xmon compile fix
On Tue, 29 Nov 2016 10:06:43 +0100 Christophe LEROY wrote:

> On 29/11/2016 at 09:56, Nicholas Piggin wrote:
> > Signed-off-by: Nicholas Piggin
> > ---
> >  arch/powerpc/xmon/xmon.c | 5 -
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
> > index 7605455..435f5f5 100644
> > --- a/arch/powerpc/xmon/xmon.c
> > +++ b/arch/powerpc/xmon/xmon.c
> > @@ -1213,10 +1213,13 @@ bpt_cmds(void)
> >  {
> >  	int cmd;
> >  	unsigned long a;
> > -	int mode, i;
> > +	int i;
> >  	struct bpt *bp;
> > +#ifndef CONFIG_8xx
>
> CONFIG_8xx is deprecated (ref arch/powerpc/platforms/Kconfig.cputype).
> CONFIG_PPC_8xx should be used instead.

Thanks for picking that up. Michael, can you adjust it if you merge
please?

> > +	int mode;
>
> You could also have moved this declaration inside the switch {,
> something like

I tried that, couldn't decide that it was better (you also need
badaddr).

Thanks,
Nick
[PATCH 2/2] powerpc/8xx: Implement hw_breakpoint
This patch implements HW breakpoint on the 8xx. The 8xx has the
capability to manage HW breakpoints, which is slightly different than
BOOK3S:

1/ The breakpoint match doesn't trigger a DSI exception but a dedicated
   data breakpoint exception.
2/ The breakpoint happens after the instruction has completed, no need
   to single step or emulate the instruction.
3/ The matched address is not set in DAR but in BAR.
4/ The DABR register doesn't exist; instead we have registers LCTRL1,
   LCTRL2 and the CMPx registers.
5/ The match on one comparator is not on a double word but on a single
   word.

The patch does:

1/ Prepare the dedicated registers in the call to __set_dabr(). In
   order to emulate the double word handling of BOOK3S, comparator E is
   set to the DABR address value and comparator F to address + 4. Then
   breakpoint 1 is set to match comparator E or F.
2/ Skip the singlestepping stage when compiled for CONFIG_PPC_8xx.
3/ Implement the exception. In that exception, the matched address is
   taken from SPRN_BAR and managed as if it was from SPRN_DAR.
4/ I/D TLB error exception routines perform a tlbie on bad TLBs. That
   tlbie triggers the breakpoint exception when performed on the
   breakpoint address. For this reason, the routine returns if the
   match is from one of those two tlbie.
Signed-off-by: Christophe Leroy
---
 arch/powerpc/Kconfig                | 2 +-
 arch/powerpc/include/asm/reg_8xx.h  | 7 +++
 arch/powerpc/kernel/head_8xx.S      | 28 +++-
 arch/powerpc/kernel/hw_breakpoint.c | 6 +-
 arch/powerpc/kernel/process.c       | 22 ++
 5 files changed, 62 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 5b736e4..75459cf 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -113,7 +113,7 @@ config PPC
 	select HAVE_PERF_REGS
 	select HAVE_PERF_USER_STACK_DUMP
 	select HAVE_REGS_AND_STACK_ACCESS_API
-	select HAVE_HW_BREAKPOINT if PERF_EVENTS && PPC_BOOK3S
+	select HAVE_HW_BREAKPOINT if PERF_EVENTS && (PPC_BOOK3S || PPC_8xx)
 	select ARCH_WANT_IPC_PARSE_VERSION
 	select SPARSE_IRQ
 	select IRQ_DOMAIN

diff --git a/arch/powerpc/include/asm/reg_8xx.h b/arch/powerpc/include/asm/reg_8xx.h
index 0197e12..8c5c7f2 100644
--- a/arch/powerpc/include/asm/reg_8xx.h
+++ b/arch/powerpc/include/asm/reg_8xx.h
@@ -29,6 +29,13 @@
 #define SPRN_EIE	80	/* External interrupt enable (EE=1, RI=1) */
 #define SPRN_EID	81	/* External interrupt disable (EE=0, RI=1) */
 
+/* Debug registers */
+#define SPRN_CMPE	152
+#define SPRN_CMPF	153
+#define SPRN_LCTRL1	156
+#define SPRN_LCTRL2	157
+#define SPRN_BAR	159
+
 /* Commands. Only the first few are available to the instruction cache. */
 #define	IDC_ENABLE	0x0200	/* Cache enable */

diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index fb133a1..d4f3335 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -478,6 +478,7 @@ InstructionTLBError:
 	andis.	r10,r5,0x4000
 	beq+	1f
 	tlbie	r4
+itlbie:
 	/* 0x400 is InstructionAccess exception, needed by bad_page_fault() */
 1:	EXC_XFER_LITE(0x400, handle_page_fault)
 
@@ -502,6 +503,7 @@ DARFixed:/* Return from dcbx instruction bug workaround */
 	andis.	r10,r5,0x4000
 	beq+	1f
 	tlbie	r4
+dtlbie:
 1:	li	r10,RPN_PATTERN
 	mtspr	SPRN_DAR,r10	/* Tag DAR, to be used in DTLB Error */
 	/* 0x300 is DataAccess exception, needed by bad_page_fault() */
@@ -519,7 +521,27 @@ DARFixed:/* Return from dcbx instruction bug workaround */
  * support of breakpoints and such. Someday I will get around to
  * using them.
  */
-	EXCEPTION(0x1c00, Trap_1c, unknown_exception, EXC_XFER_EE)
+	. = 0x1c00
+DataBreakpoint:
+	EXCEPTION_PROLOG_0
+	mfcr	r10
+	mfspr	r11, SPRN_SRR0
+	cmplwi	cr0, r11, (dtlbie - PAGE_OFFSET)@l
+	cmplwi	cr7, r11, (itlbie - PAGE_OFFSET)@l
+	beq-	cr0, 11f
+	beq-	cr7, 11f
+	EXCEPTION_PROLOG_1
+	EXCEPTION_PROLOG_2
+	addi	r3,r1,STACK_FRAME_OVERHEAD
+	mfspr	r4,SPRN_BAR
+	stw	r4,_DAR(r11)
+	mfspr	r5,SPRN_DSISR
+	EXC_XFER_EE(0x1c00, do_break)
+11:
+	mtcr	r10
+	EXCEPTION_EPILOG_0
+	rfi
+
 	EXCEPTION(0x1d00, Trap_1d, unknown_exception, EXC_XFER_EE)
 	EXCEPTION(0x1e00, Trap_1e, unknown_exception, EXC_XFER_EE)
 	EXCEPTION(0x1f00, Trap_1f, unknown_exception, EXC_XFER_EE)
 
@@ -870,6 +892,10 @@ initial_mmu:
 	lis	r8, IDC_ENABLE@h
 	mtspr	SPRN_DC_CST, r8
 #endif
+	/* Disable debug mode entry on data breakpoints */
+	mfspr	r8, SPRN_DER
+	rlwinm	r8, r8, 0, ~0x8
+	mtspr	SPRN_DER, r8
 	blr

diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c
index 03d089b..4b70a53
Re: [PATCH v7 0/7] Radix pte update tlbflush optimizations.
On 28/11/16 17:16, Aneesh Kumar K.V wrote:
> Changes from v6:
> * restrict the new pte bit to radix and DD1 config
>
> Changes from V5:
> Switch to use pte bits to track page size.

This series looks much better. I wish there was a better way of avoiding having to pass the address to the ptep functions, but I guess we get to live with it forever.

Balbir Singh.
[PATCH 0/2] powerpc: hw_breakpoint for book3s/32 and 8xx
This series provides HW breakpoints on 32-bit Book3S and 8xx.

Tested on mpc8321
Tested on mpc885

Christophe Leroy (2):
  powerpc/32: Enable HW_BREAKPOINT on BOOK3S
  powerpc/8xx: Implement hw_breakpoint

 arch/powerpc/Kconfig                 |  2 +-
 arch/powerpc/include/asm/processor.h |  2 +-
 arch/powerpc/include/asm/reg_8xx.h   |  7 +++
 arch/powerpc/kernel/head_8xx.S       | 28 +++-
 arch/powerpc/kernel/hw_breakpoint.c  |  6 +-
 arch/powerpc/kernel/process.c        | 22 ++
 6 files changed, 63 insertions(+), 4 deletions(-)

-- 
2.10.1
[PATCH] powerpc/32: tlbie provide L operand explicitly
The single-operand form of tlbie used to be accepted, with the second operand (L) being implicitly 0. Newer binutils reject this. Change the remaining single-operand tlbie instructions to have an explicit 0 second argument.

Signed-off-by: Nicholas Piggin
---
 arch/powerpc/include/asm/ppc_asm.h      | 2 +-
 arch/powerpc/kernel/head_32.S           | 2 +-
 arch/powerpc/kernel/head_8xx.S          | 8
 arch/powerpc/kernel/swsusp_32.S         | 2 +-
 arch/powerpc/mm/hash_low_32.S           | 8
 arch/powerpc/mm/mmu_decl.h              | 2 +-
 arch/powerpc/platforms/powermac/sleep.S | 2 +-
 7 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc_asm.h b/arch/powerpc/include/asm/ppc_asm.h
index c73750b..5a0a2f9 100644
--- a/arch/powerpc/include/asm/ppc_asm.h
+++ b/arch/powerpc/include/asm/ppc_asm.h
@@ -416,7 +416,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_601)
 	lis	r4,KERNELBASE@h;	\
 	.machine push;			\
 	.machine "power4";		\
-0:	tlbie	r4;			\
+0:	tlbie	r4,0;			\
 	.machine pop;			\
 	addi	r4,r4,0x1000;		\
 	bdnz	0b

diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
index 9d96354..af99545 100644
--- a/arch/powerpc/kernel/head_32.S
+++ b/arch/powerpc/kernel/head_32.S
@@ -1110,7 +1110,7 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_USE_HIGH_BATS)
 flush_tlbs:
 	lis	r10, 0x40
 1:	addic.	r10, r10, -0x1000
-	tlbie	r10
+	tlbie	r10,0
 	bgt	1b
 	sync
 	blr

diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index fb133a1..b967bfa 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -314,9 +314,9 @@ SystemCall:
 #ifdef CONFIG_8xx_CPU15
 #define INVALIDATE_ADJACENT_PAGES_CPU15(tmp, addr)	\
 	addi	tmp, addr, PAGE_SIZE;	\
-	tlbie	tmp;			\
+	tlbie	tmp,0;			\
 	addi	tmp, addr, -PAGE_SIZE;	\
-	tlbie	tmp
+	tlbie	tmp,0
 #else
 #define INVALIDATE_ADJACENT_PAGES_CPU15(tmp, addr)
 #endif
@@ -477,7 +477,7 @@ InstructionTLBError:
 	mr	r5,r9
 	andis.	r10,r5,0x4000
 	beq+	1f
-	tlbie	r4
+	tlbie	r4,0
 	/* 0x400 is InstructionAccess exception, needed by bad_page_fault() */
 1:	EXC_XFER_LITE(0x400, handle_page_fault)
 
@@ -501,7 +501,7 @@ DARFixed:/* Return from dcbx instruction bug workaround */
 	mfspr	r4,SPRN_DAR
 	andis.	r10,r5,0x4000
 	beq+	1f
-	tlbie	r4
+	tlbie	r4,0
 1:	li	r10,RPN_PATTERN
 	mtspr	SPRN_DAR,r10	/* Tag DAR, to be used in DTLB Error */
 	/* 0x300 is DataAccess exception, needed by bad_page_fault() */

diff --git a/arch/powerpc/kernel/swsusp_32.S b/arch/powerpc/kernel/swsusp_32.S
index ba4dee3..cb26ab3 100644
--- a/arch/powerpc/kernel/swsusp_32.S
+++ b/arch/powerpc/kernel/swsusp_32.S
@@ -302,7 +302,7 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_USE_HIGH_BATS)
 	/* Flush all TLBs */
 	lis	r4,0x1000
 1:	addic.	r4,r4,-0x1000
-	tlbie	r4
+	tlbie	r4,0
 	bgt	1b
 	sync

diff --git a/arch/powerpc/mm/hash_low_32.S b/arch/powerpc/mm/hash_low_32.S
index 09cc50c..0675034 100644
--- a/arch/powerpc/mm/hash_low_32.S
+++ b/arch/powerpc/mm/hash_low_32.S
@@ -352,7 +352,7 @@ _GLOBAL(hash_page_patch_A)
 	 */
 	andi.	r6,r6,_PAGE_HASHPTE
 	beq+	10f		/* no PTE: go look for an empty slot */
-	tlbie	r4
+	tlbie	r4,0
 
 	addis	r4,r7,htab_hash_searches@ha
 	lwz	r6,htab_hash_searches@l(r4)
@@ -612,7 +612,7 @@ _GLOBAL(flush_hash_patch_B)
 3:	li	r0,0
 	STPTE	r0,0(r12)	/* invalidate entry */
 4:	sync
-	tlbie	r4		/* in hw tlb too */
+	tlbie	r4,0		/* in hw tlb too */
 	sync
 
 8:	ble	cr1,9f		/* if all ptes checked */
@@ -661,7 +661,7 @@ _GLOBAL(_tlbie)
 	stwcx.	r8,0,r9
 	bne-	10b
 	eieio
-	tlbie	r3
+	tlbie	r3,0
 	sync
 	TLBSYNC
 	li	r0,0
@@ -670,7 +670,7 @@ _GLOBAL(_tlbie)
 	SYNC_601
 	isync
 #else /* CONFIG_SMP */
-	tlbie	r3
+	tlbie	r3,0
 	sync
 #endif /* CONFIG_SMP */
 	blr

diff --git a/arch/powerpc/mm/mmu_decl.h b/arch/powerpc/mm/mmu_decl.h
index f988db6..9b9e780 100644
--- a/arch/powerpc/mm/mmu_decl.h
+++ b/arch/powerpc/mm/mmu_decl.h
@@ -55,7 +55,7 @@ extern void _tlbil_pid_noind(unsigned int pid);
 static inline void _tlbil_va(unsigned long address, unsigned int pid,
 			     unsigned int tsize, unsigned int ind)
 {
-	asm volatile ("tlbie %0; sync" : : "r" (address) : "memory");
+	asm volatile ("tlbie
Re: [PATCH v3 1/3] powernv:idle: Add IDLE_STATE_ENTER_SEQ_NORET macro
On 10/11/16 18:54, Gautham R. Shenoy wrote:
> From: "Gautham R. Shenoy"
>
> Currently all the low-power idle states are expected to wake up
> at reset vector 0x100, which is why the macro IDLE_STATE_ENTER_SEQ
> that puts the CPU into an idle state never returns.
>
> On ISA_300, when the ESL and EC bits in the PSSCR are zero, the
> CPU is expected to wake up at the next instruction of the idle
> instruction.
>
> This patch adds a new macro named IDLE_STATE_ENTER_SEQ_NORET for the

I think something like IDLE_STATE_ENTER_SEQ_LOSE_CTX would be better?

> no-return variant and reuses the name IDLE_STATE_ENTER_SEQ
> for a variant that allows resuming operation at the instruction next
> to the idle-instruction.
>
> +
> +#define	IDLE_STATE_ENTER_SEQ_NORET(IDLE_INST)		\
> +	IDLE_STATE_ENTER_SEQ(IDLE_INST)			\

So we start off with both as the same?

> 	b	.
> #endif /* CONFIG_PPC_P7_NAP */

Balbir
Re: [PATCH v7 1/7] powerpc/mm: Rename hugetlb-radix.h to hugetlb.h
On 28/11/16 17:16, Aneesh Kumar K.V wrote:
> We will start moving some book3s specific hugetlb functions there.

You mean for both radix and hash, right?

Balbir
[PATCH 1/2] powerpc/32: Enable HW_BREAKPOINT on BOOK3S
BOOK3S also has the DABR register and the capability to handle data breakpoints, so this patch enables it on all BOOK3S, not only 64-bit.

Signed-off-by: Christophe Leroy
---
 arch/powerpc/Kconfig                 | 2 +-
 arch/powerpc/include/asm/processor.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 2d86643..5b736e4 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -113,7 +113,7 @@ config PPC
 	select HAVE_PERF_REGS
 	select HAVE_PERF_USER_STACK_DUMP
 	select HAVE_REGS_AND_STACK_ACCESS_API
-	select HAVE_HW_BREAKPOINT if PERF_EVENTS && PPC_BOOK3S_64
+	select HAVE_HW_BREAKPOINT if PERF_EVENTS && PPC_BOOK3S
 	select ARCH_WANT_IPC_PARSE_VERSION
 	select SPARSE_IRQ
 	select IRQ_DOMAIN

diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index 1ba8144..2053a4b 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -225,6 +225,7 @@ struct thread_struct {
 #ifdef CONFIG_PPC64
 	unsigned long	start_tb;	/* Start purr when proc switched in */
 	unsigned long	accum_tb;	/* Total accumulated purr for process */
+#endif
 #ifdef CONFIG_HAVE_HW_BREAKPOINT
 	struct perf_event *ptrace_bps[HBP_NUM];
 	/*
@@ -233,7 +234,6 @@ struct thread_struct {
 	 */
 	struct perf_event *last_hit_ubp;
 #endif /* CONFIG_HAVE_HW_BREAKPOINT */
-#endif
 	struct arch_hw_breakpoint hw_brk; /* info on the hardware breakpoint */
 	unsigned long	trap_nr;	/* last trap # on this thread */
 	u8 load_fp;
-- 
2.10.1
[PATCH v2] KVM/PPC Patch for KVM issue in real mode
Some KVM functions for book3s_hv are called in real mode. In real mode the top 4 bits of the address space are ignored, hence an address beginning with 0xc000+offset is the same as 0xd000+offset. The issue was observed when a KVM memslot resolution led to random values when accessed from kvmppc_h_enter(). The issue is hit if the KVM host is running with a page size of 4K, since kvzalloc() looks at size < PAGE_SIZE. On systems with 64K pages the issue is not observed easily; it largely depends on the size of the structure being allocated.

The proposed fix moves all KVM allocations for book3s_hv to kzalloc() until all structures used in real mode are audited. For safety, allocations are moved to kmalloc space. The impact is a large allocation on systems with 4K page size.

Signed-off-by: Balbir Singh
---
Changelog v2:
  Fix build failures reported by the kbuild test robot
  http://www.spinics.net/lists/kvm/msg141727.html

 arch/powerpc/include/asm/kvm_host.h | 19 +++
 include/linux/kvm_host.h            | 11 +++
 virt/kvm/kvm_main.c                 |  2 +-
 3 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index f15713a..53f5172 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -734,6 +734,25 @@ struct kvm_vcpu_arch {
 #define __KVM_HAVE_ARCH_WQP
 #define __KVM_HAVE_CREATE_DEVICE
 
+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+#define __KVM_HAVE_ARCH_VZALLOC_OVERRIDE
+
+/*
+ * KVM uses some of these data structures -- the ones
+ * from kvzalloc() in real mode. If the data structure
+ * happens to come from a vmalloc'd range then its access
+ * in real mode will lead to problems due to the aliasing
+ * issue - (top 4 bits are ignored).
+ * A 0xd000+offset will point to a 0xc000+offset in realmode.
+ * Hence we want our data structures to come from kmalloc'd
+ * regions, so that we don't have these aliasing issues
+ */
+static inline void *kvm_arch_vzalloc(unsigned long size)
+{
+	return kzalloc(size, GFP_KERNEL);
+}
+#endif
+
 static inline void kvm_arch_hardware_disable(void) {}
 static inline void kvm_arch_hardware_unsetup(void) {}
 static inline void kvm_arch_sync_events(struct kvm *kvm) {}

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 01c0b9c..0c88af5 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -19,6 +19,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -793,6 +794,16 @@ static inline bool kvm_arch_has_noncoherent_dma(struct kvm *kvm)
 	return false;
 }
 #endif
+
+#ifdef __KVM_HAVE_ARCH_VZALLOC_OVERRIDE
+static void *kvm_arch_vzalloc(unsigned long size);
+#else
+static inline void *kvm_arch_vzalloc(unsigned long size)
+{
+	return vzalloc(size);
+}
+#endif
+
 #ifdef __KVM_HAVE_ARCH_ASSIGNED_DEVICE
 void kvm_arch_start_assignment(struct kvm *kvm);
 void kvm_arch_end_assignment(struct kvm *kvm);

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index fbf04c0..57e3dca 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -689,7 +689,7 @@ static struct kvm *kvm_create_vm(unsigned long type)
 void *kvm_kvzalloc(unsigned long size)
 {
 	if (size > PAGE_SIZE)
-		return vzalloc(size);
+		return kvm_arch_vzalloc(size);
 	else
 		return kzalloc(size, GFP_KERNEL);
 }
-- 
2.5.5
Re: [PATCH net 00/16] net: fix fixed-link phydev leaks
From: Johan Hovold
Date: Mon, 28 Nov 2016 19:24:53 +0100

> This series fixes failures to deregister and free fixed-link phydevs
> that have been registered using the of_phy_register_fixed_link()
> interface.
>
> All but two drivers currently fail to do this and this series fixes most
> of them with the exception of a staging driver and the stmmac drivers
> which will be fixed by follow-on patches.
>
> Included are also a couple of fixes for related of-node leaks.
>
> Note that all patches except the of_mdio one have been compile-tested
> only.
>
> Also note that the series is against net due to dependencies not yet in
> net-next.

Series applied, thanks Johan.
[PATCH kernel v7 1/7] powerpc/iommu: Pass mm_struct to init/cleanup helpers
We are going to get rid of @current references in mmu_context_book3s64.c and cache mm_struct in the VFIO container. Since mm_context_t does not have reference counting, we will be using mm_struct, which does have the reference counter.

This changes mm_iommu_init/mm_iommu_cleanup to receive mm_struct rather than mm_context_t (which is embedded into mm).

This should not cause any behavioral change.

Signed-off-by: Alexey Kardashevskiy
Reviewed-by: David Gibson
---
 arch/powerpc/include/asm/mmu_context.h | 4 ++--
 arch/powerpc/kernel/setup-common.c     | 2 +-
 arch/powerpc/mm/mmu_context_book3s64.c | 4 ++--
 arch/powerpc/mm/mmu_context_iommu.c    | 9 +
 4 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index 5c45114..424844b 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -23,8 +23,8 @@ extern bool mm_iommu_preregistered(void);
 extern long mm_iommu_get(unsigned long ua, unsigned long entries,
 		struct mm_iommu_table_group_mem_t **pmem);
 extern long mm_iommu_put(struct mm_iommu_table_group_mem_t *mem);
-extern void mm_iommu_init(mm_context_t *ctx);
-extern void mm_iommu_cleanup(mm_context_t *ctx);
+extern void mm_iommu_init(struct mm_struct *mm);
+extern void mm_iommu_cleanup(struct mm_struct *mm);
 extern struct mm_iommu_table_group_mem_t *mm_iommu_lookup(unsigned long ua,
 		unsigned long size);
 extern struct mm_iommu_table_group_mem_t *mm_iommu_find(unsigned long ua,

diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index 270ee30..f516ac5 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -915,7 +915,7 @@ void __init setup_arch(char **cmdline_p)
 	init_mm.context.pte_frag = NULL;
 #endif
 #ifdef CONFIG_SPAPR_TCE_IOMMU
-	mm_iommu_init(&init_mm.context);
+	mm_iommu_init(&init_mm);
 #endif
 	irqstack_early_init();
 	exc_lvl_early_init();

diff --git a/arch/powerpc/mm/mmu_context_book3s64.c b/arch/powerpc/mm/mmu_context_book3s64.c
index b114f8b..ad82735 100644
--- a/arch/powerpc/mm/mmu_context_book3s64.c
+++ b/arch/powerpc/mm/mmu_context_book3s64.c
@@ -115,7 +115,7 @@ int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
 	mm->context.pte_frag = NULL;
 #endif
 #ifdef CONFIG_SPAPR_TCE_IOMMU
-	mm_iommu_init(&mm->context);
+	mm_iommu_init(mm);
 #endif
 	return 0;
 }
@@ -160,7 +160,7 @@ static inline void destroy_pagetable_page(struct mm_struct *mm)
 void destroy_context(struct mm_struct *mm)
 {
 #ifdef CONFIG_SPAPR_TCE_IOMMU
-	mm_iommu_cleanup(&mm->context);
+	mm_iommu_cleanup(mm);
 #endif
 
 #ifdef CONFIG_PPC_ICSWX

diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
index e0f1c33..ad2e575 100644
--- a/arch/powerpc/mm/mmu_context_iommu.c
+++ b/arch/powerpc/mm/mmu_context_iommu.c
@@ -373,16 +373,17 @@ void mm_iommu_mapped_dec(struct mm_iommu_table_group_mem_t *mem)
 }
 EXPORT_SYMBOL_GPL(mm_iommu_mapped_dec);
 
-void mm_iommu_init(mm_context_t *ctx)
+void mm_iommu_init(struct mm_struct *mm)
 {
-	INIT_LIST_HEAD_RCU(&ctx->iommu_group_mem_list);
+	INIT_LIST_HEAD_RCU(&mm->context.iommu_group_mem_list);
 }
 
-void mm_iommu_cleanup(mm_context_t *ctx)
+void mm_iommu_cleanup(struct mm_struct *mm)
 {
 	struct mm_iommu_table_group_mem_t *mem, *tmp;
 
-	list_for_each_entry_safe(mem, tmp, &ctx->iommu_group_mem_list, next) {
+	list_for_each_entry_safe(mem, tmp, &mm->context.iommu_group_mem_list,
+			next) {
 		list_del_rcu(&mem->next);
 		mm_iommu_do_free(mem);
 	}
-- 
2.5.0.rc3
Re: [PATCH v11 0/8] powerpc: Implement kexec_file_load()
Andrew Morton writes:
> On Tue, 29 Nov 2016 23:45:46 +1100 Michael Ellerman wrote:
>
>> This is v11 of the kexec_file_load() for powerpc series.
>>
>> I've stripped this down to the minimum we need, so we can get this in for 4.10.
>> Any additions can come later incrementally.
>
> This made a bit of a mess of Mimi's series "ima: carry the
> measurement list across kexec v10".

Urk, sorry about that. I didn't realise there was a big dependency between them, but I guess I should have tried to do the rebase.

> powerpc-ima-get-the-kexec-buffer-passed-by-the-previous-kernel.patch
> ima-on-soft-reboot-restore-the-measurement-list.patch
> ima-permit-duplicate-measurement-list-entries.patch
> ima-maintain-memory-size-needed-for-serializing-the-measurement-list.patch
> powerpc-ima-send-the-kexec-buffer-to-the-next-kernel.patch
> ima-on-soft-reboot-save-the-measurement-list.patch
> ima-store-the-builtin-custom-template-definitions-in-a-list.patch
> ima-support-restoring-multiple-template-formats.patch
> ima-define-a-canonical-binary_runtime_measurements-list-format.patch
> ima-platform-independent-hash-value.patch
>
> I made the syntactic fixes but I won't be testing it.

Thanks. TBH I don't know how to test the IMA part; I'm relying on Thiago and Mimi to do that.

>> If no one objects I'll merge this via the powerpc tree. The three kexec patches
>> have been acked by Dave Young (since forever), and have been in linux-next (via
>> akpm's tree) also for a long time.
>
> OK, I'll wait for these to appear in -next and I will await advice on

Thanks. I'll let them stew for a few more hours and then put them in my next branch for tomorrow's linux-next.

cheers
[PATCH kernel v7 4/7] vfio/spapr: Add a helper to create default DMA window
There is already a helper to create a DMA window which does allocate a table and programs it to the IOMMU group. However tce_iommu_take_ownership_ddw() did not use it and did these 2 calls itself to simplify the error path.

Since we are going to delay the default window creation till the default window is accessed/removed or a new window is added, we need a helper to create a default window from all these cases. This adds tce_iommu_create_default_window(). Since it relies on a VFIO container having at least one IOMMU group (for future use), this changes tce_iommu_attach_group() to add a group to the container first and then call the new helper.

Signed-off-by: Alexey Kardashevskiy
Reviewed-by: David Gibson
---
Changes:
v6:
* new to the patchset
---
 drivers/vfio/vfio_iommu_spapr_tce.c | 87 ++---
 1 file changed, 42 insertions(+), 45 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index 4efd2b2..a67bbfd 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -710,6 +710,29 @@ static long tce_iommu_remove_window(struct tce_container *container,
 	return 0;
 }
 
+static long tce_iommu_create_default_window(struct tce_container *container)
+{
+	long ret;
+	__u64 start_addr = 0;
+	struct tce_iommu_group *tcegrp;
+	struct iommu_table_group *table_group;
+
+	if (!tce_groups_attached(container))
+		return -ENODEV;
+
+	tcegrp = list_first_entry(&container->group_list,
+			struct tce_iommu_group, next);
+	table_group = iommu_group_get_iommudata(tcegrp->grp);
+	if (!table_group)
+		return -ENODEV;
+
+	ret = tce_iommu_create_window(container, IOMMU_PAGE_SHIFT_4K,
+			table_group->tce32_size, 1, &start_addr);
+	WARN_ON_ONCE(!ret && start_addr);
+
+	return ret;
+}
+
 static long tce_iommu_ioctl(void *iommu_data,
 				 unsigned int cmd, unsigned long arg)
 {
@@ -1100,9 +1123,6 @@ static void tce_iommu_release_ownership_ddw(struct tce_container *container,
 static long tce_iommu_take_ownership_ddw(struct tce_container *container,
 		struct iommu_table_group *table_group)
 {
-	long i, ret = 0;
-	struct iommu_table *tbl = NULL;
-
 	if (!table_group->ops->create_table || !table_group->ops->set_window ||
 			!table_group->ops->release_ownership) {
 		WARN_ON_ONCE(1);
@@ -,47 +1131,7 @@ static long tce_iommu_take_ownership_ddw(struct tce_container *container,
 
 	table_group->ops->take_ownership(table_group);
 
-	/*
-	 * If it the first group attached, check if there is
-	 * a default DMA window and create one if none as
-	 * the userspace expects it to exist.
-	 */
-	if (!tce_groups_attached(container) && !container->tables[0]) {
-		ret = tce_iommu_create_table(container,
-				table_group,
-				0, /* window number */
-				IOMMU_PAGE_SHIFT_4K,
-				table_group->tce32_size,
-				1, /* default levels */
-				&tbl);
-		if (ret)
-			goto release_exit;
-		else
-			container->tables[0] = tbl;
-	}
-
-	/* Set all windows to the new group */
-	for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) {
-		tbl = container->tables[i];
-
-		if (!tbl)
-			continue;
-
-		/* Set the default window to a new group */
-		ret = table_group->ops->set_window(table_group, i, tbl);
-		if (ret)
-			goto release_exit;
-	}
-
 	return 0;
-
-release_exit:
-	for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i)
-		table_group->ops->unset_window(table_group, i);
-
-	table_group->ops->release_ownership(table_group);
-
-	return ret;
 }
 
 static int tce_iommu_attach_group(void *iommu_data,
@@ -1161,6 +1141,7 @@ static int tce_iommu_attach_group(void *iommu_data,
 	struct tce_container *container = iommu_data;
 	struct iommu_table_group *table_group;
 	struct tce_iommu_group *tcegrp = NULL;
+	bool create_default_window = false;
 
 	mutex_lock(&container->lock);
 
@@ -1203,14 +1184,30 @@ static int tce_iommu_attach_group(void *iommu_data,
 	}
 
 	if (!table_group->ops || !table_group->ops->take_ownership ||
-			!table_group->ops->release_ownership)
+			!table_group->ops->release_ownership) {
 		ret = tce_iommu_take_ownership(container, table_group);
-	else
+	} else {
 		ret = tce_iommu_take_ownership_ddw(container, table_group);
+
[PATCH kernel v7 3/7] vfio/spapr: Postpone allocation of userspace version of TCE table
The iommu_table struct manages a hardware TCE table and a vmalloc'd table with corresponding userspace addresses. Both are allocated when the default DMA window is created, and this happens when the very first group is attached to a container.

As we are going to allow the userspace to configure the container in one memory context and pass the container fd to another, we have to postpone such allocations until a container fd is passed to the destination user process, so we account the locked memory limit against the actual container user's constraints.

This postpones the it_userspace array allocation until it is first used for mapping. The unmapping path already checks whether the array is allocated.

Signed-off-by: Alexey Kardashevskiy
Reviewed-by: David Gibson
---
Changes:
v6:
* moved missing hunk from the next patch: tce_iommu_create_table() would decrement locked_vm while the new caller - tce_iommu_build_v2() - will not; this adds a new return code to the DMA mapping path but this seems to be a minor change.
---
 drivers/vfio/vfio_iommu_spapr_tce.c | 20 +++-
 1 file changed, 7 insertions(+), 13 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index d0c38b2..4efd2b2 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -515,6 +515,12 @@ static long tce_iommu_build_v2(struct tce_container *container,
 	unsigned long hpa;
 	enum dma_data_direction dirtmp;
 
+	if (!tbl->it_userspace) {
+		ret = tce_iommu_userspace_view_alloc(tbl);
+		if (ret)
+			return ret;
+	}
+
 	for (i = 0; i < pages; ++i) {
 		struct mm_iommu_table_group_mem_t *mem = NULL;
 		unsigned long *pua = IOMMU_TABLE_USERSPACE_ENTRY(tbl,
@@ -588,15 +594,6 @@ static long tce_iommu_create_table(struct tce_container *container,
 	WARN_ON(!ret && !(*ptbl)->it_ops->free);
 	WARN_ON(!ret && ((*ptbl)->it_allocated_size != table_size));
 
-	if (!ret && container->v2) {
-		ret = tce_iommu_userspace_view_alloc(*ptbl);
-		if (ret)
-			(*ptbl)->it_ops->free(*ptbl);
-	}
-
-	if (ret)
-		decrement_locked_vm(table_size >> PAGE_SHIFT);
-
 	return ret;
 }
 
@@ -1068,10 +1065,7 @@ static int tce_iommu_take_ownership(struct tce_container *container,
 		if (!tbl || !tbl->it_map)
 			continue;
 
-		rc = tce_iommu_userspace_view_alloc(tbl);
-		if (!rc)
-			rc = iommu_take_ownership(tbl);
-
+		rc = iommu_take_ownership(tbl);
 		if (rc) {
 			for (j = 0; j < i; ++j)
 				iommu_release_ownership(
-- 
2.5.0.rc3
[PATCH kernel v7 0/7] powerpc/spapr/vfio: Put pages on VFIO container shutdown
These patches fix a bug where pages stay pinned hours after the QEMU process which requested the pinning has exited.

Changes from v6 are in the last 2 patches; individual patches have detailed changelogs.

Please comment. Thanks.

Alexey Kardashevskiy (7):
  powerpc/iommu: Pass mm_struct to init/cleanup helpers
  powerpc/iommu: Stop using @current in mm_iommu_xxx
  vfio/spapr: Postpone allocation of userspace version of TCE table
  vfio/spapr: Add a helper to create default DMA window
  vfio/spapr: Postpone default window creation
  vfio/spapr: Reference mm in tce_container
  powerpc/mm/iommu, vfio/spapr: Put pages on VFIO container shutdown

 arch/powerpc/include/asm/mmu_context.h |  20 +-
 arch/powerpc/kernel/setup-common.c     |   2 +-
 arch/powerpc/mm/mmu_context_book3s64.c |   6 +-
 arch/powerpc/mm/mmu_context_iommu.c    |  60 ++
 drivers/vfio/vfio_iommu_spapr_tce.c    | 328 ++---
 5 files changed, 250 insertions(+), 166 deletions(-)

-- 
2.5.0.rc3
[PATCH kernel v7 5/7] vfio/spapr: Postpone default window creation
We are going to allow the userspace to configure a container in one memory context and pass the container fd to another, so we are postponing memory allocations accounted against the locked memory limit. One of the previous patches took care of it_userspace.

At the moment we create the default DMA window when the first group is attached to a container; this is done for userspace which is not DDW-aware but familiar with the SPAPR TCE IOMMU v2 in the part of memory pre-registration - such a client expects the default DMA window to exist.

This postpones the default DMA window allocation till one of the following happens:
1. the first map/unmap request arrives;
2. a new window is requested.

This adds a no-op for the case when the userspace requested removal of the default window which has not been created yet.

Signed-off-by: Alexey Kardashevskiy
Reviewed-by: David Gibson
---
Changes:
v6:
* new helper tce_iommu_create_default_window() moved to a separate patch;
* creates a default window when a new window is requested; it used to reset the def_window_pending flag instead;
* def_window_pending handling (mostly) localized in tce_iommu_create_default_window() now; the only exception is removal of a not yet created default window.
---
 drivers/vfio/vfio_iommu_spapr_tce.c | 40 +++--
 1 file changed, 25 insertions(+), 15 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index a67bbfd..88622be 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -97,6 +97,7 @@ struct tce_container {
 	struct mutex lock;
 	bool enabled;
 	bool v2;
+	bool def_window_pending;
 	unsigned long locked_pages;
 	struct iommu_table *tables[IOMMU_TABLE_GROUP_MAX_TABLES];
 	struct list_head group_list;
@@ -717,6 +718,9 @@ static long tce_iommu_create_default_window(struct tce_container *container)
 	struct tce_iommu_group *tcegrp;
 	struct iommu_table_group *table_group;
 
+	if (!container->def_window_pending)
+		return 0;
+
 	if (!tce_groups_attached(container))
 		return -ENODEV;
 
@@ -730,6 +734,9 @@ static long tce_iommu_create_default_window(struct tce_container *container)
 			table_group->tce32_size, 1, &start_addr);
 	WARN_ON_ONCE(!ret && start_addr);
 
+	if (!ret)
+		container->def_window_pending = false;
+
 	return ret;
 }
 
@@ -823,6 +830,10 @@ static long tce_iommu_ioctl(void *iommu_data,
 				VFIO_DMA_MAP_FLAG_WRITE))
 			return -EINVAL;
 
+		ret = tce_iommu_create_default_window(container);
+		if (ret)
+			return ret;
+
 		num = tce_iommu_find_table(container, param.iova, &tbl);
 		if (num < 0)
 			return -ENXIO;
@@ -886,6 +897,10 @@ static long tce_iommu_ioctl(void *iommu_data,
 		if (param.flags)
 			return -EINVAL;
 
+		ret = tce_iommu_create_default_window(container);
+		if (ret)
+			return ret;
+
 		num = tce_iommu_find_table(container, param.iova, &tbl);
 		if (num < 0)
 			return -ENXIO;
@@ -1012,6 +1027,10 @@ static long tce_iommu_ioctl(void *iommu_data,
 
 		mutex_lock(&container->lock);
 
+		ret = tce_iommu_create_default_window(container);
+		if (ret)
+			return ret;
+
 		ret = tce_iommu_create_window(container, create.page_shift,
 				create.window_size, create.levels,
 				&start_addr);
@@ -1044,6 +1063,11 @@ static long tce_iommu_ioctl(void *iommu_data,
 		if (remove.flags)
 			return -EINVAL;
 
+		if (container->def_window_pending && !remove.start_addr) {
+			container->def_window_pending = false;
+			return 0;
+		}
+
 		mutex_lock(&container->lock);
 
 		ret = tce_iommu_remove_window(container, remove.start_addr);
@@ -1141,7 +1165,6 @@ static int tce_iommu_attach_group(void *iommu_data,
 	struct tce_container *container = iommu_data;
 	struct iommu_table_group *table_group;
 	struct tce_iommu_group *tcegrp = NULL;
-	bool create_default_window = false;
 
 	mutex_lock(&container->lock);
 
@@ -1189,25 +1212,12 @@ static int tce_iommu_attach_group(void *iommu_data,
 	} else {
 		ret = tce_iommu_take_ownership_ddw(container, table_group);
 		if (!tce_groups_attached(container) && !container->tables[0])
-			create_default_window = true;
+			container->def_window_pending = true;
 	}
 
 	if (!ret) {
 		tcegrp->grp = iommu_group;
[PATCH v2] Fix the message in facility unavailable exception
I ran into this during some testing on qemu. The current facility_strings[] are correct when the trap address is 0xf80 (hypervisor facility unavailable). When the trap address is 0xf60, the IC (Interruption Cause), a.k.a. status in the code, is undefined for values 0 and 1. This patch adds a check to prevent printing the wrong information and helps better direct debugging effort.

Signed-off-by: Balbir Singh
---
Changelog v2:
  Redo conditional checks as suggested by Michael

 arch/powerpc/kernel/traps.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 023a462..010b11d 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -1519,9 +1519,13 @@ void facility_unavailable_exception(struct pt_regs *regs)
 		return;
 	}
 
-	if ((status < ARRAY_SIZE(facility_strings)) &&
-	    facility_strings[status])
-		facility = facility_strings[status];
+	if ((hv || status >= 2) &&
+	    (status < ARRAY_SIZE(facility_strings)) &&
+	    facility_strings[status])
+		facility = facility_strings[status];
+	else
+		pr_warn_ratelimited("Unexpected facility unavailable exception "
+				    "interruption cause %d\n", status);
 
 	/* We restore the interrupt state now */
 	if (!arch_irq_disabled_regs(regs))
-- 
2.5.5
[PATCH kernel v7 6/7] vfio/spapr: Reference mm in tce_container
In some situations the userspace memory context may live longer than the userspace process itself, so if we need to do proper memory context cleanup, we had better have tce_container take a reference to mm_struct and use it later when the process is gone (@current or @current->mm is NULL). This references mm and stores the pointer in the container; this is done in a new helper - tce_iommu_mm_set() - when one of the following happens: - a container is enabled (IOMMU v1); - a first attempt to pre-register memory is made (IOMMU v2); - a DMA window is created (IOMMU v2). The @mm stays referenced till the container is destroyed. This replaces current->mm with container->mm everywhere except debug prints. This adds a check that current->mm is the same as the one stored in the container to prevent userspace from making changes to the memory context of other processes. DMA map/unmap ioctls() do not check for @mm as they already check for @enabled which is set after tce_iommu_mm_set() is called. This does not reference a task as multiple threads within the same mm are allowed to ioctl() to vfio and supposedly they will have the same limits and capabilities and if they do not, we'll just fail with no harm made.
Signed-off-by: Alexey Kardashevskiy
--- Changes: v7: * WARN_ON_ONCE(!mm) in try_increment_locked_vm() * s/&&/||/ in a parameter check in decrement_locked_vm() * instead of failing on unset container, the VFIO_IOMMU_SPAPR_TCE_REMOVE handler sets mm to container now v6: * updated the commit log about not referencing task v5: * postpone referencing of mm v4: * added check for container->mm != current->mm in tce_iommu_ioctl() for all ioctls and removed other redundant checks --- drivers/vfio/vfio_iommu_spapr_tce.c | 160 ++-- 1 file changed, 100 insertions(+), 60 deletions(-) diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c index 88622be..4c03c85 100644 --- a/drivers/vfio/vfio_iommu_spapr_tce.c +++ b/drivers/vfio/vfio_iommu_spapr_tce.c @@ -31,49 +31,49 @@ static void tce_iommu_detach_group(void *iommu_data, struct iommu_group *iommu_group); -static long try_increment_locked_vm(long npages) +static long try_increment_locked_vm(struct mm_struct *mm, long npages) { long ret = 0, locked, lock_limit; - if (!current || !current->mm) - return -ESRCH; /* process exited */ + if (WARN_ON_ONCE(!mm)) + return -EPERM; if (!npages) return 0; - down_write(&current->mm->mmap_sem); - locked = current->mm->locked_vm + npages; + down_write(&mm->mmap_sem); + locked = mm->locked_vm + npages; lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT; if (locked > lock_limit && !capable(CAP_IPC_LOCK)) ret = -ENOMEM; else - current->mm->locked_vm += npages; + mm->locked_vm += npages; pr_debug("[%d] RLIMIT_MEMLOCK +%ld %ld/%ld%s\n", current->pid, npages << PAGE_SHIFT, - current->mm->locked_vm << PAGE_SHIFT, + mm->locked_vm << PAGE_SHIFT, rlimit(RLIMIT_MEMLOCK), ret ?
" - exceeded" : ""); - up_write(>mm->mmap_sem); + up_write(>mmap_sem); return ret; } -static void decrement_locked_vm(long npages) +static void decrement_locked_vm(struct mm_struct *mm, long npages) { - if (!current || !current->mm || !npages) - return; /* process exited */ + if (!mm || !npages) + return; - down_write(>mm->mmap_sem); - if (WARN_ON_ONCE(npages > current->mm->locked_vm)) - npages = current->mm->locked_vm; - current->mm->locked_vm -= npages; + down_write(>mmap_sem); + if (WARN_ON_ONCE(npages > mm->locked_vm)) + npages = mm->locked_vm; + mm->locked_vm -= npages; pr_debug("[%d] RLIMIT_MEMLOCK -%ld %ld/%ld\n", current->pid, npages << PAGE_SHIFT, - current->mm->locked_vm << PAGE_SHIFT, + mm->locked_vm << PAGE_SHIFT, rlimit(RLIMIT_MEMLOCK)); - up_write(>mm->mmap_sem); + up_write(>mmap_sem); } /* @@ -99,26 +99,38 @@ struct tce_container { bool v2; bool def_window_pending; unsigned long locked_pages; + struct mm_struct *mm; struct iommu_table *tables[IOMMU_TABLE_GROUP_MAX_TABLES]; struct list_head group_list; }; +static long tce_iommu_mm_set(struct tce_container *container) +{ + if (container->mm) { + if (container->mm == current->mm) + return 0; + return -EPERM; + } + BUG_ON(!current->mm); + container->mm = current->mm; + atomic_inc(>mm->mm_count); + + return 0; +} + static long tce_iommu_unregister_pages(struct
[PATCH kernel v7 2/7] powerpc/iommu: Stop using @current in mm_iommu_xxx
This changes mm_iommu_xxx helpers to take mm_struct as a parameter instead of getting it from @current which in some situations may not have a valid reference to mm. This changes helpers to receive @mm and moves all references to @current to the caller, including checks for !current and !current->mm; checks in mm_iommu_preregistered() are removed as there is no caller yet. This moves the mm_iommu_adjust_locked_vm() call to the caller as it receives mm_iommu_table_group_mem_t but it needs mm. This should cause no behavioral change. Signed-off-by: Alexey Kardashevskiy
Reviewed-by: David Gibson --- arch/powerpc/include/asm/mmu_context.h | 16 ++-- arch/powerpc/mm/mmu_context_iommu.c| 46 +- drivers/vfio/vfio_iommu_spapr_tce.c| 14 --- 3 files changed, 36 insertions(+), 40 deletions(-) diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h index 424844b..b9e3f0a 100644 --- a/arch/powerpc/include/asm/mmu_context.h +++ b/arch/powerpc/include/asm/mmu_context.h @@ -19,16 +19,18 @@ extern void destroy_context(struct mm_struct *mm); struct mm_iommu_table_group_mem_t; extern int isolate_lru_page(struct page *page);/* from internal.h */ -extern bool mm_iommu_preregistered(void); -extern long mm_iommu_get(unsigned long ua, unsigned long entries, +extern bool mm_iommu_preregistered(struct mm_struct *mm); +extern long mm_iommu_get(struct mm_struct *mm, + unsigned long ua, unsigned long entries, struct mm_iommu_table_group_mem_t **pmem); -extern long mm_iommu_put(struct mm_iommu_table_group_mem_t *mem); +extern long mm_iommu_put(struct mm_struct *mm, + struct mm_iommu_table_group_mem_t *mem); extern void mm_iommu_init(struct mm_struct *mm); extern void mm_iommu_cleanup(struct mm_struct *mm); -extern struct mm_iommu_table_group_mem_t *mm_iommu_lookup(unsigned long ua, - unsigned long size); -extern struct mm_iommu_table_group_mem_t *mm_iommu_find(unsigned long ua, - unsigned long entries); +extern struct mm_iommu_table_group_mem_t
*mm_iommu_lookup(struct mm_struct *mm, + unsigned long ua, unsigned long size); +extern struct mm_iommu_table_group_mem_t *mm_iommu_find(struct mm_struct *mm, + unsigned long ua, unsigned long entries); extern long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem, unsigned long ua, unsigned long *hpa); extern long mm_iommu_mapped_inc(struct mm_iommu_table_group_mem_t *mem); diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c index ad2e575..4c6db09 100644 --- a/arch/powerpc/mm/mmu_context_iommu.c +++ b/arch/powerpc/mm/mmu_context_iommu.c @@ -56,7 +56,7 @@ static long mm_iommu_adjust_locked_vm(struct mm_struct *mm, } pr_debug("[%d] RLIMIT_MEMLOCK HASH64 %c%ld %ld/%ld\n", - current->pid, + current ? current->pid : 0, incr ? '+' : '-', npages << PAGE_SHIFT, mm->locked_vm << PAGE_SHIFT, @@ -66,12 +66,9 @@ static long mm_iommu_adjust_locked_vm(struct mm_struct *mm, return ret; } -bool mm_iommu_preregistered(void) +bool mm_iommu_preregistered(struct mm_struct *mm) { - if (!current || !current->mm) - return false; - - return !list_empty(&current->mm->context.iommu_group_mem_list); + return !list_empty(&mm->context.iommu_group_mem_list); } EXPORT_SYMBOL_GPL(mm_iommu_preregistered); @@ -124,19 +121,16 @@ static int mm_iommu_move_page_from_cma(struct page *page) return 0; } -long mm_iommu_get(unsigned long ua, unsigned long entries, +long mm_iommu_get(struct mm_struct *mm, unsigned long ua, unsigned long entries, struct mm_iommu_table_group_mem_t **pmem) { struct mm_iommu_table_group_mem_t *mem; long i, j, ret = 0, locked_entries = 0; struct page *page = NULL; - if (!current || !current->mm) - return -ESRCH; /* process exited */ - mutex_lock(&mem_list_mutex); - list_for_each_entry_rcu(mem, &current->mm->context.iommu_group_mem_list, + list_for_each_entry_rcu(mem, &mm->context.iommu_group_mem_list, next) { if ((mem->ua == ua) && (mem->entries == entries)) { ++mem->used; @@ -154,7 +148,7 @@ long mm_iommu_get(unsigned long ua, unsigned long entries, } - ret =
mm_iommu_adjust_locked_vm(current->mm, entries, true); + ret = mm_iommu_adjust_locked_vm(mm, entries, true); if (ret) goto unlock_exit; @@ -215,11 +209,11 @@ long mm_iommu_get(unsigned long ua, unsigned long entries, mem->entries = entries; *pmem = mem; - list_add_rcu(&mem->next, &current->mm->context.iommu_group_mem_list); + list_add_rcu(&mem->next,
Re: [PATCH v7 0/7] Radix pte update tlbflush optimizations.
Balbir Singh writes: > On 28/11/16 17:16, Aneesh Kumar K.V wrote: >> Changes from v6: >> * restrict the new pte bit to radix and DD1 config >> >> Changes from V5: >> Switch to use pte bits to track page size. > > This series looks much better, I wish there was a better > way of avoiding having to pass the address to the ptep function, > but I guess we get to live with it forever No, we can always revert it when P9 DD1 is dead and buried. cheers
[PATCH kernel v7 7/7] powerpc/mm/iommu, vfio/spapr: Put pages on VFIO container shutdown
At the moment the userspace tool is expected to request pinning of the entire guest RAM when the VFIO IOMMU SPAPR v2 driver is present. When the userspace process finishes, all the pinned pages need to be put; this is done as a part of the userspace memory context (MM) destruction which happens on the very last mmdrop(). This approach has a problem: the MM of the userspace process may live longer than the userspace process itself, as kernel threads use the MM of whatever userspace process was running on the CPU where the kernel thread got scheduled. If this happens, the MM remains referenced until that exact kernel thread wakes up again and releases the very last reference to the MM; on an idle system this can take hours. This moves preregistered-region tracking from the MM to VFIO; instead of using mm_iommu_table_group_mem_t::used, tce_container::prereg_list is added so each container releases the regions which it has pre-registered. This changes the userspace interface to return EBUSY if a memory region is already registered in a container. However it should not have any practical effect as the only userspace tool available now registers a memory region once per container anyway. As tce_iommu_register_pages/tce_iommu_unregister_pages are called under container->lock, this does not need additional locking.
Signed-off-by: Alexey Kardashevskiy
Reviewed-by: Nicholas Piggin --- Changes: v7: * left sanity check in destroy_context() * tce_iommu_prereg_free() does not free tce_iommu_prereg struct if mm_iommu_put() failed; VFIO SPAPR container release callback now warns on an error v4: * changed tce_iommu_register_pages() to call mm_iommu_find() first and avoid calling mm_iommu_put() if memory is preregistered already v3: * moved tce_iommu_prereg_free() call out of list_for_each_entry() v2: * updated commit log --- arch/powerpc/mm/mmu_context_book3s64.c | 4 +-- arch/powerpc/mm/mmu_context_iommu.c| 11 -- drivers/vfio/vfio_iommu_spapr_tce.c| 61 +- 3 files changed, 61 insertions(+), 15 deletions(-) diff --git a/arch/powerpc/mm/mmu_context_book3s64.c b/arch/powerpc/mm/mmu_context_book3s64.c index ad82735..73bf6e1 100644 --- a/arch/powerpc/mm/mmu_context_book3s64.c +++ b/arch/powerpc/mm/mmu_context_book3s64.c @@ -156,13 +156,11 @@ static inline void destroy_pagetable_page(struct mm_struct *mm) } #endif - void destroy_context(struct mm_struct *mm) { #ifdef CONFIG_SPAPR_TCE_IOMMU - mm_iommu_cleanup(mm); + WARN_ON_ONCE(!list_empty(&mm->context.iommu_group_mem_list)); #endif - #ifdef CONFIG_PPC_ICSWX drop_cop(mm->context.acop, mm); kfree(mm->context.cop_lockp); diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c index 4c6db09..104bad0 100644 --- a/arch/powerpc/mm/mmu_context_iommu.c +++ b/arch/powerpc/mm/mmu_context_iommu.c @@ -365,14 +365,3 @@ void mm_iommu_init(struct mm_struct *mm) { INIT_LIST_HEAD_RCU(&mm->context.iommu_group_mem_list); } - -void mm_iommu_cleanup(struct mm_struct *mm) -{ - struct mm_iommu_table_group_mem_t *mem, *tmp; - - list_for_each_entry_safe(mem, tmp, &mm->context.iommu_group_mem_list, - next) { - list_del_rcu(&mem->next); - mm_iommu_do_free(mem); - } -} diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c index 4c03c85..c882357 100644 --- a/drivers/vfio/vfio_iommu_spapr_tce.c +++
b/drivers/vfio/vfio_iommu_spapr_tce.c @@ -89,6 +89,15 @@ struct tce_iommu_group { }; /* + * A container needs to remember which preregistered region it has + * referenced to do proper cleanup at the userspace process exit. + */ +struct tce_iommu_prereg { + struct list_head next; + struct mm_iommu_table_group_mem_t *mem; +}; + +/* * The container descriptor supports only a single group per container. * Required by the API as the container is not supplied with the IOMMU group * at the moment of initialization. @@ -102,6 +111,7 @@ struct tce_container { struct mm_struct *mm; struct iommu_table *tables[IOMMU_TABLE_GROUP_MAX_TABLES]; struct list_head group_list; + struct list_head prereg_list; }; static long tce_iommu_mm_set(struct tce_container *container) @@ -118,10 +128,27 @@ static long tce_iommu_mm_set(struct tce_container *container) return 0; } +static long tce_iommu_prereg_free(struct tce_container *container, + struct tce_iommu_prereg *tcemem) +{ + long ret; + + ret = mm_iommu_put(container->mm, tcemem->mem); + if (ret) + return ret; + + list_del(&tcemem->next); + kfree(tcemem); + + return 0; +} + static long tce_iommu_unregister_pages(struct tce_container *container, __u64 vaddr, __u64 size) { struct mm_iommu_table_group_mem_t *mem; + struct tce_iommu_prereg *tcemem; + bool found = false; if
Re: [PATCH v7 3/7] powerpc/mm: Introduce _PAGE_LARGE software pte bits
On 30/11/16 11:35, Benjamin Herrenschmidt wrote: > On Wed, 2016-11-30 at 11:14 +1100, Balbir Singh wrote: >>> +#define _RPAGE_RSV1 0x1000UL >>> +#define _RPAGE_RSV2 0x0800UL >>> +#define _RPAGE_RSV3 0x0400UL >>> +#define _RPAGE_RSV4 0x0200UL >>> + >> >> We use the top 4 bits and not the _SW bits? > > Correct, welcome to the discussion we've been having the last 2 weeks > :-) > I thought we were following Paul's suggestion here https://lists.ozlabs.org/pipermail/linuxppc-dev/2016-November/151620.html and I also noticed https://lists.ozlabs.org/pipermail/linuxppc-dev/2016-November/151624.html My bad, I thought we had two SW bits to use for DD1 Balbir Singh.
Re: [PATCH] EDAC: mpc85xx: Add T2080 l2-cache support
On Tue, Nov 29, 2016 at 03:20:37PM +1300, Chris Packham wrote: > The l2-cache controller on the T2080 SoC has similar capabilities to the > others already supported by the mpc85xx_edac driver. Add it to the list > of compatible devices. > > Signed-off-by: Chris Packham > --- Looks good, Acked-by: Johannes Thumshirn -- Johannes Thumshirn Storage jthumsh...@suse.de +49 911 74053 689 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
Re: [PATCH] PPC/CAS Add support for power9 in ibm_architecture_vec
On 11/29/16, Balbir Singh wrote: > > > The PVR list and IBM_ARCH_VEC_NRCORES_OFFSET have been updated. > This provides the cpu versions supported to the hypervisor and in this case > tells the hypervisor that the guest supports ISA 3.0 and Power9. > > Signed-off-by: Balbir Singh Michael rewrote the code so you have to update the patch. See https://patchwork.ozlabs.org/patch/658627/ > --- > arch/powerpc/include/asm/prom.h | 2 ++ > arch/powerpc/kernel/prom_init.c | 7 +-- > 2 files changed, 7 insertions(+), 2 deletions(-) > > diff --git a/arch/powerpc/include/asm/prom.h > b/arch/powerpc/include/asm/prom.h > index 7f436ba..785bc6b 100644 > --- a/arch/powerpc/include/asm/prom.h > +++ b/arch/powerpc/include/asm/prom.h > @@ -121,6 +121,8 @@ struct of_drconf_cell { > #define OV1_PPC_2_06 0x02/* set if we support PowerPC 2.06 */ > #define OV1_PPC_2_07 0x01/* set if we support PowerPC 2.07 */ > > +#define OV1_PPC_3_00 0x80/* set if we support PowerPC 3.00 */ > + > /* Option vector 2: Open Firmware options supported */ > #define OV2_REAL_MODE0x20/* set if we want OF in real > mode */ > > diff --git a/arch/powerpc/kernel/prom_init.c > b/arch/powerpc/kernel/prom_init.c > index 88ac964..2a8d6b0 100644 > --- a/arch/powerpc/kernel/prom_init.c > +++ b/arch/powerpc/kernel/prom_init.c > @@ -659,6 +659,8 @@ unsigned char ibm_architecture_vec[] = { > W(0x), W(0x004b), /* POWER8E */ > W(0x), W(0x004c), /* POWER8NVL */ > W(0x), W(0x004d), /* POWER8 */ > + W(0x), W(0x004e), /* POWER9 */ > + W(0x), W(0x0f05), /* all 3.00-compliant */ > W(0x), W(0x0f04), /* all 2.07-compliant */ > W(0x), W(0x0f03), /* all 2.06-compliant */ > W(0x), W(0x0f02), /* all 2.05-compliant */ > @@ -666,10 +668,11 @@ unsigned char ibm_architecture_vec[] = { > NUM_VECTORS(6), /* 6 option vectors */ > > /* option vector 1: processor architectures supported */ > - VECTOR_LENGTH(2), /* length */ > + VECTOR_LENGTH(3), /* length */ > 0, /* don't ignore, don't halt */ > OV1_PPC_2_00 | OV1_PPC_2_01 | OV1_PPC_2_02 |
OV1_PPC_2_03 | > OV1_PPC_2_04 | OV1_PPC_2_05 | OV1_PPC_2_06 | OV1_PPC_2_07, > + OV1_PPC_3_00, > > /* option vector 2: Open Firmware options supported */ > VECTOR_LENGTH(33), /* length */ > @@ -720,7 +723,7 @@ unsigned char ibm_architecture_vec[] = { >* must match by the macro below. Update the definition if >* the structure layout changes. >*/ > -#define IBM_ARCH_VEC_NRCORES_OFFSET 133 > +#define IBM_ARCH_VEC_NRCORES_OFFSET 150 > W(NR_CPUS), /* number of cores supported */ > 0, > 0, > -- > 2.5.5 > >
Re: [1/3] powerpc/64e: convert cmpi to cmpwi in head_64.S
On Wed, 2016-11-23 at 13:02:07 UTC, Nicholas Piggin wrote: > From 80f23935cadb ("powerpc: Convert cmp to cmpd in idle enter sequence"): > > PowerPC's "cmp" instruction has four operands. Normally people write > "cmpw" or "cmpd" for the second cmp operand 0 or 1. But, frequently > people forget, and write "cmp" with just three operands. > > With older binutils this is silently accepted as if this was "cmpw", > while often "cmpd" is wanted. With newer binutils GAS will complain > about this for 64-bit code. For 32-bit code it still silently assumes > "cmpw" is what is meant. > > In this instance the code comes directly from ISA v2.07, including the > cmp, but cmpd is correct. Backport to stable so that new toolchains can > build old kernels. > > In this case, cmpwi is called for, so this is just a build fix for > new toolchains. > > Stable: v3.0 > Cc: Segher Boessenkool > Signed-off-by: Nicholas Piggin Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/f87f253bac3ce4a4eb2a60a1ae604d cheers
[PATCH v11 3/8] kexec_file: Factor out kexec_locate_mem_hole from kexec_add_buffer.
From: Thiago Jung Bauermann

kexec_locate_mem_hole will be used by the PowerPC kexec_file_load implementation to find free memory for the purgatory stack. Signed-off-by: Thiago Jung Bauermann Acked-by: Dave Young Signed-off-by: Michael Ellerman --- include/linux/kexec.h | 1 + kernel/kexec_file.c | 25 - 2 files changed, 21 insertions(+), 5 deletions(-) diff --git a/include/linux/kexec.h b/include/linux/kexec.h index 437ef1b47428..a33f63351f86 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -176,6 +176,7 @@ struct kexec_buf { int __weak arch_kexec_walk_mem(struct kexec_buf *kbuf, int (*func)(u64, u64, void *)); extern int kexec_add_buffer(struct kexec_buf *kbuf); +int kexec_locate_mem_hole(struct kexec_buf *kbuf); #endif /* CONFIG_KEXEC_FILE */ struct kimage { diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c index efd2c094af7e..0c2df7f73792 100644 --- a/kernel/kexec_file.c +++ b/kernel/kexec_file.c @@ -450,6 +450,23 @@ int __weak arch_kexec_walk_mem(struct kexec_buf *kbuf, } /** + * kexec_locate_mem_hole - find free memory for the purgatory or the next kernel + * @kbuf: Parameters for the memory search. + * + * On success, kbuf->mem will have the start address of the memory region found. + * + * Return: 0 on success, negative errno on error. + */ +int kexec_locate_mem_hole(struct kexec_buf *kbuf) +{ + int ret; + + ret = arch_kexec_walk_mem(kbuf, locate_mem_hole_callback); + + return ret == 1 ? 0 : -EADDRNOTAVAIL; +} + +/** * kexec_add_buffer - place a buffer in a kexec segment * @kbuf: Buffer contents and memory parameters.
* @@ -489,11 +506,9 @@ int kexec_add_buffer(struct kexec_buf *kbuf) kbuf->buf_align = max(kbuf->buf_align, PAGE_SIZE); /* Walk the RAM ranges and allocate a suitable range for the buffer */ - ret = arch_kexec_walk_mem(kbuf, locate_mem_hole_callback); - if (ret != 1) { - /* A suitable memory range could not be found for buffer */ - return -EADDRNOTAVAIL; - } + ret = kexec_locate_mem_hole(kbuf); + if (ret) + return ret; /* Found a suitable memory range */ ksegment = &kbuf->image->segment[kbuf->image->nr_segments]; -- 2.7.4
[PATCH v11 7/8] powerpc/kexec: Enable kexec_file_load() syscall
From: Thiago Jung Bauermann

Define the Kconfig symbol so that the kexec_file_load() code can be built, and wire up the syscall so that it can be called. Signed-off-by: Thiago Jung Bauermann Signed-off-by: Michael Ellerman --- arch/powerpc/Kconfig | 13 + arch/powerpc/include/asm/systbl.h | 1 + arch/powerpc/include/asm/unistd.h | 2 +- arch/powerpc/include/uapi/asm/unistd.h | 1 + 4 files changed, 16 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 6cb59c6e5ba4..897d0f14447d 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -455,6 +455,19 @@ config KEXEC interface is strongly in flux, so no good recommendation can be made. +config KEXEC_FILE + bool "kexec file based system call" + select KEXEC_CORE + select BUILD_BIN2C + depends on PPC64 + depends on CRYPTO=y + depends on CRYPTO_SHA256=y + help + This is a new version of the kexec system call. This call is + file based and takes in file descriptors as system call arguments + for kernel and initramfs as opposed to a list of segments as is the + case for the older kexec call.
+ config RELOCATABLE bool "Build a relocatable kernel" depends on (PPC64 && !COMPILE_TEST) || (FLATMEM && (44x || FSL_BOOKE)) diff --git a/arch/powerpc/include/asm/systbl.h b/arch/powerpc/include/asm/systbl.h index 2fc5d4db503c..4b369d83fe9c 100644 --- a/arch/powerpc/include/asm/systbl.h +++ b/arch/powerpc/include/asm/systbl.h @@ -386,3 +386,4 @@ SYSCALL(mlock2) SYSCALL(copy_file_range) COMPAT_SYS_SPU(preadv2) COMPAT_SYS_SPU(pwritev2) +SYSCALL(kexec_file_load) diff --git a/arch/powerpc/include/asm/unistd.h b/arch/powerpc/include/asm/unistd.h index e8cdfec8d512..eb1acee91a20 100644 --- a/arch/powerpc/include/asm/unistd.h +++ b/arch/powerpc/include/asm/unistd.h @@ -12,7 +12,7 @@ #include <uapi/asm/unistd.h> -#define NR_syscalls 382 +#define NR_syscalls 383 #define __NR__exit __NR_exit diff --git a/arch/powerpc/include/uapi/asm/unistd.h b/arch/powerpc/include/uapi/asm/unistd.h index e9f5f41aa55a..2f26335a3c42 100644 --- a/arch/powerpc/include/uapi/asm/unistd.h +++ b/arch/powerpc/include/uapi/asm/unistd.h @@ -392,5 +392,6 @@ #define __NR_copy_file_range 379 #define __NR_preadv2 380 #define __NR_pwritev2 381 +#define __NR_kexec_file_load 382 #endif /* _UAPI_ASM_POWERPC_UNISTD_H_ */ -- 2.7.4
[PATCH v11 8/8] powerpc: Enable CONFIG_KEXEC_FILE in powerpc server defconfigs.
From: Thiago Jung Bauermann

Enable CONFIG_KEXEC_FILE in powernv_defconfig, ppc64_defconfig and pseries_defconfig. It depends on CONFIG_CRYPTO_SHA256=y, so add that as well. Signed-off-by: Thiago Jung Bauermann Signed-off-by: Michael Ellerman --- arch/powerpc/configs/powernv_defconfig | 2 ++ arch/powerpc/configs/ppc64_defconfig | 2 ++ arch/powerpc/configs/pseries_defconfig | 2 ++ 3 files changed, 6 insertions(+) diff --git a/arch/powerpc/configs/powernv_defconfig b/arch/powerpc/configs/powernv_defconfig index d98b6eb3254f..5a190aa5534b 100644 --- a/arch/powerpc/configs/powernv_defconfig +++ b/arch/powerpc/configs/powernv_defconfig @@ -49,6 +49,7 @@ CONFIG_BINFMT_MISC=m CONFIG_PPC_TRANSACTIONAL_MEM=y CONFIG_HOTPLUG_CPU=y CONFIG_KEXEC=y +CONFIG_KEXEC_FILE=y CONFIG_IRQ_ALL_CPUS=y CONFIG_NUMA=y CONFIG_MEMORY_HOTPLUG=y @@ -301,6 +302,7 @@ CONFIG_CRYPTO_CCM=m CONFIG_CRYPTO_PCBC=m CONFIG_CRYPTO_HMAC=y CONFIG_CRYPTO_MICHAEL_MIC=m +CONFIG_CRYPTO_SHA256=y CONFIG_CRYPTO_TGR192=m CONFIG_CRYPTO_WP512=m CONFIG_CRYPTO_ANUBIS=m diff --git a/arch/powerpc/configs/ppc64_defconfig b/arch/powerpc/configs/ppc64_defconfig index 58a98d40086f..0059d2088b9c 100644 --- a/arch/powerpc/configs/ppc64_defconfig +++ b/arch/powerpc/configs/ppc64_defconfig @@ -46,6 +46,7 @@ CONFIG_HZ_100=y CONFIG_BINFMT_MISC=m CONFIG_PPC_TRANSACTIONAL_MEM=y CONFIG_KEXEC=y +CONFIG_KEXEC_FILE=y CONFIG_CRASH_DUMP=y CONFIG_IRQ_ALL_CPUS=y CONFIG_MEMORY_HOTREMOVE=y @@ -336,6 +337,7 @@ CONFIG_CRYPTO_TEST=m CONFIG_CRYPTO_PCBC=m CONFIG_CRYPTO_HMAC=y CONFIG_CRYPTO_MICHAEL_MIC=m +CONFIG_CRYPTO_SHA256=y CONFIG_CRYPTO_TGR192=m CONFIG_CRYPTO_WP512=m CONFIG_CRYPTO_ANUBIS=m diff --git a/arch/powerpc/configs/pseries_defconfig b/arch/powerpc/configs/pseries_defconfig index 8a3bc016b732..f022f657a984 100644 --- a/arch/powerpc/configs/pseries_defconfig +++ b/arch/powerpc/configs/pseries_defconfig @@ -52,6 +52,7 @@ CONFIG_HZ_100=y CONFIG_BINFMT_MISC=m CONFIG_PPC_TRANSACTIONAL_MEM=y CONFIG_KEXEC=y +CONFIG_KEXEC_FILE=y
CONFIG_IRQ_ALL_CPUS=y CONFIG_MEMORY_HOTPLUG=y CONFIG_MEMORY_HOTREMOVE=y @@ -303,6 +304,7 @@ CONFIG_CRYPTO_TEST=m CONFIG_CRYPTO_PCBC=m CONFIG_CRYPTO_HMAC=y CONFIG_CRYPTO_MICHAEL_MIC=m +CONFIG_CRYPTO_SHA256=y CONFIG_CRYPTO_TGR192=m CONFIG_CRYPTO_WP512=m CONFIG_CRYPTO_ANUBIS=m -- 2.7.4
[PATCH v11 2/8] kexec_file: Change kexec_add_buffer to take kexec_buf as argument.
From: Thiago Jung Bauermann

This is done to simplify the kexec_add_buffer argument list. Adapt all callers to set up a kexec_buf to pass to kexec_add_buffer. In addition, change the type of kexec_buf.buffer from char * to void *. There is no particular reason for it to be a char *, and the change allows us to get rid of 3 existing casts to char * in the code. Signed-off-by: Thiago Jung Bauermann Acked-by: Dave Young Acked-by: Balbir Singh Signed-off-by: Michael Ellerman --- arch/x86/kernel/crash.c | 37 arch/x86/kernel/kexec-bzimage64.c | 48 +++-- include/linux/kexec.h | 8 +--- kernel/kexec_file.c | 88 ++- 4 files changed, 87 insertions(+), 94 deletions(-) diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c index 650830e39e3a..3741461c63a0 100644 --- a/arch/x86/kernel/crash.c +++ b/arch/x86/kernel/crash.c @@ -631,9 +631,9 @@ static int determine_backup_region(u64 start, u64 end, void *arg) int crash_load_segments(struct kimage *image) { - unsigned long src_start, src_sz, elf_sz; - void *elf_addr; int ret; + struct kexec_buf kbuf = { .image = image, .buf_min = 0, + .buf_max = ULONG_MAX, .top_down = false }; /* * Determine and load a segment for backup area. First 640K RAM @@ -647,43 +647,44 @@ int crash_load_segments(struct kimage *image) if (ret < 0) return ret; - src_start = image->arch.backup_src_start; - src_sz = image->arch.backup_src_sz; - /* Add backup segment. */ - if (src_sz) { + if (image->arch.backup_src_sz) { + kbuf.buffer = &crash_zero_bytes; + kbuf.bufsz = sizeof(crash_zero_bytes); + kbuf.memsz = image->arch.backup_src_sz; + kbuf.buf_align = PAGE_SIZE; /* * Ideally there is no source for backup segment. This is * copied in purgatory after crash. Just add a zero filled * segment for now to make sure checksum logic works fine.
*/ - ret = kexec_add_buffer(image, (char *)&crash_zero_bytes, - sizeof(crash_zero_bytes), src_sz, - PAGE_SIZE, 0, -1, 0, - &image->arch.backup_load_addr); + ret = kexec_add_buffer(&kbuf); if (ret) return ret; + image->arch.backup_load_addr = kbuf.mem; pr_debug("Loaded backup region at 0x%lx backup_start=0x%lx memsz=0x%lx\n", -image->arch.backup_load_addr, src_start, src_sz); +image->arch.backup_load_addr, +image->arch.backup_src_start, kbuf.memsz); } /* Prepare elf headers and add a segment */ - ret = prepare_elf_headers(image, &elf_addr, &elf_sz); + ret = prepare_elf_headers(image, &kbuf.buffer, &kbuf.bufsz); if (ret) return ret; - image->arch.elf_headers = elf_addr; - image->arch.elf_headers_sz = elf_sz; + image->arch.elf_headers = kbuf.buffer; + image->arch.elf_headers_sz = kbuf.bufsz; - ret = kexec_add_buffer(image, (char *)elf_addr, elf_sz, elf_sz, - ELF_CORE_HEADER_ALIGN, 0, -1, 0, - &image->arch.elf_load_addr); + kbuf.memsz = kbuf.bufsz; + kbuf.buf_align = ELF_CORE_HEADER_ALIGN; + ret = kexec_add_buffer(&kbuf); if (ret) { vfree((void *)image->arch.elf_headers); return ret; } + image->arch.elf_load_addr = kbuf.mem; pr_debug("Loaded ELF headers at 0x%lx bufsz=0x%lx memsz=0x%lx\n", -image->arch.elf_load_addr, elf_sz, elf_sz); +image->arch.elf_load_addr, kbuf.bufsz, kbuf.bufsz); return ret; } diff --git a/arch/x86/kernel/kexec-bzimage64.c b/arch/x86/kernel/kexec-bzimage64.c index 3407b148c240..d0a814a9d96a 100644 --- a/arch/x86/kernel/kexec-bzimage64.c +++ b/arch/x86/kernel/kexec-bzimage64.c @@ -331,17 +331,17 @@ static void *bzImage64_load(struct kimage *image, char *kernel, struct setup_header *header; int setup_sects, kern16_size, ret = 0; - unsigned long setup_header_size, params_cmdline_sz, params_misc_sz; + unsigned long setup_header_size, params_cmdline_sz; struct boot_params *params; unsigned long bootparam_load_addr, kernel_load_addr, initrd_load_addr; unsigned long purgatory_load_addr; - unsigned long kernel_bufsz, kernel_memsz, kernel_align; - char *kernel_buf; struct bzimage64_data *ldata; struct
kexec_entry64_regs regs64; void *stack; unsigned int setup_hdr_offset =
[PATCH v11 4/8] powerpc: Change places using CONFIG_KEXEC to use CONFIG_KEXEC_CORE instead.
From: Thiago Jung Bauermann

Commit 2965faa5e03d ("kexec: split kexec_load syscall from kexec core code") introduced CONFIG_KEXEC_CORE so that CONFIG_KEXEC means whether the kexec_load system call should be compiled-in and CONFIG_KEXEC_FILE means whether the kexec_file_load system call should be compiled-in. These options can be set independently from each other. Since until now powerpc only supported kexec_load, CONFIG_KEXEC and CONFIG_KEXEC_CORE were synonyms. That is not the case anymore, so we need to make a distinction. Almost all places where CONFIG_KEXEC was being used should be using CONFIG_KEXEC_CORE instead, since kexec_file_load also needs that code compiled in. Signed-off-by: Thiago Jung Bauermann Signed-off-by: Michael Ellerman --- arch/powerpc/Kconfig | 2 +- arch/powerpc/include/asm/debug.h | 2 +- arch/powerpc/include/asm/kexec.h | 6 +++--- arch/powerpc/include/asm/machdep.h| 4 ++-- arch/powerpc/include/asm/smp.h| 2 +- arch/powerpc/kernel/Makefile | 4 ++-- arch/powerpc/kernel/head_64.S | 2 +- arch/powerpc/kernel/misc_32.S | 2 +- arch/powerpc/kernel/misc_64.S | 6 +++--- arch/powerpc/kernel/prom.c| 2 +- arch/powerpc/kernel/setup_64.c| 4 ++-- arch/powerpc/kernel/smp.c | 6 +++--- arch/powerpc/kernel/traps.c | 2 +- arch/powerpc/platforms/85xx/corenet_generic.c | 2 +- arch/powerpc/platforms/85xx/smp.c | 8 arch/powerpc/platforms/cell/spu_base.c| 2 +- arch/powerpc/platforms/powernv/setup.c| 6 +++--- arch/powerpc/platforms/ps3/setup.c| 4 ++-- arch/powerpc/platforms/pseries/Makefile | 2 +- arch/powerpc/platforms/pseries/setup.c| 4 ++-- 20 files changed, 36 insertions(+), 36 deletions(-) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 65fba4c34cd7..6cb59c6e5ba4 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -489,7 +489,7 @@ config CRASH_DUMP config FA_DUMP bool "Firmware-assisted dump" - depends on PPC64 && PPC_RTAS && CRASH_DUMP && KEXEC + depends on PPC64 && PPC_RTAS && CRASH_DUMP && KEXEC_CORE help A robust mechanism to get
reliable kernel crash dump with assistance from firmware. This approach does not use kexec, diff --git a/arch/powerpc/include/asm/debug.h b/arch/powerpc/include/asm/debug.h index a954e4975049..86308f177f2d 100644 --- a/arch/powerpc/include/asm/debug.h +++ b/arch/powerpc/include/asm/debug.h @@ -10,7 +10,7 @@ struct pt_regs; extern struct dentry *powerpc_debugfs_root; -#if defined(CONFIG_DEBUGGER) || defined(CONFIG_KEXEC) +#if defined(CONFIG_DEBUGGER) || defined(CONFIG_KEXEC_CORE) extern int (*__debugger)(struct pt_regs *regs); extern int (*__debugger_ipi)(struct pt_regs *regs); diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h index a46f5f45570c..eca2f975bf44 100644 --- a/arch/powerpc/include/asm/kexec.h +++ b/arch/powerpc/include/asm/kexec.h @@ -53,7 +53,7 @@ typedef void (*crash_shutdown_t)(void); -#ifdef CONFIG_KEXEC +#ifdef CONFIG_KEXEC_CORE /* * This function is responsible for capturing register states if coming @@ -91,7 +91,7 @@ static inline bool kdump_in_progress(void) return crashing_cpu >= 0; } -#else /* !CONFIG_KEXEC */ +#else /* !CONFIG_KEXEC_CORE */ static inline void crash_kexec_secondary(struct pt_regs *regs) { } static inline int overlaps_crashkernel(unsigned long start, unsigned long size) @@ -116,7 +116,7 @@ static inline bool kdump_in_progress(void) return false; } -#endif /* CONFIG_KEXEC */ +#endif /* CONFIG_KEXEC_CORE */ #endif /* ! __ASSEMBLY__ */ #endif /* __KERNEL__ */ #endif /* _ASM_POWERPC_KEXEC_H */ diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h index e02cbc6a6c70..5011b69107a7 100644 --- a/arch/powerpc/include/asm/machdep.h +++ b/arch/powerpc/include/asm/machdep.h @@ -183,7 +183,7 @@ struct machdep_calls { */ void (*machine_shutdown)(void); -#ifdef CONFIG_KEXEC +#ifdef CONFIG_KEXEC_CORE void (*kexec_cpu_down)(int crash_shutdown, int secondary); /* Called to do what every setup is needed on image and the @@ -198,7 +198,7 @@ struct machdep_calls { * no return. 
	 */
 	void (*machine_kexec)(struct kimage *image);
-#endif /* CONFIG_KEXEC */
+#endif /* CONFIG_KEXEC_CORE */
 
 #ifdef CONFIG_SUSPEND
 	/* These are called to disable and enable, respectively, IRQs when

diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index 0d02c11dc331..32db16d2e7ad 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -176,7 +176,7 @@ static inline void
[PATCH v11 1/8] kexec_file: Allow arch-specific memory walking for kexec_add_buffer
From: Thiago Jung Bauermann

Allow architectures to specify a different memory walking function for kexec_add_buffer. x86 uses iomem to track reserved memory ranges, but PowerPC uses the memblock subsystem.

Signed-off-by: Thiago Jung Bauermann
Acked-by: Dave Young
Acked-by: Balbir Singh
Signed-off-by: Michael Ellerman
---
 include/linux/kexec.h   | 29 -
 kernel/kexec_file.c     | 30 ++
 kernel/kexec_internal.h | 16
 3 files changed, 50 insertions(+), 25 deletions(-)

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 406c33dcae13..5e320ddaaa82 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -148,7 +148,34 @@ struct kexec_file_ops {
 	kexec_verify_sig_t *verify_sig;
 #endif
 };
-#endif
+
+/**
+ * struct kexec_buf - parameters for finding a place for a buffer in memory
+ * @image:	kexec image in which memory to search.
+ * @buffer:	Contents which will be copied to the allocated memory.
+ * @bufsz:	Size of @buffer.
+ * @mem:	On return will have address of the buffer in memory.
+ * @memsz:	Size for the buffer in memory.
+ * @buf_align:	Minimum alignment needed.
+ * @buf_min:	The buffer can't be placed below this address.
+ * @buf_max:	The buffer can't be placed above this address.
+ * @top_down:	Allocate from top of memory.
+ */
+struct kexec_buf {
+	struct kimage *image;
+	char *buffer;
+	unsigned long bufsz;
+	unsigned long mem;
+	unsigned long memsz;
+	unsigned long buf_align;
+	unsigned long buf_min;
+	unsigned long buf_max;
+	bool top_down;
+};
+
+int __weak arch_kexec_walk_mem(struct kexec_buf *kbuf,
+			       int (*func)(u64, u64, void *));
+#endif /* CONFIG_KEXEC_FILE */
 
 struct kimage {
 	kimage_entry_t head;

diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 037c321c5618..f865674bff51 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -428,6 +428,27 @@ static int locate_mem_hole_callback(u64 start, u64 end, void *arg)
 	return locate_mem_hole_bottom_up(start, end, kbuf);
 }
 
+/**
+ * arch_kexec_walk_mem - call func(data) on free memory regions
+ * @kbuf:	Context info for the search. Also passed to @func.
+ * @func:	Function to call for each memory region.
+ *
+ * Return: The memory walk will stop when func returns a non-zero value
+ * and that value will be returned. If all free regions are visited without
+ * func returning non-zero, then zero will be returned.
+ */
+int __weak arch_kexec_walk_mem(struct kexec_buf *kbuf,
+			       int (*func)(u64, u64, void *))
+{
+	if (kbuf->image->type == KEXEC_TYPE_CRASH)
+		return walk_iomem_res_desc(crashk_res.desc,
+					   IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY,
+					   crashk_res.start, crashk_res.end,
+					   kbuf, func);
+	else
+		return walk_system_ram_res(0, ULONG_MAX, kbuf, func);
+}
+
 /*
  * Helper function for placing a buffer in a kexec segment. This assumes
  * that kexec_mutex is held.
  */
@@ -474,14 +495,7 @@ int kexec_add_buffer(struct kimage *image, char *buffer, unsigned long bufsz,
 	kbuf->top_down = top_down;
 
 	/* Walk the RAM ranges and allocate a suitable range for the buffer */
-	if (image->type == KEXEC_TYPE_CRASH)
-		ret = walk_iomem_res_desc(crashk_res.desc,
-				IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY,
-				crashk_res.start, crashk_res.end, kbuf,
-				locate_mem_hole_callback);
-	else
-		ret = walk_system_ram_res(0, -1, kbuf,
-					  locate_mem_hole_callback);
+	ret = arch_kexec_walk_mem(kbuf, locate_mem_hole_callback);
 	if (ret != 1) {
 		/* A suitable memory range could not be found for buffer */
 		return -EADDRNOTAVAIL;

diff --git a/kernel/kexec_internal.h b/kernel/kexec_internal.h
index 0a52315d9c62..4cef7e4706b0 100644
--- a/kernel/kexec_internal.h
+++ b/kernel/kexec_internal.h
@@ -20,22 +20,6 @@ struct kexec_sha_region {
 	unsigned long len;
 };
 
-/*
- * Keeps track of buffer parameters as provided by caller for requesting
- * memory placement of buffer.
- */
-struct kexec_buf {
-	struct kimage *image;
-	char *buffer;
-	unsigned long bufsz;
-	unsigned long mem;
-	unsigned long memsz;
-	unsigned long buf_align;
-	unsigned long buf_min;
-	unsigned long buf_max;
-	bool top_down; /* allocate from top of memory hole */
-};
-
 void kimage_file_post_load_cleanup(struct kimage *image);
 #else /* CONFIG_KEXEC_FILE */
 static
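[Editorial aside: the stop-on-non-zero contract that arch_kexec_walk_mem() documents above can be sketched outside the kernel. The region table, `walk_regions` and `fits` below are illustrative stand-ins, not kernel APIs.]

```c
#include <stddef.h>

/* Hypothetical stand-in for a table of free RAM regions (not a kernel type). */
struct mem_region {
	unsigned long long start, end;	/* inclusive bounds */
};

/*
 * Sketch of the walk contract: call func(start, end, data) for each
 * region; stop at the first non-zero return and propagate it, or
 * return 0 if every region was visited.
 */
static int walk_regions(const struct mem_region *r, size_t n, void *data,
			int (*func)(unsigned long long, unsigned long long, void *))
{
	size_t i;

	for (i = 0; i < n; i++) {
		int ret = func(r[i].start, r[i].end, data);

		if (ret)
			return ret;
	}
	return 0;
}

/* Callback in the style of locate_mem_hole_callback(): 1 = found, 0 = keep going. */
static int fits(unsigned long long start, unsigned long long end, void *data)
{
	unsigned long long need = *(unsigned long long *)data;

	return (end - start + 1 >= need) ? 1 : 0;
}

/* Demo: a 4-byte and a 16-byte region; returns 1 iff some region holds `need` bytes. */
static int demo_walk(unsigned long long need)
{
	struct mem_region r[] = { { 0x100, 0x103 }, { 0x200, 0x20f } };

	return walk_regions(r, 2, &need, fits);
}
```

The callback owning the "stop" decision is what lets architectures swap the iterator (iomem vs. memblock) without touching the placement logic.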
[PATCH v11 6/8] powerpc: Add purgatory for kexec_file_load() implementation.
From: Thiago Jung Bauermann

This purgatory implementation is based on the versions from kexec-tools and kexec-lite, with additional changes.

Signed-off-by: Thiago Jung Bauermann
Signed-off-by: Michael Ellerman
---
 arch/powerpc/Makefile                  | 1 +
 arch/powerpc/kernel/machine_kexec_64.c | 2 +-
 arch/powerpc/purgatory/.gitignore      | 2 +
 arch/powerpc/purgatory/Makefile        | 15
 arch/powerpc/purgatory/trampoline.S    | 128 +
 5 files changed, 147 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/purgatory/.gitignore
 create mode 100644 arch/powerpc/purgatory/Makefile
 create mode 100644 arch/powerpc/purgatory/trampoline.S

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 617dece67924..5e7dcdaf93f5 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -249,6 +249,7 @@ core-y += arch/powerpc/kernel/ \
 core-$(CONFIG_XMON)		+= arch/powerpc/xmon/
 core-$(CONFIG_KVM)		+= arch/powerpc/kvm/
 core-$(CONFIG_PERF_EVENTS)	+= arch/powerpc/perf/
+core-$(CONFIG_KEXEC_FILE)	+= arch/powerpc/purgatory/
 
 drivers-$(CONFIG_OPROFILE)	+= arch/powerpc/oprofile/

diff --git a/arch/powerpc/kernel/machine_kexec_64.c b/arch/powerpc/kernel/machine_kexec_64.c
index a205fa3d9bf3..5c12e21d0d1a 100644
--- a/arch/powerpc/kernel/machine_kexec_64.c
+++ b/arch/powerpc/kernel/machine_kexec_64.c
@@ -310,7 +310,7 @@ void default_machine_kexec(struct kimage *image)
 	if (!kdump_in_progress())
 		kexec_prepare_cpus();
 
-	pr_debug("kexec: Starting switchover sequence.\n");
+	printk("kexec: Starting switchover sequence.\n");
 
 	/* switch to a staticly allocated stack. Based on irq stack code.
 	 * We setup preempt_count to avoid using VMX in memcpy.
diff --git a/arch/powerpc/purgatory/.gitignore b/arch/powerpc/purgatory/.gitignore
new file mode 100644
index 000000000000..e9e66f178a6d
--- /dev/null
+++ b/arch/powerpc/purgatory/.gitignore
@@ -0,0 +1,2 @@
+kexec-purgatory.c
+purgatory.ro

diff --git a/arch/powerpc/purgatory/Makefile b/arch/powerpc/purgatory/Makefile
new file mode 100644
index 000000000000..ac8793c13348
--- /dev/null
+++ b/arch/powerpc/purgatory/Makefile
@@ -0,0 +1,15 @@
+targets += trampoline.o purgatory.ro kexec-purgatory.c
+
+LDFLAGS_purgatory.ro := -e purgatory_start -r --no-undefined
+
+$(obj)/purgatory.ro: $(obj)/trampoline.o FORCE
+	$(call if_changed,ld)
+
+CMD_BIN2C = $(objtree)/scripts/basic/bin2c
+quiet_cmd_bin2c = BIN2C $@
+      cmd_bin2c = $(CMD_BIN2C) kexec_purgatory < $< > $@
+
+$(obj)/kexec-purgatory.c: $(obj)/purgatory.ro FORCE
+	$(call if_changed,bin2c)
+
+obj-y += kexec-purgatory.o

diff --git a/arch/powerpc/purgatory/trampoline.S b/arch/powerpc/purgatory/trampoline.S
new file mode 100644
index 000000000000..f9760ccf4032
--- /dev/null
+++ b/arch/powerpc/purgatory/trampoline.S
@@ -0,0 +1,128 @@
+/*
+ * kexec trampoline
+ *
+ * Based on code taken from kexec-tools and kexec-lite.
+ *
+ * Copyright (C) 2004 - 2005, Milton D Miller II, IBM Corporation
+ * Copyright (C) 2006, Mohan Kumar M, IBM Corporation
+ * Copyright (C) 2013, Anton Blanchard, IBM Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU General Public License as published by the Free
+ * Software Foundation (version 2 of the License).
+ */
+
+#if defined(__LITTLE_ENDIAN__)
+#define STWX_BE	stwbrx
+#define LWZX_BE	lwbrx
+#elif defined(__BIG_ENDIAN__)
+#define STWX_BE	stwx
+#define LWZX_BE	lwzx
+#else
+#error no endianness defined!
+#endif
+
+	.machine ppc64
+	.balign 256
+	.globl purgatory_start
+purgatory_start:
+	b	master
+
+	/* ABI: possible run_at_load flag at 0x5c */
+	.org purgatory_start + 0x5c
+	.globl run_at_load
+run_at_load:
+	.long 0
+	.size run_at_load, . - run_at_load
+
+	/* ABI: slaves start at 60 with r3=phys */
+	.org purgatory_start + 0x60
+slave:
+	b	.
+	/* ABI: end of copied region */
+	.org purgatory_start + 0x100
+	.size purgatory_start, . - purgatory_start
+
+/*
+ * The above 0x100 bytes at purgatory_start are replaced with the
+ * code from the kernel (or next stage) by setup_purgatory().
+ */
+
+master:
+	or	%r1,%r1,%r1	/* low priority to let other threads catch up */
+	isync
+	mr	%r17,%r3	/* save cpu id to r17 */
+	mr	%r15,%r4	/* save physical address in reg15 */
+
+	or	%r3,%r3,%r3	/* ok now to high priority, lets boot */
+	lis	%r6,0x1
+	mtctr	%r6		/* delay a bit for slaves to catch up */
+	bdnz	.		/* before we overwrite 0-100 again */
+
+	bl	0f		/* Work out where we're running */
Re: [3/3] powerpc/64e: don't branch to dot symbols
On Wed, 2016-11-23 at 13:02:09 UTC, Nicholas Piggin wrote:
> This converts one that was missed by b1576fec7f4d ("powerpc: No need
> to use dot symbols when branching to a function").
>
> Signed-off-by: Nicholas Piggin

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/ae88f7b9af17a1267f5dd5b87a4487

cheers
Re: [v7,1/7] powerpc/mm: Rename hugetlb-radix.h to hugetlb.h
On Mon, 2016-11-28 at 06:16:58 UTC, "Aneesh Kumar K.V" wrote:
> We will start moving some book3s specific hugetlb functions there.
>
> Signed-off-by: Aneesh Kumar K.V

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/bee8b3b56d1dfc4075254a61340ee3

cheers
[PATCH] PPC/CAS Add support for power9 in ibm_architecture_vec
The PVR list has been updated, and IBM_ARCH_VEC_NRCORES_OFFSET adjusted to match. This provides the cpu versions supported to the hypervisor and in this case tells the hypervisor that the guest supports ISA 3.0 and Power9.

Signed-off-by: Balbir Singh
---
 arch/powerpc/include/asm/prom.h | 2 ++
 arch/powerpc/kernel/prom_init.c | 7 +--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/prom.h b/arch/powerpc/include/asm/prom.h
index 7f436ba..785bc6b 100644
--- a/arch/powerpc/include/asm/prom.h
+++ b/arch/powerpc/include/asm/prom.h
@@ -121,6 +121,8 @@ struct of_drconf_cell {
 #define OV1_PPC_2_06	0x02	/* set if we support PowerPC 2.06 */
 #define OV1_PPC_2_07	0x01	/* set if we support PowerPC 2.07 */
 
+#define OV1_PPC_3_00	0x80	/* set if we support PowerPC 3.00 */
+
 /* Option vector 2: Open Firmware options supported */
 #define OV2_REAL_MODE	0x20	/* set if we want OF in real mode */

diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index 88ac964..2a8d6b0 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -659,6 +659,8 @@ unsigned char ibm_architecture_vec[] = {
 	W(0xffff0000), W(0x004b0000),	/* POWER8E */
 	W(0xffff0000), W(0x004c0000),	/* POWER8NVL */
 	W(0xffff0000), W(0x004d0000),	/* POWER8 */
+	W(0xffff0000), W(0x004e0000),	/* POWER9 */
+	W(0xffffffff), W(0x0f000005),	/* all 3.00-compliant */
 	W(0xffffffff), W(0x0f000004),	/* all 2.07-compliant */
 	W(0xffffffff), W(0x0f000003),	/* all 2.06-compliant */
 	W(0xffffffff), W(0x0f000002),	/* all 2.05-compliant */
@@ -666,10 +668,11 @@ unsigned char ibm_architecture_vec[] = {
 	NUM_VECTORS(6),			/* 6 option vectors */
 
 	/* option vector 1: processor architectures supported */
-	VECTOR_LENGTH(2),		/* length */
+	VECTOR_LENGTH(3),		/* length */
 	0,				/* don't ignore, don't halt */
 	OV1_PPC_2_00 | OV1_PPC_2_01 | OV1_PPC_2_02 | OV1_PPC_2_03 |
 	OV1_PPC_2_04 | OV1_PPC_2_05 | OV1_PPC_2_06 | OV1_PPC_2_07,
+	OV1_PPC_3_00,
 
 	/* option vector 2: Open Firmware options supported */
 	VECTOR_LENGTH(33),		/* length */
@@ -720,7 +723,7 @@ unsigned char ibm_architecture_vec[] = {
	 * must
	 * match by the macro below. Update the definition if
	 * the structure layout changes.
	 */
-#define IBM_ARCH_VEC_NRCORES_OFFSET	133
+#define IBM_ARCH_VEC_NRCORES_OFFSET	150
 	W(NR_CPUS),			/* number of cores supported */
 	0,
 	0,
-- 
2.5.5
Re: [1/3] powerpc: Stop passing ARCH=ppc64 to boot Makefile
On Mon, 2016-11-21 at 10:14:33 UTC, Michael Ellerman wrote:
> Back in 2005 when the ppc/ppc64 merge started, we used to build the
> kernel code in arch/powerpc but use the boot code from arch/ppc or
> arch/ppc64 depending on whether we were building for 32 or 64-bit.
>
> Originally we called the boot Makefile passing ARCH=$(OLDARCH), where
> OLDARCH was ppc or ppc64.
>
> In commit 20f629549b30 ("powerpc: Make building the boot image work for
> both 32-bit and 64-bit") (2005-10-11) we split the call for 32/64-bit
> using an ifeq check, because the two Makefiles took different targets,
> and explicitly passed ARCH=ppc64 for the 64-bit case and ARCH=ppc for
> the 32-bit case.
>
> Then in commit 94b212c29f68 ("powerpc: Move ppc64 boot wrapper code over
> to arch/powerpc") (2005-11-16) we moved the boot code into arch/powerpc
> and dropped the ppc case, but kept passing ARCH=ppc64 to
> arch/powerpc/boot/Makefile.
>
> Since then there have been several more boot targets added, all of which
> have copied the ARCH=ppc64 setting, such that now we have four targets
> using it.
>
> Currently it seems that nothing actually uses the ARCH value, but that's
> basically just luck, and in particular it prevents us from using the
> generic cpp_lds_S rule. It's also clearly wrong, ARCH=ppc64 is dead,
> buried and cremated.
>
> Fix it by dropping the setting of ARCH completely, the correct value is
> exported by the top level Makefile.
>
> Signed-off-by: Michael Ellerman

Series applied to powerpc next.

https://git.kernel.org/powerpc/c/1196d7aaebf6cdad619310fe283422

cheers
[PATCH v11 0/8] powerpc: Implement kexec_file_load()
This is v11 of the kexec_file_load() for powerpc series.

I've stripped this down to the minimum we need, so we can get this in for 4.10. Any additions can come later incrementally. If no one objects I'll merge this via the powerpc tree.

The three kexec patches have been acked by Dave Young (since forever), and have been in linux-next (via akpm's tree) also for a long time.

cheers

v11 (Michael Ellerman):
 - Strip back purgatory to the minimal trampoline required. This avoids
   complexity in the purgatory environment where all exceptions are fatal.
 - Reorder the series so we don't start advertising the config symbol, or
   more importantly the syscall, until they're actually implemented.

Original cover letter by Thiago:

This patch series implements the kexec_file_load system call on PowerPC. This system call moves the reading of the kernel, initrd and the device tree from the userspace kexec tool to the kernel. This is needed if you want to do one or both of the following:

1. only allow loading of signed kernels.
2. "measure" (i.e., record the hashes of) the kernel, initrd, kernel command
   line and other boot inputs for the Integrity Measurement Architecture
   subsystem.

The above are the functions kexec already has built into kexec_file_load. Yesterday I posted a set of patches which allows a third feature:

3. have IMA pass on its event log (where integrity measurements are
   registered) across kexec to the second kernel, so that the event history
   is preserved.

Because OpenPower uses an intermediary Linux instance as a boot loader (skiroot), feature 1 is needed to implement secure boot for the platform, while features 2 and 3 are needed to implement trusted boot.

This patch series starts by removing an x86 assumption from kexec_file: kexec_add_buffer uses iomem to find reserved memory ranges, but PowerPC uses the memblock subsystem. A hook is added so that each arch can specify how memory ranges can be found.
Also, the memory-walking logic in kexec_add_buffer is useful in this implementation to find a free area for the purgatory's stack, so the next patch moves that logic to kexec_locate_mem_hole. The kexec_file_load system call needs to apply relocations to the purgatory but adding code for that would duplicate functionality with the module loading mechanism, which also needs to apply relocations to the kernel modules. Therefore, this patch series factors out the module relocation code so that it can be shared. One thing that is still missing is crashkernel support, which I intend to submit shortly. For now, arch_kexec_kernel_image_probe rejects crash kernels. This code is based on kexec-tools, but with many modifications to adapt it to the kernel environment and facilities.
[PATCH v11 5/8] powerpc: Add support code for kexec_file_load()
From: Thiago Jung Bauermann

This patch adds the support code needed for implementing kexec_file_load() on powerpc. This consists of functions to load the ELF kernel, either big or little endian, and set up the purgatory environment which switches from the first kernel to the second kernel.

None of this code is built yet, as it depends on CONFIG_KEXEC_FILE which we have not yet defined. Although we could define CONFIG_KEXEC_FILE in this patch, we'd then have a window in history where the kconfig symbol is present but the syscall is not, which would be awkward.

Signed-off-by: Josh Sklar
Signed-off-by: Thiago Jung Bauermann
Signed-off-by: Michael Ellerman
---
 arch/powerpc/include/asm/kexec.h            | 10 +
 arch/powerpc/kernel/Makefile                | 1 +
 arch/powerpc/kernel/kexec_elf_64.c          | 663
 arch/powerpc/kernel/machine_kexec_file_64.c | 338 ++
 4 files changed, 1012 insertions(+)
 create mode 100644 arch/powerpc/kernel/kexec_elf_64.c
 create mode 100644 arch/powerpc/kernel/machine_kexec_file_64.c

diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index eca2f975bf44..6c3b71502fbc 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -91,6 +91,16 @@ static inline bool kdump_in_progress(void)
 	return crashing_cpu >= 0;
 }
 
+#ifdef CONFIG_KEXEC_FILE
+extern struct kexec_file_ops kexec_elf64_ops;
+
+int setup_purgatory(struct kimage *image, const void *slave_code,
+		    const void *fdt, unsigned long kernel_load_addr,
+		    unsigned long fdt_load_addr);
+int setup_new_fdt(void *fdt, unsigned long initrd_load_addr,
+		  unsigned long initrd_len, const char *cmdline);
+#endif /* CONFIG_KEXEC_FILE */
+
 #else /* !CONFIG_KEXEC_CORE */
 static inline void crash_kexec_secondary(struct pt_regs *regs) { }

diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 22534a56c914..41d8ff34ae27 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -109,6 +109,7 @@ obj-$(CONFIG_PCI) += pci_$(BITS).o $(pci64-y) \
 obj-$(CONFIG_PCI_MSI)		+= msi.o
 obj-$(CONFIG_KEXEC_CORE)	+= machine_kexec.o crash.o \
					machine_kexec_$(BITS).o
+obj-$(CONFIG_KEXEC_FILE)	+= machine_kexec_file_$(BITS).o kexec_elf_$(BITS).o
 obj-$(CONFIG_AUDIT)		+= audit.o
 obj64-$(CONFIG_AUDIT)		+= compat_audit.o

diff --git a/arch/powerpc/kernel/kexec_elf_64.c b/arch/powerpc/kernel/kexec_elf_64.c
new file mode 100644
index 000000000000..6acffd34a70f
--- /dev/null
+++ b/arch/powerpc/kernel/kexec_elf_64.c
@@ -0,0 +1,663 @@
+/*
+ * Load ELF vmlinux file for the kexec_file_load syscall.
+ *
+ * Copyright (C) 2004 Adam Litke (a...@us.ibm.com)
+ * Copyright (C) 2004 IBM Corp.
+ * Copyright (C) 2005 R Sharada (shar...@in.ibm.com)
+ * Copyright (C) 2006 Mohan Kumar M (mo...@in.ibm.com)
+ * Copyright (C) 2016 IBM Corporation
+ *
+ * Based on kexec-tools' kexec-elf-exec.c and kexec-elf-ppc64.c.
+ * Heavily modified for the kernel by
+ * Thiago Jung Bauermann .
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation (version 2 of the License).
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#define pr_fmt(fmt)	"kexec_elf: " fmt
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#define PURGATORY_STACK_SIZE	(16 * 1024)
+
+#define elf_addr_to_cpu	elf64_to_cpu
+
+#ifndef Elf_Rel
+#define Elf_Rel	Elf64_Rel
+#endif /* Elf_Rel */
+
+struct elf_info {
+	/*
+	 * Where the ELF binary contents are kept.
+	 * Memory managed by the user of the struct.
+	 */
+	const char *buffer;
+
+	const struct elfhdr *ehdr;
+	const struct elf_phdr *proghdrs;
+	struct elf_shdr *sechdrs;
+};
+
+static inline bool elf_is_elf_file(const struct elfhdr *ehdr)
+{
+	return memcmp(ehdr->e_ident, ELFMAG, SELFMAG) == 0;
+}
+
+static uint64_t elf64_to_cpu(const struct elfhdr *ehdr, uint64_t value)
+{
+	if (ehdr->e_ident[EI_DATA] == ELFDATA2LSB)
+		value = le64_to_cpu(value);
+	else if (ehdr->e_ident[EI_DATA] == ELFDATA2MSB)
+		value = be64_to_cpu(value);
+
+	return value;
+}
+
+static uint16_t elf16_to_cpu(const struct elfhdr *ehdr, uint16_t value)
+{
+	if (ehdr->e_ident[EI_DATA] == ELFDATA2LSB)
+		value = le16_to_cpu(value);
+	else if
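[Editorial aside: the elf64_to_cpu()/elf16_to_cpu() helpers above pick a byte swap based on the ELF header's EI_DATA byte. The same decision can be sketched in portable userspace C — the `MY_ELFDATA*` macros and function names below are illustrative, not the kernel's.]

```c
#include <stdint.h>

/* Illustrative stand-ins for the <elf.h> EI_DATA values. */
#define MY_ELFDATA2LSB 1	/* file stores fields little-endian */
#define MY_ELFDATA2MSB 2	/* file stores fields big-endian */

/* Detect host endianness at run time. */
static int host_is_le(void)
{
	uint16_t probe = 1;

	return *(const uint8_t *)&probe == 1;
}

static uint16_t swap16(uint16_t v)
{
	return (uint16_t)((v >> 8) | (v << 8));
}

/*
 * Convert a 16-bit value from file order to host order given the ELF
 * header's EI_DATA byte -- the same decision elf16_to_cpu() makes:
 * byte-swap only when file and host endianness differ.
 */
static uint16_t file16_to_cpu(int ei_data, uint16_t v)
{
	int file_is_le = (ei_data == MY_ELFDATA2LSB);

	return file_is_le == host_is_le() ? v : swap16(v);
}
```

This is why the loader can handle both big- and little-endian vmlinux images on either kind of host: the swap is keyed off the file, not the build.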
Re: [PATCH v4 3/7] PCI: Separate VF BAR updates from standard BAR updates
On Tue, Nov 29, 2016 at 03:55:46PM +1100, Gavin Shan wrote:
> On Mon, Nov 28, 2016 at 10:15:06PM -0600, Bjorn Helgaas wrote:
> >Previously pci_update_resource() used the same code path for updating
> >standard BARs and VF BARs in SR-IOV capabilities.
> >
> >Split the VF BAR update into a new pci_iov_update_resource() internal
> >interface, which makes it simpler to compute the BAR address (we can get
> >rid of pci_resource_bar() and pci_iov_resource_bar()).
> >
> >This patch:
> >
> >  - Renames pci_update_resource() to pci_std_update_resource(),
> >  - Adds pci_iov_update_resource(),
> >  - Makes pci_update_resource() a wrapper that calls the appropriate one,
> >
> >No functional change intended.
> >
> >Signed-off-by: Bjorn Helgaas
> 
> With below minor comments fixed:
> 
> Reviewed-by: Gavin Shan
> 
> >---
> > drivers/pci/iov.c       | 49 +++
> > drivers/pci/pci.h       |  1 +
> > drivers/pci/setup-res.c | 13 +++-
> > 3 files changed, 61 insertions(+), 2 deletions(-)
> >
> >diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> >index d41ec29..d00ed5c 100644
> >--- a/drivers/pci/iov.c
> >+++ b/drivers/pci/iov.c
> >@@ -571,6 +571,55 @@ int pci_iov_resource_bar(struct pci_dev *dev, int resno)
> > 		4 * (resno - PCI_IOV_RESOURCES);
> > }
> >
> >+/**
> >+ * pci_iov_update_resource - update a VF BAR
> >+ * @dev: the PCI device
> >+ * @resno: the resource number
> >+ *
> >+ * Update a VF BAR in the SR-IOV capability of a PF.
> >+ */
> >+void pci_iov_update_resource(struct pci_dev *dev, int resno)
> >+{
> >+	struct pci_sriov *iov = dev->is_physfn ? dev->sriov : NULL;
> >+	struct resource *res = dev->resource + resno;
> >+	int vf_bar = resno - PCI_IOV_RESOURCES;
> >+	struct pci_bus_region region;
> >+	u32 new;
> >+	int reg;
> >+
> >+	/*
> >+	 * The generic pci_restore_bars() path calls this for all devices,
> >+	 * including VFs and non-SR-IOV devices. If this is not a PF, we
> >+	 * have nothing to do.
> >+	 */
> >+	if (!iov)
> >+		return;
> >+
> >+	/*
> >+	 * Ignore unimplemented BARs, unused resource slots for 64-bit
> >+	 * BARs, and non-movable resources, e.g., those described via
> >+	 * Enhanced Allocation.
> >+	 */
> >+	if (!res->flags)
> >+		return;
> >+
> >+	if (res->flags & IORESOURCE_UNSET)
> >+		return;
> >+
> >+	if (res->flags & IORESOURCE_PCI_FIXED)
> >+		return;
> >+
> >+	pcibios_resource_to_bus(dev->bus, &region, res);
> >+	new = region.start;
> >+
> 
> The bits indicating the BAR's property (e.g. memory, IO etc) are missed
> in @new.

Hmm, yes. I omitted those because those bits are supposed to be read-only, per spec (PCI r3.0, sec 6.2.5.1). But I guess it would be more conservative to keep them, and this shouldn't be needlessly different from pci_std_update_resource().

However, I don't think this code in pci_update_resource() is obviously correct:

  new = region.start | (res->flags & PCI_REGION_FLAG_MASK);

PCI_REGION_FLAG_MASK is 0xf. For memory BARs, bits 0-3 are read-only property bits. For I/O BARs, bits 0-1 are read-only and bits 2-3 are part of the address, so on the face of it, the above could corrupt two bits of an I/O address.

It's true that decode_bar() initializes flags correctly, using PCI_BASE_ADDRESS_IO_MASK for I/O BARs and PCI_BASE_ADDRESS_MEM_MASK for memory BARs, but it would take a little more digging to be sure that we never set bits 2-3 of flags for an I/O resource elsewhere.

How about this in pci_std_update_resource():

  pcibios_resource_to_bus(dev->bus, &region, res);
  new = region.start;

  if (res->flags & IORESOURCE_IO) {
          mask = (u32)PCI_BASE_ADDRESS_IO_MASK;
          new |= res->flags & ~PCI_BASE_ADDRESS_IO_MASK;
  } else {
          mask = (u32)PCI_BASE_ADDRESS_MEM_MASK;
          new |= res->flags & ~PCI_BASE_ADDRESS_MEM_MASK;
  }

and this in pci_iov_update_resource():

  pcibios_resource_to_bus(dev->bus, &region, res);
  new = region.start;
  new |= res->flags & ~PCI_BASE_ADDRESS_MEM_MASK;

It shouldn't fix anything, but I think it is more obvious that we can't corrupt bits 2-3 of an I/O BAR.
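[Editorial aside: the per-type masking argued for above can be checked with a small userspace sketch. The mask values match PCI_BASE_ADDRESS_{IO,MEM}_MASK from the kernel's pci_regs.h; `bar_value` and its low-nibble-only flags model are illustrative, not kernel code.]

```c
#include <stdint.h>

/* Same values as PCI_BASE_ADDRESS_{IO,MEM}_MASK in pci_regs.h. */
#define MY_IO_MASK	(~0x03U)	/* I/O BAR: bits 0-1 read-only, bits 2-3 are address */
#define MY_MEM_MASK	(~0x0fU)	/* memory BAR: bits 0-3 read-only property bits */

/*
 * Sketch of the proposed BAR value computation: OR in only the
 * read-only property bits of the flags, with the mask chosen by BAR
 * type, so flag bits can never clobber address bits 2-3 of an I/O
 * BAR the way a blanket 0xf mask could.
 */
static uint32_t bar_value(uint32_t bus_addr, uint32_t flags, int is_io)
{
	uint32_t mask = is_io ? MY_IO_MASK : MY_MEM_MASK;

	return bus_addr | (flags & ~mask);
}
```

With an I/O BAR at 0x1004 (address bit 2 set), even flags with bits 2-3 set cannot corrupt the address, while a memory BAR still keeps its type/prefetch bits.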
> >+	reg = iov->pos + PCI_SRIOV_BAR + 4 * vf_bar;
> >+	pci_write_config_dword(dev, reg, new);
> >+	if (res->flags & IORESOURCE_MEM_64) {
> >+		new = region.start >> 16 >> 16;
> 
> I think it was copied from pci_update_resource(). Why we can't just
> have "new = region.start >> 32"?

Right; I did copy this from pci_update_resource().

The changelog from cf7bee5a0bf2 ("[PATCH] Fix restore of 64-bit PCI BAR's") says "Also make sure to write high bits - use "x >> 16 >> 16" (rather than the simpler ">> 32") to avoid warnings on 32-bit architectures where we're not going to have any high bits."

I didn't take the time to revalidate whether that's still applicable.
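[Editorial aside: the ">> 16 >> 16" idiom is about more than warnings — in C, shifting a value by its full width is undefined behavior (C11 6.5.7p3), and on 32-bit architectures the resource address type can be only 32 bits wide. A minimal sketch, with illustrative function names:]

```c
#include <stdint.h>

/*
 * "v >> 32" on a 32-bit type is undefined behavior; "v >> 16 >> 16"
 * is always defined and simply yields 0 when no high bits exist.
 */
static uint32_t hi32_of_32(uint32_t v)
{
	return v >> 16 >> 16;	/* well defined: always 0 */
}

/* On a 64-bit type the same idiom extracts the upper 32 bits. */
static uint32_t hi32_of_64(uint64_t v)
{
	return (uint32_t)(v >> 16 >> 16);
}
```

So the idiom lets one expression serve both widths of resource_size_t without per-arch #ifdefs.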
Re: [PATCH] scsi/ipr: Fix runaway IRQs when falling back from MSI to LSI
> "Benjamin" == Benjamin Herrenschmidt writes:

Benjamin> LSIs must be ack'ed with an MMIO otherwise they remain
Benjamin> asserted forever. This is controlled by the "clear_isr" flag.
Benjamin> While we set that flag properly when deciding initially
Benjamin> whether to use LSIs or MSIs, we fail to set it if we first
Benjamin> chose MSIs, the test fails, then fallback to LSIs.

Brian: Please review!

-- 
Martin K. Petersen	Oracle Linux Engineering
Re: [PATCH v2 04/14] cxlflash: Avoid command room violation
Uma,

This looks better, thanks for reworking.

-matt

> On Nov 28, 2016, at 6:41 PM, Uma Krishnan wrote:
>
> During test, a command room violation interrupt is occasionally seen
> for the master context when the CXL flash devices are stressed.
>
> After studying the code, there could be gaps in the way command room
> value is being cached in cxlflash. When the cached command room is zero
> the thread attempting to send becomes burdened with updating the cached
> value with the actual value from the AFU. Today, this is handled with an
> atomic set operation of the raw value read. Following the atomic update,
> the thread proceeds to send.
>
> This behavior is incorrect on two counts:
>
>  - The update fails to take into account the current thread and its
>    consumption of one of the hardware commands.
>
>  - The update does not take into account other threads also atomically
>    updating. Per design, a worker thread updates the cached value when a
>    send thread times out. By not protecting the update with a lock, the
>    cached value can be incorrectly clobbered.
>
> To correct these issues, the update of the cached command room has been
> simplified and also protected using a spin lock which is held until the
> MMIO is complete. This ensures the command room is properly consumed by
> the same thread. Update of cached value also takes into account the
> current thread consuming a hardware command.
>
> Signed-off-by: Uma Krishnan

Acked-by: Matthew R. Ochs
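[Editorial aside: the fix described above — refill the cached credit count and account for the requesting thread's own command inside one locked section — can be sketched in userspace with a mutex standing in for the spin lock. The struct and function names are hypothetical, not cxlflash's.]

```c
#include <pthread.h>

/* Hypothetical stand-in for the AFU state; not the cxlflash types. */
struct afu_sketch {
	pthread_mutex_t room_lock;	/* held across the read-modify of the cache */
	int cached_room;		/* command credits believed to remain */
	int hw_room;			/* what the "MMIO read" would return */
};

/*
 * Try to consume one hardware command slot.  When the cache is
 * exhausted, refill it from the (simulated) MMIO read and debit the
 * current thread's own command in the same locked section, so no
 * other thread can observe an over-stated credit count.
 */
static int try_consume_room(struct afu_sketch *afu)
{
	int ok = 0;

	pthread_mutex_lock(&afu->room_lock);
	if (afu->cached_room == 0)
		afu->cached_room = afu->hw_room;	/* refill from "MMIO" */
	if (afu->cached_room > 0) {
		afu->cached_room--;	/* this thread consumes one command */
		ok = 1;
	}
	pthread_mutex_unlock(&afu->room_lock);
	return ok;
}

/* Demo: 2 hardware slots; a third send must fail once hw reports no room. */
static int demo_sends(void)
{
	struct afu_sketch afu = {
		.room_lock = PTHREAD_MUTEX_INITIALIZER,
		.cached_room = 0,
		.hw_room = 2,
	};
	int sent = 0;

	sent += try_consume_room(&afu);
	sent += try_consume_room(&afu);
	afu.hw_room = 0;		/* AFU now reports no room */
	sent += try_consume_room(&afu);	/* fails: cache and hw both empty */
	return sent;
}
```

The key point mirrors the commit message: the refill and the debit happen under one lock, so the "current thread's consumption" can never be lost between the MMIO read and the send.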
powerpc/ps3: Fix system hang with GCC 5 builds
GCC 5 generates different code for this bootwrapper null check that causes the PS3 to hang very early in its bootup. This check is of limited value, so just get rid of it.

Signed-off-by: Geoff Levand
---
 arch/powerpc/boot/ps3-head.S | 5 -
 arch/powerpc/boot/ps3.c      | 8 +---
 2 files changed, 1 insertion(+), 12 deletions(-)

diff --git a/arch/powerpc/boot/ps3-head.S b/arch/powerpc/boot/ps3-head.S
index b6fcbaf..3dc44b0 100644
--- a/arch/powerpc/boot/ps3-head.S
+++ b/arch/powerpc/boot/ps3-head.S
@@ -57,11 +57,6 @@ __system_reset_overlay:
 	bctr
 
 1:
-	/* Save the value at addr zero for a null pointer write check later. */
-
-	li	r4, 0
-	lwz	r3, 0(r4)
-
 	/* Primary delays then goes to _zimage_start in wrapper. */
 
 	or	31, 31, 31 /* db16cyc */

diff --git a/arch/powerpc/boot/ps3.c b/arch/powerpc/boot/ps3.c
index 4ec2d86..a05558a 100644
--- a/arch/powerpc/boot/ps3.c
+++ b/arch/powerpc/boot/ps3.c
@@ -119,13 +119,12 @@ void ps3_copy_vectors(void)
 	flush_cache((void *)0x100, 512);
 }
 
-void platform_init(unsigned long null_check)
+void platform_init(void)
 {
 	const u32 heapsize = 0x1000000 - (u32)_end; /* 16MiB */
 	void *chosen;
 	unsigned long ft_addr;
 	u64 rm_size;
-	unsigned long val;
 
 	console_ops.write = ps3_console_write;
 	platform_ops.exit = ps3_exit;
@@ -153,11 +152,6 @@ void platform_init(unsigned long null_check)
 
 	printf(" flat tree at 0x%lx\n\r", ft_addr);
 
-	val = *(unsigned long *)0;
-
-	if (val != null_check)
-		printf("null check failed: %lx != %lx\n\r", val, null_check);
-
 	((kernel_entry_t)0)(ft_addr, 0, NULL);
 
 	ps3_exit();
-- 
2.7.4