Re: [RFC FIX PATCH v0] powerpc,numa: Fix memory_hotplug_max()
On 04/06/2016 04:44 AM, Bharata B Rao wrote:
> memory_hotplug_max() uses hot_add_drconf_memory_max() to get maximum
> addressable memory by referring to ibm,dynamic-memory property. There
> are three problems with the current approach:
>
> 1 hot_add_drconf_memory_max() assumes that ibm,dynamic-memory includes
>   all the LMBs of the guest, but that is not true for PowerKVM which
>   populates only DR LMBs (LMBs that can be hotplugged/removed) in that
>   property.
> 2 hot_add_drconf_memory_max() multiplies lmb-size with lmb-count to arrive
>   at the max possible address. Since ibm,dynamic-memory doesn't include
>   RMA LMBs, the address thus obtained will be less than the actual max
>   address. For example, if max possible memory size is 32G, with lmb-size
>   of 256MB there can be 127 LMBs in ibm,dynamic-memory (1 LMB for RMA
>   which won't be present here). hot_add_drconf_memory_max() would then
>   return the max addressable memory as 127 * 256MB = 31.75GB, the max
>   address should have been 32G which is what ibm,lrdr-capacity shows.
> 3 In PowerKVM, there can be a gap between the end of boot time RAM and
>   beginning of hotplug RAM area. So just multiplying lmb-count with
>   lmb-size will not provide the correct max possible address for PowerKVM.
>
> This patch fixes 1 by using ibm,lrdr-capacity property to return the max
> addressable memory whenever the property is present. Then it fixes 2 & 3
> by fetching the address of the last LMB in ibm,dynamic-memory property.
>
> NOTE: There are some unnecessary changes in the patch because of converting
> spaces to tabs w/o which checkpatch.pl complains.
>
> Signed-off-by: Bharata B Rao
> ---
>  arch/powerpc/mm/numa.c | 29 ++---
>  1 file changed, 22 insertions(+), 7 deletions(-)
>
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index 669a15e..57d5877 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -1164,17 +1164,32 @@ int hot_add_scn_to_nid(unsigned long scn_addr)
>  static u64 hot_add_drconf_memory_max(void)
>  {
>  	struct device_node *memory = NULL;
> -	unsigned int drconf_cell_cnt = 0;
> -	u64 lmb_size = 0;
> +	struct device_node *dn = NULL;
> +	unsigned int drconf_cell_cnt = 0;
> +	u64 lmb_size = 0;
>  	const __be32 *dm = NULL;
> +	const __be64 *lrdr = NULL;
> +	struct of_drconf_cell drmem;
> +
> +	dn = of_find_node_by_path("/rtas");
> +	if (dn) {
> +		lrdr = of_get_property(dn, "ibm,lrdr-capacity", NULL);
> +		of_node_put(dn);
> +		if (lrdr)
> +			return be64_to_cpup(lrdr);
> +	}
>
>  	memory = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory");
>  	if (memory) {
> -		drconf_cell_cnt = of_get_drconf_memory(memory, &dm);
> -		lmb_size = of_get_lmb_size(memory);
> -		of_node_put(memory);
> -	}
> -	return lmb_size * drconf_cell_cnt;
> +		drconf_cell_cnt = of_get_drconf_memory(memory, &dm);
> +		lmb_size = of_get_lmb_size(memory);
> +
> +		/* Advance to the last cell, each cell has 6 32 bit integers */
> +		dm += (drconf_cell_cnt - 1) * 6;

You could do this as follows to avoid hard-coding 6

	dm += (drconf_cell_cnt - 1) * sizeof(struct of_drconf_cell)

> +		read_drconf_cell(&drmem, &dm);
> +		of_node_put(memory);
> +	}
> +	return drmem.base_addr + lmb_size;

I assume it is a safe assumption that there will only be 1 RMA LMB?

I do see that the PAPR defines a bit in the flags field for each LMB in
ibm,dynamic-memory as 'reserved'. Is this something you could use to flag
RMA LMBs and put them in the ibm,dynamic-memory property? I'm just curious
why these LMBs are not in this property.

-Nathan

>  }
>
>  /*
>
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
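The difference between the two ways of computing the maximum address discussed in this thread can be sketched in plain C. This is only an illustration outside the kernel: `struct drconf_cell` is a hypothetical stand-in for `struct of_drconf_cell`, the field layout is assumed from the "6 32 bit integers" comment in the patch, and the helper names are made up for the sketch. Note also that `dm` is a `__be32` pointer, so a `sizeof`-based stride would presumably need dividing by `sizeof(__be32)` to count cells rather than bytes; `CELLS_PER_LMB` below shows that ratio for the stand-in struct.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct of_drconf_cell. */
struct drconf_cell {
	uint64_t base_addr;
	uint32_t drc_index;
	uint32_t reserved;
	uint32_t aa_index;
	uint32_t flags;
};

/* Number of 32-bit cells per LMB entry (assumes no struct padding). */
#define CELLS_PER_LMB (sizeof(struct drconf_cell) / sizeof(uint32_t))

#define LMB_SIZE (256ULL << 20) /* 256MB, as in the example in the changelog */

/* Old approach: assumes the property lists every LMB, starting at address 0. */
static uint64_t max_by_count(uint64_t lmb_size, unsigned int count)
{
	return lmb_size * count;
}

/* Patched approach: end address of the last LMB listed in the property. */
static uint64_t max_by_last_lmb(const struct drconf_cell *cells,
				unsigned int count, uint64_t lmb_size)
{
	assert(count > 0);
	return cells[count - 1].base_addr + lmb_size;
}
```

With 127 DR LMBs that start right after a single 256MB RMA LMB, the first helper gives 31.75GB while the second gives the full 32GB.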
Re: [PATCH kernel] vfio_iommu_spapr_tce: Remove unneeded iommu_group_get_iommudata
On Fri, Apr 08, 2016 at 02:54:41PM +1000, Alexey Kardashevskiy wrote:
> This removes iommu_group_get_iommudata() as the result is never used.
> As this is a minor cleanup, no change in behavior is expected.
>
> Signed-off-by: Alexey Kardashevskiy

Reviewed-by: David Gibson

> ---
>  drivers/vfio/vfio_iommu_spapr_tce.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c
> b/drivers/vfio/vfio_iommu_spapr_tce.c
> index 0582b72..6419566 100644
> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
> @@ -331,14 +331,12 @@ static void tce_iommu_free_table(struct iommu_table
> *tbl);
>  static void tce_iommu_release(void *iommu_data)
>  {
>  	struct tce_container *container = iommu_data;
> -	struct iommu_table_group *table_group;
>  	struct tce_iommu_group *tcegrp;
>  	long i;
>
>  	while (tce_groups_attached(container)) {
>  		tcegrp = list_first_entry(&container->group_list,
>  				struct tce_iommu_group, next);
> -		table_group = iommu_group_get_iommudata(tcegrp->grp);
>  		tce_iommu_detach_group(iommu_data, tcegrp->grp);
>  	}
>

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
[PATCH kernel] vfio_iommu_spapr_tce: Remove unneeded iommu_group_get_iommudata
This removes iommu_group_get_iommudata() as the result is never used.
As this is a minor cleanup, no change in behavior is expected.

Signed-off-by: Alexey Kardashevskiy
---
 drivers/vfio/vfio_iommu_spapr_tce.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index 0582b72..6419566 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -331,14 +331,12 @@ static void tce_iommu_free_table(struct iommu_table *tbl);
 static void tce_iommu_release(void *iommu_data)
 {
 	struct tce_container *container = iommu_data;
-	struct iommu_table_group *table_group;
 	struct tce_iommu_group *tcegrp;
 	long i;

 	while (tce_groups_attached(container)) {
 		tcegrp = list_first_entry(&container->group_list,
 				struct tce_iommu_group, next);
-		table_group = iommu_group_get_iommudata(tcegrp->grp);
 		tce_iommu_detach_group(iommu_data, tcegrp->grp);
 	}

-- 
2.5.0.rc3
Re: [PATCH][v3] mtd/ifc: Add support for IFC controller version 2.0
On Wed, 6 Apr 2016 18:53:39 + Yang-Leo Li wrote:

> > -Original Message-
> > From: Brian Norris [mailto:computersforpe...@gmail.com]
> > Sent: Wednesday, April 06, 2016 12:53 PM
> > To: Li Yang
> > Cc: Scott Wood; Raghav Dogra; linux-...@lists.infradead.org;
> > linuxppc-dev; Prabhakar Kushwaha; Raghav Dogra; Jaiprakash Singh;
> > Boris Brezillon
> > Subject: Re: [PATCH][v3] mtd/ifc: Add support for IFC controller version 2.0
> >
> > Hi,
> >
> > On Wed, Mar 30, 2016 at 03:43:40PM -0500, Li Yang wrote:
> > > Hi Brian,
> > >
> > > Could you help to review and pull in this patch and the Kconfig update
> > > after this patch(https://patchwork.ozlabs.org/patch/557389/)? It
> >
> > It's probably best for Boris to apply this now.
>
> Thanks Brian. I see Boris is taking over the maintenance of NAND recently,
> we will follow up with him.
>
> Hi Boris,
>
> Can you consider to pull in this patch and then Kconfig patch
> (https://patchwork.ozlabs.org/patch/557389/)?

Applied.

Thanks,

Boris

-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com
[PATCH v11 58/60] PCI: Introduce resource_disabled()
Current is using !flags, and we are going to use IORESOURCE_DISABLED instead of clearing resource flags. Let's convert all !flags to helper function resource_disabled(). resource_disabled will check !flags and IORESOURCE_DISABLED both. Cc: linux-al...@vger.kernel.org Cc: linux-i...@vger.kernel.org Cc: linux-am33-l...@redhat.com Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-s...@vger.kernel.org Cc: sparcli...@vger.kernel.org Cc: linux-...@vger.kernel.org Cc: linux-xte...@linux-xtensa.org Cc: io...@lists.linux-foundation.org Cc: linux...@vger.kernel.org Signed-off-by: Yinghai LuAcked-by: Michael Ellerman --- arch/alpha/kernel/pci.c | 2 +- arch/ia64/pci/pci.c | 4 ++-- arch/microblaze/pci/pci-common.c | 15 --- arch/mn10300/unit-asb2305/pci-asb2305.c | 4 ++-- arch/mn10300/unit-asb2305/pci.c | 4 ++-- arch/powerpc/kernel/pci-common.c | 16 +--- arch/powerpc/platforms/powernv/pci-ioda.c | 12 ++-- arch/s390/pci/pci.c | 2 +- arch/sparc/kernel/pci.c | 2 +- arch/x86/pci/i386.c | 4 ++-- arch/xtensa/kernel/pci.c | 4 ++-- drivers/iommu/intel-iommu.c | 3 ++- drivers/pci/host/pcie-rcar.c | 2 +- drivers/pci/iov.c | 2 +- drivers/pci/probe.c | 2 +- drivers/pci/quirks.c | 4 ++-- drivers/pci/rom.c | 2 +- drivers/pci/setup-bus.c | 8 drivers/pci/setup-res.c | 2 +- include/linux/ioport.h| 4 20 files changed, 53 insertions(+), 45 deletions(-) diff --git a/arch/alpha/kernel/pci.c b/arch/alpha/kernel/pci.c index 5f387ee..c89c8ef 100644 --- a/arch/alpha/kernel/pci.c +++ b/arch/alpha/kernel/pci.c @@ -282,7 +282,7 @@ pcibios_claim_one_bus(struct pci_bus *b) for (i = 0; i < PCI_NUM_RESOURCES; i++) { struct resource *r = >resource[i]; - if (r->parent || !r->start || !r->flags) + if (r->parent || !r->start || resource_disabled(r)) continue; if (pci_has_flag(PCI_PROBE_ONLY) || (r->flags & IORESOURCE_PCI_FIXED)) { diff --git a/arch/ia64/pci/pci.c b/arch/ia64/pci/pci.c index 8f6ac2f..f00373f 100644 --- a/arch/ia64/pci/pci.c +++ b/arch/ia64/pci/pci.c @@ -333,7 +333,7 @@ void 
pcibios_fixup_device_resources(struct pci_dev *dev) for (idx = 0; idx < PCI_BRIDGE_RESOURCES; idx++) { struct resource *r = >resource[idx]; - if (!r->flags || r->parent || !r->start) + if (resource_disabled(r) || r->parent || !r->start) continue; pci_claim_resource(dev, idx); @@ -351,7 +351,7 @@ static void pcibios_fixup_bridge_resources(struct pci_dev *dev) for (idx = PCI_BRIDGE_RESOURCES; idx < PCI_NUM_RESOURCES; idx++) { struct resource *r = >resource[idx]; - if (!r->flags || r->parent || !r->start) + if (resource_disabled(r) || r->parent || !r->start) continue; pci_claim_bridge_resource(dev, idx); diff --git a/arch/microblaze/pci/pci-common.c b/arch/microblaze/pci/pci-common.c index 35654be..4cc5ed0 100644 --- a/arch/microblaze/pci/pci-common.c +++ b/arch/microblaze/pci/pci-common.c @@ -694,7 +694,7 @@ static void pcibios_fixup_resources(struct pci_dev *dev) } for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) { struct resource *res = dev->resource + i; - if (!res->flags) + if (resource_disabled(res)) continue; if (res->start == 0) { pr_debug("PCI:%s Resource %d %016llx-%016llx [%x]", @@ -795,7 +795,7 @@ static void pcibios_fixup_bridge(struct pci_bus *bus) pci_bus_for_each_resource(bus, res, i) { if (!res) continue; - if (!res->flags) + if (resource_disabled(res)) continue; if (i >= 3 && bus->self->transparent) continue; @@ -964,7 +964,7 @@ static void pcibios_allocate_bus_resources(struct pci_bus *bus) pci_domain_nr(bus), bus->number); pci_bus_for_each_resource(bus, res, i) { - if (!res || !res->flags + if (!res || resource_disabled(res) || res->start > res->end || res->parent) continue; if (bus->parent == NULL) @@ -1066,7 +1066,8 @@ static void __init pcibios_allocate_resources(int pass) r = >resource[idx]; if (r->parent) /* Already allocated */
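The helper that the hunks above switch to is described in the changelog as checking both `!flags` and `IORESOURCE_DISABLED`. A minimal user-space sketch of that check follows. The `struct resource` here is a cut-down stand-in, and the flag values are assumed to match include/linux/ioport.h; the actual helper added to ioport.h by this patch is not shown in the part of the diff quoted here, so treat this as an approximation.

```c
#include <stdbool.h>

/* Assumed to mirror the values in include/linux/ioport.h. */
#define IORESOURCE_MEM      0x00000200UL
#define IORESOURCE_DISABLED 0x10000000UL

/* Minimal stand-in for the kernel's struct resource. */
struct resource {
	unsigned long start;
	unsigned long end;
	unsigned long flags;
};

/* A resource counts as disabled if it has no flags or is marked disabled. */
static inline bool resource_disabled(const struct resource *r)
{
	return !r->flags || (r->flags & IORESOURCE_DISABLED);
}
```

The point of routing every caller through one helper is that a resource can later keep its type bits (for example `IORESOURCE_MEM`) while being marked disabled, instead of having its flags cleared to zero.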
[PATCH v11 52/60] PCI: Unify skip_ioresource_align()
There are powerpc generic version and x86 local version for skip_ioresource_align(). Move the powerpc version to setup-bus.c, and kill x86 local version. Also kill dummy version in microblaze. Cc: Michal SimekCc: Paul Mackerras Cc: Michael Ellerman Cc: Arnd Bergmann Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-a...@vger.kernel.org Signed-off-by: Yinghai Lu Reviewed-by: Thomas Gleixner Acked-by: Michael Ellerman --- arch/powerpc/kernel/pci-common.c | 11 +-- arch/x86/include/asm/pci_x86.h | 1 - arch/x86/pci/common.c| 4 ++-- arch/x86/pci/i386.c | 11 +-- drivers/pci/setup-bus.c | 9 + include/linux/pci.h | 2 ++ 6 files changed, 15 insertions(+), 23 deletions(-) diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c index 0f7a60f..2a7f4fd 100644 --- a/arch/powerpc/kernel/pci-common.c +++ b/arch/powerpc/kernel/pci-common.c @@ -1053,15 +1053,6 @@ void pci_fixup_cardbus(struct pci_bus *bus) pcibios_setup_bus_devices(bus); } - -static int skip_isa_ioresource_align(struct pci_dev *dev) -{ - if (pci_has_flag(PCI_CAN_SKIP_ISA_ALIGN) && - !(dev->bus->bridge_ctl & PCI_BRIDGE_CTL_ISA)) - return 1; - return 0; -} - /* * We need to avoid collisions with `mirrored' VGA ports * and other strange ISA hardware, so we always want the @@ -1082,7 +1073,7 @@ resource_size_t pcibios_align_resource(void *data, const struct resource *res, resource_size_t start = res->start; if (res->flags & IORESOURCE_IO) { - if (skip_isa_ioresource_align(dev)) + if (skip_isa_ioresource_align(dev->bus)) return start; if (start & 0x300) start = (start + 0x3ff) & ~0x3ff; diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h index d08eacd2..d1f919e 100644 --- a/arch/x86/include/asm/pci_x86.h +++ b/arch/x86/include/asm/pci_x86.h @@ -28,7 +28,6 @@ do { \ #define PCI_ASSIGN_ROMS0x1000 #define PCI_BIOS_IRQ_SCAN 0x2000 #define PCI_ASSIGN_ALL_BUSSES 0x4000 -#define PCI_CAN_SKIP_ISA_ALIGN 0x8000 #define PCI_USE__CRS 0x1 #define PCI_CHECK_ENABLE_AMD_MMCONF0x2 #define 
PCI_HAS_IO_ECS 0x4 diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c index 381a43c..09a16b7 100644 --- a/arch/x86/pci/common.c +++ b/arch/x86/pci/common.c @@ -82,7 +82,7 @@ DEFINE_RAW_SPINLOCK(pci_config_lock); static int __init can_skip_ioresource_align(const struct dmi_system_id *d) { - pci_probe |= PCI_CAN_SKIP_ISA_ALIGN; + pci_add_flags(PCI_CAN_SKIP_ISA_ALIGN); printk(KERN_INFO "PCI: %s detected, can skip ISA alignment\n", d->ident); return 0; } @@ -618,7 +618,7 @@ char *__init pcibios_setup(char *str) pci_routeirq = 1; return NULL; } else if (!strcmp(str, "skip_isa_align")) { - pci_probe |= PCI_CAN_SKIP_ISA_ALIGN; + pci_add_flags(PCI_CAN_SKIP_ISA_ALIGN); return NULL; } else if (!strcmp(str, "noioapicquirk")) { noioapicquirk = 1; diff --git a/arch/x86/pci/i386.c b/arch/x86/pci/i386.c index 0a9f2ca..cf296f5 100644 --- a/arch/x86/pci/i386.c +++ b/arch/x86/pci/i386.c @@ -128,15 +128,6 @@ static void __init pcibios_fw_addr_list_del(void) pcibios_fw_addr_done = true; } -static int -skip_isa_ioresource_align(struct pci_dev *dev) { - - if ((pci_probe & PCI_CAN_SKIP_ISA_ALIGN) && - !(dev->bus->bridge_ctl & PCI_BRIDGE_CTL_ISA)) - return 1; - return 0; -} - /* * We need to avoid collisions with `mirrored' VGA ports * and other strange ISA hardware, so we always want the @@ -158,7 +149,7 @@ pcibios_align_resource(void *data, const struct resource *res, resource_size_t start = res->start; if (res->flags & IORESOURCE_IO) { - if (skip_isa_ioresource_align(dev)) + if (skip_isa_ioresource_align(dev->bus)) return start; if (start & 0x300) start = (start + 0x3ff) & ~0x3ff; diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c index 28dfd8e..5ba4bf5 100644 --- a/drivers/pci/setup-bus.c +++ b/drivers/pci/setup-bus.c @@ -1150,6 +1150,15 @@ static resource_size_t window_alignment(struct pci_bus *bus, return max(align, arch_align); } +int skip_isa_ioresource_align(struct pci_bus *bus) +{ + if (pci_has_flag(PCI_CAN_SKIP_ISA_ALIGN) && + !(bus->bridge_ctl & 
PCI_BRIDGE_CTL_ISA)) + return 1; + + return 0; +} + static resource_size_t size_aligned_for_isa(resource_size_t size) { /* diff --git a/include/linux/pci.h
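The I/O alignment rule that both pcibios_align_resource() variants apply after the skip check can be tried out in isolation. The sketch below lifts the two lines `if (start & 0x300) start = (start + 0x3ff) & ~0x3ff;` from the hunks above into a standalone function; the function name and the `skip` parameter (standing in for the result of skip_isa_ioresource_align()) are assumptions made for the example.

```c
#include <stdint.h>

/*
 * Align an I/O port start address the way the hunks above do:
 * addresses with bits 0x300 set fall into ranges that ISA cards may
 * alias, so they are rounded up to the next 1KB boundary unless the
 * platform said ISA alignment can be skipped.
 */
static uint64_t isa_align_io_start(uint64_t start, int skip)
{
	if (skip)
		return start;
	if (start & 0x300)
		start = (start + 0x3ff) & ~0x3ffULL;
	return start;
}
```

For instance, 0x1100 has bit 0x100 set and is moved up to 0x1400, while 0x1000 and 0x10ff have neither bit set and are left alone.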
[PATCH v11 10/60] powerpc/PCI: Add IORESOURCE_MEM_64 for 64-bit resource in OF parsing
For device resource PREF bit setting under bridge 64-bit pref resource,
we need to make sure to only set PREF for 64-bit resources.

This patch sets IORESOURCE_MEM_64 for 64-bit resources during OF device
resource flags parsing.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=96261
Link: https://bugzilla.kernel.org/show_bug.cgi?id=96241
Signed-off-by: Yinghai Lu
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: Gavin Shan
Cc: Yijing Wang
Cc: Anton Blanchard
Cc: linuxppc-dev@lists.ozlabs.org
---
 arch/powerpc/kernel/pci_of_scan.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/pci_of_scan.c b/arch/powerpc/kernel/pci_of_scan.c
index 719f225..476b8ac5 100644
--- a/arch/powerpc/kernel/pci_of_scan.c
+++ b/arch/powerpc/kernel/pci_of_scan.c
@@ -44,8 +44,10 @@ static unsigned int pci_parse_of_flags(u32 addr0, int bridge)

 	if (addr0 & 0x02000000) {
 		flags = IORESOURCE_MEM | PCI_BASE_ADDRESS_SPACE_MEMORY;
-		flags |= (addr0 >> 22) & PCI_BASE_ADDRESS_MEM_TYPE_64;
 		flags |= (addr0 >> 28) & PCI_BASE_ADDRESS_MEM_TYPE_1M;
+		if (addr0 & 0x01000000)
+			flags |= IORESOURCE_MEM_64
+				 | PCI_BASE_ADDRESS_MEM_TYPE_64;
 		if (addr0 & 0x40000000)
 			flags |= IORESOURCE_PREFETCH
 				 | PCI_BASE_ADDRESS_MEM_PREFETCH;
-- 
1.8.4.5
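The effect of the change on the memory branch of pci_parse_of_flags() can be sketched as a standalone function. This is an approximation for illustration only: the function name is made up, the constants are assumed to match the kernel headers, and the masks are assumed to be the Open Firmware phys.hi bits 0x02000000 (memory space), 0x01000000 (64-bit) and 0x40000000 (prefetchable).

```c
#include <stdint.h>

/* Values assumed to match include/linux/ioport.h and pci_regs.h. */
#define IORESOURCE_MEM                0x00000200
#define IORESOURCE_PREFETCH           0x00002000
#define IORESOURCE_MEM_64             0x00100000
#define PCI_BASE_ADDRESS_SPACE_MEMORY 0x00
#define PCI_BASE_ADDRESS_MEM_TYPE_1M  0x02
#define PCI_BASE_ADDRESS_MEM_TYPE_64  0x04
#define PCI_BASE_ADDRESS_MEM_PREFETCH 0x08

/* Memory-space half of the flag parsing, as it reads after the patch. */
static unsigned int parse_of_mem_flags(uint32_t addr0)
{
	unsigned int flags = 0;

	if (addr0 & 0x02000000) {
		flags = IORESOURCE_MEM | PCI_BASE_ADDRESS_SPACE_MEMORY;
		flags |= (addr0 >> 28) & PCI_BASE_ADDRESS_MEM_TYPE_1M;
		if (addr0 & 0x01000000)	/* 64-bit memory space */
			flags |= IORESOURCE_MEM_64
				 | PCI_BASE_ADDRESS_MEM_TYPE_64;
		if (addr0 & 0x40000000)	/* prefetchable */
			flags |= IORESOURCE_PREFETCH
				 | PCI_BASE_ADDRESS_MEM_PREFETCH;
	}
	return flags;
}
```

A 32-bit memory range (0x02000000) ends up without IORESOURCE_MEM_64, while a 64-bit prefetchable one (0x43000000) carries both the resource-level and the BAR-level 64-bit bits, which is what lets later code restrict PREF to 64-bit resources.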
Re: [PATCH V3 1/2] pseries/eeh: Handle RTAS delay requests in configure_bridge
On Thu, Apr 07, 2016 at 04:28:26PM +1000, Russell Currey wrote:
>In the "ibm,configure-pe" and "ibm,configure-bridge" RTAS calls, the
>spec states that values of 9900-9905 can be returned, indicating that
>software should delay for 10^x (where x is the last digit, i.e. 990x)
>milliseconds and attempt the call again. Currently, the kernel doesn't
>know about this, and respecting it fixes some PCI failures when the
>hypervisor is busy.
>
>The delay is capped at 0.2 seconds.
>
>Cc: # 3.10+
>Signed-off-by: Russell Currey

Acked-by: Gavin Shan

>---
>V3 changelog:
> - Refactorings and rewordings thanks to Gavin
> - Treat return values >9902 as 9902 thanks to Tyrel
>---
> arch/powerpc/platforms/pseries/eeh_pseries.c | 51 
> 1 file changed, 36 insertions(+), 15 deletions(-)
>
>diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c 
>b/arch/powerpc/platforms/pseries/eeh_pseries.c
>index ac3ffd9..405baaf 100644
>--- a/arch/powerpc/platforms/pseries/eeh_pseries.c
>+++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
>@@ -615,29 +615,50 @@ static int pseries_eeh_configure_bridge(struct eeh_pe 
>*pe)
> {
> 	int config_addr;
> 	int ret;
>+	/* Waiting 0.2s maximum before skipping configuration */
>+	int max_wait = 200;
>
> 	/* Figure out the PE address */
> 	config_addr = pe->config_addr;
> 	if (pe->addr)
> 		config_addr = pe->addr;
>
>-	/* Use new configure-pe function, if supported */
>-	if (ibm_configure_pe != RTAS_UNKNOWN_SERVICE) {
>-		ret = rtas_call(ibm_configure_pe, 3, 1, NULL,
>-				config_addr, BUID_HI(pe->phb->buid),
>-				BUID_LO(pe->phb->buid));
>-	} else if (ibm_configure_bridge != RTAS_UNKNOWN_SERVICE) {
>-		ret = rtas_call(ibm_configure_bridge, 3, 1, NULL,
>-				config_addr, BUID_HI(pe->phb->buid),
>-				BUID_LO(pe->phb->buid));
>-	} else {
>-		return -EFAULT;
>-	}
>+	while (max_wait > 0) {
>+		/* Use new configure-pe function, if supported */
>+		if (ibm_configure_pe != RTAS_UNKNOWN_SERVICE) {
>+			ret = rtas_call(ibm_configure_pe, 3, 1, NULL,
>+					config_addr, BUID_HI(pe->phb->buid),
>+					BUID_LO(pe->phb->buid));
>+		} else if (ibm_configure_bridge != RTAS_UNKNOWN_SERVICE) {
>+			ret = rtas_call(ibm_configure_bridge, 3, 1, NULL,
>+					config_addr, BUID_HI(pe->phb->buid),
>+					BUID_LO(pe->phb->buid));
>+		} else {
>+			return -EFAULT;
>+		}
>
>-	if (ret)
>-		pr_warn("%s: Unable to configure bridge PHB#%d-PE#%x (%d)\n",
>-			__func__, pe->phb->global_number, pe->addr, ret);
>+		if (!ret)
>+			return ret;
>+
>+		/*
>+		 * If RTAS returns a delay value that's above 100ms, cut it
>+		 * down to 100ms in case firmware made a mistake. For more
>+		 * on how these delay values work see rtas_busy_delay_time
>+		 */
>+		if (ret > RTAS_EXTENDED_DELAY_MIN+2 &&
>+		    ret <= RTAS_EXTENDED_DELAY_MAX)
>+			ret = RTAS_EXTENDED_DELAY_MIN+2;
>+
>+		max_wait -= rtas_busy_delay_time(ret);
>+
>+		if (max_wait < 0)
>+			break;
>+
>+		rtas_busy_delay(ret);
>+	}
>
>+	pr_warn("%s: Unable to configure bridge PHB#%d-PE#%x (%d)\n",
>+		__func__, pe->phb->global_number, pe->addr, ret);
> 	return ret;
> }
>
>-- 
>2.8.0
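The delay arithmetic in this patch (10^x milliseconds for status 990x, clamped to 9902, with a 200ms budget) can be modelled without any RTAS calls. The sketch below is a simulation only: `delay_ms()` is a simplified stand-in for rtas_busy_delay_time() restricted to the extended delay range, `count_attempts()` pretends firmware returns the same status every time, and stopping on a non-delay status is a simplification made for the sketch, not something taken from the patch.

```c
#define RTAS_EXTENDED_DELAY_MIN 9900
#define RTAS_EXTENDED_DELAY_MAX 9905

/* Milliseconds requested by an extended delay status, 0 for anything else. */
static unsigned int delay_ms(int status)
{
	unsigned int ms = 0;

	if (status >= RTAS_EXTENDED_DELAY_MIN &&
	    status <= RTAS_EXTENDED_DELAY_MAX) {
		ms = 1;
		for (int order = status - RTAS_EXTENDED_DELAY_MIN; order > 0; order--)
			ms *= 10;
	}
	return ms;
}

/* Treat delay requests above 100ms (9903..9905) as 100ms (9902). */
static int clamp_status(int status)
{
	if (status > RTAS_EXTENDED_DELAY_MIN + 2 &&
	    status <= RTAS_EXTENDED_DELAY_MAX)
		status = RTAS_EXTENDED_DELAY_MIN + 2;
	return status;
}

/* Count simulated calls made before the wait budget (in ms) runs out. */
static int count_attempts(int status, int max_wait)
{
	int attempts = 0;

	while (max_wait > 0) {
		unsigned int ms;

		attempts++;		/* one simulated RTAS call */
		ms = delay_ms(clamp_status(status));
		if (ms == 0)		/* not a delay request: stop (sketch only) */
			break;
		max_wait -= (int)ms;
		if (max_wait < 0)
			break;
	}
	return attempts;
}
```

Under these assumptions a firmware that keeps answering 9902 or higher gets two attempts within the 200ms budget, and one that keeps answering 9901 (10ms) gets twenty.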
Re: [PATCH V3 2/2] pseries/eeh: Refactor the configure_bridge RTAS tokens
On Thu, Apr 07, 2016 at 04:28:27PM +1000, Russell Currey wrote:
>The RTAS calls "ibm,configure-pe" and "ibm,configure-bridge" perform the
>same actions, however the former can skip configuration if unnecessary.
>The existing code treats them as different tokens even though only one
>will ever be called. Refactor this by making a single token that is
>assigned during init.
>
>Signed-off-by: Russell Currey

Acked-by: Gavin Shan

>---
>V3: Reorder commits so the previous patch doesn't depend on this
>
>I had a look at doing the same with some other duplicated tokens but
>they had slight differences in semantics so it wasn't helping clarity.
>---
> arch/powerpc/platforms/pseries/eeh_pseries.c | 28 
> 1 file changed, 12 insertions(+), 16 deletions(-)
>
>diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c 
>b/arch/powerpc/platforms/pseries/eeh_pseries.c
>index 405baaf..3998e0f 100644
>--- a/arch/powerpc/platforms/pseries/eeh_pseries.c
>+++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
>@@ -53,7 +53,6 @@ static int ibm_read_slot_reset_state2;
> static int ibm_slot_error_detail;
> static int ibm_get_config_addr_info;
> static int ibm_get_config_addr_info2;
>-static int ibm_configure_bridge;
> static int ibm_configure_pe;
>
> /*
>@@ -81,7 +80,14 @@ static int pseries_eeh_init(void)
> 	ibm_get_config_addr_info2	= rtas_token("ibm,get-config-addr-info2");
> 	ibm_get_config_addr_info	= rtas_token("ibm,get-config-addr-info");
> 	ibm_configure_pe		= rtas_token("ibm,configure-pe");
>-	ibm_configure_bridge		= rtas_token("ibm,configure-bridge");
>+
>+	/*
>+	 * ibm,configure-pe and ibm,configure-bridge have the same semantics,
>+	 * however ibm,configure-pe can be faster. If we can't find
>+	 * ibm,configure-pe then fall back to using ibm,configure-bridge.
>+	 */
>+	if (ibm_configure_pe == RTAS_UNKNOWN_SERVICE)
>+		ibm_configure_pe	= rtas_token("ibm,configure-bridge");
>
> 	/*
> 	 * Necessary sanity check. We needn't check "get-config-addr-info"
>@@ -93,8 +99,7 @@ static int pseries_eeh_init(void)
> 	    (ibm_read_slot_reset_state2 == RTAS_UNKNOWN_SERVICE &&
> 	     ibm_read_slot_reset_state == RTAS_UNKNOWN_SERVICE) ||
> 	    ibm_slot_error_detail == RTAS_UNKNOWN_SERVICE ||
>-	    (ibm_configure_pe == RTAS_UNKNOWN_SERVICE &&
>-	     ibm_configure_bridge == RTAS_UNKNOWN_SERVICE)) {
>+	    ibm_configure_pe == RTAS_UNKNOWN_SERVICE) {
> 		pr_info("EEH functionality not supported\n");
> 		return -EINVAL;
> 	}
>@@ -624,18 +629,9 @@ static int pseries_eeh_configure_bridge(struct eeh_pe *pe)
> 		config_addr = pe->addr;
>
> 	while (max_wait > 0) {
>-		/* Use new configure-pe function, if supported */
>-		if (ibm_configure_pe != RTAS_UNKNOWN_SERVICE) {
>-			ret = rtas_call(ibm_configure_pe, 3, 1, NULL,
>-					config_addr, BUID_HI(pe->phb->buid),
>-					BUID_LO(pe->phb->buid));
>-		} else if (ibm_configure_bridge != RTAS_UNKNOWN_SERVICE) {
>-			ret = rtas_call(ibm_configure_bridge, 3, 1, NULL,
>-					config_addr, BUID_HI(pe->phb->buid),
>-					BUID_LO(pe->phb->buid));
>-		} else {
>-			return -EFAULT;
>-		}
>+		ret = rtas_call(ibm_configure_pe, 3, 1, NULL,
>+				config_addr, BUID_HI(pe->phb->buid),
>+				BUID_LO(pe->phb->buid));
>
> 		if (!ret)
> 			return ret;
>-- 
>2.8.0
[PATCH v4 1/3] ppc64/book3s: fix branching to out of line handlers in relocation kernel
Some of the interrupt vectors on 64-bit POWER server processors are only
32 bytes long (8 instructions), which is not enough for the full
first-level interrupt handler. For these we need to branch to an
out-of-line (OOL) handler. But when we are running a relocatable kernel,
interrupt vectors till __end_interrupts marker are copied down to real
address 0x100. So, branching to labels (read OOL handlers) outside this
section should be handled differently (see LOAD_HANDLER()), considering
relocatable kernel, which would need at least 4 instructions.

However, branching from interrupt vector means that we corrupt the
CFAR (come-from address register) on POWER7 and later processors as
mentioned in commit 1707dd16. So, EXCEPTION_PROLOG_0 (6 instructions)
that contains the part up to the point where the CFAR is saved in the
PACA should be part of the short interrupt vectors before we branch out
to OOL handlers.

But as mentioned already, there are interrupt vectors on 64-bit POWER
server processors that are only 32 bytes long (like vectors 0x4f00,
0x4f20, etc.), which cannot accommodate the above two cases at the same
time owing to space constraint. Currently, in these interrupt vectors,
we simply branch out to OOL handlers, without using LOAD_HANDLER(),
which leaves us vulnerable when running a relocatable kernel (e.g. kdump
case). While this has been the case for some time now and kdump is used
widely, we were fortunate not to see any problems so far, for three
reasons:

1. In almost all cases, production kernel (relocatable) is used for
   kdump as well, which would mean that crashed kernel's OOL handler
   would be at the same place where we end up branching to, from short
   interrupt vector of kdump kernel.
2. Also, OOL handler was unlikely the reason for crash in almost all
   the kdump scenarios, which meant we had a sane OOL handler from
   crashed kernel that we branched to.
3. On most 64-bit POWER server processors, page size is large enough
   that marking interrupt vector code as executable (see commit
   429d2e83) leads to marking OOL handler code from crashed kernel,
   that sits right below interrupt vector code from kdump kernel, as
   executable as well.

Let us fix this undependable code path by moving these OOL handlers
below __end_interrupts marker to make sure we also copy these handlers
to real address 0x100 when running a relocatable kernel, because the
interrupt vectors branching to these OOL handlers are not long enough
to use LOAD_HANDLER() for branching as discussed above.

This fix has been tested successfully in kdump scenario, on a lpar with
4K page size by using different default/production kernel and kdump
kernel.

Signed-off-by: Hari Bathini
Signed-off-by: Mahesh Salgaonkar
---

Michael, I did test this patchset in different scenarios. But if you
feel the change is too radical, we could go with version2. But I
thought this was worth a shot.

changes from v3:
1. No changes in this patch except for a spellcheck
2. A new patch that tries to free up space below 0x7000 (2/3)
3. A new patch to remove __end_handlers marker (3/3)

 arch/powerpc/kernel/exceptions-64s.S | 29 +++--
 1 file changed, 19 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 7716ceb..f76b2f3 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -953,6 +953,25 @@ hv_facility_unavailable_relon_trampoline:
 #endif
 	STD_RELON_EXCEPTION_PSERIES(0x5700, 0x1700, altivec_assist)

+	/*
+	 * Out-Of-Line handlers for relocation-on interrupt vectors
+	 *
+	 * We need these OOL handlers to be below __end_interrupts
+	 * marker to ensure we also copy these OOL handlers along
+	 * with the interrupt vectors to real address 0x100 when
+	 * running a relocatable kernel. Because the interrupt
+	 * vectors branching to these OOL handlers are not long
+	 * enough to use LOAD_HANDLER() for branching.
+	 */
+	STD_RELON_EXCEPTION_HV_OOL(0xe40, emulation_assist)
+	MASKABLE_RELON_EXCEPTION_HV_OOL(0xe80, h_doorbell)
+
+	STD_RELON_EXCEPTION_PSERIES_OOL(0xf00, performance_monitor)
+	STD_RELON_EXCEPTION_PSERIES_OOL(0xf20, altivec_unavailable)
+	STD_RELON_EXCEPTION_PSERIES_OOL(0xf40, vsx_unavailable)
+	STD_RELON_EXCEPTION_PSERIES_OOL(0xf60, facility_unavailable)
+	STD_RELON_EXCEPTION_HV_OOL(0xf80, hv_facility_unavailable)
+
 	/* Other future vectors */
 	.align	7
 	.globl	__end_interrupts
@@ -1234,16 +1253,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX)
 	.globl	__end_handlers
 __end_handlers:

-	/* Equivalents to the above handlers for relocation-on interrupt vectors */
-	STD_RELON_EXCEPTION_HV_OOL(0xe40, emulation_assist)
-
[PATCH v4 2/3] ppc64/book3s: make some room for common interrupt vector code
With the previous patch, we choke out whatever little space is left
below 0x7000 (FWNMI hard block) while there is a hole of ~1400 bytes
below __end_interrupts marker when CONFIG_CBE_RAS is disabled.
Considering CONFIG_CBE_RAS is not enabled by default for BOOK3S, this
is not a desirable scenario especially when we have to worry about
each additional instruction that goes below 0x7000.

Memory region from 0x1800 to 0x4000 is dedicated for common interrupt
vector code. Also, we never hit an interrupt below 0x300 when IR=DR=1
implying memory region between 0x4000 and 0x4300 can also be used for
common interrupt vector code. So, we can effectively use memory region
between 0x1800 and 0x4300 for common interrupt vector code.

This patch tries to free up some space below 0x7000 by rearranging the
common interrupt vector code. The approach here is to avoid large holes
below 0x4300 for any kernel configuration. For this, let us move common
interrupt vector code that only gets enabled with CONFIG_CBE_RAS above
0x8000, as it doesn't need to be too close to the call sites and can be
branched to with LOAD_HANDLER() as long as it is within the first 64KB
(0x10000) of the kernel image. Instead, let's move common interrupt
vector code marked h_instr_storage_common, facility_unavailable_common
& hv_facility_unavailable_common below 0x4300. This leaves ~250 bytes
free below 0x4300 and ~1150 bytes free below 0x7000 - enough space to
stop worrying about every additional instruction that goes below 0x7000.

This patch assumes at least commit 376af594, part of the patch series
that starts with commit 468a3302, is part of the code to avoid messy
compilation issues like:

    relocation truncated to fit: R_PPC64_REL14 against `.text'+1c90
    Makefile:864: recipe for target 'vmlinux' failed

I tested this patch successfully on ppc64, ppc64le lpars and bare-metal
environments. I couldn't test it on an IBM cell blade, but I expect no
problems with this patch in that environment as well. If someone can
test this patch on a cell platform, it would be great.

Signed-off-by: Hari Bathini
---
 arch/powerpc/kernel/exceptions-64s.S | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index f76b2f3..c193ebd 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -786,6 +786,7 @@ kvmppc_skip_Hinterrupt:
 	STD_EXCEPTION_COMMON(0xb00, trap_0b, unknown_exception)
 	STD_EXCEPTION_COMMON(0xd00, single_step, single_step_exception)
 	STD_EXCEPTION_COMMON(0xe00, trap_0e, unknown_exception)
+	STD_EXCEPTION_COMMON(0xe20, h_instr_storage, unknown_exception)
 	STD_EXCEPTION_COMMON(0xe40, emulation_assist, emulation_assist_interrupt)
 	STD_EXCEPTION_COMMON_ASYNC(0xe60, hmi_exception, handle_hmi_exception)
 #ifdef CONFIG_PPC_DOORBELL
@@ -794,6 +795,9 @@ kvmppc_skip_Hinterrupt:
 	STD_EXCEPTION_COMMON_ASYNC(0xe80, h_doorbell, unknown_exception)
 #endif
 	STD_EXCEPTION_COMMON_ASYNC(0xf00, performance_monitor, performance_monitor_exception)
+	STD_EXCEPTION_COMMON(0xf60, facility_unavailable, facility_unavailable_exception)
+	STD_EXCEPTION_COMMON(0xf80, hv_facility_unavailable, facility_unavailable_exception)
+
 	STD_EXCEPTION_COMMON(0x1300, instruction_breakpoint, instruction_breakpoint_exception)
 	STD_EXCEPTION_COMMON(0x1502, denorm, unknown_exception)
 #ifdef CONFIG_ALTIVEC
@@ -801,11 +805,6 @@ kvmppc_skip_Hinterrupt:
 #else
 	STD_EXCEPTION_COMMON(0x1700, altivec_assist, unknown_exception)
 #endif
-#ifdef CONFIG_CBE_RAS
-	STD_EXCEPTION_COMMON(0x1200, cbe_system_error, cbe_system_error_exception)
-	STD_EXCEPTION_COMMON(0x1600, cbe_maintenance, cbe_maintenance_exception)
-	STD_EXCEPTION_COMMON(0x1800, cbe_thermal, cbe_thermal_exception)
-#endif /* CONFIG_CBE_RAS */

 	/*
 	 * Relocation-on interrupts: A subset of the interrupts can be delivered
@@ -1029,8 +1028,6 @@ instruction_access_common:
 	li	r5,0x400
 	b	do_hash_page		/* Try to handle as hpte fault */

-	STD_EXCEPTION_COMMON(0xe20, h_instr_storage, unknown_exception)
-
 /*
  * Here is the common SLB miss user that is used when going to virtual
  * mode for SLB misses, that is currently not used
@@ -1246,9 +1243,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX)
 	bl	vsx_unavailable_exception
 	b	ret_from_except

-	STD_EXCEPTION_COMMON(0xf60, facility_unavailable, facility_unavailable_exception)
-	STD_EXCEPTION_COMMON(0xf80, hv_facility_unavailable, facility_unavailable_exception)
-
 	.align	7
 	.globl	__end_handlers
 __end_handlers:
@@ -1268,6 +1262,12 @@ fwnmi_data_area:
 	. = 0x8000
 #endif /* defined(CONFIG_PPC_PSERIES) || defined(CONFIG_PPC_POWERNV) */

+#ifdef CONFIG_CBE_RAS
+
[PATCH v4 3/3] ppc64/book3s: remove __end_handlers marker
The __end_handlers marker was intended to mark the point up to which code gets called from exception prologs. But that hasn't kept pace with code changes. Case in point: slb_miss_realmode is called from exception prolog code but isn't below the __end_handlers marker. So, the __end_handlers marker is as good as a comment, but it can be misleading when it isn't in sync with the code, as is the case now. So, let us avoid this confusion by having a better comment and removing the __end_handlers marker altogether.

Signed-off-by: Hari Bathini
---
 arch/powerpc/kernel/exceptions-64s.S | 13 -
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index c193ebd..80f9fc4 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -764,11 +764,10 @@ kvmppc_skip_Hinterrupt:
 #endif
 
 /*
- * Code from here down to __end_handlers is invoked from the
- * exception prologs above.  Because the prologs assemble the
- * addresses of these handlers using the LOAD_HANDLER macro,
- * which uses an ori instruction, these handlers must be in
- * the first 64k of the kernel image.
+ * Ensure that any handlers that get invoked from the exception prologs
+ * above are below the first 64KB (0x10000) of the kernel image because
+ * the prologs assemble the addresses of these handlers using the
+ * LOAD_HANDLER macro, which uses an ori instruction.
  */
 
 /*** Common interrupt handlers ***/
@@ -1243,10 +1242,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX)
 	bl	vsx_unavailable_exception
 	b	ret_from_except
 
-	.align	7
-	.globl	__end_handlers
-__end_handlers:
-
 #if defined(CONFIG_PPC_PSERIES) || defined(CONFIG_PPC_POWERNV)
 /*
  * Data area reserved for FWNMI option.
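
For readers wondering where the 64KB limit comes from: the ori instruction carries a 16-bit unsigned immediate, so a single ori can only add an offset in the range 0x0000..0xffff to a base. The following is a minimal C sketch of that constraint, not the kernel macro itself. The helper names are hypothetical, and the model assumes the kernel base is 64KB aligned so that OR behaves like addition.

```c
#include <stdbool.h>
#include <stdint.h>

/* The immediate field of ori is 16 bits wide (0x0000..0xffff). */
#define SKETCH_ORI_IMM_MAX 0xffffu

/* Hypothetical helper: can a handler at this offset from the kernel
 * base be addressed with a single ori? */
static bool sketch_handler_reachable(uint64_t offset_from_base)
{
	return offset_from_base <= SKETCH_ORI_IMM_MAX;
}

/* Simplified model of the address formation: the base is OR'ed with
 * the low 16 bits of the offset. Anything above bit 15 is lost, which
 * is why handlers at or beyond 0x10000 cannot be reached this way. */
static uint64_t sketch_load_handler(uint64_t kernel_base,
				    uint64_t offset_from_base)
{
	return kernel_base | (offset_from_base & SKETCH_ORI_IMM_MAX);
}
```

With this model, an offset such as 0x4300 or 0x8000 is reachable, while 0x10000 is not.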
Re: [PATCH V10 00/28] Add new powerpc specific ELF core notes
On 7 April 2016 7:23:46 pm AEST, Laurent Dufour wrote:
>On 16/02/2016 09:59, Anshuman Khandual wrote:
>> This patch series adds twelve new ELF core note sections which can
>> be used with existing ptrace request PTRACE_GETREGSET-SETREGSET for accessing
>> various transactional memory and other miscellaneous debug register sets on
>> powerpc platform.
>
>Hi Michael,
>
>This series is required to handle TM state in CRIU.
>Is there a chance to get it upstream soon ?

We were waiting on the gdb support to make sure it had some testing.

If it's working for CRIU that would be a good data point, have you actually tested it with CRIU?

cheers
Re: [RFC v5 7/7] vfio-pci: Allow to mmap MSI-X table if interrupt remapping is supported
Hi Yongji,

On 04/07/2016 01:38 PM, Yongji Xie wrote:
> On 2016/4/6 22:45, Alex Williamson wrote:
>> On Tue, 5 Apr 2016 21:46:44 +0800
>> Yongji Xie wrote:
>>
>>> This patch enables mmapping MSI-X tables if
>>> hardware supports interrupt remapping which
>>> can ensure that a given pci device can only
>>> shoot the MSIs assigned for it.
>>>
>>> Signed-off-by: Yongji Xie
>>> ---
>>>  drivers/vfio/pci/vfio_pci.c |9 +++--
>>>  drivers/vfio/pci/vfio_pci_private.h |1 +
>>>  drivers/vfio/pci/vfio_pci_rdwr.c|2 +-
>>>  3 files changed, 9 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
>>> index c60d790..ef02896 100644
>>> --- a/drivers/vfio/pci/vfio_pci.c
>>> +++ b/drivers/vfio/pci/vfio_pci.c
>>> @@ -201,6 +201,10 @@ static int vfio_pci_enable(struct vfio_pci_device *vdev)
>>>  	} else
>>>  		vdev->msix_bar = 0xFF;
>>> +	if (iommu_capable(pdev->dev.bus, IOMMU_CAP_INTR_REMAP) ||
>> This doesn't address the issue I raised earlier where ARM SMMU sets
>> this capability, but doesn't really provide per vector isolation.  ARM
>> either needs to be fixed or we need to consider the whole capability
>> tainted for this application and standardize around the bus flags.
>> It's not very desirable to have two different ways to test this anyway.
>
> I saw Eric posted a patchset [1] which introduce a flag
> MSI_FLAG_IRQ_REMAPPING to indicate the capability
> for ARM SMMU. With this patchset applied, it would
> be workable to use bus_flags to test the capability
> of ARM SMMU:

My purpose was to remove the advertising of IOMMU_CAP_INTR_REMAP from arm-smmu.c (the "fix" mentioned by Alex; by the way, I also need to do the same in the v3 code) and to advertise the functionality on the MSI controller instead, since the IRQ remapping functionality is abstracted in the GICv3 ITS MSI controller.

On top of that, on ARM we have platform (non-PCI) MSI controllers, so my understanding is that the capability advertising should be possible beyond the PCI bus?

Best Regards

Eric
>
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index a080f44..b2d1756 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -1134,6 +1134,21 @@ void *msi_desc_to_pci_sysdata(struct msi_desc *desc)
>  }
>  EXPORT_SYMBOL_GPL(msi_desc_to_pci_sysdata);
>
> +void pci_check_msi_remapping(struct pci_bus *bus)
> +{
> +#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
> +	struct irq_domain *domain;
> +	struct msi_domain_info *info;
> +
> +	domain = dev_get_msi_domain(&bus->dev);
> +	if (domain) {
> +		info = msi_get_domain_info(domain);
> +		if (info->flags & MSI_FLAG_IRQ_REMAPPING)
> +			pdev->bus->bus_flags |= PCI_BUS_FLAGS_MSI_REMAP;
> +	}
> +#endif
> +}
> +
>  #ifdef CONFIG_PCI_MSI_IRQ_DOMAIN
>  /**
>   * pci_msi_domain_write_msg - Helper to write MSI message to PCI config space
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 6d7ab9b..24e9606 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -2115,6 +2115,7 @@ struct pci_bus *pci_create_root_bus(struct device *parent, int bus,
>  	device_enable_async_suspend(b->bridge);
>  	pci_set_bus_of_node(b);
>  	pci_set_bus_msi_domain(b);
> +	pci_check_msi_remapping(b);
>
>  	if (!parent)
>  		set_dev_node(b->bridge, pcibus_to_node(b));
> diff --git a/include/linux/msi.h b/include/linux/msi.h
> index a2a0068..fe8ce7b 100644
> --- a/include/linux/msi.h
> +++ b/include/linux/msi.h
> @@ -15,6 +15,7 @@ extern int pci_msi_ignore_mask;
>  struct irq_data;
>  struct msi_desc;
>  struct pci_dev;
> +struct pci_bus;
>  struct platform_msi_priv_data;
>  void __get_cached_msi_msg(struct msi_desc *entry, struct msi_msg *msg);
>  void get_cached_msi_msg(unsigned int irq, struct msi_msg *msg);
> @@ -155,6 +156,8 @@ void arch_restore_msi_irqs(struct pci_dev *dev);
>  void default_teardown_msi_irqs(struct pci_dev *dev);
>  void default_restore_msi_irqs(struct pci_dev *dev);
>
> +void pci_check_msi_remapping(struct pci_bus *bus);
> +
>  struct msi_controller {
>  	struct module *owner;
>  	struct device *dev;
>
> Next we just need to find a proper way to make
> bus_flags compatible with IOMMU_CAP_INTR_REMAP, right?
>
> I think a good place to do that is add_iommu_group().
> But I'm not sure whether iommu drivers must be
> initialized after PCI enumeration. Do you have any comment?
>
> [1] http://www.spinics.net/lists/kvm/msg130256.html
>
>>> +	pdev->bus->bus_flags | PCI_BUS_FLAGS_MSI_REMAP)
>> Perhaps some sort of wrapper for testing these flags would help avoid
>> this kind of coding error (| vs &)
>
> Thank you. I'll try not to make the same mistake again.
>
> Regards,
> Yongji
>
Re: [PATCH v2] powerpc: Fix incorrect PPC32 PAMU dependency
On Wed, Mar 16, 2016 at 11:15:44PM -0500, Andy Fleming wrote:
> The Freescale PAMU can be enabled on both 32 and 64-bit Power
> chips. Commit 477ab7a19cec8409e4e2dd10e7348e4cac3c06e5
> (iommu: Make more drivers depend on COMPILE_TEST)
> restricted PAMU to PPC32. PPC covers both.
>
> Signed-off-by: Andy Fleming
> ---
>
> v2: Implemented Michael Ellerman's suggestion to clean up the
> dependency chain

Applied, thanks.
Re: [RFC v5 7/7] vfio-pci: Allow to mmap MSI-X table if interrupt remapping is supported
On 2016/4/6 22:45, Alex Williamson wrote:
> On Tue, 5 Apr 2016 21:46:44 +0800
> Yongji Xie wrote:
>
>> This patch enables mmapping MSI-X tables if
>> hardware supports interrupt remapping which
>> can ensure that a given pci device can only
>> shoot the MSIs assigned for it.
>>
>> Signed-off-by: Yongji Xie
>> ---
>>  drivers/vfio/pci/vfio_pci.c |9 +++--
>>  drivers/vfio/pci/vfio_pci_private.h |1 +
>>  drivers/vfio/pci/vfio_pci_rdwr.c|2 +-
>>  3 files changed, 9 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
>> index c60d790..ef02896 100644
>> --- a/drivers/vfio/pci/vfio_pci.c
>> +++ b/drivers/vfio/pci/vfio_pci.c
>> @@ -201,6 +201,10 @@ static int vfio_pci_enable(struct vfio_pci_device *vdev)
>>  	} else
>>  		vdev->msix_bar = 0xFF;
>> +	if (iommu_capable(pdev->dev.bus, IOMMU_CAP_INTR_REMAP) ||
> This doesn't address the issue I raised earlier where ARM SMMU sets
> this capability, but doesn't really provide per vector isolation.  ARM
> either needs to be fixed or we need to consider the whole capability
> tainted for this application and standardize around the bus flags.
> It's not very desirable to have two different ways to test this anyway.

I saw Eric posted a patchset [1] which introduce a flag
MSI_FLAG_IRQ_REMAPPING to indicate the capability
for ARM SMMU. With this patchset applied, it would
be workable to use bus_flags to test the capability
of ARM SMMU:

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index a080f44..b2d1756 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -1134,6 +1134,21 @@ void *msi_desc_to_pci_sysdata(struct msi_desc *desc)
 }
 EXPORT_SYMBOL_GPL(msi_desc_to_pci_sysdata);

+void pci_check_msi_remapping(struct pci_bus *bus)
+{
+#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
+	struct irq_domain *domain;
+	struct msi_domain_info *info;
+
+	domain = dev_get_msi_domain(&bus->dev);
+	if (domain) {
+		info = msi_get_domain_info(domain);
+		if (info->flags & MSI_FLAG_IRQ_REMAPPING)
+			pdev->bus->bus_flags |= PCI_BUS_FLAGS_MSI_REMAP;
+	}
+#endif
+}
+
 #ifdef CONFIG_PCI_MSI_IRQ_DOMAIN
 /**
  * pci_msi_domain_write_msg - Helper to write MSI message to PCI config space
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 6d7ab9b..24e9606 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2115,6 +2115,7 @@ struct pci_bus *pci_create_root_bus(struct device *parent, int bus,
 	device_enable_async_suspend(b->bridge);
 	pci_set_bus_of_node(b);
 	pci_set_bus_msi_domain(b);
+	pci_check_msi_remapping(b);

 	if (!parent)
 		set_dev_node(b->bridge, pcibus_to_node(b));
diff --git a/include/linux/msi.h b/include/linux/msi.h
index a2a0068..fe8ce7b 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -15,6 +15,7 @@ extern int pci_msi_ignore_mask;
 struct irq_data;
 struct msi_desc;
 struct pci_dev;
+struct pci_bus;
 struct platform_msi_priv_data;
 void __get_cached_msi_msg(struct msi_desc *entry, struct msi_msg *msg);
 void get_cached_msi_msg(unsigned int irq, struct msi_msg *msg);
@@ -155,6 +156,8 @@ void arch_restore_msi_irqs(struct pci_dev *dev);
 void default_teardown_msi_irqs(struct pci_dev *dev);
 void default_restore_msi_irqs(struct pci_dev *dev);

+void pci_check_msi_remapping(struct pci_bus *bus);
+
 struct msi_controller {
 	struct module *owner;
 	struct device *dev;

Next we just need to find a proper way to make
bus_flags compatible with IOMMU_CAP_INTR_REMAP, right?

I think a good place to do that is add_iommu_group().
But I'm not sure whether iommu drivers must be
initialized after PCI enumeration. Do you have any comment?

[1] http://www.spinics.net/lists/kvm/msg130256.html

>> +	pdev->bus->bus_flags | PCI_BUS_FLAGS_MSI_REMAP)
> Perhaps some sort of wrapper for testing these flags would help avoid
> this kind of coding error (| vs &)

Thank you. I'll try not to make the same mistake again.

Regards,
Yongji
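
As an illustration of the wrapper Alex suggests, here is a minimal stand-alone C sketch. It is not taken from the thread or from the kernel: the type, the flag value and the helper name are all hypothetical stand-ins, chosen only to show how a small predicate keeps the bit test in one place.

```c
#include <stdbool.h>

/* Stand-in types and values defined only for this sketch; the real
 * pci_bus structure and the proposed PCI_BUS_FLAGS_MSI_REMAP flag
 * live in the kernel headers and are not reproduced here. */
typedef unsigned short sketch_bus_flags_t;
#define SKETCH_BUS_FLAGS_MSI_REMAP ((sketch_bus_flags_t)0x4)

struct sketch_bus {
	sketch_bus_flags_t bus_flags;
};

/* Hypothetical wrapper: callers ask a yes/no question and never write
 * the bitwise operator themselves, so '|' cannot be typed in place
 * of '&' at the call site. */
static bool sketch_bus_msi_remap(const struct sketch_bus *bus)
{
	return (bus->bus_flags & SKETCH_BUS_FLAGS_MSI_REMAP) != 0;
}
```

The reason the original expression is a bug is that `flags | CONSTANT` is non-zero for any non-zero constant, so the condition is always true regardless of what the bus actually advertises.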
Re: [PATCH] kvm-pr: manage single-step mode
Ping?

On 22/03/2016 15:53, Laurent Vivier wrote:
> Until now, when we connect gdb to the QEMU gdb-server, the
> single-step mode is not managed.
>
> This patch adds this, only for kvm-pr:
>
> If KVM_GUESTDBG_SINGLESTEP is set, we enable single-step trace bit in the
> MSR (MSR_SE) just before the __kvmppc_vcpu_run(), and disable it just after.
> In kvmppc_handle_exit_pr, instead of routing the interrupt to
> the guest, we return to host, with KVM_EXIT_DEBUG reason.
>
> Signed-off-by: Laurent Vivier
> ---
>  arch/powerpc/kvm/book3s_pr.c | 31 +--
>  1 file changed, 29 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
> index 95bceca..e6896f4 100644
> --- a/arch/powerpc/kvm/book3s_pr.c
> +++ b/arch/powerpc/kvm/book3s_pr.c
> @@ -882,6 +882,24 @@ void kvmppc_set_fscr(struct kvm_vcpu *vcpu, u64 fscr)
>  }
>  #endif
>
> +static void kvmppc_setup_debug(struct kvm_vcpu *vcpu)
> +{
> +	if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP) {
> +		u64 msr = kvmppc_get_msr(vcpu);
> +
> +		kvmppc_set_msr(vcpu, msr | MSR_SE);
> +	}
> +}
> +
> +static void kvmppc_clear_debug(struct kvm_vcpu *vcpu)
> +{
> +	if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP) {
> +		u64 msr = kvmppc_get_msr(vcpu);
> +
> +		kvmppc_set_msr(vcpu, msr & ~MSR_SE);
> +	}
> +}
> +
>  int kvmppc_handle_exit_pr(struct kvm_run *run, struct kvm_vcpu *vcpu,
>  			  unsigned int exit_nr)
>  {
> @@ -1208,8 +1226,13 @@ program_interrupt:
>  #endif
>  	case BOOK3S_INTERRUPT_MACHINE_CHECK:
>  	case BOOK3S_INTERRUPT_TRACE:
> -		kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
> -		r = RESUME_GUEST;
> +		if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP) {
> +			run->exit_reason = KVM_EXIT_DEBUG;
> +			r = RESUME_HOST;
> +		} else {
> +			kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
> +			r = RESUME_GUEST;
> +		}
>  		break;
>  	default:
>  	{
> @@ -1479,6 +1502,8 @@ static int kvmppc_vcpu_run_pr(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
>  		goto out;
>  	}
>
> +	kvmppc_setup_debug(vcpu);
> +
>  	/*
>  	 * Interrupts could be timers for the guest which we have to inject
>  	 * again, so let's postpone them until we're in the guest and if we
> @@ -1501,6 +1526,8 @@ static int kvmppc_vcpu_run_pr(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
>
>  	ret = __kvmppc_vcpu_run(kvm_run, vcpu);
>
> +	kvmppc_clear_debug(vcpu);
> +
>  	/* No need for kvm_guest_exit. It's done in handle_exit.
>  	   We also get here with interrupts enabled. */
>
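
The control flow described in the patch can be modelled outside the kernel. The following C sketch is only an illustration under stated assumptions: the structure, the enum and the function names are invented for this example, and the bit position used for the single-step flag is a stand-in for MSR_SE.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the MSR single-step trace bit (assumed position). */
#define SKETCH_MSR_SE (1ULL << 10)

enum sketch_resume { SKETCH_RESUME_GUEST, SKETCH_RESUME_HOST };

/* Hypothetical, heavily reduced vcpu state. */
struct sketch_vcpu {
	uint64_t msr;
	bool single_step;	/* models KVM_GUESTDBG_SINGLESTEP */
};

/* Before entering the guest: arm the trace bit if a debugger asked
 * for single-stepping. */
static void sketch_setup_debug(struct sketch_vcpu *vcpu)
{
	if (vcpu->single_step)
		vcpu->msr |= SKETCH_MSR_SE;
}

/* After leaving the guest: drop the trace bit again. */
static void sketch_clear_debug(struct sketch_vcpu *vcpu)
{
	if (vcpu->single_step)
		vcpu->msr &= ~SKETCH_MSR_SE;
}

/* On a trace interrupt: hand control to the host debugger when
 * single-stepping, otherwise deliver the interrupt to the guest. */
static enum sketch_resume sketch_handle_trace(const struct sketch_vcpu *vcpu)
{
	return vcpu->single_step ? SKETCH_RESUME_HOST : SKETCH_RESUME_GUEST;
}
```

The point of the pairing is that the trace bit is only visible while the guest runs, so other bits of the MSR are left untouched on both sides of the run.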
Re: [PATCH] kvm-pr: manage illegal instructions
Ping?

On 15/03/2016 21:18, Laurent Vivier wrote:
> While writing some instruction tests for kvm-unit-tests for powerpc,
> I've found that illegal instructions are not managed correctly with kvm-pr,
> while it is fine with kvm-hv.
>
> When an illegal instruction (like ".long 0") is processed by kvm-pr,
> the kernel logs are filled with:
>
>  Couldn't emulate instruction 0x (op 0 xop 0)
>  kvmppc_handle_exit_pr: emulation at 700 failed ()
>
> While the exception handler receives an interrupt for each instruction
> executed after the illegal instruction.
>
> Signed-off-by: Laurent Vivier
> ---
>  arch/powerpc/kvm/book3s_emulate.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/kvm/book3s_emulate.c b/arch/powerpc/kvm/book3s_emulate.c
> index 2afdb9c..4ee969d 100644
> --- a/arch/powerpc/kvm/book3s_emulate.c
> +++ b/arch/powerpc/kvm/book3s_emulate.c
> @@ -99,7 +99,6 @@ int kvmppc_core_emulate_op_pr(struct kvm_run *run, struct kvm_vcpu *vcpu,
>
>  	switch (get_op(inst)) {
>  	case 0:
> -		emulated = EMULATE_FAIL;
>  		if ((kvmppc_get_msr(vcpu) & MSR_LE) &&
>  		    (inst == swab32(inst_sc))) {
>  			/*
> @@ -112,6 +111,9 @@ int kvmppc_core_emulate_op_pr(struct kvm_run *run, struct kvm_vcpu *vcpu,
>  			kvmppc_set_gpr(vcpu, 3, EV_UNIMPLEMENTED);
>  			kvmppc_set_pc(vcpu, kvmppc_get_pc(vcpu) + 4);
>  			emulated = EMULATE_DONE;
> +		} else {
> +			kvmppc_core_queue_program(vcpu, SRR1_PROGILL);
> +			emulated = EMULATE_AGAIN;
>  		}
>  		break;
>  	case 19:
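
The decision the patch makes for primary opcode 0 can be sketched in plain C. This is an illustrative model, not the kernel function: the names are hypothetical, the `sc` encoding is an assumption of this sketch, and the queued program interrupt is reduced to a boolean.

```c
#include <stdbool.h>
#include <stdint.h>

enum sketch_emul { SKETCH_EMULATE_DONE, SKETCH_EMULATE_AGAIN };

/* Assumed encoding of the "sc" instruction for this sketch. */
#define SKETCH_INST_SC 0x44000002u

/* Byte-swap a 32-bit word. */
static uint32_t sketch_swab32(uint32_t x)
{
	return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
	       ((x << 8) & 0x00ff0000u) | (x << 24);
}

/*
 * Opcode 0 handling after the patch: a byte-swapped "sc" seen by a
 * little-endian guest is treated as handled; anything else queues an
 * illegal-instruction program interrupt instead of failing emulation.
 */
static enum sketch_emul sketch_emulate_op0(uint32_t inst, bool guest_le,
					   bool *program_irq_queued)
{
	*program_irq_queued = false;
	if (guest_le && inst == sketch_swab32(SKETCH_INST_SC))
		return SKETCH_EMULATE_DONE;

	*program_irq_queued = true;
	return SKETCH_EMULATE_AGAIN;
}
```

In this model, `.long 0` falls into the second branch, so the guest gets one program interrupt for the illegal instruction rather than a host-side emulation failure.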
Re: [PATCH 03/10] mm/hugetlb: Protect follow_huge_(pud|pgd) functions from race
Hi Anshuman,

[auto build test ERROR on powerpc/next]
[also build test ERROR on v4.6-rc2 next-20160407]
[if your patch is applied to the wrong git tree, please drop us a note to help improving the system]

url:    https://github.com/0day-ci/linux/commits/Anshuman-Khandual/Enable-HugeTLB-page-migration-on-POWER/20160407-165841
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: s390-allyesconfig (attached as .config)
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=s390

All errors (new ones prefixed by >>):

   mm/hugetlb.c: In function 'follow_huge_pud':
>> mm/hugetlb.c:4360:3: error: implicit declaration of function 'pud_page' [-Werror=implicit-function-declaration]
      page = pud_page(*pud) + ((address & ~PUD_MASK) >> PAGE_SHIFT);
      ^
   mm/hugetlb.c:4360:8: warning: assignment makes pointer from integer without a cast
      page = pud_page(*pud) + ((address & ~PUD_MASK) >> PAGE_SHIFT);
           ^
   mm/hugetlb.c: In function 'follow_huge_pgd':
   mm/hugetlb.c:4395:3: error: implicit declaration of function 'pgd_page' [-Werror=implicit-function-declaration]
      page = pgd_page(*pgd) + ((address & ~PGDIR_MASK) >> PAGE_SHIFT);
      ^
   mm/hugetlb.c:4395:8: warning: assignment makes pointer from integer without a cast
      page = pgd_page(*pgd) + ((address & ~PGDIR_MASK) >> PAGE_SHIFT);
           ^
   cc1: some warnings being treated as errors

vim +/pud_page +4360 mm/hugetlb.c

  4354		 * make sure that the address range covered by this pud is not
  4355		 * unmapped from other threads.
  4356		 */
  4357		if (!pud_huge(*pud))
  4358			goto out;
  4359		if (pud_present(*pud)) {
> 4360			page = pud_page(*pud) + ((address & ~PUD_MASK) >> PAGE_SHIFT);
  4361			if (flags & FOLL_GET)
  4362				get_page(page);
  4363		} else {

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
Re: [PATCH 0/2] perf probe fixes for ppc64le
On 2016/04/07 06:19PM, Balbir Singh wrote:
>
> On 06/04/16 22:32, Naveen N. Rao wrote:
> > This patchset fixes three issues found with perf probe on ppc64le:
> > 1. 'perf test kallsyms' failure on ppc64le (reported by Michael
> > Ellerman). This was due to the symbols being fixed up during symbol
> > table load. This is fixed in patch 2 by delaying symbol fixup until
> > later.
> > 2. perf probe function offset was being calculated from the local entry
> > point (LEP), which does not match user expectation when trying to look
> > at function disassembly output (reported by Ananth N). This is fixed for
> > kallsyms in patch 1 and for symbol table in patch 2.
>
> I think the bit where the offset is w.r.t LEP when using a name, but w.r.t
> GEP when using function+offset can be confusing.

Thanks for your review!

The rationale for this is actually from the end-user perspective. The two use cases we are considering are:

1. User just wants to probe at function entry point:

  # perf probe _do_fork

In this case, the user most definitely needs the local entry point, without which the probe won't be hit. So, for this case, we automatically insert the probe at the LEP. [We really only want to alter perf probe behavior in this case only, but we were incorrectly changing the behavior of perf with the below scenario as well.]

2. User wants to probe at a specific location. In this case, the user most likely starts by looking at the function disassembly. For instance:

  # objdump -S -d vmlinux.bak | grep -A100 \<_do_fork\>:
  c00b6a00 <_do_fork>:
		unsigned long stack_start,
		unsigned long stack_size,
		int __user *parent_tidptr,
		int __user *child_tidptr,
		unsigned long tls)
  {
  c00b6a00:	f7 00 4c 3c	addis   r2,r12,247
  c00b6a04:	00 86 42 38	addi    r2,r2,-31232
  c00b6a08:	a6 02 08 7c	mflr    r0
  c00b6a0c:	d0 ff 41 fb	std     r26,-48(r1)
  c00b6a10:	26 80 90 7d	mfocrf  r12,8
  ..
	if (!(clone_flags & CLONE_UNTRACED)) {
  c00b6a54:	e3 4f c7 7b	rldicl. r7,r30,41,63
  c00b6a58:	2c 00 82 40	bne     c00b6a84 <_do_fork+0x84>
		if (clone_flags & CLONE_VFORK)
  c00b6a5c:	e3 97 c8 7b	rldicl. r8,r30,50,63
  c00b6a60:	a0 01 82 41	beq     c00b6c00 <_do_fork+0x200>
  c00b6a64:	20 00 20 39	li      r9,32
			trace = PTRACE_EVENT_VFORK;
  c00b6a68:	02 00 80 3b	li      r28,2
  c00b6a6c:	10 02 4d e9	ld      r10,528(r13)

If the user wants to probe at _do_fork+0x54, he'd do:

  # perf probe _do_fork+0x54

With the earlier approach, we would insert the probe at _do_fork+0x5c (0x54 from the LEP) instead, which is incorrect.

In reality, user would probably just use debuginfo:

  # perf probe -L _do_fork
  <_do_fork@/root/linus/kernel/fork.c:0>
        0  long _do_fork(unsigned long clone_flags,
                        unsigned long stack_start,
                        unsigned long stack_size,
                        int __user *parent_tidptr,
                        int __user *child_tidptr,
                        unsigned long tls)
        6  {
                  struct task_struct *p;
        8         int trace = 0;
                  long nr;

                  /*
                   * Determine whether and which event to report to ptracer.  When
                   * called from kernel_thread or CLONE_UNTRACED is explicitly
                   * requested, no event is reported; otherwise, report if the event
                   * for the type of forking is enabled.
                   */
       17         if (!(clone_flags & CLONE_UNTRACED)) {
       18                 if (clone_flags & CLONE_VFORK)
       19                         trace = PTRACE_EVENT_VFORK;
       20                 else if ((clone_flags & CSIGNAL) != SIGCHLD)
       21                         trace = PTRACE_EVENT_CLONE;

  # perf probe _do_fork:17

In this case, perf chooses the right address based on DWARF. The current patchset matches the behavior of perf without debuginfo with this.

> Do we really need probe
> points between GEP and LEP? All the GEP does is setup r2. The use case
> could be more generic, but please clarify.

There could be scenarios where having a probe point between GEP and LEP is useful - for instance, if we are only interested in calls to an in-kernel function from an external module. However, this is a secondary consideration and the more important
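
The two use cases described in this message boil down to a simple address rule, which can be sketched in C as follows. This is an illustration only and not perf's implementation: the structure and function names are hypothetical, and the values in the usage note are taken from the _do_fork listing quoted in the message, where the local entry point sits 8 bytes after the global entry point.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of a function symbol on ppc64le. */
struct sketch_sym {
	uint64_t gep;		/* global entry point: the symbol address */
	uint64_t lep_offset;	/* distance from the GEP to the LEP */
};

/*
 * Probe given by name only: place it at the LEP, so that it is hit
 * no matter how the function is entered.
 * Probe given as name+offset: the offset is taken relative to the
 * GEP, which is what a disassembly listing shows.
 */
static uint64_t sketch_probe_addr(const struct sketch_sym *sym,
				  bool has_offset, uint64_t offset)
{
	if (!has_offset)
		return sym->gep + sym->lep_offset;
	return sym->gep + offset;
}
```

With the addresses from the listing, a probe on `_do_fork` lands on c00b6a08 (the `mflr r0`), and a probe on `_do_fork+0x54` lands on c00b6a54, whereas counting 0x54 from the LEP would give c00b6a5c.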
Re: [PATCH 03/10] mm/hugetlb: Protect follow_huge_(pud|pgd) functions from race
On 07/04/16 15:37, Anshuman Khandual wrote:
> follow_huge_(pmd|pud|pgd) functions are used to walk the page table and
> fetch the page struct during 'follow_page_mask' call. There are possible
> race conditions faced by these functions which arise out of simultaneous
> calls of move_pages() and freeing of huge pages. This was fixed partly
> by the previous commit e66f17ff7177 ("mm/hugetlb: take page table lock
> in follow_huge_pmd()") for only PMD based huge pages.
>
> After implementing similar logic, functions like follow_huge_(pud|pgd)
> are now safe from above mentioned race conditions and also can support
> FOLL_GET. Generic version of the function 'follow_huge_addr' has been
> left as it is and its upto the architecture to decide on it.
>
> Signed-off-by: Anshuman Khandual
> ---
>  include/linux/mm.h | 33 +++
>  mm/hugetlb.c       | 67 ++
>  2 files changed, 91 insertions(+), 9 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index ffcff53..734182a 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1751,6 +1751,19 @@ static inline void pgtable_page_dtor(struct page *page)
>  		NULL: pte_offset_kernel(pmd, address))
>
>  #if USE_SPLIT_PMD_PTLOCKS

Do we still use USE_SPLIT_PMD_PTLOCKS? I think it's good enough. With pgd's we are likely to use the same locks and the split nature may not be really split.

> +static struct page *pgd_to_page(pgd_t *pgd)
> +{
> +	unsigned long mask = ~(PTRS_PER_PGD * sizeof(pgd_t) - 1);
> +
> +	return virt_to_page((void *)((unsigned long) pgd & mask));
> +}
> +
> +static struct page *pud_to_page(pud_t *pud)
> +{
> +	unsigned long mask = ~(PTRS_PER_PUD * sizeof(pud_t) - 1);
> +
> +	return virt_to_page((void *)((unsigned long) pud & mask));
> +}
>
>  static struct page *pmd_to_page(pmd_t *pmd)
>  {
> @@ -1758,6 +1771,16 @@ static struct page *pmd_to_page(pmd_t *pmd)
>  	return virt_to_page((void *)((unsigned long) pmd & mask));
>  }
>
> +static inline spinlock_t *pgd_lockptr(struct mm_struct *mm, pgd_t *pgd)
> +{
> +	return ptlock_ptr(pgd_to_page(pgd));
> +}
> +
> +static inline spinlock_t *pud_lockptr(struct mm_struct *mm, pud_t *pud)
> +{
> +	return ptlock_ptr(pud_to_page(pud));
> +}
> +
>  static inline spinlock_t *pmd_lockptr(struct mm_struct *mm, pmd_t *pmd)
>  {
>  	return ptlock_ptr(pmd_to_page(pmd));
> @@ -1783,6 +1806,16 @@ static inline void pgtable_pmd_page_dtor(struct page *page)
>
>  #else
>
> +static inline spinlock_t *pgd_lockptr(struct mm_struct *mm, pgd_t *pgd)
> +{
> +	return &mm->page_table_lock;
> +}
> +
> +static inline spinlock_t *pud_lockptr(struct mm_struct *mm, pud_t *pud)
> +{
> +	return &mm->page_table_lock;
> +}
> +
>  static inline spinlock_t *pmd_lockptr(struct mm_struct *mm, pmd_t *pmd)
>  {
>  	return &mm->page_table_lock;
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 5ea3158..e84e479 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -4346,21 +4346,70 @@ struct page * __weak
>  follow_huge_pud(struct mm_struct *mm, unsigned long address,
>  		pud_t *pud, int flags)
>  {
> -	if (flags & FOLL_GET)
> -		return NULL;
> -
> -	return pte_page(*(pte_t *)pud) + ((address & ~PUD_MASK) >> PAGE_SHIFT);
> +	struct page *page = NULL;
> +	spinlock_t *ptl;
> +retry:
> +	ptl = pud_lockptr(mm, pud);
> +	spin_lock(ptl);
> +	/*
> +	 * make sure that the address range covered by this pud is not
> +	 * unmapped from other threads.
> +	 */
> +	if (!pud_huge(*pud))
> +		goto out;
> +	if (pud_present(*pud)) {
> +		page = pud_page(*pud) + ((address & ~PUD_MASK) >> PAGE_SHIFT);
> +		if (flags & FOLL_GET)
> +			get_page(page);
> +	} else {
> +		if (is_hugetlb_entry_migration(huge_ptep_get((pte_t *)pud))) {
> +			spin_unlock(ptl);
> +			__migration_entry_wait(mm, (pte_t *)pud, ptl);
> +			goto retry;
> +		}
> +		/*
> +		 * hwpoisoned entry is treated as no_page_table in
> +		 * follow_page_mask().
> +		 */
> +	}
> +out:
> +	spin_unlock(ptl);
> +	return page;
>  }
>
>  struct page * __weak
>  follow_huge_pgd(struct mm_struct *mm, unsigned long address,
>  		pgd_t *pgd, int flags)
>  {
> -	if (flags & FOLL_GET)
> -		return NULL;
> -
> -	return pte_page(*(pte_t *)pgd) +
> -		((address & ~PGDIR_MASK) >> PAGE_SHIFT);
> +	struct page *page = NULL;
> +	spinlock_t *ptl;
> +retry:
> +	ptl = pgd_lockptr(mm, pgd);
> +	spin_lock(ptl);
> +	/*
> +	 * make sure that the address range covered by this pgd is not
> +	 * unmapped from other threads.
> +	 */
> +	if (!pgd_huge(*pgd))
> +		goto out;
> +	if
Re: [PATCH V10 00/28] Add new powerpc specific ELF core notes
On 16/02/2016 09:59, Anshuman Khandual wrote:
> This patch series adds twelve new ELF core note sections which can
> be used with existing ptrace request PTRACE_GETREGSET-SETREGSET for accessing
> various transactional memory and other miscellaneous debug register sets on
> powerpc platform.

Hi Michael,

This series is required to handle TM state in CRIU.
Is there a chance to get it upstream soon ?

Thanks,
Laurent.

>
> Test Result (All tests pass on both BE and LE)
> ----------------------------------------------
> ptrace-ebb		PASS
> ptrace-gpr		PASS
> ptrace-tm-gpr		PASS
> ptrace-tm-spd-gpr	PASS
> ptrace-tar		PASS
> ptrace-tm-tar		PASS
> ptrace-tm-spd-tar	PASS
> ptrace-vsx		PASS
> ptrace-tm-vsx		PASS
> ptrace-tm-spd-vsx	PASS
> ptrace-tm-spr		PASS
>
> Previous versions:
> ==================
> RFC: https://lkml.org/lkml/2014/4/1/292
> V1:  https://lkml.org/lkml/2014/4/2/43
> V2:  https://lkml.org/lkml/2014/5/5/88
> V3:  https://lkml.org/lkml/2014/5/23/486
> V4:  https://lkml.org/lkml/2014/11/11/6
> V5:  https://lkml.org/lkml/2014/11/25/134
> V6:  https://lkml.org/lkml/2014/12/2/98
> V7:  https://lkml.org/lkml/2015/1/14/19
> V8:  https://lkml.org/lkml/2015/5/19/700
> V9:  https://lkml.org/lkml/2015/10/8/522
>
> Changes in V10:
> ---------------
> - Rebased against the latest mainline
> - Fixed couple of build failures in the test cases related to aux vector
>
> Changes in V9:
> --------------
> - Fixed static build check failure after tm_orig_msr got dropped
> - Fixed asm volatile construct for used registers set
> - Fixed EBB, VSX, VMX tests for LE
> - Fixed TAR test which was failing because of system calls
> - Added checks for PPC_FEATURE2_HTM aux feature in the tests
> - Fixed copyright statements
>
> Changes in V8:
> --------------
> - Split the misc register set into individual ELF core notes
> - Implemented support for VSX register set (on and off TM)
> - Implemented support for EBB register set
> - Implemented review comments on previous versions
> - Some code re-arrangements, re-writes and documentation
> - Added comprehensive list of test cases into selftests
>
> Changes in V7:
> --------------
> - Fixed a config directive in the MISC code
> - Merged the two gitignore patches into a single one
>
> Changes in V6:
> --------------
> - Added two git ignore patches for powerpc selftests
> - Re-formatted all in-code function definitions in kernel-doc format
>
> Changes in V5:
> --------------
> - Changed flush_tmregs_to_thread, so not to take into account self tracing
> - Dropped the 3rd patch in the series which had merged two functions
> - Fixed one build problem for the misc debug register patch
> - Accommodated almost all the review comments from Suka on the 6th patch
> - Minor changes to the self test program
> - Changed commit messages for some of the patches
>
> Changes in V4:
> --------------
> - Added one test program into the powerpc selftest bucket in this regard
> - Split the 2nd patch in the previous series into four different patches
> - Accommodated most of the review comments on the previous patch series
> - Added a patch to merge functions __switch_to_tm and tm_reclaim_task
>
> Changes in V3:
> --------------
> - Added two new error paths in every TM related get/set functions when regset
>   support is not present on the system (ENODEV) or when the process does not
>   have any transaction active (ENODATA) in the context
> - Installed the active hooks for all the newly added regset core note types
>
> Changes in V2:
> --------------
> - Removed all the power specific ptrace requests corresponding to new NT_PPC_*
>   elf core note types. Now all the register sets can be accessed from ptrace
>   through PTRACE_GETREGSET/PTRACE_SETREGSET using the individual NT_PPC* core
>   note type instead
> - Fixed couple of attribute values for REGSET_TM_CGPR register set
> - Renamed flush_tmreg_to_thread as flush_tmregs_to_thread
> - Fixed 32 bit checkpointed GPR support
> - Changed commit messages accordingly
>
>
> Anshuman Khandual (28):
>   elf: Add powerpc specific core note sections
>   powerpc, process: Add the function flush_tmregs_to_thread
>   powerpc, ptrace: Enable in transaction NT_PRFPREG ptrace requests
>   powerpc, ptrace: Enable in transaction NT_PPC_VMX ptrace requests
>   powerpc, ptrace: Enable in transaction NT_PPC_VSX ptrace requests
>   powerpc, ptrace: Adapt gpr32_get, gpr32_set functions for transaction
>   powerpc, ptrace: Enable support for NT_PPC_CGPR
>   powerpc, ptrace: Enable support for NT_PPC_CFPR
>   powerpc, ptrace: Enable support for NT_PPC_CVMX
>   powerpc, ptrace: Enable support for NT_PPC_CVSX
>   powerpc, ptrace: Enable support for TM SPR state
>   powerpc, ptrace: Enable NT_PPC_TM_CTAR, NT_PPC_TM_CPPR, NT_PPC_TM_CDSCR
>   powerpc, ptrace: Enable support for NT_PPPC_TAR, NT_PPC_PPR, NT_PPC_DSCR
>   powerpc, ptrace: Enable support for EBB registers
>
Re: [PATCH 03/10] mm/hugetlb: Protect follow_huge_(pud|pgd) functions from race
Hi Anshuman,

[auto build test ERROR on powerpc/next]
[also build test ERROR on v4.6-rc2 next-20160407]
[if your patch is applied to the wrong git tree, please drop us a note to help improving the system]

url:    https://github.com/0day-ci/linux/commits/Anshuman-Khandual/Enable-HugeTLB-page-migration-on-POWER/20160407-165841
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: sparc64-allyesconfig (attached as .config)
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=sparc64

All error/warnings (new ones prefixed by >>):

   mm/hugetlb.c: In function 'follow_huge_pgd':
>> mm/hugetlb.c:4395:3: error: implicit declaration of function 'pgd_page' [-Werror=implicit-function-declaration]
      page = pgd_page(*pgd) + ((address & ~PGDIR_MASK) >> PAGE_SHIFT);
      ^
>> mm/hugetlb.c:4395:8: warning: assignment makes pointer from integer without a cast
      page = pgd_page(*pgd) + ((address & ~PGDIR_MASK) >> PAGE_SHIFT);
           ^
   cc1: some warnings being treated as errors

vim +/pgd_page +4395 mm/hugetlb.c

  4389		 * make sure that the address range covered by this pgd is not
  4390		 * unmapped from other threads.
  4391		 */
  4392		if (!pgd_huge(*pgd))
  4393			goto out;
  4394		if (pgd_present(*pgd)) {
> 4395			page = pgd_page(*pgd) + ((address & ~PGDIR_MASK) >> PAGE_SHIFT);
  4396			if (flags & FOLL_GET)
  4397				get_page(page);
  4398		} else {

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
Re: [PATCH 02/10] mm/hugetlb: Add PGD based implementation awareness
On 07/04/16 15:37, Anshuman Khandual wrote:
> Currently the config ARCH_WANT_GENERAL_HUGETLB enabled functions like
> 'huge_pte_alloc' and 'huge_pte_offset' dont take into account HugeTLB
> page implementation at the PGD level. This is also true for functions
> like 'follow_page_mask' which is called from move_pages() system call.
> This lack of PGD level huge page support prohibits some architectures
> to use these generic HugeTLB functions.
>

From what I know of move_pages(), it will always call follow_page_mask()
with FOLL_GET (I could be wrong here) and the implementation below
returns NULL for follow_huge_pgd().

> This change adds the required PGD based implementation awareness and
> with that, more architectures like POWER which implements 16GB pages
> at the PGD level along with the 16MB pages at the PMD level can now
> use ARCH_WANT_GENERAL_HUGETLB config option.
>
> Signed-off-by: Anshuman Khandual
> ---
>  include/linux/hugetlb.h |  3 +++
>  mm/gup.c                |  6 ++
>  mm/hugetlb.c            | 20
>  3 files changed, 29 insertions(+)
>
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 7d953c2..71832e1 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -115,6 +115,8 @@ struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
>  				pmd_t *pmd, int flags);
>  struct page *follow_huge_pud(struct mm_struct *mm, unsigned long address,
>  				pud_t *pud, int flags);
> +struct page *follow_huge_pgd(struct mm_struct *mm, unsigned long address,
> +				pgd_t *pgd, int flags);
>  int pmd_huge(pmd_t pmd);
>  int pud_huge(pud_t pmd);
>  unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
> @@ -143,6 +145,7 @@ static inline void hugetlb_show_meminfo(void)
>  }
>  #define follow_huge_pmd(mm, addr, pmd, flags)	NULL
>  #define follow_huge_pud(mm, addr, pud, flags)	NULL
> +#define follow_huge_pgd(mm, addr, pgd, flags)	NULL
>  #define prepare_hugepage_range(file, addr, len)	(-EINVAL)
>  #define pmd_huge(x)	0
>  #define pud_huge(x)	0
> diff --git a/mm/gup.c b/mm/gup.c
> index fb87aea..9bac78c 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -234,6 +234,12 @@ struct page *follow_page_mask(struct vm_area_struct *vma,
>  	pgd = pgd_offset(mm, address);
>  	if (pgd_none(*pgd) || unlikely(pgd_bad(*pgd)))
>  		return no_page_table(vma, flags);
> +	if (pgd_huge(*pgd) && vma->vm_flags & VM_HUGETLB) {
> +		page = follow_huge_pgd(mm, address, pgd, flags);
> +		if (page)
> +			return page;
> +		return no_page_table(vma, flags);

This will return NULL as well?

> +	}
>
>  	pud = pud_offset(pgd, address);
>  	if (pud_none(*pud))
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 19d0d08..5ea3158 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -4250,6 +4250,11 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
>  	pte_t *pte = NULL;
>
>  	pgd = pgd_offset(mm, addr);
> +	if (sz == PGDIR_SIZE) {
> +		pte = (pte_t *)pgd;
> +		goto huge_pgd;
> +	}
> +

No allocation for a pgd slot - right?

>  	pud = pud_alloc(mm, pgd, addr);
>  	if (pud) {
>  		if (sz == PUD_SIZE) {
> @@ -4262,6 +4267,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
>  			pte = (pte_t *)pmd_alloc(mm, pud, addr);
>  		}
>  	}
> +
> +huge_pgd:
>  	BUG_ON(pte && !pte_none(*pte) && !pte_huge(*pte));
>
>  	return pte;
> @@ -4275,6 +4282,8 @@ pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
>
>  	pgd = pgd_offset(mm, addr);
>  	if (pgd_present(*pgd)) {
> +		if (pgd_huge(*pgd))
> +			return (pte_t *)pgd;
>  		pud = pud_offset(pgd, addr);
>  		if (pud_present(*pud)) {
>  			if (pud_huge(*pud))
> @@ -4343,6 +4352,17 @@ follow_huge_pud(struct mm_struct *mm, unsigned long address,
>  	return pte_page(*(pte_t *)pud) + ((address & ~PUD_MASK) >> PAGE_SHIFT);
>  }
>
> +struct page * __weak
> +follow_huge_pgd(struct mm_struct *mm, unsigned long address,
> +		pgd_t *pgd, int flags)
> +{
> +	if (flags & FOLL_GET)
> +		return NULL;
> +
> +	return pte_page(*(pte_t *)pgd) +
> +				((address & ~PGDIR_MASK) >> PAGE_SHIFT);
> +}
> +
>  #ifdef CONFIG_MEMORY_FAILURE
>
>  /*
>
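The quoted follow_huge_pgd() picks the base page inside a PGD-sized huge page by masking off the PGD-aligned part of the address and shifting by the base page size. A minimal standalone sketch of that arithmetic follows. The EX_* constants (64 KiB base pages, 16 GiB per PGD entry) and the function name are assumptions chosen for illustration, not values taken from the patch; real values are per-architecture and per-configuration.

```c
#include <stdint.h>

/* Hypothetical geometry, for illustration only. */
#define EX_PAGE_SHIFT   16                              /* 64 KiB base pages (assumed) */
#define EX_PGDIR_SHIFT  34                              /* one PGD entry maps 16 GiB (assumed) */
#define EX_PGDIR_SIZE   (UINT64_C(1) << EX_PGDIR_SHIFT)
#define EX_PGDIR_MASK   (~(EX_PGDIR_SIZE - 1))

/*
 * Index of the base page that contains 'address' within its PGD-sized
 * huge page. This mirrors the expression
 *     (address & ~PGDIR_MASK) >> PAGE_SHIFT
 * from the quoted patch, using the assumed constants above.
 */
static uint64_t ex_subpage_index(uint64_t address)
{
	return (address & ~EX_PGDIR_MASK) >> EX_PAGE_SHIFT;
}
```

In the patch this index is added to the first struct page of the huge page; the sketch only reproduces the index computation.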
Re: [PATCH 01/10] mm/mmap: Replace SHM_HUGE_MASK with MAP_HUGE_MASK inside mmap_pgoff
On 07/04/16 15:37, Anshuman Khandual wrote:
> The commit 091d0d55b286 ("shm: fix null pointer deref when userspace
> specifies invalid hugepage size") had replaced MAP_HUGE_MASK with
> SHM_HUGE_MASK. Though both of them contain the same numeric value of
> 0x3f, MAP_HUGE_MASK flag sounds more appropriate than the other one
> in the context. Hence change it back.
>
> Signed-off-by: Anshuman Khandual

Acked-by: Balbir Singh
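For context, the 0x3f mask selects a 6-bit field in the mmap() flags that carries log2 of the requested huge page size. The sketch below shows how such a field is packed and unpacked. The mask value comes from the message above; the shift of 26 is the value Linux uses for MAP_HUGE_SHIFT, restated here under EX_* names so the block stands alone. The helper names are hypothetical.

```c
/* Local stand-ins; the mask value 0x3f is quoted in the thread. */
#define EX_MAP_HUGE_SHIFT 26
#define EX_MAP_HUGE_MASK  0x3f

/* Pack log2(huge page size) into the flag bits. */
static unsigned long ex_encode_hugepage(unsigned int log2_size)
{
	return ((unsigned long)log2_size & EX_MAP_HUGE_MASK) << EX_MAP_HUGE_SHIFT;
}

/* Recover log2(huge page size) from a flags word; 0 means "default size". */
static unsigned int ex_hugepage_log2(unsigned long flags)
{
	return (unsigned int)((flags >> EX_MAP_HUGE_SHIFT) & EX_MAP_HUGE_MASK);
}
```

Because both masks are 0x3f, swapping the name changes no behavior, which matches the commit message's point that this is a readability fix.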
Re: [PATCH 0/2] perf probe fixes for ppc64le
On 06/04/16 22:32, Naveen N. Rao wrote:
> This patchset fixes three issues found with perf probe on ppc64le:
> 1. 'perf test kallsyms' failure on ppc64le (reported by Michael
> Ellerman). This was due to the symbols being fixed up during symbol
> table load. This is fixed in patch 2 by delaying symbol fixup until
> later.
> 2. perf probe function offset was being calculated from the local entry
> point (LEP), which does not match user expectation when trying to look
> at function disassembly output (reported by Ananth N). This is fixed for
> kallsyms in patch 1 and for symbol table in patch 2.

I think the bit where the offset is w.r.t LEP when using a name, but
w.r.t GEP when using function+offset can be confusing. Do we really need
probe points between GEP and LEP? All the GEP does is setup r2. The use
case could be more generic, but please clarify.

> 3. perf probe failure with kretprobe when using kallsyms. This was
> failing as we were specifying an offset. This is fixed in patch 1.
>

Balbir Singh.
Re: [PATCH 1/2] perf/powerpc: Fix kprobe and kretprobe handling with kallsyms
On 2016/04/07 10:00AM, Ananth N wrote:
> On Wed, Apr 06, 2016 at 06:02:57PM +0530, Naveen N. Rao wrote:
>
> > +	if (!pev->uprobes && map->dso->symtab_type == DSO_BINARY_TYPE__KALLSYMS)
> >  		tev->point.offset += PPC64LE_LEP_OFFSET;
>
> uprobes check against kallsyms? Am I missing something here?

Ah yes. That check shouldn't be necessary since symtab_type would be
different anyway. I will remove that check.

Thanks for the review!

- Naveen
[PATCH V3 2/2] pseries/eeh: Refactor the configure_bridge RTAS tokens
The RTAS calls "ibm,configure-pe" and "ibm,configure-bridge" perform the
same actions, however the former can skip configuration if unnecessary.
The existing code treats them as different tokens even though only one
will ever be called. Refactor this by making a single token that is
assigned during init.

Signed-off-by: Russell Currey
---
V3: Reorder commits so the previous patch doesn't depend on this

I had a look at doing the same with some other duplicated tokens but they
had slight differences in semantics so it wasn't helping clarity.
---
 arch/powerpc/platforms/pseries/eeh_pseries.c | 28
 1 file changed, 12 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c
index 405baaf..3998e0f 100644
--- a/arch/powerpc/platforms/pseries/eeh_pseries.c
+++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
@@ -53,7 +53,6 @@ static int ibm_read_slot_reset_state2;
 static int ibm_slot_error_detail;
 static int ibm_get_config_addr_info;
 static int ibm_get_config_addr_info2;
-static int ibm_configure_bridge;
 static int ibm_configure_pe;

 /*
@@ -81,7 +80,14 @@ static int pseries_eeh_init(void)
 	ibm_get_config_addr_info2	= rtas_token("ibm,get-config-addr-info2");
 	ibm_get_config_addr_info	= rtas_token("ibm,get-config-addr-info");
 	ibm_configure_pe		= rtas_token("ibm,configure-pe");
-	ibm_configure_bridge		= rtas_token("ibm,configure-bridge");
+
+	/*
+	 * ibm,configure-pe and ibm,configure-bridge have the same semantics,
+	 * however ibm,configure-pe can be faster.  If we can't find
+	 * ibm,configure-pe then fall back to using ibm,configure-bridge.
+	 */
+	if (ibm_configure_pe == RTAS_UNKNOWN_SERVICE)
+		ibm_configure_pe	= rtas_token("ibm,configure-bridge");

 	/*
 	 * Necessary sanity check. We needn't check "get-config-addr-info"
@@ -93,8 +99,7 @@ static int pseries_eeh_init(void)
 	    (ibm_read_slot_reset_state2 == RTAS_UNKNOWN_SERVICE &&
 	     ibm_read_slot_reset_state == RTAS_UNKNOWN_SERVICE)	||
 	    ibm_slot_error_detail == RTAS_UNKNOWN_SERVICE	||
-	    (ibm_configure_pe == RTAS_UNKNOWN_SERVICE		&&
-	     ibm_configure_bridge == RTAS_UNKNOWN_SERVICE)) {
+	    ibm_configure_pe == RTAS_UNKNOWN_SERVICE) {
 		pr_info("EEH functionality not supported\n");
 		return -EINVAL;
 	}
@@ -624,18 +629,9 @@ static int pseries_eeh_configure_bridge(struct eeh_pe *pe)
 		config_addr = pe->addr;

 	while (max_wait > 0) {
-		/* Use new configure-pe function, if supported */
-		if (ibm_configure_pe != RTAS_UNKNOWN_SERVICE) {
-			ret = rtas_call(ibm_configure_pe, 3, 1, NULL,
-					config_addr, BUID_HI(pe->phb->buid),
-					BUID_LO(pe->phb->buid));
-		} else if (ibm_configure_bridge != RTAS_UNKNOWN_SERVICE) {
-			ret = rtas_call(ibm_configure_bridge, 3, 1, NULL,
-					config_addr, BUID_HI(pe->phb->buid),
-					BUID_LO(pe->phb->buid));
-		} else {
-			return -EFAULT;
-		}
+		ret = rtas_call(ibm_configure_pe, 3, 1, NULL,
+				config_addr, BUID_HI(pe->phb->buid),
+				BUID_LO(pe->phb->buid));

 		if (!ret)
 			return ret;
--
2.8.0
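The refactoring above amounts to resolving one token at init time, preferring "ibm,configure-pe" and falling back to "ibm,configure-bridge". A minimal userspace sketch of that selection is shown here. Everything prefixed ex_ or EX_ is hypothetical: the lookup callback stands in for rtas_token(), the sentinel stands in for RTAS_UNKNOWN_SERVICE, and the token numbers 41 and 42 are made up.

```c
#include <string.h>

#define EX_UNKNOWN_SERVICE (-1)   /* stand-in for RTAS_UNKNOWN_SERVICE */

/* Stand-in for rtas_token(): maps a service name to a token or the sentinel. */
typedef int (*ex_token_lookup)(const char *name);

/* Prefer configure-pe; fall back to configure-bridge when it is missing. */
static int ex_pick_configure_token(ex_token_lookup lookup)
{
	int token = lookup("ibm,configure-pe");

	if (token == EX_UNKNOWN_SERVICE)
		token = lookup("ibm,configure-bridge");
	return token;
}

/* Example firmware that offers both calls (token numbers are made up). */
static int ex_lookup_both(const char *name)
{
	if (strcmp(name, "ibm,configure-pe") == 0)
		return 42;
	if (strcmp(name, "ibm,configure-bridge") == 0)
		return 41;
	return EX_UNKNOWN_SERVICE;
}

/* Example firmware that only offers the older call. */
static int ex_lookup_bridge_only(const char *name)
{
	if (strcmp(name, "ibm,configure-bridge") == 0)
		return 41;
	return EX_UNKNOWN_SERVICE;
}

/* Example firmware that offers neither call. */
static int ex_lookup_none(const char *name)
{
	(void)name;
	return EX_UNKNOWN_SERVICE;
}
```

With a single resolved token, the later sanity check and the call site each need only one comparison, which is what the diff reduces them to.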
[PATCH V3 1/2] pseries/eeh: Handle RTAS delay requests in configure_bridge
In the "ibm,configure-pe" and "ibm,configure-bridge" RTAS calls, the
spec states that values of 9900-9905 can be returned, indicating that
software should delay for 10^x (where x is the last digit, i.e. 990x)
milliseconds and attempt the call again. Currently, the kernel doesn't
know about this, and respecting it fixes some PCI failures when the
hypervisor is busy.

The delay is capped at 0.2 seconds.

Cc: # 3.10+
Signed-off-by: Russell Currey
---
V3 changelog:
- Refactorings and rewordings thanks to Gavin
- Treat return values >9902 as 9902 thanks to Tyrel
---
 arch/powerpc/platforms/pseries/eeh_pseries.c | 51
 1 file changed, 36 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c
index ac3ffd9..405baaf 100644
--- a/arch/powerpc/platforms/pseries/eeh_pseries.c
+++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
@@ -615,29 +615,50 @@ static int pseries_eeh_configure_bridge(struct eeh_pe *pe)
 {
 	int config_addr;
 	int ret;
+	/* Waiting 0.2s maximum before skipping configuration */
+	int max_wait = 200;

 	/* Figure out the PE address */
 	config_addr = pe->config_addr;
 	if (pe->addr)
 		config_addr = pe->addr;

-	/* Use new configure-pe function, if supported */
-	if (ibm_configure_pe != RTAS_UNKNOWN_SERVICE) {
-		ret = rtas_call(ibm_configure_pe, 3, 1, NULL,
-				config_addr, BUID_HI(pe->phb->buid),
-				BUID_LO(pe->phb->buid));
-	} else if (ibm_configure_bridge != RTAS_UNKNOWN_SERVICE) {
-		ret = rtas_call(ibm_configure_bridge, 3, 1, NULL,
-				config_addr, BUID_HI(pe->phb->buid),
-				BUID_LO(pe->phb->buid));
-	} else {
-		return -EFAULT;
-	}
+	while (max_wait > 0) {
+		/* Use new configure-pe function, if supported */
+		if (ibm_configure_pe != RTAS_UNKNOWN_SERVICE) {
+			ret = rtas_call(ibm_configure_pe, 3, 1, NULL,
+					config_addr, BUID_HI(pe->phb->buid),
+					BUID_LO(pe->phb->buid));
+		} else if (ibm_configure_bridge != RTAS_UNKNOWN_SERVICE) {
+			ret = rtas_call(ibm_configure_bridge, 3, 1, NULL,
+					config_addr, BUID_HI(pe->phb->buid),
+					BUID_LO(pe->phb->buid));
+		} else {
+			return -EFAULT;
+		}

-	if (ret)
-		pr_warn("%s: Unable to configure bridge PHB#%d-PE#%x (%d)\n",
-			__func__, pe->phb->global_number, pe->addr, ret);
+		if (!ret)
+			return ret;
+
+		/*
+		 * If RTAS returns a delay value that's above 100ms, cut it
+		 * down to 100ms in case firmware made a mistake.  For more
+		 * on how these delay values work see rtas_busy_delay_time
+		 */
+		if (ret > RTAS_EXTENDED_DELAY_MIN+2 &&
+		    ret <= RTAS_EXTENDED_DELAY_MAX)
+			ret = RTAS_EXTENDED_DELAY_MIN+2;
+
+		max_wait -= rtas_busy_delay_time(ret);
+
+		if (max_wait < 0)
+			break;
+
+		rtas_busy_delay(ret);
+	}

+	pr_warn("%s: Unable to configure bridge PHB#%d-PE#%x (%d)\n",
+		__func__, pe->phb->global_number, pe->addr, ret);
 	return ret;
 }
--
2.8.0
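The delay arithmetic described in this patch can be checked in isolation: a status of 990x means "wait 10^x ms", anything above 9902 is clamped to 9902 (100 ms), and the total wait is budgeted at 200 ms. The sketch below is a userspace model under stated assumptions, not the kernel code. The ex_* names are hypothetical, ex_delay_ms() only models the 9900-9905 range, and ex_calls_within_budget() treats a status with no delay as final so that the model always terminates.

```c
#define EX_DELAY_MIN 9900   /* stand-in for RTAS_EXTENDED_DELAY_MIN */
#define EX_DELAY_MAX 9905   /* stand-in for RTAS_EXTENDED_DELAY_MAX */

/* Milliseconds requested by an extended-delay status: 10^x for 990x, else 0. */
static unsigned int ex_delay_ms(int status)
{
	unsigned int ms = 1;
	int order;

	if (status < EX_DELAY_MIN || status > EX_DELAY_MAX)
		return 0;
	for (order = status - EX_DELAY_MIN; order > 0; order--)
		ms *= 10;
	return ms;
}

/* Clamp delays above 100 ms (9903..9905) down to 9902, as the patch does. */
static int ex_clamp_status(int status)
{
	if (status > EX_DELAY_MIN + 2 && status <= EX_DELAY_MAX)
		return EX_DELAY_MIN + 2;
	return status;
}

/*
 * Number of calls made if firmware keeps returning 'status', following the
 * loop shape in the patch: call, subtract the delay from the budget, stop
 * once the budget is overdrawn or exhausted.
 */
static int ex_calls_within_budget(int status, int budget_ms)
{
	int calls = 0;

	while (budget_ms > 0) {
		unsigned int ms;

		calls++;
		ms = ex_delay_ms(ex_clamp_status(status));
		if (ms == 0)
			break;          /* sketch-only: no delay requested, stop */
		budget_ms -= (int)ms;
		if (budget_ms < 0)
			break;
	}
	return calls;
}
```

Under this model, a firmware that always answers 9902 or higher is tried twice within the 200 ms budget, while one that answers 9900 is tried 200 times.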
Re: [PATCH] powerpc: define the fman node for the kmcoge4 DTS
On 06/04/16 23:49, Scott Wood wrote:
> On Wed, 2016-04-06 at 15:37 +0200, Valentin Longchamp wrote:
>> Now that the FMAN mac driver has been merged the fman node is relevant.
>>
>> The kmcoge4 board implements 3 ethernet interfaces, 1 with a RGMII phy
>> and 2 with fixed 1 Giga SGMII links.
>>
>> Signed-off-by: Valentin Longchamp
>> ---
>>  arch/powerpc/boot/dts/fsl/kmcoge4.dts | 39 +++
>>  1 file changed, 39 insertions(+)
>>
>> diff --git a/arch/powerpc/boot/dts/fsl/kmcoge4.dts b/arch/powerpc/boot/dts/fsl/kmcoge4.dts
>> index 6858ec9..1cec66d 100644
>> --- a/arch/powerpc/boot/dts/fsl/kmcoge4.dts
>> +++ b/arch/powerpc/boot/dts/fsl/kmcoge4.dts
>> @@ -106,6 +106,45 @@
>>  		sata@221000 {
>>  			status = "disabled";
>>  		};
>> +
>> +		fman0: fman@40 {
>> +			enet0: ethernet@e {
>> +				phy-connection-type = "sgmii";
>> +				local-mac-address = [00 11 22 33 44 55];
>> +				fixed-link {
>> +					speed = <1000>;
>> +					full-duplex;
>> +				};
>> +			};
>> +			mdio0: mdio@e1120 {
>> +				front_phy: ethernet-phy@11 {
>> +					reg = <0x11>;
>> +				};
>> +			};
>> +
>> +			enet1: ethernet@e2000 {
>> +				phy-connection-type = "sgmii";
>> +				local-mac-address = [00 11 22 33 44 56];
>> +				fixed-link {
>> +					speed = <1000>;
>> +					full-duplex;
>> +				};
>> +			};
>
> No hardcoded MAC addresses.
>

For these 2 interfaces where I have the local-mac-address field, the MAC
addresses are set later by an application that reads the real address in
some EEPROM.

However, in order to let the fman mac_probe run successfully in the
first place, I have set non-zero MAC addresses since the
local-mac-address fields are not set by u-boot.

I have found several local-mac-address fields in other DTS files that are
all zeros, and thus are rejected by of_get_mac_address. Are they
leftovers from the past or should they be used here as well? If not, I
will simply drop these 2 fields.

Thanks

Valentin
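The point about all-zero addresses being rejected can be illustrated with a small validity check in the spirit of the kernel's is_valid_ether_addr(): an address is usable as a station address only if it is neither all zeros nor multicast. This is a userspace sketch with hypothetical ex_* names, not the kernel helper. The example addresses are the placeholder quoted in the patch and an all-zero field.

```c
#include <stdbool.h>
#include <stdint.h>

#define EX_ETH_ALEN 6

/* True when every byte of the address is zero. */
static bool ex_is_zero_mac(const uint8_t mac[EX_ETH_ALEN])
{
	int i;

	for (i = 0; i < EX_ETH_ALEN; i++)
		if (mac[i] != 0)
			return false;
	return true;
}

/* Multicast (and broadcast) addresses have the low bit of the first byte set. */
static bool ex_is_multicast_mac(const uint8_t mac[EX_ETH_ALEN])
{
	return (mac[0] & 0x01) != 0;
}

/* Usable as a station address: not all zeros and not multicast. */
static bool ex_is_valid_mac(const uint8_t mac[EX_ETH_ALEN])
{
	return !ex_is_multicast_mac(mac) && !ex_is_zero_mac(mac);
}

/* Placeholder from the quoted DTS, and an all-zero field as seen in other DTS files. */
static const uint8_t ex_placeholder_mac[EX_ETH_ALEN] = { 0x00, 0x11, 0x22, 0x33, 0x44, 0x55 };
static const uint8_t ex_zero_mac[EX_ETH_ALEN] = { 0 };
static const uint8_t ex_broadcast_mac[EX_ETH_ALEN] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };
```

Under this check the placeholder passes, which explains why it lets the probe proceed, while the all-zero field does not.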