Re: [RFC FIX PATCH v0] powerpc,numa: Fix memory_hotplug_max()

2016-04-07 Thread Nathan Fontenot
On 04/06/2016 04:44 AM, Bharata B Rao wrote:
> memory_hotplug_max() uses hot_add_drconf_memory_max() to get maximum
> addressable memory by referring to ibm,dynamic-memory property. There
> are three problems with the current approach:
> 
> 1 hot_add_drconf_memory_max() assumes that ibm,dynamic-memory includes
>   all the LMBs of the guest, but that is not true for PowerKVM which
>   populates only DR LMBs (LMBs that can be hotplugged/removed) in that
>   property.
> 2 hot_add_drconf_memory_max() multiplies lmb-size with lmb-count to arrive
>   at the max possible address. Since ibm,dynamic-memory doesn't include
>   RMA LMBs, the address thus obtained will be less than the actual max
>   address. For example, if max possible memory size is 32G, with lmb-size
>   of 256MB there can be 127 LMBs in ibm,dynamic-memory (1 LMB for RMA
>   which won't be present here).  hot_add_drconf_memory_max() would then
>   return the max addressable memory as 127 * 256MB = 31.75GB, whereas the max
>   address should have been 32G, which is what ibm,lrdr-capacity shows.
> 3 In PowerKVM, there can be a gap between the end of boot time RAM and
>   beginning of hotplug RAM area. So just multiplying lmb-count with
>   lmb-size will not provide the correct max possible address for PowerKVM.
> 
> This patch fixes 1 by using ibm,lrdr-capacity property to return the max
> addressable memory whenever the property is present. Then it fixes 2 & 3
> by fetching the address of the last LMB in ibm,dynamic-memory property.
> 
> NOTE: There are some unnecessary changes in the patch because of converting
> spaces to tabs, without which checkpatch.pl complains.
> 
> Signed-off-by: Bharata B Rao 
> ---
>  arch/powerpc/mm/numa.c | 29 ++---
>  1 file changed, 22 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index 669a15e..57d5877 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -1164,17 +1164,32 @@ int hot_add_scn_to_nid(unsigned long scn_addr)
>  static u64 hot_add_drconf_memory_max(void)
>  {
>  struct device_node *memory = NULL;
> -unsigned int drconf_cell_cnt = 0;
> -u64 lmb_size = 0;
> + struct device_node *dn = NULL;
> + unsigned int drconf_cell_cnt = 0;
> + u64 lmb_size = 0;
>   const __be32 *dm = NULL;
> + const __be64 *lrdr = NULL;
> + struct of_drconf_cell drmem;
> +
> + dn = of_find_node_by_path("/rtas");
> + if (dn) {
> + lrdr = of_get_property(dn, "ibm,lrdr-capacity", NULL);
> + of_node_put(dn);
> + if (lrdr)
> + return be64_to_cpup(lrdr);
> + }
>  
>  memory = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory");
>  if (memory) {
> -drconf_cell_cnt = of_get_drconf_memory(memory, );
> -lmb_size = of_get_lmb_size(memory);
> -of_node_put(memory);
> -}
> -return lmb_size * drconf_cell_cnt;
> + drconf_cell_cnt = of_get_drconf_memory(memory, );
> + lmb_size = of_get_lmb_size(memory);
> +
> + /* Advance to the last cell, each cell has 6 32 bit integers */
> + dm += (drconf_cell_cnt - 1) * 6;

You could do this as follows to avoid hard-coding 6:
dm += (drconf_cell_cnt - 1) * (sizeof(struct of_drconf_cell) / sizeof(u32));
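Because dm is a const __be32 pointer, dm += n advances by n 32-bit cells rather than n bytes, so a stride taken from sizeof() has to be divided by the cell size. A minimal sketch of the whole computation, assuming a hypothetical struct drconf_cell_sketch that mirrors the six-cell entry layout (a 64-bit base address followed by four 32-bit fields) and using host byte order instead of the property's big-endian encoding:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of one ibm,dynamic-memory entry: a 64-bit base
 * address followed by four 32-bit fields, i.e. six 32-bit cells. */
struct drconf_cell_sketch {
	uint64_t base_addr;
	uint32_t drc_index;
	uint32_t reserved;
	uint32_t aa_index;
	uint32_t flags;
};

/* Cells per entry: dividing the byte size by sizeof(uint32_t) turns it
 * into a stride that is valid for a uint32_t pointer. */
#define CELLS_PER_ENTRY (sizeof(struct drconf_cell_sketch) / sizeof(uint32_t))

/* 64-bit base address held in the first two cells of an entry
 * (high word first; host byte order in this sketch). */
static uint64_t entry_base(const uint32_t *cell)
{
	return ((uint64_t)cell[0] << 32) | cell[1];
}

/* Max address = base of the last entry + lmb_size, as in the patch. */
static uint64_t drconf_max_addr(const uint32_t *dm, size_t count, uint64_t lmb_size)
{
	const uint32_t *last = dm + (count - 1) * CELLS_PER_ENTRY;

	return entry_base(last) + lmb_size;
}

/* Fill 'count' entries whose bases start at 'first_base' and step by
 * 'lmb_size'; the remaining fields are zeroed. */
static void fill_cells(uint32_t *dm, size_t count, uint64_t first_base, uint64_t lmb_size)
{
	for (size_t i = 0; i < count; i++) {
		uint64_t base = first_base + i * lmb_size;
		uint32_t *cell = dm + i * CELLS_PER_ENTRY;

		cell[0] = (uint32_t)(base >> 32);
		cell[1] = (uint32_t)base;
		cell[2] = cell[3] = cell[4] = cell[5] = 0;
	}
}
```

With 127 entries starting at 256MB (the RMA LMB at address 0 is absent), the last base is 31.75GB and the sketch returns 32GB, whereas lmb_size * drconf_cell_cnt gives 31.75GB.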

> + read_drconf_cell(, );
> + of_node_put(memory);
> + }
> + return drmem.base_addr + lmb_size;

Is it safe to assume that there will only be one RMA LMB?

I do see that the PAPR defines a bit in the flags field for each LMB
in ibm,dynamic-memory as 'reserved'. Is this something you could use
to flag RMA LMBs and put them in the ibm,dynamic-memory property?

I'm just curious why these LMBs are not in this property.

-Nathan 
>  }
>  
>  /*
> 

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH kernel] vfio_iommu_spapr_tce: Remove unneeded iommu_group_get_iommudata

2016-04-07 Thread David Gibson
On Fri, Apr 08, 2016 at 02:54:41PM +1000, Alexey Kardashevskiy wrote:
> This removes iommu_group_get_iommudata() as the result is never used.
> As this is a minor cleanup, no change in behavior is expected.
> 
> Signed-off-by: Alexey Kardashevskiy 

Reviewed-by: David Gibson 

> ---
>  drivers/vfio/vfio_iommu_spapr_tce.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
> index 0582b72..6419566 100644
> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
> @@ -331,14 +331,12 @@ static void tce_iommu_free_table(struct iommu_table *tbl);
>  static void tce_iommu_release(void *iommu_data)
>  {
>   struct tce_container *container = iommu_data;
> - struct iommu_table_group *table_group;
>   struct tce_iommu_group *tcegrp;
>   long i;
>  
>   while (tce_groups_attached(container)) {
>   tcegrp = list_first_entry(>group_list,
>   struct tce_iommu_group, next);
> - table_group = iommu_group_get_iommudata(tcegrp->grp);
>   tce_iommu_detach_group(iommu_data, tcegrp->grp);
>   }
>  

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



[PATCH kernel] vfio_iommu_spapr_tce: Remove unneeded iommu_group_get_iommudata

2016-04-07 Thread Alexey Kardashevskiy
This removes iommu_group_get_iommudata() as the result is never used.
As this is a minor cleanup, no change in behavior is expected.

Signed-off-by: Alexey Kardashevskiy 
---
 drivers/vfio/vfio_iommu_spapr_tce.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index 0582b72..6419566 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -331,14 +331,12 @@ static void tce_iommu_free_table(struct iommu_table *tbl);
 static void tce_iommu_release(void *iommu_data)
 {
struct tce_container *container = iommu_data;
-   struct iommu_table_group *table_group;
struct tce_iommu_group *tcegrp;
long i;
 
while (tce_groups_attached(container)) {
tcegrp = list_first_entry(>group_list,
struct tce_iommu_group, next);
-   table_group = iommu_group_get_iommudata(tcegrp->grp);
tce_iommu_detach_group(iommu_data, tcegrp->grp);
}
 
-- 
2.5.0.rc3


Re: [PATCH][v3] mtd/ifc: Add support for IFC controller version 2.0

2016-04-07 Thread Boris Brezillon
On Wed, 6 Apr 2016 18:53:39 +
Yang-Leo Li  wrote:

> 
> 
> > -----Original Message-----
> > From: Brian Norris [mailto:computersforpe...@gmail.com]
> > Sent: Wednesday, April 06, 2016 12:53 PM
> > To: Li Yang 
> > Cc: Scott Wood ; Raghav Dogra ;
> > linux-...@lists.infradead.org; linuxppc-dev ;
> > Prabhakar Kushwaha ; Raghav Dogra
> > ; Jaiprakash Singh ; Boris
> > Brezillon 
> > Subject: Re: [PATCH][v3] mtd/ifc: Add support for IFC controller version 2.0
> > 
> > Hi,
> > 
> > On Wed, Mar 30, 2016 at 03:43:40PM -0500, Li Yang wrote:
> > > Hi Brian,
> > >
> > > Could you help review and pull in this patch and the Kconfig update
> > > after this patch (https://patchwork.ozlabs.org/patch/557389/)?  It
> > 
> > It's probably best for Boris to apply this now.
> 
> Thanks Brian.  I see Boris has recently taken over maintenance of NAND,
> so we will follow up with him.
> 
> Hi Boris,
> 
> Could you consider pulling in this patch and then the Kconfig patch
> (https://patchwork.ozlabs.org/patch/557389/)?

Applied.

Thanks,

Boris


-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

[PATCH v11 58/60] PCI: Introduce resource_disabled()

2016-04-07 Thread Yinghai Lu
The current code uses !flags to test for a disabled resource, and we are
going to use IORESOURCE_DISABLED instead of clearing the resource flags.

Convert all !flags checks to a helper function, resource_disabled(),
which checks both !flags and IORESOURCE_DISABLED.
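The diffstat shows include/linux/ioport.h gaining four lines for the helper, but its body is not part of this excerpt. A sketch of what the description implies, assuming a hypothetical struct resource_sketch that carries only the flags field; the two flag values are copied for illustration from include/linux/ioport.h and should be checked against the tree:

```c
#include <stdbool.h>

/* Flag values copied for illustration from include/linux/ioport.h. */
#define IORESOURCE_MEM      0x00000200UL
#define IORESOURCE_DISABLED 0x10000000UL

/* Hypothetical stand-in for struct resource: only the field read here. */
struct resource_sketch {
	unsigned long flags;
};

/* Disabled under either convention: no flags at all (the old way of
 * marking a resource unused) or IORESOURCE_DISABLED set (the new way). */
static inline bool resource_disabled(const struct resource_sketch *res)
{
	return !res->flags || (res->flags & IORESOURCE_DISABLED);
}
```

Keeping the !flags test inside the helper lets callers be converted before every site that clears flags has switched to setting IORESOURCE_DISABLED.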

Cc: linux-al...@vger.kernel.org
Cc: linux-i...@vger.kernel.org
Cc: linux-am33-l...@redhat.com
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s...@vger.kernel.org
Cc: sparcli...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: linux-xte...@linux-xtensa.org
Cc: io...@lists.linux-foundation.org
Cc: linux...@vger.kernel.org
Signed-off-by: Yinghai Lu 
Acked-by: Michael Ellerman 
---
 arch/alpha/kernel/pci.c   |  2 +-
 arch/ia64/pci/pci.c   |  4 ++--
 arch/microblaze/pci/pci-common.c  | 15 ---
 arch/mn10300/unit-asb2305/pci-asb2305.c   |  4 ++--
 arch/mn10300/unit-asb2305/pci.c   |  4 ++--
 arch/powerpc/kernel/pci-common.c  | 16 +---
 arch/powerpc/platforms/powernv/pci-ioda.c | 12 ++--
 arch/s390/pci/pci.c   |  2 +-
 arch/sparc/kernel/pci.c   |  2 +-
 arch/x86/pci/i386.c   |  4 ++--
 arch/xtensa/kernel/pci.c  |  4 ++--
 drivers/iommu/intel-iommu.c   |  3 ++-
 drivers/pci/host/pcie-rcar.c  |  2 +-
 drivers/pci/iov.c |  2 +-
 drivers/pci/probe.c   |  2 +-
 drivers/pci/quirks.c  |  4 ++--
 drivers/pci/rom.c |  2 +-
 drivers/pci/setup-bus.c   |  8 
 drivers/pci/setup-res.c   |  2 +-
 include/linux/ioport.h|  4 
 20 files changed, 53 insertions(+), 45 deletions(-)

diff --git a/arch/alpha/kernel/pci.c b/arch/alpha/kernel/pci.c
index 5f387ee..c89c8ef 100644
--- a/arch/alpha/kernel/pci.c
+++ b/arch/alpha/kernel/pci.c
@@ -282,7 +282,7 @@ pcibios_claim_one_bus(struct pci_bus *b)
for (i = 0; i < PCI_NUM_RESOURCES; i++) {
struct resource *r = >resource[i];
 
-   if (r->parent || !r->start || !r->flags)
+   if (r->parent || !r->start || resource_disabled(r))
continue;
if (pci_has_flag(PCI_PROBE_ONLY) ||
(r->flags & IORESOURCE_PCI_FIXED)) {
diff --git a/arch/ia64/pci/pci.c b/arch/ia64/pci/pci.c
index 8f6ac2f..f00373f 100644
--- a/arch/ia64/pci/pci.c
+++ b/arch/ia64/pci/pci.c
@@ -333,7 +333,7 @@ void pcibios_fixup_device_resources(struct pci_dev *dev)
for (idx = 0; idx < PCI_BRIDGE_RESOURCES; idx++) {
struct resource *r = >resource[idx];
 
-   if (!r->flags || r->parent || !r->start)
+   if (resource_disabled(r) || r->parent || !r->start)
continue;
 
pci_claim_resource(dev, idx);
@@ -351,7 +351,7 @@ static void pcibios_fixup_bridge_resources(struct pci_dev *dev)
for (idx = PCI_BRIDGE_RESOURCES; idx < PCI_NUM_RESOURCES; idx++) {
struct resource *r = >resource[idx];
 
-   if (!r->flags || r->parent || !r->start)
+   if (resource_disabled(r) || r->parent || !r->start)
continue;
 
pci_claim_bridge_resource(dev, idx);
diff --git a/arch/microblaze/pci/pci-common.c b/arch/microblaze/pci/pci-common.c
index 35654be..4cc5ed0 100644
--- a/arch/microblaze/pci/pci-common.c
+++ b/arch/microblaze/pci/pci-common.c
@@ -694,7 +694,7 @@ static void pcibios_fixup_resources(struct pci_dev *dev)
}
for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) {
struct resource *res = dev->resource + i;
-   if (!res->flags)
+   if (resource_disabled(res))
continue;
if (res->start == 0) {
pr_debug("PCI:%s Resource %d %016llx-%016llx [%x]",
@@ -795,7 +795,7 @@ static void pcibios_fixup_bridge(struct pci_bus *bus)
pci_bus_for_each_resource(bus, res, i) {
if (!res)
continue;
-   if (!res->flags)
+   if (resource_disabled(res))
continue;
if (i >= 3 && bus->self->transparent)
continue;
@@ -964,7 +964,7 @@ static void pcibios_allocate_bus_resources(struct pci_bus *bus)
 pci_domain_nr(bus), bus->number);
 
pci_bus_for_each_resource(bus, res, i) {
-   if (!res || !res->flags
+   if (!res || resource_disabled(res)
|| res->start > res->end || res->parent)
continue;
if (bus->parent == NULL)
@@ -1066,7 +1066,8 @@ static void __init pcibios_allocate_resources(int pass)
r = >resource[idx];
if (r->parent)  /* Already allocated */

[PATCH v11 52/60] PCI: Unify skip_ioresource_align()

2016-04-07 Thread Yinghai Lu
There is a generic powerpc version and a local x86 version of
skip_isa_ioresource_align().

Move the powerpc version to setup-bus.c and remove the local x86 version.

Also remove the dummy version in microblaze.
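For reference, the unified helper only decides whether the ISA alignment step in pcibios_align_resource() may be skipped; the step itself keeps I/O resources in the 0x000-0x0ff window of each 1KB block, because ISA devices alias 0x100-0x3ff in every block. A standalone sketch of both parts, assuming a hypothetical global can_skip_isa_align in place of pci_has_flag(PCI_CAN_SKIP_ISA_ALIGN) and a hypothetical struct bus_sketch in place of struct pci_bus:

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_BRIDGE_CTL_ISA 0x04	/* value from include/uapi/linux/pci_regs.h */

/* Hypothetical stand-ins for the PCI_CAN_SKIP_ISA_ALIGN flag and the
 * bridge_ctl field of struct pci_bus. */
static bool can_skip_isa_align;

struct bus_sketch {
	uint16_t bridge_ctl;
};

/* Skip ISA alignment only if allowed globally and the bridge does not
 * forward ISA I/O (PCI_BRIDGE_CTL_ISA clear). */
static int skip_isa_ioresource_align(const struct bus_sketch *bus)
{
	return can_skip_isa_align && !(bus->bridge_ctl & PCI_BRIDGE_CTL_ISA);
}

/* I/O branch of pcibios_align_resource(): if the start falls in
 * 0x100-0x3ff modulo 0x400, round it up to the next 1KB boundary. */
static uint64_t isa_align_io_start(const struct bus_sketch *bus, uint64_t start)
{
	if (skip_isa_ioresource_align(bus))
		return start;
	if (start & 0x300)
		start = (start + 0x3ff) & ~(uint64_t)0x3ff;
	return start;
}
```

For example, with alignment not skipped, a start of 0x1100 is rounded up to 0x1400, while 0x1080 already lies in the 0x000-0x0ff window of its block and is left alone.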

Cc: Michal Simek 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Arnd Bergmann 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-a...@vger.kernel.org
Signed-off-by: Yinghai Lu 
Reviewed-by: Thomas Gleixner 
Acked-by: Michael Ellerman 
---
 arch/powerpc/kernel/pci-common.c | 11 +--
 arch/x86/include/asm/pci_x86.h   |  1 -
 arch/x86/pci/common.c|  4 ++--
 arch/x86/pci/i386.c  | 11 +--
 drivers/pci/setup-bus.c  |  9 +
 include/linux/pci.h  |  2 ++
 6 files changed, 15 insertions(+), 23 deletions(-)

diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 0f7a60f..2a7f4fd 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -1053,15 +1053,6 @@ void pci_fixup_cardbus(struct pci_bus *bus)
pcibios_setup_bus_devices(bus);
 }
 
-
-static int skip_isa_ioresource_align(struct pci_dev *dev)
-{
-   if (pci_has_flag(PCI_CAN_SKIP_ISA_ALIGN) &&
-   !(dev->bus->bridge_ctl & PCI_BRIDGE_CTL_ISA))
-   return 1;
-   return 0;
-}
-
 /*
  * We need to avoid collisions with `mirrored' VGA ports
  * and other strange ISA hardware, so we always want the
@@ -1082,7 +1073,7 @@ resource_size_t pcibios_align_resource(void *data, const struct resource *res,
resource_size_t start = res->start;
 
if (res->flags & IORESOURCE_IO) {
-   if (skip_isa_ioresource_align(dev))
+   if (skip_isa_ioresource_align(dev->bus))
return start;
if (start & 0x300)
start = (start + 0x3ff) & ~0x3ff;
diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h
index d08eacd2..d1f919e 100644
--- a/arch/x86/include/asm/pci_x86.h
+++ b/arch/x86/include/asm/pci_x86.h
@@ -28,7 +28,6 @@ do {  \
 #define PCI_ASSIGN_ROMS                0x1000
 #define PCI_BIOS_IRQ_SCAN              0x2000
 #define PCI_ASSIGN_ALL_BUSSES          0x4000
-#define PCI_CAN_SKIP_ISA_ALIGN         0x8000
 #define PCI_USE__CRS                   0x10000
 #define PCI_CHECK_ENABLE_AMD_MMCONF    0x20000
 #define PCI_HAS_IO_ECS                 0x40000
diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
index 381a43c..09a16b7 100644
--- a/arch/x86/pci/common.c
+++ b/arch/x86/pci/common.c
@@ -82,7 +82,7 @@ DEFINE_RAW_SPINLOCK(pci_config_lock);
 
 static int __init can_skip_ioresource_align(const struct dmi_system_id *d)
 {
-   pci_probe |= PCI_CAN_SKIP_ISA_ALIGN;
+   pci_add_flags(PCI_CAN_SKIP_ISA_ALIGN);
printk(KERN_INFO "PCI: %s detected, can skip ISA alignment\n", d->ident);
return 0;
 }
@@ -618,7 +618,7 @@ char *__init pcibios_setup(char *str)
pci_routeirq = 1;
return NULL;
} else if (!strcmp(str, "skip_isa_align")) {
-   pci_probe |= PCI_CAN_SKIP_ISA_ALIGN;
+   pci_add_flags(PCI_CAN_SKIP_ISA_ALIGN);
return NULL;
} else if (!strcmp(str, "noioapicquirk")) {
noioapicquirk = 1;
diff --git a/arch/x86/pci/i386.c b/arch/x86/pci/i386.c
index 0a9f2ca..cf296f5 100644
--- a/arch/x86/pci/i386.c
+++ b/arch/x86/pci/i386.c
@@ -128,15 +128,6 @@ static void __init pcibios_fw_addr_list_del(void)
pcibios_fw_addr_done = true;
 }
 
-static int
-skip_isa_ioresource_align(struct pci_dev *dev) {
-
-   if ((pci_probe & PCI_CAN_SKIP_ISA_ALIGN) &&
-   !(dev->bus->bridge_ctl & PCI_BRIDGE_CTL_ISA))
-   return 1;
-   return 0;
-}
-
 /*
  * We need to avoid collisions with `mirrored' VGA ports
  * and other strange ISA hardware, so we always want the
@@ -158,7 +149,7 @@ pcibios_align_resource(void *data, const struct resource *res,
resource_size_t start = res->start;
 
if (res->flags & IORESOURCE_IO) {
-   if (skip_isa_ioresource_align(dev))
+   if (skip_isa_ioresource_align(dev->bus))
return start;
if (start & 0x300)
start = (start + 0x3ff) & ~0x3ff;
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 28dfd8e..5ba4bf5 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1150,6 +1150,15 @@ static resource_size_t window_alignment(struct pci_bus *bus,
return max(align, arch_align);
 }
 
+int skip_isa_ioresource_align(struct pci_bus *bus)
+{
+   if (pci_has_flag(PCI_CAN_SKIP_ISA_ALIGN) &&
+   !(bus->bridge_ctl & PCI_BRIDGE_CTL_ISA))
+   return 1;
+
+   return 0;
+}
+
 static resource_size_t size_aligned_for_isa(resource_size_t size)
 {
/*
diff --git a/include/linux/pci.h 

[PATCH v11 10/60] powerpc/PCI: Add IORESOURCE_MEM_64 for 64-bit resource in OF parsing

2016-04-07 Thread Yinghai Lu
When setting the PREF bit for device resources under a bridge's 64-bit
prefetchable window, we need to make sure PREF is set only for 64-bit
resources.

This patch sets IORESOURCE_MEM_64 for 64-bit resources while parsing OF
device resource flags.
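pci_parse_of_flags() decodes the phys.hi cell of an Open Firmware PCI address, laid out as npt000ss bbbbbbbb dddddfff rrrrrrrr: ss = 10 is 32-bit memory, ss = 11 is 64-bit memory, t (bit 29) marks a below-1MB region and p (bit 30) marks prefetchable space. A sketch of the memory branch with the patch applied, using hypothetical SK_RES_* stand-ins for the kernel's IORESOURCE_* bits and the PCI_BASE_ADDRESS_* values from include/uapi/linux/pci_regs.h:

```c
#include <stdint.h>

/* Values from include/uapi/linux/pci_regs.h. */
#define PCI_BASE_ADDRESS_SPACE_MEMORY 0x00
#define PCI_BASE_ADDRESS_MEM_TYPE_1M  0x02
#define PCI_BASE_ADDRESS_MEM_TYPE_64  0x04
#define PCI_BASE_ADDRESS_MEM_PREFETCH 0x08

/* Hypothetical stand-ins for IORESOURCE_MEM, IORESOURCE_PREFETCH and
 * IORESOURCE_MEM_64 (the real values live in include/linux/ioport.h). */
#define SK_RES_MEM      0x00000200u
#define SK_RES_PREFETCH 0x00002000u
#define SK_RES_MEM_64   0x00100000u

/* Memory-space decoding of phys.hi; non-memory spaces return 0 here. */
static unsigned int parse_of_mem_flags(uint32_t addr0)
{
	unsigned int flags = 0;

	if (addr0 & 0x02000000) {			/* ss = 1x: memory space */
		flags = SK_RES_MEM | PCI_BASE_ADDRESS_SPACE_MEMORY;
		flags |= (addr0 >> 28) & PCI_BASE_ADDRESS_MEM_TYPE_1M;	/* t bit */
		if (addr0 & 0x01000000)			/* ss = 11: 64-bit */
			flags |= SK_RES_MEM_64 | PCI_BASE_ADDRESS_MEM_TYPE_64;
		if (addr0 & 0x40000000)			/* p bit: prefetchable */
			flags |= SK_RES_PREFETCH | PCI_BASE_ADDRESS_MEM_PREFETCH;
	}
	return flags;
}
```

Testing bit 24 directly, rather than shifting it into the BAR type field, is what lets the 64-bit case also pick up the resource-level flag.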

Link: https://bugzilla.kernel.org/show_bug.cgi?id=96261
Link: https://bugzilla.kernel.org/show_bug.cgi?id=96241
Signed-off-by: Yinghai Lu 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Gavin Shan 
Cc: Yijing Wang 
Cc: Anton Blanchard 
Cc: linuxppc-dev@lists.ozlabs.org
---
 arch/powerpc/kernel/pci_of_scan.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/pci_of_scan.c b/arch/powerpc/kernel/pci_of_scan.c
index 719f225..476b8ac5 100644
--- a/arch/powerpc/kernel/pci_of_scan.c
+++ b/arch/powerpc/kernel/pci_of_scan.c
@@ -44,8 +44,10 @@ static unsigned int pci_parse_of_flags(u32 addr0, int bridge)
 
if (addr0 & 0x02000000) {
flags = IORESOURCE_MEM | PCI_BASE_ADDRESS_SPACE_MEMORY;
-   flags |= (addr0 >> 22) & PCI_BASE_ADDRESS_MEM_TYPE_64;
flags |= (addr0 >> 28) & PCI_BASE_ADDRESS_MEM_TYPE_1M;
+   if (addr0 & 0x01000000)
+   flags |= IORESOURCE_MEM_64
+   | PCI_BASE_ADDRESS_MEM_TYPE_64;
if (addr0 & 0x40000000)
flags |= IORESOURCE_PREFETCH
 | PCI_BASE_ADDRESS_MEM_PREFETCH;
-- 
1.8.4.5

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH V3 1/2] pseries/eeh: Handle RTAS delay requests in configure_bridge

2016-04-07 Thread Gavin Shan
On Thu, Apr 07, 2016 at 04:28:26PM +1000, Russell Currey wrote:
>In the "ibm,configure-pe" and "ibm,configure-bridge" RTAS calls, the
>spec states that values of 9900-9905 can be returned, indicating that
>software should delay for 10^x (where x is the last digit, i.e. 990x)
>milliseconds and attempt the call again. Currently, the kernel doesn't
>know about this, and respecting it fixes some PCI failures when the
>hypervisor is busy.
>
>The delay is capped at 0.2 seconds.
>
>Cc:  # 3.10+
>Signed-off-by: Russell Currey 

Acked-by: Gavin Shan 
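The retry rules can be exercised outside the kernel. A sketch, assuming the status values from arch/powerpc/include/asm/rtas.h and reproducing the delay rules of rtas_busy_delay_time() (1 ms for RTAS_BUSY, 10^x ms for 990x, 0 otherwise); the status sequences are scripted by hypothetical callers and sleeping is only simulated by accumulating slept_ms. The sketch also stops immediately on a status that requests no delay, such as a hard error, because such a status never draws down the budget:

```c
/* Values as in arch/powerpc/include/asm/rtas.h. */
#define RTAS_BUSY               (-2)
#define RTAS_EXTENDED_DELAY_MIN 9900
#define RTAS_EXTENDED_DELAY_MAX 9905

/* Delay in ms requested by a status: 1 ms for RTAS_BUSY,
 * 10^x ms for 990x, 0 for anything else. */
static unsigned int delay_ms(int status)
{
	unsigned int ms = 0;

	if (status == RTAS_BUSY) {
		ms = 1;
	} else if (status >= RTAS_EXTENDED_DELAY_MIN &&
		   status <= RTAS_EXTENDED_DELAY_MAX) {
		ms = 1;
		for (int order = status - RTAS_EXTENDED_DELAY_MIN; order > 0; order--)
			ms *= 10;
	}
	return ms;
}

/* Treat extended delays above 9902 (100 ms) as 9902, as the patch does. */
static int cap_status(int status)
{
	if (status > RTAS_EXTENDED_DELAY_MIN + 2 &&
	    status <= RTAS_EXTENDED_DELAY_MAX)
		return RTAS_EXTENDED_DELAY_MIN + 2;
	return status;
}

/* Replay 'n' scripted statuses against a budget of 'budget_ms'; returns
 * the last status seen and stores the simulated total wait in *slept_ms. */
static int retry_with_budget(const int *statuses, int n, int budget_ms,
			     unsigned int *slept_ms)
{
	int ret = -1;

	*slept_ms = 0;
	for (int i = 0; i < n && budget_ms > 0; i++) {
		ret = cap_status(statuses[i]);
		if (ret == 0)
			return 0;

		unsigned int ms = delay_ms(ret);
		if (ms == 0)
			break;		/* not a busy/delay code: give up now */
		budget_ms -= (int)ms;
		if (budget_ms < 0)
			break;
		*slept_ms += ms;	/* stands in for rtas_busy_delay() */
	}
	return ret;
}
```

With firmware that keeps answering 9903, each request is capped to 9902 (100 ms) and the 200 ms budget is spent after two waits.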

>---
>V3 changelog:
> - Refactorings and rewordings thanks to Gavin
> - Treat return values >9902 as 9902 thanks to Tyrel
>---
> arch/powerpc/platforms/pseries/eeh_pseries.c | 51 
> 1 file changed, 36 insertions(+), 15 deletions(-)
>
>diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c
>index ac3ffd9..405baaf 100644
>--- a/arch/powerpc/platforms/pseries/eeh_pseries.c
>+++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
>@@ -615,29 +615,50 @@ static int pseries_eeh_configure_bridge(struct eeh_pe *pe)
> {
>   int config_addr;
>   int ret;
>+  /* Waiting 0.2s maximum before skipping configuration */
>+  int max_wait = 200;
> 
>   /* Figure out the PE address */
>   config_addr = pe->config_addr;
>   if (pe->addr)
>   config_addr = pe->addr;
> 
>-  /* Use new configure-pe function, if supported */
>-  if (ibm_configure_pe != RTAS_UNKNOWN_SERVICE) {
>-  ret = rtas_call(ibm_configure_pe, 3, 1, NULL,
>-  config_addr, BUID_HI(pe->phb->buid),
>-  BUID_LO(pe->phb->buid));
>-  } else if (ibm_configure_bridge != RTAS_UNKNOWN_SERVICE) {
>-  ret = rtas_call(ibm_configure_bridge, 3, 1, NULL,
>-  config_addr, BUID_HI(pe->phb->buid),
>-  BUID_LO(pe->phb->buid));
>-  } else {
>-  return -EFAULT;
>-  }
>+  while (max_wait > 0) {
>+  /* Use new configure-pe function, if supported */
>+  if (ibm_configure_pe != RTAS_UNKNOWN_SERVICE) {
>+  ret = rtas_call(ibm_configure_pe, 3, 1, NULL,
>+  config_addr, BUID_HI(pe->phb->buid),
>+  BUID_LO(pe->phb->buid));
>+  } else if (ibm_configure_bridge != RTAS_UNKNOWN_SERVICE) {
>+  ret = rtas_call(ibm_configure_bridge, 3, 1, NULL,
>+  config_addr, BUID_HI(pe->phb->buid),
>+  BUID_LO(pe->phb->buid));
>+  } else {
>+  return -EFAULT;
>+  }
> 
>-  if (ret)
>-  pr_warn("%s: Unable to configure bridge PHB#%d-PE#%x (%d)\n",
>-  __func__, pe->phb->global_number, pe->addr, ret);
>+  if (!ret)
>+  return ret;
>+
>+  /*
>+   * If RTAS returns a delay value that's above 100ms, cut it
>+   * down to 100ms in case firmware made a mistake.  For more
>+   * on how these delay values work see rtas_busy_delay_time
>+   */
>+  if (ret > RTAS_EXTENDED_DELAY_MIN+2 &&
>+  ret <= RTAS_EXTENDED_DELAY_MAX)
>+  ret = RTAS_EXTENDED_DELAY_MIN+2;
>+
>+  max_wait -= rtas_busy_delay_time(ret);
>+
>+  if (max_wait < 0)
>+  break;
>+
>+  rtas_busy_delay(ret);
>+  }
> 
>+  pr_warn("%s: Unable to configure bridge PHB#%d-PE#%x (%d)\n",
>+  __func__, pe->phb->global_number, pe->addr, ret);
>   return ret;
> }
> 
>-- 
>2.8.0
>

Re: [PATCH V3 2/2] pseries/eeh: Refactor the configure_bridge RTAS tokens

2016-04-07 Thread Gavin Shan
On Thu, Apr 07, 2016 at 04:28:27PM +1000, Russell Currey wrote:
>The RTAS calls "ibm,configure-pe" and "ibm,configure-bridge" perform the
>same actions; however, the former can skip configuration if unnecessary.
>The existing code treats them as different tokens even though only one
>will ever be called.  Refactor this by making a single token that is
>assigned during init.
>
>Signed-off-by: Russell Currey 

Acked-by: Gavin Shan 
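The init-time fallback amounts to "prefer one service name, otherwise try another". A sketch, assuming a hypothetical token_entry table in place of the /rtas device-tree node that rtas_token() actually reads, with made-up token numbers; RTAS_UNKNOWN_SERVICE is -1 as in asm/rtas.h:

```c
#include <string.h>

#define RTAS_UNKNOWN_SERVICE (-1)

/* Hypothetical name -> token table standing in for the /rtas node. */
struct token_entry {
	const char *name;
	int token;
};

static int lookup_token(const struct token_entry *tbl, int n, const char *name)
{
	for (int i = 0; i < n; i++)
		if (strcmp(tbl[i].name, name) == 0)
			return tbl[i].token;
	return RTAS_UNKNOWN_SERVICE;
}

/* One token for PE configuration: prefer ibm,configure-pe and fall back
 * to ibm,configure-bridge, so callers never need to check both. */
static int configure_pe_token(const struct token_entry *tbl, int n)
{
	int token = lookup_token(tbl, n, "ibm,configure-pe");

	if (token == RTAS_UNKNOWN_SERVICE)
		token = lookup_token(tbl, n, "ibm,configure-bridge");
	return token;
}
```

Resolving the choice once at init is what allows the single RTAS_UNKNOWN_SERVICE check in the sanity test and the single rtas_call() in the retry loop.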

>---
>V3: Reorder commits so the previous patch doesn't depend on this
>
>I had a look at doing the same with some other duplicated tokens but
>they had slight differences in semantics so it wasn't helping clarity.
>---
> arch/powerpc/platforms/pseries/eeh_pseries.c | 28 
> 1 file changed, 12 insertions(+), 16 deletions(-)
>
>diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c
>index 405baaf..3998e0f 100644
>--- a/arch/powerpc/platforms/pseries/eeh_pseries.c
>+++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
>@@ -53,7 +53,6 @@ static int ibm_read_slot_reset_state2;
> static int ibm_slot_error_detail;
> static int ibm_get_config_addr_info;
> static int ibm_get_config_addr_info2;
>-static int ibm_configure_bridge;
> static int ibm_configure_pe;
> 
> /*
>@@ -81,7 +80,14 @@ static int pseries_eeh_init(void)
>   ibm_get_config_addr_info2   = rtas_token("ibm,get-config-addr-info2");
>   ibm_get_config_addr_info    = rtas_token("ibm,get-config-addr-info");
>   ibm_configure_pe= rtas_token("ibm,configure-pe");
>-  ibm_configure_bridge= rtas_token("ibm,configure-bridge");
>+
>+  /*
>+   * ibm,configure-pe and ibm,configure-bridge have the same semantics,
>+   * however ibm,configure-pe can be faster.  If we can't find
>+   * ibm,configure-pe then fall back to using ibm,configure-bridge.
>+   */
>+  if (ibm_configure_pe == RTAS_UNKNOWN_SERVICE)
>+  ibm_configure_pe= rtas_token("ibm,configure-bridge");
> 
>   /*
>* Necessary sanity check. We needn't check "get-config-addr-info"
>@@ -93,8 +99,7 @@ static int pseries_eeh_init(void)
>   (ibm_read_slot_reset_state2 == RTAS_UNKNOWN_SERVICE &&
>ibm_read_slot_reset_state == RTAS_UNKNOWN_SERVICE) ||
>   ibm_slot_error_detail == RTAS_UNKNOWN_SERVICE   ||
>-  (ibm_configure_pe == RTAS_UNKNOWN_SERVICE   &&
>-   ibm_configure_bridge == RTAS_UNKNOWN_SERVICE)) {
>+  ibm_configure_pe == RTAS_UNKNOWN_SERVICE) {
>   pr_info("EEH functionality not supported\n");
>   return -EINVAL;
>   }
>@@ -624,18 +629,9 @@ static int pseries_eeh_configure_bridge(struct eeh_pe *pe)
>   config_addr = pe->addr;
> 
>   while (max_wait > 0) {
>-  /* Use new configure-pe function, if supported */
>-  if (ibm_configure_pe != RTAS_UNKNOWN_SERVICE) {
>-  ret = rtas_call(ibm_configure_pe, 3, 1, NULL,
>-  config_addr, BUID_HI(pe->phb->buid),
>-  BUID_LO(pe->phb->buid));
>-  } else if (ibm_configure_bridge != RTAS_UNKNOWN_SERVICE) {
>-  ret = rtas_call(ibm_configure_bridge, 3, 1, NULL,
>-  config_addr, BUID_HI(pe->phb->buid),
>-  BUID_LO(pe->phb->buid));
>-  } else {
>-  return -EFAULT;
>-  }
>+  ret = rtas_call(ibm_configure_pe, 3, 1, NULL,
>+  config_addr, BUID_HI(pe->phb->buid),
>+  BUID_LO(pe->phb->buid));
> 
>   if (!ret)
>   return ret;
>-- 
>2.8.0
>

[PATCH v4 1/3] ppc64/book3s: fix branching to out of line handlers in relocation kernel

2016-04-07 Thread Hari Bathini
Some of the interrupt vectors on 64-bit POWER server processors are
only 32 bytes long (8 instructions), which is not enough for the full
first-level interrupt handler. For these we need to branch to an
out-of-line (OOL) handler. But when we are running a relocatable kernel,
the interrupt vectors up to the __end_interrupts marker are copied down
to real address 0x100. So, branching to labels (i.e. OOL handlers)
outside this section must be handled differently in a relocatable
kernel (see LOAD_HANDLER()), which needs at least 4 instructions.

However, branching from an interrupt vector means that we corrupt the CFAR
(come-from address register) on POWER7 and later processors as mentioned
in commit 1707dd16. So, EXCEPTION_PROLOG_0 (6 instructions) that contains
the part up to the point where the CFAR is saved in the PACA should be
part of the short interrupt vectors before we branch out to OOL handlers.

But as mentioned already, there are interrupt vectors on 64-bit POWER server
processors that are only 32 bytes long (like vectors 0x4f00, 0x4f20, etc.),
which cannot accommodate both of the above at the same time owing to space
constraints. Currently, in these interrupt vectors, we simply branch out to
OOL handlers without using LOAD_HANDLER(), which leaves us vulnerable when
running a relocatable kernel (e.g. the kdump case). While this has been the
case for some time now and kdump is widely used, we were fortunate not to
see any problems so far, for three reasons:

1. In almost all cases, the production kernel (relocatable) is also used
   as the kdump kernel, which means that the crashed kernel's OOL handler
   is at the same place we end up branching to from the short interrupt
   vector of the kdump kernel.
2. Also, the OOL handler was unlikely to be the cause of the crash in
   almost all kdump scenarios, which meant we branched to a sane OOL
   handler in the crashed kernel.
3. On most 64-bit POWER server processors, the page size is large enough
   that marking the interrupt vector code as executable (see commit
   429d2e83) also marks as executable the crashed kernel's OOL handler
   code, which sits right below the kdump kernel's interrupt vector code.

Let us fix this unreliable code path by moving these OOL handlers below
the __end_interrupts marker, so that these handlers are also copied to
real address 0x100 when running a relocatable kernel. This is necessary
because the interrupt vectors branching to these OOL handlers are not long
enough to use LOAD_HANDLER() for branching, as discussed above.
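The hazard can be modeled with plain address arithmetic. A toy sketch, assuming offsets are measured from the start of the kernel image, that only [0, __end_interrupts) is copied down to real address 0, and a hypothetical kdump kernel base; a PC-relative branch from the copied vector reaches real address target_off because its displacement was fixed at link time, while the handler only lives there if it was part of the copied range:

```c
#include <stdbool.h>
#include <stdint.h>

/* Real address of the handler at image offset 'target_off' once a
 * relocatable kernel loaded at 'kernel_base' has copied its first
 * 'end_interrupts' bytes down to real address 0. */
static uint64_t handler_real_addr(uint64_t kernel_base, uint64_t end_interrupts,
				  uint64_t target_off)
{
	if (target_off < end_interrupts)
		return target_off;		/* copied down with the vectors */
	return kernel_base + target_off;	/* left in the relocated image */
}

/* A PC-relative branch from a copied vector lands at real address
 * 'target_off'; it is correct only if the handler actually lives there. */
static bool relative_branch_ok(uint64_t kernel_base, uint64_t end_interrupts,
			       uint64_t target_off)
{
	return handler_real_addr(kernel_base, end_interrupts, target_off) == target_off;
}
```

With a hypothetical base of 0x20000000 and __end_interrupts at 0x8000, a handler at offset 0x7a00 is reached correctly, while a branch to one at 0x9000 lands at real address 0x9000, where in the kdump case whatever the crashed kernel left there is executed instead.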

This fix has been tested successfully in a kdump scenario on an LPAR with a
4K page size, using different default/production and kdump kernels.

Signed-off-by: Hari Bathini 
Signed-off-by: Mahesh Salgaonkar 
---

Michael, I tested this patchset in different scenarios. If you feel the
change is too radical, we could go with version 2, but I thought this was
worth a shot.

changes from v3:
1. No changes in this patch except for a spellcheck
2. A new patch that tries to free up space below 0x7000 (2/3)
3. A new patch to remove __end_handlers marker (3/3)


 arch/powerpc/kernel/exceptions-64s.S |   29 +++--
 1 file changed, 19 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 7716ceb..f76b2f3 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -953,6 +953,25 @@ hv_facility_unavailable_relon_trampoline:
 #endif
STD_RELON_EXCEPTION_PSERIES(0x5700, 0x1700, altivec_assist)
 
+   /*
+* Out-Of-Line handlers for relocation-on interrupt vectors
+*
+* We need these OOL handlers to be below __end_interrupts
+* marker to ensure we also copy these OOL handlers along
+* with the interrupt vectors to real address 0x100 when
+* running a relocatable kernel. Because the interrupt
+* vectors branching to these OOL handlers are not long
+* enough to use LOAD_HANDLER() for branching.
+*/
+   STD_RELON_EXCEPTION_HV_OOL(0xe40, emulation_assist)
+   MASKABLE_RELON_EXCEPTION_HV_OOL(0xe80, h_doorbell)
+
+   STD_RELON_EXCEPTION_PSERIES_OOL(0xf00, performance_monitor)
+   STD_RELON_EXCEPTION_PSERIES_OOL(0xf20, altivec_unavailable)
+   STD_RELON_EXCEPTION_PSERIES_OOL(0xf40, vsx_unavailable)
+   STD_RELON_EXCEPTION_PSERIES_OOL(0xf60, facility_unavailable)
+   STD_RELON_EXCEPTION_HV_OOL(0xf80, hv_facility_unavailable)
+
/* Other future vectors */
.align  7
.globl  __end_interrupts
@@ -1234,16 +1253,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX)
.globl  __end_handlers
 __end_handlers:
 
-   /* Equivalents to the above handlers for relocation-on interrupt vectors */
-   STD_RELON_EXCEPTION_HV_OOL(0xe40, emulation_assist)
-   

[PATCH v4 2/3] ppc64/book3s: make some room for common interrupt vector code

2016-04-07 Thread Hari Bathini
With the previous patch, we use up whatever little space is left
below 0x7000 (the FWNMI hard block) while there is a hole of ~1400 bytes
below the __end_interrupts marker when CONFIG_CBE_RAS is disabled.
Since CONFIG_CBE_RAS is not enabled by default for BOOK3S, this is
not a desirable scenario, especially when we have to worry about
each additional instruction that goes below 0x7000.

The memory region from 0x1800 to 0x4000 is dedicated to common interrupt
vector code. Also, we never take an interrupt below 0x300 when IR=DR=1,
which implies that the region from 0x4000 to 0x4300 can also be used for
common interrupt vector code. So, we can effectively use the region
from 0x1800 to 0x4300 for common interrupt vector code.

This patch tries to free up some space below 0x7000 by rearranging the
common interrupt vector code. The approach here is to avoid large holes
below 0x4300 for any kernel configuration. To that end, let us move the
common interrupt vector code that is only enabled with CONFIG_CBE_RAS above
0x8000, as it doesn't need to be close to the call sites and can be
branched to with LOAD_HANDLER() as long as it is within the first 64KB
(0x10000) of the kernel image. In its place, let's move the common interrupt
vector code marked h_instr_storage_common, facility_unavailable_common and
hv_facility_unavailable_common below 0x4300. This leaves ~250 bytes
free below 0x4300 and ~1150 bytes free below 0x7000, enough space to
stop worrying about every additional instruction that goes below 0x7000.
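The 64KB limit comes from the way LOAD_HANDLER() forms the address. Assuming that it loads the kernel base from the PACA and ORs in (label - _stext) with a 16-bit ori immediate (check arch/powerpc/include/asm/exception-64s.h), the result is right only for a 64KB-aligned base and an offset below 0x10000. A sketch of that condition, with hypothetical helper names:

```c
#include <stdbool.h>
#include <stdint.h>

/* Address formed by "base | imm16" (hypothetical model of the sequence,
 * not the macro itself). */
static uint64_t or_in_offset(uint64_t kernel_base, uint16_t imm)
{
	return kernel_base | imm;
}

/* OR-ing in the offset equals adding it only if the offset fits in the
 * 16-bit immediate and the base has its low 16 bits clear. */
static bool reachable_via_or(uint64_t kernel_base, uint64_t label_off)
{
	return label_off <= 0xffff && (kernel_base & 0xffff) == 0 &&
	       or_in_offset(kernel_base, (uint16_t)label_off) ==
		       kernel_base + label_off;
}
```

So the CONFIG_CBE_RAS handlers can move anywhere above 0x8000 as long as they stay below 0x10000 from the start of the image.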

This patch assumes that at least commit 376af594 (part of the patch series
that starts with commit 468a3302) is already in the tree, to avoid messy
compilation issues like:

relocation truncated to fit: R_PPC64_REL14 against `.text'+1c90
Makefile:864: recipe for target 'vmlinux' failed
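R_PPC64_REL14 is the relocation for the 14-bit displacement field of a conditional branch: the field counts words, so the reachable byte offsets are -0x8000 to 0x7ffc, and a conditional branch whose target ends up farther away than that fails to link with the message above. A sketch of that range check, with a hypothetical fits_rel14() helper:

```c
#include <stdbool.h>
#include <stdint.h>

/* A 14-bit word displacement shifted left by 2 gives a signed 16-bit
 * byte offset in [-0x8000, 0x7ffc] that must be word aligned. */
static bool fits_rel14(uint64_t from, uint64_t to)
{
	int64_t disp = (int64_t)(to - from);

	return disp >= -0x8000 && disp <= 0x7ffc && (disp & 3) == 0;
}
```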

I tested this patch successfully on ppc64 and ppc64le LPARs and bare-metal
environments. I couldn't test it on an IBM Cell blade, but I expect no
problems in that environment either. It would be great if someone could
test this patch on a Cell platform.

Signed-off-by: Hari Bathini 
---
 arch/powerpc/kernel/exceptions-64s.S |   20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index f76b2f3..c193ebd 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -786,6 +786,7 @@ kvmppc_skip_Hinterrupt:
STD_EXCEPTION_COMMON(0xb00, trap_0b, unknown_exception)
STD_EXCEPTION_COMMON(0xd00, single_step, single_step_exception)
STD_EXCEPTION_COMMON(0xe00, trap_0e, unknown_exception)
+   STD_EXCEPTION_COMMON(0xe20, h_instr_storage, unknown_exception)
STD_EXCEPTION_COMMON(0xe40, emulation_assist, emulation_assist_interrupt)
STD_EXCEPTION_COMMON_ASYNC(0xe60, hmi_exception, handle_hmi_exception)
 #ifdef CONFIG_PPC_DOORBELL
@@ -794,6 +795,9 @@ kvmppc_skip_Hinterrupt:
STD_EXCEPTION_COMMON_ASYNC(0xe80, h_doorbell, unknown_exception)
 #endif
STD_EXCEPTION_COMMON_ASYNC(0xf00, performance_monitor, performance_monitor_exception)
+   STD_EXCEPTION_COMMON(0xf60, facility_unavailable, facility_unavailable_exception)
+   STD_EXCEPTION_COMMON(0xf80, hv_facility_unavailable, facility_unavailable_exception)
+
STD_EXCEPTION_COMMON(0x1300, instruction_breakpoint, instruction_breakpoint_exception)
STD_EXCEPTION_COMMON(0x1502, denorm, unknown_exception)
 #ifdef CONFIG_ALTIVEC
@@ -801,11 +805,6 @@ kvmppc_skip_Hinterrupt:
 #else
STD_EXCEPTION_COMMON(0x1700, altivec_assist, unknown_exception)
 #endif
-#ifdef CONFIG_CBE_RAS
-   STD_EXCEPTION_COMMON(0x1200, cbe_system_error, cbe_system_error_exception)
-   STD_EXCEPTION_COMMON(0x1600, cbe_maintenance, cbe_maintenance_exception)
-   STD_EXCEPTION_COMMON(0x1800, cbe_thermal, cbe_thermal_exception)
-#endif /* CONFIG_CBE_RAS */
 
/*
 * Relocation-on interrupts: A subset of the interrupts can be delivered
@@ -1029,8 +1028,6 @@ instruction_access_common:
li  r5,0x400
b   do_hash_page/* Try to handle as hpte fault */
 
-   STD_EXCEPTION_COMMON(0xe20, h_instr_storage, unknown_exception)
-
 /*
  * Here is the common SLB miss user that is used when going to virtual
  * mode for SLB misses, that is currently not used
@@ -1246,9 +1243,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX)
bl  vsx_unavailable_exception
b   ret_from_except
 
-   STD_EXCEPTION_COMMON(0xf60, facility_unavailable, facility_unavailable_exception)
-   STD_EXCEPTION_COMMON(0xf80, hv_facility_unavailable, facility_unavailable_exception)
-
.align  7
.globl  __end_handlers
 __end_handlers:
@@ -1268,6 +1262,12 @@ fwnmi_data_area:
. = 0x8000
 #endif /* defined(CONFIG_PPC_PSERIES) || defined(CONFIG_PPC_POWERNV) */
 
+#ifdef CONFIG_CBE_RAS
+   

[PATCH v4 3/3] ppc64/book3s: remove __end_handlers marker

2016-04-07 Thread Hari Bathini
The __end_handlers marker was intended to mark the end of the code that gets
called from exception prologs. But it hasn't kept pace with code
changes. Case in point: slb_miss_realmode is called from exception
prolog code but isn't below the __end_handlers marker. So, the __end_handlers
marker is no better than a comment, and it can be misleading when
it isn't in sync with the code, as is the case now. So, let us avoid
this confusion by adding a better comment and removing the __end_handlers
marker altogether.

Signed-off-by: Hari Bathini 
---
 arch/powerpc/kernel/exceptions-64s.S |   13 -
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index c193ebd..80f9fc4 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -764,11 +764,10 @@ kvmppc_skip_Hinterrupt:
 #endif
 
 /*
- * Code from here down to __end_handlers is invoked from the
- * exception prologs above.  Because the prologs assemble the
- * addresses of these handlers using the LOAD_HANDLER macro,
- * which uses an ori instruction, these handlers must be in
- * the first 64k of the kernel image.
+ * Ensure that any handlers that get invoked from the exception prologs
+ * above are below the first 64KB (0x10000) of the kernel image because
+ * the prologs assemble the addresses of these handlers using the
+ * LOAD_HANDLER macro, which uses an ori instruction.
  */
 
 /*** Common interrupt handlers ***/
@@ -1243,10 +1242,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX)
bl  vsx_unavailable_exception
b   ret_from_except
 
-   .align  7
-   .globl  __end_handlers
-__end_handlers:
-
 #if defined(CONFIG_PPC_PSERIES) || defined(CONFIG_PPC_POWERNV)
 /*
  * Data area reserved for FWNMI option.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH V10 00/28] Add new powerpc specific ELF core notes

2016-04-07 Thread Michael Ellerman


On 7 April 2016 7:23:46 pm AEST, Laurent Dufour wrote:
>On 16/02/2016 09:59, Anshuman Khandual wrote:
>>  This patch series adds twelve new ELF core note sections which can
>> be used with existing ptrace request PTRACE_GETREGSET-SETREGSET for accessing
>> various transactional memory and other miscellaneous debug register sets on
>> powerpc platform.
>
>Hi Michael,
>
>This series is required to handle TM state in CRIU.
>Is there a chance to get it upstream soon ?

We were waiting on the gdb support to make sure it had some testing. If it's
working for CRIU, that would be a good data point. Have you actually tested it
with CRIU?

cheers

Re: [RFC v5 7/7] vfio-pci: Allow to mmap MSI-X table if interrupt remapping is supported

2016-04-07 Thread Eric Auger
Hi Yongji,
On 04/07/2016 01:38 PM, Yongji Xie wrote:
> On 2016/4/6 22:45, Alex Williamson wrote:
>> On Tue,  5 Apr 2016 21:46:44 +0800
>> Yongji Xie  wrote:
>>
>>> This patch enables mmapping MSI-X tables if
>>> hardware supports interrupt remapping which
>>> can ensure that a given pci device can only
>>> shoot the MSIs assigned for it.
>>>
>>> Signed-off-by: Yongji Xie 
>>> ---
>>>   drivers/vfio/pci/vfio_pci.c |9 +++--
>>>   drivers/vfio/pci/vfio_pci_private.h |1 +
>>>   drivers/vfio/pci/vfio_pci_rdwr.c|2 +-
>>>   3 files changed, 9 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
>>> index c60d790..ef02896 100644
>>> --- a/drivers/vfio/pci/vfio_pci.c
>>> +++ b/drivers/vfio/pci/vfio_pci.c
>>> @@ -201,6 +201,10 @@ static int vfio_pci_enable(struct vfio_pci_device *vdev)
>>>   } else
>>>   vdev->msix_bar = 0xFF;
>>>
>>> +	if (iommu_capable(pdev->dev.bus, IOMMU_CAP_INTR_REMAP) ||
>> This doesn't address the issue I raised earlier where ARM SMMU sets
>> this capability, but doesn't really provide per vector isolation.  ARM
>> either needs to be fixed or we need to consider the whole capability
>> tainted for this application and standardize around the bus flags.
>> It's not very desirable to have two different ways to test this anyway.
> 
> I saw Eric posted a patchset [1] which introduces a flag
> MSI_FLAG_IRQ_REMAPPING to indicate the capability
> for ARM SMMU. With this patchset applied, it would
> be workable to use bus_flags to test the capability
> of ARM SMMU:

My purpose was to remove the advertising of IOMMU_CAP_INTR_REMAP from
arm-smmu.c (the "fix" mentioned by Alex; by the way, I also need to do the
same in the v3 code) and to advertise the functionality on the MSI controller
instead, since the IRQ remapping functionality is abstracted in the GICv3
ITS MSI controller.

On top of that, on ARM we have platform (non-PCI) MSI controllers, so my
understanding is that the capability advertising should be possible beyond
the PCI bus?

Best Regards

Eric
> 
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index a080f44..b2d1756 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -1134,6 +1134,21 @@ void *msi_desc_to_pci_sysdata(struct msi_desc *desc)
>  }
>  EXPORT_SYMBOL_GPL(msi_desc_to_pci_sysdata);
> 
> +void pci_check_msi_remapping(struct pci_bus *bus)
> +{
> +#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
> +struct irq_domain *domain;
> +struct msi_domain_info *info;
> +
> +domain = dev_get_msi_domain(&bus->dev);
> +if (domain) {
> +info = msi_get_domain_info(domain);
> +if (info->flags & MSI_FLAG_IRQ_REMAPPING)
> +bus->bus_flags |= PCI_BUS_FLAGS_MSI_REMAP;
> +}
> +#endif
> +}
> +
>  #ifdef CONFIG_PCI_MSI_IRQ_DOMAIN
>  /**
>   * pci_msi_domain_write_msg - Helper to write MSI message to PCI config space
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 6d7ab9b..24e9606 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -2115,6 +2115,7 @@ struct pci_bus *pci_create_root_bus(struct device *parent, int bus,
> device_enable_async_suspend(b->bridge);
> pci_set_bus_of_node(b);
> pci_set_bus_msi_domain(b);
> +   pci_check_msi_remapping(b);
> 
> if (!parent)
> set_dev_node(b->bridge, pcibus_to_node(b));
> diff --git a/include/linux/msi.h b/include/linux/msi.h
> index a2a0068..fe8ce7b 100644
> --- a/include/linux/msi.h
> +++ b/include/linux/msi.h
> @@ -15,6 +15,7 @@ extern int pci_msi_ignore_mask;
>  struct irq_data;
>  struct msi_desc;
>  struct pci_dev;
> +struct pci_bus;
>  struct platform_msi_priv_data;
>  void __get_cached_msi_msg(struct msi_desc *entry, struct msi_msg *msg);
>  void get_cached_msi_msg(unsigned int irq, struct msi_msg *msg);
> @@ -155,6 +156,8 @@ void arch_restore_msi_irqs(struct pci_dev *dev);
>  void default_teardown_msi_irqs(struct pci_dev *dev);
>  void default_restore_msi_irqs(struct pci_dev *dev);
> 
> +void pci_check_msi_remapping(struct pci_bus *bus);
> +
>  struct msi_controller {
> struct module *owner;
> struct device *dev;
> 
> Next we just need to find a proper way to make
> bus_flags compatible with IOMMU_CAP_INTR_REMAP, right?
> 
> I think a good place to do that is add_iommu_group().
> But I'm not sure whether iommu drivers must be
> initialized after PCI enumeration.  Do you have any comment?
> 
> [1] http://www.spinics.net/lists/kvm/msg130256.html
> 
>>> +pdev->bus->bus_flags | PCI_BUS_FLAGS_MSI_REMAP)
>> Perhaps some sort of wrapper for testing these flags would help avoid
>> this kind of coding error (| vs &)
> 
> Thank you.  I'll try not to make the same mistake again.
> 
> Regards,
> Yongji
> 


Re: [PATCH v2] powerpc: Fix incorrect PPC32 PAMU dependency

2016-04-07 Thread Joerg Roedel
On Wed, Mar 16, 2016 at 11:15:44PM -0500, Andy Fleming wrote:
> The Freescale PAMU can be enabled on both 32 and 64-bit Power
> chips. Commit 477ab7a19cec8409e4e2dd10e7348e4cac3c06e5
> (iommu: Make more drivers depend on COMPILE_TEST)
> restricted PAMU to PPC32. PPC covers both.
> 
> Signed-off-by: Andy Fleming 
> ---
> 
> v2: Implemented Michael Ellerman's suggestion to clean up the
> dependency chain

Applied, thanks.


Re: [RFC v5 7/7] vfio-pci: Allow to mmap MSI-X table if interrupt remapping is supported

2016-04-07 Thread Yongji Xie

On 2016/4/6 22:45, Alex Williamson wrote:

> On Tue,  5 Apr 2016 21:46:44 +0800
> Yongji Xie  wrote:
>
>> This patch enables mmapping MSI-X tables if
>> hardware supports interrupt remapping which
>> can ensure that a given pci device can only
>> shoot the MSIs assigned for it.
>>
>> Signed-off-by: Yongji Xie 
>> ---
>>   drivers/vfio/pci/vfio_pci.c |9 +++--
>>   drivers/vfio/pci/vfio_pci_private.h |1 +
>>   drivers/vfio/pci/vfio_pci_rdwr.c|2 +-
>>   3 files changed, 9 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
>> index c60d790..ef02896 100644
>> --- a/drivers/vfio/pci/vfio_pci.c
>> +++ b/drivers/vfio/pci/vfio_pci.c
>> @@ -201,6 +201,10 @@ static int vfio_pci_enable(struct vfio_pci_device *vdev)
>> } else
>> vdev->msix_bar = 0xFF;
>>
>> +	if (iommu_capable(pdev->dev.bus, IOMMU_CAP_INTR_REMAP) ||
>
> This doesn't address the issue I raised earlier where ARM SMMU sets
> this capability, but doesn't really provide per vector isolation.  ARM
> either needs to be fixed or we need to consider the whole capability
> tainted for this application and standardize around the bus flags.
> It's not very desirable to have two different ways to test this anyway.


I saw Eric posted a patchset [1] which introduces a flag
MSI_FLAG_IRQ_REMAPPING to indicate the capability
for ARM SMMU. With this patchset applied, it would
be workable to use bus_flags to test the capability
of ARM SMMU:

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index a080f44..b2d1756 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -1134,6 +1134,21 @@ void *msi_desc_to_pci_sysdata(struct msi_desc *desc)
 }
 EXPORT_SYMBOL_GPL(msi_desc_to_pci_sysdata);

+void pci_check_msi_remapping(struct pci_bus *bus)
+{
+#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
+struct irq_domain *domain;
+struct msi_domain_info *info;
+
+domain = dev_get_msi_domain(&bus->dev);
+if (domain) {
+info = msi_get_domain_info(domain);
+if (info->flags & MSI_FLAG_IRQ_REMAPPING)
+bus->bus_flags |= PCI_BUS_FLAGS_MSI_REMAP;
+}
+#endif
+}
+
 #ifdef CONFIG_PCI_MSI_IRQ_DOMAIN
 /**
  * pci_msi_domain_write_msg - Helper to write MSI message to PCI config space

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 6d7ab9b..24e9606 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2115,6 +2115,7 @@ struct pci_bus *pci_create_root_bus(struct device *parent, int bus,

device_enable_async_suspend(b->bridge);
pci_set_bus_of_node(b);
pci_set_bus_msi_domain(b);
+   pci_check_msi_remapping(b);

if (!parent)
set_dev_node(b->bridge, pcibus_to_node(b));
diff --git a/include/linux/msi.h b/include/linux/msi.h
index a2a0068..fe8ce7b 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -15,6 +15,7 @@ extern int pci_msi_ignore_mask;
 struct irq_data;
 struct msi_desc;
 struct pci_dev;
+struct pci_bus;
 struct platform_msi_priv_data;
 void __get_cached_msi_msg(struct msi_desc *entry, struct msi_msg *msg);
 void get_cached_msi_msg(unsigned int irq, struct msi_msg *msg);
@@ -155,6 +156,8 @@ void arch_restore_msi_irqs(struct pci_dev *dev);
 void default_teardown_msi_irqs(struct pci_dev *dev);
 void default_restore_msi_irqs(struct pci_dev *dev);

+void pci_check_msi_remapping(struct pci_bus *bus);
+
 struct msi_controller {
struct module *owner;
struct device *dev;

Next we just need to find a proper way to make
bus_flags compatible with IOMMU_CAP_INTR_REMAP, right?

I think a good place to do that is add_iommu_group().
But I'm not sure whether iommu drivers must be
initialized after PCI enumeration.  Do you have any comment?

[1] http://www.spinics.net/lists/kvm/msg130256.html


>> +   pdev->bus->bus_flags | PCI_BUS_FLAGS_MSI_REMAP)
>
> Perhaps some sort of wrapper for testing these flags would help avoid
> this kind of coding error (| vs &)


Thank you.  I'll try not to make the same mistake again.
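
A minimal sketch of the kind of wrapper Alex suggests, written as plain C so it compiles outside the kernel. struct sketch_pci_bus, SKETCH_BUS_FLAGS_MSI_REMAP (with a made-up value) and sketch_bus_has_flags are hypothetical stand-ins, not the PCI core's API. It also shows why the `|` form is wrong: OR-ing in a non-zero constant is always true.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for struct pci_bus and the MSI remap bus flag. */
struct sketch_pci_bus {
	unsigned int bus_flags;
};

#define SKETCH_BUS_FLAGS_MSI_REMAP 0x4u /* illustrative value only */

/* True only if every bit in @flags is set in @bus->bus_flags. */
static inline bool sketch_bus_has_flags(const struct sketch_pci_bus *bus,
					unsigned int flags)
{
	return (bus->bus_flags & flags) == flags;
}

/* The buggy form from the patch: always true, whatever bus_flags holds. */
static inline bool sketch_buggy_test(const struct sketch_pci_bus *bus)
{
	return (bus->bus_flags | SKETCH_BUS_FLAGS_MSI_REMAP) != 0;
}
```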

Regards,
Yongji


Re: [PATCH] kvm-pr: manage single-step mode

2016-04-07 Thread Laurent Vivier
Ping?

On 22/03/2016 15:53, Laurent Vivier wrote:
> Until now, when we connect gdb to the QEMU gdb-server, the
> single-step mode is not managed.
> 
> This patch adds this, only for kvm-pr:
> 
> If KVM_GUESTDBG_SINGLESTEP is set, we enable the single-step trace bit in the
> MSR (MSR_SE) just before __kvmppc_vcpu_run(), and disable it just after.
> In kvmppc_handle_exit_pr, instead of routing the interrupt to
> the guest, we return to the host with the KVM_EXIT_DEBUG reason.
> 
> Signed-off-by: Laurent Vivier 
> ---
>  arch/powerpc/kvm/book3s_pr.c | 31 +--
>  1 file changed, 29 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
> index 95bceca..e6896f4 100644
> --- a/arch/powerpc/kvm/book3s_pr.c
> +++ b/arch/powerpc/kvm/book3s_pr.c
> @@ -882,6 +882,24 @@ void kvmppc_set_fscr(struct kvm_vcpu *vcpu, u64 fscr)
>  }
>  #endif
>  
> +static void kvmppc_setup_debug(struct kvm_vcpu *vcpu)
> +{
> + if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP) {
> + u64 msr = kvmppc_get_msr(vcpu);
> +
> + kvmppc_set_msr(vcpu, msr | MSR_SE);
> + }
> +}
> +
> +static void kvmppc_clear_debug(struct kvm_vcpu *vcpu)
> +{
> + if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP) {
> + u64 msr = kvmppc_get_msr(vcpu);
> +
> + kvmppc_set_msr(vcpu, msr & ~MSR_SE);
> + }
> +}
> +
>  int kvmppc_handle_exit_pr(struct kvm_run *run, struct kvm_vcpu *vcpu,
> unsigned int exit_nr)
>  {
> @@ -1208,8 +1226,13 @@ program_interrupt:
>  #endif
>   case BOOK3S_INTERRUPT_MACHINE_CHECK:
>   case BOOK3S_INTERRUPT_TRACE:
> - kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
> - r = RESUME_GUEST;
> + if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP) {
> + run->exit_reason = KVM_EXIT_DEBUG;
> + r = RESUME_HOST;
> + } else {
> + kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
> + r = RESUME_GUEST;
> + }
>   break;
>   default:
>   {
> @@ -1479,6 +1502,8 @@ static int kvmppc_vcpu_run_pr(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
>   goto out;
>   }
>  
> + kvmppc_setup_debug(vcpu);
> +
>   /*
>* Interrupts could be timers for the guest which we have to inject
>* again, so let's postpone them until we're in the guest and if we
> @@ -1501,6 +1526,8 @@ static int kvmppc_vcpu_run_pr(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
>  
>   ret = __kvmppc_vcpu_run(kvm_run, vcpu);
>  
> + kvmppc_clear_debug(vcpu);
> +
>   /* No need for kvm_guest_exit. It's done in handle_exit.
>  We also get here with interrupts enabled. */
>  
> 
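
The patch's control flow can be modelled in a few lines of user-space C: set the single-step bit before entering the guest, clear it afterwards, and turn a trace interrupt into an exit to the host only while single-stepping is requested. Everything here is a hypothetical stand-in: SKETCH_MSR_SE uses bit 10 (0x400), which is my understanding of the kernel's MSR_SE but should be treated as an assumption, and SKETCH_DBG_SINGLESTEP is an arbitrary value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants; the kernel's real definitions live elsewhere. */
#define SKETCH_MSR_SE         (1ull << 10) /* assumed MSR single-step bit */
#define SKETCH_DBG_SINGLESTEP 0x2u         /* arbitrary debug-flag value */

struct sketch_vcpu {
	uint64_t msr;
	unsigned int guest_debug;
};

enum sketch_resume { SKETCH_RESUME_GUEST, SKETCH_RESUME_HOST };

/* Before entering the guest: request a trace interrupt after each instruction. */
static void sketch_setup_debug(struct sketch_vcpu *v)
{
	if (v->guest_debug & SKETCH_DBG_SINGLESTEP)
		v->msr |= SKETCH_MSR_SE;
}

/* After leaving the guest: drop the single-step bit again. */
static void sketch_clear_debug(struct sketch_vcpu *v)
{
	if (v->guest_debug & SKETCH_DBG_SINGLESTEP)
		v->msr &= ~SKETCH_MSR_SE;
}

/* Trace interrupt: exit to the host debugger instead of reflecting it to the guest. */
static enum sketch_resume sketch_handle_trace(const struct sketch_vcpu *v)
{
	return (v->guest_debug & SKETCH_DBG_SINGLESTEP) ? SKETCH_RESUME_HOST
							: SKETCH_RESUME_GUEST;
}
```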

Re: [PATCH] kvm-pr: manage illegal instructions

2016-04-07 Thread Laurent Vivier
Ping?

On 15/03/2016 21:18, Laurent Vivier wrote:
> While writing some instruction tests for kvm-unit-tests for powerpc,
> I've found that illegal instructions are not managed correctly with kvm-pr,
> while it is fine with kvm-hv.
> 
> When an illegal instruction (like ".long 0") is processed by kvm-pr,
> the kernel logs are filled with:
> 
>  Couldn't emulate instruction 0x (op 0 xop 0)
>  kvmppc_handle_exit_pr: emulation at 700 failed ()
> 
> Meanwhile, the exception handler receives an interrupt for each instruction
> executed after the illegal instruction.
> 
> Signed-off-by: Laurent Vivier 
> ---
>  arch/powerpc/kvm/book3s_emulate.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_emulate.c b/arch/powerpc/kvm/book3s_emulate.c
> index 2afdb9c..4ee969d 100644
> --- a/arch/powerpc/kvm/book3s_emulate.c
> +++ b/arch/powerpc/kvm/book3s_emulate.c
> @@ -99,7 +99,6 @@ int kvmppc_core_emulate_op_pr(struct kvm_run *run, struct kvm_vcpu *vcpu,
>  
>   switch (get_op(inst)) {
>   case 0:
> - emulated = EMULATE_FAIL;
>   if ((kvmppc_get_msr(vcpu) & MSR_LE) &&
>   (inst == swab32(inst_sc))) {
>   /*
> @@ -112,6 +111,9 @@ int kvmppc_core_emulate_op_pr(struct kvm_run *run, struct kvm_vcpu *vcpu,
>   kvmppc_set_gpr(vcpu, 3, EV_UNIMPLEMENTED);
>   kvmppc_set_pc(vcpu, kvmppc_get_pc(vcpu) + 4);
>   emulated = EMULATE_DONE;
> + } else {
> + kvmppc_core_queue_program(vcpu, SRR1_PROGILL);
> + emulated = EMULATE_AGAIN;
>   }
>   break;
>   case 19:
> 
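
The opcode-0 decision after this patch can be sketched in plain C: a byte-swapped "sc" from a little-endian guest keeps its existing handling, and anything else now queues a program (illegal instruction) interrupt and returns EMULATE_AGAIN instead of failing emulation. The names below are hypothetical stand-ins, and 0x44000002 as the "sc" encoding is an assumption for illustration; the real code also sets r3 and advances the PC in the sc case.

```c
#include <stdbool.h>
#include <stdint.h>

enum sketch_emul { SKETCH_EMULATE_DONE, SKETCH_EMULATE_AGAIN };

#define SKETCH_INST_SC 0x44000002u /* assumed encoding of "sc" */

static uint32_t sketch_swab32(uint32_t x)
{
	return (x >> 24) | ((x >> 8) & 0xff00u) |
	       ((x << 8) & 0xff0000u) | (x << 24);
}

/*
 * Models the "case 0:" branch: *queued_program_check stands in for
 * kvmppc_core_queue_program(vcpu, SRR1_PROGILL).
 */
static enum sketch_emul sketch_emulate_op0(uint32_t inst, bool guest_le,
					   bool *queued_program_check)
{
	*queued_program_check = false;
	if (guest_le && inst == sketch_swab32(SKETCH_INST_SC))
		return SKETCH_EMULATE_DONE;
	*queued_program_check = true; /* illegal instruction goes to the guest */
	return SKETCH_EMULATE_AGAIN;
}
```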

Re: [PATCH 03/10] mm/hugetlb: Protect follow_huge_(pud|pgd) functions from race

2016-04-07 Thread kbuild test robot
Hi Anshuman,

[auto build test ERROR on powerpc/next]
[also build test ERROR on v4.6-rc2 next-20160407]
[if your patch is applied to the wrong git tree, please drop us a note to help improving the system]

url:
https://github.com/0day-ci/linux/commits/Anshuman-Khandual/Enable-HugeTLB-page-migration-on-POWER/20160407-165841
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: s390-allyesconfig (attached as .config)
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=s390 

All errors (new ones prefixed by >>):

   mm/hugetlb.c: In function 'follow_huge_pud':
>> mm/hugetlb.c:4360:3: error: implicit declaration of function 'pud_page' [-Werror=implicit-function-declaration]
  page = pud_page(*pud) + ((address & ~PUD_MASK) >> PAGE_SHIFT);
  ^
   mm/hugetlb.c:4360:8: warning: assignment makes pointer from integer without a cast
  page = pud_page(*pud) + ((address & ~PUD_MASK) >> PAGE_SHIFT);
   ^
   mm/hugetlb.c: In function 'follow_huge_pgd':
   mm/hugetlb.c:4395:3: error: implicit declaration of function 'pgd_page' [-Werror=implicit-function-declaration]
  page = pgd_page(*pgd) + ((address & ~PGDIR_MASK) >> PAGE_SHIFT);
  ^
   mm/hugetlb.c:4395:8: warning: assignment makes pointer from integer without a cast
  page = pgd_page(*pgd) + ((address & ~PGDIR_MASK) >> PAGE_SHIFT);
   ^
   cc1: some warnings being treated as errors

vim +/pud_page +4360 mm/hugetlb.c

  4354   * make sure that the address range covered by this pud is not
  4355   * unmapped from other threads.
  4356   */
  4357  if (!pud_huge(*pud))
  4358  goto out;
  4359  if (pud_present(*pud)) {
> 4360  page = pud_page(*pud) + ((address & ~PUD_MASK) >> PAGE_SHIFT);
  4361  if (flags & FOLL_GET)
  4362  get_page(page);
  4363  } else {

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation



Re: [PATCH 0/2] perf probe fixes for ppc64le

2016-04-07 Thread Naveen N. Rao
On 2016/04/07 06:19PM, Balbir Singh wrote:
> 
> On 06/04/16 22:32, Naveen N. Rao wrote:
> > This patchset fixes three issues found with perf probe on ppc64le:
> > 1. 'perf test kallsyms' failure on ppc64le (reported by Michael
> > Ellerman). This was due to the symbols being fixed up during symbol
> > table load. This is fixed in patch 2 by delaying symbol fixup until
> > later.
> > 2. perf probe function offset was being calculated from the local entry
> > point (LEP), which does not match user expectation when trying to look
> > at function disassembly output (reported by Ananth N). This is fixed for
> > kallsyms in patch 1 and for symbol table in patch 2.
> 
> I think the bit where the offset is w.r.t LEP when using a name, but w.r.t
> GEP when using function+offset can be confusing.

Thanks for your review!

The rationale for this is actually from the end-user perspective. The 
two use cases we are considering are:
1. User just wants to probe at function entry point:
# perf probe _do_fork

In this case, the user most definitely needs the local entry point, 
without which the probe won't be hit. So, for this case, we 
automatically insert the probe at the LEP.

[We only want to alter perf probe behavior in this case, but
we were incorrectly changing the behavior of perf in the scenario
below as well.]

2. User wants to probe at a specific location. In this case, the user 
most likely starts by looking at the function disassembly. For instance:
# objdump -S -d vmlinux.bak | grep -A100 \<_do_fork\>:
c00b6a00 <_do_fork>:
  unsigned long stack_start,
  unsigned long stack_size,
  int __user *parent_tidptr,
  int __user *child_tidptr,
  unsigned long tls)
{
c00b6a00:   f7 00 4c 3c addis   r2,r12,247
c00b6a04:   00 86 42 38 addi    r2,r2,-31232
c00b6a08:   a6 02 08 7c mflr    r0
c00b6a0c:   d0 ff 41 fb std     r26,-48(r1)
c00b6a10:   26 80 90 7d mfocrf  r12,8
..
if (!(clone_flags & CLONE_UNTRACED)) {
c00b6a54:   e3 4f c7 7b rldicl. r7,r30,41,63
c00b6a58:   2c 00 82 40 bne     c00b6a84 <_do_fork+0x84>
if (clone_flags & CLONE_VFORK)
c00b6a5c:   e3 97 c8 7b rldicl. r8,r30,50,63
c00b6a60:   a0 01 82 41 beq     c00b6c00 <_do_fork+0x200>
c00b6a64:   20 00 20 39 li      r9,32
trace = PTRACE_EVENT_VFORK;
c00b6a68:   02 00 80 3b li      r28,2
c00b6a6c:   10 02 4d e9 ld      r10,528(r13)

If the user wants to probe at _do_fork+0x54, he'd do:
# perf probe _do_fork+0x54

With the earlier approach, we would insert the probe at _do_fork+0x5c 
(0x54 from the LEP) instead, which is incorrect.
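
The two cases can be expressed as one small address computation. A sketch in C, assuming the ELFv2 convention that the LEP offset is encoded in bits 5-7 of the symbol's st_other (glibc's elf.h has similar STO_PPC64_LOCAL_* macros; the SKETCH_* and sketch_* names here are hypothetical, not perf's code). A bare function name probes the LEP; "func+offset" is taken relative to the GEP, i.e. the symbol address that objdump shows:

```c
#include <stdbool.h>
#include <stdint.h>

/* Local-entry encoding in st_other (bits 5-7), per the ELFv2 ABI. */
#define SKETCH_STO_PPC64_LOCAL_BIT  5
#define SKETCH_STO_PPC64_LOCAL_MASK (7 << SKETCH_STO_PPC64_LOCAL_BIT)

/* Byte distance from the GEP to the LEP (0, 4, 8, ... bytes). */
static uint64_t sketch_local_entry_offset(uint8_t st_other)
{
	unsigned int v = (st_other & SKETCH_STO_PPC64_LOCAL_MASK) >>
			 SKETCH_STO_PPC64_LOCAL_BIT;

	return (uint64_t)(((1u << v) >> 2) << 2);
}

/* Probe address: LEP for a bare name, GEP + offset for "func+offset". */
static uint64_t sketch_probe_addr(uint64_t gep, uint8_t st_other,
				  bool has_offset, uint64_t offset)
{
	if (!has_offset)
		return gep + sketch_local_entry_offset(st_other);
	return gep + offset;
}
```

With the two-instruction addis/addi prologue above, the LEP sits 8 bytes after the GEP, so "_do_fork" lands at GEP+0x8 while "_do_fork+0x54" lands at GEP+0x54, matching the objdump listing.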

In reality, the user would probably just use debuginfo:
# perf probe -L _do_fork
<_do_fork@/root/linus/kernel/fork.c:0>
  0  long _do_fork(unsigned long clone_flags,
  unsigned long stack_start,
  unsigned long stack_size,
  int __user *parent_tidptr,
  int __user *child_tidptr,
  unsigned long tls)
  6  {
struct task_struct *p;
  8 int trace = 0;
long nr;
 
/*
 * Determine whether and which event to report to ptracer.  When
 * called from kernel_thread or CLONE_UNTRACED is explicitly
 * requested, no event is reported; otherwise, report if the event
 * for the type of forking is enabled.
 */
 17 if (!(clone_flags & CLONE_UNTRACED)) {
 18 if (clone_flags & CLONE_VFORK)
 19 trace = PTRACE_EVENT_VFORK;
 20 else if ((clone_flags & CSIGNAL) != SIGCHLD)
 21 trace = PTRACE_EVENT_CLONE;

# perf probe _do_fork:17

In this case, perf chooses the right address based on DWARF. With this
patchset, perf without debuginfo now behaves the same way.

> Do we really need probe
> points between GEP and LEP? All the GEP does is setup r2. The use case
> could be more generic, but please clarify.

There could be scenarios where having a probe point between GEP and LEP 
is useful - for instance, if we are only interested in calls to an 
in-kernel function from an external module. However, this is a secondary 
consideration and the more important 

Re: [PATCH 03/10] mm/hugetlb: Protect follow_huge_(pud|pgd) functions from race

2016-04-07 Thread Balbir Singh


On 07/04/16 15:37, Anshuman Khandual wrote:
> follow_huge_(pmd|pud|pgd) functions are used to walk the page table and
> fetch the page struct during 'follow_page_mask' call. There are possible
> race conditions faced by these functions which arise out of simultaneous
> calls of move_pages() and freeing of huge pages. This was fixed partly
> by the previous commit e66f17ff7177 ("mm/hugetlb: take page table lock
> in follow_huge_pmd()") for only PMD based huge pages.
> 
> After implementing similar logic, functions like follow_huge_(pud|pgd)
> are now safe from the above-mentioned race conditions and can also support
> FOLL_GET. The generic version of the function 'follow_huge_addr' has been
> left as it is, and it's up to the architecture to decide on it.
> 
> Signed-off-by: Anshuman Khandual 
> ---
>  include/linux/mm.h | 33 +++
>  mm/hugetlb.c   | 67 ++
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index ffcff53..734182a 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1751,6 +1751,19 @@ static inline void pgtable_page_dtor(struct page *page)
>   NULL: pte_offset_kernel(pmd, address))
>  
>  #if USE_SPLIT_PMD_PTLOCKS

Do we still use USE_SPLIT_PMD_PTLOCKS? I think it's good enough. With pgds,
we are likely to use the same locks, and the split nature may not really be
split.
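
For reference, the pgd_to_page()/pud_to_page() helpers in the patch below find the page holding a table by rounding an entry pointer down to the table's start and then calling virt_to_page(); in the split-lock case the spinlock lives in that struct page. A user-space sketch of just the rounding step, assuming each table is PTRS * entry-size bytes, a power of two, and naturally aligned to that size (sketch_table_base is a hypothetical name):

```c
#include <stdint.h>

/*
 * Sketch: round a pointer to any entry of a page-table level down to the
 * start of its table. Correct only if the table size is a power of two
 * and the table is aligned to its own size.
 */
static uintptr_t sketch_table_base(uintptr_t entry_ptr,
				   uintptr_t ptrs_per_table,
				   uintptr_t entry_size)
{
	uintptr_t mask = ~(ptrs_per_table * entry_size - 1);

	return entry_ptr & mask;
}
```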

> +static struct page *pgd_to_page(pgd_t *pgd)
> +{
> + unsigned long mask = ~(PTRS_PER_PGD * sizeof(pgd_t) - 1);
> +
> + return virt_to_page((void *)((unsigned long) pgd & mask));
> +}
> +
> +static struct page *pud_to_page(pud_t *pud)
> +{
> + unsigned long mask = ~(PTRS_PER_PUD * sizeof(pud_t) - 1);
> +
> + return virt_to_page((void *)((unsigned long) pud & mask));
> +}
>  
>  static struct page *pmd_to_page(pmd_t *pmd)
>  {
> @@ -1758,6 +1771,16 @@ static struct page *pmd_to_page(pmd_t *pmd)
>   return virt_to_page((void *)((unsigned long) pmd & mask));
>  }
>  
> +static inline spinlock_t *pgd_lockptr(struct mm_struct *mm, pgd_t *pgd)
> +{
> + return ptlock_ptr(pgd_to_page(pgd));
> +}
> +
> +static inline spinlock_t *pud_lockptr(struct mm_struct *mm, pud_t *pud)
> +{
> + return ptlock_ptr(pud_to_page(pud));
> +}
> +
>  static inline spinlock_t *pmd_lockptr(struct mm_struct *mm, pmd_t *pmd)
>  {
>   return ptlock_ptr(pmd_to_page(pmd));
> @@ -1783,6 +1806,16 @@ static inline void pgtable_pmd_page_dtor(struct page *page)
>  
>  #else
>  
> +static inline spinlock_t *pgd_lockptr(struct mm_struct *mm, pgd_t *pgd)
> +{
> + return &mm->page_table_lock;
> +}
> +
> +static inline spinlock_t *pud_lockptr(struct mm_struct *mm, pud_t *pud)
> +{
> + return &mm->page_table_lock;
> +}
> +
>  static inline spinlock_t *pmd_lockptr(struct mm_struct *mm, pmd_t *pmd)
>  {
>   return &mm->page_table_lock;
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 5ea3158..e84e479 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -4346,21 +4346,70 @@ struct page * __weak
>  follow_huge_pud(struct mm_struct *mm, unsigned long address,
>   pud_t *pud, int flags)
>  {
> - if (flags & FOLL_GET)
> - return NULL;
> -
> - return pte_page(*(pte_t *)pud) + ((address & ~PUD_MASK) >> PAGE_SHIFT);
> + struct page *page = NULL;
> + spinlock_t *ptl;
> +retry:
> + ptl = pud_lockptr(mm, pud);
> + spin_lock(ptl);
> + /*
> +  * make sure that the address range covered by this pud is not
> +  * unmapped from other threads.
> +  */
> + if (!pud_huge(*pud))
> + goto out;
> + if (pud_present(*pud)) {
> + page = pud_page(*pud) + ((address & ~PUD_MASK) >> PAGE_SHIFT);
> + if (flags & FOLL_GET)
> + get_page(page);
> + } else {
> + if (is_hugetlb_entry_migration(huge_ptep_get((pte_t *)pud))) {
> + spin_unlock(ptl);
> + __migration_entry_wait(mm, (pte_t *)pud, ptl);
> + goto retry;
> + }
> + /*
> +  * hwpoisoned entry is treated as no_page_table in
> +  * follow_page_mask().
> +  */
> + }
> +out:
> + spin_unlock(ptl);
> + return page;
>  }
>  
>  struct page * __weak
>  follow_huge_pgd(struct mm_struct *mm, unsigned long address,
>   pgd_t *pgd, int flags)
>  {
> - if (flags & FOLL_GET)
> - return NULL;
> -
> - return pte_page(*(pte_t *)pgd) +
> - ((address & ~PGDIR_MASK) >> PAGE_SHIFT);
> + struct page *page = NULL;
> + spinlock_t *ptl;
> +retry:
> + ptl = pgd_lockptr(mm, pgd);
> + spin_lock(ptl);
> + /*
> +  * make sure that the address range covered by this pgd is not
> +  * unmapped from other threads.
> +  */
> + if (!pgd_huge(*pgd))
> + goto out;
> + if 

Re: [PATCH V10 00/28] Add new powerpc specific ELF core notes

2016-04-07 Thread Laurent Dufour
On 16/02/2016 09:59, Anshuman Khandual wrote:
>   This patch series adds twelve new ELF core note sections which can
> be used with existing ptrace request PTRACE_GETREGSET-SETREGSET for accessing
> various transactional memory and other miscellaneous debug register sets on
> powerpc platform.

Hi Michael,

This series is required to handle TM state in CRIU.
Is there a chance to get it upstream soon ?

Thanks,
Laurent.

> 
> Test Result (All tests pass on both BE and LE)
> --
> ptrace-ebb          PASS
> ptrace-gpr          PASS
> ptrace-tm-gpr       PASS
> ptrace-tm-spd-gpr   PASS
> ptrace-tar          PASS
> ptrace-tm-tar       PASS
> ptrace-tm-spd-tar   PASS
> ptrace-vsx          PASS
> ptrace-tm-vsx       PASS
> ptrace-tm-spd-vsx   PASS
> ptrace-tm-spr       PASS
> 
> Previous versions:
> ==
> RFC: https://lkml.org/lkml/2014/4/1/292
> V1:  https://lkml.org/lkml/2014/4/2/43
> V2:  https://lkml.org/lkml/2014/5/5/88
> V3:  https://lkml.org/lkml/2014/5/23/486
> V4:  https://lkml.org/lkml/2014/11/11/6
> V5:  https://lkml.org/lkml/2014/11/25/134
> V6:  https://lkml.org/lkml/2014/12/2/98
> V7:  https://lkml.org/lkml/2015/1/14/19
> V8:  https://lkml.org/lkml/2015/5/19/700
> V9:  https://lkml.org/lkml/2015/10/8/522
> 
> Changes in V10:
> ---
> - Rebased against the latest mainline
> - Fixed a couple of build failures in the test cases related to aux vector
> 
> Changes in V9:
> --
> - Fixed static build check failure after tm_orig_msr got dropped
> - Fixed asm volatile construct for used registers set
> - Fixed EBB, VSX, VMX tests for LE
> - Fixed TAR test which was failing because of system calls
> - Added checks for PPC_FEATURE2_HTM aux feature in the tests
> - Fixed copyright statements
> 
> Changes in V8:
> --
> - Split the misc register set into individual ELF core notes
> - Implemented support for VSX register set (on and off TM)
> - Implemented support for EBB register set
> - Implemented review comments on previous versions
> - Some code re-arrangements, re-writes and documentation
> - Added comprehensive list of test cases into selftests
> 
> Changes in V7:
> --
> - Fixed a config directive in the MISC code
> - Merged the two gitignore patches into a single one
> 
> Changes in V6:
> --
> - Added two git ignore patches for powerpc selftests
> - Re-formatted all in-code function definitions in kernel-doc format
> 
> Changes in V5:
> --
> - Changed flush_tmregs_to_thread so as not to take self tracing into account
> - Dropped the 3rd patch in the series which had merged two functions
> - Fixed one build problem for the misc debug register patch
> - Accommodated almost all the review comments from Suka on the 6th patch
> - Minor changes to the self test program
> - Changed commit messages for some of the patches
> 
> Changes in V4:
> --
> - Added one test program into the powerpc selftest bucket in this regard
> - Split the 2nd patch in the previous series into four different patches
> - Accommodated most of the review comments on the previous patch series
> - Added a patch to merge functions __switch_to_tm and tm_reclaim_task
> 
> Changes in V3:
> --
> - Added two new error paths in every TM related get/set functions when regset
>   support is not present on the system (ENODEV) or when the process does not
>   have any transaction active (ENODATA) in the context
> - Installed the active hooks for all the newly added regset core note types
> 
> Changes in V2:
> --
> - Removed all the power specific ptrace requests corresponding to new NT_PPC_*
>   elf core note types. Now all the register sets can be accessed from ptrace
>   through PTRACE_GETREGSET/PTRACE_SETREGSET using the individual NT_PPC* core
>   note type instead
> - Fixed couple of attribute values for REGSET_TM_CGPR register set
> - Renamed flush_tmreg_to_thread as flush_tmregs_to_thread
> - Fixed 32 bit checkpointed GPR support
> - Changed commit messages accordingly
> 
> 
> Anshuman Khandual (28):
>   elf: Add powerpc specific core note sections
>   powerpc, process: Add the function flush_tmregs_to_thread
>   powerpc, ptrace: Enable in transaction NT_PRFPREG ptrace requests
>   powerpc, ptrace: Enable in transaction NT_PPC_VMX ptrace requests
>   powerpc, ptrace: Enable in transaction NT_PPC_VSX ptrace requests
>   powerpc, ptrace: Adapt gpr32_get, gpr32_set functions for transaction
>   powerpc, ptrace: Enable support for NT_PPC_CGPR
>   powerpc, ptrace: Enable support for NT_PPC_CFPR
>   powerpc, ptrace: Enable support for NT_PPC_CVMX
>   powerpc, ptrace: Enable support for NT_PPC_CVSX
>   powerpc, ptrace: Enable support for TM SPR state
>   powerpc, ptrace: Enable NT_PPC_TM_CTAR, NT_PPC_TM_CPPR, NT_PPC_TM_CDSCR
>   powerpc, ptrace: Enable support for NT_PPPC_TAR, NT_PPC_PPR, NT_PPC_DSCR
>   powerpc, ptrace: Enable support for EBB registers
>   

Re: [PATCH 03/10] mm/hugetlb: Protect follow_huge_(pud|pgd) functions from race

2016-04-07 Thread kbuild test robot
Hi Anshuman,

[auto build test ERROR on powerpc/next]
[also build test ERROR on v4.6-rc2 next-20160407]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/Anshuman-Khandual/Enable-HugeTLB-page-migration-on-POWER/20160407-165841
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: sparc64-allyesconfig (attached as .config)
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=sparc64 

All error/warnings (new ones prefixed by >>):

   mm/hugetlb.c: In function 'follow_huge_pgd':
>> mm/hugetlb.c:4395:3: error: implicit declaration of function 'pgd_page' [-Werror=implicit-function-declaration]
  page = pgd_page(*pgd) + ((address & ~PGDIR_MASK) >> PAGE_SHIFT);
  ^
>> mm/hugetlb.c:4395:8: warning: assignment makes pointer from integer without a cast
  page = pgd_page(*pgd) + ((address & ~PGDIR_MASK) >> PAGE_SHIFT);
   ^
   cc1: some warnings being treated as errors

vim +/pgd_page +4395 mm/hugetlb.c

  4389   * make sure that the address range covered by this pgd is not
  4390   * unmapped from other threads.
  4391   */
  4392  if (!pgd_huge(*pgd))
  4393  goto out;
  4394  if (pgd_present(*pgd)) {
> 4395  page = pgd_page(*pgd) + ((address & ~PGDIR_MASK) >> PAGE_SHIFT);
  4396  if (flags & FOLL_GET)
  4397  get_page(page);
  4398  } else {

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 02/10] mm/hugetlb: Add PGD based implementation awareness

2016-04-07 Thread Balbir Singh


On 07/04/16 15:37, Anshuman Khandual wrote:
> Currently the config ARCH_WANT_GENERAL_HUGETLB enabled functions like
> 'huge_pte_alloc' and 'huge_pte_offset' don't take into account HugeTLB
> page implementation at the PGD level. This is also true for functions
> like 'follow_page_mask' which is called from move_pages() system call.
> This lack of PGD level huge page support prevents some architectures
> from using these generic HugeTLB functions.
> 

From what I know of move_pages(), it will always call follow_page_mask()
with FOLL_GET (I could be wrong here) and the implementation below
returns NULL for follow_huge_pgd().

> This change adds the required PGD based implementation awareness and
> with that, more architectures like POWER which implements 16GB pages
> at the PGD level along with the 16MB pages at the PMD level can now
> use ARCH_WANT_GENERAL_HUGETLB config option.
> 
> Signed-off-by: Anshuman Khandual 
> ---
>  include/linux/hugetlb.h |  3 +++
>  mm/gup.c|  6 ++
>  mm/hugetlb.c| 20 
>  3 files changed, 29 insertions(+)
> 
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 7d953c2..71832e1 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -115,6 +115,8 @@ struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
>   pmd_t *pmd, int flags);
>  struct page *follow_huge_pud(struct mm_struct *mm, unsigned long address,
>   pud_t *pud, int flags);
> +struct page *follow_huge_pgd(struct mm_struct *mm, unsigned long address,
> + pgd_t *pgd, int flags);
>  int pmd_huge(pmd_t pmd);
>  int pud_huge(pud_t pmd);
>  unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
> @@ -143,6 +145,7 @@ static inline void hugetlb_show_meminfo(void)
>  }
>  #define follow_huge_pmd(mm, addr, pmd, flags)NULL
>  #define follow_huge_pud(mm, addr, pud, flags)NULL
> +#define follow_huge_pgd(mm, addr, pgd, flags)NULL
>  #define prepare_hugepage_range(file, addr, len)  (-EINVAL)
>  #define pmd_huge(x)  0
>  #define pud_huge(x)  0
> diff --git a/mm/gup.c b/mm/gup.c
> index fb87aea..9bac78c 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -234,6 +234,12 @@ struct page *follow_page_mask(struct vm_area_struct *vma,
>   pgd = pgd_offset(mm, address);
>   if (pgd_none(*pgd) || unlikely(pgd_bad(*pgd)))
>   return no_page_table(vma, flags);
> + if (pgd_huge(*pgd) && vma->vm_flags & VM_HUGETLB) {
> + page = follow_huge_pgd(mm, address, pgd, flags);
> + if (page)
> + return page;
> + return no_page_table(vma, flags);
This will return NULL as well?
> + }
>  
>   pud = pud_offset(pgd, address);
>   if (pud_none(*pud))
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 19d0d08..5ea3158 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -4250,6 +4250,11 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
>   pte_t *pte = NULL;
>  
>   pgd = pgd_offset(mm, addr);
> + if (sz == PGDIR_SIZE) {
> + pte = (pte_t *)pgd;
> + goto huge_pgd;
> + }
> +

No allocation for a pgd slot - right?

>   pud = pud_alloc(mm, pgd, addr);
>   if (pud) {
>   if (sz == PUD_SIZE) {
> @@ -4262,6 +4267,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
>   pte = (pte_t *)pmd_alloc(mm, pud, addr);
>   }
>   }
> +
> +huge_pgd:
>   BUG_ON(pte && !pte_none(*pte) && !pte_huge(*pte));
>  
>   return pte;
> @@ -4275,6 +4282,8 @@ pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
>  
>   pgd = pgd_offset(mm, addr);
>   if (pgd_present(*pgd)) {
> + if (pgd_huge(*pgd))
> + return (pte_t *)pgd;
>   pud = pud_offset(pgd, addr);
>   if (pud_present(*pud)) {
>   if (pud_huge(*pud))
> @@ -4343,6 +4352,17 @@ follow_huge_pud(struct mm_struct *mm, unsigned long address,
>   return pte_page(*(pte_t *)pud) + ((address & ~PUD_MASK) >> PAGE_SHIFT);
>  }
>  
> +struct page * __weak
> +follow_huge_pgd(struct mm_struct *mm, unsigned long address,
> + pgd_t *pgd, int flags)
> +{
> + if (flags & FOLL_GET)
> + return NULL;
> +
> + return pte_page(*(pte_t *)pgd) +
> + ((address & ~PGDIR_MASK) >> PAGE_SHIFT);
> +}
> +
>  #ifdef CONFIG_MEMORY_FAILURE
>  
>  /*
> 
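For reference, the address arithmetic in follow_huge_pgd() selects the base page inside the huge page: the offset of the address within the PGD-sized region, shifted down by PAGE_SHIFT. A minimal C sketch, assuming a hypothetical geometry of 64K base pages and a 16GB page mapped by one PGD entry (the 16GB size comes from the commit message; the SKETCH_* shift values are illustrative, not taken from the POWER headers):

```c
#include <stdint.h>

/* Hypothetical geometry: 64K base pages, 16GB page mapped at the PGD level. */
#define SKETCH_PAGE_SHIFT	16
#define SKETCH_PGDIR_SHIFT	34
#define SKETCH_PGDIR_SIZE	((uint64_t)1 << SKETCH_PGDIR_SHIFT)
#define SKETCH_PGDIR_MASK	(~(SKETCH_PGDIR_SIZE - 1))

/*
 * Index of the base page, within the huge page covering 'address', that
 * follow_huge_pgd() adds to pte_page(*(pte_t *)pgd).
 */
static uint64_t pgd_subpage_index(uint64_t address)
{
	return (address & ~SKETCH_PGDIR_MASK) >> SKETCH_PAGE_SHIFT;
}
```

With this geometry a 16GB page spans 262144 base pages, so the index runs from 0 to 262143 and wraps back to 0 at the next PGD boundary.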

Re: [PATCH 01/10] mm/mmap: Replace SHM_HUGE_MASK with MAP_HUGE_MASK inside mmap_pgoff

2016-04-07 Thread Balbir Singh


On 07/04/16 15:37, Anshuman Khandual wrote:
> The commit 091d0d55b286 ("shm: fix null pointer deref when userspace
> specifies invalid hugepage size") had replaced MAP_HUGE_MASK with
> SHM_HUGE_MASK. Though both of them contain the same numeric value of
> 0x3f, MAP_HUGE_MASK flag sounds more appropriate than the other one
> in the context. Hence change it back.
> 
> Signed-off-by: Anshuman Khandual 

Acked-by: Balbir Singh 
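Both masks select the same six bits: the flag word carries log2 of the requested huge page size starting at bit 26, so the change is about naming only. A minimal C sketch of that encoding, assuming the shift (26) and mask (0x3f) mirror the uapi MAP_HUGE_SHIFT/MAP_HUGE_MASK values; huge_flag() and huge_size_from_flags() are hypothetical helpers:

```c
#include <stdint.h>

/* Local copies of the uapi encoding: log2(huge page size) in bits 26..31. */
#define HUGE_SHIFT	26
#define HUGE_MASK	0x3fu	/* same value for MAP_HUGE_MASK and SHM_HUGE_MASK */

/* Encode a huge page size, given as log2(bytes), into flag bits. */
static unsigned int huge_flag(unsigned int log2_size)
{
	return (log2_size & HUGE_MASK) << HUGE_SHIFT;
}

/* Decode the requested huge page size in bytes; 0 means "use the default". */
static uint64_t huge_size_from_flags(unsigned int flags)
{
	unsigned int log2_size = (flags >> HUGE_SHIFT) & HUGE_MASK;

	return log2_size ? (uint64_t)1 << log2_size : 0;
}
```

For example, huge_flag(24) requests 16MB pages and huge_flag(34) requests 16GB pages; unrelated low flag bits do not disturb the decoded size.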

Re: [PATCH 0/2] perf probe fixes for ppc64le

2016-04-07 Thread Balbir Singh

On 06/04/16 22:32, Naveen N. Rao wrote:
> This patchset fixes three issues found with perf probe on ppc64le:
> 1. 'perf test kallsyms' failure on ppc64le (reported by Michael
> Ellerman). This was due to the symbols being fixed up during symbol
> table load. This is fixed in patch 2 by delaying symbol fixup until
> later.
> 2. perf probe function offset was being calculated from the local entry
> point (LEP), which does not match user expectation when trying to look
> at function disassembly output (reported by Ananth N). This is fixed for
> kallsyms in patch 1 and for symbol table in patch 2.

I think the bit where the offset is w.r.t LEP when using a name, but w.r.t
GEP when using function+offset can be confusing. Do we really need probe
points between GEP and LEP? All the GEP does is setup r2. The use case
could be more generic, but please clarify.

> 3. perf probe failure with kretprobe when using kallsyms. This was
> failing as we were specifying an offset. This is fixed in patch 1.
> 
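On ppc64le (ELFv2), the distance between a function's global entry point (GEP) and local entry point (LEP) is encoded in the top three bits of the symbol's st_other field; for a typical function the GEP is two instructions (the r2 setup) ahead of the LEP, i.e. 8 bytes. A C sketch of the decoding, assuming the bit layout matches glibc's STO_PPC64_LOCAL_BIT/STO_PPC64_LOCAL_MASK and the PPC64_LOCAL_ENTRY_OFFSET() formula; in_gep_prologue() is a hypothetical helper showing which func+offset probes would land between GEP and LEP:

```c
#include <stdint.h>

/* Local copies of the ELFv2 st_other layout for the local entry field. */
#define SKETCH_LOCAL_BIT	5
#define SKETCH_LOCAL_MASK	(7u << SKETCH_LOCAL_BIT)

/* Bytes from the GEP to the LEP, decoded from a symbol's st_other. */
static unsigned int local_entry_offset(uint8_t st_other)
{
	unsigned int v = (st_other & SKETCH_LOCAL_MASK) >> SKETCH_LOCAL_BIT;

	return ((1u << v) >> 2) << 2;
}

/*
 * Non-zero if a probe at func+off (off measured from the GEP, as in
 * disassembly output) lands before the LEP, i.e. inside the r2 setup.
 */
static int in_gep_prologue(uint8_t st_other, unsigned long off)
{
	return off < local_entry_offset(st_other);
}
```

With the common encoding (3 << 5), offsets 0 and 4 fall in the r2 setup, and offset 8 is the LEP itself.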

Balbir Singh.

Re: [PATCH 1/2] perf/powerpc: Fix kprobe and kretprobe handling with kallsyms

2016-04-07 Thread Naveen N. Rao
On 2016/04/07 10:00AM, Ananth N wrote:
> On Wed, Apr 06, 2016 at 06:02:57PM +0530, Naveen N. Rao wrote:
> 
> > +   if (!pev->uprobes && map->dso->symtab_type == DSO_BINARY_TYPE__KALLSYMS)
> > tev->point.offset += PPC64LE_LEP_OFFSET;
> 
> uprobes check against kallsyms? Am I missing something here?

Ah yes. That check shouldn't be necessary since symtab_type would be 
different anyway. I will remove that check.

Thanks for the review!
- Naveen


[PATCH V3 2/2] pseries/eeh: Refactor the configure_bridge RTAS tokens

2016-04-07 Thread Russell Currey
The RTAS calls "ibm,configure-pe" and "ibm,configure-bridge" perform the
same actions; however, the former can skip configuration if unnecessary.
The existing code treats them as different tokens even though only one
will ever be called.  Refactor this by making a single token that is
assigned during init.
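The token selection can be sketched in plain C. The lookup table, sketch_lookup() and SKETCH_UNKNOWN_SERVICE are hypothetical stand-ins for rtas_token() and RTAS_UNKNOWN_SERVICE; only the preference order (configure-pe first, configure-bridge as fallback) follows the patch:

```c
#include <string.h>

#define SKETCH_UNKNOWN_SERVICE (-1)	/* stand-in for RTAS_UNKNOWN_SERVICE */

/* Hypothetical table of services the firmware advertises. */
struct sketch_token {
	const char *name;
	int token;
};

/* Stand-in for rtas_token(): find a service by name. */
static int sketch_lookup(const struct sketch_token *tbl, int n, const char *name)
{
	for (int i = 0; i < n; i++)
		if (strcmp(tbl[i].name, name) == 0)
			return tbl[i].token;
	return SKETCH_UNKNOWN_SERVICE;
}

/* Prefer ibm,configure-pe; fall back to ibm,configure-bridge. */
static int resolve_configure_token(const struct sketch_token *tbl, int n)
{
	int tok = sketch_lookup(tbl, n, "ibm,configure-pe");

	if (tok == SKETCH_UNKNOWN_SERVICE)
		tok = sketch_lookup(tbl, n, "ibm,configure-bridge");
	return tok;
}
```

Because the fallback is resolved once at init, both the sanity check and the call site only need to test a single token.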

Signed-off-by: Russell Currey 
---
V3: Reorder commits so the previous patch doesn't depend on this

I had a look at doing the same with some other duplicated tokens but
they had slight differences in semantics so it wasn't helping clarity.
---
 arch/powerpc/platforms/pseries/eeh_pseries.c | 28 
 1 file changed, 12 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c
index 405baaf..3998e0f 100644
--- a/arch/powerpc/platforms/pseries/eeh_pseries.c
+++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
@@ -53,7 +53,6 @@ static int ibm_read_slot_reset_state2;
 static int ibm_slot_error_detail;
 static int ibm_get_config_addr_info;
 static int ibm_get_config_addr_info2;
-static int ibm_configure_bridge;
 static int ibm_configure_pe;
 
 /*
@@ -81,7 +80,14 @@ static int pseries_eeh_init(void)
ibm_get_config_addr_info2   = rtas_token("ibm,get-config-addr-info2");
ibm_get_config_addr_info= rtas_token("ibm,get-config-addr-info");
ibm_configure_pe= rtas_token("ibm,configure-pe");
-   ibm_configure_bridge= rtas_token("ibm,configure-bridge");
+
+   /*
+* ibm,configure-pe and ibm,configure-bridge have the same semantics,
+* however ibm,configure-pe can be faster.  If we can't find
+* ibm,configure-pe then fall back to using ibm,configure-bridge.
+*/
+   if (ibm_configure_pe == RTAS_UNKNOWN_SERVICE)
+   ibm_configure_pe= rtas_token("ibm,configure-bridge");
 
/*
 * Necessary sanity check. We needn't check "get-config-addr-info"
@@ -93,8 +99,7 @@ static int pseries_eeh_init(void)
(ibm_read_slot_reset_state2 == RTAS_UNKNOWN_SERVICE &&
 ibm_read_slot_reset_state == RTAS_UNKNOWN_SERVICE) ||
ibm_slot_error_detail == RTAS_UNKNOWN_SERVICE   ||
-   (ibm_configure_pe == RTAS_UNKNOWN_SERVICE   &&
-ibm_configure_bridge == RTAS_UNKNOWN_SERVICE)) {
+   ibm_configure_pe == RTAS_UNKNOWN_SERVICE) {
pr_info("EEH functionality not supported\n");
return -EINVAL;
}
@@ -624,18 +629,9 @@ static int pseries_eeh_configure_bridge(struct eeh_pe *pe)
config_addr = pe->addr;
 
while (max_wait > 0) {
-   /* Use new configure-pe function, if supported */
-   if (ibm_configure_pe != RTAS_UNKNOWN_SERVICE) {
-   ret = rtas_call(ibm_configure_pe, 3, 1, NULL,
-   config_addr, BUID_HI(pe->phb->buid),
-   BUID_LO(pe->phb->buid));
-   } else if (ibm_configure_bridge != RTAS_UNKNOWN_SERVICE) {
-   ret = rtas_call(ibm_configure_bridge, 3, 1, NULL,
-   config_addr, BUID_HI(pe->phb->buid),
-   BUID_LO(pe->phb->buid));
-   } else {
-   return -EFAULT;
-   }
+   ret = rtas_call(ibm_configure_pe, 3, 1, NULL,
+   config_addr, BUID_HI(pe->phb->buid),
+   BUID_LO(pe->phb->buid));
 
if (!ret)
return ret;
-- 
2.8.0


[PATCH V3 1/2] pseries/eeh: Handle RTAS delay requests in configure_bridge

2016-04-07 Thread Russell Currey
In the "ibm,configure-pe" and "ibm,configure-bridge" RTAS calls, the
spec states that values of 9900-9905 can be returned, indicating that
software should delay for 10^x (where x is the last digit, i.e. 990x)
milliseconds and attempt the call again. Currently, the kernel doesn't
know about this, and respecting it fixes some PCI failures when the
hypervisor is busy.

The total delay is capped at 0.2 seconds, and each individual delay at 100ms.
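The delay arithmetic can be sketched as follows, assuming 990x means 10^x ms as described above and mirroring the patch's 100ms per-request clamp and 200ms total budget. simulate_configure() is a hypothetical harness that replays a list of firmware return codes instead of calling RTAS or sleeping; unlike the patch, it also stops on any non-zero status that is not a delay request (a choice made for this sketch):

```c
/* Extended-delay status range from the text above (9900-9905). Local names;
 * the kernel uses RTAS_EXTENDED_DELAY_MIN/MAX. */
#define EXT_DELAY_MIN	9900
#define EXT_DELAY_MAX	9905
#define EXT_DELAY_CAP	(EXT_DELAY_MIN + 2)	/* 9902 -> 10^2 = 100 ms */

/* Delay in ms requested by an RTAS status, or 0 if it is not a delay code. */
static unsigned int ext_delay_ms(int status)
{
	unsigned int ms = 1;
	int order;

	if (status < EXT_DELAY_MIN || status > EXT_DELAY_MAX)
		return 0;
	if (status > EXT_DELAY_CAP)	/* clamp to 100 ms, as the patch does */
		status = EXT_DELAY_CAP;
	for (order = status - EXT_DELAY_MIN; order > 0; order--)
		ms *= 10;
	return ms;
}

/* Replay 'n' return codes through the retry loop; return the number of calls. */
static int simulate_configure(const int *statuses, int n)
{
	int max_wait = 200;		/* total budget in ms */
	int calls = 0;

	while (max_wait > 0 && calls < n) {
		int ret = statuses[calls++];
		unsigned int ms;

		if (ret == 0)
			return calls;	/* success */
		ms = ext_delay_ms(ret);
		if (ms == 0)
			return calls;	/* not a delay request: give up */
		max_wait -= (int)ms;
		if (max_wait < 0)
			return calls;	/* budget would be exceeded */
		/* a real caller would sleep for 'ms' milliseconds here */
	}
	return calls;
}
```

If every reply asks for 100ms or more, at most two calls are made before the 200ms budget runs out.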

Cc:  # 3.10+
Signed-off-by: Russell Currey 
---
V3 changelog:
 - Refactorings and rewordings thanks to Gavin
 - Treat return values >9902 as 9902 thanks to Tyrel
---
 arch/powerpc/platforms/pseries/eeh_pseries.c | 51 
 1 file changed, 36 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c
index ac3ffd9..405baaf 100644
--- a/arch/powerpc/platforms/pseries/eeh_pseries.c
+++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
@@ -615,29 +615,50 @@ static int pseries_eeh_configure_bridge(struct eeh_pe *pe)
 {
int config_addr;
int ret;
+   /* Waiting 0.2s maximum before skipping configuration */
+   int max_wait = 200;
 
/* Figure out the PE address */
config_addr = pe->config_addr;
if (pe->addr)
config_addr = pe->addr;
 
-   /* Use new configure-pe function, if supported */
-   if (ibm_configure_pe != RTAS_UNKNOWN_SERVICE) {
-   ret = rtas_call(ibm_configure_pe, 3, 1, NULL,
-   config_addr, BUID_HI(pe->phb->buid),
-   BUID_LO(pe->phb->buid));
-   } else if (ibm_configure_bridge != RTAS_UNKNOWN_SERVICE) {
-   ret = rtas_call(ibm_configure_bridge, 3, 1, NULL,
-   config_addr, BUID_HI(pe->phb->buid),
-   BUID_LO(pe->phb->buid));
-   } else {
-   return -EFAULT;
-   }
+   while (max_wait > 0) {
+   /* Use new configure-pe function, if supported */
+   if (ibm_configure_pe != RTAS_UNKNOWN_SERVICE) {
+   ret = rtas_call(ibm_configure_pe, 3, 1, NULL,
+   config_addr, BUID_HI(pe->phb->buid),
+   BUID_LO(pe->phb->buid));
+   } else if (ibm_configure_bridge != RTAS_UNKNOWN_SERVICE) {
+   ret = rtas_call(ibm_configure_bridge, 3, 1, NULL,
+   config_addr, BUID_HI(pe->phb->buid),
+   BUID_LO(pe->phb->buid));
+   } else {
+   return -EFAULT;
+   }
 
-   if (ret)
-   pr_warn("%s: Unable to configure bridge PHB#%d-PE#%x (%d)\n",
-   __func__, pe->phb->global_number, pe->addr, ret);
+   if (!ret)
+   return ret;
+
+   /*
+* If RTAS returns a delay value that's above 100ms, cut it
+* down to 100ms in case firmware made a mistake.  For more
+* on how these delay values work see rtas_busy_delay_time
+*/
+   if (ret > RTAS_EXTENDED_DELAY_MIN+2 &&
+   ret <= RTAS_EXTENDED_DELAY_MAX)
+   ret = RTAS_EXTENDED_DELAY_MIN+2;
+
+   max_wait -= rtas_busy_delay_time(ret);
+
+   if (max_wait < 0)
+   break;
+
+   rtas_busy_delay(ret);
+   }
 
+   pr_warn("%s: Unable to configure bridge PHB#%d-PE#%x (%d)\n",
+   __func__, pe->phb->global_number, pe->addr, ret);
return ret;
 }
 
-- 
2.8.0


Re: [PATCH] powerpc: define the fman node for the kmcoge4 DTS

2016-04-07 Thread Valentin Longchamp
On 06/04/16 23:49, Scott Wood wrote:
> On Wed, 2016-04-06 at 15:37 +0200, Valentin Longchamp wrote:
>> Now that the FMAN mac driver has been merged the fman node is relevant.
>>
>> The kmcoge4 board implements 3 ethernet interfaces, 1 with a RGMII phy
>> and 2 with fixed 1 Giga SGMII links.
>>
>> Signed-off-by: Valentin Longchamp 
>> ---
>>  arch/powerpc/boot/dts/fsl/kmcoge4.dts | 39 +++
>>  1 file changed, 39 insertions(+)
>>
>> diff --git a/arch/powerpc/boot/dts/fsl/kmcoge4.dts b/arch/powerpc/boot/dts/fsl/kmcoge4.dts
>> index 6858ec9..1cec66d 100644
>> --- a/arch/powerpc/boot/dts/fsl/kmcoge4.dts
>> +++ b/arch/powerpc/boot/dts/fsl/kmcoge4.dts
>> @@ -106,6 +106,45 @@
>>  sata@221000 {
>>  status = "disabled";
>>  };
>> +
>> +fman0: fman@40 {
>> +enet0: ethernet@e {
>> +phy-connection-type = "sgmii";
>> +local-mac-address = [00 11 22 33 44 55];
>> +fixed-link {
>> +speed = <1000>;
>> +full-duplex;
>> +};
>> +};
>> +mdio0: mdio@e1120 {
>> +front_phy: ethernet-phy@11 {
>> +reg = <0x11>;
>> +};
>> +};
>> +
>> +enet1: ethernet@e2000 {
>> +phy-connection-type = "sgmii";
>> +local-mac-address = [00 11 22 33 44 56];
>> +fixed-link {
>> +speed = <1000>;
>> +full-duplex;
>> +};
>> +};
> 
> No hardcoded MAC addresses.
> 

For these 2 interfaces where I have the local-mac-address field, the MAC
addresses are set later by an application that reads the real address from some
EEPROM. However, in order to let the fman mac_probe run successfully in the
first place, I have set non-zero MAC addresses, since the local-mac-address
fields are not set by u-boot.

I have found several local-mac-address fields in other DTS files that are all
zeros, and are thus rejected by of_get_mac_address. Are they leftovers from the
past, or should they be used here as well? If not, I will simply drop these 2
fields.
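For context, the zero-address rejection comes from the generic Ethernet address check. A C sketch of that rule, assuming it mirrors the kernel's is_valid_ether_addr() (an address is usable if it is neither multicast nor all zeros); the mac_is_* helper names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* True if all six octets are zero. */
static bool mac_is_zero(const uint8_t a[6])
{
	return (a[0] | a[1] | a[2] | a[3] | a[4] | a[5]) == 0;
}

/* True if the group (multicast) bit, the low bit of the first octet, is set. */
static bool mac_is_multicast(const uint8_t a[6])
{
	return (a[0] & 0x01) != 0;
}

/* Usable station address: not multicast (this includes broadcast) and not zero. */
static bool mac_is_valid(const uint8_t a[6])
{
	return !mac_is_multicast(a) && !mac_is_zero(a);
}
```

Both placeholder addresses in the patch (00:11:22:33:44:55 and 00:11:22:33:44:56) pass this check, while an all-zero local-mac-address does not.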

Thanks

Valentin