[PATCH 22/27] powerpc: Remove shim for pci_controller_ops.reset_secondary_bus

2015-03-25 Thread Daniel Axtens
Signed-off-by: Daniel Axtens <d...@axtens.net>
---
 arch/powerpc/include/asm/machdep.h    |  3 ---
 arch/powerpc/include/asm/pci-bridge.h | 16 ----------------
 arch/powerpc/kernel/pci-common.c      |  9 ++++++++-
 3 files changed, 8 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index f1476b8..f178cf1 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -244,9 +244,6 @@ struct machdep_calls {
 	/* Called after scan and before resource survey */
 	void (*pcibios_fixup_phb)(struct pci_controller *hose);
 
-	/* Reset the secondary bus of bridge */
-	void  (*pcibios_reset_secondary_bus)(struct pci_dev *dev);
-
 	/* Called to shutdown machine specific hardware not already controlled
 	 * by other drivers.
 	 */
diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index b62e043..b08db93 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -327,21 +327,5 @@ static inline bool enable_device_hook(struct pci_dev *dev)
 	return true;
 }
 
-static inline void reset_secondary_bus(struct pci_dev *dev)
-{
-	struct pci_controller *hose = pci_bus_to_host(dev->bus);
-
-	if (hose->controller_ops.reset_secondary_bus)
-		hose->controller_ops.reset_secondary_bus(dev);
-	else if (ppc_md.pcibios_reset_secondary_bus)
-		ppc_md.pcibios_reset_secondary_bus(dev);
-	else
-		/*
-		 * Fallback to the generic function if no
-		 * platform-specific one is provided
-		 */
-		pci_reset_secondary_bus(dev);
-}
-
 #endif /* __KERNEL__ */
 #endif /* _ASM_POWERPC_PCI_BRIDGE_H */
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 9edb479..a535d31 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -124,7 +124,14 @@ resource_size_t pcibios_window_alignment(struct pci_bus *bus,
 
 void pcibios_reset_secondary_bus(struct pci_dev *dev)
 {
-	reset_secondary_bus(dev);
+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
+
+	if (hose->controller_ops.reset_secondary_bus) {
+		hose->controller_ops.reset_secondary_bus(dev);
+		return;
+	}
+
+	pci_reset_secondary_bus(dev);
 }
 
 static resource_size_t pcibios_io_size(const struct pci_controller *hose)
-- 
2.1.4
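
After this patch, pcibios_reset_secondary_bus() has only two paths. If the per-PHB controller_ops hook is set, it runs. Otherwise the generic pci_reset_secondary_bus() runs, and the old ppc_md step is gone. The sketch below models that dispatch order in user space. The fake_* types, the enum and phb_reset_secondary_bus() are hypothetical stand-ins, not kernel APIs.

```c
#include <stddef.h>

/* Hypothetical stand-ins for struct pci_dev and pci_controller_ops. */
struct fake_dev {
	int id;
};

struct fake_controller_ops {
	/* Optional per-PHB hook; NULL means the platform provides none. */
	void (*reset_secondary_bus)(struct fake_dev *dev);
};

/* Records which path handled the last reset, so the order can be checked. */
enum reset_path { RESET_NONE, RESET_CONTROLLER, RESET_GENERIC };
static enum reset_path last_reset_path = RESET_NONE;

/* Stand-in for the generic pci_reset_secondary_bus(). */
static void generic_reset_secondary_bus(struct fake_dev *dev)
{
	(void)dev;
	last_reset_path = RESET_GENERIC;
}

/* Example platform hook (hypothetical). */
static void phb_reset_secondary_bus(struct fake_dev *dev)
{
	(void)dev;
	last_reset_path = RESET_CONTROLLER;
}

/* Same shape as the new pcibios_reset_secondary_bus(): the controller
 * hook wins if present, otherwise fall back to the generic reset. */
static void reset_secondary_bus_sketch(const struct fake_controller_ops *ops,
				       struct fake_dev *dev)
{
	if (ops->reset_secondary_bus) {
		ops->reset_secondary_bus(dev);
		return;
	}

	generic_reset_secondary_bus(dev);
}
```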

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH V15 00/21] Enable SRIOV on POWER8

2015-03-25 Thread Wei Yang
This patchset enables SR-IOV on POWER8.

The general idea is to put each VF into its own PE and allocate the required
resources, such as MMIO, DMA and MSI. The main difficulty comes from MMIO
allocation and the adjustment of the PF's IOV BAR.

On P8, we use M64BT to cover a PF's IOV BAR, which lets each VF sit in its
own PE. This gives more flexibility, but it also imposes restrictions on the
size and alignment of the PF's IOV BAR.

To achieve this, we need to adjust the PCI devices' resources:
1. Expand the IOV BAR properly.
   Done by pnv_pci_ioda_fixup_iov_resources().
2. Shift the IOV BAR properly.
   Done by pnv_pci_vf_resource_shift().
3. The IOV BAR alignment is calculated by an arch-dependent function instead
   of being the size of an individual VF BAR.
   Done by pnv_pcibios_sriov_resource_alignment().
4. Take the IOV BAR alignment into account when sizing and assigning.
   This is done by the commit "PCI: Take additional IOV BAR alignment in
   sizing and assigning".
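
Steps 1 and 2 can be pictured with a small user-space model. It is a sketch under stated assumptions, not the powernv code. struct iov_bar_model and its helpers are hypothetical. The model assumes the M64 window is carved into total_pe equal segments, that segment n maps to PE n, and that one VF BAR fills exactly one segment. Expanding the IOV BAR makes room for one VF BAR per segment. Shifting its start by pe_offset VF BARs then moves VF 0 into segment pe_offset.

```c
#include <stdint.h>

/*
 * Hypothetical model of the M64 scheme. Assumptions (not taken from
 * the kernel source): the M64 window is split into total_pe equal
 * segments, segment n belongs to PE n, and one VF BAR fills one segment.
 */
struct iov_bar_model {
	uint64_t vf_bar_size;	/* size of a single VF BAR */
	unsigned int total_pe;	/* number of M64 segments, one per PE */
};

/* Step 1: expand the IOV BAR so it spans one segment for every PE. */
static uint64_t expand_iov_bar(const struct iov_bar_model *m)
{
	return m->vf_bar_size * m->total_pe;
}

/* Step 2: shift the IOV BAR start so VF 0 lands in segment pe_offset. */
static uint64_t shift_iov_bar(const struct iov_bar_model *m,
			      uint64_t window_base, unsigned int pe_offset)
{
	return window_base + (uint64_t)pe_offset * m->vf_bar_size;
}

/* Segment (and, under the assumption above, PE) used by VF vf_index. */
static unsigned int vf_segment(const struct iov_bar_model *m,
			       uint64_t window_base, uint64_t iov_start,
			       unsigned int vf_index)
{
	uint64_t vf_start = iov_start + (uint64_t)vf_index * m->vf_bar_size;

	return (unsigned int)((vf_start - window_base) / m->vf_bar_size);
}
```

Take a 1 MiB VF BAR and 256 segments. The expanded IOV BAR is 256 MiB, and after a shift of 3, the VF with index 2 falls in segment 5.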

Test environment:
   The SR-IOV devices tested are Emulex Lancer (10df:e220) and
   Mellanox ConnectX-3 (15b3:1003) on POWER8.

Example of passing a VF through to a guest with VFIO:
1. Unbind the original driver and bind the device to the vfio-pci driver:
   echo :06:0d.0 > /sys/bus/pci/devices/:06:0d.0/driver/unbind
   echo 1102 0002 > /sys/bus/pci/drivers/vfio-pci/new_id
   Note: this should be done for each device in the same iommu_group.
2. Start QEMU and pass the device through with VFIO:
   /home/ywywyang/git/qemu-impreza/ppc64-softmmu/qemu-system-ppc64 \
   -M pseries -m 2048 -enable-kvm -nographic \
   -drive file=/home/ywywyang/kvm/fc19.img \
   -monitor telnet:localhost:5435,server,nowait -boot cd \
   -device spapr-pci-vfio-host-bridge,id=CXGB3,iommu=26,index=6

Verify that the responses really come from the VF:
1. Ping the guest from a machine in the same subnet (broadcast domain).
2. Run arp -n on that machine:
   9.115.251.20 ether   00:00:c9:df:ed:bf   C eth0
3. Run ifconfig in the guest:
   # ifconfig eth1
   eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
         inet 9.115.251.20  netmask 255.255.255.0  broadcast 9.115.251.255
         inet6 fe80::200:c9ff:fedf:edbf  prefixlen 64  scopeid 0x20<link>
         ether 00:00:c9:df:ed:bf  txqueuelen 1000  (Ethernet)
         RX packets 175  bytes 13278 (12.9 KiB)
         RX errors 0  dropped 0  overruns 0  frame 0
         TX packets 58  bytes 9276 (9.0 KiB)
         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
4. Check that both show the same MAC address.

Note: make sure you shut down the other network interfaces in the guest.

---
v15:
   * Add Ack from Bjorn
   * Add a more detailed comment for pnv_pci_vf_resource_shift()
v14:
   * call ppc_md.pcibios_fixup_sriov() in pcibios_add_device()
   * add more explanation in the changelog
   * The following patches have been moved to the beginning of the series.
     EEH refactor to use pci_dn:
     8ec20d6 powerpc/powernv: Use pci_dn, not device_node, in PCI config accessor
     a3460fc powerpc/pci: Refactor pci_dn
     These two patches will be modified to merge with other patches that are
     under discussion/review on the ppc mailing list. Some changes may also
     be made in other patches, which I didn't include in this series so that
     the auto build robot can work on it.
     There may be several changes in the powerpc arch code that do not affect
     the PCI core. So after this patch set passes review in the PCI community,
     I will rebase the series on the ppc branch and send it out for comment.
   * use add_res->min_align as the alignment in reassign_resources_sorted()
   * some cleanup in the documentation
v13:
   * fix error in pcibios_iov_resource_alignment(), use pdev instead of dev
   * rename vf_num to num_vfs in pcibios_sriov_enable(),
 pnv_pci_vf_resource_shift(), pnv_pci_sriov_disable(),
 pnv_pci_sriov_enable(), pnv_pci_ioda2_setup_dma_pe()
   * add more explanation in commit powerpc/pci: Don't unset PCI resources
 for VFs
   * fix IOV BAR in hotplug path as well, and don't fixup an already added
 device
   * use roundup_pow_of_two() instead of __roundup_pow_of_two()
   * this is based on v4.0-rc1
v12:
   * remove align parameter from pcibios_iov_resource_alignment()
 default version returns pci_iov_resource_size() instead of the
 align parameter
   * in powerpc pcibios_iov_resource_alignment(), return
 pci_iov_resource_size() if there's no ppc_md function pointer
   * in pci_sriov_resource_alignment(), don't re-read base, since we
 saved the required alignment when reading it the first time
   * remove vf_num parameter from add_dev_pci_info() and
 remove_dev_pci_info(); use pci_sriov_get_totalvfs() instead
   * use dev_warn() instead of pr_warn() when possible
   * check to be sure IOV BAR 

[PATCH V15 01/21] powerpc/pci: Refactor pci_dn

2015-03-25 Thread Wei Yang
From: Gavin Shan <gws...@linux.vnet.ibm.com>

pci_dn is an extension of the PCI device node and is created from the device
node. Unfortunately, VFs are enabled dynamically by the PF's driver, so they
have neither corresponding device nodes nor pci_dn. Refactor pci_dn to
support VFs:

   * pci_dn is organized as a tree.  A VF's pci_dn is put on the child
     list of the pci_dn of the PF's bridge.  The pci_dn of any other device
     is put on the child list of the pci_dn of its upstream bridge.

   * A VF's pci_dn is created dynamically when the PF enables VFs and is
     destroyed when the PF disables them.  The pci_dn of any other device
     is still created from its device node, as before.

   * For a particular PCI device (VF or not), its pci_dn can be found
     from pdev->dev.archdata.firmware_data, PCI_DN(devnode), or its
     parent's list.  The fast path (fetching pci_dn through the PCI device
     instance) is populated at early fixup time.
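
The tree described above can be modelled in a few lines. This is a simplified sketch, not the pci_dn.c code. struct pdn_node and the pdn_* helpers are hypothetical stand-ins for struct pci_dn, and a singly linked sibling list replaces the kernel's struct list_head child_list/list pair. The test attaches a VF under its PF's bridge, finds it by bus/devfn (the slow path), and unlinks it again as happens when VFs are disabled.

```c
#include <stddef.h>

#define PDN_FLAG_IOV_VF 0x01	/* plays the role of PCI_DN_FLAG_IOV_VF */

/* Hypothetical, simplified stand-in for struct pci_dn. */
struct pdn_node {
	int flags;
	int busno;
	int devfn;
	struct pdn_node *parent;
	struct pdn_node *first_child;	/* head of this node's child list */
	struct pdn_node *next_sibling;	/* link in the parent's child list */
};

/* Attach child under parent: the PF's bridge for a VF, the upstream
 * bridge for any other device. */
static void pdn_add_child(struct pdn_node *parent, struct pdn_node *child)
{
	child->parent = parent;
	child->next_sibling = parent->first_child;
	parent->first_child = child;
}

/* Slow path: scan a bridge's children for a bus/devfn match. */
static struct pdn_node *pdn_find_child(const struct pdn_node *parent,
				       int busno, int devfn)
{
	struct pdn_node *pdn;

	for (pdn = parent->first_child; pdn; pdn = pdn->next_sibling)
		if (pdn->busno == busno && pdn->devfn == devfn)
			return pdn;
	return NULL;
}

/* Detach a node, as happens to a VF's entry when the PF disables VFs. */
static void pdn_remove_child(struct pdn_node *child)
{
	struct pdn_node **link;

	if (!child->parent)
		return;

	for (link = &child->parent->first_child; *link;
	     link = &(*link)->next_sibling) {
		if (*link == child) {
			*link = child->next_sibling;
			break;
		}
	}
	child->parent = NULL;
	child->next_sibling = NULL;
}
```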

[bhelgaas: add ifdef around add_one_dev_pci_info(), use dev_printk()]
Signed-off-by: Gavin Shan <gws...@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/device.h |3 +
 arch/powerpc/include/asm/pci-bridge.h |   14 +-
 arch/powerpc/kernel/pci_dn.c  |  245 -
 arch/powerpc/platforms/powernv/pci-ioda.c |   16 ++
 4 files changed, 272 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/device.h b/arch/powerpc/include/asm/device.h
index 38faede..29992cd 100644
--- a/arch/powerpc/include/asm/device.h
+++ b/arch/powerpc/include/asm/device.h
@@ -34,6 +34,9 @@ struct dev_archdata {
 #ifdef CONFIG_SWIOTLB
 	dma_addr_t		max_direct_dma_addr;
 #endif
+#ifdef CONFIG_PPC64
+	void			*firmware_data;
+#endif
 #ifdef CONFIG_EEH
 	struct eeh_dev		*edev;
 #endif
diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 546d036..513f8f2 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -89,6 +89,7 @@ struct pci_controller {
 
 #ifdef CONFIG_PPC64
 	unsigned long buid;
+	void *firmware_data;
 #endif	/* CONFIG_PPC64 */
 
 	void *private_data;
@@ -154,9 +155,13 @@ static inline int isa_vaddr_is_ioport(void __iomem *address)
 struct iommu_table;
 
 struct pci_dn {
+	int flags;
+#define PCI_DN_FLAG_IOV_VF	0x01
+
 	int	busno;			/* pci bus number */
 	int	devfn;			/* pci device and function number */
 
+	struct  pci_dn *parent;
 	struct  pci_controller *phb;	/* for pci devices */
 	struct	iommu_table *iommu_table;	/* for phb's or bridges */
 	struct	device_node *node;	/* back-pointer to the device_node */
@@ -171,14 +176,19 @@ struct pci_dn {
 #ifdef CONFIG_PPC_POWERNV
 	int	pe_number;
 #endif
+	struct list_head child_list;
+	struct list_head list;
 };
 
 /* Get the pointer to a device_node's pci_dn */
 #define PCI_DN(dn)	((struct pci_dn *) (dn)->data)
 
+extern struct pci_dn *pci_get_pdn_by_devfn(struct pci_bus *bus,
+					   int devfn);
 extern struct pci_dn *pci_get_pdn(struct pci_dev *pdev);
-
-extern void * update_dn_pci_info(struct device_node *dn, void *data);
+extern struct pci_dn *add_dev_pci_info(struct pci_dev *pdev);
+extern void remove_dev_pci_info(struct pci_dev *pdev);
+extern void *update_dn_pci_info(struct device_node *dn, void *data);
 
 static inline int pci_device_from_OF_node(struct device_node *np,
 					  u8 *bus, u8 *devfn)
diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
index 83df307..f3a1a81 100644
--- a/arch/powerpc/kernel/pci_dn.c
+++ b/arch/powerpc/kernel/pci_dn.c
@@ -32,12 +32,223 @@
 #include <asm/ppc-pci.h>
 #include <asm/firmware.h>
 
+/*
+ * The function is used to find the firmware data of one
+ * specific PCI device, which is attached to the indicated
+ * PCI bus. For VFs, their firmware data is linked to that
+ * one of PF's bridge. For other devices, their firmware
+ * data is linked to that of their bridge.
+ */
+static struct pci_dn *pci_bus_to_pdn(struct pci_bus *bus)
+{
+	struct pci_bus *pbus;
+	struct device_node *dn;
+	struct pci_dn *pdn;
+
+	/*
+	 * We probably have virtual bus which doesn't
+	 * have associated bridge.
+	 */
+	pbus = bus;
+	while (pbus) {
+		if (pci_is_root_bus(pbus) || pbus->self)
+			break;
+
+		pbus = pbus->parent;
+	}
+
+	/*
+	 * Except virtual bus, all PCI buses should
+	 * have device nodes.
+	 */
+	dn = pci_bus_to_OF_node(pbus);
+	pdn = dn ? PCI_DN(dn) : NULL;
+
+	return pdn;
+}
+
+struct pci_dn *pci_get_pdn_by_devfn(struct pci_bus *bus,
+				    int devfn)
+{
+	struct device_node *dn = NULL;
+	struct pci_dn *parent, 

Re: [PATCH] powerpc/powernv: Remove powernv RTAS support

2015-03-25 Thread Stewart Smith
Michael Ellerman <m...@ellerman.id.au> writes:

> The powernv code has some conditional support for running on bare metal
> machines that have no OPAL firmware, but provide RTAS.
>
> No released machines ever supported that, and even in the lab it was
> just a transitional hack in the days when OPAL was still being
> developed.
>
> So remove the code.
>
> Signed-off-by: Michael Ellerman <m...@ellerman.id.au>

The only place I can think of where this could still be remotely possible
is the simulator... and we should instead make the OPAL calls work
properly in the simulator for all the RTAS functionality (that we care
about).

In related news... I should poke the simulator guys.

Acked-by: Stewart Smith <stew...@linux.vnet.ibm.com>


[PATCH V15 03/21] PCI: Print more info in sriov_enable() error message

2015-03-25 Thread Wei Yang
From: Bjorn Helgaas <bhelg...@google.com>

If we don't have space for all the bus numbers required to enable VFs,
print the largest bus number required and the range available.

No functional change; improved error message only.
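
The check is plain arithmetic on routing IDs. A VF's routing ID is the PF's routing ID plus the First VF Offset plus id times the VF Stride, and its bus number is the upper byte of that value. The sketch below uses plain integers in place of struct pci_dev and struct pci_sriov, and both function names are hypothetical. vf_bus_number() mirrors the formula of virtfn_bus(), and vfs_fit_in_bus_range() is the comparison that sriov_enable() makes against busn_res.end.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bus number of VF `id` (0-based): the PF's bus plus the carry out of
 * devfn + First VF Offset + id * VF Stride. Like the kernel's u8, the
 * result wraps above bus 0xff. */
static uint8_t vf_bus_number(uint8_t pf_bus, uint8_t pf_devfn,
			     uint16_t offset, uint16_t stride, int id)
{
	return (uint8_t)(pf_bus + ((pf_devfn + offset + stride * id) >> 8));
}

/* The last VF's bus must not exceed the end of the bus-number range
 * available behind the parent bridge. */
static bool vfs_fit_in_bus_range(uint8_t pf_bus, uint8_t pf_devfn,
				 uint16_t offset, uint16_t stride,
				 int nr_virtfn, uint8_t busn_end)
{
	uint8_t last = vf_bus_number(pf_bus, pf_devfn, offset, stride,
				     nr_virtfn - 1);

	return last <= busn_end;
}
```

Consider a PF at bus 05, devfn 00, with offset 1 and stride 1. In that case 64 VFs stay on bus 05, but 256 VFs need bus 06. If the range ends at 05, the new message would report bus 06 and the available range.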

Signed-off-by: Bjorn Helgaas <bhelg...@google.com>
Acked-by: Wei Yang <weiy...@linux.vnet.ibm.com>
---
 drivers/pci/iov.c |    7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 4b3a4ea..c4c33ea 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -180,6 +180,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
 	struct pci_dev *pdev;
 	struct pci_sriov *iov = dev->sriov;
 	int bars = 0;
+	u8 bus;
 
 	if (!nr_virtfn)
 		return 0;
@@ -216,8 +217,10 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
 	iov->offset = offset;
 	iov->stride = stride;
 
-	if (virtfn_bus(dev, nr_virtfn - 1) > dev->bus->busn_res.end) {
-		dev_err(&dev->dev, "SR-IOV: bus number out of range\n");
+	bus = virtfn_bus(dev, nr_virtfn - 1);
+	if (bus > dev->bus->busn_res.end) {
+		dev_err(&dev->dev, "can't enable %d VFs (bus %02x out of range of %pR)\n",
+			nr_virtfn, bus, &dev->bus->busn_res);
 		return -ENOMEM;
 	}
 
-- 
1.7.9.5


[PATCH V15 12/21] PCI: Consider additional PF's IOV BAR alignment in sizing and assigning

2015-03-25 Thread Wei Yang
When sizing and assigning resources, we divide the resources into two
lists: the requested list and the additional list.  We don't consider the
alignment of additional VF(n) BAR space.

This is because the alignment required for the VF(n) BAR space is the size
of an individual VF BAR, not the size of the space for *all* VFs.  But we
want additional alignment to support partitioning on PowerNV.

Consider the additional IOV BAR alignment when sizing and assigning
resources.  When there is not enough system MMIO space to accommodate both
the requested list and the additional list, the PF's IOV BAR alignment will
not contribute to the bridge window's alignment. When there is enough
system MMIO space for both lists, the additional alignment will contribute
to it.

The additional alignment is stored in the min_align of pci_dev_resource,
which is stored in the additional list by add_to_list() at the end of
pbus_size_mem(). The additional alignment is calculated in
pci_resource_alignment().  For an IOV BAR, an arch-dependent function
provides the alignment, so each architecture can supply its own value.
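
The rule in the last two paragraphs can be modelled simply. A bridge window takes the largest alignment among its children, and a child's additional (min_align) alignment counts only when the additional list is honoured. struct bar_req and bridge_window_align() are hypothetical. The real pbus_size_mem() and __assign_resources_sorted() code also tracks sizes, failure lists and retries, which this sketch leaves out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-BAR record: the alignment on the requested list and
 * the (possibly larger) alignment kept as min_align on the additional
 * list. */
struct bar_req {
	uint64_t align;
	uint64_t add_align;	/* 0 when there is no additional entry */
};

/* Alignment the bridge window must satisfy: the largest child alignment.
 * Additional alignments only count when both lists fit (use_additional). */
static uint64_t bridge_window_align(const struct bar_req *bars, size_t n,
				    int use_additional)
{
	uint64_t align = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		uint64_t a = bars[i].align;

		if (use_additional && bars[i].add_align > a)
			a = bars[i].add_align;
		if (a > align)
			align = a;
	}
	return align;
}
```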

[bhelgaas: changelog, printk cast]
Signed-off-by: Wei Yang <weiy...@linux.vnet.ibm.com>
Acked-by: Bjorn Helgaas <bhelg...@google.com>
---
 drivers/pci/setup-bus.c |   95 +++
 1 file changed, 79 insertions(+), 16 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index e3e17f3..6603d40 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -99,8 +99,8 @@ static void remove_from_list(struct list_head *head,
 	}
 }
 
-static resource_size_t get_res_add_size(struct list_head *head,
-					struct resource *res)
+static struct pci_dev_resource *res_to_dev_res(struct list_head *head,
+					       struct resource *res)
 {
 	struct pci_dev_resource *dev_res;
 
@@ -109,17 +109,37 @@ static resource_size_t get_res_add_size(struct list_head *head,
 			int idx = res - &dev_res->dev->resource[0];
 
 			dev_printk(KERN_DEBUG, &dev_res->dev->dev,
-				 "res[%d]=%pR get_res_add_size add_size %llx\n",
+				 "res[%d]=%pR res_to_dev_res add_size %llx min_align %llx\n",
 				 idx, dev_res->res,
-				 (unsigned long long)dev_res->add_size);
+				 (unsigned long long)dev_res->add_size,
+				 (unsigned long long)dev_res->min_align);
 
-			return dev_res->add_size;
+			return dev_res;
 		}
 	}
 
-	return 0;
+	return NULL;
 }
 
+static resource_size_t get_res_add_size(struct list_head *head,
+					struct resource *res)
+{
+	struct pci_dev_resource *dev_res;
+
+	dev_res = res_to_dev_res(head, res);
+	return dev_res ? dev_res->add_size : 0;
+}
+
+static resource_size_t get_res_add_align(struct list_head *head,
+					 struct resource *res)
+{
+	struct pci_dev_resource *dev_res;
+
+	dev_res = res_to_dev_res(head, res);
+	return dev_res ? dev_res->min_align : 0;
+}
+
+
 /* Sort resources by alignment */
 static void pdev_sort_resources(struct pci_dev *dev, struct list_head *head)
 {
@@ -215,7 +235,7 @@ static void reassign_resources_sorted(struct list_head *realloc_head,
 	struct resource *res;
 	struct pci_dev_resource *add_res, *tmp;
 	struct pci_dev_resource *dev_res;
-	resource_size_t add_size;
+	resource_size_t add_size, align;
 	int idx;
 
 	list_for_each_entry_safe(add_res, tmp, realloc_head, list) {
@@ -238,13 +258,13 @@ static void reassign_resources_sorted(struct list_head *realloc_head,
 
 		idx = res - &add_res->dev->resource[0];
 		add_size = add_res->add_size;
+		align = add_res->min_align;
 		if (!resource_size(res)) {
-			res->start = add_res->start;
+			res->start = align;
 			res->end = res->start + add_size - 1;
 			if (pci_assign_resource(add_res->dev, idx))
 				reset_resource(res);
 		} else {
-			resource_size_t align = add_res->min_align;
 			res->flags |= add_res->flags &
 				 (IORESOURCE_STARTALIGN|IORESOURCE_SIZEALIGN);
 			if (pci_reassign_resource(add_res->dev, idx,
@@ -368,8 +388,9 @@ static void __assign_resources_sorted(struct list_head *head,
 	LIST_HEAD(save_head);
 	LIST_HEAD(local_fail_head);
 	struct pci_dev_resource *save_res;
-	struct pci_dev_resource *dev_res, *tmp_res;
+	struct pci_dev_resource *dev_res, *tmp_res, *dev_res2;
 	unsigned long fail_type;
+	resource_size_t add_align, align;
 
/* 

[PATCH 24/27] powerpc: Remove shim for pci_controller_ops.probe_mode

2015-03-25 Thread Daniel Axtens
This also moves the PCI_PROBE_* defines back to asm/pci.h, as explained in
the commit that created the shim.

Signed-off-by: Daniel Axtens <d...@axtens.net>
---
 arch/powerpc/include/asm/machdep.h    |  1 -
 arch/powerpc/include/asm/pci-bridge.h | 16 ----------------
 arch/powerpc/include/asm/pci.h        |  5 +++++
 arch/powerpc/kernel/pci-common.c      |  4 ++--
 arch/powerpc/kernel/pci-hotplug.c     |  6 +++++-
 arch/powerpc/kernel/pci_of_scan.c     |  6 +++++-
 6 files changed, 17 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index 5549b6c..dfc8d2b 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -127,7 +127,6 @@ struct machdep_calls {
 	/* PCI stuff */
 	/* Called after scanning the bus, before allocating resources */
 	void		(*pcibios_fixup)(void);
-	int		(*pci_probe_mode)(struct pci_bus *);
 	void		(*pci_irq_fixup)(struct pci_dev *dev);
 	int		(*pcibios_root_bridge_prepare)(struct pci_host_bridge
 				*bridge);
diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 029def0..b5d8631 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -12,11 +12,6 @@
 #include <linux/ioport.h>
 #include <asm-generic/pci-bridge.h>
 
-/* Return values for pci_controller_ops.probe_mode function */
-#define PCI_PROBE_NONE		-1	/* Don't look at this bus at all */
-#define PCI_PROBE_NORMAL	0	/* Do normal PCI probing */
-#define PCI_PROBE_DEVTREE	1	/* Instantiate from device tree */
-
 struct device_node;
 
 /*
@@ -305,16 +300,5 @@ static inline void dma_bus_setup(struct pci_bus *bus)
 		ppc_md.pci_dma_bus_setup(bus);
 }
 
-static inline int probe_mode(struct pci_bus *bus)
-{
-	struct pci_controller *hose = pci_bus_to_host(bus);
-
-	if (hose->controller_ops.probe_mode)
-		return hose->controller_ops.probe_mode(bus);
-	if (ppc_md.pci_probe_mode)
-		return ppc_md.pci_probe_mode(bus);
-	return PCI_PROBE_NORMAL;
-}
-
 #endif /* __KERNEL__ */
 #endif /* _ASM_POWERPC_PCI_BRIDGE_H */
diff --git a/arch/powerpc/include/asm/pci.h b/arch/powerpc/include/asm/pci.h
index 8745067..4aef8d6 100644
--- a/arch/powerpc/include/asm/pci.h
+++ b/arch/powerpc/include/asm/pci.h
@@ -22,6 +22,11 @@
 
 #include <asm-generic/pci-dma-compat.h>
 
+/* Return values for pci_controller_ops.probe_mode function */
+#define PCI_PROBE_NONE		-1	/* Don't look at this bus at all */
+#define PCI_PROBE_NORMAL	0	/* Do normal PCI probing */
+#define PCI_PROBE_DEVTREE	1	/* Instantiate from device tree */
+
 #define PCIBIOS_MIN_IO		0x1000
 #define PCIBIOS_MIN_MEM		0x1000
 
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 5b90e99c..a61ecb4 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -1628,8 +1628,8 @@ void pcibios_scan_phb(struct pci_controller *hose)
 
 	/* Get probe mode and perform scan */
 	mode = PCI_PROBE_NORMAL;
-	if (node)
-		mode = probe_mode(bus);
+	if (node && hose->controller_ops.probe_mode)
+		mode = hose->controller_ops.probe_mode(bus);
 	pr_debug("probe mode: %d\n", mode);
 	if (mode == PCI_PROBE_DEVTREE)
 		of_scan_bus(node, bus);
diff --git a/arch/powerpc/kernel/pci-hotplug.c b/arch/powerpc/kernel/pci-hotplug.c
index 504d823..e9b0a4a 100644
--- a/arch/powerpc/kernel/pci-hotplug.c
+++ b/arch/powerpc/kernel/pci-hotplug.c
@@ -73,12 +73,16 @@ void pcibios_add_pci_devices(struct pci_bus * bus)
 {
 	int slotno, mode, pass, max;
 	struct pci_dev *dev;
+	struct pci_controller *hose;
 	struct device_node *dn = pci_bus_to_OF_node(bus);
 
 	eeh_add_device_tree_early(dn);
 
+	hose = pci_bus_to_host(bus);
+
 	mode = PCI_PROBE_NORMAL;
-	mode = probe_mode(bus);
+	if (hose->controller_ops.probe_mode)
+		mode = hose->controller_ops.probe_mode(bus);
 
 	if (mode == PCI_PROBE_DEVTREE) {
 		/* use ofdt-based probe */
diff --git a/arch/powerpc/kernel/pci_of_scan.c b/arch/powerpc/kernel/pci_of_scan.c
index ae1767b..8312962 100644
--- a/arch/powerpc/kernel/pci_of_scan.c
+++ b/arch/powerpc/kernel/pci_of_scan.c
@@ -207,6 +207,7 @@ void of_scan_pci_bridge(struct pci_dev *dev)
 {
 	struct device_node *node = dev->dev.of_node;
 	struct pci_bus *bus;
+	struct pci_controller *hose;
 	const __be32 *busrange, *ranges;
 	int len, i, mode;
 	struct pci_bus_region region;
@@ -286,8 +287,11 @@ void of_scan_pci_bridge(struct pci_dev *dev)
 		 bus->number);
 	pr_debug("bus name: %s\n", bus->name);
 
+	hose = pci_bus_to_host(bus);
+
 	mode = PCI_PROBE_NORMAL;
-   

[PATCH 23/27] powerpc: Remove shim for pci_controller_ops.enable_device_hook

2015-03-25 Thread Daniel Axtens
Signed-off-by: Daniel Axtens <d...@axtens.net>
---
 arch/powerpc/include/asm/machdep.h    |  4 ----
 arch/powerpc/include/asm/pci-bridge.h | 11 -----------
 arch/powerpc/kernel/pci-common.c      |  7 +++++--
 3 files changed, 5 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index f178cf1..5549b6c 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -237,10 +237,6 @@ struct machdep_calls {
 	/* Called for each PCI bus in the system when it's probed */
 	void (*pcibios_fixup_bus)(struct pci_bus *);
 
-	/* Called when pci_enable_device() is called. Returns true to
-	 * allow assignment/enabling of the device. */
-	bool (*pcibios_enable_device_hook)(struct pci_dev *);
-
 	/* Called after scan and before resource survey */
 	void (*pcibios_fixup_phb)(struct pci_controller *hose);
 
diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index b08db93..029def0 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -316,16 +316,5 @@ static inline int probe_mode(struct pci_bus *bus)
 	return PCI_PROBE_NORMAL;
 }
 
-static inline bool enable_device_hook(struct pci_dev *dev)
-{
-	struct pci_controller *hose = pci_bus_to_host(dev->bus);
-
-	if (hose->controller_ops.enable_device_hook)
-		return hose->controller_ops.enable_device_hook(dev);
-	if (ppc_md.pcibios_enable_device_hook)
-		return ppc_md.pcibios_enable_device_hook(dev);
-	return true;
-}
-
 #endif /* __KERNEL__ */
 #endif /* _ASM_POWERPC_PCI_BRIDGE_H */
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index a535d31..5b90e99c 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -1452,8 +1452,11 @@ EXPORT_SYMBOL_GPL(pcibios_finish_adding_to_bus);
 
 int pcibios_enable_device(struct pci_dev *dev, int mask)
 {
-	if (!enable_device_hook(dev))
-		return -EINVAL;
+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
+
+	if (hose->controller_ops.enable_device_hook)
+		if (!hose->controller_ops.enable_device_hook(dev))
+			return -EINVAL;
 
 	return pci_enable_resources(dev, mask);
 }
-- 
2.1.4
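
After this patch, a PHB without an enable_device_hook always lets the device through. A hook that returns false is turned into -EINVAL before any resources are enabled. The sketch below shows that control flow in user space. The fake_* types and allow_even_ids() are hypothetical stand-ins for struct pci_dev, pci_controller_ops and a platform hook, and fake_enable_resources() stands in for pci_enable_resources().

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct pci_dev and pci_controller_ops. */
struct fake_dev {
	int id;
};

struct fake_controller_ops {
	/* Optional hook: return false to refuse enabling the device. */
	bool (*enable_device_hook)(struct fake_dev *dev);
};

/* Stand-in for pci_enable_resources(); always succeeds here. */
static int fake_enable_resources(struct fake_dev *dev, int mask)
{
	(void)dev;
	(void)mask;
	return 0;
}

/* Same control flow as the new pcibios_enable_device(): a missing hook
 * means "allow", and a hook returning false becomes -EINVAL. */
static int enable_device_sketch(const struct fake_controller_ops *ops,
				struct fake_dev *dev, int mask)
{
	if (ops->enable_device_hook && !ops->enable_device_hook(dev))
		return -EINVAL;

	return fake_enable_resources(dev, mask);
}

/* Example hook (hypothetical): only even-numbered devices may be enabled. */
static bool allow_even_ids(struct fake_dev *dev)
{
	return (dev->id % 2) == 0;
}
```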


[PATCH 27/27] powerpc: dart_iommu: Remove check for controller_ops == NULL case

2015-03-25 Thread Daniel Axtens
Now that we have ported the calls to iommu_init_early_dart to always
supply a pci_controller_ops struct, we can safely drop the check.

Signed-off-by: Daniel Axtens <d...@axtens.net>
---
 arch/powerpc/sysdev/dart_iommu.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/sysdev/dart_iommu.c b/arch/powerpc/sysdev/dart_iommu.c
index 87b8000..d00a566 100644
--- a/arch/powerpc/sysdev/dart_iommu.c
+++ b/arch/powerpc/sysdev/dart_iommu.c
@@ -395,20 +395,17 @@ void __init iommu_init_early_dart(struct pci_controller_ops *controller_ops)
 	if (dart_is_u4)
 		ppc_md.dma_set_mask = dart_dma_set_mask;
 
-	if (controller_ops) {
-		controller_ops->dma_dev_setup = pci_dma_dev_setup_dart;
-		controller_ops->dma_bus_setup = pci_dma_bus_setup_dart;
-	}
+	controller_ops->dma_dev_setup = pci_dma_dev_setup_dart;
+	controller_ops->dma_bus_setup = pci_dma_bus_setup_dart;
+
 	/* Setup pci_dma ops */
 	set_pci_dma_ops(&dma_iommu_ops);
 	return;
 
  bail:
 	/* If init failed, use direct iommu and null setup functions */
-	if (controller_ops) {
-		controller_ops->dma_dev_setup = NULL;
-		controller_ops->dma_bus_setup = NULL;
-	}
+	controller_ops->dma_dev_setup = NULL;
+	controller_ops->dma_bus_setup = NULL;
 
 	/* Setup pci_dma ops */
 	set_pci_dma_ops(&dma_direct_ops);
-- 
2.1.4


[PATCH V15 06/21] PCI: Index IOV resources in the conventional style

2015-03-25 Thread Wei Yang
From: Bjorn Helgaas <bhelg...@google.com>

Most of PCI uses "res = &dev->resource[i]", not "res = dev->resource + i".
Use that style in iov.c also.

No functional change.
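
The two spellings yield the same pointer, because &a[i + K] is defined as a + (i + K). That is why the patch has no functional change. The small check below uses hypothetical stand-in types, with IOV_BASE playing the role of PCI_IOV_RESOURCES.

```c
/* Hypothetical stand-ins; IOV_BASE plays the role of PCI_IOV_RESOURCES. */
#define NUM_RESOURCES 17
#define IOV_BASE 7

struct fake_resource {
	unsigned long start;
	unsigned long end;
};

struct fake_dev {
	struct fake_resource resource[NUM_RESOURCES];
};

/* Old iov.c spelling: pointer arithmetic on the array. */
static struct fake_resource *iov_res_old(struct fake_dev *dev, int i)
{
	return dev->resource + IOV_BASE + i;
}

/* Conventional PCI spelling: address of an indexed element. */
static struct fake_resource *iov_res_new(struct fake_dev *dev, int i)
{
	return &dev->resource[i + IOV_BASE];
}
```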

Signed-off-by: Bjorn Helgaas <bhelg...@google.com>
Acked-by: Wei Yang <weiy...@linux.vnet.ibm.com>
---
 drivers/pci/iov.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 5bca0e1..27b98c3 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -95,7 +95,7 @@ static int virtfn_add(struct pci_dev *dev, int id, int reset)
 	virtfn->multifunction = 0;
 
 	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
-		res = dev->resource + PCI_IOV_RESOURCES + i;
+		res = &dev->resource[i + PCI_IOV_RESOURCES];
 		if (!res->parent)
 			continue;
 		virtfn->resource[i].name = pci_name(virtfn);
@@ -212,7 +212,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
 	nres = 0;
 	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
 		bars |= (1 << (i + PCI_IOV_RESOURCES));
-		res = dev->resource + PCI_IOV_RESOURCES + i;
+		res = &dev->resource[i + PCI_IOV_RESOURCES];
 		if (res->parent)
 			nres++;
 	}
@@ -373,7 +373,7 @@ found:
 
 	nres = 0;
 	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
-		res = dev->resource + PCI_IOV_RESOURCES + i;
+		res = &dev->resource[i + PCI_IOV_RESOURCES];
 		bar64 = __pci_read_base(dev, pci_bar_unknown, res,
 					pos + PCI_SRIOV_BAR + i * 4);
 		if (!res->flags)
@@ -417,7 +417,7 @@ found:
 
 failed:
 	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
-		res = dev->resource + PCI_IOV_RESOURCES + i;
+		res = &dev->resource[i + PCI_IOV_RESOURCES];
 		res->flags = 0;
 	}
 
-- 
1.7.9.5

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH V15 05/21] PCI: Keep individual VF BAR size in struct pci_sriov

2015-03-25 Thread Wei Yang
Currently we don't store the individual VF BAR size.  We calculate it when
needed by dividing the PF's IOV resource size (which contains space for
*all* the VFs) by total_VFs or by reading the BAR in the SR-IOV capability
again.

Keep the individual VF BAR size in struct pci_sriov.barsz[], add
pci_iov_resource_size() to retrieve it, and use that instead of doing the
division or reading the SR-IOV capability BAR.
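
The sketch below condenses the new bookkeeping. Each VF BAR size is recorded once at init time, and the upper half of a 64-bit BAR is skipped, which is what the bar64 return value of __pci_read_base() is used for. VF id's copy of BAR i then starts id VF BAR sizes into the PF's IOV resource, as in virtfn_add(). struct sriov_model and its helpers are hypothetical. The struct must start zeroed, much as the patch now allocates iov with kzalloc() before the loop so that barsz[] can be filled in.

```c
#include <stdint.h>

#define SRIOV_NUM_BARS 6	/* plays the role of PCI_SRIOV_NUM_BARS */

/* Hypothetical condensed model of what sriov_init() now records. */
struct sriov_model {
	uint64_t barsz[SRIOV_NUM_BARS];	/* size of one VF's BAR i */
};

/* Record each VF BAR size once. is_64bit[i] marks a 64-bit BAR whose
 * upper half occupies slot i + 1, so that slot is skipped. */
static void record_vf_bar_sizes(struct sriov_model *iov,
				const uint64_t *vf_bar_size,
				const int *is_64bit)
{
	int i;

	for (i = 0; i < SRIOV_NUM_BARS; i++) {
		iov->barsz[i] = vf_bar_size[i];
		i += is_64bit[i];
	}
}

/* Start of VF `id`'s copy of BAR `bar` inside the PF's IOV resource. */
static uint64_t vf_bar_start(const struct sriov_model *iov,
			     uint64_t iov_res_start, int bar, int id)
{
	return iov_res_start + iov->barsz[bar] * (uint64_t)id;
}
```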

[bhelgaas: rename to barsz[], simplify barsz[] index computation, remove
SR-IOV capability BAR sizing]
Signed-off-by: Wei Yang <weiy...@linux.vnet.ibm.com>
Acked-by: Bjorn Helgaas <bhelg...@google.com>
---
 drivers/pci/iov.c   |   39 ++++++++++++++++++++-------------------
 drivers/pci/pci.h   |    1 +
 include/linux/pci.h |    3 +++
 3 files changed, 24 insertions(+), 19 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 05f9d97..5bca0e1 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -57,6 +57,14 @@ static void virtfn_remove_bus(struct pci_bus *physbus, struct pci_bus *virtbus)
 	pci_remove_bus(virtbus);
 }
 
+resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno)
+{
+	if (!dev->is_physfn)
+		return 0;
+
+	return dev->sriov->barsz[resno - PCI_IOV_RESOURCES];
+}
+
 static int virtfn_add(struct pci_dev *dev, int id, int reset)
 {
 	int i;
@@ -92,8 +100,7 @@ static int virtfn_add(struct pci_dev *dev, int id, int reset)
 			continue;
 		virtfn->resource[i].name = pci_name(virtfn);
 		virtfn->resource[i].flags = res->flags;
-		size = resource_size(res);
-		do_div(size, iov->total_VFs);
+		size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES);
 		virtfn->resource[i].start = res->start + size * id;
 		virtfn->resource[i].end = virtfn->resource[i].start + size - 1;
 		rc = request_resource(res, &virtfn->resource[i]);
@@ -311,7 +318,7 @@ static void sriov_disable(struct pci_dev *dev)
 
 static int sriov_init(struct pci_dev *dev, int pos)
 {
-	int i;
+	int i, bar64;
 	int rc;
 	int nres;
 	u32 pgsz;
@@ -360,29 +367,29 @@ found:
 	pgsz &= ~(pgsz - 1);
 	pci_write_config_dword(dev, pos + PCI_SRIOV_SYS_PGSIZE, pgsz);
 
+	iov = kzalloc(sizeof(*iov), GFP_KERNEL);
+	if (!iov)
+		return -ENOMEM;
+
 	nres = 0;
 	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
 		res = dev->resource + PCI_IOV_RESOURCES + i;
-		i += __pci_read_base(dev, pci_bar_unknown, res,
-				     pos + PCI_SRIOV_BAR + i * 4);
+		bar64 = __pci_read_base(dev, pci_bar_unknown, res,
+					pos + PCI_SRIOV_BAR + i * 4);
 		if (!res->flags)
 			continue;
 		if (resource_size(res) & (PAGE_SIZE - 1)) {
 			rc = -EIO;
 			goto failed;
 		}
+		iov->barsz[i] = resource_size(res);
 		res->end = res->start + resource_size(res) * total - 1;
 		dev_info(&dev->dev, "VF(n) BAR%d space: %pR (contains BAR%d for %d VFs)\n",
 			 i, res, i, total);
+		i += bar64;
 		nres++;
 	}
 
-	iov = kzalloc(sizeof(*iov), GFP_KERNEL);
-	if (!iov) {
-		rc = -ENOMEM;
-		goto failed;
-	}
-
 	iov->pos = pos;
 	iov->nres = nres;
 	iov->ctrl = ctrl;
@@ -414,6 +421,7 @@ failed:
 		res->flags = 0;
 	}
 
+	kfree(iov);
 	return rc;
 }
 
@@ -510,14 +518,7 @@ int pci_iov_resource_bar(struct pci_dev *dev, int resno)
  */
 resource_size_t pci_sriov_resource_alignment(struct pci_dev *dev, int resno)
 {
-	struct resource tmp;
-	int reg = pci_iov_resource_bar(dev, resno);
-
-	if (!reg)
-		return 0;
-
-	__pci_read_base(dev, pci_bar_unknown, &tmp, reg);
-	return resource_alignment(&tmp);
+	return pci_iov_resource_size(dev, resno);
 }
 
 /**
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 4091f82..5732964 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -247,6 +247,7 @@ struct pci_sriov {
struct pci_dev *dev;/* lowest numbered PF */
struct pci_dev *self;   /* this PF */
struct mutex lock;  /* lock for VF bus */
+   resource_size_t barsz[PCI_SRIOV_NUM_BARS];  /* VF BAR size */
 };
 
 #ifdef CONFIG_PCI_ATS
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 211e9da..1559658 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1675,6 +1675,7 @@ int pci_num_vf(struct pci_dev *dev);
 int pci_vfs_assigned(struct pci_dev *dev);
 int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs);
 int pci_sriov_get_totalvfs(struct pci_dev *dev);
+resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno);
 #else
 static inline int 

[PATCH V15 16/21] powerpc/powernv: Implement pcibios_iov_resource_alignment() on powernv

2015-03-25 Thread Wei Yang
Implement pcibios_iov_resource_alignment() on the powernv platform.

On the PowerNV platform, the IOV BAR can be in one of three states:
1. initial state: the IOV BAR size is a multiple of the VF BAR size
2. after expansion: the IOV BAR size is expanded to meet the M64 segment size
3. sizing stage: the IOV BAR is truncated to 0

pnv_pci_iov_resource_alignment() handles each of these three cases.
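A user-space sketch of the decision pnv_pci_iov_resource_alignment() makes across the three states. This mirrors the pci-ioda.c hunk in this patch and is not kernel code. `struct iov_bar_state` and `iov_alignment()` are hypothetical stand-ins for `resource_size(&pdev->resource[resno])`, `pci_iov_resource_size()` and `pdn->vfs_expanded`. The sketch assumes that an expanded BAR is a whole number of VF BARs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified view of the state that
 * pnv_pci_iov_resource_alignment() consults; not kernel types. */
struct iov_bar_state {
	uint64_t iov_bar_size;     /* size of the PF's IOV BAR; 0 while sizing */
	uint64_t vf_bar_size;      /* what pci_iov_resource_size() would return */
	unsigned int vfs_expanded; /* like pdn->vfs_expanded; 0 if not expanded */
};

static uint64_t iov_alignment(const struct iov_bar_state *s)
{
	/* Assumed invariant: the IOV BAR is a whole number of VF BARs. */
	assert(s->vf_bar_size == 0 || s->iov_bar_size % s->vf_bar_size == 0);

	/* Cases 1 and 2: the BAR still has its initial or expanded size,
	 * so the whole BAR is used as the alignment. */
	if (s->iov_bar_size)
		return s->iov_bar_size;

	/* Case 3: sizing stage, the BAR was truncated to 0.  Rebuild the
	 * alignment from the per-VF size, scaled if it was expanded. */
	if (s->vfs_expanded)
		return (uint64_t)s->vfs_expanded * s->vf_bar_size;

	return s->vf_bar_size;
}
```

For example, with a 1MB VF BAR expanded for 256 VFs, a BAR that has been truncated to 0 during sizing still reports a 256MB alignment. Without expansion, it falls back to the 1MB VF BAR size, as the generic code would.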

[bhelgaas: adjust to drop align parameter, return pci_iov_resource_size()
if no ppc_md machdep_call version]
Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/machdep.h|1 +
 arch/powerpc/kernel/pci-common.c  |   10 ++
 arch/powerpc/platforms/powernv/pci-ioda.c |   20 
 3 files changed, 31 insertions(+)

diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index 1d72fda..37e451f 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -252,6 +252,7 @@ struct machdep_calls {
 
 #ifdef CONFIG_PCI_IOV
void (*pcibios_fixup_sriov)(struct pci_dev *pdev);
+   resource_size_t (*pcibios_iov_resource_alignment)(struct pci_dev *, int resno);
 #endif /* CONFIG_PCI_IOV */
 
/* Called to shutdown machine specific hardware not already controlled
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 375bf70..9a306ff 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -130,6 +130,16 @@ void pcibios_reset_secondary_bus(struct pci_dev *dev)
pci_reset_secondary_bus(dev);
 }
 
+#ifdef CONFIG_PCI_IOV
+resource_size_t pcibios_iov_resource_alignment(struct pci_dev *pdev, int resno)
+{
+   if (ppc_md.pcibios_iov_resource_alignment)
+   return ppc_md.pcibios_iov_resource_alignment(pdev, resno);
+
+   return pci_iov_resource_size(pdev, resno);
+}
+#endif /* CONFIG_PCI_IOV */
+
 static resource_size_t pcibios_io_size(const struct pci_controller *hose)
 {
 #ifdef CONFIG_PPC64
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index cadd3fb..93ec16c 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1965,6 +1965,25 @@ static resource_size_t pnv_pci_window_alignment(struct pci_bus *bus,
return phb-ioda.io_segsize;
 }
 
+#ifdef CONFIG_PCI_IOV
+static resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
+ int resno)
+{
+   struct pci_dn *pdn = pci_get_pdn(pdev);
+   resource_size_t align, iov_align;
+
+   iov_align = resource_size(&pdev->resource[resno]);
+   if (iov_align)
+   return iov_align;
+
+   align = pci_iov_resource_size(pdev, resno);
+   if (pdn->vfs_expanded)
+   return pdn->vfs_expanded * align;
+
+   return align;
+}
+#endif /* CONFIG_PCI_IOV */
+
 /* Prevent enabling devices for which we couldn't properly
  * assign a PE
  */
@@ -2167,6 +2186,7 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
ppc_md.pcibios_reset_secondary_bus = pnv_pci_reset_secondary_bus;
 #ifdef CONFIG_PCI_IOV
ppc_md.pcibios_fixup_sriov = pnv_pci_ioda_fixup_iov_resources;
+   ppc_md.pcibios_iov_resource_alignment = pnv_pci_iov_resource_alignment;
 #endif /* CONFIG_PCI_IOV */
pci_add_flags(PCI_REASSIGN_ALL_RSRC);
 
-- 
1.7.9.5

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH V15 21/21] powerpc/pci: Add PCI resource alignment documentation

2015-03-25 Thread Wei Yang
In order to enable SRIOV on the PowerNV platform, the PF's IOV BAR needs to
be adjusted:

1. size expanded
2. aligned to M64BT size

This patch documents why this change is needed and how it is done.

[bhelgaas: reformat, clarify, expand]
Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com
---
 .../powerpc/pci_iov_resource_on_powernv.txt|  301 
 1 file changed, 301 insertions(+)
 create mode 100644 Documentation/powerpc/pci_iov_resource_on_powernv.txt

diff --git a/Documentation/powerpc/pci_iov_resource_on_powernv.txt b/Documentation/powerpc/pci_iov_resource_on_powernv.txt
new file mode 100644
index 000..b55c5cd
--- /dev/null
+++ b/Documentation/powerpc/pci_iov_resource_on_powernv.txt
@@ -0,0 +1,301 @@
+Wei Yang weiy...@linux.vnet.ibm.com
+Benjamin Herrenschmidt b...@au1.ibm.com
+Bjorn Helgaas bhelg...@google.com
+26 Aug 2014
+
+This document describes the hardware requirements for PCI MMIO resource
+sizing and assignment on PowerKVM and how generic PCI code handles these
+requirements. The first two sections describe the concepts of Partitionable
+Endpoints and the implementation on P8 (IODA2). The next two sections talk
+about considerations for enabling SRIOV on IODA2.
+
+1. Introduction to Partitionable Endpoints
+
+A Partitionable Endpoint (PE) is a way to group the various resources
+associated with a device or a set of devices to provide isolation between
+partitions (i.e., filtering of DMA, MSIs etc.) and to provide a mechanism
+to freeze a device that is causing errors in order to limit the possibility
+of propagation of bad data.
+
+There is thus, in HW, a table of PE states that contains a pair of frozen
+state bits (one for MMIO and one for DMA; they get set together but can be
+cleared independently) for each PE.
+
+When a PE is frozen, all stores in any direction are dropped and all loads
+return all 1's. MSIs are also blocked. There's a bit more state that
+captures things like the details of the error that caused the freeze etc., but
+that's not critical.
+
+The interesting part is how the various PCIe transactions (MMIO, DMA, ...)
+are matched to their corresponding PEs.
+
+The following section provides a rough description of what we have on P8
+(IODA2).  Keep in mind that this is all per PHB (PCI host bridge).  Each PHB
+is a completely separate HW entity that replicates the entire logic, so it
+has its own set of PEs, etc.
+
+2. Implementation of Partitionable Endpoints on P8 (IODA2)
+
+P8 supports up to 256 Partitionable Endpoints per PHB.
+
+  * Inbound
+
+For DMA, MSIs and inbound PCIe error messages, we have a table (in
+memory but accessed in HW by the chip) that provides a direct
+correspondence between a PCIe RID (bus/dev/fn) and a PE number.
+We call this the RTT.
+
+- For DMA we then provide an entire address space for each PE that can
+  contain two windows, depending on the value of PCI address bit 59.
+  Each window can be configured to be remapped via a TCE table (IOMMU
+  translation table), which has various configurable characteristics
+  not described here.
+
+- For MSIs, we have two windows in the address space (one at the top of
+  the 32-bit space and one much higher) which, via a combination of the
+  address and MSI value, will result in one of the 2048 interrupts per
+  bridge being triggered.  There's a PE# in the interrupt controller
+  descriptor table as well which is compared with the PE# obtained from
+  the RTT to authorize the device to emit that specific interrupt.
+
+- Error messages just use the RTT.
+
+  * Outbound.  That's where the tricky part is.
+
+Like other PCI host bridges, the Power8 IODA2 PHB supports windows
+from the CPU address space to the PCI address space.  There is one M32
+window and sixteen M64 windows.  They have different characteristics.
+First what they have in common: they forward a configurable portion of
+the CPU address space to the PCIe bus and must be a naturally aligned
+power of two in size.  The rest is different:
+
+- The M32 window:
+
+  * Is limited to 4GB in size.
+
+  * Drops the top bits of the address (above the size) and replaces
+   them with a configurable value.  This is typically used to generate
+   32-bit PCIe accesses.  We configure that window at boot from FW and
+   don't touch it from Linux; it's usually set to forward a 2GB
+   portion of address space from the CPU to PCIe
+   0x8000_0000..0xffff_ffff.  (Note: The top 64KB are actually
+   reserved for MSIs but this is not a problem at this point; we just
+   need to ensure Linux doesn't assign anything there.  The M32 logic
+   ignores that, however, and will forward in that space if we try.)
+
+  * It is divided into 256 segments of equal size.  A table in the chip
+   maps each segment to a PE#.  That allows portions of the MMIO space
+   to be assigned to PEs on a segment 

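The PE matching rules in the documentation above can be modelled in a few lines of C. This is a toy sketch under stated assumptions, not how the hardware or the powernv code is written. `struct toy_phb` and every function here are hypothetical. The RTT is modelled as a flat 64K-entry array indexed by RID, and the M32 window base and size are whatever the caller chooses. Only the rules stated in the text are modelled: RID to PE# through the RTT, DMA window selection by PCI address bit 59, the MSI PE# check, M32 segment to PE#, and all-1's loads from a frozen PE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PHB_NUM_PES  256 /* P8 (IODA2): up to 256 PEs per PHB */
#define M32_NUM_SEGS 256 /* M32 window: 256 segments of equal size */

/* Toy per-PHB state; all names here are hypothetical. */
struct toy_phb {
	uint8_t  rtt[1 << 16];             /* RID (bus << 8 | devfn) -> PE# */
	uint8_t  m32_seg_pe[M32_NUM_SEGS]; /* M32 segment -> PE# */
	uint64_t m32_base, m32_size;       /* CPU address range of the M32 window */
	bool     mmio_frozen[PHB_NUM_PES];
	bool     dma_frozen[PHB_NUM_PES];
};

static uint16_t rid_of(uint8_t bus, uint8_t devfn)
{
	return (uint16_t)(bus << 8 | devfn);
}

/* Inbound (DMA, MSI, error messages): the RTT maps a RID to a PE#. */
static unsigned int inbound_pe(const struct toy_phb *phb, uint16_t rid)
{
	return phb->rtt[rid];
}

/* DMA: PCI address bit 59 selects one of the PE's two windows. */
static unsigned int dma_window(uint64_t pci_addr)
{
	return (unsigned int)((pci_addr >> 59) & 1);
}

/* MSI: the PE# in the interrupt descriptor must match the RTT's PE#. */
static bool msi_allowed(const struct toy_phb *phb, uint16_t rid,
			unsigned int desc_pe)
{
	return inbound_pe(phb, rid) == desc_pe;
}

/* Outbound M32: the segment containing the address selects the PE#. */
static unsigned int m32_pe(const struct toy_phb *phb, uint64_t cpu_addr)
{
	uint64_t seg_size = phb->m32_size / M32_NUM_SEGS;

	assert(cpu_addr >= phb->m32_base &&
	       cpu_addr - phb->m32_base < phb->m32_size);
	return phb->m32_seg_pe[(cpu_addr - phb->m32_base) / seg_size];
}

/* A freeze sets both bits together; they can be cleared separately. */
static void pe_freeze(struct toy_phb *phb, unsigned int pe)
{
	assert(pe < PHB_NUM_PES);
	phb->mmio_frozen[pe] = true;
	phb->dma_frozen[pe] = true;
}

/* An MMIO load through a PE returns all 1's while its MMIO is frozen. */
static uint32_t mmio_load32(const struct toy_phb *phb, unsigned int pe,
			    uint32_t value)
{
	return phb->mmio_frozen[pe] ? 0xffffffffu : value;
}
```

With a 2GB M32 window, each of the 256 segments is 8MB. This is why the segment, rather than an arbitrary range, is the granularity at which MMIO space can be handed to a PE.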
[PATCH V15 20/21] powerpc/pci: Remove unused struct pci_dn.pcidev field

2015-03-25 Thread Wei Yang
In struct pci_dn, the pcidev field is assigned but not used, so remove it.

Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com
Acked-by: Gavin Shan gws...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/pci-bridge.h |1 -
 arch/powerpc/platforms/powernv/pci-ioda.c |1 -
 2 files changed, 2 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index ec83b51..680ae56 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -168,7 +168,6 @@ struct pci_dn {
 
int pci_ext_config_space;   /* for pci devices */
 
-   struct  pci_dev *pcidev;/* back-pointer to the pci device */
 #ifdef CONFIG_EEH
struct eeh_dev *edev;   /* eeh device */
 #endif
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index dc9f401..2505ad1 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1028,7 +1028,6 @@ static void pnv_ioda_setup_same_PE(struct pci_bus *bus, struct pnv_ioda_pe *pe)
pci_name(dev));
continue;
}
-   pdn->pcidev = dev;
pdn->pe_number = pe->pe_number;
pe->dma_weight += pnv_ioda_dma_weight(dev);
if ((pe->flags & PNV_IODA_PE_BUS_ALL) && dev->subordinate)
-- 
1.7.9.5

[PATCH 15/27] powerpc/pseries: Move controller ops from ppc_md to controller_ops

2015-03-25 Thread Daniel Axtens
This moves the pSeries platform to use the pci_controller_ops structure,
rather than ppc_md for PCI controller operations.
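During the transition, callers go through shims that prefer the per-controller hook and fall back to ppc_md. One example is the dma_dev_setup() helper in pci-bridge.h that patch 25/27 removes. Below is a minimal user-space sketch of that dispatch order. `struct controller`, `struct machine_ops`, `dma_dev_setup_shim()` and the counting hooks are hypothetical simplifications, and only one hook is modelled.

```c
#include <assert.h>
#include <stddef.h>

struct pci_dev; /* opaque in this sketch */

/* Hypothetical one-hook versions of the two tables. */
struct controller_ops {
	void (*dma_dev_setup)(struct pci_dev *dev);
};

struct machine_ops {
	void (*pci_dma_dev_setup)(struct pci_dev *dev);
};

struct controller {
	struct controller_ops controller_ops; /* per-PHB copy */
};

/* Transitional shim: prefer the per-controller hook, else ppc_md's. */
static void dma_dev_setup_shim(struct controller *hose,
			       const struct machine_ops *md,
			       struct pci_dev *dev)
{
	assert(hose != NULL && md != NULL);
	if (hose->controller_ops.dma_dev_setup)
		hose->controller_ops.dma_dev_setup(dev);
	else if (md->pci_dma_dev_setup)
		md->pci_dma_dev_setup(dev);
}

/* Example hooks that only count how often they are called. */
static int controller_calls, machine_calls;

static void controller_hook(struct pci_dev *dev)
{
	(void)dev;
	controller_calls++;
}

static void machine_hook(struct pci_dev *dev)
{
	(void)dev;
	machine_calls++;
}
```

The setup.c hunk assigns `pseries_pci_controller_ops` to `phb->controller_ops` by value. Each PHB therefore gets its own copy, and a hook filled into the template after that copy is made would not reach the PHB.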

Signed-off-by: Daniel Axtens d...@axtens.net
---
 arch/powerpc/platforms/pseries/iommu.c   | 9 +
 arch/powerpc/platforms/pseries/pseries.h | 2 ++
 arch/powerpc/platforms/pseries/setup.c   | 6 +-
 3 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 7803a19..61d5a17 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -49,6 +49,7 @@
 #include <asm/mmzone.h>
 #include <asm/plpar_wrappers.h>
 
+#include "pseries.h"
 
 static void tce_invalidate_pSeries_sw(struct iommu_table *tbl,
  __be64 *startp, __be64 *endp)
@@ -1307,16 +1308,16 @@ void iommu_init_early_pSeries(void)
ppc_md.tce_free  = tce_free_pSeriesLP;
}
ppc_md.tce_get   = tce_get_pSeriesLP;
-   ppc_md.pci_dma_bus_setup = pci_dma_bus_setup_pSeriesLP;
-   ppc_md.pci_dma_dev_setup = pci_dma_dev_setup_pSeriesLP;
+   pseries_pci_controller_ops.dma_bus_setup = pci_dma_bus_setup_pSeriesLP;
+   pseries_pci_controller_ops.dma_dev_setup = pci_dma_dev_setup_pSeriesLP;
ppc_md.dma_set_mask = dma_set_mask_pSeriesLP;
ppc_md.dma_get_required_mask = dma_get_required_mask_pSeriesLP;
} else {
ppc_md.tce_build = tce_build_pSeries;
ppc_md.tce_free  = tce_free_pSeries;
ppc_md.tce_get   = tce_get_pseries;
-   ppc_md.pci_dma_bus_setup = pci_dma_bus_setup_pSeries;
-   ppc_md.pci_dma_dev_setup = pci_dma_dev_setup_pSeries;
+   pseries_pci_controller_ops.dma_bus_setup = pci_dma_bus_setup_pSeries;
+   pseries_pci_controller_ops.dma_dev_setup = pci_dma_dev_setup_pSeries;
}
 
 
diff --git a/arch/powerpc/platforms/pseries/pseries.h b/arch/powerpc/platforms/pseries/pseries.h
index 1796c54..cd64672 100644
--- a/arch/powerpc/platforms/pseries/pseries.h
+++ b/arch/powerpc/platforms/pseries/pseries.h
@@ -65,6 +65,8 @@ extern int dlpar_detach_node(struct device_node *);
 struct pci_host_bridge;
 int pseries_root_bridge_prepare(struct pci_host_bridge *bridge);
 
+extern struct pci_controller_ops pseries_pci_controller_ops;
+
 unsigned long pseries_memory_block_size(void);
 
 #endif /* _PSERIES_PSERIES_H */
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index 1a5f884..328e318 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -478,6 +478,7 @@ static void __init find_and_init_phbs(void)
rtas_setup_phb(phb);
pci_process_bridge_OF_ranges(phb, node, 0);
isa_bridge_find_early(phb);
+   phb->controller_ops = pseries_pci_controller_ops;
}
 
of_node_put(root);
@@ -840,6 +841,10 @@ static int pSeries_pci_probe_mode(struct pci_bus *bus)
 void pSeries_final_fixup(void) { }
 #endif
 
+struct pci_controller_ops pseries_pci_controller_ops = {
+   .probe_mode = pSeries_pci_probe_mode,
+};
+
 define_machine(pseries) {
.name   = "pSeries",
.probe  = pSeries_probe,
@@ -848,7 +853,6 @@ define_machine(pseries) {
.show_cpuinfo   = pSeries_show_cpuinfo,
.log_error  = pSeries_log_error,
.pcibios_fixup  = pSeries_final_fixup,
-   .pci_probe_mode = pSeries_pci_probe_mode,
.restart= rtas_restart,
.halt   = rtas_halt,
.panic  = rtas_os_term,
-- 
2.1.4

[PATCH 19/27] powerpc: fsl_pci, swiotlb: Move controller ops from ppc_md to controller_ops

2015-03-25 Thread Daniel Axtens
This moves the setup out of swiotlb's subsys init call, and into an

fsl_pci.c is the only thing that checks the ppc_swiotlb_enable global,
so we can be confident that patching it will cover all the PCI
implementations affected by the changes to dma-swiotlb.c.

Signed-off-by: Daniel Axtens d...@axtens.net
---
 arch/powerpc/kernel/dma-swiotlb.c | 11 ---
 arch/powerpc/sysdev/fsl_pci.c | 19 +++
 2 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kernel/dma-swiotlb.c b/arch/powerpc/kernel/dma-swiotlb.c
index d06491b..6e8d764 100644
--- a/arch/powerpc/kernel/dma-swiotlb.c
+++ b/arch/powerpc/kernel/dma-swiotlb.c
@@ -116,16 +116,13 @@ void __init swiotlb_detect_4g(void)
}
 }
 
-static int __init swiotlb_subsys_init(void)
+static int __init check_swiotlb_enabled(void)
 {
-   if (ppc_swiotlb_enable) {
+   if (ppc_swiotlb_enable)
swiotlb_print_info();
-   set_pci_dma_ops(&swiotlb_dma_ops);
-   ppc_md.pci_dma_dev_setup = pci_dma_dev_setup_swiotlb;
-   } else {
+   else
swiotlb_free();
-   }
 
return 0;
 }
-subsys_initcall(swiotlb_subsys_init);
+subsys_initcall(check_swiotlb_enabled);
diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
index 7071feb..ca13b7f 100644
--- a/arch/powerpc/sysdev/fsl_pci.c
+++ b/arch/powerpc/sysdev/fsl_pci.c
@@ -111,6 +111,22 @@ static struct pci_ops fsl_indirect_pcie_ops =
 #define MAX_PHYS_ADDR_BITS 40
 static u64 pci64_dma_offset = 1ull << MAX_PHYS_ADDR_BITS;
 
+#ifdef CONFIG_SWIOTLB
+static struct pci_controller_ops swiotlb_pci_controller_ops = {
+   .dma_dev_setup = pci_dma_dev_setup_swiotlb,
+};
+
+static void setup_swiotlb_ops(struct pci_controller *hose)
+{
+   if (ppc_swiotlb_enable) {
+   hose->controller_ops = swiotlb_pci_controller_ops;
+   set_pci_dma_ops(&swiotlb_dma_ops);
+   }
+}
+#else
+static inline void setup_swiotlb_ops(struct pci_controller *hose) {}
+#endif
+
 static int fsl_pci_dma_set_mask(struct device *dev, u64 dma_mask)
 {
if (!dev->dma_mask || !dma_supported(dev, dma_mask))
@@ -492,6 +508,9 @@ int fsl_add_bridge(struct platform_device *pdev, int is_primary)
hose->first_busno = bus_range ? bus_range[0] : 0x0;
hose->last_busno = bus_range ? bus_range[1] : 0xff;
 
+   /* Set up controller operations */
+   setup_swiotlb_ops(hose);
+
pr_debug("PCI memory map start 0x%016llx, size 0x%016llx\n",
 (u64)rsrc.start, (u64)resource_size(&rsrc));
 
-- 
2.1.4

[PATCH 18/27] powerpc/maple: Move controller ops from ppc_md to controller_ops

2015-03-25 Thread Daniel Axtens
This moves the Maple platform to use the pci_controller_ops
structure rather than ppc_md for PCI controller operations.

Signed-off-by: Daniel Axtens d...@axtens.net
---
 arch/powerpc/platforms/maple/maple.h | 2 ++
 arch/powerpc/platforms/maple/pci.c   | 4 
 arch/powerpc/platforms/maple/setup.c | 2 +-
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/maple/maple.h b/arch/powerpc/platforms/maple/maple.h
index c6911dd..eecfa18 100644
--- a/arch/powerpc/platforms/maple/maple.h
+++ b/arch/powerpc/platforms/maple/maple.h
@@ -10,3 +10,5 @@ extern void maple_calibrate_decr(void);
 extern void maple_pci_init(void);
 extern void maple_pci_irq_fixup(struct pci_dev *dev);
 extern int maple_pci_get_legacy_ide_irq(struct pci_dev *dev, int channel);
+
+extern struct pci_controller_ops maple_pci_controller_ops;
diff --git a/arch/powerpc/platforms/maple/pci.c b/arch/powerpc/platforms/maple/pci.c
index d3a1306..a923230 100644
--- a/arch/powerpc/platforms/maple/pci.c
+++ b/arch/powerpc/platforms/maple/pci.c
@@ -510,6 +510,7 @@ static int __init maple_add_bridge(struct device_node *dev)
return -ENOMEM;
hose->first_busno = bus_range ? bus_range[0] : 0;
hose->last_busno = bus_range ? bus_range[1] : 0xff;
+   hose->controller_ops = maple_pci_controller_ops;
 
disp_name = NULL;
if (of_device_is_compatible(dev, "u3-agp")) {
@@ -660,3 +661,6 @@ static void quirk_ipr_msi(struct pci_dev *dev)
 }
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_IBM, PCI_DEVICE_ID_IBM_OBSIDIAN,
quirk_ipr_msi);
+
+struct pci_controller_ops maple_pci_controller_ops = {
+};
diff --git a/arch/powerpc/platforms/maple/setup.c b/arch/powerpc/platforms/maple/setup.c
index 3bf2e03..a837188 100644
--- a/arch/powerpc/platforms/maple/setup.c
+++ b/arch/powerpc/platforms/maple/setup.c
@@ -203,7 +203,7 @@ static void __init maple_init_early(void)
 {
DBG(" -> maple_init_early\n");
 
-   iommu_init_early_dart(NULL);
+   iommu_init_early_dart(&maple_pci_controller_ops);
 
DBG(" <- maple_init_early\n");
 }
-- 
2.1.4

[PATCH 25/27] powerpc: Remove shim for pci_controller_ops.dma_dev_setup

2015-03-25 Thread Daniel Axtens
Signed-off-by: Daniel Axtens d...@axtens.net
---
 arch/powerpc/include/asm/machdep.h| 1 -
 arch/powerpc/include/asm/pci-bridge.h | 9 -
 arch/powerpc/kernel/pci-common.c  | 5 -
 arch/powerpc/sysdev/dart_iommu.c  | 2 --
 4 files changed, 4 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index dfc8d2b..2f7b319 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -103,7 +103,6 @@ struct machdep_calls {
 #endif
 #endif /* CONFIG_PPC64 */
 
-   void(*pci_dma_dev_setup)(struct pci_dev *dev);
void(*pci_dma_bus_setup)(struct pci_bus *bus);
 
/* Platform set_dma_mask and dma_get_required_mask overrides */
diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index b5d8631..e578f67 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -280,15 +280,6 @@ static inline int pcibios_vaddr_is_ioport(void __iomem *address)
 /*
  * Shims to prefer pci_controller version over ppc_md where available.
  */
-static inline void dma_dev_setup(struct pci_dev *dev)
-{
-   struct pci_controller *hose = pci_bus_to_host(dev->bus);
-
-   if (hose->controller_ops.dma_dev_setup)
-   hose->controller_ops.dma_dev_setup(dev);
-   else if (ppc_md.pci_dma_dev_setup)
-   ppc_md.pci_dma_dev_setup(dev);
-}
 
 static inline void dma_bus_setup(struct pci_bus *bus)
 {
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index a61ecb4..433b387 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -962,6 +962,7 @@ void pcibios_setup_bus_self(struct pci_bus *bus)
 
 static void pcibios_setup_device(struct pci_dev *dev)
 {
+   struct pci_controller *hose;
/* Fixup NUMA node as it may not be setup yet by the generic
 * code and is needed by the DMA init
 */
@@ -972,7 +973,9 @@ static void pcibios_setup_device(struct pci_dev *dev)
set_dma_offset(&dev->dev, PCI_DRAM_OFFSET);
 
/* Additional platform DMA/iommu setup */
-   dma_dev_setup(dev);
+   hose = pci_bus_to_host(dev->bus);
+   if (hose->controller_ops.dma_dev_setup)
+   hose->controller_ops.dma_dev_setup(dev);
 
/* Read default IRQs and fixup if necessary */
pci_read_irq_line(dev);
diff --git a/arch/powerpc/sysdev/dart_iommu.c b/arch/powerpc/sysdev/dart_iommu.c
index 120e96a..ca38b1e 100644
--- a/arch/powerpc/sysdev/dart_iommu.c
+++ b/arch/powerpc/sysdev/dart_iommu.c
@@ -399,7 +399,6 @@ void __init iommu_init_early_dart(struct pci_controller_ops *controller_ops)
controller_ops->dma_dev_setup = pci_dma_dev_setup_dart;
controller_ops->dma_bus_setup = pci_dma_bus_setup_dart;
} else {
-   ppc_md.pci_dma_dev_setup = pci_dma_dev_setup_dart;
ppc_md.pci_dma_bus_setup = pci_dma_bus_setup_dart;
}
/* Setup pci_dma ops */
@@ -412,7 +411,6 @@ void __init iommu_init_early_dart(struct pci_controller_ops *controller_ops)
controller_ops->dma_dev_setup = NULL;
controller_ops->dma_bus_setup = NULL;
}
-   ppc_md.pci_dma_dev_setup = NULL;
ppc_md.pci_dma_bus_setup = NULL;
 
/* Setup pci_dma ops */
-- 
2.1.4

[PATCH V15 04/21] PCI: Print PF SR-IOV resource that contains all VF(n) BAR space

2015-03-25 Thread Wei Yang
When we size VF BAR0, VF BAR1, etc., from the SR-IOV Capability of a PF, we
learn the alignment requirement and amount of space consumed by a single
VF.  But when VFs are enabled, *each* of the NumVFs consumes that amount of
space, so the total size of the PF resource is VF BAR size * NumVFs.
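A sketch of that arithmetic under a simplifying assumption: `struct range64` is a hypothetical inclusive range shaped like `struct resource`. The PF's IOV resource is grown to VF BAR size * NumVFs, and VF n's BAR is the n-th equal slice of it. The slice computation matches the start/end lines in virtfn_add().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical inclusive range, shaped like struct resource. */
struct range64 {
	uint64_t start;
	uint64_t end;
};

/* Grow the PF's IOV resource to hold the BAR of every VF; mirrors
 * res->end = res->start + resource_size(res) * total - 1. */
static void iov_span_all_vfs(struct range64 *pf, uint64_t vf_bar_size,
			     unsigned int total_vfs)
{
	assert(vf_bar_size != 0 && total_vfs != 0);
	pf->end = pf->start + vf_bar_size * total_vfs - 1;
}

/* The slice of the PF resource used by VF number 'id' (0-based). */
static struct range64 vf_bar_slice(const struct range64 *pf,
				   uint64_t vf_bar_size, unsigned int id)
{
	struct range64 r;

	r.start = pf->start + vf_bar_size * id;
	r.end = r.start + vf_bar_size - 1;
	return r;
}
```

For example, a 64KB VF BAR with 8 VFs at 0x80000000 spans 0x80000000-0x8007ffff, and VF 3's slice is 0x80030000-0x8003ffff.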

Add a printk of the total space consumed by the VFs corresponding to what
we already do for normal non-IOV BARs.

No functional change; new message only.

[bhelgaas: split out into its own patch]
Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com
Acked-by: Bjorn Helgaas bhelg...@google.com
---
 drivers/pci/iov.c |2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index c4c33ea..05f9d97 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -372,6 +372,8 @@ found:
goto failed;
}
res->end = res->start + resource_size(res) * total - 1;
+   dev_info(&dev->dev, "VF(n) BAR%d space: %pR (contains BAR%d for %d VFs)\n",
+i, res, i, total);
nres++;
}
 
-- 
1.7.9.5

[PATCH V15 08/21] PCI: Calculate maximum number of buses required for VFs

2015-03-25 Thread Wei Yang
An SR-IOV device can change its First VF Offset and VF Stride based on the
values of ARI Capable Hierarchy and NumVFs.  The number of buses required
for all VFs is determined by NumVFs, First VF Offset, and VF Stride (see
SR-IOV spec r1.1, sec 2.1.2).

Previously pci_iov_bus_range() computed how many buses would be required by
TotalVFs, but this was based on a single NumVFs value and may not have been
the maximum for all NumVFs configurations.

Iterate over all valid NumVFs and calculate the maximum number of bus
numbers that could ever be required for VFs of this device.

[bhelgaas: changelog, compute busnr of NumVFs, not TotalVFs, remove
kernel-doc comment marker]
Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com
Acked-by: Bjorn Helgaas bhelg...@google.com
---
 drivers/pci/iov.c |   31 +++
 drivers/pci/pci.h |1 +
 2 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index a8752c2..2ae921f 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -46,6 +46,30 @@ static inline void pci_iov_set_numvfs(struct pci_dev *dev, int nr_virtfn)
pci_read_config_word(dev, iov->pos + PCI_SRIOV_VF_STRIDE, &iov->stride);
 }
 
+/*
+ * The PF consumes one bus number.  NumVFs, First VF Offset, and VF Stride
+ * determine how many additional bus numbers will be consumed by VFs.
+ *
+ * Iterate over all valid NumVFs and calculate the maximum number of bus
+ * numbers that could ever be required.
+ */
+static inline u8 virtfn_max_buses(struct pci_dev *dev)
+{
+   struct pci_sriov *iov = dev->sriov;
+   int nr_virtfn;
+   u8 max = 0;
+   u8 busnr;
+
+   for (nr_virtfn = 1; nr_virtfn <= iov->total_VFs; nr_virtfn++) {
+   pci_iov_set_numvfs(dev, nr_virtfn);
+   busnr = virtfn_bus(dev, nr_virtfn - 1);
+   if (busnr > max)
+   max = busnr;
+   }
+
+   return max;
+}
+
 static struct pci_bus *virtfn_add_bus(struct pci_bus *bus, int busnr)
 {
struct pci_bus *child;
@@ -427,6 +451,7 @@ found:
 
dev->sriov = iov;
dev->is_physfn = 1;
+   iov->max_VF_buses = virtfn_max_buses(dev);
 
return 0;
 
@@ -556,15 +581,13 @@ void pci_restore_iov_state(struct pci_dev *dev)
 int pci_iov_bus_range(struct pci_bus *bus)
 {
int max = 0;
-   u8 busnr;
struct pci_dev *dev;
 
list_for_each_entry(dev, &bus->devices, bus_list) {
if (!dev->is_physfn)
continue;
-   busnr = virtfn_bus(dev, dev->sriov->total_VFs - 1);
-   if (busnr > max)
-   max = busnr;
+   if (dev->sriov->max_VF_buses > max)
+   max = dev->sriov->max_VF_buses;
}
 
return max ? max - bus->number : 0;
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 5732964..bae593c 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -243,6 +243,7 @@ struct pci_sriov {
u16 stride; /* following VF stride */
u32 pgsz;   /* page size for BAR alignment */
u8 link;/* Function Dependency Link */
+   u8 max_VF_buses;/* max buses consumed by VFs */
u16 driver_max_VFs; /* max num VFs driver supports */
struct pci_dev *dev;/* lowest numbered PF */
struct pci_dev *self;   /* this PF */
-- 
1.7.9.5

[PATCH V15 09/21] PCI: Export pci_iov_virtfn_bus() and pci_iov_virtfn_devfn()

2015-03-25 Thread Wei Yang
On PowerNV, some resources must be reserved for SR-IOV VFs that don't yet
exist at boot time.  To match resources to VFs, the code needs to know each
VF's BDF in advance.

Rename virtfn_bus() and virtfn_devfn() to pci_iov_virtfn_bus() and
pci_iov_virtfn_devfn() and export them.
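A sketch of the arithmetic the two exported helpers perform, using a hypothetical `struct pf_info` in place of `struct pci_dev`. The `int` return type lets the helpers report -EINVAL for a device that is not a PF. This is also why the callers' bus variables change from `u8` to `int` in this patch.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical mirror of the PF fields the two helpers read. */
struct pf_info {
	int is_physfn; /* dev->is_physfn */
	int bus;       /* dev->bus->number */
	int devfn;     /* dev->devfn */
	int offset;    /* dev->sriov->offset (First VF Offset) */
	int stride;    /* dev->sriov->stride (VF Stride) */
};

/* Routing ID of VF 'vf_id' relative to (pf->bus, devfn 0). */
static int vf_rid_offset(const struct pf_info *pf, int vf_id)
{
	assert(vf_id >= 0);
	return pf->devfn + pf->offset + pf->stride * vf_id;
}

/* Same arithmetic as pci_iov_virtfn_bus(): bits above 8 carry into the bus. */
static int sketch_virtfn_bus(const struct pf_info *pf, int vf_id)
{
	if (!pf->is_physfn)
		return -EINVAL;
	return pf->bus + (vf_rid_offset(pf, vf_id) >> 8);
}

/* Same arithmetic as pci_iov_virtfn_devfn(): the low 8 bits are the devfn. */
static int sketch_virtfn_devfn(const struct pf_info *pf, int vf_id)
{
	if (!pf->is_physfn)
		return -EINVAL;
	return vf_rid_offset(pf, vf_id) & 0xff;
}
```

Take a PF at bus 3, devfn 0, with First VF Offset 0x80 and VF Stride 2. VF 0 is at bus 3, devfn 0x80, and VF 64 wraps to bus 4, devfn 0x00.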

[bhelgaas: changelog, make busnr int]
Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com
Acked-by: Bjorn Helgaas bhelg...@google.com
---
 drivers/pci/iov.c   |   28 
 include/linux/pci.h |   11 +++
 2 files changed, 27 insertions(+), 12 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 2ae921f..5643a10 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -19,16 +19,20 @@
 
 #define VIRTFN_ID_LEN  16
 
-static inline u8 virtfn_bus(struct pci_dev *dev, int id)
+int pci_iov_virtfn_bus(struct pci_dev *dev, int vf_id)
 {
+   if (!dev->is_physfn)
+   return -EINVAL;
return dev->bus->number + ((dev->devfn + dev->sriov->offset +
-   dev->sriov->stride * id) >> 8);
+   dev->sriov->stride * vf_id) >> 8);
 }
 
-static inline u8 virtfn_devfn(struct pci_dev *dev, int id)
+int pci_iov_virtfn_devfn(struct pci_dev *dev, int vf_id)
 {
+   if (!dev->is_physfn)
+   return -EINVAL;
return (dev->devfn + dev->sriov->offset +
-   dev->sriov->stride * id) & 0xff;
+   dev->sriov->stride * vf_id) & 0xff;
 }
 
 /*
@@ -58,11 +62,11 @@ static inline u8 virtfn_max_buses(struct pci_dev *dev)
struct pci_sriov *iov = dev->sriov;
int nr_virtfn;
u8 max = 0;
-   u8 busnr;
+   int busnr;
 
for (nr_virtfn = 1; nr_virtfn <= iov->total_VFs; nr_virtfn++) {
pci_iov_set_numvfs(dev, nr_virtfn);
-   busnr = virtfn_bus(dev, nr_virtfn - 1);
+   busnr = pci_iov_virtfn_bus(dev, nr_virtfn - 1);
if (busnr > max)
max = busnr;
}
@@ -116,7 +120,7 @@ static int virtfn_add(struct pci_dev *dev, int id, int reset)
struct pci_bus *bus;
 
mutex_lock(&iov->dev->sriov->lock);
-   bus = virtfn_add_bus(dev->bus, virtfn_bus(dev, id));
+   bus = virtfn_add_bus(dev->bus, pci_iov_virtfn_bus(dev, id));
if (!bus)
goto failed;
 
@@ -124,7 +128,7 @@ static int virtfn_add(struct pci_dev *dev, int id, int reset)
if (!virtfn)
goto failed0;
 
-   virtfn->devfn = virtfn_devfn(dev, id);
+   virtfn->devfn = pci_iov_virtfn_devfn(dev, id);
virtfn->vendor = dev->vendor;
pci_read_config_word(dev, iov->pos + PCI_SRIOV_VF_DID, &virtfn->device);
pci_setup_device(virtfn);
@@ -186,8 +190,8 @@ static void virtfn_remove(struct pci_dev *dev, int id, int reset)
struct pci_sriov *iov = dev->sriov;
 
virtfn = pci_get_domain_bus_and_slot(pci_domain_nr(dev->bus),
-virtfn_bus(dev, id),
-virtfn_devfn(dev, id));
+pci_iov_virtfn_bus(dev, id),
+pci_iov_virtfn_devfn(dev, id));
if (!virtfn)
return;
 
@@ -226,7 +230,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
struct pci_dev *pdev;
struct pci_sriov *iov = dev->sriov;
int bars = 0;
-   u8 bus;
+   int bus;
 
if (!nr_virtfn)
return 0;
@@ -263,7 +267,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
iov-offset = offset;
iov-stride = stride;
 
-   bus = virtfn_bus(dev, nr_virtfn - 1);
+   bus = pci_iov_virtfn_bus(dev, nr_virtfn - 1);
if (bus > dev->bus->busn_res.end) {
dev_err(&dev->dev, "can't enable %d VFs (bus %02x out of range of %pR)\n",
nr_virtfn, bus, &dev->bus->busn_res);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 1559658..99ea948 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1669,6 +1669,9 @@ int pci_ext_cfg_avail(void);
 void __iomem *pci_ioremap_bar(struct pci_dev *pdev, int bar);
 
 #ifdef CONFIG_PCI_IOV
+int pci_iov_virtfn_bus(struct pci_dev *dev, int id);
+int pci_iov_virtfn_devfn(struct pci_dev *dev, int id);
+
 int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn);
 void pci_disable_sriov(struct pci_dev *dev);
 int pci_num_vf(struct pci_dev *dev);
@@ -1677,6 +1680,14 @@ int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs);
 int pci_sriov_get_totalvfs(struct pci_dev *dev);
 resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno);
 #else
+static inline int pci_iov_virtfn_bus(struct pci_dev *dev, int id)
+{
+   return -ENOSYS;
+}
+static inline int pci_iov_virtfn_devfn(struct pci_dev *dev, int id)
+{
+   return -ENOSYS;
+}
 static inline int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn)
 { return -ENODEV; }
 static inline void 

[PATCH V15 14/21] powerpc/powernv: Allocate struct pnv_ioda_pe iommu_table dynamically

2015-03-25 Thread Wei Yang
Previously the iommu_table had the same lifetime as a struct pnv_ioda_pe
and was embedded in it. The pnv_ioda_pe was assigned to a PE at boot
time. Since PEs are based on the hardware layout, which is static in the
system, they are never released. This means the iommu_table in the
pnv_ioda_pe is never released either.

This no longer works for VF PEs. VF PEs are created and released
dynamically as VFs are enabled and disabled, so a pnv_ioda_pe must be
assigned to each VF PE when VFs are enabled, and those resources must be
cleaned up when VFs are disabled. The iommu_table is one of the resources
that must be handled dynamically.

Currently the iommu_table is embedded in pnv_ioda_pe, which is a problem
when it has to be freed. When a VF is disabled,
pnv_pci_ioda2_release_dma_pe() calls iommu_free_table() to release the
PE's iommu_table, and iommu_free_table() fails on an embedded iommu_table.

To meet these requirements, this patch allocates the iommu_table
dynamically.
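A minimal user-space sketch of the ownership change, using hypothetical cut-down structs. The table is allocated separately, carries a `data` back-pointer to its PE, and can therefore be freed on its own. `calloc()` stands in for `kzalloc_node()`. Unlike the hunk as posted, which dereferences the `kzalloc_node()` result without a NULL check, this sketch checks for allocation failure.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical cut-down stand-ins for struct iommu_table and
 * struct pnv_ioda_pe; only the fields this sketch needs. */
struct table_sketch {
	void *data;                       /* back-pointer to the owning PE */
};

struct pe_sketch {
	int pe_number;
	struct table_sketch *tce32_table; /* previously an embedded struct */
};

/* Allocate the table on its own and link it back to its PE. */
static int pe_alloc_table(struct pe_sketch *pe)
{
	assert(pe != NULL);
	pe->tce32_table = calloc(1, sizeof(*pe->tce32_table));
	if (!pe->tce32_table)
		return -1;
	pe->tce32_table->data = pe;
	return 0;
}

/* A separately allocated table can be freed separately, for example
 * when the VFs behind a VF PE are disabled. */
static void pe_release_table(struct pe_sketch *pe)
{
	free(pe->tce32_table);
	pe->tce32_table = NULL;
}

/* Find the owning PE from a table through the back-pointer. */
static struct pe_sketch *table_to_pe(const struct table_sketch *tbl)
{
	return tbl->data;
}
```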

Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/iommu.h  |3 +++
 arch/powerpc/platforms/powernv/pci-ioda.c |   26 ++
 arch/powerpc/platforms/powernv/pci.h  |2 +-
 3 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 9cfa370..5574eeb 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -78,6 +78,9 @@ struct iommu_table {
struct iommu_group *it_group;
 #endif
void (*set_bypass)(struct iommu_table *tbl, bool enable);
+#ifdef CONFIG_PPC_POWERNV
+   void   *data;
+#endif
 };
 
 /* Pure 2^n version of get_order */
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index df4a295..1b37066 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -916,6 +916,10 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, int all)
return;
}
 
+   pe->tce32_table = kzalloc_node(sizeof(struct iommu_table),
+   GFP_KERNEL, hose->node);
+   pe->tce32_table->data = pe;
+
/* Associate it with all child devices */
pnv_ioda_setup_same_PE(bus, pe);
 
@@ -1005,7 +1009,7 @@ static void pnv_pci_ioda_dma_dev_setup(struct pnv_phb *phb, struct pci_dev *pdev
 
pe = &phb->ioda.pe_array[pdn->pe_number];
WARN_ON(get_dma_ops(&pdev->dev) != &dma_iommu_ops);
-   set_iommu_table_base_and_group(&pdev->dev, &pe->tce32_table);
+   set_iommu_table_base_and_group(&pdev->dev, pe->tce32_table);
 }
 
 static int pnv_pci_ioda_dma_set_mask(struct pnv_phb *phb,
@@ -1032,7 +1036,7 @@ static int pnv_pci_ioda_dma_set_mask(struct pnv_phb *phb,
} else {
dev_info(&pdev->dev, "Using 32-bit DMA via iommu\n");
set_dma_ops(&pdev->dev, &dma_iommu_ops);
-   set_iommu_table_base(&pdev->dev, &pe->tce32_table);
+   set_iommu_table_base(&pdev->dev, pe->tce32_table);
}
*pdev->dev.dma_mask = dma_mask;
return 0;
@@ -1069,9 +1073,9 @@ static void pnv_ioda_setup_bus_dma(struct pnv_ioda_pe *pe,
list_for_each_entry(dev, &bus->devices, bus_list) {
if (add_to_iommu_group)
set_iommu_table_base_and_group(&dev->dev,
-  &pe->tce32_table);
+  pe->tce32_table);
else
-   set_iommu_table_base(&dev->dev, &pe->tce32_table);
+   set_iommu_table_base(&dev->dev, pe->tce32_table);
 
if (dev->subordinate)
pnv_ioda_setup_bus_dma(pe, dev->subordinate,
@@ -1161,8 +1165,7 @@ static void pnv_pci_ioda2_tce_invalidate(struct pnv_ioda_pe *pe,
 void pnv_pci_ioda_tce_invalidate(struct iommu_table *tbl,
 __be64 *startp, __be64 *endp, bool rm)
 {
-   struct pnv_ioda_pe *pe = container_of(tbl, struct pnv_ioda_pe,
- tce32_table);
+   struct pnv_ioda_pe *pe = tbl->data;
struct pnv_phb *phb = pe->phb;
 
if (phb->type == PNV_PHB_IODA1)
@@ -1228,7 +1231,7 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
}
 
/* Setup linux iommu table */
-   tbl = &pe->tce32_table;
+   tbl = pe->tce32_table;
pnv_pci_setup_iommu_table(tbl, addr, TCE32_TABLE_SIZE * segs,
  base << 28, IOMMU_PAGE_SHIFT_4K);
 
@@ -1266,8 +1269,7 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
 
 static void pnv_pci_ioda2_set_bypass(struct iommu_table *tbl, bool enable)
 {
-   struct pnv_ioda_pe *pe = container_of(tbl, struct pnv_ioda_pe,
- tce32_table);
+   struct pnv_ioda_pe *pe = tbl->data;
uint16_t window_id = 

[PATCH V15 15/21] powerpc/powernv: Reserve additional space for IOV BAR according to the number of total_pe

2015-03-25 Thread Wei Yang
On PHB3, the PF IOV BAR is covered by an M64 BAR for better PE isolation.
An M64 BAR is a PHB3 hardware resource that maps a range of MMIO to PE
numbers on the PowerNV platform: the range is divided equally into
total_pe segments, and each segment maps to one PE number. The M64 BAR
must also map an MMIO range whose size is a power of two.

The total_pe number usually differs from total_VFs, which can lead to a
conflict between MMIO space and PE numbers.

For example, if total_VFs is 128 and total_pe is 256, the second half of
the M64 BAR will cover MMIO that belongs to other PCI devices, which may
already be assigned to other PEs.

This patch prevents the conflict by reserving additional space for the PF
IOV BAR, expanding it to total_pe times the VF BAR size.
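The segment arithmetic can be sketched as follows. The helpers are hypothetical, not the PowerNV code, and assume that total_pe evenly divides the power-of-two M64 range, so segment k (and therefore PE#k) covers bytes [k * seg, (k + 1) * seg) of the window. With a 1 MiB VF BAR and total_pe = 256, the window spans 256 MiB, but 128 VFs only occupy the first 128 MiB; expanding the IOV BAR reserves the remainder.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the M64 range [m64_base, m64_base + m64_size) is split
 * into total_pe equal segments and segment k maps to PE#k.
 * Returns the PE# owning addr, or -1 if addr lies outside the range. */
static int m64_addr_to_pe(uint64_t m64_base, uint64_t m64_size,
			  unsigned int total_pe, uint64_t addr)
{
	uint64_t seg_size;

	/* The M64 BAR must map a power-of-two sized range. */
	assert(m64_size != 0 && (m64_size & (m64_size - 1)) == 0);
	assert(total_pe != 0 && m64_size % total_pe == 0);

	if (addr < m64_base || addr - m64_base >= m64_size)
		return -1;
	seg_size = m64_size / total_pe;
	return (int)((addr - m64_base) / seg_size);
}

/* Hypothetical helper: size of the PF IOV BAR once it is expanded to hold
 * total_pe VF BARs instead of total_VFs, so segment k holds VF BAR k. */
static uint64_t expanded_iov_bar_size(uint64_t vf_bar_size,
				      unsigned int total_pe)
{
	return vf_bar_size * total_pe;
}
```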

[bhelgaas: make dev_printk() output more consistent, index resource[]
conventionally]
Signed-off-by: Wei Yang <weiy...@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/machdep.h|4 +++
 arch/powerpc/include/asm/pci-bridge.h |3 ++
 arch/powerpc/kernel/pci-common.c  |6 
 arch/powerpc/platforms/powernv/pci-ioda.c |   43 +
 4 files changed, 56 insertions(+)

diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index c8175a3..1d72fda 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -250,6 +250,10 @@ struct machdep_calls {
/* Reset the secondary bus of bridge */
void  (*pcibios_reset_secondary_bus)(struct pci_dev *dev);
 
+#ifdef CONFIG_PCI_IOV
+   void (*pcibios_fixup_sriov)(struct pci_dev *pdev);
+#endif /* CONFIG_PCI_IOV */
+
/* Called to shutdown machine specific hardware not already controlled
 * by other drivers.
 */
diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 513f8f2..d0d1718 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -175,6 +175,9 @@ struct pci_dn {
 #define IODA_INVALID_PE        (-1)
 #ifdef CONFIG_PPC_POWERNV
int pe_number;
+#ifdef CONFIG_PCI_IOV
+   u16 vfs_expanded;   /* number of VFs IOV BAR expanded */
+#endif /* CONFIG_PCI_IOV */
 #endif
struct list_head child_list;
struct list_head list;
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 8203101..375bf70 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -990,6 +990,12 @@ int pcibios_add_device(struct pci_dev *dev)
 */
if (dev->bus->is_added)
pcibios_setup_device(dev);
+
+#ifdef CONFIG_PCI_IOV
+   if (ppc_md.pcibios_fixup_sriov)
+   ppc_md.pcibios_fixup_sriov(dev);
+#endif /* CONFIG_PCI_IOV */
+
return 0;
 }
 
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 1b37066..cadd3fb 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1749,6 +1749,46 @@ static void pnv_pci_init_ioda_msis(struct pnv_phb *phb)
 static void pnv_pci_init_ioda_msis(struct pnv_phb *phb) { }
 #endif /* CONFIG_PCI_MSI */
 
+#ifdef CONFIG_PCI_IOV
+static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
+{
+   struct pci_controller *hose;
+   struct pnv_phb *phb;
+   struct resource *res;
+   int i;
+   resource_size_t size;
+   struct pci_dn *pdn;
+
+   if (!pdev->is_physfn || pdev->is_added)
+   return;
+
+   hose = pci_bus_to_host(pdev->bus);
+   phb = hose->private_data;
+
+   pdn = pci_get_pdn(pdev);
+   pdn->vfs_expanded = 0;
+
+   for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
+   res = &pdev->resource[i + PCI_IOV_RESOURCES];
+   if (!res->flags || res->parent)
+   continue;
+   if (!pnv_pci_is_mem_pref_64(res->flags)) {
+   dev_warn(&pdev->dev, "Skipping expanding VF BAR%d: %pR\n",
+i, res);
+   continue;
+   }
+
+   dev_dbg(&pdev->dev, " Fixing VF BAR%d: %pR to\n", i, res);
+   size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
+   res->end = res->start + size * phb->ioda.total_pe - 1;
+   dev_dbg(&pdev->dev, "                       %pR\n", res);
+   dev_info(&pdev->dev, "VF BAR%d: %pR (expanded to %d VFs for PE alignment)",
+   i, res, phb->ioda.total_pe);
+   }
+   pdn->vfs_expanded = phb->ioda.total_pe;
+}
+#endif /* CONFIG_PCI_IOV */
+
 /*
  * This function is supposed to be called on basis of PE from top
  * to bottom style. So the the I/O or MMIO segment assigned to
@@ -2125,6 +2165,9 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
ppc_md.pcibios_enable_device_hook = pnv_pci_enable_device_hook;
ppc_md.pcibios_window_alignment = 

[PATCH V15 18/21] powerpc/powernv: Reserve additional space for IOV BAR, with m64_per_iov supported

2015-03-25 Thread Wei Yang
The M64 aperture size is limited on PHB3. When the IOV BAR is too big,
expanding it exceeds this limit and the BAR fails to be assigned.

Introduce a different mechanism based on the per-VF size of the IOV BAR:

  - if it is at most 64MB, expand the IOV BAR to total_pe VFs
  - if it is bigger than 64MB, expand the IOV BAR to total_VFs rounded up
    to a power of two
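A userspace sketch of the size-based choice, assuming the sizes passed in are the per-VF sizes of the 64-bit prefetchable IOV BARs (the kernel code skips other BARs). iov_expand_mul() and round_up_pow2() are hypothetical names; the latter stands in for roundup_pow_of_two(). As in the patch, a single BAR above 64MB switches every BAR of the PF to the power-of-two multiplier. For example, a 128MB VF BAR expanded to 256 PEs would need 32GB, whereas 63 VFs rounded up to 64 need 8GB.

```c
#include <assert.h>
#include <stdint.h>

#define VF_BAR_BIG_THRESHOLD (1ULL << 26)   /* 64MB, as in the patch */

/* Userspace stand-in for the kernel's roundup_pow_of_two(). */
static uint64_t round_up_pow2(uint64_t n)
{
	uint64_t p = 1;

	assert(n > 0);
	while (p < n)
		p <<= 1;
	return p;
}

/* Hypothetical helper: how many VF BARs each IOV BAR is expanded to.
 * vf_bar_sizes[] holds the per-VF size of each 64-bit prefetchable IOV BAR. */
static uint64_t iov_expand_mul(const uint64_t *vf_bar_sizes, int nbars,
			       unsigned int total_pe, unsigned int total_vfs)
{
	int i;

	for (i = 0; i < nbars; i++)
		if (vf_bar_sizes[i] > VF_BAR_BIG_THRESHOLD)
			return round_up_pow2(total_vfs);
	return total_pe;
}
```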

[bhelgaas: make dev_printk() output more consistent, use PCI_SRIOV_NUM_BARS]
Signed-off-by: Wei Yang <weiy...@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/pci-bridge.h |2 ++
 arch/powerpc/platforms/powernv/pci-ioda.c |   33 ++---
 2 files changed, 32 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 3c95097..d6942c9 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -179,6 +179,8 @@ struct pci_dn {
u16 vfs_expanded;   /* number of VFs IOV BAR expanded */
u16 num_vfs;/* number of VFs enabled*/
int offset; /* PE# for the first VF PE */
+#define M64_PER_IOV 4
+   int m64_per_iov;
 #define IODA_INVALID_M64        (-1)
int m64_wins[PCI_SRIOV_NUM_BARS];
 #endif /* CONFIG_PCI_IOV */
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 11262df..2c13a39 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2250,6 +2250,7 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
int i;
resource_size_t size;
struct pci_dn *pdn;
+   int mul, total_vfs;
 
if (!pdev->is_physfn || pdev->is_added)
return;
@@ -2260,6 +2261,32 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
pdn = pci_get_pdn(pdev);
pdn->vfs_expanded = 0;
 
+   total_vfs = pci_sriov_get_totalvfs(pdev);
+   pdn->m64_per_iov = 1;
+   mul = phb->ioda.total_pe;
+
+   for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
+   res = &pdev->resource[i + PCI_IOV_RESOURCES];
+   if (!res->flags || res->parent)
+   continue;
+   if (!pnv_pci_is_mem_pref_64(res->flags)) {
+   dev_warn(&pdev->dev, " non M64 VF BAR%d: %pR\n",
+i, res);
+   continue;
+   }
+
+   size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
+
+   /* bigger than 64M */
+   if (size > (1 << 26)) {
+   dev_info(&pdev->dev, "PowerNV: VF BAR%d: %pR IOV size is bigger than 64M, roundup power2\n",
+i, res);
+   pdn->m64_per_iov = M64_PER_IOV;
+   mul = roundup_pow_of_two(total_vfs);
+   break;
+   }
+   }
+
for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
res = &pdev->resource[i + PCI_IOV_RESOURCES];
if (!res->flags || res->parent)
@@ -2272,12 +2299,12 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
 
dev_dbg(&pdev->dev, " Fixing VF BAR%d: %pR to\n", i, res);
size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
-   res->end = res->start + size * phb->ioda.total_pe - 1;
+   res->end = res->start + size * mul - 1;
dev_dbg(&pdev->dev, "                       %pR\n", res);
dev_info(&pdev->dev, "VF BAR%d: %pR (expanded to %d VFs for PE alignment)",
-   i, res, phb->ioda.total_pe);
+i, res, mul);
}
-   pdn->vfs_expanded = phb->ioda.total_pe;
+   pdn->vfs_expanded = mul;
 }
 #endif /* CONFIG_PCI_IOV */
 
-- 
1.7.9.5

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH V15 19/21] powerpc/powernv: Group VF PE when IOV BAR is big on PHB3

2015-03-25 Thread Wei Yang
When the IOV BAR is big, each one is covered by 4 M64 windows. As a
result, several VF PEs fall into the same PE from the M64 window's point
of view.

Group VF PEs according to the M64 allocation.
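The grouping arithmetic can be sketched in userspace C. compute_vf_grouping() follows the vf_groups/vf_per_group computation in pnv_pci_vf_assign_m64() in this patch; vf_to_group() and round_up_pow2() (a stand-in for roundup_pow_of_two()) are hypothetical helpers added for illustration. With 10 VFs and 4 M64 windows per IOV BAR, 10 rounds up to 16, so each window covers 4 VFs and, for instance, VF 9 falls into group 2.

```c
#include <assert.h>

#define M64_PER_IOV 4   /* M64 windows per IOV BAR, as in the patch */

/* Userspace stand-in for the kernel's roundup_pow_of_two(). */
static int round_up_pow2(int n)
{
	int p = 1;

	assert(n > 0);
	while (p < n)
		p <<= 1;
	return p;
}

struct vf_grouping {
	int vf_groups;      /* number of M64 windows used per IOV BAR */
	int vf_per_group;   /* VFs (and VF PEs) sharing one M64 window */
};

/* Same decision as the patch: only split when m64_per_iov == M64_PER_IOV. */
static struct vf_grouping compute_vf_grouping(int m64_per_iov, int num_vfs)
{
	struct vf_grouping g = { 1, 1 };

	if (m64_per_iov == M64_PER_IOV) {
		if (num_vfs <= M64_PER_IOV) {
			g.vf_groups = num_vfs;
			g.vf_per_group = 1;
		} else {
			g.vf_groups = M64_PER_IOV;
			g.vf_per_group = round_up_pow2(num_vfs) / m64_per_iov;
		}
	}
	return g;
}

/* Hypothetical helper: which group (M64 window) a 0-based VF index uses. */
static int vf_to_group(struct vf_grouping g, int vf_index)
{
	return vf_index / g.vf_per_group;
}
```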

[bhelgaas: use dev_printk() when possible]
Signed-off-by: Wei Yang <weiy...@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/pci-bridge.h |2 +-
 arch/powerpc/platforms/powernv/pci-ioda.c |  197 ++---
 2 files changed, 154 insertions(+), 45 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index d6942c9..ec83b51 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -182,7 +182,7 @@ struct pci_dn {
 #define M64_PER_IOV 4
int m64_per_iov;
 #define IODA_INVALID_M64        (-1)
-   int m64_wins[PCI_SRIOV_NUM_BARS];
+   int m64_wins[PCI_SRIOV_NUM_BARS][M64_PER_IOV];
 #endif /* CONFIG_PCI_IOV */
 #endif
struct list_head child_list;
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 2c13a39..dc9f401 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1156,26 +1156,27 @@ static int pnv_pci_vf_release_m64(struct pci_dev *pdev)
struct pci_controller *hose;
struct pnv_phb        *phb;
struct pci_dn *pdn;
-   int                    i;
+   int                    i, j;
 
bus = pdev->bus;
hose = pci_bus_to_host(bus);
phb = hose->private_data;
pdn = pci_get_pdn(pdev);
 
-   for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
-   if (pdn->m64_wins[i] == IODA_INVALID_M64)
-   continue;
-   opal_pci_phb_mmio_enable(phb->opal_id,
-   OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i], 0);
-   clear_bit(pdn->m64_wins[i], &phb->ioda.m64_bar_alloc);
-   pdn->m64_wins[i] = IODA_INVALID_M64;
-   }
+   for (i = 0; i < PCI_SRIOV_NUM_BARS; i++)
+   for (j = 0; j < M64_PER_IOV; j++) {
+   if (pdn->m64_wins[i][j] == IODA_INVALID_M64)
+   continue;
+   opal_pci_phb_mmio_enable(phb->opal_id,
+   OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i][j], 0);
+   clear_bit(pdn->m64_wins[i][j], &phb->ioda.m64_bar_alloc);
+   pdn->m64_wins[i][j] = IODA_INVALID_M64;
+   }
+   }
 
return 0;
 }
 
-static int pnv_pci_vf_assign_m64(struct pci_dev *pdev)
+static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
 {
struct pci_bus        *bus;
struct pci_controller *hose;
@@ -1183,17 +1184,33 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev)
struct pci_dn *pdn;
unsigned int   win;
struct resource   *res;
-   int                    i;
+   int                    i, j;
int64_t                rc;
+   int                    total_vfs;
+   resource_size_t        size, start;
+   int                    pe_num;
+   int                    vf_groups;
+   int                    vf_per_group;
 
bus = pdev->bus;
hose = pci_bus_to_host(bus);
phb = hose->private_data;
pdn = pci_get_pdn(pdev);
+   total_vfs = pci_sriov_get_totalvfs(pdev);
 
/* Initialize the m64_wins to IODA_INVALID_M64 */
for (i = 0; i < PCI_SRIOV_NUM_BARS; i++)
-   pdn->m64_wins[i] = IODA_INVALID_M64;
+   for (j = 0; j < M64_PER_IOV; j++)
+   pdn->m64_wins[i][j] = IODA_INVALID_M64;
+
+   if (pdn->m64_per_iov == M64_PER_IOV) {
+   vf_groups = (num_vfs <= M64_PER_IOV) ? num_vfs: M64_PER_IOV;
+   vf_per_group = (num_vfs <= M64_PER_IOV)? 1:
+   roundup_pow_of_two(num_vfs) / pdn->m64_per_iov;
+   } else {
+   vf_groups = 1;
+   vf_per_group = 1;
+   }
 
for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
res = &pdev->resource[i + PCI_IOV_RESOURCES];
@@ -1203,35 +1220,61 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev)
if (!pnv_pci_is_mem_pref_64(res->flags))
continue;
 
-   do {
-   win = find_next_zero_bit(&phb->ioda.m64_bar_alloc,
-   phb->ioda.m64_bar_idx + 1, 0);
-
-   if (win >= phb->ioda.m64_bar_idx + 1)
-   goto m64_failed;
-   } while (test_and_set_bit(win, &phb->ioda.m64_bar_alloc));
+   for (j = 0; j < vf_groups; j++) {
+   do {
+   win = find_next_zero_bit(&phb->ioda.m64_bar_alloc,
+   phb->ioda.m64_bar_idx + 1, 0);
+
+   if (win >= phb->ioda.m64_bar_idx + 1)
+   

[PATCH 20/27] powerpc/cell: Move controller ops from ppc_md to controller_ops

2015-03-25 Thread Daniel Axtens
This moves the Cell platform to use the pci_controller_ops
structure rather than ppc_md for PCI controller operations.

This depends on the patch to drop celleb support:
http://patchwork.ozlabs.org/patch/451730/
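The pattern these conversions move to can be sketched with simplified stand-ins: instead of one global ppc_md hook shared by every bridge, each host bridge carries its own ops table, and callers check for a NULL hook before dispatching. controller_ops, host_bridge, bridge_dma_dev_setup() and platform_dma_dev_setup() are hypothetical names for illustration, not the kernel's pci_controller_ops API.

```c
#include <assert.h>
#include <stddef.h>

struct pci_dev;   /* opaque here; only pointers are passed around */

/* Simplified stand-in for struct pci_controller_ops: a single hook. */
struct controller_ops {
	void (*dma_dev_setup)(struct pci_dev *dev);
};

/* Simplified stand-in for struct pci_controller: each host bridge
 * carries its own copy of the ops. */
struct host_bridge {
	struct controller_ops controller_ops;
};

static int setup_calls;   /* counts hook invocations for the example */

/* Stand-in for a platform-specific hook such as a DMA setup routine. */
static void platform_dma_dev_setup(struct pci_dev *dev)
{
	(void)dev;
	setup_calls++;
}

/* Hypothetical dispatcher: call the per-bridge hook only if it is set. */
static void bridge_dma_dev_setup(struct host_bridge *hose, struct pci_dev *dev)
{
	if (hose->controller_ops.dma_dev_setup)
		hose->controller_ops.dma_dev_setup(dev);
}
```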

Signed-off-by: Daniel Axtens <d...@axtens.net>
---
 arch/powerpc/platforms/cell/cell.h  | 24 
 arch/powerpc/platforms/cell/iommu.c |  7 ---
 arch/powerpc/platforms/cell/setup.c |  5 +
 3 files changed, 33 insertions(+), 3 deletions(-)
 create mode 100644 arch/powerpc/platforms/cell/cell.h

diff --git a/arch/powerpc/platforms/cell/cell.h b/arch/powerpc/platforms/cell/cell.h
new file mode 100644
index 000..ef143df
--- /dev/null
+++ b/arch/powerpc/platforms/cell/cell.h
@@ -0,0 +1,24 @@
+/*
+ * Cell Platform common data structures
+ *
+ * Copyright 2015, Daniel Axtens, IBM Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef CELL_H
+#define CELL_H
+
+#include <asm/pci-bridge.h>
+
+extern struct pci_controller_ops cell_pci_controller_ops;
+
+#endif
diff --git a/arch/powerpc/platforms/cell/iommu.c b/arch/powerpc/platforms/cell/iommu.c
index 31b1a67..4cb120f 100644
--- a/arch/powerpc/platforms/cell/iommu.c
+++ b/arch/powerpc/platforms/cell/iommu.c
@@ -39,6 +39,7 @@
 #include <asm/firmware.h>
 #include <asm/cell-regs.h>
 
+#include "cell.h"
 #include "interrupt.h"
 
 /* Define CELL_IOMMU_REAL_UNMAP to actually unmap non-used pages
@@ -857,7 +858,7 @@ static int __init cell_iommu_init_disabled(void)
cell_dma_direct_offset += base;
 
if (cell_dma_direct_offset != 0)
-   ppc_md.pci_dma_dev_setup = cell_pci_dma_dev_setup;
+   cell_pci_controller_ops.dma_dev_setup = cell_pci_dma_dev_setup;
 
printk("iommu: disabled, direct DMA offset is 0x%lx\n",
   cell_dma_direct_offset);
@@ -1197,8 +1198,8 @@ static int __init cell_iommu_init(void)
if (cell_iommu_init_disabled() == 0)
goto bail;
 
-   /* Setup various ppc_md. callbacks */
-   ppc_md.pci_dma_dev_setup = cell_pci_dma_dev_setup;
+   /* Setup various callbacks */
+   cell_pci_controller_ops.dma_dev_setup = cell_pci_dma_dev_setup;
ppc_md.dma_get_required_mask = cell_dma_get_required_mask;
ppc_md.tce_build = tce_build_cell;
ppc_md.tce_free = tce_free_cell;
diff --git a/arch/powerpc/platforms/cell/setup.c b/arch/powerpc/platforms/cell/setup.c
index d62aa98..d1be268 100644
--- a/arch/powerpc/platforms/cell/setup.c
+++ b/arch/powerpc/platforms/cell/setup.c
@@ -54,6 +54,7 @@
 #include <asm/cell-regs.h>
 #include <asm/io-workarounds.h>
 
+#include "cell.h"
 #include "interrupt.h"
 #include "pervasive.h"
 #include "ras.h"
@@ -131,6 +132,8 @@ static int cell_setup_phb(struct pci_controller *phb)
if (model == NULL || strcmp(np->name, "pci"))
return 0;
 
+   phb->controller_ops = cell_pci_controller_ops;
+
/* Setup workarounds for spider */
if (strcmp(model, "Spider"))
return 0;
@@ -279,3 +282,5 @@ define_machine(cell) {
.init_IRQ   = cell_init_irq,
.pci_setup_phb  = cell_setup_phb,
 };
+
+struct pci_controller_ops cell_pci_controller_ops;
-- 
2.1.4

[PATCH V15 07/21] PCI: Refresh First VF Offset and VF Stride when updating NumVFs

2015-03-25 Thread Wei Yang
The First VF Offset and VF Stride fields depend on the NumVFs setting, so
refresh the cached fields in struct pci_sriov when updating NumVFs.  See
the SR-IOV spec r1.1, sec 3.3.9 and 3.3.10.
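The cached fields matter because every VF routing ID is derived from them. The sketch below assumes the SR-IOV rule RID(VF id) = RID(PF) + First VF Offset + id * VF Stride, which is what virtfn_bus() and virtfn_devfn() in iov.c compute; vf_rid(), vf_bus() and vf_devfn() are hypothetical standalone helpers. If offset and stride are stale after NumVFs changes, every VF routing ID computed from them is wrong.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: 16-bit routing ID (bus << 8 | devfn) of VF `id`
 * (0-based). offset and stride must be the values the device reports for
 * the NumVFs value currently programmed. */
static uint16_t vf_rid(uint8_t pf_bus, uint8_t pf_devfn,
		       uint16_t offset, uint16_t stride, int id)
{
	uint32_t pf_rid = ((uint32_t)pf_bus << 8) | pf_devfn;

	assert(id >= 0);
	return (uint16_t)(pf_rid + offset + (uint32_t)stride * (uint32_t)id);
}

/* Bus number of VF `id`: the high byte of its routing ID. */
static uint8_t vf_bus(uint8_t pf_bus, uint8_t pf_devfn,
		      uint16_t offset, uint16_t stride, int id)
{
	return (uint8_t)(vf_rid(pf_bus, pf_devfn, offset, stride, id) >> 8);
}

/* Device/function number of VF `id`: the low byte of its routing ID. */
static uint8_t vf_devfn(uint8_t pf_bus, uint8_t pf_devfn,
			uint16_t offset, uint16_t stride, int id)
{
	return (uint8_t)(vf_rid(pf_bus, pf_devfn, offset, stride, id) & 0xff);
}
```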

[bhelgaas: changelog, remove kernel-doc comment marker]
Signed-off-by: Wei Yang <weiy...@linux.vnet.ibm.com>
Acked-by: Bjorn Helgaas <bhelg...@google.com>
---
 drivers/pci/iov.c |   23 +++
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 27b98c3..a8752c2 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -31,6 +31,21 @@ static inline u8 virtfn_devfn(struct pci_dev *dev, int id)
dev->sriov->stride * id) & 0xff;
 }
 
+/*
+ * Per SR-IOV spec sec 3.3.10 and 3.3.11, First VF Offset and VF Stride may
+ * change when NumVFs changes.
+ *
+ * Update iov->offset and iov->stride when NumVFs is written.
+ */
+static inline void pci_iov_set_numvfs(struct pci_dev *dev, int nr_virtfn)
+{
+   struct pci_sriov *iov = dev->sriov;
+
+   pci_write_config_word(dev, iov->pos + PCI_SRIOV_NUM_VF, nr_virtfn);
+   pci_read_config_word(dev, iov->pos + PCI_SRIOV_VF_OFFSET, &iov->offset);
+   pci_read_config_word(dev, iov->pos + PCI_SRIOV_VF_STRIDE, &iov->stride);
+}
+
 static struct pci_bus *virtfn_add_bus(struct pci_bus *bus, int busnr)
 {
struct pci_bus *child;
@@ -253,7 +268,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
return rc;
}
 
-   pci_write_config_word(dev, iov->pos + PCI_SRIOV_NUM_VF, nr_virtfn);
+   pci_iov_set_numvfs(dev, nr_virtfn);
iov->ctrl |= PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE;
pci_cfg_access_lock(dev);
pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
@@ -282,7 +297,7 @@ failed:
iov->ctrl &= ~(PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE);
pci_cfg_access_lock(dev);
pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
-   pci_write_config_word(dev, iov->pos + PCI_SRIOV_NUM_VF, 0);
+   pci_iov_set_numvfs(dev, 0);
ssleep(1);
pci_cfg_access_unlock(dev);
 
@@ -313,7 +328,7 @@ static void sriov_disable(struct pci_dev *dev)
sysfs_remove_link(&dev->dev.kobj, "dep_link");
 
iov->num_VFs = 0;
-   pci_write_config_word(dev, iov->pos + PCI_SRIOV_NUM_VF, 0);
+   pci_iov_set_numvfs(dev, 0);
 }
 
 static int sriov_init(struct pci_dev *dev, int pos)
@@ -452,7 +467,7 @@ static void sriov_restore_state(struct pci_dev *dev)
pci_update_resource(dev, i);
 
pci_write_config_dword(dev, iov->pos + PCI_SRIOV_SYS_PGSIZE, iov->pgsz);
-   pci_write_config_word(dev, iov->pos + PCI_SRIOV_NUM_VF, iov->num_VFs);
+   pci_iov_set_numvfs(dev, iov->num_VFs);
pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
if (iov->ctrl & PCI_SRIOV_CTRL_VFE)
msleep(100);
-- 
1.7.9.5

[PATCH V15 13/21] powerpc/pci: Don't unset PCI resources for VFs

2015-03-25 Thread Wei Yang
The PCI_REASSIGN_ALL_RSRC flag tells the kernel to ignore the resource
information set up by firmware and reassign all resources of PCI devices.

On powerpc, this happens in the header fixup function
pcibios_fixup_resources(), which clears the resources if this flag is set.
This works fine for PFs, since the kernel reassigns their resources in
pcibios_resource_survey() after they have been cleared.

Below is a simple call flow on how it works:

pcibios_init
  pcibios_scan_phb
    pci_scan_child_bus
      ...
        pci_device_add
          pci_fixup_device(pci_fixup_header)
            pcibios_fixup_resources         # header fixup
              for (i = 0; i < DEVICE_COUNT_RESOURCE; i++)
                dev->resource[i].start = 0
  pcibios_resource_survey                   # re-assign
    pcibios_allocate_resources

However, VF resources are not reassigned there: they are completely
determined by the PF resources, which have already been reassigned. This
means the VF resources must be left intact in pcibios_fixup_resources().

This patch skips the resource unset process in pcibios_fixup_resources()
when the pci_dev is a VF.
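The decision can be modelled in a few lines of userspace C. This is a simplified sketch, not pcibios_fixup_resources() itself: fake_dev, fake_res and NUM_RES are hypothetical stand-ins, the unset field loosely models the kernel's IORESOURCE_UNSET flag, and the reassign_all argument models PCI_REASSIGN_ALL_RSRC being set. Each PF BAR keeps its size but loses its firmware address; a VF is left untouched.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_RES 6   /* hypothetical; stands in for DEVICE_COUNT_RESOURCE */

/* Simplified stand-ins for struct resource and struct pci_dev. */
struct fake_res {
	unsigned long start, end;   /* inclusive range, as in struct resource */
	bool unset;                 /* loosely models IORESOURCE_UNSET */
};

struct fake_dev {
	bool is_virtfn;
	struct fake_res resource[NUM_RES];
};

/* Keep each BAR's size but forget its firmware address so the later
 * resource survey reassigns it. VFs are skipped because their BARs are
 * derived from the PF's IOV BARs, which are reassigned instead. */
static void fixup_resources(struct fake_dev *dev, bool reassign_all)
{
	int i;

	if (!reassign_all || dev->is_virtfn)
		return;

	for (i = 0; i < NUM_RES; i++) {
		struct fake_res *res = &dev->resource[i];

		res->end -= res->start;   /* preserve the size */
		res->start = 0;
		res->unset = true;
	}
}
```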

Signed-off-by: Wei Yang <weiy...@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/pci-common.c |4 
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 2a525c9..8203101 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -788,6 +788,10 @@ static void pcibios_fixup_resources(struct pci_dev *dev)
   pci_name(dev));
return;
}
+
+   if (dev->is_virtfn)
+   return;
+
for (i = 0; i  DEVICE_COUNT_RESOURCE; i++) {
struct resource *res = dev->resource + i;
struct pci_bus_region reg;
-- 
1.7.9.5

[PATCH V15 17/21] powerpc/powernv: Shift VF resource with an offset

2015-03-25 Thread Wei Yang
On the PowerNV platform, the position of a resource within an M64 BAR
determines the PE# the resource belongs to. In some cases a resource must
be adjusted to place it at the correct position in the M64 BAR.

This patch adds pnv_pci_vf_resource_shift() to shift the 'real' PF IOV BAR
address by an offset.

Note:

After doing so, there will be a hole in /proc/iomem when the offset is
positive. It looks as if the device returned some MMIO space to the system,
but in fact nobody can use it.
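A sketch of the shift for a non-negative offset, under stated assumptions: the IOV BAR has already been expanded, the offset is counted in VF BAR sizes (so VF 0 lands in M64 segment, and hence PE#, offset), and only the start moves. span and shift_iov_bar() are hypothetical names. Because the end stays put, [old start, new start) becomes the unused hole described in the note above.

```c
#include <assert.h>
#include <stdint.h>

/* Inclusive address range, like struct resource. */
struct span {
	uint64_t start, end;
};

/* Move the start of an expanded IOV BAR forward by `offset` VF BARs.
 * Returns 0 on success, or -1 (leaving *bar unchanged) if num_vfs VF BARs
 * would no longer fit before bar->end. */
static int shift_iov_bar(struct span *bar, uint64_t vf_bar_size,
			 unsigned int offset, unsigned int num_vfs)
{
	uint64_t new_start = bar->start + vf_bar_size * offset;

	assert(num_vfs > 0);
	if (new_start + vf_bar_size * num_vfs - 1 > bar->end)
		return -1;
	bar->start = new_start;
	return 0;
}
```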

[bhelgaas: rework loops, rework overlap check, index resource[]
conventionally, remove pci_regs.h include, squashed with next patch]
Signed-off-by: Wei Yang <weiy...@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/pci-bridge.h |4 +
 arch/powerpc/kernel/pci_dn.c  |   13 +
 arch/powerpc/platforms/powernv/pci-ioda.c |  528 -
 arch/powerpc/platforms/powernv/pci.c  |   18 +
 arch/powerpc/platforms/powernv/pci.h  |7 +
 5 files changed, 553 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index d0d1718..3c95097 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -177,6 +177,10 @@ struct pci_dn {
int pe_number;
 #ifdef CONFIG_PCI_IOV
u16 vfs_expanded;   /* number of VFs IOV BAR expanded */
+   u16 num_vfs;        /* number of VFs enabled*/
+   int offset;         /* PE# for the first VF PE */
+#define IODA_INVALID_M64        (-1)
+   int m64_wins[PCI_SRIOV_NUM_BARS];
 #endif /* CONFIG_PCI_IOV */
 #endif
struct list_head child_list;
diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
index f3a1a81..93ed7b3 100644
--- a/arch/powerpc/kernel/pci_dn.c
+++ b/arch/powerpc/kernel/pci_dn.c
@@ -217,6 +217,19 @@ void remove_dev_pci_info(struct pci_dev *pdev)
struct pci_dn *pdn, *tmp;
int i;
 
+   /*
+* VF and VF PE are created/released dynamically, so we need to
+* bind/unbind them.  Otherwise the VF and VF PE would be mismatched
+* when re-enabling SR-IOV.
+*/
+   if (pdev->is_virtfn) {
+   pdn = pci_get_pdn(pdev);
+#ifdef CONFIG_PPC_POWERNV
+   pdn->pe_number = IODA_INVALID_PE;
+#endif
+   return;
+   }
+
/* Only support IOV PF for now */
if (!pdev->is_physfn)
return;
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 93ec16c..11262df 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -44,6 +44,9 @@
 #include "powernv.h"
 #include "pci.h"
 
+/* 256M DMA window, 4K TCE pages, 8 bytes TCE */
+#define TCE32_TABLE_SIZE   ((0x10000000 / 0x1000) * 8)
+
 static void pe_level_printk(const struct pnv_ioda_pe *pe, const char *level,
const char *fmt, ...)
 {
@@ -56,11 +59,18 @@ static void pe_level_printk(const struct pnv_ioda_pe *pe, const char *level,
vaf.fmt = fmt;
vaf.va = &args;
 
-   if (pe->pdev)
+   if (pe->flags & PNV_IODA_PE_DEV)
strlcpy(pfix, dev_name(&pe->pdev->dev), sizeof(pfix));
-   else
+   else if (pe->flags & (PNV_IODA_PE_BUS | PNV_IODA_PE_BUS_ALL))
sprintf(pfix, "%04x:%02x     ",
pci_domain_nr(pe->pbus), pe->pbus->number);
+#ifdef CONFIG_PCI_IOV
+   else if (pe->flags & PNV_IODA_PE_VF)
+   sprintf(pfix, "%04x:%02x:%2x.%d",
+   pci_domain_nr(pe->parent_dev->bus),
+   (pe->rid & 0xff00) >> 8,
+   PCI_SLOT(pe->rid), PCI_FUNC(pe->rid));
+#endif /* CONFIG_PCI_IOV*/
 
printk("%spci %s: [PE# %.3d] %pV",
   level, pfix, pe->pe_number, &vaf);
@@ -591,7 +601,7 @@ static int pnv_ioda_set_peltv(struct pnv_phb *phb,
  bool is_add)
 {
struct pnv_ioda_pe *slave;
-   struct pci_dev *pdev;
+   struct pci_dev *pdev = NULL;
int ret;
 
/*
@@ -630,8 +640,12 @@ static int pnv_ioda_set_peltv(struct pnv_phb *phb,
 
if (pe->flags & (PNV_IODA_PE_BUS_ALL | PNV_IODA_PE_BUS))
pdev = pe->pbus->self;
-   else
+   else if (pe->flags & PNV_IODA_PE_DEV)
pdev = pe->pdev->bus->self;
+#ifdef CONFIG_PCI_IOV
+   else if (pe->flags & PNV_IODA_PE_VF)
+   pdev = pe->parent_dev->bus->self;
+#endif /* CONFIG_PCI_IOV */
while (pdev) {
struct pci_dn *pdn = pci_get_pdn(pdev);
struct pnv_ioda_pe *parent;
@@ -649,6 +663,87 @@ static int pnv_ioda_set_peltv(struct pnv_phb *phb,
return 0;
 }
 
+#ifdef CONFIG_PCI_IOV
+static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
+{
+   struct pci_dev *parent;
+   uint8_t bcomp, dcomp, fcomp;
+   int64_t rc;
+   long rid_end, rid;
+
+   /* 

[PATCH 14/27] powerpc/powermac: Move controller ops from ppc_md to controller_ops

2015-03-25 Thread Daniel Axtens
This moves the Power Mac platform to use the pci_controller_ops
structure rather than ppc_md for PCI controller operations.

Signed-off-by: Daniel Axtens <d...@axtens.net>
---
 arch/powerpc/platforms/powermac/pci.c   | 17 +++--
 arch/powerpc/platforms/powermac/pmac.h  |  5 +
 arch/powerpc/platforms/powermac/setup.c |  4 +---
 3 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/platforms/powermac/pci.c b/arch/powerpc/platforms/powermac/pci.c
index 9c89fd2..59ab16f 100644
--- a/arch/powerpc/platforms/powermac/pci.c
+++ b/arch/powerpc/platforms/powermac/pci.c
@@ -27,6 +27,8 @@
 #include <asm/grackle.h>
 #include <asm/ppc-pci.h>
 
+#include "pmac.h"
+
 #undef DEBUG
 
 #ifdef DEBUG
@@ -798,6 +800,7 @@ static int __init pmac_add_bridge(struct device_node *dev)
return -ENOMEM;
hose->first_busno = bus_range ? bus_range[0] : 0;
hose->last_busno = bus_range ? bus_range[1] : 0xff;
+   hose->controller_ops = pmac_pci_controller_ops;
 
disp_name = NULL;
 
@@ -942,7 +945,7 @@ void __init pmac_pci_init(void)
 }
 
 #ifdef CONFIG_PPC32
-bool pmac_pci_enable_device_hook(struct pci_dev *dev)
+static bool pmac_pci_enable_device_hook(struct pci_dev *dev)
 {
struct device_node* node;
int updatecfg = 0;
@@ -1225,7 +1228,7 @@ static void fixup_u4_pcie(struct pci_dev* dev)
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_APPLE, PCI_DEVICE_ID_APPLE_U4_PCIE, fixup_u4_pcie);
 
 #ifdef CONFIG_PPC64
-int pmac_pci_probe_mode(struct pci_bus *bus)
+static int pmac_pci_probe_mode(struct pci_bus *bus)
 {
struct device_node *node = pci_bus_to_OF_node(bus);
 
@@ -1240,3 +1243,13 @@ int pmac_pci_probe_mode(struct pci_bus *bus)
return PCI_PROBE_DEVTREE;
 }
 #endif /* CONFIG_PPC64 */
+
+struct pci_controller_ops pmac_pci_controller_ops = {
+#ifdef CONFIG_PPC64
+   .probe_mode = pmac_pci_probe_mode,
+#endif
+#ifdef CONFIG_PPC32
+   .enable_device_hook = pmac_pci_enable_device_hook,
+#endif
+};
+
diff --git a/arch/powerpc/platforms/powermac/pmac.h b/arch/powerpc/platforms/powermac/pmac.h
index b8d5721..e7f8163 100644
--- a/arch/powerpc/platforms/powermac/pmac.h
+++ b/arch/powerpc/platforms/powermac/pmac.h
@@ -25,7 +25,6 @@ extern void pmac_pci_init(void);
 extern void pmac_nvram_update(void);
 extern unsigned char pmac_nvram_read_byte(int addr);
 extern void pmac_nvram_write_byte(int addr, unsigned char val);
-extern bool pmac_pci_enable_device_hook(struct pci_dev *dev);
 extern void pmac_pcibios_after_init(void);
 extern int of_show_percpuinfo(struct seq_file *m, int i);
 
@@ -39,8 +38,6 @@ extern void low_cpu_die(void) __attribute__((noreturn));
 extern int pmac_nvram_init(void);
 extern void pmac_pic_init(void);
 
-#ifdef CONFIG_PPC64
-extern int pmac_pci_probe_mode(struct pci_bus *bus);
-#endif
+extern struct pci_controller_ops pmac_pci_controller_ops;
 
 #endif /* __PMAC_H__ */
diff --git a/arch/powerpc/platforms/powermac/setup.c b/arch/powerpc/platforms/powermac/setup.c
index 71a353c..8dd78f4 100644
--- a/arch/powerpc/platforms/powermac/setup.c
+++ b/arch/powerpc/platforms/powermac/setup.c
@@ -473,7 +473,7 @@ static void __init pmac_init_early(void)
udbg_adb_init(!!strstr(boot_command_line, "btextdbg"));
 
 #ifdef CONFIG_PPC64
-   iommu_init_early_dart(NULL);
+   iommu_init_early_dart(&pmac_pci_controller_ops);
 #endif
 
/* SMP Init has to be done early as we need to patch up
@@ -656,12 +656,10 @@ define_machine(powermac) {
.feature_call   = pmac_do_feature_call,
.progress   = udbg_progress,
 #ifdef CONFIG_PPC64
-   .pci_probe_mode = pmac_pci_probe_mode,
.power_save = power4_idle,
.enable_pmcs= power4_enable_pmcs,
 #endif /* CONFIG_PPC64 */
 #ifdef CONFIG_PPC32
-   .pcibios_enable_device_hook = pmac_pci_enable_device_hook,
.pcibios_after_init = pmac_pcibios_after_init,
.phys_mem_access_prot   = pci_phys_mem_access_prot,
 #endif
-- 
2.1.4

[PATCH 17/27] powerpc/pasemi: Move controller ops from ppc_md to controller_ops

2015-03-25 Thread Daniel Axtens
This moves the PaSemi platform to use the pci_controller_ops
structure rather than ppc_md for PCI controller operations.

Signed-off-by: Daniel Axtens <d...@axtens.net>
---
 arch/powerpc/platforms/pasemi/iommu.c  | 6 --
 arch/powerpc/platforms/pasemi/pasemi.h | 1 +
 arch/powerpc/platforms/pasemi/pci.c| 5 +
 3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/pasemi/iommu.c b/arch/powerpc/platforms/pasemi/iommu.c
index 2e576f2..b8f567b 100644
--- a/arch/powerpc/platforms/pasemi/iommu.c
+++ b/arch/powerpc/platforms/pasemi/iommu.c
@@ -27,6 +27,8 @@
 #include <asm/machdep.h>
 #include <asm/firmware.h>
 
+#include "pasemi.h"
+
 #define IOBMAP_PAGE_SHIFT  12
 #define IOBMAP_PAGE_SIZE   (1 << IOBMAP_PAGE_SHIFT)
 #define IOBMAP_PAGE_MASK   (IOBMAP_PAGE_SIZE - 1)
@@ -248,8 +250,8 @@ void __init iommu_init_early_pasemi(void)
 
iob_init(NULL);
 
-   ppc_md.pci_dma_dev_setup = pci_dma_dev_setup_pasemi;
-   ppc_md.pci_dma_bus_setup = pci_dma_bus_setup_pasemi;
+   pasemi_pci_controller_ops.dma_dev_setup = pci_dma_dev_setup_pasemi;
+   pasemi_pci_controller_ops.dma_bus_setup = pci_dma_bus_setup_pasemi;
ppc_md.tce_build = iobmap_build;
ppc_md.tce_free  = iobmap_free;
set_pci_dma_ops(&dma_iommu_ops);
diff --git a/arch/powerpc/platforms/pasemi/pasemi.h b/arch/powerpc/platforms/pasemi/pasemi.h
index ea65bf0..11f230a 100644
--- a/arch/powerpc/platforms/pasemi/pasemi.h
+++ b/arch/powerpc/platforms/pasemi/pasemi.h
@@ -30,5 +30,6 @@ static inline void restore_astate(int cpu)
 }
 #endif
 
+extern struct pci_controller_ops pasemi_pci_controller_ops;
 
 #endif /* _PASEMI_PASEMI_H */
diff --git a/arch/powerpc/platforms/pasemi/pci.c b/arch/powerpc/platforms/pasemi/pci.c
index aa86271..f3a68a0 100644
--- a/arch/powerpc/platforms/pasemi/pci.c
+++ b/arch/powerpc/platforms/pasemi/pci.c
@@ -31,6 +31,8 @@
 
 #include <asm/ppc-pci.h>
 
+#include "pasemi.h"
+
 #define PA_PXP_CFA(bus, devfn, off) (((bus) << 20) | ((devfn) << 12) | (off))
 
 static inline int pa_pxp_offset_valid(u8 bus, u8 devfn, int offset)
@@ -199,6 +201,7 @@ static int __init pas_add_bridge(struct device_node *dev)
 
hose->first_busno = 0;
hose->last_busno = 0xff;
+   hose->controller_ops = pasemi_pci_controller_ops;
 
setup_pa_pxp(hose);
 
@@ -239,3 +242,5 @@ void __iomem *pasemi_pci_getcfgaddr(struct pci_dev *dev, int offset)
 
return (void __iomem *)pa_pxp_cfg_addr(hose, dev->bus->number, dev->devfn, offset);
 }
+
+struct pci_controller_ops pasemi_pci_controller_ops;
-- 
2.1.4


[PATCH 16/27] powerpc/powernv: Move controller ops from ppc_md to controller_ops

2015-03-25 Thread Daniel Axtens
This moves the PowerNV platform to use the pci_controller_ops
structure rather than ppc_md for PCI controller operations.

Signed-off-by: Daniel Axtens d...@axtens.net
---
 arch/powerpc/platforms/powernv/pci-ioda.c   | 7 ---
 arch/powerpc/platforms/powernv/pci-p5ioc2.c | 1 +
 arch/powerpc/platforms/powernv/pci.c| 5 -
 arch/powerpc/platforms/powernv/powernv.h| 2 ++
 4 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index c18e191..b4e46bf 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1988,6 +1988,7 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
hose->last_busno = 0xff;
}
hose->private_data = phb;
+   hose->controller_ops = pnv_pci_controller_ops;
phb->hub_id = hub_id;
phb->opal_id = phb_id;
phb->type = ioda_type;
@@ -2104,9 +2105,9 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 * the child P2P bridges) can form individual PE.
 */
ppc_md.pcibios_fixup = pnv_pci_ioda_fixup;
-   ppc_md.pcibios_enable_device_hook = pnv_pci_enable_device_hook;
-   ppc_md.pcibios_window_alignment = pnv_pci_window_alignment;
-   ppc_md.pcibios_reset_secondary_bus = pnv_pci_reset_secondary_bus;
+   pnv_pci_controller_ops.enable_device_hook = pnv_pci_enable_device_hook;
+   pnv_pci_controller_ops.window_alignment = pnv_pci_window_alignment;
+   pnv_pci_controller_ops.reset_secondary_bus = pnv_pci_reset_secondary_bus;
pci_add_flags(PCI_REASSIGN_ALL_RSRC);
 
/* Reset IODA tables to a clean state */
diff --git a/arch/powerpc/platforms/powernv/pci-p5ioc2.c b/arch/powerpc/platforms/powernv/pci-p5ioc2.c
index 6ef6d4d..4729ca7 100644
--- a/arch/powerpc/platforms/powernv/pci-p5ioc2.c
+++ b/arch/powerpc/platforms/powernv/pci-p5ioc2.c
@@ -133,6 +133,7 @@ static void __init pnv_pci_init_p5ioc2_phb(struct device_node *np, u64 hub_id,
phb->hose->first_busno = 0;
phb->hose->last_busno = 0xff;
phb->hose->private_data = phb;
+   phb->hose->controller_ops = pnv_pci_controller_ops;
phb->hub_id = hub_id;
phb->opal_id = phb_id;
phb->type = PNV_PHB_P5IOC2;
diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index c8939ad..63518b3 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -761,7 +761,6 @@ void __init pnv_pci_init(void)
pci_devs_phb_init();
 
/* Configure IOMMU DMA hooks */
-   ppc_md.pci_dma_dev_setup = pnv_pci_dma_dev_setup;
ppc_md.tce_build = pnv_tce_build_vm;
ppc_md.tce_free = pnv_tce_free_vm;
ppc_md.tce_build_rm = pnv_tce_build_rm;
@@ -777,3 +776,7 @@ void __init pnv_pci_init(void)
 }
 
 machine_subsys_initcall_sync(powernv, tce_iommu_bus_notifier_init);
+
+struct pci_controller_ops pnv_pci_controller_ops = {
+   .dma_dev_setup = pnv_pci_dma_dev_setup,
+};
diff --git a/arch/powerpc/platforms/powernv/powernv.h b/arch/powerpc/platforms/powernv/powernv.h
index 604c48e..826d2c9 100644
--- a/arch/powerpc/platforms/powernv/powernv.h
+++ b/arch/powerpc/platforms/powernv/powernv.h
@@ -29,6 +29,8 @@ static inline u64 pnv_pci_dma_get_required_mask(struct pci_dev *pdev)
 }
 #endif
 
+extern struct pci_controller_ops pnv_pci_controller_ops;
+
 extern u32 pnv_get_supported_cpuidle_states(void);
 
 extern void pnv_lpc_init(void);
-- 
2.1.4


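In both the PaSemi and PowerNV patches, controller_ops is embedded in struct pci_controller rather than held as a pointer. A statement like `hose->controller_ops = pnv_pci_controller_ops;` therefore copies the whole structure at that moment. This is worth keeping in mind when reading the PowerNV hunk, where the copy near the top of pnv_pci_init_ioda_phb() comes before the assignments to pnv_pci_controller_ops further down.

The minimal C sketch below shows the consequence. It uses trimmed-down stand-in types (not the real kernel definitions) and a hypothetical demo_enable_hook().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Trimmed-down stand-ins for the kernel types; not the real definitions. */
struct pci_dev;

struct pci_controller_ops {
	void (*dma_dev_setup)(struct pci_dev *dev);
	bool (*enable_device_hook)(struct pci_dev *dev);
};

struct pci_controller {
	int first_busno;
	int last_busno;
	struct pci_controller_ops controller_ops;	/* embedded by value */
};

/* Platform-wide template, in the style of pnv_pci_controller_ops. */
static struct pci_controller_ops platform_ops;

/* Hypothetical hook that refuses to enable any device. */
static bool demo_enable_hook(struct pci_dev *dev)
{
	(void)dev;
	return false;
}

/* Equivalent of "hose->controller_ops = platform_ops;": a struct copy. */
static void attach_ops(struct pci_controller *hose)
{
	assert(hose != NULL);
	hose->controller_ops = platform_ops;
}

/* Dispatch through the hose's own copy; the default is to allow the device. */
static bool enable_device(struct pci_controller *hose, struct pci_dev *dev)
{
	if (hose->controller_ops.enable_device_hook)
		return hose->controller_ops.enable_device_hook(dev);
	return true;
}
```

A hose attached before platform_ops.enable_device_hook is set keeps a NULL hook and falls back to the default. A hose attached afterwards picks the hook up.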
[PATCH 21/27] powerpc: Remove shim for pci_controller_ops.window_alignment

2015-03-25 Thread Daniel Axtens
Signed-off-by: Daniel Axtens d...@axtens.net
---
 arch/powerpc/include/asm/machdep.h|  3 ---
 arch/powerpc/include/asm/pci-bridge.h | 18 --
 arch/powerpc/kernel/pci-common.c  | 12 +++-
 3 files changed, 11 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index 9d4a067..f1476b8 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -244,9 +244,6 @@ struct machdep_calls {
/* Called after scan and before resource survey */
void (*pcibios_fixup_phb)(struct pci_controller *hose);
 
-   /* Called during PCI resource reassignment */
-   resource_size_t (*pcibios_window_alignment)(struct pci_bus *, unsigned long type);
-
/* Reset the secondary bus of bridge */
void  (*pcibios_reset_secondary_bus)(struct pci_dev *dev);
 
diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index ea9496b..b62e043 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -327,24 +327,6 @@ static inline bool enable_device_hook(struct pci_dev *dev)
return true;
 }
 
-static inline resource_size_t pci_window_alignment(struct pci_bus *bus,
-  unsigned long type)
-{
-   struct pci_controller *hose = pci_bus_to_host(bus);
-
-   if (hose->controller_ops.window_alignment)
-   return hose->controller_ops.window_alignment(bus, type);
-   if (ppc_md.pcibios_window_alignment)
-   return ppc_md.pcibios_window_alignment(bus, type);
-
-   /*
-* PCI core will figure out the default
-* alignment: 4KiB for I/O and 1MiB for
-* memory window.
-*/
-   return 1;
-}
-
 static inline void reset_secondary_bus(struct pci_dev *dev)
 {
struct pci_controller *hose = pci_bus_to_host(dev->bus);
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 67d4dcb..9edb479 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -109,7 +109,17 @@ void pcibios_free_controller(struct pci_controller *phb)
 resource_size_t pcibios_window_alignment(struct pci_bus *bus,
 unsigned long type)
 {
-   return pci_window_alignment(bus, type);
+   struct pci_controller *hose = pci_bus_to_host(bus);
+
+   if (hose->controller_ops.window_alignment)
+   return hose->controller_ops.window_alignment(bus, type);
+
+   /*
+* PCI core will figure out the default
+* alignment: 4KiB for I/O and 1MiB for
+* memory window.
+*/
+   return 1;
 }
 
 void pcibios_reset_secondary_bus(struct pci_dev *dev)
-- 
2.1.4


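With the shim gone, no ppc_md fallback is left. Either the hose supplies window_alignment, or the function returns 1 and, as the comment says, the PCI core picks its own default (4KiB for I/O, 1MiB for memory).

A small, self-contained C sketch of that two-way decision follows. It uses stand-in types and a hypothetical demo_window_alignment() hook.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the kernel types; not the real definitions. */
typedef unsigned long long resource_size_t;
struct pci_bus;

struct pci_controller_ops {
	resource_size_t (*window_alignment)(struct pci_bus *bus,
					    unsigned long type);
};

struct pci_controller {
	struct pci_controller_ops controller_ops;
};

/*
 * Same decision as the new pcibios_window_alignment(), except that the
 * hose is passed in directly instead of being found via pci_bus_to_host().
 */
static resource_size_t window_alignment(struct pci_controller *hose,
					struct pci_bus *bus,
					unsigned long type)
{
	assert(hose != NULL);
	if (hose->controller_ops.window_alignment)
		return hose->controller_ops.window_alignment(bus, type);

	/* 1 = no extra constraint; the PCI core applies its own defaults. */
	return 1;
}

/* Hypothetical platform hook asking for 64 KiB alignment on every window. */
static resource_size_t demo_window_alignment(struct pci_bus *bus,
					     unsigned long type)
{
	(void)bus;
	(void)type;
	return 0x10000;
}
```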
[PATCH 26/27] powerpc: Remove shim for pci_controller_ops.dma_bus_setup

2015-03-25 Thread Daniel Axtens
Signed-off-by: Daniel Axtens d...@axtens.net
---
 arch/powerpc/include/asm/machdep.h|  2 --
 arch/powerpc/include/asm/pci-bridge.h | 14 --
 arch/powerpc/kernel/pci-common.c  |  5 -
 arch/powerpc/sysdev/dart_iommu.c  |  3 ---
 4 files changed, 4 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index 2f7b319..92b085b 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -103,8 +103,6 @@ struct machdep_calls {
 #endif
 #endif /* CONFIG_PPC64 */
 
-   void(*pci_dma_bus_setup)(struct pci_bus *bus);
-
/* Platform set_dma_mask and dma_get_required_mask overrides */
int (*dma_set_mask)(struct device *dev, u64 dma_mask);
u64 (*dma_get_required_mask)(struct device *dev);
diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index e578f67..4f39ef9 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -277,19 +277,5 @@ static inline int pcibios_vaddr_is_ioport(void __iomem *address)
 }
 #endif /* CONFIG_PCI */
 
-/*
- * Shims to prefer pci_controller version over ppc_md where available.
- */
-
-static inline void dma_bus_setup(struct pci_bus *bus)
-{
-   struct pci_controller *hose = pci_bus_to_host(bus);
-
-   if (hose->controller_ops.dma_bus_setup)
-   hose->controller_ops.dma_bus_setup(bus);
-   else if (ppc_md.pci_dma_bus_setup)
-   ppc_md.pci_dma_bus_setup(bus);
-}
-
 #endif /* __KERNEL__ */
 #endif /* _ASM_POWERPC_PCI_BRIDGE_H */
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 433b387..7447b10 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -946,6 +946,7 @@ static void pcibios_fixup_bridge(struct pci_bus *bus)
 
 void pcibios_setup_bus_self(struct pci_bus *bus)
 {
+   struct pci_controller *hose;
/* Fix up the bus resources for P2P bridges */
if (bus->self != NULL)
pcibios_fixup_bridge(bus);
@@ -957,7 +958,9 @@ void pcibios_setup_bus_self(struct pci_bus *bus)
ppc_md.pcibios_fixup_bus(bus);
 
/* Setup bus DMA mappings */
-   dma_bus_setup(bus);
+   hose = pci_bus_to_host(bus);
+   if (hose->controller_ops.dma_bus_setup)
+   hose->controller_ops.dma_bus_setup(bus);
 }
 
 static void pcibios_setup_device(struct pci_dev *dev)
diff --git a/arch/powerpc/sysdev/dart_iommu.c b/arch/powerpc/sysdev/dart_iommu.c
index ca38b1e..87b8000 100644
--- a/arch/powerpc/sysdev/dart_iommu.c
+++ b/arch/powerpc/sysdev/dart_iommu.c
@@ -398,8 +398,6 @@ void __init iommu_init_early_dart(struct pci_controller_ops *controller_ops)
if (controller_ops) {
controller_ops->dma_dev_setup = pci_dma_dev_setup_dart;
controller_ops->dma_bus_setup = pci_dma_bus_setup_dart;
-   } else {
-   ppc_md.pci_dma_bus_setup = pci_dma_bus_setup_dart;
}
/* Setup pci_dma ops */
set_pci_dma_ops(dma_iommu_ops);
@@ -411,7 +409,6 @@ void __init iommu_init_early_dart(struct pci_controller_ops *controller_ops)
controller_ops->dma_dev_setup = NULL;
controller_ops->dma_bus_setup = NULL;
}
-   ppc_md.pci_dma_bus_setup = NULL;
 
/* Setup pci_dma ops */
set_pci_dma_ops(dma_direct_ops);
-- 
2.1.4


[PATCH V15 02/21] powerpc/powernv: Use pci_dn, not device_node, in PCI config accessor

2015-03-25 Thread Wei Yang
The PCI config accessors previously relied on device_node.  Unfortunately,
VFs don't have a corresponding device_node, so change the accessors to use
pci_dn instead.

[bhelgaas: changelog]
Signed-off-by: Gavin Shan gws...@linux.vnet.ibm.com
---
 arch/powerpc/platforms/powernv/eeh-powernv.c |   14 +-
 arch/powerpc/platforms/powernv/pci.c |   69 ++
 arch/powerpc/platforms/powernv/pci.h |4 +-
 3 files changed, 40 insertions(+), 47 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index e261869..7a5021b 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -430,21 +430,31 @@ static inline bool powernv_eeh_cfg_blocked(struct device_node *dn)
 static int powernv_eeh_read_config(struct device_node *dn,
   int where, int size, u32 *val)
 {
+   struct pci_dn *pdn = PCI_DN(dn);
+
+   if (!pdn)
+   return PCIBIOS_DEVICE_NOT_FOUND;
+
if (powernv_eeh_cfg_blocked(dn)) {
*val = 0xFFFFFFFF;
return PCIBIOS_SET_FAILED;
}
 
-   return pnv_pci_cfg_read(dn, where, size, val);
+   return pnv_pci_cfg_read(pdn, where, size, val);
 }
 
 static int powernv_eeh_write_config(struct device_node *dn,
int where, int size, u32 val)
 {
+   struct pci_dn *pdn = PCI_DN(dn);
+
+   if (!pdn)
+   return PCIBIOS_DEVICE_NOT_FOUND;
+
if (powernv_eeh_cfg_blocked(dn))
return PCIBIOS_SET_FAILED;
 
-   return pnv_pci_cfg_write(dn, where, size, val);
+   return pnv_pci_cfg_write(pdn, where, size, val);
 }
 
 /**
diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index e69142f..6c20d6e 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -366,9 +366,9 @@ static void pnv_pci_handle_eeh_config(struct pnv_phb *phb, u32 pe_no)
spin_unlock_irqrestore(&phb->lock, flags);
 }
 
-static void pnv_pci_config_check_eeh(struct pnv_phb *phb,
-struct device_node *dn)
+static void pnv_pci_config_check_eeh(struct pci_dn *pdn)
 {
+   struct pnv_phb *phb = pdn->phb->private_data;
u8  fstate;
__be16  pcierr;
int pe_no;
@@ -379,7 +379,7 @@ static void pnv_pci_config_check_eeh(struct pnv_phb *phb,
 * setup that yet. So all ER errors should be mapped to
 * reserved PE.
 */
-   pe_no = PCI_DN(dn)->pe_number;
+   pe_no = pdn->pe_number;
if (pe_no == IODA_INVALID_PE) {
if (phb->type == PNV_PHB_P5IOC2)
pe_no = 0;
@@ -407,8 +407,7 @@ static void pnv_pci_config_check_eeh(struct pnv_phb *phb,
}
 
cfg_dbg(" - EEH check, bdfn=%04x PE#%d fstate=%x\n",
-   (PCI_DN(dn)->busno << 8) | (PCI_DN(dn)->devfn),
-   pe_no, fstate);
+   (pdn->busno << 8) | (pdn->devfn), pe_no, fstate);
 
/* Clear the frozen state if applicable */
if (fstate == OPAL_EEH_STOPPED_MMIO_FREEZE ||
@@ -425,10 +424,9 @@ static void pnv_pci_config_check_eeh(struct pnv_phb *phb,
}
 }
 
-int pnv_pci_cfg_read(struct device_node *dn,
+int pnv_pci_cfg_read(struct pci_dn *pdn,
 int where, int size, u32 *val)
 {
-   struct pci_dn *pdn = PCI_DN(dn);
struct pnv_phb *phb = pdn->phb->private_data;
u32 bdfn = (pdn->busno << 8) | pdn->devfn;
s64 rc;
@@ -462,10 +460,9 @@ int pnv_pci_cfg_read(struct device_node *dn,
return PCIBIOS_SUCCESSFUL;
 }
 
-int pnv_pci_cfg_write(struct device_node *dn,
+int pnv_pci_cfg_write(struct pci_dn *pdn,
  int where, int size, u32 val)
 {
-   struct pci_dn *pdn = PCI_DN(dn);
struct pnv_phb *phb = pdn->phb->private_data;
u32 bdfn = (pdn->busno << 8) | pdn->devfn;
 
@@ -489,18 +486,17 @@ int pnv_pci_cfg_write(struct device_node *dn,
 }
 
 #if CONFIG_EEH
-static bool pnv_pci_cfg_check(struct pci_controller *hose,
- struct device_node *dn)
+static bool pnv_pci_cfg_check(struct pci_dn *pdn)
 {
struct eeh_dev *edev = NULL;
-   struct pnv_phb *phb = hose->private_data;
+   struct pnv_phb *phb = pdn->phb->private_data;
 
/* EEH not enabled ? */
if (!(phb->flags & PNV_PHB_FLAG_EEH))
return true;
 
/* PE reset or device removed ? */
-   edev = of_node_to_eeh_dev(dn);
+   edev = pdn->edev;
if (edev) {
if (edev->pe &&
(edev->pe->state & EEH_PE_CFG_BLOCKED))
@@ -513,8 +509,7 @@ static bool pnv_pci_cfg_check(struct pci_controller *hose,
return true;
 }
 #else
-static inline pnv_pci_cfg_check(struct pci_controller *hose,
-   struct device_node *dn)
+static inline pnv_pci_cfg_check(struct pci_dn *pdn)
 {

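After this patch the accessors take everything they need from pci_dn: the PHB via pdn->phb, and the config address via busno/devfn.

The sketch below shows how that bdfn value is packed; for example, 06:0d.0 becomes 0x0668. It also shows a missing pci_dn being reported rather than dereferenced, in the style of the NULL check added to the EEH config accessors. A hypothetical pci_dn_lite stands in for struct pci_dn.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return codes with the same values as in include/linux/pci.h. */
#define PCIBIOS_SUCCESSFUL		0x00
#define PCIBIOS_DEVICE_NOT_FOUND	0x86

/* Hypothetical stand-in holding only the pci_dn fields used here. */
struct pci_dn_lite {
	uint8_t busno;
	uint8_t devfn;
};

/* devfn packs a 5-bit device (slot) number and a 3-bit function number. */
static uint8_t make_devfn(unsigned int slot, unsigned int func)
{
	assert(slot < 32 && func < 8);
	return (uint8_t)((slot << 3) | func);
}

/*
 * Same packing as "bdfn = (pdn->busno << 8) | pdn->devfn", guarded by a
 * NULL check so a device without a pci_dn is reported, not dereferenced.
 */
static int cfg_bdfn(const struct pci_dn_lite *pdn, uint32_t *bdfn)
{
	if (!pdn)
		return PCIBIOS_DEVICE_NOT_FOUND;
	*bdfn = ((uint32_t)pdn->busno << 8) | pdn->devfn;
	return PCIBIOS_SUCCESSFUL;
}
```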
[PATCH V15 11/21] PCI: Add pcibios_iov_resource_alignment() interface

2015-03-25 Thread Wei Yang
Per the SR-IOV spec r1.1, sec 3.3.14, the required alignment of a PF's IOV
BAR is the size of an individual VF BAR, and the size consumed is the
individual VF BAR size times NumVFs.

The PowerNV platform has additional alignment requirements to help support
its Partitionable Endpoint device isolation feature (see
Documentation/powerpc/pci_iov_resource_on_powernv.txt).

Add a pcibios_iov_resource_alignment() interface to allow platforms to
request additional alignment.

[bhelgaas: changelog, adapt to reworked pci_sriov_resource_alignment(),
drop align parameter]
Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com
Acked-by: Bjorn Helgaas bhelg...@google.com
---
 drivers/pci/iov.c   |8 +++-
 include/linux/pci.h |1 +
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 64c4692..ee0ebff 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -569,6 +569,12 @@ int pci_iov_resource_bar(struct pci_dev *dev, int resno)
4 * (resno - PCI_IOV_RESOURCES);
 }
 
+resource_size_t __weak pcibios_iov_resource_alignment(struct pci_dev *dev,
+ int resno)
+{
+   return pci_iov_resource_size(dev, resno);
+}
+
 /**
  * pci_sriov_resource_alignment - get resource alignment for VF BAR
  * @dev: the PCI device
@@ -581,7 +587,7 @@ int pci_iov_resource_bar(struct pci_dev *dev, int resno)
  */
 resource_size_t pci_sriov_resource_alignment(struct pci_dev *dev, int resno)
 {
-   return pci_iov_resource_size(dev, resno);
+   return pcibios_iov_resource_alignment(dev, resno);
 }
 
 /**
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 99ea948..4e1f17d 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1174,6 +1174,7 @@ unsigned char pci_bus_max_busnr(struct pci_bus *bus);
 void pci_setup_bridge(struct pci_bus *bus);
 resource_size_t pcibios_window_alignment(struct pci_bus *bus,
 unsigned long type);
+resource_size_t pcibios_iov_resource_alignment(struct pci_dev *dev, int resno);
 
 #define PCI_VGA_STATE_CHANGE_BRIDGE (1 << 0)
 #define PCI_VGA_STATE_CHANGE_DECODES (1 << 1)
-- 
1.7.9.5


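For a 1 MiB VF BAR and 16 VFs, the generic rule asks for 1 MiB alignment and 16 MiB of space.

Below is a minimal sketch of that rule using a weak default, the same linker mechanism as the __weak pcibios_iov_resource_alignment() above. iov_bar_alignment() and iov_bar_total() are hypothetical names. The real hook takes (struct pci_dev *, int resno) rather than a size.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t resource_size_t;	/* stand-in for the kernel type */

resource_size_t iov_bar_alignment(resource_size_t vf_bar_size);

/*
 * Weak default: the PF's IOV BAR only has to be aligned to the size of one
 * VF BAR. A platform could link in a strong definition with the same name
 * (in another object file) to request more, and the linker would prefer it.
 */
__attribute__((weak)) resource_size_t
iov_bar_alignment(resource_size_t vf_bar_size)
{
	return vf_bar_size;
}

/* Space the IOV BAR consumes: one VF BAR per VF. */
static resource_size_t iov_bar_total(resource_size_t vf_bar_size,
				     unsigned int num_vfs)
{
	assert(vf_bar_size != 0);
	return vf_bar_size * num_vfs;
}
```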
[PATCH V15 10/21] PCI: Add pcibios_sriov_enable() and pcibios_sriov_disable()

2015-03-25 Thread Wei Yang
VFs are dynamically created when a driver enables them.  On some platforms,
like PowerNV, special resources are necessary to enable VFs.

Add platform hooks for enabling and disabling VFs.

Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com
Acked-by: Bjorn Helgaas bhelg...@google.com
---
 drivers/pci/iov.c |   19 +++
 1 file changed, 19 insertions(+)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 5643a10..64c4692 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -220,6 +220,11 @@ static void virtfn_remove(struct pci_dev *dev, int id, int reset)
pci_dev_put(dev);
 }
 
+int __weak pcibios_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
+{
+   return 0;
+}
+
 static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
 {
int rc;
@@ -231,6 +236,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
struct pci_sriov *iov = dev->sriov;
int bars = 0;
int bus;
+   int retval;
 
if (!nr_virtfn)
return 0;
@@ -307,6 +313,12 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
if (nr_virtfn < initial)
initial = nr_virtfn;
 
+   if ((retval = pcibios_sriov_enable(dev, initial))) {
+   dev_err(&dev->dev, "failure %d from pcibios_sriov_enable()\n",
+   retval);
+   return retval;
+   }
+
for (i = 0; i < initial; i++) {
rc = virtfn_add(dev, i, 0);
if (rc)
@@ -335,6 +347,11 @@ failed:
return rc;
 }
 
+int __weak pcibios_sriov_disable(struct pci_dev *pdev)
+{
+   return 0;
+}
+
 static void sriov_disable(struct pci_dev *dev)
 {
int i;
@@ -346,6 +363,8 @@ static void sriov_disable(struct pci_dev *dev)
for (i = 0; i < iov->num_VFs; i++)
virtfn_remove(dev, i, 0);
 
+   pcibios_sriov_disable(dev);
+
iov->ctrl &= ~(PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE);
pci_cfg_access_lock(dev);
pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
-- 
1.7.9.5


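The ordering is the important part of this patch. The enable hook runs before any VF is added, so a platform can refuse up front. The disable hook runs only after every VF has been removed.

Below is a self-contained C model of that ordering. Function pointers stand in for the __weak override. fake_pf, ready_hook(), refuse_hook() and teardown_hook() are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical PF model: just enough state to show the hook ordering. */
struct fake_pf {
	int num_vfs;		/* VFs currently instantiated */
	int platform_ready;	/* set by the platform enable hook */
};

typedef int (*sriov_enable_hook)(struct fake_pf *pf, int num_vfs);
typedef void (*sriov_disable_hook)(struct fake_pf *pf);

static int sriov_enable_model(struct fake_pf *pf, int nr_virtfn,
			      sriov_enable_hook platform_enable)
{
	int retval;

	assert(pf != NULL);
	if (!nr_virtfn)
		return 0;

	/* The platform may refuse before any VF has been created. */
	retval = platform_enable ? platform_enable(pf, nr_virtfn) : 0;
	if (retval) {
		fprintf(stderr, "failure %d from platform enable hook\n", retval);
		return retval;
	}

	for (int i = 0; i < nr_virtfn; i++)
		pf->num_vfs++;		/* stands in for virtfn_add() */
	return 0;
}

static void sriov_disable_model(struct fake_pf *pf,
				sriov_disable_hook platform_disable)
{
	while (pf->num_vfs > 0)
		pf->num_vfs--;		/* stands in for virtfn_remove() */

	/* Platform resources are released only after the VFs are gone. */
	if (platform_disable)
		platform_disable(pf);
}

static int ready_hook(struct fake_pf *pf, int num_vfs)
{
	(void)num_vfs;
	pf->platform_ready = 1;
	return 0;
}

static int refuse_hook(struct fake_pf *pf, int num_vfs)
{
	(void)pf;
	(void)num_vfs;
	return -1;
}

static void teardown_hook(struct fake_pf *pf)
{
	pf->platform_ready = 0;
}
```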
[PATCH v2] powerpc/mpc85xx: Add MDIO bus muxing support to the board device tree(s)

2015-03-25 Thread Emil Medve
From: Igal Liberman igal.liber...@freescale.com

Describe the PHY topology for all configurations supported by each board

Based on prior work by Andy Fleming aflem...@gmail.com

Signed-off-by: Igal Liberman igal.liber...@freescale.com
Signed-off-by: Shruti Kanetkar kanetkar.shr...@gmail.com
Signed-off-by: Emil Medve emilian.me...@freescale.com
---

v2: Remove 'Change-Id'

 arch/powerpc/boot/dts/b4860qds.dts|  60 -
 arch/powerpc/boot/dts/b4qds.dtsi  |  51 -
 arch/powerpc/boot/dts/p1023rdb.dts|  24 +-
 arch/powerpc/boot/dts/p2041rdb.dts|  92 +++-
 arch/powerpc/boot/dts/p3041ds.dts | 112 +-
 arch/powerpc/boot/dts/p4080ds.dts | 184 +++-
 arch/powerpc/boot/dts/p5020ds.dts | 112 +-
 arch/powerpc/boot/dts/p5040ds.dts | 234 +++-
 arch/powerpc/boot/dts/t1040rdb.dts|  32 ++-
 arch/powerpc/boot/dts/t1042rdb.dts|  30 ++-
 arch/powerpc/boot/dts/t1042rdb_pi.dts |  18 +-
 arch/powerpc/boot/dts/t104xqds.dtsi   | 178 ++-
 arch/powerpc/boot/dts/t104xrdb.dtsi   |  33 ++-
 arch/powerpc/boot/dts/t2080qds.dts| 158 +-
 arch/powerpc/boot/dts/t2080rdb.dts|  67 +-
 arch/powerpc/boot/dts/t2081qds.dts| 221 ++-
 arch/powerpc/boot/dts/t4240qds.dts| 400 +-
 arch/powerpc/boot/dts/t4240rdb.dts| 149 -
 18 files changed, 2135 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/boot/dts/b4860qds.dts b/arch/powerpc/boot/dts/b4860qds.dts
index 6bb3707..98b1ef4 100644
--- a/arch/powerpc/boot/dts/b4860qds.dts
+++ b/arch/powerpc/boot/dts/b4860qds.dts
@@ -1,7 +1,7 @@
 /*
  * B4860DS Device Tree Source
  *
- * Copyright 2012 Freescale Semiconductor Inc.
+ * Copyright 2012 - 2015 Freescale Semiconductor Inc.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are met:
@@ -39,12 +39,69 @@
model = "fsl,B4860QDS";
compatible = "fsl,B4860QDS";
 
+   aliases {
+   phy_sgmii_1e = &phy_sgmii_1e;
+   phy_sgmii_1f = &phy_sgmii_1f;
+   phy_xaui_slot1 = &phy_xaui_slot1;
+   phy_xaui_slot2 = &phy_xaui_slot2;
+   };
+
ifc: localbus@ffe124000 {
board-control@3,0 {
compatible = "fsl,b4860qds-fpga", "fsl,fpga-qixis";
};
};
 
+   soc@ffe00 {
+   fman@40 {
+   ethernet@e8000 {
+   phy-handle = <&phy_sgmii_1e>;
+   phy-connection-type = "sgmii";
+   };
+
+   ethernet@ea000 {
+   phy-handle = <&phy_sgmii_1f>;
+   phy-connection-type = "sgmii";
+   };
+
+   ethernet@f {
+   phy-handle = <&phy_xaui_slot1>;
+   phy-connection-type = "xgmii";
+   };
+
+   ethernet@f2000 {
+   phy-handle = <&phy_xaui_slot2>;
+   phy-connection-type = "xgmii";
+   };
+
+   mdio@fc000 {
+   phy_sgmii_1e: ethernet-phy@1e {
+   reg = <0x1e>;
+   status = "disabled";
+   };
+
+   phy_sgmii_1f: ethernet-phy@1f {
+   reg = <0x1f>;
+   status = "disabled";
+   };
+   };
+
+   mdio@fd000 {
+   phy_xaui_slot1: xaui-phy@slot1 {
+   compatible = "ethernet-phy-ieee802.3-c45";
+   reg = <0x7>;
+   status = "disabled";
+   };
+
+   phy_xaui_slot2: xaui-phy@slot2 {
+   compatible = "ethernet-phy-ieee802.3-c45";
+   reg = <0x6>;
+   status = "disabled";
+   };
+   };
+   };
+   };
+
rio: rapidio@ffe0c {
reg = <0xf 0xfe0c 0 0x11000>;
 
@@ -55,7 +112,6 @@
ranges = <0 0 0xc 0x3000 0 0x1000>;
};
};
-
 };
 
 /include/ "fsl/b4860si-post.dtsi"
diff --git a/arch/powerpc/boot/dts/b4qds.dtsi b/arch/powerpc/boot/dts/b4qds.dtsi
index 559d006..af49456 100644
--- a/arch/powerpc/boot/dts/b4qds.dtsi
+++ b/arch/powerpc/boot/dts/b4qds.dtsi
@@ -1,7 +1,7 @@
 /*
  * B4420DS Device Tree Source
  *
- * Copyright 2012 - 2014 Freescale Semiconductor, Inc.
+ * Copyright 2012 - 2015 

Re: [PATCH v8 19/30] powerpc/pci: Use pci_scan_host_bridge() for simplicity

2015-03-25 Thread Yijing Wang
On 2015/3/25 7:58, Daniel Axtens wrote:
> On Tue, 2015-03-24 at 11:34 +0800, Yijing Wang wrote:
>> Now we could use pci_scan_host_bridge() to scan
>> pci buses, provide powerpc specific pci_host_bridge_ops.
>>
>> Signed-off-by: Yijing Wang wangyij...@huawei.com
>> CC: Benjamin Herrenschmidt b...@kernel.crashing.org
>> CC: linuxppc-dev@lists.ozlabs.org
>> ---
>>  arch/powerpc/kernel/pci-common.c |   60 +++--
>>  1 files changed, 37 insertions(+), 23 deletions(-)
>>
>> diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
>> index 2c58200..e2b50a2 100644
>> --- a/arch/powerpc/kernel/pci-common.c
>> +++ b/arch/powerpc/kernel/pci-common.c
>> @@ -773,6 +773,29 @@ void pcibios_set_root_bus_speed(struct pci_host_bridge *bridge)
>>  return ppc_md.pcibios_set_root_bus_speed(bridge);
>>  }
>>
>> +static int pci_host_scan_bus(struct pci_host_bridge *host)
>> +{
>> +int mode = PCI_PROBE_NORMAL;
>> +struct pci_bus *bus = host->bus;
>> +struct pci_controller *hose = dev_get_drvdata(&host->dev);
> Is there any reason this isn't *hose = pci_bus_to_host(bus)?

Hi Daniel, thanks for your review and comments. We want to make a generic
pci_host_bridge that holds the common host information. For example, the PCI
domain is common information for a PCI host bridge: this series saves the
domain in pci_host_bridge, so we no longer need to extract the domain from
pci_bus->sysdata through the platform-specific pci_domain_nr(). We also store
the sysdata in pci_host_bridge. Since pci_bus_to_host() is a platform
interface, I think using the common interface is better.

>> +
>> +/* Get probe mode and perform scan */
>> +if (hose->dn && ppc_md.pci_probe_mode)
>> +mode = ppc_md.pci_probe_mode(bus);
>> +
>> +pr_debug("probe mode: %d\n", mode);
>> +if (mode == PCI_PROBE_DEVTREE)
>> +of_scan_bus(hose->dn, bus);
>> +
>> +if (mode == PCI_PROBE_NORMAL) {
>> +pci_bus_update_busn_res_end(bus, 255);
>> +hose->last_busno = pci_scan_child_bus(bus);
>> +pci_bus_update_busn_res_end(bus, hose->last_busno);
>> +}
>> +
>> +return pci_bus_child_max_busnr(bus);
>> +}
>> +
> I'm having trouble convincing myself that this patch covers every
> variation within our PCI implementations. In particular, there's a
> stanza in of_scan_pci_bridge in kernel/pci_of_scan.c that's almost
> identical to this function. Does that implementation need to be cleaned
> up and replaced with this function too?
>

This is a pci_host_bridge_ops hook function, which is called by the PCI core.
After this series is applied, we only need to call pci_scan_host_bridge() to
scan PCI devices. This function is also extracted from pcibios_scan_phb(), so
it is not redundant code.

>>
>> @@ -1641,9 +1655,9 @@ void pcibios_scan_phb(struct pci_controller *hose)
>>  ppc_md.pcibios_fixup_phb(hose);
>>
>>  /* Configure PCI Express settings */
>> -if (bus && !pci_has_flag(PCI_PROBE_ONLY)) {
>> +if (host->bus && !pci_has_flag(PCI_PROBE_ONLY)) {
>>  struct pci_bus *child;
>> -list_for_each_entry(child, &bus->children, node)
>> +list_for_each_entry(child, &host->bus->children, node)
>>  pcie_bus_configure_settings(child);
>>  }
>>  }
> Two things: Firstly, the function uses hose throughout, not host.
> Secondly, you're not deleting the bus variable: what's the purpose of
> this change?

host is the common pci_host_bridge, which is created by the PCI core for the
PCI host bridge driver, while hose is the platform data used by powerpc. The
purpose of this patch/series is to simplify the PCI enumeration interface and
to reduce the number of weak functions used to set up PCI buses/devices during
PCI enumeration.

>
> Regards,
> Daniel
>


-- 
Thanks!
Yijing


[PATCH V16 00/20] Enable SRIOV on POWER8

2015-03-25 Thread Wei Yang
This patchset enables SRIOV on POWER8.

The general idea is to put each VF into its own PE and allocate the required
resources, such as MMIO/DMA/MSI. The major difficulty comes from the MMIO
allocation and the adjustment of the PF's IOV BAR.

On P8, we use M64BT to cover a PF's IOV BAR, which lets each individual VF sit
in its own PE. This gives more flexibility, but at the same time it imposes
some restrictions on the PF's IOV BAR size and alignment.

To achieve this, we need some hacks on the PCI devices' resources:
1. Expand the IOV BAR properly.
   Done by pnv_pci_ioda_fixup_iov_resources().
2. Shift the IOV BAR properly.
   Done by pnv_pci_vf_resource_shift().
3. Calculate the IOV BAR alignment with an arch-dependent function instead of
   using the size of an individual VF BAR.
   Done by pnv_pcibios_sriov_resource_alignment().
4. Take the IOV BAR alignment into consideration in sizing and assigning.
   This is achieved by commit "PCI: Take additional IOV BAR alignment in
   sizing and assigning".
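Steps 1 to 3 above can be sketched as rough arithmetic. The sketch simplifies the design: it assumes the M64 BAR is split into total_pe equal segments of one VF BAR each, that the expanded IOV BAR is aligned to its own size, and that the shift is a whole number of VF BARs. expanded_iov_size(), expanded_iov_align(), shifted_iov_start() and m64_segment() are hypothetical helpers, not the pnv_* functions named above.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t resource_size_t;	/* stand-in for the kernel type */

/* Assumption: the IOV BAR is expanded to one VF BAR per PE (total_pe). */
static resource_size_t expanded_iov_size(resource_size_t vf_bar_size,
					 unsigned int total_pe)
{
	assert(vf_bar_size != 0 && total_pe != 0);
	return vf_bar_size * total_pe;
}

/* Assumption: the expanded BAR is aligned to its own size, so a single
 * segmented M64 BAR can cover it exactly. */
static resource_size_t expanded_iov_align(resource_size_t vf_bar_size,
					  unsigned int total_pe)
{
	return expanded_iov_size(vf_bar_size, total_pe);
}

/* Shift the IOV BAR start by whole VF BARs so VF0 lands in segment pe_offset. */
static resource_size_t shifted_iov_start(resource_size_t start,
					 resource_size_t vf_bar_size,
					 unsigned int pe_offset)
{
	return start + vf_bar_size * pe_offset;
}

/* Which equal-sized M64 segment (PE number) an address falls into. */
static unsigned int m64_segment(resource_size_t addr,
				resource_size_t window_start,
				resource_size_t segment_size)
{
	assert(segment_size != 0 && addr >= window_start);
	return (unsigned int)((addr - window_start) / segment_size);
}
```

Take illustrative values of 1 MiB VF BARs and 256 PEs. The expanded BAR is 256 MiB and must be 256 MiB aligned. Shifting it by 5 VF BARs puts VF0 in PE#5 and VF2 in PE#7.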
Test Environment:
   The SRIOV devices tested are Emulex Lancer (10df:e220) and
   Mellanox ConnectX-3 (15b3:1003) on POWER8.

Example of passing a VF through to a guest via vfio:
1. Unbind the original driver and bind the device to the vfio-pci driver
   echo 0000:06:0d.0 > /sys/bus/pci/devices/0000:06:0d.0/driver/unbind
   echo  1102 0002 > /sys/bus/pci/drivers/vfio-pci/new_id
   Note: this should be done for each device in the same iommu_group
2. Start qemu and pass the device through via vfio
   /home/ywywyang/git/qemu-impreza/ppc64-softmmu/qemu-system-ppc64 \
   -M pseries -m 2048 -enable-kvm -nographic \
   -drive file=/home/ywywyang/kvm/fc19.img \
   -monitor telnet:localhost:5435,server,nowait -boot cd \
   -device spapr-pci-vfio-host-bridge,id=CXGB3,iommu=26,index=6

Verify that the responses really come from the VF:
1. ping from a machine in the same subnet (the broadcast domain)
2. run arp -n on that machine
   9.115.251.20 ether   00:00:c9:df:ed:bf   C eth0
3. ifconfig in the guest
   # ifconfig eth1
   eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
inet 9.115.251.20  netmask 255.255.255.0  broadcast 9.115.251.255
inet6 fe80::200:c9ff:fedf:edbf  prefixlen 64  scopeid 0x20<link>
ether 00:00:c9:df:ed:bf  txqueuelen 1000 (Ethernet)
RX packets 175  bytes 13278 (12.9 KiB)
RX errors 0  dropped 0  overruns 0  frame 0
TX packets 58  bytes 9276 (9.0 KiB)
TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
4. Both show the same MAC address

Note: make sure you shut down the other network interfaces in the guest.

---
v16:
   * rebased on Ben's next-eeh
   * The following two patches have been divided into three. The first two are
 already merged; the third one is renamed to "powerpc/pci: Create pci_dn for
 VFs" and sent in this patch set.
 8ec20d6 powerpc/powernv: Use pci_dn, not device_node, in PCI config accessor
 a3460fc powerpc/pci: Refactor pci_dn
v15:
   * Add Ack from Bjorn
   * Make more detailed comment for pnv_pci_vf_resource_shift()
v14:
   * call ppc_md.pcibios_fixup_sriov() in pcibios_add_device
   * add more explanation in change log
   * The following patches have been moved to the beginning:
 8ec20d6 powerpc/powernv: Use pci_dn, not device_node, in PCI config accessor
 a3460fc powerpc/pci: Refactor pci_dn
 These two patches will be modified to merge with other patches which are
 under discussion/review on the ppc mailing list. Some changes may also be
 made in other patches, which I didn't include in this series, so that the
 auto build robot could work on it.
 There may be several changes in the powerpc arch code which do not affect
 the PCI core. So after this patch set passes review in the PCI community, I
 will rebase this series on the ppc branch and send it out for comment.
   * use add_res->min_align as the alignment in reassign_resources_sorted()
   * some cleanup in the documentation
v13:
   * fix error in pcibios_iov_resource_alignment(), use pdev instead of dev
   * rename vf_num to num_vfs in pcibios_sriov_enable(),
 pnv_pci_vf_resource_shift(), pnv_pci_sriov_disable(),
 pnv_pci_sriov_enable(), pnv_pci_ioda2_setup_dma_pe()
   * add more explanation in commit "powerpc/pci: Don't unset PCI resources
 for VFs"
   * fix IOV BAR in hotplug path as well, and don't fixup an already added
 device
   * use roundup_pow_of_two() instead of __roundup_pow_of_two()
   * this is based on v4.0-rc1
v12:
   * remove align parameter from pcibios_iov_resource_alignment()
 default version returns pci_iov_resource_size() instead of the
 align parameter
   * in powerpc pcibios_iov_resource_alignment(), return
 pci_iov_resource_size() if there's no ppc_md function pointer
   * in 

[PATCH V16 05/20] PCI: Refresh First VF Offset and VF Stride when updating NumVFs

2015-03-25 Thread Wei Yang
The First VF Offset and VF Stride fields depend on the NumVFs setting, so
refresh the cached fields in struct pci_sriov when updating NumVFs.  See
the SR-IOV spec r1.1, sec 3.3.9 and 3.3.10.
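The cached offset and stride feed directly into each VF's routing ID, as the `virtfn_devfn()` context line in the hunk below shows, so a stale value misplaces every VF. A minimal userspace sketch of that arithmetic, assuming an 8-bit devfn whose carry goes into the bus number; `vf_bus()`, `vf_devfn()` and the sample values are hypothetical, not kernel API:

```c
#include <stdint.h>

/* Hypothetical helper: bus number of VF `id` (0-based), given the PF's bus
 * and devfn plus the First VF Offset / VF Stride valid for current NumVFs. */
static int vf_bus(int pf_bus, int pf_devfn, uint16_t offset, uint16_t stride,
                  int id)
{
    /* Routing ID = (bus << 8) | devfn, so overflow of devfn bumps the bus. */
    return pf_bus + ((pf_devfn + offset + stride * id) >> 8);
}

/* Hypothetical helper: devfn of VF `id`, same inputs as above. */
static int vf_devfn(int pf_devfn, uint16_t offset, uint16_t stride, int id)
{
    return (pf_devfn + offset + stride * id) & 0xff;
}
```

With a PF at bus 0x10, devfn 0, offset 0x80 and stride 2, VF 64 lands at bus 0x11, devfn 0; if the device reported a different offset after NumVFs changed, every one of these results would shift.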

[bhelgaas: changelog, remove kernel-doc comment marker]
Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com
Acked-by: Bjorn Helgaas bhelg...@google.com
---
 drivers/pci/iov.c |   23 +++
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 27b98c3..a8752c2 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -31,6 +31,21 @@ static inline u8 virtfn_devfn(struct pci_dev *dev, int id)
dev->sriov->stride * id) & 0xff;
 }
 
+/*
+ * Per SR-IOV spec sec 3.3.9 and 3.3.10, First VF Offset and VF Stride may
+ * change when NumVFs changes.
+ *
+ * Update iov->offset and iov->stride when NumVFs is written.
+ */
+static inline void pci_iov_set_numvfs(struct pci_dev *dev, int nr_virtfn)
+{
+   struct pci_sriov *iov = dev->sriov;
+
+   pci_write_config_word(dev, iov->pos + PCI_SRIOV_NUM_VF, nr_virtfn);
+   pci_read_config_word(dev, iov->pos + PCI_SRIOV_VF_OFFSET, &iov->offset);
+   pci_read_config_word(dev, iov->pos + PCI_SRIOV_VF_STRIDE, &iov->stride);
+}
+
 static struct pci_bus *virtfn_add_bus(struct pci_bus *bus, int busnr)
 {
struct pci_bus *child;
@@ -253,7 +268,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
return rc;
}
 
-   pci_write_config_word(dev, iov->pos + PCI_SRIOV_NUM_VF, nr_virtfn);
+   pci_iov_set_numvfs(dev, nr_virtfn);
iov->ctrl |= PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE;
pci_cfg_access_lock(dev);
pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
@@ -282,7 +297,7 @@ failed:
iov->ctrl &= ~(PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE);
pci_cfg_access_lock(dev);
pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
-   pci_write_config_word(dev, iov->pos + PCI_SRIOV_NUM_VF, 0);
+   pci_iov_set_numvfs(dev, 0);
ssleep(1);
pci_cfg_access_unlock(dev);
 
@@ -313,7 +328,7 @@ static void sriov_disable(struct pci_dev *dev)
sysfs_remove_link(&dev->dev.kobj, "dep_link");
 
iov->num_VFs = 0;
-   pci_write_config_word(dev, iov->pos + PCI_SRIOV_NUM_VF, 0);
+   pci_iov_set_numvfs(dev, 0);
 }
 
 static int sriov_init(struct pci_dev *dev, int pos)
@@ -452,7 +467,7 @@ static void sriov_restore_state(struct pci_dev *dev)
pci_update_resource(dev, i);
 
pci_write_config_dword(dev, iov->pos + PCI_SRIOV_SYS_PGSIZE, iov->pgsz);
-   pci_write_config_word(dev, iov->pos + PCI_SRIOV_NUM_VF, iov->num_VFs);
+   pci_iov_set_numvfs(dev, iov->num_VFs);
pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
if (iov->ctrl & PCI_SRIOV_CTRL_VFE)
msleep(100);
-- 
1.7.9.5

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH V16 12/20] powerpc/pci: Don't unset PCI resources for VFs

2015-03-25 Thread Wei Yang
The PCI_REASSIGN_ALL_RSRC flag is used to ignore resource information set up
by firmware, so that the kernel re-assigns all resources of PCI devices.

On the powerpc arch, this happens in the header fixup function
pcibios_fixup_resources(), which clears the resources if this flag is set.
This works fine for PFs, since after the cleanup the kernel re-assigns
their resources in pcibios_resource_survey().

Below is a simple call flow showing how it works:

  pcibios_init
    pcibios_scan_phb
      pci_scan_child_bus
        ...
          pci_device_add
            pci_fixup_device(pci_fixup_header)
              pcibios_fixup_resources             # header fixup
                for (i = 0; i < DEVICE_COUNT_RESOURCE; i++)
                  dev->resource[i].start = 0
    pcibios_resource_survey                       # re-assign
      pcibios_allocate_resources

However, the VF resources won't be re-assigned, since they are completely
determined by the PF resources, and the PF resources have already been
re-assigned. This means we need to leave the VF resources un-cleared in
pcibios_fixup_resources().

In this patch, we skip the resource unset process in
pcibios_fixup_resources() if the pci_dev is a VF.
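A simplified userspace model of the fixup described above, assuming a cut-down device structure (`fake_dev`, `fake_res` and `NUM_RES` are hypothetical stand-ins for `struct pci_dev`, `struct resource` and `DEVICE_COUNT_RESOURCE`). It shows only the control flow of skipping VFs, not the real kernel code:

```c
#include <stdbool.h>
#include <stddef.h>

#define NUM_RES 6  /* hypothetical stand-in for DEVICE_COUNT_RESOURCE */

struct fake_res { unsigned long start, end; };

struct fake_dev {
    bool is_virtfn;                   /* set for SR-IOV virtual functions */
    struct fake_res resource[NUM_RES];
};

/* With "reassign all" in effect, drop the firmware-assigned base so the
 * resource is reassigned later, but leave VFs alone: their BARs are
 * derived from the PF's (already reassigned) IOV BARs. */
static void fixup_resources(struct fake_dev *dev, bool reassign_all)
{
    if (!reassign_all || dev->is_virtfn)
        return;
    for (size_t i = 0; i < NUM_RES; i++) {
        struct fake_res *res = &dev->resource[i];
        res->end -= res->start;  /* keep the size, rebase at 0 */
        res->start = 0;
    }
}
```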

Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com
---
 arch/powerpc/kernel/pci-common.c |4 
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 2a525c9..8203101 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -788,6 +788,10 @@ static void pcibios_fixup_resources(struct pci_dev *dev)
   pci_name(dev));
return;
}
+
+   if (dev->is_virtfn)
+   return;
+
for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) {
struct resource *res = dev->resource + i;
struct pci_bus_region reg;
-- 
1.7.9.5


[PATCH V16 18/20] powerpc/powernv: Group VF PE when IOV BAR is big on PHB3

2015-03-25 Thread Wei Yang
When an IOV BAR is big, it is covered by 4 M64 windows.  This leads to
several VF PEs sitting in one PE in terms of M64.

Group VF PEs according to the M64 allocation.
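The grouping arithmetic added to `pnv_pci_vf_assign_m64()` below can be lifted into a standalone function for experimentation. This is a sketch only: `round_pow2()` is a local stand-in for the kernel's `roundup_pow_of_two()`, and `vf_grouping()` is a hypothetical name.

```c
#define M64_PER_IOV 4

/* Local stand-in for roundup_pow_of_two(); assumes v >= 1. */
static unsigned int round_pow2(unsigned int v)
{
    unsigned int p = 1;
    while (p < v)
        p <<= 1;
    return p;
}

/* Same decision as the patch: with M64_PER_IOV windows per IOV BAR, at most
 * M64_PER_IOV groups are formed and VFs are spread evenly across them. */
static void vf_grouping(unsigned int num_vfs, int m64_per_iov,
                        unsigned int *vf_groups, unsigned int *vf_per_group)
{
    if (m64_per_iov == M64_PER_IOV) {
        *vf_groups = (num_vfs <= M64_PER_IOV) ? num_vfs : M64_PER_IOV;
        *vf_per_group = (num_vfs <= M64_PER_IOV) ? 1 :
                        round_pow2(num_vfs) / m64_per_iov;
    } else {
        *vf_groups = 1;
        *vf_per_group = 1;
    }
}
```

For example, 10 VFs with four M64 windows per IOV BAR give 4 groups of 4 VFs each (10 rounded up to 16, divided by 4), while 3 VFs give 3 groups of 1.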

[bhelgaas: use dev_printk() when possible]
Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/pci-bridge.h |2 +-
 arch/powerpc/platforms/powernv/pci-ioda.c |  197 ++---
 2 files changed, 154 insertions(+), 45 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 415df85..560c739 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -185,7 +185,7 @@ struct pci_dn {
 #define M64_PER_IOV 4
int m64_per_iov;
 #define IODA_INVALID_M64        (-1)
-   int m64_wins[PCI_SRIOV_NUM_BARS];
+   int m64_wins[PCI_SRIOV_NUM_BARS][M64_PER_IOV];
 #endif /* CONFIG_PCI_IOV */
 #endif
struct list_head child_list;
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index b63925f..33088f6 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1156,26 +1156,27 @@ static int pnv_pci_vf_release_m64(struct pci_dev *pdev)
struct pci_controller *hose;
struct pnv_phb        *phb;
struct pci_dn *pdn;
-   int                    i;
+   int                    i, j;
 
bus = pdev->bus;
hose = pci_bus_to_host(bus);
phb = hose->private_data;
pdn = pci_get_pdn(pdev);
 
-   for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
-   if (pdn->m64_wins[i] == IODA_INVALID_M64)
-   continue;
-   opal_pci_phb_mmio_enable(phb->opal_id,
-   OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i], 0);
-   clear_bit(pdn->m64_wins[i], &phb->ioda.m64_bar_alloc);
-   pdn->m64_wins[i] = IODA_INVALID_M64;
-   }
+   for (i = 0; i < PCI_SRIOV_NUM_BARS; i++)
+   for (j = 0; j < M64_PER_IOV; j++) {
+   if (pdn->m64_wins[i][j] == IODA_INVALID_M64)
+   continue;
+   opal_pci_phb_mmio_enable(phb->opal_id,
+   OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i][j], 0);
+   clear_bit(pdn->m64_wins[i][j], &phb->ioda.m64_bar_alloc);
+   pdn->m64_wins[i][j] = IODA_INVALID_M64;
+   }
 
return 0;
 }
 
-static int pnv_pci_vf_assign_m64(struct pci_dev *pdev)
+static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
 {
struct pci_bus        *bus;
struct pci_controller *hose;
@@ -1183,17 +1184,33 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev)
struct pci_dn *pdn;
unsigned int   win;
struct resource   *res;
-   int                    i;
+   int                    i, j;
int64_t                rc;
+   int                    total_vfs;
+   resource_size_t        size, start;
+   int                    pe_num;
+   int                    vf_groups;
+   int                    vf_per_group;
 
bus = pdev->bus;
hose = pci_bus_to_host(bus);
phb = hose->private_data;
pdn = pci_get_pdn(pdev);
+   total_vfs = pci_sriov_get_totalvfs(pdev);
 
/* Initialize the m64_wins to IODA_INVALID_M64 */
for (i = 0; i < PCI_SRIOV_NUM_BARS; i++)
-   pdn->m64_wins[i] = IODA_INVALID_M64;
+   for (j = 0; j < M64_PER_IOV; j++)
+   pdn->m64_wins[i][j] = IODA_INVALID_M64;
+
+   if (pdn->m64_per_iov == M64_PER_IOV) {
+   vf_groups = (num_vfs <= M64_PER_IOV) ? num_vfs: M64_PER_IOV;
+   vf_per_group = (num_vfs <= M64_PER_IOV)? 1:
+   roundup_pow_of_two(num_vfs) / pdn->m64_per_iov;
+   } else {
+   vf_groups = 1;
+   vf_per_group = 1;
+   }
 
for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
res = &pdev->resource[i + PCI_IOV_RESOURCES];
@@ -1203,35 +1220,61 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev)
if (!pnv_pci_is_mem_pref_64(res->flags))
continue;
 
-   do {
-   win = find_next_zero_bit(&phb->ioda.m64_bar_alloc,
-   phb->ioda.m64_bar_idx + 1, 0);
-
-   if (win >= phb->ioda.m64_bar_idx + 1)
-   goto m64_failed;
-   } while (test_and_set_bit(win, &phb->ioda.m64_bar_alloc));
+   for (j = 0; j < vf_groups; j++) {
+   do {
+   win = find_next_zero_bit(&phb->ioda.m64_bar_alloc,
+   phb->ioda.m64_bar_idx + 1, 0);
+
+   if (win >= phb->ioda.m64_bar_idx + 1)
+   

[PATCH 1/6] powerpc/mm: Remove duplicate declaration of setbat()

2015-03-25 Thread Michael Ellerman
This is already declared in mmu_decl.h, so we don't need a second
version in the C file.

Signed-off-by: Michael Ellerman m...@ellerman.id.au
---
 arch/powerpc/mm/pgtable_32.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index 03b1a3b0fbd5..72555ab145cd 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -54,9 +54,6 @@ extern char etext[], _stext[];
 #ifdef HAVE_BATS
 extern phys_addr_t v_mapped_by_bats(unsigned long va);
 extern unsigned long p_mapped_by_bats(phys_addr_t pa);
-void setbat(int index, unsigned long virt, phys_addr_t phys,
-   unsigned int size, int flags);
-
 #else /* !HAVE_BATS */
 #define v_mapped_by_bats(x)    (0UL)
 #define p_mapped_by_bats(x)    (0UL)
-- 
2.1.0


[PATCH 3/6] powerpc: Make STRICT_MM_TYPECHECKS a config option

2015-03-25 Thread Michael Ellerman
The STRICT_MM_TYPECHECKS code has bit-rotted over the years. To make it
possible to easily build test it, make it a CONFIG option.
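What the option buys can be seen in a reduced copy of the strict typedefs from `page.h` (only `pte_t` and `pgprot_t` here; `pte_set_prot()` is a hypothetical helper). Because each type is a distinct one-member struct, passing a `pgprot_t` where a `pte_t` is expected fails to compile, whereas plain `unsigned long` typedefs would accept it silently:

```c
/* Reduced sketch of the STRICT_MM_TYPECHECKS flavour of the mm types. */
typedef struct { unsigned long pte; } pte_t;
typedef struct { unsigned long pgprot; } pgprot_t;

#define pte_val(x)     ((x).pte)
#define __pte(x)       ((pte_t) { (x) })
#define pgprot_val(x)  ((x).pgprot)
#define __pgprot(x)    ((pgprot_t) { (x) })

/* Hypothetical helper: the raw values must be unwrapped explicitly with the
 * *_val() accessors, so mixing up the two types cannot happen by accident. */
static pte_t pte_set_prot(pte_t pte, pgprot_t prot)
{
    return __pte(pte_val(pte) | pgprot_val(prot));
}
```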

Signed-off-by: Michael Ellerman m...@ellerman.id.au
---
 arch/powerpc/Kconfig.debug   | 8 
 arch/powerpc/include/asm/page.h  | 4 +---
 arch/powerpc/include/asm/pgtable-ppc64.h | 2 +-
 3 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
index ec2e40f2cc11..777108f4acab 100644
--- a/arch/powerpc/Kconfig.debug
+++ b/arch/powerpc/Kconfig.debug
@@ -19,6 +19,14 @@ config PPC_WERROR
depends on !PPC_DISABLE_WERROR
default y
 
+config STRICT_MM_TYPECHECKS
+   bool "Do extra type checking on mm types"
+   default n
+   help
+ This option turns on extra type checking for some mm related types.
+
+ If you don't know what this means, say N.
+
 config PRINT_STACK_DEPTH
int "Stack depth to print" if DEBUG_KERNEL
default 64
diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index 69c059887a2c..71294a6e976e 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -278,9 +278,7 @@ extern long long virt_phys_offset;
 
 #ifndef __ASSEMBLY__
 
-#undef STRICT_MM_TYPECHECKS
-
-#ifdef STRICT_MM_TYPECHECKS
+#ifdef CONFIG_STRICT_MM_TYPECHECKS
 /* These are used to make use of C type-checking. */
 
 /* PTE level */
diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h
index 43e6ad424c7f..f951d9cf358a 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64.h
@@ -118,7 +118,7 @@
  */
 #ifndef __real_pte
 
-#ifdef STRICT_MM_TYPECHECKS
+#ifdef CONFIG_STRICT_MM_TYPECHECKS
 #define __real_pte(e,p)        ((real_pte_t){(e)})
 #define __rpte_to_pte(r)   ((r).pte)
 #else
-- 
2.1.0


[RFC PATCH 6/6] powerpc: Always enable STRICT_MM_TYPECHECKS

2015-03-25 Thread Michael Ellerman
The argument for making this an option was that gcc produced inferior
code with it enabled. That doesn't seem to be the case anymore (gcc
4.9), so turn it on always.

Signed-off-by: Michael Ellerman m...@ellerman.id.au
---
 arch/powerpc/Kconfig.debug   |  8 ---
 arch/powerpc/include/asm/page.h  | 40 
 arch/powerpc/include/asm/pgtable-ppc64.h |  5 
 3 files changed, 53 deletions(-)

diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
index 777108f4acab..ec2e40f2cc11 100644
--- a/arch/powerpc/Kconfig.debug
+++ b/arch/powerpc/Kconfig.debug
@@ -19,14 +19,6 @@ config PPC_WERROR
depends on !PPC_DISABLE_WERROR
default y
 
-config STRICT_MM_TYPECHECKS
-   bool "Do extra type checking on mm types"
-   default n
-   help
- This option turns on extra type checking for some mm related types.
-
- If you don't know what this means, say N.
-
 config PRINT_STACK_DEPTH
int "Stack depth to print" if DEBUG_KERNEL
default 64
diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index 71294a6e976e..6af9514106ab 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -278,7 +278,6 @@ extern long long virt_phys_offset;
 
 #ifndef __ASSEMBLY__
 
-#ifdef CONFIG_STRICT_MM_TYPECHECKS
 /* These are used to make use of C type-checking. */
 
 /* PTE level */
@@ -319,45 +318,6 @@ typedef struct { unsigned long pgprot; } pgprot_t;
 #define pgprot_val(x)  ((x).pgprot)
 #define __pgprot(x)    ((pgprot_t) { (x) })
 
-#else
-
-/*
- * .. while these make it easier on the compiler
- */
-
-typedef pte_basic_t pte_t;
-#define pte_val(x) (x)
-#define __pte(x)   (x)
-
-#if defined(CONFIG_PPC_64K_PAGES) && defined(CONFIG_PPC_STD_MMU_64)
-typedef struct { pte_t pte; unsigned long hidx; } real_pte_t;
-#else
-typedef pte_t real_pte_t;
-#endif
-
-
-#ifdef CONFIG_PPC64
-typedef unsigned long pmd_t;
-#define pmd_val(x) (x)
-#define __pmd(x)   (x)
-
-#ifndef CONFIG_PPC_64K_PAGES
-typedef unsigned long pud_t;
-#define pud_val(x) (x)
-#define __pud(x)   (x)
-#endif /* !CONFIG_PPC_64K_PAGES */
-#endif /* CONFIG_PPC64 */
-
-typedef unsigned long pgd_t;
-#define pgd_val(x) (x)
-#define pgprot_val(x)  (x)
-
-typedef unsigned long pgprot_t;
-#define __pgd(x)   (x)
-#define __pgprot(x)    (x)
-
-#endif
-
 typedef struct { signed long pd; } hugepd_t;
 
 #ifdef CONFIG_HUGETLB_PAGE
diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h
index f951d9cf358a..4fa3035d8449 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64.h
@@ -118,13 +118,8 @@
  */
 #ifndef __real_pte
 
-#ifdef CONFIG_STRICT_MM_TYPECHECKS
 #define __real_pte(e,p)        ((real_pte_t){(e)})
 #define __rpte_to_pte(r)   ((r).pte)
-#else
-#define __real_pte(e,p)        (e)
-#define __rpte_to_pte(r)   (__pte(r))
-#endif
 #define __rpte_to_hidx(r,index)        (pte_val(__rpte_to_pte(r)) >> 12)
 
 #define pte_iterate_hashed_subpages(rpte, psize, va, index, shift)   \
-- 
2.1.0


Re: [PATCH 01/27] powerpc: move find_and_init_phbs() to pSeries specific code

2015-03-25 Thread Stephen Rothwell
Hi Daniel,

On Wed, 25 Mar 2015 16:35:35 +1100 Daniel Axtens <d...@axtens.net> wrote:

> Previously, find_and_init_phbs() was used in both PowerNV and pSeries
> setup. However, since RTAS support has been dropped from PowerNV, we
> can move it into a platform-specific file.
> 
> This patch depends on the patch to drop RTAS support from PowerNV:
> http://patchwork.ozlabs.org/patch/449316/

In the future, you should put this sort of commentary (this last
paragraph) below the --- line as we don't really want it in the
commit message, right?  This is more instructions to the
reviewers/committer than description of the change.

 
> Signed-off-by: Daniel Axtens <d...@axtens.net>
> ---
>  arch/powerpc/include/asm/ppc-pci.h     |  3 ---
>  arch/powerpc/kernel/rtas_pci.c         | 47 --
>  arch/powerpc/platforms/pseries/setup.c | 47 ++
>  3 files changed, 47 insertions(+), 50 deletions(-)
-- 
Cheers,
Stephen Rothwell    s...@canb.auug.org.au



[PATCH V16 04/20] PCI: Index IOV resources in the conventional style

2015-03-25 Thread Wei Yang
From: Bjorn Helgaas bhelg...@google.com

Most of PCI uses "res = &dev->resource[i]", not "res = dev->resource + i".
Use that style in iov.c also.
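A tiny sketch confirming that the two spellings compute the same address, using a hypothetical `pci_dev_model` and an assumed value of 7 for `PCI_IOV_RESOURCES` (both illustrative only):

```c
struct resource { unsigned long start, end; };  /* simplified stand-in */

#define PCI_IOV_RESOURCES 7  /* assumed value, for illustration */

struct pci_dev_model { struct resource resource[13]; };

/* Old style: pointer arithmetic on the array. */
static struct resource *iov_res_old(struct pci_dev_model *dev, int i)
{
    return dev->resource + PCI_IOV_RESOURCES + i;
}

/* New style: address of an indexed element. */
static struct resource *iov_res_new(struct pci_dev_model *dev, int i)
{
    return &dev->resource[i + PCI_IOV_RESOURCES];
}
```

Both return the same pointer for every index, which is why the patch can claim no functional change.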

No functional change.

Signed-off-by: Bjorn Helgaas bhelg...@google.com
Acked-by: Wei Yang weiy...@linux.vnet.ibm.com
---
 drivers/pci/iov.c |8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 5bca0e1..27b98c3 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -95,7 +95,7 @@ static int virtfn_add(struct pci_dev *dev, int id, int reset)
virtfn->multifunction = 0;
 
for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
-   res = dev->resource + PCI_IOV_RESOURCES + i;
+   res = &dev->resource[i + PCI_IOV_RESOURCES];
if (!res->parent)
continue;
virtfn->resource[i].name = pci_name(virtfn);
@@ -212,7 +212,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
nres = 0;
for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
bars |= (1 << (i + PCI_IOV_RESOURCES));
-   res = dev->resource + PCI_IOV_RESOURCES + i;
+   res = &dev->resource[i + PCI_IOV_RESOURCES];
if (res->parent)
nres++;
}
@@ -373,7 +373,7 @@ found:
 
nres = 0;
for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
-   res = dev->resource + PCI_IOV_RESOURCES + i;
+   res = &dev->resource[i + PCI_IOV_RESOURCES];
bar64 = __pci_read_base(dev, pci_bar_unknown, res,
pos + PCI_SRIOV_BAR + i * 4);
if (!res->flags)
@@ -417,7 +417,7 @@ found:
 
 failed:
for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
-   res = dev->resource + PCI_IOV_RESOURCES + i;
+   res = &dev->resource[i + PCI_IOV_RESOURCES];
res->flags = 0;
}
 
-- 
1.7.9.5


[PATCH V16 08/20] PCI: Add pcibios_sriov_enable() and pcibios_sriov_disable()

2015-03-25 Thread Wei Yang
VFs are dynamically created when a driver enables them.  On some platforms,
like PowerNV, special resources are necessary to enable VFs.

Add platform hooks for enabling and disabling VFs.
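The hooks follow the usual weak-symbol pattern: core code supplies a no-op default, and a platform may provide a strong definition that replaces it at link time. A minimal sketch assuming GCC or Clang on an ELF target (`__attribute__((weak))` is what the kernel's `__weak` expands to with those compilers); the `void *` parameter and `sriov_enable_sketch()` are simplifications, not the kernel signatures:

```c
#include <stdio.h>

/* Generic default: a no-op that succeeds. A strong definition of the same
 * symbol elsewhere in the link (e.g. platform code) would replace it. */
__attribute__((weak)) int pcibios_sriov_enable(void *pdev, unsigned short num_vfs)
{
    (void)pdev;
    (void)num_vfs;
    return 0;
}

/* Simplified caller mirroring the patch: give up before creating any VF
 * if the platform hook reports an error. */
static int sriov_enable_sketch(void *pdev, unsigned short initial)
{
    int retval = pcibios_sriov_enable(pdev, initial);

    if (retval) {
        fprintf(stderr, "failure %d from pcibios_sriov_enable()\n", retval);
        return retval;
    }
    /* ... virtfn_add() for each of the `initial` VFs would follow ... */
    return 0;
}
```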

Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com
Acked-by: Bjorn Helgaas bhelg...@google.com
---
 drivers/pci/iov.c |   19 +++
 1 file changed, 19 insertions(+)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 5643a10..64c4692 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -220,6 +220,11 @@ static void virtfn_remove(struct pci_dev *dev, int id, int reset)
pci_dev_put(dev);
 }
 
+int __weak pcibios_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
+{
+   return 0;
+}
+
 static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
 {
int rc;
@@ -231,6 +236,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
struct pci_sriov *iov = dev->sriov;
int bars = 0;
int bus;
+   int retval;
 
if (!nr_virtfn)
return 0;
@@ -307,6 +313,12 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
if (nr_virtfn < initial)
initial = nr_virtfn;
 
+   if ((retval = pcibios_sriov_enable(dev, initial))) {
+   dev_err(&dev->dev, "failure %d from pcibios_sriov_enable()\n",
+   retval);
+   return retval;
+   }
+
for (i = 0; i  initial; i++) {
rc = virtfn_add(dev, i, 0);
if (rc)
@@ -335,6 +347,11 @@ failed:
return rc;
 }
 
+int __weak pcibios_sriov_disable(struct pci_dev *pdev)
+{
+   return 0;
+}
+
 static void sriov_disable(struct pci_dev *dev)
 {
int i;
@@ -346,6 +363,8 @@ static void sriov_disable(struct pci_dev *dev)
for (i = 0; i < iov->num_VFs; i++)
virtfn_remove(dev, i, 0);
 
+   pcibios_sriov_disable(dev);
+
iov->ctrl &= ~(PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE);
pci_cfg_access_lock(dev);
pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
-- 
1.7.9.5


[PATCH V16 09/20] PCI: Add pcibios_iov_resource_alignment() interface

2015-03-25 Thread Wei Yang
Per the SR-IOV spec r1.1, sec 3.3.14, the required alignment of a PF's IOV
BAR is the size of an individual VF BAR, and the size consumed is the
individual VF BAR size times NumVFs.

The PowerNV platform has additional alignment requirements to help support
its Partitionable Endpoint device isolation feature (see
Documentation/powerpc/pci_iov_resource_on_powernv.txt).

Add a pcibios_iov_resource_alignment() interface to allow platforms to
request additional alignment.
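The default rule from the spec reduces to two lines of arithmetic. A sketch with hypothetical helper names, assuming a 64-bit `resource_size_t`:

```c
#include <stdint.h>

typedef uint64_t resource_size_t;

/* Default IOV BAR alignment: the size of one VF BAR (sec 3.3.14). */
static resource_size_t iov_bar_align(resource_size_t vf_bar_size)
{
    return vf_bar_size;
}

/* Space the IOV BAR consumes: one VF BAR per enabled VF. */
static resource_size_t iov_bar_span(resource_size_t vf_bar_size,
                                    unsigned int num_vfs)
{
    return vf_bar_size * num_vfs;
}
```

A platform override of `pcibios_iov_resource_alignment()` may return something larger than `iov_bar_align()` here; the weak default simply returns the per-VF size, as in the hunk below.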

[bhelgaas: changelog, adapt to reworked pci_sriov_resource_alignment(),
drop align parameter]
Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com
Acked-by: Bjorn Helgaas bhelg...@google.com
---
 drivers/pci/iov.c   |8 +++-
 include/linux/pci.h |1 +
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 64c4692..ee0ebff 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -569,6 +569,12 @@ int pci_iov_resource_bar(struct pci_dev *dev, int resno)
4 * (resno - PCI_IOV_RESOURCES);
 }
 
+resource_size_t __weak pcibios_iov_resource_alignment(struct pci_dev *dev,
+ int resno)
+{
+   return pci_iov_resource_size(dev, resno);
+}
+
 /**
  * pci_sriov_resource_alignment - get resource alignment for VF BAR
  * @dev: the PCI device
@@ -581,7 +587,7 @@ int pci_iov_resource_bar(struct pci_dev *dev, int resno)
  */
 resource_size_t pci_sriov_resource_alignment(struct pci_dev *dev, int resno)
 {
-   return pci_iov_resource_size(dev, resno);
+   return pcibios_iov_resource_alignment(dev, resno);
 }
 
 /**
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 99ea948..4e1f17d 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1174,6 +1174,7 @@ unsigned char pci_bus_max_busnr(struct pci_bus *bus);
 void pci_setup_bridge(struct pci_bus *bus);
 resource_size_t pcibios_window_alignment(struct pci_bus *bus,
 unsigned long type);
+resource_size_t pcibios_iov_resource_alignment(struct pci_dev *dev, int resno);
 
 #define PCI_VGA_STATE_CHANGE_BRIDGE (1 << 0)
 #define PCI_VGA_STATE_CHANGE_DECODES (1 << 1)
-- 
1.7.9.5


[PATCH V16 10/20] PCI: Consider additional PF's IOV BAR alignment in sizing and assigning

2015-03-25 Thread Wei Yang
When sizing and assigning resources, we divide the resources into two
lists: the requested list and the additional list.  We don't consider the
alignment of additional VF(n) BAR space.

This is because the alignment required for the VF(n) BAR space is the size
of an individual VF BAR, not the size of the space for *all* VFs.  But we
want additional alignment to support partitioning on PowerNV.

Consider the additional IOV BAR alignment when sizing and assigning
resources.  When there is not enough system MMIO space to accommodate both
the requested list and the additional list, the PF's IOV BAR alignment will
not contribute to the bridge. When there is enough system MMIO space for
both lists, the additional alignment will contribute to the bridge.

The additional alignment is stored in the min_align of pci_dev_resource,
which is stored in the additional list by add_to_list() at the end of
pbus_size_mem(). The additional alignment is calculated in
pci_resource_alignment().  For an IOV BAR, an arch-dependent function
provides the alignment for each architecture.
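The refactor below splits one list walk (`res_to_dev_res()`) from two thin accessors for `add_size` and `min_align`. A simplified sketch of that shape, assuming an array instead of a `list_head` list; all names here are hypothetical stand-ins:

```c
#include <stddef.h>

typedef unsigned long long resource_size_t;

/* Simplified stand-ins for struct resource / struct pci_dev_resource. */
struct resource { int id; };
struct dev_res_entry {
    struct resource *res;        /* resource this entry describes */
    resource_size_t add_size;    /* optional extra size */
    resource_size_t min_align;   /* optional extra alignment */
};

/* Single lookup, analogous to res_to_dev_res() but over an array. */
static struct dev_res_entry *find_entry(struct dev_res_entry *list, size_t n,
                                        const struct resource *res)
{
    for (size_t i = 0; i < n; i++)
        if (list[i].res == res)
            return &list[i];
    return NULL;
}

/* Thin accessors, analogous to get_res_add_size()/get_res_add_align():
 * a resource absent from the additional list contributes nothing. */
static resource_size_t add_size_of(struct dev_res_entry *list, size_t n,
                                   const struct resource *res)
{
    struct dev_res_entry *e = find_entry(list, n, res);
    return e ? e->add_size : 0;
}

static resource_size_t add_align_of(struct dev_res_entry *list, size_t n,
                                    const struct resource *res)
{
    struct dev_res_entry *e = find_entry(list, n, res);
    return e ? e->min_align : 0;
}
```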

[bhelgaas: changelog, printk cast]
Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com
Acked-by: Bjorn Helgaas bhelg...@google.com
---
 drivers/pci/setup-bus.c |   95 +++
 1 file changed, 79 insertions(+), 16 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index e3e17f3..6603d40 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -99,8 +99,8 @@ static void remove_from_list(struct list_head *head,
}
 }
 
-static resource_size_t get_res_add_size(struct list_head *head,
-   struct resource *res)
+static struct pci_dev_resource *res_to_dev_res(struct list_head *head,
+  struct resource *res)
 {
struct pci_dev_resource *dev_res;
 
@@ -109,17 +109,37 @@ static resource_size_t get_res_add_size(struct list_head *head,
int idx = res - &dev_res->dev->resource[0];
 
dev_printk(KERN_DEBUG, &dev_res->dev->dev,
-"res[%d]=%pR get_res_add_size add_size %llx\n",
+"res[%d]=%pR res_to_dev_res add_size %llx min_align %llx\n",
idx, dev_res->res,
-(unsigned long long)dev_res->add_size);
+(unsigned long long)dev_res->add_size,
+(unsigned long long)dev_res->min_align);
 
-   return dev_res->add_size;
+   return dev_res;
}
}
 
-   return 0;
+   return NULL;
 }
 
+static resource_size_t get_res_add_size(struct list_head *head,
+   struct resource *res)
+{
+   struct pci_dev_resource *dev_res;
+
+   dev_res = res_to_dev_res(head, res);
+   return dev_res ? dev_res->add_size : 0;
+}
+
+static resource_size_t get_res_add_align(struct list_head *head,
+struct resource *res)
+{
+   struct pci_dev_resource *dev_res;
+
+   dev_res = res_to_dev_res(head, res);
+   return dev_res ? dev_res->min_align : 0;
+}
+
+
 /* Sort resources by alignment */
 static void pdev_sort_resources(struct pci_dev *dev, struct list_head *head)
 {
@@ -215,7 +235,7 @@ static void reassign_resources_sorted(struct list_head *realloc_head,
struct resource *res;
struct pci_dev_resource *add_res, *tmp;
struct pci_dev_resource *dev_res;
-   resource_size_t add_size;
+   resource_size_t add_size, align;
int idx;
 
list_for_each_entry_safe(add_res, tmp, realloc_head, list) {
@@ -238,13 +258,13 @@ static void reassign_resources_sorted(struct list_head *realloc_head,
 
idx = res - &add_res->dev->resource[0];
add_size = add_res->add_size;
+   align = add_res->min_align;
if (!resource_size(res)) {
-   res->start = add_res->start;
+   res->start = align;
res->end = res->start + add_size - 1;
if (pci_assign_resource(add_res-dev, idx))
reset_resource(res);
} else {
-   resource_size_t align = add_res->min_align;
res->flags |= add_res->flags &
 (IORESOURCE_STARTALIGN|IORESOURCE_SIZEALIGN);
if (pci_reassign_resource(add_res->dev, idx,
@@ -368,8 +388,9 @@ static void __assign_resources_sorted(struct list_head *head,
LIST_HEAD(save_head);
LIST_HEAD(local_fail_head);
struct pci_dev_resource *save_res;
-   struct pci_dev_resource *dev_res, *tmp_res;
+   struct pci_dev_resource *dev_res, *tmp_res, *dev_res2;
unsigned long fail_type;
+   resource_size_t add_align, align;
 
/* 

[PATCH V16 11/20] powerpc/pci: Create pci_dn for VFs

2015-03-25 Thread Wei Yang
From: Gavin Shan gws...@linux.vnet.ibm.com

pci_dn is the extension of PCI device node and is created from device node.
Unfortunately, VFs are enabled dynamically by PF's driver and they don't
have corresponding device nodes and pci_dn, which is required to access
VFs' config spaces.

The patch creates pci_dn for VFs in pcibios_sriov_enable() on their PF,
and removes pci_dn for VFs in pcibios_sriov_disable() on their PF. When
VF's pci_dn is created, it's put to the child list of the pci_dn of PF's
upstream bridge. The pci_dn is linked to pci_dev during early fixup time
to setup the fast path.
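A userspace sketch of this bookkeeping, assuming a singly linked child list in place of `list_head`, and a caller-supplied First VF Offset and VF Stride in place of `pci_iov_virtfn_bus()`/`pci_iov_virtfn_devfn()`; `pdn_model`, `parent_model` and `add_vf_pdns()` are hypothetical. The PF flag makes a repeated call a no-op, mirroring the `PCI_DN_FLAG_IOV_VF` check in the patch:

```c
#include <stddef.h>
#include <stdlib.h>

#define FLAG_IOV_VF 0x01  /* stand-in for PCI_DN_FLAG_IOV_VF */

struct pdn_model {
    int flags, busno, devfn;
    struct pdn_model *next;      /* simplified singly linked child list */
};

struct parent_model { struct pdn_model *children; };

/* Add one entry per VF under the PF's parent. Returns the number added,
 * 0 if the VFs were already populated, or -1 on allocation failure. */
static int add_vf_pdns(struct pdn_model *pf, struct parent_model *parent,
                       int total_vfs, int pf_bus, int pf_devfn,
                       int offset, int stride)
{
    if (pf->flags & FLAG_IOV_VF)
        return 0;
    pf->flags |= FLAG_IOV_VF;

    for (int i = 0; i < total_vfs; i++) {
        struct pdn_model *pdn = calloc(1, sizeof(*pdn));
        if (!pdn)
            return -1;
        int rid = pf_devfn + offset + stride * i;  /* routing-ID arithmetic */
        pdn->busno = pf_bus + (rid >> 8);
        pdn->devfn = rid & 0xff;
        pdn->next = parent->children;  /* prepend; the kernel appends */
        parent->children = pdn;
    }
    return total_vfs;
}
```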

[bhelgaas: add ifdef around add_one_dev_pci_info(), use dev_printk()]
Signed-off-by: Gavin Shan gws...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/pci-bridge.h |3 +
 arch/powerpc/kernel/pci_dn.c  |  116 +
 arch/powerpc/platforms/powernv/pci-ioda.c |   16 
 3 files changed, 135 insertions(+)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 2c6dc2a..ece30f5 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -156,6 +156,7 @@ struct iommu_table;
 
 struct pci_dn {
int flags;
+#define PCI_DN_FLAG_IOV_VF 0x01
 
int busno;  /* pci bus number */
int devfn;  /* pci device and function number */
@@ -188,6 +189,8 @@ struct pci_dn {
 extern struct pci_dn *pci_get_pdn_by_devfn(struct pci_bus *bus,
   int devfn);
 extern struct pci_dn *pci_get_pdn(struct pci_dev *pdev);
+extern struct pci_dn *add_dev_pci_data(struct pci_dev *pdev);
+extern void remove_dev_pci_data(struct pci_dev *pdev);
 extern void *update_dn_pci_info(struct device_node *dn, void *data);
 
 static inline int pci_device_from_OF_node(struct device_node *np,
diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
index 65b9836..e5f1d78 100644
--- a/arch/powerpc/kernel/pci_dn.c
+++ b/arch/powerpc/kernel/pci_dn.c
@@ -136,6 +136,122 @@ struct pci_dn *pci_get_pdn(struct pci_dev *pdev)
return NULL;
 }
 
+#ifdef CONFIG_PCI_IOV
+static struct pci_dn *add_one_dev_pci_data(struct pci_dn *parent,
+  struct pci_dev *pdev,
+  int busno, int devfn)
+{
+   struct pci_dn *pdn;
+
+   /* Except PHB, we always have the parent */
+   if (!parent)
+   return NULL;
+
+   pdn = kzalloc(sizeof(*pdn), GFP_KERNEL);
+   if (!pdn) {
+   dev_warn(&pdev->dev, "%s: Out of memory!\n", __func__);
+   return NULL;
+   }
+
+   pdn->phb = parent->phb;
+   pdn->parent = parent;
+   pdn->busno = busno;
+   pdn->devfn = devfn;
+#ifdef CONFIG_PPC_POWERNV
+   pdn->pe_number = IODA_INVALID_PE;
+#endif
+   INIT_LIST_HEAD(&pdn->child_list);
+   INIT_LIST_HEAD(&pdn->list);
+   list_add_tail(&pdn->list, &parent->child_list);
+
+   /*
+* If we already have a PCI device instance, let's
+* bind them.
+*/
+   if (pdev)
+   pdev->dev.archdata.pci_data = pdn;
+
+   return pdn;
+}
+#endif
+
+struct pci_dn *add_dev_pci_data(struct pci_dev *pdev)
+{
+#ifdef CONFIG_PCI_IOV
+   struct pci_dn *parent, *pdn;
+   int i;
+
+   /* Only support IOV for now */
+   if (!pdev->is_physfn)
+   return pci_get_pdn(pdev);
+
+   /* Check if VFs have been populated */
+   pdn = pci_get_pdn(pdev);
+   if (!pdn || (pdn->flags & PCI_DN_FLAG_IOV_VF))
+   return NULL;
+
+   pdn->flags |= PCI_DN_FLAG_IOV_VF;
+   parent = pci_bus_to_pdn(pdev->bus);
+   if (!parent)
+   return NULL;
+
+   for (i = 0; i < pci_sriov_get_totalvfs(pdev); i++) {
+   pdn = add_one_dev_pci_data(parent, NULL,
+  pci_iov_virtfn_bus(pdev, i),
+  pci_iov_virtfn_devfn(pdev, i));
+   if (!pdn) {
+   dev_warn(&pdev->dev, "%s: Cannot create firmware data for VF#%d\n",
+__func__, i);
+   return NULL;
+   }
+   }
+#endif /* CONFIG_PCI_IOV */
+
+   return pci_get_pdn(pdev);
+}
+
+void remove_dev_pci_data(struct pci_dev *pdev)
+{
+#ifdef CONFIG_PCI_IOV
+   struct pci_dn *parent;
+   struct pci_dn *pdn, *tmp;
+   int i;
+
+   /* Only support IOV PF for now */
+   if (!pdev->is_physfn)
+   return;
+
+   /* Check if VFs have been populated */
+   pdn = pci_get_pdn(pdev);
+   if (!pdn || !(pdn->flags & PCI_DN_FLAG_IOV_VF))
+   return;
+
+   pdn->flags &= ~PCI_DN_FLAG_IOV_VF;
+   parent = pci_bus_to_pdn(pdev->bus);
+   if (!parent)
+   return;
+
+   /*
+* We might introduce flag to pci_dn in future
+* so that we can release VF's firmware data in
+

[PATCH V16 14/20] powerpc/powernv: Reserve additional space for IOV BAR according to the number of total_pe

2015-03-25 Thread Wei Yang
On PHB3, the PF IOV BAR will be covered by an M64 BAR to get better PE
isolation. An M64 BAR is a PHB3 hardware resource that can map a range of
MMIO to PE numbers on the PowerNV platform. The range is divided equally
into total_pe segments, with each segment mapping to one PE number. The M64
BAR must also map an MMIO range whose size is a power of two.

The total_pe number is usually different from total_VFs, which can lead to
a conflict between MMIO space and the PE number.

For example, if total_VFs is 128 and total_pe is 256, the second half of
the M64 BAR will be part of another PCI device, which may already belong to
other PEs.

This patch prevents the conflict by reserving additional space for the PF
IOV BAR, equal to total_pe times the VF BAR size.
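The conflict in the example above can be checked with a little arithmetic. A sketch assuming a 1 MiB VF BAR (an arbitrary choice) and a simplified model in which M64 segment i maps to PE i; the helper names are hypothetical:

```c
#include <stdint.h>

/* Bytes needed for `num` VF BARs of `vf_bar_size` each. */
static uint64_t iov_bar_size(uint64_t vf_bar_size, unsigned int num)
{
    return vf_bar_size * num;
}

/* Simplified model: an M64 window of `m64_size` bytes is split into
 * `total_pe` equal segments, and segment i maps to PE i. */
static unsigned int m64_segment_pe(uint64_t offset, uint64_t m64_size,
                                   unsigned int total_pe)
{
    uint64_t seg = m64_size / total_pe;
    return (unsigned int)(offset / seg);
}
```

With 128 VFs the IOV BAR spans 128 MiB, but an M64 window with one 1 MiB segment per PE for 256 PEs must span 256 MiB, so PEs 128 to 255 would cover address space beyond the BAR. Reserving total_pe times the VF BAR size (256 MiB here) keeps the whole window inside the PF's IOV BAR.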

[bhelgaas: make dev_printk() output more consistent, index resource[]
conventionally]
Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/machdep.h|4 +++
 arch/powerpc/include/asm/pci-bridge.h |3 ++
 arch/powerpc/kernel/pci-common.c  |6 
 arch/powerpc/platforms/powernv/pci-ioda.c |   43 +
 4 files changed, 56 insertions(+)

diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index 098d51e..b303833 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -250,6 +250,10 @@ struct machdep_calls {
/* Reset the secondary bus of bridge */
void  (*pcibios_reset_secondary_bus)(struct pci_dev *dev);
 
+#ifdef CONFIG_PCI_IOV
+   void (*pcibios_fixup_sriov)(struct pci_dev *pdev);
+#endif /* CONFIG_PCI_IOV */
+
/* Called to shutdown machine specific hardware not already controlled
 * by other drivers.
 */
diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index ece30f5..7b8ebc5 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -178,6 +178,9 @@ struct pci_dn {
 #define IODA_INVALID_PE	(-1)
 #ifdef CONFIG_PPC_POWERNV
int pe_number;
+#ifdef CONFIG_PCI_IOV
+   u16 vfs_expanded;   /* number of VFs IOV BAR expanded */
+#endif /* CONFIG_PCI_IOV */
 #endif
struct list_head child_list;
struct list_head list;
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 8203101..375bf70 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -990,6 +990,12 @@ int pcibios_add_device(struct pci_dev *dev)
 */
if (dev->bus->is_added)
pcibios_setup_device(dev);
+
+#ifdef CONFIG_PCI_IOV
+   if (ppc_md.pcibios_fixup_sriov)
+   ppc_md.pcibios_fixup_sriov(dev);
+#endif /* CONFIG_PCI_IOV */
+
return 0;
 }
 
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 9447ee9..1da45aa 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1749,6 +1749,46 @@ static void pnv_pci_init_ioda_msis(struct pnv_phb *phb)
 static void pnv_pci_init_ioda_msis(struct pnv_phb *phb) { }
 #endif /* CONFIG_PCI_MSI */
 
+#ifdef CONFIG_PCI_IOV
+static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
+{
+   struct pci_controller *hose;
+   struct pnv_phb *phb;
+   struct resource *res;
+   int i;
+   resource_size_t size;
+   struct pci_dn *pdn;
+
+   if (!pdev->is_physfn || pdev->is_added)
+   return;
+
+   hose = pci_bus_to_host(pdev->bus);
+   phb = hose->private_data;
+
+   pdn = pci_get_pdn(pdev);
+   pdn->vfs_expanded = 0;
+
+   for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
+   res = &pdev->resource[i + PCI_IOV_RESOURCES];
+   if (!res->flags || res->parent)
+   continue;
+   if (!pnv_pci_is_mem_pref_64(res->flags)) {
+   dev_warn(&pdev->dev, "Skipping expanding VF BAR%d: %pR\n",
+i, res);
+   continue;
+   }
+
+   dev_dbg(&pdev->dev, " Fixing VF BAR%d: %pR to\n", i, res);
+   size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
+   res->end = res->start + size * phb->ioda.total_pe - 1;
+   dev_dbg(&pdev->dev, "                       %pR\n", res);
+   dev_info(&pdev->dev, "VF BAR%d: %pR (expanded to %d VFs for PE alignment)",
+   i, res, phb->ioda.total_pe);
+   }
+   pdn->vfs_expanded = phb->ioda.total_pe;
+}
+#endif /* CONFIG_PCI_IOV */
+
 /*
  * This function is supposed to be called on basis of PE from top
  * to bottom style. So the the I/O or MMIO segment assigned to
@@ -2122,6 +2162,9 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
ppc_md.pcibios_enable_device_hook = pnv_pci_enable_device_hook;
ppc_md.pcibios_window_alignment = 

[PATCH V16 15/20] powerpc/powernv: Implement pcibios_iov_resource_alignment() on powernv

2015-03-25 Thread Wei Yang
Implement pcibios_iov_resource_alignment() on the powernv platform.

On the PowerNV platform, there are 3 cases for the IOV BAR:
1. initial state: the IOV BAR size is a multiple of the VF BAR size
2. after expansion: the IOV BAR size has been expanded to meet the M64 segment size
3. sizing stage: the IOV BAR has been truncated to 0

pnv_pci_iov_resource_alignment() handles each of these three cases.
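The decision can be sketched in isolation. This is a hypothetical user-space mirror of pnv_pci_iov_resource_alignment() below. It takes as plain arguments the values the kernel reads from struct pci_dev and struct pci_dn, so the function name and parameters are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the three-case alignment decision. */
static uint64_t iov_bar_alignment(uint64_t cur_iov_bar_size, /* resource_size() of the IOV BAR */
				  uint64_t vf_bar_size,      /* pci_iov_resource_size() */
				  unsigned vfs_expanded)     /* pdn->vfs_expanded, 0 if not expanded */
{
	/* The sketch assumes the BAR is implemented, so a VF BAR size exists. */
	assert(vf_bar_size != 0);

	/* Cases 1 and 2: the IOV BAR still has a size (initial or expanded),
	 * so align it to that whole size. */
	if (cur_iov_bar_size)
		return cur_iov_bar_size;

	/* Case 3: sizing truncated the BAR to 0.  Return the size it would
	 * have after expansion, or just one VF BAR if it was never expanded. */
	if (vfs_expanded)
		return (uint64_t)vfs_expanded * vf_bar_size;

	return vf_bar_size;
}
```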

[bhelgaas: adjust to drop align parameter, return pci_iov_resource_size()
if no ppc_md machdep_call version]
Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/machdep.h|1 +
 arch/powerpc/kernel/pci-common.c  |   10 ++
 arch/powerpc/platforms/powernv/pci-ioda.c |   20 
 3 files changed, 31 insertions(+)

diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index b303833..1b26804 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -252,6 +252,7 @@ struct machdep_calls {
 
 #ifdef CONFIG_PCI_IOV
void (*pcibios_fixup_sriov)(struct pci_dev *pdev);
+   resource_size_t (*pcibios_iov_resource_alignment)(struct pci_dev *, int resno);
 #endif /* CONFIG_PCI_IOV */
 
/* Called to shutdown machine specific hardware not already controlled
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 375bf70..9a306ff 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -130,6 +130,16 @@ void pcibios_reset_secondary_bus(struct pci_dev *dev)
pci_reset_secondary_bus(dev);
 }
 
+#ifdef CONFIG_PCI_IOV
+resource_size_t pcibios_iov_resource_alignment(struct pci_dev *pdev, int resno)
+{
+   if (ppc_md.pcibios_iov_resource_alignment)
+   return ppc_md.pcibios_iov_resource_alignment(pdev, resno);
+
+   return pci_iov_resource_size(pdev, resno);
+}
+#endif /* CONFIG_PCI_IOV */
+
 static resource_size_t pcibios_io_size(const struct pci_controller *hose)
 {
 #ifdef CONFIG_PPC64
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 1da45aa..217eaad 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1965,6 +1965,25 @@ static resource_size_t pnv_pci_window_alignment(struct pci_bus *bus,
return phb->ioda.io_segsize;
 }
 
+#ifdef CONFIG_PCI_IOV
+static resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
+ int resno)
+{
+   struct pci_dn *pdn = pci_get_pdn(pdev);
+   resource_size_t align, iov_align;
+
+   iov_align = resource_size(&pdev->resource[resno]);
+   if (iov_align)
+   return iov_align;
+
+   align = pci_iov_resource_size(pdev, resno);
+   if (pdn->vfs_expanded)
+   return pdn->vfs_expanded * align;
+
+   return align;
+}
+#endif /* CONFIG_PCI_IOV */
+
 /* Prevent enabling devices for which we couldn't properly
  * assign a PE
  */
@@ -2164,6 +2183,7 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
ppc_md.pcibios_reset_secondary_bus = pnv_pci_reset_secondary_bus;
 #ifdef CONFIG_PCI_IOV
ppc_md.pcibios_fixup_sriov = pnv_pci_ioda_fixup_iov_resources;
+   ppc_md.pcibios_iov_resource_alignment = pnv_pci_iov_resource_alignment;
 #endif /* CONFIG_PCI_IOV */
pci_add_flags(PCI_REASSIGN_ALL_RSRC);
 
-- 
1.7.9.5

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH V16 20/20] powerpc/pci: Add PCI resource alignment documentation

2015-03-25 Thread Wei Yang
In order to enable SRIOV on PowerNV platform, the PF's IOV BAR needs to be
adjusted:

1. size expanded
2. aligned to M64BT size

This patch documents the reasons for this change and how it is done.

[bhelgaas: reformat, clarify, expand]
Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com
---
 .../powerpc/pci_iov_resource_on_powernv.txt|  301 
 1 file changed, 301 insertions(+)
 create mode 100644 Documentation/powerpc/pci_iov_resource_on_powernv.txt

diff --git a/Documentation/powerpc/pci_iov_resource_on_powernv.txt b/Documentation/powerpc/pci_iov_resource_on_powernv.txt
new file mode 100644
index 000..b55c5cd
--- /dev/null
+++ b/Documentation/powerpc/pci_iov_resource_on_powernv.txt
@@ -0,0 +1,301 @@
+Wei Yang weiy...@linux.vnet.ibm.com
+Benjamin Herrenschmidt b...@au1.ibm.com
+Bjorn Helgaas bhelg...@google.com
+26 Aug 2014
+
+This document describes the hardware requirements for PCI MMIO resource
+sizing and assignment on PowerKVM and how generic PCI code handles these
+requirements. The first two sections describe the concepts of Partitionable
+Endpoints and their implementation on P8 (IODA2). The next two sections talk
+about considerations for enabling SRIOV on IODA2.
+
+1. Introduction to Partitionable Endpoints
+
+A Partitionable Endpoint (PE) is a way to group the various resources
+associated with a device or a set of devices to provide isolation between
+partitions (i.e., filtering of DMA, MSIs etc.) and to provide a mechanism
+to freeze a device that is causing errors in order to limit the possibility
+of propagation of bad data.
+
+There is thus, in HW, a table of PE states that contains a pair of frozen
+state bits (one for MMIO and one for DMA; they get set together but can be
+cleared independently) for each PE.
+
+When a PE is frozen, all stores in any direction are dropped and all loads
+return all 1's. MSIs are also blocked. There's a bit more state that
+captures things like the details of the error that caused the freeze etc., but
+that's not critical.
+
+The interesting part is how the various PCIe transactions (MMIO, DMA, ...)
+are matched to their corresponding PEs.
+
+The following section provides a rough description of what we have on P8
+(IODA2).  Keep in mind that this is all per PHB (PCI host bridge).  Each PHB
+is a completely separate HW entity that replicates the entire logic, so has
+its own set of PEs, etc.
+
+2. Implementation of Partitionable Endpoints on P8 (IODA2)
+
+P8 supports up to 256 Partitionable Endpoints per PHB.
+
+  * Inbound
+
+For DMA, MSIs and inbound PCIe error messages, we have a table (in
+memory but accessed in HW by the chip) that provides a direct
+correspondence between a PCIe RID (bus/dev/fn) and a PE number.
+We call this the RTT.
+
+- For DMA we then provide an entire address space for each PE that can
+  contain two windows, depending on the value of PCI address bit 59.
+  Each window can be configured to be remapped via a TCE table (IOMMU
+  translation table), which has various configurable characteristics
+  not described here.
+
+- For MSIs, we have two windows in the address space (one at the top of
+  the 32-bit space and one much higher) which, via a combination of the
+  address and MSI value, will result in one of the 2048 interrupts per
+  bridge being triggered.  There's a PE# in the interrupt controller
+  descriptor table as well which is compared with the PE# obtained from
+  the RTT to authorize the device to emit that specific interrupt.
+
+- Error messages just use the RTT.
+
+  * Outbound.  That's where the tricky part is.
+
+Like other PCI host bridges, the Power8 IODA2 PHB supports windows
+from the CPU address space to the PCI address space.  There is one M32
+window and sixteen M64 windows.  They have different characteristics.
+First what they have in common: they forward a configurable portion of
+the CPU address space to the PCIe bus and must be a naturally aligned
+power of two in size.  The rest is different:
+
+- The M32 window:
+
+  * Is limited to 4GB in size.
+
+  * Drops the top bits of the address (above the size) and replaces
+   them with a configurable value.  This is typically used to generate
+   32-bit PCIe accesses.  We configure that window at boot from FW and
+   don't touch it from Linux; it's usually set to forward a 2GB
+   portion of address space from the CPU to PCIe
+   0x8000_0000..0xffff_ffff.  (Note: The top 64KB are actually
+   reserved for MSIs but this is not a problem at this point; we just
+   need to ensure Linux doesn't assign anything there, because the M32
+   logic ignores the reservation and will forward in that space if we try).
+
+  * It is divided into 256 segments of equal size.  A table in the chip
+   maps each segment to a PE#.  That allows portions of the MMIO space
+   to be assigned to PEs on a segment 

[PATCH V16 19/20] powerpc/pci: Remove unused struct pci_dn.pcidev field

2015-03-25 Thread Wei Yang
In struct pci_dn, the pcidev field is assigned but not used, so remove it.

Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com
Acked-by: Gavin Shan gws...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/pci-bridge.h |1 -
 arch/powerpc/platforms/powernv/pci-ioda.c |1 -
 2 files changed, 2 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 560c739..a39270e 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -171,7 +171,6 @@ struct pci_dn {
 
int pci_ext_config_space;   /* for pci devices */
 
-   struct  pci_dev *pcidev;/* back-pointer to the pci device */
 #ifdef CONFIG_EEH
struct eeh_dev *edev;   /* eeh device */
 #endif
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 33088f6..b1387ea 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1028,7 +1028,6 @@ static void pnv_ioda_setup_same_PE(struct pci_bus *bus, struct pnv_ioda_pe *pe)
pci_name(dev));
continue;
}
-   pdn->pcidev = dev;
pdn->pe_number = pe->pe_number;
pe->dma_weight += pnv_ioda_dma_weight(dev);
if ((pe->flags & PNV_IODA_PE_BUS_ALL) && dev->subordinate)
-- 
1.7.9.5

[PATCH 2/6] powerpc/mm: Change setbat() to take a pgprot_t rather than flags

2015-03-25 Thread Michael Ellerman
The callers of setbat() are actually passing a pgprot_t for the flags
parameter. This doesn't matter unless STRICT_MM_TYPECHECKS is enabled.
So that we can turn that on without breaking the build, change setbat() to
take a pgprot_t and have it convert it to an unsigned long internally.
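A minimal sketch of why the parameter type matters, assuming a STRICT_MM_TYPECHECKS-style definition in which pgprot_t wraps the protection bits in a struct. pgprot_val() and __pgprot() follow the kernel's helper names. EXAMPLE_PAGE_NO_CACHE and example_setbat_uncached() are made up for illustration. With the struct wrapper, the compiler rejects a pgprot_t where an integer is expected, and vice versa, so the function has to unwrap the value explicitly, as the patched setbat() does.

```c
#include <assert.h>

/* Struct wrapper, as used when strict MM type checking is enabled. */
typedef struct { unsigned long pgprot; } pgprot_t;

#define pgprot_val(x)	((x).pgprot)
#define __pgprot(x)	((pgprot_t) { (x) })

#define EXAMPLE_PAGE_NO_CACHE	0x020UL	/* hypothetical bit value */

/* Takes pgprot_t, like the patched setbat(): a call such as
 * example_setbat_uncached(0x20) no longer compiles. */
static int example_setbat_uncached(pgprot_t prot)
{
	/* Unwrap once, then test bits on the plain integer. */
	unsigned long flags = pgprot_val(prot);

	return (flags & EXAMPLE_PAGE_NO_CACHE) != 0;
}
```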

Signed-off-by: Michael Ellerman m...@ellerman.id.au
---
 arch/powerpc/mm/mmu_decl.h   | 2 +-
 arch/powerpc/mm/ppc_mmu_32.c | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/mmu_decl.h b/arch/powerpc/mm/mmu_decl.h
index 78c45f392f5b..085b66b10891 100644
--- a/arch/powerpc/mm/mmu_decl.h
+++ b/arch/powerpc/mm/mmu_decl.h
@@ -96,7 +96,7 @@ extern void _tlbia(void);
 extern void mapin_ram(void);
 extern int map_page(unsigned long va, phys_addr_t pa, int flags);
 extern void setbat(int index, unsigned long virt, phys_addr_t phys,
-  unsigned int size, int flags);
+  unsigned int size, pgprot_t prot);
 
 extern int __map_without_bats;
 extern int __allow_ioremap_reserved;
diff --git a/arch/powerpc/mm/ppc_mmu_32.c b/arch/powerpc/mm/ppc_mmu_32.c
index 5029dc19b517..94f33721d382 100644
--- a/arch/powerpc/mm/ppc_mmu_32.c
+++ b/arch/powerpc/mm/ppc_mmu_32.c
@@ -113,11 +113,12 @@ unsigned long __init mmu_mapin_ram(unsigned long top)
  * of 2 between 128k and 256M.
  */
 void __init setbat(int index, unsigned long virt, phys_addr_t phys,
-  unsigned int size, int flags)
+  unsigned int size, pgprot_t prot)
 {
unsigned int bl;
int wimgxpp;
struct ppc_bat *bat = BATS[index];
+   unsigned long flags = pgprot_val(prot);
 
if ((flags & _PAGE_NO_CACHE) ||
(cpu_has_feature(CPU_FTR_NEED_COHERENT) == 0))
-- 
2.1.0

[PATCH V16 01/20] PCI: Print more info in sriov_enable() error message

2015-03-25 Thread Wei Yang
From: Bjorn Helgaas bhelg...@google.com

If we don't have space for all the bus numbers required to enable VFs,
print the largest bus number required and the range available.

No functional change; improved error message only.
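The check itself is simple. This hypothetical user-space sketch reproduces it together with the new message text. In the kernel the bus range comes from busn_res and is printed with %pR; the sketch only approximates that formatting.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mirror of the sriov_enable() check: the last VF's bus
 * must not exceed the end of the parent bus's bus-number range. */
static int check_vf_bus_range(int nr_virtfn, int last_vf_bus,
			      int busn_start, int busn_end,
			      char *msg, size_t msglen)
{
	char range[32];

	msg[0] = '\0';
	if (last_vf_bus <= busn_end)
		return 0;

	/* Rough approximation of what %pR prints for a bus resource. */
	if (busn_start == busn_end)
		snprintf(range, sizeof(range), "[bus %02x]", busn_start);
	else
		snprintf(range, sizeof(range), "[bus %02x-%02x]",
			 busn_start, busn_end);

	snprintf(msg, msglen, "can't enable %d VFs (bus %02x out of range of %s)",
		 nr_virtfn, last_vf_bus, range);
	return -1;	/* the kernel returns -ENOMEM here */
}
```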

Signed-off-by: Bjorn Helgaas bhelg...@google.com
Acked-by: Wei Yang weiy...@linux.vnet.ibm.com
---
 drivers/pci/iov.c |7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 4b3a4ea..c4c33ea 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -180,6 +180,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
struct pci_dev *pdev;
struct pci_sriov *iov = dev->sriov;
int bars = 0;
+   u8 bus;
 
if (!nr_virtfn)
return 0;
@@ -216,8 +217,10 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
iov->offset = offset;
iov->stride = stride;
 
-   if (virtfn_bus(dev, nr_virtfn - 1) > dev->bus->busn_res.end) {
-   dev_err(&dev->dev, "SR-IOV: bus number out of range\n");
+   bus = virtfn_bus(dev, nr_virtfn - 1);
+   if (bus > dev->bus->busn_res.end) {
+   dev_err(&dev->dev, "can't enable %d VFs (bus %02x out of range of %pR)\n",
+   nr_virtfn, bus, &dev->bus->busn_res);
return -ENOMEM;
}
 
-- 
1.7.9.5

[PATCH V16 02/20] PCI: Print PF SR-IOV resource that contains all VF(n) BAR space

2015-03-25 Thread Wei Yang
When we size VF BAR0, VF BAR1, etc., from the SR-IOV Capability of a PF, we
learn the alignment requirement and amount of space consumed by a single
VF.  But when VFs are enabled, *each* of the NumVFs consumes that amount of
space, so the total size of the PF resource is VF BAR size * NumVFs.

Add a printk of the total space consumed by the VFs corresponding to what
we already do for normal non-IOV BARs.

No functional change; new message only.
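A minimal sketch of the resizing arithmetic. It uses a hypothetical stand-in for struct resource that, as in the kernel, has an inclusive end address.

```c
#include <assert.h>
#include <stdint.h>

/* Minimal stand-in for struct resource: inclusive [start, end]. */
struct res {
	uint64_t start, end;
};

static uint64_t res_size(const struct res *r)
{
	return r->end - r->start + 1;
}

/* After BAR sizing, r describes ONE VF's BAR.  Stretch it to cover the
 * same BAR of every VF: total size = VF BAR size * total. */
static void stretch_to_all_vfs(struct res *r, unsigned total)
{
	assert(total > 0);
	r->end = r->start + res_size(r) * total - 1;
}
```

For example, a 16KB VF BAR at 0x80000000 stretched for 8 VFs ends at 0x8001ffff.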

[bhelgaas: split out into its own patch]
Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com
Acked-by: Bjorn Helgaas bhelg...@google.com
---
 drivers/pci/iov.c |2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index c4c33ea..05f9d97 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -372,6 +372,8 @@ found:
goto failed;
}
res->end = res->start + resource_size(res) * total - 1;
+   dev_info(&dev->dev, "VF(n) BAR%d space: %pR (contains BAR%d for %d VFs)\n",
+i, res, i, total);
nres++;
}
 
-- 
1.7.9.5

[PATCH V16 06/20] PCI: Calculate maximum number of buses required for VFs

2015-03-25 Thread Wei Yang
An SR-IOV device can change its First VF Offset and VF Stride based on the
values of ARI Capable Hierarchy and NumVFs.  The number of buses required
for all VFs is determined by NumVFs, First VF Offset, and VF Stride (see
SR-IOV spec r1.1, sec 2.1.2).

Previously pci_iov_bus_range() computed how many buses would be required by
TotalVFs, but this was based on a single NumVFs value and may not have been
the maximum for all NumVFs configurations.

Iterate over all valid NumVFs and calculate the maximum number of bus
numbers that could ever be required for VFs of this device.
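The iteration can be illustrated with a hypothetical device whose First VF Offset depends on NumVFs. example_geometry() is invented for illustration: it places the VFs far from the PF when few are enabled and packs them right after the PF when many are enabled. The bus arithmetic matches virtfn_bus(). For this device, the TotalVFs configuration needs no extra bus, but smaller NumVFs values do.

```c
#include <assert.h>
#include <stdint.h>

struct vf_geometry {
	uint16_t offset;	/* First VF Offset */
	uint16_t stride;	/* VF Stride */
};

typedef struct vf_geometry (*geometry_fn)(int num_vfs);

/* Made-up device: the offset depends on NumVFs. */
static struct vf_geometry example_geometry(int num_vfs)
{
	struct vf_geometry g = { .offset = num_vfs <= 64 ? 0x200 : 1, .stride = 1 };
	return g;
}

/* Bus number of VF vf_id, same arithmetic as virtfn_bus(). */
static int vf_bus(int pf_bus, int pf_devfn, struct vf_geometry g, int vf_id)
{
	return pf_bus + ((pf_devfn + g.offset + g.stride * vf_id) >> 8);
}

/* Maximum bus over all NumVFs in 1..total_vfs.  For each NumVFs the
 * last VF (vf_id = NumVFs - 1) has the highest routing ID, assuming a
 * non-negative stride. */
static int max_vf_bus(int pf_bus, int pf_devfn, int total_vfs, geometry_fn geom)
{
	int max = 0;

	for (int n = 1; n <= total_vfs; n++) {
		int bus = vf_bus(pf_bus, pf_devfn, geom(n), n - 1);

		if (bus > max)
			max = bus;
	}
	return max;
}
```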

[bhelgaas: changelog, compute busnr of NumVFs, not TotalVFs, remove
kernel-doc comment marker]
Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com
Acked-by: Bjorn Helgaas bhelg...@google.com
---
 drivers/pci/iov.c |   31 +++
 drivers/pci/pci.h |1 +
 2 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index a8752c2..2ae921f 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -46,6 +46,30 @@ static inline void pci_iov_set_numvfs(struct pci_dev *dev, int nr_virtfn)
pci_read_config_word(dev, iov->pos + PCI_SRIOV_VF_STRIDE, &iov->stride);
 }
 
+/*
+ * The PF consumes one bus number.  NumVFs, First VF Offset, and VF Stride
+ * determine how many additional bus numbers will be consumed by VFs.
+ *
+ * Iterate over all valid NumVFs and calculate the maximum number of bus
+ * numbers that could ever be required.
+ */
+static inline u8 virtfn_max_buses(struct pci_dev *dev)
+{
+   struct pci_sriov *iov = dev->sriov;
+   int nr_virtfn;
+   u8 max = 0;
+   u8 busnr;
+
+   for (nr_virtfn = 1; nr_virtfn <= iov->total_VFs; nr_virtfn++) {
+   pci_iov_set_numvfs(dev, nr_virtfn);
+   busnr = virtfn_bus(dev, nr_virtfn - 1);
+   if (busnr > max)
+   max = busnr;
+   }
+
+   return max;
+}
+
 static struct pci_bus *virtfn_add_bus(struct pci_bus *bus, int busnr)
 {
struct pci_bus *child;
@@ -427,6 +451,7 @@ found:
 
dev->sriov = iov;
dev->is_physfn = 1;
+   iov->max_VF_buses = virtfn_max_buses(dev);
 
return 0;
 
@@ -556,15 +581,13 @@ void pci_restore_iov_state(struct pci_dev *dev)
 int pci_iov_bus_range(struct pci_bus *bus)
 {
int max = 0;
-   u8 busnr;
struct pci_dev *dev;
 
list_for_each_entry(dev, &bus->devices, bus_list) {
if (!dev->is_physfn)
continue;
-   busnr = virtfn_bus(dev, dev->sriov->total_VFs - 1);
-   if (busnr > max)
-   max = busnr;
+   if (dev->sriov->max_VF_buses > max)
+   max = dev->sriov->max_VF_buses;
}
 
return max ? max - bus->number : 0;
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 5732964..bae593c 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -243,6 +243,7 @@ struct pci_sriov {
u16 stride; /* following VF stride */
u32 pgsz;   /* page size for BAR alignment */
u8 link;/* Function Dependency Link */
+   u8 max_VF_buses;/* max buses consumed by VFs */
u16 driver_max_VFs; /* max num VFs driver supports */
struct pci_dev *dev;/* lowest numbered PF */
struct pci_dev *self;   /* this PF */
-- 
1.7.9.5

[PATCH V16 03/20] PCI: Keep individual VF BAR size in struct pci_sriov

2015-03-25 Thread Wei Yang
Currently we don't store the individual VF BAR size.  We calculate it when
needed by dividing the PF's IOV resource size (which contains space for
*all* the VFs) by total_VFs or by reading the BAR in the SR-IOV capability
again.

Keep the individual VF BAR size in struct pci_sriov.barsz[], add
pci_iov_resource_size() to retrieve it, and use that instead of doing the
division or reading the SR-IOV capability BAR.
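A minimal sketch of how a cached per-VF size places each VF's BAR inside the PF's IOV resource. struct sriov_sketch and vf_bar_window() are hypothetical. Using the stored size rather than dividing by total_VFs also remains correct if a platform later enlarges the IOV resource, as the PowerNV patches in this series do.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_SRIOV_BARS 6	/* same count as PCI_SRIOV_NUM_BARS */

/* Hypothetical reduced pci_sriov: only the cached per-VF BAR sizes. */
struct sriov_sketch {
	uint64_t barsz[NUM_SRIOV_BARS];
};

/* Window of VF 'id' for SR-IOV BAR 'bar', inside the PF IOV resource
 * that starts at pf_iov_start. */
static void vf_bar_window(const struct sriov_sketch *iov, int bar, int id,
			  uint64_t pf_iov_start,
			  uint64_t *start, uint64_t *end)
{
	uint64_t size;

	assert(bar >= 0 && bar < NUM_SRIOV_BARS && id >= 0);
	size = iov->barsz[bar];
	*start = pf_iov_start + size * (uint64_t)id;
	*end = *start + size - 1;
}
```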

[bhelgaas: rename to barsz[], simplify barsz[] index computation, remove
SR-IOV capability BAR sizing]
Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com
Acked-by: Bjorn Helgaas bhelg...@google.com
---
 drivers/pci/iov.c   |   39 ---
 drivers/pci/pci.h   |1 +
 include/linux/pci.h |3 +++
 3 files changed, 24 insertions(+), 19 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 05f9d97..5bca0e1 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -57,6 +57,14 @@ static void virtfn_remove_bus(struct pci_bus *physbus, struct pci_bus *virtbus)
pci_remove_bus(virtbus);
 }
 
+resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno)
+{
+   if (!dev->is_physfn)
+   return 0;
+
+   return dev->sriov->barsz[resno - PCI_IOV_RESOURCES];
+}
+
 static int virtfn_add(struct pci_dev *dev, int id, int reset)
 {
int i;
@@ -92,8 +100,7 @@ static int virtfn_add(struct pci_dev *dev, int id, int reset)
continue;
virtfn->resource[i].name = pci_name(virtfn);
virtfn->resource[i].flags = res->flags;
-   size = resource_size(res);
-   do_div(size, iov->total_VFs);
+   size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES);
virtfn->resource[i].start = res->start + size * id;
virtfn->resource[i].end = virtfn->resource[i].start + size - 1;
rc = request_resource(res, &virtfn->resource[i]);
@@ -311,7 +318,7 @@ static void sriov_disable(struct pci_dev *dev)
 
 static int sriov_init(struct pci_dev *dev, int pos)
 {
-   int i;
+   int i, bar64;
int rc;
int nres;
u32 pgsz;
@@ -360,29 +367,29 @@ found:
pgsz &= ~(pgsz - 1);
pci_write_config_dword(dev, pos + PCI_SRIOV_SYS_PGSIZE, pgsz);
 
+   iov = kzalloc(sizeof(*iov), GFP_KERNEL);
+   if (!iov)
+   return -ENOMEM;
+
nres = 0;
for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
res = dev->resource + PCI_IOV_RESOURCES + i;
-   i += __pci_read_base(dev, pci_bar_unknown, res,
-pos + PCI_SRIOV_BAR + i * 4);
+   bar64 = __pci_read_base(dev, pci_bar_unknown, res,
+   pos + PCI_SRIOV_BAR + i * 4);
if (!res->flags)
continue;
if (resource_size(res) & (PAGE_SIZE - 1)) {
rc = -EIO;
goto failed;
}
+   iov->barsz[i] = resource_size(res);
res->end = res->start + resource_size(res) * total - 1;
dev_info(&dev->dev, "VF(n) BAR%d space: %pR (contains BAR%d for %d VFs)\n",
 i, res, i, total);
+   i += bar64;
nres++;
}
 
-   iov = kzalloc(sizeof(*iov), GFP_KERNEL);
-   if (!iov) {
-   rc = -ENOMEM;
-   goto failed;
-   }
-
iov->pos = pos;
iov->nres = nres;
iov->ctrl = ctrl;
@@ -414,6 +421,7 @@ failed:
res->flags = 0;
}
 
+   kfree(iov);
return rc;
 }
 
@@ -510,14 +518,7 @@ int pci_iov_resource_bar(struct pci_dev *dev, int resno)
  */
 resource_size_t pci_sriov_resource_alignment(struct pci_dev *dev, int resno)
 {
-   struct resource tmp;
-   int reg = pci_iov_resource_bar(dev, resno);
-
-   if (!reg)
-   return 0;
-
-__pci_read_base(dev, pci_bar_unknown, &tmp, reg);
-   return resource_alignment(&tmp);
+   return pci_iov_resource_size(dev, resno);
 }
 
 /**
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 4091f82..5732964 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -247,6 +247,7 @@ struct pci_sriov {
struct pci_dev *dev;/* lowest numbered PF */
struct pci_dev *self;   /* this PF */
struct mutex lock;  /* lock for VF bus */
+   resource_size_t barsz[PCI_SRIOV_NUM_BARS];  /* VF BAR size */
 };
 
 #ifdef CONFIG_PCI_ATS
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 211e9da..1559658 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1675,6 +1675,7 @@ int pci_num_vf(struct pci_dev *dev);
 int pci_vfs_assigned(struct pci_dev *dev);
 int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs);
 int pci_sriov_get_totalvfs(struct pci_dev *dev);
+resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno);
 #else
 static inline int 

[PATCH V16 07/20] PCI: Export pci_iov_virtfn_bus() and pci_iov_virtfn_devfn()

2015-03-25 Thread Wei Yang
On PowerNV, some resource reservation is needed for SR-IOV VFs that don't
exist at bootup.  To match resources to VFs, the code needs to know each
VF's BDF in advance.

Rename virtfn_bus() and virtfn_devfn() to pci_iov_virtfn_bus() and
pci_iov_virtfn_devfn() and export them.
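The following plain-C sketch shows how a VF's BDF follows from the PF's routing ID, First VF Offset and VF Stride. The helper names are hypothetical. Splitting the routing ID into its high and low bytes produces the same results as pci_iov_virtfn_bus() and pci_iov_virtfn_devfn() in this patch.

```c
#include <assert.h>

/* VF routing ID = PF routing ID + First VF Offset + VF Stride * vf_id. */
static int vf_routing_id(int pf_bus, int pf_devfn, int offset, int stride, int vf_id)
{
	return (pf_bus << 8) + pf_devfn + offset + stride * vf_id;
}

static int vf_bus_nr(int rid)     { return rid >> 8; }	/* high byte */
static int vf_devfn(int rid)      { return rid & 0xff; }	/* low byte */
static int devfn_slot(int devfn)  { return devfn >> 3; }	/* like PCI_SLOT() */
static int devfn_func(int devfn)  { return devfn & 7; }	/* like PCI_FUNC() */
```

For a PF at 02:00.0 with offset 128 and stride 1, VF 0 is 02:10.0, and VF 130 carries over onto bus 03 as 03:00.2.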

[bhelgaas: changelog, make busnr int]
Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com
Acked-by: Bjorn Helgaas bhelg...@google.com
---
 drivers/pci/iov.c   |   28 
 include/linux/pci.h |   11 +++
 2 files changed, 27 insertions(+), 12 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 2ae921f..5643a10 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -19,16 +19,20 @@
 
 #define VIRTFN_ID_LEN  16
 
-static inline u8 virtfn_bus(struct pci_dev *dev, int id)
+int pci_iov_virtfn_bus(struct pci_dev *dev, int vf_id)
 {
+   if (!dev->is_physfn)
+   return -EINVAL;
return dev->bus->number + ((dev->devfn + dev->sriov->offset +
-   dev->sriov->stride * id) >> 8);
+   dev->sriov->stride * vf_id) >> 8);
 }
 
-static inline u8 virtfn_devfn(struct pci_dev *dev, int id)
+int pci_iov_virtfn_devfn(struct pci_dev *dev, int vf_id)
 {
+   if (!dev->is_physfn)
+   return -EINVAL;
return (dev->devfn + dev->sriov->offset +
-   dev->sriov->stride * id) & 0xff;
+   dev->sriov->stride * vf_id) & 0xff;
 }
 
 /*
@@ -58,11 +62,11 @@ static inline u8 virtfn_max_buses(struct pci_dev *dev)
struct pci_sriov *iov = dev->sriov;
int nr_virtfn;
u8 max = 0;
-   u8 busnr;
+   int busnr;
 
for (nr_virtfn = 1; nr_virtfn <= iov->total_VFs; nr_virtfn++) {
pci_iov_set_numvfs(dev, nr_virtfn);
-   busnr = virtfn_bus(dev, nr_virtfn - 1);
+   busnr = pci_iov_virtfn_bus(dev, nr_virtfn - 1);
if (busnr > max)
max = busnr;
}
@@ -116,7 +120,7 @@ static int virtfn_add(struct pci_dev *dev, int id, int reset)
struct pci_bus *bus;
 
mutex_lock(&iov->dev->sriov->lock);
-   bus = virtfn_add_bus(dev->bus, virtfn_bus(dev, id));
+   bus = virtfn_add_bus(dev->bus, pci_iov_virtfn_bus(dev, id));
if (!bus)
goto failed;
 
@@ -124,7 +128,7 @@ static int virtfn_add(struct pci_dev *dev, int id, int reset)
if (!virtfn)
goto failed0;
 
-   virtfn->devfn = virtfn_devfn(dev, id);
+   virtfn->devfn = pci_iov_virtfn_devfn(dev, id);
virtfn->vendor = dev->vendor;
pci_read_config_word(dev, iov->pos + PCI_SRIOV_VF_DID, &virtfn->device);
pci_setup_device(virtfn);
@@ -186,8 +190,8 @@ static void virtfn_remove(struct pci_dev *dev, int id, int reset)
struct pci_sriov *iov = dev->sriov;
 
virtfn = pci_get_domain_bus_and_slot(pci_domain_nr(dev->bus),
-virtfn_bus(dev, id),
-virtfn_devfn(dev, id));
+pci_iov_virtfn_bus(dev, id),
+pci_iov_virtfn_devfn(dev, id));
if (!virtfn)
return;
 
@@ -226,7 +230,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
struct pci_dev *pdev;
struct pci_sriov *iov = dev->sriov;
int bars = 0;
-   u8 bus;
+   int bus;
 
if (!nr_virtfn)
return 0;
@@ -263,7 +267,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
iov->offset = offset;
iov->stride = stride;
 
-   bus = virtfn_bus(dev, nr_virtfn - 1);
+   bus = pci_iov_virtfn_bus(dev, nr_virtfn - 1);
if (bus > dev->bus->busn_res.end) {
dev_err(&dev->dev, "can't enable %d VFs (bus %02x out of range of %pR)\n",
nr_virtfn, bus, &dev->bus->busn_res);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 1559658..99ea948 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1669,6 +1669,9 @@ int pci_ext_cfg_avail(void);
 void __iomem *pci_ioremap_bar(struct pci_dev *pdev, int bar);
 
 #ifdef CONFIG_PCI_IOV
+int pci_iov_virtfn_bus(struct pci_dev *dev, int id);
+int pci_iov_virtfn_devfn(struct pci_dev *dev, int id);
+
 int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn);
 void pci_disable_sriov(struct pci_dev *dev);
 int pci_num_vf(struct pci_dev *dev);
@@ -1677,6 +1680,14 @@ int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs);
 int pci_sriov_get_totalvfs(struct pci_dev *dev);
 resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno);
 #else
+static inline int pci_iov_virtfn_bus(struct pci_dev *dev, int id)
+{
+   return -ENOSYS;
+}
+static inline int pci_iov_virtfn_devfn(struct pci_dev *dev, int id)
+{
+   return -ENOSYS;
+}
 static inline int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn)
 { return -ENODEV; }
 static inline void 

[PATCH V16 13/20] powerpc/powernv: Allocate struct pnv_ioda_pe iommu_table dynamically

2015-03-25 Thread Wei Yang
Previously the iommu_table had the same lifetime as a struct pnv_ioda_pe
and was embedded in it. The pnv_ioda_pe was assigned to a PE on the bootup
stage. Since PEs are based on the hardware layout which is static in the
system, they will never get released. This means the iommu_table in the
pnv_ioda_pe will never get released either.

This no longer works for VF PEs. VF PEs are created and released dynamically
when VFs are created and released. So we need to assign a pnv_ioda_pe to each
VF PE when VFs are enabled, and clean up those resources when VFs are
disabled. The iommu_table is one of the resources we need to handle
dynamically.

Currently the iommu_table is a field embedded in pnv_ioda_pe, which causes a
problem when freeing it. When a VF is disabled,
pnv_pci_ioda2_release_dma_pe() calls iommu_free_table() to release the
iommu_table for this PE, and that fails for an embedded iommu_table.

To meet these requirements, this patch allocates the iommu_table
dynamically.
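The ownership change can be sketched in user space with simplified, hypothetical stand-ins for struct iommu_table and struct pnv_ioda_pe. With an embedded table, the owning PE was recovered with container_of(). With a separately allocated table, a back-pointer (the new data field) provides the same lookup, and the table can be freed on its own. Unlike the kzalloc_node() call in the hunk below, the sketch checks for allocation failure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-ins for struct iommu_table / struct pnv_ioda_pe. */
struct table {
	void *data;			/* back-pointer to the owning PE */
};

struct pe {
	int pe_number;
	struct table *tce32_table;	/* separately allocated */
};

static int pe_alloc_table(struct pe *pe)
{
	pe->tce32_table = calloc(1, sizeof(*pe->tce32_table));
	if (!pe->tce32_table)
		return -1;
	pe->tce32_table->data = pe;
	return 0;
}

/* Replaces container_of(tbl, struct pnv_ioda_pe, tce32_table). */
static struct pe *table_to_pe(struct table *tbl)
{
	return tbl->data;
}

/* The table has its own allocation, so it can be released when a VF PE
 * is torn down. */
static void pe_free_table(struct pe *pe)
{
	free(pe->tce32_table);
	pe->tce32_table = NULL;
}
```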

Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/iommu.h  |3 +++
 arch/powerpc/platforms/powernv/pci-ioda.c |   26 ++
 arch/powerpc/platforms/powernv/pci.h  |2 +-
 3 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index f1ea597..e2abbe8 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -78,6 +78,9 @@ struct iommu_table {
struct iommu_group *it_group;
 #endif
void (*set_bypass)(struct iommu_table *tbl, bool enable);
+#ifdef CONFIG_PPC_POWERNV
+   void   *data;
+#endif
 };
 
 /* Pure 2^n version of get_order */
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 7f58f19..9447ee9 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -916,6 +916,10 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, int all)
return;
}
 
+   pe->tce32_table = kzalloc_node(sizeof(struct iommu_table),
+   GFP_KERNEL, hose->node);
+   pe->tce32_table->data = pe;
+
/* Associate it with all child devices */
pnv_ioda_setup_same_PE(bus, pe);
 
@@ -1005,7 +1009,7 @@ static void pnv_pci_ioda_dma_dev_setup(struct pnv_phb *phb, struct pci_dev *pdev
 
pe = phb->ioda.pe_array[pdn->pe_number];
WARN_ON(get_dma_ops(&pdev->dev) != &dma_iommu_ops);
-   set_iommu_table_base_and_group(&pdev->dev, &pe->tce32_table);
+   set_iommu_table_base_and_group(&pdev->dev, pe->tce32_table);
 }
 
 static int pnv_pci_ioda_dma_set_mask(struct pnv_phb *phb,
@@ -1032,7 +1036,7 @@ static int pnv_pci_ioda_dma_set_mask(struct pnv_phb *phb,
} else {
dev_info(&pdev->dev, "Using 32-bit DMA via iommu\n");
set_dma_ops(&pdev->dev, &dma_iommu_ops);
-   set_iommu_table_base(&pdev->dev, &pe->tce32_table);
+   set_iommu_table_base(&pdev->dev, pe->tce32_table);
}
*pdev->dev.dma_mask = dma_mask;
return 0;
@@ -1069,9 +1073,9 @@ static void pnv_ioda_setup_bus_dma(struct pnv_ioda_pe *pe,
list_for_each_entry(dev, &bus->devices, bus_list) {
if (add_to_iommu_group)
set_iommu_table_base_and_group(&dev->dev,
-  &pe->tce32_table);
+  pe->tce32_table);
else
-   set_iommu_table_base(&dev->dev, &pe->tce32_table);
+   set_iommu_table_base(&dev->dev, pe->tce32_table);
 
if (dev->subordinate)
pnv_ioda_setup_bus_dma(pe, dev->subordinate,
@@ -1161,8 +1165,7 @@ static void pnv_pci_ioda2_tce_invalidate(struct pnv_ioda_pe *pe,
 void pnv_pci_ioda_tce_invalidate(struct iommu_table *tbl,
 __be64 *startp, __be64 *endp, bool rm)
 {
-   struct pnv_ioda_pe *pe = container_of(tbl, struct pnv_ioda_pe,
- tce32_table);
+   struct pnv_ioda_pe *pe = tbl->data;
struct pnv_phb *phb = pe->phb;
 
if (phb->type == PNV_PHB_IODA1)
@@ -1228,7 +1231,7 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
}
 
/* Setup linux iommu table */
-   tbl = &pe->tce32_table;
+   tbl = pe->tce32_table;
pnv_pci_setup_iommu_table(tbl, addr, TCE32_TABLE_SIZE * segs,
  base << 28, IOMMU_PAGE_SHIFT_4K);
 
@@ -1266,8 +1269,7 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
 
 static void pnv_pci_ioda2_set_bypass(struct iommu_table *tbl, bool enable)
 {
-   struct pnv_ioda_pe *pe = container_of(tbl, struct pnv_ioda_pe,
- tce32_table);
+   struct pnv_ioda_pe *pe = tbl->data;
uint16_t window_id = 

[PATCH V16 16/20] powerpc/powernv: Shift VF resource with an offset

2015-03-25 Thread Wei Yang
On the PowerNV platform, a resource's position in the M64 BAR implies the PE#
the resource belongs to. In some cases a resource must be adjusted to place it
at the correct position in the M64 BAR.

This patch adds pnv_pci_vf_resource_shift() to shift the 'real' PF IOV BAR
address by an offset.

Note:

After doing so, there will be a hole in /proc/iomem when the offset is
positive. It looks as if the device returned some MMIO space to the system,
but in fact no one can use it.
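The shift can be sketched in userspace as moving the BAR start forward by offset VF-sized slots, after checking that the enabled VFs still fit in the expanded BAR. This is not the patch's code (its body is not shown here): struct iov_bar and vf_bar_shift() are hypothetical, the sketch assumes a non-negative offset, and only start moves, so the skipped slots form the hole described in the note above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct resource: inclusive [start, end]. */
struct iov_bar {
	uint64_t start;
	uint64_t end;
};

/*
 * Shift the IOV BAR start by 'offset' VF-sized slots (offset >= 0 assumed),
 * after checking that num_vfs VF BARs starting at the new address still fit
 * inside the expanded BAR. Only 'start' moves; the slots skipped over become
 * the unusable hole. Returns 0 on success, -1 if the range would overrun.
 */
static int vf_bar_shift(struct iov_bar *bar, uint64_t vf_size,
			uint64_t offset, uint64_t num_vfs)
{
	uint64_t new_start = bar->start + vf_size * offset;
	uint64_t new_end = new_start + vf_size * num_vfs - 1;

	if (new_end > bar->end)
		return -1;
	bar->start = new_start;
	return 0;
}
```

For example, a BAR of 16 slots of 0x100 bytes at 0x1000 shifted by 4 slots for 8 VFs starts at 0x1400 and leaves 0x1000-0x13ff unused.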

[bhelgaas: rework loops, rework overlap check, index resource[]
conventionally, remove pci_regs.h include, squashed with next patch]
Signed-off-by: Wei Yang <weiy...@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/pci-bridge.h |4 +
 arch/powerpc/kernel/pci_dn.c  |   13 +
 arch/powerpc/platforms/powernv/pci-ioda.c |  528 -
 arch/powerpc/platforms/powernv/pci.c  |   18 +
 arch/powerpc/platforms/powernv/pci.h  |7 +
 5 files changed, 553 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 7b8ebc5..8716db4 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -180,6 +180,10 @@ struct pci_dn {
int pe_number;
 #ifdef CONFIG_PCI_IOV
u16 vfs_expanded;   /* number of VFs IOV BAR expanded */
+   u16 num_vfs;/* number of VFs enabled*/
+   int offset; /* PE# for the first VF PE */
+#define IODA_INVALID_M64        (-1)
+   int m64_wins[PCI_SRIOV_NUM_BARS];
 #endif /* CONFIG_PCI_IOV */
 #endif
struct list_head child_list;
diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
index e5f1d78..b3b4df9 100644
--- a/arch/powerpc/kernel/pci_dn.c
+++ b/arch/powerpc/kernel/pci_dn.c
@@ -217,6 +217,19 @@ void remove_dev_pci_data(struct pci_dev *pdev)
struct pci_dn *pdn, *tmp;
int i;
 
+   /*
+* VF and VF PE are created/released dynamically, so we need to
+* bind/unbind them.  Otherwise the VF and VF PE would be mismatched
+* when re-enabling SR-IOV.
+*/
+   if (pdev->is_virtfn) {
+   pdn = pci_get_pdn(pdev);
+#ifdef CONFIG_PPC_POWERNV
+   pdn->pe_number = IODA_INVALID_PE;
+#endif
+   return;
+   }
+
/* Only support IOV PF for now */
if (!pdev->is_physfn)
return;
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 217eaad..5187d16 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -44,6 +44,9 @@
 #include "powernv.h"
 #include "pci.h"
 
+/* 256M DMA window, 4K TCE pages, 8 bytes TCE */
+#define TCE32_TABLE_SIZE   ((0x10000000 / 0x1000) * 8)
+
 static void pe_level_printk(const struct pnv_ioda_pe *pe, const char *level,
const char *fmt, ...)
 {
@@ -56,11 +59,18 @@ static void pe_level_printk(const struct pnv_ioda_pe *pe, const char *level,
vaf.fmt = fmt;
vaf.va = &args;
 
-   if (pe->pdev)
+   if (pe->flags & PNV_IODA_PE_DEV)
strlcpy(pfix, dev_name(&pe->pdev->dev), sizeof(pfix));
-   else
+   else if (pe->flags & (PNV_IODA_PE_BUS | PNV_IODA_PE_BUS_ALL))
sprintf(pfix, "%04x:%02x ",
pci_domain_nr(pe->pbus), pe->pbus->number);
+#ifdef CONFIG_PCI_IOV
+   else if (pe->flags & PNV_IODA_PE_VF)
+   sprintf(pfix, "%04x:%02x:%2x.%d",
+   pci_domain_nr(pe->parent_dev->bus),
+   (pe->rid & 0xff00) >> 8,
+   PCI_SLOT(pe->rid), PCI_FUNC(pe->rid));
+#endif /* CONFIG_PCI_IOV*/
 
printk("%spci %s: [PE# %.3d] %pV",
   level, pfix, pe->pe_number, &vaf);
@@ -591,7 +601,7 @@ static int pnv_ioda_set_peltv(struct pnv_phb *phb,
  bool is_add)
 {
struct pnv_ioda_pe *slave;
-   struct pci_dev *pdev;
+   struct pci_dev *pdev = NULL;
int ret;
 
/*
@@ -630,8 +640,12 @@ static int pnv_ioda_set_peltv(struct pnv_phb *phb,
 
if (pe->flags & (PNV_IODA_PE_BUS_ALL | PNV_IODA_PE_BUS))
pdev = pe->pbus->self;
-   else
+   else if (pe->flags & PNV_IODA_PE_DEV)
pdev = pe->pdev->bus->self;
+#ifdef CONFIG_PCI_IOV
+   else if (pe->flags & PNV_IODA_PE_VF)
+   pdev = pe->parent_dev->bus->self;
+#endif /* CONFIG_PCI_IOV */
while (pdev) {
struct pci_dn *pdn = pci_get_pdn(pdev);
struct pnv_ioda_pe *parent;
@@ -649,6 +663,87 @@ static int pnv_ioda_set_peltv(struct pnv_phb *phb,
return 0;
 }
 
+#ifdef CONFIG_PCI_IOV
+static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
+{
+   struct pci_dev *parent;
+   uint8_t bcomp, dcomp, fcomp;
+   int64_t rc;
+   long rid_end, rid;
+
+   /* 

[PATCH V16 17/20] powerpc/powernv: Reserve additional space for IOV BAR, with m64_per_iov supported

2015-03-25 Thread Wei Yang
The M64 aperture size is limited on PHB3. When the IOV BAR is too big,
expanding it exceeds this limit and the BAR fails to be assigned.

Introduce a different mechanism based on the IOV BAR size:

  - if the IOV BAR size is 64MB or smaller, expand it to total_pe VFs
  - if the IOV BAR size is bigger than 64MB, expand it to total_vfs rounded
    up to a power of 2
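A sketch of the sizing rule in the list above, following the logic of the pnv_pci_ioda_fixup_iov_resources() hunk in this patch: if any VF BAR is bigger than 64MB (1 << 26), the multiplier becomes roundup_pow_of_two(total_vfs), otherwise it stays total_pe, and each IOV BAR is then sized as per-VF size times the multiplier. iov_expand_factor() and roundup_pow2() are hypothetical userspace stand-ins; the sketch ignores the resource-flag checks and m64_per_iov.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_64M (UINT64_C(1) << 26)

/* Userspace stand-in for the kernel's roundup_pow_of_two(). */
static uint64_t roundup_pow2(uint64_t n)
{
	uint64_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/*
 * Number of VF-sized slots each IOV BAR is expanded to: a single VF BAR
 * bigger than 64MB switches every BAR to the power-of-2 rule; a BAR of
 * exactly 64MB still uses total_pe, matching the "size > (1 << 26)" test.
 */
static uint64_t iov_expand_factor(const uint64_t *vf_bar_sizes, int nbars,
				  uint64_t total_pe, uint64_t total_vfs)
{
	for (int i = 0; i < nbars; i++)
		if (vf_bar_sizes[i] > SZ_64M)
			return roundup_pow2(total_vfs);
	return total_pe;
}
```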

[bhelgaas: make dev_printk() output more consistent, use PCI_SRIOV_NUM_BARS]
Signed-off-by: Wei Yang <weiy...@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/pci-bridge.h |2 ++
 arch/powerpc/platforms/powernv/pci-ioda.c |   33 ++---
 2 files changed, 32 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 8716db4..415df85 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -182,6 +182,8 @@ struct pci_dn {
u16 vfs_expanded;   /* number of VFs IOV BAR expanded */
u16 num_vfs;/* number of VFs enabled*/
int offset; /* PE# for the first VF PE */
+#define M64_PER_IOV 4
+   int m64_per_iov;
 #define IODA_INVALID_M64        (-1)
int m64_wins[PCI_SRIOV_NUM_BARS];
 #endif /* CONFIG_PCI_IOV */
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 5187d16..b63925f 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2250,6 +2250,7 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
int i;
resource_size_t size;
struct pci_dn *pdn;
+   int mul, total_vfs;
 
if (!pdev->is_physfn || pdev->is_added)
return;
@@ -2260,6 +2261,32 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
pdn = pci_get_pdn(pdev);
pdn->vfs_expanded = 0;
 
+   total_vfs = pci_sriov_get_totalvfs(pdev);
+   pdn->m64_per_iov = 1;
+   mul = phb->ioda.total_pe;
+
+   for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
+   res = &pdev->resource[i + PCI_IOV_RESOURCES];
+   if (!res->flags || res->parent)
+   continue;
+   if (!pnv_pci_is_mem_pref_64(res->flags)) {
+   dev_warn(&pdev->dev, " non M64 VF BAR%d: %pR\n",
+i, res);
+   continue;
+   }
+
+   size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
+
+   /* bigger than 64M */
+   if (size > (1 << 26)) {
+   dev_info(&pdev->dev, "PowerNV: VF BAR%d: %pR IOV size is bigger than 64M, roundup power2\n",
+i, res);
+   pdn-m64_per_iov = M64_PER_IOV;
+   mul = roundup_pow_of_two(total_vfs);
+   break;
+   }
+   }
+
for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
res = &pdev->resource[i + PCI_IOV_RESOURCES];
if (!res->flags || res->parent)
@@ -2272,12 +2299,12 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
 
dev_dbg(&pdev->dev, " Fixing VF BAR%d: %pR to\n", i, res);
size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
-   res->end = res->start + size * phb->ioda.total_pe - 1;
+   res->end = res->start + size * mul - 1;
dev_dbg(&pdev->dev, "%pR\n", res);
dev_info(&pdev->dev, "VF BAR%d: %pR (expanded to %d VFs for PE alignment)",
-   i, res, phb->ioda.total_pe);
+i, res, mul);
}
-   pdn->vfs_expanded = phb->ioda.total_pe;
+   pdn->vfs_expanded = mul;
 }
 #endif /* CONFIG_PCI_IOV */
 
-- 
1.7.9.5

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 4/6] powerpc: Fix compile errors with STRICT_MM_TYPECHECKS enabled

2015-03-25 Thread Michael Ellerman
Signed-off-by: Aneesh Kumar K.V <aneesh.ku...@linux.vnet.ibm.com>
[mpe: Fix the 32-bit code also]
Signed-off-by: Michael Ellerman <m...@ellerman.id.au>
---
 arch/powerpc/include/asm/kvm_book3s_64.h | 12 +++-
 arch/powerpc/mm/dma-noncoherent.c|  2 +-
 arch/powerpc/mm/fsl_booke_mmu.c  |  2 +-
 arch/powerpc/mm/hugepage-hash64.c|  2 +-
 arch/powerpc/mm/hugetlbpage.c|  4 ++--
 arch/powerpc/mm/pgtable_32.c |  4 ++--
 arch/powerpc/mm/pgtable_64.c |  2 +-
 arch/powerpc/mm/tlb_hash64.c |  2 +-
 8 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 2d81e202bdcc..cc073a7ac2b7 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -290,11 +290,11 @@ static inline pte_t kvmppc_read_update_linux_pte(pte_t *ptep, int writing,
pte_t old_pte, new_pte = __pte(0);
 
while (1) {
-   old_pte = pte_val(*ptep);
+   old_pte = *ptep;
/*
 * wait until _PAGE_BUSY is clear then set it atomically
 */
-   if (unlikely(old_pte & _PAGE_BUSY)) {
+   if (unlikely(pte_val(old_pte) & _PAGE_BUSY)) {
cpu_relax();
continue;
}
@@ -305,16 +305,18 @@ static inline pte_t kvmppc_read_update_linux_pte(pte_t *ptep, int writing,
return __pte(0);
 #endif
/* If pte is not present return None */
-   if (unlikely(!(old_pte & _PAGE_PRESENT)))
+   if (unlikely(!(pte_val(old_pte) & _PAGE_PRESENT)))
return __pte(0);
 
new_pte = pte_mkyoung(old_pte);
if (writing && pte_write(old_pte))
new_pte = pte_mkdirty(new_pte);
 
-   if (old_pte == __cmpxchg_u64((unsigned long *)ptep, old_pte,
-new_pte))
+   if (pte_val(old_pte) == __cmpxchg_u64((unsigned long *)ptep,
+ pte_val(old_pte),
+ pte_val(new_pte))) {
break;
+   }
}
return new_pte;
 }
diff --git a/arch/powerpc/mm/dma-noncoherent.c b/arch/powerpc/mm/dma-noncoherent.c
index d85e86aac7fb..169aba446a74 100644
--- a/arch/powerpc/mm/dma-noncoherent.c
+++ b/arch/powerpc/mm/dma-noncoherent.c
@@ -228,7 +228,7 @@ __dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *handle, gfp_t
do {
SetPageReserved(page);
map_page(vaddr, page_to_phys(page),
-pgprot_noncached(PAGE_KERNEL));
+pgprot_val(pgprot_noncached(PAGE_KERNEL)));
page++;
vaddr += PAGE_SIZE;
} while (size -= PAGE_SIZE);
diff --git a/arch/powerpc/mm/fsl_booke_mmu.c b/arch/powerpc/mm/fsl_booke_mmu.c
index b46912fee7cd..9c90e66cffb6 100644
--- a/arch/powerpc/mm/fsl_booke_mmu.c
+++ b/arch/powerpc/mm/fsl_booke_mmu.c
@@ -181,7 +181,7 @@ static unsigned long map_mem_in_cams_addr(phys_addr_t phys, unsigned long virt,
unsigned long cam_sz;
 
cam_sz = calc_cam_sz(ram, virt, phys);
-   settlbcam(i, virt, phys, cam_sz, PAGE_KERNEL_X, 0);
+   settlbcam(i, virt, phys, cam_sz, pgprot_val(PAGE_KERNEL_X), 0);
 
ram -= cam_sz;
amount_mapped += cam_sz;
diff --git a/arch/powerpc/mm/hugepage-hash64.c b/arch/powerpc/mm/hugepage-hash64.c
index 86686514ae13..43dafb9d6a46 100644
--- a/arch/powerpc/mm/hugepage-hash64.c
+++ b/arch/powerpc/mm/hugepage-hash64.c
@@ -33,7 +33,7 @@ int __hash_page_thp(unsigned long ea, unsigned long access, unsigned long vsid,
 * atomically mark the linux large page PMD busy and dirty
 */
do {
-   pmd_t pmd = ACCESS_ONCE(*pmdp);
+   pmd_t pmd = READ_ONCE(*pmdp);
 
old_pmd = pmd_val(pmd);
/* If PMD busy, retry the access */
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 7e408bfc7948..fa9d5c238d22 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -964,7 +964,7 @@ pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea, unsigned *shift
*shift = 0;
 
pgdp = pgdir + pgd_index(ea);
-   pgd  = ACCESS_ONCE(*pgdp);
+   pgd  = READ_ONCE(*pgdp);
/*
 * Always operate on the local stack value. This make sure the
 * value don't get updated by a parallel THP split/collapse,
@@ -1045,7 +1045,7 @@ int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
if (pte_end < end)
end = pte_end;
 
-   

[PATCH 5/6] mm/gup: Replace ACCESS_ONCE with READ_ONCE for STRICT_MM_TYPECHECKS

2015-03-25 Thread Michael Ellerman
If STRICT_MM_TYPECHECKS is enabled the generic gup code fails to build
because we are using ACCESS_ONCE on non-scalar types.

Convert all uses to READ_ONCE.
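With STRICT_MM_TYPECHECKS the page-table types become single-member structs, which is why the series wraps comparisons and bit tests in pte_val() and, per the text above, why ACCESS_ONCE() on *ptep no longer builds. The sketch below is a pte_t-only illustration of what READ_ONCE() achieves generically: one volatile load of the underlying word, returned as the wrapped type. It is not the kernel's READ_ONCE() macro; to_pte() and read_pte_once() are hypothetical names (the kernel spells the constructor __pte()).

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the STRICT_MM_TYPECHECKS definitions: pte_t is a struct. */
typedef struct { uint64_t pte; } pte_t;
#define pte_val(x)	((x).pte)
#define to_pte(x)	((pte_t) { (x) })	/* the kernel calls this __pte() */

/*
 * A single volatile load of the raw word, re-wrapped as pte_t, so the
 * compiler cannot split, repeat or cache the read of the entry.
 */
static inline pte_t read_pte_once(const pte_t *ptep)
{
	return to_pte(*(const volatile uint64_t *)&ptep->pte);
}
```

Because pte_t is a struct, two entries cannot be compared with ==; comparing pte_val() of each, as the kvm_book3s_64.h hunk above does, is the type-safe form.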

Cc: a...@linux-foundation.org
Cc: kirill.shute...@linux.intel.com
Cc: aarca...@redhat.com
Cc: borntrae...@de.ibm.com
Cc: steve.cap...@linaro.org
Cc: linux...@kvack.org
Signed-off-by: Michael Ellerman <m...@ellerman.id.au>
---
 mm/gup.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index a6e24e246f86..120c3adc843c 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -901,7 +901,7 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
 *
 * for an example see gup_get_pte in arch/x86/mm/gup.c
 */
-   pte_t pte = ACCESS_ONCE(*ptep);
+   pte_t pte = READ_ONCE(*ptep);
struct page *page;
 
/*
@@ -1191,7 +1191,7 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
local_irq_save(flags);
pgdp = pgd_offset(mm, addr);
do {
-   pgd_t pgd = ACCESS_ONCE(*pgdp);
+   pgd_t pgd = READ_ONCE(*pgdp);
 
next = pgd_addr_end(addr, end);
if (pgd_none(pgd))
-- 
2.1.0


Re: [PATCH 5/6] mm/gup: Replace ACCESS_ONCE with READ_ONCE for STRICT_MM_TYPECHECKS

2015-03-25 Thread Christian Borntraeger
Am 25.03.2015 um 10:11 schrieb Michael Ellerman:
 If STRICT_MM_TYPECHECKS is enabled the generic gup code fails to build
 because we are using ACCESS_ONCE on non-scalar types.
 
 Convert all uses to READ_ONCE.

There is a similar patch from Jason Low in Andrew's patch queue.
If that happens in 4.0-rc, we probably want to merge this before 4.0.


 
 Cc: a...@linux-foundation.org
 Cc: kirill.shute...@linux.intel.com
 Cc: aarca...@redhat.com
 Cc: borntrae...@de.ibm.com
 Cc: steve.cap...@linaro.org
 Cc: linux...@kvack.org
 Signed-off-by: Michael Ellerman m...@ellerman.id.au
 ---
  mm/gup.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)
 
 diff --git a/mm/gup.c b/mm/gup.c
 index a6e24e246f86..120c3adc843c 100644
 --- a/mm/gup.c
 +++ b/mm/gup.c
 @@ -901,7 +901,7 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, 
 unsigned long end,
*
* for an example see gup_get_pte in arch/x86/mm/gup.c
*/
 - pte_t pte = ACCESS_ONCE(*ptep);
 + pte_t pte = READ_ONCE(*ptep);
   struct page *page;
 
   /*
 @@ -1191,7 +1191,7 @@ int __get_user_pages_fast(unsigned long start, int 
 nr_pages, int write,
   local_irq_save(flags);
   pgdp = pgd_offset(mm, addr);
   do {
 - pgd_t pgd = ACCESS_ONCE(*pgdp);
 + pgd_t pgd = READ_ONCE(*pgdp);
 
   next = pgd_addr_end(addr, end);
   if (pgd_none(pgd))
 


Re: [PATCH v4 0/3] powerpc: Enable seccomp filter support

2015-03-25 Thread Michael Ellerman
On Mon, 2015-03-23 at 13:44 +0200, Purcareata Bogdan wrote:
 On 27.02.2015 22:54, Benjamin Herrenschmidt wrote:
  On Fri, 2015-02-27 at 09:28 +0200, Purcareata Bogdan wrote:
  Ping?
 
  What is the ping for ?
 
  Ben.
 
 Hello Ben,
 
 I just wanted to check with you what's the current status of these 
 patches. I noticed in patchwork [1][2][3] that the patches are marked as 
 non-applicable.
 
 As of today, I cloned Michael Ellerman's tree [4], applied the patches 
 on the master branch, compiled and tested. Tests pass both with the 
 libseccomp regression suite and my LXC tests.
 
 Is there a specific tree I should send them against, or on another 
 mailing list? Is there any other reason the patches are not applicable?

I just haven't had time to review them properly.

Because you're touching the syscall path for all powerpc platforms it needs
more scrutiny than the average patch.

It should still make 4.1, probably :)

cheers




Re: [PATCH 5/6] mm/gup: Replace ACCESS_ONCE with READ_ONCE for STRICT_MM_TYPECHECKS

2015-03-25 Thread Michael Ellerman
On Wed, 2015-03-25 at 10:18 +0100, Christian Borntraeger wrote:
 Am 25.03.2015 um 10:11 schrieb Michael Ellerman:
  If STRICT_MM_TYPECHECKS is enabled the generic gup code fails to build
  because we are using ACCESS_ONCE on non-scalar types.
  
  Convert all uses to READ_ONCE.
 
 There is a similar patch from Jason Low in Andrew's patch queue.

Ah sorry, I didn't think to check.

 If that happens in 4.0-rc, we probably want to merge this before 4.0.

My series can wait, it's not urgent. So I'll plan to merge mine once Andrew's
tree has gone into Linus' tree for 4.1.

cheers



Re: [PATCH] powerpc/powernv: Remove powernv RTAS support

2015-03-25 Thread Michael Ellerman
On Wed, 2015-03-25 at 16:46 +1100, Stewart Smith wrote:
 Michael Ellerman m...@ellerman.id.au writes:
 
  The powernv code has some conditional support for running on bare metal
  machines that have no OPAL firmware, but provide RTAS.
 
  No released machines ever supported that, and even in the lab it was
  just a transitional hack in the days when OPAL was still being
  developed.
 
  So remove the code.
 
  Signed-off-by: Michael Ellerman m...@ellerman.id.au
 
 The only current place I could think this could be remotely possible
 would be in simulator... and we should instead make the OPAL calls work
 properly in the simulator for all the RTAS functionality (that we care
 about).

If you mean mambo, I tested that, at least the public version, and it doesn't
provide or need RTAS.

On the other sims we ran without RTAS during the Power8 bringup, though it was
eventually used a little bit late in the cycle. In future we should be using
skiboot, or just putting logic directly into the kernel for early bringup - or
permanently :)

 Acked-by: Stewart Smith stew...@linux.vnet.ibm.com

Thanks.

cheers




Re: [RFC, powerpc] perf/hv-24x7 set the attr group to NULL if events failed to be initialized

2015-03-25 Thread Michael Ellerman
On Sun, 2015-15-02 at 09:42:57 UTC, Li Zhong wrote:
 sysfs_create_groups() creates groups one by one in the attr_groups array
 before a NULL entry is encountered. But if an error is seen, it stops
 and removes all the groups already created:
 for (i = 0; groups[i]; i++) {
 error = sysfs_create_group(kobj, groups[i]);
 if (error) {
 while (--i >= 0)
 sysfs_remove_group(kobj, groups[i]);
 break;
 }
 }
 
 For the three event groups of 24x7, if they are not supported, the
 above logic causes the format and interface groups to be removed as
 well because of the error.
 
 This patch moves the three event groups to the end of the attr groups,
 and if create_events_from_catalog() fails to set their attributes, we
 set them to NULL in attr_groups.
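The quoted loop can be modeled in userspace to show both behaviours: a failing group unwinds every group created before it, while a NULL entry simply ends the array, so placing the event groups last and NULL-ing them on failure keeps format and interface. struct attr_group, fake_create_group() and fake_create_groups() are hypothetical stand-ins, not the sysfs API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct attribute_group. */
struct attr_group {
	const char *name;
	int fails;		/* simulate sysfs_create_group() failing */
};

#define MAX_GROUPS 8
static int created[MAX_GROUPS];	/* 1 while group i exists */

static int fake_create_group(int i, const struct attr_group *grp)
{
	if (grp->fails)
		return -1;
	created[i] = 1;
	return 0;
}

/* Same control flow as the sysfs_create_groups() loop quoted above. */
static int fake_create_groups(const struct attr_group **groups)
{
	int i, error = 0;

	for (i = 0; groups[i]; i++) {
		error = fake_create_group(i, groups[i]);
		if (error) {
			while (--i >= 0)
				created[i] = 0;	/* sysfs_remove_group() */
			break;
		}
	}
	return error;
}
```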

But why are we continuing at all if create_events_from_catalog() fails?

Shouldn't that just be a fatal error and we bail?

cheers

Re: powerpc/perf: add missing put_cpu_var in power_pmu_event_init

2015-03-25 Thread Jan Stancek


- Original Message -
 From: Michael Ellerman m...@ellerman.id.au
 To: Jan Stancek jstan...@redhat.com, linuxppc-dev@lists.ozlabs.org
 Cc: linux-ker...@vger.kernel.org, pau...@samba.org, an...@samba.org, 
 t...@kernel.org, c...@linux.com, jo...@redhat.com,
 jstan...@redhat.com, j...@jms.id.au
 Sent: Wednesday, 25 March, 2015 6:25:09 AM
 Subject: Re: powerpc/perf: add missing put_cpu_var in power_pmu_event_init
 
 On Tue, 2015-24-03 at 12:33:22 UTC, Jan Stancek wrote:
  One path in power_pmu_event_init() calls get_cpu_var(), but is
  missing matching call to put_cpu_var(), which causes preemption
  imbalance and crash in user-space:
  
Page fault in user mode with in_atomic() = 1 mm = c01fefa5a280
NIP = 3fff9bf2cae0  MSR = 90014280f032
Oops: Weird page fault, sig: 11 [#23]
 
 snip
 
 Thanks. But I don't see this. I guess you have CONFIG_PREEMPT enabled?
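The imbalance described in the quoted report can be modeled in userspace: get_cpu_var() disables preemption by raising a counter and put_cpu_var() lowers it again, so a return path that skips the put leaves the counter raised (in_atomic() = 1) when control goes back to user space. The counter and the model_*() functions are hypothetical stand-ins for the kernel's preemption accounting, not the actual power_pmu_event_init() code.

```c
#include <assert.h>

/* Userspace model of the preemption counter; hypothetical names. */
static int preempt_count;

static void model_get_cpu_var(void) { preempt_count++; }	/* disables preemption */
static void model_put_cpu_var(void) { preempt_count--; }	/* re-enables it */

/*
 * Hypothetical event_init() with an early-exit path. With fixed == 0 the
 * early exit skips the put and leaves preempt_count raised.
 */
static int model_event_init(int early_exit, int fixed)
{
	model_get_cpu_var();
	if (early_exit) {
		if (fixed)
			model_put_cpu_var();
		return -1;
	}
	model_put_cpu_var();
	return 0;
}
```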

Hi,

CONFIG_PREEMPT_NOTIFIERS=y
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
CONFIG_PREEMPT_COUNT=y

but I think the difference comes from:
  CONFIG_DEBUG_ATOMIC_SLEEP=y

I did following:
- took the default config from RHEL7.1 kernel
- ran 'make oldnoconfig'.
- reproducer didn't trigger anything
- then I added CONFIG_DEBUG_ATOMIC_SLEEP=y
- this time reproducer triggered a panic (3 out of 3 attempts)

Here's config from panic-ing kernel: http://fpaste.org/202543/

[  133.957305] Page fault in user mode with in_atomic() = 1 mm = c5fc7e80
[  133.957399] NIP = 3fff9be0cae0  MSR = 90014280f032
[  133.957405] Oops: Weird page fault, sig: 11 [#1]
[  133.957409] SMP NR_CPUS=2048 NUMA PowerNV
[  133.957414] Modules linked in: ses enclosure shpchp uio_pdrv_genirq powernv_rng uio xfs libcrc32c sr_mod sd_mod cdrom ipr libata tg3 ptp pps_core dm_mirror dm_region_hash dm_log dm_mod
[  133.957638] CPU: 16 PID: 6035 Comm: a.out Not tainted 4.0.0-rc5+ #4
[  133.957693] task: c00fea44b640 ti: c00fea5e4000 task.ti: c00fea5e4000
[  133.957759] NIP: 3fff9be0cae0 LR: 3fff9bdc4898 CTR: 3fff9be0cae0
[  133.957825] REGS: c00fea5e7ea0 TRAP: 0401   Not tainted  (4.0.0-rc5+)
[  133.957880] MSR: 90014280f032 <SF,HV,VEC,VSX,EE,PR,FP,ME,IR,DR,RI>  CR: 2228  XER: 
[  133.958079] CFAR: 3fff9bdc4894 SOFTE: 1 
GPR00: 3fff9bdc494c 31fef3e0 3fff9bf64410 10020068 
GPR04:  0002 0008 0001 
GPR08: 0001 3fff9bf54a30 3fff9be0cae0 3fff9be0cd70 
GPR12: 5222 3fff9bfeb700 
[  133.958485] NIP [3fff9be0cae0] 0x3fff9be0cae0
[  133.958530] LR [3fff9bdc4898] 0x3fff9bdc4898
[  133.958574] Call Trace:
[  133.958597] ---[ end trace 56ec543903422cd9 ]---
[  133.958642] 
[  135.958709] Kernel panic - not syncing: Fatal exception
[  135.958863] Rebooting in 10 seconds..
[  145.970348] BUG: sleeping function called from invalid context at kernel/irq/manage.c:104
[  145.970453] in_atomic(): 1, irqs_disabled(): 1, pid: 6035, name: a.out
[  145.970515] CPU: 16 PID: 6035 Comm: a.out Tainted: G  D 4.0.0-rc5+ #4
[  145.970588] Call Trace:
[  145.970618] [c00fea5e76d0] [c07c2090] .dump_stack+0x98/0xd4 (unreliable)
[  145.970707] [c00fea5e7750] [c00d5fe4] .___might_sleep+0x124/0x170
[  145.970782] [c00fea5e77c0] [c0112860] .synchronize_irq+0x40/0xe0
[  145.970857] [c00fea5e7880] [c0112fa8] .__free_irq+0xf8/0x2b0
[  145.970931] [c00fea5e7920] [c0113258] .free_irq+0x78/0x100
[  145.971007] [c00fea5e79b0] [c0067ae8] .opal_shutdown+0x88/0x120
[  145.971081] [c00fea5e7a40] [c0063e88] .pnv_shutdown+0x18/0x30
[  145.971157] [c00fea5e7ab0] [c0020c98] .machine_shutdown+0x38/0x50
[  145.971231] [c00fea5e7b20] [c0020d24] .machine_restart+0x14/0x70
[  145.971307] [c00fea5e7ba0] [c00cdc10] .emergency_restart+0x20/0x40
[  145.971393] [c00fea5e7c10] [c07bb0a4] .panic+0x224/0x2a4
[  145.971468] [c00fea5e7cb0] [c001e1fc] .die+0x43c/0x450
[  145.971543] [c00fea5e7d60] [c07b62c4] .do_page_fault+0x2d4/0x8f0
[  145.971618] [c00fea5e7e30] [c0008664] handle_page_fault+0x10/0x30

Regards,
Jan

[PATCH v3 0/2] Tracking user space vDSO remaping

2015-03-25 Thread Laurent Dufour
CRIU recreates the process memory layout by remapping the checkpointee's
memory areas on top of the current process (criu). This includes remapping
the vDSO to the address it had at checkpoint time.

However, some architectures, such as powerpc, keep a reference to the vDSO
base address and use it to build the signal return stack frame, which calls
the vDSO sigreturn service. Once the vDSO has been moved, this reference is
no longer valid and signal frames built afterwards are unusable.

This patch series introduces a new mm hook, 'arch_remap', which is called
when mremap is done while the mm lock is still held. The second patch adds
vDSO remap and unmap tracking to the powerpc architecture.
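A userspace sketch of the bookkeeping the powerpc patch adds. arch_unmap() follows the condition shown in the quoted v2 patch; the arch_remap() body is cut off in this archive, so updating vdso_base to new_start when old_start matches is an assumption based on its comment that the check can be limited to old_start == vdso_base. The structs are cut-down stand-ins and arch_unmap() drops the kernel's vma parameter.

```c
#include <assert.h>

/* Cut-down stand-ins for the fields these hooks touch. */
struct mm_context { unsigned long vdso_base; };
struct mm_struct { struct mm_context context; };

/*
 * mremap() moves a single VMA, so comparing old_start with vdso_base is
 * enough to detect that the vDSO itself was moved (assumed body).
 */
static void arch_remap(struct mm_struct *mm,
		       unsigned long old_start, unsigned long old_end,
		       unsigned long new_start, unsigned long new_end)
{
	(void)old_end;
	(void)new_end;
	if (old_start == mm->context.vdso_base)
		mm->context.vdso_base = new_start;
}

/* Forget the vDSO if an unmap covers its base address. */
static void arch_unmap(struct mm_struct *mm,
		       unsigned long start, unsigned long end)
{
	if (start <= mm->context.vdso_base && mm->context.vdso_base < end)
		mm->context.vdso_base = 0;
}
```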

Changes in v3:
--
- Fixed grammatical error in a comment of the second patch. 
  Thanks again, Ingo.

Changes in v2:
--
- Following Ingo Molnar's advice, the call to arch_remap is now enabled
  through the __HAVE_ARCH_REMAP macro. This considerably reduces the size
  of the first patch.

Laurent Dufour (2):
  mm: Introducing arch_remap hook
  powerpc/mm: Tracking vDSO remap

 arch/powerpc/include/asm/mmu_context.h | 36 +-
 mm/mremap.c| 11 +--
 2 files changed, 44 insertions(+), 3 deletions(-)

-- 
1.9.1


Re: [PATCH 1/2] KVM: PPC: Use ACCESS_ONCE when dereferencing pte_t pointer

2015-03-25 Thread Aneesh Kumar K.V

Hi,

Ignore this series, I used a wrong directory when sending out the
patchset. Will send a v3.

-aneesh


Re: [PATCH v2 2/2] powerpc/mm: Tracking vDSO remap

2015-03-25 Thread Laurent Dufour
On 25/03/2015 13:11, Ingo Molnar wrote:
 
 * Laurent Dufour lduf...@linux.vnet.ibm.com wrote:
 
 Some processes (CRIU) are moving the vDSO area using the mremap system
 call. As a consequence the kernel reference to the vDSO base address is
 no more valid and the signal return frame built once the vDSO has been
 moved is not pointing to the new sigreturn address.

 This patch handles vDSO remapping and unmapping.

 Signed-off-by: Laurent Dufour lduf...@linux.vnet.ibm.com
 ---
  arch/powerpc/include/asm/mmu_context.h | 36 
 +-
  1 file changed, 35 insertions(+), 1 deletion(-)

 diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
 index 73382eba02dc..be5dca3f7826 100644
 --- a/arch/powerpc/include/asm/mmu_context.h
 +++ b/arch/powerpc/include/asm/mmu_context.h
 @@ -8,7 +8,6 @@
  #include <linux/spinlock.h>
  #include <asm/mmu.h>
  #include <asm/cputable.h>
 -#include <asm-generic/mm_hooks.h>
  #include <asm/cputhreads.h>
  
  /*
 @@ -109,5 +108,40 @@ static inline void enter_lazy_tlb(struct mm_struct *mm,
  #endif
  }
  
 +static inline void arch_dup_mmap(struct mm_struct *oldmm,
 + struct mm_struct *mm)
 +{
 +}
 +
 +static inline void arch_exit_mmap(struct mm_struct *mm)
 +{
 +}
 +
 +static inline void arch_unmap(struct mm_struct *mm,
 +struct vm_area_struct *vma,
 +unsigned long start, unsigned long end)
 +{
 +if (start <= mm->context.vdso_base && mm->context.vdso_base < end)
 +mm->context.vdso_base = 0;
 +}
 +
 +static inline void arch_bprm_mm_init(struct mm_struct *mm,
 + struct vm_area_struct *vma)
 +{
 +}
 +
 +#define __HAVE_ARCH_REMAP
 +static inline void arch_remap(struct mm_struct *mm,
 +  unsigned long old_start, unsigned long old_end,
 +  unsigned long new_start, unsigned long new_end)
 +{
 +/*
 + * mremap don't allow moving multiple vma so we can limit the check
 + * to old_start == vdso_base.
 
 s/mremap don't allow moving multiple vma
   mremap() doesn't allow moving multiple vmas
 
 right?

Sure you're right.

I'll provide a v3 fixing that comment.

Thanks,
Laurent.


[PATCH V3 3/3] powerpc/mm/thp: Make page table walk safe against thp split/collapse

2015-03-25 Thread Aneesh Kumar K.V
We can prevent a THP split or a hugepage collapse by disabling interrupts.
An IPI is sent to all CPUs early in a split/collapse, and disabling local
interrupts ensures the split/collapse cannot make progress. If the THP is
being split, find_linux_pte_or_hugepte() returns NULL; this should be fine
for all current callers. We need to be careful if we want to use the
returned pte_t pointer outside the IRQ-disabled region. With respect to a
THP split the pfn remains the same, but a hugepage collapse results in a
pfn change. There are a few ways to avoid a hugepage collapse. One is to
take a page reference inside the IRQ-disabled region. Another is to take
mmap_sem so that a parallel collapse cannot happen. We can also prevent a
collapse by taking pmd_lock. The KVM subsystem uses yet another method: it
checks whether an mmu_notifier update happened in between
using mmu_notifier_retry().
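The calling convention above (look up only with local IRQs disabled, and do not trust the result after re-enabling them) can be modeled in userspace. This is a sketch: irqs_off, model_irq_disable()/model_irq_enable(), lookup_nocheck() and lookup() are hypothetical stand-ins for the IRQ state, __find_linux_pte_or_hugepte() and the new checking wrapper; a kernel caller would typically use local_irq_save()/local_irq_restore(), and nothing here actually blocks an IPI.

```c
#include <assert.h>
#include <stdio.h>

/* Userspace model: a flag stands in for the local IRQ state. */
static int irqs_off;
static int irq_warnings;

static void model_irq_disable(void) { irqs_off = 1; }
static void model_irq_enable(void) { irqs_off = 0; }

/* Unchecked lookup, standing in for __find_linux_pte_or_hugepte(). */
static long *lookup_nocheck(long *table, int idx)
{
	return &table[idx];
}

/* Checked wrapper in the style of the new find_linux_pte_or_hugepte(). */
static long *lookup(long *table, int idx)
{
	if (!irqs_off) {
		fprintf(stderr, "%s called with irq enabled\n", __func__);
		irq_warnings++;
	}
	return lookup_nocheck(table, idx);
}

/*
 * Caller pattern: look up and copy the value while IRQs are off, then
 * drop the pointer before re-enabling them, since a collapse could
 * invalidate it afterwards.
 */
static long read_entry(long *table, int idx)
{
	long val;

	model_irq_disable();
	val = *lookup(table, idx);
	model_irq_enable();
	return val;
}
```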

Signed-off-by: Aneesh Kumar K.V <aneesh.ku...@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/kvm_book3s_64.h | 12 ++--
 arch/powerpc/include/asm/pgtable.h   | 11 ++-
 arch/powerpc/kernel/eeh.c|  6 --
 arch/powerpc/kernel/io-workarounds.c | 10 +-
 arch/powerpc/kvm/book3s_64_mmu_hv.c  | 14 ++
 arch/powerpc/kvm/book3s_hv_rm_mmu.c  | 32 
 arch/powerpc/kvm/e500_mmu_host.c | 14 --
 arch/powerpc/mm/hash_utils_64.c  |  2 +-
 arch/powerpc/mm/hugetlbpage.c| 20 ++--
 arch/powerpc/perf/callchain.c| 24 ++--
 10 files changed, 92 insertions(+), 53 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index f06820c67175..5233a35d80e2 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -281,11 +281,9 @@ static inline int hpte_cache_flags_ok(unsigned long ptel, unsigned long io_type)
 
 /*
  * If it's present and writable, atomically set dirty and referenced bits and
- * return the PTE, otherwise return 0. If we find a transparent hugepage
- * and if it is marked splitting we return 0;
+ * return the PTE, otherwise return 0.
  */
-static inline pte_t kvmppc_read_update_linux_pte(pte_t *ptep, int writing,
-unsigned int hugepage)
+static inline pte_t kvmppc_read_update_linux_pte(pte_t *ptep, int writing)
 {
pte_t old_pte, new_pte = __pte(0);
 
@@ -301,12 +299,6 @@ static inline pte_t kvmppc_read_update_linux_pte(pte_t *ptep, int writing,
cpu_relax();
continue;
}
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-   /* If hugepage and is trans splitting return None */
-   if (unlikely(hugepage &&
-pmd_trans_splitting(pte_pmd(old_pte))))
-   return __pte(0);
-#endif
/* If pte is not present return None */
if (unlikely(!(pte_val(old_pte) & _PAGE_PRESENT)))
return __pte(0);
diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
index 92fe01c355a9..11a38635dd65 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -247,8 +247,17 @@ extern int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
 #define pmd_large(pmd) 0
 #define has_transparent_hugepage() 0
 #endif
-pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea,
+pte_t *__find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea,
 unsigned *shift);
+static inline pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea,
+  unsigned *shift)
+{
+   if (!arch_irqs_disabled()) {
+   pr_info("%s called with irq enabled\n", __func__);
+   dump_stack();
+   }
+   return __find_linux_pte_or_hugepte(pgdir, ea, shift);
+}
 #endif /* __ASSEMBLY__ */
 
 #endif /* __KERNEL__ */
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 3b2252e7731b..8424b232e598 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -330,9 +330,11 @@ static inline unsigned long eeh_token_to_phys(unsigned long token)
int hugepage_shift;
 
/*
-* We won't find hugepages here, iomem
+* We won't find hugepages here (this is iomem). Hence we are not
+* worried about _PAGE_SPLITTING/collapse. Also we will not hit
+* page table free, because of init_mm.
 */
-   ptep = find_linux_pte_or_hugepte(init_mm.pgd, token, &hugepage_shift);
+   ptep = __find_linux_pte_or_hugepte(init_mm.pgd, token, &hugepage_shift);
if (!ptep)
return token;
WARN_ON(hugepage_shift);
diff --git a/arch/powerpc/kernel/io-workarounds.c b/arch/powerpc/kernel/io-workarounds.c
index 

Re: [PATCH v2 2/2] powerpc/mm: Tracking vDSO remap

2015-03-25 Thread Ingo Molnar

* Laurent Dufour lduf...@linux.vnet.ibm.com wrote:

> Some processes (CRIU) are moving the vDSO area using the mremap system
> call. As a consequence the kernel reference to the vDSO base address is
> no more valid and the signal return frame built once the vDSO has been
> moved is not pointing to the new sigreturn address.
> 
> This patch handles vDSO remapping and unmapping.
> 
> Signed-off-by: Laurent Dufour lduf...@linux.vnet.ibm.com
> ---
>  arch/powerpc/include/asm/mmu_context.h | 36 +-
>  1 file changed, 35 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
> index 73382eba02dc..be5dca3f7826 100644
> --- a/arch/powerpc/include/asm/mmu_context.h
> +++ b/arch/powerpc/include/asm/mmu_context.h
> @@ -8,7 +8,6 @@
>  #include <linux/spinlock.h>
>  #include <asm/mmu.h> 
>  #include <asm/cputable.h>
> -#include <asm-generic/mm_hooks.h>
>  #include <asm/cputhreads.h>
>  
>  /*
> @@ -109,5 +108,40 @@ static inline void enter_lazy_tlb(struct mm_struct *mm,
>  #endif
>  }
>  
> +static inline void arch_dup_mmap(struct mm_struct *oldmm,
> +  struct mm_struct *mm)
> +{
> +}
> +
> +static inline void arch_exit_mmap(struct mm_struct *mm)
> +{
> +}
> +
> +static inline void arch_unmap(struct mm_struct *mm,
> + struct vm_area_struct *vma,
> + unsigned long start, unsigned long end)
> +{
> + if (start <= mm->context.vdso_base && mm->context.vdso_base < end)
> + mm->context.vdso_base = 0;
> +}
> +
> +static inline void arch_bprm_mm_init(struct mm_struct *mm,
> +  struct vm_area_struct *vma)
> +{
> +}
> +
> +#define __HAVE_ARCH_REMAP
> +static inline void arch_remap(struct mm_struct *mm,
> +   unsigned long old_start, unsigned long old_end,
> +   unsigned long new_start, unsigned long new_end)
> +{
> + /*
> +  * mremap don't allow moving multiple vma so we can limit the check
> +  * to old_start == vdso_base.

s/mremap don't allow moving multiple vma
  mremap() doesn't allow moving multiple vmas

right?

Thanks,

Ingo
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v3] powerpc: Use PFN_PHYS() to avoid truncating the physical address

2015-03-25 Thread Emil Medve
Signed-off-by: Emil Medve emilian.me...@freescale.com
---

v3: Rebased and updated due to upstream changes since v2

v2: Rebased and updated due to upstream changes since v1

 arch/powerpc/include/asm/io.h  | 2 +-
 arch/powerpc/include/asm/page.h| 2 +-
 arch/powerpc/include/asm/pgalloc-32.h  | 2 +-
 arch/powerpc/include/asm/rtas.h| 3 ++-
 arch/powerpc/kernel/crash_dump.c   | 2 +-
 arch/powerpc/kernel/eeh.c  | 4 +---
 arch/powerpc/kernel/io-workarounds.c   | 2 +-
 arch/powerpc/kernel/pci-common.c   | 2 +-
 arch/powerpc/kernel/vdso.c | 6 +++---
 arch/powerpc/kvm/book3s_64_mmu_host.c  | 2 +-
 arch/powerpc/kvm/book3s_64_mmu_hv.c| 2 +-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c| 4 ++--
 arch/powerpc/kvm/e500_mmu_host.c   | 5 ++---
 arch/powerpc/mm/hugepage-hash64.c  | 2 +-
 arch/powerpc/mm/hugetlbpage-book3e.c   | 2 +-
 arch/powerpc/mm/hugetlbpage-hash64.c   | 2 +-
 arch/powerpc/mm/mem.c  | 9 -
 arch/powerpc/mm/numa.c | 5 ++---
 arch/powerpc/platforms/powernv/opal.c  | 2 +-
 arch/powerpc/platforms/pseries/iommu.c | 8 
 20 files changed, 32 insertions(+), 36 deletions(-)

diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h
index 9eaf301..d6454f5 100644
--- a/arch/powerpc/include/asm/io.h
+++ b/arch/powerpc/include/asm/io.h
@@ -794,7 +794,7 @@ static inline void * phys_to_virt(unsigned long address)
 /*
  * Change struct page to physical address.
  */
-#define page_to_phys(page) ((phys_addr_t)page_to_pfn(page) << PAGE_SHIFT)
+#define page_to_phys(page) PFN_PHYS(page_to_pfn(page))
 
 /*
  * 32 bits still uses virt_to_bus() for it's implementation of DMA
diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index 69c0598..30f33ed 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -128,7 +128,7 @@ extern long long virt_phys_offset;
 #endif
 
 #define virt_to_page(kaddr)    pfn_to_page(__pa(kaddr) >> PAGE_SHIFT)
-#define pfn_to_kaddr(pfn)  __va((pfn) << PAGE_SHIFT)
+#define pfn_to_kaddr(pfn)  __va(PFN_PHYS(pfn))
 #define virt_addr_valid(kaddr) pfn_valid(__pa(kaddr) >> PAGE_SHIFT)
 
 /*
diff --git a/arch/powerpc/include/asm/pgalloc-32.h b/arch/powerpc/include/asm/pgalloc-32.h
index 842846c..3d19a8e 100644
--- a/arch/powerpc/include/asm/pgalloc-32.h
+++ b/arch/powerpc/include/asm/pgalloc-32.h
@@ -24,7 +24,7 @@ extern void pgd_free(struct mm_struct *mm, pgd_t *pgd);
 #define pmd_populate_kernel(mm, pmd, pte)  \
(pmd_val(*(pmd)) = __pa(pte) | _PMD_PRESENT)
 #define pmd_populate(mm, pmd, pte) \
-   (pmd_val(*(pmd)) = (page_to_pfn(pte) << PAGE_SHIFT) | _PMD_PRESENT)
+   (pmd_val(*(pmd)) = PFN_PHYS(page_to_pfn(pte)) | _PMD_PRESENT)
 #define pmd_pgtable(pmd) pmd_page(pmd)
 #else
 #define pmd_populate_kernel(mm, pmd, pte)  \
diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 2e23e92..2e430b6d 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -3,6 +3,7 @@
 #ifdef __KERNEL__
 
 #include <linux/spinlock.h>
+#include <linux/pfn.h>
 #include <asm/page.h>
 
 /*
@@ -418,7 +419,7 @@ extern void rtas_take_timebase(void);
 #ifdef CONFIG_PPC_RTAS
 static inline int page_is_rtas_user_buf(unsigned long pfn)
 {
-   unsigned long paddr = (pfn << PAGE_SHIFT);
+   unsigned long paddr = PFN_PHYS(pfn);
if (paddr >= rtas_rmo_buf && paddr < (rtas_rmo_buf + RTAS_RMOBUF_MAX))
return 1;
return 0;
diff --git a/arch/powerpc/kernel/crash_dump.c b/arch/powerpc/kernel/crash_dump.c
index cfa0f81..b6578ee 100644
--- a/arch/powerpc/kernel/crash_dump.c
+++ b/arch/powerpc/kernel/crash_dump.c
@@ -104,7 +104,7 @@ ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
return 0;
 
csize = min_t(size_t, csize, PAGE_SIZE);
-   paddr = pfn << PAGE_SHIFT;
+   paddr = PFN_PHYS(pfn);
 
if (memblock_is_region_memory(paddr, csize)) {
vaddr = __va(paddr);
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 3b2252e..119af20 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -326,7 +326,6 @@ void eeh_slot_error_detail(struct eeh_pe *pe, int severity)
 static inline unsigned long eeh_token_to_phys(unsigned long token)
 {
pte_t *ptep;
-   unsigned long pa;
int hugepage_shift;
 
/*
@@ -336,9 +335,8 @@ static inline unsigned long eeh_token_to_phys(unsigned long token)
if (!ptep)
return token;
WARN_ON(hugepage_shift);
-   pa = pte_pfn(*ptep) << PAGE_SHIFT;
 
-   return pa | (token & (PAGE_SIZE-1));
+   return PFN_PHYS(pte_pfn(*ptep)) | (token & (PAGE_SIZE - 1));
 }
 
 /*
diff --git a/arch/powerpc/kernel/io-workarounds.c b/arch/powerpc/kernel/io-workarounds.c
index 24b968f..dd9a4a2 100644
--- a/arch/powerpc/kernel/io-workarounds.c
+++ 

[PATCH V3 2/3] powerpc/mm: Remove page table walk helpers

2015-03-25 Thread Aneesh Kumar K.V
This patch removes helpers that were used only once in the code.
Limiting the number of page table walk variants helps ensure that we
don't end up with code walking the page table with wrong assumptions.
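The size check that the removed helpers performed, and that the patch now open-codes in kvmppc_do_h_enter(), can be sketched in isolation:

- A hugepage shift of 0 from the page-table walk means a normal base page.
- Any other shift means the host mapping is 1 << shift bytes.
- The guest page size must not exceed the host mapping size.

PAGE_SIZE is assumed to be 64 KiB here, a common ppc64 configuration, and unsigned long is assumed to be 64 bits. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_PAGE_SIZE (64UL * 1024) /* assumed 64 KiB base pages */

/* Host mapping size derived from the hugepage shift returned by the
 * page-table walk (0 means a normal base page). Assumes 64-bit long. */
static unsigned long host_pte_size(unsigned int hpage_shift)
{
	return hpage_shift ? 1UL << hpage_shift : SKETCH_PAGE_SIZE;
}

/* The guest page must fit inside the host mapping; otherwise the
 * request is rejected (H_PARAMETER in the patch). */
static bool guest_page_fits(unsigned int hpage_shift, unsigned long psize)
{
	return host_pte_size(hpage_shift) >= psize;
}
```

The sketch keeps the size in an unsigned long, so shifts of 32 or more (for example 16 GiB pages) stay representable.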

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/pgtable.h  | 21 -
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 62 -
 arch/powerpc/kvm/e500_mmu_host.c|  2 +-
 3 files changed, 28 insertions(+), 57 deletions(-)

diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
index 9835ac4173b7..92fe01c355a9 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -249,27 +249,6 @@ extern int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
 #endif
 pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea,
 unsigned *shift);
-
-static inline pte_t *lookup_linux_ptep(pgd_t *pgdir, unsigned long hva,
-unsigned long *pte_sizep)
-{
-   pte_t *ptep;
-   unsigned long ps = *pte_sizep;
-   unsigned int shift;
-
-   ptep = find_linux_pte_or_hugepte(pgdir, hva, &shift);
-   if (!ptep)
-   return NULL;
-   if (shift)
-   *pte_sizep = 1ul << shift;
-   else
-   *pte_sizep = PAGE_SIZE;
-
-   if (ps > *pte_sizep)
-   return NULL;
-
-   return ptep;
-}
 #endif /* __ASSEMBLY__ */
 
 #endif /* __KERNEL__ */
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 625407e4d3b0..73e083cb9f7e 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -131,25 +131,6 @@ static void remove_revmap_chain(struct kvm *kvm, long pte_index,
unlock_rmap(rmap);
 }
 
-static pte_t lookup_linux_pte_and_update(pgd_t *pgdir, unsigned long hva,
- int writing, unsigned long *pte_sizep)
-{
-   pte_t *ptep;
-   unsigned long ps = *pte_sizep;
-   unsigned int hugepage_shift;
-
-   ptep = find_linux_pte_or_hugepte(pgdir, hva, &hugepage_shift);
-   if (!ptep)
-   return __pte(0);
-   if (hugepage_shift)
-   *pte_sizep = 1ul << hugepage_shift;
-   else
-   *pte_sizep = PAGE_SIZE;
-   if (ps > *pte_sizep)
-   return __pte(0);
-   return kvmppc_read_update_linux_pte(ptep, writing, hugepage_shift);
-}
-
 static inline void unlock_hpte(__be64 *hpte, unsigned long hpte_v)
 {
asm volatile(PPC_RELEASE_BARRIER "" : : : "memory");
@@ -166,10 +147,10 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
struct revmap_entry *rev;
unsigned long g_ptel;
struct kvm_memory_slot *memslot;
-   unsigned long pte_size;
+   unsigned hpage_shift;
unsigned long is_io;
unsigned long *rmap;
-   pte_t pte;
+   pte_t *ptep;
unsigned int writing;
unsigned long mmu_seq;
unsigned long rcbits;
@@ -208,22 +189,33 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 
/* Translate to host virtual address */
hva = __gfn_to_hva_memslot(memslot, gfn);
+   ptep = find_linux_pte_or_hugepte(pgdir, hva, &hpage_shift);
+   if (ptep) {
+   pte_t pte;
+   unsigned int host_pte_size;
 
-   /* Look up the Linux PTE for the backing page */
-   pte_size = psize;
-   pte = lookup_linux_pte_and_update(pgdir, hva, writing, pte_size);
-   if (pte_present(pte) && !pte_protnone(pte)) {
-   if (writing && !pte_write(pte))
-   /* make the actual HPTE be read-only */
-   ptel = hpte_make_readonly(ptel);
-   is_io = hpte_cache_bits(pte_val(pte));
-   pa = pte_pfn(pte) << PAGE_SHIFT;
-   pa |= hva & (pte_size - 1);
-   pa |= gpa & ~PAGE_MASK;
-   }
+   if (hpage_shift)
+   host_pte_size = 1ul << hpage_shift;
+   else
+   host_pte_size = PAGE_SIZE;
+   /*
+* We should always find the guest page size
+* to <= host page size, if host is using hugepage
+*/
+   if (host_pte_size < psize)
+   return H_PARAMETER;
 
-   if (pte_size < psize)
-   return H_PARAMETER;
+   pte = kvmppc_read_update_linux_pte(ptep, writing, hpage_shift);
+   if (pte_present(pte) && !pte_protnone(pte)) {
+   if (writing && !pte_write(pte))
+   /* make the actual HPTE be read-only */
+   ptel = hpte_make_readonly(ptel);
+   is_io = hpte_cache_bits(pte_val(pte));
+   pa = pte_pfn(pte) << PAGE_SHIFT;
+   pa |= hva & (host_pte_size - 1);
+   pa |= gpa  

[PATCH v3 2/2] powerpc/mm: Tracking vDSO remap

2015-03-25 Thread Laurent Dufour
Some processes (e.g. CRIU) move the vDSO area using the mremap system
call. As a consequence, the kernel's reference to the vDSO base address
is no longer valid, and a signal return frame built after the vDSO has
been moved does not point to the new sigreturn address.

This patch handles vDSO remapping and unmapping.
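The two hooks in this patch keep mm->context.vdso_base in sync:

- An unmap whose range covers the base clears it.
- A move whose old start equals the base relocates it.

The user-space sketch below models only that bookkeeping. struct sketch_mm and the sketch_* functions are hypothetical stand-ins, not kernel types.

```c
#include <assert.h>

/* Hypothetical stand-in for the one field of mm->context that matters. */
struct sketch_mm {
	unsigned long vdso_base; /* 0 means "no vDSO mapped" */
};

/* Mirrors arch_unmap(): forget the vDSO if [start, end) covers its base. */
static void sketch_unmap(struct sketch_mm *mm,
			 unsigned long start, unsigned long end)
{
	if (start <= mm->vdso_base && mm->vdso_base < end)
		mm->vdso_base = 0;
}

/* Mirrors arch_remap(): mremap() moves a single VMA, so only a move
 * starting exactly at the vDSO base needs to be followed. */
static void sketch_remap(struct sketch_mm *mm,
			 unsigned long old_start, unsigned long new_start)
{
	if (old_start == mm->vdso_base)
		mm->vdso_base = new_start;
}
```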

Signed-off-by: Laurent Dufour lduf...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/mmu_context.h | 36 +-
 1 file changed, 35 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index 73382eba02dc..7d315c1898d4 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -8,7 +8,6 @@
 #include <linux/spinlock.h>
 #include <asm/mmu.h>   
 #include <asm/cputable.h>
-#include <asm-generic/mm_hooks.h>
 #include <asm/cputhreads.h>
 
 /*
@@ -109,5 +108,40 @@ static inline void enter_lazy_tlb(struct mm_struct *mm,
 #endif
 }
 
+static inline void arch_dup_mmap(struct mm_struct *oldmm,
+struct mm_struct *mm)
+{
+}
+
+static inline void arch_exit_mmap(struct mm_struct *mm)
+{
+}
+
+static inline void arch_unmap(struct mm_struct *mm,
+   struct vm_area_struct *vma,
+   unsigned long start, unsigned long end)
+{
+   if (start <= mm->context.vdso_base && mm->context.vdso_base < end)
+   mm->context.vdso_base = 0;
+}
+
+static inline void arch_bprm_mm_init(struct mm_struct *mm,
+struct vm_area_struct *vma)
+{
+}
+
+#define __HAVE_ARCH_REMAP
+static inline void arch_remap(struct mm_struct *mm,
+ unsigned long old_start, unsigned long old_end,
+ unsigned long new_start, unsigned long new_end)
+{
+   /*
+* mremap() doesn't allow moving multiple vmas so we can limit the
+* check to old_start == vdso_base.
+*/
+   if (old_start == mm->context.vdso_base)
+   mm->context.vdso_base = new_start;
+}
+
 #endif /* __KERNEL__ */
 #endif /* __ASM_POWERPC_MMU_CONTEXT_H */
-- 
1.9.1

[PATCH v3 1/2] mm: Introducing arch_remap hook

2015-03-25 Thread Laurent Dufour
Some architectures would like to be notified when a memory area is
moved through the mremap system call.

This patch introduces a new arch_remap mm hook which is placed in the
path of mremap, and is called before the old area is unmapped (and the
arch_unmap hook is called).

Architectures that need this hook should define __HAVE_ARCH_REMAP in
their asm/mmu_context.h and provide an arch_remap() implementation
with the following prototype:
void arch_remap(struct mm_struct *mm,
unsigned long old_start, unsigned long old_end,
unsigned long new_start, unsigned long new_end);
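Because move_vma() invokes arch_remap() before the old range is unmapped, an architecture that tracks an address (as the powerpc vDSO patch does) updates it first. The later arch_unmap() call on the old range then no longer matches.

The sketch below models that ordering with an opt-in macro playing the role of __HAVE_ARCH_REMAP. It makes these assumptions:

- Every name is a hypothetical user-space stand-in.
- The unmap step is reduced to the hook call alone.

```c
#include <assert.h>

#define SKETCH_HAVE_ARCH_REMAP 1 /* plays the role of __HAVE_ARCH_REMAP */

static unsigned long tracked_base; /* e.g. a vDSO base address */

static void sketch_arch_remap(unsigned long old_start, unsigned long new_start)
{
	if (old_start == tracked_base)
		tracked_base = new_start;
}

static void sketch_arch_unmap(unsigned long start, unsigned long end)
{
	if (start <= tracked_base && tracked_base < end)
		tracked_base = 0;
}

/* Simplified move: notify the architecture first, then tear down
 * the old range (which is what ends up calling arch_unmap()). */
static void sketch_move_vma(unsigned long old_addr, unsigned long new_addr,
			    unsigned long len)
{
#ifdef SKETCH_HAVE_ARCH_REMAP
	sketch_arch_remap(old_addr, new_addr);
#endif
	sketch_arch_unmap(old_addr, old_addr + len);
}
```

Swapping the two calls in sketch_move_vma() would clear tracked_base before the remap hook could see it. That is why the hook is placed before the unmap.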

Signed-off-by: Laurent Dufour lduf...@linux.vnet.ibm.com
---
 mm/mremap.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/mm/mremap.c b/mm/mremap.c
index 57dadc025c64..bafc234db45c 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -25,6 +25,7 @@
 
 #include <asm/cacheflush.h>
 #include <asm/tlbflush.h>
+#include <asm/mmu_context.h>
 
 #include "internal.h"
 
@@ -286,8 +287,14 @@ static unsigned long move_vma(struct vm_area_struct *vma,
old_len = new_len;
old_addr = new_addr;
new_addr = -ENOMEM;
-   } else if (vma->vm_file && vma->vm_file->f_op->mremap)
-   vma->vm_file->f_op->mremap(vma->vm_file, new_vma);
+   } else {
+   if (vma->vm_file && vma->vm_file->f_op->mremap)
+   vma->vm_file->f_op->mremap(vma->vm_file, new_vma);
+#ifdef __HAVE_ARCH_REMAP
+   arch_remap(mm, old_addr, old_addr+old_len,
+  new_addr, new_addr+new_len);
+#endif
+   }
 
/* Conceal VM_ACCOUNT so old reservation is not undone */
if (vm_flags & VM_ACCOUNT) {
-- 
1.9.1

[PATCH V3 1/3] KVM: PPC: Use READ_ONCE when dereferencing pte_t pointer

2015-03-25 Thread Aneesh Kumar K.V
The pte can be updated by other CPUs as part of several activities,
such as THP split, hugepage collapse and unmap. We need to make sure we
don't reload the pte value again and again for the different checks.
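The idea is to take a single snapshot of the PTE and run every check against that local copy. A concurrent update between two checks then cannot produce an inconsistent result. In the kernel, READ_ONCE() forces that single load.

The user-space sketch below approximates READ_ONCE() with a volatile access. The bit values are hypothetical, not powerpc's real _PAGE_* layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical PTE bits for illustration only. */
#define SK_PAGE_PRESENT 0x1ULL
#define SK_PAGE_RW      0x2ULL

/* Approximation of READ_ONCE() for a 64-bit PTE: exactly one volatile load. */
static inline uint64_t read_pte_once(const volatile uint64_t *ptep)
{
	return *ptep;
}

/* Both tests use the same snapshot; *ptep is never re-read. */
static bool pte_usable_for_write(const volatile uint64_t *ptep)
{
	uint64_t pte = read_pte_once(ptep);

	return (pte & SK_PAGE_PRESENT) && (pte & SK_PAGE_RW);
}
```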

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
NOTE:
The series depends on the patch
 [PATCH 4/6] powerpc: Fix compile errors with STRICT_MM_TYPECHECKS enabled

 arch/powerpc/include/asm/kvm_book3s_64.h |  5 -
 arch/powerpc/kvm/e500_mmu_host.c | 20 
 2 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index cc073a7ac2b7..f06820c67175 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -290,7 +290,10 @@ static inline pte_t kvmppc_read_update_linux_pte(pte_t *ptep, int writing,
pte_t old_pte, new_pte = __pte(0);
 
while (1) {
-   old_pte = *ptep;
+   /*
+* Make sure we don't reload from ptep
+*/
+   old_pte = READ_ONCE(*ptep);
/*
 * wait until _PAGE_BUSY is clear then set it atomically
 */
diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
index cc536d4a75ef..5840d546aa03 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -469,14 +469,18 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
 
pgdir = vcpu_e500->vcpu.arch.pgdir;
ptep = lookup_linux_ptep(pgdir, hva, &tsize_pages);
-   if (pte_present(*ptep))
-   wimg = (*ptep >> PTE_WIMGE_SHIFT) & MAS2_WIMGE_MASK;
-   else {
-   if (printk_ratelimit())
-   pr_err("%s: pte not present: gfn %lx, pfn %lx\n",
-   __func__, (long)gfn, pfn);
-   ret = -EINVAL;
-   goto out;
+   if (ptep) {
+   pte_t pte = READ_ONCE(*ptep);
+
+   if (pte_present(pte))
+   wimg = (pte_val(pte) >> PTE_WIMGE_SHIFT) &
+   MAS2_WIMGE_MASK;
+   else {
+   pr_err_ratelimited("%s: pte not present: gfn %lx,pfn %lx\n",
+  __func__, (long)gfn, pfn);
+   ret = -EINVAL;
+   goto out;
+   }
}
kvmppc_e500_ref_setup(ref, gtlbe, pfn, wimg);
 
-- 
2.1.0

Re: drivers/of: Add empty ranges quirk for PA-Semi

2015-03-25 Thread Grant Likely
On Mon, 23 Mar 2015 15:06:35 +1100, Benjamin Herrenschmidt b...@kernel.crashing.org wrote:
> On Mon, 2015-03-23 at 14:50 +1100, Michael Ellerman wrote:
> > On Mon, 2015-23-03 at 03:16:38 UTC, Benjamin Herrenschmidt wrote:
> > > The sdc node is missing the ranges property, it needs to be treated
> > > as having an empty one otherwise translation fails for its children.
> > > 
> > > Tested-by: Steven Rostedt rost...@goodmis.org
> > > Signed-off-by: Benjamin Herrenschmidt b...@kernel.crashing.org
> > 
> > Fixes: 746c9e9f92dd ("of/base: Fix PowerPC address parsing hack")
> > 
> > Which went into 3.18-rc6, and was CC'ed to stable. So this should probably also
> > go to stable no?
> 
> Sure, go for it.

Applied, thanks.

g.
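The quirk rests on a device-tree rule:

- An empty ranges property means the child address space maps 1:1 onto the parent.
- A missing ranges property means there is no translation at all, so lookups for the node's children fail.

A minimal user-space sketch of one translation step follows, with hypothetical types. The quirk_empty_ranges flag plays the role of the PA-Semi workaround and is not a real kernel field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct sketch_range {
	uint64_t child_base, parent_base, size;
};

struct sketch_node {
	bool has_ranges;                   /* is a "ranges" property present? */
	int nranges;                       /* 0 with has_ranges means "ranges;" */
	const struct sketch_range *ranges;
	bool quirk_empty_ranges;           /* treat a missing property as empty */
};

/* Translate one level up; returns false when no translation exists. */
static bool sketch_translate(const struct sketch_node *n, uint64_t addr,
			     uint64_t *out)
{
	if (!n->has_ranges && !n->quirk_empty_ranges)
		return false;              /* no ranges: children not translatable */
	if (n->nranges == 0) {
		*out = addr;               /* empty ranges: identity mapping */
		return true;
	}
	for (int i = 0; i < n->nranges; i++) {
		const struct sketch_range *r = &n->ranges[i];

		if (addr >= r->child_base && addr - r->child_base < r->size) {
			*out = r->parent_base + (addr - r->child_base);
			return true;
		}
	}
	return false;
}
```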

[PATCH v7 RFC 3/3] sparc: Make LDC use common iommu poll management functions

2015-03-25 Thread Sowmini Varadhan
Note that this conversion is only being done to consolidate the
code and ensure that the common code provides sufficient
abstraction. It is not expected to result in any noticeable
performance improvement, as there is typically one ldc_iommu
per vnet_port, and each one has 8k entries, with a typical
request for 1-4 pages.  Thus LDC uses npools == 1.

Signed-off-by: Sowmini Varadhan sowmini.varad...@oracle.com
---
v3: added this file to be a consumer of the common iommu library
v4: removed -cookie_to_index and -demap from iommu_tbl_ops and instead
inline these calls into ldc before calling into iommu-common
v6: remove iommu_tbl_ops
v7: move pool_hash initialization to iommu_tbl_pool_init
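The ldc_cookie_to_index() helper added in this patch decodes an LDC cookie as follows:

- The top four bits carry a page-size code n, which selects a page of 2^(13 + 3n) bytes.
- The mapping index is what remains once the code and the in-page offset are stripped.

The user-space sketch of that arithmetic assumes the code occupies bits 60 to 63, which is what a COOKIE_PGSZ_CODE_SHIFT of 60 implies. sketch_make_cookie() is a hypothetical inverse written only to build test inputs.

```c
#include <assert.h>
#include <stdint.h>

#define SK_COOKIE_PGSZ_CODE       0xf000000000000000ULL /* bits 60..63 */
#define SK_COOKIE_PGSZ_CODE_SHIFT 60ULL

/* Same arithmetic as ldc_cookie_to_index(): strip the size code, then
 * shift out the in-page offset of a 2^(13 + 3 * code) byte page. */
static uint64_t sketch_cookie_to_index(uint64_t cookie)
{
	uint64_t szcode = cookie >> SK_COOKIE_PGSZ_CODE_SHIFT;

	cookie &= ~SK_COOKIE_PGSZ_CODE;
	return cookie >> (13ULL + szcode * 3ULL);
}

/* Hypothetical inverse, used only to build test inputs. */
static uint64_t sketch_make_cookie(uint64_t szcode, uint64_t index,
				   uint64_t offset)
{
	return (szcode << SK_COOKIE_PGSZ_CODE_SHIFT) |
	       (index << (13ULL + szcode * 3ULL)) | offset;
}
```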

 arch/sparc/kernel/ldc.c |  152 ---
 1 files changed, 64 insertions(+), 88 deletions(-)

diff --git a/arch/sparc/kernel/ldc.c b/arch/sparc/kernel/ldc.c
index 274a9f5..e858968 100644
--- a/arch/sparc/kernel/ldc.c
+++ b/arch/sparc/kernel/ldc.c
@@ -15,6 +15,7 @@
 #include <linux/list.h>
 #include <linux/init.h>
 #include <linux/bitmap.h>
+#include <linux/iommu-common.h>
 
 #include <asm/hypervisor.h>
 #include <asm/iommu.h>
@@ -27,6 +28,10 @@
 #define DRV_MODULE_VERSION "1.1"
 #define DRV_MODULE_RELDATE "July 22, 2008"
 
+#define COOKIE_PGSZ_CODE   0xf000000000000000ULL
+#define COOKIE_PGSZ_CODE_SHIFT 60ULL
+
+
 static char version[] =
DRV_MODULE_NAME ".c:v" DRV_MODULE_VERSION " (" DRV_MODULE_RELDATE ")\n";
 #define LDC_PACKET_SIZE 64
@@ -98,10 +103,10 @@ static const struct ldc_mode_ops stream_ops;
 int ldom_domaining_enabled;
 
 struct ldc_iommu {
-   /* Protects arena alloc/free.  */
+   /* Protects ldc_unmap.  */
spinlock_t  lock;
-   struct iommu_arena  arena;
struct ldc_mtable_entry *page_table;
+   struct iommu_table  iommu_table;
 };
 
 struct ldc_channel {
@@ -998,31 +1003,59 @@ static void free_queue(unsigned long num_entries, struct ldc_packet *q)
free_pages((unsigned long)q, order);
 }
 
+static unsigned long ldc_cookie_to_index(u64 cookie, void *arg)
+{
+   u64 szcode = cookie >> COOKIE_PGSZ_CODE_SHIFT;
+   /* struct ldc_iommu *ldc_iommu = (struct ldc_iommu *)arg; */
+
+   cookie &= ~COOKIE_PGSZ_CODE;
+
+   return (cookie >> (13ULL + (szcode * 3ULL)));
+}
+
+static void ldc_demap(struct ldc_iommu *iommu, unsigned long id, u64 cookie,
+ unsigned long entry, unsigned long npages)
+{
+   struct ldc_mtable_entry *base;
+   unsigned long i, shift;
+
+   shift = (cookie >> COOKIE_PGSZ_CODE_SHIFT) * 3;
+   base = iommu->page_table + entry;
+   for (i = 0; i < npages; i++) {
+   if (base->cookie)
+   sun4v_ldc_revoke(id, cookie + (i << shift),
+base->cookie);
+   base->mte = 0;
+   }
+}
+
 /* XXX Make this configurable... XXX */
 #define LDC_IOTABLE_SIZE   (8 * 1024)
 
-static int ldc_iommu_init(struct ldc_channel *lp)
+static int ldc_iommu_init(const char *name, struct ldc_channel *lp)
 {
unsigned long sz, num_tsb_entries, tsbsize, order;
-   struct ldc_iommu *iommu = &lp->iommu;
+   struct ldc_iommu *ldc_iommu = &lp->iommu;
+   struct iommu_table *iommu = &ldc_iommu->iommu_table;
struct ldc_mtable_entry *table;
unsigned long hv_err;
int err;
 
num_tsb_entries = LDC_IOTABLE_SIZE;
tsbsize = num_tsb_entries * sizeof(struct ldc_mtable_entry);
-
-   spin_lock_init(&iommu->lock);
+   spin_lock_init(&ldc_iommu->lock);
 
sz = num_tsb_entries / 8;
sz = (sz + 7UL) & ~7UL;
-   iommu->arena.map = kzalloc(sz, GFP_KERNEL);
-   if (!iommu->arena.map) {
+   iommu->map = kzalloc(sz, GFP_KERNEL);
+   if (!iommu->map) {
printk(KERN_ERR PFX "Alloc of arena map failed, sz=%lu\n", sz);
return -ENOMEM;
}
-
-   iommu->arena.limit = num_tsb_entries;
+   iommu_tbl_pool_init(iommu, num_tsb_entries, PAGE_SHIFT,
+   NULL, false /* no large pool */,
+   1 /* npools */,
+   true /* skip span boundary check */);
 
order = get_order(tsbsize);
 
@@ -1037,7 +1070,7 @@ static int ldc_iommu_init(struct ldc_channel *lp)
 
memset(table, 0, PAGE_SIZE << order);
 
-   iommu->page_table = table;
+   ldc_iommu->page_table = table;
 
hv_err = sun4v_ldc_set_map_table(lp->id, __pa(table),
 num_tsb_entries);
@@ -1049,31 +1082,32 @@ static int ldc_iommu_init(struct ldc_channel *lp)
 
 out_free_table:
free_pages((unsigned long) table, order);
-   iommu->page_table = NULL;
+   ldc_iommu->page_table = NULL;
 
 out_free_map:
-   kfree(iommu->arena.map);
-   iommu->arena.map = NULL;
+   kfree(iommu->map);
+   iommu->map = NULL;
 
return err;
 }
 
 static void ldc_iommu_release(struct 

[PATCH] powerpc/powernv: handle OPAL_SUCCESS return in opal_sensor_read

2015-03-25 Thread Cédric Le Goater
Currently, when a sensor value is read, the kernel calls OPAL, which in
turn builds a message for the FSP, and waits for a message back. 

The new device tree for OPAL sensors [1] adds new sensors that can be 
read synchronously (core temperatures for instance) and that don't need 
to wait for a response.

This patch modifies opal_get_sensor_data() to also accept an
OPAL_SUCCESS return value from the OPAL call, covering the case above.

[1] https://lists.ozlabs.org/pipermail/skiboot/2015-March/000639.html
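The control flow the patch introduces can be summarised as follows:

- OPAL_SUCCESS means the value is already in the buffer.
- OPAL_ASYNC_COMPLETION means the caller must wait for the response message and take the status from it.
- Any other return code is passed back unchanged.

The sketch below models this in user space. The constants and the fake firmware structure are hypothetical stand-ins. The real code also converts from big-endian with be32_to_cpu() and be64_to_cpu(), which is omitted here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical return codes standing in for the OPAL ones. */
enum { SK_OPAL_SUCCESS = 0, SK_OPAL_ASYNC_COMPLETION = 1, SK_OPAL_HARDWARE = -2 };

/* Fake firmware behaviour for one sensor read. */
struct sk_fw {
	int read_ret;       /* what the sensor-read call returns */
	int wait_ret;       /* 0 if waiting for the async message succeeds */
	int64_t msg_status; /* status carried in the async response */
	uint32_t value;     /* value the firmware leaves in the data buffer */
};

static int sk_get_sensor_data(const struct sk_fw *fw, uint32_t *sensor_data)
{
	uint32_t data = fw->value;
	int ret = fw->read_ret;

	switch (ret) {
	case SK_OPAL_ASYNC_COMPLETION:
		if (fw->wait_ret)
			return fw->wait_ret;     /* failed to get the response */
		ret = (int)fw->msg_status;       /* status comes from the message */
		*sensor_data = data;
		break;
	case SK_OPAL_SUCCESS:
		*sensor_data = data;             /* value available immediately */
		break;
	default:
		break;                           /* propagate the error code */
	}
	return ret;
}
```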

Signed-off-by: Cédric Le Goater c...@fr.ibm.com
---

 We still needlessly reserve a token (for the response) and take a
 lock, which may justify a new 'opal_sensor_read_sync' call.

 arch/powerpc/platforms/powernv/opal-sensor.c |   29 +-
 1 file changed, 19 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/opal-sensor.c b/arch/powerpc/platforms/powernv/opal-sensor.c
index 4ab67ef7abc9..99d6d9a371ab 100644
--- a/arch/powerpc/platforms/powernv/opal-sensor.c
+++ b/arch/powerpc/platforms/powernv/opal-sensor.c
@@ -46,18 +46,27 @@ int opal_get_sensor_data(u32 sensor_hndl, u32 *sensor_data)
 
mutex_lock(&opal_sensor_mutex);
ret = opal_sensor_read(sensor_hndl, token, &data);
-   if (ret != OPAL_ASYNC_COMPLETION)
-   goto out_token;
+   switch (ret) {
+   case OPAL_ASYNC_COMPLETION:
+   ret = opal_async_wait_response(token, &msg);
+   if (ret) {
+   pr_err("%s: Failed to wait for the async response, %d\n",
+  __func__, ret);
+   goto out_token;
+   }
 
-   ret = opal_async_wait_response(token, &msg);
-   if (ret) {
-   pr_err("%s: Failed to wait for the async response, %d\n",
-   __func__, ret);
-   goto out_token;
-   }
+   ret = be64_to_cpu(msg.params[1]);
+
+   *sensor_data = be32_to_cpu(data);
+   break;
 
-   *sensor_data = be32_to_cpu(data);
-   ret = be64_to_cpu(msg.params[1]);
+   case OPAL_SUCCESS:
+   *sensor_data = be32_to_cpu(data);
+   break;
+
+   default:
+   break;
+   }
 
 out_token:
mutex_unlock(&opal_sensor_mutex);
-- 
1.7.10.4

Re: net: ucc: tbi phy detection broken by 058112c7efc9ef43bb511c137293dddbe6e42908

2015-03-25 Thread Lennart Sorensen
On Sat, Dec 20, 2014 at 09:08:51AM -0800, Florian Fainelli wrote:
> 2014-12-18 19:49 GMT-08:00 Lennart Sorensen lsore...@csclub.uwaterloo.ca:
> > I have been trying to move an 8360 based system from a 3.0 kernel to a
> > 3.12 (on the way to 3.14 with ipipe/xenomai) kernel and encountered an
> > oops in the ucc_geth driver when using RTBI mode on one of the ucc
> > ports.  I haven't managed to find any commits to of_mdio or ucc_geth or
> > fsl_pq_mdio that would appear to address this problem, so I believe it
> > is still present in the latest kernel, but have not confirmed that with
> > testing yet.
> >
> > Commit 058112c7efc9ef43bb511c137293dddbe6e42908 appears to have broken
> > ucc support for tbi phy detection.
> >
> > With the patch in place, I am unable to get the mdio bus to create phy
> > devices for the tbi phy in the ucc on an 8360e, and the ucc_geth driver
> > causes a kernel oops, while with the patch reverted, it does create them
> > and the driver comes up and works.
> >
> > The tbi phy is needed when using a ucc in RTBI, TBI or SGMII mode.
> >
> > I am not convinced that the tbi phy really behaves quite like a real phy,
> > which may be why get_phy_device does not work with it.  Perhaps there
> > is a better way to deal with the tbi phy on the ucc for this purpose.
>
> There are some comments in ucc_geth that also lead me to believe this
> is a just a hack instead of a real Ethernet PHY device. Part of what I
> think got broken is because of this comment:
>
> /* Initialize TBI PHY interface for communicating with the
>  * SERDES lynx PHY on the chip.  We communicate with this PHY
>  * through the MDIO bus on each controller, treating it as a
>  * normal PHY at the address found in the UTBIPA register.  We assume
>  * that the UTBIPA register is valid.  Either the MDIO bus code will set
>  * it to a value that doesn't conflict with other PHYs on the bus, or the
>  * value doesn't matter, as there are no other PHYs on the bus.
>  */
>
> In particular this one:
>
> Either the MDIO bus code will set
>  * it to a value that doesn't conflict with other PHYs on the bus, or the
>  * value doesn't matter, as there are no other PHYs on the bus.
>
> and what Sebastian removed did exactly that, we used the special MDIO
> broadcast address 0 to provide this whatever. If this is such a
> requirement from the ucc_geth driver and TBI PHYs, maybe we should
> have this hack somewhere in the actual MDIO driver used by the
> ucc_geth driver instead, or set a flag/read the PHY connection mode
> and do this in drivers/of/of_mdio.c
>

I discovered a problem with the tbi address handling in ucc_geth.

get_ucc_tbipa() expects to be passed a pointer to a struct fsl_pq_mdio,
but on ucc the pointer actually points to the start of the mii area,
since ucc doesn't have all the registers that etsec2 has. As a result,
the address returned for tbipa is 1312 bytes too high, which of course
means the tbi address never gets set. In fact the driver prints out
cr=0 and sr=0, while the older working driver printed cr=140 and
sr=149.
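The 1312-byte error is simply the offset of the mii block inside struct fsl_pq_mdio:

- If the pointer passed to get_ucc_tbipa() already points at that block, treating it as the start of the structure shifts every register address up by the same amount.
- Subtracting offsetof() (the container_of() idea) recovers the real base.

The user-space sketch below uses a hypothetical mock layout. Only the 1312-byte mii offset is taken from the report above; the other fields are placeholders.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mock: only the mii offset (1312) comes from the report. */
struct mock_pq_mdio {
	uint8_t  res1[1312];
	uint32_t mii[8];      /* stands in for struct fsl_pq_mii */
	uint8_t  res2[28];
	uint32_t utbipar;
};

/* Recover the enclosing structure from a pointer to its mii member. */
static struct mock_pq_mdio *mdio_from_mii(void *mii)
{
	return (struct mock_pq_mdio *)((char *)mii -
				       offsetof(struct mock_pq_mdio, mii));
}

/* TBIPA lookup that accepts a pointer to the mii block, as on ucc. */
static uint32_t *mock_get_tbipa(void *mii)
{
	return &mdio_from_mii(mii)->utbipar;
}
```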

As a quick test I did:

}

tbipa = data->get_tbipa(priv->map - offsetof(struct fsl_pq_mdio, mii));

out_be32(tbipa, be32_to_cpup(prop));

and that made it work, but of course is ugly and would break etsec2.

Any suggestion for a clean way to make get_ucc_tbipa able to dereference
the structure correctly?

I suppose I could do:

/*
 * Return the TBIPAR address for a QE MDIO node
 */
static uint32_t __iomem *get_ucc_tbipa(void __iomem *p)
{
	struct fsl_pq_mdio __iomem *mdio = p - offsetof(struct fsl_pq_mdio, mii);

	return &mdio->utbipar;
}

but that seems like just putting more hacks in place.  The use of
mii_offset in the first place suggests that defining one structure
for etsec2, ucc and the like, even though it doesn't apply to both, is
probably an error.  This would just be applying mii_offset in reverse
for ucc compared with etsec2.

-- 
Len Sorensen
[PATCH v7 RFC 2/3] sparc: Make sparc64 use scalable lib/iommu-common.c functions

2015-03-25 Thread Sowmini Varadhan
In iperf experiments running Linux as the Tx side (TCP client) with
10 threads, there is a severe performance drop when TSO is disabled,
indicating a weakness in the software that can be avoided by using
the scalable IOMMU arena DMA allocation.

Baseline numbers before this patch:
   with default settings (TSO enabled):    9-9.5 Gbps
   TSO disabled via ethtool (drops badly): 2-3 Gbps

After this patch, an iperf client with 10 threads achieves a
throughput of at least 8.5 Gbps, even when TSO is disabled.
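Roughly, lib/iommu-common.c gets its scalability from two things:

- It splits the table into several pools, each with its own lock and allocation hint.
- It picks a pool from a per-CPU hash, so CPUs that map DMA buffers concurrently seldom contend on one lock.

The single-threaded sketch below shows only the partitioning and the per-pool search. It rests on these simplifying assumptions:

- The pool count and table size are arbitrary.
- cpu % SK_NPOOLS replaces the per-CPU hash.
- The map uses one byte per entry.
- Locking appears only in comments.

It is not the library's allocator.

```c
#include <assert.h>
#include <string.h>

#define SK_ENTRIES 64
#define SK_NPOOLS  4

struct sk_pool {
	unsigned long start, end, hint; /* [start, end) plus next-fit hint */
	/* a real pool would also carry its own spinlock */
};

struct sk_table {
	struct sk_pool pools[SK_NPOOLS];
	unsigned char used[SK_ENTRIES]; /* 1 = entry allocated */
};

static void sk_init(struct sk_table *t)
{
	unsigned long per = SK_ENTRIES / SK_NPOOLS;

	memset(t->used, 0, sizeof(t->used));
	for (int i = 0; i < SK_NPOOLS; i++) {
		t->pools[i].start = t->pools[i].hint = i * per;
		t->pools[i].end = (i + 1) * per;
	}
}

/* First fit for npages free entries in [from, p->end); -1 if none. */
static long sk_search(const struct sk_table *t, const struct sk_pool *p,
		      unsigned long from, unsigned long npages)
{
	for (unsigned long i = from; i + npages <= p->end; i++) {
		unsigned long n = 0;

		while (n < npages && !t->used[i + n])
			n++;
		if (n == npages)
			return (long)i;
	}
	return -1;
}

/* Allocate npages entries from the pool selected by the CPU number. */
static long sk_alloc(struct sk_table *t, unsigned int cpu, unsigned long npages)
{
	struct sk_pool *p = &t->pools[cpu % SK_NPOOLS];
	long idx = sk_search(t, p, p->hint, npages);

	if (idx < 0)
		idx = sk_search(t, p, p->start, npages); /* wrap around once */
	if (idx < 0)
		return -1;
	memset(&t->used[idx], 1, npages);
	p->hint = (unsigned long)idx + npages;
	return idx;
}
```

Because CPUs 0 and 1 land in different pools, their allocations never touch the same hint or, in a real implementation, the same lock.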

Signed-off-by: Sowmini Varadhan sowmini.varad...@oracle.com
---
v2: moved sparc-specific fields into iommu_sparc
v3: converted all sparc users of iommu, so lot of cleanup and streamlining
v4: David Miller review change:
- s/IOMMU_ERROR_CODE/DMA_ERROR_CODE
- reverts pci_impl.h (now that all iommu usage has been converted)
v5: benh/aik feedback modifies the function signatures: pass in
    modified args to iommu_tbl_pool_init() and iommu_tbl_range_free()
v6: removed iommu_tbl_ops. Pass flush_all as function pointer to
    iommu_tbl_pool_init
v7: move pool_hash initialization to iommu_tbl_pool_init()

 arch/sparc/include/asm/iommu_64.h |7 +-
 arch/sparc/kernel/iommu.c |  174 +---
 arch/sparc/kernel/iommu_common.h  |8 --
 arch/sparc/kernel/pci_sun4v.c |  179 
 4 files changed, 127 insertions(+), 241 deletions(-)

diff --git a/arch/sparc/include/asm/iommu_64.h b/arch/sparc/include/asm/iommu_64.h
index 2b9321a..e3cd449 100644
--- a/arch/sparc/include/asm/iommu_64.h
+++ b/arch/sparc/include/asm/iommu_64.h
@@ -16,6 +16,7 @@
 #define IOPTE_WRITE   0x0002UL
 
 #define IOMMU_NUM_CTXS 4096
+#include <linux/iommu-common.h>
 
 struct iommu_arena {
unsigned long   *map;
@@ -24,11 +25,10 @@ struct iommu_arena {
 };
 
 struct iommu {
+   struct iommu_table  tbl;
spinlock_t  lock;
-   struct iommu_arena  arena;
-   void(*flush_all)(struct iommu *);
+   u32 dma_addr_mask;
iopte_t *page_table;
-   u32 page_table_map_base;
unsigned long   iommu_control;
unsigned long   iommu_tsbbase;
unsigned long   iommu_flush;
@@ -40,7 +40,6 @@ struct iommu {
unsigned long   dummy_page_pa;
unsigned long   ctx_lowest_free;
DECLARE_BITMAP(ctx_bitmap, IOMMU_NUM_CTXS);
-   u32 dma_addr_mask;
 };
 
 struct strbuf {
diff --git a/arch/sparc/kernel/iommu.c b/arch/sparc/kernel/iommu.c
index bfa4d0c..f7fdff2 100644
--- a/arch/sparc/kernel/iommu.c
+++ b/arch/sparc/kernel/iommu.c
@@ -13,6 +13,7 @@
 #include <linux/errno.h>
 #include <linux/iommu-helper.h>
 #include <linux/bitmap.h>
+#include <linux/iommu-common.h>
 
 #ifdef CONFIG_PCI
 #include <linux/pci.h>
@@ -45,8 +46,9 @@
  "i" (ASI_PHYS_BYPASS_EC_E))
 
 /* Must be invoked under the IOMMU lock. */
-static void iommu_flushall(struct iommu *iommu)
+static void iommu_flushall(struct iommu_table *iommu_table)
 {
+   struct iommu *iommu = container_of(iommu_table, struct iommu, tbl);
 if (iommu->iommu_flushinv) {
 iommu_write(iommu->iommu_flushinv, ~(u64)0);
} else {
@@ -87,94 +89,6 @@ static inline void iopte_make_dummy(struct iommu *iommu, iopte_t *iopte)
iopte_val(*iopte) = val;
 }
 
-/* Based almost entirely upon the ppc64 iommu allocator.  If you use the 'handle'
- * facility it must all be done in one pass while under the iommu lock.
- *
- * On sun4u platforms, we only flush the IOMMU once every time we've passed
- * over the entire page table doing allocations.  Therefore we only ever advance
- * the hint and cannot backtrack it.
- */
-unsigned long iommu_range_alloc(struct device *dev,
-   struct iommu *iommu,
-   unsigned long npages,
-   unsigned long *handle)
-{
-   unsigned long n, end, start, limit, boundary_size;
-   struct iommu_arena *arena = &iommu->arena;
-   int pass = 0;
-
-   /* This allocator was derived from x86_64's bit string search */
-
-   /* Sanity check */
-   if (unlikely(npages == 0)) {
-   if (printk_ratelimit())
-   WARN_ON(1);
-   return DMA_ERROR_CODE;
-   }
-
-   if (handle && *handle)
-   start = *handle;
-   else
-   start = arena->hint;
-
-   limit = arena->limit;
-
-   /* The case below can happen if we have a small segment appended
-* to a large, or when the previous alloc was at the very end of
-* the available space. If so, go back to the beginning and flush.
-*/
-   if (start >= limit) {
-   start = 0;
-   if (iommu->flush_all)
-   iommu->flush_all(iommu);
-   }
-
- again:
-
-   if 

[PATCH v7 0/3] Generic IOMMU pooled allocator

2015-03-25 Thread Sowmini Varadhan

Changes from patchv6: moved pool_hash initialization to
lib/iommu-common.c and cleaned up code duplication from 
sun4v/sun4u/ldc. 
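Moving the pool hash into lib/iommu-common.c means one CPU-to-pool table shared by all callers, filled once from the init path. Below is a sketch of that idea under stated assumptions: a fixed CPU count (DEMO_NR_CPUS), a Fibonacci multiplicative hash in place of the kernel's hash helper, and init calls that are serialized so a plain static flag is enough. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_POOL_HASHBITS 4
#define DEMO_NR_POOLS      (1u << DEMO_POOL_HASHBITS)
#define DEMO_NR_CPUS       64   /* assumption: fixed number of possible CPUs */

/* One CPU -> pool-index table shared by every IOMMU table. */
static unsigned int demo_pool_hash[DEMO_NR_CPUS];
static int demo_hash_setups;    /* how many times the fill loop actually ran */

/* Multiplicative (Fibonacci) hash down to `bits` bits; a stand-in, not a
 * copy of the kernel's hash helpers. */
static unsigned int demo_hash32(uint32_t val, unsigned int bits)
{
	assert(bits > 0 && bits < 32);
	return (uint32_t)(val * UINT32_C(0x9e3779b9)) >> (32 - bits);
}

/* Assumes callers are serialized (e.g. during boot), so a plain flag is
 * enough to make the fill happen only once. */
static void demo_setup_pool_hash(void)
{
	static bool done;

	if (done)
		return;
	done = true;
	for (unsigned int cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
		demo_pool_hash[cpu] = demo_hash32(cpu, DEMO_POOL_HASHBITS);
	demo_hash_setups++;
}

/* Every table init calls the setup; only the first call does any work. */
static void demo_tbl_pool_init(void)
{
	demo_setup_pool_hash();
	/* per-table pool setup would follow here */
}
```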

Sowmini (2):
  Break up monolithic iommu table/lock into finer granularity pools and
lock
  Make sparc64 use scalable lib/iommu-common.c functions

Sowmini Varadhan (1):
  Make LDC use common iommu pool management functions

 arch/sparc/include/asm/iommu_64.h |7 +-
 arch/sparc/kernel/iommu.c |  174 +++
 arch/sparc/kernel/iommu_common.h  |8 --
 arch/sparc/kernel/ldc.c   |  152 ++--
 arch/sparc/kernel/pci_sun4v.c |  179 +
 include/linux/iommu-common.h  |   48 
 lib/Makefile  |2 +-
 lib/iommu-common.c|  235 +
 8 files changed, 475 insertions(+), 330 deletions(-)
 create mode 100644 include/linux/iommu-common.h
 create mode 100644 lib/iommu-common.c

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v7 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer granularity pools and lock

2015-03-25 Thread Sowmini Varadhan
Investigation of multithreaded iperf experiments on an ethernet
interface shows the iommu->lock as the hottest lock identified by
lockstat, with on the order of 21M contentions out of
27M acquisitions, and an average wait time of 26 us for the lock.
This is not efficient. A more scalable design is to follow the ppc
model, where the iommu_table has multiple pools, each stretching
over a segment of the map, and with a separate lock for each pool.
This model allows for better parallelization of the iommu map search.

This patch adds the iommu range alloc/free function infrastructure.
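The pool model described above can be sketched in userspace C. This is a simplified illustration, not the lib/iommu-common.c code. Each pool owns a [start, end) slice of the map plus its own lock (a C11 atomic_flag spinlock standing in for spinlock_t), and the pool is chosen with a plain cpu % DEMO_NR_POOLS. Fallback to other pools and the large pool are not modeled. All names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <string.h>

#define DEMO_MAP_ENTRIES 64
#define DEMO_NR_POOLS    4
#define DEMO_ERROR       (~0UL)   /* hypothetical "no space" cookie */

struct demo_pool {
	unsigned long start, end, hint;  /* [start, end) slice of the map */
	atomic_flag lock;                /* one lock per pool, not per table */
};

struct demo_table {
	unsigned char map[DEMO_MAP_ENTRIES];  /* 1 = entry in use */
	struct demo_pool pools[DEMO_NR_POOLS];
};

static void pool_lock(struct demo_pool *p)
{
	while (atomic_flag_test_and_set_explicit(&p->lock, memory_order_acquire))
		;  /* spin */
}

static void pool_unlock(struct demo_pool *p)
{
	atomic_flag_clear_explicit(&p->lock, memory_order_release);
}

static void demo_table_init(struct demo_table *t)
{
	unsigned long size = DEMO_MAP_ENTRIES / DEMO_NR_POOLS;

	memset(t->map, 0, sizeof(t->map));
	for (int i = 0; i < DEMO_NR_POOLS; i++) {
		t->pools[i].start = t->pools[i].hint = i * size;
		t->pools[i].end = (i + 1) * size;
		atomic_flag_clear(&t->pools[i].lock);
	}
}

/* First index of npages free entries inside [lo, hi), or DEMO_ERROR. */
static unsigned long find_run(const unsigned char *map, unsigned long lo,
			      unsigned long hi, unsigned long npages)
{
	for (unsigned long i = lo; i + npages <= hi; i++) {
		unsigned long n = 0;

		while (n < npages && !map[i + n])
			n++;
		if (n == npages)
			return i;
		i += n;  /* resume just past the used entry we hit */
	}
	return DEMO_ERROR;
}

/* Allocate from the pool picked by `cpu`, holding only that pool's lock. */
static unsigned long demo_range_alloc(struct demo_table *t, unsigned int cpu,
				      unsigned long npages)
{
	struct demo_pool *p = &t->pools[cpu % DEMO_NR_POOLS];
	unsigned long entry;

	assert(npages > 0);
	pool_lock(p);
	entry = find_run(t->map, p->hint, p->end, npages);
	if (entry == DEMO_ERROR)          /* wrap once to the pool start */
		entry = find_run(t->map, p->start, p->end, npages);
	if (entry != DEMO_ERROR) {
		memset(&t->map[entry], 1, npages);
		p->hint = entry + npages;
	}
	pool_unlock(p);
	return entry;
}
```

Two CPUs that map to different pools never take the same lock, which is the contention reduction the lockstat numbers point at.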

Signed-off-by: Sowmini Varadhan sowmini.varad...@oracle.com
---
v2 changes:
  - incorporate David Miller editorial comments: sparc specific
fields moved from iommu-common into sparc's iommu_64.h
  - make the npools value an input parameter, for the case when
the iommu map size is not very large
  - cookie_to_index mapping, and optimizations for span-boundary
check, for use case such as LDC.
v3: eliminate iommu_sparc, rearrange the ->demap indirection to
be invoked under the pool lock.

v4: David Miller review changes:
  - s/IOMMU_ERROR_CODE/DMA_ERROR_CODE
  - page_table_map_base and page_table_shift are unsigned long, not u32.

v5: Feedback from b...@kernel.crashing.org and a...@ozlabs.ru
  - removed ->cookie_to_index and ->demap indirection: caller should
invoke these as needed before calling into the generic allocator

v6: Benh/DaveM discussion eliminating iommu_tbl_ops, but retaining flush_all
optimization.

v7: one-time initialization of pool_hash from iommu_tbl_pool_init()

 include/linux/iommu-common.h |   48 +
 lib/Makefile |2 +-
 lib/iommu-common.c   |  235 ++
 3 files changed, 284 insertions(+), 1 deletions(-)
 create mode 100644 include/linux/iommu-common.h
 create mode 100644 lib/iommu-common.c

diff --git a/include/linux/iommu-common.h b/include/linux/iommu-common.h
new file mode 100644
index 000..197111b
--- /dev/null
+++ b/include/linux/iommu-common.h
@@ -0,0 +1,48 @@
+#ifndef _LINUX_IOMMU_COMMON_H
+#define _LINUX_IOMMU_COMMON_H
+
+#include <linux/spinlock_types.h>
+#include <linux/device.h>
+#include <asm/page.h>
+
+#define IOMMU_POOL_HASHBITS 4
+#define IOMMU_NR_POOLS  (1 << IOMMU_POOL_HASHBITS)
+
+struct iommu_pool {
+   unsigned long   start;
+   unsigned long   end;
+   unsigned long   hint;
+   spinlock_t  lock;
+};
+
+struct iommu_table {
+   unsigned long   page_table_map_base;
+   unsigned long   page_table_shift;
+   unsigned long   nr_pools;
+   void(*flush_all)(struct iommu_table *);
+   unsigned long   poolsize;
+   struct iommu_pool   arena_pool[IOMMU_NR_POOLS];
+   u32 flags;
+#define IOMMU_HAS_LARGE_POOL   0x0001
+#define IOMMU_NO_SPAN_BOUND    0x0002
+   struct iommu_pool   large_pool;
+   unsigned long   *map;
+};
+
+extern void iommu_tbl_pool_init(struct iommu_table *iommu,
+   unsigned long num_entries,
+   u32 page_table_shift,
+   void (*flush_all)(struct iommu_table *),
+   bool large_pool, u32 npools,
+   bool skip_span_boundary_check);
+
+extern unsigned long iommu_tbl_range_alloc(struct device *dev,
+  struct iommu_table *iommu,
+  unsigned long npages,
+  unsigned long *handle);
+
+extern void iommu_tbl_range_free(struct iommu_table *iommu,
+u64 dma_addr, unsigned long npages,
+unsigned long entry);
+
+#endif
diff --git a/lib/Makefile b/lib/Makefile
index 3c3b30b..0ea2ac6 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -102,7 +102,7 @@ obj-$(CONFIG_AUDIT_GENERIC) += audit.o
 obj-$(CONFIG_AUDIT_COMPAT_GENERIC) += compat_audit.o
 
 obj-$(CONFIG_SWIOTLB) += swiotlb.o
-obj-$(CONFIG_IOMMU_HELPER) += iommu-helper.o
+obj-$(CONFIG_IOMMU_HELPER) += iommu-helper.o iommu-common.o
 obj-$(CONFIG_FAULT_INJECTION) += fault-inject.o
 obj-$(CONFIG_NOTIFIER_ERROR_INJECTION) += notifier-error-inject.o
 obj-$(CONFIG_CPU_NOTIFIER_ERROR_INJECT) += cpu-notifier-error-inject.o
diff --git a/lib/iommu-common.c b/lib/iommu-common.c
new file mode 100644
index 000..bb7e706
--- /dev/null
+++ b/lib/iommu-common.c
@@ -0,0 +1,235 @@
+/*
+ * IOMMU mmap management and range allocation functions.
+ * Based almost entirely upon the powerpc iommu allocator.
+ */
+
+#include <linux/export.h>
+#include <linux/bitmap.h>
+#include <linux/bug.h>
+#include <linux/iommu-helper.h>
+#include <linux/iommu-common.h>
+#include <linux/dma-mapping.h>
+#include <linux/hash.h>
+
+#define IOMMU_LARGE_ALLOC  15
+
+static DEFINE_PER_CPU(unsigned int, 

Re: [PATCH v6 0/3] Generic IOMMU pooled allocator

2015-03-25 Thread Sowmini Varadhan
On (03/24/15 18:16), David Miller wrote:
> Generally this looks fine to me.
> 
> But about patch #2, I see no reason to have multiple iommu_pool_hash
> tables.  Even from a purely sparc perspective, we can always just do
> with just one of them.
> 
> Furthermore, you can even probably move it down into lib/iommu-common.c
> itself.  iommu_tbl_pool_init() can do the one time initialization.

fixed in v7.

Ben, Alexey, do you need more time to review this?

--Sowmini

Re: [RFC, powerpc] perf/hv-24x7 set the attr group to NULL if events failed to be initialized

2015-03-25 Thread Sukadev Bhattiprolu
Michael Ellerman [m...@ellerman.id.au] wrote:
| On Sun, 2015-15-02 at 09:42:57 UTC, Li Zhong wrote:
| > This patch moves the three events groups to the end of the attr groups,
| > and if create_events_from_catalog() fails to set their attributes, we
| > set them to NULL in attr_groups.
| 
| But why are we continuing at all if create_events_from_catalog() fails?
| 
| Shouldn't that just be a fatal error and we bail?

Well, even if create_events_from_catalog() fails, we can continue to use
the 24x7 events, rather clumsily, as long as the catalog is readable.  i.e.
parse /sys/bus/event_source/devices/hv_24x7/interface/catalog to find event
offset and run:

	perf stat -C 0 -e hv_24x7/domain=2,offset=8,core=0/ workload

Suka


Re: [PATCH v7 0/3] Generic IOMMU pooled allocator

2015-03-25 Thread David Miller
From: Sowmini Varadhan sowmini.varad...@oracle.com
Date: Wed, 25 Mar 2015 13:34:45 -0400

> Changes from patchv6: moved pool_hash initialization to
> lib/iommu-common.c and cleaned up code duplication from 
> sun4v/sun4u/ldc. 

Looks good to me.

PowerPC folks, what do you think?

Re: [PATCH v3 2/2] powerpc/mm: Tracking vDSO remap

2015-03-25 Thread Ingo Molnar

* Laurent Dufour lduf...@linux.vnet.ibm.com wrote:

> +static inline void arch_unmap(struct mm_struct *mm,
> +                              struct vm_area_struct *vma,
> +                              unsigned long start, unsigned long end)
> +{
> +        if (start <= mm->context.vdso_base && mm->context.vdso_base < end)
> +                mm->context.vdso_base = 0;
> +}

So AFAICS PowerPC can have multi-page vDSOs, right?

So what happens if I munmap() the middle or end of the vDSO? The above 
condition only seems to cover unmaps that affect the first page. I 
think 'affects any page' ought to be the right condition? (But I know 
nothing about PowerPC so I might be wrong.)
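The first-page versus any-page distinction can be checked with half-open ranges. The sketch below contrasts the quoted condition with an overlap test. demo_mm_context and its vdso_size field are hypothetical, since the quoted patch only tracks vdso_base, and a two-page vDSO is assumed in the example values.

```c
#include <assert.h>
#include <stdbool.h>

/* The condition in the quoted patch: true only when [start, end)
 * contains vdso_base itself, i.e. the first vDSO page. */
static bool demo_hits_first_page(unsigned long start, unsigned long end,
				 unsigned long base)
{
	return start <= base && base < end;
}

/* Any-page condition: half-open [start, end) overlaps [base, base + size). */
static bool demo_hits_any_page(unsigned long start, unsigned long end,
			       unsigned long base, unsigned long size)
{
	assert(start <= end);
	return start < base + size && base < end;
}

/* Hypothetical per-mm state; vdso_base == 0 means "no vDSO". */
struct demo_mm_context {
	unsigned long vdso_base;
	unsigned long vdso_size;   /* assumption: vDSO size is known */
};

static void demo_arch_unmap(struct demo_mm_context *ctx,
			    unsigned long start, unsigned long end)
{
	if (ctx->vdso_base &&
	    demo_hits_any_page(start, end, ctx->vdso_base, ctx->vdso_size))
		ctx->vdso_base = 0;
}
```

With a vDSO at 0x10000 spanning two 4 KiB pages, unmapping only [0x11000, 0x12000) is missed by the first-page test but caught by the overlap test.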


> +#define __HAVE_ARCH_REMAP
> +static inline void arch_remap(struct mm_struct *mm,
> +                              unsigned long old_start, unsigned long old_end,
> +                              unsigned long new_start, unsigned long new_end)
> +{
> +        /*
> +         * mremap() doesn't allow moving multiple vmas so we can limit the
> +         * check to old_start == vdso_base.
> +         */
> +        if (old_start == mm->context.vdso_base)
> +                mm->context.vdso_base = new_start;
> +}

mremap() doesn't allow moving multiple vmas, but it allows the 
movement of multi-page vmas and it also allows partial mremap()s, 
where it will split up a vma.

In particular, what happens if an mremap() is done with 
old_start == vdso_base, but a shorter end than the end of the vDSO? 
(i.e. a partial mremap() with fewer pages than the vDSO size)

Thanks,

Ingo

Re: [PATCH v3 2/2] powerpc/mm: Tracking vDSO remap

2015-03-25 Thread Ingo Molnar

* Ingo Molnar mi...@kernel.org wrote:

> > +#define __HAVE_ARCH_REMAP
> > +static inline void arch_remap(struct mm_struct *mm,
> > +                              unsigned long old_start, unsigned long old_end,
> > +                              unsigned long new_start, unsigned long new_end)
> > +{
> > +        /*
> > +         * mremap() doesn't allow moving multiple vmas so we can limit the
> > +         * check to old_start == vdso_base.
> > +         */
> > +        if (old_start == mm->context.vdso_base)
> > +                mm->context.vdso_base = new_start;
> > +}
> 
> mremap() doesn't allow moving multiple vmas, but it allows the 
> movement of multi-page vmas and it also allows partial mremap()s, 
> where it will split up a vma.

I.e. mremap() supports the shrinking (and growing) of vmas. In that 
case mremap() will unmap the end of the vma and will shrink the 
remaining vDSO vma.

Doesn't that result in a non-working vDSO that should zero out 
vdso_base?
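Below is one possible policy for the partial-mremap case, sketched under an assumption: a hypothetical vdso_size tracked next to vdso_base. The vDSO is followed only when it moves whole and unchanged in size, and forgotten otherwise. This illustrates the question above and is not what the patch does. Leaving such cases to userspace is an equally valid choice.

```c
#include <assert.h>

/* Hypothetical per-mm state: vdso_base == 0 means "no usable vDSO". */
struct demo_vdso_ctx {
	unsigned long vdso_base;
	unsigned long vdso_size;   /* assumption: size is tracked alongside base */
};

/* Follow the vDSO only when it is moved whole and unchanged in size;
 * any partial move, shrink or grow forgets it. */
static void demo_arch_remap(struct demo_vdso_ctx *ctx,
			    unsigned long old_start, unsigned long old_end,
			    unsigned long new_start, unsigned long new_end)
{
	assert(old_start <= old_end && new_start <= new_end);
	if (ctx->vdso_base == 0 || old_start != ctx->vdso_base)
		return;
	if (old_end - old_start == ctx->vdso_size &&
	    new_end - new_start == ctx->vdso_size)
		ctx->vdso_base = new_start;   /* whole vDSO moved */
	else
		ctx->vdso_base = 0;           /* shrunk, grown, or split */
}
```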

Thanks,

Ingo

Re: [PATCH v3 2/2] powerpc/mm: Tracking vDSO remap

2015-03-25 Thread Benjamin Herrenschmidt
On Wed, 2015-03-25 at 19:33 +0100, Ingo Molnar wrote:
> * Laurent Dufour lduf...@linux.vnet.ibm.com wrote:
> 
> > +static inline void arch_unmap(struct mm_struct *mm,
> > +                              struct vm_area_struct *vma,
> > +                              unsigned long start, unsigned long end)
> > +{
> > +        if (start <= mm->context.vdso_base && mm->context.vdso_base < end)
> > +                mm->context.vdso_base = 0;
> > +}
> 
> So AFAICS PowerPC can have multi-page vDSOs, right?
> 
> So what happens if I munmap() the middle or end of the vDSO? The above 
> condition only seems to cover unmaps that affect the first page. I 
> think 'affects any page' ought to be the right condition? (But I know 
> nothing about PowerPC so I might be wrong.)

You are right, we have at least two pages.
 
> > +#define __HAVE_ARCH_REMAP
> > +static inline void arch_remap(struct mm_struct *mm,
> > +                              unsigned long old_start, unsigned long old_end,
> > +                              unsigned long new_start, unsigned long new_end)
> > +{
> > +        /*
> > +         * mremap() doesn't allow moving multiple vmas so we can limit the
> > +         * check to old_start == vdso_base.
> > +         */
> > +        if (old_start == mm->context.vdso_base)
> > +                mm->context.vdso_base = new_start;
> > +}
> 
> mremap() doesn't allow moving multiple vmas, but it allows the 
> movement of multi-page vmas and it also allows partial mremap()s, 
> where it will split up a vma.
> 
> In particular, what happens if an mremap() is done with 
> old_start == vdso_base, but a shorter end than the end of the vDSO? 
> (i.e. a partial mremap() with fewer pages than the vDSO size)

Is there a way to forbid splitting? Does x86 deal with that case at all,
or does it not have to for some other reason?

Cheers,
Ben.

 Thanks,
 
   Ingo
 --
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH v3 2/2] powerpc/mm: Tracking vDSO remap

2015-03-25 Thread Benjamin Herrenschmidt
On Wed, 2015-03-25 at 19:36 +0100, Ingo Molnar wrote:
> * Ingo Molnar mi...@kernel.org wrote:
> 
> > > +#define __HAVE_ARCH_REMAP
> > > +static inline void arch_remap(struct mm_struct *mm,
> > > +                              unsigned long old_start, unsigned long old_end,
> > > +                              unsigned long new_start, unsigned long new_end)
> > > +{
> > > +        /*
> > > +         * mremap() doesn't allow moving multiple vmas so we can limit the
> > > +         * check to old_start == vdso_base.
> > > +         */
> > > +        if (old_start == mm->context.vdso_base)
> > > +                mm->context.vdso_base = new_start;
> > > +}
> > 
> > mremap() doesn't allow moving multiple vmas, but it allows the 
> > movement of multi-page vmas and it also allows partial mremap()s, 
> > where it will split up a vma.
> 
> I.e. mremap() supports the shrinking (and growing) of vmas. In that 
> case mremap() will unmap the end of the vma and will shrink the 
> remaining vDSO vma.
> 
> Doesn't that result in a non-working vDSO that should zero out 
> vdso_base?

Right. Now, we can't completely prevent the user from shooting themselves
in the foot, I suppose, though there is a legitimate usage scenario, which
is to move the vDSO around, that it would be nice to support. I think it's
reasonable to put the onus on the user here to do the right thing.

Cheers,
Ben.

 Thanks,
 
   Ingo



Re: [PATCH v7 0/3] Generic IOMMU pooled allocator

2015-03-25 Thread Benjamin Herrenschmidt
On Wed, 2015-03-25 at 14:12 -0400, David Miller wrote:
> From: Sowmini Varadhan sowmini.varad...@oracle.com
> Date: Wed, 25 Mar 2015 13:34:45 -0400
> 
> > Changes from patchv6: moved pool_hash initialization to
> > lib/iommu-common.c and cleaned up code duplication from 
> > sun4v/sun4u/ldc. 
> 
> Looks good to me.
> 
> PowerPC folks, what do you think?

I'll give it another look today.

Cheers,
Ben.



Re: [PATCH v8 19/30] powerpc/pci: Use pci_scan_host_bridge() for simplicity

2015-03-25 Thread Daniel Axtens
Hi Yijing,

I wasn't quite sure I understood your comments, so I was trying to apply
your patch series and test it, but patch 3 doesn't apply cleanly to
4.0-rc5 or master. Can you respin the series?

Thanks,
Daniel


> Hi Daniel, thanks for your review and comments. We want to make a generic
> pci_host_bridge, which would hold the common host information. For example,
> the pci domain is common info for a pci host bridge; this series saves the
> domain in pci_host_bridge, so we no longer need to extract the domain from
> pci_bus->sysdata via the platform-specific pci_domain_nr().
> Also we store the sysdata in pci_host_bridge, and pci_bus_to_host() is the
> platform interface; I think using the common interface would be better.
 
> > > +
> > > +        /* Get probe mode and perform scan */
> > > +        if (hose->dn && ppc_md.pci_probe_mode)
> > > +                mode = ppc_md.pci_probe_mode(bus);
> > > +
> > > +        pr_debug("probe mode: %d\n", mode);
> > > +        if (mode == PCI_PROBE_DEVTREE)
> > > +                of_scan_bus(hose->dn, bus);
> > > +
> > > +        if (mode == PCI_PROBE_NORMAL) {
> > > +                pci_bus_update_busn_res_end(bus, 255);
> > > +                hose->last_busno = pci_scan_child_bus(bus);
> > > +                pci_bus_update_busn_res_end(bus, hose->last_busno);
> > > +        }
> > > +
> > > +        return pci_bus_child_max_busnr(bus);
> > > +}
> > > +
> > I'm having trouble convincing myself that this patch covers every
> > variation within our PCI implementations. In particular, there's a
> > stanza in of_scan_pci_bridge in kernel/pci_of_scan.c that's almost
> > identical to this function. Does that implementation need to be cleaned
> > up and replaced with this function too?
  
 
> This is a pci_host_bridge_ops hook function, which would be called in the
> PCI core. After applying this series, we only need to call
> pci_scan_host_bridge() to scan pci devices, and this function is also
> extracted from pcibios_scan_phb(); it's not redundant code.
 
  
> > > @@ -1641,9 +1655,9 @@ void pcibios_scan_phb(struct pci_controller *hose)
> > >                  ppc_md.pcibios_fixup_phb(hose);
> > >  
> > >          /* Configure PCI Express settings */
> > > -        if (bus && !pci_has_flag(PCI_PROBE_ONLY)) {
> > > +        if (host->bus && !pci_has_flag(PCI_PROBE_ONLY)) {
> > >                  struct pci_bus *child;
> > > -                list_for_each_entry(child, &bus->children, node)
> > > +                list_for_each_entry(child, &host->bus->children, node)
> > >                          pcie_bus_configure_settings(child);
> > >          }
> > >  }
> > Two things: Firstly, the function uses hose throughout, not host.
> > Secondly, you're not deleting the bus variable: what's the purpose of
> > this change?
 
> host is the common pci_host_bridge, which is created by the PCI core for
> the pci host bridge driver; hose is the platform data used in powerpc. The
> purpose of the patch/series is to simplify the pci enumeration interface
> and to reduce the weak functions which were used to set up pci
> bus/devices during PCI enumeration.
 
  
  Regards,
  Daniel
  
 
 




[PATCH v4 2/4] powerpc/eeh: Introduce eeh_pe_inject_err()

2015-03-25 Thread Gavin Shan
The patch defines PCI error types and functions in uapi/asm/eeh.h
and exports the function eeh_pe_inject_err(), which will be called by
the VFIO driver to inject the specified PCI error into the indicated
PE for testing purposes.
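eeh_pe_inject_err() runs its argument checks in a fixed order before the platform hook is called. The userspace sketch below follows the same order and reuses the type and function ranges from the uapi hunk. struct demo_pe and the demo_inject_fn backend are hypothetical stand-ins for struct eeh_pe and eeh_ops->err_inject:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Values taken from the uapi/asm/eeh.h hunk in this patch. */
#define EEH_ERR_TYPE_32   0
#define EEH_ERR_TYPE_64   1
#define EEH_ERR_FUNC_MIN  0
#define EEH_ERR_FUNC_MAX  19

struct demo_pe { int id; };   /* hypothetical stand-in for struct eeh_pe */

/* Hypothetical platform hook playing the role of eeh_ops->err_inject. */
typedef int (*demo_inject_fn)(struct demo_pe *pe, int type, int func,
			      unsigned long addr, unsigned long mask);

static int demo_inject_err(demo_inject_fn backend, struct demo_pe *pe,
			   int type, int func,
			   unsigned long addr, unsigned long mask)
{
	if (!pe)
		return -ENODEV;           /* invalid PE */
	if (!backend)
		return -ENOENT;           /* platform cannot inject */
	if (type != EEH_ERR_TYPE_32 && type != EEH_ERR_TYPE_64)
		return -EINVAL;           /* unknown error type */
	if (func < EEH_ERR_FUNC_MIN || func > EEH_ERR_FUNC_MAX)
		return -EINVAL;           /* unknown error function */
	return backend(pe, type, func, addr, mask);
}

/* Fake backend that accepts every request. */
static int demo_backend_ok(struct demo_pe *pe, int type, int func,
			   unsigned long addr, unsigned long mask)
{
	(void)pe; (void)type; (void)func; (void)addr; (void)mask;
	return 0;
}
```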

Signed-off-by: Gavin Shan gws...@linux.vnet.ibm.com
Reviewed-by: David Gibson da...@gibson.dropbear.id.au
---
 arch/powerpc/include/asm/eeh.h  |  2 ++
 arch/powerpc/include/uapi/asm/eeh.h | 26 ++
 arch/powerpc/kernel/eeh.c   | 35 +++
 3 files changed, 63 insertions(+)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 08c4042..cd6003b 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -291,6 +291,8 @@ int eeh_pe_set_option(struct eeh_pe *pe, int option);
 int eeh_pe_get_state(struct eeh_pe *pe);
 int eeh_pe_reset(struct eeh_pe *pe, int option);
 int eeh_pe_configure(struct eeh_pe *pe);
+int eeh_pe_inject_err(struct eeh_pe *pe, int type, int func,
+ unsigned long addr, unsigned long mask);
 
 /**
  * EEH_POSSIBLE_ERROR() -- test for possible MMIO failure.
diff --git a/arch/powerpc/include/uapi/asm/eeh.h b/arch/powerpc/include/uapi/asm/eeh.h
index 8bb34b0..291b7d1 100644
--- a/arch/powerpc/include/uapi/asm/eeh.h
+++ b/arch/powerpc/include/uapi/asm/eeh.h
@@ -27,4 +27,30 @@
 #define EEH_PE_STATE_STOPPED_DMA   4   /* Stopped DMA only */
 #define EEH_PE_STATE_UNAVAIL   5   /* Unavailable  */
 
+/* EEH error types and functions */
+#define EEH_ERR_TYPE_32    0   /* 32-bits error */
+#define EEH_ERR_TYPE_64    1   /* 64-bits error */
+#define EEH_ERR_FUNC_MIN   0
+#define EEH_ERR_FUNC_LD_MEM_ADDR   0   /* Memory load  */
+#define EEH_ERR_FUNC_LD_MEM_DATA   1
+#define EEH_ERR_FUNC_LD_IO_ADDR    2   /* IO load  */
+#define EEH_ERR_FUNC_LD_IO_DATA    3
+#define EEH_ERR_FUNC_LD_CFG_ADDR   4   /* Config load  */
+#define EEH_ERR_FUNC_LD_CFG_DATA   5
+#define EEH_ERR_FUNC_ST_MEM_ADDR   6   /* Memory store */
+#define EEH_ERR_FUNC_ST_MEM_DATA   7
+#define EEH_ERR_FUNC_ST_IO_ADDR    8   /* IO store */
+#define EEH_ERR_FUNC_ST_IO_DATA    9
+#define EEH_ERR_FUNC_ST_CFG_ADDR   10  /* Config store */
+#define EEH_ERR_FUNC_ST_CFG_DATA   11
+#define EEH_ERR_FUNC_DMA_RD_ADDR   12  /* DMA read */
+#define EEH_ERR_FUNC_DMA_RD_DATA   13
+#define EEH_ERR_FUNC_DMA_RD_MASTER 14
+#define EEH_ERR_FUNC_DMA_RD_TARGET 15
+#define EEH_ERR_FUNC_DMA_WR_ADDR   16  /* DMA write*/
+#define EEH_ERR_FUNC_DMA_WR_DATA   17
+#define EEH_ERR_FUNC_DMA_WR_MASTER 18
+#define EEH_ERR_FUNC_DMA_WR_TARGET 19
+#define EEH_ERR_FUNC_MAX   19
+
 #endif /* _ASM_POWERPC_EEH_H */
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 76253eb..daa68a1 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -1636,6 +1636,41 @@ int eeh_pe_configure(struct eeh_pe *pe)
 }
 EXPORT_SYMBOL_GPL(eeh_pe_configure);
 
+/**
+ * eeh_pe_inject_err - Inject the specified PCI error into the indicated PE
+ * @pe: the indicated PE
+ * @type: error type
+ * @func: error function
+ * @addr: address
+ * @mask: address mask
+ *
+ * The routine is called to inject the specified PCI error, which
+ * is determined by @type and @func, into the indicated PE for
+ * testing purposes.
+ */
+int eeh_pe_inject_err(struct eeh_pe *pe, int type, int func,
+ unsigned long addr, unsigned long mask)
+{
+   /* Invalid PE ? */
+   if (!pe)
+   return -ENODEV;
+
+   /* Unsupported operation ? */
+   if (!eeh_ops || !eeh_ops->err_inject)
+   return -ENOENT;
+
+   /* Check on PCI error type */
+   if (type != EEH_ERR_TYPE_32 && type != EEH_ERR_TYPE_64)
+   return -EINVAL;
+
+   /* Check on PCI error function */
+   if (func < EEH_ERR_FUNC_MIN || func > EEH_ERR_FUNC_MAX)
+   return -EINVAL;
+
+   return eeh_ops->err_inject(pe, type, func, addr, mask);
+}
+EXPORT_SYMBOL_GPL(eeh_pe_inject_err);
+
 static int proc_eeh_show(struct seq_file *m, void *v)
 {
if (!eeh_enabled()) {
-- 
1.8.3.2
