[PATCH 22/27] powerpc: Remove shim for pci_controller_ops.reset_secondary_bus
Signed-off-by: Daniel Axtens <d...@axtens.net>
---
 arch/powerpc/include/asm/machdep.h    |  3 ---
 arch/powerpc/include/asm/pci-bridge.h | 16 ----------------
 arch/powerpc/kernel/pci-common.c      |  9 ++++++++-
 3 files changed, 8 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index f1476b8..f178cf1 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -244,9 +244,6 @@ struct machdep_calls {
 	/* Called after scan and before resource survey */
 	void (*pcibios_fixup_phb)(struct pci_controller *hose);
 
-	/* Reset the secondary bus of bridge */
-	void (*pcibios_reset_secondary_bus)(struct pci_dev *dev);
-
 	/* Called to shutdown machine specific hardware not already controlled
 	 * by other drivers.
 	 */
diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index b62e043..b08db93 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -327,21 +327,5 @@ static inline bool enable_device_hook(struct pci_dev *dev)
 	return true;
 }
 
-static inline void reset_secondary_bus(struct pci_dev *dev)
-{
-	struct pci_controller *hose = pci_bus_to_host(dev->bus);
-
-	if (hose->controller_ops.reset_secondary_bus)
-		hose->controller_ops.reset_secondary_bus(dev);
-	else if (ppc_md.pcibios_reset_secondary_bus)
-		ppc_md.pcibios_reset_secondary_bus(dev);
-	else
-		/*
-		 * Fallback to the generic function if no
-		 * platform-specific one is provided
-		 */
-		pci_reset_secondary_bus(dev);
-}
-
 #endif	/* __KERNEL__ */
 #endif /* _ASM_POWERPC_PCI_BRIDGE_H */
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 9edb479..a535d31 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -124,7 +124,14 @@ resource_size_t pcibios_window_alignment(struct pci_bus *bus,
 
 void pcibios_reset_secondary_bus(struct pci_dev *dev)
 {
-	reset_secondary_bus(dev);
+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
+
+	if (hose->controller_ops.reset_secondary_bus) {
+		hose->controller_ops.reset_secondary_bus(dev);
+		return;
+	}
+
+	pci_reset_secondary_bus(dev);
 }
 
 static resource_size_t pcibios_io_size(const struct pci_controller *hose)
-- 
2.1.4
_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
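The dispatch that this patch leaves in pcibios_reset_secondary_bus() can be modelled outside the kernel. The following is only a sketch: struct model_dev, struct model_controller_ops and the model_* functions are hypothetical stand-ins for struct pci_dev, struct pci_controller_ops and the real reset routines, and the counters exist only so the chosen path is observable.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct pci_dev; counters record which path ran. */
struct model_dev {
	int resets_by_controller;
	int resets_by_generic;
};

/* Hypothetical stand-in for struct pci_controller_ops. */
struct model_controller_ops {
	void (*reset_secondary_bus)(struct model_dev *dev);
};

/* Plays the role of the generic pci_reset_secondary_bus(). */
void model_generic_reset(struct model_dev *dev)
{
	dev->resets_by_generic++;
}

/* Plays the role of a platform-specific reset. */
void model_platform_reset(struct model_dev *dev)
{
	dev->resets_by_controller++;
}

/* Same shape as the patched function: per-controller op first, else generic. */
void model_reset_secondary_bus(const struct model_controller_ops *ops,
			       struct model_dev *dev)
{
	if (ops->reset_secondary_bus) {
		ops->reset_secondary_bus(dev);
		return;
	}

	model_generic_reset(dev);
}
```

With the ppc_md fallback gone there are only two outcomes, which is what makes the early return read cleanly.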
[PATCH V15 00/21] Enable SRIOV on POWER8
This patchset enables SRIOV on POWER8.

The general idea is to put each VF into its own PE and allocate the required
resources such as MMIO/DMA/MSI. The major difficulty comes from the MMIO
allocation and adjustment for the PF's IOV BAR.

On P8, we use M64BT to cover a PF's IOV BAR, which lets an individual VF sit
in its own PE. This gives more flexibility, while at the same time it brings
some restrictions on the PF's IOV BAR size and alignment.

To achieve this effect, we need to adjust the PCI device's resources:
1. Expand the IOV BAR properly.
   Done by pnv_pci_ioda_fixup_iov_resources().
2. Shift the IOV BAR properly.
   Done by pnv_pci_vf_resource_shift().
3. IOV BAR alignment is calculated by an arch-dependent function instead of
   an individual VF BAR size.
   Done by pnv_pcibios_sriov_resource_alignment().
4. Take the IOV BAR alignment into consideration in the sizing and assigning.
   This is achieved by commit:
   PCI: Take additional IOV BAR alignment in sizing and assigning

Test Environment:
   The SRIOV devices tested are Emulex Lancer (10df:e220) and
   Mellanox ConnectX-3 (15b3:1003) on POWER8.

Example of passing a VF through to a guest with vfio:
   1. Unbind the original driver and bind to the vfio-pci driver
      echo :06:0d.0 > /sys/bus/pci/devices/:06:0d.0/driver/unbind
      echo 1102 0002 > /sys/bus/pci/drivers/vfio-pci/new_id
      Note: this should be done for each device in the same iommu_group
   2. Start qemu and pass the device through vfio
      /home/ywywyang/git/qemu-impreza/ppc64-softmmu/qemu-system-ppc64 \
           -M pseries -m 2048 -enable-kvm -nographic \
           -drive file=/home/ywywyang/kvm/fc19.img \
           -monitor telnet:localhost:5435,server,nowait -boot cd \
           -device spapr-pci-vfio-host-bridge,id=CXGB3,iommu=26,index=6

Verify that the response really comes from the VF:
   1. ping from a machine in the same subnet (the broadcast domain)
   2. run arp -n on this machine
      9.115.251.20 ether 00:00:c9:df:ed:bf C eth0
   3. ifconfig in the guest
      # ifconfig eth1
      eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
           inet 9.115.251.20  netmask 255.255.255.0  broadcast 9.115.251.255
           inet6 fe80::200:c9ff:fedf:edbf  prefixlen 64  scopeid 0x20<link>
           ether 00:00:c9:df:ed:bf  txqueuelen 1000  (Ethernet)
           RX packets 175  bytes 13278 (12.9 KiB)
           RX errors 0  dropped 0  overruns 0  frame 0
           TX packets 58  bytes 9276 (9.0 KiB)
           TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
   4. They have the same MAC address

   Note: make sure you shut down the other network interfaces in the guest.

---
v15:
   * Add Ack from Bjorn
   * Make more detailed comment for pnv_pci_vf_resource_shift()
v14:
   * call ppc_md.pcibios_fixup_sriov() in pcibios_add_device
   * add more explanation in change log
   * Following patches have been reordered to the beginning.
     EEH refactor to use pci_dn:
        8ec20d6 powerpc/powernv: Use pci_dn, not device_node, in PCI config accessor
        a3460fc powerpc/pci: Refactor pci_dn
     These two patches will be modified to merge with other patches which
     are under discussion/review on the ppc mailing list. Some changes may
     also be made in other patches, which I didn't include in this series,
     so that the auto build robot can work on it. There may be several
     changes in the powerpc arch which do not affect the PCI core. So after
     this patch set passes review in the PCI community, I will rebase this
     series on the ppc branch and send it out for comment.
   * use add_res->min_align as the alignment in reassign_resources_sorted()
   * some cleanup in Document
v13:
   * fix error in pcibios_iov_resource_alignment(), use pdev instead of dev
   * rename vf_num to num_vfs in pcibios_sriov_enable(),
     pnv_pci_vf_resource_shift(), pnv_pci_sriov_disable(),
     pnv_pci_sriov_enable(), pnv_pci_ioda2_setup_dma_pe()
   * add more explanation in commit "powerpc/pci: Don't unset PCI resources
     for VFs"
   * fix IOV BAR in hotplug path as well, and don't fixup an already added
     device
   * use roundup_pow_of_two() instead of __roundup_pow_of_two()
   * this is based on v4.0-rc1
v12:
   * remove "align" parameter from pcibios_iov_resource_alignment()
     default version returns pci_iov_resource_size() instead of the
     "align" parameter
   * in powerpc pcibios_iov_resource_alignment(), return
     pci_iov_resource_size() if there's no ppc_md function pointer
   * in pci_sriov_resource_alignment(), don't re-read base, since we saved
     the required alignment when reading it the first time
   * remove "vf_num" parameter from add_dev_pci_info() and
     remove_dev_pci_info(); use pci_sriov_get_totalvfs() instead
   * use dev_warn() instead of pr_warn() when possible
   * check to be sure IOV BAR
[PATCH V15 01/21] powerpc/pci: Refactor pci_dn
From: Gavin Shan gws...@linux.vnet.ibm.com pci_dn is the extension of PCI device node and is created from device node. Unfortunately, VFs are enabled dynamically by PF's driver and they don't have corresponding device nodes, and pci_dn. Refactor pci_dn to support VFs: * pci_dn is organized as a hierarchy tree. VF's pci_dn is put to the child list of pci_dn of PF's bridge. pci_dn of other device put to the child list of pci_dn of its upstream bridge. * VF's pci_dn is expected to be created dynamically when PF enabling VFs. VF's pci_dn will be destroyed when PF disabling VFs. pci_dn of other device is still created from device node as before. * For one particular PCI device (VF or not), its pci_dn can be found from pdev-dev.archdata.firmware_data, PCI_DN(devnode), or parent's list. The fast path (fetching pci_dn through PCI device instance) is populated during early fixup time. [bhelgaas: add ifdef around add_one_dev_pci_info(), use dev_printk()] Signed-off-by: Gavin Shan gws...@linux.vnet.ibm.com --- arch/powerpc/include/asm/device.h |3 + arch/powerpc/include/asm/pci-bridge.h | 14 +- arch/powerpc/kernel/pci_dn.c | 245 - arch/powerpc/platforms/powernv/pci-ioda.c | 16 ++ 4 files changed, 272 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/include/asm/device.h b/arch/powerpc/include/asm/device.h index 38faede..29992cd 100644 --- a/arch/powerpc/include/asm/device.h +++ b/arch/powerpc/include/asm/device.h @@ -34,6 +34,9 @@ struct dev_archdata { #ifdef CONFIG_SWIOTLB dma_addr_t max_direct_dma_addr; #endif +#ifdef CONFIG_PPC64 + void*firmware_data; +#endif #ifdef CONFIG_EEH struct eeh_dev *edev; #endif diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h index 546d036..513f8f2 100644 --- a/arch/powerpc/include/asm/pci-bridge.h +++ b/arch/powerpc/include/asm/pci-bridge.h @@ -89,6 +89,7 @@ struct pci_controller { #ifdef CONFIG_PPC64 unsigned long buid; + void *firmware_data; #endif /* CONFIG_PPC64 */ void *private_data; @@ -154,9 
+155,13 @@ static inline int isa_vaddr_is_ioport(void __iomem *address) struct iommu_table; struct pci_dn { + int flags; +#define PCI_DN_FLAG_IOV_VF 0x01 + int busno; /* pci bus number */ int devfn; /* pci device and function number */ + struct pci_dn *parent; struct pci_controller *phb;/* for pci devices */ struct iommu_table *iommu_table; /* for phb's or bridges */ struct device_node *node; /* back-pointer to the device_node */ @@ -171,14 +176,19 @@ struct pci_dn { #ifdef CONFIG_PPC_POWERNV int pe_number; #endif + struct list_head child_list; + struct list_head list; }; /* Get the pointer to a device_node's pci_dn */ #define PCI_DN(dn) ((struct pci_dn *) (dn)-data) +extern struct pci_dn *pci_get_pdn_by_devfn(struct pci_bus *bus, + int devfn); extern struct pci_dn *pci_get_pdn(struct pci_dev *pdev); - -extern void * update_dn_pci_info(struct device_node *dn, void *data); +extern struct pci_dn *add_dev_pci_info(struct pci_dev *pdev); +extern void remove_dev_pci_info(struct pci_dev *pdev); +extern void *update_dn_pci_info(struct device_node *dn, void *data); static inline int pci_device_from_OF_node(struct device_node *np, u8 *bus, u8 *devfn) diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c index 83df307..f3a1a81 100644 --- a/arch/powerpc/kernel/pci_dn.c +++ b/arch/powerpc/kernel/pci_dn.c @@ -32,12 +32,223 @@ #include asm/ppc-pci.h #include asm/firmware.h +/* + * The function is used to find the firmware data of one + * specific PCI device, which is attached to the indicated + * PCI bus. For VFs, their firmware data is linked to that + * one of PF's bridge. For other devices, their firmware + * data is linked to that of their bridge. + */ +static struct pci_dn *pci_bus_to_pdn(struct pci_bus *bus) +{ + struct pci_bus *pbus; + struct device_node *dn; + struct pci_dn *pdn; + + /* +* We probably have virtual bus which doesn't +* have associated bridge. 
+*/ + pbus = bus; + while (pbus) { + if (pci_is_root_bus(pbus) || pbus-self) + break; + + pbus = pbus-parent; + } + + /* +* Except virtual bus, all PCI buses should +* have device nodes. +*/ + dn = pci_bus_to_OF_node(pbus); + pdn = dn ? PCI_DN(dn) : NULL; + + return pdn; +} + +struct pci_dn *pci_get_pdn_by_devfn(struct pci_bus *bus, + int devfn) +{ + struct device_node *dn = NULL; + struct pci_dn *parent,
Re: [PATCH] powerpc/powernv: Remove powernv RTAS support
Michael Ellerman <m...@ellerman.id.au> writes:
> The powernv code has some conditional support for running on bare metal
> machines that have no OPAL firmware, but provide RTAS.
>
> No released machines ever supported that, and even in the lab it was just a
> transitional hack in the days when OPAL was still being developed.
>
> So remove the code.
>
> Signed-off-by: Michael Ellerman <m...@ellerman.id.au>

The only current place I could think this could be remotely possible
would be in simulator... and we should instead make the OPAL calls work
properly in the simulator for all the RTAS functionality (that we care
about).

In related news.. I should poke the simulator guys.

Acked-by: Stewart Smith <stew...@linux.vnet.ibm.com>
[PATCH V15 03/21] PCI: Print more info in sriov_enable() error message
From: Bjorn Helgaas <bhelg...@google.com>

If we don't have space for all the bus numbers required to enable VFs,
print the largest bus number required and the range available.

No functional change; improved error message only.

Signed-off-by: Bjorn Helgaas <bhelg...@google.com>
Acked-by: Wei Yang <weiy...@linux.vnet.ibm.com>
---
 drivers/pci/iov.c |    7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 4b3a4ea..c4c33ea 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -180,6 +180,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
 	struct pci_dev *pdev;
 	struct pci_sriov *iov = dev->sriov;
 	int bars = 0;
+	u8 bus;
 
 	if (!nr_virtfn)
 		return 0;
@@ -216,8 +217,10 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
 	iov->offset = offset;
 	iov->stride = stride;
 
-	if (virtfn_bus(dev, nr_virtfn - 1) > dev->bus->busn_res.end) {
-		dev_err(&dev->dev, "SR-IOV: bus number out of range\n");
+	bus = virtfn_bus(dev, nr_virtfn - 1);
+	if (bus > dev->bus->busn_res.end) {
+		dev_err(&dev->dev, "can't enable %d VFs (bus %02x out of range of %pR)\n",
+			nr_virtfn, bus, &dev->bus->busn_res);
 		return -ENOMEM;
 	}
 
-- 
1.7.9.5
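For readers who want to see why the last VF's bus number is the one that matters, here is a small user-space model of the check. It is a sketch only: struct model_pf and both functions are hypothetical, the bus computation assumes the usual "routing ID overflows into the next bus" rule, and the model uses int where the patch uses u8.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified view of the PF fields the check depends on. */
struct model_pf {
	uint8_t  bus_number;	/* bus the PF sits on */
	uint8_t  devfn;		/* PF device/function number */
	uint16_t offset;	/* First VF Offset from the SR-IOV capability */
	uint16_t stride;	/* VF Stride from the SR-IOV capability */
	uint8_t  busn_end;	/* last bus number available to this bus */
};

/* Bus number of VF 'id': devfn values past 0xff spill onto the next bus. */
int model_virtfn_bus(const struct model_pf *pf, int id)
{
	return pf->bus_number +
	       ((pf->devfn + pf->offset + pf->stride * id) >> 8);
}

/* Returns 0 if all nr_virtfn VFs fit in the bus range, -1 otherwise. */
int model_check_vf_buses(const struct model_pf *pf, int nr_virtfn)
{
	int bus = model_virtfn_bus(pf, nr_virtfn - 1);

	if (bus > pf->busn_end) {
		fprintf(stderr,
			"can't enable %d VFs (bus %02x out of range, last bus %02x)\n",
			nr_virtfn, bus, pf->busn_end);
		return -1;
	}
	return 0;
}
```

VF bus numbers grow monotonically with the VF index, so checking only VF nr_virtfn - 1 covers every VF below it.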
[PATCH V15 12/21] PCI: Consider additional PF's IOV BAR alignment in sizing and assigning
When sizing and assigning resources, we divide the resources into two
lists: the requested list and the additional list. We don't consider the
alignment of additional VF(n) BAR space.

This is because the alignment required for the VF(n) BAR space is the size
of an individual VF BAR, not the size of the space for *all* VFs. But we
want additional alignment to support partitioning on PowerNV.

Consider the additional IOV BAR alignment when sizing and assigning
resources. When there is not enough system MMIO space to accommodate both
the requested list and the additional list, the PF's IOV BAR alignment will
not contribute to the bridge. When there is enough system MMIO space for
both lists, the additional alignment will contribute to the bridge.

The additional alignment is stored in the min_align of pci_dev_resource,
which is stored in the additional list by add_to_list() at the end of
pbus_size_mem(). The additional alignment is calculated in
pci_resource_alignment(). For an IOV BAR, an arch-dependent function
supplies the alignment, so each architecture can provide its own.
[bhelgaas: changelog, printk cast] Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com Acked-by: Bjorn Helgaas bhelg...@google.com --- drivers/pci/setup-bus.c | 95 +++ 1 file changed, 79 insertions(+), 16 deletions(-) diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c index e3e17f3..6603d40 100644 --- a/drivers/pci/setup-bus.c +++ b/drivers/pci/setup-bus.c @@ -99,8 +99,8 @@ static void remove_from_list(struct list_head *head, } } -static resource_size_t get_res_add_size(struct list_head *head, - struct resource *res) +static struct pci_dev_resource *res_to_dev_res(struct list_head *head, + struct resource *res) { struct pci_dev_resource *dev_res; @@ -109,17 +109,37 @@ static resource_size_t get_res_add_size(struct list_head *head, int idx = res - dev_res-dev-resource[0]; dev_printk(KERN_DEBUG, dev_res-dev-dev, -res[%d]=%pR get_res_add_size add_size %llx\n, +res[%d]=%pR res_to_dev_res add_size %llx min_align %llx\n, idx, dev_res-res, -(unsigned long long)dev_res-add_size); +(unsigned long long)dev_res-add_size, +(unsigned long long)dev_res-min_align); - return dev_res-add_size; + return dev_res; } } - return 0; + return NULL; } +static resource_size_t get_res_add_size(struct list_head *head, + struct resource *res) +{ + struct pci_dev_resource *dev_res; + + dev_res = res_to_dev_res(head, res); + return dev_res ? dev_res-add_size : 0; +} + +static resource_size_t get_res_add_align(struct list_head *head, +struct resource *res) +{ + struct pci_dev_resource *dev_res; + + dev_res = res_to_dev_res(head, res); + return dev_res ? 
dev_res-min_align : 0; +} + + /* Sort resources by alignment */ static void pdev_sort_resources(struct pci_dev *dev, struct list_head *head) { @@ -215,7 +235,7 @@ static void reassign_resources_sorted(struct list_head *realloc_head, struct resource *res; struct pci_dev_resource *add_res, *tmp; struct pci_dev_resource *dev_res; - resource_size_t add_size; + resource_size_t add_size, align; int idx; list_for_each_entry_safe(add_res, tmp, realloc_head, list) { @@ -238,13 +258,13 @@ static void reassign_resources_sorted(struct list_head *realloc_head, idx = res - add_res-dev-resource[0]; add_size = add_res-add_size; + align = add_res-min_align; if (!resource_size(res)) { - res-start = add_res-start; + res-start = align; res-end = res-start + add_size - 1; if (pci_assign_resource(add_res-dev, idx)) reset_resource(res); } else { - resource_size_t align = add_res-min_align; res-flags |= add_res-flags (IORESOURCE_STARTALIGN|IORESOURCE_SIZEALIGN); if (pci_reassign_resource(add_res-dev, idx, @@ -368,8 +388,9 @@ static void __assign_resources_sorted(struct list_head *head, LIST_HEAD(save_head); LIST_HEAD(local_fail_head); struct pci_dev_resource *save_res; - struct pci_dev_resource *dev_res, *tmp_res; + struct pci_dev_resource *dev_res, *tmp_res, *dev_res2; unsigned long fail_type; + resource_size_t add_align, align; /*
[PATCH 24/27] powerpc: Remove shim for pci_controller_ops.probe_mode
This also moves back the defines, as explained in the commit that
created the shim.

Signed-off-by: Daniel Axtens <d...@axtens.net>
---
 arch/powerpc/include/asm/machdep.h    |  1 -
 arch/powerpc/include/asm/pci-bridge.h | 16 ----------------
 arch/powerpc/include/asm/pci.h        |  5 +++++
 arch/powerpc/kernel/pci-common.c      |  4 ++--
 arch/powerpc/kernel/pci-hotplug.c     |  6 +++++-
 arch/powerpc/kernel/pci_of_scan.c     |  6 +++++-
 6 files changed, 17 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index 5549b6c..dfc8d2b 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -127,7 +127,6 @@ struct machdep_calls {
 	/* PCI stuff */
 	/* Called after scanning the bus, before allocating resources */
 	void		(*pcibios_fixup)(void);
-	int		(*pci_probe_mode)(struct pci_bus *);
 	void		(*pci_irq_fixup)(struct pci_dev *dev);
 	int		(*pcibios_root_bridge_prepare)(struct pci_host_bridge
 				*bridge);
diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 029def0..b5d8631 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -12,11 +12,6 @@
 #include <linux/ioport.h>
 #include <asm-generic/pci-bridge.h>
 
-/* Return values for pci_controller_ops.probe_mode function */
-#define PCI_PROBE_NONE		-1	/* Don't look at this bus at all */
-#define PCI_PROBE_NORMAL	0	/* Do normal PCI probing */
-#define PCI_PROBE_DEVTREE	1	/* Instantiate from device tree */
-
 struct device_node;
 
 /*
@@ -305,16 +300,5 @@ static inline void dma_bus_setup(struct pci_bus *bus)
 		ppc_md.pci_dma_bus_setup(bus);
 }
 
-static inline int probe_mode(struct pci_bus *bus)
-{
-	struct pci_controller *hose = pci_bus_to_host(bus);
-
-	if (hose->controller_ops.probe_mode)
-		return hose->controller_ops.probe_mode(bus);
-	if (ppc_md.pci_probe_mode)
-		return ppc_md.pci_probe_mode(bus);
-	return PCI_PROBE_NORMAL;
-}
-
 #endif	/* __KERNEL__ */
 #endif /* _ASM_POWERPC_PCI_BRIDGE_H */
diff --git a/arch/powerpc/include/asm/pci.h b/arch/powerpc/include/asm/pci.h
index 8745067..4aef8d6 100644
--- a/arch/powerpc/include/asm/pci.h
+++ b/arch/powerpc/include/asm/pci.h
@@ -22,6 +22,11 @@
 
 #include <asm-generic/pci-dma-compat.h>
 
+/* Return values for pci_controller_ops.probe_mode function */
+#define PCI_PROBE_NONE		-1	/* Don't look at this bus at all */
+#define PCI_PROBE_NORMAL	0	/* Do normal PCI probing */
+#define PCI_PROBE_DEVTREE	1	/* Instantiate from device tree */
+
 #define PCIBIOS_MIN_IO		0x1000
 #define PCIBIOS_MIN_MEM		0x1000
 
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 5b90e99c..a61ecb4 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -1628,8 +1628,8 @@ void pcibios_scan_phb(struct pci_controller *hose)
 
 	/* Get probe mode and perform scan */
 	mode = PCI_PROBE_NORMAL;
-	if (node)
-		mode = probe_mode(bus);
+	if (node && hose->controller_ops.probe_mode)
+		mode = hose->controller_ops.probe_mode(bus);
 	pr_debug("probe mode: %d\n", mode);
 	if (mode == PCI_PROBE_DEVTREE)
 		of_scan_bus(node, bus);
diff --git a/arch/powerpc/kernel/pci-hotplug.c b/arch/powerpc/kernel/pci-hotplug.c
index 504d823..e9b0a4a 100644
--- a/arch/powerpc/kernel/pci-hotplug.c
+++ b/arch/powerpc/kernel/pci-hotplug.c
@@ -73,12 +73,16 @@ void pcibios_add_pci_devices(struct pci_bus * bus)
 {
 	int slotno, mode, pass, max;
 	struct pci_dev *dev;
+	struct pci_controller *hose;
 	struct device_node *dn = pci_bus_to_OF_node(bus);
 
 	eeh_add_device_tree_early(dn);
 
+	hose = pci_bus_to_host(bus);
+
 	mode = PCI_PROBE_NORMAL;
-	mode = probe_mode(bus);
+	if (hose->controller_ops.probe_mode)
+		mode = hose->controller_ops.probe_mode(bus);
 
 	if (mode == PCI_PROBE_DEVTREE) {
 		/* use ofdt-based probe */
diff --git a/arch/powerpc/kernel/pci_of_scan.c b/arch/powerpc/kernel/pci_of_scan.c
index ae1767b..8312962 100644
--- a/arch/powerpc/kernel/pci_of_scan.c
+++ b/arch/powerpc/kernel/pci_of_scan.c
@@ -207,6 +207,7 @@ void of_scan_pci_bridge(struct pci_dev *dev)
 {
 	struct device_node *node = dev->dev.of_node;
 	struct pci_bus *bus;
+	struct pci_controller *hose;
 	const __be32 *busrange, *ranges;
 	int len, i, mode;
 	struct pci_bus_region region;
@@ -286,8 +287,11 @@ void of_scan_pci_bridge(struct pci_dev *dev)
 		bus->number);
 	pr_debug("bus name: %s\n", bus->name);
 
+	hose = pci_bus_to_host(bus);
+
 	mode = PCI_PROBE_NORMAL;
-
[PATCH 23/27] powerpc: Remove shim for pci_controller_ops.enable_device_hook
Signed-off-by: Daniel Axtens <d...@axtens.net>
---
 arch/powerpc/include/asm/machdep.h    |  4 ----
 arch/powerpc/include/asm/pci-bridge.h | 11 -----------
 arch/powerpc/kernel/pci-common.c      |  7 +++++--
 3 files changed, 5 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index f178cf1..5549b6c 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -237,10 +237,6 @@ struct machdep_calls {
 	/* Called for each PCI bus in the system when it's probed */
 	void (*pcibios_fixup_bus)(struct pci_bus *);
 
-	/* Called when pci_enable_device() is called. Returns true to
-	 * allow assignment/enabling of the device. */
-	bool (*pcibios_enable_device_hook)(struct pci_dev *);
-
 	/* Called after scan and before resource survey */
 	void (*pcibios_fixup_phb)(struct pci_controller *hose);
 
diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index b08db93..029def0 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -316,16 +316,5 @@ static inline int probe_mode(struct pci_bus *bus)
 	return PCI_PROBE_NORMAL;
 }
 
-static inline bool enable_device_hook(struct pci_dev *dev)
-{
-	struct pci_controller *hose = pci_bus_to_host(dev->bus);
-
-	if (hose->controller_ops.enable_device_hook)
-		return hose->controller_ops.enable_device_hook(dev);
-	if (ppc_md.pcibios_enable_device_hook)
-		return ppc_md.pcibios_enable_device_hook(dev);
-	return true;
-}
-
 #endif	/* __KERNEL__ */
 #endif /* _ASM_POWERPC_PCI_BRIDGE_H */
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index a535d31..5b90e99c 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -1452,8 +1452,11 @@ EXPORT_SYMBOL_GPL(pcibios_finish_adding_to_bus);
 
 int pcibios_enable_device(struct pci_dev *dev, int mask)
 {
-	if (!enable_device_hook(dev))
-		return -EINVAL;
+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
+
+	if (hose->controller_ops.enable_device_hook)
+		if (!hose->controller_ops.enable_device_hook(dev))
+			return -EINVAL;
 
 	return pci_enable_resources(dev, mask);
 }
-- 
2.1.4
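The behaviour of the patched pcibios_enable_device() is "an absent hook means allow, a present hook may veto". A minimal user-space sketch of that gate follows. All names here are hypothetical stand-ins, and the example hook (refusing a device with no PE assigned) is only an illustration of what a platform might check.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pci_dev. */
struct model_dev {
	bool has_pe;
};

/* Hypothetical stand-in for struct pci_controller_ops. */
struct model_controller_ops {
	bool (*enable_device_hook)(struct model_dev *dev);
};

/* Illustrative hook: veto devices that were never assigned a PE. */
bool model_hook_requires_pe(struct model_dev *dev)
{
	return dev->has_pe;
}

/* Same gate as the patched function; the real one then enables resources. */
int model_enable_device(const struct model_controller_ops *ops,
			struct model_dev *dev)
{
	if (ops->enable_device_hook)
		if (!ops->enable_device_hook(dev))
			return -EINVAL;

	return 0;
}
```

A controller that leaves enable_device_hook NULL therefore behaves exactly as if the hook had returned true.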
[PATCH 27/27] powerpc: dart_iommu: Remove check for controller_ops == NULL case
Now that we have ported the calls to iommu_init_early_dart to always
supply a pci_controller_ops struct, we can safely drop the check.

Signed-off-by: Daniel Axtens <d...@axtens.net>
---
 arch/powerpc/sysdev/dart_iommu.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/sysdev/dart_iommu.c b/arch/powerpc/sysdev/dart_iommu.c
index 87b8000..d00a566 100644
--- a/arch/powerpc/sysdev/dart_iommu.c
+++ b/arch/powerpc/sysdev/dart_iommu.c
@@ -395,20 +395,17 @@ void __init iommu_init_early_dart(struct pci_controller_ops *controller_ops)
 	if (dart_is_u4)
 		ppc_md.dma_set_mask = dart_dma_set_mask;
 
-	if (controller_ops) {
-		controller_ops->dma_dev_setup = pci_dma_dev_setup_dart;
-		controller_ops->dma_bus_setup = pci_dma_bus_setup_dart;
-	}
+	controller_ops->dma_dev_setup = pci_dma_dev_setup_dart;
+	controller_ops->dma_bus_setup = pci_dma_bus_setup_dart;
+
 	/* Setup pci_dma ops */
 	set_pci_dma_ops(&dma_iommu_ops);
 	return;
 
 bail:
 	/* If init failed, use direct iommu and null setup functions */
-	if (controller_ops) {
-		controller_ops->dma_dev_setup = NULL;
-		controller_ops->dma_bus_setup = NULL;
-	}
+	controller_ops->dma_dev_setup = NULL;
+	controller_ops->dma_bus_setup = NULL;
 
 	/* Setup pci_dma ops */
 	set_pci_dma_ops(&dma_direct_ops);
-- 
2.1.4
[PATCH V15 06/21] PCI: Index IOV resources in the conventional style
From: Bjorn Helgaas <bhelg...@google.com>

Most of PCI uses "res = &dev->resource[i]", not "res = dev->resource + i".
Use that style in iov.c also.

No functional change.

Signed-off-by: Bjorn Helgaas <bhelg...@google.com>
Acked-by: Wei Yang <weiy...@linux.vnet.ibm.com>
---
 drivers/pci/iov.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 5bca0e1..27b98c3 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -95,7 +95,7 @@ static int virtfn_add(struct pci_dev *dev, int id, int reset)
 	virtfn->multifunction = 0;
 
 	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
-		res = dev->resource + PCI_IOV_RESOURCES + i;
+		res = &dev->resource[i + PCI_IOV_RESOURCES];
 		if (!res->parent)
 			continue;
 		virtfn->resource[i].name = pci_name(virtfn);
@@ -212,7 +212,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
 	nres = 0;
 	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
 		bars |= (1 << (i + PCI_IOV_RESOURCES));
-		res = dev->resource + PCI_IOV_RESOURCES + i;
+		res = &dev->resource[i + PCI_IOV_RESOURCES];
 		if (res->parent)
 			nres++;
 	}
@@ -373,7 +373,7 @@ found:
 
 	nres = 0;
 	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
-		res = dev->resource + PCI_IOV_RESOURCES + i;
+		res = &dev->resource[i + PCI_IOV_RESOURCES];
 		bar64 = __pci_read_base(dev, pci_bar_unknown, res,
 					pos + PCI_SRIOV_BAR + i * 4);
 		if (!res->flags)
@@ -417,7 +417,7 @@ found:
 
 failed:
 	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
-		res = dev->resource + PCI_IOV_RESOURCES + i;
+		res = &dev->resource[i + PCI_IOV_RESOURCES];
 		res->flags = 0;
 	}
 
-- 
1.7.9.5
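The "no functional change" claim rests on the two spellings naming the same element. A tiny stand-alone check of that equivalence is sketched below. The struct layouts and the MODEL_* constants are hypothetical and chosen only for illustration; they are not the kernel's definitions.

```c
/* Hypothetical constants, standing in for PCI_IOV_RESOURCES and friends. */
#define MODEL_IOV_RESOURCES	7
#define MODEL_NUM_BARS		6
#define MODEL_NUM_RESOURCES	(MODEL_IOV_RESOURCES + MODEL_NUM_BARS)

struct model_resource {
	unsigned long start, end;
};

struct model_dev {
	struct model_resource resource[MODEL_NUM_RESOURCES];
};

/* Old style: pointer arithmetic on the array. */
struct model_resource *iov_res_old(struct model_dev *dev, int i)
{
	return dev->resource + MODEL_IOV_RESOURCES + i;
}

/* New style: address of an indexed element. */
struct model_resource *iov_res_new(struct model_dev *dev, int i)
{
	return &dev->resource[i + MODEL_IOV_RESOURCES];
}
```

Both functions return the same pointer for every BAR index, so the patch is purely a readability change.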
[PATCH V15 05/21] PCI: Keep individual VF BAR size in struct pci_sriov
Currently we don't store the individual VF BAR size. We calculate it when needed by dividing the PF's IOV resource size (which contains space for *all* the VFs) by total_VFs or by reading the BAR in the SR-IOV capability again. Keep the individual VF BAR size in struct pci_sriov.barsz[], add pci_iov_resource_size() to retrieve it, and use that instead of doing the division or reading the SR-IOV capability BAR. [bhelgaas: rename to barsz[], simplify barsz[] index computation, remove SR-IOV capability BAR sizing] Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com Acked-by: Bjorn Helgaas bhelg...@google.com --- drivers/pci/iov.c | 39 --- drivers/pci/pci.h |1 + include/linux/pci.h |3 +++ 3 files changed, 24 insertions(+), 19 deletions(-) diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c index 05f9d97..5bca0e1 100644 --- a/drivers/pci/iov.c +++ b/drivers/pci/iov.c @@ -57,6 +57,14 @@ static void virtfn_remove_bus(struct pci_bus *physbus, struct pci_bus *virtbus) pci_remove_bus(virtbus); } +resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno) +{ + if (!dev-is_physfn) + return 0; + + return dev-sriov-barsz[resno - PCI_IOV_RESOURCES]; +} + static int virtfn_add(struct pci_dev *dev, int id, int reset) { int i; @@ -92,8 +100,7 @@ static int virtfn_add(struct pci_dev *dev, int id, int reset) continue; virtfn-resource[i].name = pci_name(virtfn); virtfn-resource[i].flags = res-flags; - size = resource_size(res); - do_div(size, iov-total_VFs); + size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES); virtfn-resource[i].start = res-start + size * id; virtfn-resource[i].end = virtfn-resource[i].start + size - 1; rc = request_resource(res, virtfn-resource[i]); @@ -311,7 +318,7 @@ static void sriov_disable(struct pci_dev *dev) static int sriov_init(struct pci_dev *dev, int pos) { - int i; + int i, bar64; int rc; int nres; u32 pgsz; @@ -360,29 +367,29 @@ found: pgsz = ~(pgsz - 1); pci_write_config_dword(dev, pos + PCI_SRIOV_SYS_PGSIZE, pgsz); + iov = 
kzalloc(sizeof(*iov), GFP_KERNEL); + if (!iov) + return -ENOMEM; + nres = 0; for (i = 0; i PCI_SRIOV_NUM_BARS; i++) { res = dev-resource + PCI_IOV_RESOURCES + i; - i += __pci_read_base(dev, pci_bar_unknown, res, -pos + PCI_SRIOV_BAR + i * 4); + bar64 = __pci_read_base(dev, pci_bar_unknown, res, + pos + PCI_SRIOV_BAR + i * 4); if (!res-flags) continue; if (resource_size(res) (PAGE_SIZE - 1)) { rc = -EIO; goto failed; } + iov-barsz[i] = resource_size(res); res-end = res-start + resource_size(res) * total - 1; dev_info(dev-dev, VF(n) BAR%d space: %pR (contains BAR%d for %d VFs)\n, i, res, i, total); + i += bar64; nres++; } - iov = kzalloc(sizeof(*iov), GFP_KERNEL); - if (!iov) { - rc = -ENOMEM; - goto failed; - } - iov-pos = pos; iov-nres = nres; iov-ctrl = ctrl; @@ -414,6 +421,7 @@ failed: res-flags = 0; } + kfree(iov); return rc; } @@ -510,14 +518,7 @@ int pci_iov_resource_bar(struct pci_dev *dev, int resno) */ resource_size_t pci_sriov_resource_alignment(struct pci_dev *dev, int resno) { - struct resource tmp; - int reg = pci_iov_resource_bar(dev, resno); - - if (!reg) - return 0; - -__pci_read_base(dev, pci_bar_unknown, tmp, reg); - return resource_alignment(tmp); + return pci_iov_resource_size(dev, resno); } /** diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index 4091f82..5732964 100644 --- a/drivers/pci/pci.h +++ b/drivers/pci/pci.h @@ -247,6 +247,7 @@ struct pci_sriov { struct pci_dev *dev;/* lowest numbered PF */ struct pci_dev *self; /* this PF */ struct mutex lock; /* lock for VF bus */ + resource_size_t barsz[PCI_SRIOV_NUM_BARS]; /* VF BAR size */ }; #ifdef CONFIG_PCI_ATS diff --git a/include/linux/pci.h b/include/linux/pci.h index 211e9da..1559658 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -1675,6 +1675,7 @@ int pci_num_vf(struct pci_dev *dev); int pci_vfs_assigned(struct pci_dev *dev); int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs); int pci_sriov_get_totalvfs(struct pci_dev *dev); +resource_size_t 
pci_iov_resource_size(struct pci_dev *dev, int resno); #else static inline int
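The arithmetic behind this patch (one stored per-VF BAR size, a PF IOV BAR that holds total_VFs copies of it, and VF 'id' taking the id-th slice) can be illustrated with a short sketch. The struct and function names below are hypothetical; only the start/end formulas follow the diff above.

```c
#include <stdint.h>

/* Hypothetical inclusive address range, like a struct resource's start/end. */
struct model_range {
	uint64_t start, end;
};

/* The PF's IOV BAR spans total_vfs back-to-back copies of one VF BAR. */
struct model_range model_iov_bar(uint64_t base, uint64_t vf_bar_size,
				 unsigned int total_vfs)
{
	struct model_range r;

	r.start = base;
	r.end = base + vf_bar_size * total_vfs - 1;
	return r;
}

/* VF 'id' gets the id-th slice: start + size * id, size bytes long. */
struct model_range model_vf_bar(uint64_t base, uint64_t vf_bar_size,
				unsigned int id)
{
	struct model_range r;

	r.start = base + vf_bar_size * id;
	r.end = r.start + vf_bar_size - 1;
	return r;
}
```

Storing the per-VF size once avoids recovering it later by dividing the IOV BAR size by total_VFs, which stops being valid if a platform resizes the IOV BAR.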
[PATCH V15 16/21] powerpc/powernv: Implement pcibios_iov_resource_alignment() on powernv
Implement pcibios_iov_resource_alignment() on powernv platform.

On PowerNV platform, there are 3 cases for the IOV BAR:
1. initial state, the IOV BAR size is a multiple of the VF BAR size
2. after expanded, the IOV BAR size is expanded to meet the M64 segment size
3. sizing stage, the IOV BAR is truncated to 0

pnv_pci_iov_resource_alignment() handles these three cases respectively.

[bhelgaas: adjust to drop align parameter, return pci_iov_resource_size()
if no ppc_md machdep_call version]
Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/machdep.h        |    1 +
 arch/powerpc/kernel/pci-common.c          |   10 ++++++++++
 arch/powerpc/platforms/powernv/pci-ioda.c |   20 ++++++++++++++++++++
 3 files changed, 31 insertions(+)

diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index 1d72fda..37e451f 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -252,6 +252,7 @@ struct machdep_calls {

 #ifdef CONFIG_PCI_IOV
 	void (*pcibios_fixup_sriov)(struct pci_dev *pdev);
+	resource_size_t (*pcibios_iov_resource_alignment)(struct pci_dev *, int resno);
 #endif /* CONFIG_PCI_IOV */

 	/* Called to shutdown machine specific hardware not already controlled
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 375bf70..9a306ff 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -130,6 +130,16 @@ void pcibios_reset_secondary_bus(struct pci_dev *dev)
 	pci_reset_secondary_bus(dev);
 }

+#ifdef CONFIG_PCI_IOV
+resource_size_t pcibios_iov_resource_alignment(struct pci_dev *pdev, int resno)
+{
+	if (ppc_md.pcibios_iov_resource_alignment)
+		return ppc_md.pcibios_iov_resource_alignment(pdev, resno);
+
+	return pci_iov_resource_size(pdev, resno);
+}
+#endif /* CONFIG_PCI_IOV */
+
 static resource_size_t pcibios_io_size(const struct pci_controller *hose)
 {
 #ifdef CONFIG_PPC64
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index cadd3fb..93ec16c 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1965,6 +1965,25 @@ static resource_size_t pnv_pci_window_alignment(struct pci_bus *bus,
 	return phb->ioda.io_segsize;
 }

+#ifdef CONFIG_PCI_IOV
+static resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
+						      int resno)
+{
+	struct pci_dn *pdn = pci_get_pdn(pdev);
+	resource_size_t align, iov_align;
+
+	iov_align = resource_size(&pdev->resource[resno]);
+	if (iov_align)
+		return iov_align;
+
+	align = pci_iov_resource_size(pdev, resno);
+	if (pdn->vfs_expanded)
+		return pdn->vfs_expanded * align;
+
+	return align;
+}
+#endif /* CONFIG_PCI_IOV */
+
 /* Prevent enabling devices for which we couldn't properly
  * assign a PE
  */
@@ -2167,6 +2186,7 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 	ppc_md.pcibios_reset_secondary_bus = pnv_pci_reset_secondary_bus;
 #ifdef CONFIG_PCI_IOV
 	ppc_md.pcibios_fixup_sriov = pnv_pci_ioda_fixup_iov_resources;
+	ppc_md.pcibios_iov_resource_alignment = pnv_pci_iov_resource_alignment;
 #endif /* CONFIG_PCI_IOV */
 	pci_add_flags(PCI_REASSIGN_ALL_RSRC);

-- 
1.7.9.5

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
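The three IOV BAR cases from the changelog reduce to a short decision chain. Below is a minimal user-space sketch of that chain, not the kernel code itself. It assumes a made-up struct iov_bar_state in place of struct pci_dev and struct pci_dn; every name in it is illustrative and none of it is kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the state the powernv hook consults. */
struct iov_bar_state {
	uint64_t iov_bar_size;	/* current size of the PF's IOV BAR, 0 while sizing */
	uint64_t vf_bar_size;	/* size of a single VF BAR */
	uint16_t vfs_expanded;	/* VF count the IOV BAR was expanded to, 0 if none */
};

/*
 * Mirrors the control flow of pnv_pci_iov_resource_alignment():
 *  - a non-zero IOV BAR size is used directly as the alignment;
 *  - a zero-sized (truncated) BAR of an expanded PF aligns to
 *    vfs_expanded * VF BAR size;
 *  - otherwise fall back to the size of one VF BAR.
 */
uint64_t iov_resource_alignment(const struct iov_bar_state *s)
{
	assert(s != NULL);

	if (s->iov_bar_size)
		return s->iov_bar_size;

	if (s->vfs_expanded)
		return (uint64_t)s->vfs_expanded * s->vf_bar_size;

	return s->vf_bar_size;
}
```

With a 4KB VF BAR expanded to 256 VFs, the sizing-stage case therefore yields a 1MB alignment, the same value the expanded BAR itself would report.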
[PATCH V15 21/21] powerpc/pci: Add PCI resource alignment documentation
In order to enable SRIOV on PowerNV platform, the PF's IOV BAR needs to be
adjusted:
    1. size expanded
    2. aligned to M64BT size

This patch documents this change on the reason and how.

[bhelgaas: reformat, clarify, expand]
Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com
---
 .../powerpc/pci_iov_resource_on_powernv.txt        |  301 ++++++++++++++++++++
 1 file changed, 301 insertions(+)
 create mode 100644 Documentation/powerpc/pci_iov_resource_on_powernv.txt

diff --git a/Documentation/powerpc/pci_iov_resource_on_powernv.txt b/Documentation/powerpc/pci_iov_resource_on_powernv.txt
new file mode 100644
index 0000000..b55c5cd
--- /dev/null
+++ b/Documentation/powerpc/pci_iov_resource_on_powernv.txt
@@ -0,0 +1,301 @@
+Wei Yang weiy...@linux.vnet.ibm.com
+Benjamin Herrenschmidt b...@au1.ibm.com
+Bjorn Helgaas bhelg...@google.com
+26 Aug 2014
+
+This document describes the requirement from hardware for PCI MMIO resource
+sizing and assignment on PowerKVM and how generic PCI code handles this
+requirement. The first two sections describe the concepts of Partitionable
+Endpoints and the implementation on P8 (IODA2). The next two sections talk
+about considerations on enabling SRIOV on IODA2.
+
+1. Introduction to Partitionable Endpoints
+
+A Partitionable Endpoint (PE) is a way to group the various resources
+associated with a device or a set of devices to provide isolation between
+partitions (i.e., filtering of DMA, MSIs etc.) and to provide a mechanism
+to freeze a device that is causing errors in order to limit the possibility
+of propagation of bad data.
+
+There is thus, in HW, a table of PE states that contains a pair of "frozen"
+state bits (one for MMIO and one for DMA, they get set together but can be
+cleared independently) for each PE.
+
+When a PE is frozen, all stores in any direction are dropped and all loads
+return all 1's value. MSIs are also blocked. There's a bit more state that
+captures things like the details of the error that caused the freeze etc., but
+that's not critical.
+
+The interesting part is how the various PCIe transactions (MMIO, DMA, ...)
+are matched to their corresponding PEs.
+
+The following section provides a rough description of what we have on P8
+(IODA2).  Keep in mind that this is all per PHB (PCI host bridge).  Each PHB
+is a completely separate HW entity that replicates the entire logic, so has
+its own set of PEs, etc.
+
+2. Implementation of Partitionable Endpoints on P8 (IODA2)
+
+P8 supports up to 256 Partitionable Endpoints per PHB.
+
+  * Inbound
+
+    For DMA, MSIs and inbound PCIe error messages, we have a table (in
+    memory but accessed in HW by the chip) that provides a direct
+    correspondence between a PCIe RID (bus/dev/fn) with a PE number.
+    We call this the RTT.
+
+    - For DMA we then provide an entire address space for each PE that can
+      contain two "windows", depending on the value of PCI address bit 59.
+      Each window can be configured to be remapped via a "TCE table" (IOMMU
+      translation table), which has various configurable characteristics
+      not described here.
+
+    - For MSIs, we have two windows in the address space (one at the top of
+      the 32-bit space and one much higher) which, via a combination of the
+      address and MSI value, will result in one of the 2048 interrupts per
+      bridge being triggered.  There's a PE# in the interrupt controller
+      descriptor table as well which is compared with the PE# obtained from
+      the RTT to "authorize" the device to emit that specific interrupt.
+
+    - Error messages just use the RTT.
+
+  * Outbound.  That's where the tricky part is.
+
+    Like other PCI host bridges, the Power8 IODA2 PHB supports "windows"
+    from the CPU address space to the PCI address space.  There is one M32
+    window and sixteen M64 windows.  They have different characteristics.
+    First what they have in common: they forward a configurable portion of
+    the CPU address space to the PCIe bus and must be naturally aligned
+    power of two in size.  The rest is different:
+
+    - The M32 window:
+
+      * Is limited to 4GB in size.
+
+      * Drops the top bits of the address (above the size) and replaces
+	them with a configurable value.  This is typically used to generate
+	32-bit PCIe accesses.  We configure that window at boot from FW and
+	don't touch it from Linux; it's usually set to forward a 2GB
+	portion of address space from the CPU to PCIe
+	0x8000_0000..0xffff_ffff.  (Note: The top 64KB are actually
+	reserved for MSIs but this is not a problem at this point; we just
+	need to ensure Linux doesn't assign anything there, the M32 logic
+	ignores that however and will forward in that space if we try).
+
+      * It is divided into 256 segments of equal size.  A table in the chip
+	maps each segment to a PE#.  That allows portions of the MMIO space
+	to be assigned to PEs on a segment
[PATCH V15 20/21] powerpc/pci: Remove unused struct pci_dn.pcidev field
In struct pci_dn, the pcidev field is assigned but not used, so remove it.

Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com
Acked-by: Gavin Shan gws...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/pci-bridge.h     |    1 -
 arch/powerpc/platforms/powernv/pci-ioda.c |    1 -
 2 files changed, 2 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index ec83b51..680ae56 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -168,7 +168,6 @@ struct pci_dn {

 	int	pci_ext_config_space;	/* for pci devices */

-	struct	pci_dev *pcidev;	/* back-pointer to the pci device */
 #ifdef CONFIG_EEH
 	struct eeh_dev *edev;		/* eeh device */
 #endif
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index dc9f401..2505ad1 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1028,7 +1028,6 @@ static void pnv_ioda_setup_same_PE(struct pci_bus *bus, struct pnv_ioda_pe *pe)
 				pci_name(dev));
 			continue;
 		}
-		pdn->pcidev = dev;
 		pdn->pe_number = pe->pe_number;
 		pe->dma_weight += pnv_ioda_dma_weight(dev);
 		if ((pe->flags & PNV_IODA_PE_BUS_ALL) && dev->subordinate)
-- 
1.7.9.5
[PATCH 15/27] powerpc/pseries: Move controller ops from ppc_md to controller_ops
This moves the pSeries platform to use the pci_controller_ops structure,
rather than ppc_md for PCI controller operations.

Signed-off-by: Daniel Axtens d...@axtens.net
---
 arch/powerpc/platforms/pseries/iommu.c   | 9 +++++----
 arch/powerpc/platforms/pseries/pseries.h | 2 ++
 arch/powerpc/platforms/pseries/setup.c   | 6 +++++-
 3 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 7803a19..61d5a17 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -49,6 +49,7 @@
 #include <asm/mmzone.h>
 #include <asm/plpar_wrappers.h>

+#include "pseries.h"

 static void tce_invalidate_pSeries_sw(struct iommu_table *tbl,
 				      __be64 *startp, __be64 *endp)
@@ -1307,16 +1308,16 @@ void iommu_init_early_pSeries(void)
 			ppc_md.tce_free	= tce_free_pSeriesLP;
 		}
 		ppc_md.tce_get   = tce_get_pSeriesLP;
-		ppc_md.pci_dma_bus_setup = pci_dma_bus_setup_pSeriesLP;
-		ppc_md.pci_dma_dev_setup = pci_dma_dev_setup_pSeriesLP;
+		pseries_pci_controller_ops.dma_bus_setup = pci_dma_bus_setup_pSeriesLP;
+		pseries_pci_controller_ops.dma_dev_setup = pci_dma_dev_setup_pSeriesLP;
 		ppc_md.dma_set_mask = dma_set_mask_pSeriesLP;
 		ppc_md.dma_get_required_mask = dma_get_required_mask_pSeriesLP;
 	} else {
 		ppc_md.tce_build = tce_build_pSeries;
 		ppc_md.tce_free  = tce_free_pSeries;
 		ppc_md.tce_get   = tce_get_pseries;
-		ppc_md.pci_dma_bus_setup = pci_dma_bus_setup_pSeries;
-		ppc_md.pci_dma_dev_setup = pci_dma_dev_setup_pSeries;
+		pseries_pci_controller_ops.dma_bus_setup = pci_dma_bus_setup_pSeries;
+		pseries_pci_controller_ops.dma_dev_setup = pci_dma_dev_setup_pSeries;
 	}


diff --git a/arch/powerpc/platforms/pseries/pseries.h b/arch/powerpc/platforms/pseries/pseries.h
index 1796c54..cd64672 100644
--- a/arch/powerpc/platforms/pseries/pseries.h
+++ b/arch/powerpc/platforms/pseries/pseries.h
@@ -65,6 +65,8 @@ extern int dlpar_detach_node(struct device_node *);
 struct pci_host_bridge;
 int pseries_root_bridge_prepare(struct pci_host_bridge *bridge);

+extern struct pci_controller_ops pseries_pci_controller_ops;
+
 unsigned long pseries_memory_block_size(void);

 #endif /* _PSERIES_PSERIES_H */
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index 1a5f884..328e318 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -478,6 +478,7 @@ static void __init find_and_init_phbs(void)
 		rtas_setup_phb(phb);
 		pci_process_bridge_OF_ranges(phb, node, 0);
 		isa_bridge_find_early(phb);
+		phb->controller_ops = pseries_pci_controller_ops;
 	}

 	of_node_put(root);
@@ -840,6 +841,10 @@ static int pSeries_pci_probe_mode(struct pci_bus *bus)
 void pSeries_final_fixup(void) { }
 #endif

+struct pci_controller_ops pseries_pci_controller_ops = {
+	.probe_mode		= pSeries_pci_probe_mode,
+};
+
 define_machine(pseries) {
 	.name			= "pSeries",
 	.probe			= pSeries_probe,
@@ -848,7 +853,6 @@ define_machine(pseries) {
 	.show_cpuinfo		= pSeries_show_cpuinfo,
 	.log_error		= pSeries_log_error,
 	.pcibios_fixup		= pSeries_final_fixup,
-	.pci_probe_mode		= pSeries_pci_probe_mode,
 	.restart		= rtas_restart,
 	.halt			= rtas_halt,
 	.panic			= rtas_os_term,
-- 
2.1.4
[PATCH 19/27] powerpc: fsl_pci, swiotlb: Move controller ops from ppc_md to controller_ops
This moves the setup out of swiotlb's subsys init call, and into a helper
in fsl_pci.c.

fsl_pci.c is the only thing that checks the ppc_swiotlb_enable global, so
we can be confident that patching it will cover all the PCI implementations
affected by the changes to dma-swiotlb.c.

Signed-off-by: Daniel Axtens d...@axtens.net
---
 arch/powerpc/kernel/dma-swiotlb.c | 11 ++++-------
 arch/powerpc/sysdev/fsl_pci.c     | 19 +++++++++++++++++++
 2 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kernel/dma-swiotlb.c b/arch/powerpc/kernel/dma-swiotlb.c
index d06491b..6e8d764 100644
--- a/arch/powerpc/kernel/dma-swiotlb.c
+++ b/arch/powerpc/kernel/dma-swiotlb.c
@@ -116,16 +116,13 @@ void __init swiotlb_detect_4g(void)
 	}
 }

-static int __init swiotlb_subsys_init(void)
+static int __init check_swiotlb_enabled(void)
 {
-	if (ppc_swiotlb_enable) {
+	if (ppc_swiotlb_enable)
 		swiotlb_print_info();
-		set_pci_dma_ops(&swiotlb_dma_ops);
-		ppc_md.pci_dma_dev_setup = pci_dma_dev_setup_swiotlb;
-	} else {
+	else
 		swiotlb_free();
-	}

 	return 0;
 }
-subsys_initcall(swiotlb_subsys_init);
+subsys_initcall(check_swiotlb_enabled);
diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
index 7071feb..ca13b7f 100644
--- a/arch/powerpc/sysdev/fsl_pci.c
+++ b/arch/powerpc/sysdev/fsl_pci.c
@@ -111,6 +111,22 @@ static struct pci_ops fsl_indirect_pcie_ops =
 #define MAX_PHYS_ADDR_BITS	40
 static u64 pci64_dma_offset = 1ull << MAX_PHYS_ADDR_BITS;

+#ifdef CONFIG_SWIOTLB
+static struct pci_controller_ops swiotlb_pci_controller_ops = {
+	.dma_dev_setup = pci_dma_dev_setup_swiotlb,
+};
+
+static void setup_swiotlb_ops(struct pci_controller *hose)
+{
+	if (ppc_swiotlb_enable) {
+		hose->controller_ops = swiotlb_pci_controller_ops;
+		set_pci_dma_ops(&swiotlb_dma_ops);
+	}
+}
+#else
+static inline void setup_swiotlb_ops(struct pci_controller *hose) {}
+#endif
+
 static int fsl_pci_dma_set_mask(struct device *dev, u64 dma_mask)
 {
 	if (!dev->dma_mask || !dma_supported(dev, dma_mask))
@@ -492,6 +508,9 @@ int fsl_add_bridge(struct platform_device *pdev, int is_primary)
 	hose->first_busno = bus_range ? bus_range[0] : 0x0;
 	hose->last_busno = bus_range ? bus_range[1] : 0xff;

+	/* Set up controller operations */
+	setup_swiotlb_ops(hose);
+
 	pr_debug("PCI memory map start 0x%016llx, size 0x%016llx\n",
 		 (u64)rsrc.start, (u64)resource_size(&rsrc));

-- 
2.1.4
[PATCH 18/27] powerpc/maple: Move controller ops from ppc_md to controller_ops
This moves the Maple platform to use the pci_controller_ops structure
rather than ppc_md for PCI controller operations.

Signed-off-by: Daniel Axtens d...@axtens.net
---
 arch/powerpc/platforms/maple/maple.h | 2 ++
 arch/powerpc/platforms/maple/pci.c   | 4 ++++
 arch/powerpc/platforms/maple/setup.c | 2 +-
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/maple/maple.h b/arch/powerpc/platforms/maple/maple.h
index c6911dd..eecfa18 100644
--- a/arch/powerpc/platforms/maple/maple.h
+++ b/arch/powerpc/platforms/maple/maple.h
@@ -10,3 +10,5 @@ extern void maple_calibrate_decr(void);
 extern void maple_pci_init(void);
 extern void maple_pci_irq_fixup(struct pci_dev *dev);
 extern int maple_pci_get_legacy_ide_irq(struct pci_dev *dev, int channel);
+
+extern struct pci_controller_ops maple_pci_controller_ops;
diff --git a/arch/powerpc/platforms/maple/pci.c b/arch/powerpc/platforms/maple/pci.c
index d3a1306..a923230 100644
--- a/arch/powerpc/platforms/maple/pci.c
+++ b/arch/powerpc/platforms/maple/pci.c
@@ -510,6 +510,7 @@ static int __init maple_add_bridge(struct device_node *dev)
 		return -ENOMEM;
 	hose->first_busno = bus_range ? bus_range[0] : 0;
 	hose->last_busno = bus_range ? bus_range[1] : 0xff;
+	hose->controller_ops = maple_pci_controller_ops;

 	disp_name = NULL;
 	if (of_device_is_compatible(dev, "u3-agp")) {
@@ -660,3 +661,6 @@ static void quirk_ipr_msi(struct pci_dev *dev)
 }
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_IBM, PCI_DEVICE_ID_IBM_OBSIDIAN,
 			quirk_ipr_msi);
+
+struct pci_controller_ops maple_pci_controller_ops = {
+};
diff --git a/arch/powerpc/platforms/maple/setup.c b/arch/powerpc/platforms/maple/setup.c
index 3bf2e03..a837188 100644
--- a/arch/powerpc/platforms/maple/setup.c
+++ b/arch/powerpc/platforms/maple/setup.c
@@ -203,7 +203,7 @@ static void __init maple_init_early(void)
 {
 	DBG(" -> maple_init_early\n");

-	iommu_init_early_dart(NULL);
+	iommu_init_early_dart(&maple_pci_controller_ops);

 	DBG(" <- maple_init_early\n");
 }
-- 
2.1.4
[PATCH 25/27] powerpc: Remove shim for pci_controller_ops.dma_dev_setup
Signed-off-by: Daniel Axtens d...@axtens.net
---
 arch/powerpc/include/asm/machdep.h    | 1 -
 arch/powerpc/include/asm/pci-bridge.h | 9 ---------
 arch/powerpc/kernel/pci-common.c      | 5 ++++-
 arch/powerpc/sysdev/dart_iommu.c      | 2 --
 4 files changed, 4 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index dfc8d2b..2f7b319 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -103,7 +103,6 @@ struct machdep_calls {
 #endif
 #endif /* CONFIG_PPC64 */

-	void		(*pci_dma_dev_setup)(struct pci_dev *dev);
 	void		(*pci_dma_bus_setup)(struct pci_bus *bus);

 	/* Platform set_dma_mask and dma_get_required_mask overrides */
diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index b5d8631..e578f67 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -280,15 +280,6 @@ static inline int pcibios_vaddr_is_ioport(void __iomem *address)
 /*
  * Shims to prefer pci_controller version over ppc_md where available.
  */
-static inline void dma_dev_setup(struct pci_dev *dev)
-{
-	struct pci_controller *hose = pci_bus_to_host(dev->bus);
-
-	if (hose->controller_ops.dma_dev_setup)
-		hose->controller_ops.dma_dev_setup(dev);
-	else if (ppc_md.pci_dma_dev_setup)
-		ppc_md.pci_dma_dev_setup(dev);
-}

 static inline void dma_bus_setup(struct pci_bus *bus)
 {
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index a61ecb4..433b387 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -962,6 +962,7 @@ void pcibios_setup_bus_self(struct pci_bus *bus)

 static void pcibios_setup_device(struct pci_dev *dev)
 {
+	struct pci_controller *hose;
 	/* Fixup NUMA node as it may not be setup yet by the generic
 	 * code and is needed by the DMA init
 	 */
@@ -972,7 +973,9 @@ static void pcibios_setup_device(struct pci_dev *dev)
 	set_dma_offset(&dev->dev, PCI_DRAM_OFFSET);

 	/* Additional platform DMA/iommu setup */
-	dma_dev_setup(dev);
+	hose = pci_bus_to_host(dev->bus);
+	if (hose->controller_ops.dma_dev_setup)
+		hose->controller_ops.dma_dev_setup(dev);

 	/* Read default IRQs and fixup if necessary */
 	pci_read_irq_line(dev);
diff --git a/arch/powerpc/sysdev/dart_iommu.c b/arch/powerpc/sysdev/dart_iommu.c
index 120e96a..ca38b1e 100644
--- a/arch/powerpc/sysdev/dart_iommu.c
+++ b/arch/powerpc/sysdev/dart_iommu.c
@@ -399,7 +399,6 @@ void __init iommu_init_early_dart(struct pci_controller_ops *controller_ops)
 		controller_ops->dma_dev_setup = pci_dma_dev_setup_dart;
 		controller_ops->dma_bus_setup = pci_dma_bus_setup_dart;
 	} else {
-		ppc_md.pci_dma_dev_setup = pci_dma_dev_setup_dart;
 		ppc_md.pci_dma_bus_setup = pci_dma_bus_setup_dart;
 	}
 	/* Setup pci_dma ops */
@@ -412,7 +411,6 @@ void __init iommu_init_early_dart(struct pci_controller_ops *controller_ops)
 		controller_ops->dma_dev_setup = NULL;
 		controller_ops->dma_bus_setup = NULL;
 	}
-	ppc_md.pci_dma_dev_setup = NULL;
 	ppc_md.pci_dma_bus_setup = NULL;

 	/* Setup pci_dma ops */
-- 
2.1.4
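With the shim gone, the caller consults only the per-controller ops table and does nothing when the hook is unset. Below is a small stand-alone sketch of that dispatch pattern. It assumes simplified, made-up types (struct ctrl, struct ctrl_ops, struct dev) that only loosely resemble struct pci_controller and struct pci_controller_ops; none of these names are kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

struct dev;

/* Hypothetical per-controller operations table with optional hooks. */
struct ctrl_ops {
	void (*dma_dev_setup)(struct dev *d);
};

/* Hypothetical host controller owning its own copy of the ops. */
struct ctrl {
	struct ctrl_ops ops;
};

/* Hypothetical device that knows which controller it sits behind. */
struct dev {
	struct ctrl *host;
	int dma_ready;
};

/* Example hook: a platform would do its IOMMU setup here. */
void example_dma_dev_setup(struct dev *d)
{
	d->dma_ready = 1;
}

/*
 * Per-device setup: look up the device's controller and call its hook
 * if one was installed. There is no global fallback table anymore.
 */
void setup_device(struct dev *d)
{
	assert(d != NULL && d->host != NULL);

	if (d->host->ops.dma_dev_setup)
		d->host->ops.dma_dev_setup(d);
}
```

Keeping the hook on the controller rather than in one global table lets two host bridges in the same system use different DMA setup routines.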
[PATCH V15 04/21] PCI: Print PF SR-IOV resource that contains all VF(n) BAR space
When we size VF BAR0, VF BAR1, etc., from the SR-IOV Capability of a PF, we
learn the alignment requirement and amount of space consumed by a single
VF.  But when VFs are enabled, *each* of the NumVFs consumes that amount of
space, so the total size of the PF resource is "VF BAR size * NumVFs".

Add a printk of the total space consumed by the VFs corresponding to what
we already do for normal non-IOV BARs.

No functional change; new message only.

[bhelgaas: split out into its own patch]
Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com
Acked-by: Bjorn Helgaas bhelg...@google.com
---
 drivers/pci/iov.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index c4c33ea..05f9d97 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -372,6 +372,8 @@ found:
 			goto failed;
 		}
 		res->end = res->start + resource_size(res) * total - 1;
+		dev_info(&dev->dev, "VF(n) BAR%d space: %pR (contains BAR%d for %d VFs)\n",
+			 i, res, i, total);
 		nres++;
 	}

-- 
1.7.9.5
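To make the "VF BAR size * NumVFs" arithmetic concrete, here is a tiny stand-alone sketch. struct range and its helpers are hypothetical stand-ins for struct resource and resource_size(); they are written for illustration and are not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical inclusive address range, loosely modelled on struct resource. */
struct range {
	uint64_t start;
	uint64_t end;	/* inclusive */
};

/* Size of an inclusive range. */
uint64_t range_size(const struct range *r)
{
	return r->end - r->start + 1;
}

/*
 * Grow a range that currently describes one VF BAR so that it covers
 * `total` VF BARs laid out back to back from the same start address.
 */
void expand_for_vfs(struct range *r, unsigned int total)
{
	assert(r != NULL && total > 0);

	r->end = r->start + range_size(r) * total - 1;
}
```

For example, a 4KB VF BAR at 0x1000 expanded for 4 VFs ends at 0x4fff, i.e. 16KB in total.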
[PATCH V15 08/21] PCI: Calculate maximum number of buses required for VFs
An SR-IOV device can change its First VF Offset and VF Stride based on the
values of ARI Capable Hierarchy and NumVFs.  The number of buses required
for all VFs is determined by NumVFs, First VF Offset, and VF Stride (see
SR-IOV spec r1.1, sec 2.1.2).

Previously pci_iov_bus_range() computed how many buses would be required by
TotalVFs, but this was based on a single NumVFs value and may not have been
the maximum for all NumVFs configurations.

Iterate over all valid NumVFs and calculate the maximum number of bus
numbers that could ever be required for VFs of this device.

[bhelgaas: changelog, compute busnr of NumVFs, not TotalVFs, remove
kernel-doc comment marker]
Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com
Acked-by: Bjorn Helgaas bhelg...@google.com
---
 drivers/pci/iov.c |   31 +++++++++++++++++++++++++++----
 drivers/pci/pci.h |    1 +
 2 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index a8752c2..2ae921f 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -46,6 +46,30 @@ static inline void pci_iov_set_numvfs(struct pci_dev *dev, int nr_virtfn)
 	pci_read_config_word(dev, iov->pos + PCI_SRIOV_VF_STRIDE, &iov->stride);
 }

+/*
+ * The PF consumes one bus number.  NumVFs, First VF Offset, and VF Stride
+ * determine how many additional bus numbers will be consumed by VFs.
+ *
+ * Iterate over all valid NumVFs and calculate the maximum number of bus
+ * numbers that could ever be required.
+ */
+static inline u8 virtfn_max_buses(struct pci_dev *dev)
+{
+	struct pci_sriov *iov = dev->sriov;
+	int nr_virtfn;
+	u8 max = 0;
+	u8 busnr;
+
+	for (nr_virtfn = 1; nr_virtfn <= iov->total_VFs; nr_virtfn++) {
+		pci_iov_set_numvfs(dev, nr_virtfn);
+		busnr = virtfn_bus(dev, nr_virtfn - 1);
+		if (busnr > max)
+			max = busnr;
+	}
+
+	return max;
+}
+
 static struct pci_bus *virtfn_add_bus(struct pci_bus *bus, int busnr)
 {
 	struct pci_bus *child;
@@ -427,6 +451,7 @@ found:

 	dev->sriov = iov;
 	dev->is_physfn = 1;
+	iov->max_VF_buses = virtfn_max_buses(dev);

 	return 0;

@@ -556,15 +581,13 @@ void pci_restore_iov_state(struct pci_dev *dev)
 int pci_iov_bus_range(struct pci_bus *bus)
 {
 	int max = 0;
-	u8 busnr;
 	struct pci_dev *dev;

 	list_for_each_entry(dev, &bus->devices, bus_list) {
 		if (!dev->is_physfn)
 			continue;
-		busnr = virtfn_bus(dev, dev->sriov->total_VFs - 1);
-		if (busnr > max)
-			max = busnr;
+		if (dev->sriov->max_VF_buses > max)
+			max = dev->sriov->max_VF_buses;
 	}

 	return max ? max - bus->number : 0;
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 5732964..bae593c 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -243,6 +243,7 @@ struct pci_sriov {
 	u16 stride;		/* following VF stride */
 	u32 pgsz;		/* page size for BAR alignment */
 	u8 link;		/* Function Dependency Link */
+	u8 max_VF_buses;	/* max buses consumed by VFs */
 	u16 driver_max_VFs;	/* max num VFs driver supports */
 	struct pci_dev *dev;	/* lowest numbered PF */
 	struct pci_dev *self;	/* this PF */
-- 
1.7.9.5
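The reason for iterating over every NumVFs value can be shown with a small user-space model. This is a sketch under loud assumptions: struct vf_layout, example_layout() and the offset/stride values it returns are invented for illustration. A real device reports First VF Offset and VF Stride through its SR-IOV capability after NumVFs is written.

```c
#include <assert.h>
#include <stdint.h>

/* First VF Offset and VF Stride as reported for one NumVFs setting. */
struct vf_layout {
	uint16_t offset;
	uint16_t stride;
};

/*
 * Hypothetical device model: small NumVFs values use a sparse layout
 * that spills onto the next bus, large ones use a dense layout.
 */
struct vf_layout example_layout(int num_vfs)
{
	struct vf_layout sparse = { 0x100, 0x40 };
	struct vf_layout dense = { 0x10, 1 };

	return num_vfs <= 4 ? sparse : dense;
}

/* Bus number of VF vf_id: the routing ID carries the bus in bits 15:8. */
int vf_bus(uint8_t pf_bus, uint8_t pf_devfn, struct vf_layout l, int vf_id)
{
	return pf_bus + ((pf_devfn + l.offset + l.stride * vf_id) >> 8);
}

/* Highest bus number any VF could land on, over all valid NumVFs. */
int max_vf_bus(uint8_t pf_bus, uint8_t pf_devfn, int total_vfs,
	       struct vf_layout (*layout)(int num_vfs))
{
	int max = 0;

	for (int n = 1; n <= total_vfs; n++) {
		int bus = vf_bus(pf_bus, pf_devfn, layout(n), n - 1);

		if (bus > max)
			max = bus;
	}
	return max;
}
```

In this made-up model, looking only at NumVFs = TotalVFs = 16 would report bus 1, while the scan over all NumVFs finds that small configurations reach bus 2.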
[PATCH V15 09/21] PCI: Export pci_iov_virtfn_bus() and pci_iov_virtfn_devfn()
On PowerNV, some resource reservation is needed for SR-IOV VFs that don't
exist at the bootup stage.  To do the match between resources and VFs, the
code needs to get the VF's BDF in advance.

Rename virtfn_bus() and virtfn_devfn() to pci_iov_virtfn_bus() and
pci_iov_virtfn_devfn() and export them.

[bhelgaas: changelog, make "busnr" int]
Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com
Acked-by: Bjorn Helgaas bhelg...@google.com
---
 drivers/pci/iov.c   |   28 ++++++++++++++++------------
 include/linux/pci.h |   11 +++++++++++
 2 files changed, 27 insertions(+), 12 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 2ae921f..5643a10 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -19,16 +19,20 @@

 #define VIRTFN_ID_LEN	16

-static inline u8 virtfn_bus(struct pci_dev *dev, int id)
+int pci_iov_virtfn_bus(struct pci_dev *dev, int vf_id)
 {
+	if (!dev->is_physfn)
+		return -EINVAL;
 	return dev->bus->number + ((dev->devfn + dev->sriov->offset +
-				    dev->sriov->stride * id) >> 8);
+				    dev->sriov->stride * vf_id) >> 8);
 }

-static inline u8 virtfn_devfn(struct pci_dev *dev, int id)
+int pci_iov_virtfn_devfn(struct pci_dev *dev, int vf_id)
 {
+	if (!dev->is_physfn)
+		return -EINVAL;
 	return (dev->devfn + dev->sriov->offset +
-		dev->sriov->stride * id) & 0xff;
+		dev->sriov->stride * vf_id) & 0xff;
 }

 /*
@@ -58,11 +62,11 @@ static inline u8 virtfn_max_buses(struct pci_dev *dev)
 	struct pci_sriov *iov = dev->sriov;
 	int nr_virtfn;
 	u8 max = 0;
-	u8 busnr;
+	int busnr;

 	for (nr_virtfn = 1; nr_virtfn <= iov->total_VFs; nr_virtfn++) {
 		pci_iov_set_numvfs(dev, nr_virtfn);
-		busnr = virtfn_bus(dev, nr_virtfn - 1);
+		busnr = pci_iov_virtfn_bus(dev, nr_virtfn - 1);
 		if (busnr > max)
 			max = busnr;
 	}
@@ -116,7 +120,7 @@ static int virtfn_add(struct pci_dev *dev, int id, int reset)
 	struct pci_bus *bus;

 	mutex_lock(&iov->dev->sriov->lock);
-	bus = virtfn_add_bus(dev->bus, virtfn_bus(dev, id));
+	bus = virtfn_add_bus(dev->bus, pci_iov_virtfn_bus(dev, id));
 	if (!bus)
 		goto failed;

@@ -124,7 +128,7 @@ static int virtfn_add(struct pci_dev *dev, int id, int reset)
 	if (!virtfn)
 		goto failed0;

-	virtfn->devfn = virtfn_devfn(dev, id);
+	virtfn->devfn = pci_iov_virtfn_devfn(dev, id);
 	virtfn->vendor = dev->vendor;
 	pci_read_config_word(dev, iov->pos + PCI_SRIOV_VF_DID, &virtfn->device);
 	pci_setup_device(virtfn);
@@ -186,8 +190,8 @@ static void virtfn_remove(struct pci_dev *dev, int id, int reset)
 	struct pci_sriov *iov = dev->sriov;

 	virtfn = pci_get_domain_bus_and_slot(pci_domain_nr(dev->bus),
-					     virtfn_bus(dev, id),
-					     virtfn_devfn(dev, id));
+					     pci_iov_virtfn_bus(dev, id),
+					     pci_iov_virtfn_devfn(dev, id));
 	if (!virtfn)
 		return;

@@ -226,7 +230,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
 	struct pci_dev *pdev;
 	struct pci_sriov *iov = dev->sriov;
 	int bars = 0;
-	u8 bus;
+	int bus;

 	if (!nr_virtfn)
 		return 0;
@@ -263,7 +267,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
 	iov->offset = offset;
 	iov->stride = stride;

-	bus = virtfn_bus(dev, nr_virtfn - 1);
+	bus = pci_iov_virtfn_bus(dev, nr_virtfn - 1);
 	if (bus > dev->bus->busn_res.end) {
 		dev_err(&dev->dev, "can't enable %d VFs (bus %02x out of range of %pR)\n",
 			nr_virtfn, bus, &dev->bus->busn_res);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 1559658..99ea948 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1669,6 +1669,9 @@ int pci_ext_cfg_avail(void);
 void __iomem *pci_ioremap_bar(struct pci_dev *pdev, int bar);

 #ifdef CONFIG_PCI_IOV
+int pci_iov_virtfn_bus(struct pci_dev *dev, int id);
+int pci_iov_virtfn_devfn(struct pci_dev *dev, int id);
+
 int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn);
 void pci_disable_sriov(struct pci_dev *dev);
 int pci_num_vf(struct pci_dev *dev);
@@ -1677,6 +1680,14 @@ int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs);
 int pci_sriov_get_totalvfs(struct pci_dev *dev);
 resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno);
 #else
+static inline int pci_iov_virtfn_bus(struct pci_dev *dev, int id)
+{
+	return -ENOSYS;
+}
+static inline int pci_iov_virtfn_devfn(struct pci_dev *dev, int id)
+{
+	return -ENOSYS;
+}
 static inline int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn)
 { return -ENODEV; }
 static inline void
[PATCH V15 14/21] powerpc/powernv: Allocate struct pnv_ioda_pe iommu_table dynamically
Previously the iommu_table had the same lifetime as a struct pnv_ioda_pe
and was embedded in it. The pnv_ioda_pe was assigned to a PE on the bootup
stage. Since PEs are based on the hardware layout which is static in the
system, they will never get released. This means the iommu_table in the
pnv_ioda_pe will never get released either.

This no longer works for VF PE. VF PEs are created and released dynamically
when VFs are created and released. So we need to assign pnv_ioda_pe to VF
PEs respectively when VFs are enabled and clean up those resources for VF
PE when VFs are disabled. And iommu_table is one of the resources we need
to handle dynamically.

Current iommu_table is a static field in pnv_ioda_pe, which will face a
problem when freeing it. During the disabling of a VF,
pnv_pci_ioda2_release_dma_pe will call iommu_free_table to release the
iommu_table for this PE. A static iommu_table will fail in
iommu_free_table.

According to these requirements, this patch allocates iommu_table
dynamically.

Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/iommu.h          |    3 +++
 arch/powerpc/platforms/powernv/pci-ioda.c |   26 ++++++++++++++------------
 arch/powerpc/platforms/powernv/pci.h      |    2 +-
 3 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 9cfa370..5574eeb 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -78,6 +78,9 @@ struct iommu_table {
 	struct iommu_group *it_group;
 #endif
 	void (*set_bypass)(struct iommu_table *tbl, bool enable);
+#ifdef CONFIG_PPC_POWERNV
+	void           *data;
+#endif
 };

 /* Pure 2^n version of get_order */
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index df4a295..1b37066 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -916,6 +916,10 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, int all)
 		return;
 	}

+	pe->tce32_table = kzalloc_node(sizeof(struct iommu_table),
+			GFP_KERNEL, hose->node);
+	pe->tce32_table->data = pe;
+
 	/* Associate it with all child devices */
 	pnv_ioda_setup_same_PE(bus, pe);

@@ -1005,7 +1009,7 @@ static void pnv_pci_ioda_dma_dev_setup(struct pnv_phb *phb, struct pci_dev *pdev

 	pe = &phb->ioda.pe_array[pdn->pe_number];
 	WARN_ON(get_dma_ops(&pdev->dev) != &dma_iommu_ops);
-	set_iommu_table_base_and_group(&pdev->dev, &pe->tce32_table);
+	set_iommu_table_base_and_group(&pdev->dev, pe->tce32_table);
 }

 static int pnv_pci_ioda_dma_set_mask(struct pnv_phb *phb,
@@ -1032,7 +1036,7 @@ static int pnv_pci_ioda_dma_set_mask(struct pnv_phb *phb,
 	} else {
 		dev_info(&pdev->dev, "Using 32-bit DMA via iommu\n");
 		set_dma_ops(&pdev->dev, &dma_iommu_ops);
-		set_iommu_table_base(&pdev->dev, &pe->tce32_table);
+		set_iommu_table_base(&pdev->dev, pe->tce32_table);
 	}
 	*pdev->dev.dma_mask = dma_mask;
 	return 0;
@@ -1069,9 +1073,9 @@ static void pnv_ioda_setup_bus_dma(struct pnv_ioda_pe *pe,
 	list_for_each_entry(dev, &bus->devices, bus_list) {
 		if (add_to_iommu_group)
 			set_iommu_table_base_and_group(&dev->dev,
-						       &pe->tce32_table);
+						       pe->tce32_table);
 		else
-			set_iommu_table_base(&dev->dev, &pe->tce32_table);
+			set_iommu_table_base(&dev->dev, pe->tce32_table);

 		if (dev->subordinate)
 			pnv_ioda_setup_bus_dma(pe, dev->subordinate,
@@ -1161,8 +1165,7 @@ static void pnv_pci_ioda2_tce_invalidate(struct pnv_ioda_pe *pe,
 void pnv_pci_ioda_tce_invalidate(struct iommu_table *tbl,
 				 __be64 *startp, __be64 *endp, bool rm)
 {
-	struct pnv_ioda_pe *pe = container_of(tbl, struct pnv_ioda_pe,
-					      tce32_table);
+	struct pnv_ioda_pe *pe = tbl->data;
 	struct pnv_phb *phb = pe->phb;

 	if (phb->type == PNV_PHB_IODA1)
@@ -1228,7 +1231,7 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
 	}

 	/* Setup linux iommu table */
-	tbl = &pe->tce32_table;
+	tbl = pe->tce32_table;
 	pnv_pci_setup_iommu_table(tbl, addr, TCE32_TABLE_SIZE * segs,
 				  base << 28, IOMMU_PAGE_SHIFT_4K);

@@ -1266,8 +1269,7 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,

 static void pnv_pci_ioda2_set_bypass(struct iommu_table *tbl, bool enable)
 {
-	struct pnv_ioda_pe *pe = container_of(tbl, struct pnv_ioda_pe,
-					      tce32_table);
+	struct pnv_ioda_pe *pe = tbl->data;
 	uint16_t window_id =
[PATCH V15 15/21] powerpc/powernv: Reserve additional space for IOV BAR according to the number of total_pe
On PHB3, PF IOV BAR will be covered by M64 BAR to have better PE isolation.
M64 BAR is a type of hardware resource in PHB3, which could map a range of
MMIO to PE numbers on powernv platform. And this range is divided equally
by the number of total_pe with each divided range mapping to a PE number.
Also, the M64 BAR must map a MMIO range with power-of-two size.

The total_pe number is usually different from total_VFs, which can lead to
a conflict between MMIO space and the PE number.

For example, if total_VFs is 128 and total_pe is 256, the second half of
M64 BAR will be part of other PCI device, which may already belong to other
PEs.

This patch prevents the conflict by reserving additional space for the PF
IOV BAR, which is total_pe number of VF's BAR size.

[bhelgaas: make dev_printk() output more consistent, index resource[]
conventionally]
Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/machdep.h        |    4 ++++
 arch/powerpc/include/asm/pci-bridge.h     |    3 +++
 arch/powerpc/kernel/pci-common.c          |    6 ++++++
 arch/powerpc/platforms/powernv/pci-ioda.c |   43 +++++++++++++++++++++++++++++
 4 files changed, 56 insertions(+)

diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index c8175a3..1d72fda 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -250,6 +250,10 @@ struct machdep_calls {
 	/* Reset the secondary bus of bridge */
 	void  (*pcibios_reset_secondary_bus)(struct pci_dev *dev);

+#ifdef CONFIG_PCI_IOV
+	void (*pcibios_fixup_sriov)(struct pci_dev *pdev);
+#endif /* CONFIG_PCI_IOV */
+
 	/* Called to shutdown machine specific hardware not already controlled
 	 * by other drivers.
 	 */
diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 513f8f2..d0d1718 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -175,6 +175,9 @@ struct pci_dn {
 #define IODA_INVALID_PE		(-1)
 #ifdef CONFIG_PPC_POWERNV
 	int	pe_number;
+#ifdef CONFIG_PCI_IOV
+	u16     vfs_expanded;		/* number of VFs IOV BAR expanded */
+#endif /* CONFIG_PCI_IOV */
 #endif
 	struct list_head child_list;
 	struct list_head list;
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 8203101..375bf70 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -990,6 +990,12 @@ int pcibios_add_device(struct pci_dev *dev)
 	 */
 	if (dev->bus->is_added)
 		pcibios_setup_device(dev);
+
+#ifdef CONFIG_PCI_IOV
+	if (ppc_md.pcibios_fixup_sriov)
+		ppc_md.pcibios_fixup_sriov(dev);
+#endif /* CONFIG_PCI_IOV */
+
 	return 0;
 }

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 1b37066..cadd3fb 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1749,6 +1749,46 @@ static void pnv_pci_init_ioda_msis(struct pnv_phb *phb)
 static void pnv_pci_init_ioda_msis(struct pnv_phb *phb) { }
 #endif /* CONFIG_PCI_MSI */

+#ifdef CONFIG_PCI_IOV
+static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
+{
+	struct pci_controller *hose;
+	struct pnv_phb *phb;
+	struct resource *res;
+	int i;
+	resource_size_t size;
+	struct pci_dn *pdn;
+
+	if (!pdev->is_physfn || pdev->is_added)
+		return;
+
+	hose = pci_bus_to_host(pdev->bus);
+	phb = hose->private_data;
+
+	pdn = pci_get_pdn(pdev);
+	pdn->vfs_expanded = 0;
+
+	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
+		res = &pdev->resource[i + PCI_IOV_RESOURCES];
+		if (!res->flags || res->parent)
+			continue;
+		if (!pnv_pci_is_mem_pref_64(res->flags)) {
+			dev_warn(&pdev->dev, "Skipping expanding VF BAR%d: %pR\n",
+				 i, res);
+			continue;
+		}
+
+		dev_dbg(&pdev->dev, "Fixing VF BAR%d: %pR to\n", i, res);
+		size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
+		res->end = res->start + size * phb->ioda.total_pe - 1;
+		dev_dbg(&pdev->dev, "%pR\n", res);
+		dev_info(&pdev->dev, "VF BAR%d: %pR (expanded to %d VFs for PE alignment)",
+				i, res, phb->ioda.total_pe);
+	}
+	pdn->vfs_expanded = phb->ioda.total_pe;
+}
+#endif /* CONFIG_PCI_IOV */
+
 /*
  * This function is supposed to be called on basis of PE from top
  * to bottom style. So the the I/O or MMIO segment assigned to
@@ -2125,6 +2165,9 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 	ppc_md.pcibios_enable_device_hook = pnv_pci_enable_device_hook;
 	ppc_md.pcibios_window_alignment =
[PATCH V15 18/21] powerpc/powernv: Reserve additional space for IOV BAR, with m64_per_iov supported
The M64 aperture size is limited on PHB3. When the IOV BAR is too big, the
expanded BAR exceeds this limit and fails to be assigned.

Introduce a different mechanism based on the IOV BAR size:

  - if the IOV BAR size is smaller than 64MB, expand to total_pe
  - if the IOV BAR size is bigger than 64MB, round up to a power of two

[bhelgaas: make dev_printk() output more consistent, use PCI_SRIOV_NUM_BARS]
Signed-off-by: Wei Yang <weiy...@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/pci-bridge.h     |    2 ++
 arch/powerpc/platforms/powernv/pci-ioda.c |   33 ++++++++++++++++++++++++++++++---
 2 files changed, 32 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 3c95097..d6942c9 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -179,6 +179,8 @@ struct pci_dn {
 	u16     vfs_expanded;		/* number of VFs IOV BAR expanded */
 	u16     num_vfs;		/* number of VFs enabled*/
 	int     offset;			/* PE# for the first VF PE */
+#define M64_PER_IOV 4
+	int     m64_per_iov;
 #define IODA_INVALID_M64        (-1)
 	int     m64_wins[PCI_SRIOV_NUM_BARS];
 #endif /* CONFIG_PCI_IOV */
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 11262df..2c13a39 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2250,6 +2250,7 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
 	int i;
 	resource_size_t size;
 	struct pci_dn *pdn;
+	int mul, total_vfs;
 
 	if (!pdev->is_physfn || pdev->is_added)
 		return;
@@ -2260,6 +2261,32 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
 	pdn = pci_get_pdn(pdev);
 	pdn->vfs_expanded = 0;
 
+	total_vfs = pci_sriov_get_totalvfs(pdev);
+	pdn->m64_per_iov = 1;
+	mul = phb->ioda.total_pe;
+
+	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
+		res = &pdev->resource[i + PCI_IOV_RESOURCES];
+		if (!res->flags || res->parent)
+			continue;
+		if (!pnv_pci_is_mem_pref_64(res->flags)) {
+			dev_warn(&pdev->dev, "non M64 VF BAR%d: %pR\n",
+				 i, res);
+			continue;
+		}
+
+		size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
+
+		/* bigger than 64M */
+		if (size > (1 << 26)) {
+			dev_info(&pdev->dev, "PowerNV: VF BAR%d: %pR IOV size is bigger than 64M, roundup power2\n",
+				 i, res);
+			pdn->m64_per_iov = M64_PER_IOV;
+			mul = roundup_pow_of_two(total_vfs);
+			break;
+		}
+	}
+
 	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
 		res = &pdev->resource[i + PCI_IOV_RESOURCES];
 		if (!res->flags || res->parent)
@@ -2272,12 +2299,12 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
 
 		dev_dbg(&pdev->dev, "Fixing VF BAR%d: %pR to\n", i, res);
 		size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
-		res->end = res->start + size * phb->ioda.total_pe - 1;
+		res->end = res->start + size * mul - 1;
 		dev_dbg(&pdev->dev, "%pR\n", res);
 		dev_info(&pdev->dev, "VF BAR%d: %pR (expanded to %d VFs for PE alignment)",
-			 i, res, phb->ioda.total_pe);
+			 i, res, mul);
 	}
-	pdn->vfs_expanded = phb->ioda.total_pe;
+	pdn->vfs_expanded = mul;
 }
 #endif /* CONFIG_PCI_IOV */
 
-- 
1.7.9.5

_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
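The sizing rule in this patch can be tried outside the kernel. The following is a minimal userspace sketch, not the kernel code: roundup_pow2(), iov_bar_multiplier() and expanded_iov_bar_size() are hypothetical names, and the sketch assumes every BAR passed in is an M64-capable VF BAR (the patch additionally skips BARs that are unset, already claimed, or not 64-bit prefetchable).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 64MB threshold used by the patch: 1 << 26 */
#define IOV_BAR_LIMIT (UINT64_C(1) << 26)

/* Round n up to the next power of two; stands in for roundup_pow_of_two(). */
static uint64_t roundup_pow2(uint64_t n)
{
	uint64_t p = 1;

	assert(n >= 1);
	while (p < n)
		p <<= 1;
	return p;
}

/*
 * Pick the expansion multiplier for a PF: total_pe by default, or
 * roundup_pow2(total_vfs) as soon as one VF BAR is bigger than 64MB.
 */
static uint64_t iov_bar_multiplier(const uint64_t *vf_bar_size, size_t nbars,
				   uint64_t total_pe, uint64_t total_vfs)
{
	size_t i;

	for (i = 0; i < nbars; i++) {
		if (vf_bar_size[i] > IOV_BAR_LIMIT)
			return roundup_pow2(total_vfs);
	}
	return total_pe;
}

/* Space reserved for one expanded IOV BAR: per-VF size times multiplier. */
static uint64_t expanded_iov_bar_size(uint64_t vf_bar_size, uint64_t mul)
{
	return vf_bar_size * mul;
}
```

With total_pe = 256 and total_vfs = 100, a PF whose VF BARs are all at most 64MB is expanded by 256, while a PF with one 128MB VF BAR is expanded by 128, which keeps the reserved range smaller for large BARs.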
[PATCH V15 19/21] powerpc/powernv: Group VF PE when IOV BAR is big on PHB3
When IOV BAR is big, each is covered by 4 M64 windows. This leads to several VF PE sits in one PE in terms of M64. Group VF PEs according to the M64 allocation. [bhelgaas: use dev_printk() when possible] Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com --- arch/powerpc/include/asm/pci-bridge.h |2 +- arch/powerpc/platforms/powernv/pci-ioda.c | 197 ++--- 2 files changed, 154 insertions(+), 45 deletions(-) diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h index d6942c9..ec83b51 100644 --- a/arch/powerpc/include/asm/pci-bridge.h +++ b/arch/powerpc/include/asm/pci-bridge.h @@ -182,7 +182,7 @@ struct pci_dn { #define M64_PER_IOV 4 int m64_per_iov; #define IODA_INVALID_M64(-1) - int m64_wins[PCI_SRIOV_NUM_BARS]; + int m64_wins[PCI_SRIOV_NUM_BARS][M64_PER_IOV]; #endif /* CONFIG_PCI_IOV */ #endif struct list_head child_list; diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index 2c13a39..dc9f401 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -1156,26 +1156,27 @@ static int pnv_pci_vf_release_m64(struct pci_dev *pdev) struct pci_controller *hose; struct pnv_phb*phb; struct pci_dn *pdn; - inti; + inti, j; bus = pdev-bus; hose = pci_bus_to_host(bus); phb = hose-private_data; pdn = pci_get_pdn(pdev); - for (i = 0; i PCI_SRIOV_NUM_BARS; i++) { - if (pdn-m64_wins[i] == IODA_INVALID_M64) - continue; - opal_pci_phb_mmio_enable(phb-opal_id, - OPAL_M64_WINDOW_TYPE, pdn-m64_wins[i], 0); - clear_bit(pdn-m64_wins[i], phb-ioda.m64_bar_alloc); - pdn-m64_wins[i] = IODA_INVALID_M64; - } + for (i = 0; i PCI_SRIOV_NUM_BARS; i++) + for (j = 0; j M64_PER_IOV; j++) { + if (pdn-m64_wins[i][j] == IODA_INVALID_M64) + continue; + opal_pci_phb_mmio_enable(phb-opal_id, + OPAL_M64_WINDOW_TYPE, pdn-m64_wins[i][j], 0); + clear_bit(pdn-m64_wins[i][j], phb-ioda.m64_bar_alloc); + pdn-m64_wins[i][j] = IODA_INVALID_M64; + } return 0; } -static int 
pnv_pci_vf_assign_m64(struct pci_dev *pdev) +static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs) { struct pci_bus*bus; struct pci_controller *hose; @@ -1183,17 +1184,33 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev) struct pci_dn *pdn; unsigned int win; struct resource *res; - inti; + inti, j; int64_trc; + inttotal_vfs; + resource_size_tsize, start; + intpe_num; + intvf_groups; + intvf_per_group; bus = pdev-bus; hose = pci_bus_to_host(bus); phb = hose-private_data; pdn = pci_get_pdn(pdev); + total_vfs = pci_sriov_get_totalvfs(pdev); /* Initialize the m64_wins to IODA_INVALID_M64 */ for (i = 0; i PCI_SRIOV_NUM_BARS; i++) - pdn-m64_wins[i] = IODA_INVALID_M64; + for (j = 0; j M64_PER_IOV; j++) + pdn-m64_wins[i][j] = IODA_INVALID_M64; + + if (pdn-m64_per_iov == M64_PER_IOV) { + vf_groups = (num_vfs = M64_PER_IOV) ? num_vfs: M64_PER_IOV; + vf_per_group = (num_vfs = M64_PER_IOV)? 1: + roundup_pow_of_two(num_vfs) / pdn-m64_per_iov; + } else { + vf_groups = 1; + vf_per_group = 1; + } for (i = 0; i PCI_SRIOV_NUM_BARS; i++) { res = pdev-resource[i + PCI_IOV_RESOURCES]; @@ -1203,35 +1220,61 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev) if (!pnv_pci_is_mem_pref_64(res-flags)) continue; - do { - win = find_next_zero_bit(phb-ioda.m64_bar_alloc, - phb-ioda.m64_bar_idx + 1, 0); - - if (win = phb-ioda.m64_bar_idx + 1) - goto m64_failed; - } while (test_and_set_bit(win, phb-ioda.m64_bar_alloc)); + for (j = 0; j vf_groups; j++) { + do { + win = find_next_zero_bit(phb-ioda.m64_bar_alloc, + phb-ioda.m64_bar_idx + 1, 0); + + if (win = phb-ioda.m64_bar_idx + 1) +
[PATCH 20/27] powerpc/cell: Move controller ops from ppc_md to controller_ops
This moves the Cell platform to use the pci_controller_ops structure rather than ppc_md for PCI controller operations. This depends on the patch to drop celleb support: http://patchwork.ozlabs.org/patch/451730/ Signed-off-by: Daniel Axtens d...@axtens.net --- arch/powerpc/platforms/cell/cell.h | 24 arch/powerpc/platforms/cell/iommu.c | 7 --- arch/powerpc/platforms/cell/setup.c | 5 + 3 files changed, 33 insertions(+), 3 deletions(-) create mode 100644 arch/powerpc/platforms/cell/cell.h diff --git a/arch/powerpc/platforms/cell/cell.h b/arch/powerpc/platforms/cell/cell.h new file mode 100644 index 000..ef143df --- /dev/null +++ b/arch/powerpc/platforms/cell/cell.h @@ -0,0 +1,24 @@ +/* + * Cell Platform common data structures + * + * Copyright 2015, Daniel Axtens, IBM Corporation + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2, or (at your option) + * any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ */ + +#ifndef CELL_H +#define CELL_H + +#include asm/pci-bridge.h + +extern struct pci_controller_ops cell_pci_controller_ops; + +#endif diff --git a/arch/powerpc/platforms/cell/iommu.c b/arch/powerpc/platforms/cell/iommu.c index 31b1a67..4cb120f 100644 --- a/arch/powerpc/platforms/cell/iommu.c +++ b/arch/powerpc/platforms/cell/iommu.c @@ -39,6 +39,7 @@ #include asm/firmware.h #include asm/cell-regs.h +#include cell.h #include interrupt.h /* Define CELL_IOMMU_REAL_UNMAP to actually unmap non-used pages @@ -857,7 +858,7 @@ static int __init cell_iommu_init_disabled(void) cell_dma_direct_offset += base; if (cell_dma_direct_offset != 0) - ppc_md.pci_dma_dev_setup = cell_pci_dma_dev_setup; + cell_pci_controller_ops.dma_dev_setup = cell_pci_dma_dev_setup; printk(iommu: disabled, direct DMA offset is 0x%lx\n, cell_dma_direct_offset); @@ -1197,8 +1198,8 @@ static int __init cell_iommu_init(void) if (cell_iommu_init_disabled() == 0) goto bail; - /* Setup various ppc_md. callbacks */ - ppc_md.pci_dma_dev_setup = cell_pci_dma_dev_setup; + /* Setup various callbacks */ + cell_pci_controller_ops.dma_dev_setup = cell_pci_dma_dev_setup; ppc_md.dma_get_required_mask = cell_dma_get_required_mask; ppc_md.tce_build = tce_build_cell; ppc_md.tce_free = tce_free_cell; diff --git a/arch/powerpc/platforms/cell/setup.c b/arch/powerpc/platforms/cell/setup.c index d62aa98..d1be268 100644 --- a/arch/powerpc/platforms/cell/setup.c +++ b/arch/powerpc/platforms/cell/setup.c @@ -54,6 +54,7 @@ #include asm/cell-regs.h #include asm/io-workarounds.h +#include cell.h #include interrupt.h #include pervasive.h #include ras.h @@ -131,6 +132,8 @@ static int cell_setup_phb(struct pci_controller *phb) if (model == NULL || strcmp(np-name, pci)) return 0; + phb-controller_ops = cell_pci_controller_ops; + /* Setup workarounds for spider */ if (strcmp(model, Spider)) return 0; @@ -279,3 +282,5 @@ define_machine(cell) { .init_IRQ = cell_init_irq, .pci_setup_phb = cell_setup_phb, }; + +struct 
pci_controller_ops cell_pci_controller_ops;
-- 
2.1.4
[PATCH V15 07/21] PCI: Refresh First VF Offset and VF Stride when updating NumVFs
The First VF Offset and VF Stride fields depend on the NumVFs setting, so
refresh the cached fields in struct pci_sriov when updating NumVFs. See
the SR-IOV spec r1.1, sec 3.3.9 and 3.3.10.

[bhelgaas: changelog, remove kernel-doc comment marker]
Signed-off-by: Wei Yang <weiy...@linux.vnet.ibm.com>
Acked-by: Bjorn Helgaas <bhelg...@google.com>
---
 drivers/pci/iov.c |   23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 27b98c3..a8752c2 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -31,6 +31,21 @@ static inline u8 virtfn_devfn(struct pci_dev *dev, int id)
 		dev->sriov->stride * id) & 0xff;
 }
 
+/*
+ * Per SR-IOV spec sec 3.3.10 and 3.3.11, First VF Offset and VF Stride may
+ * change when NumVFs changes.
+ *
+ * Update iov->offset and iov->stride when NumVFs is written.
+ */
+static inline void pci_iov_set_numvfs(struct pci_dev *dev, int nr_virtfn)
+{
+	struct pci_sriov *iov = dev->sriov;
+
+	pci_write_config_word(dev, iov->pos + PCI_SRIOV_NUM_VF, nr_virtfn);
+	pci_read_config_word(dev, iov->pos + PCI_SRIOV_VF_OFFSET, &iov->offset);
+	pci_read_config_word(dev, iov->pos + PCI_SRIOV_VF_STRIDE, &iov->stride);
+}
+
 static struct pci_bus *virtfn_add_bus(struct pci_bus *bus, int busnr)
 {
 	struct pci_bus *child;
@@ -253,7 +268,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
 			return rc;
 	}
 
-	pci_write_config_word(dev, iov->pos + PCI_SRIOV_NUM_VF, nr_virtfn);
+	pci_iov_set_numvfs(dev, nr_virtfn);
 	iov->ctrl |= PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE;
 	pci_cfg_access_lock(dev);
 	pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
@@ -282,7 +297,7 @@ failed:
 	iov->ctrl &= ~(PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE);
 	pci_cfg_access_lock(dev);
 	pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
-	pci_write_config_word(dev, iov->pos + PCI_SRIOV_NUM_VF, 0);
+	pci_iov_set_numvfs(dev, 0);
 	ssleep(1);
 	pci_cfg_access_unlock(dev);
 
@@ -313,7 +328,7 @@ static void sriov_disable(struct pci_dev *dev)
 		sysfs_remove_link(&dev->dev.kobj, "dep_link");
 
 	iov->num_VFs = 0;
-	pci_write_config_word(dev, iov->pos + PCI_SRIOV_NUM_VF, 0);
+	pci_iov_set_numvfs(dev, 0);
 }
 
 static int sriov_init(struct pci_dev *dev, int pos)
@@ -452,7 +467,7 @@ static void sriov_restore_state(struct pci_dev *dev)
 		pci_update_resource(dev, i);
 
 	pci_write_config_dword(dev, iov->pos + PCI_SRIOV_SYS_PGSIZE, iov->pgsz);
-	pci_write_config_word(dev, iov->pos + PCI_SRIOV_NUM_VF, iov->num_VFs);
+	pci_iov_set_numvfs(dev, iov->num_VFs);
 	pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
 	if (iov->ctrl & PCI_SRIOV_CTRL_VFE)
 		msleep(100);
-- 
1.7.9.5
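To see why the cached offset and stride need refreshing, here is a small userspace sketch of the write-then-re-read pattern. It is an illustration only: struct mock_sriov_regs, struct sriov_cache and all function names are hypothetical, and the way the mock device changes its offset and stride above 8 VFs is invented purely to make the effect visible. Only the devfn formula is taken from the virtfn_devfn() context shown in the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a PF's SR-IOV capability registers. */
struct mock_sriov_regs {
	uint16_t num_vfs;
	uint16_t first_vf_offset;
	uint16_t vf_stride;
};

/* Software-side cache, loosely modelled on fields of struct pci_sriov. */
struct sriov_cache {
	uint16_t offset;
	uint16_t stride;
};

/*
 * Invented device behaviour, for illustration only: the device reports a
 * different First VF Offset and VF Stride once more than 8 VFs are requested.
 */
static void mock_write_numvfs(struct mock_sriov_regs *regs, uint16_t num_vfs)
{
	regs->num_vfs = num_vfs;
	if (num_vfs > 8) {
		regs->first_vf_offset = 0x80;
		regs->vf_stride = 2;
	} else {
		regs->first_vf_offset = 0x10;
		regs->vf_stride = 1;
	}
}

/* Write NumVFs, then re-read offset and stride into the cache. */
static void set_numvfs(struct mock_sriov_regs *regs, struct sriov_cache *cache,
		       uint16_t num_vfs)
{
	mock_write_numvfs(regs, num_vfs);
	cache->offset = regs->first_vf_offset;
	cache->stride = regs->vf_stride;
}

/* devfn of VF number id, using the same formula as virtfn_devfn(). */
static uint8_t vf_devfn(uint8_t pf_devfn, const struct sriov_cache *cache, int id)
{
	assert(id >= 0);
	return (uint8_t)((pf_devfn + cache->offset + cache->stride * id) & 0xff);
}
```

In this mock, VF 3 of a PF at devfn 0 lands at 0x13 with 4 VFs enabled and at 0x86 with 16 VFs enabled; a cache that was only filled once would keep computing the first value.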
[PATCH V15 13/21] powerpc/pci: Don't unset PCI resources for VFs
The PCI_REASSIGN_ALL_RSRC flag tells the kernel to ignore the resource
information set up by firmware and to re-assign all resources of PCI devices.
On powerpc, this happens in the header fixup function
pcibios_fixup_resources(), which clears the resources if the flag is set.
This works fine for PFs, since after the cleanup the kernel re-assigns the
resources in pcibios_resource_survey(). Below is a simple call flow showing
how it works:

    pcibios_init
      pcibios_scan_phb
        pci_scan_child_bus
          ...
            pci_device_add
              pci_fixup_device(pci_fixup_header)
                pcibios_fixup_resources     # header fixup
                  for (i = 0; i < DEVICE_COUNT_RESOURCE; i++)
                    dev->resource[i].start = 0
      pcibios_resource_survey               # re-assign
        pcibios_allocate_resources

However, the VF resources won't be re-assigned, since they are completely
determined by the PF resources, and the PF resources have already been
re-assigned. This means we need to leave the VF's resources uncleared in
pcibios_fixup_resources().

In this patch, we skip the resource unset process in
pcibios_fixup_resources() if the pci_dev is a VF.

Signed-off-by: Wei Yang <weiy...@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/pci-common.c |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 2a525c9..8203101 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -788,6 +788,10 @@ static void pcibios_fixup_resources(struct pci_dev *dev)
 		       pci_name(dev));
 		return;
 	}
+
+	if (dev->is_virtfn)
+		return;
+
 	for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) {
 		struct resource *res = dev->resource + i;
 		struct pci_bus_region reg;
-- 
1.7.9.5
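The effect of the early return can be modelled in a few lines of userspace C. This is a sketch under simplifying assumptions: struct mock_pci_dev, struct mock_resource, NUM_RESOURCES and fixup_resources() are hypothetical, and "unsetting" a resource is reduced to rebasing it at zero while keeping its size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative count; stands in for DEVICE_COUNT_RESOURCE. */
#define NUM_RESOURCES 4

struct mock_resource {
	uint64_t start;
	uint64_t end;
};

struct mock_pci_dev {
	bool is_virtfn;
	struct mock_resource resource[NUM_RESOURCES];
};

/*
 * Drop the firmware-assigned addresses so they can be re-assigned later.
 * VFs are left alone because their ranges follow from the PF's IOV BARs.
 */
static void fixup_resources(struct mock_pci_dev *dev)
{
	size_t i;

	assert(dev != NULL);
	if (dev->is_virtfn)
		return;

	for (i = 0; i < NUM_RESOURCES; i++) {
		/* Keep the size, forget the address. */
		dev->resource[i].end -= dev->resource[i].start;
		dev->resource[i].start = 0;
	}
}
```

Running fixup_resources() on a PF with a resource at 0x1000-0x1fff leaves it at 0x0-0xfff, while a VF with a resource at 0x4000-0x4fff keeps its range unchanged.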
[PATCH V15 17/21] powerpc/powernv: Shift VF resource with an offset
On PowerNV platform, resource position in M64 BAR implies the PE# the resource belongs to. In some cases, adjustment of a resource is necessary to locate it to a correct position in M64 BAR . This patch adds pnv_pci_vf_resource_shift() to shift the 'real' PF IOV BAR address according to an offset. Note: After doing so, there would be a hole in the /proc/iomem when offset is a positive value. It looks like the device return some mmio back to the system, which actually no one could use it. [bhelgaas: rework loops, rework overlap check, index resource[] conventionally, remove pci_regs.h include, squashed with next patch] Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com --- arch/powerpc/include/asm/pci-bridge.h |4 + arch/powerpc/kernel/pci_dn.c | 13 + arch/powerpc/platforms/powernv/pci-ioda.c | 528 - arch/powerpc/platforms/powernv/pci.c | 18 + arch/powerpc/platforms/powernv/pci.h |7 + 5 files changed, 553 insertions(+), 17 deletions(-) diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h index d0d1718..3c95097 100644 --- a/arch/powerpc/include/asm/pci-bridge.h +++ b/arch/powerpc/include/asm/pci-bridge.h @@ -177,6 +177,10 @@ struct pci_dn { int pe_number; #ifdef CONFIG_PCI_IOV u16 vfs_expanded; /* number of VFs IOV BAR expanded */ + u16 num_vfs;/* number of VFs enabled*/ + int offset; /* PE# for the first VF PE */ +#define IODA_INVALID_M64(-1) + int m64_wins[PCI_SRIOV_NUM_BARS]; #endif /* CONFIG_PCI_IOV */ #endif struct list_head child_list; diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c index f3a1a81..93ed7b3 100644 --- a/arch/powerpc/kernel/pci_dn.c +++ b/arch/powerpc/kernel/pci_dn.c @@ -217,6 +217,19 @@ void remove_dev_pci_info(struct pci_dev *pdev) struct pci_dn *pdn, *tmp; int i; + /* +* VF and VF PE are created/released dynamically, so we need to +* bind/unbind them. Otherwise the VF and VF PE would be mismatched +* when re-enabling SR-IOV. 
+*/ + if (pdev-is_virtfn) { + pdn = pci_get_pdn(pdev); +#ifdef CONFIG_PPC_POWERNV + pdn-pe_number = IODA_INVALID_PE; +#endif + return; + } + /* Only support IOV PF for now */ if (!pdev-is_physfn) return; diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index 93ec16c..11262df 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -44,6 +44,9 @@ #include powernv.h #include pci.h +/* 256M DMA window, 4K TCE pages, 8 bytes TCE */ +#define TCE32_TABLE_SIZE ((0x1000 / 0x1000) * 8) + static void pe_level_printk(const struct pnv_ioda_pe *pe, const char *level, const char *fmt, ...) { @@ -56,11 +59,18 @@ static void pe_level_printk(const struct pnv_ioda_pe *pe, const char *level, vaf.fmt = fmt; vaf.va = args; - if (pe-pdev) + if (pe-flags PNV_IODA_PE_DEV) strlcpy(pfix, dev_name(pe-pdev-dev), sizeof(pfix)); - else + else if (pe-flags (PNV_IODA_PE_BUS | PNV_IODA_PE_BUS_ALL)) sprintf(pfix, %04x:%02x , pci_domain_nr(pe-pbus), pe-pbus-number); +#ifdef CONFIG_PCI_IOV + else if (pe-flags PNV_IODA_PE_VF) + sprintf(pfix, %04x:%02x:%2x.%d, + pci_domain_nr(pe-parent_dev-bus), + (pe-rid 0xff00) 8, + PCI_SLOT(pe-rid), PCI_FUNC(pe-rid)); +#endif /* CONFIG_PCI_IOV*/ printk(%spci %s: [PE# %.3d] %pV, level, pfix, pe-pe_number, vaf); @@ -591,7 +601,7 @@ static int pnv_ioda_set_peltv(struct pnv_phb *phb, bool is_add) { struct pnv_ioda_pe *slave; - struct pci_dev *pdev; + struct pci_dev *pdev = NULL; int ret; /* @@ -630,8 +640,12 @@ static int pnv_ioda_set_peltv(struct pnv_phb *phb, if (pe-flags (PNV_IODA_PE_BUS_ALL | PNV_IODA_PE_BUS)) pdev = pe-pbus-self; - else + else if (pe-flags PNV_IODA_PE_DEV) pdev = pe-pdev-bus-self; +#ifdef CONFIG_PCI_IOV + else if (pe-flags PNV_IODA_PE_VF) + pdev = pe-parent_dev-bus-self; +#endif /* CONFIG_PCI_IOV */ while (pdev) { struct pci_dn *pdn = pci_get_pdn(pdev); struct pnv_ioda_pe *parent; @@ -649,6 +663,87 @@ static int pnv_ioda_set_peltv(struct pnv_phb *phb, 
return 0; } +#ifdef CONFIG_PCI_IOV +static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe) +{ + struct pci_dev *parent; + uint8_t bcomp, dcomp, fcomp; + int64_t rc; + long rid_end, rid; + + /*
[PATCH 14/27] powerpc/powermac: Move controller ops from ppc_md to controller_ops
This moves the Power Mac platform to use the pci_controller_ops structure rather than ppc_md for PCI controller operations. Signed-off-by: Daniel Axtens d...@axtens.net --- arch/powerpc/platforms/powermac/pci.c | 17 +++-- arch/powerpc/platforms/powermac/pmac.h | 5 + arch/powerpc/platforms/powermac/setup.c | 4 +--- 3 files changed, 17 insertions(+), 9 deletions(-) diff --git a/arch/powerpc/platforms/powermac/pci.c b/arch/powerpc/platforms/powermac/pci.c index 9c89fd2..59ab16f 100644 --- a/arch/powerpc/platforms/powermac/pci.c +++ b/arch/powerpc/platforms/powermac/pci.c @@ -27,6 +27,8 @@ #include asm/grackle.h #include asm/ppc-pci.h +#include pmac.h + #undef DEBUG #ifdef DEBUG @@ -798,6 +800,7 @@ static int __init pmac_add_bridge(struct device_node *dev) return -ENOMEM; hose-first_busno = bus_range ? bus_range[0] : 0; hose-last_busno = bus_range ? bus_range[1] : 0xff; + hose-controller_ops = pmac_pci_controller_ops; disp_name = NULL; @@ -942,7 +945,7 @@ void __init pmac_pci_init(void) } #ifdef CONFIG_PPC32 -bool pmac_pci_enable_device_hook(struct pci_dev *dev) +static bool pmac_pci_enable_device_hook(struct pci_dev *dev) { struct device_node* node; int updatecfg = 0; @@ -1225,7 +1228,7 @@ static void fixup_u4_pcie(struct pci_dev* dev) DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_APPLE, PCI_DEVICE_ID_APPLE_U4_PCIE, fixup_u4_pcie); #ifdef CONFIG_PPC64 -int pmac_pci_probe_mode(struct pci_bus *bus) +static int pmac_pci_probe_mode(struct pci_bus *bus) { struct device_node *node = pci_bus_to_OF_node(bus); @@ -1240,3 +1243,13 @@ int pmac_pci_probe_mode(struct pci_bus *bus) return PCI_PROBE_DEVTREE; } #endif /* CONFIG_PPC64 */ + +struct pci_controller_ops pmac_pci_controller_ops = { +#ifdef CONFIG_PPC64 + .probe_mode = pmac_pci_probe_mode, +#endif +#ifdef CONFIG_PPC32 + .enable_device_hook = pmac_pci_enable_device_hook, +#endif +}; + diff --git a/arch/powerpc/platforms/powermac/pmac.h b/arch/powerpc/platforms/powermac/pmac.h index b8d5721..e7f8163 100644 --- 
a/arch/powerpc/platforms/powermac/pmac.h +++ b/arch/powerpc/platforms/powermac/pmac.h @@ -25,7 +25,6 @@ extern void pmac_pci_init(void); extern void pmac_nvram_update(void); extern unsigned char pmac_nvram_read_byte(int addr); extern void pmac_nvram_write_byte(int addr, unsigned char val); -extern bool pmac_pci_enable_device_hook(struct pci_dev *dev); extern void pmac_pcibios_after_init(void); extern int of_show_percpuinfo(struct seq_file *m, int i); @@ -39,8 +38,6 @@ extern void low_cpu_die(void) __attribute__((noreturn)); extern int pmac_nvram_init(void); extern void pmac_pic_init(void); -#ifdef CONFIG_PPC64 -extern int pmac_pci_probe_mode(struct pci_bus *bus); -#endif +extern struct pci_controller_ops pmac_pci_controller_ops; #endif /* __PMAC_H__ */ diff --git a/arch/powerpc/platforms/powermac/setup.c b/arch/powerpc/platforms/powermac/setup.c index 71a353c..8dd78f4 100644 --- a/arch/powerpc/platforms/powermac/setup.c +++ b/arch/powerpc/platforms/powermac/setup.c @@ -473,7 +473,7 @@ static void __init pmac_init_early(void) udbg_adb_init(!!strstr(boot_command_line, btextdbg)); #ifdef CONFIG_PPC64 - iommu_init_early_dart(NULL); + iommu_init_early_dart(pmac_pci_controller_ops); #endif /* SMP Init has to be done early as we need to patch up @@ -656,12 +656,10 @@ define_machine(powermac) { .feature_call = pmac_do_feature_call, .progress = udbg_progress, #ifdef CONFIG_PPC64 - .pci_probe_mode = pmac_pci_probe_mode, .power_save = power4_idle, .enable_pmcs= power4_enable_pmcs, #endif /* CONFIG_PPC64 */ #ifdef CONFIG_PPC32 - .pcibios_enable_device_hook = pmac_pci_enable_device_hook, .pcibios_after_init = pmac_pcibios_after_init, .phys_mem_access_prot = pci_phys_mem_access_prot, #endif -- 2.1.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 17/27] powerpc/pasemi: Move controller ops from ppc_md to controller_ops
This moves the PaSemi platform to use the pci_controller_ops structure rather than ppc_md for PCI controller operations. Signed-off-by: Daniel Axtens d...@axtens.net --- arch/powerpc/platforms/pasemi/iommu.c | 6 -- arch/powerpc/platforms/pasemi/pasemi.h | 1 + arch/powerpc/platforms/pasemi/pci.c| 5 + 3 files changed, 10 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/platforms/pasemi/iommu.c b/arch/powerpc/platforms/pasemi/iommu.c index 2e576f2..b8f567b 100644 --- a/arch/powerpc/platforms/pasemi/iommu.c +++ b/arch/powerpc/platforms/pasemi/iommu.c @@ -27,6 +27,8 @@ #include asm/machdep.h #include asm/firmware.h +#include pasemi.h + #define IOBMAP_PAGE_SHIFT 12 #define IOBMAP_PAGE_SIZE (1 IOBMAP_PAGE_SHIFT) #define IOBMAP_PAGE_MASK (IOBMAP_PAGE_SIZE - 1) @@ -248,8 +250,8 @@ void __init iommu_init_early_pasemi(void) iob_init(NULL); - ppc_md.pci_dma_dev_setup = pci_dma_dev_setup_pasemi; - ppc_md.pci_dma_bus_setup = pci_dma_bus_setup_pasemi; + pasemi_pci_controller_ops.dma_dev_setup = pci_dma_dev_setup_pasemi; + pasemi_pci_controller_ops.dma_bus_setup = pci_dma_bus_setup_pasemi; ppc_md.tce_build = iobmap_build; ppc_md.tce_free = iobmap_free; set_pci_dma_ops(dma_iommu_ops); diff --git a/arch/powerpc/platforms/pasemi/pasemi.h b/arch/powerpc/platforms/pasemi/pasemi.h index ea65bf0..11f230a 100644 --- a/arch/powerpc/platforms/pasemi/pasemi.h +++ b/arch/powerpc/platforms/pasemi/pasemi.h @@ -30,5 +30,6 @@ static inline void restore_astate(int cpu) } #endif +extern struct pci_controller_ops pasemi_pci_controller_ops; #endif /* _PASEMI_PASEMI_H */ diff --git a/arch/powerpc/platforms/pasemi/pci.c b/arch/powerpc/platforms/pasemi/pci.c index aa86271..f3a68a0 100644 --- a/arch/powerpc/platforms/pasemi/pci.c +++ b/arch/powerpc/platforms/pasemi/pci.c @@ -31,6 +31,8 @@ #include asm/ppc-pci.h +#include pasemi.h + #define PA_PXP_CFA(bus, devfn, off) (((bus) 20) | ((devfn) 12) | (off)) static inline int pa_pxp_offset_valid(u8 bus, u8 devfn, int offset) @@ -199,6 +201,7 @@ static 
int __init pas_add_bridge(struct device_node *dev)
 
 	hose->first_busno = 0;
 	hose->last_busno = 0xff;
+	hose->controller_ops = pasemi_pci_controller_ops;
 
 	setup_pa_pxp(hose);
 
@@ -239,3 +242,5 @@ void __iomem *pasemi_pci_getcfgaddr(struct pci_dev *dev, int offset)
 
 	return (void __iomem *)pa_pxp_cfg_addr(hose, dev->bus->number, dev->devfn, offset);
 }
+
+struct pci_controller_ops pasemi_pci_controller_ops;
-- 
2.1.4
[PATCH 16/27] powerpc/powernv: Move controller ops from ppc_md to controller_ops
This moves the PowerNV platform to use the pci_controller_ops structure rather than ppc_md for PCI controller operations. Signed-off-by: Daniel Axtens d...@axtens.net --- arch/powerpc/platforms/powernv/pci-ioda.c | 7 --- arch/powerpc/platforms/powernv/pci-p5ioc2.c | 1 + arch/powerpc/platforms/powernv/pci.c| 5 - arch/powerpc/platforms/powernv/powernv.h| 2 ++ 4 files changed, 11 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index c18e191..b4e46bf 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -1988,6 +1988,7 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np, hose-last_busno = 0xff; } hose-private_data = phb; + hose-controller_ops = pnv_pci_controller_ops; phb-hub_id = hub_id; phb-opal_id = phb_id; phb-type = ioda_type; @@ -2104,9 +2105,9 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np, * the child P2P bridges) can form individual PE. 
*/ ppc_md.pcibios_fixup = pnv_pci_ioda_fixup; - ppc_md.pcibios_enable_device_hook = pnv_pci_enable_device_hook; - ppc_md.pcibios_window_alignment = pnv_pci_window_alignment; - ppc_md.pcibios_reset_secondary_bus = pnv_pci_reset_secondary_bus; + pnv_pci_controller_ops.enable_device_hook = pnv_pci_enable_device_hook; + pnv_pci_controller_ops.window_alignment = pnv_pci_window_alignment; + pnv_pci_controller_ops.reset_secondary_bus = pnv_pci_reset_secondary_bus; pci_add_flags(PCI_REASSIGN_ALL_RSRC); /* Reset IODA tables to a clean state */ diff --git a/arch/powerpc/platforms/powernv/pci-p5ioc2.c b/arch/powerpc/platforms/powernv/pci-p5ioc2.c index 6ef6d4d..4729ca7 100644 --- a/arch/powerpc/platforms/powernv/pci-p5ioc2.c +++ b/arch/powerpc/platforms/powernv/pci-p5ioc2.c @@ -133,6 +133,7 @@ static void __init pnv_pci_init_p5ioc2_phb(struct device_node *np, u64 hub_id, phb-hose-first_busno = 0; phb-hose-last_busno = 0xff; phb-hose-private_data = phb; + phb-hose-controller_ops = pnv_pci_controller_ops; phb-hub_id = hub_id; phb-opal_id = phb_id; phb-type = PNV_PHB_P5IOC2; diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c index c8939ad..63518b3 100644 --- a/arch/powerpc/platforms/powernv/pci.c +++ b/arch/powerpc/platforms/powernv/pci.c @@ -761,7 +761,6 @@ void __init pnv_pci_init(void) pci_devs_phb_init(); /* Configure IOMMU DMA hooks */ - ppc_md.pci_dma_dev_setup = pnv_pci_dma_dev_setup; ppc_md.tce_build = pnv_tce_build_vm; ppc_md.tce_free = pnv_tce_free_vm; ppc_md.tce_build_rm = pnv_tce_build_rm; @@ -777,3 +776,7 @@ void __init pnv_pci_init(void) } machine_subsys_initcall_sync(powernv, tce_iommu_bus_notifier_init); + +struct pci_controller_ops pnv_pci_controller_ops = { + .dma_dev_setup = pnv_pci_dma_dev_setup, +}; diff --git a/arch/powerpc/platforms/powernv/powernv.h b/arch/powerpc/platforms/powernv/powernv.h index 604c48e..826d2c9 100644 --- a/arch/powerpc/platforms/powernv/powernv.h +++ b/arch/powerpc/platforms/powernv/powernv.h @@ 
-29,6 +29,8 @@ static inline u64 pnv_pci_dma_get_required_mask(struct pci_dev *pdev)
 }
 #endif
 
+extern struct pci_controller_ops pnv_pci_controller_ops;
+
 extern u32 pnv_get_supported_cpuidle_states(void);
 
 extern void pnv_lpc_init(void);
-- 
2.1.4
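One C detail worth keeping in mind when reading the `hose->controller_ops = ...` assignments in this series: the ops table is embedded in the controller by value, so the assignment copies the struct as it is at that moment. The sketch below shows only that language behaviour with hypothetical mock types; it does not model any platform's real initialisation order, and whether the ordering matters for a given platform depends on when its hooks are filled in relative to the copy.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced, hypothetical version of the ops table: one hook only. */
struct mock_controller_ops {
	int (*enable_device_hook)(int devfn);
};

struct mock_controller {
	struct mock_controller_ops controller_ops;	/* embedded by value */
};

/* Example hook that refuses every device. */
static int deny_all(int devfn)
{
	(void)devfn;
	return 0;
}

/* Call the hook if one is set, otherwise fall back to "allowed". */
static int controller_enable_device(const struct mock_controller *hose, int devfn)
{
	assert(hose != NULL);
	if (hose->controller_ops.enable_device_hook)
		return hose->controller_ops.enable_device_hook(devfn);
	return 1;
}

/* Returns 1 if a hook set on the template after the copy is not seen by the copy. */
static int late_hook_is_missed(void)
{
	struct mock_controller_ops template_ops = { NULL };
	struct mock_controller hose;

	hose.controller_ops = template_ops;		/* copy by value */
	template_ops.enable_device_hook = deny_all;	/* set afterwards */
	return hose.controller_ops.enable_device_hook == NULL;
}
```

A controller that copies the table after the hook is installed sees deny_all; one that copied it earlier keeps the fallback behaviour.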
[PATCH 21/27] powerpc: Remove shim for pci_controller_ops.window_alignment
Signed-off-by: Daniel Axtens d...@axtens.net --- arch/powerpc/include/asm/machdep.h| 3 --- arch/powerpc/include/asm/pci-bridge.h | 18 -- arch/powerpc/kernel/pci-common.c | 12 +++- 3 files changed, 11 insertions(+), 22 deletions(-) diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h index 9d4a067..f1476b8 100644 --- a/arch/powerpc/include/asm/machdep.h +++ b/arch/powerpc/include/asm/machdep.h @@ -244,9 +244,6 @@ struct machdep_calls { /* Called after scan and before resource survey */ void (*pcibios_fixup_phb)(struct pci_controller *hose); - /* Called during PCI resource reassignment */ - resource_size_t (*pcibios_window_alignment)(struct pci_bus *, unsigned long type); - /* Reset the secondary bus of bridge */ void (*pcibios_reset_secondary_bus)(struct pci_dev *dev); diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h index ea9496b..b62e043 100644 --- a/arch/powerpc/include/asm/pci-bridge.h +++ b/arch/powerpc/include/asm/pci-bridge.h @@ -327,24 +327,6 @@ static inline bool enable_device_hook(struct pci_dev *dev) return true; } -static inline resource_size_t pci_window_alignment(struct pci_bus *bus, - unsigned long type) -{ - struct pci_controller *hose = pci_bus_to_host(bus); - - if (hose-controller_ops.window_alignment) - return hose-controller_ops.window_alignment(bus, type); - if (ppc_md.pcibios_window_alignment) - return ppc_md.pcibios_window_alignment(bus, type); - - /* -* PCI core will figure out the default -* alignment: 4KiB for I/O and 1MiB for -* memory window. 
-*/ - return 1; -} - static inline void reset_secondary_bus(struct pci_dev *dev) { struct pci_controller *hose = pci_bus_to_host(dev-bus); diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c index 67d4dcb..9edb479 100644 --- a/arch/powerpc/kernel/pci-common.c +++ b/arch/powerpc/kernel/pci-common.c @@ -109,7 +109,17 @@ void pcibios_free_controller(struct pci_controller *phb) resource_size_t pcibios_window_alignment(struct pci_bus *bus, unsigned long type) { - return pci_window_alignment(bus, type); + struct pci_controller *hose = pci_bus_to_host(bus); + + if (hose-controller_ops.window_alignment) + return hose-controller_ops.window_alignment(bus, type); + + /* +* PCI core will figure out the default +* alignment: 4KiB for I/O and 1MiB for +* memory window. +*/ + return 1; } void pcibios_reset_secondary_bus(struct pci_dev *dev) -- 2.1.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
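The lookup that this patch moves into pcibios_window_alignment() can be sketched in plain C outside the kernel. The types and names below (struct bus, struct controller, window_alignment_sketch, example_align) are hypothetical stand-ins, not kernel definitions; only the shape of the logic is taken from the diff: use the per-controller hook when it is set, otherwise return 1 so that the core picks its default.

```c
#include <assert.h>
#include <stddef.h>

struct bus;

/* Hypothetical stand-in for struct pci_controller_ops. */
struct controller_ops {
	unsigned long (*window_alignment)(struct bus *bus, unsigned long type);
};

/* Hypothetical stand-in for struct pci_controller. */
struct controller {
	struct controller_ops ops;
};

/* Hypothetical stand-in for struct pci_bus. */
struct bus {
	struct controller *host;
};

/* Same decision as the post-patch function: hook first, then fall back. */
static unsigned long window_alignment_sketch(struct bus *bus, unsigned long type)
{
	struct controller *hose;

	assert(bus != NULL && bus->host != NULL);
	hose = bus->host;

	if (hose->ops.window_alignment)
		return hose->ops.window_alignment(bus, type);

	/* No hook: returning 1 leaves the default alignment to the caller. */
	return 1;
}

/* Made-up platform hook: asks for 16 MiB whatever the window type is. */
static unsigned long example_align(struct bus *bus, unsigned long type)
{
	(void)bus;
	(void)type;
	return 16UL << 20;
}
```

A controller whose ops table is left zeroed takes the fallback path, which is the case the removed ppc_md pointer used to sit in front of.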
[PATCH 26/27] powerpc: Remove shim for pci_controller_ops.dma_bus_setup
Signed-off-by: Daniel Axtens d...@axtens.net --- arch/powerpc/include/asm/machdep.h| 2 -- arch/powerpc/include/asm/pci-bridge.h | 14 -- arch/powerpc/kernel/pci-common.c | 5 - arch/powerpc/sysdev/dart_iommu.c | 3 --- 4 files changed, 4 insertions(+), 20 deletions(-) diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h index 2f7b319..92b085b 100644 --- a/arch/powerpc/include/asm/machdep.h +++ b/arch/powerpc/include/asm/machdep.h @@ -103,8 +103,6 @@ struct machdep_calls { #endif #endif /* CONFIG_PPC64 */ - void(*pci_dma_bus_setup)(struct pci_bus *bus); - /* Platform set_dma_mask and dma_get_required_mask overrides */ int (*dma_set_mask)(struct device *dev, u64 dma_mask); u64 (*dma_get_required_mask)(struct device *dev); diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h index e578f67..4f39ef9 100644 --- a/arch/powerpc/include/asm/pci-bridge.h +++ b/arch/powerpc/include/asm/pci-bridge.h @@ -277,19 +277,5 @@ static inline int pcibios_vaddr_is_ioport(void __iomem *address) } #endif /* CONFIG_PCI */ -/* - * Shims to prefer pci_controller version over ppc_md where available. 
- */ - -static inline void dma_bus_setup(struct pci_bus *bus) -{ - struct pci_controller *hose = pci_bus_to_host(bus); - - if (hose-controller_ops.dma_bus_setup) - hose-controller_ops.dma_bus_setup(bus); - else if (ppc_md.pci_dma_bus_setup) - ppc_md.pci_dma_bus_setup(bus); -} - #endif /* __KERNEL__ */ #endif /* _ASM_POWERPC_PCI_BRIDGE_H */ diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c index 433b387..7447b10 100644 --- a/arch/powerpc/kernel/pci-common.c +++ b/arch/powerpc/kernel/pci-common.c @@ -946,6 +946,7 @@ static void pcibios_fixup_bridge(struct pci_bus *bus) void pcibios_setup_bus_self(struct pci_bus *bus) { + struct pci_controller *hose; /* Fix up the bus resources for P2P bridges */ if (bus-self != NULL) pcibios_fixup_bridge(bus); @@ -957,7 +958,9 @@ void pcibios_setup_bus_self(struct pci_bus *bus) ppc_md.pcibios_fixup_bus(bus); /* Setup bus DMA mappings */ - dma_bus_setup(bus); + hose = pci_bus_to_host(bus); + if (hose-controller_ops.dma_bus_setup) + hose-controller_ops.dma_bus_setup(bus); } static void pcibios_setup_device(struct pci_dev *dev) diff --git a/arch/powerpc/sysdev/dart_iommu.c b/arch/powerpc/sysdev/dart_iommu.c index ca38b1e..87b8000 100644 --- a/arch/powerpc/sysdev/dart_iommu.c +++ b/arch/powerpc/sysdev/dart_iommu.c @@ -398,8 +398,6 @@ void __init iommu_init_early_dart(struct pci_controller_ops *controller_ops) if (controller_ops) { controller_ops-dma_dev_setup = pci_dma_dev_setup_dart; controller_ops-dma_bus_setup = pci_dma_bus_setup_dart; - } else { - ppc_md.pci_dma_bus_setup = pci_dma_bus_setup_dart; } /* Setup pci_dma ops */ set_pci_dma_ops(dma_iommu_ops); @@ -411,7 +409,6 @@ void __init iommu_init_early_dart(struct pci_controller_ops *controller_ops) controller_ops-dma_dev_setup = NULL; controller_ops-dma_bus_setup = NULL; } - ppc_md.pci_dma_bus_setup = NULL; /* Setup pci_dma ops */ set_pci_dma_ops(dma_direct_ops); -- 2.1.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org 
https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH V15 02/21] powerpc/powernv: Use pci_dn, not device_node, in PCI config accessor
The PCI config accessors previously relied on device_node. Unfortunately, VFs don't have a corresponding device_node, so change the accessors to use pci_dn instead. [bhelgaas: changelog] Signed-off-by: Gavin Shan gws...@linux.vnet.ibm.com --- arch/powerpc/platforms/powernv/eeh-powernv.c | 14 +- arch/powerpc/platforms/powernv/pci.c | 69 ++ arch/powerpc/platforms/powernv/pci.h |4 +- 3 files changed, 40 insertions(+), 47 deletions(-) diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c index e261869..7a5021b 100644 --- a/arch/powerpc/platforms/powernv/eeh-powernv.c +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c @@ -430,21 +430,31 @@ static inline bool powernv_eeh_cfg_blocked(struct device_node *dn) static int powernv_eeh_read_config(struct device_node *dn, int where, int size, u32 *val) { + struct pci_dn *pdn = PCI_DN(dn); + + if (!pdn) + return PCIBIOS_DEVICE_NOT_FOUND; + if (powernv_eeh_cfg_blocked(dn)) { *val = 0x; return PCIBIOS_SET_FAILED; } - return pnv_pci_cfg_read(dn, where, size, val); + return pnv_pci_cfg_read(pdn, where, size, val); } static int powernv_eeh_write_config(struct device_node *dn, int where, int size, u32 val) { + struct pci_dn *pdn = PCI_DN(dn); + + if (!pdn) + return PCIBIOS_DEVICE_NOT_FOUND; + if (powernv_eeh_cfg_blocked(dn)) return PCIBIOS_SET_FAILED; - return pnv_pci_cfg_write(dn, where, size, val); + return pnv_pci_cfg_write(pdn, where, size, val); } /** diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c index e69142f..6c20d6e 100644 --- a/arch/powerpc/platforms/powernv/pci.c +++ b/arch/powerpc/platforms/powernv/pci.c @@ -366,9 +366,9 @@ static void pnv_pci_handle_eeh_config(struct pnv_phb *phb, u32 pe_no) spin_unlock_irqrestore(phb-lock, flags); } -static void pnv_pci_config_check_eeh(struct pnv_phb *phb, -struct device_node *dn) +static void pnv_pci_config_check_eeh(struct pci_dn *pdn) { + struct pnv_phb *phb = pdn-phb-private_data; u8 fstate; 
__be16 pcierr; int pe_no; @@ -379,7 +379,7 @@ static void pnv_pci_config_check_eeh(struct pnv_phb *phb, * setup that yet. So all ER errors should be mapped to * reserved PE. */ - pe_no = PCI_DN(dn)-pe_number; + pe_no = pdn-pe_number; if (pe_no == IODA_INVALID_PE) { if (phb-type == PNV_PHB_P5IOC2) pe_no = 0; @@ -407,8 +407,7 @@ static void pnv_pci_config_check_eeh(struct pnv_phb *phb, } cfg_dbg( - EEH check, bdfn=%04x PE#%d fstate=%x\n, - (PCI_DN(dn)-busno 8) | (PCI_DN(dn)-devfn), - pe_no, fstate); + (pdn-busno 8) | (pdn-devfn), pe_no, fstate); /* Clear the frozen state if applicable */ if (fstate == OPAL_EEH_STOPPED_MMIO_FREEZE || @@ -425,10 +424,9 @@ static void pnv_pci_config_check_eeh(struct pnv_phb *phb, } } -int pnv_pci_cfg_read(struct device_node *dn, +int pnv_pci_cfg_read(struct pci_dn *pdn, int where, int size, u32 *val) { - struct pci_dn *pdn = PCI_DN(dn); struct pnv_phb *phb = pdn-phb-private_data; u32 bdfn = (pdn-busno 8) | pdn-devfn; s64 rc; @@ -462,10 +460,9 @@ int pnv_pci_cfg_read(struct device_node *dn, return PCIBIOS_SUCCESSFUL; } -int pnv_pci_cfg_write(struct device_node *dn, +int pnv_pci_cfg_write(struct pci_dn *pdn, int where, int size, u32 val) { - struct pci_dn *pdn = PCI_DN(dn); struct pnv_phb *phb = pdn-phb-private_data; u32 bdfn = (pdn-busno 8) | pdn-devfn; @@ -489,18 +486,17 @@ int pnv_pci_cfg_write(struct device_node *dn, } #if CONFIG_EEH -static bool pnv_pci_cfg_check(struct pci_controller *hose, - struct device_node *dn) +static bool pnv_pci_cfg_check(struct pci_dn *pdn) { struct eeh_dev *edev = NULL; - struct pnv_phb *phb = hose-private_data; + struct pnv_phb *phb = pdn-phb-private_data; /* EEH not enabled ? */ if (!(phb-flags PNV_PHB_FLAG_EEH)) return true; /* PE reset or device removed ? 
*/ - edev = of_node_to_eeh_dev(dn); + edev = pdn-edev; if (edev) { if (edev-pe (edev-pe-state EEH_PE_CFG_BLOCKED)) @@ -513,8 +509,7 @@ static bool pnv_pci_cfg_check(struct pci_controller *hose, return true; } #else -static inline pnv_pci_cfg_check(struct pci_controller *hose, - struct device_node *dn) +static inline pnv_pci_cfg_check(struct pci_dn *pdn) {
[PATCH V15 11/21] PCI: Add pcibios_iov_resource_alignment() interface
Per the SR-IOV spec r1.1, sec 3.3.14, the required alignment of a PF's IOV
BAR is the size of an individual VF BAR, and the size consumed is the
individual VF BAR size times NumVFs.

The PowerNV platform has additional alignment requirements to help support
its Partitionable Endpoint device isolation feature (see
Documentation/powerpc/pci_iov_resource_on_powernv.txt).

Add a pcibios_iov_resource_alignment() interface to allow platforms to
request additional alignment.

[bhelgaas: changelog, adapt to reworked pci_sriov_resource_alignment(),
drop align parameter]
Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com
Acked-by: Bjorn Helgaas bhelg...@google.com
---
 drivers/pci/iov.c   |    8 +++++++-
 include/linux/pci.h |    1 +
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 64c4692..ee0ebff 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -569,6 +569,12 @@ int pci_iov_resource_bar(struct pci_dev *dev, int resno)
 		4 * (resno - PCI_IOV_RESOURCES);
 }
 
+resource_size_t __weak pcibios_iov_resource_alignment(struct pci_dev *dev,
+						      int resno)
+{
+	return pci_iov_resource_size(dev, resno);
+}
+
 /**
  * pci_sriov_resource_alignment - get resource alignment for VF BAR
  * @dev: the PCI device
@@ -581,7 +587,7 @@ int pci_iov_resource_bar(struct pci_dev *dev, int resno)
  */
 resource_size_t pci_sriov_resource_alignment(struct pci_dev *dev, int resno)
 {
-	return pci_iov_resource_size(dev, resno);
+	return pcibios_iov_resource_alignment(dev, resno);
 }
 
 /**
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 99ea948..4e1f17d 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1174,6 +1174,7 @@ unsigned char pci_bus_max_busnr(struct pci_bus *bus);
 void pci_setup_bridge(struct pci_bus *bus);
 resource_size_t pcibios_window_alignment(struct pci_bus *bus,
 					 unsigned long type);
+resource_size_t pcibios_iov_resource_alignment(struct pci_dev *dev, int resno);
 
 #define PCI_VGA_STATE_CHANGE_BRIDGE (1 << 0)
 #define PCI_VGA_STATE_CHANGE_DECODES (1 << 1)
-- 
1.7.9.5
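The arithmetic in the changelog above can be worked through in a small user-space sketch. Everything named here (struct iov_bar, the helper functions) is hypothetical and exists only for illustration. In particular, iov_platform_alignment() is a made-up policy standing in for "a platform asks for more"; the real PowerNV rule is not part of this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of one IOV BAR of a PF; not a kernel structure. */
struct iov_bar {
	uint64_t vf_bar_size;	/* size of a single VF BAR, a power of two */
	uint16_t total_vfs;	/* maximum number of VFs the PF supports */
};

/* Default rule from the changelog: align to the size of one VF BAR. */
static uint64_t iov_default_alignment(const struct iov_bar *bar)
{
	return bar->vf_bar_size;
}

/* Space consumed by the IOV BAR: one VF BAR per enabled VF. */
static uint64_t iov_consumed_size(const struct iov_bar *bar, uint16_t num_vfs)
{
	assert(num_vfs <= bar->total_vfs);
	return bar->vf_bar_size * num_vfs;
}

/*
 * Made-up platform policy, for illustration only: align to the span of
 * all possible VFs instead of a single VF BAR.
 */
static uint64_t iov_platform_alignment(const struct iov_bar *bar)
{
	return bar->vf_bar_size * bar->total_vfs;
}

/* Round addr up to a multiple of align (align must be a power of two). */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr + align - 1) & ~(align - 1);
}
```

With a 64 KiB VF BAR and 16 possible VFs, the default alignment is 64 KiB while the made-up platform policy asks for 1 MiB, so the same candidate address can round up to different places depending on which function the core consults.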
[PATCH V15 10/21] PCI: Add pcibios_sriov_enable() and pcibios_sriov_disable()
VFs are dynamically created when a driver enables them. On some platforms,
like PowerNV, special resources are necessary to enable VFs.

Add platform hooks for enabling and disabling VFs.

Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com
Acked-by: Bjorn Helgaas bhelg...@google.com
---
 drivers/pci/iov.c |   19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 5643a10..64c4692 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -220,6 +220,11 @@ static void virtfn_remove(struct pci_dev *dev, int id, int reset)
 	pci_dev_put(dev);
 }
 
+int __weak pcibios_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
+{
+	return 0;
+}
+
 static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
 {
 	int rc;
@@ -231,6 +236,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
 	struct pci_sriov *iov = dev->sriov;
 	int bars = 0;
 	int bus;
+	int retval;
 
 	if (!nr_virtfn)
 		return 0;
@@ -307,6 +313,12 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
 	if (nr_virtfn < initial)
 		initial = nr_virtfn;
 
+	if ((retval = pcibios_sriov_enable(dev, initial))) {
+		dev_err(&dev->dev, "failure %d from pcibios_sriov_enable()\n",
+			retval);
+		return retval;
+	}
+
 	for (i = 0; i < initial; i++) {
 		rc = virtfn_add(dev, i, 0);
 		if (rc)
@@ -335,6 +347,11 @@ failed:
 	return rc;
 }
 
+int __weak pcibios_sriov_disable(struct pci_dev *pdev)
+{
+	return 0;
+}
+
 static void sriov_disable(struct pci_dev *dev)
 {
 	int i;
@@ -346,6 +363,8 @@ static void sriov_disable(struct pci_dev *dev)
 	for (i = 0; i < iov->num_VFs; i++)
 		virtfn_remove(dev, i, 0);
 
+	pcibios_sriov_disable(dev);
+
 	iov->ctrl &= ~(PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE);
 	pci_cfg_access_lock(dev);
 	pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
-- 
1.7.9.5
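The weak-default pattern used by these hooks can be tried in user space. The sketch below assumes GCC or Clang, whose __attribute__((weak)) is what the kernel's __weak macro expands to; the names platform_sriov_enable, platform_sriov_disable, sriov_enable_sketch and vfs_present are hypothetical and only model the ordering shown in the diff (hook before any VF is added, hook after all VFs are removed).

```c
#include <assert.h>

/* Number of VFs currently "present" in this toy model. */
static int vfs_present;

/*
 * Weak defaults: a platform that needs extra setup would provide strong
 * definitions with the same names in another translation unit, and the
 * linker would pick those instead.
 */
__attribute__((weak)) int platform_sriov_enable(int num_vfs)
{
	(void)num_vfs;
	return 0;
}

__attribute__((weak)) int platform_sriov_disable(void)
{
	return 0;
}

/* Enable path: ask the platform first, add VFs only if it agrees. */
static int sriov_enable_sketch(int num_vfs)
{
	int retval;

	assert(num_vfs >= 0);
	if (!num_vfs)
		return 0;

	retval = platform_sriov_enable(num_vfs);
	if (retval)
		return retval;	/* no VF has been added at this point */

	vfs_present = num_vfs;
	return 0;
}

/* Disable path: remove the VFs, then let the platform release resources. */
static void sriov_disable_sketch(void)
{
	vfs_present = 0;
	platform_sriov_disable();
}
```

The point of the weak default is that platforms with nothing to do need no code at all, while the generic path stays free of #ifdefs.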
[PATCH v2] powerpc/mpc85xx: Add MDIO bus muxing support to the board device tree(s)
From: Igal Liberman igal.liber...@freescale.com Describe the PHY topology for all configurations supported by each board Based on prior work by Andy Fleming aflem...@gmail.com Signed-off-by: Igal Liberman igal.liber...@freescale.com Signed-off-by: Shruti Kanetkar kanetkar.shr...@gmail.com Signed-off-by: Emil Medve emilian.me...@freescale.com --- v2: Remove 'Change-Id' arch/powerpc/boot/dts/b4860qds.dts| 60 - arch/powerpc/boot/dts/b4qds.dtsi | 51 - arch/powerpc/boot/dts/p1023rdb.dts| 24 +- arch/powerpc/boot/dts/p2041rdb.dts| 92 +++- arch/powerpc/boot/dts/p3041ds.dts | 112 +- arch/powerpc/boot/dts/p4080ds.dts | 184 +++- arch/powerpc/boot/dts/p5020ds.dts | 112 +- arch/powerpc/boot/dts/p5040ds.dts | 234 +++- arch/powerpc/boot/dts/t1040rdb.dts| 32 ++- arch/powerpc/boot/dts/t1042rdb.dts| 30 ++- arch/powerpc/boot/dts/t1042rdb_pi.dts | 18 +- arch/powerpc/boot/dts/t104xqds.dtsi | 178 ++- arch/powerpc/boot/dts/t104xrdb.dtsi | 33 ++- arch/powerpc/boot/dts/t2080qds.dts| 158 +- arch/powerpc/boot/dts/t2080rdb.dts| 67 +- arch/powerpc/boot/dts/t2081qds.dts| 221 ++- arch/powerpc/boot/dts/t4240qds.dts| 400 +- arch/powerpc/boot/dts/t4240rdb.dts| 149 - 18 files changed, 2135 insertions(+), 20 deletions(-) diff --git a/arch/powerpc/boot/dts/b4860qds.dts b/arch/powerpc/boot/dts/b4860qds.dts index 6bb3707..98b1ef4 100644 --- a/arch/powerpc/boot/dts/b4860qds.dts +++ b/arch/powerpc/boot/dts/b4860qds.dts @@ -1,7 +1,7 @@ /* * B4860DS Device Tree Source * - * Copyright 2012 Freescale Semiconductor Inc. + * Copyright 2012 - 2015 Freescale Semiconductor Inc. 
* * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are met: @@ -39,12 +39,69 @@ model = fsl,B4860QDS; compatible = fsl,B4860QDS; + aliases { + phy_sgmii_1e = phy_sgmii_1e; + phy_sgmii_1f = phy_sgmii_1f; + phy_xaui_slot1 = phy_xaui_slot1; + phy_xaui_slot2 = phy_xaui_slot2; + }; + ifc: localbus@ffe124000 { board-control@3,0 { compatible = fsl,b4860qds-fpga, fsl,fpga-qixis; }; }; + soc@ffe00 { + fman@40 { + ethernet@e8000 { + phy-handle = phy_sgmii_1e; + phy-connection-type = sgmii; + }; + + ethernet@ea000 { + phy-handle = phy_sgmii_1f; + phy-connection-type = sgmii; + }; + + ethernet@f { + phy-handle = phy_xaui_slot1; + phy-connection-type = xgmii; + }; + + ethernet@f2000 { + phy-handle = phy_xaui_slot2; + phy-connection-type = xgmii; + }; + + mdio@fc000 { + phy_sgmii_1e: ethernet-phy@1e { + reg = 0x1e; + status = disabled; + }; + + phy_sgmii_1f: ethernet-phy@1f { + reg = 0x1f; + status = disabled; + }; + }; + + mdio@fd000 { + phy_xaui_slot1: xaui-phy@slot1 { + compatible = ethernet-phy-ieee802.3-c45; + reg = 0x7; + status = disabled; + }; + + phy_xaui_slot2: xaui-phy@slot2 { + compatible = ethernet-phy-ieee802.3-c45; + reg = 0x6; + status = disabled; + }; + }; + }; + }; + rio: rapidio@ffe0c { reg = 0xf 0xfe0c 0 0x11000; @@ -55,7 +112,6 @@ ranges = 0 0 0xc 0x3000 0 0x1000; }; }; - }; /include/ fsl/b4860si-post.dtsi diff --git a/arch/powerpc/boot/dts/b4qds.dtsi b/arch/powerpc/boot/dts/b4qds.dtsi index 559d006..af49456 100644 --- a/arch/powerpc/boot/dts/b4qds.dtsi +++ b/arch/powerpc/boot/dts/b4qds.dtsi @@ -1,7 +1,7 @@ /* * B4420DS Device Tree Source * - * Copyright 2012 - 2014 Freescale Semiconductor, Inc. + * Copyright 2012 - 2015
Re: [PATCH v8 19/30] powerpc/pci: Use pci_scan_host_bridge() for simplicity
On 2015/3/25 7:58, Daniel Axtens wrote: On Tue, 2015-03-24 at 11:34 +0800, Yijing Wang wrote: Now we could use pci_scan_host_bridge() to scan pci buses, provide powerpc specific pci_host_bridge_ops. Signed-off-by: Yijing Wang wangyij...@huawei.com CC: Benjamin Herrenschmidt b...@kernel.crashing.org CC: linuxppc-dev@lists.ozlabs.org --- arch/powerpc/kernel/pci-common.c | 60 +++-- 1 files changed, 37 insertions(+), 23 deletions(-) diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c index 2c58200..e2b50a2 100644 --- a/arch/powerpc/kernel/pci-common.c +++ b/arch/powerpc/kernel/pci-common.c @@ -773,6 +773,29 @@ void pcibios_set_root_bus_speed(struct pci_host_bridge *bridge) return ppc_md.pcibios_set_root_bus_speed(bridge); } +static int pci_host_scan_bus(struct pci_host_bridge *host) +{ +int mode = PCI_PROBE_NORMAL; +struct pci_bus *bus = host-bus; +struct pci_controller *hose = dev_get_drvdata(host-dev); Is there any reason this isn't *hose = pci_bus_to_host(bus)? Hi Daniel, thanks for your review and comments. We want to make a generic pci_host_bridge, which would hold the common host information, for example, pci domain is common info for pci host bridge, this series saved domain in pci_host_bridge, then we no need to extract out domain by pci_bus-sysdata by platform specific pci_domain_nr(). Also we store the sysdata in pci_host_bridge, and pci_bus_to_host() is the platform interface, I think use the common interface would be better. 
+ +/* Get probe mode and perform scan */ +if (hose-dn ppc_md.pci_probe_mode) +mode = ppc_md.pci_probe_mode(bus); + +pr_debug(probe mode: %d\n, mode); +if (mode == PCI_PROBE_DEVTREE) +of_scan_bus(hose-dn, bus); + +if (mode == PCI_PROBE_NORMAL) { +pci_bus_update_busn_res_end(bus, 255); +hose-last_busno = pci_scan_child_bus(bus); +pci_bus_update_busn_res_end(bus, hose-last_busno); +} + +return pci_bus_child_max_busnr(bus); +} + I'm having trouble convincing myself that this patch covers every variation within our PCI implementations. In particular, there's a stanza in of_scan_pci_bridge in kernel/pci_of_scan.c that's almost identical to this function. Does that implementation need to be cleaned up and replaced with this function too? This is a pci_host_bridge_ops hook function, which would be called in PCI core, and after applied this series, we only need to call pci_scan_host_bridge() to scan pci devices, and this function is also extracted from the pcibios_scan_phb(), it's not the redundant code. @@ -1641,9 +1655,9 @@ void pcibios_scan_phb(struct pci_controller *hose) ppc_md.pcibios_fixup_phb(hose); /* Configure PCI Express settings */ -if (bus !pci_has_flag(PCI_PROBE_ONLY)) { +if (host-bus !pci_has_flag(PCI_PROBE_ONLY)) { struct pci_bus *child; -list_for_each_entry(child, bus-children, node) +list_for_each_entry(child, host-bus-children, node) pcie_bus_configure_settings(child); } } Two things: Firstly, the function uses hose throughout, not host. Secondly, you're not deleting the bus variable: what's the purpose of this change? host is the common pci_host_bridge which is created by PCI core for pci host bridge driver, the hose is the platform data used in powerpc. The purpose of the patch/series is to simplify pci enumeration interface, and try to reduce the weak functions which were used to setup pci bus/devices during PCI enumeration. Regards, Daniel -- Thanks! 
Yijing ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH V16 00/20] Enable SRIOV on POWER8
This patchset enables SRIOV on POWER8.

The general idea is to put each VF into its own PE and allocate the required
resources such as MMIO/DMA/MSI. The major difficulty comes from the MMIO
allocation and adjustment for the PF's IOV BAR.

On P8, we use M64BT to cover a PF's IOV BAR, which lets an individual VF
sit in its own PE. This gives more flexibility, while at the same time it
brings some restrictions on the PF's IOV BAR size and alignment.

To achieve this, we need to do some hacks on the PCI devices' resources:

1. Expand the IOV BAR properly.
   Done by pnv_pci_ioda_fixup_iov_resources().
2. Shift the IOV BAR properly.
   Done by pnv_pci_vf_resource_shift().
3. IOV BAR alignment is calculated by an arch-dependent function instead
   of being an individual VF BAR size.
   Done by pnv_pcibios_sriov_resource_alignment().
4. Take the IOV BAR alignment into consideration in the sizing and
   assigning. This is achieved by commit:
   PCI: Take additional IOV BAR alignment in sizing and assigning

Test Environment:
   The SRIOV devices tested are Emulex Lancer (10df:e220) and
   Mellanox ConnectX-3 (15b3:1003) on POWER8.

Example of passing a VF through to a guest via vfio:
1. Unbind the original driver and bind to the vfio-pci driver
   echo 0000:06:0d.0 > /sys/bus/pci/devices/0000:06:0d.0/driver/unbind
   echo 1102 0002 > /sys/bus/pci/drivers/vfio-pci/new_id
   Note: this should be done for each device in the same iommu_group
2. Start qemu and pass the device through vfio
   /home/ywywyang/git/qemu-impreza/ppc64-softmmu/qemu-system-ppc64 \
      -M pseries -m 2048 -enable-kvm -nographic \
      -drive file=/home/ywywyang/kvm/fc19.img \
      -monitor telnet:localhost:5435,server,nowait -boot cd \
      -device spapr-pci-vfio-host-bridge,id=CXGB3,iommu=26,index=6

Verify that it is really the VF that responds:
1. ping from a machine in the same subnet (the broadcast domain)
2. run arp -n on this machine
   9.115.251.20 ether 00:00:c9:df:ed:bf C eth0
3.
ifconfig in the guest # ifconfig eth1 eth1: flags=4163UP,BROADCAST,RUNNING,MULTICAST mtu 1500 inet 9.115.251.20 netmask 255.255.255.0 broadcast 9.115.251.255 inet6 fe80::200:c9ff:fedf:edbf prefixlen 64 scopeid 0x20link ether 00:00:c9:df:ed:bf txqueuelen 1000 (Ethernet) RX packets 175 bytes 13278 (12.9 KiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 58 bytes 9276 (9.0 KiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 4. They have the same MAC address Note: make sure you shutdown other network interfaces in guest. --- v16: * rebased on Ben's next-eeh * Following two patches have been divided into three. First two are already merged, the third one is renamed to powerpc/pci: Create pci_dn for VFs and sent in this patch set. 8ec20d6 powerpc/powernv: Use pci_dn, not device_node, in PCI config accessor a3460fc powerpc/pci: Refactor pci_dn v15: * Add Ack from Bjorn * Make more detailed comment for pnv_pci_vf_resource_shift() v14: * call ppc_md.pcibios_fixup_sriov() in pcibios_add_device * add more explanation in change log * Following patches have been reordered to the beginning. 8ec20d6 powerpc/powernv: Use pci_dn, not device_node, in PCI config accessor a3460fc powerpc/pci: Refactor pci_dn These two patches will be modified to merge with other patches which are under discussion/review in ppc mail list. Some changes may also be made in other patches, which I didn't include them in this series, so that the auto build robot could work on this. There may have several changes in powerpc arch, which not effect the pci core. So after this patch set pass the review in pci community, I would rebase this series on ppc brach and send out for comment. 
* use add_res-min_align as the alignment in reassign_resources_sorted() * some cleanup in Document v13: * fix error in pcibios_iov_resource_alignment(), use pdev instead of dev * rename vf_num to num_vfs in pcibios_sriov_enable(), pnv_pci_vf_resource_shift(), pnv_pci_sriov_disable(), pnv_pci_sriov_enable(), pnv_pci_ioda2_setup_dma_pe() * add more explanation in commit powerpc/pci: Don't unset PCI resources for VFs * fix IOV BAR in hotplug path as well, and don't fixup an already added device * use roundup_pow_of_two() instead of __roundup_pow_of_two() * this is based on v4.0-rc1 v12: * remove align parameter from pcibios_iov_resource_alignment() default version returns pci_iov_resource_size() instead of the align parameter * in powerpc pcibios_iov_resource_alignment(), return pci_iov_resource_size() if there's no ppc_md function pointer * in
[PATCH V16 05/20] PCI: Refresh First VF Offset and VF Stride when updating NumVFs
The First VF Offset and VF Stride fields depend on the NumVFs setting, so
refresh the cached fields in struct pci_sriov when updating NumVFs. See
the SR-IOV spec r1.1, sec 3.3.9 and 3.3.10.

[bhelgaas: changelog, remove kernel-doc comment marker]
Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com
Acked-by: Bjorn Helgaas bhelg...@google.com
---
 drivers/pci/iov.c |   23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 27b98c3..a8752c2 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -31,6 +31,21 @@ static inline u8 virtfn_devfn(struct pci_dev *dev, int id)
 		dev->sriov->stride * id) & 0xff;
 }
 
+/*
+ * Per SR-IOV spec sec 3.3.10 and 3.3.11, First VF Offset and VF Stride may
+ * change when NumVFs changes.
+ *
+ * Update iov->offset and iov->stride when NumVFs is written.
+ */
+static inline void pci_iov_set_numvfs(struct pci_dev *dev, int nr_virtfn)
+{
+	struct pci_sriov *iov = dev->sriov;
+
+	pci_write_config_word(dev, iov->pos + PCI_SRIOV_NUM_VF, nr_virtfn);
+	pci_read_config_word(dev, iov->pos + PCI_SRIOV_VF_OFFSET, &iov->offset);
+	pci_read_config_word(dev, iov->pos + PCI_SRIOV_VF_STRIDE, &iov->stride);
+}
+
 static struct pci_bus *virtfn_add_bus(struct pci_bus *bus, int busnr)
 {
 	struct pci_bus *child;
@@ -253,7 +268,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
 		return rc;
 	}
 
-	pci_write_config_word(dev, iov->pos + PCI_SRIOV_NUM_VF, nr_virtfn);
+	pci_iov_set_numvfs(dev, nr_virtfn);
 	iov->ctrl |= PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE;
 	pci_cfg_access_lock(dev);
 	pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
@@ -282,7 +297,7 @@ failed:
 	iov->ctrl &= ~(PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE);
 	pci_cfg_access_lock(dev);
 	pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
-	pci_write_config_word(dev, iov->pos + PCI_SRIOV_NUM_VF, 0);
+	pci_iov_set_numvfs(dev, 0);
 	ssleep(1);
 	pci_cfg_access_unlock(dev);
 
@@ -313,7 +328,7 @@ static void sriov_disable(struct pci_dev *dev)
 		sysfs_remove_link(&dev->dev.kobj, "dep_link");
 
 	iov->num_VFs = 0;
-	pci_write_config_word(dev, iov->pos + PCI_SRIOV_NUM_VF, 0);
+	pci_iov_set_numvfs(dev, 0);
 }
 
 static int sriov_init(struct pci_dev *dev, int pos)
@@ -452,7 +467,7 @@ static void sriov_restore_state(struct pci_dev *dev)
 		pci_update_resource(dev, i);
 
 	pci_write_config_dword(dev, iov->pos + PCI_SRIOV_SYS_PGSIZE, iov->pgsz);
-	pci_write_config_word(dev, iov->pos + PCI_SRIOV_NUM_VF, iov->num_VFs);
+	pci_iov_set_numvfs(dev, iov->num_VFs);
 	pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
 	if (iov->ctrl & PCI_SRIOV_CTRL_VFE)
 		msleep(100);
-- 
1.7.9.5
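To see why stale offset and stride values matter, here is a small user-space sketch of the routing ID arithmetic that virtfn_devfn() in the context above relies on. The struct and helper names are hypothetical, and the register values in the usage note are invented for the example rather than read from any device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cache of the two fields the patch keeps up to date. */
struct sriov_fields {
	uint16_t offset;	/* First VF Offset */
	uint16_t stride;	/* VF Stride */
};

/* Routing ID of VF number id, given the PF's routing ID (bus << 8 | devfn). */
static uint16_t vf_routing_id(uint16_t pf_rid, const struct sriov_fields *iov,
			      unsigned int id)
{
	assert(iov != NULL);
	return (uint16_t)(pf_rid + iov->offset + iov->stride * id);
}

/* Split a routing ID into its bus number and devfn parts. */
static uint8_t rid_bus(uint16_t rid)
{
	return (uint8_t)(rid >> 8);
}

static uint8_t rid_devfn(uint16_t rid)
{
	return (uint8_t)(rid & 0xff);
}
```

With a PF at 06:00.0 (routing ID 0x0600), offset 0x10 and stride 2, VF 3 lands at routing ID 0x0616. If the device reports a different offset after NumVFs is rewritten and the cache is not refreshed, every VF address computed from the old values points at the wrong function, or even the wrong bus.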
[PATCH V16 12/20] powerpc/pci: Don't unset PCI resources for VFs
Flag PCI_REASSIGN_ALL_RSRC is used to ignore resources information setup by
firmware, so that kernel would re-assign all resources of pci devices.

On powerpc arch, this happens in a header fixup function
pcibios_fixup_resources(), which will clean up the resources if this flag is
set. This works fine for PFs, since after clean up, kernel will re-assign the
resources in pcibios_resource_survey().

Below is a simple call flow on how it works:

    pcibios_init
      pcibios_scan_phb
        pci_scan_child_bus
          ...
            pci_device_add
              pci_fixup_device(pci_fixup_header)
                pcibios_fixup_resources    # header fixup
                  for (i = 0; i < DEVICE_COUNT_RESOURCE; i++)
                    dev->resource[i].start = 0
      pcibios_resource_survey              # re-assign
        pcibios_allocate_resources

However, the VF resources won't be re-assigned, since the VF resources are
completely determined by the PF resources, and the PF resources have already
been reassigned. This means we need to leave VF's resources un-cleared in
pcibios_fixup_resources().

In this patch, we skip the resource unset process in
pcibios_fixup_resources(), if the pci_dev is a VF.

Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com
---
 arch/powerpc/kernel/pci-common.c |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 2a525c9..8203101 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -788,6 +788,10 @@ static void pcibios_fixup_resources(struct pci_dev *dev)
 		       pci_name(dev));
 		return;
 	}
+
+	if (dev->is_virtfn)
+		return;
+
 	for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) {
 		struct resource *res = dev->resource + i;
 		struct pci_bus_region reg;
-- 
1.7.9.5
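The effect of the early return can be modelled outside the kernel. The sketch below uses hypothetical types (struct dev_sketch, struct resource_sketch) and an arbitrary resource count. Rebasing each resource to start at 0 while keeping its size is an assumption of the sketch; the changelog only shows the start being cleared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_RESOURCES 6	/* arbitrary for the sketch */

/* Hypothetical stand-in for struct resource. */
struct resource_sketch {
	uint64_t start;
	uint64_t end;
};

/* Hypothetical stand-in for struct pci_dev. */
struct dev_sketch {
	bool is_virtfn;
	struct resource_sketch resource[NUM_RESOURCES];
};

/* Header fixup: forget firmware addresses for PFs, leave VFs alone. */
static void fixup_resources_sketch(struct dev_sketch *dev, bool reassign_all)
{
	int i;

	assert(dev != NULL);

	if (!reassign_all)
		return;

	/* VF resources are derived from the PF's IOV BARs, so keep them. */
	if (dev->is_virtfn)
		return;

	for (i = 0; i < NUM_RESOURCES; i++) {
		/* Assumption: keep the size, drop the address. */
		dev->resource[i].end -= dev->resource[i].start;
		dev->resource[i].start = 0;
	}
}
```

Running the fixup over a PF and a VF with the reassign flag set clears the PF's addresses but leaves the VF's untouched, which is the behaviour the four added lines introduce.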
[PATCH V16 18/20] powerpc/powernv: Group VF PE when IOV BAR is big on PHB3
When IOV BAR is big, each is covered by 4 M64 windows. This leads to several VF PE sits in one PE in terms of M64. Group VF PEs according to the M64 allocation. [bhelgaas: use dev_printk() when possible] Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com --- arch/powerpc/include/asm/pci-bridge.h |2 +- arch/powerpc/platforms/powernv/pci-ioda.c | 197 ++--- 2 files changed, 154 insertions(+), 45 deletions(-) diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h index 415df85..560c739 100644 --- a/arch/powerpc/include/asm/pci-bridge.h +++ b/arch/powerpc/include/asm/pci-bridge.h @@ -185,7 +185,7 @@ struct pci_dn { #define M64_PER_IOV 4 int m64_per_iov; #define IODA_INVALID_M64(-1) - int m64_wins[PCI_SRIOV_NUM_BARS]; + int m64_wins[PCI_SRIOV_NUM_BARS][M64_PER_IOV]; #endif /* CONFIG_PCI_IOV */ #endif struct list_head child_list; diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index b63925f..33088f6 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -1156,26 +1156,27 @@ static int pnv_pci_vf_release_m64(struct pci_dev *pdev) struct pci_controller *hose; struct pnv_phb*phb; struct pci_dn *pdn; - inti; + inti, j; bus = pdev-bus; hose = pci_bus_to_host(bus); phb = hose-private_data; pdn = pci_get_pdn(pdev); - for (i = 0; i PCI_SRIOV_NUM_BARS; i++) { - if (pdn-m64_wins[i] == IODA_INVALID_M64) - continue; - opal_pci_phb_mmio_enable(phb-opal_id, - OPAL_M64_WINDOW_TYPE, pdn-m64_wins[i], 0); - clear_bit(pdn-m64_wins[i], phb-ioda.m64_bar_alloc); - pdn-m64_wins[i] = IODA_INVALID_M64; - } + for (i = 0; i PCI_SRIOV_NUM_BARS; i++) + for (j = 0; j M64_PER_IOV; j++) { + if (pdn-m64_wins[i][j] == IODA_INVALID_M64) + continue; + opal_pci_phb_mmio_enable(phb-opal_id, + OPAL_M64_WINDOW_TYPE, pdn-m64_wins[i][j], 0); + clear_bit(pdn-m64_wins[i][j], phb-ioda.m64_bar_alloc); + pdn-m64_wins[i][j] = IODA_INVALID_M64; + } return 0; } -static int 
pnv_pci_vf_assign_m64(struct pci_dev *pdev) +static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs) { struct pci_bus*bus; struct pci_controller *hose; @@ -1183,17 +1184,33 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev) struct pci_dn *pdn; unsigned int win; struct resource *res; - inti; + inti, j; int64_trc; + inttotal_vfs; + resource_size_tsize, start; + intpe_num; + intvf_groups; + intvf_per_group; bus = pdev-bus; hose = pci_bus_to_host(bus); phb = hose-private_data; pdn = pci_get_pdn(pdev); + total_vfs = pci_sriov_get_totalvfs(pdev); /* Initialize the m64_wins to IODA_INVALID_M64 */ for (i = 0; i PCI_SRIOV_NUM_BARS; i++) - pdn-m64_wins[i] = IODA_INVALID_M64; + for (j = 0; j M64_PER_IOV; j++) + pdn-m64_wins[i][j] = IODA_INVALID_M64; + + if (pdn-m64_per_iov == M64_PER_IOV) { + vf_groups = (num_vfs = M64_PER_IOV) ? num_vfs: M64_PER_IOV; + vf_per_group = (num_vfs = M64_PER_IOV)? 1: + roundup_pow_of_two(num_vfs) / pdn-m64_per_iov; + } else { + vf_groups = 1; + vf_per_group = 1; + } for (i = 0; i PCI_SRIOV_NUM_BARS; i++) { res = pdev-resource[i + PCI_IOV_RESOURCES]; @@ -1203,35 +1220,61 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev) if (!pnv_pci_is_mem_pref_64(res-flags)) continue; - do { - win = find_next_zero_bit(phb-ioda.m64_bar_alloc, - phb-ioda.m64_bar_idx + 1, 0); - - if (win = phb-ioda.m64_bar_idx + 1) - goto m64_failed; - } while (test_and_set_bit(win, phb-ioda.m64_bar_alloc)); + for (j = 0; j vf_groups; j++) { + do { + win = find_next_zero_bit(phb-ioda.m64_bar_alloc, + phb-ioda.m64_bar_idx + 1, 0); + + if (win = phb-ioda.m64_bar_idx + 1) +
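The grouping arithmetic in the hunk above is easier to follow with concrete numbers. The sketch below is a user-space reading of it, assuming the comparison that lost a character in this copy is "num_vfs <= M64_PER_IOV". struct vf_grouping, group_vfs() and roundup_pow_of_two_sketch() are hypothetical names; the kernel uses its own roundup_pow_of_two().

```c
#include <assert.h>

#define M64_PER_IOV 4

/* Smallest power of two >= n; only meant for small n in this sketch. */
static unsigned int roundup_pow_of_two_sketch(unsigned int n)
{
	unsigned int p = 1;

	assert(n > 0 && n <= (1u << 30));
	while (p < n)
		p <<= 1;
	return p;
}

/* Hypothetical result type: number of groups and VFs per group. */
struct vf_grouping {
	unsigned int groups;
	unsigned int per_group;
};

/* Mirrors the vf_groups / vf_per_group computation from the diff. */
static struct vf_grouping group_vfs(unsigned int num_vfs,
				    unsigned int m64_per_iov)
{
	struct vf_grouping g = { 1, 1 };

	if (m64_per_iov == M64_PER_IOV) {
		if (num_vfs <= M64_PER_IOV) {
			/* Few VFs: one window, and so one group, per VF. */
			g.groups = num_vfs;
			g.per_group = 1;
		} else {
			/* Many VFs: share the four windows evenly. */
			g.groups = M64_PER_IOV;
			g.per_group = roundup_pow_of_two_sketch(num_vfs) /
				      m64_per_iov;
		}
	}
	return g;
}
```

For example, 6 VFs round up to 8 and are spread over 4 groups of 2, so two of the eight slots stay unused.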
[PATCH 1/6] powerpc/mm: Remove duplicate declaration of setbat()
This is already declared in mmu_decl.h, so we don't need a second version
in the C file.

Signed-off-by: Michael Ellerman m...@ellerman.id.au
---
 arch/powerpc/mm/pgtable_32.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index 03b1a3b0fbd5..72555ab145cd 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -54,9 +54,6 @@ extern char etext[], _stext[];
 #ifdef HAVE_BATS
 extern phys_addr_t v_mapped_by_bats(unsigned long va);
 extern unsigned long p_mapped_by_bats(phys_addr_t pa);
-void setbat(int index, unsigned long virt, phys_addr_t phys,
-	    unsigned int size, int flags);
-
 #else /* !HAVE_BATS */
 #define v_mapped_by_bats(x)	(0UL)
 #define p_mapped_by_bats(x)	(0UL)
-- 
2.1.0

_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 3/6] powerpc: Make STRICT_MM_TYPECHECKS a config option
The STRICT_MM_TYPECHECKS code has bit-rotted over the years. To make it
possible to easily build test it, make it a CONFIG option.

Signed-off-by: Michael Ellerman m...@ellerman.id.au
---
 arch/powerpc/Kconfig.debug               | 8 ++++++++
 arch/powerpc/include/asm/page.h          | 4 +---
 arch/powerpc/include/asm/pgtable-ppc64.h | 2 +-
 3 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
index ec2e40f2cc11..777108f4acab 100644
--- a/arch/powerpc/Kconfig.debug
+++ b/arch/powerpc/Kconfig.debug
@@ -19,6 +19,14 @@ config PPC_WERROR
 	depends on !PPC_DISABLE_WERROR
 	default y
 
+config STRICT_MM_TYPECHECKS
+	bool "Do extra type checking on mm types"
+	default n
+	help
+	  This option turns on extra type checking for some mm related types.
+
+	  If you don't know what this means, say N.
+
 config PRINT_STACK_DEPTH
 	int "Stack depth to print" if DEBUG_KERNEL
 	default 64
diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index 69c059887a2c..71294a6e976e 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -278,9 +278,7 @@ extern long long virt_phys_offset;
 
 #ifndef __ASSEMBLY__
 
-#undef STRICT_MM_TYPECHECKS
-
-#ifdef STRICT_MM_TYPECHECKS
+#ifdef CONFIG_STRICT_MM_TYPECHECKS
 /* These are used to make use of C type-checking. */
 
 /* PTE level */
diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h
index 43e6ad424c7f..f951d9cf358a 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64.h
@@ -118,7 +118,7 @@
  */
 #ifndef __real_pte
 
-#ifdef STRICT_MM_TYPECHECKS
+#ifdef CONFIG_STRICT_MM_TYPECHECKS
 #define __real_pte(e,p)		((real_pte_t){(e)})
 #define __rpte_to_pte(r)	((r).pte)
 #else
-- 
2.1.0
[RFC PATCH 6/6] powerpc: Always enable STRICT_MM_TYPECHECKS
The argument for making this an option was that gcc produced inferior code
with it enabled. That doesn't seem to be the case anymore (gcc 4.9), so
turn it on always.

Signed-off-by: Michael Ellerman m...@ellerman.id.au
---
 arch/powerpc/Kconfig.debug               |  8 --------
 arch/powerpc/include/asm/page.h          | 40 ----------------------------------------
 arch/powerpc/include/asm/pgtable-ppc64.h |  5 -----
 3 files changed, 53 deletions(-)

diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
index 777108f4acab..ec2e40f2cc11 100644
--- a/arch/powerpc/Kconfig.debug
+++ b/arch/powerpc/Kconfig.debug
@@ -19,14 +19,6 @@ config PPC_WERROR
 	depends on !PPC_DISABLE_WERROR
 	default y
 
-config STRICT_MM_TYPECHECKS
-	bool "Do extra type checking on mm types"
-	default n
-	help
-	  This option turns on extra type checking for some mm related types.
-
-	  If you don't know what this means, say N.
-
 config PRINT_STACK_DEPTH
 	int "Stack depth to print" if DEBUG_KERNEL
 	default 64
diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index 71294a6e976e..6af9514106ab 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -278,7 +278,6 @@ extern long long virt_phys_offset;
 
 #ifndef __ASSEMBLY__
 
-#ifdef CONFIG_STRICT_MM_TYPECHECKS
 /* These are used to make use of C type-checking. */
 
 /* PTE level */
@@ -319,45 +318,6 @@ typedef struct { unsigned long pgprot; } pgprot_t;
 #define pgprot_val(x)	((x).pgprot)
 #define __pgprot(x)	((pgprot_t) { (x) })
 
-#else
-
-/*
- * .. while these make it easier on the compiler
- */
-
-typedef pte_basic_t pte_t;
-#define pte_val(x)	(x)
-#define __pte(x)	(x)
-
-#if defined(CONFIG_PPC_64K_PAGES) && defined(CONFIG_PPC_STD_MMU_64)
-typedef struct { pte_t pte; unsigned long hidx; } real_pte_t;
-#else
-typedef pte_t real_pte_t;
-#endif
-
-
-#ifdef CONFIG_PPC64
-typedef unsigned long pmd_t;
-#define pmd_val(x)	(x)
-#define __pmd(x)	(x)
-
-#ifndef CONFIG_PPC_64K_PAGES
-typedef unsigned long pud_t;
-#define pud_val(x)	(x)
-#define __pud(x)	(x)
-#endif /* !CONFIG_PPC_64K_PAGES */
-#endif /* CONFIG_PPC64 */
-
-typedef unsigned long pgd_t;
-#define pgd_val(x)	(x)
-#define pgprot_val(x)	(x)
-
-typedef unsigned long pgprot_t;
-#define __pgd(x)	(x)
-#define __pgprot(x)	(x)
-
-#endif
-
 typedef struct { signed long pd; } hugepd_t;
 
 #ifdef CONFIG_HUGETLB_PAGE
diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h
index f951d9cf358a..4fa3035d8449 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64.h
@@ -118,13 +118,8 @@
  */
 #ifndef __real_pte
 
-#ifdef CONFIG_STRICT_MM_TYPECHECKS
 #define __real_pte(e,p)		((real_pte_t){(e)})
 #define __rpte_to_pte(r)	((r).pte)
-#else
-#define __real_pte(e,p)		(e)
-#define __rpte_to_pte(r)	(__pte(r))
-#endif
 #define __rpte_to_hidx(r,index)	(pte_val(__rpte_to_pte(r)) >> 12)
 
 #define pte_iterate_hashed_subpages(rpte, psize, va, index, shift)       \
-- 
2.1.0
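What the strict variant buys can be sketched outside the kernel. The snippet below is a simplified illustration, not the real page.h: the field types are plain unsigned long here, and `pte_flags()` and its 0xfff mask are hypothetical.

```c
/* Strict flavour: each page-table type is a distinct one-member struct. */
typedef struct { unsigned long pte; } pte_t;
typedef struct { unsigned long pgprot; } pgprot_t;

#define pte_val(x)	((x).pte)
#define __pte(x)	((pte_t) { (x) })
#define pgprot_val(x)	((x).pgprot)
#define __pgprot(x)	((pgprot_t) { (x) })

/* Hypothetical helper: accepts only a pte_t, never a bare integer. */
static unsigned long pte_flags(pte_t pte)
{
	return pte_val(pte) & 0xfffUL;	/* illustrative mask */
}
```

With these definitions `pte_flags(__pte(0x1234))` compiles, while `pte_flags(0x1234UL)` or `pte_flags(__pgprot(7))` is rejected by the compiler. With the non-strict typedefs removed by this patch (`typedef unsigned long pgprot_t;` and friends), such mix-ups would compile silently.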
Re: [PATCH 01/27] powerpc: move find_and_init_phbs() to pSeries specific code
Hi Daniel,

On Wed, 25 Mar 2015 16:35:35 +1100 Daniel Axtens d...@axtens.net wrote:
>
> Previously, find_and_init_phbs() was used in both PowerNV and pSeries
> setup. However, since RTAS support has been dropped from PowerNV, we can
> move it into a platform-specific file.
>
> This patch depends on the patch to drop RTAS support from PowerNV:
> http://patchwork.ozlabs.org/patch/449316/

In the future, you should put this sort of commentary (this last
paragraph) below the --- line as we don't really want it in the commit
message, right? This is more instructions to the reviewers/committer
than description of the change.

> Signed-off-by: Daniel Axtens d...@axtens.net
> ---
>  arch/powerpc/include/asm/ppc-pci.h     | 3 ---
>  arch/powerpc/kernel/rtas_pci.c         | 47 --
>  arch/powerpc/platforms/pseries/setup.c | 47 ++
>  3 files changed, 47 insertions(+), 50 deletions(-)

-- 
Cheers,
Stephen Rothwell s...@canb.auug.org.au
[PATCH V16 04/20] PCI: Index IOV resources in the conventional style
From: Bjorn Helgaas bhelg...@google.com

Most of PCI uses "res = &dev->resource[i]", not "res = dev->resource + i".
Use that style in iov.c also.

No functional change.

Signed-off-by: Bjorn Helgaas bhelg...@google.com
Acked-by: Wei Yang weiy...@linux.vnet.ibm.com
---
 drivers/pci/iov.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 5bca0e1..27b98c3 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -95,7 +95,7 @@ static int virtfn_add(struct pci_dev *dev, int id, int reset)
 	virtfn->multifunction = 0;
 
 	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
-		res = dev->resource + PCI_IOV_RESOURCES + i;
+		res = &dev->resource[i + PCI_IOV_RESOURCES];
 		if (!res->parent)
 			continue;
 		virtfn->resource[i].name = pci_name(virtfn);
@@ -212,7 +212,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
 	nres = 0;
 	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
 		bars |= (1 << (i + PCI_IOV_RESOURCES));
-		res = dev->resource + PCI_IOV_RESOURCES + i;
+		res = &dev->resource[i + PCI_IOV_RESOURCES];
 		if (res->parent)
 			nres++;
 	}
@@ -373,7 +373,7 @@ found:
 
 	nres = 0;
 	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
-		res = dev->resource + PCI_IOV_RESOURCES + i;
+		res = &dev->resource[i + PCI_IOV_RESOURCES];
 		bar64 = __pci_read_base(dev, pci_bar_unknown, res,
 				pos + PCI_SRIOV_BAR + i * 4);
 		if (!res->flags)
@@ -417,7 +417,7 @@ found:
 
 failed:
 	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
-		res = dev->resource + PCI_IOV_RESOURCES + i;
+		res = &dev->resource[i + PCI_IOV_RESOURCES];
 		res->flags = 0;
 	}
 
-- 
1.7.9.5
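That the two spellings are equivalent ("No functional change") can be checked with a small standalone sketch. The struct layouts and the constant values below are stand-ins chosen for illustration, not the kernel definitions, and the two helper names are hypothetical.

```c
#define PCI_IOV_RESOURCES	7	/* illustrative value only */
#define PCI_SRIOV_NUM_BARS	6

/* Simplified stand-ins for the kernel structures. */
struct resource {
	unsigned long start, end, flags;
};

struct pci_dev {
	struct resource resource[PCI_IOV_RESOURCES + PCI_SRIOV_NUM_BARS];
};

/* Pointer-arithmetic style removed by the patch. */
static struct resource *iov_res_old_style(struct pci_dev *dev, int i)
{
	return dev->resource + PCI_IOV_RESOURCES + i;
}

/* Array-index style used by most of the PCI code. */
static struct resource *iov_res_new_style(struct pci_dev *dev, int i)
{
	return &dev->resource[i + PCI_IOV_RESOURCES];
}
```

Both helpers return the same address for every i, since `&a[n]` is defined as `a + n` in C; the change is purely about readability.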
[PATCH V16 08/20] PCI: Add pcibios_sriov_enable() and pcibios_sriov_disable()
VFs are dynamically created when a driver enables them. On some platforms,
like PowerNV, special resources are necessary to enable VFs. Add platform
hooks for enabling and disabling VFs.

Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com
Acked-by: Bjorn Helgaas bhelg...@google.com
---
 drivers/pci/iov.c |   19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 5643a10..64c4692 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -220,6 +220,11 @@ static void virtfn_remove(struct pci_dev *dev, int id, int reset)
 	pci_dev_put(dev);
 }
 
+int __weak pcibios_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
+{
+	return 0;
+}
+
 static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
 {
 	int rc;
@@ -231,6 +236,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
 	struct pci_sriov *iov = dev->sriov;
 	int bars = 0;
 	int bus;
+	int retval;
 
 	if (!nr_virtfn)
 		return 0;
@@ -307,6 +313,12 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
 	if (nr_virtfn < initial)
 		initial = nr_virtfn;
 
+	if ((retval = pcibios_sriov_enable(dev, initial))) {
+		dev_err(&dev->dev, "failure %d from pcibios_sriov_enable()\n",
+			retval);
+		return retval;
+	}
+
 	for (i = 0; i < initial; i++) {
 		rc = virtfn_add(dev, i, 0);
 		if (rc)
@@ -335,6 +347,11 @@ failed:
 	return rc;
 }
 
+int __weak pcibios_sriov_disable(struct pci_dev *pdev)
+{
+	return 0;
+}
+
 static void sriov_disable(struct pci_dev *dev)
 {
 	int i;
@@ -346,6 +363,8 @@ static void sriov_disable(struct pci_dev *dev)
 	for (i = 0; i < iov->num_VFs; i++)
 		virtfn_remove(dev, i, 0);
 
+	pcibios_sriov_disable(dev);
+
 	iov->ctrl &= ~(PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE);
 	pci_cfg_access_lock(dev);
 	pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
-- 
1.7.9.5
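The hook pattern used here (a weak default that an architecture may override with a strong definition) can be sketched in userspace with the GCC/Clang `weak` attribute, which is what the kernel's `__weak` macro expands to. The `_sketch` function names and the opaque `void *pdev` parameter are hypothetical simplifications.

```c
/*
 * Weak default: used when no other object file provides a strong
 * definition of the same symbol. Platforms needing no special setup
 * get this no-op for free.
 */
__attribute__((weak)) int pcibios_sriov_enable_sketch(void *pdev,
						       unsigned short num_vfs)
{
	(void)pdev;
	(void)num_vfs;
	return 0;
}

/* Caller mirrors the patch: bail out before creating VFs if the hook fails. */
static int sriov_enable_sketch(void *pdev, unsigned short num_vfs)
{
	int retval = pcibios_sriov_enable_sketch(pdev, num_vfs);

	if (retval)
		return retval;

	/* VF creation would follow here. */
	return 0;
}
```

A platform that needs extra work would supply a non-weak `pcibios_sriov_enable_sketch()` in another translation unit; the linker then picks that one, and a non-zero return from it aborts the enable path.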
[PATCH V16 09/20] PCI: Add pcibios_iov_resource_alignment() interface
Per the SR-IOV spec r1.1, sec 3.3.14, the required alignment of a PF's IOV
BAR is the size of an individual VF BAR, and the size consumed is the
individual VF BAR size times NumVFs.

The PowerNV platform has additional alignment requirements to help support
its Partitionable Endpoint device isolation feature (see
Documentation/powerpc/pci_iov_resource_on_powernv.txt).

Add a pcibios_iov_resource_alignment() interface to allow platforms to
request additional alignment.

[bhelgaas: changelog, adapt to reworked pci_sriov_resource_alignment(),
drop "align" parameter]
Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com
Acked-by: Bjorn Helgaas bhelg...@google.com
---
 drivers/pci/iov.c   |    8 +++++++-
 include/linux/pci.h |    1 +
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 64c4692..ee0ebff 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -569,6 +569,12 @@ int pci_iov_resource_bar(struct pci_dev *dev, int resno)
 		4 * (resno - PCI_IOV_RESOURCES);
 }
 
+resource_size_t __weak pcibios_iov_resource_alignment(struct pci_dev *dev,
+						      int resno)
+{
+	return pci_iov_resource_size(dev, resno);
+}
+
 /**
  * pci_sriov_resource_alignment - get resource alignment for VF BAR
  * @dev: the PCI device
@@ -581,7 +587,7 @@ int pci_iov_resource_bar(struct pci_dev *dev, int resno)
  */
 resource_size_t pci_sriov_resource_alignment(struct pci_dev *dev, int resno)
 {
-	return pci_iov_resource_size(dev, resno);
+	return pcibios_iov_resource_alignment(dev, resno);
 }
 
 /**
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 99ea948..4e1f17d 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1174,6 +1174,7 @@ unsigned char pci_bus_max_busnr(struct pci_bus *bus);
 void pci_setup_bridge(struct pci_bus *bus);
 resource_size_t pcibios_window_alignment(struct pci_bus *bus,
 					 unsigned long type);
+resource_size_t pcibios_iov_resource_alignment(struct pci_dev *dev, int resno);
 
 #define PCI_VGA_STATE_CHANGE_BRIDGE (1 << 0)
 #define PCI_VGA_STATE_CHANGE_DECODES (1 << 1)
-- 
1.7.9.5
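The generic rule quoted from the SR-IOV spec (alignment equals one VF BAR, size consumed equals VF BAR size times NumVFs) can be written out as a small worked example. The struct and function names are hypothetical, and the alignment check assumes the VF BAR size is a power of two, as PCI BAR sizes are.

```c
#include <stdint.h>

/* Hypothetical description of one IOV BAR of a PF. */
struct iov_bar_sketch {
	uint64_t vf_bar_size;	/* size of one VF's BAR, power of two */
	unsigned int num_vfs;	/* NumVFs */
};

/* Default alignment: the size of a single VF BAR. */
static uint64_t iov_bar_alignment(const struct iov_bar_sketch *b)
{
	return b->vf_bar_size;
}

/* Address space consumed by the whole IOV BAR. */
static uint64_t iov_bar_total_size(const struct iov_bar_sketch *b)
{
	return b->vf_bar_size * b->num_vfs;
}

/* Non-zero if 'start' satisfies the default alignment. */
static int iov_bar_start_ok(const struct iov_bar_sketch *b, uint64_t start)
{
	return (start & (iov_bar_alignment(b) - 1)) == 0;
}
```

For 64 VFs with a 1 MiB BAR each, the IOV BAR consumes 64 MiB but only needs 1 MiB alignment; a platform hook such as the one added here may return something larger than this default.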
[PATCH V16 10/20] PCI: Consider additional PF's IOV BAR alignment in sizing and assigning
When sizing and assigning resources, we divide the resources into two lists: the requested list and the additional list. We don't consider the alignment of additional VF(n) BAR space. This is because the alignment required for the VF(n) BAR space is the size of an individual VF BAR, not the size of the space for *all* VFs. But we want additional alignment to support partitioning on PowerNV. Consider the additional IOV BAR alignment when sizing and assigning resources. When there is not enough system MMIO space to accomodate both the requested list and the additional list, the PF's IOV BAR alignment will not contribute to the bridge. When there is enough system MMIO space for both lists, the additional alignment will contribute to the bridge. The additional alignment is stored in the min_align of pci_dev_resource, which is stored in the additional list by add_to_list() at the end of pbus_size_mem(). The additional alignment is calculated in pci_resource_alignment(). For an IOV BAR, we have arch dependent function to get the alignment for different arch. 
[bhelgaas: changelog, printk cast] Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com Acked-by: Bjorn Helgaas bhelg...@google.com --- drivers/pci/setup-bus.c | 95 +++ 1 file changed, 79 insertions(+), 16 deletions(-) diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c index e3e17f3..6603d40 100644 --- a/drivers/pci/setup-bus.c +++ b/drivers/pci/setup-bus.c @@ -99,8 +99,8 @@ static void remove_from_list(struct list_head *head, } } -static resource_size_t get_res_add_size(struct list_head *head, - struct resource *res) +static struct pci_dev_resource *res_to_dev_res(struct list_head *head, + struct resource *res) { struct pci_dev_resource *dev_res; @@ -109,17 +109,37 @@ static resource_size_t get_res_add_size(struct list_head *head, int idx = res - dev_res-dev-resource[0]; dev_printk(KERN_DEBUG, dev_res-dev-dev, -res[%d]=%pR get_res_add_size add_size %llx\n, +res[%d]=%pR res_to_dev_res add_size %llx min_align %llx\n, idx, dev_res-res, -(unsigned long long)dev_res-add_size); +(unsigned long long)dev_res-add_size, +(unsigned long long)dev_res-min_align); - return dev_res-add_size; + return dev_res; } } - return 0; + return NULL; } +static resource_size_t get_res_add_size(struct list_head *head, + struct resource *res) +{ + struct pci_dev_resource *dev_res; + + dev_res = res_to_dev_res(head, res); + return dev_res ? dev_res-add_size : 0; +} + +static resource_size_t get_res_add_align(struct list_head *head, +struct resource *res) +{ + struct pci_dev_resource *dev_res; + + dev_res = res_to_dev_res(head, res); + return dev_res ? 
dev_res-min_align : 0; +} + + /* Sort resources by alignment */ static void pdev_sort_resources(struct pci_dev *dev, struct list_head *head) { @@ -215,7 +235,7 @@ static void reassign_resources_sorted(struct list_head *realloc_head, struct resource *res; struct pci_dev_resource *add_res, *tmp; struct pci_dev_resource *dev_res; - resource_size_t add_size; + resource_size_t add_size, align; int idx; list_for_each_entry_safe(add_res, tmp, realloc_head, list) { @@ -238,13 +258,13 @@ static void reassign_resources_sorted(struct list_head *realloc_head, idx = res - add_res-dev-resource[0]; add_size = add_res-add_size; + align = add_res-min_align; if (!resource_size(res)) { - res-start = add_res-start; + res-start = align; res-end = res-start + add_size - 1; if (pci_assign_resource(add_res-dev, idx)) reset_resource(res); } else { - resource_size_t align = add_res-min_align; res-flags |= add_res-flags (IORESOURCE_STARTALIGN|IORESOURCE_SIZEALIGN); if (pci_reassign_resource(add_res-dev, idx, @@ -368,8 +388,9 @@ static void __assign_resources_sorted(struct list_head *head, LIST_HEAD(save_head); LIST_HEAD(local_fail_head); struct pci_dev_resource *save_res; - struct pci_dev_resource *dev_res, *tmp_res; + struct pci_dev_resource *dev_res, *tmp_res, *dev_res2; unsigned long fail_type; + resource_size_t add_align, align; /*
[PATCH V16 11/20] powerpc/pci: Create pci_dn for VFs
From: Gavin Shan gws...@linux.vnet.ibm.com pci_dn is the extension of PCI device node and is created from device node. Unfortunately, VFs are enabled dynamically by PF's driver and they don't have corresponding device nodes and pci_dn, which is required to access VFs' config spaces. The patch creates pci_dn for VFs in pcibios_sriov_enable() on their PF, and removes pci_dn for VFs in pcibios_sriov_disable() on their PF. When VF's pci_dn is created, it's put to the child list of the pci_dn of PF's upstream bridge. The pci_dn is linked to pci_dev during early fixup time to setup the fast path. [bhelgaas: add ifdef around add_one_dev_pci_info(), use dev_printk()] Signed-off-by: Gavin Shan gws...@linux.vnet.ibm.com --- arch/powerpc/include/asm/pci-bridge.h |3 + arch/powerpc/kernel/pci_dn.c | 116 + arch/powerpc/platforms/powernv/pci-ioda.c | 16 3 files changed, 135 insertions(+) diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h index 2c6dc2a..ece30f5 100644 --- a/arch/powerpc/include/asm/pci-bridge.h +++ b/arch/powerpc/include/asm/pci-bridge.h @@ -156,6 +156,7 @@ struct iommu_table; struct pci_dn { int flags; +#define PCI_DN_FLAG_IOV_VF 0x01 int busno; /* pci bus number */ int devfn; /* pci device and function number */ @@ -188,6 +189,8 @@ struct pci_dn { extern struct pci_dn *pci_get_pdn_by_devfn(struct pci_bus *bus, int devfn); extern struct pci_dn *pci_get_pdn(struct pci_dev *pdev); +extern struct pci_dn *add_dev_pci_data(struct pci_dev *pdev); +extern void remove_dev_pci_data(struct pci_dev *pdev); extern void *update_dn_pci_info(struct device_node *dn, void *data); static inline int pci_device_from_OF_node(struct device_node *np, diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c index 65b9836..e5f1d78 100644 --- a/arch/powerpc/kernel/pci_dn.c +++ b/arch/powerpc/kernel/pci_dn.c @@ -136,6 +136,122 @@ struct pci_dn *pci_get_pdn(struct pci_dev *pdev) return NULL; } +#ifdef CONFIG_PCI_IOV +static struct 
pci_dn *add_one_dev_pci_data(struct pci_dn *parent, + struct pci_dev *pdev, + int busno, int devfn) +{ + struct pci_dn *pdn; + + /* Except PHB, we always have the parent */ + if (!parent) + return NULL; + + pdn = kzalloc(sizeof(*pdn), GFP_KERNEL); + if (!pdn) { + dev_warn(pdev-dev, %s: Out of memory!\n, __func__); + return NULL; + } + + pdn-phb = parent-phb; + pdn-parent = parent; + pdn-busno = busno; + pdn-devfn = devfn; +#ifdef CONFIG_PPC_POWERNV + pdn-pe_number = IODA_INVALID_PE; +#endif + INIT_LIST_HEAD(pdn-child_list); + INIT_LIST_HEAD(pdn-list); + list_add_tail(pdn-list, parent-child_list); + + /* +* If we already have PCI device instance, lets +* bind them. +*/ + if (pdev) + pdev-dev.archdata.pci_data = pdn; + + return pdn; +} +#endif + +struct pci_dn *add_dev_pci_data(struct pci_dev *pdev) +{ +#ifdef CONFIG_PCI_IOV + struct pci_dn *parent, *pdn; + int i; + + /* Only support IOV for now */ + if (!pdev-is_physfn) + return pci_get_pdn(pdev); + + /* Check if VFs have been populated */ + pdn = pci_get_pdn(pdev); + if (!pdn || (pdn-flags PCI_DN_FLAG_IOV_VF)) + return NULL; + + pdn-flags |= PCI_DN_FLAG_IOV_VF; + parent = pci_bus_to_pdn(pdev-bus); + if (!parent) + return NULL; + + for (i = 0; i pci_sriov_get_totalvfs(pdev); i++) { + pdn = add_one_dev_pci_data(parent, NULL, + pci_iov_virtfn_bus(pdev, i), + pci_iov_virtfn_devfn(pdev, i)); + if (!pdn) { + dev_warn(pdev-dev, %s: Cannot create firmware data for VF#%d\n, +__func__, i); + return NULL; + } + } +#endif /* CONFIG_PCI_IOV */ + + return pci_get_pdn(pdev); +} + +void remove_dev_pci_data(struct pci_dev *pdev) +{ +#ifdef CONFIG_PCI_IOV + struct pci_dn *parent; + struct pci_dn *pdn, *tmp; + int i; + + /* Only support IOV PF for now */ + if (!pdev-is_physfn) + return; + + /* Check if VFs have been populated */ + pdn = pci_get_pdn(pdev); + if (!pdn || !(pdn-flags PCI_DN_FLAG_IOV_VF)) + return; + + pdn-flags = ~PCI_DN_FLAG_IOV_VF; + parent = pci_bus_to_pdn(pdev-bus); + if (!parent) + return; + + /* +* We might 
introduce flag to pci_dn in future +* so that we can release VF's firmware data in +
[PATCH V16 14/20] powerpc/powernv: Reserve additional space for IOV BAR according to the number of total_pe
On PHB3, PF IOV BAR will be covered by M64 BAR to have better PE isolation. M64 BAR is a type of hardware resource in PHB3, which could map a range of MMIO to PE numbers on powernv platform. And this range is divided equally by the number of total_pe with each divided range mapping to a PE number. Also, the M64 BAR must map a MMIO range with power-of-two size. The total_pe number is usually different from total_VFs, which can lead to a conflict between MMIO space and the PE number. For example, if total_VFs is 128 and total_pe is 256, the second half of M64 BAR will be part of other PCI device, which may already belong to other PEs. This patch prevents the conflict by reserving additional space for the PF IOV BAR, which is total_pe number of VF's BAR size. [bhelgaas: make dev_printk() output more consistent, index resource[] conventionally] Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com --- arch/powerpc/include/asm/machdep.h|4 +++ arch/powerpc/include/asm/pci-bridge.h |3 ++ arch/powerpc/kernel/pci-common.c |6 arch/powerpc/platforms/powernv/pci-ioda.c | 43 + 4 files changed, 56 insertions(+) diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h index 098d51e..b303833 100644 --- a/arch/powerpc/include/asm/machdep.h +++ b/arch/powerpc/include/asm/machdep.h @@ -250,6 +250,10 @@ struct machdep_calls { /* Reset the secondary bus of bridge */ void (*pcibios_reset_secondary_bus)(struct pci_dev *dev); +#ifdef CONFIG_PCI_IOV + void (*pcibios_fixup_sriov)(struct pci_dev *pdev); +#endif /* CONFIG_PCI_IOV */ + /* Called to shutdown machine specific hardware not already controlled * by other drivers. 
*/ diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h index ece30f5..7b8ebc5 100644 --- a/arch/powerpc/include/asm/pci-bridge.h +++ b/arch/powerpc/include/asm/pci-bridge.h @@ -178,6 +178,9 @@ struct pci_dn { #define IODA_INVALID_PE(-1) #ifdef CONFIG_PPC_POWERNV int pe_number; +#ifdef CONFIG_PCI_IOV + u16 vfs_expanded; /* number of VFs IOV BAR expanded */ +#endif /* CONFIG_PCI_IOV */ #endif struct list_head child_list; struct list_head list; diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c index 8203101..375bf70 100644 --- a/arch/powerpc/kernel/pci-common.c +++ b/arch/powerpc/kernel/pci-common.c @@ -990,6 +990,12 @@ int pcibios_add_device(struct pci_dev *dev) */ if (dev-bus-is_added) pcibios_setup_device(dev); + +#ifdef CONFIG_PCI_IOV + if (ppc_md.pcibios_fixup_sriov) + ppc_md.pcibios_fixup_sriov(dev); +#endif /* CONFIG_PCI_IOV */ + return 0; } diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index 9447ee9..1da45aa 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -1749,6 +1749,46 @@ static void pnv_pci_init_ioda_msis(struct pnv_phb *phb) static void pnv_pci_init_ioda_msis(struct pnv_phb *phb) { } #endif /* CONFIG_PCI_MSI */ +#ifdef CONFIG_PCI_IOV +static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev) +{ + struct pci_controller *hose; + struct pnv_phb *phb; + struct resource *res; + int i; + resource_size_t size; + struct pci_dn *pdn; + + if (!pdev-is_physfn || pdev-is_added) + return; + + hose = pci_bus_to_host(pdev-bus); + phb = hose-private_data; + + pdn = pci_get_pdn(pdev); + pdn-vfs_expanded = 0; + + for (i = 0; i PCI_SRIOV_NUM_BARS; i++) { + res = pdev-resource[i + PCI_IOV_RESOURCES]; + if (!res-flags || res-parent) + continue; + if (!pnv_pci_is_mem_pref_64(res-flags)) { + dev_warn(pdev-dev, Skipping expanding VF BAR%d: %pR\n, +i, res); + continue; + } + + 
dev_dbg(pdev-dev, Fixing VF BAR%d: %pR to\n, i, res); + size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES); + res-end = res-start + size * phb-ioda.total_pe - 1; + dev_dbg(pdev-dev,%pR\n, res); + dev_info(pdev-dev, VF BAR%d: %pR (expanded to %d VFs for PE alignment), + i, res, phb-ioda.total_pe); + } + pdn-vfs_expanded = phb-ioda.total_pe; +} +#endif /* CONFIG_PCI_IOV */ + /* * This function is supposed to be called on basis of PE from top * to bottom style. So the the I/O or MMIO segment assigned to @@ -2122,6 +2162,9 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np, ppc_md.pcibios_enable_device_hook = pnv_pci_enable_device_hook; ppc_md.pcibios_window_alignment =
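The expansion performed by pnv_pci_ioda_fixup_iov_resources() above boils down to one line of arithmetic, sketched here in userspace C. `struct res_sketch` and both function names are hypothetical, and the segment lookup assumes, as the changelog describes, that the M64 window is divided into total_pe equal segments.

```c
#include <stdint.h>

/* Minimal stand-in for struct resource (inclusive end, as in the kernel). */
struct res_sketch {
	uint64_t start, end;
};

/* Grow the IOV BAR so it spans total_pe segments of one VF BAR each. */
static void expand_iov_bar(struct res_sketch *res, uint64_t vf_bar_size,
			   unsigned int total_pe)
{
	res->end = res->start + vf_bar_size * total_pe - 1;
}

/* Index of the equal-sized segment that contains 'addr'. */
static unsigned int segment_of(const struct res_sketch *res,
			       uint64_t vf_bar_size, uint64_t addr)
{
	return (unsigned int)((addr - res->start) / vf_bar_size);
}
```

Taking the changelog's example of total_VFs = 128 and total_pe = 256 with a 1 MiB VF BAR, the unexpanded BAR would cover 128 MiB while the M64 window covers 256 MiB; after expansion the BAR itself reserves all 256 segments, so no other device's MMIO can land in the upper half.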
[PATCH V16 15/20] powerpc/powernv: Implement pcibios_iov_resource_alignment() on powernv
Implement pcibios_iov_resource_alignment() on powernv platform.

On PowerNV platform, there are 3 cases for the IOV BAR:
1. initial state, the IOV BAR size is multiple times of VF BAR size
2. after expanded, the IOV BAR size is expanded to meet the M64 segment size
3. sizing stage, the IOV BAR is truncated to 0

pnv_pci_iov_resource_alignment() handles these three cases respectively.

[bhelgaas: adjust to drop "align" parameter, return pci_iov_resource_size()
if no ppc_md machdep_call version]
Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/machdep.h        |    1 +
 arch/powerpc/kernel/pci-common.c          |   10 ++++++++++
 arch/powerpc/platforms/powernv/pci-ioda.c |   20 ++++++++++++++++++++
 3 files changed, 31 insertions(+)

diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index b303833..1b26804 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -252,6 +252,7 @@ struct machdep_calls {
 
 #ifdef CONFIG_PCI_IOV
 	void (*pcibios_fixup_sriov)(struct pci_dev *pdev);
+	resource_size_t (*pcibios_iov_resource_alignment)(struct pci_dev *, int resno);
 #endif /* CONFIG_PCI_IOV */
 
 	/* Called to shutdown machine specific hardware not already controlled
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 375bf70..9a306ff 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -130,6 +130,16 @@ void pcibios_reset_secondary_bus(struct pci_dev *dev)
 	pci_reset_secondary_bus(dev);
 }
 
+#ifdef CONFIG_PCI_IOV
+resource_size_t pcibios_iov_resource_alignment(struct pci_dev *pdev, int resno)
+{
+	if (ppc_md.pcibios_iov_resource_alignment)
+		return ppc_md.pcibios_iov_resource_alignment(pdev, resno);
+
+	return pci_iov_resource_size(pdev, resno);
+}
+#endif /* CONFIG_PCI_IOV */
+
 static resource_size_t pcibios_io_size(const struct pci_controller *hose)
 {
 #ifdef CONFIG_PPC64
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 1da45aa..217eaad 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1965,6 +1965,25 @@ static resource_size_t pnv_pci_window_alignment(struct pci_bus *bus,
 	return phb->ioda.io_segsize;
 }
 
+#ifdef CONFIG_PCI_IOV
+static resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
+						      int resno)
+{
+	struct pci_dn *pdn = pci_get_pdn(pdev);
+	resource_size_t align, iov_align;
+
+	iov_align = resource_size(&pdev->resource[resno]);
+	if (iov_align)
+		return iov_align;
+
+	align = pci_iov_resource_size(pdev, resno);
+	if (pdn->vfs_expanded)
+		return pdn->vfs_expanded * align;
+
+	return align;
+}
+#endif /* CONFIG_PCI_IOV */
+
 /* Prevent enabling devices for which we couldn't properly
  * assign a PE
  */
@@ -2164,6 +2183,7 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 	ppc_md.pcibios_reset_secondary_bus = pnv_pci_reset_secondary_bus;
 #ifdef CONFIG_PCI_IOV
 	ppc_md.pcibios_fixup_sriov = pnv_pci_ioda_fixup_iov_resources;
+	ppc_md.pcibios_iov_resource_alignment = pnv_pci_iov_resource_alignment;
 #endif /* CONFIG_PCI_IOV */
 	pci_add_flags(PCI_REASSIGN_ALL_RSRC);
 
-- 
1.7.9.5
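The decision order in pnv_pci_iov_resource_alignment() can be exercised with a standalone sketch. `struct iov_state` is a hypothetical flattening of the three inputs the function reads (the current resource size, the per-VF BAR size, and pdn->vfs_expanded); the function name is likewise made up.

```c
#include <stdint.h>

/* Hypothetical snapshot of the inputs used by the alignment hook. */
struct iov_state {
	uint64_t current_size;		/* resource_size() of the IOV BAR */
	uint64_t vf_bar_size;		/* pci_iov_resource_size() */
	unsigned int vfs_expanded;	/* 0 if the BAR was never expanded */
};

static uint64_t iov_alignment_sketch(const struct iov_state *s)
{
	/* A non-zero current size is used as the alignment directly. */
	if (s->current_size)
		return s->current_size;

	/* Size truncated to 0 during sizing: rebuild the expanded size. */
	if (s->vfs_expanded)
		return (uint64_t)s->vfs_expanded * s->vf_bar_size;

	/* Never expanded: fall back to one VF BAR. */
	return s->vf_bar_size;
}
```

For a 1 MiB VF BAR expanded to 256 VFs, a truncated resource still yields 256 MiB alignment, which is the case the sizing stage depends on.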
[PATCH V16 20/20] powerpc/pci: Add PCI resource alignment documentation
In order to enable SRIOV on PowerNV platform, the PF's IOV BAR needs to be
adjusted:
    1. size expanded
    2. aligned to M64BT size

This patch documents this change on the reason and how.

[bhelgaas: reformat, clarify, expand]
Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com
---
 .../powerpc/pci_iov_resource_on_powernv.txt        |  301
 1 file changed, 301 insertions(+)
 create mode 100644 Documentation/powerpc/pci_iov_resource_on_powernv.txt

diff --git a/Documentation/powerpc/pci_iov_resource_on_powernv.txt b/Documentation/powerpc/pci_iov_resource_on_powernv.txt
new file mode 100644
index 0000000..b55c5cd
--- /dev/null
+++ b/Documentation/powerpc/pci_iov_resource_on_powernv.txt
@@ -0,0 +1,301 @@
+Wei Yang weiy...@linux.vnet.ibm.com
+Benjamin Herrenschmidt b...@au1.ibm.com
+Bjorn Helgaas bhelg...@google.com
+26 Aug 2014
+
+This document describes the requirement from hardware for PCI MMIO resource
+sizing and assignment on PowerKVM and how generic PCI code handles this
+requirement. The first two sections describe the concepts of Partitionable
+Endpoints and the implementation on P8 (IODA2). The next two sections talks
+about considerations on enabling SRIOV on IODA2.
+
+1. Introduction to Partitionable Endpoints
+
+A Partitionable Endpoint (PE) is a way to group the various resources
+associated with a device or a set of devices to provide isolation between
+partitions (i.e., filtering of DMA, MSIs etc.) and to provide a mechanism
+to freeze a device that is causing errors in order to limit the possibility
+of propagation of bad data.
+
+There is thus, in HW, a table of PE states that contains a pair of "frozen"
+state bits (one for MMIO and one for DMA, they get set together but can be
+cleared independently) for each PE.
+
+When a PE is frozen, all stores in any direction are dropped and all loads
+return all 1's value. MSIs are also blocked. There's a bit more state that
+captures things like the details of the error that caused the freeze etc., but
+that's not critical.
+
+The interesting part is how the various PCIe transactions (MMIO, DMA, ...)
+are matched to their corresponding PEs.
+
+The following section provides a rough description of what we have on P8
+(IODA2). Keep in mind that this is all per PHB (PCI host bridge). Each PHB
+is a completely separate HW entity that replicates the entire logic, so has
+its own set of PEs, etc.
+
+2. Implementation of Partitionable Endpoints on P8 (IODA2)
+
+P8 supports up to 256 Partitionable Endpoints per PHB.
+
+  * Inbound
+
+    For DMA, MSIs and inbound PCIe error messages, we have a table (in
+    memory but accessed in HW by the chip) that provides a direct
+    correspondence between a PCIe RID (bus/dev/fn) with a PE number.
+    We call this the RTT.
+
+    - For DMA we then provide an entire address space for each PE that can
+      contain two "windows", depending on the value of PCI address bit 59.
+      Each window can be configured to be remapped via a "TCE table" (IOMMU
+      translation table), which has various configurable characteristics
+      not described here.
+
+    - For MSIs, we have two windows in the address space (one at the top of
+      the 32-bit space and one much higher) which, via a combination of the
+      address and MSI value, will result in one of the 2048 interrupts per
+      bridge being triggered. There's a PE# in the interrupt controller
+      descriptor table as well which is compared with the PE# obtained from
+      the RTT to "authorize" the device to emit that specific interrupt.
+
+    - Error messages just use the RTT.
+
+  * Outbound. That's where the tricky part is.
+
+    Like other PCI host bridges, the Power8 IODA2 PHB supports "windows"
+    from the CPU address space to the PCI address space. There is one M32
+    window and sixteen M64 windows. They have different characteristics.
+    First what they have in common: they forward a configurable portion of
+    the CPU address space to the PCIe bus and must be naturally aligned
+    power of two in size. The rest is different:
+
+    - The M32 window:
+
+      * Is limited to 4GB in size.
+
+      * Drops the top bits of the address (above the size) and replaces
+        them with a configurable value. This is typically used to generate
+        32-bit PCIe accesses. We configure that window at boot from FW and
+        don't touch it from Linux; it's usually set to forward a 2GB
+        portion of address space from the CPU to PCIe
+        0x8000_0000..0xffff_ffff. (Note: The top 64KB are actually
+        reserved for MSIs but this is not a problem at this point; we just
+        need to ensure Linux doesn't assign anything there, the M32 logic
+        ignores that however and will forward in that space if we try).
+
+      * It is divided into 256 segments of equal size. A table in the chip
+        maps each segment to a PE#. That allows portions of the MMIO space
+        to be assigned to PEs on a segment
[PATCH V16 19/20] powerpc/pci: Remove unused struct pci_dn.pcidev field
In struct pci_dn, the pcidev field is assigned but not used, so remove it.

Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com
Acked-by: Gavin Shan gws...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/pci-bridge.h     |    1 -
 arch/powerpc/platforms/powernv/pci-ioda.c |    1 -
 2 files changed, 2 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 560c739..a39270e 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -171,7 +171,6 @@ struct pci_dn {
 
 	int	pci_ext_config_space;	/* for pci devices */
 
-	struct	pci_dev *pcidev;	/* back-pointer to the pci device */
 #ifdef CONFIG_EEH
 	struct eeh_dev *edev;		/* eeh device */
 #endif
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 33088f6..b1387ea 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1028,7 +1028,6 @@ static void pnv_ioda_setup_same_PE(struct pci_bus *bus, struct pnv_ioda_pe *pe)
 				pci_name(dev));
 			continue;
 		}
-		pdn->pcidev = dev;
 		pdn->pe_number = pe->pe_number;
 		pe->dma_weight += pnv_ioda_dma_weight(dev);
 		if ((pe->flags & PNV_IODA_PE_BUS_ALL) && dev->subordinate)
-- 
1.7.9.5

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 2/6] powerpc/mm: Change setbat() to take a pgprot_t rather than flags
The callers of setbat() are actually passing a pgprot_t for the flags
parameter. This doesn't matter unless STRICT_MM_TYPECHECKS is enabled.

So we can turn that on without breaking the build, change setbat() to take a
pgprot_t and have it convert it to an unsigned long internally.

Signed-off-by: Michael Ellerman m...@ellerman.id.au
---
 arch/powerpc/mm/mmu_decl.h   | 2 +-
 arch/powerpc/mm/ppc_mmu_32.c | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/mmu_decl.h b/arch/powerpc/mm/mmu_decl.h
index 78c45f392f5b..085b66b10891 100644
--- a/arch/powerpc/mm/mmu_decl.h
+++ b/arch/powerpc/mm/mmu_decl.h
@@ -96,7 +96,7 @@ extern void _tlbia(void);
 extern void mapin_ram(void);
 extern int map_page(unsigned long va, phys_addr_t pa, int flags);
 extern void setbat(int index, unsigned long virt, phys_addr_t phys,
-		   unsigned int size, int flags);
+		   unsigned int size, pgprot_t prot);
 
 extern int __map_without_bats;
 extern int __allow_ioremap_reserved;
diff --git a/arch/powerpc/mm/ppc_mmu_32.c b/arch/powerpc/mm/ppc_mmu_32.c
index 5029dc19b517..94f33721d382 100644
--- a/arch/powerpc/mm/ppc_mmu_32.c
+++ b/arch/powerpc/mm/ppc_mmu_32.c
@@ -113,11 +113,12 @@ unsigned long __init mmu_mapin_ram(unsigned long top)
  * of 2 between 128k and 256M.
  */
 void __init setbat(int index, unsigned long virt, phys_addr_t phys,
-		   unsigned int size, int flags)
+		   unsigned int size, pgprot_t prot)
 {
 	unsigned int bl;
 	int wimgxpp;
 	struct ppc_bat *bat = BATS[index];
+	unsigned long flags = pgprot_val(prot);
 
 	if ((flags & _PAGE_NO_CACHE) ||
 	    (cpu_has_feature(CPU_FTR_NEED_COHERENT) == 0))
-- 
2.1.0
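As a rough illustration of why this conversion matters, here is a minimal user-space sketch of the strict-typecheck pattern: pgprot_t becomes a one-member struct, so passing it where an integer is expected no longer compiles, and the value has to be unwrapped explicitly with pgprot_val(). The macro names follow the kernel's convention, but SKETCH_PAGE_NO_CACHE, its bit value, and prot_is_uncached() are hypothetical and exist only for this example.

```c
#include <assert.h>

/*
 * Strict variant: wrapping the value in a struct makes the compiler reject
 * accidental mixing of pgprot_t with plain integer flags.
 */
typedef struct { unsigned long pgprot; } pgprot_t;

#define pgprot_val(x)	((x).pgprot)
#define __pgprot(x)	((pgprot_t) { (x) })

/* Hypothetical flag bit, chosen only for illustration. */
#define SKETCH_PAGE_NO_CACHE	0x020UL

/*
 * Takes a pgprot_t and unwraps it once at the top, the same shape as the
 * reworked setbat() in the patch above.
 */
static int prot_is_uncached(pgprot_t prot)
{
	unsigned long flags = pgprot_val(prot);

	return (flags & SKETCH_PAGE_NO_CACHE) != 0;
}
```

With this definition, a call such as prot_is_uncached(0x20UL) is a compile error, which is the class of mistake the strict type is meant to surface.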
[PATCH V16 01/20] PCI: Print more info in sriov_enable() error message
From: Bjorn Helgaas bhelg...@google.com

If we don't have space for all the bus numbers required to enable VFs,
print the largest bus number required and the range available.

No functional change; improved error message only.

Signed-off-by: Bjorn Helgaas bhelg...@google.com
Acked-by: Wei Yang weiy...@linux.vnet.ibm.com
---
 drivers/pci/iov.c |    7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 4b3a4ea..c4c33ea 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -180,6 +180,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
 	struct pci_dev *pdev;
 	struct pci_sriov *iov = dev->sriov;
 	int bars = 0;
+	u8 bus;
 
 	if (!nr_virtfn)
 		return 0;
@@ -216,8 +217,10 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
 	iov->offset = offset;
 	iov->stride = stride;
 
-	if (virtfn_bus(dev, nr_virtfn - 1) > dev->bus->busn_res.end) {
-		dev_err(&dev->dev, "SR-IOV: bus number out of range\n");
+	bus = virtfn_bus(dev, nr_virtfn - 1);
+	if (bus > dev->bus->busn_res.end) {
+		dev_err(&dev->dev, "can't enable %d VFs (bus %02x out of range of %pR)\n",
+			nr_virtfn, bus, &dev->bus->busn_res);
 		return -ENOMEM;
 	}
 
-- 
1.7.9.5
[PATCH V16 02/20] PCI: Print PF SR-IOV resource that contains all VF(n) BAR space
When we size VF BAR0, VF BAR1, etc., from the SR-IOV Capability of a PF, we
learn the alignment requirement and amount of space consumed by a single
VF. But when VFs are enabled, *each* of the NumVFs consumes that amount of
space, so the total size of the PF resource is "VF BAR size * NumVFs".

Add a printk of the total space consumed by the VFs corresponding to what
we already do for normal non-IOV BARs.

No functional change; new message only.

[bhelgaas: split out into its own patch]
Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com
Acked-by: Bjorn Helgaas bhelg...@google.com
---
 drivers/pci/iov.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index c4c33ea..05f9d97 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -372,6 +372,8 @@ found:
 			goto failed;
 		}
 		res->end = res->start + resource_size(res) * total - 1;
+		dev_info(&dev->dev, "VF(n) BAR%d space: %pR (contains BAR%d for %d VFs)\n",
+			 i, res, i, total);
 		nres++;
 	}
 
-- 
1.7.9.5
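The "VF BAR size * NumVFs" arithmetic can be sketched in user space as follows. This is only an illustration under stated assumptions: struct range is a hypothetical stand-in for the kernel's struct resource, and iov_bar_space() and vf_bar() are made-up helper names. The second helper mirrors how virtfn_add() elsewhere in this series places VF number id at "start + size * id".

```c
#include <assert.h>
#include <stdint.h>

/* Inclusive address range; a hypothetical stand-in for struct resource. */
struct range {
	uint64_t start;
	uint64_t end;
};

static uint64_t range_size(struct range r)
{
	return r.end - r.start + 1;
}

/* Grow a single-VF BAR range so that it covers total_vfs copies of it. */
static struct range iov_bar_space(struct range one_vf, unsigned int total_vfs)
{
	struct range all = one_vf;

	assert(total_vfs > 0);
	all.end = one_vf.start + range_size(one_vf) * total_vfs - 1;
	return all;
}

/* Slice of the PF's IOV range that belongs to VF 'id' (0-based). */
static struct range vf_bar(struct range all, uint64_t vf_size, unsigned int id)
{
	struct range r;

	r.start = all.start + vf_size * id;
	r.end = r.start + vf_size - 1;
	return r;
}
```

For a 4KB VF BAR at 0x1000000 and four VFs, the PF resource spans 0x1000000-0x1003fff and VF 2 owns 0x1002000-0x1002fff.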
[PATCH V16 06/20] PCI: Calculate maximum number of buses required for VFs
An SR-IOV device can change its First VF Offset and VF Stride based on the
values of ARI Capable Hierarchy and NumVFs. The number of buses required
for all VFs is determined by NumVFs, First VF Offset, and VF Stride (see
SR-IOV spec r1.1, sec 2.1.2). Previously pci_iov_bus_range() computed how
many buses would be required by TotalVFs, but this was based on a single
NumVFs value and may not have been the maximum for all NumVFs
configurations.

Iterate over all valid NumVFs and calculate the maximum number of bus
numbers that could ever be required for VFs of this device.

[bhelgaas: changelog, compute busnr of NumVFs, not TotalVFs, remove
kernel-doc comment marker]
Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com
Acked-by: Bjorn Helgaas bhelg...@google.com
---
 drivers/pci/iov.c |   31 +++++++++++++++++++++++++++----
 drivers/pci/pci.h |    1 +
 2 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index a8752c2..2ae921f 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -46,6 +46,30 @@ static inline void pci_iov_set_numvfs(struct pci_dev *dev, int nr_virtfn)
 	pci_read_config_word(dev, iov->pos + PCI_SRIOV_VF_STRIDE, &iov->stride);
 }
 
+/*
+ * The PF consumes one bus number. NumVFs, First VF Offset, and VF Stride
+ * determine how many additional bus numbers will be consumed by VFs.
+ *
+ * Iterate over all valid NumVFs and calculate the maximum number of bus
+ * numbers that could ever be required.
+ */
+static inline u8 virtfn_max_buses(struct pci_dev *dev)
+{
+	struct pci_sriov *iov = dev->sriov;
+	int nr_virtfn;
+	u8 max = 0;
+	u8 busnr;
+
+	for (nr_virtfn = 1; nr_virtfn <= iov->total_VFs; nr_virtfn++) {
+		pci_iov_set_numvfs(dev, nr_virtfn);
+		busnr = virtfn_bus(dev, nr_virtfn - 1);
+		if (busnr > max)
+			max = busnr;
+	}
+
+	return max;
+}
+
 static struct pci_bus *virtfn_add_bus(struct pci_bus *bus, int busnr)
 {
 	struct pci_bus *child;
@@ -427,6 +451,7 @@ found:
 
 	dev->sriov = iov;
 	dev->is_physfn = 1;
+	iov->max_VF_buses = virtfn_max_buses(dev);
 
 	return 0;
 
@@ -556,15 +581,13 @@ void pci_restore_iov_state(struct pci_dev *dev)
 int pci_iov_bus_range(struct pci_bus *bus)
 {
 	int max = 0;
-	u8 busnr;
 	struct pci_dev *dev;
 
 	list_for_each_entry(dev, &bus->devices, bus_list) {
 		if (!dev->is_physfn)
 			continue;
-		busnr = virtfn_bus(dev, dev->sriov->total_VFs - 1);
-		if (busnr > max)
-			max = busnr;
+		if (dev->sriov->max_VF_buses > max)
+			max = dev->sriov->max_VF_buses;
 	}
 
 	return max ? max - bus->number : 0;
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 5732964..bae593c 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -243,6 +243,7 @@ struct pci_sriov {
 	u16 stride;		/* following VF stride */
 	u32 pgsz;		/* page size for BAR alignment */
 	u8 link;		/* Function Dependency Link */
+	u8 max_VF_buses;	/* max buses consumed by VFs */
 	u16 driver_max_VFs;	/* max num VFs driver supports */
 	struct pci_dev *dev;	/* lowest numbered PF */
 	struct pci_dev *self;	/* this PF */
-- 
1.7.9.5
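To see why the maximum has to be taken over every NumVFs rather than only TotalVFs, consider the following user-space sketch. It reuses the bus-number formula from virtfn_bus() in the patch, but everything else is an assumption made for illustration: struct vf_layout, the layout_fn callback (standing in for writing NumVFs and re-reading offset and stride from config space), and example_layout(), which is an invented device whose stride shrinks once more than four VFs are enabled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the two values a device reports for a given NumVFs. */
struct vf_layout {
	uint16_t offset;	/* First VF Offset */
	uint16_t stride;	/* VF Stride */
};

/* Hypothetical stand-in for "write NumVFs, then re-read offset and stride". */
typedef struct vf_layout (*layout_fn)(int num_vfs);

/* Bus number of VF 'id' (0-based); same formula as virtfn_bus() in the patch. */
static int vf_bus(uint8_t pf_bus, uint8_t pf_devfn, struct vf_layout l, int id)
{
	return pf_bus + ((pf_devfn + l.offset + l.stride * id) >> 8);
}

/* Largest bus number that any valid NumVFs setting could require. */
static int vf_max_bus(uint8_t pf_bus, uint8_t pf_devfn, int total_vfs,
		      layout_fn layout)
{
	int max = 0;
	int n;

	assert(total_vfs >= 0);
	for (n = 1; n <= total_vfs; n++) {
		int bus = vf_bus(pf_bus, pf_devfn, layout(n), n - 1);

		if (bus > max)
			max = bus;
	}
	return max;
}

/* Invented device: wide stride for up to 4 VFs, narrow stride beyond that. */
static struct vf_layout example_layout(int num_vfs)
{
	struct vf_layout l;

	l.offset = 0x80;
	l.stride = (num_vfs <= 4) ? 0x40 : 0x02;
	return l;
}
```

For this invented device on bus 1, enabling all 8 VFs keeps every VF on bus 1, while enabling only 3 or 4 VFs pushes the last VF onto bus 2, so sizing by TotalVFs alone would under-reserve.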
[PATCH V16 03/20] PCI: Keep individual VF BAR size in struct pci_sriov
Currently we don't store the individual VF BAR size. We calculate it when
needed by dividing the PF's IOV resource size (which contains space for
*all* the VFs) by total_VFs or by reading the BAR in the SR-IOV capability
again.

Keep the individual VF BAR size in struct pci_sriov.barsz[], add
pci_iov_resource_size() to retrieve it, and use that instead of doing the
division or reading the SR-IOV capability BAR.

[bhelgaas: rename to barsz[], simplify barsz[] index computation, remove
SR-IOV capability BAR sizing]
Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com
Acked-by: Bjorn Helgaas bhelg...@google.com
---
 drivers/pci/iov.c   |   39 ++++++++++++++++++++-------------------
 drivers/pci/pci.h   |    1 +
 include/linux/pci.h |    3 +++
 3 files changed, 24 insertions(+), 19 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 05f9d97..5bca0e1 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -57,6 +57,14 @@ static void virtfn_remove_bus(struct pci_bus *physbus, struct pci_bus *virtbus)
 		pci_remove_bus(virtbus);
 }
 
+resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno)
+{
+	if (!dev->is_physfn)
+		return 0;
+
+	return dev->sriov->barsz[resno - PCI_IOV_RESOURCES];
+}
+
 static int virtfn_add(struct pci_dev *dev, int id, int reset)
 {
 	int i;
@@ -92,8 +100,7 @@ static int virtfn_add(struct pci_dev *dev, int id, int reset)
 			continue;
 		virtfn->resource[i].name = pci_name(virtfn);
 		virtfn->resource[i].flags = res->flags;
-		size = resource_size(res);
-		do_div(size, iov->total_VFs);
+		size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES);
 		virtfn->resource[i].start = res->start + size * id;
 		virtfn->resource[i].end = virtfn->resource[i].start + size - 1;
 		rc = request_resource(res, &virtfn->resource[i]);
@@ -311,7 +318,7 @@ static void sriov_disable(struct pci_dev *dev)
 
 static int sriov_init(struct pci_dev *dev, int pos)
 {
-	int i;
+	int i, bar64;
 	int rc;
 	int nres;
 	u32 pgsz;
@@ -360,29 +367,29 @@ found:
 	pgsz &= ~(pgsz - 1);
 	pci_write_config_dword(dev, pos + PCI_SRIOV_SYS_PGSIZE, pgsz);
 
+	iov = kzalloc(sizeof(*iov), GFP_KERNEL);
+	if (!iov)
+		return -ENOMEM;
+
 	nres = 0;
 	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
 		res = dev->resource + PCI_IOV_RESOURCES + i;
-		i += __pci_read_base(dev, pci_bar_unknown, res,
-				     pos + PCI_SRIOV_BAR + i * 4);
+		bar64 = __pci_read_base(dev, pci_bar_unknown, res,
+					pos + PCI_SRIOV_BAR + i * 4);
 		if (!res->flags)
 			continue;
 		if (resource_size(res) & (PAGE_SIZE - 1)) {
 			rc = -EIO;
 			goto failed;
 		}
+		iov->barsz[i] = resource_size(res);
 		res->end = res->start + resource_size(res) * total - 1;
 		dev_info(&dev->dev, "VF(n) BAR%d space: %pR (contains BAR%d for %d VFs)\n",
 			 i, res, i, total);
+		i += bar64;
 		nres++;
 	}
 
-	iov = kzalloc(sizeof(*iov), GFP_KERNEL);
-	if (!iov) {
-		rc = -ENOMEM;
-		goto failed;
-	}
-
 	iov->pos = pos;
 	iov->nres = nres;
 	iov->ctrl = ctrl;
@@ -414,6 +421,7 @@ failed:
 		res->flags = 0;
 	}
 
+	kfree(iov);
 	return rc;
 }
 
@@ -510,14 +518,7 @@ int pci_iov_resource_bar(struct pci_dev *dev, int resno)
  */
 resource_size_t pci_sriov_resource_alignment(struct pci_dev *dev, int resno)
 {
-	struct resource tmp;
-	int reg = pci_iov_resource_bar(dev, resno);
-
-	if (!reg)
-		return 0;
-
-	 __pci_read_base(dev, pci_bar_unknown, &tmp, reg);
-	return resource_alignment(&tmp);
+	return pci_iov_resource_size(dev, resno);
 }
 
 /**
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 4091f82..5732964 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -247,6 +247,7 @@ struct pci_sriov {
 	struct pci_dev *dev;	/* lowest numbered PF */
 	struct pci_dev *self;	/* this PF */
 	struct mutex lock;	/* lock for VF bus */
+	resource_size_t barsz[PCI_SRIOV_NUM_BARS];	/* VF BAR size */
 };
 
 #ifdef CONFIG_PCI_ATS
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 211e9da..1559658 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1675,6 +1675,7 @@ int pci_num_vf(struct pci_dev *dev);
 int pci_vfs_assigned(struct pci_dev *dev);
 int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs);
 int pci_sriov_get_totalvfs(struct pci_dev *dev);
+resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno);
 #else
 static inline int
[PATCH V16 07/20] PCI: Export pci_iov_virtfn_bus() and pci_iov_virtfn_devfn()
On PowerNV, some resource reservation is needed for SR-IOV VFs that don't
exist at the bootup stage. To do the match between resources and VFs, the
code needs to get the VF's BDF in advance.

Rename virtfn_bus() and virtfn_devfn() to pci_iov_virtfn_bus() and
pci_iov_virtfn_devfn() and export them.

[bhelgaas: changelog, make "busnr" int]
Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com
Acked-by: Bjorn Helgaas bhelg...@google.com
---
 drivers/pci/iov.c   |   28 ++++++++++++++++------------
 include/linux/pci.h |   11 +++++++++++
 2 files changed, 27 insertions(+), 12 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 2ae921f..5643a10 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -19,16 +19,20 @@
 
 #define VIRTFN_ID_LEN	16
 
-static inline u8 virtfn_bus(struct pci_dev *dev, int id)
+int pci_iov_virtfn_bus(struct pci_dev *dev, int vf_id)
 {
+	if (!dev->is_physfn)
+		return -EINVAL;
 	return dev->bus->number + ((dev->devfn + dev->sriov->offset +
-				    dev->sriov->stride * id) >> 8);
+				    dev->sriov->stride * vf_id) >> 8);
 }
 
-static inline u8 virtfn_devfn(struct pci_dev *dev, int id)
+int pci_iov_virtfn_devfn(struct pci_dev *dev, int vf_id)
 {
+	if (!dev->is_physfn)
+		return -EINVAL;
 	return (dev->devfn + dev->sriov->offset +
-		dev->sriov->stride * id) & 0xff;
+		dev->sriov->stride * vf_id) & 0xff;
 }
 
 /*
@@ -58,11 +62,11 @@ static inline u8 virtfn_max_buses(struct pci_dev *dev)
 	struct pci_sriov *iov = dev->sriov;
 	int nr_virtfn;
 	u8 max = 0;
-	u8 busnr;
+	int busnr;
 
 	for (nr_virtfn = 1; nr_virtfn <= iov->total_VFs; nr_virtfn++) {
 		pci_iov_set_numvfs(dev, nr_virtfn);
-		busnr = virtfn_bus(dev, nr_virtfn - 1);
+		busnr = pci_iov_virtfn_bus(dev, nr_virtfn - 1);
 		if (busnr > max)
 			max = busnr;
 	}
@@ -116,7 +120,7 @@ static int virtfn_add(struct pci_dev *dev, int id, int reset)
 	struct pci_bus *bus;
 
 	mutex_lock(&iov->dev->sriov->lock);
-	bus = virtfn_add_bus(dev->bus, virtfn_bus(dev, id));
+	bus = virtfn_add_bus(dev->bus, pci_iov_virtfn_bus(dev, id));
 	if (!bus)
 		goto failed;
 
@@ -124,7 +128,7 @@ static int virtfn_add(struct pci_dev *dev, int id, int reset)
 	if (!virtfn)
 		goto failed0;
 
-	virtfn->devfn = virtfn_devfn(dev, id);
+	virtfn->devfn = pci_iov_virtfn_devfn(dev, id);
 	virtfn->vendor = dev->vendor;
 	pci_read_config_word(dev, iov->pos + PCI_SRIOV_VF_DID, &virtfn->device);
 	pci_setup_device(virtfn);
@@ -186,8 +190,8 @@ static void virtfn_remove(struct pci_dev *dev, int id, int reset)
 	struct pci_sriov *iov = dev->sriov;
 
 	virtfn = pci_get_domain_bus_and_slot(pci_domain_nr(dev->bus),
-					     virtfn_bus(dev, id),
-					     virtfn_devfn(dev, id));
+					     pci_iov_virtfn_bus(dev, id),
+					     pci_iov_virtfn_devfn(dev, id));
 	if (!virtfn)
 		return;
 
@@ -226,7 +230,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
 	struct pci_dev *pdev;
 	struct pci_sriov *iov = dev->sriov;
 	int bars = 0;
-	u8 bus;
+	int bus;
 
 	if (!nr_virtfn)
 		return 0;
@@ -263,7 +267,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
 	iov->offset = offset;
 	iov->stride = stride;
 
-	bus = virtfn_bus(dev, nr_virtfn - 1);
+	bus = pci_iov_virtfn_bus(dev, nr_virtfn - 1);
 	if (bus > dev->bus->busn_res.end) {
 		dev_err(&dev->dev, "can't enable %d VFs (bus %02x out of range of %pR)\n",
 			nr_virtfn, bus, &dev->bus->busn_res);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 1559658..99ea948 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1669,6 +1669,9 @@ int pci_ext_cfg_avail(void);
 void __iomem *pci_ioremap_bar(struct pci_dev *pdev, int bar);
 
 #ifdef CONFIG_PCI_IOV
+int pci_iov_virtfn_bus(struct pci_dev *dev, int id);
+int pci_iov_virtfn_devfn(struct pci_dev *dev, int id);
+
 int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn);
 void pci_disable_sriov(struct pci_dev *dev);
 int pci_num_vf(struct pci_dev *dev);
@@ -1677,6 +1680,14 @@ int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs);
 int pci_sriov_get_totalvfs(struct pci_dev *dev);
 resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno);
 #else
+static inline int pci_iov_virtfn_bus(struct pci_dev *dev, int id)
+{
+	return -ENOSYS;
+}
+static inline int pci_iov_virtfn_devfn(struct pci_dev *dev, int id)
+{
+	return -ENOSYS;
+}
 static inline int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn)
 { return -ENODEV; }
 static inline void
[PATCH V16 13/20] powerpc/powernv: Allocate struct pnv_ioda_pe iommu_table dynamically
Previously the iommu_table had the same lifetime as a struct pnv_ioda_pe and
was embedded in it. The pnv_ioda_pe was assigned to a PE at the bootup stage.
Since PEs are based on the hardware layout which is static in the system,
they will never get released. This means the iommu_table in the pnv_ioda_pe
will never get released either.

This no longer works for VF PE. VF PEs are created and released dynamically
when VFs are created and released. So we need to assign pnv_ioda_pe to VF
PEs respectively when VFs are enabled and clean up those resources for VF
PE when VFs are disabled. And iommu_table is one of the resources we need
to handle dynamically.

Currently iommu_table is a static field in pnv_ioda_pe, which will face a
problem when freeing it. During the disabling of a VF,
pnv_pci_ioda2_release_dma_pe will call iommu_free_table to release the
iommu_table for this PE. A static iommu_table will fail in
iommu_free_table.

According to these requirements, this patch allocates iommu_table
dynamically.

Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/iommu.h          |    3 +++
 arch/powerpc/platforms/powernv/pci-ioda.c |   26 ++++++++++++++------------
 arch/powerpc/platforms/powernv/pci.h      |    2 +-
 3 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index f1ea597..e2abbe8 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -78,6 +78,9 @@ struct iommu_table {
 	struct iommu_group *it_group;
 #endif
 	void (*set_bypass)(struct iommu_table *tbl, bool enable);
+#ifdef CONFIG_PPC_POWERNV
+	void           *data;
+#endif
 };
 
 /* Pure 2^n version of get_order */
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 7f58f19..9447ee9 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -916,6 +916,10 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, int all)
 		return;
 	}
 
+	pe->tce32_table = kzalloc_node(sizeof(struct iommu_table),
+			GFP_KERNEL, hose->node);
+	pe->tce32_table->data = pe;
+
 	/* Associate it with all child devices */
 	pnv_ioda_setup_same_PE(bus, pe);
 
@@ -1005,7 +1009,7 @@ static void pnv_pci_ioda_dma_dev_setup(struct pnv_phb *phb, struct pci_dev *pdev
 
 	pe = &phb->ioda.pe_array[pdn->pe_number];
 	WARN_ON(get_dma_ops(&pdev->dev) != &dma_iommu_ops);
-	set_iommu_table_base_and_group(&pdev->dev, &pe->tce32_table);
+	set_iommu_table_base_and_group(&pdev->dev, pe->tce32_table);
 }
 
 static int pnv_pci_ioda_dma_set_mask(struct pnv_phb *phb,
@@ -1032,7 +1036,7 @@ static int pnv_pci_ioda_dma_set_mask(struct pnv_phb *phb,
 	} else {
 		dev_info(&pdev->dev, "Using 32-bit DMA via iommu\n");
 		set_dma_ops(&pdev->dev, &dma_iommu_ops);
-		set_iommu_table_base(&pdev->dev, &pe->tce32_table);
+		set_iommu_table_base(&pdev->dev, pe->tce32_table);
 	}
 	*pdev->dev.dma_mask = dma_mask;
 	return 0;
@@ -1069,9 +1073,9 @@ static void pnv_ioda_setup_bus_dma(struct pnv_ioda_pe *pe,
 	list_for_each_entry(dev, &bus->devices, bus_list) {
 		if (add_to_iommu_group)
 			set_iommu_table_base_and_group(&dev->dev,
-						       &pe->tce32_table);
+						       pe->tce32_table);
 		else
-			set_iommu_table_base(&dev->dev, &pe->tce32_table);
+			set_iommu_table_base(&dev->dev, pe->tce32_table);
 
 		if (dev->subordinate)
 			pnv_ioda_setup_bus_dma(pe, dev->subordinate,
@@ -1161,8 +1165,7 @@ static void pnv_pci_ioda2_tce_invalidate(struct pnv_ioda_pe *pe,
 void pnv_pci_ioda_tce_invalidate(struct iommu_table *tbl,
 				 __be64 *startp, __be64 *endp, bool rm)
 {
-	struct pnv_ioda_pe *pe = container_of(tbl, struct pnv_ioda_pe,
-					      tce32_table);
+	struct pnv_ioda_pe *pe = tbl->data;
 	struct pnv_phb *phb = pe->phb;
 
 	if (phb->type == PNV_PHB_IODA1)
@@ -1228,7 +1231,7 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
 	}
 
 	/* Setup linux iommu table */
-	tbl = &pe->tce32_table;
+	tbl = pe->tce32_table;
 	pnv_pci_setup_iommu_table(tbl, addr, TCE32_TABLE_SIZE * segs,
 				  base << 28, IOMMU_PAGE_SHIFT_4K);
 
@@ -1266,8 +1269,7 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
 
 static void pnv_pci_ioda2_set_bypass(struct iommu_table *tbl, bool enable)
 {
-	struct pnv_ioda_pe *pe = container_of(tbl, struct pnv_ioda_pe,
-					      tce32_table);
+	struct pnv_ioda_pe *pe = tbl->data;
 	uint16_t window_id =
[PATCH V16 16/20] powerpc/powernv: Shift VF resource with an offset
On PowerNV platform, resource position in M64 BAR implies the PE# the
resource belongs to. In some cases, adjustment of a resource is necessary
to locate it to a correct position in M64 BAR.

This patch adds pnv_pci_vf_resource_shift() to shift the 'real' PF IOV BAR
address according to an offset.

Note:

    After doing so, there would be a "hole" in the /proc/iomem when offset
    is a positive value. It looks like the device returns some mmio back to
    the system, which no one can actually use.

[bhelgaas: rework loops, rework overlap check, index resource[]
conventionally, remove pci_regs.h include, squashed with next patch]
Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/pci-bridge.h     |    4 +
 arch/powerpc/kernel/pci_dn.c              |   13 +
 arch/powerpc/platforms/powernv/pci-ioda.c |  528 -
 arch/powerpc/platforms/powernv/pci.c      |   18 +
 arch/powerpc/platforms/powernv/pci.h      |    7 +
 5 files changed, 553 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 7b8ebc5..8716db4 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -180,6 +180,10 @@ struct pci_dn {
 	int	pe_number;
 #ifdef CONFIG_PCI_IOV
 	u16     vfs_expanded;		/* number of VFs IOV BAR expanded */
+	u16     num_vfs;		/* number of VFs enabled*/
+	int     offset;			/* PE# for the first VF PE */
+#define IODA_INVALID_M64        (-1)
+	int     m64_wins[PCI_SRIOV_NUM_BARS];
 #endif /* CONFIG_PCI_IOV */
 #endif
 	struct list_head child_list;
diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
index e5f1d78..b3b4df9 100644
--- a/arch/powerpc/kernel/pci_dn.c
+++ b/arch/powerpc/kernel/pci_dn.c
@@ -217,6 +217,19 @@ void remove_dev_pci_data(struct pci_dev *pdev)
 	struct pci_dn *pdn, *tmp;
 	int i;
 
+	/*
+	 * VF and VF PE are created/released dynamically, so we need to
+	 * bind/unbind them. Otherwise the VF and VF PE would be mismatched
+	 * when re-enabling SR-IOV.
+	 */
+	if (pdev->is_virtfn) {
+		pdn = pci_get_pdn(pdev);
+#ifdef CONFIG_PPC_POWERNV
+		pdn->pe_number = IODA_INVALID_PE;
+#endif
+		return;
+	}
+
 	/* Only support IOV PF for now */
 	if (!pdev->is_physfn)
 		return;
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 217eaad..5187d16 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -44,6 +44,9 @@
 #include "powernv.h"
 #include "pci.h"
 
+/* 256M DMA window, 4K TCE pages, 8 bytes TCE */
+#define TCE32_TABLE_SIZE	((0x10000000 / 0x1000) * 8)
+
 static void pe_level_printk(const struct pnv_ioda_pe *pe, const char *level,
 			    const char *fmt, ...)
 {
@@ -56,11 +59,18 @@ static void pe_level_printk(const struct pnv_ioda_pe *pe, const char *level,
 	vaf.fmt = fmt;
 	vaf.va = &args;
 
-	if (pe->pdev)
+	if (pe->flags & PNV_IODA_PE_DEV)
 		strlcpy(pfix, dev_name(&pe->pdev->dev), sizeof(pfix));
-	else
+	else if (pe->flags & (PNV_IODA_PE_BUS | PNV_IODA_PE_BUS_ALL))
 		sprintf(pfix, "%04x:%02x     ",
 			pci_domain_nr(pe->pbus), pe->pbus->number);
+#ifdef CONFIG_PCI_IOV
+	else if (pe->flags & PNV_IODA_PE_VF)
+		sprintf(pfix, "%04x:%02x:%2x.%d",
+			pci_domain_nr(pe->parent_dev->bus),
+			(pe->rid & 0xff00) >> 8,
+			PCI_SLOT(pe->rid), PCI_FUNC(pe->rid));
+#endif /* CONFIG_PCI_IOV*/
 
 	printk("%spci %s: [PE# %.3d] %pV",
 	       level, pfix, pe->pe_number, &vaf);
@@ -591,7 +601,7 @@ static int pnv_ioda_set_peltv(struct pnv_phb *phb,
 			      bool is_add)
 {
 	struct pnv_ioda_pe *slave;
-	struct pci_dev *pdev;
+	struct pci_dev *pdev = NULL;
 	int ret;
 
 	/*
@@ -630,8 +640,12 @@ static int pnv_ioda_set_peltv(struct pnv_phb *phb,
 
 	if (pe->flags & (PNV_IODA_PE_BUS_ALL | PNV_IODA_PE_BUS))
 		pdev = pe->pbus->self;
-	else
+	else if (pe->flags & PNV_IODA_PE_DEV)
 		pdev = pe->pdev->bus->self;
+#ifdef CONFIG_PCI_IOV
+	else if (pe->flags & PNV_IODA_PE_VF)
+		pdev = pe->parent_dev->bus->self;
+#endif /* CONFIG_PCI_IOV */
 	while (pdev) {
 		struct pci_dn *pdn = pci_get_pdn(pdev);
 		struct pnv_ioda_pe *parent;
@@ -649,6 +663,87 @@ static int pnv_ioda_set_peltv(struct pnv_phb *phb,
 	return 0;
 }
 
+#ifdef CONFIG_PCI_IOV
+static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
+{
+	struct pci_dev *parent;
+	uint8_t bcomp, dcomp, fcomp;
+	int64_t rc;
+	long rid_end, rid;
+
+	/*
[PATCH V16 17/20] powerpc/powernv: Reserve additional space for IOV BAR, with m64_per_iov supported
M64 aperture size is limited on PHB3. When the IOV BAR is too big, this
will exceed the limitation and fail to be assigned.

Introduce a different mechanism based on the IOV BAR size:

  - if IOV BAR size is smaller than 64MB, expand to total_pe
  - if IOV BAR size is bigger than 64MB, roundup power2

[bhelgaas: make dev_printk() output more consistent, use PCI_SRIOV_NUM_BARS]
Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/pci-bridge.h     |    2 ++
 arch/powerpc/platforms/powernv/pci-ioda.c |   33 ++++++++++++++++++++++++++++++---
 2 files changed, 32 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 8716db4..415df85 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -182,6 +182,8 @@ struct pci_dn {
 	u16     vfs_expanded;		/* number of VFs IOV BAR expanded */
 	u16     num_vfs;		/* number of VFs enabled*/
 	int     offset;			/* PE# for the first VF PE */
+#define M64_PER_IOV 4
+	int     m64_per_iov;
 #define IODA_INVALID_M64        (-1)
 	int     m64_wins[PCI_SRIOV_NUM_BARS];
 #endif /* CONFIG_PCI_IOV */
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 5187d16..b63925f 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2250,6 +2250,7 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
 	int i;
 	resource_size_t size;
 	struct pci_dn *pdn;
+	int mul, total_vfs;
 
 	if (!pdev->is_physfn || pdev->is_added)
 		return;
@@ -2260,6 +2261,32 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
 	pdn = pci_get_pdn(pdev);
 	pdn->vfs_expanded = 0;
 
+	total_vfs = pci_sriov_get_totalvfs(pdev);
+	pdn->m64_per_iov = 1;
+	mul = phb->ioda.total_pe;
+
+	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
+		res = &pdev->resource[i + PCI_IOV_RESOURCES];
+		if (!res->flags || res->parent)
+			continue;
+		if (!pnv_pci_is_mem_pref_64(res->flags)) {
+			dev_warn(&pdev->dev, "non M64 VF BAR%d: %pR\n",
+				 i, res);
+			continue;
+		}
+
+		size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
+
+		/* bigger than 64M */
+		if (size > (1 << 26)) {
+			dev_info(&pdev->dev, "PowerNV: VF BAR%d: %pR IOV size is bigger than 64M, roundup power2\n",
+				 i, res);
+			pdn->m64_per_iov = M64_PER_IOV;
+			mul = roundup_pow_of_two(total_vfs);
+			break;
+		}
+	}
+
 	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
 		res = &pdev->resource[i + PCI_IOV_RESOURCES];
 		if (!res->flags || res->parent)
@@ -2272,12 +2299,12 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
 
 		dev_dbg(&pdev->dev, "Fixing VF BAR%d: %pR to\n", i, res);
 		size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
-		res->end = res->start + size * phb->ioda.total_pe - 1;
+		res->end = res->start + size * mul - 1;
 		dev_dbg(&pdev->dev, "%pR\n", res);
 		dev_info(&pdev->dev, "VF BAR%d: %pR (expanded to %d VFs for PE alignment)",
-				i, res, phb->ioda.total_pe);
+			 i, res, mul);
 	}
-	pdn->vfs_expanded = phb->ioda.total_pe;
+	pdn->vfs_expanded = mul;
 }
 #endif /* CONFIG_PCI_IOV */
 
-- 
1.7.9.5
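The sizing rule in this patch can be summarised with a small user-space sketch. It is a simplification under stated assumptions: iov_bar_multiplier(), expanded_size() and roundup_pow2() are hypothetical helper names, BAR sizes are passed in as a plain array, and the checks on resource flags and M64 capability that the real function performs are left out.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_64MB	(1ULL << 26)

/* Smallest power of two that is >= n (returns 1 for n == 0). */
static uint64_t roundup_pow2(uint64_t n)
{
	uint64_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/*
 * Number of VF-BAR-sized slots to reserve for each IOV BAR: total_pe when
 * every VF BAR is at most 64MB, otherwise TotalVFs rounded up to a power
 * of two. One oversized BAR switches the whole device to the second mode,
 * matching the early 'break' in the patch.
 */
static uint64_t iov_bar_multiplier(const uint64_t *vf_bar_size, int nbars,
				   uint64_t total_pe, uint64_t total_vfs)
{
	int i;

	for (i = 0; i < nbars; i++)
		if (vf_bar_size[i] > SKETCH_64MB)
			return roundup_pow2(total_vfs);
	return total_pe;
}

/* Size of the expanded IOV BAR for one VF BAR size and a chosen multiplier. */
static uint64_t expanded_size(uint64_t vf_bar_size, uint64_t mul)
{
	return vf_bar_size * mul;
}
```

With 256 PEs and 63 VFs, a 1MB VF BAR expands to 256MB, whereas a 128MB VF BAR is expanded by 64 instead of 256, which keeps the reservation at 8GB rather than 32GB.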
[PATCH 4/6] powerpc: Fix compile errors with STRICT_MM_TYPECHECKS enabled
Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
[mpe: Fix the 32-bit code also]
Signed-off-by: Michael Ellerman m...@ellerman.id.au
---
 arch/powerpc/include/asm/kvm_book3s_64.h | 12 +++++++-----
 arch/powerpc/mm/dma-noncoherent.c        |  2 +-
 arch/powerpc/mm/fsl_booke_mmu.c          |  2 +-
 arch/powerpc/mm/hugepage-hash64.c        |  2 +-
 arch/powerpc/mm/hugetlbpage.c            |  4 ++--
 arch/powerpc/mm/pgtable_32.c             |  4 ++--
 arch/powerpc/mm/pgtable_64.c             |  2 +-
 arch/powerpc/mm/tlb_hash64.c             |  2 +-
 8 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 2d81e202bdcc..cc073a7ac2b7 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -290,11 +290,11 @@ static inline pte_t kvmppc_read_update_linux_pte(pte_t *ptep, int writing,
 	pte_t old_pte, new_pte = __pte(0);
 
 	while (1) {
-		old_pte = pte_val(*ptep);
+		old_pte = *ptep;
 		/*
 		 * wait until _PAGE_BUSY is clear then set it atomically
 		 */
-		if (unlikely(old_pte & _PAGE_BUSY)) {
+		if (unlikely(pte_val(old_pte) & _PAGE_BUSY)) {
 			cpu_relax();
 			continue;
 		}
@@ -305,16 +305,18 @@ static inline pte_t kvmppc_read_update_linux_pte(pte_t *ptep, int writing,
 			return __pte(0);
 #endif
 		/* If pte is not present return None */
-		if (unlikely(!(old_pte & _PAGE_PRESENT)))
+		if (unlikely(!(pte_val(old_pte) & _PAGE_PRESENT)))
 			return __pte(0);
 
 		new_pte = pte_mkyoung(old_pte);
 		if (writing && pte_write(old_pte))
 			new_pte = pte_mkdirty(new_pte);
 
-		if (old_pte == __cmpxchg_u64((unsigned long *)ptep, old_pte,
-					     new_pte))
+		if (pte_val(old_pte) == __cmpxchg_u64((unsigned long *)ptep,
+						      pte_val(old_pte),
+						      pte_val(new_pte))) {
 			break;
+		}
 	}
 	return new_pte;
 }
diff --git a/arch/powerpc/mm/dma-noncoherent.c b/arch/powerpc/mm/dma-noncoherent.c
index d85e86aac7fb..169aba446a74 100644
--- a/arch/powerpc/mm/dma-noncoherent.c
+++ b/arch/powerpc/mm/dma-noncoherent.c
@@ -228,7 +228,7 @@ __dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *handle, gfp_t
 		do {
 			SetPageReserved(page);
 			map_page(vaddr, page_to_phys(page),
-				 pgprot_noncached(PAGE_KERNEL));
+				 pgprot_val(pgprot_noncached(PAGE_KERNEL)));
 			page++;
 			vaddr += PAGE_SIZE;
 		} while (size -= PAGE_SIZE);
diff --git a/arch/powerpc/mm/fsl_booke_mmu.c b/arch/powerpc/mm/fsl_booke_mmu.c
index b46912fee7cd..9c90e66cffb6 100644
--- a/arch/powerpc/mm/fsl_booke_mmu.c
+++ b/arch/powerpc/mm/fsl_booke_mmu.c
@@ -181,7 +181,7 @@ static unsigned long map_mem_in_cams_addr(phys_addr_t phys, unsigned long virt,
 		unsigned long cam_sz;
 
 		cam_sz = calc_cam_sz(ram, virt, phys);
-		settlbcam(i, virt, phys, cam_sz, PAGE_KERNEL_X, 0);
+		settlbcam(i, virt, phys, cam_sz, pgprot_val(PAGE_KERNEL_X), 0);
 
 		ram -= cam_sz;
 		amount_mapped += cam_sz;
diff --git a/arch/powerpc/mm/hugepage-hash64.c b/arch/powerpc/mm/hugepage-hash64.c
index 86686514ae13..43dafb9d6a46 100644
--- a/arch/powerpc/mm/hugepage-hash64.c
+++ b/arch/powerpc/mm/hugepage-hash64.c
@@ -33,7 +33,7 @@ int __hash_page_thp(unsigned long ea, unsigned long access, unsigned long vsid,
 	 * atomically mark the linux large page PMD busy and dirty
 	 */
 	do {
-		pmd_t pmd = ACCESS_ONCE(*pmdp);
+		pmd_t pmd = READ_ONCE(*pmdp);
 
 		old_pmd = pmd_val(pmd);
 		/* If PMD busy, retry the access */
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 7e408bfc7948..fa9d5c238d22 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -964,7 +964,7 @@ pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea, unsigned *shift
 		*shift = 0;
 
 	pgdp = pgdir + pgd_index(ea);
-	pgd  = ACCESS_ONCE(*pgdp);
+	pgd  = READ_ONCE(*pgdp);
 	/*
 	 * Always operate on the local stack value. This make sure the
 	 * value don't get updated by a parallel THP split/collapse,
@@ -1045,7 +1045,7 @@ int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
 	if (pte_end < end)
 		end = pte_end;
 
-
[PATCH 5/6] mm/gup: Replace ACCESS_ONCE with READ_ONCE for STRICT_MM_TYPECHECKS
If STRICT_MM_TYPECHECKS is enabled the generic gup code fails to build
because we are using ACCESS_ONCE on non-scalar types.

Convert all uses to READ_ONCE.

Cc: a...@linux-foundation.org
Cc: kirill.shute...@linux.intel.com
Cc: aarca...@redhat.com
Cc: borntrae...@de.ibm.com
Cc: steve.cap...@linaro.org
Cc: linux...@kvack.org
Signed-off-by: Michael Ellerman <m...@ellerman.id.au>
---
 mm/gup.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index a6e24e246f86..120c3adc843c 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -901,7 +901,7 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
 		 *
 		 * for an example see gup_get_pte in arch/x86/mm/gup.c
 		 */
-		pte_t pte = ACCESS_ONCE(*ptep);
+		pte_t pte = READ_ONCE(*ptep);
 		struct page *page;
 
 		/*
@@ -1191,7 +1191,7 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
 	local_irq_save(flags);
 	pgdp = pgd_offset(mm, addr);
 	do {
-		pgd_t pgd = ACCESS_ONCE(*pgdp);
+		pgd_t pgd = READ_ONCE(*pgdp);
 
 		next = pgd_addr_end(addr, end);
 		if (pgd_none(pgd))
--
2.1.0

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 5/6] mm/gup: Replace ACCESS_ONCE with READ_ONCE for STRICT_MM_TYPECHECKS
On 25.03.2015 at 10:11, Michael Ellerman wrote:
> If STRICT_MM_TYPECHECKS is enabled the generic gup code fails to build
> because we are using ACCESS_ONCE on non-scalar types.
>
> Convert all uses to READ_ONCE.

There is a similar patch from Jason Low in Andrews patch.
If that happens in 4.0-rc, we probably want to merge this before 4.0.

> Cc: a...@linux-foundation.org
> Cc: kirill.shute...@linux.intel.com
> Cc: aarca...@redhat.com
> Cc: borntrae...@de.ibm.com
> Cc: steve.cap...@linaro.org
> Cc: linux...@kvack.org
> Signed-off-by: Michael Ellerman <m...@ellerman.id.au>
> ---
>  mm/gup.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/gup.c b/mm/gup.c
> index a6e24e246f86..120c3adc843c 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -901,7 +901,7 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
>  		 *
>  		 * for an example see gup_get_pte in arch/x86/mm/gup.c
>  		 */
> -		pte_t pte = ACCESS_ONCE(*ptep);
> +		pte_t pte = READ_ONCE(*ptep);
>  		struct page *page;
>
>  		/*
> @@ -1191,7 +1191,7 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
>  	local_irq_save(flags);
>  	pgdp = pgd_offset(mm, addr);
>  	do {
> -		pgd_t pgd = ACCESS_ONCE(*pgdp);
> +		pgd_t pgd = READ_ONCE(*pgdp);
>
>  		next = pgd_addr_end(addr, end);
>  		if (pgd_none(pgd))
Re: [PATCH v4 0/3] powerpc: Enable seccomp filter support
On Mon, 2015-03-23 at 13:44 +0200, Purcareata Bogdan wrote:
> On 27.02.2015 22:54, Benjamin Herrenschmidt wrote:
> > On Fri, 2015-02-27 at 09:28 +0200, Purcareata Bogdan wrote:
> > > Ping?
> >
> > What is the ping for ?
> >
> > Ben.
>
> Hello Ben,
>
> I just wanted to check with you what's the current status of these
> patches. I noticed in patchwork [1][2][3] that the patches are marked as
> non-applicable. As of today, I cloned Michael Ellerman's tree [4],
> applied the patches on the master branch, compiled and tested. Tests
> pass both with the libseccomp regression suite and my LXC tests.
>
> Is there a specific tree I should send them against, or on another
> mailing list? Is there any other reason the patches are not applicable?

I just haven't had time to review them properly.

Because you're touching the syscall path for all powerpc platforms it needs
more scrutiny than the average patch.

It should still make 4.1, probably :)

cheers
Re: [PATCH 5/6] mm/gup: Replace ACCESS_ONCE with READ_ONCE for STRICT_MM_TYPECHECKS
On Wed, 2015-03-25 at 10:18 +0100, Christian Borntraeger wrote:
> On 25.03.2015 at 10:11, Michael Ellerman wrote:
> > If STRICT_MM_TYPECHECKS is enabled the generic gup code fails to build
> > because we are using ACCESS_ONCE on non-scalar types.
> >
> > Convert all uses to READ_ONCE.
>
> There is a similar patch from Jason Low in Andrews patch.

Ah sorry, I didn't think to check.

> If that happens in 4.0-rc, we probably want to merge this before 4.0.

My series can wait, it's not urgent. So I'll plan to merge mine once Andrew's
tree has gone into Linus' tree for 4.1.

cheers
Re: [PATCH] powerpc/powernv: Remove powernv RTAS support
On Wed, 2015-03-25 at 16:46 +1100, Stewart Smith wrote:
> Michael Ellerman <m...@ellerman.id.au> writes:
> > The powernv code has some conditional support for running on bare metal
> > machines that have no OPAL firmware, but provide RTAS. No released machines
> > ever supported that, and even in the lab it was just a transitional hack in
> > the days when OPAL was still being developed.
> >
> > So remove the code.
> >
> > Signed-off-by: Michael Ellerman <m...@ellerman.id.au>
>
> The only current place I could think this could be remotely possible
> would be in simulator... and we should instead make the OPAL calls work
> properly in the simulator for all the RTAS functionality (that we care
> about).

If you mean mambo, I tested that, at least the public version, and it doesn't
provide or need RTAS.

On the other sims we ran without RTAS during the Power8 bringup, though it was
eventually used a little bit late in the cycle. In future we should be using
skiboot, or just putting logic directly into the kernel for early bringup - or
permanently :)

> Acked-by: Stewart Smith <stew...@linux.vnet.ibm.com>

Thanks.

cheers
Re: [RFC, powerpc] perf/hv-24x7 set the attr group to NULL if events failed to be initialized
On Sun, 2015-15-02 at 09:42:57 UTC, Li Zhong wrote:
> sysfs_create_groups() creates groups one by one in the attr_groups array
> before a NULL entry is encountered. But if an error is seen, it stops and
> removes all the groups already created:
>
>	for (i = 0; groups[i]; i++) {
>		error = sysfs_create_group(kobj, groups[i]);
>		if (error) {
>			while (--i >= 0)
>				sysfs_remove_group(kobj, groups[i]);
>			break;
>		}
>	}
>
> And for the three event groups of 24x7, if it is not supported,
> according to the above logic, it causes format and interface group to be
> removed because of the error.
>
> This patch moves the three events groups to the end of the attr groups, and
> if create_events_from_catalog() fails to set their attributes, we set them to
> NULL in attr_groups.

But why are we continuing at all if create_events_from_catalog() fails?

Shouldn't that just be a fatal error and we bail?

cheers
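The rollback behaviour described in the quoted loop can be tried out in user space. What follows is only a sketch: `struct group`, `create_group()`, `remove_group()` and `create_groups()` are hypothetical stand-ins, not the sysfs API, and the `fail` flag simply simulates a group whose creation fails.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an attribute group; not the sysfs type. */
struct group {
	const char *name;
	int fail;	/* non-zero: creating this group fails */
	int created;	/* non-zero while the group exists */
};

static int create_group(struct group *g)
{
	if (g->fail)
		return -1;
	g->created = 1;
	return 0;
}

static void remove_group(struct group *g)
{
	g->created = 0;
}

/*
 * Same shape as the quoted loop: create groups until the NULL
 * terminator; on the first failure, remove everything created so far.
 */
static int create_groups(struct group **groups)
{
	int error = 0;
	int i;

	for (i = 0; groups[i]; i++) {
		error = create_group(groups[i]);
		if (error) {
			while (--i >= 0)
				remove_group(groups[i]);
			break;
		}
	}
	return error;
}
```

With a failing group anywhere in the array, the groups before it are removed again, which is why the format and interface groups disappear. Putting a NULL in front of the failing entries, as the patch proposes, ends the walk before they are reached.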
Re: powerpc/perf: add missing put_cpu_var in power_pmu_event_init
- Original Message - From: Michael Ellerman m...@ellerman.id.au To: Jan Stancek jstan...@redhat.com, linuxppc-dev@lists.ozlabs.org Cc: linux-ker...@vger.kernel.org, pau...@samba.org, an...@samba.org, t...@kernel.org, c...@linux.com, jo...@redhat.com, jstan...@redhat.com, j...@jms.id.au Sent: Wednesday, 25 March, 2015 6:25:09 AM Subject: Re: powerpc/perf: add missing put_cpu_var in power_pmu_event_init On Tue, 2015-24-03 at 12:33:22 UTC, Jan Stancek wrote: One path in power_pmu_event_init() calls get_cpu_var(), but is missing matching call to put_cpu_var(), which causes preemption imbalance and crash in user-space: Page fault in user mode with in_atomic() = 1 mm = c01fefa5a280 NIP = 3fff9bf2cae0 MSR = 90014280f032 Oops: Weird page fault, sig: 11 [#23] snip Thanks. But I don't see this. I guess you have CONFIG_PREEMPT enabled? Hi, CONFIG_PREEMPT_NOTIFIERS=y # CONFIG_PREEMPT_NONE is not set CONFIG_PREEMPT_VOLUNTARY=y # CONFIG_PREEMPT is not set CONFIG_PREEMPT_COUNT=y but I think the difference comes from: CONFIG_DEBUG_ATOMIC_SLEEP=y I did following: - took the default config from RHEL7.1 kernel - ran 'make oldnoconfig'. 
- reproducer didn't trigger anything - then I added CONFIG_DEBUG_ATOMIC_SLEEP=y - this time reproducer triggered a panic (3 out of 3 attempts) Here's config from panic-ing kernel: http://fpaste.org/202543/ [ 133.957305] Page fault in user mode with in_atomic() = 1 mm = c5fc7e80 [ 133.957399] NIP = 3fff9be0cae0 MSR = 90014280f032 [ 133.957405] Oops: Weird page fault, sig: 11 [#1] [ 133.957409] SMP NR_CPUS=2048 NUMA PowerNV [ 133.957414] Modules linked in: ses enclosure shpchp uio_pdrv_genirq powernv_rng uio xfs libcrc32c sr_mod sd_mod cdrom ipr libata tg3 ptp pps_core dm_mirror dm_region_hash dm_log dm_mod [ 133.957638] CPU: 16 PID: 6035 Comm: a.out Not tainted 4.0.0-rc5+ #4 [ 133.957693] task: c00fea44b640 ti: c00fea5e4000 task.ti: c00fea5e4000 [ 133.957759] NIP: 3fff9be0cae0 LR: 3fff9bdc4898 CTR: 3fff9be0cae0 [ 133.957825] REGS: c00fea5e7ea0 TRAP: 0401 Not tainted (4.0.0-rc5+) [ 133.957880] MSR: 90014280f032 SF,HV,VEC,VSX,EE,PR,FP,ME,IR,DR,RI CR: 2228 XER: [ 133.958079] CFAR: 3fff9bdc4894 SOFTE: 1 GPR00: 3fff9bdc494c 31fef3e0 3fff9bf64410 10020068 GPR04: 0002 0008 0001 GPR08: 0001 3fff9bf54a30 3fff9be0cae0 3fff9be0cd70 GPR12: 5222 3fff9bfeb700 [ 133.958485] NIP [3fff9be0cae0] 0x3fff9be0cae0 [ 133.958530] LR [3fff9bdc4898] 0x3fff9bdc4898 [ 133.958574] Call Trace: [ 133.958597] ---[ end trace 56ec543903422cd9 ]--- [ 133.958642] [ 135.958709] Kernel panic - not syncing: Fatal exception [ 135.958863] Rebooting in 10 seconds.. 
[ 145.970348] BUG: sleeping function called from invalid context at kernel/irq/manage.c:104 [ 145.970453] in_atomic(): 1, irqs_disabled(): 1, pid: 6035, name: a.out [ 145.970515] CPU: 16 PID: 6035 Comm: a.out Tainted: G D 4.0.0-rc5+ #4 [ 145.970588] Call Trace: [ 145.970618] [c00fea5e76d0] [c07c2090] .dump_stack+0x98/0xd4 (unreliable) [ 145.970707] [c00fea5e7750] [c00d5fe4] .___might_sleep+0x124/0x170 [ 145.970782] [c00fea5e77c0] [c0112860] .synchronize_irq+0x40/0xe0 [ 145.970857] [c00fea5e7880] [c0112fa8] .__free_irq+0xf8/0x2b0 [ 145.970931] [c00fea5e7920] [c0113258] .free_irq+0x78/0x100 [ 145.971007] [c00fea5e79b0] [c0067ae8] .opal_shutdown+0x88/0x120 [ 145.971081] [c00fea5e7a40] [c0063e88] .pnv_shutdown+0x18/0x30 [ 145.971157] [c00fea5e7ab0] [c0020c98] .machine_shutdown+0x38/0x50 [ 145.971231] [c00fea5e7b20] [c0020d24] .machine_restart+0x14/0x70 [ 145.971307] [c00fea5e7ba0] [c00cdc10] .emergency_restart+0x20/0x40 [ 145.971393] [c00fea5e7c10] [c07bb0a4] .panic+0x224/0x2a4 [ 145.971468] [c00fea5e7cb0] [c001e1fc] .die+0x43c/0x450 [ 145.971543] [c00fea5e7d60] [c07b62c4] .do_page_fault+0x2d4/0x8f0 [ 145.971618] [c00fea5e7e30] [c0008664] handle_page_fault+0x10/0x30 Regards, Jan ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
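The imbalance discussed in this thread can be modelled in user space. This is a hedged sketch, not the kernel code: `preempt_count`, `model_get_cpu_var()` and `model_put_cpu_var()` are made-up stand-ins for the kernel's preemption counter and the `get_cpu_var()`/`put_cpu_var()` pair, and the `constraints_ok` condition is invented because the thread does not show which path in power_pmu_event_init() was affected.

```c
#include <assert.h>

static int preempt_count;	/* stand-in for the kernel's preemption counter */
static int cpu_events;		/* stand-in for a per-CPU variable */

/* Model of get_cpu_var(): disable preemption, hand out the variable. */
static int *model_get_cpu_var(void)
{
	preempt_count++;
	return &cpu_events;
}

/* Model of put_cpu_var(): re-enable preemption. */
static void model_put_cpu_var(void)
{
	preempt_count--;
}

/* Buggy shape: the error path returns without the matching put. */
static int event_init_buggy(int constraints_ok)
{
	int *events = model_get_cpu_var();

	(void)events;
	if (!constraints_ok)
		return -1;	/* preempt_count stays elevated */
	model_put_cpu_var();
	return 0;
}

/* Fixed shape: every exit after the get goes through the put. */
static int event_init_fixed(int constraints_ok)
{
	int *events = model_get_cpu_var();
	int err = 0;

	(void)events;
	if (!constraints_ok)
		err = -1;
	model_put_cpu_var();
	return err;
}
```

In the buggy variant the counter stays at 1 after the failing call. In the kernel that corresponds to returning to user space with preemption still disabled, which is consistent with the "in_atomic() = 1" report above.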
[PATCH v3 0/2] Tracking user space vDSO remapping
CRIU recreates the process memory layout by remapping the checkpointee's
memory areas on top of the current process (criu). This includes remapping
the vDSO to the address it had at checkpoint time.

However, some architectures, such as powerpc, keep a reference to the vDSO
base address in order to build the signal return stack frame, which calls
the vDSO sigreturn service. Once the vDSO has been moved, this reference is
no longer valid and signal frames built afterwards are not usable.

This patch series introduces a new mm hook, 'arch_remap', which is called
when mremap is done and the mm lock is still held. The second patch adds
vDSO remap and unmap tracking to the powerpc architecture.

Changes in v3:
--------------
- Fixed grammatical error in a comment of the second patch.
  Thanks again, Ingo.

Changes in v2:
--------------
- Following Ingo Molnar's advice, enabling the call to arch_remap through
  the __HAVE_ARCH_REMAP macro. This considerably reduces the first patch.

Laurent Dufour (2):
  mm: Introducing arch_remap hook
  powerpc/mm: Tracking vDSO remap

 arch/powerpc/include/asm/mmu_context.h | 36 +++++++++++++++++++++++++++++++++++-
 mm/mremap.c                            | 11 +++++++++--
 2 files changed, 44 insertions(+), 3 deletions(-)

--
1.9.1
Re: [PATCH 1/2] KVM: PPC: Use ACCESS_ONCE when dereferencing pte_t pointer
Hi,

Ignore this series, I used a wrong directory when sending out the patchset.
Will send a v3.

-aneesh
Re: [PATCH v2 2/2] powerpc/mm: Tracking vDSO remap
On 25/03/2015 13:11, Ingo Molnar wrote: * Laurent Dufour lduf...@linux.vnet.ibm.com wrote: Some processes (CRIU) are moving the vDSO area using the mremap system call. As a consequence the kernel reference to the vDSO base address is no more valid and the signal return frame built once the vDSO has been moved is not pointing to the new sigreturn address. This patch handles vDSO remapping and unmapping. Signed-off-by: Laurent Dufour lduf...@linux.vnet.ibm.com --- arch/powerpc/include/asm/mmu_context.h | 36 +- 1 file changed, 35 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h index 73382eba02dc..be5dca3f7826 100644 --- a/arch/powerpc/include/asm/mmu_context.h +++ b/arch/powerpc/include/asm/mmu_context.h @@ -8,7 +8,6 @@ #include linux/spinlock.h #include asm/mmu.h #include asm/cputable.h -#include asm-generic/mm_hooks.h #include asm/cputhreads.h /* @@ -109,5 +108,40 @@ static inline void enter_lazy_tlb(struct mm_struct *mm, #endif } +static inline void arch_dup_mmap(struct mm_struct *oldmm, + struct mm_struct *mm) +{ +} + +static inline void arch_exit_mmap(struct mm_struct *mm) +{ +} + +static inline void arch_unmap(struct mm_struct *mm, +struct vm_area_struct *vma, +unsigned long start, unsigned long end) +{ +if (start = mm-context.vdso_base mm-context.vdso_base end) +mm-context.vdso_base = 0; +} + +static inline void arch_bprm_mm_init(struct mm_struct *mm, + struct vm_area_struct *vma) +{ +} + +#define __HAVE_ARCH_REMAP +static inline void arch_remap(struct mm_struct *mm, + unsigned long old_start, unsigned long old_end, + unsigned long new_start, unsigned long new_end) +{ +/* + * mremap don't allow moving multiple vma so we can limit the check + * to old_start == vdso_base. s/mremap don't allow moving multiple vma mremap() doesn't allow moving multiple vmas right? Sure you're right. I'll provide a v3 fixing that comment. Thanks, Laurent. 
[PATCH V3 3/3] powerpc/mm/thp: Make page table walk safe against thp split/collapse
We can disable a THP split or a hugepage collapse by disabling irq. We do send IPI to all the cpus in the early part of split/collapse, and disabling local irq ensure we don't make progress with split/collapse. If the THP is getting split we return NULL from find_linux_pte_or_hugepte(). For all the current callers it should be ok. We need to be careful if we want to use returned pte_t pointer outside the irq disabled region. W.r.t to THP split, the pfn remains the same, but then a hugepage collapse will result in a pfn change. There are few steps we can take to avoid a hugepage collapse.One way is to take page reference inside the irq disable region. Other option is to take mmap_sem so that a parallel collapse will not happen. We can also disable collapse by taking pmd_lock. Another method used by kvm subsystem is to check whether we had a mmu_notifer update in between using mmu_notifier_retry(). Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com --- arch/powerpc/include/asm/kvm_book3s_64.h | 12 ++-- arch/powerpc/include/asm/pgtable.h | 11 ++- arch/powerpc/kernel/eeh.c| 6 -- arch/powerpc/kernel/io-workarounds.c | 10 +- arch/powerpc/kvm/book3s_64_mmu_hv.c | 14 ++ arch/powerpc/kvm/book3s_hv_rm_mmu.c | 32 arch/powerpc/kvm/e500_mmu_host.c | 14 -- arch/powerpc/mm/hash_utils_64.c | 2 +- arch/powerpc/mm/hugetlbpage.c| 20 ++-- arch/powerpc/perf/callchain.c| 24 ++-- 10 files changed, 92 insertions(+), 53 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h index f06820c67175..5233a35d80e2 100644 --- a/arch/powerpc/include/asm/kvm_book3s_64.h +++ b/arch/powerpc/include/asm/kvm_book3s_64.h @@ -281,11 +281,9 @@ static inline int hpte_cache_flags_ok(unsigned long ptel, unsigned long io_type) /* * If it's present and writable, atomically set dirty and referenced bits and - * return the PTE, otherwise return 0. 
If we find a transparent hugepage - * and if it is marked splitting we return 0; + * return the PTE, otherwise return 0. */ -static inline pte_t kvmppc_read_update_linux_pte(pte_t *ptep, int writing, -unsigned int hugepage) +static inline pte_t kvmppc_read_update_linux_pte(pte_t *ptep, int writing) { pte_t old_pte, new_pte = __pte(0); @@ -301,12 +299,6 @@ static inline pte_t kvmppc_read_update_linux_pte(pte_t *ptep, int writing, cpu_relax(); continue; } -#ifdef CONFIG_TRANSPARENT_HUGEPAGE - /* If hugepage and is trans splitting return None */ - if (unlikely(hugepage -pmd_trans_splitting(pte_pmd(old_pte - return __pte(0); -#endif /* If pte is not present return None */ if (unlikely(!(pte_val(old_pte) _PAGE_PRESENT))) return __pte(0); diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h index 92fe01c355a9..11a38635dd65 100644 --- a/arch/powerpc/include/asm/pgtable.h +++ b/arch/powerpc/include/asm/pgtable.h @@ -247,8 +247,17 @@ extern int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr, #define pmd_large(pmd) 0 #define has_transparent_hugepage() 0 #endif -pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea, +pte_t *__find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea, unsigned *shift); +static inline pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea, + unsigned *shift) +{ + if (!arch_irqs_disabled()) { + pr_info(%s called with irq enabled\n, __func__); + dump_stack(); + } + return __find_linux_pte_or_hugepte(pgdir, ea, shift); +} #endif /* __ASSEMBLY__ */ #endif /* __KERNEL__ */ diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c index 3b2252e7731b..8424b232e598 100644 --- a/arch/powerpc/kernel/eeh.c +++ b/arch/powerpc/kernel/eeh.c @@ -330,9 +330,11 @@ static inline unsigned long eeh_token_to_phys(unsigned long token) int hugepage_shift; /* -* We won't find hugepages here, iomem +* We won't find hugepages here(this is iomem). 
Hence we are not +* worried about _PAGE_SPLITTING/collapse. Also we will not hit +* page table free, because of init_mm. */ - ptep = find_linux_pte_or_hugepte(init_mm.pgd, token, hugepage_shift); + ptep = __find_linux_pte_or_hugepte(init_mm.pgd, token, hugepage_shift); if (!ptep) return token; WARN_ON(hugepage_shift); diff --git a/arch/powerpc/kernel/io-workarounds.c b/arch/powerpc/kernel/io-workarounds.c index
Re: [PATCH v2 2/2] powerpc/mm: Tracking vDSO remap
* Laurent Dufour lduf...@linux.vnet.ibm.com wrote: Some processes (CRIU) are moving the vDSO area using the mremap system call. As a consequence the kernel reference to the vDSO base address is no more valid and the signal return frame built once the vDSO has been moved is not pointing to the new sigreturn address. This patch handles vDSO remapping and unmapping. Signed-off-by: Laurent Dufour lduf...@linux.vnet.ibm.com --- arch/powerpc/include/asm/mmu_context.h | 36 +- 1 file changed, 35 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h index 73382eba02dc..be5dca3f7826 100644 --- a/arch/powerpc/include/asm/mmu_context.h +++ b/arch/powerpc/include/asm/mmu_context.h @@ -8,7 +8,6 @@ #include linux/spinlock.h #include asm/mmu.h #include asm/cputable.h -#include asm-generic/mm_hooks.h #include asm/cputhreads.h /* @@ -109,5 +108,40 @@ static inline void enter_lazy_tlb(struct mm_struct *mm, #endif } +static inline void arch_dup_mmap(struct mm_struct *oldmm, + struct mm_struct *mm) +{ +} + +static inline void arch_exit_mmap(struct mm_struct *mm) +{ +} + +static inline void arch_unmap(struct mm_struct *mm, + struct vm_area_struct *vma, + unsigned long start, unsigned long end) +{ + if (start = mm-context.vdso_base mm-context.vdso_base end) + mm-context.vdso_base = 0; +} + +static inline void arch_bprm_mm_init(struct mm_struct *mm, + struct vm_area_struct *vma) +{ +} + +#define __HAVE_ARCH_REMAP +static inline void arch_remap(struct mm_struct *mm, + unsigned long old_start, unsigned long old_end, + unsigned long new_start, unsigned long new_end) +{ + /* + * mremap don't allow moving multiple vma so we can limit the check + * to old_start == vdso_base. s/mremap don't allow moving multiple vma mremap() doesn't allow moving multiple vmas right? Thanks, Ingo ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v3] powerpc: Use PFN_PHYS() to avoid truncating the physical address
Signed-off-by: Emil Medve emilian.me...@freescale.com --- v3: Rebased and updated due to upstream changes since v2 v2: Rebased and updated due to upstream changes since v1 arch/powerpc/include/asm/io.h | 2 +- arch/powerpc/include/asm/page.h| 2 +- arch/powerpc/include/asm/pgalloc-32.h | 2 +- arch/powerpc/include/asm/rtas.h| 3 ++- arch/powerpc/kernel/crash_dump.c | 2 +- arch/powerpc/kernel/eeh.c | 4 +--- arch/powerpc/kernel/io-workarounds.c | 2 +- arch/powerpc/kernel/pci-common.c | 2 +- arch/powerpc/kernel/vdso.c | 6 +++--- arch/powerpc/kvm/book3s_64_mmu_host.c | 2 +- arch/powerpc/kvm/book3s_64_mmu_hv.c| 2 +- arch/powerpc/kvm/book3s_hv_rm_mmu.c| 4 ++-- arch/powerpc/kvm/e500_mmu_host.c | 5 ++--- arch/powerpc/mm/hugepage-hash64.c | 2 +- arch/powerpc/mm/hugetlbpage-book3e.c | 2 +- arch/powerpc/mm/hugetlbpage-hash64.c | 2 +- arch/powerpc/mm/mem.c | 9 - arch/powerpc/mm/numa.c | 5 ++--- arch/powerpc/platforms/powernv/opal.c | 2 +- arch/powerpc/platforms/pseries/iommu.c | 8 20 files changed, 32 insertions(+), 36 deletions(-) diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h index 9eaf301..d6454f5 100644 --- a/arch/powerpc/include/asm/io.h +++ b/arch/powerpc/include/asm/io.h @@ -794,7 +794,7 @@ static inline void * phys_to_virt(unsigned long address) /* * Change struct page to physical address. 
*/ -#define page_to_phys(page) ((phys_addr_t)page_to_pfn(page) PAGE_SHIFT) +#define page_to_phys(page) PFN_PHYS(page_to_pfn(page)) /* * 32 bits still uses virt_to_bus() for it's implementation of DMA diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h index 69c0598..30f33ed 100644 --- a/arch/powerpc/include/asm/page.h +++ b/arch/powerpc/include/asm/page.h @@ -128,7 +128,7 @@ extern long long virt_phys_offset; #endif #define virt_to_page(kaddr)pfn_to_page(__pa(kaddr) PAGE_SHIFT) -#define pfn_to_kaddr(pfn) __va((pfn) PAGE_SHIFT) +#define pfn_to_kaddr(pfn) __va(PFN_PHYS(pfn)) #define virt_addr_valid(kaddr) pfn_valid(__pa(kaddr) PAGE_SHIFT) /* diff --git a/arch/powerpc/include/asm/pgalloc-32.h b/arch/powerpc/include/asm/pgalloc-32.h index 842846c..3d19a8e 100644 --- a/arch/powerpc/include/asm/pgalloc-32.h +++ b/arch/powerpc/include/asm/pgalloc-32.h @@ -24,7 +24,7 @@ extern void pgd_free(struct mm_struct *mm, pgd_t *pgd); #define pmd_populate_kernel(mm, pmd, pte) \ (pmd_val(*(pmd)) = __pa(pte) | _PMD_PRESENT) #define pmd_populate(mm, pmd, pte) \ - (pmd_val(*(pmd)) = (page_to_pfn(pte) PAGE_SHIFT) | _PMD_PRESENT) + (pmd_val(*(pmd)) = PFN_PHYS(page_to_pfn(pte)) | _PMD_PRESENT) #define pmd_pgtable(pmd) pmd_page(pmd) #else #define pmd_populate_kernel(mm, pmd, pte) \ diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h index 2e23e92..2e430b6d 100644 --- a/arch/powerpc/include/asm/rtas.h +++ b/arch/powerpc/include/asm/rtas.h @@ -3,6 +3,7 @@ #ifdef __KERNEL__ #include linux/spinlock.h +#include linux/pfn.h #include asm/page.h /* @@ -418,7 +419,7 @@ extern void rtas_take_timebase(void); #ifdef CONFIG_PPC_RTAS static inline int page_is_rtas_user_buf(unsigned long pfn) { - unsigned long paddr = (pfn PAGE_SHIFT); + unsigned long paddr = PFN_PHYS(pfn); if (paddr = rtas_rmo_buf paddr (rtas_rmo_buf + RTAS_RMOBUF_MAX)) return 1; return 0; diff --git a/arch/powerpc/kernel/crash_dump.c b/arch/powerpc/kernel/crash_dump.c index 
cfa0f81..b6578ee 100644 --- a/arch/powerpc/kernel/crash_dump.c +++ b/arch/powerpc/kernel/crash_dump.c @@ -104,7 +104,7 @@ ssize_t copy_oldmem_page(unsigned long pfn, char *buf, return 0; csize = min_t(size_t, csize, PAGE_SIZE); - paddr = pfn PAGE_SHIFT; + paddr = PFN_PHYS(pfn); if (memblock_is_region_memory(paddr, csize)) { vaddr = __va(paddr); diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c index 3b2252e..119af20 100644 --- a/arch/powerpc/kernel/eeh.c +++ b/arch/powerpc/kernel/eeh.c @@ -326,7 +326,6 @@ void eeh_slot_error_detail(struct eeh_pe *pe, int severity) static inline unsigned long eeh_token_to_phys(unsigned long token) { pte_t *ptep; - unsigned long pa; int hugepage_shift; /* @@ -336,9 +335,8 @@ static inline unsigned long eeh_token_to_phys(unsigned long token) if (!ptep) return token; WARN_ON(hugepage_shift); - pa = pte_pfn(*ptep) PAGE_SHIFT; - return pa | (token (PAGE_SIZE-1)); + return PFN_PHYS(pte_pfn(*ptep)) | (token (PAGE_SIZE - 1)); } /* diff --git a/arch/powerpc/kernel/io-workarounds.c b/arch/powerpc/kernel/io-workarounds.c index 24b968f..dd9a4a2 100644 --- a/arch/powerpc/kernel/io-workarounds.c +++
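The truncation that PFN_PHYS() avoids can be shown with a small user-space sketch. It assumes a configuration where `unsigned long` is 32 bits while physical addresses are 64 bits wide; `pfn32_t` and the two helper names are hypothetical and exist only to model that case on any host.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12

typedef uint64_t phys_addr_t;	/* assumption: 64-bit physical addresses */
typedef uint32_t pfn32_t;	/* models a 32-bit unsigned long */

/* Shift done in 32-bit arithmetic: the high bits are lost before widening. */
static phys_addr_t pfn_to_phys_truncating(pfn32_t pfn)
{
	return (pfn32_t)(pfn << PAGE_SHIFT);
}

/* Widen first, then shift, which is the order PFN_PHYS() uses. */
static phys_addr_t pfn_to_phys(pfn32_t pfn)
{
	return (phys_addr_t)pfn << PAGE_SHIFT;
}
```

For the page frame at the 4 GiB boundary (pfn 0x100000 with 4 KiB pages) the first helper yields 0 and the second yields 0x100000000, so any physical address above 4 GiB is affected by the shift-before-widen pattern.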
[PATCH V3 2/3] powerpc/mm: Remove page table walk helpers
This patch remove helpers which we had used only once in the code. Limiting page table walk variants help in ensuring that we won't end up with code walking page table with wrong assumptions. Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com --- arch/powerpc/include/asm/pgtable.h | 21 - arch/powerpc/kvm/book3s_hv_rm_mmu.c | 62 - arch/powerpc/kvm/e500_mmu_host.c| 2 +- 3 files changed, 28 insertions(+), 57 deletions(-) diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h index 9835ac4173b7..92fe01c355a9 100644 --- a/arch/powerpc/include/asm/pgtable.h +++ b/arch/powerpc/include/asm/pgtable.h @@ -249,27 +249,6 @@ extern int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr, #endif pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea, unsigned *shift); - -static inline pte_t *lookup_linux_ptep(pgd_t *pgdir, unsigned long hva, -unsigned long *pte_sizep) -{ - pte_t *ptep; - unsigned long ps = *pte_sizep; - unsigned int shift; - - ptep = find_linux_pte_or_hugepte(pgdir, hva, shift); - if (!ptep) - return NULL; - if (shift) - *pte_sizep = 1ul shift; - else - *pte_sizep = PAGE_SIZE; - - if (ps *pte_sizep) - return NULL; - - return ptep; -} #endif /* __ASSEMBLY__ */ #endif /* __KERNEL__ */ diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c index 625407e4d3b0..73e083cb9f7e 100644 --- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c +++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c @@ -131,25 +131,6 @@ static void remove_revmap_chain(struct kvm *kvm, long pte_index, unlock_rmap(rmap); } -static pte_t lookup_linux_pte_and_update(pgd_t *pgdir, unsigned long hva, - int writing, unsigned long *pte_sizep) -{ - pte_t *ptep; - unsigned long ps = *pte_sizep; - unsigned int hugepage_shift; - - ptep = find_linux_pte_or_hugepte(pgdir, hva, hugepage_shift); - if (!ptep) - return __pte(0); - if (hugepage_shift) - *pte_sizep = 1ul hugepage_shift; - else - *pte_sizep = PAGE_SIZE; - if (ps *pte_sizep) - 
return __pte(0); - return kvmppc_read_update_linux_pte(ptep, writing, hugepage_shift); -} - static inline void unlock_hpte(__be64 *hpte, unsigned long hpte_v) { asm volatile(PPC_RELEASE_BARRIER : : : memory); @@ -166,10 +147,10 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags, struct revmap_entry *rev; unsigned long g_ptel; struct kvm_memory_slot *memslot; - unsigned long pte_size; + unsigned hpage_shift; unsigned long is_io; unsigned long *rmap; - pte_t pte; + pte_t *ptep; unsigned int writing; unsigned long mmu_seq; unsigned long rcbits; @@ -208,22 +189,33 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags, /* Translate to host virtual address */ hva = __gfn_to_hva_memslot(memslot, gfn); + ptep = find_linux_pte_or_hugepte(pgdir, hva, hpage_shift); + if (ptep) { + pte_t pte; + unsigned int host_pte_size; - /* Look up the Linux PTE for the backing page */ - pte_size = psize; - pte = lookup_linux_pte_and_update(pgdir, hva, writing, pte_size); - if (pte_present(pte) !pte_protnone(pte)) { - if (writing !pte_write(pte)) - /* make the actual HPTE be read-only */ - ptel = hpte_make_readonly(ptel); - is_io = hpte_cache_bits(pte_val(pte)); - pa = pte_pfn(pte) PAGE_SHIFT; - pa |= hva (pte_size - 1); - pa |= gpa ~PAGE_MASK; - } + if (hpage_shift) + host_pte_size = 1ul hpage_shift; + else + host_pte_size = PAGE_SIZE; + /* +* We should always find the guest page size +* to = host page size, if host is using hugepage +*/ + if (host_pte_size psize) + return H_PARAMETER; - if (pte_size psize) - return H_PARAMETER; + pte = kvmppc_read_update_linux_pte(ptep, writing, hpage_shift); + if (pte_present(pte) !pte_protnone(pte)) { + if (writing !pte_write(pte)) + /* make the actual HPTE be read-only */ + ptel = hpte_make_readonly(ptel); + is_io = hpte_cache_bits(pte_val(pte)); + pa = pte_pfn(pte) PAGE_SHIFT; + pa |= hva (host_pte_size - 1); + pa |= gpa
[PATCH v3 2/2] powerpc/mm: Tracking vDSO remap
Some processes (CRIU) move the vDSO area using the mremap system call. As a
consequence, the kernel's reference to the vDSO base address is no longer
valid, and a signal return frame built after the vDSO has been moved does
not point to the new sigreturn address.

This patch handles vDSO remapping and unmapping.

Signed-off-by: Laurent Dufour <lduf...@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/mmu_context.h | 36 +++++++++++++++++++++++++++++++++++-
 1 file changed, 35 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index 73382eba02dc..7d315c1898d4 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -8,7 +8,6 @@
 #include <linux/spinlock.h>
 #include <asm/mmu.h>
 #include <asm/cputable.h>
-#include <asm-generic/mm_hooks.h>
 #include <asm/cputhreads.h>
 
 /*
@@ -109,5 +108,40 @@ static inline void enter_lazy_tlb(struct mm_struct *mm,
 #endif
 }
 
+static inline void arch_dup_mmap(struct mm_struct *oldmm,
+				 struct mm_struct *mm)
+{
+}
+
+static inline void arch_exit_mmap(struct mm_struct *mm)
+{
+}
+
+static inline void arch_unmap(struct mm_struct *mm,
+			struct vm_area_struct *vma,
+			unsigned long start, unsigned long end)
+{
+	if (start <= mm->context.vdso_base && mm->context.vdso_base < end)
+		mm->context.vdso_base = 0;
+}
+
+static inline void arch_bprm_mm_init(struct mm_struct *mm,
+				     struct vm_area_struct *vma)
+{
+}
+
+#define __HAVE_ARCH_REMAP
+static inline void arch_remap(struct mm_struct *mm,
+			      unsigned long old_start, unsigned long old_end,
+			      unsigned long new_start, unsigned long new_end)
+{
+	/*
+	 * mremap() doesn't allow moving multiple vmas so we can limit the
+	 * check to old_start == vdso_base.
+	 */
+	if (old_start == mm->context.vdso_base)
+		mm->context.vdso_base = new_start;
+}
+
 #endif /* __KERNEL__ */
 #endif /* __ASM_POWERPC_MMU_CONTEXT_H */
--
1.9.1
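The effect of the two hooks on vdso_base can be exercised outside the kernel. The sketch below is a user-space model under stated assumptions: the structures are reduced to the single field that matters, the `model_` prefixed functions are hypothetical names, and the unused `vma` argument of the real arch_unmap() is dropped.

```c
#include <assert.h>

/* Minimal stand-ins for the kernel structures; only vdso_base matters. */
struct mm_context {
	unsigned long vdso_base;
};

struct mm_struct {
	struct mm_context context;
};

/* Forget the vDSO when the unmapped range [start, end) covers its base. */
static void model_arch_unmap(struct mm_struct *mm,
			     unsigned long start, unsigned long end)
{
	if (start <= mm->context.vdso_base && mm->context.vdso_base < end)
		mm->context.vdso_base = 0;
}

/* Follow the vDSO when the moved area starts exactly at its base. */
static void model_arch_remap(struct mm_struct *mm,
			     unsigned long old_start, unsigned long old_end,
			     unsigned long new_start, unsigned long new_end)
{
	(void)old_end;
	(void)new_end;
	if (old_start == mm->context.vdso_base)
		mm->context.vdso_base = new_start;
}
```

Moving an unrelated area leaves vdso_base alone, moving the area that starts at vdso_base updates it, and unmapping a range that contains the base resets it to 0.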
[PATCH v3 1/2] mm: Introducing arch_remap hook
Some architectures would like to be notified when a memory area is moved
through the mremap system call.

This patch introduces a new arch_remap mm hook which is placed in the path
of mremap, and is called before the old area is unmapped (and the arch_unmap
hook is called).

Architectures that need this hook should define __HAVE_ARCH_REMAP in their
asm/mmu_context.h and provide the arch_remap service with the following
prototype:

void arch_remap(struct mm_struct *mm,
                unsigned long old_start, unsigned long old_end,
                unsigned long new_start, unsigned long new_end);

Signed-off-by: Laurent Dufour <lduf...@linux.vnet.ibm.com>
---
 mm/mremap.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/mm/mremap.c b/mm/mremap.c
index 57dadc025c64..bafc234db45c 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -25,6 +25,7 @@
 
 #include <asm/cacheflush.h>
 #include <asm/tlbflush.h>
+#include <asm/mmu_context.h>
 
 #include "internal.h"
 
@@ -286,8 +287,14 @@ static unsigned long move_vma(struct vm_area_struct *vma,
 		old_len = new_len;
 		old_addr = new_addr;
 		new_addr = -ENOMEM;
-	} else if (vma->vm_file && vma->vm_file->f_op->mremap)
-		vma->vm_file->f_op->mremap(vma->vm_file, new_vma);
+	} else {
+		if (vma->vm_file && vma->vm_file->f_op->mremap)
+			vma->vm_file->f_op->mremap(vma->vm_file, new_vma);
+#ifdef __HAVE_ARCH_REMAP
+		arch_remap(mm, old_addr, old_addr+old_len,
+			   new_addr, new_addr+new_len);
+#endif
+	}
 
 	/* Conceal VM_ACCOUNT so old reservation is not undone */
 	if (vm_flags & VM_ACCOUNT) {
--
1.9.1
[PATCH V3 1/3] KVM: PPC: Use READ_ONCE when dereferencing pte_t pointer
pte can get updated from other CPUs as part of multiple activities
like THP split, huge page collapse, unmap. We need to make sure we
don't reload the pte value again and again for different checks.

Signed-off-by: Aneesh Kumar K.V <aneesh.ku...@linux.vnet.ibm.com>
---
NOTE:
The series depends on the patch
[PATCH 4/6] powerpc: Fix compile errors with STRICT_MM_TYPECHECKS enabled

 arch/powerpc/include/asm/kvm_book3s_64.h |  5 -
 arch/powerpc/kvm/e500_mmu_host.c         | 20
 2 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index cc073a7ac2b7..f06820c67175 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -290,7 +290,10 @@ static inline pte_t kvmppc_read_update_linux_pte(pte_t *ptep, int writing,
 	pte_t old_pte, new_pte = __pte(0);

 	while (1) {
-		old_pte = *ptep;
+		/*
+		 * Make sure we don't reload from ptep
+		 */
+		old_pte = READ_ONCE(*ptep);
 		/*
 		 * wait until _PAGE_BUSY is clear then set it atomically
 		 */
diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
index cc536d4a75ef..5840d546aa03 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -469,14 +469,18 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,

 		pgdir = vcpu_e500->vcpu.arch.pgdir;
 		ptep = lookup_linux_ptep(pgdir, hva, &tsize_pages);
-		if (pte_present(*ptep))
-			wimg = (*ptep >> PTE_WIMGE_SHIFT) & MAS2_WIMGE_MASK;
-		else {
-			if (printk_ratelimit())
-				pr_err("%s: pte not present: gfn %lx, pfn %lx\n",
-					__func__, (long)gfn, pfn);
-			ret = -EINVAL;
-			goto out;
+		if (ptep) {
+			pte_t pte = READ_ONCE(*ptep);
+
+			if (pte_present(pte))
+				wimg = (pte_val(pte) >> PTE_WIMGE_SHIFT) &
+					MAS2_WIMGE_MASK;
+			else {
+				pr_err_ratelimited("%s: pte not present: gfn %lx,pfn %lx\n",
+						   __func__, (long)gfn, pfn);
+				ret = -EINVAL;
+				goto out;
+			}
 		}
 		kvmppc_e500_ref_setup(ref, gtlbe, pfn, wimg);

--
2.1.0
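The idea behind the patch above (take one snapshot of the pte, then run every check against that snapshot) can be sketched in userspace C as follows. SKETCH_READ_ONCE is a simplified stand-in for the kernel macro that only handles scalar types and relies on the GNU C __typeof__ extension; the pte layout, bit positions and function name are hypothetical and chosen only for this illustration.

```c
/* Simplified stand-in for READ_ONCE(): force exactly one load through a
 * volatile-qualified pointer. Scalar types only; needs GNU C __typeof__. */
#define SKETCH_READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

typedef unsigned long sketch_pte_t;	/* hypothetical pte representation */

#define SKETCH_PAGE_PRESENT	0x1UL	/* hypothetical "present" bit */
#define SKETCH_WIMGE_SHIFT	6	/* hypothetical attribute shift */
#define SKETCH_WIMGE_MASK	0x1fUL	/* hypothetical attribute mask */

/* Returns 0 and stores the attribute bits in *wimg, or -1 when the pte
 * is not present. Both checks use the same snapshot, so a concurrent
 * update of *ptep cannot make them disagree with each other. */
static int sketch_pte_attrs(sketch_pte_t *ptep, unsigned long *wimg)
{
	sketch_pte_t pte = SKETCH_READ_ONCE(*ptep);	/* single load */

	if (!(pte & SKETCH_PAGE_PRESENT))
		return -1;

	*wimg = (pte >> SKETCH_WIMGE_SHIFT) & SKETCH_WIMGE_MASK;
	return 0;
}
```

Without the snapshot, a compiler is free to load *ptep once for the presence check and again for the attribute extraction, which is the reload the commit message warns about.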
Re: drivers/of: Add empty ranges quirk for PA-Semi
On Mon, 23 Mar 2015 15:06:35 +1100, Benjamin Herrenschmidt <b...@kernel.crashing.org> wrote:
> On Mon, 2015-03-23 at 14:50 +1100, Michael Ellerman wrote:
> > On Mon, 2015-23-03 at 03:16:38 UTC, Benjamin Herrenschmidt wrote:
> > > The sdc node is missing the ranges property, it needs to be treated
> > > as having an empty one otherwise translation fails for its children.
> > >
> > > Tested-by: Steven Rostedt <rost...@goodmis.org>
> > > Signed-off-by: Benjamin Herrenschmidt <b...@kernel.crashing.org>
> >
> > Fixes: 746c9e9f92dd ("of/base: Fix PowerPC address parsing hack")
> >
> > Which went into 3.18-rc6, and was CC'ed to stable. So this should probably
> > also go to stable no?
>
> Sure, go for it.

Applied, thanks.

g.
[PATCH v7 RFC 3/3] sparc: Make LDC use common iommu pool management functions
Note that this conversion is only being done to consolidate the
code and ensure that the common code provides the sufficient
abstraction. It is not expected to result in any noticeable
performance improvement, as there is typically one ldc_iommu
per vnet_port, and each one has 8k entries, with a typical
request for 1-4 pages.  Thus LDC uses npools == 1.

Signed-off-by: Sowmini Varadhan <sowmini.varad...@oracle.com>
---
v3: added this file to be a consumer of the common iommu library
v4: removed ->cookie_to_index and ->demap from iommu_tbl_ops and instead
    inline these calls into ldc before calling into iommu-common
v6: remove iommu_tbl_ops
v7: move pool_hash initialization to iommu_tbl_pool_init

 arch/sparc/kernel/ldc.c |  152 ---
 1 files changed, 64 insertions(+), 88 deletions(-)

diff --git a/arch/sparc/kernel/ldc.c b/arch/sparc/kernel/ldc.c
index 274a9f5..e858968 100644
--- a/arch/sparc/kernel/ldc.c
+++ b/arch/sparc/kernel/ldc.c
@@ -15,6 +15,7 @@
 #include <linux/list.h>
 #include <linux/init.h>
 #include <linux/bitmap.h>
+#include <linux/iommu-common.h>

 #include <asm/hypervisor.h>
 #include <asm/iommu.h>
@@ -27,6 +28,10 @@
 #define DRV_MODULE_VERSION	"1.1"
 #define DRV_MODULE_RELDATE	"July 22, 2008"

+#define COOKIE_PGSZ_CODE	0xf000000000000000ULL
+#define COOKIE_PGSZ_CODE_SHIFT	60ULL
+
+
 static char version[] =
 	DRV_MODULE_NAME ".c:v" DRV_MODULE_VERSION " (" DRV_MODULE_RELDATE ")\n";
 #define LDC_PACKET_SIZE		64
@@ -98,10 +103,10 @@ static const struct ldc_mode_ops stream_ops;
 int ldom_domaining_enabled;

 struct ldc_iommu {
-	/* Protects arena alloc/free.  */
+	/* Protects ldc_unmap.  */
 	spinlock_t			lock;
-	struct iommu_arena		arena;
 	struct ldc_mtable_entry		*page_table;
+	struct iommu_table		iommu_table;
 };

 struct ldc_channel {
@@ -998,31 +1003,59 @@ static void free_queue(unsigned long num_entries, struct ldc_packet *q)
 	free_pages((unsigned long)q, order);
 }

+static unsigned long ldc_cookie_to_index(u64 cookie, void *arg)
+{
+	u64 szcode = cookie >> COOKIE_PGSZ_CODE_SHIFT;
+	/* struct ldc_iommu *ldc_iommu = (struct ldc_iommu *)arg; */
+
+	cookie &= ~COOKIE_PGSZ_CODE;
+
+	return (cookie >> (13ULL + (szcode * 3ULL)));
+}
+
+static void ldc_demap(struct ldc_iommu *iommu, unsigned long id, u64 cookie,
+		      unsigned long entry, unsigned long npages)
+{
+	struct ldc_mtable_entry *base;
+	unsigned long i, shift;
+
+	shift = (cookie >> COOKIE_PGSZ_CODE_SHIFT) * 3;
+	base = iommu->page_table + entry;
+	for (i = 0; i < npages; i++) {
+		if (base->cookie)
+			sun4v_ldc_revoke(id, cookie + (i << shift),
+					 base->cookie);
+		base->mte = 0;
+	}
+}
+
 /* XXX Make this configurable... XXX */
 #define LDC_IOTABLE_SIZE	(8 * 1024)

-static int ldc_iommu_init(struct ldc_channel *lp)
+static int ldc_iommu_init(const char *name, struct ldc_channel *lp)
 {
 	unsigned long sz, num_tsb_entries, tsbsize, order;
-	struct ldc_iommu *iommu = &lp->iommu;
+	struct ldc_iommu *ldc_iommu = &lp->iommu;
+	struct iommu_table *iommu = &ldc_iommu->iommu_table;
 	struct ldc_mtable_entry *table;
 	unsigned long hv_err;
 	int err;

 	num_tsb_entries = LDC_IOTABLE_SIZE;
 	tsbsize = num_tsb_entries * sizeof(struct ldc_mtable_entry);
-
-	spin_lock_init(&iommu->lock);
+	spin_lock_init(&ldc_iommu->lock);

 	sz = num_tsb_entries / 8;
 	sz = (sz + 7UL) & ~7UL;
-	iommu->arena.map = kzalloc(sz, GFP_KERNEL);
-	if (!iommu->arena.map) {
+	iommu->map = kzalloc(sz, GFP_KERNEL);
+	if (!iommu->map) {
 		printk(KERN_ERR PFX "Alloc of arena map failed, sz=%lu\n", sz);
 		return -ENOMEM;
 	}
-
-	iommu->arena.limit = num_tsb_entries;
+	iommu_tbl_pool_init(iommu, num_tsb_entries, PAGE_SHIFT,
+			    NULL, false /* no large pool */,
+			    1 /* npools */,
+			    true /* skip span boundary check */);

 	order = get_order(tsbsize);

@@ -1037,7 +1070,7 @@ static int ldc_iommu_init(struct ldc_channel *lp)

 	memset(table, 0, PAGE_SIZE << order);

-	iommu->page_table = table;
+	ldc_iommu->page_table = table;

 	hv_err = sun4v_ldc_set_map_table(lp->id, __pa(table),
 					 num_tsb_entries);
@@ -1049,31 +1082,32 @@ static int ldc_iommu_init(struct ldc_channel *lp)

 out_free_table:
 	free_pages((unsigned long) table, order);
-	iommu->page_table = NULL;
+	ldc_iommu->page_table = NULL;

 out_free_map:
-	kfree(iommu->arena.map);
-	iommu->arena.map = NULL;
+	kfree(iommu->map);
+	iommu->map = NULL;

 	return err;
 }

 static void ldc_iommu_release(struct
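The cookie decoding done by ldc_cookie_to_index() in the patch above can be tried out in isolation with the sketch below. It assumes the page-size code occupies the top four bits of the cookie (a mask of 0xf000000000000000, consistent with the shift of 60 in the patch) and that each step of the code adds three bits to a base page shift of 13. The SKETCH_* names are hypothetical.

```c
#include <stdint.h>

/* Assumed layout: page-size code in bits 63..60 of the cookie. */
#define SKETCH_PGSZ_CODE	0xf000000000000000ULL
#define SKETCH_PGSZ_CODE_SHIFT	60ULL

/* Strip the page-size code, then divide the remaining address bits by
 * the page size that the code selects (8K for code 0, 64K for code 1,
 * and so on, three extra bits per step). */
static uint64_t sketch_cookie_to_index(uint64_t cookie)
{
	uint64_t szcode = cookie >> SKETCH_PGSZ_CODE_SHIFT;

	cookie &= ~SKETCH_PGSZ_CODE;

	return cookie >> (13ULL + szcode * 3ULL);
}
```

For example, a cookie with code 0 and address bits 5 << 13 maps to index 5, and a cookie with code 1 and address bits 7 << 16 maps to index 7.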
[PATCH] powerpc/powernv: handle OPAL_SUCCESS return in opal_sensor_read
Currently, when a sensor value is read, the kernel calls OPAL, which in
turn builds a message for the FSP, and waits for a message back.

The new device tree for OPAL sensors [1] adds new sensors that can be
read synchronously (core temperatures for instance) and that don't need
to wait for a response.

This patch modifies the opal call to accept an OPAL_SUCCESS return value
and cover the case above.

[1] https://lists.ozlabs.org/pipermail/skiboot/2015-March/000639.html

Signed-off-by: Cédric Le Goater <c...@fr.ibm.com>
---

 We still uselessly reserve a token (for the response) and take a
 lock, which might raise the need of a new 'opal_sensor_read_sync'
 call.

 arch/powerpc/platforms/powernv/opal-sensor.c | 29 +-
 1 file changed, 19 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/opal-sensor.c b/arch/powerpc/platforms/powernv/opal-sensor.c
index 4ab67ef7abc9..99d6d9a371ab 100644
--- a/arch/powerpc/platforms/powernv/opal-sensor.c
+++ b/arch/powerpc/platforms/powernv/opal-sensor.c
@@ -46,18 +46,27 @@ int opal_get_sensor_data(u32 sensor_hndl, u32 *sensor_data)

 	mutex_lock(&opal_sensor_mutex);
 	ret = opal_sensor_read(sensor_hndl, token, &data);
-	if (ret != OPAL_ASYNC_COMPLETION)
-		goto out_token;
+	switch (ret) {
+	case OPAL_ASYNC_COMPLETION:
+		ret = opal_async_wait_response(token, &msg);
+		if (ret) {
+			pr_err("%s: Failed to wait for the async response, %d\n",
+			       __func__, ret);
+			goto out_token;
+		}

-	ret = opal_async_wait_response(token, &msg);
-	if (ret) {
-		pr_err("%s: Failed to wait for the async response, %d\n",
-				__func__, ret);
-		goto out_token;
-	}
+		ret = be64_to_cpu(msg.params[1]);
+
+		*sensor_data = be32_to_cpu(data);
+		break;

-	*sensor_data = be32_to_cpu(data);
-	ret = be64_to_cpu(msg.params[1]);
+	case OPAL_SUCCESS:
+		*sensor_data = be32_to_cpu(data);
+		break;
+
+	default:
+		break;
+	}

 out_token:
 	mutex_unlock(&opal_sensor_mutex);
--
1.7.10.4
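The control flow the patch introduces (accept an immediate success, wait only when the call reports asynchronous completion, pass any other code through) can be sketched with a mock firmware as below. Everything here is hypothetical: the return codes are not the real OPAL values, struct sketch_sensor stands in for the firmware, and token handling, locking and byte-order conversion are left out.

```c
#include <stdint.h>

/* Hypothetical return codes for this sketch only; not the OPAL values. */
enum sketch_rc {
	SKETCH_SUCCESS = 0,
	SKETCH_ASYNC_COMPLETION = 1,
	SKETCH_HARDWARE = 2,
};

/* Hypothetical sensor model standing in for the firmware side. */
struct sketch_sensor {
	int is_async;	/* non-zero: value only arrives with the response */
	int broken;	/* non-zero: the read call fails outright */
	uint32_t value;
};

static int sketch_sensor_read(const struct sketch_sensor *s, uint32_t *data)
{
	if (s->broken)
		return SKETCH_HARDWARE;
	if (s->is_async)
		return SKETCH_ASYNC_COMPLETION;
	*data = s->value;		/* synchronous sensor: data is ready */
	return SKETCH_SUCCESS;
}

/* Stand-in for waiting on the async response: delivers the value and the
 * final status, and returns 0 when the wait itself succeeded. */
static int sketch_wait_response(const struct sketch_sensor *s,
				uint32_t *data, int *status)
{
	*data = s->value;
	*status = SKETCH_SUCCESS;
	return 0;
}

static int sketch_get_sensor_data(const struct sketch_sensor *s, uint32_t *out)
{
	uint32_t data = 0;
	int status = 0;
	int ret = sketch_sensor_read(s, &data);

	switch (ret) {
	case SKETCH_ASYNC_COMPLETION:
		ret = sketch_wait_response(s, &data, &status);
		if (ret)
			break;		/* waiting failed, report that error */
		ret = status;		/* final status comes with the response */
		*out = data;
		break;
	case SKETCH_SUCCESS:
		*out = data;		/* value was available immediately */
		break;
	default:
		break;			/* any other code is returned unchanged */
	}
	return ret;
}
```

Both the synchronous and the asynchronous sensor end up returning SKETCH_SUCCESS with the value stored through out, while a failing read leaves the output untouched.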
Re: net: ucc: tbi phy detection broken by 058112c7efc9ef43bb511c137293dddbe6e42908
On Sat, Dec 20, 2014 at 09:08:51AM -0800, Florian Fainelli wrote:
> 2014-12-18 19:49 GMT-08:00 Lennart Sorensen <lsore...@csclub.uwaterloo.ca>:
> > I have been trying to move an 8360 based system from a 3.0 kernel to a
> > 3.12 (on the way to 3.14 with ipipe/xenomai) kernel and encountered an
> > oops in the ucc_geth driver when using RTBI mode on one of the ucc ports.
> >
> > I haven't managed to find any commits to of_mdio or ucc_geth or
> > fsl_pq_mdio that would appear to address this problem, so I believe it
> > is still present in the latest kernel, but have not confirmed that with
> > testing yet.
> >
> > Commit 058112c7efc9ef43bb511c137293dddbe6e42908 appears to have broken
> > ucc support for tbi phy detection.
> >
> > With the patch in place, I am unable to get the mdio bus to create phy
> > devices for the tbi phy in the ucc on an 8360e, and the ucc_geth driver
> > causes a kernel oops, while with the patch reverted, it does create them
> > and the driver comes up and works.
> >
> > The tbi phy is needed when using a ucc in RTBI, TBI or SGMII mode.
> >
> > I am not convinced that the tbi phy really behaves quite like a real phy,
> > which may be why get_phy_device does not work with it.  Perhaps there
> > is a better way to deal with the tbi phy on the ucc for this purpose.
>
> There are some comments in ucc_geth that also lead me to believe this
> is a just a hack instead of a real Ethernet PHY device. Part of what I
> think got broken is because of this comment:
>
>         /* Initialize TBI PHY interface for communicating with the
>          * SERDES lynx PHY on the chip.  We communicate with this PHY
>          * through the MDIO bus on each controller, treating it as a
>          * normal PHY at the address found in the UTBIPA register.  We assume
>          * that the UTBIPA register is valid.  Either the MDIO bus code will set
>          * it to a value that doesn't conflict with other PHYs on the bus, or the
>          * value doesn't matter, as there are no other PHYs on the bus.
>          */
>
> In particular this one: "Either the MDIO bus code will set
>          * it to a value that doesn't conflict with other PHYs on the bus, or the
>          * value doesn't matter, as there are no other PHYs on the bus."
>
> and what Sebastian removed did exactly that, we used the special MDIO
> broadcast address 0 to provide this "whatever".
>
> If this is such a requirement from the ucc_geth driver and TBI PHYs,
> maybe we should have this hack somewhere in the actual MDIO driver
> used by the ucc_geth driver instead, or set a flag/read the PHY
> connection mode and do this in drivers/of/of_mdio.c

I discovered a problem with the tbi address handling on ucc_geth.

In get_ucc_tbipa, the passed in pointer is expecting a pointer to a struct
fsl_pq_mdio, but on ucc the pointer is actually to the start of the mii
area, since it doesn't have all the stuff that the etsec2 has, so as a
result the address returned for tbipa is actually 1312 bytes too high,
which means the address never gets set of course.  In fact the driver
prints out cr=0 and sr=0, while with the older working driver it printed
cr=140 and sr=149.

As a quick test I did:

	}
	tbipa = data->get_tbipa(priv->map - offsetof(struct fsl_pq_mdio, mii));
	out_be32(tbipa, be32_to_cpup(prop));

and that made it work, but of course is ugly and would break etsec2.

Any suggestion for a clean way to make get_ucc_tbipa able to dereference
the structure correctly?

I suppose I could do:

/*
 * Return the TBIPAR address for a QE MDIO node
 */
static uint32_t __iomem *get_ucc_tbipa(void __iomem *p)
{
	struct fsl_pq_mdio __iomem *mdio = p - offsetof(struct fsl_pq_mdio, mii);

	return &mdio->utbipar;
}

but it seems like just putting more hacks in place.  The use of the
mii_offset in the first place seems like a clue that defining one
structure for etsec2 and ucc and such even though it doesn't apply to
both is probably an error.  It would just be using mii_offset in reverse
for the ucc, versus the etsec2.

-- 
Len Sorensen
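The pointer adjustment discussed in the message above (step back from the mii member to the start of the enclosing register block) can be illustrated in plain C with offsetof. The layout below is hypothetical apart from the leading pad, which is sized to the 1312-byte discrepancy reported in the message; the struct and function names are made up for this sketch and there is no __iomem handling.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MII register group. */
struct sketch_mii {
	uint32_t miimcfg;
	uint32_t miimcom;
};

/* Hypothetical register block; only the 1312-byte pad in front of the
 * mii member is taken from the message, the rest is illustrative. */
struct sketch_mdio_regs {
	uint8_t reserved[1312];
	struct sketch_mii mii;
	uint32_t utbipar;
};

/* Given a pointer to the mii member, recover the enclosing block and
 * return the address of its utbipar register. */
static uint32_t *sketch_get_tbipa(void *mii_base)
{
	struct sketch_mdio_regs *regs = (struct sketch_mdio_regs *)
		((char *)mii_base - offsetof(struct sketch_mdio_regs, mii));

	return &regs->utbipar;
}
```

Treating the mii pointer as if it were already the start of the block would instead yield an address offsetof(struct sketch_mdio_regs, mii) bytes too high, which matches the symptom described above.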
[PATCH v7 RFC 2/3] sparc: Make sparc64 use scalable lib/iommu-common.c functions
In iperf experiments running linux as the Tx side (TCP client) with
10 threads results in a severe performance drop when TSO is disabled,
indicating a weakness in the software that can be avoided by using
the scalable IOMMU arena DMA allocation.

Baseline numbers before this patch:
   with default settings (TSO enabled) :    9-9.5 Gbps
   Disable TSO using ethtool- drops badly:  2-3 Gbps.

After this patch, iperf client with 10 threads, can give a
throughput of at least 8.5 Gbps, even when TSO is disabled.

Signed-off-by: Sowmini Varadhan <sowmini.varad...@oracle.com>
---
v2: moved sparc specific fields into iommu_sparc
v3: converted all sparc users of iommu, so lot of cleanup and streamlining
v4: David Miller review change:
    - s/IOMMU_ERROR_CODE/DMA_ERROR_CODE
    - reverts pci_impl.h (now that all iommu usage has been converted)
v5: benh/aik feedback modifies the function signatures: pass in
    modified args to iommu_tbl_pool_init() and iommu_tbl_range_free()
v6: removed iommu_tbl_ops. Pass flush_all as function pointer to
    iommu_tbl_pool_init
v7: move pool_hash initialization to iommu_tbl_pool_init()

 arch/sparc/include/asm/iommu_64.h |    7 +-
 arch/sparc/kernel/iommu.c         |  174 +---
 arch/sparc/kernel/iommu_common.h  |    8 --
 arch/sparc/kernel/pci_sun4v.c     |  179
 4 files changed, 127 insertions(+), 241 deletions(-)

diff --git a/arch/sparc/include/asm/iommu_64.h b/arch/sparc/include/asm/iommu_64.h
index 2b9321a..e3cd449 100644
--- a/arch/sparc/include/asm/iommu_64.h
+++ b/arch/sparc/include/asm/iommu_64.h
@@ -16,6 +16,7 @@
 #define IOPTE_WRITE   0x0000000000000002UL

 #define IOMMU_NUM_CTXS	4096
+#include <linux/iommu-common.h>

 struct iommu_arena {
 	unsigned long	*map;
@@ -24,11 +25,10 @@ struct iommu_arena {
 };

 struct iommu {
+	struct iommu_table	tbl;
 	spinlock_t		lock;
-	struct iommu_arena	arena;
-	void			(*flush_all)(struct iommu *);
+	u32			dma_addr_mask;
 	iopte_t			*page_table;
-	u32			page_table_map_base;
 	unsigned long		iommu_control;
 	unsigned long		iommu_tsbbase;
 	unsigned long		iommu_flush;
@@ -40,7 +40,6 @@ struct iommu {
 	unsigned long		dummy_page_pa;
 	unsigned long		ctx_lowest_free;
 	DECLARE_BITMAP(ctx_bitmap, IOMMU_NUM_CTXS);
-	u32			dma_addr_mask;
 };

 struct strbuf {
diff --git a/arch/sparc/kernel/iommu.c b/arch/sparc/kernel/iommu.c
index bfa4d0c..f7fdff2 100644
--- a/arch/sparc/kernel/iommu.c
+++ b/arch/sparc/kernel/iommu.c
@@ -13,6 +13,7 @@
 #include <linux/errno.h>
 #include <linux/iommu-helper.h>
 #include <linux/bitmap.h>
+#include <linux/iommu-common.h>

 #ifdef CONFIG_PCI
 #include <linux/pci.h>
@@ -45,8 +46,9 @@
 			       "i" (ASI_PHYS_BYPASS_EC_E))

 /* Must be invoked under the IOMMU lock. */
-static void iommu_flushall(struct iommu *iommu)
+static void iommu_flushall(struct iommu_table *iommu_table)
 {
+	struct iommu *iommu = container_of(iommu_table, struct iommu, tbl);
 	if (iommu->iommu_flushinv) {
 		iommu_write(iommu->iommu_flushinv, ~(u64)0);
 	} else {
@@ -87,94 +89,6 @@ static inline void iopte_make_dummy(struct iommu *iommu, iopte_t *iopte)
 	iopte_val(*iopte) = val;
 }

-/* Based almost entirely upon the ppc64 iommu allocator.  If you use the 'handle'
- * facility it must all be done in one pass while under the iommu lock.
- *
- * On sun4u platforms, we only flush the IOMMU once every time we've passed
- * over the entire page table doing allocations.  Therefore we only ever advance
- * the hint and cannot backtrack it.
- */
-unsigned long iommu_range_alloc(struct device *dev,
-				struct iommu *iommu,
-				unsigned long npages,
-				unsigned long *handle)
-{
-	unsigned long n, end, start, limit, boundary_size;
-	struct iommu_arena *arena = &iommu->arena;
-	int pass = 0;
-
-	/* This allocator was derived from x86_64's bit string search */
-
-	/* Sanity check */
-	if (unlikely(npages == 0)) {
-		if (printk_ratelimit())
-			WARN_ON(1);
-		return DMA_ERROR_CODE;
-	}
-
-	if (handle && *handle)
-		start = *handle;
-	else
-		start = arena->hint;
-
-	limit = arena->limit;
-
-	/* The case below can happen if we have a small segment appended
-	 * to a large, or when the previous alloc was at the very end of
-	 * the available space. If so, go back to the beginning and flush.
-	 */
-	if (start >= limit) {
-		start = 0;
-		if (iommu->flush_all)
-			iommu->flush_all(iommu);
-	}
-
- again:
-
-	if
[PATCH v7 0/3] Generic IOMMU pooled allocator
Changes from patchv6: moved pool_hash initialization to
lib/iommu-common.c and cleaned up code duplication from
sun4v/sun4u/ldc.

Sowmini (2):
  Break up monolithic iommu table/lock into finer granularity pools and
    lock
  Make sparc64 use scalable lib/iommu-common.c functions

Sowmini Varadhan (1):
  Make LDC use common iommu pool management functions

 arch/sparc/include/asm/iommu_64.h |    7 +-
 arch/sparc/kernel/iommu.c         |  174 +++
 arch/sparc/kernel/iommu_common.h  |    8 --
 arch/sparc/kernel/ldc.c           |  152 ++--
 arch/sparc/kernel/pci_sun4v.c     |  179 +
 include/linux/iommu-common.h      |   48
 lib/Makefile                      |    2 +-
 lib/iommu-common.c                |  235 +
 8 files changed, 475 insertions(+), 330 deletions(-)
 create mode 100644 include/linux/iommu-common.h
 create mode 100644 lib/iommu-common.c
[PATCH v7 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer granularity pools and lock
Investigation of multithreaded iperf experiments on an ethernet
interface show the iommu->lock as the hottest lock identified by
lockstat, with something of the order of 21M contentions out of
27M acquisitions, and an average wait time of 26 us for the lock.
This is not efficient. A more scalable design is to follow the ppc
model, where the iommu_table has multiple pools, each stretching
over a segment of the map, and with a separate lock for each pool.
This model allows for better parallelization of the iommu map search.

This patch adds the iommu range alloc/free function infrastructure.

Signed-off-by: Sowmini Varadhan <sowmini.varad...@oracle.com>
---
v2 changes:
  - incorporate David Miller editorial comments: sparc specific
    fields moved from iommu-common into sparc's iommu_64.h
  - make the npools value an input parameter, for the case when
    the iommu map size is not very large
  - cookie_to_index mapping, and optimizations for span-boundary
    check, for use case such as LDC.
v3: eliminate iommu_sparc, rearrange the ->demap indirection to
    be invoked under the pool lock.

v4: David Miller review changes:
  - s/IOMMU_ERROR_CODE/DMA_ERROR_CODE
  - page_table_map_base and page_table_shift are unsigned long, not u32.

v5: Feedback from b...@kernel.crashing.org and a...@ozlabs.ru
  - removed ->cookie_to_index and ->demap indirection: caller should
    invoke these as needed before calling into the generic allocator

v6: Benh/DaveM discussion eliminating iommu_tbl_ops, but retaining flush_all
    optimization.

v7: one-time initialization of pool_hash from iommu_tbl_pool_init()

 include/linux/iommu-common.h |   48 +
 lib/Makefile                 |    2 +-
 lib/iommu-common.c           |  235 ++
 3 files changed, 284 insertions(+), 1 deletions(-)
 create mode 100644 include/linux/iommu-common.h
 create mode 100644 lib/iommu-common.c

diff --git a/include/linux/iommu-common.h b/include/linux/iommu-common.h
new file mode 100644
index 0000000..197111b
--- /dev/null
+++ b/include/linux/iommu-common.h
@@ -0,0 +1,48 @@
+#ifndef _LINUX_IOMMU_COMMON_H
+#define _LINUX_IOMMU_COMMON_H
+
+#include <linux/spinlock_types.h>
+#include <linux/device.h>
+#include <asm/page.h>
+
+#define IOMMU_POOL_HASHBITS     4
+#define IOMMU_NR_POOLS          (1 << IOMMU_POOL_HASHBITS)
+
+struct iommu_pool {
+	unsigned long	start;
+	unsigned long	end;
+	unsigned long	hint;
+	spinlock_t	lock;
+};
+
+struct iommu_table {
+	unsigned long		page_table_map_base;
+	unsigned long		page_table_shift;
+	unsigned long		nr_pools;
+	void			(*flush_all)(struct iommu_table *);
+	unsigned long		poolsize;
+	struct iommu_pool	arena_pool[IOMMU_NR_POOLS];
+	u32			flags;
+#define	IOMMU_HAS_LARGE_POOL	0x00000001
+#define	IOMMU_NO_SPAN_BOUND	0x00000002
+	struct iommu_pool	large_pool;
+	unsigned long		*map;
+};
+
+extern void iommu_tbl_pool_init(struct iommu_table *iommu,
+				unsigned long num_entries,
+				u32 page_table_shift,
+				void (*flush_all)(struct iommu_table *),
+				bool large_pool, u32 npools,
+				bool skip_span_boundary_check);
+
+extern unsigned long iommu_tbl_range_alloc(struct device *dev,
+					   struct iommu_table *iommu,
+					   unsigned long npages,
+					   unsigned long *handle);
+
+extern void iommu_tbl_range_free(struct iommu_table *iommu,
+				 u64 dma_addr, unsigned long npages,
+				 unsigned long entry);
+
+#endif
diff --git a/lib/Makefile b/lib/Makefile
index 3c3b30b..0ea2ac6 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -102,7 +102,7 @@ obj-$(CONFIG_AUDIT_GENERIC) += audit.o
 obj-$(CONFIG_AUDIT_COMPAT_GENERIC) += compat_audit.o

 obj-$(CONFIG_SWIOTLB) += swiotlb.o
-obj-$(CONFIG_IOMMU_HELPER) += iommu-helper.o
+obj-$(CONFIG_IOMMU_HELPER) += iommu-helper.o iommu-common.o
 obj-$(CONFIG_FAULT_INJECTION) += fault-inject.o
 obj-$(CONFIG_NOTIFIER_ERROR_INJECTION) += notifier-error-inject.o
 obj-$(CONFIG_CPU_NOTIFIER_ERROR_INJECT) += cpu-notifier-error-inject.o
diff --git a/lib/iommu-common.c b/lib/iommu-common.c
new file mode 100644
index 0000000..bb7e706
--- /dev/null
+++ b/lib/iommu-common.c
@@ -0,0 +1,235 @@
+/*
+ * IOMMU mmap management and range allocation functions.
+ * Based almost entirely upon the powerpc iommu allocator.
+ */
+
+#include <linux/export.h>
+#include <linux/bitmap.h>
+#include <linux/bug.h>
+#include <linux/iommu-helper.h>
+#include <linux/iommu-common.h>
+#include <linux/dma-mapping.h>
+#include <linux/hash.h>
+
+#define IOMMU_LARGE_ALLOC	15
+
+static DEFINE_PER_CPU(unsigned int,
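To make the pooled design described in the commit message above easier to picture, here is a small userspace sketch, not the code from lib/iommu-common.c. It splits one map into equal pools, each with its own allocation hint; a real implementation would also hold a per-pool lock around each search. All names and sizes are hypothetical, the map uses one byte per entry for readability, and the pool is selected from a caller-supplied cpu number as a simple stand-in for the per-CPU hash the patch mentions.

```c
#include <string.h>

#define SKETCH_ENTRIES	64
#define SKETCH_NPOOLS	4
#define SKETCH_FAIL	((unsigned long)-1)

struct sketch_pool {
	unsigned long start, end, hint;	/* pool covers [start, end) */
};

struct sketch_table {
	unsigned char used[SKETCH_ENTRIES];	/* 1 = entry allocated */
	struct sketch_pool pool[SKETCH_NPOOLS];
};

static void sketch_table_init(struct sketch_table *t)
{
	unsigned long per = SKETCH_ENTRIES / SKETCH_NPOOLS, i;

	memset(t->used, 0, sizeof(t->used));
	for (i = 0; i < SKETCH_NPOOLS; i++) {
		t->pool[i].start = i * per;
		t->pool[i].end = t->pool[i].start + per;
		t->pool[i].hint = t->pool[i].start;
	}
}

/* First index of npages consecutive free entries in [from, to). */
static unsigned long sketch_find(const struct sketch_table *t,
				 unsigned long from, unsigned long to,
				 unsigned long npages)
{
	unsigned long i, run = 0;

	for (i = from; i < to; i++) {
		run = t->used[i] ? 0 : run + 1;
		if (run == npages)
			return i + 1 - npages;
	}
	return SKETCH_FAIL;
}

static unsigned long sketch_alloc(struct sketch_table *t, unsigned int cpu,
				  unsigned long npages)
{
	unsigned int n;

	if (npages == 0)
		return SKETCH_FAIL;

	/* Start with the pool picked for this cpu, then try the others. */
	for (n = 0; n < SKETCH_NPOOLS; n++) {
		struct sketch_pool *p = &t->pool[(cpu + n) % SKETCH_NPOOLS];
		unsigned long idx;

		/* A real allocator would take this pool's lock here. */
		idx = sketch_find(t, p->hint, p->end, npages);
		if (idx == SKETCH_FAIL)	/* wrap to the start of the pool */
			idx = sketch_find(t, p->start, p->end, npages);
		if (idx != SKETCH_FAIL) {
			memset(&t->used[idx], 1, npages);
			p->hint = idx + npages;
			return idx;
		}
	}
	return SKETCH_FAIL;
}

static void sketch_free(struct sketch_table *t, unsigned long idx,
			unsigned long npages)
{
	memset(&t->used[idx], 0, npages);
}
```

Because callers on different CPUs usually start in different pools, their searches touch disjoint parts of the map, which is the property the per-pool locks rely on to reduce contention.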
Re: [PATCH v6 0/3] Generic IOMMU pooled allocator
On (03/24/15 18:16), David Miller wrote:
>
> Generally this looks fine to me.
>
> But about patch #2, I see no reason to have multiple iommu_pool_hash
> tables.  Even from a purely sparc perspective, we can always just do
> with just one of them.
>
> Furthermore, you can even probably move it down into lib/iommu-common.c
> itself.  iommu_tbl_pool_init() can do the one time initialization.

fixed in v7.

Ben, Alexey, do you need more time to review this?

--Sowmini
Re: [RFC, powerpc] perf/hv-24x7 set the attr group to NULL if events failed to be initialized
Michael Ellerman [m...@ellerman.id.au] wrote:
| On Sun, 2015-15-02 at 09:42:57 UTC, Li Zhong wrote:
| > This patch moves the three events groups to the end of the attr groups,
| > and if create_events_from_catalog() fails to set their attributes, we
| > set them to NULL in attr_groups.
|
| But why are we continuing at all if create_events_from_catalog() fails?
|
| Shouldn't that just be a fatal error and we bail?

Well, even if create_events_from_catalog() fails, we can continue to use
the 24x7 events, rather clumsily, as long as the catalog is readable.

i.e. parse /sys/bus/event_source/devices/hv_24x7/interface/catalog to find
event offset and run:

	perf stat -C 0 -e hv_24x7/domain=2,offset=8,core=0/ workload

Suka
Re: [PATCH v7 0/3] Generic IOMMU pooled allocator
From: Sowmini Varadhan <sowmini.varad...@oracle.com>
Date: Wed, 25 Mar 2015 13:34:45 -0400

> Changes from patchv6: moved pool_hash initialization to
> lib/iommu-common.c and cleaned up code duplication from
> sun4v/sun4u/ldc.

Looks good to me.

PowerPC folks, what do you think?
Re: [PATCH v3 2/2] powerpc/mm: Tracking vDSO remap
* Laurent Dufour <lduf...@linux.vnet.ibm.com> wrote:

> +static inline void arch_unmap(struct mm_struct *mm,
> +			struct vm_area_struct *vma,
> +			unsigned long start, unsigned long end)
> +{
> +	if (start <= mm->context.vdso_base && mm->context.vdso_base < end)
> +		mm->context.vdso_base = 0;
> +}

So AFAICS PowerPC can have multi-page vDSOs, right?

So what happens if I munmap() the middle or end of the vDSO? The above
condition only seems to cover unmaps that affect the first page. I
think 'affects any page' ought to be the right condition? (But I know
nothing about PowerPC so I might be wrong.)

> +#define __HAVE_ARCH_REMAP
> +static inline void arch_remap(struct mm_struct *mm,
> +			      unsigned long old_start, unsigned long old_end,
> +			      unsigned long new_start, unsigned long new_end)
> +{
> +	/*
> +	 * mremap() doesn't allow moving multiple vmas so we can limit the
> +	 * check to old_start == vdso_base.
> +	 */
> +	if (old_start == mm->context.vdso_base)
> +		mm->context.vdso_base = new_start;
> +}

mremap() doesn't allow moving multiple vmas, but it allows the
movement of multi-page vmas and it also allows partial mremap()s,
where it will split up a vma.

In particular, what happens if an mremap() is done with
old_start == vdso_base, but a shorter end than the end of the vDSO?
(i.e. a partial mremap() with fewer pages than the vDSO size)

Thanks,

	Ingo
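The difference between the posted check and the 'affects any page' condition raised in the review above can be made concrete with a small sketch. It assumes the vDSO size is known alongside its base, which the posted patch does not record; struct sketch_vdso and both function names are hypothetical.

```c
/* Hypothetical record of where the vDSO lives; the size field is an
 * assumption added for this sketch. */
struct sketch_vdso {
	unsigned long base;	/* 0 means "no vDSO known" */
	unsigned long size;	/* in bytes, a multiple of the page size */
};

/* Condition from the posted patch: true only when the unmapped range
 * [start, end) contains the vDSO base address. */
static int sketch_hits_first_page(const struct sketch_vdso *v,
				  unsigned long start, unsigned long end)
{
	return start <= v->base && v->base < end;
}

/* 'Affects any page': true when [start, end) overlaps any part of
 * [base, base + size). */
static int sketch_hits_any_page(const struct sketch_vdso *v,
				unsigned long start, unsigned long end)
{
	return v->base != 0 &&
	       start < v->base + v->size && v->base < end;
}
```

With a two-page vDSO, unmapping only the second page is missed by the first check and caught by the second, which is the case the review asks about.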
Re: [PATCH v3 2/2] powerpc/mm: Tracking vDSO remap
* Ingo Molnar <mi...@kernel.org> wrote:

> > +#define __HAVE_ARCH_REMAP
> > +static inline void arch_remap(struct mm_struct *mm,
> > +			      unsigned long old_start, unsigned long old_end,
> > +			      unsigned long new_start, unsigned long new_end)
> > +{
> > +	/*
> > +	 * mremap() doesn't allow moving multiple vmas so we can limit the
> > +	 * check to old_start == vdso_base.
> > +	 */
> > +	if (old_start == mm->context.vdso_base)
> > +		mm->context.vdso_base = new_start;
> > +}
>
> mremap() doesn't allow moving multiple vmas, but it allows the
> movement of multi-page vmas and it also allows partial mremap()s,
> where it will split up a vma.

I.e. mremap() supports the shrinking (and growing) of vmas. In that
case mremap() will unmap the end of the vma and will shrink the
remaining vDSO vma.

Doesn't that result in a non-working vDSO that should zero out
vdso_base?

Thanks,

	Ingo
Re: [PATCH v3 2/2] powerpc/mm: Tracking vDSO remap
On Wed, 2015-03-25 at 19:33 +0100, Ingo Molnar wrote:
> * Laurent Dufour <lduf...@linux.vnet.ibm.com> wrote:
>
> > +static inline void arch_unmap(struct mm_struct *mm,
> > +			struct vm_area_struct *vma,
> > +			unsigned long start, unsigned long end)
> > +{
> > +	if (start <= mm->context.vdso_base && mm->context.vdso_base < end)
> > +		mm->context.vdso_base = 0;
> > +}
>
> So AFAICS PowerPC can have multi-page vDSOs, right?
>
> So what happens if I munmap() the middle or end of the vDSO? The above
> condition only seems to cover unmaps that affect the first page. I
> think 'affects any page' ought to be the right condition? (But I know
> nothing about PowerPC so I might be wrong.)

You are right, we have at least two pages.

> > +#define __HAVE_ARCH_REMAP
> > +static inline void arch_remap(struct mm_struct *mm,
> > +			      unsigned long old_start, unsigned long old_end,
> > +			      unsigned long new_start, unsigned long new_end)
> > +{
> > +	/*
> > +	 * mremap() doesn't allow moving multiple vmas so we can limit the
> > +	 * check to old_start == vdso_base.
> > +	 */
> > +	if (old_start == mm->context.vdso_base)
> > +		mm->context.vdso_base = new_start;
> > +}
>
> mremap() doesn't allow moving multiple vmas, but it allows the
> movement of multi-page vmas and it also allows partial mremap()s,
> where it will split up a vma.
>
> In particular, what happens if an mremap() is done with
> old_start == vdso_base, but a shorter end than the end of the vDSO?
> (i.e. a partial mremap() with fewer pages than the vDSO size)

Is there a way to forbid splitting ? Does x86 deal with that case at all
or it doesn't have to for some other reason ?

Cheers,
Ben.

> Thanks,
>
> 	Ingo
Re: [PATCH v3 2/2] powerpc/mm: Tracking vDSO remap
On Wed, 2015-03-25 at 19:36 +0100, Ingo Molnar wrote:
> * Ingo Molnar <mi...@kernel.org> wrote:
>
> > > +#define __HAVE_ARCH_REMAP
> > > +static inline void arch_remap(struct mm_struct *mm,
> > > +			      unsigned long old_start, unsigned long old_end,
> > > +			      unsigned long new_start, unsigned long new_end)
> > > +{
> > > +	/*
> > > +	 * mremap() doesn't allow moving multiple vmas so we can limit the
> > > +	 * check to old_start == vdso_base.
> > > +	 */
> > > +	if (old_start == mm->context.vdso_base)
> > > +		mm->context.vdso_base = new_start;
> > > +}
> >
> > mremap() doesn't allow moving multiple vmas, but it allows the
> > movement of multi-page vmas and it also allows partial mremap()s,
> > where it will split up a vma.
>
> I.e. mremap() supports the shrinking (and growing) of vmas. In that
> case mremap() will unmap the end of the vma and will shrink the
> remaining vDSO vma.
>
> Doesn't that result in a non-working vDSO that should zero out
> vdso_base?

Right. Now we can't completely prevent the user from shooting itself in
the foot I suppose, though there is a legit usage scenario which is to
move the vDSO around which it would be nice to support. I think it's
reasonable to put the onus on the user here to do the right thing.

Cheers,
Ben.

> Thanks,
>
> 	Ingo
Re: [PATCH v7 0/3] Generic IOMMU pooled allocator
On Wed, 2015-03-25 at 14:12 -0400, David Miller wrote:
> From: Sowmini Varadhan <sowmini.varad...@oracle.com>
> Date: Wed, 25 Mar 2015 13:34:45 -0400
>
> > Changes from patchv6: moved pool_hash initialization to
> > lib/iommu-common.c and cleaned up code duplication from
> > sun4v/sun4u/ldc.
>
> Looks good to me.
>
> PowerPC folks, what do you think?

I'll give it another look today.

Cheers,
Ben.
Re: [PATCH v8 19/30] powerpc/pci: Use pci_scan_host_bridge() for simplicity
Hi Yijing, I wasn't quite sure I understood your comments, so I was trying to apply your patch series and test it, but patch 3 doesn't apply cleanly to 4.0-rc5 or master. Can you respin the series? Thanks, Daniel Hi Daniel, thanks for your review and comments. We want to make a generic pci_host_bridge, which would hold the common host information, for example, pci domain is common info for pci host bridge, this series saved domain in pci_host_bridge, then we no need to extract out domain by pci_bus-sysdata by platform specific pci_domain_nr(). Also we store the sysdata in pci_host_bridge, and pci_bus_to_host() is the platform interface, I think use the common interface would be better. + + /* Get probe mode and perform scan */ + if (hose-dn ppc_md.pci_probe_mode) + mode = ppc_md.pci_probe_mode(bus); + + pr_debug(probe mode: %d\n, mode); + if (mode == PCI_PROBE_DEVTREE) + of_scan_bus(hose-dn, bus); + + if (mode == PCI_PROBE_NORMAL) { + pci_bus_update_busn_res_end(bus, 255); + hose-last_busno = pci_scan_child_bus(bus); + pci_bus_update_busn_res_end(bus, hose-last_busno); + } + + return pci_bus_child_max_busnr(bus); +} + I'm having trouble convincing myself that this patch covers every variation within our PCI implementations. In particular, there's a stanza in of_scan_pci_bridge in kernel/pci_of_scan.c that's almost identical to this function. Does that implementation need to be cleaned up and replaced with this function too? This is a pci_host_bridge_ops hook function, which would be called in PCI core, and after applied this series, we only need to call pci_scan_host_bridge() to scan pci devices, and this function is also extracted from the pcibios_scan_phb(), it's not the redundant code. 
> > > @@ -1641,9 +1655,9 @@ void pcibios_scan_phb(struct pci_controller *hose)
> > >  		ppc_md.pcibios_fixup_phb(hose);
> > >
> > >  	/* Configure PCI Express settings */
> > > -	if (bus && !pci_has_flag(PCI_PROBE_ONLY)) {
> > > +	if (host->bus && !pci_has_flag(PCI_PROBE_ONLY)) {
> > >  		struct pci_bus *child;
> > > -		list_for_each_entry(child, &bus->children, node)
> > > +		list_for_each_entry(child, &host->bus->children, node)
> > >  			pcie_bus_configure_settings(child);
> > >  	}
> > >  }
> >
> > Two things: Firstly, the function uses hose throughout, not host.
> > Secondly, you're not deleting the bus variable: what's the purpose of
> > this change?
>
> host is the common pci_host_bridge, which the PCI core creates for the
> PCI host bridge driver; hose is the platform data used on powerpc. The
> purpose of the patch/series is to simplify the PCI enumeration
> interface and to reduce the number of weak functions used to set up PCI
> buses and devices during enumeration.
>
> > Regards,
> > Daniel
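For anyone following the host/hose distinction discussed above, here is a minimal userspace sketch of the idea as described in the thread: a generic host bridge object stores the domain and a sysdata pointer, and the platform data (the hose) hangs off that pointer. All `mock_` names and field layouts are assumptions made for illustration; they are not the kernel's actual definitions or the series' actual API.

```c
#include <assert.h>

/* Simplified stand-in for the powerpc platform data ("hose"). */
struct mock_pci_controller {
	int global_number;	/* domain number as the platform tracks it */
};

/* Simplified stand-in for the generic host bridge ("host"). */
struct mock_pci_host_bridge {
	int domain;		/* common info, saved once at creation */
	void *sysdata;		/* platform data, here the hose */
};

/* Hypothetical helper: wrap a platform hose in a generic host bridge. */
static struct mock_pci_host_bridge
mock_create_host_bridge(struct mock_pci_controller *hose)
{
	struct mock_pci_host_bridge host;

	assert(hose);		/* a bridge without platform data makes no sense here */
	host.domain = hose->global_number;
	host.sysdata = hose;
	return host;
}

/* Generic code reads the domain without knowing the platform type. */
static int mock_host_domain(const struct mock_pci_host_bridge *host)
{
	return host->domain;
}

/* Platform code can still get back to its own data through sysdata. */
static struct mock_pci_controller *
mock_host_to_hose(const struct mock_pci_host_bridge *host)
{
	return host->sysdata;
}
```

With this shape, generic code would call something like mock_host_domain() directly, and only platform code would need mock_host_to_hose(), which is roughly the split the reply above argues for.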
[PATCH v4 2/4] powerpc/eeh: Introduce eeh_pe_inject_err()
The patch defines PCI error types and functions in uapi/asm/eeh.h and
exports the function eeh_pe_inject_err(), which will be called by the
VFIO driver to inject the specified PCI error into the indicated PE for
testing purposes.

Signed-off-by: Gavin Shan gws...@linux.vnet.ibm.com
Reviewed-by: David Gibson da...@gibson.dropbear.id.au
---
 arch/powerpc/include/asm/eeh.h      |  2 ++
 arch/powerpc/include/uapi/asm/eeh.h | 26 ++
 arch/powerpc/kernel/eeh.c           | 35 +++
 3 files changed, 63 insertions(+)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 08c4042..cd6003b 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -291,6 +291,8 @@ int eeh_pe_set_option(struct eeh_pe *pe, int option);
 int eeh_pe_get_state(struct eeh_pe *pe);
 int eeh_pe_reset(struct eeh_pe *pe, int option);
 int eeh_pe_configure(struct eeh_pe *pe);
+int eeh_pe_inject_err(struct eeh_pe *pe, int type, int func,
+		      unsigned long addr, unsigned long mask);
 
 /**
  * EEH_POSSIBLE_ERROR() -- test for possible MMIO failure.
diff --git a/arch/powerpc/include/uapi/asm/eeh.h b/arch/powerpc/include/uapi/asm/eeh.h
index 8bb34b0..291b7d1 100644
--- a/arch/powerpc/include/uapi/asm/eeh.h
+++ b/arch/powerpc/include/uapi/asm/eeh.h
@@ -27,4 +27,30 @@
 #define EEH_PE_STATE_STOPPED_DMA	4	/* Stopped DMA only */
 #define EEH_PE_STATE_UNAVAIL		5	/* Unavailable */
 
+/* EEH error types and functions */
+#define EEH_ERR_TYPE_32			0	/* 32-bits error */
+#define EEH_ERR_TYPE_64			1	/* 64-bits error */
+#define EEH_ERR_FUNC_MIN		0
+#define EEH_ERR_FUNC_LD_MEM_ADDR	0	/* Memory load */
+#define EEH_ERR_FUNC_LD_MEM_DATA	1
+#define EEH_ERR_FUNC_LD_IO_ADDR		2	/* IO load */
+#define EEH_ERR_FUNC_LD_IO_DATA		3
+#define EEH_ERR_FUNC_LD_CFG_ADDR	4	/* Config load */
+#define EEH_ERR_FUNC_LD_CFG_DATA	5
+#define EEH_ERR_FUNC_ST_MEM_ADDR	6	/* Memory store */
+#define EEH_ERR_FUNC_ST_MEM_DATA	7
+#define EEH_ERR_FUNC_ST_IO_ADDR		8	/* IO store */
+#define EEH_ERR_FUNC_ST_IO_DATA		9
+#define EEH_ERR_FUNC_ST_CFG_ADDR	10	/* Config store */
+#define EEH_ERR_FUNC_ST_CFG_DATA	11
+#define EEH_ERR_FUNC_DMA_RD_ADDR	12	/* DMA read */
+#define EEH_ERR_FUNC_DMA_RD_DATA	13
+#define EEH_ERR_FUNC_DMA_RD_MASTER	14
+#define EEH_ERR_FUNC_DMA_RD_TARGET	15
+#define EEH_ERR_FUNC_DMA_WR_ADDR	16	/* DMA write*/
+#define EEH_ERR_FUNC_DMA_WR_DATA	17
+#define EEH_ERR_FUNC_DMA_WR_MASTER	18
+#define EEH_ERR_FUNC_DMA_WR_TARGET	19
+#define EEH_ERR_FUNC_MAX		19
+
 #endif /* _ASM_POWERPC_EEH_H */
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 76253eb..daa68a1 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -1636,6 +1636,41 @@ int eeh_pe_configure(struct eeh_pe *pe)
 }
 EXPORT_SYMBOL_GPL(eeh_pe_configure);
 
+/**
+ * eeh_pe_inject_err - Injecting the specified PCI error to the indicated PE
+ * @pe: the indicated PE
+ * @type: error type
+ * @function: error function
+ * @addr: address
+ * @mask: address mask
+ *
+ * The routine is called to inject the specified PCI error, which
+ * is determined by @type and @function, to the indicated PE for
+ * testing purpose.
+ */
+int eeh_pe_inject_err(struct eeh_pe *pe, int type, int func,
+		      unsigned long addr, unsigned long mask)
+{
+	/* Invalid PE ? */
+	if (!pe)
+		return -ENODEV;
+
+	/* Unsupported operation ? */
+	if (!eeh_ops || !eeh_ops->err_inject)
+		return -ENOENT;
+
+	/* Check on PCI error type */
+	if (type != EEH_ERR_TYPE_32 && type != EEH_ERR_TYPE_64)
+		return -EINVAL;
+
+	/* Check on PCI error function */
+	if (func < EEH_ERR_FUNC_MIN || func > EEH_ERR_FUNC_MAX)
+		return -EINVAL;
+
+	return eeh_ops->err_inject(pe, type, func, addr, mask);
+}
+EXPORT_SYMBOL_GPL(eeh_pe_inject_err);
+
 static int proc_eeh_show(struct seq_file *m, void *v)
 {
 	if (!eeh_enabled()) {
-- 
1.8.3.2
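
To make the order of the checks in eeh_pe_inject_err() easy to try outside the kernel, here is a small userspace sketch that mirrors them. The four macro values are copied from the uapi/asm/eeh.h hunk above; everything prefixed `mock_` (the PE type, the ops table, the always-succeeding backend) is a hypothetical stand-in and not part of the patch.

```c
#include <assert.h>
#include <errno.h>

/* Values copied from the uapi/asm/eeh.h hunk in this patch. */
#define EEH_ERR_TYPE_32		0
#define EEH_ERR_TYPE_64		1
#define EEH_ERR_FUNC_MIN	0
#define EEH_ERR_FUNC_MAX	19

/* Hypothetical, heavily simplified stand-in for struct eeh_pe. */
struct mock_pe {
	int addr;
};

/* Hypothetical stand-in for the platform ops table. */
struct mock_eeh_ops {
	int (*err_inject)(struct mock_pe *pe, int type, int func,
			  unsigned long addr, unsigned long mask);
};

/* NULL until a backend is registered, like eeh_ops before platform init. */
static struct mock_eeh_ops *mock_ops;

/* Same checks, in the same order, as eeh_pe_inject_err() in the patch. */
static int mock_pe_inject_err(struct mock_pe *pe, int type, int func,
			      unsigned long addr, unsigned long mask)
{
	if (!pe)
		return -ENODEV;			/* invalid PE */
	if (!mock_ops || !mock_ops->err_inject)
		return -ENOENT;			/* unsupported operation */
	if (type != EEH_ERR_TYPE_32 && type != EEH_ERR_TYPE_64)
		return -EINVAL;			/* unknown error type */
	if (func < EEH_ERR_FUNC_MIN || func > EEH_ERR_FUNC_MAX)
		return -EINVAL;			/* error function out of range */
	return mock_ops->err_inject(pe, type, func, addr, mask);
}

/* Hypothetical backend that accepts every validated request. */
static int mock_backend_inject(struct mock_pe *pe, int type, int func,
			       unsigned long addr, unsigned long mask)
{
	assert(pe);	/* the wrapper has already rejected a NULL PE */
	(void)type;
	(void)func;
	(void)addr;
	(void)mask;
	return 0;
}

static struct mock_eeh_ops mock_backend = {
	.err_inject = mock_backend_inject,
};
```

Pointing mock_ops at &mock_backend and calling mock_pe_inject_err() with a func of 20 should return -EINVAL, while leaving mock_ops NULL should return -ENOENT regardless of the other arguments, since that check runs before the type and function checks.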