Re: [PATCH v6 10/42] powerpc/powernv: pnv_ioda_setup_dma() configure one PE only
On Tue, Aug 11, 2015 at 12:39:02PM +1000, Alexey Kardashevskiy wrote: On 08/11/2015 10:29 AM, Gavin Shan wrote: On Mon, Aug 10, 2015 at 07:31:11PM +1000, Alexey Kardashevskiy wrote: On 08/06/2015 02:11 PM, Gavin Shan wrote: The original implementation of pnv_ioda_setup_dma() iterates the list of PEs and configures the DMA32 space for them one by one. The function was designed to be called during PHB fixup time. When configuring PE's DMA32 space in pcibios_setup_bridge(), in order to support PCI hotplug, we have to have the function PE oriented. This renames pnv_ioda_setup_dma() to pnv_ioda1_setup_dma() and adds one more argument struct pnv_ioda_pe *pe to it. The caller, pnv_pci_ioda_setup_DMA(), gets PE from the list and passes to it or pnv_pci_ioda2_setup_dma_pe(). The patch shouldn't cause behavioral changes. Signed-off-by: Gavin Shan gws...@linux.vnet.ibm.com --- arch/powerpc/platforms/powernv/pci-ioda.c | 75 +++ 1 file changed, 36 insertions(+), 39 deletions(-) diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index 8456f37..cd22002 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -2443,52 +2443,29 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb, pnv_ioda_setup_bus_dma(pe, pe-pbus); } -static void pnv_ioda_setup_dma(struct pnv_phb *phb) +static unsigned int pnv_ioda1_setup_dma(struct pnv_phb *phb, + struct pnv_ioda_pe *pe, + unsigned int base) { struct pci_controller *hose = phb-hose; - struct pnv_ioda_pe *pe; - unsigned int dma_weight; + unsigned int dma_weight, segs; /* Calculate the PHB's DMA weight */ dma_weight = pnv_ioda_phb_dma_weight(phb); pr_info(PCI%04x has %ld DMA32 segments, total weight %d\n, hose-global_number, phb-ioda.dma32_segcount, dma_weight); - pnv_pci_ioda_setup_opal_tce_kill(phb); - - /* Walk our PE list and configure their DMA segments, hand them -* out one base segment plus any residual segments based on -* weight -*/ - 
list_for_each_entry(pe, phb-ioda.pe_dma_list, dma_link) { - if (!pe-dma32_weight) - continue; - - /* -* For IODA2 compliant PHB3, we needn't care about the weight. -* The all available 32-bits DMA space will be assigned to -* the specific PE. -*/ - if (phb-type == PNV_PHB_IODA1) { - unsigned int segs, base = 0; - - if (pe-dma32_weight - dma_weight / phb-ioda.dma32_segcount) - segs = 1; - else - segs = (pe-dma32_weight * - phb-ioda.dma32_segcount) / dma_weight; - - pe_info(pe, DMA32 weight %d, assigned %d segments\n, - pe-dma32_weight, segs); - pnv_pci_ioda_setup_dma_pe(phb, pe, base, segs); + if (pe-dma32_weight + dma_weight / phb-ioda.dma32_segcount) Can be one line now. Indeed. + segs = 1; + else + segs = (pe-dma32_weight * + phb-ioda.dma32_segcount) / dma_weight; + pe_info(pe, DMA weight %d, assigned %d segments\n, + pe-dma32_weight, segs); + pnv_pci_ioda_setup_dma_pe(phb, pe, base, segs); Why not to merge pnv_ioda1_setup_dma() to pnv_pci_ioda_setup_dma_pe()? There're two reasons: - They're separate logically. One is calculating number of DMA32 segments required. Another one is allocate TCE32 tables and configure devices with them. - In PCI hotplug path, I need pnv_ioda1_setup_dma() which has pe as parameter. And hotplug path does not care about dma weight why? PHB3 doesn't care about DMA weight, but P7IOC needs. 
- base += segs; - } else { - pe_info(pe, Assign DMA32 space\n); - pnv_pci_ioda2_setup_dma_pe(phb, pe); - } - } + return segs; } #ifdef CONFIG_PCI_MSI @@ -2955,12 +2932,32 @@ static void pnv_pci_ioda_setup_DMA(void) { struct pci_controller *hose, *tmp; struct pnv_phb *phb; + struct pnv_ioda_pe *pe; + unsigned int base; list_for_each_entry_safe(hose, tmp, hose_list, list_node) { - pnv_ioda_setup_dma(hose-private_data); + phb = hose-private_data; + pnv_pci_ioda_setup_opal_tce_kill(phb); + + base = 0; + list_for_each_entry(pe, phb-ioda.pe_dma_list, dma_link) { + if (!pe-dma32_weight) + continue; + + switch (phb-type) { + case PNV_PHB_IODA1: + base += pnv_ioda1_setup_dma(phb, pe, base); This @base handling seems never be tested between 8..11 as [PATCH v6 11/42] powerpc/powernv: Trace DMA32 segments consumed by
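The per-PE segment split being discussed can be modeled outside the kernel. This is a hypothetical stand-alone sketch of the IODA1 policy described above (a PE whose DMA weight is below one "share", total weight divided by segment count, gets a single segment; otherwise it gets a weight-proportional share); the function name and rounding are illustrative, not the kernel's exact code:

```c
#include <assert.h>

/* Illustrative model of the IODA1 DMA32 segment split discussed above.
 * A PE with less than one "share" of weight still gets one segment;
 * otherwise it gets a share proportional to its weight. The caller
 * accumulates base += segs across PEs, as the new setup loop does. */
static unsigned int ioda1_segs_for_pe(unsigned int pe_weight,
                                      unsigned int phb_weight,
                                      unsigned int segcount)
{
    if (pe_weight < phb_weight / segcount)
        return 1;
    return (pe_weight * segcount) / phb_weight;
}
```

With a PHB weight of 100 and 10 segments, a PE of weight 5 gets the minimum single segment while a PE of weight 30 gets 3, which is the behavior the thread describes for P7IOC; PHB3 (IODA2) bypasses this entirely and hands the whole DMA32 space to the PE.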
[v2 01/11] powerpc: re-add devm_ioremap_prot()
From: Emil Medve emilian.me...@freescale.com devm_ioremap_prot() was removed in commit dedd24a12, and was introduced in commit b41e5fffe8. This reverts commit dedd24a12fe6735898feeb06184ee346907abb5d. Signed-off-by: Emil Medve emilian.me...@freescale.com --- arch/powerpc/include/asm/io.h |3 +++ arch/powerpc/lib/Makefile |1 + arch/powerpc/lib/devres.c | 43 + 3 files changed, 47 insertions(+) create mode 100644 arch/powerpc/lib/devres.c diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h index a8d2ef3..9eaf301 100644 --- a/arch/powerpc/include/asm/io.h +++ b/arch/powerpc/include/asm/io.h @@ -855,6 +855,9 @@ static inline void * bus_to_virt(unsigned long address) #define clrsetbits_8(addr, clear, set) clrsetbits(8, addr, clear, set) +void __iomem *devm_ioremap_prot(struct device *dev, resource_size_t offset, + size_t size, unsigned long flags); + #endif /* __KERNEL__ */ #endif /* _ASM_POWERPC_IO_H */ diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile index a47e142..7ae60f0 100644 --- a/arch/powerpc/lib/Makefile +++ b/arch/powerpc/lib/Makefile @@ -13,6 +13,7 @@ obj-y += string.o alloc.o crtsavres.o ppc_ksyms.o code-patching.o \ feature-fixups.o obj-$(CONFIG_PPC32)+= div64.o copy_32.o +obj-$(CONFIG_HAS_IOMEM)+= devres.o obj64-y+= copypage_64.o copyuser_64.o usercopy_64.o mem_64.o hweight_64.o \ copyuser_power7.o string_64.o copypage_power7.o memcpy_power7.o \ diff --git a/arch/powerpc/lib/devres.c b/arch/powerpc/lib/devres.c new file mode 100644 index 000..8df55fc --- /dev/null +++ b/arch/powerpc/lib/devres.c @@ -0,0 +1,43 @@ +/* + * Copyright (C) 2008 Freescale Semiconductor, Inc. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. 
+ */ + +#include linux/device.h /* devres_*(), devm_ioremap_release() */ +#include linux/gfp.h +#include linux/io.h /* ioremap_prot() */ +#include linux/export.h /* EXPORT_SYMBOL() */ + +/** + * devm_ioremap_prot - Managed ioremap_prot() + * @dev: Generic device to remap IO address for + * @offset: BUS offset to map + * @size: Size of map + * @flags: Page flags + * + * Managed ioremap_prot(). Map is automatically unmapped on driver + * detach. + */ +void __iomem *devm_ioremap_prot(struct device *dev, resource_size_t offset, +size_t size, unsigned long flags) +{ + void __iomem **ptr, *addr; + + ptr = devres_alloc(devm_ioremap_release, sizeof(*ptr), GFP_KERNEL); + if (!ptr) + return NULL; + + addr = ioremap_prot(offset, size, flags); + if (addr) { + *ptr = addr; + devres_add(dev, ptr); + } else + devres_free(ptr); + + return addr; +} +EXPORT_SYMBOL(devm_ioremap_prot); -- 1.7.9.5 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
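devm_ioremap_prot() above follows the standard devres pattern: allocate a small tracking node, stash the mapped pointer in it, and register it so the resource is released automatically on driver detach. A minimal userspace model of that pattern, with all names hypothetical, might look like:

```c
#include <stdlib.h>

/* Userspace model of the devres pattern used by devm_ioremap_prot():
 * each managed resource registers a release callback that runs when
 * the "device" is torn down. All names here are illustrative. */
struct devres_node {
    void (*release)(void *data);
    void *data;
    struct devres_node *next;
};

struct fake_device {
    struct devres_node *resources;
};

static void devres_add_node(struct fake_device *dev,
                            void (*release)(void *), void *data)
{
    struct devres_node *n = malloc(sizeof(*n));
    n->release = release;
    n->data = data;
    n->next = dev->resources;
    dev->resources = n;
}

static void fake_device_detach(struct fake_device *dev)
{
    while (dev->resources) {
        struct devres_node *n = dev->resources;
        dev->resources = n->next;
        n->release(n->data);   /* e.g. iounmap() in the kernel case */
        free(n);
    }
}

/* demo release callback: count how many times teardown ran */
static void release_counter(void *data) { ++*(int *)data; }
```

The kernel's devres_alloc()/devres_add() pair plays the role of devres_add_node() here, with devm_ioremap_release() as the registered callback.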
[v2 10/11] soc/qman: Add HOTPLUG_CPU support to the QMan driver
From: Hai-Ying Wang haiying.w...@freescale.com Add support for CPU hotplug for the DPAA 1.0 Queue Manager driver. Signed-off-by: Hai-Ying Wang haiying.w...@freescale.com Signed-off-by: Emil Medve emilian.me...@freescale.com Signed-off-by: Roy Pledge roy.ple...@freescale.com --- drivers/soc/fsl/qbman/qman_portal.c | 43 +++ 1 file changed, 43 insertions(+) diff --git a/drivers/soc/fsl/qbman/qman_portal.c b/drivers/soc/fsl/qbman/qman_portal.c index ad9e3ba..85acba2 100644 --- a/drivers/soc/fsl/qbman/qman_portal.c +++ b/drivers/soc/fsl/qbman/qman_portal.c @@ -474,6 +474,46 @@ static void qman_offline_cpu(unsigned int cpu) } } +#ifdef CONFIG_HOTPLUG_CPU +static void qman_online_cpu(unsigned int cpu) +{ + struct qman_portal *p; + const struct qm_portal_config *pcfg; + + p = (struct qman_portal *)affine_portals[cpu]; + if (p) { + pcfg = qman_get_qm_portal_config(p); + if (pcfg) { + irq_set_affinity(pcfg-public_cfg.irq, cpumask_of(cpu)); + qman_portal_update_sdest(pcfg, cpu); + } + } +} + +static int qman_hotplug_cpu_callback(struct notifier_block *nfb, +unsigned long action, void *hcpu) +{ + unsigned int cpu = (unsigned long)hcpu; + + switch (action) { + case CPU_ONLINE: + case CPU_ONLINE_FROZEN: + qman_online_cpu(cpu); + break; + case CPU_DOWN_PREPARE: + case CPU_DOWN_PREPARE_FROZEN: + qman_offline_cpu(cpu); + default: + break; + } + return NOTIFY_OK; +} + +static struct notifier_block qman_hotplug_cpu_notifier = { + .notifier_call = qman_hotplug_cpu_callback, +}; +#endif /* CONFIG_HOTPLUG_CPU */ + __init int qman_init(void) { struct cpumask slave_cpus; @@ -597,6 +637,9 @@ __init int qman_init(void) cpumask_andnot(offline_cpus, cpu_possible_mask, cpu_online_mask); for_each_cpu(cpu, offline_cpus) qman_offline_cpu(cpu); +#ifdef CONFIG_HOTPLUG_CPU + register_hotcpu_notifier(qman_hotplug_cpu_notifier); +#endif return 0; } -- 1.7.9.5 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[v2 09/11] soc/bman: Add HOTPLUG_CPU support to the BMan driver
From: Hai-Ying Wang haiying.w...@freescale.com Add support for CPU hotplug for the DPAA 1.0 Buffer Manager driver Signed-off-by: Hai-Ying Wang haiying.w...@freescale.com Signed-off-by: Emil Medve emilian.me...@freescale.com Signed-off-by: Roy Pledge roy.ple...@freescale.com --- drivers/soc/fsl/qbman/bman_portal.c | 40 +++ drivers/soc/fsl/qbman/dpaa_sys.h|3 +++ 2 files changed, 43 insertions(+) diff --git a/drivers/soc/fsl/qbman/bman_portal.c b/drivers/soc/fsl/qbman/bman_portal.c index 62d8f64..f33d671 100644 --- a/drivers/soc/fsl/qbman/bman_portal.c +++ b/drivers/soc/fsl/qbman/bman_portal.c @@ -129,6 +129,42 @@ static void __cold bman_offline_cpu(unsigned int cpu) } } +#ifdef CONFIG_HOTPLUG_CPU +static void __cold bman_online_cpu(unsigned int cpu) +{ + struct bman_portal *p = (struct bman_portal *)affine_bportals[cpu]; + const struct bm_portal_config *pcfg; + + if (p) { + pcfg = bman_get_bm_portal_config(p); + if (pcfg) + irq_set_affinity(pcfg-public_cfg.irq, cpumask_of(cpu)); + } +} + +static int __cold bman_hotplug_cpu_callback(struct notifier_block *nfb, + unsigned long action, void *hcpu) +{ + unsigned int cpu = (unsigned long)hcpu; + + switch (action) { + case CPU_ONLINE: + case CPU_ONLINE_FROZEN: + bman_online_cpu(cpu); + break; + case CPU_DOWN_PREPARE: + case CPU_DOWN_PREPARE_FROZEN: + bman_offline_cpu(cpu); + } + + return NOTIFY_OK; +} + +static struct notifier_block bman_hotplug_cpu_notifier = { + .notifier_call = bman_hotplug_cpu_callback, +}; +#endif /* CONFIG_HOTPLUG_CPU */ + static int __cold bman_portal_probe(struct platform_device *of_dev) { struct device *dev = of_dev-dev; @@ -342,6 +378,10 @@ static int __init bman_portal_driver_register(struct platform_driver *drv) for_each_cpu(cpu, offline_cpus) bman_offline_cpu(cpu); +#ifdef CONFIG_HOTPLUG_CPU + register_hotcpu_notifier(bman_hotplug_cpu_notifier); +#endif + bman_seed_bpid_range(0, bman_pool_max); return 0; diff --git a/drivers/soc/fsl/qbman/dpaa_sys.h b/drivers/soc/fsl/qbman/dpaa_sys.h index 
0dd341c..d1da092 100644 --- a/drivers/soc/fsl/qbman/dpaa_sys.h +++ b/drivers/soc/fsl/qbman/dpaa_sys.h @@ -43,6 +43,9 @@ #include linux/vmalloc.h #include linux/platform_device.h #include linux/ctype.h +#ifdef CONFIG_HOTPLUG_CPU +#include linux/cpu.h +#endif #include asm/pgtable.h -- 1.7.9.5 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: RFC: prepare for struct scatterlist entries without page backing
Hi, On Wed, Aug 12, 2015 at 10:42 PM, Boaz Harrosh b...@plexistor.com wrote: On 08/12/2015 10:05 AM, Christoph Hellwig wrote: It turns out most DMA mapping implementations can handle SGLs without page structures with some fairly simple mechanical work. Most of it is just about consistently using sg_phys. For implementations that need to flush caches we need a new helper that skips these cache flushes if an entry doesn't have a kernel virtual address. However the ccio (parisc) and sba_iommu (parisc and ia64) IOMMUs seem to operate mostly on virtual addresses. It's a fairly odd concept that I don't fully grasp, so I'll need some help with those if we want to bring this forward. Additionally, this series skips ARM entirely for now. The reason is that most arm implementations of the .map_sg operation just iterate over all entries and call ->map_page for each, which means we'd need to convert those to a ->map_pfn similar to Dan's previous approach. [snip] It is a bit of work but is worthwhile, accelerating lots of workloads tremendously. Not like this abomination, which only branches things more and more, making things fatter and slower. As a random guy reading a big bunch of patches on code I know almost nothing about, parts of this comment really resonated with me: overall, we seem to be adding a lot of if statements to code that appears to be in a hot path. I.e. ~90% of this patch set seems to be just mechanically dropping BUG_ON()s and converting open-coded stuff to use accessor functions (which should be macros or get inlined, right?) - and the remaining bit is not flushing if we don't have a physical page somewhere. Would it make sense to split this patch set into a few bits: one to drop all the useless BUG_ON()s, one to convert all the open-coded stuff to accessor functions, then another to do the actual page-less sg stuff?
Thanks, -- Julian Calaby Email: julian.cal...@gmail.com Profile: http://www.google.com/profiles/julian.calaby/
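The accessor-function point in this thread can be made concrete. This is a hypothetical userspace model, not the kernel's actual scatterlist layout: an entry carries a physical address and length, a kernel virtual address is optional (page-less entries have none), and the proposed cache-flush helper simply skips entries without one.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>

/* Illustrative model of a scatterlist entry for the page-less case
 * discussed above: the physical address (what sg_phys() yields) is
 * always present; a kernel virtual address may not be. */
struct sgl_entry {
    uint64_t phys;    /* bus/physical address of the segment */
    size_t   length;
    void    *vaddr;   /* NULL for entries without page backing */
};

/* The new flush helper described in the cover letter, modeled:
 * skip entries with no kernel virtual address, since there is
 * nothing mapped to flush. Returns true if a flush would occur. */
static bool flush_entry_if_mapped(const struct sgl_entry *e)
{
    if (!e->vaddr)
        return false;   /* page-less entry: nothing to flush */
    /* real code would flush [e->vaddr, e->vaddr + e->length) */
    return true;
}
```

This also illustrates the reviewer's concern: the extra NULL check is a branch added on a hot path, which is why splitting the mechanical accessor conversion from the page-less behavior change makes the cost easier to evaluate.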
Re: [PATCH v6 08/42] powerpc/powernv: Calculate PHB's DMA weight dynamically
On Mon, Aug 10, 2015 at 07:21:12PM +1000, Alexey Kardashevskiy wrote: On 08/06/2015 02:11 PM, Gavin Shan wrote: For P7IOC, the whole available DMA32 space, which is below the MEM32 space, is divided evenly into 256MB segments. The number of continuous segments assigned to one particular PE depends on the PE's DMA weight that is calculated based on the type of each PCI devices contained in the PE, and PHB's DMA weight which is accumulative DMA weight of PEs contained in the PHB. It means that the PHB's DMA weight calculation depends on existing PEs, which works perfectly now, but not hotplug friendly. As the whole available DMA32 space can be assigned to one PE on PHB3, so we don't have the issue on PHB3. The patch calculates PHB's DMA weight based on the PCI devices contained in the PHB dynamically so that it's hotplug friendly. Signed-off-by: Gavin Shan gws...@linux.vnet.ibm.com --- arch/powerpc/platforms/powernv/pci-ioda.c | 88 +++ arch/powerpc/platforms/powernv/pci.h | 6 --- 2 files changed, 43 insertions(+), 51 deletions(-) diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index 713f4b4..7342cfd 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -927,6 +927,9 @@ static void pnv_ioda_link_pe_by_weight(struct pnv_phb *phb, static unsigned int pnv_ioda_dma_weight(struct pci_dev *dev) { + struct pci_controller *hose = pci_bus_to_host(dev-bus); + struct pnv_phb *phb = hose-private_data; + /* This is quite simplistic. The base weight of a device * is 10. 0 means no DMA is to be accounted for it. 
*/ @@ -939,14 +942,34 @@ static unsigned int pnv_ioda_dma_weight(struct pci_dev *dev) if (dev-class == PCI_CLASS_SERIAL_USB_UHCI || dev-class == PCI_CLASS_SERIAL_USB_OHCI || dev-class == PCI_CLASS_SERIAL_USB_EHCI) - return 3; + return 3 * phb-ioda.tce32_count; /* Increase the weight of RAID (includes Obsidian) */ if ((dev-class 8) == PCI_CLASS_STORAGE_RAID) - return 15; + return 15 * phb-ioda.tce32_count; /* Default */ - return 10; + return 10 * phb-ioda.tce32_count; +} + +static int __pnv_ioda_phb_dma_weight(struct pci_dev *pdev, void *data) +{ + unsigned int *dma_weight = data; + + *dma_weight += pnv_ioda_dma_weight(pdev); + return 0; +} + +static unsigned int pnv_ioda_phb_dma_weight(struct pnv_phb *phb) +{ + unsigned int dma_weight = 0; + + if (!phb-hose-bus) + return 0; + + pci_walk_bus(phb-hose-bus, + __pnv_ioda_phb_dma_weight, dma_weight); + return dma_weight; } #ifdef CONFIG_PCI_IOV @@ -1097,14 +1120,6 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all) /* Put PE to the list */ list_add_tail(pe-list, phb-ioda.pe_list); - /* Account for one DMA PE if at least one DMA capable device exist - * below the bridge - */ - if (pe-dma_weight != 0) { - phb-ioda.dma_weight += pe-dma_weight; - phb-ioda.dma_pe_count++; - } - /* Link the PE */ pnv_ioda_link_pe_by_weight(phb, pe); } @@ -2431,24 +2446,13 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb, static void pnv_ioda_setup_dma(struct pnv_phb *phb) { struct pci_controller *hose = phb-hose; - unsigned int residual, remaining, segs, tw, base; struct pnv_ioda_pe *pe; + unsigned int dma_weight; - /* If we have more PE# than segments available, hand out one - * per PE until we run out and let the rest fail. 
If not, - * then we assign at least one segment per PE, plus more based - * on the amount of devices under that PE - */ - if (phb-ioda.dma_pe_count phb-ioda.tce32_count) - residual = 0; - else - residual = phb-ioda.tce32_count - - phb-ioda.dma_pe_count; - - pr_info(PCI: Domain %04x has %ld available 32-bit DMA segments\n, - hose-global_number, phb-ioda.tce32_count); - pr_info(PCI: %d PE# for a total weight of %d\n, - phb-ioda.dma_pe_count, phb-ioda.dma_weight); + /* Calculate the PHB's DMA weight */ + dma_weight = pnv_ioda_phb_dma_weight(phb); + pr_info(PCI%04x has %ld DMA32 segments, total weight %d\n, + hose-global_number, phb-ioda.tce32_count, dma_weight); pnv_pci_ioda_setup_opal_tce_kill(phb); @@ -2456,22 +2460,9 @@ static void pnv_ioda_setup_dma(struct pnv_phb *phb) * out one base segment plus any residual segments based on * weight */ - remaining = phb-ioda.tce32_count; - tw = phb-ioda.dma_weight; - base = 0; list_for_each_entry(pe, phb-ioda.pe_dma_list, dma_link) { if (!pe-dma_weight) continue; - if (!remaining) { - pe_warn(pe, No DMA32 resources
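The pre-patch policy that this hunk removes can be sketched as arithmetic: if there are more PEs than DMA32 segments the residual is zero and every PE gets one segment until they run out; otherwise each PE gets one base segment plus a weight-proportional share of the residual. The rounding below is illustrative, not the kernel's exact expression:

```c
#include <assert.h>

/* Sketch of the original IODA1 allocation removed by this patch:
 * one base segment per DMA-capable PE, plus a share of the residual
 * segments proportional to the PE's weight. Names and rounding are
 * illustrative. */
static unsigned int old_segs_for_pe(unsigned int pe_weight,
                                    unsigned int total_weight,
                                    unsigned int residual)
{
    unsigned int segs = 1;

    if (residual && total_weight)
        segs += (pe_weight * residual) / total_weight;
    return segs;
}
```

For example, with 4 residual segments and a total weight of 20, a PE of weight 10 would receive 1 + 2 = 3 segments; with no residual every PE falls back to exactly one.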
[v2 07/11] soc/bman: Add debugfs support for the BMan driver
From: Geoff Thorpe geoff.tho...@freescale.com Add debugfs support for querying the state of hardware based Buffer Manager pools used in DPAA 1.0. Signed-off-by: Geoff Thorpe geoff.tho...@freescale.com Signed-off-by: Emil Medve emilian.me...@freescale.com Signed-off-by: Roy Pledge roy.ple...@freescale.com --- drivers/soc/fsl/qbman/Kconfig|7 ++ drivers/soc/fsl/qbman/Makefile |1 + drivers/soc/fsl/qbman/bman-debugfs.c | 117 ++ drivers/soc/fsl/qbman/bman_api.c | 19 ++ drivers/soc/fsl/qbman/dpaa_sys.h |7 +- 5 files changed, 145 insertions(+), 6 deletions(-) create mode 100644 drivers/soc/fsl/qbman/bman-debugfs.c diff --git a/drivers/soc/fsl/qbman/Kconfig b/drivers/soc/fsl/qbman/Kconfig index 1f2063a..919ef15 100644 --- a/drivers/soc/fsl/qbman/Kconfig +++ b/drivers/soc/fsl/qbman/Kconfig @@ -54,6 +54,13 @@ config FSL_BMAN_TEST_THRESH drainer thread, and the other threads that they observe exactly the depletion state changes that are expected. +config FSL_BMAN_DEBUGFS + tristate BMan debugfs support + depends on DEBUG_FS + default n + help + BMan debugfs support + config FSL_QMAN bool QMan device management default n diff --git a/drivers/soc/fsl/qbman/Makefile b/drivers/soc/fsl/qbman/Makefile index 82f5482..2b53fbc 100644 --- a/drivers/soc/fsl/qbman/Makefile +++ b/drivers/soc/fsl/qbman/Makefile @@ -9,6 +9,7 @@ obj-$(CONFIG_FSL_BMAN_TEST) += bman-test.o bman-test-y = bman_test.o bman-test-$(CONFIG_FSL_BMAN_TEST_API) += bman_test_api.o bman-test-$(CONFIG_FSL_BMAN_TEST_THRESH) += bman_test_thresh.o +obj-$(CONFIG_FSL_BMAN_DEBUGFS) += bman-debugfs.o obj-$(CONFIG_FSL_QMAN) += qman_api.o qman_utils.o qman_driver.o obj-$(CONFIG_FSL_QMAN_CONFIG) += qman.o qman_portal.o diff --git a/drivers/soc/fsl/qbman/bman-debugfs.c b/drivers/soc/fsl/qbman/bman-debugfs.c new file mode 100644 index 000..b384f47 --- /dev/null +++ b/drivers/soc/fsl/qbman/bman-debugfs.c @@ -0,0 +1,117 @@ +/* Copyright 2010 - 2015 Freescale Semiconductor, Inc. 
+ * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * * Neither the name of Freescale Semiconductor nor the + * names of its contributors may be used to endorse or promote products + * derived from this software without specific prior written permission. + * + * ALTERNATIVELY, this software may be distributed under the terms of the + * GNU General Public License (GPL) as published by the Free Software + * Foundation, either version 2 of that License or (at your option) any + * later version. + * + * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor ``AS IS'' AND ANY + * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED + * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE + * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY + * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES + * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; + * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND + * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS + * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ */ + +#include bman_priv.h + +static struct dentry *dfs_root; /* debugfs root directory */ + +/* Query Buffer Pool State */ + +static int query_bp_state_show(struct seq_file *file, void *offset) +{ + int ret; + struct bm_pool_state state; + int i, j; + u32 mask; + + memset(state, 0, sizeof(state)); + ret = bman_query_pools(state); + if (ret) { + seq_printf(file, Error %d\n, ret); + return ret; + } + + seq_puts(file, bp_id free_buffers_avail bp_depleted\n); + for (i = 0; i 2; i++) { + mask = 0x8000; + for (j = 0; j 32; j++) { + seq_printf(file, + %-2u %-3s %-3s\n, +(i * 32) + j, +state.as.state.__state[i] mask ? no : yes, +state.ds.state.__state[i] mask ? yes : no); +mask = 1; +
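The nested loop in query_bp_state_show() above walks 64 pool flags packed MSB-first into two 32-bit state words (the mask starts at the top bit and shifts right, so pool id i*32+j maps to bit 31-j of word i, assuming the truncated mask constant in the diff is 0x80000000). A small model of that decoding:

```c
#include <stdint.h>
#include <stdbool.h>

/* Model of the bit layout walked by the debugfs show routine above:
 * 64 pool flags, MSB-first, across two 32-bit words. Pool id
 * (i * 32 + j) corresponds to bit (31 - j) of word i. */
static bool pool_flag(const uint32_t state[2], unsigned int pool_id)
{
    unsigned int word = pool_id / 32;
    unsigned int bit  = 31 - (pool_id % 32);

    return (state[word] >> bit) & 1;
}
```

So a state word pair of { 0x80000000, 0x00000001 } reports pool 0 and pool 63 as set, matching what the seq_printf loop would print for the availability and depletion columns.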
[v2 11/11] soc/qman: add qman_delete_cgr_safe()
From: Madalin Bucur madalin.bu...@freescale.com Add qman_delete_cgr_safe() that can be called from any CPU. This in turn schedules qman_delete_cgr() on the proper CPU. Signed-off-by: Madalin Bucur madalin.bu...@freescale.com Signed-off-by: Roy Pledge roy.ple...@freescale.com --- drivers/soc/fsl/qbman/qman_api.c | 46 ++ 1 file changed, 46 insertions(+) diff --git a/drivers/soc/fsl/qbman/qman_api.c b/drivers/soc/fsl/qbman/qman_api.c index d4f9be0..1dd60f2 100644 --- a/drivers/soc/fsl/qbman/qman_api.c +++ b/drivers/soc/fsl/qbman/qman_api.c @@ -2463,6 +2463,8 @@ EXPORT_SYMBOL(qman_modify_cgr); QM_CHANNEL_SWPORTAL0)) #define PORTAL_IDX(n) (n-config-public_cfg.channel - QM_CHANNEL_SWPORTAL0) +static u8 qman_cgr_cpus[__CGR_NUM]; + int qman_create_cgr(struct qman_cgr *cgr, u32 flags, struct qm_mcc_initcgr *opts) { @@ -2479,7 +2481,10 @@ int qman_create_cgr(struct qman_cgr *cgr, u32 flags, if (cgr-cgrid = __CGR_NUM) return -EINVAL; + preempt_disable(); p = get_affine_portal(); + qman_cgr_cpus[cgr-cgrid] = smp_processor_id(); + preempt_enable(); memset(local_opts, 0, sizeof(struct qm_mcc_initcgr)); cgr-chan = p-config-public_cfg.channel; @@ -2621,6 +2626,47 @@ put_portal: } EXPORT_SYMBOL(qman_delete_cgr); +struct cgr_comp { + struct qman_cgr *cgr; + struct completion completion; +}; + +static int qman_delete_cgr_thread(void *p) +{ + struct cgr_comp *cgr_comp = (struct cgr_comp *)p; + int res; + + res = qman_delete_cgr((struct qman_cgr *)cgr_comp-cgr); + complete(cgr_comp-completion); + + return res; +} + +void qman_delete_cgr_safe(struct qman_cgr *cgr) +{ + struct task_struct *thread; + struct cgr_comp cgr_comp; + + preempt_disable(); + if (qman_cgr_cpus[cgr-cgrid] != smp_processor_id()) { + init_completion(cgr_comp.completion); + cgr_comp.cgr = cgr; + thread = kthread_create(qman_delete_cgr_thread, cgr_comp, + cgr_del); + + if (likely(!IS_ERR(thread))) { + kthread_bind(thread, qman_cgr_cpus[cgr-cgrid]); + wake_up_process(thread); + wait_for_completion(cgr_comp.completion); 
+ preempt_enable(); + return; + } + } + qman_delete_cgr(cgr); + preempt_enable(); +} +EXPORT_SYMBOL(qman_delete_cgr_safe); + int qman_set_wpm(int wpm_enable) { return qm_set_wpm(wpm_enable); -- 1.7.9.5 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
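The qman_delete_cgr_safe() patch above is an instance of a general pattern: when work must run in a specific context (here, the CGR's home CPU), hand it to a helper thread and block on its completion. A userspace pthread sketch of just the hand-off/wait portion follows; plain pthreads cannot portably express the kthread_bind() CPU pinning, so that part is only noted in comments, and all names are illustrative:

```c
#include <pthread.h>

/* Userspace sketch of the kthread_create() + wake_up_process() +
 * wait_for_completion() sequence in qman_delete_cgr_safe() above.
 * The kernel additionally binds the thread to the CGR's home CPU. */
struct cgr_work {
    int (*fn)(void *);
    void *arg;
    int result;
};

static void *cgr_work_thread(void *p)
{
    struct cgr_work *w = p;

    w->result = w->fn(w->arg);
    return NULL;
}

/* Run fn(arg) on a helper thread and wait for it to finish.
 * On thread-creation failure, fall back to running locally,
 * mirroring the driver's IS_ERR(thread) fallback path. */
static int run_on_helper(int (*fn)(void *), void *arg)
{
    struct cgr_work w = { fn, arg, -1 };
    pthread_t t;

    if (pthread_create(&t, NULL, cgr_work_thread, &w))
        return fn(arg);
    pthread_join(&t, NULL);
    return w.result;
}

/* demo payload standing in for qman_delete_cgr() */
static int bump(void *arg) { return ++*(int *)arg; }
```

One design note visible in the original: the completion on the caller's stack is safe only because the caller waits before returning, exactly as cgr_comp lives on the stack in qman_delete_cgr_safe().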
Re: [PATCH v6 07/42] powerpc/powernv: Improve IO and M32 mapping
On Tue, Aug 11, 2015 at 12:32:13PM +1000, Alexey Kardashevskiy wrote: On 08/11/2015 10:12 AM, Gavin Shan wrote: On Mon, Aug 10, 2015 at 05:40:08PM +1000, Alexey Kardashevskiy wrote: On 08/06/2015 02:11 PM, Gavin Shan wrote: There're 3 windows (IO, M32 and M64) for PHB, root port and upstream These are actually IO, non-prefetchable and prefetchable windows which happen to be IO, 32bit and 64bit windows, but this has nothing to do with the M32/M64 BAR registers in P7IOC/PHB3 - do I understand this correctly? In pci-ioda.c, we have the definitions below, which were chosen when developing the code, not taken from any specification: IO - resources with IO property M32 - 32-bits or non-prefetchable resources M64 - 64-bits and prefetchable resources This is what I am saying - it is incorrect and confusing. M32/M64 are PHB3 register names and associated windows (with M in the beginning) but not device resources. I don't see how it's incorrect and confusing. M32/M64 are not PHB3 register names. Also, a device resource is either IO, 32-bits prefetchable memory, 32-bits non-prefetchable memory, 64-bits non-prefetchable memory, or 64-bits prefetchable memory. They match with IO, M32, M64. port of the PCIE switch behind root port. In order to support PCI hotplug, we extend the start/end address of those 3 windows of the root port or upstream port to the start/end address of the 3 PHB windows. The current implementation, assigning IO or M32 segments based on the bridge's windows, isn't reliable. The patch fixes the above issue by calculating the PE's consumed IO or M32 segments from its contained devices, with no PCI bridge windows involved if the PE doesn't contain all the subordinate PCI buses. Please, rephrase it. How can PCI bridges be involved in PE consumption? Ok. Will add something like below: if the PE, corresponding to the PCI bus, doesn't contain all the subordinate PCI buses. No, my question was about PCI bridge windows involved - what do you do to the windows if the PE does not own all child buses?
All of it is about the original implementation: When the PE doesn't include all child buses, the resource consumed by the PE is: resources assigned to current PCI bus and then exclude the resources assigned to the child buses. Note that PCI bridge windows are actually PCI bus's resource. Otherwise, the PCI bridge windows still contribute to PE's consumed IO or M32 segments. PCI bridge windows themselves consume PEs? Is that correct? PCI bridge windows consume IO, M32, M64 segments, not PEs. Ah, right. Signed-off-by: Gavin Shan gws...@linux.vnet.ibm.com --- arch/powerpc/platforms/powernv/pci-ioda.c | 136 +- 1 file changed, 79 insertions(+), 57 deletions(-) diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index 488a53e..713f4b4 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -2844,75 +2844,97 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev) } #endif /* CONFIG_PCI_IOV */ -/* - * This function is supposed to be called on basis of PE from top - * to bottom style. So the the I/O or MMIO segment assigned to - * parent PE could be overrided by its child PEs if necessary. - */ -static void pnv_ioda_setup_pe_seg(struct pci_controller *hose, - struct pnv_ioda_pe *pe) +static int pnv_ioda_setup_one_res(struct pci_controller *hose, + struct pnv_ioda_pe *pe, + struct resource *res) { struct pnv_phb *phb = hose-private_data; struct pci_bus_region region; - struct resource *res; - int i, index; - unsigned int segsize; + unsigned int index, segsize; unsigned long *segmap, *pe_segmap; uint16_t win; int64_t rc; - /* -* NOTE: We only care PCI bus based PE for now. For PCI -* device based PE, for example SRIOV sensitive VF should -* be figured out later. -*/ - BUG_ON(!(pe-flags (PNV_IODA_PE_BUS | PNV_IODA_PE_BUS_ALL))); + /* Check if we need map the resource */ + if (!res-parent || !res-flags || res-start res-end) res-start = res-end ? 
No, res-start == res-end is valid. + return 0; - pci_bus_for_each_resource(pe-pbus, res, i) { - if (!res || !res-flags || - res-start res-end) - continue; + if (res-flags IORESOURCE_IO) { + region.start = res-start - phb-ioda.io_pci_base; + region.end = res-end - phb-ioda.io_pci_base; + segsize = phb-ioda.io_segsize; + segmap = phb-ioda.io_segmap; + pe_segmap= pe-io_segmap; + win = OPAL_IO_WINDOW_TYPE; + } else if ((res-flags IORESOURCE_MEM) + !pnv_pci_is_mem_pref_64(res-flags)) { + region.start = res-start - + hose-mem_offset[0] - + phb-ioda.m32_pci_base; + region.end =
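The segment bookkeeping in pnv_ioda_setup_one_res() above reduces to simple interval arithmetic: subtract the window base from the resource range, then mark every fixed-size segment the range touches. A hedged stand-alone sketch of that computation (names illustrative, not the kernel's):

```c
#include <stdint.h>

/* Sketch of the segment arithmetic behind pnv_ioda_setup_one_res():
 * a PHB window is divided into fixed-size segments, and a resource
 * occupies every segment its [start, end] range touches after the
 * window base (e.g. io_pci_base or m32_pci_base) is subtracted. */
static unsigned int first_seg(uint64_t start, uint64_t base,
                              uint64_t segsize)
{
    return (unsigned int)((start - base) / segsize);
}

static unsigned int last_seg(uint64_t end, uint64_t base,
                             uint64_t segsize)
{
    return (unsigned int)((end - base) / segsize);
}
```

Note this is inclusive on both ends, which is why a resource with res->start == res->end (the case raised in the review) still legitimately consumes one segment and must not be skipped by the validity check.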
[v2 05/11] soc/bman: Add self-tester for BMan driver
From: Geoff Thorpe geoff.tho...@freescale.com Add a self test for the DPAA 1.0 Buffer Manager driver. This test ensures that the driver can properly acquire and release buffers using the BMan portal infrastructure. Signed-off-by: Geoff Thorpe geoff.tho...@freescale.com Signed-off-by: Emil Medve emilian.me...@freescale.com Signed-off-by: Roy Pledge roy.ple...@freescale.com --- drivers/soc/fsl/qbman/Kconfig| 26 drivers/soc/fsl/qbman/Makefile |4 + drivers/soc/fsl/qbman/bman_test.c| 56 + drivers/soc/fsl/qbman/bman_test.h| 34 + drivers/soc/fsl/qbman/bman_test_api.c| 184 +++ drivers/soc/fsl/qbman/bman_test_thresh.c | 198 ++ drivers/soc/fsl/qbman/dpaa_sys.h |1 + 7 files changed, 503 insertions(+) create mode 100644 drivers/soc/fsl/qbman/bman_test.c create mode 100644 drivers/soc/fsl/qbman/bman_test.h create mode 100644 drivers/soc/fsl/qbman/bman_test_api.c create mode 100644 drivers/soc/fsl/qbman/bman_test_thresh.c diff --git a/drivers/soc/fsl/qbman/Kconfig b/drivers/soc/fsl/qbman/Kconfig index 1ff52a8..1f2063a 100644 --- a/drivers/soc/fsl/qbman/Kconfig +++ b/drivers/soc/fsl/qbman/Kconfig @@ -28,6 +28,32 @@ config FSL_BMAN_PORTAL help FSL BMan portal driver +config FSL_BMAN_TEST + tristate BMan self-tests + default n + help + Compile self-test code + +config FSL_BMAN_TEST_API + bool High-level API self-test + depends on FSL_BMAN_TEST + default y + help + This requires the presence of cpu-affine portals, and performs + high-level API testing with them (whichever portal(s) are affine + to the cpu(s) the test executes on). + +config FSL_BMAN_TEST_THRESH + bool Thresholds self-test + depends on FSL_BMAN_TEST + default y + help + Multi-threaded (SMP) test of BMan pool depletion. A pool is seeded + before multiple threads (one per cpu) create pool objects to track + depletion state changes. The pool is then drained to empty by a + drainer thread, and the other threads that they observe exactly + the depletion state changes that are expected. 
+ config FSL_QMAN bool QMan device management default n diff --git a/drivers/soc/fsl/qbman/Makefile b/drivers/soc/fsl/qbman/Makefile index 0d96598..04509c3 100644 --- a/drivers/soc/fsl/qbman/Makefile +++ b/drivers/soc/fsl/qbman/Makefile @@ -5,6 +5,10 @@ obj-$(CONFIG_FSL_BMAN) += bman.o obj-$(CONFIG_FSL_BMAN_PORTAL) += bman-portal.o bman-portal-y = bman_portal.o bman_api.o \ bman_utils.o +obj-$(CONFIG_FSL_BMAN_TEST)+= bman-test.o +bman-test-y = bman_test.o +bman-test-$(CONFIG_FSL_BMAN_TEST_API) += bman_test_api.o +bman-test-$(CONFIG_FSL_BMAN_TEST_THRESH) += bman_test_thresh.o obj-$(CONFIG_FSL_QMAN) += qman_api.o qman_utils.o qman_driver.o obj-$(CONFIG_FSL_QMAN_CONFIG) += qman.o qman_portal.o diff --git a/drivers/soc/fsl/qbman/bman_test.c b/drivers/soc/fsl/qbman/bman_test.c new file mode 100644 index 000..9298093 --- /dev/null +++ b/drivers/soc/fsl/qbman/bman_test.c @@ -0,0 +1,56 @@ +/* Copyright 2008 - 2015 Freescale Semiconductor, Inc. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * * Neither the name of Freescale Semiconductor nor the + * names of its contributors may be used to endorse or promote products + * derived from this software without specific prior written permission. + * + * ALTERNATIVELY, this software may be distributed under the terms of the + * GNU General Public License (GPL) as published by the Free Software + * Foundation, either version 2 of that License or (at your option) any + * later version. 
+ * + * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor ``AS IS'' AND ANY + * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED + * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE + * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY + * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES + * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; + * LOSS OF USE,
[v2 02/11] soc/fsl: Introduce DPAA BMan device management driver
From: Geoff Thorpe geoff.tho...@freescale.com This driver enables the Freescale DPAA 1.0 Buffer Manager block. BMan is a hardware buffer pool manager that allows accelerators connected to the SoC datapath to acquire and release buffers during data processing. Signed-off-by: Geoff Thorpe geoff.tho...@freescale.com Signed-off-by: Emil Medve emilian.me...@freescale.com Signed-off-by: Roy Pledge roy.ple...@freescale.com --- drivers/soc/Kconfig |1 + drivers/soc/Makefile |1 + drivers/soc/fsl/Kconfig |5 + drivers/soc/fsl/Makefile |3 + drivers/soc/fsl/qbman/Kconfig | 25 ++ drivers/soc/fsl/qbman/Makefile|1 + drivers/soc/fsl/qbman/bman.c | 553 + drivers/soc/fsl/qbman/bman_priv.h | 53 drivers/soc/fsl/qbman/dpaa_sys.h | 55 9 files changed, 697 insertions(+) create mode 100644 drivers/soc/fsl/Kconfig create mode 100644 drivers/soc/fsl/Makefile create mode 100644 drivers/soc/fsl/qbman/Kconfig create mode 100644 drivers/soc/fsl/qbman/Makefile create mode 100644 drivers/soc/fsl/qbman/bman.c create mode 100644 drivers/soc/fsl/qbman/bman_priv.h create mode 100644 drivers/soc/fsl/qbman/dpaa_sys.h diff --git a/drivers/soc/Kconfig b/drivers/soc/Kconfig index 96ddecb..4e3c8f4 100644 --- a/drivers/soc/Kconfig +++ b/drivers/soc/Kconfig @@ -1,6 +1,7 @@ menu SOC (System On Chip) specific Drivers source drivers/soc/mediatek/Kconfig +source drivers/soc/fsl/Kconfig source drivers/soc/qcom/Kconfig source drivers/soc/sunxi/Kconfig source drivers/soc/ti/Kconfig diff --git a/drivers/soc/Makefile b/drivers/soc/Makefile index 7dc7c0d..7adcd97 100644 --- a/drivers/soc/Makefile +++ b/drivers/soc/Makefile @@ -3,6 +3,7 @@ # obj-$(CONFIG_ARCH_MEDIATEK)+= mediatek/ +obj-$(CONFIG_FSL_SOC) += fsl/ obj-$(CONFIG_ARCH_QCOM)+= qcom/ obj-$(CONFIG_ARCH_SUNXI) += sunxi/ obj-$(CONFIG_ARCH_TEGRA) += tegra/ diff --git a/drivers/soc/fsl/Kconfig b/drivers/soc/fsl/Kconfig new file mode 100644 index 000..daa9c0d --- /dev/null +++ b/drivers/soc/fsl/Kconfig @@ -0,0 +1,5 @@ +menu Freescale SOC (System On Chip) specific 
Drivers + +source drivers/soc/fsl/qbman/Kconfig + +endmenu diff --git a/drivers/soc/fsl/Makefile b/drivers/soc/fsl/Makefile new file mode 100644 index 000..19e74bb --- /dev/null +++ b/drivers/soc/fsl/Makefile @@ -0,0 +1,3 @@ +# Common +obj-$(CONFIG_FSL_DPA) += qbman/ + diff --git a/drivers/soc/fsl/qbman/Kconfig b/drivers/soc/fsl/qbman/Kconfig new file mode 100644 index 000..be4ae01 --- /dev/null +++ b/drivers/soc/fsl/qbman/Kconfig @@ -0,0 +1,25 @@ +menuconfig FSL_DPA + bool "Freescale DPAA support" + depends on FSL_SOC || COMPILE_TEST + default n + help + FSL Data-Path Acceleration Architecture drivers + + These are not the actual Ethernet driver(s) + +if FSL_DPA + +config FSL_DPA_CHECKING + bool "additional driver checking" + default n + help + Compiles in additional checks to sanity-check the drivers and + any use of it by other code. Not recommended for performance + +config FSL_BMAN + tristate "BMan device management" + default n + help + FSL DPAA BMan driver + +endif # FSL_DPA diff --git a/drivers/soc/fsl/qbman/Makefile b/drivers/soc/fsl/qbman/Makefile new file mode 100644 index 000..02014d9 --- /dev/null +++ b/drivers/soc/fsl/qbman/Makefile @@ -0,0 +1 @@ +obj-$(CONFIG_FSL_BMAN) += bman.o diff --git a/drivers/soc/fsl/qbman/bman.c b/drivers/soc/fsl/qbman/bman.c new file mode 100644 index 000..9a500ce --- /dev/null +++ b/drivers/soc/fsl/qbman/bman.c @@ -0,0 +1,553 @@ +/* Copyright (c) 2009 - 2015 Freescale Semiconductor, Inc. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. 
+ * * Neither the name of Freescale Semiconductor nor the + * names of its contributors may be used to endorse or promote products + * derived from this software without specific prior written permission. + * + * ALTERNATIVELY, this software may be distributed under the terms of the + * GNU General Public License (GPL) as published by the Free Software + * Foundation, either version 2 of that License or (at your option) any + * later version. + * + * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor ``AS IS'' AND ANY + * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +
Re: [PATCH v2 05/10] cxl: Refactor adaptor init/teardown
The function above doesn't even use the 'rc' value. Darn, you're right. I'll fix that in a new version. -- Regards, Daniel ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: RFC: prepare for struct scatterlist entries without page backing
On Wed, Aug 12, 2015 at 10:00 AM, James Bottomley james.bottom...@hansenpartnership.com wrote: On Wed, 2015-08-12 at 09:05 +0200, Christoph Hellwig wrote: ... However the ccio (parisc) and sba_iommu (parisc, ia64) IOMMUs seem to operate mostly on virtual addresses. It's a fairly odd concept that I don't fully grasp, so I'll need some help with those if we want to bring this forward. James explained the primary function of IOMMUs on parisc (DMA-Cache coherency) much better than I ever could. Three more observations: 1) the IOMMU can be bypassed by 64-bit DMA devices on IA64. 2) IOMMU enables 32-bit DMA devices to reach >32-bit physical memory, thus avoiding bounce buffers. parisc and older IA-64 have some 32-bit PCI devices - e.g. IDE boot HDD. 3) IOMMU acts as a proxy for IO devices by fetching cachelines of data for PA-RISC systems whose memory controllers ONLY serve cacheline-sized transactions. ie. 32-bit DMA results in the IOMMU fetching the cacheline and updating just the 32 bits in a DMA cache-coherent fashion. Bonus thought: 4) IOMMU can improve DMA performance in some cases using hints provided by the OS (e.g. prefetching DMA data or using READ_CURRENT bus transactions instead of normal memory fetches.) cheers, grant
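Observation 2) above comes down to a simple address-range check: without an IOMMU, a device can only reach a buffer directly if every byte of it fits under the device's DMA mask. A minimal userspace sketch of that check (illustrative only - real kernels go through the DMA API rather than open-coding this, and the function name here is invented):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A buffer needs a bounce buffer (or IOMMU remapping) when any byte of it
 * lies above what the device's DMA mask can address. */
static bool needs_bounce(uint64_t phys, uint64_t len, uint64_t dma_mask)
{
    return phys + len - 1 > dma_mask;
}
```

With a 32-bit mask, any memory above 4 GiB needs bouncing unless an IOMMU remaps it - which is exactly the copy the parisc/ia64 IOMMUs avoid.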
Re: [PATCH v6 11/42] powerpc/powernv: Trace DMA32 segments consumed by PE
On Mon, Aug 10, 2015 at 07:43:48PM +1000, Alexey Kardashevskiy wrote: On 08/06/2015 02:11 PM, Gavin Shan wrote: On P7IOC, the whole DMA32 space is divided evenly to 256MB segments. Each PE can consume one or multiple DMA32 segments. Current code doesn't trace the available DMA32 segments and those consumed by one particular PE. It's conflicting with PCI hotplug. The patch introduces one bitmap to PHB to trace the available DMA32 segments for allocation, more fields to struct pnv_ioda_pe to trace the consumed DMA32 segments by the PE, which is going to be released when the PE is destroyed at PCI unplugging time. Signed-off-by: Gavin Shan gws...@linux.vnet.ibm.com --- arch/powerpc/platforms/powernv/pci-ioda.c | 40 +++ arch/powerpc/platforms/powernv/pci.h | 4 +++- 2 files changed, 33 insertions(+), 11 deletions(-) diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index cd22002..57ba8fd 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -1946,6 +1946,7 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb, /* Grab a 32-bit TCE table */ pe-dma32_seg = base; + pe-dma32_segcount = segs; pe_info(pe, Setting up 32-bit TCE table at %08x..%08x\n, (base 28), ((base + segs) 28) - 1); @@ -2006,8 +2007,13 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb, return; fail: /* XXX Failure: Try to fallback to 64-bit only ? 
*/ - if (pe->dma32_seg >= 0) + if (pe->dma32_seg >= 0) { + bitmap_clear(phb->ioda.dma32_segmap, + pe->dma32_seg, pe->dma32_segcount); pe->dma32_seg = -1; + pe->dma32_segcount = 0; + } + if (tce_mem) __free_pages(tce_mem, get_order(TCE32_TABLE_SIZE * segs)); if (tbl) { @@ -2443,12 +2449,11 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb, pnv_ioda_setup_bus_dma(pe, pe->pbus); } -static unsigned int pnv_ioda1_setup_dma(struct pnv_phb *phb, - struct pnv_ioda_pe *pe, - unsigned int base) +static void pnv_ioda1_setup_dma(struct pnv_phb *phb, + struct pnv_ioda_pe *pe) { struct pci_controller *hose = phb->hose; - unsigned int dma_weight, segs; + unsigned int dma_weight, base, segs; /* Calculate the PHB's DMA weight */ dma_weight = pnv_ioda_phb_dma_weight(phb); @@ -2461,11 +2466,28 @@ static unsigned int pnv_ioda1_setup_dma(struct pnv_phb *phb, else segs = (pe->dma32_weight * phb->ioda.dma32_segcount) / dma_weight; + + /* + * Allocate DMA32 segments. We might not have enough + * resources available. However we expect at least one + * to be available. + */ + do { + base = bitmap_find_next_zero_area(phb->ioda.dma32_segmap, + phb->ioda.dma32_segcount, + 0, segs, 0); + if (base < phb->ioda.dma32_segcount) { + bitmap_set(phb->ioda.dma32_segmap, base, segs); + break; + } + } while (--segs); If segs==0 before entering the loop, the loop will execute 0xfffe times. Make it for (;segs;--segs) { }. segs is always equal to or bigger than 1 when entering the loop. 
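The loop-shape point in this exchange can be reproduced in a toy userspace model (the map, sizes, and helpers below are invented for illustration and are not the kernel bitmap API): a `do { } while (--segs)` form underflows the unsigned counter when `segs` starts at 0, while `for (; segs; --segs)` is safe and still implements the same shrink-until-it-fits allocation.

```c
#include <assert.h>
#include <stdbool.h>

#define NSEGS 16
static bool segmap[NSEGS];          /* toy stand-in for a DMA32 segment bitmap */

/* Naive stand-in for bitmap_find_next_zero_area(): the base of the first
 * run of 'len' clear bits, or NSEGS if no such run exists. */
static unsigned int find_zero_area(unsigned int len)
{
    for (unsigned int base = 0; base + len <= NSEGS; base++) {
        unsigned int i;
        for (i = 0; i < len && !segmap[base + i]; i++)
            ;
        if (i == len)
            return base;
    }
    return NSEGS;                   /* no room */
}

/* Shrinking allocation written with for (; segs; --segs), so segs == 0 on
 * entry is harmless; a do { } while (--segs) would wrap the unsigned
 * counter and spin for ~4 billion iterations. */
static unsigned int alloc_segs(unsigned int segs, unsigned int *base_out)
{
    for (; segs; --segs) {
        unsigned int base = find_zero_area(segs);
        if (base < NSEGS) {
            for (unsigned int i = 0; i < segs; i++)
                segmap[base + i] = true;
            *base_out = base;
            return segs;            /* how many segments we actually got */
        }
    }
    return 0;                       /* nothing available */
}
```

Gavin's reply is that `segs` is always at least 1 on entry, so the two forms behave identically in practice - the `for` form just makes that invariant unnecessary.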
+ + if (WARN_ON(!segs)) + return; + pe_info(pe, DMA weight %d, assigned %d segments\n, pe-dma32_weight, segs); pnv_pci_ioda_setup_dma_pe(phb, pe, base, segs); - - return segs; } #ifdef CONFIG_PCI_MSI @@ -2933,20 +2955,18 @@ static void pnv_pci_ioda_setup_DMA(void) struct pci_controller *hose, *tmp; struct pnv_phb *phb; struct pnv_ioda_pe *pe; - unsigned int base; list_for_each_entry_safe(hose, tmp, hose_list, list_node) { phb = hose-private_data; pnv_pci_ioda_setup_opal_tce_kill(phb); - base = 0; list_for_each_entry(pe, phb-ioda.pe_dma_list, dma_link) { if (!pe-dma32_weight) continue; switch (phb-type) { case PNV_PHB_IODA1: - base += pnv_ioda1_setup_dma(phb, pe, base); + pnv_ioda1_setup_dma(phb, pe); break; case PNV_PHB_IODA2: pnv_pci_ioda2_setup_dma_pe(phb, pe); diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h index 574fe43..1dc9578 100644 --- a/arch/powerpc/platforms/powernv/pci.h +++ b/arch/powerpc/platforms/powernv/pci.h @@ -65,6 +65,7 @@ struct pnv_ioda_pe { /* Base iommu table, ie, 4K TCEs, 32-bit DMA */ int
Re: [PATCH v6 12/42] powerpc/powernv: Increase PE# capacity
On Tue, Aug 11, 2015 at 12:47:25PM +1000, Alexey Kardashevskiy wrote: On 08/11/2015 10:38 AM, Gavin Shan wrote: On Mon, Aug 10, 2015 at 07:53:02PM +1000, Alexey Kardashevskiy wrote: On 08/06/2015 02:11 PM, Gavin Shan wrote: Each PHB maintains an array helping to translate RID (Request ID) to PE# with the assumption that PE# takes 8 bits, indicating that we can't have more than 256 PEs. However, pci_dn-pe_number already had 4-bytes for the PE#. The patch extends the PE# capacity so that each of them will be 4-bytes long. Then we can use IODA_INVALID_PE to check one entry in phb-pe_rmap[] is valid or not. Signed-off-by: Gavin Shan gws...@linux.vnet.ibm.com --- arch/powerpc/platforms/powernv/pci-ioda.c | 8 ++-- arch/powerpc/platforms/powernv/pci.h | 7 +++ 2 files changed, 9 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index 57ba8fd..3094c61 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -786,7 +786,7 @@ static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe) /* Clear the reverse map */ for (rid = pe-rid; rid rid_end; rid++) - phb-ioda.pe_rmap[rid] = 0; + phb-ioda.pe_rmap[rid] = IODA_INVALID_PE; /* Release from all parents PELT-V */ while (parent) { @@ -3134,7 +3134,7 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np, unsigned long size, pemap_off; const __be64 *prop64; const __be32 *prop32; - int len; + int len, i; u64 phb_id; void *aux; long rc; @@ -3201,6 +3201,10 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np, if (prop32) phb-ioda.reserved_pe = be32_to_cpup(prop32); + /* Invalidate RID to PE# mapping */ + for (i = 0; i ARRAY_SIZE(phb-ioda.pe_rmap); ++i) + phb-ioda.pe_rmap[i] = IODA_INVALID_PE; + /* Parse 64-bit MMIO range */ pnv_ioda_parse_m64_window(phb); diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h index 
1dc9578..6f8568e 100644 --- a/arch/powerpc/platforms/powernv/pci.h +++ b/arch/powerpc/platforms/powernv/pci.h @@ -175,11 +175,10 @@ struct pnv_phb { struct list_head pe_list; struct mutex pe_list_mutex; - /* Reverse map of PEs, will have to extend if -* we are to support more than 256 PEs, indexed -* bus { bus, devfn } + /* Reverse map of PEs, indexed by +* { bus, devfn } */ - unsigned char pe_rmap[0x10000]; + int pe_rmap[0x10000]; 256K seems to be a waste when only a tiny fraction of it will ever be used. Using include/linux/hashtable.h makes sense here, and if you use a hashtable, you won't have to initialize anything with IODA_INVALID_PE. I'm not sure if I follow your idea completely. Can you rephrase and explain it a bit more? With a hash table to trace the RID mapping here, won't more memory be needed if all PCI bus numbers (0 to 255) are valid? It means a hash table doesn't have an advantage in memory consumption. You need 3 bytes - one for a bus and two for devfn - which makes it a perfect 32-bit hash key and you only store existing devices in a hash so you do not waste memory. You don't answer my concern yet: more memory will be needed if all PCI bus numbers (0 to 255) are valid. Also, 2 bytes are enough: one byte is for the bus number, another byte for the devfn. Why do we need 3 bytes here? How many bits of the 16 bits (2 bytes) are used as the hash key? I believe it shouldn't be all of them because a lot of memory would be consumed for the hash bucket heads. Since in most cases we have bus-level PEs, it sounds reasonable to use the devfn as the hash key, which is one byte long. In this case, 2KB (256 * 8) is used for the hash bucket heads without any node populated in the table yet. Every node would be represented by the data struct below, each of which consumes 24 bytes. 
If the PHB has 5 PCI buses, which is commonly seen, the total consumed memory will be: 2KB for the hash bucket heads, 30KB for hash nodes (24 * 256 * 5) struct pnv_ioda_rid { int bdfn; int pe_number; struct hlist_node node; }; Don't forget it needs more complexity to maintain the conflicting list in one bucket. So I don't see the benefit of using a hashtable here. On the other hand, searching in hash table buckets has to iterate the list of conflicting items (keys), which is slow compared to what we have. How often do you expect this code to execute? Isn't it setup-time and hotplug only? Unless it is thousands of times per second, it is not an issue here. I was intending to say: a hashtable is more complex than an array. The data struct can be as simple as an array. I don't see why we should bother to have a hashtable here. However, you're correct, the code is just executed at system
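The memory arithmetic being argued over here is easy to check. A small sketch using the figures quoted in the thread (4-byte array entries, 24-byte hash nodes, 256 bucket heads of pointer size; the helper names are made up for illustration):

```c
#include <assert.h>

#define RID_SPACE    0x10000   /* 8-bit bus number + 8-bit devfn */
#define NODE_SIZE    24        /* per-node cost quoted in the thread */
#define BUCKET_HEADS 256       /* devfn used as the hash key */
#define PTR_SIZE     8

/* Flat reverse map: one int per possible RID, always allocated. */
static unsigned long flat_array_bytes(void)
{
    return RID_SPACE * sizeof(int);          /* 256 KB on a 4-byte-int ABI */
}

/* Hashtable: bucket heads plus one node per populated { bus, devfn }. */
static unsigned long hash_bytes(unsigned int populated_buses)
{
    return BUCKET_HEADS * PTR_SIZE +
           (unsigned long)NODE_SIZE * 256 * populated_buses;
}
```

For 5 fully populated buses the hashtable wins (32KB vs 256KB), but if all 256 bus numbers were fully populated it would lose (about 1.5MB) - which is the worst-case concern raised above.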
[v2 00/11] Freescale DPAA QBMan Drivers
The Freescale Data Path Acceleration Architecture (DPAA) is a set of hardware components on specific QorIQ multicore processors. This architecture provides the infrastructure to support simplified sharing of networking interfaces and accelerators by multiple CPU cores and the accelerators. The Queue Manager (QMan) is a hardware queue management block that allows software and accelerators on the datapath to enqueue and dequeue frames in order to communicate. The Buffer Manager (BMan) is a hardware buffer pool management block that allows software and accelerators on the datapath to acquire and release buffers in order to build frames. This patch set introduces the QBMan driver code that configures and initializes the QBMan hardware and provides APIs for software to use the frame queues and buffer pools the blocks provide. These drivers provide the base functionality for software to communicate with the other DPAA accelerators on Freescale QorIQ processors. Changes from v1: - Cleanup Kconfig options - Changed base QMan and BMan drivers to only be built in. 
Will add loadable support in future patch - Replace panic() call with WARN_ON() - Elimanated some unused APIs - Replaced PowerPC specific IO accessors with platform independent versions Emil Medve (1): powerpc: re-add devm_ioremap_prot() Geoff Thorpe (7): soc/fsl: Introduce DPAA BMan device management driver soc/fsl: Introduce the DPAA BMan portal driver soc/fsl: Introduce drivers for the DPAA QMan soc/bman: Add self-tester for BMan driver soc/qman: Add self-tester for QMan driver soc/bman: Add debugfs support for the BMan driver soc/qman: Add debugfs support for the QMan driver Hai-Ying Wang (2): soc/bman: Add HOTPLUG_CPU support to the BMan driver soc/qman: Add HOTPLUG_CPU support to the QMan driver Madalin Bucur (1): soc/qman: add qman_delete_cgr_safe() arch/powerpc/include/asm/io.h |3 + arch/powerpc/lib/Makefile |1 + arch/powerpc/lib/devres.c | 43 + arch/powerpc/platforms/85xx/corenet_generic.c | 16 + arch/powerpc/platforms/85xx/p1023_rdb.c | 14 + drivers/soc/Kconfig |1 + drivers/soc/Makefile |1 + drivers/soc/fsl/Kconfig |5 + drivers/soc/fsl/Makefile |3 + drivers/soc/fsl/qbman/Kconfig | 120 + drivers/soc/fsl/qbman/Makefile| 20 + drivers/soc/fsl/qbman/bman-debugfs.c | 117 + drivers/soc/fsl/qbman/bman.c | 553 + drivers/soc/fsl/qbman/bman.h | 542 + drivers/soc/fsl/qbman/bman_api.c | 1072 + drivers/soc/fsl/qbman/bman_portal.c | 391 drivers/soc/fsl/qbman/bman_priv.h | 134 ++ drivers/soc/fsl/qbman/bman_test.c | 56 + drivers/soc/fsl/qbman/bman_test.h | 34 + drivers/soc/fsl/qbman/bman_test_api.c | 184 ++ drivers/soc/fsl/qbman/bman_test_thresh.c | 198 ++ drivers/soc/fsl/qbman/bman_utils.c| 72 + drivers/soc/fsl/qbman/dpaa_resource.c | 359 +++ drivers/soc/fsl/qbman/dpaa_sys.h | 271 +++ drivers/soc/fsl/qbman/qman-debugfs.c | 1313 +++ drivers/soc/fsl/qbman/qman.c | 1026 + drivers/soc/fsl/qbman/qman.h | 1128 ++ drivers/soc/fsl/qbman/qman_api.c | 2921 + drivers/soc/fsl/qbman/qman_driver.c | 83 + drivers/soc/fsl/qbman/qman_portal.c | 672 ++ drivers/soc/fsl/qbman/qman_priv.h | 
287 +++ drivers/soc/fsl/qbman/qman_test.c | 57 + drivers/soc/fsl/qbman/qman_test.h | 44 + drivers/soc/fsl/qbman/qman_test_api.c | 216 ++ drivers/soc/fsl/qbman/qman_test_stash.c | 502 + drivers/soc/fsl/qbman/qman_utils.c| 305 +++ include/soc/fsl/bman.h| 518 + include/soc/fsl/qman.h| 1977 + 38 files changed, 15259 insertions(+) create mode 100644 arch/powerpc/lib/devres.c create mode 100644 drivers/soc/fsl/Kconfig create mode 100644 drivers/soc/fsl/Makefile create mode 100644 drivers/soc/fsl/qbman/Kconfig create mode 100644 drivers/soc/fsl/qbman/Makefile create mode 100644 drivers/soc/fsl/qbman/bman-debugfs.c create mode 100644 drivers/soc/fsl/qbman/bman.c create mode 100644 drivers/soc/fsl/qbman/bman.h create mode 100644 drivers/soc/fsl/qbman/bman_api.c create mode 100644 drivers/soc/fsl/qbman/bman_portal.c create mode 100644 drivers/soc/fsl/qbman/bman_priv.h create mode 100644 drivers/soc/fsl/qbman/bman_test.c create mode 100644 drivers/soc/fsl/qbman/bman_test.h create mode 100644 drivers/soc/fsl/qbman/bman_test_api.c create mode 100644 drivers/soc/fsl/qbman/bman_test_thresh.c create
[v2 08/11] soc/qman: Add debugfs support for the QMan driver
From: Geoff Thorpe geoff.tho...@freescale.com Add debugfs support for querying the state of hardware-based queues managed by the DPAA 1.0 Queue Manager. Signed-off-by: Geoff Thorpe geoff.tho...@freescale.com Signed-off-by: Emil Medve emilian.me...@freescale.com Signed-off-by: Madalin Bucur madalin.bu...@freescale.com Signed-off-by: Roy Pledge roy.ple...@freescale.com --- drivers/soc/fsl/qbman/Makefile | 1 + drivers/soc/fsl/qbman/dpaa_sys.h | 2 + drivers/soc/fsl/qbman/qman-debugfs.c | 1313 ++ drivers/soc/fsl/qbman/qman_api.c | 60 +- drivers/soc/fsl/qbman/qman_priv.h | 8 + 5 files changed, 1382 insertions(+), 2 deletions(-) create mode 100644 drivers/soc/fsl/qbman/qman-debugfs.c diff --git a/drivers/soc/fsl/qbman/Makefile b/drivers/soc/fsl/qbman/Makefile index 2b53fbc..cce1f70 100644 --- a/drivers/soc/fsl/qbman/Makefile +++ b/drivers/soc/fsl/qbman/Makefile @@ -17,3 +17,4 @@ obj-$(CONFIG_FSL_QMAN_TEST) += qman-test.o qman-test-y = qman_test.o qman-test-$(CONFIG_FSL_QMAN_TEST_API) += qman_test_api.o qman-test-$(CONFIG_FSL_QMAN_TEST_STASH) += qman_test_stash.o +obj-$(CONFIG_FSL_QMAN_DEBUGFS) += qman-debugfs.o diff --git a/drivers/soc/fsl/qbman/dpaa_sys.h b/drivers/soc/fsl/qbman/dpaa_sys.h index 3cf446a..0dd341c 100644 --- a/drivers/soc/fsl/qbman/dpaa_sys.h +++ b/drivers/soc/fsl/qbman/dpaa_sys.h @@ -38,7 +38,9 @@ #include <linux/of_irq.h> #include <linux/of_reserved_mem.h> #include <linux/kthread.h> +#include <linux/uaccess.h> #include <linux/debugfs.h> +#include <linux/vmalloc.h> #include <linux/platform_device.h> #include <linux/ctype.h> diff --git a/drivers/soc/fsl/qbman/qman-debugfs.c b/drivers/soc/fsl/qbman/qman-debugfs.c new file mode 100644 index 000..57585e8 --- /dev/null +++ b/drivers/soc/fsl/qbman/qman-debugfs.c @@ -0,0 +1,1313 @@ +/* Copyright 2010 - 2015 Freescale Semiconductor, Inc. 
+ * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * * Neither the name of Freescale Semiconductor nor the + * names of its contributors may be used to endorse or promote products + * derived from this software without specific prior written permission. + * + * ALTERNATIVELY, this software may be distributed under the terms of the + * GNU General Public License (GPL) as published by the Free Software + * Foundation, either version 2 of that License or (at your option) any + * later version. + * + * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor ``AS IS'' AND ANY + * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED + * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE + * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY + * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES + * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; + * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND + * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS + * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ */ +#include qman_priv.h + +#define MAX_FQID (0x00ff) +#define QM_FQD_BLOCK_SIZE 64 +#define QM_FQD_AR(0xC10) + +static u32 fqid_max; +static u64 qman_ccsr_start; +static u64 qman_ccsr_size; + +static const char * const state_txt[] = { + Out of Service, + Retired, + Tentatively Scheduled, + Truly Scheduled, + Parked, + Active, Active Held or Held Suspended, + Unknown State 6, + Unknown State 7, + NULL, +}; + +static const u8 fqd_states[] = { + QM_MCR_NP_STATE_OOS, QM_MCR_NP_STATE_RETIRED, QM_MCR_NP_STATE_TEN_SCHED, + QM_MCR_NP_STATE_TRU_SCHED, QM_MCR_NP_STATE_PARKED, + QM_MCR_NP_STATE_ACTIVE}; + +struct mask_to_text { + u16 mask; + const char *txt; +}; + +struct mask_filter_s { + u16 mask; + u8 filter; +}; + +static const struct mask_filter_s mask_filter[] = { + {QM_FQCTRL_PREFERINCACHE, 0}, + {QM_FQCTRL_PREFERINCACHE, 1}, + {QM_FQCTRL_HOLDACTIVE, 0}, + {QM_FQCTRL_HOLDACTIVE, 1}, + {QM_FQCTRL_AVOIDBLOCK, 0}, + {QM_FQCTRL_AVOIDBLOCK, 1}, + {QM_FQCTRL_FORCESFDR, 0}, + {QM_FQCTRL_FORCESFDR, 1}, + {QM_FQCTRL_CPCSTASH, 0}, + {QM_FQCTRL_CPCSTASH, 1}, +
Re: [PATCH 31/31] dma-mapping-common: skip kmemleak checks for page-less SG entries
Christoph, On 12 August 2015 at 08:05, Christoph Hellwig h...@lst.de wrote: Signed-off-by: Christoph Hellwig h...@lst.de --- include/asm-generic/dma-mapping-common.h | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/include/asm-generic/dma-mapping-common.h b/include/asm-generic/dma-mapping-common.h index 940d5ec..afc3eaf 100644 --- a/include/asm-generic/dma-mapping-common.h +++ b/include/asm-generic/dma-mapping-common.h @@ -51,8 +51,10 @@ static inline int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg, int i, ents; struct scatterlist *s; - for_each_sg(sg, s, nents, i) - kmemcheck_mark_initialized(sg_virt(s), s->length); + for_each_sg(sg, s, nents, i) { + if (sg_has_page(s)) + kmemcheck_mark_initialized(sg_virt(s), s->length); + } Just a nitpick for the subject, it should say kmemcheck rather than kmemleak (different features ;)). -- Catalin
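The guard this hunk adds - skip scatterlist entries that have no page backing - can be modelled with a toy structure (illustrative only: `toy_sg`, its fields, and the byte-counting stand-in are invented here and are not the kernel's `struct scatterlist` or the kmemcheck API):

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a scatterlist entry: page-backed entries carry a virtual
 * address; page-less (physical-address-only) entries do not. */
struct toy_sg {
    void *virt;              /* NULL models a page-less entry */
    unsigned int length;
};

/* Mirrors the patch: only touch entries that actually have page backing,
 * since sg_virt() is meaningless for page-less entries. Returns the number
 * of bytes that would have been marked initialized. */
static unsigned int mark_initialized_bytes(struct toy_sg *sg, int nents)
{
    unsigned int marked = 0;
    for (int i = 0; i < nents; i++) {
        if (sg[i].virt)                 /* stand-in for sg_has_page() */
            marked += sg[i].length;     /* stand-in for kmemcheck_mark_initialized() */
    }
    return marked;
}
```

The page-less entry contributes nothing, instead of dereferencing an invalid virtual address.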
Re: [PATCH v6 05/42] powerpc/powernv: Track IO/M32/M64 segments from PE
On Wed, Aug 12, 2015 at 10:57:33PM +1000, Alexey Kardashevskiy wrote: On 08/12/2015 09:20 PM, Gavin Shan wrote: On Wed, Aug 12, 2015 at 09:05:09PM +1000, Alexey Kardashevskiy wrote: On 08/12/2015 08:45 PM, Gavin Shan wrote: On Tue, Aug 11, 2015 at 12:23:42PM +1000, Alexey Kardashevskiy wrote: On 08/11/2015 10:03 AM, Gavin Shan wrote: On Mon, Aug 10, 2015 at 05:16:40PM +1000, Alexey Kardashevskiy wrote: On 08/06/2015 02:11 PM, Gavin Shan wrote: The patch is adding 6 bitmaps, three to PE and three to PHB, to track The patch is also removing 2 arrays (io_segmap and m32_segmap), what is that all about? Also, there was no m64_segmap, now there is, needs an explanation may be. Originally, the bitmaps (io_segmap and m32_segmap) are allocated dynamically. Now, they have fixed sizes - 512 bits. The subject powerpc/powernv: Track IO/M32/M64 segments from PE indicates why m64_segmap is added. But before this patch, you somehow managed to keep it working without a map for M64, by the same time you needed map for IO and M32. It seems you are making things consistent in this patch but it also feels like you do not have to do so as M64 did not need a map before and I cannot see why it needs one now. The M64 map is used by [PATCH v6 23/42] powerpc/powernv: Release PEs dynamically where the M64 segments consumed by one particular PE will be released. Then add it where it is really started being used. It is really hard to review a patch which is actually spread between patches. Do not count that reviewers will just trust you. Ok. I'll try. the consumed by one particular PE, which can be released once the PE is destroyed during PCI unplugging time. Also, we're using fixed quantity of bits to trace the used IO, M32 and M64 segments by PEs in one particular PHB. Out of curiosity - have you considered having just 3 arrays, in PHB, storing PE numbers, and ditching PE's arrays? Does PE itself need to know what PEs it is using? Not sure about this master/slave PEs though. 
I don't follow your suggestion. Can you rephrase and explain it a bit more? Please explain in what situations you need the same map in both PHB and PE and how you are going to use them. For example, pe::m64_segmap and phb::m64_segmap. I believe you need to know what segment is used by what PE and that's it; having 2 bitmaps is overcomplicated and hard to follow. Is there anything else I am missing? The situation is the same for all (IO, M32 and M64) segment maps. Taking m64_segmap as an example, it will be used when creating or destroying the PE that consumes M64 segments. phb::m64_segmap records the M64 segment usage in the PHB's domain. It's used to check that the same M64 segment won't be used twice. pe::m64_segmap tracks the M64 segments consumed by the PE. You could have a single map in the PHB, where the key would be a segment number and the value would be a PE number. No need to have a map in the PE. At all. No need to initialize bitmaps, etc. So it would be arrays for the various segment maps if I understood your suggestion, as below. Please confirm: #define PNV_IODA_MAX_SEG_NUM 512 int struct pnv_phb::io_segmap[PNV_IODA_MAX_SEG_NUM]; m32_segmap[PNV_IODA_MAX_SEG_NUM]; m64_segmap[PNV_IODA_MAX_SEG_NUM]; - Initially, they are initialized to IODA_INVALID_PE; - When one segment is assigned to one PE, the corresponding entry of the array is set to the PE number. - When one segment is released, the corresponding entry of the array is set to IODA_INVALID_PE; No, not arrays, I meant DEFINE_HASHTABLE(), hash_add(), etc. from include/linux/hashtable.h. http://kernelnewbies.org/FAQ/Hashtables is a good place to start :) Are you sure it needs a hashtable to represent such a simple data struct? I really don't understand the benefits; could you provide more details? With a hashtable, every bucket will include multiple items with conflicting hash keys, each of which would be represented by the data struct below. The data struct uses 24 bytes of memory and is not efficient enough from this aspect. 
When one more segment is consumed, an instance of struct pnv_ioda_segment is allocated and put onto the conflicting list of the target bucket. At a later point, the instance is removed from the list and released when the segment is detached from the PE. It's more complex than it should be. struct pnv_ioda_segment { int pe_number; int seg_number; struct hlist_node node; }; Thanks, Gavin
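For comparison, the single-array alternative sketched earlier in this thread fits in a few lines. A userspace sketch (the 512-segment size and the IODA_INVALID_PE convention come from the discussion; the helper names are invented, and only one of the three window types is shown):

```c
#include <assert.h>

#define PNV_IODA_MAX_SEG_NUM 512
#define IODA_INVALID_PE      (-1)

/* One map per window type (IO/M32/M64); only M64 shown here.
 * Each entry holds the owning PE number, or IODA_INVALID_PE if free. */
static int m64_segmap[PNV_IODA_MAX_SEG_NUM];

static void segmap_init(void)
{
    for (int i = 0; i < PNV_IODA_MAX_SEG_NUM; i++)
        m64_segmap[i] = IODA_INVALID_PE;
}

/* Assign a segment to a PE; fails if the segment is already owned,
 * which is the "same segment used twice" check from the thread. */
static int segmap_assign(int seg, int pe)
{
    if (m64_segmap[seg] != IODA_INVALID_PE)
        return -1;
    m64_segmap[seg] = pe;
    return 0;
}

/* Release every segment owned by 'pe' (PE destruction at unplug time). */
static void segmap_release_pe(int pe)
{
    for (int i = 0; i < PNV_IODA_MAX_SEG_NUM; i++)
        if (m64_segmap[i] == pe)
            m64_segmap[i] = IODA_INVALID_PE;
}
```

Assignment doubles as the used-twice check, and releasing a PE is a linear sweep over a fixed 2KB array - no per-node allocations or bucket lists to maintain.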
[v2 06/11] soc/qman: Add self-tester for QMan driver
From: Geoff Thorpe geoff.tho...@freescale.com Add a self test for the DPAA 1.0 Queue Manager driver. The tests ensure that the driver can properly enqueue and dequeue from frame queues using the QMan portal infrastructure. Signed-off-by: Geoff Thorpe geoff.tho...@freescale.com Signed-off-by: Emil Medve emilian.me...@freescale.com Signed-off-by: Roy Pledge roy.ple...@freescale.com --- drivers/soc/fsl/qbman/Makefile |4 + drivers/soc/fsl/qbman/qman_test.c | 57 drivers/soc/fsl/qbman/qman_test.h | 44 +++ drivers/soc/fsl/qbman/qman_test_api.c | 216 + drivers/soc/fsl/qbman/qman_test_stash.c | 502 +++ 5 files changed, 823 insertions(+) create mode 100644 drivers/soc/fsl/qbman/qman_test.c create mode 100644 drivers/soc/fsl/qbman/qman_test.h create mode 100644 drivers/soc/fsl/qbman/qman_test_api.c create mode 100644 drivers/soc/fsl/qbman/qman_test_stash.c diff --git a/drivers/soc/fsl/qbman/Makefile b/drivers/soc/fsl/qbman/Makefile index 04509c3..82f5482 100644 --- a/drivers/soc/fsl/qbman/Makefile +++ b/drivers/soc/fsl/qbman/Makefile @@ -12,3 +12,7 @@ bman-test-$(CONFIG_FSL_BMAN_TEST_THRESH) += bman_test_thresh.o obj-$(CONFIG_FSL_QMAN) += qman_api.o qman_utils.o qman_driver.o obj-$(CONFIG_FSL_QMAN_CONFIG) += qman.o qman_portal.o +obj-$(CONFIG_FSL_QMAN_TEST)+= qman-test.o +qman-test-y = qman_test.o +qman-test-$(CONFIG_FSL_QMAN_TEST_API) += qman_test_api.o +qman-test-$(CONFIG_FSL_QMAN_TEST_STASH)+= qman_test_stash.o diff --git a/drivers/soc/fsl/qbman/qman_test.c b/drivers/soc/fsl/qbman/qman_test.c new file mode 100644 index 000..9ec49cb --- /dev/null +++ b/drivers/soc/fsl/qbman/qman_test.c @@ -0,0 +1,57 @@ +/* Copyright 2008 - 2015 Freescale Semiconductor, Inc. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. 
+ * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * * Neither the name of Freescale Semiconductor nor the + * names of its contributors may be used to endorse or promote products + * derived from this software without specific prior written permission. + * + * ALTERNATIVELY, this software may be distributed under the terms of the + * GNU General Public License (GPL) as published by the Free Software + * Foundation, either version 2 of that License or (at your option) any + * later version. + * + * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor ``AS IS'' AND ANY + * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED + * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE + * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY + * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES + * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; + * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND + * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS + * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ */ + +#include qman_test.h + +MODULE_AUTHOR(Geoff Thorpe); +MODULE_LICENSE(Dual BSD/GPL); +MODULE_DESCRIPTION(QMan testing); + +static int test_init(void) +{ + int loop = 1; + + while (loop--) { +#ifdef CONFIG_FSL_QMAN_TEST_STASH + qman_test_stash(); +#endif +#ifdef CONFIG_FSL_QMAN_TEST_API + qman_test_api(); +#endif + } + return 0; +} + +static void test_exit(void) +{ +} + +module_init(test_init); +module_exit(test_exit); diff --git a/drivers/soc/fsl/qbman/qman_test.h b/drivers/soc/fsl/qbman/qman_test.h new file mode 100644 index 000..0b34a67 --- /dev/null +++ b/drivers/soc/fsl/qbman/qman_test.h @@ -0,0 +1,44 @@ +/* Copyright 2008 - 2015 Freescale Semiconductor, Inc. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * * Neither the name of Freescale Semiconductor nor
Re: [PATCH v6 18/42] powerpc/powernv: Allocate PE# in descending order
On Tue, Aug 11, 2015 at 12:50:33PM +1000, Alexey Kardashevskiy wrote: On 08/11/2015 10:43 AM, Gavin Shan wrote: On Tue, Aug 11, 2015 at 12:39:02AM +1000, Alexey Kardashevskiy wrote: On 08/06/2015 02:11 PM, Gavin Shan wrote:

The available PE#, represented by a bitmap in the PHB, is allocated in ascending order.

Available PE# is available exactly because it is not allocated ;)

Yeah, will correct it.

It conflicts with the fact that M64 segments are assigned in the same order. In order to avoid the conflict, the patch allocates PE# in descending order.

What kind of conflict?

On PHB3, the M64 segment is assigned to one PE whose PE number is determined. M64 segments are allocated in ascending order. That's why I would like to allocate PE# in descending order.

From previous lessons, I thought the M64 segment number is the PE# as well :-/ Seems this is not the case, so what does store this seg#-PE# mapping in the PHB?

Your understanding is somewhat correct. Let me explain more here. Taking PHB3 as an example: it has 16 M64 BARs. The last BAR (15th) is running in shared mode. When one segment from this BAR is assigned to one PE, the PE number is determined and is equal to the segment number. However, it's still possible for one PE to have multiple segments; we have master and slave PEs for the latter case. If any of the remaining BARs (0 to 14) is running in single mode and assigned to one particular PE, the PE number can be configured.
Signed-off-by: Gavin Shan gws...@linux.vnet.ibm.com
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 56b058c..1c950e8 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -161,13 +161,18 @@ static struct pnv_ioda_pe *pnv_ioda_reserve_pe(struct pnv_phb *phb, int pe_no)
 static struct pnv_ioda_pe *pnv_ioda_alloc_pe(struct pnv_phb *phb)
 {
 	unsigned long pe;
+	unsigned long limit = phb->ioda.total_pe_num - 1;

 	do {
 		pe = find_next_zero_bit(phb->ioda.pe_alloc,
-					phb->ioda.total_pe_num, 0);
-		if (pe >= phb->ioda.total_pe_num)
+					phb->ioda.total_pe_num, limit);
+		if (pe < phb->ioda.total_pe_num &&
+		    !test_and_set_bit(pe, phb->ioda.pe_alloc))
+			break;
+
+		if (--limit >= phb->ioda.total_pe_num)
 			return NULL;
-	} while(test_and_set_bit(pe, phb->ioda.pe_alloc));
+	} while (1);

Usually, if it is while(1), then it is while(1){} rather than do{}while(1) :)

Agree, will change it.

 	return pnv_ioda_init_pe(phb, pe);
 }

Thanks,
Gavin

___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [1/8] powerpc/slb: Remove a duplicate extern variable
On Wed, 2015-29-07 at 07:09:58 UTC, Anshuman Khandual wrote: This patch just removes one redundant entry for one extern variable 'slb_compare_rr_to_size' from the scope. This patch does not change any functionality. Signed-off-by: Anshuman Khandual khand...@linux.vnet.ibm.com Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/752b8adec4a776b4fdf0 cheers
[PATCH v4 00/11] CXL EEH Handling
CXL accelerators are unfortunately not immune from failure. This patch set enables them to participate in the Extended Error Handling process.

This series starts with a number of preparatory patches:
- Patch 1 is cleanup: converting macros to static inlines.
- Patch 2 makes sure we don't touch the hardware when it has failed.
- Patches 3-5 make the 'unplug' functions idempotent, so that if we get part way through recovery and then fail, being completely unplugged as part of removal doesn't cause us to oops out.
- Patches 6 and 7 refactor init and teardown paths for the adapter and AFUs, so that they can be configured and deconfigured separately from their allocation and release.
- Patch 8 stops cxl_reset from breaking EEH.

Patches 9 and 10 are the core of the EEH work:
- Firstly we have a kernel flag that allows us to confidently assert the hardware will not change (be reflashed) when it is reset. We need this in order to be able to safely do EEH recovery.
- We then have the EEH support itself.

Finally, we add a CONFIG_CXL_EEH symbol. This allows drivers to depend on the API we provide to enable CXL EEH, or to be easily backportable if EEH is optional.

Changes from v3 are minor:
- Clarification of responsibility of the CXL driver vs the driver bound to the vPHB with regards to preventing inappropriate access of hardware during recovery.
- Clean up unused rc in cxl_alloc_adapter, thanks David Laight.
- Break setting rc and testing rc into different lines, thanks mpe and Cyril.
- If we fail to init an AFU, don't try to select the best mode.

Changes from v2 are mostly minor cleanups, reflecting some review and further testing:
- Use static inlines instead of macros.
- Propagate PCI link state to devices on the vPHB.
- Various cleanup, thanks Cyril Bur.
- Use pci_channel_offline instead of a direct check.
- Don't ifdef, just provide the symbol so that drivers know that the new API is available. Thanks to Cyril for patiently explaining this to me about 3 times before I understood.
Changes from v1:
- More comprehensive link down checks, including vPHB.
- Rebased to apply cleanly to 4.2-rc4.
- cxl reset changes.
- CONFIG_CXL_EEH symbol addition.
- add better vPHB support to EEH.

Daniel Axtens (11):
  cxl: Convert MMIO read/write macros to inline functions
  cxl: Drop commands if the PCI channel is not in normal state
  cxl: Allocate and release the SPA with the AFU
  cxl: Make IRQ release idempotent
  cxl: Clean up adapter MMIO unmap path.
  cxl: Refactor adaptor init/teardown
  cxl: Refactor AFU init/teardown
  cxl: Don't remove AFUs/vPHBs in cxl_reset
  cxl: Allow the kernel to trust that an image won't change on PERST.
  cxl: EEH support
  cxl: Add CONFIG_CXL_EEH symbol

 Documentation/ABI/testing/sysfs-class-cxl |  10 +
 drivers/misc/cxl/Kconfig                  |   6 +
 drivers/misc/cxl/api.c                    |   7 +
 drivers/misc/cxl/context.c                |   6 +-
 drivers/misc/cxl/cxl.h                    |  84 -
 drivers/misc/cxl/file.c                   |  19 +
 drivers/misc/cxl/irq.c                    |   9 +
 drivers/misc/cxl/native.c                 | 104 +-
 drivers/misc/cxl/pci.c                    | 591 +++---
 drivers/misc/cxl/sysfs.c                  |  26 ++
 drivers/misc/cxl/vphb.c                   |  34 ++
 include/misc/cxl.h                        |  10 +
 12 files changed, 752 insertions(+), 154 deletions(-)

-- 
2.1.4
Re: [1/4] cxl: Compile with -Werror
On Fri, 2015-07-08 at 03:18:17 UTC, Daniel Axtens wrote: It's a good idea, and it brings us in line with the rest of arch/powerpc. Signed-off-by: Daniel Axtens d...@axtens.net Acked-by: Michael Neuling mi...@neuling.org Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/d3d73f4b38a8ece19846 cheers
Re: [6/8] powerpc/prom: Simplify the logic while fetching SLB size
On Wed, 2015-29-07 at 07:10:03 UTC, Anshuman Khandual wrote: This patch just simplifies the existing code logic while fetching the SLB size property from the device tree. This also changes the function name from check_cpu_slb_size to init_mmu_slb_size as it just initializes the mmu_slb_size value. Signed-off-by: Anshuman Khandual khand...@linux.vnet.ibm.com Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/9c61f7a0ad6fdff85b0c cheers
[PATCH v4 04/11] cxl: Make IRQ release idempotent
Check if an IRQ is mapped before releasing it. This will simplify future EEH code by allowing unconditional unmapping of IRQs. Acked-by: Cyril Bur cyril...@gmail.com Signed-off-by: Daniel Axtens d...@axtens.net --- drivers/misc/cxl/irq.c | 9 + 1 file changed, 9 insertions(+) diff --git a/drivers/misc/cxl/irq.c b/drivers/misc/cxl/irq.c index 77e5d0e7ebe1..9a1e5732c1af 100644 --- a/drivers/misc/cxl/irq.c +++ b/drivers/misc/cxl/irq.c @@ -341,6 +341,9 @@ int cxl_register_psl_err_irq(struct cxl *adapter) void cxl_release_psl_err_irq(struct cxl *adapter) { + if (adapter-err_virq != irq_find_mapping(NULL, adapter-err_hwirq)) + return; + cxl_p1_write(adapter, CXL_PSL_ErrIVTE, 0x); cxl_unmap_irq(adapter-err_virq, adapter); cxl_release_one_irq(adapter, adapter-err_hwirq); @@ -374,6 +377,9 @@ int cxl_register_serr_irq(struct cxl_afu *afu) void cxl_release_serr_irq(struct cxl_afu *afu) { + if (afu-serr_virq != irq_find_mapping(NULL, afu-serr_hwirq)) + return; + cxl_p1n_write(afu, CXL_PSL_SERR_An, 0x); cxl_unmap_irq(afu-serr_virq, afu); cxl_release_one_irq(afu-adapter, afu-serr_hwirq); @@ -400,6 +406,9 @@ int cxl_register_psl_irq(struct cxl_afu *afu) void cxl_release_psl_irq(struct cxl_afu *afu) { + if (afu-psl_virq != irq_find_mapping(NULL, afu-psl_hwirq)) + return; + cxl_unmap_irq(afu-psl_virq, afu); cxl_release_one_irq(afu-adapter, afu-psl_hwirq); kfree(afu-psl_irq_name); -- 2.1.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v4 07/11] cxl: Refactor AFU init/teardown
As with an adapter, some aspects of initialisation are done only once in the lifetime of an AFU: for example, allocating memory, or setting up sysfs/debugfs files. However, we may want to be able to do some parts of the initialisation multiple times: for example, in error recovery we want to be able to tear down and then re-map IO memory and IRQs. Therefore, refactor AFU init/teardown as follows. - Create two new functions: 'cxl_configure_afu', and its pair 'cxl_deconfigure_afu'. As with the adapter functions, these (de)configure resources that do not need to last the entire lifetime of the AFU. - Allocating and releasing memory remain the task of 'cxl_alloc_afu' and 'cxl_release_afu'. - Once-only functions that do not involve allocating/releasing memory stay in the overarching 'cxl_init_afu'/'cxl_remove_afu' pair. However, the task of picking an AFU mode and activating it has been broken out. Signed-off-by: Daniel Axtens d...@axtens.net --- drivers/misc/cxl/pci.c | 124 ++--- 1 file changed, 77 insertions(+), 47 deletions(-) diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c index f3c5998f2f37..8e7b0f3ad254 100644 --- a/drivers/misc/cxl/pci.c +++ b/drivers/misc/cxl/pci.c @@ -752,45 +752,77 @@ ssize_t cxl_afu_read_err_buffer(struct cxl_afu *afu, char *buf, return count; } -static int cxl_init_afu(struct cxl *adapter, int slice, struct pci_dev *dev) +static int cxl_configure_afu(struct cxl_afu *afu, struct cxl *adapter, struct pci_dev *dev) { - struct cxl_afu *afu; - bool free = true; int rc; - if (!(afu = cxl_alloc_afu(adapter, slice))) - return -ENOMEM; - - if ((rc = dev_set_name(afu-dev, afu%i.%i, adapter-adapter_num, slice))) - goto err1; + rc = cxl_map_slice_regs(afu, adapter, dev); + if (rc) + return rc; - if ((rc = cxl_map_slice_regs(afu, adapter, dev))) + rc = sanitise_afu_regs(afu); + if (rc) goto err1; - if ((rc = sanitise_afu_regs(afu))) - goto err2; - /* We need to reset the AFU before we can read the AFU descriptor */ - if ((rc = 
__cxl_afu_reset(afu))) - goto err2; + rc = __cxl_afu_reset(afu); + if (rc) + goto err1; if (cxl_verbose) dump_afu_descriptor(afu); - if ((rc = cxl_read_afu_descriptor(afu))) - goto err2; + rc = cxl_read_afu_descriptor(afu); + if (rc) + goto err1; - if ((rc = cxl_afu_descriptor_looks_ok(afu))) - goto err2; + rc = cxl_afu_descriptor_looks_ok(afu); + if (rc) + goto err1; - if ((rc = init_implementation_afu_regs(afu))) - goto err2; + rc = init_implementation_afu_regs(afu); + if (rc) + goto err1; + + rc = cxl_register_serr_irq(afu); + if (rc) + goto err1; - if ((rc = cxl_register_serr_irq(afu))) + rc = cxl_register_psl_irq(afu); + if (rc) goto err2; - if ((rc = cxl_register_psl_irq(afu))) - goto err3; + return 0; + +err2: + cxl_release_serr_irq(afu); +err1: + cxl_unmap_slice_regs(afu); + return rc; +} + +static void cxl_deconfigure_afu(struct cxl_afu *afu) +{ + cxl_release_psl_irq(afu); + cxl_release_serr_irq(afu); + cxl_unmap_slice_regs(afu); +} + +static int cxl_init_afu(struct cxl *adapter, int slice, struct pci_dev *dev) +{ + struct cxl_afu *afu; + int rc; + + if (!(afu = cxl_alloc_afu(adapter, slice))) + return -ENOMEM; + + rc = dev_set_name(afu-dev, afu%i.%i, adapter-adapter_num, slice); + if (rc) + goto err_free; + + rc = cxl_configure_afu(afu, adapter, dev); + if (rc) + goto err_free; /* Don't care if this fails */ cxl_debugfs_afu_add(afu); @@ -799,38 +831,32 @@ static int cxl_init_afu(struct cxl *adapter, int slice, struct pci_dev *dev) * After we call this function we must not free the afu directly, even * if it returns an error! 
*/ - if ((rc = cxl_register_afu(afu))) + rc = cxl_register_afu(afu); + if (rc) goto err_put1; - if ((rc = cxl_sysfs_afu_add(afu))) + rc = cxl_sysfs_afu_add(afu); + if (rc) goto err_put1; - - if ((rc = cxl_afu_select_best_mode(afu))) - goto err_put2; - adapter-afu[afu-slice] = afu; - if ((rc = cxl_pci_vphb_add(afu))) + rc = cxl_pci_vphb_add(afu); + if (rc) dev_info(afu-dev, Can't register vPHB\n); return 0; -err_put2: - cxl_sysfs_afu_remove(afu); err_put1: - device_unregister(afu-dev); - free = false; + cxl_deconfigure_afu(afu);
[PATCH v4 05/11] cxl: Clean up adapter MMIO unmap path.
- MMIO pointer unmapping is guarded by a null pointer check. However, iounmap doesn't null the pointer, just invalidate it. Therefore, explicitly null the pointer after unmapping. - afu_desc_mmio also needs to be unmapped. - PCI regions are allocated in cxl_map_adapter_regs. Therefore they should be released in unmap, not elsewhere. Acked-by: Cyril Bur cyril...@gmail.com Signed-off-by: Daniel Axtens d...@axtens.net --- drivers/misc/cxl/pci.c | 24 ++-- 1 file changed, 18 insertions(+), 6 deletions(-) diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c index 62a762d94de3..484d35a5aead 100644 --- a/drivers/misc/cxl/pci.c +++ b/drivers/misc/cxl/pci.c @@ -539,10 +539,18 @@ err: static void cxl_unmap_slice_regs(struct cxl_afu *afu) { - if (afu-p2n_mmio) + if (afu-p2n_mmio) { iounmap(afu-p2n_mmio); - if (afu-p1n_mmio) + afu-p2n_mmio = NULL; + } + if (afu-p1n_mmio) { iounmap(afu-p1n_mmio); + afu-p1n_mmio = NULL; + } + if (afu-afu_desc_mmio) { + iounmap(afu-afu_desc_mmio); + afu-afu_desc_mmio = NULL; + } } static void cxl_release_afu(struct device *dev) @@ -919,10 +927,16 @@ err1: static void cxl_unmap_adapter_regs(struct cxl *adapter) { - if (adapter-p1_mmio) + if (adapter-p1_mmio) { iounmap(adapter-p1_mmio); - if (adapter-p2_mmio) + adapter-p1_mmio = NULL; + pci_release_region(to_pci_dev(adapter-dev.parent), 2); + } + if (adapter-p2_mmio) { iounmap(adapter-p2_mmio); + adapter-p2_mmio = NULL; + pci_release_region(to_pci_dev(adapter-dev.parent), 0); + } } static int cxl_read_vsec(struct cxl *adapter, struct pci_dev *dev) @@ -1131,8 +1145,6 @@ static void cxl_remove_adapter(struct cxl *adapter) device_unregister(adapter-dev); - pci_release_region(pdev, 0); - pci_release_region(pdev, 2); pci_disable_device(pdev); } -- 2.1.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v4 11/11] cxl: Add CONFIG_CXL_EEH symbol
CONFIG_CXL_EEH is for CXL's EEH related code. Other drivers can depend on or #ifdef on this symbol to configure PERST behaviour, allowing CXL to participate in the EEH process.

Reviewed-by: Cyril Bur cyril...@gmail.com
Signed-off-by: Daniel Axtens d...@axtens.net
---
 drivers/misc/cxl/Kconfig | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/misc/cxl/Kconfig b/drivers/misc/cxl/Kconfig
index b6db9ebd52c2..c151fc1fe14c 100644
--- a/drivers/misc/cxl/Kconfig
+++ b/drivers/misc/cxl/Kconfig
@@ -11,11 +11,17 @@ config CXL_KERNEL_API
 	bool
 	default n

+config CXL_EEH
+	bool
+	default n
+	select EEH
+
 config CXL
 	tristate "Support for IBM Coherent Accelerators (CXL)"
 	depends on PPC_POWERNV && PCI_MSI
 	select CXL_BASE
 	select CXL_KERNEL_API
+	select CXL_EEH
 	default m
 	help
 	  Select this option to enable driver support for IBM Coherent
-- 
2.1.4
Re: [PATCH v6 23/42] powerpc/powernv: Release PEs dynamically
On Tue, Aug 11, 2015 at 11:03:40PM +1000, Alexey Kardashevskiy wrote: On 08/06/2015 02:11 PM, Gavin Shan wrote: This adds the refcount to PE, which represents number of PCI devices contained in the PE. When last device leaves from the PE, the PE together with its consumed resources (IO, DMA, PELTM, PELTV) are released, to support PCI hotplug. Signed-off-by: Gavin Shan gws...@linux.vnet.ibm.com --- arch/powerpc/platforms/powernv/pci-ioda.c | 233 +++--- arch/powerpc/platforms/powernv/pci.h | 3 + 2 files changed, 217 insertions(+), 19 deletions(-) diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index d2697a3..13d8a5b 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -132,6 +132,53 @@ static inline bool pnv_pci_is_mem_pref_64(unsigned long flags) (IORESOURCE_MEM_64 | IORESOURCE_PREFETCH)); } +static void pnv_pci_ioda_release_pe_dma(struct pnv_ioda_pe *pe) Is this ioda1 helper or common helper for both ioda1 and ioda2? It's for IODA1 only. +{ + struct pnv_phb *phb = pe-phb; + struct iommu_table *tbl; + int seg; + int64_t rc; + + /* No DMA32 segments allocated */ + if (pe-dma32_seg == PNV_INVALID_SEGMENT || + pe-dma32_segcount = 0) { dma32_segcount is unsigned long, cannot be less than 0. It's int dma32_segcount in pci.h: + pe-dma32_seg = PNV_INVALID_SEGMENT; + pe-dma32_segcount = 0; + return; + } + + /* Unlink IOMMU table from group */ + tbl = pe-table_group.tables[0]; + pnv_pci_unlink_table_and_group(tbl, pe-table_group); + if (pe-table_group.group) { + iommu_group_put(pe-table_group.group); + BUG_ON(pe-table_group.group); + } + + /* Release IOMMU table */ + free_pages(tbl-it_base, + get_order(TCE32_TABLE_SIZE * pe-dma32_segcount)); + iommu_free_table(tbl, + of_node_full_name(pci_bus_to_OF_node(pe-pbus))); There is pnv_pci_ioda2_table_free_pages(), use it. The function (pnv_pci_ioda_release_pe_dma()) is for IODA1 only. 
+ + /* Disable TVE */ + for (seg = pe-dma32_seg; + seg pe-dma32_seg + pe-dma32_segcount; + seg++) { + rc = opal_pci_map_pe_dma_window(phb-opal_id, + pe-pe_number, seg, 0, 0ul, 0ul, 0ul); + if (rc) + pe_warn(pe, Error %ld unmapping DMA32 seg#%d\n, + rc, seg); + } May be implement iommu_table_group_ops::unset_window for IODA1 too? Good point, but it's something out of scope. I'm putting it into my TODO list and cook up the patch when having chance. + + /* Free the DMA32 segments */ + bitmap_clear(phb-ioda.dma32_segmap, + pe-dma32_seg, pe-dma32_segcount); + pe-dma32_seg = PNV_INVALID_SEGMENT; + pe-dma32_segcount = 0; +} + static inline void pnv_pci_ioda2_tce_invalidate_entire(struct pnv_ioda_pe *pe) { /* 01xb - invalidate TCEs that match the specified PE# */ @@ -199,13 +246,15 @@ static void pnv_pci_ioda2_set_bypass(struct pnv_ioda_pe *pe, bool enable) pe-tce_bypass_enabled = enable; } -#ifdef CONFIG_PCI_IOV -static void pnv_pci_ioda2_release_dma_pe(struct pci_dev *dev, - struct pnv_ioda_pe *pe) +static void pnv_pci_ioda2_release_pe_dma(struct pnv_ioda_pe *pe) { struct iommu_table*tbl; + struct device_node*dn; int64_t rc; + if (pe-dma32_seg == PNV_INVALID_SEGMENT) + return; + tbl = pe-table_group.tables[0]; rc = pnv_pci_ioda2_unset_window(pe-table_group, 0); if (rc) @@ -216,10 +265,91 @@ static void pnv_pci_ioda2_release_dma_pe(struct pci_dev *dev, iommu_group_put(pe-table_group.group); BUG_ON(pe-table_group.group); } + + if (pe-flags (PNV_IODA_PE_BUS | PNV_IODA_PE_BUS_ALL)) + dn = pci_bus_to_OF_node(pe-pbus); + else if (pe-flags PNV_IODA_PE_DEV) + dn = pci_device_to_OF_node(pe-pdev); +#ifdef CONFIG_PCI_IOV + else if (pe-flags PNV_IODA_PE_VF) + dn = pci_device_to_OF_node(pe-parent_dev); +#endif + else + dn = NULL; + pnv_pci_ioda2_table_free_pages(tbl); - iommu_free_table(tbl, of_node_full_name(dev-dev.of_node)); + iommu_free_table(tbl, of_node_full_name(dn)); + pe-dma32_seg = PNV_INVALID_SEGMENT; +} I'd drop the chunk about calculating @dn above, nobody really cares 
what iommu_free_table() prints. If you really need to print something, print PE#. It makes sense. I'll drop the chunk of garbage and replace it with the PE number. + +static void pnv_ioda_release_pe_dma(struct pnv_ioda_pe *pe) +{ + struct pnv_phb *phb = pe-phb; + + switch (phb-type) { + case PNV_PHB_IODA1: + pnv_pci_ioda_release_pe_dma(pe); + break; +
Re: [2/4] cxl: sparse: Make declarations static
On Fri, 2015-07-08 at 03:18:18 UTC, Daniel Axtens wrote:

A few declarations were identified by sparse as needing to be static:

 /scratch/dja/linux-capi/drivers/misc/cxl/irq.c:408:6: warning: symbol 'afu_irq_name_free' was not declared. Should it be static?
 /scratch/dja/linux-capi/drivers/misc/cxl/irq.c:467:6: warning: symbol 'afu_register_hwirqs' was not declared. Should it be static?
 /scratch/dja/linux-capi/drivers/misc/cxl/file.c:254:6: warning: symbol 'afu_compat_ioctl' was not declared. Should it be static?
 /scratch/dja/linux-capi/drivers/misc/cxl/file.c:399:30: warning: symbol 'afu_master_fops' was not declared. Should it be static?

Make them static.

Signed-off-by: Daniel Axtens d...@axtens.net
Acked-by: Michael Neuling mi...@neuling.org

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/3d6b040e73387570521d

cheers
Re: [5/8] powerpc/slb: Add documentation to runtime patching of SLB encoding
On Wed, 2015-29-07 at 07:10:02 UTC, Anshuman Khandual wrote: This patch adds some documentation to 'patch_slb_encoding' function explaining about how it clears the existing immediate value in the given instruction and inserts a new one there. Signed-off-by: Anshuman Khandual khand...@linux.vnet.ibm.com Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/79d0be7407955a268bce cheers
[PATCH v4 10/11] cxl: EEH support
EEH (Enhanced Error Handling) allows a driver to recover from the temporary failure of an attached PCI card. Enable basic CXL support for EEH. Signed-off-by: Daniel Axtens d...@axtens.net --- drivers/misc/cxl/cxl.h | 1 + drivers/misc/cxl/pci.c | 253 drivers/misc/cxl/vphb.c | 8 ++ 3 files changed, 262 insertions(+) diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h index cda02412b01e..6f5386653dae 100644 --- a/drivers/misc/cxl/cxl.h +++ b/drivers/misc/cxl/cxl.h @@ -726,6 +726,7 @@ int cxl_psl_purge(struct cxl_afu *afu); void cxl_stop_trace(struct cxl *cxl); int cxl_pci_vphb_add(struct cxl_afu *afu); +void cxl_pci_vphb_reconfigure(struct cxl_afu *afu); void cxl_pci_vphb_remove(struct cxl_afu *afu); extern struct pci_driver cxl_pci_driver; diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c index 965524a6ae7c..5c2dc82da92f 100644 --- a/drivers/misc/cxl/pci.c +++ b/drivers/misc/cxl/pci.c @@ -24,6 +24,7 @@ #include asm/io.h #include cxl.h +#include misc/cxl.h #define CXL_PCI_VSEC_ID0x1280 @@ -1275,10 +1276,262 @@ static void cxl_remove(struct pci_dev *dev) cxl_remove_adapter(adapter); } +static pci_ers_result_t cxl_vphb_error_detected(struct cxl_afu *afu, + pci_channel_state_t state) +{ + struct pci_dev *afu_dev; + pci_ers_result_t result = PCI_ERS_RESULT_NEED_RESET; + pci_ers_result_t afu_result = PCI_ERS_RESULT_NEED_RESET; + + /* There should only be one entry, but go through the list +* anyway +*/ + list_for_each_entry(afu_dev, afu-phb-bus-devices, bus_list) { + if (!afu_dev-driver) + continue; + + afu_dev-error_state = state; + + if (afu_dev-driver-err_handler) + afu_result = afu_dev-driver-err_handler-error_detected(afu_dev, + state); + /* Disconnect trumps all, NONE trumps NEED_RESET */ + if (afu_result == PCI_ERS_RESULT_DISCONNECT) + result = PCI_ERS_RESULT_DISCONNECT; + else if ((afu_result == PCI_ERS_RESULT_NONE) +(result == PCI_ERS_RESULT_NEED_RESET)) + result = PCI_ERS_RESULT_NONE; + } + return result; +} + +static pci_ers_result_t 
cxl_pci_error_detected(struct pci_dev *pdev, + pci_channel_state_t state) +{ + struct cxl *adapter = pci_get_drvdata(pdev); + struct cxl_afu *afu; + pci_ers_result_t result = PCI_ERS_RESULT_NEED_RESET; + int i; + + /* At this point, we could still have an interrupt pending. +* Let's try to get them out of the way before they do +* anything we don't like. +*/ + schedule(); + + /* If we're permanently dead, give up. */ + if (state == pci_channel_io_perm_failure) { + /* Tell the AFU drivers; but we don't care what they +* say, we're going away. +*/ + for (i = 0; i adapter-slices; i++) { + afu = adapter-afu[i]; + cxl_vphb_error_detected(afu, state); + } + return PCI_ERS_RESULT_DISCONNECT; + } + + /* Are we reflashing? +* +* If we reflash, we could come back as something entirely +* different, including a non-CAPI card. As such, by default +* we don't participate in the process. We'll be unbound and +* the slot re-probed. (TODO: check EEH doesn't blindly rebind +* us!) +* +* However, this isn't the entire story: for reliablity +* reasons, we usually want to reflash the FPGA on PERST in +* order to get back to a more reliable known-good state. +* +* This causes us a bit of a problem: if we reflash we can't +* trust that we'll come back the same - we could have a new +* image and been PERSTed in order to load that +* image. However, most of the time we actually *will* come +* back the same - for example a regular EEH event. +* +* Therefore, we allow the user to assert that the image is +* indeed the same and that we should continue on into EEH +* anyway. +*/ + if (adapter-perst_loads_image !adapter-perst_same_image) { + /* TODO take the PHB out of CXL mode */ + dev_info(pdev-dev, reflashing, so opting out of EEH!\n); + return PCI_ERS_RESULT_NONE; + } + + /* +* At this point, we want to try to recover. We'll always +* need a complete slot reset: we don't trust any other reset. +* +* Now, we go through each AFU: +* - We send the driver, if bound, an error_detected
[PATCH v4 06/11] cxl: Refactor adaptor init/teardown
Some aspects of initialisation are done only once in the lifetime of an adapter: for example, allocating memory for the adapter, allocating the adapter number, or setting up sysfs/debugfs files. However, we may want to be able to do some parts of the initialisation multiple times: for example, in error recovery we want to be able to tear down and then re-map IO memory and IRQs. Therefore, refactor CXL init/teardown as follows. - Keep the overarching functions 'cxl_init_adapter' and its pair, 'cxl_remove_adapter'. - Move all 'once only' allocation/freeing steps to the existing 'cxl_alloc_adapter' function, and its pair 'cxl_release_adapter' (This involves moving allocation of the adapter number out of cxl_init_adapter.) - Create two new functions: 'cxl_configure_adapter', and its pair 'cxl_deconfigure_adapter'. These two functions 'wire up' the hardware --- they (de)configure resources that do not need to last the entire lifetime of the adapter Signed-off-by: Daniel Axtens d...@axtens.net --- drivers/misc/cxl/pci.c | 176 +++-- 1 file changed, 111 insertions(+), 65 deletions(-) diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c index 484d35a5aead..f3c5998f2f37 100644 --- a/drivers/misc/cxl/pci.c +++ b/drivers/misc/cxl/pci.c @@ -965,7 +965,6 @@ static int cxl_read_vsec(struct cxl *adapter, struct pci_dev *dev) CXL_READ_VSEC_BASE_IMAGE(dev, vsec, adapter-base_image); CXL_READ_VSEC_IMAGE_STATE(dev, vsec, image_state); adapter-user_image_loaded = !!(image_state CXL_VSEC_USER_IMAGE_LOADED); - adapter-perst_loads_image = true; adapter-perst_select_user = !!(image_state CXL_VSEC_USER_IMAGE_LOADED); CXL_READ_VSEC_NAFUS(dev, vsec, adapter-slices); @@ -1025,22 +1024,33 @@ static void cxl_release_adapter(struct device *dev) pr_devel(cxl_release_adapter\n); + cxl_remove_adapter_nr(adapter); + kfree(adapter); } -static struct cxl *cxl_alloc_adapter(struct pci_dev *dev) +static struct cxl *cxl_alloc_adapter(void) { struct cxl *adapter; if (!(adapter = 
kzalloc(sizeof(struct cxl), GFP_KERNEL))) return NULL; - adapter-dev.parent = dev-dev; - adapter-dev.release = cxl_release_adapter; - pci_set_drvdata(dev, adapter); spin_lock_init(adapter-afu_list_lock); + if (cxl_alloc_adapter_nr(adapter)) + goto err1; + + if (dev_set_name(adapter-dev, card%i, adapter-adapter_num)) + goto err2; + return adapter; + +err2: + cxl_remove_adapter_nr(adapter); +err1: + kfree(adapter); + return NULL; } static int sanitise_adapter_regs(struct cxl *adapter) @@ -1049,57 +1059,107 @@ static int sanitise_adapter_regs(struct cxl *adapter) return cxl_tlb_slb_invalidate(adapter); } -static struct cxl *cxl_init_adapter(struct pci_dev *dev) +/* This should contain *only* operations that can safely be done in + * both creation and recovery. + */ +static int cxl_configure_adapter(struct cxl *adapter, struct pci_dev *dev) { - struct cxl *adapter; - bool free = true; int rc; + adapter-dev.parent = dev-dev; + adapter-dev.release = cxl_release_adapter; + pci_set_drvdata(dev, adapter); - if (!(adapter = cxl_alloc_adapter(dev))) - return ERR_PTR(-ENOMEM); - - if ((rc = cxl_read_vsec(adapter, dev))) - goto err1; - - if ((rc = cxl_vsec_looks_ok(adapter, dev))) - goto err1; + rc = pci_enable_device(dev); + if (rc) { + dev_err(dev-dev, pci_enable_device failed: %i\n, rc); + return rc; + } - if ((rc = setup_cxl_bars(dev))) - goto err1; + rc = cxl_read_vsec(adapter, dev); + if (rc) + return rc; - if ((rc = switch_card_to_cxl(dev))) - goto err1; + rc = cxl_vsec_looks_ok(adapter, dev); + if (rc) + return rc; - if ((rc = cxl_alloc_adapter_nr(adapter))) - goto err1; + rc = setup_cxl_bars(dev); + if (rc) + return rc; - if ((rc = dev_set_name(adapter-dev, card%i, adapter-adapter_num))) - goto err2; + rc = switch_card_to_cxl(dev); + if (rc) + return rc; - if ((rc = cxl_update_image_control(adapter))) - goto err2; + rc = cxl_update_image_control(adapter); + if (rc) + return rc; - if ((rc = cxl_map_adapter_regs(adapter, dev))) - goto err2; + rc = 
cxl_map_adapter_regs(adapter, dev); + if (rc) + return rc; - if ((rc = sanitise_adapter_regs(adapter))) - goto err2; + rc = sanitise_adapter_regs(adapter); + if (rc) + goto err; - if ((rc = init_implementation_adapter_regs(adapter, dev))) -
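The once-only vs. repeatable split this patch describes can be sketched outside the kernel as follows (all names here are invented for illustration; the real configure step does PCI enable, VSEC parsing, BAR setup and register mapping):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy model of the refactor: once-only steps live in alloc/release,
 * repeatable hardware wiring lives in configure/deconfigure, so error
 * recovery can cycle the latter pair without touching the former. */

struct toy_adapter {
    int num;          /* once-only state: the adapter number */
    bool configured;  /* repeatable state: MMIO/IRQ wiring */
};

static int next_adapter_nr;

static struct toy_adapter *toy_alloc_adapter(void)
{
    struct toy_adapter *a = calloc(1, sizeof(*a));
    if (a)
        a->num = next_adapter_nr++;   /* allocated exactly once */
    return a;
}

static int toy_configure_adapter(struct toy_adapter *a)
{
    a->configured = true;             /* map regs, set up IRQs, ... */
    return 0;
}

static void toy_deconfigure_adapter(struct toy_adapter *a)
{
    a->configured = false;            /* unmap regs, free IRQs, ... */
}

static void toy_release_adapter(struct toy_adapter *a)
{
    free(a);                  /* adapter number would be freed here too */
}
```

Error recovery then becomes deconfigure followed by configure on the same allocated object, which is exactly what the EEH path wants.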
Re: [7/8] powerpc/xmon: Drop the 'valid' variable completely in 'dump_segments'
On Wed, 2015-29-07 at 07:10:04 UTC, Anshuman Khandual wrote: The value of the 'valid' variable is zero when 'esid' is zero, and it does not matter when 'esid' is non-zero. The variable 'valid' can therefore be dropped from the function 'dump_segments' by checking for the validity of 'esid' inside the nested code block. This patch does that change. Signed-off-by: Anshuman Khandual khand...@linux.vnet.ibm.com Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/8218a3031c204b20582b cheers ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v4 02/11] cxl: Drop commands if the PCI channel is not in normal state
If the PCI channel has gone down, don't attempt to poke the hardware. We need to guard every time cxl_whatever_(read|write) is called. This is because a call to those functions will dereference an offset into an mmio register, and the mmio mappings get invalidated in the EEH teardown. Check in the read/write functions in the header. We give them the same semantics as usual PCI operations: - a write to a channel that is down is ignored. - a read from a channel that is down returns all fs. Also, we try to access the MMIO space of a vPHB device as part of the PCI disable path. Because that's a read that bypasses most of our usual checks, we handle it explicitly. As far as user visible warnings go: - Check link state in file ops, return -EIO if down. - Be reasonably quiet if there's an error in a teardown path, or when we already know the hardware is going down. - Throw a big WARN if someone tries to start a CXL operation while the card is down. This gives a useful stacktrace for debugging whatever is doing that. Signed-off-by: Daniel Axtens d...@axtens.net --- drivers/misc/cxl/context.c | 6 +++- drivers/misc/cxl/cxl.h | 44 ++-- drivers/misc/cxl/file.c| 19 + drivers/misc/cxl/native.c | 71 -- drivers/misc/cxl/vphb.c| 26 + 5 files changed, 154 insertions(+), 12 deletions(-) diff --git a/drivers/misc/cxl/context.c b/drivers/misc/cxl/context.c index 1287148629c0..615842115848 100644 --- a/drivers/misc/cxl/context.c +++ b/drivers/misc/cxl/context.c @@ -193,7 +193,11 @@ int __detach_context(struct cxl_context *ctx) if (status != STARTED) return -EBUSY; - WARN_ON(cxl_detach_process(ctx)); + /* Only warn if we detached while the link was OK. +* If detach fails when hw is down, we don't care. 
+*/ + WARN_ON(cxl_detach_process(ctx) + cxl_adapter_link_ok(ctx-afu-adapter)); flush_work(ctx-fault_work); /* Only needed for dedicated process */ put_pid(ctx-pid); cxl_ctx_put(); diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h index 6a93bfbcd826..9b9e89fd02cc 100644 --- a/drivers/misc/cxl/cxl.h +++ b/drivers/misc/cxl/cxl.h @@ -531,6 +531,14 @@ struct cxl_process_element { __be32 software_state; } __packed; +static inline bool cxl_adapter_link_ok(struct cxl *cxl) +{ + struct pci_dev *pdev; + + pdev = to_pci_dev(cxl-dev.parent); + return !pci_channel_offline(pdev); +} + static inline void __iomem *_cxl_p1_addr(struct cxl *cxl, cxl_p1_reg_t reg) { WARN_ON(!cpu_has_feature(CPU_FTR_HVMODE)); @@ -539,12 +547,16 @@ static inline void __iomem *_cxl_p1_addr(struct cxl *cxl, cxl_p1_reg_t reg) static inline void cxl_p1_write(struct cxl *cxl, cxl_p1_reg_t reg, u64 val) { - out_be64(_cxl_p1_addr(cxl, reg), val); + if (likely(cxl_adapter_link_ok(cxl))) + out_be64(_cxl_p1_addr(cxl, reg), val); } static inline u64 cxl_p1_read(struct cxl *cxl, cxl_p1_reg_t reg) { - return in_be64(_cxl_p1_addr(cxl, reg)); + if (likely(cxl_adapter_link_ok(cxl))) + return in_be64(_cxl_p1_addr(cxl, reg)); + else + return ~0ULL; } static inline void __iomem *_cxl_p1n_addr(struct cxl_afu *afu, cxl_p1n_reg_t reg) @@ -555,12 +567,16 @@ static inline void __iomem *_cxl_p1n_addr(struct cxl_afu *afu, cxl_p1n_reg_t reg static inline void cxl_p1n_write(struct cxl_afu *afu, cxl_p1n_reg_t reg, u64 val) { - out_be64(_cxl_p1n_addr(afu, reg), val); + if (likely(cxl_adapter_link_ok(afu-adapter))) + out_be64(_cxl_p1n_addr(afu, reg), val); } static inline u64 cxl_p1n_read(struct cxl_afu *afu, cxl_p1n_reg_t reg) { - return in_be64(_cxl_p1n_addr(afu, reg)); + if (likely(cxl_adapter_link_ok(afu-adapter))) + return in_be64(_cxl_p1n_addr(afu, reg)); + else + return ~0ULL; } static inline void __iomem *_cxl_p2n_addr(struct cxl_afu *afu, cxl_p2n_reg_t reg) @@ -570,22 +586,34 @@ static inline void __iomem 
*_cxl_p2n_addr(struct cxl_afu *afu, cxl_p2n_reg_t reg static inline void cxl_p2n_write(struct cxl_afu *afu, cxl_p2n_reg_t reg, u64 val) { - out_be64(_cxl_p2n_addr(afu, reg), val); + if (likely(cxl_adapter_link_ok(afu-adapter))) + out_be64(_cxl_p2n_addr(afu, reg), val); } static inline u64 cxl_p2n_read(struct cxl_afu *afu, cxl_p2n_reg_t reg) { - return in_be64(_cxl_p2n_addr(afu, reg)); + if (likely(cxl_adapter_link_ok(afu-adapter))) + return in_be64(_cxl_p2n_addr(afu, reg)); + else + return ~0ULL; } static inline u64 cxl_afu_cr_read64(struct cxl_afu *afu, int cr, u64 off) { - return in_le64((afu)-afu_desc_mmio + (afu)-crs_offset + ((cr) * (afu)-crs_len) + (off)); + if (likely(cxl_adapter_link_ok(afu-adapter))) + return
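The guarded-accessor semantics this patch gives the MMIO helpers (writes to a dead channel dropped, reads returning all 1s) can be modelled in a few lines of plain C. In the real driver the check is `pci_channel_offline()` on the adapter's parent PCI device; here a plain flag stands in for it, and all names are invented:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct toy_adapter {
    bool link_ok;      /* stands in for !pci_channel_offline(pdev) */
    uint64_t reg;      /* stands in for one MMIO register */
};

static inline bool toy_adapter_link_ok(struct toy_adapter *a)
{
    return a->link_ok;
}

/* Same semantics as the patched cxl_p1_write(): writes to a dead
 * channel are silently dropped. */
static inline void toy_reg_write(struct toy_adapter *a, uint64_t val)
{
    if (toy_adapter_link_ok(a))
        a->reg = val;
}

/* Same semantics as the patched cxl_p1_read(): reads from a dead
 * channel return all 1s, like a PCI master abort. */
static inline uint64_t toy_reg_read(struct toy_adapter *a)
{
    if (toy_adapter_link_ok(a))
        return a->reg;
    return ~0ULL;
}
```

These are the usual PCI error semantics, so callers that already tolerate surprise link-down on real PCI devices need no changes.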
[PATCH v4 03/11] cxl: Allocate and release the SPA with the AFU
Previously the SPA was allocated and freed upon entering and leaving AFU-directed mode. This causes some issues for error recovery - contexts hold a pointer inside the SPA, and they may persist after the AFU has been detached. We would ideally like to allocate the SPA when the AFU is allocated, and release it until the AFU is released. However, we don't know how big the SPA needs to be until we read the AFU descriptor. Therefore, restructure the code: - Allocate the SPA only once, on the first attach. - Release the SPA only when the entire AFU is being released (not detached). Guard the release with a NULL check, so we don't free if it was never allocated (e.g. dedicated mode) Acked-by: Cyril Bur cyril...@gmail.com Signed-off-by: Daniel Axtens d...@axtens.net --- drivers/misc/cxl/cxl.h| 3 +++ drivers/misc/cxl/native.c | 33 ++--- drivers/misc/cxl/pci.c| 2 ++ 3 files changed, 27 insertions(+), 11 deletions(-) diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h index 9b9e89fd02cc..d540542f9931 100644 --- a/drivers/misc/cxl/cxl.h +++ b/drivers/misc/cxl/cxl.h @@ -632,6 +632,9 @@ void unregister_cxl_calls(struct cxl_calls *calls); int cxl_alloc_adapter_nr(struct cxl *adapter); void cxl_remove_adapter_nr(struct cxl *adapter); +int cxl_alloc_spa(struct cxl_afu *afu); +void cxl_release_spa(struct cxl_afu *afu); + int cxl_file_init(void); void cxl_file_exit(void); int cxl_register_adapter(struct cxl *adapter); diff --git a/drivers/misc/cxl/native.c b/drivers/misc/cxl/native.c index cd1dda5fcd3a..0af3a0d1c697 100644 --- a/drivers/misc/cxl/native.c +++ b/drivers/misc/cxl/native.c @@ -182,10 +182,8 @@ static int spa_max_procs(int spa_size) return ((spa_size / 8) - 96) / 17; } -static int alloc_spa(struct cxl_afu *afu) +int cxl_alloc_spa(struct cxl_afu *afu) { - u64 spap; - /* Work out how many pages to allocate */ afu-spa_order = 0; do { @@ -204,6 +202,13 @@ static int alloc_spa(struct cxl_afu *afu) pr_devel(spa pages: %i afu-spa_max_procs: %i afu-num_procs: %i\n, 
1afu-spa_order, afu-spa_max_procs, afu-num_procs); + return 0; +} + +static void attach_spa(struct cxl_afu *afu) +{ + u64 spap; + afu-sw_command_status = (__be64 *)((char *)afu-spa + ((afu-spa_max_procs + 3) * 128)); @@ -212,14 +217,19 @@ static int alloc_spa(struct cxl_afu *afu) spap |= CXL_PSL_SPAP_V; pr_devel(cxl: SPA allocated at 0x%p. Max processes: %i, sw_command_status: 0x%p CXL_PSL_SPAP_An=0x%016llx\n, afu-spa, afu-spa_max_procs, afu-sw_command_status, spap); cxl_p1n_write(afu, CXL_PSL_SPAP_An, spap); - - return 0; } -static void release_spa(struct cxl_afu *afu) +static inline void detach_spa(struct cxl_afu *afu) { cxl_p1n_write(afu, CXL_PSL_SPAP_An, 0); - free_pages((unsigned long) afu-spa, afu-spa_order); +} + +void cxl_release_spa(struct cxl_afu *afu) +{ + if (afu-spa) { + free_pages((unsigned long) afu-spa, afu-spa_order); + afu-spa = NULL; + } } int cxl_tlb_slb_invalidate(struct cxl *adapter) @@ -446,8 +456,11 @@ static int activate_afu_directed(struct cxl_afu *afu) dev_info(afu-dev, Activating AFU directed mode\n); - if (alloc_spa(afu)) - return -ENOMEM; + if (afu-spa == NULL) { + if (cxl_alloc_spa(afu)) + return -ENOMEM; + } + attach_spa(afu); cxl_p1n_write(afu, CXL_PSL_SCNTL_An, CXL_PSL_SCNTL_An_PM_AFU); cxl_p1n_write(afu, CXL_PSL_AMOR_An, 0xULL); @@ -558,8 +571,6 @@ static int deactivate_afu_directed(struct cxl_afu *afu) cxl_afu_disable(afu); cxl_psl_purge(afu); - release_spa(afu); - return 0; } diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c index 32ad09705949..62a762d94de3 100644 --- a/drivers/misc/cxl/pci.c +++ b/drivers/misc/cxl/pci.c @@ -551,6 +551,8 @@ static void cxl_release_afu(struct device *dev) pr_devel(cxl_release_afu\n); + cxl_release_spa(afu); + kfree(afu); } -- 2.1.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
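The allocate-once / attach-many / free-on-release structure of this patch can be sketched as a toy in userspace (invented names; the real attach step computes `sw_command_status` and writes the SPAP register):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_afu {
    void *spa;        /* scheduled process area, or NULL */
    int attaches;     /* how many times attach ran */
};

static int toy_alloc_spa(struct toy_afu *afu)
{
    afu->spa = malloc(4096);
    return afu->spa ? 0 : -1;
}

static void toy_attach_spa(struct toy_afu *afu)
{
    /* real driver: compute sw_command_status, write CXL_PSL_SPAP_An */
    afu->attaches++;
}

/* mirrors activate_afu_directed(): allocate only on first activation */
static int toy_activate(struct toy_afu *afu)
{
    if (afu->spa == NULL) {
        if (toy_alloc_spa(afu))
            return -1;
    }
    toy_attach_spa(afu);
    return 0;
}

/* mirrors cxl_release_spa(): free exactly once, guard the NULL case
 * (e.g. dedicated mode, where the SPA was never allocated) */
static void toy_release_spa(struct toy_afu *afu)
{
    if (afu->spa) {
        free(afu->spa);
        afu->spa = NULL;
    }
}
```

The point of the guard and the NULL reset is that contexts holding pointers into the SPA remain valid across detach, and release is idempotent.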
Re: [2/8] powerpc/slb: Rename all the 'slot' occurrences to 'entry'
On Wed, 2015-29-07 at 07:09:59 UTC, Anshuman Khandual wrote: It is essentially individual SLB slots, and their entries, that we are dealing with in these functions. Using both of the synonyms 'entry' and 'slot' sometimes makes things genuinely confusing. This patch makes usage uniform across the file by replacing all those 'slot's with 'entry's. Signed-off-by: Anshuman Khandual khand...@linux.vnet.ibm.com Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/2be682af48e8236558da cheers ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [4/4] cxl: sparse: Silence iomem warning in debugfs file creation
On Fri, 2015-07-08 at 03:18:20 UTC, Daniel Axtens wrote: An IO address, tagged with __iomem, is passed to debugfs_create_file as private data. This requires that it be cast to void *. The cast creates a sparse warning: /scratch/dja/linux-capi/drivers/misc/cxl/debugfs.c:51:57: warning: cast removes address space of expression The address space marker is added back in the file operations (fops_io_u64). Silence the warning with __force. Signed-off-by: Daniel Axtens d...@axtens.net Acked-by: Michael Neuling mi...@neuling.org Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/83c3fee7e78f5a937b73 cheers ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: powerpc/prom: Use DRCONF flags while processing detected LMBs
On Thu, 2015-06-08 at 13:05:07 UTC, Anshuman Khandual wrote: This patch just replaces hard coded values with existing DRCONF flags while processing detected LMBs from the device tree. This does not change any functionality. Signed-off-by: Anshuman Khandual khand...@linux.vnet.ibm.com Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/9afac933433ca71e0f78 cheers ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v4 01/11] cxl: Convert MMIO read/write macros to inline functions
We're about to make these more complex, so make them functions first. Signed-off-by: Daniel Axtens d...@axtens.net --- drivers/misc/cxl/cxl.h | 51 ++ 1 file changed, 35 insertions(+), 16 deletions(-) diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h index 4fd66cabde1e..6a93bfbcd826 100644 --- a/drivers/misc/cxl/cxl.h +++ b/drivers/misc/cxl/cxl.h @@ -537,10 +537,15 @@ static inline void __iomem *_cxl_p1_addr(struct cxl *cxl, cxl_p1_reg_t reg) return cxl-p1_mmio + cxl_reg_off(reg); } -#define cxl_p1_write(cxl, reg, val) \ - out_be64(_cxl_p1_addr(cxl, reg), val) -#define cxl_p1_read(cxl, reg) \ - in_be64(_cxl_p1_addr(cxl, reg)) +static inline void cxl_p1_write(struct cxl *cxl, cxl_p1_reg_t reg, u64 val) +{ + out_be64(_cxl_p1_addr(cxl, reg), val); +} + +static inline u64 cxl_p1_read(struct cxl *cxl, cxl_p1_reg_t reg) +{ + return in_be64(_cxl_p1_addr(cxl, reg)); +} static inline void __iomem *_cxl_p1n_addr(struct cxl_afu *afu, cxl_p1n_reg_t reg) { @@ -548,26 +553,40 @@ static inline void __iomem *_cxl_p1n_addr(struct cxl_afu *afu, cxl_p1n_reg_t reg return afu-p1n_mmio + cxl_reg_off(reg); } -#define cxl_p1n_write(afu, reg, val) \ - out_be64(_cxl_p1n_addr(afu, reg), val) -#define cxl_p1n_read(afu, reg) \ - in_be64(_cxl_p1n_addr(afu, reg)) +static inline void cxl_p1n_write(struct cxl_afu *afu, cxl_p1n_reg_t reg, u64 val) +{ + out_be64(_cxl_p1n_addr(afu, reg), val); +} + +static inline u64 cxl_p1n_read(struct cxl_afu *afu, cxl_p1n_reg_t reg) +{ + return in_be64(_cxl_p1n_addr(afu, reg)); +} static inline void __iomem *_cxl_p2n_addr(struct cxl_afu *afu, cxl_p2n_reg_t reg) { return afu-p2n_mmio + cxl_reg_off(reg); } -#define cxl_p2n_write(afu, reg, val) \ - out_be64(_cxl_p2n_addr(afu, reg), val) -#define cxl_p2n_read(afu, reg) \ - in_be64(_cxl_p2n_addr(afu, reg)) +static inline void cxl_p2n_write(struct cxl_afu *afu, cxl_p2n_reg_t reg, u64 val) +{ + out_be64(_cxl_p2n_addr(afu, reg), val); +} +static inline u64 cxl_p2n_read(struct cxl_afu *afu, cxl_p2n_reg_t reg) +{ 
+ return in_be64(_cxl_p2n_addr(afu, reg)); +} -#define cxl_afu_cr_read64(afu, cr, off) \ - in_le64((afu)-afu_desc_mmio + (afu)-crs_offset + ((cr) * (afu)-crs_len) + (off)) -#define cxl_afu_cr_read32(afu, cr, off) \ - in_le32((afu)-afu_desc_mmio + (afu)-crs_offset + ((cr) * (afu)-crs_len) + (off)) +static inline u64 cxl_afu_cr_read64(struct cxl_afu *afu, int cr, u64 off) +{ + return in_le64((afu)-afu_desc_mmio + (afu)-crs_offset + ((cr) * (afu)-crs_len) + (off)); +} + +static inline u32 cxl_afu_cr_read32(struct cxl_afu *afu, int cr, u64 off) +{ + return in_le32((afu)-afu_desc_mmio + (afu)-crs_offset + ((cr) * (afu)-crs_len) + (off)); +} u16 cxl_afu_cr_read16(struct cxl_afu *afu, int cr, u64 off); u8 cxl_afu_cr_read8(struct cxl_afu *afu, int cr, u64 off); -- 2.1.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
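One concrete reason this macro-to-inline conversion matters (beyond enabling the error-path guards added later in the series) is argument evaluation: the old macros expand their `afu` argument several times, so an argument with a side effect runs the side effect repeatedly, while an inline function evaluates each argument exactly once and type-checks it. A toy demonstration, with invented names and layout:

```c
#include <assert.h>
#include <stdint.h>

struct toy_afu {
    const uint64_t *cr_base;
    int cr_offset;
};

/* macro version: 'afu' appears twice in the expansion */
#define CR_READ_MACRO(afu, idx) \
    (*((afu)->cr_base + (afu)->cr_offset + (idx)))

/* inline version: 'afu' is evaluated exactly once, and its type is checked */
static inline uint64_t cr_read_inline(struct toy_afu *afu, int idx)
{
    return *(afu->cr_base + afu->cr_offset + idx);
}

static int fetches;

static struct toy_afu *fetch_afu(struct toy_afu *a)
{
    fetches++;          /* side effect: count each evaluation */
    return a;
}
```

With the macro, `CR_READ_MACRO(fetch_afu(&a), 1)` calls `fetch_afu` twice; with the inline function it is called once.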
[PATCH v4 09/11] cxl: Allow the kernel to trust that an image won't change on PERST.
Provide a kernel API and a sysfs entry which allow a user to specify that when a card is PERSTed, it's image will stay the same, allowing it to participate in EEH. cxl_reset is used to reflash the card. In that case, we cannot safely assert that the image will not change. Therefore, disallow cxl_reset if the flag is set. Signed-off-by: Daniel Axtens d...@axtens.net --- Documentation/ABI/testing/sysfs-class-cxl | 10 ++ drivers/misc/cxl/api.c| 7 +++ drivers/misc/cxl/cxl.h| 1 + drivers/misc/cxl/pci.c| 7 +++ drivers/misc/cxl/sysfs.c | 26 ++ include/misc/cxl.h| 10 ++ 6 files changed, 61 insertions(+) diff --git a/Documentation/ABI/testing/sysfs-class-cxl b/Documentation/ABI/testing/sysfs-class-cxl index acfe9df83139..b07e86d4597f 100644 --- a/Documentation/ABI/testing/sysfs-class-cxl +++ b/Documentation/ABI/testing/sysfs-class-cxl @@ -223,3 +223,13 @@ Description:write only Writing 1 will issue a PERST to card which may cause the card to reload the FPGA depending on load_image_on_perst. Users: https://github.com/ibm-capi/libcxl + +What: /sys/class/cxl/card/perst_reloads_same_image +Date: July 2015 +Contact: linuxppc-dev@lists.ozlabs.org +Description: read/write + Trust that when an image is reloaded via PERST, it will not + have changed. + 0 = don't trust, the image may be different (default) + 1 = trust that the image will not change. 
+Users: https://github.com/ibm-capi/libcxl diff --git a/drivers/misc/cxl/api.c b/drivers/misc/cxl/api.c index 729e0851167d..6a768a9ad22f 100644 --- a/drivers/misc/cxl/api.c +++ b/drivers/misc/cxl/api.c @@ -327,3 +327,10 @@ int cxl_afu_reset(struct cxl_context *ctx) return cxl_afu_check_and_enable(afu); } EXPORT_SYMBOL_GPL(cxl_afu_reset); + +void cxl_perst_reloads_same_image(struct cxl_afu *afu, + bool perst_reloads_same_image) +{ + afu-adapter-perst_same_image = perst_reloads_same_image; +} +EXPORT_SYMBOL_GPL(cxl_perst_reloads_same_image); diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h index d540542f9931..cda02412b01e 100644 --- a/drivers/misc/cxl/cxl.h +++ b/drivers/misc/cxl/cxl.h @@ -493,6 +493,7 @@ struct cxl { bool user_image_loaded; bool perst_loads_image; bool perst_select_user; + bool perst_same_image; }; int cxl_alloc_one_irq(struct cxl *adapter); diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c index e7976deed1f8..965524a6ae7c 100644 --- a/drivers/misc/cxl/pci.c +++ b/drivers/misc/cxl/pci.c @@ -887,6 +887,12 @@ int cxl_reset(struct cxl *adapter) int i; u32 val; + if (adapter-perst_same_image) { + dev_warn(dev-dev, +cxl: refusing to reset/reflash when perst_reloads_same_image is set.\n); + return -EINVAL; + } + dev_info(dev-dev, CXL reset\n); /* pcie_warm_reset requests a fundamental pci reset which includes a @@ -1171,6 +1177,7 @@ static struct cxl *cxl_init_adapter(struct pci_dev *dev) * configure/reconfigure */ adapter-perst_loads_image = true; + adapter-perst_same_image = false; rc = cxl_configure_adapter(adapter, dev); if (rc) { diff --git a/drivers/misc/cxl/sysfs.c b/drivers/misc/cxl/sysfs.c index 31f38bc71a3d..6619cf1f6e1f 100644 --- a/drivers/misc/cxl/sysfs.c +++ b/drivers/misc/cxl/sysfs.c @@ -112,12 +112,38 @@ static ssize_t load_image_on_perst_store(struct device *device, return count; } +static ssize_t perst_reloads_same_image_show(struct device *device, +struct device_attribute *attr, +char *buf) +{ + struct cxl 
*adapter = to_cxl_adapter(device); + + return scnprintf(buf, PAGE_SIZE, %i\n, adapter-perst_same_image); +} + +static ssize_t perst_reloads_same_image_store(struct device *device, +struct device_attribute *attr, +const char *buf, size_t count) +{ + struct cxl *adapter = to_cxl_adapter(device); + int rc; + int val; + + rc = sscanf(buf, %i, val); + if ((rc != 1) || !(val == 1 || val == 0)) + return -EINVAL; + + adapter-perst_same_image = (val == 1 ? true : false); + return count; +} + static struct device_attribute adapter_attrs[] = { __ATTR_RO(caia_version), __ATTR_RO(psl_revision), __ATTR_RO(base_image), __ATTR_RO(image_loaded), __ATTR_RW(load_image_on_perst), + __ATTR_RW(perst_reloads_same_image), __ATTR(reset, S_IWUSR, NULL, reset_adapter_store), }; diff --git a/include/misc/cxl.h b/include/misc/cxl.h index 7a6c1d6cc173..f2ffe5bd720d 100644
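The validation in the `perst_reloads_same_image` store hook above (accept exactly `0` or `1` as parsed by `sscanf`'s `%i`, reject everything else without touching the flag) can be exercised as a standalone userspace sketch; the function name and the count/-1 return convention are invented stand-ins for a sysfs store's count/-EINVAL:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

static bool perst_same_image;

/* returns the number of consumed bytes on success, -1 on bad input */
static int store_perst_same_image(const char *buf, size_t count)
{
    int val;

    if (sscanf(buf, "%i", &val) != 1 || (val != 0 && val != 1))
        return -1;

    perst_same_image = (val == 1);
    return (int)count;
}
```

Note that on a parse failure the stored flag is left unchanged, matching the sysfs convention of rejecting the whole write.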
[PATCH v4 08/11] cxl: Don't remove AFUs/vPHBs in cxl_reset
If the driver doesn't participate in EEH, the AFUs will be removed by cxl_remove, which will be invoked by EEH. If the driver does participate in EEH, the vPHB needs to stick around so that it can participate. In both cases, we shouldn't remove the AFU/vPHB. Reviewed-by: Cyril Bur cyril...@gmail.com Signed-off-by: Daniel Axtens d...@axtens.net --- drivers/misc/cxl/pci.c | 5 - 1 file changed, 5 deletions(-) diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c index 8e7b0f3ad254..e7976deed1f8 100644 --- a/drivers/misc/cxl/pci.c +++ b/drivers/misc/cxl/pci.c @@ -889,11 +889,6 @@ int cxl_reset(struct cxl *adapter) dev_info(&dev->dev, "CXL reset\n"); - for (i = 0; i < adapter->slices; i++) { - cxl_pci_vphb_remove(adapter->afu[i]); - cxl_remove_afu(adapter->afu[i]); - } - /* pcie_warm_reset requests a fundamental pci reset which includes a * PERST assert/deassert. PERST triggers a loading of the image * if user or factory is selected in sysfs */ -- 2.1.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] powerpc/prom: Use DRCONF flags while processing detected LMBs
On 08/11/2015 03:18 AM, Michael Ellerman wrote: On Fri, 2015-08-07 at 07:49 +0530, Madhavan Srinivasan wrote: On Thursday 06 August 2015 06:35 PM, Anshuman Khandual wrote: This patch just replaces hard coded values with existing Please drop "This patch just" and start with "Replace hard ..." https://www.kernel.org/doc/Documentation/SubmittingPatches Yeah I rewrote it as: "Replace hard coded values with existing DRCONF flags while processing detected LMBs from the device tree. Does not change any functionality." Thanks Michael. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 05/31] x86/pci-calgary: handle page-less SG entries
For the iommu offset we just need an offset into the page. Calculate that using the physical address instead of using the virtual address so that we don't require a virtual mapping. Signed-off-by: Christoph Hellwig h...@lst.de --- arch/x86/kernel/pci-calgary_64.c | 10 -- 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/arch/x86/kernel/pci-calgary_64.c b/arch/x86/kernel/pci-calgary_64.c index 0497f71..8f1581d 100644 --- a/arch/x86/kernel/pci-calgary_64.c +++ b/arch/x86/kernel/pci-calgary_64.c @@ -368,16 +368,14 @@ static int calgary_map_sg(struct device *dev, struct scatterlist *sg, { struct iommu_table *tbl = find_iommu_table(dev); struct scatterlist *s; - unsigned long vaddr; + unsigned long paddr; unsigned int npages; unsigned long entry; int i; for_each_sg(sg, s, nelems, i) { - BUG_ON(!sg_page(s)); - - vaddr = (unsigned long) sg_virt(s); - npages = iommu_num_pages(vaddr, s->length, PAGE_SIZE); + paddr = sg_phys(s); + npages = iommu_num_pages(paddr, s->length, PAGE_SIZE); entry = iommu_range_alloc(dev, tbl, npages); if (entry == DMA_ERROR_CODE) { @@ -389,7 +387,7 @@ static int calgary_map_sg(struct device *dev, struct scatterlist *sg, s->dma_address = (entry << PAGE_SHIFT) | s->offset; /* insert into HW table */ - tce_build(tbl, entry, npages, vaddr & PAGE_MASK, dir); + tce_build(tbl, entry, npages, paddr & PAGE_MASK, dir); s->dma_length = s->length; } -- 1.9.1 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
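The key observation in this patch is that the page count depends only on the in-page offset and the length, and the in-page offset is identical whether computed from the virtual or the physical address. A minimal `iommu_num_pages()` equivalent (4 KiB pages assumed, names invented):

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SIZE  4096ULL
#define TOY_PAGE_MASK  (~(TOY_PAGE_SIZE - 1))

/* number of pages a [addr, addr+len) buffer touches */
static unsigned int toy_iommu_num_pages(uint64_t addr, uint64_t len)
{
    uint64_t first = addr & TOY_PAGE_MASK;
    uint64_t last  = (addr + len - 1) & TOY_PAGE_MASK;

    return (unsigned int)((last - first) / TOY_PAGE_SIZE + 1);
}
```

Because only `addr & (PAGE_SIZE - 1)` matters, feeding `sg_phys(s)` instead of `sg_virt(s)` gives the same answer while dropping the requirement for a kernel virtual mapping.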
RFC: prepare for struct scatterlist entries without page backing
Dan Williams started to look into addressing I/O to and from Persistent Memory in his series from June: http://thread.gmane.org/gmane.linux.kernel.cross-arch/27944 I've started looking into DMA mapping of these SGLs specifically, instead of the map_pfn method in there. In addition to supporting NVDIMM-backed I/O, I also suspect this would be highly useful for media drivers that jump through nasty hoops to be able to DMA from/to their ioremapped regions, with vb2_dc_get_userptr in drivers/media/v4l2-core/videobuf2-dma-contig.c being a prime example of the unsafe hacks currently used. It turns out most DMA mapping implementations can handle SGLs without page structures with some fairly simple mechanical work. Most of it is just about consistently using sg_phys. For implementations that need to flush caches we need a new helper that skips these cache flushes if an entry doesn't have a kernel virtual address. However, the ccio (parisc) and sba_iommu (parisc and ia64) IOMMUs seem to operate mostly on virtual addresses. It's a fairly odd concept that I don't fully grasp, so I'll need some help with those if we want to bring this forward. Additionally, this series skips ARM entirely for now. The reason is that most arm implementations of the .map_sg operation just iterate over all entries and call ->map_page for it, which means we'd need to convert those to a ->map_pfn similar to Dan's previous approach. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
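The core idea of the cover letter can be sketched as a scatterlist entry that carries a physical address directly, so a `sg_phys()`-style helper works even for memory with no struct page (and no kernel virtual address) behind it. The layout and helper names below are invented for illustration only:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct toy_sg {
    uint64_t page_phys;    /* physical base of the backing "page" */
    bool has_page;         /* is there a struct page / kernel vaddr? */
    unsigned int offset;
    unsigned int length;
};

/* the kind of helper the cover letter proposes for cache-flushing
 * architectures: flush only when a virtual address actually exists */
static bool toy_sg_has_page(const struct toy_sg *sg)
{
    return sg->has_page;
}

/* sg_phys() needs no virtual mapping at all */
static uint64_t toy_sg_phys(const struct toy_sg *sg)
{
    return sg->page_phys + sg->offset;
}
```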
[PATCH 11/31] sparc/iommu: handle page-less SG entries
Use sg_phys() instead of __pa(sg_virt(sg)) so that we don't require a kernel virtual address. Signed-off-by: Christoph Hellwig h...@lst.de --- arch/sparc/kernel/iommu.c| 2 +- arch/sparc/kernel/iommu_common.h | 4 +--- arch/sparc/kernel/pci_sun4v.c| 2 +- 3 files changed, 3 insertions(+), 5 deletions(-) diff --git a/arch/sparc/kernel/iommu.c b/arch/sparc/kernel/iommu.c index 5320689..2ad89d2 100644 --- a/arch/sparc/kernel/iommu.c +++ b/arch/sparc/kernel/iommu.c @@ -486,7 +486,7 @@ static int dma_4u_map_sg(struct device *dev, struct scatterlist *sglist, continue; } /* Allocate iommu entries for that segment */ - paddr = (unsigned long) SG_ENT_PHYS_ADDRESS(s); + paddr = sg_phys(s); npages = iommu_num_pages(paddr, slen, IO_PAGE_SIZE); entry = iommu_tbl_range_alloc(dev, iommu-tbl, npages, handle, (unsigned long)(-1), 0); diff --git a/arch/sparc/kernel/iommu_common.h b/arch/sparc/kernel/iommu_common.h index b40cec2..8e2c211 100644 --- a/arch/sparc/kernel/iommu_common.h +++ b/arch/sparc/kernel/iommu_common.h @@ -33,15 +33,13 @@ */ #define IOMMU_PAGE_SHIFT 13 -#define SG_ENT_PHYS_ADDRESS(SG)(__pa(sg_virt((SG - static inline int is_span_boundary(unsigned long entry, unsigned long shift, unsigned long boundary_size, struct scatterlist *outs, struct scatterlist *sg) { - unsigned long paddr = SG_ENT_PHYS_ADDRESS(outs); + unsigned long paddr = sg_phys(outs); int nr = iommu_num_pages(paddr, outs-dma_length + sg-length, IO_PAGE_SIZE); diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c index d2fe57d..a7a6e41 100644 --- a/arch/sparc/kernel/pci_sun4v.c +++ b/arch/sparc/kernel/pci_sun4v.c @@ -370,7 +370,7 @@ static int dma_4v_map_sg(struct device *dev, struct scatterlist *sglist, continue; } /* Allocate iommu entries for that segment */ - paddr = (unsigned long) SG_ENT_PHYS_ADDRESS(s); + paddr = sg_phys(s); npages = iommu_num_pages(paddr, slen, IO_PAGE_SIZE); entry = iommu_tbl_range_alloc(dev, iommu-tbl, npages, handle, (unsigned long)(-1), 0); -- 1.9.1 ___ 
Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
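Why `sg_phys(sg)` can replace `__pa(sg_virt(sg))`: for page-backed memory both compute page-physical-base plus in-page offset, but only the `__pa()` route needs the linear kernel mapping to exist. A toy model, with an invented linear-map offset (virt = phys + offset):

```c
#include <assert.h>
#include <stdint.h>

#define TOY_LINEAR_OFFSET 0xC0000000UL

struct toy_sg {
    uint64_t page_phys;
    unsigned int offset;
};

static uint64_t toy_sg_phys(const struct toy_sg *sg)
{
    return sg->page_phys + sg->offset;
}

/* only valid when the page lives in the linear mapping */
static uint64_t toy_sg_virt(const struct toy_sg *sg)
{
    return sg->page_phys + TOY_LINEAR_OFFSET + sg->offset;
}

static uint64_t toy_pa(uint64_t vaddr)
{
    return vaddr - TOY_LINEAR_OFFSET;
}
```

The two expressions agree whenever a virtual address exists, so the conversion is purely mechanical, while lifting the virtual-mapping requirement for page-less entries.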
[PATCH 12/31] mn10300: handle page-less SG entries
Just remove a BUG_ON; the code handles page-less entries just fine as-is. Signed-off-by: Christoph Hellwig h...@lst.de --- arch/mn10300/include/asm/dma-mapping.h | 5 + 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/arch/mn10300/include/asm/dma-mapping.h b/arch/mn10300/include/asm/dma-mapping.h index a18abfc..b1b1050 100644 --- a/arch/mn10300/include/asm/dma-mapping.h +++ b/arch/mn10300/include/asm/dma-mapping.h @@ -57,11 +57,8 @@ int dma_map_sg(struct device *dev, struct scatterlist *sglist, int nents, BUG_ON(!valid_dma_direction(direction)); WARN_ON(nents == 0 || sglist[0].length == 0); - for_each_sg(sglist, sg, nents, i) { - BUG_ON(!sg_page(sg)); - + for_each_sg(sglist, sg, nents, i) sg->dma_address = sg_phys(sg); - } mn10300_dcache_flush_inv(); return nents; -- 1.9.1 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 17/31] ia64/sba_iommu: remove sba_sg_address
Signed-off-by: Christoph Hellwig h...@lst.de --- arch/ia64/hp/common/sba_iommu.c | 22 ++ 1 file changed, 10 insertions(+), 12 deletions(-) diff --git a/arch/ia64/hp/common/sba_iommu.c b/arch/ia64/hp/common/sba_iommu.c index 344387a..9e5aa8e 100644 --- a/arch/ia64/hp/common/sba_iommu.c +++ b/arch/ia64/hp/common/sba_iommu.c @@ -248,8 +248,6 @@ static int reserve_sba_gart = 1; static SBA_INLINE void sba_mark_invalid(struct ioc *, dma_addr_t, size_t); static SBA_INLINE void sba_free_range(struct ioc *, dma_addr_t, size_t); -#define sba_sg_address(sg) sg_virt((sg)) - #ifdef FULL_VALID_PDIR static u64 prefetch_spill_page; #endif @@ -397,7 +395,7 @@ sba_dump_sg( struct ioc *ioc, struct scatterlist *startsg, int nents) while (nents-- 0) { printk(KERN_DEBUG %d : DMA %08lx/%05x CPU %p\n, nents, startsg-dma_address, startsg-dma_length, - sba_sg_address(startsg)); + sg_virt(startsg)); startsg = sg_next(startsg); } } @@ -409,7 +407,7 @@ sba_check_sg( struct ioc *ioc, struct scatterlist *startsg, int nents) int the_nents = nents; while (the_nents-- 0) { - if (sba_sg_address(the_sg) == 0x0UL) + if (sg_virt(the_sg) == 0x0UL) sba_dump_sg(NULL, startsg, nents); the_sg = sg_next(the_sg); } @@ -1243,11 +1241,11 @@ sba_fill_pdir( if (dump_run_sg) printk( %2d : %08lx/%05x %p\n, nents, startsg-dma_address, cnt, - sba_sg_address(startsg)); + sg_virt(startsg)); #else DBG_RUN_SG( %d : %08lx/%05x %p\n, nents, startsg-dma_address, cnt, - sba_sg_address(startsg)); + sg_virt(startsg)); #endif /* ** Look for the start of a new DMA stream @@ -1267,7 +1265,7 @@ sba_fill_pdir( ** Look for a VCONTIG chunk */ if (cnt) { - unsigned long vaddr = (unsigned long) sba_sg_address(startsg); + unsigned long vaddr = (unsigned long) sg_virt(startsg); ASSERT(pdirp); /* Since multiple Vcontig blocks could make up @@ -1335,7 +1333,7 @@ sba_coalesce_chunks(struct ioc *ioc, struct device *dev, int idx; while (nents 0) { - unsigned long vaddr = (unsigned long) sba_sg_address(startsg); + unsigned long vaddr = 
(unsigned long) sg_virt(startsg); /* ** Prepare for first/next DMA stream @@ -1380,7 +1378,7 @@ sba_coalesce_chunks(struct ioc *ioc, struct device *dev, ** ** append the next transaction? */ - vaddr = (unsigned long) sba_sg_address(startsg); + vaddr = (unsigned long) sg_virt(startsg); if (vcontig_end == vaddr) { vcontig_len += startsg-length; @@ -1479,7 +1477,7 @@ static int sba_map_sg_attrs(struct device *dev, struct scatterlist *sglist, if (likely((ioc-dma_mask ~to_pci_dev(dev)-dma_mask) == 0)) { for_each_sg(sglist, sg, nents, filled) { sg-dma_length = sg-length; - sg-dma_address = virt_to_phys(sba_sg_address(sg)); + sg-dma_address = virt_to_phys(sg_virt(sg)); } return filled; } @@ -1487,7 +1485,7 @@ static int sba_map_sg_attrs(struct device *dev, struct scatterlist *sglist, /* Fast path single entry scatterlists. */ if (nents == 1) { sglist-dma_length = sglist-length; - sglist-dma_address = sba_map_single_attrs(dev, sba_sg_address(sglist), sglist-length, dir, attrs); + sglist-dma_address = sba_map_single_attrs(dev, sg_virt(sglist), sglist-length, dir, attrs); return 1; } @@ -1563,7 +1561,7 @@ static void sba_unmap_sg_attrs(struct device *dev, struct scatterlist *sglist, #endif DBG_RUN_SG(%s() START %d entries, %p,%x\n, - __func__, nents, sba_sg_address(sglist), sglist-length); + __func__, nents, sg_virt(sglist), sglist-length); #ifdef ASSERT_PDIR_SANITY ioc = GET_IOC(dev); -- 1.9.1 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 24/31] xtensa: handle page-less SG entries
Make all cache invalidation conditional on sg_has_page(). Signed-off-by: Christoph Hellwig h...@lst.de --- arch/xtensa/include/asm/dma-mapping.h | 17 ++--- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/arch/xtensa/include/asm/dma-mapping.h b/arch/xtensa/include/asm/dma-mapping.h index 1f5f6dc..262a1d1 100644 --- a/arch/xtensa/include/asm/dma-mapping.h +++ b/arch/xtensa/include/asm/dma-mapping.h @@ -61,10 +61,9 @@ dma_map_sg(struct device *dev, struct scatterlist *sglist, int nents, BUG_ON(direction == DMA_NONE); for_each_sg(sglist, sg, nents, i) { - BUG_ON(!sg_page(sg)); - sg-dma_address = sg_phys(sg); - consistent_sync(sg_virt(sg), sg-length, direction); + if (sg_has_page(sg)) + consistent_sync(sg_virt(sg), sg-length, direction); } return nents; @@ -131,8 +130,10 @@ dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sglist, int nelems, int i; struct scatterlist *sg; - for_each_sg(sglist, sg, nelems, i) - consistent_sync(sg_virt(sg), sg-length, dir); + for_each_sg(sglist, sg, nelems, i) { + if (sg_has_page(sg)) + consistent_sync(sg_virt(sg), sg-length, dir); + } } static inline void @@ -142,8 +143,10 @@ dma_sync_sg_for_device(struct device *dev, struct scatterlist *sglist, int i; struct scatterlist *sg; - for_each_sg(sglist, sg, nelems, i) - consistent_sync(sg_virt(sg), sg-length, dir); + for_each_sg(sglist, sg, nelems, i) { + if (sg_has_page(sg)) + consistent_sync(sg_virt(sg), sg-length, dir); + } } static inline int dma_mapping_error(struct device *dev, dma_addr_t dma_addr) -- 1.9.1 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 23/31] sh: handle page-less SG entries
Make all cache invalidation conditional on sg_has_page(). Signed-off-by: Christoph Hellwig h...@lst.de --- arch/sh/kernel/dma-nommu.c | 11 ++- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/arch/sh/kernel/dma-nommu.c b/arch/sh/kernel/dma-nommu.c index 5b0bfcd..3b64dc7 100644 --- a/arch/sh/kernel/dma-nommu.c +++ b/arch/sh/kernel/dma-nommu.c @@ -33,9 +33,8 @@ static int nommu_map_sg(struct device *dev, struct scatterlist *sg, WARN_ON(nents == 0 || sg[0].length == 0); for_each_sg(sg, s, nents, i) { - BUG_ON(!sg_page(s)); - - dma_cache_sync(dev, sg_virt(s), s->length, dir); + if (sg_has_page(s)) + dma_cache_sync(dev, sg_virt(s), s->length, dir); s->dma_address = sg_phys(s); s->dma_length = s->length; @@ -57,8 +56,10 @@ static void nommu_sync_sg(struct device *dev, struct scatterlist *sg, struct scatterlist *s; int i; - for_each_sg(sg, s, nelems, i) - dma_cache_sync(dev, sg_virt(s), s->length, dir); + for_each_sg(sg, s, nelems, i) { + if (sg_has_page(s)) + dma_cache_sync(dev, sg_virt(s), s->length, dir); + } } #endif -- 1.9.1 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
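The shape of the sh (and xtensa) change: the DMA address always comes from the physical address, but the cache maintenance step needs a kernel virtual address, so it is skipped for page-less entries. A userspace sketch with invented types and a counter standing in for `dma_cache_sync()`:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct toy_sg {
    uint64_t phys;
    bool has_page;         /* sg_has_page(): a kernel vaddr exists */
    uint64_t dma_address;
};

static int cache_syncs;    /* counts dma_cache_sync() stand-ins */

static int toy_nommu_map_sg(struct toy_sg *sg, int nents)
{
    for (int i = 0; i < nents; i++) {
        if (sg[i].has_page)
            cache_syncs++;              /* would flush via sg_virt() */
        sg[i].dma_address = sg[i].phys; /* works either way */
    }
    return nents;
}
```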
Re: [PATCH 8/8] powerpc/xmon: Add some more elements to the existing PACA dump list
On 08/12/2015 11:35 AM, Michael Ellerman wrote: On Wed, 2015-07-29 at 12:40 +0530, Anshuman Khandual wrote: This patch adds a set of new elements to the existing PACA dump list inside an xmon session which can be listed below improving the overall xmon debug support. (1) hmi_event_available (2) dscr_default (3) vmalloc_sllp (4) slb_cache_ptr (5) sprg_vdso (6) tm_scratch (7) core_idle_state_ptr (8) thread_idle_state (9) thread_mask (10) slb_shadow (11) pgd (12) kernel_pgd (13) tcd_ptr (14) mc_kstack (15) crit_kstack (16) dbg_kstack (17) user_time (18) system_time (19) user_time_scaled (20) starttime (21) starttime_user (22) startspurr (23) utime_sspurr (24) stolen_time Adding these makes the paca display much longer than 24 lines. I know in general we don't worry too much about folks on 80x24 green screens, but it's nice if xmon works OK on those. Or on virtual consoles that don't scroll for whatever reason. So I'm going to hold off on this one until we have a way to display some of the paca. I have an idea for that and will send a patch if it works. Sure, if you believe that is the best thing to do at the moment. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v3 09/11] cxl: Allow the kernel to trust that an image won't change on PERST.
On Wed, 12 Aug 2015 10:48:18 +1000 Daniel Axtens d...@axtens.net wrote: Provide a kernel API and a sysfs entry which allow a user to specify that when a card is PERSTed, it's image will stay the same, allowing it to participate in EEH. cxl_reset is used to reflash the card. In that case, we cannot safely assert that the image will not change. Therefore, disallow cxl_reset if the flag is set. Looks much better without all the #ifdefs!! Reviewed-by: Cyril Bur cyril...@gmail.com Signed-off-by: Daniel Axtens d...@axtens.net --- Documentation/ABI/testing/sysfs-class-cxl | 10 ++ drivers/misc/cxl/api.c| 7 +++ drivers/misc/cxl/cxl.h| 1 + drivers/misc/cxl/pci.c| 7 +++ drivers/misc/cxl/sysfs.c | 26 ++ include/misc/cxl.h| 10 ++ 6 files changed, 61 insertions(+) diff --git a/Documentation/ABI/testing/sysfs-class-cxl b/Documentation/ABI/testing/sysfs-class-cxl index acfe9df83139..b07e86d4597f 100644 --- a/Documentation/ABI/testing/sysfs-class-cxl +++ b/Documentation/ABI/testing/sysfs-class-cxl @@ -223,3 +223,13 @@ Description:write only Writing 1 will issue a PERST to card which may cause the card to reload the FPGA depending on load_image_on_perst. Users: https://github.com/ibm-capi/libcxl + +What:/sys/class/cxl/card/perst_reloads_same_image +Date:July 2015 +Contact: linuxppc-dev@lists.ozlabs.org +Description: read/write + Trust that when an image is reloaded via PERST, it will not + have changed. + 0 = don't trust, the image may be different (default) + 1 = trust that the image will not change. 
+Users: https://github.com/ibm-capi/libcxl diff --git a/drivers/misc/cxl/api.c b/drivers/misc/cxl/api.c index 729e0851167d..6a768a9ad22f 100644 --- a/drivers/misc/cxl/api.c +++ b/drivers/misc/cxl/api.c @@ -327,3 +327,10 @@ int cxl_afu_reset(struct cxl_context *ctx) return cxl_afu_check_and_enable(afu); } EXPORT_SYMBOL_GPL(cxl_afu_reset); + +void cxl_perst_reloads_same_image(struct cxl_afu *afu, + bool perst_reloads_same_image) +{ + afu-adapter-perst_same_image = perst_reloads_same_image; +} +EXPORT_SYMBOL_GPL(cxl_perst_reloads_same_image); diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h index d540542f9931..cda02412b01e 100644 --- a/drivers/misc/cxl/cxl.h +++ b/drivers/misc/cxl/cxl.h @@ -493,6 +493,7 @@ struct cxl { bool user_image_loaded; bool perst_loads_image; bool perst_select_user; + bool perst_same_image; }; int cxl_alloc_one_irq(struct cxl *adapter); diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c index 023a2086830b..b4a68a896a33 100644 --- a/drivers/misc/cxl/pci.c +++ b/drivers/misc/cxl/pci.c @@ -874,6 +874,12 @@ int cxl_reset(struct cxl *adapter) int i; u32 val; + if (adapter-perst_same_image) { + dev_warn(dev-dev, + cxl: refusing to reset/reflash when perst_reloads_same_image is set.\n); + return -EINVAL; + } + dev_info(dev-dev, CXL reset\n); /* pcie_warm_reset requests a fundamental pci reset which includes a @@ -1148,6 +1154,7 @@ static struct cxl *cxl_init_adapter(struct pci_dev *dev) * configure/reconfigure */ adapter-perst_loads_image = true; + adapter-perst_same_image = false; rc = cxl_configure_adapter(adapter, dev); if (rc) { diff --git a/drivers/misc/cxl/sysfs.c b/drivers/misc/cxl/sysfs.c index 31f38bc71a3d..6619cf1f6e1f 100644 --- a/drivers/misc/cxl/sysfs.c +++ b/drivers/misc/cxl/sysfs.c @@ -112,12 +112,38 @@ static ssize_t load_image_on_perst_store(struct device *device, return count; } +static ssize_t perst_reloads_same_image_show(struct device *device, + struct device_attribute *attr, + char *buf) +{ + struct cxl 
*adapter = to_cxl_adapter(device); + + return scnprintf(buf, PAGE_SIZE, %i\n, adapter-perst_same_image); +} + +static ssize_t perst_reloads_same_image_store(struct device *device, + struct device_attribute *attr, + const char *buf, size_t count) +{ + struct cxl *adapter = to_cxl_adapter(device); + int rc; + int val; + + rc = sscanf(buf, %i, val); + if ((rc != 1) || !(val == 1 || val == 0)) + return -EINVAL; + + adapter-perst_same_image = (val == 1 ? true : false); + return count; +} + static struct device_attribute adapter_attrs[] = { __ATTR_RO(caia_version), __ATTR_RO(psl_revision), __ATTR_RO(base_image), __ATTR_RO(image_loaded),
Re: [4/8] powerpc/slb: Add some helper functions to improve modularization
On 08/12/2015 09:41 AM, Michael Ellerman wrote: On Wed, 2015-29-07 at 07:10:01 UTC, Anshuman Khandual wrote: This patch adds the following six helper functions to help improve modularization and readability of the code. (1) slb_invalidate_all:Invalidates the entire SLB (2) slb_invalidate:Invalidates SLB entries present in PACA (3) mmu_linear_vsid_flags: VSID flags for kernel linear mapping (4) mmu_virtual_vsid_flags:VSID flags for kernel virtual mapping (5) mmu_vmemmap_vsid_flags:VSID flags for kernel vmem mapping (6) mmu_io_vsid_flags: VSID flags for kernel I/O mapping That's too many changes for one patch, it's certainly not a single logical change. I'm happy with all the flag ones being done in a single patch, but please do the other two in separate patches. Sure, will split this into three separate patches, also update the in-code documentation as suggested on the [5/8] patch and then will send out a new series. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 01/31] scatterlist: add sg_pfn and sg_has_page helpers
Signed-off-by: Christoph Hellwig h...@lst.de
---
 include/linux/scatterlist.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index 9b1ef0c..b1056bf 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -230,6 +230,16 @@ static inline dma_addr_t sg_phys(struct scatterlist *sg)
 	return page_to_phys(sg_page(sg)) + sg->offset;
 }
 
+static inline unsigned long sg_pfn(struct scatterlist *sg)
+{
+	return page_to_pfn(sg_page(sg));
+}
+
+static inline bool sg_has_page(struct scatterlist *sg)
+{
+	return true;
+}
+
 /**
  * sg_virt - Return virtual address of an sg entry
  * @sg: SG entry
-- 
1.9.1
[PATCH 14/31] sparc32/io-unit: handle page-less SG entries
For the iommu offset we just need and offset into the page. Calculate that using the physical address instead of using the virtual address so that we don't require a virtual mapping. Signed-off-by: Christoph Hellwig h...@lst.de --- arch/sparc/mm/io-unit.c | 23 --- 1 file changed, 12 insertions(+), 11 deletions(-) diff --git a/arch/sparc/mm/io-unit.c b/arch/sparc/mm/io-unit.c index f311bf2..82f97ae 100644 --- a/arch/sparc/mm/io-unit.c +++ b/arch/sparc/mm/io-unit.c @@ -91,13 +91,14 @@ static int __init iounit_init(void) subsys_initcall(iounit_init); /* One has to hold iounit-lock to call this */ -static unsigned long iounit_get_area(struct iounit_struct *iounit, unsigned long vaddr, int size) +static dma_addr_t iounit_get_area(struct iounit_struct *iounit, + unsigned long paddr, int size) { int i, j, k, npages; - unsigned long rotor, scan, limit; + unsigned long rotor, scan, limit, dma_addr; iopte_t iopte; -npages = ((vaddr ~PAGE_MASK) + size + (PAGE_SIZE-1)) PAGE_SHIFT; +npages = ((paddr ~PAGE_MASK) + size + (PAGE_SIZE-1)) PAGE_SHIFT; /* A tiny bit of magic ingredience :) */ switch (npages) { @@ -106,7 +107,7 @@ static unsigned long iounit_get_area(struct iounit_struct *iounit, unsigned long default: i = 0x0213; break; } - IOD((iounit_get_area(%08lx,%d[%d])=, vaddr, size, npages)); + IOD((iounit_get_area(%08lx,%d[%d])=, paddr, size, npages)); next: j = (i 15); rotor = iounit-rotor[j - 1]; @@ -121,7 +122,7 @@ nexti: scan = find_next_zero_bit(iounit-bmap, limit, scan); } i = 4; if (!(i 15)) - panic(iounit_get_area: Couldn't find free iopte slots for (%08lx,%d)\n, vaddr, size); + panic(iounit_get_area: Couldn't find free iopte slots for (%08lx,%d)\n, paddr, size); goto next; } for (k = 1, scan++; k npages; k++) @@ -129,14 +130,14 @@ nexti:scan = find_next_zero_bit(iounit-bmap, limit, scan); goto nexti; iounit-rotor[j - 1] = (scan limit) ? 
scan : iounit-limit[j - 1]; scan -= npages; - iopte = MKIOPTE(__pa(vaddr PAGE_MASK)); - vaddr = IOUNIT_DMA_BASE + (scan PAGE_SHIFT) + (vaddr ~PAGE_MASK); + iopte = MKIOPTE(paddr PAGE_MASK); + dma_addr = IOUNIT_DMA_BASE + (scan PAGE_SHIFT) + (paddr ~PAGE_MASK); for (k = 0; k npages; k++, iopte = __iopte(iopte_val(iopte) + 0x100), scan++) { set_bit(scan, iounit-bmap); sbus_writel(iopte, iounit-page_table[scan]); } - IOD((%08lx\n, vaddr)); - return vaddr; + IOD((%08lx\n, dma_addr)); + return dma_addr; } static __u32 iounit_get_scsi_one(struct device *dev, char *vaddr, unsigned long len) @@ -145,7 +146,7 @@ static __u32 iounit_get_scsi_one(struct device *dev, char *vaddr, unsigned long unsigned long ret, flags; spin_lock_irqsave(iounit-lock, flags); - ret = iounit_get_area(iounit, (unsigned long)vaddr, len); + ret = iounit_get_area(iounit, virt_to_phys(vaddr), len); spin_unlock_irqrestore(iounit-lock, flags); return ret; } @@ -159,7 +160,7 @@ static void iounit_get_scsi_sgl(struct device *dev, struct scatterlist *sg, int spin_lock_irqsave(iounit-lock, flags); while (sz != 0) { --sz; - sg-dma_address = iounit_get_area(iounit, (unsigned long) sg_virt(sg), sg-length); + sg-dma_address = iounit_get_area(iounit, sg_phys(sg), sg-length); sg-dma_length = sg-length; sg = sg_next(sg); } -- 1.9.1 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 21/31] blackfin: handle page-less SG entries
Switch from sg_virt to sg_phys, as blackfin, like all nommu architectures,
has a 1:1 virtual to physical mapping.

Signed-off-by: Christoph Hellwig h...@lst.de
---
 arch/blackfin/kernel/dma-mapping.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/blackfin/kernel/dma-mapping.c b/arch/blackfin/kernel/dma-mapping.c
index df437e5..e2c4d1a 100644
--- a/arch/blackfin/kernel/dma-mapping.c
+++ b/arch/blackfin/kernel/dma-mapping.c
@@ -120,7 +120,7 @@ dma_map_sg(struct device *dev, struct scatterlist *sg_list, int nents,
 	int i;
 
 	for_each_sg(sg_list, sg, nents, i) {
-		sg->dma_address = (dma_addr_t) sg_virt(sg);
+		sg->dma_address = sg_phys(sg);
 		__dma_sync(sg_dma_address(sg), sg_dma_len(sg), direction);
 	}
 
@@ -135,7 +135,7 @@ void dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg_list,
 	int i;
 
 	for_each_sg(sg_list, sg, nelems, i) {
-		sg->dma_address = (dma_addr_t) sg_virt(sg);
+		sg->dma_address = sg_phys(sg);
 		__dma_sync(sg_dma_address(sg), sg_dma_len(sg), direction);
 	}
 }
-- 
1.9.1
[PATCH 27/31] mips: handle page-less SG entries
Make all cache invalidation conditional on sg_has_page() and use sg_phys to get the physical address directly. To do this consolidate the two platform callouts using pages and virtual addresses into a single one using a physical address. Signed-off-by: Christoph Hellwig h...@lst.de --- arch/mips/bmips/dma.c | 9 ++-- arch/mips/include/asm/mach-ath25/dma-coherence.h | 10 ++--- arch/mips/include/asm/mach-bmips/dma-coherence.h | 4 ++-- .../include/asm/mach-cavium-octeon/dma-coherence.h | 11 ++ arch/mips/include/asm/mach-generic/dma-coherence.h | 12 +++ arch/mips/include/asm/mach-ip27/dma-coherence.h| 16 +++--- arch/mips/include/asm/mach-ip32/dma-coherence.h| 19 +++- arch/mips/include/asm/mach-jazz/dma-coherence.h| 11 +++--- .../include/asm/mach-loongson64/dma-coherence.h| 16 +++--- arch/mips/mm/dma-default.c | 25 -- 10 files changed, 37 insertions(+), 96 deletions(-) diff --git a/arch/mips/bmips/dma.c b/arch/mips/bmips/dma.c index 04790f4..13fc891 100644 --- a/arch/mips/bmips/dma.c +++ b/arch/mips/bmips/dma.c @@ -52,14 +52,9 @@ static dma_addr_t bmips_phys_to_dma(struct device *dev, phys_addr_t pa) return pa; } -dma_addr_t plat_map_dma_mem(struct device *dev, void *addr, size_t size) +dma_addr_t plat_map_dma_mem(struct device *dev, phys_addr_t phys, size_t size) { - return bmips_phys_to_dma(dev, virt_to_phys(addr)); -} - -dma_addr_t plat_map_dma_mem_page(struct device *dev, struct page *page) -{ - return bmips_phys_to_dma(dev, page_to_phys(page)); + return bmips_phys_to_dma(dev, phys); } unsigned long plat_dma_addr_to_phys(struct device *dev, dma_addr_t dma_addr) diff --git a/arch/mips/include/asm/mach-ath25/dma-coherence.h b/arch/mips/include/asm/mach-ath25/dma-coherence.h index d5defdd..4330de6 100644 --- a/arch/mips/include/asm/mach-ath25/dma-coherence.h +++ b/arch/mips/include/asm/mach-ath25/dma-coherence.h @@ -31,15 +31,9 @@ static inline dma_addr_t ath25_dev_offset(struct device *dev) } static inline dma_addr_t -plat_map_dma_mem(struct device *dev, void *addr, 
size_t size) +plat_map_dma_mem(struct device *dev, phys_addr_t phys, size_t size) { - return virt_to_phys(addr) + ath25_dev_offset(dev); -} - -static inline dma_addr_t -plat_map_dma_mem_page(struct device *dev, struct page *page) -{ - return page_to_phys(page) + ath25_dev_offset(dev); + return phys + ath25_dev_offset(dev); } static inline unsigned long diff --git a/arch/mips/include/asm/mach-bmips/dma-coherence.h b/arch/mips/include/asm/mach-bmips/dma-coherence.h index d29781f..1b9a7f4 100644 --- a/arch/mips/include/asm/mach-bmips/dma-coherence.h +++ b/arch/mips/include/asm/mach-bmips/dma-coherence.h @@ -21,8 +21,8 @@ struct device; -extern dma_addr_t plat_map_dma_mem(struct device *dev, void *addr, size_t size); -extern dma_addr_t plat_map_dma_mem_page(struct device *dev, struct page *page); +extern dma_addr_t plat_map_dma_mem(struct device *dev, phys_addr_t phys, + size_t size); extern unsigned long plat_dma_addr_to_phys(struct device *dev, dma_addr_t dma_addr); diff --git a/arch/mips/include/asm/mach-cavium-octeon/dma-coherence.h b/arch/mips/include/asm/mach-cavium-octeon/dma-coherence.h index 460042e..d0988c7 100644 --- a/arch/mips/include/asm/mach-cavium-octeon/dma-coherence.h +++ b/arch/mips/include/asm/mach-cavium-octeon/dma-coherence.h @@ -19,15 +19,8 @@ struct device; extern void octeon_pci_dma_init(void); -static inline dma_addr_t plat_map_dma_mem(struct device *dev, void *addr, - size_t size) -{ - BUG(); - return 0; -} - -static inline dma_addr_t plat_map_dma_mem_page(struct device *dev, - struct page *page) +static inline dma_addr_t plat_map_dma_mem(struct device *dev, phys_addr_t phys, + size_t size) { BUG(); return 0; diff --git a/arch/mips/include/asm/mach-generic/dma-coherence.h b/arch/mips/include/asm/mach-generic/dma-coherence.h index 0f8a354..2dfb133 100644 --- a/arch/mips/include/asm/mach-generic/dma-coherence.h +++ b/arch/mips/include/asm/mach-generic/dma-coherence.h @@ -11,16 +11,10 @@ struct device; -static inline dma_addr_t 
plat_map_dma_mem(struct device *dev, void *addr, - size_t size) +static inline dma_addr_t plat_map_dma_mem(struct device *dev, phys_addr_t phys, + size_t size) { - return virt_to_phys(addr); -} - -static inline dma_addr_t plat_map_dma_mem_page(struct device *dev, - struct page *page) -{ - return page_to_phys(page); + return phys; } static inline unsigned long plat_dma_addr_to_phys(struct device *dev, diff --git a/arch/mips/include/asm/mach-ip27/dma-coherence.h b/arch/mips/include/asm/mach-ip27/dma-coherence.h index 1daa644..2578b9d 100644 --- a/arch/mips/include/asm/mach-ip27/dma-coherence.h +++
[PATCH 28/31] powerpc: handle page-less SG entries
Make all cache invalidation conditional on sg_has_page(). Signed-off-by: Christoph Hellwig h...@lst.de --- arch/powerpc/kernel/dma.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c index 35e4dcc..cece40b 100644 --- a/arch/powerpc/kernel/dma.c +++ b/arch/powerpc/kernel/dma.c @@ -135,7 +135,10 @@ static int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, for_each_sg(sgl, sg, nents, i) { sg-dma_address = sg_phys(sg) + get_dma_offset(dev); sg-dma_length = sg-length; - __dma_sync_page(sg_page(sg), sg-offset, sg-length, direction); + if (sg_has_page(sg)) { + __dma_sync_page(sg_page(sg), sg-offset, sg-length, + direction); + } } return nents; @@ -200,7 +203,10 @@ static inline void dma_direct_sync_sg(struct device *dev, int i; for_each_sg(sgl, sg, nents, i) - __dma_sync_page(sg_page(sg), sg-offset, sg-length, direction); + if (sg_has_page(sg)) { + __dma_sync_page(sg_page(sg), sg-offset, sg-length, + direction); + } } static inline void dma_direct_sync_single(struct device *dev, -- 1.9.1 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [V3] powerpc/irq: Enable some more exceptions in /proc/interrupts interface
On 08/09/2015 07:57 AM, Benjamin Herrenschmidt wrote: On Tue, 2015-08-04 at 19:57 +1000, Michael Ellerman wrote: On Mon, 2015-07-13 at 08:16:06 UTC, Anshuman Khandual wrote: This patch enables facility unavailable exceptions for generic facility, FPU, ALTIVEC and VSX in the /proc/interrupts listing by incrementing their newly added IRQ statistical counters as and when these exceptions happen. It also adds a couple of helper functions which will be called from within the interrupt handler context to update their statistics. Similarly, this patch also enables alignment and program check exceptions. ... diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 0a0399c2..a86180c 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -1158,6 +1158,7 @@ BEGIN_FTR_SECTION END_FTR_SECTION_IFSET(CPU_FTR_TM) #endif bl load_up_fpu + bl fpu_unav_exceptions_count Is it safe to call C code here? Even if it was (at some stage it wasn't; I'd have to look very closely to see what the situation is now), we certainly don't want to add overhead to load_up_fpu. As I had already mentioned in the V2 thread of this patch, FPU performance with this patch applied is still very much comparable to the kernel without it. Though I have not verified whether this still holds true with the new changes proposed in exceptions-64s.S (earlier reply in this thread) to make the C function call safer.

Average of 1000 iterations (context_switch2 --fp 0 0):

With the patch    : 322599.57  (average of 1000 results)
Without the patch : 320464.924 (average of 1000 results)

Standard deviation of the results:

6029.1407073288 (with patch)
5941.7684079774 (without patch)

Wondering if the result above still does not convince us that FPU performance might not be getting hit because of this patch; let me know if we need to do more experiments.
Re: [PATCH v3 02/11] cxl: Drop commands if the PCI channel is not in normal state
On Wed, 12 Aug 2015 10:48:11 +1000 Daniel Axtens d...@axtens.net wrote: If the PCI channel has gone down, don't attempt to poke the hardware. We need to guard every time cxl_whatever_(read|write) is called. This is because a call to those functions will dereference an offset into an mmio register, and the mmio mappings get invalidated in the EEH teardown. Check in the read/write functions in the header. We give them the same semantics as usual PCI operations: - a write to a channel that is down is ignored. - a read from a channel that is down returns all fs. Also, we try to access the MMIO space of a vPHB device as part of the PCI disable path. Because that's a read that bypasses most of our usual checks, we handle it explicitly. As far as user visible warnings go: - Check link state in file ops, return -EIO if down. - Be reasonably quiet if there's an error in a teardown path, or when we already know the hardware is going down. - Throw a big WARN if someone tries to start a CXL operation while the card is down. This gives a useful stacktrace for debugging whatever is doing that. My previous comments appear to have been added, making functions from those macros was a good move. I can't speak too much for the exact function of the patch but the code looks good. Reviewed-by: Cyril Bur cyril...@gmail.com Signed-off-by: Daniel Axtens d...@axtens.net --- drivers/misc/cxl/context.c | 6 +++- drivers/misc/cxl/cxl.h | 44 ++-- drivers/misc/cxl/file.c| 19 + drivers/misc/cxl/native.c | 71 -- drivers/misc/cxl/vphb.c| 26 + 5 files changed, 154 insertions(+), 12 deletions(-) diff --git a/drivers/misc/cxl/context.c b/drivers/misc/cxl/context.c index 1287148629c0..615842115848 100644 --- a/drivers/misc/cxl/context.c +++ b/drivers/misc/cxl/context.c @@ -193,7 +193,11 @@ int __detach_context(struct cxl_context *ctx) if (status != STARTED) return -EBUSY; - WARN_ON(cxl_detach_process(ctx)); + /* Only warn if we detached while the link was OK. 
+ * If detach fails when hw is down, we don't care. + */ + WARN_ON(cxl_detach_process(ctx) + cxl_adapter_link_ok(ctx-afu-adapter)); flush_work(ctx-fault_work); /* Only needed for dedicated process */ put_pid(ctx-pid); cxl_ctx_put(); diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h index 6a93bfbcd826..9b9e89fd02cc 100644 --- a/drivers/misc/cxl/cxl.h +++ b/drivers/misc/cxl/cxl.h @@ -531,6 +531,14 @@ struct cxl_process_element { __be32 software_state; } __packed; +static inline bool cxl_adapter_link_ok(struct cxl *cxl) +{ + struct pci_dev *pdev; + + pdev = to_pci_dev(cxl-dev.parent); + return !pci_channel_offline(pdev); +} + static inline void __iomem *_cxl_p1_addr(struct cxl *cxl, cxl_p1_reg_t reg) { WARN_ON(!cpu_has_feature(CPU_FTR_HVMODE)); @@ -539,12 +547,16 @@ static inline void __iomem *_cxl_p1_addr(struct cxl *cxl, cxl_p1_reg_t reg) static inline void cxl_p1_write(struct cxl *cxl, cxl_p1_reg_t reg, u64 val) { - out_be64(_cxl_p1_addr(cxl, reg), val); + if (likely(cxl_adapter_link_ok(cxl))) + out_be64(_cxl_p1_addr(cxl, reg), val); } static inline u64 cxl_p1_read(struct cxl *cxl, cxl_p1_reg_t reg) { - return in_be64(_cxl_p1_addr(cxl, reg)); + if (likely(cxl_adapter_link_ok(cxl))) + return in_be64(_cxl_p1_addr(cxl, reg)); + else + return ~0ULL; } static inline void __iomem *_cxl_p1n_addr(struct cxl_afu *afu, cxl_p1n_reg_t reg) @@ -555,12 +567,16 @@ static inline void __iomem *_cxl_p1n_addr(struct cxl_afu *afu, cxl_p1n_reg_t reg static inline void cxl_p1n_write(struct cxl_afu *afu, cxl_p1n_reg_t reg, u64 val) { - out_be64(_cxl_p1n_addr(afu, reg), val); + if (likely(cxl_adapter_link_ok(afu-adapter))) + out_be64(_cxl_p1n_addr(afu, reg), val); } static inline u64 cxl_p1n_read(struct cxl_afu *afu, cxl_p1n_reg_t reg) { - return in_be64(_cxl_p1n_addr(afu, reg)); + if (likely(cxl_adapter_link_ok(afu-adapter))) + return in_be64(_cxl_p1n_addr(afu, reg)); + else + return ~0ULL; } static inline void __iomem *_cxl_p2n_addr(struct cxl_afu *afu, cxl_p2n_reg_t reg) @@ 
-570,22 +586,34 @@ static inline void __iomem *_cxl_p2n_addr(struct cxl_afu *afu, cxl_p2n_reg_t reg static inline void cxl_p2n_write(struct cxl_afu *afu, cxl_p2n_reg_t reg, u64 val) { - out_be64(_cxl_p2n_addr(afu, reg), val); + if (likely(cxl_adapter_link_ok(afu-adapter))) + out_be64(_cxl_p2n_addr(afu, reg), val); } static inline u64 cxl_p2n_read(struct cxl_afu *afu, cxl_p2n_reg_t reg) { - return in_be64(_cxl_p2n_addr(afu, reg)); + if (likely(cxl_adapter_link_ok(afu-adapter))) +
[PATCH 07/31] alpha/pci_iommu: handle page-less SG entries
Use sg_phys() instead of virt_to_phys(sg_virt(sg)) so that we don't require a kernel virtual address, and switch a few debug printfs to print physical instead of virtual addresses. Signed-off-by: Christoph Hellwig h...@lst.de --- arch/alpha/kernel/pci_iommu.c | 36 +++- 1 file changed, 15 insertions(+), 21 deletions(-) diff --git a/arch/alpha/kernel/pci_iommu.c b/arch/alpha/kernel/pci_iommu.c index eddee77..5d46b49 100644 --- a/arch/alpha/kernel/pci_iommu.c +++ b/arch/alpha/kernel/pci_iommu.c @@ -248,20 +248,17 @@ static int pci_dac_dma_supported(struct pci_dev *dev, u64 mask) until either pci_unmap_single or pci_dma_sync_single is performed. */ static dma_addr_t -pci_map_single_1(struct pci_dev *pdev, void *cpu_addr, size_t size, +pci_map_single_1(struct pci_dev *pdev, unsigned long paddr, size_t size, int dac_allowed) { struct pci_controller *hose = pdev ? pdev-sysdata : pci_isa_hose; dma_addr_t max_dma = pdev ? pdev-dma_mask : ISA_DMA_MASK; struct pci_iommu_arena *arena; long npages, dma_ofs, i; - unsigned long paddr; dma_addr_t ret; unsigned int align = 0; struct device *dev = pdev ? pdev-dev : NULL; - paddr = __pa(cpu_addr); - #if !DEBUG_NODIRECT /* First check to see if we can use the direct map window. 
*/ if (paddr + size + __direct_map_base - 1 = max_dma @@ -269,7 +266,7 @@ pci_map_single_1(struct pci_dev *pdev, void *cpu_addr, size_t size, ret = paddr + __direct_map_base; DBGA2(pci_map_single: [%p,%zx] - direct %llx from %pf\n, - cpu_addr, size, ret, __builtin_return_address(0)); + paddr, size, ret, __builtin_return_address(0)); return ret; } @@ -280,7 +277,7 @@ pci_map_single_1(struct pci_dev *pdev, void *cpu_addr, size_t size, ret = paddr + alpha_mv.pci_dac_offset; DBGA2(pci_map_single: [%p,%zx] - DAC %llx from %pf\n, - cpu_addr, size, ret, __builtin_return_address(0)); + paddr, size, ret, __builtin_return_address(0)); return ret; } @@ -309,15 +306,15 @@ pci_map_single_1(struct pci_dev *pdev, void *cpu_addr, size_t size, return 0; } + offset = paddr ~PAGE_MASK; paddr = PAGE_MASK; for (i = 0; i npages; ++i, paddr += PAGE_SIZE) arena-ptes[i + dma_ofs] = mk_iommu_pte(paddr); - ret = arena-dma_base + dma_ofs * PAGE_SIZE; - ret += (unsigned long)cpu_addr ~PAGE_MASK; + ret = arena-dma_base + dma_ofs * PAGE_SIZE + offset; DBGA2(pci_map_single: [%p,%zx] np %ld - sg %llx from %pf\n, - cpu_addr, size, npages, ret, __builtin_return_address(0)); + paddr, size, npages, ret, __builtin_return_address(0)); return ret; } @@ -357,7 +354,7 @@ static dma_addr_t alpha_pci_map_page(struct device *dev, struct page *page, BUG_ON(dir == PCI_DMA_NONE); dac_allowed = pdev ? 
pci_dac_dma_supported(pdev, pdev-dma_mask) : 0; - return pci_map_single_1(pdev, (char *)page_address(page) + offset, + return pci_map_single_1(pdev, page_to_phys(page) + offset, size, dac_allowed); } @@ -453,7 +450,7 @@ try_again: } memset(cpu_addr, 0, size); - *dma_addrp = pci_map_single_1(pdev, cpu_addr, size, 0); + *dma_addrp = pci_map_single_1(pdev, __pa(cpu_addr), size, 0); if (*dma_addrp == 0) { free_pages((unsigned long)cpu_addr, order); if (alpha_mv.mv_pci_tbi || (gfp GFP_DMA)) @@ -497,9 +494,6 @@ static void alpha_pci_free_coherent(struct device *dev, size_t size, Write dma_length of each leader with the combined lengths of the mergable followers. */ -#define SG_ENT_VIRT_ADDRESS(SG) (sg_virt((SG))) -#define SG_ENT_PHYS_ADDRESS(SG) __pa(SG_ENT_VIRT_ADDRESS(SG)) - static void sg_classify(struct device *dev, struct scatterlist *sg, struct scatterlist *end, int virt_ok) @@ -512,13 +506,13 @@ sg_classify(struct device *dev, struct scatterlist *sg, struct scatterlist *end, leader = sg; leader_flag = 0; leader_length = leader-length; - next_paddr = SG_ENT_PHYS_ADDRESS(leader) + leader_length; + next_paddr = sg_phys(leader) + leader_length; /* we will not marge sg without device. */ max_seg_size = dev ? dma_get_max_seg_size(dev) : 0; for (++sg; sg end; ++sg) { unsigned long addr, len; - addr = SG_ENT_PHYS_ADDRESS(sg); + addr = sg_phys(sg); len = sg-length; if (leader_length + len max_seg_size) @@ -555,7 +549,7 @@ sg_fill(struct device *dev, struct scatterlist *leader, struct scatterlist *end, struct scatterlist *out, struct pci_iommu_arena *arena,
[PATCH 08/31] c6x: handle page-less SG entries
Use sg_phys() instead of virt_to_phys(sg_virt(sg)) so that we don't
require a kernel virtual address.

Signed-off-by: Christoph Hellwig h...@lst.de
---
 arch/c6x/kernel/dma.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/c6x/kernel/dma.c b/arch/c6x/kernel/dma.c
index ab7b12d..79cae03 100644
--- a/arch/c6x/kernel/dma.c
+++ b/arch/c6x/kernel/dma.c
@@ -68,8 +68,7 @@ int dma_map_sg(struct device *dev, struct scatterlist *sglist,
 	int i;
 
 	for_each_sg(sglist, sg, nents, i)
-		sg->dma_address = dma_map_single(dev, sg_virt(sg), sg->length,
-						 dir);
+		sg->dma_address = sg_phys(sg);
 
 	debug_dma_map_sg(dev, sglist, nents, nents, dir);
-- 
1.9.1
[PATCH 15/31] sparc32/iommu: handle page-less SG entries
Pass a PFN to iommu_get_one instead of calculating it locally from a page
structure, so that we don't need pages for every address we can DMA to or
from. Also further restrict the cache flushing, as we can now have physical
addresses that are not kernel virtual mapped even without highmem.

Signed-off-by: Christoph Hellwig h...@lst.de
---
 arch/sparc/mm/iommu.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/arch/sparc/mm/iommu.c b/arch/sparc/mm/iommu.c
index 491511d..3ed53d7 100644
--- a/arch/sparc/mm/iommu.c
+++ b/arch/sparc/mm/iommu.c
@@ -174,7 +174,7 @@ static void iommu_flush_iotlb(iopte_t *iopte, unsigned int niopte)
 	}
 }
 
-static u32 iommu_get_one(struct device *dev, struct page *page, int npages)
+static u32 iommu_get_one(struct device *dev, unsigned long pfn, int npages)
 {
 	struct iommu_struct *iommu = dev->archdata.iommu;
 	int ioptex;
@@ -183,7 +183,7 @@ static u32 iommu_get_one(struct device *dev, struct page *page, int npages)
 
 	/* page color = pfn of page */
-	ioptex = bit_map_string_get(&iommu->usemap, npages, page_to_pfn(page));
+	ioptex = bit_map_string_get(&iommu->usemap, npages, pfn);
 	if (ioptex < 0)
 		panic("iommu out");
 	busa0 = iommu->start + (ioptex << PAGE_SHIFT);
@@ -192,11 +192,11 @@ static u32 iommu_get_one(struct device *dev, struct page *page, int npages)
 	busa = busa0;
 	iopte = iopte0;
 	for (i = 0; i < npages; i++) {
-		iopte_val(*iopte) = MKIOPTE(page_to_pfn(page), IOPERM);
+		iopte_val(*iopte) = MKIOPTE(pfn, IOPERM);
 		iommu_invalidate_page(iommu->regs, busa);
 		busa += PAGE_SIZE;
 		iopte++;
-		page++;
+		pfn++;
 	}
 	iommu_flush_iotlb(iopte0, npages);
@@ -214,7 +214,7 @@ static u32 iommu_get_scsi_one(struct device *dev, char *vaddr, unsigned int len)
 	off = (unsigned long)vaddr & ~PAGE_MASK;
 	npages = (off + len + PAGE_SIZE-1) >> PAGE_SHIFT;
 	page = virt_to_page((unsigned long)vaddr & PAGE_MASK);
-	busa = iommu_get_one(dev, page, npages);
+	busa = iommu_get_one(dev, page_to_pfn(page), npages);
 	return busa + off;
 }
@@ -243,7 +243,7 @@ static void 
iommu_get_scsi_sgl_gflush(struct device *dev, struct scatterlist *sg while (sz != 0) { --sz; n = (sg-length + sg-offset + PAGE_SIZE-1) PAGE_SHIFT; - sg-dma_address = iommu_get_one(dev, sg_page(sg), n) + sg-offset; + sg-dma_address = iommu_get_one(dev, sg_pfn(sg), n) + sg-offset; sg-dma_length = sg-length; sg = sg_next(sg); } @@ -264,7 +264,8 @@ static void iommu_get_scsi_sgl_pflush(struct device *dev, struct scatterlist *sg * XXX Is this a good assumption? * XXX What if someone else unmaps it here and races us? */ - if ((page = (unsigned long) page_address(sg_page(sg))) != 0) { + if (sg_has_page(sg) + (page = (unsigned long) page_address(sg_page(sg))) != 0) { for (i = 0; i n; i++) { if (page != oldpage) { /* Already flushed? */ flush_page_for_dma(page); @@ -274,7 +275,7 @@ static void iommu_get_scsi_sgl_pflush(struct device *dev, struct scatterlist *sg } } - sg-dma_address = iommu_get_one(dev, sg_page(sg), n) + sg-offset; + sg-dma_address = iommu_get_one(dev, sg_pfn(sg), n) + sg-offset; sg-dma_length = sg-length; sg = sg_next(sg); } -- 1.9.1 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 16/31] s390: handle page-less SG entries
Use sg_phys() instead of page_to_phys(sg_page(sg)) so that we don't require a page structure for all DMA memory. Signed-off-by: Christoph Hellwig h...@lst.de --- arch/s390/pci/pci_dma.c | 20 ++-- 1 file changed, 14 insertions(+), 6 deletions(-) diff --git a/arch/s390/pci/pci_dma.c b/arch/s390/pci/pci_dma.c index 6fd8d58..aae5a47 100644 --- a/arch/s390/pci/pci_dma.c +++ b/arch/s390/pci/pci_dma.c @@ -272,14 +272,13 @@ int dma_set_mask(struct device *dev, u64 mask) } EXPORT_SYMBOL_GPL(dma_set_mask); -static dma_addr_t s390_dma_map_pages(struct device *dev, struct page *page, -unsigned long offset, size_t size, +static dma_addr_t s390_dma_map_phys(struct device *dev, unsigned long pa, +size_t size, enum dma_data_direction direction, struct dma_attrs *attrs) { struct zpci_dev *zdev = get_zdev(to_pci_dev(dev)); unsigned long nr_pages, iommu_page_index; - unsigned long pa = page_to_phys(page) + offset; int flags = ZPCI_PTE_VALID; dma_addr_t dma_addr; @@ -301,7 +300,7 @@ static dma_addr_t s390_dma_map_pages(struct device *dev, struct page *page, if (!dma_update_trans(zdev, pa, dma_addr, size, flags)) { atomic64_add(nr_pages, zdev-mapped_pages); - return dma_addr + (offset ~PAGE_MASK); + return dma_addr + (pa ~PAGE_MASK); } out_free: @@ -312,6 +311,16 @@ out_err: return DMA_ERROR_CODE; } +static dma_addr_t s390_dma_map_pages(struct device *dev, struct page *page, +unsigned long offset, size_t size, +enum dma_data_direction direction, +struct dma_attrs *attrs) +{ + unsigned long pa = page_to_phys(page) + offset; + + return s390_dma_map_phys(dev, pa, size, direction, attrs); +} + static void s390_dma_unmap_pages(struct device *dev, dma_addr_t dma_addr, size_t size, enum dma_data_direction direction, struct dma_attrs *attrs) @@ -384,8 +393,7 @@ static int s390_dma_map_sg(struct device *dev, struct scatterlist *sg, int i; for_each_sg(sg, s, nr_elements, i) { - struct page *page = sg_page(s); - s-dma_address = s390_dma_map_pages(dev, page, s-offset, + s-dma_address = 
s390_dma_map_phys(dev, sg_phys(s), s-length, dir, NULL); if (!dma_mapping_error(dev, s-dma_address)) { s-dma_length = s-length; -- 1.9.1 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
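The s390 change above returns `dma_addr + (pa & ~PAGE_MASK)` instead of `dma_addr + (offset & ~PAGE_MASK)`. These are equivalent because the page base added into `pa` is page aligned, so masking `pa` recovers exactly the in-page part of the offset. A minimal sketch, using illustrative constants rather than the kernel's definitions:

```c
#include <assert.h>

#define MOCK_PAGE_SHIFT 12
#define MOCK_PAGE_SIZE  (1UL << MOCK_PAGE_SHIFT)
#define MOCK_PAGE_MASK  (~(MOCK_PAGE_SIZE - 1))

/* pa = page_to_phys(page) + offset; the page base is page aligned, so
 * the in-page part of pa equals the in-page part of offset */
static unsigned long mock_inpage(unsigned long pa)
{
    return pa & ~MOCK_PAGE_MASK;
}
```

This holds even when the offset itself spans past a page boundary, since only the low bits survive the mask either way.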
[PATCH 22/31] metag: handle page-less SG entries
Make all cache invalidation conditional on sg_has_page(). Signed-off-by: Christoph Hellwig h...@lst.de --- arch/metag/include/asm/dma-mapping.h | 22 -- 1 file changed, 12 insertions(+), 10 deletions(-) diff --git a/arch/metag/include/asm/dma-mapping.h b/arch/metag/include/asm/dma-mapping.h index eb5cdec..2ae9057 100644 --- a/arch/metag/include/asm/dma-mapping.h +++ b/arch/metag/include/asm/dma-mapping.h @@ -55,10 +55,9 @@ dma_map_sg(struct device *dev, struct scatterlist *sglist, int nents, WARN_ON(nents == 0 || sglist[0].length == 0); for_each_sg(sglist, sg, nents, i) { - BUG_ON(!sg_page(sg)); - sg-dma_address = sg_phys(sg); - dma_sync_for_device(sg_virt(sg), sg-length, direction); + if (sg_has_page(sg)) + dma_sync_for_device(sg_virt(sg), sg-length, direction); } return nents; @@ -94,10 +93,9 @@ dma_unmap_sg(struct device *dev, struct scatterlist *sglist, int nhwentries, WARN_ON(nhwentries == 0 || sglist[0].length == 0); for_each_sg(sglist, sg, nhwentries, i) { - BUG_ON(!sg_page(sg)); - sg-dma_address = sg_phys(sg); - dma_sync_for_cpu(sg_virt(sg), sg-length, direction); + if (sg_has_page(sg)) + dma_sync_for_cpu(sg_virt(sg), sg-length, direction); } } @@ -140,8 +138,10 @@ dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sglist, int nelems, int i; struct scatterlist *sg; - for_each_sg(sglist, sg, nelems, i) - dma_sync_for_cpu(sg_virt(sg), sg-length, direction); + for_each_sg(sglist, sg, nelems, i) { + if (sg_has_page(sg)) + dma_sync_for_cpu(sg_virt(sg), sg-length, direction); + } } static inline void @@ -151,8 +151,10 @@ dma_sync_sg_for_device(struct device *dev, struct scatterlist *sglist, int i; struct scatterlist *sg; - for_each_sg(sglist, sg, nelems, i) - dma_sync_for_device(sg_virt(sg), sg-length, direction); + for_each_sg(sglist, sg, nelems, i) { + if (sg_has_page(sg)) + dma_sync_for_device(sg_virt(sg), sg-length, direction); + } } static inline int -- 1.9.1 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org 
https://lists.ozlabs.org/listinfo/linuxppc-dev
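The pattern the metag patch (and several others in this series) applies can be sketched in isolation: the bus address is always published from the physical address, but cache maintenance is attempted only when a kernel-mapped page actually backs the entry. All names below are illustrative mocks, not the real kernel API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified model of a scatterlist entry that may be page-less
 * (e.g. pointing at PCIe BAR memory with no struct page behind it). */
struct mock_sg {
    void *page;          /* NULL for page-less memory */
    unsigned long phys;  /* physical address, always valid */
    size_t length;
};

static bool mock_sg_has_page(const struct mock_sg *sg)
{
    return sg->page != NULL;
}

static int syncs; /* counts cache-sync operations performed */

static void mock_dma_cache_sync(void *virt, size_t len)
{
    (void)virt; (void)len;
    syncs++;
}

/* map one entry: always return the bus address, only flush mapped pages */
static unsigned long mock_map_one(struct mock_sg *sg)
{
    if (mock_sg_has_page(sg))
        mock_dma_cache_sync(sg->page, sg->length);
    return sg->phys; /* dma_address = sg_phys(sg) on these architectures */
}

/* map one page-backed and one page-less entry; return how many syncs ran */
static int mock_demo(void)
{
    static char buf[64];
    struct mock_sg sgl[2] = {
        { .page = buf,  .phys = 0x1000, .length = sizeof buf },
        { .page = NULL, .phys = 0x9000, .length = 64 },
    };
    syncs = 0;
    unsigned long d0 = mock_map_one(&sgl[0]);
    unsigned long d1 = mock_map_one(&sgl[1]);
    assert(d0 == 0x1000 && d1 == 0x9000); /* address published either way */
    return syncs;
}
```

The page-less entry still gets a valid `dma_address`; only the cache flush is skipped, which is safe precisely because there is no kernel virtual mapping to flush.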
Re: [PATCH 09/10] Define PERF_PMU_TXN_READ interface
On Tue, Aug 11, 2015 at 09:14:00PM -0700, Sukadev Bhattiprolu wrote: | +static void __perf_read_group_add(struct perf_event *leader, u64 read_format, u64 *values) | { | + struct perf_event *sub; | + int n = 1; /* skip @nr */ This n = 1 is to skip over the values[0] = 1 + nr_siblings in the caller. Anyway, in __perf_read_group_add() we always start with n = 1, however ... | | + perf_event_read(leader, true); | + | + /* | +* Since we co-schedule groups, {enabled,running} times of siblings | +* will be identical to those of the leader, so we only publish one | +* set. | +*/ | + if (read_format PERF_FORMAT_TOTAL_TIME_ENABLED) { | + values[n++] += leader-total_time_enabled + | + atomic64_read(leader-child_total_time_enabled); Note how this is an in-place addition, | + } | | + if (read_format PERF_FORMAT_TOTAL_TIME_RUNNING) { | + values[n++] += leader-total_time_running + | + atomic64_read(leader-child_total_time_running); and here, | + } | | + /* | +* Write {count,id} tuples for every sibling. | +*/ | + values[n++] += perf_event_count(leader); and here, | if (read_format PERF_FORMAT_ID) | values[n++] = primary_event_id(leader); and this will always assign the same value. | + list_for_each_entry(sub, leader-sibling_list, group_entry) { | + values[n++] += perf_event_count(sub); | + if (read_format PERF_FORMAT_ID) | + values[n++] = primary_event_id(sub); Same for these, therefore, | + } | +} | | +static int perf_read_group(struct perf_event *event, | + u64 read_format, char __user *buf) | +{ | + struct perf_event *leader = event-group_leader, *child; | + struct perf_event_context *ctx = leader-ctx; | + int ret = leader-read_size; | + u64 *values; | | + lockdep_assert_held(ctx-mutex); | | + values = kzalloc(event-read_size); | + if (!values) | + return -ENOMEM; | | + values[0] = 1 + leader-nr_siblings; | | + /* | +* By locking the child_mutex of the leader we effectively | +* lock the child list of all siblings.. XXX explain how. 
| +*/ | + mutex_lock(leader-child_mutex); | | + __perf_read_group_add(leader, read_format, values); ... we don't copy_to_user() here, | + list_for_each_entry(child, leader-child_list, child_list) | + __perf_read_group_add(child, read_format, values); so won't we overwrite the values[], if we always start at n = 1 in __perf_read_group_add()? yes and no, we have to re-iterate the same values for each child as they all have the same group, but we add the time and count fields, we do not overwrite. The _add() suffix was supposed to be a hint ;-) | + mutex_unlock(leader-child_mutex); | + | + if (copy_to_user(buf, values, event-read_size)) | + ret = -EFAULT; | + | + kfree(values); | | return ret; | } Where previously we would iterate the group and for each member iterate/sum all the child values together before copying the value out, we now, because we need to read groups together, need to first iterate the child list and sum whole groups. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
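Peter's point about `__perf_read_group_add()` — that re-iterating the children accumulates rather than overwrites because the time and count slots use `+=` while the id slots are merely rewritten with the same value — can be sketched with a toy model. The layout and names below are illustrative, not the real perf read_format ABI:

```c
#define NR_EVENTS 2

struct mock_event { unsigned long long count, id; };
struct mock_group {
    struct mock_event ev[NR_EVENTS];
    unsigned long long time_enabled;
};

/* In-place accumulation, mirroring the "+=" discussed above: counts and
 * times add up across the leader and its child clones; ids are simply
 * assigned the same value on every pass. */
static void mock_read_group_add(const struct mock_group *g,
                                unsigned long long *values)
{
    int n = 1;                          /* values[0] holds 1 + nr_siblings */
    values[n++] += g->time_enabled;     /* += : accumulates */
    for (int i = 0; i < NR_EVENTS; i++) {
        values[n++] += g->ev[i].count;  /* += : accumulates */
        values[n++] = g->ev[i].id;      /* =  : same id every pass */
    }
}

/* run the add over a leader and one child clone of the same group */
static void mock_demo(unsigned long long *values)
{
    struct mock_group leader = { .ev = { {10, 1}, {20, 2} }, .time_enabled = 5 };
    struct mock_group child  = { .ev = { { 1, 1}, { 2, 2} }, .time_enabled = 3 };
    values[0] = 1 + (NR_EVENTS - 1);    /* 1 + nr_siblings, as in the caller */
    mock_read_group_add(&leader, values);
    mock_read_group_add(&child, values);
}
```

After both passes the counts hold leader + child sums while the ids are unchanged, which is why deferring the `copy_to_user()` until after the child loop does not lose data.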
[PATCH] powerpc/xmon: Allow limiting the size of the paca display
The paca display is already more than 24 lines, which can be problematic if you have an old school 80x24 terminal, or more likely you are on a virtual terminal which does not scroll for whatever reason. We'd like to expand the paca display even more, so add a way to limit the number of lines that are displayed. This adds a third form of 'dp' which is 'dp # #', where the first number is the cpu, and the second is the number of lines to display. Example output: 5:mon dp 3 6 paca for cpu 0x3 @ cfdc0d80: possible = yes present = yes online = yes lock_token = 0x8000(0xa) paca_index = 0x3 (0x8) Signed-off-by: Michael Ellerman m...@ellerman.id.au --- arch/powerpc/xmon/xmon.c | 23 +++ 1 file changed, 15 insertions(+), 8 deletions(-) diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c index e599259d84fc..6f44e9c07f34 100644 --- a/arch/powerpc/xmon/xmon.c +++ b/arch/powerpc/xmon/xmon.c @@ -205,6 +205,7 @@ Commands:\n\ #ifdef CONFIG_PPC64 \ dp[#]dump paca for current cpu, or cpu #\n\ + dp## dump paca for cpu #, only # lines\n\ dpa dump paca for all possible cpus\n #endif \ @@ -2070,9 +2071,10 @@ static void xmon_rawdump (unsigned long adrs, long ndump) } #ifdef CONFIG_PPC64 -static void dump_one_paca(int cpu) +static void dump_one_paca(int cpu, int num_lines) { struct paca_struct *p; + int i; if (setjmp(bus_error_jmp) != 0) { printf(*** Error dumping paca for cpu 0x%x!\n, cpu); @@ -2090,9 +2092,12 @@ static void dump_one_paca(int cpu) printf( %-*s = %s\n, 16, present, cpu_present(cpu) ? yes : no); printf( %-*s = %s\n, 16, online, cpu_online(cpu) ? 
yes : no); + i = 4; /* We always print the first four lines */ + #define DUMP(paca, name, format) \ - printf( %-*s = %#-*format\t(0x%lx)\n, 16, #name, 18, paca-name, \ - offsetof(struct paca_struct, name)); + if (!num_lines || i++ num_lines) \ + printf( %-*s = %#-*format\t(0x%lx)\n, 16, #name, 18, \ + paca-name, offsetof(struct paca_struct, name)); DUMP(p, lock_token, x); DUMP(p, paca_index, x); @@ -2135,7 +2140,7 @@ static void dump_all_pacas(void) } for_each_possible_cpu(cpu) - dump_one_paca(cpu); + dump_one_paca(cpu, 0); } static void dump_pacas(void) @@ -2151,10 +2156,12 @@ static void dump_pacas(void) termch = c; /* Put c back, it wasn't 'a' */ - if (scanhex(num)) - dump_one_paca(num); - else - dump_one_paca(xmon_owner); + if (scanhex(num)) { + unsigned long lines = 0; + scanhex(lines); + dump_one_paca(num, lines); + } else + dump_one_paca(xmon_owner, 0); } #endif -- 2.1.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
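The line-limiting scheme in the patch — `num_lines == 0` meaning "no limit", a counter that starts past the always-printed header lines, and a per-field `i++ < num_lines` gate — can be modelled in a few lines. Field names and counts here are illustrative only:

```c
#include <stdio.h>

/* Toy model of the xmon 'dp # #' limit: the first four lines are always
 * printed (i starts at 4), and each further field prints only while the
 * running line count stays below num_lines; 0 disables the limit. */
static int dump_fields(int num_lines)
{
    int printed = 0;
    int i = 4; /* the first four lines are always printed */
    const char *fields[] = { "lock_token", "paca_index",
                             "kernel_toc", "kernelbase" };

    for (int f = 0; f < 4; f++) {
        if (!num_lines || i++ < num_lines) {
            printf(" %-16s = ...\n", fields[f]);
            printed++;
        }
    }
    return printed;
}
```

With a limit of 6, the four header lines plus two fields fit, matching the `dp 3 6` example output above (lock_token and paca_index only); with a limit of 0 every field prints.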
[PATCH 03/31] dma-debug: handle page-less SG entries
Use sg_pfn to get the PFN and skip checks that require a kernel virtual address. Signed-off-by: Christoph Hellwig h...@lst.de --- lib/dma-debug.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/lib/dma-debug.c b/lib/dma-debug.c index dace71f..a215a80 100644 --- a/lib/dma-debug.c +++ b/lib/dma-debug.c @@ -1368,7 +1368,7 @@ void debug_dma_map_sg(struct device *dev, struct scatterlist *sg, entry->type = dma_debug_sg; entry->dev = dev; - entry->pfn = page_to_pfn(sg_page(s)); + entry->pfn = sg_pfn(s); entry->offset = s->offset, entry->size = sg_dma_len(s); entry->dev_addr = sg_dma_address(s); @@ -1376,7 +1376,7 @@ void debug_dma_map_sg(struct device *dev, struct scatterlist *sg, entry->sg_call_ents = nents; entry->sg_mapped_ents = mapped_ents; - if (!PageHighMem(sg_page(s))) { + if (sg_has_page(s) && !PageHighMem(sg_page(s))) { check_for_stack(dev, sg_virt(s)); check_for_illegal_area(dev, sg_virt(s), sg_dma_len(s)); } @@ -1419,7 +1419,7 @@ void debug_dma_unmap_sg(struct device *dev, struct scatterlist *sglist, struct dma_debug_entry ref = { .type = dma_debug_sg, .dev = dev, - .pfn = page_to_pfn(sg_page(s)), + .pfn = sg_pfn(s), .offset = s->offset, .dev_addr = sg_dma_address(s), .size = sg_dma_len(s), @@ -1580,7 +1580,7 @@ void debug_dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg, struct dma_debug_entry ref = { .type = dma_debug_sg, .dev = dev, - .pfn = page_to_pfn(sg_page(s)), + .pfn = sg_pfn(s), .offset = s->offset, .dev_addr = sg_dma_address(s), .size = sg_dma_len(s), @@ -1613,7 +1613,7 @@ void debug_dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg, struct dma_debug_entry ref = { .type = dma_debug_sg, .dev = dev, - .pfn = page_to_pfn(sg_page(s)), + .pfn = sg_pfn(s), .offset = s->offset, .dev_addr = sg_dma_address(s), .size = sg_dma_len(s), -- 1.9.1
[PATCH 13/31] sparc/ldc: handle page-less SG entries
Use sg_phys(sg) & PAGE_MASK instead of page_to_pfn(sg_page(sg)) << PAGE_SHIFT to get the page-aligned physical address of a SG entry, so that we don't require a page backing for SG entries. Signed-off-by: Christoph Hellwig h...@lst.de --- arch/sparc/kernel/ldc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/sparc/kernel/ldc.c b/arch/sparc/kernel/ldc.c index 1ae5eb1..0a29974 100644 --- a/arch/sparc/kernel/ldc.c +++ b/arch/sparc/kernel/ldc.c @@ -2051,7 +2051,7 @@ static void fill_cookies(struct cookie_state *sp, unsigned long pa, static int sg_count_one(struct scatterlist *sg) { - unsigned long base = page_to_pfn(sg_page(sg)) << PAGE_SHIFT; + unsigned long base = sg_phys(sg) & PAGE_MASK; long len = sg->length; if ((sg->offset | len) & (8UL - 1)) @@ -2114,7 +2114,7 @@ int ldc_map_sg(struct ldc_channel *lp, state.nc = 0; for_each_sg(sg, s, num_sg, i) { - fill_cookies(&state, page_to_pfn(sg_page(s)) << PAGE_SHIFT, + fill_cookies(&state, sg_phys(s) & PAGE_MASK, s->offset, s->length); } -- 1.9.1
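The two expressions this ldc patch swaps compute the same page-aligned base whenever the entry's offset stays inside the page, which can be checked with a small mock (illustrative names and constants, not the kernel's scatterlist API):

```c
#define MOCK_PAGE_SHIFT 12
#define MOCK_PAGE_MASK  (~((1UL << MOCK_PAGE_SHIFT) - 1))

struct mock_sg2 { unsigned long pfn, offset; };

/* sg_phys()-style: physical address of the entry's data */
static unsigned long mock_sg_phys(const struct mock_sg2 *sg)
{
    return (sg->pfn << MOCK_PAGE_SHIFT) + sg->offset;
}

/* old form: page_to_pfn(sg_page(sg)) << PAGE_SHIFT */
static unsigned long mock_base_old(const struct mock_sg2 *sg)
{
    return sg->pfn << MOCK_PAGE_SHIFT;
}

/* new form: sg_phys(sg) & PAGE_MASK — no struct page needed */
static unsigned long mock_base_new(const struct mock_sg2 *sg)
{
    return mock_sg_phys(sg) & MOCK_PAGE_MASK;
}
```

The new form derives the base purely from the physical address, which is what lets the entry go without a backing struct page.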
[PATCH 19/31] arc: handle page-less SG entries
Make all cache invalidation conditional on sg_has_page() and use sg_phys to get the physical address directly. Signed-off-by: Christoph Hellwig h...@lst.de --- arch/arc/include/asm/dma-mapping.h | 26 +++--- 1 file changed, 19 insertions(+), 7 deletions(-) diff --git a/arch/arc/include/asm/dma-mapping.h b/arch/arc/include/asm/dma-mapping.h index 2d28ba9..42eb526 100644 --- a/arch/arc/include/asm/dma-mapping.h +++ b/arch/arc/include/asm/dma-mapping.h @@ -108,9 +108,13 @@ dma_map_sg(struct device *dev, struct scatterlist *sg, struct scatterlist *s; int i; - for_each_sg(sg, s, nents, i) - s-dma_address = dma_map_page(dev, sg_page(s), s-offset, - s-length, dir); + for_each_sg(sg, s, nents, i) { + if (sg_has_page(s)) { + _dma_cache_sync((unsigned long)sg_virt(s), s-length, + dir); + } + s-dma_address = sg_phys(s); + } return nents; } @@ -163,8 +167,12 @@ dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sglist, int nelems, int i; struct scatterlist *sg; - for_each_sg(sglist, sg, nelems, i) - _dma_cache_sync((unsigned int)sg_virt(sg), sg-length, dir); + for_each_sg(sglist, sg, nelems, i) { + if (sg_has_page(sg)) { + _dma_cache_sync((unsigned int)sg_virt(sg), sg-length, + dir); + } + } } static inline void @@ -174,8 +182,12 @@ dma_sync_sg_for_device(struct device *dev, struct scatterlist *sglist, int i; struct scatterlist *sg; - for_each_sg(sglist, sg, nelems, i) - _dma_cache_sync((unsigned int)sg_virt(sg), sg-length, dir); + for_each_sg(sglist, sg, nelems, i) { + if (sg_has_page(sg)) { + _dma_cache_sync((unsigned int)sg_virt(sg), sg-length, + dir); + } + } } static inline int dma_supported(struct device *dev, u64 dma_mask) -- 1.9.1 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 20/31] avr32: handle page-less SG entries
Make all cache invalidation conditional on sg_has_page() and use sg_phys to get the physical address directly, bypassing the noop page_to_bus. Signed-off-by: Christoph Hellwig h...@lst.de --- arch/avr32/include/asm/dma-mapping.h | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/arch/avr32/include/asm/dma-mapping.h b/arch/avr32/include/asm/dma-mapping.h index ae7ac92..a662ce2 100644 --- a/arch/avr32/include/asm/dma-mapping.h +++ b/arch/avr32/include/asm/dma-mapping.h @@ -216,11 +216,9 @@ dma_map_sg(struct device *dev, struct scatterlist *sglist, int nents, struct scatterlist *sg; for_each_sg(sglist, sg, nents, i) { - char *virt; - - sg-dma_address = page_to_bus(sg_page(sg)) + sg-offset; - virt = sg_virt(sg); - dma_cache_sync(dev, virt, sg-length, direction); + sg-dma_address = sg_phys(sg); + if (sg_has_page(sg)) + dma_cache_sync(dev, sg_virt(sg), sg-length, direction); } return nents; @@ -328,8 +326,10 @@ dma_sync_sg_for_device(struct device *dev, struct scatterlist *sglist, int i; struct scatterlist *sg; - for_each_sg(sglist, sg, nents, i) - dma_cache_sync(dev, sg_virt(sg), sg-length, direction); + for_each_sg(sglist, sg, nents, i) { + if (sg_has_page(sg)) + dma_cache_sync(dev, sg_virt(sg), sg-length, direction); + } } /* Now for the API extensions over the pci_ one */ -- 1.9.1 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] powerpc/xmon: Allow limiting the size of the paca display
On 08/12/2015 12:27 PM, Michael Ellerman wrote: The paca display is already more than 24 lines, which can be problematic if you have an old school 80x24 terminal, or more likely you are on a virtual terminal which does not scroll for whatever reason. We'd like to expand the paca display even more, so add a way to limit the number of lines that are displayed. This adds a third form of 'dp' which is 'dp # #', where the first number is the cpu, and the second is the number of lines to display. Example output: 5:mon dp 3 6 paca for cpu 0x3 @ cfdc0d80: possible = yes present = yes online = yes lock_token = 0x8000 (0xa) paca_index = 0x3 (0x8) Signed-off-by: Michael Ellerman m...@ellerman.id.au --- arch/powerpc/xmon/xmon.c | 23 +++ 1 file changed, 15 insertions(+), 8 deletions(-) diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c index e599259d84fc..6f44e9c07f34 100644 --- a/arch/powerpc/xmon/xmon.c +++ b/arch/powerpc/xmon/xmon.c @@ -205,6 +205,7 @@ Commands:\n\ #ifdef CONFIG_PPC64 \ dp[#] dump paca for current cpu, or cpu #\n\ + dp## dump paca for cpu #, only # lines\n\ dpadump paca for all possible cpus\n #endif \ @@ -2070,9 +2071,10 @@ static void xmon_rawdump (unsigned long adrs, long ndump) } #ifdef CONFIG_PPC64 -static void dump_one_paca(int cpu) +static void dump_one_paca(int cpu, int num_lines) { struct paca_struct *p; + int i; if (setjmp(bus_error_jmp) != 0) { printf(*** Error dumping paca for cpu 0x%x!\n, cpu); @@ -2090,9 +2092,12 @@ static void dump_one_paca(int cpu) printf( %-*s = %s\n, 16, present, cpu_present(cpu) ? yes : no); printf( %-*s = %s\n, 16, online, cpu_online(cpu) ? yes : no); + i = 4; /* We always print the first four lines */ + #define DUMP(paca, name, format) \ - printf( %-*s = %#-*format\t(0x%lx)\n, 16, #name, 18, paca-name, \ - offsetof(struct paca_struct, name)); + if (!num_lines || i++ num_lines) All look good except the fact that we are using 0 to signify that there is no limit to the number of lines. 
Isn't that a bit confusing?
Re: [PATCH 8/8] powerpc/xmon: Add some more elements to the existing PACA dump list
On Wed, 2015-07-29 at 12:40 +0530, Anshuman Khandual wrote: This patch adds a set of new elements to the existing PACA dump list inside an xmon session which can be listed below improving the overall xmon debug support. (1) hmi_event_available (2) dscr_default (3) vmalloc_sllp (4) slb_cache_ptr (5) sprg_vdso (6) tm_scratch (7) core_idle_state_ptr (8) thread_idle_state (9) thread_mask (10) slb_shadow (11) pgd (12) kernel_pgd (13) tcd_ptr (14) mc_kstack (15) crit_kstack (16) dbg_kstack (17) user_time (18) system_time (19) user_time_scaled (20) starttime (21) starttime_user (22) startspurr (23) utime_sspurr (24) stolen_time Adding these makes the paca display much longer than 24 lines. I know in general we don't worry too much about folks on 80x24 green screens, but it's nice if xmon works OK on those. Or on virtual consoles that don't scroll for whatever reason. So I'm going to hold off on this one until we have a way to display some of the paca. I have an idea for that and will send a patch if it works. cheers ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v3 01/11] cxl: Convert MMIO read/write macros to inline functions
On Wed, 12 Aug 2015 10:48:10 +1000 Daniel Axtens d...@axtens.net wrote: We're about to make these more complex, so make them functions first. Reviewed-by: Cyril Bur cyril...@gmail.com Signed-off-by: Daniel Axtens d...@axtens.net --- drivers/misc/cxl/cxl.h | 51 ++ 1 file changed, 35 insertions(+), 16 deletions(-) diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h index 4fd66cabde1e..6a93bfbcd826 100644 --- a/drivers/misc/cxl/cxl.h +++ b/drivers/misc/cxl/cxl.h @@ -537,10 +537,15 @@ static inline void __iomem *_cxl_p1_addr(struct cxl *cxl, cxl_p1_reg_t reg) return cxl-p1_mmio + cxl_reg_off(reg); } -#define cxl_p1_write(cxl, reg, val) \ - out_be64(_cxl_p1_addr(cxl, reg), val) -#define cxl_p1_read(cxl, reg) \ - in_be64(_cxl_p1_addr(cxl, reg)) +static inline void cxl_p1_write(struct cxl *cxl, cxl_p1_reg_t reg, u64 val) +{ + out_be64(_cxl_p1_addr(cxl, reg), val); +} + +static inline u64 cxl_p1_read(struct cxl *cxl, cxl_p1_reg_t reg) +{ + return in_be64(_cxl_p1_addr(cxl, reg)); +} static inline void __iomem *_cxl_p1n_addr(struct cxl_afu *afu, cxl_p1n_reg_t reg) { @@ -548,26 +553,40 @@ static inline void __iomem *_cxl_p1n_addr(struct cxl_afu *afu, cxl_p1n_reg_t reg return afu-p1n_mmio + cxl_reg_off(reg); } -#define cxl_p1n_write(afu, reg, val) \ - out_be64(_cxl_p1n_addr(afu, reg), val) -#define cxl_p1n_read(afu, reg) \ - in_be64(_cxl_p1n_addr(afu, reg)) +static inline void cxl_p1n_write(struct cxl_afu *afu, cxl_p1n_reg_t reg, u64 val) +{ + out_be64(_cxl_p1n_addr(afu, reg), val); +} + +static inline u64 cxl_p1n_read(struct cxl_afu *afu, cxl_p1n_reg_t reg) +{ + return in_be64(_cxl_p1n_addr(afu, reg)); +} static inline void __iomem *_cxl_p2n_addr(struct cxl_afu *afu, cxl_p2n_reg_t reg) { return afu-p2n_mmio + cxl_reg_off(reg); } -#define cxl_p2n_write(afu, reg, val) \ - out_be64(_cxl_p2n_addr(afu, reg), val) -#define cxl_p2n_read(afu, reg) \ - in_be64(_cxl_p2n_addr(afu, reg)) +static inline void cxl_p2n_write(struct cxl_afu *afu, cxl_p2n_reg_t reg, u64 val) +{ + 
out_be64(_cxl_p2n_addr(afu, reg), val); +} +static inline u64 cxl_p2n_read(struct cxl_afu *afu, cxl_p2n_reg_t reg) +{ + return in_be64(_cxl_p2n_addr(afu, reg)); +} -#define cxl_afu_cr_read64(afu, cr, off) \ - in_le64((afu)-afu_desc_mmio + (afu)-crs_offset + ((cr) * (afu)-crs_len) + (off)) -#define cxl_afu_cr_read32(afu, cr, off) \ - in_le32((afu)-afu_desc_mmio + (afu)-crs_offset + ((cr) * (afu)-crs_len) + (off)) +static inline u64 cxl_afu_cr_read64(struct cxl_afu *afu, int cr, u64 off) +{ + return in_le64((afu)-afu_desc_mmio + (afu)-crs_offset + ((cr) * (afu)-crs_len) + (off)); +} + +static inline u32 cxl_afu_cr_read32(struct cxl_afu *afu, int cr, u64 off) +{ + return in_le32((afu)-afu_desc_mmio + (afu)-crs_offset + ((cr) * (afu)-crs_len) + (off)); +} u16 cxl_afu_cr_read16(struct cxl_afu *afu, int cr, u64 off); u8 cxl_afu_cr_read8(struct cxl_afu *afu, int cr, u64 off); ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v3 10/11] cxl: EEH support
On Wed, 12 Aug 2015 10:48:19 +1000 Daniel Axtens d...@axtens.net wrote: EEH (Enhanced Error Handling) allows a driver to recover from the temporary failure of an attached PCI card. Enable basic CXL support for EEH. Looks like the only change since was the removal of the #ifdef, if that is correct. Reviewed-by: Cyril Bur cyril...@gmail.com Signed-off-by: Daniel Axtens d...@axtens.net --- drivers/misc/cxl/cxl.h | 1 + drivers/misc/cxl/pci.c | 252 drivers/misc/cxl/vphb.c | 8 ++ 3 files changed, 261 insertions(+) diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h index cda02412b01e..6f5386653dae 100644 --- a/drivers/misc/cxl/cxl.h +++ b/drivers/misc/cxl/cxl.h @@ -726,6 +726,7 @@ int cxl_psl_purge(struct cxl_afu *afu); void cxl_stop_trace(struct cxl *cxl); int cxl_pci_vphb_add(struct cxl_afu *afu); +void cxl_pci_vphb_reconfigure(struct cxl_afu *afu); void cxl_pci_vphb_remove(struct cxl_afu *afu); extern struct pci_driver cxl_pci_driver; diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c index b4a68a896a33..1eb26a357ce0 100644 --- a/drivers/misc/cxl/pci.c +++ b/drivers/misc/cxl/pci.c @@ -24,6 +24,7 @@ #include asm/io.h #include cxl.h +#include misc/cxl.h #define CXL_PCI_VSEC_ID 0x1280 @@ -1246,10 +1247,261 @@ static void cxl_remove(struct pci_dev *dev) cxl_remove_adapter(adapter); } +static pci_ers_result_t cxl_vphb_error_detected(struct cxl_afu *afu, + pci_channel_state_t state) +{ + struct pci_dev *afu_dev; + pci_ers_result_t result = PCI_ERS_RESULT_NEED_RESET; + pci_ers_result_t afu_result = PCI_ERS_RESULT_NEED_RESET; + + /* There should only be one entry, but go through the list + * anyway + */ + list_for_each_entry(afu_dev, afu-phb-bus-devices, bus_list) { + if (!afu_dev-driver) + continue; + + afu_dev-error_state = state; + + if (afu_dev-driver-err_handler) + afu_result = afu_dev-driver-err_handler-error_detected(afu_dev, + state); + /* Disconnect trumps all, NONE trumps NEED_RESET */ + if (afu_result == PCI_ERS_RESULT_DISCONNECT) + result = 
PCI_ERS_RESULT_DISCONNECT; + else if ((afu_result == PCI_ERS_RESULT_NONE) + (result == PCI_ERS_RESULT_NEED_RESET)) + result = PCI_ERS_RESULT_NONE; + } + return result; +} + +static pci_ers_result_t cxl_pci_error_detected(struct pci_dev *pdev, +pci_channel_state_t state) +{ + struct cxl *adapter = pci_get_drvdata(pdev); + struct cxl_afu *afu; + pci_ers_result_t result = PCI_ERS_RESULT_NEED_RESET; + int i; + + /* At this point, we could still have an interrupt pending. + * Let's try to get them out of the way before they do + * anything we don't like. + */ + schedule(); + + /* If we're permanently dead, give up. */ + if (state == pci_channel_io_perm_failure) { + /* Tell the AFU drivers; but we don't care what they + * say, we're going away. + */ + for (i = 0; i adapter-slices; i++) { + afu = adapter-afu[i]; + cxl_vphb_error_detected(afu, state); + } + return PCI_ERS_RESULT_DISCONNECT; + } + + /* Are we reflashing? + * + * If we reflash, we could come back as something entirely + * different, including a non-CAPI card. As such, by default + * we don't participate in the process. We'll be unbound and + * the slot re-probed. (TODO: check EEH doesn't blindly rebind + * us!) + * + * However, this isn't the entire story: for reliablity + * reasons, we usually want to reflash the FPGA on PERST in + * order to get back to a more reliable known-good state. + * + * This causes us a bit of a problem: if we reflash we can't + * trust that we'll come back the same - we could have a new + * image and been PERSTed in order to load that + * image. However, most of the time we actually *will* come + * back the same - for example a regular EEH event. + * + * Therefore, we allow the user to assert that the image is + * indeed the same and that we should continue on into EEH + * anyway. 
+ */ + if (adapter-perst_loads_image !adapter-perst_same_image) { + /* TODO take the PHB out of CXL mode */ + dev_info(pdev-dev, reflashing, so opting out of EEH!\n); + return PCI_ERS_RESULT_NONE; + } + + /* + * At this point, we want to try to recover. We'll always +
[PATCH 04/31] x86/pci-nommu: handle page-less SG entries
Just remove a BUG_ON, the code handles them just fine as-is. Signed-off-by: Christoph Hellwig h...@lst.de --- arch/x86/kernel/pci-nommu.c | 1 - 1 file changed, 1 deletion(-) diff --git a/arch/x86/kernel/pci-nommu.c b/arch/x86/kernel/pci-nommu.c index da15918..a218059 100644 --- a/arch/x86/kernel/pci-nommu.c +++ b/arch/x86/kernel/pci-nommu.c @@ -63,7 +63,6 @@ static int nommu_map_sg(struct device *hwdev, struct scatterlist *sg, WARN_ON(nents == 0 || sg[0].length == 0); for_each_sg(sg, s, nents, i) { - BUG_ON(!sg_page(s)); s-dma_address = sg_phys(s); if (!check_addr(map_sg, hwdev, s-dma_address, s-length)) return 0; -- 1.9.1 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 09/31] ia64/pci_dma: handle page-less SG entries
Use sg_phys() instead of virt_to_phys(sg_virt(sg)) so that we don't require a kernel virtual address. Signed-off-by: Christoph Hellwig h...@lst.de --- arch/ia64/sn/pci/pci_dma.c | 5 + 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/arch/ia64/sn/pci/pci_dma.c b/arch/ia64/sn/pci/pci_dma.c index d0853e8..8f713c8 100644 --- a/arch/ia64/sn/pci/pci_dma.c +++ b/arch/ia64/sn/pci/pci_dma.c @@ -18,9 +18,6 @@ #include asm/sn/pcidev.h #include asm/sn/sn_sal.h -#define SG_ENT_VIRT_ADDRESS(sg)(sg_virt((sg))) -#define SG_ENT_PHYS_ADDRESS(SG)virt_to_phys(SG_ENT_VIRT_ADDRESS(SG)) - /** * sn_dma_supported - test a DMA mask * @dev: device to test @@ -291,7 +288,7 @@ static int sn_dma_map_sg(struct device *dev, struct scatterlist *sgl, */ for_each_sg(sgl, sg, nhwentries, i) { dma_addr_t dma_addr; - phys_addr = SG_ENT_PHYS_ADDRESS(sg); + phys_addr = sg_phys(sg); if (dmabarr) dma_addr = provider-dma_map_consistent(pdev, phys_addr, -- 1.9.1 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 26/31] openrisc: handle page-less SG entries
Make all cache invalidation conditional on sg_has_page() and use sg_phys to get the physical address directly. Signed-off-by: Christoph Hellwig h...@lst.de --- arch/openrisc/kernel/dma.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/arch/openrisc/kernel/dma.c b/arch/openrisc/kernel/dma.c index 0b77ddb..94ed052 100644 --- a/arch/openrisc/kernel/dma.c +++ b/arch/openrisc/kernel/dma.c @@ -184,8 +184,13 @@ or1k_map_sg(struct device *dev, struct scatterlist *sg, int i; for_each_sg(sg, s, nents, i) { - s-dma_address = or1k_map_page(dev, sg_page(s), s-offset, - s-length, dir, NULL); + if (sg_has_page(s)) { + s-dma_address = or1k_map_page(dev, sg_page(s), + s-offset, s-length, dir, + NULL); + } else { + s-dma_address = sg_phys(s); + } } return nents; -- 1.9.1 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 31/31] dma-mapping-common: skip kmemleak checks for page-less SG entries
Signed-off-by: Christoph Hellwig h...@lst.de --- include/asm-generic/dma-mapping-common.h | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/include/asm-generic/dma-mapping-common.h b/include/asm-generic/dma-mapping-common.h index 940d5ec..afc3eaf 100644 --- a/include/asm-generic/dma-mapping-common.h +++ b/include/asm-generic/dma-mapping-common.h @@ -51,8 +51,10 @@ static inline int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg, int i, ents; struct scatterlist *s; - for_each_sg(sg, s, nents, i) - kmemcheck_mark_initialized(sg_virt(s), s-length); + for_each_sg(sg, s, nents, i) { + if (sg_has_page(s)) + kmemcheck_mark_initialized(sg_virt(s), s-length); + } BUG_ON(!valid_dma_direction(dir)); ents = ops-map_sg(dev, sg, nents, dir, attrs); BUG_ON(ents 0); -- 1.9.1 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v3 06/11] cxl: Refactor adaptor init/teardown
On Wed, 12 Aug 2015 10:48:15 +1000 Daniel Axtens d...@axtens.net wrote: Some aspects of initialisation are done only once in the lifetime of an adapter: for example, allocating memory for the adapter, allocating the adapter number, or setting up sysfs/debugfs files. However, we may want to be able to do some parts of the initialisation multiple times: for example, in error recovery we want to be able to tear down and then re-map IO memory and IRQs. Therefore, refactor CXL init/teardown as follows. - Keep the overarching functions 'cxl_init_adapter' and its pair, 'cxl_remove_adapter'. - Move all 'once only' allocation/freeing steps to the existing 'cxl_alloc_adapter' function, and its pair 'cxl_release_adapter' (This involves moving allocation of the adapter number out of cxl_init_adapter.) - Create two new functions: 'cxl_configure_adapter', and its pair 'cxl_deconfigure_adapter'. These two functions 'wire up' the hardware --- they (de)configure resources that do not need to last the entire lifetime of the adapter Reviewed-by: Cyril Bur cyril...@gmail.com Signed-off-by: Daniel Axtens d...@axtens.net --- drivers/misc/cxl/pci.c | 140 ++--- 1 file changed, 87 insertions(+), 53 deletions(-) diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c index 484d35a5aead..f6cb089ff981 100644 --- a/drivers/misc/cxl/pci.c +++ b/drivers/misc/cxl/pci.c @@ -965,7 +965,6 @@ static int cxl_read_vsec(struct cxl *adapter, struct pci_dev *dev) CXL_READ_VSEC_BASE_IMAGE(dev, vsec, adapter-base_image); CXL_READ_VSEC_IMAGE_STATE(dev, vsec, image_state); adapter-user_image_loaded = !!(image_state CXL_VSEC_USER_IMAGE_LOADED); - adapter-perst_loads_image = true; adapter-perst_select_user = !!(image_state CXL_VSEC_USER_IMAGE_LOADED); CXL_READ_VSEC_NAFUS(dev, vsec, adapter-slices); @@ -1025,22 +1024,34 @@ static void cxl_release_adapter(struct device *dev) pr_devel(cxl_release_adapter\n); + cxl_remove_adapter_nr(adapter); + kfree(adapter); } -static struct cxl *cxl_alloc_adapter(struct 
pci_dev *dev) +static struct cxl *cxl_alloc_adapter(void) { struct cxl *adapter; + int rc; if (!(adapter = kzalloc(sizeof(struct cxl), GFP_KERNEL))) return NULL; - adapter-dev.parent = dev-dev; - adapter-dev.release = cxl_release_adapter; - pci_set_drvdata(dev, adapter); spin_lock_init(adapter-afu_list_lock); + if ((rc = cxl_alloc_adapter_nr(adapter))) + goto err1; + + if ((rc = dev_set_name(adapter-dev, card%i, adapter-adapter_num))) + goto err2; + return adapter; + +err2: + cxl_remove_adapter_nr(adapter); +err1: + kfree(adapter); + return NULL; } static int sanitise_adapter_regs(struct cxl *adapter) @@ -1049,57 +1060,96 @@ static int sanitise_adapter_regs(struct cxl *adapter) return cxl_tlb_slb_invalidate(adapter); } -static struct cxl *cxl_init_adapter(struct pci_dev *dev) +/* This should contain *only* operations that can safely be done in + * both creation and recovery. + */ +static int cxl_configure_adapter(struct cxl *adapter, struct pci_dev *dev) { - struct cxl *adapter; - bool free = true; int rc; + adapter-dev.parent = dev-dev; + adapter-dev.release = cxl_release_adapter; + pci_set_drvdata(dev, adapter); - if (!(adapter = cxl_alloc_adapter(dev))) - return ERR_PTR(-ENOMEM); + rc = pci_enable_device(dev); + if (rc) { + dev_err(dev-dev, pci_enable_device failed: %i\n, rc); + return rc; + } if ((rc = cxl_read_vsec(adapter, dev))) - goto err1; + return rc; if ((rc = cxl_vsec_looks_ok(adapter, dev))) - goto err1; + return rc; if ((rc = setup_cxl_bars(dev))) - goto err1; + return rc; if ((rc = switch_card_to_cxl(dev))) - goto err1; - - if ((rc = cxl_alloc_adapter_nr(adapter))) - goto err1; - - if ((rc = dev_set_name(adapter-dev, card%i, adapter-adapter_num))) - goto err2; + return rc; if ((rc = cxl_update_image_control(adapter))) - goto err2; + return rc; if ((rc = cxl_map_adapter_regs(adapter, dev))) - goto err2; + return rc; if ((rc = sanitise_adapter_regs(adapter))) - goto err2; + goto err; if ((rc = init_implementation_adapter_regs(adapter, dev))) - goto 
err3; + goto err; if ((rc = pnv_phb_to_cxl_mode(dev, OPAL_PHB_CAPI_MODE_CAPI))) - goto err3; + goto err; /* If recovery happened, the last step is to turn on snooping. * In
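The split this patch describes, between once-only allocation and repeatable configuration, can be sketched in plain C. This is a schematic model of the pattern only (names like `adapter_configure` are hypothetical, not the real cxl API): allocation and the adapter number happen once, while configure/deconfigure can be re-run during error recovery.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical adapter: illustrates the once-only vs repeatable split
 * discussed in the patch, not the real struct cxl. */
struct adapter {
    int num;          /* allocated once, survives error recovery */
    int configured;   /* set by configure, cleared by deconfigure */
};

static int next_adapter_nr;

/* Once-only work: memory and the adapter number. Paired with
 * adapter_release(). */
static struct adapter *adapter_alloc(void)
{
    struct adapter *a = calloc(1, sizeof(*a));
    if (!a)
        return NULL;
    a->num = next_adapter_nr++;
    return a;
}

static void adapter_release(struct adapter *a)
{
    free(a);
}

/* Repeatable work: wires up resources; safe to run again after
 * deconfigure, e.g. during error recovery. */
static int adapter_configure(struct adapter *a)
{
    a->configured = 1;   /* stands in for mapping BARs, IRQs, ... */
    return 0;
}

static void adapter_deconfigure(struct adapter *a)
{
    a->configured = 0;   /* stands in for unmapping */
}

/* Recovery redoes only the configure step; the adapter identity
 * (number, sysfs name, ...) is untouched. */
static int adapter_recover(struct adapter *a)
{
    adapter_deconfigure(a);
    return adapter_configure(a);
}
```

The point of the split is visible in `adapter_recover()`: it never frees or reallocates, so the adapter keeps its number and device name across a reset.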
[PATCH 02/31] scatterlist: use sg_phys()
From: Dan Williams dan.j.willi...@intel.com Coccinelle cleanup to replace open coded sg to physical address translations. This is in preparation for introducing scatterlists that reference __pfn_t. // sg_phys.cocci: convert usage page_to_phys(sg_page(sg)) to sg_phys(sg) // usage: make coccicheck COCCI=sg_phys.cocci MODE=patch virtual patch @@ struct scatterlist *sg; @@ - page_to_phys(sg_page(sg)) + sg-offset + sg_phys(sg) @@ struct scatterlist *sg; @@ - page_to_phys(sg_page(sg)) + sg_phys(sg) PAGE_MASK Signed-off-by: Dan Williams dan.j.willi...@intel.com --- arch/arm/mm/dma-mapping.c| 2 +- arch/microblaze/kernel/dma.c | 3 +-- drivers/iommu/intel-iommu.c | 4 ++-- drivers/iommu/iommu.c| 2 +- drivers/staging/android/ion/ion_chunk_heap.c | 4 ++-- 5 files changed, 7 insertions(+), 8 deletions(-) diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c index cba12f3..3d3d6aa 100644 --- a/arch/arm/mm/dma-mapping.c +++ b/arch/arm/mm/dma-mapping.c @@ -1520,7 +1520,7 @@ static int __map_sg_chunk(struct device *dev, struct scatterlist *sg, return -ENOMEM; for (count = 0, s = sg; count (size PAGE_SHIFT); s = sg_next(s)) { - phys_addr_t phys = page_to_phys(sg_page(s)); + phys_addr_t phys = sg_phys(s) PAGE_MASK; unsigned int len = PAGE_ALIGN(s-offset + s-length); if (!is_coherent diff --git a/arch/microblaze/kernel/dma.c b/arch/microblaze/kernel/dma.c index bf4dec2..c89da63 100644 --- a/arch/microblaze/kernel/dma.c +++ b/arch/microblaze/kernel/dma.c @@ -61,8 +61,7 @@ static int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, /* FIXME this part of code is untested */ for_each_sg(sgl, sg, nents, i) { sg-dma_address = sg_phys(sg); - __dma_sync(page_to_phys(sg_page(sg)) + sg-offset, - sg-length, direction); + __dma_sync(sg_phys(sg), sg-length, direction); } return nents; diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index 0649b94..3541d65 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -2097,7 +2097,7 @@ 
static int __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn, sg_res = aligned_nrpages(sg-offset, sg-length); sg-dma_address = ((dma_addr_t)iov_pfn VTD_PAGE_SHIFT) + sg-offset; sg-dma_length = sg-length; - pteval = page_to_phys(sg_page(sg)) | prot; + pteval = (sg_phys(sg) PAGE_MASK) | prot; phys_pfn = pteval VTD_PAGE_SHIFT; } @@ -3623,7 +3623,7 @@ static int intel_nontranslate_map_sg(struct device *hddev, for_each_sg(sglist, sg, nelems, i) { BUG_ON(!sg_page(sg)); - sg-dma_address = page_to_phys(sg_page(sg)) + sg-offset; + sg-dma_address = sg_phys(sg); sg-dma_length = sg-length; } return nelems; diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index f286090..049df49 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -1408,7 +1408,7 @@ size_t default_iommu_map_sg(struct iommu_domain *domain, unsigned long iova, min_pagesz = 1 __ffs(domain-ops-pgsize_bitmap); for_each_sg(sg, s, nents, i) { - phys_addr_t phys = page_to_phys(sg_page(s)) + s-offset; + phys_addr_t phys = sg_phys(s); /* * We are mapping on IOMMU page boundaries, so offset within diff --git a/drivers/staging/android/ion/ion_chunk_heap.c b/drivers/staging/android/ion/ion_chunk_heap.c index 5474615..f7b6ef9 100644 --- a/drivers/staging/android/ion/ion_chunk_heap.c +++ b/drivers/staging/android/ion/ion_chunk_heap.c @@ -81,7 +81,7 @@ static int ion_chunk_heap_allocate(struct ion_heap *heap, err: sg = table-sgl; for (i -= 1; i = 0; i--) { - gen_pool_free(chunk_heap-pool, page_to_phys(sg_page(sg)), + gen_pool_free(chunk_heap-pool, sg_phys(sg) PAGE_MASK, sg-length); sg = sg_next(sg); } @@ -109,7 +109,7 @@ static void ion_chunk_heap_free(struct ion_buffer *buffer) DMA_BIDIRECTIONAL); for_each_sg(table-sgl, sg, table-nents, i) { - gen_pool_free(chunk_heap-pool, page_to_phys(sg_page(sg)), + gen_pool_free(chunk_heap-pool, sg_phys(sg) PAGE_MASK, sg-length); } chunk_heap-allocated -= allocated_size; -- 1.9.1 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org 
https://lists.ozlabs.org/listinfo/linuxppc-dev
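The Coccinelle rules above rely on the identity `sg_phys(sg) == page_to_phys(sg_page(sg)) + sg->offset`, with a `& PAGE_MASK` recovering the page-aligned address where the old code wanted `page_to_phys()` alone. A minimal mock (not the real scatterlist implementation) to illustrate the equivalence:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))

/* Toy scatterlist entry: a page frame number plus an offset, standing
 * in for struct scatterlist. */
struct sg_mock {
    unsigned long pfn;     /* what sg_page()/page_to_phys() would see */
    unsigned int offset;   /* sg->offset */
};

static uint64_t page_to_phys_mock(unsigned long pfn)
{
    return (uint64_t)pfn << PAGE_SHIFT;
}

/* Mirrors the kernel's sg_phys(): the physical address of the data,
 * i.e. the page's physical address plus the intra-page offset. */
static uint64_t sg_phys_mock(const struct sg_mock *sg)
{
    return page_to_phys_mock(sg->pfn) + sg->offset;
}
```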
[PATCH 06/31] alpha/pci-noop: handle page-less SG entries
Use sg_phys() instead of virt_to_phys(sg_virt(sg)) so that we don't require a kernel virtual address.

Signed-off-by: Christoph Hellwig h...@lst.de
---
 arch/alpha/kernel/pci-noop.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/arch/alpha/kernel/pci-noop.c b/arch/alpha/kernel/pci-noop.c
index df24b76..7319151 100644
--- a/arch/alpha/kernel/pci-noop.c
+++ b/arch/alpha/kernel/pci-noop.c
@@ -145,11 +145,7 @@ static int alpha_noop_map_sg(struct device *dev, struct scatterlist *sgl, int ne
 	struct scatterlist *sg;

 	for_each_sg(sgl, sg, nents, i) {
-		void *va;
-
-		BUG_ON(!sg_page(sg));
-		va = sg_virt(sg);
-		sg_dma_address(sg) = (dma_addr_t)virt_to_phys(va);
+		sg_dma_address(sg) = (dma_addr_t)sg_phys(sg);
 		sg_dma_len(sg) = sg->length;
 	}
--
1.9.1
[PATCH 10/31] powerpc/iommu: handle page-less SG entries
For the iommu offset we just need and offset into the page. Calculate that using the physical address instead of using the virtual address so that we don't require a virtual mapping. Signed-off-by: Christoph Hellwig h...@lst.de --- arch/powerpc/kernel/iommu.c | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c index a8e3490..0f52e40 100644 --- a/arch/powerpc/kernel/iommu.c +++ b/arch/powerpc/kernel/iommu.c @@ -457,7 +457,7 @@ int ppc_iommu_map_sg(struct device *dev, struct iommu_table *tbl, max_seg_size = dma_get_max_seg_size(dev); for_each_sg(sglist, s, nelems, i) { - unsigned long vaddr, npages, entry, slen; + unsigned long paddr, npages, entry, slen; slen = s-length; /* Sanity check */ @@ -466,22 +466,22 @@ int ppc_iommu_map_sg(struct device *dev, struct iommu_table *tbl, continue; } /* Allocate iommu entries for that segment */ - vaddr = (unsigned long) sg_virt(s); - npages = iommu_num_pages(vaddr, slen, IOMMU_PAGE_SIZE(tbl)); + paddr = sg_phys(s); + npages = iommu_num_pages(paddr, slen, IOMMU_PAGE_SIZE(tbl)); align = 0; if (tbl-it_page_shift PAGE_SHIFT slen = PAGE_SIZE - (vaddr ~PAGE_MASK) == 0) + (paddr ~PAGE_MASK) == 0) align = PAGE_SHIFT - tbl-it_page_shift; entry = iommu_range_alloc(dev, tbl, npages, handle, mask tbl-it_page_shift, align); - DBG( - vaddr: %lx, size: %lx\n, vaddr, slen); + DBG( - paddr: %lx, size: %lx\n, paddr, slen); /* Handle failure */ if (unlikely(entry == DMA_ERROR_CODE)) { if (printk_ratelimit()) dev_info(dev, iommu_alloc failed, tbl %p -vaddr %lx npages %lu\n, tbl, vaddr, +paddr %lx npages %lu\n, tbl, paddr, npages); goto failure; } @@ -496,7 +496,7 @@ int ppc_iommu_map_sg(struct device *dev, struct iommu_table *tbl, /* Insert into HW table */ build_fail = tbl-it_ops-set(tbl, entry, npages, - vaddr IOMMU_PAGE_MASK(tbl), + paddr IOMMU_PAGE_MASK(tbl), direction, attrs); if(unlikely(build_fail)) goto failure; -- 1.9.1 ___ Linuxppc-dev mailing list 
Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
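The switch from vaddr to paddr works because iommu_num_pages() only needs the offset of the address within a page plus the length; that offset is the same for the physical and the virtual address. A hedged sketch of such a page-count calculation (a simplified model, not the kernel's exact iommu_num_pages()):

```c
#include <assert.h>

/* Number of IOMMU pages a buffer spans, given any address (virtual or
 * physical -- only addr modulo page_size matters) and a length in
 * bytes. page_size must be a power of two. */
static unsigned long iommu_num_pages_mock(unsigned long addr,
                                          unsigned long len,
                                          unsigned long page_size)
{
    unsigned long start = addr & ~(page_size - 1);               /* align down */
    unsigned long end = (addr + len + page_size - 1) & ~(page_size - 1); /* align up */
    return (end - start) / page_size;
}
```

This also shows why the alignment test in the patch uses `(paddr & ~PAGE_MASK) == 0` unchanged: page alignment of the physical address is exactly page alignment of the virtual address for a linearly mapped buffer.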
[PATCH 18/31] nios2: handle page-less SG entries
Make all cache invalidation conditional on sg_has_page() and use sg_phys to get the physical address directly. Signed-off-by: Christoph Hellwig h...@lst.de --- arch/nios2/mm/dma-mapping.c | 29 +++-- 1 file changed, 15 insertions(+), 14 deletions(-) diff --git a/arch/nios2/mm/dma-mapping.c b/arch/nios2/mm/dma-mapping.c index ac5da75..1a0a68d 100644 --- a/arch/nios2/mm/dma-mapping.c +++ b/arch/nios2/mm/dma-mapping.c @@ -64,13 +64,11 @@ int dma_map_sg(struct device *dev, struct scatterlist *sg, int nents, BUG_ON(!valid_dma_direction(direction)); for_each_sg(sg, sg, nents, i) { - void *addr; - - addr = sg_virt(sg); - if (addr) { - __dma_sync_for_device(addr, sg-length, direction); - sg-dma_address = sg_phys(sg); + if (sg_has_page(sg)) { + __dma_sync_for_device(sg_virt(sg), sg-length, + direction); } + sg-dma_address = sg_phys(sg); } return nents; @@ -113,9 +111,8 @@ void dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nhwentries, return; for_each_sg(sg, sg, nhwentries, i) { - addr = sg_virt(sg); - if (addr) - __dma_sync_for_cpu(addr, sg-length, direction); + if (sg_has_page(sg)) + __dma_sync_for_cpu(sg_virt(sg), sg-length, direction); } } EXPORT_SYMBOL(dma_unmap_sg); @@ -166,8 +163,10 @@ void dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg, int nelems, BUG_ON(!valid_dma_direction(direction)); /* Make sure that gcc doesn't leave the empty loop body. */ - for_each_sg(sg, sg, nelems, i) - __dma_sync_for_cpu(sg_virt(sg), sg-length, direction); + for_each_sg(sg, sg, nelems, i) { + if (sg_has_page(sg)) + __dma_sync_for_cpu(sg_virt(sg), sg-length, direction); + } } EXPORT_SYMBOL(dma_sync_sg_for_cpu); @@ -179,8 +178,10 @@ void dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg, BUG_ON(!valid_dma_direction(direction)); /* Make sure that gcc doesn't leave the empty loop body. 
*/ - for_each_sg(sg, sg, nelems, i) - __dma_sync_for_device(sg_virt(sg), sg-length, direction); - + for_each_sg(sg, sg, nelems, i) { + if (sg_has_page(sg)) + __dma_sync_for_device(sg_virt(sg), sg-length, + direction); + } } EXPORT_SYMBOL(dma_sync_sg_for_device); -- 1.9.1 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
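The pattern in this and the following arch patches is the same throughout: the DMA address always comes from sg_phys(), while cache maintenance, which needs a kernel virtual address, is skipped for page-less entries. A schematic version, with sg_has_page() modelled as a flag rather than the real implementation:

```c
#include <assert.h>
#include <stdint.h>

struct sg_mock {
    uint64_t phys;      /* what sg_phys() would return */
    int has_page;       /* sg_has_page(): a kernel mapping exists */
    uint64_t dma_address;
};

static int cache_syncs;  /* counts __dma_sync_for_device() calls */

static void dma_sync_for_device_mock(uint64_t phys)
{
    (void)phys;
    cache_syncs++;
}

/* Map a scatterlist: always set the DMA address from the physical
 * address, but only touch the cache when a kernel virtual mapping
 * exists for the entry. */
static int map_sg_mock(struct sg_mock *sg, int nents)
{
    for (int i = 0; i < nents; i++) {
        if (sg[i].has_page)
            dma_sync_for_device_mock(sg[i].phys);
        sg[i].dma_address = sg[i].phys;
    }
    return nents;
}
```

Note the subtle fix the nios2 hunk also makes: `dma_address` is assigned unconditionally, outside the `sg_has_page()` branch, so page-less entries still get a valid DMA address.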
[PATCH 25/31] frv: handle page-less SG entries
Only call kmap_atomic_primary when the SG entry is mapped into kernel virtual space.

XXX: the code already looks odd due to the lack of pairing between kmap_atomic_primary and kunmap_atomic_primary. Does it work either before or after this patch?

Signed-off-by: Christoph Hellwig h...@lst.de
---
 arch/frv/mb93090-mb00/pci-dma.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/frv/mb93090-mb00/pci-dma.c b/arch/frv/mb93090-mb00/pci-dma.c
index 4d1f01d..77b3a1c 100644
--- a/arch/frv/mb93090-mb00/pci-dma.c
+++ b/arch/frv/mb93090-mb00/pci-dma.c
@@ -63,6 +63,9 @@ int dma_map_sg(struct device *dev, struct scatterlist *sglist, int nents,
 	dampr2 = __get_DAMPR(2);

 	for_each_sg(sglist, sg, nents, i) {
+		if (!sg_has_page(sg))
+			continue;
+
 		vaddr = kmap_atomic_primary(sg_page(sg));
 		frv_dcache_writeback((unsigned long) vaddr,
--
1.9.1
[PATCH 30/31] intel-iommu: handle page-less SG entries
Just remove a BUG_ON; the code handles page-less SG entries just fine as-is.

Signed-off-by: Christoph Hellwig h...@lst.de
---
 drivers/iommu/intel-iommu.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 3541d65..ae10573 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3622,7 +3622,6 @@ static int intel_nontranslate_map_sg(struct device *hddev,
 	struct scatterlist *sg;

 	for_each_sg(sglist, sg, nelems, i) {
-		BUG_ON(!sg_page(sg));
 		sg->dma_address = sg_phys(sg);
 		sg->dma_length = sg->length;
 	}
--
1.9.1
[PATCH 29/31] parisc: handle page-less SG entries
Make all cache invalidation conditional on sg_has_page() and use sg_phys to get the physical address directly. Signed-off-by: Christoph Hellwig h...@lst.de --- arch/parisc/kernel/pci-dma.c | 29 ++--- 1 file changed, 18 insertions(+), 11 deletions(-) diff --git a/arch/parisc/kernel/pci-dma.c b/arch/parisc/kernel/pci-dma.c index b9402c9..6cad0e0 100644 --- a/arch/parisc/kernel/pci-dma.c +++ b/arch/parisc/kernel/pci-dma.c @@ -483,11 +483,13 @@ static int pa11_dma_map_sg(struct device *dev, struct scatterlist *sglist, int n BUG_ON(direction == DMA_NONE); for_each_sg(sglist, sg, nents, i) { - unsigned long vaddr = (unsigned long)sg_virt(sg); - - sg_dma_address(sg) = (dma_addr_t) virt_to_phys(vaddr); + sg_dma_address(sg) = sg_phys(sg); sg_dma_len(sg) = sg-length; - flush_kernel_dcache_range(vaddr, sg-length); + + if (sg_has_page(sg)) { + flush_kernel_dcache_range((unsigned long)sg_virt(sg), + sg-length); + } } return nents; } @@ -504,9 +506,10 @@ static void pa11_dma_unmap_sg(struct device *dev, struct scatterlist *sglist, in /* once we do combining we'll need to use phys_to_virt(sg_dma_address(sglist)) */ - for_each_sg(sglist, sg, nents, i) - flush_kernel_vmap_range(sg_virt(sg), sg-length); - return; + for_each_sg(sglist, sg, nents, i) { + if (sg_has_page(sg)) + flush_kernel_vmap_range(sg_virt(sg), sg-length); + } } static void pa11_dma_sync_single_for_cpu(struct device *dev, dma_addr_t dma_handle, unsigned long offset, size_t size, enum dma_data_direction direction) @@ -530,8 +533,10 @@ static void pa11_dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sgl /* once we do combining we'll need to use phys_to_virt(sg_dma_address(sglist)) */ - for_each_sg(sglist, sg, nents, i) - flush_kernel_vmap_range(sg_virt(sg), sg-length); + for_each_sg(sglist, sg, nents, i) { + if (sg_has_page(sg)) + flush_kernel_vmap_range(sg_virt(sg), sg-length); + } } static void pa11_dma_sync_sg_for_device(struct device *dev, struct scatterlist *sglist, int nents, enum dma_data_direction 
direction) @@ -541,8 +546,10 @@ static void pa11_dma_sync_sg_for_device(struct device *dev, struct scatterlist * /* once we do combining we'll need to use phys_to_virt(sg_dma_address(sglist)) */ - for_each_sg(sglist, sg, nents, i) - flush_kernel_vmap_range(sg_virt(sg), sg-length); + for_each_sg(sglist, sg, nents, i) { + if (sg_has_page(sg)) + flush_kernel_vmap_range(sg_virt(sg), sg-length); + } } struct hppa_dma_ops pcxl_dma_ops = { -- 1.9.1 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 19/31] arc: handle page-less SG entries
On Wednesday 12 August 2015 12:39 PM, Christoph Hellwig wrote: Make all cache invalidation conditional on sg_has_page() and use sg_phys to get the physical address directly. Signed-off-by: Christoph Hellwig h...@lst.de With a minor nit below. Acked-by: Vineet Gupta vgu...@synopsys.com --- arch/arc/include/asm/dma-mapping.h | 26 +++--- 1 file changed, 19 insertions(+), 7 deletions(-) diff --git a/arch/arc/include/asm/dma-mapping.h b/arch/arc/include/asm/dma-mapping.h index 2d28ba9..42eb526 100644 --- a/arch/arc/include/asm/dma-mapping.h +++ b/arch/arc/include/asm/dma-mapping.h @@ -108,9 +108,13 @@ dma_map_sg(struct device *dev, struct scatterlist *sg, struct scatterlist *s; int i; - for_each_sg(sg, s, nents, i) - s-dma_address = dma_map_page(dev, sg_page(s), s-offset, -s-length, dir); + for_each_sg(sg, s, nents, i) { + if (sg_has_page(s)) { + _dma_cache_sync((unsigned long)sg_virt(s), s-length, + dir); + } + s-dma_address = sg_phys(s); + } return nents; } @@ -163,8 +167,12 @@ dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sglist, int nelems, int i; struct scatterlist *sg; - for_each_sg(sglist, sg, nelems, i) - _dma_cache_sync((unsigned int)sg_virt(sg), sg-length, dir); + for_each_sg(sglist, sg, nelems, i) { + if (sg_has_page(sg)) { + _dma_cache_sync((unsigned int)sg_virt(sg), sg-length, + dir); + } + } } static inline void @@ -174,8 +182,12 @@ dma_sync_sg_for_device(struct device *dev, struct scatterlist *sglist, int i; struct scatterlist *sg; - for_each_sg(sglist, sg, nelems, i) - _dma_cache_sync((unsigned int)sg_virt(sg), sg-length, dir); + for_each_sg(sglist, sg, nelems, i) { + if (sg_has_page(sg)) { + _dma_cache_sync((unsigned int)sg_virt(sg), sg-length, + dir); For consistency, could u please fix the left alignment of @dir above - another tab perhaps ? + } + } } static inline int dma_supported(struct device *dev, u64 dma_mask) ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v6 03/42] powerpc/powernv: Enable M64 on P7IOC
On Tue, Aug 11, 2015 at 12:06:26PM +1000, Alexey Kardashevskiy wrote: On 08/11/2015 09:45 AM, Gavin Shan wrote: On Mon, Aug 10, 2015 at 04:30:09PM +1000, Alexey Kardashevskiy wrote: On 08/06/2015 02:11 PM, Gavin Shan wrote: The patch enables M64 window on P7IOC, which has been enabled on PHB3. Different from PHB3 where 16 M64 BARs are supported and each of them can be owned by one particular PE# exclusively or divided evenly to 256 segments, each P7IOC PHB has 16 M64 BARs and each of them are divided into 8 segments. Is this a limitation of POWER7 chip or it is from IODA1? From IODA1. So each P7IOC PHB can support 128 M64 segments only. Also, P7IOC has M64DT, which helps mapping one particular M64 segment# to arbitrary PE#. PHB3 doesn't have M64DT, indicating that one M64 segment can only be pinned to the fixed PE#. In order to have similar logic to support M64 for PHB3 and P7IOC, we just provide 128 M64 (16 BARs) segments and fixed mapping between PE# and M64 segment# on P7IOC. In turn, we just need different phb-init_m64() hooks for P7IOC and PHB3 to support M64. Signed-off-by: Gavin Shan gws...@linux.vnet.ibm.com --- arch/powerpc/platforms/powernv/pci-ioda.c | 116 ++ 1 file changed, 104 insertions(+), 12 deletions(-) diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index 38b5405..e4ac703 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -172,6 +172,69 @@ static void pnv_ioda_free_pe(struct pnv_phb *phb, int pe) clear_bit(pe, phb-ioda.pe_alloc); } +static int pnv_ioda1_init_m64(struct pnv_phb *phb) +{ + struct resource *r; + int seg; + + /* There are as many M64 segments as the maximum number +* of PEs, which is 128. +*/ + for (seg = 0; seg phb-ioda.total_pe; seg += 8) { This 8 is used a lot across the patch, please make it a macro (PNV_PHB_P7IOC_SEGNUM or PNV_PHB_IODA1_SEGNUM or whatever you think it is) with a short comment why it is 8. Or a pnv_phb member. 
I would like to use 8. When having a macro, you have to check the definition of the macro to get the real value of that. Give it a good name then. However, it makes sense to add more comments explaining why it's 8 here. You cannot comment it everywhere and everywhere is exact place when you'll have to comment it as I believe sometime it is segments-per-M64 and sometime it is number of bits in a byte (or not? anyway, this is will always distract unless you use macro for segments-per-M64). Ok. I will use PNV_PHB_IODA1_SEGNUM then. + unsigned long base; + int64_t rc; + + base = phb-ioda.m64_base + seg * phb-ioda.m64_segsize; + rc = opal_pci_set_phb_mem_window(phb-opal_id, +OPAL_M64_WINDOW_TYPE, +seg / 8, +base, +0, /* unused */ +8 * phb-ioda.m64_segsize); + if (rc != OPAL_SUCCESS) { + pr_warn( Error %lld setting M64 PHB#%d-BAR#%d\n, + rc, phb-hose-global_number, seg / 8); + goto fail; + } + + rc = opal_pci_phb_mmio_enable(phb-opal_id, + OPAL_M64_WINDOW_TYPE, + seg / 8, + OPAL_ENABLE_M64_SPLIT); + if (rc != OPAL_SUCCESS) { + pr_warn( Error %lld enabling M64 PHB#%d-BAR#%d\n, + rc, phb-hose-global_number, seg / 8); + goto fail; + } + } + + /* Strip off the segment used by the reserved PE, which What is this reserved PE on P7IOC? Strip off means exclude here? 127 that was exported from skiboot. Strip off means exclude. I like exclude lot better. Ok. Will use it. +* is expected to be 0 or last supported PE#. The PHB's +* first memory window traces the 32-bits MMIO range s/traces/filters/ ? Or I did not understand this comment... It seems you didn't understand it: there are two memory windows in every PHB. The first one is tracing M32 resource and the second one is tracing M64 resource. Tracing means logging, pretty much. Is this what you mean here? No, it means recording, not logging. So it would be appropriate to replace it with track? +* while the second one traces the 64-bits prefetchable +* MMIO range that the PHB supports. 32/64 ranges comment seems irrelevant here. 
Maybe it's not so relevant, but still. Not relevant - remove it. Put this text to the commit log. Ok. We're stripping off the M64 segment from the 2nd resource (as above), not first one. 2nd window (not _resource_), you mean? I mean struct pci_controller::mem_resources[1]. +*/ + r =
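The IODA1 geometry being debated above (16 M64 BARs, each split into 8 segments, 128 segments total) makes the repeated `seg / 8` and `8 * segsize` expressions simple arithmetic. A sketch, using the kind of named constant the review asks for (the helper names here are hypothetical, mirroring the loop in pnv_ioda1_init_m64()):

```c
#include <assert.h>
#include <stdint.h>

/* IODA1 (P7IOC): each M64 BAR is divided evenly into 8 segments. */
#define PNV_IODA1_M64_SEGS_PER_BAR 8

/* Which M64 BAR a given segment number lives in. */
static unsigned int m64_bar_index(unsigned int seg)
{
    return seg / PNV_IODA1_M64_SEGS_PER_BAR;
}

/* Base address of the window starting at a given segment: the M64
 * base plus a whole number of equally sized segments. */
static uint64_t m64_window_base(uint64_t m64_base, uint64_t segsize,
                                unsigned int seg)
{
    return m64_base + (uint64_t)seg * segsize;
}
```

With 128 segments this yields BAR indices 0 through 15, matching the 16 M64 BARs per P7IOC PHB, and each opal_pci_set_phb_mem_window() call covers `PNV_IODA1_M64_SEGS_PER_BAR * segsize` bytes.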
Re: [PATCH v6 05/42] powerpc/powernv: Track IO/M32/M64 segments from PE
On Wed, Aug 12, 2015 at 09:05:09PM +1000, Alexey Kardashevskiy wrote: On 08/12/2015 08:45 PM, Gavin Shan wrote: On Tue, Aug 11, 2015 at 12:23:42PM +1000, Alexey Kardashevskiy wrote: On 08/11/2015 10:03 AM, Gavin Shan wrote: On Mon, Aug 10, 2015 at 05:16:40PM +1000, Alexey Kardashevskiy wrote: On 08/06/2015 02:11 PM, Gavin Shan wrote: The patch is adding 6 bitmaps, three to PE and three to PHB, to track The patch is also removing 2 arrays (io_segmap and m32_segmap), what is that all about? Also, there was no m64_segmap, now there is, needs an explanation may be. Originally, the bitmaps (io_segmap and m32_segmap) are allocated dynamically. Now, they have fixed sizes - 512 bits. The subject powerpc/powernv: Track IO/M32/M64 segments from PE indicates why m64_segmap is added. But before this patch, you somehow managed to keep it working without a map for M64, by the same time you needed map for IO and M32. It seems you are making things consistent in this patch but it also feels like you do not have to do so as M64 did not need a map before and I cannot see why it needs one now. The M64 map is used by [PATCH v6 23/42] powerpc/powernv: Release PEs dynamically where the M64 segments consumed by one particular PE will be released. Then add it where it is really started being used. It is really hard to review a patch which is actually spread between patches. Do not count that reviewers will just trust you. Ok. I'll try. the consumed by one particular PE, which can be released once the PE is destroyed during PCI unplugging time. Also, we're using fixed quantity of bits to trace the used IO, M32 and M64 segments by PEs in one particular PHB. Out of curiosity - have you considered having just 3 arrays, in PHB, storing PE numbers, and ditching PE's arrays? Does PE itself need to know what PEs it is using? Not sure about this master/slave PEs though. I don't follow your suggestion. Can you rephrase and explain it a bit more? 
Please explains in what situations you need same map in both PHB and PE and how you are going to use them. For example, pe::m64_segmap and phb::m64_segmap. I believe you need to know what segment is used by what PE and that's it and having 2 bitmaps is overcomplicated hard to follow. Is there anything else what I am missing? The situation is same to all (IO, M32 and M64) segment maps. Taking m64_segmap as an example, it will be used when creating or destroying the PE who consumes M64 segments. phb::m64_segmap is recording the M64 segment usage in PHB's domain. It's used to check same M64 segment won't be used for towice. pe::m64_segmap tracks the M64 segments consumed by the PE. You could have a single map in PHB, key would be a segment number and value would be PE number. No need to have a map in PE. At all. No need to initialize bitmaps, etc. So it would be arrays for various segmant maps if I understood your suggestion as below. Please confirm: #define PNV_IODA_MAX_SEG_NUM512 int struct pnv_phb::io_segmap[PNV_IODA_MAX_SEG_NUM]; m32_segmap[PNV_IODA_MAX_SEG_NUM]; m64_segmap[PNV_IODA_MAX_SEG_NUM]; - Initially, they are initialize to IODA_INVALID_PE; - When one segment is assigned to one PE, the corresponding entry of the array is set to PE number. - When one segment is relased, the corresponding entry of the array is set to IODA_INVALID_PE; It would be easier to read patches if this one was right before [PATCH v6 23/42] powerpc/powernv: Release PEs dynamically I'll try to reoder the patch, but not expect too much... 
Signed-off-by: Gavin Shan gws...@linux.vnet.ibm.com --- arch/powerpc/platforms/powernv/pci-ioda.c | 29 +++-- arch/powerpc/platforms/powernv/pci.h | 18 ++ 2 files changed, 29 insertions(+), 18 deletions(-) diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index e4ac703..78b49a1 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -388,6 +388,12 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all) list_add_tail(pe-list, master_pe-slaves); } + /* M64 segments consumed by slave PEs are tracked + * by master PE + */ + set_bit(pe-pe_number, master_pe-m64_segmap); + set_bit(pe-pe_number, phb-ioda.m64_segmap); + /* P7IOC supports M64DT, which helps mapping M64 segment * to one particular PE#. However, PHB3 has fixed mapping * between M64 segment and PE#. In order to have same logic @@ -2871,9 +2877,11 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose, while (index phb-ioda.total_pe region.start = region.end) { - phb-ioda.io_segmap[index] = pe-pe_number; +
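The alternative restated at the end of the exchange, one array per window type in the PHB, indexed by segment number and holding the owning PE number, replaces both the PHB bitmap (double-use check) and the per-PE bitmap (release on unplug). A sketch of that scheme, with hypothetical helper names:

```c
#include <assert.h>

#define PNV_IODA_MAX_SEG_NUM 512
#define IODA_INVALID_PE      (-1)

/* One entry per M64 segment; the value is the owning PE number. */
static int m64_segmap[PNV_IODA_MAX_SEG_NUM];

static void segmap_init(void)
{
    for (int i = 0; i < PNV_IODA_MAX_SEG_NUM; i++)
        m64_segmap[i] = IODA_INVALID_PE;
}

/* Claim a segment for a PE; fails if already owned -- the "same
 * segment won't be used twice" check the PHB bitmap provides. */
static int segmap_claim(int seg, int pe)
{
    if (m64_segmap[seg] != IODA_INVALID_PE)
        return -1;
    m64_segmap[seg] = pe;
    return 0;
}

/* Release every segment owned by a PE on hot-unplug -- the role of
 * the per-PE bitmap, done with a linear scan instead. */
static void segmap_release_pe(int pe)
{
    for (int i = 0; i < PNV_IODA_MAX_SEG_NUM; i++)
        if (m64_segmap[i] == pe)
            m64_segmap[i] = IODA_INVALID_PE;
}
```

The trade-off is memory and scan cost (one int per segment, a 512-entry walk on release) against the two-bitmap scheme's need to keep both maps in sync.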
RE: [PATCH v2 05/10] cxl: Refactor adaptor init/teardown
From: Cyril Bur Sent: 11 August 2015 07:01 ... You have a dilema with the use of ugly if (rc = foo()). I don't like it but the file is littered with it. Looks like the majority of uses in this file the conditional block is only one line then it makes sense (or at least in terms of numbers of lines... fair enough), however, if you have a conditional block spanning multiple lines, I don't like. ... kfree(adapter); } -static struct cxl *cxl_alloc_adapter(struct pci_dev *dev) +static struct cxl *cxl_alloc_adapter(void) { struct cxl *adapter; + int rc; if (!(adapter = kzalloc(sizeof(struct cxl), GFP_KERNEL))) return NULL; - adapter-dev.parent = dev-dev; - adapter-dev.release = cxl_release_adapter; - pci_set_drvdata(dev, adapter); spin_lock_init(adapter-afu_list_lock); + if ((rc = cxl_alloc_adapter_nr(adapter))) Humf + goto err1; + + if ((rc = dev_set_name(adapter-dev, card%i, adapter-adapter_num))) Humf + goto err2; + return adapter; + +err2: + cxl_remove_adapter_nr(adapter); +err1: + kfree(adapter); + return NULL; } ... The function above doesn't even use the 'rc' value. David ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
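David's two objections are to the `if ((rc = foo()))` idiom when the conditional body spans multiple lines, and to computing an `rc` that is never propagated. A style-only rewrite of such a ladder, with hypothetical step functions, separates the assignment from the test and actually returns the error value:

```c
#include <assert.h>

/* Stand-ins for configuration steps; the globals let the tests force
 * a failure at a chosen step. */
static int step_a_ok = 1, step_b_ok = 1;

static int step_a(void) { return step_a_ok ? 0 : -1; }
static int step_b(void) { return step_b_ok ? 0 : -2; }

/* Assignment on its own line, test on the next: the error value is
 * both visible at a glance and propagated to the caller, addressing
 * both complaints in the review. */
static int configure(void)
{
    int rc;

    rc = step_a();
    if (rc)
        return rc;

    rc = step_b();
    if (rc)
        return rc;

    return 0;
}
```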
Re: [PATCH 20/31] avr32: handle page-less SG entries
Around Wed 12 Aug 2015 09:05:39 +0200 or thereabout, Christoph Hellwig wrote: Make all cache invalidation conditional on sg_has_page() and use sg_phys to get the physical address directly, bypassing the noop page_to_bus. Signed-off-by: Christoph Hellwig h...@lst.de Acked-by: Hans-Christian Egtvedt egtv...@samfundet.no --- arch/avr32/include/asm/dma-mapping.h | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/arch/avr32/include/asm/dma-mapping.h b/arch/avr32/include/asm/dma-mapping.h index ae7ac92..a662ce2 100644 --- a/arch/avr32/include/asm/dma-mapping.h +++ b/arch/avr32/include/asm/dma-mapping.h @@ -216,11 +216,9 @@ dma_map_sg(struct device *dev, struct scatterlist *sglist, int nents, struct scatterlist *sg; for_each_sg(sglist, sg, nents, i) { - char *virt; - - sg-dma_address = page_to_bus(sg_page(sg)) + sg-offset; - virt = sg_virt(sg); - dma_cache_sync(dev, virt, sg-length, direction); + sg-dma_address = sg_phys(sg); + if (sg_has_page(sg)) + dma_cache_sync(dev, sg_virt(sg), sg-length, direction); } return nents; @@ -328,8 +326,10 @@ dma_sync_sg_for_device(struct device *dev, struct scatterlist *sglist, int i; struct scatterlist *sg; - for_each_sg(sglist, sg, nents, i) - dma_cache_sync(dev, sg_virt(sg), sg-length, direction); + for_each_sg(sglist, sg, nents, i) { + if (sg_has_page(sg)) + dma_cache_sync(dev, sg_virt(sg), sg-length, direction); + } } /* Now for the API extensions over the pci_ one */ -- mvh Hans-Christian Egtvedt ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v6 05/42] powerpc/powernv: Track IO/M32/M64 segments from PE
On Tue, Aug 11, 2015 at 12:23:42PM +1000, Alexey Kardashevskiy wrote: On 08/11/2015 10:03 AM, Gavin Shan wrote: On Mon, Aug 10, 2015 at 05:16:40PM +1000, Alexey Kardashevskiy wrote: On 08/06/2015 02:11 PM, Gavin Shan wrote: The patch is adding 6 bitmaps, three to PE and three to PHB, to track The patch is also removing 2 arrays (io_segmap and m32_segmap), what is that all about? Also, there was no m64_segmap, now there is, needs an explanation may be. Originally, the bitmaps (io_segmap and m32_segmap) are allocated dynamically. Now, they have fixed sizes - 512 bits. The subject powerpc/powernv: Track IO/M32/M64 segments from PE indicates why m64_segmap is added. But before this patch, you somehow managed to keep it working without a map for M64, by the same time you needed map for IO and M32. It seems you are making things consistent in this patch but it also feels like you do not have to do so as M64 did not need a map before and I cannot see why it needs one now. The M64 map is used by [PATCH v6 23/42] powerpc/powernv: Release PEs dynamically where the M64 segments consumed by one particular PE will be released. the consumed by one particular PE, which can be released once the PE is destroyed during PCI unplugging time. Also, we're using fixed quantity of bits to trace the used IO, M32 and M64 segments by PEs in one particular PHB. Out of curiosity - have you considered having just 3 arrays, in PHB, storing PE numbers, and ditching PE's arrays? Does PE itself need to know what PEs it is using? Not sure about this master/slave PEs though. I don't follow your suggestion. Can you rephrase and explain it a bit more? Please explains in what situations you need same map in both PHB and PE and how you are going to use them. For example, pe::m64_segmap and phb::m64_segmap. I believe you need to know what segment is used by what PE and that's it and having 2 bitmaps is overcomplicated hard to follow. Is there anything else what I am missing? 
The situation is same to all (IO, M32 and M64) segment maps. Taking m64_segmap as an example, it will be used when creating or destroying the PE who consumes M64 segments. phb::m64_segmap is recording the M64 segment usage in PHB's domain. It's used to check same M64 segment won't be used for towice. pe::m64_segmap tracks the M64 segments consumed by the PE. It would be easier to read patches if this one was right before [PATCH v6 23/42] powerpc/powernv: Release PEs dynamically I'll try to reoder the patch, but not expect too much... Signed-off-by: Gavin Shan gws...@linux.vnet.ibm.com --- arch/powerpc/platforms/powernv/pci-ioda.c | 29 +++-- arch/powerpc/platforms/powernv/pci.h | 18 ++ 2 files changed, 29 insertions(+), 18 deletions(-) diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index e4ac703..78b49a1 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -388,6 +388,12 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all) list_add_tail(pe-list, master_pe-slaves); } + /* M64 segments consumed by slave PEs are tracked +* by master PE +*/ + set_bit(pe-pe_number, master_pe-m64_segmap); + set_bit(pe-pe_number, phb-ioda.m64_segmap); + /* P7IOC supports M64DT, which helps mapping M64 segment * to one particular PE#. However, PHB3 has fixed mapping * between M64 segment and PE#. In order to have same logic @@ -2871,9 +2877,11 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose, while (index phb-ioda.total_pe region.start = region.end) { - phb-ioda.io_segmap[index] = pe-pe_number; + set_bit(index, pe-io_segmap); + set_bit(index, phb-ioda.io_segmap); rc = opal_pci_map_pe_mmio_window(phb-opal_id, - pe-pe_number, OPAL_IO_WINDOW_TYPE, 0, index); + pe-pe_number, OPAL_IO_WINDOW_TYPE, + 0, index); Unrelated change. True, will drop. However, checkpatch.pl will complain wtih: exceeding 80 characters. 
It will not, as you are not changing these lines; it only complains about changed lines. if (rc != OPAL_SUCCESS) { pr_err("%s: OPAL error %d when mapping IO segment #%d to PE#%d\n", @@ -2896,9 +2904,11 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose, while (index < phb->ioda.total_pe && region.start <= region.end) { - phb->ioda.m32_segmap[index] = pe->pe_number; + set_bit(index, pe->m32_segmap); +
Re: [PATCH v6 05/42] powerpc/powernv: Track IO/M32/M64 segments from PE
On 08/12/2015 08:45 PM, Gavin Shan wrote: On Tue, Aug 11, 2015 at 12:23:42PM +1000, Alexey Kardashevskiy wrote: On 08/11/2015 10:03 AM, Gavin Shan wrote: On Mon, Aug 10, 2015 at 05:16:40PM +1000, Alexey Kardashevskiy wrote: On 08/06/2015 02:11 PM, Gavin Shan wrote: The patch is adding 6 bitmaps, three to PE and three to PHB, to track The patch is also removing 2 arrays (io_segmap and m32_segmap), what is that all about? Also, there was no m64_segmap and now there is; that may need an explanation. Originally, the bitmaps (io_segmap and m32_segmap) are allocated dynamically. Now they have fixed sizes - 512 bits. The subject "powerpc/powernv: Track IO/M32/M64 segments from PE" indicates why m64_segmap is added. But before this patch you somehow managed to keep it working without a map for M64, while at the same time you needed maps for IO and M32. It seems you are making things consistent in this patch, but it also feels like you do not have to, as M64 did not need a map before and I cannot see why it needs one now. The M64 map is used by [PATCH v6 23/42] powerpc/powernv: Release PEs dynamically, where the M64 segments consumed by one particular PE will be released. Then add it where it really starts being used. It is really hard to review a patch which is actually spread across patches. Do not count on reviewers just trusting you. The segments consumed by one particular PE can be released once the PE is destroyed during PCI unplugging time. Also, we're using a fixed quantity of bits to track the IO, M32 and M64 segments used by PEs in one particular PHB. Out of curiosity - have you considered having just 3 arrays in the PHB, storing PE numbers, and ditching the PE's arrays? Does the PE itself need to know which segments it is using? Not sure about the master/slave PEs though. I don't follow your suggestion. Can you rephrase and explain it a bit more? Please explain in which situations you need the same map in both PHB and PE and how you are going to use them.
For example, pe::m64_segmap and phb::m64_segmap. I believe you need to know which segment is used by which PE and that's it; having 2 bitmaps is overcomplicated and hard to follow. Is there anything else that I am missing? The situation is the same for all (IO, M32 and M64) segment maps. Taking m64_segmap as an example, it will be used when creating or destroying the PE that consumes M64 segments. phb::m64_segmap records the M64 segment usage in the PHB's domain; it's used to check that the same M64 segment won't be used twice. pe::m64_segmap tracks the M64 segments consumed by the PE. You could have a single map in the PHB: the key would be a segment number and the value would be a PE number. No need to have a map in the PE at all, and no need to initialize bitmaps, etc. It would be easier to read the patches if this one was right before [PATCH v6 23/42] powerpc/powernv: Release PEs dynamically. I'll try to reorder the patches, but don't expect too much... Signed-off-by: Gavin Shan gws...@linux.vnet.ibm.com --- arch/powerpc/platforms/powernv/pci-ioda.c | 29 +++-- arch/powerpc/platforms/powernv/pci.h | 18 ++ 2 files changed, 29 insertions(+), 18 deletions(-) diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index e4ac703..78b49a1 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -388,6 +388,12 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all) list_add_tail(&pe->list, &master_pe->slaves); } + /* M64 segments consumed by slave PEs are tracked +* by master PE +*/ + set_bit(pe->pe_number, master_pe->m64_segmap); + set_bit(pe->pe_number, phb->ioda.m64_segmap); + /* P7IOC supports M64DT, which helps mapping M64 segment * to one particular PE#. However, PHB3 has fixed mapping * between M64 segment and PE#.
In order to have same logic @@ -2871,9 +2877,11 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose, while (index < phb->ioda.total_pe && region.start <= region.end) { - phb->ioda.io_segmap[index] = pe->pe_number; + set_bit(index, pe->io_segmap); + set_bit(index, phb->ioda.io_segmap); rc = opal_pci_map_pe_mmio_window(phb->opal_id, - pe->pe_number, OPAL_IO_WINDOW_TYPE, 0, index); + pe->pe_number, OPAL_IO_WINDOW_TYPE, + 0, index); Unrelated change. True, will drop. However, checkpatch.pl will complain with: line exceeding 80 characters. It will not, as you are not changing these lines; it only complains about changed lines.
Re: [PATCH v7 2/6] mm: mlock: Add new mlock system call
On Sun 09-08-15 01:22:52, Eric B Munson wrote: With the refactored mlock code, introduce a new system call for mlock. The new call will allow the user to specify what lock states are being added. mlock2 is trivial at the moment, but a follow-on patch will add a new mlock state making it useful. Looks good to me. Acked-by: Michal Hocko mho...@suse.com Signed-off-by: Eric B Munson emun...@akamai.com Acked-by: Vlastimil Babka vba...@suse.cz Cc: Michal Hocko mho...@suse.cz Cc: Vlastimil Babka vba...@suse.cz Cc: Heiko Carstens heiko.carst...@de.ibm.com Cc: Geert Uytterhoeven ge...@linux-m68k.org Cc: Catalin Marinas catalin.mari...@arm.com Cc: Stephen Rothwell s...@canb.auug.org.au Cc: Guenter Roeck li...@roeck-us.net Cc: Andrea Arcangeli aarca...@redhat.com Cc: linux-al...@vger.kernel.org Cc: linux-ker...@vger.kernel.org Cc: linux-arm-ker...@lists.infradead.org Cc: adi-buildroot-de...@lists.sourceforge.net Cc: linux-cris-ker...@axis.com Cc: linux-i...@vger.kernel.org Cc: linux-m...@lists.linux-m68k.org Cc: linux-am33-l...@redhat.com Cc: linux-par...@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-s...@vger.kernel.org Cc: linux...@vger.kernel.org Cc: sparcli...@vger.kernel.org Cc: linux-xte...@linux-xtensa.org Cc: linux-...@vger.kernel.org Cc: linux-a...@vger.kernel.org Cc: linux...@kvack.org --- arch/x86/entry/syscalls/syscall_32.tbl | 1 + arch/x86/entry/syscalls/syscall_64.tbl | 1 + include/linux/syscalls.h | 2 ++ include/uapi/asm-generic/unistd.h | 4 +++- kernel/sys_ni.c | 1 + mm/mlock.c | 8 ++++++++ 6 files changed, 16 insertions(+), 1 deletion(-) diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl index ef8187f..8e06da6 100644 --- a/arch/x86/entry/syscalls/syscall_32.tbl +++ b/arch/x86/entry/syscalls/syscall_32.tbl @@ -365,3 +365,4 @@ 356 i386 memfd_create sys_memfd_create 357 i386 bpf sys_bpf 358 i386 execveat sys_execveat stub32_execveat +360 i386 mlock2 sys_mlock2 diff --git a/arch/x86/entry/syscalls/syscall_64.tbl
b/arch/x86/entry/syscalls/syscall_64.tbl index 9ef32d5..67601e7 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl @@ -329,6 +329,7 @@ 320 common kexec_file_load sys_kexec_file_load 321 common bpf sys_bpf 322 64 execveat stub_execveat +324 common mlock2 sys_mlock2 # # x32-specific system call numbers start at 512 to avoid cache impact diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index b45c45b..56a3d59 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -884,4 +884,6 @@ asmlinkage long sys_execveat(int dfd, const char __user *filename, const char __user *const __user *argv, const char __user *const __user *envp, int flags); +asmlinkage long sys_mlock2(unsigned long start, size_t len, int flags); + #endif diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h index e016bd9..14a6013 100644 --- a/include/uapi/asm-generic/unistd.h +++ b/include/uapi/asm-generic/unistd.h @@ -709,9 +709,11 @@ __SYSCALL(__NR_memfd_create, sys_memfd_create) __SYSCALL(__NR_bpf, sys_bpf) #define __NR_execveat 281 __SC_COMP(__NR_execveat, sys_execveat, compat_sys_execveat) +#define __NR_mlock2 282 +__SYSCALL(__NR_mlock2, sys_mlock2) #undef __NR_syscalls -#define __NR_syscalls 282 +#define __NR_syscalls 283 /* * All syscalls below here should go away really, diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c index 7995ef5..4818b71 100644 --- a/kernel/sys_ni.c +++ b/kernel/sys_ni.c @@ -193,6 +193,7 @@ cond_syscall(sys_mlock); cond_syscall(sys_munlock); cond_syscall(sys_mlockall); cond_syscall(sys_munlockall); +cond_syscall(sys_mlock2); cond_syscall(sys_mincore); cond_syscall(sys_madvise); cond_syscall(sys_mremap); diff --git a/mm/mlock.c b/mm/mlock.c index 5692ee5..3094f27 100644 --- a/mm/mlock.c +++ b/mm/mlock.c @@ -643,6 +643,14 @@ SYSCALL_DEFINE2(mlock, unsigned long, start, size_t, len) return do_mlock(start, len, VM_LOCKED); } +SYSCALL_DEFINE3(mlock2, unsigned long, start, size_t,
len, int, flags) +{ + if (flags) + return -EINVAL; + + return do_mlock(start, len, VM_LOCKED); +} + SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len) { int ret; -- 1.9.1 -- Michal Hocko SUSE Labs ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] powerpc/xmon: Allow limiting the size of the paca display
On Wed, 2015-08-12 at 13:24 +0530, Anshuman Khandual wrote: On 08/12/2015 12:27 PM, Michael Ellerman wrote: @@ -2090,9 +2092,12 @@ static void dump_one_paca(int cpu) printf(" %-*s = %s\n", 16, "present", cpu_present(cpu) ? "yes" : "no"); printf(" %-*s = %s\n", 16, "online", cpu_online(cpu) ? "yes" : "no"); + i = 4; /* We always print the first four lines */ + #define DUMP(paca, name, format) \ - printf(" %-*s = %#-*format\t(0x%lx)\n", 16, #name, 18, paca->name, \ - offsetof(struct paca_struct, name)); + if (!num_lines || i++ < num_lines) All looks good except the fact that we are using 0 to signify that there is no limit to the number of lines. Isn't that a bit confusing? Maybe a bit, but it's the simplest option, and it's not uncommon for zero to mean "no limit". chage(1) springs to mind as an example. And in terms of xmon it's hardly the most confusing part :) cheers
Re: [PATCH 30/31] intel-iommu: handle page-less SG entries
On Wed, 2015-08-12 at 09:05 +0200, Christoph Hellwig wrote: Just remove a BUG_ON, the code handles them just fine as-is. Signed-off-by: Christoph Hellwig h...@lst.de Acked-by: David Woodhouse david.woodho...@intel.com -- David Woodhouse, Open Source Technology Centre david.woodho...@intel.com Intel Corporation
Re: [PATCH 31/31] dma-mapping-common: skip kmemleak checks for page-less SG entries
On Wed, Aug 12, 2015 at 12:05 AM, Christoph Hellwig h...@lst.de wrote: + for_each_sg(sg, s, nents, i) { + if (sg_has_page(s)) + kmemcheck_mark_initialized(sg_virt(s), s->length); + } [ Again, I'm responding to one random patch - this pattern was in other patches too. ] A question: do we actually expect to mix page-less and pageful SG entries in the same SG list? How does that happen? (I'm not saying it can't, I'm just wondering where people expect this to happen). IOW, maybe it would be valid to have a rule saying "an SG list is either all pageful or all pageless, never mixed", and then have the if statement outside the loop rather than inside. Linus
Re: [PATCH 29/31] parisc: handle page-less SG entries
On Wed, Aug 12, 2015 at 12:05 AM, Christoph Hellwig h...@lst.de wrote: Make all cache invalidation conditional on sg_has_page() and use sg_phys to get the physical address directly. So this worries me a bit (I'm just reacting to one random patch in the series). The reason? I think this wants a big honking comment somewhere saying "non-sg_page accesses are not necessarily cache coherent". Now, I don't think that's _wrong_, but it's an important distinction: if you look up pages in the page tables directly, there's a very subtle difference between saving just the pfn and saving the struct page of the result. On sane architectures this whole cache flushing thing doesn't matter, which just means that it's going to be even more subtle on the odd broken ones. I'm assuming that anybody who wants to use the page-less scatter-gather lists always does so on memory that isn't actually virtually mapped at all, or only does so on sane architectures that are cache coherent at a physical level, but I'd like that assumption *documented* somewhere. (And maybe it is, and I just didn't get to that patch yet.) Linus