Re: [PATCH v2 4/7] irqchip/gic-v3-its: Don't map the MSI page in its_irq_compose_msi_msg()
On 30/04/2019 13:34, Auger Eric wrote:
> Hi Julien,

Hi Eric,

Thank you for the review!

> On 4/29/19 4:44 PM, Julien Grall wrote:
>> its_irq_compose_msi_msg() may be called from non-preemptible context.
>> However, on RT, iommu_dma_map_msi_msg requires to be called from a
>> preemptible context.
>>
>> A recent change split iommu_dma_map_msi_msg() in two new functions:
>> one that should be called in preemptible context, the other does
>> not have any requirement.
>>
>> The GICv3 ITS driver is reworked to avoid executing preemptible code in
>> non-preemptible context. This can be achieved by preparing the MSI
>> maping when allocating the MSI interrupt.
> mapping
>>
>> Signed-off-by: Julien Grall
>>
>> ---
>> Changes in v2:
>> - Rework the commit message to use imperative mood
>> ---
>>  drivers/irqchip/irq-gic-v3-its.c | 5 ++++-
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
>> index 7577755bdcf4..12ddbcfe1b1e 100644
>> --- a/drivers/irqchip/irq-gic-v3-its.c
>> +++ b/drivers/irqchip/irq-gic-v3-its.c
>> @@ -1179,7 +1179,7 @@ static void its_irq_compose_msi_msg(struct irq_data *d, struct msi_msg *msg)
>>  	msg->address_hi = upper_32_bits(addr);
>>  	msg->data = its_get_event_id(d);
>>
>> -	iommu_dma_map_msi_msg(d->irq, msg);
>> +	iommu_dma_compose_msi_msg(irq_data_get_msi_desc(d), msg);
>>  }
>>
>>  static int its_irq_set_irqchip_state(struct irq_data *d,
>> @@ -2566,6 +2566,7 @@ static int its_irq_domain_alloc(struct irq_domain *domain, unsigned int virq,
>>  {
>>  	msi_alloc_info_t *info = args;
>>  	struct its_device *its_dev = info->scratchpad[0].ptr;
>> +	struct its_node *its = its_dev->its;
>>  	irq_hw_number_t hwirq;
>>  	int err;
>>  	int i;
>> @@ -2574,6 +2575,8 @@ static int its_irq_domain_alloc(struct irq_domain *domain, unsigned int virq,
>>  	if (err)
>>  		return err;
>>
>> +	err = iommu_dma_prepare_msi(info->desc, its->get_msi_base(its_dev));
> Test err as in gicv2m driver?

Hmmm yes. Marc, do you want me to respin the patch?
Cheers,

--
Julien Grall
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 0/3] PCIe Host request to reserve IOVA
On Fri, Apr 12, 2019 at 08:43:32AM +0530, Srinath Mannam wrote:
> Few SOCs have limitation that their PCIe host can't allow few inbound
> address ranges. Allowed inbound address ranges are listed in dma-ranges
> DT property and this address ranges are required to do IOVA mapping.
> Remaining address ranges have to be reserved in IOVA mapping.
>
> PCIe Host driver of those SOCs has to list resource entries of allowed
> address ranges given in dma-ranges DT property in sorted order. This
> sorted list of resources will be processed and reserve IOVA address for
> inaccessible address holes while initializing IOMMU domain.
>
> This patch set is based on Linux-5.0-rc2.
>
> Changes from v3:
>   - Addressed Robin Murphy review comments.
>     - pcie-iproc: parse dma-ranges and make sorted resource list.
>     - dma-iommu: process list and reserve gaps between entries
>
> Changes from v2:
>   - Patch set rebased to Linux-5.0-rc2
>
> Changes from v1:
>   - Addressed Oza review comments.
>
> Srinath Mannam (3):
>   PCI: Add dma_ranges window list
>   iommu/dma: Reserve IOVA for PCIe inaccessible DMA address
>   PCI: iproc: Add sorted dma ranges resource entries to host bridge
>
>  drivers/iommu/dma-iommu.c           | 19 +++++++++++++++++++
>  drivers/pci/controller/pcie-iproc.c | 44 +++++++++++++++++++++++++++++++-
>  drivers/pci/probe.c                 |  3 +++
>  include/linux/pci.h                 |  1 +
>  4 files changed, 66 insertions(+), 1 deletion(-)

Bjorn, Joerg,

this series should not affect anything in the mainline other than its
consumer (ie patch 3); if that's the case should we consider it for v5.2
and if yes how are we going to merge it ?

Thanks,
Lorenzo
Re: [PATCH v7 05/23] iommu: Introduce cache_invalidate API
On 08/04/2019 13:18, Eric Auger wrote:
> +int iommu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
> +			   struct iommu_cache_invalidate_info *inv_info)
> +{
> +	int ret = 0;
> +
> +	if (unlikely(!domain->ops->cache_invalidate))
> +		return -ENODEV;
> +
> +	ret = domain->ops->cache_invalidate(domain, dev, inv_info);
> +
> +	return ret;

Nit: you don't really need ret

The UAPI looks good to me, so

Reviewed-by: Jean-Philippe Brucker
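The nit above is purely stylistic: the local `ret` adds nothing, so the callback result can be returned directly. A minimal user-space model of the simplified wrapper (the struct definitions here are stand-ins, not the real `include/linux/iommu.h` types) looks like this:

```c
#include <errno.h>
#include <stddef.h>

/* Stand-in types: the real struct iommu_domain / struct iommu_ops live
 * in include/linux/iommu.h; these are simplified for illustration. */
struct iommu_domain;

struct iommu_ops {
	int (*cache_invalidate)(struct iommu_domain *domain, void *dev,
				void *inv_info);
};

struct iommu_domain {
	const struct iommu_ops *ops;
};

/* The simplification suggested in the review: drop the local 'ret'
 * variable and return the driver callback's result directly. */
int iommu_cache_invalidate(struct iommu_domain *domain, void *dev,
			   void *inv_info)
{
	if (!domain->ops->cache_invalidate)
		return -ENODEV;

	return domain->ops->cache_invalidate(domain, dev, inv_info);
}
```

The behavior is identical to the posted version; only the dead temporary is removed.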
Re: [PATCH v4 3/3] PCI: iproc: Add sorted dma ranges resource entries to host bridge
On Fri, Apr 12, 2019 at 08:43:35AM +0530, Srinath Mannam wrote:
> IPROC host has the limitation that it can use only those address ranges
> given by dma-ranges property as inbound address. So that the memory
> address holes in dma-ranges should be reserved to allocate as DMA address.
>
> Inbound address of host accessed by PCIe devices will not be translated
> before it comes to IOMMU or directly to PE.

What does that mean "directly to PE" ?

IIUC all you want to say is that there is no entity translating PCI
memory transactions addresses before they hit the PCI host controller
inbound regions address decoder.

> But the limitation of this host is, access to few address ranges are
> ignored. So that IOVA ranges for these address ranges have to be
> reserved.
>
> All allowed address ranges are listed in dma-ranges DT parameter. These
> address ranges are converted as resource entries, listed in sorted
> order and added to the dma_ranges list of the PCI host bridge structure.
>
> Ex:
> dma-ranges = < \
>   0x4300 0x00 0x8000 0x00 0x8000 0x00 0x8000 \
>   0x4300 0x08 0x 0x08 0x 0x08 0x \
>   0x4300 0x80 0x 0x80 0x 0x40 0x>
>
> In the above example of dma-ranges, memory address from
> 0x0 - 0x8000,
> 0x1 - 0x8,
> 0x10 - 0x80 and
> 0x100 - 0x.
> are not allowed to use as inbound addresses.
>
> Signed-off-by: Srinath Mannam
> Based-on-patch-by: Oza Pawandeep
> Reviewed-by: Oza Pawandeep
> ---
>  drivers/pci/controller/pcie-iproc.c | 44 ++++++++++++++++++++++++++++++-
>  1 file changed, 43 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/controller/pcie-iproc.c b/drivers/pci/controller/pcie-iproc.c
> index c20fd6b..94ba5c0 100644
> --- a/drivers/pci/controller/pcie-iproc.c
> +++ b/drivers/pci/controller/pcie-iproc.c
> @@ -1146,11 +1146,43 @@ static int iproc_pcie_setup_ib(struct iproc_pcie *pcie,
>  	return ret;
>  }
>
> +static int
> +iproc_pcie_add_dma_range(struct device *dev, struct list_head *resources,
> +			 struct of_pci_range *range)
> +{
> +	struct resource *res;
> +	struct resource_entry *entry, *tmp;
> +	struct list_head *head = resources;
> +
> +	res = devm_kzalloc(dev, sizeof(struct resource), GFP_KERNEL);
> +	if (!res)
> +		return -ENOMEM;
> +
> +	resource_list_for_each_entry(tmp, resources) {
> +		if (tmp->res->start < range->cpu_addr)
> +			head = &tmp->node;
> +	}
> +
> +	res->start = range->cpu_addr;
> +	res->end = res->start + range->size - 1;
> +
> +	entry = resource_list_create_entry(res, 0);
> +	if (!entry)
> +		return -ENOMEM;
> +
> +	entry->offset = res->start - range->cpu_addr;
> +	resource_list_add(entry, head);
> +
> +	return 0;
> +}
> +
>  static int iproc_pcie_map_dma_ranges(struct iproc_pcie *pcie)
>  {
> +	struct pci_host_bridge *host = pci_host_bridge_from_priv(pcie);
>  	struct of_pci_range range;
>  	struct of_pci_range_parser parser;
>  	int ret;
> +	LIST_HEAD(resources);
>
>  	/* Get the dma-ranges from DT */
>  	ret = of_pci_dma_range_parser_init(&parser, pcie->dev->of_node);
> @@ -1158,13 +1190,23 @@ static int iproc_pcie_map_dma_ranges(struct iproc_pcie *pcie)
>  		return ret;
>
>  	for_each_of_pci_range(&parser, &range) {
> +		ret = iproc_pcie_add_dma_range(pcie->dev,
> +					       &resources,
> +					       &range);
> +		if (ret)
> +			goto out;
>  		/* Each range entry corresponds to an inbound mapping region */
>  		ret = iproc_pcie_setup_ib(pcie, &range, IPROC_PCIE_IB_MAP_MEM);
>  		if (ret)
> -			return ret;
> +			goto out;
>  	}
>
> +	list_splice_init(&resources, &host->dma_ranges);
> +
>  	return 0;
> +out:
> +	pci_free_resource_list(&resources);
> +	return ret;
>  }
>
>  static int iproce_pcie_get_msi(struct iproc_pcie *pcie,
> --
> 2.7.4
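The series' core idea is that once the allowed inbound windows are held in a sorted list, the IOVA ranges to reserve are simply the gaps between consecutive windows (plus anything before the first and after the last). A small self-contained sketch of that gap computation, under the assumption that windows are sorted by start address and none ends at the very top of the address space (the real logic lives in drivers/iommu/dma-iommu.c and works on resource_entry lists, not this toy struct):

```c
#include <stddef.h>
#include <stdint.h>

struct dma_window {
	uint64_t start;
	uint64_t end;	/* inclusive */
};

/*
 * Given inbound windows sorted by start address, emit the gaps between
 * them, including the gap before the first window and the tail up to
 * end_of_space.  Each emitted gap is a candidate IOVA reservation.
 * Returns the number of gaps written to 'hole'.
 */
size_t find_iova_holes(const struct dma_window *win, size_t n,
		       uint64_t end_of_space, struct dma_window *hole)
{
	size_t count = 0;
	uint64_t next = 0;	/* first address not yet covered */

	for (size_t i = 0; i < n; i++) {
		if (win[i].start > next) {
			hole[count].start = next;
			hole[count].end = win[i].start - 1;
			count++;
		}
		/* assumes win[i].end < UINT64_MAX, so no overflow here */
		if (win[i].end + 1 > next)
			next = win[i].end + 1;
	}
	if (next <= end_of_space) {
		hole[count].start = next;
		hole[count].end = end_of_space;
		count++;
	}
	return count;
}
```

For example, windows [10,19] and [40,49] in a 0..99 space yield the reserved holes [0,9], [20,39] and [50,99]; a device handed an IOVA from one of those holes would hit an inbound range the iProc host ignores, which is exactly what the reservation prevents.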
Re: [PATCH v4 0/3] PCIe Host request to reserve IOVA
Hi Lorenzo, Thanks a lot. Please see my reply below. On Wed, May 1, 2019 at 7:24 PM Lorenzo Pieralisi wrote: > > On Wed, May 01, 2019 at 02:20:56PM +0100, Robin Murphy wrote: > > On 2019-05-01 1:55 pm, Bjorn Helgaas wrote: > > > On Wed, May 01, 2019 at 12:30:38PM +0100, Lorenzo Pieralisi wrote: > > > > On Fri, Apr 12, 2019 at 08:43:32AM +0530, Srinath Mannam wrote: > > > > > Few SOCs have limitation that their PCIe host can't allow few inbound > > > > > address ranges. Allowed inbound address ranges are listed in > > > > > dma-ranges > > > > > DT property and this address ranges are required to do IOVA mapping. > > > > > Remaining address ranges have to be reserved in IOVA mapping. > > > > > > > > > > PCIe Host driver of those SOCs has to list resource entries of allowed > > > > > address ranges given in dma-ranges DT property in sorted order. This > > > > > sorted list of resources will be processed and reserve IOVA address > > > > > for > > > > > inaccessible address holes while initializing IOMMU domain. > > > > > > > > > > This patch set is based on Linux-5.0-rc2. > > > > > > > > > > Changes from v3: > > > > >- Addressed Robin Murphy review comments. > > > > > - pcie-iproc: parse dma-ranges and make sorted resource list. > > > > > - dma-iommu: process list and reserve gaps between entries > > > > > > > > > > Changes from v2: > > > > >- Patch set rebased to Linux-5.0-rc2 > > > > > > > > > > Changes from v1: > > > > >- Addressed Oza review comments. 
> > > > > > > > > > Srinath Mannam (3): > > > > >PCI: Add dma_ranges window list > > > > >iommu/dma: Reserve IOVA for PCIe inaccessible DMA address > > > > >PCI: iproc: Add sorted dma ranges resource entries to host bridge > > > > > > > > > > drivers/iommu/dma-iommu.c | 19 > > > > > drivers/pci/controller/pcie-iproc.c | 44 > > > > > - > > > > > drivers/pci/probe.c | 3 +++ > > > > > include/linux/pci.h | 1 + > > > > > 4 files changed, 66 insertions(+), 1 deletion(-) > > > > > > > > Bjorn, Joerg, > > > > > > > > this series should not affect anything in the mainline other than its > > > > consumer (ie patch 3); if that's the case should we consider it for v5.2 > > > > and if yes how are we going to merge it ? > > > > > > I acked the first one > > > > > > Robin reviewed the second > > > (https://lore.kernel.org/lkml/e6c812d6-0cad-4cfd-defd-d7ec427a6...@arm.com) > > > (though I do agree with his comment about DMA_BIT_MASK()), Joerg was OK > > > with it if Robin was > > > (https://lore.kernel.org/lkml/20190423145721.gh29...@8bytes.org). > > > > > > Eric reviewed the third (and pointed out a typo). > > > > > > My Kconfiggery never got fully answered -- it looks to me as though it's > > > possible to build pcie-iproc without the DMA hole support, and I thought > > > the whole point of this series was to deal with those holes > > > (https://lore.kernel.org/lkml/20190418234241.gf126...@google.com). I > > > would > > > have expected something like making pcie-iproc depend on IOMMU_SUPPORT. > > > But Srinath didn't respond to that, so maybe it's not an issue and it > > > should only affect pcie-iproc anyway. > > > > Hmm, I'm sure I had at least half-written a reply on that point, but I > > can't seem to find it now... anyway, the gist is that these inbound > > windows are generally set up to cover the physical address ranges of DRAM > > and anything else that devices might need to DMA to. 
> > Thus if you're not using an IOMMU, the fact that devices can't access
> > the gaps in between doesn't matter because there won't be anything
> > there anyway; it only needs mitigating if you do use an IOMMU and
> > start giving arbitrary non-physical addresses to the endpoint.
>
> So basically there is no strict IOMMU_SUPPORT dependency.

Yes, without IOMMU_SUPPORT all inbound addresses will fall inside
dma-ranges. The issue arises only when the IOMMU is enabled; this patch
addresses it by reserving the non-allowed addresses (the holes in
dma-ranges).

> > > So bottom line, I'm fine with merging it for v5.2.  Do you want to
> > > merge it, Lorenzo, or ...?
> >
> > This doesn't look like it will conflict with the other DMA ops and MSI
> > mapping changes currently in-flight for iommu-dma, so I have no
> > objection to it going through the PCI tree for 5.2.
>
> I will update the DMA_BIT_MASK() according to your review and fix the
> typo Eric pointed out and push out a branch - we shall see if we can
> include it for v5.2.

I will send new patches with the DMA_BIT_MASK() change and the typo fix,
along with Bjorn's comment on PATCH-1.

Regards,
Srinath.

> Thanks,
> Lorenzo
[PATCH 5/7 v2] MIPS: use the generic uncached segment support in dma-direct
Stop providing our arch alloc/free hooks and just expose the segment
offset instead.

Signed-off-by: Christoph Hellwig
---
 arch/mips/Kconfig              |  1 +
 arch/mips/include/asm/page.h   |  3 ---
 arch/mips/jazz/jazzdma.c       |  6 ------
 arch/mips/mm/dma-noncoherent.c | 26 +++++---------------
 4 files changed, 10 insertions(+), 26 deletions(-)

diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 4a5f5b0ee9a9..cde4b490f3c7 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -9,6 +9,7 @@ config MIPS
 	select ARCH_HAS_ELF_RANDOMIZE
 	select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
 	select ARCH_HAS_UBSAN_SANITIZE_ALL
+	select ARCH_HAS_UNCACHED_SEGMENT
 	select ARCH_SUPPORTS_UPROBES
 	select ARCH_USE_BUILTIN_BSWAP
 	select ARCH_USE_CMPXCHG_LOCKREF if 64BIT
diff --git a/arch/mips/include/asm/page.h b/arch/mips/include/asm/page.h
index 6b31c93b5eaa..23e0f1386e04 100644
--- a/arch/mips/include/asm/page.h
+++ b/arch/mips/include/asm/page.h
@@ -258,9 +258,6 @@ extern int __virt_addr_valid(const volatile void *kaddr);
 	((current->personality & READ_IMPLIES_EXEC) ? VM_EXEC : 0) | \
 	 VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC)
 
-#define UNCAC_ADDR(addr)	(UNCAC_BASE + __pa(addr))
-#define CAC_ADDR(addr)		((unsigned long)__va((addr) - UNCAC_BASE))
-
 #include
 #include
diff --git a/arch/mips/jazz/jazzdma.c b/arch/mips/jazz/jazzdma.c
index bedb5047aff3..1804dc9d8136 100644
--- a/arch/mips/jazz/jazzdma.c
+++ b/arch/mips/jazz/jazzdma.c
@@ -575,10 +575,6 @@ static void *jazz_dma_alloc(struct device *dev, size_t size,
 		return NULL;
 	}
 
-	if (!(attrs & DMA_ATTR_NON_CONSISTENT)) {
-		dma_cache_wback_inv((unsigned long)ret, size);
-		ret = (void *)UNCAC_ADDR(ret);
-	}
 	return ret;
 }
 
@@ -586,8 +582,6 @@ static void jazz_dma_free(struct device *dev, size_t size,
 		void *vaddr, dma_addr_t dma_handle, unsigned long attrs)
 {
 	vdma_free(dma_handle);
-	if (!(attrs & DMA_ATTR_NON_CONSISTENT))
-		vaddr = (void *)CAC_ADDR((unsigned long)vaddr);
 	dma_direct_free_pages(dev, size, vaddr, dma_handle, attrs);
 }
diff --git a/arch/mips/mm/dma-noncoherent.c b/arch/mips/mm/dma-noncoherent.c
index f9549d2fbea3..ed56c6fa7be2 100644
--- a/arch/mips/mm/dma-noncoherent.c
+++ b/arch/mips/mm/dma-noncoherent.c
@@ -44,33 +44,25 @@ static inline bool cpu_needs_post_dma_flush(struct device *dev)
 	}
 }
 
-void *arch_dma_alloc(struct device *dev, size_t size,
-		dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
+void arch_dma_prep_coherent(struct page *page, size_t size)
 {
-	void *ret;
-
-	ret = dma_direct_alloc_pages(dev, size, dma_handle, gfp, attrs);
-	if (ret && !(attrs & DMA_ATTR_NON_CONSISTENT)) {
-		dma_cache_wback_inv((unsigned long) ret, size);
-		ret = (void *)UNCAC_ADDR(ret);
-	}
+	dma_cache_wback_inv((unsigned long)page_address(page), size);
+}
 
-	return ret;
+void *uncached_kernel_address(void *addr)
+{
+	return (void *)(__pa(addr) + UNCAC_BASE);
 }
 
-void arch_dma_free(struct device *dev, size_t size, void *cpu_addr,
-		dma_addr_t dma_addr, unsigned long attrs)
+void *cached_kernel_address(void *addr)
 {
-	if (!(attrs & DMA_ATTR_NON_CONSISTENT))
-		cpu_addr = (void *)CAC_ADDR((unsigned long)cpu_addr);
-	dma_direct_free_pages(dev, size, cpu_addr, dma_addr, attrs);
+	return __va(addr) - UNCAC_BASE;
 }
 
 long arch_dma_coherent_to_pfn(struct device *dev, void *cpu_addr,
 		dma_addr_t dma_addr)
 {
-	unsigned long addr = CAC_ADDR((unsigned long)cpu_addr);
-	return page_to_pfn(virt_to_page((void *)addr));
+	return page_to_pfn(virt_to_page(cached_kernel_address(cpu_addr)));
 }
 
 pgprot_t arch_dma_mmap_pgprot(struct device *dev, pgprot_t prot,
-- 
2.20.1
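The old `UNCAC_ADDR()`/`CAC_ADDR()` macros and the new `uncached_kernel_address()`/`cached_kernel_address()` helpers all exploit the same property: the cached and uncached views of a page map the same physical address through different segment bases, so the translation is pure arithmetic. A user-space model of that arithmetic, with made-up base constants (the real values are MIPS segment bases, not these):

```c
#include <stdint.h>

/* Illustrative constants only -- the real values are MIPS segment
 * bases; these are invented for the model. */
#define PAGE_OFFSET_M	0xffffffff80000000ull	/* cached direct map  */
#define UNCAC_BASE_M	0xffffffffa0000000ull	/* uncached alias base */

/* Models __pa()/__va() for the cached direct map. */
static uint64_t virt_to_phys_m(uint64_t vaddr) { return vaddr - PAGE_OFFSET_M; }
static uint64_t phys_to_virt_m(uint64_t paddr) { return paddr + PAGE_OFFSET_M; }

/* Mirrors uncached_kernel_address(): __pa(addr) + UNCAC_BASE. */
static uint64_t uncached_addr(uint64_t cached)
{
	return virt_to_phys_m(cached) + UNCAC_BASE_M;
}

/* Mirrors CAC_ADDR(): __va(addr - UNCAC_BASE). */
static uint64_t cached_addr(uint64_t uncached)
{
	return phys_to_virt_m(uncached - UNCAC_BASE_M);
}
```

Because both directions are just base-offset shifts of the same physical address, `cached_addr(uncached_addr(x)) == x` for any address in the cached direct map, which is what lets the patch replace the macros with plain helper functions.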
Re: [PATCH v4 0/3] PCIe Host request to reserve IOVA
On 2019-05-01 1:55 pm, Bjorn Helgaas wrote: On Wed, May 01, 2019 at 12:30:38PM +0100, Lorenzo Pieralisi wrote: On Fri, Apr 12, 2019 at 08:43:32AM +0530, Srinath Mannam wrote: Few SOCs have limitation that their PCIe host can't allow few inbound address ranges. Allowed inbound address ranges are listed in dma-ranges DT property and this address ranges are required to do IOVA mapping. Remaining address ranges have to be reserved in IOVA mapping. PCIe Host driver of those SOCs has to list resource entries of allowed address ranges given in dma-ranges DT property in sorted order. This sorted list of resources will be processed and reserve IOVA address for inaccessible address holes while initializing IOMMU domain. This patch set is based on Linux-5.0-rc2. Changes from v3: - Addressed Robin Murphy review comments. - pcie-iproc: parse dma-ranges and make sorted resource list. - dma-iommu: process list and reserve gaps between entries Changes from v2: - Patch set rebased to Linux-5.0-rc2 Changes from v1: - Addressed Oza review comments. Srinath Mannam (3): PCI: Add dma_ranges window list iommu/dma: Reserve IOVA for PCIe inaccessible DMA address PCI: iproc: Add sorted dma ranges resource entries to host bridge drivers/iommu/dma-iommu.c | 19 drivers/pci/controller/pcie-iproc.c | 44 - drivers/pci/probe.c | 3 +++ include/linux/pci.h | 1 + 4 files changed, 66 insertions(+), 1 deletion(-) Bjorn, Joerg, this series should not affect anything in the mainline other than its consumer (ie patch 3); if that's the case should we consider it for v5.2 and if yes how are we going to merge it ? I acked the first one Robin reviewed the second (https://lore.kernel.org/lkml/e6c812d6-0cad-4cfd-defd-d7ec427a6...@arm.com) (though I do agree with his comment about DMA_BIT_MASK()), Joerg was OK with it if Robin was (https://lore.kernel.org/lkml/20190423145721.gh29...@8bytes.org). Eric reviewed the third (and pointed out a typo). 
> My Kconfiggery never got fully answered -- it looks to me as though it's
> possible to build pcie-iproc without the DMA hole support, and I thought
> the whole point of this series was to deal with those holes
> (https://lore.kernel.org/lkml/20190418234241.gf126...@google.com). I would
> have expected something like making pcie-iproc depend on IOMMU_SUPPORT.
> But Srinath didn't respond to that, so maybe it's not an issue and it
> should only affect pcie-iproc anyway.

Hmm, I'm sure I had at least half-written a reply on that point, but I
can't seem to find it now... anyway, the gist is that these inbound
windows are generally set up to cover the physical address ranges of DRAM
and anything else that devices might need to DMA to. Thus if you're not
using an IOMMU, the fact that devices can't access the gaps in between
doesn't matter because there won't be anything there anyway; it only
needs mitigating if you do use an IOMMU and start giving arbitrary
non-physical addresses to the endpoint.

> So bottom line, I'm fine with merging it for v5.2.  Do you want to merge
> it, Lorenzo, or ...?

This doesn't look like it will conflict with the other DMA ops and MSI
mapping changes currently in-flight for iommu-dma, so I have no
objection to it going through the PCI tree for 5.2.

Robin.
[PATCH v3 0/7] iommu/dma-iommu: Split iommu_dma_map_msi_msg in two parts
Hi all,

On RT, the function iommu_dma_map_msi_msg expects to be called from
preemptible context. However, this is not always the case, resulting in a
splat with !CONFIG_DEBUG_ATOMIC_SLEEP:

[   48.875777] BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:974
[   48.875779] in_atomic(): 1, irqs_disabled(): 128, pid: 2103, name: ip
[   48.875782] INFO: lockdep is turned off.
[   48.875784] irq event stamp: 10684
[   48.875786] hardirqs last enabled at (10683): [] _raw_spin_unlock_irqrestore+0x88/0x90
[   48.875791] hardirqs last disabled at (10684): [] _raw_spin_lock_irqsave+0x24/0x68
[   48.875796] softirqs last enabled at (0): [] copy_process.isra.1.part.2+0x8d8/0x1970
[   48.875801] softirqs last disabled at (0): [<>] (null)
[   48.875805] Preemption disabled at:
[   48.875805] [] __setup_irq+0xd8/0x6c0
[   48.875811] CPU: 2 PID: 2103 Comm: ip Not tainted 5.0.3-rt1-7-g42ede9a0fed6 #45
[   48.875815] Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development Platform, BIOS EDK II Jan 23 2017
[   48.875817] Call trace:
[   48.875818]  dump_backtrace+0x0/0x140
[   48.875821]  show_stack+0x14/0x20
[   48.875823]  dump_stack+0xa0/0xd4
[   48.875827]  ___might_sleep+0x16c/0x1f8
[   48.875831]  rt_spin_lock+0x5c/0x70
[   48.875835]  iommu_dma_map_msi_msg+0x5c/0x1d8
[   48.875839]  gicv2m_compose_msi_msg+0x3c/0x48
[   48.875843]  irq_chip_compose_msi_msg+0x40/0x58
[   48.875846]  msi_domain_activate+0x38/0x98
[   48.875849]  __irq_domain_activate_irq+0x58/0xa0
[   48.875852]  irq_domain_activate_irq+0x34/0x58
[   48.875855]  irq_activate+0x28/0x30
[   48.875858]  __setup_irq+0x2b0/0x6c0
[   48.875861]  request_threaded_irq+0xdc/0x188
[   48.875865]  sky2_setup_irq+0x44/0xf8
[   48.875868]  sky2_open+0x1a4/0x240
[   48.875871]  __dev_open+0xd8/0x188
[   48.875874]  __dev_change_flags+0x164/0x1f0
[   48.875877]  dev_change_flags+0x20/0x60
[   48.875879]  do_setlink+0x2a0/0xd30
[   48.875882]  __rtnl_newlink+0x5b4/0x6d8
[   48.875885]  rtnl_newlink+0x50/0x78
[   48.875888]  rtnetlink_rcv_msg+0x178/0x640
[   48.875891]  netlink_rcv_skb+0x58/0x118
[   48.875893]  rtnetlink_rcv+0x14/0x20
[   48.875896]  netlink_unicast+0x188/0x200
[   48.875898]  netlink_sendmsg+0x248/0x3d8
[   48.875900]  sock_sendmsg+0x18/0x40
[   48.875904]  ___sys_sendmsg+0x294/0x2d0
[   48.875908]  __sys_sendmsg+0x68/0xb8
[   48.875911]  __arm64_sys_sendmsg+0x20/0x28
[   48.875914]  el0_svc_common+0x90/0x118
[   48.875918]  el0_svc_handler+0x2c/0x80
[   48.875922]  el0_svc+0x8/0xc

Most of the patches have now been acked (missing a couple of acks from
Joerg). I was able to test the changes in GICv2m and GICv3 ITS. I don't
have hardware for the other interrupt controllers.

Cheers,

Julien Grall (7):
  genirq/msi: Add a new field in msi_desc to store an IOMMU cookie
  iommu/dma-iommu: Split iommu_dma_map_msi_msg() in two parts
  irqchip/gicv2m: Don't map the MSI page in gicv2m_compose_msi_msg()
  irqchip/gic-v3-its: Don't map the MSI page in its_irq_compose_msi_msg()
  irqchip/ls-scfg-msi: Don't map the MSI page in ls_scfg_msi_compose_msg()
  irqchip/gic-v3-mbi: Don't map the MSI page in mbi_compose_m{b, s}i_msg()
  iommu/dma-iommu: Remove iommu_dma_map_msi_msg()

 drivers/iommu/Kconfig             |  1 +
 drivers/iommu/dma-iommu.c         | 48 +++++++++++++++++++++
 drivers/irqchip/irq-gic-v2m.c     |  8 ++++---
 drivers/irqchip/irq-gic-v3-its.c  |  7 ++++-
 drivers/irqchip/irq-gic-v3-mbi.c  | 15 ++++----
 drivers/irqchip/irq-ls-scfg-msi.c |  7 ++++-
 include/linux/dma-iommu.h         | 24 ++++++--
 include/linux/msi.h               | 26 +++++++++++
 kernel/irq/Kconfig                |  3 +++
 9 files changed, 112 insertions(+), 27 deletions(-)

-- 
2.11.0
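The fix pattern across all seven patches is the same two-phase split: a prepare step that may sleep (so it runs at interrupt-allocation time, in preemptible context) caches a cookie in the MSI descriptor, and a compose step that only reads that cookie (so it is safe from the non-preemptible irq_compose path). A toy user-space model of the pattern, with invented struct layouts that stand in for the real msi_desc/doorbell-page plumbing:

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model of the two-phase scheme.  In the real series,
 * iommu_dma_prepare_msi() may sleep (it maps the MSI doorbell page)
 * and runs at allocation time; iommu_dma_compose_msi_msg() only reads
 * the cached cookie, so it is atomic-context safe. */

struct msi_page {
	uint64_t iova;		/* doorbell address as seen by the device */
};

struct msi_desc {
	const struct msi_page *iommu_cookie;	/* set by prepare, read by compose */
};

/* Phase 1: preemptible context -- allowed to allocate/map. */
static int prepare_msi(struct msi_desc *desc, const struct msi_page *page)
{
	desc->iommu_cookie = page;
	return 0;
}

/* Phase 2: atomic context -- no allocation, no sleeping locks. */
static void compose_msi_msg(const struct msi_desc *desc,
			    uint32_t *addr_lo, uint32_t *addr_hi)
{
	const struct msi_page *page = desc->iommu_cookie;

	*addr_lo = (uint32_t)page->iova;
	*addr_hi = (uint32_t)(page->iova >> 32);
}
```

The splat above is exactly what phase 2 must avoid: the old single-call API took a sleeping lock inside `irq_chip_compose_msi_msg()`, which runs with a raw spinlock held.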
Re: [RFC PATCH v9 03/13] mm: Add support for eXclusive Page Frame Ownership (XPFO)
On 5/1/19 8:49 AM, Waiman Long wrote:
> On Wed, Apr 03, 2019 at 11:34:04AM -0600, Khalid Aziz wrote:
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>> index 858b6c0b9a15..9b36da94760e 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -2997,6 +2997,12 @@
>>
>>  	nox2apic	[X86-64,APIC] Do not enable x2APIC mode.
>>
>> +	noxpfo		[XPFO] Disable eXclusive Page Frame Ownership (XPFO)
>> +			when CONFIG_XPFO is on. Physical pages mapped into
>> +			user applications will also be mapped in the
>> +			kernel's address space as if CONFIG_XPFO was not
>> +			enabled.
>> +
>>  	cpu0_hotplug	[X86] Turn on CPU0 hotplug feature when
>>  			CONFIG_BOOTPARAM_HOTPLUG_CPU0 is off.
>>  			Some features depend on CPU0. Known dependencies are:
>
> Given the big performance impact that XPFO can have, it should be off by
> default when configured. Instead, the xpfo option should be used to
> enable it.

Agreed. I plan to disable it by default in the next version of the patch.
This is likely to end up being a feature for extreme security conscious
folks only, unless I or someone else comes up with a further significant
performance boost.

Thanks,
Khalid
Re: [PATCH v4 3/3] PCI: iproc: Add sorted dma ranges resource entries to host bridge
Hi Lorenzo, Please see my reply below. On Wed, May 1, 2019 at 8:07 PM Lorenzo Pieralisi wrote: > > On Fri, Apr 12, 2019 at 08:43:35AM +0530, Srinath Mannam wrote: > > IPROC host has the limitation that it can use only those address ranges > > given by dma-ranges property as inbound address. So that the memory > > address holes in dma-ranges should be reserved to allocate as DMA address. > > > > Inbound address of host accessed by PCIe devices will not be translated > > before it comes to IOMMU or directly to PE. > > What does that mean "directly to PE" ? In general, with IOMMU enable case, inbound address access of endpoint will come to IOMMU. If IOMMU disable then it comes to PE (processing element - ARM). > > IIUC all you want to say is that there is no entity translating > PCI memory transactions addresses before they it the PCI host > controller inbound regions address decoder. In our SOC we have an entity (Inside PCIe RC) which will translate inbound address before it goes to IOMMU or PE. In other SOCs this will not be the case, all inbound address access will go to IOMMU or PE. Regards, Srinath. > > > But the limitation of this host is, access to few address ranges are > > ignored. So that IOVA ranges for these address ranges have to be > > reserved. > > > > All allowed address ranges are listed in dma-ranges DT parameter. These > > address ranges are converted as resource entries and listed in sorted > > order add added to dma_ranges list of PCI host bridge structure. > > > > Ex: > > dma-ranges = < \ > > 0x4300 0x00 0x8000 0x00 0x8000 0x00 0x8000 \ > > 0x4300 0x08 0x 0x08 0x 0x08 0x \ > > 0x4300 0x80 0x 0x80 0x 0x40 0x> > > > > In the above example of dma-ranges, memory address from > > 0x0 - 0x8000, > > 0x1 - 0x8, > > 0x10 - 0x80 and > > 0x100 - 0x. > > are not allowed to use as inbound addresses. 
> > > > Signed-off-by: Srinath Mannam > > Based-on-patch-by: Oza Pawandeep > > Reviewed-by: Oza Pawandeep > > --- > > drivers/pci/controller/pcie-iproc.c | 44 > > - > > 1 file changed, 43 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/pci/controller/pcie-iproc.c > > b/drivers/pci/controller/pcie-iproc.c > > index c20fd6b..94ba5c0 100644 > > --- a/drivers/pci/controller/pcie-iproc.c > > +++ b/drivers/pci/controller/pcie-iproc.c > > @@ -1146,11 +1146,43 @@ static int iproc_pcie_setup_ib(struct iproc_pcie > > *pcie, > > return ret; > > } > > > > +static int > > +iproc_pcie_add_dma_range(struct device *dev, struct list_head *resources, > > + struct of_pci_range *range) > > +{ > > + struct resource *res; > > + struct resource_entry *entry, *tmp; > > + struct list_head *head = resources; > > + > > + res = devm_kzalloc(dev, sizeof(struct resource), GFP_KERNEL); > > + if (!res) > > + return -ENOMEM; > > + > > + resource_list_for_each_entry(tmp, resources) { > > + if (tmp->res->start < range->cpu_addr) > > + head = >node; > > + } > > + > > + res->start = range->cpu_addr; > > + res->end = res->start + range->size - 1; > > + > > + entry = resource_list_create_entry(res, 0); > > + if (!entry) > > + return -ENOMEM; > > + > > + entry->offset = res->start - range->cpu_addr; > > + resource_list_add(entry, head); > > + > > + return 0; > > +} > > + > > static int iproc_pcie_map_dma_ranges(struct iproc_pcie *pcie) > > { > > + struct pci_host_bridge *host = pci_host_bridge_from_priv(pcie); > > struct of_pci_range range; > > struct of_pci_range_parser parser; > > int ret; > > + LIST_HEAD(resources); > > > > /* Get the dma-ranges from DT */ > > ret = of_pci_dma_range_parser_init(, pcie->dev->of_node); > > @@ -1158,13 +1190,23 @@ static int iproc_pcie_map_dma_ranges(struct > > iproc_pcie *pcie) > > return ret; > > > > for_each_of_pci_range(, ) { > > + ret = iproc_pcie_add_dma_range(pcie->dev, > > +, > > +); > > + if (ret) > > + goto out; > > /* Each range entry corresponds to 
an inbound mapping region */
> >  		ret = iproc_pcie_setup_ib(pcie, &range, IPROC_PCIE_IB_MAP_MEM);
> >  		if (ret)
> > -			return ret;
> > +			goto out;
> >  	}
> >
> > +	list_splice_init(&resources, &host->dma_ranges);
> > +
> >  	return 0;
> > +out:
> > +	pci_free_resource_list(&resources);
> > +	return ret;
> >  }
> >
> >  static int iproce_pcie_get_msi(struct iproc_pcie *pcie,
> > --
> > 2.7.4
> >
Re: [PATCH v4 0/3] PCIe Host request to reserve IOVA
On Wed, May 01, 2019 at 12:30:38PM +0100, Lorenzo Pieralisi wrote: > On Fri, Apr 12, 2019 at 08:43:32AM +0530, Srinath Mannam wrote: > > Few SOCs have limitation that their PCIe host can't allow few inbound > > address ranges. Allowed inbound address ranges are listed in dma-ranges > > DT property and this address ranges are required to do IOVA mapping. > > Remaining address ranges have to be reserved in IOVA mapping. > > > > PCIe Host driver of those SOCs has to list resource entries of allowed > > address ranges given in dma-ranges DT property in sorted order. This > > sorted list of resources will be processed and reserve IOVA address for > > inaccessible address holes while initializing IOMMU domain. > > > > This patch set is based on Linux-5.0-rc2. > > > > Changes from v3: > > - Addressed Robin Murphy review comments. > > - pcie-iproc: parse dma-ranges and make sorted resource list. > > - dma-iommu: process list and reserve gaps between entries > > > > Changes from v2: > > - Patch set rebased to Linux-5.0-rc2 > > > > Changes from v1: > > - Addressed Oza review comments. > > > > Srinath Mannam (3): > > PCI: Add dma_ranges window list > > iommu/dma: Reserve IOVA for PCIe inaccessible DMA address > > PCI: iproc: Add sorted dma ranges resource entries to host bridge > > > > drivers/iommu/dma-iommu.c | 19 > > drivers/pci/controller/pcie-iproc.c | 44 > > - > > drivers/pci/probe.c | 3 +++ > > include/linux/pci.h | 1 + > > 4 files changed, 66 insertions(+), 1 deletion(-) > > Bjorn, Joerg, > > this series should not affect anything in the mainline other than its > consumer (ie patch 3); if that's the case should we consider it for v5.2 > and if yes how are we going to merge it ? I acked the first one Robin reviewed the second (https://lore.kernel.org/lkml/e6c812d6-0cad-4cfd-defd-d7ec427a6...@arm.com) (though I do agree with his comment about DMA_BIT_MASK()), Joerg was OK with it if Robin was (https://lore.kernel.org/lkml/20190423145721.gh29...@8bytes.org). 
Eric reviewed the third (and pointed out a typo).

My Kconfiggery never got fully answered -- it looks to me as though it's
possible to build pcie-iproc without the DMA hole support, and I thought
the whole point of this series was to deal with those holes
(https://lore.kernel.org/lkml/20190418234241.gf126...@google.com). I would
have expected something like making pcie-iproc depend on IOMMU_SUPPORT.
But Srinath didn't respond to that, so maybe it's not an issue and it
should only affect pcie-iproc anyway.

So bottom line, I'm fine with merging it for v5.2. Do you want to merge
it, Lorenzo, or ...?

Bjorn
Re: [PATCH v2 4/7] irqchip/gic-v3-its: Don't map the MSI page in its_irq_compose_msi_msg()
On 01/05/2019 12:14, Julien Grall wrote: > On 30/04/2019 13:34, Auger Eric wrote: >> Hi Julien, > > Hi Eric, > > Thank you for the review! > >> >> On 4/29/19 4:44 PM, Julien Grall wrote: >>> its_irq_compose_msi_msg() may be called from non-preemptible context. >>> However, on RT, iommu_dma_map_msi_msg requires to be called from a >>> preemptible context. >>> >>> A recent change split iommu_dma_map_msi_msg() in two new functions: >>> one that should be called in preemptible context, the other does >>> not have any requirement. >>> >>> The GICv3 ITS driver is reworked to avoid executing preemptible code in >>> non-preemptible context. This can be achieved by preparing the MSI >>> maping when allocating the MSI interrupt. >> mapping >>> >>> Signed-off-by: Julien Grall >>> >>> --- >>> Changes in v2: >>> - Rework the commit message to use imperative mood >>> --- >>> drivers/irqchip/irq-gic-v3-its.c | 5 - >>> 1 file changed, 4 insertions(+), 1 deletion(-) >>> >>> diff --git a/drivers/irqchip/irq-gic-v3-its.c >>> b/drivers/irqchip/irq-gic-v3-its.c >>> index 7577755bdcf4..12ddbcfe1b1e 100644 >>> --- a/drivers/irqchip/irq-gic-v3-its.c >>> +++ b/drivers/irqchip/irq-gic-v3-its.c >>> @@ -1179,7 +1179,7 @@ static void its_irq_compose_msi_msg(struct irq_data >>> *d, struct msi_msg *msg) >>> msg->address_hi = upper_32_bits(addr); >>> msg->data = its_get_event_id(d); >>> >>> - iommu_dma_map_msi_msg(d->irq, msg); >>> + iommu_dma_compose_msi_msg(irq_data_get_msi_desc(d), msg); >>> } >>> >>> static int its_irq_set_irqchip_state(struct irq_data *d, >>> @@ -2566,6 +2566,7 @@ static int its_irq_domain_alloc(struct irq_domain >>> *domain, unsigned int virq, >>> { >>> msi_alloc_info_t *info = args; >>> struct its_device *its_dev = info->scratchpad[0].ptr; >>> + struct its_node *its = its_dev->its; >>> irq_hw_number_t hwirq; >>> int err; >>> int i; >>> @@ -2574,6 +2575,8 @@ static int its_irq_domain_alloc(struct irq_domain >>> *domain, unsigned int virq, >>> if (err) >>> return err; >>> 
>>> + err = iommu_dma_prepare_msi(info->desc, its->get_msi_base(its_dev)); >> Test err as in gicv2m driver? > Hmmm yes. Marc, do you want me to respin the patch? Sure, feel free to if you can. But what I really need is an Ack from Joerg on the first few patches. Thanks, M. -- Jazz is not dead. It just smells funny...
[PATCH v3 2/7] iommu/dma-iommu: Split iommu_dma_map_msi_msg() in two parts
On RT, iommu_dma_map_msi_msg() may be called from non-preemptible context. This will lead to a splat with CONFIG_DEBUG_ATOMIC_SLEEP as the function is using spin_lock (spinlocks can sleep on RT). iommu_dma_map_msi_msg() is used to map the MSI page in the IOMMU PT and update the MSI message with the IOVA. Only the part that looks up the MSI page needs to be called in preemptible context. As the MSI page cannot change over the lifecycle of the MSI interrupt, the lookup can be cached and re-used later on. iommu_dma_map_msi_msg() is now split in two functions: - iommu_dma_prepare_msi(): This function will prepare the mapping in the IOMMU and store the cookie in the structure msi_desc. This function should be called in preemptible context. - iommu_dma_compose_msi_msg(): This function will update the MSI message with the IOVA when the device is behind an IOMMU. Signed-off-by: Julien Grall Reviewed-by: Robin Murphy Reviewed-by: Eric Auger --- Changes in v3: - Update the comment to use kerneldoc format - Fix typos in the comments - More use of msi_desc_set_iommu_cookie - Add Robin's and Eric's reviewed-by Changes in v2: - Rework the commit message to use imperative mood - Use the MSI accessor to get/set the iommu cookie - Don't use ternary on return - Select CONFIG_IRQ_MSI_IOMMU - Pass an msi_desc rather than the irq number --- drivers/iommu/Kconfig | 1 + drivers/iommu/dma-iommu.c | 46 +- include/linux/dma-iommu.h | 25 + 3 files changed, 63 insertions(+), 9 deletions(-) diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index 6f07f3b21816..eb1c8cd243f9 100644 --- a/drivers/iommu/Kconfig +++ b/drivers/iommu/Kconfig @@ -94,6 +94,7 @@ config IOMMU_DMA bool select IOMMU_API select IOMMU_IOVA + select IRQ_MSI_IOMMU select NEED_SG_DMA_LENGTH config FSL_PAMU diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 77aabe637a60..f847904098f7 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -888,17 +888,18 @@ static struct
iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev, return NULL; } -void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg) +int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr) { - struct device *dev = msi_desc_to_dev(irq_get_msi_desc(irq)); + struct device *dev = msi_desc_to_dev(desc); struct iommu_domain *domain = iommu_get_domain_for_dev(dev); struct iommu_dma_cookie *cookie; struct iommu_dma_msi_page *msi_page; - phys_addr_t msi_addr = (u64)msg->address_hi << 32 | msg->address_lo; unsigned long flags; - if (!domain || !domain->iova_cookie) - return; + if (!domain || !domain->iova_cookie) { + desc->iommu_cookie = NULL; + return 0; + } cookie = domain->iova_cookie; @@ -911,7 +912,36 @@ void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg) msi_page = iommu_dma_get_msi_page(dev, msi_addr, domain); spin_unlock_irqrestore(&cookie->msi_lock, flags); - if (WARN_ON(!msi_page)) { + msi_desc_set_iommu_cookie(desc, msi_page); + + if (!msi_page) + return -ENOMEM; + return 0; +} + +void iommu_dma_compose_msi_msg(struct msi_desc *desc, + struct msi_msg *msg) +{ + struct device *dev = msi_desc_to_dev(desc); + const struct iommu_domain *domain = iommu_get_domain_for_dev(dev); + const struct iommu_dma_msi_page *msi_page; + + msi_page = msi_desc_get_iommu_cookie(desc); + + if (!domain || !domain->iova_cookie || WARN_ON(!msi_page)) + return; + + msg->address_hi = upper_32_bits(msi_page->iova); + msg->address_lo &= cookie_msi_granule(domain->iova_cookie) - 1; + msg->address_lo += lower_32_bits(msi_page->iova); +} + +void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg) +{ + struct msi_desc *desc = irq_get_msi_desc(irq); + phys_addr_t msi_addr = (u64)msg->address_hi << 32 | msg->address_lo; + + if (WARN_ON(iommu_dma_prepare_msi(desc, msi_addr))) { /* * We're called from a void callback, so the best we can do is * 'fail' by filling the message with obviously bogus values.
@@ -922,8 +952,6 @@ void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg) msg->address_lo = ~0U; msg->data = ~0U; } else { - msg->address_hi = upper_32_bits(msi_page->iova); - msg->address_lo &= cookie_msi_granule(cookie) - 1; - msg->address_lo += lower_32_bits(msi_page->iova); + iommu_dma_compose_msi_msg(desc, msg); } } diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h index
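The split above is easiest to see with a small user-space model. The sketch below is only an illustration of the two-phase pattern the patch introduces -- a sleepable prepare step that caches a cookie in the descriptor, and an atomic-safe compose step that merely reads it back. The types (`struct msi_desc`, `struct msi_page`) and the fixed fake IOVA are stand-ins, not the real kernel structures or dma-iommu implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the kernel types; field names follow the patch but
 * everything else here is hypothetical. */
struct msi_msg  { uint32_t address_hi, address_lo, data; };
struct msi_desc { const void *iommu_cookie; };
struct msi_page { uint64_t iova; };

/* Phase 1 (preemptible): do the potentially-sleeping work once, at
 * interrupt-allocation time, and cache the result in the descriptor. */
static int prepare_msi(struct msi_desc *desc, uint64_t msi_addr)
{
	static struct msi_page page;	/* pretend IOMMU mapping */

	(void)msi_addr;
	page.iova = 0x80000000ULL;	/* pretend IOVA of the doorbell */
	desc->iommu_cookie = &page;
	return 0;
}

/* Phase 2 (atomic-safe): only read the cached cookie -- no locks that
 * sleep on RT, no allocation -- so it may run from non-preemptible
 * context such as an irqchip's compose callback. */
static void compose_msi_msg(struct msi_desc *desc, struct msi_msg *msg)
{
	const struct msi_page *page = desc->iommu_cookie;

	if (!page)
		return;		/* no IOMMU between device and doorbell */
	msg->address_hi = (uint32_t)(page->iova >> 32);
	msg->address_lo = (uint32_t)page->iova;
}
```

The design point is that compose can now be called any number of times, from any context, because everything that could block was hoisted into prepare.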
[PATCH v3 7/7] iommu/dma-iommu: Remove iommu_dma_map_msi_msg()
A recent change split iommu_dma_map_msi_msg() in two new functions. The function was still implemented to avoid modifying all the callers at once. Now that all the callers have been reworked, iommu_dma_map_msi_msg() can be removed. Signed-off-by: Julien Grall Reviewed-by: Robin Murphy Reviewed-by: Eric Auger --- Changes in v3: - Add Robin's and Eric's reviewed-by Changes in v2: - Rework the commit message --- drivers/iommu/dma-iommu.c | 20 include/linux/dma-iommu.h | 5 - 2 files changed, 25 deletions(-) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index f847904098f7..13916fefeb27 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -935,23 +935,3 @@ void iommu_dma_compose_msi_msg(struct msi_desc *desc, msg->address_lo &= cookie_msi_granule(domain->iova_cookie) - 1; msg->address_lo += lower_32_bits(msi_page->iova); } - -void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg) -{ - struct msi_desc *desc = irq_get_msi_desc(irq); - phys_addr_t msi_addr = (u64)msg->address_hi << 32 | msg->address_lo; - - if (WARN_ON(iommu_dma_prepare_msi(desc, msi_addr))) { - /* -* We're called from a void callback, so the best we can do is -* 'fail' by filling the message with obviously bogus values. -* Since we got this far due to an IOMMU being present, it's -* not like the existing address would have worked anyway... 
-*/ - msg->address_hi = ~0U; - msg->address_lo = ~0U; - msg->data = ~0U; - } else { - iommu_dma_compose_msi_msg(desc, msg); - } -} diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h index 0b781a98ee73..476e0c54de2d 100644 --- a/include/linux/dma-iommu.h +++ b/include/linux/dma-iommu.h @@ -84,7 +84,6 @@ int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr); void iommu_dma_compose_msi_msg(struct msi_desc *desc, struct msi_msg *msg); -void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg); void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list); #else @@ -124,10 +123,6 @@ static inline void iommu_dma_compose_msi_msg(struct msi_desc *desc, { } -static inline void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg) -{ -} - static inline void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list) { } -- 2.11.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v3 6/7] irqchip/gic-v3-mbi: Don't map the MSI page in mbi_compose_m{b,s}i_msg()
The functions mbi_compose_m{b, s}i_msg may be called from non-preemptible context. However, on RT, iommu_dma_map_msi_msg() requires to be called from a preemptible context. A recent patch split iommu_dma_map_msi_msg in two new functions: one that should be called in preemptible context, the other does not have any requirement. The GICv3 MSI driver is reworked to avoid executing preemptible code in non-preemptible context. This can be achieved by preparing the two MSI mappings when allocating the MSI interrupt. Signed-off-by: Julien Grall --- Changes in v2: - Rework the commit message to use imperative mood --- drivers/irqchip/irq-gic-v3-mbi.c | 15 +-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/drivers/irqchip/irq-gic-v3-mbi.c b/drivers/irqchip/irq-gic-v3-mbi.c index fbfa7ff6deb1..d50f6cdf043c 100644 --- a/drivers/irqchip/irq-gic-v3-mbi.c +++ b/drivers/irqchip/irq-gic-v3-mbi.c @@ -84,6 +84,7 @@ static void mbi_free_msi(struct mbi_range *mbi, unsigned int hwirq, static int mbi_irq_domain_alloc(struct irq_domain *domain, unsigned int virq, unsigned int nr_irqs, void *args) { + msi_alloc_info_t *info = args; struct mbi_range *mbi = NULL; int hwirq, offset, i, err = 0; @@ -104,6 +105,16 @@ static int mbi_irq_domain_alloc(struct irq_domain *domain, unsigned int virq, hwirq = mbi->spi_start + offset; + err = iommu_dma_prepare_msi(info->desc, + mbi_phys_base + GICD_CLRSPI_NSR); + if (err) + return err; + + err = iommu_dma_prepare_msi(info->desc, + mbi_phys_base + GICD_SETSPI_NSR); + if (err) + return err; + for (i = 0; i < nr_irqs; i++) { err = mbi_irq_gic_domain_alloc(domain, virq + i, hwirq + i); if (err) @@ -142,7 +153,7 @@ static void mbi_compose_msi_msg(struct irq_data *data, struct msi_msg *msg) msg[0].address_lo = lower_32_bits(mbi_phys_base + GICD_SETSPI_NSR); msg[0].data = data->parent_data->hwirq; - iommu_dma_map_msi_msg(data->irq, msg); + iommu_dma_compose_msi_msg(irq_data_get_msi_desc(data), msg); } #ifdef CONFIG_PCI_MSI @@ -202,7 +213,7 @@ 
static void mbi_compose_mbi_msg(struct irq_data *data, struct msi_msg *msg) msg[1].address_lo = lower_32_bits(mbi_phys_base + GICD_CLRSPI_NSR); msg[1].data = data->parent_data->hwirq; - iommu_dma_map_msi_msg(data->irq, &msg[1]); + iommu_dma_compose_msi_msg(irq_data_get_msi_desc(data), &msg[1]); } /* Platform-MSI specific irqchip */ -- 2.11.0
[PATCH v3 1/7] genirq/msi: Add a new field in msi_desc to store an IOMMU cookie
When an MSI doorbell is located downstream of an IOMMU, it is required to swizzle the physical address with an appropriately-mapped IOVA for any device attached to one of our DMA ops domain. At the moment, the allocation of the mapping may be done when composing the message. However, the composing may be done in non-preemptible context while the allocation requires to be called from preemptible context. A follow-up change will split the current logic in two functions requiring to keep an IOMMU cookie per MSI. A new field is introduced in msi_desc to store an IOMMU cookie. As the cookie may not be required in some configuration, the field is protected under a new config CONFIG_IRQ_MSI_IOMMU. A pair of helpers has also been introduced to access the field. Signed-off-by: Julien Grall Reviewed-by: Robin Murphy Reviewed-by: Eric Auger --- Changes in v3: - Add Robin's and Eric's reviewed-by Changes in v2: - Update the commit message to use imperative mood - Protect the field with a new config that will be selected by IOMMU_DMA later on - Add a set of helpers to access the new field --- include/linux/msi.h | 26 ++ kernel/irq/Kconfig | 3 +++ 2 files changed, 29 insertions(+) diff --git a/include/linux/msi.h b/include/linux/msi.h index 7e9b81c3b50d..82a308c19222 100644 --- a/include/linux/msi.h +++ b/include/linux/msi.h @@ -77,6 +77,9 @@ struct msi_desc { struct device *dev; struct msi_msg msg; struct irq_affinity_desc *affinity; +#ifdef CONFIG_IRQ_MSI_IOMMU + const void *iommu_cookie; +#endif union { /* PCI MSI/X specific data */ @@ -119,6 +122,29 @@ struct msi_desc { #define for_each_msi_entry_safe(desc, tmp, dev)\ list_for_each_entry_safe((desc), (tmp), dev_to_msi_list((dev)), list) +#ifdef CONFIG_IRQ_MSI_IOMMU +static inline const void *msi_desc_get_iommu_cookie(struct msi_desc *desc) +{ + return desc->iommu_cookie; +} + +static inline void msi_desc_set_iommu_cookie(struct msi_desc *desc, +const void *iommu_cookie) +{ + desc->iommu_cookie = iommu_cookie; +} +#else +static
inline const void *msi_desc_get_iommu_cookie(struct msi_desc *desc) +{ + return NULL; +} + +static inline void msi_desc_set_iommu_cookie(struct msi_desc *desc, +const void *iommu_cookie) +{ +} +#endif + #ifdef CONFIG_PCI_MSI #define first_pci_msi_entry(pdev) first_msi_entry(&(pdev)->dev) #define for_each_pci_msi_entry(desc, pdev) \ diff --git a/kernel/irq/Kconfig b/kernel/irq/Kconfig index 5f3e2baefca9..8fee06625c37 100644 --- a/kernel/irq/Kconfig +++ b/kernel/irq/Kconfig @@ -91,6 +91,9 @@ config GENERIC_MSI_IRQ_DOMAIN select IRQ_DOMAIN_HIERARCHY select GENERIC_MSI_IRQ +config IRQ_MSI_IOMMU + bool + config HANDLE_DOMAIN_IRQ bool -- 2.11.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
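Compiled outside the kernel, the config-gated accessor pattern from this patch looks like the sketch below. CONFIG_IRQ_MSI_IOMMU is modeled as a plain macro, and `struct msi_desc` is reduced to two stand-in fields; the point is that call sites need no #ifdef because the disabled variants compile to no-ops.

```c
#include <assert.h>
#include <stddef.h>

#define CONFIG_IRQ_MSI_IOMMU 1	/* flip to 0 to model the option disabled */

struct msi_desc {
	unsigned int irq;		/* stand-in for the other fields */
#if CONFIG_IRQ_MSI_IOMMU
	const void *iommu_cookie;
#endif
};

#if CONFIG_IRQ_MSI_IOMMU
static inline const void *msi_desc_get_iommu_cookie(struct msi_desc *desc)
{
	return desc->iommu_cookie;
}

static inline void msi_desc_set_iommu_cookie(struct msi_desc *desc,
					     const void *cookie)
{
	desc->iommu_cookie = cookie;
}
#else
/* With the option off, callers still compile but the cookie is dropped. */
static inline const void *msi_desc_get_iommu_cookie(struct msi_desc *desc)
{
	(void)desc;
	return NULL;
}

static inline void msi_desc_set_iommu_cookie(struct msi_desc *desc,
					     const void *cookie)
{
	(void)desc;
	(void)cookie;
}
#endif
```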
[PATCH v3 3/7] irqchip/gicv2m: Don't map the MSI page in gicv2m_compose_msi_msg()
gicv2m_compose_msi_msg() may be called from non-preemptible context. However, on RT, iommu_dma_map_msi_msg() requires to be called from a preemptible context. A recent change split iommu_dma_map_msi_msg() in two new functions: one that should be called in preemptible context, the other does not have any requirement. The GICv2m driver is reworked to avoid executing preemptible code in non-preemptible context. This can be achieved by preparing the MSI mapping when allocating the MSI interrupt. Signed-off-by: Julien Grall Reviewed-by: Eric Auger --- Changes in v3: - Add Eric's reviewed-by Changes in v2: - Rework the commit message to use imperative mood --- drivers/irqchip/irq-gic-v2m.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/irqchip/irq-gic-v2m.c b/drivers/irqchip/irq-gic-v2m.c index f5fe0100f9ff..4359f0583377 100644 --- a/drivers/irqchip/irq-gic-v2m.c +++ b/drivers/irqchip/irq-gic-v2m.c @@ -110,7 +110,7 @@ static void gicv2m_compose_msi_msg(struct irq_data *data, struct msi_msg *msg) if (v2m->flags & GICV2M_NEEDS_SPI_OFFSET) msg->data -= v2m->spi_offset; - iommu_dma_map_msi_msg(data->irq, msg); + iommu_dma_compose_msi_msg(irq_data_get_msi_desc(data), msg); } static struct irq_chip gicv2m_irq_chip = { @@ -167,6 +167,7 @@ static void gicv2m_unalloc_msi(struct v2m_data *v2m, unsigned int hwirq, static int gicv2m_irq_domain_alloc(struct irq_domain *domain, unsigned int virq, unsigned int nr_irqs, void *args) { + msi_alloc_info_t *info = args; struct v2m_data *v2m = NULL, *tmp; int hwirq, offset, i, err = 0; @@ -186,6 +187,11 @@ static int gicv2m_irq_domain_alloc(struct irq_domain *domain, unsigned int virq, hwirq = v2m->spi_start + offset; + err = iommu_dma_prepare_msi(info->desc, + v2m->res.start + V2M_MSI_SETSPI_NS); + if (err) + return err; + for (i = 0; i < nr_irqs; i++) { err = gicv2m_irq_gic_domain_alloc(domain, virq + i, hwirq + i); if (err) -- 2.11.0 ___ iommu mailing list iommu@lists.linux-foundation.org 
[PATCH v3 5/7] irqchip/ls-scfg-msi: Don't map the MSI page in ls_scfg_msi_compose_msg()
ls_scfg_msi_compose_msg() may be called from non-preemptible context. However, on RT, iommu_dma_map_msi_msg() requires to be called from a preemptible context. A recent patch split iommu_dma_map_msi_msg() in two new functions: one that should be called in preemptible context, the other does not have any requirement. The Freescale SCFG MSI driver is reworked to avoid executing preemptible code in non-preemptible context. This can be achieved by preparing the MSI mapping when allocating the MSI interrupt. Signed-off-by: Julien Grall --- Changes in v2: - Rework the commit message to use imperative mood --- drivers/irqchip/irq-ls-scfg-msi.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/irqchip/irq-ls-scfg-msi.c b/drivers/irqchip/irq-ls-scfg-msi.c index c671b3212010..669d29105772 100644 --- a/drivers/irqchip/irq-ls-scfg-msi.c +++ b/drivers/irqchip/irq-ls-scfg-msi.c @@ -100,7 +100,7 @@ static void ls_scfg_msi_compose_msg(struct irq_data *data, struct msi_msg *msg) msg->data |= cpumask_first(mask); } - iommu_dma_map_msi_msg(data->irq, msg); + iommu_dma_compose_msi_msg(irq_data_get_msi_desc(data), msg); } static int ls_scfg_msi_set_affinity(struct irq_data *irq_data, @@ -141,6 +141,7 @@ static int ls_scfg_msi_domain_irq_alloc(struct irq_domain *domain, unsigned int nr_irqs, void *args) { + msi_alloc_info_t *info = args; struct ls_scfg_msi *msi_data = domain->host_data; int pos, err = 0; @@ -157,6 +158,10 @@ static int ls_scfg_msi_domain_irq_alloc(struct irq_domain *domain, if (err) return err; + err = iommu_dma_prepare_msi(info->desc, msi_data->msiir_addr); + if (err) + return err; + irq_domain_set_info(domain, virq, pos, &ls_scfg_msi_parent_chip, msi_data, handle_simple_irq, NULL, NULL); -- 2.11.0
[PATCH v3 4/7] irqchip/gic-v3-its: Don't map the MSI page in its_irq_compose_msi_msg()
its_irq_compose_msi_msg() may be called from non-preemptible context. However, on RT, iommu_dma_map_msi_msg requires to be called from a preemptible context. A recent change split iommu_dma_map_msi_msg() in two new functions: one that should be called in preemptible context, the other does not have any requirement. The GICv3 ITS driver is reworked to avoid executing preemptible code in non-preemptible context. This can be achieved by preparing the MSI mapping when allocating the MSI interrupt. Signed-off-by: Julien Grall Reviewed-by: Eric Auger --- Changes in v3: - Fix typo in the commit message - Check the return of iommu_dma_prepare_msi - Add Eric's reviewed-by Changes in v2: - Rework the commit message to use imperative mood --- drivers/irqchip/irq-gic-v3-its.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c index 7577755bdcf4..9cddf336c09d 100644 --- a/drivers/irqchip/irq-gic-v3-its.c +++ b/drivers/irqchip/irq-gic-v3-its.c @@ -1179,7 +1179,7 @@ static void its_irq_compose_msi_msg(struct irq_data *d, struct msi_msg *msg) msg->address_hi = upper_32_bits(addr); msg->data = its_get_event_id(d); - iommu_dma_map_msi_msg(d->irq, msg); + iommu_dma_compose_msi_msg(irq_data_get_msi_desc(d), msg); } static int its_irq_set_irqchip_state(struct irq_data *d, @@ -2566,6 +2566,7 @@ static int its_irq_domain_alloc(struct irq_domain *domain, unsigned int virq, { msi_alloc_info_t *info = args; struct its_device *its_dev = info->scratchpad[0].ptr; + struct its_node *its = its_dev->its; irq_hw_number_t hwirq; int err; int i; @@ -2574,6 +2575,10 @@ static int its_irq_domain_alloc(struct irq_domain *domain, unsigned int virq, if (err) return err; + err = iommu_dma_prepare_msi(info->desc, its->get_msi_base(its_dev)); + if (err) + return err; + for (i = 0; i < nr_irqs; i++) { err = its_irq_gic_domain_alloc(domain, virq + i, hwirq + i); if (err) -- 2.11.0 ___ iommu mailing list 
Re: [PATCH v4 0/3] PCIe Host request to reserve IOVA
On Wed, May 01, 2019 at 02:20:56PM +0100, Robin Murphy wrote: > On 2019-05-01 1:55 pm, Bjorn Helgaas wrote: > > On Wed, May 01, 2019 at 12:30:38PM +0100, Lorenzo Pieralisi wrote: > > > On Fri, Apr 12, 2019 at 08:43:32AM +0530, Srinath Mannam wrote: > > > > Few SOCs have limitation that their PCIe host can't allow few inbound > > > > address ranges. Allowed inbound address ranges are listed in dma-ranges > > > > DT property and this address ranges are required to do IOVA mapping. > > > > Remaining address ranges have to be reserved in IOVA mapping. > > > > > > > > PCIe Host driver of those SOCs has to list resource entries of allowed > > > > address ranges given in dma-ranges DT property in sorted order. This > > > > sorted list of resources will be processed and reserve IOVA address for > > > > inaccessible address holes while initializing IOMMU domain. > > > > > > > > This patch set is based on Linux-5.0-rc2. > > > > > > > > Changes from v3: > > > >- Addressed Robin Murphy review comments. > > > > - pcie-iproc: parse dma-ranges and make sorted resource list. > > > > - dma-iommu: process list and reserve gaps between entries > > > > > > > > Changes from v2: > > > >- Patch set rebased to Linux-5.0-rc2 > > > > > > > > Changes from v1: > > > >- Addressed Oza review comments. > > > > > > > > Srinath Mannam (3): > > > >PCI: Add dma_ranges window list > > > >iommu/dma: Reserve IOVA for PCIe inaccessible DMA address > > > >PCI: iproc: Add sorted dma ranges resource entries to host bridge > > > > > > > > drivers/iommu/dma-iommu.c | 19 > > > > drivers/pci/controller/pcie-iproc.c | 44 > > > > - > > > > drivers/pci/probe.c | 3 +++ > > > > include/linux/pci.h | 1 + > > > > 4 files changed, 66 insertions(+), 1 deletion(-) > > > > > > Bjorn, Joerg, > > > > > > this series should not affect anything in the mainline other than its > > > consumer (ie patch 3); if that's the case should we consider it for v5.2 > > > and if yes how are we going to merge it ? 
> > > > I acked the first one > > > > Robin reviewed the second > > (https://lore.kernel.org/lkml/e6c812d6-0cad-4cfd-defd-d7ec427a6...@arm.com) > > (though I do agree with his comment about DMA_BIT_MASK()), Joerg was OK > > with it if Robin was > > (https://lore.kernel.org/lkml/20190423145721.gh29...@8bytes.org). > > > > Eric reviewed the third (and pointed out a typo). > > > > My Kconfiggery never got fully answered -- it looks to me as though it's > > possible to build pcie-iproc without the DMA hole support, and I thought > > the whole point of this series was to deal with those holes > > (https://lore.kernel.org/lkml/20190418234241.gf126...@google.com). I would > > have expected something like making pcie-iproc depend on IOMMU_SUPPORT. > > But Srinath didn't respond to that, so maybe it's not an issue and it > > should only affect pcie-iproc anyway. > > Hmm, I'm sure I had at least half-written a reply on that point, but I > can't seem to find it now... anyway, the gist is that these inbound > windows are generally set up to cover the physical address ranges of DRAM > and anything else that devices might need to DMA to. Thus if you're not > using an IOMMU, the fact that devices can't access the gaps in between > doesn't matter because there won't be anything there anyway; it only > needs mitigating if you do use an IOMMU and start giving arbitrary > non-physical addresses to the endpoint. So basically there is no strict IOMMU_SUPPORT dependency. > > So bottom line, I'm fine with merging it for v5.2. Do you want to merge > > it, Lorenzo, or ...? > > This doesn't look like it will conflict with the other DMA ops and MSI > mapping changes currently in-flight for iommu-dma, so I have no > objection to it going through the PCI tree for 5.2. I will update the DMA_BIT_MASK() according to your review and fix the typo Eric pointed out and push out a branch - we shall see if we can include it for v5.2. 
Thanks, Lorenzo
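The reservation step Robin describes boils down to walking the sorted list of allowed inbound windows and reserving every hole, so the IOMMU layer never hands out an IOVA the host would ignore. The sketch below is a rough user-space model with an assumed window layout and inclusive bounds; the real code in the series processes the host bridge's dma_ranges resource list and reserves the gaps in the IOVA domain instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct window { uint64_t start, end; };	/* inclusive, sorted by start */

/* Given the sorted allowed inbound windows, emit the gaps between them
 * (and before/after them, within [lo_limit, hi_limit]) that must be
 * reserved so they are never allocated as DMA addresses. */
static size_t reserve_gaps(const struct window *win, size_t n,
			   uint64_t lo_limit, uint64_t hi_limit,
			   struct window *gaps, size_t max_gaps)
{
	uint64_t cursor = lo_limit;
	size_t count = 0;

	for (size_t i = 0; i < n && count < max_gaps; i++) {
		if (win[i].start > cursor) {
			gaps[count].start = cursor;
			gaps[count].end = win[i].start - 1;
			count++;
		}
		if (win[i].end + 1 > cursor)
			cursor = win[i].end + 1;
	}
	if (cursor <= hi_limit && count < max_gaps) {
		gaps[count].start = cursor;
		gaps[count].end = hi_limit;
		count++;
	}
	return count;
}
```

With two allowed windows at 2 GB-4 GB and 32 GB-64 GB, this yields three reserved holes: below the first window, between the two, and above the second.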
Re: [RFC PATCH v9 03/13] mm: Add support for eXclusive Page Frame Ownership (XPFO)
On Wed, Apr 03, 2019 at 11:34:04AM -0600, Khalid Aziz wrote: > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt > index 858b6c0b9a15..9b36da94760e 100644 > --- a/Documentation/admin-guide/kernel-parameters.txt > +++ b/Documentation/admin-guide/kernel-parameters.txt > @@ -2997,6 +2997,12 @@ > > nox2apic [X86-64,APIC] Do not enable x2APIC mode. > > + noxpfo [XPFO] Disable eXclusive Page Frame Ownership (XPFO) > + when CONFIG_XPFO is on. Physical pages mapped into > + user applications will also be mapped in the > + kernel's address space as if CONFIG_XPFO was not > + enabled. > + > cpu0_hotplug [X86] Turn on CPU0 hotplug feature when > CONFIG_BOOTPARAM_HOTPLUG_CPU0 is off. > Some features depend on CPU0. Known dependencies are: Given the big performance impact that XPFO can have, it should be off by default when configured. Instead, the xpfo option should be used to enable it. Cheers, Longman
Re: [PATCH v4 0/3] PCIe Host request to reserve IOVA
Hi Robin, Thank you so much for all the information. Regards, Srinath. On Wed, May 1, 2019 at 6:51 PM Robin Murphy wrote: > [...]
Re: [PATCH 4/7] dma-direct: provide generic support for uncached kernel segments
Hi Christoph, On Tue, Apr 30, 2019 at 07:00:29AM -0400, Christoph Hellwig wrote: > diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c > index 2c2772e9702a..d15a535c3e67 100644 > --- a/kernel/dma/direct.c > +++ b/kernel/dma/direct.c > @@ -164,6 +164,13 @@ void *dma_direct_alloc_pages(struct device *dev, size_t > size, > } > > ret = page_address(page); > + > + if (IS_ENABLED(CONFIG_ARCH_HAS_UNCACHED_SEGMENT) && > + !dev_is_dma_coherent(dev) && !(attrs & DMA_ATTR_NON_CONSISTENT)) { > + arch_dma_prep_coherent(page, size); > + ret = uncached_kernel_address(ret); > + } > + > if (force_dma_unencrypted()) { > set_memory_decrypted((unsigned long)ret, 1 << get_order(size)); > *dma_handle = __phys_to_dma(dev, page_to_phys(page)); > @@ -171,6 +178,7 @@ void *dma_direct_alloc_pages(struct device *dev, size_t > size, > *dma_handle = phys_to_dma(dev, page_to_phys(page)); > } > memset(ret, 0, size); > + > return ret; > } I'm not so sure about this part though. On MIPS we currently don't clear the allocated memory with memset. Is doing that really necessary? If it is necessary then as-is this code will clear the allocated memory using uncached writes which will be pretty slow. It would be much more efficient to perform the memset before arch_dma_prep_coherent() & before converting ret to an uncached address. Thanks, Paul ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
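Paul's suggested ordering, reduced to a user-space toy: zero the buffer while we still hold the cached alias, then hand out the uncached alias. On MIPS the two aliases would be different kernel segments (KSEG0/KSEG1) mapping the same pages; here `malloc` and a returned pointer merely simulate that, so this only illustrates the ordering, not the real dma-direct code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy model of dma_direct_alloc_pages() with the memset hoisted before
 * the switch to the uncached alias.  "Uncached" is simulated: it is the
 * same memory seen through the returned pointer. */
static void *alloc_uncached_model(size_t size)
{
	uint8_t *cached = malloc(size);

	if (!cached)
		return NULL;

	memset(cached, 0, size);	/* fast: through the cached view */
	/* arch_dma_prep_coherent() would write back/invalidate the cache
	 * lines here, before the caller ever sees the uncached address. */
	return cached;			/* pretend uncached_kernel_address(cached) */
}
```

The caller still receives zeroed memory, but the zeroing happened with cached (fast) writes rather than through the uncached mapping.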
Re: [PATCH v2 06/19] drivers core: Add I/O ASID allocator
On 30/04/2019 21:24, Jacob Pan wrote: > On Thu, 25 Apr 2019 11:41:05 +0100 > Jean-Philippe Brucker wrote: > >> On 25/04/2019 11:17, Auger Eric wrote: +/** + * ioasid_alloc - Allocate an IOASID + * @set: the IOASID set + * @min: the minimum ID (inclusive) + * @max: the maximum ID (exclusive) + * @private: data private to the caller + * + * Allocate an ID between @min and @max (or %0 and %INT_MAX). Return the >>> I would remove "(or %0 and %INT_MAX)". >> >> Agreed, those were the default values of idr, but the xarray doesn't >> define a default max value. By the way, I do think squashing patches 6 >> and 7 would be better (keeping my SOB but you can change the author). >> > I will squash 6 and 7 in v3. I will just add my SOB but keep the > author if that is OK. Sure, that works Thanks, Jean
Re: [PATCH 4/7] dma-direct: provide generic support for uncached kernel segments
Hi Christoph,

On Wed, May 01, 2019 at 07:29:12PM +0200, Christoph Hellwig wrote:
> On Wed, May 01, 2019 at 05:18:59PM +0000, Paul Burton wrote:
> > I'm not so sure about this part though.
> >
> > On MIPS we currently don't clear the allocated memory with memset. Is
> > doing that really necessary?
>
> We are clearing it on MIPS; it is inside dma_direct_alloc_pages.

Ah, of course, I clearly require more caffeine :)

> > If it is necessary then as-is this code will clear the allocated memory
> > using uncached writes which will be pretty slow. It would be much more
> > efficient to perform the memset before arch_dma_prep_coherent() & before
> > converting ret to an uncached address.
>
> Yes, we could do that.

Great; using cached writes would match the existing MIPS behavior.

Thanks,
Paul
Re: [PATCH 4/7] dma-direct: provide generic support for uncached kernel segments
On Wed, May 01, 2019 at 05:40:34PM +0000, Paul Burton wrote:
> > > If it is necessary then as-is this code will clear the allocated memory
> > > using uncached writes which will be pretty slow. It would be much more
> > > efficient to perform the memset before arch_dma_prep_coherent() & before
> > > converting ret to an uncached address.
> >
> > Yes, we could do that.
>
> Great; using cached writes would match the existing MIPS behavior.

Can you test the stack with the two updated patches and ack them if
they are fine? That would allow getting at least the infrastructure
and MIPS in for this merge window.
[PATCH v5 3/3] PCI: iproc: Add sorted dma ranges resource entries to host bridge
The IPROC host has the limitation that it can use only the address ranges
given in the dma-ranges property as inbound addresses, so the memory
address holes in dma-ranges must be reserved rather than handed out as
DMA addresses.

Inbound addresses of the host accessed by PCIe devices are not translated
before they reach the IOMMU (or go directly to the PE). The limitation of
this host is that accesses to a few address ranges are ignored, so the
IOVA ranges covering those addresses have to be reserved.

All allowed address ranges are listed in the dma-ranges DT property.
These address ranges are converted to resource entries, listed in sorted
order, and added to the dma_ranges list of the PCI host bridge structure.

Ex:
dma-ranges = < \
	0x43000000 0x00 0x80000000 0x00 0x80000000 0x00 0x80000000 \
	0x43000000 0x08 0x00000000 0x08 0x00000000 0x08 0x00000000 \
	0x43000000 0x80 0x00000000 0x80 0x00000000 0x40 0x00000000>

In the above example of dma-ranges, the memory addresses
0x0 - 0x80000000,
0x100000000 - 0x800000000,
0x1000000000 - 0x8000000000 and
0x10000000000 - 0xffffffffffffffff
are not allowed to be used as inbound addresses.

Signed-off-by: Srinath Mannam
Based-on-patch-by: Oza Pawandeep
Reviewed-by: Oza Pawandeep
Reviewed-by: Eric Auger
---
 drivers/pci/controller/pcie-iproc.c | 44 ++++++++++++++++++++++++++++++++++-
 1 file changed, 43 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/controller/pcie-iproc.c b/drivers/pci/controller/pcie-iproc.c
index c20fd6b..94ba5c0 100644
--- a/drivers/pci/controller/pcie-iproc.c
+++ b/drivers/pci/controller/pcie-iproc.c
@@ -1146,11 +1146,43 @@ static int iproc_pcie_setup_ib(struct iproc_pcie *pcie,
 	return ret;
 }

+static int
+iproc_pcie_add_dma_range(struct device *dev, struct list_head *resources,
+			 struct of_pci_range *range)
+{
+	struct resource *res;
+	struct resource_entry *entry, *tmp;
+	struct list_head *head = resources;
+
+	res = devm_kzalloc(dev, sizeof(struct resource), GFP_KERNEL);
+	if (!res)
+		return -ENOMEM;
+
+	resource_list_for_each_entry(tmp, resources) {
+		if (tmp->res->start < range->cpu_addr)
+			head = &tmp->node;
+	}
+
+	res->start = range->cpu_addr;
+	res->end = res->start + range->size - 1;
+
+	entry = resource_list_create_entry(res, 0);
+	if (!entry)
+		return -ENOMEM;
+
+	entry->offset = res->start - range->cpu_addr;
+	resource_list_add(entry, head);
+
+	return 0;
+}
+
 static int iproc_pcie_map_dma_ranges(struct iproc_pcie *pcie)
 {
+	struct pci_host_bridge *host = pci_host_bridge_from_priv(pcie);
 	struct of_pci_range range;
 	struct of_pci_range_parser parser;
 	int ret;
+	LIST_HEAD(resources);

 	/* Get the dma-ranges from DT */
 	ret = of_pci_dma_range_parser_init(&parser, pcie->dev->of_node);
@@ -1158,13 +1190,23 @@ static int iproc_pcie_map_dma_ranges(struct iproc_pcie *pcie)
 		return ret;

 	for_each_of_pci_range(&parser, &range) {
+		ret = iproc_pcie_add_dma_range(pcie->dev,
+					       &resources,
+					       &range);
+		if (ret)
+			goto out;
 		/* Each range entry corresponds to an inbound mapping region */
 		ret = iproc_pcie_setup_ib(pcie, &range, IPROC_PCIE_IB_MAP_MEM);
 		if (ret)
-			return ret;
+			goto out;
 	}

+	list_splice_init(&resources, &host->dma_ranges);
+
 	return 0;
+out:
+	pci_free_resource_list(&resources);
+	return ret;
 }

 static int iproce_pcie_get_msi(struct iproc_pcie *pcie,
--
2.7.4
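The sorted-insert walk in iproc_pcie_add_dma_range() — remember the node of the last existing entry whose start lies below the new range's start, then link the new entry after it — can be sketched in user-space C. A bare singly linked list stands in for the kernel's `resource_entry` list; all `demo_` names are illustrative.

```c
#include <stddef.h>

/* One inbound range, keyed by its start address. */
struct demo_range {
	unsigned long long start;
	struct demo_range *next;
};

/* Insert @r into the list at *head, keeping the list sorted by start.
 * The walk mirrors the resource_list_for_each_entry() loop above:
 * advance past every entry whose start is below the new one. */
static void demo_add_sorted(struct demo_range **head, struct demo_range *r)
{
	struct demo_range **pos = head;

	while (*pos && (*pos)->start < r->start)
		pos = &(*pos)->next;

	r->next = *pos;
	*pos = r;
}
```

Keeping the list sorted at insertion time is what lets patch 2/3 later walk `bridge->dma_ranges` once, reserving each hole between consecutive windows.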
[PATCH v5 1/3] PCI: Add dma_ranges window list
Add a dma_ranges field to the PCI host bridge structure to hold a list of
resource entries, in sorted order, for the memory regions given through
the dma-ranges DT property. While initializing the IOMMU domain of PCI
EPs connected to that host bridge, this list of resources will be
processed and IOVAs for the address holes will be reserved.

Signed-off-by: Srinath Mannam
Based-on-patch-by: Oza Pawandeep
Reviewed-by: Oza Pawandeep
Acked-by: Bjorn Helgaas
---
 drivers/pci/probe.c | 3 +++
 include/linux/pci.h | 1 +
 2 files changed, 4 insertions(+)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 7e12d01..72563c1 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -595,6 +595,7 @@ struct pci_host_bridge *pci_alloc_host_bridge(size_t priv)
 		return NULL;

 	INIT_LIST_HEAD(&bridge->windows);
+	INIT_LIST_HEAD(&bridge->dma_ranges);
 	bridge->dev.release = pci_release_host_bridge_dev;

 	/*
@@ -623,6 +624,7 @@ struct pci_host_bridge *devm_pci_alloc_host_bridge(struct device *dev,
 		return NULL;

 	INIT_LIST_HEAD(&bridge->windows);
+	INIT_LIST_HEAD(&bridge->dma_ranges);
 	bridge->dev.release = devm_pci_release_host_bridge_dev;

 	return bridge;
@@ -632,6 +634,7 @@ EXPORT_SYMBOL(devm_pci_alloc_host_bridge);
 void pci_free_host_bridge(struct pci_host_bridge *bridge)
 {
 	pci_free_resource_list(&bridge->windows);
+	pci_free_resource_list(&bridge->dma_ranges);
 	kfree(bridge);
 }
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 7744821..bba0a29 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -490,6 +490,7 @@ struct pci_host_bridge {
 	void		*sysdata;
 	int		busnr;
 	struct list_head windows;	/* resource_entry */
+	struct list_head dma_ranges;	/* dma ranges resource list */
 	u8 (*swizzle_irq)(struct pci_dev *, u8 *); /* Platform IRQ swizzler */
 	int (*map_irq)(const struct pci_dev *, u8, u8);
 	void (*release_fn)(struct pci_host_bridge *);
--
2.7.4
[PATCH v5 0/3] PCIe Host request to reserve IOVA
A few SoCs have the limitation that their PCIe host can't accept some
inbound address ranges. The allowed inbound address ranges are listed in
the dma-ranges DT property, and these address ranges are required to do
IOVA mapping. The remaining address ranges have to be reserved in the
IOVA mapping.

The PCIe host driver of those SoCs has to list resource entries for the
allowed address ranges given in the dma-ranges DT property in sorted
order. This sorted list of resources will be processed and IOVA
addresses reserved for the inaccessible address holes while initializing
the IOMMU domain.

This patch set is based on Linux-5.1-rc3.

Changes from v4:
- Addressed Bjorn, Robin Murphy and Auger Eric review comments.
- Commit message modification.
- Change DMA_BIT_MASK to "~(dma_addr_t)0".

Changes from v3:
- Addressed Robin Murphy review comments.
- pcie-iproc: parse dma-ranges and make sorted resource list.
- dma-iommu: process list and reserve gaps between entries.

Changes from v2:
- Patch set rebased to Linux-5.0-rc2.

Changes from v1:
- Addressed Oza review comments.

Srinath Mannam (3):
  PCI: Add dma_ranges window list
  iommu/dma: Reserve IOVA for PCIe inaccessible DMA address
  PCI: iproc: Add sorted dma ranges resource entries to host bridge

 drivers/iommu/dma-iommu.c           | 19 ++++++++++++++
 drivers/pci/controller/pcie-iproc.c | 44 ++++++++++++++++++++++++++++++++-
 drivers/pci/probe.c                 |  3 +++
 include/linux/pci.h                 |  1 +
 4 files changed, 66 insertions(+), 1 deletion(-)

--
2.7.4
[PATCH v5 2/3] iommu/dma: Reserve IOVA for PCIe inaccessible DMA address
The dma_ranges field of the PCI host bridge structure holds resource
entries, in sorted order, for the address ranges given through the
dma-ranges DT property; these are the accessible DMA address ranges.
Process this resource list and reserve IOVA addresses for the
inaccessible address holes between entries. This method is similar to
reserving the PCI I/O resource address ranges in the IOMMU for each EP
connected to the host bridge.

Signed-off-by: Srinath Mannam
Based-on-patch-by: Oza Pawandeep
Reviewed-by: Oza Pawandeep
Acked-by: Robin Murphy
---
 drivers/iommu/dma-iommu.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 77aabe6..da94844 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -212,6 +212,7 @@ static void iova_reserve_pci_windows(struct pci_dev *dev,
 	struct pci_host_bridge *bridge = pci_find_host_bridge(dev->bus);
 	struct resource_entry *window;
 	unsigned long lo, hi;
+	phys_addr_t start = 0, end;

 	resource_list_for_each_entry(window, &bridge->windows) {
 		if (resource_type(window->res) != IORESOURCE_MEM)
@@ -221,6 +222,24 @@ static void iova_reserve_pci_windows(struct pci_dev *dev,
 		hi = iova_pfn(iovad, window->res->end - window->offset);
 		reserve_iova(iovad, lo, hi);
 	}
+
+	/* Get reserved DMA windows from host bridge */
+	resource_list_for_each_entry(window, &bridge->dma_ranges) {
+		end = window->res->start - window->offset;
+resv_iova:
+		if (end - start) {
+			lo = iova_pfn(iovad, start);
+			hi = iova_pfn(iovad, end);
+			reserve_iova(iovad, lo, hi);
+		}
+		start = window->res->end - window->offset + 1;
+		/* If window is last entry */
+		if (window->node.next == &bridge->dma_ranges &&
+		    end != ~(dma_addr_t)0) {
+			end = ~(dma_addr_t)0;
+			goto resv_iova;
+		}
+	}
 }

 static int iova_reserve_iommu_regions(struct device *dev,
--
2.7.4
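The gap-reservation walk in iova_reserve_pci_windows() can be sketched in user-space C: given a sorted list of allowed DMA windows, reserve every hole before and between them, plus the tail up to the maximum address (the role of the `goto resv_iova` path in the patch). The interval recording, the `demo_` names and the 32-bit `DEMO_ADDR_MAX` are all illustrative stand-ins for `reserve_iova()` and `~(dma_addr_t)0`.

```c
/* User-space model of the hole computation only; not kernel code. */
#define DEMO_ADDR_MAX 0xffffffffULL

struct demo_window { unsigned long long start, end; };	/* end inclusive */
struct demo_hole   { unsigned long long start, end; };	/* end exclusive,
							   except the tail
							   which runs to
							   DEMO_ADDR_MAX */

/* Record every address hole outside the @n sorted windows into @out.
 * Returns the number of holes found. */
static int demo_reserve_holes(const struct demo_window *win, int n,
			      struct demo_hole *out)
{
	unsigned long long start = 0;
	int holes = 0;

	for (int i = 0; i < n; i++) {
		if (win[i].start > start) {	/* hole before this window */
			out[holes].start = start;
			out[holes].end = win[i].start;
			holes++;
		}
		start = win[i].end + 1;
	}
	if (start <= DEMO_ADDR_MAX) {		/* trailing hole, like the
						   last-entry goto path */
		out[holes].start = start;
		out[holes].end = DEMO_ADDR_MAX;
		holes++;
	}
	return holes;
}
```

With two windows [0x1000, 0x1fff] and [0x4000, 0x4fff], this reserves [0, 0x1000), [0x2000, 0x4000) and [0x5000, DEMO_ADDR_MAX] — the same three-hole shape as the commit-message example.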
Re: [PATCH 5/7 v2] MIPS: use the generic uncached segment support in dma-direct
Hi Christoph,

On Wed, May 01, 2019 at 03:13:39PM +0200, Christoph Hellwig wrote:
> Stop providing our arch alloc/free hooks and just expose the segment
> offset instead.
>
> Signed-off-by: Christoph Hellwig
> ---
>  arch/mips/Kconfig              |  1 +
>  arch/mips/include/asm/page.h   |  3 ---
>  arch/mips/jazz/jazzdma.c       |  6 ------
>  arch/mips/mm/dma-noncoherent.c | 26 +++++---------------------
>  4 files changed, 10 insertions(+), 26 deletions(-)

This one looks good to me now; for patches 1 & 5:

Acked-by: Paul Burton

Thanks,
Paul
Re: [PATCH v2 1/9] soc/fsl/qman: fixup liodns only on ppc targets
On Sat, Apr 27, 2019 at 2:14 AM wrote:
>
> From: Laurentiu Tudor
>
> ARM SoCs use the SMMU, so the liodn fixup done in the qman driver no
> longer makes sense and it also breaks the ICID settings inherited
> from u-boot. Do the fixups only for PPC targets.
>
> Signed-off-by: Laurentiu Tudor

Applied for next. Thanks.

Leo

> ---
>  drivers/soc/fsl/qbman/qman_ccsr.c | 2 +-
>  drivers/soc/fsl/qbman/qman_priv.h | 9 ++++++++-
>  2 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/soc/fsl/qbman/qman_ccsr.c b/drivers/soc/fsl/qbman/qman_ccsr.c
> index 109b38de3176..a6bb43007d03 100644
> --- a/drivers/soc/fsl/qbman/qman_ccsr.c
> +++ b/drivers/soc/fsl/qbman/qman_ccsr.c
> @@ -596,7 +596,7 @@ static int qman_init_ccsr(struct device *dev)
>  }
>
>  #define LIO_CFG_LIODN_MASK 0x0fff0000
> -void qman_liodn_fixup(u16 channel)
> +void __qman_liodn_fixup(u16 channel)
>  {
>  	static int done;
>  	static u32 liodn_offset;
> diff --git a/drivers/soc/fsl/qbman/qman_priv.h b/drivers/soc/fsl/qbman/qman_priv.h
> index 75a8f905f8f7..04515718cfd9 100644
> --- a/drivers/soc/fsl/qbman/qman_priv.h
> +++ b/drivers/soc/fsl/qbman/qman_priv.h
> @@ -193,7 +193,14 @@ extern struct gen_pool *qm_cgralloc; /* CGR ID allocator */
>  u32 qm_get_pools_sdqcr(void);
>
>  int qman_wq_alloc(void);
> -void qman_liodn_fixup(u16 channel);
> +#ifdef CONFIG_FSL_PAMU
> +#define qman_liodn_fixup __qman_liodn_fixup
> +#else
> +static inline void qman_liodn_fixup(u16 channel)
> +{
> +}
> +#endif
> +void __qman_liodn_fixup(u16 channel);
>  void qman_set_sdest(u16 channel, unsigned int cpu_idx);
>
>  struct qman_portal *qman_create_affine_portal(
> --
> 2.17.1
Re: [PATCH v2 2/9] soc/fsl/qbman_portals: add APIs to retrieve the probing status
On Sat, Apr 27, 2019 at 2:14 AM wrote:
>
> From: Laurentiu Tudor
>
> Add a couple of new APIs to check the probing status of the required
> cpu bound qman and bman portals:
> 'int bman_portals_probed()' and 'int qman_portals_probed()'.
> They return the following values:
>  *  1 if qman/bman portals were all probed correctly
>  *  0 if qman/bman portals were not yet probed
>  * -1 if probing of qman/bman portals failed
> Portals are considered successfully probed if no error occurred during
> the probing of any of the portals and if enough portals were probed
> to have one available for each cpu.
> The error handling paths were slightly rearranged in order to fit this
> new functionality without being too intrusive.
> Drivers that use qman/bman portal driver services are required to use
> these APIs before calling any functions exported by these drivers or
> otherwise they will crash the kernel.
> First user will be the dpaa1 ethernet driver, coming in a subsequent
> patch.
>
> Signed-off-by: Laurentiu Tudor

Applied for next. Thanks.

Leo

> ---
>  drivers/soc/fsl/qbman/bman_portal.c | 20 ++++++++++++++------
>  drivers/soc/fsl/qbman/qman_portal.c | 21 +++++++++++++++------
>  include/soc/fsl/bman.h              |  8 ++++++++
>  include/soc/fsl/qman.h              |  9 +++++++++
>  4 files changed, 50 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/soc/fsl/qbman/bman_portal.c b/drivers/soc/fsl/qbman/bman_portal.c
> index 2c95cf59f3e7..cf4f10d6f590 100644
> --- a/drivers/soc/fsl/qbman/bman_portal.c
> +++ b/drivers/soc/fsl/qbman/bman_portal.c
> @@ -32,6 +32,7 @@
>
>  static struct bman_portal *affine_bportals[NR_CPUS];
>  static struct cpumask portal_cpus;
> +static int __bman_portals_probed;
>  /* protect bman global registers and global data shared among portals */
>  static DEFINE_SPINLOCK(bman_lock);
>
> @@ -87,6 +88,12 @@ static int bman_online_cpu(unsigned int cpu)
>  	return 0;
>  }
>
> +int bman_portals_probed(void)
> +{
> +	return __bman_portals_probed;
> +}
> +EXPORT_SYMBOL_GPL(bman_portals_probed);
> +
>  static int bman_portal_probe(struct platform_device *pdev)
>  {
>  	struct device *dev = &pdev->dev;
> @@ -104,8 +111,10 @@ static int bman_portal_probe(struct platform_device *pdev)
>  	}
>
>  	pcfg = devm_kmalloc(dev, sizeof(*pcfg), GFP_KERNEL);
> -	if (!pcfg)
> +	if (!pcfg) {
> +		__bman_portals_probed = -1;
>  		return -ENOMEM;
> +	}
>
>  	pcfg->dev = dev;
>
> @@ -113,14 +122,14 @@ static int bman_portal_probe(struct platform_device *pdev)
>  			     DPAA_PORTAL_CE);
>  	if (!addr_phys[0]) {
>  		dev_err(dev, "Can't get %pOF property 'reg::CE'\n", node);
> -		return -ENXIO;
> +		goto err_ioremap1;
>  	}
>
>  	addr_phys[1] = platform_get_resource(pdev, IORESOURCE_MEM,
>  					     DPAA_PORTAL_CI);
>  	if (!addr_phys[1]) {
>  		dev_err(dev, "Can't get %pOF property 'reg::CI'\n", node);
> -		return -ENXIO;
> +		goto err_ioremap1;
>  	}
>
>  	pcfg->cpu = -1;
> @@ -128,7 +137,7 @@ static int bman_portal_probe(struct platform_device *pdev)
>  	irq = platform_get_irq(pdev, 0);
>  	if (irq <= 0) {
>  		dev_err(dev, "Can't get %pOF IRQ'\n", node);
> -		return -ENXIO;
> +		goto err_ioremap1;
>  	}
>  	pcfg->irq = irq;
>
> @@ -150,6 +159,7 @@ static int bman_portal_probe(struct platform_device *pdev)
>  	spin_lock(&bman_lock);
>  	cpu = cpumask_next_zero(-1, &portal_cpus);
>  	if (cpu >= nr_cpu_ids) {
> +		__bman_portals_probed = 1;
>  		/* unassigned portal, skip init */
>  		spin_unlock(&bman_lock);
>  		return 0;
> @@ -175,6 +185,8 @@ static int bman_portal_probe(struct platform_device *pdev)
>  err_ioremap2:
>  	memunmap(pcfg->addr_virt_ce);
>  err_ioremap1:
> +	__bman_portals_probed = -1;
> +
>  	return -ENXIO;
>  }
>
> diff --git a/drivers/soc/fsl/qbman/qman_portal.c b/drivers/soc/fsl/qbman/qman_portal.c
> index 661c9b234d32..e2186b681d87 100644
> --- a/drivers/soc/fsl/qbman/qman_portal.c
> +++ b/drivers/soc/fsl/qbman/qman_portal.c
> @@ -38,6 +38,7 @@ EXPORT_SYMBOL(qman_dma_portal);
>  #define CONFIG_FSL_DPA_PIRQ_FAST 1
>
>  static struct cpumask portal_cpus;
> +static int __qman_portals_probed;
>  /* protect qman global registers and global data shared among portals */
>  static DEFINE_SPINLOCK(qman_lock);
>
> @@ -220,6 +221,12 @@ static int qman_online_cpu(unsigned int cpu)
>  	return 0;
>  }
>
> +int qman_portals_probed(void)
> +{
> +	return __qman_portals_probed;
> +}
> +EXPORT_SYMBOL_GPL(qman_portals_probed);
> +
>  static int qman_portal_probe(struct platform_device *pdev)
>  {
>  	struct device *dev = &pdev->dev;
> @@ -238,8 +245,10 @@ static int
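The commit message describes the consumer-side contract: check the probing status before touching any portal service. A user-space sketch of such a consumer probe path follows; the `demo_` names and error numbers are illustrative stand-ins (a real driver would call the exported `qman_portals_probed()`/`bman_portals_probed()` and return the kernel's `-EPROBE_DEFER`/`-ENODEV`), but the 1 / 0 / -1 convention is the one from the patch.

```c
/* Illustrative errno values; stand-ins for the kernel's constants. */
#define DEMO_EPROBE_DEFER 517
#define DEMO_ENODEV        19

/* Stand-in for qman_portals_probed(): 1 probed, 0 not yet, -1 failed. */
static int demo_portals_probed_status;
static int demo_portals_probed(void)
{
	return demo_portals_probed_status;
}

/* Consumer probe: defer until portals exist, bail out if they failed. */
static int demo_consumer_probe(void)
{
	switch (demo_portals_probed()) {
	case 0:
		return -DEMO_EPROBE_DEFER;	/* portals not probed yet, retry */
	case -1:
		return -DEMO_ENODEV;		/* portal probing failed, give up */
	default:
		return 0;			/* safe to use qman/bman services */
	}
}
```

Deferring on 0 is what prevents the crash the commit message warns about: portal services are never entered before every per-CPU portal is available.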
Re: [PATCH 4/7] dma-direct: provide generic support for uncached kernel segments
Hi Christoph,

On Wed, May 01, 2019 at 07:49:05PM +0200, Christoph Hellwig wrote:
> On Wed, May 01, 2019 at 05:40:34PM +0000, Paul Burton wrote:
> > > > If it is necessary then as-is this code will clear the allocated memory
> > > > using uncached writes which will be pretty slow. It would be much more
> > > > efficient to perform the memset before arch_dma_prep_coherent() & before
> > > > converting ret to an uncached address.
> > >
> > > Yes, we could do that.
> >
> > Great; using cached writes would match the existing MIPS behavior.
>
> Can you test the stack with the two updated patches and ack them if
> they are fine? That would allow getting at least the infrastructure
> and MIPS in for this merge window.

Did you send a v2 of this patch? If so it hasn't shown up in my inbox,
nor in the linux-mips archive on lore.kernel.org.

Thanks,
Paul
[PATCH 1/2] iommu/vt-d: Set intel_iommu_gfx_mapped correctly
The intel_iommu_gfx_mapped flag is exported by the Intel IOMMU driver to
indicate whether an IOMMU is used for the graphics device. In a
virtualized IOMMU environment (e.g. QEMU), an include-all IOMMU is used
for the graphics device, yet this flag is found to be clear even when
the IOMMU is in use.

Cc: Ashok Raj
Cc: Jacob Pan
Cc: Kevin Tian
Reported-by: Zhenyu Wang
Fixes: c0771df8d5297 ("intel-iommu: Export a flag indicating that the IOMMU is used for iGFX.")
Suggested-by: Kevin Tian
Signed-off-by: Lu Baolu
---
 drivers/iommu/intel-iommu.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index e0c0febc6fa5..00ad00193883 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -4068,9 +4068,7 @@ static void __init init_no_remapping_devices(void)

 		/* This IOMMU has *only* gfx devices. Either bypass it or
 		   set the gfx_mapped flag, as appropriate */
-		if (dmar_map_gfx) {
-			intel_iommu_gfx_mapped = 1;
-		} else {
+		if (!dmar_map_gfx) {
 			drhd->ignored = 1;
 			for_each_active_dev_scope(drhd->devices,
 						  drhd->devices_cnt, i, dev)
@@ -4909,6 +4907,9 @@ int __init intel_iommu_init(void)
 		goto out_free_reserved_range;
 	}

+	if (dmar_map_gfx)
+		intel_iommu_gfx_mapped = 1;
+
 	init_no_remapping_devices();

 	ret = init_dmars();
--
2.17.1
[PATCH 2/2] iommu/vt-d: Make kernel parameter igfx_off work with vIOMMU
The kernel parameter igfx_off is used to disable DMA remapping for the
Intel integrated graphics device. It was designed for bare metal cases
where a dedicated IOMMU is used for graphics. This doesn't apply to the
virtual IOMMU case where an include-all IOMMU is used. Make the kernel
parameter work with the virtual IOMMU as well.

Cc: Ashok Raj
Cc: Jacob Pan
Suggested-by: Kevin Tian
Fixes: c0771df8d5297 ("intel-iommu: Export a flag indicating that the IOMMU is used for iGFX.")
Signed-off-by: Lu Baolu
Tested-by: Zhenyu Wang
---
 drivers/iommu/intel-iommu.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 00ad00193883..e078b13ce3d8 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3415,9 +3415,12 @@ static int __init init_dmars(void)
 		iommu_identity_mapping |= IDENTMAP_ALL;

 #ifdef CONFIG_INTEL_IOMMU_BROKEN_GFX_WA
-	iommu_identity_mapping |= IDENTMAP_GFX;
+	dmar_map_gfx = 0;
 #endif

+	if (!dmar_map_gfx)
+		iommu_identity_mapping |= IDENTMAP_GFX;
+
 	check_tylersburg_isoch();

 	if (iommu_identity_mapping) {
--
2.17.1
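The flag interplay after this patch can be modeled in a few lines of user-space C: igfx_off clears dmar_map_gfx, the BROKEN_GFX_WA config now also funnels through dmar_map_gfx instead of setting IDENTMAP_GFX directly, and identity mapping for graphics is requested exactly when dmar_map_gfx ended up clear. The `demo_` function and the flag value are illustrative; only the decision logic mirrors the patch.

```c
/* Illustrative flag value; stands in for the driver's IDENTMAP_GFX bit. */
#define DEMO_IDENTMAP_GFX 2

/* Return the graphics identity-mapping flags init_dmars() would request
 * for the given inputs, following the patched logic. */
static int demo_gfx_identmap(int igfx_off, int broken_gfx_wa)
{
	int dmar_map_gfx = !igfx_off;	/* igfx_off clears dmar_map_gfx */

	if (broken_gfx_wa)
		dmar_map_gfx = 0;	/* WA now routes through the same flag */

	return dmar_map_gfx ? 0 : DEMO_IDENTMAP_GFX;
}
```

The practical effect for the vIOMMU case: with igfx_off set, the single include-all IOMMU stays active for other devices while graphics gets an identity (pass-through) mapping, rather than the whole IOMMU being bypassed as on bare metal.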
[PATCH 0/2] iommu/vt-d: Small fixes for 5.2-rc1
Hi Joerg,

This includes two small fixes for the virtual IOMMU running in a qemu
environment.

On bare metal, we always have a dedicated IOMMU for the Intel integrated
graphics device, and some aspects of the driver were designed
accordingly. Unfortunately, in the qemu environment, the virtual IOMMU
has only a single include-all IOMMU engine; as a result some interfaces
don't work as expected anymore. This series includes two fixes for this.

Best regards,
Lu Baolu

Lu Baolu (2):
  iommu/vt-d: Set intel_iommu_gfx_mapped correctly
  iommu/vt-d: Make kernel parameter igfx_off work with vIOMMU

 drivers/iommu/intel-iommu.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

--
2.17.1
Re: [PATCH v3 02/10] swiotlb: Factor out slot allocation and free
Hi Robin,

On 4/30/19 5:53 PM, Robin Murphy wrote:
> On 30/04/2019 03:02, Lu Baolu wrote:
>> Hi Robin,
>>
>> On 4/29/19 7:06 PM, Robin Murphy wrote:
>>> On 29/04/2019 06:10, Lu Baolu wrote:
>>>> Hi Christoph,
>>>>
>>>> On 4/26/19 11:04 PM, Christoph Hellwig wrote:
>>>>> On Thu, Apr 25, 2019 at 10:07:19AM +0800, Lu Baolu wrote:
>>>>>> This is not VT-d specific. It's just how generic IOMMU works.
>>>>>>
>>>>>> Normally, IOMMU works in paging mode. So if a driver issues DMA with
>>>>>> IOVA 0x0123, IOMMU can remap it with a physical address 0x0123. But
>>>>>> we should never expect IOMMU to remap 0x0123 with physical address
>>>>>> of 0x. That's the reason why I said that IOMMU will not work there.
>>>>>
>>>>> Well, with the iommu it doesn't happen. With swiotlb it obviously can
>>>>> happen, so drivers are fine with it. Why would that suddenly become
>>>>> an issue when swiotlb is called from the iommu code?
>>>>
>>>> I would say IOMMU is DMA remapping, not DMA engine. :-)
>>>
>>> I'm not sure I really follow the issue here - if we're copying the
>>> buffer to the bounce page(s) there's no conceptual difference from
>>> copying it to SWIOTLB slot(s), so there should be no need to worry
>>> about the original in-page offset.
>>>
>>> From the reply up-thread I guess you're trying to include an
>>> optimisation to only copy the head and tail of the buffer if it spans
>>> multiple pages, and directly map the ones in the middle, but AFAICS
>>> that's going to tie you to also using strict mode for TLB maintenance,
>>> which may not be a win overall depending on the balance between
>>> invalidation bandwidth vs. memcpy bandwidth. At least if we use
>>> standard SWIOTLB logic to always copy the whole thing, we should be
>>> able to release the bounce pages via the flush queue to allow 'safe'
>>> lazy unmaps.
>>
>> With respect, even if we use the standard SWIOTLB logic, we need to use
>> the strict mode for TLB maintenance.
>>
>> Say, some swiotlb slots are used by an untrusted device for bounce page
>> purposes. When the device driver unmaps the IOVA, the slots are freed
>> but the mapping is still cached in the IOTLB, hence the untrusted
>> device is still able to access the slots. Then the slots are allocated
>> to other devices. This makes it possible for the untrusted device to
>> access the data buffer of other devices.
>
> Sure, that's indeed how it would work right now - however since the
> bounce pages will be freed and reused by the DMA API layer itself (at
> the same level as the IOVAs) I see no technical reason why we couldn't
> investigate deferred freeing as a future optimisation.

Yes, agreed.

Best regards,
Lu Baolu