Re: [PATCH 0/9] Use vm_insert_range
On Thu, Nov 15, 2018 at 9:09 PM Souptick Joarder wrote:
>
> Previously drivers have their own way of mapping a range of
> kernel pages/memory into user vma and this was done by
> invoking vm_insert_page() within a loop.
>
> As this pattern is common across different drivers, it can
> be generalized by creating a new function and using it across
> the drivers.
>
> vm_insert_range is the new API which will be used to map a
> range of kernel memory/pages to user vma.
>
> All the applicable places are converted to use the new vm_insert_range
> in this patch series.
>
> Souptick Joarder (9):
>   mm: Introduce new vm_insert_range API
>   arch/arm/mm/dma-mapping.c: Convert to use vm_insert_range
>   drivers/firewire/core-iso.c: Convert to use vm_insert_range
>   drm/rockchip/rockchip_drm_gem.c: Convert to use vm_insert_range
>   drm/xen/xen_drm_front_gem.c: Convert to use vm_insert_range
>   iommu/dma-iommu.c: Convert to use vm_insert_range
>   videobuf2/videobuf2-dma-sg.c: Convert to use vm_insert_range
>   xen/gntdev.c: Convert to use vm_insert_range
>   xen/privcmd-buf.c: Convert to use vm_insert_range

Any further comments on the driver changes?

>  arch/arm/mm/dma-mapping.c                         | 21 ++---
>  drivers/firewire/core-iso.c                       | 15 ++--
>  drivers/gpu/drm/rockchip/rockchip_drm_gem.c       | 20 ++--
>  drivers/gpu/drm/xen/xen_drm_front_gem.c           | 20 +---
>  drivers/iommu/dma-iommu.c                         | 12 ++
>  drivers/media/common/videobuf2/videobuf2-dma-sg.c | 23 ++-
>  drivers/xen/gntdev.c                              | 11 -
>  drivers/xen/privcmd-buf.c                         |  8 ++-
>  include/linux/mm_types.h                          |  3 +++
>  mm/memory.c                                       | 28 +++
>  mm/nommu.c                                        |  7 ++
>  11 files changed, 70 insertions(+), 98 deletions(-)
>
> --
> 1.9.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC PATCH 0/6] Auxiliary IOMMU domains and Arm SMMUv3
Hi Joerg,

On 11/8/18 12:43 AM, j...@8bytes.org wrote:
> Hi,
>
> On Wed, Nov 07, 2018 at 11:40:30AM +0800, Lu Baolu wrote:
>> Hi Joerg,
>>
>> On 11/7/18 12:25 AM, j...@8bytes.org wrote:
>>
>> Nowadays, we can find PASID granular DMA translation on both ARM and
>> x86 platforms. With PASID granular DMA translation supported in the
>> system iommu, if a device divides itself into multiple subsets and
>> tags the DMA transfers of each subset with a unique PASID, each
>> subset becomes assignable. We call such an assignable subset an ADI
>> (Assignable Device Interface). As a result, each ADI could be
>> attached to a domain.
>
> Yeah, I know the background. The point is, the IOMMU-API as of today
> implements a strict one-to-one relationship between domains and
> devices; every device can only have one domain assigned at a given
> time. If we assign a new domain to a device, the old one gets
> unassigned. If we allow attaching multiple domains to a single device
> we fundamentally break that semantic.

Yes. In the latest v4 submission, we use the aux-domain specific APIs
to attach/detach the domain to a device. Do you see other APIs that
will possibly break this semantic?

> Therefore I think it is better to support the ADI devices with
> subdomains and a new set of functions in the API to handle only
> these sub-domains.

Can you please elaborate a bit more on the concept of subdomains?
From my point of view, an aux-domain is a normal un-managed domain
which has a PASID and could be attached to any ADIs through the
aux-domain specific attach/detach APIs. It could also be attached to
a device with the existing domain_attach/detach_device() APIs at the
same time, hence mdev and pci devices in a vfio container could share
a domain.

Best regards,
Lu Baolu

>> Furthermore, a single domain might be attached to an ADI of device A,
>> while attached to another legacy device B which doesn't support PASID
>> features. In this case, we say "Domain 4" is attached to ADI(PASID#x)
>> in aux mode and attached to device B in primary mode.
>
> This will of course not work with subdomains, but is that really
> necessary? VFIO already allocates a separate domain for each device
> anyway (iirc), so it is unlikely that it uses the same domain for a
> legacy and an ADI device.
>
> Regards,
>
> Joerg
Re: [PATCH RFC] dma-direct: do not allocate a single page from CMA area
On Tue, Nov 20, 2018 at 10:20:10AM +0100, Christoph Hellwig wrote:
> On Mon, Nov 05, 2018 at 02:40:51PM -0800, Nicolin Chen wrote:
> > > > In general, this seems to make sense to me. It does represent a
> > > > theoretical change in behaviour for devices which have their own
> > > > CMA area somewhere other than kernel memory, and only ever make
> > > > non-atomic allocations, but I'm not sure whether that's a
> > > > realistic or common enough case to really worry about.
> > >
> > > Yes, I think we should make the decision in dma_alloc_from_contiguous
> > > based on having a per-dev CMA area or not. There is a lot of cruft in
> >
> > It seems that cma_alloc() already has a CMA area check? Would it
> > be duplicated to have a similar one in dma_alloc_from_contiguous?
>
> It isn't duplicate if it serves a different purpose.
>
> > > this area that should be cleaned up while we're at it, like always
> > > falling back to the normal page allocator if there is no CMA area or
> > > nothing suitable found in dma_alloc_from_contiguous instead of
> > > having to duplicate all that in the caller.
> >
> > Am I supposed to clean up the things mentioned above by moving
> > the fallback allocator into dma_alloc_from_contiguous, or to just
> > move my change (the count check) into dma_alloc_from_contiguous?
> >
> > I understand it'd be great to have a cleanup, yet I feel it could
> > be done separately as this patch isn't really a cleanup change.
>
> I can take care of any cleanups. I've been trying to dust up that
> area anyway.

Thanks for the reply. It looks like it'd be better for me to wait for
the cleanup to be done? I feel odd merely adding a size check in
dma_alloc_from_contiguous().
Re: [PATCH 1/4] PCI / ACPI: Identify external PCI devices
On Friday, November 16, 2018 11:57:38 AM CET Lorenzo Pieralisi wrote:
> On Thu, Nov 15, 2018 at 07:33:54PM +, mario.limoncie...@dell.com wrote:
> > > -----Original Message-----
> > > From: Mika Westerberg
> > > Sent: Thursday, November 15, 2018 1:01 PM
> > > To: Lorenzo Pieralisi
> > > Cc: Lukas Wunner; iommu@lists.linux-foundation.org; Joerg Roedel;
> > > David Woodhouse; Lu Baolu; Ashok Raj; Bjorn Helgaas; Rafael J. Wysocki;
> > > Jacob jun Pan; Andreas Noever; Michael Jamet; Yehezkel Bernat;
> > > Christian Kellner; Limonciello, Mario; Anthony Wong;
> > > linux-a...@vger.kernel.org; linux-...@vger.kernel.org;
> > > linux-ker...@vger.kernel.org
> > > Subject: Re: [PATCH 1/4] PCI / ACPI: Identify external PCI devices
> > >
> > > On Thu, Nov 15, 2018 at 05:46:08PM +, Lorenzo Pieralisi wrote:
> > > > Do you really need to parse it if the dev->is_thunderbolt check is
> > > > enough ?
> > >
> > > Yes, we need to parse it one way or another. dev->is_thunderbolt is
> > > based on heuristics which do not apply anymore when the thing gets
> > > integrated in the SoC.
> > >
> > > The _DSD is there already (on existing systems) and is being used by
> > > Windows, so I don't understand why we cannot take advantage of it?
> > > Every new system with Thunderbolt ports will have it.
>
> We have different opinions on this; there is no point in me reiterating
> it over and over. I am against the approach taken to solve this problem,
> first in defining the bindings outside the ACPI specifications and
> second by acquiescing to what has been done so that it will be done
> over and over again.

Arguably, however, we are on the receiving end of things here, and even
if we don't use this binding, that won't buy us anything (but it will
introduce a fair amount of disappointment among both users and OEMs).

If Windows uses it, then systems will ship with it regardless of what
Linux does with it, so your "acquiescing to what has been done" argument
leads nowhere in this particular case. It's just a matter of whether or
not Linux will provide the same level of functionality as Windows with
respect to this, and IMO it would be impractical to refuse to do that
for purely formal reasons.

> I will raise the point in the appropriate forum; it is up to Bjorn
> and Rafael to decide on this patch.

For the record, my opinion is that there's little choice on whether or
not to use this extra information that firmware is willing to give us.
It could be defined in a better way and so on, but since it is in use
anyway, we really have nothing to gain by refusing to use it.

Now, where the handling of it belongs is a separate matter that should
be decided on its own.

Thanks,
Rafael
Re: [PATCH v4 6/8] vfio/mdev: Add iommu place holders in mdev_device
On 11/16/2018 2:27 PM, Christoph Hellwig wrote:
> On Fri, Nov 16, 2018 at 09:20:48AM +0800, Lu Baolu wrote:
>>> Please keep the symbols mdev_set/get_iommu_device(dev, iommu_device)
>>> non-GPL, the same as the other exported symbols from the mdev_core
>>> module.
>>
>> Yes. It will be fixed in the next version.
>
> No. mdev shall not be used to circumvent the exports in the generic
> vfio code.

It is about how the mdev framework can be used by existing drivers.
These symbols don't use any other exported symbols.

Thanks,
Kirti
Re: [PATCH 0/2] SWIOTLB fixes for 4.20
On Tue, Nov 20, 2018 at 6:10 AM Robin Murphy wrote:
>
> This is what I have so far, which at least resolves the most obvious
> problems. I still haven't got very far with the USB corruption issue
> I see on Juno with -rc1, but I'm yet to confirm whether that's actually
> attributable to the SWIOTLB changes or something else entirely.
>
> Robin.
>
> Robin Murphy (2):
>   swiotlb: Make DIRECT_MAPPING_ERROR viable
>   swiotlb: Skip cache maintenance on map error
>
>  include/linux/dma-direct.h | 2 +-
>  kernel/dma/swiotlb.c       | 3 ++-
>  2 files changed, 3 insertions(+), 2 deletions(-)

Thanks so much for chasing this down! Unfortunately AOSP is giving me
grief this week, so I've not been able to test the full environment,
but I don't seem to be hitting the I/O hangs I was seeing earlier with
this patch set.

For both:
Tested-by: John Stultz

thanks
-john
Re: [virtio-dev] Re: [PATCH v4 5/7] iommu: Add virtio-iommu driver
On 16/11/2018 18:46, Jean-Philippe Brucker wrote:
>>> +/*
>>> + * __viommu_sync_req - Complete all in-flight requests
>>> + *
>>> + * Wait for all added requests to complete. When this function returns, all
>>> + * requests that were in-flight at the time of the call have completed.
>>> + */
>>> +static int __viommu_sync_req(struct viommu_dev *viommu)
>>> +{
>>> +	int ret = 0;
>>> +	unsigned int len;
>>> +	size_t write_len;
>>> +	struct viommu_request *req;
>>> +	struct virtqueue *vq = viommu->vqs[VIOMMU_REQUEST_VQ];
>>> +
>>> +	assert_spin_locked(&viommu->request_lock);
>>> +
>>> +	virtqueue_kick(vq);
>>> +
>>> +	while (!list_empty(&viommu->requests)) {
>>> +		len = 0;
>>> +		req = virtqueue_get_buf(vq, &len);
>>> +		if (!req)
>>> +			continue;
>>> +
>>> +		if (!len)
>>> +			viommu_set_req_status(req->buf, req->len,
>>> +					      VIRTIO_IOMMU_S_IOERR);
>>> +
>>> +		write_len = req->len - req->write_offset;
>>> +		if (req->writeback && len >= write_len)
>>
>> I don't get "len >= write_len". Is it valid for the device to write more
>> than write_len? If it writes less than write_len, the status is not set,
>> is it?

Actually, len could well be three bytes smaller than write_len. The
spec doesn't require the device to write the three padding bytes in
virtio_iommu_req_tail, after the status byte. Here the driver just
assumes that the device wrote the reserved field. The QEMU device
seems to write uninitialized data in there...

Any objection to changing the spec to require the device to initialize
those bytes to zero? I think it makes things nicer overall and
shouldn't have any performance impact.

[...]

>>> +static struct iommu_domain *viommu_domain_alloc(unsigned type)
>>> +{
>>> +	struct viommu_domain *vdomain;
>>> +
>>> +	if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA)
>>
>> smmu drivers also support the IDENTITY type. Don't we want to support it
>> as well?
>
> Eventually yes. The IDENTITY domain is useful when an IOMMU has been
> forced upon you and gets in the way of performance, in which case you
> get rid of it with CONFIG_IOMMU_DEFAULT_PASSTHROUGH=y or
> iommu.passthrough=y. For virtio-iommu though, you could simply not
> instantiate the device.
>
> I don't think it's difficult to add: if the device supports
> VIRTIO_IOMMU_F_BYPASS, then we simply don't send an ATTACH request.
> Otherwise after ATTACH we send a MAP for the whole input range. If the
> change isn't bigger than this, I'll add it to v5.

Not as straightforward as I hoped, when the device doesn't support
VIRTIO_IOMMU_F_BYPASS:

* We can't simply create an ID map for the whole input range, we need
  to carve out the resv_mem regions.

* For a VFIO device, the host has no way of forwarding the identity
  mapping. For example the guest will attempt to map [0; 0x] ->
  [0; 0x], but VFIO only allows to map RAM and cannot take such a
  request. One solution is to only create ID mappings for RAM, and
  register a notifier for memory hotplug, like intel-iommu does for
  IOMMUs that don't support HW pass-through.

Since it's not completely trivial and - I think - not urgent, I'll
leave this for later.

Thanks,
Jean
Re: [PATCH 1/2] swiotlb: Make DIRECT_MAPPING_ERROR viable
On Tue, Nov 20, 2018 at 03:01:33PM +, Robin Murphy wrote:
> On 20/11/2018 14:49, Konrad Rzeszutek Wilk wrote:
> > On Tue, Nov 20, 2018 at 02:09:52PM +, Robin Murphy wrote:
> > > With the overflow buffer removed, we no longer have a unique address
> > > which is guaranteed not to be a valid DMA target to use as an error
> > > token. The DIRECT_MAPPING_ERROR value of 0 tries to at least represent
> > > an unlikely DMA target, but unfortunately there are already SWIOTLB
> > > users with DMA-able memory at physical address 0 which now gets falsely
> > > treated as a mapping failure and leads to all manner of misbehaviour.
> > >
> > > The best we can do to mitigate that is flip DIRECT_MAPPING_ERROR to the
> > > commonly-used all-bits-set value, since the last single byte of memory
> > > is by far the least-likely-valid DMA target.
> >
> > Are all the callers checking for DIRECT_MAPPING_ERROR or is it more of
> > a comparison (as in if (!ret)) ?
>
> dma_direct_map_page() and dma_direct_mapping_error() were already doing
> the right thing, and external callers must rely on the latter via
> dma_mapping_error() rather than trying to inspect the actual value
> themselves, since that varies between implementations anyway. AFAICS
> all the new return paths from swiotlb_map_page() are also robust in
> referencing the macro explicitly, so I think we're good.

Cool! Thank you for checking.

Acked-by: Konrad Rzeszutek Wilk

Thank you!

> Thanks,
> Robin.
>
> > > Fixes: dff8d6c1ed58 ("swiotlb: remove the overflow buffer")
> > > Reported-by: John Stultz
> > > Signed-off-by: Robin Murphy
> > > ---
> > >  include/linux/dma-direct.h | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
> > > index bd73e7a91410..9de9c7ab39d6 100644
> > > --- a/include/linux/dma-direct.h
> > > +++ b/include/linux/dma-direct.h
> > > @@ -5,7 +5,7 @@
> > >  #include
> > >  #include
> > >
> > > -#define DIRECT_MAPPING_ERROR		0
> > > +#define DIRECT_MAPPING_ERROR		~(dma_addr_t)0
> > >
> > >  #ifdef CONFIG_ARCH_HAS_PHYS_TO_DMA
> > >  #include
> > > --
> > > 2.19.1.dirty
Re: [PATCH 0/2] SWIOTLB fixes for 4.20
On Tue, Nov 20, 2018 at 05:08:18PM +0100, Christoph Hellwig wrote:
> On Tue, Nov 20, 2018 at 02:09:51PM +, Robin Murphy wrote:
> > This is what I have so far, which at least resolves the most obvious
> > problems. I still haven't got very far with the USB corruption issue
> > I see on Juno with -rc1, but I'm yet to confirm whether that's actually
> > attributable to the SWIOTLB changes or something else entirely.
>
> This looks good modulo the minor nitpicks.
>
> Konrad, are you ok with me picking up both through the dma-mapping
> tree?

Yes, albeit I would want Stefano to take a peek at patch #2 just in case.
Re: [PATCH 1/2] swiotlb: Make DIRECT_MAPPING_ERROR viable
The subject line should say dma-direct. This isn't really swiotlb
specific.

> +#define DIRECT_MAPPING_ERROR		~(dma_addr_t)0

Please add braces around the value like the other *MAPPING_ERROR
definitions so that it can be safely used in any context.

Otherwise looks good:

Reviewed-by: Christoph Hellwig
Re: [PATCH 0/2] SWIOTLB fixes for 4.20
On Tue, Nov 20, 2018 at 02:09:51PM +, Robin Murphy wrote:
> This is what I have so far, which at least resolves the most obvious
> problems. I still haven't got very far with the USB corruption issue
> I see on Juno with -rc1, but I'm yet to confirm whether that's actually
> attributable to the SWIOTLB changes or something else entirely.

This looks good modulo the minor nitpicks.

Konrad, are you ok with me picking up both through the dma-mapping
tree?
Re: [PATCH 2/2] swiotlb: Skip cache maintenance on map error
On Tue, Nov 20, 2018 at 02:09:53PM +, Robin Murphy wrote:
> If swiotlb_bounce_page() failed, calling arch_sync_dma_for_device() may
> lead to such delights as performing cache maintenance on whatever
> address phys_to_virt(SWIOTLB_MAP_ERROR) looks like, which is typically
> outside the kernel memory map and goes about as well as expected.
>
> Don't do that.
>
> Fixes: a4a4330db46a ("swiotlb: add support for non-coherent DMA")
> Signed-off-by: Robin Murphy

Looks good,

Reviewed-by: Christoph Hellwig
Re: [PATCH v2] iommu/dma: Use NUMA aware memory allocations in __iommu_dma_alloc_pages()
On 20/11/2018 14:20, Robin Murphy wrote:
> On 20/11/2018 13:42, John Garry wrote:
>> From: Ganapatrao Kulkarni
>>
>> Change function __iommu_dma_alloc_pages() to allocate memory/pages
>> for DMA from the respective device's NUMA node.
>>
>> Signed-off-by: Ganapatrao Kulkarni
>> [JPG: Modified to use kvzalloc() and fixed indentation]
>> Signed-off-by: John Garry
>> ---
>> Difference v1->v2:
>> - Add Ganapatrao's tag and change author
>>
>> This patch was originally posted by Ganapatrao in [1]. However, after
>> initial review, it was never reposted (due to lack of cycles, I
>> think). In addition, the functionality in its sibling patches was
>> merged through other patches, as mentioned in [2]; this also refers
>> to a discussion on device-local allocations vs CPU-local allocations
>> for the DMA pool, and which is better [3].
>>
>> However, as mentioned in [3], dma_alloc_coherent() uses the locality
>> information from the device - as in direct DMA - so this patch is
>> just applying this same policy.
>>
>> [1] https://lore.kernel.org/patchwork/patch/833004/
>> [2] https://lkml.org/lkml/2018/8/22/391
>> [3] https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1692998.html
>>
>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>> index d1b0475..ada00bc 100644
>> --- a/drivers/iommu/dma-iommu.c
>> +++ b/drivers/iommu/dma-iommu.c
>> @@ -449,20 +449,17 @@ static void __iommu_dma_free_pages(struct page **pages, int count)
>>  	kvfree(pages);
>>  }
>>
>> -static struct page **__iommu_dma_alloc_pages(unsigned int count,
>> -		unsigned long order_mask, gfp_t gfp)
>> +static struct page **__iommu_dma_alloc_pages(struct device *dev,
>> +		unsigned int count, unsigned long order_mask, gfp_t gfp)
>>  {
>>  	struct page **pages;
>> -	unsigned int i = 0, array_size = count * sizeof(*pages);
>> +	unsigned int i = 0, nid = dev_to_node(dev);
>>
>>  	order_mask &= (2U << MAX_ORDER) - 1;
>>  	if (!order_mask)
>>  		return NULL;
>>
>> -	if (array_size <= PAGE_SIZE)
>> -		pages = kzalloc(array_size, GFP_KERNEL);
>> -	else
>> -		pages = vzalloc(array_size);
>> +	pages = kvzalloc_node(count * sizeof(*pages), GFP_KERNEL, nid);
>
> The pages array is only accessed by the CPU servicing the
> iommu_dma_alloc() call, and is usually freed again before that call
> even returns. It's certainly never touched by the device, so forcing
> it to a potentially non-local node doesn't make a great deal of sense.

Right, it seems sensible not to make this allocation include the
device-locality requirement, so it can be left as is. However,
modifying it to use kvzalloc() would seem OK.

>>  	if (!pages)
>>  		return NULL;
>>
>> @@ -483,8 +480,10 @@ static struct page **__iommu_dma_alloc_pages(unsigned int count,
>>  		unsigned int order = __fls(order_mask);
>>
>>  		order_size = 1U << order;
>> -		page = alloc_pages((order_mask - order_size) ?
>> -				   gfp | __GFP_NORETRY : gfp, order);
>> +		page = alloc_pages_node(nid,
>> +					(order_mask - order_size) ?
>> +					gfp | __GFP_NORETRY : gfp,
>> +					order);
>
> If we're touching this, can we sort out that horrendous ternary? FWIW
> I found I have a local version of the original patch which I tweaked
> at the time, and apparently I reworked this hunk as below, which does
> seem somewhat nicer for the same diffstat.
>
> Robin.
>
> @@ -446,10 +443,12 @@ static struct page **__iommu_dma_alloc_pages(unsigned int count,
>  	for (order_mask &= (2U << __fls(count)) - 1;
>  	     order_mask; order_mask &= ~order_size) {
>  		unsigned int order = __fls(order_mask);
> +		gfp_t alloc_flags = gfp;
>
>  		order_size = 1U << order;
> -		page = alloc_pages((order_mask - order_size) ?
> -				   gfp | __GFP_NORETRY : gfp, order);
> +		if (order_size < order_mask)
> +			alloc_flags |= __GFP_NORETRY;

Sure, this can be included.

> +		page = alloc_pages_node(nid, alloc_flags, order);
>  		if (!page)
>  			continue;
>  		if (!order)

Cheers,
John
Re: [PATCH 1/2] swiotlb: Make DIRECT_MAPPING_ERROR viable
On 20/11/2018 14:49, Konrad Rzeszutek Wilk wrote:
> On Tue, Nov 20, 2018 at 02:09:52PM +, Robin Murphy wrote:
>> With the overflow buffer removed, we no longer have a unique address
>> which is guaranteed not to be a valid DMA target to use as an error
>> token. The DIRECT_MAPPING_ERROR value of 0 tries to at least represent
>> an unlikely DMA target, but unfortunately there are already SWIOTLB
>> users with DMA-able memory at physical address 0 which now gets falsely
>> treated as a mapping failure and leads to all manner of misbehaviour.
>>
>> The best we can do to mitigate that is flip DIRECT_MAPPING_ERROR to the
>> commonly-used all-bits-set value, since the last single byte of memory
>> is by far the least-likely-valid DMA target.
>
> Are all the callers checking for DIRECT_MAPPING_ERROR or is it more of
> a comparison (as in if (!ret)) ?

dma_direct_map_page() and dma_direct_mapping_error() were already doing
the right thing, and external callers must rely on the latter via
dma_mapping_error() rather than trying to inspect the actual value
themselves, since that varies between implementations anyway. AFAICS
all the new return paths from swiotlb_map_page() are also robust in
referencing the macro explicitly, so I think we're good.

Thanks,
Robin.

>> Fixes: dff8d6c1ed58 ("swiotlb: remove the overflow buffer")
>> Reported-by: John Stultz
>> Signed-off-by: Robin Murphy
>> ---
>>  include/linux/dma-direct.h | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
>> index bd73e7a91410..9de9c7ab39d6 100644
>> --- a/include/linux/dma-direct.h
>> +++ b/include/linux/dma-direct.h
>> @@ -5,7 +5,7 @@
>>  #include
>>  #include
>>
>> -#define DIRECT_MAPPING_ERROR		0
>> +#define DIRECT_MAPPING_ERROR		~(dma_addr_t)0
>>
>>  #ifdef CONFIG_ARCH_HAS_PHYS_TO_DMA
>>  #include
>> --
>> 2.19.1.dirty
Re: [PATCH 2/2] swiotlb: Skip cache maintenance on map error
On Tue, Nov 20, 2018 at 02:09:53PM +, Robin Murphy wrote:
> If swiotlb_bounce_page() failed, calling arch_sync_dma_for_device() may
> lead to such delights as performing cache maintenance on whatever
> address phys_to_virt(SWIOTLB_MAP_ERROR) looks like, which is typically
> outside the kernel memory map and goes about as well as expected.
>
> Don't do that.

+Stefano

> Fixes: a4a4330db46a ("swiotlb: add support for non-coherent DMA")
> Signed-off-by: Robin Murphy
> ---
>  kernel/dma/swiotlb.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 5731daa09a32..045930e32c0e 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -679,7 +679,8 @@ dma_addr_t swiotlb_map_page(struct device *dev, struct page *page,
>  	}
>
>  	if (!dev_is_dma_coherent(dev) &&
> -	    (attrs & DMA_ATTR_SKIP_CPU_SYNC) == 0)
> +	    (attrs & DMA_ATTR_SKIP_CPU_SYNC) == 0 &&
> +	    dev_addr != DIRECT_MAPPING_ERROR)
>  		arch_sync_dma_for_device(dev, phys, size, dir);
>
>  	return dev_addr;
> --
> 2.19.1.dirty
Re: [PATCH 1/2] swiotlb: Make DIRECT_MAPPING_ERROR viable
On Tue, Nov 20, 2018 at 02:09:52PM +, Robin Murphy wrote:
> With the overflow buffer removed, we no longer have a unique address
> which is guaranteed not to be a valid DMA target to use as an error
> token. The DIRECT_MAPPING_ERROR value of 0 tries to at least represent
> an unlikely DMA target, but unfortunately there are already SWIOTLB
> users with DMA-able memory at physical address 0 which now gets falsely
> treated as a mapping failure and leads to all manner of misbehaviour.
>
> The best we can do to mitigate that is flip DIRECT_MAPPING_ERROR to the
> commonly-used all-bits-set value, since the last single byte of memory
> is by far the least-likely-valid DMA target.

Are all the callers checking for DIRECT_MAPPING_ERROR or is it more of
a comparison (as in if (!ret)) ?

> Fixes: dff8d6c1ed58 ("swiotlb: remove the overflow buffer")
> Reported-by: John Stultz
> Signed-off-by: Robin Murphy
> ---
>  include/linux/dma-direct.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
> index bd73e7a91410..9de9c7ab39d6 100644
> --- a/include/linux/dma-direct.h
> +++ b/include/linux/dma-direct.h
> @@ -5,7 +5,7 @@
>  #include
>  #include
>
> -#define DIRECT_MAPPING_ERROR		0
> +#define DIRECT_MAPPING_ERROR		~(dma_addr_t)0
>
>  #ifdef CONFIG_ARCH_HAS_PHYS_TO_DMA
>  #include
> --
> 2.19.1.dirty
Re: [PATCH v2] iommu/dma: Use NUMA aware memory allocations in __iommu_dma_alloc_pages()
On 20/11/2018 13:42, John Garry wrote:
> From: Ganapatrao Kulkarni
>
> Change function __iommu_dma_alloc_pages() to allocate memory/pages
> for DMA from the respective device's NUMA node.
>
> Signed-off-by: Ganapatrao Kulkarni
> [JPG: Modified to use kvzalloc() and fixed indentation]
> Signed-off-by: John Garry
> ---
> Difference v1->v2:
> - Add Ganapatrao's tag and change author
>
> This patch was originally posted by Ganapatrao in [1]. However, after
> initial review, it was never reposted (due to lack of cycles, I
> think). In addition, the functionality in its sibling patches was
> merged through other patches, as mentioned in [2]; this also refers
> to a discussion on device-local allocations vs CPU-local allocations
> for the DMA pool, and which is better [3].
>
> However, as mentioned in [3], dma_alloc_coherent() uses the locality
> information from the device - as in direct DMA - so this patch is
> just applying this same policy.
>
> [1] https://lore.kernel.org/patchwork/patch/833004/
> [2] https://lkml.org/lkml/2018/8/22/391
> [3] https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1692998.html
>
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index d1b0475..ada00bc 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -449,20 +449,17 @@ static void __iommu_dma_free_pages(struct page **pages, int count)
>  	kvfree(pages);
>  }
>
> -static struct page **__iommu_dma_alloc_pages(unsigned int count,
> -		unsigned long order_mask, gfp_t gfp)
> +static struct page **__iommu_dma_alloc_pages(struct device *dev,
> +		unsigned int count, unsigned long order_mask, gfp_t gfp)
>  {
>  	struct page **pages;
> -	unsigned int i = 0, array_size = count * sizeof(*pages);
> +	unsigned int i = 0, nid = dev_to_node(dev);
>
>  	order_mask &= (2U << MAX_ORDER) - 1;
>  	if (!order_mask)
>  		return NULL;
>
> -	if (array_size <= PAGE_SIZE)
> -		pages = kzalloc(array_size, GFP_KERNEL);
> -	else
> -		pages = vzalloc(array_size);
> +	pages = kvzalloc_node(count * sizeof(*pages), GFP_KERNEL, nid);

The pages array is only accessed by the CPU servicing the
iommu_dma_alloc() call, and is usually freed again before that call
even returns. It's certainly never touched by the device, so forcing
it to a potentially non-local node doesn't make a great deal of sense.

>  	if (!pages)
>  		return NULL;
>
> @@ -483,8 +480,10 @@ static struct page **__iommu_dma_alloc_pages(unsigned int count,
>  		unsigned int order = __fls(order_mask);
>
>  		order_size = 1U << order;
> -		page = alloc_pages((order_mask - order_size) ?
> -				   gfp | __GFP_NORETRY : gfp, order);
> +		page = alloc_pages_node(nid,
> +					(order_mask - order_size) ?
> +					gfp | __GFP_NORETRY : gfp,
> +					order);

If we're touching this, can we sort out that horrendous ternary? FWIW
I found I have a local version of the original patch which I tweaked
at the time, and apparently I reworked this hunk as below, which does
seem somewhat nicer for the same diffstat.

Robin.

@@ -446,10 +443,12 @@ static struct page **__iommu_dma_alloc_pages(unsigned int count,
 	for (order_mask &= (2U << __fls(count)) - 1;
 	     order_mask; order_mask &= ~order_size) {
 		unsigned int order = __fls(order_mask);
+		gfp_t alloc_flags = gfp;

 		order_size = 1U << order;
-		page = alloc_pages((order_mask - order_size) ?
-				   gfp | __GFP_NORETRY : gfp, order);
+		if (order_size < order_mask)
+			alloc_flags |= __GFP_NORETRY;
+		page = alloc_pages_node(nid, alloc_flags, order);
 		if (!page)
 			continue;
 		if (!order)
[PATCH 1/2] swiotlb: Make DIRECT_MAPPING_ERROR viable
With the overflow buffer removed, we no longer have a unique address
which is guaranteed not to be a valid DMA target to use as an error
token. The DIRECT_MAPPING_ERROR value of 0 tries to at least represent
an unlikely DMA target, but unfortunately there are already SWIOTLB
users with DMA-able memory at physical address 0 which now gets falsely
treated as a mapping failure and leads to all manner of misbehaviour.

The best we can do to mitigate that is flip DIRECT_MAPPING_ERROR to the
commonly-used all-bits-set value, since the last single byte of memory
is by far the least-likely-valid DMA target.

Fixes: dff8d6c1ed58 ("swiotlb: remove the overflow buffer")
Reported-by: John Stultz
Signed-off-by: Robin Murphy
---
 include/linux/dma-direct.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index bd73e7a91410..9de9c7ab39d6 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -5,7 +5,7 @@
 #include
 #include

-#define DIRECT_MAPPING_ERROR		0
+#define DIRECT_MAPPING_ERROR		~(dma_addr_t)0

 #ifdef CONFIG_ARCH_HAS_PHYS_TO_DMA
 #include
--
2.19.1.dirty
[PATCH 2/2] swiotlb: Skip cache maintenance on map error
If swiotlb_bounce_page() failed, calling arch_sync_dma_for_device() may lead to such delights as performing cache maintenance on whatever address phys_to_virt(SWIOTLB_MAP_ERROR) looks like, which is typically outside the kernel memory map and goes about as well as expected. Don't do that. Fixes: a4a4330db46a ("swiotlb: add support for non-coherent DMA") Signed-off-by: Robin Murphy --- kernel/dma/swiotlb.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c index 5731daa09a32..045930e32c0e 100644 --- a/kernel/dma/swiotlb.c +++ b/kernel/dma/swiotlb.c @@ -679,7 +679,8 @@ dma_addr_t swiotlb_map_page(struct device *dev, struct page *page, } if (!dev_is_dma_coherent(dev) && - (attrs & DMA_ATTR_SKIP_CPU_SYNC) == 0) + (attrs & DMA_ATTR_SKIP_CPU_SYNC) == 0 && + dev_addr != DIRECT_MAPPING_ERROR) arch_sync_dma_for_device(dev, phys, size, dir); return dev_addr; -- 2.19.1.dirty
[PATCH 0/2] SWIOTLB fixes for 4.20
This is what I have so far, which at least resolves the most obvious problems. I still haven't got very far with the USB corruption issue I see on Juno with -rc1, but I'm yet to confirm whether that's actually attributable to the SWIOTLB changes or something else entirely. Robin. Robin Murphy (2): swiotlb: Make DIRECT_MAPPING_ERROR viable swiotlb: Skip cache maintenance on map error include/linux/dma-direct.h | 2 +- kernel/dma/swiotlb.c | 3 ++- 2 files changed, 3 insertions(+), 2 deletions(-) -- 2.19.1.dirty
[PATCH v2] iommu/dma: Use NUMA aware memory allocations in __iommu_dma_alloc_pages()
From: Ganapatrao Kulkarni Change function __iommu_dma_alloc_pages() to allocate memory/pages for DMA from respective device NUMA node. Signed-off-by: Ganapatrao Kulkarni [JPG: Modified to use kvzalloc() and fixed indentation] Signed-off-by: John Garry --- Difference v1->v2: - Add Ganapatrao's tag and change author This patch was originally posted by Ganapatrao in [1]. However, after initial review, it was never reposted (due to lack of cycles, I think). In addition, the functionality in its sibling patches was merged through patches, as mentioned in [2]; this also refers to a discussion on device local allocations vs CPU local allocations for DMA pool, and which is better [3]. However, as mentioned in [3], dma_alloc_coherent() uses the locality information from the device - as in direct DMA - so this patch is just applying this same policy. [1] https://lore.kernel.org/patchwork/patch/833004/ [2] https://lkml.org/lkml/2018/8/22/391 [3] https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1692998.html diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index d1b0475..ada00bc 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -449,20 +449,17 @@ static void __iommu_dma_free_pages(struct page **pages, int count) kvfree(pages); } -static struct page **__iommu_dma_alloc_pages(unsigned int count, - unsigned long order_mask, gfp_t gfp) +static struct page **__iommu_dma_alloc_pages(struct device *dev, + unsigned int count, unsigned long order_mask, gfp_t gfp) { struct page **pages; - unsigned int i = 0, array_size = count * sizeof(*pages); + unsigned int i = 0, nid = dev_to_node(dev); order_mask &= (2U << MAX_ORDER) - 1; if (!order_mask) return NULL; - if (array_size <= PAGE_SIZE) - pages = kzalloc(array_size, GFP_KERNEL); - else - pages = vzalloc(array_size); + pages = kvzalloc_node(count * sizeof(*pages), GFP_KERNEL, nid); if (!pages) return NULL; @@ -483,8 +480,10 @@ static struct page **__iommu_dma_alloc_pages(unsigned int 
count, unsigned int order = __fls(order_mask); order_size = 1U << order; - page = alloc_pages((order_mask - order_size) ? - gfp | __GFP_NORETRY : gfp, order); + page = alloc_pages_node(nid, + (order_mask - order_size) ? + gfp | __GFP_NORETRY : gfp, + order); if (!page) continue; if (!order) @@ -569,7 +568,8 @@ struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t gfp, alloc_sizes = min_size; count = PAGE_ALIGN(size) >> PAGE_SHIFT; - pages = __iommu_dma_alloc_pages(count, alloc_sizes >> PAGE_SHIFT, gfp); + pages = __iommu_dma_alloc_pages(dev, count, alloc_sizes >> PAGE_SHIFT, + gfp); if (!pages) return NULL; -- 1.9.1
Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce
On 20/11/2018 09:16, Jonathan Cameron wrote: > +CC Jean-Philippe and iommu list. Thanks for the Cc, sorry I don't have enough bandwidth to follow this thread at the moment. > In WarpDrive/uacce, we make this simple. If you support IOMMU and it > supports > SVM/SVA. Everything will be fine just like ODP implicit mode. And you > don't need > to write any code for that. Because it has been done by IOMMU framework. > If it Looks like the IOMMU code uses mmu_notifier, so it is identical to IB's ODP. The only difference is that IB tends to have the IOMMU page table in the device, not in the CPU. The only case I know of that is different is the new-fangled CAPI stuff where the IOMMU can directly use the CPU's page table and the IOMMU page table (in device or CPU) is eliminated. >>> >>> Yes. We are not focusing on the current implementation. As mentioned in the >>> cover letter. We are expecting Jean-Philippe's SVA patch: >>> git://linux-arm.org/linux-jpb. >> >> This SVA stuff does not look comparable to CAPI as it still requires >> maintaining separate IOMMU page tables. With SVA, we use the same page tables in the IOMMU and CPU. It's the same pgd pointer, there is no mirroring of mappings. We bind the process page tables with the device using a PASID (Process Address Space ID). After fork(), the child's mm is different from the parent's one, and is not automatically bound to the device. The device driver will have to issue a new bind() request, and the child mm will be bound with a different PASID. There could be a problem if the child inherits the parent's device handle. Then depending on the device, the child could be able to program DMA and possibly access the parent's address space. The parent needs to be aware of that when using the bind() API, and close the device fd in the child after fork(). We use MMU notifiers for some address space changes: * The IOTLB needs to be invalidated after any unmap() to the process address space.
On Arm systems the SMMU IOTLBs can be invalidated by the CPU TLBI instructions, but we still need to invalidate the TLBs private to devices, which are arch-agnostic (the Address Translation Cache in PCI ATS). * When the process mm exits, we need to remove the associated PASID configuration in the IOMMU and invalidate the TLBs. Thanks, Jean
Re: [PATCH] iommu/dma: Use NUMA aware memory allocations in __iommu_dma_alloc_pages()
On 20/11/2018 10:09, Ganapatrao Kulkarni wrote: Hi John, On Tue, Nov 20, 2018 at 3:35 PM John Garry wrote: On 08/11/2018 17:55, John Garry wrote: Change function __iommu_dma_alloc_pages() to allocate memory/pages for DMA from respective device NUMA node. Ping a friendly reminder on this patch. Thanks Originally-from: Ganapatrao Kulkarni Signed-off-by: John Garry --- This patch was originally posted by Ganapatrao in [1] *. However, after initial review, it was never reposted (due to lack of cycles, I think). In addition, the functionality in its sibling patches was merged through patches, as mentioned in [2]; this also refers to a discussion on device local allocations vs CPU local allocations for DMA pool, and which is better [3]. However, as mentioned in [3], dma_alloc_coherent() uses the locality information from the device - as in direct DMA - so this patch is just applying this same policy. [1] https://lore.kernel.org/patchwork/patch/833004/ [2] https://lkml.org/lkml/2018/8/22/391 [3] https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1692998.html * Authorship on this updated patch may need to be fixed - I did not want to add Ganapatrao's SOB without permission. thanks for taking this up. please feel free to add my SoB. OK, I will also make you the author and repost. 
Thanks, John diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index d1b0475..ada00bc 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -449,20 +449,17 @@ static void __iommu_dma_free_pages(struct page **pages, int count) kvfree(pages); } -static struct page **__iommu_dma_alloc_pages(unsigned int count, - unsigned long order_mask, gfp_t gfp) +static struct page **__iommu_dma_alloc_pages(struct device *dev, + unsigned int count, unsigned long order_mask, gfp_t gfp) { struct page **pages; - unsigned int i = 0, array_size = count * sizeof(*pages); + unsigned int i = 0, nid = dev_to_node(dev); order_mask &= (2U << MAX_ORDER) - 1; if (!order_mask) return NULL; - if (array_size <= PAGE_SIZE) - pages = kzalloc(array_size, GFP_KERNEL); - else - pages = vzalloc(array_size); + pages = kvzalloc_node(count * sizeof(*pages), GFP_KERNEL, nid); if (!pages) return NULL; @@ -483,8 +480,10 @@ static struct page **__iommu_dma_alloc_pages(unsigned int count, unsigned int order = __fls(order_mask); order_size = 1U << order; - page = alloc_pages((order_mask - order_size) ? -gfp | __GFP_NORETRY : gfp, order); + page = alloc_pages_node(nid, + (order_mask - order_size) ? + gfp | __GFP_NORETRY : gfp, + order); if (!page) continue; if (!order) @@ -569,7 +568,8 @@ struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t gfp, alloc_sizes = min_size; count = PAGE_ALIGN(size) >> PAGE_SHIFT; - pages = __iommu_dma_alloc_pages(count, alloc_sizes >> PAGE_SHIFT, gfp); + pages = __iommu_dma_alloc_pages(dev, count, alloc_sizes >> PAGE_SHIFT, + gfp); if (!pages) return NULL; thanks Ganapat . ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu/dma: Use NUMA aware memory allocations in __iommu_dma_alloc_pages()
Hi John, On Tue, Nov 20, 2018 at 3:35 PM John Garry wrote: > > On 08/11/2018 17:55, John Garry wrote: > > Change function __iommu_dma_alloc_pages() to allocate memory/pages > > for DMA from respective device NUMA node. > > > > Ping a friendly reminder on this patch. > > Thanks > > > Originally-from: Ganapatrao Kulkarni > > Signed-off-by: John Garry > > --- > > > > This patch was originally posted by Ganapatrao in [1] *. > > > > However, after initial review, it was never reposted (due to lack of > > cycles, I think). In addition, the functionality in its sibling patches > > was merged through patches, as mentioned in [2]; this also refers to a > > discussion on device local allocations vs CPU local allocations for DMA > > pool, and which is better [3]. > > > > However, as mentioned in [3], dma_alloc_coherent() uses the locality > > information from the device - as in direct DMA - so this patch is just > > applying this same policy. > > > > [1] https://lore.kernel.org/patchwork/patch/833004/ > > [2] https://lkml.org/lkml/2018/8/22/391 > > [3] > > https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1692998.html > > > > * Authorship on this updated patch may need to be fixed - I did not want to > > add Ganapatrao's SOB without permission. thanks for taking this up. please feel free to add my SoB. 
> > > > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c > > index d1b0475..ada00bc 100644 > > --- a/drivers/iommu/dma-iommu.c > > +++ b/drivers/iommu/dma-iommu.c > > @@ -449,20 +449,17 @@ static void __iommu_dma_free_pages(struct page > > **pages, int count) > > kvfree(pages); > > } > > > > -static struct page **__iommu_dma_alloc_pages(unsigned int count, > > - unsigned long order_mask, gfp_t gfp) > > +static struct page **__iommu_dma_alloc_pages(struct device *dev, > > + unsigned int count, unsigned long order_mask, gfp_t gfp) > > { > > struct page **pages; > > - unsigned int i = 0, array_size = count * sizeof(*pages); > > + unsigned int i = 0, nid = dev_to_node(dev); > > > > order_mask &= (2U << MAX_ORDER) - 1; > > if (!order_mask) > > return NULL; > > > > - if (array_size <= PAGE_SIZE) > > - pages = kzalloc(array_size, GFP_KERNEL); > > - else > > - pages = vzalloc(array_size); > > + pages = kvzalloc_node(count * sizeof(*pages), GFP_KERNEL, nid); > > if (!pages) > > return NULL; > > > > @@ -483,8 +480,10 @@ static struct page **__iommu_dma_alloc_pages(unsigned > > int count, > > unsigned int order = __fls(order_mask); > > > > order_size = 1U << order; > > - page = alloc_pages((order_mask - order_size) ? > > -gfp | __GFP_NORETRY : gfp, order); > > + page = alloc_pages_node(nid, > > + (order_mask - order_size) ? > > + gfp | __GFP_NORETRY : gfp, > > + order); > > if (!page) > > continue; > > if (!order) > > @@ -569,7 +568,8 @@ struct page **iommu_dma_alloc(struct device *dev, > > size_t size, gfp_t gfp, > > alloc_sizes = min_size; > > > > count = PAGE_ALIGN(size) >> PAGE_SHIFT; > > - pages = __iommu_dma_alloc_pages(count, alloc_sizes >> PAGE_SHIFT, > > gfp); > > + pages = __iommu_dma_alloc_pages(dev, count, alloc_sizes >> PAGE_SHIFT, > > + gfp); > > if (!pages) > > return NULL; > > > > > > thanks Ganapat ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu/dma: Use NUMA aware memory allocations in __iommu_dma_alloc_pages()
On 08/11/2018 17:55, John Garry wrote: Change function __iommu_dma_alloc_pages() to allocate memory/pages for DMA from respective device NUMA node. Ping a friendly reminder on this patch. Thanks Originally-from: Ganapatrao Kulkarni Signed-off-by: John Garry --- This patch was originally posted by Ganapatrao in [1] *. However, after initial review, it was never reposted (due to lack of cycles, I think). In addition, the functionality in its sibling patches was merged through patches, as mentioned in [2]; this also refers to a discussion on device local allocations vs CPU local allocations for DMA pool, and which is better [3]. However, as mentioned in [3], dma_alloc_coherent() uses the locality information from the device - as in direct DMA - so this patch is just applying this same policy. [1] https://lore.kernel.org/patchwork/patch/833004/ [2] https://lkml.org/lkml/2018/8/22/391 [3] https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1692998.html * Authorship on this updated patch may need to be fixed - I did not want to add Ganapatrao's SOB without permission. 
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index d1b0475..ada00bc 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -449,20 +449,17 @@ static void __iommu_dma_free_pages(struct page **pages, int count) kvfree(pages); } -static struct page **__iommu_dma_alloc_pages(unsigned int count, - unsigned long order_mask, gfp_t gfp) +static struct page **__iommu_dma_alloc_pages(struct device *dev, + unsigned int count, unsigned long order_mask, gfp_t gfp) { struct page **pages; - unsigned int i = 0, array_size = count * sizeof(*pages); + unsigned int i = 0, nid = dev_to_node(dev); order_mask &= (2U << MAX_ORDER) - 1; if (!order_mask) return NULL; - if (array_size <= PAGE_SIZE) - pages = kzalloc(array_size, GFP_KERNEL); - else - pages = vzalloc(array_size); + pages = kvzalloc_node(count * sizeof(*pages), GFP_KERNEL, nid); if (!pages) return NULL; @@ -483,8 +480,10 @@ static struct page **__iommu_dma_alloc_pages(unsigned int count, unsigned int order = __fls(order_mask); order_size = 1U << order; - page = alloc_pages((order_mask - order_size) ? - gfp | __GFP_NORETRY : gfp, order); + page = alloc_pages_node(nid, + (order_mask - order_size) ? + gfp | __GFP_NORETRY : gfp, + order); if (!page) continue; if (!order) @@ -569,7 +568,8 @@ struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t gfp, alloc_sizes = min_size; count = PAGE_ALIGN(size) >> PAGE_SHIFT; - pages = __iommu_dma_alloc_pages(count, alloc_sizes >> PAGE_SHIFT, gfp); + pages = __iommu_dma_alloc_pages(dev, count, alloc_sizes >> PAGE_SHIFT, + gfp); if (!pages) return NULL; ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 06/10] swiotlb: use swiotlb_map_page in swiotlb_map_sg_attrs
On Mon, Nov 19, 2018 at 03:22:13PM -0800, John Stultz wrote: > > + sg->dma_address = dma_addr; > > sg_dma_len(sg) = sg->length; > > } > > I know Robin has already replied with more detailed info, but just to > close the loop as I'm finally home, applying this patch didn't seem to > help with the IO hangs I'm seeing w/ HiKey960. If Robin's observation is right this should fix the problem for you: diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h index bd73e7a91410..1833f0c1fba0 100644 --- a/include/linux/dma-direct.h +++ b/include/linux/dma-direct.h @@ -5,7 +5,7 @@ #include #include -#define DIRECT_MAPPING_ERROR 0 +#define DIRECT_MAPPING_ERROR (~(dma_addr_t)0x0) #ifdef CONFIG_ARCH_HAS_PHYS_TO_DMA #include
Re: [PATCH 06/10] swiotlb: use swiotlb_map_page in swiotlb_map_sg_attrs
On Mon, Nov 19, 2018 at 07:36:44PM +, Robin Murphy wrote: > OK, having brought my Hikey to life and reproduced John's stall with rc1, > what's going on is that at some point dma_map_sg() returns 0, which causes > the SCSI/UFS layer to go round in circles repeatedly trying to map the same > list(s) equally unsuccessfully. > > Why does dma_map_sg() fail? Turns out what we all managed to overlook is > that this patch *does* introduce a subtle change in behaviour, in that > previously the non-bounced case assigned dev_addr to sg->dma_address > without looking at it; now with the swiotlb_map_page() call we check the > return value against DIRECT_MAPPING_ERROR regardless of whether it was > bounced or not. > > Flash back to the other thread when I said "...but I suspect there may well > be non-IOMMU platforms where DMA to physical address 0 is a thing :("? I > have the 3GB Hikey where all the RAM is below 32 bits so SWIOTLB never ever > bounces, but sure enough, guess where that RAM starts... What is PAGE_OFFSET on that machine? We usually don't use kernel virtual address 0 so that we can deal with 0 pointer dereferences sanely, but I guess we can get to physical address 0. I guess the quick fix would be to move DMA_DIRECT_MAPPING_ERROR to all-F ASAP.
Re: [PATCH RFC] dma-direct: do not allocate a single page from CMA area
On Mon, Nov 05, 2018 at 02:40:51PM -0800, Nicolin Chen wrote: > > > In general, this seems to make sense to me. It does represent a > > > theoretical > > > change in behaviour for devices which have their own CMA area somewhere > > > other than kernel memory, and only ever make non-atomic allocations, but > > > I'm not sure whether that's a realistic or common enough case to really > > > worry about. > > > > Yes, I think we should make the decision in dma_alloc_from_contiguous > > based on having a per-dev CMA area or not. There is a lot of cruft in > > It seems that cma_alloc() already has a CMA area check? Would it > be duplicated to have a similar one in dma_alloc_from_contiguous? It isn't a duplicate if it serves a different purpose. > > this area that should be cleaned up while we're at it, like always > > falling back to the normal page allocator if there is no CMA area or > > nothing suitable found in dma_alloc_from_contiguous instead of > > having to duplicate all that in the caller. > > Am I supposed to clean up the things mentioned above by moving > the fallback allocator into dma_alloc_from_contiguous, or to just > move my change (the count check) into dma_alloc_from_contiguous? > > I understand that'd be great to have a cleanup, yet feel it could > be done separately as this patch isn't really a cleanup change. I can take care of any cleanups. I've been trying to dust up that area anyway.
Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce
+CC Jean-Philippe and iommu list. On Mon, 19 Nov 2018 20:29:39 -0700 Jason Gunthorpe wrote: > On Tue, Nov 20, 2018 at 11:07:02AM +0800, Kenneth Lee wrote: > > On Mon, Nov 19, 2018 at 11:49:54AM -0700, Jason Gunthorpe wrote: > > > On Mon, Nov 19, 2018 at 05:14:05PM +0800, Kenneth Lee wrote: > > > > > > > If the hardware cannot share page table with the CPU, we then need to > > > > have > > > > some way to change the device page table. This is what happens in ODP. It > > > > invalidates the page table in device upon mmu_notifier callback. But > > > > this cannot > > > > solve the COW problem: if the user process A shares a page P with > > > > device, and A > > > > forks a new process B, and it continues to write to the page. By COW, the > > > > process B will keep the page P, while A will get a new page P'. But you > > > > have > > > > no way to let the device know it should use P' rather than P. > > > > > > Is this true? I thought mmu_notifiers covered all these cases. 
> > > > > > The mm_notifier for A should fire if B causes the physical address of > > > A's pages to change via COW. > > > > > > And this causes the device page tables to re-synchronize. > > > > I don't see such code. The current do_cow_fault() implementation has nothing > > to > > do with mmu_notifier. > > Well, that sure sounds like it would be a bug in mmu_notifiers.. > > But considering Jean's SVA stuff seems based on mmu notifiers, I have > a hard time believing that it has any different behavior from RDMA's > ODP, and if it does have different behavior, then it is probably just > a bug in the ODP implementation. > > > > > In WarpDrive/uacce, we make this simple. If you support IOMMU and it > > > > supports > > > > SVM/SVA. Everything will be fine just like ODP implicit mode. And you > > > > don't need > > > > to write any code for that. Because it has been done by IOMMU > > > > framework. If it > > > > > > Looks like the IOMMU code uses mmu_notifier, so it is identical to > > > IB's ODP. The only difference is that IB tends to have the IOMMU page > > > table in the device, not in the CPU. > > > > > > The only case I know of that is different is the new-fangled CAPI > > > stuff where the IOMMU can directly use the CPU's page table and the > > > IOMMU page table (in device or CPU) is eliminated. > > > > Yes. We are not focusing on the current implementation. As mentioned in the > > cover letter. We are expecting Jean-Philippe's SVA patch: > > git://linux-arm.org/linux-jpb. > > This SVA stuff does not look comparable to CAPI as it still requires > maintaining separate IOMMU page tables. > > Also, those patches from Jean have a lot of references to > mmu_notifiers (ie look at iommu_mmu_notifier). > > Are you really sure it is actually any different at all? > > > > Anyhow, I don't think a single instance of hardware should justify an > > > entire new subsystem. 
Subsystems are hard to make and without multiple > > > hardware examples there is no way to expect that it would cover any > > > future use cases. > > > > Yes. That's our first expectation. We can keep it with our driver. But > > because > > there is no user driver support for any accelerator in mainline kernel. > > Even the > > well known QuickAssit has to be maintained out of tree. So we try to see if > > people is interested in working together to solve the problem. > > Well, you should come with patches ack'ed by these other groups. > > > > If all your driver needs is to mmap some PCI bar space, route > > > interrupts and do DMA mapping then mediated VFIO is probably a good > > > choice. > > > > Yes. That is what is done in our RFCv1/v2. But we accepted Jerome's opinion > > and > > try not to add complexity to the mm subsystem. > > Why would a mediated VFIO driver touch the mm subsystem? Sounds like > you don't have a VFIO driver if it needs to do stuff like that... > > > > If it needs to do a bunch of other stuff, not related to PCI bar > > > space, i