RE: [RFC PATCH 2/6] drivers core: Add I/O ASID allocator
> From: Koenig, Christian
> Sent: Thursday, November 22, 2018 3:10 AM
>
> Am 21.11.18 um 12:16 schrieb Jean-Philippe Brucker:
> > On 12/11/2018 14:40, Joerg Roedel wrote:
> >> Hi Jean-Philippe,
> >>
> >> On Fri, Oct 19, 2018 at 07:11:54PM +0100, Jean-Philippe Brucker wrote:
> >>> The allocator doesn't really belong in drivers/iommu because some
> >>> drivers would like to allocate PASIDs for devices that aren't managed
> >>> by an IOMMU, using the same ID space as the IOMMU. It doesn't really
> >>> belong in drivers/pci either since platform devices also support
> >>> PASID. Add the allocator in drivers/base.
> >> I don't really buy this argument, in the end it is the IOMMU that
> >> enforces the PASID mappings for a device. Why should a device not
> >> behind an IOMMU need to allocate a PASID? Do you have examples of such
> >> devices which also don't have their own IOMMU built in?
> > I misunderstood that requirement. Reading through the thread again
> > (https://www.mail-archive.com/iommu@lists.linux-foundation.org/msg25640.html)
> > it's more about a device using PASIDs as context IDs. Some contexts are
> > not bound to processes but they still need their ID in the same PASID
> > space as the contexts that are bound to process address spaces. So we
> > can keep that code in drivers/iommu.
>
> The problem with that approach is that we also need this allocator when
> the IOMMU is completely disabled.
>
> In other words, PASIDs are used as context IDs by the hardware for
> things like signaling which application has caused an interrupt/event,
> even when they are not used by the IOMMU later on.
>
> In addition to that, we have some MMUs which are internal to the devices
> (GPUVM on AMD GPUs for example; similar exists for NVidia) which use
> PASIDs for address translation internally in the device.
>
> All of that is completely unrelated to the IOMMU, but when the IOMMU is
> enabled you need to use the same allocator, because all use cases use
> the same ID space.
IMO a context ID is a resource used internally by the device, while a PASID is something informational to the external world (as defined in the PCI bus packet). Ideally the two things are separate in hardware but can be associated (e.g. PASID as an optional parameter per context). That is what I know for Intel GPUs. For your device, is "PASIDs are used as context IDs" a software implementation policy or an actual hardware requirement (i.e. no dedicated PASID recording)?

Thanks,
Kevin

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 0/3] iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables
On Thu, Nov 22, 2018 at 10:36 AM Matthew Wilcox wrote:
>
> On Wed, Nov 21, 2018 at 10:26:26PM +0000, Robin Murphy wrote:
> > These are IOMMU page tables, rather than CPU ones, so we're already well
> > outside arch code - indeed the original motivation of io-pgtable was to be
> > entirely independent of the p*d types and arch-specific MM code (this Armv7
> > short-descriptor format is already "non-native" when used by drivers in an
> > arm64 kernel).
>
> There was quite a lot of explanation missing from this patch description!

I totally agree ;-) I'm not familiar at all with either iommu or mm/... Looks like the patchset triggered a helpful discussion, and I understand the problem better now. I'll improve the description in the next revision.

> > There are various efficiency reasons for using regular kernel memory instead
> > of coherent DMA allocations - for the most part it works well, we just have
> > the odd corner case like this one where the 32-bit format gets used on
> > 64-bit systems such that the tables themselves still need to be allocated
> > below 4GB (although the final output address can point at higher memory by
> > virtue of the IOMMU in question not implementing permissions and repurposing
> > some of those PTE fields as extra address bits).
> >
> > TBH, if this DMA32 stuff is going to be contentious we could possibly just
> > rip out the offending kmem_cache - it seemed like good practice for the
> > use-case, but provided kzalloc(SZ_1K, gfp | GFP_DMA32) can be relied upon to
> > give the same 1KB alignment and chance of succeeding as the equivalent
> > kmem_cache_alloc(), then we could quite easily make do with that instead.
>
> I think you should look at using the page_frag allocator here. You can
> use whatever GFP_DMA flags you like.

I'll try that. Thanks!
Re: [PATCH v2 07/17] debugobjects: Move printk out of db lock critical sections
On (11/21/18 11:49), Waiman Long wrote:
[..]
> >  	case ODEBUG_STATE_ACTIVE:
> > -		debug_print_object(obj, "init");
> >  		state = obj->state;
> >  		raw_spin_unlock_irqrestore(&obj->lock, flags);
> > +		debug_print_object(obj, "init");
> >  		debug_object_fixup(descr->fixup_init, addr, state);
> >  		return;
> >
> >  	case ODEBUG_STATE_DESTROYED:
> > -		debug_print_object(obj, "init");
> > +		debug_printobj = true;
> >  		break;
> >  	default:
> >  		break;
> >  	}
> >
> >  	raw_spin_unlock_irqrestore(&obj->lock, flags);
> > +	if (debug_chkstack)
> > +		debug_object_is_on_stack(addr, onstack);
> > +	if (debug_printobj)
> > +		debug_print_object(obj, "init");
[..]
> As a side note, one of the test systems that I used generated a
> debugobjects splat in the bootup process and the system hung
> afterward. Applying this patch alone fixed the hanging problem and the
> system booted up successfully. So it is not really a good idea to call
> printk() while holding a raw spinlock.

Right, I like this patch. And I think that we, maybe, can go even further. Some serial consoles call mod_timer(). So what we could have with debug objects enabled was:

	mod_timer()
	 lock_timer_base()
	  debug_activate()
	   printk()
	    call_console_drivers()
	     foo_console()
	      mod_timer()
	       lock_timer_base()   << deadlock

That's one possible scenario. The other one can involve the console's IRQ handler, uart port spinlock, mod_timer, debug objects, printk, and an eventual deadlock on the uart port spinlock. This one can be mitigated with printk_safe. But the mod_timer() deadlock will require a different fix.

So maybe we need to switch debug objects print-outs to _always_ printk_deferred(). Debug objects can be used in code which cannot do direct printk() - timekeeping is just one example.

	-ss
Re: [PATCH v2 0/3] iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables
On Wed, Nov 21, 2018 at 10:26:26PM +0000, Robin Murphy wrote:
> These are IOMMU page tables, rather than CPU ones, so we're already well
> outside arch code - indeed the original motivation of io-pgtable was to be
> entirely independent of the p*d types and arch-specific MM code (this Armv7
> short-descriptor format is already "non-native" when used by drivers in an
> arm64 kernel).

There was quite a lot of explanation missing from this patch description!

> There are various efficiency reasons for using regular kernel memory instead
> of coherent DMA allocations - for the most part it works well, we just have
> the odd corner case like this one where the 32-bit format gets used on
> 64-bit systems such that the tables themselves still need to be allocated
> below 4GB (although the final output address can point at higher memory by
> virtue of the IOMMU in question not implementing permissions and repurposing
> some of those PTE fields as extra address bits).
>
> TBH, if this DMA32 stuff is going to be contentious we could possibly just
> rip out the offending kmem_cache - it seemed like good practice for the
> use-case, but provided kzalloc(SZ_1K, gfp | GFP_DMA32) can be relied upon to
> give the same 1KB alignment and chance of succeeding as the equivalent
> kmem_cache_alloc(), then we could quite easily make do with that instead.

I think you should look at using the page_frag allocator here. You can use whatever GFP_DMA flags you like.
Re: [PATCH v2 3/3] iommu/io-pgtable-arm-v7s: Request DMA32 memory, and improve debugging
On Thu, Nov 22, 2018 at 2:02 AM Michal Hocko wrote:
>
> On Wed 21-11-18 16:46:38, Will Deacon wrote:
> > On Sun, Nov 11, 2018 at 05:03:41PM +0800, Nicolas Boichat wrote:
> > > For level 1/2 pages, ensure GFP_DMA32 is used if CONFIG_ZONE_DMA32
> > > is defined (e.g. on arm64 platforms).
> > >
> > > For level 2 pages, allocate a slab cache in SLAB_CACHE_DMA32.
> > >
> > > Also, print an error when the physical address does not fit in
> > > 32-bit, to make debugging easier in the future.
> > >
> > > Fixes: ad67f5a6545f ("arm64: replace ZONE_DMA with ZONE_DMA32")
> > > Signed-off-by: Nicolas Boichat
> > > ---
> > >
> > > Changes since v1:
> > >  - Changed approach to use SLAB_CACHE_DMA32 added by the previous
> > >    commit.
> > >  - Use DMA or DMA32 depending on the architecture (DMA for arm,
> > >    DMA32 for arm64).
> > >
> > >  drivers/iommu/io-pgtable-arm-v7s.c | 20
> > >  1 file changed, 16 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c
> > > index 445c3bde04800c..996f7b6d00b44a 100644
> > > --- a/drivers/iommu/io-pgtable-arm-v7s.c
> > > +++ b/drivers/iommu/io-pgtable-arm-v7s.c
> > > @@ -161,6 +161,14 @@
> > >
> > >  #define ARM_V7S_TCR_PD1		BIT(5)
> > >
> > > +#ifdef CONFIG_ZONE_DMA32
> > > +#define ARM_V7S_TABLE_GFP_DMA GFP_DMA32
> > > +#define ARM_V7S_TABLE_SLAB_CACHE SLAB_CACHE_DMA32
> > > +#else
> > > +#define ARM_V7S_TABLE_GFP_DMA GFP_DMA
> > > +#define ARM_V7S_TABLE_SLAB_CACHE SLAB_CACHE_DMA
> > > +#endif
> >
> > It's a bit grotty that GFP_DMA32 doesn't just map to GFP_DMA on 32-bit
> > architectures, since then we wouldn't need this #ifdeffery afaict.
>
> But GFP_DMA32 should map to GFP_KERNEL on 32b, no? Or what exactly is
> going on in here?

GFP_DMA32 will fail due to check_slab_flags (aka GFP_SLAB_BUG_MASK before patch 1/3 of this series)... But yes, it may be neater if there was transparent remapping of GFP_DMA32/SLAB_CACHE_DMA32 to GFP_DMA/SLAB_CACHE_DMA on 32-bit arch...

> --
> Michal Hocko
> SUSE Labs
Re: [PATCH v2 0/3] iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables
On Thu, Nov 22, 2018 at 6:27 AM Robin Murphy wrote:
>
> On 2018-11-21 9:38 pm, Matthew Wilcox wrote:
> > On Wed, Nov 21, 2018 at 06:20:02PM +0000, Christopher Lameter wrote:
> >> On Sun, 11 Nov 2018, Nicolas Boichat wrote:
> >>
> >>> This is a follow-up to the discussion in [1], to make sure that the page
> >>> tables allocated by iommu/io-pgtable-arm-v7s are contained within 32-bit
> >>> physical address space.
> >>
> >> Page tables? This means you need a page frame? Why go through the slab
> >> allocators?
> >
> > Because this particular architecture has sub-page-size PMD page tables.
> > We desperately need to hoist page table allocation out of the
> > architectures; there're a bunch of different implementations and they're
> > mostly bad, one way or another.
>
> These are IOMMU page tables, rather than CPU ones, so we're already well
> outside arch code - indeed the original motivation of io-pgtable was to
> be entirely independent of the p*d types and arch-specific MM code (this
> Armv7 short-descriptor format is already "non-native" when used by
> drivers in an arm64 kernel).
>
> There are various efficiency reasons for using regular kernel memory
> instead of coherent DMA allocations - for the most part it works well,
> we just have the odd corner case like this one where the 32-bit format
> gets used on 64-bit systems such that the tables themselves still need
> to be allocated below 4GB (although the final output address can point
> at higher memory by virtue of the IOMMU in question not implementing
> permissions and repurposing some of those PTE fields as extra address bits).
>
> TBH, if this DMA32 stuff is going to be contentious we could possibly
> just rip out the offending kmem_cache - it seemed like good practice for
> the use-case, but provided kzalloc(SZ_1K, gfp | GFP_DMA32) can be relied
> upon to give the same 1KB alignment and chance of succeeding as the
> equivalent kmem_cache_alloc(), then we could quite easily make do with
> that instead.

Yes, but if we want to use kzalloc, we'll need to create kmalloc_caches for DMA32, which seems wasteful as there are no other users (see my comment here: https://patchwork.kernel.org/patch/10677525/#22332697).

Thanks,

> Thanks,
> Robin.
>
> > For each level of page table we generally have three cases:
> >
> > 1. single page
> > 2. sub-page, naturally aligned
> > 3. multiple pages, naturally aligned
> >
> > for 1 and 3, the page allocator will do just fine.
> > for 2, we should have a per-MM page_frag allocator. s390 already has
> > something like this, although it's more complicated. ppc also has
> > something a little more complex for the cases when it's configured with
> > a 64k page size but wants to use a 4k page table entry.
> >
> > I'd like x86 to be able to simply do:
> >
> > #define pte_alloc_one(mm, addr) page_alloc_table(mm, addr, 0)
> > #define pmd_alloc_one(mm, addr) page_alloc_table(mm, addr, 0)
> > #define pud_alloc_one(mm, addr) page_alloc_table(mm, addr, 0)
> > #define p4d_alloc_one(mm, addr) page_alloc_table(mm, addr, 0)
> >
> > An architecture with 4k page size and needing a 16k PMD would do:
> >
> > #define pmd_alloc_one(mm, addr) page_alloc_table(mm, addr, 2)
> >
> > while an architecture with a 64k page size needing a 4k PTE would do:
> >
> > #define ARCH_PAGE_TABLE_FRAG
> > #define pte_alloc_one(mm, addr) pagefrag_alloc_table(mm, addr, 4096)
> >
> > I haven't had time to work on this, but perhaps someone with a problem
> > that needs fixing would like to, instead of burying yet another awful
> > implementation away in arch/ somewhere.
Re: [PATCH v2 2/3] mm: Add support for SLAB_CACHE_DMA32
On Thu, Nov 22, 2018 at 2:32 AM Christopher Lameter wrote:
>
> On Sun, 11 Nov 2018, Nicolas Boichat wrote:
>
> > SLAB_CACHE_DMA32 is only available after explicit kmem_cache_create calls,
> > no default cache is created for kmalloc. Add a test in check_slab_flags
> > for this.
>
> This does not define the dma32 kmalloc array. Is that intentional?

Yes, that's intentional. AFAICT there is no user, so there is no point creating the cache. (Okay, I could find one, but it's probably broken:

git grep GFP_DMA32 | grep k[a-z]*alloc
drivers/media/platform/vivid/vivid-osd.c: dev->video_vbase = kzalloc(dev->video_buffer_size, GFP_KERNEL | GFP_DMA32);

).

> In that case you need to fail any request for GFP_DMA32 coming in via
> kmalloc.

Well, we do check for these in check_slab_flags (aka GFP_SLAB_BUG_MASK before patch 1/3 of this series), so, with or without this patch, calls with GFP_DMA32 will end up failing in check_slab_flags.
[PATCH] iommu/vt-d: Handle domain agaw being less than iommu agaw
The Intel IOMMU driver opportunistically skips a few top-level page tables from the domain paging directory while programming the IOMMU context entry. However, there is an implicit assumption in the code that the domain's adjusted guest address width (agaw) would always be greater than the IOMMU's agaw.

The IOMMU capabilities in an upcoming platform cause the domain's agaw to be lower than the IOMMU's agaw. The issue is seen when the IOMMU supports both 4-level and 5-level paging. The domain builds a 4-level page table based on an agaw of 2. However, the IOMMU's agaw is set as 3 (5-level). In this case the code incorrectly tries to skip page table levels. This causes the IOMMU driver to avoid programming the context entry. The fix handles this case and programs the context entry accordingly.

Fixes: de24e55395698 ("iommu/vt-d: Simplify domain_context_mapping_one")
Cc:
Cc: Ashok Raj
Cc: Jacob Pan
Cc: Lu Baolu
Reviewed-by: Lu Baolu
Reported-by: Ramos Falcon, Ernesto R
Tested-by: Ricardo Neri
Signed-off-by: Sohil Mehta
---
 drivers/iommu/intel-iommu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index f3ccf025108b..fdf79baf1d79 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2044,7 +2044,7 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
 	 * than default. Unnecessary for PT mode.
 	 */
 	if (translation != CONTEXT_TT_PASS_THROUGH) {
-		for (agaw = domain->agaw; agaw != iommu->agaw; agaw--) {
+		for (agaw = domain->agaw; agaw > iommu->agaw; agaw--) {
 			ret = -ENOMEM;
 			pgd = phys_to_virt(dma_pte_addr(pgd));
 			if (!dma_pte_present(pgd))
@@ -2058,7 +2058,7 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
 			translation = CONTEXT_TT_MULTI_LEVEL;

 		context_set_address_root(context, virt_to_phys(pgd));
-		context_set_address_width(context, iommu->agaw);
+		context_set_address_width(context, agaw);
 	} else {
 		/*
 		 * In pass through mode, AW must be programmed to
--
2.19.0
Re: [PATCH v2 0/3] iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables
On 2018-11-21 9:38 pm, Matthew Wilcox wrote:
> On Wed, Nov 21, 2018 at 06:20:02PM +0000, Christopher Lameter wrote:
>> On Sun, 11 Nov 2018, Nicolas Boichat wrote:
>>
>>> This is a follow-up to the discussion in [1], to make sure that the page
>>> tables allocated by iommu/io-pgtable-arm-v7s are contained within 32-bit
>>> physical address space.
>>
>> Page tables? This means you need a page frame? Why go through the slab
>> allocators?
>
> Because this particular architecture has sub-page-size PMD page tables.
> We desperately need to hoist page table allocation out of the
> architectures; there're a bunch of different implementations and they're
> mostly bad, one way or another.

These are IOMMU page tables, rather than CPU ones, so we're already well outside arch code - indeed the original motivation of io-pgtable was to be entirely independent of the p*d types and arch-specific MM code (this Armv7 short-descriptor format is already "non-native" when used by drivers in an arm64 kernel).

There are various efficiency reasons for using regular kernel memory instead of coherent DMA allocations - for the most part it works well, we just have the odd corner case like this one where the 32-bit format gets used on 64-bit systems such that the tables themselves still need to be allocated below 4GB (although the final output address can point at higher memory by virtue of the IOMMU in question not implementing permissions and repurposing some of those PTE fields as extra address bits).

TBH, if this DMA32 stuff is going to be contentious we could possibly just rip out the offending kmem_cache - it seemed like good practice for the use-case, but provided kzalloc(SZ_1K, gfp | GFP_DMA32) can be relied upon to give the same 1KB alignment and chance of succeeding as the equivalent kmem_cache_alloc(), then we could quite easily make do with that instead.

Thanks,
Robin.

> For each level of page table we generally have three cases:
>
> 1. single page
> 2. sub-page, naturally aligned
> 3. multiple pages, naturally aligned
>
> for 1 and 3, the page allocator will do just fine.
> for 2, we should have a per-MM page_frag allocator. s390 already has
> something like this, although it's more complicated. ppc also has
> something a little more complex for the cases when it's configured with
> a 64k page size but wants to use a 4k page table entry.
>
> I'd like x86 to be able to simply do:
>
> #define pte_alloc_one(mm, addr) page_alloc_table(mm, addr, 0)
> #define pmd_alloc_one(mm, addr) page_alloc_table(mm, addr, 0)
> #define pud_alloc_one(mm, addr) page_alloc_table(mm, addr, 0)
> #define p4d_alloc_one(mm, addr) page_alloc_table(mm, addr, 0)
>
> An architecture with 4k page size and needing a 16k PMD would do:
>
> #define pmd_alloc_one(mm, addr) page_alloc_table(mm, addr, 2)
>
> while an architecture with a 64k page size needing a 4k PTE would do:
>
> #define ARCH_PAGE_TABLE_FRAG
> #define pte_alloc_one(mm, addr) pagefrag_alloc_table(mm, addr, 4096)
>
> I haven't had time to work on this, but perhaps someone with a problem
> that needs fixing would like to, instead of burying yet another awful
> implementation away in arch/ somewhere.
Re: [PATCH v2 0/3] iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables
On Wed, Nov 21, 2018 at 06:20:02PM +0000, Christopher Lameter wrote:
> On Sun, 11 Nov 2018, Nicolas Boichat wrote:
>
> > This is a follow-up to the discussion in [1], to make sure that the page
> > tables allocated by iommu/io-pgtable-arm-v7s are contained within 32-bit
> > physical address space.
>
> Page tables? This means you need a page frame? Why go through the slab
> allocators?

Because this particular architecture has sub-page-size PMD page tables. We desperately need to hoist page table allocation out of the architectures; there're a bunch of different implementations and they're mostly bad, one way or another.

For each level of page table we generally have three cases:

1. single page
2. sub-page, naturally aligned
3. multiple pages, naturally aligned

for 1 and 3, the page allocator will do just fine. for 2, we should have a per-MM page_frag allocator. s390 already has something like this, although it's more complicated. ppc also has something a little more complex for the cases when it's configured with a 64k page size but wants to use a 4k page table entry.

I'd like x86 to be able to simply do:

#define pte_alloc_one(mm, addr) page_alloc_table(mm, addr, 0)
#define pmd_alloc_one(mm, addr) page_alloc_table(mm, addr, 0)
#define pud_alloc_one(mm, addr) page_alloc_table(mm, addr, 0)
#define p4d_alloc_one(mm, addr) page_alloc_table(mm, addr, 0)

An architecture with 4k page size and needing a 16k PMD would do:

#define pmd_alloc_one(mm, addr) page_alloc_table(mm, addr, 2)

while an architecture with a 64k page size needing a 4k PTE would do:

#define ARCH_PAGE_TABLE_FRAG
#define pte_alloc_one(mm, addr) pagefrag_alloc_table(mm, addr, 4096)

I haven't had time to work on this, but perhaps someone with a problem that needs fixing would like to, instead of burying yet another awful implementation away in arch/ somewhere.
Re: [PATCH v2 2/3] mm: Add support for SLAB_CACHE_DMA32
On Sun, 11 Nov 2018, Nicolas Boichat wrote:

> SLAB_CACHE_DMA32 is only available after explicit kmem_cache_create calls,
> no default cache is created for kmalloc. Add a test in check_slab_flags
> for this.

This does not define the dma32 kmalloc array. Is that intentional? In that case you need to fail any request for GFP_DMA32 coming in via kmalloc.
Re: [PATCH v2 3/3] iommu/io-pgtable-arm-v7s: Request DMA32 memory, and improve debugging
On Wed, 21 Nov 2018, Will Deacon wrote:

> > +#define ARM_V7S_TABLE_SLAB_CACHE SLAB_CACHE_DMA32

SLAB_CACHE_DMA32??? WTH is going on here? We are trying to get rid of the dma slab array.
Re: [PATCH v2 0/3] iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables
On Sun, 11 Nov 2018, Nicolas Boichat wrote:

> This is a follow-up to the discussion in [1], to make sure that the page
> tables allocated by iommu/io-pgtable-arm-v7s are contained within 32-bit
> physical address space.

Page tables? This means you need a page frame? Why go through the slab allocators?
Re: [PATCH 0/9] Use vm_insert_range
On 11/21/18 2:56 PM, Souptick Joarder wrote:
> On Thu, Nov 22, 2018 at 1:08 AM Boris Ostrovsky wrote:
>> On 11/21/18 1:24 AM, Souptick Joarder wrote:
>>> On Thu, Nov 15, 2018 at 9:09 PM Souptick Joarder wrote:
>>>> Previously drivers have their own way of mapping a range of kernel
>>>> pages/memory into user vma and this was done by invoking
>>>> vm_insert_page() within a loop.
>>>>
>>>> As this pattern is common across different drivers, it can be
>>>> generalized by creating a new function and use it across the drivers.
>>>>
>>>> vm_insert_range is the new API which will be used to map a range of
>>>> kernel memory/pages to user vma.
>>>>
>>>> All the applicable places are converted to use new vm_insert_range
>>>> in this patch series.
>>>>
>>>> Souptick Joarder (9):
>>>>   mm: Introduce new vm_insert_range API
>>>>   arch/arm/mm/dma-mapping.c: Convert to use vm_insert_range
>>>>   drivers/firewire/core-iso.c: Convert to use vm_insert_range
>>>>   drm/rockchip/rockchip_drm_gem.c: Convert to use vm_insert_range
>>>>   drm/xen/xen_drm_front_gem.c: Convert to use vm_insert_range
>>>>   iommu/dma-iommu.c: Convert to use vm_insert_range
>>>>   videobuf2/videobuf2-dma-sg.c: Convert to use vm_insert_range
>>>>   xen/gntdev.c: Convert to use vm_insert_range
>>>>   xen/privcmd-buf.c: Convert to use vm_insert_range
>>> Any further comment on driver changes ?
>> Xen drivers (the last two patches) look fine to me.
> Thanks, can I consider this as Reviewed-by ?

Reviewed-by: Boris Ostrovsky
Re: [PATCH 0/9] Use vm_insert_range
On Thu, Nov 22, 2018 at 1:08 AM Boris Ostrovsky wrote:
>
> On 11/21/18 1:24 AM, Souptick Joarder wrote:
> > On Thu, Nov 15, 2018 at 9:09 PM Souptick Joarder wrote:
> >> Previously drivers have their own way of mapping a range of kernel
> >> pages/memory into user vma and this was done by invoking
> >> vm_insert_page() within a loop.
> >>
> >> As this pattern is common across different drivers, it can be
> >> generalized by creating a new function and use it across the drivers.
> >>
> >> vm_insert_range is the new API which will be used to map a range of
> >> kernel memory/pages to user vma.
> >>
> >> All the applicable places are converted to use new vm_insert_range
> >> in this patch series.
> >>
> >> Souptick Joarder (9):
> >>   mm: Introduce new vm_insert_range API
> >>   arch/arm/mm/dma-mapping.c: Convert to use vm_insert_range
> >>   drivers/firewire/core-iso.c: Convert to use vm_insert_range
> >>   drm/rockchip/rockchip_drm_gem.c: Convert to use vm_insert_range
> >>   drm/xen/xen_drm_front_gem.c: Convert to use vm_insert_range
> >>   iommu/dma-iommu.c: Convert to use vm_insert_range
> >>   videobuf2/videobuf2-dma-sg.c: Convert to use vm_insert_range
> >>   xen/gntdev.c: Convert to use vm_insert_range
> >>   xen/privcmd-buf.c: Convert to use vm_insert_range
> > Any further comment on driver changes ?
>
> Xen drivers (the last two patches) look fine to me.

Thanks, can I consider this as Reviewed-by ?

> -boris
>
> >>  arch/arm/mm/dma-mapping.c | 21 ++---
> >>  drivers/firewire/core-iso.c | 15 ++--
> >>  drivers/gpu/drm/rockchip/rockchip_drm_gem.c | 20 ++--
> >>  drivers/gpu/drm/xen/xen_drm_front_gem.c | 20 +---
> >>  drivers/iommu/dma-iommu.c | 12 ++
> >>  drivers/media/common/videobuf2/videobuf2-dma-sg.c | 23 ++-
> >>  drivers/xen/gntdev.c | 11 -
> >>  drivers/xen/privcmd-buf.c | 8 ++-
> >>  include/linux/mm_types.h | 3 +++
> >>  mm/memory.c | 28 +++
> >>  mm/nommu.c | 7 ++
> >>  11 files changed, 70 insertions(+), 98 deletions(-)
> >>
> >> --
> >> 1.9.1
> >>
Re: [PATCH 0/9] Use vm_insert_range
On 11/21/18 1:24 AM, Souptick Joarder wrote:
> On Thu, Nov 15, 2018 at 9:09 PM Souptick Joarder wrote:
>> Previously drivers have their own way of mapping a range of kernel
>> pages/memory into user vma and this was done by invoking
>> vm_insert_page() within a loop.
>>
>> As this pattern is common across different drivers, it can be
>> generalized by creating a new function and use it across the drivers.
>>
>> vm_insert_range is the new API which will be used to map a range of
>> kernel memory/pages to user vma.
>>
>> All the applicable places are converted to use new vm_insert_range
>> in this patch series.
>>
>> Souptick Joarder (9):
>>   mm: Introduce new vm_insert_range API
>>   arch/arm/mm/dma-mapping.c: Convert to use vm_insert_range
>>   drivers/firewire/core-iso.c: Convert to use vm_insert_range
>>   drm/rockchip/rockchip_drm_gem.c: Convert to use vm_insert_range
>>   drm/xen/xen_drm_front_gem.c: Convert to use vm_insert_range
>>   iommu/dma-iommu.c: Convert to use vm_insert_range
>>   videobuf2/videobuf2-dma-sg.c: Convert to use vm_insert_range
>>   xen/gntdev.c: Convert to use vm_insert_range
>>   xen/privcmd-buf.c: Convert to use vm_insert_range
> Any further comment on driver changes ?

Xen drivers (the last two patches) look fine to me.

-boris

>>  arch/arm/mm/dma-mapping.c | 21 ++---
>>  drivers/firewire/core-iso.c | 15 ++--
>>  drivers/gpu/drm/rockchip/rockchip_drm_gem.c | 20 ++--
>>  drivers/gpu/drm/xen/xen_drm_front_gem.c | 20 +---
>>  drivers/iommu/dma-iommu.c | 12 ++
>>  drivers/media/common/videobuf2/videobuf2-dma-sg.c | 23 ++-
>>  drivers/xen/gntdev.c | 11 -
>>  drivers/xen/privcmd-buf.c | 8 ++-
>>  include/linux/mm_types.h | 3 +++
>>  mm/memory.c | 28 +++
>>  mm/nommu.c | 7 ++
>>  11 files changed, 70 insertions(+), 98 deletions(-)
>>
>> --
>> 1.9.1
>>
Re: [RFC PATCH 2/6] drivers core: Add I/O ASID allocator
Am 21.11.18 um 12:16 schrieb Jean-Philippe Brucker:
> On 12/11/2018 14:40, Joerg Roedel wrote:
>> Hi Jean-Philippe,
>>
>> On Fri, Oct 19, 2018 at 07:11:54PM +0100, Jean-Philippe Brucker wrote:
>>> The allocator doesn't really belong in drivers/iommu because some
>>> drivers would like to allocate PASIDs for devices that aren't managed
>>> by an IOMMU, using the same ID space as the IOMMU. It doesn't really
>>> belong in drivers/pci either since platform devices also support
>>> PASID. Add the allocator in drivers/base.
>> I don't really buy this argument, in the end it is the IOMMU that
>> enforces the PASID mappings for a device. Why should a device not
>> behind an IOMMU need to allocate a PASID? Do you have examples of such
>> devices which also don't have their own IOMMU built in?
> I misunderstood that requirement. Reading through the thread again
> (https://www.mail-archive.com/iommu@lists.linux-foundation.org/msg25640.html)
> it's more about a device using PASIDs as context IDs. Some contexts are
> not bound to processes but they still need their ID in the same PASID
> space as the contexts that are bound to process address spaces. So we
> can keep that code in drivers/iommu.

The problem with that approach is that we also need this allocator when the IOMMU is completely disabled.

In other words, PASIDs are used as context IDs by the hardware for things like signaling which application has caused an interrupt/event, even when they are not used by the IOMMU later on.

In addition to that, we have some MMUs which are internal to the devices (GPUVM on AMD GPUs for example; similar exists for NVidia) which use PASIDs for address translation internally in the device.

All of that is completely unrelated to the IOMMU, but when the IOMMU is enabled you need to use the same allocator, because all use cases use the same ID space.

Regards,
Christian.
>
>>> +int ioasid_for_each(struct ioasid_set *set, ioasid_iter_t func, void *data)
>>> +{
>>> +	int ret;
>>> +	struct ioasid_iter_data iter_data = {
>>> +		.set	= set,
>>> +		.func	= func,
>>> +		.data	= data,
>>> +	};
>>> +
>>> +	idr_lock(&ioasid_idr);
>>> +	ret = idr_for_each(&ioasid_idr, ioasid_iter, &iter_data);
>>> +	idr_unlock(&ioasid_idr);
>>> +
>>> +	return ret;
>>> +}
>>> +EXPORT_SYMBOL_GPL(ioasid_for_each);
>> What is the intended use-case for this function?
> I'm using it in sva_bind(), to see if there already exists an io_mm
> associated to the mm_struct given as argument
>
>>> +void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid)
>>> +{
>>> +	void *priv = NULL;
>>> +	struct ioasid_data *ioasid_data;
>>> +
>>> +	idr_lock(&ioasid_idr);
>>> +	ioasid_data = idr_find(&ioasid_idr, ioasid);
>>> +	if (ioasid_data && ioasid_data->set == set)
>>> +		priv = ioasid_data->private;
>>> +	idr_unlock(&ioasid_idr);
>>> +
>>> +	return priv;
>>> +}
>> I think using the idr_lock here will become a performance problem, as we
>> need this function in the SVA page-fault path to retrieve the mm_struct
>> belonging to a PASID.
>>
>> The head comment in lib/idr.c states that it should be safe to use
>> rcu_readlock() on read-only iterations. If so, this should be used here
>> so that concurrent lookups are possible.
> Agreed. I have a patch to use the RCU which looks simple enough, but
> since I haven't used RCU before I wasn't comfortable squashing it right
> away.
> Thanks,
> Jean
>
> diff --git a/drivers/base/ioasid.c b/drivers/base/ioasid.c
> index 5eb3abbf69ac..fa3b48c9c044 100644
> --- a/drivers/base/ioasid.c
> +++ b/drivers/base/ioasid.c
> @@ -13,6 +13,7 @@ struct ioasid_data {
> 	int id;
> 	struct ioasid_set *set;
> 	void *private;
> +	struct rcu_head rcu;
> };
>
> static DEFINE_IDR(ioasid_idr);
>
> @@ -56,7 +57,8 @@ void ioasid_free(ioasid_t ioasid)
> 	ioasid_data = idr_remove(&ioasid_idr, ioasid);
> 	idr_unlock(&ioasid_idr);
>
> -	kfree(ioasid_data);
> +	if (ioasid_data)
> +		kfree_rcu(ioasid_data, rcu);
> }
> EXPORT_SYMBOL_GPL(ioasid_free);
>
> @@ -95,9 +97,9 @@ int ioasid_for_each(struct ioasid_set *set, ioasid_iter_t func, void *data)
> 		.data = data,
> 	};
>
> -	idr_lock(&ioasid_idr);
> +	rcu_read_lock();
> 	ret = idr_for_each(&ioasid_idr, ioasid_iter, &iter_data);
> -	idr_unlock(&ioasid_idr);
> +	rcu_read_unlock();
>
> 	return ret;
> }
> @@ -111,11 +113,11 @@ void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid)
> 	void *priv = NULL;
> 	struct ioasid_data *ioasid_data;
>
> -	idr_lock(&ioasid_idr);
> +	rcu_read_lock();
> 	ioasid_data = idr_find(&ioasid_idr, ioasid);
> 	if (ioasid_data && ioasid_data->set == set)
> 		priv = ioasid_data->private;
> -	idr_unlock(&ioasid_idr);
> +	rcu_read_unlock();
>
> 	return priv;
> }

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC PATCH 0/6] Auxiliary IOMMU domains and Arm SMMUv3
On 12/11/2018 14:55, j...@8bytes.org wrote:
> Hi Jean-Philippe,
>
> On Thu, Nov 08, 2018 at 06:29:42PM +, Jean-Philippe Brucker wrote:
>> (1) My initial approach would have been to use the same page tables for
>> the default_domain and this new domain, but it might be precisely what
>> you were trying to avoid with this new proposal: a device attached to
>> two domains at the same time.
>
> My request was about the initial assumptions a device driver can make
> about a device. The assumption is that the DMA-API is set up and
> initialized for it; for everything else (like SVA) the device driver
> needs to take special action, like allocating an SVA domain and
> attaching the device to it.
>
> This should of course not break any IRQ setups for the device, and
> enforcing an ordering is not a good and maintainable solution either.
>
> So what you could do here is to either:
>
> 	1) Save the needed IRQ mappings in some extra datastructure and
> 	   duplicate it in the SVA domain. This also makes it easier to
> 	   use the same SVA domain with multiple devices.
>
> 	2) Just re-use the 'page-tables' from the default domain in the
> 	   sva-domain. This needs to happen at attach-time, because at
> 	   allocation time you don't know the device yet.
>
> I think 1) is the best solution, what do you think?

Right now 2) seems more feasible, because changes will be limited to the
IOMMU driver and won't spill into the dma-iommu layer. Maybe it will be
clearer to me once I write a prototype.

> Btw, things would be different if we could expose SVA through the
> DMA-API to drivers. In this case we could just make the default domain
> of type SVA and be done, but we are not there yet.

For the moment though, I think we should allow device drivers to use the
DMA-API at the same time as SVA. If a device driver has to map a
management ring buffer, for example, it's much nicer to use dma_alloc as
usual rather than have them write their own DMA allocation routines.
Which is another reason to implement 2) above: the DMA-API would still
act on the default_domain, and attaching an "extended" domain augments
it with SVA features. Detaching from the device doesn't require copying
mappings back to the default domain. Does that sound OK?

Then if vfio-pci wants to use SVA, maybe we can make the UNMANAGED
domain support SVA by default? Otherwise introduce a UNMANAGED+ext
domain. Anyway, that's for later. This is what I currently plan to add:

	struct io_mm *io_mm;
	struct iommu_domain *domain;

	domain = iommu_alloc_ext_domain(bus);

	/* Set an mm-exit callback to disable PASIDs. More attributes could
	   be added later to change the capabilities of the ext domain */
	iommu_domain_set_attr(domain, DOMAIN_ATTR_MM_EXIT_CB, _exit);

	/* Fails if the device doesn't support this domain type */
	iommu_attach_device(domain, dev);

	/* Same as SVA v3, except a domain instead of dev as argument */
	io_mm = iommu_sva_bind_mm(domain, current->mm, ...);

	/* on success, install the PASID in the device */
	pasid = io_mm->pasid;

	/* Do more work */

	iommu_sva_unbind_mm(io_mm);
	iommu_detach_device(domain, dev);

Thanks,
Jean
Re: [PATCH v2 3/3] iommu/io-pgtable-arm-v7s: Request DMA32 memory, and improve debugging
On Wed 21-11-18 16:46:38, Will Deacon wrote:
> On Sun, Nov 11, 2018 at 05:03:41PM +0800, Nicolas Boichat wrote:
>> For level 1/2 pages, ensure GFP_DMA32 is used if CONFIG_ZONE_DMA32
>> is defined (e.g. on arm64 platforms).
>>
>> For level 2 pages, allocate a slab cache in SLAB_CACHE_DMA32.
>>
>> Also, print an error when the physical address does not fit in
>> 32-bit, to make debugging easier in the future.
>>
>> Fixes: ad67f5a6545f ("arm64: replace ZONE_DMA with ZONE_DMA32")
>> Signed-off-by: Nicolas Boichat
>> ---
>>
>> Changes since v1:
>>  - Changed approach to use SLAB_CACHE_DMA32 added by the previous
>>    commit.
>>  - Use DMA or DMA32 depending on the architecture (DMA for arm,
>>    DMA32 for arm64).
>>
>>  drivers/iommu/io-pgtable-arm-v7s.c | 20 ++++++++++++++++----
>>  1 file changed, 16 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c
>> index 445c3bde04800c..996f7b6d00b44a 100644
>> --- a/drivers/iommu/io-pgtable-arm-v7s.c
>> +++ b/drivers/iommu/io-pgtable-arm-v7s.c
>> @@ -161,6 +161,14 @@
>>
>>  #define ARM_V7S_TCR_PD1	BIT(5)
>>
>> +#ifdef CONFIG_ZONE_DMA32
>> +#define ARM_V7S_TABLE_GFP_DMA GFP_DMA32
>> +#define ARM_V7S_TABLE_SLAB_CACHE SLAB_CACHE_DMA32
>> +#else
>> +#define ARM_V7S_TABLE_GFP_DMA GFP_DMA
>> +#define ARM_V7S_TABLE_SLAB_CACHE SLAB_CACHE_DMA
>> +#endif
>
> It's a bit grotty that GFP_DMA32 doesn't just map to GFP_DMA on 32-bit
> architectures, since then we wouldn't need this #ifdeffery afaict.

But GFP_DMA32 should map to GFP_KERNEL on 32b, no? Or what exactly is
going on in here?
--
Michal Hocko
SUSE Labs
Re: [PATCH v2 0/2] SWIOTLB fixes for 4.20
Thanks, applied to the dma-mapping for-linus tree.

I'll wait a little longer for a review from Stefano, but I really want
it in linux-next asap and hopefully off to Linus for the weekend.
Re: [PATCHv4 2/2] iommu/arm-smmu: Get clock config from device tree
Hi Thor,

On Fri, Oct 05, 2018 at 03:57:00PM -0500, thor.tha...@linux.intel.com wrote:
> From: Thor Thayer
>
> Currently the clocks are specified in a structure as well as
> in the device tree. Since all the information about clocks can be
> pulled from the device tree, parse the device tree for the clock
> information if specified.
>
> This patch is dependent upon the following patch series that
> adds clocks to the driver:
>
> "[v16,0/5] iommu/arm-smmu: Add runtime pm/sleep support"
> https://patchwork.kernel.org/cover/10581891/
>
> particularly this patch which adds the clocks:
>
> "[v16,1/5] iommu/arm-smmu: Add pm_runtime/sleep ops"
> https://patchwork.kernel.org/patch/10581899/
>
> Request testing on different platforms for verification.
> This patch was tested on a Stratix10 SOCFPGA.
>
> Signed-off-by: Thor Thayer
> ---
> v4 Change dependency on pending patch series that adds clocks.
>    Add code for parsing device tree for the clocks.
> v3 Change dependency on device tree bulk clock patches.
> v2 Add structure for SOCFPGA and add dependency on clock patch.
> ---
> drivers/iommu/arm-smmu.c | 30 +++++++++++++++++++++++++++++++++---
> 1 file changed, 27 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index d315ca637097..3dd10663b09c 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -44,6 +44,7 @@
>  #include
>  #include
>  #include
> +#include
>  #include
>  #include
>  #include
> @@ -2139,6 +2140,7 @@ static int arm_smmu_device_dt_probe(struct platform_device *pdev,
>  	const struct arm_smmu_match_data *data;
>  	struct device *dev = &pdev->dev;
>  	bool legacy_binding;
> +	const char **parent_names;
>
>  	if (of_property_read_u32(dev->of_node, "#global-interrupts",
>  				 &smmu->num_global_irqs)) {
> @@ -2149,9 +2151,25 @@ static int arm_smmu_device_dt_probe(struct platform_device *pdev,
>  	data = of_device_get_match_data(dev);
>  	smmu->version = data->version;
>  	smmu->model = data->model;
> -	smmu->num_clks = data->num_clks;
> -
> -	arm_smmu_fill_clk_data(smmu, data->clks);
> +	smmu->num_clks = of_clk_get_parent_count(dev->of_node);
> +	/* check to see if clocks were specified in DT */
> +	if (smmu->num_clks) {
> +		unsigned int i;
> +
> +		parent_names = kmalloc_array(smmu->num_clks,
> +					     sizeof(*parent_names),
> +					     GFP_KERNEL);
> +		if (!parent_names)
> +			goto fail_clk_name;
> +
> +		for (i = 0; i < smmu->num_clks; i++) {
> +			if (of_property_read_string_index(dev->of_node,
> +							  "clock-names", i,
> +							  &parent_names[i]))
> +				goto fail_clk_name;
> +		}
> +		arm_smmu_fill_clk_data(smmu, parent_names);
> +	}
>
>  	parse_driver_options(smmu);
>
> @@ -2171,6 +2189,12 @@ static int arm_smmu_device_dt_probe(struct platform_device *pdev,
>  	smmu->features |= ARM_SMMU_FEAT_COHERENT_WALK;
>
>  	return 0;
> +
> +fail_clk_name:
> +	kfree(parent_names);
> +	/* clock-names required for clocks in devm_clk_bulk_get() */
> +	dev_err(dev, "Error: clock-name required in device tree\n");

I think you can drop the "Error: " prefix here, but the rest of the
patch looks sensible to me.
We just need to work out how this co-exists with Vivek's approach of
using match_data (see the other thread).

Will
Re: [PATCH v2 3/3] iommu/io-pgtable-arm-v7s: Request DMA32 memory, and improve debugging
On 21/11/2018 17:38, Christopher Lameter wrote:
> On Wed, 21 Nov 2018, Will Deacon wrote:
>>> +#define ARM_V7S_TABLE_SLAB_CACHE SLAB_CACHE_DMA32
>
> SLAB_CACHE_DMA32??? WTH is going on here? We are trying to get rid of
> the dma slab array.

See the previous two patches in this series. If there's already a
(better) way to have a kmem_cache which allocates its backing pages
with GFP_DMA32, please do let us know.

Robin.
Re: [RESEND PATCH v17 5/5] iommu/arm-smmu: Add support for qcom,smmu-v2 variant
[+Thor]

On Fri, Nov 16, 2018 at 04:54:30PM +0530, Vivek Gautam wrote:
> qcom,smmu-v2 is an arm,smmu-v2 implementation with specific
> clock and power requirements.
> On msm8996, multiple cores, viz. mdss, video, etc. use this
> smmu. On sdm845, this smmu is used with gpu.
> Add bindings for the same.
>
> Signed-off-by: Vivek Gautam
> Reviewed-by: Rob Herring
> Reviewed-by: Tomasz Figa
> Tested-by: Srinivas Kandagatla
> Reviewed-by: Robin Murphy
> ---
> drivers/iommu/arm-smmu.c | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index 2098c3141f5f..d315ca637097 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -120,6 +120,7 @@ enum arm_smmu_implementation {
>  	GENERIC_SMMU,
>  	ARM_MMU500,
>  	CAVIUM_SMMUV2,
> +	QCOM_SMMUV2,
>  };
>
>  struct arm_smmu_s2cr {
> @@ -2026,6 +2027,17 @@ ARM_SMMU_MATCH_DATA(arm_mmu401, ARM_SMMU_V1_64K, GENERIC_SMMU);
>  ARM_SMMU_MATCH_DATA(arm_mmu500, ARM_SMMU_V2, ARM_MMU500);
>  ARM_SMMU_MATCH_DATA(cavium_smmuv2, ARM_SMMU_V2, CAVIUM_SMMUV2);
>
> +static const char * const qcom_smmuv2_clks[] = {
> +	"bus", "iface",
> +};
> +
> +static const struct arm_smmu_match_data qcom_smmuv2 = {
> +	.version = ARM_SMMU_V2,
> +	.model = QCOM_SMMUV2,
> +	.clks = qcom_smmuv2_clks,
> +	.num_clks = ARRAY_SIZE(qcom_smmuv2_clks),
> +};

These seem redundant if we go down the route proposed by Thor, where we
just pull all of the clocks out of the device-tree. In which case, why
do we need this match_data at all?

Will
Re: [RESEND PATCH v17 2/5] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
On Fri, Nov 16, 2018 at 04:54:27PM +0530, Vivek Gautam wrote:
> From: Sricharan R
>
> The smmu device probe/remove and add/remove master device callbacks
> get called when the smmu is not linked to its master, that is, without
> the context of the master device. So calling runtime apis in those
> places separately.
> Global locks are also initialized before enabling runtime pm, as the
> runtime_resume() calls device_reset() which does tlb_sync_global()
> that ultimately requires locks to be initialized.
>
> Signed-off-by: Sricharan R
> [vivek: Cleanup pm runtime calls]
> Signed-off-by: Vivek Gautam
> Reviewed-by: Tomasz Figa
> Tested-by: Srinivas Kandagatla
> Reviewed-by: Robin Murphy
> ---
> drivers/iommu/arm-smmu.c | 101 ++-
> 1 file changed, 91 insertions(+), 10 deletions(-)

Given that you're doing the get/put in the TLBI ops unconditionally:

> static void arm_smmu_flush_iotlb_all(struct iommu_domain *domain)
> {
> 	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> +	struct arm_smmu_device *smmu = smmu_domain->smmu;
>
> -	if (smmu_domain->tlb_ops)
> +	if (smmu_domain->tlb_ops) {
> +		arm_smmu_rpm_get(smmu);
> 		smmu_domain->tlb_ops->tlb_flush_all(smmu_domain);
> +		arm_smmu_rpm_put(smmu);
> +	}
> }
>
> static void arm_smmu_iotlb_sync(struct iommu_domain *domain)
> {
> 	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> +	struct arm_smmu_device *smmu = smmu_domain->smmu;
>
> -	if (smmu_domain->tlb_ops)
> +	if (smmu_domain->tlb_ops) {
> +		arm_smmu_rpm_get(smmu);
> 		smmu_domain->tlb_ops->tlb_sync(smmu_domain);
> +		arm_smmu_rpm_put(smmu);
> +	}

Why do you need them around the map/unmap calls as well?

Will
Re: [PATCH v3] iommu/dma: Use NUMA aware memory allocations in __iommu_dma_alloc_pages()
On Wed, Nov 21, 2018 at 04:47:48PM +, John Garry wrote:
> On 21/11/2018 16:07, Will Deacon wrote:
>> On Wed, Nov 21, 2018 at 10:54:10PM +0800, John Garry wrote:
>>> From: Ganapatrao Kulkarni
>>>
>>> Change function __iommu_dma_alloc_pages() to allocate pages for DMA from
>>> respective device NUMA node. The ternary operator which would be for
>>> alloc_pages_node() is tidied along with this.
>>>
>>> We also include a change to use kvzalloc() for kzalloc()/vzalloc()
>>> combination.
>>>
>>> Signed-off-by: Ganapatrao Kulkarni
>>> [JPG: Added kvzalloc(), drop pages ** being device local, tidied ternary
>>> operator]
>>> Signed-off-by: John Garry
>>
>> Weird, you're missing a diffstat here.
>>
>> Anyway, the patch looks fine to me, but it would be nice if you could
>> justify the change with some numbers. Do you actually see an improvement
>> from this change?
>
> Hi Will,
>
> Ah, I missed adding my comments explaining the motivation. It would be
> better in the commit log. Anyway, here's the snippet:
>
> " ... as mentioned in [3], dma_alloc_coherent() uses the locality
> information from the device - as in direct DMA - so this patch is just
> applying this same policy.
>
> [3] https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1692998.html"

Yes, please add this to the commit log.

> I did have some numbers to show improvement in some scenarios when I
> tested this a while back which I'll dig out.
>
> However I would say that some scenarios will improve and the opposite
> for others with this change, considering different conditions in which
> DMA memory may be used.

Well, if you can show that it's useful in some cases and not
catastrophic in others, then I think shooting for parity with direct
DMA is a reasonable justification for the change.

Will
Re: [PATCH v2 07/17] debugobjects: Move printk out of db lock critical sections
On 11/19/2018 01:55 PM, Waiman Long wrote:
> The db->lock is a raw spinlock and so the lock hold time is supposed
> to be short. This will not be the case when printk() is being involved
> in some of the critical sections. In order to avoid the long hold time,
> in case some messages need to be printed, the debug_object_is_on_stack()
> and debug_print_object() calls are now moved out of those critical
> sections.
>
> Signed-off-by: Waiman Long
> ---
> lib/debugobjects.c | 61 +-
> 1 file changed, 42 insertions(+), 19 deletions(-)
>
> diff --git a/lib/debugobjects.c b/lib/debugobjects.c
> index 403dd95..4216d3d 100644
> --- a/lib/debugobjects.c
> +++ b/lib/debugobjects.c
> @@ -376,6 +376,8 @@ static void debug_object_is_on_stack(void *addr, int onstack)
> 	struct debug_bucket *db;
> 	struct debug_obj *obj;
> 	unsigned long flags;
> +	bool debug_printobj = false;
> +	bool debug_chkstack = false;
>
> 	fill_pool();
>
> @@ -392,7 +394,7 @@ static void debug_object_is_on_stack(void *addr, int onstack)
> 		debug_objects_oom();
> 		return;
> 	}
> -	debug_object_is_on_stack(addr, onstack);
> +	debug_chkstack = true;
> 	}
>
> 	switch (obj->state) {
> @@ -403,20 +405,25 @@ static void debug_object_is_on_stack(void *addr, int onstack)
> 		break;
>
> 	case ODEBUG_STATE_ACTIVE:
> -		debug_print_object(obj, "init");
> 		state = obj->state;
> 		raw_spin_unlock_irqrestore(&db->lock, flags);
> +		debug_print_object(obj, "init");
> 		debug_object_fixup(descr->fixup_init, addr, state);
> 		return;
>
> 	case ODEBUG_STATE_DESTROYED:
> -		debug_print_object(obj, "init");
> +		debug_printobj = true;
> 		break;
> 	default:
> 		break;
> 	}
>
> 	raw_spin_unlock_irqrestore(&db->lock, flags);
> +	if (debug_chkstack)
> +		debug_object_is_on_stack(addr, onstack);
> +	if (debug_printobj)
> +		debug_print_object(obj, "init");
> +
> }
>
> /**
> @@ -474,6 +481,8 @@ int debug_object_activate(void *addr, struct debug_obj_descr *descr)
>
> 	obj = lookup_object(addr, db);
> 	if (obj) {
> +		bool debug_printobj = false;
> +
> 		switch (obj->state) {
> 		case ODEBUG_STATE_INIT:
> 		case ODEBUG_STATE_INACTIVE:
> @@ -482,14 +491,14 @@ int debug_object_activate(void *addr, struct debug_obj_descr *descr)
> 			break;
>
> 		case ODEBUG_STATE_ACTIVE:
> -			debug_print_object(obj, "activate");
> 			state = obj->state;
> 			raw_spin_unlock_irqrestore(&db->lock, flags);
> +			debug_print_object(obj, "activate");
> 			ret = debug_object_fixup(descr->fixup_activate, addr,
> 						 state);
> 			return ret ? 0 : -EINVAL;
>
> 		case ODEBUG_STATE_DESTROYED:
> -			debug_print_object(obj, "activate");
> +			debug_printobj = true;
> 			ret = -EINVAL;
> 			break;
> 		default:
> @@ -497,10 +506,13 @@ int debug_object_activate(void *addr, struct debug_obj_descr *descr)
> 			break;
> 		}
> 		raw_spin_unlock_irqrestore(&db->lock, flags);
> +		if (debug_printobj)
> +			debug_print_object(obj, "activate");
> 		return ret;
> 	}
>
> 	raw_spin_unlock_irqrestore(&db->lock, flags);
> +
> 	/*
> 	 * We are here when a static object is activated. We
> 	 * let the type specific code confirm whether this is
> @@ -532,6 +544,7 @@ void debug_object_deactivate(void *addr, struct debug_obj_descr *descr)
> 	struct debug_bucket *db;
> 	struct debug_obj *obj;
> 	unsigned long flags;
> +	bool debug_printobj = false;
>
> 	if (!debug_objects_enabled)
> 		return;
> @@ -549,24 +562,27 @@ void debug_object_deactivate(void *addr, struct debug_obj_descr *descr)
> 			if (!obj->astate)
> 				obj->state = ODEBUG_STATE_INACTIVE;
> 			else
> -				debug_print_object(obj, "deactivate");
> +				debug_printobj = true;
> 			break;
>
> 		case ODEBUG_STATE_DESTROYED:
> -			debug_print_object(obj, "deactivate");
> +			debug_printobj = true;
> 			break;
> 		default:
> 			break;
> 		}
> -	} else {
> +	}
> +
> +	raw_spin_unlock_irqrestore(&db->lock, flags);
> +	if (!obj) {
> 		struct debug_obj o = { .object = addr,
Re: [PATCH v3] iommu/dma: Use NUMA aware memory allocations in __iommu_dma_alloc_pages()
On 21/11/2018 16:07, Will Deacon wrote:
> On Wed, Nov 21, 2018 at 10:54:10PM +0800, John Garry wrote:
>> From: Ganapatrao Kulkarni
>>
>> Change function __iommu_dma_alloc_pages() to allocate pages for DMA from
>> respective device NUMA node. The ternary operator which would be for
>> alloc_pages_node() is tidied along with this.
>>
>> We also include a change to use kvzalloc() for kzalloc()/vzalloc()
>> combination.
>>
>> Signed-off-by: Ganapatrao Kulkarni
>> [JPG: Added kvzalloc(), drop pages ** being device local, tidied ternary
>> operator]
>> Signed-off-by: John Garry
>
> Weird, you're missing a diffstat here.
>
> Anyway, the patch looks fine to me, but it would be nice if you could
> justify the change with some numbers. Do you actually see an improvement
> from this change?

Hi Will,

Ah, I missed adding my comments explaining the motivation. It would be
better in the commit log. Anyway, here's the snippet:

" ... as mentioned in [3], dma_alloc_coherent() uses the locality
information from the device - as in direct DMA - so this patch is just
applying this same policy.

[3] https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1692998.html"

I did have some numbers to show improvement in some scenarios when I
tested this a while back which I'll dig out.

However I would say that some scenarios will improve and the opposite
for others with this change, considering different conditions in which
DMA memory may be used.

Cheers,
John
Re: [PATCH v2 3/3] iommu/io-pgtable-arm-v7s: Request DMA32 memory, and improve debugging
On Sun, Nov 11, 2018 at 05:03:41PM +0800, Nicolas Boichat wrote:
> For level 1/2 pages, ensure GFP_DMA32 is used if CONFIG_ZONE_DMA32
> is defined (e.g. on arm64 platforms).
>
> For level 2 pages, allocate a slab cache in SLAB_CACHE_DMA32.
>
> Also, print an error when the physical address does not fit in
> 32-bit, to make debugging easier in the future.
>
> Fixes: ad67f5a6545f ("arm64: replace ZONE_DMA with ZONE_DMA32")
> Signed-off-by: Nicolas Boichat
> ---
>
> Changes since v1:
>  - Changed approach to use SLAB_CACHE_DMA32 added by the previous
>    commit.
>  - Use DMA or DMA32 depending on the architecture (DMA for arm,
>    DMA32 for arm64).
>
> drivers/iommu/io-pgtable-arm-v7s.c | 20 ++++++++++++++++----
> 1 file changed, 16 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c
> index 445c3bde04800c..996f7b6d00b44a 100644
> --- a/drivers/iommu/io-pgtable-arm-v7s.c
> +++ b/drivers/iommu/io-pgtable-arm-v7s.c
> @@ -161,6 +161,14 @@
>
>  #define ARM_V7S_TCR_PD1	BIT(5)
>
> +#ifdef CONFIG_ZONE_DMA32
> +#define ARM_V7S_TABLE_GFP_DMA GFP_DMA32
> +#define ARM_V7S_TABLE_SLAB_CACHE SLAB_CACHE_DMA32
> +#else
> +#define ARM_V7S_TABLE_GFP_DMA GFP_DMA
> +#define ARM_V7S_TABLE_SLAB_CACHE SLAB_CACHE_DMA
> +#endif

It's a bit grotty that GFP_DMA32 doesn't just map to GFP_DMA on 32-bit
architectures, since then we wouldn't need this #ifdeffery afaict.

Will
Re: [PATCH v4 0/2] Enable smmu support on sdm845
Hi Will,

On 11/21/2018 9:22 PM, Will Deacon wrote:
> Hi Vivek,
>
> On Thu, Oct 11, 2018 at 03:19:28PM +0530, Vivek Gautam wrote:
>> This series enables apps-smmu, the "arm,mmu-500" instance on sdm845.
>> Series tested on SDM845 MTP device with related smmu patch series [1],
>> and necessary config change, besides one hack to keep LDO14 in LPM
>> mode to boot things up (see the commit in the test branch).
>> The tested branch is available at [2].
>
> [...]
>
>> .../devicetree/bindings/iommu/arm,smmu.txt | 4 ++
>> arch/arm64/boot/dts/qcom/sdm845.dtsi | 72 ++
>> 2 files changed, 76 insertions(+)
>
> Given that this doesn't touch any of the driver code, please take this
> via Andy and the arm-soc tree.

Yes, once the driver changes are pulled into your tree, I can ask Andy
to pick these up.

Thanks

Best regards
Vivek
Re: [PATCH v3] iommu/dma: Use NUMA aware memory allocations in __iommu_dma_alloc_pages()
On Wed, Nov 21, 2018 at 10:54:10PM +0800, John Garry wrote:
> From: Ganapatrao Kulkarni
>
> Change function __iommu_dma_alloc_pages() to allocate pages for DMA from
> respective device NUMA node. The ternary operator which would be for
> alloc_pages_node() is tidied along with this.
>
> We also include a change to use kvzalloc() for kzalloc()/vzalloc()
> combination.
>
> Signed-off-by: Ganapatrao Kulkarni
> [JPG: Added kvzalloc(), drop pages ** being device local, tidied ternary
> operator]
> Signed-off-by: John Garry

Weird, you're missing a diffstat here.

Anyway, the patch looks fine to me, but it would be nice if you could
justify the change with some numbers. Do you actually see an improvement
from this change?

Will
[PATCH v2 1/2] dma-direct: Make DIRECT_MAPPING_ERROR viable for SWIOTLB
With the overflow buffer removed, we no longer have a unique address
which is guaranteed not to be a valid DMA target to use as an error
token. The DIRECT_MAPPING_ERROR value of 0 tries to at least represent
an unlikely DMA target, but unfortunately there are already SWIOTLB
users with DMA-able memory at physical address 0 which now gets falsely
treated as a mapping failure and leads to all manner of misbehaviour.

The best we can do to mitigate that is flip DIRECT_MAPPING_ERROR to the
other commonly-used error value of all-bits-set, since the last single
byte of memory is by far the least-likely-valid DMA target.

Fixes: dff8d6c1ed58 ("swiotlb: remove the overflow buffer")
Reported-by: John Stultz
Tested-by: John Stultz
Acked-by: Konrad Rzeszutek Wilk
Reviewed-by: Christoph Hellwig
Signed-off-by: Robin Murphy
---

v2: Add parentheses, tweak commit message, collect tags

 include/linux/dma-direct.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index bd73e7a91410..9e66bfe369aa 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -5,7 +5,7 @@
 #include
 #include

-#define DIRECT_MAPPING_ERROR		0
+#define DIRECT_MAPPING_ERROR		(~(dma_addr_t)0)

 #ifdef CONFIG_ARCH_HAS_PHYS_TO_DMA
 #include
--
2.19.1.dirty
[PATCH v2 0/2] SWIOTLB fixes for 4.20
Here's a quick v2 addressing Christoph's nits and collecting the tags
given so far.

I've reproduced my USB problem with a commit prior to the dma-mapping
pull, so it looks like the cache maintenance changes are off the hook
for that one (and I'll have to venture into USB/UAS territory). No
objection from me if Konrad wants to pick this up instead, depending
on timing.

Robin.

Robin Murphy (2):
  dma-direct: Make DIRECT_MAPPING_ERROR viable for SWIOTLB
  swiotlb: Skip cache maintenance on map error

 include/linux/dma-direct.h | 2 +-
 kernel/dma/swiotlb.c       | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

--
2.19.1.dirty
[PATCH v2 2/2] swiotlb: Skip cache maintenance on map error
If swiotlb_bounce_page() failed, calling arch_sync_dma_for_device() may
lead to such delights as performing cache maintenance on whatever
address phys_to_virt(SWIOTLB_MAP_ERROR) looks like, which is typically
outside the kernel memory map and goes about as well as expected.
Don't do that.

Fixes: a4a4330db46a ("swiotlb: add support for non-coherent DMA")
Tested-by: John Stultz
Reviewed-by: Christoph Hellwig
Signed-off-by: Robin Murphy
---

v2: Collect tags

 kernel/dma/swiotlb.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 5731daa09a32..045930e32c0e 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -679,7 +679,8 @@ dma_addr_t swiotlb_map_page(struct device *dev, struct page *page,
 	}

 	if (!dev_is_dma_coherent(dev) &&
-	    (attrs & DMA_ATTR_SKIP_CPU_SYNC) == 0)
+	    (attrs & DMA_ATTR_SKIP_CPU_SYNC) == 0 &&
+	    dev_addr != DIRECT_MAPPING_ERROR)
 		arch_sync_dma_for_device(dev, phys, size, dir);

 	return dev_addr;
--
2.19.1.dirty
Re: [PATCH v4 0/2] Enable smmu support on sdm845
Hi Vivek,

On Thu, Oct 11, 2018 at 03:19:28PM +0530, Vivek Gautam wrote:
> This series enables apps-smmu, the "arm,mmu-500" instance on sdm845.
> Series tested on SDM845 MTP device with related smmu patch series [1],
> and necessary config change, besides one hack to keep LDO14 in LPM mode
> to boot things up (see the commit in the test branch).
> The tested branch is available at [2].

[...]

> .../devicetree/bindings/iommu/arm,smmu.txt | 4 ++
> arch/arm64/boot/dts/qcom/sdm845.dtsi | 72 ++
> 2 files changed, 76 insertions(+)

Given that this doesn't touch any of the driver code, please take this
via Andy and the arm-soc tree.

Will
Re: [PATCH 0/2] SWIOTLB fixes for 4.20
On Wed, Nov 21, 2018 at 02:03:31PM +0100, Christoph Hellwig wrote:
> On Tue, Nov 20, 2018 at 11:34:41AM -0500, Konrad Rzeszutek Wilk wrote:
>>> Konrad, are you ok with me picking up both through the dma-mapping
>>> tree?
>>
>> Yes, albeit I would want Stefano to take a peek at patch #2 just in case.
>
> Stefano, can you take a look asap? This is a pretty trivial fix for
> a nasty bug that breaks boot on real life systems. I'd like to merge
> it by tomorrow so that I can send it off to Linus for the next rc
> (I will be offline for about two days after Friday morning)

It is Turkey Day in the US, so he may be busy catching those pesky
turkeys. How about Tuesday? And I can also send the GIT PULL.
[PATCH v3] iommu/dma: Use NUMA aware memory allocations in __iommu_dma_alloc_pages()
From: Ganapatrao Kulkarni Change function __iommu_dma_alloc_pages() to allocate pages for DMA from respective device NUMA node. The ternary operator which would be for alloc_pages_node() is tidied along with this. We also include a change to use kvzalloc() for kzalloc()/vzalloc() combination. Signed-off-by: Ganapatrao Kulkarni [JPG: Added kvzalloc(), drop pages ** being device local, tidied ternary operator] Signed-off-by: John Garry diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index d1b0475..4afb1a8 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -449,20 +449,17 @@ static void __iommu_dma_free_pages(struct page **pages, int count) kvfree(pages); } -static struct page **__iommu_dma_alloc_pages(unsigned int count, - unsigned long order_mask, gfp_t gfp) +static struct page **__iommu_dma_alloc_pages(struct device *dev, + unsigned int count, unsigned long order_mask, gfp_t gfp) { struct page **pages; - unsigned int i = 0, array_size = count * sizeof(*pages); + unsigned int i = 0, nid = dev_to_node(dev); order_mask &= (2U << MAX_ORDER) - 1; if (!order_mask) return NULL; - if (array_size <= PAGE_SIZE) - pages = kzalloc(array_size, GFP_KERNEL); - else - pages = vzalloc(array_size); + pages = kvzalloc(count * sizeof(*pages), GFP_KERNEL); if (!pages) return NULL; @@ -481,10 +478,12 @@ static struct page **__iommu_dma_alloc_pages(unsigned int count, for (order_mask &= (2U << __fls(count)) - 1; order_mask; order_mask &= ~order_size) { unsigned int order = __fls(order_mask); + gfp_t alloc_flags = gfp; order_size = 1U << order; - page = alloc_pages((order_mask - order_size) ? 
- gfp | __GFP_NORETRY : gfp, order); + if (order_mask > order_size) + alloc_flags |= __GFP_NORETRY; + page = alloc_pages_node(nid, alloc_flags, order); if (!page) continue; if (!order) @@ -569,7 +568,8 @@ struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t gfp, alloc_sizes = min_size; count = PAGE_ALIGN(size) >> PAGE_SHIFT; - pages = __iommu_dma_alloc_pages(count, alloc_sizes >> PAGE_SHIFT, gfp); + pages = __iommu_dma_alloc_pages(dev, count, alloc_sizes >> PAGE_SHIFT, + gfp); if (!pages) return NULL; -- 1.9.1
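The allocation loop this patch touches has a nice invariant worth spelling out: it always grabs the largest power-of-two chunk that `order_mask` permits and that still fits in the remaining `count`, narrowing the mask as `count` shrinks. Below is a minimal userspace sketch of just that order-splitting logic (illustrative only — `split_into_orders` and `fls_u` are made-up names, not the kernel code, and the GFP/NUMA details are deliberately left out; it assumes order 0 is always allowed so the loop terminates):

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical userspace model of the order-splitting loop in
 * __iommu_dma_alloc_pages(): repeatedly take the largest power-of-two
 * chunk allowed by order_mask that still fits in the remaining count.
 * Assumes bit 0 of order_mask is set so the loop can always make progress. */

static unsigned int fls_u(unsigned int x)
{
	unsigned int r = 0;

	while (x >>= 1)
		r++;
	return r;	/* index of the highest set bit; x must be non-zero */
}

static int split_into_orders(unsigned int count, unsigned int order_mask,
			     unsigned int *orders)
{
	int n = 0;

	assert(order_mask & 1);	/* order 0 must be allowed */
	while (count) {
		/* only orders whose chunk fits in the remaining count */
		unsigned int mask = order_mask & ((2U << fls_u(count)) - 1);
		unsigned int order = fls_u(mask);
		unsigned int order_size = 1U << order;

		orders[n++] = order;
		count -= order_size;
	}
	return n;
}
```

Splitting 7 pages with an unrestricted mask yields chunks of order 2, 1 and 0 (4 + 2 + 1 pages), mirroring how the kernel loop keeps masking `order_mask` down as `count` shrinks; restricting the mask to order 0 degenerates into page-at-a-time allocation.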
Re: [PATCH 1/9] mm: Introduce new vm_insert_range API
> On Nov 21, 2018, at 5:35 AM, Matthew Wilcox wrote: > > It's probably better to be more explicit and answer Randy's question: > > * If we fail to insert any page into the vma, the function will return > * immediately leaving any previously-inserted pages present. Callers > * from the mmap handler may immediately return the error as their > * caller will destroy the vma, removing any successfully-inserted pages. > * Other callers should make their own arrangements for calling unmap_region(). That works for me as well.
Re: [PATCH 0/2] SWIOTLB fixes for 4.20
On Tue, Nov 20, 2018 at 11:34:41AM -0500, Konrad Rzeszutek Wilk wrote: > > Konrad, are you ok with me picking up both through the dma-mapping > > tree? > > Yes, albeit I would want Stefano to take a peek at patch #2 just in case. Stefano, can you take a look asap? This is a pretty trivial fix for a nasty bug that breaks boot on real life systems. I'd like to merge it by tomorrow so that I can send it off to Linus for the next rc (I will be offline for about two days after Friday morning)
Re: [PATCH 1/9] mm: Introduce new vm_insert_range API
Could you add a line to the description explicitly stating that a failure to insert any page in the range will fail the entire routine, something like: > * This allows drivers to insert range of kernel pages they've allocated > * into a user vma. This is a generic function which drivers can use > * rather than using their own way of mapping range of kernel pages into > * user vma. > * > * A failure to insert any page in the range will fail the call as a whole. It's obvious when reading the code, but it would be self-documenting to state it outright. Thanks! -- Bill
[PATCH] iommu: use memunmap to free memremap
memunmap() should be used to free the return of memremap(), not iounmap(). Fixes: dfddb969edf0 ("iommu/vt-d: Switch from ioremap_cache to memremap") Signed-off-by: Pan Bian --- drivers/iommu/intel-iommu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index f3ccf02..41a4b88 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -3075,7 +3075,7 @@ static int copy_context_table(struct intel_iommu *iommu, } if (old_ce) - iounmap(old_ce); + memunmap(old_ce); ret = 0; if (devfn < 0x80) -- 2.7.4
Re: [PATCH 1/4] PCI / ACPI: Identify external PCI devices
On Tue, Nov 20, 2018 at 10:43:35PM +0100, Rafael J. Wysocki wrote: > On Friday, November 16, 2018 11:57:38 AM CET Lorenzo Pieralisi wrote: > > On Thu, Nov 15, 2018 at 07:33:54PM +, mario.limoncie...@dell.com wrote: > > > > > > > > > > -Original Message- > > > > From: Mika Westerberg > > > > Sent: Thursday, November 15, 2018 1:01 PM > > > > To: Lorenzo Pieralisi > > > > Cc: Lukas Wunner; iommu@lists.linux-foundation.org; Joerg Roedel; David > > > > Woodhouse; Lu Baolu; Ashok Raj; Bjorn Helgaas; Rafael J. Wysocki; Jacob > > > > jun Pan; > > > > Andreas Noever; Michael Jamet; Yehezkel Bernat; Christian Kellner; > > > > Limonciello, > > > > Mario; Anthony Wong; linux-a...@vger.kernel.org; > > > > linux-...@vger.kernel.org; linux- > > > > ker...@vger.kernel.org > > > > Subject: Re: [PATCH 1/4] PCI / ACPI: Identify external PCI devices > > > > > > > > > > > > [EXTERNAL EMAIL] > > > > > > > > On Thu, Nov 15, 2018 at 05:46:08PM +, Lorenzo Pieralisi wrote: > > > > > Do you really need to parse it if the dev->is_thunderbolt check is > > > > > enough ? > > > > > > > > Yes, we need to parse it one way or another. dev->is_thunderbolt is > > > > based on heuristics which do not apply anymore when the thing gets > > > > integrated in the SoC. > > > > > > > > The _DSD is there already (on existing systems) and is being used by > > > > Windows so I don't understand why we cannot take advantage of it? Every > > > > new system with Thunderbolt ports will have it. > > > > We have different opinions on this, there is no point in me reiterating > > it over and over, I am against the approach taken to solve this problem > > first in defining the bindings outside the ACPI specifications and > > second by acquiescing to what has been done so that it will be done > > over and over again. 
> > Arguably, however, we are on the receiving end of things here and even if > we don't use this binding, that won't buy us anything (but it will introduce > a fair amount of disappointment among both users and OEMs). > > If Windows uses it, then systems will ship with it regardless of what Linux > does with it, so your "acquiescing to what has been done" argument leads to > nowhere in this particular case. It's just a matter of whether or not > Linux will provide the same level of functionality as Windows with respect > to this and IMO it would be impractical to refuse to do that for purely > formal reasons. > > > I will raise the point in the appropriate forum, it is up to Bjorn > > and Rafael to decide on this patch. > > For the record, my opinion is that there's a little choice on whether > or not to use this extra information that firmware is willing to give > us. It could be defined in a better way and so on, but since it is in > use anyway, we really have nothing to gain by refusing to use it. AFAIK PCI firmware bindings should go into PCI firmware specifications, not Microsoft webpages. If we deviate from this model there is no way to tell whether that extra information is right or wrong, it is not necessarily about this patch, it is about changing the way these bindings are deployed in future systems. > Now, where the handling of it belongs to is a separate matter that should be > decided on its own. I think that the way these bindings were deployed is wrong, I agree this is not the right forum to discuss that though. What you will do with this patch is not my call anyway, I just expressed my opinion. Thanks, Lorenzo
Re: [PATCH 1/9] mm: Introduce new vm_insert_range API
On Wed, Nov 21, 2018 at 04:19:11AM -0700, William Kucharski wrote: > Could you add a line to the description explicitly stating that a failure > to insert any page in the range will fail the entire routine, something > like: > > > * This allows drivers to insert range of kernel pages they've allocated > > * into a user vma. This is a generic function which drivers can use > > * rather than using their own way of mapping range of kernel pages into > > * user vma. > > * > > * A failure to insert any page in the range will fail the call as a whole. > > It's obvious when reading the code, but it would be self-documenting to > state it outright. It's probably better to be more explicit and answer Randy's question: * If we fail to insert any page into the vma, the function will return * immediately leaving any previously-inserted pages present. Callers * from the mmap handler may immediately return the error as their * caller will destroy the vma, removing any successfully-inserted pages. * Other callers should make their own arrangements for calling unmap_region(). Although unmap_region() is static so there clearly isn't any code in the kernel today other than in mmap handlers (or fault handlers) that needs to insert pages into a VMA.
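The behaviour being documented in this thread — the first failed insertion fails the whole call, but pages already inserted stay in place for the caller to tear down — can be modelled with a toy userspace sketch (hypothetical names, not the proposed kernel code; the "vma" is just an array of slots):

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the documented vm_insert_range() semantics: pages are
 * inserted one at a time; the first failure makes the whole call fail,
 * but previously-inserted pages remain -- an mmap handler can simply
 * return the error because its caller destroys the vma afterwards. */
#define ERR_NO_SPACE 22	/* stand-in errno value for this sketch */

static int insert_range_sketch(int *slots, int nslots,
			       const int *pages, int npages)
{
	int i;

	for (i = 0; i < npages; i++) {
		if (i >= nslots)
			return -ERR_NO_SPACE;	/* earlier slots stay filled */
		slots[i] = pages[i];
	}
	return 0;
}
```

The point of the doc-comment under discussion is exactly the state left behind on failure: callers outside an mmap handler would need to unmap the partially-populated range themselves.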
Re: [virtio-dev] Re: [PATCH v4 5/7] iommu: Add virtio-iommu driver
Hi Jean, On 11/20/18 6:30 PM, Jean-Philippe Brucker wrote: > On 16/11/2018 18:46, Jean-Philippe Brucker wrote: +/* + * __viommu_sync_req - Complete all in-flight requests + * + * Wait for all added requests to complete. When this function returns, all + * requests that were in-flight at the time of the call have completed. + */ +static int __viommu_sync_req(struct viommu_dev *viommu) +{ + int ret = 0; + unsigned int len; + size_t write_len; + struct viommu_request *req; + struct virtqueue *vq = viommu->vqs[VIOMMU_REQUEST_VQ]; + + assert_spin_locked(&viommu->request_lock); + + virtqueue_kick(vq); + + while (!list_empty(&viommu->requests)) { + len = 0; + req = virtqueue_get_buf(vq, &len); + if (!req) + continue; + + if (!len) + viommu_set_req_status(req->buf, req->len, +VIRTIO_IOMMU_S_IOERR); + + write_len = req->len - req->write_offset; + if (req->writeback && len >= write_len) >>> I don't get "len >= write_len". Is it valid for the device to write more >>> than write_len? If it writes less than write_len, the status is not set, >>> is it? > > Actually, len could well be three bytes smaller than write_len. The spec > doesn't require the device to write the three padding bytes in > virtio_iommu_req_tail, after the status byte. > > Here the driver just assumes that the device wrote the reserved field. > The QEMU device seems to write uninitialized data in there... Indeed that's incorrect and I should fix it. The tail struct should be properly initialized to 0. Only the probe request implementation is correct. > > Any objection to changing the spec to require the device to initialize > those bytes to zero? I think it makes things nicer overall and shouldn't > have any performance impact. No objection from me. > > [...] +static struct iommu_domain *viommu_domain_alloc(unsigned type) +{ + struct viommu_domain *vdomain; + + if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA) >>> smmu drivers also support the IDENTITY type. Don't we want to support it >>> as well? >> >> Eventually yes.
The IDENTITY domain is useful when an IOMMU has been >> forced upon you and gets in the way of performance, in which case you >> get rid of it with CONFIG_IOMMU_DEFAULT_PASSTHROUGH=y or >> iommu.passthrough=y. For virtio-iommu though, you could simply not >> instantiate the device. >> >> I don't think it's difficult to add: if the device supports >> VIRTIO_IOMMU_F_BYPASS, then we simply don't send an ATTACH request. >> Otherwise after ATTACH we send a MAP for the whole input range. If the >> change isn't bigger than this, I'll add it to v5. > > Not as straightforward as I hoped, when the device doesn't support > VIRTIO_IOMMU_F_BYPASS: > > * We can't simply create an ID map for the whole input range, we need to > carve out the resv_mem regions. > > * For a VFIO device, the host has no way of forwarding the identity > mapping. For example the guest will attempt to map [0; 0x] > -> [0; 0x], but VFIO only allows to map RAM and cannot take > such request. One solution is to only create ID mapping for RAM, and > register a notifier for memory hotplug, like intel-iommu does for IOMMUs > that don't support HW pass-through. > > Since it's not completely trivial and - I think - not urgent, I'll leave > this for later. OK makes sense to me too. It was just a heads-up. Thanks Eric > > Thanks, > Jean >
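The "ID map for the whole input range minus the resv_mem regions" computation Jean describes is a plain interval-subtraction problem. A userspace sketch (illustrative only — `id_map_windows` is a made-up name, not driver code; it assumes the reserved regions are sorted, non-overlapping, and contained in the input range):

```c
#include <assert.h>

/* Sketch of carving reserved regions out of an identity-mapped input
 * range: given [start, end] and a sorted list of reserved regions,
 * emit the windows that would receive identity MAP requests. */
struct region {
	unsigned long long start, end;	/* inclusive bounds */
};

static int id_map_windows(unsigned long long start, unsigned long long end,
			  const struct region *resv, int nresv,
			  struct region *out)
{
	int i, n = 0;

	for (i = 0; i < nresv; i++) {
		if (resv[i].start > start) {
			/* window before this reserved region */
			out[n].start = start;
			out[n].end = resv[i].start - 1;
			n++;
		}
		if (resv[i].end >= end)
			return n;	/* reserved region covers the tail */
		start = resv[i].end + 1;
	}
	/* trailing window after the last reserved region */
	out[n].start = start;
	out[n].end = end;
	return n + 1;
}
```

For an input range [0; 99] with reserved regions [10; 19] and [50; 59], this yields the three windows [0; 9], [20; 49] and [60; 99] — the per-window MAP requests an identity-mapping driver would have to issue.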
Re: [RFC PATCH 2/6] drivers core: Add I/O ASID allocator
On 12/11/2018 14:40, Joerg Roedel wrote: > Hi Jean-Philippe, > > On Fri, Oct 19, 2018 at 07:11:54PM +0100, Jean-Philippe Brucker wrote: >> The allocator doesn't really belong in drivers/iommu because some >> drivers would like to allocate PASIDs for devices that aren't managed by >> an IOMMU, using the same ID space as IOMMU. It doesn't really belong in >> drivers/pci either since platform device also support PASID. Add the >> allocator in drivers/base. > > I don't really buy this argument, in the end it is the IOMMU that > enforces the PASID mappings for a device. Why should a device not behind > an IOMMU need to allocate a pasid? Do you have examples of such devices > which also don't have their own iommu built-in? I misunderstood that requirement. Reading through the thread again (https://www.mail-archive.com/iommu@lists.linux-foundation.org/msg25640.html) it's more about a device using PASIDs as context IDs. Some contexts are not bound to processes but they still need their ID in the same PASID space as the contexts that are bound to process address spaces. So we can keep that code in drivers/iommu >> +int ioasid_for_each(struct ioasid_set *set, ioasid_iter_t func, void *data) >> +{ >> +int ret; >> +struct ioasid_iter_data iter_data = { >> +.set = set, >> +.func = func, >> +.data = data, >> +}; >> + >> +idr_lock(&ioasid_idr); >> +ret = idr_for_each(&ioasid_idr, ioasid_iter, &iter_data); >> +idr_unlock(&ioasid_idr); >> + >> +return ret; >> +} >> +EXPORT_SYMBOL_GPL(ioasid_for_each); > > What is the intended use-case for this function?
I'm using it in sva_bind(), to see if there already exists an io_mm associated to the mm_struct given as argument >> +void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid) >> +{ >> +void *priv = NULL; >> +struct ioasid_data *ioasid_data; >> + >> +idr_lock(&ioasid_idr); >> +ioasid_data = idr_find(&ioasid_idr, ioasid); >> +if (ioasid_data && ioasid_data->set == set) >> +priv = ioasid_data->private; >> +idr_unlock(&ioasid_idr); >> + >> +return priv; >> +} > > I think using the idr_lock here will become a performance problem, as we > need this function in the SVA page-fault path to retrieve the mm_struct > belonging to a PASID. > > The head comment in lib/idr.c states that it should be safe to use > rcu_read_lock() on read-only iterations. If so, this should be used here > so that concurrent lookups are possible. Agreed. I have a patch to use the RCU which looks simple enough, but since I haven't used RCU before I wasn't comfortable squashing it right away. Thanks, Jean diff --git a/drivers/base/ioasid.c b/drivers/base/ioasid.c index 5eb3abbf69ac..fa3b48c9c044 100644 --- a/drivers/base/ioasid.c +++ b/drivers/base/ioasid.c @@ -13,6 +13,7 @@ struct ioasid_data { int id; struct ioasid_set *set; void *private; + struct rcu_head rcu; }; static DEFINE_IDR(ioasid_idr); @@ -56,7 +57,8 @@ void ioasid_free(ioasid_t ioasid) ioasid_data = idr_remove(&ioasid_idr, ioasid); idr_unlock(&ioasid_idr); - kfree(ioasid_data); + if (ioasid_data) + kfree_rcu(ioasid_data, rcu); } EXPORT_SYMBOL_GPL(ioasid_free); @@ -95,9 +97,9 @@ int ioasid_for_each(struct ioasid_set *set, ioasid_iter_t func, void *data) .data = data, }; - idr_lock(&ioasid_idr); + rcu_read_lock(); ret = idr_for_each(&ioasid_idr, ioasid_iter, &iter_data); - idr_unlock(&ioasid_idr); + rcu_read_unlock(); return ret; } @@ -111,11 +113,11 @@ void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid) void *priv = NULL; struct ioasid_data *ioasid_data; - idr_lock(&ioasid_idr); + rcu_read_lock(); ioasid_data = idr_find(&ioasid_idr, ioasid); if (ioasid_data && ioasid_data->set == set) priv = ioasid_data->private; - idr_unlock(&ioasid_idr); + rcu_read_unlock(); return priv; }
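The key semantic of the allocator in this thread — all users (IOMMU-bound or not) draw IDs from one shared space, while ioasid_find() only hands back the private data to the owning set — can be shown with a hypothetical userspace simplification (made-up names, no IDR, no locking or RCU; not the kernel implementation):

```c
#include <stddef.h>

/* Simplified model of the ioasid API semantics: every set allocates
 * from the same shared ID space, but a lookup only returns the private
 * pointer when the caller passes the set the ID was allocated from. */
#define MAX_ID 64

struct id_entry {
	int used;
	const void *set;	/* opaque set identity */
	void *private_data;
};

static struct id_entry id_table[MAX_ID];

static int id_alloc(const void *set, void *priv)
{
	int id;

	for (id = 0; id < MAX_ID; id++) {
		if (!id_table[id].used) {
			id_table[id].used = 1;
			id_table[id].set = set;
			id_table[id].private_data = priv;
			return id;
		}
	}
	return -1;	/* ID space exhausted */
}

static void *id_find(const void *set, int id)
{
	if (id < 0 || id >= MAX_ID || !id_table[id].used)
		return NULL;
	/* set mismatch: the ID exists but belongs to someone else */
	return id_table[id].set == set ? id_table[id].private_data : NULL;
}
```

This is why a GPU driver using PASIDs as context IDs and an IOMMU driver binding PASIDs to process address spaces must share one allocator: two sets can never be handed the same ID, yet neither can dereference the other's private data.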
Re: [PATCH v4 6/8] vfio/mdev: Add iommu place holders in mdev_device
On Wed, Nov 21, 2018 at 02:22:08AM +0530, Kirti Wankhede wrote: > It is about how mdev framework can be used by existing drivers. These > symbols doesn't use any other exported symbols. That is an unfortunate accident of history, but doesn't extend to new ones. It also is another indicator those drivers probably are derived works of the Linux kernel and might be in legal trouble one way or another.