RE: [RFC PATCH 2/6] drivers core: Add I/O ASID allocator

2018-11-21 Thread Tian, Kevin
> From: Koenig, Christian
> Sent: Thursday, November 22, 2018 3:10 AM
> 
> Am 21.11.18 um 12:16 schrieb Jean-Philippe Brucker:
> > On 12/11/2018 14:40, Joerg Roedel wrote:
> >> Hi Jean-Philippe,
> >>
> >> On Fri, Oct 19, 2018 at 07:11:54PM +0100, Jean-Philippe Brucker wrote:
> >>> The allocator doesn't really belong in drivers/iommu because some
> >>> drivers would like to allocate PASIDs for devices that aren't managed by
> >>> an IOMMU, using the same ID space as IOMMU. It doesn't really belong in
> >>> drivers/pci either since platform device also support PASID. Add the
> >>> allocator in drivers/base.
> >> I don't really buy this argument, in the end it is the IOMMU that
> >> enforces the PASID mappings for a device. Why should a device not behind
> >> an IOMMU need to allocate a pasid? Do you have examples of such devices
> >> which also don't have their own iommu built-in?
> > I misunderstood that requirement. Reading through the thread again
> > (https://www.mail-archive.com/iommu@lists.linux-foundation.org/msg25640.html)
> > it's more about a device using PASIDs as context IDs. Some contexts are
> > not bound to processes but they still need their ID in the same PASID
> > space as the contexts that are bound to process address spaces. So we
> > can keep that code in drivers/iommu
> 
> The problem with that approach is that we also need this allocator when
> IOMMU is completely disabled.
> 
> In other words PASIDs are used as context IDs by the hardware for
> things like signaling which application has caused an interrupt/event
> even when they are not used by the IOMMU later on.
> 
> In addition to that, we have some MMUs which are internal to the devices
> (GPUVM on AMD GPUs for example; similar exists for NVidia) which use
> PASIDs for address translation internally in the device.
> 
> All of that is completely unrelated to IOMMU, but when IOMMU is enabled
> you need to use the same allocator because all use cases use the same ID
> space.
> 

IMO the context ID is a resource used internally by the device, while the
PASID is something informational to the external world (as defined in the
PCI bus packet). Ideally the two are separate in hardware but can be
associated (e.g. PASID as an optional parameter per context). That is what
I know for Intel GPUs.

For your device, is "PASIDs are used as context IDs" a software
implementation policy or an actual hardware requirement (i.e. no dedicated
PASID recording)?

Thanks
Kevin
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 0/3] iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables

2018-11-21 Thread Nicolas Boichat
On Thu, Nov 22, 2018 at 10:36 AM Matthew Wilcox  wrote:
>
> On Wed, Nov 21, 2018 at 10:26:26PM +, Robin Murphy wrote:
> > These are IOMMU page tables, rather than CPU ones, so we're already well
> > outside arch code - indeed the original motivation of io-pgtable was to be
> > entirely independent of the p*d types and arch-specific MM code (this Armv7
> > short-descriptor format is already "non-native" when used by drivers in an
> > arm64 kernel).
>
> There was quite a lot of explanation missing from this patch description!

I totally agree ;-) I'm not familiar at all with either iommu or
mm/... Looks like the patchset triggered a helpful discussion, and I
understand the problem better now. I'll improve the description in the
next revision.

> > There are various efficiency reasons for using regular kernel memory instead
> > of coherent DMA allocations - for the most part it works well, we just have
> > the odd corner case like this one where the 32-bit format gets used on
> > 64-bit systems such that the tables themselves still need to be allocated
> > below 4GB (although the final output address can point at higher memory by
> > virtue of the IOMMU in question not implementing permissions and repurposing
> > some of those PTE fields as extra address bits).
> >
> > TBH, if this DMA32 stuff is going to be contentious we could possibly just
> > rip out the offending kmem_cache - it seemed like good practice for the
> > use-case, but provided kzalloc(SZ_1K, gfp | GFP_DMA32) can be relied upon to
> > give the same 1KB alignment and chance of succeeding as the equivalent
> > kmem_cache_alloc(), then we could quite easily make do with that instead.
>
> I think you should look at using the page_frag allocator here.  You can
> use whatever GFP_DMA flags you like.

I'll try that.

Thanks!


Re: [PATCH v2 07/17] debugobjects: Move printk out of db lock critical sections

2018-11-21 Thread Sergey Senozhatsky
On (11/21/18 11:49), Waiman Long wrote:
[..]
> > case ODEBUG_STATE_ACTIVE:
> > -   debug_print_object(obj, "init");
> > state = obj->state;
> > raw_spin_unlock_irqrestore(&obj->lock, flags);
> > +   debug_print_object(obj, "init");
> > debug_object_fixup(descr->fixup_init, addr, state);
> > return;
> >  
> > case ODEBUG_STATE_DESTROYED:
> > -   debug_print_object(obj, "init");
> > +   debug_printobj = true;
> > break;
> > default:
> > break;
> > }
> >  
> > raw_spin_unlock_irqrestore(&obj->lock, flags);
> > +   if (debug_chkstack)
> > +   debug_object_is_on_stack(addr, onstack);
> > +   if (debug_printobj)
> > +   debug_print_object(obj, "init");
> >
[..]
>
> As a side note, one of the test systems that I used generated a
> debugobjects splat in the bootup process and the system hung
> afterward. Applying this patch alone fixed the hanging problem and the
> system booted up successfully. So it is not really a good idea to call
> printk() while holding a raw spinlock.

Right, I like this patch.
And I think that we, maybe, can go even further.

Some serial consoles call mod_timer(). So what we could have with the
debug objects enabled was

mod_timer()
 lock_timer_base()
  debug_activate()
   printk()
call_console_drivers()
 foo_console()
  mod_timer()
   lock_timer_base()   << deadlock

That's one possible scenario. The other one can involve console's
IRQ handler, uart port spinlock, mod_timer, debug objects, printk,
and an eventual deadlock on the uart port spinlock. This one can
be mitigated with printk_safe. But mod_timer() deadlock will require
a different fix.

So maybe we need to switch debug objects print-outs to _always_
printk_deferred(). Debug objects can be used in code which cannot
do direct printk() - timekeeping is just one example.

-ss


Re: [PATCH v2 0/3] iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables

2018-11-21 Thread Matthew Wilcox
On Wed, Nov 21, 2018 at 10:26:26PM +, Robin Murphy wrote:
> These are IOMMU page tables, rather than CPU ones, so we're already well
> outside arch code - indeed the original motivation of io-pgtable was to be
> entirely independent of the p*d types and arch-specific MM code (this Armv7
> short-descriptor format is already "non-native" when used by drivers in an
> arm64 kernel).

There was quite a lot of explanation missing from this patch description!

> There are various efficiency reasons for using regular kernel memory instead
> of coherent DMA allocations - for the most part it works well, we just have
> the odd corner case like this one where the 32-bit format gets used on
> 64-bit systems such that the tables themselves still need to be allocated
> below 4GB (although the final output address can point at higher memory by
> virtue of the IOMMU in question not implementing permissions and repurposing
> some of those PTE fields as extra address bits).
> 
> TBH, if this DMA32 stuff is going to be contentious we could possibly just
> rip out the offending kmem_cache - it seemed like good practice for the
> use-case, but provided kzalloc(SZ_1K, gfp | GFP_DMA32) can be relied upon to
> give the same 1KB alignment and chance of succeeding as the equivalent
> kmem_cache_alloc(), then we could quite easily make do with that instead.

I think you should look at using the page_frag allocator here.  You can
use whatever GFP_DMA flags you like.


Re: [PATCH v2 3/3] iommu/io-pgtable-arm-v7s: Request DMA32 memory, and improve debugging

2018-11-21 Thread Nicolas Boichat
On Thu, Nov 22, 2018 at 2:02 AM Michal Hocko  wrote:
>
> On Wed 21-11-18 16:46:38, Will Deacon wrote:
> > On Sun, Nov 11, 2018 at 05:03:41PM +0800, Nicolas Boichat wrote:
> > > For level 1/2 pages, ensure GFP_DMA32 is used if CONFIG_ZONE_DMA32
> > > is defined (e.g. on arm64 platforms).
> > >
> > > For level 2 pages, allocate a slab cache in SLAB_CACHE_DMA32.
> > >
> > > Also, print an error when the physical address does not fit in
> > > 32-bit, to make debugging easier in the future.
> > >
> > > Fixes: ad67f5a6545f ("arm64: replace ZONE_DMA with ZONE_DMA32")
> > > Signed-off-by: Nicolas Boichat 
> > > ---
> > >
> > > Changes since v1:
> > >  - Changed approach to use SLAB_CACHE_DMA32 added by the previous
> > >commit.
> > >  - Use DMA or DMA32 depending on the architecture (DMA for arm,
> > >DMA32 for arm64).
> > >
> > > drivers/iommu/io-pgtable-arm-v7s.c | 20 
> > >  1 file changed, 16 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/iommu/io-pgtable-arm-v7s.c 
> > > b/drivers/iommu/io-pgtable-arm-v7s.c
> > > index 445c3bde04800c..996f7b6d00b44a 100644
> > > --- a/drivers/iommu/io-pgtable-arm-v7s.c
> > > +++ b/drivers/iommu/io-pgtable-arm-v7s.c
> > > @@ -161,6 +161,14 @@
> > >
> > >  #define ARM_V7S_TCR_PD1BIT(5)
> > >
> > > +#ifdef CONFIG_ZONE_DMA32
> > > +#define ARM_V7S_TABLE_GFP_DMA GFP_DMA32
> > > +#define ARM_V7S_TABLE_SLAB_CACHE SLAB_CACHE_DMA32
> > > +#else
> > > +#define ARM_V7S_TABLE_GFP_DMA GFP_DMA
> > > +#define ARM_V7S_TABLE_SLAB_CACHE SLAB_CACHE_DMA
> > > +#endif
> >
> > It's a bit grotty that GFP_DMA32 doesn't just map to GFP_DMA on 32-bit
> > architectures, since then we wouldn't need this #ifdeffery afaict.
>
> But GFP_DMA32 should map to GFP_KERNEL on 32b, no? Or what exactly is
> going on in here?

GFP_DMA32 will fail due to check_slab_flags (aka GFP_SLAB_BUG_MASK
before patch 1/3 of this series)... But yes, it may be neater if there
was transparent remapping of GFP_DMA32/SLAB_CACHE_DMA32 to
GFP_DMA/SLAB_CACHE_DMA on 32-bit arch...

> --
> Michal Hocko
> SUSE Labs


Re: [PATCH v2 0/3] iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables

2018-11-21 Thread Nicolas Boichat
On Thu, Nov 22, 2018 at 6:27 AM Robin Murphy  wrote:
>
> On 2018-11-21 9:38 pm, Matthew Wilcox wrote:
> > On Wed, Nov 21, 2018 at 06:20:02PM +, Christopher Lameter wrote:
> >> On Sun, 11 Nov 2018, Nicolas Boichat wrote:
> >>
> >>> This is a follow-up to the discussion in [1], to make sure that the page
> >>> tables allocated by iommu/io-pgtable-arm-v7s are contained within 32-bit
> >>> physical address space.
> >>
> >> Page tables? This means you need a page frame? Why go through the slab
> >> allocators?
> >
> > Because this particular architecture has sub-page-size PMD page tables.
> > We desperately need to hoist page table allocation out of the architectures;
> > there're a bunch of different implementations and they're mostly bad,
> > one way or another.
>
> These are IOMMU page tables, rather than CPU ones, so we're already well
> outside arch code - indeed the original motivation of io-pgtable was to
> be entirely independent of the p*d types and arch-specific MM code (this
> Armv7 short-descriptor format is already "non-native" when used by
> drivers in an arm64 kernel).
>
> There are various efficiency reasons for using regular kernel memory
> instead of coherent DMA allocations - for the most part it works well,
> we just have the odd corner case like this one where the 32-bit format
> gets used on 64-bit systems such that the tables themselves still need
> to be allocated below 4GB (although the final output address can point
> at higher memory by virtue of the IOMMU in question not implementing
> permissions and repurposing some of those PTE fields as extra address bits).
>
> TBH, if this DMA32 stuff is going to be contentious we could possibly
> just rip out the offending kmem_cache - it seemed like good practice for
> the use-case, but provided kzalloc(SZ_1K, gfp | GFP_DMA32) can be relied
> upon to give the same 1KB alignment and chance of succeeding as the
> equivalent kmem_cache_alloc(), then we could quite easily make do with
> that instead.

Yes, but if we want to use kzalloc, we'll need to create
kmalloc_caches for DMA32, which seems wasteful as there are no other
users (see my comment here:
https://patchwork.kernel.org/patch/10677525/#22332697).

Thanks,

> Thanks,
> Robin.
>
> > For each level of page table we generally have three cases:
> >
> > 1. single page
> > 2. sub-page, naturally aligned
> > 3. multiple pages, naturally aligned
> >
> > for 1 and 3, the page allocator will do just fine.
> > for 2, we should have a per-MM page_frag allocator.  s390 already has
> > something like this, although it's more complicated.  ppc also has
> > something a little more complex for the cases when it's configured with
> > a 64k page size but wants to use a 4k page table entry.
> >
> > I'd like x86 to be able to simply do:
> >
> > #define pte_alloc_one(mm, addr)   page_alloc_table(mm, addr, 0)
> > #define pmd_alloc_one(mm, addr)   page_alloc_table(mm, addr, 0)
> > #define pud_alloc_one(mm, addr)   page_alloc_table(mm, addr, 0)
> > #define p4d_alloc_one(mm, addr)   page_alloc_table(mm, addr, 0)
> >
> > An architecture with 4k page size and needing a 16k PMD would do:
> >
> > #define pmd_alloc_one(mm, addr) page_alloc_table(mm, addr, 2)
> >
> > while an architecture with a 64k page size needing a 4k PTE would do:
> >
> > #define ARCH_PAGE_TABLE_FRAG
> > #define pte_alloc_one(mm, addr) pagefrag_alloc_table(mm, addr, 4096)
> >
> > I haven't had time to work on this, but perhaps someone with a problem
> > that needs fixing would like to, instead of burying yet another awful
> > implementation away in arch/ somewhere.
> >


Re: [PATCH v2 2/3] mm: Add support for SLAB_CACHE_DMA32

2018-11-21 Thread Nicolas Boichat
On Thu, Nov 22, 2018 at 2:32 AM Christopher Lameter  wrote:
>
> On Sun, 11 Nov 2018, Nicolas Boichat wrote:
>
> > SLAB_CACHE_DMA32 is only available after explicit kmem_cache_create calls,
> > no default cache is created for kmalloc. Add a test in check_slab_flags
> > for this.
>
> This does not define the dma32 kmalloc array. Is that intentional?

Yes that's intentional, AFAICT there is no user, so there is no point
creating the cache.

 (okay, I could find one, but it's probably broken:
git grep GFP_DMA32 | grep k[a-z]*alloc
drivers/media/platform/vivid/vivid-osd.c: dev->video_vbase =
kzalloc(dev->video_buffer_size, GFP_KERNEL | GFP_DMA32);
).

> In that
> case you need to fail any request for GFP_DMA32 coming in via kmalloc.

Well, we do check for these in check_slab_flags (aka GFP_SLAB_BUG_MASK
before patch 1/3 of this series), so, with or without this patch,
calls with GFP_DMA32 will end up failing in check_slab_flags.


[PATCH] iommu/vt-d: Handle domain agaw being less than iommu agaw

2018-11-21 Thread Sohil Mehta
The Intel IOMMU driver opportunistically skips a few top level page
tables from the domain paging directory while programming the IOMMU
context entry. However there is an implicit assumption in the code that
domain's adjusted guest address width (agaw) would always be greater
than IOMMU's agaw.

The IOMMU capabilities in an upcoming platform cause the domain's agaw
to be lower than IOMMU's agaw. The issue is seen when the IOMMU supports
both 4-level and 5-level paging. The domain builds a 4-level page table
based on agaw of 2. However the IOMMU's agaw is set as 3 (5-level). In
this case the code incorrectly tries to skip page table levels.
This causes the IOMMU driver to avoid programming the context entry. The
fix handles this case and programs the context entry accordingly.

Fixes: de24e55395698 ("iommu/vt-d: Simplify domain_context_mapping_one")
Cc: 
Cc: Ashok Raj 
Cc: Jacob Pan 
Cc: Lu Baolu 
Reviewed-by: Lu Baolu 
Reported-by: Ramos Falcon, Ernesto R 
Tested-by: Ricardo Neri 
Signed-off-by: Sohil Mehta 
---
 drivers/iommu/intel-iommu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index f3ccf025108b..fdf79baf1d79 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2044,7 +2044,7 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
 * than default.  Unnecessary for PT mode.
 */
if (translation != CONTEXT_TT_PASS_THROUGH) {
-   for (agaw = domain->agaw; agaw != iommu->agaw; agaw--) {
+   for (agaw = domain->agaw; agaw > iommu->agaw; agaw--) {
ret = -ENOMEM;
pgd = phys_to_virt(dma_pte_addr(pgd));
if (!dma_pte_present(pgd))
@@ -2058,7 +2058,7 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
translation = CONTEXT_TT_MULTI_LEVEL;
 
context_set_address_root(context, virt_to_phys(pgd));
-   context_set_address_width(context, iommu->agaw);
+   context_set_address_width(context, agaw);
} else {
/*
 * In pass through mode, AW must be programmed to
-- 
2.19.0



Re: [PATCH v2 0/3] iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables

2018-11-21 Thread Robin Murphy

On 2018-11-21 9:38 pm, Matthew Wilcox wrote:
> On Wed, Nov 21, 2018 at 06:20:02PM +, Christopher Lameter wrote:
>> On Sun, 11 Nov 2018, Nicolas Boichat wrote:
>>
>>> This is a follow-up to the discussion in [1], to make sure that the page
>>> tables allocated by iommu/io-pgtable-arm-v7s are contained within 32-bit
>>> physical address space.
>>
>> Page tables? This means you need a page frame? Why go through the slab
>> allocators?
>
> Because this particular architecture has sub-page-size PMD page tables.
> We desperately need to hoist page table allocation out of the architectures;
> there're a bunch of different implementations and they're mostly bad,
> one way or another.

These are IOMMU page tables, rather than CPU ones, so we're already well
outside arch code - indeed the original motivation of io-pgtable was to
be entirely independent of the p*d types and arch-specific MM code (this
Armv7 short-descriptor format is already "non-native" when used by
drivers in an arm64 kernel).

There are various efficiency reasons for using regular kernel memory
instead of coherent DMA allocations - for the most part it works well,
we just have the odd corner case like this one where the 32-bit format
gets used on 64-bit systems such that the tables themselves still need
to be allocated below 4GB (although the final output address can point
at higher memory by virtue of the IOMMU in question not implementing
permissions and repurposing some of those PTE fields as extra address bits).

TBH, if this DMA32 stuff is going to be contentious we could possibly
just rip out the offending kmem_cache - it seemed like good practice for
the use-case, but provided kzalloc(SZ_1K, gfp | GFP_DMA32) can be relied
upon to give the same 1KB alignment and chance of succeeding as the
equivalent kmem_cache_alloc(), then we could quite easily make do with
that instead.

Thanks,
Robin.

> For each level of page table we generally have three cases:
>
> 1. single page
> 2. sub-page, naturally aligned
> 3. multiple pages, naturally aligned
>
> for 1 and 3, the page allocator will do just fine.
> for 2, we should have a per-MM page_frag allocator.  s390 already has
> something like this, although it's more complicated.  ppc also has
> something a little more complex for the cases when it's configured with
> a 64k page size but wants to use a 4k page table entry.
>
> I'd like x86 to be able to simply do:
>
> #define pte_alloc_one(mm, addr) page_alloc_table(mm, addr, 0)
> #define pmd_alloc_one(mm, addr) page_alloc_table(mm, addr, 0)
> #define pud_alloc_one(mm, addr) page_alloc_table(mm, addr, 0)
> #define p4d_alloc_one(mm, addr) page_alloc_table(mm, addr, 0)
>
> An architecture with 4k page size and needing a 16k PMD would do:
>
> #define pmd_alloc_one(mm, addr) page_alloc_table(mm, addr, 2)
>
> while an architecture with a 64k page size needing a 4k PTE would do:
>
> #define ARCH_PAGE_TABLE_FRAG
> #define pte_alloc_one(mm, addr) pagefrag_alloc_table(mm, addr, 4096)
>
> I haven't had time to work on this, but perhaps someone with a problem
> that needs fixing would like to, instead of burying yet another awful
> implementation away in arch/ somewhere.




Re: [PATCH v2 0/3] iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables

2018-11-21 Thread Matthew Wilcox
On Wed, Nov 21, 2018 at 06:20:02PM +, Christopher Lameter wrote:
> On Sun, 11 Nov 2018, Nicolas Boichat wrote:
> 
> > This is a follow-up to the discussion in [1], to make sure that the page
> > tables allocated by iommu/io-pgtable-arm-v7s are contained within 32-bit
> > physical address space.
> 
> Page tables? This means you need a page frame? Why go through the slab
> allocators?

Because this particular architecture has sub-page-size PMD page tables.
We desperately need to hoist page table allocation out of the architectures;
there're a bunch of different implementations and they're mostly bad,
one way or another.

For each level of page table we generally have three cases:

1. single page
2. sub-page, naturally aligned
3. multiple pages, naturally aligned

for 1 and 3, the page allocator will do just fine.
for 2, we should have a per-MM page_frag allocator.  s390 already has
something like this, although it's more complicated.  ppc also has
something a little more complex for the cases when it's configured with
a 64k page size but wants to use a 4k page table entry.

I'd like x86 to be able to simply do:

#define pte_alloc_one(mm, addr) page_alloc_table(mm, addr, 0)
#define pmd_alloc_one(mm, addr) page_alloc_table(mm, addr, 0)
#define pud_alloc_one(mm, addr) page_alloc_table(mm, addr, 0)
#define p4d_alloc_one(mm, addr) page_alloc_table(mm, addr, 0)

An architecture with 4k page size and needing a 16k PMD would do:

#define pmd_alloc_one(mm, addr) page_alloc_table(mm, addr, 2)

while an architecture with a 64k page size needing a 4k PTE would do:

#define ARCH_PAGE_TABLE_FRAG
#define pte_alloc_one(mm, addr) pagefrag_alloc_table(mm, addr, 4096)

I haven't had time to work on this, but perhaps someone with a problem
that needs fixing would like to, instead of burying yet another awful
implementation away in arch/ somewhere.


Re: [PATCH v2 2/3] mm: Add support for SLAB_CACHE_DMA32

2018-11-21 Thread Christopher Lameter
On Sun, 11 Nov 2018, Nicolas Boichat wrote:

> SLAB_CACHE_DMA32 is only available after explicit kmem_cache_create calls,
> no default cache is created for kmalloc. Add a test in check_slab_flags
> for this.

This does not define the dma32 kmalloc array. Is that intentional? In that
case you need to fail any request for GFP_DMA32 coming in via kmalloc.


Re: [PATCH v2 3/3] iommu/io-pgtable-arm-v7s: Request DMA32 memory, and improve debugging

2018-11-21 Thread Christopher Lameter
On Wed, 21 Nov 2018, Will Deacon wrote:

> > +#define ARM_V7S_TABLE_SLAB_CACHE SLAB_CACHE_DMA32

SLAB_CACHE_DMA32??? WTH is going on here? We are trying to get rid of
the dma slab array.



Re: [PATCH v2 0/3] iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables

2018-11-21 Thread Christopher Lameter
On Sun, 11 Nov 2018, Nicolas Boichat wrote:

> This is a follow-up to the discussion in [1], to make sure that the page
> tables allocated by iommu/io-pgtable-arm-v7s are contained within 32-bit
> physical address space.

Page tables? This means you need a page frame? Why go through the slab
allocators?


Re: [PATCH 0/9] Use vm_insert_range

2018-11-21 Thread Boris Ostrovsky
On 11/21/18 2:56 PM, Souptick Joarder wrote:
> On Thu, Nov 22, 2018 at 1:08 AM Boris Ostrovsky
>  wrote:
>> On 11/21/18 1:24 AM, Souptick Joarder wrote:
>>> On Thu, Nov 15, 2018 at 9:09 PM Souptick Joarder  
>>> wrote:
>>>> Previously drivers had their own way of mapping a range of
>>>> kernel pages/memory into user vma and this was done by
>>>> invoking vm_insert_page() within a loop.
>>>>
>>>> As this pattern is common across different drivers, it can
>>>> be generalized by creating a new function and using it across
>>>> the drivers.
>>>>
>>>> vm_insert_range is the new API which will be used to map a
>>>> range of kernel memory/pages to user vma.
>>>>
>>>> All the applicable places are converted to use new vm_insert_range
>>>> in this patch series.
>>>>
>>>> Souptick Joarder (9):
>>>>   mm: Introduce new vm_insert_range API
>>>>   arch/arm/mm/dma-mapping.c: Convert to use vm_insert_range
>>>>   drivers/firewire/core-iso.c: Convert to use vm_insert_range
>>>>   drm/rockchip/rockchip_drm_gem.c: Convert to use vm_insert_range
>>>>   drm/xen/xen_drm_front_gem.c: Convert to use vm_insert_range
>>>>   iommu/dma-iommu.c: Convert to use vm_insert_range
>>>>   videobuf2/videobuf2-dma-sg.c: Convert to use vm_insert_range
>>>>   xen/gntdev.c: Convert to use vm_insert_range
>>>>   xen/privcmd-buf.c: Convert to use vm_insert_range
>>> Any further comment on driver changes ?
>> Xen drivers (the last two patches) look fine to me.
> Thanks, can I consider this as a Reviewed-by?


Reviewed-by: Boris Ostrovsky 


Re: [PATCH 0/9] Use vm_insert_range

2018-11-21 Thread Souptick Joarder
On Thu, Nov 22, 2018 at 1:08 AM Boris Ostrovsky
 wrote:
>
> On 11/21/18 1:24 AM, Souptick Joarder wrote:
> > On Thu, Nov 15, 2018 at 9:09 PM Souptick Joarder  
> > wrote:
> >> Previously drivers had their own way of mapping a range of
> >> kernel pages/memory into user vma and this was done by
> >> invoking vm_insert_page() within a loop.
> >>
> >> As this pattern is common across different drivers, it can
> >> be generalized by creating a new function and using it across
> >> the drivers.
> >>
> >> vm_insert_range is the new API which will be used to map a
> >> range of kernel memory/pages to user vma.
> >>
> >> All the applicable places are converted to use new vm_insert_range
> >> in this patch series.
> >>
> >> Souptick Joarder (9):
> >>   mm: Introduce new vm_insert_range API
> >>   arch/arm/mm/dma-mapping.c: Convert to use vm_insert_range
> >>   drivers/firewire/core-iso.c: Convert to use vm_insert_range
> >>   drm/rockchip/rockchip_drm_gem.c: Convert to use vm_insert_range
> >>   drm/xen/xen_drm_front_gem.c: Convert to use vm_insert_range
> >>   iommu/dma-iommu.c: Convert to use vm_insert_range
> >>   videobuf2/videobuf2-dma-sg.c: Convert to use vm_insert_range
> >>   xen/gntdev.c: Convert to use vm_insert_range
> >>   xen/privcmd-buf.c: Convert to use vm_insert_range
> > Any further comment on driver changes ?
>
> Xen drivers (the last two patches) look fine to me.

Thanks, can I consider this as a Reviewed-by?

>
> -boris
>
>
> >>  arch/arm/mm/dma-mapping.c | 21 ++---
> >>  drivers/firewire/core-iso.c   | 15 ++--
> >>  drivers/gpu/drm/rockchip/rockchip_drm_gem.c   | 20 ++--
> >>  drivers/gpu/drm/xen/xen_drm_front_gem.c   | 20 +---
> >>  drivers/iommu/dma-iommu.c | 12 ++
> >>  drivers/media/common/videobuf2/videobuf2-dma-sg.c | 23 ++-
> >>  drivers/xen/gntdev.c  | 11 -
> >>  drivers/xen/privcmd-buf.c |  8 ++-
> >>  include/linux/mm_types.h  |  3 +++
> >>  mm/memory.c   | 28 +++
> >>  mm/nommu.c|  7 ++
> >>  11 files changed, 70 insertions(+), 98 deletions(-)
> >>
> >> --
> >> 1.9.1
> >>
>


Re: [PATCH 0/9] Use vm_insert_range

2018-11-21 Thread Boris Ostrovsky
On 11/21/18 1:24 AM, Souptick Joarder wrote:
> On Thu, Nov 15, 2018 at 9:09 PM Souptick Joarder  wrote:
>> Previously drivers had their own way of mapping a range of
>> kernel pages/memory into user vma and this was done by
>> invoking vm_insert_page() within a loop.
>>
>> As this pattern is common across different drivers, it can
>> be generalized by creating a new function and using it across
>> the drivers.
>>
>> vm_insert_range is the new API which will be used to map a
>> range of kernel memory/pages to user vma.
>>
>> All the applicable places are converted to use new vm_insert_range
>> in this patch series.
>>
>> Souptick Joarder (9):
>>   mm: Introduce new vm_insert_range API
>>   arch/arm/mm/dma-mapping.c: Convert to use vm_insert_range
>>   drivers/firewire/core-iso.c: Convert to use vm_insert_range
>>   drm/rockchip/rockchip_drm_gem.c: Convert to use vm_insert_range
>>   drm/xen/xen_drm_front_gem.c: Convert to use vm_insert_range
>>   iommu/dma-iommu.c: Convert to use vm_insert_range
>>   videobuf2/videobuf2-dma-sg.c: Convert to use vm_insert_range
>>   xen/gntdev.c: Convert to use vm_insert_range
>>   xen/privcmd-buf.c: Convert to use vm_insert_range
> Any further comment on driver changes ?

Xen drivers (the last two patches) look fine to me.

-boris


>>  arch/arm/mm/dma-mapping.c | 21 ++---
>>  drivers/firewire/core-iso.c   | 15 ++--
>>  drivers/gpu/drm/rockchip/rockchip_drm_gem.c   | 20 ++--
>>  drivers/gpu/drm/xen/xen_drm_front_gem.c   | 20 +---
>>  drivers/iommu/dma-iommu.c | 12 ++
>>  drivers/media/common/videobuf2/videobuf2-dma-sg.c | 23 ++-
>>  drivers/xen/gntdev.c  | 11 -
>>  drivers/xen/privcmd-buf.c |  8 ++-
>>  include/linux/mm_types.h  |  3 +++
>>  mm/memory.c   | 28 +++
>>  mm/nommu.c|  7 ++
>>  11 files changed, 70 insertions(+), 98 deletions(-)
>>
>> --
>> 1.9.1
>>



Re: [RFC PATCH 2/6] drivers core: Add I/O ASID allocator

2018-11-21 Thread Koenig, Christian
Am 21.11.18 um 12:16 schrieb Jean-Philippe Brucker:
> On 12/11/2018 14:40, Joerg Roedel wrote:
>> Hi Jean-Philippe,
>>
>> On Fri, Oct 19, 2018 at 07:11:54PM +0100, Jean-Philippe Brucker wrote:
>>> The allocator doesn't really belong in drivers/iommu because some
>>> drivers would like to allocate PASIDs for devices that aren't managed by
>>> an IOMMU, using the same ID space as IOMMU. It doesn't really belong in
>>> drivers/pci either since platform device also support PASID. Add the
>>> allocator in drivers/base.
>> I don't really buy this argument, in the end it is the IOMMU that
>> enforces the PASID mappings for a device. Why should a device not behind
>> an IOMMU need to allocate a pasid? Do you have examples of such devices
>> which also don't have their own iommu built-in?
> I misunderstood that requirement. Reading through the thread again
> (https://www.mail-archive.com/iommu@lists.linux-foundation.org/msg25640.html)
> it's more about a device using PASIDs as context IDs. Some contexts are
> not bound to processes but they still need their ID in the same PASID
> space as the contexts that are bound to process address spaces. So we
> can keep that code in drivers/iommu

The problem with that approach is that we also need this allocator when 
IOMMU is completely disabled.

In other words PASIDs are used as context IDs by the hardware for 
things like signaling which application has caused an interrupt/event 
even when they are not used by IOMMU later on.

In addition to that, we have some MMUs which are internal to the devices 
(GPUVM on AMD GPUs for example; similar exists for NVidia) which use 
PASIDs for address translation internally in the device.

All of that is completely unrelated to IOMMU, but when IOMMU is enabled 
you need to use the same allocator because all use cases use the same ID 
space.

Regards,
Christian.

>
>>> +int ioasid_for_each(struct ioasid_set *set, ioasid_iter_t func, void *data)
>>> +{
>>> +   int ret;
>>> +   struct ioasid_iter_data iter_data = {
>>> +   .set  = set,
>>> +   .func = func,
>>> +   .data = data,
>>> +   };
>>> +
>>> +   idr_lock(&ioasid_idr);
>>> +   ret = idr_for_each(&ioasid_idr, ioasid_iter, &iter_data);
>>> +   idr_unlock(&ioasid_idr);
>>> +
>>> +   return ret;
>>> +}
>>> +EXPORT_SYMBOL_GPL(ioasid_for_each);
>> What is the intended use-case for this function?
> I'm using it in sva_bind(), to see if there already exists an io_mm
> associated to the mm_struct given as argument
>
>>> +void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid)
>>> +{
>>> +   void *priv = NULL;
>>> +   struct ioasid_data *ioasid_data;
>>> +
>>> +   idr_lock(&ioasid_idr);
>>> +   ioasid_data = idr_find(&ioasid_idr, ioasid);
>>> +   if (ioasid_data && ioasid_data->set == set)
>>> +   priv = ioasid_data->private;
>>> +   idr_unlock(&ioasid_idr);
>>> +
>>> +   return priv;
>>> +}
>> I think using the idr_lock here will become a performance problem, as we
>> need this function in the SVA page-fault path to retrieve the mm_struct
>> belonging to a PASID.
>>
>> The head comment in lib/idr.c states that it should be safe to use
>> rcu_readlock() on read-only iterations. If so, this should be used here
>> so that concurrent lookups are possible.
> Agreed. I have a patch to use the RCU which looks simple enough, but
> since I haven't used RCU before I wasn't comfortable squashing it right
> away.
>
> Thanks,
> Jean
>
> diff --git a/drivers/base/ioasid.c b/drivers/base/ioasid.c
> index 5eb3abbf69ac..fa3b48c9c044 100644
> --- a/drivers/base/ioasid.c
> +++ b/drivers/base/ioasid.c
> @@ -13,6 +13,7 @@ struct ioasid_data {
>   int id;
>   struct ioasid_set *set;
>   void *private;
> + struct rcu_head rcu;
>   };
>
>   static DEFINE_IDR(ioasid_idr);
> @@ -56,7 +57,8 @@ void ioasid_free(ioasid_t ioasid)
>   ioasid_data = idr_remove(&ioasid_idr, ioasid);
>   idr_unlock(&ioasid_idr);
>
> - kfree(ioasid_data);
> + if (ioasid_data)
> + kfree_rcu(ioasid_data, rcu);
>   }
>   EXPORT_SYMBOL_GPL(ioasid_free);
>
> @@ -95,9 +97,9 @@ int ioasid_for_each(struct ioasid_set *set,
> ioasid_iter_t func, void *data)
>   .data   = data,
>   };
>
> - idr_lock(&ioasid_idr);
> + rcu_read_lock();
>   ret = idr_for_each(&ioasid_idr, ioasid_iter, &iter_data);
> - idr_unlock(&ioasid_idr);
> + rcu_read_unlock();
>
>   return ret;
>   }
> @@ -111,11 +113,11 @@ void *ioasid_find(struct ioasid_set *set, ioasid_t
> ioasid)
>   void *priv = NULL;
>   struct ioasid_data *ioasid_data;
>
> - idr_lock(&ioasid_idr);
> + rcu_read_lock();
>   ioasid_data = idr_find(&ioasid_idr, ioasid);
>   if (ioasid_data && ioasid_data->set == set)
>   priv = ioasid_data->private;
> - idr_unlock(&ioasid_idr);
> + rcu_read_unlock();
>
>   return priv;
>   }
>



Re: [RFC PATCH 0/6] Auxiliary IOMMU domains and Arm SMMUv3

2018-11-21 Thread Jean-Philippe Brucker
On 12/11/2018 14:55, j...@8bytes.org wrote:
> Hi Jean-Philippe,
> 
> On Thu, Nov 08, 2018 at 06:29:42PM +, Jean-Philippe Brucker wrote:
>> (1) My initial approach would have been to use the same page tables for
>> the default_domain and this new domain, but it might be precisely what
>> you were trying to avoid with this new proposal: a device attached to
>> two domains at the same time.
> 
> My request was about the initial assumptions a device driver can make
> about a device. This assumptions is that DMA-API is set up and
> initialized for it, for everything else (like SVA) the device driver
> needs to take special action, like allocating an SVA domain and attaching
> the device to it.
> 
> This should of course not break any IRQ setups for the device, and also
> enforcing an ordering is not a good and maintainable solution.
> 
> So what you could do here is to either:
> 
>   1) Save the needed IRQ mappings in some extra datastructure and
>  duplicate it in the SVA domain. This also makes it easier to
>  use the same SVA domain with multiple devices.
> 
>   2) Just re-use the 'page-tables' from the default domain in the
>  sva-domain. This needs to happen at attach-time, because at
>  allocation time you don't know the device yet.
> 
> I think 1) is the best solution, what do you think?

Right now 2) seems more feasible, because changes will be limited to the
IOMMU driver and don't spill in the dma-iommu layer. Maybe it will be
clearer to me once I write a prototype.

> Btw, things would be different if we could expose SVA through the
> DMA-API to drivers. In this case we could just make the default domain
> of type SVA and be done, but we are not there yet.

For the moment though, I think we should allow device drivers to use the
DMA-API at the same time as SVA. If a device driver has to map a
management ring buffer for example, it's much nicer to use dma_alloc as
usual rather than have them write their own DMA allocation routines.
Which is another reason to implement 2) above: the DMA-API would still
act on the default_domain, and attaching an "extended" domain augments
it with SVA features. Detaching from the device doesn't require copying
mappings back to the default domain. Does that sound OK?

Then if vfio-pci wants to use SVA, maybe we can make the UNMANAGED
domain support SVA by default? Otherwise introduce a UNMANAGED+ext
domain. Anyway, that's for later.

This is what I currently plan to add:

struct io_mm *io_mm;
struct iommu_domain *domain;

domain = iommu_alloc_ext_domain(bus);

/* Set an mm-exit callback to disable PASIDs. More attributes
   could be added later to change the capabilities of the ext
   domain */
iommu_domain_set_attr(domain, DOMAIN_ATTR_MM_EXIT_CB, _exit);

/* Fails if the device doesn't support this domain type */
iommu_attach_device(domain, dev);

/* Same as SVA v3, except a domain instead of dev as argument */
io_mm = iommu_sva_bind_mm(domain, current->mm, ...);

/* on success, install the PASID in the device */
pasid = io_mm->pasid;

/* Do more work */

iommu_sva_unbind_mm(io_mm);
iommu_detach_device(domain, dev);


Thanks,
Jean


Re: [PATCH v2 3/3] iommu/io-pgtable-arm-v7s: Request DMA32 memory, and improve debugging

2018-11-21 Thread Michal Hocko
On Wed 21-11-18 16:46:38, Will Deacon wrote:
> On Sun, Nov 11, 2018 at 05:03:41PM +0800, Nicolas Boichat wrote:
> > For level 1/2 pages, ensure GFP_DMA32 is used if CONFIG_ZONE_DMA32
> > is defined (e.g. on arm64 platforms).
> > 
> > For level 2 pages, allocate a slab cache in SLAB_CACHE_DMA32.
> > 
> > Also, print an error when the physical address does not fit in
> > 32-bit, to make debugging easier in the future.
> > 
> > Fixes: ad67f5a6545f ("arm64: replace ZONE_DMA with ZONE_DMA32")
> > Signed-off-by: Nicolas Boichat 
> > ---
> > 
> > Changes since v1:
> >  - Changed approach to use SLAB_CACHE_DMA32 added by the previous
> >commit.
> >  - Use DMA or DMA32 depending on the architecture (DMA for arm,
> >DMA32 for arm64).
> > 
> > drivers/iommu/io-pgtable-arm-v7s.c | 20 
> >  1 file changed, 16 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/iommu/io-pgtable-arm-v7s.c 
> > b/drivers/iommu/io-pgtable-arm-v7s.c
> > index 445c3bde04800c..996f7b6d00b44a 100644
> > --- a/drivers/iommu/io-pgtable-arm-v7s.c
> > +++ b/drivers/iommu/io-pgtable-arm-v7s.c
> > @@ -161,6 +161,14 @@
> >  
>>  #define ARM_V7S_TCR_PD1  BIT(5)
> >  
> > +#ifdef CONFIG_ZONE_DMA32
> > +#define ARM_V7S_TABLE_GFP_DMA GFP_DMA32
> > +#define ARM_V7S_TABLE_SLAB_CACHE SLAB_CACHE_DMA32
> > +#else
> > +#define ARM_V7S_TABLE_GFP_DMA GFP_DMA
> > +#define ARM_V7S_TABLE_SLAB_CACHE SLAB_CACHE_DMA
> > +#endif
> 
> It's a bit grotty that GFP_DMA32 doesn't just map to GFP_DMA on 32-bit
> architectures, since then we wouldn't need this #ifdeffery afaict.

But GFP_DMA32 should map to GFP_KERNEL on 32b, no? Or what exactly is
going on in here?

-- 
Michal Hocko
SUSE Labs


Re: [PATCH v2 0/2] SWIOTLB fixes for 4.20

2018-11-21 Thread Christoph Hellwig
Thanks,

applied to the dma-mapping for-linus tree.

I'll wait a little longer for a review from Stefano, but I really want
it in linux-next asap and hopefully off to Linus for the weekend.


Re: [PATCHv4 2/2] iommu/arm-smmu: Get clock config from device tree

2018-11-21 Thread Will Deacon
Hi Thor,

On Fri, Oct 05, 2018 at 03:57:00PM -0500, thor.tha...@linux.intel.com wrote:
> From: Thor Thayer 
> 
> Currently the clocks are specified in a structure as well as
> in the device tree. Since all the information about clocks can be
> pulled from the device tree, parse the device tree for the clock
> information if specified.
> 
> This patch is dependent upon the following patch series that
> adds clocks to the driver.
> "[v16,0/5] iommu/arm-smmu: Add runtime pm/sleep support"
> https://patchwork.kernel.org/cover/10581891/
> 
> particularly this patch which adds the clocks:
> "[v16,1/5] iommu/arm-smmu: Add pm_runtime/sleep ops"
> https://patchwork.kernel.org/patch/10581899/
> 
> Request testing on different platforms for verification.
> This patch was tested on a Stratix10 SOCFPGA.
> 
> Signed-off-by: Thor Thayer 
> ---
> v4  Change dependency on pending patch series that adds clocks.
> Add code for parsing device tree for the clocks.
> v3  Change dependency on device tree bulk clock patches.
> v2  Add structure for SOCFPGA and add dependency on clock patch.
> ---
>  drivers/iommu/arm-smmu.c | 30 +++---
>  1 file changed, 27 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index d315ca637097..3dd10663b09c 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -44,6 +44,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -2139,6 +2140,7 @@ static int arm_smmu_device_dt_probe(struct 
> platform_device *pdev,
>   const struct arm_smmu_match_data *data;
>   struct device *dev = &pdev->dev;
>   bool legacy_binding;
> + const char **parent_names;
>  
>   if (of_property_read_u32(dev->of_node, "#global-interrupts",
>   &smmu->num_global_irqs)) {
> @@ -2149,9 +2151,25 @@ static int arm_smmu_device_dt_probe(struct 
> platform_device *pdev,
>   data = of_device_get_match_data(dev);
>   smmu->version = data->version;
>   smmu->model = data->model;
> - smmu->num_clks = data->num_clks;
> -
> - arm_smmu_fill_clk_data(smmu, data->clks);
> + smmu->num_clks = of_clk_get_parent_count(dev->of_node);
> + /* check to see if clocks were specified in DT */
> + if (smmu->num_clks) {
> + unsigned int i;
> +
> + parent_names = kmalloc_array(smmu->num_clks,
> +  sizeof(*parent_names),
> +  GFP_KERNEL);
> + if (!parent_names)
> + goto fail_clk_name;
> +
> + for (i = 0; i < smmu->num_clks; i++) {
> + if (of_property_read_string_index(dev->of_node,
> +   "clock-names", i,
> +   &parent_names[i]))
> + goto fail_clk_name;
> + }
> + arm_smmu_fill_clk_data(smmu, parent_names);
> + }
>  
>   parse_driver_options(smmu);
>  
> @@ -2171,6 +2189,12 @@ static int arm_smmu_device_dt_probe(struct 
> platform_device *pdev,
>   smmu->features |= ARM_SMMU_FEAT_COHERENT_WALK;
>  
>   return 0;
> +
> +fail_clk_name:
> + kfree(parent_names);
> + /* clock-names required for clocks in devm_clk_bulk_get() */
> + dev_err(dev, "Error: clock-name required in device tree\n");

I think you can drop the "Error: " prefix here, but the rest of the patch
looks sensible to me. We just need to work out how this co-exists with
Vivek's approach of using match_data (see the other thread).

Will


Re: [PATCH v2 3/3] iommu/io-pgtable-arm-v7s: Request DMA32 memory, and improve debugging

2018-11-21 Thread Robin Murphy

On 21/11/2018 17:38, Christopher Lameter wrote:

On Wed, 21 Nov 2018, Will Deacon wrote:


+#define ARM_V7S_TABLE_SLAB_CACHE SLAB_CACHE_DMA32


SLAB_CACHE_DMA32??? WTH is going on here? We are trying to get rid of
the dma slab array.


See the previous two patches in this series. If there's already a 
(better) way to have a kmem_cache which allocates its backing pages with 
GFP_DMA32, please do let us know.


Robin.


Re: [RESEND PATCH v17 5/5] iommu/arm-smmu: Add support for qcom,smmu-v2 variant

2018-11-21 Thread Will Deacon
[+Thor]

On Fri, Nov 16, 2018 at 04:54:30PM +0530, Vivek Gautam wrote:
> qcom,smmu-v2 is an arm,smmu-v2 implementation with specific
> clock and power requirements.
> On msm8996, multiple cores, viz. mdss, video, etc. use this
> smmu. On sdm845, this smmu is used with gpu.
> Add bindings for the same.
> 
> Signed-off-by: Vivek Gautam 
> Reviewed-by: Rob Herring 
> Reviewed-by: Tomasz Figa 
> Tested-by: Srinivas Kandagatla 
> Reviewed-by: Robin Murphy 
> ---
>  drivers/iommu/arm-smmu.c | 13 +
>  1 file changed, 13 insertions(+)
> 
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index 2098c3141f5f..d315ca637097 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -120,6 +120,7 @@ enum arm_smmu_implementation {
>   GENERIC_SMMU,
>   ARM_MMU500,
>   CAVIUM_SMMUV2,
> + QCOM_SMMUV2,
>  };
>  
>  struct arm_smmu_s2cr {
> @@ -2026,6 +2027,17 @@ ARM_SMMU_MATCH_DATA(arm_mmu401, ARM_SMMU_V1_64K, 
> GENERIC_SMMU);
>  ARM_SMMU_MATCH_DATA(arm_mmu500, ARM_SMMU_V2, ARM_MMU500);
>  ARM_SMMU_MATCH_DATA(cavium_smmuv2, ARM_SMMU_V2, CAVIUM_SMMUV2);
>  
> +static const char * const qcom_smmuv2_clks[] = {
> + "bus", "iface",
> +};
> +
> +static const struct arm_smmu_match_data qcom_smmuv2 = {
> + .version = ARM_SMMU_V2,
> + .model = QCOM_SMMUV2,
> + .clks = qcom_smmuv2_clks,
> + .num_clks = ARRAY_SIZE(qcom_smmuv2_clks),
> +};

These seems redundant if we go down the route proposed by Thor, where we
just pull all of the clocks out of the device-tree. In which case, why
do we need this match_data at all?

Will


Re: [RESEND PATCH v17 2/5] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device

2018-11-21 Thread Will Deacon
On Fri, Nov 16, 2018 at 04:54:27PM +0530, Vivek Gautam wrote:
> From: Sricharan R 
> 
> The smmu device probe/remove and add/remove master device callbacks
> gets called when the smmu is not linked to its master, that is without
> the context of the master device. So calling runtime apis in those places
> separately.
> Global locks are also initialized before enabling runtime pm as the
> runtime_resume() calls device_reset() which does tlb_sync_global()
> that ultimately requires locks to be initialized.
> 
> Signed-off-by: Sricharan R 
> [vivek: Cleanup pm runtime calls]
> Signed-off-by: Vivek Gautam 
> Reviewed-by: Tomasz Figa 
> Tested-by: Srinivas Kandagatla 
> Reviewed-by: Robin Murphy 
> ---
>  drivers/iommu/arm-smmu.c | 101 
> ++-
>  1 file changed, 91 insertions(+), 10 deletions(-)

Given that you're doing the get/put in the TLBI ops unconditionally:

>  static void arm_smmu_flush_iotlb_all(struct iommu_domain *domain)
>  {
>   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> + struct arm_smmu_device *smmu = smmu_domain->smmu;
>  
> - if (smmu_domain->tlb_ops)
> + if (smmu_domain->tlb_ops) {
> + arm_smmu_rpm_get(smmu);
>   smmu_domain->tlb_ops->tlb_flush_all(smmu_domain);
> + arm_smmu_rpm_put(smmu);
> + }
>  }
>  
>  static void arm_smmu_iotlb_sync(struct iommu_domain *domain)
>  {
>   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> + struct arm_smmu_device *smmu = smmu_domain->smmu;
>  
> - if (smmu_domain->tlb_ops)
> + if (smmu_domain->tlb_ops) {
> + arm_smmu_rpm_get(smmu);
>   smmu_domain->tlb_ops->tlb_sync(smmu_domain);
> + arm_smmu_rpm_put(smmu);
> + }

Why do you need them around the map/unmap calls as well?

Will


Re: [PATCH v3] iommu/dma: Use NUMA aware memory allocations in __iommu_dma_alloc_pages()

2018-11-21 Thread Will Deacon
On Wed, Nov 21, 2018 at 04:47:48PM +, John Garry wrote:
> On 21/11/2018 16:07, Will Deacon wrote:
> >On Wed, Nov 21, 2018 at 10:54:10PM +0800, John Garry wrote:
> >>From: Ganapatrao Kulkarni 
> >>
> >>Change function __iommu_dma_alloc_pages() to allocate pages for DMA from
> >>respective device NUMA node. The ternary operator which would be for
> >>alloc_pages_node() is tidied along with this.
> >>
> >>We also include a change to use kvzalloc() for kzalloc()/vzalloc()
> >>combination.
> >>
> >>Signed-off-by: Ganapatrao Kulkarni 
> >>[JPG: Added kvzalloc(), drop pages ** being device local, tidied ternary 
> >>operator]
> >>Signed-off-by: John Garry 
> >
> >Weird, you're missing a diffstat here.
> >
> >Anyway, the patch looks fine to me, but it would be nice if you could
> >justify the change with some numbers. Do you actually see an improvement
> >from this change?
> >
> 
> Hi Will,
> 
> Ah, I missed adding my comments explaining the motivation. It would be
> better in the commit log. Anyway, here's the snippet:
> 
> " ... as mentioned in [3], dma_alloc_coherent() uses the locality
> information from the device - as in direct DMA - so this patch is just
> applying this same policy.
> 
> [3]
> https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1692998.html"

Yes, please add to this to the commit log.

> I did have some numbers to show improvement in some scenarios when I tested
> this a while back which I'll dig out.
> 
> However I would say that some scenarios will improve and the opposite for
> others with this change, considering different conditions in which DMA
> memory may be used.

Well, if you can show that it's useful in some cases and not catastrophic in
others, then I think shooting for parity with direct DMA is a reasonable
justification for the change.

Will


Re: [PATCH v2 07/17] debugobjects: Move printk out of db lock critical sections

2018-11-21 Thread Waiman Long
On 11/19/2018 01:55 PM, Waiman Long wrote:
> The db->lock is a raw spinlock and so the lock hold time is supposed
> to be short. This will not be the case when printk() is being involved
> in some of the critical sections. In order to avoid the long hold time,
> in case some messages need to be printed, the debug_object_is_on_stack()
> and debug_print_object() calls are now moved out of those critical
> sections.
>
> Signed-off-by: Waiman Long 
> ---
>  lib/debugobjects.c | 61 
> +-
>  1 file changed, 42 insertions(+), 19 deletions(-)
>
> diff --git a/lib/debugobjects.c b/lib/debugobjects.c
> index 403dd95..4216d3d 100644
> --- a/lib/debugobjects.c
> +++ b/lib/debugobjects.c
> @@ -376,6 +376,8 @@ static void debug_object_is_on_stack(void *addr, int 
> onstack)
>   struct debug_bucket *db;
>   struct debug_obj *obj;
>   unsigned long flags;
> + bool debug_printobj = false;
> + bool debug_chkstack = false;
>  
>   fill_pool();
>  
> @@ -392,7 +394,7 @@ static void debug_object_is_on_stack(void *addr, int 
> onstack)
>   debug_objects_oom();
>   return;
>   }
> - debug_object_is_on_stack(addr, onstack);
> + debug_chkstack = true;
>   }
>  
>   switch (obj->state) {
> @@ -403,20 +405,25 @@ static void debug_object_is_on_stack(void *addr, int 
> onstack)
>   break;
>  
>   case ODEBUG_STATE_ACTIVE:
> - debug_print_object(obj, "init");
>   state = obj->state;
>   raw_spin_unlock_irqrestore(&db->lock, flags);
> + debug_print_object(obj, "init");
>   debug_object_fixup(descr->fixup_init, addr, state);
>   return;
>  
>   case ODEBUG_STATE_DESTROYED:
> - debug_print_object(obj, "init");
> + debug_printobj = true;
>   break;
>   default:
>   break;
>   }
>  
>   raw_spin_unlock_irqrestore(&db->lock, flags);
> + if (debug_chkstack)
> + debug_object_is_on_stack(addr, onstack);
> + if (debug_printobj)
> + debug_print_object(obj, "init");
> +
>  }
>  
>  /**
> @@ -474,6 +481,8 @@ int debug_object_activate(void *addr, struct 
> debug_obj_descr *descr)
>  
>   obj = lookup_object(addr, db);
>   if (obj) {
> + bool debug_printobj = false;
> +
>   switch (obj->state) {
>   case ODEBUG_STATE_INIT:
>   case ODEBUG_STATE_INACTIVE:
> @@ -482,14 +491,14 @@ int debug_object_activate(void *addr, struct 
> debug_obj_descr *descr)
>   break;
>  
>   case ODEBUG_STATE_ACTIVE:
> - debug_print_object(obj, "activate");
>   state = obj->state;
>   raw_spin_unlock_irqrestore(&db->lock, flags);
> + debug_print_object(obj, "activate");
>   ret = debug_object_fixup(descr->fixup_activate, addr, 
> state);
>   return ret ? 0 : -EINVAL;
>  
>   case ODEBUG_STATE_DESTROYED:
> - debug_print_object(obj, "activate");
> + debug_printobj = true;
>   ret = -EINVAL;
>   break;
>   default:
> @@ -497,10 +506,13 @@ int debug_object_activate(void *addr, struct 
> debug_obj_descr *descr)
>   break;
>   }
>   raw_spin_unlock_irqrestore(&db->lock, flags);
> + if (debug_printobj)
> + debug_print_object(obj, "activate");
>   return ret;
>   }
>  
>   raw_spin_unlock_irqrestore(&db->lock, flags);
> +
>   /*
>* We are here when a static object is activated. We
>* let the type specific code confirm whether this is
> @@ -532,6 +544,7 @@ void debug_object_deactivate(void *addr, struct 
> debug_obj_descr *descr)
>   struct debug_bucket *db;
>   struct debug_obj *obj;
>   unsigned long flags;
> + bool debug_printobj = false;
>  
>   if (!debug_objects_enabled)
>   return;
> @@ -549,24 +562,27 @@ void debug_object_deactivate(void *addr, struct 
> debug_obj_descr *descr)
>   if (!obj->astate)
>   obj->state = ODEBUG_STATE_INACTIVE;
>   else
> - debug_print_object(obj, "deactivate");
> + debug_printobj = true;
>   break;
>  
>   case ODEBUG_STATE_DESTROYED:
> - debug_print_object(obj, "deactivate");
> + debug_printobj = true;
>   break;
>   default:
>   break;
>   }
> - } else {
> + }
> +
> + raw_spin_unlock_irqrestore(&db->lock, flags);
> + if (!obj) {
>   struct debug_obj o = { .object = addr,
>  

Re: [PATCH v3] iommu/dma: Use NUMA aware memory allocations in __iommu_dma_alloc_pages()

2018-11-21 Thread John Garry

On 21/11/2018 16:07, Will Deacon wrote:

On Wed, Nov 21, 2018 at 10:54:10PM +0800, John Garry wrote:

From: Ganapatrao Kulkarni 

Change function __iommu_dma_alloc_pages() to allocate pages for DMA from
respective device NUMA node. The ternary operator which would be for
alloc_pages_node() is tidied along with this.

We also include a change to use kvzalloc() for kzalloc()/vzalloc()
combination.

Signed-off-by: Ganapatrao Kulkarni 
[JPG: Added kvzalloc(), drop pages ** being device local, tidied ternary 
operator]
Signed-off-by: John Garry 


Weird, you're missing a diffstat here.

Anyway, the patch looks fine to me, but it would be nice if you could
justify the change with some numbers. Do you actually see an improvement
from this change?



Hi Will,

Ah, I missed adding my comments explaining the motivation. It would be 
better in the commit log. Anyway, here's the snippet:


" ... as mentioned in [3], dma_alloc_coherent() uses the locality
information from the device - as in direct DMA - so this patch is just
applying this same policy.

[3] 
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1692998.html"


I did have some numbers to show improvement in some scenarios when I 
tested this a while back which I'll dig out.


However I would say that some scenarios will improve and the opposite 
for others with this change, considering different conditions in which 
DMA memory may be used.


Cheers,
John


Will

.






Re: [PATCH v2 3/3] iommu/io-pgtable-arm-v7s: Request DMA32 memory, and improve debugging

2018-11-21 Thread Will Deacon
On Sun, Nov 11, 2018 at 05:03:41PM +0800, Nicolas Boichat wrote:
> For level 1/2 pages, ensure GFP_DMA32 is used if CONFIG_ZONE_DMA32
> is defined (e.g. on arm64 platforms).
> 
> For level 2 pages, allocate a slab cache in SLAB_CACHE_DMA32.
> 
> Also, print an error when the physical address does not fit in
> 32-bit, to make debugging easier in the future.
> 
> Fixes: ad67f5a6545f ("arm64: replace ZONE_DMA with ZONE_DMA32")
> Signed-off-by: Nicolas Boichat 
> ---
> 
> Changes since v1:
>  - Changed approach to use SLAB_CACHE_DMA32 added by the previous
>commit.
>  - Use DMA or DMA32 depending on the architecture (DMA for arm,
>DMA32 for arm64).
> 
> drivers/iommu/io-pgtable-arm-v7s.c | 20 
>  1 file changed, 16 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/iommu/io-pgtable-arm-v7s.c 
> b/drivers/iommu/io-pgtable-arm-v7s.c
> index 445c3bde04800c..996f7b6d00b44a 100644
> --- a/drivers/iommu/io-pgtable-arm-v7s.c
> +++ b/drivers/iommu/io-pgtable-arm-v7s.c
> @@ -161,6 +161,14 @@
>  
>  #define ARM_V7S_TCR_PD1  BIT(5)
>  
> +#ifdef CONFIG_ZONE_DMA32
> +#define ARM_V7S_TABLE_GFP_DMA GFP_DMA32
> +#define ARM_V7S_TABLE_SLAB_CACHE SLAB_CACHE_DMA32
> +#else
> +#define ARM_V7S_TABLE_GFP_DMA GFP_DMA
> +#define ARM_V7S_TABLE_SLAB_CACHE SLAB_CACHE_DMA
> +#endif

It's a bit grotty that GFP_DMA32 doesn't just map to GFP_DMA on 32-bit
architectures, since then we wouldn't need this #ifdeffery afaict.

Will


Re: [PATCH v4 0/2] Enable smmu support on sdm845

2018-11-21 Thread Vivek Gautam

Hi Will,


On 11/21/2018 9:22 PM, Will Deacon wrote:

Hi Vivek,

On Thu, Oct 11, 2018 at 03:19:28PM +0530, Vivek Gautam wrote:

This series enables apps-smmu, the "arm,mmu-500" instance
on sdm845.
Series tested on SDM845 MTP device with related smmu patch series [1],
and necessary config change, besides one hack to keep LDO14 in LPM mode
to boot things up (see the commit in the test branch).
The tested branch is available at [2].

[...]


  .../devicetree/bindings/iommu/arm,smmu.txt |  4 ++
  arch/arm64/boot/dts/qcom/sdm845.dtsi   | 72 ++
  2 files changed, 76 insertions(+)

Given that this doesn't touch any of the driver code, please take this via
Andy and arm-soc.


Yea, once the driver changes are pulled in your tree, I can ask Andy to 
pick these.

Thanks

Best regards
Vivek


Will




Re: [PATCH v3] iommu/dma: Use NUMA aware memory allocations in __iommu_dma_alloc_pages()

2018-11-21 Thread Will Deacon
On Wed, Nov 21, 2018 at 10:54:10PM +0800, John Garry wrote:
> From: Ganapatrao Kulkarni 
> 
> Change function __iommu_dma_alloc_pages() to allocate pages for DMA from
> respective device NUMA node. The ternary operator which would be for
> alloc_pages_node() is tidied along with this.
> 
> We also include a change to use kvzalloc() for kzalloc()/vzalloc()
> combination.
> 
> Signed-off-by: Ganapatrao Kulkarni 
> [JPG: Added kvzalloc(), drop pages ** being device local, tidied ternary 
> operator]
> Signed-off-by: John Garry 

Weird, you're missing a diffstat here.

Anyway, the patch looks fine to me, but it would be nice if you could
justify the change with some numbers. Do you actually see an improvement
from this change?

Will


[PATCH v2 1/2] dma-direct: Make DIRECT_MAPPING_ERROR viable for SWIOTLB

2018-11-21 Thread Robin Murphy
With the overflow buffer removed, we no longer have a unique address
which is guaranteed not to be a valid DMA target to use as an error
token. The DIRECT_MAPPING_ERROR value of 0 tries to at least represent
an unlikely DMA target, but unfortunately there are already SWIOTLB
users with DMA-able memory at physical address 0 which now gets falsely
treated as a mapping failure and leads to all manner of misbehaviour.

The best we can do to mitigate that is flip DIRECT_MAPPING_ERROR to the
other commonly-used error value of all-bits-set, since the last single
byte of memory is by far the least-likely-valid DMA target.

Fixes: dff8d6c1ed58 ("swiotlb: remove the overflow buffer")
Reported-by: John Stultz 
Tested-by: John Stultz 
Acked-by: Konrad Rzeszutek Wilk 
Reviewed-by: Christoph Hellwig 
Signed-off-by: Robin Murphy 
---

v2: Add parentheses, tweak commit message, collect tags

 include/linux/dma-direct.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index bd73e7a91410..9e66bfe369aa 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -5,7 +5,7 @@
 #include 
 #include 
 
-#define DIRECT_MAPPING_ERROR   0
+#define DIRECT_MAPPING_ERROR   (~(dma_addr_t)0)
 
 #ifdef CONFIG_ARCH_HAS_PHYS_TO_DMA
 #include 
-- 
2.19.1.dirty



[PATCH v2 0/2] SWIOTLB fixes for 4.20

2018-11-21 Thread Robin Murphy
Here's a quick v2 addressing Christoph's nits and collecting the tags
given so far. I've reproduced my USB problem with a commit prior to the
dma-mapping pull, so it looks like the cache maintenance changes are off
the hook for that one (and I'll have to venture into USB/UAS territory).

No objection from me if Konrad wants to pick this up instead depending
on timing.

Robin.


Robin Murphy (2):
  dma-direct: Make DIRECT_MAPPING_ERROR viable for SWIOTLB
  swiotlb: Skip cache maintenance on map error

 include/linux/dma-direct.h | 2 +-
 kernel/dma/swiotlb.c   | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

-- 
2.19.1.dirty



[PATCH v2 2/2] swiotlb: Skip cache maintenance on map error

2018-11-21 Thread Robin Murphy
If swiotlb_bounce_page() failed, calling arch_sync_dma_for_device() may
lead to such delights as performing cache maintenance on whatever
address phys_to_virt(SWIOTLB_MAP_ERROR) looks like, which is typically
outside the kernel memory map and goes about as well as expected.

Don't do that.

Fixes: a4a4330db46a ("swiotlb: add support for non-coherent DMA")
Tested-by: John Stultz 
Reviewed-by: Christoph Hellwig 
Signed-off-by: Robin Murphy 
---

v2: Collect tags

 kernel/dma/swiotlb.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 5731daa09a32..045930e32c0e 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -679,7 +679,8 @@ dma_addr_t swiotlb_map_page(struct device *dev, struct page 
*page,
}
 
if (!dev_is_dma_coherent(dev) &&
-   (attrs & DMA_ATTR_SKIP_CPU_SYNC) == 0)
+   (attrs & DMA_ATTR_SKIP_CPU_SYNC) == 0 &&
+   dev_addr != DIRECT_MAPPING_ERROR)
arch_sync_dma_for_device(dev, phys, size, dir);
 
return dev_addr;
-- 
2.19.1.dirty



Re: [PATCH v4 0/2] Enable smmu support on sdm845

2018-11-21 Thread Will Deacon
Hi Vivek,

On Thu, Oct 11, 2018 at 03:19:28PM +0530, Vivek Gautam wrote:
> This series enables apps-smmu, the "arm,mmu-500" instance
> on sdm845.
> Series tested on SDM845 MTP device with related smmu patch series [1],
> and necessary config change, besides one hack to keep LDO14 in LPM mode
> to boot things up (see the commit in the test branch).
> The tested branch is available at [2].

[...]

>  .../devicetree/bindings/iommu/arm,smmu.txt |  4 ++
>  arch/arm64/boot/dts/qcom/sdm845.dtsi   | 72 ++
>  2 files changed, 76 insertions(+)

Given that this doesn't touch any of the driver code, please take this via
Andy and arm-soc.

Will


Re: [PATCH 0/2] SWIOTLB fixes for 4.20

2018-11-21 Thread Konrad Rzeszutek Wilk
On Wed, Nov 21, 2018 at 02:03:31PM +0100, Christoph Hellwig wrote:
> On Tue, Nov 20, 2018 at 11:34:41AM -0500, Konrad Rzeszutek Wilk wrote:
> > > Konrad, are you ok with me picking up both through the dma-mapping
> > > tree?
> > 
> > Yes, albeit I would want Stefano to take a peek at patch #2 just in case.
> 
> Stefano, can you take a look asap?  This is a pretty trivial fix for
> a nasty bug that breaks boot on real life systems.  I'd like to merge
> it by tomorrow so that I can send it off to Linus for the next rc
> (I will be offline for about two days after Friday morning)

It is Turkey Day in the US so he may be busy catching those pesky turkeys.
How about Tuesday? And I can also send the GIT PULL.


[PATCH v3] iommu/dma: Use NUMA aware memory allocations in __iommu_dma_alloc_pages()

2018-11-21 Thread John Garry
From: Ganapatrao Kulkarni 

Change function __iommu_dma_alloc_pages() to allocate pages for DMA from
the respective device's NUMA node. The ternary operator feeding the gfp
flags to alloc_pages_node() is tidied up along with this.

We also include a change to use kvzalloc() in place of the
kzalloc()/vzalloc() combination.

Signed-off-by: Ganapatrao Kulkarni 
[JPG: Added kvzalloc(), drop pages ** being device local, tidied ternary operator]
Signed-off-by: John Garry 

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index d1b0475..4afb1a8 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -449,20 +449,17 @@ static void __iommu_dma_free_pages(struct page **pages, int count)
kvfree(pages);
 }
 
-static struct page **__iommu_dma_alloc_pages(unsigned int count,
-   unsigned long order_mask, gfp_t gfp)
+static struct page **__iommu_dma_alloc_pages(struct device *dev,
+   unsigned int count, unsigned long order_mask, gfp_t gfp)
 {
struct page **pages;
-   unsigned int i = 0, array_size = count * sizeof(*pages);
+   unsigned int i = 0, nid = dev_to_node(dev);
 
order_mask &= (2U << MAX_ORDER) - 1;
if (!order_mask)
return NULL;
 
-   if (array_size <= PAGE_SIZE)
-   pages = kzalloc(array_size, GFP_KERNEL);
-   else
-   pages = vzalloc(array_size);
+   pages = kvzalloc(count * sizeof(*pages), GFP_KERNEL);
if (!pages)
return NULL;
 
@@ -481,10 +478,12 @@ static struct page **__iommu_dma_alloc_pages(unsigned int count,
for (order_mask &= (2U << __fls(count)) - 1;
 order_mask; order_mask &= ~order_size) {
unsigned int order = __fls(order_mask);
+   gfp_t alloc_flags = gfp;
 
order_size = 1U << order;
-   page = alloc_pages((order_mask - order_size) ?
-  gfp | __GFP_NORETRY : gfp, order);
+   if (order_mask > order_size)
+   alloc_flags |= __GFP_NORETRY;
+   page = alloc_pages_node(nid, alloc_flags, order);
if (!page)
continue;
if (!order)
@@ -569,7 +568,8 @@ struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t gfp,
alloc_sizes = min_size;
 
count = PAGE_ALIGN(size) >> PAGE_SHIFT;
-   pages = __iommu_dma_alloc_pages(count, alloc_sizes >> PAGE_SHIFT, gfp);
+   pages = __iommu_dma_alloc_pages(dev, count, alloc_sizes >> PAGE_SHIFT,
+   gfp);
if (!pages)
return NULL;
 
-- 
1.9.1



Re: [PATCH 1/9] mm: Introduce new vm_insert_range API

2018-11-21 Thread William Kucharski



> On Nov 21, 2018, at 5:35 AM, Matthew Wilcox  wrote:
> 
> It's probably better to be more explicit and answer Randy's question:
> 
> * If we fail to insert any page into the vma, the function will return
> * immediately leaving any previously-inserted pages present.  Callers
> * from the mmap handler may immediately return the error as their
> * caller will destroy the vma, removing any successfully-inserted pages.
> * Other callers should make their own arrangements for calling unmap_region().

That works for me as well.




Re: [PATCH 0/2] SWIOTLB fixes for 4.20

2018-11-21 Thread Christoph Hellwig
On Tue, Nov 20, 2018 at 11:34:41AM -0500, Konrad Rzeszutek Wilk wrote:
> > Konrad, are you ok with me picking up both through the dma-mapping
> > tree?
> 
> Yes, albeit I would want Stefano to take a peek at patch #2 just in case.

Stefano, can you take a look asap?  This is a pretty trivial fix for
a nasty bug that breaks boot on real life systems.  I'd like to merge
it by tomorrow so that I can send it off to Linus for the next rc
(I will be offline for about two days after Friday morning)
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/9] mm: Introduce new vm_insert_range API

2018-11-21 Thread William Kucharski
Could you add a line to the description explicitly stating that a failure
to insert any page in the range will fail the entire routine, something
like:

> * This allows drivers to insert range of kernel pages they've allocated
> * into a user vma. This is a generic function which drivers can use
> * rather than using their own way of mapping range of kernel pages into
> * user vma.
> *
> * A failure to insert any page in the range will fail the call as a whole.

It's obvious when reading the code, but it would be self-documenting to
state it outright.

Thanks!
-- Bill




[PATCH] iommu: use memunmap to free memremap

2018-11-21 Thread Pan Bian
memunmap() should be used to free the return of memremap(), not
iounmap().

Fixes: dfddb969edf0 ("iommu/vt-d: Switch from ioremap_cache to memremap")
Signed-off-by: Pan Bian 
---
 drivers/iommu/intel-iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index f3ccf02..41a4b88 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3075,7 +3075,7 @@ static int copy_context_table(struct intel_iommu *iommu,
}
 
if (old_ce)
-   iounmap(old_ce);
+   memunmap(old_ce);
 
ret = 0;
if (devfn < 0x80)
-- 
2.7.4




Re: [PATCH 1/4] PCI / ACPI: Identify external PCI devices

2018-11-21 Thread Lorenzo Pieralisi
On Tue, Nov 20, 2018 at 10:43:35PM +0100, Rafael J. Wysocki wrote:
> On Friday, November 16, 2018 11:57:38 AM CET Lorenzo Pieralisi wrote:
> > On Thu, Nov 15, 2018 at 07:33:54PM +, mario.limoncie...@dell.com wrote:
> > > 
> > > 
> > > > -Original Message-
> > > > From: Mika Westerberg 
> > > > Sent: Thursday, November 15, 2018 1:01 PM
> > > > To: Lorenzo Pieralisi
> > > > Cc: Lukas Wunner; iommu@lists.linux-foundation.org; Joerg Roedel; David
> > > > Woodhouse; Lu Baolu; Ashok Raj; Bjorn Helgaas; Rafael J. Wysocki; Jacob 
> > > > jun Pan;
> > > > Andreas Noever; Michael Jamet; Yehezkel Bernat; Christian Kellner; 
> > > > Limonciello,
> > > > Mario; Anthony Wong; linux-a...@vger.kernel.org; 
> > > > linux-...@vger.kernel.org; linux-
> > > > ker...@vger.kernel.org
> > > > Subject: Re: [PATCH 1/4] PCI / ACPI: Identify external PCI devices
> > > > 
> > > > 
> > > > [EXTERNAL EMAIL]
> > > > 
> > > > On Thu, Nov 15, 2018 at 05:46:08PM +, Lorenzo Pieralisi wrote:
> > > > > Do you really need to parse it if the dev->is_thunderbolt check is 
> > > > > enough ?
> > > > 
> > > > Yes, we need to parse it one way or another. dev->is_thunderbolt is
> > > > based on heuristics which do not apply anymore when the thing gets
> > > > integrated in the SoC.
> > > > 
> > > > The _DSD is there already (on existing systems) and is being used by
> > > > Windows so I don't understand why we cannot take advantage of it? Every
> > > > new system with Thunderbolt ports will have it.
> > 
> > We have different opinions on this, there is no point in me reiterating
> > it over and over, I am against the approach taken to solve this problem
> > first in defining the bindings outside the ACPI specifications and
> > second by acquiescing to what has been done so that it will be done
> > over and over again.
> 
> Arguably, however, we are on the receiving end of things here and even if
> we don't use this binding, that won't buy us anything (but it will introduce
> a fair amount of disappointment among both users and OEMs).
> 
> If Windows uses it, then systems will ship with it regardless of what Linux
> does with it, so your "acquiescing to what has been done" argument leads to
> nowhere in this particular case.  It's just a matter of whether or not
> Linux will provide the same level of functionality as Windows with respect
> to this and IMO it would be impractical to refuse to do that for purely
> formal reasons.
> 
> > I will raise the point in the appropriate forum, it is up to Bjorn
> > and Rafael to decide on this patch.
> 
> For the record, my opinion is that there's a little choice on whether
> or not to use this extra information that firmware is willing to give
> us.  It could be defined in a better way and so on, but since it is in
> use anyway, we really have nothing to gain by refusing to use it.

AFAIK PCI firmware bindings should go into PCI firmware specifications,
not Microsoft webpages.

If we deviate from this model there is no way to tell whether that extra
information is right or wrong, it is not necessarily about this patch,
it is about changing the way these bindings are deployed in future
systems.

> Now, where the handling of it belongs to is a separate matter that should be
> decided on its own.

I think that the way these bindings were deployed is wrong, I agree this
is not the right forum to discuss that though. What you will do with
this patch is not my call anyway, I just expressed my opinion.

Thanks,
Lorenzo


Re: [PATCH 1/9] mm: Introduce new vm_insert_range API

2018-11-21 Thread Matthew Wilcox
On Wed, Nov 21, 2018 at 04:19:11AM -0700, William Kucharski wrote:
> Could you add a line to the description explicitly stating that a failure
> to insert any page in the range will fail the entire routine, something
> like:
> 
> > * This allows drivers to insert range of kernel pages they've allocated
> > * into a user vma. This is a generic function which drivers can use
> > * rather than using their own way of mapping range of kernel pages into
> > * user vma.
> > *
> > * A failure to insert any page in the range will fail the call as a whole.
> 
> It's obvious when reading the code, but it would be self-documenting to
> state it outright.

It's probably better to be more explicit and answer Randy's question:

 * If we fail to insert any page into the vma, the function will return
 * immediately leaving any previously-inserted pages present.  Callers
 * from the mmap handler may immediately return the error as their
 * caller will destroy the vma, removing any successfully-inserted pages.
 * Other callers should make their own arrangements for calling unmap_region().

Although unmap_region() is static so there clearly isn't any code in the
kernel today other than in mmap handlers (or fault handlers) that needs to
insert pages into a VMA.


Re: [virtio-dev] Re: [PATCH v4 5/7] iommu: Add virtio-iommu driver

2018-11-21 Thread Auger Eric
Hi Jean,

On 11/20/18 6:30 PM, Jean-Philippe Brucker wrote:
> On 16/11/2018 18:46, Jean-Philippe Brucker wrote:
 +/*
 + * __viommu_sync_req - Complete all in-flight requests
 + *
 + * Wait for all added requests to complete. When this function returns, 
 all
 + * requests that were in-flight at the time of the call have completed.
 + */
 +static int __viommu_sync_req(struct viommu_dev *viommu)
 +{
 +  int ret = 0;
 +  unsigned int len;
 +  size_t write_len;
 +  struct viommu_request *req;
 +  struct virtqueue *vq = viommu->vqs[VIOMMU_REQUEST_VQ];
 +
 +  assert_spin_locked(&viommu->request_lock);
 +
 +  virtqueue_kick(vq);
 +
 +  while (!list_empty(>requests)) {
 +  len = 0;
 +  req = virtqueue_get_buf(vq, &len);
 +  if (!req)
 +  continue;
 +
 +  if (!len)
 +  viommu_set_req_status(req->buf, req->len,
 +VIRTIO_IOMMU_S_IOERR);
 +
 +  write_len = req->len - req->write_offset;
 +  if (req->writeback && len >= write_len)
>>> I don't get "len >= write_len". Is it valid for the device to write more
>>> than write_len? If it writes less than write_len, the status is not set,
>>> is it?
> 
> Actually, len could well be three bytes smaller than write_len. The spec
> doesn't require the device to write the three padding bytes in
> virtio_iommu_req_tail, after the status byte.
> 
> Here the driver just assumes that the device wrote the reserved field.
> The QEMU device seems to write uninitialized data in there...

Indeed, that's incorrect and I should fix it. The tail struct should be
properly initialized to 0. Only the probe request implementation is correct.
> 
> Any objection to changing the spec to require the device to initialize
> those bytes to zero? I think it makes things nicer overall and shouldn't
> have any performance impact.
No objection from me.
> 
> [...]
 +static struct iommu_domain *viommu_domain_alloc(unsigned type)
 +{
 +  struct viommu_domain *vdomain;
 +
 +  if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA)
>>> smmu drivers also support the IDENTITY type. Don't we want to support it
>>> as well?
>>
>> Eventually yes. The IDENTITY domain is useful when an IOMMU has been
>> forced upon you and gets in the way of performance, in which case you
>> get rid of it with CONFIG_IOMMU_DEFAULT_PASSTHROUGH=y or
>> iommu.passthrough=y. For virtio-iommu though, you could simply not
>> instantiate the device.
>>
>> I don't think it's difficult to add: if the device supports
>> VIRTIO_IOMMU_F_BYPASS, then we simply don't send an ATTACH request.
>> Otherwise after ATTACH we send a MAP for the whole input range. If the
>> change isn't bigger than this, I'll add it to v5.
> 
> Not as straightforward as I hoped, when the device doesn't support
> VIRTIO_IOMMU_F_BYPASS:
> 
> * We can't simply create an ID map for the whole input range, we need to
> carve out the resv_mem regions.
> 
> * For a VFIO device, the host has no way of forwarding the identity
> mapping. For example the guest will attempt to map [0; 0x]
> -> [0; 0x], but VFIO only allows to map RAM and cannot take
> such request. One solution is to only create ID mapping for RAM, and
> register a notifier for memory hotplug, like intel-iommu does for IOMMUs
> that don't support HW pass-through.
> 
> Since it's not completely trivial and - I think - not urgent, I'll leave
> this for later.
OK, makes sense to me too. It was just a heads up.

Thanks

Eric
> 
> Thanks,
> Jean
> 


Re: [RFC PATCH 2/6] drivers core: Add I/O ASID allocator

2018-11-21 Thread Jean-Philippe Brucker
On 12/11/2018 14:40, Joerg Roedel wrote:
> Hi Jean-Philippe,
> 
> On Fri, Oct 19, 2018 at 07:11:54PM +0100, Jean-Philippe Brucker wrote:
>> The allocator doesn't really belong in drivers/iommu because some
>> drivers would like to allocate PASIDs for devices that aren't managed by
>> an IOMMU, using the same ID space as IOMMU. It doesn't really belong in
>> drivers/pci either since platform device also support PASID. Add the
>> allocator in drivers/base.
> 
> I don't really buy this argument, in the end it is the IOMMU that
> enforces the PASID mappings for a device. Why should a device not behind
> an IOMMU need to allocate a pasid? Do you have examples of such devices
> which also don't have their own iommu built-in?

I misunderstood that requirement. Reading through the thread again
(https://www.mail-archive.com/iommu@lists.linux-foundation.org/msg25640.html)
it's more about a device using PASIDs as context IDs. Some contexts are
not bound to processes but they still need their ID in the same PASID
space as the contexts that are bound to process address spaces. So we
can keep that code in drivers/iommu

>> +int ioasid_for_each(struct ioasid_set *set, ioasid_iter_t func, void *data)
>> +{
>> +int ret;
>> +struct ioasid_iter_data iter_data = {
>> +.set= set,
>> +.func   = func,
>> +.data   = data,
>> +};
>> +
>> +idr_lock(&ioasid_idr);
>> +ret = idr_for_each(&ioasid_idr, ioasid_iter, &iter_data);
>> +idr_unlock(&ioasid_idr);
>> +
>> +return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(ioasid_for_each);
> 
> What is the intended use-case for this function?

I'm using it in sva_bind(), to see if there already exists an io_mm
associated with the mm_struct given as an argument.

>> +void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid)
>> +{
>> +void *priv = NULL;
>> +struct ioasid_data *ioasid_data;
>> +
>> +idr_lock(&ioasid_idr);
>> +ioasid_data = idr_find(&ioasid_idr, ioasid);
>> +if (ioasid_data && ioasid_data->set == set)
>> +priv = ioasid_data->private;
>> +idr_unlock(&ioasid_idr);
>> +
>> +return priv;
>> +}
> 
> I think using the idr_lock here will become a performance problem, as we
> need this function in the SVA page-fault path to retrieve the mm_struct
> belonging to a PASID.
> 
> The head comment in lib/idr.c states that it should be safe to use
> rcu_readlock() on read-only iterations. If so, this should be used here
> so that concurrent lookups are possible.

Agreed. I have a patch to use RCU which looks simple enough, but since
I haven't used RCU before I wasn't comfortable squashing it right away.

Thanks,
Jean

diff --git a/drivers/base/ioasid.c b/drivers/base/ioasid.c
index 5eb3abbf69ac..fa3b48c9c044 100644
--- a/drivers/base/ioasid.c
+++ b/drivers/base/ioasid.c
@@ -13,6 +13,7 @@ struct ioasid_data {
int id;
struct ioasid_set *set;
void *private;
+   struct rcu_head rcu;
 };

 static DEFINE_IDR(ioasid_idr);
@@ -56,7 +57,8 @@ void ioasid_free(ioasid_t ioasid)
	ioasid_data = idr_remove(&ioasid_idr, ioasid);
	idr_unlock(&ioasid_idr);

-   kfree(ioasid_data);
+   if (ioasid_data)
+   kfree_rcu(ioasid_data, rcu);
 }
 EXPORT_SYMBOL_GPL(ioasid_free);

@@ -95,9 +97,9 @@ int ioasid_for_each(struct ioasid_set *set,
ioasid_iter_t func, void *data)
.data   = data,
};

-   idr_lock(&ioasid_idr);
+   rcu_read_lock();
	ret = idr_for_each(&ioasid_idr, ioasid_iter, &iter_data);
-   idr_unlock(&ioasid_idr);
+   rcu_read_unlock();

return ret;
 }
@@ -111,11 +113,11 @@ void *ioasid_find(struct ioasid_set *set, ioasid_t
ioasid)
void *priv = NULL;
struct ioasid_data *ioasid_data;

-   idr_lock(&ioasid_idr);
+   rcu_read_lock();
	ioasid_data = idr_find(&ioasid_idr, ioasid);
	if (ioasid_data && ioasid_data->set == set)
		priv = ioasid_data->private;
-   idr_unlock(&ioasid_idr);
+   rcu_read_unlock();

return priv;
 }



Re: [PATCH v4 6/8] vfio/mdev: Add iommu place holders in mdev_device

2018-11-21 Thread Christoph Hellwig
On Wed, Nov 21, 2018 at 02:22:08AM +0530, Kirti Wankhede wrote:
> It is about how the mdev framework can be used by existing drivers. These
> symbols don't use any other exported symbols.

That is an unfortunate accident of history, but doesn't extend to new
ones.  It also is another indicator those drivers probably are derived
works of the Linux kernel and might be in legal trouble one way or
another.