Re: [PATCH v4 2/3] mm: Add support for kmem caches in DMA32 zone

2018-12-05 Thread Wei Yang
On Thu, Dec 06, 2018 at 11:55:02AM +0800, Nicolas Boichat wrote:
>On Thu, Dec 6, 2018 at 11:32 AM Wei Yang  wrote:
>>
>> On Thu, Dec 06, 2018 at 08:41:36AM +0800, Nicolas Boichat wrote:
>> >On Wed, Dec 5, 2018 at 8:18 PM Wei Yang  wrote:
>> >>
>> >> On Wed, Dec 05, 2018 at 03:39:51PM +0800, Nicolas Boichat wrote:
>> >> >On Wed, Dec 5, 2018 at 3:25 PM Wei Yang  
>> >> >wrote:
>> >> >>
>> >> >> On Wed, Dec 05, 2018 at 01:48:27PM +0800, Nicolas Boichat wrote:
>> >> >> >In some cases (e.g. IOMMU ARMv7s page allocator), we need to allocate
>> >> >> >data structures smaller than a page with GFP_DMA32 flag.
>> >> >> >
>> >> >> >This change makes it possible to create a custom cache in DMA32 zone
>> >> >> >using kmem_cache_create, then allocate memory using kmem_cache_alloc.
>> >> >> >
>> >> >> >We do not create a DMA32 kmalloc cache array, as there are currently
>> >> >> >no users of kmalloc(..., GFP_DMA32). The new test in check_slab_flags
>> >> >> >ensures that such calls still fail (as they do before this change).
>> >> >> >
>> >> >> >Fixes: ad67f5a6545f ("arm64: replace ZONE_DMA with ZONE_DMA32")
>> >> >> >Signed-off-by: Nicolas Boichat 
>> >> >> >---
>> >> >> >
>> >> >> >Changes since v2:
>> >> >> > - Clarified commit message
>> >> >> > - Add entry in sysfs-kernel-slab to document the new sysfs file
>> >> >> >
>> >> >> >(v3 used the page_frag approach)
>> >> >> >
>> >> >> >Documentation/ABI/testing/sysfs-kernel-slab |  9 +
>> >> >> > include/linux/slab.h|  2 ++
>> >> >> > mm/internal.h   |  8 ++--
>> >> >> > mm/slab.c   |  4 +++-
>> >> >> > mm/slab.h   |  3 ++-
>> >> >> > mm/slab_common.c|  2 +-
>> >> >> > mm/slub.c   | 18 +-
>> >> >> > 7 files changed, 40 insertions(+), 6 deletions(-)
>> >> >> >
>> >> >> >diff --git a/Documentation/ABI/testing/sysfs-kernel-slab 
>> >> >> >b/Documentation/ABI/testing/sysfs-kernel-slab
>> >> >> >index 29601d93a1c2ea..d742c6cfdffbe9 100644
>> >> >> >--- a/Documentation/ABI/testing/sysfs-kernel-slab
>> >> >> >+++ b/Documentation/ABI/testing/sysfs-kernel-slab
>> >> >> >@@ -106,6 +106,15 @@ Description:
>> >> >> >   are from ZONE_DMA.
>> >> >> >   Available when CONFIG_ZONE_DMA is enabled.
>> >> >> >
>> >> >> >+What: /sys/kernel/slab/cache/cache_dma32
>> >> >> >+Date: December 2018
>> >> >> >+KernelVersion:4.21
>> >> >> >+Contact:  Nicolas Boichat 
>> >> >> >+Description:
>> >> >> >+  The cache_dma32 file is read-only and specifies 
>> >> >> >whether objects
>> >> >> >+  are from ZONE_DMA32.
>> >> >> >+  Available when CONFIG_ZONE_DMA32 is enabled.
>> >> >> >+
>> >> >> > What: /sys/kernel/slab/cache/cpu_slabs
>> >> >> > Date: May 2007
>> >> >> > KernelVersion:2.6.22
>> >> >> >diff --git a/include/linux/slab.h b/include/linux/slab.h
>> >> >> >index 11b45f7ae4057c..9449b19c5f107a 100644
>> >> >> >--- a/include/linux/slab.h
>> >> >> >+++ b/include/linux/slab.h
>> >> >> >@@ -32,6 +32,8 @@
>> >> >> > #define SLAB_HWCACHE_ALIGN((slab_flags_t __force)0x2000U)
>> >> >> > /* Use GFP_DMA memory */
>> >> >> > #define SLAB_CACHE_DMA((slab_flags_t 
>> >> >> > __force)0x4000U)
>> >> >> >+/* Use GFP_DMA32 memory */
>> >> >> >+#define SLAB_CACHE_DMA32  ((slab_flags_t __force)0x8000U)
>> >> >> > /* DEBUG: Store the last owner for bug hunting */
>> >> >> > #define SLAB_STORE_USER   ((slab_flags_t 
>> >> >> > __force)0x0001U)
>> >> >> > /* Panic if kmem_cache_create() fails */
>> >> >> >diff --git a/mm/internal.h b/mm/internal.h
>> >> >> >index a2ee82a0cd44ae..fd244ad716eaf8 100644
>> >> >> >--- a/mm/internal.h
>> >> >> >+++ b/mm/internal.h
>> >> >> >@@ -14,6 +14,7 @@
>> >> >> > #include 
>> >> >> > #include 
>> >> >> > #include 
>> >> >> >+#include 
>> >> >> > #include 
>> >> >> >
>> >> >> > /*
>> >> >> >@@ -34,9 +35,12 @@
>> >> >> > #define GFP_CONSTRAINT_MASK (__GFP_HARDWALL|__GFP_THISNODE)
>> >> >> >
>> >> >> > /* Check for flags that must not be used with a slab allocator */
>> >> >> >-static inline gfp_t check_slab_flags(gfp_t flags)
>> >> >> >+static inline gfp_t check_slab_flags(gfp_t flags, slab_flags_t 
>> >> >> >slab_flags)
>> >> >> > {
>> >> >> >-  gfp_t bug_mask = __GFP_DMA32 | __GFP_HIGHMEM | 
>> >> >> >~__GFP_BITS_MASK;
>> >> >> >+  gfp_t bug_mask = __GFP_HIGHMEM | ~__GFP_BITS_MASK;
>> >> >> >+
>> >> >> >+  if (!IS_ENABLED(CONFIG_ZONE_DMA32) || !(slab_flags & 
>> >> >> >SLAB_CACHE_DMA32))
>> >> >> >+  bug_mask |= __GFP_DMA32;
>> >> >>
>> >> >> The original version doesn't check CONFIG_ZONE_DMA32.
>> >> >>
>> >> >> Do we need to add this condition here?
>> >> >> Could we just decide the bug_mask based on slab_flags?
>> >> >
>> >> >We can. The reason I did it this way is that when we don't have
>> >> 

[PATCH 1/1] swiotlb: fix comment on swiotlb_bounce()

2018-12-05 Thread Dongli Zhang
Fix the comment: swiotlb_bounce() is used to copy from the original DMA
location to the swiotlb buffer during swiotlb_tbl_map_single(), and to
copy from the swiotlb buffer back to the original DMA location during
swiotlb_tbl_unmap_single().

Signed-off-by: Dongli Zhang 
---
 kernel/dma/swiotlb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 045930e..0a5ec94 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -389,7 +389,7 @@ static int is_swiotlb_buffer(phys_addr_t paddr)
 }
 
 /*
- * Bounce: copy the swiotlb buffer back to the original dma location
+ * Bounce: copy the swiotlb buffer from or back to the original dma location
  */
 static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr,
   size_t size, enum dma_data_direction dir)
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH RFC 1/1] swiotlb: add debugfs to track swiotlb buffer usage

2018-12-05 Thread Dongli Zhang
The device driver will not be able to do DMA operations once the swiotlb
buffer is full, either because the driver is keeping too many IO TLB blocks
in flight, or because of a memory leak in the device driver. Exporting the
swiotlb buffer usage via debugfs helps the user estimate the swiotlb buffer
size to pre-allocate, or analyze a device driver memory leak.

As the swiotlb can be initialized at a very early stage, when debugfs cannot
yet register successfully, this patch creates the debugfs entry on demand.

Signed-off-by: Dongli Zhang 
---
 kernel/dma/swiotlb.c | 57 
 1 file changed, 57 insertions(+)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 045930e..d3c8aa4 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -35,6 +35,9 @@
 #include 
 #include 
 #include 
+#ifdef CONFIG_DEBUG_FS
+#include 
+#endif
 
 #include 
 #include 
@@ -73,6 +76,13 @@ static phys_addr_t io_tlb_start, io_tlb_end;
  */
 static unsigned long io_tlb_nslabs;
 
+#ifdef CONFIG_DEBUG_FS
+/*
+ * The number of used IO TLB blocks
+ */
+static unsigned long io_tlb_used;
+#endif
+
 /*
  * This is a free list describing the number of free entries available from
  * each index
@@ -100,6 +110,41 @@ static DEFINE_SPINLOCK(io_tlb_lock);
 
 static int late_alloc;
 
+#ifdef CONFIG_DEBUG_FS
+
+static struct dentry *d_swiotlb_usage;
+
+static int swiotlb_usage_show(struct seq_file *m, void *v)
+{
+   seq_printf(m, "%lu\n%lu\n", io_tlb_used, io_tlb_nslabs);
+   return 0;
+}
+
+static int swiotlb_usage_open(struct inode *inode, struct file *filp)
+{
+   return single_open(filp, swiotlb_usage_show, NULL);
+}
+
+static const struct file_operations swiotlb_usage_fops = {
+   .open   = swiotlb_usage_open,
+   .read   = seq_read,
+   .llseek = seq_lseek,
+   .release= single_release,
+};
+
+void swiotlb_create_debugfs(void)
+{
+   d_swiotlb_usage = debugfs_create_dir("swiotlb", NULL);
+
+   if (!d_swiotlb_usage)
+   return;
+
+   debugfs_create_file("usage", 0600, d_swiotlb_usage,
+   NULL, &swiotlb_usage_fops);
+}
+
+#endif
+
 static int __init
 setup_io_tlb_npages(char *str)
 {
@@ -449,6 +494,11 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
	pr_warn_once("%s is active and system is using DMA bounce buffers\n",
		     sme_active() ? "SME" : "SEV");
 
+#ifdef CONFIG_DEBUG_FS
+   if (unlikely(!d_swiotlb_usage))
+   swiotlb_create_debugfs();
+#endif
+
mask = dma_get_seg_boundary(hwdev);
 
tbl_dma_addr &= mask;
@@ -528,6 +578,9 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
	dev_warn(hwdev, "swiotlb buffer is full (sz: %zd bytes)\n", size);
return SWIOTLB_MAP_ERROR;
 found:
+#ifdef CONFIG_DEBUG_FS
+   io_tlb_used += nslots;
+#endif
	spin_unlock_irqrestore(&io_tlb_lock, flags);
 
/*
@@ -588,6 +641,10 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, 
phys_addr_t tlb_addr,
 */
	for (i = index - 1; (OFFSET(i, IO_TLB_SEGSIZE) != IO_TLB_SEGSIZE -1) && io_tlb_list[i]; i--)
io_tlb_list[i] = ++count;
+
+#ifdef CONFIG_DEBUG_FS
+   io_tlb_used -= nslots;
+#endif
}
	spin_unlock_irqrestore(&io_tlb_lock, flags);
 }
-- 
2.7.4



Re: [PATCH v4 2/3] mm: Add support for kmem caches in DMA32 zone

2018-12-05 Thread Nicolas Boichat
On Thu, Dec 6, 2018 at 11:32 AM Wei Yang  wrote:
>
> On Thu, Dec 06, 2018 at 08:41:36AM +0800, Nicolas Boichat wrote:
> >On Wed, Dec 5, 2018 at 8:18 PM Wei Yang  wrote:
> >>
> >> On Wed, Dec 05, 2018 at 03:39:51PM +0800, Nicolas Boichat wrote:
> >> >On Wed, Dec 5, 2018 at 3:25 PM Wei Yang  wrote:
> >> >>
> >> >> On Wed, Dec 05, 2018 at 01:48:27PM +0800, Nicolas Boichat wrote:
> >> >> >In some cases (e.g. IOMMU ARMv7s page allocator), we need to allocate
> >> >> >data structures smaller than a page with GFP_DMA32 flag.
> >> >> >
> >> >> >This change makes it possible to create a custom cache in DMA32 zone
> >> >> >using kmem_cache_create, then allocate memory using kmem_cache_alloc.
> >> >> >
> >> >> >We do not create a DMA32 kmalloc cache array, as there are currently
> >> >> >no users of kmalloc(..., GFP_DMA32). The new test in check_slab_flags
> >> >> >ensures that such calls still fail (as they do before this change).
> >> >> >
> >> >> >Fixes: ad67f5a6545f ("arm64: replace ZONE_DMA with ZONE_DMA32")
> >> >> >Signed-off-by: Nicolas Boichat 
> >> >> >---
> >> >> >
> >> >> >Changes since v2:
> >> >> > - Clarified commit message
> >> >> > - Add entry in sysfs-kernel-slab to document the new sysfs file
> >> >> >
> >> >> >(v3 used the page_frag approach)
> >> >> >
> >> >> >Documentation/ABI/testing/sysfs-kernel-slab |  9 +
> >> >> > include/linux/slab.h|  2 ++
> >> >> > mm/internal.h   |  8 ++--
> >> >> > mm/slab.c   |  4 +++-
> >> >> > mm/slab.h   |  3 ++-
> >> >> > mm/slab_common.c|  2 +-
> >> >> > mm/slub.c   | 18 +-
> >> >> > 7 files changed, 40 insertions(+), 6 deletions(-)
> >> >> >
> >> >> >diff --git a/Documentation/ABI/testing/sysfs-kernel-slab 
> >> >> >b/Documentation/ABI/testing/sysfs-kernel-slab
> >> >> >index 29601d93a1c2ea..d742c6cfdffbe9 100644
> >> >> >--- a/Documentation/ABI/testing/sysfs-kernel-slab
> >> >> >+++ b/Documentation/ABI/testing/sysfs-kernel-slab
> >> >> >@@ -106,6 +106,15 @@ Description:
> >> >> >   are from ZONE_DMA.
> >> >> >   Available when CONFIG_ZONE_DMA is enabled.
> >> >> >
> >> >> >+What: /sys/kernel/slab/cache/cache_dma32
> >> >> >+Date: December 2018
> >> >> >+KernelVersion:4.21
> >> >> >+Contact:  Nicolas Boichat 
> >> >> >+Description:
> >> >> >+  The cache_dma32 file is read-only and specifies whether 
> >> >> >objects
> >> >> >+  are from ZONE_DMA32.
> >> >> >+  Available when CONFIG_ZONE_DMA32 is enabled.
> >> >> >+
> >> >> > What: /sys/kernel/slab/cache/cpu_slabs
> >> >> > Date: May 2007
> >> >> > KernelVersion:2.6.22
> >> >> >diff --git a/include/linux/slab.h b/include/linux/slab.h
> >> >> >index 11b45f7ae4057c..9449b19c5f107a 100644
> >> >> >--- a/include/linux/slab.h
> >> >> >+++ b/include/linux/slab.h
> >> >> >@@ -32,6 +32,8 @@
> >> >> > #define SLAB_HWCACHE_ALIGN((slab_flags_t __force)0x2000U)
> >> >> > /* Use GFP_DMA memory */
> >> >> > #define SLAB_CACHE_DMA((slab_flags_t 
> >> >> > __force)0x4000U)
> >> >> >+/* Use GFP_DMA32 memory */
> >> >> >+#define SLAB_CACHE_DMA32  ((slab_flags_t __force)0x8000U)
> >> >> > /* DEBUG: Store the last owner for bug hunting */
> >> >> > #define SLAB_STORE_USER   ((slab_flags_t 
> >> >> > __force)0x0001U)
> >> >> > /* Panic if kmem_cache_create() fails */
> >> >> >diff --git a/mm/internal.h b/mm/internal.h
> >> >> >index a2ee82a0cd44ae..fd244ad716eaf8 100644
> >> >> >--- a/mm/internal.h
> >> >> >+++ b/mm/internal.h
> >> >> >@@ -14,6 +14,7 @@
> >> >> > #include 
> >> >> > #include 
> >> >> > #include 
> >> >> >+#include 
> >> >> > #include 
> >> >> >
> >> >> > /*
> >> >> >@@ -34,9 +35,12 @@
> >> >> > #define GFP_CONSTRAINT_MASK (__GFP_HARDWALL|__GFP_THISNODE)
> >> >> >
> >> >> > /* Check for flags that must not be used with a slab allocator */
> >> >> >-static inline gfp_t check_slab_flags(gfp_t flags)
> >> >> >+static inline gfp_t check_slab_flags(gfp_t flags, slab_flags_t 
> >> >> >slab_flags)
> >> >> > {
> >> >> >-  gfp_t bug_mask = __GFP_DMA32 | __GFP_HIGHMEM | ~__GFP_BITS_MASK;
> >> >> >+  gfp_t bug_mask = __GFP_HIGHMEM | ~__GFP_BITS_MASK;
> >> >> >+
> >> >> >+  if (!IS_ENABLED(CONFIG_ZONE_DMA32) || !(slab_flags & 
> >> >> >SLAB_CACHE_DMA32))
> >> >> >+  bug_mask |= __GFP_DMA32;
> >> >>
> >> >> The original version doesn't check CONFIG_ZONE_DMA32.
> >> >>
> >> >> Do we need to add this condition here?
> >> >> Could we just decide the bug_mask based on slab_flags?
> >> >
> >> >We can. The reason I did it this way is that when we don't have
> >> >CONFIG_ZONE_DMA32, the compiler should be able to simplify to:
> >> >
> >> >bug_mask = __GFP_HIGHMEM | ~__GFP_BITS_MASK;
> >> >if (true || ..) => if (true)
> >> >   bug_mask |= __GFP_DMA32;
> 

Re: [PATCH v4 2/3] mm: Add support for kmem caches in DMA32 zone

2018-12-05 Thread Nicolas Boichat
On Wed, Dec 5, 2018 at 10:02 PM Vlastimil Babka  wrote:
>
> On 12/5/18 6:48 AM, Nicolas Boichat wrote:
> > In some cases (e.g. IOMMU ARMv7s page allocator), we need to allocate
> > data structures smaller than a page with GFP_DMA32 flag.
> >
> > This change makes it possible to create a custom cache in DMA32 zone
> > using kmem_cache_create, then allocate memory using kmem_cache_alloc.
> >
> > We do not create a DMA32 kmalloc cache array, as there are currently
> > no users of kmalloc(..., GFP_DMA32). The new test in check_slab_flags
> > ensures that such calls still fail (as they do before this change).
> >
> > Fixes: ad67f5a6545f ("arm64: replace ZONE_DMA with ZONE_DMA32")
>
> Same as my comment for 1/3.

I'll drop.

> > Signed-off-by: Nicolas Boichat 
>
> In general,
> Acked-by: Vlastimil Babka 
>
> Some comments below:
>
> > ---
> >
> > Changes since v2:
> >  - Clarified commit message
> >  - Add entry in sysfs-kernel-slab to document the new sysfs file
> >
> > (v3 used the page_frag approach)
> >
> > Documentation/ABI/testing/sysfs-kernel-slab |  9 +
> >  include/linux/slab.h|  2 ++
> >  mm/internal.h   |  8 ++--
> >  mm/slab.c   |  4 +++-
> >  mm/slab.h   |  3 ++-
> >  mm/slab_common.c|  2 +-
> >  mm/slub.c   | 18 +-
> >  7 files changed, 40 insertions(+), 6 deletions(-)
> >
> > diff --git a/Documentation/ABI/testing/sysfs-kernel-slab 
> > b/Documentation/ABI/testing/sysfs-kernel-slab
> > index 29601d93a1c2ea..d742c6cfdffbe9 100644
> > --- a/Documentation/ABI/testing/sysfs-kernel-slab
> > +++ b/Documentation/ABI/testing/sysfs-kernel-slab
> > @@ -106,6 +106,15 @@ Description:
> >   are from ZONE_DMA.
> >   Available when CONFIG_ZONE_DMA is enabled.
> >
> > +What:/sys/kernel/slab/cache/cache_dma32
> > +Date:December 2018
> > +KernelVersion:   4.21
> > +Contact: Nicolas Boichat 
> > +Description:
> > + The cache_dma32 file is read-only and specifies whether 
> > objects
> > + are from ZONE_DMA32.
> > + Available when CONFIG_ZONE_DMA32 is enabled.
>
> I don't have a strong opinion. It's a new file, yeah, but consistent
> with already existing ones. I'd leave the decision with SL*B maintainers.
>
> >  What:/sys/kernel/slab/cache/cpu_slabs
> >  Date:May 2007
> >  KernelVersion:   2.6.22
> > diff --git a/include/linux/slab.h b/include/linux/slab.h
> > index 11b45f7ae4057c..9449b19c5f107a 100644
> > --- a/include/linux/slab.h
> > +++ b/include/linux/slab.h
> > @@ -32,6 +32,8 @@
> >  #define SLAB_HWCACHE_ALIGN   ((slab_flags_t __force)0x2000U)
> >  /* Use GFP_DMA memory */
> >  #define SLAB_CACHE_DMA   ((slab_flags_t __force)0x4000U)
> > +/* Use GFP_DMA32 memory */
> > +#define SLAB_CACHE_DMA32 ((slab_flags_t __force)0x8000U)
> >  /* DEBUG: Store the last owner for bug hunting */
> >  #define SLAB_STORE_USER  ((slab_flags_t __force)0x0001U)
> >  /* Panic if kmem_cache_create() fails */
> > diff --git a/mm/internal.h b/mm/internal.h
> > index a2ee82a0cd44ae..fd244ad716eaf8 100644
> > --- a/mm/internal.h
> > +++ b/mm/internal.h
> > @@ -14,6 +14,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >
> >  /*
> > @@ -34,9 +35,12 @@
> >  #define GFP_CONSTRAINT_MASK (__GFP_HARDWALL|__GFP_THISNODE)
> >
> >  /* Check for flags that must not be used with a slab allocator */
> > -static inline gfp_t check_slab_flags(gfp_t flags)
> > +static inline gfp_t check_slab_flags(gfp_t flags, slab_flags_t slab_flags)
> >  {
> > - gfp_t bug_mask = __GFP_DMA32 | __GFP_HIGHMEM | ~__GFP_BITS_MASK;
> > + gfp_t bug_mask = __GFP_HIGHMEM | ~__GFP_BITS_MASK;
> > +
> > + if (!IS_ENABLED(CONFIG_ZONE_DMA32) || !(slab_flags & 
> > SLAB_CACHE_DMA32))
> > + bug_mask |= __GFP_DMA32;
>
> I'll point out that this is not even strictly needed AFAICS, as only
> flags passed to kmem_cache_alloc() are checked - the cache->allocflags
> derived from SLAB_CACHE_DMA32 are appended only after check_slab_flags()
> (in both SLAB and SLUB AFAICS). And for a cache created with
> SLAB_CACHE_DMA32, the caller of kmem_cache_alloc() doesn't need to also
> include __GFP_DMA32, the allocation will be from ZONE_DMA32 regardless.

Yes, you're right. I also looked at existing users of SLAB_CACHE_DMA,
and there is one case in drivers/scsi/scsi_lib.c where GFP_DMA is not
passed (all the other users pass it).

I can drop GFP_DMA32 from my call in io-pgtable-arm-v7s.c.

> So it would be fine even unchanged. The check would anyway need some
> more love to catch the same with __GFP_DMA to be consistent and cover
> all corner cases.

Yes, the test is not complete. If we really wanted this to be
accurate, we'd need to check that GFP_* exactly 

Re: [PATCH v4 2/3] mm: Add support for kmem caches in DMA32 zone

2018-12-05 Thread Wei Yang
On Thu, Dec 06, 2018 at 08:41:36AM +0800, Nicolas Boichat wrote:
>On Wed, Dec 5, 2018 at 8:18 PM Wei Yang  wrote:
>>
>> On Wed, Dec 05, 2018 at 03:39:51PM +0800, Nicolas Boichat wrote:
>> >On Wed, Dec 5, 2018 at 3:25 PM Wei Yang  wrote:
>> >>
>> >> On Wed, Dec 05, 2018 at 01:48:27PM +0800, Nicolas Boichat wrote:
>> >> >In some cases (e.g. IOMMU ARMv7s page allocator), we need to allocate
>> >> >data structures smaller than a page with GFP_DMA32 flag.
>> >> >
>> >> >This change makes it possible to create a custom cache in DMA32 zone
>> >> >using kmem_cache_create, then allocate memory using kmem_cache_alloc.
>> >> >
>> >> >We do not create a DMA32 kmalloc cache array, as there are currently
>> >> >no users of kmalloc(..., GFP_DMA32). The new test in check_slab_flags
>> >> >ensures that such calls still fail (as they do before this change).
>> >> >
>> >> >Fixes: ad67f5a6545f ("arm64: replace ZONE_DMA with ZONE_DMA32")
>> >> >Signed-off-by: Nicolas Boichat 
>> >> >---
>> >> >
>> >> >Changes since v2:
>> >> > - Clarified commit message
>> >> > - Add entry in sysfs-kernel-slab to document the new sysfs file
>> >> >
>> >> >(v3 used the page_frag approach)
>> >> >
>> >> >Documentation/ABI/testing/sysfs-kernel-slab |  9 +
>> >> > include/linux/slab.h|  2 ++
>> >> > mm/internal.h   |  8 ++--
>> >> > mm/slab.c   |  4 +++-
>> >> > mm/slab.h   |  3 ++-
>> >> > mm/slab_common.c|  2 +-
>> >> > mm/slub.c   | 18 +-
>> >> > 7 files changed, 40 insertions(+), 6 deletions(-)
>> >> >
>> >> >diff --git a/Documentation/ABI/testing/sysfs-kernel-slab 
>> >> >b/Documentation/ABI/testing/sysfs-kernel-slab
>> >> >index 29601d93a1c2ea..d742c6cfdffbe9 100644
>> >> >--- a/Documentation/ABI/testing/sysfs-kernel-slab
>> >> >+++ b/Documentation/ABI/testing/sysfs-kernel-slab
>> >> >@@ -106,6 +106,15 @@ Description:
>> >> >   are from ZONE_DMA.
>> >> >   Available when CONFIG_ZONE_DMA is enabled.
>> >> >
>> >> >+What: /sys/kernel/slab/cache/cache_dma32
>> >> >+Date: December 2018
>> >> >+KernelVersion:4.21
>> >> >+Contact:  Nicolas Boichat 
>> >> >+Description:
>> >> >+  The cache_dma32 file is read-only and specifies whether 
>> >> >objects
>> >> >+  are from ZONE_DMA32.
>> >> >+  Available when CONFIG_ZONE_DMA32 is enabled.
>> >> >+
>> >> > What: /sys/kernel/slab/cache/cpu_slabs
>> >> > Date: May 2007
>> >> > KernelVersion:2.6.22
>> >> >diff --git a/include/linux/slab.h b/include/linux/slab.h
>> >> >index 11b45f7ae4057c..9449b19c5f107a 100644
>> >> >--- a/include/linux/slab.h
>> >> >+++ b/include/linux/slab.h
>> >> >@@ -32,6 +32,8 @@
>> >> > #define SLAB_HWCACHE_ALIGN((slab_flags_t __force)0x2000U)
>> >> > /* Use GFP_DMA memory */
>> >> > #define SLAB_CACHE_DMA((slab_flags_t 
>> >> > __force)0x4000U)
>> >> >+/* Use GFP_DMA32 memory */
>> >> >+#define SLAB_CACHE_DMA32  ((slab_flags_t __force)0x8000U)
>> >> > /* DEBUG: Store the last owner for bug hunting */
>> >> > #define SLAB_STORE_USER   ((slab_flags_t 
>> >> > __force)0x0001U)
>> >> > /* Panic if kmem_cache_create() fails */
>> >> >diff --git a/mm/internal.h b/mm/internal.h
>> >> >index a2ee82a0cd44ae..fd244ad716eaf8 100644
>> >> >--- a/mm/internal.h
>> >> >+++ b/mm/internal.h
>> >> >@@ -14,6 +14,7 @@
>> >> > #include 
>> >> > #include 
>> >> > #include 
>> >> >+#include 
>> >> > #include 
>> >> >
>> >> > /*
>> >> >@@ -34,9 +35,12 @@
>> >> > #define GFP_CONSTRAINT_MASK (__GFP_HARDWALL|__GFP_THISNODE)
>> >> >
>> >> > /* Check for flags that must not be used with a slab allocator */
>> >> >-static inline gfp_t check_slab_flags(gfp_t flags)
>> >> >+static inline gfp_t check_slab_flags(gfp_t flags, slab_flags_t 
>> >> >slab_flags)
>> >> > {
>> >> >-  gfp_t bug_mask = __GFP_DMA32 | __GFP_HIGHMEM | ~__GFP_BITS_MASK;
>> >> >+  gfp_t bug_mask = __GFP_HIGHMEM | ~__GFP_BITS_MASK;
>> >> >+
>> >> >+  if (!IS_ENABLED(CONFIG_ZONE_DMA32) || !(slab_flags & 
>> >> >SLAB_CACHE_DMA32))
>> >> >+  bug_mask |= __GFP_DMA32;
>> >>
>> >> The original version doesn't check CONFIG_ZONE_DMA32.
>> >>
>> >> Do we need to add this condition here?
>> >> Could we just decide the bug_mask based on slab_flags?
>> >
>> >We can. The reason I did it this way is that when we don't have
>> >CONFIG_ZONE_DMA32, the compiler should be able to simplify to:
>> >
>> >bug_mask = __GFP_HIGHMEM | ~__GFP_BITS_MASK;
>> >if (true || ..) => if (true)
>> >   bug_mask |= __GFP_DMA32;
>> >
>> >Then just
>> >bug_mask = __GFP_HIGHMEM | ~__GFP_BITS_MASK | __GFP_DMA32;
>> >
>> >And since the function is inline, slab_flags would not even need to be
>> >accessed at all.
>> >
>>
>> Hmm, I get one confusion.
>>
>> This means if CONFIG_ZONE_DMA32 is not enabled, 

Re: [PATCH v5 04/12] iommu/vt-d: Add 256-bit invalidation descriptor support

2018-12-05 Thread Lu Baolu

Hi Joerg,

On 12/5/18 11:56 PM, Joerg Roedel wrote:

On Tue, Dec 04, 2018 at 02:13:31PM +0800, Lu Baolu wrote:

The existing code uses GFP_ATOMIC, this patch only changes the size of
the allocated desc_page.

I don't think we really need GFP_ATOMIC here (and also in some other
places). I will clean them up in a separate patch.


Okay, thanks.


In this patch, there is code like the snippet below. It calculates the
destination address for memcpy from qi->desc. If qi->desc were still a
struct qi_desc pointer, the calculated offset would be wrong.

+   memcpy(desc, qi->desc + (wait_index << shift),
+  1 << shift);

The change in the offset calculation is what lets this unified code path
support both 128-bit and 256-bit invalidation descriptors.

Also, the conversation between Baolu and me may help.

https://lore.kernel.org/patchwork/patch/1006756/


Yes. We need to support different descriptor size.


Okay, pointer arithmetic on void* isn't well defined in the C standard,
afaik. But it should work with GCC, so it's probably fine.

Unrelated to this patch-set, the whole qi management code needs some
cleanups, it queues a sync after every command and has very tricky
locking. This patch-set further complicates matters there, so it is
probably time for a clean re-write of that part?


Sure. I will start the cleanups (including removing the unnecessary
GFP_ATOMIC page allocation) and submit them in a separate patch set after
they are well tested.

Best regards,
Lu Baolu


Re: [PATCH v5 02/12] iommu/vt-d: Manage scalalble mode PASID tables

2018-12-05 Thread Lu Baolu

Hi Joerg,

On 12/5/18 11:50 PM, Joerg Roedel wrote:

On Tue, Dec 04, 2018 at 01:58:06PM +0800, Lu Baolu wrote:

This function is called in an unsleepable context.

spin_lock()
[...]
if (pasid_table_is_necessary)
allocate_pasid_table(dev)
[...]
spin_unlock()

We can move it out of the lock range.

How about

if (pasid_table_is_necessary)
pasid_table = allocate_pasid_table(dev)

spin_lock()
[...]
if (pasid_table_is_necessary)
set_up_pasid_table(pasid_table)
[...]
spin_unlock()


Hmm, so when the IOMMU is configured in scalable mode we can just
allocate a pasid-table for the device when we set it up, right?

Scalable mode is a boot-time decision, so we know for sure at device-setup
time whether we need a pasid-table. And the device-setup code is
preemptible, so I think this allocation should be outside of any
spin-locked section.


Fair enough. I will fix this up in the next version.

Thank you for pointing this out.

Best regards,
Lu Baolu


Re: [PATCH v4 2/3] mm: Add support for kmem caches in DMA32 zone

2018-12-05 Thread Nicolas Boichat
On Wed, Dec 5, 2018 at 8:18 PM Wei Yang  wrote:
>
> On Wed, Dec 05, 2018 at 03:39:51PM +0800, Nicolas Boichat wrote:
> >On Wed, Dec 5, 2018 at 3:25 PM Wei Yang  wrote:
> >>
> >> On Wed, Dec 05, 2018 at 01:48:27PM +0800, Nicolas Boichat wrote:
> >> >In some cases (e.g. IOMMU ARMv7s page allocator), we need to allocate
> >> >data structures smaller than a page with GFP_DMA32 flag.
> >> >
> >> >This change makes it possible to create a custom cache in DMA32 zone
> >> >using kmem_cache_create, then allocate memory using kmem_cache_alloc.
> >> >
> >> >We do not create a DMA32 kmalloc cache array, as there are currently
> >> >no users of kmalloc(..., GFP_DMA32). The new test in check_slab_flags
> >> >ensures that such calls still fail (as they do before this change).
> >> >
> >> >Fixes: ad67f5a6545f ("arm64: replace ZONE_DMA with ZONE_DMA32")
> >> >Signed-off-by: Nicolas Boichat 
> >> >---
> >> >
> >> >Changes since v2:
> >> > - Clarified commit message
> >> > - Add entry in sysfs-kernel-slab to document the new sysfs file
> >> >
> >> >(v3 used the page_frag approach)
> >> >
> >> >Documentation/ABI/testing/sysfs-kernel-slab |  9 +
> >> > include/linux/slab.h|  2 ++
> >> > mm/internal.h   |  8 ++--
> >> > mm/slab.c   |  4 +++-
> >> > mm/slab.h   |  3 ++-
> >> > mm/slab_common.c|  2 +-
> >> > mm/slub.c   | 18 +-
> >> > 7 files changed, 40 insertions(+), 6 deletions(-)
> >> >
> >> >diff --git a/Documentation/ABI/testing/sysfs-kernel-slab 
> >> >b/Documentation/ABI/testing/sysfs-kernel-slab
> >> >index 29601d93a1c2ea..d742c6cfdffbe9 100644
> >> >--- a/Documentation/ABI/testing/sysfs-kernel-slab
> >> >+++ b/Documentation/ABI/testing/sysfs-kernel-slab
> >> >@@ -106,6 +106,15 @@ Description:
> >> >   are from ZONE_DMA.
> >> >   Available when CONFIG_ZONE_DMA is enabled.
> >> >
> >> >+What: /sys/kernel/slab/cache/cache_dma32
> >> >+Date: December 2018
> >> >+KernelVersion:4.21
> >> >+Contact:  Nicolas Boichat 
> >> >+Description:
> >> >+  The cache_dma32 file is read-only and specifies whether 
> >> >objects
> >> >+  are from ZONE_DMA32.
> >> >+  Available when CONFIG_ZONE_DMA32 is enabled.
> >> >+
> >> > What: /sys/kernel/slab/cache/cpu_slabs
> >> > Date: May 2007
> >> > KernelVersion:2.6.22
> >> >diff --git a/include/linux/slab.h b/include/linux/slab.h
> >> >index 11b45f7ae4057c..9449b19c5f107a 100644
> >> >--- a/include/linux/slab.h
> >> >+++ b/include/linux/slab.h
> >> >@@ -32,6 +32,8 @@
> >> > #define SLAB_HWCACHE_ALIGN((slab_flags_t __force)0x2000U)
> >> > /* Use GFP_DMA memory */
> >> > #define SLAB_CACHE_DMA((slab_flags_t __force)0x4000U)
> >> >+/* Use GFP_DMA32 memory */
> >> >+#define SLAB_CACHE_DMA32  ((slab_flags_t __force)0x8000U)
> >> > /* DEBUG: Store the last owner for bug hunting */
> >> > #define SLAB_STORE_USER   ((slab_flags_t __force)0x0001U)
> >> > /* Panic if kmem_cache_create() fails */
> >> >diff --git a/mm/internal.h b/mm/internal.h
> >> >index a2ee82a0cd44ae..fd244ad716eaf8 100644
> >> >--- a/mm/internal.h
> >> >+++ b/mm/internal.h
> >> >@@ -14,6 +14,7 @@
> >> > #include 
> >> > #include 
> >> > #include 
> >> >+#include 
> >> > #include 
> >> >
> >> > /*
> >> >@@ -34,9 +35,12 @@
> >> > #define GFP_CONSTRAINT_MASK (__GFP_HARDWALL|__GFP_THISNODE)
> >> >
> >> > /* Check for flags that must not be used with a slab allocator */
> >> >-static inline gfp_t check_slab_flags(gfp_t flags)
> >> >+static inline gfp_t check_slab_flags(gfp_t flags, slab_flags_t 
> >> >slab_flags)
> >> > {
> >> >-  gfp_t bug_mask = __GFP_DMA32 | __GFP_HIGHMEM | ~__GFP_BITS_MASK;
> >> >+  gfp_t bug_mask = __GFP_HIGHMEM | ~__GFP_BITS_MASK;
> >> >+
> >> >+  if (!IS_ENABLED(CONFIG_ZONE_DMA32) || !(slab_flags & 
> >> >SLAB_CACHE_DMA32))
> >> >+  bug_mask |= __GFP_DMA32;
> >>
> >> The original version doesn't check CONFIG_ZONE_DMA32.
> >>
> >> Do we need to add this condition here?
> >> Could we just decide the bug_mask based on slab_flags?
> >
> >We can. The reason I did it this way is that when we don't have
> >CONFIG_ZONE_DMA32, the compiler should be able to simplify to:
> >
> >bug_mask = __GFP_HIGHMEM | ~__GFP_BITS_MASK;
> >if (true || ..) => if (true)
> >   bug_mask |= __GFP_DMA32;
> >
> >Then just
> >bug_mask = __GFP_HIGHMEM | ~__GFP_BITS_MASK | __GFP_DMA32;
> >
> >And since the function is inline, slab_flags would not even need to be
> >accessed at all.
> >
>
> Hmm, I get one confusion.
>
> This means if CONFIG_ZONE_DMA32 is not enabled, bug_mask will always
> contains __GFP_DMA32. This will check with cachep->flags.
>
> If cachep->flags has GFP_DMA32, this always fail?
>
> Is this possible?

I'm not fully sure I understand the question, 

[PATCH v2 7/8] dma/debug: Remove dma_debug_resize_entries()

2018-12-05 Thread Robin Murphy
With no callers left, we can clean up this part of dma-debug's exposed
internals, making way to tweak the allocation behaviour.

Signed-off-by: Robin Murphy 
---

v2: New

 include/linux/dma-debug.h |  7 ---
 kernel/dma/debug.c| 30 --
 2 files changed, 37 deletions(-)

diff --git a/include/linux/dma-debug.h b/include/linux/dma-debug.h
index 30213adbb6b9..46e6131a72b6 100644
--- a/include/linux/dma-debug.h
+++ b/include/linux/dma-debug.h
@@ -30,8 +30,6 @@ struct bus_type;
 
 extern void dma_debug_add_bus(struct bus_type *bus);
 
-extern int dma_debug_resize_entries(u32 num_entries);
-
 extern void debug_dma_map_single(struct device *dev, const void *addr,
 unsigned long len);
 
@@ -101,11 +99,6 @@ static inline void dma_debug_add_bus(struct bus_type *bus)
 {
 }
 
-static inline int dma_debug_resize_entries(u32 num_entries)
-{
-   return 0;
-}
-
 static inline void debug_dma_map_single(struct device *dev, const void *addr,
unsigned long len)
 {
diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index cd9a12bf6a3a..2202402afe9a 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -756,36 +756,6 @@ static void dma_entry_free(struct dma_debug_entry *entry)
spin_unlock_irqrestore(_entries_lock, flags);
 }
 
-int dma_debug_resize_entries(u32 num_entries)
-{
-   int i, delta, ret = 0;
-   unsigned long flags;
-   struct dma_debug_entry *entry;
-
-   spin_lock_irqsave(&free_entries_lock, flags);
-
-   if (nr_total_entries < num_entries) {
-   delta = num_entries - nr_total_entries;
-
-   ret = dma_debug_add_entries(delta, GFP_ATOMIC);
-   } else {
-   delta = nr_total_entries - num_entries;
-
-   for (i = 0; i < delta && !list_empty(&free_entries); i++) {
-   entry = __dma_entry_alloc();
-   kfree(entry);
-   }
-
-   nr_total_entries -= i;
-   if (nr_total_entries != num_entries)
-   ret = -EBUSY;
-   }
-
-   spin_unlock_irqrestore(&free_entries_lock, flags);
-
-   return ret;
-}
-
 /*
  * DMA-API debugging init code
  *
-- 
2.19.1.dirty

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 6/8] x86/dma/amd-gart: Stop resizing dma_debug_entry pool

2018-12-05 Thread Robin Murphy
dma-debug is now capable of adding new entries to its pool on-demand if
the initial preallocation was insufficient, so the IOMMU_LEAK logic no
longer needs to explicitly change the pool size. This does lose it the
ability to save a couple of megabytes of RAM by reducing the pool size
below its default, but it seems unlikely that that is a realistic
concern these days (or indeed that anyone is actively debugging AGP
drivers' DMA usage any more). Getting rid of dma_debug_resize_entries()
will make room for further streamlining in the dma-debug code itself.

Removing the call reveals quite a lot of cruft which has been useless
for nearly a decade since commit 19c1a6f5764d ("x86 gart: reimplement
IOMMU_LEAK feature by using DMA_API_DEBUG"), including the entire
'iommu=leak' parameter, which controlled nothing except whether
dma_debug_resize_entries() was called or not.

CC: Thomas Gleixner 
CC: Ingo Molnar 
CC: Borislav Petkov 
CC: "H. Peter Anvin" 
CC: x...@kernel.org
Signed-off-by: Robin Murphy 
---

v2: New

 Documentation/x86/x86_64/boot-options.txt |  5 +
 arch/x86/kernel/amd_gart_64.c | 23 ---
 2 files changed, 1 insertion(+), 27 deletions(-)

diff --git a/Documentation/x86/x86_64/boot-options.txt b/Documentation/x86/x86_64/boot-options.txt
index ad6d2a80cf05..abc53886655e 100644
--- a/Documentation/x86/x86_64/boot-options.txt
+++ b/Documentation/x86/x86_64/boot-options.txt
@@ -209,7 +209,7 @@ IOMMU (input/output memory management unit)
   mapping with memory protection, etc.
   Kernel boot message: "PCI-DMA: Using Calgary IOMMU"
 
- iommu=[][,noagp][,off][,force][,noforce][,leak[=]
+ iommu=[][,noagp][,off][,force][,noforce]
[,memaper[=]][,merge][,fullflush][,nomerge]
[,noaperture][,calgary]
 
@@ -228,9 +228,6 @@ IOMMU (input/output memory management unit)
 allowed        Overwrite iommu off workarounds for specific chipsets.
 fullflush      Flush IOMMU on each allocation (default).
 nofullflush    Don't use IOMMU fullflush.
-leak   Turn on simple iommu leak tracing (only when
-   CONFIG_IOMMU_LEAK is on). Default number of leak pages
-   is 20.
 memaper[=]  Allocate an own aperture over RAM with size 32MB

[PATCH v2 3/8] dma-debug: Refactor dma_debug_entry allocation

2018-12-05 Thread Robin Murphy
Make the prealloc_memory() logic a little more general and robust so
that it serves for runtime reallocations too. The first thing we can
do with that is clean up dma_debug_resize_entries() a bit.

Signed-off-by: Robin Murphy 
---

v2: Give it a better name, simplify the locking mess

 kernel/dma/debug.c | 87 +++---
 1 file changed, 36 insertions(+), 51 deletions(-)

diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index 29486eb9d1dc..1b0858d7edfd 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -646,6 +646,36 @@ static void add_dma_entry(struct dma_debug_entry *entry)
 */
 }
 
+static int dma_debug_add_entries(u32 num_entries, gfp_t gfp)
+{
+   struct dma_debug_entry *entry, *next_entry;
+   LIST_HEAD(tmp);
+   int i;
+
+   for (i = 0; i < num_entries; ++i) {
+   entry = kzalloc(sizeof(*entry), gfp);
+   if (!entry)
+   goto out_err;
+
+   list_add_tail(&entry->list, &tmp);
+   }
+
+   list_splice(&tmp, &free_entries);
+   num_free_entries += num_entries;
+   nr_total_entries += num_entries;
+
+   return 0;
+
+out_err:
+
+   list_for_each_entry_safe(entry, next_entry, &tmp, list) {
+   list_del(&entry->list);
+   kfree(entry);
+   }
+
+   return -ENOMEM;
+}
+
 static struct dma_debug_entry *__dma_entry_alloc(void)
 {
struct dma_debug_entry *entry;
@@ -715,28 +745,13 @@ int dma_debug_resize_entries(u32 num_entries)
int i, delta, ret = 0;
unsigned long flags;
struct dma_debug_entry *entry;
-   LIST_HEAD(tmp);
 
	spin_lock_irqsave(&free_entries_lock, flags);
 
if (nr_total_entries < num_entries) {
delta = num_entries - nr_total_entries;
 
-   spin_unlock_irqrestore(&free_entries_lock, flags);
-
-   for (i = 0; i < delta; i++) {
-   entry = kzalloc(sizeof(*entry), GFP_KERNEL);
-   if (!entry)
-   break;
-
-   list_add_tail(&entry->list, &tmp);
-   }
-
-   spin_lock_irqsave(&free_entries_lock, flags);
-
-   list_splice(&tmp, &free_entries);
-   nr_total_entries += i;
-   num_free_entries += i;
+   ret = dma_debug_add_entries(delta, GFP_ATOMIC);
} else {
delta = nr_total_entries - num_entries;
 
@@ -746,11 +761,10 @@ int dma_debug_resize_entries(u32 num_entries)
}
 
nr_total_entries -= i;
+   if (nr_total_entries != num_entries)
+   ret = -EBUSY;
}
 
-   if (nr_total_entries != num_entries)
-   ret = 1;
-
	spin_unlock_irqrestore(&free_entries_lock, flags);
 
return ret;
@@ -764,36 +778,6 @@ int dma_debug_resize_entries(u32 num_entries)
  *   2. Preallocate a given number of dma_debug_entry structs
  */
 
-static int prealloc_memory(u32 num_entries)
-{
-   struct dma_debug_entry *entry, *next_entry;
-   int i;
-
-   for (i = 0; i < num_entries; ++i) {
-   entry = kzalloc(sizeof(*entry), GFP_KERNEL);
-   if (!entry)
-   goto out_err;
-
-   list_add_tail(&entry->list, &free_entries);
-   }
-
-   num_free_entries = num_entries;
-   min_free_entries = num_entries;
-
-   pr_info("preallocated %d debug entries\n", num_entries);
-
-   return 0;
-
-out_err:
-
-   list_for_each_entry_safe(entry, next_entry, &free_entries, list) {
-   list_del(&entry->list);
-   kfree(entry);
-   }
-
-   return -ENOMEM;
-}
-
 static ssize_t filter_read(struct file *file, char __user *user_buf,
   size_t count, loff_t *ppos)
 {
@@ -1038,14 +1022,15 @@ static int dma_debug_init(void)
return 0;
}
 
-   if (prealloc_memory(nr_prealloc_entries) != 0) {
+   if (dma_debug_add_entries(nr_prealloc_entries, GFP_KERNEL) != 0) {
pr_err("debugging out of memory error - disabled\n");
global_disable = true;
 
return 0;
}
 
-   nr_total_entries = num_free_entries;
+   min_free_entries = num_free_entries;
+   pr_info("preallocated %d debug entries\n", nr_total_entries);
 
dma_debug_initialized = true;
 
-- 
2.19.1.dirty

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 8/8] dma-debug: Batch dma_debug_entry allocation

2018-12-05 Thread Robin Murphy
DMA debug entries are one of those things which aren't that useful
individually - we will always want some larger quantity of them - and
that we don't really need to manage the exact number of - we only care
about having 'enough'. In that regard, the current behaviour of creating
them one-by-one from the slab allocator means an awful lot of function
call overhead and memory wasted on alignment padding.

Now that we don't have to worry about freeing anything via
dma_debug_resize_entries(), we can optimise the allocation behaviour by
grabbing whole pages at once, which will save considerably on the
aforementioned overheads, and probably offer a little more cache/TLB
locality benefit for traversing the lists under normal operation.

Since freeing a whole page of entries at once becomes enough of a
challenge that it's not really worth complicating dma_debug_init(),
we may as well tweak the preallocation behaviour so that as long as we
manage to allocate *some* pages, we can leave debugging enabled on a
best-effort basis rather than otherwise wasting them.

Signed-off-by: Robin Murphy 
---

v2: New

 Documentation/DMA-API.txt |  4 +++-
 kernel/dma/debug.c| 45 +--
 2 files changed, 22 insertions(+), 27 deletions(-)

diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt
index 7a7d8a415ce8..097c51b79330 100644
--- a/Documentation/DMA-API.txt
+++ b/Documentation/DMA-API.txt
@@ -747,7 +747,9 @@ driver afterwards. This filter can be disabled or changed later using debugfs.
 When the code disables itself at runtime this is most likely because it ran
 out of dma_debug_entries and was unable to allocate more on-demand. 65536
 entries are preallocated at boot - if this is too low for you boot with
-'dma_debug_entries=' to overwrite the default. The
+'dma_debug_entries=' to overwrite the default. Note
that the code allocates entries in batches, so the exact number of
+preallocated entries may be greater than the actual number requested. The
 code will print to the kernel log each time it has dynamically allocated
 as many entries as were initially preallocated. This is to indicate that a
 larger preallocation size may be appropriate, or if it happens continually
diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index 2202402afe9a..a6a603526c8f 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -48,7 +48,7 @@
 #define PREALLOC_DMA_DEBUG_ENTRIES (1 << 16)
 #endif
 /* If the pool runs out, add this many new entries at once */
-#define DMA_DEBUG_DYNAMIC_ENTRIES 256
+#define DMA_DEBUG_DYNAMIC_ENTRIES (PAGE_SIZE / sizeof(struct dma_debug_entry))
 
 enum {
dma_debug_single,
@@ -648,34 +648,22 @@ static void add_dma_entry(struct dma_debug_entry *entry)
 */
 }
 
-static int dma_debug_add_entries(u32 num_entries, gfp_t gfp)
+static int dma_debug_add_entries(gfp_t gfp)
 {
-   struct dma_debug_entry *entry, *next_entry;
-   LIST_HEAD(tmp);
+   struct dma_debug_entry *entry;
int i;
 
-   for (i = 0; i < num_entries; ++i) {
-   entry = kzalloc(sizeof(*entry), gfp);
-   if (!entry)
-   goto out_err;
+   entry = (void *)get_zeroed_page(gfp);
+   if (!entry)
+   return -ENOMEM;
 
-   list_add_tail(&entry->list, &tmp);
-   }
+   for (i = 0; i < DMA_DEBUG_DYNAMIC_ENTRIES; i++)
+   list_add_tail(&entry[i].list, &free_entries);
 
-   list_splice(&tmp, &free_entries);
-   num_free_entries += num_entries;
-   nr_total_entries += num_entries;
+   num_free_entries += DMA_DEBUG_DYNAMIC_ENTRIES;
+   nr_total_entries += DMA_DEBUG_DYNAMIC_ENTRIES;
 
return 0;
-
-out_err:
-
-   list_for_each_entry_safe(entry, next_entry, &tmp, list) {
-   list_del(&entry->list);
-   kfree(entry);
-   }
-
-   return -ENOMEM;
 }
 
 static struct dma_debug_entry *__dma_entry_alloc(void)
@@ -717,7 +705,7 @@ static struct dma_debug_entry *dma_entry_alloc(void)
 
	spin_lock_irqsave(&free_entries_lock, flags);
if (num_free_entries == 0) {
-   if (dma_debug_create_entries(DMA_DEBUG_DYNAMIC_ENTRIES, GFP_ATOMIC)) {
+   if (dma_debug_add_entries(GFP_ATOMIC)) {
global_disable = true;
	spin_unlock_irqrestore(&free_entries_lock, flags);
pr_err("debugging out of memory - disabling\n");
@@ -1008,15 +996,20 @@ static int dma_debug_init(void)
return 0;
}
 
-   if (dma_debug_add_entries(nr_prealloc_entries, GFP_KERNEL) != 0) {
+   for (i = 0; i < DIV_ROUND_UP(nr_prealloc_entries, DMA_DEBUG_DYNAMIC_ENTRIES); ++i)
+   dma_debug_add_entries(GFP_KERNEL);
+   if (num_free_entries >= nr_prealloc_entries) {
+   pr_info("preallocated %d debug entries\n", nr_total_entries);
+   } else if (num_free_entries > 0) {
+   pr_warn("%d debug entries requested but only %d allocated\n",
+  

[PATCH v2 5/8] dma-debug: Make leak-like behaviour apparent

2018-12-05 Thread Robin Murphy
Now that we can dynamically allocate DMA debug entries to cope with
drivers maintaining excessively large numbers of live mappings, a driver
which *does* actually have a bug leaking mappings (and is not unloaded)
will no longer trigger the "DMA-API: debugging out of memory - disabling"
message until it gets to actual kernel OOM conditions, which means it
could go unnoticed for a while. To that end, let's inform the user each
time the pool has grown to a multiple of its initial size, which should
make it apparent that they either have a leak or might want to increase
the preallocation size.

Signed-off-by: Robin Murphy 
---

v2: Move the suggestion of a leak from the message to the documentation

 Documentation/DMA-API.txt |  6 +-
 kernel/dma/debug.c| 13 +
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt
index 0fcb7561af1e..7a7d8a415ce8 100644
--- a/Documentation/DMA-API.txt
+++ b/Documentation/DMA-API.txt
@@ -747,7 +747,11 @@ driver afterwards. This filter can be disabled or changed later using debugfs.
 When the code disables itself at runtime this is most likely because it ran
 out of dma_debug_entries and was unable to allocate more on-demand. 65536
 entries are preallocated at boot - if this is too low for you boot with
-'dma_debug_entries=' to overwrite the default.
+'dma_debug_entries=' to overwrite the default. The
+code will print to the kernel log each time it has dynamically allocated
+as many entries as were initially preallocated. This is to indicate that a
+larger preallocation size may be appropriate, or if it happens continually
+that a driver may be leaking mappings.
 
 ::
 
diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index b1fe8b78acc4..cd9a12bf6a3a 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -693,6 +693,18 @@ static struct dma_debug_entry *__dma_entry_alloc(void)
return entry;
 }
 
+void __dma_entry_alloc_check_leak(void)
+{
+   u32 tmp = nr_total_entries % nr_prealloc_entries;
+
+   /* Shout each time we tick over some multiple of the initial pool */
+   if (tmp < DMA_DEBUG_DYNAMIC_ENTRIES) {
+   pr_info("dma_debug_entry pool grown to %u (%u00%%)\n",
+   nr_total_entries,
+   (nr_total_entries / nr_prealloc_entries));
+   }
+}
+
 /* struct dma_entry allocator
  *
  * The next two functions implement the allocator for
@@ -711,6 +723,7 @@ static struct dma_debug_entry *dma_entry_alloc(void)
pr_err("debugging out of memory - disabling\n");
return NULL;
}
+   __dma_entry_alloc_check_leak();
}
 
entry = __dma_entry_alloc();
-- 
2.19.1.dirty



[PATCH v2 4/8] dma-debug: Dynamically expand the dma_debug_entry pool

2018-12-05 Thread Robin Murphy
Certain drivers such as large multi-queue network adapters can use pools
of mapped DMA buffers larger than the default dma_debug_entry pool of
65536 entries, with the result that merely probing such a device can
cause DMA debug to disable itself during boot unless explicitly given an
appropriate "dma_debug_entries=..." option.

Developers trying to debug some other driver on such a system may not be
immediately aware of this, and at worst it can hide bugs if they fail to
realise that dma-debug has already disabled itself unexpectedly by the
time the code of interest gets to run. Even once they do realise, it can
be a bit of a pain to empirically determine a suitable number of
preallocated entries to configure without massively over-allocating.

There's really no need for such a static limit, though, since we can
quite easily expand the pool at runtime in those rare cases that the
preallocated entries are insufficient, which is arguably the least
surprising and most useful behaviour.

Signed-off-by: Robin Murphy 
---

v2: Use GFP_ATOMIC, make retry loop redundant, update documentation

 Documentation/DMA-API.txt | 11 +--
 kernel/dma/debug.c| 15 +--
 2 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt
index 6bdb095393b0..0fcb7561af1e 100644
--- a/Documentation/DMA-API.txt
+++ b/Documentation/DMA-API.txt
@@ -717,8 +717,8 @@ dma-api/num_errors   The number in this file shows how many
 dma-api/min_free_entries   This read-only file can be read to get the
minimum number of free dma_debug_entries the
allocator has ever seen. If this value goes
-   down to zero the code will disable itself
-   because it is not longer reliable.
+   down to zero the code will attempt to increase
+   nr_total_entries to compensate.
 
 dma-api/num_free_entries   The current number of free dma_debug_entries
in the allocator.
@@ -745,10 +745,9 @@ driver filter at boot time. The debug code will only print errors for that
 driver afterwards. This filter can be disabled or changed later using debugfs.
 
 When the code disables itself at runtime this is most likely because it ran
-out of dma_debug_entries. These entries are preallocated at boot. The number
-of preallocated entries is defined per architecture. If it is too low for you
-boot with 'dma_debug_entries=' to overwrite the
-architectural default.
+out of dma_debug_entries and was unable to allocate more on-demand. 65536
+entries are preallocated at boot - if this is too low for you boot with
+'dma_debug_entries=' to overwrite the default.
 
 ::
 
diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index 1b0858d7edfd..b1fe8b78acc4 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -47,6 +47,8 @@
 #ifndef PREALLOC_DMA_DEBUG_ENTRIES
 #define PREALLOC_DMA_DEBUG_ENTRIES (1 << 16)
 #endif
+/* If the pool runs out, add this many new entries at once */
+#define DMA_DEBUG_DYNAMIC_ENTRIES 256
 
 enum {
dma_debug_single,
@@ -702,12 +704,13 @@ static struct dma_debug_entry *dma_entry_alloc(void)
unsigned long flags;
 
	spin_lock_irqsave(&free_entries_lock, flags);
-
-   if (list_empty(&free_entries)) {
-   global_disable = true;
-   spin_unlock_irqrestore(&free_entries_lock, flags);
-   pr_err("debugging out of memory - disabling\n");
-   return NULL;
+   if (num_free_entries == 0) {
+   if (dma_debug_create_entries(DMA_DEBUG_DYNAMIC_ENTRIES, GFP_ATOMIC)) {
+   global_disable = true;
+   spin_unlock_irqrestore(&free_entries_lock, flags);
+   pr_err("debugging out of memory - disabling\n");
+   return NULL;
+   }
}
 
entry = __dma_entry_alloc();
-- 
2.19.1.dirty



[PATCH v2 1/8] dma-debug: Use pr_fmt()

2018-12-05 Thread Robin Murphy
Use pr_fmt() to generate the "DMA-API: " prefix consistently. This
results in it being added to a couple of pr_*() messages which were
missing it before, and for the err_printk() calls moves it to the actual
start of the message instead of somewhere in the middle.

Reviewed-by: Christoph Hellwig 
Signed-off-by: Robin Murphy 
---

v2: Add Christoph's review tag

 kernel/dma/debug.c | 74 --
 1 file changed, 38 insertions(+), 36 deletions(-)

diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index 231ca4628062..91b84140e4a5 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -17,6 +17,8 @@
  * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
  */
 
+#define pr_fmt(fmt) "DMA-API: " fmt
+
 #include 
 #include 
 #include 
@@ -234,7 +236,7 @@ static bool driver_filter(struct device *dev)
error_count += 1;   \
if (driver_filter(dev) &&   \
(show_all_errors || show_num_errors > 0)) { \
-   WARN(1, "%s %s: " format,   \
+   WARN(1, pr_fmt("%s %s: ") format,   \
 dev ? dev_driver_string(dev) : "NULL", \
 dev ? dev_name(dev) : "NULL", ## arg); \
dump_entry_trace(entry);\
@@ -519,7 +521,7 @@ static void active_cacheline_inc_overlap(phys_addr_t cln)
 * prematurely.
 */
WARN_ONCE(overlap > ACTIVE_CACHELINE_MAX_OVERLAP,
-		  "DMA-API: exceeded %d overlapping mappings of cacheline %pa\n",
+ pr_fmt("exceeded %d overlapping mappings of cacheline %pa\n"),
		  ACTIVE_CACHELINE_MAX_OVERLAP, &cln);
 }
 
@@ -614,7 +616,7 @@ void debug_dma_assert_idle(struct page *page)
 
cln = to_cacheline_number(entry);
err_printk(entry->dev, entry,
-		   "DMA-API: cpu touching an active dma mapped cacheline [cln=%pa]\n",
+  "cpu touching an active dma mapped cacheline [cln=%pa]\n",
		   &cln);
 }
 
@@ -634,7 +636,7 @@ static void add_dma_entry(struct dma_debug_entry *entry)
 
rc = active_cacheline_insert(entry);
if (rc == -ENOMEM) {
-   pr_err("DMA-API: cacheline tracking ENOMEM, dma-debug disabled\n");
+   pr_err("cacheline tracking ENOMEM, dma-debug disabled\n");
global_disable = true;
}
 
@@ -673,7 +675,7 @@ static struct dma_debug_entry *dma_entry_alloc(void)
	if (list_empty(&free_entries)) {
global_disable = true;
	spin_unlock_irqrestore(&free_entries_lock, flags);
-   pr_err("DMA-API: debugging out of memory - disabling\n");
+   pr_err("debugging out of memory - disabling\n");
return NULL;
}
 
@@ -777,7 +779,7 @@ static int prealloc_memory(u32 num_entries)
num_free_entries = num_entries;
min_free_entries = num_entries;
 
-   pr_info("DMA-API: preallocated %d debug entries\n", num_entries);
+   pr_info("preallocated %d debug entries\n", num_entries);
 
return 0;
 
@@ -850,7 +852,7 @@ static ssize_t filter_write(struct file *file, const char __user *userbuf,
 * switched off.
 */
if (current_driver_name[0])
-   pr_info("DMA-API: switching off dma-debug driver filter\n");
+   pr_info("switching off dma-debug driver filter\n");
current_driver_name[0] = 0;
current_driver = NULL;
goto out_unlock;
@@ -868,7 +870,7 @@ static ssize_t filter_write(struct file *file, const char __user *userbuf,
current_driver_name[i] = 0;
current_driver = NULL;
 
-   pr_info("DMA-API: enable driver filter for driver [%s]\n",
+   pr_info("enable driver filter for driver [%s]\n",
current_driver_name);
 
 out_unlock:
@@ -887,7 +889,7 @@ static int dma_debug_fs_init(void)
 {
dma_debug_dent = debugfs_create_dir("dma-api", NULL);
if (!dma_debug_dent) {
-   pr_err("DMA-API: can not create debugfs directory\n");
+   pr_err("can not create debugfs directory\n");
return -ENOMEM;
}
 
@@ -973,7 +975,7 @@ static int dma_debug_device_change(struct notifier_block *nb, unsigned long acti
count = device_dma_allocations(dev, );
if (count == 0)
break;
-   err_printk(dev, entry, "DMA-API: device driver has pending "
+   err_printk(dev, entry, "device driver has pending "
"DMA allocations while released from device "
"[count=%d]\n"
"One of leaked entries details: "
@@ -1023,14 +1025,14 @@ static int 

[PATCH v2 0/8] dma-debug cleanup and dynamic allocation

2018-12-05 Thread Robin Murphy
Hi all,

Here's some assorted cleanup and improvements to dma-debug which grew
out of the problem that certain drivers use very large numbers of DMA
mappings, and knowing when to override "dma_debug_entries=..." and what
value to override it with can be a less-than-obvious task for users.

The main part is patches #3 and #4, wherein we make dma-debug clever
enough to allocate more entries dynamically if needed, such that the
preallocation value becomes more of a quality-of-life option than a
necessity. Patches #6 and #7 do some cruft-removal to allow patch #8
to make the allocation behaviour more efficient in general.

Patches #1, #2 and #5 are some other cleanup and handy features which
fell out of the discussion/development.

Robin.


Robin Murphy (8):
  dma-debug: Use pr_fmt()
  dma-debug: Expose nr_total_entries in debugfs
  dma-debug: Refactor dma_debug_entry allocation
  dma-debug: Dynamically expand the dma_debug_entry pool
  dma-debug: Make leak-like behaviour apparent
  x86/dma/amd-gart: Stop resizing dma_debug_entry pool
  dma/debug: Remove dma_debug_resize_entries()
  dma-debug: Batch dma_debug_entry allocation

 Documentation/DMA-API.txt |  20 +-
 Documentation/x86/x86_64/boot-options.txt |   5 +-
 arch/x86/kernel/amd_gart_64.c |  23 ---
 include/linux/dma-debug.h |   7 -
 kernel/dma/debug.c| 211 ++
 5 files changed, 107 insertions(+), 159 deletions(-)

-- 
2.19.1.dirty



[PATCH v2 2/8] dma-debug: Expose nr_total_entries in debugfs

2018-12-05 Thread Robin Murphy
Expose nr_total_entries in debugfs, so that {num,min}_free_entries
become even more meaningful to users interested in current/maximum
utilisation. This becomes even more relevant once nr_total_entries
may change at runtime beyond just the existing AMD GART debug code.

Suggested-by: John Garry 
Signed-off-by: Robin Murphy 
---

v2: New

 Documentation/DMA-API.txt | 3 +++
 kernel/dma/debug.c| 7 +++
 2 files changed, 10 insertions(+)

diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt
index ac66ae2509a9..6bdb095393b0 100644
--- a/Documentation/DMA-API.txt
+++ b/Documentation/DMA-API.txt
@@ -723,6 +723,9 @@ dma-api/min_free_entries   This read-only file can be read to get the
 dma-api/num_free_entries   The current number of free dma_debug_entries
in the allocator.
 
+dma-api/nr_total_entries   The total number of dma_debug_entries in the
+   allocator, both free and used.
+
 dma-api/driver-filter  You can write a name of a driver into this file
to limit the debug output to requests from that
particular driver. Write an empty string to
diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index 91b84140e4a5..29486eb9d1dc 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -144,6 +144,7 @@ static struct dentry *show_all_errors_dent  __read_mostly;
 static struct dentry *show_num_errors_dent  __read_mostly;
 static struct dentry *num_free_entries_dent __read_mostly;
 static struct dentry *min_free_entries_dent __read_mostly;
+static struct dentry *nr_total_entries_dent __read_mostly;
 static struct dentry *filter_dent   __read_mostly;
 
 /* per-driver filter related state */
@@ -928,6 +929,12 @@ static int dma_debug_fs_init(void)
if (!min_free_entries_dent)
goto out_err;
 
+   nr_total_entries_dent = debugfs_create_u32("nr_total_entries", 0444,
+   dma_debug_dent,
+   &nr_total_entries);
+   if (!nr_total_entries_dent)
+   goto out_err;
+
filter_dent = debugfs_create_file("driver_filter", 0644,
  dma_debug_dent, NULL, &filter_fops);
if (!filter_dent)
-- 
2.19.1.dirty



Re: [PATCH] iommu: fix amd_iommu=force_isolation

2018-12-05 Thread Yu Zhao via iommu
On Wed, Dec 05, 2018 at 05:09:55PM +0100, Joerg Roedel wrote:
> On Tue, Dec 04, 2018 at 03:37:16PM -0700, Yu Zhao wrote:
> > The parameter is still there but it's ignored. We need to check its
> > value before deciding to go into passthrough mode for AMD IOMMU.
> > 
> > Fixes: aafd8ba0ca74 ("iommu/amd: Implement add_device and remove_device")
> > 
> > Signed-off-by: Yu Zhao 
> > ---
> >  drivers/iommu/amd_iommu.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> > index 1167ff0416cf..3e4219e6cff0 100644
> > --- a/drivers/iommu/amd_iommu.c
> > +++ b/drivers/iommu/amd_iommu.c
> > @@ -2195,7 +2195,8 @@ static int amd_iommu_add_device(struct device *dev)
> >  
> > BUG_ON(!dev_data);
> >  
> > -   if (iommu_pass_through || dev_data->iommu_v2)
> > +   if (iommu_pass_through ||
> > +   (!amd_iommu_force_isolation && dev_data->iommu_v2))
> 
> This breaks the iommu_v2 use-case, as it needs a direct mapping for the
> devices that support it.

Actually this is what we want. We occasionally use this parameter
to isolate v2 devices when debugging memory corruption that we
suspect is caused by DMA. It has helped us before. Now with the patch,
we caught another v2 device doing DMA writes to memory areas that
it's not supposed to touch.

> I think the force_isolation parameter does not make sense anymore today
> and should be removed.

Could you please suggest an alternative? Otherwise, we won't be able
to force v2 devices out of direct mapping for the reason mentioned
above.

Thank you.


Re: [PATCH 3/4] iommu/of: Don't call iommu_ops->add_device directly

2018-12-05 Thread Robin Murphy

On 05/12/2018 14:36, Joerg Roedel wrote:

From: Joerg Roedel 

Make sure to invoke this call-back through the proper
function of the IOMMU-API.

Signed-off-by: Joerg Roedel 
---
  drivers/iommu/of_iommu.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index c5dd63072529..4d4847de727e 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -218,10 +218,10 @@ const struct iommu_ops *of_iommu_configure(struct device *dev,
ops = dev->iommu_fwspec->ops;
/*
 * If we have reason to believe the IOMMU driver missed the initial
-* add_device callback for dev, replay it to get things in order.
+* probe for dev, replay it to get things in order.
 */
-   if (ops && ops->add_device && dev->bus && !dev->iommu_group)


Ugh, code I wrote...

I think that first ops test is serving double-duty - it's not just an 
"is ops->add_device safe to call?" check, but also the specific "do we 
need to try doing this at all?" check, since ops is only (potentially) 
set in the !err case at the top of this context.



-   err = ops->add_device(dev);
+   if (dev->bus && !dev->iommu_group)


Thus to avoid calling add_device erroneously in either of the "no valid 
IOMMU" cases (since dev->bus->iommu_ops may well be non-NULL), this 
still needs to be at least:


if (ops && dev->bus && !dev->iommu_group)

What I can't quite remember just now is whether it's actually valid to 
get here with err == 0 but dev->iommu_fwspec->ops == NULL, so it *might* 
be OK to use "!err" instead of "ops" to make things slightly more 
obvious - I'll work through the flow tomorrow to double-check.


Robin.


+   err = iommu_probe_device(dev);
  
  	/* Ignore all other errors apart from EPROBE_DEFER */

if (err == -EPROBE_DEFER) {




Re: [PATCH 2/4] iommu: Consolidate ->add/remove_device() calls

2018-12-05 Thread Robin Murphy

On 05/12/2018 14:36, Joerg Roedel wrote:

From: Joerg Roedel 

Put them into separate functions and call those where the
plain ops have been called before.

Signed-off-by: Joerg Roedel 
---
  drivers/iommu/iommu.c | 54 ++-
  include/linux/iommu.h |  3 +++
  2 files changed, 31 insertions(+), 26 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index edbdf5d6962c..ad4dc51eb69c 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -110,6 +110,26 @@ void iommu_device_unregister(struct iommu_device *iommu)
spin_unlock(_device_lock);
  }
  
+int iommu_probe_device(struct device *dev)

+{
+   const struct iommu_ops *ops = dev->bus->iommu_ops;
+
+   if (!ops->add_device)
+   return 0;


Is there any good reason to let .add_device/.remove_device be optional 
still? Everyone's implemented them for a while now, and on a practical 
level I don't really see how else we could expect devices to be taken in 
and out of their appropriate groups correctly.


Robin.


+
+   WARN_ON(dev->iommu_group);
+
+   return ops->add_device(dev);
+}
+
+void  iommu_release_device(struct device *dev)
+{
+   const struct iommu_ops *ops = dev->bus->iommu_ops;
+
+   if (ops->remove_device && dev->iommu_group)
+   ops->remove_device(dev);
+}
+
  static struct iommu_domain *__iommu_domain_alloc(struct bus_type *bus,
 unsigned type);
  static int __iommu_attach_device(struct iommu_domain *domain,
@@ -1117,16 +1137,7 @@ struct iommu_domain *iommu_group_default_domain(struct iommu_group *group)
  
  static int add_iommu_group(struct device *dev, void *data)

  {
-   struct iommu_callback_data *cb = data;
-   const struct iommu_ops *ops = cb->ops;
-   int ret;
-
-   if (!ops->add_device)
-   return 0;
-
-   WARN_ON(dev->iommu_group);
-
-   ret = ops->add_device(dev);
+   int ret = iommu_probe_device(dev);
  
  	/*

 * We ignore -ENODEV errors for now, as they just mean that the
@@ -1141,11 +1152,7 @@ static int add_iommu_group(struct device *dev, void *data)
  
  static int remove_iommu_group(struct device *dev, void *data)

  {
-   struct iommu_callback_data *cb = data;
-   const struct iommu_ops *ops = cb->ops;
-
-   if (ops->remove_device && dev->iommu_group)
-   ops->remove_device(dev);
+   iommu_release_device(dev);
  
  	return 0;

  }
@@ -1153,27 +1160,22 @@ static int remove_iommu_group(struct device *dev, void *data)
  static int iommu_bus_notifier(struct notifier_block *nb,
  unsigned long action, void *data)
  {
+   unsigned long group_action = 0;
struct device *dev = data;
-   const struct iommu_ops *ops = dev->bus->iommu_ops;
struct iommu_group *group;
-   unsigned long group_action = 0;
  
  	/*

 * ADD/DEL call into iommu driver ops if provided, which may
 * result in ADD/DEL notifiers to group->notifier
 */
if (action == BUS_NOTIFY_ADD_DEVICE) {
-   if (ops->add_device) {
-   int ret;
+   int ret;
  
-			ret = ops->add_device(dev);

-   return (ret) ? NOTIFY_DONE : NOTIFY_OK;
-   }
+   ret = iommu_probe_device(dev);
+   return (ret) ? NOTIFY_DONE : NOTIFY_OK;
} else if (action == BUS_NOTIFY_REMOVED_DEVICE) {
-   if (ops->remove_device && dev->iommu_group) {
-   ops->remove_device(dev);
-   return 0;
-   }
+   iommu_release_device(dev);
+   return NOTIFY_OK;
}
  
  	/*

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index a1d28f42cb77..2357841845bb 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -398,6 +398,9 @@ void iommu_fwspec_free(struct device *dev);
  int iommu_fwspec_add_ids(struct device *dev, u32 *ids, int num_ids);
  const struct iommu_ops *iommu_ops_from_fwnode(struct fwnode_handle *fwnode);
  
+int iommu_probe_device(struct device *dev);

+void  iommu_release_device(struct device *dev);
+
  #else /* CONFIG_IOMMU_API */
  
  struct iommu_ops {};



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 2/9] ACPI/IORT: Use helper functions to access dev->iommu_fwspec

2018-12-05 Thread Robin Murphy

On 04/12/2018 16:29, Joerg Roedel wrote:

From: Joerg Roedel 

Use the new helpers dev_iommu_fwspec_get()/set() to access
the dev->iommu_fwspec pointer. This makes it easier to move
that pointer later into another struct.

Cc: Lorenzo Pieralisi 
Signed-off-by: Joerg Roedel 
---
  drivers/acpi/arm64/iort.c | 12 +++-
  1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index 70f4e80b9246..754a67ba49e5 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -779,7 +779,7 @@ static inline bool iort_iommu_driver_enabled(u8 type)
  static struct acpi_iort_node *iort_get_msi_resv_iommu(struct device *dev)
  {
struct acpi_iort_node *iommu;
-   struct iommu_fwspec *fwspec = dev->iommu_fwspec;
+   struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
  
  	iommu = iort_get_iort_node(fwspec->iommu_fwnode);
  
@@ -824,6 +824,7 @@ static inline int iort_add_device_replay(const struct iommu_ops *ops,

   */
  int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head)
  {
+   struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
struct acpi_iort_its_group *its;
struct acpi_iort_node *iommu_node, *its_node = NULL;
int i, resv = 0;
@@ -841,9 +842,9 @@ int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head)
 * a given PCI or named component may map IDs to.
 */
  
-	for (i = 0; i < dev->iommu_fwspec->num_ids; i++) {

+   for (i = 0; i < fwspec->num_ids; i++) {
its_node = iort_node_map_id(iommu_node,
-   dev->iommu_fwspec->ids[i],
+   fwspec->ids[i],
NULL, IORT_MSI_TYPE);
if (its_node)
break;
@@ -1036,6 +1037,7 @@ void iort_dma_setup(struct device *dev, u64 *dma_addr, u64 *dma_size)
   */
  const struct iommu_ops *iort_iommu_configure(struct device *dev)
  {
+   struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
struct acpi_iort_node *node, *parent;
const struct iommu_ops *ops;
u32 streamid = 0;
@@ -1045,7 +1047,7 @@ const struct iommu_ops *iort_iommu_configure(struct device *dev)
 * If we already translated the fwspec there
 * is nothing left to do, return the iommu_ops.
 */
-   ops = iort_fwspec_iommu_ops(dev->iommu_fwspec);
+   ops = iort_fwspec_iommu_ops(fwspec);
if (ops)
return ops;
  
@@ -1084,7 +1086,7 @@ const struct iommu_ops *iort_iommu_configure(struct device *dev)

 * add_device callback for dev, replay it to get things in order.
 */
if (!err) {
-   ops = iort_fwspec_iommu_ops(dev->iommu_fwspec);
+   ops = iort_fwspec_iommu_ops(fwspec);


This needs to reload the new fwspec initialised by iort_iommu_xlate(), 
in the same manner as the OF code. I think the best thing to do is 
encapsulate the dev_iommu_fwspec_get() call in iort_fwspec_iommu_ops(), 
and have that take dev as its argument directly.


Robin.


err = iort_add_device_replay(ops, dev);
}
  




Re: [PATCH 7/9] iommu/of: Use helper functions to access dev->iommu_fwspec

2018-12-05 Thread Robin Murphy

On 04/12/2018 16:30, Joerg Roedel wrote:

From: Joerg Roedel 

Use the new helpers dev_iommu_fwspec_get()/set() to access
the dev->iommu_fwspec pointer. This makes it easier to move
that pointer later into another struct.

Signed-off-by: Joerg Roedel 
---
  drivers/iommu/of_iommu.c | 7 +--
  1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index c5dd63072529..38232250b5f4 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -164,7 +164,7 @@ const struct iommu_ops *of_iommu_configure(struct device *dev,
   struct device_node *master_np)
  {
const struct iommu_ops *ops = NULL;
-   struct iommu_fwspec *fwspec = dev->iommu_fwspec;
+   struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
int err = NO_IOMMU;
  
  	if (!master_np)

@@ -208,6 +208,9 @@ const struct iommu_ops *of_iommu_configure(struct device *dev,
}
}
  
+	/* The fwspec pointer changed, read it again */

+   fwspec = dev_iommu_fwspec_get(dev);


Nit: I think it makes sense to put this inside the "if (!err)" condition 
below rather than out here where it may or may not be relevant. The 
comment for that case is already supposed to imply that it's dealing 
with a fresh fwspec.


Robin.


+
/*
 * Two success conditions can be represented by non-negative err here:
 * >0 : there is no IOMMU, or one was unavailable for non-fatal reasons
@@ -215,7 +218,7 @@ const struct iommu_ops *of_iommu_configure(struct device *dev,
 * <0 : any actual error
 */
if (!err)
-   ops = dev->iommu_fwspec->ops;
+   ops = fwspec->ops;
/*
 * If we have reason to believe the IOMMU driver missed the initial
 * add_device callback for dev, replay it to get things in order.




[RFC v2] iommu/vt-d: Allow iommu_domain_alloc to allocate IOMMU_DOMAIN_DMA

2018-12-05 Thread James Sewart via iommu
Hey,

There exists an issue in the logic used to determine domain association 
with devices. Currently the driver uses find_or_alloc_domain to either 
reuse an existing domain or allocate a new one if one isn’t found. Domains 
should be shared between all members of an IOMMU group as this is the 
minimum granularity that we can guarantee address space isolation.

The intel IOMMU driver exposes pci_device_group in intel_iommu_ops as the 
function to call to determine the group of a device, this is implemented 
in the generic IOMMU code and checks: dma aliases, upstream pcie switch 
ACS, pci aliases, and pci function aliases. The find_or_alloc_domain code 
currently only uses dma aliases to determine if a domain is shared. This 
causes a disconnect between IOMMU groups and domains. We have observed 
devices under a pcie switch each having their own domain but assigned the 
same group.

One solution would be to fix the logic in find_or_alloc_domain to add 
checks for the other conditions that a device may share a domain. However, 
this duplicates code which the generic IOMMU code implements. Instead this 
issue can be fixed by allowing the allocation of default_domain on the 
IOMMU group. This is not currently supported as the intel driver does not 
allow allocation of domain type IOMMU_DOMAIN_DMA.

Allowing allocation of DMA domains has the effect that the default_domain 
is non NULL and is attached to a device when initialising. This delegates 
the handling of domains to the generic IOMMU code. Once this is 
implemented it is possible to remove the lazy allocation of domains 
entirely.

This patch implements DMA and identity domains to be allocated for 
external management. As it isn’t known which device will be attached to a 
domain, the dma domain is not initialised at alloc time. Instead it is 
allocated when attached. As we may lose RMRR mappings when attaching a 
device to a new domain, we also ensure these are mapped at attach time.

This will likely conflict with the work done for auxiliary domains by 
Baolu but the code to accommodate won’t change much.

I had also started on a patch to remove find_or_alloc_domain and various 
functions that call it but had issues with edge cases such as 
iommu_prepare_isa that is doing domain operations at IOMMU init time.

Cheers,
James.


---
 drivers/iommu/intel-iommu.c | 159 +---
 1 file changed, 110 insertions(+), 49 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 41a4b8808802..6437cb2e9b22 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -351,6 +351,14 @@ static int hw_pass_through = 1;
 /* si_domain contains mulitple devices */
 #define DOMAIN_FLAG_STATIC_IDENTITY	(1 << 1)
 
+/* Domain managed externally, don't cleanup if it isn't attached
+ * to any devices. */
+#define DOMAIN_FLAG_NO_CLEANUP (1 << 2)
+
+/* Set after domain initialisation. Used when allocating dma domains to
+ * defer domain initialisation until it is attached to a device */
+#define DOMAIN_FLAG_INITIALISED	(1 << 4)
+
 #define for_each_domain_iommu(idx, domain) \
for (idx = 0; idx < g_num_of_iommus; idx++) \
if (domain->iommu_refcnt[idx])
@@ -624,6 +632,16 @@ static inline int domain_type_is_vm_or_si(struct dmar_domain *domain)
DOMAIN_FLAG_STATIC_IDENTITY);
 }
 
+static inline int domain_managed_externally(struct dmar_domain *domain)
+{
+   return domain->flags & DOMAIN_FLAG_NO_CLEANUP;
+}
+
+static inline int domain_is_initialised(struct dmar_domain *domain)
+{
+   return domain->flags & DOMAIN_FLAG_INITIALISED;
+}
+
 static inline int domain_pfn_supported(struct dmar_domain *domain,
   unsigned long pfn)
 {
@@ -1717,7 +1735,7 @@ static void disable_dmar_iommu(struct intel_iommu *iommu)
 
__dmar_remove_one_dev_info(info);
 
-   if (!domain_type_is_vm_or_si(domain)) {
+   if (!domain_managed_externally(domain)) {
/*
 * The domain_exit() function  can't be called under
 * device_domain_lock, as it takes this lock itself.
@@ -1951,6 +1969,7 @@ static int domain_init(struct dmar_domain *domain, struct intel_iommu *iommu,
domain->pgd = (struct dma_pte *)alloc_pgtable_page(domain->nid);
if (!domain->pgd)
return -ENOMEM;
+   domain->flags |= DOMAIN_FLAG_INITIALISED;
__iommu_flush_cache(iommu, domain->pgd, PAGE_SIZE);
return 0;
 }
@@ -3234,11 +3253,9 @@ static int copy_translation_tables(struct intel_iommu *iommu)
 static int __init init_dmars(void)
 {
struct dmar_drhd_unit *drhd;
-   struct dmar_rmrr_unit *rmrr;
bool copied_tables = false;
-   struct device *dev;
struct intel_iommu *iommu;
-   int i, ret;
+   int ret;
 
/*
 * for each 

Re: [PATCH 2/5] iommu/of: Use device_iommu_mapped()

2018-12-05 Thread Robin Murphy

On 04/12/2018 17:25, Joerg Roedel wrote:

From: Joerg Roedel 

Use device_iommu_mapped() to check if the device is
already mapped by an IOMMU.


FWIW, this check (and its ACPI equivalent in patch #3) is specifically 
asking "has .add_device() already been called?", rather than the more 
general "is this device managed by an IOMMU?" (to which the exact answer 
at this point is "yes, provided we return successfully from here").


I have no objection to the change as-is - especially if that usage is 
within the intended scope of this API - I just wanted to call it out in 
case you're also planning to introduce something else which would be 
even more appropriate for that.


Robin.


Signed-off-by: Joerg Roedel 
---
  drivers/iommu/of_iommu.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index c5dd63072529..bfcf139503f0 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -220,7 +220,7 @@ const struct iommu_ops *of_iommu_configure(struct device *dev,
 * If we have reason to believe the IOMMU driver missed the initial
 * add_device callback for dev, replay it to get things in order.
 */
-   if (ops && ops->add_device && dev->bus && !dev->iommu_group)
+   if (ops && ops->add_device && dev->bus && !device_iommu_mapped(dev))
err = ops->add_device(dev);
  
  	/* Ignore all other errors apart from EPROBE_DEFER */





Re: [PATCH 0/5] Introduce device_iommu_mapped() function

2018-12-05 Thread Robin Murphy

Hi Joerg,

On 04/12/2018 17:24, Joerg Roedel wrote:

Hi,

here is a patch-set to replace the dev->iommu_group checks
in the source tree by a proper function call. The pointer
checks mostly happen to test whether a device is managed by
an IOMMU. For that purpose a plain pointer check is not very
descriptive, so replace it with a function call that makes
its purpose readable.


Nice, we can also clean up a whole load of vague iommu_present() usage 
and even one or two odd iommu_get_domain_for_dev() calls once we have this.



This also starts to remove direct access to the
dev->iommu_group pointer outside of iommu-code. This is
another move towards consolidating the various
iommu-related pointers in 'struct device' into one pointer
only.

Please review.

Thanks,

Joerg

Joerg Roedel (5):
   driver core: Introduce device_iommu_mapped() function
   iommu/of: Use device_iommu_mapped()
   ACPI/IORT: Use device_iommu_mapped()
   powerpc/iommu: Use device_iommu_mapped()
   xhci: Use device_iommu_mapped()

  arch/powerpc/kernel/eeh.c   |  2 +-
  arch/powerpc/kernel/iommu.c |  6 +++---
  drivers/acpi/arm64/iort.c   |  2 +-
  drivers/iommu/of_iommu.c|  2 +-
  drivers/usb/host/xhci.c |  2 +-
  include/linux/device.h  | 10 ++
  6 files changed, 17 insertions(+), 7 deletions(-)


There looks to be one more here:

drivers/dma/sh/rcar-dmac.c: rcar_dmac_probe()

Other than that and a minor comment on the OF/IORT part, though, for the 
whole series:


Acked-by: Robin Murphy 
Acked-by: Robin Murphy 


Re: [PATCH] iommu: fix amd_iommu=force_isolation

2018-12-05 Thread Joerg Roedel
On Tue, Dec 04, 2018 at 03:37:16PM -0700, Yu Zhao wrote:
> The parameter is still there but it's ignored. We need to check its
> value before deciding to go into passthrough mode for AMD IOMMU.
> 
> Fixes: aafd8ba0ca74 ("iommu/amd: Implement add_device and remove_device")
> 
> Signed-off-by: Yu Zhao 
> ---
>  drivers/iommu/amd_iommu.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> index 1167ff0416cf..3e4219e6cff0 100644
> --- a/drivers/iommu/amd_iommu.c
> +++ b/drivers/iommu/amd_iommu.c
> @@ -2195,7 +2195,8 @@ static int amd_iommu_add_device(struct device *dev)
>  
>   BUG_ON(!dev_data);
>  
> - if (iommu_pass_through || dev_data->iommu_v2)
> + if (iommu_pass_through ||
> + (!amd_iommu_force_isolation && dev_data->iommu_v2))

This breaks the iommu_v2 use-case, as it needs a direct mapping for the
devices that support it.

I think the force_isolation parameter does not make sense anymore today
and should be removed.


Regards,

Joerg



Re: [PATCH] Fix typo. Change tlb_range_add to iotlb_range_add and tlb_sync to iotlb_sync

2018-12-05 Thread Joerg Roedel
On Tue, Dec 04, 2018 at 06:27:34PM +, Tom Murphy wrote:
> From: tom 
> 
> Someone forgot to update this comment.
> 
> Signed-off-by: Tom Murphy 
> ---
>  include/linux/iommu.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Applied, thanks.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 04/12] iommu/vt-d: Add 256-bit invalidation descriptor support

2018-12-05 Thread Joerg Roedel
On Tue, Dec 04, 2018 at 02:13:31PM +0800, Lu Baolu wrote:
> The existing code uses GFP_ATOMIC, this patch only changes the size of
> the allocated desc_page.
> 
> I don't think we really need GFP_ATOMIC here (and also for some other
> places). I will clean up them in a separated patch.

Okay, thanks.

> > In this patch, there is some code like the code below. It calculates
> > destination address of memcpy with qi->desc. If it's still struct qi_desc
> > pointer, the calculation result would be wrong.
> > 
> > +   memcpy(desc, qi->desc + (wait_index << shift),
> > +  1 << shift);
> > 
> > The change of the calculation method is to support 128 bits invalidation
> > descriptors and 256 invalidation descriptors in this unified code logic.
> > 
> > Also, the conversation between Baolu and me may help.
> > 
> > https://lore.kernel.org/patchwork/patch/1006756/
> 
> Yes. We need to support different descriptor size.

Okay, pointer arithmetic on void* isn't well defined in the C standard,
afaik. But it should work with GCC, so it's probably fine.

Unrelated to this patch-set, the whole qi management code needs some
cleanups, it queues a sync after every command and has very tricky
locking. This patch-set further complicates matters there, so it is
probably time for a clean re-write of that part?

Regards,

Joerg


Re: [PATCH v5 02/12] iommu/vt-d: Manage scalable mode PASID tables

2018-12-05 Thread Joerg Roedel
On Tue, Dec 04, 2018 at 01:58:06PM +0800, Lu Baolu wrote:
> This function is called in an unsleepable context.
> 
> spin_lock()
> [...]
> if (pasid_table_is_necessary)
>   allocate_pasid_table(dev)
> [...]
> spin_unlock()
> 
> We can move it out of the lock range.
> 
> How about
> 
> if (pasid_table_is_necessary)
>   pasid_table = allocate_pasid_table(dev)
> 
> spin_lock()
> [...]
> if (pasid_table_is_necessary)
>   set_up_pasid_table(pasid_table)
> [...]
> spin_unlock()

Hmm, so when the IOMMU is configured in scalable mode we can just
allocate a pasid-table for the device when we set it up, right?

Scalable mode is a boot-time decision, so we know for sure whether we
need a pasid-table at device-setup time. And the device-setup code is
preemptible, so I think this allocation should be outside of any
spin-locked section.


Joerg


Re: [PATCH v5 02/12] iommu/vt-d: Manage scalable mode PASID tables

2018-12-05 Thread Joerg Roedel
On Mon, Dec 03, 2018 at 05:23:40PM +, Liu, Yi L wrote:
> So for a device without PASID support, we need to at least to have a PASID
> entry so that its DMA request (without pasid) can be translated. Thus a PASID
> table is needed for such devices.

That makes sense, thanks for the explanation.


Joerg



Re: [PATCH 2/4] iommu: Consolidate ->add/remove_device() calls

2018-12-05 Thread Joerg Roedel
On Wed, Dec 05, 2018 at 06:54:18AM -0800, Christoph Hellwig wrote:
> > +void  iommu_release_device(struct device *dev)
> 
> Nitpick: there seems to be a double space here.
> 
> > +int iommu_probe_device(struct device *dev);
> > +void  iommu_release_device(struct device *dev);
> 
> .. and here.

Right, thanks. I fixed it in my development branch.


Re: [PATCH 2/4] iommu: Consolidate ->add/remove_device() calls

2018-12-05 Thread Christoph Hellwig
> +void  iommu_release_device(struct device *dev)

Nitpick: there seems to be a double space here.

> +int iommu_probe_device(struct device *dev);
> +void  iommu_release_device(struct device *dev);

.. and here.


Re: [PATCH v4 3/3] iommu/io-pgtable-arm-v7s: Request DMA32 memory, and improve debugging

2018-12-05 Thread Vlastimil Babka
On 12/5/18 6:48 AM, Nicolas Boichat wrote:
> IOMMUs using ARMv7 short-descriptor format require page tables
> (level 1 and 2) to be allocated within the first 4GB of RAM, even
> on 64-bit systems.
> 
> For level 1/2 pages, ensure GFP_DMA32 is used if CONFIG_ZONE_DMA32
> is defined (e.g. on arm64 platforms).
> 
> For level 2 pages, allocate a slab cache in SLAB_CACHE_DMA32.
> 
> Also, print an error when the physical address does not fit in
> 32-bit, to make debugging easier in the future.
> 
> Fixes: ad67f5a6545f ("arm64: replace ZONE_DMA with ZONE_DMA32")
> Signed-off-by: Nicolas Boichat 
> ---
> 
> Changes since v2:
>  - Commit message
> 
> (v3 used the page_frag approach)
> 
>  drivers/iommu/io-pgtable-arm-v7s.c | 20 
>  1 file changed, 16 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c
> index 445c3bde04800c..996f7b6d00b44a 100644
> --- a/drivers/iommu/io-pgtable-arm-v7s.c
> +++ b/drivers/iommu/io-pgtable-arm-v7s.c
> @@ -161,6 +161,14 @@
>  
>  #define ARM_V7S_TCR_PD1  BIT(5)
>  
> +#ifdef CONFIG_ZONE_DMA32
> +#define ARM_V7S_TABLE_GFP_DMA GFP_DMA32
> +#define ARM_V7S_TABLE_SLAB_CACHE SLAB_CACHE_DMA32
> +#else
> +#define ARM_V7S_TABLE_GFP_DMA GFP_DMA
> +#define ARM_V7S_TABLE_SLAB_CACHE SLAB_CACHE_DMA
> +#endif
> +
>  typedef u32 arm_v7s_iopte;
>  
>  static bool selftest_running;
> @@ -198,13 +206,17 @@ static void *__arm_v7s_alloc_table(int lvl, gfp_t gfp,
>   void *table = NULL;
>  
>   if (lvl == 1)
> - table = (void *)__get_dma_pages(__GFP_ZERO, get_order(size));
> + table = (void *)__get_free_pages(
> + __GFP_ZERO | ARM_V7S_TABLE_GFP_DMA, get_order(size));
>   else if (lvl == 2)
> - table = kmem_cache_zalloc(data->l2_tables, gfp | GFP_DMA);
> + table = kmem_cache_zalloc(data->l2_tables,
> +   gfp | ARM_V7S_TABLE_GFP_DMA);

So as I've explained in 2/3, you don't need ARM_V7S_TABLE_GFP_DMA here
(and then you don't need to adjust the slab warnings).

>   phys = virt_to_phys(table);
> - if (phys != (arm_v7s_iopte)phys)
> + if (phys != (arm_v7s_iopte)phys) {
>   /* Doesn't fit in PTE */
> + dev_err(dev, "Page table does not fit in PTE: %pa", &phys);
>   goto out_free;
> + }
>   if (table && !(cfg->quirks & IO_PGTABLE_QUIRK_NO_DMA)) {
>   dma = dma_map_single(dev, table, size, DMA_TO_DEVICE);
>   if (dma_mapping_error(dev, dma))
> @@ -737,7 +749,7 @@ static struct io_pgtable *arm_v7s_alloc_pgtable(struct io_pgtable_cfg *cfg,
>   data->l2_tables = kmem_cache_create("io-pgtable_armv7s_l2",
>   ARM_V7S_TABLE_SIZE(2),
>   ARM_V7S_TABLE_SIZE(2),
> - SLAB_CACHE_DMA, NULL);
> + ARM_V7S_TABLE_SLAB_CACHE, NULL);
>   if (!data->l2_tables)
>   goto out_free_data;
>  
> 



Re: [PATCH v4 3/3] iommu/io-pgtable-arm-v7s: Request DMA32 memory, and improve debugging

2018-12-05 Thread Will Deacon
On Wed, Dec 05, 2018 at 06:43:08AM -0800, Christoph Hellwig wrote:
> On Wed, Dec 05, 2018 at 02:40:06PM +, Robin Murphy wrote:
> > 32-bit Arm doesn't have ZONE_DMA32, but has (or at least had at the time) a
> > 2GB ZONE_DMA. Whether we actually need that or not depends on how this all
> > interacts with LPAE and highmem, but I'm not sure of those details off-hand.
> 
> Well, arm32 can't address more than 32-bits in the linear kernel
> mapping, so GFP_KERNEL should be perfectly fine there if the limit
> really is 32-bits and not 31 or smaller because someone stole a bit
> or two somewhere.

I'm not sure that's necessarily true on the physical side. Wasn't there a
keystone SoC with /all/ the coherent memory above 4GB?

Will
Will


Re: [PATCH v4 3/3] iommu/io-pgtable-arm-v7s: Request DMA32 memory, and improve debugging

2018-12-05 Thread Christoph Hellwig
On Wed, Dec 05, 2018 at 02:40:06PM +, Robin Murphy wrote:
> 32-bit Arm doesn't have ZONE_DMA32, but has (or at least had at the time) a
> 2GB ZONE_DMA. Whether we actually need that or not depends on how this all
> interacts with LPAE and highmem, but I'm not sure of those details off-hand.

Well, arm32 can't address more than 32-bits in the linear kernel
mapping, so GFP_KERNEL should be perfectly fine there if the limit
really is 32-bits and not 31 or smaller because someone stole a bit
or two somewhere.


Re: [PATCH v2 0/3] iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables

2018-12-05 Thread Will Deacon
On Wed, Dec 05, 2018 at 10:04:00AM +0800, Nicolas Boichat wrote:
> On Tue, Dec 4, 2018 at 10:35 PM Vlastimil Babka  wrote:
> >
> > On 12/4/18 10:37 AM, Nicolas Boichat wrote:
> > > On Sun, Nov 11, 2018 at 5:04 PM Nicolas Boichat  
> > > wrote:
> > >>
> > >> This is a follow-up to the discussion in [1], to make sure that the page
> > >> tables allocated by iommu/io-pgtable-arm-v7s are contained within 32-bit
> > >> physical address space.
> > >>
> > >> [1] 
> > >> https://lists.linuxfoundation.org/pipermail/iommu/2018-November/030876.html
> > >
> > > Hi everyone,
> > >
> > > Let's try to summarize here.
> > >
> > > First, we confirmed that this is a regression, and IOMMU errors happen
> > > on 4.19 and linux-next/master on MT8173 (elm, Acer Chromebook R13).
> > > The issue most likely starts from ad67f5a6545f ("arm64: replace
> > > ZONE_DMA with ZONE_DMA32"), i.e. 4.15, and presumably breaks a number
> > > of Mediatek platforms (and maybe others?).
> > >
> > > We have a few options here:
> > > 1. This series [2], that adds support for GFP_DMA32 slab caches,
> > > _without_ adding kmalloc caches (since there are no users of
> > > kmalloc(..., GFP_DMA32)). I think I've addressed all the comments on
> > > the 3 patches, and AFAICT this solution works fine.
> > > 2. genalloc. That works, but unless we preallocate 4MB for L2 tables
> > > (which is wasteful as we usually only need a handful of L2 tables),
> > > we'll need changes in the core (use GFP_ATOMIC) to allow allocating on
> > > demand, and as it stands we'd have no way to shrink the allocation.
> > > 3. page_frag [3]. That works fine, and the code is quite simple. One
> > > drawback is that fragments in partially freed pages cannot be reused
> > > (from limited experiments, I see that IOMMU L2 tables are rarely
> > > freed, so it's unlikely a whole page would get freed). But given the
> > > low number of L2 tables, maybe we can live with that.
> > >
> > > I think 2 is out. Any preference between 1 and 3? I think 1 makes
> > > better use of the memory, so that'd be my preference. But I'm probably
> > > missing something.
> >
> > I would prefer 1 as well. IIRC you already confirmed that alignment
> > requirements are not broken for custom kmem caches even in presence of
> > SLUB debug options (and I would say it's a bug to be fixed if they
> > weren't).
> 
> > I just asked (and didn't get a reply I think) about your
> > ability to handle the GFP_ATOMIC allocation failures. They should be
> > rare when only single page allocations are needed for the kmem cache.
> > But in case they are not an option, then preallocating would be needed,
> > thus probably option 2.
> 
> Oh, sorry, I missed your question.
> 
> I don't have a full answer, but:
> - The allocations themselves are rare (I count a few 10s of L2 tables
> at most on my system, I assume we rarely have >100), and yes, we only
> need a single page, so the failures should be exceptional.
> - My change is probably not making anything worse: I assume that even
> with the current approach using GFP_DMA slab caches on older kernels,
> failures could potentially happen. I don't think we've seen those. If
> we are really concerned about this, maybe we'd need to modify
> mtk_iommu_map to not hold a spinlock (if that's possible), so we don't
> need to use GFP_ATOMIC. I suggest we just keep an eye on such issues,
> and address them if they show up (we can even revisit genalloc at that
> stage).

I think the spinlock is the least of our worries: the map/unmap routines
can be called in irq context and may need to allocate second-level tables.

Will
Will


Re: [PATCH v4 3/3] iommu/io-pgtable-arm-v7s: Request DMA32 memory, and improve debugging

2018-12-05 Thread Robin Murphy

On 05/12/2018 13:54, Christoph Hellwig wrote:

On Wed, Dec 05, 2018 at 01:48:28PM +0800, Nicolas Boichat wrote:

IOMMUs using ARMv7 short-descriptor format require page tables
(level 1 and 2) to be allocated within the first 4GB of RAM, even
on 64-bit systems.



+#ifdef CONFIG_ZONE_DMA32
+#define ARM_V7S_TABLE_GFP_DMA GFP_DMA32
+#define ARM_V7S_TABLE_SLAB_CACHE SLAB_CACHE_DMA32
+#else
+#define ARM_V7S_TABLE_GFP_DMA GFP_DMA
+#define ARM_V7S_TABLE_SLAB_CACHE SLAB_CACHE_DMA
+#endif


How does using GFP_DMA make sense based on the above?  If the system
has more than 32-bits worth of RAM it should be using GFP_DMA32, else
GFP_KERNEL, not GFP_DMA for an arch defined small addressability pool.


32-bit Arm doesn't have ZONE_DMA32, but has (or at least had at the 
time) a 2GB ZONE_DMA. Whether we actually need that or not depends on 
how this all interacts with LPAE and highmem, but I'm not sure of those 
details off-hand.


Robin.


[PATCH 1/4] iommu/sysfs: Rename iommu_release_device()

2018-12-05 Thread Joerg Roedel
From: Joerg Roedel 

Remove the iommu_ prefix from the function and a few other
static data structures so that the iommu_release_device name
can be re-used in iommu core code.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/iommu-sysfs.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/iommu-sysfs.c b/drivers/iommu/iommu-sysfs.c
index 36d1a7ce7fc4..71c2249d3260 100644
--- a/drivers/iommu/iommu-sysfs.c
+++ b/drivers/iommu/iommu-sysfs.c
@@ -22,25 +22,25 @@ static struct attribute *devices_attr[] = {
NULL,
 };
 
-static const struct attribute_group iommu_devices_attr_group = {
+static const struct attribute_group devices_attr_group = {
.name = "devices",
.attrs = devices_attr,
 };
 
-static const struct attribute_group *iommu_dev_groups[] = {
-   &iommu_devices_attr_group,
+static const struct attribute_group *dev_groups[] = {
+   &devices_attr_group,
NULL,
 };
 
-static void iommu_release_device(struct device *dev)
+static void release_device(struct device *dev)
 {
kfree(dev);
 }
 
 static struct class iommu_class = {
.name = "iommu",
-   .dev_release = iommu_release_device,
-   .dev_groups = iommu_dev_groups,
+   .dev_release = release_device,
+   .dev_groups = dev_groups,
 };
 
 static int __init iommu_dev_init(void)
-- 
2.17.1



[PATCH 2/4] iommu: Consolidate ->add/remove_device() calls

2018-12-05 Thread Joerg Roedel
From: Joerg Roedel 

Put them into separate functions and call those where the
plain ops have been called before.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/iommu.c | 54 ++-
 include/linux/iommu.h |  3 +++
 2 files changed, 31 insertions(+), 26 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index edbdf5d6962c..ad4dc51eb69c 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -110,6 +110,26 @@ void iommu_device_unregister(struct iommu_device *iommu)
	spin_unlock(&iommu_device_lock);
 }
 
+int iommu_probe_device(struct device *dev)
+{
+   const struct iommu_ops *ops = dev->bus->iommu_ops;
+
+   if (!ops->add_device)
+   return 0;
+
+   WARN_ON(dev->iommu_group);
+
+   return ops->add_device(dev);
+}
+
+void  iommu_release_device(struct device *dev)
+{
+   const struct iommu_ops *ops = dev->bus->iommu_ops;
+
+   if (ops->remove_device && dev->iommu_group)
+   ops->remove_device(dev);
+}
+
 static struct iommu_domain *__iommu_domain_alloc(struct bus_type *bus,
 unsigned type);
 static int __iommu_attach_device(struct iommu_domain *domain,
@@ -1117,16 +1137,7 @@ struct iommu_domain *iommu_group_default_domain(struct iommu_group *group)
 
 static int add_iommu_group(struct device *dev, void *data)
 {
-   struct iommu_callback_data *cb = data;
-   const struct iommu_ops *ops = cb->ops;
-   int ret;
-
-   if (!ops->add_device)
-   return 0;
-
-   WARN_ON(dev->iommu_group);
-
-   ret = ops->add_device(dev);
+   int ret = iommu_probe_device(dev);
 
/*
 * We ignore -ENODEV errors for now, as they just mean that the
@@ -1141,11 +1152,7 @@ static int add_iommu_group(struct device *dev, void *data)
 
 static int remove_iommu_group(struct device *dev, void *data)
 {
-   struct iommu_callback_data *cb = data;
-   const struct iommu_ops *ops = cb->ops;
-
-   if (ops->remove_device && dev->iommu_group)
-   ops->remove_device(dev);
+   iommu_release_device(dev);
 
return 0;
 }
@@ -1153,27 +1160,22 @@ static int remove_iommu_group(struct device *dev, void *data)
 static int iommu_bus_notifier(struct notifier_block *nb,
  unsigned long action, void *data)
 {
+   unsigned long group_action = 0;
struct device *dev = data;
-   const struct iommu_ops *ops = dev->bus->iommu_ops;
struct iommu_group *group;
-   unsigned long group_action = 0;
 
/*
 * ADD/DEL call into iommu driver ops if provided, which may
 * result in ADD/DEL notifiers to group->notifier
 */
if (action == BUS_NOTIFY_ADD_DEVICE) {
-   if (ops->add_device) {
-   int ret;
+   int ret;
 
-   ret = ops->add_device(dev);
-   return (ret) ? NOTIFY_DONE : NOTIFY_OK;
-   }
+   ret = iommu_probe_device(dev);
+   return (ret) ? NOTIFY_DONE : NOTIFY_OK;
} else if (action == BUS_NOTIFY_REMOVED_DEVICE) {
-   if (ops->remove_device && dev->iommu_group) {
-   ops->remove_device(dev);
-   return 0;
-   }
+   iommu_release_device(dev);
+   return NOTIFY_OK;
}
 
/*
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index a1d28f42cb77..2357841845bb 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -398,6 +398,9 @@ void iommu_fwspec_free(struct device *dev);
 int iommu_fwspec_add_ids(struct device *dev, u32 *ids, int num_ids);
 const struct iommu_ops *iommu_ops_from_fwnode(struct fwnode_handle *fwnode);
 
+int iommu_probe_device(struct device *dev);
+void  iommu_release_device(struct device *dev);
+
 #else /* CONFIG_IOMMU_API */
 
 struct iommu_ops {};
-- 
2.17.1



[PATCH 0/4] Consolidate iommu_ops->add/remove_device() calls

2018-12-05 Thread Joerg Roedel
Hi,

this patch-set consolidates the calls to iommu_ops->add_device()
and remove_device(), together with the necessary checks, into
probe/release functions that can be extended later with more
setup work.

Regards,

   Joerg

Joerg Roedel (4):
  iommu/sysfs: Rename iommu_release_device()
  iommu: Consolidate ->add/remove_device() calls
  iommu/of: Don't call iommu_ops->add_device directly
  ACPI/IORT: Don't call iommu_ops->add_device directly

 drivers/acpi/arm64/iort.c   |  4 +--
 drivers/iommu/iommu-sysfs.c | 12 -
 drivers/iommu/iommu.c   | 54 +++--
 drivers/iommu/of_iommu.c|  6 ++---
 include/linux/iommu.h   |  3 +++
 5 files changed, 42 insertions(+), 37 deletions(-)

-- 
2.17.1
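The consolidation the cover letter describes can be sketched as a small userspace C model; the types, names, and return conventions below are simplified stand-ins for illustration (not the kernel's), and only the control flow mirrors the patch-set:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct device;

struct iommu_ops {
    int (*add_device)(struct device *dev);
    void (*remove_device)(struct device *dev);
};

struct device {
    const struct iommu_ops *ops; /* stands in for dev->bus->iommu_ops */
    int in_group;                /* stands in for dev->iommu_group */
};

/* Consolidated probe: succeed silently when the driver has no callback. */
int probe_device(struct device *dev)
{
    if (!dev->ops->add_device)
        return 0;
    return dev->ops->add_device(dev);
}

/* Consolidated release: call back only if the device was actually added. */
void release_device(struct device *dev)
{
    if (dev->ops->remove_device && dev->in_group)
        dev->ops->remove_device(dev);
}
```

Every former call site (the bus notifier, add_iommu_group(), and the OF and IORT replay paths) then funnels through these two functions instead of open-coding the NULL-callback and iommu_group checks.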



[PATCH 3/4] iommu/of: Don't call iommu_ops->add_device directly

2018-12-05 Thread Joerg Roedel
From: Joerg Roedel 

Make sure to invoke this call-back through the proper
function of the IOMMU-API.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/of_iommu.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index c5dd63072529..4d4847de727e 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -218,10 +218,10 @@ const struct iommu_ops *of_iommu_configure(struct device *dev,
ops = dev->iommu_fwspec->ops;
/*
 * If we have reason to believe the IOMMU driver missed the initial
-* add_device callback for dev, replay it to get things in order.
+* probe for dev, replay it to get things in order.
 */
-   if (ops && ops->add_device && dev->bus && !dev->iommu_group)
-   err = ops->add_device(dev);
+   if (dev->bus && !dev->iommu_group)
+   err = iommu_probe_device(dev);
 
/* Ignore all other errors apart from EPROBE_DEFER */
if (err == -EPROBE_DEFER) {
-- 
2.17.1



[PATCH 4/4] ACPI/IORT: Don't call iommu_ops->add_device directly

2018-12-05 Thread Joerg Roedel
From: Joerg Roedel 

Make sure to invoke this call-back through the proper
function of the IOMMU-API.

Signed-off-by: Joerg Roedel 
---
 drivers/acpi/arm64/iort.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index 70f4e80b9246..d4f7c1adc048 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -805,8 +805,8 @@ static inline int iort_add_device_replay(const struct iommu_ops *ops,
 {
int err = 0;
 
-   if (ops->add_device && dev->bus && !dev->iommu_group)
-   err = ops->add_device(dev);
+   if (dev->bus && !dev->iommu_group)
+   err = iommu_probe_device(dev);
 
return err;
 }
-- 
2.17.1



Re: use generic DMA mapping code in powerpc V4

2018-12-05 Thread Christoph Hellwig
On Wed, Dec 05, 2018 at 10:44:05AM +0100, Christian Zigotzky wrote:
> Thanks for your reply. I undid all dma mapping commits with the following 
> command:
>
> git checkout 721c01ba8b46ddb5355bd6e6b3bbfdabfdf01e97
>
> After that I compiled the kernels with this code for my P5020 board (Cyrus) 
> and for my PASEMI board (Nemo) today.
>
> Result: PASEMI onboard ethernet works again and the P5020 board boots.
>
> It seems the dma mapping commits are the problem.

Thanks.  Can you try a few stepping points in the tree?

First just with commit 7fd3bb05b73beea1f9840b505aa09beb9c75a8c6
(the first one) applied?

Second with all commits up to 5da11e49df21f21dac25a2491aa788307bdacb6b

And if that still works with commits up to
c1bfcad4b0cf38ce5b00f7ad880d3a13484c123a


Re: [PATCH v4 2/3] mm: Add support for kmem caches in DMA32 zone

2018-12-05 Thread Vlastimil Babka
On 12/5/18 6:48 AM, Nicolas Boichat wrote:
> In some cases (e.g. IOMMU ARMv7s page allocator), we need to allocate
> data structures smaller than a page with GFP_DMA32 flag.
> 
> This change makes it possible to create a custom cache in DMA32 zone
> using kmem_cache_create, then allocate memory using kmem_cache_alloc.
> 
> We do not create a DMA32 kmalloc cache array, as there are currently
> no users of kmalloc(..., GFP_DMA32). The new test in check_slab_flags
> ensures that such calls still fail (as they do before this change).
> 
> Fixes: ad67f5a6545f ("arm64: replace ZONE_DMA with ZONE_DMA32")

Same as my comment for 1/3.

> Signed-off-by: Nicolas Boichat 

In general,
Acked-by: Vlastimil Babka 

Some comments below:

> ---
> 
> Changes since v2:
>  - Clarified commit message
>  - Add entry in sysfs-kernel-slab to document the new sysfs file
> 
> (v3 used the page_frag approach)
> 
> Documentation/ABI/testing/sysfs-kernel-slab |  9 +
>  include/linux/slab.h|  2 ++
>  mm/internal.h   |  8 ++--
>  mm/slab.c   |  4 +++-
>  mm/slab.h   |  3 ++-
>  mm/slab_common.c|  2 +-
>  mm/slub.c   | 18 +-
>  7 files changed, 40 insertions(+), 6 deletions(-)
> 
> diff --git a/Documentation/ABI/testing/sysfs-kernel-slab 
> b/Documentation/ABI/testing/sysfs-kernel-slab
> index 29601d93a1c2ea..d742c6cfdffbe9 100644
> --- a/Documentation/ABI/testing/sysfs-kernel-slab
> +++ b/Documentation/ABI/testing/sysfs-kernel-slab
> @@ -106,6 +106,15 @@ Description:
>   are from ZONE_DMA.
>   Available when CONFIG_ZONE_DMA is enabled.
>  
> +What:/sys/kernel/slab/cache/cache_dma32
> +Date:December 2018
> +KernelVersion:   4.21
> +Contact: Nicolas Boichat 
> +Description:
> + The cache_dma32 file is read-only and specifies whether objects
> + are from ZONE_DMA32.
> + Available when CONFIG_ZONE_DMA32 is enabled.

I don't have a strong opinion. It's a new file, yeah, but consistent
with already existing ones. I'd leave the decision with SL*B maintainers.

>  What:/sys/kernel/slab/cache/cpu_slabs
>  Date:May 2007
>  KernelVersion:   2.6.22
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index 11b45f7ae4057c..9449b19c5f107a 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -32,6 +32,8 @@
>  #define SLAB_HWCACHE_ALIGN   ((slab_flags_t __force)0x2000U)
>  /* Use GFP_DMA memory */
>  #define SLAB_CACHE_DMA   ((slab_flags_t __force)0x4000U)
> +/* Use GFP_DMA32 memory */
> +#define SLAB_CACHE_DMA32 ((slab_flags_t __force)0x8000U)
>  /* DEBUG: Store the last owner for bug hunting */
>  #define SLAB_STORE_USER  ((slab_flags_t __force)0x0001U)
>  /* Panic if kmem_cache_create() fails */
> diff --git a/mm/internal.h b/mm/internal.h
> index a2ee82a0cd44ae..fd244ad716eaf8 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -14,6 +14,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  /*
> @@ -34,9 +35,12 @@
>  #define GFP_CONSTRAINT_MASK (__GFP_HARDWALL|__GFP_THISNODE)
>  
>  /* Check for flags that must not be used with a slab allocator */
> -static inline gfp_t check_slab_flags(gfp_t flags)
> +static inline gfp_t check_slab_flags(gfp_t flags, slab_flags_t slab_flags)
>  {
> - gfp_t bug_mask = __GFP_DMA32 | __GFP_HIGHMEM | ~__GFP_BITS_MASK;
> + gfp_t bug_mask = __GFP_HIGHMEM | ~__GFP_BITS_MASK;
> +
> + if (!IS_ENABLED(CONFIG_ZONE_DMA32) || !(slab_flags & SLAB_CACHE_DMA32))
> + bug_mask |= __GFP_DMA32;

I'll point out that this is not even strictly needed AFAICS, as only
flags passed to kmem_cache_alloc() are checked - the cache->allocflags
derived from SLAB_CACHE_DMA32 are appended only after check_slab_flags()
(in both SLAB and SLUB AFAICS). And for a cache created with
SLAB_CACHE_DMA32, the caller of kmem_cache_alloc() doesn't need to also
include __GFP_DMA32, the allocation will be from ZONE_DMA32 regardless.
So it would be fine even unchanged. The check would anyway need some
more love to catch the same with __GFP_DMA to be consistent and cover
all corner cases.

>  
>   if (unlikely(flags & bug_mask)) {
>   gfp_t invalid_mask = flags & bug_mask;


Re: [PATCH v4 3/3] iommu/io-pgtable-arm-v7s: Request DMA32 memory, and improve debugging

2018-12-05 Thread Christoph Hellwig
On Wed, Dec 05, 2018 at 01:48:28PM +0800, Nicolas Boichat wrote:
> IOMMUs using ARMv7 short-descriptor format require page tables
> (level 1 and 2) to be allocated within the first 4GB of RAM, even
> on 64-bit systems.

> +#ifdef CONFIG_ZONE_DMA32
> +#define ARM_V7S_TABLE_GFP_DMA GFP_DMA32
> +#define ARM_V7S_TABLE_SLAB_CACHE SLAB_CACHE_DMA32
> +#else
> +#define ARM_V7S_TABLE_GFP_DMA GFP_DMA
> +#define ARM_V7S_TABLE_SLAB_CACHE SLAB_CACHE_DMA
> +#endif

How does using GFP_DMA make sense based on the above?  If the system
has more than 32-bits worth of RAM it should be using GFP_DMA32, else
GFP_KERNEL, not GFP_DMA for an arch defined small addressability pool.
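Christoph's suggested policy (constrain the allocation only when RAM actually extends past 4 GB) can be sketched in userspace C; the GFP_* values and the max-RAM parameter below are simplified stand-ins for the model, not the kernel's:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative flag values, not the kernel's. */
#define GFP_KERNEL 0x0u
#define GFP_DMA32  0x1u

#define FOUR_GB (1ull << 32)

unsigned int table_gfp_for(uint64_t max_ram_bytes)
{
    /* Page tables must sit below 4 GB; only constrain the allocation
     * when the system's RAM actually extends beyond that boundary. */
    return (max_ram_bytes > FOUR_GB) ? GFP_DMA32 : GFP_KERNEL;
}
```

On a system whose RAM fits entirely below 4 GB, GFP_KERNEL already satisfies the addressability requirement, so no zone restriction is needed.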


Re: [PATCH] dma-mapping: fix lack of DMA address assignment in generic remap allocator

2018-12-05 Thread Christoph Hellwig
Thanks, applied.


Re: [PATCH v4 1/3] mm: slab/slub: Add check_slab_flags function to check for valid flags

2018-12-05 Thread Vlastimil Babka
On 12/5/18 6:48 AM, Nicolas Boichat wrote:
> Remove duplicated code between slab and slub, and will make it
> easier to make the test more complicated in the next commits.
> 
> Fixes: ad67f5a6545f ("arm64: replace ZONE_DMA with ZONE_DMA32")

Well, not really. Patch 3 does that and yeah this will be a prerequisity
for a clean stable backport, but we don't tag all prerequisities like
this, I think?

> Signed-off-by: Nicolas Boichat 

Acked-by: Vlastimil Babka 


Re: [PATCH] dma-mapping: fix lack of DMA address assignment in generic remap allocator

2018-12-05 Thread Thierry Reding
On Wed, Dec 05, 2018 at 11:14:01AM +0100, Marek Szyprowski wrote:
> Commit bfd56cd60521 ("dma-mapping: support highmem in the generic remap
> allocator") replaced dma_direct_alloc_pages() with __dma_direct_alloc_pages(),
> which doesn't set dma_handle and zero allocated memory. Fix it by doing this
> directly in the caller function.
> 
> Fixes: bfd56cd60521 ("dma-mapping: support highmem in the generic remap allocator")
> Signed-off-by: Marek Szyprowski 
> ---
>  kernel/dma/remap.c | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)

Tested-by: Thierry Reding 



Re: [PATCH v4 2/3] mm: Add support for kmem caches in DMA32 zone

2018-12-05 Thread Wei Yang
On Wed, Dec 05, 2018 at 03:39:51PM +0800, Nicolas Boichat wrote:
>On Wed, Dec 5, 2018 at 3:25 PM Wei Yang  wrote:
>>
>> On Wed, Dec 05, 2018 at 01:48:27PM +0800, Nicolas Boichat wrote:
>> >In some cases (e.g. IOMMU ARMv7s page allocator), we need to allocate
>> >data structures smaller than a page with GFP_DMA32 flag.
>> >
>> >This change makes it possible to create a custom cache in DMA32 zone
>> >using kmem_cache_create, then allocate memory using kmem_cache_alloc.
>> >
>> >We do not create a DMA32 kmalloc cache array, as there are currently
>> >no users of kmalloc(..., GFP_DMA32). The new test in check_slab_flags
>> >ensures that such calls still fail (as they do before this change).
>> >
>> >Fixes: ad67f5a6545f ("arm64: replace ZONE_DMA with ZONE_DMA32")
>> >Signed-off-by: Nicolas Boichat 
>> >---
>> >
>> >Changes since v2:
>> > - Clarified commit message
>> > - Add entry in sysfs-kernel-slab to document the new sysfs file
>> >
>> >(v3 used the page_frag approach)
>> >
>> >Documentation/ABI/testing/sysfs-kernel-slab |  9 +
>> > include/linux/slab.h|  2 ++
>> > mm/internal.h   |  8 ++--
>> > mm/slab.c   |  4 +++-
>> > mm/slab.h   |  3 ++-
>> > mm/slab_common.c|  2 +-
>> > mm/slub.c   | 18 +-
>> > 7 files changed, 40 insertions(+), 6 deletions(-)
>> >
>> >diff --git a/Documentation/ABI/testing/sysfs-kernel-slab 
>> >b/Documentation/ABI/testing/sysfs-kernel-slab
>> >index 29601d93a1c2ea..d742c6cfdffbe9 100644
>> >--- a/Documentation/ABI/testing/sysfs-kernel-slab
>> >+++ b/Documentation/ABI/testing/sysfs-kernel-slab
>> >@@ -106,6 +106,15 @@ Description:
>> >   are from ZONE_DMA.
>> >   Available when CONFIG_ZONE_DMA is enabled.
>> >
>> >+What: /sys/kernel/slab/cache/cache_dma32
>> >+Date: December 2018
>> >+KernelVersion:4.21
>> >+Contact:  Nicolas Boichat 
>> >+Description:
>> >+  The cache_dma32 file is read-only and specifies whether 
>> >objects
>> >+  are from ZONE_DMA32.
>> >+  Available when CONFIG_ZONE_DMA32 is enabled.
>> >+
>> > What: /sys/kernel/slab/cache/cpu_slabs
>> > Date: May 2007
>> > KernelVersion:2.6.22
>> >diff --git a/include/linux/slab.h b/include/linux/slab.h
>> >index 11b45f7ae4057c..9449b19c5f107a 100644
>> >--- a/include/linux/slab.h
>> >+++ b/include/linux/slab.h
>> >@@ -32,6 +32,8 @@
>> > #define SLAB_HWCACHE_ALIGN((slab_flags_t __force)0x2000U)
>> > /* Use GFP_DMA memory */
>> > #define SLAB_CACHE_DMA((slab_flags_t __force)0x4000U)
>> >+/* Use GFP_DMA32 memory */
>> >+#define SLAB_CACHE_DMA32  ((slab_flags_t __force)0x8000U)
>> > /* DEBUG: Store the last owner for bug hunting */
>> > #define SLAB_STORE_USER   ((slab_flags_t __force)0x0001U)
>> > /* Panic if kmem_cache_create() fails */
>> >diff --git a/mm/internal.h b/mm/internal.h
>> >index a2ee82a0cd44ae..fd244ad716eaf8 100644
>> >--- a/mm/internal.h
>> >+++ b/mm/internal.h
>> >@@ -14,6 +14,7 @@
>> > #include 
>> > #include 
>> > #include 
>> >+#include 
>> > #include 
>> >
>> > /*
>> >@@ -34,9 +35,12 @@
>> > #define GFP_CONSTRAINT_MASK (__GFP_HARDWALL|__GFP_THISNODE)
>> >
>> > /* Check for flags that must not be used with a slab allocator */
>> >-static inline gfp_t check_slab_flags(gfp_t flags)
>> >+static inline gfp_t check_slab_flags(gfp_t flags, slab_flags_t slab_flags)
>> > {
>> >-  gfp_t bug_mask = __GFP_DMA32 | __GFP_HIGHMEM | ~__GFP_BITS_MASK;
>> >+  gfp_t bug_mask = __GFP_HIGHMEM | ~__GFP_BITS_MASK;
>> >+
>> >+  if (!IS_ENABLED(CONFIG_ZONE_DMA32) || !(slab_flags & 
>> >SLAB_CACHE_DMA32))
>> >+  bug_mask |= __GFP_DMA32;
>>
>> The original version doesn't check CONFIG_ZONE_DMA32.
>>
>> Do we need to add this condition here?
>> Could we just decide the bug_mask based on slab_flags?
>
>We can. The reason I did it this way is that when we don't have
>CONFIG_ZONE_DMA32, the compiler should be able to simplify to:
>
>bug_mask = __GFP_HIGHMEM | ~__GFP_BITS_MASK;
>if (true || ..) => if (true)
>   bug_mask |= __GFP_DMA32;
>
>Then just
>bug_mask = __GFP_HIGHMEM | ~__GFP_BITS_MASK | __GFP_DMA32;
>
>And since the function is inline, slab_flags would not even need to be
>accessed at all.
>

Hmm, one thing confuses me.

This means that if CONFIG_ZONE_DMA32 is not enabled, bug_mask will
always contain __GFP_DMA32, which is then checked against cachep->flags.

If cachep->flags has GFP_DMA32, does this always fail?

Is this possible?

-- 
Wei Yang
Help you, Help me
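Nicolas's compile-time argument can be modelled in plain userspace C. The flag values below are made up for the model and only the masking logic mirrors the patch; toggle ZONE_DMA32_ENABLED to see both configurations:

```c
#include <assert.h>

/* Illustrative flag values, not the kernel's. */
#define __GFP_DMA32       0x04u
#define __GFP_HIGHMEM     0x02u
#define SLAB_CACHE_DMA32  0x8000u

/* Stand-in for IS_ENABLED(CONFIG_ZONE_DMA32). */
#ifndef ZONE_DMA32_ENABLED
#define ZONE_DMA32_ENABLED 1
#endif

/* Returns the offending bits; non-zero means the allocation is rejected. */
unsigned int bad_slab_flags(unsigned int gfp, unsigned int slab_flags)
{
    unsigned int bug_mask = __GFP_HIGHMEM;

    /* With ZONE_DMA32 disabled, the first operand is always true, so the
     * compiler can fold __GFP_DMA32 into bug_mask unconditionally and
     * slab_flags never needs to be read at all. */
    if (!ZONE_DMA32_ENABLED || !(slab_flags & SLAB_CACHE_DMA32))
        bug_mask |= __GFP_DMA32;

    return gfp & bug_mask;
}
```

With ZONE_DMA32 enabled, passing __GFP_DMA32 is accepted only when the cache itself was created with SLAB_CACHE_DMA32; without the cache flag it still trips the bug mask, as before the change.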


Re: [PATCH V2] mm: Replace all open encodings for NUMA_NO_NODE

2018-12-05 Thread Lubomir Rintel
On Wed, 2018-12-05 at 17:01 +0530, Anshuman Khandual wrote:
> 
> On 12/05/2018 02:56 AM, Lubomir Rintel wrote:
> > On Mon, 2018-11-26 at 17:56 +0530, Anshuman Khandual wrote:
> > > At present there are multiple places where invalid node number is encoded
> > > as -1. Even though implicitly understood it is always better to have 
> > > macros
> > > in there. Replace these open encodings for an invalid node number with the
> > > global macro NUMA_NO_NODE. This helps remove NUMA related assumptions like
> > > 'invalid node' from various places redirecting them to a common 
> > > definition.
> > > 
> > > Signed-off-by: Anshuman Khandual 
> > > ---
> > > Changes in V2:
> > > 
> > > - Added inclusion of 'numa.h' header at various places per Andrew
> > > - Updated 'dev_to_node' to use NUMA_NO_NODE instead per Vinod
> > > 
> > > Changes in V1: (https://lkml.org/lkml/2018/11/23/485)
> > > 
> > > - Dropped OCFS2 changes per Joseph
> > > - Dropped media/video drivers changes per Hans
> > > 
> > > RFC - https://patchwork.kernel.org/patch/10678035/
> > > 
> > > Build tested this with multiple cross compiler options like alpha, sparc,
> > > arm64, x86, powerpc, powerpc64le etc with their default config which might
> > > not have compiled tested all driver related changes. I will appreciate
> > > folks giving this a test in their respective build environment.
> > > 
> > > All these places for replacement were found by running the following grep
> > > patterns on the entire kernel code. Please let me know if this might have
> > > missed some instances. This might also have replaced some false positives.
> > > I will appreciate suggestions, inputs and review.
> > > 
> > > 1. git grep "nid == -1"
> > > 2. git grep "node == -1"
> > > 3. git grep "nid = -1"
> > > 4. git grep "node = -1"
> > > 
> > >  arch/alpha/include/asm/topology.h |  3 ++-
> > >  arch/ia64/kernel/numa.c   |  2 +-
> > >  arch/ia64/mm/discontig.c  |  6 +++---
> > >  arch/ia64/sn/kernel/io_common.c   |  3 ++-
> > >  arch/powerpc/include/asm/pci-bridge.h |  3 ++-
> > >  arch/powerpc/kernel/paca.c|  3 ++-
> > >  arch/powerpc/kernel/pci-common.c  |  3 ++-
> > >  arch/powerpc/mm/numa.c| 14 +++---
> > >  arch/powerpc/platforms/powernv/memtrace.c |  5 +++--
> > >  arch/sparc/kernel/auxio_32.c  |  3 ++-
> > >  arch/sparc/kernel/pci_fire.c  |  3 ++-
> > >  arch/sparc/kernel/pci_schizo.c|  3 ++-
> > >  arch/sparc/kernel/pcic.c  |  7 ---
> > >  arch/sparc/kernel/psycho_common.c |  3 ++-
> > >  arch/sparc/kernel/sbus.c  |  3 ++-
> > >  arch/sparc/mm/init_64.c   |  6 +++---
> > >  arch/sparc/prom/init_32.c |  3 ++-
> > >  arch/sparc/prom/init_64.c |  5 +++--
> > >  arch/sparc/prom/tree_32.c | 13 +++--
> > >  arch/sparc/prom/tree_64.c | 19 ++-
> > >  arch/x86/include/asm/pci.h|  3 ++-
> > >  arch/x86/kernel/apic/x2apic_uv_x.c|  7 ---
> > >  arch/x86/kernel/smpboot.c |  3 ++-
> > >  arch/x86/platform/olpc/olpc_dt.c  | 17 +
> > >  drivers/block/mtip32xx/mtip32xx.c |  5 +++--
> > >  drivers/dma/dmaengine.c   |  4 +++-
> > >  drivers/infiniband/hw/hfi1/affinity.c |  3 ++-
> > >  drivers/infiniband/hw/hfi1/init.c |  3 ++-
> > >  drivers/iommu/dmar.c  |  5 +++--
> > >  drivers/iommu/intel-iommu.c   |  3 ++-
> > >  drivers/misc/sgi-xp/xpc_uv.c  |  3 ++-
> > >  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  5 +++--
> > >  include/linux/device.h|  2 +-
> > >  init/init_task.c  |  3 ++-
> > >  kernel/kthread.c  |  3 ++-
> > >  kernel/sched/fair.c   | 15 ---
> > >  lib/cpumask.c |  3 ++-
> > >  mm/huge_memory.c  | 13 +++--
> > >  mm/hugetlb.c  |  3 ++-
> > >  mm/ksm.c  |  2 +-
> > >  mm/memory.c   |  7 ---
> > >  mm/memory_hotplug.c   | 12 ++--
> > >  mm/mempolicy.c|  2 +-
> > >  mm/page_alloc.c   |  4 ++--
> > >  mm/page_ext.c |  2 +-
> > >  net/core/pktgen.c |  3 ++-
> > >  net/qrtr/qrtr.c   |  3 ++-
> > >  tools/perf/bench/numa.c   |  6 +++---
> > >  48 files changed, 146 insertions(+), 108 deletions(-)
> > Thanks for the patch. It seems to me that 

Re: [PATCH v4 2/3] mm: Add support for kmem caches in DMA32 zone

2018-12-05 Thread Michal Hocko
On Wed 05-12-18 19:01:03, Nicolas Boichat wrote:
[...]
> > Secondly, why do we need a new sysfs file? Who is going to consume it?
> 
> We have cache_dma, so it seems consistent to add cache_dma32.

I wouldn't copy a pattern unless there is an explicit usecase for it.
We do expose way too much to userspace and that keeps kicking us later.
Not that I am aware of any specific example for cache_dma but seeing
other examples I would rather be more careful.

> I wasn't aware of tools/vm/slabinfo.c, so I can add support for
> cache_dma32 in a follow-up patch. Any other user I should take care
> of?

In general zones are internal MM implementation details and the less we
export to userspace the better.

> > Then why do we need SLAB_MERGE_SAME to cover GFP_DMA32 as well?
> 
> SLAB_MERGE_SAME tells us which flags _need_ to be the same for the
> slabs to be merged. We don't want slab caches with GFP_DMA32 and
> ~GFP_DMA32 to be merged, so it should be in there.
> (https://elixir.bootlin.com/linux/v4.19.6/source/mm/slab_common.c#L342).

Ohh, my bad, I have misread the change. Sure we definitely not want to
allow merging here. My bad.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH V2] mm: Replace all open encodings for NUMA_NO_NODE

2018-12-05 Thread Anshuman Khandual



On 12/05/2018 02:56 AM, Lubomir Rintel wrote:
> On Mon, 2018-11-26 at 17:56 +0530, Anshuman Khandual wrote:
>> At present there are multiple places where invalid node number is encoded
>> as -1. Even though implicitly understood it is always better to have macros
>> in there. Replace these open encodings for an invalid node number with the
>> global macro NUMA_NO_NODE. This helps remove NUMA related assumptions like
>> 'invalid node' from various places redirecting them to a common definition.
>>
>> Signed-off-by: Anshuman Khandual 
>> ---
>> Changes in V2:
>>
>> - Added inclusion of 'numa.h' header at various places per Andrew
>> - Updated 'dev_to_node' to use NUMA_NO_NODE instead per Vinod
>>
>> Changes in V1: (https://lkml.org/lkml/2018/11/23/485)
>>
>> - Dropped OCFS2 changes per Joseph
>> - Dropped media/video drivers changes per Hans
>>
>> RFC - https://patchwork.kernel.org/patch/10678035/
>>
>> Build tested this with multiple cross compiler options like alpha, sparc,
>> arm64, x86, powerpc, powerpc64le etc with their default config which might
>> not have compiled tested all driver related changes. I will appreciate
>> folks giving this a test in their respective build environment.
>>
>> All these places for replacement were found by running the following grep
>> patterns on the entire kernel code. Please let me know if this might have
>> missed some instances. This might also have replaced some false positives.
>> I will appreciate suggestions, inputs and review.
>>
>> 1. git grep "nid == -1"
>> 2. git grep "node == -1"
>> 3. git grep "nid = -1"
>> 4. git grep "node = -1"
>>
>>  arch/alpha/include/asm/topology.h |  3 ++-
>>  arch/ia64/kernel/numa.c   |  2 +-
>>  arch/ia64/mm/discontig.c  |  6 +++---
>>  arch/ia64/sn/kernel/io_common.c   |  3 ++-
>>  arch/powerpc/include/asm/pci-bridge.h |  3 ++-
>>  arch/powerpc/kernel/paca.c|  3 ++-
>>  arch/powerpc/kernel/pci-common.c  |  3 ++-
>>  arch/powerpc/mm/numa.c| 14 +++---
>>  arch/powerpc/platforms/powernv/memtrace.c |  5 +++--
>>  arch/sparc/kernel/auxio_32.c  |  3 ++-
>>  arch/sparc/kernel/pci_fire.c  |  3 ++-
>>  arch/sparc/kernel/pci_schizo.c|  3 ++-
>>  arch/sparc/kernel/pcic.c  |  7 ---
>>  arch/sparc/kernel/psycho_common.c |  3 ++-
>>  arch/sparc/kernel/sbus.c  |  3 ++-
>>  arch/sparc/mm/init_64.c   |  6 +++---
>>  arch/sparc/prom/init_32.c |  3 ++-
>>  arch/sparc/prom/init_64.c |  5 +++--
>>  arch/sparc/prom/tree_32.c | 13 +++--
>>  arch/sparc/prom/tree_64.c | 19 ++-
>>  arch/x86/include/asm/pci.h|  3 ++-
>>  arch/x86/kernel/apic/x2apic_uv_x.c|  7 ---
>>  arch/x86/kernel/smpboot.c |  3 ++-
>>  arch/x86/platform/olpc/olpc_dt.c  | 17 +
>>  drivers/block/mtip32xx/mtip32xx.c |  5 +++--
>>  drivers/dma/dmaengine.c   |  4 +++-
>>  drivers/infiniband/hw/hfi1/affinity.c |  3 ++-
>>  drivers/infiniband/hw/hfi1/init.c |  3 ++-
>>  drivers/iommu/dmar.c  |  5 +++--
>>  drivers/iommu/intel-iommu.c   |  3 ++-
>>  drivers/misc/sgi-xp/xpc_uv.c  |  3 ++-
>>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  5 +++--
>>  include/linux/device.h|  2 +-
>>  init/init_task.c  |  3 ++-
>>  kernel/kthread.c  |  3 ++-
>>  kernel/sched/fair.c   | 15 ---
>>  lib/cpumask.c |  3 ++-
>>  mm/huge_memory.c  | 13 +++--
>>  mm/hugetlb.c  |  3 ++-
>>  mm/ksm.c  |  2 +-
>>  mm/memory.c   |  7 ---
>>  mm/memory_hotplug.c   | 12 ++--
>>  mm/mempolicy.c|  2 +-
>>  mm/page_alloc.c   |  4 ++--
>>  mm/page_ext.c |  2 +-
>>  net/core/pktgen.c |  3 ++-
>>  net/qrtr/qrtr.c   |  3 ++-
>>  tools/perf/bench/numa.c   |  6 +++---
>>  48 files changed, 146 insertions(+), 108 deletions(-)
> Thanks for the patch. It seems to me that you've got a fairly large
> amount of it wrong though -- perhaps relying just on "git grep" alone
> is not the best idea.

Hmm, okay.

> 
> The diffstat is not all that big, it is entirely plausible to just
> review each hunk manually: just do a "git show -U20" to get some
> context.
> 
> You get a NAK from me for the OLPC DT part, but I 

Re: [LKP] [mm] 19717e78a0: stderr.if(target_node==NUMA_NO_NODE){

2018-12-05 Thread Anshuman Khandual
On 12/05/2018 10:30 AM, kernel test robot wrote:
> FYI, we noticed the following commit (built with gcc-7):
> 
> commit: 19717e78a04d51512cf0e7b9b09c61f06b2af071 ("[PATCH V2] mm: Replace all open encodings for NUMA_NO_NODE")
> url: https://github.com/0day-ci/linux/commits/Anshuman-Khandual/mm-Replace-all-open-encodings-for-NUMA_NO_NODE/20181126-203831
> 
> 
> in testcase: perf-sanity-tests
> with following parameters:
> 
>   perf_compiler: gcc
>   ucode: 0x713
> 
> 
> 
> on test machine: 16 threads Intel(R) Xeon(R) CPU D-1541 @ 2.10GHz with 8G memory
> 
> caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):

The patch from Stephen Rothwell (in Andrew's staging tree), which adds
the missing definitions, should fix this.


Re: [PATCH v4 2/3] mm: Add support for kmem caches in DMA32 zone

2018-12-05 Thread Nicolas Boichat
On Wed, Dec 5, 2018 at 5:56 PM Michal Hocko  wrote:
>
> On Wed 05-12-18 13:48:27, Nicolas Boichat wrote:
> > In some cases (e.g. IOMMU ARMv7s page allocator), we need to allocate
> > data structures smaller than a page with GFP_DMA32 flag.
> >
> > This change makes it possible to create a custom cache in DMA32 zone
> > using kmem_cache_create, then allocate memory using kmem_cache_alloc.
> >
> > We do not create a DMA32 kmalloc cache array, as there are currently
> > no users of kmalloc(..., GFP_DMA32). The new test in check_slab_flags
> > ensures that such calls still fail (as they do before this change).
>
> The changelog should be much more specific about decisions made here.
> First of all it would be nice to mention the usecase.

Ok, I'll copy most of the cover letter text here (i.e. the fact that
IOMMU wants physical memory <4GB for L2 page tables, why it's better
than genalloc/page_frag).

> Secondly, why do we need a new sysfs file? Who is going to consume it?

We have cache_dma, so it seems consistent to add cache_dma32.

I wasn't aware of tools/vm/slabinfo.c, so I can add support for
cache_dma32 in a follow-up patch. Any other user I should take care
of?

> Then why do we need SLAB_MERGE_SAME to cover GFP_DMA32 as well?

SLAB_MERGE_SAME tells us which flags _need_ to be the same for the
slabs to be merged. We don't want slab caches with GFP_DMA32 and
~GFP_DMA32 to be merged, so it should be in there.
(https://elixir.bootlin.com/linux/v4.19.6/source/mm/slab_common.c#L342).

> I
> thought the whole point is to use dedicated slab cache. Who is this
> going to merge with?

Well, if there was another SLAB cache requiring 1KB GFP_DMA32
elements, then I don't see why we would not merge the caches. This is
what happens with this IOMMU L2 tables cache pre-CONFIG_ZONE_DMA32 on
arm64 (output on some 3.18 kernel below), and what would happen on
arm32 since we still use GFP_DMA.

/sys/kernel/slab # ls -l | grep dt-0001024
drwxr-xr-x. 2 root root 0 Dec  5 02:25 :dt-0001024
lrwxrwxrwx. 1 root root 0 Dec  5 02:25 dma-kmalloc-1024 -> :dt-0001024
lrwxrwxrwx. 1 root root 0 Dec  5 02:25 io-pgtable_armv7s_l2 -> :dt-0001024

Thanks!
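The merge criterion Nicolas describes can be modelled in userspace C; the flag values are illustrative stand-ins, not the kernel's:

```c
#include <assert.h>

/* Illustrative flag values, not the kernel's. */
#define SLAB_CACHE_DMA    0x4000u
#define SLAB_CACHE_DMA32  0x8000u

/* Flags that must match for two caches to be merge candidates. */
#define SLAB_MERGE_SAME   (SLAB_CACHE_DMA | SLAB_CACHE_DMA32)

/* Two caches may merge only if all SLAB_MERGE_SAME-covered flags agree,
 * so a DMA32 cache can merge with another DMA32 cache (as in the
 * dma-kmalloc-1024 / io-pgtable example above) but never with a
 * normal-zone or DMA cache. */
int caches_may_merge(unsigned int flags_a, unsigned int flags_b)
{
    return (flags_a & SLAB_MERGE_SAME) == (flags_b & SLAB_MERGE_SAME);
}
```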

> --
> Michal Hocko
> SUSE Labs


[PATCH] dma-mapping: fix lack of DMA address assignment in generic remap allocator

2018-12-05 Thread Marek Szyprowski
Commit bfd56cd60521 ("dma-mapping: support highmem in the generic remap
allocator") replaced dma_direct_alloc_pages() with __dma_direct_alloc_pages(),
which doesn't set dma_handle and zero allocated memory. Fix it by doing this
directly in the caller function.

Fixes: bfd56cd60521 ("dma-mapping: support highmem in the generic remap allocator")
Signed-off-by: Marek Szyprowski 
---
 kernel/dma/remap.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/dma/remap.c b/kernel/dma/remap.c
index 68a64e3ff6a1..8a44317cfc1a 100644
--- a/kernel/dma/remap.c
+++ b/kernel/dma/remap.c
@@ -223,8 +223,14 @@ void *arch_dma_alloc(struct device *dev, size_t size, 
dma_addr_t *dma_handle,
ret = dma_common_contiguous_remap(page, size, VM_USERMAP,
arch_dma_mmap_pgprot(dev, PAGE_KERNEL, attrs),
__builtin_return_address(0));
-   if (!ret)
+   if (!ret) {
__dma_direct_free_pages(dev, size, page);
+   return ret;
+   }
+
+   *dma_handle = phys_to_dma(dev, page_to_phys(page));
+   memset(ret, 0, size);
+
return ret;
 }
 
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 2/3] mm: Add support for kmem caches in DMA32 zone

2018-12-05 Thread Michal Hocko
On Wed 05-12-18 13:48:27, Nicolas Boichat wrote:
> In some cases (e.g. IOMMU ARMv7s page allocator), we need to allocate
> data structures smaller than a page with GFP_DMA32 flag.
> 
> This change makes it possible to create a custom cache in DMA32 zone
> using kmem_cache_create, then allocate memory using kmem_cache_alloc.
> 
> We do not create a DMA32 kmalloc cache array, as there are currently
> no users of kmalloc(..., GFP_DMA32). The new test in check_slab_flags
> ensures that such calls still fail (as they do before this change).

The changelog should be much more specific about decisions made here.
First of all it would be nice to mention the usecase.

Secondly, why do we need a new sysfs file? Who is going to consume it?

Then why do we need SLAB_MERGE_SAME to cover GFP_DMA32 as well? I
thought the whole point is to use dedicated slab cache. Who is this
going to merge with?
-- 
Michal Hocko
SUSE Labs


Re: use generic DMA mapping code in powerpc V4

2018-12-05 Thread Christian Zigotzky

On 04 December 2018 at 3:24PM, Christoph Hellwig wrote:

On Tue, Dec 04, 2018 at 10:53:39AM +0100, Christian Zigotzky wrote:

I don't know why this kernel doesn't recognize the hard disks connected to
my physical P5020 board and why the onboard ethernet on my PASEMI board
doesn't work. (dma_direct_map_page: overflow)

Do you know if this actually works for the baseline before my patches?
E.g. with commit 721c01ba8b46ddb5355bd6e6b3bbfdabfdf01e97 ?


Hi Christoph,

Thanks for your reply. I undid all dma mapping commits with the 
following command:


git checkout 721c01ba8b46ddb5355bd6e6b3bbfdabfdf01e97

After that I compiled the kernels with this code for my P5020 board 
(Cyrus) and for my PASEMI board (Nemo) today.


Result: PASEMI onboard ethernet works again and the P5020 board boots.

It seems the dma mapping commits are the problem.

@All
Could you please test Christoph's kernel on your PASEMI and NXP boards? 
Download:


'git clone git://git.infradead.org/users/hch/misc.git -b powerpc-dma.5 a'

Thanks,
Christian



Re: [PATCH] media: s5p-mfc: Fix memdev DMA configuration

2018-12-05 Thread Mauro Carvalho Chehab
On Fri, 14 Sep 2018 14:19:29 +0200
Marek Szyprowski  wrote:

> Hi Robin,
> 
> On 2018-09-12 18:45, Robin Murphy wrote:
> > Having of_reserved_mem_device_init() forcibly reconfigure DMA for all
> > callers, potentially overriding the work done by a bus-specific
> > .dma_configure method earlier, is at best a bad idea and at worst
> > actively harmful. If drivers really need virtual devices to own
> > dma-coherent memory, they should explicitly configure those devices
> > based on the appropriate firmware node as they create them.
> >
> > It looks like the only driver not passing in a proper OF platform device
> > is s5p-mfc, so move the rogue of_dma_configure() call into that driver
> > where it logically belongs.
> >
> > CC: Smitha T Murthy 
> > CC: Marek Szyprowski 
> > CC: Rob Herring 
> > Signed-off-by: Robin Murphy   
> 
> Right, after the recent cleanup of dma ops initialization, the MFC
> driver is a better place for calling of_dma_configure() on virtual devices.
> 
> Reviewed-by: Marek Szyprowski 

This patch seems to fit better via OF/DT tree. Not sure if it was
merged there yet. In any case:

Acked-by: Mauro Carvalho Chehab 

> 
> > ---
> >   drivers/media/platform/s5p-mfc/s5p_mfc.c | 7 +++
> >   drivers/of/of_reserved_mem.c | 4 
> >   2 files changed, 7 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/media/platform/s5p-mfc/s5p_mfc.c 
> > b/drivers/media/platform/s5p-mfc/s5p_mfc.c
> > index 927a1235408d..77eb4a4511c1 100644
> > --- a/drivers/media/platform/s5p-mfc/s5p_mfc.c
> > +++ b/drivers/media/platform/s5p-mfc/s5p_mfc.c
> > @@ -1094,6 +1094,13 @@ static struct device *s5p_mfc_alloc_memdev(struct 
> > device *dev,
> > child->dma_mask = dev->dma_mask;
> > child->release = s5p_mfc_memdev_release;
> >   
> > +   /*
> > +* The memdevs are not proper OF platform devices, so in order for them
> > +* to be treated as valid DMA masters we need a bit of a hack to force
> > +* them to inherit the MFC node's DMA configuration.
> > +*/
> > +   of_dma_configure(child, dev->of_node, true);
> > +
> > if (device_add(child) == 0) {
> > ret = of_reserved_mem_device_init_by_idx(child, dev->of_node,
> >  idx);
> > diff --git a/drivers/of/of_reserved_mem.c b/drivers/of/of_reserved_mem.c
> > index 895c83e0c7b6..4ef6f4485335 100644
> > --- a/drivers/of/of_reserved_mem.c
> > +++ b/drivers/of/of_reserved_mem.c
> > @@ -350,10 +350,6 @@ int of_reserved_mem_device_init_by_idx(struct device 
> > *dev,
> > mutex_lock(&of_rmem_assigned_device_mutex);
> > list_add(&rd->list, &of_rmem_assigned_device_list);
> > mutex_unlock(&of_rmem_assigned_device_mutex);
> > -   /* ensure that dma_ops is set for virtual devices
> > -* using reserved memory
> > -*/
> > -   of_dma_configure(dev, np, true);
> >   
> > dev_info(dev, "assigned reserved memory node %s\n", rmem->name);
> > } else {  
> 
> Best regards



Thanks,
Mauro


Re: [PATCH v4 2/3] mm: Add support for kmem caches in DMA32 zone

2018-12-05 Thread Wei Yang
On Wed, Dec 05, 2018 at 03:39:51PM +0800, Nicolas Boichat wrote:
>On Wed, Dec 5, 2018 at 3:25 PM Wei Yang  wrote:
>>
>> On Wed, Dec 05, 2018 at 01:48:27PM +0800, Nicolas Boichat wrote:
>> >In some cases (e.g. IOMMU ARMv7s page allocator), we need to allocate
>> >data structures smaller than a page with GFP_DMA32 flag.
>> >
>> >This change makes it possible to create a custom cache in DMA32 zone
>> >using kmem_cache_create, then allocate memory using kmem_cache_alloc.
>> >
>> >We do not create a DMA32 kmalloc cache array, as there are currently
>> >no users of kmalloc(..., GFP_DMA32). The new test in check_slab_flags
>> >ensures that such calls still fail (as they do before this change).
>> >
>> >Fixes: ad67f5a6545f ("arm64: replace ZONE_DMA with ZONE_DMA32")
>> >Signed-off-by: Nicolas Boichat 
>> >---
>> >
>> >Changes since v2:
>> > - Clarified commit message
>> > - Add entry in sysfs-kernel-slab to document the new sysfs file
>> >
>> >(v3 used the page_frag approach)
>> >
>> >Documentation/ABI/testing/sysfs-kernel-slab |  9 +
>> > include/linux/slab.h|  2 ++
>> > mm/internal.h   |  8 ++--
>> > mm/slab.c   |  4 +++-
>> > mm/slab.h   |  3 ++-
>> > mm/slab_common.c|  2 +-
>> > mm/slub.c   | 18 +-
>> > 7 files changed, 40 insertions(+), 6 deletions(-)
>> >
>> >diff --git a/Documentation/ABI/testing/sysfs-kernel-slab 
>> >b/Documentation/ABI/testing/sysfs-kernel-slab
>> >index 29601d93a1c2ea..d742c6cfdffbe9 100644
>> >--- a/Documentation/ABI/testing/sysfs-kernel-slab
>> >+++ b/Documentation/ABI/testing/sysfs-kernel-slab
>> >@@ -106,6 +106,15 @@ Description:
>> >   are from ZONE_DMA.
>> >   Available when CONFIG_ZONE_DMA is enabled.
>> >
>> >+What: /sys/kernel/slab/cache/cache_dma32
>> >+Date: December 2018
>> >+KernelVersion:4.21
>> >+Contact:  Nicolas Boichat 
>> >+Description:
>> >+  The cache_dma32 file is read-only and specifies whether 
>> >objects
>> >+  are from ZONE_DMA32.
>> >+  Available when CONFIG_ZONE_DMA32 is enabled.
>> >+
>> > What: /sys/kernel/slab/cache/cpu_slabs
>> > Date: May 2007
>> > KernelVersion:2.6.22
>> >diff --git a/include/linux/slab.h b/include/linux/slab.h
>> >index 11b45f7ae4057c..9449b19c5f107a 100644
>> >--- a/include/linux/slab.h
>> >+++ b/include/linux/slab.h
>> >@@ -32,6 +32,8 @@
>> > #define SLAB_HWCACHE_ALIGN((slab_flags_t __force)0x2000U)
>> > /* Use GFP_DMA memory */
>> > #define SLAB_CACHE_DMA((slab_flags_t __force)0x4000U)
>> >+/* Use GFP_DMA32 memory */
>> >+#define SLAB_CACHE_DMA32  ((slab_flags_t __force)0x8000U)
>> > /* DEBUG: Store the last owner for bug hunting */
>> > #define SLAB_STORE_USER   ((slab_flags_t __force)0x0001U)
>> > /* Panic if kmem_cache_create() fails */
>> >diff --git a/mm/internal.h b/mm/internal.h
>> >index a2ee82a0cd44ae..fd244ad716eaf8 100644
>> >--- a/mm/internal.h
>> >+++ b/mm/internal.h
>> >@@ -14,6 +14,7 @@
>> > #include 
>> > #include 
>> > #include 
>> >+#include 
>> > #include 
>> >
>> > /*
>> >@@ -34,9 +35,12 @@
>> > #define GFP_CONSTRAINT_MASK (__GFP_HARDWALL|__GFP_THISNODE)
>> >
>> > /* Check for flags that must not be used with a slab allocator */
>> >-static inline gfp_t check_slab_flags(gfp_t flags)
>> >+static inline gfp_t check_slab_flags(gfp_t flags, slab_flags_t slab_flags)
>> > {
>> >-  gfp_t bug_mask = __GFP_DMA32 | __GFP_HIGHMEM | ~__GFP_BITS_MASK;
>> >+  gfp_t bug_mask = __GFP_HIGHMEM | ~__GFP_BITS_MASK;
>> >+
>> >+  if (!IS_ENABLED(CONFIG_ZONE_DMA32) || !(slab_flags & 
>> >SLAB_CACHE_DMA32))
>> >+  bug_mask |= __GFP_DMA32;
>>
>> The original version doesn't check CONFIG_ZONE_DMA32.
>>
>> Do we need to add this condition here?
>> Could we just decide the bug_mask based on slab_flags?
>
>We can. The reason I did it this way is that when we don't have
>CONFIG_ZONE_DMA32, the compiler should be able to simplify to:
>
>bug_mask = __GFP_HIGHMEM | ~__GFP_BITS_MASK;
>if (true || ..) => if (true)
>   bug_mask |= __GFP_DMA32;
>
>Then just
>bug_mask = __GFP_HIGHMEM | ~__GFP_BITS_MASK | __GFP_DMA32;
>
>And since the function is inline, slab_flags would not even need to be
>accessed at all.
>

Thanks for explanation. This make sense to me.

-- 
Wei Yang
Help you, Help me


Re: [PATCH v3 0/4] PCI / iommu / thunderbolt: IOMMU based DMA protection

2018-12-05 Thread Mika Westerberg
On Thu, Nov 29, 2018 at 06:51:49PM +0300, Mika Westerberg wrote:
> Recent systems with Thunderbolt ports may be utilizing IOMMU to prevent DMA
> attacks. This is different from the previous security level based scheme
> because the connected device cannot access system memory outside of the
> regions allocated for it by the driver.

Applied all to thunderbolt.git/next with acks from Bjorn and Rafael.