Re: [PATCH v4 2/3] mm: Add support for kmem caches in DMA32 zone
On Thu, Dec 06, 2018 at 11:55:02AM +0800, Nicolas Boichat wrote: >On Thu, Dec 6, 2018 at 11:32 AM Wei Yang wrote: >> >> On Thu, Dec 06, 2018 at 08:41:36AM +0800, Nicolas Boichat wrote: >> >On Wed, Dec 5, 2018 at 8:18 PM Wei Yang wrote: >> >> >> >> On Wed, Dec 05, 2018 at 03:39:51PM +0800, Nicolas Boichat wrote: >> >> >On Wed, Dec 5, 2018 at 3:25 PM Wei Yang >> >> >wrote: >> >> >> >> >> >> On Wed, Dec 05, 2018 at 01:48:27PM +0800, Nicolas Boichat wrote: >> >> >> >In some cases (e.g. IOMMU ARMv7s page allocator), we need to allocate >> >> >> >data structures smaller than a page with GFP_DMA32 flag. >> >> >> > >> >> >> >This change makes it possible to create a custom cache in DMA32 zone >> >> >> >using kmem_cache_create, then allocate memory using kmem_cache_alloc. >> >> >> > >> >> >> >We do not create a DMA32 kmalloc cache array, as there are currently >> >> >> >no users of kmalloc(..., GFP_DMA32). The new test in check_slab_flags >> >> >> >ensures that such calls still fail (as they do before this change). 
>> >> >> > >> >> >> >Fixes: ad67f5a6545f ("arm64: replace ZONE_DMA with ZONE_DMA32") >> >> >> >Signed-off-by: Nicolas Boichat >> >> >> >--- >> >> >> > >> >> >> >Changes since v2: >> >> >> > - Clarified commit message >> >> >> > - Add entry in sysfs-kernel-slab to document the new sysfs file >> >> >> > >> >> >> >(v3 used the page_frag approach) >> >> >> > >> >> >> >Documentation/ABI/testing/sysfs-kernel-slab | 9 + >> >> >> > include/linux/slab.h| 2 ++ >> >> >> > mm/internal.h | 8 ++-- >> >> >> > mm/slab.c | 4 +++- >> >> >> > mm/slab.h | 3 ++- >> >> >> > mm/slab_common.c| 2 +- >> >> >> > mm/slub.c | 18 +- >> >> >> > 7 files changed, 40 insertions(+), 6 deletions(-) >> >> >> > >> >> >> >diff --git a/Documentation/ABI/testing/sysfs-kernel-slab >> >> >> >b/Documentation/ABI/testing/sysfs-kernel-slab >> >> >> >index 29601d93a1c2ea..d742c6cfdffbe9 100644 >> >> >> >--- a/Documentation/ABI/testing/sysfs-kernel-slab >> >> >> >+++ b/Documentation/ABI/testing/sysfs-kernel-slab >> >> >> >@@ -106,6 +106,15 @@ Description: >> >> >> > are from ZONE_DMA. >> >> >> > Available when CONFIG_ZONE_DMA is enabled. >> >> >> > >> >> >> >+What: /sys/kernel/slab/cache/cache_dma32 >> >> >> >+Date: December 2018 >> >> >> >+KernelVersion:4.21 >> >> >> >+Contact: Nicolas Boichat >> >> >> >+Description: >> >> >> >+ The cache_dma32 file is read-only and specifies >> >> >> >whether objects >> >> >> >+ are from ZONE_DMA32. >> >> >> >+ Available when CONFIG_ZONE_DMA32 is enabled. 
>> >> >> >+ >> >> >> > What: /sys/kernel/slab/cache/cpu_slabs >> >> >> > Date: May 2007 >> >> >> > KernelVersion:2.6.22 >> >> >> >diff --git a/include/linux/slab.h b/include/linux/slab.h >> >> >> >index 11b45f7ae4057c..9449b19c5f107a 100644 >> >> >> >--- a/include/linux/slab.h >> >> >> >+++ b/include/linux/slab.h >> >> >> >@@ -32,6 +32,8 @@ >> >> >> > #define SLAB_HWCACHE_ALIGN((slab_flags_t __force)0x2000U) >> >> >> > /* Use GFP_DMA memory */ >> >> >> > #define SLAB_CACHE_DMA((slab_flags_t >> >> >> > __force)0x4000U) >> >> >> >+/* Use GFP_DMA32 memory */ >> >> >> >+#define SLAB_CACHE_DMA32 ((slab_flags_t __force)0x8000U) >> >> >> > /* DEBUG: Store the last owner for bug hunting */ >> >> >> > #define SLAB_STORE_USER ((slab_flags_t >> >> >> > __force)0x0001U) >> >> >> > /* Panic if kmem_cache_create() fails */ >> >> >> >diff --git a/mm/internal.h b/mm/internal.h >> >> >> >index a2ee82a0cd44ae..fd244ad716eaf8 100644 >> >> >> >--- a/mm/internal.h >> >> >> >+++ b/mm/internal.h >> >> >> >@@ -14,6 +14,7 @@ >> >> >> > #include >> >> >> > #include >> >> >> > #include >> >> >> >+#include >> >> >> > #include >> >> >> > >> >> >> > /* >> >> >> >@@ -34,9 +35,12 @@ >> >> >> > #define GFP_CONSTRAINT_MASK (__GFP_HARDWALL|__GFP_THISNODE) >> >> >> > >> >> >> > /* Check for flags that must not be used with a slab allocator */ >> >> >> >-static inline gfp_t check_slab_flags(gfp_t flags) >> >> >> >+static inline gfp_t check_slab_flags(gfp_t flags, slab_flags_t >> >> >> >slab_flags) >> >> >> > { >> >> >> >- gfp_t bug_mask = __GFP_DMA32 | __GFP_HIGHMEM | >> >> >> >~__GFP_BITS_MASK; >> >> >> >+ gfp_t bug_mask = __GFP_HIGHMEM | ~__GFP_BITS_MASK; >> >> >> >+ >> >> >> >+ if (!IS_ENABLED(CONFIG_ZONE_DMA32) || !(slab_flags & >> >> >> >SLAB_CACHE_DMA32)) >> >> >> >+ bug_mask |= __GFP_DMA32; >> >> >> >> >> >> The original version doesn't check CONFIG_ZONE_DMA32. >> >> >> >> >> >> Do we need to add this condition here? >> >> >> Could we just decide the bug_mask based on slab_flags? 
>> >> > >> >> >We can. The reason I did it this way is that when we don't have >> >>
[PATCH 1/1] swiotlb: fix comment on swiotlb_bounce()
Fix the comment: swiotlb_bounce() is used to copy from the original DMA
location to the swiotlb buffer during swiotlb_tbl_map_single(), and to
copy from the swiotlb buffer back to the original DMA location during
swiotlb_tbl_unmap_single().

Signed-off-by: Dongli Zhang
---
 kernel/dma/swiotlb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 045930e..0a5ec94 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -389,7 +389,7 @@ static int is_swiotlb_buffer(phys_addr_t paddr)
 }
 
 /*
- * Bounce: copy the swiotlb buffer back to the original dma location
+ * Bounce: copy the swiotlb buffer from or back to the original dma location
  */
 static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr,
			   size_t size, enum dma_data_direction dir)
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH RFC 1/1] swiotlb: add debugfs to track swiotlb buffer usage
The device driver will not be able to do DMA operations once the swiotlb
buffer is full, either because the driver is using too many IO TLB blocks
in flight, or because there is a memory leak in the device driver.
Exporting the swiotlb buffer usage via debugfs helps the user estimate
the size of the swiotlb buffer to pre-allocate, or analyze a device
driver memory leak.

As the swiotlb can be initialized at a very early stage, when debugfs
cannot yet register successfully, this patch creates the debugfs entry
on demand.

Signed-off-by: Dongli Zhang
---
 kernel/dma/swiotlb.c | 57 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 57 insertions(+)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 045930e..d3c8aa4 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -35,6 +35,9 @@
 #include
 #include
 #include
+#ifdef CONFIG_DEBUG_FS
+#include <linux/debugfs.h>
+#endif
 
 #include
 #include
@@ -73,6 +76,13 @@ static phys_addr_t io_tlb_start, io_tlb_end;
  */
 static unsigned long io_tlb_nslabs;
 
+#ifdef CONFIG_DEBUG_FS
+/*
+ * The number of used IO TLB block
+ */
+static unsigned long io_tlb_used;
+#endif
+
 /*
  * This is a free list describing the number of free entries available from
  * each index
@@ -100,6 +110,41 @@ static DEFINE_SPINLOCK(io_tlb_lock);
 
 static int late_alloc;
 
+#ifdef CONFIG_DEBUG_FS
+
+static struct dentry *d_swiotlb_usage;
+
+static int swiotlb_usage_show(struct seq_file *m, void *v)
+{
+	seq_printf(m, "%lu\n%lu\n", io_tlb_used, io_tlb_nslabs);
+	return 0;
+}
+
+static int swiotlb_usage_open(struct inode *inode, struct file *filp)
+{
+	return single_open(filp, swiotlb_usage_show, NULL);
+}
+
+static const struct file_operations swiotlb_usage_fops = {
+	.open		= swiotlb_usage_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
+
+void swiotlb_create_debugfs(void)
+{
+	d_swiotlb_usage = debugfs_create_dir("swiotlb", NULL);
+
+	if (!d_swiotlb_usage)
+		return;
+
+	debugfs_create_file("usage", 0600, d_swiotlb_usage,
+			    NULL, &swiotlb_usage_fops);
+}
+
+#endif
+
 static int __init
 setup_io_tlb_npages(char *str)
 {
@@ -449,6 +494,11 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
		pr_warn_once("%s is active and system is using DMA bounce buffers\n",
			     sme_active() ? "SME" : "SEV");
 
+#ifdef CONFIG_DEBUG_FS
+	if (unlikely(!d_swiotlb_usage))
+		swiotlb_create_debugfs();
+#endif
+
	mask = dma_get_seg_boundary(hwdev);
 
	tbl_dma_addr &= mask;
@@ -528,6 +578,9 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
	dev_warn(hwdev, "swiotlb buffer is full (sz: %zd bytes)\n", size);
	return SWIOTLB_MAP_ERROR;
 found:
+#ifdef CONFIG_DEBUG_FS
+	io_tlb_used += nslots;
+#endif
	spin_unlock_irqrestore(&io_tlb_lock, flags);
 
	/*
@@ -588,6 +641,10 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
		 */
		for (i = index - 1;
		     (OFFSET(i, IO_TLB_SEGSIZE) != IO_TLB_SEGSIZE - 1) &&
		     io_tlb_list[i]; i--)
			io_tlb_list[i] = ++count;
+
+#ifdef CONFIG_DEBUG_FS
+		io_tlb_used -= nslots;
+#endif
	}
	spin_unlock_irqrestore(&io_tlb_lock, flags);
 }
-- 
2.7.4
Re: [PATCH v4 2/3] mm: Add support for kmem caches in DMA32 zone
On Thu, Dec 6, 2018 at 11:32 AM Wei Yang wrote: > > On Thu, Dec 06, 2018 at 08:41:36AM +0800, Nicolas Boichat wrote: > >On Wed, Dec 5, 2018 at 8:18 PM Wei Yang wrote: > >> > >> On Wed, Dec 05, 2018 at 03:39:51PM +0800, Nicolas Boichat wrote: > >> >On Wed, Dec 5, 2018 at 3:25 PM Wei Yang wrote: > >> >> > >> >> On Wed, Dec 05, 2018 at 01:48:27PM +0800, Nicolas Boichat wrote: > >> >> >In some cases (e.g. IOMMU ARMv7s page allocator), we need to allocate > >> >> >data structures smaller than a page with GFP_DMA32 flag. > >> >> > > >> >> >This change makes it possible to create a custom cache in DMA32 zone > >> >> >using kmem_cache_create, then allocate memory using kmem_cache_alloc. > >> >> > > >> >> >We do not create a DMA32 kmalloc cache array, as there are currently > >> >> >no users of kmalloc(..., GFP_DMA32). The new test in check_slab_flags > >> >> >ensures that such calls still fail (as they do before this change). > >> >> > > >> >> >Fixes: ad67f5a6545f ("arm64: replace ZONE_DMA with ZONE_DMA32") > >> >> >Signed-off-by: Nicolas Boichat > >> >> >--- > >> >> > > >> >> >Changes since v2: > >> >> > - Clarified commit message > >> >> > - Add entry in sysfs-kernel-slab to document the new sysfs file > >> >> > > >> >> >(v3 used the page_frag approach) > >> >> > > >> >> >Documentation/ABI/testing/sysfs-kernel-slab | 9 + > >> >> > include/linux/slab.h| 2 ++ > >> >> > mm/internal.h | 8 ++-- > >> >> > mm/slab.c | 4 +++- > >> >> > mm/slab.h | 3 ++- > >> >> > mm/slab_common.c| 2 +- > >> >> > mm/slub.c | 18 +- > >> >> > 7 files changed, 40 insertions(+), 6 deletions(-) > >> >> > > >> >> >diff --git a/Documentation/ABI/testing/sysfs-kernel-slab > >> >> >b/Documentation/ABI/testing/sysfs-kernel-slab > >> >> >index 29601d93a1c2ea..d742c6cfdffbe9 100644 > >> >> >--- a/Documentation/ABI/testing/sysfs-kernel-slab > >> >> >+++ b/Documentation/ABI/testing/sysfs-kernel-slab > >> >> >@@ -106,6 +106,15 @@ Description: > >> >> > are from ZONE_DMA. 
> >> >> > Available when CONFIG_ZONE_DMA is enabled. > >> >> > > >> >> >+What: /sys/kernel/slab/cache/cache_dma32 > >> >> >+Date: December 2018 > >> >> >+KernelVersion:4.21 > >> >> >+Contact: Nicolas Boichat > >> >> >+Description: > >> >> >+ The cache_dma32 file is read-only and specifies whether > >> >> >objects > >> >> >+ are from ZONE_DMA32. > >> >> >+ Available when CONFIG_ZONE_DMA32 is enabled. > >> >> >+ > >> >> > What: /sys/kernel/slab/cache/cpu_slabs > >> >> > Date: May 2007 > >> >> > KernelVersion:2.6.22 > >> >> >diff --git a/include/linux/slab.h b/include/linux/slab.h > >> >> >index 11b45f7ae4057c..9449b19c5f107a 100644 > >> >> >--- a/include/linux/slab.h > >> >> >+++ b/include/linux/slab.h > >> >> >@@ -32,6 +32,8 @@ > >> >> > #define SLAB_HWCACHE_ALIGN((slab_flags_t __force)0x2000U) > >> >> > /* Use GFP_DMA memory */ > >> >> > #define SLAB_CACHE_DMA((slab_flags_t > >> >> > __force)0x4000U) > >> >> >+/* Use GFP_DMA32 memory */ > >> >> >+#define SLAB_CACHE_DMA32 ((slab_flags_t __force)0x8000U) > >> >> > /* DEBUG: Store the last owner for bug hunting */ > >> >> > #define SLAB_STORE_USER ((slab_flags_t > >> >> > __force)0x0001U) > >> >> > /* Panic if kmem_cache_create() fails */ > >> >> >diff --git a/mm/internal.h b/mm/internal.h > >> >> >index a2ee82a0cd44ae..fd244ad716eaf8 100644 > >> >> >--- a/mm/internal.h > >> >> >+++ b/mm/internal.h > >> >> >@@ -14,6 +14,7 @@ > >> >> > #include > >> >> > #include > >> >> > #include > >> >> >+#include > >> >> > #include > >> >> > > >> >> > /* > >> >> >@@ -34,9 +35,12 @@ > >> >> > #define GFP_CONSTRAINT_MASK (__GFP_HARDWALL|__GFP_THISNODE) > >> >> > > >> >> > /* Check for flags that must not be used with a slab allocator */ > >> >> >-static inline gfp_t check_slab_flags(gfp_t flags) > >> >> >+static inline gfp_t check_slab_flags(gfp_t flags, slab_flags_t > >> >> >slab_flags) > >> >> > { > >> >> >- gfp_t bug_mask = __GFP_DMA32 | __GFP_HIGHMEM | ~__GFP_BITS_MASK; > >> >> >+ gfp_t bug_mask = __GFP_HIGHMEM | 
~__GFP_BITS_MASK; > >> >> >+ > >> >> >+ if (!IS_ENABLED(CONFIG_ZONE_DMA32) || !(slab_flags & > >> >> >SLAB_CACHE_DMA32)) > >> >> >+ bug_mask |= __GFP_DMA32; > >> >> > >> >> The original version doesn't check CONFIG_ZONE_DMA32. > >> >> > >> >> Do we need to add this condition here? > >> >> Could we just decide the bug_mask based on slab_flags? > >> > > >> >We can. The reason I did it this way is that when we don't have > >> >CONFIG_ZONE_DMA32, the compiler should be able to simplify to: > >> > > >> >bug_mask = __GFP_HIGHMEM | ~__GFP_BITS_MASK; > >> >if (true || ..) => if (true) > >> > bug_mask |= __GFP_DMA32; >
Re: [PATCH v4 2/3] mm: Add support for kmem caches in DMA32 zone
On Wed, Dec 5, 2018 at 10:02 PM Vlastimil Babka wrote: > > On 12/5/18 6:48 AM, Nicolas Boichat wrote: > > In some cases (e.g. IOMMU ARMv7s page allocator), we need to allocate > > data structures smaller than a page with GFP_DMA32 flag. > > > > This change makes it possible to create a custom cache in DMA32 zone > > using kmem_cache_create, then allocate memory using kmem_cache_alloc. > > > > We do not create a DMA32 kmalloc cache array, as there are currently > > no users of kmalloc(..., GFP_DMA32). The new test in check_slab_flags > > ensures that such calls still fail (as they do before this change). > > > > Fixes: ad67f5a6545f ("arm64: replace ZONE_DMA with ZONE_DMA32") > > Same as my comment for 1/3. I'll drop. > > Signed-off-by: Nicolas Boichat > > In general, > Acked-by: Vlastimil Babka > > Some comments below: > > > --- > > > > Changes since v2: > > - Clarified commit message > > - Add entry in sysfs-kernel-slab to document the new sysfs file > > > > (v3 used the page_frag approach) > > > > Documentation/ABI/testing/sysfs-kernel-slab | 9 + > > include/linux/slab.h| 2 ++ > > mm/internal.h | 8 ++-- > > mm/slab.c | 4 +++- > > mm/slab.h | 3 ++- > > mm/slab_common.c| 2 +- > > mm/slub.c | 18 +- > > 7 files changed, 40 insertions(+), 6 deletions(-) > > > > diff --git a/Documentation/ABI/testing/sysfs-kernel-slab > > b/Documentation/ABI/testing/sysfs-kernel-slab > > index 29601d93a1c2ea..d742c6cfdffbe9 100644 > > --- a/Documentation/ABI/testing/sysfs-kernel-slab > > +++ b/Documentation/ABI/testing/sysfs-kernel-slab > > @@ -106,6 +106,15 @@ Description: > > are from ZONE_DMA. > > Available when CONFIG_ZONE_DMA is enabled. > > > > +What:/sys/kernel/slab/cache/cache_dma32 > > +Date:December 2018 > > +KernelVersion: 4.21 > > +Contact: Nicolas Boichat > > +Description: > > + The cache_dma32 file is read-only and specifies whether > > objects > > + are from ZONE_DMA32. > > + Available when CONFIG_ZONE_DMA32 is enabled. > > I don't have a strong opinion. 
It's a new file, yeah, but consistent > with already existing ones. I'd leave the decision with SL*B maintainers. > > > What:/sys/kernel/slab/cache/cpu_slabs > > Date:May 2007 > > KernelVersion: 2.6.22 > > diff --git a/include/linux/slab.h b/include/linux/slab.h > > index 11b45f7ae4057c..9449b19c5f107a 100644 > > --- a/include/linux/slab.h > > +++ b/include/linux/slab.h > > @@ -32,6 +32,8 @@ > > #define SLAB_HWCACHE_ALIGN ((slab_flags_t __force)0x2000U) > > /* Use GFP_DMA memory */ > > #define SLAB_CACHE_DMA ((slab_flags_t __force)0x4000U) > > +/* Use GFP_DMA32 memory */ > > +#define SLAB_CACHE_DMA32 ((slab_flags_t __force)0x8000U) > > /* DEBUG: Store the last owner for bug hunting */ > > #define SLAB_STORE_USER ((slab_flags_t __force)0x0001U) > > /* Panic if kmem_cache_create() fails */ > > diff --git a/mm/internal.h b/mm/internal.h > > index a2ee82a0cd44ae..fd244ad716eaf8 100644 > > --- a/mm/internal.h > > +++ b/mm/internal.h > > @@ -14,6 +14,7 @@ > > #include > > #include > > #include > > +#include > > #include > > > > /* > > @@ -34,9 +35,12 @@ > > #define GFP_CONSTRAINT_MASK (__GFP_HARDWALL|__GFP_THISNODE) > > > > /* Check for flags that must not be used with a slab allocator */ > > -static inline gfp_t check_slab_flags(gfp_t flags) > > +static inline gfp_t check_slab_flags(gfp_t flags, slab_flags_t slab_flags) > > { > > - gfp_t bug_mask = __GFP_DMA32 | __GFP_HIGHMEM | ~__GFP_BITS_MASK; > > + gfp_t bug_mask = __GFP_HIGHMEM | ~__GFP_BITS_MASK; > > + > > + if (!IS_ENABLED(CONFIG_ZONE_DMA32) || !(slab_flags & > > SLAB_CACHE_DMA32)) > > + bug_mask |= __GFP_DMA32; > > I'll point out that this is not even strictly needed AFAICS, as only > flags passed to kmem_cache_alloc() are checked - the cache->allocflags > derived from SLAB_CACHE_DMA32 are appended only after check_slab_flags() > (in both SLAB and SLUB AFAICS). 
And for a cache created with > SLAB_CACHE_DMA32, the caller of kmem_cache_alloc() doesn't need to also > include __GFP_DMA32, the allocation will be from ZONE_DMA32 regardless. Yes, you're right. I also looked at existing users of SLAB_CACHE_DMA, and there is one case in drivers/scsi/scsi_lib.c where GFP_DMA is not be passed (all the other users pass it). I can drop GFP_DMA32 from my call in io-pgtable-arm-v7s.c. > So it would be fine even unchanged. The check would anyway need some > more love to catch the same with __GFP_DMA to be consistent and cover > all corner cases. Yes, the test is not complete. If we really wanted this to be accurate, we'd need to check that GFP_* exactly
Re: [PATCH v4 2/3] mm: Add support for kmem caches in DMA32 zone
On Thu, Dec 06, 2018 at 08:41:36AM +0800, Nicolas Boichat wrote: >On Wed, Dec 5, 2018 at 8:18 PM Wei Yang wrote: >> >> On Wed, Dec 05, 2018 at 03:39:51PM +0800, Nicolas Boichat wrote: >> >On Wed, Dec 5, 2018 at 3:25 PM Wei Yang wrote: >> >> >> >> On Wed, Dec 05, 2018 at 01:48:27PM +0800, Nicolas Boichat wrote: >> >> >In some cases (e.g. IOMMU ARMv7s page allocator), we need to allocate >> >> >data structures smaller than a page with GFP_DMA32 flag. >> >> > >> >> >This change makes it possible to create a custom cache in DMA32 zone >> >> >using kmem_cache_create, then allocate memory using kmem_cache_alloc. >> >> > >> >> >We do not create a DMA32 kmalloc cache array, as there are currently >> >> >no users of kmalloc(..., GFP_DMA32). The new test in check_slab_flags >> >> >ensures that such calls still fail (as they do before this change). >> >> > >> >> >Fixes: ad67f5a6545f ("arm64: replace ZONE_DMA with ZONE_DMA32") >> >> >Signed-off-by: Nicolas Boichat >> >> >--- >> >> > >> >> >Changes since v2: >> >> > - Clarified commit message >> >> > - Add entry in sysfs-kernel-slab to document the new sysfs file >> >> > >> >> >(v3 used the page_frag approach) >> >> > >> >> >Documentation/ABI/testing/sysfs-kernel-slab | 9 + >> >> > include/linux/slab.h| 2 ++ >> >> > mm/internal.h | 8 ++-- >> >> > mm/slab.c | 4 +++- >> >> > mm/slab.h | 3 ++- >> >> > mm/slab_common.c| 2 +- >> >> > mm/slub.c | 18 +- >> >> > 7 files changed, 40 insertions(+), 6 deletions(-) >> >> > >> >> >diff --git a/Documentation/ABI/testing/sysfs-kernel-slab >> >> >b/Documentation/ABI/testing/sysfs-kernel-slab >> >> >index 29601d93a1c2ea..d742c6cfdffbe9 100644 >> >> >--- a/Documentation/ABI/testing/sysfs-kernel-slab >> >> >+++ b/Documentation/ABI/testing/sysfs-kernel-slab >> >> >@@ -106,6 +106,15 @@ Description: >> >> > are from ZONE_DMA. >> >> > Available when CONFIG_ZONE_DMA is enabled. 
>> >> > >> >> >+What: /sys/kernel/slab/cache/cache_dma32 >> >> >+Date: December 2018 >> >> >+KernelVersion:4.21 >> >> >+Contact: Nicolas Boichat >> >> >+Description: >> >> >+ The cache_dma32 file is read-only and specifies whether >> >> >objects >> >> >+ are from ZONE_DMA32. >> >> >+ Available when CONFIG_ZONE_DMA32 is enabled. >> >> >+ >> >> > What: /sys/kernel/slab/cache/cpu_slabs >> >> > Date: May 2007 >> >> > KernelVersion:2.6.22 >> >> >diff --git a/include/linux/slab.h b/include/linux/slab.h >> >> >index 11b45f7ae4057c..9449b19c5f107a 100644 >> >> >--- a/include/linux/slab.h >> >> >+++ b/include/linux/slab.h >> >> >@@ -32,6 +32,8 @@ >> >> > #define SLAB_HWCACHE_ALIGN((slab_flags_t __force)0x2000U) >> >> > /* Use GFP_DMA memory */ >> >> > #define SLAB_CACHE_DMA((slab_flags_t >> >> > __force)0x4000U) >> >> >+/* Use GFP_DMA32 memory */ >> >> >+#define SLAB_CACHE_DMA32 ((slab_flags_t __force)0x8000U) >> >> > /* DEBUG: Store the last owner for bug hunting */ >> >> > #define SLAB_STORE_USER ((slab_flags_t >> >> > __force)0x0001U) >> >> > /* Panic if kmem_cache_create() fails */ >> >> >diff --git a/mm/internal.h b/mm/internal.h >> >> >index a2ee82a0cd44ae..fd244ad716eaf8 100644 >> >> >--- a/mm/internal.h >> >> >+++ b/mm/internal.h >> >> >@@ -14,6 +14,7 @@ >> >> > #include >> >> > #include >> >> > #include >> >> >+#include >> >> > #include >> >> > >> >> > /* >> >> >@@ -34,9 +35,12 @@ >> >> > #define GFP_CONSTRAINT_MASK (__GFP_HARDWALL|__GFP_THISNODE) >> >> > >> >> > /* Check for flags that must not be used with a slab allocator */ >> >> >-static inline gfp_t check_slab_flags(gfp_t flags) >> >> >+static inline gfp_t check_slab_flags(gfp_t flags, slab_flags_t >> >> >slab_flags) >> >> > { >> >> >- gfp_t bug_mask = __GFP_DMA32 | __GFP_HIGHMEM | ~__GFP_BITS_MASK; >> >> >+ gfp_t bug_mask = __GFP_HIGHMEM | ~__GFP_BITS_MASK; >> >> >+ >> >> >+ if (!IS_ENABLED(CONFIG_ZONE_DMA32) || !(slab_flags & >> >> >SLAB_CACHE_DMA32)) >> >> >+ bug_mask |= __GFP_DMA32; >> >> >> >> The 
original version doesn't check CONFIG_ZONE_DMA32. >> >> >> >> Do we need to add this condition here? >> >> Could we just decide the bug_mask based on slab_flags? >> > >> >We can. The reason I did it this way is that when we don't have >> >CONFIG_ZONE_DMA32, the compiler should be able to simplify to: >> > >> >bug_mask = __GFP_HIGHMEM | ~__GFP_BITS_MASK; >> >if (true || ..) => if (true) >> > bug_mask |= __GFP_DMA32; >> > >> >Then just >> >bug_mask = __GFP_HIGHMEM | ~__GFP_BITS_MASK | __GFP_DMA32; >> > >> >And since the function is inline, slab_flags would not even need to be >> >accessed at all. >> > >> >> Hmm, I get one confusion. >> >> This means if CONFIG_ZONE_DMA32 is not enabled,
Re: [PATCH v5 04/12] iommu/vt-d: Add 256-bit invalidation descriptor support
Hi Joerg,

On 12/5/18 11:56 PM, Joerg Roedel wrote:
> On Tue, Dec 04, 2018 at 02:13:31PM +0800, Lu Baolu wrote:
>> The existing code uses GFP_ATOMIC, this patch only changes the size of
>> the allocated desc_page. I don't think we really need GFP_ATOMIC here
>> (and also for some other places). I will clean them up in a separate
>> patch.
>
> Okay, thanks.
>
>> In this patch, there is some code like the code below. It calculates
>> the destination address of the memcpy with qi->desc. If it were still a
>> struct qi_desc pointer, the calculation result would be wrong.
>>
>> +	memcpy(desc, qi->desc + (wait_index << shift),
>> +	       1 << shift);
>>
>> The change of the calculation method is to support 128-bit and 256-bit
>> invalidation descriptors in this unified code logic. Also, the
>> conversation between Baolu and me may help.
>>
>> https://lore.kernel.org/patchwork/patch/1006756/

Yes. We need to support different descriptor sizes.

> Okay, pointer arithmetic on void* isn't well defined in the C standard,
> afaik. But it should work with GCC, so it's probably fine.
>
> Unrelated to this patch-set, the whole qi management code needs some
> cleanups, it queues a sync after every command and has very tricky
> locking. This patch-set further complicates matters there, so it is
> probably time for a clean re-write of that part?

Sure. I will start the cleanups (including the unnecessary ATOMIC page
allocation) and submit them in a separate patch set after they are well
tested.

Best regards,
Lu Baolu
Re: [PATCH v5 02/12] iommu/vt-d: Manage scalable mode PASID tables
Hi Joerg,

On 12/5/18 11:50 PM, Joerg Roedel wrote:
> On Tue, Dec 04, 2018 at 01:58:06PM +0800, Lu Baolu wrote:
>> This function is called in an unsleepable context.
>>
>> spin_lock()
>> [...]
>> if (pasid_table_is_necessary)
>> 	allocate_pasid_table(dev)
>> [...]
>> spin_unlock()
>>
>> We can move it out of the lock range.
>>
>> How about
>>
>> if (pasid_table_is_necessary)
>> 	pasid_table = allocate_pasid_table(dev)
>>
>> spin_lock()
>> [...]
>> if (pasid_table_is_necessary)
>> 	set_up_pasid_table(pasid_table)
>> [...]
>> spin_unlock()
>
> Hmm, so when the IOMMU is configured in scalable mode we can just
> allocate a pasid-table for the device when we set it up, right?
> Scalable mode is a boot-time decision, so we know for sure whether we
> need a pasid-table at device-setup time. And the device-setup code is
> preemptible, so I think this allocation should be outside of any
> spin-locked section.

Fair enough. I will fix this up in the next version. Thank you for
pointing this out.

Best regards,
Lu Baolu
Re: [PATCH v4 2/3] mm: Add support for kmem caches in DMA32 zone
On Wed, Dec 5, 2018 at 8:18 PM Wei Yang wrote: > > On Wed, Dec 05, 2018 at 03:39:51PM +0800, Nicolas Boichat wrote: > >On Wed, Dec 5, 2018 at 3:25 PM Wei Yang wrote: > >> > >> On Wed, Dec 05, 2018 at 01:48:27PM +0800, Nicolas Boichat wrote: > >> >In some cases (e.g. IOMMU ARMv7s page allocator), we need to allocate > >> >data structures smaller than a page with GFP_DMA32 flag. > >> > > >> >This change makes it possible to create a custom cache in DMA32 zone > >> >using kmem_cache_create, then allocate memory using kmem_cache_alloc. > >> > > >> >We do not create a DMA32 kmalloc cache array, as there are currently > >> >no users of kmalloc(..., GFP_DMA32). The new test in check_slab_flags > >> >ensures that such calls still fail (as they do before this change). > >> > > >> >Fixes: ad67f5a6545f ("arm64: replace ZONE_DMA with ZONE_DMA32") > >> >Signed-off-by: Nicolas Boichat > >> >--- > >> > > >> >Changes since v2: > >> > - Clarified commit message > >> > - Add entry in sysfs-kernel-slab to document the new sysfs file > >> > > >> >(v3 used the page_frag approach) > >> > > >> >Documentation/ABI/testing/sysfs-kernel-slab | 9 + > >> > include/linux/slab.h| 2 ++ > >> > mm/internal.h | 8 ++-- > >> > mm/slab.c | 4 +++- > >> > mm/slab.h | 3 ++- > >> > mm/slab_common.c| 2 +- > >> > mm/slub.c | 18 +- > >> > 7 files changed, 40 insertions(+), 6 deletions(-) > >> > > >> >diff --git a/Documentation/ABI/testing/sysfs-kernel-slab > >> >b/Documentation/ABI/testing/sysfs-kernel-slab > >> >index 29601d93a1c2ea..d742c6cfdffbe9 100644 > >> >--- a/Documentation/ABI/testing/sysfs-kernel-slab > >> >+++ b/Documentation/ABI/testing/sysfs-kernel-slab > >> >@@ -106,6 +106,15 @@ Description: > >> > are from ZONE_DMA. > >> > Available when CONFIG_ZONE_DMA is enabled. 
> >> > > >> >+What: /sys/kernel/slab/cache/cache_dma32 > >> >+Date: December 2018 > >> >+KernelVersion:4.21 > >> >+Contact: Nicolas Boichat > >> >+Description: > >> >+ The cache_dma32 file is read-only and specifies whether > >> >objects > >> >+ are from ZONE_DMA32. > >> >+ Available when CONFIG_ZONE_DMA32 is enabled. > >> >+ > >> > What: /sys/kernel/slab/cache/cpu_slabs > >> > Date: May 2007 > >> > KernelVersion:2.6.22 > >> >diff --git a/include/linux/slab.h b/include/linux/slab.h > >> >index 11b45f7ae4057c..9449b19c5f107a 100644 > >> >--- a/include/linux/slab.h > >> >+++ b/include/linux/slab.h > >> >@@ -32,6 +32,8 @@ > >> > #define SLAB_HWCACHE_ALIGN((slab_flags_t __force)0x2000U) > >> > /* Use GFP_DMA memory */ > >> > #define SLAB_CACHE_DMA((slab_flags_t __force)0x4000U) > >> >+/* Use GFP_DMA32 memory */ > >> >+#define SLAB_CACHE_DMA32 ((slab_flags_t __force)0x8000U) > >> > /* DEBUG: Store the last owner for bug hunting */ > >> > #define SLAB_STORE_USER ((slab_flags_t __force)0x0001U) > >> > /* Panic if kmem_cache_create() fails */ > >> >diff --git a/mm/internal.h b/mm/internal.h > >> >index a2ee82a0cd44ae..fd244ad716eaf8 100644 > >> >--- a/mm/internal.h > >> >+++ b/mm/internal.h > >> >@@ -14,6 +14,7 @@ > >> > #include > >> > #include > >> > #include > >> >+#include > >> > #include > >> > > >> > /* > >> >@@ -34,9 +35,12 @@ > >> > #define GFP_CONSTRAINT_MASK (__GFP_HARDWALL|__GFP_THISNODE) > >> > > >> > /* Check for flags that must not be used with a slab allocator */ > >> >-static inline gfp_t check_slab_flags(gfp_t flags) > >> >+static inline gfp_t check_slab_flags(gfp_t flags, slab_flags_t > >> >slab_flags) > >> > { > >> >- gfp_t bug_mask = __GFP_DMA32 | __GFP_HIGHMEM | ~__GFP_BITS_MASK; > >> >+ gfp_t bug_mask = __GFP_HIGHMEM | ~__GFP_BITS_MASK; > >> >+ > >> >+ if (!IS_ENABLED(CONFIG_ZONE_DMA32) || !(slab_flags & > >> >SLAB_CACHE_DMA32)) > >> >+ bug_mask |= __GFP_DMA32; > >> > >> The original version doesn't check CONFIG_ZONE_DMA32. 
> >> > >> Do we need to add this condition here? > >> Could we just decide the bug_mask based on slab_flags? > > > >We can. The reason I did it this way is that when we don't have > >CONFIG_ZONE_DMA32, the compiler should be able to simplify to: > > > >bug_mask = __GFP_HIGHMEM | ~__GFP_BITS_MASK; > >if (true || ..) => if (true) > > bug_mask |= __GFP_DMA32; > > > >Then just > >bug_mask = __GFP_HIGHMEM | ~__GFP_BITS_MASK | __GFP_DMA32; > > > >And since the function is inline, slab_flags would not even need to be > >accessed at all. > > > > Hmm, I get one confusion. > > This means if CONFIG_ZONE_DMA32 is not enabled, bug_mask will always > contains __GFP_DMA32. This will check with cachep->flags. > > If cachep->flags has GFP_DMA32, this always fail? > > Is this possible? Not fully sure to understand the question,
[PATCH v2 7/8] dma/debug: Remove dma_debug_resize_entries()
With no callers left, we can clean up this part of dma-debug's exposed
internals, making way to tweak the allocation behaviour.

Signed-off-by: Robin Murphy
---
v2: New

 include/linux/dma-debug.h |  7 -------
 kernel/dma/debug.c        | 30 ------------------------------
 2 files changed, 37 deletions(-)

diff --git a/include/linux/dma-debug.h b/include/linux/dma-debug.h
index 30213adbb6b9..46e6131a72b6 100644
--- a/include/linux/dma-debug.h
+++ b/include/linux/dma-debug.h
@@ -30,8 +30,6 @@ struct bus_type;
 
 extern void dma_debug_add_bus(struct bus_type *bus);
 
-extern int dma_debug_resize_entries(u32 num_entries);
-
 extern void debug_dma_map_single(struct device *dev, const void *addr,
				 unsigned long len);
 
@@ -101,11 +99,6 @@ static inline void dma_debug_add_bus(struct bus_type *bus)
 {
 }
 
-static inline int dma_debug_resize_entries(u32 num_entries)
-{
-	return 0;
-}
-
 static inline void debug_dma_map_single(struct device *dev, const void *addr,
					unsigned long len)
 {
diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index cd9a12bf6a3a..2202402afe9a 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -756,36 +756,6 @@ static void dma_entry_free(struct dma_debug_entry *entry)
	spin_unlock_irqrestore(&free_entries_lock, flags);
 }
 
-int dma_debug_resize_entries(u32 num_entries)
-{
-	int i, delta, ret = 0;
-	unsigned long flags;
-	struct dma_debug_entry *entry;
-
-	spin_lock_irqsave(&free_entries_lock, flags);
-
-	if (nr_total_entries < num_entries) {
-		delta = num_entries - nr_total_entries;
-
-		ret = dma_debug_add_entries(delta, GFP_ATOMIC);
-	} else {
-		delta = nr_total_entries - num_entries;
-
-		for (i = 0; i < delta && !list_empty(&free_entries); i++) {
-			entry = __dma_entry_alloc();
-			kfree(entry);
-		}
-
-		nr_total_entries -= i;
-		if (nr_total_entries != num_entries)
-			ret = -EBUSY;
-	}
-
-	spin_unlock_irqrestore(&free_entries_lock, flags);
-
-	return ret;
-}
-
 /*
  * DMA-API debugging init code
  */
-- 
2.19.1.dirty
[PATCH v2 6/8] x86/dma/amd-gart: Stop resizing dma_debug_entry pool
dma-debug is now capable of adding new entries to its pool on-demand if the initial preallocation was insufficient, so the IOMMU_LEAK logic no longer needs to explicitly change the pool size. This does lose it the ability to save a couple of megabytes of RAM by reducing the pool size below its default, but it seems unlikely that that is a realistic concern these days (or indeed that anyone is actively debugging AGP drivers' DMA usage any more). Getting rid of dma_debug_resize_entries() will make room for further streamlining in the dma-debug code itself. Removing the call reveals quite a lot of cruft which has been useless for nearly a decade since commit 19c1a6f5764d ("x86 gart: reimplement IOMMU_LEAK feature by using DMA_API_DEBUG"), including the entire 'iommu=leak' parameter, which controlled nothing except whether dma_debug_resize_entries() was called or not. CC: Thomas Gleixner CC: Ingo Molnar CC: Borislav Petkov CC: "H. Peter Anvin" CC: x...@kernel.org Signed-off-by: Robin Murphy --- v2: New Documentation/x86/x86_64/boot-options.txt | 5 + arch/x86/kernel/amd_gart_64.c | 23 --- 2 files changed, 1 insertion(+), 27 deletions(-) diff --git a/Documentation/x86/x86_64/boot-options.txt b/Documentation/x86/x86_64/boot-options.txt index ad6d2a80cf05..abc53886655e 100644 --- a/Documentation/x86/x86_64/boot-options.txt +++ b/Documentation/x86/x86_64/boot-options.txt @@ -209,7 +209,7 @@ IOMMU (input/output memory management unit) mapping with memory protection, etc. Kernel boot message: "PCI-DMA: Using Calgary IOMMU" - iommu=[][,noagp][,off][,force][,noforce][,leak[=] + iommu=[][,noagp][,off][,force][,noforce] [,memaper[=]][,merge][,fullflush][,nomerge] [,noaperture][,calgary] @@ -228,9 +228,6 @@ IOMMU (input/output memory management unit) allowedOverwrite iommu off workarounds for specific chipsets. fullflush Flush IOMMU on each allocation (default). nofullflushDon't use IOMMU fullflush. -leak Turn on simple iommu leak tracing (only when - CONFIG_IOMMU_LEAK is on). 
Default number of leak pages - is 20. memaper[=] Allocate an own aperture over RAM with size 32MB
[PATCH v2 3/8] dma-debug: Refactor dma_debug_entry allocation
Make the prealloc_memory() logic a little more general and robust so that it serves for runtime reallocations too. The first thing we can do with that is clean up dma_debug_resize_entries() a bit. Signed-off-by: Robin Murphy --- v2: Give it a better name, simplify the locking mess kernel/dma/debug.c | 87 +++--- 1 file changed, 36 insertions(+), 51 deletions(-) diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c index 29486eb9d1dc..1b0858d7edfd 100644 --- a/kernel/dma/debug.c +++ b/kernel/dma/debug.c @@ -646,6 +646,36 @@ static void add_dma_entry(struct dma_debug_entry *entry) */ } +static int dma_debug_add_entries(u32 num_entries, gfp_t gfp) +{ + struct dma_debug_entry *entry, *next_entry; + LIST_HEAD(tmp); + int i; + + for (i = 0; i < num_entries; ++i) { + entry = kzalloc(sizeof(*entry), gfp); + if (!entry) + goto out_err; + + list_add_tail(>list, ); + } + + list_splice(, _entries); + num_free_entries += num_entries; + nr_total_entries += num_entries; + + return 0; + +out_err: + + list_for_each_entry_safe(entry, next_entry, , list) { + list_del(>list); + kfree(entry); + } + + return -ENOMEM; +} + static struct dma_debug_entry *__dma_entry_alloc(void) { struct dma_debug_entry *entry; @@ -715,28 +745,13 @@ int dma_debug_resize_entries(u32 num_entries) int i, delta, ret = 0; unsigned long flags; struct dma_debug_entry *entry; - LIST_HEAD(tmp); spin_lock_irqsave(_entries_lock, flags); if (nr_total_entries < num_entries) { delta = num_entries - nr_total_entries; - spin_unlock_irqrestore(_entries_lock, flags); - - for (i = 0; i < delta; i++) { - entry = kzalloc(sizeof(*entry), GFP_KERNEL); - if (!entry) - break; - - list_add_tail(>list, ); - } - - spin_lock_irqsave(_entries_lock, flags); - - list_splice(, _entries); - nr_total_entries += i; - num_free_entries += i; + ret = dma_debug_add_entries(delta, GFP_ATOMIC); } else { delta = nr_total_entries - num_entries; @@ -746,11 +761,10 @@ int dma_debug_resize_entries(u32 num_entries) } nr_total_entries -= i; + if 
(nr_total_entries != num_entries) + ret = -EBUSY; } - if (nr_total_entries != num_entries) - ret = 1; - spin_unlock_irqrestore(_entries_lock, flags); return ret; @@ -764,36 +778,6 @@ int dma_debug_resize_entries(u32 num_entries) * 2. Preallocate a given number of dma_debug_entry structs */ -static int prealloc_memory(u32 num_entries) -{ - struct dma_debug_entry *entry, *next_entry; - int i; - - for (i = 0; i < num_entries; ++i) { - entry = kzalloc(sizeof(*entry), GFP_KERNEL); - if (!entry) - goto out_err; - - list_add_tail(>list, _entries); - } - - num_free_entries = num_entries; - min_free_entries = num_entries; - - pr_info("preallocated %d debug entries\n", num_entries); - - return 0; - -out_err: - - list_for_each_entry_safe(entry, next_entry, _entries, list) { - list_del(>list); - kfree(entry); - } - - return -ENOMEM; -} - static ssize_t filter_read(struct file *file, char __user *user_buf, size_t count, loff_t *ppos) { @@ -1038,14 +1022,15 @@ static int dma_debug_init(void) return 0; } - if (prealloc_memory(nr_prealloc_entries) != 0) { + if (dma_debug_add_entries(nr_prealloc_entries, GFP_KERNEL) != 0) { pr_err("debugging out of memory error - disabled\n"); global_disable = true; return 0; } - nr_total_entries = num_free_entries; + min_free_entries = num_free_entries; + pr_info("preallocated %d debug entries\n", nr_total_entries); dma_debug_initialized = true; -- 2.19.1.dirty ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2 8/8] dma-debug: Batch dma_debug_entry allocation
DMA debug entries are one of those things which aren't that useful individually - we will always want some larger quantity of them - and that we don't really need to manage the exact number of - we only care about having 'enough'. In that regard, the current behaviour of creating them one-by-one from the slab allocator means an awful lot of function call overhead and memory wasted on alignment padding. Now that we don't have to worry about freeing anything via dma_debug_resize_entries(), we can optimise the allocation behaviour by grabbing whole pages at once, which will save considerably on the aforementioned overheads, and probably offer a little more cache/TLB locality benefit for traversing the lists under normal operation. Since freeing a whole page of entries at once becomes enough of a challenge that it's not really worth complicating dma_debug_init(), we may as well tweak the preallocation behaviour so that as long as we manage to allocate *some* pages, we can leave debugging enabled on a best-effort basis rather than otherwise wasting them. Signed-off-by: Robin Murphy --- v2: New Documentation/DMA-API.txt | 4 +++- kernel/dma/debug.c| 45 +-- 2 files changed, 22 insertions(+), 27 deletions(-) diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt index 7a7d8a415ce8..097c51b79330 100644 --- a/Documentation/DMA-API.txt +++ b/Documentation/DMA-API.txt @@ -747,7 +747,9 @@ driver afterwards. This filter can be disabled or changed later using debugfs. When the code disables itself at runtime this is most likely because it ran out of dma_debug_entries and was unable to allocate more on-demand. 65536 entries are preallocated at boot - if this is too low for you boot with -'dma_debug_entries=' to overwrite the default. The +'dma_debug_entries=' to overwrite the default. Note +that the code allocates entires in batches, so the exact number of +preallocated entries may be greater than the actual number requested. 
The code will print to the kernel log each time it has dynamically allocated as many entries as were initially preallocated. This is to indicate that a larger preallocation size may be appropriate, or if it happens continually diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c index 2202402afe9a..a6a603526c8f 100644 --- a/kernel/dma/debug.c +++ b/kernel/dma/debug.c @@ -48,7 +48,7 @@ #define PREALLOC_DMA_DEBUG_ENTRIES (1 << 16) #endif /* If the pool runs out, add this many new entries at once */ -#define DMA_DEBUG_DYNAMIC_ENTRIES 256 +#define DMA_DEBUG_DYNAMIC_ENTRIES (PAGE_SIZE / sizeof(struct dma_debug_entry)) enum { dma_debug_single, @@ -648,34 +648,22 @@ static void add_dma_entry(struct dma_debug_entry *entry) */ } -static int dma_debug_add_entries(u32 num_entries, gfp_t gfp) +static int dma_debug_add_entries(gfp_t gfp) { - struct dma_debug_entry *entry, *next_entry; - LIST_HEAD(tmp); + struct dma_debug_entry *entry; int i; - for (i = 0; i < num_entries; ++i) { - entry = kzalloc(sizeof(*entry), gfp); - if (!entry) - goto out_err; + entry = (void *)get_zeroed_page(gfp); + if (!entry) + return -ENOMEM; - list_add_tail(>list, ); - } + for (i = 0; i < DMA_DEBUG_DYNAMIC_ENTRIES; i++) + list_add_tail([i].list, _entries); - list_splice(, _entries); - num_free_entries += num_entries; - nr_total_entries += num_entries; + num_free_entries += DMA_DEBUG_DYNAMIC_ENTRIES; + nr_total_entries += DMA_DEBUG_DYNAMIC_ENTRIES; return 0; - -out_err: - - list_for_each_entry_safe(entry, next_entry, , list) { - list_del(>list); - kfree(entry); - } - - return -ENOMEM; } static struct dma_debug_entry *__dma_entry_alloc(void) @@ -717,7 +705,7 @@ static struct dma_debug_entry *dma_entry_alloc(void) spin_lock_irqsave(_entries_lock, flags); if (num_free_entries == 0) { - if (dma_debug_create_entries(DMA_DEBUG_DYNAMIC_ENTRIES, GFP_ATOMIC)) { + if (dma_debug_add_entries(GFP_ATOMIC)) { global_disable = true; spin_unlock_irqrestore(_entries_lock, flags); pr_err("debugging out of memory - 
disabling\n"); @@ -1008,15 +996,20 @@ static int dma_debug_init(void) return 0; } - if (dma_debug_add_entries(nr_prealloc_entries, GFP_KERNEL) != 0) { + for (i = 0; i < DIV_ROUND_UP(nr_prealloc_entries, DMA_DEBUG_DYNAMIC_ENTRIES); ++i) + dma_debug_add_entries(GFP_KERNEL); + if (num_free_entries >= nr_prealloc_entries) { + pr_info("preallocated %d debug entries\n", nr_total_entries); + } else if (num_free_entries > 0) { + pr_warn("%d debug entries requested but only %d allocated\n", +
[PATCH v2 5/8] dma-debug: Make leak-like behaviour apparent
Now that we can dynamically allocate DMA debug entries to cope with drivers maintaining excessively large numbers of live mappings, a driver which *does* actually have a bug leaking mappings (and is not unloaded) will no longer trigger the "DMA-API: debugging out of memory - disabling" message until it gets to actual kernel OOM conditions, which means it could go unnoticed for a while. To that end, let's inform the user each time the pool has grown to a multiple of its initial size, which should make it apparent that they either have a leak or might want to increase the preallocation size. Signed-off-by: Robin Murphy --- v2: Move the suggestion of a leak from the message to the documentation Documentation/DMA-API.txt | 6 +- kernel/dma/debug.c| 13 + 2 files changed, 18 insertions(+), 1 deletion(-) diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt index 0fcb7561af1e..7a7d8a415ce8 100644 --- a/Documentation/DMA-API.txt +++ b/Documentation/DMA-API.txt @@ -747,7 +747,11 @@ driver afterwards. This filter can be disabled or changed later using debugfs. When the code disables itself at runtime this is most likely because it ran out of dma_debug_entries and was unable to allocate more on-demand. 65536 entries are preallocated at boot - if this is too low for you boot with -'dma_debug_entries=' to overwrite the default. +'dma_debug_entries=' to overwrite the default. The +code will print to the kernel log each time it has dynamically allocated +as many entries as were initially preallocated. This is to indicate that a +larger preallocation size may be appropriate, or if it happens continually +that a driver may be leaking mappings. 
:: diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c index b1fe8b78acc4..cd9a12bf6a3a 100644 --- a/kernel/dma/debug.c +++ b/kernel/dma/debug.c @@ -693,6 +693,18 @@ static struct dma_debug_entry *__dma_entry_alloc(void) return entry; } +void __dma_entry_alloc_check_leak(void) +{ + u32 tmp = nr_total_entries % nr_prealloc_entries; + + /* Shout each time we tick over some multiple of the initial pool */ + if (tmp < DMA_DEBUG_DYNAMIC_ENTRIES) { + pr_info("dma_debug_entry pool grown to %u (%u00%%)\n", + nr_total_entries, + (nr_total_entries / nr_prealloc_entries)); + } +} + /* struct dma_entry allocator * * The next two functions implement the allocator for @@ -711,6 +723,7 @@ static struct dma_debug_entry *dma_entry_alloc(void) pr_err("debugging out of memory - disabling\n"); return NULL; } + __dma_entry_alloc_check_leak(); } entry = __dma_entry_alloc(); -- 2.19.1.dirty ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2 4/8] dma-debug: Dynamically expand the dma_debug_entry pool
Certain drivers such as large multi-queue network adapters can use pools of mapped DMA buffers larger than the default dma_debug_entry pool of 65536 entries, with the result that merely probing such a device can cause DMA debug to disable itself during boot unless explicitly given an appropriate "dma_debug_entries=..." option. Developers trying to debug some other driver on such a system may not be immediately aware of this, and at worst it can hide bugs if they fail to realise that dma-debug has already disabled itself unexpectedly by the time the code of interest gets to run. Even once they do realise, it can be a bit of a pain to empirically determine a suitable number of preallocated entries to configure without massively over-allocating. There's really no need for such a static limit, though, since we can quite easily expand the pool at runtime in those rare cases that the preallocated entries are insufficient, which is arguably the least surprising and most useful behaviour. Signed-off-by: Robin Murphy --- v2: Use GFP_ATOMIC, make retry loop redundant, update documentation Documentation/DMA-API.txt | 11 +-- kernel/dma/debug.c| 15 +-- 2 files changed, 14 insertions(+), 12 deletions(-) diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt index 6bdb095393b0..0fcb7561af1e 100644 --- a/Documentation/DMA-API.txt +++ b/Documentation/DMA-API.txt @@ -717,8 +717,8 @@ dma-api/num_errors The number in this file shows how many dma-api/min_free_entries This read-only file can be read to get the minimum number of free dma_debug_entries the allocator has ever seen. If this value goes - down to zero the code will disable itself - because it is not longer reliable. + down to zero the code will attempt to increase + nr_total_entries to compensate. dma-api/num_free_entries The current number of free dma_debug_entries in the allocator. @@ -745,10 +745,9 @@ driver filter at boot time. The debug code will only print errors for that driver afterwards.

This filter can be disabled or changed later using debugfs. When the code disables itself at runtime this is most likely because it ran -out of dma_debug_entries. These entries are preallocated at boot. The number -of preallocated entries is defined per architecture. If it is too low for you -boot with 'dma_debug_entries=' to overwrite the -architectural default. +out of dma_debug_entries and was unable to allocate more on-demand. 65536 +entries are preallocated at boot - if this is too low for you boot with +'dma_debug_entries=' to overwrite the default. :: diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c index 1b0858d7edfd..b1fe8b78acc4 100644 --- a/kernel/dma/debug.c +++ b/kernel/dma/debug.c @@ -47,6 +47,8 @@ #ifndef PREALLOC_DMA_DEBUG_ENTRIES #define PREALLOC_DMA_DEBUG_ENTRIES (1 << 16) #endif +/* If the pool runs out, add this many new entries at once */ +#define DMA_DEBUG_DYNAMIC_ENTRIES 256 enum { dma_debug_single, @@ -702,12 +704,13 @@ static struct dma_debug_entry *dma_entry_alloc(void) unsigned long flags; spin_lock_irqsave(_entries_lock, flags); - - if (list_empty(_entries)) { - global_disable = true; - spin_unlock_irqrestore(_entries_lock, flags); - pr_err("debugging out of memory - disabling\n"); - return NULL; + if (num_free_entries == 0) { + if (dma_debug_create_entries(DMA_DEBUG_DYNAMIC_ENTRIES, GFP_ATOMIC)) { + global_disable = true; + spin_unlock_irqrestore(_entries_lock, flags); + pr_err("debugging out of memory - disabling\n"); + return NULL; + } } entry = __dma_entry_alloc(); -- 2.19.1.dirty ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2 1/8] dma-debug: Use pr_fmt()
Use pr_fmt() to generate the "DMA-API: " prefix consistently. This results in it being added to a couple of pr_*() messages which were missing it before, and for the err_printk() calls moves it to the actual start of the message instead of somewhere in the middle. Reviewed-by: Christoph Hellwig Signed-off-by: Robin Murphy --- v2: Add Christoph's review tag kernel/dma/debug.c | 74 -- 1 file changed, 38 insertions(+), 36 deletions(-) diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c index 231ca4628062..91b84140e4a5 100644 --- a/kernel/dma/debug.c +++ b/kernel/dma/debug.c @@ -17,6 +17,8 @@ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */ +#define pr_fmt(fmt)"DMA-API: " fmt + #include #include #include @@ -234,7 +236,7 @@ static bool driver_filter(struct device *dev) error_count += 1; \ if (driver_filter(dev) && \ (show_all_errors || show_num_errors > 0)) { \ - WARN(1, "%s %s: " format, \ + WARN(1, pr_fmt("%s %s: ") format, \ dev ? dev_driver_string(dev) : "NULL", \ dev ? dev_name(dev) : "NULL", ## arg); \ dump_entry_trace(entry);\ @@ -519,7 +521,7 @@ static void active_cacheline_inc_overlap(phys_addr_t cln) * prematurely. 
*/ WARN_ONCE(overlap > ACTIVE_CACHELINE_MAX_OVERLAP, - "DMA-API: exceeded %d overlapping mappings of cacheline %pa\n", + pr_fmt("exceeded %d overlapping mappings of cacheline %pa\n"), ACTIVE_CACHELINE_MAX_OVERLAP, ); } @@ -614,7 +616,7 @@ void debug_dma_assert_idle(struct page *page) cln = to_cacheline_number(entry); err_printk(entry->dev, entry, - "DMA-API: cpu touching an active dma mapped cacheline [cln=%pa]\n", + "cpu touching an active dma mapped cacheline [cln=%pa]\n", ); } @@ -634,7 +636,7 @@ static void add_dma_entry(struct dma_debug_entry *entry) rc = active_cacheline_insert(entry); if (rc == -ENOMEM) { - pr_err("DMA-API: cacheline tracking ENOMEM, dma-debug disabled\n"); + pr_err("cacheline tracking ENOMEM, dma-debug disabled\n"); global_disable = true; } @@ -673,7 +675,7 @@ static struct dma_debug_entry *dma_entry_alloc(void) if (list_empty(_entries)) { global_disable = true; spin_unlock_irqrestore(_entries_lock, flags); - pr_err("DMA-API: debugging out of memory - disabling\n"); + pr_err("debugging out of memory - disabling\n"); return NULL; } @@ -777,7 +779,7 @@ static int prealloc_memory(u32 num_entries) num_free_entries = num_entries; min_free_entries = num_entries; - pr_info("DMA-API: preallocated %d debug entries\n", num_entries); + pr_info("preallocated %d debug entries\n", num_entries); return 0; @@ -850,7 +852,7 @@ static ssize_t filter_write(struct file *file, const char __user *userbuf, * switched off. 
*/ if (current_driver_name[0]) - pr_info("DMA-API: switching off dma-debug driver filter\n"); + pr_info("switching off dma-debug driver filter\n"); current_driver_name[0] = 0; current_driver = NULL; goto out_unlock; @@ -868,7 +870,7 @@ static ssize_t filter_write(struct file *file, const char __user *userbuf, current_driver_name[i] = 0; current_driver = NULL; - pr_info("DMA-API: enable driver filter for driver [%s]\n", + pr_info("enable driver filter for driver [%s]\n", current_driver_name); out_unlock: @@ -887,7 +889,7 @@ static int dma_debug_fs_init(void) { dma_debug_dent = debugfs_create_dir("dma-api", NULL); if (!dma_debug_dent) { - pr_err("DMA-API: can not create debugfs directory\n"); + pr_err("can not create debugfs directory\n"); return -ENOMEM; } @@ -973,7 +975,7 @@ static int dma_debug_device_change(struct notifier_block *nb, unsigned long acti count = device_dma_allocations(dev, ); if (count == 0) break; - err_printk(dev, entry, "DMA-API: device driver has pending " + err_printk(dev, entry, "device driver has pending " "DMA allocations while released from device " "[count=%d]\n" "One of leaked entries details: " @@ -1023,14 +1025,14 @@ static int
[PATCH v2 0/8] dma-debug cleanup and dynamic allocation
Hi all, Here's some assorted cleanup and improvements to dma-debug which grew out of the problem that certain drivers use very large numbers of DMA mappings, and knowing when to override "dma_debug_entries=..." and what value to override it with can be a less-than-obvious task for users. The main part is patches #3 and #4, wherein we make dma-debug clever enough to allocate more entries dynamically if needed, such that the preallocation value becomes more of a quality-of-life option than a necessity. Patches #6 and #7 do some cruft-removal to allow patch #8 to make the allocation behaviour more efficient in general. Patches #1, #2 and #5 are some other cleanup and handy features which fell out of the discussion/development. Robin. Robin Murphy (8): dma-debug: Use pr_fmt() dma-debug: Expose nr_total_entries in debugfs dma-debug: Refactor dma_debug_entry allocation dma-debug: Dynamically expand the dma_debug_entry pool dma-debug: Make leak-like behaviour apparent x86/dma/amd-gart: Stop resizing dma_debug_entry pool dma/debug: Remove dma_debug_resize_entries() dma-debug: Batch dma_debug_entry allocation Documentation/DMA-API.txt | 20 +- Documentation/x86/x86_64/boot-options.txt | 5 +- arch/x86/kernel/amd_gart_64.c | 23 --- include/linux/dma-debug.h | 7 - kernel/dma/debug.c| 211 ++ 5 files changed, 107 insertions(+), 159 deletions(-) -- 2.19.1.dirty ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2 2/8] dma-debug: Expose nr_total_entries in debugfs
Expose nr_total_entries in debugfs, so that {num,min}_free_entries become even more meaningful to users interested in current/maximum utilisation. This becomes even more relevant once nr_total_entries may change at runtime beyond just the existing AMD GART debug code. Suggested-by: John Garry Signed-off-by: Robin Murphy --- v2: New Documentation/DMA-API.txt | 3 +++ kernel/dma/debug.c| 7 +++ 2 files changed, 10 insertions(+) diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt index ac66ae2509a9..6bdb095393b0 100644 --- a/Documentation/DMA-API.txt +++ b/Documentation/DMA-API.txt @@ -723,6 +723,9 @@ dma-api/min_free_entriesThis read-only file can be read to get the dma-api/num_free_entries The current number of free dma_debug_entries in the allocator. +dma-api/nr_total_entries The total number of dma_debug_entries in the + allocator, both free and used. + dma-api/driver-filter You can write a name of a driver into this file to limit the debug output to requests from that particular driver. 
Write an empty string to diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c index 91b84140e4a5..29486eb9d1dc 100644 --- a/kernel/dma/debug.c +++ b/kernel/dma/debug.c @@ -144,6 +144,7 @@ static struct dentry *show_all_errors_dent __read_mostly; static struct dentry *show_num_errors_dent __read_mostly; static struct dentry *num_free_entries_dent __read_mostly; static struct dentry *min_free_entries_dent __read_mostly; +static struct dentry *nr_total_entries_dent __read_mostly; static struct dentry *filter_dent __read_mostly; /* per-driver filter related state */ @@ -928,6 +929,12 @@ static int dma_debug_fs_init(void) if (!min_free_entries_dent) goto out_err; + nr_total_entries_dent = debugfs_create_u32("nr_total_entries", 0444, + dma_debug_dent, + _total_entries); + if (!nr_total_entries_dent) + goto out_err; + filter_dent = debugfs_create_file("driver_filter", 0644, dma_debug_dent, NULL, _fops); if (!filter_dent) -- 2.19.1.dirty ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu: fix amd_iommu=force_isolation
On Wed, Dec 05, 2018 at 05:09:55PM +0100, Joerg Roedel wrote: > On Tue, Dec 04, 2018 at 03:37:16PM -0700, Yu Zhao wrote: > > The parameter is still there but it's ignored. We need to check its > > value before deciding to go into passthrough mode for AMD IOMMU. > > > > Fixes: aafd8ba0ca74 ("iommu/amd: Implement add_device and remove_device") > > > > Signed-off-by: Yu Zhao > > --- > > drivers/iommu/amd_iommu.c | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c > > index 1167ff0416cf..3e4219e6cff0 100644 > > --- a/drivers/iommu/amd_iommu.c > > +++ b/drivers/iommu/amd_iommu.c > > @@ -2195,7 +2195,8 @@ static int amd_iommu_add_device(struct device *dev) > > > > BUG_ON(!dev_data); > > > > - if (iommu_pass_through || dev_data->iommu_v2) > > + if (iommu_pass_through || > > + (!amd_iommu_force_isolation && dev_data->iommu_v2)) > > This breaks the iommu_v2 use-case, as it needs a direct mapping for the > devices that support it. Actually this is what we want. We occasionally use this parameter to isolate v2 device when debugging memory corruption that we suspect is caused by DMA. It helped us before. Now with the patch, we caught another v2 device doing DMA write to memory areas that it's not supposed to. > I think the force_isolation parameter does not make sense anymore today > and should be removed. Could you please suggest an alternative? Otherwise, we won't be able to force v2 device out of direct mapping for the reason mentioned above. Thank you.
Re: [PATCH 3/4] iommu/of: Don't call iommu_ops->add_device directly
On 05/12/2018 14:36, Joerg Roedel wrote: From: Joerg Roedel Make sure to invoke this call-back through the proper function of the IOMMU-API. Signed-off-by: Joerg Roedel --- drivers/iommu/of_iommu.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c index c5dd63072529..4d4847de727e 100644 --- a/drivers/iommu/of_iommu.c +++ b/drivers/iommu/of_iommu.c @@ -218,10 +218,10 @@ const struct iommu_ops *of_iommu_configure(struct device *dev, ops = dev->iommu_fwspec->ops; /* * If we have reason to believe the IOMMU driver missed the initial -* add_device callback for dev, replay it to get things in order. +* probe for dev, replay it to get things in order. */ - if (ops && ops->add_device && dev->bus && !dev->iommu_group) Ugh, code I wrote... I think that first ops test is serving double-duty - it's not just an "is ops->add_device safe to call?" check, but also the specific "do we need to try doing this at all?" check, since ops is only (potentially) set in the !err case at the top of this context. - err = ops->add_device(dev); + if (dev->bus && !dev->iommu_group) Thus to avoid calling add_device erroneously in either of the "no valid IOMMU" cases (since dev->bus->iommu_ops may well be non-NULL), this still needs to be at least: if (ops && dev->bus && !dev->iommu_group) What I can't quite remember just now is whether it's actually valid to get here with err == 0 but dev->iommu_fwspec->ops == NULL, so it *might* be OK to use "!err" instead of "ops" to make things slightly more obvious - I'll work through the flow tomorrow to double-check. Robin. + err = iommu_probe_device(dev); /* Ignore all other errors apart from EPROBE_DEFER */ if (err == -EPROBE_DEFER) { ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 2/4] iommu: Consolidate ->add/remove_device() calls
On 05/12/2018 14:36, Joerg Roedel wrote: From: Joerg Roedel Put them into separate functions and call those where the plain ops have been called before. Signed-off-by: Joerg Roedel --- drivers/iommu/iommu.c | 54 ++- include/linux/iommu.h | 3 +++ 2 files changed, 31 insertions(+), 26 deletions(-) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index edbdf5d6962c..ad4dc51eb69c 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -110,6 +110,26 @@ void iommu_device_unregister(struct iommu_device *iommu) spin_unlock(_device_lock); } +int iommu_probe_device(struct device *dev) +{ + const struct iommu_ops *ops = dev->bus->iommu_ops; + + if (!ops->add_device) + return 0; Is there any good reason to let .add_device/.remove_device be optional still? Everyone's implemented them for a while now, and on a practical level I don't really see how else we could expect devices to be taken in and out of their appropriate groups correctly. Robin. + + WARN_ON(dev->iommu_group); + + return ops->add_device(dev); +} + +void iommu_release_device(struct device *dev) +{ + const struct iommu_ops *ops = dev->bus->iommu_ops; + + if (ops->remove_device && dev->iommu_group) + ops->remove_device(dev); +} + static struct iommu_domain *__iommu_domain_alloc(struct bus_type *bus, unsigned type); static int __iommu_attach_device(struct iommu_domain *domain, @@ -1117,16 +1137,7 @@ struct iommu_domain *iommu_group_default_domain(struct iommu_group *group) static int add_iommu_group(struct device *dev, void *data) { - struct iommu_callback_data *cb = data; - const struct iommu_ops *ops = cb->ops; - int ret; - - if (!ops->add_device) - return 0; - - WARN_ON(dev->iommu_group); - - ret = ops->add_device(dev); + int ret = iommu_probe_device(dev); /* * We ignore -ENODEV errors for now, as they just mean that the @@ -1141,11 +1152,7 @@ static int add_iommu_group(struct device *dev, void *data) static int remove_iommu_group(struct device *dev, void *data) { - struct iommu_callback_data 
*cb = data; - const struct iommu_ops *ops = cb->ops; - - if (ops->remove_device && dev->iommu_group) - ops->remove_device(dev); + iommu_release_device(dev); return 0; } @@ -1153,27 +1160,22 @@ static int remove_iommu_group(struct device *dev, void *data) static int iommu_bus_notifier(struct notifier_block *nb, unsigned long action, void *data) { + unsigned long group_action = 0; struct device *dev = data; - const struct iommu_ops *ops = dev->bus->iommu_ops; struct iommu_group *group; - unsigned long group_action = 0; /* * ADD/DEL call into iommu driver ops if provided, which may * result in ADD/DEL notifiers to group->notifier */ if (action == BUS_NOTIFY_ADD_DEVICE) { - if (ops->add_device) { - int ret; + int ret; - ret = ops->add_device(dev); - return (ret) ? NOTIFY_DONE : NOTIFY_OK; - } + ret = iommu_probe_device(dev); + return (ret) ? NOTIFY_DONE : NOTIFY_OK; } else if (action == BUS_NOTIFY_REMOVED_DEVICE) { - if (ops->remove_device && dev->iommu_group) { - ops->remove_device(dev); - return 0; - } + iommu_release_device(dev); + return NOTIFY_OK; } /* diff --git a/include/linux/iommu.h b/include/linux/iommu.h index a1d28f42cb77..2357841845bb 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -398,6 +398,9 @@ void iommu_fwspec_free(struct device *dev); int iommu_fwspec_add_ids(struct device *dev, u32 *ids, int num_ids); const struct iommu_ops *iommu_ops_from_fwnode(struct fwnode_handle *fwnode); +int iommu_probe_device(struct device *dev); +void iommu_release_device(struct device *dev); + #else /* CONFIG_IOMMU_API */ struct iommu_ops {}; ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 2/9] ACPI/IORT: Use helper functions to access dev->iommu_fwspec
On 04/12/2018 16:29, Joerg Roedel wrote: From: Joerg Roedel Use the new helpers dev_iommu_fwspec_get()/set() to access the dev->iommu_fwspec pointer. This makes it easier to move that pointer later into another struct. Cc: Lorenzo Pieralisi Signed-off-by: Joerg Roedel --- drivers/acpi/arm64/iort.c | 12 +++- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c index 70f4e80b9246..754a67ba49e5 100644 --- a/drivers/acpi/arm64/iort.c +++ b/drivers/acpi/arm64/iort.c @@ -779,7 +779,7 @@ static inline bool iort_iommu_driver_enabled(u8 type) static struct acpi_iort_node *iort_get_msi_resv_iommu(struct device *dev) { struct acpi_iort_node *iommu; - struct iommu_fwspec *fwspec = dev->iommu_fwspec; + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev); iommu = iort_get_iort_node(fwspec->iommu_fwnode); @@ -824,6 +824,7 @@ static inline int iort_add_device_replay(const struct iommu_ops *ops, */ int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head) { + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev); struct acpi_iort_its_group *its; struct acpi_iort_node *iommu_node, *its_node = NULL; int i, resv = 0; @@ -841,9 +842,9 @@ int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head) * a given PCI or named component may map IDs to. 
*/ - for (i = 0; i < dev->iommu_fwspec->num_ids; i++) { + for (i = 0; i < fwspec->num_ids; i++) { its_node = iort_node_map_id(iommu_node, - dev->iommu_fwspec->ids[i], + fwspec->ids[i], NULL, IORT_MSI_TYPE); if (its_node) break; @@ -1036,6 +1037,7 @@ void iort_dma_setup(struct device *dev, u64 *dma_addr, u64 *dma_size) */ const struct iommu_ops *iort_iommu_configure(struct device *dev) { + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev); struct acpi_iort_node *node, *parent; const struct iommu_ops *ops; u32 streamid = 0; @@ -1045,7 +1047,7 @@ const struct iommu_ops *iort_iommu_configure(struct device *dev) * If we already translated the fwspec there * is nothing left to do, return the iommu_ops. */ - ops = iort_fwspec_iommu_ops(dev->iommu_fwspec); + ops = iort_fwspec_iommu_ops(fwspec); if (ops) return ops; @@ -1084,7 +1086,7 @@ const struct iommu_ops *iort_iommu_configure(struct device *dev) * add_device callback for dev, replay it to get things in order. */ if (!err) { - ops = iort_fwspec_iommu_ops(dev->iommu_fwspec); + ops = iort_fwspec_iommu_ops(fwspec); This needs to reload the new fwspec initialised by iort_iommu_xlate(), in the same manner as the OF code. I think the best thing to do is encapsulate the dev_iommu_fwspec_get() call in iort_fwspec_iommu_ops(), and have that take dev as its argument directly. Robin. err = iort_add_device_replay(ops, dev); } ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 7/9] iommu/of: Use helper functions to access dev->iommu_fwspec
On 04/12/2018 16:30, Joerg Roedel wrote: From: Joerg Roedel Use the new helpers dev_iommu_fwspec_get()/set() to access the dev->iommu_fwspec pointer. This makes it easier to move that pointer later into another struct. Signed-off-by: Joerg Roedel --- drivers/iommu/of_iommu.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c index c5dd63072529..38232250b5f4 100644 --- a/drivers/iommu/of_iommu.c +++ b/drivers/iommu/of_iommu.c @@ -164,7 +164,7 @@ const struct iommu_ops *of_iommu_configure(struct device *dev, struct device_node *master_np) { const struct iommu_ops *ops = NULL; - struct iommu_fwspec *fwspec = dev->iommu_fwspec; + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev); int err = NO_IOMMU; if (!master_np) @@ -208,6 +208,9 @@ const struct iommu_ops *of_iommu_configure(struct device *dev, } } + /* The fwspec pointer changed, read it again */ + fwspec = dev_iommu_fwspec_get(dev); Nit: I think it makes sense to put this inside the "if (!err)" condition below rather than out here where it may or may not be relevant. The comment for that case is already supposed to imply that it's dealing with a fresh fwspec. Robin. + /* * Two success conditions can be represented by non-negative err here: * >0 : there is no IOMMU, or one was unavailable for non-fatal reasons @@ -215,7 +218,7 @@ const struct iommu_ops *of_iommu_configure(struct device *dev, * <0 : any actual error */ if (!err) - ops = dev->iommu_fwspec->ops; + ops = fwspec->ops; /* * If we have reason to believe the IOMMU driver missed the initial * add_device callback for dev, replay it to get things in order. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[RFC v2] iommu/vt-d: Allow iommu_domain_alloc to allocate IOMMU_DOMAIN_DMA
Hey, There exists an issue in the logic used to determine domain association with devices. Currently the driver uses find_or_alloc_domain to either reuse an existing domain or allocate a new one if one isn't found. Domains should be shared between all members of an IOMMU group, as this is the minimum granularity at which we can guarantee address space isolation. The Intel IOMMU driver exposes pci_device_group in intel_iommu_ops as the function to call to determine the group of a device; this is implemented in the generic IOMMU code and checks: DMA aliases, upstream PCIe switch ACS, PCI aliases, and PCI function aliases. The find_or_alloc_domain code currently only uses DMA aliases to determine if a domain is shared. This causes a disconnect between IOMMU groups and domains. We have observed devices under a PCIe switch each having their own domain but assigned the same group. One solution would be to fix the logic in find_or_alloc_domain to add checks for the other conditions under which a device may share a domain. However, this duplicates code which the generic IOMMU code implements. Instead this issue can be fixed by allowing the allocation of a default_domain on the IOMMU group. This is not currently supported as the Intel driver does not allow allocation of domain type IOMMU_DOMAIN_DMA. Allowing allocation of DMA domains has the effect that the default_domain is non-NULL and is attached to a device when initialising. This delegates the handling of domains to the generic IOMMU code. Once this is implemented it is possible to remove the lazy allocation of domains entirely. This patch implements DMA and identity domains to be allocated for external management. As it isn't known which device will be attached to a domain, the DMA domain is not initialised at alloc time. Instead it is initialised when a device is attached. As we may lose RMRR mappings when attaching a device to a new domain, we also ensure these are mapped at attach time. 
This will likely conflict with the work done for auxiliary domains by Baolu but the code to accommodate won’t change much. I had also started on a patch to remove find_or_alloc_domain and various functions that call it but had issues with edge cases such as iommu_prepare_isa that is doing domain operations at IOMMU init time. Cheers, James. --- drivers/iommu/intel-iommu.c | 159 +--- 1 file changed, 110 insertions(+), 49 deletions(-) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index 41a4b8808802..6437cb2e9b22 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -351,6 +351,14 @@ static int hw_pass_through = 1; /* si_domain contains mulitple devices */ #define DOMAIN_FLAG_STATIC_IDENTITY(1 << 1) +/* Domain managed externally, don't cleanup if it isn't attached + * to any devices. */ +#define DOMAIN_FLAG_NO_CLEANUP (1 << 2) + +/* Set after domain initialisation. Used when allocating dma domains to + * defer domain initialisation until it is attached to a device */ +#define DOMAIN_FLAG_INITIALISED(1 << 4) + #define for_each_domain_iommu(idx, domain) \ for (idx = 0; idx < g_num_of_iommus; idx++) \ if (domain->iommu_refcnt[idx]) @@ -624,6 +632,16 @@ static inline int domain_type_is_vm_or_si(struct dmar_domain *domain) DOMAIN_FLAG_STATIC_IDENTITY); } +static inline int domain_managed_externally(struct dmar_domain *domain) +{ + return domain->flags & DOMAIN_FLAG_NO_CLEANUP; +} + +static inline int domain_is_initialised(struct dmar_domain *domain) +{ + return domain->flags & DOMAIN_FLAG_INITIALISED; +} + static inline int domain_pfn_supported(struct dmar_domain *domain, unsigned long pfn) { @@ -1717,7 +1735,7 @@ static void disable_dmar_iommu(struct intel_iommu *iommu) __dmar_remove_one_dev_info(info); - if (!domain_type_is_vm_or_si(domain)) { + if (!domain_managed_externally(domain)) { /* * The domain_exit() function can't be called under * device_domain_lock, as it takes this lock itself. 
@@ -1951,6 +1969,7 @@ static int domain_init(struct dmar_domain *domain, struct intel_iommu *iommu, domain->pgd = (struct dma_pte *)alloc_pgtable_page(domain->nid); if (!domain->pgd) return -ENOMEM; + domain->flags |= DOMAIN_FLAG_INITIALISED; __iommu_flush_cache(iommu, domain->pgd, PAGE_SIZE); return 0; } @@ -3234,11 +3253,9 @@ static int copy_translation_tables(struct intel_iommu *iommu) static int __init init_dmars(void) { struct dmar_drhd_unit *drhd; - struct dmar_rmrr_unit *rmrr; bool copied_tables = false; - struct device *dev; struct intel_iommu *iommu; - int i, ret; + int ret; /* * for each
Re: [PATCH 2/5] iommu/of: Use device_iommu_mapped()
On 04/12/2018 17:25, Joerg Roedel wrote: From: Joerg Roedel Use device_iommu_mapped() to check if the device is already mapped by an IOMMU. FWIW, this check (and its ACPI equivalent in patch #3) is specifically asking "has .add_device() already been called?", rather than the more general "is this device managed by an IOMMU?" (to which the exact answer at this point is "yes, provided we return successfully from here"). I have no objection to the change as-is - especially if that usage is within the intended scope of this API - I just wanted to call it out in case you're also planning to introduce something else which would be even more appropriate for that. Robin. Signed-off-by: Joerg Roedel --- drivers/iommu/of_iommu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c index c5dd63072529..bfcf139503f0 100644 --- a/drivers/iommu/of_iommu.c +++ b/drivers/iommu/of_iommu.c @@ -220,7 +220,7 @@ const struct iommu_ops *of_iommu_configure(struct device *dev, * If we have reason to believe the IOMMU driver missed the initial * add_device callback for dev, replay it to get things in order. */ - if (ops && ops->add_device && dev->bus && !dev->iommu_group) + if (ops && ops->add_device && dev->bus && !device_iommu_mapped(dev)) err = ops->add_device(dev); /* Ignore all other errors apart from EPROBE_DEFER */
Re: [PATCH 0/5] Introduce device_iommu_mapped() function
Hi Joerg, On 04/12/2018 17:24, Joerg Roedel wrote: Hi, here is a patch-set to replace the dev->iommu_group checks in the source tree by a proper function call. The pointer checks mostly happen to check whether a device is managed by an IOMMU. For that purpose a pointer check is not very descriptive, so replace it with a function call that makes its purpose readable. Nice, we can also clean up a whole load of vague iommu_present() usage and even one or two odd iommu_get_domain_for_dev() calls once we have this. This also starts to remove direct access to the dev->iommu_group pointer outside of iommu-code. This is another move towards consolidating the various iommu-related pointers in 'struct device' into one pointer only. Please review. Thanks, Joerg Joerg Roedel (5): driver core: Introduce device_iommu_mapped() function iommu/of: Use device_iommu_mapped() ACPI/IORT: Use device_iommu_mapped() powerpc/iommu: Use device_iommu_mapped() xhci: Use device_iommu_mapped() arch/powerpc/kernel/eeh.c | 2 +- arch/powerpc/kernel/iommu.c | 6 +++--- drivers/acpi/arm64/iort.c | 2 +- drivers/iommu/of_iommu.c | 2 +- drivers/usb/host/xhci.c | 2 +- include/linux/device.h | 10 ++ 6 files changed, 17 insertions(+), 7 deletions(-) There looks to be one more here: drivers/dma/sh/rcar-dmac.c: rcar_dmac_probe() Other than that and a minor comment on the OF/IORT part, though, for the whole series: Acked-by: Robin Murphy
Re: [PATCH] iommu: fix amd_iommu=force_isolation
On Tue, Dec 04, 2018 at 03:37:16PM -0700, Yu Zhao wrote: > The parameter is still there but it's ignored. We need to check its > value before deciding to go into passthrough mode for AMD IOMMU. > > Fixes: aafd8ba0ca74 ("iommu/amd: Implement add_device and remove_device") > > Signed-off-by: Yu Zhao > --- > drivers/iommu/amd_iommu.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c > index 1167ff0416cf..3e4219e6cff0 100644 > --- a/drivers/iommu/amd_iommu.c > +++ b/drivers/iommu/amd_iommu.c > @@ -2195,7 +2195,8 @@ static int amd_iommu_add_device(struct device *dev) > > BUG_ON(!dev_data); > > - if (iommu_pass_through || dev_data->iommu_v2) > + if (iommu_pass_through || > + (!amd_iommu_force_isolation && dev_data->iommu_v2)) This breaks the iommu_v2 use-case, as it needs a direct mapping for the devices that support it. I think the force_isolation parameter does not make sense anymore today and should be removed. Regards, Joerg ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] Fix typo. Change tlb_range_add to iotlb_range_add and tlb_sync to iotlb_sync
On Tue, Dec 04, 2018 at 06:27:34PM +, Tom Murphy wrote: > From: tom > > Someone forgot to update this comment. > > Signed-off-by: Tom Murphy > --- > include/linux/iommu.h | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) Applied, thanks.
Re: [PATCH v5 04/12] iommu/vt-d: Add 256-bit invalidation descriptor support
On Tue, Dec 04, 2018 at 02:13:31PM +0800, Lu Baolu wrote: > The existing code uses GFP_ATOMIC, this patch only changes the size of > the allocated desc_page. > > I don't think we really need GFP_ATOMIC here (and also for some other > places). I will clean them up in a separate patch. Okay, thanks. > > In this patch, there is some code like the code below. It calculates > > the destination address of memcpy with qi->desc. If it's still a struct qi_desc > > pointer, the calculation result would be wrong. > > > > + memcpy(desc, qi->desc + (wait_index << shift), > > + 1 << shift); > > > > The change of the calculation method is to support 128-bit invalidation > > descriptors and 256-bit invalidation descriptors in this unified code logic. > > > > Also, the conversation between Baolu and me may help. > > > > https://lore.kernel.org/patchwork/patch/1006756/ > > Yes. We need to support different descriptor sizes. Okay, pointer arithmetic on void* isn't well defined in the C standard, afaik. But it should work with GCC, so it's probably fine. Unrelated to this patch-set, the whole qi management code needs some cleanups: it queues a sync after every command and has very tricky locking. This patch-set further complicates matters there, so it is probably time for a clean re-write of that part? Regards, Joerg
Re: [PATCH v5 02/12] iommu/vt-d: Manage scalable mode PASID tables
On Tue, Dec 04, 2018 at 01:58:06PM +0800, Lu Baolu wrote: > This function is called in an unsleepable context. > > spin_lock() > [...] > if (pasid_table_is_necessary) > allocate_pasid_table(dev) > [...] > spin_unlock() > > We can move it out of the lock range. > > How about > > if (pasid_table_is_necessary) > pasid_table = allocate_pasid_table(dev) > > spin_lock() > [...] > if (pasid_table_is_necessary) > set_up_pasid_table(pasid_table) > [...] > spin_unlock() Hmm, so when the IOMMU is configured in scalable mode we can just allocate a pasid-table for the device when we set it up, right? Scalable mode is a boot-time decision, so we know for sure whether we need a pasid-table at device-setup time. And the device-setup code is preemptable, so I think this allocation should be outside of any spin-locked section. Joerg
Re: [PATCH v5 02/12] iommu/vt-d: Manage scalable mode PASID tables
On Mon, Dec 03, 2018 at 05:23:40PM +, Liu, Yi L wrote: > So for a device without PASID support, we need at least to have a PASID > entry so that its DMA request (without pasid) can be translated. Thus a PASID > table is needed for such devices. That makes sense, thanks for the explanation. Joerg
Re: [PATCH 2/4] iommu: Consolidate ->add/remove_device() calls
On Wed, Dec 05, 2018 at 06:54:18AM -0800, Christoph Hellwig wrote: > > +void iommu_release_device(struct device *dev) > > Nitpick: there seems to be a double space here. > > > +int iommu_probe_device(struct device *dev); > > +void iommu_release_device(struct device *dev); > > .. and here. Right, thanks. I fixed it in my development branch.
Re: [PATCH 2/4] iommu: Consolidate ->add/remove_device() calls
> +void iommu_release_device(struct device *dev) Nitpick: there seems to be a double space here. > +int iommu_probe_device(struct device *dev); > +void iommu_release_device(struct device *dev); .. and here.
Re: [PATCH v4 3/3] iommu/io-pgtable-arm-v7s: Request DMA32 memory, and improve debugging
On 12/5/18 6:48 AM, Nicolas Boichat wrote: > IOMMUs using ARMv7 short-descriptor format require page tables > (level 1 and 2) to be allocated within the first 4GB of RAM, even > on 64-bit systems. > > For level 1/2 pages, ensure GFP_DMA32 is used if CONFIG_ZONE_DMA32 > is defined (e.g. on arm64 platforms). > > For level 2 pages, allocate a slab cache in SLAB_CACHE_DMA32. > > Also, print an error when the physical address does not fit in > 32-bit, to make debugging easier in the future. > > Fixes: ad67f5a6545f ("arm64: replace ZONE_DMA with ZONE_DMA32") > Signed-off-by: Nicolas Boichat > --- > > Changes since v2: > - Commit message > > (v3 used the page_frag approach) > > drivers/iommu/io-pgtable-arm-v7s.c | 20 > 1 file changed, 16 insertions(+), 4 deletions(-) > > diff --git a/drivers/iommu/io-pgtable-arm-v7s.c > b/drivers/iommu/io-pgtable-arm-v7s.c > index 445c3bde04800c..996f7b6d00b44a 100644 > --- a/drivers/iommu/io-pgtable-arm-v7s.c > +++ b/drivers/iommu/io-pgtable-arm-v7s.c > @@ -161,6 +161,14 @@ > > #define ARM_V7S_TCR_PD1 BIT(5) > > +#ifdef CONFIG_ZONE_DMA32 > +#define ARM_V7S_TABLE_GFP_DMA GFP_DMA32 > +#define ARM_V7S_TABLE_SLAB_CACHE SLAB_CACHE_DMA32 > +#else > +#define ARM_V7S_TABLE_GFP_DMA GFP_DMA > +#define ARM_V7S_TABLE_SLAB_CACHE SLAB_CACHE_DMA > +#endif > + > typedef u32 arm_v7s_iopte; > > static bool selftest_running; > @@ -198,13 +206,17 @@ static void *__arm_v7s_alloc_table(int lvl, gfp_t gfp, > void *table = NULL; > > if (lvl == 1) > - table = (void *)__get_dma_pages(__GFP_ZERO, get_order(size)); > + table = (void *)__get_free_pages( > + __GFP_ZERO | ARM_V7S_TABLE_GFP_DMA, get_order(size)); > else if (lvl == 2) > - table = kmem_cache_zalloc(data->l2_tables, gfp | GFP_DMA); > + table = kmem_cache_zalloc(data->l2_tables, > + gfp | ARM_V7S_TABLE_GFP_DMA); So as I've explained in 2/3, you don't need ARM_V7S_TABLE_GFP_DMA here (and then you don't need to adjust the slab warnings). 
> phys = virt_to_phys(table); > - if (phys != (arm_v7s_iopte)phys) > + if (phys != (arm_v7s_iopte)phys) { > /* Doesn't fit in PTE */ > + dev_err(dev, "Page table does not fit in PTE: %pa", &phys); > goto out_free; > + } > if (table && !(cfg->quirks & IO_PGTABLE_QUIRK_NO_DMA)) { > dma = dma_map_single(dev, table, size, DMA_TO_DEVICE); > if (dma_mapping_error(dev, dma)) > @@ -737,7 +749,7 @@ static struct io_pgtable *arm_v7s_alloc_pgtable(struct > io_pgtable_cfg *cfg, > data->l2_tables = kmem_cache_create("io-pgtable_armv7s_l2", > ARM_V7S_TABLE_SIZE(2), > ARM_V7S_TABLE_SIZE(2), > - SLAB_CACHE_DMA, NULL); > + ARM_V7S_TABLE_SLAB_CACHE, NULL); > if (!data->l2_tables) > goto out_free_data; > >
Re: [PATCH v4 3/3] iommu/io-pgtable-arm-v7s: Request DMA32 memory, and improve debugging
On Wed, Dec 05, 2018 at 06:43:08AM -0800, Christoph Hellwig wrote: > On Wed, Dec 05, 2018 at 02:40:06PM +, Robin Murphy wrote: > > 32-bit Arm doesn't have ZONE_DMA32, but has (or at least had at the time) a > > 2GB ZONE_DMA. Whether we actually need that or not depends on how this all > > interacts with LPAE and highmem, but I'm not sure of those details off-hand. > > Well, arm32 can't address more than 32-bits in the linear kernel > mapping, so GFP_KERNEL should be perfectly fine there if the limit > really is 32-bits and not 31 or smaller because someone stole a bit > or two somewhere. I'm not sure that's necessarily true on the physical side. Wasn't there a keystone SoC with /all/ the coherent memory above 4GB? Will ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 3/3] iommu/io-pgtable-arm-v7s: Request DMA32 memory, and improve debugging
On Wed, Dec 05, 2018 at 02:40:06PM +, Robin Murphy wrote: > 32-bit Arm doesn't have ZONE_DMA32, but has (or at least had at the time) a > 2GB ZONE_DMA. Whether we actually need that or not depends on how this all > interacts with LPAE and highmem, but I'm not sure of those details off-hand. Well, arm32 can't address more than 32-bits in the linear kernel mapping, so GFP_KERNEL should be perfectly fine there if the limit really is 32-bits and not 31 or smaller because someone stole a bit or two somewhere. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 0/3] iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables
On Wed, Dec 05, 2018 at 10:04:00AM +0800, Nicolas Boichat wrote: > On Tue, Dec 4, 2018 at 10:35 PM Vlastimil Babka wrote: > > > > On 12/4/18 10:37 AM, Nicolas Boichat wrote: > > > On Sun, Nov 11, 2018 at 5:04 PM Nicolas Boichat > > > wrote: > > >> > > >> This is a follow-up to the discussion in [1], to make sure that the page > > >> tables allocated by iommu/io-pgtable-arm-v7s are contained within 32-bit > > >> physical address space. > > >> > > >> [1] > > >> https://lists.linuxfoundation.org/pipermail/iommu/2018-November/030876.html > > > > > > Hi everyone, > > > > > > Let's try to summarize here. > > > > > > First, we confirmed that this is a regression, and IOMMU errors happen > > > on 4.19 and linux-next/master on MT8173 (elm, Acer Chromebook R13). > > > The issue most likely starts from ad67f5a6545f ("arm64: replace > > > ZONE_DMA with ZONE_DMA32"), i.e. 4.15, and presumably breaks a number > > > of Mediatek platforms (and maybe others?). > > > > > > We have a few options here: > > > 1. This series [2], that adds support for GFP_DMA32 slab caches, > > > _without_ adding kmalloc caches (since there are no users of > > > kmalloc(..., GFP_DMA32)). I think I've addressed all the comments on > > > the 3 patches, and AFAICT this solution works fine. > > > 2. genalloc. That works, but unless we preallocate 4MB for L2 tables > > > (which is wasteful as we usually only need a handful of L2 tables), > > > we'll need changes in the core (use GFP_ATOMIC) to allow allocating on > > > demand, and as it stands we'd have no way to shrink the allocation. > > > 3. page_frag [3]. That works fine, and the code is quite simple. One > > > drawback is that fragments in partially freed pages cannot be reused > > > (from limited experiments, I see that IOMMU L2 tables are rarely > > > freed, so it's unlikely a whole page would get freed). But given the > > > low number of L2 tables, maybe we can live with that. > > > > > > I think 2 is out. Any preference between 1 and 3? 
I think 1 makes > > > better use of the memory, so that'd be my preference. But I'm probably > > > missing something. > > > > I would prefer 1 as well. IIRC you already confirmed that alignment > > requirements are not broken for custom kmem caches even in presence of > > SLUB debug options (and I would say it's a bug to be fixed if they > > weren't). > > > I just asked (and didn't get a reply I think) about your > > ability to handle the GFP_ATOMIC allocation failures. They should be > > rare when only single page allocations are needed for the kmem cache. > > But in case they are not an option, then preallocating would be needed, > > thus probably option 2. > > Oh, sorry, I missed your question. > > I don't have a full answer, but: > - The allocations themselves are rare (I count a few 10s of L2 tables > at most on my system, I assume we rarely have >100), and yes, we only > need a single page, so the failures should be exceptional. > - My change is probably not making anything worse: I assume that even > with the current approach using GFP_DMA slab caches on older kernels, > failures could potentially happen. I don't think we've seen those. If > we are really concerned about this, maybe we'd need to modify > mtk_iommu_map to not hold a spinlock (if that's possible), so we don't > need to use GFP_ATOMIC. I suggest we just keep an eye on such issues, > and address them if they show up (we can even revisit genalloc at that > stage). I think the spinlock is the least of our worries: the map/unmap routines can be called in irq context and may need to allocate second-level tables. Will ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 3/3] iommu/io-pgtable-arm-v7s: Request DMA32 memory, and improve debugging
On 05/12/2018 13:54, Christoph Hellwig wrote: On Wed, Dec 05, 2018 at 01:48:28PM +0800, Nicolas Boichat wrote: IOMMUs using ARMv7 short-descriptor format require page tables (level 1 and 2) to be allocated within the first 4GB of RAM, even on 64-bit systems. +#ifdef CONFIG_ZONE_DMA32 +#define ARM_V7S_TABLE_GFP_DMA GFP_DMA32 +#define ARM_V7S_TABLE_SLAB_CACHE SLAB_CACHE_DMA32 +#else +#define ARM_V7S_TABLE_GFP_DMA GFP_DMA +#define ARM_V7S_TABLE_SLAB_CACHE SLAB_CACHE_DMA +#endif How does using GFP_DMA make sense based on the above? If the system has more than 32-bits worth of RAM it should be using GFP_DMA32, else GFP_KERNEL, not GFP_DMA for an arch defined small addressability pool. 32-bit Arm doesn't have ZONE_DMA32, but has (or at least had at the time) a 2GB ZONE_DMA. Whether we actually need that or not depends on how this all interacts with LPAE and highmem, but I'm not sure of those details off-hand. Robin. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 1/4] iommu/sysfs: Rename iommu_release_device()
From: Joerg Roedel Remove the iommu_ prefix from the function and a few other static data structures so that the iommu_release_device name can be re-used in iommu core code. Signed-off-by: Joerg Roedel --- drivers/iommu/iommu-sysfs.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/iommu/iommu-sysfs.c b/drivers/iommu/iommu-sysfs.c index 36d1a7ce7fc4..71c2249d3260 100644 --- a/drivers/iommu/iommu-sysfs.c +++ b/drivers/iommu/iommu-sysfs.c @@ -22,25 +22,25 @@ static struct attribute *devices_attr[] = { NULL, }; -static const struct attribute_group iommu_devices_attr_group = { +static const struct attribute_group devices_attr_group = { .name = "devices", .attrs = devices_attr, }; -static const struct attribute_group *iommu_dev_groups[] = { - &iommu_devices_attr_group, +static const struct attribute_group *dev_groups[] = { + &devices_attr_group, NULL, }; -static void iommu_release_device(struct device *dev) +static void release_device(struct device *dev) { kfree(dev); } static struct class iommu_class = { .name = "iommu", - .dev_release = iommu_release_device, - .dev_groups = iommu_dev_groups, + .dev_release = release_device, + .dev_groups = dev_groups, }; static int __init iommu_dev_init(void) -- 2.17.1
[PATCH 2/4] iommu: Consolidate ->add/remove_device() calls
From: Joerg Roedel Put them into separate functions and call those where the plain ops have been called before. Signed-off-by: Joerg Roedel --- drivers/iommu/iommu.c | 54 ++- include/linux/iommu.h | 3 +++ 2 files changed, 31 insertions(+), 26 deletions(-) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index edbdf5d6962c..ad4dc51eb69c 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -110,6 +110,26 @@ void iommu_device_unregister(struct iommu_device *iommu) spin_unlock(&iommu_device_lock); } +int iommu_probe_device(struct device *dev) +{ + const struct iommu_ops *ops = dev->bus->iommu_ops; + + if (!ops->add_device) + return 0; + + WARN_ON(dev->iommu_group); + + return ops->add_device(dev); +} + +void iommu_release_device(struct device *dev) +{ + const struct iommu_ops *ops = dev->bus->iommu_ops; + + if (ops->remove_device && dev->iommu_group) + ops->remove_device(dev); +} + static struct iommu_domain *__iommu_domain_alloc(struct bus_type *bus, unsigned type); static int __iommu_attach_device(struct iommu_domain *domain, @@ -1117,16 +1137,7 @@ struct iommu_domain *iommu_group_default_domain(struct iommu_group *group) static int add_iommu_group(struct device *dev, void *data) { - struct iommu_callback_data *cb = data; - const struct iommu_ops *ops = cb->ops; - int ret; - - if (!ops->add_device) - return 0; - - WARN_ON(dev->iommu_group); - - ret = ops->add_device(dev); + int ret = iommu_probe_device(dev); /* * We ignore -ENODEV errors for now, as they just mean that the @@ -1141,11 +1152,7 @@ static int add_iommu_group(struct device *dev, void *data) static int remove_iommu_group(struct device *dev, void *data) { - struct iommu_callback_data *cb = data; - const struct iommu_ops *ops = cb->ops; - - if (ops->remove_device && dev->iommu_group) - ops->remove_device(dev); + iommu_release_device(dev); return 0; } @@ -1153,27 +1160,22 @@ static int remove_iommu_group(struct device *dev, void *data) static int iommu_bus_notifier(struct notifier_block *nb,
unsigned long action, void *data) { + unsigned long group_action = 0; struct device *dev = data; - const struct iommu_ops *ops = dev->bus->iommu_ops; struct iommu_group *group; - unsigned long group_action = 0; /* * ADD/DEL call into iommu driver ops if provided, which may * result in ADD/DEL notifiers to group->notifier */ if (action == BUS_NOTIFY_ADD_DEVICE) { - if (ops->add_device) { - int ret; + int ret; - ret = ops->add_device(dev); - return (ret) ? NOTIFY_DONE : NOTIFY_OK; - } + ret = iommu_probe_device(dev); + return (ret) ? NOTIFY_DONE : NOTIFY_OK; } else if (action == BUS_NOTIFY_REMOVED_DEVICE) { - if (ops->remove_device && dev->iommu_group) { - ops->remove_device(dev); - return 0; - } + iommu_release_device(dev); + return NOTIFY_OK; } /* diff --git a/include/linux/iommu.h b/include/linux/iommu.h index a1d28f42cb77..2357841845bb 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -398,6 +398,9 @@ void iommu_fwspec_free(struct device *dev); int iommu_fwspec_add_ids(struct device *dev, u32 *ids, int num_ids); const struct iommu_ops *iommu_ops_from_fwnode(struct fwnode_handle *fwnode); +int iommu_probe_device(struct device *dev); +void iommu_release_device(struct device *dev); + #else /* CONFIG_IOMMU_API */ struct iommu_ops {}; -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 0/4] Consolidate iommu_ops->add/remove_device() calls
Hi, this patch-set consolidates the calls to iommu_ops->add_device() and remove_device() and the necessary checks into probe/release functions that can be extended later with more setup work. Regards, Joerg Joerg Roedel (4): iommu/sysfs: Rename iommu_release_device() iommu: Consolidate ->add/remove_device() calls iommu/of: Don't call iommu_ops->add_device directly ACPI/IORT: Don't call iommu_ops->add_device directly drivers/acpi/arm64/iort.c | 4 +-- drivers/iommu/iommu-sysfs.c | 12 - drivers/iommu/iommu.c | 54 +++-- drivers/iommu/of_iommu.c | 6 ++--- include/linux/iommu.h | 3 +++ 5 files changed, 42 insertions(+), 37 deletions(-) -- 2.17.1
[PATCH 3/4] iommu/of: Don't call iommu_ops->add_device directly
From: Joerg Roedel Make sure to invoke this call-back through the proper function of the IOMMU-API. Signed-off-by: Joerg Roedel --- drivers/iommu/of_iommu.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c index c5dd63072529..4d4847de727e 100644 --- a/drivers/iommu/of_iommu.c +++ b/drivers/iommu/of_iommu.c @@ -218,10 +218,10 @@ const struct iommu_ops *of_iommu_configure(struct device *dev, ops = dev->iommu_fwspec->ops; /* * If we have reason to believe the IOMMU driver missed the initial -* add_device callback for dev, replay it to get things in order. +* probe for dev, replay it to get things in order. */ - if (ops && ops->add_device && dev->bus && !dev->iommu_group) - err = ops->add_device(dev); + if (dev->bus && !dev->iommu_group) + err = iommu_probe_device(dev); /* Ignore all other errors apart from EPROBE_DEFER */ if (err == -EPROBE_DEFER) { -- 2.17.1
[PATCH 4/4] ACPI/IORT: Don't call iommu_ops->add_device directly
From: Joerg Roedel

Make sure to invoke this call-back through the proper function of the
IOMMU-API.

Signed-off-by: Joerg Roedel
---
 drivers/acpi/arm64/iort.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index 70f4e80b9246..d4f7c1adc048 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -805,8 +805,8 @@ static inline int iort_add_device_replay(const struct iommu_ops *ops,
 {
 	int err = 0;

-	if (ops->add_device && dev->bus && !dev->iommu_group)
-		err = ops->add_device(dev);
+	if (dev->bus && !dev->iommu_group)
+		err = iommu_probe_device(dev);

 	return err;
 }
-- 
2.17.1
Re: use generic DMA mapping code in powerpc V4
On Wed, Dec 05, 2018 at 10:44:05AM +0100, Christian Zigotzky wrote:
> Thanks for your reply. I undid all dma mapping commits with the following
> command:
>
> git checkout 721c01ba8b46ddb5355bd6e6b3bbfdabfdf01e97
>
> After that I compiled the kernels with this code for my P5020 board (Cyrus)
> and for my PASEMI board (Nemo) today.
>
> Result: PASEMI onboard ethernet works again and the P5020 board boots.
>
> It seems the dma mapping commits are the problem.

Thanks.  Can you try a few stepping points in the tree?

First just with commit 7fd3bb05b73beea1f9840b505aa09beb9c75a8c6
(the first one) applied?

Second with all commits up to 5da11e49df21f21dac25a2491aa788307bdacb6b

And if that still works with commits up to
c1bfcad4b0cf38ce5b00f7ad880d3a13484c123a
Re: [PATCH v4 2/3] mm: Add support for kmem caches in DMA32 zone
On 12/5/18 6:48 AM, Nicolas Boichat wrote: > In some cases (e.g. IOMMU ARMv7s page allocator), we need to allocate > data structures smaller than a page with GFP_DMA32 flag. > > This change makes it possible to create a custom cache in DMA32 zone > using kmem_cache_create, then allocate memory using kmem_cache_alloc. > > We do not create a DMA32 kmalloc cache array, as there are currently > no users of kmalloc(..., GFP_DMA32). The new test in check_slab_flags > ensures that such calls still fail (as they do before this change). > > Fixes: ad67f5a6545f ("arm64: replace ZONE_DMA with ZONE_DMA32") Same as my comment for 1/3. > Signed-off-by: Nicolas Boichat In general, Acked-by: Vlastimil Babka Some comments below: > --- > > Changes since v2: > - Clarified commit message > - Add entry in sysfs-kernel-slab to document the new sysfs file > > (v3 used the page_frag approach) > > Documentation/ABI/testing/sysfs-kernel-slab | 9 + > include/linux/slab.h| 2 ++ > mm/internal.h | 8 ++-- > mm/slab.c | 4 +++- > mm/slab.h | 3 ++- > mm/slab_common.c| 2 +- > mm/slub.c | 18 +- > 7 files changed, 40 insertions(+), 6 deletions(-) > > diff --git a/Documentation/ABI/testing/sysfs-kernel-slab > b/Documentation/ABI/testing/sysfs-kernel-slab > index 29601d93a1c2ea..d742c6cfdffbe9 100644 > --- a/Documentation/ABI/testing/sysfs-kernel-slab > +++ b/Documentation/ABI/testing/sysfs-kernel-slab > @@ -106,6 +106,15 @@ Description: > are from ZONE_DMA. > Available when CONFIG_ZONE_DMA is enabled. > > +What:/sys/kernel/slab/cache/cache_dma32 > +Date:December 2018 > +KernelVersion: 4.21 > +Contact: Nicolas Boichat > +Description: > + The cache_dma32 file is read-only and specifies whether objects > + are from ZONE_DMA32. > + Available when CONFIG_ZONE_DMA32 is enabled. I don't have a strong opinion. It's a new file, yeah, but consistent with already existing ones. I'd leave the decision with SL*B maintainers. 
> What:/sys/kernel/slab/cache/cpu_slabs > Date:May 2007 > KernelVersion: 2.6.22 > diff --git a/include/linux/slab.h b/include/linux/slab.h > index 11b45f7ae4057c..9449b19c5f107a 100644 > --- a/include/linux/slab.h > +++ b/include/linux/slab.h > @@ -32,6 +32,8 @@ > #define SLAB_HWCACHE_ALIGN ((slab_flags_t __force)0x2000U) > /* Use GFP_DMA memory */ > #define SLAB_CACHE_DMA ((slab_flags_t __force)0x4000U) > +/* Use GFP_DMA32 memory */ > +#define SLAB_CACHE_DMA32 ((slab_flags_t __force)0x8000U) > /* DEBUG: Store the last owner for bug hunting */ > #define SLAB_STORE_USER ((slab_flags_t __force)0x0001U) > /* Panic if kmem_cache_create() fails */ > diff --git a/mm/internal.h b/mm/internal.h > index a2ee82a0cd44ae..fd244ad716eaf8 100644 > --- a/mm/internal.h > +++ b/mm/internal.h > @@ -14,6 +14,7 @@ > #include > #include > #include > +#include > #include > > /* > @@ -34,9 +35,12 @@ > #define GFP_CONSTRAINT_MASK (__GFP_HARDWALL|__GFP_THISNODE) > > /* Check for flags that must not be used with a slab allocator */ > -static inline gfp_t check_slab_flags(gfp_t flags) > +static inline gfp_t check_slab_flags(gfp_t flags, slab_flags_t slab_flags) > { > - gfp_t bug_mask = __GFP_DMA32 | __GFP_HIGHMEM | ~__GFP_BITS_MASK; > + gfp_t bug_mask = __GFP_HIGHMEM | ~__GFP_BITS_MASK; > + > + if (!IS_ENABLED(CONFIG_ZONE_DMA32) || !(slab_flags & SLAB_CACHE_DMA32)) > + bug_mask |= __GFP_DMA32; I'll point out that this is not even strictly needed AFAICS, as only flags passed to kmem_cache_alloc() are checked - the cache->allocflags derived from SLAB_CACHE_DMA32 are appended only after check_slab_flags() (in both SLAB and SLUB AFAICS). And for a cache created with SLAB_CACHE_DMA32, the caller of kmem_cache_alloc() doesn't need to also include __GFP_DMA32, the allocation will be from ZONE_DMA32 regardless. So it would be fine even unchanged. The check would anyway need some more love to catch the same with __GFP_DMA to be consistent and cover all corner cases. 
>
> 	if (unlikely(flags & bug_mask)) {
> 		gfp_t invalid_mask = flags & bug_mask;
Re: [PATCH v4 3/3] iommu/io-pgtable-arm-v7s: Request DMA32 memory, and improve debugging
On Wed, Dec 05, 2018 at 01:48:28PM +0800, Nicolas Boichat wrote:
> IOMMUs using ARMv7 short-descriptor format require page tables
> (level 1 and 2) to be allocated within the first 4GB of RAM, even
> on 64-bit systems.

> +#ifdef CONFIG_ZONE_DMA32
> +#define ARM_V7S_TABLE_GFP_DMA GFP_DMA32
> +#define ARM_V7S_TABLE_SLAB_CACHE SLAB_CACHE_DMA32
> +#else
> +#define ARM_V7S_TABLE_GFP_DMA GFP_DMA
> +#define ARM_V7S_TABLE_SLAB_CACHE SLAB_CACHE_DMA
> +#endif

How does using GFP_DMA make sense based on the above?  If the system
has more than 32-bits worth of RAM it should be using GFP_DMA32, else
GFP_KERNEL, not GFP_DMA for an arch defined small addressability pool.
Re: [PATCH] dma-mapping: fix lack of DMA address assignment in generic remap allocator
Thanks, applied.
Re: [PATCH v4 1/3] mm: slab/slub: Add check_slab_flags function to check for valid flags
On 12/5/18 6:48 AM, Nicolas Boichat wrote:
> Remove duplicated code between slab and slub, and will make it
> easier to make the test more complicated in the next commits.
>
> Fixes: ad67f5a6545f ("arm64: replace ZONE_DMA with ZONE_DMA32")

Well, not really. Patch 3 does that, and yeah, this will be a prerequisite
for a clean stable backport, but we don't tag all prerequisites like this,
I think?

> Signed-off-by: Nicolas Boichat

Acked-by: Vlastimil Babka
Re: [PATCH] dma-mapping: fix lack of DMA address assignment in generic remap allocator
On Wed, Dec 05, 2018 at 11:14:01AM +0100, Marek Szyprowski wrote:
> Commit bfd56cd60521 ("dma-mapping: support highmem in the generic remap
> allocator") replaced dma_direct_alloc_pages() with __dma_direct_alloc_pages(),
> which doesn't set dma_handle and zero allocated memory. Fix it by doing this
> directly in the caller function.
>
> Fixes: bfd56cd60521 ("dma-mapping: support highmem in the generic remap
> allocator")
> Signed-off-by: Marek Szyprowski
> ---
>  kernel/dma/remap.c | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)

Tested-by: Thierry Reding
Re: [PATCH v4 2/3] mm: Add support for kmem caches in DMA32 zone
On Wed, Dec 05, 2018 at 03:39:51PM +0800, Nicolas Boichat wrote: >On Wed, Dec 5, 2018 at 3:25 PM Wei Yang wrote: >> >> On Wed, Dec 05, 2018 at 01:48:27PM +0800, Nicolas Boichat wrote: >> >In some cases (e.g. IOMMU ARMv7s page allocator), we need to allocate >> >data structures smaller than a page with GFP_DMA32 flag. >> > >> >This change makes it possible to create a custom cache in DMA32 zone >> >using kmem_cache_create, then allocate memory using kmem_cache_alloc. >> > >> >We do not create a DMA32 kmalloc cache array, as there are currently >> >no users of kmalloc(..., GFP_DMA32). The new test in check_slab_flags >> >ensures that such calls still fail (as they do before this change). >> > >> >Fixes: ad67f5a6545f ("arm64: replace ZONE_DMA with ZONE_DMA32") >> >Signed-off-by: Nicolas Boichat >> >--- >> > >> >Changes since v2: >> > - Clarified commit message >> > - Add entry in sysfs-kernel-slab to document the new sysfs file >> > >> >(v3 used the page_frag approach) >> > >> >Documentation/ABI/testing/sysfs-kernel-slab | 9 + >> > include/linux/slab.h| 2 ++ >> > mm/internal.h | 8 ++-- >> > mm/slab.c | 4 +++- >> > mm/slab.h | 3 ++- >> > mm/slab_common.c| 2 +- >> > mm/slub.c | 18 +- >> > 7 files changed, 40 insertions(+), 6 deletions(-) >> > >> >diff --git a/Documentation/ABI/testing/sysfs-kernel-slab >> >b/Documentation/ABI/testing/sysfs-kernel-slab >> >index 29601d93a1c2ea..d742c6cfdffbe9 100644 >> >--- a/Documentation/ABI/testing/sysfs-kernel-slab >> >+++ b/Documentation/ABI/testing/sysfs-kernel-slab >> >@@ -106,6 +106,15 @@ Description: >> > are from ZONE_DMA. >> > Available when CONFIG_ZONE_DMA is enabled. >> > >> >+What: /sys/kernel/slab/cache/cache_dma32 >> >+Date: December 2018 >> >+KernelVersion:4.21 >> >+Contact: Nicolas Boichat >> >+Description: >> >+ The cache_dma32 file is read-only and specifies whether >> >objects >> >+ are from ZONE_DMA32. >> >+ Available when CONFIG_ZONE_DMA32 is enabled. 
>> >+ >> > What: /sys/kernel/slab/cache/cpu_slabs >> > Date: May 2007 >> > KernelVersion:2.6.22 >> >diff --git a/include/linux/slab.h b/include/linux/slab.h >> >index 11b45f7ae4057c..9449b19c5f107a 100644 >> >--- a/include/linux/slab.h >> >+++ b/include/linux/slab.h >> >@@ -32,6 +32,8 @@ >> > #define SLAB_HWCACHE_ALIGN((slab_flags_t __force)0x2000U) >> > /* Use GFP_DMA memory */ >> > #define SLAB_CACHE_DMA((slab_flags_t __force)0x4000U) >> >+/* Use GFP_DMA32 memory */ >> >+#define SLAB_CACHE_DMA32 ((slab_flags_t __force)0x8000U) >> > /* DEBUG: Store the last owner for bug hunting */ >> > #define SLAB_STORE_USER ((slab_flags_t __force)0x0001U) >> > /* Panic if kmem_cache_create() fails */ >> >diff --git a/mm/internal.h b/mm/internal.h >> >index a2ee82a0cd44ae..fd244ad716eaf8 100644 >> >--- a/mm/internal.h >> >+++ b/mm/internal.h >> >@@ -14,6 +14,7 @@ >> > #include >> > #include >> > #include >> >+#include >> > #include >> > >> > /* >> >@@ -34,9 +35,12 @@ >> > #define GFP_CONSTRAINT_MASK (__GFP_HARDWALL|__GFP_THISNODE) >> > >> > /* Check for flags that must not be used with a slab allocator */ >> >-static inline gfp_t check_slab_flags(gfp_t flags) >> >+static inline gfp_t check_slab_flags(gfp_t flags, slab_flags_t slab_flags) >> > { >> >- gfp_t bug_mask = __GFP_DMA32 | __GFP_HIGHMEM | ~__GFP_BITS_MASK; >> >+ gfp_t bug_mask = __GFP_HIGHMEM | ~__GFP_BITS_MASK; >> >+ >> >+ if (!IS_ENABLED(CONFIG_ZONE_DMA32) || !(slab_flags & >> >SLAB_CACHE_DMA32)) >> >+ bug_mask |= __GFP_DMA32; >> >> The original version doesn't check CONFIG_ZONE_DMA32. >> >> Do we need to add this condition here? >> Could we just decide the bug_mask based on slab_flags? > >We can. The reason I did it this way is that when we don't have >CONFIG_ZONE_DMA32, the compiler should be able to simplify to: > >bug_mask = __GFP_HIGHMEM | ~__GFP_BITS_MASK; >if (true || ..) 
> => if (true)
> 	bug_mask |= __GFP_DMA32;
>
> Then just
> bug_mask = __GFP_HIGHMEM | ~__GFP_BITS_MASK | __GFP_DMA32;
>
> And since the function is inline, slab_flags would not even need to be
> accessed at all.

Hmm, one point confuses me. This means that if CONFIG_ZONE_DMA32 is not
enabled, bug_mask will always contain __GFP_DMA32, and that mask is checked
against the allocation flags derived from cachep->flags. If cachep->flags
has GFP_DMA32, will this then always fail? Is that case possible?

-- 
Wei Yang
Help you, Help me
Re: [PATCH V2] mm: Replace all open encodings for NUMA_NO_NODE
On Wed, 2018-12-05 at 17:01 +0530, Anshuman Khandual wrote: > > On 12/05/2018 02:56 AM, Lubomir Rintel wrote: > > On Mon, 2018-11-26 at 17:56 +0530, Anshuman Khandual wrote: > > > At present there are multiple places where invalid node number is encoded > > > as -1. Even though implicitly understood it is always better to have > > > macros > > > in there. Replace these open encodings for an invalid node number with the > > > global macro NUMA_NO_NODE. This helps remove NUMA related assumptions like > > > 'invalid node' from various places redirecting them to a common > > > definition. > > > > > > Signed-off-by: Anshuman Khandual > > > --- > > > Changes in V2: > > > > > > - Added inclusion of 'numa.h' header at various places per Andrew > > > - Updated 'dev_to_node' to use NUMA_NO_NODE instead per Vinod > > > > > > Changes in V1: (https://lkml.org/lkml/2018/11/23/485) > > > > > > - Dropped OCFS2 changes per Joseph > > > - Dropped media/video drivers changes per Hans > > > > > > RFC - https://patchwork.kernel.org/patch/10678035/ > > > > > > Build tested this with multiple cross compiler options like alpha, sparc, > > > arm64, x86, powerpc, powerpc64le etc with their default config which might > > > not have compiled tested all driver related changes. I will appreciate > > > folks giving this a test in their respective build environment. > > > > > > All these places for replacement were found by running the following grep > > > patterns on the entire kernel code. Please let me know if this might have > > > missed some instances. This might also have replaced some false positives. > > > I will appreciate suggestions, inputs and review. > > > > > > 1. git grep "nid == -1" > > > 2. git grep "node == -1" > > > 3. git grep "nid = -1" > > > 4. 
git grep "node = -1" > > > > > > arch/alpha/include/asm/topology.h | 3 ++- > > > arch/ia64/kernel/numa.c | 2 +- > > > arch/ia64/mm/discontig.c | 6 +++--- > > > arch/ia64/sn/kernel/io_common.c | 3 ++- > > > arch/powerpc/include/asm/pci-bridge.h | 3 ++- > > > arch/powerpc/kernel/paca.c| 3 ++- > > > arch/powerpc/kernel/pci-common.c | 3 ++- > > > arch/powerpc/mm/numa.c| 14 +++--- > > > arch/powerpc/platforms/powernv/memtrace.c | 5 +++-- > > > arch/sparc/kernel/auxio_32.c | 3 ++- > > > arch/sparc/kernel/pci_fire.c | 3 ++- > > > arch/sparc/kernel/pci_schizo.c| 3 ++- > > > arch/sparc/kernel/pcic.c | 7 --- > > > arch/sparc/kernel/psycho_common.c | 3 ++- > > > arch/sparc/kernel/sbus.c | 3 ++- > > > arch/sparc/mm/init_64.c | 6 +++--- > > > arch/sparc/prom/init_32.c | 3 ++- > > > arch/sparc/prom/init_64.c | 5 +++-- > > > arch/sparc/prom/tree_32.c | 13 +++-- > > > arch/sparc/prom/tree_64.c | 19 ++- > > > arch/x86/include/asm/pci.h| 3 ++- > > > arch/x86/kernel/apic/x2apic_uv_x.c| 7 --- > > > arch/x86/kernel/smpboot.c | 3 ++- > > > arch/x86/platform/olpc/olpc_dt.c | 17 + > > > drivers/block/mtip32xx/mtip32xx.c | 5 +++-- > > > drivers/dma/dmaengine.c | 4 +++- > > > drivers/infiniband/hw/hfi1/affinity.c | 3 ++- > > > drivers/infiniband/hw/hfi1/init.c | 3 ++- > > > drivers/iommu/dmar.c | 5 +++-- > > > drivers/iommu/intel-iommu.c | 3 ++- > > > drivers/misc/sgi-xp/xpc_uv.c | 3 ++- > > > drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 5 +++-- > > > include/linux/device.h| 2 +- > > > init/init_task.c | 3 ++- > > > kernel/kthread.c | 3 ++- > > > kernel/sched/fair.c | 15 --- > > > lib/cpumask.c | 3 ++- > > > mm/huge_memory.c | 13 +++-- > > > mm/hugetlb.c | 3 ++- > > > mm/ksm.c | 2 +- > > > mm/memory.c | 7 --- > > > mm/memory_hotplug.c | 12 ++-- > > > mm/mempolicy.c| 2 +- > > > mm/page_alloc.c | 4 ++-- > > > mm/page_ext.c | 2 +- > > > net/core/pktgen.c | 3 ++- > > > net/qrtr/qrtr.c | 3 ++- > > > tools/perf/bench/numa.c | 6 +++--- > > > 48 files changed, 146 insertions(+), 108 deletions(-) 
> > Thanks for the patch. It seems to me that
Re: [PATCH v4 2/3] mm: Add support for kmem caches in DMA32 zone
On Wed 05-12-18 19:01:03, Nicolas Boichat wrote:
[...]
> > Secondly, why do we need a new sysfs file? Who is going to consume it?
>
> We have cache_dma, so it seems consistent to add cache_dma32.

I wouldn't copy a pattern unless there is an explicit usecase for it. We
do expose way too much to userspace and that keeps kicking us later. Not
that I am aware of any specific example for cache_dma, but seeing other
examples I would rather be more careful.

> I wasn't aware of tools/vm/slabinfo.c, so I can add support for
> cache_dma32 in a follow-up patch. Any other user I should take care
> of?

In general zones are internal MM implementation details and the less we
export to userspace the better.

> > Then why do we need SLAB_MERGE_SAME to cover GFP_DMA32 as well?
>
> SLAB_MERGE_SAME tells us which flags _need_ to be the same for the
> slabs to be merged. We don't want slab caches with GFP_DMA32 and
> ~GFP_DMA32 to be merged, so it should be in there.
> (https://elixir.bootlin.com/linux/v4.19.6/source/mm/slab_common.c#L342).

Ohh, my bad, I have misread the change. Sure, we definitely do not want
to allow merging here. My bad.

-- 
Michal Hocko
SUSE Labs
Re: [PATCH V2] mm: Replace all open encodings for NUMA_NO_NODE
On 12/05/2018 02:56 AM, Lubomir Rintel wrote: > On Mon, 2018-11-26 at 17:56 +0530, Anshuman Khandual wrote: >> At present there are multiple places where invalid node number is encoded >> as -1. Even though implicitly understood it is always better to have macros >> in there. Replace these open encodings for an invalid node number with the >> global macro NUMA_NO_NODE. This helps remove NUMA related assumptions like >> 'invalid node' from various places redirecting them to a common definition. >> >> Signed-off-by: Anshuman Khandual >> --- >> Changes in V2: >> >> - Added inclusion of 'numa.h' header at various places per Andrew >> - Updated 'dev_to_node' to use NUMA_NO_NODE instead per Vinod >> >> Changes in V1: (https://lkml.org/lkml/2018/11/23/485) >> >> - Dropped OCFS2 changes per Joseph >> - Dropped media/video drivers changes per Hans >> >> RFC - https://patchwork.kernel.org/patch/10678035/ >> >> Build tested this with multiple cross compiler options like alpha, sparc, >> arm64, x86, powerpc, powerpc64le etc with their default config which might >> not have compiled tested all driver related changes. I will appreciate >> folks giving this a test in their respective build environment. >> >> All these places for replacement were found by running the following grep >> patterns on the entire kernel code. Please let me know if this might have >> missed some instances. This might also have replaced some false positives. >> I will appreciate suggestions, inputs and review. >> >> 1. git grep "nid == -1" >> 2. git grep "node == -1" >> 3. git grep "nid = -1" >> 4. 
git grep "node = -1" >> >> arch/alpha/include/asm/topology.h | 3 ++- >> arch/ia64/kernel/numa.c | 2 +- >> arch/ia64/mm/discontig.c | 6 +++--- >> arch/ia64/sn/kernel/io_common.c | 3 ++- >> arch/powerpc/include/asm/pci-bridge.h | 3 ++- >> arch/powerpc/kernel/paca.c| 3 ++- >> arch/powerpc/kernel/pci-common.c | 3 ++- >> arch/powerpc/mm/numa.c| 14 +++--- >> arch/powerpc/platforms/powernv/memtrace.c | 5 +++-- >> arch/sparc/kernel/auxio_32.c | 3 ++- >> arch/sparc/kernel/pci_fire.c | 3 ++- >> arch/sparc/kernel/pci_schizo.c| 3 ++- >> arch/sparc/kernel/pcic.c | 7 --- >> arch/sparc/kernel/psycho_common.c | 3 ++- >> arch/sparc/kernel/sbus.c | 3 ++- >> arch/sparc/mm/init_64.c | 6 +++--- >> arch/sparc/prom/init_32.c | 3 ++- >> arch/sparc/prom/init_64.c | 5 +++-- >> arch/sparc/prom/tree_32.c | 13 +++-- >> arch/sparc/prom/tree_64.c | 19 ++- >> arch/x86/include/asm/pci.h| 3 ++- >> arch/x86/kernel/apic/x2apic_uv_x.c| 7 --- >> arch/x86/kernel/smpboot.c | 3 ++- >> arch/x86/platform/olpc/olpc_dt.c | 17 + >> drivers/block/mtip32xx/mtip32xx.c | 5 +++-- >> drivers/dma/dmaengine.c | 4 +++- >> drivers/infiniband/hw/hfi1/affinity.c | 3 ++- >> drivers/infiniband/hw/hfi1/init.c | 3 ++- >> drivers/iommu/dmar.c | 5 +++-- >> drivers/iommu/intel-iommu.c | 3 ++- >> drivers/misc/sgi-xp/xpc_uv.c | 3 ++- >> drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 5 +++-- >> include/linux/device.h| 2 +- >> init/init_task.c | 3 ++- >> kernel/kthread.c | 3 ++- >> kernel/sched/fair.c | 15 --- >> lib/cpumask.c | 3 ++- >> mm/huge_memory.c | 13 +++-- >> mm/hugetlb.c | 3 ++- >> mm/ksm.c | 2 +- >> mm/memory.c | 7 --- >> mm/memory_hotplug.c | 12 ++-- >> mm/mempolicy.c| 2 +- >> mm/page_alloc.c | 4 ++-- >> mm/page_ext.c | 2 +- >> net/core/pktgen.c | 3 ++- >> net/qrtr/qrtr.c | 3 ++- >> tools/perf/bench/numa.c | 6 +++--- >> 48 files changed, 146 insertions(+), 108 deletions(-) > Thanks for the patch. 
It seems to me that you've got a fairly large > amount of it wrong though -- perhaps relying just on "git grep" alone > is not the best idea. Hmm, okay. > > The diffstat is not all that big, it is entirely plausible to just > review each hunk manually: just do a "git show -U20" to get some > context. > > You get a NAK from me for the OLPC DT part, but I
Re: [LKP] [mm] 19717e78a0: stderr.if(target_node==NUMA_NO_NODE){
On 12/05/2018 10:30 AM, kernel test robot wrote:
> FYI, we noticed the following commit (built with gcc-7):
>
> commit: 19717e78a04d51512cf0e7b9b09c61f06b2af071 ("[PATCH V2] mm: Replace all
> open encodings for NUMA_NO_NODE")
> url: https://github.com/0day-ci/linux/commits/Anshuman-Khandual/mm-Replace-all-open-encodings-for-NUMA_NO_NODE/20181126-203831
>
> in testcase: perf-sanity-tests
> with following parameters:
>
> 	perf_compiler: gcc
> 	ucode: 0x713
>
> on test machine: 16 threads Intel(R) Xeon(R) CPU D-1541 @ 2.10GHz with 8G
> memory
>
> caused below changes (please refer to attached dmesg/kmsg for entire
> log/backtrace):

The fix (in Andrew's staging tree) from Stephen Rothwell which adds
definitions to should fix this.
Re: [PATCH v4 2/3] mm: Add support for kmem caches in DMA32 zone
On Wed, Dec 5, 2018 at 5:56 PM Michal Hocko wrote: > > On Wed 05-12-18 13:48:27, Nicolas Boichat wrote: > > In some cases (e.g. IOMMU ARMv7s page allocator), we need to allocate > > data structures smaller than a page with GFP_DMA32 flag. > > > > This change makes it possible to create a custom cache in DMA32 zone > > using kmem_cache_create, then allocate memory using kmem_cache_alloc. > > > > We do not create a DMA32 kmalloc cache array, as there are currently > > no users of kmalloc(..., GFP_DMA32). The new test in check_slab_flags > > ensures that such calls still fail (as they do before this change). > > The changelog should be much more specific about decisions made here. > First of all it would be nice to mention the usecase. Ok, I'll copy most of the cover letter text here (i.e. the fact that IOMMU wants physical memory <4GB for L2 page tables, why it's better than genalloc/page_frag). > Secondly, why do we need a new sysfs file? Who is going to consume it? We have cache_dma, so it seems consistent to add cache_dma32. I wasn't aware of tools/vm/slabinfo.c, so I can add support for cache_dma32 in a follow-up patch. Any other user I should take care of? > Then why do we need SLAB_MERGE_SAME to cover GFP_DMA32 as well? SLAB_MERGE_SAME tells us which flags _need_ to be the same for the slabs to be merged. We don't want slab caches with GFP_DMA32 and ~GFP_DMA32 to be merged, so it should be in there. (https://elixir.bootlin.com/linux/v4.19.6/source/mm/slab_common.c#L342). > I > thought the whole point is to use dedicated slab cache. Who is this > going to merge with? Well, if there was another SLAB cache requiring 1KB GFP_DMA32 elements, then I don't see why we would not merge the caches. This is what happens with this IOMMU L2 tables cache pre-CONFIG_ZONE_DMA32 on arm64 (output on some 3.18 kernel below), and what would happen on arm32 since we still use GFP_DMA. /sys/kernel/slab # ls -l | grep dt-0001024 drwxr-xr-x. 
2 root root 0 Dec  5 02:25 :dt-0001024
lrwxrwxrwx. 1 root root 0 Dec  5 02:25 dma-kmalloc-1024 -> :dt-0001024
lrwxrwxrwx. 1 root root 0 Dec  5 02:25 io-pgtable_armv7s_l2 -> :dt-0001024

Thanks!

> --
> Michal Hocko
> SUSE Labs
[PATCH] dma-mapping: fix lack of DMA address assignment in generic remap allocator
Commit bfd56cd60521 ("dma-mapping: support highmem in the generic remap
allocator") replaced dma_direct_alloc_pages() with __dma_direct_alloc_pages(),
which doesn't set dma_handle and zero allocated memory. Fix it by doing this
directly in the caller function.

Fixes: bfd56cd60521 ("dma-mapping: support highmem in the generic remap
allocator")
Signed-off-by: Marek Szyprowski
---
 kernel/dma/remap.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/dma/remap.c b/kernel/dma/remap.c
index 68a64e3ff6a1..8a44317cfc1a 100644
--- a/kernel/dma/remap.c
+++ b/kernel/dma/remap.c
@@ -223,8 +223,14 @@ void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
 	ret = dma_common_contiguous_remap(page, size, VM_USERMAP,
 			arch_dma_mmap_pgprot(dev, PAGE_KERNEL, attrs),
 			__builtin_return_address(0));
-	if (!ret)
+	if (!ret) {
 		__dma_direct_free_pages(dev, size, page);
+		return ret;
+	}
+
+	*dma_handle = phys_to_dma(dev, page_to_phys(page));
+	memset(ret, 0, size);
+
 	return ret;
 }
-- 
2.17.1
Re: [PATCH v4 2/3] mm: Add support for kmem caches in DMA32 zone
On Wed 05-12-18 13:48:27, Nicolas Boichat wrote:
> In some cases (e.g. IOMMU ARMv7s page allocator), we need to allocate
> data structures smaller than a page with GFP_DMA32 flag.
>
> This change makes it possible to create a custom cache in DMA32 zone
> using kmem_cache_create, then allocate memory using kmem_cache_alloc.
>
> We do not create a DMA32 kmalloc cache array, as there are currently
> no users of kmalloc(..., GFP_DMA32). The new test in check_slab_flags
> ensures that such calls still fail (as they do before this change).

The changelog should be much more specific about decisions made here.
First of all it would be nice to mention the usecase. Secondly, why do we
need a new sysfs file? Who is going to consume it? Then why do we need
SLAB_MERGE_SAME to cover GFP_DMA32 as well? I thought the whole point is
to use a dedicated slab cache. Who is this going to merge with?

-- 
Michal Hocko
SUSE Labs
Re: use generic DMA mapping code in powerpc V4
On 04 December 2018 at 3:24PM, Christoph Hellwig wrote:
> On Tue, Dec 04, 2018 at 10:53:39AM +0100, Christian Zigotzky wrote:
>> I don't know why this kernel doesn't recognize the hard disks connected
>> to my physical P5020 board and why the onboard ethernet on my PASEMI
>> board doesn't work. (dma_direct_map_page: overflow)
>
> Do you know if this actually works for the baseline before my patches?
> E.g. with commit 721c01ba8b46ddb5355bd6e6b3bbfdabfdf01e97 ?

Hi Christoph,

Thanks for your reply. I undid all dma mapping commits with the following
command:

git checkout 721c01ba8b46ddb5355bd6e6b3bbfdabfdf01e97

After that I compiled the kernels with this code for my P5020 board (Cyrus)
and for my PASEMI board (Nemo) today.

Result: PASEMI onboard ethernet works again and the P5020 board boots.

It seems the dma mapping commits are the problem.

@All
Could you please test Christoph's kernel on your PASEMI and NXP boards?
Download: 'git clone git://git.infradead.org/users/hch/misc.git -b powerpc-dma.5 a'

Thanks,
Christian
Re: [PATCH] media: s5p-mfc: Fix memdev DMA configuration
Em Fri, 14 Sep 2018 14:19:29 +0200 Marek Szyprowski escreveu: > Hi Robin, > > On 2018-09-12 18:45, Robin Murphy wrote: > > Having of_reserved_mem_device_init() forcibly reconfigure DMA for all > > callers, potentially overriding the work done by a bus-specific > > .dma_configure method earlier, is at best a bad idea and at worst > > actively harmful. If drivers really need virtual devices to own > > dma-coherent memory, they should explicitly configure those devices > > based on the appropriate firmware node as they create them. > > > > It looks like the only driver not passing in a proper OF platform device > > is s5p-mfc, so move the rogue of_dma_configure() call into that driver > > where it logically belongs. > > > > CC: Smitha T Murthy > > CC: Marek Szyprowski > > CC: Rob Herring > > Signed-off-by: Robin Murphy > > Right, after recent cleanup dma ops initialization, MFC driver is > a better place for calling of_dma_configure() on virtual devices. > > Reviewed-by: Marek Szyprowski This patch seems to fit better via OF/DT tree. Not sure if it was merged there yet. In any case: Acked-by: Mauro Carvalho Chehab > > > --- > > drivers/media/platform/s5p-mfc/s5p_mfc.c | 7 +++ > > drivers/of/of_reserved_mem.c | 4 > > 2 files changed, 7 insertions(+), 4 deletions(-) > > > > diff --git a/drivers/media/platform/s5p-mfc/s5p_mfc.c > > b/drivers/media/platform/s5p-mfc/s5p_mfc.c > > index 927a1235408d..77eb4a4511c1 100644 > > --- a/drivers/media/platform/s5p-mfc/s5p_mfc.c > > +++ b/drivers/media/platform/s5p-mfc/s5p_mfc.c > > @@ -1094,6 +1094,13 @@ static struct device *s5p_mfc_alloc_memdev(struct > > device *dev, > > child->dma_mask = dev->dma_mask; > > child->release = s5p_mfc_memdev_release; > > > > + /* > > +* The memdevs are not proper OF platform devices, so in order for them > > +* to be treated as valid DMA masters we need a bit of a hack to force > > +* them to inherit the MFC node's DMA configuration. 
> > +*/ > > + of_dma_configure(child, dev->of_node, true); > > + > > if (device_add(child) == 0) { > > ret = of_reserved_mem_device_init_by_idx(child, dev->of_node, > > idx); > > diff --git a/drivers/of/of_reserved_mem.c b/drivers/of/of_reserved_mem.c > > index 895c83e0c7b6..4ef6f4485335 100644 > > --- a/drivers/of/of_reserved_mem.c > > +++ b/drivers/of/of_reserved_mem.c > > @@ -350,10 +350,6 @@ int of_reserved_mem_device_init_by_idx(struct device > > *dev, > > mutex_lock(_rmem_assigned_device_mutex); > > list_add(>list, _rmem_assigned_device_list); > > mutex_unlock(_rmem_assigned_device_mutex); > > - /* ensure that dma_ops is set for virtual devices > > -* using reserved memory > > -*/ > > - of_dma_configure(dev, np, true); > > > > dev_info(dev, "assigned reserved memory node %s\n", rmem->name); > > } else { > > Best regards Thanks, Mauro ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 2/3] mm: Add support for kmem caches in DMA32 zone
On Wed, Dec 05, 2018 at 03:39:51PM +0800, Nicolas Boichat wrote: >On Wed, Dec 5, 2018 at 3:25 PM Wei Yang wrote: >> >> On Wed, Dec 05, 2018 at 01:48:27PM +0800, Nicolas Boichat wrote: >> >In some cases (e.g. IOMMU ARMv7s page allocator), we need to allocate >> >data structures smaller than a page with GFP_DMA32 flag. >> > >> >This change makes it possible to create a custom cache in DMA32 zone >> >using kmem_cache_create, then allocate memory using kmem_cache_alloc. >> > >> >We do not create a DMA32 kmalloc cache array, as there are currently >> >no users of kmalloc(..., GFP_DMA32). The new test in check_slab_flags >> >ensures that such calls still fail (as they do before this change). >> > >> >Fixes: ad67f5a6545f ("arm64: replace ZONE_DMA with ZONE_DMA32") >> >Signed-off-by: Nicolas Boichat >> >--- >> > >> >Changes since v2: >> > - Clarified commit message >> > - Add entry in sysfs-kernel-slab to document the new sysfs file >> > >> >(v3 used the page_frag approach) >> > >> >Documentation/ABI/testing/sysfs-kernel-slab | 9 + >> > include/linux/slab.h| 2 ++ >> > mm/internal.h | 8 ++-- >> > mm/slab.c | 4 +++- >> > mm/slab.h | 3 ++- >> > mm/slab_common.c| 2 +- >> > mm/slub.c | 18 +- >> > 7 files changed, 40 insertions(+), 6 deletions(-) >> > >> >diff --git a/Documentation/ABI/testing/sysfs-kernel-slab >> >b/Documentation/ABI/testing/sysfs-kernel-slab >> >index 29601d93a1c2ea..d742c6cfdffbe9 100644 >> >--- a/Documentation/ABI/testing/sysfs-kernel-slab >> >+++ b/Documentation/ABI/testing/sysfs-kernel-slab >> >@@ -106,6 +106,15 @@ Description: >> > are from ZONE_DMA. >> > Available when CONFIG_ZONE_DMA is enabled. >> > >> >+What: /sys/kernel/slab/cache/cache_dma32 >> >+Date: December 2018 >> >+KernelVersion:4.21 >> >+Contact: Nicolas Boichat >> >+Description: >> >+ The cache_dma32 file is read-only and specifies whether >> >objects >> >+ are from ZONE_DMA32. >> >+ Available when CONFIG_ZONE_DMA32 is enabled. 
>> >+
>> > What:		/sys/kernel/slab/cache/cpu_slabs
>> > Date:		May 2007
>> > KernelVersion:	2.6.22
>> >diff --git a/include/linux/slab.h b/include/linux/slab.h
>> >index 11b45f7ae4057c..9449b19c5f107a 100644
>> >--- a/include/linux/slab.h
>> >+++ b/include/linux/slab.h
>> >@@ -32,6 +32,8 @@
>> > #define SLAB_HWCACHE_ALIGN	((slab_flags_t __force)0x00002000U)
>> > /* Use GFP_DMA memory */
>> > #define SLAB_CACHE_DMA		((slab_flags_t __force)0x00004000U)
>> >+/* Use GFP_DMA32 memory */
>> >+#define SLAB_CACHE_DMA32	((slab_flags_t __force)0x00008000U)
>> > /* DEBUG: Store the last owner for bug hunting */
>> > #define SLAB_STORE_USER		((slab_flags_t __force)0x00010000U)
>> > /* Panic if kmem_cache_create() fails */
>> >diff --git a/mm/internal.h b/mm/internal.h
>> >index a2ee82a0cd44ae..fd244ad716eaf8 100644
>> >--- a/mm/internal.h
>> >+++ b/mm/internal.h
>> >@@ -14,6 +14,7 @@
>> > #include <linux/fs.h>
>> > #include <linux/mm.h>
>> > #include <linux/pagemap.h>
>> >+#include <linux/slab.h>
>> > #include <linux/tracepoint-defs.h>
>> >
>> > /*
>> >@@ -34,9 +35,12 @@
>> > #define GFP_CONSTRAINT_MASK (__GFP_HARDWALL|__GFP_THISNODE)
>> >
>> > /* Check for flags that must not be used with a slab allocator */
>> >-static inline gfp_t check_slab_flags(gfp_t flags)
>> >+static inline gfp_t check_slab_flags(gfp_t flags, slab_flags_t slab_flags)
>> > {
>> >-	gfp_t bug_mask = __GFP_DMA32 | __GFP_HIGHMEM | ~__GFP_BITS_MASK;
>> >+	gfp_t bug_mask = __GFP_HIGHMEM | ~__GFP_BITS_MASK;
>> >+
>> >+	if (!IS_ENABLED(CONFIG_ZONE_DMA32) || !(slab_flags & SLAB_CACHE_DMA32))
>> >+		bug_mask |= __GFP_DMA32;
>>
>> The original version doesn't check CONFIG_ZONE_DMA32.
>>
>> Do we need to add this condition here?
>> Could we just decide the bug_mask based on slab_flags?
>
>We can. The reason I did it this way is that when we don't have
>CONFIG_ZONE_DMA32, the compiler should be able to simplify to:
>
>bug_mask = __GFP_HIGHMEM | ~__GFP_BITS_MASK;
>if (true || ..)
>=> if (true)
>	bug_mask |= __GFP_DMA32;
>
>Then just:
>bug_mask = __GFP_HIGHMEM | ~__GFP_BITS_MASK | __GFP_DMA32;
>
>And since the function is inline, slab_flags would not even need to be
>accessed at all.
>

Thanks for the explanation. This makes sense to me.

-- 
Wei Yang
Help you, Help me

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 0/4] PCI / iommu / thunderbolt: IOMMU based DMA protection
On Thu, Nov 29, 2018 at 06:51:49PM +0300, Mika Westerberg wrote:
> Recent systems with Thunderbolt ports may be utilizing IOMMU to prevent DMA
> attacks. This is different from the previous security level based scheme
> because the connected device cannot access system memory outside of the
> regions allocated for it by the driver.

Applied all to thunderbolt.git/next with acks from Bjorn and Rafael.