Re: [RFC PATCH v2 00/12] get rid of GFP_ZONE_TABLE/BAD
On Tue, May 22, 2018 at 08:37:28PM +0200, Michal Hocko wrote:
> So why is this any better than the current code. Sure I am not a great
> fan of GFP_ZONE_TABLE because of how it is incomprehensible but this
> doesn't look too much better, yet we are losing a check for incompatible
> gfp flags. The diffstat looks really sound but then you just look and
> see that the large part is the comment that at least explained the gfp
> zone modifiers somehow and the debugging code. So what is the selling
> point?

I have a plan, but it's not exactly fully-formed yet.

One of the big problems we have today is that we have a lot of users who have constraints on the physical memory they want to allocate, but we have very limited abilities to provide them with what they're asking for. The various different ZONEs have different meanings on different architectures and are generally a mess. If we had eight ZONEs, we could offer:

ZONE_16M	// 24 bit
ZONE_256M	// 28 bit
ZONE_LOWMEM	// CONFIG_32BIT only
ZONE_4G		// 32 bit
ZONE_64G	// 36 bit
ZONE_1T		// 40 bit
ZONE_ALL	// everything larger
ZONE_MOVABLE	// movable allocations; no physical address guarantees

#ifdef CONFIG_64BIT
#define ZONE_NORMAL	ZONE_ALL
#else
#define ZONE_NORMAL	ZONE_LOWMEM
#endif

This would cover most driver DMA mask allocations; we could tweak the offered zones based on analysis of what people need.

#define GFP_HIGHUSER		(GFP_USER | ZONE_ALL)
#define GFP_HIGHUSER_MOVABLE	(GFP_USER | ZONE_MOVABLE)

One other thing I want to see is that fallback from zones happens from highest to lowest normally (ie if you fail to allocate in 1T, then you try to allocate from 64G), but movable allocations happen from lowest to highest. So ZONE_16M ends up full of page cache pages which are readily evictable for the rare occasions when we need to allocate memory below 16MB.

I'm sure there are lots of good reasons why this won't work, which is why I've been hesitant to propose it before now.
___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu/ipmmu-vmsa: Document R-Car V3H and E3 IPMMU DT bindings
On Mon, May 21, 2018 at 11:41:33PM +0900, Magnus Damm wrote:
> From: Magnus Damm
>
> Update the IPMMU DT binding documentation to include the compat strings
> for the IPMMU devices included in the R-Car V3H and E3 SoCs.
>
> Signed-off-by: Magnus Damm
> ---
>
> Developed on top of renesas-drivers-2018-05-15-v4.17-rc5
>
>  Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.txt | 2 ++
>  1 file changed, 2 insertions(+)

Acked-by: Rob Herring
[RFC PATCH v3 5/9] drivers/block/zram/zram_drv: update usage of zone modifiers
From: Huaisheng Ye

Use __GFP_ZONE_MOVABLE to replace (__GFP_HIGHMEM | __GFP_MOVABLE).

___GFP_DMA, ___GFP_HIGHMEM and ___GFP_DMA32 have been deleted from the GFP bitmasks; the bottom three bits of the GFP mask are reserved for storing the encoded zone number. __GFP_ZONE_MOVABLE contains the encoded ZONE_MOVABLE and the __GFP_MOVABLE flag.

With GFP_ZONE_TABLE, __GFP_HIGHMEM ORed with __GFP_MOVABLE means gfp_zone should return ZONE_MOVABLE. In order to stay compatible with GFP_ZONE_TABLE, replace (__GFP_HIGHMEM | __GFP_MOVABLE) with __GFP_ZONE_MOVABLE.

Signed-off-by: Huaisheng Ye
Cc: Minchan Kim
Cc: Nitin Gupta
Cc: Sergey Senozhatsky
Cc: Christoph Hellwig
---
 drivers/block/zram/zram_drv.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 0f3fadd..1bb5ca8 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1004,14 +1004,12 @@ static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec,
 		handle = zs_malloc(zram->mem_pool, comp_len,
 				__GFP_KSWAPD_RECLAIM |
 				__GFP_NOWARN |
-				__GFP_HIGHMEM |
-				__GFP_MOVABLE);
+				__GFP_ZONE_MOVABLE);
 	if (!handle) {
 		zcomp_stream_put(zram->comp);
 		atomic64_inc(&zram->stats.writestall);
 		handle = zs_malloc(zram->mem_pool, comp_len,
-				GFP_NOIO | __GFP_HIGHMEM |
-				__GFP_MOVABLE);
+				GFP_NOIO | __GFP_ZONE_MOVABLE);
 		if (handle)
 			goto compress_again;
 		return -ENOMEM;
--
1.8.3.1
[RFC PATCH v3 9/9] arch/x86/include/asm/page.h: update usage of movableflags
From: Huaisheng Ye

GFP_HIGHUSER_MOVABLE is no longer equal to GFP_HIGHUSER | __GFP_MOVABLE; modify this code to adapt to the patch that gets rid of GFP_ZONE_TABLE/BAD.

Signed-off-by: Huaisheng Ye
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: "H. Peter Anvin"
Cc: Kate Stewart
Cc: Greg Kroah-Hartman
Cc: x...@kernel.org
Cc: Philippe Ombredanne
Cc: Christoph Hellwig
---
 arch/x86/include/asm/page.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/page.h b/arch/x86/include/asm/page.h
index 7555b48..a47f42d 100644
--- a/arch/x86/include/asm/page.h
+++ b/arch/x86/include/asm/page.h
@@ -35,7 +35,8 @@ static inline void copy_user_page(void *to, void *from, unsigned long vaddr,
 }

 #define __alloc_zeroed_user_highpage(movableflags, vma, vaddr) \
-	alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO | movableflags, vma, vaddr)
+	alloc_page_vma((movableflags ? GFP_HIGHUSER_MOVABLE : GFP_HIGHUSER) \
+		| __GFP_ZERO, vma, vaddr)
 #define __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE

 #ifndef __pa
--
1.8.3.1
[RFC PATCH v3 8/9] include/linux/highmem.h: update usage of movableflags
From: Huaisheng Ye

GFP_HIGHUSER_MOVABLE is no longer equal to GFP_HIGHUSER | __GFP_MOVABLE; modify this code to adapt to the patch that gets rid of GFP_ZONE_TABLE/BAD.

Signed-off-by: Huaisheng Ye
Cc: Kate Stewart
Cc: Greg Kroah-Hartman
Cc: Thomas Gleixner
Cc: Philippe Ombredanne
Cc: Christoph Hellwig
---
 include/linux/highmem.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index 0690679..5383c9e 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -159,8 +159,8 @@ static inline void clear_user_highpage(struct page *page, unsigned long vaddr)
 	struct vm_area_struct *vma, unsigned long vaddr)
 {
-	struct page *page = alloc_page_vma(GFP_HIGHUSER | movableflags,
-			vma, vaddr);
+	struct page *page = alloc_page_vma(movableflags ?
+			GFP_HIGHUSER_MOVABLE : GFP_HIGHUSER, vma, vaddr);

 	if (page)
 		clear_user_highpage(page, vaddr);
--
1.8.3.1
[RFC PATCH v3 7/9] mm/zsmalloc: update usage of zone modifiers
From: Huaisheng Ye

Use __GFP_ZONE_MOVABLE to replace (__GFP_HIGHMEM | __GFP_MOVABLE).

___GFP_DMA, ___GFP_HIGHMEM and ___GFP_DMA32 have been deleted from the GFP bitmasks; the bottom three bits of the GFP mask are reserved for storing the encoded zone number. __GFP_ZONE_MOVABLE contains the encoded ZONE_MOVABLE and the __GFP_MOVABLE flag.

With GFP_ZONE_TABLE, __GFP_HIGHMEM ORed with __GFP_MOVABLE means gfp_zone should return ZONE_MOVABLE. In order to stay compatible with GFP_ZONE_TABLE, use GFP_NORMAL_UNMOVABLE() to clear the bottom 4 bits of the GFP bitmask.

Signed-off-by: Huaisheng Ye
Cc: Minchan Kim
Cc: Nitin Gupta
Cc: Sergey Senozhatsky
Cc: Christoph Hellwig
---
 mm/zsmalloc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 61cb05d..e250c69 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -345,7 +345,7 @@ static void destroy_cache(struct zs_pool *pool)
 static unsigned long cache_alloc_handle(struct zs_pool *pool, gfp_t gfp)
 {
 	return (unsigned long)kmem_cache_alloc(pool->handle_cachep,
-			gfp & ~(__GFP_HIGHMEM|__GFP_MOVABLE));
+			GFP_NORMAL_UNMOVABLE(gfp));
 }

 static void cache_free_handle(struct zs_pool *pool, unsigned long handle)
@@ -356,7 +356,7 @@ static void cache_free_handle(struct zs_pool *pool, unsigned long handle)
 static struct zspage *cache_alloc_zspage(struct zs_pool *pool, gfp_t flags)
 {
 	return kmem_cache_alloc(pool->zspage_cachep,
-			flags & ~(__GFP_HIGHMEM|__GFP_MOVABLE));
+			GFP_NORMAL_UNMOVABLE(flags));
 }

 static void cache_free_zspage(struct zs_pool *pool, struct zspage *zspage)
--
1.8.3.1
[RFC PATCH v3 4/9] fs/btrfs/extent_io: update usage of zone modifiers
From: Huaisheng Ye

Use __GFP_ZONE_MASK to replace (__GFP_DMA32 | __GFP_HIGHMEM). In the function alloc_extent_state, it is obvious that __GFP_DMA is not the expected zone type.

___GFP_DMA, ___GFP_HIGHMEM and ___GFP_DMA32 have been deleted from the GFP bitmasks; the bottom three bits of the GFP mask are reserved for storing the encoded zone number. __GFP_DMA, __GFP_HIGHMEM and __GFP_DMA32 should not be ORed with each other.

Use GFP_NORMAL() to clear the bottom 3 bits of the GFP bitmask.

Signed-off-by: Huaisheng Ye
Cc: Chris Mason
Cc: Josef Bacik
Cc: David Sterba
Cc: Christoph Hellwig
---
 fs/btrfs/extent_io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index e99b329..f41fc61 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -220,7 +220,7 @@ static struct extent_state *alloc_extent_state(gfp_t mask)
 	 * The given mask might be not appropriate for the slab allocator,
 	 * drop the unsupported bits
 	 */
-	mask &= ~(__GFP_DMA32|__GFP_HIGHMEM);
+	mask = GFP_NORMAL(mask);
 	state = kmem_cache_alloc(extent_state_cache, mask);
 	if (!state)
 		return state;
--
1.8.3.1
[RFC PATCH v3 3/9] drivers/xen/swiotlb-xen: update usage of zone modifiers
From: Huaisheng Ye

Use __GFP_ZONE_MASK to replace (__GFP_DMA | __GFP_HIGHMEM). In the function xen_swiotlb_alloc_coherent, it is obvious that __GFP_DMA32 is not the expected zone type.

___GFP_DMA, ___GFP_HIGHMEM and ___GFP_DMA32 have been deleted from the GFP bitmasks; the bottom three bits of the GFP mask are reserved for storing the encoded zone number. __GFP_DMA, __GFP_HIGHMEM and __GFP_DMA32 should not be ORed with each other.

Use GFP_NORMAL() to clear the bottom 3 bits of the GFP bitmask.

Signed-off-by: Huaisheng Ye
Cc: Konrad Rzeszutek Wilk
Cc: Boris Ostrovsky
Cc: Juergen Gross
Cc: Christoph Hellwig
---
 drivers/xen/swiotlb-xen.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index e1c6089..359 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -301,7 +301,7 @@ int __ref xen_swiotlb_init(int verbose, bool early)
 	 * machine physical layout.  We can't allocate highmem
 	 * because we can't return a pointer to it.
 	 */
-	flags &= ~(__GFP_DMA | __GFP_HIGHMEM);
+	flags = GFP_NORMAL(flags);

 	/* On ARM this function returns an ioremap'ped virtual address for
 	 * which virt_to_phys doesn't return the corresponding physical
--
1.8.3.1
[RFC PATCH v3 2/9] include/linux/dma-mapping: update usage of zone modifiers
From: Huaisheng Ye

Use __GFP_ZONE_MASK to replace (__GFP_DMA | __GFP_HIGHMEM | __GFP_DMA32).

___GFP_DMA, ___GFP_HIGHMEM and ___GFP_DMA32 have been deleted from the GFP bitmasks; the bottom three bits of the GFP mask are reserved for storing the encoded zone number. __GFP_DMA, __GFP_HIGHMEM and __GFP_DMA32 should not be ORed with each other.

Use GFP_NORMAL() to clear the bottom 3 bits of the GFP bitmask.

Signed-off-by: Huaisheng Ye
Cc: Christoph Hellwig
Cc: Marek Szyprowski
Cc: Robin Murphy
Cc: Christoph Hellwig
---
 include/linux/dma-mapping.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index f8ab1c0..8fe524d 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -519,7 +519,7 @@ static inline void *dma_alloc_attrs(struct device *dev, size_t size,
 		return cpu_addr;

 	/* let the implementation decide on the zone to allocate from: */
-	flag &= ~(__GFP_DMA | __GFP_DMA32 | __GFP_HIGHMEM);
+	flag = GFP_NORMAL(flag);

 	if (!arch_dma_alloc_attrs(&dev, &flag))
 		return NULL;
--
1.8.3.1
[RFC PATCH v3 1/9] include/linux/gfp.h: get rid of GFP_ZONE_TABLE/BAD
From: Huaisheng Ye

Replace GFP_ZONE_TABLE and GFP_ZONE_BAD with an encoded zone number.

Delete ___GFP_DMA, ___GFP_HIGHMEM and ___GFP_DMA32 from the GFP bitmasks; the bottom three bits of the GFP mask are reserved for storing the encoded zone number.

The encoding method is XOR. Get the zone number from enum zone_type, then encode the number with ZONE_NORMAL by an XOR operation. The goal is to make sure ZONE_NORMAL is encoded to zero, so compatibility is guaranteed: GFP_KERNEL and GFP_ATOMIC can be used as before.

Reserve __GFP_MOVABLE in bit 3, so that it can continue to be used as a flag. Same as before, __GFP_MOVABLE represents the movable migrate type for ZONE_DMA, ZONE_DMA32 and ZONE_NORMAL. But when it is enabled together with __GFP_HIGHMEM, ZONE_MOVABLE shall be returned instead of ZONE_HIGHMEM. __GFP_ZONE_MOVABLE is created to realize this.

With this patch, just enabling __GFP_MOVABLE and __GFP_HIGHMEM is not enough to get ZONE_MOVABLE from gfp_zone. All subsystems should use GFP_HIGHUSER_MOVABLE directly to achieve that.

Decode the zone number directly from the bottom three bits of the flags in gfp_zone. The theory of encoding and decoding is:

	A ^ B ^ B = A

Suggested-by: Matthew Wilcox
Signed-off-by: Huaisheng Ye
Cc: Andrew Morton
Cc: Vlastimil Babka
Cc: Michal Hocko
Cc: Mel Gorman
Cc: Kate Stewart
Cc: "Levin, Alexander (Sasha Levin)"
Cc: Greg Kroah-Hartman
Cc: Christoph Hellwig
---
 include/linux/gfp.h | 107 ++--
 1 file changed, 20 insertions(+), 87 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 1a4582b..f76ccd76 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -16,9 +16,7 @@
  */

 /* Plain integer GFP bitmasks. Do not use this directly. */
-#define ___GFP_DMA		0x01u
-#define ___GFP_HIGHMEM		0x02u
-#define ___GFP_DMA32		0x04u
+#define ___GFP_ZONE_MASK	0x07u
 #define ___GFP_MOVABLE		0x08u
 #define ___GFP_RECLAIMABLE	0x10u
 #define ___GFP_HIGH		0x20u
@@ -53,11 +51,15 @@
  * without the underscores and use them consistently. The definitions here may
  * be used in bit comparisons.
  */
-#define __GFP_DMA	((__force gfp_t)___GFP_DMA)
-#define __GFP_HIGHMEM	((__force gfp_t)___GFP_HIGHMEM)
-#define __GFP_DMA32	((__force gfp_t)___GFP_DMA32)
+#define __GFP_DMA	((__force gfp_t)OPT_ZONE_DMA ^ ZONE_NORMAL)
+#define __GFP_HIGHMEM	((__force gfp_t)OPT_ZONE_HIGHMEM ^ ZONE_NORMAL)
+#define __GFP_DMA32	((__force gfp_t)OPT_ZONE_DMA32 ^ ZONE_NORMAL)
 #define __GFP_MOVABLE	((__force gfp_t)___GFP_MOVABLE)  /* ZONE_MOVABLE allowed */
-#define GFP_ZONEMASK	(__GFP_DMA|__GFP_HIGHMEM|__GFP_DMA32|__GFP_MOVABLE)
+#define GFP_ZONEMASK	((__force gfp_t)___GFP_ZONE_MASK | ___GFP_MOVABLE)
+/* bottom 3 bits of GFP bitmasks are used for zone number encoded */
+#define __GFP_ZONE_MASK ((__force gfp_t)___GFP_ZONE_MASK)
+#define __GFP_ZONE_MOVABLE \
+	((__force gfp_t)(ZONE_MOVABLE ^ ZONE_NORMAL) | ___GFP_MOVABLE)

 /*
  * Page mobility and placement hints
@@ -268,6 +270,13 @@
  * available and will not wake kswapd/kcompactd on failure. The _LIGHT
  * version does not attempt reclaim/compaction at all and is by default used
  * in page fault path, while the non-light is used by khugepaged.
+ *
+ * GFP_NORMAL() is used to clear the bottom 3 bits of the GFP bitmask.
+ * Actually it returns encoded ZONE_NORMAL bits.
+ *
+ * GFP_NORMAL_UNMOVABLE() is similar to GFP_NORMAL, but it clears the bottom
+ * 4 bits of the GFP bitmask. Besides the encoded ZONE_NORMAL bits, it clears
+ * the MOVABLE flag as well.
 */
 #define GFP_ATOMIC	(__GFP_HIGH|__GFP_ATOMIC|__GFP_KSWAPD_RECLAIM)
 #define GFP_KERNEL	(__GFP_RECLAIM | __GFP_IO | __GFP_FS)
@@ -279,10 +288,12 @@
 #define GFP_DMA		__GFP_DMA
 #define GFP_DMA32	__GFP_DMA32
 #define GFP_HIGHUSER	(GFP_USER | __GFP_HIGHMEM)
-#define GFP_HIGHUSER_MOVABLE	(GFP_HIGHUSER | __GFP_MOVABLE)
+#define GFP_HIGHUSER_MOVABLE	(GFP_USER | __GFP_ZONE_MOVABLE)
 #define GFP_TRANSHUGE_LIGHT	((GFP_HIGHUSER_MOVABLE | __GFP_COMP | \
 			 __GFP_NOMEMALLOC | __GFP_NOWARN) & ~__GFP_RECLAIM)
 #define GFP_TRANSHUGE	(GFP_TRANSHUGE_LIGHT | __GFP_DIRECT_RECLAIM)
+#define GFP_NORMAL(gfp)		((gfp) & ~__GFP_ZONE_MASK)
+#define GFP_NORMAL_UNMOVABLE(gfp)	((gfp) & ~GFP_ZONEMASK)

 /* Convert GFP flags to their corresponding migrate type */
 #define GFP_MOVABLE_MASK (__GFP_RECLAIMABLE|__GFP_MOVABLE)
@@ -326,87 +337,9 @@ static inline bool gfpflags_allow_blocking(const gfp_t gfp_flags)
 #define OPT_ZONE_DMA32 ZONE_NORMAL
 #endif

-/*
- * GFP_ZONE_TABLE is a word size bitstring that is used for looking up the
- * zone to use given the lowest 4 bits of
[RFC PATCH v3 0/9] get rid of GFP_ZONE_TABLE/BAD
From: Huaisheng Ye

Changes since v2: [2]
* According to Christoph's suggestion, rebase patches to the current mainline from v4.16.
* Follow the advice of Matthew: create macros like GFP_NORMAL and GFP_NORMAL_UNMOVABLE to clear the bottom 3 and 4 bits of the GFP bitmask.
* Delete some patches because of kernel updates.

[2]: https://marc.info/?l=linux-mm&m=152691610014027&w=2

Tested on a Lenovo ThinkSystem server.

Initmem setup node 0 [mem 0x1000-0x00043fff]
[0.00] On node 0 totalpages: 4111666
[0.00] DMA zone: 64 pages used for memmap
[0.00] DMA zone: 23 pages reserved
[0.00] DMA zone: 3999 pages, LIFO batch:0
[0.00] mminit::memmap_init Initialising map node 0 zone 0 pfns 1 -> 4096
[0.00] DMA32 zone: 10935 pages used for memmap
[0.00] DMA32 zone: 699795 pages, LIFO batch:31
[0.00] mminit::memmap_init Initialising map node 0 zone 1 pfns 4096 -> 1048576
[0.00] Normal zone: 53248 pages used for memmap
[0.00] Normal zone: 3407872 pages, LIFO batch:31
[0.00] mminit::memmap_init Initialising map node 0 zone 2 pfns 1048576 -> 4456448
[0.00] mminit::memmap_init Initialising map node 0 zone 3 pfns 1 -> 4456448
[0.00] Initmem setup node 1 [mem 0x00238000-0x00277fff]
[0.00] On node 1 totalpages: 4194304
[0.00] Normal zone: 65536 pages used for memmap
[0.00] Normal zone: 4194304 pages, LIFO batch:31
[0.00] mminit::memmap_init Initialising map node 1 zone 2 pfns 37224448 -> 41418752
[0.00] mminit::memmap_init Initialising map node 1 zone 3 pfns 37224448 -> 41418752
...
[0.00] mminit::zonelist general 0:DMA = 0:DMA
[0.00] mminit::zonelist general 0:DMA32 = 0:DMA32 0:DMA
[0.00] mminit::zonelist general 0:Normal = 0:Normal 0:DMA32 0:DMA 1:Normal
[0.00] mminit::zonelist thisnode 0:DMA = 0:DMA
[0.00] mminit::zonelist thisnode 0:DMA32 = 0:DMA32 0:DMA
[0.00] mminit::zonelist thisnode 0:Normal = 0:Normal 0:DMA32 0:DMA
[0.00] mminit::zonelist general 1:Normal = 1:Normal 0:Normal 0:DMA32 0:DMA
[0.00] mminit::zonelist thisnode 1:Normal = 1:Normal
[0.00] Built 2 zonelists, mobility grouping on. Total pages: 8176164
[0.00] Policy zone: Normal
[0.00] Kernel command line: BOOT_IMAGE=/vmlinuz-4.17.0-rc6-gfp09+ root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/root rd.lvm.lv=fedora/swap debug LANG=en_US.UTF-8 mminit_loglevel=4 console=tty0 console=ttyS0,115200n8 memblock=debug earlyprintk=serial,0x3f8,115200

---

Replace GFP_ZONE_TABLE and GFP_ZONE_BAD with an encoded zone number.

Delete ___GFP_DMA, ___GFP_HIGHMEM and ___GFP_DMA32 from the GFP bitmasks; the bottom three bits of the GFP mask are reserved for storing the encoded zone number.

The encoding method is XOR. Get the zone number from enum zone_type, then encode the number with ZONE_NORMAL by an XOR operation. The goal is to make sure ZONE_NORMAL is encoded to zero, so compatibility is guaranteed: GFP_KERNEL and GFP_ATOMIC can be used as before.

Reserve __GFP_MOVABLE in bit 3, so that it can continue to be used as a flag. Same as before, __GFP_MOVABLE represents the movable migrate type for ZONE_DMA, ZONE_DMA32 and ZONE_NORMAL. But when it is enabled together with __GFP_HIGHMEM, ZONE_MOVABLE shall be returned instead of ZONE_HIGHMEM. __GFP_ZONE_MOVABLE is created to realize this.

With this patch, just enabling __GFP_MOVABLE and __GFP_HIGHMEM is not enough to get ZONE_MOVABLE from gfp_zone. All callers should use GFP_HIGHUSER_MOVABLE or __GFP_ZONE_MOVABLE directly to achieve that.

Decode the zone number directly from the bottom three bits of the flags in gfp_zone. The theory of encoding and decoding is:

	A ^ B ^ B = A

Changes since v1: [1]
* Create __GFP_ZONE_MOVABLE and modify GFP_HIGHUSER_MOVABLE to help callers get ZONE_MOVABLE. Try to create __GFP_ZONE_MASK to mask the lowest 3 bits of the GFP bitmask.
* Modify some callers' gfp flags to update the usage of address zone modifiers.
* Modify the inline function gfp_zone to get better performance, according to Matthew's suggestion.

[1]: https://marc.info/?l=linux-mm&m=152596791931266&w=2

---

Huaisheng Ye (9):
  include/linux/gfp.h: get rid of GFP_ZONE_TABLE/BAD
  include/linux/dma-mapping: update usage of zone modifiers
  drivers/xen/swiotlb-xen: update usage of zone modifiers
  fs/btrfs/extent_io: update usage of zone modifiers
  drivers/block/zram/zram_drv: update usage of zone modifiers
  mm/vmpressure: update usage of zone modifiers
  mm/zsmalloc: update usage of zone modifiers
  include/linux/highmem.h: update usage of movableflags
  arch/x86/include/asm/page.h: update usage of movableflags

 arch/x86/include/asm/page.h   | 3 +-
 drivers/block/zram/zram_drv.c | 6 +--
 drivers/xen/swiotlb-xen.c     | 2 +-
 fs/btrfs/extent_io.c          | 2 +-
 include/linux/dma-mapping.h   | 2 +-
 include/linux/gfp.h           | 107
RE: [External] Re: [RFC PATCH v2 00/12] get rid of GFP_ZONE_TABLE/BAD
From: Michal Hocko [mailto:mho...@kernel.org]
Sent: Wednesday, May 23, 2018 2:37 AM
>
> On Mon 21-05-18 23:20:21, Huaisheng Ye wrote:
> > From: Huaisheng Ye
> >
> > Replace GFP_ZONE_TABLE and GFP_ZONE_BAD with encoded zone number.
> >
> > Delete ___GFP_DMA, ___GFP_HIGHMEM and ___GFP_DMA32 from GFP bitmasks,
> > the bottom three bits of GFP mask is reserved for storing encoded
> > zone number.
> >
> > The encoding method is XOR. Get zone number from enum zone_type,
> > then encode the number with ZONE_NORMAL by XOR operation.
> > The goal is to make sure ZONE_NORMAL can be encoded to zero. So,
> > the compatibility can be guaranteed, such as GFP_KERNEL and GFP_ATOMIC
> > can be used as before.
> >
> > Reserve __GFP_MOVABLE in bit 3, so that it can continue to be used as
> > a flag. Same as before, __GFP_MOVABLE represents movable migrate type
> > for ZONE_DMA, ZONE_DMA32, and ZONE_NORMAL. But when it is enabled with
> > __GFP_HIGHMEM, ZONE_MOVABLE shall be returned instead of ZONE_HIGHMEM.
> > __GFP_ZONE_MOVABLE is created to realize it.
> >
> > With this patch, just enabling __GFP_MOVABLE and __GFP_HIGHMEM is not
> > enough to get ZONE_MOVABLE from gfp_zone. All callers should use
> > GFP_HIGHUSER_MOVABLE or __GFP_ZONE_MOVABLE directly to achieve that.
> >
> > Decode zone number directly from bottom three bits of flags in gfp_zone.
> > The theory of encoding and decoding is,
> > A ^ B ^ B = A
>
> So why is this any better than the current code. Sure I am not a great
> fan of GFP_ZONE_TABLE because of how it is incomprehensible but this
> doesn't look too much better, yet we are losing a check for incompatible
> gfp flags. The diffstat looks really sound but then you just look and
> see that the large part is the comment that at least explained the gfp
> zone modifiers somehow and the debugging code. So what is the selling
> point?

Dear Michal,

Let me try to reply to your questions. Exactly, GFP_ZONE_TABLE is too complicated. I think there are two advantages from this series of patches.

1. The XOR operation is simple and efficient. GFP_ZONE_TABLE/BAD needs two shift operations: the first to get a zone_type and the second to check whether the returned type is correct or not. With these patches, the XOR operation is needed just once. Because the bottom 3 bits of the GFP bitmask are used to represent the encoded zone number, there is no bad zone number as long as all callers use it correctly. Of course, the zone type returned by gfp_zone must be no more than ZONE_MOVABLE.

2. GFP_ZONE_TABLE limits the number of zone types. The current GFP_ZONE_TABLE is 32 bits. In general there are 4 zone types for most X86_64 platforms: ZONE_DMA, ZONE_DMA32, ZONE_NORMAL and ZONE_MOVABLE. If we want to expand the number of zone types beyond 4, the zone shift has to be 3. That is to say, a 32-bit zone table is not big enough to store all zone types. And the most painful thing is that the current GFP bitmask space is quite constrained; it only has four ___GFP_XXX values that can be used, as below:

#define ___GFP_DMA	0x01u
#define ___GFP_HIGHMEM	0x02u
#define ___GFP_DMA32	0x04u
(___GFP_NORMAL equals 0x00)

With the implementation in these patches, a maximum of 8 zone types can be used. The method of encoding and decoding is quite simple, users can get an intuitive feeling for it, and most importantly, there are no BAD zone types any more:

A ^ B ^ B = A

By the way, our v3 patches are ready, but Gmail's SMTP is quite unstable for firewall reasons on my side; I will try to resend them ASAP.

Sincerely,
Huaisheng Ye
[RFC PATCH v3 9/9] arch/x86/include/asm/page.h: update usage of movableflags
From: Huaisheng YeGFP_HIGHUSER_MOVABLE doesn't equal to GFP_HIGHUSER | __GFP_MOVABLE, modify it to adapt patch of getting rid of GFP_ZONE_TABLE/BAD. Signed-off-by: Huaisheng Ye Cc: Thomas Gleixner Cc: Ingo Molnar Cc: "H. Peter Anvin" Cc: Kate Stewart Cc: Greg Kroah-Hartman Cc: x...@kernel.org Cc: Philippe Ombredanne Cc: Christoph Hellwig --- arch/x86/include/asm/page.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/page.h b/arch/x86/include/asm/page.h index 7555b48..a47f42d 100644 --- a/arch/x86/include/asm/page.h +++ b/arch/x86/include/asm/page.h @@ -35,7 +35,8 @@ static inline void copy_user_page(void *to, void *from, unsigned long vaddr, } #define __alloc_zeroed_user_highpage(movableflags, vma, vaddr) \ - alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO | movableflags, vma, vaddr) + alloc_page_vma((movableflags ? GFP_HIGHUSER_MOVABLE : GFP_HIGHUSER) \ + | __GFP_ZERO, vma, vaddr) #define __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE #ifndef __pa -- 1.8.3.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[RFC PATCH v3 8/9] include/linux/highmem.h: update usage of movableflags
From: Huaisheng YeGFP_HIGHUSER_MOVABLE doesn't equal to GFP_HIGHUSER | __GFP_MOVABLE, modify it to adapt patch of getting rid of GFP_ZONE_TABLE/BAD. Signed-off-by: Huaisheng Ye Cc: Kate Stewart Cc: Greg Kroah-Hartman Cc: Thomas Gleixner Cc: Philippe Ombredanne Cc: Christoph Hellwig --- include/linux/highmem.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/include/linux/highmem.h b/include/linux/highmem.h index 0690679..5383c9e 100644 --- a/include/linux/highmem.h +++ b/include/linux/highmem.h @@ -159,8 +159,8 @@ static inline void clear_user_highpage(struct page *page, unsigned long vaddr) struct vm_area_struct *vma, unsigned long vaddr) { - struct page *page = alloc_page_vma(GFP_HIGHUSER | movableflags, - vma, vaddr); + struct page *page = alloc_page_vma(movableflags ? + GFP_HIGHUSER_MOVABLE : GFP_HIGHUSER, vma, vaddr); if (page) clear_user_highpage(page, vaddr); -- 1.8.3.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[RFC PATCH v3 4/9] fs/btrfs/extent_io: update usage of zone modifiers
From: Huaisheng YeUse __GFP_ZONE_MASK to replace (__GFP_DMA32 | __GFP_HIGHMEM). In function alloc_extent_state, it is obvious that __GFP_DMA is not the expecting zone type. ___GFP_DMA, ___GFP_HIGHMEM and ___GFP_DMA32 have been deleted from GFP bitmasks, the bottom three bits of GFP mask is reserved for storing encoded zone number. __GFP_DMA, __GFP_HIGHMEM and __GFP_DMA32 should not be operated with each others by OR. Use GFP_NORMAL() to clear bottom 3 bits of GFP bitmaks. Signed-off-by: Huaisheng Ye Cc: Chris Mason Cc: Josef Bacik Cc: David Sterba Cc: Christoph Hellwig --- fs/btrfs/extent_io.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index e99b329..f41fc61 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -220,7 +220,7 @@ static struct extent_state *alloc_extent_state(gfp_t mask) * The given mask might be not appropriate for the slab allocator, * drop the unsupported bits */ - mask &= ~(__GFP_DMA32|__GFP_HIGHMEM); + mask = GFP_NORMAL(mask); state = kmem_cache_alloc(extent_state_cache, mask); if (!state) return state; -- 1.8.3.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[RFC PATCH v3 1/9] include/linux/gfp.h: get rid of GFP_ZONE_TABLE/BAD
From: Huaisheng Ye

Replace GFP_ZONE_TABLE and GFP_ZONE_BAD with an encoded zone number.

Delete ___GFP_DMA, ___GFP_HIGHMEM and ___GFP_DMA32 from the GFP bitmasks; the bottom three bits of the GFP mask are reserved for storing the encoded zone number.

The encoding method is XOR. Take the zone number from enum zone_type, then encode it by XORing with ZONE_NORMAL. The goal is to make sure ZONE_NORMAL encodes to zero, so compatibility is guaranteed: GFP_KERNEL and GFP_ATOMIC can be used as before.

Reserve __GFP_MOVABLE in bit 3 so that it can continue to be used as a flag. As before, __GFP_MOVABLE represents the movable migrate type for ZONE_DMA, ZONE_DMA32 and ZONE_NORMAL; but when it is enabled together with __GFP_HIGHMEM, ZONE_MOVABLE shall be returned instead of ZONE_HIGHMEM. __GFP_ZONE_MOVABLE is created to realize this.

With this patch, just enabling __GFP_MOVABLE and __GFP_HIGHMEM is no longer enough to get ZONE_MOVABLE from gfp_zone; subsystems should use GFP_HIGHUSER_MOVABLE directly to achieve that.

Decode the zone number directly from the bottom three bits of the flags in gfp_zone. The theory of encoding and decoding is: A ^ B ^ B = A.

Suggested-by: Matthew Wilcox
Signed-off-by: Huaisheng Ye
Cc: Andrew Morton
Cc: Vlastimil Babka
Cc: Michal Hocko
Cc: Mel Gorman
Cc: Kate Stewart
Cc: "Levin, Alexander (Sasha Levin)"
Cc: Greg Kroah-Hartman
Cc: Christoph Hellwig
---
 include/linux/gfp.h | 107 ++--
 1 file changed, 20 insertions(+), 87 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 1a4582b..f76ccd76 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -16,9 +16,7 @@
  */

 /* Plain integer GFP bitmasks. Do not use this directly. */
-#define ___GFP_DMA		0x01u
-#define ___GFP_HIGHMEM		0x02u
-#define ___GFP_DMA32		0x04u
+#define ___GFP_ZONE_MASK	0x07u
 #define ___GFP_MOVABLE		0x08u
 #define ___GFP_RECLAIMABLE	0x10u
 #define ___GFP_HIGH		0x20u
@@ -53,11 +51,15 @@
  * without the underscores and use them consistently. The definitions here may
  * be used in bit comparisons.
  */
-#define __GFP_DMA	((__force gfp_t)___GFP_DMA)
-#define __GFP_HIGHMEM	((__force gfp_t)___GFP_HIGHMEM)
-#define __GFP_DMA32	((__force gfp_t)___GFP_DMA32)
+#define __GFP_DMA	((__force gfp_t)OPT_ZONE_DMA ^ ZONE_NORMAL)
+#define __GFP_HIGHMEM	((__force gfp_t)OPT_ZONE_HIGHMEM ^ ZONE_NORMAL)
+#define __GFP_DMA32	((__force gfp_t)OPT_ZONE_DMA32 ^ ZONE_NORMAL)
 #define __GFP_MOVABLE	((__force gfp_t)___GFP_MOVABLE)  /* ZONE_MOVABLE allowed */
-#define GFP_ZONEMASK	(__GFP_DMA|__GFP_HIGHMEM|__GFP_DMA32|__GFP_MOVABLE)
+#define GFP_ZONEMASK	((__force gfp_t)___GFP_ZONE_MASK | ___GFP_MOVABLE)
+/* bottom 3 bits of GFP bitmasks are used for the encoded zone number */
+#define __GFP_ZONE_MASK	((__force gfp_t)___GFP_ZONE_MASK)
+#define __GFP_ZONE_MOVABLE \
+	((__force gfp_t)(ZONE_MOVABLE ^ ZONE_NORMAL) | ___GFP_MOVABLE)

 /*
  * Page mobility and placement hints
@@ -268,6 +270,13 @@
  * available and will not wake kswapd/kcompactd on failure. The _LIGHT
  * version does not attempt reclaim/compaction at all and is by default used
  * in page fault path, while the non-light is used by khugepaged.
+ *
+ * GFP_NORMAL() is used to clear the bottom 3 bits of the GFP bitmask.
+ * Effectively it returns the encoded ZONE_NORMAL bits.
+ *
+ * GFP_NORMAL_UNMOVABLE() is similar to GFP_NORMAL, but it clears the bottom
+ * 4 bits of the GFP bitmask. Besides the encoded ZONE_NORMAL bits, it clears
+ * the MOVABLE flag as well.
 */
 #define GFP_ATOMIC	(__GFP_HIGH|__GFP_ATOMIC|__GFP_KSWAPD_RECLAIM)
 #define GFP_KERNEL	(__GFP_RECLAIM | __GFP_IO | __GFP_FS)
@@ -279,10 +288,12 @@
 #define GFP_DMA		__GFP_DMA
 #define GFP_DMA32	__GFP_DMA32
 #define GFP_HIGHUSER	(GFP_USER | __GFP_HIGHMEM)
-#define GFP_HIGHUSER_MOVABLE	(GFP_HIGHUSER | __GFP_MOVABLE)
+#define GFP_HIGHUSER_MOVABLE	(GFP_USER | __GFP_ZONE_MOVABLE)
 #define GFP_TRANSHUGE_LIGHT	((GFP_HIGHUSER_MOVABLE | __GFP_COMP | \
			 __GFP_NOMEMALLOC | __GFP_NOWARN) & ~__GFP_RECLAIM)
 #define GFP_TRANSHUGE	(GFP_TRANSHUGE_LIGHT | __GFP_DIRECT_RECLAIM)
+#define GFP_NORMAL(gfp)		((gfp) & ~__GFP_ZONE_MASK)
+#define GFP_NORMAL_UNMOVABLE(gfp)	((gfp) & ~GFP_ZONEMASK)

 /* Convert GFP flags to their corresponding migrate type */
 #define GFP_MOVABLE_MASK (__GFP_RECLAIMABLE|__GFP_MOVABLE)
@@ -326,87 +337,9 @@ static inline bool gfpflags_allow_blocking(const gfp_t gfp_flags)
 #define OPT_ZONE_DMA32 ZONE_NORMAL
 #endif

-/*
- * GFP_ZONE_TABLE is a word size bitstring that is used for looking up the
- * zone to use given the lowest 4 bits of
[RFC PATCH v3 2/9] include/linux/dma-mapping: update usage of zone modifiers
From: Huaisheng Ye

Use __GFP_ZONE_MASK to replace (__GFP_DMA | __GFP_HIGHMEM | __GFP_DMA32).

___GFP_DMA, ___GFP_HIGHMEM and ___GFP_DMA32 have been deleted from the GFP bitmasks; the bottom three bits of the GFP mask are reserved for storing the encoded zone number, so __GFP_DMA, __GFP_HIGHMEM and __GFP_DMA32 must no longer be ORed with each other. Use GFP_NORMAL() to clear the bottom 3 bits of the GFP bitmask.

Signed-off-by: Huaisheng Ye
Cc: Christoph Hellwig
Cc: Marek Szyprowski
Cc: Robin Murphy
---
 include/linux/dma-mapping.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index f8ab1c0..8fe524d 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -519,7 +519,7 @@ static inline void *dma_alloc_attrs(struct device *dev, size_t size,
		return cpu_addr;

	/* let the implementation decide on the zone to allocate from: */
-	flag &= ~(__GFP_DMA | __GFP_DMA32 | __GFP_HIGHMEM);
+	flag = GFP_NORMAL(flag);

	if (!arch_dma_alloc_attrs(&dev, &flag))
		return NULL;
--
1.8.3.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
[RFC PATCH v3 5/9] drivers/block/zram/zram_drv: update usage of zone modifiers
From: Huaisheng Ye

Use __GFP_ZONE_MOVABLE to replace (__GFP_HIGHMEM | __GFP_MOVABLE).

___GFP_DMA, ___GFP_HIGHMEM and ___GFP_DMA32 have been deleted from the GFP bitmasks; the bottom three bits of the GFP mask are reserved for storing the encoded zone number. __GFP_ZONE_MOVABLE contains the encoded ZONE_MOVABLE and the __GFP_MOVABLE flag.

With GFP_ZONE_TABLE, __GFP_HIGHMEM ORed with __GFP_MOVABLE meant gfp_zone should return ZONE_MOVABLE. To stay compatible with GFP_ZONE_TABLE, replace (__GFP_HIGHMEM | __GFP_MOVABLE) with __GFP_ZONE_MOVABLE.

Signed-off-by: Huaisheng Ye
Cc: Minchan Kim
Cc: Nitin Gupta
Cc: Sergey Senozhatsky
Cc: Christoph Hellwig
---
 drivers/block/zram/zram_drv.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 0f3fadd..1bb5ca8 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1004,14 +1004,12 @@ static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec,
	handle = zs_malloc(zram->mem_pool, comp_len,
			__GFP_KSWAPD_RECLAIM |
			__GFP_NOWARN |
-			__GFP_HIGHMEM |
-			__GFP_MOVABLE);
+			__GFP_ZONE_MOVABLE);
	if (!handle) {
		zcomp_stream_put(zram->comp);
		atomic64_inc(&zram->stats.writestall);
		handle = zs_malloc(zram->mem_pool, comp_len,
-				GFP_NOIO | __GFP_HIGHMEM |
-				__GFP_MOVABLE);
+				GFP_NOIO | __GFP_ZONE_MOVABLE);
		if (handle)
			goto compress_again;
		return -ENOMEM;
--
1.8.3.1
[RFC PATCH v3 7/9] mm/zsmalloc: update usage of zone modifiers
From: Huaisheng Ye

Use GFP_NORMAL_UNMOVABLE() to replace masking out (__GFP_HIGHMEM | __GFP_MOVABLE).

___GFP_DMA, ___GFP_HIGHMEM and ___GFP_DMA32 have been deleted from the GFP bitmasks; the bottom three bits of the GFP mask are reserved for storing the encoded zone number. __GFP_ZONE_MOVABLE contains the encoded ZONE_MOVABLE and the __GFP_MOVABLE flag.

With GFP_ZONE_TABLE, __GFP_HIGHMEM ORed with __GFP_MOVABLE meant gfp_zone should return ZONE_MOVABLE. To stay compatible with GFP_ZONE_TABLE, use GFP_NORMAL_UNMOVABLE() to clear the bottom 4 bits of the GFP bitmask.

Signed-off-by: Huaisheng Ye
Cc: Minchan Kim
Cc: Nitin Gupta
Cc: Sergey Senozhatsky
Cc: Christoph Hellwig
---
 mm/zsmalloc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 61cb05d..e250c69 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -345,7 +345,7 @@ static void destroy_cache(struct zs_pool *pool)
 static unsigned long cache_alloc_handle(struct zs_pool *pool, gfp_t gfp)
 {
	return (unsigned long)kmem_cache_alloc(pool->handle_cachep,
-			gfp & ~(__GFP_HIGHMEM|__GFP_MOVABLE));
+			GFP_NORMAL_UNMOVABLE(gfp));
 }

 static void cache_free_handle(struct zs_pool *pool, unsigned long handle)
@@ -356,7 +356,7 @@ static void cache_free_handle(struct zs_pool *pool, unsigned long handle)
 static struct zspage *cache_alloc_zspage(struct zs_pool *pool, gfp_t flags)
 {
	return kmem_cache_alloc(pool->zspage_cachep,
-			flags & ~(__GFP_HIGHMEM|__GFP_MOVABLE));
+			GFP_NORMAL_UNMOVABLE(flags));
 }

 static void cache_free_zspage(struct zs_pool *pool, struct zspage *zspage)
--
1.8.3.1
[RFC PATCH v3 6/9] mm/vmpressure: update usage of zone modifiers
From: Huaisheng Ye

Use __GFP_ZONE_MOVABLE to replace (__GFP_HIGHMEM | __GFP_MOVABLE).

___GFP_DMA, ___GFP_HIGHMEM and ___GFP_DMA32 have been deleted from the GFP bitmasks; the bottom three bits of the GFP mask are reserved for storing the encoded zone number. __GFP_ZONE_MOVABLE contains the encoded ZONE_MOVABLE and the __GFP_MOVABLE flag.

With GFP_ZONE_TABLE, __GFP_HIGHMEM ORed with __GFP_MOVABLE meant gfp_zone should return ZONE_MOVABLE. To stay compatible with GFP_ZONE_TABLE, replace (__GFP_HIGHMEM | __GFP_MOVABLE) with __GFP_ZONE_MOVABLE.

Signed-off-by: Huaisheng Ye
Cc: Andrew Morton
Cc: zhongjiang
Cc: Minchan Kim
Cc: Dan Carpenter
Cc: David Rientjes
Cc: Christoph Hellwig
---
 mm/vmpressure.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/vmpressure.c b/mm/vmpressure.c
index 85350ce..30a40e2 100644
--- a/mm/vmpressure.c
+++ b/mm/vmpressure.c
@@ -256,7 +256,7 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg, bool tree,
	 * Indirect reclaim (kswapd) sets sc->gfp_mask to GFP_KERNEL, so
	 * we account it too.
	 */
-	if (!(gfp & (__GFP_HIGHMEM | __GFP_MOVABLE | __GFP_IO | __GFP_FS)))
+	if (!(gfp & (__GFP_ZONE_MOVABLE | __GFP_IO | __GFP_FS)))
		return;

	/*
--
1.8.3.1
[RFC PATCH v3 3/9] drivers/xen/swiotlb-xen: update usage of zone modifiers
From: Huaisheng Ye

Use GFP_NORMAL() to replace masking out (__GFP_DMA | __GFP_HIGHMEM). In function xen_swiotlb_alloc_coherent, it is obvious that __GFP_DMA32 is not the expected zone type either.

___GFP_DMA, ___GFP_HIGHMEM and ___GFP_DMA32 have been deleted from the GFP bitmasks; the bottom three bits of the GFP mask are reserved for storing the encoded zone number, so __GFP_DMA, __GFP_HIGHMEM and __GFP_DMA32 must no longer be ORed with each other. Use GFP_NORMAL() to clear the bottom 3 bits of the GFP bitmask.

Signed-off-by: Huaisheng Ye
Cc: Konrad Rzeszutek Wilk
Cc: Boris Ostrovsky
Cc: Juergen Gross
Cc: Christoph Hellwig
---
 drivers/xen/swiotlb-xen.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index e1c6089..359 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -301,7 +301,7 @@ int __ref xen_swiotlb_init(int verbose, bool early)
	 * machine physical layout. We can't allocate highmem
	 * because we can't return a pointer to it.
	 */
-	flags &= ~(__GFP_DMA | __GFP_HIGHMEM);
+	flags = GFP_NORMAL(flags);

	/* On ARM this function returns an ioremap'ped virtual address for
	 * which virt_to_phys doesn't return the corresponding physical
--
1.8.3.1
[RFC PATCH v3 0/9] get rid of GFP_ZONE_TABLE/BAD
From: Huaisheng Ye

Changes since v2: [2]
* According to Christoph's suggestion, rebase the patches onto current mainline from v4.16.
* Following Matthew's advice, create macros like GFP_NORMAL and GFP_NORMAL_UNMOVABLE to clear the bottom 3 and 4 bits of the GFP bitmask.
* Delete some patches because of kernel updates.

[2]: https://marc.info/?l=linux-mm&m=152691610014027&w=2

Tested on a Lenovo ThinkSystem server.

Initmem setup node 0 [mem 0x1000-0x00043fff]
[0.00] On node 0 totalpages: 4111666
[0.00] DMA zone: 64 pages used for memmap
[0.00] DMA zone: 23 pages reserved
[0.00] DMA zone: 3999 pages, LIFO batch:0
[0.00] mminit::memmap_init Initialising map node 0 zone 0 pfns 1 -> 4096
[0.00] DMA32 zone: 10935 pages used for memmap
[0.00] DMA32 zone: 699795 pages, LIFO batch:31
[0.00] mminit::memmap_init Initialising map node 0 zone 1 pfns 4096 -> 1048576
[0.00] Normal zone: 53248 pages used for memmap
[0.00] Normal zone: 3407872 pages, LIFO batch:31
[0.00] mminit::memmap_init Initialising map node 0 zone 2 pfns 1048576 -> 4456448
[0.00] mminit::memmap_init Initialising map node 0 zone 3 pfns 1 -> 4456448
[0.00] Initmem setup node 1 [mem 0x00238000-0x00277fff]
[0.00] On node 1 totalpages: 4194304
[0.00] Normal zone: 65536 pages used for memmap
[0.00] Normal zone: 4194304 pages, LIFO batch:31
[0.00] mminit::memmap_init Initialising map node 1 zone 2 pfns 37224448 -> 41418752
[0.00] mminit::memmap_init Initialising map node 1 zone 3 pfns 37224448 -> 41418752
...
[0.00] mminit::zonelist general 0:DMA = 0:DMA
[0.00] mminit::zonelist general 0:DMA32 = 0:DMA32 0:DMA
[0.00] mminit::zonelist general 0:Normal = 0:Normal 0:DMA32 0:DMA 1:Normal
[0.00] mminit::zonelist thisnode 0:DMA = 0:DMA
[0.00] mminit::zonelist thisnode 0:DMA32 = 0:DMA32 0:DMA
[0.00] mminit::zonelist thisnode 0:Normal = 0:Normal 0:DMA32 0:DMA
[0.00] mminit::zonelist general 1:Normal = 1:Normal 0:Normal 0:DMA32 0:DMA
[0.00] mminit::zonelist thisnode 1:Normal = 1:Normal
[0.00] Built 2 zonelists, mobility grouping on. Total pages: 8176164
[0.00] Policy zone: Normal
[0.00] Kernel command line: BOOT_IMAGE=/vmlinuz-4.17.0-rc6-gfp09+ root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/root rd.lvm.lv=fedora/swap debug LANG=en_US.UTF-8 mminit_loglevel=4 console=tty0 console=ttyS0,115200n8 memblock=debug earlyprintk=serial,0x3f8,115200

---

Replace GFP_ZONE_TABLE and GFP_ZONE_BAD with an encoded zone number.

Delete ___GFP_DMA, ___GFP_HIGHMEM and ___GFP_DMA32 from the GFP bitmasks; the bottom three bits of the GFP mask are reserved for storing the encoded zone number.

The encoding method is XOR. Take the zone number from enum zone_type, then encode it by XORing with ZONE_NORMAL. The goal is to make sure ZONE_NORMAL encodes to zero, so compatibility is guaranteed: GFP_KERNEL and GFP_ATOMIC can be used as before.

Reserve __GFP_MOVABLE in bit 3 so that it can continue to be used as a flag. As before, __GFP_MOVABLE represents the movable migrate type for ZONE_DMA, ZONE_DMA32 and ZONE_NORMAL; but when it is enabled together with __GFP_HIGHMEM, ZONE_MOVABLE shall be returned instead of ZONE_HIGHMEM. __GFP_ZONE_MOVABLE is created to realize this.

With this patch, just enabling __GFP_MOVABLE and __GFP_HIGHMEM is no longer enough to get ZONE_MOVABLE from gfp_zone; callers should use GFP_HIGHUSER_MOVABLE or __GFP_ZONE_MOVABLE directly to achieve that.

Decode the zone number directly from the bottom three bits of the flags in gfp_zone. The theory of encoding and decoding is: A ^ B ^ B = A.

Changes since v1: [1]
* Create __GFP_ZONE_MOVABLE and modify GFP_HIGHUSER_MOVABLE to help callers get ZONE_MOVABLE. Create __GFP_ZONE_MASK to mask the lowest 3 bits of the GFP bitmask.
* Modify some callers' gfp flags to update usage of the address zone modifiers.
* Modify the inline function gfp_zone for better performance, per Matthew's suggestion.

[1]: https://marc.info/?l=linux-mm&m=152596791931266&w=2

---

Huaisheng Ye (9):
  include/linux/gfp.h: get rid of GFP_ZONE_TABLE/BAD
  include/linux/dma-mapping: update usage of zone modifiers
  drivers/xen/swiotlb-xen: update usage of zone modifiers
  fs/btrfs/extent_io: update usage of zone modifiers
  drivers/block/zram/zram_drv: update usage of zone modifiers
  mm/vmpressure: update usage of zone modifiers
  mm/zsmalloc: update usage of zone modifiers
  include/linux/highmem.h: update usage of movableflags
  arch/x86/include/asm/page.h: update usage of movableflags

 arch/x86/include/asm/page.h   | 3 +-
 drivers/block/zram/zram_drv.c | 6 +--
 drivers/xen/swiotlb-xen.c     | 2 +-
 fs/btrfs/extent_io.c          | 2 +-
 include/linux/dma-mapping.h   | 2 +-
 include/linux/gfp.h           | 107
Re: [PATCH 1/1] iommu/dma: fix trival coding style mistake
On 23/05/18 07:02, Zhen Lei wrote:
> No functional changes.

What's the mistake?

> Signed-off-by: Zhen Lei
> ---
>  drivers/iommu/dma-iommu.c | 12 +++-
>  1 file changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index ddcbbdb..4e885f7 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -231,6 +231,9 @@ static int iova_reserve_iommu_regions(struct device *dev,
>  	LIST_HEAD(resv_regions);
>  	int ret = 0;
>
> +	if (!dev)
> +		return 0;

Logically, it makes no sense at all to call this function without a valid device; doing the check in init_domain was a deliberate decision to reflect that. This isn't a cleanup path shared by multiple callers where the "accept NULL for simplicity" argument might apply.

> +
>  	if (dev_is_pci(dev))
>  		iova_reserve_pci_windows(to_pci_dev(dev), iovad);
>
> @@ -246,11 +249,12 @@ static int iova_reserve_iommu_regions(struct device *dev,
>  		hi = iova_pfn(iovad, region->start + region->length - 1);
>  		reserve_iova(iovad, lo, hi);
>
> -		if (region->type == IOMMU_RESV_MSI)
> +		if (region->type == IOMMU_RESV_MSI) {
>  			ret = cookie_init_hw_msi_region(cookie, region->start,
>  					region->start + region->length);
> -		if (ret)
> -			break;
> +			if (ret)
> +				break;
> +		}

Why? ret is already initialised appropriately, and the coding style even says that going beyond 3 levels of indentation is undesirable...

Robin.

>  	}
>
>  	iommu_put_resv_regions(dev, &resv_regions);
>
> @@ -308,8 +312,6 @@ int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
>  	}
>
>  	init_iova_domain(iovad, 1UL << order, base_pfn);
>
> -	if (!dev)
> -		return 0;
>
>  	return iova_reserve_iommu_regions(dev, domain);
>  }
> --
> 1.8.3
Re: [PATCH v2 13/40] vfio: Add support for Shared Virtual Addressing
Hi,

On 2018/5/12 3:06, Jean-Philippe Brucker wrote:
> Add two new ioctls for VFIO containers. VFIO_IOMMU_BIND_PROCESS creates a
> bond between a container and a process address space, identified by a
> Process Address Space ID (PASID). Devices in the container append this
> PASID to DMA transactions in order to access the process' address space.
> The process page tables are shared with the IOMMU, and mechanisms such as
> PCI ATS/PRI are used to handle faults. VFIO_IOMMU_UNBIND_PROCESS removes
> a bond created with VFIO_IOMMU_BIND_PROCESS.
>
> Signed-off-by: Jean-Philippe Brucker
>
> +static int vfio_iommu_bind_group(struct vfio_iommu *iommu,
> +				 struct vfio_group *group,
> +				 struct vfio_mm *vfio_mm)
> +{
> +	int ret;
> +	bool enabled_sva = false;
> +	struct vfio_iommu_sva_bind_data data = {
> +		.vfio_mm	= vfio_mm,
> +		.iommu		= iommu,
> +		.count		= 0,
> +	};
> +
> +	if (!group->sva_enabled) {
> +		ret = iommu_group_for_each_dev(group->iommu_group, NULL,
> +					       vfio_iommu_sva_init);

Do we need to do *sva_init here, or do anything to avoid repeated initialisation? If another process has already done the initialisation on this device, I think the current process will get an EEXIST.

Thanks.

> +		if (ret)
> +			return ret;
> +
> +		group->sva_enabled = enabled_sva = true;
> +	}
> +
> +	ret = iommu_group_for_each_dev(group->iommu_group, &data,
> +				       vfio_iommu_sva_bind_dev);
> +	if (ret && data.count > 1)
> +		iommu_group_for_each_dev(group->iommu_group, vfio_mm,
> +					 vfio_iommu_sva_unbind_dev);
> +	if (ret && enabled_sva) {
> +		iommu_group_for_each_dev(group->iommu_group, NULL,
> +					 vfio_iommu_sva_shutdown);
> +		group->sva_enabled = false;
> +	}
> +
> +	return ret;
> +}
>
> @@ -1442,6 +1636,10 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>  	if (ret)
>  		goto out_detach;
>
> +	ret = vfio_iommu_replay_bind(iommu, group);
> +	if (ret)
> +		goto out_detach;
> +
>  	if (resv_msi) {
>  		ret = iommu_get_msi_cookie(domain->domain, resv_msi_base);
>  		if (ret)
>
> @@ -1547,6 +1745,11 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
>  			continue;
>
>  		iommu_detach_group(domain->domain, iommu_group);
> +		if (group->sva_enabled) {
> +			iommu_group_for_each_dev(iommu_group, NULL,
> +						 vfio_iommu_sva_shutdown);
> +			group->sva_enabled = false;
> +		}

Here, why shut down? If another process is still working on the device, there may be a crash?

Thanks.

>  		list_del(&group->next);
>  		kfree(group);
>  		/*
> @@ -1562,6 +1765,7 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
[PATCH 1/1] iommu/dma: fix trival coding style mistake
No functional changes.

Signed-off-by: Zhen Lei
---
 drivers/iommu/dma-iommu.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index ddcbbdb..4e885f7 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -231,6 +231,9 @@ static int iova_reserve_iommu_regions(struct device *dev,
 	LIST_HEAD(resv_regions);
 	int ret = 0;

+	if (!dev)
+		return 0;
+
 	if (dev_is_pci(dev))
 		iova_reserve_pci_windows(to_pci_dev(dev), iovad);

@@ -246,11 +249,12 @@ static int iova_reserve_iommu_regions(struct device *dev,
 		hi = iova_pfn(iovad, region->start + region->length - 1);
 		reserve_iova(iovad, lo, hi);

-		if (region->type == IOMMU_RESV_MSI)
+		if (region->type == IOMMU_RESV_MSI) {
 			ret = cookie_init_hw_msi_region(cookie, region->start,
 					region->start + region->length);
-		if (ret)
-			break;
+			if (ret)
+				break;
+		}
 	}

 	iommu_put_resv_regions(dev, &resv_regions);

@@ -308,8 +312,6 @@ int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
 	}

 	init_iova_domain(iovad, 1UL << order, base_pfn);

-	if (!dev)
-		return 0;

 	return iova_reserve_iommu_regions(dev, domain);
 }
--
1.8.3