Re: [RFC PATCH v2 00/12] get rid of GFP_ZONE_TABLE/BAD

2018-05-23 Thread Matthew Wilcox
On Tue, May 22, 2018 at 08:37:28PM +0200, Michal Hocko wrote:
> So why is this any better than the current code. Sure I am not a great
> fan of GFP_ZONE_TABLE because of how it is incomprehensible but this
> doesn't look too much better, yet we are losing a check for incompatible
> gfp flags. The diffstat looks really sound but then you just look and
> see that the large part is the comment that at least explained the gfp
> zone modifiers somehow and the debugging code. So what is the selling
> point?

I have a plan, but it's not exactly fully-formed yet.

One of the big problems we have today is that we have a lot of users
who have constraints on the physical memory they want to allocate,
but we have very limited abilities to provide them with what they're
asking for.  The various different ZONEs have different meanings on
different architectures and are generally a mess.

If we had eight ZONEs, we could offer:

ZONE_16M       // 24 bit
ZONE_256M      // 28 bit
ZONE_LOWMEM    // CONFIG_32BIT only
ZONE_4G        // 32 bit
ZONE_64G       // 36 bit
ZONE_1T        // 40 bit
ZONE_ALL       // everything larger
ZONE_MOVABLE   // movable allocations; no physical address guarantees

#ifdef CONFIG_64BIT
#define ZONE_NORMAL ZONE_ALL
#else
#define ZONE_NORMAL ZONE_LOWMEM
#endif

This would cover most driver DMA mask allocations; we could tweak the
offered zones based on analysis of what people need.

#define GFP_HIGHUSER           (GFP_USER | ZONE_ALL)
#define GFP_HIGHUSER_MOVABLE   (GFP_USER | ZONE_MOVABLE)

One other thing I want to see is that fallback from zones happens from
highest to lowest normally (i.e. if you fail to allocate in 1T, then you
try to allocate from 64G), but movable allocations happen from lowest
to highest.  So ZONE_16M ends up full of page cache pages which are
readily evictable for the rare occasions when we need to allocate memory
below 16MB.

I'm sure there are lots of good reasons why this won't work, which is
why I've been hesitant to propose it before now.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/ipmmu-vmsa: Document R-Car V3H and E3 IPMMU DT bindings

2018-05-23 Thread Rob Herring
On Mon, May 21, 2018 at 11:41:33PM +0900, Magnus Damm wrote:
> From: Magnus Damm 
> 
> Update the IPMMU DT binding documentation to include the compat strings
> for the IPMMU devices included in the R-Car V3H and E3 SoCs.
> 
> Signed-off-by: Magnus Damm 
> ---
> 
>  Developed on top of renesas-drivers-2018-05-15-v4.17-rc5
> 
>  Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.txt |2 ++
>  1 file changed, 2 insertions(+)

Acked-by: Rob Herring 



[RFC PATCH v3 5/9] drivers/block/zram/zram_drv: update usage of zone modifiers

2018-05-23 Thread Huaisheng Ye
From: Huaisheng Ye 

Use __GFP_ZONE_MOVABLE to replace (__GFP_HIGHMEM | __GFP_MOVABLE).

___GFP_DMA, ___GFP_HIGHMEM and ___GFP_DMA32 have been deleted from the GFP
bitmasks; the bottom three bits of the GFP mask are reserved for storing
the encoded zone number.

__GFP_ZONE_MOVABLE contains the encoded ZONE_MOVABLE and the __GFP_MOVABLE
flag.

With GFP_ZONE_TABLE, ORing __GFP_HIGHMEM with __GFP_MOVABLE means gfp_zone()
should return ZONE_MOVABLE. To stay compatible with that behaviour,
replace (__GFP_HIGHMEM | __GFP_MOVABLE) with __GFP_ZONE_MOVABLE.

Signed-off-by: Huaisheng Ye 
Cc: Minchan Kim 
Cc: Nitin Gupta 
Cc: Sergey Senozhatsky 
Cc: Christoph Hellwig 
---
 drivers/block/zram/zram_drv.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 0f3fadd..1bb5ca8 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1004,14 +1004,12 @@ static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec,
handle = zs_malloc(zram->mem_pool, comp_len,
__GFP_KSWAPD_RECLAIM |
__GFP_NOWARN |
-   __GFP_HIGHMEM |
-   __GFP_MOVABLE);
+   __GFP_ZONE_MOVABLE);
if (!handle) {
zcomp_stream_put(zram->comp);
atomic64_inc(&zram->stats.writestall);
handle = zs_malloc(zram->mem_pool, comp_len,
-   GFP_NOIO | __GFP_HIGHMEM |
-   __GFP_MOVABLE);
+   GFP_NOIO | __GFP_ZONE_MOVABLE);
if (handle)
goto compress_again;
return -ENOMEM;
-- 
1.8.3.1



[RFC PATCH v3 9/9] arch/x86/include/asm/page.h: update usage of movableflags

2018-05-23 Thread Huaisheng Ye
From: Huaisheng Ye 

GFP_HIGHUSER_MOVABLE is no longer equal to GFP_HIGHUSER | __GFP_MOVABLE;
update this caller to match the patch that gets rid of GFP_ZONE_TABLE/BAD.

Signed-off-by: Huaisheng Ye 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: Kate Stewart 
Cc: Greg Kroah-Hartman 
Cc: x...@kernel.org 
Cc: Philippe Ombredanne 
Cc: Christoph Hellwig 
---
 arch/x86/include/asm/page.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/page.h b/arch/x86/include/asm/page.h
index 7555b48..a47f42d 100644
--- a/arch/x86/include/asm/page.h
+++ b/arch/x86/include/asm/page.h
@@ -35,7 +35,8 @@ static inline void copy_user_page(void *to, void *from, unsigned long vaddr,
 }
 
 #define __alloc_zeroed_user_highpage(movableflags, vma, vaddr) \
-   alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO | movableflags, vma, vaddr)
+   alloc_page_vma((movableflags ? GFP_HIGHUSER_MOVABLE : GFP_HIGHUSER) \
+   | __GFP_ZERO, vma, vaddr)
 #define __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE
 
 #ifndef __pa
-- 
1.8.3.1



[RFC PATCH v3 8/9] include/linux/highmem.h: update usage of movableflags

2018-05-23 Thread Huaisheng Ye
From: Huaisheng Ye 

GFP_HIGHUSER_MOVABLE is no longer equal to GFP_HIGHUSER | __GFP_MOVABLE;
update this caller to match the patch that gets rid of GFP_ZONE_TABLE/BAD.

Signed-off-by: Huaisheng Ye 
Cc: Kate Stewart 
Cc: Greg Kroah-Hartman 
Cc: Thomas Gleixner 
Cc: Philippe Ombredanne 
Cc: Christoph Hellwig 
---
 include/linux/highmem.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index 0690679..5383c9e 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -159,8 +159,8 @@ static inline void clear_user_highpage(struct page *page, unsigned long vaddr)
struct vm_area_struct *vma,
unsigned long vaddr)
 {
-   struct page *page = alloc_page_vma(GFP_HIGHUSER | movableflags,
-   vma, vaddr);
+   struct page *page = alloc_page_vma(movableflags ?
+   GFP_HIGHUSER_MOVABLE : GFP_HIGHUSER, vma, vaddr);
 
if (page)
clear_user_highpage(page, vaddr);
-- 
1.8.3.1



[RFC PATCH v3 7/9] mm/zsmalloc: update usage of zone modifiers

2018-05-23 Thread Huaisheng Ye
From: Huaisheng Ye 

Use __GFP_ZONE_MOVABLE to replace (__GFP_HIGHMEM | __GFP_MOVABLE).

___GFP_DMA, ___GFP_HIGHMEM and ___GFP_DMA32 have been deleted from the GFP
bitmasks; the bottom three bits of the GFP mask are reserved for storing
the encoded zone number.

__GFP_ZONE_MOVABLE contains the encoded ZONE_MOVABLE and the __GFP_MOVABLE
flag.

With GFP_ZONE_TABLE, ORing __GFP_HIGHMEM with __GFP_MOVABLE means gfp_zone()
should return ZONE_MOVABLE. To stay compatible with that behaviour, use
GFP_NORMAL_UNMOVABLE() to clear the bottom 4 bits of the GFP bitmask.

Signed-off-by: Huaisheng Ye 
Cc: Minchan Kim 
Cc: Nitin Gupta 
Cc: Sergey Senozhatsky 
Cc: Christoph Hellwig 
---
 mm/zsmalloc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 61cb05d..e250c69 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -345,7 +345,7 @@ static void destroy_cache(struct zs_pool *pool)
 static unsigned long cache_alloc_handle(struct zs_pool *pool, gfp_t gfp)
 {
return (unsigned long)kmem_cache_alloc(pool->handle_cachep,
-   gfp & ~(__GFP_HIGHMEM|__GFP_MOVABLE));
+   GFP_NORMAL_UNMOVABLE(gfp));
 }
 
 static void cache_free_handle(struct zs_pool *pool, unsigned long handle)
@@ -356,7 +356,7 @@ static void cache_free_handle(struct zs_pool *pool, unsigned long handle)
 static struct zspage *cache_alloc_zspage(struct zs_pool *pool, gfp_t flags)
 {
return kmem_cache_alloc(pool->zspage_cachep,
-   flags & ~(__GFP_HIGHMEM|__GFP_MOVABLE));
+   GFP_NORMAL_UNMOVABLE(flags));
 }
 
 static void cache_free_zspage(struct zs_pool *pool, struct zspage *zspage)
-- 
1.8.3.1



[RFC PATCH v3 4/9] fs/btrfs/extent_io: update usage of zone modifiers

2018-05-23 Thread Huaisheng Ye
From: Huaisheng Ye 

Use __GFP_ZONE_MASK to replace (__GFP_DMA32 | __GFP_HIGHMEM).

In alloc_extent_state(), __GFP_DMA is clearly not the expected zone
type.

___GFP_DMA, ___GFP_HIGHMEM and ___GFP_DMA32 have been deleted from the GFP
bitmasks; the bottom three bits of the GFP mask are reserved for storing
the encoded zone number.
__GFP_DMA, __GFP_HIGHMEM and __GFP_DMA32 should no longer be combined
with each other by OR.

Use GFP_NORMAL() to clear the bottom 3 bits of the GFP bitmask.

Signed-off-by: Huaisheng Ye 
Cc: Chris Mason 
Cc: Josef Bacik 
Cc: David Sterba 
Cc: Christoph Hellwig 
---
 fs/btrfs/extent_io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index e99b329..f41fc61 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -220,7 +220,7 @@ static struct extent_state *alloc_extent_state(gfp_t mask)
 * The given mask might be not appropriate for the slab allocator,
 * drop the unsupported bits
 */
-   mask &= ~(__GFP_DMA32|__GFP_HIGHMEM);
+   mask = GFP_NORMAL(mask);
state = kmem_cache_alloc(extent_state_cache, mask);
if (!state)
return state;
-- 
1.8.3.1



[RFC PATCH v3 3/9] drivers/xen/swiotlb-xen: update usage of zone modifiers

2018-05-23 Thread Huaisheng Ye
From: Huaisheng Ye 

Use __GFP_ZONE_MASK to replace (__GFP_DMA | __GFP_HIGHMEM).

In xen_swiotlb_alloc_coherent(), __GFP_DMA32 is clearly not the expected
zone type.

___GFP_DMA, ___GFP_HIGHMEM and ___GFP_DMA32 have been deleted from the GFP
bitmasks; the bottom three bits of the GFP mask are reserved for storing
the encoded zone number.
__GFP_DMA, __GFP_HIGHMEM and __GFP_DMA32 should no longer be combined
with each other by OR.

Use GFP_NORMAL() to clear the bottom 3 bits of the GFP bitmask.

Signed-off-by: Huaisheng Ye 
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 
Cc: Juergen Gross 
Cc: Christoph Hellwig 
---
 drivers/xen/swiotlb-xen.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index e1c6089..359 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -301,7 +301,7 @@ int __ref xen_swiotlb_init(int verbose, bool early)
* machine physical layout.  We can't allocate highmem
* because we can't return a pointer to it.
*/
-   flags &= ~(__GFP_DMA | __GFP_HIGHMEM);
+   flags = GFP_NORMAL(flags);
 
/* On ARM this function returns an ioremap'ped virtual address for
 * which virt_to_phys doesn't return the corresponding physical
-- 
1.8.3.1



[RFC PATCH v3 2/9] include/linux/dma-mapping: update usage of zone modifiers

2018-05-23 Thread Huaisheng Ye
From: Huaisheng Ye 

Use __GFP_ZONE_MASK to replace (__GFP_DMA | __GFP_HIGHMEM | __GFP_DMA32).

___GFP_DMA, ___GFP_HIGHMEM and ___GFP_DMA32 have been deleted from the GFP
bitmasks; the bottom three bits of the GFP mask are reserved for storing
the encoded zone number.
__GFP_DMA, __GFP_HIGHMEM and __GFP_DMA32 should no longer be combined
with each other by OR.

Use GFP_NORMAL() to clear the bottom 3 bits of the GFP bitmask.

Signed-off-by: Huaisheng Ye 
Cc: Christoph Hellwig 
Cc: Marek Szyprowski 
Cc: Robin Murphy 
Cc: Christoph Hellwig 
---
 include/linux/dma-mapping.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index f8ab1c0..8fe524d 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -519,7 +519,7 @@ static inline void *dma_alloc_attrs(struct device *dev, size_t size,
return cpu_addr;
 
/* let the implementation decide on the zone to allocate from: */
-   flag &= ~(__GFP_DMA | __GFP_DMA32 | __GFP_HIGHMEM);
+   flag = GFP_NORMAL(flag);
 
if (!arch_dma_alloc_attrs(&dev, &flag))
return NULL;
-- 
1.8.3.1



[RFC PATCH v3 1/9] include/linux/gfp.h: get rid of GFP_ZONE_TABLE/BAD

2018-05-23 Thread Huaisheng Ye
From: Huaisheng Ye 

Replace GFP_ZONE_TABLE and GFP_ZONE_BAD with an encoded zone number.

Delete ___GFP_DMA, ___GFP_HIGHMEM and ___GFP_DMA32 from the GFP bitmasks;
the bottom three bits of the GFP mask are reserved for storing the
encoded zone number.

The encoding method is XOR: take the zone number from enum zone_type and
XOR it with ZONE_NORMAL. This makes ZONE_NORMAL encode to zero, which
keeps flags that carry no zone modifier, such as GFP_KERNEL and
GFP_ATOMIC, working as before.

Reserve __GFP_MOVABLE in bit 3 so that it can continue to be used as a
flag. As before, __GFP_MOVABLE represents the movable migrate type for
ZONE_DMA, ZONE_DMA32 and ZONE_NORMAL. But when it is combined with
__GFP_HIGHMEM, ZONE_MOVABLE must be returned instead of ZONE_HIGHMEM;
__GFP_ZONE_MOVABLE is created for this.

With this patch, setting both __GFP_MOVABLE and __GFP_HIGHMEM is no
longer enough to get ZONE_MOVABLE from gfp_zone(). All subsystems should
use GFP_HIGHUSER_MOVABLE directly to achieve that.

gfp_zone() decodes the zone number directly from the bottom three bits
of the flags. Encoding and decoding rely on the identity:
A ^ B ^ B = A

Suggested-by: Matthew Wilcox 
Signed-off-by: Huaisheng Ye 
Cc: Andrew Morton 
Cc: Vlastimil Babka 
Cc: Michal Hocko 
Cc: Mel Gorman 
Cc: Kate Stewart 
Cc: "Levin, Alexander (Sasha Levin)" 
Cc: Greg Kroah-Hartman 
Cc: Christoph Hellwig 
---
 include/linux/gfp.h | 107 ++--
 1 file changed, 20 insertions(+), 87 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 1a4582b..f76ccd76 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -16,9 +16,7 @@
  */
 
 /* Plain integer GFP bitmasks. Do not use this directly. */
-#define ___GFP_DMA 0x01u
-#define ___GFP_HIGHMEM 0x02u
-#define ___GFP_DMA32   0x04u
+#define ___GFP_ZONE_MASK   0x07u
 #define ___GFP_MOVABLE 0x08u
 #define ___GFP_RECLAIMABLE 0x10u
 #define ___GFP_HIGH    0x20u
@@ -53,11 +51,15 @@
  * without the underscores and use them consistently. The definitions here may
  * be used in bit comparisons.
  */
-#define __GFP_DMA  ((__force gfp_t)___GFP_DMA)
-#define __GFP_HIGHMEM  ((__force gfp_t)___GFP_HIGHMEM)
-#define __GFP_DMA32    ((__force gfp_t)___GFP_DMA32)
+#define __GFP_DMA  ((__force gfp_t)OPT_ZONE_DMA ^ ZONE_NORMAL)
+#define __GFP_HIGHMEM  ((__force gfp_t)OPT_ZONE_HIGHMEM ^ ZONE_NORMAL)
+#define __GFP_DMA32    ((__force gfp_t)OPT_ZONE_DMA32 ^ ZONE_NORMAL)
 #define __GFP_MOVABLE  ((__force gfp_t)___GFP_MOVABLE)  /* ZONE_MOVABLE allowed */
-#define GFP_ZONEMASK   (__GFP_DMA|__GFP_HIGHMEM|__GFP_DMA32|__GFP_MOVABLE)
+#define GFP_ZONEMASK   ((__force gfp_t)___GFP_ZONE_MASK | ___GFP_MOVABLE)
+/* bottom 3 bits of GFP bitmasks are used for zone number encoded*/
+#define __GFP_ZONE_MASK ((__force gfp_t)___GFP_ZONE_MASK)
+#define __GFP_ZONE_MOVABLE \
+   ((__force gfp_t)(ZONE_MOVABLE ^ ZONE_NORMAL) | ___GFP_MOVABLE)
 
 /*
  * Page mobility and placement hints
@@ -268,6 +270,13 @@
  *   available and will not wake kswapd/kcompactd on failure. The _LIGHT
  *   version does not attempt reclaim/compaction at all and is by default used
  *   in page fault path, while the non-light is used by khugepaged.
+ *
+ * GFP_NORMAL() clears the bottom 3 bits of the GFP bitmask, so the result
+ *   carries the encoded ZONE_NORMAL bits.
+ *
+ * GFP_NORMAL_UNMOVABLE() is similar to GFP_NORMAL(), but it clears the bottom
+ *   4 bits of the GFP bitmask: besides leaving the encoded ZONE_NORMAL bits,
+ *   it clears the MOVABLE flag as well.
  */
 #define GFP_ATOMIC (__GFP_HIGH|__GFP_ATOMIC|__GFP_KSWAPD_RECLAIM)
 #define GFP_KERNEL (__GFP_RECLAIM | __GFP_IO | __GFP_FS)
@@ -279,10 +288,12 @@
 #define GFP_DMA        __GFP_DMA
 #define GFP_DMA32  __GFP_DMA32
 #define GFP_HIGHUSER   (GFP_USER | __GFP_HIGHMEM)
-#define GFP_HIGHUSER_MOVABLE   (GFP_HIGHUSER | __GFP_MOVABLE)
+#define GFP_HIGHUSER_MOVABLE   (GFP_USER | __GFP_ZONE_MOVABLE)
 #define GFP_TRANSHUGE_LIGHT    ((GFP_HIGHUSER_MOVABLE | __GFP_COMP | \
 __GFP_NOMEMALLOC | __GFP_NOWARN) & ~__GFP_RECLAIM)
 #define GFP_TRANSHUGE  (GFP_TRANSHUGE_LIGHT | __GFP_DIRECT_RECLAIM)
+#define GFP_NORMAL(gfp)    ((gfp) & ~__GFP_ZONE_MASK)
+#define GFP_NORMAL_UNMOVABLE(gfp) ((gfp) & ~GFP_ZONEMASK)
 
 /* Convert GFP flags to their corresponding migrate type */
 #define GFP_MOVABLE_MASK (__GFP_RECLAIMABLE|__GFP_MOVABLE)
@@ -326,87 +337,9 @@ static inline bool gfpflags_allow_blocking(const gfp_t gfp_flags)
 #define OPT_ZONE_DMA32 ZONE_NORMAL
 #endif
 
-/*
- * GFP_ZONE_TABLE is a word size bitstring that is used for looking up the
- * zone to use given the lowest 4 bits of 

[RFC PATCH v3 0/9] get rid of GFP_ZONE_TABLE/BAD

2018-05-23 Thread Huaisheng Ye
From: Huaisheng Ye 

Changes since v2: [2]
* Following Christoph's suggestion, rebase the patches from v4.16 onto
  current mainline.

* Following Matthew's advice, create the macros GFP_NORMAL and
  GFP_NORMAL_UNMOVABLE to clear the bottom 3 and 4 bits of the GFP
  bitmask, respectively.

* Drop some patches that are no longer needed after the kernel update.

[2]: https://marc.info/?l=linux-mm&m=152691610014027&w=2

Tested on a Lenovo ThinkSystem server.

Initmem setup node 0 [mem 0x1000-0x00043fff]
[0.00] On node 0 totalpages: 4111666
[0.00]   DMA zone: 64 pages used for memmap
[0.00]   DMA zone: 23 pages reserved
[0.00]   DMA zone: 3999 pages, LIFO batch:0
[0.00] mminit::memmap_init Initialising map node 0 zone 0 pfns 1 -> 4096
[0.00]   DMA32 zone: 10935 pages used for memmap
[0.00]   DMA32 zone: 699795 pages, LIFO batch:31
[0.00] mminit::memmap_init Initialising map node 0 zone 1 pfns 4096 -> 1048576
[0.00]   Normal zone: 53248 pages used for memmap
[0.00]   Normal zone: 3407872 pages, LIFO batch:31
[0.00] mminit::memmap_init Initialising map node 0 zone 2 pfns 1048576 -> 4456448
[0.00] mminit::memmap_init Initialising map node 0 zone 3 pfns 1 -> 4456448
[0.00] Initmem setup node 1 [mem 0x00238000-0x00277fff]
[0.00] On node 1 totalpages: 4194304
[0.00]   Normal zone: 65536 pages used for memmap
[0.00]   Normal zone: 4194304 pages, LIFO batch:31
[0.00] mminit::memmap_init Initialising map node 1 zone 2 pfns 37224448 -> 41418752
[0.00] mminit::memmap_init Initialising map node 1 zone 3 pfns 37224448 -> 41418752
...
[0.00] mminit::zonelist general 0:DMA = 0:DMA
[0.00] mminit::zonelist general 0:DMA32 = 0:DMA32 0:DMA
[0.00] mminit::zonelist general 0:Normal = 0:Normal 0:DMA32 0:DMA 1:Normal
[0.00] mminit::zonelist thisnode 0:DMA = 0:DMA
[0.00] mminit::zonelist thisnode 0:DMA32 = 0:DMA32 0:DMA
[0.00] mminit::zonelist thisnode 0:Normal = 0:Normal 0:DMA32 0:DMA
[0.00] mminit::zonelist general 1:Normal = 1:Normal 0:Normal 0:DMA32 0:DMA
[0.00] mminit::zonelist thisnode 1:Normal = 1:Normal
[0.00] Built 2 zonelists, mobility grouping on.  Total pages: 8176164
[0.00] Policy zone: Normal
[0.00] Kernel command line: BOOT_IMAGE=/vmlinuz-4.17.0-rc6-gfp09+ root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/root rd.lvm.lv=fedora/swap debug LANG=en_US.UTF-8 mminit_loglevel=4 console=tty0 console=ttyS0,115200n8 memblock=debug earlyprintk=serial,0x3f8,115200

---

Replace GFP_ZONE_TABLE and GFP_ZONE_BAD with an encoded zone number.

Delete ___GFP_DMA, ___GFP_HIGHMEM and ___GFP_DMA32 from the GFP bitmasks;
the bottom three bits of the GFP mask are reserved for storing the
encoded zone number.

The encoding method is XOR: take the zone number from enum zone_type and
XOR it with ZONE_NORMAL. This makes ZONE_NORMAL encode to zero, which
keeps flags that carry no zone modifier, such as GFP_KERNEL and
GFP_ATOMIC, working as before.

Reserve __GFP_MOVABLE in bit 3 so that it can continue to be used as a
flag. As before, __GFP_MOVABLE represents the movable migrate type for
ZONE_DMA, ZONE_DMA32 and ZONE_NORMAL. But when it is combined with
__GFP_HIGHMEM, ZONE_MOVABLE must be returned instead of ZONE_HIGHMEM;
__GFP_ZONE_MOVABLE is created for this.

With this patch, setting both __GFP_MOVABLE and __GFP_HIGHMEM is no
longer enough to get ZONE_MOVABLE from gfp_zone(). All callers should
use GFP_HIGHUSER_MOVABLE or __GFP_ZONE_MOVABLE directly to achieve that.

gfp_zone() decodes the zone number directly from the bottom three bits
of the flags. Encoding and decoding rely on the identity:
A ^ B ^ B = A

Changes since v1:[1]

* Create __GFP_ZONE_MOVABLE and modify GFP_HIGHUSER_MOVABLE to help
  callers get ZONE_MOVABLE. Try to create __GFP_ZONE_MASK to mask the
  lowest 3 bits of the GFP bitmasks.

* Modify some callers' gfp flags to update their usage of the address
  zone modifiers.

* Modify the inline function gfp_zone() for better performance,
  following Matthew's suggestion.

[1]: https://marc.info/?l=linux-mm&m=152596791931266&w=2

---

Huaisheng Ye (9):
  include/linux/gfp.h: get rid of GFP_ZONE_TABLE/BAD
  include/linux/dma-mapping: update usage of zone modifiers
  drivers/xen/swiotlb-xen: update usage of zone modifiers
  fs/btrfs/extent_io: update usage of zone modifiers
  drivers/block/zram/zram_drv: update usage of zone modifiers
  mm/vmpressure: update usage of zone modifiers
  mm/zsmalloc: update usage of zone modifiers
  include/linux/highmem.h: update usage of movableflags
  arch/x86/include/asm/page.h: update usage of movableflags

 arch/x86/include/asm/page.h   |   3 +-
 drivers/block/zram/zram_drv.c |   6 +--
 drivers/xen/swiotlb-xen.c     |   2 +-
 fs/btrfs/extent_io.c          |   2 +-
 include/linux/dma-mapping.h   |   2 +-
 include/linux/gfp.h           | 107 

RE: [External] Re: [RFC PATCH v2 00/12] get rid of GFP_ZONE_TABLE/BAD

2018-05-23 Thread Huaisheng HS1 Ye
From: Michal Hocko [mailto:mho...@kernel.org]
Sent: Wednesday, May 23, 2018 2:37 AM
> 
> On Mon 21-05-18 23:20:21, Huaisheng Ye wrote:
> > From: Huaisheng Ye 
> >
> > Replace GFP_ZONE_TABLE and GFP_ZONE_BAD with encoded zone number.
> >
> > Delete ___GFP_DMA, ___GFP_HIGHMEM and ___GFP_DMA32 from GFP bitmasks,
> > the bottom three bits of GFP mask is reserved for storing encoded
> > zone number.
> >
> > The encoding method is XOR. Get zone number from enum zone_type,
> > then encode the number with ZONE_NORMAL by XOR operation.
> > The goal is to make sure ZONE_NORMAL can be encoded to zero. So,
> > the compatibility can be guaranteed, such as GFP_KERNEL and GFP_ATOMIC
> > can be used as before.
> >
> > Reserve __GFP_MOVABLE in bit 3, so that it can continue to be used as
> > a flag. Same as before, __GFP_MOVABLE respresents movable migrate type
> > for ZONE_DMA, ZONE_DMA32, and ZONE_NORMAL. But when it is enabled with
> > __GFP_HIGHMEM, ZONE_MOVABLE shall be returned instead of ZONE_HIGHMEM.
> > __GFP_ZONE_MOVABLE is created to realize it.
> >
> > With this patch, just enabling __GFP_MOVABLE and __GFP_HIGHMEM is not
> > enough to get ZONE_MOVABLE from gfp_zone. All callers should use
> > GFP_HIGHUSER_MOVABLE or __GFP_ZONE_MOVABLE directly to achieve that.
> >
> > Decode zone number directly from bottom three bits of flags in gfp_zone.
> > The theory of encoding and decoding is,
> > A ^ B ^ B = A
> 
> So why is this any better than the current code. Sure I am not a great
> fan of GFP_ZONE_TABLE because of how it is incomprehensible but this
> doesn't look too much better, yet we are losing a check for incompatible
> gfp flags. The diffstat looks really sound but then you just look and
> see that the large part is the comment that at least explained the gfp
> zone modifiers somehow and the debugging code. So what is the selling
> point?

Dear Michal,

Let me try to answer your questions.
Indeed, GFP_ZONE_TABLE is too complicated. I think this patch series has
two advantages.

1. The XOR operation is simple and efficient. GFP_ZONE_TABLE/BAD needs two
shift operations: the first to get the zone_type and the second to check
whether the type to be returned is valid. With these patches, a single XOR
operation is enough. Because the bottom 3 bits of the GFP bitmask now
represent the encoded zone number, there can be no bad zone number as long
as all callers use the modifiers correctly. Of course, the zone type
returned by gfp_zone() must still be no greater than ZONE_MOVABLE.

2. GFP_ZONE_TABLE limits the number of zone types. The current
GFP_ZONE_TABLE is 32 bits. In general, most x86_64 platforms have 4 zone
types: ZONE_DMA, ZONE_DMA32, ZONE_NORMAL and ZONE_MOVABLE. To support more
than 4 zone types, the zone shift would have to be 3, and a 32-bit zone
table is then not enough to store all of them.
Worse, the GFP bitmask space is tightly constrained: only four ___GFP_XXX
zone values can be expressed, as below:

#define ___GFP_DMA      0x01u
#define ___GFP_HIGHMEM  0x02u
#define ___GFP_DMA32    0x04u
(___GFP_NORMAL equals 0x00)

With these patches, up to 8 zone types can be used. The encoding and
decoding method is simple and intuitive, as shown below, and most
importantly, there are no BAD zone types anymore.

A ^ B ^ B = A

By the way, our v3 patches are ready, but Gmail's SMTP server is quite
unstable on my side because of a firewall; I will try to resend them
ASAP.

Sincerely,
Huaisheng Ye


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH v3 9/9] arch/x86/include/asm/page.h: update usage of movableflags

2018-05-23 Thread Huaisheng Ye
From: Huaisheng Ye 

GFP_HIGHUSER_MOVABLE doesn't equal to GFP_HIGHUSER | __GFP_MOVABLE,
modify it to adapt patch of getting rid of GFP_ZONE_TABLE/BAD.

Signed-off-by: Huaisheng Ye 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: Kate Stewart 
Cc: Greg Kroah-Hartman 
Cc: x...@kernel.org 
Cc: Philippe Ombredanne 
Cc: Christoph Hellwig 
---
 arch/x86/include/asm/page.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/page.h b/arch/x86/include/asm/page.h
index 7555b48..a47f42d 100644
--- a/arch/x86/include/asm/page.h
+++ b/arch/x86/include/asm/page.h
@@ -35,7 +35,8 @@ static inline void copy_user_page(void *to, void *from, 
unsigned long vaddr,
 }
 
 #define __alloc_zeroed_user_highpage(movableflags, vma, vaddr) \
-   alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO | movableflags, vma, vaddr)
+   alloc_page_vma((movableflags ? GFP_HIGHUSER_MOVABLE : GFP_HIGHUSER) \
+   | __GFP_ZERO, vma, vaddr)
 #define __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE
 
 #ifndef __pa
-- 
1.8.3.1


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH v3 8/9] include/linux/highmem.h: update usage of movableflags

2018-05-23 Thread Huaisheng Ye
From: Huaisheng Ye 

GFP_HIGHUSER_MOVABLE doesn't equal to GFP_HIGHUSER | __GFP_MOVABLE,
modify it to adapt patch of getting rid of GFP_ZONE_TABLE/BAD.

Signed-off-by: Huaisheng Ye 
Cc: Kate Stewart 
Cc: Greg Kroah-Hartman 
Cc: Thomas Gleixner 
Cc: Philippe Ombredanne 
Cc: Christoph Hellwig 
---
 include/linux/highmem.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index 0690679..5383c9e 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -159,8 +159,8 @@ static inline void clear_user_highpage(struct page *page, 
unsigned long vaddr)
struct vm_area_struct *vma,
unsigned long vaddr)
 {
-   struct page *page = alloc_page_vma(GFP_HIGHUSER | movableflags,
-   vma, vaddr);
+   struct page *page = alloc_page_vma(movableflags ?
+   GFP_HIGHUSER_MOVABLE : GFP_HIGHUSER, vma, vaddr);
 
if (page)
clear_user_highpage(page, vaddr);
-- 
1.8.3.1


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH v3 4/9] fs/btrfs/extent_io: update usage of zone modifiers

2018-05-23 Thread Huaisheng Ye
From: Huaisheng Ye 

Use __GFP_ZONE_MASK to replace (__GFP_DMA32 | __GFP_HIGHMEM).

In function alloc_extent_state, it is obvious that __GFP_DMA is not
the expecting zone type.

___GFP_DMA, ___GFP_HIGHMEM and ___GFP_DMA32 have been deleted from GFP
bitmasks, the bottom three bits of GFP mask is reserved for storing
encoded zone number.
__GFP_DMA, __GFP_HIGHMEM and __GFP_DMA32 should not be operated with
each others by OR.

Use GFP_NORMAL() to clear bottom 3 bits of GFP bitmaks.

Signed-off-by: Huaisheng Ye 
Cc: Chris Mason 
Cc: Josef Bacik 
Cc: David Sterba 
Cc: Christoph Hellwig 
---
 fs/btrfs/extent_io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index e99b329..f41fc61 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -220,7 +220,7 @@ static struct extent_state *alloc_extent_state(gfp_t mask)
 * The given mask might be not appropriate for the slab allocator,
 * drop the unsupported bits
 */
-   mask &= ~(__GFP_DMA32|__GFP_HIGHMEM);
+   mask = GFP_NORMAL(mask);
state = kmem_cache_alloc(extent_state_cache, mask);
if (!state)
return state;
-- 
1.8.3.1


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH v3 1/9] include/linux/gfp.h: get rid of GFP_ZONE_TABLE/BAD

2018-05-23 Thread Huaisheng Ye
From: Huaisheng Ye 

Replace GFP_ZONE_TABLE and GFP_ZONE_BAD with encoded zone number.

Delete ___GFP_DMA, ___GFP_HIGHMEM and ___GFP_DMA32 from GFP bitmasks,
the bottom three bits of GFP mask is reserved for storing encoded
zone number.

The encoding method is XOR. Get zone number from enum zone_type,
then encode the number with ZONE_NORMAL by XOR operation.
The goal is to make sure ZONE_NORMAL can be encoded to zero. So,
the compatibility can be guaranteed, such as GFP_KERNEL and GFP_ATOMIC
can be used as before.

Reserve __GFP_MOVABLE in bit 3, so that it can continue to be used as
a flag. Same as before, __GFP_MOVABLE respresents movable migrate type
for ZONE_DMA, ZONE_DMA32, and ZONE_NORMAL. But when it is enabled with
__GFP_HIGHMEM, ZONE_MOVABLE shall be returned instead of ZONE_HIGHMEM.
__GFP_ZONE_MOVABLE is created to realize it.

With this patch, just enabling __GFP_MOVABLE and __GFP_HIGHMEM is not
enough to get ZONE_MOVABLE from gfp_zone. All subsystems should use
GFP_HIGHUSER_MOVABLE directly to achieve that.

Decode zone number directly from bottom three bits of flags in gfp_zone.
The theory of encoding and decoding is,
A ^ B ^ B = A

Suggested-by: Matthew Wilcox 
Signed-off-by: Huaisheng Ye 
Cc: Andrew Morton 
Cc: Vlastimil Babka 
Cc: Michal Hocko 
Cc: Mel Gorman 
Cc: Kate Stewart 
Cc: "Levin, Alexander (Sasha Levin)" 
Cc: Greg Kroah-Hartman 
Cc: Christoph Hellwig 
---
 include/linux/gfp.h | 107 ++--
 1 file changed, 20 insertions(+), 87 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 1a4582b..f76ccd76 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -16,9 +16,7 @@
  */
 
 /* Plain integer GFP bitmasks. Do not use this directly. */
-#define ___GFP_DMA 0x01u
-#define ___GFP_HIGHMEM 0x02u
-#define ___GFP_DMA32   0x04u
+#define ___GFP_ZONE_MASK   0x07u
 #define ___GFP_MOVABLE 0x08u
 #define ___GFP_RECLAIMABLE 0x10u
 #define ___GFP_HIGH0x20u
@@ -53,11 +51,15 @@
  * without the underscores and use them consistently. The definitions here may
  * be used in bit comparisons.
  */
-#define __GFP_DMA  ((__force gfp_t)___GFP_DMA)
-#define __GFP_HIGHMEM  ((__force gfp_t)___GFP_HIGHMEM)
-#define __GFP_DMA32 ((__force gfp_t)___GFP_DMA32)
+#define __GFP_DMA  ((__force gfp_t)OPT_ZONE_DMA ^ ZONE_NORMAL)
+#define __GFP_HIGHMEM  ((__force gfp_t)OPT_ZONE_HIGHMEM ^ ZONE_NORMAL)
+#define __GFP_DMA32 ((__force gfp_t)OPT_ZONE_DMA32 ^ ZONE_NORMAL)
 #define __GFP_MOVABLE  ((__force gfp_t)___GFP_MOVABLE)  /* ZONE_MOVABLE allowed */
-#define GFP_ZONEMASK   (__GFP_DMA|__GFP_HIGHMEM|__GFP_DMA32|__GFP_MOVABLE)
+#define GFP_ZONEMASK   ((__force gfp_t)___GFP_ZONE_MASK | ___GFP_MOVABLE)
+/* The bottom 3 bits of the GFP bitmask hold the encoded zone number */
+#define __GFP_ZONE_MASK ((__force gfp_t)___GFP_ZONE_MASK)
+#define __GFP_ZONE_MOVABLE \
+   ((__force gfp_t)(ZONE_MOVABLE ^ ZONE_NORMAL) | ___GFP_MOVABLE)
 
 /*
  * Page mobility and placement hints
@@ -268,6 +270,13 @@
  *   available and will not wake kswapd/kcompactd on failure. The _LIGHT
  *   version does not attempt reclaim/compaction at all and is by default used
  *   in page fault path, while the non-light is used by khugepaged.
+ *
+ * GFP_NORMAL() clears the bottom 3 bits of the GFP bitmask; the result
+ *   carries the encoded ZONE_NORMAL zone (zero).
+ *
+ * GFP_NORMAL_UNMOVABLE() is similar to GFP_NORMAL(), but clears the bottom
+ *   4 bits of the GFP bitmask: in addition to the encoded zone bits, it
+ *   clears the MOVABLE flag as well.
  */
 #define GFP_ATOMIC (__GFP_HIGH|__GFP_ATOMIC|__GFP_KSWAPD_RECLAIM)
 #define GFP_KERNEL (__GFP_RECLAIM | __GFP_IO | __GFP_FS)
@@ -279,10 +288,12 @@
 #define GFP_DMA __GFP_DMA
 #define GFP_DMA32  __GFP_DMA32
 #define GFP_HIGHUSER   (GFP_USER | __GFP_HIGHMEM)
-#define GFP_HIGHUSER_MOVABLE   (GFP_HIGHUSER | __GFP_MOVABLE)
+#define GFP_HIGHUSER_MOVABLE   (GFP_USER | __GFP_ZONE_MOVABLE)
 #define GFP_TRANSHUGE_LIGHT ((GFP_HIGHUSER_MOVABLE | __GFP_COMP | \
 __GFP_NOMEMALLOC | __GFP_NOWARN) & ~__GFP_RECLAIM)
 #define GFP_TRANSHUGE  (GFP_TRANSHUGE_LIGHT | __GFP_DIRECT_RECLAIM)
+#define GFP_NORMAL(gfp) ((gfp) & ~__GFP_ZONE_MASK)
+#define GFP_NORMAL_UNMOVABLE(gfp) ((gfp) & ~GFP_ZONEMASK)
 
 /* Convert GFP flags to their corresponding migrate type */
 #define GFP_MOVABLE_MASK (__GFP_RECLAIMABLE|__GFP_MOVABLE)
@@ -326,87 +337,9 @@ static inline bool gfpflags_allow_blocking(const gfp_t gfp_flags)
 #define OPT_ZONE_DMA32 ZONE_NORMAL
 #endif
 
-/*
- * GFP_ZONE_TABLE is a word size bitstring that is used for looking up the
- * zone to use given the lowest 4 bits of 

[RFC PATCH v3 2/9] include/linux/dma-mapping: update usage of zone modifiers

2018-05-23 Thread Huaisheng Ye
From: Huaisheng Ye 

Use __GFP_ZONE_MASK to replace (__GFP_DMA | __GFP_HIGHMEM | __GFP_DMA32).

___GFP_DMA, ___GFP_HIGHMEM and ___GFP_DMA32 have been deleted from the GFP
bitmasks; the bottom three bits of the GFP mask are reserved for storing
the encoded zone number.
__GFP_DMA, __GFP_HIGHMEM and __GFP_DMA32 must not be ORed with each
other.

Use GFP_NORMAL() to clear the bottom 3 bits of the GFP bitmask.
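A small sketch of why OR-ing encoded modifiers is no longer meaningful. It assumes a hypothetical zone numbering with all five zones configured (DMA=0, DMA32=1, NORMAL=2, HIGHMEM=3, MOVABLE=4), a plain unsigned gfp_t without sparse casts, and a hypothetical sketch_gfp_zone() standing in for the patched gfp_zone().

```c
#include <assert.h>

typedef unsigned int gfp_t; /* plain stand-in for the kernel's gfp_t */

/* Hypothetical numbering with all five zones configured. */
enum zone_type { ZONE_DMA, ZONE_DMA32, ZONE_NORMAL, ZONE_HIGHMEM, ZONE_MOVABLE };

#define __GFP_ZONE_MASK ((gfp_t)0x07u)
#define __GFP_DMA       ((gfp_t)(ZONE_DMA ^ ZONE_NORMAL))     /* 0b010 */
#define __GFP_HIGHMEM   ((gfp_t)(ZONE_HIGHMEM ^ ZONE_NORMAL)) /* 0b001 */
#define __GFP_DMA32     ((gfp_t)(ZONE_DMA32 ^ ZONE_NORMAL))   /* 0b011 */
#define GFP_NORMAL(gfp) ((gfp) & ~__GFP_ZONE_MASK)

/* Hypothetical stand-in for gfp_zone(): decode the low 3 bits. */
static enum zone_type sketch_gfp_zone(gfp_t flags)
{
	return (enum zone_type)((flags & __GFP_ZONE_MASK) ^ ZONE_NORMAL);
}
```

Under this numbering __GFP_DMA | __GFP_HIGHMEM produces exactly the encoding of __GFP_DMA32, so a mask built by OR-ing modifiers no longer means "any of these zones"; GFP_NORMAL() instead clears the whole zone field and leaves every other flag untouched.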

Signed-off-by: Huaisheng Ye 
Cc: Christoph Hellwig 
Cc: Marek Szyprowski 
Cc: Robin Murphy 
Cc: Christoph Hellwig 
---
 include/linux/dma-mapping.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index f8ab1c0..8fe524d 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -519,7 +519,7 @@ static inline void *dma_alloc_attrs(struct device *dev, size_t size,
return cpu_addr;
 
/* let the implementation decide on the zone to allocate from: */
-   flag &= ~(__GFP_DMA | __GFP_DMA32 | __GFP_HIGHMEM);
+   flag = GFP_NORMAL(flag);
 
if (!arch_dma_alloc_attrs(, ))
return NULL;
-- 
1.8.3.1


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH v3 5/9] drivers/block/zram/zram_drv: update usage of zone modifiers

2018-05-23 Thread Huaisheng Ye
From: Huaisheng Ye 

Use __GFP_ZONE_MOVABLE to replace (__GFP_HIGHMEM | __GFP_MOVABLE).

___GFP_DMA, ___GFP_HIGHMEM and ___GFP_DMA32 have been deleted from the GFP
bitmasks; the bottom three bits of the GFP mask are reserved for storing
the encoded zone number.

__GFP_ZONE_MOVABLE contains encoded ZONE_MOVABLE and __GFP_MOVABLE flag.

With GFP_ZONE_TABLE, ORing __GFP_HIGHMEM with __GFP_MOVABLE makes gfp_zone
return ZONE_MOVABLE. To stay compatible with GFP_ZONE_TABLE, replace
(__GFP_HIGHMEM | __GFP_MOVABLE) with __GFP_ZONE_MOVABLE.

Signed-off-by: Huaisheng Ye 
Cc: Minchan Kim 
Cc: Nitin Gupta 
Cc: Sergey Senozhatsky 
Cc: Christoph Hellwig 
---
 drivers/block/zram/zram_drv.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 0f3fadd..1bb5ca8 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1004,14 +1004,12 @@ static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec,
handle = zs_malloc(zram->mem_pool, comp_len,
__GFP_KSWAPD_RECLAIM |
__GFP_NOWARN |
-   __GFP_HIGHMEM |
-   __GFP_MOVABLE);
+   __GFP_ZONE_MOVABLE);
if (!handle) {
zcomp_stream_put(zram->comp);
atomic64_inc(>stats.writestall);
handle = zs_malloc(zram->mem_pool, comp_len,
-   GFP_NOIO | __GFP_HIGHMEM |
-   __GFP_MOVABLE);
+   GFP_NOIO | __GFP_ZONE_MOVABLE);
if (handle)
goto compress_again;
return -ENOMEM;
-- 
1.8.3.1




[RFC PATCH v3 7/9] mm/zsmalloc: update usage of zone modifiers

2018-05-23 Thread Huaisheng Ye
From: Huaisheng Ye 

Use GFP_NORMAL_UNMOVABLE() to replace masking out (__GFP_HIGHMEM | __GFP_MOVABLE).

___GFP_DMA, ___GFP_HIGHMEM and ___GFP_DMA32 have been deleted from the GFP
bitmasks; the bottom three bits of the GFP mask are reserved for storing
the encoded zone number.

__GFP_ZONE_MOVABLE contains encoded ZONE_MOVABLE and __GFP_MOVABLE flag.

With GFP_ZONE_TABLE, ORing __GFP_HIGHMEM with __GFP_MOVABLE makes gfp_zone
return ZONE_MOVABLE. To stay compatible with GFP_ZONE_TABLE, use
GFP_NORMAL_UNMOVABLE() to clear the bottom 4 bits of the GFP bitmask.
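The difference between GFP_NORMAL() and GFP_NORMAL_UNMOVABLE() in this hunk can be checked with a small sketch. It copies the mask macros from patch 1/9 with the sparse __force casts dropped, treats gfp_t as a plain unsigned int, and assumes a hypothetical zone numbering with all five zones configured; SKETCH_OTHER_FLAG and handle_cache_flags() are hypothetical names for an unrelated flag and for the flags cache_alloc_handle() would pass to kmem_cache_alloc().

```c
#include <assert.h>

typedef unsigned int gfp_t; /* plain stand-in for the kernel's gfp_t */

/* Hypothetical numbering with all five zones configured. */
enum zone_type { ZONE_DMA, ZONE_DMA32, ZONE_NORMAL, ZONE_HIGHMEM, ZONE_MOVABLE };

#define ___GFP_ZONE_MASK   0x07u
#define ___GFP_MOVABLE     0x08u
#define __GFP_ZONE_MASK    ((gfp_t)___GFP_ZONE_MASK)
#define GFP_ZONEMASK       ((gfp_t)___GFP_ZONE_MASK | ___GFP_MOVABLE)
#define __GFP_ZONE_MOVABLE ((gfp_t)(ZONE_MOVABLE ^ ZONE_NORMAL) | ___GFP_MOVABLE)

/* Macros as added by patch 1/9. */
#define GFP_NORMAL(gfp)           ((gfp) & ~__GFP_ZONE_MASK) /* clears bits 0-2 */
#define GFP_NORMAL_UNMOVABLE(gfp) ((gfp) & ~GFP_ZONEMASK)    /* clears bits 0-3 */

/* Hypothetical flag outside the low 4 bits, e.g. standing in for __GFP_NOWARN. */
#define SKETCH_OTHER_FLAG 0x200u

/* Hypothetical helper: the flags cache_alloc_handle() would hand to the slab. */
static gfp_t handle_cache_flags(gfp_t gfp)
{
	return GFP_NORMAL_UNMOVABLE(gfp);
}
```

GFP_NORMAL() leaves __GFP_MOVABLE set, so only GFP_NORMAL_UNMOVABLE() reproduces what the old `& ~(__GFP_HIGHMEM|__GFP_MOVABLE)` did for these slab allocations; unrelated flags pass through either macro unchanged.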

Signed-off-by: Huaisheng Ye 
Cc: Minchan Kim 
Cc: Nitin Gupta 
Cc: Sergey Senozhatsky 
Cc: Christoph Hellwig 
---
 mm/zsmalloc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 61cb05d..e250c69 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -345,7 +345,7 @@ static void destroy_cache(struct zs_pool *pool)
 static unsigned long cache_alloc_handle(struct zs_pool *pool, gfp_t gfp)
 {
return (unsigned long)kmem_cache_alloc(pool->handle_cachep,
-   gfp & ~(__GFP_HIGHMEM|__GFP_MOVABLE));
+   GFP_NORMAL_UNMOVABLE(gfp));
 }
 
 static void cache_free_handle(struct zs_pool *pool, unsigned long handle)
@@ -356,7 +356,7 @@ static void cache_free_handle(struct zs_pool *pool, unsigned long handle)
 static struct zspage *cache_alloc_zspage(struct zs_pool *pool, gfp_t flags)
 {
return kmem_cache_alloc(pool->zspage_cachep,
-   flags & ~(__GFP_HIGHMEM|__GFP_MOVABLE));
+   GFP_NORMAL_UNMOVABLE(flags));
 }
 
 static void cache_free_zspage(struct zs_pool *pool, struct zspage *zspage)
-- 
1.8.3.1




[RFC PATCH v3 6/9] mm/vmpressure: update usage of zone modifiers

2018-05-23 Thread Huaisheng Ye
From: Huaisheng Ye 

Use __GFP_ZONE_MOVABLE to replace (__GFP_HIGHMEM | __GFP_MOVABLE).

___GFP_DMA, ___GFP_HIGHMEM and ___GFP_DMA32 have been deleted from the GFP
bitmasks; the bottom three bits of the GFP mask are reserved for storing
the encoded zone number.

__GFP_ZONE_MOVABLE contains encoded ZONE_MOVABLE and __GFP_MOVABLE flag.

With GFP_ZONE_TABLE, ORing __GFP_HIGHMEM with __GFP_MOVABLE makes gfp_zone
return ZONE_MOVABLE. To stay compatible with GFP_ZONE_TABLE, replace
(__GFP_HIGHMEM | __GFP_MOVABLE) with __GFP_ZONE_MOVABLE.
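Because the zone field now holds one encoded number rather than one bit per zone, a test such as `gfp & (__GFP_ZONE_MOVABLE | ...)` checks for any overlapping bit, not for a particular zone. The sketch below makes that concrete under a hypothetical zone numbering with all five zones configured (DMA=0, DMA32=1, NORMAL=2, HIGHMEM=3, MOVABLE=4) and a plain unsigned gfp_t; encode_zone() and zone_hits_movable_mask() are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int gfp_t; /* plain stand-in for the kernel's gfp_t */

/* Hypothetical numbering with all five zones configured. */
enum zone_type { ZONE_DMA, ZONE_DMA32, ZONE_NORMAL, ZONE_HIGHMEM, ZONE_MOVABLE };

#define ___GFP_MOVABLE     0x08u
#define __GFP_ZONE_MOVABLE ((gfp_t)(ZONE_MOVABLE ^ ZONE_NORMAL) | ___GFP_MOVABLE)

/* XOR encoding from patch 1/9: ZONE_NORMAL encodes to 0. */
static gfp_t encode_zone(enum zone_type zone)
{
	return (gfp_t)zone ^ (gfp_t)ZONE_NORMAL;
}

/* True if a request carrying only @zone's encoding overlaps __GFP_ZONE_MOVABLE. */
static bool zone_hits_movable_mask(enum zone_type zone)
{
	return (encode_zone(zone) & __GFP_ZONE_MOVABLE) != 0;
}
```

Under this numbering the encodings of ZONE_DMA (0b010), ZONE_DMA32 (0b011) and ZONE_MOVABLE (0b110) overlap the mask, while a bare ZONE_HIGHMEM request (0b001) does not, whereas the old bit test matched any __GFP_HIGHMEM request. With a different enum zone_type the overlap pattern changes.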

Signed-off-by: Huaisheng Ye 
Cc: Andrew Morton 
Cc: zhongjiang 
Cc: Minchan Kim 
Cc: Dan Carpenter 
Cc: David Rientjes 
Cc: Christoph Hellwig 
---
 mm/vmpressure.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/vmpressure.c b/mm/vmpressure.c
index 85350ce..30a40e2 100644
--- a/mm/vmpressure.c
+++ b/mm/vmpressure.c
@@ -256,7 +256,7 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg, bool tree,
 * Indirect reclaim (kswapd) sets sc->gfp_mask to GFP_KERNEL, so
 * we account it too.
 */
-   if (!(gfp & (__GFP_HIGHMEM | __GFP_MOVABLE | __GFP_IO | __GFP_FS)))
+   if (!(gfp & (__GFP_ZONE_MOVABLE | __GFP_IO | __GFP_FS)))
return;
 
/*
-- 
1.8.3.1




[RFC PATCH v3 3/9] drivers/xen/swiotlb-xen: update usage of zone modifiers

2018-05-23 Thread Huaisheng Ye
From: Huaisheng Ye 

Use __GFP_ZONE_MASK to replace (__GFP_DMA | __GFP_HIGHMEM).

In xen_swiotlb_alloc_coherent, it is obvious that __GFP_DMA32 is not the
expected zone type.

___GFP_DMA, ___GFP_HIGHMEM and ___GFP_DMA32 have been deleted from the GFP
bitmasks; the bottom three bits of the GFP mask are reserved for storing
the encoded zone number.
__GFP_DMA, __GFP_HIGHMEM and __GFP_DMA32 must not be ORed with each
other.

Use GFP_NORMAL() to clear the bottom 3 bits of the GFP bitmask.

Signed-off-by: Huaisheng Ye 
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 
Cc: Juergen Gross 
Cc: Christoph Hellwig 
---
 drivers/xen/swiotlb-xen.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index e1c6089..359 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -301,7 +301,7 @@ int __ref xen_swiotlb_init(int verbose, bool early)
* machine physical layout.  We can't allocate highmem
* because we can't return a pointer to it.
*/
-   flags &= ~(__GFP_DMA | __GFP_HIGHMEM);
+   flags = GFP_NORMAL(flags);
 
/* On ARM this function returns an ioremap'ped virtual address for
 * which virt_to_phys doesn't return the corresponding physical
-- 
1.8.3.1




[RFC PATCH v3 0/9] get rid of GFP_ZONE_TABLE/BAD

2018-05-23 Thread Huaisheng Ye
From: Huaisheng Ye 

Changes since v2: [2]
* As Christoph suggested, rebased the patches from v4.16 onto current
  mainline.

* Following Matthew's advice, created the GFP_NORMAL and
  GFP_NORMAL_UNMOVABLE macros to clear the bottom 3 and 4 bits of the
  GFP bitmask.

* Dropped some patches because of upstream kernel changes.

[2]: https://marc.info/?l=linux-mm=152691610014027=2

Tested on a Lenovo ThinkSystem server.

Initmem setup node 0 [mem 0x1000-0x00043fff]
[0.00] On node 0 totalpages: 4111666
[0.00]   DMA zone: 64 pages used for memmap
[0.00]   DMA zone: 23 pages reserved
[0.00]   DMA zone: 3999 pages, LIFO batch:0
[0.00] mminit::memmap_init Initialising map node 0 zone 0 pfns 1 -> 4096
[0.00]   DMA32 zone: 10935 pages used for memmap
[0.00]   DMA32 zone: 699795 pages, LIFO batch:31
[0.00] mminit::memmap_init Initialising map node 0 zone 1 pfns 4096 -> 1048576
[0.00]   Normal zone: 53248 pages used for memmap
[0.00]   Normal zone: 3407872 pages, LIFO batch:31
[0.00] mminit::memmap_init Initialising map node 0 zone 2 pfns 1048576 -> 4456448
[0.00] mminit::memmap_init Initialising map node 0 zone 3 pfns 1 -> 4456448
[0.00] Initmem setup node 1 [mem 0x00238000-0x00277fff]
[0.00] On node 1 totalpages: 4194304
[0.00]   Normal zone: 65536 pages used for memmap
[0.00]   Normal zone: 4194304 pages, LIFO batch:31
[0.00] mminit::memmap_init Initialising map node 1 zone 2 pfns 37224448 -> 41418752
[0.00] mminit::memmap_init Initialising map node 1 zone 3 pfns 37224448 -> 41418752
...
[0.00] mminit::zonelist general 0:DMA = 0:DMA
[0.00] mminit::zonelist general 0:DMA32 = 0:DMA32 0:DMA
[0.00] mminit::zonelist general 0:Normal = 0:Normal 0:DMA32 0:DMA 1:Normal
[0.00] mminit::zonelist thisnode 0:DMA = 0:DMA
[0.00] mminit::zonelist thisnode 0:DMA32 = 0:DMA32 0:DMA
[0.00] mminit::zonelist thisnode 0:Normal = 0:Normal 0:DMA32 0:DMA
[0.00] mminit::zonelist general 1:Normal = 1:Normal 0:Normal 0:DMA32 0:DMA
[0.00] mminit::zonelist thisnode 1:Normal = 1:Normal
[0.00] Built 2 zonelists, mobility grouping on.  Total pages: 8176164
[0.00] Policy zone: Normal
[0.00] Kernel command line: BOOT_IMAGE=/vmlinuz-4.17.0-rc6-gfp09+ root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/root rd.lvm.lv=fedora/swap debug LANG=en_US.UTF-8 mminit_loglevel=4 console=tty0 console=ttyS0,115200n8 memblock=debug earlyprintk=serial,0x3f8,115200

---

Replace GFP_ZONE_TABLE and GFP_ZONE_BAD with encoded zone number.

Delete ___GFP_DMA, ___GFP_HIGHMEM and ___GFP_DMA32 from the GFP bitmasks;
the bottom three bits of the GFP mask are reserved for storing the encoded
zone number.

The encoding method is XOR: take the zone number from enum zone_type and
XOR it with ZONE_NORMAL. This makes ZONE_NORMAL encode to zero, which
preserves compatibility: flags such as GFP_KERNEL and GFP_ATOMIC can be
used as before.

Keep __GFP_MOVABLE in bit 3 so that it can still be used as a flag. As
before, __GFP_MOVABLE represents the movable migrate type for ZONE_DMA,
ZONE_DMA32 and ZONE_NORMAL. But when it is combined with __GFP_HIGHMEM,
ZONE_MOVABLE should be returned instead of ZONE_HIGHMEM; __GFP_ZONE_MOVABLE
is created to achieve this.

With this patch, setting both __GFP_MOVABLE and __GFP_HIGHMEM is no longer
enough to get ZONE_MOVABLE from gfp_zone. All callers should use
GFP_HIGHUSER_MOVABLE or __GFP_ZONE_MOVABLE directly to achieve that.

gfp_zone decodes the zone number directly from the bottom three bits of
the flags. Encoding and decoding rely on the identity
A ^ B ^ B = A

Changes since v1:[1]

* Add __GFP_ZONE_MOVABLE and modify GFP_HIGHUSER_MOVABLE to help
  callers get ZONE_MOVABLE. Add __GFP_ZONE_MASK to mask the lowest
  3 bits of the GFP bitmask.

* Modify some callers' gfp flags to update their usage of the address
  zone modifiers.

* Rework the inline function gfp_zone for better performance, as
  Matthew suggested.

[1]: https://marc.info/?l=linux-mm=152596791931266=2

---

Huaisheng Ye (9):
  include/linux/gfp.h: get rid of GFP_ZONE_TABLE/BAD
  include/linux/dma-mapping: update usage of zone modifiers
  drivers/xen/swiotlb-xen: update usage of zone modifiers
  fs/btrfs/extent_io: update usage of zone modifiers
  drivers/block/zram/zram_drv: update usage of zone modifiers
  mm/vmpressure: update usage of zone modifiers
  mm/zsmalloc: update usage of zone modifiers
  include/linux/highmem.h: update usage of movableflags
  arch/x86/include/asm/page.h: update usage of movableflags

 arch/x86/include/asm/page.h   |   3 +-
 drivers/block/zram/zram_drv.c |   6 +--
 drivers/xen/swiotlb-xen.c |   2 +-
 fs/btrfs/extent_io.c  |   2 +-
 include/linux/dma-mapping.h   |   2 +-
 include/linux/gfp.h   | 107 

Re: [PATCH 1/1] iommu/dma: fix trival coding style mistake

2018-05-23 Thread Robin Murphy

On 23/05/18 07:02, Zhen Lei wrote:

No functional changes.


What's the mistake?


Signed-off-by: Zhen Lei 
---
  drivers/iommu/dma-iommu.c | 12 +++-
  1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index ddcbbdb..4e885f7 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -231,6 +231,9 @@ static int iova_reserve_iommu_regions(struct device *dev,
LIST_HEAD(resv_regions);
int ret = 0;

+   if (!dev)
+   return 0;


Logically, it makes no sense at all to call this function without a 
valid device; doing the check in init_domain was a deliberate decision 
to reflect that. This isn't a cleanup path shared by multiple callers 
where the "accept NULL for simplicity" argument might apply.



+
if (dev_is_pci(dev))
iova_reserve_pci_windows(to_pci_dev(dev), iovad);

@@ -246,11 +249,12 @@ static int iova_reserve_iommu_regions(struct device *dev,
hi = iova_pfn(iovad, region->start + region->length - 1);
reserve_iova(iovad, lo, hi);

-   if (region->type == IOMMU_RESV_MSI)
+   if (region->type == IOMMU_RESV_MSI) {
ret = cookie_init_hw_msi_region(cookie, region->start,
region->start + region->length);
-   if (ret)
-   break;
+   if (ret)
+   break;
+   }


Why? ret is already initialised appropriately, and the coding style even 
says that going beyond 3 levels of indentation is undesirable...


Robin.


}
iommu_put_resv_regions(dev, _regions);

@@ -308,8 +312,6 @@ int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
}

init_iova_domain(iovad, 1UL << order, base_pfn);
-   if (!dev)
-   return 0;

return iova_reserve_iommu_regions(dev, domain);
  }
--
1.8.3






Re: [PATCH v2 13/40] vfio: Add support for Shared Virtual Addressing

2018-05-23 Thread Xu Zaibo

Hi,

On 2018/5/12 3:06, Jean-Philippe Brucker wrote:

Add two new ioctls for VFIO containers. VFIO_IOMMU_BIND_PROCESS creates a
bond between a container and a process address space, identified by a
Process Address Space ID (PASID). Devices in the container append this
PASID to DMA transactions in order to access the process' address space.
The process page tables are shared with the IOMMU, and mechanisms such as
PCI ATS/PRI are used to handle faults. VFIO_IOMMU_UNBIND_PROCESS removes a
bond created with VFIO_IOMMU_BIND_PROCESS.

Signed-off-by: Jean-Philippe Brucker 






+static int vfio_iommu_bind_group(struct vfio_iommu *iommu,
+struct vfio_group *group,
+struct vfio_mm *vfio_mm)
+{
+   int ret;
+   bool enabled_sva = false;
+   struct vfio_iommu_sva_bind_data data = {
+   .vfio_mm= vfio_mm,
+   .iommu  = iommu,
+   .count  = 0,
+   };
+
+   if (!group->sva_enabled) {
+   ret = iommu_group_for_each_dev(group->iommu_group, NULL,
+  vfio_iommu_sva_init);

Do we need to call *sva_init here, or do something to avoid repeated
initialization? If another process has already initialized SVA on this
device, I think the current process will get EEXIST.


Thanks.

+   if (ret)
+   return ret;
+
+   group->sva_enabled = enabled_sva = true;
+   }
+
+   ret = iommu_group_for_each_dev(group->iommu_group, ,
+  vfio_iommu_sva_bind_dev);
+   if (ret && data.count > 1)
+   iommu_group_for_each_dev(group->iommu_group, vfio_mm,
+vfio_iommu_sva_unbind_dev);
+   if (ret && enabled_sva) {
+   iommu_group_for_each_dev(group->iommu_group, NULL,
+vfio_iommu_sva_shutdown);
+   group->sva_enabled = false;
+   }
+
+   return ret;
+}


  
@@ -1442,6 +1636,10 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,

if (ret)
goto out_detach;
  
+	ret = vfio_iommu_replay_bind(iommu, group);

+   if (ret)
+   goto out_detach;
+
if (resv_msi) {
ret = iommu_get_msi_cookie(domain->domain, resv_msi_base);
if (ret)
@@ -1547,6 +1745,11 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
continue;
  
  		iommu_detach_group(domain->domain, iommu_group);

+   if (group->sva_enabled) {
+   iommu_group_for_each_dev(iommu_group, NULL,
+vfio_iommu_sva_shutdown);
+   group->sva_enabled = false;
+   }
Why shut SVA down here? If another process is still working on the
device, couldn't this cause a crash?


Thanks.

list_del(>next);
kfree(group);
/*
@@ -1562,6 +1765,7 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,






[PATCH 1/1] iommu/dma: fix trival coding style mistake

2018-05-23 Thread Zhen Lei
No functional changes.

Signed-off-by: Zhen Lei 
---
 drivers/iommu/dma-iommu.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index ddcbbdb..4e885f7 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -231,6 +231,9 @@ static int iova_reserve_iommu_regions(struct device *dev,
LIST_HEAD(resv_regions);
int ret = 0;

+   if (!dev)
+   return 0;
+
if (dev_is_pci(dev))
iova_reserve_pci_windows(to_pci_dev(dev), iovad);

@@ -246,11 +249,12 @@ static int iova_reserve_iommu_regions(struct device *dev,
hi = iova_pfn(iovad, region->start + region->length - 1);
reserve_iova(iovad, lo, hi);

-   if (region->type == IOMMU_RESV_MSI)
+   if (region->type == IOMMU_RESV_MSI) {
ret = cookie_init_hw_msi_region(cookie, region->start,
region->start + region->length);
-   if (ret)
-   break;
+   if (ret)
+   break;
+   }
}
iommu_put_resv_regions(dev, _regions);

@@ -308,8 +312,6 @@ int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
}

init_iova_domain(iovad, 1UL << order, base_pfn);
-   if (!dev)
-   return 0;

return iova_reserve_iommu_regions(dev, domain);
 }
--
1.8.3

