Re: [PATCH v6 3/6] mm/cma: populate ZONE_CMA

2016-10-26 Thread Vlastimil Babka

On 10/26/2016 06:31 AM, Joonsoo Kim wrote:

Hello,

Here is the fixed version.

--->8
From 93fb05a83d74f9e2c8caebc2fa6d1a8807c9ffb6 Mon Sep 17 00:00:00 2001
From: Joonsoo Kim 
Date: Thu, 24 Mar 2016 22:29:10 +0900
Subject: [PATCH] mm/cma: populate ZONE_CMA

Until now, reserved pages for CMA have been managed in the ordinary
zones that their pfns belong to. This approach has numerous problems
and fixing them isn't easy (they are mentioned in the previous patch).
To fix this situation, ZONE_CMA was introduced in the previous patch,
but it is not yet populated. This patch implements population of
ZONE_CMA by stealing reserved pages from the ordinary zones.

Unlike the previous implementation, where a kernel allocation request
with __GFP_MOVABLE could be serviced from the CMA region, in the new
approach only allocation requests with GFP_HIGHUSER_MOVABLE can be
serviced from the CMA region. This is an inevitable design decision
when using the zone implementation, because ZONE_CMA could contain
highmem. Due to this decision, ZONE_CMA will behave like ZONE_HIGHMEM
or ZONE_MOVABLE.

I don't think this will be a problem, because most file cache pages
and anonymous pages are requested with GFP_HIGHUSER_MOVABLE. This is
supported by the fact that there are many systems with ZONE_HIGHMEM
and they work fine. A notable disadvantage is that we cannot use these
pages for the blockdev file cache, because it usually has __GFP_MOVABLE
but not __GFP_HIGHMEM and __GFP_USER. However, this trade-off has an
upside: in my experience, blockdev file cache pages are one of the top
reasons that cma_alloc() fails temporarily, so excluding that case
gives a stronger guarantee of cma_alloc() success.
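
For reference, the relevant gfp flag composition (as defined in
include/linux/gfp.h around this kernel version; shown here only for
illustration, it is not touched by this patch):

#define GFP_USER		(__GFP_RECLAIM | __GFP_IO | __GFP_FS | __GFP_HARDWALL)
#define GFP_HIGHUSER		(GFP_USER | __GFP_HIGHMEM)
#define GFP_HIGHUSER_MOVABLE	(GFP_HIGHUSER | __GFP_MOVABLE)

/*
 * Blockdev page cache is typically __GFP_MOVABLE without __GFP_HIGHMEM,
 * so under this scheme it cannot be placed in ZONE_CMA.
 */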

The implementation itself is easy to understand: steal the pages when
a CMA area is initialized and recalculate the various per-zone
stats/thresholds.

Reviewed-by: Aneesh Kumar K.V 
Signed-off-by: Joonsoo Kim 


Acked-by: Vlastimil Babka 



Re: [PATCH v6 3/6] mm/cma: populate ZONE_CMA

2016-10-25 Thread Joonsoo Kim
On Tue, Oct 18, 2016 at 05:27:30PM +0900, Joonsoo Kim wrote:
> On Tue, Oct 18, 2016 at 09:42:57AM +0200, Vlastimil Babka wrote:
> > On 10/14/2016 05:03 AM, js1...@gmail.com wrote:
> > >@@ -145,6 +145,35 @@ static int __init cma_activate_area(struct cma *cma)
> > > static int __init cma_init_reserved_areas(void)
> > > {
> > >   int i;
> > >+  struct zone *zone;
> > >+  pg_data_t *pgdat;
> > >+
> > >+  if (!cma_area_count)
> > >+  return 0;
> > >+
> > >+  for_each_online_pgdat(pgdat) {
> > >+  unsigned long start_pfn = UINT_MAX, end_pfn = 0;
> > >+
> > >+  for (i = 0; i < cma_area_count; i++) {
> > >+  if (pfn_to_nid(cma_areas[i].base_pfn) !=
> > >+  pgdat->node_id)
> > >+  continue;
> > >+
> > >+  start_pfn = min(start_pfn, cma_areas[i].base_pfn);
> > >+  end_pfn = max(end_pfn, cma_areas[i].base_pfn +
> > >+  cma_areas[i].count);
> > >+  }
> > >+
> > >+  if (!end_pfn)
> > >+  continue;
> > >+
> > >+  zone = &pgdat->node_zones[ZONE_CMA];
> > >+
> > >+  /* ZONE_CMA doesn't need to exceed CMA region */
> > >+  zone->zone_start_pfn = max(zone->zone_start_pfn, start_pfn);
> > >+  zone->spanned_pages = min(zone_end_pfn(zone), end_pfn) -
> > >+  zone->zone_start_pfn;
> > 
> > Hmm, do the max/min here work as intended? IIUC the initial
> 
> Yeap.
> 
> > zone_start_pfn is UINT_MAX and zone->spanned_pages is 1? So at least
> > the max/min should be swapped?
> 
> No. The CMA zone's start/end pfn have already been set to the node's
> start/end pfn (in patch 2/6).
> 
> > Also the zone_end_pfn(zone) on the second line already sees the
> > changes to zone->zone_start_pfn in the first line, so it's kind of a
> > mess. You should probably cache zone_end_pfn() to a temporary
> > variable before changing zone_start_pfn.
> 
> You're right, although it doesn't cause any problem. I looked at the
> code again and found that the max/min aren't needed. The calculated
> start/end pfn should be in between the node's start/end pfn, so
> max(zone->zone_start_pfn, start_pfn) will return start_pfn and the
> messed-up min(zone_end_pfn(zone), end_pfn) will return end_pfn in all
> cases.
> 
> Anyway, I will fix it as follows.
> 
> zone->zone_start_pfn = start_pfn;
> zone->spanned_pages = end_pfn - start_pfn;

Hello,

Here is the fixed version.

--->8
From 93fb05a83d74f9e2c8caebc2fa6d1a8807c9ffb6 Mon Sep 17 00:00:00 2001
From: Joonsoo Kim 
Date: Thu, 24 Mar 2016 22:29:10 +0900
Subject: [PATCH] mm/cma: populate ZONE_CMA

Until now, reserved pages for CMA have been managed in the ordinary
zones that their pfns belong to. This approach has numerous problems
and fixing them isn't easy (they are mentioned in the previous patch).
To fix this situation, ZONE_CMA was introduced in the previous patch,
but it is not yet populated. This patch implements population of
ZONE_CMA by stealing reserved pages from the ordinary zones.

Unlike the previous implementation, where a kernel allocation request
with __GFP_MOVABLE could be serviced from the CMA region, in the new
approach only allocation requests with GFP_HIGHUSER_MOVABLE can be
serviced from the CMA region. This is an inevitable design decision
when using the zone implementation, because ZONE_CMA could contain
highmem. Due to this decision, ZONE_CMA will behave like ZONE_HIGHMEM
or ZONE_MOVABLE.

I don't think this will be a problem, because most file cache pages
and anonymous pages are requested with GFP_HIGHUSER_MOVABLE. This is
supported by the fact that there are many systems with ZONE_HIGHMEM
and they work fine. A notable disadvantage is that we cannot use these
pages for the blockdev file cache, because it usually has __GFP_MOVABLE
but not __GFP_HIGHMEM and __GFP_USER. However, this trade-off has an
upside: in my experience, blockdev file cache pages are one of the top
reasons that cma_alloc() fails temporarily, so excluding that case
gives a stronger guarantee of cma_alloc() success.

The implementation itself is easy to understand: steal the pages when
a CMA area is initialized and recalculate the various per-zone
stats/thresholds.

Reviewed-by: Aneesh Kumar K.V 
Signed-off-by: Joonsoo Kim 
---
 include/linux/memory_hotplug.h |  3 --
 include/linux/mm.h |  1 +
 mm/cma.c   | 62 ++
 mm/internal.h  |  3 ++
 mm/page_alloc.c| 29 +---
 5 files changed, 86 insertions(+), 12 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 01033fa..ea5af47 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -198,9 +198,6 @@ extern void get_page_bootmem(unsigned long ingo, struct page *page,
 void mem_hotplug_begin(void);
 void mem_hotplug_done(void);
 
-extern void set_zone_contiguous(struct zone *zone);
-extern void clear_zone_contiguous(struct 

Re: [PATCH v6 3/6] mm/cma: populate ZONE_CMA

2016-10-18 Thread Joonsoo Kim
On Tue, Oct 18, 2016 at 09:42:57AM +0200, Vlastimil Babka wrote:
> On 10/14/2016 05:03 AM, js1...@gmail.com wrote:
> >@@ -145,6 +145,35 @@ static int __init cma_activate_area(struct cma *cma)
> > static int __init cma_init_reserved_areas(void)
> > {
> > int i;
> >+struct zone *zone;
> >+pg_data_t *pgdat;
> >+
> >+if (!cma_area_count)
> >+return 0;
> >+
> >+for_each_online_pgdat(pgdat) {
> >+unsigned long start_pfn = UINT_MAX, end_pfn = 0;
> >+
> >+for (i = 0; i < cma_area_count; i++) {
> >+if (pfn_to_nid(cma_areas[i].base_pfn) !=
> >+pgdat->node_id)
> >+continue;
> >+
> >+start_pfn = min(start_pfn, cma_areas[i].base_pfn);
> >+end_pfn = max(end_pfn, cma_areas[i].base_pfn +
> >+cma_areas[i].count);
> >+}
> >+
> >+if (!end_pfn)
> >+continue;
> >+
> >+zone = &pgdat->node_zones[ZONE_CMA];
> >+
> >+/* ZONE_CMA doesn't need to exceed CMA region */
> >+zone->zone_start_pfn = max(zone->zone_start_pfn, start_pfn);
> >+zone->spanned_pages = min(zone_end_pfn(zone), end_pfn) -
> >+zone->zone_start_pfn;
> 
> Hmm, do the max/min here work as intended? IIUC the initial

Yeap.

> zone_start_pfn is UINT_MAX and zone->spanned_pages is 1? So at least
> the max/min should be swapped?

No. The CMA zone's start/end pfn have already been set to the node's
start/end pfn (in patch 2/6).

> Also the zone_end_pfn(zone) on the second line already sees the
> changes to zone->zone_start_pfn in the first line, so it's kind of a
> mess. You should probably cache zone_end_pfn() to a temporary
> variable before changing zone_start_pfn.

You're right, although it doesn't cause any problem. I looked at the
code again and found that the max/min aren't needed. The calculated
start/end pfn should be in between the node's start/end pfn, so
max(zone->zone_start_pfn, start_pfn) will return start_pfn and the
messed-up min(zone_end_pfn(zone), end_pfn) will return end_pfn in all
cases.

Anyway, I will fix it as follows.

zone->zone_start_pfn = start_pfn;
zone->spanned_pages = end_pfn - start_pfn;
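
For clarity, the per-node part of cma_init_reserved_areas() then looks
roughly like this (a sketch of the intended result, not the exact hunk):

	for_each_online_pgdat(pgdat) {
		unsigned long start_pfn = UINT_MAX, end_pfn = 0;

		/* Find the pfn range covered by the CMA areas on this node. */
		for (i = 0; i < cma_area_count; i++) {
			if (pfn_to_nid(cma_areas[i].base_pfn) != pgdat->node_id)
				continue;

			start_pfn = min(start_pfn, cma_areas[i].base_pfn);
			end_pfn = max(end_pfn, cma_areas[i].base_pfn +
					       cma_areas[i].count);
		}

		/* No CMA area on this node, nothing to populate. */
		if (!end_pfn)
			continue;

		zone = &pgdat->node_zones[ZONE_CMA];

		/* ZONE_CMA doesn't need to exceed the CMA region. */
		zone->zone_start_pfn = start_pfn;
		zone->spanned_pages = end_pfn - start_pfn;
	}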

Thanks.



Re: [PATCH v6 3/6] mm/cma: populate ZONE_CMA

2016-10-18 Thread Vlastimil Babka

On 10/14/2016 05:03 AM, js1...@gmail.com wrote:

@@ -145,6 +145,35 @@ static int __init cma_activate_area(struct cma *cma)
 static int __init cma_init_reserved_areas(void)
 {
int i;
+   struct zone *zone;
+   pg_data_t *pgdat;
+
+   if (!cma_area_count)
+   return 0;
+
+   for_each_online_pgdat(pgdat) {
+   unsigned long start_pfn = UINT_MAX, end_pfn = 0;
+
+   for (i = 0; i < cma_area_count; i++) {
+   if (pfn_to_nid(cma_areas[i].base_pfn) !=
+   pgdat->node_id)
+   continue;
+
+   start_pfn = min(start_pfn, cma_areas[i].base_pfn);
+   end_pfn = max(end_pfn, cma_areas[i].base_pfn +
+   cma_areas[i].count);
+   }
+
+   if (!end_pfn)
+   continue;
+
+   zone = &pgdat->node_zones[ZONE_CMA];
+
+   /* ZONE_CMA doesn't need to exceed CMA region */
+   zone->zone_start_pfn = max(zone->zone_start_pfn, start_pfn);
+   zone->spanned_pages = min(zone_end_pfn(zone), end_pfn) -
+   zone->zone_start_pfn;


Hmm, do the max/min here work as intended? IIUC the initial 
zone_start_pfn is UINT_MAX and zone->spanned_pages is 1? So at least the 
max/min should be swapped?
Also the zone_end_pfn(zone) on the second line already sees the changes 
to zone->zone_start_pfn in the first line, so it's kind of a mess. You 
should probably cache zone_end_pfn() to a temporary variable before 
changing zone_start_pfn.
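
I.e. something along these lines (just a sketch; "old_end_pfn" is only an
illustrative name for the temporary):

	/* Evaluate the old zone end before zone_start_pfn is modified. */
	unsigned long old_end_pfn = zone_end_pfn(zone);

	/* ZONE_CMA doesn't need to exceed CMA region */
	zone->zone_start_pfn = max(zone->zone_start_pfn, start_pfn);
	zone->spanned_pages = min(old_end_pfn, end_pfn) - zone->zone_start_pfn;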



+   }


I'm guessing the initial values come from this part in patch 2/6:


@@ -5723,6 +5738,8 @@ static void __meminit calculate_node_totalpages(struct pglist_data *pgdat,
unsigned long *zholes_size)
 {
unsigned long realtotalpages = 0, totalpages = 0;
+   unsigned long zone_cma_start_pfn = UINT_MAX;
+   unsigned long zone_cma_end_pfn = 0;
enum zone_type i;

for (i = 0; i < MAX_NR_ZONES; i++) {
@@ -5730,6 +5747,13 @@ static void __meminit calculate_node_totalpages(struct pglist_data *pgdat,
unsigned long zone_start_pfn, zone_end_pfn;
unsigned long size, real_size;

+   if (is_zone_cma_idx(i)) {
+   zone->zone_start_pfn = zone_cma_start_pfn;
+   size = zone_cma_end_pfn - zone_cma_start_pfn;
+   real_size = 0;
+   goto init_zone;
+   }




[PATCH v6 3/6] mm/cma: populate ZONE_CMA

2016-10-13 Thread js1304
From: Joonsoo Kim 

Until now, reserved pages for CMA have been managed in the ordinary
zones that their pfns belong to. This approach has numerous problems
and fixing them isn't easy (they are mentioned in the previous patch).
To fix this situation, ZONE_CMA was introduced in the previous patch,
but it is not yet populated. This patch implements population of
ZONE_CMA by stealing reserved pages from the ordinary zones.

Unlike the previous implementation, where a kernel allocation request
with __GFP_MOVABLE could be serviced from the CMA region, in the new
approach only allocation requests with GFP_HIGHUSER_MOVABLE can be
serviced from the CMA region. This is an inevitable design decision
when using the zone implementation, because ZONE_CMA could contain
highmem. Due to this decision, ZONE_CMA will behave like ZONE_HIGHMEM
or ZONE_MOVABLE.

I don't think this will be a problem, because most file cache pages
and anonymous pages are requested with GFP_HIGHUSER_MOVABLE. This is
supported by the fact that there are many systems with ZONE_HIGHMEM
and they work fine. A notable disadvantage is that we cannot use these
pages for the blockdev file cache, because it usually has __GFP_MOVABLE
but not __GFP_HIGHMEM and __GFP_USER. However, this trade-off has an
upside: in my experience, blockdev file cache pages are one of the top
reasons that cma_alloc() fails temporarily, so excluding that case
gives a stronger guarantee of cma_alloc() success.

The implementation itself is easy to understand: steal the pages when
a CMA area is initialized and recalculate the various per-zone
stats/thresholds.

Reviewed-by: Aneesh Kumar K.V 
Signed-off-by: Joonsoo Kim 
---
 include/linux/memory_hotplug.h |  3 --
 include/linux/mm.h |  1 +
 mm/cma.c   | 63 ++
 mm/internal.h  |  3 ++
 mm/page_alloc.c| 29 ---
 5 files changed, 87 insertions(+), 12 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 01033fa..ea5af47 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -198,9 +198,6 @@ extern void get_page_bootmem(unsigned long ingo, struct page *page,
 void mem_hotplug_begin(void);
 void mem_hotplug_done(void);
 
-extern void set_zone_contiguous(struct zone *zone);
-extern void clear_zone_contiguous(struct zone *zone);
-
 #else /* ! CONFIG_MEMORY_HOTPLUG */
 /*
  * Stub functions for when hotplug is off
diff --git a/include/linux/mm.h b/include/linux/mm.h
index e9caec6..6348d35 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1923,6 +1923,7 @@ extern __printf(2, 3)
 
 extern void zone_pcp_update(struct zone *zone);
 extern void zone_pcp_reset(struct zone *zone);
+extern void setup_zone_pageset(struct zone *zone);
 
 /* page_alloc.c */
 extern int min_free_kbytes;
diff --git a/mm/cma.c b/mm/cma.c
index 384c2cb..ba7c340 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -38,6 +38,7 @@
 #include 
 
 #include "cma.h"
+#include "internal.h"
 
 struct cma cma_areas[MAX_CMA_AREAS];
 unsigned cma_area_count;
@@ -116,10 +117,9 @@ static int __init cma_activate_area(struct cma *cma)
for (j = pageblock_nr_pages; j; --j, pfn++) {
WARN_ON_ONCE(!pfn_valid(pfn));
/*
-* alloc_contig_range requires the pfn range
-* specified to be in the same zone. Make this
-* simple by forcing the entire CMA resv range
-* to be in the same zone.
+* In init_cma_reserved_pageblock(), present_pages is
+* adjusted with assumption that all pages come from
+* a single zone. It could be fixed but not yet done.
 */
if (page_zone(pfn_to_page(pfn)) != zone)
goto err;
@@ -145,6 +145,35 @@ static int __init cma_activate_area(struct cma *cma)
 static int __init cma_init_reserved_areas(void)
 {
int i;
+   struct zone *zone;
+   pg_data_t *pgdat;
+
+   if (!cma_area_count)
+   return 0;
+
+   for_each_online_pgdat(pgdat) {
+   unsigned long start_pfn = UINT_MAX, end_pfn = 0;
+
+   for (i = 0; i < cma_area_count; i++) {
+   if (pfn_to_nid(cma_areas[i].base_pfn) !=
+   pgdat->node_id)
+   continue;
+
+   start_pfn = min(start_pfn, cma_areas[i].base_pfn);
+   end_pfn = max(end_pfn, cma_areas[i].base_pfn +
+   cma_areas[i].count);
+   }
+
+   if (!end_pfn)
+   continue;
+
+   zone = &pgdat->node_zones[ZONE_CMA];
+
+   /* ZONE_CMA doesn't need to exceed CMA region */
+   zone->zone_start_pfn = max(zone->zone_start_pfn, start_pfn);