Re: [PATCH mm] introduce reverse buddy concept to reduce buddy fragment

2017-07-04 Thread Mel Gorman
On Tue, Jul 04, 2017 at 01:24:14PM +0200, Michal Hocko wrote:
> On Tue 04-07-17 16:04:52, zhouxianrong wrote:
> > Every 2s I sample /proc/buddyinfo throughout the whole test process.
> > 
> > The last roughly 90 samples were taken after the test was done.
> 
> I've tried to explain to you that numbers without a proper testing
> methodology, the high-level metrics you are interested in, and a comparison
> to the base kernel are meaningless. I cannot draw any conclusion from
> looking at the numbers you have posted. Are high order allocations cheaper
> to do with this patch? What about an average order-0 allocation request?
> 

I have to agree. The patch is extremely complex for what it does, which
is working around a limitation of the buddy allocator in general
(buddies must be naturally aligned). There would have to be *strong*
justification that allocations fail even with compaction or a reclaim
cycle, or that latency is severely reduced -- neither of which is
evident from the data presented. It would also have to be proven that
no overhead is added in the general case. So, without extensive
justification for the complexity:

Naked-by: Mel Gorman 

> You are touching memory allocator hot paths and those are really
> sensitive to changes. It takes a lot of testing with different workloads
> to prove that no new regressions are introduced. That being said, I
> completely agree that reducing the memory fragmentation is an important
> objective but touching the page allocator and adding new branches there
> sounds like a problematic approach which would have to show _huge_
> benefits to be mergeable. Is it possible to improve khugepaged to
> accomplish the same thing?

Or, if this is CMA-related, a justification for why alloc_contig_range cannot
do the same thing with a linear walk when the initial allocation attempt fails.
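
For illustration only, such a linear walk might look roughly like the sketch
below (the exact alloc_contig_range() signature differs between kernel
versions, and the helper name is made up):

/*
 * Illustrative sketch: walk the zone linearly and retry
 * alloc_contig_range() on successive windows of the requested size
 * until one of them can be isolated and allocated.
 */
static struct page *linear_alloc_contig(struct zone *zone, unsigned long nr_pages)
{
        unsigned long pfn;

        for (pfn = zone->zone_start_pfn;
             pfn + nr_pages <= zone_end_pfn(zone); pfn += nr_pages) {
                if (alloc_contig_range(pfn, pfn + nr_pages,
                                       MIGRATE_MOVABLE, GFP_KERNEL) == 0)
                        return pfn_to_page(pfn);
        }

        return NULL;
}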

-- 
Mel Gorman
SUSE Labs


Re: [PATCH mm] introduce reverse buddy concept to reduce buddy fragment

2017-07-04 Thread Michal Hocko
On Tue 04-07-17 16:04:52, zhouxianrong wrote:
> Every 2s I sample /proc/buddyinfo throughout the whole test process.
> 
> The last roughly 90 samples were taken after the test was done.

I've tried to explain to you that numbers without a proper testing
methodology, the high-level metrics you are interested in, and a comparison
to the base kernel are meaningless. I cannot draw any conclusion from
looking at the numbers you have posted. Are high order allocations cheaper
to do with this patch? What about an average order-0 allocation request?

You are touching memory allocator hot paths and those are really
sensitive to changes. It takes a lot of testing with different workloads
to prove that no new regressions are introduced. That being said, I
completely agree that reducing the memory fragmentation is an important
objective but touching the page allocator and adding new branches there
sounds like a problematic approach which would have to show _huge_
benefits to be mergeable. Is it possible to improve khugepaged to
accomplish the same thing?
-- 
Michal Hocko
SUSE Labs


Re: [PATCH mm] introduce reverse buddy concept to reduce buddy fragment

2017-07-04 Thread zhouxianrong

Every 2s I sample /proc/buddyinfo throughout the whole test process.

The last roughly 90 samples were taken after the test was done.
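
A minimal sketch of such a sampler (illustrative only, not the exact script
used) could be:

#include <stdio.h>
#include <unistd.h>

int main(void)
{
        char buf[4096];
        size_t n;

        for (;;) {
                /* re-open every iteration so each sample is fresh */
                FILE *f = fopen("/proc/buddyinfo", "r");

                if (!f)
                        return 1;
                while ((n = fread(buf, 1, sizeof(buf), f)) > 0)
                        fwrite(buf, 1, n, stdout);
                fclose(f);
                sleep(2);       /* one sample every 2 seconds */
        }
}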

Node 0, zone  DMA
  4706   2099838266 50  5  3  2  1  2 38
 0395   1261211 57  6  1  0  0  0  0
Node 0, zone  DMA
  4691   2107833265 50  5  3  2  1  2 38
 0395   1260211 57  6  1  0  0  0  0
Node 0, zone  DMA
  1815   1437791266 51  6  2  2  1  2 38
 0244   1275211 57  6  1  0  0  0  0
Node 0, zone  DMA
  1465   1030796267 51  6  2  2  1  2 38
 0246   1279211 57  6  1  0  0  0  0
Node 0, zone  DMA
  1378   1114791260 51  6  2  2  1  2 38
 0183   1282216 54  3  1  0  0  0  0
Node 0, zone  DMA
  2605   2021619260 51  6  2  2  1  2 38
 0307   1330220 54  3  1  0  0  0  0
Node 0, zone  DMA
  2465   2026618260 51  6  2  2  1  2 38
 0312   1330220 54  3  1  0  0  0  0
Node 0, zone  DMA
   758766462224 43  6  2  2  1  2 38
 0148   1082194 50  3  1  0  0  0  0
Node 0, zone  DMA
   912939472224 43  6  2  2  1  2 38
 0174   1086194 50  3  1  0  0  0  0
Node 0, zone  DMA
   502   1049428226 44  6  2  2  1  2 38
 0187   1092198 50  3  1  0  0  0  0
Node 0, zone  DMA
   747   1338671228 46  6  2  2  1  2 38
 0222   1180204 51  3  1  0  0  0  0
Node 0, zone  DMA
   675   1351667226 46  6  2  2  1  2 38
 0220   1180204 52  3  1  0  0  0  0
Node 0, zone  DMA
   865787266220 45  6  2  2  1  2 38
 0116984203 51  3  1  0  0  0  0
Node 0, zone  DMA
  1915   1233351261 47  6  2  2  1  2 38
 0179   1191230 53  3  1  0  0  0  0
Node 0, zone  DMA
  2078   1348402258 46  6  2  2  1  2 38
 0183   1233228 53  3  1  0  0  0  0
Node 0, zone  DMA
  2940   1129457259 46  6  2  2  1  2 38
 0209   1239229 54  3  1  0  0  0  0
Node 0, zone  DMA
  2906   1127457259 46  6  2  2  1  2 38
 0222   1240230 54  3  1  0  0  0  0
Node 0, zone  DMA
  1540   1093475256 46  6  2  2  1  2 38
 0293   1234227 54  3  1  0  0  0  0
Node 0, zone  DMA
  1060   1071487257 46  6  2  2  1  2 38
 0297   1238227 54  3  1  0  0  0  0
Node 0, zone  DMA
  1693869405267 46  6  2  2  1  2 38
 0243   1480230 54  3  1  0  0  0  0
Node 0, zone  DMA
  1720928426269 46  6  2  2  1  2 38
 0260   1485230 54  3  1  0  0  0  0
Node 0, zone  DMA
   546601393269 46  6  2  2  1  2 38
 0193   1314230 54  3  1  0  0  0  0
Node 0, zone  DMA
   583336 28 42 20  3  2  2  2  2 36
 0 15 43  5 20  1  1  0  0  0  0
Node 0, zone  DMA
   592382 27 39 21  3  2  2  2  2 36
 0 17 27  5 21  1  1  0  0  0  0
Node 0, zone  DMA
  3534   1510212 78 35  6  3  3  1  2 32
 0122333 66 28  2  1  0  0  0  0
Node 0, zone  DMA
  3411   1521212 78 35  6  3  3  1  2 32
 0123334 67 28  2  1  0  0  0  0
Node 0, zone  DMA
  1521   1521211 79 35  6  3  3  1  2 32
 0118336 68 28  2  1  0  0  0  0
Node 0, zone  DMA
 3  1  2  0  3  3  1  1  1  3 18
 0  2  

Re: [PATCH mm] introduce reverse buddy concept to reduce buddy fragment

2017-07-04 Thread zhouxianrong

I will do the test again; I will tell you the result in a few minutes.

On 2017/7/4 14:52, Michal Hocko wrote:

On Tue 04-07-17 09:21:00, zhouxianrong wrote:

The test was done as follows:

1. The environment is Android 7.0, the kernel is 4.1, and managed memory is 3.5GB.


There have been many changes in the compaction proper since then. Do you
see the same problem with the current upstream kernel?


2. Every 4s start up one APK; more than 100 APKs need to be started in total.
3. After finishing step 2, sample buddyinfo once and record the result.


How stable are those results?





Re: [PATCH mm] introduce reverse buddy concept to reduce buddy fragment

2017-07-04 Thread Michal Hocko
On Tue 04-07-17 09:21:00, zhouxianrong wrote:
> The test was done as follows:
> 
> 1. The environment is Android 7.0, the kernel is 4.1, and managed memory is
> 3.5GB.

There have been many changes in the compaction proper since then. Do you
see the same problem with the current upstream kernel?

> 2. Every 4s start up one APK; more than 100 APKs need to be started in total.
> 3. After finishing step 2, sample buddyinfo once and record the result.

How stable are those results?
-- 
Michal Hocko
SUSE Labs


Re: [PATCH mm] introduce reverse buddy concept to reduce buddy fragment

2017-07-03 Thread zhouxianrong

The test was done as follows:

1. The environment is Android 7.0, the kernel is 4.1, and managed memory is 3.5GB.
2. Every 4s start up one APK; more than 100 APKs need to be started in total.
3. After finishing step 2, sample buddyinfo once and record the result.

On 2017/7/3 23:33, Michal Hocko wrote:

On Mon 03-07-17 20:02:16, zhouxianrong wrote:
[...]

From the above, I think the result after applying the patch is better.


You haven't described your testing methodology, nor the workload that was
tested. As such this data is completely meaningless.





Re: [PATCH mm] introduce reverse buddy concept to reduce buddy fragment

2017-07-03 Thread Michal Hocko
On Mon 03-07-17 20:02:16, zhouxianrong wrote:
[...]
> From the above, I think the result after applying the patch is better.

You haven't described your testing methodology, nor the workload that was
tested. As such this data is completely meaningless.

-- 
Michal Hocko
SUSE Labs


Re: [PATCH mm] introduce reverse buddy concept to reduce buddy fragment

2017-07-03 Thread zhouxianrong



On 2017/7/3 15:48, Michal Hocko wrote:

On Fri 30-06-17 19:25:41, zhouxianr...@huawei.com wrote:

From: zhouxianrong 

When the buddy allocator is fragmented, I find that there are still some
pages in an AFFA pattern. A is allocated, F is free; AF is a buddy pair for
order n, and FA is a buddy pair for order n as well.


Could you quantify how often this happens and how much of a problem
it actually is? Is there any specific workload that would suffer from
such artificial fragmentation?


I want to compose the FF as order n + 1, aligned to order n rather than
n + 1. This patch breaks the buddy rule that a block is aligned to its own
order. I think we can do so, except for the kernel stack, because the
alignment requirement comes from the buddy allocator itself rather than
from the user.


Why do you think the stack is a problem here?


For the kernel stack requirement I add __GFP_NOREVERSEBUDDY for this purpose.

A sample is shown below.

Node 0, zone  DMA
  1389   1765342272  2  0  0  0  0  0  0
 0 75   4398   1560379 27  2  0  0  0  0
Node 0, zone   Normal
20 24 14  2  0  0  0  0  0  0  0
 0  6228  3  0  0  0  0  0  0  0


At the sample moment, without this patch, the output would look like below:

Node 0, zone  DMA
   (1389 + 75 * 2)   (1765 + 4398 * 2)   (342 + 1560 * 2)   (272 + 379 * 2)   (2 + 27 * 2)   (0 + 2 * 2)   0  0  0  0  0
Node 0, zone   Normal
   (20 + 6 * 2)   (24 + 228 * 2)   (14 + 3 * 2)   2  0  0  0  0  0  0  0

I find the AFFA pattern in the lower-order free_list and move the FF pair
into the higher-order free_list_reverse.

Now consider only the DMA zone and look at the difference.

Node 0, zone  DMA
   (1389 + 75 * 2)   (1765 + 4398 * 2)   (342 + 1560 * 2)   (272 + 379 * 2)   (2 + 27 * 2)   (0 + 2 * 2)   0  0  0  0  0

it is equal to

Node 0, zone  DMA
   1539   10561   3804   1302   5840   0   0   0   0   --  1)

after applying this patch

Node 0, zone  DMA
   1389   1765342272  2  0  0  0  0  0  0
   0  75  4398   1560379 27 2  0  0  0  0

it is equivalent to

Node 0, zone  DMA
   (1389 + 0)   (1765 + 75)   (342 + 4398)   (272 + 1560)   (2 + 379)   (0 + 27)   (0 + 2)   0  0  0  0

it is equal to

Node 0, zone  DMA
   1389   1840   4740   1832   381   27   2   0   0   0   0   --  2)

Let's write 1) and 2) together and compare them:

   1539   10561   3804   1302   5840   0   0   0   0   - 1)
   1389   1840   4740   1832   381   27   2   0   0   0   0   - 2)

From the above, I think the result after applying the patch is better.
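
Put differently, reading the two buddyinfo lines per zone (the normal
free_list counts on the first line and the free_list_reverse counts on the
second), the per-order folding used above is:

   without the patch:  equivalent[n] = free_list[n] + 2 * free_list_reverse[n + 1]
   with the patch:     equivalent[n] = free_list[n] + free_list_reverse[n]

because a free_list_reverse entry at order n covers 2^n pages built from two
order n - 1 halves.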



The patch does not consider fallback allocation for now.


The patch is missing the crucial information required for any
optimization: some numbers to compare how much it helps. The above
output of buddyinfo is pointless without any baseline to compare to. Also,
which workloads would benefit from this change and how much? It is also
a non-trivial amount of code in the guts of the page allocator, so this
really needs a _much_ better explanation.

I haven't looked closely at the code yet, but a quick look at
set_reverse_free_area scared me away.


Signed-off-by: zhouxianrong 
---
 include/linux/gfp.h |8 +-
 include/linux/mmzone.h  |2 +
 include/linux/page-flags.h  |9 ++
 include/linux/thread_info.h |5 +-
 mm/compaction.c |   17 
 mm/internal.h   |7 ++
 mm/page_alloc.c |  222 +++
 mm/vmstat.c |5 +-
 8 files changed, 251 insertions(+), 24 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index db373b9..f63d4d9 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -40,6 +40,7 @@
 #define ___GFP_DIRECT_RECLAIM  0x40u
 #define ___GFP_WRITE   0x80u
 #define ___GFP_KSWAPD_RECLAIM  0x100u
+#define ___GFP_NOREVERSEBUDDY  0x200u
 /* If the above are modified, __GFP_BITS_SHIFT may need updating */

 /*
@@ -171,6 +172,10 @@
  * __GFP_NOTRACK_FALSE_POSITIVE is an alias of __GFP_NOTRACK. It's a means of
  *   distinguishing in the source between false positives and allocations that
  *   cannot be supported (e.g. page tables).
+ *
+ * __GFP_NOREVERSEBUDDY does not allocate pages from reverse buddy list
+ *   of current order. It make sure that allocation is alignment to same order
+ *   with length order.
  */
 #define __GFP_COLD ((__force gfp_t)___GFP_COLD)
 #define __GFP_NOWARN   ((__force gfp_t)___GFP_NOWARN)
@@ -178,9 +183,10 @@
 #define __GFP_ZERO ((__force gfp_t)___GFP_ZERO)
 #define __GFP_NOTRACK  ((__force gfp_t)___GFP_NOTRACK)
 #define 


Re: [PATCH mm] introduce reverse buddy concept to reduce buddy fragment

2017-07-03 Thread Michal Hocko
On Fri 30-06-17 19:25:41, zhouxianr...@huawei.com wrote:
> From: zhouxianrong 
> 
> When the buddy allocator is fragmented, I find that there are still some
> pages in an AFFA pattern. A is allocated, F is free; AF is a buddy pair for
> order n, and FA is a buddy pair for order n as well.

Could you quantify how often this happens and how much of a problem
it actually is? Is there any specific workload that would suffer from
such artificial fragmentation?

> I want to compose the FF as order n + 1, aligned to order n rather than
> n + 1. This patch breaks the buddy rule that a block is aligned to its own
> order. I think we can do so, except for the kernel stack, because the
> alignment requirement comes from the buddy allocator itself rather than
> from the user.

Why do you think the stack is a problem here?

> For the kernel stack requirement I add __GFP_NOREVERSEBUDDY for this
> purpose.
> 
> A sample is shown below.
> 
> Node 0, zone  DMA
>   1389   1765342272  2  0  0  0  0  0  0
>0 75   4398   1560379 27  2  0  0  0  0
> Node 0, zone   Normal
>   20 24 14  2  0  0  0  0  0  0  0
>0  6228  3  0  0  0  0  0  0  0
> 
> The patch does not consider fallback allocation for now.

The patch is missing the crucial information required for any
optimization: some numbers to compare how much it helps. The above
output of buddyinfo is pointless without any baseline to compare to. Also,
which workloads would benefit from this change and how much? It is also
a non-trivial amount of code in the guts of the page allocator, so this
really needs a _much_ better explanation.

I haven't looked closely at the code yet, but a quick look at
set_reverse_free_area scared me away.
 
> Signed-off-by: zhouxianrong 
> ---
>  include/linux/gfp.h |8 +-
>  include/linux/mmzone.h  |2 +
>  include/linux/page-flags.h  |9 ++
>  include/linux/thread_info.h |5 +-
>  mm/compaction.c |   17 
>  mm/internal.h   |7 ++
>  mm/page_alloc.c |  222 
> +++
>  mm/vmstat.c |5 +-
>  8 files changed, 251 insertions(+), 24 deletions(-)
> 
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index db373b9..f63d4d9 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -40,6 +40,7 @@
>  #define ___GFP_DIRECT_RECLAIM0x40u
>  #define ___GFP_WRITE 0x80u
>  #define ___GFP_KSWAPD_RECLAIM0x100u
> +#define ___GFP_NOREVERSEBUDDY0x200u
>  /* If the above are modified, __GFP_BITS_SHIFT may need updating */
>  
>  /*
> @@ -171,6 +172,10 @@
>   * __GFP_NOTRACK_FALSE_POSITIVE is an alias of __GFP_NOTRACK. It's a means of
>   *   distinguishing in the source between false positives and allocations 
> that
>   *   cannot be supported (e.g. page tables).
> + *
> + * __GFP_NOREVERSEBUDDY does not allocate pages from reverse buddy list
> + *   of current order. It make sure that allocation is alignment to same 
> order
> + *   with length order.
>   */
>  #define __GFP_COLD   ((__force gfp_t)___GFP_COLD)
>  #define __GFP_NOWARN ((__force gfp_t)___GFP_NOWARN)
> @@ -178,9 +183,10 @@
>  #define __GFP_ZERO   ((__force gfp_t)___GFP_ZERO)
>  #define __GFP_NOTRACK((__force gfp_t)___GFP_NOTRACK)
>  #define __GFP_NOTRACK_FALSE_POSITIVE (__GFP_NOTRACK)
> +#define __GFP_NOREVERSEBUDDY ((__force gfp_t)___GFP_NOREVERSEBUDDY)
>  
>  /* Room for N __GFP_FOO bits */
> -#define __GFP_BITS_SHIFT 25
> +#define __GFP_BITS_SHIFT 26
>  #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
>  
>  /*
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 8e02b37..94237fe 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -89,7 +89,9 @@ enum {
>  
>  struct free_area {
>   struct list_headfree_list[MIGRATE_TYPES];
> + struct list_headfree_list_reverse[MIGRATE_TYPES];
>   unsigned long   nr_free;
> + unsigned long   nr_free_reverse;
>  };
>  
>  struct pglist_data;
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 6b5818d..39d17d7 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -675,6 +675,15 @@ static inline int TestClearPageDoubleMap(struct page 
> *page)
>  #define PAGE_KMEMCG_MAPCOUNT_VALUE   (-512)
>  PAGE_MAPCOUNT_OPS(Kmemcg, KMEMCG)
>  
> +/*
> + * ReverseBuddy is enabled for the buddy allocator that allow allocating
> + * two adjacent same free order blocks other than buddy blocks and
> + * composing them as a order + 1 block. It is for reducing buddy
> + * fragment.
> + */
> +#define PAGE_REVERSE_BUDDY_MAPCOUNT_VALUE(-1024)
> +PAGE_MAPCOUNT_OPS(ReverseBuddy, REVERSE_BUDDY)
> +
>  extern bool is_free_buddy_page(struct page *page);
>  
>  


[PATCH mm] introduce reverse buddy concept to reduce buddy fragment

2017-06-30 Thread zhouxianrong
From: zhouxianrong 

When the buddy allocator is fragmented, I find that there are still some
pages in an AFFA pattern. A is allocated, F is free; AF is a buddy pair for
order n, and FA is a buddy pair for order n as well. I want to compose the
FF as order n + 1, aligned to order n rather than n + 1. This patch breaks
the buddy rule that a block is aligned to its own order. I think we can do
so, except for the kernel stack, because the alignment requirement comes
from the buddy allocator itself rather than from the user. For the kernel
stack requirement I add __GFP_NOREVERSEBUDDY for this purpose.
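
To illustrate the idea with a minimal sketch (the helper names below are
made up for illustration and are not code from this patch):

/*
 * In an AFFA layout the two free order-n blocks are adjacent but are
 * NOT buddies of each other, so the normal merge in __free_one_page()
 * cannot combine them.  The "reverse buddy" is that adjacent same-order
 * free block; together they cover 2^(n + 1) pages aligned only to
 * order n.
 */
static bool is_normal_buddy(unsigned long pfn, unsigned int order,
                            unsigned long other_pfn)
{
        /* the classic buddy differs from pfn only in bit 'order' */
        return (pfn ^ (1UL << order)) == other_pfn;
}

static bool is_reverse_buddy(unsigned long pfn, unsigned int order,
                             unsigned long other_pfn)
{
        return other_pfn == pfn + (1UL << order) &&
               !is_normal_buddy(pfn, order, other_pfn);
}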

A sample is shown below.

Node 0, zone  DMA
  1389   1765342272  2  0  0  0  0  0  0
 0 75   4398   1560379 27  2  0  0  0  0
Node 0, zone   Normal
20 24 14  2  0  0  0  0  0  0  0
 0  6228  3  0  0  0  0  0  0  0

The patch does not consider fallback allocation for now.

Signed-off-by: zhouxianrong 
---
 include/linux/gfp.h |8 +-
 include/linux/mmzone.h  |2 +
 include/linux/page-flags.h  |9 ++
 include/linux/thread_info.h |5 +-
 mm/compaction.c |   17 
 mm/internal.h   |7 ++
 mm/page_alloc.c |  222 +++
 mm/vmstat.c |5 +-
 8 files changed, 251 insertions(+), 24 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index db373b9..f63d4d9 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -40,6 +40,7 @@
 #define ___GFP_DIRECT_RECLAIM  0x40u
 #define ___GFP_WRITE   0x80u
 #define ___GFP_KSWAPD_RECLAIM  0x100u
+#define ___GFP_NOREVERSEBUDDY  0x200u
 /* If the above are modified, __GFP_BITS_SHIFT may need updating */
 
 /*
@@ -171,6 +172,10 @@
  * __GFP_NOTRACK_FALSE_POSITIVE is an alias of __GFP_NOTRACK. It's a means of
  *   distinguishing in the source between false positives and allocations that
  *   cannot be supported (e.g. page tables).
+ *
+ * __GFP_NOREVERSEBUDDY does not allocate pages from reverse buddy list
+ *   of current order. It make sure that allocation is alignment to same order
+ *   with length order.
  */
 #define __GFP_COLD ((__force gfp_t)___GFP_COLD)
 #define __GFP_NOWARN   ((__force gfp_t)___GFP_NOWARN)
@@ -178,9 +183,10 @@
 #define __GFP_ZERO ((__force gfp_t)___GFP_ZERO)
 #define __GFP_NOTRACK  ((__force gfp_t)___GFP_NOTRACK)
 #define __GFP_NOTRACK_FALSE_POSITIVE (__GFP_NOTRACK)
+#define __GFP_NOREVERSEBUDDY ((__force gfp_t)___GFP_NOREVERSEBUDDY)
 
 /* Room for N __GFP_FOO bits */
-#define __GFP_BITS_SHIFT 25
+#define __GFP_BITS_SHIFT 26
 #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
 
 /*
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 8e02b37..94237fe 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -89,7 +89,9 @@ enum {
 
 struct free_area {
struct list_headfree_list[MIGRATE_TYPES];
+   struct list_headfree_list_reverse[MIGRATE_TYPES];
unsigned long   nr_free;
+   unsigned long   nr_free_reverse;
 };
 
 struct pglist_data;
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 6b5818d..39d17d7 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -675,6 +675,15 @@ static inline int TestClearPageDoubleMap(struct page *page)
 #define PAGE_KMEMCG_MAPCOUNT_VALUE (-512)
 PAGE_MAPCOUNT_OPS(Kmemcg, KMEMCG)
 
+/*
+ * ReverseBuddy is enabled for the buddy allocator that allow allocating
+ * two adjacent same free order blocks other than buddy blocks and
+ * composing them as a order + 1 block. It is for reducing buddy
+ * fragment.
+ */
+#define PAGE_REVERSE_BUDDY_MAPCOUNT_VALUE  (-1024)
+PAGE_MAPCOUNT_OPS(ReverseBuddy, REVERSE_BUDDY)
+
 extern bool is_free_buddy_page(struct page *page);
 
 __PAGEFLAG(Isolated, isolated, PF_ANY);
diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h
index 5837387..b4a1605 100644
--- a/include/linux/thread_info.h
+++ b/include/linux/thread_info.h
@@ -28,9 +28,10 @@
 
 #ifdef CONFIG_DEBUG_STACK_USAGE
 # define THREADINFO_GFP(GFP_KERNEL_ACCOUNT | __GFP_NOTRACK | \
-__GFP_ZERO)
+__GFP_NOREVERSEBUDDY | __GFP_ZERO)
 #else
-# define THREADINFO_GFP(GFP_KERNEL_ACCOUNT | __GFP_NOTRACK)
+# define THREADINFO_GFP(GFP_KERNEL_ACCOUNT | __GFP_NOTRACK | \
+__GFP_NOREVERSEBUDDY)
 #endif
 
 /*
diff --git a/mm/compaction.c b/mm/compaction.c
index 0fdfde0..a43f169 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -768,6 +768,20 @@ static 


[PATCH mm] introduce reverse buddy concept to reduce buddy fragment

2017-06-30 Thread zhouxianrong
From: z00281421 


Signed-off-by: z00281421 
---
 include/linux/gfp.h |8 +-
 include/linux/mmzone.h  |2 +
 include/linux/page-flags.h  |9 ++
 include/linux/thread_info.h |5 +-
 mm/compaction.c |   17 
 mm/internal.h   |7 ++
 mm/page_alloc.c |  222 +++
 mm/vmstat.c |5 +-
 8 files changed, 251 insertions(+), 24 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index db373b9..f63d4d9 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -40,6 +40,7 @@
 #define ___GFP_DIRECT_RECLAIM  0x40u
 #define ___GFP_WRITE   0x80u
 #define ___GFP_KSWAPD_RECLAIM  0x100u
+#define ___GFP_NOREVERSEBUDDY  0x200u
 /* If the above are modified, __GFP_BITS_SHIFT may need updating */
 
 /*
@@ -171,6 +172,10 @@
  * __GFP_NOTRACK_FALSE_POSITIVE is an alias of __GFP_NOTRACK. It's a means of
  *   distinguishing in the source between false positives and allocations that
  *   cannot be supported (e.g. page tables).
+ *
+ * __GFP_NOREVERSEBUDDY does not allocate pages from reverse buddy list
+ *   of current order. It make sure that allocation is alignment to same order
+ *   with length order.
  */
 #define __GFP_COLD ((__force gfp_t)___GFP_COLD)
 #define __GFP_NOWARN   ((__force gfp_t)___GFP_NOWARN)
@@ -178,9 +183,10 @@
 #define __GFP_ZERO ((__force gfp_t)___GFP_ZERO)
 #define __GFP_NOTRACK  ((__force gfp_t)___GFP_NOTRACK)
 #define __GFP_NOTRACK_FALSE_POSITIVE (__GFP_NOTRACK)
+#define __GFP_NOREVERSEBUDDY ((__force gfp_t)___GFP_NOREVERSEBUDDY)
 
 /* Room for N __GFP_FOO bits */
-#define __GFP_BITS_SHIFT 25
+#define __GFP_BITS_SHIFT 26
 #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
 
 /*
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 8e02b37..94237fe 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -89,7 +89,9 @@ enum {
 
 struct free_area {
struct list_headfree_list[MIGRATE_TYPES];
+   struct list_headfree_list_reverse[MIGRATE_TYPES];
unsigned long   nr_free;
+   unsigned long   nr_free_reverse;
 };
 
 struct pglist_data;
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 6b5818d..39d17d7 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -675,6 +675,15 @@ static inline int TestClearPageDoubleMap(struct page *page)
 #define PAGE_KMEMCG_MAPCOUNT_VALUE (-512)
 PAGE_MAPCOUNT_OPS(Kmemcg, KMEMCG)
 
+/*
+ * ReverseBuddy is enabled for the buddy allocator that allow allocating
+ * two adjacent same free order blocks other than buddy blocks and
+ * composing them as a order + 1 block. It is for reducing buddy
+ * fragment.
+ */
+#define PAGE_REVERSE_BUDDY_MAPCOUNT_VALUE  (-1024)
+PAGE_MAPCOUNT_OPS(ReverseBuddy, REVERSE_BUDDY)
+
 extern bool is_free_buddy_page(struct page *page);
 
 __PAGEFLAG(Isolated, isolated, PF_ANY);
diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h
index 5837387..b4a1605 100644
--- a/include/linux/thread_info.h
+++ b/include/linux/thread_info.h
@@ -28,9 +28,10 @@
 
 #ifdef CONFIG_DEBUG_STACK_USAGE
 # define THREADINFO_GFP(GFP_KERNEL_ACCOUNT | __GFP_NOTRACK | \
-__GFP_ZERO)
+__GFP_NOREVERSEBUDDY | __GFP_ZERO)
 #else
-# define THREADINFO_GFP(GFP_KERNEL_ACCOUNT | __GFP_NOTRACK)
+# define THREADINFO_GFP(GFP_KERNEL_ACCOUNT | __GFP_NOTRACK | \
+__GFP_NOREVERSEBUDDY)
 #endif
 
 /*
diff --git a/mm/compaction.c b/mm/compaction.c
index 0fdfde0..a43f169 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -768,6 +768,20 @@ static bool too_many_isolated(struct zone *zone)
continue;
}
 
+   if (PageReverseBuddy(page)) {
+   unsigned long freepage_order = page_order_unsafe(page);
+
+   /*
+* Without lock, we cannot be sure that what we got is
+* a valid page order. Consider only values in the
+* valid order range to prevent low_pfn overflow.
+*/
+   if (freepage_order > 0 &&
+   freepage_order < MAX_ORDER - 1)
+   low_pfn += (1UL << (freepage_order + 1)) - 1;
+   continue;
+   }
+
/*
 * Regardless of being on LRU, compound pages such as THP and
 * hugetlbfs are not to be compacted. We can potentially save
@@ -1005,6 +1019,9 @@ static bool suitable_migration_target(struct 
compact_control *cc,
return false;
}
 
+   if (PageReverseBuddy(page))
+   return false;
+
/* If the block is MIGRATE_MOVABLE or