Re: [PATCH mm] introduce reverse buddy concept to reduce buddy fragment
On Tue, Jul 04, 2017 at 01:24:14PM +0200, Michal Hocko wrote:
> On Tue 04-07-17 16:04:52, zhouxianrong wrote:
> > every 2s i sample /proc/buddyinfo in the whole test process.
> >
> > the last about 90 samples were sampled after the test was done.
>
> I've tried to explain to you that numbers without a proper testing
> methodology, the high-level metrics you are interested in, and a
> comparison to the base kernel are meaningless. I cannot draw any
> conclusion from looking at the numbers you have posted. Are high-order
> allocations cheaper with this patch? What about an average order-0
> allocation request?

I have to agree. The patch is extremely complex for what it does, which is
working around a limitation of the buddy allocator in general (buddies
must be naturally aligned). There would have to be *strong* justification
that allocations fail even with compaction or a reclaim cycle, or that the
latency is severely reduced -- neither of which is evident from the data
presented. It would also have to be proven that there is no overhead added
in the general case. So, without extensive justification for the
complexity;

Nacked-by: Mel Gorman

> You are touching memory allocator hot paths and those are really
> sensitive to changes. It takes a lot of testing with different workloads
> to prove that no new regressions are introduced. That being said, I
> completely agree that reducing memory fragmentation is an important
> objective, but touching the page allocator and adding new branches there
> sounds like a problematic approach which would have to show _huge_
> benefits to be mergeable. Is it possible to improve khugepaged to
> accomplish the same thing?

Or, if this is CMA related, a justification why alloc_contig_range cannot
do the same thing with a linear walk when the initial allocation attempt
fails.

-- 
Mel Gorman
SUSE Labs
Re: [PATCH mm] introduce reverse buddy concept to reduce buddy fragment
On Tue 04-07-17 16:04:52, zhouxianrong wrote:
> every 2s i sample /proc/buddyinfo in the whole test process.
>
> the last about 90 samples were sampled after the test was done.

I've tried to explain to you that numbers without a proper testing
methodology, the high-level metrics you are interested in, and a
comparison to the base kernel are meaningless. I cannot draw any
conclusion from looking at the numbers you have posted. Are high-order
allocations cheaper with this patch? What about an average order-0
allocation request?

You are touching memory allocator hot paths and those are really
sensitive to changes. It takes a lot of testing with different workloads
to prove that no new regressions are introduced. That being said, I
completely agree that reducing memory fragmentation is an important
objective, but touching the page allocator and adding new branches there
sounds like a problematic approach which would have to show _huge_
benefits to be mergeable. Is it possible to improve khugepaged to
accomplish the same thing?

-- 
Michal Hocko
SUSE Labs
Re: [PATCH mm] introduce reverse buddy concept to reduce buddy fragment
every 2s i sample /proc/buddyinfo in the whole test process. the last
about 90 samples were sampled after the test was done.

[A long series of "Node 0, zone DMA" sample rows followed, each giving
the per-order free counts and the patch's per-order reverse-buddy
counts; the columns ran together in the archive, so the raw numbers are
not reliably recoverable and are elided here.]
Re: [PATCH mm] introduce reverse buddy concept to reduce buddy fragment
i will do the test again; in a few minutes i will tell you the result.

On 2017/7/4 14:52, Michal Hocko wrote:
> On Tue 04-07-17 09:21:00, zhouxianrong wrote:
> > the test was done as follows:
> >
> > 1. the environment is android 7.0, the kernel is 4.1, and managed
> >    memory is 3.5GB
>
> There have been many changes in compaction proper since then. Do you
> see the same problem with the current upstream kernel?
>
> > 2. every 4s start up one apk; more than 100 apks in total need to
> >    start up
> > 3. after finishing step 2, sample buddyinfo once and get the result
>
> How stable are those results?
Re: [PATCH mm] introduce reverse buddy concept to reduce buddy fragment
On Tue 04-07-17 09:21:00, zhouxianrong wrote:
> the test was done as follows:
>
> 1. the environment is android 7.0 and kernel is 4.1 and managed memory
>    is 3.5GB

There have been many changes in compaction proper since then. Do you see
the same problem with the current upstream kernel?

> 2. every 4s startup one apk, total 100 more apks need to startup
> 3. after finishing step 2, sample buddyinfo once and get the result

How stable are those results?

-- 
Michal Hocko
SUSE Labs
Re: [PATCH mm] introduce reverse buddy concept to reduce buddy fragment
the test was done as follows:

1. the environment is android 7.0, the kernel is 4.1, and managed memory
   is 3.5GB
2. every 4s start up one apk; more than 100 apks in total need to start up
3. after finishing step 2, sample buddyinfo once and get the result

On 2017/7/3 23:33, Michal Hocko wrote:
> On Mon 03-07-17 20:02:16, zhouxianrong wrote:
> [...]
> > from above i think after applying the patch the result is better.
>
> You haven't described your testing methodology, nor the workload that
> was tested. As such this data is completely meaningless.
Re: [PATCH mm] introduce reverse buddy concept to reduce buddy fragment
On Mon 03-07-17 20:02:16, zhouxianrong wrote:
[...]
> from above i think after applying the patch the result is better.

You haven't described your testing methodology, nor the workload that was
tested. As such this data is completely meaningless.

-- 
Michal Hocko
SUSE Labs
Re: [PATCH mm] introduce reverse buddy concept to reduce buddy fragment
On 2017/7/3 15:48, Michal Hocko wrote:
> On Fri 30-06-17 19:25:41, zhouxianr...@huawei.com wrote:
> > From: zhouxianrong
> >
> > when buddy is under fragment i find that still there are some pages
> > just like AFFA mode. A is allocated, F is free, AF is a buddy pair
> > for order n, FA is a buddy pair for order n as well.
>
> Could you quantify how often this happens and how much of a problem it
> actually is? Is there any specific workload that would suffer from such
> an artificial fragmentation?
>
> > I want to compose the FF as order n + 1, aligned to n rather than
> > n + 1. this patch breaks the buddy rule that a block must be aligned
> > to its own order. i think we can do so except for the kernel stack,
> > because the requirement comes from the buddy attribution rather than
> > the user.
>
> Why do you think the stack is a problem here?
>
> > for the kernel stack requirement i add __GFP_NOREVERSEBUDDY for this
> > purpose.
> >
> > a sample just like below (first row per zone: free_list counts per
> > order; second row: free_list_reverse counts):
> >
> > Node 0, zone      DMA
> >   1389  1765   342   272     2     0     0     0     0     0     0
> >      0    75  4398  1560   379    27     2     0     0     0     0
> > Node 0, zone   Normal
> >     20    24    14     2     0     0     0     0     0     0     0
> >      0     6   228     3     0     0     0     0     0     0     0

at the sample moment, if we did not have this patch, the picture would
look like this:

Node 0, zone    DMA
  (1389 + 75 * 2) (1765 + 4398 * 2) (342 + 1560 * 2) (272 + 379 * 2)
  (2 + 27 * 2) (0 + 2 * 2) 0 0 0 0 0
Node 0, zone    Normal
  (20 + 6 * 2) (24 + 228 * 2) (14 + 3 * 2) 2 0 0 0 0 0 0 0

i find the AFFA mode in the lower-order free_list and move the FF pair
into the higher-order free_list_reverse. now consider only the DMA zone
and look at the difference.

without this patch:

Node 0, zone    DMA
  (1389 + 75 * 2) (1765 + 4398 * 2) (342 + 1560 * 2) (272 + 379 * 2)
  (2 + 27 * 2) (0 + 2 * 2) 0 0 0 0 0

which equals

Node 0, zone    DMA
  1539 10561 3462 1030 56 4 0 0 0 0 0                        -- 1)

after applying this patch:

Node 0, zone    DMA
  1389  1765   342   272     2     0     0     0     0     0     0
     0    75  4398  1560   379    27     2     0     0     0     0

which is equivalent to

Node 0, zone    DMA
  (1389 + 0) (1765 + 75) (342 + 4398) (272 + 1560) (2 + 379) (0 + 27)
  (0 + 2) 0 0 0 0

which equals

Node 0, zone    DMA
  1389 1840 4740 1832 381 27 2 0 0 0 0                       -- 2)

writing 1) and 2) together to compare them:

  1539 10561 3462 1030  56   4 0 0 0 0 0                     -- 1)
  1389  1840 4740 1832 381  27 2 0 0 0 0                     -- 2)

from above i think that after applying the patch the result is better.

> > the patch does not consider fallback allocation for now.
>
> The patch is missing the crucial information required for any
> optimization: some numbers to compare how much it helps. The above
> output of buddyinfo is pointless without any base to compare to. Also,
> which workloads would benefit from this change, and how much? It is
> also a non-trivial amount of code in the guts of the page allocator, so
> this really needs a _much_ better explanation. I haven't looked closely
> at the code yet, but a quick look at set_reverse_free_area scared me
> away.

[...]
Re: [PATCH mm] introduce reverse buddy concept to reduce buddy fragment
On Fri 30-06-17 19:25:41, zhouxianr...@huawei.com wrote:
> From: zhouxianrong
>
> when buddy is under fragment i find that still there are some pages
> just like AFFA mode. A is allocated, F is free, AF is a buddy pair for
> order n, FA is a buddy pair for order n as well.

Could you quantify how often this happens and how much of a problem it
actually is? Is there any specific workload that would suffer from such
an artificial fragmentation?

> I want to compose the FF as order n + 1, aligned to n rather than
> n + 1. this patch breaks the buddy rule that a block must be aligned
> to its own order. i think we can do so except for the kernel stack,
> because the requirement comes from the buddy attribution rather than
> the user.

Why do you think the stack is a problem here?

> for the kernel stack requirement i add __GFP_NOREVERSEBUDDY for this
> purpose.
>
> a sample just like below (first row per zone: free_list counts per
> order; second row: free_list_reverse counts):
>
> Node 0, zone      DMA
>   1389  1765   342   272     2     0     0     0     0     0     0
>      0    75  4398  1560   379    27     2     0     0     0     0
> Node 0, zone   Normal
>     20    24    14     2     0     0     0     0     0     0     0
>      0     6   228     3     0     0     0     0     0     0     0
>
> the patch does not consider fallback allocation for now.

The patch is missing the crucial information required for any
optimization: some numbers to compare how much it helps. The above output
of buddyinfo is pointless without any base to compare to. Also, which
workloads would benefit from this change, and how much? It is also a
non-trivial amount of code in the guts of the page allocator, so this
really needs a _much_ better explanation. I haven't looked closely at the
code yet, but a quick look at set_reverse_free_area scared me away.
> Signed-off-by: zhouxianrong
> ---
>  include/linux/gfp.h         |    8 +-
>  include/linux/mmzone.h      |    2 +
>  include/linux/page-flags.h  |    9 ++
>  include/linux/thread_info.h |    5 +-
>  mm/compaction.c             |   17
>  mm/internal.h               |    7 ++
>  mm/page_alloc.c             |  222 +++
>  mm/vmstat.c                 |    5 +-
>  8 files changed, 251 insertions(+), 24 deletions(-)
>
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index db373b9..f63d4d9 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -40,6 +40,7 @@
>  #define ___GFP_DIRECT_RECLAIM	0x40u
>  #define ___GFP_WRITE		0x80u
>  #define ___GFP_KSWAPD_RECLAIM	0x100u
> +#define ___GFP_NOREVERSEBUDDY	0x200u
>  /* If the above are modified, __GFP_BITS_SHIFT may need updating */
>
>  /*
> @@ -171,6 +172,10 @@
>   * __GFP_NOTRACK_FALSE_POSITIVE is an alias of __GFP_NOTRACK. It's a means of
>   * distinguishing in the source between false positives and allocations that
>   * cannot be supported (e.g. page tables).
> + *
> + * __GFP_NOREVERSEBUDDY does not allocate pages from reverse buddy list
> + * of current order. It make sure that allocation is alignment to same order
> + * with length order.
>   */
>  #define __GFP_COLD	((__force gfp_t)___GFP_COLD)
>  #define __GFP_NOWARN	((__force gfp_t)___GFP_NOWARN)
> @@ -178,9 +183,10 @@
>  #define __GFP_ZERO	((__force gfp_t)___GFP_ZERO)
>  #define __GFP_NOTRACK	((__force gfp_t)___GFP_NOTRACK)
>  #define __GFP_NOTRACK_FALSE_POSITIVE (__GFP_NOTRACK)
> +#define __GFP_NOREVERSEBUDDY	((__force gfp_t)___GFP_NOREVERSEBUDDY)
>
>  /* Room for N __GFP_FOO bits */
> -#define __GFP_BITS_SHIFT 25
> +#define __GFP_BITS_SHIFT 26
>  #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
>
>  /*
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 8e02b37..94237fe 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -89,7 +89,9 @@ enum {
>
>  struct free_area {
>  	struct list_head	free_list[MIGRATE_TYPES];
> +	struct list_head	free_list_reverse[MIGRATE_TYPES];
>  	unsigned long		nr_free;
> +	unsigned long		nr_free_reverse;
>  };
>
>  struct pglist_data;
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 6b5818d..39d17d7 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -675,6 +675,15 @@ static inline int TestClearPageDoubleMap(struct page *page)
>  #define PAGE_KMEMCG_MAPCOUNT_VALUE (-512)
>  PAGE_MAPCOUNT_OPS(Kmemcg, KMEMCG)
>
> +/*
> + * ReverseBuddy is enabled for the buddy allocator that allow allocating
> + * two adjacent same free order blocks other than buddy blocks and
> + * composing them as a order + 1 block. It is for reducing buddy
> + * fragment.
> + */
> +#define PAGE_REVERSE_BUDDY_MAPCOUNT_VALUE	(-1024)
> +PAGE_MAPCOUNT_OPS(ReverseBuddy, REVERSE_BUDDY)
> +
>  extern bool is_free_buddy_page(struct page *page);
[PATCH mm] introduce reverse buddy concept to reduce buddy fragment
From: zhouxianrong

when buddy is under fragment i find that still there are some pages
just like AFFA mode. A is allocated, F is free, AF is a buddy pair for
order n, FA is a buddy pair for order n as well. I want to compose the
FF as order n + 1, aligned to n rather than n + 1. this patch breaks
the buddy rule that a block is aligned to its own order. i think we can
do so except for the kernel stack, because the alignment requirement
comes from the buddy attribution rather than from the user. for the
kernel stack requirement i add __GFP_NOREVERSEBUDDY for this purpose.

a sample just like below.

Node 0, zone DMA
1389 1765342272 2 0 0 0 0 0 0
0 75 4398 1560379 27 2 0 0 0 0
Node 0, zone Normal
20 24 14 2 0 0 0 0 0 0 0
0 6228 3 0 0 0 0 0 0 0

the patch does not consider fallback allocation for now.

Signed-off-by: zhouxianrong
---
 include/linux/gfp.h         |    8 +-
 include/linux/mmzone.h      |    2 +
 include/linux/page-flags.h  |    9 ++
 include/linux/thread_info.h |    5 +-
 mm/compaction.c             |   17
 mm/internal.h               |    7 ++
 mm/page_alloc.c             |  222 +++
 mm/vmstat.c                 |    5 +-
 8 files changed, 251 insertions(+), 24 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index db373b9..f63d4d9 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -40,6 +40,7 @@
 #define ___GFP_DIRECT_RECLAIM	0x40u
 #define ___GFP_WRITE		0x80u
 #define ___GFP_KSWAPD_RECLAIM	0x100u
+#define ___GFP_NOREVERSEBUDDY	0x200u
 /* If the above are modified, __GFP_BITS_SHIFT may need updating */

 /*
@@ -171,6 +172,10 @@
  * __GFP_NOTRACK_FALSE_POSITIVE is an alias of __GFP_NOTRACK. It's a means of
  * distinguishing in the source between false positives and allocations that
  * cannot be supported (e.g. page tables).
+ *
+ * __GFP_NOREVERSEBUDDY does not allocate pages from reverse buddy list
+ * of current order. It make sure that allocation is alignment to same order
+ * with length order.
  */
 #define __GFP_COLD	((__force gfp_t)___GFP_COLD)
 #define __GFP_NOWARN	((__force gfp_t)___GFP_NOWARN)
@@ -178,9 +183,10 @@
 #define __GFP_ZERO	((__force gfp_t)___GFP_ZERO)
 #define __GFP_NOTRACK	((__force gfp_t)___GFP_NOTRACK)
 #define __GFP_NOTRACK_FALSE_POSITIVE (__GFP_NOTRACK)
+#define __GFP_NOREVERSEBUDDY	((__force gfp_t)___GFP_NOREVERSEBUDDY)

 /* Room for N __GFP_FOO bits */
-#define __GFP_BITS_SHIFT 25
+#define __GFP_BITS_SHIFT 26
 #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))

 /*
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 8e02b37..94237fe 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -89,7 +89,9 @@ enum {

 struct free_area {
 	struct list_head	free_list[MIGRATE_TYPES];
+	struct list_head	free_list_reverse[MIGRATE_TYPES];
 	unsigned long		nr_free;
+	unsigned long		nr_free_reverse;
 };

 struct pglist_data;
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 6b5818d..39d17d7 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -675,6 +675,15 @@ static inline int TestClearPageDoubleMap(struct page *page)
 #define PAGE_KMEMCG_MAPCOUNT_VALUE (-512)
 PAGE_MAPCOUNT_OPS(Kmemcg, KMEMCG)

+/*
+ * ReverseBuddy is enabled for the buddy allocator that allow allocating
+ * two adjacent same free order blocks other than buddy blocks and
+ * composing them as a order + 1 block. It is for reducing buddy
+ * fragment.
+ */
+#define PAGE_REVERSE_BUDDY_MAPCOUNT_VALUE	(-1024)
+PAGE_MAPCOUNT_OPS(ReverseBuddy, REVERSE_BUDDY)
+
 extern bool is_free_buddy_page(struct page *page);

 __PAGEFLAG(Isolated, isolated, PF_ANY);
diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h
index 5837387..b4a1605 100644
--- a/include/linux/thread_info.h
+++ b/include/linux/thread_info.h
@@ -28,9 +28,10 @@
 #ifdef CONFIG_DEBUG_STACK_USAGE
 # define THREADINFO_GFP	(GFP_KERNEL_ACCOUNT | __GFP_NOTRACK | \
-			 __GFP_ZERO)
+			 __GFP_NOREVERSEBUDDY | __GFP_ZERO)
 #else
-# define THREADINFO_GFP	(GFP_KERNEL_ACCOUNT | __GFP_NOTRACK)
+# define THREADINFO_GFP	(GFP_KERNEL_ACCOUNT | __GFP_NOTRACK | \
+			 __GFP_NOREVERSEBUDDY)
 #endif

 /*
diff --git a/mm/compaction.c b/mm/compaction.c
index 0fdfde0..a43f169 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -768,6 +768,20 @@ static
[PATCH mm] introduce reverse buddy concept to reduce buddy fragment
From: z00281421

Signed-off-by: z00281421
---
 include/linux/gfp.h         |    8 +-
 include/linux/mmzone.h      |    2 +
 include/linux/page-flags.h  |    9 ++
 include/linux/thread_info.h |    5 +-
 mm/compaction.c             |   17
 mm/internal.h               |    7 ++
 mm/page_alloc.c             |  222 +++
 mm/vmstat.c                 |    5 +-
 8 files changed, 251 insertions(+), 24 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index db373b9..f63d4d9 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -40,6 +40,7 @@
 #define ___GFP_DIRECT_RECLAIM	0x40u
 #define ___GFP_WRITE		0x80u
 #define ___GFP_KSWAPD_RECLAIM	0x100u
+#define ___GFP_NOREVERSEBUDDY	0x200u
 /* If the above are modified, __GFP_BITS_SHIFT may need updating */

 /*
@@ -171,6 +172,10 @@
  * __GFP_NOTRACK_FALSE_POSITIVE is an alias of __GFP_NOTRACK. It's a means of
  * distinguishing in the source between false positives and allocations that
  * cannot be supported (e.g. page tables).
+ *
+ * __GFP_NOREVERSEBUDDY does not allocate pages from reverse buddy list
+ * of current order. It make sure that allocation is alignment to same order
+ * with length order.
  */
 #define __GFP_COLD	((__force gfp_t)___GFP_COLD)
 #define __GFP_NOWARN	((__force gfp_t)___GFP_NOWARN)
@@ -178,9 +183,10 @@
 #define __GFP_ZERO	((__force gfp_t)___GFP_ZERO)
 #define __GFP_NOTRACK	((__force gfp_t)___GFP_NOTRACK)
 #define __GFP_NOTRACK_FALSE_POSITIVE (__GFP_NOTRACK)
+#define __GFP_NOREVERSEBUDDY	((__force gfp_t)___GFP_NOREVERSEBUDDY)

 /* Room for N __GFP_FOO bits */
-#define __GFP_BITS_SHIFT 25
+#define __GFP_BITS_SHIFT 26
 #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))

 /*
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 8e02b37..94237fe 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -89,7 +89,9 @@ enum {

 struct free_area {
 	struct list_head	free_list[MIGRATE_TYPES];
+	struct list_head	free_list_reverse[MIGRATE_TYPES];
 	unsigned long		nr_free;
+	unsigned long		nr_free_reverse;
 };

 struct pglist_data;
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 6b5818d..39d17d7 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -675,6 +675,15 @@ static inline int TestClearPageDoubleMap(struct page *page)
 #define PAGE_KMEMCG_MAPCOUNT_VALUE (-512)
 PAGE_MAPCOUNT_OPS(Kmemcg, KMEMCG)

+/*
+ * ReverseBuddy is enabled for the buddy allocator that allow allocating
+ * two adjacent same free order blocks other than buddy blocks and
+ * composing them as a order + 1 block. It is for reducing buddy
+ * fragment.
+ */
+#define PAGE_REVERSE_BUDDY_MAPCOUNT_VALUE	(-1024)
+PAGE_MAPCOUNT_OPS(ReverseBuddy, REVERSE_BUDDY)
+
 extern bool is_free_buddy_page(struct page *page);

 __PAGEFLAG(Isolated, isolated, PF_ANY);
diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h
index 5837387..b4a1605 100644
--- a/include/linux/thread_info.h
+++ b/include/linux/thread_info.h
@@ -28,9 +28,10 @@
 #ifdef CONFIG_DEBUG_STACK_USAGE
 # define THREADINFO_GFP	(GFP_KERNEL_ACCOUNT | __GFP_NOTRACK | \
-			 __GFP_ZERO)
+			 __GFP_NOREVERSEBUDDY | __GFP_ZERO)
 #else
-# define THREADINFO_GFP	(GFP_KERNEL_ACCOUNT | __GFP_NOTRACK)
+# define THREADINFO_GFP	(GFP_KERNEL_ACCOUNT | __GFP_NOTRACK | \
+			 __GFP_NOREVERSEBUDDY)
 #endif

 /*
diff --git a/mm/compaction.c b/mm/compaction.c
index 0fdfde0..a43f169 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -768,6 +768,20 @@ static bool too_many_isolated(struct zone *zone)
 			continue;
 		}

+		if (PageReverseBuddy(page)) {
+			unsigned long freepage_order = page_order_unsafe(page);
+
+			/*
+			 * Without lock, we cannot be sure that what we got is
+			 * a valid page order. Consider only values in the
+			 * valid order range to prevent low_pfn overflow.
+			 */
+			if (freepage_order > 0 &&
+			    freepage_order < MAX_ORDER - 1)
+				low_pfn += (1UL << (freepage_order + 1)) - 1;
+			continue;
+		}
+
 		/*
 		 * Regardless of being on LRU, compound pages such as THP and
 		 * hugetlbfs are not to be compacted. We can potentially save
@@ -1005,6 +1019,9 @@ static bool suitable_migration_target(struct compact_control *cc,
 		return false;
 	}

+	if (PageReverseBuddy(page))
+		return false;
+
 	/* If the block is MIGRATE_MOVABLE or