Re: [RFC PATCH 3/5] mm/page_alloc: stop instantly reusing freed page

2016-10-13 Thread Joonsoo Kim
On Thu, Oct 13, 2016 at 12:59:14PM +0200, Vlastimil Babka wrote:
> On 10/13/2016 10:08 AM, js1...@gmail.com wrote:
> >From: Joonsoo Kim 
> >
> >Allocation/free pattern is usually sequantial. If they are freed to
> >the buddy list, they can be coalesced. However, we first keep these freed
> >pages at the pcp list and try to reuse them until threshold is reached
> >so we don't have enough chance to get a high order freepage. This reusing
> >would provide us some performance advantages since we don't need to
> >get the zone lock and we don't pay the cost to check buddy merging.
> >But, less fragmentation and more high order freepage would compensate
> >this overhead in other ways. First, we would trigger less direct
> >compaction which has high overhead. And, there are usecases that uses
> >high order page to boost their performance.
> >
> >Instantly resuing freed page seems to provide us computational benefit
> >but the other affects more precious things like as I/O performance and
> >memory consumption so I think that it's a good idea to weight
> >later advantage more.
> 
> Again, there's also cache hotness to consider. And whether the
> sequential pattern is still real on a system with higher uptime.
> Should be possible to evaluate with tracepoints?

I answered this in previous e-mail. Anyway, we should evaluate
cache-effect. tracepoint or perf's cache event would show some
evidence. I will do it soon and report again.

Thanks.



Re: [RFC PATCH 3/5] mm/page_alloc: stop instantly reusing freed page

2016-10-13 Thread Vlastimil Babka

On 10/13/2016 10:08 AM, js1...@gmail.com wrote:

From: Joonsoo Kim 

Allocation/free pattern is usually sequantial. If they are freed to
the buddy list, they can be coalesced. However, we first keep these freed
pages at the pcp list and try to reuse them until threshold is reached
so we don't have enough chance to get a high order freepage. This reusing
would provide us some performance advantages since we don't need to
get the zone lock and we don't pay the cost to check buddy merging.
But, less fragmentation and more high order freepage would compensate
this overhead in other ways. First, we would trigger less direct
compaction which has high overhead. And, there are usecases that uses
high order page to boost their performance.

Instantly resuing freed page seems to provide us computational benefit
but the other affects more precious things like as I/O performance and
memory consumption so I think that it's a good idea to weight
later advantage more.


Again, there's also cache hotness to consider. And whether the 
sequential pattern is still real on a system with higher uptime. Should 
be possible to evaluate with tracepoints?