Re: [PATCH v5 2/2] mm/memcontrol.c: Reduce reclaim retries in mem_cgroup_resize_limit()

2018-02-22 Thread Michal Hocko
On Thu 22-02-18 19:01:58, Andrey Ryabinin wrote:
> 
> 
> On 02/22/2018 06:44 PM, Michal Hocko wrote:
> > On Thu 22-02-18 18:38:11, Andrey Ryabinin wrote:
> 
> 
>  with the patch:
>  best: 1.04  secs, 9.7G reclaimed
>  worst: 2.2 secs, 16G reclaimed.
> 
>  without:
>  best: 5.4 sec, 35G reclaimed
>  worst: 22.2 sec, 136G reclaimed
> >>>
> >>> Could you also compare how much memory do we reclaim with/without the
> >>> patch?
> >>>
> >>
> >> I did and I wrote the results. Please look again.
> > 
> > I must have forgotten. Care to point me to the message-id?
> 
> The results are quoted right above, literally above. Raise your eyes
> up. message-id 0927bcab-7e2c-c6f9-d16a-315ac436b...@virtuozzo.com

OK, I see. We were talking about 2 different things I guess.

> I'll write it here again:
> 
> with the patch:
>  best: 9.7G reclaimed
>  worst: 16G reclaimed
> 
> without:
>  best: 35G reclaimed
>  worst: 136G reclaimed
> 
> Or are you asking about something else? If so, I don't understand what you
> want.

Well, those numbers do not tell us much, right? You have 4 concurrent
readers, each reading its own 1G file in a loop. The longer you keep that
running, the more pages you reclaim, of course. But you are not comparing
the same amount of work.

My main concern about the patch is that it might over-reclaim a lot if
we have a workload which also frees memory rather than constantly adding
more easily reclaimable page cache. I realize such a test is not easy
to construct.

I have already said that I will not block the patch but it should be at
least explained why a larger batch makes a difference.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH v5 2/2] mm/memcontrol.c: Reduce reclaim retries in mem_cgroup_resize_limit()

2018-02-22 Thread Andrey Ryabinin


On 02/22/2018 06:44 PM, Michal Hocko wrote:
> On Thu 22-02-18 18:38:11, Andrey Ryabinin wrote:


 with the patch:
 best: 1.04  secs, 9.7G reclaimed
 worst: 2.2 secs, 16G reclaimed.

 without:
 best: 5.4 sec, 35G reclaimed
 worst: 22.2 sec, 136G reclaimed
>>>
>>> Could you also compare how much memory do we reclaim with/without the
>>> patch?
>>>
>>
>> I did and I wrote the results. Please look again.
> 
> I must have forgotten. Care to point me to the message-id?

The results are quoted right above, literally above. Raise your eyes up. 
message-id 0927bcab-7e2c-c6f9-d16a-315ac436b...@virtuozzo.com

I'll write it here again:

with the patch:
 best: 9.7G reclaimed
 worst: 16G reclaimed

without:
 best: 35G reclaimed
 worst: 136G reclaimed

Or are you asking about something else? If so, I don't understand what you want.

> 20180119132544.19569-2-aryabi...@virtuozzo.com doesn't contain this
> information, and a quick glance over the follow-up thread doesn't turn up
> anything either. Ideally, this should be in the patch changelog, btw.
> 

Well, I did these measurements only today, and I don't have a time machine.


Re: [PATCH v5 2/2] mm/memcontrol.c: Reduce reclaim retries in mem_cgroup_resize_limit()

2018-02-22 Thread Michal Hocko
On Thu 22-02-18 18:38:11, Andrey Ryabinin wrote:
> 
> 
> On 02/22/2018 06:33 PM, Michal Hocko wrote:
> > On Thu 22-02-18 18:13:11, Andrey Ryabinin wrote:
> >>
> >>
> >> On 02/22/2018 05:09 PM, Michal Hocko wrote:
> >>> On Thu 22-02-18 16:50:33, Andrey Ryabinin wrote:
>  On 02/21/2018 11:17 PM, Andrew Morton wrote:
> > On Fri, 19 Jan 2018 16:11:18 +0100 Michal Hocko  
> > wrote:
> >
> >> And to be honest, I do not really see why keeping retrying from
> >> mem_cgroup_resize_limit should be so much faster than keep retrying 
> >> from
> >> the direct reclaim path. We are doing SWAP_CLUSTER_MAX batches anyway.
> >> mem_cgroup_resize_limit loop adds _some_ overhead but I am not really
> >> sure why it should be that large.
> >
> > Maybe restarting the scan lots of times results in rescanning lots of
> > ineligible pages at the start of the list before doing useful work?
> >
> > Andrey, are you able to determine where all that CPU time is being 
> > spent?
> >
> 
>  I should have been more specific about the test I did. The full script 
>  looks like this:
> 
>  mkdir -p /sys/fs/cgroup/memory/test
>  echo $$ > /sys/fs/cgroup/memory/test/tasks
>  cat 4G_file > /dev/null
>  while true; do cat 4G_file > /dev/null; done &
>  loop_pid=$!
>  perf stat echo 50M > /sys/fs/cgroup/memory/test/memory.limit_in_bytes
>  echo -1 > /sys/fs/cgroup/memory/test/memory.limit_in_bytes
>  kill $loop_pid
> 
> 
>  I think the additional loops add some overhead and it's not that big by 
>  itself, but
>  this small overhead allows task to refill slightly more pages, increasing
>  the total amount of pages that mem_cgroup_resize_limit() need to reclaim.
> 
>  By using the following commands to show the the amount of reclaimed 
>  pages:
>  perf record -e vmscan:mm_vmscan_memcg_reclaim_end echo 50M > 
>  /sys/fs/cgroup/memory/test/memory.limit_in_bytes
>  perf script|cut -d '=' -f 2| paste -sd+ |bc
> 
>  I've got 1259841 pages (4.9G) with the patch vs 1394312 pages (5.4G) 
>  without it.
> >>>
> >>> So how does the picture changes if you have multiple producers?
> >>>
> >>
> >> Drastically, in favor of the patch. But numbers *very* fickle from run to 
> >> run.
> >>
> >> Inside 5G vm with  4 cpus (qemu -m 5G -smp 4) and 4 processes in cgroup 
> >> reading 1G files:
> >> "while true; do cat /1g_f$i > /dev/null; done &"
> >>
> >> with the patch:
> >> best: 1.04  secs, 9.7G reclaimed
> >> worst: 2.2 secs, 16G reclaimed.
> >>
> >> without:
> >> best: 5.4 sec, 35G reclaimed
> >> worst: 22.2 sec, 136G reclaimed
> > 
> > Could you also compare how much memory do we reclaim with/without the
> > patch?
> > 
> 
> I did and I wrote the results. Please look again.

I must have forgotten. Care to point me to the message-id?
20180119132544.19569-2-aryabi...@virtuozzo.com doesn't contain this
information, and a quick glance over the follow-up thread doesn't turn up
anything either. Ideally, this should be in the patch changelog, btw.

-- 
Michal Hocko
SUSE Labs


Re: [PATCH v5 2/2] mm/memcontrol.c: Reduce reclaim retries in mem_cgroup_resize_limit()

2018-02-22 Thread Andrey Ryabinin


On 02/22/2018 06:33 PM, Michal Hocko wrote:
> On Thu 22-02-18 18:13:11, Andrey Ryabinin wrote:
>>
>>
>> On 02/22/2018 05:09 PM, Michal Hocko wrote:
>>> On Thu 22-02-18 16:50:33, Andrey Ryabinin wrote:
 On 02/21/2018 11:17 PM, Andrew Morton wrote:
> On Fri, 19 Jan 2018 16:11:18 +0100 Michal Hocko  wrote:
>
>> And to be honest, I do not really see why keeping retrying from
>> mem_cgroup_resize_limit should be so much faster than keep retrying from
>> the direct reclaim path. We are doing SWAP_CLUSTER_MAX batches anyway.
>> mem_cgroup_resize_limit loop adds _some_ overhead but I am not really
>> sure why it should be that large.
>
> Maybe restarting the scan lots of times results in rescanning lots of
> ineligible pages at the start of the list before doing useful work?
>
> Andrey, are you able to determine where all that CPU time is being spent?
>

 I should have been more specific about the test I did. The full script 
 looks like this:

 mkdir -p /sys/fs/cgroup/memory/test
 echo $$ > /sys/fs/cgroup/memory/test/tasks
 cat 4G_file > /dev/null
 while true; do cat 4G_file > /dev/null; done &
 loop_pid=$!
 perf stat echo 50M > /sys/fs/cgroup/memory/test/memory.limit_in_bytes
 echo -1 > /sys/fs/cgroup/memory/test/memory.limit_in_bytes
 kill $loop_pid


 I think the additional loops add some overhead and it's not that big by 
 itself, but
 this small overhead allows task to refill slightly more pages, increasing
 the total amount of pages that mem_cgroup_resize_limit() need to reclaim.

 By using the following commands to show the the amount of reclaimed pages:
 perf record -e vmscan:mm_vmscan_memcg_reclaim_end echo 50M > 
 /sys/fs/cgroup/memory/test/memory.limit_in_bytes
 perf script|cut -d '=' -f 2| paste -sd+ |bc

 I've got 1259841 pages (4.9G) with the patch vs 1394312 pages (5.4G) 
 without it.
>>>
>>> So how does the picture changes if you have multiple producers?
>>>
>>
>> Drastically, in favor of the patch. But numbers *very* fickle from run to 
>> run.
>>
>> Inside 5G vm with  4 cpus (qemu -m 5G -smp 4) and 4 processes in cgroup 
>> reading 1G files:
>> "while true; do cat /1g_f$i > /dev/null; done &"
>>
>> with the patch:
>> best: 1.04  secs, 9.7G reclaimed
>> worst: 2.2 secs, 16G reclaimed.
>>
>> without:
>> best: 5.4 sec, 35G reclaimed
>> worst: 22.2 sec, 136G reclaimed
> 
> Could you also compare how much memory do we reclaim with/without the
> patch?
> 

I did and I wrote the results. Please look again.


Re: [PATCH v5 2/2] mm/memcontrol.c: Reduce reclaim retries in mem_cgroup_resize_limit()

2018-02-22 Thread Michal Hocko
On Thu 22-02-18 18:13:11, Andrey Ryabinin wrote:
> 
> 
> On 02/22/2018 05:09 PM, Michal Hocko wrote:
> > On Thu 22-02-18 16:50:33, Andrey Ryabinin wrote:
> >> On 02/21/2018 11:17 PM, Andrew Morton wrote:
> >>> On Fri, 19 Jan 2018 16:11:18 +0100 Michal Hocko  wrote:
> >>>
>  And to be honest, I do not really see why keeping retrying from
>  mem_cgroup_resize_limit should be so much faster than keep retrying from
>  the direct reclaim path. We are doing SWAP_CLUSTER_MAX batches anyway.
>  mem_cgroup_resize_limit loop adds _some_ overhead but I am not really
>  sure why it should be that large.
> >>>
> >>> Maybe restarting the scan lots of times results in rescanning lots of
> >>> ineligible pages at the start of the list before doing useful work?
> >>>
> >>> Andrey, are you able to determine where all that CPU time is being spent?
> >>>
> >>
> >> I should have been more specific about the test I did. The full script 
> >> looks like this:
> >>
> >> mkdir -p /sys/fs/cgroup/memory/test
> >> echo $$ > /sys/fs/cgroup/memory/test/tasks
> >> cat 4G_file > /dev/null
> >> while true; do cat 4G_file > /dev/null; done &
> >> loop_pid=$!
> >> perf stat echo 50M > /sys/fs/cgroup/memory/test/memory.limit_in_bytes
> >> echo -1 > /sys/fs/cgroup/memory/test/memory.limit_in_bytes
> >> kill $loop_pid
> >>
> >>
> >> I think the additional loops add some overhead and it's not that big by 
> >> itself, but
> >> this small overhead allows task to refill slightly more pages, increasing
> >> the total amount of pages that mem_cgroup_resize_limit() need to reclaim.
> >>
> >> By using the following commands to show the the amount of reclaimed pages:
> >> perf record -e vmscan:mm_vmscan_memcg_reclaim_end echo 50M > 
> >> /sys/fs/cgroup/memory/test/memory.limit_in_bytes
> >> perf script|cut -d '=' -f 2| paste -sd+ |bc
> >>
> >> I've got 1259841 pages (4.9G) with the patch vs 1394312 pages (5.4G) 
> >> without it.
> > 
> > So how does the picture changes if you have multiple producers?
> > 
> 
> Drastically, in favor of the patch. But numbers *very* fickle from run to run.
> 
> Inside 5G vm with  4 cpus (qemu -m 5G -smp 4) and 4 processes in cgroup 
> reading 1G files:
> "while true; do cat /1g_f$i > /dev/null; done &"
> 
> with the patch:
> best: 1.04  secs, 9.7G reclaimed
> worst: 2.2 secs, 16G reclaimed.
> 
> without:
> best: 5.4 sec, 35G reclaimed
> worst: 22.2 sec, 136G reclaimed

Could you also compare how much memory do we reclaim with/without the
patch?
-- 
Michal Hocko
SUSE Labs


Re: [PATCH v5 2/2] mm/memcontrol.c: Reduce reclaim retries in mem_cgroup_resize_limit()

2018-02-22 Thread Andrey Ryabinin


On 02/22/2018 05:09 PM, Michal Hocko wrote:
> On Thu 22-02-18 16:50:33, Andrey Ryabinin wrote:
>> On 02/21/2018 11:17 PM, Andrew Morton wrote:
>>> On Fri, 19 Jan 2018 16:11:18 +0100 Michal Hocko  wrote:
>>>
 And to be honest, I do not really see why keeping retrying from
 mem_cgroup_resize_limit should be so much faster than keep retrying from
 the direct reclaim path. We are doing SWAP_CLUSTER_MAX batches anyway.
 mem_cgroup_resize_limit loop adds _some_ overhead but I am not really
 sure why it should be that large.
>>>
>>> Maybe restarting the scan lots of times results in rescanning lots of
>>> ineligible pages at the start of the list before doing useful work?
>>>
>>> Andrey, are you able to determine where all that CPU time is being spent?
>>>
>>
>> I should have been more specific about the test I did. The full script looks 
>> like this:
>>
>> mkdir -p /sys/fs/cgroup/memory/test
>> echo $$ > /sys/fs/cgroup/memory/test/tasks
>> cat 4G_file > /dev/null
>> while true; do cat 4G_file > /dev/null; done &
>> loop_pid=$!
>> perf stat echo 50M > /sys/fs/cgroup/memory/test/memory.limit_in_bytes
>> echo -1 > /sys/fs/cgroup/memory/test/memory.limit_in_bytes
>> kill $loop_pid
>>
>>
>> I think the additional loops add some overhead and it's not that big by 
>> itself, but
>> this small overhead allows task to refill slightly more pages, increasing
>> the total amount of pages that mem_cgroup_resize_limit() need to reclaim.
>>
>> By using the following commands to show the the amount of reclaimed pages:
>> perf record -e vmscan:mm_vmscan_memcg_reclaim_end echo 50M > 
>> /sys/fs/cgroup/memory/test/memory.limit_in_bytes
>> perf script|cut -d '=' -f 2| paste -sd+ |bc
>>
>> I've got 1259841 pages (4.9G) with the patch vs 1394312 pages (5.4G) without 
>> it.
> 
> So how does the picture changes if you have multiple producers?
> 

Drastically, in favor of the patch. But the numbers are *very* fickle from run to run.

Inside a 5G VM with 4 CPUs (qemu -m 5G -smp 4), with 4 processes in the cgroup
each reading a 1G file:
"while true; do cat /1g_f$i > /dev/null; done &"

with the patch:
best: 1.04  secs, 9.7G reclaimed
worst: 2.2 secs, 16G reclaimed.

without:
best: 5.4 sec, 35G reclaimed
worst: 22.2 sec, 136G reclaimed


Re: [PATCH v5 2/2] mm/memcontrol.c: Reduce reclaim retries in mem_cgroup_resize_limit()

2018-02-22 Thread Michal Hocko
On Thu 22-02-18 16:50:33, Andrey Ryabinin wrote:
> On 02/21/2018 11:17 PM, Andrew Morton wrote:
> > On Fri, 19 Jan 2018 16:11:18 +0100 Michal Hocko  wrote:
> > 
> >> And to be honest, I do not really see why keeping retrying from
> >> mem_cgroup_resize_limit should be so much faster than keep retrying from
> >> the direct reclaim path. We are doing SWAP_CLUSTER_MAX batches anyway.
> >> mem_cgroup_resize_limit loop adds _some_ overhead but I am not really
> >> sure why it should be that large.
> > 
> > Maybe restarting the scan lots of times results in rescanning lots of
> > ineligible pages at the start of the list before doing useful work?
> > 
> > Andrey, are you able to determine where all that CPU time is being spent?
> > 
> 
> I should have been more specific about the test I did. The full script looks 
> like this:
> 
> mkdir -p /sys/fs/cgroup/memory/test
> echo $$ > /sys/fs/cgroup/memory/test/tasks
> cat 4G_file > /dev/null
> while true; do cat 4G_file > /dev/null; done &
> loop_pid=$!
> perf stat echo 50M > /sys/fs/cgroup/memory/test/memory.limit_in_bytes
> echo -1 > /sys/fs/cgroup/memory/test/memory.limit_in_bytes
> kill $loop_pid
> 
> 
> I think the additional loops add some overhead and it's not that big by 
> itself, but
> this small overhead allows task to refill slightly more pages, increasing
> the total amount of pages that mem_cgroup_resize_limit() need to reclaim.
> 
> By using the following commands to show the the amount of reclaimed pages:
> perf record -e vmscan:mm_vmscan_memcg_reclaim_end echo 50M > 
> /sys/fs/cgroup/memory/test/memory.limit_in_bytes
> perf script|cut -d '=' -f 2| paste -sd+ |bc
> 
> I've got 1259841 pages (4.9G) with the patch vs 1394312 pages (5.4G) without 
> it.

So how does the picture change if you have multiple producers?

-- 
Michal Hocko
SUSE Labs


Re: [PATCH v5 2/2] mm/memcontrol.c: Reduce reclaim retries in mem_cgroup_resize_limit()

2018-02-22 Thread Andrey Ryabinin
On 02/21/2018 11:17 PM, Andrew Morton wrote:
> On Fri, 19 Jan 2018 16:11:18 +0100 Michal Hocko  wrote:
> 
>> And to be honest, I do not really see why keeping retrying from
>> mem_cgroup_resize_limit should be so much faster than keep retrying from
>> the direct reclaim path. We are doing SWAP_CLUSTER_MAX batches anyway.
>> mem_cgroup_resize_limit loop adds _some_ overhead but I am not really
>> sure why it should be that large.
> 
> Maybe restarting the scan lots of times results in rescanning lots of
> ineligible pages at the start of the list before doing useful work?
> 
> Andrey, are you able to determine where all that CPU time is being spent?
> 

I should have been more specific about the test I did. The full script looks 
like this:

mkdir -p /sys/fs/cgroup/memory/test
echo $$ > /sys/fs/cgroup/memory/test/tasks
cat 4G_file > /dev/null
while true; do cat 4G_file > /dev/null; done &
loop_pid=$!
perf stat echo 50M > /sys/fs/cgroup/memory/test/memory.limit_in_bytes
echo -1 > /sys/fs/cgroup/memory/test/memory.limit_in_bytes
kill $loop_pid


I think the additional loops add some overhead, and it's not that big by itself,
but this small overhead allows the task to refill slightly more pages, increasing
the total number of pages that mem_cgroup_resize_limit() needs to reclaim.

By using the following commands to show the amount of reclaimed pages:
perf record -e vmscan:mm_vmscan_memcg_reclaim_end echo 50M > 
/sys/fs/cgroup/memory/test/memory.limit_in_bytes
perf script|cut -d '=' -f 2| paste -sd+ |bc

I've got 1259841 pages (4.9G) with the patch vs 1394312 pages (5.4G) without it.
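
For reference, the same sum can also be produced by a small C filter instead of
the cut/paste/bc pipeline. This is only an illustrative sketch: it assumes each
matching perf script line carries a single nr_reclaimed=<count> field (as the
pipeline above implies) and 4 KiB pages for the size conversion.

/* sum_reclaimed.c - sum the nr_reclaimed= values in `perf script` output.
 * Illustrative sketch only.
 * Build: cc -O2 -o sum_reclaimed sum_reclaimed.c
 * Use:   perf script | ./sum_reclaimed
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    char line[4096];
    unsigned long long pages = 0;

    while (fgets(line, sizeof(line), stdin)) {
        char *eq = strrchr(line, '=');   /* the count follows the last '=' */
        if (eq)
            pages += strtoull(eq + 1, NULL, 10);
    }
    printf("%llu pages (%.1f GiB)\n", pages, pages * 4096.0 / (1ULL << 30));
    return 0;
}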


Re: [PATCH v5 2/2] mm/memcontrol.c: Reduce reclaim retries in mem_cgroup_resize_limit()

2018-02-21 Thread Andrew Morton
On Fri, 19 Jan 2018 16:11:18 +0100 Michal Hocko  wrote:

> And to be honest, I do not really see why keeping retrying from
> mem_cgroup_resize_limit should be so much faster than keep retrying from
> the direct reclaim path. We are doing SWAP_CLUSTER_MAX batches anyway.
> mem_cgroup_resize_limit loop adds _some_ overhead but I am not really
> sure why it should be that large.

Maybe restarting the scan lots of times results in rescanning lots of
ineligible pages at the start of the list before doing useful work?

Andrey, are you able to determine where all that CPU time is being spent?

Thanks.


Re: [PATCH v5 2/2] mm/memcontrol.c: Reduce reclaim retries in mem_cgroup_resize_limit()

2018-01-19 Thread Michal Hocko
On Fri 19-01-18 07:24:08, Shakeel Butt wrote:
[...]
> Thanks for the explanation. Another query, we do not call
> drain_all_stock() in mem_cgroup_resize_limit() but memory_max_write()
> does call drain_all_stock(). Was this intentional or missed
> accidentally?

I think it is just an omission. I would have to look closer but I am
just leaving now and will be back on Tuesday. This is unrelated so I
would rather discuss it in a separate email thread.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH v5 2/2] mm/memcontrol.c: Reduce reclaim retries in mem_cgroup_resize_limit()

2018-01-19 Thread Shakeel Butt
On Fri, Jan 19, 2018 at 7:11 AM, Michal Hocko  wrote:
> On Fri 19-01-18 06:49:29, Shakeel Butt wrote:
>> On Fri, Jan 19, 2018 at 5:35 AM, Michal Hocko  wrote:
>> > On Fri 19-01-18 16:25:44, Andrey Ryabinin wrote:
>> >> Currently mem_cgroup_resize_limit() retries to set limit after reclaiming
>> >> 32 pages. It makes more sense to reclaim needed amount of pages right 
>> >> away.
>> >>
>> >> This works noticeably faster, especially if 'usage - limit' big.
>> >> E.g. bringing down limit from 4G to 50M:
>> >>
>> >> Before:
>> >>  # perf stat echo 50M > memory.limit_in_bytes
>> >>
>> >>  Performance counter stats for 'echo 50M':
>> >>
>> >> 386.582382  task-clock (msec) #0.835 CPUs 
>> >> utilized
>> >>  2,502  context-switches  #0.006 M/sec
>> >>
>> >>0.463244382 seconds time elapsed
>> >>
>> >> After:
>> >>  # perf stat echo 50M > memory.limit_in_bytes
>> >>
>> >>  Performance counter stats for 'echo 50M':
>> >>
>> >> 169.403906  task-clock (msec) #0.849 CPUs 
>> >> utilized
>> >> 14  context-switches  #0.083 K/sec
>> >>
>> >>0.199536900 seconds time elapsed
>> >
>> > But I am not going ack this one. As already stated this has a risk
>> > of over-reclaim if there a lot of charges are freed along with this
>> > shrinking. This is more of a theoretical concern so I am _not_ going to
>>
>> If you don't mind, can you explain why over-reclaim is a concern at
>> all? The only side effect of over reclaim I can think of is the job
>> might suffer a bit over (more swapins & pageins). Shouldn't this be
>> within the expectation of the user decreasing the limits?
>
> It is not a disaster. But it is an unexpected side effect of the
> implementation. If you have limit 1GB and want to reduce it 500MB
> then it would be quite surprising to land at 200M just because somebody
> was freeing 300MB in parallel. Is this likely? Probably not but the more
> is the limit touched and the larger are the differences the more likely
> it is. Keep retrying in the smaller amounts and you will not see the
> above happening.
>
> And to be honest, I do not really see why keeping retrying from
> mem_cgroup_resize_limit should be so much faster than keep retrying from
> the direct reclaim path. We are doing SWAP_CLUSTER_MAX batches anyway.
> mem_cgroup_resize_limit loop adds _some_ overhead but I am not really
> sure why it should be that large.
>

Thanks for the explanation. Another query, we do not call
drain_all_stock() in mem_cgroup_resize_limit() but memory_max_write()
does call drain_all_stock(). Was this intentional or missed
accidentally?


Re: [PATCH v5 2/2] mm/memcontrol.c: Reduce reclaim retries in mem_cgroup_resize_limit()

2018-01-19 Thread Michal Hocko
On Fri 19-01-18 06:49:29, Shakeel Butt wrote:
> On Fri, Jan 19, 2018 at 5:35 AM, Michal Hocko  wrote:
> > On Fri 19-01-18 16:25:44, Andrey Ryabinin wrote:
> >> Currently mem_cgroup_resize_limit() retries to set limit after reclaiming
> >> 32 pages. It makes more sense to reclaim needed amount of pages right away.
> >>
> >> This works noticeably faster, especially if 'usage - limit' big.
> >> E.g. bringing down limit from 4G to 50M:
> >>
> >> Before:
> >>  # perf stat echo 50M > memory.limit_in_bytes
> >>
> >>  Performance counter stats for 'echo 50M':
> >>
> >> 386.582382  task-clock (msec) #0.835 CPUs 
> >> utilized
> >>  2,502  context-switches  #0.006 M/sec
> >>
> >>0.463244382 seconds time elapsed
> >>
> >> After:
> >>  # perf stat echo 50M > memory.limit_in_bytes
> >>
> >>  Performance counter stats for 'echo 50M':
> >>
> >> 169.403906  task-clock (msec) #0.849 CPUs 
> >> utilized
> >> 14  context-switches  #0.083 K/sec
> >>
> >>0.199536900 seconds time elapsed
> >
> > But I am not going ack this one. As already stated this has a risk
> > of over-reclaim if there a lot of charges are freed along with this
> > shrinking. This is more of a theoretical concern so I am _not_ going to
> 
> If you don't mind, can you explain why over-reclaim is a concern at
> all? The only side effect of over reclaim I can think of is the job
> might suffer a bit over (more swapins & pageins). Shouldn't this be
> within the expectation of the user decreasing the limits?

It is not a disaster. But it is an unexpected side effect of the
implementation. If you have a 1GB limit and want to reduce it to 500MB,
then it would be quite surprising to land at 200M just because somebody
was freeing 300MB in parallel. Is this likely? Probably not, but the more
often the limit is touched and the larger the differences are, the more
likely it is. Keep retrying in smaller amounts and you will not see the
above happening.

And to be honest, I do not really see why keeping retrying from
mem_cgroup_resize_limit should be so much faster than keep retrying from
the direct reclaim path. We are doing SWAP_CLUSTER_MAX batches anyway.
mem_cgroup_resize_limit loop adds _some_ overhead but I am not really
sure why it should be that large.
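
A minimal toy model of that scenario (not kernel code; the freeing rate and
page counts below are made up purely for illustration) shows the difference
between retrying in SWAP_CLUSTER_MAX batches, where the limit is rechecked
after every small batch, and asking reclaim for the whole 'usage - limit'
excess in one pass, where parallel freeing is not accounted for until the
pass finishes:

/* overreclaim_toy.c - toy model, not kernel code.  Reclaim and the
 * workload's own freeing are interleaved in SWAP_CLUSTER_MAX-page steps;
 * the workload frees FREE_RATE pages per step (a made-up rate), and the
 * limit is only rechecked when a reclaim pass finishes. */
#include <stdio.h>

#define SWAP_CLUSTER_MAX 32UL
#define FREE_RATE        19UL   /* hypothetical parallel freeing, pages/step */

/* whole_excess = 0: retry in SWAP_CLUSTER_MAX batches (old loop)
 * whole_excess = 1: ask for the whole (usage - limit) excess per pass (patch) */
static unsigned long shrink(unsigned long usage, unsigned long limit,
                            int whole_excess)
{
    while (usage > limit) {
        unsigned long pass = whole_excess ? usage - limit : SWAP_CLUSTER_MAX;

        /* one reclaim pass: the limit is not rechecked until it finishes */
        while (pass && usage) {
            unsigned long step = pass < SWAP_CLUSTER_MAX ? pass : SWAP_CLUSTER_MAX;
            unsigned long drop = step + FREE_RATE; /* reclaimed + freed in parallel */

            usage -= drop < usage ? drop : usage;
            pass -= step;
        }
    }
    return usage;
}

int main(void)
{
    unsigned long usage = 262144;   /* 1 GiB of 4 KiB pages */
    unsigned long limit = 131072;   /* new limit: 512 MiB   */

    printf("small batches: usage ends at %lu MiB\n", shrink(usage, limit, 0) / 256);
    printf("whole excess : usage ends at %lu MiB\n", shrink(usage, limit, 1) / 256);
    return 0;
}

With these made-up numbers the small-batch loop stops at 511 MiB, right at the
new limit, while the whole-excess pass ends around 208 MiB: the 1G -> 500M ->
~200M effect described above, only as pronounced as the assumed freeing rate
makes it.
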
-- 
Michal Hocko
SUSE Labs


Re: [PATCH v5 2/2] mm/memcontrol.c: Reduce reclaim retries in mem_cgroup_resize_limit()

2018-01-19 Thread Shakeel Butt
On Fri, Jan 19, 2018 at 5:35 AM, Michal Hocko  wrote:
> On Fri 19-01-18 16:25:44, Andrey Ryabinin wrote:
>> Currently mem_cgroup_resize_limit() retries to set limit after reclaiming
>> 32 pages. It makes more sense to reclaim needed amount of pages right away.
>>
>> This works noticeably faster, especially if 'usage - limit' big.
>> E.g. bringing down limit from 4G to 50M:
>>
>> Before:
>>  # perf stat echo 50M > memory.limit_in_bytes
>>
>>  Performance counter stats for 'echo 50M':
>>
>> 386.582382  task-clock (msec) #0.835 CPUs 
>> utilized
>>  2,502  context-switches  #0.006 M/sec
>>
>>0.463244382 seconds time elapsed
>>
>> After:
>>  # perf stat echo 50M > memory.limit_in_bytes
>>
>>  Performance counter stats for 'echo 50M':
>>
>> 169.403906  task-clock (msec) #0.849 CPUs 
>> utilized
>> 14  context-switches  #0.083 K/sec
>>
>>0.199536900 seconds time elapsed
>
> But I am not going ack this one. As already stated this has a risk
> of over-reclaim if there a lot of charges are freed along with this
> shrinking. This is more of a theoretical concern so I am _not_ going to

If you don't mind, can you explain why over-reclaim is a concern at
all? The only side effect of over-reclaim I can think of is that the job
might suffer a bit (more swapins & pageins). Shouldn't this be
within the expectation of the user decreasing the limits?

> nack. If we ever see such a problem then reverting this patch should be
> pretty straghtforward.
>
>> Signed-off-by: Andrey Ryabinin 

Reviewed-by: Shakeel Butt 

>> Cc: Shakeel Butt 
>> Cc: Michal Hocko 
>> Cc: Johannes Weiner 
>> Cc: Vladimir Davydov 
>> ---
>>  mm/memcontrol.c | 6 --
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 9d987f3e79dc..09bac2df2f12 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -2448,6 +2448,7 @@ static DEFINE_MUTEX(memcg_limit_mutex);
>>  static int mem_cgroup_resize_limit(struct mem_cgroup *memcg,
>>  unsigned long limit, bool memsw)
>>  {
>> + unsigned long nr_pages;
>>   bool enlarge = false;
>>   int ret;
>>   bool limits_invariant;
>> @@ -2479,8 +2480,9 @@ static int mem_cgroup_resize_limit(struct mem_cgroup 
>> *memcg,
>>   if (!ret)
>>   break;
>>
>> - if (!try_to_free_mem_cgroup_pages(memcg, 1,
>> - GFP_KERNEL, !memsw)) {
>> + nr_pages = max_t(long, 1, page_counter_read(counter) - limit);
>> + if (!try_to_free_mem_cgroup_pages(memcg, nr_pages,
>> + GFP_KERNEL, !memsw)) {
>>   ret = -EBUSY;
>>   break;
>>   }
>> --
>> 2.13.6
>>
>
> --
> Michal Hocko
> SUSE Labs


Re: [PATCH v5 2/2] mm/memcontrol.c: Reduce reclaim retries in mem_cgroup_resize_limit()

2018-01-19 Thread Michal Hocko
On Fri 19-01-18 16:25:44, Andrey Ryabinin wrote:
> Currently mem_cgroup_resize_limit() retries to set limit after reclaiming
> 32 pages. It makes more sense to reclaim needed amount of pages right away.
> 
> This works noticeably faster, especially if 'usage - limit' big.
> E.g. bringing down limit from 4G to 50M:
> 
> Before:
>  # perf stat echo 50M > memory.limit_in_bytes
> 
>  Performance counter stats for 'echo 50M':
> 
> 386.582382  task-clock (msec) #0.835 CPUs utilized
>  2,502  context-switches  #0.006 M/sec
> 
>0.463244382 seconds time elapsed
> 
> After:
>  # perf stat echo 50M > memory.limit_in_bytes
> 
>  Performance counter stats for 'echo 50M':
> 
> 169.403906  task-clock (msec) #0.849 CPUs utilized
> 14  context-switches  #0.083 K/sec
> 
>0.199536900 seconds time elapsed

But I am not going to ack this one. As already stated, this has a risk
of over-reclaim if a lot of charges are freed along with this
shrinking. This is more of a theoretical concern, so I am _not_ going to
nack. If we ever see such a problem then reverting this patch should be
pretty straightforward.

> Signed-off-by: Andrey Ryabinin 
> Cc: Shakeel Butt 
> Cc: Michal Hocko 
> Cc: Johannes Weiner 
> Cc: Vladimir Davydov 
> ---
>  mm/memcontrol.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 9d987f3e79dc..09bac2df2f12 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2448,6 +2448,7 @@ static DEFINE_MUTEX(memcg_limit_mutex);
>  static int mem_cgroup_resize_limit(struct mem_cgroup *memcg,
>  unsigned long limit, bool memsw)
>  {
> + unsigned long nr_pages;
>   bool enlarge = false;
>   int ret;
>   bool limits_invariant;
> @@ -2479,8 +2480,9 @@ static int mem_cgroup_resize_limit(struct mem_cgroup 
> *memcg,
>   if (!ret)
>   break;
>  
> - if (!try_to_free_mem_cgroup_pages(memcg, 1,
> - GFP_KERNEL, !memsw)) {
> + nr_pages = max_t(long, 1, page_counter_read(counter) - limit);
> + if (!try_to_free_mem_cgroup_pages(memcg, nr_pages,
> + GFP_KERNEL, !memsw)) {
>   ret = -EBUSY;
>   break;
>   }
> -- 
> 2.13.6
> 

-- 
Michal Hocko
SUSE Labs

