Re: [PATCH 1/2 v3] mm: vmscan: do not pass reclaimed slab to vmpressure

2017-02-06 Thread vinayak menon
On Fri, Feb 3, 2017 at 8:29 PM, Michal Hocko  wrote:
> On Fri 03-02-17 10:56:42, vinayak menon wrote:
>> On Thu, Feb 2, 2017 at 9:31 PM, Michal Hocko  wrote:
>> >
>> > Why would you like to chose and kill a task when the slab reclaim can
>> > still make sufficient progres? Are you sure that the slab contribution
>> > to the stats makes all the above happening?
>> >
>> I agree that a task need not be killed if sufficient progress is made
>> in reclaiming
>> memory say from slab. But here it looks like we have an impact because of 
>> just
>> increasing the reclaimed without touching the scanned. It could be because of
>> disimilar costs or not adding adding cost. I agree that vmpressure is
>> only a reasonable
>> estimate which does not already include few other costs, but I am not
>> sure whether it is ok
>> to add another element which further increases that disparity.
>> We noticed this problem when moving from 3.18 to 4.4 kernel version. With the
>> same workload, the vmpressure events differ between 3.18 and 4.4 causing the
>> above mentioned problem. And with this patch on 4.4 we get the same results
>> as in 3,18. So the slab contribution to stats is making a difference.
>
> Please document that in the changelog along with description of the
> workload that is affected. Ideally also add some data from /proc/vmstat
> so that we can see the reclaim activity.

Sure, I will add these to the changelog.

Thanks,
Vinayak


Re: [PATCH 1/2 v3] mm: vmscan: do not pass reclaimed slab to vmpressure

2017-02-06 Thread vinayak menon
On Fri, Feb 3, 2017 at 8:29 PM, Michal Hocko  wrote:
> On Fri 03-02-17 10:56:42, vinayak menon wrote:
>> On Thu, Feb 2, 2017 at 9:31 PM, Michal Hocko  wrote:
>> >
>> > Why would you like to chose and kill a task when the slab reclaim can
>> > still make sufficient progres? Are you sure that the slab contribution
>> > to the stats makes all the above happening?
>> >
>> I agree that a task need not be killed if sufficient progress is made
>> in reclaiming
>> memory say from slab. But here it looks like we have an impact because of 
>> just
>> increasing the reclaimed without touching the scanned. It could be because of
>> disimilar costs or not adding adding cost. I agree that vmpressure is
>> only a reasonable
>> estimate which does not already include few other costs, but I am not
>> sure whether it is ok
>> to add another element which further increases that disparity.
>> We noticed this problem when moving from 3.18 to 4.4 kernel version. With the
>> same workload, the vmpressure events differ between 3.18 and 4.4 causing the
>> above mentioned problem. And with this patch on 4.4 we get the same results
>> as in 3,18. So the slab contribution to stats is making a difference.
>
> Please document that in the changelog along with description of the
> workload that is affected. Ideally also add some data from /proc/vmstat
> so that we can see the reclaim activity.

Sure, I will add these to the changelog.

Thanks,
Vinayak


Re: [PATCH 1/2 v3] mm: vmscan: do not pass reclaimed slab to vmpressure

2017-02-03 Thread Michal Hocko
On Fri 03-02-17 10:56:42, vinayak menon wrote:
> On Thu, Feb 2, 2017 at 9:31 PM, Michal Hocko  wrote:
> >
> > Why would you like to chose and kill a task when the slab reclaim can
> > still make sufficient progres? Are you sure that the slab contribution
> > to the stats makes all the above happening?
> >
> I agree that a task need not be killed if sufficient progress is made
> in reclaiming
> memory say from slab. But here it looks like we have an impact because of just
> increasing the reclaimed without touching the scanned. It could be because of
> disimilar costs or not adding adding cost. I agree that vmpressure is
> only a reasonable
> estimate which does not already include few other costs, but I am not
> sure whether it is ok
> to add another element which further increases that disparity.
> We noticed this problem when moving from 3.18 to 4.4 kernel version. With the
> same workload, the vmpressure events differ between 3.18 and 4.4 causing the
> above mentioned problem. And with this patch on 4.4 we get the same results
> as in 3,18. So the slab contribution to stats is making a difference.

Please document that in the changelog along with description of the
workload that is affected. Ideally also add some data from /proc/vmstat
so that we can see the reclaim activity.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH 1/2 v3] mm: vmscan: do not pass reclaimed slab to vmpressure

2017-02-03 Thread Michal Hocko
On Fri 03-02-17 10:56:42, vinayak menon wrote:
> On Thu, Feb 2, 2017 at 9:31 PM, Michal Hocko  wrote:
> >
> > Why would you like to chose and kill a task when the slab reclaim can
> > still make sufficient progres? Are you sure that the slab contribution
> > to the stats makes all the above happening?
> >
> I agree that a task need not be killed if sufficient progress is made
> in reclaiming
> memory say from slab. But here it looks like we have an impact because of just
> increasing the reclaimed without touching the scanned. It could be because of
> disimilar costs or not adding adding cost. I agree that vmpressure is
> only a reasonable
> estimate which does not already include few other costs, but I am not
> sure whether it is ok
> to add another element which further increases that disparity.
> We noticed this problem when moving from 3.18 to 4.4 kernel version. With the
> same workload, the vmpressure events differ between 3.18 and 4.4 causing the
> above mentioned problem. And with this patch on 4.4 we get the same results
> as in 3,18. So the slab contribution to stats is making a difference.

Please document that in the changelog along with description of the
workload that is affected. Ideally also add some data from /proc/vmstat
so that we can see the reclaim activity.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH 1/2 v3] mm: vmscan: do not pass reclaimed slab to vmpressure

2017-02-02 Thread Minchan Kim
On Thu, Feb 02, 2017 at 11:44:22AM +0100, Michal Hocko wrote:
> On Tue 31-01-17 14:32:08, Vinayak Menon wrote:
> > During global reclaim, the nr_reclaimed passed to vmpressure
> > includes the pages reclaimed from slab. But the corresponding
> > scanned slab pages is not passed. This can cause total reclaimed
> > pages to be greater than scanned, causing an unsigned underflow
> > in vmpressure resulting in a critical event being sent to root
> > cgroup. So do not consider reclaimed slab pages for vmpressure
> > calculation. The reclaimed pages from slab can be excluded because
> > the freeing of a page by slab shrinking depends on each slab's
> > object population, making the cost model (i.e. scan:free) different
> > from that of LRU.
> 
> This might be true but what happens if the slab reclaim contributes
> significantly to the overal reclaim? This would be quite rare but not
> impossible.

Of course, it is better for vmpressure to cover slab but it's not
easy without page-based shrinking model, I think. It wold make
vmpressure higher easily due to low reclaim efficiency compared to
LRU pages. Yeah, vmpressure is not a perfect but no need to add
more noises, either. It's regression since 6b4f7799c6a5 so I think
this patch should go first and if someone want to cover slab really,
he should spend a time to work it well. It's too much that Vinayak
shuld make a effort for that.


Re: [PATCH 1/2 v3] mm: vmscan: do not pass reclaimed slab to vmpressure

2017-02-02 Thread Minchan Kim
On Thu, Feb 02, 2017 at 11:44:22AM +0100, Michal Hocko wrote:
> On Tue 31-01-17 14:32:08, Vinayak Menon wrote:
> > During global reclaim, the nr_reclaimed passed to vmpressure
> > includes the pages reclaimed from slab. But the corresponding
> > scanned slab pages is not passed. This can cause total reclaimed
> > pages to be greater than scanned, causing an unsigned underflow
> > in vmpressure resulting in a critical event being sent to root
> > cgroup. So do not consider reclaimed slab pages for vmpressure
> > calculation. The reclaimed pages from slab can be excluded because
> > the freeing of a page by slab shrinking depends on each slab's
> > object population, making the cost model (i.e. scan:free) different
> > from that of LRU.
> 
> This might be true but what happens if the slab reclaim contributes
> significantly to the overal reclaim? This would be quite rare but not
> impossible.

Of course, it is better for vmpressure to cover slab but it's not
easy without page-based shrinking model, I think. It wold make
vmpressure higher easily due to low reclaim efficiency compared to
LRU pages. Yeah, vmpressure is not a perfect but no need to add
more noises, either. It's regression since 6b4f7799c6a5 so I think
this patch should go first and if someone want to cover slab really,
he should spend a time to work it well. It's too much that Vinayak
shuld make a effort for that.


Re: [PATCH 1/2 v3] mm: vmscan: do not pass reclaimed slab to vmpressure

2017-02-02 Thread vinayak menon
On Thu, Feb 2, 2017 at 9:31 PM, Michal Hocko  wrote:
>
> Why would you like to chose and kill a task when the slab reclaim can
> still make sufficient progres? Are you sure that the slab contribution
> to the stats makes all the above happening?
>
I agree that a task need not be killed if sufficient progress is made
in reclaiming
memory say from slab. But here it looks like we have an impact because of just
increasing the reclaimed without touching the scanned. It could be because of
disimilar costs or not adding adding cost. I agree that vmpressure is
only a reasonable
estimate which does not already include few other costs, but I am not
sure whether it is ok
to add another element which further increases that disparity.
We noticed this problem when moving from 3.18 to 4.4 kernel version. With the
same workload, the vmpressure events differ between 3.18 and 4.4 causing the
above mentioned problem. And with this patch on 4.4 we get the same results
as in 3,18. So the slab contribution to stats is making a difference.

>> This increases the memory pressure and
>> finally result in late critical events, but by that time the task
>> launch latencies are impacted.
>

> I have seen vmpressure hitting critical events really quickly but that
> is mostly because the vmpressure uses only very simplistic
> approximation. Usually the reclaim goes well, until you hit to dirty
> or pinned pages. Then it can get really bad, so you can get from high
> effectiveness to 0 pretty quickly.
> --
> Michal Hocko
> SUSE Labs


Re: [PATCH 1/2 v3] mm: vmscan: do not pass reclaimed slab to vmpressure

2017-02-02 Thread vinayak menon
On Thu, Feb 2, 2017 at 9:31 PM, Michal Hocko  wrote:
>
> Why would you like to chose and kill a task when the slab reclaim can
> still make sufficient progres? Are you sure that the slab contribution
> to the stats makes all the above happening?
>
I agree that a task need not be killed if sufficient progress is made
in reclaiming
memory say from slab. But here it looks like we have an impact because of just
increasing the reclaimed without touching the scanned. It could be because of
disimilar costs or not adding adding cost. I agree that vmpressure is
only a reasonable
estimate which does not already include few other costs, but I am not
sure whether it is ok
to add another element which further increases that disparity.
We noticed this problem when moving from 3.18 to 4.4 kernel version. With the
same workload, the vmpressure events differ between 3.18 and 4.4 causing the
above mentioned problem. And with this patch on 4.4 we get the same results
as in 3,18. So the slab contribution to stats is making a difference.

>> This increases the memory pressure and
>> finally result in late critical events, but by that time the task
>> launch latencies are impacted.
>

> I have seen vmpressure hitting critical events really quickly but that
> is mostly because the vmpressure uses only very simplistic
> approximation. Usually the reclaim goes well, until you hit to dirty
> or pinned pages. Then it can get really bad, so you can get from high
> effectiveness to 0 pretty quickly.
> --
> Michal Hocko
> SUSE Labs


Re: [PATCH 1/2 v3] mm: vmscan: do not pass reclaimed slab to vmpressure

2017-02-02 Thread Michal Hocko
On Thu 02-02-17 21:00:10, vinayak menon wrote:
> On Thu, Feb 2, 2017 at 5:22 PM, Michal Hocko  wrote:
> > On Thu 02-02-17 16:55:49, vinayak menon wrote:
> >> On Thu, Feb 2, 2017 at 4:18 PM, Michal Hocko  wrote:
> >> > On Thu 02-02-17 11:44:22, Michal Hocko wrote:
> >> >> On Tue 31-01-17 14:32:08, Vinayak Menon wrote:
> >> >> > During global reclaim, the nr_reclaimed passed to vmpressure
> >> >> > includes the pages reclaimed from slab. But the corresponding
> >> >> > scanned slab pages is not passed. This can cause total reclaimed
> >> >> > pages to be greater than scanned, causing an unsigned underflow
> >> >> > in vmpressure resulting in a critical event being sent to root
> >> >> > cgroup. So do not consider reclaimed slab pages for vmpressure
> >> >> > calculation. The reclaimed pages from slab can be excluded because
> >> >> > the freeing of a page by slab shrinking depends on each slab's
> >> >> > object population, making the cost model (i.e. scan:free) different
> >> >> > from that of LRU.
> >> >>
> >> >> This might be true but what happens if the slab reclaim contributes
> >> >> significantly to the overal reclaim? This would be quite rare but not
> >> >> impossible.
> >> >>
> >> >> I am wondering why we cannot simply make cap nr_reclaimed to nr_scanned
> >> >> and be done with this all? Sure it will be imprecise but the same will
> >> >> be true with this approach.
> >>
> >> Thinking of a case where 100 LRU pages were scanned and only 10 were
> >> reclaimed.  Now, say slab reclaimed 100 pages and we have no idea
> >> how many were scanned.  The actual vmpressure of 90 will now be 0
> >> because of the addition on 100 slab pages. So underflow was not the
> >> only issue, but incorrect vmpressure.
> >
> > Is this actually a problem. The end result - enough pages being
> > reclaimed should matter, no?
> >
>
> But vmpressure is incorrect now, no ?

What does it mean incorrect? vmpressure is just an approximation that
tells us how much we struggle to reclaim memory. If we are making a
progress then we shouldn't reach higher levels.

> Because the scanned slab pages
> is not included in nr_scanned (the cost). The 100 scanned and 10
> reclaimed from LRU were a reasonable estimate as you said, and to that
> we are adding a reclaimed value alone without scanned and thus making
> it incorrect ? Because the cost of slab reclaim is not accounted.

there are other costs which are not included. E.g. stalling because of
dirty pages etc...

> But
> I agree that the vmpressure value would have been more correct if it
> could include both scanned and reclaimed from slab. And may be more
> correct if we can include the scanned and reclaimed from all shrinkers
> which I think is not the case right now (lowmemorykiller, zsmalloc
> etc). But as Minchan was pointing out, since the cost model for slab
> is different, would it be fine to just add reclaimed from slab to
> vmpressure ?

Get back to your example. Do you really prefer seeing an alarm just
because we had hard time reclaiming LRU pages which might be pinned due
to reclaimable slab pages (e.g. fs metadata) when the slab reclaim can
free enough of them?

vmpressure never had a good semantic, it is just an approximation that
happened to work for some workloads which it was proposed for.

[...]
> >> Our
> >> internal tests on Android actually shows the problem. When vmpressure
> >> with slab reclaimed added is used to kill tasks, it does not kick in
> >> at the right time.
> >
> > With the skewed reclaimed? How that happens? Could you elaborate more?
>
> Yes. Because of the skewed reclaim. The observation is that the vmpressure
> critical events are received late. Because of adding slab reclaimed without
> corresponding scanned, the vmpressure values are diluted resulting in lesser
> number of critical events at the beginning, resulting in tasks not
> being chosen to be killed.

Why would you like to chose and kill a task when the slab reclaim can
still make sufficient progres? Are you sure that the slab contribution
to the stats makes all the above happening?

> This increases the memory pressure and
> finally result in late critical events, but by that time the task
> launch latencies are impacted.

I have seen vmpressure hitting critical events really quickly but that
is mostly because the vmpressure uses only very simplistic
approximation. Usually the reclaim goes well, until you hit to dirty
or pinned pages. Then it can get really bad, so you can get from high
effectiveness to 0 pretty quickly.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH 1/2 v3] mm: vmscan: do not pass reclaimed slab to vmpressure

2017-02-02 Thread Michal Hocko
On Thu 02-02-17 21:00:10, vinayak menon wrote:
> On Thu, Feb 2, 2017 at 5:22 PM, Michal Hocko  wrote:
> > On Thu 02-02-17 16:55:49, vinayak menon wrote:
> >> On Thu, Feb 2, 2017 at 4:18 PM, Michal Hocko  wrote:
> >> > On Thu 02-02-17 11:44:22, Michal Hocko wrote:
> >> >> On Tue 31-01-17 14:32:08, Vinayak Menon wrote:
> >> >> > During global reclaim, the nr_reclaimed passed to vmpressure
> >> >> > includes the pages reclaimed from slab. But the corresponding
> >> >> > scanned slab pages is not passed. This can cause total reclaimed
> >> >> > pages to be greater than scanned, causing an unsigned underflow
> >> >> > in vmpressure resulting in a critical event being sent to root
> >> >> > cgroup. So do not consider reclaimed slab pages for vmpressure
> >> >> > calculation. The reclaimed pages from slab can be excluded because
> >> >> > the freeing of a page by slab shrinking depends on each slab's
> >> >> > object population, making the cost model (i.e. scan:free) different
> >> >> > from that of LRU.
> >> >>
> >> >> This might be true but what happens if the slab reclaim contributes
> >> >> significantly to the overal reclaim? This would be quite rare but not
> >> >> impossible.
> >> >>
> >> >> I am wondering why we cannot simply make cap nr_reclaimed to nr_scanned
> >> >> and be done with this all? Sure it will be imprecise but the same will
> >> >> be true with this approach.
> >>
> >> Thinking of a case where 100 LRU pages were scanned and only 10 were
> >> reclaimed.  Now, say slab reclaimed 100 pages and we have no idea
> >> how many were scanned.  The actual vmpressure of 90 will now be 0
> >> because of the addition on 100 slab pages. So underflow was not the
> >> only issue, but incorrect vmpressure.
> >
> > Is this actually a problem. The end result - enough pages being
> > reclaimed should matter, no?
> >
>
> But vmpressure is incorrect now, no ?

What does it mean incorrect? vmpressure is just an approximation that
tells us how much we struggle to reclaim memory. If we are making a
progress then we shouldn't reach higher levels.

> Because the scanned slab pages
> is not included in nr_scanned (the cost). The 100 scanned and 10
> reclaimed from LRU were a reasonable estimate as you said, and to that
> we are adding a reclaimed value alone without scanned and thus making
> it incorrect ? Because the cost of slab reclaim is not accounted.

there are other costs which are not included. E.g. stalling because of
dirty pages etc...

> But
> I agree that the vmpressure value would have been more correct if it
> could include both scanned and reclaimed from slab. And may be more
> correct if we can include the scanned and reclaimed from all shrinkers
> which I think is not the case right now (lowmemorykiller, zsmalloc
> etc). But as Minchan was pointing out, since the cost model for slab
> is different, would it be fine to just add reclaimed from slab to
> vmpressure ?

Get back to your example. Do you really prefer seeing an alarm just
because we had hard time reclaiming LRU pages which might be pinned due
to reclaimable slab pages (e.g. fs metadata) when the slab reclaim can
free enough of them?

vmpressure never had a good semantic, it is just an approximation that
happened to work for some workloads which it was proposed for.

[...]
> >> Our
> >> internal tests on Android actually shows the problem. When vmpressure
> >> with slab reclaimed added is used to kill tasks, it does not kick in
> >> at the right time.
> >
> > With the skewed reclaimed? How that happens? Could you elaborate more?
>
> Yes. Because of the skewed reclaim. The observation is that the vmpressure
> critical events are received late. Because of adding slab reclaimed without
> corresponding scanned, the vmpressure values are diluted resulting in lesser
> number of critical events at the beginning, resulting in tasks not
> being chosen to be killed.

Why would you like to chose and kill a task when the slab reclaim can
still make sufficient progres? Are you sure that the slab contribution
to the stats makes all the above happening?

> This increases the memory pressure and
> finally result in late critical events, but by that time the task
> launch latencies are impacted.

I have seen vmpressure hitting critical events really quickly but that
is mostly because the vmpressure uses only very simplistic
approximation. Usually the reclaim goes well, until you hit to dirty
or pinned pages. Then it can get really bad, so you can get from high
effectiveness to 0 pretty quickly.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH 1/2 v3] mm: vmscan: do not pass reclaimed slab to vmpressure

2017-02-02 Thread vinayak menon
On Thu, Feb 2, 2017 at 5:22 PM, Michal Hocko  wrote:
> On Thu 02-02-17 16:55:49, vinayak menon wrote:
>> On Thu, Feb 2, 2017 at 4:18 PM, Michal Hocko  wrote:
>> > On Thu 02-02-17 11:44:22, Michal Hocko wrote:
>> >> On Tue 31-01-17 14:32:08, Vinayak Menon wrote:
>> >> > During global reclaim, the nr_reclaimed passed to vmpressure
>> >> > includes the pages reclaimed from slab. But the corresponding
>> >> > scanned slab pages is not passed. This can cause total reclaimed
>> >> > pages to be greater than scanned, causing an unsigned underflow
>> >> > in vmpressure resulting in a critical event being sent to root
>> >> > cgroup. So do not consider reclaimed slab pages for vmpressure
>> >> > calculation. The reclaimed pages from slab can be excluded because
>> >> > the freeing of a page by slab shrinking depends on each slab's
>> >> > object population, making the cost model (i.e. scan:free) different
>> >> > from that of LRU.
>> >>
>> >> This might be true but what happens if the slab reclaim contributes
>> >> significantly to the overal reclaim? This would be quite rare but not
>> >> impossible.
>> >>
>> >> I am wondering why we cannot simply make cap nr_reclaimed to nr_scanned
>> >> and be done with this all? Sure it will be imprecise but the same will
>> >> be true with this approach.
>>
>> Thinking of a case where 100 LRU pages were scanned and only 10 were
>> reclaimed.  Now, say slab reclaimed 100 pages and we have no idea
>> how many were scanned.  The actual vmpressure of 90 will now be 0
>> because of the addition on 100 slab pages. So underflow was not the
>> only issue, but incorrect vmpressure.
>
> Is this actually a problem. The end result - enough pages being
> reclaimed should matter, no?
>
But vmpressure is incorrect now, no ? Because the scanned slab pages is
not included in nr_scanned (the cost). The 100 scanned and 10 reclaimed from LRU
were a reasonable estimate as you said, and to that we are adding a
reclaimed value alone without
scanned and thus making it incorrect ? Because the cost of slab reclaim is not
accounted. But I agree that the vmpressure value would have been more correct
if it could include both scanned and reclaimed from slab. And may be
more correct
if we can include the scanned and reclaimed from all shrinkers which I
think is not
the case right now (lowmemorykiller, zsmalloc etc). But as Minchan was pointing
out, since the cost model for slab is different, would it be fine to
just add reclaimed
from slab to vmpressure ?

>> Even though the slab reclaimed is not accounted in vmpressure, the
>> slab reclaimed pages will have a feedback effect on the LRU pressure
>> right ? i.e. the next LRU scan will either be less or delayed if
>> enough slab pages are reclaimed, in turn lowering the vmpressure or
>> delaying it ?
>
> Not sure what you mean but we can break out from the direct reclaim
> because we have fulfilled the reclaim target and that is why I think
> that it shouldn't be really harmful to consider them in the pressure
> calculation. After all we are making reclaim progress and that should
> be considered. reclaimed/scanned is a reasonable estimation but it has
> many issues because it doesn't really tell how hard it was to get that
> number of pages reclaimed. We might have to wait for writeback which is
> something completely different from a clean page cache. There are
> certainly different possible metrics.
>
I see.

>> If that is so, the
>> current approach of neglecting slab reclaimed will provide more
>> accurate vmpressure than capping nr_reclaimed to nr_scanned ?
>
> The problem I can see is that you can get serious vmpressure events
> while the reclaim manages to provide pages we are asking for and later
> decisions might be completely inappropriate.
>
>> Our
>> internal tests on Android actually shows the problem. When vmpressure
>> with slab reclaimed added is used to kill tasks, it does not kick in
>> at the right time.
>
> With the skewed reclaimed? How that happens? Could you elaborate more?
Yes. Because of the skewed reclaim. The observation is that the vmpressure
critical events are received late. Because of adding slab reclaimed without
corresponding scanned, the vmpressure values are diluted resulting in lesser
number of critical events at the beginning, resulting in tasks not
being chosen to
be killed. This increases the memory pressure and finally result in
late critical events,
but by that time the task launch latencies are impacted.


Re: [PATCH 1/2 v3] mm: vmscan: do not pass reclaimed slab to vmpressure

2017-02-02 Thread vinayak menon
On Thu, Feb 2, 2017 at 5:22 PM, Michal Hocko  wrote:
> On Thu 02-02-17 16:55:49, vinayak menon wrote:
>> On Thu, Feb 2, 2017 at 4:18 PM, Michal Hocko  wrote:
>> > On Thu 02-02-17 11:44:22, Michal Hocko wrote:
>> >> On Tue 31-01-17 14:32:08, Vinayak Menon wrote:
>> >> > During global reclaim, the nr_reclaimed passed to vmpressure
>> >> > includes the pages reclaimed from slab. But the corresponding
>> >> > scanned slab pages is not passed. This can cause total reclaimed
>> >> > pages to be greater than scanned, causing an unsigned underflow
>> >> > in vmpressure resulting in a critical event being sent to root
>> >> > cgroup. So do not consider reclaimed slab pages for vmpressure
>> >> > calculation. The reclaimed pages from slab can be excluded because
>> >> > the freeing of a page by slab shrinking depends on each slab's
>> >> > object population, making the cost model (i.e. scan:free) different
>> >> > from that of LRU.
>> >>
>> >> This might be true but what happens if the slab reclaim contributes
>> >> significantly to the overal reclaim? This would be quite rare but not
>> >> impossible.
>> >>
>> >> I am wondering why we cannot simply make cap nr_reclaimed to nr_scanned
>> >> and be done with this all? Sure it will be imprecise but the same will
>> >> be true with this approach.
>>
>> Thinking of a case where 100 LRU pages were scanned and only 10 were
>> reclaimed.  Now, say slab reclaimed 100 pages and we have no idea
>> how many were scanned.  The actual vmpressure of 90 will now be 0
>> because of the addition on 100 slab pages. So underflow was not the
>> only issue, but incorrect vmpressure.
>
> Is this actually a problem. The end result - enough pages being
> reclaimed should matter, no?
>
But vmpressure is incorrect now, no ? Because the scanned slab pages is
not included in nr_scanned (the cost). The 100 scanned and 10 reclaimed from LRU
were a reasonable estimate as you said, and to that we are adding a
reclaimed value alone without
scanned and thus making it incorrect ? Because the cost of slab reclaim is not
accounted. But I agree that the vmpressure value would have been more correct
if it could include both scanned and reclaimed from slab. And may be
more correct
if we can include the scanned and reclaimed from all shrinkers which I
think is not
the case right now (lowmemorykiller, zsmalloc etc). But as Minchan was pointing
out, since the cost model for slab is different, would it be fine to
just add reclaimed
from slab to vmpressure ?

>> Even though the slab reclaimed is not accounted in vmpressure, the
>> slab reclaimed pages will have a feedback effect on the LRU pressure
>> right ? i.e. the next LRU scan will either be less or delayed if
>> enough slab pages are reclaimed, in turn lowering the vmpressure or
>> delaying it ?
>
> Not sure what you mean but we can break out from the direct reclaim
> because we have fulfilled the reclaim target and that is why I think
> that it shouldn't be really harmful to consider them in the pressure
> calculation. After all we are making reclaim progress and that should
> be considered. reclaimed/scanned is a reasonable estimation but it has
> many issues because it doesn't really tell how hard it was to get that
> number of pages reclaimed. We might have to wait for writeback which is
> something completely different from a clean page cache. There are
> certainly different possible metrics.
>
I see.

>> If that is so, the
>> current approach of neglecting slab reclaimed will provide more
>> accurate vmpressure than capping nr_reclaimed to nr_scanned ?
>
> The problem I can see is that you can get serious vmpressure events
> while the reclaim manages to provide pages we are asking for and later
> decisions might be completely inappropriate.
>
>> Our
>> internal tests on Android actually shows the problem. When vmpressure
>> with slab reclaimed added is used to kill tasks, it does not kick in
>> at the right time.
>
> With the skewed reclaimed? How that happens? Could you elaborate more?
Yes. Because of the skewed reclaim. The observation is that the vmpressure
critical events are received late. Because of adding slab reclaimed without
corresponding scanned, the vmpressure values are diluted resulting in lesser
number of critical events at the beginning, resulting in tasks not
being chosen to
be killed. This increases the memory pressure and finally result in
late critical events,
but by that time the task launch latencies are impacted.


Re: [PATCH 1/2 v3] mm: vmscan: do not pass reclaimed slab to vmpressure

2017-02-02 Thread Michal Hocko
On Thu 02-02-17 16:55:49, vinayak menon wrote:
> On Thu, Feb 2, 2017 at 4:18 PM, Michal Hocko  wrote:
> > On Thu 02-02-17 11:44:22, Michal Hocko wrote:
> >> On Tue 31-01-17 14:32:08, Vinayak Menon wrote:
> >> > During global reclaim, the nr_reclaimed passed to vmpressure
> >> > includes the pages reclaimed from slab. But the corresponding
> >> > scanned slab pages is not passed. This can cause total reclaimed
> >> > pages to be greater than scanned, causing an unsigned underflow
> >> > in vmpressure resulting in a critical event being sent to root
> >> > cgroup. So do not consider reclaimed slab pages for vmpressure
> >> > calculation. The reclaimed pages from slab can be excluded because
> >> > the freeing of a page by slab shrinking depends on each slab's
> >> > object population, making the cost model (i.e. scan:free) different
> >> > from that of LRU.
> >>
> >> This might be true but what happens if the slab reclaim contributes
> >> significantly to the overal reclaim? This would be quite rare but not
> >> impossible.
> >>
> >> I am wondering why we cannot simply make cap nr_reclaimed to nr_scanned
> >> and be done with this all? Sure it will be imprecise but the same will
> >> be true with this approach.
>
> Thinking of a case where 100 LRU pages were scanned and only 10 were
> reclaimed.  Now, say slab reclaimed 100 pages and we have no idea
> how many were scanned.  The actual vmpressure of 90 will now be 0
> because of the addition on 100 slab pages. So underflow was not the
> only issue, but incorrect vmpressure.

Is this actually a problem. The end result - enough pages being
reclaimed should matter, no?

> Even though the slab reclaimed is not accounted in vmpressure, the
> slab reclaimed pages will have a feedback effect on the LRU pressure
> right ? i.e. the next LRU scan will either be less or delayed if
> enough slab pages are reclaimed, in turn lowering the vmpressure or
> delaying it ?

Not sure what you mean but we can break out from the direct reclaim
because we have fulfilled the reclaim target and that is why I think
that it shouldn't be really harmful to consider them in the pressure
calculation. After all we are making reclaim progress and that should
be considered. reclaimed/scanned is a reasonable estimation but it has
many issues because it doesn't really tell how hard it was to get that
number of pages reclaimed. We might have to wait for writeback which is
something completely different from a clean page cache. There are
certainly different possible metrics.

> If that is so, the
> current approach of neglecting slab reclaimed will provide more
> accurate vmpressure than capping nr_reclaimed to nr_scanned ?

The problem I can see is that you can get serious vmpressure events
while the reclaim manages to provide pages we are asking for and later
decisions might be completely inappropriate.

> Our
> internal tests on Android actually shows the problem. When vmpressure
> with slab reclaimed added is used to kill tasks, it does not kick in
> at the right time.

With the skewed reclaimed? How that happens? Could you elaborate more?

> > In other words something as "beautiful" as the following:
> > diff --git a/mm/vmpressure.c b/mm/vmpressure.c
> > index 149fdf6c5c56..abea42817dd0 100644
> > --- a/mm/vmpressure.c
> > +++ b/mm/vmpressure.c
> > @@ -236,6 +236,15 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg, 
> > bool tree,
> > return;
> >
> > /*
> > +* Due to accounting issues - e.g. THP contributing 1 to scanned but
> > +* potentially much more to reclaimed or SLAB pages not contributing
> > +* to scanned at all - we have to skew reclaimed to prevent from
> > +* wrong pressure levels due to overflows.
> > +*/
> > +   if (reclaimed > scanned)
> > +   reclaimed = scanned;
> > +
> > +   /*
> 
> This underflow problem is fixed by a separate patch
> https://lkml.org/lkml/2017/1/27/48
> That patch performs this check only once at the end of a window period.
> Is that ok ?

I have seen that patch but pushing that up into the vmpressure makes
more sense to me because it is more obvious. Not something I would
insist on, though.

-- 
Michal Hocko
SUSE Labs


Re: [PATCH 1/2 v3] mm: vmscan: do not pass reclaimed slab to vmpressure

2017-02-02 Thread Michal Hocko
On Thu 02-02-17 16:55:49, vinayak menon wrote:
> On Thu, Feb 2, 2017 at 4:18 PM, Michal Hocko  wrote:
> > On Thu 02-02-17 11:44:22, Michal Hocko wrote:
> >> On Tue 31-01-17 14:32:08, Vinayak Menon wrote:
> >> > During global reclaim, the nr_reclaimed passed to vmpressure
> >> > includes the pages reclaimed from slab. But the corresponding
> >> > scanned slab pages is not passed. This can cause total reclaimed
> >> > pages to be greater than scanned, causing an unsigned underflow
> >> > in vmpressure resulting in a critical event being sent to root
> >> > cgroup. So do not consider reclaimed slab pages for vmpressure
> >> > calculation. The reclaimed pages from slab can be excluded because
> >> > the freeing of a page by slab shrinking depends on each slab's
> >> > object population, making the cost model (i.e. scan:free) different
> >> > from that of LRU.
> >>
> >> This might be true but what happens if the slab reclaim contributes
> >> significantly to the overal reclaim? This would be quite rare but not
> >> impossible.
> >>
> >> I am wondering why we cannot simply make cap nr_reclaimed to nr_scanned
> >> and be done with this all? Sure it will be imprecise but the same will
> >> be true with this approach.
>
> Thinking of a case where 100 LRU pages were scanned and only 10 were
> reclaimed.  Now, say slab reclaimed 100 pages and we have no idea
> how many were scanned.  The actual vmpressure of 90 will now be 0
> because of the addition on 100 slab pages. So underflow was not the
> only issue, but incorrect vmpressure.

Is this actually a problem. The end result - enough pages being
reclaimed should matter, no?

> Even though the slab reclaimed is not accounted in vmpressure, the
> slab reclaimed pages will have a feedback effect on the LRU pressure
> right ? i.e. the next LRU scan will either be less or delayed if
> enough slab pages are reclaimed, in turn lowering the vmpressure or
> delaying it ?

Not sure what you mean but we can break out from the direct reclaim
because we have fulfilled the reclaim target and that is why I think
that it shouldn't be really harmful to consider them in the pressure
calculation. After all we are making reclaim progress and that should
be considered. reclaimed/scanned is a reasonable estimation but it has
many issues because it doesn't really tell how hard it was to get that
number of pages reclaimed. We might have to wait for writeback which is
something completely different from a clean page cache. There are
certainly different possible metrics.

> If that is so, the
> current approach of neglecting slab reclaimed will provide more
> accurate vmpressure than capping nr_reclaimed to nr_scanned ?

The problem I can see is that you can get serious vmpressure events
while the reclaim manages to provide pages we are asking for and later
decisions might be completely inappropriate.

> Our
> internal tests on Android actually shows the problem. When vmpressure
> with slab reclaimed added is used to kill tasks, it does not kick in
> at the right time.

With the skewed reclaimed? How that happens? Could you elaborate more?

> > In other words something as "beautiful" as the following:
> > diff --git a/mm/vmpressure.c b/mm/vmpressure.c
> > index 149fdf6c5c56..abea42817dd0 100644
> > --- a/mm/vmpressure.c
> > +++ b/mm/vmpressure.c
> > @@ -236,6 +236,15 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg, 
> > bool tree,
> > return;
> >
> > /*
> > +* Due to accounting issues - e.g. THP contributing 1 to scanned but
> > +* potentially much more to reclaimed or SLAB pages not contributing
> > +* to scanned at all - we have to skew reclaimed to prevent from
> > +* wrong pressure levels due to overflows.
> > +*/
> > +   if (reclaimed > scanned)
> > +   reclaimed = scanned;
> > +
> > +   /*
> 
> This underflow problem is fixed by a separate patch
> https://lkml.org/lkml/2017/1/27/48
> That patch performs this check only once at the end of a window period.
> Is that ok ?

I have seen that patch but pushing that up into the vmpressure makes
more sense to me because it is more obvious. Not something I would
insist on, though.

-- 
Michal Hocko
SUSE Labs


Re: [PATCH 1/2 v3] mm: vmscan: do not pass reclaimed slab to vmpressure

2017-02-02 Thread vinayak menon
On Thu, Feb 2, 2017 at 4:14 PM, Michal Hocko  wrote:

>
> We usually refer to the culprit comment as
> Fixes: 6b4f7799c6a5 ("mm: vmscan: invoke slab shrinkers from shrink_zone()")
>
Thanks for pointing that out Michal. I see that added to the version
of patch in mmotm.

> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majord...@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: mailto:"d...@kvack.org;> em...@kvack.org 


Re: [PATCH 1/2 v3] mm: vmscan: do not pass reclaimed slab to vmpressure

2017-02-02 Thread vinayak menon
On Thu, Feb 2, 2017 at 4:14 PM, Michal Hocko  wrote:

>
> We usually refer to the culprit comment as
> Fixes: 6b4f7799c6a5 ("mm: vmscan: invoke slab shrinkers from shrink_zone()")
>
Thanks for pointing that out Michal. I see that added to the version
of patch in mmotm.

> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majord...@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: mailto:"d...@kvack.org;> em...@kvack.org 


Re: [PATCH 1/2 v3] mm: vmscan: do not pass reclaimed slab to vmpressure

2017-02-02 Thread vinayak menon
On Thu, Feb 2, 2017 at 4:18 PM, Michal Hocko  wrote:
> On Thu 02-02-17 11:44:22, Michal Hocko wrote:
>> On Tue 31-01-17 14:32:08, Vinayak Menon wrote:
>> > During global reclaim, the nr_reclaimed passed to vmpressure
>> > includes the pages reclaimed from slab. But the corresponding
>> > scanned slab pages is not passed. This can cause total reclaimed
>> > pages to be greater than scanned, causing an unsigned underflow
>> > in vmpressure resulting in a critical event being sent to root
>> > cgroup. So do not consider reclaimed slab pages for vmpressure
>> > calculation. The reclaimed pages from slab can be excluded because
>> > the freeing of a page by slab shrinking depends on each slab's
>> > object population, making the cost model (i.e. scan:free) different
>> > from that of LRU.
>>
>> This might be true but what happens if the slab reclaim contributes
>> significantly to the overal reclaim? This would be quite rare but not
>> impossible.
>>
>> I am wondering why we cannot simply make cap nr_reclaimed to nr_scanned
>> and be done with this all? Sure it will be imprecise but the same will
>> be true with this approach.
Thinking of a case where 100 LRU pages were scanned and only 10 were reclaimed.
Now, say slab reclaimed 100 pages and we have no idea how many were scanned.
The actual vmpressure of 90 will now be 0 because of the addition on 100 slab
pages. So underflow was not the only issue, but incorrect vmpressure.
Even though the slab reclaimed is not accounted in vmpressure, the
slab reclaimed
pages will have a feedback effect on the LRU pressure right ? i.e. the
next LRU scan
will either be less or delayed if enough slab pages are reclaimed, in
turn lowering the
vmpressure or delaying it ? If that is so, the current approach of
neglecting slab reclaimed
will provide more accurate vmpressure than capping nr_reclaimed to nr_scanned ?
Our internal tests on Android actually shows the problem. When
vmpressure with slab
reclaimed added is used to kill tasks, it does not kick in at the right time.

>
> In other words something as "beautiful" as the following:
> diff --git a/mm/vmpressure.c b/mm/vmpressure.c
> index 149fdf6c5c56..abea42817dd0 100644
> --- a/mm/vmpressure.c
> +++ b/mm/vmpressure.c
> @@ -236,6 +236,15 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg, 
> bool tree,
> return;
>
> /*
> +* Due to accounting issues - e.g. THP contributing 1 to scanned but
> +* potentially much more to reclaimed or SLAB pages not contributing
> +* to scanned at all - we have to skew reclaimed to prevent from
> +* wrong pressure levels due to overflows.
> +*/
> +   if (reclaimed > scanned)
> +   reclaimed = scanned;
> +
> +   /*

This underflow problem is fixed by a separate patch
https://lkml.org/lkml/2017/1/27/48
That patch performs this check only once at the end of a window period.
Is that ok ?

Thanks,
Vinayak


Re: [PATCH 1/2 v3] mm: vmscan: do not pass reclaimed slab to vmpressure

2017-02-02 Thread vinayak menon
On Thu, Feb 2, 2017 at 4:18 PM, Michal Hocko  wrote:
> On Thu 02-02-17 11:44:22, Michal Hocko wrote:
>> On Tue 31-01-17 14:32:08, Vinayak Menon wrote:
>> > During global reclaim, the nr_reclaimed passed to vmpressure
>> > includes the pages reclaimed from slab. But the corresponding
>> > scanned slab pages is not passed. This can cause total reclaimed
>> > pages to be greater than scanned, causing an unsigned underflow
>> > in vmpressure resulting in a critical event being sent to root
>> > cgroup. So do not consider reclaimed slab pages for vmpressure
>> > calculation. The reclaimed pages from slab can be excluded because
>> > the freeing of a page by slab shrinking depends on each slab's
>> > object population, making the cost model (i.e. scan:free) different
>> > from that of LRU.
>>
>> This might be true but what happens if the slab reclaim contributes
>> significantly to the overal reclaim? This would be quite rare but not
>> impossible.
>>
>> I am wondering why we cannot simply make cap nr_reclaimed to nr_scanned
>> and be done with this all? Sure it will be imprecise but the same will
>> be true with this approach.
Thinking of a case where 100 LRU pages were scanned and only 10 were reclaimed.
Now, say slab reclaimed 100 pages and we have no idea how many were scanned.
The actual vmpressure of 90 will now be 0 because of the addition on 100 slab
pages. So underflow was not the only issue, but incorrect vmpressure.
Even though the slab reclaimed is not accounted in vmpressure, the
slab reclaimed
pages will have a feedback effect on the LRU pressure right ? i.e. the
next LRU scan
will either be less or delayed if enough slab pages are reclaimed, in
turn lowering the
vmpressure or delaying it ? If that is so, the current approach of
neglecting slab reclaimed
will provide more accurate vmpressure than capping nr_reclaimed to nr_scanned ?
Our internal tests on Android actually shows the problem. When
vmpressure with slab
reclaimed added is used to kill tasks, it does not kick in at the right time.

>
> In other words something as "beautiful" as the following:
> diff --git a/mm/vmpressure.c b/mm/vmpressure.c
> index 149fdf6c5c56..abea42817dd0 100644
> --- a/mm/vmpressure.c
> +++ b/mm/vmpressure.c
> @@ -236,6 +236,15 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg, 
> bool tree,
> return;
>
> /*
> +* Due to accounting issues - e.g. THP contributing 1 to scanned but
> +* potentially much more to reclaimed or SLAB pages not contributing
> +* to scanned at all - we have to skew reclaimed to prevent from
> +* wrong pressure levels due to overflows.
> +*/
> +   if (reclaimed > scanned)
> +   reclaimed = scanned;
> +
> +   /*

This underflow problem is fixed by a separate patch
https://lkml.org/lkml/2017/1/27/48
That patch performs this check only once at the end of a window period.
Is that ok ?

Thanks,
Vinayak


Re: [PATCH 1/2 v3] mm: vmscan: do not pass reclaimed slab to vmpressure

2017-02-02 Thread Michal Hocko
On Thu 02-02-17 11:44:22, Michal Hocko wrote:
> On Tue 31-01-17 14:32:08, Vinayak Menon wrote:
> > During global reclaim, the nr_reclaimed passed to vmpressure
> > includes the pages reclaimed from slab. But the corresponding
> > scanned slab pages is not passed. This can cause total reclaimed
> > pages to be greater than scanned, causing an unsigned underflow
> > in vmpressure resulting in a critical event being sent to root
> > cgroup. So do not consider reclaimed slab pages for vmpressure
> > calculation. The reclaimed pages from slab can be excluded because
> > the freeing of a page by slab shrinking depends on each slab's
> > object population, making the cost model (i.e. scan:free) different
> > from that of LRU.
> 
> This might be true but what happens if the slab reclaim contributes
> significantly to the overal reclaim? This would be quite rare but not
> impossible.
> 
> I am wondering why we cannot simply make cap nr_reclaimed to nr_scanned
> and be done with this all? Sure it will be imprecise but the same will
> be true with this approach.

In other words something as "beautiful" as the following:
diff --git a/mm/vmpressure.c b/mm/vmpressure.c
index 149fdf6c5c56..abea42817dd0 100644
--- a/mm/vmpressure.c
+++ b/mm/vmpressure.c
@@ -236,6 +236,15 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg, bool 
tree,
return;
 
/*
+* Due to accounting issues - e.g. THP contributing 1 to scanned but
+* potentially much more to reclaimed or SLAB pages not contributing
+* to scanned at all - we have to skew reclaimed to prevent from
+* wrong pressure levels due to overflows.
+*/
+   if (reclaimed > scanned)
+   reclaimed = scanned;
+
+   /*
 * If we got here with no pages scanned, then that is an indicator
 * that reclaimer was unable to find any shrinkable LRUs at the
 * current scanning depth. But it does not mean that we should
-- 
Michal Hocko
SUSE Labs


Re: [PATCH 1/2 v3] mm: vmscan: do not pass reclaimed slab to vmpressure

2017-02-02 Thread Michal Hocko
On Thu 02-02-17 11:44:22, Michal Hocko wrote:
> On Tue 31-01-17 14:32:08, Vinayak Menon wrote:
> > During global reclaim, the nr_reclaimed passed to vmpressure
> > includes the pages reclaimed from slab. But the corresponding
> > scanned slab pages is not passed. This can cause total reclaimed
> > pages to be greater than scanned, causing an unsigned underflow
> > in vmpressure resulting in a critical event being sent to root
> > cgroup. So do not consider reclaimed slab pages for vmpressure
> > calculation. The reclaimed pages from slab can be excluded because
> > the freeing of a page by slab shrinking depends on each slab's
> > object population, making the cost model (i.e. scan:free) different
> > from that of LRU.
> 
> This might be true but what happens if the slab reclaim contributes
> significantly to the overal reclaim? This would be quite rare but not
> impossible.
> 
> I am wondering why we cannot simply make cap nr_reclaimed to nr_scanned
> and be done with this all? Sure it will be imprecise but the same will
> be true with this approach.

In other words something as "beautiful" as the following:
diff --git a/mm/vmpressure.c b/mm/vmpressure.c
index 149fdf6c5c56..abea42817dd0 100644
--- a/mm/vmpressure.c
+++ b/mm/vmpressure.c
@@ -236,6 +236,15 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg, bool 
tree,
return;
 
/*
+* Due to accounting issues - e.g. THP contributing 1 to scanned but
+* potentially much more to reclaimed or SLAB pages not contributing
+* to scanned at all - we have to skew reclaimed to prevent from
+* wrong pressure levels due to overflows.
+*/
+   if (reclaimed > scanned)
+   reclaimed = scanned;
+
+   /*
 * If we got here with no pages scanned, then that is an indicator
 * that reclaimer was unable to find any shrinkable LRUs at the
 * current scanning depth. But it does not mean that we should
-- 
Michal Hocko
SUSE Labs


Re: [PATCH 1/2 v3] mm: vmscan: do not pass reclaimed slab to vmpressure

2017-02-02 Thread Michal Hocko
On Tue 31-01-17 14:32:08, Vinayak Menon wrote:
> During global reclaim, the nr_reclaimed passed to vmpressure
> includes the pages reclaimed from slab. But the corresponding
> scanned slab pages is not passed. This can cause total reclaimed
> pages to be greater than scanned, causing an unsigned underflow
> in vmpressure resulting in a critical event being sent to root
> cgroup. So do not consider reclaimed slab pages for vmpressure
> calculation. The reclaimed pages from slab can be excluded because
> the freeing of a page by slab shrinking depends on each slab's
> object population, making the cost model (i.e. scan:free) different
> from that of LRU.

This might be true but what happens if the slab reclaim contributes
significantly to the overal reclaim? This would be quite rare but not
impossible.

I am wondering why we cannot simply make cap nr_reclaimed to nr_scanned
and be done with this all? Sure it will be imprecise but the same will
be true with this approach.

> Also, not every shrinker accounts the pages it
> reclaims. This is a regression introduced by commit 6b4f7799c6a5
> ("mm: vmscan: invoke slab shrinkers from shrink_zone()").

We usually refer to the culprit comment as
Fixes: 6b4f7799c6a5 ("mm: vmscan: invoke slab shrinkers from shrink_zone()")
 
> Signed-off-by: Vinayak Menon 
> ---
>  mm/vmscan.c | 17 -
>  1 file changed, 12 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 947ab6f..8969f8e 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2594,16 +2594,23 @@ static bool shrink_node(pg_data_t *pgdat, struct 
> scan_control *sc)
>   sc->nr_scanned - nr_scanned,
>   node_lru_pages);
>  
> + /*
> +  * Record the subtree's reclaim efficiency. The reclaimed
> +  * pages from slab is excluded here because the corresponding
> +  * scanned pages is not accounted. Moreover, freeing a page
> +  * by slab shrinking depends on each slab's object population,
> +  * making the cost model (i.e. scan:free) different from that
> +  * of LRU.
> +  */
> + vmpressure(sc->gfp_mask, sc->target_mem_cgroup, true,
> +sc->nr_scanned - nr_scanned,
> +sc->nr_reclaimed - nr_reclaimed);
> +
>   if (reclaim_state) {
>   sc->nr_reclaimed += reclaim_state->reclaimed_slab;
>   reclaim_state->reclaimed_slab = 0;
>   }
>  
> - /* Record the subtree's reclaim efficiency */
> - vmpressure(sc->gfp_mask, sc->target_mem_cgroup, true,
> -sc->nr_scanned - nr_scanned,
> -sc->nr_reclaimed - nr_reclaimed);
> -
>   if (sc->nr_reclaimed - nr_reclaimed)
>   reclaimable = true;
>  
> -- 
> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
> member of the Code Aurora Forum, hosted by The Linux Foundation
> 

-- 
Michal Hocko
SUSE Labs


Re: [PATCH 1/2 v3] mm: vmscan: do not pass reclaimed slab to vmpressure

2017-02-02 Thread Michal Hocko
On Tue 31-01-17 14:32:08, Vinayak Menon wrote:
> During global reclaim, the nr_reclaimed passed to vmpressure
> includes the pages reclaimed from slab. But the corresponding
> scanned slab pages is not passed. This can cause total reclaimed
> pages to be greater than scanned, causing an unsigned underflow
> in vmpressure resulting in a critical event being sent to root
> cgroup. So do not consider reclaimed slab pages for vmpressure
> calculation. The reclaimed pages from slab can be excluded because
> the freeing of a page by slab shrinking depends on each slab's
> object population, making the cost model (i.e. scan:free) different
> from that of LRU.

This might be true but what happens if the slab reclaim contributes
significantly to the overal reclaim? This would be quite rare but not
impossible.

I am wondering why we cannot simply make cap nr_reclaimed to nr_scanned
and be done with this all? Sure it will be imprecise but the same will
be true with this approach.

> Also, not every shrinker accounts the pages it
> reclaims. This is a regression introduced by commit 6b4f7799c6a5
> ("mm: vmscan: invoke slab shrinkers from shrink_zone()").

We usually refer to the culprit comment as
Fixes: 6b4f7799c6a5 ("mm: vmscan: invoke slab shrinkers from shrink_zone()")
 
> Signed-off-by: Vinayak Menon 
> ---
>  mm/vmscan.c | 17 -
>  1 file changed, 12 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 947ab6f..8969f8e 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2594,16 +2594,23 @@ static bool shrink_node(pg_data_t *pgdat, struct 
> scan_control *sc)
>   sc->nr_scanned - nr_scanned,
>   node_lru_pages);
>  
> + /*
> +  * Record the subtree's reclaim efficiency. The reclaimed
> +  * pages from slab is excluded here because the corresponding
> +  * scanned pages is not accounted. Moreover, freeing a page
> +  * by slab shrinking depends on each slab's object population,
> +  * making the cost model (i.e. scan:free) different from that
> +  * of LRU.
> +  */
> + vmpressure(sc->gfp_mask, sc->target_mem_cgroup, true,
> +sc->nr_scanned - nr_scanned,
> +sc->nr_reclaimed - nr_reclaimed);
> +
>   if (reclaim_state) {
>   sc->nr_reclaimed += reclaim_state->reclaimed_slab;
>   reclaim_state->reclaimed_slab = 0;
>   }
>  
> - /* Record the subtree's reclaim efficiency */
> - vmpressure(sc->gfp_mask, sc->target_mem_cgroup, true,
> -sc->nr_scanned - nr_scanned,
> -sc->nr_reclaimed - nr_reclaimed);
> -
>   if (sc->nr_reclaimed - nr_reclaimed)
>   reclaimable = true;
>  
> -- 
> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
> member of the Code Aurora Forum, hosted by The Linux Foundation
> 

-- 
Michal Hocko
SUSE Labs


Re: [PATCH 1/2 v3] mm: vmscan: do not pass reclaimed slab to vmpressure

2017-01-31 Thread Minchan Kim
On Tue, Jan 31, 2017 at 02:32:08PM +0530, Vinayak Menon wrote:
> During global reclaim, the nr_reclaimed passed to vmpressure
> includes the pages reclaimed from slab. But the corresponding
> scanned slab pages is not passed. This can cause total reclaimed
> pages to be greater than scanned, causing an unsigned underflow
> in vmpressure resulting in a critical event being sent to root
> cgroup. So do not consider reclaimed slab pages for vmpressure
> calculation. The reclaimed pages from slab can be excluded because
> the freeing of a page by slab shrinking depends on each slab's
> object population, making the cost model (i.e. scan:free) different
> from that of LRU. Also, not every shrinker accounts the pages it
> reclaims. This is a regression introduced by commit 6b4f7799c6a5
> ("mm: vmscan: invoke slab shrinkers from shrink_zone()").
> 
> Signed-off-by: Vinayak Menon 
Acked-by: Minchan Kim 



Re: [PATCH 1/2 v3] mm: vmscan: do not pass reclaimed slab to vmpressure

2017-01-31 Thread Minchan Kim
On Tue, Jan 31, 2017 at 02:32:08PM +0530, Vinayak Menon wrote:
> During global reclaim, the nr_reclaimed passed to vmpressure
> includes the pages reclaimed from slab. But the corresponding
> scanned slab pages is not passed. This can cause total reclaimed
> pages to be greater than scanned, causing an unsigned underflow
> in vmpressure resulting in a critical event being sent to root
> cgroup. So do not consider reclaimed slab pages for vmpressure
> calculation. The reclaimed pages from slab can be excluded because
> the freeing of a page by slab shrinking depends on each slab's
> object population, making the cost model (i.e. scan:free) different
> from that of LRU. Also, not every shrinker accounts the pages it
> reclaims. This is a regression introduced by commit 6b4f7799c6a5
> ("mm: vmscan: invoke slab shrinkers from shrink_zone()").
> 
> Signed-off-by: Vinayak Menon 
Acked-by: Minchan Kim