Re: [EDT] oom_killer: find bulkiest task based on pss value

2015-05-10 Thread David Rientjes
On Fri, 8 May 2015, Yogesh Narayan Gaur wrote:

> Presently in oom_kill.c we calculate badness score of the victim task as per 
> the present RSS counter value of the task.
> RSS counter value for any task is usually '[Private (Dirty/Clean)] + [Shared 
> (Dirty/Clean)]' of the task.
> We have encountered a situation where values for Private fields are less but 
> value for Shared fields are more and hence make total RSS counter value 
> large. Later on oom situation killing task with highest RSS value but as 
> Private field values are not large hence memory gain after killing this 
> process is not as per the expectation.
> 
> For e.g. take below use-case scenario, in which 3 process are running in 
> system. 
> All these process done mmap for file exist in present directory and then 
> copying data from this file to local allocated pointers in while(1) loop with 
> some sleep. Out of 3 process, 2 process has mmaped file with MAP_SHARED 
> setting and one has mapped file with MAP_PRIVATE setting.
> I have all 3 processes in background and checks RSS/PSS value from user space 
> utility (utility over cat /proc/pid/smaps)
> Before OOM, below is the consumed memory status for these 3 process (all 
> processes run with oom_score_adj = 0)
> 
> Comm : 1prg,  Pid : 213 (values in kB)
>                Rss   Shared  Private      Pss
>   Process : 375764   194596   181168   278460
> 
> Comm : 3prg,  Pid : 217 (values in kB)
>                Rss   Shared  Private      Pss
>   Process : 305760       32   305728   305738
> 
> Comm : 2prg,  Pid : 218 (values in kB)
>                Rss   Shared  Private      Pss
>   Process : 389980   194596   195384   292676
> 
> 
> Thus as per present code design, first it would select process [2prg : 218] 
> as bulkiest process as its RSS value is highest to kill. But if we kill this 
> process then only ~195MB would be free as compare to expected ~389MB.
> Thus identifying the task based on RSS value is not accurate design and 
> killing that identified process didn’t release expected memory back to system.
> 
> We need to calculate victim task based on PSS instead of RSS as PSS value 
> calculates as
> PSS value = [Private (Dirty/Clean)] + [Shared (Dirty/Clean) / no. of shared 
> task]
> For above use-case scenario also, it can be checked that process [3prg : 217] 
> is having largest PSS value and by killing this process we can gain maximum 
> memory (~305MB) as compare to killing process identified based on RSS value.
> 

The oom killer doesn't expect to necessarily be able to free all memory 
that is represented by the rss of a process.  In fact, after it selects a 
process it will happily kill a child process in favor of its parent if 
they don't share the same memory.
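
The figures in the quoted report can be replayed in a few lines. The sketch below is only an illustration in Python, not kernel code: the `Task` tuple and the `pick` helper are hypothetical names, and the numbers are the kB values from the report. It shows which task each metric ranks first, and how much of that task's memory is private and therefore certain to be released when it exits.

```python
from collections import namedtuple

# Values in kB, copied from the quoted report. Task and pick() are
# illustrative names, not kernel interfaces.
Task = namedtuple("Task", "comm pid rss shared private pss")

TASKS = [
    Task("1prg", 213, 375764, 194596, 181168, 278460),
    Task("3prg", 217, 305760, 32, 305728, 305738),
    Task("2prg", 218, 389980, 194596, 195384, 292676),
]


def pick(tasks, metric):
    """Return the task with the largest value of the given field."""
    return max(tasks, key=lambda t: getattr(t, metric))


for metric in ("rss", "pss"):
    victim = pick(TASKS, metric)
    print(f"by {metric}: {victim.comm} (pid {victim.pid}), "
          f"private = {victim.private} kB")
```

With these inputs the rss ranking selects 2prg (195384 kB private) and the pss ranking selects 3prg (305728 kB private), which is the difference the report describes.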

There're a few problems with using pss and the proposed patch that 
follows:

 - it's less predictable since it depends on the number of times the 
   memory is mapped, which may change during the process's lifetime,

 - computing it requires taking mm->mmap_sem, which the oom killer cannot
   count on because the lock may already be held; falling back to rss
   whenever the trylock fails makes the result even less predictable
   and reliable, and

 - all users who currently tune /proc/pid/oom_score_adj or
   /proc/pid/oom_adj are doing so based on the current heuristic, which
   is rss; if we switched to pss and all a process's memory is shared
   then their oom_score_adj or oom_adj is now severely broken (and as a
   result of the first problem above, defining oom_score_adj is near
   impossible).
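
The last point can be illustrated with a simplified model of the score. The sketch below assumes the general shape of the heuristic, a memory metric plus oom_score_adj scaled by total RAM over 1000, and leaves out swap and page-table pages. It works in kB rather than pages, assumes a hypothetical 1 GB machine, and assumes an administrator gave 2prg an oom_score_adj of -100 while reasoning in terms of rss. `badness`, `victim`, `TOTAL_KB` and the -100 value are all made up for the example.

```python
TOTAL_KB = 1024 * 1024  # assumed machine size (1 GB); hypothetical value


def badness(base_kb, oom_score_adj, total_kb=TOTAL_KB):
    """Simplified score: a memory metric plus oom_score_adj scaled to RAM.

    Mirrors only the general shape of the kernel heuristic; swap and
    page-table pages are left out.
    """
    points = base_kb + oom_score_adj * (total_kb // 1000)
    return max(points, 1)


# (comm, rss_kb, pss_kb, oom_score_adj). The -100 on 2prg is an assumed
# tuning, chosen by someone who expected the score to be rss-based.
TASKS = [
    ("1prg", 375764, 278460, 0),
    ("3prg", 305760, 305738, 0),
    ("2prg", 389980, 292676, -100),
]


def victim(tasks, use_pss):
    """Return the comm of the highest-scoring task under rss or pss."""
    def score(task):
        comm, rss, pss, adj = task
        return badness(pss if use_pss else rss, adj)
    return max(tasks, key=score)[0]


print("rss-based victim:", victim(TASKS, use_pss=False))
print("pss-based victim:", victim(TASKS, use_pss=True))
```

Under these assumptions the same tuning yields 1prg as the victim with rss and 3prg with pss, so a value chosen against one metric does not carry over to the other.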

We don't expect to free the entire rss; the best we can do is use a 
heuristic which is reliable, consistent, and cheap to check.  We can then 
ask users who want a process to have a different oom kill priority to use 
oom_score_adj, and they can do so reliably, without the fallback behavior 
that your trylock introduces.
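
For reference, tuning a task from user space means writing an integer between -1000 and 1000 to /proc/<pid>/oom_score_adj. The helper below is a minimal sketch of that step; `set_oom_score_adj` and its `proc_root` parameter are hypothetical names (the parameter exists only so the function can be pointed at a scratch directory when testing).

```python
import os


def set_oom_score_adj(pid, value, proc_root="/proc"):
    """Write an oom_score_adj value for pid; the valid range is -1000..1000."""
    if not -1000 <= value <= 1000:
        raise ValueError("oom_score_adj must be between -1000 and 1000")
    path = os.path.join(proc_root, str(pid), "oom_score_adj")
    with open(path, "w") as f:
        f.write(f"{value}\n")
    return path
```

A call such as `set_oom_score_adj(os.getpid(), 500)` makes the calling process a more likely victim; lowering the value typically requires additional privilege.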

Re: Re: [EDT] oom_killer: find bulkiest task based on pss value

2015-05-08 Thread yalin wang
2015-05-08 16:01 GMT+08:00 Yogesh Narayan Gaur :
> EP-2DAD0AFA905A4ACB804C4F82A001242F
>
> --- Original Message ---
> Sender : yalin wang
> Date : May 08, 2015 13:17 (GMT+05:30)
> Title : Re: [EDT] oom_killer: find bulkiest task based on pss value
>
> 2015-05-08 13:29 GMT+08:00 Yogesh Narayan Gaur :
>>>
>>> EP-2DAD0AFA905A4ACB804C4F82A001242F
>>> Hi Andrew,
>>>
>>> Presently in oom_kill.c we calculate badness score of the victim task as 
>>> per the present RSS counter value of the task.
>>> RSS counter value for any task is usually '[Private (Dirty/Clean)] + 
>>> [Shared (Dirty/Clean)]' of the task.
>>> We have encountered a situation where values for Private fields are less 
>>> but value for Shared fields are more and hence make total RSS counter value 
>>> large. Later on oom situation killing task with highest RSS value but as 
>>> Private field values are not large hence memory gain after killing this 
>>> process is not as per the expectation.
>>>
>>> For e.g. take below use-case scenario, in which 3 process are running in 
>>> system.
>>> All these process done mmap for file exist in present directory and then 
>>> copying data from this file to local allocated pointers in while(1) loop 
>>> with some sleep. Out of 3 process, 2 process has mmaped file with 
>>> MAP_SHARED setting and one has mapped file with MAP_PRIVATE setting.
>>> I have all 3 processes in background and checks RSS/PSS value from user 
>>> space utility (utility over cat /proc/pid/smaps)
>>> Before OOM, below is the consumed memory status for these 3 process (all 
>>> processes run with oom_score_adj = 0)
>>> 
>>> Comm : 1prg,  Pid : 213 (values in kB)
>>>                Rss   Shared  Private      Pss
>>>   Process : 375764   194596   181168   278460
>>> 
>>> Comm : 3prg,  Pid : 217 (values in kB)
>>>                Rss   Shared  Private      Pss
>>>   Process : 305760       32   305728   305738
>>> 
>>> Comm : 2prg,  Pid : 218 (values in kB)
>>>                Rss   Shared  Private      Pss
>>>   Process : 389980   194596   195384   292676
>>> 
>>>
>>> Thus as per present code design, first it would select process [2prg : 218] 
>>> as bulkiest process as its RSS value is highest to kill. But if we kill 
>>> this process then only ~195MB would be free as compare to expected ~389MB.
>>> Thus identifying the task based on RSS value is not accurate design and 
>>> killing that identified process didn’t release expected memory back to 
>>> system.
>>>
>>> We need to calculate victim task based on PSS instead of RSS as PSS value 
>>> calculates as
>>> PSS value = [Private (Dirty/Clean)] + [Shared (Dirty/Clean) / no. of shared 
>>> task]
>>> For above use-case scenario also, it can be checked that process [3prg : 
>>> 217] is having largest PSS value and by killing this process we can gain 
>>> maximum memory (~305MB) as compare to killing process identified based on 
>>> RSS value.
>>>
>>> --
>>> Regards,
>>> Yogesh Gaur.
>
>>
>>Great,
>>
>> in fact, i also encounter this scenario,
>> I  use USS (page map counter == 1) pages
>> to decide which process should be killed,
>> seems have the same result as you use PSS,
>> but PSS is better , it also consider shared pages,
>> in case some process have large shared pages mapping
>> but little Private page mapping
>>
>> BRs,
>> Yalin
>
> I have made patch which identifies bulkiest task on basis of PSS value. 
> Please check below patch.
> This patch is correcting the way victim task gets identified in oom condition.
>
> ==
>
> From 1c3d7f552f696bdbc0126c8e23beabedbd80e423 Mon Sep 17 00:00:00 2001
> From: Yogesh Gaur 
> Date: Thu, 7 May 2015 01:52:13 +0530
> Subject: [PATCH] oom: find victim task based on pss
>
> This patch is identifying bulkiest task to kill by OOM on the basis of PSS 
> value
> instead of present RSS values.
> There can be scenario where task with highest RSS counter is consuming lot of 
> shared
> memory and killing that task didn't release expected amount of memory to 
> system.
> PSS value = [Private (Dirty/

Re: Re: [EDT] oom_killer: find bulkiest task based on pss value

2015-05-08 Thread Yogesh Narayan Gaur

--- Original Message ---
Sender : yalin wang
Date : May 08, 2015 13:17 (GMT+05:30)
Title : Re: [EDT] oom_killer: find bulkiest task based on pss value

2015-05-08 13:29 GMT+08:00 Yogesh Narayan Gaur :
>>
>> EP-2DAD0AFA905A4ACB804C4F82A001242F
>> Hi Andrew,
>>
>> Presently in oom_kill.c we calculate badness score of the victim task as per 
>> the present RSS counter value of the task.
>> RSS counter value for any task is usually '[Private (Dirty/Clean)] + [Shared 
>> (Dirty/Clean)]' of the task.
>> We have encountered a situation where values for Private fields are less but 
>> value for Shared fields are more and hence make total RSS counter value 
>> large. Later on oom situation killing task with highest RSS value but as 
>> Private field values are not large hence memory gain after killing this 
>> process is not as per the expectation.
>>
>> For e.g. take below use-case scenario, in which 3 process are running in 
>> system.
>> All these process done mmap for file exist in present directory and then 
>> copying data from this file to local allocated pointers in while(1) loop 
>> with some sleep. Out of 3 process, 2 process has mmaped file with MAP_SHARED 
>> setting and one has mapped file with MAP_PRIVATE setting.
>> I have all 3 processes in background and checks RSS/PSS value from user 
>> space utility (utility over cat /proc/pid/smaps)
>> Before OOM, below is the consumed memory status for these 3 process (all 
>> processes run with oom_score_adj = 0)
>> 
>> Comm : 1prg,  Pid : 213 (values in kB)
>>                Rss   Shared  Private      Pss
>>   Process : 375764   194596   181168   278460
>> 
>> Comm : 3prg,  Pid : 217 (values in kB)
>>                Rss   Shared  Private      Pss
>>   Process : 305760       32   305728   305738
>> 
>> Comm : 2prg,  Pid : 218 (values in kB)
>>                Rss   Shared  Private      Pss
>>   Process : 389980   194596   195384   292676
>> 
>>
>> Thus as per present code design, first it would select process [2prg : 218] 
>> as bulkiest process as its RSS value is highest to kill. But if we kill this 
>> process then only ~195MB would be free as compare to expected ~389MB.
>> Thus identifying the task based on RSS value is not accurate design and 
>> killing that identified process didn’t release expected memory back to 
>> system.
>>
>> We need to calculate victim task based on PSS instead of RSS as PSS value 
>> calculates as
>> PSS value = [Private (Dirty/Clean)] + [Shared (Dirty/Clean) / no. of shared 
>> task]
>> For above use-case scenario also, it can be checked that process [3prg : 
>> 217] is having largest PSS value and by killing this process we can gain 
>> maximum memory (~305MB) as compare to killing process identified based on 
>> RSS value.
>>
>> --
>> Regards,
>> Yogesh Gaur.

>
>Great,
>
> in fact, i also encounter this scenario,
> I  use USS (page map counter == 1) pages
> to decide which process should be killed,
> seems have the same result as you use PSS,
> but PSS is better , it also consider shared pages,
> in case some process have large shared pages mapping
> but little Private page mapping
>
> BRs,
> Yalin

I have made a patch which identifies the bulkiest task on the basis of its 
PSS value. Please check the patch below.
This patch corrects the way the victim task is identified in an OOM condition.

==

From 1c3d7f552f696bdbc0126c8e23beabedbd80e423 Mon Sep 17 00:00:00 2001
From: Yogesh Gaur 
Date: Thu, 7 May 2015 01:52:13 +0530
Subject: [PATCH] oom: find victim task based on pss

This patch identifies the bulkiest task for the OOM killer to kill on the
basis of its PSS value instead of the present RSS value.
There can be scenarios where the task with the highest RSS counter is
consuming a lot of shared memory, and killing that task does not release the
expected amount of memory to the system.
PSS value = [Private (Dirty/Clean)] + [Shared (Dirty/Clean) / no. of sharing
tasks]
RSS value = [Private (Dirty/Clean)] + [Shared (Dirty/Clean)]
Thus the PSS value is used instead of the RSS value, as PSS more closely
matches the actual memory usage of the task.
This patch uses the smaps_pte_range() interface, which is available under
CONFIG_PROC_PAGE_MONITOR.
When CONFIG_PROC_PAGE_MONITOR is disabled, this simply returns the RSS count.

Signed-off-by: Yogesh Gaur 
Signed-off-by: Amit Arora 
Reviewed-b
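
The PSS and RSS definitions in the patch description above can be restated per page: each resident page counts fully toward RSS, counts as 1/mapcount toward PSS, and counts toward USS only when it is mapped exactly once. The sketch below is a floating-point illustration of that accounting, not the kernel's fixed-point code; `account` and `PAGE_KB` are hypothetical names and a 4 kB page size is assumed.

```python
PAGE_KB = 4  # assumed page size in kB


def account(mapcounts, page_kb=PAGE_KB):
    """Return (rss, pss, uss) in kB for pages with the given map counts.

    mapcounts holds one entry per resident page: the number of mappings
    that currently reference the page (always >= 1).
    """
    rss = page_kb * len(mapcounts)
    pss = sum(page_kb / n for n in mapcounts)
    uss = page_kb * sum(1 for n in mapcounts if n == 1)
    return rss, pss, uss


# Three private pages plus two pages shared with one other task.
print(account([1, 1, 1, 2, 2]))
# The same task after a third task maps the two shared pages.
print(account([1, 1, 1, 3, 3]))
```

In the second call rss and uss are unchanged while pss drops, even though the task itself did nothing; this is the dependence on map counts that makes a pss-based score move over a process's lifetime.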

Re: [EDT] oom_killer: find bulkiest task based on pss value

2015-05-08 Thread yalin wang
2015-05-08 13:29 GMT+08:00 Yogesh Narayan Gaur :
>
> EP-2DAD0AFA905A4ACB804C4F82A001242F
> Hi Andrew,
>
> Presently in oom_kill.c we calculate badness score of the victim task as per 
> the present RSS counter value of the task.
> RSS counter value for any task is usually '[Private (Dirty/Clean)] + [Shared 
> (Dirty/Clean)]' of the task.
> We have encountered a situation where values for Private fields are less but 
> value for Shared fields are more and hence make total RSS counter value 
> large. Later on oom situation killing task with highest RSS value but as 
> Private field values are not large hence memory gain after killing this 
> process is not as per the expectation.
>
> For e.g. take below use-case scenario, in which 3 process are running in 
> system.
> All these process done mmap for file exist in present directory and then 
> copying data from this file to local allocated pointers in while(1) loop with 
> some sleep. Out of 3 process, 2 process has mmaped file with MAP_SHARED 
> setting and one has mapped file with MAP_PRIVATE setting.
> I have all 3 processes in background and checks RSS/PSS value from user space 
> utility (utility over cat /proc/pid/smaps)
> Before OOM, below is the consumed memory status for these 3 process (all 
> processes run with oom_score_adj = 0)
> 
> Comm : 1prg,  Pid : 213 (values in kB)
>                Rss   Shared  Private      Pss
>   Process : 375764   194596   181168   278460
> 
> Comm : 3prg,  Pid : 217 (values in kB)
>                Rss   Shared  Private      Pss
>   Process : 305760       32   305728   305738
> 
> Comm : 2prg,  Pid : 218 (values in kB)
>                Rss   Shared  Private      Pss
>   Process : 389980   194596   195384   292676
> 
>
> Thus as per present code design, first it would select process [2prg : 218] 
> as bulkiest process as its RSS value is highest to kill. But if we kill this 
> process then only ~195MB would be free as compare to expected ~389MB.
> Thus identifying the task based on RSS value is not accurate design and 
> killing that identified process didn’t release expected memory back to system.
>
> We need to calculate victim task based on PSS instead of RSS as PSS value 
> calculates as
> PSS value = [Private (Dirty/Clean)] + [Shared (Dirty/Clean) / no. of shared 
> task]
> For above use-case scenario also, it can be checked that process [3prg : 217] 
> is having largest PSS value and by killing this process we can gain maximum 
> memory (~305MB) as compare to killing process identified based on RSS value.
>
> --
> Regards,
> Yogesh Gaur.


Great,

in fact, i also encounter this scenario,
i use USS (page map counter == 1) pages
to decide which process should be killed,
seems have the same result as you use PSS,
but PSS is better , it also consider shared pages,
in case some process have large shared pages mapping
but little Private page mapping

BRs,
Yalin
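
The RSS, PSS and USS figures discussed in this thread can be gathered from user space by summing the per-mapping counters in /proc/<pid>/smaps. The sketch below assumes the usual "Name:   value kB" line format and treats USS as Private_Clean + Private_Dirty; `summarize_smaps` and `read_smaps` are hypothetical helper names, and this is not the utility used for the numbers in the report.

```python
import re
from collections import Counter

# Matches counter lines such as "Pss:                  60 kB".
FIELD = re.compile(r"^(\w+):\s+(\d+) kB$")


def summarize_smaps(text):
    """Sum the per-mapping kB counters of an smaps dump."""
    totals = Counter()
    for line in text.splitlines():
        m = FIELD.match(line.strip())
        if m:
            totals[m.group(1)] += int(m.group(2))
    return {
        "rss": totals["Rss"],
        "pss": totals["Pss"],
        "shared": totals["Shared_Clean"] + totals["Shared_Dirty"],
        "uss": totals["Private_Clean"] + totals["Private_Dirty"],
    }


def read_smaps(pid):
    """Read and summarize /proc/<pid>/smaps for a live process."""
    with open(f"/proc/{pid}/smaps") as f:
        return summarize_smaps(f.read())
```

Reading another user's smaps file may need elevated privileges, and the totals are a snapshot that can change between two reads.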
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Re: [EDT] oom_killer: find bulkiest task based on pss value

2015-05-08 Thread Yogesh Narayan Gaur

--- Original Message ---
Sender : yalin wang <yalin.wang2...@gmail.com>
Date : May 08, 2015 13:17 (GMT+05:30)
Title : Re: [EDT] oom_killer: find bulkiest task based on pss value

2015-05-08 13:29 GMT+08:00 Yogesh Narayan Gaur :

>> EP-2DAD0AFA905A4ACB804C4F82A001242F
>> Hi Andrew,
>>
>> Presently in oom_kill.c we calculate badness score of the victim task as per 
>> the present RSS counter value of the task.
>> RSS counter value for any task is usually '[Private (Dirty/Clean)] + [Shared 
>> (Dirty/Clean)]' of the task.
>> We have encountered a situation where values for Private fields are less but 
>> value for Shared fields are more and hence make total RSS counter value 
>> large. Later on oom situation killing task with highest RSS value but as 
>> Private field values are not large hence memory gain after killing this 
>> process is not as per the expectation.
>>
>> For e.g. take below use-case scenario, in which 3 process are running in 
>> system.
>> All these process done mmap for file exist in present directory and then 
>> copying data from this file to local allocated pointers in while(1) loop 
>> with some sleep. Out of 3 process, 2 process has mmaped file with MAP_SHARED 
>> setting and one has mapped file with MAP_PRIVATE setting.
>> I have all 3 processes in background and checks RSS/PSS value from user 
>> space utility (utility over cat /proc/pid/smaps)
>> Before OOM, below is the consumed memory status for these 3 process (all 
>> processes run with oom_score_adj = 0)
>> 
>> Comm : 1prg,  Pid : 213 (values in kB)
>>                Rss   Shared  Private      Pss
>>   Process : 375764   194596   181168   278460
>> 
>> Comm : 3prg,  Pid : 217 (values in kB)
>>                Rss   Shared  Private      Pss
>>   Process : 305760       32   305728   305738
>> 
>> Comm : 2prg,  Pid : 218 (values in kB)
>>                Rss   Shared  Private      Pss
>>   Process : 389980   194596   195384   292676
>> 
>>
>> Thus as per present code design, first it would select process [2prg : 218] 
>> as bulkiest process as its RSS value is highest to kill. But if we kill this 
>> process then only ~195MB would be free as compare to expected ~389MB.
>> Thus identifying the task based on RSS value is not accurate design and 
>> killing that identified process didn’t release expected memory back to 
>> system.
>>
>> We need to calculate victim task based on PSS instead of RSS as PSS value 
>> calculates as
>> PSS value = [Private (Dirty/Clean)] + [Shared (Dirty/Clean) / no. of shared 
>> task]
>> For above use-case scenario also, it can be checked that process [3prg : 
>> 217] is having largest PSS value and by killing this process we can gain 
>> maximum memory (~305MB) as compare to killing process identified based on 
>> RSS value.
>>
>> --
>> Regards,
>> Yogesh Gaur.

>
> Great,
>
> in fact, i also encounter this scenario,
> I  use USS (page map counter == 1) pages
> to decide which process should be killed,
> seems have the same result as you use PSS,
> but PSS is better , it also consider shared pages,
> in case some process have large shared pages mapping
> but little Private page mapping
>
> BRs,
> Yalin

I have made a patch which identifies the bulkiest task on the basis of its 
PSS value. Please check the patch below.
This patch corrects the way the victim task is identified in an OOM condition.

==

From 1c3d7f552f696bdbc0126c8e23beabedbd80e423 Mon Sep 17 00:00:00 2001
From: Yogesh Gaur <yn.g...@samsung.com>
Date: Thu, 7 May 2015 01:52:13 +0530
Subject: [PATCH] oom: find victim task based on pss

This patch identifies the bulkiest task for the OOM killer to kill on the
basis of its PSS value instead of the present RSS value.
There can be scenarios where the task with the highest RSS counter is
consuming a lot of shared memory, and killing that task does not release the
expected amount of memory to the system.
PSS value = [Private (Dirty/Clean)] + [Shared (Dirty/Clean) / no. of sharing
tasks]
RSS value = [Private (Dirty/Clean)] + [Shared (Dirty/Clean)]
Thus the PSS value is used instead of the RSS value, as PSS more closely
matches the actual memory usage of the task.
This patch uses the smaps_pte_range() interface, which is available under
CONFIG_PROC_PAGE_MONITOR.
When CONFIG_PROC_PAGE_MONITOR is disabled, this simply returns the RSS count.

Signed-off-by: Yogesh Gaur <yn.g...@samsung.com>
Signed-off-by: Amit Arora <amit.ar...@samsung.com>
Reviewed-by: Ajeet Yadav <ajee...@samsung.com>
---
 fs/proc/task_mmu.c |   47 +++
 include/linux/mm.h |    9 +
 mm/oom_kill.c      |    9 +++--
 3 files changed, 63 insertions(+), 2 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 956b75d..dd962ff 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -964,6 +964,53 @@ struct pagemapread {
bool
