Re: [PATCH v15 1/2] Reorganize the oom report in dump_header

2018-11-22 Thread Michal Hocko
On Wed 21-11-18 19:29:58, ufo19890...@gmail.com wrote:
> From: yuzhoujian 
> 
> An OOM report contains several sections. The first one is the allocation
> context that has triggered the OOM. Then we have the cpuset context
> followed by the stack trace of the OOM path. The third one is the OOM
> memory information, followed by the current memory state of all system
> tasks. Finally, we show the oom eligible tasks and the information
> about the chosen oom victim.
> 
> One thing that makes parsing more awkward than necessary is that we do
> not have a single, easily parsable line about the oom context. This
> patch reorganizes the oom report into the following sections:
> 1) who invoked oom and what was the allocation request
> [  515.902945] tuned invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
> 
> 2) OOM stack trace
> [  515.904273] CPU: 24 PID: 1809 Comm: tuned Not tainted 4.20.0-rc3+ #3
> [  515.905518] Hardware name: Inspur SA5212M4/YZMB-00370-107, BIOS 4.1.10 11/14/2016
> [  515.906821] Call Trace:
> [  515.908062]  dump_stack+0x5a/0x73
> [  515.909311]  dump_header+0x55/0x28c
> [  515.914260]  oom_kill_process+0x2d8/0x300
> [  515.916708]  out_of_memory+0x145/0x4a0
> [  515.917932]  __alloc_pages_slowpath+0x7d2/0xa16
> [  515.919157]  __alloc_pages_nodemask+0x277/0x290
> [  515.920367]  filemap_fault+0x3d0/0x6c0
> [  515.921529]  ? filemap_map_pages+0x2b8/0x420
> [  515.922709]  ext4_filemap_fault+0x2c/0x40 [ext4]
> [  515.923884]  __do_fault+0x20/0x80
> [  515.925032]  __handle_mm_fault+0xbc0/0xe80
> [  515.926195]  handle_mm_fault+0xfa/0x210
> [  515.927357]  __do_page_fault+0x233/0x4c0
> [  515.928506]  do_page_fault+0x32/0x140
> [  515.929646]  ? page_fault+0x8/0x30
> [  515.930770]  page_fault+0x1e/0x30
> 
> 3) OOM memory information
> [  515.958093] Mem-Info:
> [  515.959647] active_anon:26501758 inactive_anon:1179809 isolated_anon:0
>  active_file:4402672 inactive_file:483963 isolated_file:1344
>  unevictable:0 dirty:4886753 writeback:0 unstable:0
>  slab_reclaimable:148442 slab_unreclaimable:18741
>  mapped:1347 shmem:1347 pagetables:58669 bounce:0
>  free:88663 free_pcp:0 free_cma:0
> ...
> 
> 4) current memory state of all system tasks
> [  516.079544] [    744]     0   744     9211     1345   114688       82             0 systemd-journal
> [  516.082034] [    787]     0   787    31764        0   143360       92             0 lvmetad
> [  516.084465] [    792]     0   792    10930        1   110592      208         -1000 systemd-udevd
> [  516.086865] [   1199]     0  1199    13866        0   131072      112         -1000 auditd
> [  516.089190] [   1222]     0  1222    31990        1   110592      157             0 smartd
> [  516.091477] [   1225]     0  1225     4864       85    81920       43             0 irqbalance
> [  516.093712] [   1226]     0  1226    52612        0   258048      426             0 abrtd
> [  516.112128] [   1280]     0  1280   109774       55   299008      400             0 NetworkManager
> [  516.113998] [   1295]     0  1295    28817       37    69632       24             0 ksmtuned
> [  516.144596] [  10718]     0 10718  2622484  1721372 15998976   267219             0 panic
> [  516.145792] [  10719]     0 10719  2622484  1164767  9818112    53576             0 panic
> [  516.146977] [  10720]     0 10720  2622484  1174361  9904128    53709             0 panic
> [  516.148163] [  10721]     0 10721  2622484  1209070 10194944    54824             0 panic
> [  516.149329] [  10722]     0 10722  2622484  1745799 14774272    91138             0 panic
> 
> 5) oom context (constraints and the chosen victim).
> oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0-1,task=panic,pid=10737,uid=0
> 
> An admin can easily get the full oom context from a single line, which
> makes parsing much easier.
> 
> Signed-off-by: yuzhoujian 

Looks good, finally
Acked-by: Michal Hocko 
-- 
Michal Hocko
SUSE Labs
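
As an illustration of the parsing argument made in the changelog quoted above,
the single oom-kill line can be split into fields from userspace with a few
lines of C. This is only a minimal sketch: struct oom_summary, get_field() and
parse_oom_summary() are hypothetical names that are not part of the patch, and
the code assumes the key names shown in the sample line. It also assumes that
no value contains a comma, which would not hold for a mems_allowed list such
as 0,2-3.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the fields of the one-line summary. */
struct oom_summary {
	char constraint[32];
	char nodemask[64];
	char cpuset[128];
	char mems_allowed[64];
	char task[32];
	int pid;
	unsigned int uid;
};

/*
 * Copy the value of "key=" into out. The key must directly follow
 * ':' or ',' and the value ends at the next ',' or newline.
 * Returns 0 on success, -1 if the key is missing or the value is too long.
 */
static int get_field(const char *rec, const char *key, char *out, size_t len)
{
	size_t klen = strlen(key);
	const char *p = rec;
	size_t n;

	while ((p = strstr(p, key)) != NULL) {
		if (p != rec && (p[-1] == ':' || p[-1] == ',') && p[klen] == '=')
			break;
		p += klen;	/* not a real key, keep searching */
	}
	if (!p)
		return -1;

	p += klen + 1;			/* skip "key=" */
	n = strcspn(p, ",\n");		/* length of the value */
	if (n >= len)
		return -1;
	memcpy(out, p, n);
	out[n] = '\0';
	return 0;
}

/* Parse one log line; returns 0 on success, -1 if it is not a summary line. */
static int parse_oom_summary(const char *line, struct oom_summary *s)
{
	char pid[16], uid[16];
	const char *rec = strstr(line, "oom-kill:");

	if (!rec)
		return -1;
	if (get_field(rec, "constraint", s->constraint, sizeof(s->constraint)) ||
	    get_field(rec, "nodemask", s->nodemask, sizeof(s->nodemask)) ||
	    get_field(rec, "cpuset", s->cpuset, sizeof(s->cpuset)) ||
	    get_field(rec, "mems_allowed", s->mems_allowed, sizeof(s->mems_allowed)) ||
	    get_field(rec, "task", s->task, sizeof(s->task)) ||
	    get_field(rec, "pid", pid, sizeof(pid)) ||
	    get_field(rec, "uid", uid, sizeof(uid)))
		return -1;

	s->pid = atoi(pid);
	s->uid = (unsigned int)strtoul(uid, NULL, 10);
	return 0;
}
```

A caller would feed each kernel log line to parse_oom_summary() and act only on
the lines for which it returns 0; any leading printk timestamp is skipped
because the search starts at the "oom-kill:" prefix.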


[PATCH v15 1/2] Reorganize the oom report in dump_header

2018-11-21 Thread ufo19890607
From: yuzhoujian 

An OOM report contains several sections. The first one is the allocation
context that has triggered the OOM. Then we have the cpuset context
followed by the stack trace of the OOM path. The third one is the OOM
memory information, followed by the current memory state of all system
tasks. Finally, we show the oom eligible tasks and the information
about the chosen oom victim.

One thing that makes parsing more awkward than necessary is that we do
not have a single, easily parsable line about the oom context. This
patch reorganizes the oom report into the following sections:
1) who invoked oom and what was the allocation request
[  515.902945] tuned invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0

2) OOM stack trace
[  515.904273] CPU: 24 PID: 1809 Comm: tuned Not tainted 4.20.0-rc3+ #3
[  515.905518] Hardware name: Inspur SA5212M4/YZMB-00370-107, BIOS 4.1.10 11/14/2016
[  515.906821] Call Trace:
[  515.908062]  dump_stack+0x5a/0x73
[  515.909311]  dump_header+0x55/0x28c
[  515.914260]  oom_kill_process+0x2d8/0x300
[  515.916708]  out_of_memory+0x145/0x4a0
[  515.917932]  __alloc_pages_slowpath+0x7d2/0xa16
[  515.919157]  __alloc_pages_nodemask+0x277/0x290
[  515.920367]  filemap_fault+0x3d0/0x6c0
[  515.921529]  ? filemap_map_pages+0x2b8/0x420
[  515.922709]  ext4_filemap_fault+0x2c/0x40 [ext4]
[  515.923884]  __do_fault+0x20/0x80
[  515.925032]  __handle_mm_fault+0xbc0/0xe80
[  515.926195]  handle_mm_fault+0xfa/0x210
[  515.927357]  __do_page_fault+0x233/0x4c0
[  515.928506]  do_page_fault+0x32/0x140
[  515.929646]  ? page_fault+0x8/0x30
[  515.930770]  page_fault+0x1e/0x30

3) OOM memory information
[  515.958093] Mem-Info:
[  515.959647] active_anon:26501758 inactive_anon:1179809 isolated_anon:0
 active_file:4402672 inactive_file:483963 isolated_file:1344
 unevictable:0 dirty:4886753 writeback:0 unstable:0
 slab_reclaimable:148442 slab_unreclaimable:18741
 mapped:1347 shmem:1347 pagetables:58669 bounce:0
 free:88663 free_pcp:0 free_cma:0
...

4) current memory state of all system tasks
[  516.079544] [    744]     0   744     9211     1345   114688       82             0 systemd-journal
[  516.082034] [    787]     0   787    31764        0   143360       92             0 lvmetad
[  516.084465] [    792]     0   792    10930        1   110592      208         -1000 systemd-udevd
[  516.086865] [   1199]     0  1199    13866        0   131072      112         -1000 auditd
[  516.089190] [   1222]     0  1222    31990        1   110592      157             0 smartd
[  516.091477] [   1225]     0  1225     4864       85    81920       43             0 irqbalance
[  516.093712] [   1226]     0  1226    52612        0   258048      426             0 abrtd
[  516.112128] [   1280]     0  1280   109774       55   299008      400             0 NetworkManager
[  516.113998] [   1295]     0  1295    28817       37    69632       24             0 ksmtuned
[  516.144596] [  10718]     0 10718  2622484  1721372 15998976   267219             0 panic
[  516.145792] [  10719]     0 10719  2622484  1164767  9818112    53576             0 panic
[  516.146977] [  10720]     0 10720  2622484  1174361  9904128    53709             0 panic
[  516.148163] [  10721]     0 10721  2622484  1209070 10194944    54824             0 panic
[  516.149329] [  10722]     0 10722  2622484  1745799 14774272    91138             0 panic

5) oom context (constraints and the chosen victim).
oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0-1,task=panic,pid=10737,uid=0

An admin can easily get the full oom context from a single line, which
makes parsing much easier.

Signed-off-by: yuzhoujian 
---
Changes since v14:
- add the dump_oom_summary for the single line output of oom context.
- fix the null pointer in the dump_header.

Changes since v13:
- remove the spaces for printing pid and uid.

Changes since v12:
- print the cpuset and memory allocation information after oom victim comm, pid.

Changes since v11:
- move the array of const char oom_constraint_text to oom_kill.c
- add the cpuset information in the one line output.

Changes since v10:
- divide the patch v8 into two parts. One part is to add the array of const
  char and put enum oom_constraint into oom.h; the other adds a new func to
  print the missing information for the system-wide oom report.

Changes since v9:
- divide the patch v8 into two parts. One part is to move enum oom_constraint
  into memcontrol.h; the other refactors the output info in the dump_header.
- replace orgin_memcg and kill_memcg with oom_memcg and task_memcg respectively.

Changes since v8:
- add the constraint in the oom_control structure.
- put enum oom_constraint and constraint array into the oom.h file.
- simplify the description for mem_cgroup_print_oom_context.

Changes since v7:
- add the constraint parameter to dump_header and oom_kill_process.
- remove the static char array in the mem_cgroup_print_oom_context, and
  invoke pr_cont_

Re: [PATCH v15 1/2] Reorganize the oom report in dump_header

2018-11-02 Thread Michal Hocko
On Fri 02-11-18 14:18:59, 禹舟键 wrote:
> Hi Michal
> The message-id is as below
> https://lkml.org/lkml/2018/7/31/148

David said
: It's possible that p is NULL when calling dump_header().  In this case we
: do not want to print any line concerning a victim because no oom kill has
: occurred.

This means that we should check for p rather than oc.

-- 
Michal Hocko
SUSE Labs
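
David's point quoted above can be illustrated with a small userspace sketch.
The struct names and format_oom_summary() below are hypothetical stand-ins,
not kernel code, and the summary is shortened to the constraint and victim
fields. The only thing the sketch is meant to show is that the guard tests the
victim pointer rather than oc, since oc is always valid while the victim may
be absent.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel structures. */
struct task_sketch {
	const char *comm;
	int pid;
	unsigned int uid;
};

struct oom_control_sketch {
	const char *constraint_text;
};

/*
 * Format a shortened one-line summary into buf.
 * Returns the number of characters snprintf() would have written,
 * or 0 when there is no victim and nothing is emitted.
 */
static int format_oom_summary(char *buf, size_t len,
			      const struct oom_control_sketch *oc,
			      const struct task_sketch *victim)
{
	if (!victim) {
		/* No oom kill has occurred: print no victim line at all. */
		if (len)
			buf[0] = '\0';
		return 0;
	}

	return snprintf(buf, len, "oom-kill:constraint=%s,task=%s,pid=%d,uid=%u",
			oc->constraint_text, victim->comm, victim->pid,
			victim->uid);
}
```

Calling format_oom_summary() with a NULL victim leaves the buffer empty, which
corresponds to not printing any line concerning a victim.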


Re: [PATCH v15 1/2] Reorganize the oom report in dump_header

2018-11-01 Thread Michal Hocko
On Thu 01-11-18 18:09:39, 禹舟键 wrote:
> Hi Michal
> A NULL pointer dereference is possible when calling dump_header; this bug
> was detected by LKP. Below is the context from 3 months ago.

Yeah, I remember it was a 0day report, but I couldn't find it in my email
archive. Do you happen to have a message-id?

Anyway
 	if (__ratelimit(&oom_rs))
 		dump_header(oc, p);
+	if (oc)
+		dump_oom_summary(oc, victim);

clearly cannot fix any NULL ptr dereference because oc is never NULL, unless
I am missing something terribly.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH v15 1/2] Reorganize the oom report in dump_header

2018-10-31 Thread Michal Hocko
On Sat 29-09-18 21:06:26, ufo19890...@gmail.com wrote:
[...]
> Changes since v14:
> - add the dump_oom_summary for the single line output of oom context.
> - fix the null pointer in the dump_header.

I do not remember the details about this NULL ptr, but the fix you seem to
have made is
[...]
> +static void dump_oom_summary(struct oom_control *oc, struct task_struct *victim)
> +{
> +	/* one line summary of the oom killer context. */
> +	pr_info("oom-kill:constraint=%s,nodemask=%*pbl",
> +			oom_constraint_text[oc->constraint],
> +			nodemask_pr_args(oc->nodemask));
> +	cpuset_print_current_mems_allowed();
> +	pr_cont(",task=%s,pid=%d,uid=%d\n", victim->comm, victim->pid,
> +		from_kuid(&init_user_ns, task_uid(victim)));
> +}
> +
>  /*
>   * Number of OOM victims in flight
>   */
> @@ -951,6 +960,8 @@ static void oom_kill_process(struct oom_control *oc, const char *message)
>  
>  	if (__ratelimit(&oom_rs))
>  		dump_header(oc, p);
> +	if (oc)
> +		dump_oom_summary(oc, victim);
>  

this? If yes, then this is bogus because oc is never NULL. Besides that,
you used to have this one-line summary in dump_header, which looks like a
much better fit to me than oom_kill_process.

-- 
Michal Hocko
SUSE Labs


[PATCH v15 1/2] Reorganize the oom report in dump_header

2018-09-29 Thread ufo19890607
From: yuzhoujian 

An OOM report contains several sections. The first one is the allocation
context that has triggered the OOM. Then we have the cpuset context
followed by the stack trace of the OOM path. The third one is the OOM
memory information, followed by the current memory state of all system
tasks. Finally, we show the oom eligible tasks and the information
about the chosen oom victim.

One thing that makes parsing more awkward than necessary is that we do
not have a single, easily parsable line about the oom context. This
patch reorganizes the oom report into the following sections:
1) who invoked oom and what was the allocation request
[  515.902945] tuned invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0

2) OOM stack trace
[  515.904273] CPU: 24 PID: 1809 Comm: tuned Not tainted 4.19.0-rc5+ #3
[  515.905518] Hardware name: Inspur SA5212M4/YZMB-00370-107, BIOS 4.1.10 11/14/2016
[  515.906821] Call Trace:
[  515.908062]  dump_stack+0x5a/0x73
[  515.909311]  dump_header+0x55/0x28c
[  515.914260]  oom_kill_process+0x2d8/0x300
[  515.916708]  out_of_memory+0x145/0x4a0
[  515.917932]  __alloc_pages_slowpath+0x7d2/0xa16
[  515.919157]  __alloc_pages_nodemask+0x277/0x290
[  515.920367]  filemap_fault+0x3d0/0x6c0
[  515.921529]  ? filemap_map_pages+0x2b8/0x420
[  515.922709]  ext4_filemap_fault+0x2c/0x40 [ext4]
[  515.923884]  __do_fault+0x20/0x80
[  515.925032]  __handle_mm_fault+0xbc0/0xe80
[  515.926195]  handle_mm_fault+0xfa/0x210
[  515.927357]  __do_page_fault+0x233/0x4c0
[  515.928506]  do_page_fault+0x32/0x140
[  515.929646]  ? page_fault+0x8/0x30
[  515.930770]  page_fault+0x1e/0x30

3) OOM memory information
[  515.958093] Mem-Info:
[  515.959647] active_anon:26501758 inactive_anon:1179809 isolated_anon:0
 active_file:4402672 inactive_file:483963 isolated_file:1344
 unevictable:0 dirty:4886753 writeback:0 unstable:0
 slab_reclaimable:148442 slab_unreclaimable:18741
 mapped:1347 shmem:1347 pagetables:58669 bounce:0
 free:88663 free_pcp:0 free_cma:0
...

4) current memory state of all system tasks
[  516.079544] [    744]     0   744     9211     1345   114688       82             0 systemd-journal
[  516.082034] [    787]     0   787    31764        0   143360       92             0 lvmetad
[  516.084465] [    792]     0   792    10930        1   110592      208         -1000 systemd-udevd
[  516.086865] [   1199]     0  1199    13866        0   131072      112         -1000 auditd
[  516.089190] [   1222]     0  1222    31990        1   110592      157             0 smartd
[  516.091477] [   1225]     0  1225     4864       85    81920       43             0 irqbalance
[  516.093712] [   1226]     0  1226    52612        0   258048      426             0 abrtd
[  516.112128] [   1280]     0  1280   109774       55   299008      400             0 NetworkManager
[  516.113998] [   1295]     0  1295    28817       37    69632       24             0 ksmtuned
[  516.144596] [  10718]     0 10718  2622484  1721372 15998976   267219             0 panic
[  516.145792] [  10719]     0 10719  2622484  1164767  9818112    53576             0 panic
[  516.146977] [  10720]     0 10720  2622484  1174361  9904128    53709             0 panic
[  516.148163] [  10721]     0 10721  2622484  1209070 10194944    54824             0 panic
[  516.149329] [  10722]     0 10722  2622484  1745799 14774272    91138             0 panic

5) oom context (constraints and the chosen victim).
oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0-1,task=panic,pid=10737,uid=0

An admin can easily get the full oom context from a single line, which
makes parsing much easier.

Signed-off-by: yuzhoujian 
---
Changes since v14:
- add the dump_oom_summary for the single line output of oom context.
- fix the null pointer in the dump_header.

Changes since v13:
- remove the spaces for printing pid and uid.

Changes since v12:
- print the cpuset and memory allocation information after oom victim comm, pid.

Changes since v11:
- move the array of const char oom_constraint_text to oom_kill.c
- add the cpuset information in the one line output.

Changes since v10:
- divide the patch v8 into two parts. One part is to add the array of const
  char and put enum oom_constraint into oom.h; the other adds a new func to
  print the missing information for the system-wide oom report.

Changes since v9:
- divide the patch v8 into two parts. One part is to move enum oom_constraint
  into memcontrol.h; the other refactors the output info in the dump_header.
- replace orgin_memcg and kill_memcg with oom_memcg and task_memcg respectively.

Changes since v8:
- add the constraint in the oom_control structure.
- put enum oom_constraint and constraint array into the oom.h file.
- simplify the description for mem_cgroup_print_oom_context.

Changes since v7:
- add the constraint parameter to dump_header and oom_kill_process.
- remove the static char array in the mem_cgroup_print_oom_context, and
  invoke pr_cont_