Re: [PATCH] OOM killer: wait for tasks with pending SIGKILL to exit

2013-10-01 Thread David Rientjes
On Tue, 1 Oct 2013, Sergey Dyasly wrote:

> If you are ok with the first change in my patch regarding 
> fatal_signal_pending,
> I can send new patch with just that change.
> 

The entire patch is pointless, there's no need to give access to memory 
reserves simply because it is PF_EXITING.  If it needs memory, it will 
call the oom killer itself.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] OOM killer: wait for tasks with pending SIGKILL to exit

2013-10-01 Thread Sergey Dyasly
It seems to me that we are going nowhere with this discussion...

If you are ok with the first change in my patch regarding fatal_signal_pending,
I can send new patch with just that change.


On Mon, 30 Sep 2013 15:08:25 -0700 (PDT)
David Rientjes  wrote:

> On Fri, 27 Sep 2013, Sergey Dyasly wrote:
> 
> > What you are saying contradicts current OOMk code the way I read it. 
> > Comment in
> > oom_kill_process() says:
> > 
> > "If the task is already exiting ... set TIF_MEMDIE so it can die quickly"
> > 
> > I just want to know the right solution.
> > 
> 
> That's a comment, not code.  The point of the PF_EXITING special handling 
> in oom_kill_process() is to avoid telling sysadmins that a process has 
> been killed to free memory when it has already called exit() and to avoid 
> sacrificing one of its children for the exiting process.
> 
> It may or may not need access to memory reserves to actually exit after 
> PF_EXITING depending on whether it needs to allocate memory for 
> coredumping or anything else.  So instead of waiting for it to recall the 
> oom killer, TIF_MEMDIE is set anyway.  The point is that PF_EXITING 
> processes can already get TIF_MEMDIE immediately when their memory 
> allocation fails so there's no reason not to set it now as an 
> optimization.
> 
> But we definitely want to avoid printing anything to the kernel log when 
> the process has already called exit() and issuing the SIGKILL at that 
> point would be pointless.
> 
> > You are mistaken, oom_kill_process() is only called from out_of_memory()
> > and mem_cgroup_out_of_memory().
> > 
> 
> out_of_memory() calls oom_kill_process() in two places, plus the call from 
> mem_cgroup_out_of_memory(), making three calls in the tree.  Not that this 
> matters in the slightest, though.
> 
> > > Read the comment about why we don't emit anything to the kernel log in 
> > > this case; the process is already exiting, there's no need to kill it or 
> > > make anyone believe that it was killed.
> > 
> > Yes, but there is already the PF_EXITING check in oom_scan_process_thread(),
> > and in this case oom_kill_process() won't be even called. That's why it's
> > redundant.
> > 
> 
> You apparently have no idea how long select_bad_process() runs on a large 
> system with thousands of processes.  Keep in mind that SGI requested the 
> addition of the oom_kill_allocating_task sysctl specifically because of 
> how long select_bad_process() runs.  The PF_EXITING check in 
> oom_kill_process() is simply an optimization to return early and with 
> access to memory reserves so it can exit as quickly as possible and 
> without the kernel stating it's killing something that has already called 
> exit().


-- 
Sergey Dyasly 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] OOM killer: wait for tasks with pending SIGKILL to exit

2013-10-01 Thread Sergey Dyasly
It seems to me that we are going nowhere with this discussion...

If you are ok with the first change in my patch regarding fatal_signal_pending,
I can send new patch with just that change.


On Mon, 30 Sep 2013 15:08:25 -0700 (PDT)
David Rientjes rient...@google.com wrote:

 On Fri, 27 Sep 2013, Sergey Dyasly wrote:
 
  What you are saying contradicts current OOMk code the way I read it. 
  Comment in
  oom_kill_process() says:
  
  If the task is already exiting ... set TIF_MEMDIE so it can die quickly
  
  I just want to know the right solution.
  
 
 That's a comment, not code.  The point of the PF_EXITING special handling 
 in oom_kill_process() is to avoid telling sysadmins that a process has 
 been killed to free memory when it has already called exit() and to avoid 
 sacrificing one of its children for the exiting process.
 
 It may or may not need access to memory reserves to actually exit after 
 PF_EXITING depending on whether it needs to allocate memory for 
 coredumping or anything else.  So instead of waiting for it to recall the 
 oom killer, TIF_MEMDIE is set anyway.  The point is that PF_EXITING 
 processes can already get TIF_MEMDIE immediately when their memory 
 allocation fails so there's no reason not to set it now as an 
 optimization.
 
 But we definitely want to avoid printing anything to the kernel log when 
 the process has already called exit() and issuing the SIGKILL at that 
 point would be pointless.
 
  You are mistaken, oom_kill_process() is only called from out_of_memory()
  and mem_cgroup_out_of_memory().
  
 
 out_of_memory() calls oom_kill_process() in two places, plus the call from 
 mem_cgroup_out_of_memory(), making three calls in the tree.  Not that this 
 matters in the slightest, though.
 
   Read the comment about why we don't emit anything to the kernel log in 
   this case; the process is already exiting, there's no need to kill it or 
   make anyone believe that it was killed.
  
  Yes, but there is already the PF_EXITING check in oom_scan_process_thread(),
  and in this case oom_kill_process() won't be even called. That's why it's
  redundant.
  
 
 You apparently have no idea how long select_bad_process() runs on a large 
 system with thousands of processes.  Keep in mind that SGI requested the 
 addition of the oom_kill_allocating_task sysctl specifically because of 
 how long select_bad_process() runs.  The PF_EXITING check in 
 oom_kill_process() is simply an optimization to return early and with 
 access to memory reserves so it can exit as quickly as possible and 
 without the kernel stating it's killing something that has already called 
 exit().


-- 
Sergey Dyasly dse...@gmail.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] OOM killer: wait for tasks with pending SIGKILL to exit

2013-10-01 Thread David Rientjes
On Tue, 1 Oct 2013, Sergey Dyasly wrote:

 If you are ok with the first change in my patch regarding 
 fatal_signal_pending,
 I can send new patch with just that change.
 

The entire patch is pointless, there's no need to give access to memory 
reserves simply because it is PF_EXITING.  If it needs memory, it will 
call the oom killer itself.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] OOM killer: wait for tasks with pending SIGKILL to exit

2013-09-30 Thread David Rientjes
On Fri, 27 Sep 2013, Sergey Dyasly wrote:

> What you are saying contradicts current OOMk code the way I read it. Comment 
> in
> oom_kill_process() says:
> 
> "If the task is already exiting ... set TIF_MEMDIE so it can die quickly"
> 
> I just want to know the right solution.
> 

That's a comment, not code.  The point of the PF_EXITING special handling 
in oom_kill_process() is to avoid telling sysadmins that a process has 
been killed to free memory when it has already called exit() and to avoid 
sacrificing one of its children for the exiting process.

It may or may not need access to memory reserves to actually exit after 
PF_EXITING depending on whether it needs to allocate memory for 
coredumping or anything else.  So instead of waiting for it to recall the 
oom killer, TIF_MEMDIE is set anyway.  The point is that PF_EXITING 
processes can already get TIF_MEMDIE immediately when their memory 
allocation fails so there's no reason not to set it now as an 
optimization.

But we definitely want to avoid printing anything to the kernel log when 
the process has already called exit() and issuing the SIGKILL at that 
point would be pointless.

> You are mistaken, oom_kill_process() is only called from out_of_memory()
> and mem_cgroup_out_of_memory().
> 

out_of_memory() calls oom_kill_process() in two places, plus the call from 
mem_cgroup_out_of_memory(), making three calls in the tree.  Not that this 
matters in the slightest, though.

> > Read the comment about why we don't emit anything to the kernel log in 
> > this case; the process is already exiting, there's no need to kill it or 
> > make anyone believe that it was killed.
> 
> Yes, but there is already the PF_EXITING check in oom_scan_process_thread(),
> and in this case oom_kill_process() won't be even called. That's why it's
> redundant.
> 

You apparently have no idea how long select_bad_process() runs on a large 
system with thousands of processes.  Keep in mind that SGI requested the 
addition of the oom_kill_allocating_task sysctl specifically because of 
how long select_bad_process() runs.  The PF_EXITING check in 
oom_kill_process() is simply an optimization to return early and with 
access to memory reserves so it can exit as quickly as possible and 
without the kernel stating it's killing something that has already called 
exit().
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] OOM killer: wait for tasks with pending SIGKILL to exit

2013-09-30 Thread David Rientjes
On Fri, 27 Sep 2013, Sergey Dyasly wrote:

 What you are saying contradicts current OOMk code the way I read it. Comment 
 in
 oom_kill_process() says:
 
 If the task is already exiting ... set TIF_MEMDIE so it can die quickly
 
 I just want to know the right solution.
 

That's a comment, not code.  The point of the PF_EXITING special handling 
in oom_kill_process() is to avoid telling sysadmins that a process has 
been killed to free memory when it has already called exit() and to avoid 
sacrificing one of its children for the exiting process.

It may or may not need access to memory reserves to actually exit after 
PF_EXITING depending on whether it needs to allocate memory for 
coredumping or anything else.  So instead of waiting for it to recall the 
oom killer, TIF_MEMDIE is set anyway.  The point is that PF_EXITING 
processes can already get TIF_MEMDIE immediately when their memory 
allocation fails so there's no reason not to set it now as an 
optimization.

But we definitely want to avoid printing anything to the kernel log when 
the process has already called exit() and issuing the SIGKILL at that 
point would be pointless.

 You are mistaken, oom_kill_process() is only called from out_of_memory()
 and mem_cgroup_out_of_memory().
 

out_of_memory() calls oom_kill_process() in two places, plus the call from 
mem_cgroup_out_of_memory(), making three calls in the tree.  Not that this 
matters in the slightest, though.

  Read the comment about why we don't emit anything to the kernel log in 
  this case; the process is already exiting, there's no need to kill it or 
  make anyone believe that it was killed.
 
 Yes, but there is already the PF_EXITING check in oom_scan_process_thread(),
 and in this case oom_kill_process() won't be even called. That's why it's
 redundant.
 

You apparently have no idea how long select_bad_process() runs on a large 
system with thousands of processes.  Keep in mind that SGI requested the 
addition of the oom_kill_allocating_task sysctl specifically because of 
how long select_bad_process() runs.  The PF_EXITING check in 
oom_kill_process() is simply an optimization to return early and with 
access to memory reserves so it can exit as quickly as possible and 
without the kernel stating it's killing something that has already called 
exit().
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] OOM killer: wait for tasks with pending SIGKILL to exit

2013-09-27 Thread Sergey Dyasly
On Wed, 25 Sep 2013 13:31:32 -0700 (PDT)
David Rientjes  wrote:

> On Wed, 11 Sep 2013, Sergey Dyasly wrote:
> 
> > > > /*
> > > >  * If this task is not being ptraced on exit, then wait 
> > > > for it
> > > >  * to finish before killing some other task 
> > > > unnecessarily.
> > > >  */
> > > > -   if (!(task->group_leader->ptrace & PT_TRACE_EXIT))
> > > > +   if (!(task->group_leader->ptrace & PT_TRACE_EXIT)) {
> > > > +   set_tsk_thread_flag(task, TIF_MEMDIE);
> > > 
> > > This does not, we do not give access to memory reserves unless the 
> > > process 
> > > needs it to allocate memory.  The task here, which is not current, can 
> > > call into the oom killer and be granted memory reserves if necessary.
> > 
> > True. However, why TIF_MEMDIE is set for PF_EXITING task in 
> > oom_kill_process()
> > then?
> 
> If current needs access to memory reserves while PF_EXITING, it should 
> call the page allocator, find that it is out of memory, and call the oom 
> killer to silently be granted memory reserves.

I understand this and you are repeating yourself :)
What you are saying contradicts current OOMk code the way I read it. Comment in
oom_kill_process() says:

"If the task is already exiting ... set TIF_MEMDIE so it can die quickly"

I just want to know the right solution.

> > > > @@ -412,16 +415,6 @@ void oom_kill_process(struct task_struct *p, gfp_t 
> > > > gfp_mask, int order,
> > > > static DEFINE_RATELIMIT_STATE(oom_rs, 
> > > > DEFAULT_RATELIMIT_INTERVAL,
> > > >   DEFAULT_RATELIMIT_BURST);
> > > >  
> > > > -   /*
> > > > -* If the task is already exiting, don't alarm the sysadmin or 
> > > > kill
> > > > -* its children or threads, just set TIF_MEMDIE so it can die 
> > > > quickly
> > > > -*/
> > > > -   if (p->flags & PF_EXITING) {
> > > > -   set_tsk_thread_flag(p, TIF_MEMDIE);
> > > > -   put_task_struct(p);
> > > > -   return;
> > > > -   }
> > > 
> > > I think you misunderstood the point of this; if a selected process is 
> > > already in the exit path then this is simply avoiding dumping oom kill 
> > > lines to the kernel log.  We want to keep doing that.
> > 
> > This happens in oom_kill_process() after victim has been selected by
> > select_bad_process(). But there is already PF_EXITING check in
> > oom_scan_process_thread() and in this case OOM code won't call 
> > oom_kill_process.
> 
> select_bad_process() is one of three callers to oom_kill_process().

You are mistaken, oom_kill_process() is only called from out_of_memory()
and mem_cgroup_out_of_memory().

> > The only difference is in force_kill flag, and the only case where it's set
> > is SysRq. And I think in this case OOM killer messages are a good thing to 
> > have
> > even when victim is already exiting, instead of just silence.
> > 
> 
> Read the comment about why we don't emit anything to the kernel log in 
> this case; the process is already exiting, there's no need to kill it or 
> make anyone believe that it was killed.

Yes, but there is already the PF_EXITING check in oom_scan_process_thread(),
and in this case oom_kill_process() won't be even called. That's why it's
redundant.

--
Sergey Dyasly 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] OOM killer: wait for tasks with pending SIGKILL to exit

2013-09-27 Thread Sergey Dyasly
On Wed, 25 Sep 2013 13:31:32 -0700 (PDT)
David Rientjes rient...@google.com wrote:

 On Wed, 11 Sep 2013, Sergey Dyasly wrote:
 
/*
 * If this task is not being ptraced on exit, then wait 
for it
 * to finish before killing some other task 
unnecessarily.
 */
-   if (!(task-group_leader-ptrace  PT_TRACE_EXIT))
+   if (!(task-group_leader-ptrace  PT_TRACE_EXIT)) {
+   set_tsk_thread_flag(task, TIF_MEMDIE);
   
   This does not, we do not give access to memory reserves unless the 
   process 
   needs it to allocate memory.  The task here, which is not current, can 
   call into the oom killer and be granted memory reserves if necessary.
  
  True. However, why TIF_MEMDIE is set for PF_EXITING task in 
  oom_kill_process()
  then?
 
 If current needs access to memory reserves while PF_EXITING, it should 
 call the page allocator, find that it is out of memory, and call the oom 
 killer to silently be granted memory reserves.

I understand this and you are repeating yourself :)
What you are saying contradicts current OOMk code the way I read it. Comment in
oom_kill_process() says:

If the task is already exiting ... set TIF_MEMDIE so it can die quickly

I just want to know the right solution.

@@ -412,16 +415,6 @@ void oom_kill_process(struct task_struct *p, gfp_t 
gfp_mask, int order,
static DEFINE_RATELIMIT_STATE(oom_rs, 
DEFAULT_RATELIMIT_INTERVAL,
  DEFAULT_RATELIMIT_BURST);
 
-   /*
-* If the task is already exiting, don't alarm the sysadmin or 
kill
-* its children or threads, just set TIF_MEMDIE so it can die 
quickly
-*/
-   if (p-flags  PF_EXITING) {
-   set_tsk_thread_flag(p, TIF_MEMDIE);
-   put_task_struct(p);
-   return;
-   }
   
   I think you misunderstood the point of this; if a selected process is 
   already in the exit path then this is simply avoiding dumping oom kill 
   lines to the kernel log.  We want to keep doing that.
  
  This happens in oom_kill_process() after victim has been selected by
  select_bad_process(). But there is already PF_EXITING check in
  oom_scan_process_thread() and in this case OOM code won't call 
  oom_kill_process.
 
 select_bad_process() is one of three callers to oom_kill_process().

You are mistaken, oom_kill_process() is only called from out_of_memory()
and mem_cgroup_out_of_memory().

  The only difference is in force_kill flag, and the only case where it's set
  is SysRq. And I think in this case OOM killer messages are a good thing to 
  have
  even when victim is already exiting, instead of just silence.
  
 
 Read the comment about why we don't emit anything to the kernel log in 
 this case; the process is already exiting, there's no need to kill it or 
 make anyone believe that it was killed.

Yes, but there is already the PF_EXITING check in oom_scan_process_thread(),
and in this case oom_kill_process() won't be even called. That's why it's
redundant.

--
Sergey Dyasly dse...@gmail.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] OOM killer: wait for tasks with pending SIGKILL to exit

2013-09-25 Thread David Rientjes
On Wed, 11 Sep 2013, Sergey Dyasly wrote:

> > >   /*
> > >* If this task is not being ptraced on exit, then wait for it
> > >* to finish before killing some other task unnecessarily.
> > >*/
> > > - if (!(task->group_leader->ptrace & PT_TRACE_EXIT))
> > > + if (!(task->group_leader->ptrace & PT_TRACE_EXIT)) {
> > > + set_tsk_thread_flag(task, TIF_MEMDIE);
> > 
> > This does not, we do not give access to memory reserves unless the process 
> > needs it to allocate memory.  The task here, which is not current, can 
> > call into the oom killer and be granted memory reserves if necessary.
> 
> True. However, why TIF_MEMDIE is set for PF_EXITING task in oom_kill_process()
> then?

If current needs access to memory reserves while PF_EXITING, it should 
call the page allocator, find that it is out of memory, and call the oom 
killer to silently be granted memory reserves.

> > > @@ -412,16 +415,6 @@ void oom_kill_process(struct task_struct *p, gfp_t 
> > > gfp_mask, int order,
> > >   static DEFINE_RATELIMIT_STATE(oom_rs, DEFAULT_RATELIMIT_INTERVAL,
> > > DEFAULT_RATELIMIT_BURST);
> > >  
> > > - /*
> > > -  * If the task is already exiting, don't alarm the sysadmin or kill
> > > -  * its children or threads, just set TIF_MEMDIE so it can die quickly
> > > -  */
> > > - if (p->flags & PF_EXITING) {
> > > - set_tsk_thread_flag(p, TIF_MEMDIE);
> > > - put_task_struct(p);
> > > - return;
> > > - }
> > 
> > I think you misunderstood the point of this; if a selected process is 
> > already in the exit path then this is simply avoiding dumping oom kill 
> > lines to the kernel log.  We want to keep doing that.
> 
> This happens in oom_kill_process() after victim has been selected by
> select_bad_process(). But there is already PF_EXITING check in
> oom_scan_process_thread() and in this case OOM code won't call 
> oom_kill_process.

select_bad_process() is one of three callers to oom_kill_process().

> The only difference is in force_kill flag, and the only case where it's set
> is SysRq. And I think in this case OOM killer messages are a good thing to 
> have
> even when victim is already exiting, instead of just silence.
> 

Read the comment about why we don't emit anything to the kernel log in 
this case; the process is already exiting, there's no need to kill it or 
make anyone believe that it was killed.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] OOM killer: wait for tasks with pending SIGKILL to exit

2013-09-25 Thread David Rientjes
On Wed, 11 Sep 2013, Sergey Dyasly wrote:

 /*
  * If this task is not being ptraced on exit, then wait for it
  * to finish before killing some other task unnecessarily.
  */
   - if (!(task-group_leader-ptrace  PT_TRACE_EXIT))
   + if (!(task-group_leader-ptrace  PT_TRACE_EXIT)) {
   + set_tsk_thread_flag(task, TIF_MEMDIE);
  
  This does not, we do not give access to memory reserves unless the process 
  needs it to allocate memory.  The task here, which is not current, can 
  call into the oom killer and be granted memory reserves if necessary.
 
 True. However, why TIF_MEMDIE is set for PF_EXITING task in oom_kill_process()
 then?

If current needs access to memory reserves while PF_EXITING, it should 
call the page allocator, find that it is out of memory, and call the oom 
killer to silently be granted memory reserves.

   @@ -412,16 +415,6 @@ void oom_kill_process(struct task_struct *p, gfp_t 
   gfp_mask, int order,
 static DEFINE_RATELIMIT_STATE(oom_rs, DEFAULT_RATELIMIT_INTERVAL,
   DEFAULT_RATELIMIT_BURST);

   - /*
   -  * If the task is already exiting, don't alarm the sysadmin or kill
   -  * its children or threads, just set TIF_MEMDIE so it can die quickly
   -  */
   - if (p-flags  PF_EXITING) {
   - set_tsk_thread_flag(p, TIF_MEMDIE);
   - put_task_struct(p);
   - return;
   - }
  
  I think you misunderstood the point of this; if a selected process is 
  already in the exit path then this is simply avoiding dumping oom kill 
  lines to the kernel log.  We want to keep doing that.
 
 This happens in oom_kill_process() after victim has been selected by
 select_bad_process(). But there is already PF_EXITING check in
 oom_scan_process_thread() and in this case OOM code won't call 
 oom_kill_process.

select_bad_process() is one of three callers to oom_kill_process().

 The only difference is in force_kill flag, and the only case where it's set
 is SysRq. And I think in this case OOM killer messages are a good thing to 
 have
 even when victim is already exiting, instead of just silence.
 

Read the comment about why we don't emit anything to the kernel log in 
this case; the process is already exiting, there's no need to kill it or 
make anyone believe that it was killed.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] OOM killer: wait for tasks with pending SIGKILL to exit

2013-09-19 Thread Sergey Dyasly
Ping :)

On Wed, 11 Sep 2013 19:06:05 +0400
Sergey Dyasly  wrote:

> On Mon, 9 Sep 2013 13:07:08 -0700 (PDT)
> David Rientjes  wrote:
> 
> > >   /*
> > >* If this task is not being ptraced on exit, then wait for it
> > >* to finish before killing some other task unnecessarily.
> > >*/
> > > - if (!(task->group_leader->ptrace & PT_TRACE_EXIT))
> > > + if (!(task->group_leader->ptrace & PT_TRACE_EXIT)) {
> > > + set_tsk_thread_flag(task, TIF_MEMDIE);
> > 
> > This does not, we do not give access to memory reserves unless the process 
> > needs it to allocate memory.  The task here, which is not current, can 
> > call into the oom killer and be granted memory reserves if necessary.
> 
> True. However, why TIF_MEMDIE is set for PF_EXITING task in oom_kill_process()
> then?
> Also, setting TIF_MEMDIE will avoid direct reclaim and memory allocation 
> should
> be fast if exiting task needs it.
> 
> > > @@ -412,16 +415,6 @@ void oom_kill_process(struct task_struct *p, gfp_t 
> > > gfp_mask, int order,
> > >   static DEFINE_RATELIMIT_STATE(oom_rs, DEFAULT_RATELIMIT_INTERVAL,
> > > DEFAULT_RATELIMIT_BURST);
> > >  
> > > - /*
> > > -  * If the task is already exiting, don't alarm the sysadmin or kill
> > > -  * its children or threads, just set TIF_MEMDIE so it can die quickly
> > > -  */
> > > - if (p->flags & PF_EXITING) {
> > > - set_tsk_thread_flag(p, TIF_MEMDIE);
> > > - put_task_struct(p);
> > > - return;
> > > - }
> > 
> > I think you misunderstood the point of this; if a selected process is 
> > already in the exit path then this is simply avoiding dumping oom kill 
> > lines to the kernel log.  We want to keep doing that.
> 
> This happens in oom_kill_process() after victim has been selected by
> select_bad_process(). But there is already PF_EXITING check in
> oom_scan_process_thread() and in this case OOM code won't call 
> oom_kill_process.
> There is only a slight chance that victim will become PF_EXITING between
> scan and kill.
> 
> The only difference is in force_kill flag, and the only case where it's set
> is SysRq. And I think in this case OOM killer messages are a good thing to 
> have
> even when victim is already exiting, instead of just silence.
> 
> -- 
> Sergey Dyasly 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] OOM killer: wait for tasks with pending SIGKILL to exit

2013-09-19 Thread Sergey Dyasly
Ping :)

On Wed, 11 Sep 2013 19:06:05 +0400
Sergey Dyasly dse...@gmail.com wrote:

 On Mon, 9 Sep 2013 13:07:08 -0700 (PDT)
 David Rientjes rient...@google.com wrote:
 
 /*
  * If this task is not being ptraced on exit, then wait for it
  * to finish before killing some other task unnecessarily.
  */
   - if (!(task-group_leader-ptrace  PT_TRACE_EXIT))
   + if (!(task-group_leader-ptrace  PT_TRACE_EXIT)) {
   + set_tsk_thread_flag(task, TIF_MEMDIE);
  
  This does not, we do not give access to memory reserves unless the process 
  needs it to allocate memory.  The task here, which is not current, can 
  call into the oom killer and be granted memory reserves if necessary.
 
 True. However, why TIF_MEMDIE is set for PF_EXITING task in oom_kill_process()
 then?
 Also, setting TIF_MEMDIE will avoid direct reclaim and memory allocation 
 should
 be fast if exiting task needs it.
 
   @@ -412,16 +415,6 @@ void oom_kill_process(struct task_struct *p, gfp_t 
   gfp_mask, int order,
 static DEFINE_RATELIMIT_STATE(oom_rs, DEFAULT_RATELIMIT_INTERVAL,
   DEFAULT_RATELIMIT_BURST);

   - /*
   -  * If the task is already exiting, don't alarm the sysadmin or kill
   -  * its children or threads, just set TIF_MEMDIE so it can die quickly
   -  */
   - if (p-flags  PF_EXITING) {
   - set_tsk_thread_flag(p, TIF_MEMDIE);
   - put_task_struct(p);
   - return;
   - }
  
  I think you misunderstood the point of this; if a selected process is 
  already in the exit path then this is simply avoiding dumping oom kill 
  lines to the kernel log.  We want to keep doing that.
 
 This happens in oom_kill_process() after victim has been selected by
 select_bad_process(). But there is already PF_EXITING check in
 oom_scan_process_thread() and in this case OOM code won't call 
 oom_kill_process.
 There is only a slight chance that victim will become PF_EXITING between
 scan and kill.
 
 The only difference is in force_kill flag, and the only case where it's set
 is SysRq. And I think in this case OOM killer messages are a good thing to 
 have
 even when victim is already exiting, instead of just silence.
 
 -- 
 Sergey Dyasly dse...@gmail.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] OOM killer: wait for tasks with pending SIGKILL to exit

2013-09-11 Thread Sergey Dyasly
On Mon, 9 Sep 2013 13:07:08 -0700 (PDT)
David Rientjes  wrote:

> > /*
> >  * If this task is not being ptraced on exit, then wait for it
> >  * to finish before killing some other task unnecessarily.
> >  */
> > -   if (!(task->group_leader->ptrace & PT_TRACE_EXIT))
> > +   if (!(task->group_leader->ptrace & PT_TRACE_EXIT)) {
> > +   set_tsk_thread_flag(task, TIF_MEMDIE);
> 
> This does not, we do not give access to memory reserves unless the process 
> needs it to allocate memory.  The task here, which is not current, can 
> call into the oom killer and be granted memory reserves if necessary.

True. However, why TIF_MEMDIE is set for PF_EXITING task in oom_kill_process()
then?
Also, setting TIF_MEMDIE will avoid direct reclaim and memory allocation should
be fast if exiting task needs it.

> > @@ -412,16 +415,6 @@ void oom_kill_process(struct task_struct *p, gfp_t 
> > gfp_mask, int order,
> > static DEFINE_RATELIMIT_STATE(oom_rs, DEFAULT_RATELIMIT_INTERVAL,
> >   DEFAULT_RATELIMIT_BURST);
> >  
> > -   /*
> > -* If the task is already exiting, don't alarm the sysadmin or kill
> > -* its children or threads, just set TIF_MEMDIE so it can die quickly
> > -*/
> > -   if (p->flags & PF_EXITING) {
> > -   set_tsk_thread_flag(p, TIF_MEMDIE);
> > -   put_task_struct(p);
> > -   return;
> > -   }
> 
> I think you misunderstood the point of this; if a selected process is 
> already in the exit path then this is simply avoiding dumping oom kill 
> lines to the kernel log.  We want to keep doing that.

This happens in oom_kill_process() after victim has been selected by
select_bad_process(). But there is already PF_EXITING check in
oom_scan_process_thread() and in this case OOM code won't call oom_kill_process.
There is only a slight chance that victim will become PF_EXITING between
scan and kill.

The only difference is in force_kill flag, and the only case where it's set
is SysRq. And I think in this case OOM killer messages are a good thing to have
even when victim is already exiting, instead of just silence.

-- 
Sergey Dyasly 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] OOM killer: wait for tasks with pending SIGKILL to exit

2013-09-11 Thread Sergey Dyasly
On Mon, 9 Sep 2013 13:07:08 -0700 (PDT)
David Rientjes rient...@google.com wrote:

  /*
   * If this task is not being ptraced on exit, then wait for it
   * to finish before killing some other task unnecessarily.
   */
  -   if (!(task-group_leader-ptrace  PT_TRACE_EXIT))
  +   if (!(task-group_leader-ptrace  PT_TRACE_EXIT)) {
  +   set_tsk_thread_flag(task, TIF_MEMDIE);
 
 This does not, we do not give access to memory reserves unless the process 
 needs it to allocate memory.  The task here, which is not current, can 
 call into the oom killer and be granted memory reserves if necessary.

True. However, why TIF_MEMDIE is set for PF_EXITING task in oom_kill_process()
then?
Also, setting TIF_MEMDIE will avoid direct reclaim and memory allocation should
be fast if exiting task needs it.

  @@ -412,16 +415,6 @@ void oom_kill_process(struct task_struct *p, gfp_t 
  gfp_mask, int order,
  static DEFINE_RATELIMIT_STATE(oom_rs, DEFAULT_RATELIMIT_INTERVAL,
DEFAULT_RATELIMIT_BURST);
   
  -   /*
  -* If the task is already exiting, don't alarm the sysadmin or kill
  -* its children or threads, just set TIF_MEMDIE so it can die quickly
  -*/
  -   if (p-flags  PF_EXITING) {
  -   set_tsk_thread_flag(p, TIF_MEMDIE);
  -   put_task_struct(p);
  -   return;
  -   }
 
 I think you misunderstood the point of this; if a selected process is 
 already in the exit path then this is simply avoiding dumping oom kill 
 lines to the kernel log.  We want to keep doing that.

This happens in oom_kill_process() after victim has been selected by
select_bad_process(). But there is already PF_EXITING check in
oom_scan_process_thread() and in this case OOM code won't call oom_kill_process.
There is only a slight chance that victim will become PF_EXITING between
scan and kill.

The only difference is in force_kill flag, and the only case where it's set
is SysRq. And I think in this case OOM killer messages are a good thing to have
even when victim is already exiting, instead of just silence.

-- 
Sergey Dyasly dse...@gmail.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] OOM killer: wait for tasks with pending SIGKILL to exit

2013-09-09 Thread David Rientjes
On Mon, 9 Sep 2013, Oleg Nesterov wrote:

> > @@ -275,13 +275,16 @@ enum oom_scan_t oom_scan_process_thread(struct 
> > task_struct *task,
> > if (oom_task_origin(task))
> > return OOM_SCAN_SELECT;
> >  
> > -   if (task->flags & PF_EXITING && !force_kill) {
> > +   if ((task->flags & PF_EXITING || fatal_signal_pending(task)) &&
> > +   !force_kill) {
> > /*
> >  * If this task is not being ptraced on exit, then wait for it
> >  * to finish before killing some other task unnecessarily.
> >  */
> > -   if (!(task->group_leader->ptrace & PT_TRACE_EXIT))
> > +   if (!(task->group_leader->ptrace & PT_TRACE_EXIT)) {
> 
> can't we finally kill (or fix?) this PT_TRACE_EXIT check?
> 

Patches are always welcome.

> It was added to fix the exploit I sent. But the patch was wrong,
> that exploit could be easily modified to trigger the same problem.
> 

If the patch prevented your exploit when coredumping was done differently 
then it was not wrong.  It may not have been as inclusive as you would 
have liked, but then again you never proposed any kernel changes to fix it 
yourself either.

> However, now that the coredumping is killable that exploit won't
> work, so the original reason has gone away.
> 
> So why do we need this check today?
> 

If you feel it can be removed, please propose a patch to do so with a 
changelog that describes why it is no longer necessary.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] OOM killer: wait for tasks with pending SIGKILL to exit

2013-09-09 Thread David Rientjes
On Mon, 9 Sep 2013, Sergey Dyasly wrote:

> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 98e75f2..ef83b81 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -275,13 +275,16 @@ enum oom_scan_t oom_scan_process_thread(struct 
> task_struct *task,
>   if (oom_task_origin(task))
>   return OOM_SCAN_SELECT;
>  
> - if (task->flags & PF_EXITING && !force_kill) {
> + if ((task->flags & PF_EXITING || fatal_signal_pending(task)) &&
> + !force_kill) {

This makes sense.

>   /*
>* If this task is not being ptraced on exit, then wait for it
>* to finish before killing some other task unnecessarily.
>*/
> - if (!(task->group_leader->ptrace & PT_TRACE_EXIT))
> + if (!(task->group_leader->ptrace & PT_TRACE_EXIT)) {
> + set_tsk_thread_flag(task, TIF_MEMDIE);

This does not, we do not give access to memory reserves unless the process 
needs it to allocate memory.  The task here, which is not current, can 
call into the oom killer and be granted memory reserves if necessary.

>   return OOM_SCAN_ABORT;
> + }
>   }
>   return OOM_SCAN_OK;
>  }
> @@ -412,16 +415,6 @@ void oom_kill_process(struct task_struct *p, gfp_t 
> gfp_mask, int order,
>   static DEFINE_RATELIMIT_STATE(oom_rs, DEFAULT_RATELIMIT_INTERVAL,
> DEFAULT_RATELIMIT_BURST);
>  
> - /*
> -  * If the task is already exiting, don't alarm the sysadmin or kill
> -  * its children or threads, just set TIF_MEMDIE so it can die quickly
> -  */
> - if (p->flags & PF_EXITING) {
> - set_tsk_thread_flag(p, TIF_MEMDIE);
> - put_task_struct(p);
> - return;
> - }

I think you misunderstood the point of this; if a selected process is 
already in the exit path then this is simply avoiding dumping oom kill 
lines to the kernel log.  We want to keep doing that.

> -
>   if (__ratelimit(_rs))
>   dump_header(p, gfp_mask, order, memcg, nodemask);
>  
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] OOM killer: wait for tasks with pending SIGKILL to exit

2013-09-09 Thread Oleg Nesterov
Can't really comment this patch, just one off-topic note...

On 09/09, Sergey Dyasly wrote:
>
> @@ -275,13 +275,16 @@ enum oom_scan_t oom_scan_process_thread(struct 
> task_struct *task,
>   if (oom_task_origin(task))
>   return OOM_SCAN_SELECT;
>  
> - if (task->flags & PF_EXITING && !force_kill) {
> + if ((task->flags & PF_EXITING || fatal_signal_pending(task)) &&
> + !force_kill) {
>   /*
>* If this task is not being ptraced on exit, then wait for it
>* to finish before killing some other task unnecessarily.
>*/
> - if (!(task->group_leader->ptrace & PT_TRACE_EXIT))
> + if (!(task->group_leader->ptrace & PT_TRACE_EXIT)) {

can't we finally kill (or fix?) this PT_TRACE_EXIT check?

It was added to fix the exploit I sent. But the patch was wrong,
that exploit could be easily modified to trigger the same problem.

However, now that the coredumping is killable that exploit won't
work, so the original reason has gone away.

So why do we need this check today?

And note that we check ->group_leader, this looks completely wrong.
(again, ->group_leader was used just because the original exploit
 traced the group leader).

David?

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] OOM killer: wait for tasks with pending SIGKILL to exit

2013-09-09 Thread Sergey Dyasly
If OOM killer finds a task which is about to exit or is already doing so,
there is no need to kill anyone. It should just wait until task dies.

Add missing fatal_signal_pending() check and allow selected task to use memory
reserves in order to exit quickly.

Also remove redundant PF_EXITING check after victim has been selected.

Signed-off-by: Sergey Dyasly 
---
 mm/oom_kill.c | 17 +
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 98e75f2..ef83b81 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -275,13 +275,16 @@ enum oom_scan_t oom_scan_process_thread(struct 
task_struct *task,
if (oom_task_origin(task))
return OOM_SCAN_SELECT;
 
-   if (task->flags & PF_EXITING && !force_kill) {
+   if ((task->flags & PF_EXITING || fatal_signal_pending(task)) &&
+   !force_kill) {
/*
 * If this task is not being ptraced on exit, then wait for it
 * to finish before killing some other task unnecessarily.
 */
-   if (!(task->group_leader->ptrace & PT_TRACE_EXIT))
+   if (!(task->group_leader->ptrace & PT_TRACE_EXIT)) {
+   set_tsk_thread_flag(task, TIF_MEMDIE);
return OOM_SCAN_ABORT;
+   }
}
return OOM_SCAN_OK;
 }
@@ -412,16 +415,6 @@ void oom_kill_process(struct task_struct *p, gfp_t 
gfp_mask, int order,
static DEFINE_RATELIMIT_STATE(oom_rs, DEFAULT_RATELIMIT_INTERVAL,
  DEFAULT_RATELIMIT_BURST);
 
-   /*
-* If the task is already exiting, don't alarm the sysadmin or kill
-* its children or threads, just set TIF_MEMDIE so it can die quickly
-*/
-   if (p->flags & PF_EXITING) {
-   set_tsk_thread_flag(p, TIF_MEMDIE);
-   put_task_struct(p);
-   return;
-   }
-
if (__ratelimit(_rs))
dump_header(p, gfp_mask, order, memcg, nodemask);
 
-- 
1.8.1.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] OOM killer: wait for tasks with pending SIGKILL to exit

2013-09-09 Thread Sergey Dyasly
If OOM killer finds a task which is about to exit or is already doing so,
there is no need to kill anyone. It should just wait until task dies.

Add missing fatal_signal_pending() check and allow selected task to use memory
reserves in order to exit quickly.

Also remove redundant PF_EXITING check after victim has been selected.

Signed-off-by: Sergey Dyasly dse...@gmail.com
---
 mm/oom_kill.c | 17 +
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 98e75f2..ef83b81 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -275,13 +275,16 @@ enum oom_scan_t oom_scan_process_thread(struct 
task_struct *task,
if (oom_task_origin(task))
return OOM_SCAN_SELECT;
 
-   if (task-flags  PF_EXITING  !force_kill) {
+   if ((task-flags  PF_EXITING || fatal_signal_pending(task)) 
+   !force_kill) {
/*
 * If this task is not being ptraced on exit, then wait for it
 * to finish before killing some other task unnecessarily.
 */
-   if (!(task-group_leader-ptrace  PT_TRACE_EXIT))
+   if (!(task-group_leader-ptrace  PT_TRACE_EXIT)) {
+   set_tsk_thread_flag(task, TIF_MEMDIE);
return OOM_SCAN_ABORT;
+   }
}
return OOM_SCAN_OK;
 }
@@ -412,16 +415,6 @@ void oom_kill_process(struct task_struct *p, gfp_t 
gfp_mask, int order,
static DEFINE_RATELIMIT_STATE(oom_rs, DEFAULT_RATELIMIT_INTERVAL,
  DEFAULT_RATELIMIT_BURST);
 
-   /*
-* If the task is already exiting, don't alarm the sysadmin or kill
-* its children or threads, just set TIF_MEMDIE so it can die quickly
-*/
-   if (p-flags  PF_EXITING) {
-   set_tsk_thread_flag(p, TIF_MEMDIE);
-   put_task_struct(p);
-   return;
-   }
-
if (__ratelimit(oom_rs))
dump_header(p, gfp_mask, order, memcg, nodemask);
 
-- 
1.8.1.2

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] OOM killer: wait for tasks with pending SIGKILL to exit

2013-09-09 Thread Oleg Nesterov
Can't really comment this patch, just one off-topic note...

On 09/09, Sergey Dyasly wrote:

 @@ -275,13 +275,16 @@ enum oom_scan_t oom_scan_process_thread(struct 
 task_struct *task,
   if (oom_task_origin(task))
   return OOM_SCAN_SELECT;
  
 - if (task-flags  PF_EXITING  !force_kill) {
 + if ((task-flags  PF_EXITING || fatal_signal_pending(task)) 
 + !force_kill) {
   /*
* If this task is not being ptraced on exit, then wait for it
* to finish before killing some other task unnecessarily.
*/
 - if (!(task-group_leader-ptrace  PT_TRACE_EXIT))
 + if (!(task-group_leader-ptrace  PT_TRACE_EXIT)) {

can't we finally kill (or fix?) this PT_TRACE_EXIT check?

It was added to fix the exploit I sent. But the patch was wrong,
that exploit could be easily modified to trigger the same problem.

However, now that the coredumping is killable that exploit won't
work, so the original reason has gone away.

So why do we need this check today?

And note that we check -group_leader, this looks completely wrong.
(again, -group_leader was used just because the original exploit
 traced the group leader).

David?

Oleg.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] OOM killer: wait for tasks with pending SIGKILL to exit

2013-09-09 Thread David Rientjes
On Mon, 9 Sep 2013, Sergey Dyasly wrote:

 diff --git a/mm/oom_kill.c b/mm/oom_kill.c
 index 98e75f2..ef83b81 100644
 --- a/mm/oom_kill.c
 +++ b/mm/oom_kill.c
 @@ -275,13 +275,16 @@ enum oom_scan_t oom_scan_process_thread(struct 
 task_struct *task,
   if (oom_task_origin(task))
   return OOM_SCAN_SELECT;
  
 - if (task-flags  PF_EXITING  !force_kill) {
 + if ((task-flags  PF_EXITING || fatal_signal_pending(task)) 
 + !force_kill) {

This makes sense.

   /*
* If this task is not being ptraced on exit, then wait for it
* to finish before killing some other task unnecessarily.
*/
 - if (!(task-group_leader-ptrace  PT_TRACE_EXIT))
 + if (!(task-group_leader-ptrace  PT_TRACE_EXIT)) {
 + set_tsk_thread_flag(task, TIF_MEMDIE);

This does not, we do not give access to memory reserves unless the process 
needs it to allocate memory.  The task here, which is not current, can 
call into the oom killer and be granted memory reserves if necessary.

   return OOM_SCAN_ABORT;
 + }
   }
   return OOM_SCAN_OK;
  }
 @@ -412,16 +415,6 @@ void oom_kill_process(struct task_struct *p, gfp_t 
 gfp_mask, int order,
   static DEFINE_RATELIMIT_STATE(oom_rs, DEFAULT_RATELIMIT_INTERVAL,
 DEFAULT_RATELIMIT_BURST);
  
 - /*
 -  * If the task is already exiting, don't alarm the sysadmin or kill
 -  * its children or threads, just set TIF_MEMDIE so it can die quickly
 -  */
 - if (p-flags  PF_EXITING) {
 - set_tsk_thread_flag(p, TIF_MEMDIE);
 - put_task_struct(p);
 - return;
 - }

I think you misunderstood the point of this; if a selected process is 
already in the exit path then this is simply avoiding dumping oom kill 
lines to the kernel log.  We want to keep doing that.

 -
   if (__ratelimit(oom_rs))
   dump_header(p, gfp_mask, order, memcg, nodemask);
  
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] OOM killer: wait for tasks with pending SIGKILL to exit

2013-09-09 Thread David Rientjes
On Mon, 9 Sep 2013, Oleg Nesterov wrote:

  @@ -275,13 +275,16 @@ enum oom_scan_t oom_scan_process_thread(struct 
  task_struct *task,
  if (oom_task_origin(task))
  return OOM_SCAN_SELECT;
   
  -   if (task-flags  PF_EXITING  !force_kill) {
  +   if ((task-flags  PF_EXITING || fatal_signal_pending(task)) 
  +   !force_kill) {
  /*
   * If this task is not being ptraced on exit, then wait for it
   * to finish before killing some other task unnecessarily.
   */
  -   if (!(task-group_leader-ptrace  PT_TRACE_EXIT))
  +   if (!(task-group_leader-ptrace  PT_TRACE_EXIT)) {
 
 can't we finally kill (or fix?) this PT_TRACE_EXIT check?
 

Patches are always welcome.

 It was added to fix the exploit I sent. But the patch was wrong,
 that exploit could be easily modified to trigger the same problem.
 

If the patch prevented your exploit when coredumping was done differently 
then it was not wrong.  It may not have been as inclusive as you would 
have liked, but then again you never proposed any kernel changes to fix it 
yourself either.

 However, now that the coredumping is killable that exploit won't
 work, so the original reason has gone away.
 
 So why do we need this check today?
 

If you feel it can be removed, please propose a patch to do so with a 
changelog that describes why it is no longer necessary.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/