Re: [PATCH -mm 1/3] mm/oom_kill: remove the wrong fatal_signal_pending()

2015-09-30 Thread Oleg Nesterov
On 09/30, Tetsuo Handa wrote:
>
> David Rientjes wrote:
> > On Tue, 29 Sep 2015, Oleg Nesterov wrote:
> >
> > > The fatal_signal_pending() was added to suppress unnecessary "sharing
> > > same memory" message, but it can't 100% help anyway because it can be
> > > false-negative; SIGKILL can be already dequeued.
> > >
> > > And worse, it can be false-positive due to exec or coredump. exec is
> > > mostly fine, but coredump is not. It is possible that the group leader
> > > has the pending SIGKILL because its sub-thread originated the coredump,
> > > in this case we must not skip this process.
> > >
> > > We could probably add the additional ->group_exit_task check but this
> > > patch just removes fatal_signal_pending(), the extra "Kill process" is
> > > unlikely and doesn't really hurt.
>
> This fatal_signal_pending() check is about to be added by me because the OOM
> killer spams the kernel log when the mm struct which the OOM victim is using
> is shared by many threads. ( http://marc.info/?l=linux-mm&m=143256441501204 )

OK, I see, but it is wrong.

But I don't really understand "shared by many threads", I mean "threads" is a
confusing word. I guess you mean CLONE_VM processes, otherwise we shouldn't
see the additional spam.

And 1000 CLONE_VM processes + "and the lock dependency prevents all threads
except the OOM victim thread from terminating until they get TIF_MEMDIE flag"
look like a really pathological case...

> > In addition, I'm really debating whether we need the "sharing same memory"
> > line or not.  In the past, it has been helpful because there is no other
> > way to determine what the kernel has killed other than to leave an
> > artifact behind in the kernel log.  I can imagine that this could easily
> > spam the kernel log, though, accompanied by oom killer messages that are
> > already very verbose.  I wouldn't mind if the printk were removed
> > entirely.
> >
>
> I was waiting for your comment about whether you depend on
> the "sharing same memory" message with KERN_ERR level.
> ( http://marc.info/?l=linux-mm&m=144120389203133 )
>
> If nobody else objects, I think we can remove the "sharing same memory"
> message. ( http://marc.info/?l=linux-mm&m=144119325831959 )

OK, will you agree with v2 which also removes pr_warn?

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH -mm 1/3] mm/oom_kill: remove the wrong fatal_signal_pending()

2015-09-30 Thread Oleg Nesterov
On 09/29, David Rientjes wrote:
>
> On Tue, 29 Sep 2015, Oleg Nesterov wrote:
>
> > The fatal_signal_pending() was added to suppress unnecessary "sharing
> > same memory" message, but it can't 100% help anyway because it can be
> > false-negative; SIGKILL can be already dequeued.
> >
> > And worse, it can be false-positive due to exec or coredump. exec is
> > mostly fine, but coredump is not. It is possible that the group leader
> > has the pending SIGKILL because its sub-thread originated the coredump,
> > in this case we must not skip this process.
> >
> > We could probably add the additional ->group_exit_task check but this
> > patch just removes fatal_signal_pending(), the extra "Kill process" is
> > unlikely and doesn't really hurt.
> >
> > Signed-off-by: Oleg Nesterov 
>
> Acked-by: David Rientjes 

Thanks!

> In addition, I'm really debating whether we need the "sharing same memory"
> line or not.  In the past, it has been helpful because there is no other
> way to determine what the kernel has killed other than to leave an
> artifact behind in the kernel log.  I can imagine that this could easily
> spam the kernel log, though, accompanied by oom killer messages that are
> already very verbose.  I wouldn't mind if the printk were removed
> entirely.

Yes, me too... let me reply to Tetsuo's email.

Oleg.



Re: [PATCH -mm 1/3] mm/oom_kill: remove the wrong fatal_signal_pending()

2015-09-29 Thread Tetsuo Handa
David Rientjes wrote:
> On Tue, 29 Sep 2015, Oleg Nesterov wrote:
> 
> > The fatal_signal_pending() was added to suppress unnecessary "sharing
> > same memory" message, but it can't 100% help anyway because it can be
> > false-negative; SIGKILL can be already dequeued.
> > 
> > And worse, it can be false-positive due to exec or coredump. exec is
> > mostly fine, but coredump is not. It is possible that the group leader
> > has the pending SIGKILL because its sub-thread originated the coredump,
> > in this case we must not skip this process.
> > 
> > We could probably add the additional ->group_exit_task check but this
> > patch just removes fatal_signal_pending(), the extra "Kill process" is
> > unlikely and doesn't really hurt.

This fatal_signal_pending() check is about to be added by me because the OOM
killer spams the kernel log when the mm struct which the OOM victim is using
is shared by many threads. ( http://marc.info/?l=linux-mm&m=143256441501204 )

> > 
> > Signed-off-by: Oleg Nesterov 
> 
> Acked-by: David Rientjes 
> 
> In addition, I'm really debating whether we need the "sharing same memory" 
> line or not.  In the past, it has been helpful because there is no other 
> way to determine what the kernel has killed other than to leave an 
> artifact behind in the kernel log.  I can imagine that this could easily 
> spam the kernel log, though, accompanied by oom killer messages that are 
> already very verbose.  I wouldn't mind if the printk were removed
> entirely.
> 

I was waiting for your comment about whether you depend on
the "sharing same memory" message with KERN_ERR level.
( http://marc.info/?l=linux-mm&m=144120389203133 )

If nobody else objects, I think we can remove the "sharing same memory"
message. ( http://marc.info/?l=linux-mm&m=144119325831959 )


Re: [PATCH -mm 1/3] mm/oom_kill: remove the wrong fatal_signal_pending()

2015-09-29 Thread David Rientjes
On Tue, 29 Sep 2015, Oleg Nesterov wrote:

> The fatal_signal_pending() was added to suppress unnecessary "sharing
> same memory" message, but it can't 100% help anyway because it can be
> false-negative; SIGKILL can be already dequeued.
> 
> And worse, it can be false-positive due to exec or coredump. exec is
> mostly fine, but coredump is not. It is possible that the group leader
> has the pending SIGKILL because its sub-thread originated the coredump,
> in this case we must not skip this process.
> 
> We could probably add the additional ->group_exit_task check but this
> patch just removes fatal_signal_pending(), the extra "Kill process" is
> unlikely and doesn't really hurt.
> 
> Signed-off-by: Oleg Nesterov 

Acked-by: David Rientjes 

In addition, I'm really debating whether we need the "sharing same memory" 
line or not.  In the past, it has been helpful because there is no other 
way to determine what the kernel has killed other than to leave an 
artifact behind in the kernel log.  I can imagine that this could easily 
spam the kernel log, though, accompanied by oom killer messages that are 
already very verbose.  I wouldn't mind if the printk were removed
entirely.


[PATCH -mm 1/3] mm/oom_kill: remove the wrong fatal_signal_pending()

2015-09-29 Thread Oleg Nesterov
The fatal_signal_pending() was added to suppress unnecessary "sharing
same memory" message, but it can't 100% help anyway because it can be
false-negative; SIGKILL can be already dequeued.

And worse, it can be false-positive due to exec or coredump. exec is
mostly fine, but coredump is not. It is possible that the group leader
has the pending SIGKILL because its sub-thread originated the coredump,
in this case we must not skip this process.

We could probably add the additional ->group_exit_task check but this
patch just removes fatal_signal_pending(), the extra "Kill process" is
unlikely and doesn't really hurt.

Signed-off-by: Oleg Nesterov 
---
 mm/oom_kill.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 4766e25..0d581c6 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -588,8 +588,6 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
 		    !(p->flags & PF_KTHREAD)) {
 			if (p->signal->oom_score_adj == OOM_SCORE_ADJ_MIN)
 				continue;
-			if (fatal_signal_pending(p))
-				continue;
 
 			pr_info("Kill process %d (%s) sharing same memory\n",
 				task_pid_nr(p), p->comm);
-- 
2.4.3
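
Editorial illustration (not part of the patch): the false-positive case
described in the changelog can be sketched with a small userspace model.
Everything below is hypothetical; struct model_task, its fields, and
count_killed() are stand-ins invented for this sketch and are not kernel
API. The model assumes one entry per process (group leader), and that the
leader of a dumping process has SIGKILL pending only because its sub-thread
started the coredump.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for per-process state; not kernel API. */
struct model_task {
	int pid;
	int tgid;               /* thread group id */
	int mm_id;              /* equal mm_id models a shared mm (CLONE_VM) */
	bool kthread;           /* models PF_KTHREAD */
	bool oom_score_adj_min; /* models oom_score_adj == OOM_SCORE_ADJ_MIN */
	bool sigkill_pending;   /* models what fatal_signal_pending() reports */
};

/*
 * Count the processes that would be killed for sharing the victim's mm.
 * skip_pending == true models the check that the patch removes.
 */
static int count_killed(const struct model_task *tasks, size_t n,
			const struct model_task *victim, bool skip_pending)
{
	int killed = 0;

	for (size_t i = 0; i < n; i++) {
		const struct model_task *p = &tasks[i];

		/* Only other, non-kernel processes using the same mm. */
		if (p->mm_id != victim->mm_id || p->tgid == victim->tgid ||
		    p->kthread)
			continue;
		if (p->oom_score_adj_min)
			continue;
		/* The removed check: a pending SIGKILL hides the process. */
		if (skip_pending && p->sigkill_pending)
			continue;
		killed++;
	}
	return killed;
}
```

With a victim and one CLONE_VM process whose leader has SIGKILL pending
because of a coredump, the model skips that process when skip_pending is
true and counts it when skip_pending is false, which is the behaviour the
changelog argues for.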


