Re: [PATCH -mm 1/3] mm/oom_kill: remove the wrong fatal_signal_pending()
On 09/30, Tetsuo Handa wrote:
>
> David Rientjes wrote:
> > On Tue, 29 Sep 2015, Oleg Nesterov wrote:
> >
> > > The fatal_signal_pending() was added to suppress unnecessary "sharing
> > > same memory" message, but it can't 100% help anyway because it can be
> > > false-negative; SIGKILL can be already dequeued.
> > >
> > > And worse, it can be false-positive due to exec or coredump. exec is
> > > mostly fine, but coredump is not. It is possible that the group leader
> > > has the pending SIGKILL because its sub-thread originated the coredump,
> > > in this case we must not skip this process.
> > >
> > > We could probably add the additional ->group_exit_task check but this
> > > patch just removes fatal_signal_pending(), the extra "Kill process" is
> > > unlikely and doesn't really hurt.
>
> This fatal_signal_pending() check is about to be added by me because the OOM
> killer spams the kernel log when the mm struct which the OOM victim is using
> is shared by many threads. ( http://marc.info/?l=linux-mm=143256441501204 )

OK, I see, but it is wrong.

But I don't really understand "shared by many threads"; I mean, "threads"
is a confusing word. I guess you mean CLONE_VM processes, otherwise we
shouldn't see the additional spam.

And 1000 CLONE_VM processes + "and the lock dependency prevents all threads
except the OOM victim thread from terminating until they get TIF_MEMDIE
flag" look like a really pathological case...

> > In addition, I'm really debating whether we need the "sharing same memory"
> > line or not. In the past, it has been helpful because there is no other
> > way to determine what the kernel has killed other than to leave an
> > artifact behind in the kernel log. I can imagine that this could easily
> > spam the kernel log, though, accompanied by oom killer messages that are
> > already very verbose. I wouldn't mind if the printk were removed
> > entirely.
> >
>
> I was waiting for your comment about whether you depend on
> the "sharing same memory" message with KERN_ERR level.
> ( http://marc.info/?l=linux-mm=144120389203133 )
>
> If nobody else objects, I think we can remove the "sharing same memory"
> message. ( http://marc.info/?l=linux-mm=144119325831959 )

OK, will you agree with v2 which also removes pr_warn?

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH -mm 1/3] mm/oom_kill: remove the wrong fatal_signal_pending()
On 09/29, David Rientjes wrote:
>
> On Tue, 29 Sep 2015, Oleg Nesterov wrote:
>
> > The fatal_signal_pending() was added to suppress unnecessary "sharing
> > same memory" message, but it can't 100% help anyway because it can be
> > false-negative; SIGKILL can be already dequeued.
> >
> > And worse, it can be false-positive due to exec or coredump. exec is
> > mostly fine, but coredump is not. It is possible that the group leader
> > has the pending SIGKILL because its sub-thread originated the coredump,
> > in this case we must not skip this process.
> >
> > We could probably add the additional ->group_exit_task check but this
> > patch just removes fatal_signal_pending(), the extra "Kill process" is
> > unlikely and doesn't really hurt.
> >
> > Signed-off-by: Oleg Nesterov
>
> Acked-by: David Rientjes

Thanks!

> In addition, I'm really debating whether we need the "sharing same memory"
> line or not. In the past, it has been helpful because there is no other
> way to determine what the kernel has killed other than to leave an
> artifact behind in the kernel log. I can imagine that this could easily
> spam the kernel log, though, accompanied by oom killer messages that are
> already very verbose. I wouldn't mind if the printk were removed
> entirely.

Yes, me too... let me reply to Tetsuo's email.

Oleg.
Re: [PATCH -mm 1/3] mm/oom_kill: remove the wrong fatal_signal_pending()
David Rientjes wrote:
> On Tue, 29 Sep 2015, Oleg Nesterov wrote:
>
> > The fatal_signal_pending() was added to suppress unnecessary "sharing
> > same memory" message, but it can't 100% help anyway because it can be
> > false-negative; SIGKILL can be already dequeued.
> >
> > And worse, it can be false-positive due to exec or coredump. exec is
> > mostly fine, but coredump is not. It is possible that the group leader
> > has the pending SIGKILL because its sub-thread originated the coredump,
> > in this case we must not skip this process.
> >
> > We could probably add the additional ->group_exit_task check but this
> > patch just removes fatal_signal_pending(), the extra "Kill process" is
> > unlikely and doesn't really hurt.

This fatal_signal_pending() check is about to be added by me because the OOM
killer spams the kernel log when the mm struct which the OOM victim is using
is shared by many threads. ( http://marc.info/?l=linux-mm=143256441501204 )

> >
> > Signed-off-by: Oleg Nesterov
>
> Acked-by: David Rientjes
>
> In addition, I'm really debating whether we need the "sharing same memory"
> line or not. In the past, it has been helpful because there is no other
> way to determine what the kernel has killed other than to leave an
> artifact behind in the kernel log. I can imagine that this could easily
> spam the kernel log, though, accompanied by oom killer messages that are
> already very verbose. I wouldn't mind if the printk were removed
> entirely.
>

I was waiting for your comment about whether you depend on
the "sharing same memory" message with KERN_ERR level.
( http://marc.info/?l=linux-mm=144120389203133 )

If nobody else objects, I think we can remove the "sharing same memory"
message. ( http://marc.info/?l=linux-mm=144119325831959 )
Re: [PATCH -mm 1/3] mm/oom_kill: remove the wrong fatal_signal_pending()
On Tue, 29 Sep 2015, Oleg Nesterov wrote:

> The fatal_signal_pending() was added to suppress unnecessary "sharing
> same memory" message, but it can't 100% help anyway because it can be
> false-negative; SIGKILL can be already dequeued.
>
> And worse, it can be false-positive due to exec or coredump. exec is
> mostly fine, but coredump is not. It is possible that the group leader
> has the pending SIGKILL because its sub-thread originated the coredump,
> in this case we must not skip this process.
>
> We could probably add the additional ->group_exit_task check but this
> patch just removes fatal_signal_pending(), the extra "Kill process" is
> unlikely and doesn't really hurt.
>
> Signed-off-by: Oleg Nesterov

Acked-by: David Rientjes

In addition, I'm really debating whether we need the "sharing same memory"
line or not. In the past, it has been helpful because there is no other
way to determine what the kernel has killed other than to leave an
artifact behind in the kernel log. I can imagine that this could easily
spam the kernel log, though, accompanied by oom killer messages that are
already very verbose. I wouldn't mind if the printk were removed
entirely.
[PATCH -mm 1/3] mm/oom_kill: remove the wrong fatal_signal_pending()
The fatal_signal_pending() was added to suppress unnecessary "sharing
same memory" message, but it can't 100% help anyway because it can be
false-negative; SIGKILL can be already dequeued.

And worse, it can be false-positive due to exec or coredump. exec is
mostly fine, but coredump is not. It is possible that the group leader
has the pending SIGKILL because its sub-thread originated the coredump,
in this case we must not skip this process.

We could probably add the additional ->group_exit_task check but this
patch just removes fatal_signal_pending(), the extra "Kill process" is
unlikely and doesn't really hurt.

Signed-off-by: Oleg Nesterov
---
 mm/oom_kill.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 4766e25..0d581c6 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -588,8 +588,6 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
 		    !(p->flags & PF_KTHREAD)) {
 			if (p->signal->oom_score_adj == OOM_SCORE_ADJ_MIN)
 				continue;
-			if (fatal_signal_pending(p))
-				continue;
 
 			pr_info("Kill process %d (%s) sharing same memory\n",
 				task_pid_nr(p), p->comm);
-- 
2.4.3
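The trade-off described in the changelog can be illustrated with a small userspace model. This is only a sketch and not kernel code: `struct model_task`, `walk_sharers()`, `sample_tasks` and the `needs_kill` "ground truth" field are hypothetical names invented for the illustration, and the flags merely stand in for `PF_KTHREAD`, `OOM_SCORE_ADJ_MIN` and `fatal_signal_pending()`. It counts how many sharers that still had to be killed get skipped (the false-positive case) and how many redundant "Kill process" messages are printed, with and without the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a task; not the kernel's task_struct. */
struct model_task {
	int pid;
	int mm_id;            /* tasks with equal mm_id share an address space */
	bool is_kthread;      /* models p->flags & PF_KTHREAD */
	bool oom_adj_min;     /* models oom_score_adj == OOM_SCORE_ADJ_MIN */
	bool sigkill_pending; /* models fatal_signal_pending(p) */
	bool needs_kill;      /* assumed ground truth: the OOM kill is still required */
};

struct model_result {
	int missed;    /* tasks skipped although they still needed the kill */
	int redundant; /* "Kill process" reports for tasks already dying */
};

/* Walk all tasks sharing victim_mm, optionally applying the pending check. */
static struct model_result walk_sharers(const struct model_task *tasks, size_t n,
					int victim_mm, bool use_pending_check)
{
	struct model_result r = { 0, 0 };

	assert(tasks != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		const struct model_task *p = &tasks[i];

		if (p->mm_id != victim_mm || p->is_kthread)
			continue;
		if (p->oom_adj_min)
			continue;
		if (use_pending_check && p->sigkill_pending) {
			if (p->needs_kill)
				r.missed++;   /* false positive: wrongly skipped */
			continue;
		}
		if (!p->needs_kill)
			r.redundant++;        /* extra message, otherwise harmless */
	}
	return r;
}

/* Illustrative data only; the pids and states are made up. */
static const struct model_task sample_tasks[] = {
	/* group leader with SIGKILL pending because a sub-thread started a coredump */
	{ 100, 1, false, false, true,  true  },
	/* SIGKILL already dequeued: nothing pending, but the task is already dying */
	{ 101, 1, false, false, false, false },
	/* ordinary CLONE_VM sharer */
	{ 102, 1, false, false, false, true  },
	/* SIGKILL genuinely pending and the task is already being killed */
	{ 103, 1, false, false, true,  false },
};
```

In this model, calling `walk_sharers(sample_tasks, 4, 1, true)` skips one task that still had to be killed, while passing `false` skips none at the cost of one more redundant message, which matches the changelog's argument that the extra "Kill process" line is the lesser problem.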