Re: [PATCH] mm: oom: Fix race condition between oom_badness and do_exit of task
[Sorry about the slow response but I was offline for almost two weeks
and catching up with a tsunami in my inbox now]

On Fri 09-03-18 19:48:46, Tetsuo Handa wrote:
> Kohli, Gaurav wrote:
> > > t->alloc_lock is still held when leaving find_lock_task_mm(), which means
> > > that t->mm != NULL. But nothing prevents t from setting t->mm = NULL at
> > > exit_mm() from do_exit() and calling exit_creds() from __put_task_struct(t)
> > > after task_unlock(t) is called. Seems difficult to trigger race window.
> > > Maybe something has preempted because oom_badness() becomes outside of RCU
> > > grace period upon leaving find_lock_task_mm() when called from
> > > proc_oom_score().
> >
> > Hi Tetsuo,
> >
> > Yes it is not easy to reproduce seen twice till now and i agree with
> > your analysis. But David has already fixing this in different way,
> > So that also looks better to me:
> >
> > https://patchwork.kernel.org/patch/10265641/
> >
>
> Yes, I'm aware of that patch.
>
> > But if need to keep that code, So we have to bump up the task
> > reference that's only i can think of now.
>
> I don't think so, for I think it is safe to call
> has_capability_noaudit(p) with p->alloc_lock held.

This however adds a subtle assumption on locking here and we should
rather not do so. The scope of alloc_lock is quite messy already and
adding on top is definitely not an improvement.

> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index f2e7dfb..4efcfb8 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -222,7 +222,6 @@ unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
>  	 */
>  	points = get_mm_rss(p->mm) + get_mm_counter(p->mm, MM_SWAPENTS) +
>  		mm_pgtables_bytes(p->mm) / PAGE_SIZE;
> -	task_unlock(p);
>
>  	/*
>  	 * Root processes get 3% bonus, just like the __vm_enough_memory()
> @@ -230,6 +229,7 @@ unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
>  	 */
>  	if (has_capability_noaudit(p, CAP_SYS_ADMIN))
>  		points -= (points * 3) / 100;
> +	task_unlock(p);
>
>  	/* Normalize to oom_score_adj units */
>  	adj *= totalpages / 1000;

--
Michal Hocko
SUSE Labs
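The locking assumption under discussion can be illustrated with a small single-threaded model. This is only a sketch: `struct fake_task` and every `fake_*` function are hypothetical names invented here, not kernel APIs, and the model collapses "mm cleared" and "credentials released" into one step, which the real exit path does not do. It only shows the invariant the proposed patch leans on: as long as the lock is held, the exit path cannot clear `mm`, so the capability check still sees live credentials.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct fake_task {
	bool alloc_locked;	/* models p->alloc_lock being held */
	void *mm;		/* models p->mm */
	bool creds_freed;	/* models exit_creds() having run */
};

static void fake_task_lock(struct fake_task *p)
{
	assert(!p->alloc_locked);
	p->alloc_locked = true;
}

static void fake_task_unlock(struct fake_task *p)
{
	assert(p->alloc_locked);
	p->alloc_locked = false;
}

/*
 * Models the exit path: clearing mm needs alloc_lock, so it cannot make
 * progress while the lock is held elsewhere. Simplification (assumption):
 * credentials are released in the same step.
 */
static bool fake_try_exit(struct fake_task *p)
{
	if (p->alloc_locked)
		return false;	/* the real exit path would block here */
	p->mm = NULL;
	p->creds_freed = true;
	return true;
}

/* Models the precondition of has_capability_noaudit(): creds still alive. */
static bool fake_creds_usable(const struct fake_task *p)
{
	return !p->creds_freed;
}
```

In this model, checking `fake_creds_usable()` before `fake_task_unlock()` is always safe, while checking it afterwards is only safe if nothing called `fake_try_exit()` in between. That dependency on lock scope is the implicit coupling the reply above objects to.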
Re: [PATCH] mm: oom: Fix race condition between oom_badness and do_exit of task
Kohli, Gaurav wrote:
> On 3/9/2018 4:18 PM, Tetsuo Handa wrote:
>
> > Kohli, Gaurav wrote:
> >>> t->alloc_lock is still held when leaving find_lock_task_mm(), which means
> >>> that t->mm != NULL. But nothing prevents t from setting t->mm = NULL at
> >>> exit_mm() from do_exit() and calling exit_creds() from __put_task_struct(t)
> >>> after task_unlock(t) is called. Seems difficult to trigger race window.
> >>> Maybe something has preempted because oom_badness() becomes outside of RCU
> >>> grace period upon leaving find_lock_task_mm() when called from
> >>> proc_oom_score().
> >> Hi Tetsuo,
> >>
> >> Yes it is not easy to reproduce seen twice till now and i agree with
> >> your analysis. But David has already fixing this in different way,
> >> So that also looks better to me:
> >>
> >> https://patchwork.kernel.org/patch/10265641/
> >>
> > Yes, I'm aware of that patch.
> >
> >> But if need to keep that code, So we have to bump up the task
> >> reference that's only i can think of now.
> > I don't think so, for I think it is safe to call
> > has_capability_noaudit(p) with p->alloc_lock held.
> >
> > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > index f2e7dfb..4efcfb8 100644
> > --- a/mm/oom_kill.c
> > +++ b/mm/oom_kill.c
> > @@ -222,7 +222,6 @@ unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
> >  	 */
> >  	points = get_mm_rss(p->mm) + get_mm_counter(p->mm, MM_SWAPENTS) +
> >  		mm_pgtables_bytes(p->mm) / PAGE_SIZE;
> > -	task_unlock(p);
> >
> >  	/*
> >  	 * Root processes get 3% bonus, just like the __vm_enough_memory()
> > @@ -230,6 +229,7 @@ unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
> >  	 */
> >  	if (has_capability_noaudit(p, CAP_SYS_ADMIN))
> >  		points -= (points * 3) / 100;
> > +	task_unlock(p);
>
> Earlier i have thought the same to post this, but this may create
> problem if there are sleeping calls in
>
> has_capability_noaudit ?

has_capability_noaudit() does not sleep.
See what has_ns_capability_noaudit() is doing.

> >
> >  	/* Normalize to oom_score_adj units */
> >  	adj *= totalpages / 1000;
>
> --
> Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
> is a member of the Code Aurora Forum,
> a Linux Foundation Collaborative Project.
Re: [PATCH] mm: oom: Fix race condition between oom_badness and do_exit of task
On 3/9/2018 4:18 PM, Tetsuo Handa wrote:

> Kohli, Gaurav wrote:
>>> t->alloc_lock is still held when leaving find_lock_task_mm(), which means
>>> that t->mm != NULL. But nothing prevents t from setting t->mm = NULL at
>>> exit_mm() from do_exit() and calling exit_creds() from __put_task_struct(t)
>>> after task_unlock(t) is called. Seems difficult to trigger race window.
>>> Maybe something has preempted because oom_badness() becomes outside of RCU
>>> grace period upon leaving find_lock_task_mm() when called from
>>> proc_oom_score().
>> Hi Tetsuo,
>>
>> Yes it is not easy to reproduce seen twice till now and i agree with
>> your analysis. But David has already fixing this in different way,
>> So that also looks better to me:
>>
>> https://patchwork.kernel.org/patch/10265641/
>>
> Yes, I'm aware of that patch.
>
>> But if need to keep that code, So we have to bump up the task
>> reference that's only i can think of now.
> I don't think so, for I think it is safe to call
> has_capability_noaudit(p) with p->alloc_lock held.
>
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index f2e7dfb..4efcfb8 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -222,7 +222,6 @@ unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
>  	 */
>  	points = get_mm_rss(p->mm) + get_mm_counter(p->mm, MM_SWAPENTS) +
>  		mm_pgtables_bytes(p->mm) / PAGE_SIZE;
> -	task_unlock(p);
>
>  	/*
>  	 * Root processes get 3% bonus, just like the __vm_enough_memory()
> @@ -230,6 +229,7 @@ unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
>  	 */
>  	if (has_capability_noaudit(p, CAP_SYS_ADMIN))
>  		points -= (points * 3) / 100;
> +	task_unlock(p);

Earlier I had thought of posting the same change, but could this create a
problem if there are sleeping calls in has_capability_noaudit()?

>
>  	/* Normalize to oom_score_adj units */
>  	adj *= totalpages / 1000;

--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.
Re: [PATCH] mm: oom: Fix race condition between oom_badness and do_exit of task
Kohli, Gaurav wrote:
> > t->alloc_lock is still held when leaving find_lock_task_mm(), which means
> > that t->mm != NULL. But nothing prevents t from setting t->mm = NULL at
> > exit_mm() from do_exit() and calling exit_creds() from __put_task_struct(t)
> > after task_unlock(t) is called. Seems difficult to trigger race window.
> > Maybe something has preempted because oom_badness() becomes outside of RCU
> > grace period upon leaving find_lock_task_mm() when called from
> > proc_oom_score().
>
> Hi Tetsuo,
>
> Yes it is not easy to reproduce seen twice till now and i agree with
> your analysis. But David has already fixing this in different way,
> So that also looks better to me:
>
> https://patchwork.kernel.org/patch/10265641/
>

Yes, I'm aware of that patch.

> But if need to keep that code, So we have to bump up the task
> reference that's only i can think of now.

I don't think so, for I think it is safe to call
has_capability_noaudit(p) with p->alloc_lock held.

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index f2e7dfb..4efcfb8 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -222,7 +222,6 @@ unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
 	 */
 	points = get_mm_rss(p->mm) + get_mm_counter(p->mm, MM_SWAPENTS) +
 		mm_pgtables_bytes(p->mm) / PAGE_SIZE;
-	task_unlock(p);

 	/*
 	 * Root processes get 3% bonus, just like the __vm_enough_memory()
@@ -230,6 +229,7 @@ unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
 	 */
 	if (has_capability_noaudit(p, CAP_SYS_ADMIN))
 		points -= (points * 3) / 100;
+	task_unlock(p);

 	/* Normalize to oom_score_adj units */
 	adj *= totalpages / 1000;
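For readers following the arithmetic in the hunk above, here is a rough userspace sketch of the score calculation that the lock placement surrounds. The `struct badness_input` type and `badness_points()` are hypothetical names made up for this illustration. The function reproduces only the statements visible in the diffs in this thread (including the `points += adj;` context line from the first patch) and leaves out everything else that oom_badness() does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical inputs standing in for the values oom_badness() reads. */
struct badness_input {
	long rss_pages;		/* get_mm_rss(p->mm) */
	long swap_entries;	/* get_mm_counter(p->mm, MM_SWAPENTS) */
	long pgtables_bytes;	/* mm_pgtables_bytes(p->mm) */
	long page_size;		/* PAGE_SIZE */
	bool is_root;		/* has_capability_noaudit(p, CAP_SYS_ADMIN) */
	long oom_score_adj;	/* adj before normalization */
	long totalpages;
};

static long badness_points(const struct badness_input *in)
{
	long points, adj;

	assert(in->page_size > 0);

	/* Baseline: resident pages + swap entries + page-table pages. */
	points = in->rss_pages + in->swap_entries +
		 in->pgtables_bytes / in->page_size;

	/* Root processes get a 3% bonus (a lower score). */
	if (in->is_root)
		points -= (points * 3) / 100;

	/* Normalize to oom_score_adj units, then apply the adjustment. */
	adj = in->oom_score_adj * (in->totalpages / 1000);
	points += adj;

	return points;
}
```

Only the `is_root` input depends on the task's credentials; the other inputs come from the mm. That is why the hunk moves task_unlock() past the capability check rather than past the whole calculation.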
Re: [PATCH] mm: oom: Fix race condition between oom_badness and do_exit of task
On 2018/03/08 13:51, Kohli, Gaurav wrote:
> On 3/8/2018 2:26 AM, David Rientjes wrote:
>
>> On Wed, 7 Mar 2018, Gaurav Kohli wrote:
>>
>>> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
>>> index 6fd9773..5f4cc4b 100644
>>> --- a/mm/oom_kill.c
>>> +++ b/mm/oom_kill.c
>>> @@ -114,9 +114,11 @@ struct task_struct *find_lock_task_mm(struct task_struct *p)
>>>  	for_each_thread(p, t) {
>>>  		task_lock(t);
>>> +		get_task_struct(t);
>>>  		if (likely(t->mm))
>>>  			goto found;
>>>  		task_unlock(t);
>>> +		put_task_struct(t);
>>>  	}
>>>  	t = NULL;
>>>  found:
>> We hold rcu_read_lock() here, so perhaps only do get_task_struct() before
>> doing rcu_read_unlock() and we have a non-NULL t?
>
> Here rcu_read_lock will not help, as our task may change due to below algo:
>
> for_each_thread(p, t) {
>  		task_lock(t);
> +		get_task_struct(t);
>  		if (likely(t->mm))
>  			goto found;
>  		task_unlock(t);
> +		put_task_struct(t)
>
>
> So only we can increase usage counter here only at the current task.

static int proc_single_show(struct seq_file *m, void *v)
{
	struct inode *inode = m->private;
	struct pid_namespace *ns;
	struct pid *pid;
	struct task_struct *task;
	int ret;

	ns = inode->i_sb->s_fs_info;
	pid = proc_pid(inode);
	task = get_pid_task(pid, PIDTYPE_PID); /* get_task_struct() is called upon success. */
	if (!task)
		return -ESRCH;

	ret = PROC_I(inode)->op.proc_show(m, ns, pid, task);

	put_task_struct(task);
	return ret;
}

static int proc_oom_score(struct seq_file *m, struct pid_namespace *ns,
			  struct pid *pid, struct task_struct *task)
{
	unsigned long totalpages = totalram_pages + total_swap_pages;
	unsigned long points = 0;

	points = oom_badness(task, NULL, NULL, totalpages) *
					1000 / totalpages; /* task->usage > 0 due to proc_single_show() */
	seq_printf(m, "%lu\n", points);

	return 0;
}

struct task_struct *find_lock_task_mm(struct task_struct *p) /* p->usage > 0 */
{
	struct task_struct *t;

	rcu_read_lock();

	for_each_thread(p, t) {
		task_lock(t);
		if (likely(t->mm))
			goto found;
		task_unlock(t);
	}
	t = NULL;
found:
	rcu_read_unlock();

	return t; /* t->usage > 0 even if t != p because t->mm != NULL */
}

t->alloc_lock is still held when leaving find_lock_task_mm(), which means
that t->mm != NULL. But nothing prevents t from setting t->mm = NULL at
exit_mm() from do_exit() and calling exit_creds() from __put_task_struct(t)
after task_unlock(t) is called. Seems difficult to trigger race window. Maybe
something has preempted because oom_badness() becomes outside of RCU grace
period upon leaving find_lock_task_mm() when called from proc_oom_score().
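The contract of find_lock_task_mm() described above (return the first thread that still has an mm, with its lock held) can be sketched in userspace as follows. Everything here is a hypothetical single-threaded model: `struct fake_task` and the `fake_*` helpers are invented names, the lock is a plain flag rather than a real spinlock, and the RCU read-side section of the kernel function is not modelled at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the parts of task_struct used here. */
struct fake_task {
	bool alloc_locked;	/* models t->alloc_lock */
	void *mm;		/* models t->mm; NULL once the thread has exited */
};

static void fake_task_lock(struct fake_task *t)
{
	assert(!t->alloc_locked);
	t->alloc_locked = true;
}

static void fake_task_unlock(struct fake_task *t)
{
	assert(t->alloc_locked);
	t->alloc_locked = false;
}

/*
 * Walk the thread group and return the first thread that still has an mm,
 * with its lock held. Threads without an mm are unlocked again before
 * moving on. Returns NULL if no thread qualifies.
 */
static struct fake_task *fake_find_lock_task_mm(struct fake_task *threads,
						size_t nr_threads)
{
	for (size_t i = 0; i < nr_threads; i++) {
		struct fake_task *t = &threads[i];

		fake_task_lock(t);
		if (t->mm)
			return t;	/* caller must unlock */
		fake_task_unlock(t);
	}
	return NULL;
}
```

A caller would do its work on the returned task and then call `fake_task_unlock()`. The point of the analysis above is that any use of the task after that unlock no longer benefits from the lock's guarantee that `mm` is non-NULL.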
Re: [PATCH] mm: oom: Fix race condition between oom_badness and do_exit of task
On 3/8/2018 2:26 AM, David Rientjes wrote:

> On Wed, 7 Mar 2018, Gaurav Kohli wrote:
>
>> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
>> index 6fd9773..5f4cc4b 100644
>> --- a/mm/oom_kill.c
>> +++ b/mm/oom_kill.c
>> @@ -114,9 +114,11 @@ struct task_struct *find_lock_task_mm(struct task_struct *p)
>>  	for_each_thread(p, t) {
>>  		task_lock(t);
>> +		get_task_struct(t);
>>  		if (likely(t->mm))
>>  			goto found;
>>  		task_unlock(t);
>> +		put_task_struct(t);
>>  	}
>>  	t = NULL;
>>  found:
> We hold rcu_read_lock() here, so perhaps only do get_task_struct() before
> doing rcu_read_unlock() and we have a non-NULL t?

Here rcu_read_lock will not help, as our task may change due to the loop below:

for_each_thread(p, t) {
 		task_lock(t);
+		get_task_struct(t);
 		if (likely(t->mm))
 			goto found;
 		task_unlock(t);
+		put_task_struct(t)


So we can only increase the usage counter here, at the current task.

I have seen your new patch; it seems valid to me and it will resolve our
issue. Thanks for the support.

Regards
Gaurav

--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.
Re: [PATCH] mm: oom: Fix race condition between oom_badness and do_exit of task
On Wed, 7 Mar 2018, Gaurav Kohli wrote:

> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 6fd9773..5f4cc4b 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -114,9 +114,11 @@ struct task_struct *find_lock_task_mm(struct task_struct *p)
>
>  	for_each_thread(p, t) {
>  		task_lock(t);
> +		get_task_struct(t);
>  		if (likely(t->mm))
>  			goto found;
>  		task_unlock(t);
> +		put_task_struct(t);
>  	}
>  	t = NULL;
>  found:

We hold rcu_read_lock() here, so perhaps only do get_task_struct() before
doing rcu_read_unlock() and we have a non-NULL t?

> @@ -191,6 +193,7 @@ unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
>  			test_bit(MMF_OOM_SKIP, &p->mm->flags) ||
>  			in_vfork(p)) {
>  		task_unlock(p);
> +		put_task_struct(p);
>  		return 0;
>  	}
>
> @@ -208,7 +211,7 @@ unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
>  	 */
>  	if (has_capability_noaudit(p, CAP_SYS_ADMIN))
>  		points -= (points * 3) / 100;
> -
> +	put_task_struct(p);
>  	/* Normalize to oom_score_adj units */
>  	adj *= totalpages / 1000;
>  	points += adj;

This fixes up oom_badness(), but there are other users of
find_lock_task_mm() in the oom killer as well as other subsystems.
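The alternative suggested in this review (take a reference only once a thread with an mm has been found, before leaving the read-side section) might look roughly like the userspace model below. It is only a sketch under stated assumptions: `struct fake_task`, `fake_get_task()`, `fake_put_task()` and `fake_find_get_task_mm()` are hypothetical names, the counter is a plain int rather than an atomic, and the comments merely mark where rcu_read_lock() and rcu_read_unlock() would sit in the kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the parts of task_struct used here. */
struct fake_task {
	int usage;		/* models the task reference count */
	bool alloc_locked;	/* models t->alloc_lock */
	void *mm;		/* models t->mm */
};

static void fake_get_task(struct fake_task *t)
{
	t->usage++;
}

static void fake_put_task(struct fake_task *t)
{
	assert(t->usage > 0);
	t->usage--;
}

/*
 * Return the first thread with an mm, locked and with one extra reference.
 * Threads that are skipped never have their reference count touched.
 */
static struct fake_task *fake_find_get_task_mm(struct fake_task *threads,
					       size_t nr_threads)
{
	struct fake_task *found = NULL;

	/* rcu_read_lock() would be taken here in the kernel. */
	for (size_t i = 0; i < nr_threads; i++) {
		struct fake_task *t = &threads[i];

		t->alloc_locked = true;
		if (t->mm) {
			found = t;
			break;
		}
		t->alloc_locked = false;
	}
	if (found)
		fake_get_task(found);	/* pin before leaving the section */
	/* rcu_read_unlock() would be called here in the kernel. */

	return found;
}
```

Compared with the posted hunk, this keeps the get/put pair out of the loop body, so only the returned task is pinned. Every caller then has to pair the result with both an unlock and a `fake_put_task()`, which is the cost behind the remark about other users of find_lock_task_mm().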