Re: [v4 1/4] mm, oom: refactor the TIF_MEMDIE usage

2017-07-26 Thread Roman Gushchin
On Wed, Jul 26, 2017 at 04:44:08PM +0200, Michal Hocko wrote:
> On Wed 26-07-17 16:24:34, Michal Hocko wrote:
> [...]
> > Or if you prefer I can post it separately?
> 
> I've just tried to rebase relevant parts on top of the current mmotm
> tree and it needs some non-trivial updates. Would you mind if I post
> those patches with you on CC?

Sure.
Again, I'm not against your approach (and I've tried to rebase your patches,
and it worked well for me, although I didn't run any proper tests),
I just don't want to create an unnecessary dependency here.

If your patchset is accepted, it will be quite trivial to rebase
mine, and vice versa; they are not that dependent on each other.

> I will comment on the rest of the series later. I have glanced through
> it and have to digest it some more before replying.

Thanks!
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [v4 1/4] mm, oom: refactor the TIF_MEMDIE usage

2017-07-26 Thread Michal Hocko
On Wed 26-07-17 16:24:34, Michal Hocko wrote:
[...]
> Or if you prefer I can post it separately?

I've just tried to rebase relevant parts on top of the current mmotm
tree and it needs some non-trivial updates. Would you mind if I post
those patches with you on CC? I really think that we shouldn't invent a
new throttling mechanism just to replace it later.

I will comment on the rest of the series later. I have glanced through
it and have to digest it some more before replying.
-- 
Michal Hocko
SUSE Labs


Re: [v4 1/4] mm, oom: refactor the TIF_MEMDIE usage

2017-07-26 Thread Michal Hocko
On Wed 26-07-17 15:06:07, Roman Gushchin wrote:
> On Wed, Jul 26, 2017 at 03:56:22PM +0200, Michal Hocko wrote:
> > On Wed 26-07-17 14:27:15, Roman Gushchin wrote:
> > [...]
> > > @@ -656,13 +658,24 @@ static void mark_oom_victim(struct task_struct *tsk)
> > >   struct mm_struct *mm = tsk->mm;
> > >  
> > >   WARN_ON(oom_killer_disabled);
> > > - /* OOM killer might race with memcg OOM */
> > > - if (test_and_set_tsk_thread_flag(tsk, TIF_MEMDIE))
> > > +
> > > + if (!cmpxchg(&tif_memdie_owner, NULL, current)) {
> > > + struct task_struct *t;
> > > +
> > > + rcu_read_lock();
> > > + for_each_thread(current, t)
> > > + set_tsk_thread_flag(t, TIF_MEMDIE);
> > > + rcu_read_unlock();
> > > + }
> > 
> > I would really much rather see us limit the amount of memory reserves OOM
> > victims can consume rather than build on top of the current hackish
> > approach of limiting the number of tasks, because the fundamental problem
> > is still there (a heavily multithreaded process can still deplete the
> > reserves completely).
> > 
> > Is there really any reason not to go with the existing patch I
> > pointed to last time around? You didn't seem to have any objections
> > back then.
> 
> Hi Michal!
> 
> I had this patch in mind and mentioned in the commit log that TIF_MEMDIE
> as a memory reserves access indicator will probably be eliminated later.
> 
> But that patch is not upstream yet, and it's directly related to this topic.
> The proposed refactoring of the TIF_MEMDIE usage is not against your approach,
> and will not make it harder to go that way later.

So what is the reason to invent another tricky way to limit access to
memory reserves? The patch is not upstream, but nothing prevents you from
picking it up and posting it along with your series. Or, if you prefer, I can
post it separately?

> I'm slightly concerned about the idea of giving TIF_MEMDIE to all tasks
> in case we're killing a really large cgroup.

Why? There shouldn't be any issue if the access is limited.
-- 
Michal Hocko
SUSE Labs


Re: [v4 1/4] mm, oom: refactor the TIF_MEMDIE usage

2017-07-26 Thread Roman Gushchin
On Wed, Jul 26, 2017 at 03:56:22PM +0200, Michal Hocko wrote:
> On Wed 26-07-17 14:27:15, Roman Gushchin wrote:
> [...]
> > @@ -656,13 +658,24 @@ static void mark_oom_victim(struct task_struct *tsk)
> > struct mm_struct *mm = tsk->mm;
> >  
> > WARN_ON(oom_killer_disabled);
> > -   /* OOM killer might race with memcg OOM */
> > -   if (test_and_set_tsk_thread_flag(tsk, TIF_MEMDIE))
> > +
> > +   if (!cmpxchg(&tif_memdie_owner, NULL, current)) {
> > +   struct task_struct *t;
> > +
> > +   rcu_read_lock();
> > +   for_each_thread(current, t)
> > +   set_tsk_thread_flag(t, TIF_MEMDIE);
> > +   rcu_read_unlock();
> > +   }
> 
> I would really much rather see us limit the amount of memory reserves OOM
> victims can consume rather than build on top of the current hackish
> approach of limiting the number of tasks, because the fundamental problem
> is still there (a heavily multithreaded process can still deplete the
> reserves completely).
> 
> Is there really any reason not to go with the existing patch I
> pointed to last time around? You didn't seem to have any objections
> back then.

Hi Michal!

I had this patch in mind and mentioned in the commit log that TIF_MEMDIE
as a memory reserves access indicator will probably be eliminated later.

But that patch is not upstream yet, and it's directly related to this topic.
The proposed refactoring of the TIF_MEMDIE usage is not against your approach,
and will not make it harder to go that way later.

I'm slightly concerned about the idea of giving TIF_MEMDIE to all tasks
in case we're killing a really large cgroup. But it's only a theoretical
concern; maybe it's fine.

So I'd keep the existing approach for this patchset; then we can follow
your approach, and we will have a better test case for it.

Thanks!


Re: [v4 1/4] mm, oom: refactor the TIF_MEMDIE usage

2017-07-26 Thread Michal Hocko
On Wed 26-07-17 14:27:15, Roman Gushchin wrote:
[...]
> @@ -656,13 +658,24 @@ static void mark_oom_victim(struct task_struct *tsk)
>   struct mm_struct *mm = tsk->mm;
>  
>   WARN_ON(oom_killer_disabled);
> - /* OOM killer might race with memcg OOM */
> - if (test_and_set_tsk_thread_flag(tsk, TIF_MEMDIE))
> +
> + if (!cmpxchg(&tif_memdie_owner, NULL, current)) {
> + struct task_struct *t;
> +
> + rcu_read_lock();
> + for_each_thread(current, t)
> + set_tsk_thread_flag(t, TIF_MEMDIE);
> + rcu_read_unlock();
> + }

I would really much rather see us limit the amount of memory reserves OOM
victims can consume rather than build on top of the current hackish
approach of limiting the number of tasks, because the fundamental problem
is still there (a heavily multithreaded process can still deplete the
reserves completely).

Is there really any reason not to go with the existing patch I
pointed to last time around? You didn't seem to have any objections
back then.
-- 
Michal Hocko
SUSE Labs


[v4 1/4] mm, oom: refactor the TIF_MEMDIE usage

2017-07-26 Thread Roman Gushchin
First, separate the tsk_is_oom_victim() and TIF_MEMDIE flag checks:
let the former indicate that a task has been killed by the OOM killer,
and the latter indicate that a task has access to the memory
reserves (with the hope of eliminating it later).

Second, set TIF_MEMDIE on all threads of an OOM victim process.

Third, to limit the number of processes which have access to memory
reserves, keep an atomic pointer to the task which grabbed it.

Signed-off-by: Roman Gushchin 
Cc: Michal Hocko 
Cc: Vladimir Davydov 
Cc: Johannes Weiner 
Cc: Tetsuo Handa 
Cc: David Rientjes 
Cc: Tejun Heo 
Cc: kernel-t...@fb.com
Cc: cgro...@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: linux...@kvack.org
---
 kernel/exit.c   |  2 +-
 mm/memcontrol.c |  2 +-
 mm/oom_kill.c   | 30 +-
 3 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index 8f40bee5ba9d..d5f372a2a363 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -542,7 +542,7 @@ static void exit_mm(void)
task_unlock(current);
mm_update_next_owner(mm);
mmput(mm);
-   if (test_thread_flag(TIF_MEMDIE))
+   if (tsk_is_oom_victim(current))
exit_oom_victim();
 }
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index d61133e6af99..9085e55eb69f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1896,7 +1896,7 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
 * bypass the last charges so that they can exit quickly and
 * free their memory.
 */
-   if (unlikely(test_thread_flag(TIF_MEMDIE) ||
+   if (unlikely(tsk_is_oom_victim(current) ||
 fatal_signal_pending(current) ||
 current->flags & PF_EXITING))
goto force;
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 9e8b4f030c1c..72de01be4d33 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -435,6 +435,8 @@ static DECLARE_WAIT_QUEUE_HEAD(oom_victims_wait);
 
 static bool oom_killer_disabled __read_mostly;
 
+static struct task_struct *tif_memdie_owner;
+
 #define K(x) ((x) << (PAGE_SHIFT-10))
 
 /*
@@ -656,13 +658,24 @@ static void mark_oom_victim(struct task_struct *tsk)
struct mm_struct *mm = tsk->mm;
 
WARN_ON(oom_killer_disabled);
-   /* OOM killer might race with memcg OOM */
-   if (test_and_set_tsk_thread_flag(tsk, TIF_MEMDIE))
+
+   if (!cmpxchg(&tif_memdie_owner, NULL, current)) {
+   struct task_struct *t;
+
+   rcu_read_lock();
+   for_each_thread(current, t)
+   set_tsk_thread_flag(t, TIF_MEMDIE);
+   rcu_read_unlock();
+   }
+
+   /*
+* OOM killer might race with memcg OOM.
+* oom_mm is bound to the signal struct life time.
+*/
+   if (cmpxchg(&tsk->signal->oom_mm, NULL, mm))
return;
 
-   /* oom_mm is bound to the signal struct life time. */
-   if (!cmpxchg(&tsk->signal->oom_mm, NULL, mm))
-   mmgrab(tsk->signal->oom_mm);
+   mmgrab(tsk->signal->oom_mm);
 
/*
 * Make sure that the task is woken up from uninterruptible sleep
@@ -682,6 +695,13 @@ void exit_oom_victim(void)
 {
clear_thread_flag(TIF_MEMDIE);
 
+   /*
+* If the current task is the thread which initially
+* received TIF_MEMDIE, clear tif_memdie_owner to
+* give the next process a chance to grab it.
+*/
+   cmpxchg(&tif_memdie_owner, current, NULL);
+
if (!atomic_dec_return(_victims))
wake_up_all(_victims_wait);
 }
-- 
2.13.3
