Re: [RFC PATCH] mm, oom: distinguish blockable mode for mmu notifiers

2018-07-16 Thread Leon Romanovsky
On Tue, Jul 10, 2018 at 04:14:10PM +0200, Michal Hocko wrote:
> On Tue 10-07-18 16:40:40, Leon Romanovsky wrote:
> > On Mon, Jul 09, 2018 at 02:29:08PM +0200, Michal Hocko wrote:
> > > On Wed 27-06-18 09:44:21, Michal Hocko wrote:
> > > > This is the v2 of RFC based on the feedback I've received so far. The
> > > > code even compiles as a bonus ;) I haven't runtime tested it yet, mostly
> > > > because I have no idea how.
> > > >
> > > > Any further feedback is highly appreciated of course.
> > >
> > > Any other feedback before I post this as non-RFC?
> >
> > From the mlx5 perspective (mlx5 is the primary user of umem_odp.c), your
> > change looks ok.
>
> Can I assume your Acked-by?
>
> Thanks for your review!

For mlx and umem_odp pieces,
Acked-by: Leon Romanovsky 

Thanks


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [RFC PATCH] mm, oom: distinguish blockable mode for mmu notifiers

2018-07-11 Thread Leon Romanovsky
On Wed, Jul 11, 2018 at 01:13:18PM +0200, Michal Hocko wrote:
> On Wed 11-07-18 13:14:47, Leon Romanovsky wrote:
> > On Wed, Jul 11, 2018 at 11:03:53AM +0200, Michal Hocko wrote:
> > > On Tue 10-07-18 19:20:20, Leon Romanovsky wrote:
> > > > On Tue, Jul 10, 2018 at 04:14:10PM +0200, Michal Hocko wrote:
> > > > > On Tue 10-07-18 16:40:40, Leon Romanovsky wrote:
> > > > > > On Mon, Jul 09, 2018 at 02:29:08PM +0200, Michal Hocko wrote:
> > > > > > > On Wed 27-06-18 09:44:21, Michal Hocko wrote:
> > > > > > > > This is the v2 of RFC based on the feedback I've received so far.
> > > > > > > > The code even compiles as a bonus ;) I haven't runtime tested it
> > > > > > > > yet, mostly because I have no idea how.
> > > > > > > >
> > > > > > > > Any further feedback is highly appreciated of course.
> > > > > > >
> > > > > > > Any other feedback before I post this as non-RFC?
> > > > > >
> > > > > > From the mlx5 perspective (mlx5 is the primary user of
> > > > > > umem_odp.c), your change looks ok.
> > > > >
> > > > > Can I assume your Acked-by?
> > > >
> > > > I didn't have a chance to test it: it applies on our rdma-next, but
> > > > fails to compile.
> > >
> > > What is the compilation problem? Is it caused by the patch or some other
> > > unrelated change?
> >
> > Thanks for pushing me to take a look at it.
> > Your patch needs the following hunk to compile properly, at least on my
> > system.
>
> I suspect you were trying the original version. I've posted an updated
> patch here http://lkml.kernel.org/r/20180627074421.gf32...@dhcp22.suse.cz
> and all these issues should be fixed there, along with many other fixes.
>

Ohh, you used --reply-to; IMHO that is the best way to make sure the
patch gets lost :)

> Could you have a look at that one please?

I grabbed it; results will only be available after the overnight run.

Thanks

> --
> Michal Hocko
> SUSE Labs




Re: [RFC PATCH] mm, oom: distinguish blockable mode for mmu notifiers

2018-07-11 Thread Michal Hocko
On Wed 11-07-18 13:14:47, Leon Romanovsky wrote:
> On Wed, Jul 11, 2018 at 11:03:53AM +0200, Michal Hocko wrote:
> > On Tue 10-07-18 19:20:20, Leon Romanovsky wrote:
> > > On Tue, Jul 10, 2018 at 04:14:10PM +0200, Michal Hocko wrote:
> > > > On Tue 10-07-18 16:40:40, Leon Romanovsky wrote:
> > > > > On Mon, Jul 09, 2018 at 02:29:08PM +0200, Michal Hocko wrote:
> > > > > > On Wed 27-06-18 09:44:21, Michal Hocko wrote:
> > > > > > > This is the v2 of RFC based on the feedback I've received so far.
> > > > > > > The code even compiles as a bonus ;) I haven't runtime tested it
> > > > > > > yet, mostly because I have no idea how.
> > > > > > >
> > > > > > > Any further feedback is highly appreciated of course.
> > > > > >
> > > > > > Any other feedback before I post this as non-RFC?
> > > > >
> > > > > From the mlx5 perspective (mlx5 is the primary user of umem_odp.c),
> > > > > your change looks ok.
> > > >
> > > > Can I assume your Acked-by?
> > >
> > > I didn't have a chance to test it: it applies on our rdma-next, but
> > > fails to compile.
> >
> > What is the compilation problem? Is it caused by the patch or some other
> > unrelated change?
> 
> Thanks for pushing me to take a look at it.
> Your patch needs the following hunk to compile properly, at least on my system.

I suspect you were trying the original version. I've posted an updated
patch here http://lkml.kernel.org/r/20180627074421.gf32...@dhcp22.suse.cz
and all these issues should be fixed there, along with many other fixes.

Could you have a look at that one please?
-- 
Michal Hocko
SUSE Labs


Re: [RFC PATCH] mm, oom: distinguish blockable mode for mmu notifiers

2018-07-11 Thread Leon Romanovsky
On Wed, Jul 11, 2018 at 11:03:53AM +0200, Michal Hocko wrote:
> On Tue 10-07-18 19:20:20, Leon Romanovsky wrote:
> > On Tue, Jul 10, 2018 at 04:14:10PM +0200, Michal Hocko wrote:
> > > On Tue 10-07-18 16:40:40, Leon Romanovsky wrote:
> > > > On Mon, Jul 09, 2018 at 02:29:08PM +0200, Michal Hocko wrote:
> > > > > On Wed 27-06-18 09:44:21, Michal Hocko wrote:
> > > > > > This is the v2 of RFC based on the feedback I've received so far.
> > > > > > The code even compiles as a bonus ;) I haven't runtime tested it
> > > > > > yet, mostly because I have no idea how.
> > > > > >
> > > > > > Any further feedback is highly appreciated of course.
> > > > >
> > > > > Any other feedback before I post this as non-RFC?
> > > >
> > > > From the mlx5 perspective (mlx5 is the primary user of umem_odp.c),
> > > > your change looks ok.
> > >
> > > Can I assume your Acked-by?
> >
> > I didn't have a chance to test it: it applies on our rdma-next, but
> > fails to compile.
>
> What is the compilation problem? Is it caused by the patch or some other
> unrelated change?

Thanks for pushing me to take a look at it.
Your patch needs the following hunk to compile properly, at least on my system.

I'll add it to our regression runs.

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index 369867501bed..1f364a157097 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -155,9 +155,9 @@ struct mmu_notifier_ops {
 * cannot block, mmu_notifier_ops.flags should have
 * MMU_INVALIDATE_DOES_NOT_BLOCK set.
 */
-   void (*invalidate_range_start)(struct mmu_notifier *mn,
+   int (*invalidate_range_start)(struct mmu_notifier *mn,
   struct mm_struct *mm,
-  unsigned long start, unsigned long end);
+  unsigned long start, unsigned long end, bool blockable);
void (*invalidate_range_end)(struct mmu_notifier *mn,
 struct mm_struct *mm,
 unsigned long start, unsigned long end);
@@ -229,7 +229,7 @@ extern int __mmu_notifier_test_young(struct mm_struct *mm,
 unsigned long address);
 extern void __mmu_notifier_change_pte(struct mm_struct *mm,
  unsigned long address, pte_t pte);
-extern void __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
+extern int __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
  unsigned long start, unsigned long end,
  bool blockable);
 extern void __mmu_notifier_invalidate_range_end(struct mm_struct *mm,
diff --git a/include/linux/oom.h b/include/linux/oom.h
index 6adac113e96d..92f70e4c6252 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -95,7 +95,7 @@ static inline int check_stable_address_space(struct mm_struct *mm)
return 0;
 }

-void __oom_reap_task_mm(struct mm_struct *mm);
+bool __oom_reap_task_mm(struct mm_struct *mm);

 extern unsigned long oom_badness(struct task_struct *p,
struct mem_cgroup *memcg, const nodemask_t *nodemask,
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 7e0c6e78ae5c..7c7bd6f3298e 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -1,6 +1,6 @@
 /*
  *  linux/mm/oom_kill.c
- *
+ *
  *  Copyright (C)  1998,2000  Rik van Riel
  * Thanks go out to Claus Fischer for some serious inspiration and
  * for goading me into coding this file...
@@ -569,7 +569,7 @@ static bool oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
if (!__oom_reap_task_mm(mm)) {
up_read(&mm->mmap_sem);
ret = false;
-   goto out_unlock;
+   goto unlock_oom;
}

pr_info("oom_reaper: reaped process %d (%s), now anon-rss:%lukB, file-rss:%lukB, shmem-rss:%lukB\n",

Thanks

> --
> Michal Hocko
> SUSE Labs




Re: [RFC PATCH] mm, oom: distinguish blockable mode for mmu notifiers

2018-07-11 Thread Michal Hocko
On Tue 10-07-18 19:20:20, Leon Romanovsky wrote:
> On Tue, Jul 10, 2018 at 04:14:10PM +0200, Michal Hocko wrote:
> > On Tue 10-07-18 16:40:40, Leon Romanovsky wrote:
> > > On Mon, Jul 09, 2018 at 02:29:08PM +0200, Michal Hocko wrote:
> > > > On Wed 27-06-18 09:44:21, Michal Hocko wrote:
> > > > > This is the v2 of RFC based on the feedback I've received so far. The
> > > > > code even compiles as a bonus ;) I haven't runtime tested it yet, mostly
> > > > > because I have no idea how.
> > > > >
> > > > > Any further feedback is highly appreciated of course.
> > > >
> > > > Any other feedback before I post this as non-RFC?
> > >
> > > From the mlx5 perspective (mlx5 is the primary user of umem_odp.c),
> > > your change looks ok.
> >
> > Can I assume your Acked-by?
> 
> I didn't have a chance to test it: it applies on our rdma-next, but
> fails to compile.

What is the compilation problem? Is it caused by the patch or some other
unrelated change?
-- 
Michal Hocko
SUSE Labs


Re: [RFC PATCH] mm, oom: distinguish blockable mode for mmu notifiers

2018-07-10 Thread Leon Romanovsky
On Tue, Jul 10, 2018 at 04:14:10PM +0200, Michal Hocko wrote:
> On Tue 10-07-18 16:40:40, Leon Romanovsky wrote:
> > On Mon, Jul 09, 2018 at 02:29:08PM +0200, Michal Hocko wrote:
> > > On Wed 27-06-18 09:44:21, Michal Hocko wrote:
> > > > This is the v2 of RFC based on the feedback I've received so far. The
> > > > code even compiles as a bonus ;) I haven't runtime tested it yet, mostly
> > > > because I have no idea how.
> > > >
> > > > Any further feedback is highly appreciated of course.
> > >
> > > Any other feedback before I post this as non-RFC?
> >
> > From the mlx5 perspective (mlx5 is the primary user of umem_odp.c), your
> > change looks ok.
>
> Can I assume your Acked-by?

I didn't have a chance to test it: it applies on our rdma-next, but
fails to compile.

Thanks

>
> Thanks for your review!
> --
> Michal Hocko
> SUSE Labs
>




Re: [RFC PATCH] mm, oom: distinguish blockable mode for mmu notifiers

2018-07-10 Thread Michal Hocko
On Tue 10-07-18 16:40:40, Leon Romanovsky wrote:
> On Mon, Jul 09, 2018 at 02:29:08PM +0200, Michal Hocko wrote:
> > On Wed 27-06-18 09:44:21, Michal Hocko wrote:
> > > This is the v2 of RFC based on the feedback I've received so far. The
> > > code even compiles as a bonus ;) I haven't runtime tested it yet, mostly
> > > because I have no idea how.
> > >
> > > Any further feedback is highly appreciated of course.
> >
> > Any other feedback before I post this as non-RFC?
> 
> From the mlx5 perspective (mlx5 is the primary user of umem_odp.c), your
> change looks ok.

Can I assume your Acked-by?

Thanks for your review!
-- 
Michal Hocko
SUSE Labs


Re: [RFC PATCH] mm, oom: distinguish blockable mode for mmu notifiers

2018-07-10 Thread Leon Romanovsky
On Mon, Jul 09, 2018 at 02:29:08PM +0200, Michal Hocko wrote:
> On Wed 27-06-18 09:44:21, Michal Hocko wrote:
> > This is the v2 of RFC based on the feedback I've received so far. The
> > code even compiles as a bonus ;) I haven't runtime tested it yet, mostly
> > because I have no idea how.
> >
> > Any further feedback is highly appreciated of course.
>
> Any other feedback before I post this as non-RFC?

From the mlx5 perspective (mlx5 is the primary user of umem_odp.c), your
change looks ok.

Thanks




Re: [RFC PATCH] mm, oom: distinguish blockable mode for mmu notifiers

2018-07-09 Thread Michal Hocko
On Wed 27-06-18 09:44:21, Michal Hocko wrote:
> This is the v2 of RFC based on the feedback I've received so far. The
> code even compiles as a bonus ;) I haven't runtime tested it yet, mostly
> because I have no idea how.
> 
> Any further feedback is highly appreciated of course.

Any other feedback before I post this as non-RFC?

> ---
> From ec9a7241bf422b908532c4c33953b0da2655ad05 Mon Sep 17 00:00:00 2001
> From: Michal Hocko 
> Date: Wed, 20 Jun 2018 15:03:20 +0200
> Subject: [PATCH] mm, oom: distinguish blockable mode for mmu notifiers
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
> 
> There are several blockable mmu notifiers which might sleep in
> mmu_notifier_invalidate_range_start and that is a problem for the
> oom_reaper because it needs to guarantee forward progress, so it cannot
> depend on any sleepable locks.
> 
> Currently we simply back off and mark an oom victim with blockable mmu
> notifiers as done after a short sleep. That can result in selecting a
> new oom victim prematurely because the previous one still hasn't torn
> its memory down yet.
> 
> We can do much better though. Even if mmu notifiers use sleepable locks
> there is no reason to automatically assume those locks are held.
> Moreover, the majority of notifiers only care about a portion of the address
> space and there is absolutely zero reason to fail when we are unmapping an
> unrelated range. Many notifiers really do block and wait for HW, which is
> harder to handle, and in that case we have to bail out.
> 
> This patch handles the low-hanging fruit.
> __mmu_notifier_invalidate_range_start gets a blockable flag and callbacks
> are not allowed to sleep if the flag is set to false. This is achieved by
> using trylock instead of the sleepable lock for most callbacks and
> continuing as long as we do not block down the call chain.
> 
> I think we can improve that even further because there is a common
> pattern to do a range lookup first and then do something about that.
> The first part can be done without a sleeping lock in most cases AFAICS.
> 
> The oom_reaper end then simply retries if there is at least one notifier
> which couldn't make any progress in !blockable mode. A retry loop is
> already implemented to wait for the mmap_sem and this is basically the
> same thing.
> 
> Changes since rfc v1
> - gpu notifiers can sleep while waiting for HW (evict_process_queues_cpsch
>   on a lock and amdgpu_mn_invalidate_node on an unbounded timeout), so make
>   sure we bail out when we have an intersecting range, for starters
> - log a note when a notifier fails, for easier debugging
> - back off early in ib_umem_notifier_invalidate_range_start if the
>   callback is called
> - mn_invl_range_start waits for completion down the unmap_grant_pages
>   path so we have to back off early on overlapping ranges
> 
> Cc: "David (ChunMing) Zhou" 
> Cc: Paolo Bonzini 
> Cc: "Radim Krčmář" 
> Cc: Alex Deucher 
> Cc: "Christian König" 
> Cc: David Airlie 
> Cc: Jani Nikula 
> Cc: Joonas Lahtinen 
> Cc: Rodrigo Vivi 
> Cc: Doug Ledford 
> Cc: Jason Gunthorpe 
> Cc: Mike Marciniszyn 
> Cc: Dennis Dalessandro 
> Cc: Sudeep Dutt 
> Cc: Ashutosh Dixit 
> Cc: Dimitri Sivanich 
> Cc: Boris Ostrovsky 
> Cc: Juergen Gross 
> Cc: "Jérôme Glisse" 
> Cc: Andrea Arcangeli 
> Cc: Felix Kuehling 
> Cc: k...@vger.kernel.org (open list:KERNEL VIRTUAL MACHINE FOR X86 (KVM/x86))
> Cc: linux-ker...@vger.kernel.org (open list:X86 ARCHITECTURE (32-BIT AND 64-BIT))
> Cc: amd-gfx@lists.freedesktop.org (open list:RADEON and AMDGPU DRM DRIVERS)
> Cc: dri-de...@lists.freedesktop.org (open list:DRM DRIVERS)
> Cc: intel-...@lists.freedesktop.org (open list:INTEL DRM DRIVERS (excluding Poulsbo, Moorestow...)
> Cc: linux-r...@vger.kernel.org (open list:INFINIBAND SUBSYSTEM)
> Cc: xen-de...@lists.xenproject.org (moderated list:XEN HYPERVISOR INTERFACE)
> Cc: linux...@kvack.org (open list:HMM - Heterogeneous Memory Management)
> Reported-by: David Rientjes 
> Signed-off-by: Michal Hocko 
> ---
>  arch/x86/kvm/x86.c  |  7 ++--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c  | 43 +++-
>  drivers/gpu/drm/i915/i915_gem_userptr.c | 13 ++--
>  drivers/gpu/drm/radeon/radeon_mn.c  | 22 +++--
>  drivers/infiniband/core/umem_odp.c  | 33 +++
>  drivers/infiniband/hw/hfi1/mmu_rb.c | 11 ---
>  drivers/infiniband/hw/mlx5/odp.c|  2 +-
>  drivers/misc/mic/scif/scif_dma.c|  7 ++--
>  drivers/misc/sgi-gru/grutlbpurge.c  |  7 ++--
>  drivers/xen/gntdev.c| 44 -
>  include/linux/kvm_host.h|  4 +--
>  include/linux/mmu_notifier.h| 35 +++-
>  include/linux/oom.h |  2 +-
>  include/rdma/ib_umem_odp.h  |  3 +-
>  mm/hmm.c|  7 ++--
>  mm/mmap.c   |  2 +-
>  mm/mmu_notifier.c  
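The changelog above describes the new contract: invalidate_range_start gets a blockable flag, a callback that cannot make progress without sleeping fails instead of blocking, notifiers should not fail for ranges they do not track, and the oom_reaper simply retries. The following is a minimal userspace sketch of that idea, not the kernel code. All names (struct notifier, notifier_invalidate_start, invalidate_all, reap_with_retry) are hypothetical, a pthread mutex stands in for a driver's sleepable lock, and -EAGAIN is assumed as the "would block" return value.

```c
/*
 * Userspace model (hypothetical names) of a range-invalidation callback
 * that must not sleep when called with blockable == false.
 * A pthread mutex stands in for a driver's sleepable lock.
 */
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

struct notifier {
	pthread_mutex_t lock;     /* protects the driver's tracked range */
	unsigned long start, end; /* tracked range, half-open: [start, end) */
};

/* Returns 0 on success, -EAGAIN if it would have to sleep but may not. */
static int notifier_invalidate_start(struct notifier *n, unsigned long start,
				     unsigned long end, bool blockable)
{
	assert(start < end);

	/* An unrelated range never needs the lock, so never fail it. */
	if (end <= n->start || start >= n->end)
		return 0;

	if (blockable)
		pthread_mutex_lock(&n->lock);
	else if (pthread_mutex_trylock(&n->lock) != 0)
		return -EAGAIN;	/* lock is held: back off instead of sleeping */

	/* ... tear down the overlapping part of [start, end) here ... */
	pthread_mutex_unlock(&n->lock);
	return 0;
}

/* Call every notifier; remember (and log) any failure but keep going. */
static int invalidate_all(struct notifier *ns, int count, unsigned long start,
			  unsigned long end, bool blockable)
{
	int ret = 0;

	for (int i = 0; i < count; i++) {
		int r = notifier_invalidate_start(&ns[i], start, end, blockable);

		if (r) {
			fprintf(stderr, "notifier %d failed with %d in %sblockable context\n",
				i, r, blockable ? "" : "non-");
			ret = r;
		}
	}
	return ret;
}

/* Reaper side: never block, just retry a bounded number of times. */
static bool reap_with_retry(struct notifier *ns, int count, unsigned long start,
			    unsigned long end, int max_attempts)
{
	for (int attempt = 0; attempt < max_attempts; attempt++) {
		if (invalidate_all(ns, count, start, end, false) == 0)
			return true;
		/* a real reaper would sleep briefly before the next attempt */
	}
	return false;
}
```

With the lock held elsewhere, a non-blockable call on an overlapping range returns -EAGAIN while an unrelated range still succeeds, which is the "do not fail for unrelated ranges" point from the changelog.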

Re: [RFC PATCH] mm, oom: distinguish blockable mode for mmu notifiers

2018-07-02 Thread Michal Hocko
On Mon 02-07-18 14:39:50, Christian König wrote:
[...]
> Not wanting to block something as important as this, so feel free to add an
> Acked-by: Christian König  to the patch.

Thanks a lot!

> Let's rather face the next topic: Any idea how to runtime test this?

This is a good question indeed. One way to do that would be triggering
the OOM killer from the context which uses each of these mmu notifiers
(one at a time) and seeing how that works. You would see the note in the
log whenever the notifier would block. The primary thing to test is how
often the oom reaper really had to back off completely.

> I mean I can rather easily provide a test which crashes an AMD GPU, which in
> turn then would mean that the MMU notifier would block forever without this
> patch.

Well, you do not really have to go that far. It should be sufficient to
do the above. The current code would simply back off without releasing
any memory. The patch should help to reclaim some memory.
 
> But do you know a way to let the OOM killer kill a specific process?

Yes, you can set its oom_score_adj to 1000 which means always select
that task.
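A minimal sketch of that setup in C, assuming the test process adjusts its own score through procfs (`/proc/self/oom_score_adj`, or `/proc/<pid>/oom_score_adj` for another task). The helper name set_oom_score_adj is hypothetical; the valid range of -1000 to 1000 is the one procfs accepts.

```c
#include <stdio.h>

/*
 * Hypothetical helper: write an oom_score_adj value to a procfs file such
 * as "/proc/self/oom_score_adj". Values range from -1000 to 1000; 1000
 * makes the task the preferred OOM-killer victim.
 * Returns 0 on success, -1 on failure.
 */
static int set_oom_score_adj(const char *path, int value)
{
	FILE *f;
	int ok;

	if (value < -1000 || value > 1000)
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fprintf(f, "%d\n", value) > 0;
	/* stdio buffers the write; procfs sees it (and may reject it) at fclose */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

Calling set_oom_score_adj("/proc/self/oom_score_adj", 1000) in the process that has the mmu notifier registered, and then allocating memory until the OOM killer fires, should make that process the victim; the kernel log would then show whether the oom_reaper had to back off.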
-- 
Michal Hocko
SUSE Labs


Re: [RFC PATCH] mm, oom: distinguish blockable mode for mmu notifiers

2018-07-02 Thread Christian König

Am 02.07.2018 um 14:35 schrieb Michal Hocko:
> On Mon 02-07-18 14:24:29, Christian König wrote:
> > Am 02.07.2018 um 14:20 schrieb Michal Hocko:
> > > On Mon 02-07-18 14:13:42, Christian König wrote:
> > > > Am 02.07.2018 um 13:54 schrieb Michal Hocko:
> > > > > On Mon 02-07-18 11:14:58, Christian König wrote:
> > > > > > Am 27.06.2018 um 09:44 schrieb Michal Hocko:
> > > > > > > This is the v2 of RFC based on the feedback I've received so far.
> > > > > > > The code even compiles as a bonus ;) I haven't runtime tested it
> > > > > > > yet, mostly because I have no idea how.
> > > > > > >
> > > > > > > Any further feedback is highly appreciated of course.
> > > > > > That sounds like it should work and at least the amdgpu changes now
> > > > > > look good to me on first glance.
> > > > > >
> > > > > > Can you split that up further in the usual way? E.g. adding the
> > > > > > blockable flag in one patch and fixing all implementations of the
> > > > > > MMU notifier in follow up patches.
> > > > > But such a code would be broken, no? Ignoring the blockable state will
> > > > > simply lead to lockups until the fixup parts get applied.
> > > > Well to still be bisect-able you only need to get the interface change
> > > > in first with fixing the function signature of the implementations.
> > > That would only work if those functions return -EAGAIN unconditionally.
> > > Otherwise they would pretend to not block while that would be obviously
> > > incorrect. This doesn't sound correct to me.
> > >
> > > > Then add all the new code to the implementations and last start to
> > > > actually use the new interface.
> > > >
> > > > That is a pattern we use regularly and I think it's good practice to
> > > > do this.
> > > But we do rely on the proper blockable handling.
> >
> > Yeah, but you could add the handling only after you have all the
> > implementations in place. Don't you?
>
> Yeah, but then I would be adding a code with no user. And I really
> prefer not to do so because then the code is harder to argue about.
>
> > > > > Is the split up really worth it? I was thinking about that but had
> > > > > hard times to end up with something that would be bisectable. Well,
> > > > > except for returning -EBUSY until all notifiers are implemented.
> > > > > Which I found confusing.
> > > > It at least makes reviewing changes much easier, cause as driver
> > > > maintainer I can concentrate on the stuff only related to me.
> > > >
> > > > Additional to that when you cause some unrelated side effect in a
> > > > driver we can much easier pinpoint the actual change later on when the
> > > > patch is smaller.
> > > >
> > > > > > This way I'm pretty sure Felix and I can give an rb on the
> > > > > > amdgpu/amdkfd changes.
> > > > > If you are worried to give r-b only for those then this can be done
> > > > > even for larger patches. Just make your Reviewed-by more specific
> > > > > R-b: name # For BLA BLA
> > > > Yeah, possible alternative but more work for me when I review it :)
> > > I definitely do not want to add more work to reviewers and I completely
> > > see how massive "flag days" like these are not popular but I really
> > > didn't find a reasonable way around that would be both correct and
> > > wouldn't add much more churn on the way. So if you really insist then I
> > > would really appreciate a hint on the way to achieve the same without
> > > any above downsides.
> >
> > Well, I don't insist on this. It's just from my point of view that this
> > patch doesn't need to be one patch, but could be split up.
>
> Well, if there are more people with the same concern I can try to do
> that. But if your only concern is to focus on your particular part then
> I guess it would be easier both for you and me to simply apply the patch
> and use git show $files_for_your_subsystem on your end. I have put the
> patch to attempts/oom-vs-mmu-notifiers branch to my tree at
> git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git


Not wanting to block something as important as this, so feel free to add 
an Acked-by: Christian König  to the patch.


Let's rather face the next topic: Any idea how to runtime test this?

I mean I can rather easily provide a test which crashes an AMD GPU, 
which in turn then would mean that the MMU notifier would block forever 
without this patch.


But do you know a way to let the OOM killer kill a specific process?

Regards,
Christian.


Re: [RFC PATCH] mm, oom: distinguish blockable mode for mmu notifiers

2018-07-02 Thread Michal Hocko
On Mon 02-07-18 14:24:29, Christian König wrote:
> Am 02.07.2018 um 14:20 schrieb Michal Hocko:
> > On Mon 02-07-18 14:13:42, Christian König wrote:
> > > Am 02.07.2018 um 13:54 schrieb Michal Hocko:
> > > > On Mon 02-07-18 11:14:58, Christian König wrote:
> > > > > Am 27.06.2018 um 09:44 schrieb Michal Hocko:
> > > > > > This is the v2 of RFC based on the feedback I've received so far.
> > > > > > The code even compiles as a bonus ;) I haven't runtime tested it
> > > > > > yet, mostly because I have no idea how.
> > > > > > 
> > > > > > Any further feedback is highly appreciated of course.
> > > > > That sounds like it should work and at least the amdgpu changes now
> > > > > look good to me on first glance.
> > > > > 
> > > > > Can you split that up further in the usual way? E.g. adding the
> > > > > blockable flag in one patch and fixing all implementations of the MMU
> > > > > notifier in follow up patches.
> > > > But such a code would be broken, no? Ignoring the blockable state will
> > > > simply lead to lockups until the fixup parts get applied.
> > > Well to still be bisect-able you only need to get the interface change in
> > > first with fixing the function signature of the implementations.
> > That would only work if those functions return -EAGAIN unconditionally.
> > Otherwise they would pretend to not block while that would be obviously
> > incorrect. This doesn't sound correct to me.
> > 
> > > Then add all the new code to the implementations and last start to
> > > actually use the new interface.
> > > 
> > > That is a pattern we use regularly and I think it's good practice to do
> > > this.
> > But we do rely on the proper blockable handling.
> 
> Yeah, but you could add the handling only after you have all the
> implementations in place. Don't you?

Yeah, but then I would be adding a code with no user. And I really
prefer not to do so because then the code is harder to argue about.

> > > > Is the split up really worth it? I was thinking about that but had hard
> > > > times to end up with something that would be bisectable. Well, except
> > > > for returning -EBUSY until all notifiers are implemented. Which I found
> > > > confusing.
> > > It at least makes reviewing changes much easier, cause as driver
> > > maintainer I can concentrate on the stuff only related to me.
> > > 
> > > Additional to that when you cause some unrelated side effect in a
> > > driver we can much easier pinpoint the actual change later on when the
> > > patch is smaller.
> > > 
> > > > > This way I'm pretty sure Felix and I can give an rb on the
> > > > > amdgpu/amdkfd changes.
> > > > If you are worried to give r-b only for those then this can be done even
> > > > for larger patches. Just make your Reviewed-by more specific
> > > > R-b: name # For BLA BLA
> > > Yeah, possible alternative but more work for me when I review it :)
> > I definitely do not want to add more work to reviewers and I completely
> > see how massive "flag days" like these are not popular but I really
> > didn't find a reasonable way around that would be both correct and
> > wouldn't add much more churn on the way. So if you really insist then I
> > would really appreciate a hint on the way to achieve the same without any
> > above downsides.
> 
> Well, I don't insist on this. It's just from my point of view that this
> patch doesn't need to be one patch, but could be split up.

Well, if there are more people with the same concern I can try to do
that. But if your only concern is to focus on your particular part then
I guess it would be easier both for you and me to simply apply the patch
and use git show $files_for_your_subsystem on your end. I have put the
patch to attempts/oom-vs-mmu-notifiers branch to my tree at
git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git
-- 
Michal Hocko
SUSE Labs


Re: [RFC PATCH] mm, oom: distinguish blockable mode for mmu notifiers

2018-07-02 Thread Christian König

Am 02.07.2018 um 14:20 schrieb Michal Hocko:
> On Mon 02-07-18 14:13:42, Christian König wrote:
> > Am 02.07.2018 um 13:54 schrieb Michal Hocko:
> > > On Mon 02-07-18 11:14:58, Christian König wrote:
> > > > Am 27.06.2018 um 09:44 schrieb Michal Hocko:
> > > > > This is the v2 of RFC based on the feedback I've received so far. The
> > > > > code even compiles as a bonus ;) I haven't runtime tested it yet, mostly
> > > > > because I have no idea how.
> > > > >
> > > > > Any further feedback is highly appreciated of course.
> > > > That sounds like it should work and at least the amdgpu changes now look
> > > > good to me on first glance.
> > > >
> > > > Can you split that up further in the usual way? E.g. adding the blockable
> > > > flag in one patch and fixing all implementations of the MMU notifier in
> > > > follow up patches.
> > > But such a code would be broken, no? Ignoring the blockable state will
> > > simply lead to lockups until the fixup parts get applied.
> >
> > Well to still be bisect-able you only need to get the interface change in
> > first with fixing the function signature of the implementations.
>
> That would only work if those functions return -EAGAIN unconditionally.
> Otherwise they would pretend to not block while that would be obviously
> incorrect. This doesn't sound correct to me.
>
> > Then add all the new code to the implementations and last start to actually
> > use the new interface.
> >
> > That is a pattern we use regularly and I think it's good practice to do
> > this.
>
> But we do rely on the proper blockable handling.

Yeah, but you could add the handling only after you have all the
implementations in place. Don't you?

> > > Is the split up really worth it? I was thinking about that but had hard
> > > times to end up with something that would be bisectable. Well, except
> > > for returning -EBUSY until all notifiers are implemented. Which I found
> > > confusing.
> >
> > It at least makes reviewing changes much easier, cause as driver maintainer
> > I can concentrate on the stuff only related to me.
> >
> > Additional to that when you cause some unrelated side effect in a driver we
> > can much easier pinpoint the actual change later on when the patch is
> > smaller.
> >
> > > > This way I'm pretty sure Felix and I can give an rb on the amdgpu/amdkfd
> > > > changes.
> > > If you are worried to give r-b only for those then this can be done even
> > > for larger patches. Just make your Reviewed-by more specific
> > > R-b: name # For BLA BLA
> >
> > Yeah, possible alternative but more work for me when I review it :)
>
> I definitely do not want to add more work to reviewers and I completely
> see how massive "flag days" like these are not popular but I really
> didn't find a reasonable way around that would be both correct and
> wouldn't add much more churn on the way. So if you really insist then I
> would really appreciate a hint on the way to achieve the same without any
> above downsides.


Well, I don't insist on this. It's just from my point of view that this
patch doesn't need to be one patch, but could be split up.


Could be that I just don't know the code or the consequences of adding 
that well enough to really judge.


Christian.


Re: [RFC PATCH] mm, oom: distinguish blockable mode for mmu notifiers

2018-07-02 Thread Michal Hocko
On Mon 02-07-18 14:13:42, Christian König wrote:
> Am 02.07.2018 um 13:54 schrieb Michal Hocko:
> > On Mon 02-07-18 11:14:58, Christian König wrote:
> > > Am 27.06.2018 um 09:44 schrieb Michal Hocko:
> > > > This is the v2 of RFC based on the feedback I've received so far. The
> > > > code even compiles as a bonus ;) I haven't runtime tested it yet, mostly
> > > > because I have no idea how.
> > > > 
> > > > Any further feedback is highly appreciated of course.
> > > That sounds like it should work and at least the amdgpu changes now look
> > > good to me on first glance.
> > > 
> > > Can you split that up further in the usual way? E.g. adding the blockable
> > > flag in one patch and fixing all implementations of the MMU notifier in
> > > follow up patches.
> > But such a code would be broken, no? Ignoring the blockable state will
> > simply lead to lockups until the fixup parts get applied.
> 
> Well to still be bisect-able you only need to get the interface change in
> first with fixing the function signature of the implementations.

That would only work if those functions return -EAGAIN unconditionally.
Otherwise they would pretend to not block while that would be obviously
incorrect. This doesn't sound correct to me.
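The argument above can be made concrete. If only the signatures were converted in a first step, the one implementation that is honest about not sleeping is a stub that fails every non-blockable call, because the unconverted code path may still sleep. A minimal userspace sketch, with hypothetical names (stub_invalidate_range_start, old_blocking_invalidate, slow_path_calls) and -EAGAIN assumed as the failure code:

```c
#include <errno.h>
#include <stdbool.h>

/* Counts how often the (possibly sleeping) legacy path ran. */
static int slow_path_calls;

/* Hypothetical stand-in for a driver's existing invalidation code. */
static void old_blocking_invalidate(unsigned long start, unsigned long end)
{
	(void)start;
	(void)end;
	slow_path_calls++;	/* in a real driver this may sleep */
}

/*
 * Signature-only conversion: correct, but maximally pessimistic.
 * Without a range check or trylock it cannot prove it won't sleep,
 * so it must fail every non-blockable call, even for unrelated ranges.
 */
static int stub_invalidate_range_start(unsigned long start, unsigned long end,
				       bool blockable)
{
	if (!blockable)
		return -EAGAIN;

	old_blocking_invalidate(start, end);
	return 0;
}
```

Such stubs would keep every intermediate commit bisectable, but until the real handling lands each non-blockable invalidation would fail, so the oom_reaper would gain nothing from the intermediate steps.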

> Then add all the new code to the implementations and last start to actually
> use the new interface.
> 
> That is a pattern we use regularly and I think it's good practice to do
> this.

But we do rely on the proper blockable handling.

> > Is the split up really worth it? I was thinking about that but had hard
> > times to end up with something that would be bisectable. Well, except
> > for returning -EBUSY until all notifiers are implemented. Which I found
> > confusing.
> 
> It at least makes reviewing changes much easier, cause as driver maintainer
> I can concentrate on the stuff only related to me.
> 
> Additional to that when you cause some unrelated side effect in a driver we
> can much easier pinpoint the actual change later on when the patch is
> smaller.
> 
> > 
> > > This way I'm pretty sure Felix and I can give an rb on the amdgpu/amdkfd
> > > changes.
> > If you are worried to give r-b only for those then this can be done even
> > for larger patches. Just make your Reviewed-by more specific
> > R-b: name # For BLA BLA
> 
> Yeah, possible alternative but more work for me when I review it :)

I definitely do not want to add more work for reviewers, and I completely
see how massive "flag days" like these are not popular, but I really
didn't find a reasonable way around it that would be both correct and
not add much more churn along the way. So if you really insist, I
would really appreciate a hint on how to achieve the same without the
above downsides.
-- 
Michal Hocko
SUSE Labs


Re: [RFC PATCH] mm, oom: distinguish blockable mode for mmu notifiers

2018-07-02 Thread Christian König

Am 02.07.2018 um 13:54 schrieb Michal Hocko:
> On Mon 02-07-18 11:14:58, Christian König wrote:
> > Am 27.06.2018 um 09:44 schrieb Michal Hocko:
> > > This is the v2 of RFC based on the feedback I've received so far. The
> > > code even compiles as a bonus ;) I haven't runtime tested it yet, mostly
> > > because I have no idea how.
> > >
> > > Any further feedback is highly appreciated of course.
> > That sounds like it should work and at least the amdgpu changes now look
> > good to me on first glance.
> >
> > Can you split that up further in the usual way? E.g. adding the blockable
> > flag in one patch and fixing all implementations of the MMU notifier in
> > follow up patches.
> But such a code would be broken, no? Ignoring the blockable state will
> simply lead to lockups until the fixup parts get applied.


Well, to stay bisectable you only need to get the interface change in
first, together with fixing the function signatures of the implementations.


Then add all the new code to the implementations and finally start to
actually use the new interface.


That is a pattern we use regularly and I think it's good practice to do 
this.



> Is the split up really worth it? I was thinking about that but had hard
> times to end up with something that would be bisectable. Well, except
> for returning -EBUSY until all notifiers are implemented. Which I found
> confusing.


It at least makes reviewing changes much easier, because as a driver
maintainer I can concentrate on only the stuff related to me.


Additionally, when you cause some unrelated side effect in a driver,
we can pinpoint the actual change much more easily later on when the
patch is smaller.

> > This way I'm pretty sure Felix and I can give an rb on the amdgpu/amdkfd
> > changes.
>
> If you are worried to give r-b only for those then this can be done even
> for larger patches. Just make your Reviewed-by more specific
> R-b: name # For BLA BLA


Yeah, possible alternative but more work for me when I review it :)

Regards,
Christian.


Re: [RFC PATCH] mm, oom: distinguish blockable mode for mmu notifiers

2018-07-02 Thread Michal Hocko
On Mon 02-07-18 11:14:58, Christian König wrote:
> Am 27.06.2018 um 09:44 schrieb Michal Hocko:
> > This is the v2 of RFC based on the feedback I've received so far. The
> > code even compiles as a bonus ;) I haven't runtime tested it yet, mostly
> > because I have no idea how.
> > 
> > Any further feedback is highly appreciated of course.
> 
> That sounds like it should work and at least the amdgpu changes now look
> good to me on first glance.
> 
> Can you split that up further in the usual way? E.g. adding the blockable
> flag in one patch and fixing all implementations of the MMU notifier in
> follow up patches.

But such code would be broken, no? Ignoring the blockable state will
simply lead to lockups until the fixup parts get applied.

Is the split up really worth it? I was thinking about that but had a hard
time coming up with something that would be bisectable. Well, except
for returning -EBUSY until all notifiers are implemented, which I found
confusing.

> This way I'm pretty sure Felix and I can give an rb on the amdgpu/amdkfd
> changes.

If you are worried to give r-b only for those then this can be done even
for larger patches. Just make your Reviewed-by more specific
R-b: name # For BLA BLA
-- 
Michal Hocko
SUSE Labs


Re: [RFC PATCH] mm, oom: distinguish blockable mode for mmu notifiers

2018-07-02 Thread Christian König

Am 27.06.2018 um 09:44 schrieb Michal Hocko:
> This is the v2 of RFC based on the feedback I've received so far. The
> code even compiles as a bonus ;) I haven't runtime tested it yet, mostly
> because I have no idea how.
>
> Any further feedback is highly appreciated of course.


That sounds like it should work, and at least the amdgpu changes now look
good to me at first glance.


Can you split that up further in the usual way? E.g. adding the
blockable flag in one patch and fixing all implementations of the MMU
notifier in follow-up patches.


This way I'm pretty sure Felix and I can give an rb on the amdgpu/amdkfd 
changes.


Thanks,
Christian.



Re: [RFC PATCH] mm, oom: distinguish blockable mode for mmu notifiers

2018-06-27 Thread Michal Hocko
This is the v2 of RFC based on the feedback I've received so far. The
code even compiles as a bonus ;) I haven't runtime tested it yet, mostly
because I have no idea how.

Any further feedback is highly appreciated of course.
---
From ec9a7241bf422b908532c4c33953b0da2655ad05 Mon Sep 17 00:00:00 2001
From: Michal Hocko 
Date: Wed, 20 Jun 2018 15:03:20 +0200
Subject: [PATCH] mm, oom: distinguish blockable mode for mmu notifiers
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

There are several blockable mmu notifiers which might sleep in
mmu_notifier_invalidate_range_start and that is a problem for the
oom_reaper because it needs to guarantee forward progress, so it cannot
depend on any sleepable locks.

Currently we simply back off and mark an oom victim with blockable mmu
notifiers as done after a short sleep. That can result in selecting a
new oom victim prematurely because the previous one still hasn't torn
its memory down yet.

We can do much better though. Even if mmu notifiers use sleepable locks
there is no reason to automatically assume those locks are held.
Moreover, the majority of notifiers only care about a portion of the address
space and there is absolutely zero reason to fail when we are unmapping an
unrelated range. Many notifiers really do block and wait for HW, which is
harder to handle, and there we have to bail out.

This patch handles the low-hanging fruit. __mmu_notifier_invalidate_range_start
gets a blockable flag and callbacks are not allowed to sleep if the
flag is set to false. This is achieved by using trylock instead of the
sleepable lock for most callbacks and continuing as long as we do not
block down the call chain.
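A minimal userspace sketch of that trylock-or-bail idea, modeled loosely on the amdgpu_mn_read_lock() change posted in this thread. POSIX mutexes stand in for the kernel mutex API, and the names (`demo_notifier`, `demo_lock`, `demo_invalidate_range_start`) are hypothetical:

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical per-notifier state. */
struct demo_notifier {
	pthread_mutex_t lock;
	int invalidations;
};

/*
 * Sleep on the lock only if the caller allows it; otherwise try once and
 * report -EAGAIN so the caller (e.g. the oom_reaper) can retry later.
 */
static int demo_lock(struct demo_notifier *n, bool blockable)
{
	if (blockable)
		pthread_mutex_lock(&n->lock);
	else if (pthread_mutex_trylock(&n->lock) != 0)
		return -EAGAIN;
	return 0;
}

static int demo_invalidate_range_start(struct demo_notifier *n,
				       unsigned long start, unsigned long end,
				       bool blockable)
{
	(void)start;
	(void)end;

	if (demo_lock(n, blockable))
		return -EAGAIN;
	n->invalidations++;	/* stand-in for the real invalidation work */
	pthread_mutex_unlock(&n->lock);
	return 0;
}
```

This only helps when the lock is merely contended; as Christian points out in this thread, a callback that has to wait for hardware cannot be made non-blocking with a trylock.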

I think we can improve that even further because there is a common
pattern to do a range lookup first and then do something about that.
The first part can be done without a sleeping lock in most cases AFAICS.
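A sketch of that lookup-first pattern, assuming a plain array of tracked ranges in place of the interval tree the drivers actually use. The lookup never sleeps, so an unrelated range can be acknowledged even in !blockable mode, and only an intersecting range makes the callback back off. It also mirrors the `end -= 1` step in the amdgpu callbacks (the notification range is exclusive, the tracked intervals are inclusive). Names are hypothetical and locking is omitted:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tracked range, inclusive on both ends like an interval tree node. */
struct demo_range {
	unsigned long first;
	unsigned long last;
};

/* True if [a_first, a_last] and [b_first, b_last] overlap (inclusive bounds). */
static bool demo_overlaps(unsigned long a_first, unsigned long a_last,
			  unsigned long b_first, unsigned long b_last)
{
	return a_first <= b_last && b_first <= a_last;
}

/*
 * Look up first, act second: unrelated ranges are fine in any mode, an
 * intersecting range needs the (possibly sleeping) work and therefore
 * makes a non-blockable caller back off.
 */
static int demo_range_start(const struct demo_range *ranges, size_t n,
			    unsigned long start, unsigned long end,
			    bool blockable, size_t *invalidated)
{
	/* Notification is [start, end), tracked ranges are inclusive. */
	unsigned long last = end - 1;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!demo_overlaps(ranges[i].first, ranges[i].last, start, last))
			continue;
		if (!blockable)
			return -EAGAIN;
		(*invalidated)++;	/* stand-in for waiting on HW and unmapping */
	}
	return 0;
}
```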

The oom_reaper end then simply retries if there is at least one notifier
which couldn't make any progress in !blockable mode. A retry loop is
already implemented to wait for the mmap_sem and this is basically the
same thing.
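A sketch of such a bounded retry loop. It is a userspace model, not the mm/oom_kill.c code: `demo_reap_once()` is a hypothetical stand-in for one reaping attempt that fails while some notifier answers -EAGAIN, and the retry budget is an assumed value:

```c
#include <stdbool.h>

/* Assumed retry budget for this sketch. */
#define DEMO_MAX_RETRIES 10

/* Hypothetical victim whose notifier stays busy for a few attempts. */
struct demo_victim {
	int busy_attempts;	/* attempts the notifier will still refuse */
	bool reaped;
};

/* One reaping attempt; fails while any notifier cannot make progress. */
static bool demo_reap_once(struct demo_victim *v)
{
	if (v->busy_attempts > 0) {
		v->busy_attempts--;	/* models a notifier returning -EAGAIN */
		return false;
	}
	v->reaped = true;
	return true;
}

/* Returns the attempt that succeeded, or -1 once the budget is spent. */
static int demo_reap(struct demo_victim *v)
{
	int attempt;

	for (attempt = 1; attempt <= DEMO_MAX_RETRIES; attempt++) {
		if (demo_reap_once(v))
			return attempt;
		/* the kernel would sleep briefly here before retrying */
	}
	return -1;
}
```

In this model, exhausting the budget corresponds to giving up on the victim as before; the difference is that a notifier which was only briefly busy gets several more chances first.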

Changes since rfc v1
- gpu notifiers can sleep while waiting for HW (evict_process_queues_cpsch
  on a lock and amdgpu_mn_invalidate_node on an unbounded timeout); make sure
  we bail out when we have an intersecting range, for starters
- log that a notifier failed, for easier debugging
- back off early in ib_umem_notifier_invalidate_range_start if the
  callback is called
- mn_invl_range_start waits for completion down the unmap_grant_pages
  path so we have to back off early on overlapping ranges
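A userspace sketch of the aggregation side, assuming the core walks every registered notifier, keeps going after a failure, and logs which callback refused (the debugging aid mentioned in the list above). The callback type, the two demo notifiers and `demo_notify_range_start()` are hypothetical:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical callback type mirroring the new int-returning signature. */
typedef int (*demo_range_start_fn)(unsigned long start, unsigned long end,
				   bool blockable);

struct demo_notifier_ops {
	const char *name;		/* only used for the log message */
	demo_range_start_fn range_start;
};

static int demo_ok_calls;

/* A notifier that never needs to sleep for this range. */
static int demo_always_ok(unsigned long start, unsigned long end, bool blockable)
{
	(void)start;
	(void)end;
	(void)blockable;
	demo_ok_calls++;
	return 0;
}

/* A notifier that would wait for hardware, so it refuses when it may not sleep. */
static int demo_busy_unless_blockable(unsigned long start, unsigned long end,
				      bool blockable)
{
	(void)start;
	(void)end;
	return blockable ? 0 : -EAGAIN;
}

/*
 * Call every notifier even after one fails, log each failure, and return
 * the last error so the caller knows it has to retry.
 */
static int demo_notify_range_start(const struct demo_notifier_ops *ops,
				   size_t n, unsigned long start,
				   unsigned long end, bool blockable)
{
	int ret = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		int err;

		if (!ops[i].range_start)
			continue;
		err = ops[i].range_start(start, end, blockable);
		if (err) {
			fprintf(stderr, "%s callback failed with %d in %sblockable context\n",
				ops[i].name, err, blockable ? "" : "non-");
			ret = err;
		}
	}
	return ret;
}
```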

Cc: "David (ChunMing) Zhou" 
Cc: Paolo Bonzini 
Cc: "Radim Krčmář" 
Cc: Alex Deucher 
Cc: "Christian König" 
Cc: David Airlie 
Cc: Jani Nikula 
Cc: Joonas Lahtinen 
Cc: Rodrigo Vivi 
Cc: Doug Ledford 
Cc: Jason Gunthorpe 
Cc: Mike Marciniszyn 
Cc: Dennis Dalessandro 
Cc: Sudeep Dutt 
Cc: Ashutosh Dixit 
Cc: Dimitri Sivanich 
Cc: Boris Ostrovsky 
Cc: Juergen Gross 
Cc: "Jérôme Glisse" 
Cc: Andrea Arcangeli 
Cc: Felix Kuehling 
Cc: k...@vger.kernel.org (open list:KERNEL VIRTUAL MACHINE FOR X86 (KVM/x86))
Cc: linux-ker...@vger.kernel.org (open list:X86 ARCHITECTURE (32-BIT AND 64-BIT))
Cc: amd-gfx@lists.freedesktop.org (open list:RADEON and AMDGPU DRM DRIVERS)
Cc: dri-de...@lists.freedesktop.org (open list:DRM DRIVERS)
Cc: intel-...@lists.freedesktop.org (open list:INTEL DRM DRIVERS (excluding Poulsbo, Moorestow...)
Cc: linux-r...@vger.kernel.org (open list:INFINIBAND SUBSYSTEM)
Cc: xen-de...@lists.xenproject.org (moderated list:XEN HYPERVISOR INTERFACE)
Cc: linux...@kvack.org (open list:HMM - Heterogeneous Memory Management)
Reported-by: David Rientjes 
Signed-off-by: Michal Hocko 
---
 arch/x86/kvm/x86.c                      |  7 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c  | 43 +++-
 drivers/gpu/drm/i915/i915_gem_userptr.c | 13 ++--
 drivers/gpu/drm/radeon/radeon_mn.c      | 22 +++--
 drivers/infiniband/core/umem_odp.c      | 33 +++
 drivers/infiniband/hw/hfi1/mmu_rb.c     | 11 ---
 drivers/infiniband/hw/mlx5/odp.c        |  2 +-
 drivers/misc/mic/scif/scif_dma.c        |  7 ++--
 drivers/misc/sgi-gru/grutlbpurge.c      |  7 ++--
 drivers/xen/gntdev.c                    | 44 -
 include/linux/kvm_host.h                |  4 +--
 include/linux/mmu_notifier.h            | 35 +++-
 include/linux/oom.h                     |  2 +-
 include/rdma/ib_umem_odp.h              |  3 +-
 mm/hmm.c                                |  7 ++--
 mm/mmap.c                               |  2 +-
 mm/mmu_notifier.c                       | 19 ---
 mm/oom_kill.c                           | 29
 virt/kvm/kvm_main.c                     | 15 ++---
 19 files changed, 225 insertions(+), 80 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6bcecc325e7e..ac08f5d711be 100644
--- 

Re: [RFC PATCH] mm, oom: distinguish blockable mode for mmu notifiers

2018-06-25 Thread Michal Hocko
On Mon 25-06-18 10:01:03, Michal Hocko wrote:
> On Fri 22-06-18 16:09:06, Felix Kuehling wrote:
> > On 2018-06-22 11:24 AM, Michal Hocko wrote:
> > > On Fri 22-06-18 17:13:02, Christian König wrote:
> > >> Hi Michal,
> > >>
> > >> [Adding Felix as well]
> > >>
> > >> Well first of all you have a misconception why at least the AMD graphics
> > >> driver need to be able to sleep in an MMU notifier: We need to sleep because
> > >> we need to wait for hardware operations to finish and *NOT* because we need
> > >> to wait for locks.
> > >>
> > >> I'm not sure if your flag now means that you generally can't sleep in MMU
> > >> notifiers any more, but if that's the case at least AMD hardware will break
> > >> badly. In our case the approach of waiting for a short time for the process
> > >> to be reaped and then select another victim actually sounds like the right
> > >> thing to do.
> > > Well, I do not need to make the notifier code non blocking all the time.
> > > All I need is to ensure that it won't sleep if the flag says so and
> > > return -EAGAIN instead.
> > >
> > > So here is what I do for amdgpu:
> > 
> > In the case of KFD we also need to take the DQM lock:
> > 
> > amdgpu_mn_invalidate_range_start_hsa -> amdgpu_amdkfd_evict_userptr ->
> > kgd2kfd_quiesce_mm -> kfd_process_evict_queues -> evict_process_queues_cpsch
> > 
> > So we'd need to pass the blockable parameter all the way through that
> > call chain.
> 
> Thanks, I have missed that part. So I guess I will start with something
> similar to intel-gfx and back off when the current range needs some
> treatment. So this on top. Does it look correct?
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
> index d138a526feff..e2d422b3eb0b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
> @@ -266,6 +266,11 @@ static int amdgpu_mn_invalidate_range_start_hsa(struct mmu_notifier *mn,
>   struct amdgpu_mn_node *node;
>   struct amdgpu_bo *bo;
>  
> + if (!blockable) {
> + amdgpu_mn_read_unlock();
> + return -EAGAIN;
> + }
> +
>   node = container_of(it, struct amdgpu_mn_node, it);
>   it = interval_tree_iter_next(it, start, end);

Bleh, I just noticed that half of the change didn't make it into the git index...
This is what I have:
commit c4701b36ac2802b903db3d05cf77c030fccce3a8
Author: Michal Hocko 
Date:   Mon Jun 25 15:24:03 2018 +0200

fold me

- amd gpu notifiers can sleep deeper in the callchain (evict_process_queues_cpsch
  on a lock and amdgpu_mn_invalidate_node on unbound timeout) make sure
  we bail out when we have an intersecting range for starter

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
index d138a526feff..3399a4a927fb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
@@ -225,6 +225,11 @@ static int amdgpu_mn_invalidate_range_start_gfx(struct mmu_notifier *mn,
while (it) {
struct amdgpu_mn_node *node;
 
+   if (!blockable) {
+   amdgpu_mn_read_unlock(rmn);
+   return -EAGAIN;
+   }
+
node = container_of(it, struct amdgpu_mn_node, it);
it = interval_tree_iter_next(it, start, end);
 
@@ -266,6 +271,11 @@ static int amdgpu_mn_invalidate_range_start_hsa(struct mmu_notifier *mn,
struct amdgpu_mn_node *node;
struct amdgpu_bo *bo;
 
+   if (!blockable) {
+   amdgpu_mn_read_unlock(rmn);
+   return -EAGAIN;
+   }
+
node = container_of(it, struct amdgpu_mn_node, it);
it = interval_tree_iter_next(it, start, end);
 
-- 
Michal Hocko
SUSE Labs


Re: [RFC PATCH] mm, oom: distinguish blockable mode for mmu notifiers

2018-06-25 Thread Michal Hocko
On Fri 22-06-18 16:09:06, Felix Kuehling wrote:
> On 2018-06-22 11:24 AM, Michal Hocko wrote:
> > On Fri 22-06-18 17:13:02, Christian König wrote:
> >> Hi Michal,
> >>
> >> [Adding Felix as well]
> >>
> >> Well first of all you have a misconception why at least the AMD graphics
> >> driver need to be able to sleep in an MMU notifier: We need to sleep because
> >> we need to wait for hardware operations to finish and *NOT* because we need
> >> to wait for locks.
> >>
> >> I'm not sure if your flag now means that you generally can't sleep in MMU
> >> notifiers any more, but if that's the case at least AMD hardware will break
> >> badly. In our case the approach of waiting for a short time for the process
> >> to be reaped and then select another victim actually sounds like the right
> >> thing to do.
> > Well, I do not need to make the notifier code non blocking all the time.
> > All I need is to ensure that it won't sleep if the flag says so and
> > return -EAGAIN instead.
> >
> > So here is what I do for amdgpu:
> 
> In the case of KFD we also need to take the DQM lock:
> 
> amdgpu_mn_invalidate_range_start_hsa -> amdgpu_amdkfd_evict_userptr ->
> kgd2kfd_quiesce_mm -> kfd_process_evict_queues -> evict_process_queues_cpsch
> 
> So we'd need to pass the blockable parameter all the way through that
> call chain.

Thanks, I have missed that part. So I guess I will start with something
similar to intel-gfx and back off when the current range needs some
treatment. So this on top. Does it look correct?

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
index d138a526feff..e2d422b3eb0b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
@@ -266,6 +266,11 @@ static int amdgpu_mn_invalidate_range_start_hsa(struct mmu_notifier *mn,
struct amdgpu_mn_node *node;
struct amdgpu_bo *bo;
 
+   if (!blockable) {
+   amdgpu_mn_read_unlock();
+   return -EAGAIN;
+   }
+
node = container_of(it, struct amdgpu_mn_node, it);
it = interval_tree_iter_next(it, start, end);
 
-- 
Michal Hocko
SUSE Labs


Re: [RFC PATCH] mm, oom: distinguish blockable mode for mmu notifiers

2018-06-22 Thread Felix Kuehling
On 2018-06-22 11:24 AM, Michal Hocko wrote:
> On Fri 22-06-18 17:13:02, Christian König wrote:
>> Hi Michal,
>>
>> [Adding Felix as well]
>>
>> Well first of all you have a misconception why at least the AMD graphics
>> driver need to be able to sleep in an MMU notifier: We need to sleep because
>> we need to wait for hardware operations to finish and *NOT* because we need
>> to wait for locks.
>>
>> I'm not sure if your flag now means that you generally can't sleep in MMU
>> notifiers any more, but if that's the case at least AMD hardware will break
>> badly. In our case the approach of waiting for a short time for the process
>> to be reaped and then select another victim actually sounds like the right
>> thing to do.
> Well, I do not need to make the notifier code non blocking all the time.
> All I need is to ensure that it won't sleep if the flag says so and
> return -EAGAIN instead.
>
> So here is what I do for amdgpu:

In the case of KFD we also need to take the DQM lock:

amdgpu_mn_invalidate_range_start_hsa -> amdgpu_amdkfd_evict_userptr ->
kgd2kfd_quiesce_mm -> kfd_process_evict_queues -> evict_process_queues_cpsch

So we'd need to pass the blockable parameter all the way through that
call chain.
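A sketch of what threading the flag through such a chain means, assuming only the deepest function takes the lock. The function names are hypothetical stand-ins loosely following the chain above, and a POSIX mutex plays the role of the DQM lock:

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the device queue manager (DQM) lock. */
static pthread_mutex_t demo_dqm_lock = PTHREAD_MUTEX_INITIALIZER;
static int demo_evictions;

/* Deepest level: the only place that actually takes the lock. */
static int demo_evict_queues(bool blockable)
{
	if (blockable)
		pthread_mutex_lock(&demo_dqm_lock);
	else if (pthread_mutex_trylock(&demo_dqm_lock) != 0)
		return -EAGAIN;
	demo_evictions++;
	pthread_mutex_unlock(&demo_dqm_lock);
	return 0;
}

/* Every intermediate level must accept the flag and forward the error. */
static int demo_quiesce_mm(bool blockable)
{
	return demo_evict_queues(blockable);
}

static int demo_evict_userptr(bool blockable)
{
	return demo_quiesce_mm(blockable);
}

/* Top level: the notifier callback itself. */
static int demo_range_start_hsa(bool blockable)
{
	return demo_evict_userptr(blockable);
}
```

Every intermediate level has to grow the parameter and forward the error, which is why the v2 patch took the simpler route of backing off early whenever the range intersects; a trylock would also not help with the hardware waits Christian describes.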

Regards,
  Felix

>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
>>> index 83e344fbb50a..d138a526feff 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
>>> @@ -136,12 +136,18 @@ void amdgpu_mn_unlock(struct amdgpu_mn *mn)
>>>*
>>>* Take the rmn read side lock.
>>>*/
>>> -static void amdgpu_mn_read_lock(struct amdgpu_mn *rmn)
>>> +static int amdgpu_mn_read_lock(struct amdgpu_mn *rmn, bool blockable)
>>>   {
>>> -   mutex_lock(&rmn->read_lock);
>>> +   if (blockable)
>>> +   mutex_lock(&rmn->read_lock);
>>> +   else if (!mutex_trylock(&rmn->read_lock))
>>> +   return -EAGAIN;
>>> +
>>> if (atomic_inc_return(&rmn->recursion) == 1)
>>> down_read_non_owner(&rmn->lock);
>>> mutex_unlock(&rmn->read_lock);
>>> +
>>> +   return 0;
>>>   }
>>>   /**
>>> @@ -197,10 +203,11 @@ static void amdgpu_mn_invalidate_node(struct amdgpu_mn_node *node,
>>>* We block for all BOs between start and end to be idle and
>>>* unmap them by move them into system domain again.
>>>*/
>>> -static void amdgpu_mn_invalidate_range_start_gfx(struct mmu_notifier *mn,
>>> +static int amdgpu_mn_invalidate_range_start_gfx(struct mmu_notifier *mn,
>>>  struct mm_struct *mm,
>>>  unsigned long start,
>>> -unsigned long end)
>>> +unsigned long end,
>>> +bool blockable)
>>>   {
>>> struct amdgpu_mn *rmn = container_of(mn, struct amdgpu_mn, mn);
>>> struct interval_tree_node *it;
>>> @@ -208,7 +215,11 @@ static void amdgpu_mn_invalidate_range_start_gfx(struct mmu_notifier *mn,
>>> /* notification is exclusive, but interval is inclusive */
>>> end -= 1;
>>> -   amdgpu_mn_read_lock(rmn);
>>> +   /* TODO we should be able to split locking for interval tree and
>>> +* amdgpu_mn_invalidate_node
>>> +*/
>>> +   if (amdgpu_mn_read_lock(rmn, blockable))
>>> +   return -EAGAIN;
>>> it = interval_tree_iter_first(&rmn->objects, start, end);
>>> while (it) {
>>> @@ -219,6 +230,8 @@ static void amdgpu_mn_invalidate_range_start_gfx(struct mmu_notifier *mn,
>>> amdgpu_mn_invalidate_node(node, start, end);
>>> }
>>> +
>>> +   return 0;
>>>   }
>>>   /**
>>> @@ -233,10 +246,11 @@ static void amdgpu_mn_invalidate_range_start_gfx(struct mmu_notifier *mn,
>>>* necessitates evicting all user-mode queues of the process. The BOs
>>>* are restorted in amdgpu_mn_invalidate_range_end_hsa.
>>>*/
>>> -static void amdgpu_mn_invalidate_range_start_hsa(struct mmu_notifier *mn,
>>> +static int amdgpu_mn_invalidate_range_start_hsa(struct mmu_notifier *mn,
>>>  struct mm_struct *mm,
>>>  unsigned long start,
>>> -unsigned long end)
>>> +unsigned long end,
>>> +bool blockable)
>>>   {
>>> struct amdgpu_mn *rmn = container_of(mn, struct amdgpu_mn, mn);
>>> struct interval_tree_node *it;
>>> @@ -244,7 +258,8 @@ static void amdgpu_mn_invalidate_range_start_hsa(struct mmu_notifier *mn,
>>> /* notification is exclusive, but interval is inclusive */
>>> end -= 1;
>>> -   amdgpu_mn_read_lock(rmn);
>>> +   if (amdgpu_mn_read_lock(rmn, blockable))
>>> +   return -EAGAIN;
>>> it = interval_tree_iter_first(&rmn->objects, start, end);
>>> while (it) {
>>> @@ -262,6 +277,8 @@ static void 

Re: [RFC PATCH] mm, oom: distinguish blockable mode for mmu notifiers

2018-06-22 Thread Michal Hocko
On Fri 22-06-18 17:13:02, Christian König wrote:
> Hi Michal,
> 
> [Adding Felix as well]
> 
> Well first of all you have a misconception why at least the AMD graphics
> driver need to be able to sleep in an MMU notifier: We need to sleep because
> we need to wait for hardware operations to finish and *NOT* because we need
> to wait for locks.
> 
> I'm not sure if your flag now means that you generally can't sleep in MMU
> notifiers any more, but if that's the case at least AMD hardware will break
> badly. In our case the approach of waiting for a short time for the process
> to be reaped and then select another victim actually sounds like the right
> thing to do.

Well, I do not need to make the notifier code non-blocking all the time.
All I need is to ensure that it won't sleep if the flag says so and
return -EAGAIN instead.

So here is what I do for amdgpu:

> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
> > index 83e344fbb50a..d138a526feff 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
> > @@ -136,12 +136,18 @@ void amdgpu_mn_unlock(struct amdgpu_mn *mn)
> >*
> >* Take the rmn read side lock.
> >*/
> > -static void amdgpu_mn_read_lock(struct amdgpu_mn *rmn)
> > +static int amdgpu_mn_read_lock(struct amdgpu_mn *rmn, bool blockable)
> >   {
> > -   mutex_lock(&rmn->read_lock);
> > +   if (blockable)
> > +   mutex_lock(&rmn->read_lock);
> > +   else if (!mutex_trylock(&rmn->read_lock))
> > +   return -EAGAIN;
> > +
> > if (atomic_inc_return(&rmn->recursion) == 1)
> > down_read_non_owner(&rmn->lock);
> > mutex_unlock(&rmn->read_lock);
> > +
> > +   return 0;
> >   }
> >   /**
> > @@ -197,10 +203,11 @@ static void amdgpu_mn_invalidate_node(struct amdgpu_mn_node *node,
> >* We block for all BOs between start and end to be idle and
> >* unmap them by move them into system domain again.
> >*/
> > -static void amdgpu_mn_invalidate_range_start_gfx(struct mmu_notifier *mn,
> > +static int amdgpu_mn_invalidate_range_start_gfx(struct mmu_notifier *mn,
> >  struct mm_struct *mm,
> >  unsigned long start,
> > -unsigned long end)
> > +unsigned long end,
> > +bool blockable)
> >   {
> > struct amdgpu_mn *rmn = container_of(mn, struct amdgpu_mn, mn);
> > struct interval_tree_node *it;
> > @@ -208,7 +215,11 @@ static void amdgpu_mn_invalidate_range_start_gfx(struct mmu_notifier *mn,
> > /* notification is exclusive, but interval is inclusive */
> > end -= 1;
> > -   amdgpu_mn_read_lock(rmn);
> > +   /* TODO we should be able to split locking for interval tree and
> > +* amdgpu_mn_invalidate_node
> > +*/
> > +   if (amdgpu_mn_read_lock(rmn, blockable))
> > +   return -EAGAIN;
> > it = interval_tree_iter_first(&rmn->objects, start, end);
> > while (it) {
> > @@ -219,6 +230,8 @@ static void amdgpu_mn_invalidate_range_start_gfx(struct mmu_notifier *mn,
> > amdgpu_mn_invalidate_node(node, start, end);
> > }
> > +
> > +   return 0;
> >   }
> >   /**
> > @@ -233,10 +246,11 @@ static void amdgpu_mn_invalidate_range_start_gfx(struct mmu_notifier *mn,
> >* necessitates evicting all user-mode queues of the process. The BOs
> >* are restorted in amdgpu_mn_invalidate_range_end_hsa.
> >*/
> > -static void amdgpu_mn_invalidate_range_start_hsa(struct mmu_notifier *mn,
> > +static int amdgpu_mn_invalidate_range_start_hsa(struct mmu_notifier *mn,
> >  struct mm_struct *mm,
> >  unsigned long start,
> > -unsigned long end)
> > +unsigned long end,
> > +bool blockable)
> >   {
> > struct amdgpu_mn *rmn = container_of(mn, struct amdgpu_mn, mn);
> > struct interval_tree_node *it;
> > @@ -244,7 +258,8 @@ static void amdgpu_mn_invalidate_range_start_hsa(struct mmu_notifier *mn,
> > /* notification is exclusive, but interval is inclusive */
> > end -= 1;
> > -   amdgpu_mn_read_lock(rmn);
> > +   if (amdgpu_mn_read_lock(rmn, blockable))
> > +   return -EAGAIN;
> > it = interval_tree_iter_first(&rmn->objects, start, end);
> > while (it) {
> > @@ -262,6 +277,8 @@ static void amdgpu_mn_invalidate_range_start_hsa(struct mmu_notifier *mn,
> > amdgpu_amdkfd_evict_userptr(mem, mm);
> > }
> > }
> > +
> > +   return 0;
> >   }
> >   /**
-- 
Michal Hocko
SUSE Labs


Re: [RFC PATCH] mm, oom: distinguish blockable mode for mmu notifiers

2018-06-22 Thread Christian König

Hi Michal,

[Adding Felix as well]

Well, first of all, you have a misconception about why at least the AMD
graphics driver needs to be able to sleep in an MMU notifier: we need to
sleep because we need to wait for hardware operations to finish and *NOT*
because we need to wait for locks.


I'm not sure if your flag now means that you generally can't sleep in
MMU notifiers any more, but if that's the case, at least AMD hardware
will break badly. In our case the approach of waiting for a short time
for the process to be reaped and then selecting another victim actually
sounds like the right thing to do.


What we also already try to do is to abort hardware operations with the 
address space when we detect that the process is dying, but that can 
certainly be improved.


Regards,
Christian.

Am 22.06.2018 um 17:02 schrieb Michal Hocko:

From: Michal Hocko 

There are several blockable mmu notifiers which might sleep in
mmu_notifier_invalidate_range_start and that is a problem for the
oom_reaper because it needs to guarantee a forward progress so it cannot
depend on any sleepable locks. Currently we simply back off and mark an
oom victim with blockable mmu notifiers as done after a short sleep.
That can result in selecting a new oom victim prematurely because the
previous one still hasn't torn its memory down yet.

We can do much better though. Even if mmu notifiers use sleepable locks
there is no reason to automatically assume those locks are held.
Moreover most notifiers only care about a portion of the address
space. This patch handles the first part of the problem.
__mmu_notifier_invalidate_range_start gets a blockable flag and
callbacks are not allowed to sleep if the flag is set to false. This is
achieved by using trylock instead of the sleepable lock for most
callbacks. I think we can improve that even further because there is
a common pattern to do a range lookup first and then do something about
that. The first part can be done without a sleeping lock I presume.

Anyway, what does the oom_reaper do with all that? We do not have to
fail right away. We simply retry if there is at least one notifier which
couldn't make any progress. A retry loop is already implemented to wait
for the mmap_sem and this is basically the same thing.

Cc: "David (ChunMing) Zhou" 
Cc: Paolo Bonzini 
Cc: "Radim Krčmář" 
Cc: Alex Deucher 
Cc: "Christian König" 
Cc: David Airlie 
Cc: Jani Nikula 
Cc: Joonas Lahtinen 
Cc: Rodrigo Vivi 
Cc: Doug Ledford 
Cc: Jason Gunthorpe 
Cc: Mike Marciniszyn 
Cc: Dennis Dalessandro 
Cc: Sudeep Dutt 
Cc: Ashutosh Dixit 
Cc: Dimitri Sivanich 
Cc: Boris Ostrovsky 
Cc: Juergen Gross 
Cc: "Jérôme Glisse" 
Cc: Andrea Arcangeli 
Cc: k...@vger.kernel.org (open list:KERNEL VIRTUAL MACHINE FOR X86 (KVM/x86))
Cc: linux-ker...@vger.kernel.org (open list:X86 ARCHITECTURE (32-BIT AND 64-BIT))
Cc: amd-gfx@lists.freedesktop.org (open list:RADEON and AMDGPU DRM DRIVERS)
Cc: dri-de...@lists.freedesktop.org (open list:DRM DRIVERS)
Cc: intel-...@lists.freedesktop.org (open list:INTEL DRM DRIVERS (excluding Poulsbo, Moorestow...)
Cc: linux-r...@vger.kernel.org (open list:INFINIBAND SUBSYSTEM)
Cc: xen-de...@lists.xenproject.org (moderated list:XEN HYPERVISOR INTERFACE)
Cc: linux...@kvack.org (open list:HMM - Heterogeneous Memory Management)
Reported-by: David Rientjes 
Signed-off-by: Michal Hocko 
---

Hi,
this is an RFC and not tested at all. I am not very familiar with the
mmu notifier semantics, so this is a crude attempt to achieve basically
what I need. It might be completely wrong, but if that is the case I
would like to discuss what a better way would be.

get_maintainers gave me quite a large list of people to CC, so I had to
trim it down. If you think I have forgotten somebody, please let me know.

Any feedback is highly appreciated.

  arch/x86/kvm/x86.c                      |  7 --
  drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c  | 33 +++--
  drivers/gpu/drm/i915/i915_gem_userptr.c | 10 +---
  drivers/gpu/drm/radeon/radeon_mn.c      | 15 ---
  drivers/infiniband/core/umem_odp.c      | 15 ---
  drivers/infiniband/hw/hfi1/mmu_rb.c     |  7 --
  drivers/misc/mic/scif/scif_dma.c        |  7 --
  drivers/misc/sgi-gru/grutlbpurge.c      |  7 --
  drivers/xen/gntdev.c                    | 14 ---
  include/linux/kvm_host.h                |  2 +-
  include/linux/mmu_notifier.h            | 15 +--
  mm/hmm.c                                |  7 --
  mm/mmu_notifier.c                       | 15 ---
  mm/oom_kill.c                           | 29 +++---
  virt/kvm/kvm_main.c                     | 12 ++---
  15 files changed, 137 insertions(+), 58 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6bcecc325e7e..ac08f5d711be 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7203,8 +7203,9 @@ static void vcpu_load_eoi_exitmap(struct kvm_vcpu *vcpu)
kvm_x86_ops->load_eoi_exitmap(vcpu,