Re: [PATCH v2 hmm 11/11] mm/hmm: Remove confusing comment and logic from hmm_release

2019-06-10 Thread Ralph Campbell


On 6/10/19 9:02 AM, Jason Gunthorpe wrote:

On Fri, Jun 07, 2019 at 02:37:07PM -0700, Ralph Campbell wrote:


On 6/6/19 11:44 AM, Jason Gunthorpe wrote:

From: Jason Gunthorpe 

hmm_release() is called exactly once per hmm. ops->release() cannot
accidentally trigger any action that would recurse back onto
hmm->mirrors_sem.

This fixes a use after-free race of the form:

 CPU0   CPU1
 hmm_release()
   up_write(>mirrors_sem);
   hmm_mirror_unregister(mirror)
down_write(>mirrors_sem);
up_write(>mirrors_sem);
kfree(mirror)
   mirror->ops->release(mirror)

The only user we have today for ops->release is an empty function, so this
is unambiguously safe.

As a consequence of plugging this race drivers are not allowed to
register/unregister mirrors from within a release op.

Signed-off-by: Jason Gunthorpe 


I agree with the analysis above but I'm not sure that release() will
always be an empty function. It might be more efficient to write back
all data migrated to a device "in one pass" instead of relying
on unmap_vmas() calling hmm_start_range_invalidate() per VMA.


I think we have to focus on the *current* kernel - and we have two
users of release, nouveau_svm.c is empty and amdgpu_mn.c does
schedule_work() - so I believe we should go ahead with this simple
solution to the actual race today that both of those will suffer from.

If we find a need for a more complex version then it can be debated
and justified with proper context...

Ok?

Jason


OK.
I guess we have enough on the plate already :-)
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v2 hmm 11/11] mm/hmm: Remove confusing comment and logic from hmm_release

2019-06-10 Thread Jason Gunthorpe
On Fri, Jun 07, 2019 at 02:37:07PM -0700, Ralph Campbell wrote:
> 
> On 6/6/19 11:44 AM, Jason Gunthorpe wrote:
> > From: Jason Gunthorpe 
> > 
> > hmm_release() is called exactly once per hmm. ops->release() cannot
> > accidentally trigger any action that would recurse back onto
> > hmm->mirrors_sem.
> > 
> > This fixes a use after-free race of the form:
> > 
> > CPU0   CPU1
> > hmm_release()
> >   up_write(>mirrors_sem);
> >   hmm_mirror_unregister(mirror)
> >down_write(>mirrors_sem);
> >up_write(>mirrors_sem);
> >kfree(mirror)
> >   mirror->ops->release(mirror)
> > 
> > The only user we have today for ops->release is an empty function, so this
> > is unambiguously safe.
> > 
> > As a consequence of plugging this race drivers are not allowed to
> > register/unregister mirrors from within a release op.
> > 
> > Signed-off-by: Jason Gunthorpe 
> 
> I agree with the analysis above but I'm not sure that release() will
> always be an empty function. It might be more efficient to write back
> all data migrated to a device "in one pass" instead of relying
> on unmap_vmas() calling hmm_start_range_invalidate() per VMA.

I think we have to focus on the *current* kernel - and we have two
users of release, nouveau_svm.c is empty and amdgpu_mn.c does
schedule_work() - so I believe we should go ahead with this simple
solution to the actual race today that both of those will suffer from.

If we find a need for a more complex version then it can be debated
and justified with proper context...

Ok?

Jason
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v2 hmm 11/11] mm/hmm: Remove confusing comment and logic from hmm_release

2019-06-09 Thread Ralph Campbell


On 6/6/19 11:44 AM, Jason Gunthorpe wrote:

From: Jason Gunthorpe 

hmm_release() is called exactly once per hmm. ops->release() cannot
accidentally trigger any action that would recurse back onto
hmm->mirrors_sem.

This fixes a use after-free race of the form:

CPU0   CPU1
hmm_release()
  up_write(>mirrors_sem);
  hmm_mirror_unregister(mirror)
   down_write(>mirrors_sem);
   up_write(>mirrors_sem);
   kfree(mirror)
  mirror->ops->release(mirror)

The only user we have today for ops->release is an empty function, so this
is unambiguously safe.

As a consequence of plugging this race drivers are not allowed to
register/unregister mirrors from within a release op.

Signed-off-by: Jason Gunthorpe 


I agree with the analysis above but I'm not sure that release() will
always be an empty function. It might be more efficient to write back
all data migrated to a device "in one pass" instead of relying
on unmap_vmas() calling hmm_start_range_invalidate() per VMA.

I think the bigger issue is potential deadlocks while calling
sync_cpu_device_pagetables() and tasks calling hmm_mirror_unregister():

Say you have three threads:
- Thread A is in try_to_unmap(), either without holding mmap_sem or with
mmap_sem held for read.
- Thread B has some unrelated driver calling hmm_mirror_unregister().
This doesn't require mmap_sem.
- Thread C is about to call migrate_vma().

Thread AThread B Thread C
try_to_unmaphmm_mirror_unregistermigrate_vma
--  ---  --
hmm_invalidate_range_start
down_read(mirrors_sem)
down_write(mirrors_sem)
// Blocked on A
  device_lock
device_lock
// Blocked on C
  migrate_vma()
  hmm_invalidate_range_s
  down_read(mirrors_sem)
  // Blocked on B
  // Deadlock

Perhaps we should consider using SRCU for walking the mirror->list?


---
  mm/hmm.c | 28 +---
  1 file changed, 9 insertions(+), 19 deletions(-)

diff --git a/mm/hmm.c b/mm/hmm.c
index 709d138dd49027..3a45dd3d778248 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -136,26 +136,16 @@ static void hmm_release(struct mmu_notifier *mn, struct 
mm_struct *mm)
WARN_ON(!list_empty(>ranges));
mutex_unlock(>lock);
  
-	down_write(>mirrors_sem);

-   mirror = list_first_entry_or_null(>mirrors, struct hmm_mirror,
- list);
-   while (mirror) {
-   list_del_init(>list);
-   if (mirror->ops->release) {
-   /*
-* Drop mirrors_sem so the release callback can wait
-* on any pending work that might itself trigger a
-* mmu_notifier callback and thus would deadlock with
-* us.
-*/
-   up_write(>mirrors_sem);
+   down_read(>mirrors_sem);
+   list_for_each_entry(mirror, >mirrors, list) {
+   /*
+* Note: The driver is not allowed to trigger
+* hmm_mirror_unregister() from this thread.
+*/
+   if (mirror->ops->release)
mirror->ops->release(mirror);
-   down_write(>mirrors_sem);
-   }
-   mirror = list_first_entry_or_null(>mirrors,
- struct hmm_mirror, list);
}
-   up_write(>mirrors_sem);
+   up_read(>mirrors_sem);
  
  	hmm_put(hmm);

  }
@@ -287,7 +277,7 @@ void hmm_mirror_unregister(struct hmm_mirror *mirror)
struct hmm *hmm = mirror->hmm;
  
  	down_write(>mirrors_sem);

-   list_del_init(>list);
+   list_del(>list);
up_write(>mirrors_sem);
hmm_put(hmm);
memset(>hmm, POISON_INUSE, sizeof(mirror->hmm));


___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v2 hmm 11/11] mm/hmm: Remove confusing comment and logic from hmm_release

2019-06-09 Thread Jason Gunthorpe
On Thu, Jun 06, 2019 at 08:47:28PM -0700, John Hubbard wrote:
> On 6/6/19 11:44 AM, Jason Gunthorpe wrote:
> > From: Jason Gunthorpe 
> > 
> > hmm_release() is called exactly once per hmm. ops->release() cannot
> > accidentally trigger any action that would recurse back onto
> > hmm->mirrors_sem.
> > 
> > This fixes a use after-free race of the form:
> > 
> >CPU0   CPU1
> >hmm_release()
> >  up_write(>mirrors_sem);
> >  hmm_mirror_unregister(mirror)
> >   down_write(>mirrors_sem);
> >   up_write(>mirrors_sem);
> >   kfree(mirror)
> >  mirror->ops->release(mirror)
> > 
> > The only user we have today for ops->release is an empty function, so this
> > is unambiguously safe.
> > 
> > As a consequence of plugging this race drivers are not allowed to
> > register/unregister mirrors from within a release op.
> > 
> > Signed-off-by: Jason Gunthorpe 
> >  mm/hmm.c | 28 +---
> >  1 file changed, 9 insertions(+), 19 deletions(-)
> > 
> > diff --git a/mm/hmm.c b/mm/hmm.c
> > index 709d138dd49027..3a45dd3d778248 100644
> > +++ b/mm/hmm.c
> > @@ -136,26 +136,16 @@ static void hmm_release(struct mmu_notifier *mn, 
> > struct mm_struct *mm)
> > WARN_ON(!list_empty(>ranges));
> > mutex_unlock(>lock);
> >  
> > -   down_write(>mirrors_sem);
> > -   mirror = list_first_entry_or_null(>mirrors, struct hmm_mirror,
> > - list);
> > -   while (mirror) {
> > -   list_del_init(>list);
> > -   if (mirror->ops->release) {
> > -   /*
> > -* Drop mirrors_sem so the release callback can wait
> > -* on any pending work that might itself trigger a
> > -* mmu_notifier callback and thus would deadlock with
> > -* us.
> > -*/
> > -   up_write(>mirrors_sem);
> > +   down_read(>mirrors_sem);
> 
> This is cleaner and simpler, but I suspect it is leading to the deadlock
> that Ralph Campbell is seeing in his driver testing. (And in general, holding
> a lock during a driver callback usually leads to deadlocks.)

I think Ralph has never seen this patch (it is new), so it must be one
of the earlier patches..

> Ralph, is this the one? It's the only place in this patchset where I can
> see a lock around a callback to driver code, that wasn't there before. So
> I'm pretty sure it is the one...

Can you share the lockdep report please?

Thanks,
Jason
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v2 hmm 11/11] mm/hmm: Remove confusing comment and logic from hmm_release

2019-06-09 Thread Jason Gunthorpe
On Fri, Jun 07, 2019 at 02:37:07PM -0700, Ralph Campbell wrote:
> 
> On 6/6/19 11:44 AM, Jason Gunthorpe wrote:
> > From: Jason Gunthorpe 
> > 
> > hmm_release() is called exactly once per hmm. ops->release() cannot
> > accidentally trigger any action that would recurse back onto
> > hmm->mirrors_sem.
> > 
> > This fixes a use after-free race of the form:
> > 
> > CPU0   CPU1
> > hmm_release()
> >   up_write(>mirrors_sem);
> >   hmm_mirror_unregister(mirror)
> >down_write(>mirrors_sem);
> >up_write(>mirrors_sem);
> >kfree(mirror)
> >   mirror->ops->release(mirror)
> > 
> > The only user we have today for ops->release is an empty function, so this
> > is unambiguously safe.
> > 
> > As a consequence of plugging this race drivers are not allowed to
> > register/unregister mirrors from within a release op.
> > 
> > Signed-off-by: Jason Gunthorpe 
> 
> I agree with the analysis above but I'm not sure that release() will
> always be an empty function. It might be more efficient to write back
> all data migrated to a device "in one pass" instead of relying
> on unmap_vmas() calling hmm_start_range_invalidate() per VMA.

Sure, but it should not be allowed to recurse back to
hmm_mirror_unregister.

> I think the bigger issue is potential deadlocks while calling
> sync_cpu_device_pagetables() and tasks calling hmm_mirror_unregister():
>
> Say you have three threads:
> - Thread A is in try_to_unmap(), either without holding mmap_sem or with
> mmap_sem held for read.
> - Thread B has some unrelated driver calling hmm_mirror_unregister().
> This doesn't require mmap_sem.
> - Thread C is about to call migrate_vma().
>
> Thread AThread B Thread C
> try_to_unmaphmm_mirror_unregistermigrate_vma
> hmm_invalidate_range_start
> down_read(mirrors_sem)
> down_write(mirrors_sem)
> // Blocked on A
>   device_lock
> device_lock
> // Blocked on C
>   migrate_vma()
>   hmm_invalidate_range_s
>   down_read(mirrors_sem)
>   // Blocked on B
>   // Deadlock

Oh... you know I didn't know this about rwsems in linux that they have
a fairness policy for writes to block future reads..

Still, at least as things are designed, the driver cannot hold a lock
it obtains under sync_cpu_device_pagetables() and nest other things in
that lock. It certainly can't recurse back into any mmu notifiers
while holding that lock. (as you point out)

The lock in sync_cpu_device_pagetables() needs to be very narrowly
focused on updating device state only.

So, my first reaction is that the driver in thread C is wrong, and
needs a different locking scheme. I think you'd have to make a really
good case that there is no alternative for a driver..

> Perhaps we should consider using SRCU for walking the mirror->list?

It means the driver has to deal with races like in this patch
description. At that point there is almost no reason to insert hmm
here, just use mmu notifiers directly.

Drivers won't get this right, it is too hard.

Jason
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

[PATCH v2 hmm 11/11] mm/hmm: Remove confusing comment and logic from hmm_release

2019-06-07 Thread Jason Gunthorpe
From: Jason Gunthorpe 

hmm_release() is called exactly once per hmm. ops->release() cannot
accidentally trigger any action that would recurse back onto
hmm->mirrors_sem.

This fixes a use after-free race of the form:

   CPU0   CPU1
   hmm_release()
 up_write(>mirrors_sem);
 hmm_mirror_unregister(mirror)
  down_write(>mirrors_sem);
  up_write(>mirrors_sem);
  kfree(mirror)
 mirror->ops->release(mirror)

The only user we have today for ops->release is an empty function, so this
is unambiguously safe.

As a consequence of plugging this race drivers are not allowed to
register/unregister mirrors from within a release op.

Signed-off-by: Jason Gunthorpe 
---
 mm/hmm.c | 28 +---
 1 file changed, 9 insertions(+), 19 deletions(-)

diff --git a/mm/hmm.c b/mm/hmm.c
index 709d138dd49027..3a45dd3d778248 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -136,26 +136,16 @@ static void hmm_release(struct mmu_notifier *mn, struct 
mm_struct *mm)
WARN_ON(!list_empty(>ranges));
mutex_unlock(>lock);
 
-   down_write(>mirrors_sem);
-   mirror = list_first_entry_or_null(>mirrors, struct hmm_mirror,
- list);
-   while (mirror) {
-   list_del_init(>list);
-   if (mirror->ops->release) {
-   /*
-* Drop mirrors_sem so the release callback can wait
-* on any pending work that might itself trigger a
-* mmu_notifier callback and thus would deadlock with
-* us.
-*/
-   up_write(>mirrors_sem);
+   down_read(>mirrors_sem);
+   list_for_each_entry(mirror, >mirrors, list) {
+   /*
+* Note: The driver is not allowed to trigger
+* hmm_mirror_unregister() from this thread.
+*/
+   if (mirror->ops->release)
mirror->ops->release(mirror);
-   down_write(>mirrors_sem);
-   }
-   mirror = list_first_entry_or_null(>mirrors,
- struct hmm_mirror, list);
}
-   up_write(>mirrors_sem);
+   up_read(>mirrors_sem);
 
hmm_put(hmm);
 }
@@ -287,7 +277,7 @@ void hmm_mirror_unregister(struct hmm_mirror *mirror)
struct hmm *hmm = mirror->hmm;
 
down_write(>mirrors_sem);
-   list_del_init(>list);
+   list_del(>list);
up_write(>mirrors_sem);
hmm_put(hmm);
memset(>hmm, POISON_INUSE, sizeof(mirror->hmm));
-- 
2.21.0

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v2 hmm 11/11] mm/hmm: Remove confusing comment and logic from hmm_release

2019-06-07 Thread John Hubbard
On 6/6/19 11:44 AM, Jason Gunthorpe wrote:
> From: Jason Gunthorpe 
> 
> hmm_release() is called exactly once per hmm. ops->release() cannot
> accidentally trigger any action that would recurse back onto
> hmm->mirrors_sem.
> 
> This fixes a use after-free race of the form:
> 
>CPU0   CPU1
>hmm_release()
>  up_write(>mirrors_sem);
>  hmm_mirror_unregister(mirror)
>   down_write(>mirrors_sem);
>   up_write(>mirrors_sem);
>   kfree(mirror)
>  mirror->ops->release(mirror)
> 
> The only user we have today for ops->release is an empty function, so this
> is unambiguously safe.
> 
> As a consequence of plugging this race drivers are not allowed to
> register/unregister mirrors from within a release op.
> 
> Signed-off-by: Jason Gunthorpe 
> ---
>  mm/hmm.c | 28 +---
>  1 file changed, 9 insertions(+), 19 deletions(-)
> 
> diff --git a/mm/hmm.c b/mm/hmm.c
> index 709d138dd49027..3a45dd3d778248 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -136,26 +136,16 @@ static void hmm_release(struct mmu_notifier *mn, struct 
> mm_struct *mm)
>   WARN_ON(!list_empty(>ranges));
>   mutex_unlock(>lock);
>  
> - down_write(>mirrors_sem);
> - mirror = list_first_entry_or_null(>mirrors, struct hmm_mirror,
> -   list);
> - while (mirror) {
> - list_del_init(>list);
> - if (mirror->ops->release) {
> - /*
> -  * Drop mirrors_sem so the release callback can wait
> -  * on any pending work that might itself trigger a
> -  * mmu_notifier callback and thus would deadlock with
> -  * us.
> -  */
> - up_write(>mirrors_sem);
> + down_read(>mirrors_sem);

This is cleaner and simpler, but I suspect it is leading to the deadlock
that Ralph Campbell is seeing in his driver testing. (And in general, holding
a lock during a driver callback usually leads to deadlocks.)

Ralph, is this the one? It's the only place in this patchset where I can
see a lock around a callback to driver code, that wasn't there before. So
I'm pretty sure it is the one...


thanks,
-- 
John Hubbard
NVIDIA
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel