Re: [PATCH] drm/amdkfd: Remove skiping userptr buffer mapping when mmu notifier marks it as invalid

2023-05-10 Thread Alex Deucher
On Wed, May 10, 2023 at 11:00 AM Felix Kuehling  wrote:
>
> Am 2023-05-09 um 18:17 schrieb Alex Deucher:
> > From: Xiaogang Chen 
> >
> > mmu notifier does not always hold mm->sem during call back. That causes
> > a race condition between kfd userprt buffer mapping and mmu notifier
> > which leds to gpu shadder or SDMA access userptr buffer before it has been
> > mapped to gpu VM. Always map userptr buffer to avoid that though it may make
> > some userprt buffers mapped two times.
> >
> > Suggested-by: Felix Kuehling 
> > Signed-off-by: Xiaogang Chen 
> > Reviewed-by: Felix Kuehling 
> > Signed-off-by: Alex Deucher 
>
> This patch is no longer needed and should not be applied. It was
> originally applied to amd-staging-drm-next as patch
> fcf00f8d29f2fc6bf00531a1447be28b99073cc3 in November 2022. This fixed a
> race condition due to incorrect assumptions about the mmap lock and MMU
> notifiers. This hunk was added back by my later patch f95f51a4c335
> ("drm/amdgpu: Add notifier lock for KFD userptrs") in December, using
> our own notifier lock that doesn't suffer from those races.
>

Thanks.  Dropped.

Alex

> Regards,
>Felix
>
>
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 10 --
> >   1 file changed, 10 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> > index 58a774647573..40078c0a5585 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> > @@ -1942,16 +1942,6 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
> >*/
> >   mutex_lock(>process_info->lock);
> >
> > - /* Lock notifier lock. If we find an invalid userptr BO, we can be
> > -  * sure that the MMU notifier is no longer running
> > -  * concurrently and the queues are actually stopped
> > -  */
> > - if (amdgpu_ttm_tt_get_usermm(bo->tbo.ttm)) {
> > - mutex_lock(>process_info->notifier_lock);
> > - is_invalid_userptr = !!mem->invalid;
> > - mutex_unlock(>process_info->notifier_lock);
> > - }
> > -
> >   mutex_lock(>lock);
> >
> >   domain = mem->domain;


Re: [PATCH] drm/amdkfd: Remove skiping userptr buffer mapping when mmu notifier marks it as invalid

2023-05-10 Thread Felix Kuehling

Am 2023-05-09 um 18:17 schrieb Alex Deucher:

From: Xiaogang Chen 

mmu notifier does not always hold mm->sem during call back. That causes
a race condition between kfd userprt buffer mapping and mmu notifier
which leds to gpu shadder or SDMA access userptr buffer before it has been
mapped to gpu VM. Always map userptr buffer to avoid that though it may make
some userprt buffers mapped two times.

Suggested-by: Felix Kuehling 
Signed-off-by: Xiaogang Chen 
Reviewed-by: Felix Kuehling 
Signed-off-by: Alex Deucher 


This patch is no longer needed and should not be applied. It was 
originally applied to amd-staging-drm-next as patch 
fcf00f8d29f2fc6bf00531a1447be28b99073cc3 in November 2022. This fixed a 
race condition due to incorrect assumptions about the mmap lock and MMU 
notifiers. This hunk was added back by my later patch f95f51a4c335 
("drm/amdgpu: Add notifier lock for KFD userptrs") in December, using 
our own notifier lock that doesn't suffer from those races.


Regards,
  Felix



---
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 10 --
  1 file changed, 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 58a774647573..40078c0a5585 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1942,16 +1942,6 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
 */
mutex_lock(>process_info->lock);
  
-	/* Lock notifier lock. If we find an invalid userptr BO, we can be

-* sure that the MMU notifier is no longer running
-* concurrently and the queues are actually stopped
-*/
-   if (amdgpu_ttm_tt_get_usermm(bo->tbo.ttm)) {
-   mutex_lock(>process_info->notifier_lock);
-   is_invalid_userptr = !!mem->invalid;
-   mutex_unlock(>process_info->notifier_lock);
-   }
-
mutex_lock(>lock);
  
  	domain = mem->domain;


[PATCH] drm/amdkfd: Remove skiping userptr buffer mapping when mmu notifier marks it as invalid

2023-05-09 Thread Alex Deucher
From: Xiaogang Chen 

mmu notifier does not always hold mm->sem during call back. That causes
a race condition between kfd userprt buffer mapping and mmu notifier
which leds to gpu shadder or SDMA access userptr buffer before it has been
mapped to gpu VM. Always map userptr buffer to avoid that though it may make
some userprt buffers mapped two times.

Suggested-by: Felix Kuehling 
Signed-off-by: Xiaogang Chen 
Reviewed-by: Felix Kuehling 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 10 --
 1 file changed, 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 58a774647573..40078c0a5585 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1942,16 +1942,6 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
 */
mutex_lock(>process_info->lock);
 
-   /* Lock notifier lock. If we find an invalid userptr BO, we can be
-* sure that the MMU notifier is no longer running
-* concurrently and the queues are actually stopped
-*/
-   if (amdgpu_ttm_tt_get_usermm(bo->tbo.ttm)) {
-   mutex_lock(>process_info->notifier_lock);
-   is_invalid_userptr = !!mem->invalid;
-   mutex_unlock(>process_info->notifier_lock);
-   }
-
mutex_lock(>lock);
 
domain = mem->domain;
-- 
2.40.1



Re: [PATCH] drm/amdkfd: Remove skiping userptr buffer mapping when mmu notifier marks it as invalid

2022-10-31 Thread Felix Kuehling

Am 2022-10-28 um 20:26 schrieb Xiaogang.Chen:

From: Xiaogang Chen 

mmu notifier does not always hold mm->sem during call back. That causes a race 
condition
between kfd userprt buffer mapping and mmu notifier which leds to gpu shadder 
or SDMA access
userptr buffer before it has been mapped to gpu VM. Always map userptr buffer 
to avoid that
though it may make some userprt buffers mapped two times.

Suggested-by:  Felix Kuehling 
Signed-off-by: Xiaogang Chen 


Reviewed-by: Felix Kuehling 



---
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 10 --
  1 file changed, 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index da9d475d7ef2..ba72a910d0d5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1907,16 +1907,6 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
 */
mutex_lock(>process_info->lock);
  
-	/* Lock mmap-sem. If we find an invalid userptr BO, we can be

-* sure that the MMU notifier is no longer running
-* concurrently and the queues are actually stopped
-*/
-   if (amdgpu_ttm_tt_get_usermm(bo->tbo.ttm)) {
-   mmap_write_lock(current->mm);
-   is_invalid_userptr = atomic_read(>invalid);
-   mmap_write_unlock(current->mm);
-   }
-
mutex_lock(>lock);
  
  	domain = mem->domain;


[PATCH] drm/amdkfd: Remove skiping userptr buffer mapping when mmu notifier marks it as invalid

2022-10-28 Thread Xiaogang . Chen
From: Xiaogang Chen 

mmu notifier does not always hold mm->sem during call back. That causes a race 
condition
between kfd userprt buffer mapping and mmu notifier which leds to gpu shadder 
or SDMA access
userptr buffer before it has been mapped to gpu VM. Always map userptr buffer 
to avoid that
though it may make some userprt buffers mapped two times.

Suggested-by:  Felix Kuehling 
Signed-off-by: Xiaogang Chen 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 10 --
 1 file changed, 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index da9d475d7ef2..ba72a910d0d5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1907,16 +1907,6 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
 */
mutex_lock(>process_info->lock);
 
-   /* Lock mmap-sem. If we find an invalid userptr BO, we can be
-* sure that the MMU notifier is no longer running
-* concurrently and the queues are actually stopped
-*/
-   if (amdgpu_ttm_tt_get_usermm(bo->tbo.ttm)) {
-   mmap_write_lock(current->mm);
-   is_invalid_userptr = atomic_read(>invalid);
-   mmap_write_unlock(current->mm);
-   }
-
mutex_lock(>lock);
 
domain = mem->domain;
-- 
2.25.1