Re: [PATCH] drm/amdkfd: Remove skiping userptr buffer mapping when mmu notifier marks it as invalid
On Wed, May 10, 2023 at 11:00 AM Felix Kuehling wrote: > > Am 2023-05-09 um 18:17 schrieb Alex Deucher: > > From: Xiaogang Chen > > > > mmu notifier does not always hold mm->sem during call back. That causes > > a race condition between kfd userprt buffer mapping and mmu notifier > > which leds to gpu shadder or SDMA access userptr buffer before it has been > > mapped to gpu VM. Always map userptr buffer to avoid that though it may make > > some userprt buffers mapped two times. > > > > Suggested-by: Felix Kuehling > > Signed-off-by: Xiaogang Chen > > Reviewed-by: Felix Kuehling > > Signed-off-by: Alex Deucher > > This patch is no longer needed and should not be applied. It was > originally applied to amd-staging-drm-next as patch > fcf00f8d29f2fc6bf00531a1447be28b99073cc3 in November 2022. This fixed a > race condition due to incorrect assumptions about the mmap lock and MMU > notifiers. This hunk was added back by my later patch f95f51a4c335 > ("drm/amdgpu: Add notifier lock for KFD userptrs") in December, using > our own notifier lock that doesn't suffer from those races. > Thanks. Dropped. Alex > Regards, >Felix > > > > --- > > drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 10 -- > > 1 file changed, 10 deletions(-) > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c > > index 58a774647573..40078c0a5585 100644 > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c > > @@ -1942,16 +1942,6 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu( > >*/ > > mutex_lock(>process_info->lock); > > > > - /* Lock notifier lock. If we find an invalid userptr BO, we can be > > - * sure that the MMU notifier is no longer running > > - * concurrently and the queues are actually stopped > > - */ > > - if (amdgpu_ttm_tt_get_usermm(bo->tbo.ttm)) { > > - mutex_lock(>process_info->notifier_lock); > > - is_invalid_userptr = !!mem->invalid; > > - mutex_unlock(>process_info->notifier_lock); > > - } > > - > > mutex_lock(>lock); > > > > domain = mem->domain;
Re: [PATCH] drm/amdkfd: Remove skiping userptr buffer mapping when mmu notifier marks it as invalid
Am 2023-05-09 um 18:17 schrieb Alex Deucher: From: Xiaogang Chen mmu notifier does not always hold mm->sem during call back. That causes a race condition between kfd userprt buffer mapping and mmu notifier which leds to gpu shadder or SDMA access userptr buffer before it has been mapped to gpu VM. Always map userptr buffer to avoid that though it may make some userprt buffers mapped two times. Suggested-by: Felix Kuehling Signed-off-by: Xiaogang Chen Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher This patch is no longer needed and should not be applied. It was originally applied to amd-staging-drm-next as patch fcf00f8d29f2fc6bf00531a1447be28b99073cc3 in November 2022. This fixed a race condition due to incorrect assumptions about the mmap lock and MMU notifiers. This hunk was added back by my later patch f95f51a4c335 ("drm/amdgpu: Add notifier lock for KFD userptrs") in December, using our own notifier lock that doesn't suffer from those races. Regards, Felix --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 10 -- 1 file changed, 10 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index 58a774647573..40078c0a5585 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @@ -1942,16 +1942,6 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu( */ mutex_lock(>process_info->lock); - /* Lock notifier lock. If we find an invalid userptr BO, we can be -* sure that the MMU notifier is no longer running -* concurrently and the queues are actually stopped -*/ - if (amdgpu_ttm_tt_get_usermm(bo->tbo.ttm)) { - mutex_lock(>process_info->notifier_lock); - is_invalid_userptr = !!mem->invalid; - mutex_unlock(>process_info->notifier_lock); - } - mutex_lock(>lock); domain = mem->domain;
[PATCH] drm/amdkfd: Remove skiping userptr buffer mapping when mmu notifier marks it as invalid
From: Xiaogang Chen mmu notifier does not always hold mm->sem during call back. That causes a race condition between kfd userprt buffer mapping and mmu notifier which leds to gpu shadder or SDMA access userptr buffer before it has been mapped to gpu VM. Always map userptr buffer to avoid that though it may make some userprt buffers mapped two times. Suggested-by: Felix Kuehling Signed-off-by: Xiaogang Chen Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 10 -- 1 file changed, 10 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index 58a774647573..40078c0a5585 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @@ -1942,16 +1942,6 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu( */ mutex_lock(>process_info->lock); - /* Lock notifier lock. If we find an invalid userptr BO, we can be -* sure that the MMU notifier is no longer running -* concurrently and the queues are actually stopped -*/ - if (amdgpu_ttm_tt_get_usermm(bo->tbo.ttm)) { - mutex_lock(>process_info->notifier_lock); - is_invalid_userptr = !!mem->invalid; - mutex_unlock(>process_info->notifier_lock); - } - mutex_lock(>lock); domain = mem->domain; -- 2.40.1
Re: [PATCH] drm/amdkfd: Remove skiping userptr buffer mapping when mmu notifier marks it as invalid
Am 2022-10-28 um 20:26 schrieb Xiaogang.Chen: From: Xiaogang Chen mmu notifier does not always hold mm->sem during call back. That causes a race condition between kfd userprt buffer mapping and mmu notifier which leds to gpu shadder or SDMA access userptr buffer before it has been mapped to gpu VM. Always map userptr buffer to avoid that though it may make some userprt buffers mapped two times. Suggested-by: Felix Kuehling Signed-off-by: Xiaogang Chen Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 10 -- 1 file changed, 10 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index da9d475d7ef2..ba72a910d0d5 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @@ -1907,16 +1907,6 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu( */ mutex_lock(>process_info->lock); - /* Lock mmap-sem. If we find an invalid userptr BO, we can be -* sure that the MMU notifier is no longer running -* concurrently and the queues are actually stopped -*/ - if (amdgpu_ttm_tt_get_usermm(bo->tbo.ttm)) { - mmap_write_lock(current->mm); - is_invalid_userptr = atomic_read(>invalid); - mmap_write_unlock(current->mm); - } - mutex_lock(>lock); domain = mem->domain;
[PATCH] drm/amdkfd: Remove skiping userptr buffer mapping when mmu notifier marks it as invalid
From: Xiaogang Chen mmu notifier does not always hold mm->sem during call back. That causes a race condition between kfd userprt buffer mapping and mmu notifier which leds to gpu shadder or SDMA access userptr buffer before it has been mapped to gpu VM. Always map userptr buffer to avoid that though it may make some userprt buffers mapped two times. Suggested-by: Felix Kuehling Signed-off-by: Xiaogang Chen --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 10 -- 1 file changed, 10 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index da9d475d7ef2..ba72a910d0d5 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @@ -1907,16 +1907,6 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu( */ mutex_lock(>process_info->lock); - /* Lock mmap-sem. If we find an invalid userptr BO, we can be -* sure that the MMU notifier is no longer running -* concurrently and the queues are actually stopped -*/ - if (amdgpu_ttm_tt_get_usermm(bo->tbo.ttm)) { - mmap_write_lock(current->mm); - is_invalid_userptr = atomic_read(>invalid); - mmap_write_unlock(current->mm); - } - mutex_lock(>lock); domain = mem->domain; -- 2.25.1