[PATCH] drm/amdgpu: Fix clang warning: unused label 'exit'

2021-05-25 Thread Andrey Grodzovsky
Problem: drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:332:1: warning: unused label 'exit' [-Wunused-label] exit: ^ Fix: Put #ifdef CONFIG_64BIT around exit Reported-by: kernel test robot Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 ++ 1 fi
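The fix described above can be illustrated with a minimal userspace sketch. This is not the driver's actual code: copy_words and fast_limit are hypothetical names, and CONFIG_64BIT is assumed to come from the compiler command line (for example -DCONFIG_64BIT) rather than from the kernel configuration. The point is only that a label whose sole goto lives inside an #ifdef needs the same guard, otherwise builds without that option warn about an unused label.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copies n words, using a "fast path" for the first
 * fast_limit words on 64-bit builds and a fallback loop for the rest. */
static size_t copy_words(unsigned *dst, const unsigned *src, size_t n,
			 size_t fast_limit)
{
	size_t done = 0;

	assert(dst != NULL && src != NULL);

#ifdef CONFIG_64BIT
	{
		size_t fast = n < fast_limit ? n : fast_limit;

		for (; done < fast; done++)
			dst[done] = src[done];
		/* The only jump to the label is compiled on 64-bit builds. */
		if (done == n)
			goto exit;
	}
#else
	(void)fast_limit; /* unused when the fast path is compiled out */
#endif

	/* Fallback path handles whatever the fast path did not cover. */
	for (; done < n; done++)
		dst[done] = src[done];

#ifdef CONFIG_64BIT
exit:	/* Guarded, so builds without CONFIG_64BIT see no unused label. */
#endif
	return done;
}
```

Built with or without -DCONFIG_64BIT, the function copies all n words; only the route taken differs.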

[PATCH v2 2/2] drm/amdgpu: Fix crash when hot unplug in BACO.

2021-05-21 Thread Andrey Grodzovsky
handling. Fix: Use a flag we use for PCIe error recovery to avoid accessing registers. This allows the rpm resume sequence to complete successfully and pci remove to finish. v2: Renamed HW access block flag Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1081 Signed-off-by: Andrey Grodzovsky

[PATCH v2 1/2] drm/amdgpu: Rename flag which prevents HW access

2021-05-21 Thread Andrey Grodzovsky
Make its name describe the function rather than the feature. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 6 +++--- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 4 ++-- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c

Re: [PATCH] drm/amdgpu: Fix crash when hot unplug in BACO.

2021-05-21 Thread Andrey Grodzovsky
Will do. Andrey On 2021-05-21 4:18 p.m., Alex Deucher wrote: On Fri, May 21, 2021 at 4:14 PM Andrey Grodzovsky wrote: Problem: When the device goes into a sleep state due to prolonged inactivity (e.g. BACO sleep) and is then hot unplugged, PCI core will try to wake up the device as part of unplug

[PATCH] drm/amdgpu: Fix crash when hot unplug in BACO.

2021-05-21 Thread Andrey Grodzovsky
flag is not set yet. Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1081 Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c

Re: [PATCH] drm/amdgpu: Add early fini callback

2021-05-19 Thread Andrey Grodzovsky
On 2021-05-19 11:29 p.m., Felix Kuehling wrote: Am 2021-05-19 um 11:20 p.m. schrieb Andrey Grodzovsky: Use it to call display code dependent on device->drv_data before it's set to NULL on device unplug v5: Move HW finalization into this callback to prevent MMIO accesses post pci rem

[PATCH] drm/amdgpu: Add early fini callback

2021-05-19 Thread Andrey Grodzovsky
ious KFD commit into this commit to avoid compile break. Signed-off-by: Andrey Grodzovsky Acked-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c| 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h| 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

[PATCH] drm/sched: Avoid data corruptions

2021-05-19 Thread Andrey Grodzovsky
Wait for all dependencies of a job to complete before killing it to avoid data corruptions. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/scheduler/sched_entity.c | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm
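The ordering described above (wait for every dependency first, cancel the job only afterwards) can be sketched in plain C. All types and names here (struct dep, struct job, dep_wait, job_kill) are hypothetical stand-ins for illustration and are not the scheduler's API; the real code works with fences and would block in the wait step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical dependency: something the job must not outrun. */
struct dep {
	bool signaled;
};

/* Hypothetical job with an array of dependencies. */
struct job {
	struct dep *deps;
	size_t ndeps;
	bool cancelled;
};

/* Stand-in for a blocking wait; in this sketch the dependency
 * simply completes when waited on. */
static void dep_wait(struct dep *d)
{
	d->signaled = true;
}

/* Cancel a job only after all of its dependencies have completed.
 * Returns how many dependencies had to be waited for. */
static size_t job_kill(struct job *j)
{
	size_t i, waited = 0;

	assert(j != NULL);

	for (i = 0; i < j->ndeps; i++) {
		if (!j->deps[i].signaled) {
			dep_wait(&j->deps[i]);
			waited++;
		}
	}

	/* Marked cancelled only once nothing it depends on is pending. */
	j->cancelled = true;
	return waited;
}
```

The design choice is the ordering itself: marking the job cancelled before its dependencies finish would let later cleanup proceed while earlier work may still be running.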

Re: [PATCH v7 13/16] drm/scheduler: Fix hang when sched_entity released

2021-05-19 Thread Andrey Grodzovsky
On 2021-05-19 7:46 a.m., Christian König wrote: Am 19.05.21 um 13:03 schrieb Andrey Grodzovsky: On 2021-05-19 6:57 a.m., Christian König wrote: Am 18.05.21 um 20:48 schrieb Andrey Grodzovsky: [SNIP] Would this be the right way to do it? Yes, it is at least a start. Question is if we

Re: [PATCH] drm/amd/amdgpu: fix a potential deadlock in gpu reset

2021-05-19 Thread Andrey Grodzovsky
k. Acked-by: Christian König Yes, seems like a typo... Reviewed-by: Andrey Grodzovsky andrey.grodzov...@amd.com Andrey ---   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 -   1 file changed, 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/a

Re: [PATCH v7 13/16] drm/scheduler: Fix hang when sched_entity released

2021-05-19 Thread Andrey Grodzovsky
On 2021-05-19 6:57 a.m., Christian König wrote: Am 18.05.21 um 20:48 schrieb Andrey Grodzovsky: [SNIP] Would this be the right way to do it? Yes, it is at least a start. Question is if we can wait blocking here or not. We install a callback a bit lower to avoid blocking, so I'm p

Re: [PATCH v7 13/16] drm/scheduler: Fix hang when sched_entity released

2021-05-18 Thread Andrey Grodzovsky
On 2021-05-18 2:48 p.m., Andrey Grodzovsky wrote: On 2021-05-18 2:13 p.m., Christian König wrote: Am 18.05.21 um 20:09 schrieb Andrey Grodzovsky: On 2021-05-18 2:02 p.m., Christian König wrote: Am 18.05.21 um 19:43 schrieb Andrey Grodzovsky: On 2021-05-18 12:33 p.m., Christian König

Re: [PATCH v7 13/16] drm/scheduler: Fix hang when sched_entity released

2021-05-18 Thread Andrey Grodzovsky
On 2021-05-18 2:13 p.m., Christian König wrote: Am 18.05.21 um 20:09 schrieb Andrey Grodzovsky: On 2021-05-18 2:02 p.m., Christian König wrote: Am 18.05.21 um 19:43 schrieb Andrey Grodzovsky: On 2021-05-18 12:33 p.m., Christian König wrote: Am 18.05.21 um 18:17 schrieb Andrey Grodzovsky

Re: [PATCH v7 13/16] drm/scheduler: Fix hang when sched_entity released

2021-05-18 Thread Andrey Grodzovsky
On 2021-05-18 2:02 p.m., Christian König wrote: Am 18.05.21 um 19:43 schrieb Andrey Grodzovsky: On 2021-05-18 12:33 p.m., Christian König wrote: Am 18.05.21 um 18:17 schrieb Andrey Grodzovsky: On 2021-05-18 11:15 a.m., Christian König wrote: Am 18.05.21 um 17:03 schrieb Andrey Grodzovsky

Re: [PATCH v7 13/16] drm/scheduler: Fix hang when sched_entity released

2021-05-18 Thread Andrey Grodzovsky
On 2021-05-18 12:33 p.m., Christian König wrote: Am 18.05.21 um 18:17 schrieb Andrey Grodzovsky: On 2021-05-18 11:15 a.m., Christian König wrote: Am 18.05.21 um 17:03 schrieb Andrey Grodzovsky: On 2021-05-18 10:07 a.m., Christian König wrote: In a separate discussion with Daniel we once

Re: [PATCH v7 13/16] drm/scheduler: Fix hang when sched_entity released

2021-05-18 Thread Andrey Grodzovsky
On 2021-05-18 11:15 a.m., Christian König wrote: Am 18.05.21 um 17:03 schrieb Andrey Grodzovsky: On 2021-05-18 10:07 a.m., Christian König wrote: In a separate discussion with Daniel we once more iterated over the dma_resv requirements and I came to the conclusion that this approach here

Re: [PATCH v7 13/16] drm/scheduler: Fix hang when sched_entity released

2021-05-18 Thread Andrey Grodzovsky
e how to handle that case. One possibility would be to wait for all dependencies of unscheduled jobs before signaling their fences as canceled. Christian. Am 12.05.21 um 16:26 schrieb Andrey Grodzovsky: Problem: If scheduler is already stopped by the time sched_entity is released and entit

Re: [PATCH] drm/amdgpu: Unmap all MMIO mappings

2021-05-18 Thread Andrey Grodzovsky
Ping Andrey On 2021-05-17 3:31 p.m., Andrey Grodzovsky wrote: Access to those must be prevented post pci_remove v6: Drop BOs list, unmapping VRAM BAR is enough. v8: Add condition of xgmi.connected_to_cpu to MTRR handling and remove MTRR handling from the old place. Signed-off-by: Andrey

Re: [PATCH v7 12/16] drm/amdgpu: Fix hang on device removal.

2021-05-17 Thread Andrey Grodzovsky
Yep, you can take a look. Andrey On 2021-05-17 3:39 p.m., Christian König wrote: You need to note who you are pinging here. I'm still assuming you wait for feedback from Daniel. Or should I take a look? Christian. Am 17.05.21 um 16:40 schrieb Andrey Grodzovsky: Ping Andrey On 20

[PATCH] drm/amdgpu: Unmap all MMIO mappings

2021-05-17 Thread Andrey Grodzovsky
Access to those must be prevented post pci_remove v6: Drop BOs list, unmapping VRAM BAR is enough. v8: Add condition of xgmi.connected_to_cpu to MTRR handling and remove MTRR handling from the old place. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 26

Re: [PATCH v7 16/16] drm/amdgpu: Unmap all MMIO mappings

2021-05-17 Thread Andrey Grodzovsky
On 2021-05-17 2:56 p.m., Alex Deucher wrote: On Mon, May 17, 2021 at 2:46 PM Andrey Grodzovsky wrote: On 2021-05-17 1:43 p.m., Alex Deucher wrote: On Wed, May 12, 2021 at 10:27 AM Andrey Grodzovsky wrote: Access to those must be prevented post pci_remove v6: Drop BOs list, unmapping

Re: [PATCH v7 16/16] drm/amdgpu: Unmap all MMIO mappings

2021-05-17 Thread Andrey Grodzovsky
On 2021-05-17 1:43 p.m., Alex Deucher wrote: On Wed, May 12, 2021 at 10:27 AM Andrey Grodzovsky wrote: Access to those must be prevented post pci_remove v6: Drop BOs list, unmapping VRAM BAR is enough. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

Re: [PATCH v7 16/16] drm/amdgpu: Unmap all MMIO mappings

2021-05-17 Thread Andrey Grodzovsky
Ping Andrey On 2021-05-14 10:42 a.m., Andrey Grodzovsky wrote: Ping Andrey On 2021-05-12 10:26 a.m., Andrey Grodzovsky wrote: Access to those must be prevented post pci_remove v6: Drop BOs list, unmapping VRAM BAR is enough. Signed-off-by: Andrey Grodzovsky ---   drivers/gpu/drm/amd

Re: [PATCH v7 12/16] drm/amdgpu: Fix hang on device removal.

2021-05-17 Thread Andrey Grodzovsky
Ping Andrey On 2021-05-14 10:42 a.m., Andrey Grodzovsky wrote: Ping Andrey On 2021-05-12 10:26 a.m., Andrey Grodzovsky wrote: If removing while commands in flight you cannot wait to flush the HW fences on a ring since the device is gone. Signed-off-by: Andrey Grodzovsky ---   drivers/gpu

[PATCH] drm/amdgpu: Handle IOMMU enabled case.

2021-05-17 Thread Andrey Grodzovsky
all amdgpu_ih_ring_fini unconditionally. v8: Add detailed explanation Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++-- drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c | 14 +- drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h | 2 +- drivers/gpu/drm/amd/amd

Re: [PATCH v7 05/16] drm/amdgpu: Handle IOMMU enabled case.

2021-05-14 Thread Andrey Grodzovsky
a bunch of refactoring to me. Regards,   Felix Am 2021-05-14 um 10:41 a.m. schrieb Andrey Grodzovsky: Ping Andrey On 2021-05-12 10:26 a.m., Andrey Grodzovsky wrote: Handle all DMA IOMMU group related dependencies before the group is removed. v5: Drop IOMMU notifier and switch to lockle

Re: [PATCH v7 16/16] drm/amdgpu: Unmap all MMIO mappings

2021-05-14 Thread Andrey Grodzovsky
Ping Andrey On 2021-05-12 10:26 a.m., Andrey Grodzovsky wrote: Access to those must be prevented post pci_remove v6: Drop BOs list, unmapping VRAM BAR is enough. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 24 +++--- drivers/gpu/drm

Re: [PATCH v7 12/16] drm/amdgpu: Fix hang on device removal.

2021-05-14 Thread Andrey Grodzovsky
Ping Andrey On 2021-05-12 10:26 a.m., Andrey Grodzovsky wrote: If removing while commands in flight you cannot wait to flush the HW fences on a ring since the device is gone. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 16 ++-- 1 file

Re: [PATCH v7 05/16] drm/amdgpu: Handle IOMMU enabled case.

2021-05-14 Thread Andrey Grodzovsky
Ping Andrey On 2021-05-12 10:26 a.m., Andrey Grodzovsky wrote: Handle all DMA IOMMU group related dependencies before the group is removed. v5: Drop IOMMU notifier and switch to lockless call to ttm_tt_unpopulate v6: Drop the BO unmap list v7: Drop amdgpu_gart_fini In amdgpu_ih_ring_fini do

Re: [PATCH v7 09/16] drm/amdgpu: Guard against write accesses after device removal

2021-05-13 Thread Andrey Grodzovsky
On 2021-05-12 4:50 p.m., Alex Deucher wrote: On Wed, May 12, 2021 at 4:30 PM Andrey Grodzovsky wrote: On 2021-05-12 4:17 p.m., Alex Deucher wrote: On Wed, May 12, 2021 at 10:27 AM Andrey Grodzovsky wrote: This should prevent writing to memory or IO ranges possibly already allocated

Re: [PATCH v7 03/16] drm/amdkfd: Split kfd suspend from device exit

2021-05-12 Thread Andrey Grodzovsky
On 2021-05-12 4:33 p.m., Felix Kuehling wrote: Am 2021-05-12 um 10:26 a.m. schrieb Andrey Grodzovsky: Helps to expedite HW related stuff to amdgpu_pci_remove Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h

Re: [PATCH v7 09/16] drm/amdgpu: Guard against write accesses after device removal

2021-05-12 Thread Andrey Grodzovsky
On 2021-05-12 4:17 p.m., Alex Deucher wrote: On Wed, May 12, 2021 at 10:27 AM Andrey Grodzovsky wrote: This should prevent writing to memory or IO ranges possibly already allocated for other uses after our device is removed. v5: Protect more places where memcopy_to/from_io takes place

[PATCH v7 16/16] drm/amdgpu: Unmap all MMIO mappings

2021-05-12 Thread Andrey Grodzovsky
Access to those must be prevented post pci_remove v6: Drop BOs list, unmapping VRAM BAR is enough. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 24 +++--- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 1 + drivers/gpu/drm/amd/amdgpu

[PATCH v7 15/16] drm/amdgpu: Verify DMA opearations from device are done

2021-05-12 Thread Andrey Grodzovsky
In case device remove is just simulated by sysfs then verify device doesn't keep doing DMA to the released memory after pci_remove is done. Signed-off-by: Andrey Grodzovsky Acked-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 6 ++ 1 file changed, 6 insertions(+)

[PATCH v7 14/16] drm/amd/display: Remove superfluous drm_mode_config_cleanup

2021-05-12 Thread Andrey Grodzovsky
It's already being released by DRM core through devm Signed-off-by: Andrey Grodzovsky Reviewed-by: Rodrigo Siqueira --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/dr

[PATCH v7 13/16] drm/scheduler: Fix hang when sched_entity released

2021-05-12 Thread Andrey Grodzovsky
rq due to race. v3: Drop drm_sched_rq_remove_entity, only modify entity->stopped and check for it in drm_sched_entity_is_idle Signed-off-by: Andrey Grodzovsky Reviewed-by: Christian König --- drivers/gpu/drm/scheduler/sched_entity.c | 3 ++- drivers/gpu/drm/scheduler/sched_

[PATCH v7 12/16] drm/amdgpu: Fix hang on device removal.

2021-05-12 Thread Andrey Grodzovsky
If removing while commands in flight you cannot wait to flush the HW fences on a ring since the device is gone. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 16 ++-- 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm

[PATCH v7 11/16] drm/amdgpu: Prevent any job recoveries after device is unplugged.

2021-05-12 Thread Andrey Grodzovsky
Return DRM_TASK_STATUS_ENODEV back to the scheduler when device is not present so the timeout timer will not be rearmed. v5: Update to match updated return values in enum drm_gpu_sched_stat Signed-off-by: Andrey Grodzovsky Reviewed-by: Christian König --- drivers/gpu/drm/amd/amdgpu

[PATCH v7 10/16] drm/sched: Make timeout timer rearm conditional.

2021-05-12 Thread Andrey Grodzovsky
We don't want to rearm the timer if driver hook reports that the device is gone. v5: Update drm_gpu_sched_stat values in code. Signed-off-by: Andrey Grodzovsky Reviewed-by: Christian König --- drivers/gpu/drm/scheduler/sched_main.c | 11 +++ 1 file changed, 7 insertions(
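A minimal sketch of the conditional rearm described above, in plain C. The names are hypothetical (enum sched_stat, struct sched, handle_timeout, and the two example hooks are illustrative, not the kernel's definitions), and a boolean stands in for a real timer. The driver hook reports a status, and the timeout handler rearms only when that status does not say the device is gone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical status values returned by the driver's timeout hook. */
enum sched_stat {
	SCHED_STAT_NOMINAL,	/* device still present, keep watching */
	SCHED_STAT_ENODEV	/* device is gone, stop the timer */
};

/* Hypothetical scheduler state; the flag stands in for a real timer. */
struct sched {
	bool timer_armed;
	enum sched_stat (*timedout_job)(void *job);
};

/* Called when the timeout fires: ask the driver, then rearm only if
 * the driver did not report that the device is gone. */
static void handle_timeout(struct sched *s, void *job)
{
	enum sched_stat status;

	assert(s != NULL && s->timedout_job != NULL);

	s->timer_armed = false;		/* the timer has just fired */
	status = s->timedout_job(job);

	if (status != SCHED_STAT_ENODEV)
		s->timer_armed = true;	/* conditional rearm */
}

/* Example hooks: one reports a removed device, one a healthy device. */
static enum sched_stat report_gone(void *job)
{
	(void)job;
	return SCHED_STAT_ENODEV;
}

static enum sched_stat report_ok(void *job)
{
	(void)job;
	return SCHED_STAT_NOMINAL;
}
```

With report_gone installed the timer stays disarmed after the timeout fires; with report_ok it is armed again.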

[PATCH v7 09/16] drm/amdgpu: Guard against write accesses after device removal

2021-05-12 Thread Andrey Grodzovsky
of HW ring commands emission protection since they are in GART and not in MMIO. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 +++- drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c| 9 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c| 63

[PATCH v7 08/16] drm/amdgpu: Convert driver sysfs attributes to static attributes

2021-05-12 Thread Andrey Grodzovsky
This allows removing the explicit creation and destruction of those attrs, and thereby avoids warnings on device finalization after physical device extraction. v5: Use newly added pci_driver.dev_groups directly Signed-off-by: Andrey Grodzovsky Acked-by: Christian König --- drivers/gpu/drm/amd/amdgpu

[PATCH v7 07/16] PCI: Add support for dev_groups to struct pci_driver

2021-05-12 Thread Andrey Grodzovsky
This helps convert PCI drivers' sysfs attributes to static ones. Analogous to commit b71b283e3d6d ("USB: add support for dev_groups to struct usb_driver") Signed-off-by: Andrey Grodzovsky Suggested-by: Greg Kroah-Hartman Acked-by: Bjorn Helgaas --- drivers/pci/pci-dri

[PATCH v7 06/16] drm/amdgpu: Remap all page faults to per process dummy page.

2021-05-12 Thread Andrey Grodzovsky
On device removal reroute all CPU mappings to dummy page per drm_file instance or imported GEM object. v4: Update for modified ttm_bo_vm_dummy_page Signed-off-by: Andrey Grodzovsky Reviewed-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 21 - 1 file

[PATCH v7 05/16] drm/amdgpu: Handle IOMMU enabled case.

2021-05-12 Thread Andrey Grodzovsky
zed rings. Call amdgpu_ih_ring_fini unconditionally. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++-- drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c | 14 +- drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_i

[PATCH v7 04/16] drm/amdgpu: Add early fini callback

2021-05-12 Thread Andrey Grodzovsky
Use it to call display code dependent on device->drv_data before it's set to NULL on device unplug v5: Move HW finalization into this callback to prevent MMIO accesses post pci remove. Signed-off-by: Andrey Grodzovsky Acked-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_

[PATCH v7 02/16] drm/amdgpu: Split amdgpu_device_fini into early and late

2021-05-12 Thread Andrey Grodzovsky
last device reference is dropped. v4: Change functions prefix early->hw and late->sw Signed-off-by: Andrey Grodzovsky Acked-by: Christian König Reviewed-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu.h| 6 - drivers/gpu/drm/amd/amdgpu/amdgpu_device.

[PATCH v7 03/16] drm/amdkfd: Split kfd suspend from device exit

2021-05-12 Thread Andrey Grodzovsky
Helps to expedite HW related stuff to amdgpu_pci_remove Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_device.c | 3 ++- 3 files changed, 4 insertions(+), 3 deletions

[PATCH v7 01/16] drm/ttm: Remap all page faults to per process dummy page.

2021-05-12 Thread Andrey Grodzovsky
r that BO. v5: Remove duplicate return. v6: Polish ttm_bo_vm_dummy_page, remove superfluous code. Signed-off-by: Andrey Grodzovsky Reviewed-by: Christian König --- drivers/gpu/drm/ttm/ttm_bo_vm.c | 54 - include/drm/ttm/ttm_bo_api.h| 2 ++ 2 files chang

[PATCH v7 00/16] RFC Support hot device unplug in amdgpu

2021-05-12 Thread Andrey Grodzovsky
d gitlab ticket https://gitlab.freedesktop.org/drm/amd/-/issues/1081 [4] - Related IGT tests https://gitlab.freedesktop.org/agrodzov/igt-gpu-tools/-/commits/master Andrey Grodzovsky (16): drm/ttm: Remap all page faults to per process dummy page. drm/amdgpu: Split amdgpu_device_fini into early

Re: [PATCH v6 10/16] drm/amdgpu: Guard against write accesses after device removal

2021-05-12 Thread Andrey Grodzovsky
Hopefully Alex can chime in on this. I will respin V7 soon. Andrey On 2021-05-12 10:06 a.m., Christian König wrote: Am 12.05.21 um 16:01 schrieb Andrey Grodzovsky: Ping - need a confirmation it's ok to keep this as a single patch given my explanation below. It was just a suggestion

Re: [PATCH v6 10/16] drm/amdgpu: Guard against write accesses after device removal

2021-05-12 Thread Andrey Grodzovsky
Ping - need a confirmation it's ok to keep this as a single patch given my explanation below. Andrey On 2021-05-11 1:52 p.m., Andrey Grodzovsky wrote: On 2021-05-11 2:50 a.m., Christian König wrote: Am 10.05.21 um 18:36 schrieb Andrey Grodzovsky: This should prevent writing to memo

Re: [PATCH v6 10/16] drm/amdgpu: Guard against write accesses after device removal

2021-05-11 Thread Andrey Grodzovsky
On 2021-05-11 2:50 a.m., Christian König wrote: Am 10.05.21 um 18:36 schrieb Andrey Grodzovsky: This should prevent writing to memory or IO ranges possibly already allocated for other uses after our device is removed. v5: Protect more places where memcopy_to/from_io takes place Protect IB

Re: [PATCH v6 06/16] drm/amdgpu: Handle IOMMU enabled case.

2021-05-11 Thread Andrey Grodzovsky
On 2021-05-11 11:56 a.m., Alex Deucher wrote: On Mon, May 10, 2021 at 12:37 PM Andrey Grodzovsky wrote: Handle all DMA IOMMU group related dependencies before the group is removed. v5: Drop IOMMU notifier and switch to lockless call to ttm_tt_unpopulate v6: Drop the BO unmap list Signed

Re: [PATCH v6 06/16] drm/amdgpu: Handle IOMMU enabled case.

2021-05-11 Thread Andrey Grodzovsky
On 2021-05-11 2:44 a.m., Christian König wrote: Am 10.05.21 um 18:36 schrieb Andrey Grodzovsky: Handle all DMA IOMMU group related dependencies before the group is removed. v5: Drop IOMMU notifier and switch to lockless call to ttm_tt_unpopulate v6: Drop the BO unmap list Signed-off-by

Re: [PATCH v6 04/16] drm/amdkfd: Split kfd suspend from devie exit

2021-05-11 Thread Andrey Grodzovsky
On 2021-05-11 2:40 a.m., Christian König wrote: Am 10.05.21 um 18:36 schrieb Andrey Grodzovsky: Helps to expedite HW related stuff to amdgpu_pci_remove Signed-off-by: Andrey Grodzovsky ---   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 2 +-   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 2

Re: [PATCH v6 01/16] drm/ttm: Remap all page faults to per process dummy page.

2021-05-11 Thread Andrey Grodzovsky
On 2021-05-11 2:38 a.m., Christian König wrote: Am 10.05.21 um 18:36 schrieb Andrey Grodzovsky: On device removal reroute all CPU mappings to dummy page. v3: Remove loop to find DRM file and instead access it by vma->vm_file->private_data. Move dummy page installation into a separate fu

Re: [PATCH v6 02/16] drm/ttm: Expose ttm_tt_unpopulate for driver use

2021-05-10 Thread Andrey Grodzovsky
On 2021-05-10 2:27 p.m., Felix Kuehling wrote: Am 2021-05-10 um 12:36 p.m. schrieb Andrey Grodzovsky: It's needed to drop iommu backed pages on device unplug before device's IOMMU group is released. I don't see any calls to ttm_tt_unpopulate in the rest of the series now. Is t

[PATCH v6 16/16] drm/amdgpu: Verify DMA opearations from device are done

2021-05-10 Thread Andrey Grodzovsky
In case device remove is just simulated by sysfs then verify device doesn't keep doing DMA to the released memory after pci_remove is done. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/gpu/dr

[PATCH v6 15/16] drm/amd/display: Remove superflous drm_mode_config_cleanup

2021-05-10 Thread Andrey Grodzovsky
It's already being released by DRM core through devm Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu

[PATCH v6 12/16] drm/amdgpu: Prevent any job recoveries after device is unplugged.

2021-05-10 Thread Andrey Grodzovsky
Return DRM_TASK_STATUS_ENODEV back to the scheduler when device is not present so the timeout timer will not be rearmed. v5: Update to match updated return values in enum drm_gpu_sched_stat Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 19

[PATCH v6 14/16] drm/scheduler: Fix hang when sched_entity released

2021-05-10 Thread Andrey Grodzovsky
rq due to race. v3: Drop drm_sched_rq_remove_entity, only modify entity->stopped and check for it in drm_sched_entity_is_idle Signed-off-by: Andrey Grodzovsky Reviewed-by: Christian König --- drivers/gpu/drm/scheduler/sched_entity.c | 3 ++- drivers/gpu/drm/scheduler/sched_

[PATCH v6 13/16] drm/amdgpu: Fix hang on device removal.

2021-05-10 Thread Andrey Grodzovsky
If removing while commands in flight you cannot wait to flush the HW fences on a ring since the device is gone. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 16 ++-- 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm

[PATCH v6 07/16] drm/amdgpu: Remap all page faults to per process dummy page.

2021-05-10 Thread Andrey Grodzovsky
On device removal reroute all CPU mappings to dummy page per drm_file instance or imported GEM object. v4: Update for modified ttm_bo_vm_dummy_page Signed-off-by: Andrey Grodzovsky Reviewed-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 21 - 1 file

[PATCH v6 11/16] drm/sched: Make timeout timer rearm conditional.

2021-05-10 Thread Andrey Grodzovsky
We don't want to rearm the timer if driver hook reports that the device is gone. v5: Update drm_gpu_sched_stat values in code. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/scheduler/sched_main.c | 11 +++ 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/driver

[PATCH v6 10/16] drm/amdgpu: Guard against write accesses after device removal

2021-05-10 Thread Andrey Grodzovsky
: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c| 11 ++- drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 9 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c| 17 +++-- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 63 +++-- drivers/gpu/drm/amd/amdgpu

[PATCH v6 08/16] PCI: Add support for dev_groups to struct pci_device_driver

2021-05-10 Thread Andrey Grodzovsky
This helps convert PCI drivers' sysfs attributes to static ones. Analogous to b71b283e3d6d ("USB: add support for dev_groups to struct usb_driver") Signed-off-by: Andrey Grodzovsky Suggested-by: Greg Kroah-Hartman --- drivers/pci/pci-driver.c | 1 + include/linux/pci.h | 3 ++

[PATCH v6 06/16] drm/amdgpu: Handle IOMMU enabled case.

2021-05-10 Thread Andrey Grodzovsky
Handle all DMA IOMMU group related dependencies before the group is removed. v5: Drop IOMMU notifier and switch to lockless call to ttm_tt_unpopulate v6: Drop the BO unmap list Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++-- drivers/gpu/drm/amd/amdgpu

[PATCH v6 09/16] drm/amdgpu: Convert driver sysfs attributes to static attributes

2021-05-10 Thread Andrey Grodzovsky
This allows removing the explicit creation and destruction of those attrs, and thereby avoids warnings on device finalization after physical device extraction. v5: Use newly added pci_driver.dev_groups directly Signed-off-by: Andrey Grodzovsky Acked-by: Christian König --- drivers/gpu/drm/amd/amdgpu

[PATCH v6 05/16] drm/amdgpu: Add early fini callback

2021-05-10 Thread Andrey Grodzovsky
Use it to call display code dependent on device->drv_data before it's set to NULL on device unplug v5: Move HW finalization into this callback to prevent MMIO accesses post pci remove. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

[PATCH v6 03/16] drm/amdgpu: Split amdgpu_device_fini into early and late

2021-05-10 Thread Andrey Grodzovsky
last device reference is dropped. v4: Change functions prefix early->hw and late->sw Signed-off-by: Andrey Grodzovsky Acked-by: Christian König Reviewed-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu.h| 6 - drivers/gpu/drm/amd/amdgpu/amdgpu_device.

[PATCH v6 04/16] drm/amdkfd: Split kfd suspend from devie exit

2021-05-10 Thread Andrey Grodzovsky
Helps to expedite HW related stuff to amdgpu_pci_remove Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_device.c | 3 ++- 3 files changed, 4 insertions(+), 3 deletions

[PATCH v6 02/16] drm/ttm: Expose ttm_tt_unpopulate for driver use

2021-05-10 Thread Andrey Grodzovsky
It's needed to drop iommu backed pages on device unplug before device's IOMMU group is released. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/ttm/ttm_tt.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c index 53

[PATCH v6 00/16] RFC Support hot device unplug in amdgpu

2021-05-10 Thread Andrey Grodzovsky
patchset https://lore.kernel.org/amd-gfx/20210428151207.1212258-1-andrey.grodzov...@amd.com/ [2] - drm/doc: device hot-unplug for userspace https://www.spinics.net/lists/dri-devel/msg259755.html [3] - Related gitlab ticket https://gitlab.freedesktop.org/drm/amd/-/issues/1081 [4] - Related IGT test

[PATCH v6 01/16] drm/ttm: Remap all page faults to per process dummy page.

2021-05-10 Thread Andrey Grodzovsky
r that BO. v5: Remove duplicate return. v6: Polish ttm_bo_vm_dummy_page, remove superfluous code. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/ttm/ttm_bo_vm.c | 57 - include/drm/ttm/ttm_bo_api.h | 2 ++ 2 files changed, 58 insertions(+), 1 deletion(-)

Re: [PATCH v5 20/27] drm: Scope all DRM IOCTLs with drm_dev_enter/exit

2021-05-07 Thread Andrey Grodzovsky
On 2021-05-07 12:24 p.m., Daniel Vetter wrote: On Fri, May 07, 2021 at 11:39:49AM -0400, Andrey Grodzovsky wrote: On 2021-05-07 5:11 a.m., Daniel Vetter wrote: On Thu, May 06, 2021 at 12:25:06PM -0400, Andrey Grodzovsky wrote: On 2021-05-06 5:40 a.m., Daniel Vetter wrote: On Fri, Apr

Re: [PATCH v5 15/27] drm/scheduler: Fix hang when sched_entity released

2021-05-07 Thread Andrey Grodzovsky
On 2021-05-07 12:29 p.m., Daniel Vetter wrote: On Fri, Apr 30, 2021 at 12:10:57PM -0400, Andrey Grodzovsky wrote: On 2021-04-30 2:47 a.m., Christian König wrote: Am 29.04.21 um 19:06 schrieb Andrey Grodzovsky: On 2021-04-29 3:18 a.m., Christian König wrote: I need to take another

Re: [PATCH v5 20/27] drm: Scope all DRM IOCTLs with drm_dev_enter/exit

2021-05-07 Thread Andrey Grodzovsky
On 2021-05-07 5:11 a.m., Daniel Vetter wrote: On Thu, May 06, 2021 at 12:25:06PM -0400, Andrey Grodzovsky wrote: On 2021-05-06 5:40 a.m., Daniel Vetter wrote: On Fri, Apr 30, 2021 at 01:27:37PM -0400, Andrey Grodzovsky wrote: On 2021-04-30 6:25 a.m., Daniel Vetter wrote: On Thu, Apr

Re: [PATCH v5 20/27] drm: Scope all DRM IOCTLs with drm_dev_enter/exit

2021-05-06 Thread Andrey Grodzovsky
On 2021-05-06 5:40 a.m., Daniel Vetter wrote: On Fri, Apr 30, 2021 at 01:27:37PM -0400, Andrey Grodzovsky wrote: On 2021-04-30 6:25 a.m., Daniel Vetter wrote: On Thu, Apr 29, 2021 at 04:34:55PM -0400, Andrey Grodzovsky wrote: On 2021-04-29 3:05 p.m., Daniel Vetter wrote: On Thu, Apr

Re: [PATCH v5 06/27] drm/amdgpu: Handle IOMMU enabled case.

2021-05-05 Thread Andrey Grodzovsky
Ping Andrey On 2021-05-04 11:43 a.m., Andrey Grodzovsky wrote: On 2021-05-04 3:03 a.m., Christian König wrote: Am 03.05.21 um 22:43 schrieb Andrey Grodzovsky: On 2021-04-29 3:08 a.m., Christian König wrote: Am 28.04.21 um 17:11 schrieb Andrey Grodzovsky: Handle all DMA IOMMU group

Re: [PATCH v5 06/27] drm/amdgpu: Handle IOMMU enabled case.

2021-05-05 Thread Andrey Grodzovsky
On 2021-05-04 1:05 p.m., Felix Kuehling wrote: Am 2021-04-28 um 11:11 a.m. schrieb Andrey Grodzovsky: Handle all DMA IOMMU group related dependencies before the group is removed. v5: Drop IOMMU notifier and switch to lockless call to ttm_tt_unpopulate Signed-off-by: Andrey Grodzovsky

Re: [PATCH v5 20/27] drm: Scope all DRM IOCTLs with drm_dev_enter/exit

2021-05-05 Thread Andrey Grodzovsky
Ping Andrey On 2021-04-30 1:27 p.m., Andrey Grodzovsky wrote: On 2021-04-30 6:25 a.m., Daniel Vetter wrote: On Thu, Apr 29, 2021 at 04:34:55PM -0400, Andrey Grodzovsky wrote: On 2021-04-29 3:05 p.m., Daniel Vetter wrote: On Thu, Apr 29, 2021 at 12:04:33PM -0400, Andrey Grodzovsky wrote

Re: [PATCH v5 15/27] drm/scheduler: Fix hang when sched_entity released

2021-05-05 Thread Andrey Grodzovsky
Ping Andrey On 2021-04-30 12:10 p.m., Andrey Grodzovsky wrote: On 2021-04-30 2:47 a.m., Christian König wrote: Am 29.04.21 um 19:06 schrieb Andrey Grodzovsky: On 2021-04-29 3:18 a.m., Christian König wrote: I need to take another look at this part when I don't have a massive hea

Re: [PATCH v5 06/27] drm/amdgpu: Handle IOMMU enabled case.

2021-05-04 Thread Andrey Grodzovsky
On 2021-05-04 3:03 a.m., Christian König wrote: Am 03.05.21 um 22:43 schrieb Andrey Grodzovsky: On 2021-04-29 3:08 a.m., Christian König wrote: Am 28.04.21 um 17:11 schrieb Andrey Grodzovsky: Handle all DMA IOMMU group related dependencies before the group is removed. v5: Drop IOMMU

Re: [PATCH v5 06/27] drm/amdgpu: Handle IOMMU enabled case.

2021-05-03 Thread Andrey Grodzovsky
On 2021-04-29 3:08 a.m., Christian König wrote: Am 28.04.21 um 17:11 schrieb Andrey Grodzovsky: Handle all DMA IOMMU group related dependencies before the group is removed. v5: Drop IOMMU notifier and switch to lockless call to ttm_tt_unpopulate Maybe split that up into more patches

Re: [PATCH v5 06/27] drm/amdgpu: Handle IOMMU enabled case.

2021-05-03 Thread Andrey Grodzovsky
On 2021-04-29 11:13 p.m., Alex Deucher wrote: On Wed, Apr 28, 2021 at 11:13 AM Andrey Grodzovsky wrote: Handle all DMA IOMMU group related dependencies before the group is removed. v5: Drop IOMMU notifier and switch to lockless call to ttm_tt_unpopulate Signed-off-by: Andrey Grodzovsky

Re: [PATCH v5 20/27] drm: Scope all DRM IOCTLs with drm_dev_enter/exit

2021-04-30 Thread Andrey Grodzovsky
On 2021-04-30 6:25 a.m., Daniel Vetter wrote: On Thu, Apr 29, 2021 at 04:34:55PM -0400, Andrey Grodzovsky wrote: On 2021-04-29 3:05 p.m., Daniel Vetter wrote: On Thu, Apr 29, 2021 at 12:04:33PM -0400, Andrey Grodzovsky wrote: On 2021-04-29 7:32 a.m., Daniel Vetter wrote: On Thu, Apr

Re: [PATCH v5 15/27] drm/scheduler: Fix hang when sched_entity released

2021-04-30 Thread Andrey Grodzovsky
On 2021-04-30 2:47 a.m., Christian König wrote: Am 29.04.21 um 19:06 schrieb Andrey Grodzovsky: On 2021-04-29 3:18 a.m., Christian König wrote: I need to take another look at this part when I don't have a massive headache any more. Maybe split the patch set up into different

Re: [PATCH v5 03/27] drm/amdgpu: Split amdgpu_device_fini into early and late

2021-04-29 Thread Andrey Grodzovsky
late_fini before and then according to Daniel's request it was changed to fini_hw and fini_sw Andrey Thanks, Lijo *From:* amd-gfx on behalf of Andrey Grodzovsky *Sent:* Wednesday, April 28, 2021 8:41:43 PM *To:* d

Re: [PATCH v5 08/27] PCI: add support for dev_groups to struct pci_device_driver

2021-04-29 Thread Andrey Grodzovsky
On 2021-04-29 3:23 p.m., Bjorn Helgaas wrote: On Thu, Apr 29, 2021 at 12:53:15PM -0400, Andrey Grodzovsky wrote: On 2021-04-28 12:53 p.m., Bjorn Helgaas wrote: On Wed, Apr 28, 2021 at 11:11:48AM -0400, Andrey Grodzovsky wrote: This is exact copy of 'USB: add support for dev_grou

Re: [PATCH v5 20/27] drm: Scope all DRM IOCTLs with drm_dev_enter/exit

2021-04-29 Thread Andrey Grodzovsky
On 2021-04-29 3:05 p.m., Daniel Vetter wrote: On Thu, Apr 29, 2021 at 12:04:33PM -0400, Andrey Grodzovsky wrote: On 2021-04-29 7:32 a.m., Daniel Vetter wrote: On Thu, Apr 29, 2021 at 01:23:19PM +0200, Daniel Vetter wrote: On Wed, Apr 28, 2021 at 11:12:00AM -0400, Andrey Grodzovsky wrote

Re: [PATCH v5 13/27] drm/amdgpu: When filizing the fence driver. stop scheduler first.

2021-04-29 Thread Andrey Grodzovsky
On 2021-04-29 3:15 a.m., Christian König wrote: Filizing the fences? You mean finishing the fences, don't you? :) Yes, my bad. Andrey Am 28.04.21 um 17:11 schrieb Andrey Grodzovsky: No point calling amdgpu_fence_wait_empty before stopping the SW scheduler otherwise there is alw

Re: [PATCH v5 15/27] drm/scheduler: Fix hang when sched_entity released

2021-04-29 Thread Andrey Grodzovsky
b and fence handling I am not sure you mean this patch here, maybe another one? Also note you already RBed it. Andrey Christian. Am 28.04.21 um 17:11 schrieb Andrey Grodzovsky: Problem: If scheduler is already stopped by the time sched_entity is released and entity's job_queue not empty I

Re: [PATCH v5 16/27] drm/amdgpu: Unmap all MMIO mappings

2021-04-29 Thread Andrey Grodzovsky
On 2021-04-29 3:19 a.m., Christian König wrote: Am 28.04.21 um 17:11 schrieb Andrey Grodzovsky: Access to those must be prevented post pci_remove That is certainly a no-go. We want to get rid of the kernel pointers in BOs, not add another one. Christian. As we discussed internally

Re: [PATCH v5 08/27] PCI: add support for dev_groups to struct pci_device_driver

2021-04-29 Thread Andrey Grodzovsky
On 2021-04-28 12:53 p.m., Bjorn Helgaas wrote: In subject: s/PCI: add support/PCI: Add support/ to match convention ("git log --oneline drivers/pci/pci-driver.c" to learn this). On Wed, Apr 28, 2021 at 11:11:48AM -0400, Andrey Grodzovsky wrote: This is exact copy of 'USB:

Re: [PATCH v5 20/27] drm: Scope all DRM IOCTLs with drm_dev_enter/exit

2021-04-29 Thread Andrey Grodzovsky
On 2021-04-29 12:29 p.m., Felix Kuehling wrote: Am 2021-04-29 um 12:21 p.m. schrieb Andrey Grodzovsky: On 2021-04-29 12:15 p.m., Felix Kuehling wrote: Am 2021-04-29 um 12:04 p.m. schrieb Andrey Grodzovsky: So as I understand your preferred approach is that I scope any back_end, HW

Re: [PATCH v5 20/27] drm: Scope all DRM IOCTLs with drm_dev_enter/exit

2021-04-29 Thread Andrey Grodzovsky
On 2021-04-29 12:15 p.m., Felix Kuehling wrote: Am 2021-04-29 um 12:04 p.m. schrieb Andrey Grodzovsky: So as I understand your preferred approach is that I scope any back_end, HW specific function with drm_dev_enter/exit because that is where MMIO access takes place. But besides explicit MMIO

Re: [PATCH v5 20/27] drm: Scope all DRM IOCTLs with drm_dev_enter/exit

2021-04-29 Thread Andrey Grodzovsky
On 2021-04-29 7:32 a.m., Daniel Vetter wrote: On Thu, Apr 29, 2021 at 01:23:19PM +0200, Daniel Vetter wrote: On Wed, Apr 28, 2021 at 11:12:00AM -0400, Andrey Grodzovsky wrote: With this calling drm_dev_unplug will flush and block all in flight IOCTLs Also, add feature such that if device

Re: [PATCH v5 19/27] drm/amdgpu: Finilise device fences on device remove.

2021-04-28 Thread Andrey Grodzovsky
On 2021-04-28 11:11 a.m., Andrey Grodzovsky wrote: Make sure all fences dependent on HW present are force signaled when handling device removal. This helps later to scope all HW accessing code such as IOCTLs in drm_dev_enter/exit and use drm_dev_unplug as synchronization point past which we

[PATCH v5 27/27] drm/amdgpu: Verify DMA opearations from device are done

2021-04-28 Thread Andrey Grodzovsky
In case device remove is just simulated by sysfs then verify device doesn't keep doing DMA to the released memory after pci_remove is done. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/gpu/dr

[PATCH v5 23/27] drm/amd/powerplay: Scope all PM queued work with drm_dev_enter/exit

2021-04-28 Thread Andrey Grodzovsky
To allow completion and further block of HW accesses post device PCI remove. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/pm/amdgpu_dpm.c | 44 +-- drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 26 +++--- 2 files changed, 47 insertions(+), 23
