Re: [PATCH 1/2] drm/amdgpu: Fix MMIO access page fault

2021-09-17 Thread Andrey Grodzovsky
wed-by: James Zhu Thanks & Best Regards! James On 2021-09-17 7:30 a.m., Andrey Grodzovsky wrote: Add more guards to MMIO access post device unbind/unplug Bug: https://bugs.archlinux.org/task/72092

[PATCH 2/2] drm/amdgpu: Fix resume failures when device is gone

2021-09-17 Thread Andrey Grodzovsky
for PCIe error recovery to avoid accessing registers. This allows the PM resume sequence to complete successfully and PCI remove to finish. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu

[PATCH 1/2] drm/amdgpu: Fix MMIO access page fault

2021-09-17 Thread Andrey Grodzovsky
Add more guards to MMIO access post device unbind/unplug Bug: https://bugs.archlinux.org/task/72092 Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c | 8 ++-- drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c | 17 +++-- 2 files changed
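A minimal user-space sketch of the kind of guard this patch adds around MMIO access. Assumptions: `struct fake_dev`, `dev_enter()`, `dev_exit()` and `guarded_wreg()` are hypothetical names loosely modelled on the kernel's `drm_dev_enter()`/`drm_dev_exit()` pair; the actual vcn_v2_0.c/vcn_v2_5.c changes are not reproduced here. Once the device is marked unplugged, register writes are skipped instead of touching an unmapped BAR.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device model: 'unplugged' stands in for the state that
 * drm_dev_enter() checks, 'regs' stands in for the mapped MMIO BAR. */
struct fake_dev {
    bool unplugged;
    uint32_t regs[16];
    unsigned skipped_writes;
};

/* Returns false once the device is gone; the caller must not touch HW. */
static bool dev_enter(struct fake_dev *d)
{
    return !d->unplugged;
}

static void dev_exit(struct fake_dev *d)
{
    (void)d; /* the kernel helper would end an SRCU read-side section */
}

/* Register write that silently becomes a no-op after unplug. */
static void guarded_wreg(struct fake_dev *d, unsigned idx, uint32_t val)
{
    if (!dev_enter(d)) {
        d->skipped_writes++;
        return;
    }
    d->regs[idx] = val;
    dev_exit(d);
}
```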

Re: [PATCH] drm/amdgpu: Fix crash on device remove/driver unload

2021-09-16 Thread Andrey Grodzovsky
On 2021-09-16 11:51 a.m., Lazar, Lijo wrote: On 9/16/2021 9:15 PM, Andrey Grodzovsky wrote: On 2021-09-16 4:20 a.m., Lazar, Lijo wrote: A minor comment below. On 9/16/2021 1:11 AM, Andrey Grodzovsky wrote: Crash: BUG: unable to handle page fault for address: 10e1 RIP: 0010

Re: [PATCH] drm/amdgpu: Fix crash on device remove/driver unload

2021-09-16 Thread Andrey Grodzovsky
On 2021-09-16 4:20 a.m., Lazar, Lijo wrote: A minor comment below. On 9/16/2021 1:11 AM, Andrey Grodzovsky wrote: Crash: BUG: unable to handle page fault for address: 10e1 RIP: 0010:vega10_power_gate_vce+0x26/0x50 [amdgpu] Call Trace: pp_set_powergating_by_smu+0x16a/0x2b0 [amdgpu

Re: [PATCH v2] drm/amdgpu: Put drm_dev_enter/exit outside hot codepath

2021-09-15 Thread Andrey Grodzovsky
I fixed 2 regressions in the latest code, applied your patch on top and passed libdrm tests on Vega 10. You can pick up those 2 patches and try too if you have time. In any case - Reviewed-and-tested-by: Andrey Grodzovsky Andrey On 2021-09-15 2:37 a.m., xinhui pan wrote: We hit soft hang while

[PATCH] drm/amd/display: Fix crash on device remove/driver unload

2021-09-15 Thread Andrey Grodzovsky
Why: DC core is being released from DM before it's referenced from hpd_rx wq destruction code. How: Move hpd_rx destruction before DC core destruction. Signed-off-by: Andrey Grodzovsky --- .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 24 +-- 1 file changed, 12 insertions

[PATCH] drm/amdgpu: Fix crash on device remove/driver unload

2021-09-15 Thread Andrey Grodzovsky
ee6679aaa61c drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on suspend Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/uvd_v3_1.c | 24 --- drivers/gpu/drm/amd/amdgpu/uvd_v4_2.c | 24 --- drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c | 24 --- drivers

Re: [PATCH 1/2] drm/sched: fix the bug of time out calculation(v4)

2021-09-15 Thread Andrey Grodzovsky
Pushed Andrey On 2021-09-15 7:45 a.m., Christian König wrote: Yes, I think so as well. Andrey can you push this? Christian. Am 15.09.21 um 00:59 schrieb Grodzovsky, Andrey: AFAIK this one is independent. Christian, can you confirm ? Andrey

Re: 回复: [PATCH v2] drm/amdgpu: Put drm_dev_enter/exit outside hot codepath

2021-09-15 Thread Andrey Grodzovsky
On 2021-09-15 9:57 a.m., Christian König wrote: Am 15.09.21 um 15:52 schrieb Andrey Grodzovsky: On 2021-09-15 2:42 a.m., Pan, Xinhui wrote: [AMD Official Use Only] Andrey I hit panic with this plug/unplug test without this patch. Can you please tell which ASIC you are using and which

Re: 回复: [PATCH v2] drm/amdgpu: Put drm_dev_enter/exit outside hot codepath

2021-09-15 Thread Andrey Grodzovsky
On 2021-09-15 2:42 a.m., Pan, Xinhui wrote: [AMD Official Use Only] Andrey I hit panic with this plug/unplug test without this patch. Can you please tell which ASIC you are using and which kernel branch and what is the tip commit ? But as we add enter/exit in all its callers. maybe it

Re: 回复: [PATCH] drm/amdgpu: Put drm_dev_enter/exit outside hot codepath

2021-09-14 Thread Andrey Grodzovsky
I think you missed 'reply all', so bringing it back to public. On 2021-09-14 11:40 p.m., Pan, Xinhui wrote: [AMD Official Use Only] perf says it is the lock addl $0x0,-0x4(%rsp); details are below. The contention may be huge. Yes - that makes sense to me too as long as the lock here is some

Re: [PATCH] drm/amdgpu: Put drm_dev_enter/exit outside hot codepath

2021-09-14 Thread Andrey Grodzovsky
On 2021-09-14 9:42 p.m., xinhui pan wrote: We hit a soft hang while doing a memory pressure test on a NUMA system. After a quick look, this is because kfd invalidates/validates userptr memory frequently with the process_info lock held. perf top shows: 75.81% [kernel] [k] __srcu_read_unlock
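The perf profile above points at the SRCU read-side cost of taking the enter/exit guard on every iteration of a hot loop. A sketch of why hoisting the guard out of the loop helps, assuming hypothetical `guard_enter()`/`guard_exit()` counters in place of `drm_dev_enter()`/`drm_dev_exit()`: the number of guard calls drops from two per element to two per batch.

```c
#include <stddef.h>

/* Hypothetical counter standing in for the cost of an SRCU-based
 * enter/exit pair (drm_dev_enter()/drm_dev_exit() in the kernel). */
static unsigned long guard_calls;

static int guard_enter(void) { guard_calls++; return 1; }
static void guard_exit(void) { guard_calls++; }

/* Guard taken inside the hot loop: 2 guard calls per element. */
static void touch_pages_inner_guard(unsigned *pages, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!guard_enter())
            return;
        pages[i]++;
        guard_exit();
    }
}

/* Guard hoisted out of the loop: one enter/exit pair for the batch. */
static void touch_pages_hoisted_guard(unsigned *pages, size_t n)
{
    if (!guard_enter())
        return;
    for (size_t i = 0; i < n; i++)
        pages[i]++;
    guard_exit();
}
```

The trade-off is that the hoisted version holds the read-side section for the whole batch, which is acceptable only if unplug handling can wait that long.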

Re: [PATCH v2] drm/amdgpu: Fix a race of IB test

2021-09-13 Thread Andrey Grodzovsky
Please add a V2 tag in the description explaining what changed from V1. Other than that, looks good to me. Andrey On 2021-09-12 7:48 p.m., xinhui pan wrote: Direct IB submission should be exclusive. So use write lock. Signed-off-by: xinhui pan ---

Re: [PATCH v2] Revert "drm/scheduler: Avoid accessing freed bad job."

2021-09-02 Thread Andrey Grodzovsky
On 2021-09-02 10:28 a.m., Daniel Vetter wrote: On Tue, Aug 31, 2021 at 02:24:52PM -0400, Andrey Grodzovsky wrote: On 2021-08-31 9:11 a.m., Daniel Vetter wrote: On Thu, Aug 26, 2021 at 11:04:14AM +0200, Daniel Vetter wrote: On Thu, Aug 19, 2021 at 11:25:09AM -0400, Andrey Grodzovsky wrote

Re: [diagnostic TDR mode patches] unify our solution opinions/suggestions in one thread

2021-08-31 Thread Andrey Grodzovsky
On 2021-09-01 12:40 a.m., Jingwen Chen wrote: On Wed Sep 01, 2021 at 12:28:59AM -0400, Andrey Grodzovsky wrote: On 2021-09-01 12:25 a.m., Jingwen Chen wrote: On Wed Sep 01, 2021 at 12:04:47AM -0400, Andrey Grodzovsky wrote: I will answer everything here - On 2021-08-31 9:58 p.m., Liu, Monk

Re: [diagnostic TDR mode patches] unify our solution opinions/suggestions in one thread

2021-08-31 Thread Andrey Grodzovsky
On 2021-09-01 12:25 a.m., Jingwen Chen wrote: On Wed Sep 01, 2021 at 12:04:47AM -0400, Andrey Grodzovsky wrote: I will answer everything here - On 2021-08-31 9:58 p.m., Liu, Monk wrote: [AMD Official Use Only] In the previous discussion, you guys stated that we should

Re: [diagnostic TDR mode patches] unify our solution opinions/suggestions in one thread

2021-08-31 Thread Andrey Grodzovsky
I will answer everything here - On 2021-08-31 9:58 p.m., Liu, Monk wrote: [AMD Official Use Only] In the previous discussion, you guys stated that we should drop the “kthread_should_park” in cleanup_job. @@ -676,15 +676,6 @@ drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched) {  

Re: [PATCH 2/2] drm/sched: serialize job_timeout and scheduler

2021-08-31 Thread Andrey Grodzovsky
On 2021-08-31 12:01 p.m., Luben Tuikov wrote: On 2021-08-31 11:23, Andrey Grodzovsky wrote: On 2021-08-31 10:38 a.m., Daniel Vetter wrote: On Tue, Aug 31, 2021 at 10:20:40AM -0400, Andrey Grodzovsky wrote: On 2021-08-31 10:03 a.m., Daniel Vetter wrote: On Tue, Aug 31, 2021 at 09:53:36AM

Re: [PATCH v2] Revert "drm/scheduler: Avoid accessing freed bad job."

2021-08-31 Thread Andrey Grodzovsky
On 2021-08-31 9:11 a.m., Daniel Vetter wrote: On Thu, Aug 26, 2021 at 11:04:14AM +0200, Daniel Vetter wrote: On Thu, Aug 19, 2021 at 11:25:09AM -0400, Andrey Grodzovsky wrote: On 2021-08-19 5:30 a.m., Daniel Vetter wrote: On Wed, Aug 18, 2021 at 10:51:00AM -0400, Andrey Grodzovsky wrote

Re: [PATCH 2/2] drm/sched: serialize job_timeout and scheduler

2021-08-31 Thread Andrey Grodzovsky
On 2021-08-31 10:38 a.m., Daniel Vetter wrote: On Tue, Aug 31, 2021 at 10:20:40AM -0400, Andrey Grodzovsky wrote: On 2021-08-31 10:03 a.m., Daniel Vetter wrote: On Tue, Aug 31, 2021 at 09:53:36AM -0400, Andrey Grodzovsky wrote: It says patch [2/2] but I can't find patch 1 On 2021-08-31 6

Re: [PATCH 2/2] drm/sched: serialize job_timeout and scheduler

2021-08-31 Thread Andrey Grodzovsky
On 2021-08-31 10:03 a.m., Daniel Vetter wrote: On Tue, Aug 31, 2021 at 09:53:36AM -0400, Andrey Grodzovsky wrote: It says patch [2/2] but I can't find patch 1 On 2021-08-31 6:35 a.m., Monk Liu wrote: tested-by: jingwen chen Signed-off-by: Monk Liu Signed-off-by: jingwen chen

Re: [PATCH 2/2] drm/sched: serialize job_timeout and scheduler

2021-08-31 Thread Andrey Grodzovsky
It says patch [2/2] but I can't find patch 1 On 2021-08-31 6:35 a.m., Monk Liu wrote: tested-by: jingwen chen Signed-off-by: Monk Liu Signed-off-by: jingwen chen --- drivers/gpu/drm/scheduler/sched_main.c | 24 1 file changed, 4 insertions(+), 20 deletions(-)

Re: [PATCH] drm/amdgpu: Fix a deadlock if previous GEM object allocation fails

2021-08-30 Thread Andrey Grodzovsky
On 2021-08-30 11:24 p.m., Pan, Xinhui wrote: [AMD Official Use Only] Unreserve root BO before return, otherwise the next allocation would deadlock. Signed-off-by: xinhui pan --- drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 11 +-- 1 file changed, 5 insertions(+), 6
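The rule in this patch (release the root BO reservation before returning on the error path) follows the usual kernel `goto` cleanup idiom. A sketch under stated assumptions: `struct root_bo`, `bo_reserve()`, `bo_unreserve()` and `gem_create()` are hypothetical, and a second reserve of a still-held BO is modelled as `-EDEADLK` rather than as an actual hang.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the VM root BO reservation. */
struct root_bo {
    bool reserved;
};

/* In the kernel a second reserve would block forever; here it reports it. */
static int bo_reserve(struct root_bo *bo)
{
    if (bo->reserved)
        return -EDEADLK;
    bo->reserved = true;
    return 0;
}

static void bo_unreserve(struct root_bo *bo)
{
    bo->reserved = false;
}

/* 'fail_alloc' simulates the GEM object allocation failing. */
static int gem_create(struct root_bo *root, bool fail_alloc)
{
    int r = bo_reserve(root);
    if (r)
        return r;

    if (fail_alloc) {
        r = -ENOMEM;
        goto out_unreserve; /* do not return with the root BO still reserved */
    }

    /* ... object creation would happen here ... */
    r = 0;

out_unreserve:
    bo_unreserve(root);
    return r;
}
```

With an early `return -ENOMEM` instead of the `goto`, the second call in the usage below would fail with `-EDEADLK`.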

Re: [PATCH] drm/amdgpu: stop scheduler when calling hw_fini (v2)

2021-08-30 Thread Andrey Grodzovsky
empty before suspend. v2: Call drm_sched_resubmit_job before drm_sched_start to restart jobs from the pending list. Suggested-by: Andrey Grodzovsky Suggested-by: Christian König Signed-off-by: Guchun Chen Reviewed-by: Christian König ---   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 8

Re: [PATCH v3 1/4] drm/ttm: Create pinned list

2021-08-30 Thread Andrey Grodzovsky
On 2021-08-30 1:05 p.m., Christian König wrote: Am 30.08.21 um 19:02 schrieb Andrey Grodzovsky: On 2021-08-30 12:51 p.m., Christian König wrote: Am 30.08.21 um 16:16 schrieb Andrey Grodzovsky: On 2021-08-30 4:58 a.m., Christian König wrote: Am 27.08.21 um 22:39 schrieb Andrey

Re: [PATCH v3 1/4] drm/ttm: Create pinned list

2021-08-30 Thread Andrey Grodzovsky
On 2021-08-30 12:51 p.m., Christian König wrote: Am 30.08.21 um 16:16 schrieb Andrey Grodzovsky: On 2021-08-30 4:58 a.m., Christian König wrote: Am 27.08.21 um 22:39 schrieb Andrey Grodzovsky: This list will be used to capture all non VRAM BOs not on LRU so when device is hot unplugged we

Re: [PATCH v3 1/4] drm/ttm: Create pinned list

2021-08-30 Thread Andrey Grodzovsky
On 2021-08-30 4:58 a.m., Christian König wrote: Am 27.08.21 um 22:39 schrieb Andrey Grodzovsky: This list will be used to capture all non VRAM BOs not on LRU so when device is hot unplugged we can iterate the list and unmap DMA mappings before device is removed. v2: Rename function

[PATCH v3 3/4] drm/amdgpu: drm/amdgpu: Handle IOMMU enabled case

2021-08-27 Thread Andrey Grodzovsky
Handle all DMA IOMMU group related dependencies before the group is removed and we try to access it after free. v2: Move the actual handling function to TTM Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 ++ 1 file changed, 2 insertions(+) diff --git

[PATCH v3 2/4] drm/ttm: Clear all DMA mappings on demand

2021-08-27 Thread Andrey Grodzovsky
Switch to ttm_tt_unpopulate Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/ttm/ttm_device.c | 47 include/drm/ttm/ttm_device.h | 1 + 2 files changed, 48 insertions(+) diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c index

[PATCH v3 4/4] drm/amdgpu: Add a UAPI flag for hot plug/unplug

2021-08-27 Thread Andrey Grodzovsky
To support libdrm tests. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c index 6400259a7c4b..c2fdf67ff551 100644

[PATCH v3 1/4] drm/ttm: Create pinned list

2021-08-27 Thread Andrey Grodzovsky
This list will be used to capture all non VRAM BOs not on LRU so when device is hot unplugged we can iterate the list and unmap DMA mappings before device is removed. v2: Rename function to ttm_bo_move_to_pinned v3: Move the pinned list to ttm device Signed-off-by: Andrey Grodzovsky Suggested
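A simplified model of what the pinned list buys at hot-unplug time: pinned BOs are off the LRU, so without a second list they would be missed when tearing down DMA mappings. All names (`struct bo`, `struct fake_ttm_device`, `clear_all_dma_mappings()`) are hypothetical and only loosely mirror the TTM patches in this series.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified buffer object. */
struct bo {
    struct bo *next;
    bool dma_mapped;
};

/* Hypothetical device: evictable BOs on the LRU, pinned ones kept apart. */
struct fake_ttm_device {
    struct bo *lru;
    struct bo *pinned;
};

static void push(struct bo **head, struct bo *bo)
{
    bo->next = *head;
    *head = bo;
}

/* Unmap every DMA mapping on one list; returns how many were unmapped. */
static size_t unmap_list(struct bo *head)
{
    size_t n = 0;
    for (struct bo *bo = head; bo; bo = bo->next) {
        if (bo->dma_mapped) {
            bo->dma_mapped = false;
            n++;
        }
    }
    return n;
}

/* Run on hot unplug, before the device (and its IOMMU group) goes away. */
static size_t clear_all_dma_mappings(struct fake_ttm_device *dev)
{
    return unmap_list(dev->lru) + unmap_list(dev->pinned);
}
```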

[PATCH v3 0/4] Various fixes to pass libdrm hotunplug tests

2021-08-27 Thread Andrey Grodzovsky
IOMMU handling to TTM layer. v3: Move pinned list to ttm device and a few others. Andrey Grodzovsky (4): drm/ttm: Create pinned list drm/ttm: Clear all DMA mappings on demand drm/amdgpu: drm/amdgpu: Handle IOMMU enabled case drm/amdgpu: Add a UAPI flag for hot plug/unplug drivers/gpu

Re: [PATCH] drm/amdgpu: stop scheduler when calling hw_fini

2021-08-27 Thread Andrey Grodzovsky
I don't think it will start/stop twice because amdgpu_fence_driver_hw_fini/init is not called during reset. I am worried about calling drm_sched_start without calling drm_sched_resubmit_job first since that is the place where the jobs are actually restarted. Also calling drm_sched_start with

Re: [PATCH] drm/sched: fix the bug of time out calculation(v3)

2021-08-27 Thread Andrey Grodzovsky
better than starting the timer when pushing the job to the ring buffer, because that is completely off. Christian. Am 27.08.21 um 20:22 schrieb Andrey Grodzovsky: As I mentioned to Monk before - what about cases such as in this test - https://gitlab.freedesktop.org/mesa/drm/-/commit

Re: [PATCH] drm/sched: fix the bug of time out calculation(v3)

2021-08-27 Thread Andrey Grodzovsky
into the ring buffer, but rather when it starts processing. Starting processing is a bit swampy defined, but just starting the timer when the previous job completes should be fine enough. Christian. Am 27.08.21 um 15:57 schrieb Andrey Grodzovsky: The TS represents the point in time when the job

Re: [PATCH v2 0/4] Various fixes to pass libdrm hotunplug tests

2021-08-27 Thread Andrey Grodzovsky
Ping Andrey On 2021-08-26 1:27 p.m., Andrey Grodzovsky wrote: Bunch of fixes to enable passing hotplug tests I previously added here [1] with latest code. Once accepted I will enable the tests on libdrm side. [1] - https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/172 v2: Dropping VCE

Re: [PATCH] drm/sched: fix the bug of time out calculation(v3)

2021-08-27 Thread Andrey Grodzovsky
6.08.21 um 22:14 schrieb Andrey Grodzovsky: Attached quick patch for per job TTL calculation to make the next timer expiration more precise. It's on top of the patch in this thread. Let me know if this makes sense. Andrey On 2021-08-26 10:03 a.m., Andrey Grodzovsky wrote: On 2021-08-26 12:55 a.m.

Re: [PATCH] drm/sched: fix the bug of time out calculation(v3)

2021-08-27 Thread Andrey Grodzovsky
: [PATCH] drm/sched: fix the bug of time out calculation(v3) Attached quick patch for per job TTL calculation to make the next timer expiration more precise. It's on top of the patch in this thread. Let me know if this makes sense. Andrey On 2021-08-26 10:03 a.m., Andrey Grodzovsky wrote: On 2021-08

Re: [PATCH] drm/sched: fix the bug of time out calculation(v3)

2021-08-26 Thread Andrey Grodzovsky
Attached quick patch for per job TTL calculation to make the next timer expiration more precise. It's on top of the patch in this thread. Let me know if this makes sense. Andrey On 2021-08-26 10:03 a.m., Andrey Grodzovsky wrote: On 2021-08-26 12:55 a.m., Monk Liu wrote: issue: in cleanup_job
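One way to read the per-job TTL idea: when the head job signals, re-arm the timer with the remaining budget of the next job rather than a fresh full timeout. A sketch, assuming the next job's start time is taken as the moment its predecessor completed (as suggested elsewhere in this thread); `next_timer_delay()` is a hypothetical helper, not the code from the attached patch.

```c
/*
 * Hypothetical helper. 'next_start' is when the next job is considered
 * to have started running (e.g. when its predecessor completed),
 * 'timeout' is the per-job budget and 'now' is the current time, all in
 * the same arbitrary units. Returns the delay to re-arm the timer with.
 */
static long next_timer_delay(long next_start, long timeout, long now)
{
    long deadline = next_start + timeout;

    /* Already overdue: expire immediately instead of granting more time. */
    if (deadline <= now)
        return 0;
    return deadline - now;
}
```

For a job that started at 100 with a budget of 50, checking at 120 re-arms for 30 rather than a full 50, so a hung job cannot keep extending its own deadline.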

[PATCH v2 2/4] drm/ttm: Clear all DMA mappings on demand

2021-08-26 Thread Andrey Grodzovsky
Used by drivers supporting hot unplug to handle all DMA IOMMU group related dependencies before the group is removed during device removal and we try to access it after free when last device pointer from user space is dropped. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/ttm

[PATCH v2 4/4] drm/amdgpu: Add a UAPI flag for hot plug/unplug

2021-08-26 Thread Andrey Grodzovsky
To support libdrm tests. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c index 6400259a7c4b..c2fdf67ff551 100644

[PATCH v2 0/4] Various fixes to pass libdrm hotunplug tests

2021-08-26 Thread Andrey Grodzovsky
IOMMU handling to TTM layer. Andrey Grodzovsky (4): drm/ttm: Create pinned list drm/ttm: Clear all DMA mappings on demand drm/amdgpu: drm/amdgpu: Handle IOMMU enabled case drm/amdgpu: Add a UAPI flag for hot plug/unplug drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 + drivers/gpu/drm

[PATCH v2 3/4] drm/amdgpu: drm/amdgpu: Handle IOMMU enabled case

2021-08-26 Thread Andrey Grodzovsky
Handle all DMA IOMMU group related dependencies before the group is removed and we try to access it after free. v2: Move the actual handling function to TTM Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 ++ 1 file changed, 2 insertions(+) diff --git

[PATCH v2 1/4] drm/ttm: Create pinned list

2021-08-26 Thread Andrey Grodzovsky
assigned to them. Signed-off-by: Andrey Grodzovsky Suggested-by: Christian König --- drivers/gpu/drm/ttm/ttm_bo.c | 30 ++ drivers/gpu/drm/ttm/ttm_resource.c | 1 + include/drm/ttm/ttm_resource.h | 1 + 3 files changed, 28 insertions(+), 4 deletions(-) diff

Re: [PATCH] drm/sched: fix the bug of time out calculation(v3)

2021-08-26 Thread Andrey Grodzovsky
On 2021-08-26 12:55 a.m., Monk Liu wrote: issue: in cleanup_job the cancel_delayed_work will cancel a TO timer even though its corresponding job is still running. fix: do not cancel the timer in cleanup_job, instead do the cancelling only when the head job is signaled, and if there is a

Re: [PATCH 3/4] drm/amdgpu: drm/amdgpu: Handle IOMMU enabled case

2021-08-26 Thread Andrey Grodzovsky
Ping Andrey On 2021-08-25 11:36 a.m., Andrey Grodzovsky wrote: On 2021-08-25 2:43 a.m., Christian König wrote: Am 24.08.21 um 23:01 schrieb Andrey Grodzovsky: Handle all DMA IOMMU group related dependencies before the group is removed and we try to access it after free. Signed-off

Re: [PATCH] drm/sched: fix the bug of time out calculation(v2)

2021-08-25 Thread Andrey Grodzovsky
On 2021-08-26 12:55 a.m., Liu, Monk wrote: [AMD Official Use Only] But for timer pending case (common case) your mod_delayed_work will effectively do exactly the same if you don't use per job TTLs - you mod it to sched->timeout value which resets the pending timer to again count from 0.

Re: [PATCH] drm/sched: fix the bug of time out calculation(v2)

2021-08-25 Thread Andrey Grodzovsky
On 2021-08-25 10:31 p.m., Liu, Monk wrote: [AMD Official Use Only] Hi Andrey I'm not quite sure if I read you correctly Seems to me you can only do it for empty pending list otherwise you risk cancelling a legit new timer that was started by the next job or not restarting timer at all

Re: [PATCH] drm/sched: fix the bug of time out calculation(v2)

2021-08-25 Thread Andrey Grodzovsky
On 2021-08-25 8:11 a.m., Christian König wrote: No, this would break that logic here. See drm_sched_start_timeout() can be called multiple times, this is intentional and very important! The logic in queue_delayed_work() makes sure that the timer is only started once and then never again.
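The point about calling drm_sched_start_timeout() repeatedly rests on documented workqueue semantics: `queue_delayed_work()` returns false and leaves the timer alone if the work is already pending, while `mod_delayed_work()` re-arms a pending timer with the new delay. A hypothetical user-space model of just those two behaviours (`struct fake_timer`, `fake_queue()` and `fake_mod()` are illustrative names, not kernel API):

```c
#include <stdbool.h>

/* Hypothetical single-shot timer; 'expires' is an absolute time. */
struct fake_timer {
    bool pending;
    long expires;
};

/* Like queue_delayed_work(): no-op (returns false) if already pending. */
static bool fake_queue(struct fake_timer *t, long now, long delay)
{
    if (t->pending)
        return false;          /* keep the original expiry */
    t->pending = true;
    t->expires = now + delay;
    return true;
}

/* Like mod_delayed_work(): always re-arm; returns true if it was pending. */
static bool fake_mod(struct fake_timer *t, long now, long delay)
{
    bool was_pending = t->pending;
    t->pending = true;
    t->expires = now + delay;  /* countdown restarts from 'now' */
    return was_pending;
}
```

With queue semantics a second start call is harmless, whereas a mod-style call resets a pending timer to count from zero again, which is the behaviour discussed in the v2 replies above.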

Re: [PATCH 3/4] drm/amdgpu: drm/amdgpu: Handle IOMMU enabled case

2021-08-25 Thread Andrey Grodzovsky
On 2021-08-25 2:43 a.m., Christian König wrote: Am 24.08.21 um 23:01 schrieb Andrey Grodzovsky: Handle all DMA IOMMU group related dependencies before the group is removed and we try to access it after free. Signed-off-by: Andrey Grodzovsky ---   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

Re: [PATCH 1/4] drm/amdgpu: Move flush VCE idle_work during HW fini

2021-08-24 Thread Andrey Grodzovsky
here too. https://lists.freedesktop.org/archives/amd-gfx/2021-August/067972.html https://lists.freedesktop.org/archives/amd-gfx/2021-August/067967.html BR Evan -Original Message- From: amd-gfx On Behalf Of Andrey Grodzovsky Sent: Wednesday, August 25, 2021 5:01 AM To: dri-de

[PATCH 4/4] drm/amdgpu: Add a UAPI flag for hot plug/unplug

2021-08-24 Thread Andrey Grodzovsky
To support libdrm tests. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c index 6400259a7c4b..c2fdf67ff551 100644

[PATCH 3/4] drm/amdgpu: drm/amdgpu: Handle IOMMU enabled case

2021-08-24 Thread Andrey Grodzovsky
Handle all DMA IOMMU group related dependencies before the group is removed and we try to access it after free. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 + drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c| 50 ++ drivers/gpu/drm/amd

[PATCH 1/4] drm/amdgpu: Move flush VCE idle_work during HW fini

2021-08-24 Thread Andrey Grodzovsky
Attempts to powergate after the device is removed lead to a crash. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 1 - drivers/gpu/drm/amd/amdgpu/vce_v2_0.c | 4 drivers/gpu/drm/amd/amdgpu/vce_v3_0.c | 5 - drivers/gpu/drm/amd/amdgpu/vce_v4_0.c | 2 ++ 4

[PATCH 2/4] drm/ttm: Create pinned list

2021-08-24 Thread Andrey Grodzovsky
This list will be used to capture all non VRAM BOs not on LRU so when device is hot unplugged we can iterate the list and unmap DMA mappings before device is removed. Signed-off-by: Andrey Grodzovsky Suggested-by: Christian König --- drivers/gpu/drm/ttm/ttm_bo.c | 24

[PATCH 0/4] Various fixes to pass libdrm hotunplug tests

2021-08-24 Thread Andrey Grodzovsky
Bunch of fixes to enable passing hotplug tests I previously added here [1] with latest code. Once accepted I will enable the tests on libdrm side. [1] - https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/172 Andrey Grodzovsky (4): drm/amdgpu: Move flush VCE idle_work during HW fini drm

Re: [PATCH] drm/sched: fix the bug of time out calculation

2021-08-24 Thread Andrey Grodzovsky
On 2021-08-24 5:51 a.m., Monk Liu wrote: the original logic is wrong in that the timeout will not be retriggered after the previous job signaled, and that leads to the scenario that all jobs in the same scheduler share the same timeout timer from the very beginning job in this scheduler which is

Re: [PATCH] drm/sched: fix the bug of time out calculation

2021-08-24 Thread Andrey Grodzovsky
On 2021-08-24 10:46 a.m., Andrey Grodzovsky wrote: On 2021-08-24 5:51 a.m., Monk Liu wrote: the original logic is wrong in that the timeout will not be retriggered after the previous job signaled, and that leads to the scenario that all jobs in the same scheduler share the same timeout timer

Re: [PATCH v2] Revert "drm/scheduler: Avoid accessing freed bad job."

2021-08-24 Thread Andrey Grodzovsky
hursday, August 19, 2021 5:31 PM To: Grodzovsky, Andrey Cc: Daniel Vetter ; Alex Deucher ; Chen, JingWen ; Maling list - DRI developers ; amd-gfx list ; Liu, Monk ; Koenig, Christian Subject: Re: [PATCH v2] Revert "drm/scheduler: Avoid accessing freed bad job." On Wed, Aug 18, 2021 at

Re: [PATCH] drm/amdgpu: avoid over-handle of fence driver fini in s3 test (v2)

2021-08-23 Thread Andrey Grodzovsky
in s3 test (v2) Please go ahead.  Thanks! Alex On Thu, Aug 19, 2021 at 8:05 AM Mike Lothian wrote: Hi Do I need to open a new bug report for this? Cheers Mike On Wed, 18 Aug 2021 at 06:26, Andrey Grodzovsky wrote: On 2021-08-02 1:16 a.m., Guchun Chen wrote

Re: [PATCH v2] Revert "drm/scheduler: Avoid accessing freed bad job."

2021-08-20 Thread Andrey Grodzovsky
sky, Andrey Cc: Daniel Vetter ; Alex Deucher ; Chen, JingWen ; Maling list - DRI developers ; amd-gfx list ; Liu, Monk ; Koenig, Christian Subject: Re: [PATCH v2] Revert "drm/scheduler: Avoid accessing freed bad job." On Wed, Aug 18, 2021 at 10:51:00AM -0400, Andrey Grodzovsky wrot

Re: [PATCH v2] Revert "drm/scheduler: Avoid accessing freed bad job."

2021-08-20 Thread Andrey Grodzovsky
Maling list - DRI developers ; amd-gfx list ; Liu, Monk ; Koenig, Christian Subject: Re: [PATCH v2] Revert "drm/scheduler: Avoid accessing freed bad job." On Wed, Aug 18, 2021 at 10:51:00AM -0400, Andrey Grodzovsky wrote: On 2021-08-18 10:42 a.m., Daniel Vetter wrote: On Wed, Aug 18

Re: [PATCH v2] Revert "drm/scheduler: Avoid accessing freed bad job."

2021-08-19 Thread Andrey Grodzovsky
On 2021-08-19 5:30 a.m., Daniel Vetter wrote: On Wed, Aug 18, 2021 at 10:51:00AM -0400, Andrey Grodzovsky wrote: On 2021-08-18 10:42 a.m., Daniel Vetter wrote: On Wed, Aug 18, 2021 at 10:36:32AM -0400, Andrey Grodzovsky wrote: On 2021-08-18 10:32 a.m., Daniel Vetter wrote: On Wed, Aug 18

Re: [PATCH v2] Revert "drm/scheduler: Avoid accessing freed bad job."

2021-08-18 Thread Andrey Grodzovsky
On 2021-08-18 10:42 a.m., Daniel Vetter wrote: On Wed, Aug 18, 2021 at 10:36:32AM -0400, Andrey Grodzovsky wrote: On 2021-08-18 10:32 a.m., Daniel Vetter wrote: On Wed, Aug 18, 2021 at 10:26:25AM -0400, Andrey Grodzovsky wrote: On 2021-08-18 10:02 a.m., Alex Deucher wrote: + dri-devel

Re: [PATCH v2] Revert "drm/scheduler: Avoid accessing freed bad job."

2021-08-18 Thread Andrey Grodzovsky
On 2021-08-18 10:32 a.m., Daniel Vetter wrote: On Wed, Aug 18, 2021 at 10:26:25AM -0400, Andrey Grodzovsky wrote: On 2021-08-18 10:02 a.m., Alex Deucher wrote: + dri-devel Since scheduler is a shared component, please add dri-devel on all scheduler patches. On Wed, Aug 18, 2021 at 7:21 AM

Re: [PATCH v2] Revert "drm/scheduler: Avoid accessing freed bad job."

2021-08-18 Thread Andrey Grodzovsky
On 2021-08-18 10:02 a.m., Alex Deucher wrote: + dri-devel Since scheduler is a shared component, please add dri-devel on all scheduler patches. On Wed, Aug 18, 2021 at 7:21 AM Jingwen Chen wrote: [Why] for bailing job, this commit will delete it from pending list thus the bailing job will

Re: [PATCH] drm/amdgpu: avoid over-handle of fence driver fini in s3 test (v2)

2021-08-17 Thread Andrey Grodzovsky
On 2021-08-02 1:16 a.m., Guchun Chen wrote: In amdgpu_fence_driver_hw_fini, no need to call drm_sched_fini to stop scheduler in s3 test, otherwise, fence related failure will arrive after resume. To fix this and for a better clean up, move drm_sched_fini from fence_hw_fini to fence_sw_fini, as

Re: [PATCH] drm/amd/amdgpu:flush ttm delayed work before cancel_sync

2021-08-17 Thread Andrey Grodzovsky
Looks reasonable to me. Reviewed-by: Andrey Grodzovsky Andrey On 2021-08-17 5:50 a.m., YuBiao Wang wrote: [Why] In some cases when we unload driver, warning call trace will show up in vram_mgr_fini which claims that LRU is not empty, caused by the ttm bo inside delay deleted queue. [How] We

Re: [PATCH] Revert "drm/scheduler: Avoid accessing freed bad job."

2021-08-17 Thread Andrey Grodzovsky
On 2021-08-17 12:28 a.m., Jingwen Chen wrote: [Why] for bailing job, this commit will delete it from pending list thus the bailing job will never have a chance to be resubmitted even in advance tdr mode. [How] after embedding hw_fence into amdgpu_job is done, the race condition that this commit

Re: [PATCH v4] drm/amd/amdgpu embed hw_fence into amdgpu_job

2021-08-10 Thread Andrey Grodzovsky
Reviewed-by: Andrey Grodzovsky Andrey On 2021-08-09 11:22 p.m., Jingwen Chen wrote: From: Jack Zhang Why: Previously the hw fence was allocated separately from the job. It caused historical lifetime issues and corner cases. The ideal situation is to take fence to manage both job and fence's lifetime

Re: [PATCHv2 2/2] drm/amd/amdgpu: add tdr support for embeded hw_fence

2021-08-09 Thread Andrey Grodzovsky
On 2021-08-05 4:31 a.m., Jingwen Chen wrote: [Why] After embedding hw_fence into amdgpu_job, we need to add tdr support for this feature. [How] 1. Add a resubmit_flag for resubmit jobs. 2. Clear job fence from RCU and force complete vm flush fences in pre_asic_reset 3. skip dma_fence_get for

Re: [PATCHv2 1/2] drm/amd/amdgpu embed hw_fence into amdgpu_job

2021-08-05 Thread Andrey Grodzovsky
On 2021-08-05 4:31 a.m., Jingwen Chen wrote: From: Jack Zhang Why: Previously the hw fence was allocated separately from the job. It caused historical lifetime issues and corner cases. The ideal situation is to take fence to manage both job and fence's lifetime, and simplify the design of

Re: [PATCH 2/2] drm: add tdr support for embeded hw_fence

2021-07-23 Thread Andrey Grodzovsky
On 2021-07-22 8:20 p.m., Jingwen Chen wrote: On Thu Jul 22, 2021 at 01:50:09PM -0400, Andrey Grodzovsky wrote: On 2021-07-22 1:27 p.m., Jingwen Chen wrote: On Thu Jul 22, 2021 at 01:17:13PM -0400, Andrey Grodzovsky wrote: On 2021-07-22 12:47 p.m., Jingwen Chen wrote: On Thu Jul 22, 2021

Re: [PATCH 2/2] drm: add tdr support for embeded hw_fence

2021-07-22 Thread Andrey Grodzovsky
On 2021-07-22 1:27 p.m., Jingwen Chen wrote: On Thu Jul 22, 2021 at 01:17:13PM -0400, Andrey Grodzovsky wrote: On 2021-07-22 12:47 p.m., Jingwen Chen wrote: On Thu Jul 22, 2021 at 06:24:28PM +0200, Christian König wrote: Am 22.07.21 um 16:45 schrieb Andrey Grodzovsky: On 2021-07-22 6:45 a.m

Re: [PATCH 2/2] drm: add tdr support for embeded hw_fence

2021-07-22 Thread Andrey Grodzovsky
On 2021-07-22 12:47 p.m., Jingwen Chen wrote: On Thu Jul 22, 2021 at 06:24:28PM +0200, Christian König wrote: Am 22.07.21 um 16:45 schrieb Andrey Grodzovsky: On 2021-07-22 6:45 a.m., Jingwen Chen wrote: On Wed Jul 21, 2021 at 12:53:51PM -0400, Andrey Grodzovsky wrote: On 2021-07-20 11:13

Re: [PATCH 2/2] drm: add tdr support for embeded hw_fence

2021-07-22 Thread Andrey Grodzovsky
On 2021-07-22 6:45 a.m., Jingwen Chen wrote: On Wed Jul 21, 2021 at 12:53:51PM -0400, Andrey Grodzovsky wrote: On 2021-07-20 11:13 p.m., Jingwen Chen wrote: [Why] After embedding hw_fence into amdgpu_job, we need to add tdr support for this feature. [How] 1. Add a resubmit_flag for resubmit

Re: [PATCH 2/2] drm: add tdr support for embeded hw_fence

2021-07-21 Thread Andrey Grodzovsky
On 2021-07-20 11:13 p.m., Jingwen Chen wrote: [Why] After embedding hw_fence into amdgpu_job, we need to add tdr support for this feature. [How] 1. Add a resubmit_flag for resubmit jobs. 2. Clear job fence from RCU and force complete vm flush fences in pre_asic_reset 3. skip dma_fence_get for

[PATCH 2/3] drm/amdgpu: Switch to LBF for USPC PD FW in psp v11

2021-07-13 Thread Andrey Grodzovsky
Update callback signature and update implementation. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h | 6 ++-- drivers/gpu/drm/amd/amdgpu/psp_v11_0.c | 41 - 2 files changed, 16 insertions(+), 31 deletions(-) diff --git a/drivers/gpu/drm

[PATCH 3/3] drm/amdgpu: Switch to LBF for USPC PD FW in psp v13

2021-07-13 Thread Andrey Grodzovsky
Add USBC PD FW implementation here to be used with relevant ASICs. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/psp_v13_0.c | 66 ++ 1 file changed, 66 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/psp_v13_0.c b/drivers/gpu/drm/amd/amdgpu

[PATCH 1/3] drm/amdgpu: Switch to VRAM buffer for USBC PD FW.

2021-07-13 Thread Andrey Grodzovsky
System memory-based implementation for updating the USBCPD is deprecated for switching to LFB based implementation for all the ASICs. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 33 ++--- 1 file changed, 13 insertions(+), 20 deletions

[PATCH 5/6] drm/amdgpu: Fix BUG_ON assert

2021-06-22 Thread Andrey Grodzovsky
With the CPU domain added to placement, you can now have 3 placements at once. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu

[PATCH 4/6] drm/amdgpu: switch gtt_mgr to counting used pages

2021-06-22 Thread Andrey Grodzovsky
From: Lang Yu Change mgr->available into mgr->used (invert the value). Makes more sense to do it this way since we don't need the spinlock any more to double check the handling. v3 (chk): separated from the TEMPORARY FLAG change. Signed-off-by: Lang Yu Signed-off-by: Christian König ---
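A sketch of the "count used pages, not available pages" idea with C11 atomics, assuming a hypothetical `struct page_mgr`: allocation optimistically adds to `used` and rolls back if that overshoots `size`, so the check needs no spinlock. This illustrates the technique only and is not the amdgpu_gtt_mgr code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical page-count manager: 'used' is the only shared state. */
struct page_mgr {
    atomic_long used;
    long size;
};

/* Reserve npages; on overshoot undo the add and report failure. */
static bool mgr_alloc(struct page_mgr *m, long npages)
{
    long old = atomic_fetch_add(&m->used, npages);
    if (old + npages > m->size) {
        atomic_fetch_sub(&m->used, npages); /* roll back */
        return false;
    }
    return true;
}

static void mgr_free(struct page_mgr *m, long npages)
{
    atomic_fetch_sub(&m->used, npages);
}
```

One consequence of this design: a concurrent caller may briefly observe the transient overshoot and fail even though space would have been available a moment later.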

[PATCH 2/6] drm/amdgpu: user temporary GTT as bounce buffer

2021-06-22 Thread Andrey Grodzovsky
From: Lang Yu Currently, we have a limited GTT memory size and need a bounce buffer when doing buffer migration between the VRAM and SYSTEM domains. The problem is that under GTT memory pressure we can't do buffer migration between VRAM and SYSTEM. But in some cases we really need that.

[PATCH 6/6] drm/ttm: Fix multihop assert on eviction.

2021-06-22 Thread Andrey Grodzovsky
Problem: Under memory pressure when GTT domain is almost full multihop assert will come up when trying to evict LRU BO from VRAM to SYSTEM. Fix: Don't assert on multihop error in evict code but rather do a retry as we do in ttm_bo_move_buffer Signed-off-by: Andrey Grodzovsky --- drivers/gpu

[PATCH 3/6] drm/amdgpu: always allow evicting to SYSTEM domain

2021-06-22 Thread Andrey Grodzovsky
From: Christian König When we run out of GTT we should still be able to evict VRAM->SYSTEM with a bounce buffer. Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 6 -- 1 file changed, 4 insertions(+), 2

[PATCH 1/6] drm/ttm: add TTM_PL_FLAG_TEMPORARY flag v3

2021-06-22 Thread Andrey Grodzovsky
From: Lang Yu Sometimes drivers need to use bounce buffers to evict BOs. While those reside in some domain they are not necessarily suitable for CS. Add a flag so that drivers can note that a bounce buffer needs to be reallocated during validation. v2: add detailed comments v3 (chk): merge

Re: [PATCH 0/7] libdrm tests for hot-unplug feature

2021-06-07 Thread Andrey Grodzovsky
Please open a gitlab MR for these. Alex On Tue, Jun 1, 2021 at 4:17 PM Andrey Grodzovsky wrote: Adding some tests to accompany the recently added hot-unplug feature. For now the test suite is disabled until the feature propagates from drm-misc-next to drm-next

Re: [PATCH 0/7] libdrm tests for hot-unplug feature

2021-06-04 Thread Andrey Grodzovsky
On 2021-06-03 10:53 p.m., Alex Deucher wrote: On Thu, Jun 3, 2021 at 9:37 PM Dave Airlie wrote: On Fri, 4 Jun 2021 at 07:20, Alex Deucher wrote: Please open a gitlab MR for these. I'd really prefer these tests all get migrated out of here into igt. I don't think libdrm_amdgpu really

Re: [PATCH 0/7] libdrm tests for hot-unplug feature

2021-06-03 Thread Andrey Grodzovsky
Ping Andrey On 2021-06-02 10:20 a.m., Andrey Grodzovsky wrote: On 2021-06-02 3:59 a.m., Daniel Vetter wrote: On Tue, Jun 1, 2021 at 10:17 PM Andrey Grodzovsky wrote: Adding some tests to accompany the recently added hot-unplug feature. For now the test suite is disabled until the feature

Re: [PATCH 2/7] xf86drm: Add function to retrieve char device path

2021-06-02 Thread Andrey Grodzovsky
It calls drmNodeIsDRM, which is itself a private function, so if I implement it in the amdgpu part I still need to expose drmNodeIsDRM. Note that this function is basically a subset of drmGetDeviceNameFromFd2. Andrey On 2021-06-02 5:16 a.m., Simon Ser wrote: Do we really need to make this a public

Re: [PATCH 0/7] libdrm tests for hot-unplug feature

2021-06-02 Thread Andrey Grodzovsky
On 2021-06-02 3:59 a.m., Daniel Vetter wrote: On Tue, Jun 1, 2021 at 10:17 PM Andrey Grodzovsky wrote: Adding some tests to accompany the recently added hot-unplug feature. For now the test suite is disabled until the feature propagates from drm-misc-next to drm-next. Andrey Grodzovsky (7

[PATCH 2/7] xf86drm: Add function to retrieve char device path

2021-06-01 Thread Andrey Grodzovsky
Used to access device controls Signed-off-by: Andrey Grodzovsky --- xf86drm.c | 23 +++ xf86drm.h | 1 + 2 files changed, 24 insertions(+) diff --git a/xf86drm.c b/xf86drm.c index edfeb347..a5ecd323 100644 --- a/xf86drm.c +++ b/xf86drm.c @@ -4361,6 +4361,29 @@ drm_public

[PATCH 4/7] test/amdgpu/hotunplug: Add test suite for GPU unplug

2021-06-01 Thread Andrey Grodzovsky
Add just the test suite skeleton. Signed-off-by: Andrey Grodzovsky --- tests/amdgpu/amdgpu_test.c | 11 tests/amdgpu/amdgpu_test.h | 23 +++ tests/amdgpu/hotunplug_tests.c | 116 + tests/amdgpu/meson.build | 1 + 4 files changed, 151

[PATCH 3/7] test/amdgpu: Add helper functions for hot unplug

2021-06-01 Thread Andrey Grodzovsky
Expose the close-device helper and add an open-device helper which preserves the test index. Signed-off-by: Andrey Grodzovsky --- tests/amdgpu/amdgpu_test.c | 31 --- tests/amdgpu/amdgpu_test.h | 3 +++ 2 files changed, 31 insertions(+), 3 deletions(-) diff --git a/tests/amdgpu

[PATCH 7/7] tests/amdgpu/hotunplug: Add hotunplug with exported bo test

2021-06-01 Thread Andrey Grodzovsky
Disconnect device while BO is exported. Signed-off-by: Andrey Grodzovsky --- tests/amdgpu/hotunplug_tests.c | 46 -- 1 file changed, 44 insertions(+), 2 deletions(-) diff --git a/tests/amdgpu/hotunplug_tests.c b/tests/amdgpu/hotunplug_tests.c index 6e133a07

[PATCH 5/7] test/amdgpu/hotunplug: Add basic test

2021-06-01 Thread Andrey Grodzovsky
Add plug/unplug device and open/close device file infrastructure. Add a basic test: unplug the device while the device file is still open, then close the device file and replug the device. Signed-off-by: Andrey Grodzovsky --- tests/amdgpu/hotunplug_tests.c | 135 + 1 file

[PATCH 1/7] tests/amdgpu: Fix valgrind warning

2021-06-01 Thread Andrey Grodzovsky
Struct access after free Signed-off-by: Andrey Grodzovsky --- tests/amdgpu/basic_tests.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/tests/amdgpu/basic_tests.c b/tests/amdgpu/basic_tests.c index 8e7c4916..8b7fd0f6 100644 --- a/tests/amdgpu/basic_tests.c +++ b/tests

[PATCH 0/7] libdrm tests for hot-unplug feature

2021-06-01 Thread Andrey Grodzovsky
Adding some tests to accompany the recently added hot-unplug feature. For now the test suite is disabled until the feature propagates from drm-misc-next to drm-next. Andrey Grodzovsky (7): tests/amdgpu: Fix valgrind warning xf86drm: Add function to retrieve char device path test/amdgpu: Add

[PATCH 6/7] tests/amdgpu/hotunplug: Add unplug with cs test.

2021-06-01 Thread Andrey Grodzovsky
Same as simple test but while doing cs Signed-off-by: Andrey Grodzovsky --- tests/amdgpu/hotunplug_tests.c | 128 - 1 file changed, 126 insertions(+), 2 deletions(-) diff --git a/tests/amdgpu/hotunplug_tests.c b/tests/amdgpu/hotunplug_tests.c index c2bc1cf2
