Re: [PATCH] drm/amdgpu: Fix skipping hung job reset during gpu recover.

2018-10-31 Thread Koenig, Christian
Am 31.10.18 um 15:36 schrieb Andrey Grodzovsky:
> Problem:
> During GPU recovery, DAL would hang in
> amdgpu_pm_compute_clocks->amdgpu_fence_wait_empty.
>
> Fix:
> It turns out there was what looks like a typo introduced by commit
> 3320b8d ("drm/amdgpu: remove job->ring"), which caused
> amdgpu_fence_driver_force_completion to be skipped on the guilty job's
> ring. The guilty job's fence was therefore never force-signaled, and this
> later caused the hang in DAL.
>
> Signed-off-by: Andrey Grodzovsky 

Crap, I had already been staring at that code for a while as well but didn't
realize what was wrong with it.

Patch is Reviewed-by: Christian König 

Regards,
Christian.

> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 9a33fd0..8717a4f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -3363,7 +3363,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
>  
>  		kthread_park(ring->sched.thread);
>  
> -		if (job && job->base.sched == &ring->sched)
> +		if (job && job->base.sched != &ring->sched)
>  			continue;
>  
>  		drm_sched_hw_job_reset(&ring->sched, job ? &job->base : NULL);

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm/amdgpu: Fix skipping hung job reset during gpu recover.

2018-10-31 Thread Andrey Grodzovsky
Problem:
During GPU recovery, DAL would hang in
amdgpu_pm_compute_clocks->amdgpu_fence_wait_empty.

Fix:
It turns out there was what looks like a typo introduced by commit
3320b8d ("drm/amdgpu: remove job->ring"), which caused
amdgpu_fence_driver_force_completion to be skipped on the guilty job's
ring. The guilty job's fence was therefore never force-signaled, and this
later caused the hang in DAL.

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 9a33fd0..8717a4f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3363,7 +3363,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 
 		kthread_park(ring->sched.thread);
 
-		if (job && job->base.sched == &ring->sched)
+		if (job && job->base.sched != &ring->sched)
 			continue;
 
 		drm_sched_hw_job_reset(&ring->sched, job ? &job->base : NULL);
-- 
2.7.4
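A minimal userspace sketch of why the inverted test matters, assuming a much-simplified model of the recovery loop. The structs `sched`, `ring`, and `job` and the helpers `force_completion`, `fences_empty`, `recover`, and `guilty_ring_empty_after_recover` are hypothetical stand-ins for illustration, not the real amdgpu/DRM types or functions. Each ring tracks its last emitted and last signaled fence sequence numbers, and `fences_empty()` models the condition amdgpu_fence_wait_empty waits for. With the pre-fix `==` test the loop skips exactly the guilty job's ring, so its fences are never force-completed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_RINGS 3

/* Hypothetical stand-ins, NOT the real kernel structs:
 * sched ~ struct drm_gpu_scheduler, ring ~ struct amdgpu_ring,
 * job ~ struct amdgpu_job (job->sched models job->base.sched). */
struct sched { int id; };

struct ring {
	struct sched sched;
	unsigned int last_emitted_seq;  /* last fence seq emitted on the ring */
	unsigned int last_signaled_seq; /* last fence seq known to be signaled */
};

struct job { const struct sched *sched; };

/* Models amdgpu_fence_driver_force_completion(): treat every emitted
 * fence on the ring as signaled. */
static void force_completion(struct ring *r)
{
	r->last_signaled_seq = r->last_emitted_seq;
}

/* Models the condition amdgpu_fence_wait_empty() waits for. Where this
 * returns false, the real wait would keep blocking (the DAL hang). */
static bool fences_empty(const struct ring *r)
{
	return r->last_signaled_seq == r->last_emitted_seq;
}

/* Models the per-ring loop of amdgpu_device_gpu_recover(). buggy selects
 * the pre-fix "==" test; otherwise the fixed "!=" test is used. */
static void recover(struct ring *rings, size_t n, const struct job *job,
		    bool buggy)
{
	for (size_t i = 0; i < n; ++i) {
		struct ring *ring = &rings[i];
		bool on_guilty_ring = job && job->sched == &ring->sched;

		if (job && (buggy ? on_guilty_ring : !on_guilty_ring))
			continue; /* skipped: no job reset, no force completion */

		force_completion(ring);
	}
}

/* Runs the modeled recovery with the guilty job on ring 1 and reports
 * whether that ring's fences end up signaled. */
static bool guilty_ring_empty_after_recover(bool buggy)
{
	struct ring rings[NUM_RINGS];
	struct job guilty;

	for (int i = 0; i < NUM_RINGS; ++i) {
		rings[i].sched.id = i;
		rings[i].last_emitted_seq = 10;
		rings[i].last_signaled_seq = 7; /* three fences outstanding */
	}
	guilty.sched = &rings[1].sched;
	assert(!fences_empty(&rings[1])); /* ring 1 starts out hung */

	recover(rings, NUM_RINGS, &guilty, buggy);
	return fences_empty(&rings[1]);
}
```

Under these assumptions, guilty_ring_empty_after_recover(true) returns false: the "==" test skips ring 1, so its outstanding fences stay unsignaled and a wait on them would never finish. guilty_ring_empty_after_recover(false) returns true, because the "!=" test skips every ring except the guilty one.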
