RE: [PATCH 1/2] drm/amdgpu: fix double gpu_recovery for NV of SRIOV

2019-12-17 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

Reviewed-by: Emily Deng 

>-----Original Message-----
>From: amd-gfx  On Behalf Of Monk Liu
>Sent: Tuesday, December 17, 2019 6:20 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Liu, Monk 
>Subject: [PATCH 1/2] drm/amdgpu: fix double gpu_recovery for NV of SRIOV
>
>issues:
>gpu_recover() is re-entered by the mailbox interrupt handler in mxgpu_nv.c
>
>fix:
>bypass the gpu_recover() call in the mailbox interrupt handler as long as no
>timeout is infinite: the TDR is then triggered automatically after the
>timeout, so there is no need to invoke gpu_recover() through the mailbox
>interrupt.
>
>Signed-off-by: Monk Liu 
>---
> drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
>index 0d8767e..1c3a7d4 100644
>--- a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
>+++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
>@@ -269,7 +269,11 @@ static void xgpu_nv_mailbox_flr_work(struct work_struct *work)
> 	}
>
> 	/* Trigger recovery for world switch failure if no TDR */
>-	if (amdgpu_device_should_recover_gpu(adev))
>+	if (amdgpu_device_should_recover_gpu(adev)
>+		&& (adev->sdma_timeout == MAX_SCHEDULE_TIMEOUT ||
>+		adev->gfx_timeout == MAX_SCHEDULE_TIMEOUT ||
>+		adev->compute_timeout == MAX_SCHEDULE_TIMEOUT ||
>+		adev->video_timeout == MAX_SCHEDULE_TIMEOUT))
> 		amdgpu_device_gpu_recover(adev, NULL);
> }
>
>--
>2.7.4
>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 1/2] drm/amdgpu: fix double gpu_recovery for NV of SRIOV

2019-12-17 Thread Monk Liu
issues:
gpu_recover() is re-entered by the mailbox interrupt
handler in mxgpu_nv.c

fix:
bypass the gpu_recover() call in the mailbox interrupt handler
as long as no timeout is infinite: the TDR is then triggered
automatically after the timeout, so there is no need to invoke
gpu_recover() through the mailbox interrupt.

Signed-off-by: Monk Liu 
---
 drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
index 0d8767e..1c3a7d4 100644
--- a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
+++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
@@ -269,7 +269,11 @@ static void xgpu_nv_mailbox_flr_work(struct work_struct *work)
 	}
 
 	/* Trigger recovery for world switch failure if no TDR */
-	if (amdgpu_device_should_recover_gpu(adev))
+	if (amdgpu_device_should_recover_gpu(adev)
+		&& (adev->sdma_timeout == MAX_SCHEDULE_TIMEOUT ||
+		adev->gfx_timeout == MAX_SCHEDULE_TIMEOUT ||
+		adev->compute_timeout == MAX_SCHEDULE_TIMEOUT ||
+		adev->video_timeout == MAX_SCHEDULE_TIMEOUT))
 		amdgpu_device_gpu_recover(adev, NULL);
 }
 
-- 
2.7.4
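
For readers who want to see the patched condition in isolation, here is a
minimal user-space sketch of the decision the FLR work handler makes after
this change. It is not the driver code: the sketch_* names, the reduced
struct, and the use of LONG_MAX as a stand-in for MAX_SCHEDULE_TIMEOUT are
assumptions made so the example compiles outside the kernel. Only the four
timeout field names and the shape of the condition come from the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the kernel's MAX_SCHEDULE_TIMEOUT (assumption for this
 * user-space sketch). */
#define SKETCH_MAX_SCHEDULE_TIMEOUT LONG_MAX

/* Hypothetical subset of the device state: only the four per-engine job
 * timeouts that the patch compares. */
struct sketch_timeouts {
	long sdma_timeout;
	long gfx_timeout;
	long compute_timeout;
	long video_timeout;
};

/* True if at least one engine has an infinite timeout, meaning no TDR
 * would ever fire for a job hung on that engine. */
bool sketch_any_timeout_infinite(const struct sketch_timeouts *t)
{
	assert(t != NULL);
	return t->sdma_timeout == SKETCH_MAX_SCHEDULE_TIMEOUT ||
	       t->gfx_timeout == SKETCH_MAX_SCHEDULE_TIMEOUT ||
	       t->compute_timeout == SKETCH_MAX_SCHEDULE_TIMEOUT ||
	       t->video_timeout == SKETCH_MAX_SCHEDULE_TIMEOUT;
}

/* Mirrors the patched condition: the mailbox path triggers recovery only
 * when recovery is allowed at all (should_recover_gpu) and some engine
 * cannot rely on TDR because its timeout is infinite. */
bool sketch_flr_work_should_recover(bool should_recover_gpu,
				    const struct sketch_timeouts *t)
{
	return should_recover_gpu && sketch_any_timeout_infinite(t);
}
```

With all four timeouts finite the function returns false, so recovery is
left to the TDR path alone; with any one of them set to the infinite value
it returns true, provided recovery is enabled in the first place.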
