
> -----Original Message-----
> From: Chen, Guchun <guchun.c...@amd.com>
> Sent: Monday, May 8, 2023 9:28 PM
> To: amd-gfx@lists.freedesktop.org; Deucher, Alexander
> <alexander.deuc...@amd.com>; Zhang, Hawking
> <hawking.zh...@amd.com>; Lazar, Lijo <lijo.la...@amd.com>; Quan, Evan
> <evan.q...@amd.com>; Koenig, Christian <christian.koe...@amd.com>;
> Pan, Xinhui <xinhui....@amd.com>
> Cc: Chen, Guchun <guchun.c...@amd.com>; Zhou1, Tao
> <tao.zh...@amd.com>
> Subject: [PATCH] drm/amdgpu/gfx: disable gfx9 cp_ecc_error_irq only when
> enabling legacy gfx ras
> 
> gfx9 cp_ecc_error_irq is only enabled when legacy gfx RAS is asserted.
> So in gfx_v9_0_hw_fini, the interrupt should only be disabled under the
> same condition; otherwise, the unbalanced amdgpu_irq_put triggers the
> calltrace below.
> 
> [ 7283.170322] RIP: 0010:amdgpu_irq_put+0x45/0x70 [amdgpu]
> [ 7283.170964] RSP: 0018:ffff9a5fc3967d00 EFLAGS: 00010246
> [ 7283.170967] RAX: ffff98d88afd3040 RBX: ffff98d89da20000 RCX: 0000000000000000
> [ 7283.170969] RDX: 0000000000000000 RSI: ffff98d89da2bef8 RDI: ffff98d89da20000
> [ 7283.170971] RBP: ffff98d89da20000 R08: ffff98d89da2ca18 R09: 0000000000000006
> [ 7283.170973] R10: ffffd5764243c008 R11: 0000000000000000 R12: 0000000000001050
> [ 7283.170975] R13: ffff98d89da38978 R14: ffffffff999ae15a R15: ffff98d880130105
> [ 7283.170978] FS:  0000000000000000(0000) GS:ffff98d996f00000(0000) knlGS:0000000000000000
> [ 7283.170981] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 7283.170983] CR2: 00000000f7a9d178 CR3: 00000001c42ea000 CR4: 00000000003506e0
> [ 7283.170986] Call Trace:
> [ 7283.170988]  <TASK>
> [ 7283.170989]  gfx_v9_0_hw_fini+0x1c/0x6d0 [amdgpu]
> [ 7283.171655]  amdgpu_device_ip_suspend_phase2+0x101/0x1a0 [amdgpu]
> [ 7283.172245]  amdgpu_device_suspend+0x103/0x180 [amdgpu]
> [ 7283.172823]  amdgpu_pmops_freeze+0x21/0x60 [amdgpu]
> [ 7283.173412]  pci_pm_freeze+0x54/0xc0
> [ 7283.173419]  ? __pfx_pci_pm_freeze+0x10/0x10
> [ 7283.173425]  dpm_run_callback+0x98/0x200
> [ 7283.173430]  __device_suspend+0x164/0x5f0
> 
> v2: drop gfx11, as it is fixed by a different solution that retires the
> cp_ecc_irq funcs (Hawking)
> 
> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2522
> 
> Signed-off-by: Guchun Chen <guchun.c...@amd.com>
> Reviewed-by: Tao Zhou <tao.zh...@amd.com>

Acked-by: Alex Deucher <alexander.deuc...@amd.com>

> ---
>  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> index ae09fc1cfe6b..c54d05bdc2d8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> @@ -3751,7 +3751,8 @@ static int gfx_v9_0_hw_fini(void *handle)
>  {
>       struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> 
> -     amdgpu_irq_put(adev, &adev->gfx.cp_ecc_error_irq, 0);
> +     if (amdgpu_ras_is_supported(adev, AMDGPU_RAS_BLOCK__GFX))
> +             amdgpu_irq_put(adev, &adev->gfx.cp_ecc_error_irq, 0);
>       amdgpu_irq_put(adev, &adev->gfx.priv_reg_irq, 0);
>       amdgpu_irq_put(adev, &adev->gfx.priv_inst_irq, 0);
> 
> --
> 2.25.1
