[PATCH 1/2] drm/amdgpu: add umc retire unit element

2023-02-19 Thread Tao Zhou
It records how many bad pages are retired in one uncorrectable error. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h | 2 ++ drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 1 + drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c | 1 + drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 3 +++ 4

[PATCH 2/2] drm/amdgpu: exclude duplicate pages from UMC RAS UE count

2023-02-19 Thread Tao Zhou
If a UMC bad page is reserved but not freed by an application, the application may repeatedly trigger an uncorrectable error by accessing the page. v2: add specific function to do the check. v3: remove duplicate pages, calculate new added bad page number. v4: reuse save_bad_pages to calculate new
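The idea in the v3 note (drop duplicate pages, then count only the newly added ones) can be illustrated with a small standalone sketch. This is not the code from the patch: the function names, the flat array of page addresses, and the linear search are all assumptions made for illustration, and the real driver keeps its bad-page records in its own structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if addr already appears in list[0..n). */
bool page_is_recorded(const uint64_t *list, size_t n, uint64_t addr)
{
    for (size_t i = 0; i < n; i++) {
        if (list[i] == addr)
            return true;
    }
    return false;
}

/*
 * Hypothetical helper: count the pages in `reported` that are genuinely new,
 * i.e. not already in `recorded` and not a repeat of an earlier entry in
 * `reported` itself. Only these would be added to the UE count.
 */
size_t count_new_bad_pages(const uint64_t *recorded, size_t n_recorded,
                           const uint64_t *reported, size_t n_reported)
{
    size_t new_pages = 0;

    /* A NULL list is only acceptable when its length is zero. */
    assert(recorded != NULL || n_recorded == 0);
    assert(reported != NULL || n_reported == 0);

    for (size_t i = 0; i < n_reported; i++) {
        /* Page was retired by an earlier error: skip it. */
        if (page_is_recorded(recorded, n_recorded, reported[i]))
            continue;
        /* Page was already seen earlier in this same report: skip it. */
        if (page_is_recorded(reported, i, reported[i]))
            continue;
        new_pages++;
    }
    return new_pages;
}
```

With two previously recorded pages and a report that names one of them again plus two fresh pages (one of them listed twice), the sketch counts two new pages, so an application that keeps touching an already reserved page does not inflate the count.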

RE: [PATCH] drm/amdgpu: fix incorrect active rb bitmap for gfx11

2023-02-19 Thread Gao, Likun
This patch is Reviewed-by: Likun Gao Regards, Likun -----Original Message----- From: amd-gfx On Behalf Of Hawking Zhang Sent: Monday, February 20, 2023 9:09 AM To: amd-gfx@lists.freedesktop.org; Xu, Feifei; Gao, Likun; Deucher, Alexander Cc: Zhang, Hawking Subject: [PATCH] drm/amdgpu: fix

RE: [PATCH] drm/amdgpu: fix incorrect active rb bitmap for gfx11

2023-02-19 Thread Zhang, Hawking
Please ignore this one. Some code needs to be optimized. I'll send out another one for the review. Regards, Hawking -----Original Message----- From: Zhang, Hawking Sent: Sunday, February 19, 2023 14:33 To: amd-gfx@lists.freedesktop.org; Xu, Feifei; Gao, Likun; Deucher, Alexander Cc:

[PATCH] drm/amdgpu: fix incorrect active rb bitmap for gfx11

2023-02-19 Thread Hawking Zhang
GFX v11 changes the RB_BACKEND_DISABLE related registers from per-SA to global ones. The approach to querying the active RB bitmap needs to change accordingly. Querying the per-SE setting returns a wrong active RB bitmap, especially when some SAs are disabled. With the new approach, driver will