[PATCH 2/2] drm/amdgpu: trigger mode1 reset for RAS RMA status

2024-05-23 Thread Tao Zhou
Check RMA status in bad page retirement flow. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 16 +--- drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 7 +++ 2 files changed, 16 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c

[PATCH 1/2] drm/amdgpu: add RAS is_rma flag

2024-05-23 Thread Tao Zhou
Set the flag to true if bad page number reaches threshold. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c| 7 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h| 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c | 10 ++ drivers/gpu/drm/amd/amdgpu
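The threshold check described in this entry can be sketched in plain C. This is only an illustration under stated assumptions: `struct ras_state`, its fields, and `record_bad_pages` are hypothetical names, not the driver's actual data structures, and "reaches" is read here as "greater than or equal to".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bookkeeping, not the amdgpu structures. */
struct ras_state {
    uint32_t bad_page_count;     /* bad pages recorded so far */
    uint32_t bad_page_threshold; /* limit at which the device is flagged */
    bool is_rma;                 /* set once the threshold is reached */
};

/* Record newly retired pages and set the flag when the count reaches
 * the threshold. The flag is never cleared by this helper. */
static void record_bad_pages(struct ras_state *ras, uint32_t new_pages)
{
    ras->bad_page_count += new_pages;
    if (ras->bad_page_count >= ras->bad_page_threshold)
        ras->is_rma = true;
}
```

A caller could then consult `is_rma` before choosing how to recover, which is the kind of check the follow-up patch in the series mentions.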

[PATCH] drm/amdgpu: use u32 for buf size in __amdgpu_eeprom_xfer

2024-05-20 Thread Tao Zhou
Also make sure the value of msg[1].len is in the range of u16. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.c b/drivers/gpu/drm/amd
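A minimal sketch of the idea follows: the total buffer size is kept in a 32-bit variable, while each individual transfer length is clamped so that it fits a 16-bit length field. `next_chunk_len` and `count_chunks` are hypothetical helpers written for this illustration; they are not the functions in amdgpu_eeprom.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: length of the next transfer chunk, limited both by
 * the caller's chunk size and by what a 16-bit length field can hold. */
static uint16_t next_chunk_len(uint32_t remaining, uint32_t max_chunk)
{
    uint32_t len = remaining < max_chunk ? remaining : max_chunk;

    if (len > UINT16_MAX)
        len = UINT16_MAX;
    return (uint16_t)len;
}

/* Count how many chunks are needed to move buf_size bytes. */
static uint32_t count_chunks(uint32_t buf_size, uint32_t max_chunk)
{
    uint32_t chunks = 0;

    while (buf_size > 0) {
        uint16_t len = next_chunk_len(buf_size, max_chunk);

        if (len == 0)
            break; /* max_chunk == 0: stop instead of looping forever */
        buf_size -= len;
        chunks++;
    }
    return chunks;
}
```

With a 16-bit size variable, a request of 70000 bytes would silently wrap; with a 32-bit size and a clamped chunk length, it is simply split into two transfers.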

[PATCH] drm/amdgpu: update type of buf size to u32 for eeprom functions

2024-05-19 Thread Tao Zhou
Avoid overflow issue. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.c | 6 +++--- drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.h | 4 ++-- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.c b/drivers/gpu/drm/amd/amdgpu

[PATCH] drm/amdgpu: retire UMC v12 mca_addr_to_pa

2024-04-02 Thread Tao Zhou
RAS TA will handle it, so the interface is no longer needed. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 1 - drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 105 ++--- drivers/gpu/drm/amd/amdgpu/umc_v12_0.h | 62 +-- 3 files changed, 7 insertions

[PATCH] drm/amdgpu: update check condition for XGMI ACA UE

2024-04-01 Thread Tao Zhou
Check more possible ext error codes. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c index f4be524b0dc1..be1f4efa9ef6

[PATCH] drm/amd/pm: update XGMI RAS UE criteria for smu v13.0.6

2024-04-01 Thread Tao Zhou
Add more possible ext error code. v2: still use ext error code instead of UC bit. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c b

[PATCH] drm/amd/pm: update XGMI RAS UC criteria for smu v13.0.6

2024-03-31 Thread Tao Zhou
Check UC bit instead of ext error code. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu13

[PATCH] drm/amdgpu: implement IRQ_STATE_ENABLE for SDMA v4.4.2

2024-03-28 Thread Tao Zhou
SDMA_CNTL is not set in some cases, driver configures it by itself. v2: simplify code Signed-off-by: Tao Zhou Reviewed-by: Hawking Zhang --- drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 16 +++- 1 file changed, 3 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm/amd

[PATCH] drm/amdgpu: implement IRQ_STATE_ENABLE for SDMA v4.4.2

2024-03-28 Thread Tao Zhou
SDMA_CNTL is not set in some cases, driver configures it by itself. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2

[PATCH 1/2] drm/amdgpu: add socket id parameter for psp query address cmd

2024-03-20 Thread Tao Zhou
And set the socket id. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/ta_ras_if.h | 1 + drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 14 +++--- 2 files changed, 12 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/ta_ras_if.h b/drivers/gpu/drm/amd/amdgpu

[PATCH 2/2] drm/amdgpu: simplify convert_error_address interface for UMC v12

2024-03-20 Thread Tao Zhou
Replace separate parameters with struct ta_ras_query_address_input. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 57 ++ 1 file changed, 30 insertions(+), 27 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c b/drivers/gpu/drm/amd

[PATCH 3/3] drm/amdgpu: make reset method configurable for RAS poison

2024-03-18 Thread Tao Zhou
Each RAS block has different requirement for gpu reset in poison consumption handling. Add support for mmhub RAS poison consumption handling. v2: remove the mmhub poison support for kfd int v10. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c| 2 +- drivers/gpu/drm

[PATCH 2/3] drm/amdgpu: support utcl2 RAS poison query for mmhub

2024-03-18 Thread Tao Zhou
Support the query for both gfxhub and mmhub, also replace xcc_id with hub_inst. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 17 - drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 2 +- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 5

[PATCH 1/3] drm/amdgpu: add utcl2 RAS poison query for mmhub

2024-03-18 Thread Tao Zhou
Add it for mmhub v1.8. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_mmhub.h | 2 ++ drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c | 15 +++ 2 files changed, 17 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mmhub.h b/drivers/gpu/drm/amd/amdgpu

[PATCH 3/3] drm/amdgpu: make reset method configurable for RAS poison

2024-03-13 Thread Tao Zhou
Each RAS block has different requirement for gpu reset in poison consumption handling. Add support for mmhub RAS poison consumption handling. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c| 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h| 2 +- drivers/gpu/drm

[PATCH 2/3] drm/amdgpu: support utcl2 RAS poison query for mmhub

2024-03-13 Thread Tao Zhou
Support the query for both gfxhub and mmhub, also replace xcc_id with hub_inst. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 17 - drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 2 +- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 3

[PATCH 1/3] drm/amdgpu: add utcl2 RAS poison query for mmhub

2024-03-13 Thread Tao Zhou
Add it for mmhub v1.8. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_mmhub.h | 2 ++ drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c | 15 +++ 2 files changed, 17 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mmhub.h b/drivers/gpu/drm/amd/amdgpu

[PATCH] drm/amdgpu: add deferred error check for UMC v12 address query

2024-02-28 Thread Tao Zhou
Both RAS UE and deferred errors need page retirement. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c b/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c index 14ef7a24be7b

[PATCH 5/5] drm/amdgpu: skip GFX FED error in page fault handling

2024-02-23 Thread Tao Zhou
Let kfd interrupt handler process it. v2: return 0 instead of 1 for fed error. drop the usage of strcmp in interrupt handler. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd

[PATCH 4/5] amd/amdkfd: get node id for query_utcl2_poison_status

2024-02-23 Thread Tao Zhou
Obtain it from ring entry. v2: replace node id with logical xcc id. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdkfd/kfd_int_process_v10.c | 14 -- drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c | 14 -- 2 files changed, 24 insertions(+), 4 deletions(-) diff

[PATCH 3/5] drm/amdgpu: retire gfx ras query_utcl2_poison_status

2024-02-23 Thread Tao Zhou
Replace it with related interface in gfxhub functions. v2: replace node id with xcc id. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 7 --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 3 ++- drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h| 1 - drivers/gpu/drm

[PATCH 1/5] drm/amdgpu: add new bit definitions for GC 9.0 PROTECTION_FAULT_STATUS

2024-02-23 Thread Tao Zhou
Add UCE and FED bit definitions. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_sh_mask.h | 4 1 file changed, 4 insertions(+) diff --git a/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_sh_mask.h b/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_sh_mask.h

[PATCH 2/5] drm/amdgpu: add utcl2 poison query for gfxhub

2024-02-23 Thread Tao Zhou
Implement it for gfxhub 1.0 and 1.2. v2: input logical xcc id for poison query interface. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_gfxhub.h | 2 ++ drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c | 17 + drivers/gpu/drm/amd/amdgpu/gfxhub_v1_2.c | 15

[PATCH 4/5] amd/amdkfd: get node id for query_utcl2_poison_status

2024-02-19 Thread Tao Zhou
Obtain it from ring entry. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdkfd/kfd_int_process_v10.c | 3 ++- drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c | 3 ++- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v10.c b/drivers

[PATCH 5/5] drm/amdgpu: skip GFX FED error in page fault handling

2024-02-19 Thread Tao Zhou
Let kfd interrupt handler process it. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c index 773725a92cf1..70defc394b7b

[PATCH 3/5] drm/amdgpu: retire gfx ras query_utcl2_poison_status

2024-02-19 Thread Tao Zhou
Replace it with related interface in gfxhub functions. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 7 --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 3 ++- drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h| 1 - drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c| 12

[PATCH 2/5] drm/amdgpu: add utcl2 poison query for gfxhub

2024-02-19 Thread Tao Zhou
Implement it for gfxhub 1.0 and 1.2. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_gfxhub.h | 2 ++ drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c | 17 + drivers/gpu/drm/amd/amdgpu/gfxhub_v1_2.c | 15 +++ 3 files changed, 34 insertions(+) diff --git

[PATCH 1/5] drm/amdgpu: add new bit definitions for GC 9.0 PROTECTION_FAULT_STATUS

2024-02-19 Thread Tao Zhou
Add UCE and FED bit definitions. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_sh_mask.h | 4 1 file changed, 4 insertions(+) diff --git a/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_sh_mask.h b/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_sh_mask.h

[PATCH] drm/amdgpu: add UTCL2 RAS poison query for gfx 9.4.3

2024-02-17 Thread Tao Zhou
Add helper function to query and reset RAS UTCL2 poison status. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 14 ++ 1 file changed, 14 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c index

[PATCH 2/2] use PSP address query command

2024-01-30 Thread Tao Zhou
Get UMC physical address from PSP in RAS error address conversion. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 46 ++ 1 file changed, 39 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c b/drivers/gpu/drm/amd

[PATCH 1/2] add PSP RAS address query command

2024-01-30 Thread Tao Zhou
Convert mca address to physical address or vice versa via RAS TA. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 25 + drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h | 3 +++ drivers/gpu/drm/amd/amdgpu/ta_ras_if.h | 36 + 3 files changed

[PATCH] drm/amdgpu: disable ras feature when fini

2024-01-28 Thread Tao Zhou
Send ras disable feature command in fini. Signed-off-by: Tao Zhou Change-Id: I95f1d1e0a46fb613631e5cd77497e64c0551c4c7 --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd

[PATCH 2/2] update check condition of query for ras page retire

2024-01-17 Thread Tao Zhou
Support page retirement handling in debug mode. v2: revert smu_v13_0_6_get_ecc_info directly. Signed-off-by: Tao Zhou Change-Id: I0aaa807d7fe87b3da0f023c380e57ab6dd446fcf --- drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git

[PATCH 1/2] Revert "drm/amd/pm: smu v13_0_6 supports ecc info by default"

2024-01-17 Thread Tao Zhou
This reverts commit affdce050ab4119a3cdf74d7faa8f1eb30f6f6aa. We use debug mode flag instead of this interface. Signed-off-by: Tao Zhou Change-Id: I49eae821ce352d542143d68c05802634b4bf469d --- drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c | 8 1 file changed, 8 deletions

[PATCH 2/2] update check condition of query for ras page retire

2024-01-17 Thread Tao Zhou
Support page retirement handling in debug mode. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 9 +++-- drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c | 4 ++-- 2 files changed, 9 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd

[PATCH 1/2] update error condition check for umc_v12_0_query_error_address

2024-01-17 Thread Tao Zhou
Deferred error is also taken into account. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 5 + 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c b/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c index 10edf818acf5..2e0bd4312f2c

[PATCH] drm/amdgpu: Don't warn for unsupported set_xgmi_plpd_mode

2023-11-02 Thread Tao Zhou
set_xgmi_plpd_mode may be unsupported and this isn't an error, so there is no need to print a warning for it. v2: add ret2 to save the status of psp_ras_trigger_error. Suggested-by: lijo.la...@amd.com Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 14 -- 1 file changed, 8

[PATCH] drm/amdgpu: check recovery status of xgmi hive in ras_reset_error_count

2023-10-31 Thread Tao Zhou
Handle xgmi hive case. Suggested-by: Hawking Zhang Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c index

[PATCH] drm/amdgpu: handle extra UE register entries for gfx v9_4_3

2023-10-31 Thread Tao Zhou
The UE register list is larger than the CE list. Reported-by: yipeng.c...@amd.com Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 38 + 1 file changed, 38 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c b/drivers/gpu/drm/amd/amdgpu

[PATCH] drm/amdgpu: Don't warn for unsupported set_xgmi_plpd_mode

2023-10-31 Thread Tao Zhou
set_xgmi_plpd_mode may be unsupported and this isn't an error, so there is no need to print a warning for it. Suggested-by: lijo.la...@amd.com Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu

[PATCH 2/2] drm/amdgpu: add RAS reset/query operations for XGMI v6_4

2023-10-27 Thread Tao Zhou
Reset/query RAS error status and count. v2: use XGMI IP version instead of WAFL version. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 46 ++-- 1 file changed, 43 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c

[PATCH 1/2] drm/amdgpu: set XGMI IP version manually for v6_4

2023-10-27 Thread Tao Zhou
The version can't be queried from discovery table. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c index 0b711bac2092

[PATCH] drm/amdgpu: use mode-2 reset for RAS poison consumption

2023-10-26 Thread Tao Zhou
Switch from mode-1 reset to mode-2 for poison consumption. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c index

[PATCH] drm/amdgpu: check RAS supported first in ras_reset_error_count

2023-10-24 Thread Tao Zhou
Not all platforms support RAS. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c index c71321edf50b..a6cff4a31c54

[PATCH] drm/amdgpu: get RAS poison status from DF v4_6_2

2023-10-24 Thread Tao Zhou
Add DF block and RAS poison mode query for DF v4_6_2. Signed-off-by: Tao Zhou Reviewed-by: Stanley.Yang --- drivers/gpu/drm/amd/amdgpu/Makefile | 3 +- drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 4 +++ drivers/gpu/drm/amd/amdgpu/df_v4_6_2.c| 34

[PATCH] drm/amdgpu: enable RAS poison mode for APU

2023-10-20 Thread Tao Zhou
Enable it by default on APU platform. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c index 95c181cd1fea..a41cab0a2f9c

[PATCH 6/6] drm/amdgpu: drop status query/reset for GCEA 9.4.3 and MMEA 1.8

2023-10-18 Thread Tao Zhou
PMFW will be responsible for them. v2: remove query interfaces. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 60 -- drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c | 143 2 files changed, 203 deletions(-) diff --git a/drivers/gpu/drm/amd

[PATCH 4/6] drm/amd/pm: record mca debug mode in RAS

2023-10-18 Thread Tao Zhou
Call amdgpu_ras_set_mca_debug_mode when we set mca debug mode in smu v13_0_6. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c b/drivers/gpu/drm/amd/pm/swsmu

[PATCH 5/6] drm/amdgpu: bypass RAS error reset in some conditions

2023-10-18 Thread Tao Zhou
PMFW is responsible for RAS error reset in some conditions, driver can skip the operation. v2: add check for ras->in_recovery, it's set earlier than amdgpu_in_reset. v3: fix error in gpu reset check. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 10 +- 1 f

[PATCH 3/6] drm/amdgpu: add set/get mca debug mode operations

2023-10-18 Thread Tao Zhou
Record the debug mode status in RAS. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 21 + drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 5 + 2 files changed, 26 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd

[PATCH 2/6] drm/amdgpu: replace reset_error_count with amdgpu_ras_reset_error_count

2023-10-18 Thread Tao Zhou
Simplify the code. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 ++-- drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 4 ++-- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 9 ++--- drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 7 ++- drivers/gpu/drm/amd

[PATCH 1/6] drm/amdgpu: define ras_reset_error_count function

2023-10-18 Thread Tao Zhou
Make the code architecture simpler. v2: reuse ras_reset_error_count in ras_reset_error_status. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 19 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 2 ++ 2 files changed, 17 insertions(+), 4 deletions

[PATCH 6/6] drm/amdgpu: drop status reset for GCEA 9.4.3 and MMEA 1.8

2023-10-17 Thread Tao Zhou
PMFW will be responsible for it. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 22 --- drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c | 86 - 2 files changed, 108 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c b/drivers/gpu/drm

[PATCH 4/6] drm/amdgpu: bypass RAS error reset in some conditions

2023-10-17 Thread Tao Zhou
PMFW is responsible for RAS error reset in some conditions, driver can skip the operation. v2: add check for ras->in_recovery, it's set earlier than amdgpu_in_reset. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 20 ++-- 1 file changed, 18 inserti

[PATCH 5/6] drm/amdgpu: reuse amdgpu_ras_reset_error_count code

2023-10-17 Thread Tao Zhou
To simplify the code of amdgpu_ras_reset_error_status without logical change. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 30 +++-- 1 file changed, 8 insertions(+), 22 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu

[PATCH 2/6] drm/amdgpu: add set/get mca debug mode operations

2023-10-17 Thread Tao Zhou
Record the debug mode status in RAS. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 21 + drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 5 + 2 files changed, 26 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd

[PATCH 3/6] drm/amd/pm: record mca debug mode in RAS

2023-10-17 Thread Tao Zhou
Call amdgpu_ras_set_mca_debug_mode when we set mca debug mode in smu v13_0_6. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c b/drivers/gpu/drm/amd/pm/swsmu

[PATCH 1/6] drm/amdgpu: define ras_reset_error_count function

2023-10-17 Thread Tao Zhou
Make the code architecture simpler. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 ++-- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c| 17 + drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h| 2 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 4

[PATCH 5/5] drm/amdgpu: reuse amdgpu_ras_reset_error_count code

2023-10-12 Thread Tao Zhou
To simplify the code of amdgpu_ras_reset_error_status without logical change. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 29 +++-- 1 file changed, 8 insertions(+), 21 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu

[PATCH 4/5] drm/amdgpu: bypass RAS error reset in some conditions

2023-10-12 Thread Tao Zhou
PMFW is responsible for RAS error reset in some conditions, driver can skip the operation. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 18 -- 1 file changed, 16 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b

[PATCH 3/5] drm/amd/pm: record mca debug mode in RAS

2023-10-12 Thread Tao Zhou
Call amdgpu_ras_set_mca_debug_mode when we set mca debug mode in smu v13_0_6. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c b/drivers/gpu/drm/amd/pm/swsmu

[PATCH 2/5] drm/amdgpu: add set/get mca debug mode operations

2023-10-12 Thread Tao Zhou
Record the debug mode status in RAS. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 21 + drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 5 + 2 files changed, 26 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd

[PATCH 1/5] drm/amdgpu: define ras_reset_error_count function

2023-10-12 Thread Tao Zhou
Make the code architecture simpler. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 ++-- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c| 17 + drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h| 2 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 4

[PATCH 1/2] drm/amdgpu: exit directly if gpu reset fails

2023-09-27 Thread Tao Zhou
No need to perform the full reset operation in case of gpu reset failure. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu

[PATCH 2/2] drm/amdgpu: update retry times for psp vmbx wait

2023-09-27 Thread Tao Zhou
Increase the retry loops and replace the constant number with macro. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/psp_v13_0.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/psp_v13_0.c b/drivers/gpu/drm/amd/amdgpu/psp_v13_0.c index

[PATCH 3/3] drm/amdgpu: change if condition for bad channel bitmap update

2023-09-20 Thread Tao Zhou
: replace sizeof with BITS_PER_TYPE, we should check bit number instead of byte number. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c b/drivers/gpu
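The difference between a byte count and a bit count in a bitmap bounds check can be illustrated as below. This is a userspace sketch: the macro is a stand-in for the kernel's `BITS_PER_TYPE()`, and `mark_bad_channel` with its 32-bit bitmap is a hypothetical helper, since the actual bitmap type is not visible in the truncated snippet.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's BITS_PER_TYPE() macro. */
#define BITS_PER_TYPE(type) (sizeof(type) * CHAR_BIT)

/* Hypothetical helper: mark a channel as bad in a 32-bit bitmap.
 * Returns false if the channel index does not fit in the bitmap. */
static bool mark_bad_channel(uint32_t *bitmap, unsigned int channel)
{
    /* Comparing against sizeof(uint32_t) (4 bytes) would wrongly
     * reject channels 4..31; the limit must be the number of bits. */
    if (channel >= BITS_PER_TYPE(uint32_t))
        return false;

    *bitmap |= (uint32_t)1 << channel;
    return true;
}
```

The bit-based limit accepts every index the bitmap can actually represent and still rejects indices that would shift out of range.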

[PATCH 2/3] drm/amdgpu: fix value of some UMC parameters for UMC v12

2023-09-20 Thread Tao Zhou
Prepare for bad page retirement. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 4 +++- drivers/gpu/drm/amd/amdgpu/umc_v12_0.h | 2 ++ 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0

[PATCH 1/3] drm/amdgpu: print channel index for UMC bad page

2023-09-20 Thread Tao Zhou
Print channel index for UMC v12. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c b/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c index c6742dd863d4..7714c2ef2cdc

[PATCH 2/3] drm/amdgpu: fix value of some UMC parameters for UMC v12

2023-09-19 Thread Tao Zhou
Prepare for bad page retirement. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 4 +++- drivers/gpu/drm/amd/amdgpu/umc_v12_0.h | 2 ++ 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0

[PATCH 3/3] drm/amdgpu: change if condition for bad channel bitmap update

2023-09-19 Thread Tao Zhou
. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c index 8ced4be784e0..1c4433f22f4b 100644 --- a/drivers

[PATCH 1/3] drm/amdgpu: print channel index for UMC bad page

2023-09-19 Thread Tao Zhou
Print channel index for UMC v12. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c b/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c index c6742dd863d4..7714c2ef2cdc

[PATCH 3/3] drm/amdgpu: print more address info of UMC bad page

2023-09-06 Thread Tao Zhou
Print out row, column and bank value of UMC error address for UMC v12. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 12 +--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c b/drivers/gpu/drm/amd/amdgpu

[PATCH 2/3] drm/amdgpu: add channel index table for UMC v12

2023-09-06 Thread Tao Zhou
Get UMC physical channel index according to node id, umc instance and channel instance. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 1 + drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 14 ++ drivers/gpu/drm/amd/amdgpu/umc_v12_0.h | 5 + 3 files changed, 20

[PATCH 1/3] drm/amdgpu: add address conversion for UMC v12

2023-09-06 Thread Tao Zhou
Convert MCA error address to physical address and find out all pages in one physical row. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h | 5 ++ drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 97 - drivers/gpu/drm/amd/amdgpu/umc_v12_0.h | 64

[PATCH] drm/amdgpu: use read-modify-write mode for gfx v9_4_3 SQ setting

2023-08-25 Thread Tao Zhou
Instead of using direct update, avoid touching unrelated fields. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c index
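The read-modify-write pattern mentioned here can be sketched generically. `rmw_field` is a hypothetical helper operating on a plain 32-bit value; it does not use the driver's register accessors, and the mask and shift values in the usage note are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: replace only the bits selected by mask with
 * value (placed at shift), leaving every other bit of old untouched. */
static uint32_t rmw_field(uint32_t old, uint32_t mask, unsigned int shift,
                          uint32_t value)
{
    return (old & ~mask) | ((value << shift) & mask);
}
```

For example, `rmw_field(0xFFFF00FF, 0x0000FF00, 8, 0x12)` yields `0xFFFF12FF`: only the selected field changes, whereas a direct write of the new field value would clear the neighbouring bits.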

[PATCH] drm/amdgpu: add RAS fatal error handler for NBIO v7.9

2023-08-07 Thread Tao Zhou
Register RAS fatal error interrupt and add handler. v2: only register NBIO RAS for dGPU platform. change nbio_v7_9_set_ras_controller_irq_state and nbio_v7_9_set_ras_err_event_athub_irq_state to dummy functions. Signed-off-by: Tao Zhou Reviewed-by: Hawking Zhang --- drivers/gpu/drm

[PATCH] drm/amdgpu: add RAS fatal error handler for NBIO v7.9

2023-08-06 Thread Tao Zhou
Register RAS fatal error interrupt and add handler. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 4 + drivers/gpu/drm/amd/amdgpu/nbio_v7_9.c | 219 drivers/gpu/drm/amd/amdgpu/nbio_v7_9.h | 1 + 3 files changed, 224 insertions(+) diff

[PATCH] drm/amdgpu: add watchdog timer enablement for gfx_v9_4_3

2023-07-06 Thread Tao Zhou
Configure SQ watchdog timer setting. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 38 + 1 file changed, 38 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c index 9e3b835bdbb2

[PATCH] drm/amdgpu: skip address adjustment for GFX RAS injection

2023-06-29 Thread Tao Zhou
The address parameter of GFX RAS injection isn't related to XGMI node number, keep it unchanged. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm

[PATCH] drm/amdgpu: check RAS irq existence for VCN/JPEG

2023-06-20 Thread Tao Zhou
Having no RAS irq is allowed. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c | 3 ++- drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 3 ++- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c b/drivers/gpu/drm/amd/amdgpu

[PATCH] drm/amdgpu: remove unused definition

2023-05-18 Thread Tao Zhou
mmhub_v1_8_mmea_cgtt_clk_cntl_reg is defined but not used. Reported-by: kernel test robot Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c | 8 1 file changed, 8 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c b/drivers/gpu/drm/amd/amdgpu

[PATCH 2/2] drm/amdgpu: add bad_page_threshold check in ras_eeprom_check_err

2023-02-21 Thread Tao Zhou
bad_page_threshold controls page retirement behavior and it should be also checked. v2: simplify the condition of bad page handling path. Signed-off-by: Tao Zhou --- .../gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c| 19 ++- 1 file changed, 14 insertions(+), 5 deletions(-) diff

[PATCH 1/2] drm/amdgpu: change default behavior of bad_page_threshold parameter

2023-02-21 Thread Tao Zhou
Ignore ras umc bad page threshold by default, GPU initialization won't be stopped in this mode. v2: refine the description of bad_page_threshold. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c| 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c| 7

[PATCH 2/2] drm/amdgpu: add bad_page_threshold check in ras_eeprom_check_err

2023-02-21 Thread Tao Zhou
bad_page_threshold controls page retirement behavior and it should be also checked. Signed-off-by: Tao Zhou --- .../gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c| 20 ++- 1 file changed, 15 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c

[PATCH 1/2] drm/amdgpu: change default behavior of bad_page_threshold parameter

2023-02-21 Thread Tao Zhou
Ignore ras umc bad page threshold by default, GPU initialization won't be stopped in this mode. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c| 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c| 7 --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c | 4

[PATCH 2/2] drm/amdgpu: exclude duplicate pages from UMC RAS UE count

2023-02-19 Thread Tao Zhou
added bad page number. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 16 +--- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 3 ++- drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 5 +++-- 3 files changed, 18 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/amd

[PATCH 1/2] drm/amdgpu: add umc retire unit element

2023-02-19 Thread Tao Zhou
It records how many bad pages are retired in one uncorrectable error. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h | 2 ++ drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 1 + drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c | 1 + drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 3 +++ 4

[PATCH 2/2] drm/amdgpu: exclude duplicate pages from UMC RAS UE count

2023-02-16 Thread Tao Zhou
If a UMC bad page is reserved but not freed by an application, the application may trigger uncorrectable error repeatedly by accessing the page. v2: add specific function to do the check. v3: remove duplicate pages, calculate new added bad page number. Signed-off-by: Tao Zhou --- drivers/gpu/drm
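
The idea in v3 (drop pages that are already recorded and count only the newly added ones) can be sketched as follows. This is a self-contained illustration, not the patch itself: struct bad_page_store, store_contains, and store_add_new are hypothetical names, and the fixed-size array stands in for whatever record storage the driver really uses.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define STORE_CAPACITY 64

/* Hypothetical record store; not the driver's actual data structure. */
struct bad_page_store {
	uint64_t pages[STORE_CAPACITY];
	size_t count;
};

/* Return true if the page address is already recorded. */
static bool store_contains(const struct bad_page_store *s, uint64_t page)
{
	for (size_t i = 0; i < s->count; i++)
		if (s->pages[i] == page)
			return true;
	return false;
}

/*
 * Record only pages that are not yet known and return how many were
 * newly added. A caller would bump its error count only when the
 * return value is non-zero.
 */
static size_t store_add_new(struct bad_page_store *s,
			    const uint64_t *pages, size_t n)
{
	size_t added = 0;

	for (size_t i = 0; i < n; i++) {
		if (s->count >= STORE_CAPACITY || store_contains(s, pages[i]))
			continue;
		s->pages[s->count++] = pages[i];
		added++;
	}
	return added;
}
```

A repeated access to an already reserved page then yields a return value of zero, so the repeated error does not inflate the count.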

[PATCH 1/2] drm/amdgpu: add umc retire unit element

2023-02-16 Thread Tao Zhou
It records how many bad pages are retired in one uncorrectable error. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h | 2 ++ drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 1 + drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c | 1 + drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 3 +++ 4

[PATCH] drm/amdgpu: don't increase UMC RAS UE count if no new bad page

2023-02-15 Thread Tao Zhou
If a UMC bad page is reserved but not freed by an application, the application may trigger uncorrectable error repeatedly by accessing the page. v2: add specific function to do the check. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 24 drivers

[PATCH] drm/amdgpu: don't increase UMC RAS UE count if no new bad page

2023-02-10 Thread Tao Zhou
If a UMC bad page is reserved but not freed by an application, the application may trigger uncorrectable error repeatedly by accessing the page. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 9 - drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 6 +- 2 files changed

[PATCH] drm/amdgpu: retire unused get_umc_v6_7_channel_index

2023-01-18 Thread Tao Zhou
-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/umc_v6_7.c | 7 --- 1 file changed, 7 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v6_7.c b/drivers/gpu/drm/amd/amdgpu/umc_v6_7.c index 72fd963f178b..e08e25a3a1a9 100644 --- a/drivers/gpu/drm/amd/amdgpu/umc_v6_7.c +++ b/drivers/gpu/drm/amd

[PATCH 7/7] drm/amdgpu: define RAS poison mode query function

2022-12-07 Thread Tao Zhou
1. no need to query poison mode on SRIOV guest side, host can handle it. 2. define the function to simplify code. v2: rename amdgpu_ras_poison_mode_query to amdgpu_ras_query_poison_mode. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 54 +++-- 1 file

[PATCH 6/7] drm/amdgpu: update VCN/JPEG RAS setting

2022-12-07 Thread Tao Zhou
Support VCN/JPEG RAS in both bare metal and SRIOV environment. v2: update commit description. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 24 +--- 1 file changed, 13 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu

[PATCH 4/7] drm/amdgpu: add VCN poison consumption handler for SRIOV

2022-12-07 Thread Tao Zhou
Inform host and let host handle consumption interrupt. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 12 ++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c index

[PATCH 5/7] drm/amdgpu: skip RAS error injection in SRIOV

2022-12-07 Thread Tao Zhou
Injection on guest is not allowed. v2: return directly in SRIOV environment. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c index

[PATCH 3/7] drm/amdgpu: add RAS poison consumption handler for SRIOV

2022-12-07 Thread Tao Zhou
Send message to PF if VF receives RAS poison consumption interrupt. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 44 +++-- 1 file changed, 26 insertions(+), 18 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c b/drivers/gpu/drm/amd

[PATCH 1/7] drm/amdgpu: add RAS poison consumption handler for AI SRIOV

2022-12-07 Thread Tao Zhou
Send message to host and host will handle it. v2: split it into two parts, one for mxgpu ai and another one for common poison consumption handler. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h | 1 + drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c| 6 ++ drivers/gpu/drm

[PATCH 2/7] drm/amdgpu: add RAS poison consumption handler for NV SRIOV

2022-12-07 Thread Tao Zhou
Send handling request to host. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c | 6 ++ drivers/gpu/drm/amd/amdgpu/mxgpu_nv.h | 1 + 2 files changed, 7 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c index
