include the amdgpu_ras.h header file instead of using an extern
declaration for the ras_debugfs_create_all function
Signed-off-by: Stanley.Yang
Change-Id: I2697250ba67d4deac4371fea05efb68a976f7e5a
---
drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git
Fix the warning
"warn: variable dereferenced before check 'obj' (see line 1131)"
by removing unnecessary checks, as amdgpu_ras_debugfs_create_all()
is only called from amdgpu_debugfs_init(), where the obj member in
the con->head list is not NULL.
Use list_for_each_entry() instead of list_for_each_entry_safe()
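For context on the iterator change: the _safe variant caches the next node before the loop body runs, which is only needed when the body unlinks or frees the current entry, while debugfs creation only reads the list. Below is a minimal userspace sketch of that difference. It re-implements simplified versions of struct list_head and the two iterators (passing the entry type explicitly instead of using typeof), so it is not the kernel's list.h, and struct ras_obj is a hypothetical stand-in for the RAS manager objects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list_head *head)
{
	head->next = head->prev = head;
}

static void list_add_tail(struct list_head *node, struct list_head *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

static void list_del(struct list_head *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
}

/* Read-only walk: fine as long as the body does not unlink 'pos'. */
#define list_for_each_entry(pos, head, type, member)           \
	for (pos = container_of((head)->next, type, member);    \
	     &pos->member != (head);                            \
	     pos = container_of(pos->member.next, type, member))

/* Caches the next entry in 'n' so the body may unlink and free 'pos'. */
#define list_for_each_entry_safe(pos, n, head, type, member)   \
	for (pos = container_of((head)->next, type, member),    \
	     n = container_of(pos->member.next, type, member);  \
	     &pos->member != (head);                            \
	     pos = n, n = container_of(n->member.next, type, member))

/* Hypothetical RAS object; only 'node' matters for the list. */
struct ras_obj {
	int block;
	struct list_head node;
};

/* Only reads the list, like debugfs creation: the plain iterator suffices. */
static int count_objs(struct list_head *head)
{
	struct ras_obj *obj;
	int n = 0;

	list_for_each_entry(obj, head, struct ras_obj, node)
		n++;
	return n;
}

/* Frees every entry while walking: this is where _safe is required. */
static void free_all(struct list_head *head)
{
	struct ras_obj *obj, *tmp;

	list_for_each_entry_safe(obj, tmp, head, struct ras_obj, node) {
		list_del(&obj->node);
		free(obj);
	}
}
```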
From: Tao Zhou
centralize all debugfs creation in one place for ras
Signed-off-by: Tao Zhou
Signed-off-by: Stanley.Yang
Change-Id: I7489ccb41dcf7a11ecc45313ad42940474999d81
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 29 +
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h |
From: Tao Zhou
and remove each ras IP's own debugfs creation
Signed-off-by: Tao Zhou
Signed-off-by: Stanley.Yang
Change-Id: If3d16862afa0d97abad183dd6e60478b34029e95
---
drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 3 +++
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 1 -
The mmHDP_READ_CACHE_INVALIDATE register is in HDP, not in NBIO.
Signed-off-by: Stanley.Yang
Change-Id: I4375a8a67d3a13f9605479e169169e22dd5833d1
---
drivers/gpu/drm/amd/amdgpu/nv.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c
A GCEA/MMHUB EA error should not result in a DF freeze. This is
fixed in the next generation, but for some reason a GCEA/MMHUB
EA error still results in a DF freeze on the previous generation, so the
driver should avoid reporting a GCEA/MMHUB EA error as a hw fatal
error in the kernel message, by reading the GCEA/MMHUB error status.
Signed-off-by: Stanley.Yang
Change-Id: Ia0ad9453ac3ac929f95c73cbee5b7a8fc42a9816
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 7 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 164 +
drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h | 30 +++-
3 files changed, 196
Changed from V1:
rename some function names, only init ras error handler data for
supported asics.
Changed from V2:
fix potential memory leak.
Signed-off-by: Stanley.Yang
Change-Id: Ia0ad9453ac3ac929f95c73cbee5b7a8fc42a9816
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |
Changed from V1:
rename some function names, only init ras error handler data for
supported asics.
Changed from V2:
fix potential memory leak.
Signed-off-by: Stanley.Yang
Change-Id: Ia0ad9453ac3ac929f95c73cbee5b7a8fc42a9816
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
Changed from V1:
rename some function names, only init ras error handler data for
supported asics.
Signed-off-by: Stanley.Yang
Change-Id: Ia0ad9453ac3ac929f95c73cbee5b7a8fc42a9816
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 +
drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c |
KFDTopologyTest.BasicTest fails if the smc, sdma, sos, ta and asd fw
are skipped in SRIOV for vega10, so adjust the above fw and skip loading
them in SRIOV only for navi12.
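A minimal sketch of the resulting policy, assuming hypothetical enum and function names rather than the driver's real ones: the listed firmware is skipped only for navi12 under SRIOV, while vega10 and bare metal keep loading all of it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subsets of the driver's ASIC and firmware enums. */
enum asic_type { ASIC_VEGA10, ASIC_NAVI12 };
enum fw_type { FW_SMC, FW_SDMA, FW_SOS, FW_TA, FW_ASD, FW_GFX };

/* Returns true if the SRIOV guest should skip loading this firmware. */
static bool skip_fw_load(enum asic_type asic, bool is_sriov, enum fw_type fw)
{
	/* vega10 (and any bare-metal boot) keeps loading everything */
	if (!is_sriov || asic != ASIC_NAVI12)
		return false;

	switch (fw) {
	case FW_SMC:
	case FW_SDMA:
	case FW_SOS:
	case FW_TA:
	case FW_ASD:
		return true;
	default:
		return false;
	}
}
```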
v2: remove unnecessary asic type check.
Signed-off-by: Stanley.Yang
---
drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 3
Each sdma instance's fw_version and feature_version should be set to
the right value when the asic type isn't between SIENNA_CICHLID and
CHIP_DIMGREY_CAVEFISH.
Signed-off-by: Stanley.Yang
Change-Id: I1edbf3e0557d771eb4c0b686fa5299a3b5f26e35
---
drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c | 2 +-
1 file changed, 1
noretry = 0 causes the KFDGraphicsInterop test to fail on the SRIOV
platform for vega10, so set noretry to 1 for vega10.
Signed-off-by: Stanley.Yang
Change-Id: I241da5c20970ea889909997ff044d6e61642da81
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 1 +
1 file changed, 1 insertion(+)
diff --git
KFDTopologyTest.BasicTest fails if the smc, sdma, sos, ta and asd fw
are skipped in SRIOV for vega10, so adjust the above fw and skip loading
them in SRIOV only for navi12.
Signed-off-by: Stanley.Yang
Change-Id: Id354be93723d7b5d769d73dc67c596af300305af
---
drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
skip loading smu and sdma fw on sriov because sos,
ta and asd fw have already been skipped for SIENNA_CICHLID.
V2:
move asic check into smu11
Signed-off-by: Stanley.Yang
---
drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c | 3 +++
drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 10 --
skip loading smu and sdma fw on sriov because smc, sos,
ta and asd fw have already been skipped for SIENNA_CICHLID.
Signed-off-by: Stanley.Yang
---
drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c| 3 +++
drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 4 +++-
2 files changed, 6 insertions(+), 1 deletion(-)
diff
Signed-off-by: Stanley.Yang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h | 5 +
drivers/gpu/drm/amd/amdgpu/umc_v8_7.c | 2 +-
2 files changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h
b/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h
index
Signed-off-by: Stanley.Yang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index f404c2321a6a..ca5a32944242 100644
---
From: John Clements
support umc ras function initialization for aldebaran
Change-Id: I84155d4d3eaae86a8c1bd2331b1964946c47f6da
Signed-off-by: John Clements
Signed-off-by: Stanley.Yang
---
drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 13 +
drivers/gpu/drm/amd/amdgpu/umc_v6_7.c | 15
Signed-off-by: Stanley.Yang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 22 ++
1 file changed, 22 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index daf63a4c1fff..dfeaa57dd7ea 100644
---
Signed-off-by: Stanley.Yang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 0e16683876aa..d9d292c79cfa 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++
cause:
It is necessary to send the ras disable command to ras-ta during gfx
block ras late init, because the ras capability read from vbios is
disabled for vega20 gaming, but the ras context is released during the
ras init process, and this will cause sending ras disable
cause:
It is necessary to send the ras disable command to ras-ta to program
GB_EDC_MODE to "BYPASS" mode during gfx block ras late init,
because the ras capability read from vbios is disabled for vega20
gaming, but the ras context is released during ras init
Signed-off-by: Stanley.Yang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 10 +-
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index ec3ebc33ee03..8fdf355d7de8 100644
---
Test scenario:
modprobe amdgpu -> rmmod amdgpu -> modprobe amdgpu
Error log:
[ 54.396807] debugfs: File 'page_pool' in directory 'amdttm' already
present!
[ 54.396833] debugfs: File 'page_pool_shrink' in directory 'amdttm'
already present!
[ 54.396848] debugfs: File
Signed-off-by: Stanley.Yang
---
drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
index 6e781cee8bb6..e0a8224e466f 100644
update smu driver if version to 0x08 to avoid mismatch log
A version mismatch can still happen with an older FW
Change-Id: I97f2bc4ed9a9cba313b744e2ff6812c90b244935
Signed-off-by: Stanley.Yang
---
.../drm/amd/pm/inc/smu13_driver_if_aldebaran.h | 18 +-
add message smu to query error information
Signed-off-by: Stanley.Yang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 16 +++
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h | 4 +
drivers/gpu/drm/amd/amdgpu/umc_v6_7.c | 161
3 files changed, 181 insertions(+)
diff --git
support the ECC TABLE message; this table includes the umc ras error count
and error address
Signed-off-by: Stanley.Yang
---
drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h | 7
.../drm/amd/pm/swsmu/smu13/aldebaran_ppt.c| 38 +++
.../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c| 2
If the smu supports ECCTABLE, the driver can message the smu to get ecc_table
and then query umc error info from ECCTABLE.
Apply a pmfw version check to ensure backward compatibility.
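One way to express such a backward-compatibility gate, as a sketch only: the threshold value and the fallback enum below are hypothetical placeholders, since the commit message states neither the minimum PMFW version nor the exact fallback path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* HYPOTHETICAL placeholder: the real minimum PMFW version is not given. */
#define ECCTABLE_MIN_PMFW_VERSION 0x00440000u

/* Hypothetical names for where umc error info is read from. */
enum ecc_source { ECC_FROM_SMU_TABLE, ECC_FROM_REGISTERS };

static bool smu_supports_ecctable(uint32_t pmfw_version)
{
	return pmfw_version >= ECCTABLE_MIN_PMFW_VERSION;
}

/* Newer PMFW: ask the SMU for the ECC table. Older PMFW: keep the
 * previous query path (assumed here) so old firmware still works. */
static enum ecc_source choose_ecc_source(uint32_t pmfw_version)
{
	return smu_supports_ecctable(pmfw_version) ? ECC_FROM_SMU_TABLE
						   : ECC_FROM_REGISTERS;
}
```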
Signed-off-by: Stanley.Yang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 42 ---
update smu driver if version to avoid mismatch log
Change-Id: I97f2bc4ed9a9cba313b744e2ff6812c90b244935
Signed-off-by: Stanley.Yang
---
drivers/gpu/drm/amd/pm/inc/smu_v13_0.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/pm/inc/smu_v13_0.h
update smu driver if and version to avoid mismatch log
v2:
update smu driver interface
Change-Id: I97f2bc4ed9a9cba313b744e2ff6812c90b244935
Signed-off-by: Stanley.Yang
---
.../drm/amd/pm/inc/smu13_driver_if_aldebaran.h | 18 +-
drivers/gpu/drm/amd/pm/inc/smu_v13_0.h
Reason:
{
[ 578.019986] amdgpu :23:00.0: amdgpu: GPU reset begin!
[ 583.245566] amdgpu :23:00.0: amdgpu: Failed to disable smu features.
[ 583.245621] amdgpu :23:00.0: amdgpu: Fail to disable dpm features!
[ 583.245639] [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]]
{
[ 578.019986] amdgpu :23:00.0: amdgpu: GPU reset begin!
[ 583.245566] amdgpu :23:00.0: amdgpu: Failed to disable smu features.
[ 583.245621] amdgpu :23:00.0: amdgpu: Fail to disable dpm features!
[ 583.245639] [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR*
amdgpu_device_fini_hw is called before amdgpu_device_fini_sw,
so the ras ta is unloaded before the ras disable command is sent; the ras
disable operation must happen before hw fini.
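The ordering constraint can be modelled with a small mock in which every name is hypothetical: the disable command only succeeds while the RAS TA is still loaded, so it has to run before the hw fini step unloads it.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal hypothetical model of the teardown state. */
struct mock_dev {
	bool ras_ta_loaded;
	bool ras_features_enabled;
	int disable_failures;
};

static void ras_disable_all_features(struct mock_dev *d)
{
	if (!d->ras_ta_loaded) {
		d->disable_failures++; /* command cannot reach the TA */
		return;
	}
	d->ras_features_enabled = false;
}

/* hw fini unloads the RAS TA */
static void fini_hw(struct mock_dev *d)
{
	d->ras_ta_loaded = false;
}

/* Fixed order: disable RAS while the TA is alive, then tear down hw. */
static void device_fini(struct mock_dev *d)
{
	ras_disable_all_features(d);
	fini_hw(d);
	/* sw fini would follow here */
}
```

Calling fini_hw() first and the disable afterwards reproduces the failure the commit describes: the disable request is dropped and the features stay enabled.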
Signed-off-by: Stanley.Yang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 5 +++--
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
v2:
still need to call ras_disable_all_features to handle the
ras initialization failure case.
amdgpu_device_fini_hw is called before amdgpu_device_fini_sw,
so the ras ta is unloaded before the ras disable command is sent; the ras
disable operation must happen before hw fini.
Signed-off-by: Stanley.Yang
support the ECC TABLE message; this table includes the umc ras error count
and error address
v2:
add an smu version check to query whether ecctable is supported
call smu_cmn_update_table to get ecctable directly
Signed-off-by: Stanley.Yang
---
drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h | 8 +++
add message smu to query error information
v2:
rename message_smu to ecc_info
Signed-off-by: Stanley.Yang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 16 +++
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h | 4 +
drivers/gpu/drm/amd/amdgpu/umc_v6_7.c | 161
3 files
If the smu supports ECCTABLE, the driver can message the smu to get ecc_table
and then query umc error info from ECCTABLE.
v2:
optimize the source code to make the logic more reasonable
Signed-off-by: Stanley.Yang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 42 +++
skip getting ecc info for aldebaran by checking the ip version,
so that other asic types are not affected
Signed-off-by: Stanley.Yang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
this is a workaround because getting ecc info fails during gpu recovery
[ 700.236122] amdgpu :09:00.0: amdgpu: Failed to export SMU ecc table!
[ 700.236128] amdgpu :09:00.0: amdgpu: GPU reset begin!
[ 704.331171] amdgpu: qcm fence wait loop timeout expired
[ 704.331194] amdgpu: The cp
remove the in-recovery state check, and skip the umc ras error count
harvest in amdgpu_ras_log_on_err_counter
Signed-off-by: Stanley.Yang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 15 ++-
1 file changed, 10 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
Signed-off-by: Stanley.Yang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 10 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 2 +-
drivers/gpu/drm/amd/amdgpu/ta_ras_if.h | 3 ++-
3 files changed, 12 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
Changed from v1:
remove unused brace
Signed-off-by: Stanley.Yang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 9 -
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 2 +-
drivers/gpu/drm/amd/amdgpu/ta_ras_if.h | 3 ++-
3 files changed, 11 insertions(+), 3 deletions(-)
diff --git
Change-Id: I6afe0332cbb20528648c38665264930d6b091c2f
Signed-off-by: Stanley.Yang
---
drivers/gpu/drm/amd/pm/amdgpu_dpm.c | 7 ++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
index
Message SMU bad channel information bitmap to update OOB table
Change-Id: I49a79af64d5263c28db059ecb8b8405a471431b4
Signed-off-by: Stanley.Yang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 7 +++
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 3 ++
support messaging SMU with bad channel info to update the HBM bad channel
info in the OOB table
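A sketch of how a bad channel bitmap could be accumulated before being sent to the SMU. The helper names and the 64-channel limit are assumptions for illustration, not the driver's actual interface.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: set the bit for a UMC channel that reported an
 * uncorrectable error. Assumes at most 64 channels (one uint64_t). */
static uint64_t mark_bad_channel(uint64_t bitmap, unsigned int channel_index)
{
	assert(channel_index < 64);
	return bitmap | (UINT64_C(1) << channel_index);
}

/* Counts bad channels in the bitmap. */
static int bad_channel_count(uint64_t bitmap)
{
	int n = 0;

	while (bitmap) {
		bitmap &= bitmap - 1; /* clear the lowest set bit */
		n++;
	}
	return n;
}
```

Marking the same channel twice leaves the bitmap unchanged, so repeated errors on one channel do not inflate the count.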
Change-Id: I1e50ed8118f4c1aaefb04c040e59ae4918cdc295
Signed-off-by: Stanley.Yang
---
drivers/gpu/drm/amd/pm/amdgpu_dpm.c | 12 ++
drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h | 1 +
The driver should notify SMU to update the bad channel info when an
uncorrectable error is detected in the UMC block.
Change-Id: I2dc8848affdb53e52891013953ae9383fff5f20f
Signed-off-by: Stanley.Yang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 7 ++
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 3 +++
print more error info on a deferred uncorrectable ras error
changed from V1:
move the deferred error msg into the query uncorrectable error
count function.
Signed-off-by: Stanley.Yang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 7 +-
drivers/gpu/drm/amd/amdgpu/umc_v6_7.c | 72
Check whether the asic supports smu before calling the smu
powerplay functions; otherwise a NULL pointer may be
dereferenced on asics that do not support smu.
Change-Id: Ib86f3d4c88317b23eb1040b9ce1c5c8dcae42488
Signed-off-by: Stanley.Yang
---
drivers/gpu/drm/amd/pm/amdgpu_dpm.c | 6 ++
1 file changed, 6
The OOB table error count info should be reset after resetting the
eeprom table.
Change-Id: I2a39e0e44b7b1a5ab7d6b4d4b73ebe48264396b7
Signed-off-by: Stanley.Yang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c | 3 +++
1 file changed, 3 insertions(+)
diff --git
the UMC_STATUS register is not linear, so adjust the offset
calculation formula to get the correct address
Change-Id: Ic8926078301848330babf289c4238dc8cbcf313d
Signed-off-by: Stanley.Yang
---
drivers/gpu/drm/amd/amdgpu/umc_v6_7.c | 7 +++
1 file changed, 7 insertions(+)
diff --git
Pmfw reads the ecc info registers in the following order:
umc0: ch_inst 0, 1, 2 ... 7
umc1: ch_inst 0, 1, 2 ... 7
The position of each register value stored in the eccinfo
table is calculated with the formula below:
channel_index = umc_inst * channel_in_umc + ch_inst
Driver directly
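The position formula, written out as a small C function. CHANNEL_INSTANCE_NUM = 8 follows the ch_inst 0 ... 7 ordering; UMC_INST_NUM = 4 follows the umc0 ... umc3 list in the later revision of this patch and is an assumption here.

```c
#include <assert.h>

#define UMC_INST_NUM 4          /* assumed number of UMC instances */
#define CHANNEL_INSTANCE_NUM 8  /* channel_in_umc: ch_inst 0 ... 7 */

/* channel_index = umc_inst * channel_in_umc + ch_inst */
static unsigned int eccinfo_pos(unsigned int umc_inst, unsigned int ch_inst)
{
	assert(umc_inst < UMC_INST_NUM && ch_inst < CHANNEL_INSTANCE_NUM);
	return umc_inst * CHANNEL_INSTANCE_NUM + ch_inst;
}
```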
Pmfw reads the ecc info registers and stores the values in
eccinfo_table in the following order:
umc0 ch_inst 0, 1, 2 ... 7
umc1 ch_inst 0, 1, 2 ... 7
...
umc3 ch_inst 0, 1, 2 ... 7
The driver should convert eccinfo_table_idx into channel_index according
to channel_idx_tbl.
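A sketch of that conversion, assuming the 4 x 8 layout listed above. The channel index table contents here are made up (a simple interleave) purely to keep the example checkable; the real table is hardware specific.

```c
#include <assert.h>

#define UMC_INST_NUM 4
#define CHANNEL_INSTANCE_NUM 8

/* HYPOTHETICAL values: row = umc_inst, column = ch_inst,
 * entry = channel_index. */
static const unsigned int channel_idx_tbl[UMC_INST_NUM][CHANNEL_INSTANCE_NUM] = {
	{ 0, 4,  8, 12, 16, 20, 24, 28 },
	{ 1, 5,  9, 13, 17, 21, 25, 29 },
	{ 2, 6, 10, 14, 18, 22, 26, 30 },
	{ 3, 7, 11, 15, 19, 23, 27, 31 },
};

/* Pmfw stores entries umc-major, so split the flat eccinfo index back
 * into (umc_inst, ch_inst) and look up the real channel index. */
static unsigned int eccinfo_idx_to_channel_index(unsigned int eccinfo_table_idx)
{
	unsigned int umc_inst = eccinfo_table_idx / CHANNEL_INSTANCE_NUM;
	unsigned int ch_inst = eccinfo_table_idx % CHANNEL_INSTANCE_NUM;

	assert(umc_inst < UMC_INST_NUM);
	return channel_idx_tbl[umc_inst][ch_inst];
}
```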
Change-Id: Ic2a488ee253a913d806bd33ee9c90e31a71af320
Signed-off-by: Stanley.Yang
---
drivers/gpu/drm/amd/amdgpu/umc_v6_7.c | 23 ---
drivers/gpu/drm/amd/amdgpu/umc_v8_7.c | 6 --
2 files changed, 29 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v6_7.c
Change-Id: I09a2aae85cde3ab2cb6b042b973da6839ad024ec
Signed-off-by: Stanley.Yang
---
drivers/gpu/drm/amd/amdgpu/umc_v6_7.c | 62 ++-
1 file changed, 60 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v6_7.c
To help debug ras errors, the driver prints the IPID/SYND/MISC0
register values when it detects a correctable or uncorrectable error.
Provide a umc_query_error_status_helper function to reduce code
redundancy.
Change-Id: I09a2aae85cde3ab2cb6b042b973da6839ad024ec
Signed-off-by: Stanley.Yang
---
It should first check whether the block ras obj is set, and
return directly if it is not set.
Signed-off-by: Stanley.Yang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 5 -
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
It should first check whether the block ras obj is set, and
return 0 directly if the block ras obj or hw_ops is not set.
Changed from V1:
return 0 directly if the block ras obj or hw_ops is not set
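A minimal sketch of the early-return pattern, using simplified hypothetical structures rather than the driver's real RAS block types. It also checks the callback pointer itself, which goes slightly beyond what the commit message describes.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical shapes of a RAS block object and its ops. */
struct ras_block_hw_ops {
	int (*query_error_count)(void *ctx);
};

struct ras_block_object {
	const struct ras_block_hw_ops *hw_ops;
	void *ctx;
};

/* Return 0 early when the block object or its hw_ops was never set,
 * instead of dereferencing a NULL pointer. */
static int ras_query_block(const struct ras_block_object *obj)
{
	if (!obj || !obj->hw_ops || !obj->hw_ops->query_error_count)
		return 0;
	return obj->hw_ops->query_error_count(obj->ctx);
}

/* Example callback: ctx points at a stored error count. */
static int fake_query(void *ctx)
{
	return *(int *)ctx;
}
```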
Signed-off-by: Stanley.Yang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 10 +-
1
This is a workaround: the kiq ring test fails in the suspend stage during
ras recovery for gfx v9_4_3.
Change-Id: I8de9900aa76706f59bc029d4e9e8438c6e1db8e0
Signed-off-by: Stanley.Yang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 21 +
1 file changed, 21 insertions(+)
diff --git
Reset the error data info stored in vram when the user clears the eeprom table.
Signed-off-by: Stanley.Yang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 97 ++-
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 2 +
.../gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c| 4 +
3 files
Fix deleting nodes that have already been freed.
Signed-off-by: Stanley.Yang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index
This is a workaround because the ring test fails during ras
gpu recovery for aqua vanjaram.
Signed-off-by: Stanley.Yang
---
drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 6 ++
1 file changed, 6 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
amdgpu_ras_get_context may return NULL if the device does
not support the ras feature, so add a check before using it.
Signed-off-by: Stanley.Yang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
Enable smu_v13_0_6 mca debug mode if ras is enabled.
Changed from V1:
enable mca debug mode if ras enabled.
Signed-off-by: Stanley.Yang
---
drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git
Enable smu_v13_0_6 mca debug mode when GFX RAS feature is enabled
on APU.
Signed-off-by: Stanley.Yang
---
drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
Enable RAS feature by default for aqua vanjaram on apu
platform.
Change-Id: I02105d07d169d1356251c994249a134ca5dd2a7a
Signed-off-by: Stanley.Yang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 14 ++
1 file changed, 2 insertions(+), 12 deletions(-)
diff --git
support umc/gfx/sdma ras on guest side
Changed from V1:
move the sriov check into amdgpu_ras_interrupt_fatal_error_handler
Change-Id: Ic7dda45d8f8cf2d5f1abc7705abc153d558da8a1
Signed-off-by: Stanley.Yang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +++
Fix the aldebaran ras supported check on the SRIOV guest side;
the previous check condition blocked all ras features
on baremetal.
Signed-off-by: Stanley.Yang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git
SMU adds a new variable, mca_ceumc_addr, to record the
umc correctable error address in the EccInfo table; the
driver side adds ecctable_v2 to support this feature.
Signed-off-by: Stanley.Yang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 1 +
drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h | 2 +
Signed-off-by: Stanley.Yang
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 5 ++
drivers/gpu/drm/amd/amdgpu/umc_v6_7.c | 55 ++-
.../drm/amd/pm/swsmu/smu13/aldebaran_ppt.c| 2 +
3 files changed, 60 insertions(+), 2 deletions(-)
diff --git
SMU adds a new variable, mca_ceumc_addr, to record the
umc correctable error address in the EccInfo table; the
driver side adds EccInfo_V2_t to support this feature.
Changed from V1:
remove ecc_table_v2 and the unnecessary table id, and define a union
that includes EccInfo_t and EccInfo_V2_t.
Changed
Changed from V1:
remove unnecessary same row physical address calculation
Changed from V2:
move record_ce_addr_supported to umc_ecc_info struct
Signed-off-by: Stanley.Yang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 5 ++
drivers/gpu/drm/amd/amdgpu/umc_v6_7.c |
support umc/gfx/sdma ras on guest side
Signed-off-by: Stanley.Yang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4
drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c| 4
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c| 23 ++
drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c |
VCN and JPEG RAS are supported, so add the VCN and JPEG ras block ids.
Signed-off-by: Stanley.Yang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 4
drivers/gpu/drm/amd/amdgpu/ta_ras_if.h | 2 ++
2 files changed, 6 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
The EccInfo_t struct in driver_if.h in official release
version 68.55.0 is as below:
typedef struct {
uint64_t mca_umc_status;
uint64_t mca_umc_addr;
uint16_t ce_count_lo_chip;
uint16_t ce_count_hi_chip;
uint32_t eccPadding;
uint64_t mca_ceumc_addr;
} EccInfo_t;
It's different
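For reference, the quoted layout can be pinned down at compile time. The sketch below copies the struct as quoted and adds C11 static assertions (an illustration, not code from the patch): 8 + 8 + 2 + 2 + 4 + 8 = 32 bytes, and eccPadding keeps mca_ceumc_addr at offset 24, so the compiler needs no implicit padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* EccInfo_t as quoted above (PMFW release 68.55.0). */
typedef struct {
	uint64_t mca_umc_status;
	uint64_t mca_umc_addr;
	uint16_t ce_count_lo_chip;
	uint16_t ce_count_hi_chip;
	uint32_t eccPadding;
	uint64_t mca_ceumc_addr;
} EccInfo_t;

/* Driver and PMFW must agree on this layout byte for byte. */
_Static_assert(offsetof(EccInfo_t, ce_count_lo_chip) == 16, "EccInfo_t layout");
_Static_assert(offsetof(EccInfo_t, eccPadding) == 20, "EccInfo_t layout");
_Static_assert(offsetof(EccInfo_t, mca_ceumc_addr) == 24, "EccInfo_t layout");
_Static_assert(sizeof(EccInfo_t) == 32, "EccInfo_t layout");
```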
SMU adds a new variable, mca_ceumc_addr, to record the
umc correctable error address in the EccInfo table; the
driver side adds EccInfo_V2_t to support this feature.
Changed from V1:
remove ecc_table_v2 and the unnecessary table id, and define a union
that includes EccInfo_t and EccInfo_V2_t.
Changed from V1:
remove unnecessary same row physical address calculation
Signed-off-by: Stanley.Yang
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 5 ++
drivers/gpu/drm/amd/amdgpu/umc_v6_7.c | 52 ++-
.../drm/amd/pm/swsmu/smu13/aldebaran_ppt.c| 1 +
[Why]
[ 754.862560] refcount_t: underflow; use-after-free.
[ 754.862898] Call Trace:
[ 754.862903]
[ 754.862913] amdgpu_job_free_cb+0xc2/0xe1 [amdgpu]
[ 754.863543] drm_sched_main.cold+0x34/0x39 [amd_sched]
[How]
The fw_fence may not be initialized; check whether
Signed-off-by: Stanley.Yang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c | 6 ++
1 file changed, 6 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
index 6d2879ac585b..f76b1cb8baf8 100644
---
Signed-off-by: Stanley.Yang
---
drivers/gpu/drm/amd/amdgpu/sdma_v4_4.c | 4
1 file changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4.c
b/drivers/gpu/drm/amd/amdgpu/sdma_v4_4.c
index 6f9895cdddb1..0ddb6955a6d3 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4.c
+++
XGMI RAS enablement should depend on the gmc xgmi physical node count;
XGMI RAS should not be enabled if xgmi num_physical_nodes is zero.
Change-Id: Idf3600b30584b10b528e7237d103d84d5097b7e0
Signed-off-by: Stanley.Yang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 7 +++
1 file changed, 7
Aldebaran supports VCN and JPEG RAS, and an unexpected block id
message is reported during VCN and JPEG RAS initialization if the VCN
and JPEG block ids are not defined.
Change-Id: Icceb43556eec802f11c2077c1c58a1e92c9df599
Signed-off-by: Stanley.Yang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 4
XGMI RAS enablement should depend on the gmc xgmi supported flag
and the xgmi physical node count.
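The enable condition, as a short sketch over a simplified, hypothetical struct that mirrors only the two fields named in the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of the gmc xgmi state. */
struct xgmi_info {
	bool supported;
	unsigned int num_physical_nodes;
};

/* XGMI RAS only makes sense when xgmi is supported and at least one
 * physical node is present. */
static bool xgmi_ras_should_enable(const struct xgmi_info *xgmi)
{
	return xgmi->supported && xgmi->num_physical_nodes > 0;
}
```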
Change-Id: Idf3600b30584b10b528e7237d103d84d5097b7e0
Signed-off-by: Stanley.Yang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 8
1 file changed, 8 insertions(+)
diff --git
Change-Id: Icceb43556eec802f11c2077c1c58a1e92c9df599
Signed-off-by: Stanley.Yang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 4
drivers/gpu/drm/amd/amdgpu/ta_ras_if.h | 2 ++
2 files changed, 6 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
Use "is_app_apu" to identify whether the device is in native
APU mode or carveout mode.
Signed-off-by: Stanley.Yang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 8 +++---
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 34 ++---
3 files
Do not compare injection address with mc_vram_size
if mc_vram_size is zero.
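A sketch of the guarded bound check, with a hypothetical helper name: the injection address is compared against mc_vram_size only when that size is non-zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: reject an address beyond VRAM, but skip the
 * bound check entirely when mc_vram_size is reported as zero. */
static bool injection_addr_valid(uint64_t address, uint64_t mc_vram_size)
{
	if (mc_vram_size == 0)
		return true; /* no VRAM size to compare against */
	return address < mc_vram_size;
}
```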
Signed-off-by: Stanley.Yang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
Enable RAS for aqua vanjaram.
Changed from V1:
Split the change in amdgpu_ras_asic_supported into a
separate patch.
Signed-off-by: Stanley.Yang
Reviewed-by: Hawking Zhang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 1 +
1 file changed, 1 insertion(+)
diff --git
Disable RAS feature by default for aqua vanjaram on APU platform.
Changed from V1:
Split "Disable RAS by default on APU platform" into a
separate patch.
Signed-off-by: Stanley.Yang
Reviewed-by: Hawking Zhang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 9 +
1 file
Enable RAS for aqua vanjaram.
Changed from V1:
Split the change in amdgpu_ras_asic_supported into a
separate patch.
Changed from V2:
Avoid modifying the global variable amdgpu_ras_enable.
Signed-off-by: Stanley.Yang
Reviewed-by: Hawking Zhang
---
Disable RAS feature by default for aqua vanjaram on APU platform.
Changed from V1:
Split "Disable RAS by default on APU platform" into a
separate patch.
Changed from V2:
Avoid modifying the global variable amdgpu_ras_enable.
Signed-off-by: Stanley.Yang
Reviewed-by: Hawking
pass the xcc mask to ras ta; ras ta compares
the mask with the one from the chiplet topology.
Changed from V1:
Remove IP version checking.
Set ras_cmd->ras_init_message.init_flags.xcc_mask
directly because xcc_mask is a common structure for
all devices.
Signed-off-by:
Changed from V1:
Remove amdgpu_ras_logical_mask_to_physical_mask
because GET_MASK provides the same feature.
Support converting the VCN/JPEG logical mask to a physical
mask.
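A sketch of converting a logical instance mask to a physical one through a per-instance lookup. The function name and the mapping array are hypothetical; the real logical-to-physical mapping is device specific.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: logical instance i lives on physical instance
 * log_to_phys[i]. Each set logical bit is moved to its physical bit. */
static uint32_t logical_to_physical_mask(uint32_t logical_mask,
					 const unsigned int *log_to_phys,
					 unsigned int num_inst)
{
	uint32_t phys = 0;

	for (unsigned int i = 0; i < num_inst; i++)
		if (logical_mask & (1u << i))
			phys |= 1u << log_to_phys[i];
	return phys;
}
```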
Signed-off-by: Stanley.Yang
Reviewed-by: Tao Zhou
Reviewed-by: Hawking Zhang
---
Support VCN/JPEG instance mask checking, and pass the logical
mask directly for all blocks except GFX/SDMA/VCN/JPEG.
Changed from V1:
correct a typo
Signed-off-by: Stanley.Yang
Reviewed-by: Tao Zhou
Reviewed-by: Hawking Zhang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 6 +-
1 file changed, 5
Add ras info to the EEPROM table so that an app can analyse the device ECC
status through the EEPROM table ras info without the GPU driver.
Signed-off-by: Stanley.Yang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 2 +-
.../gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c| 204 --
Set EEPROM ras info: rma status, health percent and bad
page threshold.
Signed-off-by: Stanley.Yang
---
.../gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c| 24 +++
.../gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.h| 5
2 files changed, 29 insertions(+)
diff --git
It's more reasonable to check EEPROM table ras info bytes.
Signed-off-by: Stanley.Yang
---
.../gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c| 19 +++
1 file changed, 19 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
Rename RAS_TABLE_VER to RAS_TABLE_VER_V1,
move RAS_TABLE_VER_V1 from amdgpu_ras_eeprom.c to amdgpu_ras_eeprom.h.
Signed-off-by: Stanley.Yang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c | 5 ++---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.h | 2 ++
2 files changed, 4 insertions(+), 3