RE: [PATCH] drm/amdgpu: fix one vf mode

2020-04-26 Thread Zhang, Hawking
[AMD Official Use Only - Internal Distribution Only]

Sorry, I should say: hold this patch until we reach agreement on how to support 
onevf mode in the current software SMU design.

-Original Message-
From: amd-gfx  On Behalf Of Zhang, 
Hawking
Sent: Monday, April 27, 2020 12:25
To: Liu, Monk ; amd-gfx@lists.freedesktop.org
Cc: Liu, Monk 
Subject: RE: [PATCH] drm/amdgpu: fix one vf mode

[AMD Official Use Only - Internal Distribution Only]

As discussed, we want to hold this patch until we have finalized the onevf mode 
support design in the guest driver.

The current approach of adding a one_vf mode check to every SMU function is not 
sustainable and is error prone when new ASIC support is added to the software SMU.

Regards,
Hawking
-Original Message-
From: amd-gfx  On Behalf Of Monk Liu
Sent: Monday, April 27, 2020 11:35
To: amd-gfx@lists.freedesktop.org
Cc: Liu, Monk 
Subject: [PATCH] drm/amdgpu: fix one vf mode

We still need to call system_enable_features for one-vf mode, but we need to block 
the SMU request in the SRIOV case and allow the software-side change to pass in 
"smu_v11_0_system_features_control".

With this patch, pp_dpm_mclk/sclk now shows correct output.

Signed-off-by: Monk Liu 
Signed-off-by: Rohit 
---
 drivers/gpu/drm/amd/powerplay/amdgpu_smu.c |  8 ++++++++
 drivers/gpu/drm/amd/powerplay/smu_v11_0.c  | 13 +++++++++----
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
index 5964d63..bfb026c 100644
--- a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
+++ b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
@@ -1183,7 +1183,15 @@ static int smu_smc_table_hw_init(struct smu_context *smu,
return ret;
}
}
+   } else {
+   /* we need to enable some SMU features for one vf mode */
+   if (amdgpu_sriov_is_pp_one_vf(adev)) {
+   ret = smu_system_features_control(smu, true);
+   if (ret)
+   return ret;
+   }
}
+
if (adev->asic_type != CHIP_ARCTURUS) {
ret = smu_notify_display_change(smu);
if (ret)
diff --git a/drivers/gpu/drm/amd/powerplay/smu_v11_0.c b/drivers/gpu/drm/amd/powerplay/smu_v11_0.c
index 3e1b3ed..6fb2fd1 100644
--- a/drivers/gpu/drm/amd/powerplay/smu_v11_0.c
+++ b/drivers/gpu/drm/amd/powerplay/smu_v11_0.c
@@ -764,6 +764,9 @@ int smu_v11_0_write_pptable(struct smu_context *smu)
struct smu_table_context *table_context = &smu->smu_table;
int ret = 0;
 
+   if (amdgpu_sriov_vf(smu->adev))
+   return 0;
+
ret = smu_update_table(smu, SMU_TABLE_PPTABLE, 0,
   table_context->driver_pptable, true);
 
@@ -922,10 +925,12 @@ int smu_v11_0_system_features_control(struct smu_context *smu,
uint32_t feature_mask[2];
int ret = 0;
 
-   ret = smu_send_smc_msg(smu, (en ? SMU_MSG_EnableAllSmuFeatures :
-                                SMU_MSG_DisableAllSmuFeatures), NULL);
-   if (ret)
-   return ret;
+   if (!amdgpu_sriov_vf(smu->adev)) {
+   ret = smu_send_smc_msg(smu, (en ? SMU_MSG_EnableAllSmuFeatures :
+                                SMU_MSG_DisableAllSmuFeatures), NULL);
+   if (ret)
+   return ret;
+   }
 
bitmap_zero(feature->enabled, feature->feature_num);
bitmap_zero(feature->supported, feature->feature_num);
--
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx



[PATCH v3] drm/amdkfd: Track GPU memory utilization per process

2020-04-26 Thread Mukul Joshi
Track GPU VRAM usage on a per-process basis and report it through
sysfs.

v2:
   - Handle AMDGPU BO-specific details in
 amdgpu_amdkfd_gpuvm_free_memory_of_gpu().
   - Return size of VRAM BO being freed from
 amdgpu_amdkfd_gpuvm_free_memory_of_gpu().
   - Do not consider imported memory for VRAM
 usage calculations.

v3:
   - Move handling of imported BO size from
 kfd_ioctl_free_memory_of_gpu() to  
 amdgpu_amdkfd_gpuvm_free_memory_of_gpu().

Signed-off-by: Mukul Joshi 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h|  3 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 16 +-
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  | 13 -
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |  7 +++
 drivers/gpu/drm/amd/amdkfd/kfd_process.c  | 57 ---
 5 files changed, 84 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index d065c50582eb..a501026e829c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -65,6 +65,7 @@ struct kgd_mem {
struct amdgpu_sync sync;
 
bool aql_queue;
+   bool is_imported;
 };
 
 /* KFD Memory Eviction */
@@ -219,7 +220,7 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
void *vm, struct kgd_mem **mem,
uint64_t *offset, uint32_t flags);
 int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
-   struct kgd_dev *kgd, struct kgd_mem *mem);
+   struct kgd_dev *kgd, struct kgd_mem *mem, uint64_t *size);
 int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
struct kgd_dev *kgd, struct kgd_mem *mem, void *vm);
 int amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu(
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 0768b7eb7683..1247938b1ec1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1277,7 +1277,7 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
 }
 
 int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
-   struct kgd_dev *kgd, struct kgd_mem *mem)
+   struct kgd_dev *kgd, struct kgd_mem *mem, uint64_t *size)
 {
struct amdkfd_process_info *process_info = mem->process_info;
unsigned long bo_size = mem->bo->tbo.mem.size;
@@ -1286,9 +1286,11 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
struct ttm_validate_buffer *bo_list_entry;
unsigned int mapped_to_gpu_memory;
int ret;
+   bool is_imported = 0;
 
mutex_lock(&mem->lock);
mapped_to_gpu_memory = mem->mapped_to_gpu_memory;
+   is_imported = mem->is_imported;
mutex_unlock(&mem->lock);
/* lock is not needed after this, since mem is unused and will
 * be freed anyway
@@ -1340,6 +1342,17 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
kfree(mem->bo->tbo.sg);
}
 
+   /* Update the size of the BO being freed if it was allocated from
+* VRAM and is not imported.
+*/
+   if (size) {
+   if ((mem->bo->preferred_domains == AMDGPU_GEM_DOMAIN_VRAM) &&
+   (!is_imported))
+   *size = bo_size;
+   else
+   *size = 0;
+   }
+
/* Free the BO*/
amdgpu_bo_unref(&mem->bo);
mutex_destroy(&mem->lock);
@@ -1694,6 +1707,7 @@ int amdgpu_amdkfd_gpuvm_import_dmabuf(struct kgd_dev *kgd,
(*mem)->process_info = avm->process_info;
add_kgd_mem_to_kfd_bo_list(*mem, avm->process_info, false);
amdgpu_sync_create(&(*mem)->sync);
+   (*mem)->is_imported = true;
 
return 0;
 }
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index f8fa03a12add..ede84f76397f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1322,6 +1322,10 @@ static int kfd_ioctl_alloc_memory_of_gpu(struct file *filep,
goto err_free;
}
 
+   /* Update the VRAM usage count */
+   if (flags & KFD_IOC_ALLOC_MEM_FLAGS_VRAM)
+   pdd->vram_usage += args->size;
+
mutex_unlock(&p->mutex);
 
args->handle = MAKE_HANDLE(args->gpu_id, idr_handle);
@@ -1337,7 +1341,7 @@ static int kfd_ioctl_alloc_memory_of_gpu(struct file *filep,
return 0;
 
 err_free:
-   amdgpu_amdkfd_gpuvm_free_memory_of_gpu(dev->kgd, (struct kgd_mem *)mem);
+   amdgpu_amdkfd_gpuvm_free_memory_of_gpu(dev->kgd, (struct kgd_mem *)mem, NULL);
 err_unlock:
mutex_unlock(&p->mutex);
return err;
@@ -1351,6 +1355,7 @@ static int kfd_ioctl_free_memory_of_gpu(struct file *filep,
void *mem;
struct kfd_dev *dev;
int ret;
+   uint64_t size = 0;
 
dev = kfd_device_by_id(GET_GPU_ID(args->handle));
if (!dev)
@@ -1373,7 +1378,7 @@ static int kfd_ioctl_free_memory_of_gpu(struct file 

[PATCH] drm/amdgpu: fix one vf mode

2020-04-26 Thread Monk Liu
We still need to call system_enable_features for one-vf mode, but we need
to block the SMU request in the SRIOV case and allow the software-side
change to pass in "smu_v11_0_system_features_control".

With this patch, pp_dpm_mclk/sclk now shows correct output.

Signed-off-by: Monk Liu 
Signed-off-by: Rohit 
---
 drivers/gpu/drm/amd/powerplay/amdgpu_smu.c |  8 ++++++++
 drivers/gpu/drm/amd/powerplay/smu_v11_0.c  | 13 +++++++++----
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
index 5964d63..bfb026c 100644
--- a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
+++ b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
@@ -1183,7 +1183,15 @@ static int smu_smc_table_hw_init(struct smu_context *smu,
return ret;
}
}
+   } else {
+   /* we need to enable some SMU features for one vf mode */
+   if (amdgpu_sriov_is_pp_one_vf(adev)) {
+   ret = smu_system_features_control(smu, true);
+   if (ret)
+   return ret;
+   }
}
+
if (adev->asic_type != CHIP_ARCTURUS) {
ret = smu_notify_display_change(smu);
if (ret)
diff --git a/drivers/gpu/drm/amd/powerplay/smu_v11_0.c b/drivers/gpu/drm/amd/powerplay/smu_v11_0.c
index 3e1b3ed..6fb2fd1 100644
--- a/drivers/gpu/drm/amd/powerplay/smu_v11_0.c
+++ b/drivers/gpu/drm/amd/powerplay/smu_v11_0.c
@@ -764,6 +764,9 @@ int smu_v11_0_write_pptable(struct smu_context *smu)
struct smu_table_context *table_context = &smu->smu_table;
int ret = 0;
 
+   if (amdgpu_sriov_vf(smu->adev))
+   return 0;
+
ret = smu_update_table(smu, SMU_TABLE_PPTABLE, 0,
   table_context->driver_pptable, true);
 
@@ -922,10 +925,12 @@ int smu_v11_0_system_features_control(struct smu_context *smu,
uint32_t feature_mask[2];
int ret = 0;
 
-   ret = smu_send_smc_msg(smu, (en ? SMU_MSG_EnableAllSmuFeatures :
-                                SMU_MSG_DisableAllSmuFeatures), NULL);
-   if (ret)
-   return ret;
+   if (!amdgpu_sriov_vf(smu->adev)) {
+   ret = smu_send_smc_msg(smu, (en ? SMU_MSG_EnableAllSmuFeatures :
+                                SMU_MSG_DisableAllSmuFeatures), NULL);
+   if (ret)
+   return ret;
+   }
 
bitmap_zero(feature->enabled, feature->feature_num);
bitmap_zero(feature->supported, feature->feature_num);
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 2/2] drm/amd/powerplay: avoid using pm_en before it is initialized revised

2020-04-26 Thread Tiecheng Zhou
hwmgr->pm_en is initialized in hwmgr_hw_init.

During amdgpu_device_init, amdgpu_asic_reset calls soc15_asic_reset
(for the V320 use case, a Vega10 ASIC), in which:
1) soc15_asic_reset_method calls pp_get_asic_baco_capability (uses pm_en)
2) soc15_asic_baco_reset calls pp_set_asic_baco_state (uses pm_en)

pm_en is used in the above two cases before it has been initialized.

So avoid using pm_en in those two functions for V320 passthrough.

Signed-off-by: Tiecheng Zhou 
---
 drivers/gpu/drm/amd/powerplay/amd_powerplay.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
index 71b843f542d8..fc31499c2e5c 100644
--- a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
+++ b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
@@ -1438,7 +1438,8 @@ static int pp_get_asic_baco_capability(void *handle, bool *cap)
if (!hwmgr)
return -EINVAL;
 
-   if (!hwmgr->pm_en || !hwmgr->hwmgr_func->get_asic_baco_capability)
+   if (!(hwmgr->not_vf && amdgpu_dpm) ||
+   !hwmgr->hwmgr_func->get_asic_baco_capability)
return 0;
 
mutex_lock(&hwmgr->smu_lock);
@@ -1472,7 +1473,8 @@ static int pp_set_asic_baco_state(void *handle, int state)
if (!hwmgr)
return -EINVAL;
 
-   if (!hwmgr->pm_en || !hwmgr->hwmgr_func->set_asic_baco_state)
+   if (!(hwmgr->not_vf && amdgpu_dpm) ||
+   !hwmgr->hwmgr_func->set_asic_baco_state)
return 0;
 
mutex_lock(&hwmgr->smu_lock);
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 1/2] Revert "drm/amd/powerplay: avoid using pm_en before it is initialized"

2020-04-26 Thread Tiecheng Zhou
This reverts commit 764a21cb085b8d7d754b5d74e2ecc6adc064e3e7.

The commit being reverted changed the wrong place; the change should have
been made in get_asic_baco_capability.

Signed-off-by: Tiecheng Zhou 
---
 drivers/gpu/drm/amd/powerplay/amd_powerplay.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
index fdff3e1c5e95..71b843f542d8 100644
--- a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
+++ b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
@@ -1455,8 +1455,7 @@ static int pp_get_asic_baco_state(void *handle, int *state)
if (!hwmgr)
return -EINVAL;
 
-   if (!(hwmgr->not_vf && amdgpu_dpm) ||
-   !hwmgr->hwmgr_func->get_asic_baco_state)
+   if (!hwmgr->pm_en || !hwmgr->hwmgr_func->get_asic_baco_state)
return 0;
 
mutex_lock(&hwmgr->smu_lock);
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH 2/2] drm/amd/powerplay: avoid using pm_en before it is initialized revised

2020-04-26 Thread Zhou, Tiecheng
[AMD Official Use Only - Internal Distribution Only]

Sorry for the bother, I'm going to send another one.

It seems pm_en should not be used in get_asic_baco_capability, 
get_asic_baco_state, or set_asic_baco_state.


-Original Message-
From: Tiecheng Zhou  
Sent: Monday, April 27, 2020 9:57 AM
To: amd-gfx@lists.freedesktop.org
Cc: Zhou, Tiecheng ; Tao, Yintian 
Subject: [PATCH 2/2] drm/amd/powerplay: avoid using pm_en before it is 
initialized revised

hwmgr->pm_en is initialized in hwmgr_hw_init.
During amdgpu_device_init, amdgpu_asic_reset calls pp_get_asic_baco_capability 
before hwmgr->pm_en has been initialized.

This avoids using pm_en in pp_get_asic_baco_capability.

Signed-off-by: Tiecheng Zhou 
Signed-off-by: Yintian Tao 
---
 drivers/gpu/drm/amd/powerplay/amd_powerplay.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
index 71b843f542d8..fb4ca614f6e3 100644
--- a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
+++ b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
@@ -1438,7 +1438,8 @@ static int pp_get_asic_baco_capability(void *handle, bool *cap)
if (!hwmgr)
return -EINVAL;
 
-   if (!hwmgr->pm_en || !hwmgr->hwmgr_func->get_asic_baco_capability)
+   if (!(hwmgr->not_vf && amdgpu_dpm) ||
+   !hwmgr->hwmgr_func->get_asic_baco_capability)
return 0;
 
mutex_lock(&hwmgr->smu_lock);
--
2.17.1
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx




Re: [PATCH] drm/radeon: cleanup coding style a bit

2020-04-26 Thread Christian König

On 26.04.20 15:12, Bernard Zhao wrote:

No need to check ws before kfree; kfree checks it itself: if the
pointer is NULL, kfree will just
return

Signed-off-by: Bernard Zhao 


Reviewed-by: Christian König 

I'm wondering why the automated scripts haven't found that one before.


---
  drivers/gpu/drm/radeon/atom.c | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/radeon/atom.c b/drivers/gpu/drm/radeon/atom.c
index 2c27627b6659..f15b20da5315 100644
--- a/drivers/gpu/drm/radeon/atom.c
+++ b/drivers/gpu/drm/radeon/atom.c
@@ -1211,8 +1211,7 @@ static int atom_execute_table_locked(struct atom_context *ctx, int index, uint32
SDEBUG("<<\n");
  
  free:

-   if (ws)
-   kfree(ectx.ws);
+   kfree(ectx.ws);
return ret;
  }
  


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH 0/2] add correctable error query support on arcturus

2020-04-26 Thread Zhou1, Tao
[AMD Official Use Only - Internal Distribution Only]

The series is:

Reviewed-by: Tao Zhou 

> -Original Message-
> From: Chen, Guchun 
> Sent: April 26, 2020 17:17
> To: amd-gfx@lists.freedesktop.org; Zhang, Hawking
> ; Li, Dennis ; Zhou1, Tao
> ; Clements, John ;
> Deucher, Alexander 
> Cc: Li, Candice ; Chen, Guchun
> 
> Subject: [PATCH 0/2] add correctable error query support on arcturus
> 
> Below two patches are submitted to ensure the UMC correctable error query
> works fine on arcturus.
> Patch 1 is to switch RSMU UMC index access to SMN interface to make it
> stable, and to be consistent with other register access in this file.
> Patch 2 is to decouple EccErrCnt error count query and clear operation, due
> to unknown hardware cause.
> 
> Both are verified on arcturus and Vega20.
> 
> Guchun Chen (2):
>   drm/amdgpu: switch to SMN interface to operate RSMU index mode
>   drm/amdgpu: decouple EccErrCnt query and clear operation.
> 
>  drivers/gpu/drm/amd/amdgpu/umc_v6_1.c | 112 +++---
>  1 file changed, 103 insertions(+), 9 deletions(-)
> 
> --
> 2.17.1
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm/amd/display: remove conversion to bool amdgpu_dm.c

2020-04-26 Thread Jason Yan
The '==' expression itself is bool, no need to convert it to bool again.
This fixes the following coccicheck warning:

drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c:7230:16-21: WARNING:
conversion to bool not needed here

Signed-off-by: Jason Yan 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 71309ee3aca3..4051eee86d43 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -7242,8 +7242,7 @@ static void amdgpu_dm_atomic_commit_tail(struct drm_atomic_state *state)
	hdcp_update_display(
		adev->dm.hdcp_workqueue, aconnector->dc_link->link_index, aconnector,
		new_con_state->hdcp_content_type,
-		new_con_state->content_protection == DRM_MODE_CONTENT_PROTECTION_DESIRED ? true
-											  : false);
+		new_con_state->content_protection == DRM_MODE_CONTENT_PROTECTION_DESIRED);
}
 #endif
 
-- 
2.21.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm/amd/powerplay: remove conversion to bool in navi10_ppt.c

2020-04-26 Thread Jason Yan
The '==' expression itself is bool, no need to convert it to bool again.
This fixes the following coccicheck warning:

drivers/gpu/drm/amd/powerplay/navi10_ppt.c:698:47-52: WARNING:
conversion to bool not needed here

Signed-off-by: Jason Yan 
---
 drivers/gpu/drm/amd/powerplay/navi10_ppt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/powerplay/navi10_ppt.c b/drivers/gpu/drm/amd/powerplay/navi10_ppt.c
index 2184d247a9f7..135442c36273 100644
--- a/drivers/gpu/drm/amd/powerplay/navi10_ppt.c
+++ b/drivers/gpu/drm/amd/powerplay/navi10_ppt.c
@@ -695,7 +695,7 @@ static bool navi10_is_support_fine_grained_dpm(struct smu_context *smu, enum smu
	dpm_desc = &pptable->DpmDescriptor[clk_index];
 
/* 0 - Fine grained DPM, 1 - Discrete DPM */
-   return dpm_desc->SnapToDiscrete == 0 ? true : false;
+   return dpm_desc->SnapToDiscrete == 0;
 }
 
 static inline bool navi10_od_feature_is_supported(struct 
smu_11_0_overdrive_table *od_table, enum SMU_11_0_ODFEATURE_CAP cap)
-- 
2.21.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm/amd/powerplay: avoid using pm_en before it is initialized 2nd

2020-04-26 Thread Tiecheng Zhou
hwmgr->pm_en is initialized in hwmgr_hw_init.
During amdgpu_device_init, amdgpu_asic_reset calls pp_get_asic_baco_capability 
before hwmgr->pm_en has been initialized.

This is a second patch that avoids using pm_en in pp_get_asic_baco_capability.

Signed-off-by: Tiecheng Zhou 
Signed-off-by: Yintian Tao 
---
 drivers/gpu/drm/amd/powerplay/amd_powerplay.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
index fdff3e1c5e95..b27f71c75550 100644
--- a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
+++ b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
@@ -1438,7 +1438,8 @@ static int pp_get_asic_baco_capability(void *handle, bool *cap)
if (!hwmgr)
return -EINVAL;
 
-   if (!hwmgr->pm_en || !hwmgr->hwmgr_func->get_asic_baco_capability)
+   if (!(hwmgr->not_vf && amdgpu_dpm) ||
+   !hwmgr->hwmgr_func->get_asic_baco_capability)
return 0;
 
mutex_lock(&hwmgr->smu_lock);
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: Add autodump debugfs node for gpu reset v4

2020-04-26 Thread Christian König

On 26.04.20 12:09, jia...@amd.com wrote:

From: Jiange Zhao 

When GPU got timeout, it would notify an interested part
of an opportunity to dump info before actual GPU reset.

A usermode app would open 'autodump' node under debugfs system
and poll() for readable/writable. When a GPU reset is due,
amdgpu would notify usermode app through wait_queue_head and give
it 10 minutes to dump info.

After usermode app has done its work, this 'autodump' node is closed.
On node closure, amdgpu gets to know the dump is done through
the completion that is triggered in release().

There is no write or read callback because necessary info can be
obtained through dmesg and umr. Messages back and forth between
usermode app and amdgpu are unnecessary.

v2: (1) changed 'registered' to 'app_listening'
 (2) add a mutex in open() to prevent race condition

v3 (chk): grab the reset lock to avoid race in autodump_open,
   rename debugfs file to amdgpu_autodump,
   provide autodump_read as well,
   style and code cleanups

v4: add 'bool app_listening' to differentiate situations, so that
 the node can be reopened; also, there is no need to wait for
 completion when no app is waiting for a dump.


NAK, exactly that is racy and should be avoided.

What problem are you seeing here?

Regards,
Christian.



Signed-off-by: Jiange Zhao 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu.h |  2 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 82 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.h |  7 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  |  2 +
  4 files changed, 92 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index bc1e0fd71a09..6f8ef98c4b97 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -990,6 +990,8 @@ struct amdgpu_device {
char product_number[16];
char product_name[32];
char serial[16];
+
+   struct amdgpu_autodump  autodump;
  };
  
  static inline struct amdgpu_device *amdgpu_ttm_adev(struct ttm_bo_device *bdev)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index 1a4894fa3693..04720264e8b9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -27,7 +27,7 @@
  #include 
  #include 
  #include 
-
+#include <linux/poll.h>
  #include 
  
  #include "amdgpu.h"

@@ -74,7 +74,85 @@ int amdgpu_debugfs_add_files(struct amdgpu_device *adev,
return 0;
  }
  
+int amdgpu_debugfs_wait_dump(struct amdgpu_device *adev)
+{
+#if defined(CONFIG_DEBUG_FS)
+   unsigned long timeout = 600 * HZ;
+   int ret;
+
+   if (!adev->autodump.app_listening)
+   return 0;
+
+   wake_up_interruptible(&adev->autodump.gpu_hang);
+
+   ret = wait_for_completion_interruptible_timeout(&adev->autodump.dumping, timeout);
+   if (ret == 0) {
+   pr_err("autodump: timeout, move on to gpu recovery\n");
+   return -ETIMEDOUT;
+   }
+#endif
+   return 0;
+}
+
+#if defined(CONFIG_DEBUG_FS)
+
+static int amdgpu_debugfs_autodump_open(struct inode *inode, struct file *file)
+{
+   struct amdgpu_device *adev = inode->i_private;
+   int ret;
+
+   file->private_data = adev;
+
+   mutex_lock(&adev->lock_reset);
+   if (!adev->autodump.app_listening) {
+   adev->autodump.app_listening = true;
+   ret = 0;
+   } else {
+   ret = -EBUSY;
+   }
+   mutex_unlock(&adev->lock_reset);
+
+   return ret;
+}
+
+static int amdgpu_debugfs_autodump_release(struct inode *inode, struct file *file)
+{
+   struct amdgpu_device *adev = file->private_data;
+
+   complete(&adev->autodump.dumping);
+   adev->autodump.app_listening = false;
+   return 0;
+}
+
+static unsigned int amdgpu_debugfs_autodump_poll(struct file *file, struct poll_table_struct *poll_table)
+{
+   struct amdgpu_device *adev = file->private_data;
+
+   poll_wait(file, &adev->autodump.gpu_hang, poll_table);
+
+   if (adev->in_gpu_reset)
+   return POLLIN | POLLRDNORM | POLLWRNORM;
+
+   return 0;
+}
+
+static const struct file_operations autodump_debug_fops = {
+   .owner = THIS_MODULE,
+   .open = amdgpu_debugfs_autodump_open,
+   .poll = amdgpu_debugfs_autodump_poll,
+   .release = amdgpu_debugfs_autodump_release,
+};
+
+static void amdgpu_debugfs_autodump_init(struct amdgpu_device *adev)
+{
+   init_completion(&adev->autodump.dumping);
+   init_waitqueue_head(&adev->autodump.gpu_hang);
+   adev->autodump.app_listening = false;
+
+   debugfs_create_file("amdgpu_autodump", 0600,
+   adev->ddev->primary->debugfs_root,
+   adev, &autodump_debug_fops);
+}
  
  /**

   * amdgpu_debugfs_process_reg_op - Handle MMIO register reads/writes
@@ -1434,6 

Re: [PATCH] drm/amdgpu: Add autodump debugfs node for gpu reset v3

2020-04-26 Thread Zhao, Jiange
[AMD Official Use Only - Internal Distribution Only]

Hi @Christian König,

I pulled your patch and tried it. And I have some doubts:

(1) I can't fully understand the functionality of read() here.
(2) amdgpu_autodump can't be reopened, because 
wait_for_completion_interruptible_timeout() decrements 
autodump.dumping.done.
(3) When there is no usermode app listening on this node, 
amdgpu_debugfs_wait_dump() would get stuck and the reset can't be performed.

So I made some changes on top of your modification as version 4. Please have a 
look.

Jiange

From: Christian König 
Sent: Friday, April 24, 2020 7:47 PM
To: amd-gfx@lists.freedesktop.org ; 
Pelloux-prayer, Pierre-eric ; Zhang, 
Hawking ; Liu, Monk ; Zhao, Jiange 

Cc: Kuehling, Felix 
Subject: [PATCH] drm/amdgpu: Add autodump debugfs node for gpu reset v3

From: Jiange Zhao 

When the GPU gets a timeout, it notifies an interested party
of an opportunity to dump info before the actual GPU reset.

A usermode app would open 'autodump' node under debugfs system
and poll() for readable/writable. When a GPU reset is due,
amdgpu would notify usermode app through wait_queue_head and give
it 10 minutes to dump info.

After usermode app has done its work, this 'autodump' node is closed.
On node closure, amdgpu gets to know the dump is done through
the completion that is triggered in release().

There is no write or read callback because necessary info can be
obtained through dmesg and umr. Messages back and forth between
usermode app and amdgpu are unnecessary.
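From the usermode side, the intended flow can be sketched as below. This is
a minimal illustration, not part of the patch: the `wait_for_hang()` and
`run_autodump_listener()` helpers and the debugfs path are assumptions for
the example; only the open/poll/close sequence mirrors the patch.

```c
/* Sketch of a usermode autodump listener (illustrative only). */
#include <poll.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Block until fd becomes readable (the kernel wakes the wait queue on a
 * GPU hang); returns 1 when a dump should be taken, 0 on timeout/error. */
static int wait_for_hang(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int ret = poll(&pfd, 1, timeout_ms);

	return ret > 0 && (pfd.revents & POLLIN);
}

/* Typical node path (assumed): /sys/kernel/debug/dri/0/amdgpu_autodump.
 * open() registers this process as the listener; close() signals the
 * kernel (via release()) that the dump is done so the reset can proceed. */
static int run_autodump_listener(const char *node)
{
	int fd = open(node, O_RDONLY);

	if (fd < 0)
		return -1;
	if (wait_for_hang(fd, -1)) {
		/* GPU reset pending: run dump tooling (dmesg, umr, ...)
		 * here, within the kernel's 10 minute window. */
		printf("GPU reset pending, dumping state\n");
	}
	close(fd);
	return 0;
}
```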

v2: (1) changed 'registered' to 'app_listening'
(2) add a mutex in open() to prevent race condition

v3 (chk): grab the reset lock to avoid race in autodump_open,
  rename debugfs file to amdgpu_autodump,
  provide autodump_read as well,
  style and code cleanups

Signed-off-by: Jiange Zhao 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h |  2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 92 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.h |  6 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  |  2 +
 4 files changed, 101 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index bc1e0fd71a09..6f8ef98c4b97 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -990,6 +990,8 @@ struct amdgpu_device {
 charproduct_number[16];
 charproduct_name[32];
 charserial[16];
+
+   struct amdgpu_autodump  autodump;
 };

 static inline struct amdgpu_device *amdgpu_ttm_adev(struct ttm_bo_device *bdev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index 1a4894fa3693..b1029d12a971 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -27,7 +27,7 @@
 #include 
 #include 
 #include 
-
+#include <linux/poll.h>
 #include 

 #include "amdgpu.h"
@@ -74,8 +74,96 @@ int amdgpu_debugfs_add_files(struct amdgpu_device *adev,
 return 0;
 }

+int amdgpu_debugfs_wait_dump(struct amdgpu_device *adev)
+{
+#if defined(CONFIG_DEBUG_FS)
+   const unsigned long timeout = 600 * HZ;
+   int ret;
+
+   wake_up_interruptible(&adev->autodump.gpu_hang);
+
+   ret = wait_for_completion_interruptible_timeout(&adev->autodump.dumping,
+   timeout);
+   if (ret == 0) {
+   pr_err("autodump: timeout, move on to gpu recovery\n");
+   return -ETIMEDOUT;
+   }
+#endif
+   return 0;
+}
+
 #if defined(CONFIG_DEBUG_FS)

+static int amdgpu_debugfs_autodump_open(struct inode *inode, struct file *file)
+{
+   struct amdgpu_device *adev = inode->i_private;
+   int ret;
+
+   file->private_data = adev;
+
+   mutex_lock(&adev->lock_reset);
+   if (adev->autodump.dumping.done) {
+   reinit_completion(&adev->autodump.dumping);
+   ret = 0;
+   } else {
+   ret = -EBUSY;
+   }
+   mutex_unlock(&adev->lock_reset);
+
+   return ret;
+}
+
+static int amdgpu_debugfs_autodump_release(struct inode *inode,
+  struct file *file)
+{
+   struct amdgpu_device *adev = file->private_data;
+
+   complete(&adev->autodump.dumping);
+   return 0;
+}
+
+static ssize_t amdgpu_debugfs_autodump_read(struct file *file,
+   char __user *buf,
+   size_t size, loff_t *pos)
+{
+   struct amdgpu_device *adev = file->private_data;
+
+   wait_event_interruptible(adev->autodump.gpu_hang, adev->in_gpu_reset);
+   return 0;
+}
+
+unsigned int amdgpu_debugfs_autodump_poll(struct file *file,
+ struct poll_table_struct *poll_table)
+{
+   struct amdgpu_device *adev = 

[PATCH] drm/amdgpu: Add autodump debugfs node for gpu reset v4

2020-04-26 Thread jianzh
From: Jiange Zhao 

When the GPU gets a timeout, it notifies an interested party
of an opportunity to dump info before the actual GPU reset.

A usermode app would open 'autodump' node under debugfs system
and poll() for readable/writable. When a GPU reset is due,
amdgpu would notify usermode app through wait_queue_head and give
it 10 minutes to dump info.

After usermode app has done its work, this 'autodump' node is closed.
On node closure, amdgpu gets to know the dump is done through
the completion that is triggered in release().

There is no write or read callback because necessary info can be
obtained through dmesg and umr. Messages back and forth between
usermode app and amdgpu are unnecessary.

v2: (1) changed 'registered' to 'app_listening'
(2) add a mutex in open() to prevent race condition

v3 (chk): grab the reset lock to avoid race in autodump_open,
  rename debugfs file to amdgpu_autodump,
  provide autodump_read as well,
  style and code cleanups

v4: add 'bool app_listening' to differentiate situations, so that
the node can be reopened; also, there is no need to wait for
completion when no app is waiting for a dump.
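The reopen semantics this version adds can be modeled in a few lines. This
is a toy standalone simulation, not driver code: the names mirror the patch,
but the functions here are assumptions for illustration, and the driver's
locking and completion handling are omitted.

```c
#include <stdbool.h>

#define EBUSY 16		/* kernel errno value for -EBUSY */

static bool app_listening;	/* mirrors adev->autodump.app_listening */

/* open(): only one listener at a time; a second open fails with -EBUSY.
 * In the driver this check runs under adev->lock_reset. */
static int autodump_open(void)
{
	if (app_listening)
		return -EBUSY;
	app_listening = true;
	return 0;
}

/* release(): clearing the flag is what lets the node be reopened (the v4
 * fix); in the driver this also completes autodump.dumping so a pending
 * GPU reset can proceed. */
static int autodump_release(void)
{
	app_listening = false;
	return 0;
}
```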

Signed-off-by: Jiange Zhao 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h |  2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 82 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.h |  7 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  |  2 +
 4 files changed, 92 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index bc1e0fd71a09..6f8ef98c4b97 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -990,6 +990,8 @@ struct amdgpu_device {
charproduct_number[16];
charproduct_name[32];
charserial[16];
+
+   struct amdgpu_autodump  autodump;
 };
 
 static inline struct amdgpu_device *amdgpu_ttm_adev(struct ttm_bo_device *bdev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index 1a4894fa3693..04720264e8b9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -27,7 +27,7 @@
 #include 
 #include 
 #include 
-
+#include <linux/poll.h>
 #include 
 
 #include "amdgpu.h"
@@ -74,7 +74,85 @@ int amdgpu_debugfs_add_files(struct amdgpu_device *adev,
return 0;
 }
 
+int amdgpu_debugfs_wait_dump(struct amdgpu_device *adev)
+{
 #if defined(CONFIG_DEBUG_FS)
+   unsigned long timeout = 600 * HZ;
+   int ret;
+
+   if (!adev->autodump.app_listening)
+   return 0;
+
+   wake_up_interruptible(&adev->autodump.gpu_hang);
+
+   ret = wait_for_completion_interruptible_timeout(&adev->autodump.dumping,
+   timeout);
+   if (ret == 0) {
+   pr_err("autodump: timeout, move on to gpu recovery\n");
+   return -ETIMEDOUT;
+   }
+#endif
+   return 0;
+}
+
+#if defined(CONFIG_DEBUG_FS)
+
+static int amdgpu_debugfs_autodump_open(struct inode *inode, struct file *file)
+{
+   struct amdgpu_device *adev = inode->i_private;
+   int ret;
+
+   file->private_data = adev;
+
+   mutex_lock(&adev->lock_reset);
+   if (!adev->autodump.app_listening) {
+   adev->autodump.app_listening = true;
+   ret = 0;
+   } else {
+   ret = -EBUSY;
+   }
+   mutex_unlock(&adev->lock_reset);
+
+   return ret;
+}
+
+static int amdgpu_debugfs_autodump_release(struct inode *inode,
+  struct file *file)
+{
+   struct amdgpu_device *adev = file->private_data;
+
+   complete(&adev->autodump.dumping);
+   adev->autodump.app_listening = false;
+   return 0;
+}
+
+static unsigned int amdgpu_debugfs_autodump_poll(struct file *file,
+   struct poll_table_struct *poll_table)
+{
+   struct amdgpu_device *adev = file->private_data;
+
+   poll_wait(file, &adev->autodump.gpu_hang, poll_table);
+
+   if (adev->in_gpu_reset)
+   return POLLIN | POLLRDNORM | POLLWRNORM;
+
+   return 0;
+}
+
+static const struct file_operations autodump_debug_fops = {
+   .owner = THIS_MODULE,
+   .open = amdgpu_debugfs_autodump_open,
+   .poll = amdgpu_debugfs_autodump_poll,
+   .release = amdgpu_debugfs_autodump_release,
+};
+
+static void amdgpu_debugfs_autodump_init(struct amdgpu_device *adev)
+{
+   init_completion(&adev->autodump.dumping);
+   init_waitqueue_head(&adev->autodump.gpu_hang);
+   adev->autodump.app_listening = false;
+
+   debugfs_create_file("amdgpu_autodump", 0600,
+   adev->ddev->primary->debugfs_root,
+   adev, &autodump_debug_fops);
+}
 
 /**
  * amdgpu_debugfs_process_reg_op - Handle MMIO register reads/writes
@@ -1434,6 +1512,8 @@ int amdgpu_debugfs_init(struct amdgpu_device *adev)
 
amdgpu_ras_debugfs_create_all(adev);
 
+   amdgpu_debugfs_autodump_init(adev);
+
return 

[PATCH 2/2] drm/amdgpu: decouple EccErrCnt query and clear operation.

2020-04-26 Thread Guchun Chen
Due to a hardware bug, when the RSMU UMC index is disabled,
clearing EccErrCnt at the first UMC instance will clear the
EccErrCnt registers of all other instances at the same time. This
breaks the correctable error count log in the EccErrCnt register
once it is queried. So decouple the two operations to keep the error
count query working.
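The effect of the quirk, and why decoupling helps, can be shown with a toy
simulation. This is pure illustration, not driver code: the array, helper
names, and instance count are assumptions; only the broadcast-clear behavior
is modeled, per the description above.

```c
#include <string.h>

#define NUM_INST 8		/* assumed instance count for the model */
#define CE_CNT_INIT 0		/* stands in for the counter reset value */

static unsigned int ecc_err_cnt[NUM_INST];

/* Hardware quirk being modeled: with index mode disabled, clearing
 * instance 0's EccErrCnt zeroes every instance's counter. */
static void write_ecc_cnt(int inst, unsigned int val)
{
	if (inst == 0 && val == CE_CNT_INIT)
		memset(ecc_err_cnt, 0, sizeof(ecc_err_cnt));
	else
		ecc_err_cnt[inst] = val;
}

/* Old scheme: query each instance and immediately clear it. Clearing
 * instance 0 wipes the others before they are read. */
static unsigned long query_and_clear(void)
{
	unsigned long total = 0;

	for (int i = 0; i < NUM_INST; i++) {
		total += ecc_err_cnt[i];
		write_ecc_cnt(i, CE_CNT_INIT);
	}
	return total;
}

/* New scheme: query all instances first; clearing happens in a
 * separate pass, so every count is observed. */
static unsigned long query_only(void)
{
	unsigned long total = 0;

	for (int i = 0; i < NUM_INST; i++)
		total += ecc_err_cnt[i];
	return total;
}

/* Seed instance i with i+1 pending errors (test fixture). */
static void seed_counts(void)
{
	for (int i = 0; i < NUM_INST; i++)
		ecc_err_cnt[i] = i + 1;
}
```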

Signed-off-by: Guchun Chen 
---
 drivers/gpu/drm/amd/amdgpu/umc_v6_1.c | 83 +--
 1 file changed, 79 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v6_1.c 
b/drivers/gpu/drm/amd/amdgpu/umc_v6_1.c
index 6d767970b2cf..fa889eeb3a17 100644
--- a/drivers/gpu/drm/amd/amdgpu/umc_v6_1.c
+++ b/drivers/gpu/drm/amd/amdgpu/umc_v6_1.c
@@ -104,6 +104,81 @@ static inline uint32_t get_umc_6_reg_offset(struct 
amdgpu_device *adev,
return adev->umc.channel_offs*ch_inst + UMC_6_INST_DIST*umc_inst;
 }
 
+static void umc_v6_1_clear_error_count_per_channel(struct amdgpu_device *adev,
+   uint32_t umc_reg_offset)
+{
+   uint32_t ecc_err_cnt_addr;
+   uint32_t ecc_err_cnt_sel, ecc_err_cnt_sel_addr;
+
+   if (adev->asic_type == CHIP_ARCTURUS) {
+   /* UMC 6_1_2 registers */
+   ecc_err_cnt_sel_addr =
+   SOC15_REG_OFFSET(UMC, 0,
+   mmUMCCH0_0_EccErrCntSel_ARCT);
+   ecc_err_cnt_addr =
+   SOC15_REG_OFFSET(UMC, 0,
+   mmUMCCH0_0_EccErrCnt_ARCT);
+   } else {
+   /* UMC 6_1_1 registers */
+   ecc_err_cnt_sel_addr =
+   SOC15_REG_OFFSET(UMC, 0,
+   mmUMCCH0_0_EccErrCntSel);
+   ecc_err_cnt_addr =
+   SOC15_REG_OFFSET(UMC, 0,
+   mmUMCCH0_0_EccErrCnt);
+   }
+
+   /* select the lower chip */
+   ecc_err_cnt_sel = RREG32_PCIE((ecc_err_cnt_sel_addr +
+   umc_reg_offset) * 4);
+   ecc_err_cnt_sel = REG_SET_FIELD(ecc_err_cnt_sel,
+   UMCCH0_0_EccErrCntSel,
+   EccErrCntCsSel, 0);
+   WREG32_PCIE((ecc_err_cnt_sel_addr + umc_reg_offset) * 4,
+   ecc_err_cnt_sel);
+
+   /* clear lower chip error count */
+   WREG32_PCIE((ecc_err_cnt_addr + umc_reg_offset) * 4,
+   UMC_V6_1_CE_CNT_INIT);
+
+   /* select the higher chip */
+   ecc_err_cnt_sel = RREG32_PCIE((ecc_err_cnt_sel_addr +
+   umc_reg_offset) * 4);
+   ecc_err_cnt_sel = REG_SET_FIELD(ecc_err_cnt_sel,
+   UMCCH0_0_EccErrCntSel,
+   EccErrCntCsSel, 1);
+   WREG32_PCIE((ecc_err_cnt_sel_addr + umc_reg_offset) * 4,
+   ecc_err_cnt_sel);
+
+   /* clear higher chip error count */
+   WREG32_PCIE((ecc_err_cnt_addr + umc_reg_offset) * 4,
+   UMC_V6_1_CE_CNT_INIT);
+}
+
+static void umc_v6_1_clear_error_count(struct amdgpu_device *adev)
+{
+   uint32_t umc_inst= 0;
+   uint32_t ch_inst = 0;
+   uint32_t umc_reg_offset  = 0;
+   uint32_t rsmu_umc_index_state =
+   umc_v6_1_get_umc_index_mode_state(adev);
+
+   if (rsmu_umc_index_state)
+   umc_v6_1_disable_umc_index_mode(adev);
+
+   LOOP_UMC_INST_AND_CH(umc_inst, ch_inst) {
+   umc_reg_offset = get_umc_6_reg_offset(adev,
+   umc_inst,
+   ch_inst);
+
+   umc_v6_1_clear_error_count_per_channel(adev,
+   umc_reg_offset);
+   }
+
+   if (rsmu_umc_index_state)
+   umc_v6_1_enable_umc_index_mode(adev);
+}
+
 static void umc_v6_1_query_correctable_error_count(struct amdgpu_device *adev,
   uint32_t umc_reg_offset,
   unsigned long *error_count)
@@ -136,23 +211,21 @@ static void umc_v6_1_query_correctable_error_count(struct 
amdgpu_device *adev,
ecc_err_cnt_sel = REG_SET_FIELD(ecc_err_cnt_sel, UMCCH0_0_EccErrCntSel,
EccErrCntCsSel, 0);
WREG32_PCIE((ecc_err_cnt_sel_addr + umc_reg_offset) * 4, 
ecc_err_cnt_sel);
+
ecc_err_cnt = RREG32_PCIE((ecc_err_cnt_addr + umc_reg_offset) * 4);
*error_count +=
(REG_GET_FIELD(ecc_err_cnt, UMCCH0_0_EccErrCnt, EccErrCnt) -
 UMC_V6_1_CE_CNT_INIT);
-   /* clear the lower chip err count */
-   WREG32_PCIE((ecc_err_cnt_addr + umc_reg_offset) * 4, 
UMC_V6_1_CE_CNT_INIT);
 
/* select the higher chip and check the err counter */
ecc_err_cnt_sel = REG_SET_FIELD(ecc_err_cnt_sel, 

[PATCH 0/2] add correctable error query support on arcturus

2020-04-26 Thread Guchun Chen
The two patches below are submitted to ensure the UMC correctable error
query works correctly on Arcturus.
Patch 1 switches RSMU UMC index access to the SMN interface to make it
stable, and to be consistent with other register accesses in this file.
Patch 2 decouples the EccErrCnt error count query from the clear
operation, due to an unknown hardware cause.

Both are verified on arcturus and Vega20.

Guchun Chen (2):
  drm/amdgpu: switch to SMN interface to operate RSMU index mode
  drm/amdgpu: decouple EccErrCnt query and clear operation.

 drivers/gpu/drm/amd/amdgpu/umc_v6_1.c | 112 +++---
 1 file changed, 103 insertions(+), 9 deletions(-)

-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 1/2] drm/amdgpu: switch to SMN interface to operate RSMU index mode

2020-04-26 Thread Guchun Chen
This makes register access consistent in this module.

Signed-off-by: Guchun Chen 
---
 drivers/gpu/drm/amd/amdgpu/umc_v6_1.c | 29 ++-
 1 file changed, 24 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v6_1.c 
b/drivers/gpu/drm/amd/amdgpu/umc_v6_1.c
index 616eac76eaa7..6d767970b2cf 100644
--- a/drivers/gpu/drm/amd/amdgpu/umc_v6_1.c
+++ b/drivers/gpu/drm/amd/amdgpu/umc_v6_1.c
@@ -56,24 +56,43 @@ const uint32_t
 
 static void umc_v6_1_enable_umc_index_mode(struct amdgpu_device *adev)
 {
-   WREG32_FIELD15(RSMU, 0, RSMU_UMC_INDEX_REGISTER_NBIF_VG20_GPU,
+   uint32_t rsmu_umc_addr, rsmu_umc_val;
+
+   rsmu_umc_addr = SOC15_REG_OFFSET(RSMU, 0,
+   mmRSMU_UMC_INDEX_REGISTER_NBIF_VG20_GPU);
+   rsmu_umc_val = RREG32_PCIE(rsmu_umc_addr * 4);
+
+   rsmu_umc_val = REG_SET_FIELD(rsmu_umc_val,
+   RSMU_UMC_INDEX_REGISTER_NBIF_VG20_GPU,
RSMU_UMC_INDEX_MODE_EN, 1);
+
+   WREG32_PCIE(rsmu_umc_addr * 4, rsmu_umc_val);
 }
 
 static void umc_v6_1_disable_umc_index_mode(struct amdgpu_device *adev)
 {
-   WREG32_FIELD15(RSMU, 0, RSMU_UMC_INDEX_REGISTER_NBIF_VG20_GPU,
+   uint32_t rsmu_umc_addr, rsmu_umc_val;
+
+   rsmu_umc_addr = SOC15_REG_OFFSET(RSMU, 0,
+   mmRSMU_UMC_INDEX_REGISTER_NBIF_VG20_GPU);
+   rsmu_umc_val = RREG32_PCIE(rsmu_umc_addr * 4);
+
+   rsmu_umc_val = REG_SET_FIELD(rsmu_umc_val,
+   RSMU_UMC_INDEX_REGISTER_NBIF_VG20_GPU,
RSMU_UMC_INDEX_MODE_EN, 0);
+
+   WREG32_PCIE(rsmu_umc_addr * 4, rsmu_umc_val);
 }
 
 static uint32_t umc_v6_1_get_umc_index_mode_state(struct amdgpu_device *adev)
 {
-   uint32_t rsmu_umc_index;
+   uint32_t rsmu_umc_addr, rsmu_umc_val;
 
-   rsmu_umc_index = RREG32_SOC15(RSMU, 0,
+   rsmu_umc_addr = SOC15_REG_OFFSET(RSMU, 0,
mmRSMU_UMC_INDEX_REGISTER_NBIF_VG20_GPU);
+   rsmu_umc_val = RREG32_PCIE(rsmu_umc_addr * 4);
 
-   return REG_GET_FIELD(rsmu_umc_index,
+   return REG_GET_FIELD(rsmu_umc_val,
RSMU_UMC_INDEX_REGISTER_NBIF_VG20_GPU,
RSMU_UMC_INDEX_MODE_EN);
 }
-- 
2.17.1



Re: drm/amdgpu: apply AMDGPU_IB_FLAG_EMIT_MEM_SYNC to compute IBs too

2020-04-26 Thread Christian König
Thanks for that explanation. I suspected that there was a good reason to 
have that in the kernel, but couldn't find one.


In this case the patch is Reviewed-by: Christian König 



We should probably add this explanation as comment to the flag as well.

Thanks,
Christian.

Am 26.04.20 um 02:43 schrieb Marek Olšák:

It was merged into amd-staging-drm-next.

I'm not absolutely sure, but I think we need to invalidate before IBs 
if an IB is cached in L2 and the CPU has updated it. It can only be 
cached in L2 if something other than CP has read it or written to it 
without invalidation. CP reads don't cache it but they can hit the 
cache if it's already cached.


For CE, we need to invalidate before the IB in the kernel, because CE 
IBs can't do cache invalidations IIRC. This is the number one reason 
for merging the already pushed commits.


Marek

On Sat., Apr. 25, 2020, 11:03 Christian König, 
> wrote:


Was that patch set actually merged upstream? My last status is
that we couldn't find a reason why we need to do this in the kernel.

Christian.

Am 25.04.20 um 10:52 schrieb Marek Olšák:

This was missed.

Marek








Re: drm/amdgpu: invalidate L2 before SDMA IBs (on gfx10)

2020-04-26 Thread Christian König
Thought so, we should try to get it committed ASAP. And maybe
add a CC: stable tag as well.


Patch is Reviewed-by: Christian König .

Thanks,
Christian.

Am 26.04.20 um 02:28 schrieb Marek Olšák:
Not without a mandatory firmware update. The gcr packet support for 
IBs was added into the sdma firmware just two weeks ago.


Marek

On Sat., Apr. 25, 2020, 11:04 Christian König, 
> wrote:


Could we do this in userspace as well?

Am 25.04.20 um 10:52 schrieb Marek Olšák:

This should fix SDMA hangs on gfx10.

Marek





