date:20231009

[PATCH] drm/amd/display: add missing NULL check for DML2

2023-10-09 Thread Bob Zhou

Recently, the driver introduce DML2 for future ASIC support.
But, some ASIC's hubbub pointer is null before calling.
It cause the below null pointer issue, so add null check to fix it.

BUG: kernel NULL pointer dereference, address: 
RIP: 0010:dc_create_resource_pool+0xc1/0x2c0 [amdgpu] Call Trace:
 
 ? show_regs.cold+0x1a/0x1f
 ? __die_body+0x20/0x70
 ? __die+0x2b/0x37
 ? page_fault_oops+0x136/0x2c0
 ? do_user_addr_fault+0x303/0x660
 ? exc_page_fault+0x77/0x170
 ? asm_exc_page_fault+0x27/0x30
 ? dc_create_resource_pool+0xc1/0x2c0 [amdgpu]  ? 
dc_create_resource_pool+0x243/0x2c0 [amdgpu]
 dc_create+0x23f/0x6b0 [amdgpu]
 ? dmi_matches+0xa3/0x200
 amdgpu_dm_init+0x2bd/0x22a0 [amdgpu]

Fixes: a2815ada8616 ("drm/amd/display: Introduce DML2")

Signed-off-by: Bob Zhou 
---
 drivers/gpu/drm/amd/display/dc/core/dc_resource.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
index 25562b262555..d20e01226353 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
@@ -321,7 +321,7 @@ struct resource_pool *dc_create_resource_pool(struct dc  
*dc,
res_pool->ref_clocks.xtalin_clock_inKhz;
res_pool->ref_clocks.dchub_ref_clock_inKhz =
res_pool->ref_clocks.xtalin_clock_inKhz;
-   if ((res_pool->hubbub->funcs->get_dchub_ref_freq))
+   if (res_pool->hubbub && 
res_pool->hubbub->funcs->get_dchub_ref_freq)

res_pool->hubbub->funcs->get_dchub_ref_freq(res_pool->hubbub,

res_pool->ref_clocks.dccg_ref_clock_inKhz,

_pool->ref_clocks.dchub_ref_clock_inKhz);
-- 
2.34.1

RE: [PATCH] drm/amd/pm: wait for completion of the EnableGfxImu command

2023-10-09 Thread Zhang, Yifan

[AMD Official Use Only - General]

I'm wondering why it is "without waiting" in the first place ? It doesn't make 
sense to continue driver loading if power up GFX fails. Can we apply the change 
regardless of load types ?

-Original Message-
From: Huang, Tim 
Sent: Tuesday, October 10, 2023 12:47 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Zhang, Yifan 
; Huang, Tim 
Subject: [PATCH] drm/amd/pm: wait for completion of the EnableGfxImu command

Wait for completion of sending the EnableGfxImu message when using the PSP FW 
loading.

Signed-off-by: Tim Huang 
---
 drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
index 8dc683c02a7d..bcb7ab9d2221 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
@@ -82,6 +82,8 @@ MODULE_FIRMWARE("amdgpu/smu_13_0_10.bin");
 #define PCIE_LC_SPEED_CNTL__LC_CURRENT_DATA_RATE_MASK 0xC000  #define 
PCIE_LC_SPEED_CNTL__LC_CURRENT_DATA_RATE__SHIFT 0xE

+#define ENABLE_IMU_ARG_GFXOFF_ENABLE   1
+
 static const int link_width[] = {0, 1, 2, 4, 8, 12, 16};

 const int pmfw_decoded_link_speed[5] = {1, 2, 3, 4, 5}; @@ -2301,11 +2303,17 
@@ int smu_v13_0_baco_exit(struct smu_context *smu)  int 
smu_v13_0_set_gfx_power_up_by_imu(struct smu_context *smu)  {
uint16_t index;
+   struct amdgpu_device *adev = smu->adev;
+
+   if (adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) {
+   return smu_cmn_send_smc_msg_with_param(smu, 
SMU_MSG_EnableGfxImu,
+  
ENABLE_IMU_ARG_GFXOFF_ENABLE, NULL);
+   }

index = smu_cmn_to_asic_specific_index(smu, CMN2ASIC_MAPPING_MSG,
   SMU_MSG_EnableGfxImu);
-   /* Param 1 to tell PMFW to enable GFXOFF feature */
-   return smu_cmn_send_msg_without_waiting(smu, index, 1);
+   return smu_cmn_send_msg_without_waiting(smu, index,
+   ENABLE_IMU_ARG_GFXOFF_ENABLE);
 }

 int smu_v13_0_od_edit_dpm_table(struct smu_context *smu,
--
2.39.2

Re: [PATCH v3 3/3] drm/amd/pm: Use gpu_metrics_v1_4 for SMUv13.0.6

2023-10-09 Thread Lazar, Lijo





On 10/9/2023 8:23 PM, Asad Kamal wrote:

Use gpu_metrics_v1_4 for SMUv13.0.6 to fill
gpu metric info

v3: Removed filling gpu metric instantaneous
pcie bw

Signed-off-by: Asad Kamal 


A special note inline.

Series is-
Reviewed-by: Lijo Lazar 


---
  .../drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c  | 65 ---
  1 file changed, 41 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
index ce971a93d28b..7ab73112e4f3 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
@@ -279,7 +279,7 @@ static int smu_v13_0_6_tables_init(struct smu_context *smu)
return -ENOMEM;
smu_table->metrics_time = 0;
  
-	smu_table->gpu_metrics_table_size = sizeof(struct gpu_metrics_v1_3);

+   smu_table->gpu_metrics_table_size = sizeof(struct gpu_metrics_v1_4);
smu_table->gpu_metrics_table =
kzalloc(smu_table->gpu_metrics_table_size, GFP_KERNEL);
if (!smu_table->gpu_metrics_table) {
@@ -1969,22 +1969,19 @@ static int 
smu_v13_0_6_get_current_pcie_link_speed(struct smu_context *smu)
  static ssize_t smu_v13_0_6_get_gpu_metrics(struct smu_context *smu, void 
**table)
  {
struct smu_table_context *smu_table = >smu_table;
-   struct gpu_metrics_v1_3 *gpu_metrics =
-   (struct gpu_metrics_v1_3 *)smu_table->gpu_metrics_table;
+   struct gpu_metrics_v1_4 *gpu_metrics =
+   (struct gpu_metrics_v1_4 *)smu_table->gpu_metrics_table;
struct amdgpu_device *adev = smu->adev;
-   int ret = 0, inst0, xcc0;
+   int ret = 0, xcc_id, inst, i;
MetricsTable_t *metrics;
u16 link_width_level;
  
-	inst0 = adev->sdma.instance[0].aid_id;

-   xcc0 = GET_INST(GC, 0);
-
metrics = kzalloc(sizeof(MetricsTable_t), GFP_KERNEL);


Please take care to include "(drm/amd/pm: Fix a memory leak on an error 
path)" while pushing these changes.


Thanks,
Lijo


ret = smu_v13_0_6_get_metrics_table(smu, metrics, true);
if (ret)
return ret;
  
-	smu_cmn_init_soft_gpu_metrics(gpu_metrics, 1, 3);

+   smu_cmn_init_soft_gpu_metrics(gpu_metrics, 1, 4);
  
  	gpu_metrics->temperature_hotspot =

SMUQ10_ROUND(metrics->MaxSocketTemperature);
@@ -2000,30 +1997,38 @@ static ssize_t smu_v13_0_6_get_gpu_metrics(struct 
smu_context *smu, void **table
gpu_metrics->average_umc_activity =
SMUQ10_ROUND(metrics->DramBandwidthUtilization);
  
-	gpu_metrics->average_socket_power =

+   gpu_metrics->curr_socket_power =
SMUQ10_ROUND(metrics->SocketPower);
/* Energy counter reported in 15.259uJ (2^-16) units */
gpu_metrics->energy_accumulator = metrics->SocketEnergyAcc;
  
-	gpu_metrics->current_gfxclk =

-   SMUQ10_ROUND(metrics->GfxclkFrequency[xcc0]);
-   gpu_metrics->current_socclk =
-   SMUQ10_ROUND(metrics->SocclkFrequency[inst0]);
-   gpu_metrics->current_uclk = SMUQ10_ROUND(metrics->UclkFrequency);
-   gpu_metrics->current_vclk0 =
-   SMUQ10_ROUND(metrics->VclkFrequency[inst0]);
-   gpu_metrics->current_dclk0 =
-   SMUQ10_ROUND(metrics->DclkFrequency[inst0]);
+   for (i = 0; i < MAX_GFX_CLKS; i++) {
+   xcc_id = GET_INST(GC, i);
+   if (xcc_id >= 0)
+   gpu_metrics->current_gfxclk[i] =
+   SMUQ10_ROUND(metrics->GfxclkFrequency[xcc_id]);
+
+   if (i < MAX_CLKS) {
+   gpu_metrics->current_socclk[i] =
+   SMUQ10_ROUND(metrics->SocclkFrequency[i]);
+   inst = GET_INST(VCN, i);
+   if (inst >= 0) {
+   gpu_metrics->current_vclk0[i] =
+   
SMUQ10_ROUND(metrics->VclkFrequency[inst]);
+   gpu_metrics->current_dclk0[i] =
+   
SMUQ10_ROUND(metrics->DclkFrequency[inst]);
+   }
+   }
+   }
  
-	gpu_metrics->average_gfxclk_frequency = gpu_metrics->current_gfxclk;

-   gpu_metrics->average_socclk_frequency = gpu_metrics->current_socclk;
-   gpu_metrics->average_uclk_frequency = gpu_metrics->current_uclk;
-   gpu_metrics->average_vclk0_frequency = gpu_metrics->current_vclk0;
-   gpu_metrics->average_dclk0_frequency = gpu_metrics->current_dclk0;
+   gpu_metrics->current_uclk = SMUQ10_ROUND(metrics->UclkFrequency);
  
  	/* Throttle status is not reported through metrics now */

gpu_metrics->throttle_status = 0;
  
+	/* Clock Lock Status. Each bit corresponds to each GFXCLK instance */

+   gpu_metrics->gfxclk_lock_status = metrics->GfxLockXCDMak >> 
GET_INST(GC, 0);
+
if (!(adev->flags & AMD_IS_APU)) {

[PATCH] drm/amd/pm: wait for completion of the EnableGfxImu command

2023-10-09 Thread Tim Huang

Wait for completion of sending the EnableGfxImu message
when using the PSP FW loading.

Signed-off-by: Tim Huang 
---
 drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
index 8dc683c02a7d..bcb7ab9d2221 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
@@ -82,6 +82,8 @@ MODULE_FIRMWARE("amdgpu/smu_13_0_10.bin");
 #define PCIE_LC_SPEED_CNTL__LC_CURRENT_DATA_RATE_MASK 0xC000
 #define PCIE_LC_SPEED_CNTL__LC_CURRENT_DATA_RATE__SHIFT 0xE
 
+#define ENABLE_IMU_ARG_GFXOFF_ENABLE   1
+
 static const int link_width[] = {0, 1, 2, 4, 8, 12, 16};
 
 const int pmfw_decoded_link_speed[5] = {1, 2, 3, 4, 5};
@@ -2301,11 +2303,17 @@ int smu_v13_0_baco_exit(struct smu_context *smu)
 int smu_v13_0_set_gfx_power_up_by_imu(struct smu_context *smu)
 {
uint16_t index;
+   struct amdgpu_device *adev = smu->adev;
+
+   if (adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) {
+   return smu_cmn_send_smc_msg_with_param(smu, 
SMU_MSG_EnableGfxImu,
+  
ENABLE_IMU_ARG_GFXOFF_ENABLE, NULL);
+   }
 
index = smu_cmn_to_asic_specific_index(smu, CMN2ASIC_MAPPING_MSG,
   SMU_MSG_EnableGfxImu);
-   /* Param 1 to tell PMFW to enable GFXOFF feature */
-   return smu_cmn_send_msg_without_waiting(smu, index, 1);
+   return smu_cmn_send_msg_without_waiting(smu, index,
+   ENABLE_IMU_ARG_GFXOFF_ENABLE);
 }
 
 int smu_v13_0_od_edit_dpm_table(struct smu_context *smu,
-- 
2.39.2

Re: [PATCH] drm/amdgpu: Return -EINVAL when MMSCH init status incorrect

2023-10-09 Thread Chen, JingWen (Wayne)


Reviewed-by: Jingwen Chen 
--
Best Regards,
JingWen Chen

On 2023/10/8 18:06, Lin.Cao wrote:

Return -EINVAL when MMSCH init fail which can be handle by function
amdgpu_device_reset_sriov correctly.

Signed-off-by: Lin.Cao 
---
  drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c 
b/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c
index ac614b869aaf..a3768aefb6b6 100644
--- a/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c
@@ -518,8 +518,11 @@ static int jpeg_v4_0_start_sriov(struct amdgpu_device 
*adev)
return -EBUSY;
}
}
-   if (resp != expected && resp != MMSCH_VF_MAILBOX_RESP__INCOMPLETE && 
init_status != MMSCH_VF_ENGINE_STATUS__PASS)
+   if (resp != expected && resp != MMSCH_VF_MAILBOX_RESP__INCOMPLETE
+   && init_status != MMSCH_VF_ENGINE_STATUS__PASS) {
DRM_ERROR("MMSCH init status is incorrect! readback=0x%08x, header 
init status for jpeg: %x\n", resp, init_status);
+   return -EINVAL;
+   }
  
  	return 0;

RE: [PATCH] drm/amd/pm: Fix a memory leak on an error path

2023-10-09 Thread Wang, Yang(Kevin)

Fixes: 511a95552ec8 ("drm/amd/pm: Add SMU 13.0.6 support")

Please add above information into your patch commit message.

Reviewed-by: Yang Wang 

Best Regards,
Kevin

-Original Message-
From: Kunwu.Chan  
Sent: Tuesday, October 10, 2023 10:19 AM
To: evan.q...@amd.com; Deucher, Alexander ; Koenig, 
Christian ; Pan, Xinhui ; 
airl...@gmail.com; dan...@ffwll.ch; Lazar, Lijo ; Kamal, 
Asad ; Zhang, Hawking ; Wang, 
Yang(Kevin) ; Ma, Le ; 
dan.carpen...@linaro.org
Cc: amd-gfx@lists.freedesktop.org; dri-de...@lists.freedesktop.org; 
linux-ker...@vger.kernel.org; kunwu.c...@hotmail.com; Kunwu.Chan 

Subject: [PATCH] drm/amd/pm: Fix a memory leak on an error path

Add missing free on an error path.

Signed-off-by: Kunwu.Chan 
---
 drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
index ce971a93d28b..c26e12ff532c 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
@@ -1981,8 +1981,10 @@ static ssize_t smu_v13_0_6_get_gpu_metrics(struct 
smu_context *smu, void **table
 
metrics = kzalloc(sizeof(MetricsTable_t), GFP_KERNEL);
ret = smu_v13_0_6_get_metrics_table(smu, metrics, true);
-   if (ret)
+   if (ret) {
+   kfree(metrics);
return ret;
+   }
 
smu_cmn_init_soft_gpu_metrics(gpu_metrics, 1, 3);
 
-- 
2.25.1

RE: [PATCH] drm/amdgpu:Check gfx poweron when skip flush_gpu_tlb

2023-10-09 Thread Xu, Feifei

[AMD Official Use Only - General]

Yes, adev->gfx.is_poweron check will be applied in gmc_v11 callback, which will 
be called after the generic gmc part: amdgpu_gmc_flush_gpu_tlb() function.
But in commit: d0c860f33553 ("drm/amdgpu: rework lock handling for flush_tlb 
v2"), the flush is moved at a higher level amdgpu_gmc_flush_gpu_tlb() function.

Thus the gmc_v11 callback will never be called in the resume because 
adev->reset_domain->sem not released and returned ahead. Adding a check of 
adev->gfx.is_poweron will let GFX11 not breaking ahead, like following:

if (!down_read_trylock(>reset_domain->sem) && //--> true in gfx11
+!adev->gfx.is_poweron) //-->false in gfx11, and the whole if statement 
will be false, not return ahead. The following gmc v11 callback will be 
executed later.

Thanks,
Feifei

-Original Message-
From: Zhang, Hawking 
Sent: Monday, October 9, 2023 4:58 PM
To: Xu, Feifei ; Wang, Yang(Kevin) ; 
amd-gfx@lists.freedesktop.org
Cc: Koenig, Christian 
Subject: RE: [PATCH] drm/amdgpu:Check gfx poweron when skip flush_gpu_tlb

[AMD Official Use Only - General]

adev->gfx.is_poweron check should already be applied in IP specific (gmc v11) 
callback. If gfx is not power on, it does nothing but just returns. I didn't 
see how it helps resolve the issue if we just move the check from one function 
to another.

Regards,
Hawking

-Original Message-
From: Xu, Feifei 
Sent: Monday, October 9, 2023 09:51
To: Wang, Yang(Kevin) ; amd-gfx@lists.freedesktop.org
Cc: Koenig, Christian ; Zhang, Hawking 

Subject: RE: [PATCH] drm/amdgpu:Check gfx poweron when skip flush_gpu_tlb

[AMD Official Use Only - General]

Hi,

>> Based on your description, the above code should use "||" instead of
>> "&&",
&& is to add more restriction here.  To avoid skipping necessary TLB flush by 
return.
For Asics < GFX11, !adev->gfx.is_poweron is always true (this paremeter is 
intrudoced from GFX11), only depends on reset_domain->sem; For Asics = GFX11, 
!adev->gfx.is_poweron might be false (which gfx might already poweron in the 
reset), this will make the if () not ture, return will not be executed, thus 
flush TLB.

>> And after merging code into one line may result in the lock not being 
>> released if the lock can be acquired success.
If !adev->gfx.is_poweron is true, the reset_domin->sem will not be 
down_read_trylock, thus could avoid this deadlock.

Thanks,
Feifei

-Original Message-
From: Wang, Yang(Kevin) 
Sent: Sunday, October 8, 2023 9:36 PM
To: Xu, Feifei ; amd-gfx@lists.freedesktop.org
Cc: Xu, Feifei ; Xu, Feifei ; Koenig, 
Christian ; Zhang, Hawking 
Subject: RE: [PATCH] drm/amdgpu:Check gfx poweron when skip flush_gpu_tlb


-Original Message-
From: amd-gfx  On Behalf Of Feifei Xu
Sent: Sunday, October 8, 2023 6:07 PM
To: amd-gfx@lists.freedesktop.org
Cc: Xu, Feifei ; Xu, Feifei ; Koenig, 
Christian ; Zhang, Hawking 
Subject: [PATCH] drm/amdgpu:Check gfx poweron when skip flush_gpu_tlb

To fix the gpu recovery failed on GFX11 with gfxhub pagefault, flush gpu tlb 
after reset on GFX11.
Gfxhub tlb flush need check if adev->gfx.is_poweron set.

Fixes: d0c860f33553 ("drm/amdgpu: rework lock handling for flush_tlb v2")

Signed-off-by: Feifei Xu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
index 2f9bb86edd71..048d32edee88 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -611,8 +611,9 @@ void amdgpu_gmc_flush_gpu_tlb(struct amdgpu_device *adev, 
uint32_t vmid,
/*
 * A GPU reset should flush all TLBs anyway, so no need to do
 * this while one is ongoing.
+* For GFX11, gfxhub flush check if adev->gfx.is_poweron is set.
 */
-   if (!down_read_trylock(>reset_domain->sem))
+   if (!down_read_trylock(>reset_domain->sem) &&
+!adev->gfx.is_poweron)
return;

[Kevin]:
Based on your description, the above code should use "||" instead of "&&", And 
after merging code into one line may result in the lock not being released if 
the lock can be acquired success.

Best Regards,
Kevin

if (adev->gmc.flush_tlb_needs_extra_type_2)
--
2.34.1

Re: [PATCH 2/3] drm/amdgpu/umsch: power on/off UMSCH by DLDO

2023-10-09 Thread Lang Yu

On 10/10/ , Deucher, Alexander wrote:
> [Public]
> 
> > -Original Message-
> > From: Yu, Lang 
> > Sent: Saturday, October 7, 2023 4:54 AM
> > To: amd-gfx@lists.freedesktop.org
> > Cc: Deucher, Alexander ; Zhang, Yifan
> > ; Gopalakrishnan, Veerabadhran (Veera)
> > ; Yu, Lang 
> > Subject: [PATCH 2/3] drm/amdgpu/umsch: power on/off UMSCH by DLDO
> >
> > VCN 4.0.5 uses DLDO.
> >
> > Signed-off-by: Lang Yu 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/umsch_mm_v4_0.c | 26
> > ++
> >  1 file changed, 26 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/umsch_mm_v4_0.c
> > b/drivers/gpu/drm/amd/amdgpu/umsch_mm_v4_0.c
> > index a60178156c77..7e79954c833b 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/umsch_mm_v4_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/umsch_mm_v4_0.c
> > @@ -34,6 +34,16 @@
> >  #include "umsch_mm_4_0_api_def.h"
> >  #include "umsch_mm_v4_0.h"
> >
> > +#define regUVD_IPX_DLDO_CONFIG 0x0064
> > +#define regUVD_IPX_DLDO_CONFIG_BASE_IDX1
> > +#define regUVD_IPX_DLDO_STATUS 0x0065
> > +#define regUVD_IPX_DLDO_STATUS_BASE_IDX1
> > +
> > +#define UVD_IPX_DLDO_CONFIG__ONO0_PWR_CONFIG__SHIFT
> > 0x0002
> > +#define UVD_IPX_DLDO_CONFIG__ONO0_PWR_CONFIG_MASK
> > 0x000cUL
> > +#define UVD_IPX_DLDO_STATUS__ONO0_PWR_STATUS__SHIFT
> > 0x0001
> > +#define UVD_IPX_DLDO_STATUS__ONO0_PWR_STATUS_MASK
> > 0x0002UL
> > +
> >  static int umsch_mm_v4_0_load_microcode(struct amdgpu_umsch_mm
> > *umsch)  {
> >   struct amdgpu_device *adev = umsch->ring.adev; @@ -50,6 +60,14
> > @@ static int umsch_mm_v4_0_load_microcode(struct amdgpu_umsch_mm
> > *umsch)
> >
> >   umsch->cmd_buf_curr_ptr = umsch->cmd_buf_ptr;
> >
> > + if (adev->ip_versions[VCN_HWIP][0] == IP_VERSION(4, 0, 5)) {
> 
> This switched to a function call.  Amdgpu_ip_version().
> 
> > + WREG32_SOC15(VCN, 0, regUVD_IPX_DLDO_CONFIG,
> > + 1 <<
> > UVD_IPX_DLDO_CONFIG__ONO0_PWR_CONFIG__SHIFT);
> > + SOC15_WAIT_ON_RREG(VCN, 0, regUVD_IPX_DLDO_STATUS,
> > + 0 <<
> > UVD_IPX_DLDO_STATUS__ONO0_PWR_STATUS__SHIFT,
> > +
> >   UVD_IPX_DLDO_STATUS__ONO0_PWR_STATUS_MASK);
> > + }
> > +
> >   data = RREG32_SOC15(VCN, 0, regUMSCH_MES_RESET_CTRL);
> >   data = REG_SET_FIELD(data, UMSCH_MES_RESET_CTRL,
> > MES_CORE_SOFT_RESET, 0);
> >   WREG32_SOC15_UMSCH(regUMSCH_MES_RESET_CTRL, data); @@ -
> > 229,6 +247,14 @@ static int umsch_mm_v4_0_ring_stop(struct
> > amdgpu_umsch_mm *umsch)
> >   data = REG_SET_FIELD(data, VCN_UMSCH_RB_DB_CTRL, EN, 0);
> >   WREG32_SOC15(VCN, 0, regVCN_UMSCH_RB_DB_CTRL, data);
> >
> > + if (adev->ip_versions[VCN_HWIP][0] == IP_VERSION(4, 0, 5)) {
> 
> Same here.

Thanks for pointing out them. Will fix this.

Regards,
Lang

> Alex
> 
> > + WREG32_SOC15(VCN, 0, regUVD_IPX_DLDO_CONFIG,
> > + 2 <<
> > UVD_IPX_DLDO_CONFIG__ONO0_PWR_CONFIG__SHIFT);
> > + SOC15_WAIT_ON_RREG(VCN, 0, regUVD_IPX_DLDO_STATUS,
> > + 1 <<
> > UVD_IPX_DLDO_STATUS__ONO0_PWR_STATUS__SHIFT,
> > +
> >   UVD_IPX_DLDO_STATUS__ONO0_PWR_STATUS_MASK);
> > + }
> > +
> >   return 0;
> >  }
> >
> > --
> > 2.25.1
>

Re: [PATCH 2/3] drm/amdgpu/umsch: power on/off UMSCH by DLDO

2023-10-09 Thread Lang Yu

On 10/10/ , Deucher, Alexander wrote:
> [Public]
> 
> > -Original Message-
> > From: Yu, Lang 
> > Sent: Saturday, October 7, 2023 4:54 AM
> > To: amd-gfx@lists.freedesktop.org
> > Cc: Deucher, Alexander ; Zhang, Yifan
> > ; Gopalakrishnan, Veerabadhran (Veera)
> > ; Yu, Lang 
> > Subject: [PATCH 2/3] drm/amdgpu/umsch: power on/off UMSCH by DLDO
> >
> > VCN 4.0.5 uses DLDO.
> >
> > Signed-off-by: Lang Yu 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/umsch_mm_v4_0.c | 26
> > ++
> >  1 file changed, 26 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/umsch_mm_v4_0.c
> > b/drivers/gpu/drm/amd/amdgpu/umsch_mm_v4_0.c
> > index a60178156c77..7e79954c833b 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/umsch_mm_v4_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/umsch_mm_v4_0.c
> > @@ -34,6 +34,16 @@
> >  #include "umsch_mm_4_0_api_def.h"
> >  #include "umsch_mm_v4_0.h"
> >
> > +#define regUVD_IPX_DLDO_CONFIG 0x0064
> > +#define regUVD_IPX_DLDO_CONFIG_BASE_IDX1
> > +#define regUVD_IPX_DLDO_STATUS 0x0065
> > +#define regUVD_IPX_DLDO_STATUS_BASE_IDX1
> > +
> > +#define UVD_IPX_DLDO_CONFIG__ONO0_PWR_CONFIG__SHIFT
> > 0x0002
> > +#define UVD_IPX_DLDO_CONFIG__ONO0_PWR_CONFIG_MASK
> > 0x000cUL
> > +#define UVD_IPX_DLDO_STATUS__ONO0_PWR_STATUS__SHIFT
> > 0x0001
> > +#define UVD_IPX_DLDO_STATUS__ONO0_PWR_STATUS_MASK
> > 0x0002UL
> > +
> >  static int umsch_mm_v4_0_load_microcode(struct amdgpu_umsch_mm
> > *umsch)  {
> >   struct amdgpu_device *adev = umsch->ring.adev; @@ -50,6 +60,14
> > @@ static int umsch_mm_v4_0_load_microcode(struct amdgpu_umsch_mm
> > *umsch)
> >
> >   umsch->cmd_buf_curr_ptr = umsch->cmd_buf_ptr;
> >
> > + if (adev->ip_versions[VCN_HWIP][0] == IP_VERSION(4, 0, 5)) {
> > + WREG32_SOC15(VCN, 0, regUVD_IPX_DLDO_CONFIG,
> > + 1 <<
> > UVD_IPX_DLDO_CONFIG__ONO0_PWR_CONFIG__SHIFT);
> > + SOC15_WAIT_ON_RREG(VCN, 0, regUVD_IPX_DLDO_STATUS,
> > + 0 <<
> > UVD_IPX_DLDO_STATUS__ONO0_PWR_STATUS__SHIFT,
> > +
> >   UVD_IPX_DLDO_STATUS__ONO0_PWR_STATUS_MASK);
> > + }
> > +
> 
> Is this the right place for this?  umsch_mm_hw_init() only calls this for 
> FW_LOAD_DIRECT.  Maybe that check needs to be dropped?

That check is dropped in [PATCH 1/3] drm/amdgpu/umsch: fix psp frontdoor 
loading.

PMFW removed DLDO programing in PPSMC_MSG_PowerUpUmsch function.
So driver needs to program it explicitly.

Regards,
Lang

> Alex
> 
> >   data = RREG32_SOC15(VCN, 0, regUMSCH_MES_RESET_CTRL);
> >   data = REG_SET_FIELD(data, UMSCH_MES_RESET_CTRL,
> > MES_CORE_SOFT_RESET, 0);
> >   WREG32_SOC15_UMSCH(regUMSCH_MES_RESET_CTRL, data); @@ -
> > 229,6 +247,14 @@ static int umsch_mm_v4_0_ring_stop(struct
> > amdgpu_umsch_mm *umsch)
> >   data = REG_SET_FIELD(data, VCN_UMSCH_RB_DB_CTRL, EN, 0);
> >   WREG32_SOC15(VCN, 0, regVCN_UMSCH_RB_DB_CTRL, data);
> >
> > + if (adev->ip_versions[VCN_HWIP][0] == IP_VERSION(4, 0, 5)) {
> > + WREG32_SOC15(VCN, 0, regUVD_IPX_DLDO_CONFIG,
> > + 2 <<
> > UVD_IPX_DLDO_CONFIG__ONO0_PWR_CONFIG__SHIFT);
> > + SOC15_WAIT_ON_RREG(VCN, 0, regUVD_IPX_DLDO_STATUS,
> > + 1 <<
> > UVD_IPX_DLDO_STATUS__ONO0_PWR_STATUS__SHIFT,
> > +
> >   UVD_IPX_DLDO_STATUS__ONO0_PWR_STATUS_MASK);
> > + }
> > +
> >   return 0;
> >  }
> >
> > --
> > 2.25.1
>

Re: [PATCH] drm/amd/display: Enable fast plane updates on DCN3.2 and above when state->allow_modeset = true

2023-10-09 Thread Mario Limonciello


On 10/7/2023 00:41, Tianci Yin wrote:

From: tiancyin 

[why]
When cursor moves across screen boarder, lag cursor observed,
since subvp settings need to sync up with vblank, that cause
cursor updates being delayed.

[how]
Enable fast plane updates on DCN3.2 to fix it.

Signed-off-by: tiancyin 
---
  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index c21726bdbca2..25a0bd314fe5 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -9879,6 +9879,7 @@ static bool should_reset_plane(struct drm_atomic_state 
*state,
struct drm_plane *other;
struct drm_plane_state *old_other_state, *new_other_state;
struct drm_crtc_state *new_crtc_state;
+   struct amdgpu_device *adev = drm_to_adev(plane->dev);
int i;
  
  	/*

@@ -9886,7 +9887,7 @@ static bool should_reset_plane(struct drm_atomic_state 
*state,
 * enough to determine when we need to reset all the planes on
 * the stream.
 */
-   if (state->allow_modeset)
+   if (adev->ip_versions[DCE_HWIP][0] < IP_VERSION(3, 2, 0) && 
state->allow_modeset)
return true;
  
  	/* Exit early if we know that we're adding or removing the plane. */


The comment associated with this says that this hack should go when 
there are sufficient checks.


If there are enough checks for DCN3.2, isn't it likely there are enough 
for earlier products too?  None of the rest of the code checks a 
specific IP version.


Maybe the whole commit/block should go?

RE: [PATCH 2/3] drm/amdgpu/umsch: power on/off UMSCH by DLDO

2023-10-09 Thread Deucher, Alexander

[Public]

> -Original Message-
> From: Yu, Lang 
> Sent: Saturday, October 7, 2023 4:54 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Zhang, Yifan
> ; Gopalakrishnan, Veerabadhran (Veera)
> ; Yu, Lang 
> Subject: [PATCH 2/3] drm/amdgpu/umsch: power on/off UMSCH by DLDO
>
> VCN 4.0.5 uses DLDO.
>
> Signed-off-by: Lang Yu 
> ---
>  drivers/gpu/drm/amd/amdgpu/umsch_mm_v4_0.c | 26
> ++
>  1 file changed, 26 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/umsch_mm_v4_0.c
> b/drivers/gpu/drm/amd/amdgpu/umsch_mm_v4_0.c
> index a60178156c77..7e79954c833b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/umsch_mm_v4_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/umsch_mm_v4_0.c
> @@ -34,6 +34,16 @@
>  #include "umsch_mm_4_0_api_def.h"
>  #include "umsch_mm_v4_0.h"
>
> +#define regUVD_IPX_DLDO_CONFIG 0x0064
> +#define regUVD_IPX_DLDO_CONFIG_BASE_IDX1
> +#define regUVD_IPX_DLDO_STATUS 0x0065
> +#define regUVD_IPX_DLDO_STATUS_BASE_IDX1
> +
> +#define UVD_IPX_DLDO_CONFIG__ONO0_PWR_CONFIG__SHIFT
> 0x0002
> +#define UVD_IPX_DLDO_CONFIG__ONO0_PWR_CONFIG_MASK
> 0x000cUL
> +#define UVD_IPX_DLDO_STATUS__ONO0_PWR_STATUS__SHIFT
> 0x0001
> +#define UVD_IPX_DLDO_STATUS__ONO0_PWR_STATUS_MASK
> 0x0002UL
> +
>  static int umsch_mm_v4_0_load_microcode(struct amdgpu_umsch_mm
> *umsch)  {
>   struct amdgpu_device *adev = umsch->ring.adev; @@ -50,6 +60,14
> @@ static int umsch_mm_v4_0_load_microcode(struct amdgpu_umsch_mm
> *umsch)
>
>   umsch->cmd_buf_curr_ptr = umsch->cmd_buf_ptr;
>
> + if (adev->ip_versions[VCN_HWIP][0] == IP_VERSION(4, 0, 5)) {
> + WREG32_SOC15(VCN, 0, regUVD_IPX_DLDO_CONFIG,
> + 1 <<
> UVD_IPX_DLDO_CONFIG__ONO0_PWR_CONFIG__SHIFT);
> + SOC15_WAIT_ON_RREG(VCN, 0, regUVD_IPX_DLDO_STATUS,
> + 0 <<
> UVD_IPX_DLDO_STATUS__ONO0_PWR_STATUS__SHIFT,
> +
>   UVD_IPX_DLDO_STATUS__ONO0_PWR_STATUS_MASK);
> + }
> +

Is this the right place for this?  umsch_mm_hw_init() only calls this for 
FW_LOAD_DIRECT.  Maybe that check needs to be dropped?

Alex

>   data = RREG32_SOC15(VCN, 0, regUMSCH_MES_RESET_CTRL);
>   data = REG_SET_FIELD(data, UMSCH_MES_RESET_CTRL,
> MES_CORE_SOFT_RESET, 0);
>   WREG32_SOC15_UMSCH(regUMSCH_MES_RESET_CTRL, data); @@ -
> 229,6 +247,14 @@ static int umsch_mm_v4_0_ring_stop(struct
> amdgpu_umsch_mm *umsch)
>   data = REG_SET_FIELD(data, VCN_UMSCH_RB_DB_CTRL, EN, 0);
>   WREG32_SOC15(VCN, 0, regVCN_UMSCH_RB_DB_CTRL, data);
>
> + if (adev->ip_versions[VCN_HWIP][0] == IP_VERSION(4, 0, 5)) {
> + WREG32_SOC15(VCN, 0, regUVD_IPX_DLDO_CONFIG,
> + 2 <<
> UVD_IPX_DLDO_CONFIG__ONO0_PWR_CONFIG__SHIFT);
> + SOC15_WAIT_ON_RREG(VCN, 0, regUVD_IPX_DLDO_STATUS,
> + 1 <<
> UVD_IPX_DLDO_STATUS__ONO0_PWR_STATUS__SHIFT,
> +
>   UVD_IPX_DLDO_STATUS__ONO0_PWR_STATUS_MASK);
> + }
> +
>   return 0;
>  }
>
> --
> 2.25.1

RE: [PATCH 2/3] drm/amdgpu/umsch: power on/off UMSCH by DLDO

2023-10-09 Thread Deucher, Alexander

[Public]

> -Original Message-
> From: Yu, Lang 
> Sent: Saturday, October 7, 2023 4:54 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Zhang, Yifan
> ; Gopalakrishnan, Veerabadhran (Veera)
> ; Yu, Lang 
> Subject: [PATCH 2/3] drm/amdgpu/umsch: power on/off UMSCH by DLDO
>
> VCN 4.0.5 uses DLDO.
>
> Signed-off-by: Lang Yu 
> ---
>  drivers/gpu/drm/amd/amdgpu/umsch_mm_v4_0.c | 26
> ++
>  1 file changed, 26 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/umsch_mm_v4_0.c
> b/drivers/gpu/drm/amd/amdgpu/umsch_mm_v4_0.c
> index a60178156c77..7e79954c833b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/umsch_mm_v4_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/umsch_mm_v4_0.c
> @@ -34,6 +34,16 @@
>  #include "umsch_mm_4_0_api_def.h"
>  #include "umsch_mm_v4_0.h"
>
> +#define regUVD_IPX_DLDO_CONFIG 0x0064
> +#define regUVD_IPX_DLDO_CONFIG_BASE_IDX1
> +#define regUVD_IPX_DLDO_STATUS 0x0065
> +#define regUVD_IPX_DLDO_STATUS_BASE_IDX1
> +
> +#define UVD_IPX_DLDO_CONFIG__ONO0_PWR_CONFIG__SHIFT
> 0x0002
> +#define UVD_IPX_DLDO_CONFIG__ONO0_PWR_CONFIG_MASK
> 0x000cUL
> +#define UVD_IPX_DLDO_STATUS__ONO0_PWR_STATUS__SHIFT
> 0x0001
> +#define UVD_IPX_DLDO_STATUS__ONO0_PWR_STATUS_MASK
> 0x0002UL
> +
>  static int umsch_mm_v4_0_load_microcode(struct amdgpu_umsch_mm
> *umsch)  {
>   struct amdgpu_device *adev = umsch->ring.adev; @@ -50,6 +60,14
> @@ static int umsch_mm_v4_0_load_microcode(struct amdgpu_umsch_mm
> *umsch)
>
>   umsch->cmd_buf_curr_ptr = umsch->cmd_buf_ptr;
>
> + if (adev->ip_versions[VCN_HWIP][0] == IP_VERSION(4, 0, 5)) {

This switched to a function call.  Amdgpu_ip_version().

> + WREG32_SOC15(VCN, 0, regUVD_IPX_DLDO_CONFIG,
> + 1 <<
> UVD_IPX_DLDO_CONFIG__ONO0_PWR_CONFIG__SHIFT);
> + SOC15_WAIT_ON_RREG(VCN, 0, regUVD_IPX_DLDO_STATUS,
> + 0 <<
> UVD_IPX_DLDO_STATUS__ONO0_PWR_STATUS__SHIFT,
> +
>   UVD_IPX_DLDO_STATUS__ONO0_PWR_STATUS_MASK);
> + }
> +
>   data = RREG32_SOC15(VCN, 0, regUMSCH_MES_RESET_CTRL);
>   data = REG_SET_FIELD(data, UMSCH_MES_RESET_CTRL,
> MES_CORE_SOFT_RESET, 0);
>   WREG32_SOC15_UMSCH(regUMSCH_MES_RESET_CTRL, data); @@ -
> 229,6 +247,14 @@ static int umsch_mm_v4_0_ring_stop(struct
> amdgpu_umsch_mm *umsch)
>   data = REG_SET_FIELD(data, VCN_UMSCH_RB_DB_CTRL, EN, 0);
>   WREG32_SOC15(VCN, 0, regVCN_UMSCH_RB_DB_CTRL, data);
>
> + if (adev->ip_versions[VCN_HWIP][0] == IP_VERSION(4, 0, 5)) {

Same here.

Alex

> + WREG32_SOC15(VCN, 0, regUVD_IPX_DLDO_CONFIG,
> + 2 <<
> UVD_IPX_DLDO_CONFIG__ONO0_PWR_CONFIG__SHIFT);
> + SOC15_WAIT_ON_RREG(VCN, 0, regUVD_IPX_DLDO_STATUS,
> + 1 <<
> UVD_IPX_DLDO_STATUS__ONO0_PWR_STATUS__SHIFT,
> +
>   UVD_IPX_DLDO_STATUS__ONO0_PWR_STATUS_MASK);
> + }
> +
>   return 0;
>  }
>
> --
> 2.25.1

Re: [PATCH] drm/amdgpu: fix SI failure due to doorbells allocation

2023-10-09 Thread Alex Deucher

Applied.  Thanks!

On Mon, Oct 9, 2023 at 5:27 AM Sharma, Shashank  wrote:
>
> [AMD Official Use Only - General]
>
> Reviewed-by: Shashank Sharma 
>
> Regards
> Shashank
> -Original Message-
> From: Icenowy Zheng 
> Sent: Sunday, October 8, 2023 8:47 AM
> To: Deucher, Alexander ; Koenig, Christian 
> ; Pan, Xinhui ; David Airlie 
> ; Daniel Vetter ; Sharma, Shashank 
> ; Yadav, Arvind 
> Cc: amd-gfx@lists.freedesktop.org; dri-de...@lists.freedesktop.org; 
> linux-ker...@vger.kernel.org; Icenowy Zheng 
> Subject: [PATCH] drm/amdgpu: fix SI failure due to doorbells allocation
>
> SI hardware does not have doorbells at all, however currently the code will 
> try to do the allocation and thus fail, makes SI AMDGPU not usable.
>
> Fix this failure by skipping doorbells allocation when doorbells count is 
> zero.
>
> Fixes: 54c30d2a8def ("drm/amdgpu: create kernel doorbell pages")
> Signed-off-by: Icenowy Zheng 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell_mgr.c | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell_mgr.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell_mgr.c
> index d0249ada91d30..599aece42017a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell_mgr.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell_mgr.c
> @@ -142,6 +142,10 @@ int amdgpu_doorbell_create_kernel_doorbells(struct 
> amdgpu_device *adev)
> int r;
> int size;
>
> +   /* SI HW does not have doorbells, skip allocation */
> +   if (adev->doorbell.num_kernel_doorbells == 0)
> +   return 0;
> +
> /* Reserve first num_kernel_doorbells (page-aligned) for kernel ops */
> size = ALIGN(adev->doorbell.num_kernel_doorbells * sizeof(u32), 
> PAGE_SIZE);
>
> --
> 2.39.1
>

Re: [PATCH v6 4/7] drm/amd: Capture errors in amdgpu_switcheroo_set_state()

2023-10-09 Thread Mario Limonciello


On 10/9/2023 13:40, Alex Deucher wrote:

On Mon, Oct 9, 2023 at 12:52 PM Mario Limonciello
 wrote:


amdgpu_switcheroo_set_state() calls lots of functions that could
fail under memory pressure or for other reasons.  Don't assume
everything can successfully run sequentially, and check return codes
for everything that returns one.


I think the reason we do this rather than returning errors was not
because we assumed they would be successful, but that it seemed better
to try and restore the hardware than to bail early.


OK.  If no other feedback for the series I guess for now I'll drop this 
patch from the series when it's committed.




Alex



Acked-by: Christian König 
Signed-off-by: Mario Limonciello 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 36 +-
  1 file changed, 29 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 0f98f720d9ca..65a4537ee6f3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -1749,23 +1749,45 @@ static void amdgpu_switcheroo_set_state(struct pci_dev 
*pdev,
 /* don't suspend or resume card normally */
 dev->switch_power_state = DRM_SWITCH_POWER_CHANGING;

-   pci_set_power_state(pdev, PCI_D0);
-   amdgpu_device_load_pci_state(pdev);
+   r = pci_set_power_state(pdev, PCI_D0);
+   if (r) {
+   DRM_WARN("pci_set_power_state failed (%d)\n", r);
+   return;
+   }
+   if (!amdgpu_device_load_pci_state(pdev))
+   return;
 r = pci_enable_device(pdev);
 if (r)
 DRM_WARN("pci_enable_device failed (%d)\n", r);
-   amdgpu_device_resume(dev, true);
+   r = amdgpu_device_resume(dev, true);
+   if (r) {
+   DRM_WARN("amdgpu_device_resume failed (%d)\n", r);
+   return;
+   }

 dev->switch_power_state = DRM_SWITCH_POWER_ON;
 } else {
 pr_info("switched off\n");
 dev->switch_power_state = DRM_SWITCH_POWER_CHANGING;
-   amdgpu_device_prepare(dev);
-   amdgpu_device_suspend(dev, true);
-   amdgpu_device_cache_pci_state(pdev);
+   r = amdgpu_device_prepare(dev);
+   if (r) {
+   DRM_WARN("amdgpu_device_prepare failed (%d)\n", r);
+   return;
+   }
+   r = amdgpu_device_suspend(dev, true);
+   if (r) {
+   DRM_WARN("amdgpu_device_suspend failed (%d)\n", r);
+   return;
+   }
+   if (!amdgpu_device_cache_pci_state(pdev))
+   return;
 /* Shut down the device */
 pci_disable_device(pdev);
-   pci_set_power_state(pdev, PCI_D3cold);
+   r = pci_set_power_state(pdev, PCI_D3cold);
+   if (r) {
+   DRM_WARN("pci_set_power_state failed (%d)\n", r);
+   return;
+   }
 dev->switch_power_state = DRM_SWITCH_POWER_OFF;
 }
  }
--
2.34.1

Re: [PATCH v6 4/7] drm/amd: Capture errors in amdgpu_switcheroo_set_state()

2023-10-09 Thread Alex Deucher

On Mon, Oct 9, 2023 at 12:52 PM Mario Limonciello
 wrote:
>
> amdgpu_switcheroo_set_state() calls lots of functions that could
> fail under memory pressure or for other reasons.  Don't assume
> everything can successfully run sequentially, and check return codes
> for everything that returns one.

I think the reason we do this rather than returning errors was not
because we assumed they would be successful, but that it seemed better
to try and restore the hardware than to bail early.

Alex

>
> Acked-by: Christian König 
> Signed-off-by: Mario Limonciello 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 36 +-
>  1 file changed, 29 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 0f98f720d9ca..65a4537ee6f3 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -1749,23 +1749,45 @@ static void amdgpu_switcheroo_set_state(struct 
> pci_dev *pdev,
> /* don't suspend or resume card normally */
> dev->switch_power_state = DRM_SWITCH_POWER_CHANGING;
>
> -   pci_set_power_state(pdev, PCI_D0);
> -   amdgpu_device_load_pci_state(pdev);
> +   r = pci_set_power_state(pdev, PCI_D0);
> +   if (r) {
> +   DRM_WARN("pci_set_power_state failed (%d)\n", r);
> +   return;
> +   }
> +   if (!amdgpu_device_load_pci_state(pdev))
> +   return;
> r = pci_enable_device(pdev);
> if (r)
> DRM_WARN("pci_enable_device failed (%d)\n", r);
> -   amdgpu_device_resume(dev, true);
> +   r = amdgpu_device_resume(dev, true);
> +   if (r) {
> +   DRM_WARN("amdgpu_device_resume failed (%d)\n", r);
> +   return;
> +   }
>
> dev->switch_power_state = DRM_SWITCH_POWER_ON;
> } else {
> pr_info("switched off\n");
> dev->switch_power_state = DRM_SWITCH_POWER_CHANGING;
> -   amdgpu_device_prepare(dev);
> -   amdgpu_device_suspend(dev, true);
> -   amdgpu_device_cache_pci_state(pdev);
> +   r = amdgpu_device_prepare(dev);
> +   if (r) {
> +   DRM_WARN("amdgpu_device_prepare failed (%d)\n", r);
> +   return;
> +   }
> +   r = amdgpu_device_suspend(dev, true);
> +   if (r) {
> +   DRM_WARN("amdgpu_device_suspend failed (%d)\n", r);
> +   return;
> +   }
> +   if (!amdgpu_device_cache_pci_state(pdev))
> +   return;
> /* Shut down the device */
> pci_disable_device(pdev);
> -   pci_set_power_state(pdev, PCI_D3cold);
> +   r = pci_set_power_state(pdev, PCI_D3cold);
> +   if (r) {
> +   DRM_WARN("pci_set_power_state failed (%d)\n", r);
> +   return;
> +   }
> dev->switch_power_state = DRM_SWITCH_POWER_OFF;
> }
>  }
> --
> 2.34.1
>

[PATCH] drm/amdgpu: enable GFX IP v11.5.0 CG and PG support

2023-10-09 Thread Alex Deucher

From: Li Ma 

Add CG support for GFX/MC/HDP/ATHUB/IH/BIF.
Add PG support for GFX.

Signed-off-by: Li Ma 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/athub_v3_0.c |  1 +
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c  |  3 +++
 drivers/gpu/drm/amd/amdgpu/soc21.c  | 22 +++---
 3 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/athub_v3_0.c 
b/drivers/gpu/drm/amd/amdgpu/athub_v3_0.c
index 5a318bc03d23..f0737fb3a999 100644
--- a/drivers/gpu/drm/amd/amdgpu/athub_v3_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/athub_v3_0.c
@@ -103,6 +103,7 @@ int athub_v3_0_set_clockgating(struct amdgpu_device *adev,
case IP_VERSION(3, 0, 0):
case IP_VERSION(3, 0, 1):
case IP_VERSION(3, 0, 2):
+   case IP_VERSION(3, 3, 0):
athub_v3_0_update_medium_grain_clock_gating(adev,
state == AMD_CG_STATE_GATE);
athub_v3_0_update_medium_grain_light_sleep(adev,
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index 9c4562bda8cd..27b224b0688a 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -5027,6 +5027,7 @@ static void gfx_v11_cntl_power_gating(struct 
amdgpu_device *adev, bool enable)
switch (amdgpu_ip_version(adev, GC_HWIP, 0)) {
case IP_VERSION(11, 0, 1):
case IP_VERSION(11, 0, 4):
+   case IP_VERSION(11, 5, 0):
WREG32_SOC15(GC, 0, regRLC_PG_DELAY_3, 
RLC_PG_DELAY_3_DEFAULT_GC_11_0_1);
break;
default:
@@ -5061,6 +5062,7 @@ static int gfx_v11_0_set_powergating_state(void *handle,
break;
case IP_VERSION(11, 0, 1):
case IP_VERSION(11, 0, 4):
+   case IP_VERSION(11, 5, 0):
if (!enable)
amdgpu_gfx_off_ctrl(adev, false);
 
@@ -5091,6 +5093,7 @@ static int gfx_v11_0_set_clockgating_state(void *handle,
case IP_VERSION(11, 0, 2):
case IP_VERSION(11, 0, 3):
case IP_VERSION(11, 0, 4):
+   case IP_VERSION(11, 5, 0):
gfx_v11_0_update_gfx_clock_gating(adev,
state ==  AMD_CG_STATE_GATE);
break;
diff --git a/drivers/gpu/drm/amd/amdgpu/soc21.c 
b/drivers/gpu/drm/amd/amdgpu/soc21.c
index 6fcc4f7be117..df7462cec6ab 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc21.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc21.c
@@ -689,11 +689,27 @@ static int soc21_common_early_init(void *handle)
AMD_PG_SUPPORT_JPEG;
adev->external_rev_id = adev->rev_id + 0x80;
break;
-
-
case IP_VERSION(11, 5, 0):
adev->cg_flags = AMD_CG_SUPPORT_VCN_MGCG |
-   AMD_CG_SUPPORT_JPEG_MGCG;
+   AMD_CG_SUPPORT_JPEG_MGCG |
+   AMD_CG_SUPPORT_GFX_CGCG |
+   AMD_CG_SUPPORT_GFX_CGLS |
+   AMD_CG_SUPPORT_GFX_MGCG |
+   AMD_CG_SUPPORT_GFX_FGCG |
+   AMD_CG_SUPPORT_REPEATER_FGCG |
+   AMD_CG_SUPPORT_GFX_PERF_CLK |
+   AMD_CG_SUPPORT_GFX_3D_CGCG |
+   AMD_CG_SUPPORT_GFX_3D_CGLS  |
+   AMD_CG_SUPPORT_MC_MGCG |
+   AMD_CG_SUPPORT_MC_LS |
+   AMD_CG_SUPPORT_HDP_LS |
+   AMD_CG_SUPPORT_HDP_DS |
+   AMD_CG_SUPPORT_HDP_SD |
+   AMD_CG_SUPPORT_ATHUB_MGCG |
+   AMD_CG_SUPPORT_ATHUB_LS |
+   AMD_CG_SUPPORT_IH_CG |
+   AMD_CG_SUPPORT_BIF_MGCG |
+   AMD_CG_SUPPORT_BIF_LS;
adev->pg_flags = AMD_PG_SUPPORT_VCN_DPG |
AMD_PG_SUPPORT_VCN |
AMD_PG_SUPPORT_JPEG |
-- 
2.41.0

[PATCH 3/3] drm/amdgpu: add support to power up/down UMSCH by SMU

2023-10-09 Thread Alex Deucher

From: Lang Yu 

Power up/down UMSCH by SMU.

Signed-off-by: Lang Yu 
Acked-by: Leo Liu 
Acked-by: Veerabadhran Gopalakrishnan 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 26 +++
 drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h |  1 +
 2 files changed, 27 insertions(+)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c 
b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
index df513347cb73..7c3356d6da5e 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
@@ -299,6 +299,29 @@ static int smu_dpm_set_vpe_enable(struct smu_context *smu,
return ret;
 }
 
+static int smu_dpm_set_umsch_mm_enable(struct smu_context *smu,
+  bool enable)
+{
+   struct smu_power_context *smu_power = >smu_power;
+   struct smu_power_gate *power_gate = _power->power_gate;
+   int ret = 0;
+
+   if (!smu->adev->enable_umsch_mm)
+   return 0;
+
+   if (!smu->ppt_funcs->dpm_set_umsch_mm_enable)
+   return 0;
+
+   if (atomic_read(_gate->umsch_mm_gated) ^ enable)
+   return 0;
+
+   ret = smu->ppt_funcs->dpm_set_umsch_mm_enable(smu, enable);
+   if (!ret)
+   atomic_set(_gate->umsch_mm_gated, !enable);
+
+   return ret;
+}
+
 /**
  * smu_dpm_set_power_gate - power gate/ungate the specific IP block
  *
@@ -1196,6 +1219,7 @@ static int smu_sw_init(void *handle)
atomic_set(>smu_power.power_gate.vcn_gated, 1);
atomic_set(>smu_power.power_gate.jpeg_gated, 1);
atomic_set(>smu_power.power_gate.vpe_gated, 1);
+   atomic_set(>smu_power.power_gate.umsch_mm_gated, 1);
 
smu->workload_mask = 1 << 
smu->workload_prority[PP_SMC_POWER_PROFILE_BOOTUP_DEFAULT];
smu->workload_prority[PP_SMC_POWER_PROFILE_BOOTUP_DEFAULT] = 0;
@@ -1549,6 +1573,7 @@ static int smu_hw_init(void *handle)
smu_dpm_set_vcn_enable(smu, true);
smu_dpm_set_jpeg_enable(smu, true);
smu_dpm_set_vpe_enable(smu, true);
+   smu_dpm_set_umsch_mm_enable(smu, true);
smu_set_gfx_cgpg(smu, true);
}
 
@@ -1726,6 +1751,7 @@ static int smu_hw_fini(void *handle)
smu_dpm_set_vcn_enable(smu, false);
smu_dpm_set_jpeg_enable(smu, false);
smu_dpm_set_vpe_enable(smu, false);
+   smu_dpm_set_umsch_mm_enable(smu, false);
 
adev->vcn.cur_state = AMD_PG_STATE_GATE;
adev->jpeg.cur_state = AMD_PG_STATE_GATE;
diff --git a/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h 
b/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
index 190a2ce38ac1..1454eed76604 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
+++ b/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
@@ -375,6 +375,7 @@ struct smu_power_gate {
atomic_t vcn_gated;
atomic_t jpeg_gated;
atomic_t vpe_gated;
+   atomic_t umsch_mm_gated;
 };
 
 struct smu_power_context {
-- 
2.41.0

[PATCH 1/3] drm/amdgpu: add support to powerup VPE by SMU

2023-10-09 Thread Alex Deucher

From: Lang Yu 

Powerup VPE by SMU.

Signed-off-by: Lang Yu 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/pm/amdgpu_dpm.c   |  1 +
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 29 +++
 drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h |  1 +
 3 files changed, 31 insertions(+)

diff --git a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c 
b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
index 1b17a71ed45e..acf3527fff2d 100644
--- a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
+++ b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
@@ -93,6 +93,7 @@ int amdgpu_dpm_set_powergating_by_smu(struct amdgpu_device 
*adev, uint32_t block
case AMD_IP_BLOCK_TYPE_JPEG:
case AMD_IP_BLOCK_TYPE_GMC:
case AMD_IP_BLOCK_TYPE_ACP:
+   case AMD_IP_BLOCK_TYPE_VPE:
if (pp_funcs && pp_funcs->set_powergating_by_smu)
ret = (pp_funcs->set_powergating_by_smu(
(adev)->powerplay.pp_handle, block_type, gate));
diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c 
b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
index 0b4f9f2ca529..df513347cb73 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
@@ -279,6 +279,26 @@ static int smu_dpm_set_jpeg_enable(struct smu_context *smu,
return ret;
 }
 
+static int smu_dpm_set_vpe_enable(struct smu_context *smu,
+  bool enable)
+{
+   struct smu_power_context *smu_power = >smu_power;
+   struct smu_power_gate *power_gate = _power->power_gate;
+   int ret = 0;
+
+   if (!smu->ppt_funcs->dpm_set_vpe_enable)
+   return 0;
+
+   if (atomic_read(_gate->vpe_gated) ^ enable)
+   return 0;
+
+   ret = smu->ppt_funcs->dpm_set_vpe_enable(smu, enable);
+   if (!ret)
+   atomic_set(_gate->vpe_gated, !enable);
+
+   return ret;
+}
+
 /**
  * smu_dpm_set_power_gate - power gate/ungate the specific IP block
  *
@@ -337,6 +357,12 @@ static int smu_dpm_set_power_gate(void *handle,
dev_err(smu->adev->dev, "Failed to power %s JPEG!\n",
gate ? "gate" : "ungate");
break;
+   case AMD_IP_BLOCK_TYPE_VPE:
+   ret = smu_dpm_set_vpe_enable(smu, !gate);
+   if (ret)
+   dev_err(smu->adev->dev, "Failed to power %s VPE!\n",
+   gate ? "gate" : "ungate");
+   break;
default:
dev_err(smu->adev->dev, "Unsupported block type!\n");
return -EINVAL;
@@ -1169,6 +1195,7 @@ static int smu_sw_init(void *handle)
 
atomic_set(>smu_power.power_gate.vcn_gated, 1);
atomic_set(>smu_power.power_gate.jpeg_gated, 1);
+   atomic_set(>smu_power.power_gate.vpe_gated, 1);
 
smu->workload_mask = 1 << 
smu->workload_prority[PP_SMC_POWER_PROFILE_BOOTUP_DEFAULT];
smu->workload_prority[PP_SMC_POWER_PROFILE_BOOTUP_DEFAULT] = 0;
@@ -1521,6 +1548,7 @@ static int smu_hw_init(void *handle)
return ret;
smu_dpm_set_vcn_enable(smu, true);
smu_dpm_set_jpeg_enable(smu, true);
+   smu_dpm_set_vpe_enable(smu, true);
smu_set_gfx_cgpg(smu, true);
}
 
@@ -1697,6 +1725,7 @@ static int smu_hw_fini(void *handle)
 
smu_dpm_set_vcn_enable(smu, false);
smu_dpm_set_jpeg_enable(smu, false);
+   smu_dpm_set_vpe_enable(smu, false);
 
adev->vcn.cur_state = AMD_PG_STATE_GATE;
adev->jpeg.cur_state = AMD_PG_STATE_GATE;
diff --git a/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h 
b/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
index 72d632be0ee6..c8cd4e3a3d3b 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
+++ b/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
@@ -374,6 +374,7 @@ struct smu_power_gate {
bool vce_gated;
atomic_t vcn_gated;
atomic_t jpeg_gated;
+   atomic_t vpe_gated;
 };
 
 struct smu_power_context {
-- 
2.41.0

[PATCH 2/3] drm/amdgpu: add power up/down UMSCH ppt callback

2023-10-09 Thread Alex Deucher

From: Lang Yu 

Add ppt callback to power up/down UMSCH.

v2: squash in updates (Alex)

Signed-off-by: Lang Yu 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h   |  6 ++
 drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h|  6 +-
 .../gpu/drm/amd/pm/swsmu/smu14/smu_v14_0_0_ppt.c| 13 +
 3 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h 
b/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
index c8cd4e3a3d3b..190a2ce38ac1 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
+++ b/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
@@ -1350,6 +1350,12 @@ struct pptable_funcs {
 *   management.
 */
int (*dpm_set_vpe_enable)(struct smu_context *smu, bool enable);
+
+   /**
+* @dpm_set_umsch_mm_enable: Enable/disable UMSCH engine dynamic power
+*   management.
+*/
+   int (*dpm_set_umsch_mm_enable)(struct smu_context *smu, bool enable);
 };
 
 typedef enum {
diff --git a/drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h 
b/drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h
index 3958c9cb4e91..098267ec0a83 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h
+++ b/drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h
@@ -255,7 +255,11 @@
__SMU_DUMMY_MAP(McaBankCeDumpDW),   \
__SMU_DUMMY_MAP(SelectPLPDMode),\
__SMU_DUMMY_MAP(PowerUpVpe),\
-   __SMU_DUMMY_MAP(PowerDownVpe),
+   __SMU_DUMMY_MAP(PowerDownVpe), \
+   __SMU_DUMMY_MAP(PowerUpUmsch),  \
+   __SMU_DUMMY_MAP(PowerDownUmsch),\
+   __SMU_DUMMY_MAP(SetSoftMaxVpe), \
+   __SMU_DUMMY_MAP(SetSoftMinVpe),
 
 #undef __SMU_DUMMY_MAP
 #define __SMU_DUMMY_MAP(type)  SMU_MSG_##type
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu14/smu_v14_0_0_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu14/smu_v14_0_0_ppt.c
index 5db29fcf699d..eb613687513a 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu14/smu_v14_0_0_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu14/smu_v14_0_0_ppt.c
@@ -104,6 +104,10 @@ static struct cmn2asic_msg_mapping 
smu_v14_0_0_message_map[SMU_MSG_MAX_COUNT] =
MSG_MAP(SetHardMinIspxclkByFreq,
PPSMC_MSG_SetHardMinIspxclkByFreq,  1),
MSG_MAP(PowerUpVpe, 
PPSMC_MSG_PowerUpVpe,   1),
MSG_MAP(PowerDownVpe,   PPSMC_MSG_PowerDownVpe, 
1),
+   MSG_MAP(PowerUpUmsch,   PPSMC_MSG_PowerUpUmsch, 
1),
+   MSG_MAP(PowerDownUmsch, 
PPSMC_MSG_PowerDownUmsch,   1),
+   MSG_MAP(SetSoftMaxVpe,  
PPSMC_MSG_SetSoftMaxVpe,1),
+   MSG_MAP(SetSoftMinVpe,  
PPSMC_MSG_SetSoftMinVpe,1),
 };
 
 static struct cmn2asic_mapping smu_v14_0_0_feature_mask_map[SMU_FEATURE_COUNT] 
= {
@@ -1025,6 +1029,14 @@ static int smu_v14_0_0_set_vpe_enable(struct smu_context 
*smu,
   0, NULL);
 }
 
+static int smu_v14_0_0_set_umsch_mm_enable(struct smu_context *smu,
+ bool enable)
+{
+   return smu_cmn_send_smc_msg_with_param(smu, enable ?
+  SMU_MSG_PowerUpUmsch : 
SMU_MSG_PowerDownUmsch,
+  0, NULL);
+}
+
 static const struct pptable_funcs smu_v14_0_0_ppt_funcs = {
.check_fw_status = smu_v14_0_check_fw_status,
.check_fw_version = smu_v14_0_check_fw_version,
@@ -1054,6 +1066,7 @@ static const struct pptable_funcs smu_v14_0_0_ppt_funcs = 
{
.set_fine_grain_gfx_freq_parameters = 
smu_v14_0_0_set_fine_grain_gfx_freq_parameters,
.set_gfx_power_up_by_imu = smu_v14_0_set_gfx_power_up_by_imu,
.dpm_set_vpe_enable = smu_v14_0_0_set_vpe_enable,
+   .dpm_set_umsch_mm_enable = smu_v14_0_0_set_umsch_mm_enable,
 };
 
 static void smu_v14_0_0_set_smu_mailbox_registers(struct smu_context *smu)
-- 
2.41.0

[PATCH 4/5] drm/amd/swsmu: add smu14 ip support

2023-10-09 Thread Alex Deucher

From: Kenneth Feng 

Add initial swSMU support for smu 14 series ASIC.

v2: squash in build fixes and updates (Li Ma)
fix warnings (Alex)
v3: squash in updates (Alex)
v4: squash in updates (Alex)
v5: squash in avg/current power updates (Alex)

Signed-off-by: Li Ma 
Signed-off-by: Kenneth Feng 
Signed-off-by: Likun Gao 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/pm/Makefile   |1 +
 drivers/gpu/drm/amd/pm/swsmu/Makefile |2 +-
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c |4 +
 drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h |6 +
 drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h  |7 +-
 drivers/gpu/drm/amd/pm/swsmu/inc/smu_v14_0.h  |  230 +++
 drivers/gpu/drm/amd/pm/swsmu/smu14/Makefile   |   30 +
 .../gpu/drm/amd/pm/swsmu/smu14/smu_v14_0.c| 1727 +
 .../drm/amd/pm/swsmu/smu14/smu_v14_0_0_ppt.c  | 1078 ++
 .../drm/amd/pm/swsmu/smu14/smu_v14_0_0_ppt.h  |   28 +
 10 files changed, 3110 insertions(+), 3 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/pm/swsmu/inc/smu_v14_0.h
 create mode 100644 drivers/gpu/drm/amd/pm/swsmu/smu14/Makefile
 create mode 100644 drivers/gpu/drm/amd/pm/swsmu/smu14/smu_v14_0.c
 create mode 100644 drivers/gpu/drm/amd/pm/swsmu/smu14/smu_v14_0_0_ppt.c
 create mode 100644 drivers/gpu/drm/amd/pm/swsmu/smu14/smu_v14_0_0_ppt.h

diff --git a/drivers/gpu/drm/amd/pm/Makefile b/drivers/gpu/drm/amd/pm/Makefile
index 51751db436b0..ebbf188f625c 100644
--- a/drivers/gpu/drm/amd/pm/Makefile
+++ b/drivers/gpu/drm/amd/pm/Makefile
@@ -30,6 +30,7 @@ subdir-ccflags-y += \
-I$(FULL_AMD_PATH)/pm/swsmu/smu11 \
-I$(FULL_AMD_PATH)/pm/swsmu/smu12 \
-I$(FULL_AMD_PATH)/pm/swsmu/smu13 \
+   -I$(FULL_AMD_PATH)/pm/swsmu/smu14 \
-I$(FULL_AMD_PATH)/pm/powerplay/inc \
-I$(FULL_AMD_PATH)/pm/powerplay/smumgr\
-I$(FULL_AMD_PATH)/pm/powerplay/hwmgr \
diff --git a/drivers/gpu/drm/amd/pm/swsmu/Makefile 
b/drivers/gpu/drm/amd/pm/swsmu/Makefile
index 7987c6cf849d..e5fdda49c96c 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/Makefile
+++ b/drivers/gpu/drm/amd/pm/swsmu/Makefile
@@ -22,7 +22,7 @@
 
 AMD_SWSMU_PATH = ../pm/swsmu
 
-SWSMU_LIBS = smu11 smu12 smu13
+SWSMU_LIBS = smu11 smu12 smu13 smu14
 
 AMD_SWSMU = $(addsuffix /Makefile,$(addprefix 
$(FULL_AMD_PATH)/pm/swsmu/,$(SWSMU_LIBS)))
 
diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c 
b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
index 99750c182279..da6f018aff12 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
@@ -43,6 +43,7 @@
 #include "smu_v13_0_5_ppt.h"
 #include "smu_v13_0_6_ppt.h"
 #include "smu_v13_0_7_ppt.h"
+#include "smu_v14_0_0_ppt.h"
 #include "amd_pcie.h"
 
 /*
@@ -660,6 +661,9 @@ static int smu_set_funcs(struct amdgpu_device *adev)
case IP_VERSION(13, 0, 7):
smu_v13_0_7_set_ppt_funcs(smu);
break;
+   case IP_VERSION(14, 0, 0):
+   smu_v14_0_0_set_ppt_funcs(smu);
+   break;
default:
return -EINVAL;
}
diff --git a/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h 
b/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
index f3cab5e633a7..72d632be0ee6 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
+++ b/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
@@ -1343,6 +1343,12 @@ struct pptable_funcs {
 * @init_pptable_microcode: Prepare the pptable microcode to upload via 
PSP
 */
int (*init_pptable_microcode)(struct smu_context *smu);
+
+   /**
+* @dpm_set_vpe_enable: Enable/disable VPE engine dynamic power
+*   management.
+*/
+   int (*dpm_set_vpe_enable)(struct smu_context *smu, bool enable);
 };
 
 typedef enum {
diff --git a/drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h 
b/drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h
index 4850e48bbef5..3958c9cb4e91 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h
+++ b/drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h
@@ -253,7 +253,9 @@
__SMU_DUMMY_MAP(QueryValidMcaCeCount),  \
__SMU_DUMMY_MAP(McaBankDumpDW), \
__SMU_DUMMY_MAP(McaBankCeDumpDW),   \
-   __SMU_DUMMY_MAP(SelectPLPDMode),
+   __SMU_DUMMY_MAP(SelectPLPDMode),\
+   __SMU_DUMMY_MAP(PowerUpVpe),\
+   __SMU_DUMMY_MAP(PowerDownVpe),
 
 #undef __SMU_DUMMY_MAP
 #define __SMU_DUMMY_MAP(type)  SMU_MSG_##type
@@ -415,7 +417,8 @@ enum smu_clk_type {
__SMU_DUMMY_MAP(MEM_TEMP_READ), \
__SMU_DUMMY_MAP(ATHUB_MMHUB_PG),\
__SMU_DUMMY_MAP(BACO_CG),   \
-   __SMU_DUMMY_MAP(SOC_CG),
+   __SMU_DUMMY_MAP(SOC_CG),\
+   __SMU_DUMMY_MAP(LOW_POWER_DCNCLKS),
 
 #undef __SMU_DUMMY_MAP
 #define __SMU_DUMMY_MAP(feature)   SMU_FEATURE_##feature##_BIT
diff --git a/drivers/gpu/drm/amd/pm/swsmu/inc/smu_v14_0.h

[PATCH 5/5] drm/amdgpu/discovery: add SMU 14 support

2023-10-09 Thread Alex Deucher

From: Li Ma 

add smu 14 into the IP discovery list.

Signed-off-by: Li Ma 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c  | 3 +++
 drivers/gpu/drm/amd/include/kgd_pp_interface.h | 1 +
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c  | 8 
 3 files changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index 68aa271d5aa5..17d4311e22d5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -1846,6 +1846,9 @@ static int amdgpu_discovery_set_smu_ip_blocks(struct 
amdgpu_device *adev)
case IP_VERSION(13, 0, 11):
amdgpu_device_ip_block_add(adev, _v13_0_ip_block);
break;
+   case IP_VERSION(14, 0, 0):
+   amdgpu_device_ip_block_add(adev, _v14_0_ip_block);
+   break;
default:
dev_err(adev->dev,
"Failed to add smu ip block(MP1_HWIP:0x%x)\n",
diff --git a/drivers/gpu/drm/amd/include/kgd_pp_interface.h 
b/drivers/gpu/drm/amd/include/kgd_pp_interface.h
index e0bb6d39f0c3..ff72db95c902 100644
--- a/drivers/gpu/drm/amd/include/kgd_pp_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_pp_interface.h
@@ -28,6 +28,7 @@ extern const struct amdgpu_ip_block_version pp_smu_ip_block;
 extern const struct amdgpu_ip_block_version smu_v11_0_ip_block;
 extern const struct amdgpu_ip_block_version smu_v12_0_ip_block;
 extern const struct amdgpu_ip_block_version smu_v13_0_ip_block;
+extern const struct amdgpu_ip_block_version smu_v14_0_ip_block;
 
 enum smu_event_type {
SMU_EVENT_RESET_COMPLETE = 0,
diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c 
b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
index da6f018aff12..0b4f9f2ca529 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
@@ -2268,6 +2268,14 @@ const struct amdgpu_ip_block_version smu_v13_0_ip_block 
= {
.funcs = _ip_funcs,
 };
 
+const struct amdgpu_ip_block_version smu_v14_0_ip_block = {
+   .type = AMD_IP_BLOCK_TYPE_SMC,
+   .major = 14,
+   .minor = 0,
+   .rev = 0,
+   .funcs = _ip_funcs,
+};
+
 static int smu_load_microcode(void *handle)
 {
struct smu_context *smu = handle;
-- 
2.41.0

[PATCH 2/5] drm/amd/swsmu: add smu v14_0_0 ppsmc file

2023-10-09 Thread Alex Deucher

From: Li Ma 

Add initial smu v14_0_0 ppsmc file

v2: squash in updates (Alex)
v3: squash in updates (Alex)

Signed-off-by: Li Ma 
Signed-off-by: Alex Deucher 
---
 .../pm/swsmu/inc/pmfw_if/smu_v14_0_0_ppsmc.h  | 142 ++
 1 file changed, 142 insertions(+)
 create mode 100644 drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu_v14_0_0_ppsmc.h

diff --git a/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu_v14_0_0_ppsmc.h 
b/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu_v14_0_0_ppsmc.h
new file mode 100644
index ..82fec93e0114
--- /dev/null
+++ b/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu_v14_0_0_ppsmc.h
@@ -0,0 +1,142 @@
+/*
+ * Copyright 2023 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#ifndef __SMU_V14_0_0_PPSMC_H__
+#define __SMU_V14_0_0_PPSMC_H__
+
+/*! @mainpage PMFW-PPS (PPLib) Message Interface
+  This documentation contains the subsections:\n\n
+  @ref ResponseCodes\n
+  @ref definitions\n
+  @ref enums\n
+*/
+
+/** @def PPS_PMFW_IF_VER
+* PPS (PPLib) to PMFW IF version 1.0
+*/
+#define PPS_PMFW_IF_VER "1.0" ///< Major.Minor
+
+/** @defgroup ResponseCodes PMFW Response Codes
+*  @{
+*/
+// SMU Response Codes:
+#define PPSMC_Result_OK0x1  ///< Message Response OK
+#define PPSMC_Result_Failed0xFF ///< Message Response Failed
+#define PPSMC_Result_UnknownCmd0xFE ///< Message Response Unknown 
Command
+#define PPSMC_Result_CmdRejectedPrereq 0xFD ///< Message Response Command 
Failed Prerequisite
+#define PPSMC_Result_CmdRejectedBusy   0xFC ///< Message Response Command 
Rejected due to PMFW is busy. Sender should retry sending this message
+/** @}*/
+
+/** @defgroup definitions Message definitions
+*  @{
+*/
+// Message Definitions:
+#define PPSMC_MSG_TestMessage   0x01 ///< To check if PMFW is 
alive and responding. Requirement specified by PMFW team
+#define PPSMC_MSG_GetPmfwVersion0x02 ///< Get PMFW version
+#define PPSMC_MSG_GetDriverIfVersion0x03 ///< Get PMFW_DRIVER_IF 
version
+#define PPSMC_MSG_SPARE00x04 ///< SPARE
+#define PPSMC_MSG_SPARE10x05 ///< SPARE
+#define PPSMC_MSG_PowerDownVcn  0x06 ///< Power down VCN
+#define PPSMC_MSG_PowerUpVcn0x07 ///< Power up VCN; VCN is 
power gated by default
+#define PPSMC_MSG_SetHardMinVcn 0x08 ///< For wireless display
+#define PPSMC_MSG_SetSoftMinGfxclk  0x09 ///< Set SoftMin for 
GFXCLK, argument is frequency in MHz
+#define PPSMC_MSG_SPARE20x0A ///< SPARE
+#define PPSMC_MSG_SPARE30x0B ///< SPARE
+#define PPSMC_MSG_PrepareMp1ForUnload   0x0C ///< Prepare PMFW for GFX 
driver unload
+#define PPSMC_MSG_SetDriverDramAddrHigh 0x0D ///< Set high 32 bits of 
DRAM address for Driver table transfer
+#define PPSMC_MSG_SetDriverDramAddrLow  0x0E ///< Set low 32 bits of 
DRAM address for Driver table transfer
+#define PPSMC_MSG_TransferTableSmu2Dram 0x0F ///< Transfer driver 
interface table from PMFW SRAM to DRAM
+#define PPSMC_MSG_TransferTableDram2Smu 0x10 ///< Transfer driver 
interface table from DRAM to PMFW SRAM
+#define PPSMC_MSG_GfxDeviceDriverReset  0x11 ///< Request GFX mode 2 
reset
+#define PPSMC_MSG_GetEnabledSmuFeatures 0x12 ///< Get enabled features 
in PMFW
+#define PPSMC_MSG_SetHardMinSocclkByFreq0x13 ///< Set hard min for SOC 
CLK
+#define PPSMC_MSG_SetSoftMinFclk0x14 ///< Set hard min for FCLK
+#define PPSMC_MSG_SetSoftMinVcn 0x15 ///< Set soft min for VCN 
clocks (VCLK and DCLK)
+
+#define PPSMC_MSG_EnableGfxImu  0x16 ///< Enable GFX IMU
+
+#define PPSMC_MSG_GetGfxclkFrequency0x17 ///< Get GFX clock 
frequency
+#define PPSMC_MSG_GetFclkFrequency  0x18 ///< Get FCLK

[PATCH 3/5] drm/amd/swsmu: add smu v14_0_0 pmfw if file

2023-10-09 Thread Alex Deucher

From: Li Ma 

Add initial smu v14_0_0 pmfw if file

v2: squash in updates (Alex)

Signed-off-by: Li Ma 
Signed-off-by: Alex Deucher 
---
 .../pm/swsmu/inc/pmfw_if/smu_v14_0_0_pmfw.h   | 156 ++
 1 file changed, 156 insertions(+)
 create mode 100644 drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu_v14_0_0_pmfw.h

diff --git a/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu_v14_0_0_pmfw.h 
b/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu_v14_0_0_pmfw.h
new file mode 100644
index ..0dea30cceb98
--- /dev/null
+++ b/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu_v14_0_0_pmfw.h
@@ -0,0 +1,156 @@
+/*
+ * Copyright 2023 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#ifndef __SMU_V14_0_0_PMFW_H__
+#define __SMU_V14_0_0_PMFW_H__
+
+#include "smu14_driver_if_v14_0_0.h"
+
+#pragma pack(push, 1)
+
+#define ENABLE_DEBUG_FEATURES
+
+// Firmware features
+// Feature Control Defines
+#define FEATURE_CCLK_DPM_BIT 0
+#define FEATURE_FAN_CONTROLLER_BIT   1
+#define FEATURE_DATA_CALCULATION_BIT 2
+#define FEATURE_PPT_BIT  3
+#define FEATURE_TDC_BIT  4
+#define FEATURE_THERMAL_BIT  5
+#define FEATURE_FIT_BIT  6
+#define FEATURE_EDC_BIT  7
+#define FEATURE_PLL_POWER_DOWN_BIT   8
+#define FEATURE_VDDOFF_BIT   9
+#define FEATURE_VCN_DPM_BIT 10
+#define FEATURE_DS_MPM_BIT  11
+#define FEATURE_FCLK_DPM_BIT12
+#define FEATURE_SOCCLK_DPM_BIT  13
+#define FEATURE_DS_MPIO_BIT 14
+#define FEATURE_LCLK_DPM_BIT15
+#define FEATURE_SHUBCLK_DPM_BIT 16
+#define FEATURE_DCFCLK_DPM_BIT  17
+#define FEATURE_ISP_DPM_BIT 18
+#define FEATURE_IPU_DPM_BIT 19
+#define FEATURE_GFX_DPM_BIT 20
+#define FEATURE_DS_GFXCLK_BIT   21
+#define FEATURE_DS_SOCCLK_BIT   22
+#define FEATURE_DS_LCLK_BIT 23
+#define FEATURE_LOW_POWER_DCNCLKS_BIT   24  // for all DISP clks
+#define FEATURE_DS_SHUBCLK_BIT  25
+#define FEATURE_GFX_TEMP_VMIN_BIT   26
+#define FEATURE_ZSTATES_BIT 27
+#define FEATURE_IOMMUL2_PG_BIT  28
+#define FEATURE_DS_FCLK_BIT 29
+#define FEATURE_DS_SMNCLK_BIT   30
+#define FEATURE_DS_MP1CLK_BIT   31
+#define FEATURE_RESERVED3   32
+#define FEATURE_SMU_LOW_POWER_BIT   33
+#define FEATURE_SMART_L3_RINSER_BIT 34
+#define FEATURE_GFX_DEM_BIT 35
+#define FEATURE_PSI_BIT 36
+#define FEATURE_PROCHOT_BIT 37
+#define FEATURE_CPUOFF_BIT  38
+#define FEATURE_STAPM_BIT   39
+#define FEATURE_S0I3_BIT40
+#define FEATURE_DF_LIGHT_CSTATE 41   // shift the order or 
DFCstate annd DF light Cstate
+#define FEATURE_PERF_LIMIT_BIT  42
+#define FEATURE_CORE_DLDO_BIT   43
+#define FEATURE_DVO_BIT 44
+#define FEATURE_DS_VCN_BIT  45
+#define FEATURE_CPPC_BIT46
+#define FEATURE_CPPC_PREFERRED_CORES47
+#define FEATURE_DF_CSTATES_BIT  48
+#define FEATURE_RESERVED4   49
+#define FEATURE_ATHUB_PG_BIT50
+#define FEATURE_VDDOFF_ECO_BIT  51
+#define FEATURE_ZSTATES_ECO_BIT 52
+#define FEATURE_CC6_BIT 53
+#define FEATURE_DS_UMCCLK_BIT   54
+#define FEATURE_DS_ISPCLK_BIT   55
+#define FEATURE_DS_HSPCLK_BIT   56
+#define FEATURE_RESERVED5   57
+#define FEATURE_DS_IPUCLK_BIT   58
+#define FEATURE_DS_VPECLK_BIT   59
+#define FEATURE_VPE_DPM_BIT 60

[PATCH 1/5] drm/amdgpu/swsmu: add smu v14_0_0 driver if file

2023-10-09 Thread Alex Deucher

From: Li Ma 

Add initial smu v14_0_0 driver if file

v2: squash in updates (Alex)
v3: update interface (Alex)

Signed-off-by: Li Ma 
Signed-off-by: Alex Deucher 
---
 .../inc/pmfw_if/smu14_driver_if_v14_0_0.h | 281 ++
 1 file changed, 281 insertions(+)
 create mode 100644 
drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu14_driver_if_v14_0_0.h

diff --git a/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu14_driver_if_v14_0_0.h 
b/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu14_driver_if_v14_0_0.h
new file mode 100644
index ..cb6948a0b262
--- /dev/null
+++ b/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu14_driver_if_v14_0_0.h
@@ -0,0 +1,281 @@
+/*
+ * Copyright 2023 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#ifndef SMU14_DRIVER_IF_V14_0_0_H
+#define SMU14_DRIVER_IF_V14_0_0_H
+
+// *** IMPORTANT ***
+// SMU TEAM: Always increment the interface version if
+// any structure is changed in this file
+#define PMFW_DRIVER_IF_VERSION 6
+
+typedef struct {
+  int32_t value;
+  uint32_t numFractionalBits;
+} FloatInIntFormat_t;
+
+typedef enum {
+  DSPCLK_DCFCLK = 0,
+  DSPCLK_DISPCLK,
+  DSPCLK_PIXCLK,
+  DSPCLK_PHYCLK,
+  DSPCLK_COUNT,
+} DSPCLK_e;
+
+typedef struct {
+  uint16_t Freq; // in MHz
+  uint16_t Vid;  // min voltage in SVI3 VID
+} DisplayClockTable_t;
+
+typedef struct {
+  uint16_t MinClock; // This is either DCFCLK or SOCCLK (in MHz)
+  uint16_t MaxClock; // This is either DCFCLK or SOCCLK (in MHz)
+  uint16_t MinMclk;
+  uint16_t MaxMclk;
+
+  uint8_t  WmSetting;
+  uint8_t  WmType;  // Used for normal pstate change or memory retraining
+  uint8_t  Padding[2];
+} WatermarkRowGeneric_t;
+
+#define NUM_WM_RANGES 4
+#define WM_PSTATE_CHG 0
+#define WM_RETRAINING 1
+
+typedef enum {
+  WM_SOCCLK = 0,
+  WM_DCFCLK,
+  WM_COUNT,
+} WM_CLOCK_e;
+
+typedef struct {
+  // Watermarks
+  WatermarkRowGeneric_t WatermarkRow[WM_COUNT][NUM_WM_RANGES];
+
+  uint32_t MmHubPadding[7]; // SMU internal use
+} Watermarks_t;
+
+typedef enum {
+  CUSTOM_DPM_SETTING_GFXCLK,
+  CUSTOM_DPM_SETTING_CCLK,
+  CUSTOM_DPM_SETTING_FCLK_CCX,
+  CUSTOM_DPM_SETTING_FCLK_GFX,
+  CUSTOM_DPM_SETTING_FCLK_STALLS,
+  CUSTOM_DPM_SETTING_LCLK,
+  CUSTOM_DPM_SETTING_COUNT,
+} CUSTOM_DPM_SETTING_e;
+
+typedef struct {
+  uint8_t ActiveHystLimit;
+  uint8_t IdleHystLimit;
+  uint8_t FPS;
+  uint8_t MinActiveFreqType;
+  FloatInIntFormat_t  MinActiveFreq;
+  FloatInIntFormat_t  PD_Data_limit;
+  FloatInIntFormat_t  PD_Data_time_constant;
+  FloatInIntFormat_t  PD_Data_error_coeff;
+  FloatInIntFormat_t  PD_Data_error_rate_coeff;
+} DpmActivityMonitorCoeffExt_t;
+
+typedef struct {
+  DpmActivityMonitorCoeffExt_t 
DpmActivityMonitorCoeff[CUSTOM_DPM_SETTING_COUNT];
+} CustomDpmSettings_t;
+
+#define NUM_DCFCLK_DPM_LEVELS 8
+#define NUM_DISPCLK_DPM_LEVELS8
+#define NUM_DPPCLK_DPM_LEVELS 8
+#define NUM_SOCCLK_DPM_LEVELS 8
+#define NUM_VCN_DPM_LEVELS8
+#define NUM_SOC_VOLTAGE_LEVELS8
+#define NUM_VPE_DPM_LEVELS8
+#define NUM_FCLK_DPM_LEVELS   8
+#define NUM_MEM_PSTATE_LEVELS 4
+
+
+typedef struct {
+  uint32_t UClk;
+  uint32_t MemClk;
+  uint32_t Voltage;
+  uint8_t  WckRatio;
+  uint8_t  Spare[3];
+} MemPstateTable_t;
+
+//Freq in MHz
+//Voltage in milli volts with 2 fractional bits
+typedef struct {
+  uint32_t DcfClocks[NUM_DCFCLK_DPM_LEVELS];
+  uint32_t DispClocks[NUM_DISPCLK_DPM_LEVELS];
+  uint32_t DppClocks[NUM_DPPCLK_DPM_LEVELS];
+  uint32_t SocClocks[NUM_SOCCLK_DPM_LEVELS];
+  uint32_t VClocks[NUM_VCN_DPM_LEVELS];
+  uint32_t DClocks[NUM_VCN_DPM_LEVELS];
+  uint32_t VPEClocks[NUM_VPE_DPM_LEVELS];
+  uint32_t FclkClocks_Freq[NUM_FCLK_DPM_LEVELS];
+  uint32_t FclkClocks_Voltage[NUM_FCLK_DPM_LEVELS];
+  uint32_t SocVoltage[NUM_SOC_VOLTAGE_LEVELS];
+  MemPstateTable_t MemPstateTable[NUM_MEM_PSTATE_LEVELS];
+
+  uint8_t  NumDcfClkLevelsEnabled;
+  uint8_t

[PATCH 0/5] Add SMU 14.0 support

2023-10-09 Thread Alex Deucher

This adds support for SMU 14.0 support.  SMU is
the IP block which runs PMFW and manages power
for the GPU.

Kenneth Feng (1):
  drm/amd/swsmu: add smu14 ip support

Li Ma (4):
  drm/amdgpu/swsmu: add smu v14_0_0 driver if file
  drm/amd/swsmu: add smu v14_0_0 ppsmc file
  drm/amd/swsmu: add smu v14_0_0 pmfw if file
  drm/amdgpu/discovery: add SMU 14 support

 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c |3 +
 .../gpu/drm/amd/include/kgd_pp_interface.h|1 +
 drivers/gpu/drm/amd/pm/Makefile   |1 +
 drivers/gpu/drm/amd/pm/swsmu/Makefile |2 +-
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c |   12 +
 drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h |6 +
 .../inc/pmfw_if/smu14_driver_if_v14_0_0.h |  281 +++
 .../pm/swsmu/inc/pmfw_if/smu_v14_0_0_pmfw.h   |  156 ++
 .../pm/swsmu/inc/pmfw_if/smu_v14_0_0_ppsmc.h  |  142 ++
 drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h  |7 +-
 drivers/gpu/drm/amd/pm/swsmu/inc/smu_v14_0.h  |  230 +++
 drivers/gpu/drm/amd/pm/swsmu/smu14/Makefile   |   30 +
 .../gpu/drm/amd/pm/swsmu/smu14/smu_v14_0.c| 1727 +
 .../drm/amd/pm/swsmu/smu14/smu_v14_0_0_ppt.c  | 1078 ++
 .../drm/amd/pm/swsmu/smu14/smu_v14_0_0_ppt.h  |   28 +
 15 files changed, 3701 insertions(+), 3 deletions(-)
 create mode 100644 
drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu14_driver_if_v14_0_0.h
 create mode 100644 drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu_v14_0_0_pmfw.h
 create mode 100644 drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu_v14_0_0_ppsmc.h
 create mode 100644 drivers/gpu/drm/amd/pm/swsmu/inc/smu_v14_0.h
 create mode 100644 drivers/gpu/drm/amd/pm/swsmu/smu14/Makefile
 create mode 100644 drivers/gpu/drm/amd/pm/swsmu/smu14/smu_v14_0.c
 create mode 100644 drivers/gpu/drm/amd/pm/swsmu/smu14/smu_v14_0_0_ppt.c
 create mode 100644 drivers/gpu/drm/amd/pm/swsmu/smu14/smu_v14_0_0_ppt.h

-- 
2.41.0

Re: [PATCH AUTOSEL 5.10 13/22] drm/amdgpu: install stub fence into potential unused fence pointers

2023-10-09 Thread Greg KH

On Mon, Oct 09, 2023 at 02:46:40PM +0200, Christian König wrote:
> Am 07.10.23 um 11:50 schrieb Greg KH:
> > On Sun, Sep 10, 2023 at 03:43:01PM -0500, Bryan Jennings wrote:
> > > This is also causing log spam on 5.15.  It was included in 5.15.128 as
> > > commit 4921792e04f2125b5eadef9dbe9417a8354c7eff.  I encountered this and
> > > found https://gitlab.freedesktop.org/drm/amd/-/issues/2820 while 
> > > researching
> > > the problem.
> > Confused, what should we do here?
> 
> If this patch was backported to even more older kernels then please revert
> that immediately!

It only went to 5.10 and 5.15 and has been reverted from both of them
now.

thanks,

greg k-h

Re: [PATCH v2] drm/amdgpu: Address member 'gart_placement' not described in 'amdgpu_gmc_gart_location'

2023-10-09 Thread Alex Deucher

On Mon, Oct 9, 2023 at 12:17 PM Srinivasan Shanmugam
 wrote:
>
> Fixes the below:
> drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c:274: warning: Function parameter or 
> member 'gart_placement' not described in 'amdgpu_gmc_gart_location'
>
> Cc: Christian König 
> Cc: Alex Deucher 
> Cc: "Pan, Xinhui" 
> Signed-off-by: Srinivasan Shanmugam 
> ---
>
> v2:
>  - s/in/around
>
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> index 60c81c3d29d5..47772f233a4d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> @@ -264,6 +264,7 @@ void amdgpu_gmc_sysvm_location(struct amdgpu_device 
> *adev, struct amdgpu_gmc *mc
>   *
>   * @adev: amdgpu device structure holding all necessary information
>   * @mc: memory controller structure holding memory information
> + * @gart_placement: GART placement region around VRAM

GART placement policy with respect to VRAM

WIth that fixed:
Reviewed-by: Alex Deucher 

>   *
>   * Function will place try to place GART before or after VRAM.
>   * If GART size is bigger than space left then we ajust GART size.
> --
> 2.34.1
>

[PATCH v6 6/7] drm/amd/display: Destroy DC context while keeping DML and DML2

2023-10-09 Thread Mario Limonciello

If there is memory pressure at suspend time then dynamically
allocating a large structure as part of DC suspend code will
fail.

Instead re-use the same structures and clear all members except
those that should be maintained.

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2362
Acked-by: Alex Deucher 
Signed-off-by: Mario Limonciello 
---
 drivers/gpu/drm/amd/display/dc/core/dc.c  | 37 ---
 .../gpu/drm/amd/display/dc/core/dc_resource.c | 12 ++
 2 files changed, 12 insertions(+), 37 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c 
b/drivers/gpu/drm/amd/display/dc/core/dc.c
index 074a692e1c66..fd35ab2ce3a4 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
@@ -4781,12 +4781,6 @@ bool dc_set_power_state(
struct dc *dc,
enum dc_acpi_cm_power_state power_state)
 {
-   struct kref refcount;
-   struct display_mode_lib *dml;
-#ifdef CONFIG_DRM_AMD_DC_FP
-   struct dml2_context *dml2 = NULL;
-#endif
-
if (!dc->current_state)
return true;
 
@@ -4805,40 +4799,9 @@ bool dc_set_power_state(
 
break;
default:
-#ifdef CONFIG_DRM_AMD_DC_FP
-   if (dc->debug.using_dml2)
-   dml2 = dc->current_state->bw_ctx.dml2;
-#endif
ASSERT(dc->current_state->stream_count == 0);
-   /* Zero out the current context so that on resume we start with
-* clean state, and dc hw programming optimizations will not
-* cause any trouble.
-*/
-   dml = kzalloc(sizeof(struct display_mode_lib),
-   GFP_KERNEL);
-
-   ASSERT(dml);
-   if (!dml)
-   return false;
-
-   /* Preserve refcount */
-   refcount = dc->current_state->refcount;
-   /* Preserve display mode lib */
-   memcpy(dml, >current_state->bw_ctx.dml, sizeof(struct 
display_mode_lib));
 
dc_resource_state_destruct(dc->current_state);
-   memset(dc->current_state, 0,
-   sizeof(*dc->current_state));
-
-   dc->current_state->refcount = refcount;
-   dc->current_state->bw_ctx.dml = *dml;
-
-   kfree(dml);
-
-#ifdef CONFIG_DRM_AMD_DC_FP
-   if (dc->debug.using_dml2)
-   dc->current_state->bw_ctx.dml2 = dml2;
-#endif
 
break;
}
diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
index 25562b262555..a7e49c78c187 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
@@ -4487,6 +4487,18 @@ void dc_resource_state_destruct(struct dc_state *context)
context->streams[i] = NULL;
}
context->stream_count = 0;
+   context->stream_mask = 0;
+   memset(>res_ctx, 0, sizeof(context->res_ctx));
+   memset(>pp_display_cfg, 0, sizeof(context->pp_display_cfg));
+   memset(>dcn_bw_vars, 0, sizeof(context->dcn_bw_vars));
+   context->clk_mgr = NULL;
+   memset(>bw_ctx.bw, 0, sizeof(context->bw_ctx.bw));
+   memset(context->block_sequence, 0, sizeof(context->block_sequence));
+   context->block_sequence_steps = 0;
+   memset(context->dc_dmub_cmd, 0, sizeof(context->dc_dmub_cmd));
+   context->dmub_cmd_count = 0;
+   memset(>perf_params, 0, sizeof(context->perf_params));
+   memset(>scratch, 0, sizeof(context->scratch));
 }
 
 void dc_resource_state_copy_construct(
-- 
2.34.1

[PATCH v6 7/7] drm/amd/display: make dc_set_power_state() return type `void` again

2023-10-09 Thread Mario Limonciello

As dc_set_power_state() no longer allocates memory, it's not necessary
to have return types and check return code as it can't fail anymore.

Change it back to `void`.

Reviewed-by: Alex Deucher 
Signed-off-by: Mario Limonciello 
---
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c   | 17 +
 drivers/gpu/drm/amd/display/dc/core/dc.c|  6 ++
 drivers/gpu/drm/amd/display/dc/dc.h |  2 +-
 3 files changed, 8 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 97e21bbac5cf..df0af8d1eaba 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -2667,11 +2667,6 @@ static void hpd_rx_irq_work_suspend(struct 
amdgpu_display_manager *dm)
}
 }
 
-static int dm_set_power_state(struct dc *dc, enum dc_acpi_cm_power_state 
power_state)
-{
-   return dc_set_power_state(dc, power_state) ? 0 : -ENOMEM;
-}
-
 static int dm_suspend(void *handle)
 {
struct amdgpu_device *adev = handle;
@@ -2707,7 +2702,9 @@ static int dm_suspend(void *handle)
 
hpd_rx_irq_work_suspend(dm);
 
-   return dm_set_power_state(dm->dc, DC_ACPI_CM_POWER_STATE_D3);
+   dc_set_power_state(dm->dc, DC_ACPI_CM_POWER_STATE_D3);
+
+   return 0;
 }
 
 struct drm_connector *
@@ -2901,9 +2898,7 @@ static int dm_resume(void *handle)
if (r)
DRM_ERROR("DMUB interface failed to initialize: 
status=%d\n", r);
 
-   r = dm_set_power_state(dm->dc, DC_ACPI_CM_POWER_STATE_D0);
-   if (r)
-   return r;
+   dc_set_power_state(dm->dc, DC_ACPI_CM_POWER_STATE_D0);
 
dc_resume(dm->dc);
 
@@ -2953,9 +2948,7 @@ static int dm_resume(void *handle)
}
 
/* power on hardware */
-   r = dm_set_power_state(dm->dc, DC_ACPI_CM_POWER_STATE_D0);
-   if (r)
-   return r;
+dc_set_power_state(dm->dc, DC_ACPI_CM_POWER_STATE_D0);
 
/* program HPD filter */
dc_resume(dm->dc);
diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c 
b/drivers/gpu/drm/amd/display/dc/core/dc.c
index fd35ab2ce3a4..f602ff0d4146 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
@@ -4777,12 +4777,12 @@ void dc_power_down_on_boot(struct dc *dc)
dc->hwss.power_down_on_boot(dc);
 }
 
-bool dc_set_power_state(
+void dc_set_power_state(
struct dc *dc,
enum dc_acpi_cm_power_state power_state)
 {
if (!dc->current_state)
-   return true;
+   return;
 
switch (power_state) {
case DC_ACPI_CM_POWER_STATE_D0:
@@ -4805,8 +4805,6 @@ bool dc_set_power_state(
 
break;
}
-
-   return true;
 }
 
 void dc_resume(struct dc *dc)
diff --git a/drivers/gpu/drm/amd/display/dc/dc.h 
b/drivers/gpu/drm/amd/display/dc/dc.h
index 42a18b363e00..f16912a8791d 100644
--- a/drivers/gpu/drm/amd/display/dc/dc.h
+++ b/drivers/gpu/drm/amd/display/dc/dc.h
@@ -2341,7 +2341,7 @@ void dc_notify_vsync_int_state(struct dc *dc, struct 
dc_stream_state *stream, bo
 
 /* Power Interfaces */
 
-bool dc_set_power_state(
+void dc_set_power_state(
struct dc *dc,
enum dc_acpi_cm_power_state power_state);
 void dc_resume(struct dc *dc);
-- 
2.34.1

[PATCH v6 5/7] drm/amd/display: Catch errors from drm_atomic_helper_suspend()

2023-10-09 Thread Mario Limonciello

drm_atomic_helper_suspend() can return PTR_ERR(), in which case the
error gets stored into `dm->cached_state`.  This can cause failures
during resume.  Catch the error during suspend and fail the suspend
instead.

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2362
Acked-by: Christian König 
Acked-by: Alex Deucher 
Signed-off-by: Mario Limonciello 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index bd6cb8ffda7f..97e21bbac5cf 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -2698,6 +2698,8 @@ static int dm_suspend(void *handle)
 
WARN_ON(adev->dm.cached_state);
adev->dm.cached_state = drm_atomic_helper_suspend(adev_to_drm(adev));
+   if (IS_ERR(adev->dm.cached_state))
+   return PTR_ERR(adev->dm.cached_state);
 
s3_handle_mst(adev_to_drm(adev), true);
 
-- 
2.34.1

[PATCH v6 4/7] drm/amd: Capture errors in amdgpu_switcheroo_set_state()

2023-10-09 Thread Mario Limonciello

amdgpu_switcheroo_set_state() calls lots of functions that could
fail under memory pressure or for other reasons.  Don't assume
everything can successfully run sequentially, and check return codes
for everything that returns one.

Acked-by: Christian König 
Signed-off-by: Mario Limonciello 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 36 +-
 1 file changed, 29 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 0f98f720d9ca..65a4537ee6f3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -1749,23 +1749,45 @@ static void amdgpu_switcheroo_set_state(struct pci_dev 
*pdev,
/* don't suspend or resume card normally */
dev->switch_power_state = DRM_SWITCH_POWER_CHANGING;
 
-   pci_set_power_state(pdev, PCI_D0);
-   amdgpu_device_load_pci_state(pdev);
+   r = pci_set_power_state(pdev, PCI_D0);
+   if (r) {
+   DRM_WARN("pci_set_power_state failed (%d)\n", r);
+   return;
+   }
+   if (!amdgpu_device_load_pci_state(pdev))
+   return;
r = pci_enable_device(pdev);
if (r)
DRM_WARN("pci_enable_device failed (%d)\n", r);
-   amdgpu_device_resume(dev, true);
+   r = amdgpu_device_resume(dev, true);
+   if (r) {
+   DRM_WARN("amdgpu_device_resume failed (%d)\n", r);
+   return;
+   }
 
dev->switch_power_state = DRM_SWITCH_POWER_ON;
} else {
pr_info("switched off\n");
dev->switch_power_state = DRM_SWITCH_POWER_CHANGING;
-   amdgpu_device_prepare(dev);
-   amdgpu_device_suspend(dev, true);
-   amdgpu_device_cache_pci_state(pdev);
+   r = amdgpu_device_prepare(dev);
+   if (r) {
+   DRM_WARN("amdgpu_device_prepare failed (%d)\n", r);
+   return;
+   }
+   r = amdgpu_device_suspend(dev, true);
+   if (r) {
+   DRM_WARN("amdgpu_device_suspend failed (%d)\n", r);
+   return;
+   }
+   if (!amdgpu_device_cache_pci_state(pdev))
+   return;
/* Shut down the device */
pci_disable_device(pdev);
-   pci_set_power_state(pdev, PCI_D3cold);
+   r = pci_set_power_state(pdev, PCI_D3cold);
+   if (r) {
+   DRM_WARN("pci_set_power_state failed (%d)\n", r);
+   return;
+   }
dev->switch_power_state = DRM_SWITCH_POWER_OFF;
}
 }
-- 
2.34.1

[PATCH v6 3/7] drm/amd: Split up UVD suspend into prepare and suspend steps

2023-10-09 Thread Mario Limonciello

amdgpu_uvd_suspend() allocates memory and copies objects into that
allocated memory.  This fails under memory pressure.  Instead move
majority of this code into a prepare step when swap can still be
allocated.

Reviewed-by: Christian König 
Signed-off-by: Mario Limonciello 
---
v5->v6:
 * Drop spurious newlines
 * add tag
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c | 12 
 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.h |  1 +
 drivers/gpu/drm/amd/amdgpu/uvd_v3_1.c   |  8 
 drivers/gpu/drm/amd/amdgpu/uvd_v4_2.c   |  8 
 drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c   |  8 
 drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c   |  8 
 drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c   |  8 
 7 files changed, 49 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
index b7441654e6fa..815b7c34ed33 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
@@ -418,12 +418,11 @@ int amdgpu_uvd_entity_init(struct amdgpu_device *adev)
return 0;
 }
 
-int amdgpu_uvd_suspend(struct amdgpu_device *adev)
+int amdgpu_uvd_prepare_suspend(struct amdgpu_device *adev)
 {
unsigned int size;
void *ptr;
int i, j, idx;
-   bool in_ras_intr = amdgpu_ras_intr_triggered();
 
cancel_delayed_work_sync(>uvd.idle_work);
 
@@ -452,7 +451,7 @@ int amdgpu_uvd_suspend(struct amdgpu_device *adev)
 
if (drm_dev_enter(adev_to_drm(adev), )) {
/* re-write 0 since err_event_athub will corrupt VCPU 
buffer */
-   if (in_ras_intr)
+   if (amdgpu_ras_intr_triggered())
memset(adev->uvd.inst[j].saved_bo, 0, size);
else
memcpy_fromio(adev->uvd.inst[j].saved_bo, ptr, 
size);
@@ -461,7 +460,12 @@ int amdgpu_uvd_suspend(struct amdgpu_device *adev)
}
}
 
-   if (in_ras_intr)
+   return 0;
+}
+
+int amdgpu_uvd_suspend(struct amdgpu_device *adev)
+{
+   if (amdgpu_ras_intr_triggered())
DRM_WARN("UVD VCPU state may lost due to RAS 
ERREVENT_ATHUB_INTERRUPT\n");
 
return 0;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.h
index 9f89bb7cd60b..a9f342537c68 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.h
@@ -74,6 +74,7 @@ struct amdgpu_uvd {
 int amdgpu_uvd_sw_init(struct amdgpu_device *adev);
 int amdgpu_uvd_sw_fini(struct amdgpu_device *adev);
 int amdgpu_uvd_entity_init(struct amdgpu_device *adev);
+int amdgpu_uvd_prepare_suspend(struct amdgpu_device *adev);
 int amdgpu_uvd_suspend(struct amdgpu_device *adev);
 int amdgpu_uvd_resume(struct amdgpu_device *adev);
 int amdgpu_uvd_get_create_msg(struct amdgpu_ring *ring, uint32_t handle,
diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v3_1.c 
b/drivers/gpu/drm/amd/amdgpu/uvd_v3_1.c
index 5534c769b655..58a8f78c003c 100644
--- a/drivers/gpu/drm/amd/amdgpu/uvd_v3_1.c
+++ b/drivers/gpu/drm/amd/amdgpu/uvd_v3_1.c
@@ -706,6 +706,13 @@ static int uvd_v3_1_hw_fini(void *handle)
return 0;
 }
 
+static int uvd_v3_1_prepare_suspend(void *handle)
+{
+   struct amdgpu_device *adev = (struct amdgpu_device *)handle;
+
+   return amdgpu_uvd_prepare_suspend(adev);
+}
+
 static int uvd_v3_1_suspend(void *handle)
 {
int r;
@@ -806,6 +813,7 @@ static const struct amd_ip_funcs uvd_v3_1_ip_funcs = {
.sw_fini = uvd_v3_1_sw_fini,
.hw_init = uvd_v3_1_hw_init,
.hw_fini = uvd_v3_1_hw_fini,
+   .prepare_suspend = uvd_v3_1_prepare_suspend,
.suspend = uvd_v3_1_suspend,
.resume = uvd_v3_1_resume,
.is_idle = uvd_v3_1_is_idle,
diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v4_2.c 
b/drivers/gpu/drm/amd/amdgpu/uvd_v4_2.c
index c108b8381795..d3b1e31f5450 100644
--- a/drivers/gpu/drm/amd/amdgpu/uvd_v4_2.c
+++ b/drivers/gpu/drm/amd/amdgpu/uvd_v4_2.c
@@ -220,6 +220,13 @@ static int uvd_v4_2_hw_fini(void *handle)
return 0;
 }
 
+static int uvd_v4_2_prepare_suspend(void *handle)
+{
+   struct amdgpu_device *adev = (struct amdgpu_device *)handle;
+
+   return amdgpu_uvd_prepare_suspend(adev);
+}
+
 static int uvd_v4_2_suspend(void *handle)
 {
int r;
@@ -756,6 +763,7 @@ static const struct amd_ip_funcs uvd_v4_2_ip_funcs = {
.sw_fini = uvd_v4_2_sw_fini,
.hw_init = uvd_v4_2_hw_init,
.hw_fini = uvd_v4_2_hw_fini,
+   .prepare_suspend = uvd_v4_2_prepare_suspend,
.suspend = uvd_v4_2_suspend,
.resume = uvd_v4_2_resume,
.is_idle = uvd_v4_2_is_idle,
diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c 
b/drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c
index d7e31e48a2b8..5a8116437abf 100644
--- a/drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c
@@ -218,6 +218,13 @@ static int uvd_v5_0_hw_fini(void *handle)

[PATCH v6 2/7] drm/amd: Add concept of running prepare_suspend() sequence for IP blocks

2023-10-09 Thread Mario Limonciello

If any IP blocks allocate memory during their hw_fini() sequence
this can cause the suspend to fail under memory pressure.  Introduce
a new phase that IP blocks can use to allocate memory before suspend
starts so that it can potentially be evicted into swap instead.

Reviewed-by: Christian König 
Signed-off-by: Mario Limonciello 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 12 +++-
 drivers/gpu/drm/amd/include/amd_shared.h   |  1 +
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 0a7cc9107f07..0f98f720d9ca 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4348,7 +4348,7 @@ static int amdgpu_device_evict_resources(struct 
amdgpu_device *adev)
 int amdgpu_device_prepare(struct drm_device *dev)
 {
struct amdgpu_device *adev = drm_to_adev(dev);
-   int r;
+   int i, r;
 
if (dev->switch_power_state == DRM_SWITCH_POWER_OFF)
return 0;
@@ -4358,6 +4358,16 @@ int amdgpu_device_prepare(struct drm_device *dev)
if (r)
return r;
 
+   for (i = 0; i < adev->num_ip_blocks; i++) {
+   if (!adev->ip_blocks[i].status.valid)
+   continue;
+   if (!adev->ip_blocks[i].version->funcs->prepare_suspend)
+   continue;
+   r = adev->ip_blocks[i].version->funcs->prepare_suspend((void 
*)adev);
+   if (r)
+   return r;
+   }
+
return 0;
 }
 
diff --git a/drivers/gpu/drm/amd/include/amd_shared.h 
b/drivers/gpu/drm/amd/include/amd_shared.h
index ce75351204bb..98e60bc868dd 100644
--- a/drivers/gpu/drm/amd/include/amd_shared.h
+++ b/drivers/gpu/drm/amd/include/amd_shared.h
@@ -299,6 +299,7 @@ struct amd_ip_funcs {
int (*hw_init)(void *handle);
int (*hw_fini)(void *handle);
void (*late_fini)(void *handle);
+   int (*prepare_suspend)(void *handle);
int (*suspend)(void *handle);
int (*resume)(void *handle);
bool (*is_idle)(void *handle);
-- 
2.34.1

[PATCH v6 1/7] drm/amd: Evict resources during PM ops prepare() callback

2023-10-09 Thread Mario Limonciello

Linux PM core has a prepare() callback run before suspend.

If the system is under high memory pressure, the resources may need
to be evicted into swap instead.  If the storage backing for swap
is offlined during the suspend() step then such a call may fail.

So move this step into prepare() to move evict majority of
resources and update all non-pmops callers to call the same callback.

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2362
Reviewed-by: Christian König 
Reviewed-by: Alex Deucher 
Signed-off-by: Mario Limonciello 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h|  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 31 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c| 10 ---
 3 files changed, 34 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 4cc78e0e4304..fdb2e9ae13e2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1409,6 +1409,7 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
 void amdgpu_driver_release_kms(struct drm_device *dev);
 
 int amdgpu_device_ip_suspend(struct amdgpu_device *adev);
+int amdgpu_device_prepare(struct drm_device *dev);
 int amdgpu_device_suspend(struct drm_device *dev, bool fbcon);
 int amdgpu_device_resume(struct drm_device *dev, bool fbcon);
 u32 amdgpu_get_vblank_counter_kms(struct drm_crtc *crtc);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index d5b81a086e69..0a7cc9107f07 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -1760,6 +1760,7 @@ static void amdgpu_switcheroo_set_state(struct pci_dev 
*pdev,
} else {
pr_info("switched off\n");
dev->switch_power_state = DRM_SWITCH_POWER_CHANGING;
+   amdgpu_device_prepare(dev);
amdgpu_device_suspend(dev, true);
amdgpu_device_cache_pci_state(pdev);
/* Shut down the device */
@@ -4335,6 +4336,31 @@ static int amdgpu_device_evict_resources(struct 
amdgpu_device *adev)
 /*
  * Suspend & resume.
  */
+/**
+ * amdgpu_device_prepare - prepare for device suspend
+ *
+ * @dev: drm dev pointer
+ *
+ * Prepare to put the hw in the suspend state (all asics).
+ * Returns 0 for success or an error on failure.
+ * Called at driver suspend.
+ */
+int amdgpu_device_prepare(struct drm_device *dev)
+{
+   struct amdgpu_device *adev = drm_to_adev(dev);
+   int r;
+
+   if (dev->switch_power_state == DRM_SWITCH_POWER_OFF)
+   return 0;
+
+   /* Evict the majority of BOs before starting suspend sequence */
+   r = amdgpu_device_evict_resources(adev);
+   if (r)
+   return r;
+
+   return 0;
+}
+
 /**
  * amdgpu_device_suspend - initiate device suspend
  *
@@ -4355,11 +4381,6 @@ int amdgpu_device_suspend(struct drm_device *dev, bool 
fbcon)
 
adev->in_suspend = true;
 
-   /* Evict the majority of BOs before grabbing the full access */
-   r = amdgpu_device_evict_resources(adev);
-   if (r)
-   return r;
-
if (amdgpu_sriov_vf(adev)) {
amdgpu_virt_fini_data_exchange(adev);
r = amdgpu_virt_request_full_gpu(adev, false);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 81affdf7c0c3..420196a17e22 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -2427,8 +2427,9 @@ static int amdgpu_pmops_prepare(struct device *dev)
/* Return a positive number here so
 * DPM_FLAG_SMART_SUSPEND works properly
 */
-   if (amdgpu_device_supports_boco(drm_dev))
-   return pm_runtime_suspended(dev);
+   if (amdgpu_device_supports_boco(drm_dev) &&
+   pm_runtime_suspended(dev))
+   return 1;
 
/* if we will not support s3 or s2i for the device
 *  then skip suspend
@@ -2437,7 +2438,7 @@ static int amdgpu_pmops_prepare(struct device *dev)
!amdgpu_acpi_is_s3_active(adev))
return 1;
 
-   return 0;
+   return amdgpu_device_prepare(drm_dev);
 }
 
 static void amdgpu_pmops_complete(struct device *dev)
@@ -2637,6 +2638,9 @@ static int amdgpu_pmops_runtime_suspend(struct device 
*dev)
if (amdgpu_device_supports_boco(drm_dev))
adev->mp1_state = PP_MP1_STATE_UNLOAD;
 
+   ret = amdgpu_device_prepare(drm_dev);
+   if (ret)
+   return ret;
ret = amdgpu_device_suspend(drm_dev, false);
if (ret) {
adev->in_runpm = false;
-- 
2.34.1

[PATCH v6 0/7] Better handle memory pressure at suspend

2023-10-09 Thread Mario Limonciello

At suspend time if there is memory pressure then dynamically allocating
memory will cause failures that don't clean up properly when trying
suspend a second time.

Move the bigger memory allocations into Linux PM prepare() callback, drop
allocations that aren't really needed in DC code and report failures
in dm_suspend() up.

v1: 
https://lore.kernel.org/amd-gfx/20230925143359.14932-1-mario.limoncie...@amd.com/
v2: 
https://lore.kernel.org/amd-gfx/20231002224449.95565-1-mario.limoncie...@amd.com/T/#mc800319a05df821cd1875234b09bf212e2e3282b
v3: 
https://lore.kernel.org/amd-gfx/20231003205437.123426-1-mario.limoncie...@amd.com/T/#m00a49b75cd2638bf8a0ebd549d6a6010bfb7328b
v4: 
https://lore.kernel.org/amd-gfx/20231004171838.168215-1-mario.limoncie...@amd.com/T/#m0921840868295ec19abbe5ecbaa0aee75356b9e1
v5: 
https://lore.kernel.org/amd-gfx/20231006185026.5536-1-mario.limoncie...@amd.com/T/#m924d5b955ff2fe933b3c483a61e703724f7014e4

v5->v6:
 * Rename prepare() to prepare_suspend()
 * Remove spurious newlines
 * Add tags
 * Fix up commit messages
v4->v5:
  * Call amdgpu_device_prepare() from other callers to amdgpu_device_suspend()
  * 3x evict calls -> 2x evict calls
  * Add IP block specific prepare path
  * Fix issue in UVD
  * Raise warnings for issues that could happen in amdgpu_switcheroo_set_state()
  * Catch problem that could happen in dm_suspend()
  * Rebase on top of DML2 series in amd-staging-drm-next
v3->v4:
 * Combine patches 1/2
 * Drop adev->in_suspend references
v2->v3:
 * Handle adev->in_suspend in prepare() and complete()
 * Add missing scratch variable in dc_resource_state_destruct()
 * Revert error code propagation in same series
v1->v2:
 * Handle DC code too
 * Add prepare callback rather than moving symbol calls

Mario Limonciello (7):
  drm/amd: Evict resources during PM ops prepare() callback
  drm/amd: Add concept of running prepare_suspend() sequence for IP
blocks
  drm/amd: Split up UVD suspend into prepare and suspend steps
  drm/amd: Capture errors in amdgpu_switcheroo_set_state()
  drm/amd/display: Catch errors from drm_atomic_helper_suspend()
  drm/amd/display: Destroy DC context while keeping DML and DML2
  drm/amd/display: make dc_set_power_state() return type `void` again

 drivers/gpu/drm/amd/amdgpu/amdgpu.h   |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c| 75 ---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c   | 10 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c   | 12 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.h   |  1 +
 drivers/gpu/drm/amd/amdgpu/uvd_v3_1.c |  8 ++
 drivers/gpu/drm/amd/amdgpu/uvd_v4_2.c |  8 ++
 drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c |  8 ++
 drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c |  8 ++
 drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c |  8 ++
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 19 ++---
 drivers/gpu/drm/amd/display/dc/core/dc.c  | 43 +--
 .../gpu/drm/amd/display/dc/core/dc_resource.c | 12 +++
 drivers/gpu/drm/amd/display/dc/dc.h   |  2 +-
 drivers/gpu/drm/amd/include/amd_shared.h  |  1 +
 15 files changed, 144 insertions(+), 72 deletions(-)

-- 
2.34.1

[PATCH v2] drm/amdgpu: Address member 'gart_placement' not described in 'amdgpu_gmc_gart_location'

2023-10-09 Thread Srinivasan Shanmugam

Fixes the below:
drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c:274: warning: Function parameter or 
member 'gart_placement' not described in 'amdgpu_gmc_gart_location'

Cc: Christian König 
Cc: Alex Deucher 
Cc: "Pan, Xinhui" 
Signed-off-by: Srinivasan Shanmugam 
---

v2:
 - s/in/around

 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
index 60c81c3d29d5..47772f233a4d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -264,6 +264,7 @@ void amdgpu_gmc_sysvm_location(struct amdgpu_device *adev, 
struct amdgpu_gmc *mc
  *
  * @adev: amdgpu device structure holding all necessary information
  * @mc: memory controller structure holding memory information
+ * @gart_placement: GART placement region around VRAM
  *
  * Function will place try to place GART before or after VRAM.
  * If GART size is bigger than space left then we ajust GART size.
-- 
2.34.1

RE: [PATCH 3/3] drm/amdgpu/umsch: enable doorbell for umsch

2023-10-09 Thread Gopalakrishnan, Veerabadhran (Veera)

[AMD Official Use Only - General]

Reviewed-by: Veerabadhran Gopalakrishnan 

-Veera

-Original Message-
From: Yu, Lang 
Sent: Saturday, October 7, 2023 2:24 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Zhang, Yifan 
; Gopalakrishnan, Veerabadhran (Veera) 
; Yu, Lang 
Subject: [PATCH 3/3] drm/amdgpu/umsch: enable doorbell for umsch

Program vcn_doorbell_range with vcn_ring0_1.

Signed-off-by: Lang Yu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c | 2 +-
 drivers/gpu/drm/amd/amdgpu/umsch_mm_v4_0.c   | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c
index 9d89c4186989..4bd076e9e367 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c
@@ -554,7 +554,7 @@ int amdgpu_umsch_mm_ring_init(struct amdgpu_umsch_mm *umsch)
struct amdgpu_ring *ring = >ring;

ring->vm_hub = AMDGPU_MMHUB0(0);
-   ring->use_doorbell = 0;
+   ring->use_doorbell = true;
ring->no_scheduler = true;
ring->doorbell_index = (AMDGPU_NAVI10_DOORBELL64_VCN0_1 << 1) + 6;

diff --git a/drivers/gpu/drm/amd/amdgpu/umsch_mm_v4_0.c 
b/drivers/gpu/drm/amd/amdgpu/umsch_mm_v4_0.c
index 7e79954c833b..17c73aaa1e82 100644
--- a/drivers/gpu/drm/amd/amdgpu/umsch_mm_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/umsch_mm_v4_0.c
@@ -217,7 +217,8 @@ static int umsch_mm_v4_0_ring_start(struct amdgpu_umsch_mm 
*umsch)
data = REG_SET_FIELD(data, VCN_UMSCH_RB_DB_CTRL, EN, 1);
WREG32_SOC15(VCN, 0, regVCN_UMSCH_RB_DB_CTRL, data);

-   adev->nbio.funcs->vcn_doorbell_range(adev, ring->use_doorbell, 
ring->doorbell_index, 0);
+   adev->nbio.funcs->vcn_doorbell_range(adev, ring->use_doorbell,
+   (adev->doorbell_index.vcn.vcn_ring0_1 << 1), 0);

WREG32_SOC15(VCN, 0, regVCN_UMSCH_RB_BASE_LO, 
lower_32_bits(ring->gpu_addr));
WREG32_SOC15(VCN, 0, regVCN_UMSCH_RB_BASE_HI, 
upper_32_bits(ring->gpu_addr));
--
2.25.1

RE: [PATCH 2/3] drm/amdgpu/umsch: power on/off UMSCH by DLDO

2023-10-09 Thread Gopalakrishnan, Veerabadhran (Veera)

[AMD Official Use Only - General]

Reviewed-by: Veerabadhran Gopalakrishnan 

-Veera

-Original Message-
From: Yu, Lang 
Sent: Saturday, October 7, 2023 2:24 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Zhang, Yifan 
; Gopalakrishnan, Veerabadhran (Veera) 
; Yu, Lang 
Subject: [PATCH 2/3] drm/amdgpu/umsch: power on/off UMSCH by DLDO

VCN 4.0.5 uses DLDO.

Signed-off-by: Lang Yu 
---
 drivers/gpu/drm/amd/amdgpu/umsch_mm_v4_0.c | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/umsch_mm_v4_0.c 
b/drivers/gpu/drm/amd/amdgpu/umsch_mm_v4_0.c
index a60178156c77..7e79954c833b 100644
--- a/drivers/gpu/drm/amd/amdgpu/umsch_mm_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/umsch_mm_v4_0.c
@@ -34,6 +34,16 @@
 #include "umsch_mm_4_0_api_def.h"
 #include "umsch_mm_v4_0.h"

+#define regUVD_IPX_DLDO_CONFIG 0x0064
+#define regUVD_IPX_DLDO_CONFIG_BASE_IDX1
+#define regUVD_IPX_DLDO_STATUS 0x0065
+#define regUVD_IPX_DLDO_STATUS_BASE_IDX1
+
+#define UVD_IPX_DLDO_CONFIG__ONO0_PWR_CONFIG__SHIFT0x0002
+#define UVD_IPX_DLDO_CONFIG__ONO0_PWR_CONFIG_MASK  0x000cUL
+#define UVD_IPX_DLDO_STATUS__ONO0_PWR_STATUS__SHIFT0x0001
+#define UVD_IPX_DLDO_STATUS__ONO0_PWR_STATUS_MASK  0x0002UL
+
 static int umsch_mm_v4_0_load_microcode(struct amdgpu_umsch_mm *umsch)  {
struct amdgpu_device *adev = umsch->ring.adev; @@ -50,6 +60,14 @@ 
static int umsch_mm_v4_0_load_microcode(struct amdgpu_umsch_mm *umsch)

umsch->cmd_buf_curr_ptr = umsch->cmd_buf_ptr;

+   if (adev->ip_versions[VCN_HWIP][0] == IP_VERSION(4, 0, 5)) {
+   WREG32_SOC15(VCN, 0, regUVD_IPX_DLDO_CONFIG,
+   1 << UVD_IPX_DLDO_CONFIG__ONO0_PWR_CONFIG__SHIFT);
+   SOC15_WAIT_ON_RREG(VCN, 0, regUVD_IPX_DLDO_STATUS,
+   0 << UVD_IPX_DLDO_STATUS__ONO0_PWR_STATUS__SHIFT,
+   UVD_IPX_DLDO_STATUS__ONO0_PWR_STATUS_MASK);
+   }
+
data = RREG32_SOC15(VCN, 0, regUMSCH_MES_RESET_CTRL);
data = REG_SET_FIELD(data, UMSCH_MES_RESET_CTRL, MES_CORE_SOFT_RESET, 
0);
WREG32_SOC15_UMSCH(regUMSCH_MES_RESET_CTRL, data); @@ -229,6 +247,14 @@ 
static int umsch_mm_v4_0_ring_stop(struct amdgpu_umsch_mm *umsch)
data = REG_SET_FIELD(data, VCN_UMSCH_RB_DB_CTRL, EN, 0);
WREG32_SOC15(VCN, 0, regVCN_UMSCH_RB_DB_CTRL, data);

+   if (adev->ip_versions[VCN_HWIP][0] == IP_VERSION(4, 0, 5)) {
+   WREG32_SOC15(VCN, 0, regUVD_IPX_DLDO_CONFIG,
+   2 << UVD_IPX_DLDO_CONFIG__ONO0_PWR_CONFIG__SHIFT);
+   SOC15_WAIT_ON_RREG(VCN, 0, regUVD_IPX_DLDO_STATUS,
+   1 << UVD_IPX_DLDO_STATUS__ONO0_PWR_STATUS__SHIFT,
+   UVD_IPX_DLDO_STATUS__ONO0_PWR_STATUS_MASK);
+   }
+
return 0;
 }

--
2.25.1

RE: [PATCH 1/3] drm/amdgpu/umsch: fix psp frontdoor loading

2023-10-09 Thread Gopalakrishnan, Veerabadhran (Veera)

[AMD Official Use Only - General]

Reviewed-by: Veerabadhran Gopalakrishnan 

-Veera

-Original Message-
From: Yu, Lang 
Sent: Saturday, October 7, 2023 2:24 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Zhang, Yifan 
; Gopalakrishnan, Veerabadhran (Veera) 
; Yu, Lang 
Subject: [PATCH 1/3] drm/amdgpu/umsch: fix psp frontdoor loading

These changes are missed in rebase.

Signed-off-by: Lang Yu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c |  35 +++---  
drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.h |  18 +++-
 drivers/gpu/drm/amd/amdgpu/umsch_mm_v4_0.c   | 107 ---
 3 files changed, 72 insertions(+), 88 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c
index aeff9926412f..9d89c4186989 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c
@@ -75,17 +75,6 @@ struct umsch_mm_test {
uint32_tnum_queues;
 };

-int umsch_mm_psp_update_sram(struct amdgpu_device *adev, u32 ucode_size) -{
-   struct amdgpu_firmware_info ucode = {
-   .ucode_id = AMDGPU_UCODE_ID_UMSCH_MM_CMD_BUFFER,
-   .mc_addr = adev->umsch_mm.cmd_buf_gpu_addr,
-   .ucode_size = ucode_size,
-   };
-
-   return psp_execute_ip_fw_load(>psp, );
-}
-
 static int map_ring_data(struct amdgpu_device *adev, struct amdgpu_vm *vm,
  struct amdgpu_bo *bo, struct amdgpu_bo_va **bo_va,
  uint64_t addr, uint32_t size)
@@ -694,15 +683,17 @@ int amdgpu_umsch_mm_allocate_ucode_data_buffer(struct 
amdgpu_umsch_mm *umsch)
return 0;
 }

-void* amdgpu_umsch_mm_add_cmd(struct amdgpu_umsch_mm *umsch,
- void* cmd_ptr, uint32_t reg_offset, uint32_t 
reg_data)
+int amdgpu_umsch_mm_psp_execute_cmd_buf(struct amdgpu_umsch_mm *umsch)
 {
-   uint32_t* ptr = (uint32_t *)cmd_ptr;
-
-   *ptr++ = (reg_offset << 2);
-   *ptr++ = reg_data;
+   struct amdgpu_device *adev = umsch->ring.adev;
+   struct amdgpu_firmware_info ucode = {
+   .ucode_id = AMDGPU_UCODE_ID_UMSCH_MM_CMD_BUFFER,
+   .mc_addr = adev->umsch_mm.cmd_buf_gpu_addr,
+   .ucode_size = ((uintptr_t)adev->umsch_mm.cmd_buf_curr_ptr -
+ (uintptr_t)adev->umsch_mm.cmd_buf_ptr),
+   };

-   return ptr;
+   return psp_execute_ip_fw_load(>psp, );
 }

 static void umsch_mm_agdb_index_init(struct amdgpu_device *adev) @@ -824,11 
+815,9 @@ static int umsch_mm_hw_init(void *handle)
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
int r;

-   if (adev->firmware.load_type == AMDGPU_FW_LOAD_DIRECT) {
-   r = umsch_mm_load_microcode(>umsch_mm);
-   if (r)
-   return r;
-   }
+   r = umsch_mm_load_microcode(>umsch_mm);
+   if (r)
+   return r;

umsch_mm_ring_start(>umsch_mm);

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.h
index d83fdf2da464..8258a43a6236 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.h
@@ -150,6 +150,7 @@ struct amdgpu_umsch_mm {
struct amdgpu_bo*cmd_buf_obj;
uint64_tcmd_buf_gpu_addr;
uint32_t*cmd_buf_ptr;
+   uint32_t*cmd_buf_curr_ptr;

uint32_twb_index;
uint64_tsch_ctx_gpu_addr;
@@ -167,19 +168,28 @@ struct amdgpu_umsch_mm {
struct mutexmutex_hidden;
 };

-int umsch_mm_psp_update_sram(struct amdgpu_device *adev, u32 ucode_size);
-
 int amdgpu_umsch_mm_submit_pkt(struct amdgpu_umsch_mm *umsch, void *pkt, int 
ndws);  int amdgpu_umsch_mm_query_fence(struct amdgpu_umsch_mm *umsch);

 int amdgpu_umsch_mm_init_microcode(struct amdgpu_umsch_mm *umsch);  int 
amdgpu_umsch_mm_allocate_ucode_buffer(struct amdgpu_umsch_mm *umsch);  int 
amdgpu_umsch_mm_allocate_ucode_data_buffer(struct amdgpu_umsch_mm *umsch);
-void* amdgpu_umsch_mm_add_cmd(struct amdgpu_umsch_mm *umsch,
- void* cmd_ptr, uint32_t reg_offset, uint32_t 
reg_data);
+
+int amdgpu_umsch_mm_psp_execute_cmd_buf(struct amdgpu_umsch_mm *umsch);

 int amdgpu_umsch_mm_ring_init(struct amdgpu_umsch_mm *umsch);

+#define WREG32_SOC15_UMSCH(reg, value) 
\
+   do {
\
+   uint32_t reg_offset = 
adev->reg_offset[VCN_HWIP][0][reg##_BASE_IDX] + reg;  \
+   if (adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) {   
\
+   *adev->umsch_mm.cmd_buf_curr_ptr++ = (reg_offset << 2); 
\
+

[PATCH] drm/amdgpu: Address member 'gart_placement' not described in 'amdgpu_gmc_gart_location'

2023-10-09 Thread Srinivasan Shanmugam

Fixes the below:
drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c:274: warning: Function parameter or 
member 'gart_placement' not described in 'amdgpu_gmc_gart_location'

Cc: Christian König 
Cc: Alex Deucher 
Cc: "Pan, Xinhui" 
Signed-off-by: Srinivasan Shanmugam 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
index 60c81c3d29d5..8864bdc63dba 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -264,6 +264,7 @@ void amdgpu_gmc_sysvm_location(struct amdgpu_device *adev, 
struct amdgpu_gmc *mc
  *
  * @adev: amdgpu device structure holding all necessary information
  * @mc: memory controller structure holding memory information
+ * @gart_placement: GART placement region in VRAM
  *
  * Function will place try to place GART before or after VRAM.
  * If GART size is bigger than space left then we ajust GART size.
-- 
2.34.1

[PATCH v3 2/3] drm/amd/pm: Add gpu_metrics_v1_4

2023-10-09 Thread Asad Kamal

Add new gpu_metrics_v1_4 to acquire XGMI data transfer,
pcie bandwidth & Clock lock status

v2:
Add pcie error counter to gpu metric table v1_4

Signed-off-by: Asad Kamal 
---
 .../gpu/drm/amd/include/kgd_pp_interface.h| 76 +++
 drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c|  3 +
 2 files changed, 79 insertions(+)

diff --git a/drivers/gpu/drm/amd/include/kgd_pp_interface.h 
b/drivers/gpu/drm/amd/include/kgd_pp_interface.h
index e0bb6d39f0c3..eeb547d79eac 100644
--- a/drivers/gpu/drm/amd/include/kgd_pp_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_pp_interface.h
@@ -313,6 +313,10 @@ enum pp_xgmi_plpd_mode {
 #define XGMI_MODE_PSTATE_D0 1
 
 #define NUM_HBM_INSTANCES 4
+#define NUM_XGMI_LINKS 8
+#define MAX_GFX_CLKS 8
+#define MAX_CLKS 4
+#define NUM_VCN 4
 
 struct seq_file;
 enum amd_pp_clock_type;
@@ -696,6 +700,78 @@ struct gpu_metrics_v1_3 {
uint64_tindep_throttle_status;
 };
 
+struct gpu_metrics_v1_4 {
+   struct metrics_table_header common_header;
+
+   /* Temperature (Celsius) */
+   uint16_ttemperature_hotspot;
+   uint16_ttemperature_mem;
+   uint16_ttemperature_vrsoc;
+
+   /* Power (Watts) */
+   uint16_tcurr_socket_power;
+
+   /* Utilization (%) */
+   uint16_taverage_gfx_activity;
+   uint16_taverage_umc_activity; // memory 
controller
+   uint16_tvcn_activity[NUM_VCN];
+
+   /* Energy (15.259uJ (2^-16) units) */
+   uint64_tenergy_accumulator;
+
+   /* Driver attached timestamp (in ns) */
+   uint64_tsystem_clock_counter;
+
+   /* Throttle status */
+   uint32_tthrottle_status;
+
+   /* Clock Lock Status. Each bit corresponds to clock instance */
+   uint32_tgfxclk_lock_status;
+
+   /* Link width (number of lanes) and speed (in 0.1 GT/s) */
+   uint16_tpcie_link_width;
+   uint16_tpcie_link_speed;
+
+   /* XGMI bus width and bitrate (in Gbps) */
+   uint16_txgmi_link_width;
+   uint16_txgmi_link_speed;
+
+   /* Utilization Accumulated (%) */
+   uint32_tgfx_activity_acc;
+   uint32_tmem_activity_acc;
+
+   /*PCIE accumulated bandwidth (GB/sec) */
+   uint64_tpcie_bandwidth_acc;
+
+   /*PCIE instantaneous bandwidth (GB/sec) */
+   uint64_tpcie_bandwidth_inst;
+
+   /* PCIE L0 to recovery state transition accumulated count */
+   uint64_tpcie_l0_to_recov_count_acc;
+
+   /* PCIE replay accumulated count */
+   uint64_tpcie_replay_count_acc;
+
+   /* PCIE replay rollover accumulated count */
+   uint64_tpcie_replay_rover_count_acc;
+
+   /* XGMI accumulated data transfer size(KiloBytes) */
+   uint64_txgmi_read_data_acc[NUM_XGMI_LINKS];
+   uint64_txgmi_write_data_acc[NUM_XGMI_LINKS];
+
+   /* PMFW attached timestamp (10ns resolution) */
+   uint64_tfirmware_timestamp;
+
+   /* Current clocks (Mhz) */
+   uint16_tcurrent_gfxclk[MAX_GFX_CLKS];
+   uint16_tcurrent_socclk[MAX_CLKS];
+   uint16_tcurrent_vclk0[MAX_CLKS];
+   uint16_tcurrent_dclk0[MAX_CLKS];
+   uint16_tcurrent_uclk;
+
+   uint16_tpadding;
+};
+
 /*
  * gpu_metrics_v2_0 is not recommended as it's not naturally aligned.
  * Use gpu_metrics_v2_1 or later instead.
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
index 12618a583e97..6e57c94379a9 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
@@ -986,6 +986,9 @@ void smu_cmn_init_soft_gpu_metrics(void *table, uint8_t 
frev, uint8_t crev)
case METRICS_VERSION(1, 3):
structure_size = sizeof(struct gpu_metrics_v1_3);
break;
+   case METRICS_VERSION(1, 4):
+   structure_size = sizeof(struct gpu_metrics_v1_4);
+   break;
case METRICS_VERSION(2, 0):
structure_size = sizeof(struct gpu_metrics_v2_0);
break;
-- 
2.42.0

[PATCH v3 3/3] drm/amd/pm: Use gpu_metrics_v1_4 for SMUv13.0.6

2023-10-09 Thread Asad Kamal

Use gpu_metrics_v1_4 for SMUv13.0.6 to fill
gpu metric info

v3: Removed filling gpu metric instantaneous
pcie bw

Signed-off-by: Asad Kamal 
---
 .../drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c  | 65 ---
 1 file changed, 41 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
index ce971a93d28b..7ab73112e4f3 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
@@ -279,7 +279,7 @@ static int smu_v13_0_6_tables_init(struct smu_context *smu)
return -ENOMEM;
smu_table->metrics_time = 0;
 
-   smu_table->gpu_metrics_table_size = sizeof(struct gpu_metrics_v1_3);
+   smu_table->gpu_metrics_table_size = sizeof(struct gpu_metrics_v1_4);
smu_table->gpu_metrics_table =
kzalloc(smu_table->gpu_metrics_table_size, GFP_KERNEL);
if (!smu_table->gpu_metrics_table) {
@@ -1969,22 +1969,19 @@ static int 
smu_v13_0_6_get_current_pcie_link_speed(struct smu_context *smu)
 static ssize_t smu_v13_0_6_get_gpu_metrics(struct smu_context *smu, void 
**table)
 {
struct smu_table_context *smu_table = >smu_table;
-   struct gpu_metrics_v1_3 *gpu_metrics =
-   (struct gpu_metrics_v1_3 *)smu_table->gpu_metrics_table;
+   struct gpu_metrics_v1_4 *gpu_metrics =
+   (struct gpu_metrics_v1_4 *)smu_table->gpu_metrics_table;
struct amdgpu_device *adev = smu->adev;
-   int ret = 0, inst0, xcc0;
+   int ret = 0, xcc_id, inst, i;
MetricsTable_t *metrics;
u16 link_width_level;
 
-   inst0 = adev->sdma.instance[0].aid_id;
-   xcc0 = GET_INST(GC, 0);
-
metrics = kzalloc(sizeof(MetricsTable_t), GFP_KERNEL);
ret = smu_v13_0_6_get_metrics_table(smu, metrics, true);
if (ret)
return ret;
 
-   smu_cmn_init_soft_gpu_metrics(gpu_metrics, 1, 3);
+   smu_cmn_init_soft_gpu_metrics(gpu_metrics, 1, 4);
 
gpu_metrics->temperature_hotspot =
SMUQ10_ROUND(metrics->MaxSocketTemperature);
@@ -2000,30 +1997,38 @@ static ssize_t smu_v13_0_6_get_gpu_metrics(struct 
smu_context *smu, void **table
gpu_metrics->average_umc_activity =
SMUQ10_ROUND(metrics->DramBandwidthUtilization);
 
-   gpu_metrics->average_socket_power =
+   gpu_metrics->curr_socket_power =
SMUQ10_ROUND(metrics->SocketPower);
/* Energy counter reported in 15.259uJ (2^-16) units */
gpu_metrics->energy_accumulator = metrics->SocketEnergyAcc;
 
-   gpu_metrics->current_gfxclk =
-   SMUQ10_ROUND(metrics->GfxclkFrequency[xcc0]);
-   gpu_metrics->current_socclk =
-   SMUQ10_ROUND(metrics->SocclkFrequency[inst0]);
-   gpu_metrics->current_uclk = SMUQ10_ROUND(metrics->UclkFrequency);
-   gpu_metrics->current_vclk0 =
-   SMUQ10_ROUND(metrics->VclkFrequency[inst0]);
-   gpu_metrics->current_dclk0 =
-   SMUQ10_ROUND(metrics->DclkFrequency[inst0]);
+   for (i = 0; i < MAX_GFX_CLKS; i++) {
+   xcc_id = GET_INST(GC, i);
+   if (xcc_id >= 0)
+   gpu_metrics->current_gfxclk[i] =
+   SMUQ10_ROUND(metrics->GfxclkFrequency[xcc_id]);
+
+   if (i < MAX_CLKS) {
+   gpu_metrics->current_socclk[i] =
+   SMUQ10_ROUND(metrics->SocclkFrequency[i]);
+   inst = GET_INST(VCN, i);
+   if (inst >= 0) {
+   gpu_metrics->current_vclk0[i] =
+   
SMUQ10_ROUND(metrics->VclkFrequency[inst]);
+   gpu_metrics->current_dclk0[i] =
+   
SMUQ10_ROUND(metrics->DclkFrequency[inst]);
+   }
+   }
+   }
 
-   gpu_metrics->average_gfxclk_frequency = gpu_metrics->current_gfxclk;
-   gpu_metrics->average_socclk_frequency = gpu_metrics->current_socclk;
-   gpu_metrics->average_uclk_frequency = gpu_metrics->current_uclk;
-   gpu_metrics->average_vclk0_frequency = gpu_metrics->current_vclk0;
-   gpu_metrics->average_dclk0_frequency = gpu_metrics->current_dclk0;
+   gpu_metrics->current_uclk = SMUQ10_ROUND(metrics->UclkFrequency);
 
/* Throttle status is not reported through metrics now */
gpu_metrics->throttle_status = 0;
 
+   /* Clock Lock Status. Each bit corresponds to each GFXCLK instance */
+   gpu_metrics->gfxclk_lock_status = metrics->GfxLockXCDMak >> 
GET_INST(GC, 0);
+
if (!(adev->flags & AMD_IS_APU)) {
link_width_level = 
smu_v13_0_6_get_current_pcie_link_width_level(smu);
if (link_width_level > MAX_LINK_WIDTH)
@@ -2033,6 +2038,8 @@ static ssize_t smu_v13_0_6_get_gpu_metrics(struct 
smu_context *smu,

[PATCH v3 1/3] drm/amd/pm: Update metric table for smu v13_0_6

2023-10-09 Thread Asad Kamal

Update pmfw metric table to include xgmi transfer
data and pci instantaneous bandwidth for smu v13_0_6

v2:
Updated metric table version

v3: Removed inst pcie bw with alignment to metrics table
version 8

Signed-off-by: Asad Kamal 
---
 .../gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu_v13_0_6_pmfw.h| 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu_v13_0_6_pmfw.h 
b/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu_v13_0_6_pmfw.h
index 9be4051c0865..66edd839c7ec 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu_v13_0_6_pmfw.h
+++ b/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu_v13_0_6_pmfw.h
@@ -123,7 +123,7 @@ typedef enum {
   VOLTAGE_GUARDBAND_COUNT
 } GFX_GUARDBAND_e;
 
-#define SMU_METRICS_TABLE_VERSION 0x7
+#define SMU_METRICS_TABLE_VERSION 0x8
 
 typedef struct __attribute__((packed, aligned(4))) {
   uint32_t AccumulationCounter;
@@ -207,6 +207,11 @@ typedef struct __attribute__((packed, aligned(4))) {
   uint64_t PublicSerialNumber_AID[4];
   uint64_t PublicSerialNumber_XCD[8];
   uint64_t PublicSerialNumber_CCD[12];
+
+  //XGMI Data tranfser size
+  uint64_t XgmiReadDataSizeAcc[8];//in KByte
+  uint64_t XgmiWriteDataSizeAcc[8];//in KByte
+
 } MetricsTable_t;
 
 #define SMU_VF_METRICS_TABLE_VERSION 0x3
-- 
2.42.0

Re: [PATCH v5 6/7] drm/amd/display: Destroy DC context while keeping DML and DML2

2023-10-09 Thread Alex Deucher

On Sat, Oct 7, 2023 at 3:47 AM Mario Limonciello
 wrote:
>
> If there is memory pressure at suspend time then dynamically
> allocating a large structure as part of DC suspend code will
> fail.
>
> Instead re-use the same structures and clear all members except
> those that should be maintained.
>
> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2362
> Signed-off-by: Mario Limonciello 

Acked-by: Alex Deucher 

> ---
> v4->v5:
>  * Rebase for DML2
> ---
>  drivers/gpu/drm/amd/display/dc/core/dc.c  | 37 ---
>  .../gpu/drm/amd/display/dc/core/dc_resource.c | 12 ++
>  2 files changed, 12 insertions(+), 37 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c 
> b/drivers/gpu/drm/amd/display/dc/core/dc.c
> index 2436a293931b..55e7b5a8ec8e 100644
> --- a/drivers/gpu/drm/amd/display/dc/core/dc.c
> +++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
> @@ -4704,12 +4704,6 @@ bool dc_set_power_state(
> struct dc *dc,
> enum dc_acpi_cm_power_state power_state)
>  {
> -   struct kref refcount;
> -   struct display_mode_lib *dml;
> -#ifdef CONFIG_DRM_AMD_DC_FP
> -   struct dml2_context *dml2 = NULL;
> -#endif
> -
> if (!dc->current_state)
> return true;
>
> @@ -4728,40 +4722,9 @@ bool dc_set_power_state(
>
> break;
> default:
> -#ifdef CONFIG_DRM_AMD_DC_FP
> -   if (dc->debug.using_dml2)
> -   dml2 = dc->current_state->bw_ctx.dml2;
> -#endif
> ASSERT(dc->current_state->stream_count == 0);
> -   /* Zero out the current context so that on resume we start 
> with
> -* clean state, and dc hw programming optimizations will not
> -* cause any trouble.
> -*/
> -   dml = kzalloc(sizeof(struct display_mode_lib),
> -   GFP_KERNEL);
> -
> -   ASSERT(dml);
> -   if (!dml)
> -   return false;
> -
> -   /* Preserve refcount */
> -   refcount = dc->current_state->refcount;
> -   /* Preserve display mode lib */
> -   memcpy(dml, >current_state->bw_ctx.dml, sizeof(struct 
> display_mode_lib));
>
> dc_resource_state_destruct(dc->current_state);
> -   memset(dc->current_state, 0,
> -   sizeof(*dc->current_state));
> -
> -   dc->current_state->refcount = refcount;
> -   dc->current_state->bw_ctx.dml = *dml;
> -
> -   kfree(dml);
> -
> -#ifdef CONFIG_DRM_AMD_DC_FP
> -   if (dc->debug.using_dml2)
> -   dc->current_state->bw_ctx.dml2 = dml2;
> -#endif
>
> break;
> }
> diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c 
> b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
> index c1e2f0e10ab2..e2c7acdff301 100644
> --- a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
> +++ b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
> @@ -4357,6 +4357,18 @@ void dc_resource_state_destruct(struct dc_state 
> *context)
> context->streams[i] = NULL;
> }
> context->stream_count = 0;
> +   context->stream_mask = 0;
> +   memset(>res_ctx, 0, sizeof(context->res_ctx));
> +   memset(>pp_display_cfg, 0, sizeof(context->pp_display_cfg));
> +   memset(>dcn_bw_vars, 0, sizeof(context->dcn_bw_vars));
> +   context->clk_mgr = NULL;
> +   memset(>bw_ctx.bw, 0, sizeof(context->bw_ctx.bw));
> +   memset(context->block_sequence, 0, sizeof(context->block_sequence));
> +   context->block_sequence_steps = 0;
> +   memset(context->dc_dmub_cmd, 0, sizeof(context->dc_dmub_cmd));
> +   context->dmub_cmd_count = 0;
> +   memset(>perf_params, 0, sizeof(context->perf_params));
> +   memset(>scratch, 0, sizeof(context->scratch));
>  }
>
>  void dc_resource_state_copy_construct(
> --
> 2.34.1
>

Re: [PATCH v5 5/7] drm/amd/display: Catch errors from drm_atomic_helper_suspend()

2023-10-09 Thread Alex Deucher

On Sun, Oct 8, 2023 at 6:57 PM Mario Limonciello
 wrote:
>
> drm_atomic_helper_suspend() can return PTR_ERR(), in which case the
> error gets stored into `dm->cached_state`.  This can cause failures
> during resume.  Catch the error during suspend and fail the suspend
> instead.
>
> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2362
> Signed-off-by: Mario Limonciello 

Acked-by: Alex Deucher 

> ---
> v4->v5:
>  * New patch
> ---
>  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> index a59a11ae42db..63944d3b9e8c 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> @@ -2716,6 +2716,8 @@ static int dm_suspend(void *handle)
>
> WARN_ON(adev->dm.cached_state);
> adev->dm.cached_state = drm_atomic_helper_suspend(adev_to_drm(adev));
> +   if (IS_ERR(adev->dm.cached_state))
> +   return PTR_ERR(adev->dm.cached_state);
>
> s3_handle_mst(adev_to_drm(adev), true);
>
> --
> 2.34.1
>

Re: [PATCH v5 3/7] drm/amd: Split up UVD suspend into prepare and suspend steps

2023-10-09 Thread Alex Deucher

On Fri, Oct 6, 2023 at 3:07 PM Mario Limonciello
 wrote:
>
> amdgpu_uvd_suspend() allocates memory and copies objects into that
> allocated memory.  This fails under memory pressure.  Instead move
> majority of this code into a prepare step when swap can still be
> allocated.
>
> Signed-off-by: Mario Limonciello 
> ---
> v4->v5:
>  * New patch
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c | 12 
>  drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.h |  1 +
>  drivers/gpu/drm/amd/amdgpu/uvd_v3_1.c   |  9 +
>  drivers/gpu/drm/amd/amdgpu/uvd_v4_2.c   |  9 +
>  drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c   |  9 +
>  drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c   |  9 +
>  drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c   |  9 +
>  7 files changed, 54 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
> index b7441654e6fa..a53c4ba8b3fe 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
> @@ -418,12 +418,11 @@ int amdgpu_uvd_entity_init(struct amdgpu_device *adev)
> return 0;
>  }
>
> -int amdgpu_uvd_suspend(struct amdgpu_device *adev)
> +int amdgpu_uvd_prepare(struct amdgpu_device *adev)

amdgpu_uvd_prepare_suspend() so it's more obvious what this is for.
Other than that, looks good to me.

Alex

>  {
> unsigned int size;
> void *ptr;
> int i, j, idx;
> -   bool in_ras_intr = amdgpu_ras_intr_triggered();
>
> cancel_delayed_work_sync(>uvd.idle_work);
>
> @@ -452,7 +451,7 @@ int amdgpu_uvd_suspend(struct amdgpu_device *adev)
>
> if (drm_dev_enter(adev_to_drm(adev), )) {
> /* re-write 0 since err_event_athub will corrupt VCPU 
> buffer */
> -   if (in_ras_intr)
> +   if (amdgpu_ras_intr_triggered())
> memset(adev->uvd.inst[j].saved_bo, 0, size);
> else
> memcpy_fromio(adev->uvd.inst[j].saved_bo, 
> ptr, size);
> @@ -461,7 +460,12 @@ int amdgpu_uvd_suspend(struct amdgpu_device *adev)
> }
> }
>
> -   if (in_ras_intr)
> +   return 0;
> +}
> +
> +int amdgpu_uvd_suspend(struct amdgpu_device *adev)
> +{
> +   if (amdgpu_ras_intr_triggered())
> DRM_WARN("UVD VCPU state may lost due to RAS 
> ERREVENT_ATHUB_INTERRUPT\n");
>
> return 0;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.h
> index 9f89bb7cd60b..72228425e021 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.h
> @@ -74,6 +74,7 @@ struct amdgpu_uvd {
>  int amdgpu_uvd_sw_init(struct amdgpu_device *adev);
>  int amdgpu_uvd_sw_fini(struct amdgpu_device *adev);
>  int amdgpu_uvd_entity_init(struct amdgpu_device *adev);
> +int amdgpu_uvd_prepare(struct amdgpu_device *adev);
>  int amdgpu_uvd_suspend(struct amdgpu_device *adev);
>  int amdgpu_uvd_resume(struct amdgpu_device *adev);
>  int amdgpu_uvd_get_create_msg(struct amdgpu_ring *ring, uint32_t handle,
> diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v3_1.c 
> b/drivers/gpu/drm/amd/amdgpu/uvd_v3_1.c
> index 5534c769b655..869e9948fa36 100644
> --- a/drivers/gpu/drm/amd/amdgpu/uvd_v3_1.c
> +++ b/drivers/gpu/drm/amd/amdgpu/uvd_v3_1.c
> @@ -706,6 +706,14 @@ static int uvd_v3_1_hw_fini(void *handle)
> return 0;
>  }
>
> +static int uvd_v3_1_prepare(void *handle)
> +{
> +   struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> +
> +   return amdgpu_uvd_prepare(adev);
> +
> +}
> +
>  static int uvd_v3_1_suspend(void *handle)
>  {
> int r;
> @@ -806,6 +814,7 @@ static const struct amd_ip_funcs uvd_v3_1_ip_funcs = {
> .sw_fini = uvd_v3_1_sw_fini,
> .hw_init = uvd_v3_1_hw_init,
> .hw_fini = uvd_v3_1_hw_fini,
> +   .prepare = uvd_v3_1_prepare,
> .suspend = uvd_v3_1_suspend,
> .resume = uvd_v3_1_resume,
> .is_idle = uvd_v3_1_is_idle,
> diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v4_2.c 
> b/drivers/gpu/drm/amd/amdgpu/uvd_v4_2.c
> index c108b8381795..e589c17af371 100644
> --- a/drivers/gpu/drm/amd/amdgpu/uvd_v4_2.c
> +++ b/drivers/gpu/drm/amd/amdgpu/uvd_v4_2.c
> @@ -220,6 +220,14 @@ static int uvd_v4_2_hw_fini(void *handle)
> return 0;
>  }
>
> +static int uvd_v4_2_prepare(void *handle)
> +{
> +   struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> +
> +   return amdgpu_uvd_prepare(adev);
> +
> +}
> +
>  static int uvd_v4_2_suspend(void *handle)
>  {
> int r;
> @@ -756,6 +764,7 @@ static const struct amd_ip_funcs uvd_v4_2_ip_funcs = {
> .sw_fini = uvd_v4_2_sw_fini,
> .hw_init = uvd_v4_2_hw_init,
> .hw_fini = uvd_v4_2_hw_fini,
> +   .prepare = uvd_v4_2_prepare,
> .suspend = uvd_v4_2_suspend,
> .resume = uvd_v4_2_resume,
> .is_idle =

Re: [PATCH v5 1/7] drm/amd: Evict resources during PM ops prepare() callback

2023-10-09 Thread Alex Deucher

On Fri, Oct 6, 2023 at 3:17 PM Mario Limonciello
 wrote:
>
> Linux PM core has a prepare() callback run before suspend.
>
> If the system is under high memory pressure, the resources may need
> to be evicted into swap instead.  If the storage backing for swap
> is offlined during the suspend() step then such a call may fail.
>
> So move this step into prepare() to move evict majority of
> resources and update all non-pmops callers to call the same callback.
>
> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2362
> Signed-off-by: Mario Limonciello 

Reviewed-by: Alex Deucher 

> ---
> v4->v5:
>  * Call amdgpu_device_prepare() from other callers to amdgpu_device_suspend()
>  * 3x evict calls -> 2x evict calls
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h|  1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 31 ++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c| 10 ---
>  3 files changed, 34 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 4cc78e0e4304..fdb2e9ae13e2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -1409,6 +1409,7 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
>  void amdgpu_driver_release_kms(struct drm_device *dev);
>
>  int amdgpu_device_ip_suspend(struct amdgpu_device *adev);
> +int amdgpu_device_prepare(struct drm_device *dev);
>  int amdgpu_device_suspend(struct drm_device *dev, bool fbcon);
>  int amdgpu_device_resume(struct drm_device *dev, bool fbcon);
>  u32 amdgpu_get_vblank_counter_kms(struct drm_crtc *crtc);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 0cb702c3046a..cb334dc57c59 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -1760,6 +1760,7 @@ static void amdgpu_switcheroo_set_state(struct pci_dev 
> *pdev,
> } else {
> pr_info("switched off\n");
> dev->switch_power_state = DRM_SWITCH_POWER_CHANGING;
> +   amdgpu_device_prepare(dev);
> amdgpu_device_suspend(dev, true);
> amdgpu_device_cache_pci_state(pdev);
> /* Shut down the device */
> @@ -4335,6 +4336,31 @@ static int amdgpu_device_evict_resources(struct 
> amdgpu_device *adev)
>  /*
>   * Suspend & resume.
>   */
> +/**
> + * amdgpu_device_prepare - prepare for device suspend
> + *
> + * @dev: drm dev pointer
> + *
> + * Prepare to put the hw in the suspend state (all asics).
> + * Returns 0 for success or an error on failure.
> + * Called at driver suspend.
> + */
> +int amdgpu_device_prepare(struct drm_device *dev)
> +{
> +   struct amdgpu_device *adev = drm_to_adev(dev);
> +   int r;
> +
> +   if (dev->switch_power_state == DRM_SWITCH_POWER_OFF)
> +   return 0;
> +
> +   /* Evict the majority of BOs before starting suspend sequence */
> +   r = amdgpu_device_evict_resources(adev);
> +   if (r)
> +   return r;
> +
> +   return 0;
> +}
> +
>  /**
>   * amdgpu_device_suspend - initiate device suspend
>   *
> @@ -4355,11 +4381,6 @@ int amdgpu_device_suspend(struct drm_device *dev, bool 
> fbcon)
>
> adev->in_suspend = true;
>
> -   /* Evict the majority of BOs before grabbing the full access */
> -   r = amdgpu_device_evict_resources(adev);
> -   if (r)
> -   return r;
> -
> if (amdgpu_sriov_vf(adev)) {
> amdgpu_virt_fini_data_exchange(adev);
> r = amdgpu_virt_request_full_gpu(adev, false);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index 81affdf7c0c3..420196a17e22 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -2427,8 +2427,9 @@ static int amdgpu_pmops_prepare(struct device *dev)
> /* Return a positive number here so
>  * DPM_FLAG_SMART_SUSPEND works properly
>  */
> -   if (amdgpu_device_supports_boco(drm_dev))
> -   return pm_runtime_suspended(dev);
> +   if (amdgpu_device_supports_boco(drm_dev) &&
> +   pm_runtime_suspended(dev))
> +   return 1;
>
> /* if we will not support s3 or s2i for the device
>  *  then skip suspend
> @@ -2437,7 +2438,7 @@ static int amdgpu_pmops_prepare(struct device *dev)
> !amdgpu_acpi_is_s3_active(adev))
> return 1;
>
> -   return 0;
> +   return amdgpu_device_prepare(drm_dev);
>  }
>
>  static void amdgpu_pmops_complete(struct device *dev)
> @@ -2637,6 +2638,9 @@ static int amdgpu_pmops_runtime_suspend(struct device 
> *dev)
> if (amdgpu_device_supports_boco(drm_dev))
> adev->mp1_state = PP_MP1_STATE_UNLOAD;
>
> +   ret = amdgpu_device_prepare(drm_dev);
> +   if (ret)
> +   return ret;

Re: [PATCH v5 2/7] drm/amd: Add concept of running prepare() sequence for IP blocks

2023-10-09 Thread Alex Deucher

On Fri, Oct 6, 2023 at 2:51 PM Mario Limonciello
 wrote:
>
> If any IP blocks allocate memory during their sw_fini() sequence
> this can cause the suspend to fail under memory pressure.  Introduce
> a new phase that IP blocks can use to allocate memory before suspend
> starts so that it can potentially be evicted into swap instead.
>
> Signed-off-by: Mario Limonciello 
> ---
> v4->v5:
>  * New patch
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 12 +++-
>  drivers/gpu/drm/amd/include/amd_shared.h   |  1 +
>  2 files changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index cb334dc57c59..a362152cd0da 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -4348,7 +4348,7 @@ static int amdgpu_device_evict_resources(struct 
> amdgpu_device *adev)
>  int amdgpu_device_prepare(struct drm_device *dev)
>  {
> struct amdgpu_device *adev = drm_to_adev(dev);
> -   int r;
> +   int i, r;
>
> if (dev->switch_power_state == DRM_SWITCH_POWER_OFF)
> return 0;
> @@ -4358,6 +4358,16 @@ int amdgpu_device_prepare(struct drm_device *dev)
> if (r)
> return r;
>
> +   for (i = 0; i < adev->num_ip_blocks; i++) {
> +   if (!adev->ip_blocks[i].status.valid)
> +   continue;
> +   if (!adev->ip_blocks[i].version->funcs->prepare)
> +   continue;
> +   r = adev->ip_blocks[i].version->funcs->prepare((void *)adev);
> +   if (r)
> +   return r;
> +   }
> +
> return 0;
>  }
>
> diff --git a/drivers/gpu/drm/amd/include/amd_shared.h 
> b/drivers/gpu/drm/amd/include/amd_shared.h
> index ce75351204bb..1f831cb747e0 100644
> --- a/drivers/gpu/drm/amd/include/amd_shared.h
> +++ b/drivers/gpu/drm/amd/include/amd_shared.h
> @@ -299,6 +299,7 @@ struct amd_ip_funcs {
> int (*hw_init)(void *handle);
> int (*hw_fini)(void *handle);
> void (*late_fini)(void *handle);
> +   int (*prepare)(void *prepare);

Prepare is a little vague.  How about prepare_suspend()?  Also *handle
for consistency.  Could possibly use this for DC as well if we end up
needing to allocate DML structures.

Alex

> int (*suspend)(void *handle);
> int (*resume)(void *handle);
> bool (*is_idle)(void *handle);
> --
> 2.34.1
>

RE: [PATCH v2 3/3] drm/amd/pm: Use gpu_metrics_v1_4 for SMUv13.0.6

2023-10-09 Thread Kamal, Asad

[AMD Official Use Only - General]

-Original Message-
From: Lazar, Lijo 
Sent: Monday, October 9, 2023 7:59 PM
To: Kamal, Asad ; amd-gfx@lists.freedesktop.org
Cc: Ma, Le ; Zhang, Morris ; Zhang, Hawking 

Subject: Re: [PATCH v2 3/3] drm/amd/pm: Use gpu_metrics_v1_4 for SMUv13.0.6



On 10/9/2023 5:28 PM, Lazar, Lijo wrote:
>
>
> On 10/6/2023 8:11 PM, Asad Kamal wrote:
>> Use gpu_metrics_v1_4 for SMUv13.0.6 to fill gpu metric info
>>
>> Signed-off-by: Asad Kamal 
>
> Series is:
>  Reviewed-by: Lijo Lazar 
>

On a second thought, since there is no FW release yet with FW metrics table v9 
support, suggest to drop patch 1 and pcie_bandwidth_inst  value assignment. 
Will keep the field as place holder till there is a FW update.
[Kamal, Asad] Will send a v3 with the changes.

Thanks & Regards
Asad

Thanks,
Lijo

> Thanks,
> Lijo
>
>> ---
>>   .../drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c  | 67
>> ---
>>   1 file changed, 43 insertions(+), 24 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
>> b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
>> index ce971a93d28b..3a07f1c95e45 100644
>> --- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
>> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
>> @@ -279,7 +279,7 @@ static int smu_v13_0_6_tables_init(struct
>> smu_context *smu)
>>   return -ENOMEM;
>>   smu_table->metrics_time = 0;
>> -smu_table->gpu_metrics_table_size = sizeof(struct
>> gpu_metrics_v1_3);
>> +smu_table->gpu_metrics_table_size = sizeof(struct
>> +gpu_metrics_v1_4);
>>   smu_table->gpu_metrics_table =
>>   kzalloc(smu_table->gpu_metrics_table_size, GFP_KERNEL);
>>   if (!smu_table->gpu_metrics_table) { @@ -1969,22 +1969,19 @@
>> static int smu_v13_0_6_get_current_pcie_link_speed(struct smu_context
>> *smu)
>>   static ssize_t smu_v13_0_6_get_gpu_metrics(struct smu_context *smu,
>> void **table)
>>   {
>>   struct smu_table_context *smu_table = >smu_table;
>> -struct gpu_metrics_v1_3 *gpu_metrics =
>> -(struct gpu_metrics_v1_3 *)smu_table->gpu_metrics_table;
>> +struct gpu_metrics_v1_4 *gpu_metrics =
>> +(struct gpu_metrics_v1_4 *)smu_table->gpu_metrics_table;
>>   struct amdgpu_device *adev = smu->adev;
>> -int ret = 0, inst0, xcc0;
>> +int ret = 0, xcc_id, inst, i;
>>   MetricsTable_t *metrics;
>>   u16 link_width_level;
>> -inst0 = adev->sdma.instance[0].aid_id;
>> -xcc0 = GET_INST(GC, 0);
>> -
>>   metrics = kzalloc(sizeof(MetricsTable_t), GFP_KERNEL);
>>   ret = smu_v13_0_6_get_metrics_table(smu, metrics, true);
>>   if (ret)
>>   return ret;
>> -smu_cmn_init_soft_gpu_metrics(gpu_metrics, 1, 3);
>> +smu_cmn_init_soft_gpu_metrics(gpu_metrics, 1, 4);
>>   gpu_metrics->temperature_hotspot =
>>   SMUQ10_ROUND(metrics->MaxSocketTemperature);
>> @@ -2000,30 +1997,38 @@ static ssize_t
>> smu_v13_0_6_get_gpu_metrics(struct smu_context *smu, void **table
>>   gpu_metrics->average_umc_activity =
>>   SMUQ10_ROUND(metrics->DramBandwidthUtilization);
>> -gpu_metrics->average_socket_power =
>> +gpu_metrics->curr_socket_power =
>>   SMUQ10_ROUND(metrics->SocketPower);
>>   /* Energy counter reported in 15.259uJ (2^-16) units */
>>   gpu_metrics->energy_accumulator = metrics->SocketEnergyAcc;
>> -gpu_metrics->current_gfxclk =
>> -SMUQ10_ROUND(metrics->GfxclkFrequency[xcc0]);
>> -gpu_metrics->current_socclk =
>> -SMUQ10_ROUND(metrics->SocclkFrequency[inst0]);
>> -gpu_metrics->current_uclk =
>> SMUQ10_ROUND(metrics->UclkFrequency);
>> -gpu_metrics->current_vclk0 =
>> -SMUQ10_ROUND(metrics->VclkFrequency[inst0]);
>> -gpu_metrics->current_dclk0 =
>> -SMUQ10_ROUND(metrics->DclkFrequency[inst0]);
>> +for (i = 0; i < MAX_GFX_CLKS; i++) {
>> +xcc_id = GET_INST(GC, i);
>> +if (xcc_id >= 0)
>> +gpu_metrics->current_gfxclk[i] =
>> +SMUQ10_ROUND(metrics->GfxclkFrequency[xcc_id]);
>> +
>> +if (i < MAX_CLKS) {
>> +gpu_metrics->current_socclk[i] =
>> +SMUQ10_ROUND(metrics->SocclkFrequency[i]);
>> +inst = GET_INST(VCN, i);
>> +if (inst >= 0) {
>> +gpu_metrics->current_vclk0[i] =
>> +SMUQ10_ROUND(metrics->VclkFrequency[inst]);
>> +gpu_metrics->current_dclk0[i] =
>> +SMUQ10_ROUND(metrics->DclkFrequency[inst]);
>> +}
>> +}
>> +}
>> -gpu_metrics->average_gfxclk_frequency =
>> gpu_metrics->current_gfxclk;
>> -gpu_metrics->average_socclk_frequency =
>> gpu_metrics->current_socclk;
>> -gpu_metrics->average_uclk_frequency = gpu_metrics->current_uclk;
>> -gpu_metrics->average_vclk0_frequency =
>> gpu_metrics->current_vclk0;
>> -gpu_metrics->average_dclk0_frequency =
>> gpu_metrics->current_dclk0;
>> +

Re: [PATCH v2 3/3] drm/amd/pm: Use gpu_metrics_v1_4 for SMUv13.0.6

2023-10-09 Thread Lazar, Lijo





On 10/9/2023 5:28 PM, Lazar, Lijo wrote:



On 10/6/2023 8:11 PM, Asad Kamal wrote:

Use gpu_metrics_v1_4 for SMUv13.0.6 to fill
gpu metric info

Signed-off-by: Asad Kamal 


Series is:
 Reviewed-by: Lijo Lazar 



On a second thought, since there is no FW release yet with FW metrics 
table v9 support, suggest to drop patch 1 and pcie_bandwidth_inst  value 
assignment. Will keep the field as place holder till there is a FW update.


Thanks,
Lijo


Thanks,
Lijo


---
  .../drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c  | 67 ---
  1 file changed, 43 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c

index ce971a93d28b..3a07f1c95e45 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
@@ -279,7 +279,7 @@ static int smu_v13_0_6_tables_init(struct 
smu_context *smu)

  return -ENOMEM;
  smu_table->metrics_time = 0;
-    smu_table->gpu_metrics_table_size = sizeof(struct gpu_metrics_v1_3);
+    smu_table->gpu_metrics_table_size = sizeof(struct gpu_metrics_v1_4);
  smu_table->gpu_metrics_table =
  kzalloc(smu_table->gpu_metrics_table_size, GFP_KERNEL);
  if (!smu_table->gpu_metrics_table) {
@@ -1969,22 +1969,19 @@ static int 
smu_v13_0_6_get_current_pcie_link_speed(struct smu_context *smu)
  static ssize_t smu_v13_0_6_get_gpu_metrics(struct smu_context *smu, 
void **table)

  {
  struct smu_table_context *smu_table = >smu_table;
-    struct gpu_metrics_v1_3 *gpu_metrics =
-    (struct gpu_metrics_v1_3 *)smu_table->gpu_metrics_table;
+    struct gpu_metrics_v1_4 *gpu_metrics =
+    (struct gpu_metrics_v1_4 *)smu_table->gpu_metrics_table;
  struct amdgpu_device *adev = smu->adev;
-    int ret = 0, inst0, xcc0;
+    int ret = 0, xcc_id, inst, i;
  MetricsTable_t *metrics;
  u16 link_width_level;
-    inst0 = adev->sdma.instance[0].aid_id;
-    xcc0 = GET_INST(GC, 0);
-
  metrics = kzalloc(sizeof(MetricsTable_t), GFP_KERNEL);
  ret = smu_v13_0_6_get_metrics_table(smu, metrics, true);
  if (ret)
  return ret;
-    smu_cmn_init_soft_gpu_metrics(gpu_metrics, 1, 3);
+    smu_cmn_init_soft_gpu_metrics(gpu_metrics, 1, 4);
  gpu_metrics->temperature_hotspot =
  SMUQ10_ROUND(metrics->MaxSocketTemperature);
@@ -2000,30 +1997,38 @@ static ssize_t 
smu_v13_0_6_get_gpu_metrics(struct smu_context *smu, void **table

  gpu_metrics->average_umc_activity =
  SMUQ10_ROUND(metrics->DramBandwidthUtilization);
-    gpu_metrics->average_socket_power =
+    gpu_metrics->curr_socket_power =
  SMUQ10_ROUND(metrics->SocketPower);
  /* Energy counter reported in 15.259uJ (2^-16) units */
  gpu_metrics->energy_accumulator = metrics->SocketEnergyAcc;
-    gpu_metrics->current_gfxclk =
-    SMUQ10_ROUND(metrics->GfxclkFrequency[xcc0]);
-    gpu_metrics->current_socclk =
-    SMUQ10_ROUND(metrics->SocclkFrequency[inst0]);
-    gpu_metrics->current_uclk = SMUQ10_ROUND(metrics->UclkFrequency);
-    gpu_metrics->current_vclk0 =
-    SMUQ10_ROUND(metrics->VclkFrequency[inst0]);
-    gpu_metrics->current_dclk0 =
-    SMUQ10_ROUND(metrics->DclkFrequency[inst0]);
+    for (i = 0; i < MAX_GFX_CLKS; i++) {
+    xcc_id = GET_INST(GC, i);
+    if (xcc_id >= 0)
+    gpu_metrics->current_gfxclk[i] =
+    SMUQ10_ROUND(metrics->GfxclkFrequency[xcc_id]);
+
+    if (i < MAX_CLKS) {
+    gpu_metrics->current_socclk[i] =
+    SMUQ10_ROUND(metrics->SocclkFrequency[i]);
+    inst = GET_INST(VCN, i);
+    if (inst >= 0) {
+    gpu_metrics->current_vclk0[i] =
+    SMUQ10_ROUND(metrics->VclkFrequency[inst]);
+    gpu_metrics->current_dclk0[i] =
+    SMUQ10_ROUND(metrics->DclkFrequency[inst]);
+    }
+    }
+    }
-    gpu_metrics->average_gfxclk_frequency = gpu_metrics->current_gfxclk;
-    gpu_metrics->average_socclk_frequency = gpu_metrics->current_socclk;
-    gpu_metrics->average_uclk_frequency = gpu_metrics->current_uclk;
-    gpu_metrics->average_vclk0_frequency = gpu_metrics->current_vclk0;
-    gpu_metrics->average_dclk0_frequency = gpu_metrics->current_dclk0;
+    gpu_metrics->current_uclk = SMUQ10_ROUND(metrics->UclkFrequency);
  /* Throttle status is not reported through metrics now */
  gpu_metrics->throttle_status = 0;
+    /* Clock Lock Status. Each bit corresponds to each GFXCLK 
instance */
+    gpu_metrics->gfxclk_lock_status = metrics->GfxLockXCDMak >> 
GET_INST(GC, 0);

+
  if (!(adev->flags & AMD_IS_APU)) {
  link_width_level = 
smu_v13_0_6_get_current_pcie_link_width_level(smu);

  if (link_width_level > MAX_LINK_WIDTH)
@@ -2033,6 +2038,10 @@ static ssize_t 
smu_v13_0_6_get_gpu_metrics(struct smu_context *smu, void **table

Re: [PATCH v2 4/4] drm/amd/pm: Add sysfs attribute to get pm log

2023-10-09 Thread Alex Deucher

On Mon, Oct 9, 2023 at 4:41 AM Christian König
 wrote:
>
> Am 06.10.23 um 16:24 schrieb Alex Deucher:
> > On Fri, Oct 6, 2023 at 4:32 AM Christian König
> >  wrote:
> >> Am 06.10.23 um 07:21 schrieb Lijo Lazar:
> >>> Add sysfs attribute to read power management log. A snapshot is
> >>> captured to the buffer when the attribute is read.
> >>>
> >>> Signed-off-by: Lijo Lazar 
> >>> ---
> >>>
> >>> v2: Pass PAGE_SIZE as the max size of input buffer
> >>>
> >>>drivers/gpu/drm/amd/pm/amdgpu_pm.c | 40 ++
> >>>1 file changed, 40 insertions(+)
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c 
> >>> b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
> >>> index 4c65a2fac028..5a1d21c52672 100644
> >>> --- a/drivers/gpu/drm/amd/pm/amdgpu_pm.c
> >>> +++ b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
> >>> @@ -1794,6 +1794,44 @@ static ssize_t amdgpu_set_apu_thermal_cap(struct 
> >>> device *dev,
> >>>return count;
> >>>}
> >>>
> >>> +static int amdgpu_pmlog_attr_update(struct amdgpu_device *adev,
> >>> + struct amdgpu_device_attr *attr,
> >>> + uint32_t mask,
> >>> + enum amdgpu_device_attr_states *states)
> >>> +{
> >>> + if (amdgpu_dpm_get_pm_log(adev, NULL, 0) == -EOPNOTSUPP)
> >>> + *states = ATTR_STATE_UNSUPPORTED;
> >>> +
> >>> + return 0;
> >>> +}
> >>> +
> >>> +static ssize_t amdgpu_get_pmlog(struct device *dev,
> >>> + struct device_attribute *attr, char *buf)
> >>> +{
> >>> + struct drm_device *ddev = dev_get_drvdata(dev);
> >>> + struct amdgpu_device *adev = drm_to_adev(ddev);
> >>> + ssize_t size = 0;
> >>> + int ret;
> >>> +
> >>> + if (amdgpu_in_reset(adev))
> >>> + return -EPERM;
> >> Please stop using amdgpu_in_reset() for stuff like that altogether.
> >>
> >> When this is reset critical it should grab the reset rw semaphore. If it
> >> isn't than that check isn't necessary.
> > All of the power related sysfs files have this check.  It was
> > originally added because users have processes running which poll
> > various hwmon files at regular intervals and since SMU also handles
> > reset, we don't want to get a metrics table request while a reset is
> > happening otherwise the SMU gets confused.
>
> Then this approach is completely broken. Nothing prevents the reset from
> starting right after doing the check.
>
> If we need exclusive access to the SMU then we should just grab a lock.

Right, but the entire file should be fixed.  It's sort of orthogonal
to this patch.

Alex

>
> Christian.
>
> >
> > Alex
> >
> >> Regards,
> >> Christian.
> >>
> >>> + if (adev->in_suspend && !adev->in_runpm)
> >>> + return -EPERM;
> >>> +
> >>> + ret = pm_runtime_get_sync(ddev->dev);
> >>> + if (ret < 0) {
> >>> + pm_runtime_put_autosuspend(ddev->dev);
> >>> + return ret;
> >>> + }
> >>> +
> >>> + size = amdgpu_dpm_get_pm_log(adev, buf, PAGE_SIZE);
> >>> +
> >>> + pm_runtime_mark_last_busy(ddev->dev);
> >>> + pm_runtime_put_autosuspend(ddev->dev);
> >>> +
> >>> + return size;
> >>> +}
> >>> +
> >>>/**
> >>> * DOC: gpu_metrics
> >>> *
> >>> @@ -2091,6 +2129,8 @@ static struct amdgpu_device_attr 
> >>> amdgpu_device_attrs[] = {
> >>>AMDGPU_DEVICE_ATTR_RW(smartshift_bias,  
> >>> ATTR_FLAG_BASIC,
> >>>  .attr_update = ss_bias_attr_update),
> >>>AMDGPU_DEVICE_ATTR_RW(xgmi_plpd_policy, 
> >>> ATTR_FLAG_BASIC),
> >>> + AMDGPU_DEVICE_ATTR_RO(pmlog,
> >>> ATTR_FLAG_BASIC,
> >>> +   .attr_update = amdgpu_pmlog_attr_update),
> >>>};
> >>>
> >>>static int default_attr_update(struct amdgpu_device *adev, struct 
> >>> amdgpu_device_attr *attr,
>

Re: [PATCH v2 06/16] platform/x86/amd/pmf: Add support to get inputs from other subsystems

2023-10-09 Thread Ilpo Järvinen

On Mon, 9 Oct 2023, Shyam Sundar S K wrote:
> On 10/4/2023 5:44 PM, Ilpo Järvinen wrote:
> > On Sat, 30 Sep 2023, Shyam Sundar S K wrote:
> > 
> >> PMF driver sends changing inputs from each subystem to TA for evaluating
> >> the conditions in the policy binary.
> >>
> >> Add initial support of plumbing in the PMF driver for Smart PC to get
> >> information from other subsystems in the kernel.
> >>
> >> Signed-off-by: Shyam Sundar S K 

> >> diff --git a/drivers/platform/x86/amd/pmf/spc.c 
> >> b/drivers/platform/x86/amd/pmf/spc.c
> >> new file mode 100644
> >> index ..3113bde051d9
> >> --- /dev/null
> >> +++ b/drivers/platform/x86/amd/pmf/spc.c
> >> @@ -0,0 +1,119 @@
> >> +// SPDX-License-Identifier: GPL-2.0
> >> +/*
> >> + * AMD Platform Management Framework Driver - Smart PC Capabilities
> >> + *
> >> + * Copyright (c) 2023, Advanced Micro Devices, Inc.
> >> + * All Rights Reserved.
> >> + *
> >> + * Authors: Shyam Sundar S K 
> >> + *  Patil Rajesh Reddy 
> >> + */
> >> +
> >> +#include 
> >> +#include 
> >> +#include 
> >> +#include "pmf.h"
> >> +
> >> +static void amd_pmf_get_smu_info(struct amd_pmf_dev *dev, struct 
> >> ta_pmf_enact_table *in)
> >> +{
> >> +  u16 max, avg = 0;
> >> +  int i;
> >> +
> >> +  memset(dev->buf, 0, sizeof(dev->m_table));
> >> +  amd_pmf_send_cmd(dev, SET_TRANSFER_TABLE, 0, 7, NULL);
> >> +  memcpy(>m_table, dev->buf, sizeof(dev->m_table));
> >> +
> >> +  in->ev_info.socket_power = dev->m_table.apu_power + 
> >> dev->m_table.dgpu_power;
> >> +  in->ev_info.skin_temperature = dev->m_table.skin_temp;
> >> +
> >> +  /* get the avg C0 residency of all the cores */
> >> +  for (i = 0; i < ARRAY_SIZE(dev->m_table.avg_core_c0residency); i++)
> >> +  avg += dev->m_table.avg_core_c0residency[i];
> >> +
> >> +  /* get the max C0 residency of all the cores */
> >> +  max = dev->m_table.avg_core_c0residency[0];
> >> +  for (i = 1; i < ARRAY_SIZE(dev->m_table.avg_core_c0residency); i++) {
> >> +  if (dev->m_table.avg_core_c0residency[i] > max)
> >> +  max = dev->m_table.avg_core_c0residency[i];
> >> +  }
> > 
> > My comments weren't either answered adequately or changes made here.
> > Please check the v1 comments. I hope it's not because you feel hurry to 
> > get the next version out...
> > 
> > I'm still unsure if the u16 thing can overflow because I don't know what's 
> > the max value for avg_core_c0residency[i].
> 
> the highest value for avg_core_c0residency[i] is merely a small number
> and hence I retained the avg variable as u16. Not sure if there a
> 'real' case where it can overflow.

Okay, if you think it's fine, no problem with it then (not that there's 
a big advantage having it as u16 instead of e.g. unsigned int).

> Sorry, I missed to merge both into a single for loop. I will address
> this in v3.

Thanks.

-- 
 i.

Re: [PATCH v2 01/16] platform/x86/amd/pmf: Add PMF TEE interface

2023-10-09 Thread Ilpo Järvinen

On Mon, 9 Oct 2023, Shyam Sundar S K wrote:

> 
> 
> On 10/4/2023 4:20 PM, Ilpo Järvinen wrote:
> > On Sat, 30 Sep 2023, Shyam Sundar S K wrote:
> > 
> >> AMD PMF driver loads the PMF TA (Trusted Application) into the AMD
> >> ASP's (AMD Security Processor) TEE (Trusted Execution Environment).
> >>
> >> PMF Trusted Application is a secured firmware placed under
> >> /lib/firmware/amdtee gets loaded only when the TEE environment is
> >> initialized. Add the initial code path to build these pipes.
> >>
> >> Reviewed-by: Mario Limonciello 
> >> Signed-off-by: Shyam Sundar S K 
> >> ---
> >>  drivers/platform/x86/amd/pmf/Makefile |   3 +-
> >>  drivers/platform/x86/amd/pmf/core.c   |  11 ++-
> >>  drivers/platform/x86/amd/pmf/pmf.h|  16 
> >>  drivers/platform/x86/amd/pmf/tee-if.c | 112 ++
> >>  4 files changed, 138 insertions(+), 4 deletions(-)
> >>  create mode 100644 drivers/platform/x86/amd/pmf/tee-if.c
> >>
> >> diff --git a/drivers/platform/x86/amd/pmf/Makefile 
> >> b/drivers/platform/x86/amd/pmf/Makefile
> >> index fdededf54392..d2746ee7369f 100644
> >> --- a/drivers/platform/x86/amd/pmf/Makefile
> >> +++ b/drivers/platform/x86/amd/pmf/Makefile
> >> @@ -6,4 +6,5 @@
> >>  
> >>  obj-$(CONFIG_AMD_PMF) += amd-pmf.o
> >>  amd-pmf-objs := core.o acpi.o sps.o \
> >> -  auto-mode.o cnqf.o
> >> +  auto-mode.o cnqf.o \
> >> +  tee-if.o
> >> diff --git a/drivers/platform/x86/amd/pmf/core.c 
> >> b/drivers/platform/x86/amd/pmf/core.c
> >> index 78ed3ee22555..68f1389dda3e 100644
> >> --- a/drivers/platform/x86/amd/pmf/core.c
> >> +++ b/drivers/platform/x86/amd/pmf/core.c
> >> @@ -309,8 +309,11 @@ static void amd_pmf_init_features(struct amd_pmf_dev 
> >> *dev)
> >>dev_dbg(dev->dev, "SPS enabled and Platform Profiles 
> >> registered\n");
> >>}
> >>  
> >> -  /* Enable Auto Mode */
> >> -  if (is_apmf_func_supported(dev, APMF_FUNC_AUTO_MODE)) {
> >> +  if (amd_pmf_init_smart_pc(dev)) {
> >> +  /* Enable Smart PC Solution builder */
> >> +  dev_dbg(dev->dev, "Smart PC Solution Enabled\n");
> >> +  } else if (is_apmf_func_supported(dev, APMF_FUNC_AUTO_MODE)) {
> >> +  /* Enable Auto Mode */
> > 
> > I'm pretty certain neither of these two comments add any information to 
> > what's readily visible from the code itself so they can be dropped.
> > 
> >>amd_pmf_init_auto_mode(dev);
> >>dev_dbg(dev->dev, "Auto Mode Init done\n");
> >>} else if (is_apmf_func_supported(dev, APMF_FUNC_DYN_SLIDER_AC) ||
> >> @@ -330,7 +333,9 @@ static void amd_pmf_deinit_features(struct amd_pmf_dev 
> >> *dev)
> >>amd_pmf_deinit_sps(dev);
> >>}
> >>  
> >> -  if (is_apmf_func_supported(dev, APMF_FUNC_AUTO_MODE)) {
> >> +  if (dev->smart_pc_enabled) {
> >> +  amd_pmf_deinit_smart_pc(dev);
> >> +  } else if (is_apmf_func_supported(dev, APMF_FUNC_AUTO_MODE)) {
> >>amd_pmf_deinit_auto_mode(dev);
> >>} else if (is_apmf_func_supported(dev, APMF_FUNC_DYN_SLIDER_AC) ||
> >>  is_apmf_func_supported(dev, APMF_FUNC_DYN_SLIDER_DC)) 
> >> {
> >> diff --git a/drivers/platform/x86/amd/pmf/pmf.h 
> >> b/drivers/platform/x86/amd/pmf/pmf.h
> >> index deba88e6e4c8..02460c2a31ea 100644
> >> --- a/drivers/platform/x86/amd/pmf/pmf.h
> >> +++ b/drivers/platform/x86/amd/pmf/pmf.h
> >> @@ -179,6 +179,12 @@ struct amd_pmf_dev {
> >>bool cnqf_enabled;
> >>bool cnqf_supported;
> >>struct notifier_block pwr_src_notifier;
> >> +  /* Smart PC solution builder */
> >> +  struct tee_context *tee_ctx;
> >> +  struct tee_shm *fw_shm_pool;
> >> +  u32 session_id;
> >> +  void *shbuf;
> >> +  bool smart_pc_enabled;
> >>  };
> >>  
> >>  struct apmf_sps_prop_granular {
> >> @@ -389,6 +395,13 @@ struct apmf_dyn_slider_output {
> >>struct apmf_cnqf_power_set ps[APMF_CNQF_MAX];
> >>  } __packed;
> >>  
> >> +struct ta_pmf_shared_memory {
> >> +  int command_id;
> >> +  int resp_id;
> >> +  u32 pmf_result;
> >> +  u32 if_version;
> >> +};
> >> +
> >>  /* Core Layer */
> >>  int apmf_acpi_init(struct amd_pmf_dev *pmf_dev);
> >>  void apmf_acpi_deinit(struct amd_pmf_dev *pmf_dev);
> >> @@ -433,4 +446,7 @@ void amd_pmf_deinit_cnqf(struct amd_pmf_dev *dev);
> >>  int amd_pmf_trans_cnqf(struct amd_pmf_dev *dev, int socket_power, ktime_t 
> >> time_lapsed_ms);
> >>  extern const struct attribute_group cnqf_feature_attribute_group;
> >>  
> >> +/* Smart PC builder Layer*/
> > 
> > Missing space.
> > 
> >> +int amd_pmf_init_smart_pc(struct amd_pmf_dev *dev);
> >> +void amd_pmf_deinit_smart_pc(struct amd_pmf_dev *dev);
> >>  #endif /* PMF_H */
> >> diff --git a/drivers/platform/x86/amd/pmf/tee-if.c 
> >> b/drivers/platform/x86/amd/pmf/tee-if.c
> >> new file mode 100644
> >> index ..4db80ca59a11
> >> --- /dev/null
> >> +++ b/drivers/platform/x86/amd/pmf/tee-if.c
> >> @@ -0,0 +1,112 @@
> >> +// SPDX-License-Identifier: GPL-2.0
> >> +/*
> >> + * AMD Platform Management Framework Driver

Re: [PATCH v2 12/16] platform/x86/amd/pmf: Add PMF-AMDGPU get interface

2023-10-09 Thread Ilpo Järvinen

On Mon, 9 Oct 2023, Shyam Sundar S K wrote:
> On 10/4/2023 6:19 PM, Ilpo Järvinen wrote:
> > On Sat, 30 Sep 2023, Shyam Sundar S K wrote:
> > 
> >> In order to provide GPU inputs to TA for the Smart PC solution to work, we
> >> need to have interface between the PMF driver and the AMDGPU driver.
> >>
> >> Add the initial code path for get interface from AMDGPU.
> >>
> >> Co-developed-by: Mario Limonciello 
> >> Signed-off-by: Mario Limonciello 
> >> Signed-off-by: Shyam Sundar S K 
> > 
> >> @@ -355,6 +356,21 @@ static int amd_pmf_get_bios_buffer(struct amd_pmf_dev 
> >> *dev)
> >>return amd_pmf_start_policy_engine(dev);
> >>  }
> >>  
> >> +static int amd_pmf_get_gpu_handle(struct pci_dev *pdev, void *data)
> >> +{
> >> +  struct amd_pmf_dev *dev = data;
> >> +
> >> +  if (pdev->vendor == PCI_VENDOR_ID_ATI && pdev->devfn == 0) {
> >> +  /* get the amdgpu handle from the pci root after walking 
> >> through the pci bus */
> > 
> > I can see from the code that you assign to amdgpu handle so this comment 
> > added no information.
> > 
> > It doesn't really answer at all why you're doing this second step. Based 
> > on the give parameters to pci_get_device(), it looks as if you're asking 
> > for the same device you already have in pdev to be searched to you.
> 
> Not sure if I understand you remark completely.
> 
> amd_pmf_get_gpu_handle() is a callback function for pci_walk_bus
> (which is done below).
> 
> What I am trying to do here is to get the PCI handle for the GPU
> device by walking the PCI bus.
> 
> I think the 'pdev' here refers to the pci root, using that root we
> walk the entire tree and only stop walking when we find a handle to
> GPU device.

Not exactly what happens, in amd_pmf_get_gpu_handle() pdev changes on each 
call so I don't know why you stated it is refering to the "pci root".

> Do you want me to change the "pdev" parameter to be renamed as "root" ?

No, please don't do that, it would be misleading.

> Am I missing something?

I meant that at some point of the walk through the PCI devices, you have 
a PCI device pdev with ->vendor PCI_VENDOR_ID_AMD when that if condition 
above matched. Please explain why you need to do another lookup with 
pci_get_device() at that point (with the same ->vendor and ->device as 
shown below)?

> >> +  dev->gfx_data.gpu_dev = pci_get_device(pdev->vendor, 
> >> pdev->device, NULL);
> >> +  if (dev->gfx_data.gpu_dev) {
> >> +  pci_dev_put(pdev);
> >> +  return 1; /* stop walking */
> >> +  }
> >> +  }
> >> +  return 0; /* continue walking */


-- 
 i.

Re: [PATCH v2 04/16] platform/x86/amd/pmf: Add support for PMF Policy Binary

2023-10-09 Thread Ilpo Järvinen

On Mon, 9 Oct 2023, Shyam Sundar S K wrote:
> On 10/4/2023 5:30 PM, Ilpo Järvinen wrote:
> > On Sat, 30 Sep 2023, Shyam Sundar S K wrote:
> > 
> >> PMF Policy binary is a encrypted and signed binary that will be part
> >> of the BIOS. PMF driver via the ACPI interface checks the existence
> >> of Smart PC bit. If the advertised bit is found, PMF driver walks
> >> the acpi namespace to find out the policy binary size and the address
> >> which has to be passed to the TA during the TA init sequence.
> >>
> >> The policy binary is comprised of inputs (or the events) and outputs
> >> (or the actions). With the PMF ecosystem, OEMs generate the policy
> >> binary (or could be multiple binaries) that contains a supported set
> >> of inputs and outputs which could be specifically carved out for each
> >> usage segment (or for each user also) that could influence the system
> >> behavior either by enriching the user experience or/and boost/throttle
> >> power limits.
> >>
> >> Once the TA init command succeeds, the PMF driver sends the changing
> >> events in the current environment to the TA for a constant sampling
> >> frequency time (the event here could be a lid close or open) and
> >> if the policy binary has corresponding action built within it, the
> >> TA sends the action for it in the subsequent enact command.
> >>
> >> If the inputs sent to the TA has no output defined in the policy
> >> binary generated by OEMs, there will be no action to be performed
> >> by the PMF driver.
> >>
> >> Example policies:
> >>
> >> 1) if slider is performance ; set the SPL to 40W
> >> Here PMF driver registers with the platform profile interface and
> >> when the slider position is changed, PMF driver lets the TA know
> >> about this. TA sends back an action to update the Sustained
> >> Power Limit (SPL). PMF driver updates this limit via the PMFW mailbox.
> >>
> >> 2) if user_away ; then lock the system
> >> Here PMF driver hooks to the AMD SFH driver to know the user presence
> >> and send the inputs to TA and if the condition is met, the TA sends
> >> the action of locking the system. PMF driver generates a uevent and
> >> based on the udev rule in the userland the system gets locked with
> >> systemctl.
> >>
> >> The intent here is to provide the OEM's to make a policy to lock the
> >> system when the user is away ; but the userland can make a choice to
> >> ignore it.
> >>
> >> and so on.
> >>
> >> The OEMs will have an utility to create numerous such policies and
> >> the policies shall be reviewed by AMD before signing and encrypting
> >> them. Policies are shared between operating systems to have seemless user
> >> experience.
> >>
> >> Since all this action has to happen via the "amdtee" driver, currently
> >> there is no caller for it in the kernel which can load the amdtee driver.
> >> Without amdtee driver loading onto the system the "tee" calls shall fail
> >> from the PMF driver. Hence an explicit "request_module" has been added
> >> to address this.
> >>
> >> Signed-off-by: Shyam Sundar S K 

> >> diff --git a/drivers/platform/x86/amd/pmf/core.c 
> >> b/drivers/platform/x86/amd/pmf/core.c
> >> index 678dce4fea08..787f25511191 100644
> >> --- a/drivers/platform/x86/amd/pmf/core.c
> >> +++ b/drivers/platform/x86/amd/pmf/core.c
> >> @@ -384,6 +384,18 @@ static int amd_pmf_probe(struct platform_device *pdev)
> >>return -ENOMEM;
> >>  
> >>dev->dev = >dev;
> >> +  err = apmf_check_smart_pc(dev);
> >> +  if (!err) {
> >> +  /* in order for Smart PC solution to work it has a hard 
> >> dependency
> >> +   * on the amdtee driver to be loaded first even before the PMF 
> >> driver
> >> +   * loads. PMF ASL has a _CRS method that advertises the 
> >> existence
> >> +   * of Smart PC bit. If this information is present, use this to
> >> +   * explicitly probe the amdtee driver, so that "tee" plumbing 
> >> is done
> >> +   * before the PMF Smart PC init happens.
> >> +   */
> > 
> > But please follow no-text on /* line formatting for multiline comments. 
> > Also start with a capital letter.
> > 
> > 
> >> +  if (request_module("amdtee"))
> > 
> > Are you aware that this won't give you very strong guarantees about 
> > anything if request_module()'s function comments is to be believed?
> > 
> > If that's all what you're after, MODULE_SOFTDEP("pre: amdtee"); is 
> > probably enough (and I unfortunately don't know the answer how to do it if 
> > you want something stronger than that when you don't directly depend on 
> > the symbols of the other module).
> 
> MODULE_SOFTDEP("pre: amdtee"); did not help.

So how was this module loaded then? I suppose if the user does insmod, the 
softdep wouldn't be honored but modprobe should load the dependencies 
first.

> There is no consumer loading the 'amdtee' driver today in the kernel.
> Even now with this change, the pmf driver calls the TEE subsystem APIs
> that will eventually land in amdtee code.

Re: [PATCH] drm/amdgpu: add missing NULL check

2023-10-09 Thread Samuel Pitoiset


I can confirm this patch fixes the kernel crash I reported.

But as discussed with Christian, we should find the root cause.

On 10/6/23 14:11, Christian König wrote:

bo->tbo.resource can easily be NULL here.

Signed-off-by: Christian König 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
index f3ee83cdf97e..d28e21baef16 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
@@ -252,7 +252,7 @@ static inline bool amdgpu_bo_in_cpu_visible_vram(struct 
amdgpu_bo *bo)
struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
struct amdgpu_res_cursor cursor;
  
-	if (bo->tbo.resource->mem_type != TTM_PL_VRAM)

+   if (!bo->tbo.resource || bo->tbo.resource->mem_type != TTM_PL_VRAM)
return false;
  
  	amdgpu_res_first(bo->tbo.resource, 0, amdgpu_bo_size(bo), );

Re: [PATCH AUTOSEL 5.10 13/22] drm/amdgpu: install stub fence into potential unused fence pointers

2023-10-09 Thread Christian König


Am 07.10.23 um 11:50 schrieb Greg KH:

On Sun, Sep 10, 2023 at 03:43:01PM -0500, Bryan Jennings wrote:

This is also causing log spam on 5.15.  It was included in 5.15.128 as
commit 4921792e04f2125b5eadef9dbe9417a8354c7eff.  I encountered this and
found https://gitlab.freedesktop.org/drm/amd/-/issues/2820 while researching
the problem.

Confused, what should we do here?


If this patch was backported to even more older kernels then please 
revert that immediately!


This patch was part of a new feature and can only work correctly with a 
bunch of prerequisites. If you don't have those prerequisites in your 
branch then it might actually cause random memory corruptions through 
device DMA.


And we should probably talk about why this patch was automatically 
selected for backporting in the first place? There is no mention that 
this is a fix or should be backported in the commit message or patch 
itself whatsoever.


Without the WARN_ON() which started to spam the logs that could have 
gone south pretty quickly. Random data corruption without any indicator 
what's causing it is not really funny.


Regards,
Christian.

Re: [PATCH v2 3/3] drm/amd/pm: Use gpu_metrics_v1_4 for SMUv13.0.6

2023-10-09 Thread Lazar, Lijo





On 10/6/2023 8:11 PM, Asad Kamal wrote:

Use gpu_metrics_v1_4 for SMUv13.0.6 to fill
gpu metric info

Signed-off-by: Asad Kamal 


Series is:
Reviewed-by: Lijo Lazar 

Thanks,
Lijo


---
  .../drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c  | 67 ---
  1 file changed, 43 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
index ce971a93d28b..3a07f1c95e45 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
@@ -279,7 +279,7 @@ static int smu_v13_0_6_tables_init(struct smu_context *smu)
return -ENOMEM;
smu_table->metrics_time = 0;
  
-	smu_table->gpu_metrics_table_size = sizeof(struct gpu_metrics_v1_3);

+   smu_table->gpu_metrics_table_size = sizeof(struct gpu_metrics_v1_4);
smu_table->gpu_metrics_table =
kzalloc(smu_table->gpu_metrics_table_size, GFP_KERNEL);
if (!smu_table->gpu_metrics_table) {
@@ -1969,22 +1969,19 @@ static int 
smu_v13_0_6_get_current_pcie_link_speed(struct smu_context *smu)
  static ssize_t smu_v13_0_6_get_gpu_metrics(struct smu_context *smu, void 
**table)
  {
struct smu_table_context *smu_table = >smu_table;
-   struct gpu_metrics_v1_3 *gpu_metrics =
-   (struct gpu_metrics_v1_3 *)smu_table->gpu_metrics_table;
+   struct gpu_metrics_v1_4 *gpu_metrics =
+   (struct gpu_metrics_v1_4 *)smu_table->gpu_metrics_table;
struct amdgpu_device *adev = smu->adev;
-   int ret = 0, inst0, xcc0;
+   int ret = 0, xcc_id, inst, i;
MetricsTable_t *metrics;
u16 link_width_level;
  
-	inst0 = adev->sdma.instance[0].aid_id;

-   xcc0 = GET_INST(GC, 0);
-
metrics = kzalloc(sizeof(MetricsTable_t), GFP_KERNEL);
ret = smu_v13_0_6_get_metrics_table(smu, metrics, true);
if (ret)
return ret;
  
-	smu_cmn_init_soft_gpu_metrics(gpu_metrics, 1, 3);

+   smu_cmn_init_soft_gpu_metrics(gpu_metrics, 1, 4);
  
  	gpu_metrics->temperature_hotspot =

SMUQ10_ROUND(metrics->MaxSocketTemperature);
@@ -2000,30 +1997,38 @@ static ssize_t smu_v13_0_6_get_gpu_metrics(struct 
smu_context *smu, void **table
gpu_metrics->average_umc_activity =
SMUQ10_ROUND(metrics->DramBandwidthUtilization);
  
-	gpu_metrics->average_socket_power =

+   gpu_metrics->curr_socket_power =
SMUQ10_ROUND(metrics->SocketPower);
/* Energy counter reported in 15.259uJ (2^-16) units */
gpu_metrics->energy_accumulator = metrics->SocketEnergyAcc;
  
-	gpu_metrics->current_gfxclk =

-   SMUQ10_ROUND(metrics->GfxclkFrequency[xcc0]);
-   gpu_metrics->current_socclk =
-   SMUQ10_ROUND(metrics->SocclkFrequency[inst0]);
-   gpu_metrics->current_uclk = SMUQ10_ROUND(metrics->UclkFrequency);
-   gpu_metrics->current_vclk0 =
-   SMUQ10_ROUND(metrics->VclkFrequency[inst0]);
-   gpu_metrics->current_dclk0 =
-   SMUQ10_ROUND(metrics->DclkFrequency[inst0]);
+   for (i = 0; i < MAX_GFX_CLKS; i++) {
+   xcc_id = GET_INST(GC, i);
+   if (xcc_id >= 0)
+   gpu_metrics->current_gfxclk[i] =
+   SMUQ10_ROUND(metrics->GfxclkFrequency[xcc_id]);
+
+   if (i < MAX_CLKS) {
+   gpu_metrics->current_socclk[i] =
+   SMUQ10_ROUND(metrics->SocclkFrequency[i]);
+   inst = GET_INST(VCN, i);
+   if (inst >= 0) {
+   gpu_metrics->current_vclk0[i] =
+   
SMUQ10_ROUND(metrics->VclkFrequency[inst]);
+   gpu_metrics->current_dclk0[i] =
+   
SMUQ10_ROUND(metrics->DclkFrequency[inst]);
+   }
+   }
+   }
  
-	gpu_metrics->average_gfxclk_frequency = gpu_metrics->current_gfxclk;

-   gpu_metrics->average_socclk_frequency = gpu_metrics->current_socclk;
-   gpu_metrics->average_uclk_frequency = gpu_metrics->current_uclk;
-   gpu_metrics->average_vclk0_frequency = gpu_metrics->current_vclk0;
-   gpu_metrics->average_dclk0_frequency = gpu_metrics->current_dclk0;
+   gpu_metrics->current_uclk = SMUQ10_ROUND(metrics->UclkFrequency);
  
  	/* Throttle status is not reported through metrics now */

gpu_metrics->throttle_status = 0;
  
+	/* Clock Lock Status. Each bit corresponds to each GFXCLK instance */

+   gpu_metrics->gfxclk_lock_status = metrics->GfxLockXCDMak >> 
GET_INST(GC, 0);
+
if (!(adev->flags & AMD_IS_APU)) {
link_width_level = 
smu_v13_0_6_get_current_pcie_link_width_level(smu);
if (link_width_level > MAX_LINK_WIDTH)
@@ -2033,6 +2038,10 @@ static ssize_t

Re: [PATCH -next 0/7] drm: Remove many unnecessary NULL values

2023-10-09 Thread Dmitry Baryshkov



On Wed, 09 Aug 2023 11:44:38 +0800, Ruan Jinjie wrote:
> The NULL initialization of the pointers assigned by kzalloc() or
> kunit_kzalloc() first is not necessary, because if the kzalloc() or
> kunit_kzalloc() failed, the pointers will be assigned NULL, otherwise
> it works as usual. so remove it.
> 
> Ruan Jinjie (7):
>   drm/amdkfd: Remove unnecessary NULL values
>   drm/amd/display: Remove unnecessary NULL values
>   drm/msm: Remove unnecessary NULL values
>   drm/radeon: Remove unnecessary NULL values
>   drm/virtio: Remove an unnecessary NULL value
>   drm/format-helper: Remove unnecessary NULL values
>   drm: Remove unnecessary NULL values
> 
> [...]

Applied, thanks!

[3/7] drm/msm: Remove unnecessary NULL values
  https://gitlab.freedesktop.org/lumag/msm/-/commit/92a48b6ed510

Best regards,
-- 
Dmitry Baryshkov

Re: [PATCH AUTOSEL 5.10 13/22] drm/amdgpu: install stub fence into potential unused fence pointers

2023-10-09 Thread Greg KH

On Sun, Sep 10, 2023 at 03:43:01PM -0500, Bryan Jennings wrote:
> This is also causing log spam on 5.15.  It was included in 5.15.128 as
> commit 4921792e04f2125b5eadef9dbe9417a8354c7eff.  I encountered this and
> found https://gitlab.freedesktop.org/drm/amd/-/issues/2820 while researching
> the problem.

Confused, what should we do here?

Re: [PATCH] drm/amdgpu: add missing NULL check

2023-10-09 Thread Christian König


Am 06.10.23 um 16:41 schrieb Alex Deucher:

On Fri, Oct 6, 2023 at 9:07 AM Christian König
 wrote:

bo->tbo.resource can easily be NULL here.

Signed-off-by: Christian König 

Add a link to the bug report?


Ah, crap. Forgotten to add the link before pushing that. But I've added 
a CC stable.


Apart from that I suspect that this doesn't fix the real issue here, it 
just mitigates the problem.


For some reason we can't allocate OA resources, but also doesn't fail 
and instead keep working with an empty BO.


Regards,
Christian.


With that:
Reviewed-by: Alex Deucher 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
index f3ee83cdf97e..d28e21baef16 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
@@ -252,7 +252,7 @@ static inline bool amdgpu_bo_in_cpu_visible_vram(struct 
amdgpu_bo *bo)
 struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
 struct amdgpu_res_cursor cursor;

-   if (bo->tbo.resource->mem_type != TTM_PL_VRAM)
+   if (!bo->tbo.resource || bo->tbo.resource->mem_type != TTM_PL_VRAM)
 return false;

 amdgpu_res_first(bo->tbo.resource, 0, amdgpu_bo_size(bo), );
--
2.34.1

Re: [PATCH v5 5/7] drm/amd/display: Catch errors from drm_atomic_helper_suspend()

2023-10-09 Thread Christian König


Am 06.10.23 um 20:50 schrieb Mario Limonciello:

drm_atomic_helper_suspend() can return PTR_ERR(), in which case the
error gets stored into `dm->cached_state`.  This can cause failures
during resume.  Catch the error during suspend and fail the suspend
instead.

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2362
Signed-off-by: Mario Limonciello 


Acked-by: Christian König 


---
v4->v5:
  * New patch
---
  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index a59a11ae42db..63944d3b9e8c 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -2716,6 +2716,8 @@ static int dm_suspend(void *handle)
  
  	WARN_ON(adev->dm.cached_state);

adev->dm.cached_state = drm_atomic_helper_suspend(adev_to_drm(adev));
+   if (IS_ERR(adev->dm.cached_state))
+   return PTR_ERR(adev->dm.cached_state);
  
  	s3_handle_mst(adev_to_drm(adev), true);

Re: [PATCH v5 4/7] drm/amd: Capture errors in amdgpu_switcheroo_set_state()

2023-10-09 Thread Christian König


Am 06.10.23 um 20:50 schrieb Mario Limonciello:

amdgpu_switcheroo_set_state() calls lots of functions that could
fail under memory pressure or for other reasons.  Don't assume
everything can successfully run sequentially, and check return codes
for everything that returns one.

Signed-off-by: Mario Limonciello 


Acked-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 36 +-
  1 file changed, 29 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index a362152cd0da..8dfcff783dab 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -1749,23 +1749,45 @@ static void amdgpu_switcheroo_set_state(struct pci_dev 
*pdev,
/* don't suspend or resume card normally */
dev->switch_power_state = DRM_SWITCH_POWER_CHANGING;
  
-		pci_set_power_state(pdev, PCI_D0);

-   amdgpu_device_load_pci_state(pdev);
+   r = pci_set_power_state(pdev, PCI_D0);
+   if (r) {
+   DRM_WARN("pci_set_power_state failed (%d)\n", r);
+   return;
+   }
+   if (!amdgpu_device_load_pci_state(pdev))
+   return;
r = pci_enable_device(pdev);
if (r)
DRM_WARN("pci_enable_device failed (%d)\n", r);
-   amdgpu_device_resume(dev, true);
+   r = amdgpu_device_resume(dev, true);
+   if (r) {
+   DRM_WARN("amdgpu_device_resume failed (%d)\n", r);
+   return;
+   }
  
  		dev->switch_power_state = DRM_SWITCH_POWER_ON;

} else {
pr_info("switched off\n");
dev->switch_power_state = DRM_SWITCH_POWER_CHANGING;
-   amdgpu_device_prepare(dev);
-   amdgpu_device_suspend(dev, true);
-   amdgpu_device_cache_pci_state(pdev);
+   r = amdgpu_device_prepare(dev);
+   if (r) {
+   DRM_WARN("amdgpu_device_prepare failed (%d)\n", r);
+   return;
+   }
+   r = amdgpu_device_suspend(dev, true);
+   if (r) {
+   DRM_WARN("amdgpu_device_suspend failed (%d)\n", r);
+   return;
+   }
+   if (!amdgpu_device_cache_pci_state(pdev))
+   return;
/* Shut down the device */
pci_disable_device(pdev);
-   pci_set_power_state(pdev, PCI_D3cold);
+   r = pci_set_power_state(pdev, PCI_D3cold);
+   if (r) {
+   DRM_WARN("pci_set_power_state failed (%d)\n", r);
+   return;
+   }
dev->switch_power_state = DRM_SWITCH_POWER_OFF;
}
  }

RE: [PATCH] drm/amdgpu: fix SI failure due to doorbells allocation

2023-10-09 Thread Sharma, Shashank

[AMD Official Use Only - General]

Reviewed-by: Shashank Sharma 

Regards
Shashank
-Original Message-
From: Icenowy Zheng 
Sent: Sunday, October 8, 2023 8:47 AM
To: Deucher, Alexander ; Koenig, Christian 
; Pan, Xinhui ; David Airlie 
; Daniel Vetter ; Sharma, Shashank 
; Yadav, Arvind 
Cc: amd-gfx@lists.freedesktop.org; dri-de...@lists.freedesktop.org; 
linux-ker...@vger.kernel.org; Icenowy Zheng 
Subject: [PATCH] drm/amdgpu: fix SI failure due to doorbells allocation

SI hardware does not have doorbells at all, however currently the code will try 
to do the allocation and thus fail, makes SI AMDGPU not usable.

Fix this failure by skipping doorbells allocation when doorbells count is zero.

Fixes: 54c30d2a8def ("drm/amdgpu: create kernel doorbell pages")
Signed-off-by: Icenowy Zheng 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell_mgr.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell_mgr.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell_mgr.c
index d0249ada91d30..599aece42017a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell_mgr.c
@@ -142,6 +142,10 @@ int amdgpu_doorbell_create_kernel_doorbells(struct 
amdgpu_device *adev)
int r;
int size;

+   /* SI HW does not have doorbells, skip allocation */
+   if (adev->doorbell.num_kernel_doorbells == 0)
+   return 0;
+
/* Reserve first num_kernel_doorbells (page-aligned) for kernel ops */
size = ALIGN(adev->doorbell.num_kernel_doorbells * sizeof(u32), 
PAGE_SIZE);

--
2.39.1

RE: [PATCH] drm/amdgpu:Check gfx poweron when skip flush_gpu_tlb

2023-10-09 Thread Zhang, Hawking

[AMD Official Use Only - General]

adev->gfx.is_poweron check should already be applied in IP specific (gmc v11) 
callback. If gfx is not power on, it does nothing but just returns. I didn't 
see how it helps resolve the issue if we just move the check from one function 
to another.

Regards,
Hawking

-Original Message-
From: Xu, Feifei 
Sent: Monday, October 9, 2023 09:51
To: Wang, Yang(Kevin) ; amd-gfx@lists.freedesktop.org
Cc: Koenig, Christian ; Zhang, Hawking 

Subject: RE: [PATCH] drm/amdgpu:Check gfx poweron when skip flush_gpu_tlb

[AMD Official Use Only - General]

Hi,

>> Based on your description, the above code should use "||" instead of
>> "&&",
&& is to add more restriction here.  To avoid skipping necessary TLB flush by 
return.
For Asics < GFX11, !adev->gfx.is_poweron is always true (this paremeter is 
intrudoced from GFX11), only depends on reset_domain->sem; For Asics = GFX11, 
!adev->gfx.is_poweron might be false (which gfx might already poweron in the 
reset), this will make the if () not ture, return will not be executed, thus 
flush TLB.

>> And after merging code into one line may result in the lock not being 
>> released if the lock can be acquired success.
If !adev->gfx.is_poweron is true, the reset_domin->sem will not be 
down_read_trylock, thus could avoid this deadlock.

Thanks,
Feifei

-Original Message-
From: Wang, Yang(Kevin) 
Sent: Sunday, October 8, 2023 9:36 PM
To: Xu, Feifei ; amd-gfx@lists.freedesktop.org
Cc: Xu, Feifei ; Xu, Feifei ; Koenig, 
Christian ; Zhang, Hawking 
Subject: RE: [PATCH] drm/amdgpu:Check gfx poweron when skip flush_gpu_tlb


-Original Message-
From: amd-gfx  On Behalf Of Feifei Xu
Sent: Sunday, October 8, 2023 6:07 PM
To: amd-gfx@lists.freedesktop.org
Cc: Xu, Feifei ; Xu, Feifei ; Koenig, 
Christian ; Zhang, Hawking 
Subject: [PATCH] drm/amdgpu:Check gfx poweron when skip flush_gpu_tlb

To fix the gpu recovery failed on GFX11 with gfxhub pagefault, flush gpu tlb 
after reset on GFX11.
Gfxhub tlb flush need check if adev->gfx.is_poweron set.

Fixes: d0c860f33553 ("drm/amdgpu: rework lock handling for flush_tlb v2")

Signed-off-by: Feifei Xu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
index 2f9bb86edd71..048d32edee88 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -611,8 +611,9 @@ void amdgpu_gmc_flush_gpu_tlb(struct amdgpu_device *adev, 
uint32_t vmid,
/*
 * A GPU reset should flush all TLBs anyway, so no need to do
 * this while one is ongoing.
+* For GFX11, gfxhub flush check if adev->gfx.is_poweron is set.
 */
-   if (!down_read_trylock(>reset_domain->sem))
+   if (!down_read_trylock(>reset_domain->sem) &&
+!adev->gfx.is_poweron)
return;

[Kevin]:
Based on your description, the above code should use "||" instead of "&&", And 
after merging code into one line may result in the lock not being released if 
the lock can be acquired success.

Best Regards,
Kevin

if (adev->gmc.flush_tlb_needs_extra_type_2)
--
2.34.1

Re: [PATCH] drm/amdgpu:Check gfx poweron when skip flush_gpu_tlb

2023-10-09 Thread Christian König


Am 09.10.23 um 03:50 schrieb Xu, Feifei:

[AMD Official Use Only - General]

Hi,


Based on your description, the above code should use "||" instead of "&&",

&& is to add more restriction here.  To avoid skipping necessary TLB flush by 
return.
For Asics < GFX11, !adev->gfx.is_poweron is always true (this paremeter is 
intrudoced from GFX11), only depends on reset_domain->sem;
For Asics = GFX11, !adev->gfx.is_poweron might be false (which gfx might 
already poweron in the reset), this will make the if () not ture, return will not 
be executed, thus flush TLB.


First of all the patch is broken because you only handle the locking, 
but not the unlocking part.


Then a TLB flush shouldn't be necessary on reset. A reset implies that 
the TLB is cleared as well.


We discussed the possibility to avoid that, but this is not supposed to 
be happening at the moment.


Regards,
Christian.




And after merging code into one line may result in the lock not being released 
if the lock can be acquired success.

If !adev->gfx.is_poweron is true, the reset_domin->sem will not be 
down_read_trylock, thus could avoid this deadlock.




Thanks,
Feifei

-Original Message-
From: Wang, Yang(Kevin) 
Sent: Sunday, October 8, 2023 9:36 PM
To: Xu, Feifei ; amd-gfx@lists.freedesktop.org
Cc: Xu, Feifei ; Xu, Feifei ; Koenig, Christian 
; Zhang, Hawking 
Subject: RE: [PATCH] drm/amdgpu:Check gfx poweron when skip flush_gpu_tlb


-Original Message-
From: amd-gfx  On Behalf Of Feifei Xu
Sent: Sunday, October 8, 2023 6:07 PM
To: amd-gfx@lists.freedesktop.org
Cc: Xu, Feifei ; Xu, Feifei ; Koenig, Christian 
; Zhang, Hawking 
Subject: [PATCH] drm/amdgpu:Check gfx poweron when skip flush_gpu_tlb

To fix the gpu recovery failed on GFX11 with gfxhub pagefault, flush gpu tlb 
after reset on GFX11.
Gfxhub tlb flush need check if adev->gfx.is_poweron set.

Fixes: d0c860f33553 ("drm/amdgpu: rework lock handling for flush_tlb v2")

Signed-off-by: Feifei Xu 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
index 2f9bb86edd71..048d32edee88 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -611,8 +611,9 @@ void amdgpu_gmc_flush_gpu_tlb(struct amdgpu_device *adev, 
uint32_t vmid,
 /*
  * A GPU reset should flush all TLBs anyway, so no need to do
  * this while one is ongoing.
+* For GFX11, gfxhub flush check if adev->gfx.is_poweron is set.
  */
-   if (!down_read_trylock(>reset_domain->sem))
+   if (!down_read_trylock(>reset_domain->sem) &&
+!adev->gfx.is_poweron)
 return;

[Kevin]:
Based on your description, the above code should use "||" instead of "&&",
And after merging code into one line may result in the lock not being released 
if the lock can be acquired success.

Best Regards,
Kevin

 if (adev->gmc.flush_tlb_needs_extra_type_2)
--
2.34.1

Re: [PATCH v5 2/7] drm/amd: Add concept of running prepare() sequence for IP blocks

2023-10-09 Thread Christian König


Am 06.10.23 um 20:50 schrieb Mario Limonciello:

If any IP blocks allocate memory during their sw_fini() sequence


hw_fini instead of sw_fini? sw_fini should only be called on driver unload.


this can cause the suspend to fail under memory pressure.  Introduce
a new phase that IP blocks can use to allocate memory before suspend
starts so that it can potentially be evicted into swap instead.

Signed-off-by: Mario Limonciello 


Apart from the commit message Reviewed-by: Christian König 
.


Regards,
Christian.


---
v4->v5:
  * New patch
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 12 +++-
  drivers/gpu/drm/amd/include/amd_shared.h   |  1 +
  2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index cb334dc57c59..a362152cd0da 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4348,7 +4348,7 @@ static int amdgpu_device_evict_resources(struct 
amdgpu_device *adev)
  int amdgpu_device_prepare(struct drm_device *dev)
  {
struct amdgpu_device *adev = drm_to_adev(dev);
-   int r;
+   int i, r;
  
  	if (dev->switch_power_state == DRM_SWITCH_POWER_OFF)

return 0;
@@ -4358,6 +4358,16 @@ int amdgpu_device_prepare(struct drm_device *dev)
if (r)
return r;
  
+	for (i = 0; i < adev->num_ip_blocks; i++) {

+   if (!adev->ip_blocks[i].status.valid)
+   continue;
+   if (!adev->ip_blocks[i].version->funcs->prepare)
+   continue;
+   r = adev->ip_blocks[i].version->funcs->prepare((void *)adev);
+   if (r)
+   return r;
+   }
+
return 0;
  }
  
diff --git a/drivers/gpu/drm/amd/include/amd_shared.h b/drivers/gpu/drm/amd/include/amd_shared.h

index ce75351204bb..1f831cb747e0 100644
--- a/drivers/gpu/drm/amd/include/amd_shared.h
+++ b/drivers/gpu/drm/amd/include/amd_shared.h
@@ -299,6 +299,7 @@ struct amd_ip_funcs {
int (*hw_init)(void *handle);
int (*hw_fini)(void *handle);
void (*late_fini)(void *handle);
+   int (*prepare)(void *prepare);
int (*suspend)(void *handle);
int (*resume)(void *handle);
bool (*is_idle)(void *handle);

Re: [PATCH v5 1/7] drm/amd: Evict resources during PM ops prepare() callback

2023-10-09 Thread Christian König


Am 06.10.23 um 20:50 schrieb Mario Limonciello:

Linux PM core has a prepare() callback run before suspend.

If the system is under high memory pressure, the resources may need
to be evicted into swap instead.  If the storage backing for swap
is offlined during the suspend() step then such a call may fail.

So move this step into prepare() to move evict majority of
resources and update all non-pmops callers to call the same callback.

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2362
Signed-off-by: Mario Limonciello 


Reviewed-by: Christian König  for this one.


---
v4->v5:
  * Call amdgpu_device_prepare() from other callers to amdgpu_device_suspend()
  * 3x evict calls -> 2x evict calls
---
  drivers/gpu/drm/amd/amdgpu/amdgpu.h|  1 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 31 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c| 10 ---
  3 files changed, 34 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 4cc78e0e4304..fdb2e9ae13e2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1409,6 +1409,7 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
  void amdgpu_driver_release_kms(struct drm_device *dev);
  
  int amdgpu_device_ip_suspend(struct amdgpu_device *adev);

+int amdgpu_device_prepare(struct drm_device *dev);
  int amdgpu_device_suspend(struct drm_device *dev, bool fbcon);
  int amdgpu_device_resume(struct drm_device *dev, bool fbcon);
  u32 amdgpu_get_vblank_counter_kms(struct drm_crtc *crtc);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 0cb702c3046a..cb334dc57c59 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -1760,6 +1760,7 @@ static void amdgpu_switcheroo_set_state(struct pci_dev 
*pdev,
} else {
pr_info("switched off\n");
dev->switch_power_state = DRM_SWITCH_POWER_CHANGING;
+   amdgpu_device_prepare(dev);
amdgpu_device_suspend(dev, true);
amdgpu_device_cache_pci_state(pdev);
/* Shut down the device */
@@ -4335,6 +4336,31 @@ static int amdgpu_device_evict_resources(struct 
amdgpu_device *adev)
  /*
   * Suspend & resume.
   */
+/**
+ * amdgpu_device_prepare - prepare for device suspend
+ *
+ * @dev: drm dev pointer
+ *
+ * Prepare to put the hw in the suspend state (all asics).
+ * Returns 0 for success or an error on failure.
+ * Called at driver suspend.
+ */
+int amdgpu_device_prepare(struct drm_device *dev)
+{
+   struct amdgpu_device *adev = drm_to_adev(dev);
+   int r;
+
+   if (dev->switch_power_state == DRM_SWITCH_POWER_OFF)
+   return 0;
+
+   /* Evict the majority of BOs before starting suspend sequence */
+   r = amdgpu_device_evict_resources(adev);
+   if (r)
+   return r;
+
+   return 0;
+}
+
  /**
   * amdgpu_device_suspend - initiate device suspend
   *
@@ -4355,11 +4381,6 @@ int amdgpu_device_suspend(struct drm_device *dev, bool 
fbcon)
  
  	adev->in_suspend = true;
  
-	/* Evict the majority of BOs before grabbing the full access */

-   r = amdgpu_device_evict_resources(adev);
-   if (r)
-   return r;
-
if (amdgpu_sriov_vf(adev)) {
amdgpu_virt_fini_data_exchange(adev);
r = amdgpu_virt_request_full_gpu(adev, false);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 81affdf7c0c3..420196a17e22 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -2427,8 +2427,9 @@ static int amdgpu_pmops_prepare(struct device *dev)
/* Return a positive number here so
 * DPM_FLAG_SMART_SUSPEND works properly
 */
-   if (amdgpu_device_supports_boco(drm_dev))
-   return pm_runtime_suspended(dev);
+   if (amdgpu_device_supports_boco(drm_dev) &&
+   pm_runtime_suspended(dev))
+   return 1;
  
  	/* if we will not support s3 or s2i for the device

 *  then skip suspend
@@ -2437,7 +2438,7 @@ static int amdgpu_pmops_prepare(struct device *dev)
!amdgpu_acpi_is_s3_active(adev))
return 1;
  
-	return 0;

+   return amdgpu_device_prepare(drm_dev);
  }
  
  static void amdgpu_pmops_complete(struct device *dev)

@@ -2637,6 +2638,9 @@ static int amdgpu_pmops_runtime_suspend(struct device 
*dev)
if (amdgpu_device_supports_boco(drm_dev))
adev->mp1_state = PP_MP1_STATE_UNLOAD;
  
+	ret = amdgpu_device_prepare(drm_dev);

+   if (ret)
+   return ret;
ret = amdgpu_device_suspend(drm_dev, false);
if (ret) {
adev->in_runpm = false;

Re: [PATCH v2 4/4] drm/amd/pm: Add sysfs attribute to get pm log

2023-10-09 Thread Christian König


Am 06.10.23 um 16:24 schrieb Alex Deucher:

On Fri, Oct 6, 2023 at 4:32 AM Christian König
 wrote:

Am 06.10.23 um 07:21 schrieb Lijo Lazar:

Add sysfs attribute to read power management log. A snapshot is
captured to the buffer when the attribute is read.

Signed-off-by: Lijo Lazar 
---

v2: Pass PAGE_SIZE as the max size of input buffer

   drivers/gpu/drm/amd/pm/amdgpu_pm.c | 40 ++
   1 file changed, 40 insertions(+)

diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c 
b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
index 4c65a2fac028..5a1d21c52672 100644
--- a/drivers/gpu/drm/amd/pm/amdgpu_pm.c
+++ b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
@@ -1794,6 +1794,44 @@ static ssize_t amdgpu_set_apu_thermal_cap(struct device 
*dev,
   return count;
   }

+static int amdgpu_pmlog_attr_update(struct amdgpu_device *adev,
+ struct amdgpu_device_attr *attr,
+ uint32_t mask,
+ enum amdgpu_device_attr_states *states)
+{
+ if (amdgpu_dpm_get_pm_log(adev, NULL, 0) == -EOPNOTSUPP)
+ *states = ATTR_STATE_UNSUPPORTED;
+
+ return 0;
+}
+
+static ssize_t amdgpu_get_pmlog(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct drm_device *ddev = dev_get_drvdata(dev);
+ struct amdgpu_device *adev = drm_to_adev(ddev);
+ ssize_t size = 0;
+ int ret;
+
+ if (amdgpu_in_reset(adev))
+ return -EPERM;

Please stop using amdgpu_in_reset() for stuff like that altogether.

When this is reset critical it should grab the reset rw semaphore. If it
isn't than that check isn't necessary.

All of the power related sysfs files have this check.  It was
originally added because users have processes running which poll
various hwmon files at regular intervals and since SMU also handles
reset, we don't want to get a metrics table request while a reset is
happening otherwise the SMU gets confused.


Then this approach is completely broken. Nothing prevents the reset from 
starting right after doing the check.


If we need exclusive access to the SMU then we should just grab a lock.

Christian.



Alex


Regards,
Christian.


+ if (adev->in_suspend && !adev->in_runpm)
+ return -EPERM;
+
+ ret = pm_runtime_get_sync(ddev->dev);
+ if (ret < 0) {
+ pm_runtime_put_autosuspend(ddev->dev);
+ return ret;
+ }
+
+ size = amdgpu_dpm_get_pm_log(adev, buf, PAGE_SIZE);
+
+ pm_runtime_mark_last_busy(ddev->dev);
+ pm_runtime_put_autosuspend(ddev->dev);
+
+ return size;
+}
+
   /**
* DOC: gpu_metrics
*
@@ -2091,6 +2129,8 @@ static struct amdgpu_device_attr amdgpu_device_attrs[] = {
   AMDGPU_DEVICE_ATTR_RW(smartshift_bias,  
ATTR_FLAG_BASIC,
 .attr_update = ss_bias_attr_update),
   AMDGPU_DEVICE_ATTR_RW(xgmi_plpd_policy, 
ATTR_FLAG_BASIC),
+ AMDGPU_DEVICE_ATTR_RO(pmlog,
ATTR_FLAG_BASIC,
+   .attr_update = amdgpu_pmlog_attr_update),
   };

   static int default_attr_update(struct amdgpu_device *adev, struct 
amdgpu_device_attr *attr,

69 matches

Mail list logo