RE: [PATCH] drm/amdgpu: fix uninitialized scalar variable warning

2024-04-29 Thread Huang, Tim
[Public]

> -Original Message-
> From: Wang, Yang(Kevin) 
> Sent: Tuesday, April 30, 2024 12:14 PM
> To: Huang, Tim ; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Koenig, Christian
> 
> Subject: RE: [PATCH] drm/amdgpu: fix uninitialized scalar variable warning
>
> [Public]
>
> -Original Message-
> From: amd-gfx  On Behalf Of Huang,
> Tim
> Sent: Tuesday, April 30, 2024 11:32 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Koenig, Christian
> 
> Subject: RE: [PATCH] drm/amdgpu: fix uninitialized scalar variable warning
>
> [Public]
>
> Ping ...
> > -Original Message-
> > From: Huang, Tim 
> > Sent: Friday, April 26, 2024 9:14 AM
> > To: amd-gfx@lists.freedesktop.org
> > Cc: Deucher, Alexander ; Koenig, Christian
> > ; Huang, Tim 
> > Subject: [PATCH] drm/amdgpu: fix uninitialized scalar variable warning
> >
> > Clear warning that field bp is uninitialized when calling
> > amdgpu_virt_ras_add_bps.
> >
> > Signed-off-by: Tim Huang 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> > index 54ab51a4ada7..a2f15edfe812 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> > @@ -395,6 +395,8 @@ static void amdgpu_virt_add_bad_page(struct
> > amdgpu_device *adev,
> >   else
> >   vram_usage_va = adev->mman.drv_vram_usage_va;
> >
> > + memset(&bp, 0, sizeof(struct eeprom_table_record));
> [Kevin]:
>
> It is better to change "sizeof(struct eeprom_table_record)" to "sizeof(bp)".

Yes, agree, will change to this. Thanks.

Tim
>
> Reviewed-by: Yang Wang 
>
> Best Regards,
> Kevin
> > +
> >   if (bp_block_size) {
> >   bp_cnt = bp_block_size / sizeof(uint64_t);
> >   for (bp_idx = 0; bp_idx < bp_cnt; bp_idx++) {
> > --
> > 2.39.2
>
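
A minimal sketch of the review suggestion above, under stated assumptions:
struct bad_page_record and its fields are hypothetical stand-ins, not the
driver's eeprom_table_record, and the helper names are made up for
illustration. Tying the size to the variable with sizeof(bp) keeps the
memset correct even if the type of bp changes later.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the driver's record type; fields are illustrative. */
struct bad_page_record {
	uint64_t retired_page;
	uint64_t ts;
	uint32_t mem_channel;
};

/* Returns 1 if every byte of the record (padding included) is zero. */
static int record_is_zeroed(const struct bad_page_record *rec)
{
	const unsigned char *p = (const unsigned char *)rec;
	size_t i;

	assert(rec != NULL);
	for (i = 0; i < sizeof(*rec); i++)
		if (p[i] != 0)
			return 0;
	return 1;
}

/* Clears a stack record the way the review suggests and reports the result. */
static int clear_record_demo(void)
{
	struct bad_page_record bp;

	/* sizeof(bp) follows the variable, not a separately spelled type name. */
	memset(&bp, 0, sizeof(bp));
	return record_is_zeroed(&bp);
}
```

Calling clear_record_demo() should report a fully zeroed record, so no
field is left uninitialized before it is handed to another function.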



RE: [PATCH] drm/amdgpu: fix uninitialized scalar variable warning

2024-04-29 Thread Wang, Yang(Kevin)
[Public]

-Original Message-
From: amd-gfx  On Behalf Of Huang, Tim
Sent: Tuesday, April 30, 2024 11:32 AM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Koenig, Christian 

Subject: RE: [PATCH] drm/amdgpu: fix uninitialized scalar variable warning

[Public]

Ping ...
> -Original Message-
> From: Huang, Tim 
> Sent: Friday, April 26, 2024 9:14 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Koenig, Christian
> ; Huang, Tim 
> Subject: [PATCH] drm/amdgpu: fix uninitialized scalar variable warning
>
> Clear warning that field bp is uninitialized when calling
> amdgpu_virt_ras_add_bps.
>
> Signed-off-by: Tim Huang 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> index 54ab51a4ada7..a2f15edfe812 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> @@ -395,6 +395,8 @@ static void amdgpu_virt_add_bad_page(struct
> amdgpu_device *adev,
>   else
>   vram_usage_va = adev->mman.drv_vram_usage_va;
>
> + memset(&bp, 0, sizeof(struct eeprom_table_record));
[Kevin]:

It is better to change "sizeof(struct eeprom_table_record)" to "sizeof(bp)".

Reviewed-by: Yang Wang 

Best Regards,
Kevin
> +
>   if (bp_block_size) {
>   bp_cnt = bp_block_size / sizeof(uint64_t);
>   for (bp_idx = 0; bp_idx < bp_cnt; bp_idx++) {
> --
> 2.39.2



RE: [PATCH 1/2] drm/amd/pm: fix uninitialized variable warnings for vega10_hwmgr

2024-04-29 Thread Wang, Yang(Kevin)
[AMD Official Use Only - General]

Series is
Reviewed-by: Yang Wang 

Best Regards,
Kevin

-Original Message-
From: amd-gfx  On Behalf Of Huang, Tim
Sent: Tuesday, April 30, 2024 11:28 AM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Koenig, Christian 

Subject: RE: [PATCH 1/2] drm/amd/pm: fix uninitialized variable warnings for 
vega10_hwmgr

[AMD Official Use Only - General]

Ping ...

> -Original Message-
> From: Huang, Tim 
> Sent: Sunday, April 28, 2024 4:45 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Koenig, Christian
> ; Huang, Tim 
> Subject: [PATCH 1/2] drm/amd/pm: fix uninitialized variable warnings
> for vega10_hwmgr
>
> Clear warnings about using an uninitialized variable when the driver fails
> to get a valid value from the SMU.
>
> Signed-off-by: Tim Huang 
> ---
>  .../drm/amd/pm/powerplay/hwmgr/vega10_hwmgr.c | 46 ++-
>  .../amd/pm/powerplay/smumgr/vega10_smumgr.c   |  6 ++-
>  2 files changed, 39 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/vega10_hwmgr.c
> b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/vega10_hwmgr.c
> index 9f5bd998c6bf..488ad9de4694 100644
> --- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/vega10_hwmgr.c
> +++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/vega10_hwmgr.c
> @@ -354,13 +354,13 @@ static int
> vega10_odn_initial_default_setting(struct
> pp_hwmgr *hwmgr)
>   return 0;
>  }
>
> -static void vega10_init_dpm_defaults(struct pp_hwmgr *hwmgr)
> +static int vega10_init_dpm_defaults(struct pp_hwmgr *hwmgr)
>  {
>   struct vega10_hwmgr *data = hwmgr->backend;
> - int i;
>   uint32_t sub_vendor_id, hw_revision;
>   uint32_t top32, bottom32;
>   struct amdgpu_device *adev = hwmgr->adev;
> + int ret, i;
>
>   vega10_initialize_power_tune_defaults(hwmgr);
>
> @@ -485,9 +485,12 @@ static void vega10_init_dpm_defaults(struct
> pp_hwmgr
> *hwmgr)
>   if (data->registry_data.vr0hot_enabled)
>   data->smu_features[GNLD_VR0HOT].supported = true;
>
> - smum_send_msg_to_smc(hwmgr,
> + ret = smum_send_msg_to_smc(hwmgr,
>   PPSMC_MSG_GetSmuVersion,
>   &hwmgr->smu_version);
> + if (ret)
> + return ret;
> +
>   /* ACG firmware has major version 5 */
>   if ((hwmgr->smu_version & 0xff00) == 0x500)
>   data->smu_features[GNLD_ACG].supported = true;
> @@ -505,10 +508,16 @@ static void vega10_init_dpm_defaults(struct pp_hwmgr *hwmgr)
>   data->smu_features[GNLD_PCC_LIMIT].supported = true;
>
>   /* Get the SN to turn into a Unique ID */
> - smum_send_msg_to_smc(hwmgr, PPSMC_MSG_ReadSerialNumTop32, &top32);
> - smum_send_msg_to_smc(hwmgr, PPSMC_MSG_ReadSerialNumBottom32, &bottom32);
> + ret = smum_send_msg_to_smc(hwmgr, PPSMC_MSG_ReadSerialNumTop32, &top32);
> + if (ret)
> + return ret;
> +
> + ret = smum_send_msg_to_smc(hwmgr, PPSMC_MSG_ReadSerialNumBottom32, &bottom32);
> + if (ret)
> + return ret;
>
>   adev->unique_id = ((uint64_t)bottom32 << 32) | top32;
> + return 0;
>  }
>
>  #ifdef PPLIB_VEGA10_EVV_SUPPORT
> @@ -882,7 +891,9 @@ static int vega10_hwmgr_backend_init(struct
> pp_hwmgr
> *hwmgr)
>
>   vega10_set_features_platform_caps(hwmgr);
>
> - vega10_init_dpm_defaults(hwmgr);
> + result = vega10_init_dpm_defaults(hwmgr);
> + if (result)
> + return result;
>
>  #ifdef PPLIB_VEGA10_EVV_SUPPORT
>   /* Get leakage voltage based on leakage ID. */
> @@ -3900,11 +3911,14 @@ static int vega10_get_gpu_power(struct pp_hwmgr *hwmgr,
>   uint32_t *query)
>  {
>   uint32_t value;
> + int ret;
>
>   if (!query)
>   return -EINVAL;
>
> - smum_send_msg_to_smc(hwmgr, PPSMC_MSG_GetCurrPkgPwr, &value);
> + ret = smum_send_msg_to_smc(hwmgr, PPSMC_MSG_GetCurrPkgPwr, &value);
> + if (ret)
> + return ret;
>
>   /* SMC returning actual watts, keep consistent with legacy
> asics, low 8 bit as 8 fractional bits */
>   *query = value << 8;
> @@ -4800,14 +4814,16 @@ static int vega10_print_clock_levels(struct
> pp_hwmgr *hwmgr,
>   uint32_t gen_speed, lane_width, current_gen_speed,
> current_lane_width;
>   PPTable_t *pptable = &(data->smc_state_table.pp_table);
>
> - int i, now, size = 0, count = 0;
> + int i, ret, now,  size = 0, count = 0;
>
>   switch (type) {
>   case PP_SCLK:
>   if (data->registry_data.sclk_dpm_key_disabled)
>   break;
>
> - smum_send_msg_to_smc(hwmgr, PPSMC_MSG_GetCurrentGfxclkIndex, &now);
> + ret = smum_send_msg_to_smc(hwmgr, PPSMC_MSG_GetCurrentGfxclkIndex, &now);
> + if (ret)
> + break;
>
>   if (hwmgr->pp_one_vf &&
>   (hwmgr->dpm_level ==
> AMD_DPM_FORCED_LEVEL_PROFILE_PEAK))
> @@ -4823,7 +4839,9 @@ static int vega10_print_clock_levels(struct
> 

RE: [PATCH] drm/amdgpu: fix uninitialized scalar variable warning

2024-04-29 Thread Huang, Tim
[Public]

Ping ...
> -Original Message-
> From: Huang, Tim 
> Sent: Friday, April 26, 2024 9:14 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Koenig, Christian
> ; Huang, Tim 
> Subject: [PATCH] drm/amdgpu: fix uninitialized scalar variable warning
>
> Clear warning that field bp is uninitialized when calling
> amdgpu_virt_ras_add_bps.
>
> Signed-off-by: Tim Huang 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> index 54ab51a4ada7..a2f15edfe812 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> @@ -395,6 +395,8 @@ static void amdgpu_virt_add_bad_page(struct
> amdgpu_device *adev,
>   else
>   vram_usage_va = adev->mman.drv_vram_usage_va;
>
> + memset(&bp, 0, sizeof(struct eeprom_table_record));
> +
>   if (bp_block_size) {
>   bp_cnt = bp_block_size / sizeof(uint64_t);
>   for (bp_idx = 0; bp_idx < bp_cnt; bp_idx++) {
> --
> 2.39.2



RE: [PATCH 1/2] drm/amd/pm: fix uninitialized variable warnings for vega10_hwmgr

2024-04-29 Thread Huang, Tim
[AMD Official Use Only - General]

Ping ...

> -Original Message-
> From: Huang, Tim 
> Sent: Sunday, April 28, 2024 4:45 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Koenig, Christian
> ; Huang, Tim 
> Subject: [PATCH 1/2] drm/amd/pm: fix uninitialized variable warnings for
> vega10_hwmgr
>
> Clear warnings about using an uninitialized variable when the driver fails
> to get a valid value from the SMU.
>
> Signed-off-by: Tim Huang 
> ---
>  .../drm/amd/pm/powerplay/hwmgr/vega10_hwmgr.c | 46 ++-
>  .../amd/pm/powerplay/smumgr/vega10_smumgr.c   |  6 ++-
>  2 files changed, 39 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/vega10_hwmgr.c
> b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/vega10_hwmgr.c
> index 9f5bd998c6bf..488ad9de4694 100644
> --- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/vega10_hwmgr.c
> +++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/vega10_hwmgr.c
> @@ -354,13 +354,13 @@ static int vega10_odn_initial_default_setting(struct
> pp_hwmgr *hwmgr)
>   return 0;
>  }
>
> -static void vega10_init_dpm_defaults(struct pp_hwmgr *hwmgr)
> +static int vega10_init_dpm_defaults(struct pp_hwmgr *hwmgr)
>  {
>   struct vega10_hwmgr *data = hwmgr->backend;
> - int i;
>   uint32_t sub_vendor_id, hw_revision;
>   uint32_t top32, bottom32;
>   struct amdgpu_device *adev = hwmgr->adev;
> + int ret, i;
>
>   vega10_initialize_power_tune_defaults(hwmgr);
>
> @@ -485,9 +485,12 @@ static void vega10_init_dpm_defaults(struct pp_hwmgr
> *hwmgr)
>   if (data->registry_data.vr0hot_enabled)
>   data->smu_features[GNLD_VR0HOT].supported = true;
>
> - smum_send_msg_to_smc(hwmgr,
> + ret = smum_send_msg_to_smc(hwmgr,
>   PPSMC_MSG_GetSmuVersion,
>   &hwmgr->smu_version);
> + if (ret)
> + return ret;
> +
>   /* ACG firmware has major version 5 */
>   if ((hwmgr->smu_version & 0xff00) == 0x500)
>   data->smu_features[GNLD_ACG].supported = true;
> @@ -505,10 +508,16 @@ static void vega10_init_dpm_defaults(struct pp_hwmgr *hwmgr)
>   data->smu_features[GNLD_PCC_LIMIT].supported = true;
>
>   /* Get the SN to turn into a Unique ID */
> - smum_send_msg_to_smc(hwmgr, PPSMC_MSG_ReadSerialNumTop32, &top32);
> - smum_send_msg_to_smc(hwmgr, PPSMC_MSG_ReadSerialNumBottom32, &bottom32);
> + ret = smum_send_msg_to_smc(hwmgr, PPSMC_MSG_ReadSerialNumTop32, &top32);
> + if (ret)
> + return ret;
> +
> + ret = smum_send_msg_to_smc(hwmgr, PPSMC_MSG_ReadSerialNumBottom32, &bottom32);
> + if (ret)
> + return ret;
>
>   adev->unique_id = ((uint64_t)bottom32 << 32) | top32;
> + return 0;
>  }
>
>  #ifdef PPLIB_VEGA10_EVV_SUPPORT
> @@ -882,7 +891,9 @@ static int vega10_hwmgr_backend_init(struct pp_hwmgr
> *hwmgr)
>
>   vega10_set_features_platform_caps(hwmgr);
>
> - vega10_init_dpm_defaults(hwmgr);
> + result = vega10_init_dpm_defaults(hwmgr);
> + if (result)
> + return result;
>
>  #ifdef PPLIB_VEGA10_EVV_SUPPORT
>   /* Get leakage voltage based on leakage ID. */
> @@ -3900,11 +3911,14 @@ static int vega10_get_gpu_power(struct pp_hwmgr *hwmgr,
>   uint32_t *query)
>  {
>   uint32_t value;
> + int ret;
>
>   if (!query)
>   return -EINVAL;
>
> - smum_send_msg_to_smc(hwmgr, PPSMC_MSG_GetCurrPkgPwr, &value);
> + ret = smum_send_msg_to_smc(hwmgr, PPSMC_MSG_GetCurrPkgPwr, &value);
> + if (ret)
> + return ret;
>
>   /* SMC returning actual watts, keep consistent with legacy asics, low 8
> bit as 8 fractional bits */
>   *query = value << 8;
> @@ -4800,14 +4814,16 @@ static int vega10_print_clock_levels(struct
> pp_hwmgr *hwmgr,
>   uint32_t gen_speed, lane_width, current_gen_speed,
> current_lane_width;
>   PPTable_t *pptable = &(data->smc_state_table.pp_table);
>
> - int i, now, size = 0, count = 0;
> + int i, ret, now,  size = 0, count = 0;
>
>   switch (type) {
>   case PP_SCLK:
>   if (data->registry_data.sclk_dpm_key_disabled)
>   break;
>
> - smum_send_msg_to_smc(hwmgr, PPSMC_MSG_GetCurrentGfxclkIndex, &now);
> + ret = smum_send_msg_to_smc(hwmgr, PPSMC_MSG_GetCurrentGfxclkIndex, &now);
> + if (ret)
> + break;
>
>   if (hwmgr->pp_one_vf &&
>   (hwmgr->dpm_level ==
> AMD_DPM_FORCED_LEVEL_PROFILE_PEAK))
> @@ -4823,7 +4839,9 @@ static int vega10_print_clock_levels(struct pp_hwmgr
> *hwmgr,
>   if (data->registry_data.mclk_dpm_key_disabled)
>   break;
>
> - smum_send_msg_to_smc(hwmgr, PPSMC_MSG_GetCurrentUclkIndex, &now);
> + ret = smum_send_msg_to_smc(hwmgr, PPSMC_MSG_GetCurrentUclkIndex, &now);
> + if (ret)
> + break;
>
>   for (i = 0; i < mclk_table->count; i++)
>   
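
The pattern this patch applies can be sketched in isolation. This is only
an illustration under stated assumptions: query_firmware() is a
hypothetical stand-in for smum_send_msg_to_smc(), and its should_fail
parameter exists just to simulate a failed SMU read. The point is that the
output variable is read only after the return code has been checked.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a firmware query: fills *out only on success. */
static int query_firmware(int should_fail, uint32_t *out)
{
	assert(out != NULL);
	if (should_fail)
		return -EIO;	/* *out stays untouched, as on a failed read */
	*out = 42;
	return 0;
}

/* Mirrors the patched pattern: check ret before touching the output value. */
static int get_power(int should_fail, uint32_t *query)
{
	uint32_t value;
	int ret;

	if (!query)
		return -EINVAL;

	ret = query_firmware(should_fail, &value);
	if (ret)
		return ret;	/* value was never written, so do not read it */

	*query = value << 8;	/* low 8 bits as fractional bits, as in the patch */
	return 0;
}
```

On failure the caller gets the error code and *query keeps whatever it held
before the call, so no indeterminate value can leak out.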

RE: [PATCH 2/3] drm/amd/pm: fix the Out-of-bounds read warning

2024-04-29 Thread Huang, Tim
[Public]

This patch is,


Reviewed-by: Tim Huang 

Best Regards,
Tim Huang


> -Original Message-
> From: Jesse Zhang 
> Sent: Friday, April 26, 2024 3:29 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Koenig, Christian
> ; Huang, Tim ; Zhang,
> Jesse(Jie) ; Zhang, Jesse(Jie) 
> Subject: [PATCH 2/3] drm/amd/pm: fix the Out-of-bounds read warning
>
> Using index i - 1U may go beyond the valid element range of mc_data[] when i = 0.
>
> Signed-off-by: Jesse Zhang 
> ---
>  drivers/gpu/drm/amd/pm/powerplay/hwmgr/ppatomctrl.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/ppatomctrl.c
> b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/ppatomctrl.c
> index b1b4c09c3467..b56298d9da98 100644
> --- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/ppatomctrl.c
> +++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/ppatomctrl.c
> @@ -73,8 +73,9 @@ static int atomctrl_retrieve_ac_timing(
>   j++;
>   } else if ((table->mc_reg_address[i].uc_pre_reg_data &
>   LOW_NIBBLE_MASK) == DATA_EQU_PREV) {
> - table->mc_reg_table_entry[num_ranges].mc_data[i] =
> - table->mc_reg_table_entry[num_ranges].mc_data[i-1];
> + if (i)
> + table->mc_reg_table_entry[num_ranges].mc_data[i] =
> + table->mc_reg_table_entry[num_ranges].mc_data[i-1];
>   }
>   }
>   num_ranges++;
> --
> 2.25.1
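
A small standalone sketch of the guard, assuming a simplified data layout:
the entry_kind enum and fill_entries() are hypothetical and only mimic the
"same as previous entry" case from the patch. With an unsigned index,
i - 1 wraps around at i = 0, so the copy is skipped when no previous
element exists.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry kinds; the driver encodes this in uc_pre_reg_data. */
enum entry_kind { ENTRY_FROM_SOURCE, ENTRY_SAME_AS_PREV };

/* Fills dst from src, or repeats the previous element where requested. */
static void fill_entries(const enum entry_kind *kind, const uint32_t *src,
			 uint32_t *dst, size_t n)
{
	size_t i;

	assert(kind != NULL && src != NULL && dst != NULL);
	for (i = 0; i < n; i++) {
		if (kind[i] == ENTRY_FROM_SOURCE) {
			dst[i] = src[i];
		} else if (i > 0) {
			/* Copy only when a previous element exists. */
			dst[i] = dst[i - 1];
		}
		/* At i == 0 with ENTRY_SAME_AS_PREV, dst[0] keeps its old value. */
	}
}
```

As in the patch, the first element is simply left alone when it asks for a
previous value that does not exist.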



RE: [PATCH 1/3 V2] drm/amd/pm: Fix negative array index read warning for pptable->DpmDescriptor

2024-04-29 Thread Huang, Tim
[Public]

This patch is,

Reviewed-by: Tim Huang 

Best Regards,
Tim Huang



> -Original Message-
> From: Jesse Zhang 
> Sent: Sunday, April 28, 2024 5:38 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Koenig, Christian
> ; Huang, Tim ; Zhang,
> Jesse(Jie) ; Zhang, Jesse(Jie) 
> Subject: [PATCH 1/3 V2] drm/amd/pm: Fix negative array index read warning for
> pptable->DpmDescriptor
>
> Avoid using negative values of clk_index as an index into the array
> pptable->DpmDescriptor.
>
> V2: fix clk_index return check (Tim Huang)
>
> Signed-off-by: Jesse Zhang 
> ---
>  .../gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c   | 27 ++-
>  1 file changed, 21 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
> b/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
> index 5a68d365967f..c06e0d6e3017 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
> @@ -1219,19 +1219,22 @@ static int
> navi10_get_current_clk_freq_by_table(struct smu_context *smu,
>  value);
>  }
>
> -static bool navi10_is_support_fine_grained_dpm(struct smu_context *smu,
> enum smu_clk_type clk_type)
> +static int navi10_is_support_fine_grained_dpm(struct smu_context *smu,
> +enum smu_clk_type clk_type)
>  {
>   PPTable_t *pptable = smu->smu_table.driver_pptable;
>   DpmDescriptor_t *dpm_desc = NULL;
> - uint32_t clk_index = 0;
> + int clk_index = 0;
>
>   clk_index = smu_cmn_to_asic_specific_index(smu,
>  CMN2ASIC_MAPPING_CLK,
>  clk_type);
> + if (clk_index < 0)
> + return clk_index;
> +
>   dpm_desc = &pptable->DpmDescriptor[clk_index];
>
>   /* 0 - Fine grained DPM, 1 - Discrete DPM */
> - return dpm_desc->SnapToDiscrete == 0;
> + return dpm_desc->SnapToDiscrete == 0 ? 1 : 0;
>  }
>
>  static inline bool navi10_od_feature_is_supported(struct
> smu_11_0_overdrive_table *od_table, enum SMU_11_0_ODFEATURE_CAP cap)
> @@ -1287,7 +1290,11 @@ static int navi10_emit_clk_levels(struct smu_context
> *smu,
>   if (ret)
>   return ret;
>
> - if (!navi10_is_support_fine_grained_dpm(smu, clk_type)) {
> + ret = navi10_is_support_fine_grained_dpm(smu, clk_type);
> + if (ret < 0)
> + return ret;
> +
> + if (!ret) {
>   for (i = 0; i < count; i++) {
>   ret = smu_v11_0_get_dpm_freq_by_index(smu,
> clk_type, 
> i,
> );
> @@ -1496,7 +1503,11 @@ static int navi10_print_clk_levels(struct smu_context
> *smu,
>   if (ret)
>   return size;
>
> - if (!navi10_is_support_fine_grained_dpm(smu, clk_type)) {
> + ret = navi10_is_support_fine_grained_dpm(smu, clk_type);
> + if (ret < 0)
> + return ret;
> +
> + if (!ret) {
>   for (i = 0; i < count; i++) {
>   ret = smu_v11_0_get_dpm_freq_by_index(smu,
> clk_type, i, );
>   if (ret)
> @@ -1665,7 +1676,11 @@ static int navi10_force_clk_levels(struct smu_context
> *smu,
>   case SMU_UCLK:
>   case SMU_FCLK:
>   /* There is only 2 levels for fine grained DPM */
> - if (navi10_is_support_fine_grained_dpm(smu, clk_type)) {
> + ret = navi10_is_support_fine_grained_dpm(smu, clk_type);
> + if (ret < 0)
> + return ret;
> +
> + if (ret) {
>   soft_max_level = (soft_max_level >= 1 ? 1 : 0);
>   soft_min_level = (soft_min_level >= 1 ? 1 : 0);
>   }
> --
> 2.25.1
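
The change from a bool to an int return can be sketched as follows. All
names here are hypothetical (clk_type_to_index() stands in for
smu_cmn_to_asic_specific_index(), and the table contents are made up). The
helper returns 1 or 0 for the answer and a negative errno for a bad
mapping, so callers test for an error first and only then treat the value
as a boolean.

```c
#include <assert.h>
#include <errno.h>

#define NUM_CLOCKS 3

/* Hypothetical descriptor table: 0 = fine grained, 1 = discrete. */
static const int snap_to_discrete[NUM_CLOCKS] = { 0, 1, 0 };

/* Hypothetical mapping helper: returns an index, or a negative errno. */
static int clk_type_to_index(int clk_type)
{
	if (clk_type < 0 || clk_type >= NUM_CLOCKS)
		return -EINVAL;
	return clk_type;
}

/* Returns 1 if fine grained, 0 if discrete, negative errno on bad mapping. */
static int is_fine_grained(int clk_type)
{
	int clk_index = clk_type_to_index(clk_type);

	if (clk_index < 0)
		return clk_index;	/* never index the table with an error code */
	assert(clk_index < NUM_CLOCKS);
	return snap_to_discrete[clk_index] == 0 ? 1 : 0;
}

/* Caller pattern: handle the error first, then use the value as a boolean. */
static int levels_for(int clk_type)
{
	int ret = is_fine_grained(clk_type);

	if (ret < 0)
		return ret;
	return ret ? 2 : 8;	/* 8 is an arbitrary stand-in for a discrete count */
}
```

Keeping clk_index signed matters here: stored in a uint32_t, a negative
error code would turn into a very large index instead of failing the
check.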



RE: [PATCH 1/3 V2] drm/amd/pm: Fix negative array index read warning for pptable->DpmDescriptor

2024-04-29 Thread Zhang, Jesse(Jie)
[AMD Official Use Only - General]

Ping ...

-Original Message-
From: Jesse Zhang 
Sent: Sunday, April 28, 2024 5:38 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Koenig, Christian 
; Huang, Tim ; Zhang, Jesse(Jie) 
; Zhang, Jesse(Jie) 
Subject: [PATCH 1/3 V2] drm/amd/pm: Fix negative array index read warning for 
pptable->DpmDescriptor

Avoid using negative values of clk_index as an index into the array
pptable->DpmDescriptor.

V2: fix clk_index return check (Tim Huang)

Signed-off-by: Jesse Zhang 
---
 .../gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c   | 27 ++-
 1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
index 5a68d365967f..c06e0d6e3017 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
@@ -1219,19 +1219,22 @@ static int navi10_get_current_clk_freq_by_table(struct 
smu_context *smu,
   value);
 }

-static bool navi10_is_support_fine_grained_dpm(struct smu_context *smu, enum 
smu_clk_type clk_type)
+static int navi10_is_support_fine_grained_dpm(struct smu_context *smu,
+enum smu_clk_type clk_type)
 {
PPTable_t *pptable = smu->smu_table.driver_pptable;
DpmDescriptor_t *dpm_desc = NULL;
-   uint32_t clk_index = 0;
+   int clk_index = 0;

clk_index = smu_cmn_to_asic_specific_index(smu,
   CMN2ASIC_MAPPING_CLK,
   clk_type);
+   if (clk_index < 0)
+   return clk_index;
+
dpm_desc = &pptable->DpmDescriptor[clk_index];

/* 0 - Fine grained DPM, 1 - Discrete DPM */
-   return dpm_desc->SnapToDiscrete == 0;
+   return dpm_desc->SnapToDiscrete == 0 ? 1 : 0;
 }

 static inline bool navi10_od_feature_is_supported(struct smu_11_0_overdrive_table *od_table, enum SMU_11_0_ODFEATURE_CAP cap)
@@ -1287,7 +1290,11 @@ static int navi10_emit_clk_levels(struct smu_context *smu,
if (ret)
return ret;

-   if (!navi10_is_support_fine_grained_dpm(smu, clk_type)) {
+   ret = navi10_is_support_fine_grained_dpm(smu, clk_type);
+   if (ret < 0)
+   return ret;
+
+   if (!ret) {
for (i = 0; i < count; i++) {
ret = smu_v11_0_get_dpm_freq_by_index(smu,
  clk_type, 
i, );
@@ -1496,7 +1503,11 @@ static int navi10_print_clk_levels(struct smu_context 
*smu,
if (ret)
return size;

-   if (!navi10_is_support_fine_grained_dpm(smu, clk_type)) {
+   ret = navi10_is_support_fine_grained_dpm(smu, clk_type);
+   if (ret < 0)
+   return ret;
+
+   if (!ret) {
for (i = 0; i < count; i++) {
ret = smu_v11_0_get_dpm_freq_by_index(smu, 
clk_type, i, );
if (ret)
@@ -1665,7 +1676,11 @@ static int navi10_force_clk_levels(struct smu_context 
*smu,
case SMU_UCLK:
case SMU_FCLK:
/* There is only 2 levels for fine grained DPM */
-   if (navi10_is_support_fine_grained_dpm(smu, clk_type)) {
+   ret = navi10_is_support_fine_grained_dpm(smu, clk_type);
+   if (ret < 0)
+   return ret;
+
+   if (ret) {
soft_max_level = (soft_max_level >= 1 ? 1 : 0);
soft_min_level = (soft_min_level >= 1 ? 1 : 0);
}
--
2.25.1



Re: [PATCH 2/3] drm/amdgpu: Reduce mem_type to domain double indirection

2024-04-29 Thread Felix Kuehling



On 2024-04-29 12:47, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

All memory domains apart from AMDGPU_GEM_DOMAIN_GTT map 1:1 to TTM
placements. The GTT domain can be either AMDGPU_PL_PREEMPT or TTM_PL_TT,
depending on AMDGPU_GEM_CREATE_PREEMPTIBLE.

Simplify a few places in the code which convert the TTM placement into
a domain by checking against the current placement directly.

In the conversion, AMDGPU_PL_PREEMPT does not have to be handled either,
because amdgpu_mem_type_to_domain() cannot return that value anyway.

v2:
  * Remove AMDGPU_PL_PREEMPT handling.

Signed-off-by: Tvrtko Ursulin 
Reviewed-by: Christian König  # v1

Reviewed-by: Felix Kuehling 

I also ran kfdtest on a multi-GPU system just to make sure this didn't 
break our multi-GPU support. BTW, I had to fix up some things when I 
tried to apply your patch to the current amd-staging-drm-next branch. 
That branch was just rebased on Linux 6.8, so maybe that's part of the 
reason.
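
For readers following along, the simplification can be sketched outside
the driver. This is an illustration only: the enum values and function
names below are hypothetical stand-ins for the TTM placement and amdgpu
domain constants, and only the three placements that map 1:1 are modelled
(the preemptible case is deliberately left out).

```c
#include <assert.h>

/* Hypothetical stand-ins; the real constants come from the TTM and amdgpu headers. */
enum placement { PL_SYSTEM, PL_TT, PL_VRAM, PL_COUNT };
enum { DOMAIN_CPU = 0x1, DOMAIN_GTT = 0x2, DOMAIN_VRAM = 0x4 };

/* One-to-one conversion for the three placements modelled here. */
static unsigned int placement_to_domain(enum placement pl)
{
	assert(pl < PL_COUNT);
	switch (pl) {
	case PL_VRAM:
		return DOMAIN_VRAM;
	case PL_TT:
		return DOMAIN_GTT;
	case PL_SYSTEM:
		return DOMAIN_CPU;
	default:
		return 0;
	}
}

/* Old style: convert first, then compare against a domain. */
static int in_vram_via_domain(enum placement pl)
{
	return placement_to_domain(pl) == DOMAIN_VRAM;
}

/* New style: compare the placement directly. */
static int in_vram_direct(enum placement pl)
{
	return pl == PL_VRAM;
}

/* Returns 1 if both styles agree for every modelled placement. */
static int styles_agree(void)
{
	int pl;

	for (pl = PL_SYSTEM; pl < PL_COUNT; pl++)
		if (in_vram_via_domain((enum placement)pl) !=
		    in_vram_direct((enum placement)pl))
			return 0;
	return 1;
}
```

Because the mapping is 1:1 for these placements, the direct comparison
gives the same answer with one less level of indirection.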




---
  drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c |  3 +--
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c  | 27 +
  2 files changed, 12 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
index 055ba2ea4c12..0b3b10d21952 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
@@ -165,8 +165,7 @@ static struct sg_table *amdgpu_dma_buf_map(struct 
dma_buf_attachment *attach,
if (r)
return ERR_PTR(r);
  
-	} else if (!(amdgpu_mem_type_to_domain(bo->tbo.resource->mem_type) &

-AMDGPU_GEM_DOMAIN_GTT)) {
+   } else if (bo->tbo.resource->mem_type != TTM_PL_TT) {
return ERR_PTR(-EBUSY);
}
  
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c

index 8bc79924d171..eb5bd6962560 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -976,12 +976,11 @@ int amdgpu_bo_pin_restricted(struct amdgpu_bo *bo, u32 
domain,
  
 	ttm_bo_pin(&bo->tbo);
 
-	domain = amdgpu_mem_type_to_domain(bo->tbo.resource->mem_type);
-	if (domain == AMDGPU_GEM_DOMAIN_VRAM) {
+	if (bo->tbo.resource->mem_type == TTM_PL_VRAM) {
 		atomic64_add(amdgpu_bo_size(bo), &adev->vram_pin_size);
 		atomic64_add(amdgpu_vram_mgr_bo_visible_size(bo),
 			     &adev->visible_pin_size);
-	} else if (domain == AMDGPU_GEM_DOMAIN_GTT) {
+	} else if (bo->tbo.resource->mem_type == TTM_PL_TT) {
 		atomic64_add(amdgpu_bo_size(bo), &adev->gart_pin_size);
 	}
}
  
@@ -1280,7 +1279,6 @@ void amdgpu_bo_get_memory(struct amdgpu_bo *bo,

  {
uint64_t size = amdgpu_bo_size(bo);
struct drm_gem_object *obj;
-   unsigned int domain;
bool shared;
  
  	/* Abort if the BO doesn't currently have a backing store */

@@ -1290,21 +1288,20 @@ void amdgpu_bo_get_memory(struct amdgpu_bo *bo,
obj = &bo->tbo.base;
shared = drm_gem_object_is_shared_for_memory_stats(obj);
  
-	domain = amdgpu_mem_type_to_domain(bo->tbo.resource->mem_type);

-   switch (domain) {
-   case AMDGPU_GEM_DOMAIN_VRAM:
+   switch (bo->tbo.resource->mem_type) {
+   case TTM_PL_VRAM:
stats->vram += size;
if (amdgpu_bo_in_cpu_visible_vram(bo))
stats->visible_vram += size;
if (shared)
stats->vram_shared += size;
break;
-   case AMDGPU_GEM_DOMAIN_GTT:
+   case TTM_PL_TT:
stats->gtt += size;
if (shared)
stats->gtt_shared += size;
break;
-   case AMDGPU_GEM_DOMAIN_CPU:
+   case TTM_PL_SYSTEM:
default:
stats->cpu += size;
if (shared)
@@ -1317,7 +1314,7 @@ void amdgpu_bo_get_memory(struct amdgpu_bo *bo,
if (bo->flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED)
stats->requested_visible_vram += size;
  
-		if (domain != AMDGPU_GEM_DOMAIN_VRAM) {

+   if (bo->tbo.resource->mem_type != TTM_PL_VRAM) {
stats->evicted_vram += size;
if (bo->flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED)
stats->evicted_visible_vram += size;
@@ -1592,19 +1589,17 @@ u64 amdgpu_bo_print_info(int id, struct amdgpu_bo *bo, 
struct seq_file *m)
u64 size;
  
  	if (dma_resv_trylock(bo->tbo.base.resv)) {

-   unsigned int domain;
-   domain = amdgpu_mem_type_to_domain(bo->tbo.resource->mem_type);
-   switch (domain) {
-   case AMDGPU_GEM_DOMAIN_VRAM:
+   switch (bo->tbo.resource->mem_type) {
+   case TTM_PL_VRAM:
if (amdgpu_bo_in_cpu_visible_vram(bo))

Re: [PATCH] drm/amdkfd: update buffer_{store,load}_* modifiers for gfx940

2024-04-29 Thread Felix Kuehling

On 2024-04-29 17:50, Jay Cornwall wrote:

On 4/29/2024 06:06, Lancelot SIX wrote:

Instruction modifiers of the untyped vector memory buffer instructions
(MUBUF encoded) changed in gfx940.  The slc, scc and glc modifiers have
been replaced with sc0, sc1 and nt.

The current CWSR trap handler is written using pre-gfx940 modifier
names, making the source incompatible with a strict gfx940 assembler.

This patch updates the cwsr_trap_handler_gfx9.asm source file to be
compatible with all gfx9 variants of the ISA.  The binary assembled code
is unchanged (so the behaviour is unchanged as well), only the source
representation is updated.

Signed-off-by: Lancelot SIX 
---
  .../drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm | 24 ---
  1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm 
b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm

index bb26338204f4..a2d597d7fb57 100644
--- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm
+++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm
@@ -48,6 +48,12 @@ var ACK_SQC_STORE    = 1    
//workaround for suspected SQC store bug causing
  var SAVE_AFTER_XNACK_ERROR    =    1 //workaround for TCP store 
failure after XNACK error when ALLOW_REPLAY=0, for debugger
  var SINGLE_STEP_MISSED_WORKAROUND   =    (ASIC_FAMILY <= 
CHIP_ALDEBARAN)    //workaround for lost MODE.DEBUG_EN exception when 
SAVECTX raised

  +#if ASIC_FAMILY < CHIP_GC_9_4_3
+#define VMEM_MODIFIERS slc:1 glc:1
+#else
+#define VMEM_MODIFIERS sc0:1 nt:1
+#endif
+
/**/
  /*    variables  */
/**/
@@ -581,7 +587,7 @@ end
  L_SAVE_LDS_LOOP_VECTOR:
    ds_read_b64 v[0:1], v2    //x =LDS[a], byte address
    s_waitcnt lgkmcnt(0)
-  buffer_store_dwordx2  v[0:1], v2, s_save_buf_rsrc0, 
s_save_mem_offset offen:1  glc:1  slc:1
+  buffer_store_dwordx2  v[0:1], v2, s_save_buf_rsrc0, 
s_save_mem_offset VMEM_MODIFIERS offen:1

  //    s_waitcnt vmcnt(0)
  //    v_add_u32 v2, vcc[0:1], v2, v3
    v_add_u32 v2, v2, v3
@@ -979,17 +985,17 @@ L_TCP_STORE_CHECK_DONE:
  end
    function write_4vgprs_to_mem(s_rsrc, s_mem_offset)
-    buffer_store_dword v0, v0, s_rsrc, s_mem_offset slc:1 glc:1
-    buffer_store_dword v1, v0, s_rsrc, s_mem_offset slc:1 glc:1  
offset:256
-    buffer_store_dword v2, v0, s_rsrc, s_mem_offset slc:1 glc:1  
offset:256*2
-    buffer_store_dword v3, v0, s_rsrc, s_mem_offset slc:1 glc:1  
offset:256*3

+    buffer_store_dword v0, v0, s_rsrc, s_mem_offset VMEM_MODIFIERS
+    buffer_store_dword v1, v0, s_rsrc, s_mem_offset VMEM_MODIFIERS 
offset:256
+    buffer_store_dword v2, v0, s_rsrc, s_mem_offset VMEM_MODIFIERS 
offset:256*2
+    buffer_store_dword v3, v0, s_rsrc, s_mem_offset VMEM_MODIFIERS 
offset:256*3

  end
    function read_4vgprs_from_mem(s_rsrc, s_mem_offset)
-    buffer_load_dword v0, v0, s_rsrc, s_mem_offset slc:1 glc:1
-    buffer_load_dword v1, v0, s_rsrc, s_mem_offset slc:1 glc:1 
offset:256
-    buffer_load_dword v2, v0, s_rsrc, s_mem_offset slc:1 glc:1 
offset:256*2
-    buffer_load_dword v3, v0, s_rsrc, s_mem_offset slc:1 glc:1 
offset:256*3

+    buffer_load_dword v0, v0, s_rsrc, s_mem_offset VMEM_MODIFIERS
+    buffer_load_dword v1, v0, s_rsrc, s_mem_offset VMEM_MODIFIERS 
offset:256
+    buffer_load_dword v2, v0, s_rsrc, s_mem_offset VMEM_MODIFIERS 
offset:256*2
+    buffer_load_dword v3, v0, s_rsrc, s_mem_offset VMEM_MODIFIERS 
offset:256*3

  s_waitcnt vmcnt(0)
  end

base-commit: cf743996352e327f483dc7d66606c90276f57380


Reviewed-by: Jay Cornwall 


Acked-by: Felix Kuehling 

Do you need me to submit the patch to amd-staging-drm-next?

Thanks,
  Felix
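
The preprocessor approach in the patch can be mimicked in plain C. This is
a sketch under stated assumptions: the numeric values given to
CHIP_ALDEBARAN and CHIP_GC_9_4_3, the default ASIC_FAMILY, and the
emit_store() helper are all hypothetical; only the macro name and the two
modifier spellings come from the patch. One macro picks the spelling at
build time, so every instruction line is written once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical family ordering; the real values come from the build setup. */
#define CHIP_ALDEBARAN 1
#define CHIP_GC_9_4_3  2

#ifndef ASIC_FAMILY
#define ASIC_FAMILY CHIP_GC_9_4_3	/* assumed target for this sketch */
#endif

/* Same selection as in the patch, expressed as C string literals. */
#if ASIC_FAMILY < CHIP_GC_9_4_3
#define VMEM_MODIFIERS "slc:1 glc:1"
#else
#define VMEM_MODIFIERS "sc0:1 nt:1"
#endif

/* Builds one instruction line with the modifier spelling chosen at compile time. */
static int emit_store(char *buf, size_t len, const char *offset_suffix)
{
	assert(buf != NULL && offset_suffix != NULL);
	return snprintf(buf, len,
			"buffer_store_dword v0, v0, s_rsrc, s_mem_offset %s%s",
			VMEM_MODIFIERS, offset_suffix);
}
```

Building the same file with -DASIC_FAMILY=CHIP_ALDEBARAN would select the
older slc/glc spelling without touching any of the instruction lines.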




Re: [PATCH] drm/amdkfd: update buffer_{store,load}_* modifiers for gfx940

2024-04-29 Thread Jay Cornwall

On 4/29/2024 06:06, Lancelot SIX wrote:

Instruction modifiers of the untyped vector memory buffer instructions
(MUBUF encoded) changed in gfx940.  The slc, scc and glc modifiers have
been replaced with sc0, sc1 and nt.

The current CWSR trap handler is written using pre-gfx940 modifier
names, making the source incompatible with a strict gfx940 assembler.

This patch updates the cwsr_trap_handler_gfx9.asm source file to be
compatible with all gfx9 variants of the ISA.  The binary assembled code
is unchanged (so the behaviour is unchanged as well), only the source
representation is updated.

Signed-off-by: Lancelot SIX 
---
  .../drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm | 24 ---
  1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm 
b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm
index bb26338204f4..a2d597d7fb57 100644
--- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm
+++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm
@@ -48,6 +48,12 @@ var ACK_SQC_STORE=   1   
//workaround for suspected SQC store bug causing
  var SAVE_AFTER_XNACK_ERROR=   1   //workaround for 
TCP store failure after XNACK error when ALLOW_REPLAY=0, for debugger
  var SINGLE_STEP_MISSED_WORKAROUND   = (ASIC_FAMILY <= CHIP_ALDEBARAN)  
//workaround for lost MODE.DEBUG_EN exception when SAVECTX raised
  
+#if ASIC_FAMILY < CHIP_GC_9_4_3

+#define VMEM_MODIFIERS slc:1 glc:1
+#else
+#define VMEM_MODIFIERS sc0:1 nt:1
+#endif
+
  /**/
  /*variables */
  /**/
@@ -581,7 +587,7 @@ end
  L_SAVE_LDS_LOOP_VECTOR:
ds_read_b64 v[0:1], v2  //x =LDS[a], byte address
s_waitcnt lgkmcnt(0)
-  buffer_store_dwordx2  v[0:1], v2, s_save_buf_rsrc0, s_save_mem_offset 
offen:1  glc:1  slc:1
+  buffer_store_dwordx2  v[0:1], v2, s_save_buf_rsrc0, s_save_mem_offset 
VMEM_MODIFIERS offen:1
  //s_waitcnt vmcnt(0)
  //v_add_u32 v2, vcc[0:1], v2, v3
v_add_u32 v2, v2, v3
@@ -979,17 +985,17 @@ L_TCP_STORE_CHECK_DONE:
  end
  
  function write_4vgprs_to_mem(s_rsrc, s_mem_offset)

-   buffer_store_dword v0, v0, s_rsrc, s_mem_offset slc:1 glc:1
-   buffer_store_dword v1, v0, s_rsrc, s_mem_offset slc:1 glc:1  offset:256
-   buffer_store_dword v2, v0, s_rsrc, s_mem_offset slc:1 glc:1  
offset:256*2
-   buffer_store_dword v3, v0, s_rsrc, s_mem_offset slc:1 glc:1  
offset:256*3
+   buffer_store_dword v0, v0, s_rsrc, s_mem_offset VMEM_MODIFIERS
+   buffer_store_dword v1, v0, s_rsrc, s_mem_offset VMEM_MODIFIERS 
offset:256
+   buffer_store_dword v2, v0, s_rsrc, s_mem_offset VMEM_MODIFIERS 
offset:256*2
+   buffer_store_dword v3, v0, s_rsrc, s_mem_offset VMEM_MODIFIERS 
offset:256*3
  end
  
  function read_4vgprs_from_mem(s_rsrc, s_mem_offset)

-   buffer_load_dword v0, v0, s_rsrc, s_mem_offset slc:1 glc:1
-   buffer_load_dword v1, v0, s_rsrc, s_mem_offset slc:1 glc:1 offset:256
-   buffer_load_dword v2, v0, s_rsrc, s_mem_offset slc:1 glc:1 offset:256*2
-   buffer_load_dword v3, v0, s_rsrc, s_mem_offset slc:1 glc:1 offset:256*3
+   buffer_load_dword v0, v0, s_rsrc, s_mem_offset VMEM_MODIFIERS
+   buffer_load_dword v1, v0, s_rsrc, s_mem_offset VMEM_MODIFIERS offset:256
+   buffer_load_dword v2, v0, s_rsrc, s_mem_offset VMEM_MODIFIERS 
offset:256*2
+   buffer_load_dword v3, v0, s_rsrc, s_mem_offset VMEM_MODIFIERS 
offset:256*3
s_waitcnt vmcnt(0)
  end
  


base-commit: cf743996352e327f483dc7d66606c90276f57380


Reviewed-by: Jay Cornwall 


Re: [PATCH v2 2/2] drm/amdgpu/pm: Fix uninitialized variable warning

2024-04-29 Thread Deucher, Alexander
[AMD Official Use Only - General]

Series is:
Acked-by: Alex Deucher 

From: Ma, Jun 
Sent: Monday, April 29, 2024 3:58 AM
To: amd-gfx@lists.freedesktop.org 
Cc: Feng, Kenneth ; Deucher, Alexander 
; Wang, Yang(Kevin) ; 
Koenig, Christian ; Ma, Jun 
Subject: [PATCH v2 2/2] drm/amdgpu/pm: Fix uninitialized variable warning

Check the return value of smum_send_msg_to_smc to fix the
uninitialized variable warning.

Signed-off-by: Ma Jun 
---
 .../drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.c  | 21 +
 .../drm/amd/pm/powerplay/hwmgr/vega12_hwmgr.c | 20 
 .../drm/amd/pm/powerplay/hwmgr/vega20_hwmgr.c | 23 ++-
 3 files changed, 48 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.c 
b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.c
index 38d5605117ff..a8c732e07006 100644
--- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.c
+++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.c
@@ -1558,7 +1558,10 @@ static int smu10_set_fine_grain_clk_vol(struct pp_hwmgr 
*hwmgr,
 }

 if (input[0] == 0) {
-   smum_send_msg_to_smc(hwmgr, 
PPSMC_MSG_GetMinGfxclkFrequency, &min_freq);
+   ret = smum_send_msg_to_smc(hwmgr, 
PPSMC_MSG_GetMinGfxclkFrequency, &min_freq);
+   if (ret)
+   return ret;
+
 if (input[1] < min_freq) {
 pr_err("Fine grain setting minimum sclk (%ld) 
MHz is less than the minimum allowed (%d) MHz\n",
 input[1], min_freq);
@@ -1566,7 +1569,10 @@ static int smu10_set_fine_grain_clk_vol(struct pp_hwmgr 
*hwmgr,
 }
 smu10_data->gfx_actual_soft_min_freq = input[1];
 } else if (input[0] == 1) {
-   smum_send_msg_to_smc(hwmgr, 
PPSMC_MSG_GetMaxGfxclkFrequency, &max_freq);
+   ret = smum_send_msg_to_smc(hwmgr, 
PPSMC_MSG_GetMaxGfxclkFrequency, &max_freq);
+   if (ret)
+   return ret;
+
 if (input[1] > max_freq) {
 pr_err("Fine grain setting maximum sclk (%ld) 
MHz is greater than the maximum allowed (%d) MHz\n",
 input[1], max_freq);
@@ -1581,10 +1587,15 @@ static int smu10_set_fine_grain_clk_vol(struct pp_hwmgr 
*hwmgr,
 pr_err("Input parameter number not correct\n");
 return -EINVAL;
 }
-   smum_send_msg_to_smc(hwmgr, PPSMC_MSG_GetMinGfxclkFrequency, 
&min_freq);
-   smum_send_msg_to_smc(hwmgr, PPSMC_MSG_GetMaxGfxclkFrequency, 
&max_freq);
-
+   ret = smum_send_msg_to_smc(hwmgr, 
PPSMC_MSG_GetMinGfxclkFrequency, &min_freq);
+   if (ret)
+   return ret;
 smu10_data->gfx_actual_soft_min_freq = min_freq;
+
+   ret = smum_send_msg_to_smc(hwmgr, 
PPSMC_MSG_GetMaxGfxclkFrequency, &max_freq);
+   if (ret)
+   return ret;
+
 smu10_data->gfx_actual_soft_max_freq = max_freq;
 } else if (type == PP_OD_COMMIT_DPM_TABLE) {
 if (size != 0) {
diff --git a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/vega12_hwmgr.c 
b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/vega12_hwmgr.c
index c223e3a6bfca..10fd4e9f016c 100644
--- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/vega12_hwmgr.c
+++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/vega12_hwmgr.c
@@ -293,12 +293,12 @@ static int vega12_set_features_platform_caps(struct 
pp_hwmgr *hwmgr)
 return 0;
 }

-static void vega12_init_dpm_defaults(struct pp_hwmgr *hwmgr)
+static int vega12_init_dpm_defaults(struct pp_hwmgr *hwmgr)
 {
 struct vega12_hwmgr *data = (struct vega12_hwmgr *)(hwmgr->backend);
 struct amdgpu_device *adev = hwmgr->adev;
 uint32_t top32, bottom32;
-   int i;
+   int i, ret;

 data->smu_features[GNLD_DPM_PREFETCHER].smu_feature_id =
 FEATURE_DPM_PREFETCHER_BIT;
@@ -364,10 +364,16 @@ static void vega12_init_dpm_defaults(struct pp_hwmgr 
*hwmgr)
 }

 /* Get the SN to turn into a Unique ID */
-   smum_send_msg_to_smc(hwmgr, PPSMC_MSG_ReadSerialNumTop32, &top32);
-   smum_send_msg_to_smc(hwmgr, PPSMC_MSG_ReadSerialNumBottom32, &bottom32);
+   ret = smum_send_msg_to_smc(hwmgr, PPSMC_MSG_ReadSerialNumTop32, &top32);
+   if (ret)
+   return ret;
+   ret = smum_send_msg_to_smc(hwmgr, PPSMC_MSG_ReadSerialNumBottom32, 
&bottom32);
+   if (ret)
+   return ret;

 adev->unique_id = ((uint64_t)bottom32 << 32) | top32;
+
+   return 0;
 }

 static int vega12_set_private_data_based_on_pptable(struct pp_hwmgr *hwmgr)
@@ -410,7 +416,11 @@ static int vega12_hwmgr_backend_init(struct pp_hwmgr 
*hwmgr)

 

[linux-next:master] BUILD REGRESSION b0a2c79c6f3590b74742cbbc76687014d47972d8

2024-04-29 Thread kernel test robot
tree/branch: 
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
branch HEAD: b0a2c79c6f3590b74742cbbc76687014d47972d8  Add linux-next specific 
files for 20240429

Error/Warning reports:

https://lore.kernel.org/oe-kbuild-all/20240429.kkvw8mvg-...@intel.com
https://lore.kernel.org/oe-kbuild-all/202404292317.uk7gvrsc-...@intel.com
https://lore.kernel.org/oe-kbuild-all/202404300248.dblz5vz7-...@intel.com

Error/Warning: (recently discovered and may have been fixed)

arch/arm64/boot/dts/ti/k3-am62a7-sk.dtb: syscon@4300: 'syscon@4008', 
'syscon@4018' do not match any of the regexes: '^chipid@[0-9a-f]+$', 
'^clock-controller@[0-9a-f]+$', '^mux-controller@[0-9a-f]+$', 'phy@[0-9a-f]+$', 
'pinctrl-[0-9]+'
make[4]: *** No rule to make target 
'arch/arm64/boot/dts/ti/k3-j784s4-evm-usxgmii-exp1-exp2.dtb', needed by 
'arch/arm64/boot/dts/ti/'.
make[4]: *** No rule to make target 
'arch/arm64/boot/dts/ti/k3-j784s4-evm-usxgmii-exp1-exp2.dtb', needed by 
'arch/arm64/boot/dts/ti/dtbs-list'.

Unverified Error/Warning (likely false positive, please contact us if 
interested):

{standard input}:2000: Error: unknown pseudo-op: `.lfe4886'

Error/Warning ids grouped by kconfigs:

gcc_recent_errors
|-- alpha-allyesconfig
|   `-- drivers-usb-dwc3-core.c:warning:variable-hw_mode-set-but-not-used
|-- arc-allmodconfig
|   `-- drivers-usb-dwc3-core.c:warning:variable-hw_mode-set-but-not-used
|-- arc-allyesconfig
|   `-- drivers-usb-dwc3-core.c:warning:variable-hw_mode-set-but-not-used
|-- arc-randconfig-001-20240429
|   `-- drivers-usb-dwc3-core.c:warning:variable-hw_mode-set-but-not-used
|-- arm-allmodconfig
|   `-- drivers-usb-dwc3-core.c:warning:variable-hw_mode-set-but-not-used
|-- arm-allyesconfig
|   `-- drivers-usb-dwc3-core.c:warning:variable-hw_mode-set-but-not-used
|-- arm64-defconfig
|   `-- drivers-usb-dwc3-core.c:warning:variable-hw_mode-set-but-not-used
|-- arm64-randconfig-r131-20230824
|   `-- 
make:No-rule-to-make-target-arch-arm64-boot-dts-ti-k3-j784s4-evm-usxgmii-exp1-exp2.dtb-needed-by-arch-arm64-boot-dts-ti-dtbs-list-.
|-- csky-allmodconfig
|   `-- drivers-usb-dwc3-core.c:warning:variable-hw_mode-set-but-not-used
|-- csky-allyesconfig
|   `-- drivers-usb-dwc3-core.c:warning:variable-hw_mode-set-but-not-used
|-- i386-allmodconfig
|   `-- drivers-usb-dwc3-core.c:warning:variable-hw_mode-set-but-not-used
|-- i386-allyesconfig
|   `-- drivers-usb-dwc3-core.c:warning:variable-hw_mode-set-but-not-used
|-- i386-buildonly-randconfig-005-20240429
|   `-- drivers-usb-dwc3-core.c:warning:variable-hw_mode-set-but-not-used
|-- i386-randconfig-141-20240429
|   |-- 
drivers-gpu-drm-bridge-cadence-cdns-mhdp8546-core.c-cdns_mhdp_atomic_enable()-warn:inconsistent-returns-mhdp-link_mutex-.
|   |-- 
drivers-pwm-core.c-pwm_put()-warn:variable-dereferenced-before-check-pwm-(see-line-)
|   `-- 
drivers-usb-typec-ucsi-ucsi.c-ucsi_get_pd_caps()-warn:passing-zero-to-ERR_PTR
|-- loongarch-allmodconfig
|   `-- drivers-usb-dwc3-core.c:warning:variable-hw_mode-set-but-not-used
|-- loongarch-randconfig-002-20240429
|   `-- drivers-usb-dwc3-core.c:warning:variable-hw_mode-set-but-not-used
|-- m68k-allmodconfig
|   `-- drivers-usb-dwc3-core.c:warning:variable-hw_mode-set-but-not-used
|-- m68k-allyesconfig
|   `-- drivers-usb-dwc3-core.c:warning:variable-hw_mode-set-but-not-used
|-- microblaze-allmodconfig
|   `-- drivers-usb-dwc3-core.c:warning:variable-hw_mode-set-but-not-used
|-- microblaze-allyesconfig
|   `-- drivers-usb-dwc3-core.c:warning:variable-hw_mode-set-but-not-used
|-- mips-allyesconfig
|   `-- drivers-usb-dwc3-core.c:warning:variable-hw_mode-set-but-not-used
|-- nios2-allmodconfig
|   `-- drivers-usb-dwc3-core.c:warning:variable-hw_mode-set-but-not-used
|-- nios2-allyesconfig
|   `-- drivers-usb-dwc3-core.c:warning:variable-hw_mode-set-but-not-used
|-- nios2-randconfig-001-20240429
|   `-- drivers-usb-dwc3-core.c:warning:variable-hw_mode-set-but-not-used
|-- nios2-randconfig-002-20240429
|   `-- drivers-usb-dwc3-core.c:warning:variable-hw_mode-set-but-not-used
|-- openrisc-allyesconfig
|   `-- drivers-usb-dwc3-core.c:warning:variable-hw_mode-set-but-not-used
|-- parisc-allmodconfig
|   `-- drivers-usb-dwc3-core.c:warning:variable-hw_mode-set-but-not-used
|-- parisc-allyesconfig
|   `-- drivers-usb-dwc3-core.c:warning:variable-hw_mode-set-but-not-used
|-- parisc-randconfig-002-20240429
|   `-- drivers-usb-dwc3-core.c:warning:variable-hw_mode-set-but-not-used
|-- powerpc-allmodconfig
|   `-- drivers-usb-dwc3-core.c:warning:variable-hw_mode-set-but-not-used
|-- powerpc-randconfig-002-20240429
|   |-- drivers-usb-dwc3-core.c:warning:variable-hw_mode-set-but-not-used
|   |-- 
powerpc-linux-ld:drivers-gpu-drm-amd-amdgpu-..-display-dc-dml-calcs-dcn_calc_auto.o-uses-hard-float-drivers-gpu-drm-amd-amdgpu-..-display-amdgpu_dm-amdgpu_dm_helpers.o-uses-soft-float
|   |-- 
powerpc-linux-ld:drivers-gpu-drm-amd-amdgpu-..-display-dc-dml-calcs-dcn_calc_math.o-uses-hard-float-drivers-gpu-drm-amd-amdgpu-..-display

AMD Dell m18 r1/7600XT Laptop Issue dcn31_panel_cntl_construct+0x49/0x60 [amdgpu]

2024-04-29 Thread Gregory Carter
Is this a bug or a support issue for this laptop?

[4.739919] [ cut here ]
[4.739920] WARNING: CPU: 17 PID: 551 at
drivers/gpu/drm/amd/amdgpu/../display/dc/dcn31/dcn31_panel_cntl.c:172
dcn31_panel_cntl_construct+0x49/0x60 [amdgpu]
[4.740183] Modules linked in: amdgpu(+) video amdxcp i2c_algo_bit
drm_ttm_helper ttm drm_exec gpu_sched drm_suballoc_helper rtsx_pci_sdmmc
drm_buddy crct10dif_pclmul crc32_pclmul crc32c_intel mmc_core
drm_display_helper polyval_clmulni r8169 nvme ucsi_acpi polyval_generic
hid_multitouch nvme_core ghash_clmulni_intel sha512_ssse3 sha256_ssse3
sha1_ssse3 typec_ucsi ccp rtsx_pci sp5100_tco cec realtek nvme_auth typec
i2c_hid_acpi wmi i2c_hid serio_raw scsi_dh_rdac scsi_dh_emc scsi_dh_alua
fuse i2c_dev
[4.740203] CPU: 17 PID: 551 Comm: (udev-worker) Not tainted
6.8.7-300.fc40.x86_64 #1
[4.740205] Hardware name: Alienware Alienware m18 R1 AMD/0XY1HF, BIOS
1.12.0 03/08/2024
[4.740206] RIP: 0010:dcn31_panel_cntl_construct+0x49/0x60 [amdgpu]
[4.740363] Code: 57 10 8b 56 0c 85 d2 74 28 83 fa 01 74 23 48 8b 40 10
48 8b 38 48 85 ff 74 04 48 8b 7f 08 48 c7 c6 98 5d 1e c1 e8 17 2f ef fa
<0f> 0b ba 0f 00 00 00 89 53 14 5b c3 cc cc cc cc 0f 1f 80 00 00 00
[4.740365] RSP: 0018:96b840ffb3a8 EFLAGS: 00010246
[4.740367] RAX:  RBX: 899850d07a40 RCX:
0027
[4.740368] RDX:  RSI: 0001 RDI:
89a73da618c0
[4.740369] RBP: 96b840ffb3e8 R08:  R09:
96b840ffb160
[4.740370] R10: bd516808 R11: 0003 R12:
96b840ffb43c
[4.740371] R13: c10902a0 R14: 96b840ffb7a0 R15:
89984f99f800
[4.740372] FS:  7f9f756d0980() GS:89a73da4()
knlGS:
[4.740373] CS:  0010 DS:  ES:  CR0: 80050033
[4.740374] CR2: 7f9f764e5b00 CR3: 000110086000 CR4:
00f50ef0
[4.740375] PKRU: 5554
[4.740376] Call Trace:
[4.740378]  
[4.740379]  ? dcn31_panel_cntl_construct+0x49/0x60 [amdgpu]
[4.740525]  ? __warn+0x81/0x130
[4.740529]  ? dcn31_panel_cntl_construct+0x49/0x60 [amdgpu]
[4.740663]  ? report_bug+0x16f/0x1a0
[4.740666]  ? handle_bug+0x3c/0x80
[4.740667]  ? exc_invalid_op+0x17/0x70
[4.740669]  ? asm_exc_invalid_op+0x1a/0x20
[4.740671]  ? dcn31_panel_cntl_construct+0x49/0x60 [amdgpu]
[4.740802]  ? dcn31_panel_cntl_construct+0x49/0x60 [amdgpu]
[4.740930]  dcn32_panel_cntl_create+0x37/0x50 [amdgpu]
[4.741076]  construct_phy+0xac6/0xd30 [amdgpu]
[4.741225]  link_create+0x1da/0x210 [amdgpu]
[4.741359]  create_links+0x134/0x420 [amdgpu]
[4.741504]  dc_create+0x316/0x650 [amdgpu]
[4.741645]  amdgpu_dm_init.isra.0+0x2d4/0x1fb0 [amdgpu]
[4.741803]  ? prb_read_valid+0x1b/0x30
[4.741806]  ? console_unlock+0x84/0x130
[4.741807]  ? __wake_up_klogd.part.0+0x3c/0x60
[4.741809]  ? vprintk_emit+0x175/0x2c0
[4.741810]  ? dev_printk_emit+0xa3/0xd0
[4.741812]  ? __pfx_smu_v13_0_update_pcie_parameters+0x10/0x10 [amdgpu]
[4.741952]  dm_hw_init+0x12/0x30 [amdgpu]
[4.742094]  amdgpu_device_init+0x1e9d/0x26d0 [amdgpu]
[4.742208]  amdgpu_driver_load_kms+0x19/0x190 [amdgpu]
[4.742318]  amdgpu_pci_probe+0x18b/0x510 [amdgpu]
[4.742427]  local_pci_probe+0x42/0xa0
[4.742430]  pci_device_probe+0xc1/0x280
[4.742432]  really_probe+0x19b/0x3e0
[4.742434]  ? __pfx___driver_attach+0x10/0x10
[4.742435]  __driver_probe_device+0x78/0x160
[4.742437]  driver_probe_device+0x1f/0xa0
[4.742438]  __driver_attach+0xba/0x1c0
[4.742440]  bus_for_each_dev+0x8c/0xe0
[4.742441]  bus_add_driver+0x116/0x220
[4.742443]  driver_register+0x5c/0x100
[4.742445]  ? __pfx_amdgpu_init+0x10/0x10 [amdgpu]
[4.742559]  do_one_initcall+0x58/0x320
[4.742562]  do_init_module+0x60/0x240
[4.742564]  __do_sys_init_module+0x17a/0x1b0
[4.742566]  do_syscall_64+0x83/0x170
[4.742569]  ? __count_memcg_events+0x4d/0xc0
[4.742570]  ? count_memcg_events.constprop.0+0x1a/0x30
[4.742572]  ? handle_mm_fault+0x1f2/0x350
[4.742574]  ? do_user_addr_fault+0x304/0x670
[4.742576]  ? exc_page_fault+0x7f/0x180
[4.742577]  entry_SYSCALL_64_after_hwframe+0x78/0x80
[4.742579] RIP: 0033:0x7f9f760a357e
[4.742588] Code: 48 8b 0d 9d 98 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66
2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05
<48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 6a 98 0c 00 f7 d8 64 89 01 48
[4.742590] RSP: 002b:7ffdbdc2ba08 EFLAGS: 0246 ORIG_RAX:
00af
[4.742592] RAX: ffda RBX: 55a1d88a69e0 RCX:
7f9f760a357e
[4.742593] RDX: 7f9f761bb07d RSI: 019e11a6 RDI:
7f9f73c00010
[4.742594] RBP: 7ffdbdc2bac0 R08: 55a1d8874010 R09:
0007
[4.742594] R10: 0001 R11: 0246 R12:
7f9f761bb07d
[4.742598] R13: 0002 R14: 

[PATCH v2 2/2] drm/amd/display: Move PRIMARY plane zpos higher

2024-04-29 Thread sunpeng.li
From: Leo Li 

[Why]

Compositors have different ways of assigning surfaces to DRM planes for
render offloading. They may choose between various strategies: overlay,
underlay, or a mix of both (for more info, see
https://gitlab.freedesktop.org/emersion/libliftoff/-/issues/76)

One way for compositors to implement the underlay strategy is to assign
a higher zpos to the DRM_PRIMARY plane than the DRM_OVERLAY planes,
effectively turning the DRM_OVERLAY plane into an underlay plane.

Today, amdgpu attaches an immutable zpos of 0 to the DRM_PRIMARY plane.
This, however, is an arbitrary restriction. DCN pipes are general
purpose, and can be arranged in any z-order. To support compositors
using this allocation scheme, we can set a non-zero immutable zpos for
the PRIMARY, allowing the placement of OVERLAYS (mutable zpos range
0-254) beneath the PRIMARY.

[How]

Assign a zpos equal to the number of OVERLAY planes to the PRIMARY plane.
Then, clean up any assumptions in the driver that the PRIMARY plane has
the lowest zpos.

Signed-off-by: Leo Li 
Reviewed-by: Harry Wentland 
Acked-by: Pekka Paalanen 

v2: Fix typo s/decending/descending/
---
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 34 +--
 .../amd/display/amdgpu_dm/amdgpu_dm_plane.c   | 18 +++---
 2 files changed, 44 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index b4b5b73707c1..6782ca1137d4 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -80,6 +80,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -375,6 +376,20 @@ static inline void reverse_planes_order(struct 
dc_surface_update *array_of_surfa
swap(array_of_surface_update[i], array_of_surface_update[j]);
 }
 
+/*
+ * DC will program planes with their z-order determined by their ordering
+ * in the dc_surface_updates array. This comparator is used to sort them
+ * by descending zpos.
+ */
+static int dm_plane_layer_index_cmp(const void *a, const void *b)
+{
+   const struct dc_surface_update *sa = (struct dc_surface_update *)a;
+   const struct dc_surface_update *sb = (struct dc_surface_update *)b;
+
+   /* Sort by descending dc_plane layer_index (i.e. normalized_zpos) */
+   return sb->surface->layer_index - sa->surface->layer_index;
+}
+
 /**
  * update_planes_and_stream_adapter() - Send planes to be updated in DC
  *
@@ -399,7 +414,8 @@ static inline bool update_planes_and_stream_adapter(struct 
dc *dc,
struct dc_stream_update 
*stream_update,
struct dc_surface_update 
*array_of_surface_update)
 {
-   reverse_planes_order(array_of_surface_update, planes_count);
+   sort(array_of_surface_update, planes_count,
+sizeof(*array_of_surface_update), dm_plane_layer_index_cmp, NULL);
 
/*
 * Previous frame finished and HW is ready for optimization.
@@ -9503,6 +9519,8 @@ static void amdgpu_dm_atomic_commit_tail(struct 
drm_atomic_state *state)
for (j = 0; j < status->plane_count; j++)
dummy_updates[j].surface = status->plane_states[0];
 
+   sort(dummy_updates, status->plane_count,
+sizeof(*dummy_updates), dm_plane_layer_index_cmp, NULL);
 
mutex_lock(&dm->dc_lock);
dc_update_planes_and_stream(dm->dc,
@@ -10237,6 +10255,16 @@ static bool should_reset_plane(struct drm_atomic_state 
*state,
if (new_crtc_state->color_mgmt_changed)
return true;
 
+   /*
+* On zpos change, planes need to be reordered by removing and re-adding
+* them one by one to the dc state, in order of descending zpos.
+*
+* TODO: We can likely skip bandwidth validation if the only thing that
+* changed about the plane was its z-ordering.
+*/
+   if (new_crtc_state->zpos_changed)
+   return true;
+
if (drm_atomic_crtc_needs_modeset(new_crtc_state))
return true;
 
@@ -11076,7 +11104,7 @@ static int amdgpu_dm_atomic_check(struct drm_device 
*dev,
}
 
/* Remove exiting planes if they are modified */
-   for_each_oldnew_plane_in_state_reverse(state, plane, old_plane_state, 
new_plane_state, i) {
+   for_each_oldnew_plane_in_descending_zpos(state, plane, old_plane_state, 
new_plane_state) {
if (old_plane_state->fb && new_plane_state->fb &&
get_mem_type(old_plane_state->fb) !=
get_mem_type(new_plane_state->fb))
@@ -11121,7 +11149,7 @@ static int amdgpu_dm_atomic_check(struct drm_device 
*dev,
}
 
/* Add new/modified planes */
-   for_each_oldnew_plane_in_state_reverse(state, plane, old_plane_state, 
new_plane_state, i) {
+   

[PATCH v2 1/2] drm/amd/display: Introduce overlay cursor mode

2024-04-29 Thread sunpeng.li
From: Leo Li 

[Why]

DCN is the display hardware for amdgpu. DRM planes are backed by DCN
hardware pipes, which carry pixel data from one end (memory), to the
other (output encoder).

Each DCN pipe has the ability to blend in a cursor early on in the
pipeline. In other words, there are no dedicated cursor planes in DCN,
which makes cursor behavior somewhat unintuitive for compositors.

For example, if the cursor is in RGB format, but the top-most DRM plane
is in YUV format, DCN will not be able to blend them. Because of this,
amdgpu_dm rejects all configurations where a cursor needs to be enabled
on top of a YUV formatted plane.

From a compositor's perspective, when computing an allocation for
hardware plane offloading, this cursor-on-yuv configuration results in an
atomic test failure. Since the failure reason is not obvious at all,
compositors will likely fall back to full rendering, which is not ideal.

Instead, amdgpu_dm can try to accommodate the cursor-on-yuv
configuration by opportunistically reserving a separate DCN pipe just
for the cursor. We can refer to this as "overlay cursor mode". It is
contrasted with "native cursor mode", where the native DCN per-pipe
cursor is used.

[How]

On each crtc, compute whether the cursor plane should be enabled in
overlay mode. If it is, mark the CRTC as requesting overlay cursor mode.

Overlay cursor should be enabled whenever there exists an underlying
plane that has YUV format, or is scaled differently than the cursor. It
should also be enabled if there is no underlying plane, or if underlying
planes do not cover the entire CRTC.

During DC validation, attempt to enable a separate DCN pipe for the
cursor if it's in overlay mode. If that fails, or if no overlay mode is
requested, then fall back to native mode.

v2:
* Update commit message for when overlay cursor should be enabled
* Also consider scale and no-underlying-plane case (cursor on crtc bg)
* Consider all underlying planes when determining overlay/native, not
  just the plane immediately beneath the cursor, as it may not cover the
  entire CRTC.
* Fix typo s/decending/descending/
* Force native cursor on pre-DCN hardware

Signed-off-by: Leo Li 
Acked-by: Harry Wentland 
Acked-by: Pekka Paalanen 
---
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 490 +-
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h |   7 +
 .../amd/display/amdgpu_dm/amdgpu_dm_crtc.c|   1 +
 .../amd/display/amdgpu_dm/amdgpu_dm_plane.c   |  13 +-
 4 files changed, 386 insertions(+), 125 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 8245cc63712f..b4b5b73707c1 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -8490,8 +8490,22 @@ static void amdgpu_dm_commit_planes(struct 
drm_atomic_state *state,
 * Disable the cursor first if we're disabling all the planes.
 * It'll remain on the screen after the planes are re-enabled
 * if we don't.
+*
+* If the cursor is transitioning from native to overlay mode, the
+* native cursor needs to be disabled first.
 */
-   if (acrtc_state->active_planes == 0)
+   if (acrtc_state->cursor_mode == DM_CURSOR_OVERLAY_MODE &&
+   dm_old_crtc_state->cursor_mode == DM_CURSOR_NATIVE_MODE) {
+   struct dc_cursor_position cursor_position = {0};
+
+   dc_stream_set_cursor_position(acrtc_state->stream,
+ &cursor_position);
+   bundle->stream_update.cursor_position =
+   &acrtc_state->stream->cursor_position;
+   }
+
+   if (acrtc_state->active_planes == 0 &&
+   dm_old_crtc_state->cursor_mode == DM_CURSOR_NATIVE_MODE)
amdgpu_dm_commit_cursors(state);
 
/* update planes when needed */
@@ -8505,7 +8519,8 @@ static void amdgpu_dm_commit_planes(struct 
drm_atomic_state *state,
struct dm_plane_state *dm_new_plane_state = 
to_dm_plane_state(new_plane_state);
 
/* Cursor plane is handled after stream updates */
-   if (plane->type == DRM_PLANE_TYPE_CURSOR) {
+   if (plane->type == DRM_PLANE_TYPE_CURSOR &&
+   acrtc_state->cursor_mode == DM_CURSOR_NATIVE_MODE) {
if ((fb && crtc == pcrtc) ||
(old_plane_state->fb && old_plane_state->crtc == 
pcrtc)) {
cursor_update = true;
@@ -8863,7 +8878,8 @@ static void amdgpu_dm_commit_planes(struct 
drm_atomic_state *state,
 * to be disabling a single plane - those pipes are being disabled.
 */
if (acrtc_state->active_planes &&
-   (!updated_planes_and_streams || amdgpu_ip_version(dm->adev, 
DCE_HWIP, 0) == 0))
+   (!updated_planes_and_streams || amdgpu_ip_version(dm->adev, 
DCE_HWIP, 0) == 0) &&
+   

Re: [PATCH 2/2] drm/amdkfd: Allow memory oversubscription on small APUs

2024-04-29 Thread Felix Kuehling

On 2024-04-29 06:38, Yu, Lang wrote:

[Public]


-Original Message-
From: Kuehling, Felix 
Sent: Saturday, April 27, 2024 6:45 AM
To: Yu, Lang ; amd-gfx@lists.freedesktop.org
Cc: Yang, Philip ; Koenig, Christian
; Zhang, Yifan ; Liu,
Aaron 
Subject: Re: [PATCH 2/2] drm/amdkfd: Allow memory oversubscription on
small APUs

On 2024-04-26 04:37, Lang Yu wrote:

The default ttm_tt_pages_limit is 1/2 of system memory.
It is prone to out of memory with such a configuration.

Indiscriminately allowing the violation of all memory limits is not a good
solution. It will lead to poor performance once you actually reach
ttm_pages_limit and TTM starts swapping out BOs.

Hi Felix,

It feels like a bug to me that the driver tells users it is out of memory 
while 1/2 of system memory is free.
On the other hand, if memory is available, why not use it?


TTM does not allow us to use more than 1/2 system memory. I believe 
that's because TTM needs additional memory to swap out BOs. Any GTT 
allocation through the render node APIs is subject to the same limitations.


Render node APIs can handle memory overcommitment more gracefully 
because the kernel mode driver is in the loop for command submissions 
and fences. That doesn't work for KFD with user mode queues. The memory 
limits in KFD are there to prevent overcommitting memory because we need 
all of our memory (per process) to be resident at the same time. If we 
let KFD exceed the TTM limits, we get into situations where we're 
thrashing (processes evicting each other constantly) or even worse, 
where we're just not able to make all memory resident. So we end up with 
suspended user mode queues and extremely poor performance or soft hangs.





By the way, can we use USERPTR for VRAM allocations?
Then we don't have ttm_tt_pages_limit limitations. Thanks.


No. There is an expectation that VRAM BOs can be shared between 
processes through DMABufs (for HIP IPC APIs). You can't export userptrs 
as DMABufs.


You can try to raise the TTM pages limit using a TTM module parameter. 
But this is taking a risk for system stability when TTM gets into a 
situation where it needs to swap out a large BO.


Regards,
  Felix




I actually did some tests on Strix (12 CU@2100 MHz, 29412M 128bits 
LPDDR5@937MHz) with
https://github.com/ROCm/pytorch-micro-benchmarking.

Command: python micro_benchmarking_pytorch.py --network resnet50 
--batch-size=64 --iterations=20

1, Run 1 resnet50 (FP32, batch size 64)
Memory usage:
 System mem used 6748M out of 29412M
 TTM mem used 6658M out of 15719M
Memory oversubscription percentage:  0
Throughput [img/sec] : 49.04

2,  Run 2 resnet50 simultaneously (FP32, batch size 64)
Memory usage:
 System mem used 13496M out of 29412M
 TTM mem used 13316M out of 15719M
Memory oversubscription percentage:  0
Throughput [img/sec] (respectively) : 25.27 / 26.70

3, Run 3 resnet50 simultaneously (FP32, batch size 64)
Memory usage:
 System mem used 20245M out of 29412M
 TTM mem used 19974M out of 15719M
Memory oversubscription percentage:  ~27%

Throughput [img/sec](respectively) : 10.62 / 7.47 / 6.90 (In theory: 16 / 16 / 
16)

 From my observations,

1, The GPU is heavily underutilized; its load is sometimes below 50%, or even 
0, when running 3 resnet50 instances simultaneously with ~27% memory oversubscription.
The driver is busy evicting and restoring processes. It takes ~2-5 seconds to 
restore all the BOs for one process (swapping BOs in and out, which actually 
allocates and copies pages),
even though the process doesn't need all the allocated BOs to be resident.

2, Sometimes fairness between processes can't be guaranteed when memory is 
oversubscribed.
They can't share the GPU equally when created with default priority.

3, The less time the GPU sits underutilized during eviction and restore, the 
smaller the performance degradation under memory oversubscription.

Regards,
Lang


Regards,
   Felix



Signed-off-by: Lang Yu 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c   |  2 +-
   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h   |  4 ++--
   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 12

+---

   3 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index 3295838e9a1d..c01c6f3ab562 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -167,7 +167,7 @@ void amdgpu_amdkfd_device_init(struct

amdgpu_device *adev)

  int i;
  int last_valid_bit;

-amdgpu_amdkfd_gpuvm_init_mem_limits();
+amdgpu_amdkfd_gpuvm_init_mem_limits(adev);

  if (adev->kfd.dev) {
  struct kgd2kfd_shared_resources gpu_resources = { diff --git
a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 1de021ebdd46..13284dbd8c58 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ 

RE: [PATCH v3] drm/amdgpu: IB test encode test package change for VCN5

2024-04-29 Thread Dong, Ruijing
[AMD Official Use Only - General]

Reviewed-by: Ruijing Dong ruijing.d...@amd.com

Thanks,
Ruijing

From: amd-gfx  On Behalf Of Jiang, Sonny
Sent: Monday, April 29, 2024 9:49 AM
To: amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH v3] drm/amdgpu: IB test encode test package change for VCN5


[AMD Official Use Only - General]



Ping.

Sonny

From: Jiang, Sonny <sonny.ji...@amd.com>
Sent: Thursday, April 25, 2024 4:12 PM
To: amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH v3] drm/amdgpu: IB test encode test package change for VCN5

In testing, I found no errors on VCN1 through VCN4.

Thanks,
Sonny


From: Jiang, Sonny <sonny.ji...@amd.com>
Sent: Thursday, April 25, 2024 4:10 PM
To: amd-gfx@lists.freedesktop.org
Cc: Jiang, Sonny <sonny.ji...@amd.com>; Jiang, Sonny <sonny.ji...@amd.com>
Subject: [PATCH v3] drm/amdgpu: IB test encode test package change for VCN5

From: Sonny Jiang <sonji...@amd.com>

VCN5 session info package interface changed

Signed-off-by: Sonny Jiang <sonny.ji...@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
index 677eb141554e..b89605b400c0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
@@ -885,7 +885,7 @@ static int amdgpu_vcn_enc_get_create_msg(struct amdgpu_ring 
*ring, uint32_t hand
 ib->ptr[ib->length_dw++] = handle;
 ib->ptr[ib->length_dw++] = upper_32_bits(addr);
 ib->ptr[ib->length_dw++] = addr;
-   ib->ptr[ib->length_dw++] = 0x000b;
+   ib->ptr[ib->length_dw++] = 0x;

 ib->ptr[ib->length_dw++] = 0x0014;
 ib->ptr[ib->length_dw++] = 0x0002; /* task info */
@@ -952,7 +952,7 @@ static int amdgpu_vcn_enc_get_destroy_msg(struct 
amdgpu_ring *ring, uint32_t han
 ib->ptr[ib->length_dw++] = handle;
 ib->ptr[ib->length_dw++] = upper_32_bits(addr);
 ib->ptr[ib->length_dw++] = addr;
-   ib->ptr[ib->length_dw++] = 0x000b;
+   ib->ptr[ib->length_dw++] = 0x;

 ib->ptr[ib->length_dw++] = 0x0014;
 ib->ptr[ib->length_dw++] = 0x0002;
--
2.43.2


[PATCH 1/3] drm/amdgpu: Add amdgpu_bo_is_vm_bo helper

2024-04-29 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

Help code readability by replacing a bunch of:

bo->tbo.base.resv == vm->root.bo->tbo.base.resv

With:

amdgpu_vm_is_bo_always_valid(vm, bo)

No functional changes.

v2:
 * Rename helper and move to amdgpu_vm. (Christian)

Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c  | 40 +++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h  |  2 ++
 3 files changed, 28 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 67c234bcf89f..e698d65e9508 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -174,7 +174,7 @@ static int amdgpu_gem_object_open(struct drm_gem_object 
*obj,
return -EPERM;
 
if (abo->flags & AMDGPU_GEM_CREATE_VM_ALWAYS_VALID &&
-   abo->tbo.base.resv != vm->root.bo->tbo.base.resv)
+   !amdgpu_vm_is_bo_always_valid(vm, abo))
return -EPERM;
 
r = amdgpu_bo_reserve(abo, false);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 8af3f0fd3073..01ca4b35b369 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -333,7 +333,7 @@ void amdgpu_vm_bo_base_init(struct amdgpu_vm_bo_base *base,
base->next = bo->vm_bo;
bo->vm_bo = base;
 
-   if (bo->tbo.base.resv != vm->root.bo->tbo.base.resv)
+   if (!amdgpu_vm_is_bo_always_valid(vm, bo))
return;
 
dma_resv_assert_held(vm->root.bo->tbo.base.resv);
@@ -1101,13 +1101,13 @@ static void amdgpu_vm_bo_get_memory(struct amdgpu_bo_va 
*bo_va,
 * For now ignore BOs which are currently locked and potentially
 * changing their location.
 */
-   if (bo->tbo.base.resv != vm->root.bo->tbo.base.resv &&
+   if (!amdgpu_vm_is_bo_always_valid(vm, bo) &&
!dma_resv_trylock(bo->tbo.base.resv))
return;
 
amdgpu_bo_get_memory(bo, stats);
-   if (bo->tbo.base.resv != vm->root.bo->tbo.base.resv)
-   dma_resv_unlock(bo->tbo.base.resv);
+   if (!amdgpu_vm_is_bo_always_valid(vm, bo))
+   dma_resv_unlock(bo->tbo.base.resv);
 }
 
 void amdgpu_vm_get_memory(struct amdgpu_vm *vm,
@@ -1203,8 +1203,7 @@ int amdgpu_vm_bo_update(struct amdgpu_device *adev, 
struct amdgpu_bo_va *bo_va,
uncached = false;
}
 
-   if (clear || (bo && bo->tbo.base.resv ==
- vm->root.bo->tbo.base.resv))
+   if (clear || amdgpu_vm_is_bo_always_valid(vm, bo))
last_update = >last_update;
else
last_update = _va->last_pt_update;
@@ -1246,7 +1245,7 @@ int amdgpu_vm_bo_update(struct amdgpu_device *adev, 
struct amdgpu_bo_va *bo_va,
 * the evicted list so that it gets validated again on the
 * next command submission.
 */
-   if (bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv) {
+   if (amdgpu_vm_is_bo_always_valid(vm, bo)) {
uint32_t mem_type = bo->tbo.resource->mem_type;
 
if (!(bo->preferred_domains &
@@ -1640,10 +1639,9 @@ static void amdgpu_vm_bo_insert_map(struct amdgpu_device 
*adev,
if (mapping->flags & AMDGPU_PTE_PRT)
amdgpu_vm_prt_get(adev);
 
-   if (bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv &&
-   !bo_va->base.moved) {
+   if (amdgpu_vm_is_bo_always_valid(vm, bo) && !bo_va->base.moved)
amdgpu_vm_bo_moved(_va->base);
-   }
+
trace_amdgpu_vm_bo_map(bo_va, mapping);
 }
 
@@ -1922,7 +1920,7 @@ int amdgpu_vm_bo_clear_mappings(struct amdgpu_device 
*adev,
if (before->flags & AMDGPU_PTE_PRT)
amdgpu_vm_prt_get(adev);
 
-   if (bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv &&
+   if (amdgpu_vm_is_bo_always_valid(vm, bo) &&
!before->bo_va->base.moved)
amdgpu_vm_bo_moved(>bo_va->base);
} else {
@@ -1937,7 +1935,7 @@ int amdgpu_vm_bo_clear_mappings(struct amdgpu_device 
*adev,
if (after->flags & AMDGPU_PTE_PRT)
amdgpu_vm_prt_get(adev);
 
-   if (bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv &&
+   if (amdgpu_vm_is_bo_always_valid(vm, bo) &&
!after->bo_va->base.moved)
amdgpu_vm_bo_moved(>bo_va->base);
} else {
@@ -2017,7 +2015,7 @@ void amdgpu_vm_bo_del(struct amdgpu_device *adev,
 
if (bo) {
dma_resv_assert_held(bo->tbo.base.resv);
-   if (bo->tbo.base.resv == vm->root.bo->tbo.base.resv)
+   if (amdgpu_vm_is_bo_always_valid(vm, bo))
ttm_bo_set_bulk_move(>tbo, NULL);
 
for (base = _va->base.bo->vm_bo; 

[PATCH 2/3] drm/amdgpu: Reduce mem_type to domain double indirection

2024-04-29 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

All memory domains apart from AMDGPU_GEM_DOMAIN_GTT map 1:1 to TTM
placements. The GTT domain can be either AMDGPU_PL_PREEMPT or TTM_PL_TT,
depending on AMDGPU_GEM_CREATE_PREEMPTIBLE.
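
The domain-to-placement relationship described above can be sketched as follows. All constants and the function name are hypothetical stand-ins chosen for this illustration; they are not the driver's real values or API.

```c
#include <stdint.h>

/* Stand-in placements and domains: hypothetical values for this sketch only. */
enum { PL_SYSTEM, PL_TT, PL_VRAM, PL_PREEMPT };
enum { DOMAIN_CPU = 0x1, DOMAIN_GTT = 0x2, DOMAIN_VRAM = 0x4 };

/* Stand-in for the "preemptible" creation flag. */
#define CREATE_PREEMPTIBLE (UINT64_C(1) << 0)

/* Every domain maps to exactly one placement, except GTT, which depends
 * on whether the buffer was created as preemptible. */
static uint32_t domain_to_mem_type(uint32_t domain, uint64_t flags)
{
	switch (domain) {
	case DOMAIN_VRAM:
		return PL_VRAM;
	case DOMAIN_GTT:
		return (flags & CREATE_PREEMPTIBLE) ? PL_PREEMPT : PL_TT;
	default:
		return PL_SYSTEM;
	}
}
```

Because only the GTT domain is ambiguous, code that already holds the current placement can compare against it directly instead of converting back to a domain first.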

Simplify a few places in the code which convert the TTM placement into
a domain by checking against the current placement directly.

In the conversion, AMDGPU_PL_PREEMPT does not have to be handled
because amdgpu_mem_type_to_domain() cannot return that value anyway.

v2:
 * Remove AMDGPU_PL_PREEMPT handling.

Signed-off-by: Tvrtko Ursulin 
Reviewed-by: Christian König  # v1
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c |  3 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c  | 27 +
 2 files changed, 12 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
index 055ba2ea4c12..0b3b10d21952 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
@@ -165,8 +165,7 @@ static struct sg_table *amdgpu_dma_buf_map(struct 
dma_buf_attachment *attach,
if (r)
return ERR_PTR(r);
 
-   } else if (!(amdgpu_mem_type_to_domain(bo->tbo.resource->mem_type) &
-AMDGPU_GEM_DOMAIN_GTT)) {
+   } else if (bo->tbo.resource->mem_type != TTM_PL_TT) {
return ERR_PTR(-EBUSY);
}
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 8bc79924d171..eb5bd6962560 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -976,12 +976,11 @@ int amdgpu_bo_pin_restricted(struct amdgpu_bo *bo, u32 
domain,
 
ttm_bo_pin(>tbo);
 
-   domain = amdgpu_mem_type_to_domain(bo->tbo.resource->mem_type);
-   if (domain == AMDGPU_GEM_DOMAIN_VRAM) {
+   if (bo->tbo.resource->mem_type == TTM_PL_VRAM) {
atomic64_add(amdgpu_bo_size(bo), >vram_pin_size);
atomic64_add(amdgpu_vram_mgr_bo_visible_size(bo),
 >visible_pin_size);
-   } else if (domain == AMDGPU_GEM_DOMAIN_GTT) {
+   } else if (bo->tbo.resource->mem_type == TTM_PL_TT) {
atomic64_add(amdgpu_bo_size(bo), >gart_pin_size);
}
 
@@ -1280,7 +1279,6 @@ void amdgpu_bo_get_memory(struct amdgpu_bo *bo,
 {
uint64_t size = amdgpu_bo_size(bo);
struct drm_gem_object *obj;
-   unsigned int domain;
bool shared;
 
/* Abort if the BO doesn't currently have a backing store */
@@ -1290,21 +1288,20 @@ void amdgpu_bo_get_memory(struct amdgpu_bo *bo,
obj = >tbo.base;
shared = drm_gem_object_is_shared_for_memory_stats(obj);
 
-   domain = amdgpu_mem_type_to_domain(bo->tbo.resource->mem_type);
-   switch (domain) {
-   case AMDGPU_GEM_DOMAIN_VRAM:
+   switch (bo->tbo.resource->mem_type) {
+   case TTM_PL_VRAM:
stats->vram += size;
if (amdgpu_bo_in_cpu_visible_vram(bo))
stats->visible_vram += size;
if (shared)
stats->vram_shared += size;
break;
-   case AMDGPU_GEM_DOMAIN_GTT:
+   case TTM_PL_TT:
stats->gtt += size;
if (shared)
stats->gtt_shared += size;
break;
-   case AMDGPU_GEM_DOMAIN_CPU:
+   case TTM_PL_SYSTEM:
default:
stats->cpu += size;
if (shared)
@@ -1317,7 +1314,7 @@ void amdgpu_bo_get_memory(struct amdgpu_bo *bo,
if (bo->flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED)
stats->requested_visible_vram += size;
 
-   if (domain != AMDGPU_GEM_DOMAIN_VRAM) {
+   if (bo->tbo.resource->mem_type != TTM_PL_VRAM) {
stats->evicted_vram += size;
if (bo->flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED)
stats->evicted_visible_vram += size;
@@ -1592,19 +1589,17 @@ u64 amdgpu_bo_print_info(int id, struct amdgpu_bo *bo, 
struct seq_file *m)
u64 size;
 
if (dma_resv_trylock(bo->tbo.base.resv)) {
-   unsigned int domain;
-   domain = amdgpu_mem_type_to_domain(bo->tbo.resource->mem_type);
-   switch (domain) {
-   case AMDGPU_GEM_DOMAIN_VRAM:
+   switch (bo->tbo.resource->mem_type) {
+   case TTM_PL_VRAM:
if (amdgpu_bo_in_cpu_visible_vram(bo))
placement = "VRAM VISIBLE";
else
placement = "VRAM";
break;
-   case AMDGPU_GEM_DOMAIN_GTT:
+   case TTM_PL_TT:
placement = "GTT";
break;
-   case AMDGPU_GEM_DOMAIN_CPU:
+   case 

[PATCH 3/3] drm/amdgpu: Describe preemptible objects in debugfs

2024-04-29 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

Instead of mixing them together with regular system memory objects mark
them explicitly as 'PREEMPTIBLE'.

Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
Cc: Felix Kuehling 
---
No idea on the name to use.. :)
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index eb5bd6962560..be6c2f5b9fcb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -1599,6 +1599,9 @@ u64 amdgpu_bo_print_info(int id, struct amdgpu_bo *bo, 
struct seq_file *m)
case TTM_PL_TT:
placement = "GTT";
break;
+   case AMDGPU_PL_PREEMPT:
+   placement = "PREEMPTIBLE";
+   break;
case TTM_PL_SYSTEM:
default:
placement = "CPU";
-- 
2.44.0



Re: [PATCH] drm/amd/display: Add MSF panel to DPCD 0x317 patch list

2024-04-29 Thread Alex Deucher
Applied.  Thanks!

On Fri, Mar 8, 2024 at 8:58 PM  wrote:
>
> From: Tobias Jakobi 
>
> This 8.4 inch panel is integrated in the Ayaneo Kun handheld
> device. The panel resolution is 2560×1600, i.e. it has
> portrait dimensions.
>
> Decoding the EDID shows:
> Manufacturer: MSF
> Model: 4099
> Display Product Name: 'TV080WUM-NL0 '
>
> Judging from the product name this might be a clone of a
> BOE panel, but with larger dimensions.
>
> Panel frequently shows non-functional backlight control. Adding
> some debug prints to update_connector_ext_caps() shows that
> sometimes the OLED bit of ext_caps is set, and then the driver
> assumes that backlight is controlled via AUX.
>
> Forcing backlight control to PWM via amdgpu.backlight=0 restores
> backlight operation.
>
> Signed-off-by: Tobias Jakobi 
> ---
>  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c 
> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c
> index 7a09a72e182f..5a017ba94e3c 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c
> @@ -68,6 +68,7 @@ static void apply_edid_quirks(struct edid *edid, struct 
> dc_edid_caps *edid_caps)
> case drm_edid_encode_panel_id('A', 'U', 'O', 0xE69B):
> case drm_edid_encode_panel_id('B', 'O', 'E', 0x092A):
> case drm_edid_encode_panel_id('L', 'G', 'D', 0x06D1):
> +   case drm_edid_encode_panel_id('M', 'S', 'F', 0x1003):
> DRM_DEBUG_DRIVER("Clearing DPCD 0x317 on monitor with panel 
> id %X\n", panel_id);
> edid_caps->panel_patch.remove_sink_ext_caps = true;
> break;
> --
> 2.43.0
>
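
For reference, the EDID model number 4099 is 0x1003 in hexadecimal, which is the product code used in the new quirk entry. The sketch below is a hypothetical userspace mirror of how a panel ID of this kind is packed (three 5-bit letters plus a 16-bit product code); the function name encode_panel_id is an assumption for illustration and is not the kernel macro itself.

```c
#include <stdint.h>

/* Hypothetical userspace mirror of EDID panel ID packing:
 * three 5-bit vendor letters ('A' == 1) in the top bits, followed by
 * a 16-bit product code in the low bits. */
static uint32_t encode_panel_id(char c0, char c1, char c2, uint16_t product)
{
	return ((((uint32_t)(c0 - '@')) & 0x1f) << 26) |
	       ((((uint32_t)(c1 - '@')) & 0x1f) << 21) |
	       ((((uint32_t)(c2 - '@')) & 0x1f) << 16) |
	       (uint32_t)product;
}
```

With this packing, the manufacturer string "MSF" and model 4099 from the decoded EDID combine into a single 32-bit key that a switch statement can match.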


Re: [PATCH] drm/amd/display: Remove duplicate dcn401/dcn401_clk_mgr.h header

2024-04-29 Thread Alex Deucher
Applied.  Thanks!

On Wed, Apr 24, 2024 at 11:42 PM Jiapeng Chong
 wrote:
>
> ./drivers/gpu/drm/amd/display/dc/clk_mgr/dcn401/dcn401_clk_mgr.c: 
> dcn401/dcn401_clk_mgr.h is included more than once.
>
> Reported-by: Abaci Robot 
> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=8885
> Signed-off-by: Jiapeng Chong 
> ---
>  drivers/gpu/drm/amd/display/dc/clk_mgr/dcn401/dcn401_clk_mgr.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn401/dcn401_clk_mgr.c 
> b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn401/dcn401_clk_mgr.c
> index d146c35f6d60..005092b0a0cb 100644
> --- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn401/dcn401_clk_mgr.c
> +++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn401/dcn401_clk_mgr.c
> @@ -21,7 +21,6 @@
>  #include "dcn/dcn_4_1_0_offset.h"
>  #include "dcn/dcn_4_1_0_sh_mask.h"
>
> -#include "dcn401/dcn401_clk_mgr.h"
>  #include "dml/dcn401/dcn401_fpu.h"
>
>  #define mmCLK01_CLK0_CLK_PLL_REQ0x16E37
> --
> 2.20.1.7.g153144c
>


Re: [PATCH] drm/amd/display: Remove duplicate spl/dc_spl_types.h header

2024-04-29 Thread Alex Deucher
Applied.  Thanks!

On Wed, Apr 24, 2024 at 9:52 PM Jiapeng Chong
 wrote:
>
> ./drivers/gpu/drm/amd/display/dc/inc/hw/transform.h: spl/dc_spl_types.h is 
> included more than once.
>
> Reported-by: Abaci Robot 
> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=8884
> Signed-off-by: Jiapeng Chong 
> ---
>  drivers/gpu/drm/amd/display/dc/inc/hw/transform.h | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/display/dc/inc/hw/transform.h 
> b/drivers/gpu/drm/amd/display/dc/inc/hw/transform.h
> index 5aa2f1a1fb83..28da1dddf0a0 100644
> --- a/drivers/gpu/drm/amd/display/dc/inc/hw/transform.h
> +++ b/drivers/gpu/drm/amd/display/dc/inc/hw/transform.h
> @@ -31,8 +31,6 @@
>  #include "fixed31_32.h"
>  #include "spl/dc_spl_types.h"
>
> -#include "spl/dc_spl_types.h"
> -
>  #define CSC_TEMPERATURE_MATRIX_SIZE 12
>
>  struct bit_depth_reduction_params;
> --
> 2.19.1.6.gb485710b
>


Re: [PATCH] drm/amdgpu: Fix signedness bug in sdma_v4_0_process_trap_irq()

2024-04-29 Thread Alex Deucher
Applied.  Thanks!

On Sun, Apr 28, 2024 at 9:32 PM Zhou, Bob  wrote:
>
> [Public]
>
> Reviewed-by: Bob Zhou 
>
> Regards,
> Bob
>
> -Original Message-
> From: Dan Carpenter 
> Sent: 2024年4月28日 20:57
> To: Zhou, Bob 
> Cc: Deucher, Alexander ; Koenig, Christian 
> ; Pan, Xinhui ; David Airlie 
> ; Daniel Vetter ; Kuehling, Felix 
> ; Zhang, Hawking ; Guchun Chen 
> ; Ma, Le ; Lazar, Lijo 
> ; Sharma, Shashank ; 
> amd-gfx@lists.freedesktop.org; dri-de...@lists.freedesktop.org; 
> linux-ker...@vger.kernel.org; kernel-janit...@vger.kernel.org
> Subject: [PATCH] drm/amdgpu: Fix signedness bug in 
> sdma_v4_0_process_trap_irq()
>
> The "instance" variable needs to be signed for the error handling to work.
>
> Fixes: b34ddc71267a ("drm/amdgpu: add error handle to avoid out-of-bounds")
> Signed-off-by: Dan Carpenter 
> ---
>  drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c 
> b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> index 101038395c3b..772604feb6ac 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> @@ -2017,7 +2017,7 @@ static int sdma_v4_0_process_trap_irq(struct 
> amdgpu_device *adev,
>   struct amdgpu_irq_src *source,
>   struct amdgpu_iv_entry *entry)  {
> -   uint32_t instance;
> +   int instance;
>
> DRM_DEBUG("IH: SDMA trap\n");
> instance = sdma_v4_0_irq_id_to_seq(entry->client_id);
> --
> 2.43.0
>
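
A minimal sketch of why the variable type matters, assuming the lookup returns either a valid index or a negative errno. The function irq_id_to_seq and the limit of 8 are hypothetical stand-ins, not the driver's code.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for a lookup that returns an index or a negative errno. */
static int irq_id_to_seq(unsigned int client_id)
{
	return client_id < 8 ? (int)client_id : -EINVAL;
}

/* Buggy variant: the negative errno wraps to a large unsigned value,
 * so the "< 0" test can never be true and the error goes unnoticed. */
static int lookup_failed_unsigned(unsigned int client_id)
{
	uint32_t instance = irq_id_to_seq(client_id);

	return instance < 0;
}

/* Fixed variant: a signed variable preserves the negative errno. */
static int lookup_failed_signed(unsigned int client_id)
{
	int instance = irq_id_to_seq(client_id);

	return instance < 0;
}
```

Some compilers can flag the always-false comparison in the unsigned variant when extra warnings are enabled, which is one way bugs of this kind get found.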


Re: [PATCH] Revert "drm/amd/pm: fixes a random hang in S4 for SMU v13.0.4/11"

2024-04-29 Thread Mario Limonciello

On 4/29/2024 08:38, Alex Deucher wrote:

> This reverts commit 31729e8c21ecfd671458e02b6511eb68c2225113.
>
> This causes problems with reboots/shutdowns for some users.
>
> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3351
> Signed-off-by: Alex Deucher
> Cc: Tim Huang


It would be unfortunate to drop this, as it did fix S4 for a number of
users too.  Rather than dropping it, could the check be made on adev->in_s4?

In any case, whichever solution is chosen should also CC stable, because
the problematic commit did eventually go to stable.
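
A rough sketch of the alternative suggested above, under the assumption that the GFX reset workaround is only needed on the S4 path. The struct, enum, and send_msg helper are hypothetical stand-ins for the driver state and SMU message calls; this is not a tested driver change.

```c
#include <stdbool.h>

/* Stand-ins: hypothetical simplifications of the driver state and SMU calls. */
struct dev_state { bool in_s0ix; bool in_s4; };
enum smu_msg { MSG_GFX_RESET, MSG_PREPARE_MP1_UNLOAD };

static int sent[2];	/* counts how often each message was sent */

static int send_msg(enum smu_msg msg)
{
	sent[msg]++;
	return 0;
}

static int system_features_control(const struct dev_state *adev, bool en)
{
	int ret = 0;

	if (!en && !adev->in_s0ix) {
		/* Apply the GFX reset workaround only when entering S4. */
		if (adev->in_s4) {
			ret = send_msg(MSG_GFX_RESET);
			if (ret)
				return ret;
		}
		ret = send_msg(MSG_PREPARE_MP1_UNLOAD);
	}

	return ret;
}
```

In this shape, a plain reboot or shutdown would skip the reset and only send the unload message, while hibernation would keep the workaround.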



> ---
>   drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.c | 12 +---
>   1 file changed, 1 insertion(+), 11 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.c
> b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.c
> index 88f1a0d878f33..e8119918ef6b1 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.c
> @@ -226,18 +226,8 @@ static int smu_v13_0_4_system_features_control(struct
> smu_context *smu, bool en)
> struct amdgpu_device *adev = smu->adev;
> int ret = 0;
>
> -	if (!en && !adev->in_s0ix) {
> -   /* Adds a GFX reset as workaround just before sending the
> -* MP1_UNLOAD message to prevent GC/RLC/PMFW from entering
> -* an invalid state.
> -*/
> -   ret = smu_cmn_send_smc_msg_with_param(smu,
> SMU_MSG_GfxDeviceDriverReset,
> - SMU_RESET_MODE_2, NULL);
> -   if (ret)
> -   return ret;
> -
> +   if (!en && !adev->in_s0ix)
> ret = smu_cmn_send_smc_msg(smu, SMU_MSG_PrepareMp1ForUnload,
> NULL);
> -   }
>
>   	return ret;
>   }




[PATCH 04/11] drm/amdgpu/mes12: add mes mapping legacy queue support

2024-04-29 Thread Alex Deucher
From: Jack Xiao 

Add mes12 map legacy queue packet submission.

Signed-off-by: Jack Xiao 
Reviewed-by: Hawking Zhang 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/mes_v12_0.c | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
index b60ed178114e9..132868b8db198 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
@@ -230,6 +230,31 @@ static int mes_v12_0_remove_hw_queue(struct amdgpu_mes 
*mes,
offsetof(union MESAPI__REMOVE_QUEUE, api_status));
 }
 
+static int mes_v12_0_map_legacy_queue(struct amdgpu_mes *mes,
+ struct mes_map_legacy_queue_input *input)
+{
+   union MESAPI__ADD_QUEUE mes_add_queue_pkt;
+
+   memset(&mes_add_queue_pkt, 0, sizeof(mes_add_queue_pkt));
+
+   mes_add_queue_pkt.header.type = MES_API_TYPE_SCHEDULER;
+   mes_add_queue_pkt.header.opcode = MES_SCH_API_ADD_QUEUE;
+   mes_add_queue_pkt.header.dwsize = API_FRAME_SIZE_IN_DWORDS;
+
+   mes_add_queue_pkt.pipe_id = input->pipe_id;
+   mes_add_queue_pkt.queue_id = input->queue_id;
+   mes_add_queue_pkt.doorbell_offset = input->doorbell_offset;
+   mes_add_queue_pkt.mqd_addr = input->mqd_addr;
+   mes_add_queue_pkt.wptr_addr = input->wptr_addr;
+   mes_add_queue_pkt.queue_type =
+   convert_to_mes_queue_type(input->queue_type);
+   mes_add_queue_pkt.map_legacy_kq = 1;
+
+   return mes_v12_0_submit_pkt_and_poll_completion(mes,
+   &mes_add_queue_pkt, sizeof(mes_add_queue_pkt),
+   offsetof(union MESAPI__ADD_QUEUE, api_status));
+}
+
 static int mes_v12_0_unmap_legacy_queue(struct amdgpu_mes *mes,
struct mes_unmap_legacy_queue_input *input)
 {
@@ -493,6 +518,7 @@ static void mes_v12_0_enable_unmapped_doorbell_handling(
 static const struct amdgpu_mes_funcs mes_v12_0_funcs = {
.add_hw_queue = mes_v12_0_add_hw_queue,
.remove_hw_queue = mes_v12_0_remove_hw_queue,
+   .map_legacy_queue = mes_v12_0_map_legacy_queue,
.unmap_legacy_queue = mes_v12_0_unmap_legacy_queue,
.suspend_gang = mes_v12_0_suspend_gang,
.resume_gang = mes_v12_0_resume_gang,
-- 
2.44.0



[PATCH 08/11] drm/amdgpu: add module parameter 'amdgpu_uni_mes'

2024-04-29 Thread Alex Deucher
From: Jack Xiao 

Add module parameter 'amdgpu_uni_mes' to enable/disable unified
mes fw support.

Signed-off-by: Jack Xiao 
Reviewed-by: Hawking Zhang 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h   |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c |  2 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c   | 10 ++
 3 files changed, 13 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index c4355d72df02e..8bb8b414d5113 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -220,6 +220,7 @@ extern int amdgpu_discovery;
 extern int amdgpu_mes;
 extern int amdgpu_mes_log_enable;
 extern int amdgpu_mes_kiq;
+extern int amdgpu_uni_mes;
 extern int amdgpu_noretry;
 extern int amdgpu_force_asic_type;
 extern int amdgpu_smartshift_bias;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index 7187968226a81..6b240f6e98b7d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -2223,6 +2223,8 @@ static int amdgpu_discovery_set_mes_ip_blocks(struct 
amdgpu_device *adev)
amdgpu_device_ip_block_add(adev, _v11_0_ip_block);
adev->enable_mes = true;
adev->enable_mes_kiq = true;
+   if (amdgpu_uni_mes)
+   adev->enable_uni_mes = true;
break;
default:
break;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index ea14f1c8f4304..447fa858c6541 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -197,6 +197,7 @@ int amdgpu_discovery = -1;
 int amdgpu_mes;
 int amdgpu_mes_log_enable = 0;
 int amdgpu_mes_kiq;
+int amdgpu_uni_mes;
 int amdgpu_noretry = -1;
 int amdgpu_force_asic_type = -1;
 int amdgpu_tmz = -1; /* auto */
@@ -686,6 +687,15 @@ MODULE_PARM_DESC(mes_kiq,
"Enable Micro Engine Scheduler KIQ (0 = disabled (default), 1 = 
enabled)");
 module_param_named(mes_kiq, amdgpu_mes_kiq, int, 0444);
 
+/**
+ * DOC: uni_mes (int)
+ * Enable Unified Micro Engine Scheduler. This is a new engine pipe for 
unified scheduler.
+ * (0 = disabled (default), 1 = enabled)
+ */
+MODULE_PARM_DESC(uni_mes,
+   "Enable Unified Micro Engine Scheduler (0 = disabled (default), 1 = 
enabled)");
+module_param_named(uni_mes, amdgpu_uni_mes, int, 0444);
+
 /**
  * DOC: noretry (int)
  * Disable XNACK retry in the SQ by default on GFXv9 hardware. On ASICs that
-- 
2.44.0



[PATCH 11/11] drm/amdgpu/discovery: add mes v12_0 ip block

2024-04-29 Thread Alex Deucher
From: Likun Gao 

Add mes v12_0 ip block.

v2: squash in update (Alex)
v3: rebase on unified mes changes (Alex)

Signed-off-by: Likun Gao 
Reviewed-by: Hawking Zhang 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index 3410fa7022fca..df13c2f9673f2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -96,6 +96,7 @@
 #include "amdgpu_vkms.h"
 #include "mes_v10_1.h"
 #include "mes_v11_0.h"
+#include "mes_v12_0.h"
 #include "smuio_v11_0.h"
 #include "smuio_v11_0_6.h"
 #include "smuio_v13_0.h"
@@ -2233,6 +2234,14 @@ static int amdgpu_discovery_set_mes_ip_blocks(struct 
amdgpu_device *adev)
if (amdgpu_uni_mes)
adev->enable_uni_mes = true;
break;
+   case IP_VERSION(12, 0, 0):
+   case IP_VERSION(12, 0, 1):
+   amdgpu_device_ip_block_add(adev, &mes_v12_0_ip_block);
+   adev->enable_mes = true;
+   adev->enable_mes_kiq = true;
+   if (amdgpu_uni_mes)
+   adev->enable_uni_mes = true;
+   break;
default:
break;
}
-- 
2.44.0



[PATCH 10/11] drm/amdgpu/discovery: add gfx v12_0 ip block

2024-04-29 Thread Alex Deucher
From: Likun Gao 

Add gfx v12_0 ip block.

v2: Squash in update (Alex)
v3: add exp flag (Alex)

Signed-off-by: Likun Gao 
Reviewed-by: Hawking Zhang 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index 6b240f6e98b7d..3410fa7022fca 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -76,6 +76,7 @@
 #include "ih_v7_0.h"
 #include "gfx_v10_0.h"
 #include "gfx_v11_0.h"
+#include "gfx_v12_0.h"
 #include "sdma_v5_0.h"
 #include "sdma_v5_2.h"
 #include "sdma_v6_0.h"
@@ -2037,6 +2038,12 @@ static int amdgpu_discovery_set_gc_ip_blocks(struct 
amdgpu_device *adev)
case IP_VERSION(11, 5, 1):
amdgpu_device_ip_block_add(adev, _v11_0_ip_block);
break;
+   case IP_VERSION(12, 0, 0):
+   case IP_VERSION(12, 0, 1):
+   if (!amdgpu_exp_hw_support)
+   return -EINVAL;
+   amdgpu_device_ip_block_add(adev, &gfx_v12_0_ip_block);
+   break;
default:
dev_err(adev->dev, "Failed to add gfx ip block(GC_HWIP:0x%x)\n",
amdgpu_ip_version(adev, GC_HWIP, 0));
-- 
2.44.0



[PATCH 09/11] drm/amdgpu/mes12: disable logging output

2024-04-29 Thread Alex Deucher
From: Jack Xiao 

A random page fault was observed; temporarily disable
mes log buffer output.

Signed-off-by: Jack Xiao 
Reviewed-by: Kenneth Feng 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/mes_v12_0.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
index 0b67ff9c04924..cbd5b312a075b 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
@@ -440,7 +440,7 @@ static int mes_v12_0_set_hw_resources(struct amdgpu_mes 
*mes)
mes_set_hw_res_pkt.enable_reg_active_poll = 1;
mes_set_hw_res_pkt.oversubscription_timer = 50;
 
-   mes_set_hw_res_pkt.enable_mes_event_int_logging = 1;
+   mes_set_hw_res_pkt.enable_mes_event_int_logging = 0;
mes_set_hw_res_pkt.event_intr_history_gpu_mc_ptr = 
mes->event_log_gpu_addr;
 
return mes_v12_0_submit_pkt_and_poll_completion(mes,
-- 
2.44.0



[PATCH 06/11] drm/amdgpu: Disable unmapped doorbell handling basic mode on mes 12

2024-04-29 Thread Alex Deucher
From: shaoyunl 

The new mechanism for unmapped doorbell handling requires both driver side and
MES fw side change. The FW side changes are still not released.
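
The header change in this patch folds the 2-bit unmapped_doorbell_handling field back into the reserved bits (15 + 2 = 17), so the flags word keeps its size. A shortened sketch of that idea follows; the union names are hypothetical, and collapsing the other flags into a single 15-bit other_flags field is an assumption made so that the word totals 32 bits.

```c
#include <stdint.h>

/* Hypothetical, shortened flags word: the individual one-bit flags are
 * collapsed into "other_flags" (assumed to total 15 bits). */
union flags_with_field {
	struct {
		uint32_t other_flags : 15;
		uint32_t unmapped_doorbell_handling : 2;
		uint32_t reserved : 15;
	};
	uint32_t uint32_all;
};

/* Same word after the field is removed: its 2 bits move into reserved. */
union flags_without_field {
	struct {
		uint32_t other_flags : 15;
		uint32_t reserved : 17;
	};
	uint32_t uint32_all;
};

/* Both layouts still occupy exactly one 32-bit word. */
_Static_assert(sizeof(union flags_with_field) == sizeof(uint32_t),
	       "flags word must stay 32 bits");
_Static_assert(sizeof(union flags_without_field) == sizeof(uint32_t),
	       "flags word must stay 32 bits");
```

Keeping the total at 32 bits means the fields that follow the flags word in the packet stay at the same offsets for the firmware.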

Signed-off-by: shaoyunl 
Reviewed-by: Harish Kasiviswanthan 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/mes_v12_0.c| 16 +---
 drivers/gpu/drm/amd/include/mes_v12_api_def.h |  3 +--
 2 files changed, 2 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
index 132868b8db198..cf6dea13cc955 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
@@ -422,14 +422,7 @@ static int mes_v12_0_set_hw_resources(struct amdgpu_mes 
*mes)
mes_set_hw_res_pkt.disable_mes_log = 1;
mes_set_hw_res_pkt.use_different_vmid_compute = 1;
mes_set_hw_res_pkt.enable_reg_active_poll = 1;
-
-   /*
-* No need to enable oversubscribe timer when we have unmapped doorbell
-* handling support.
-* handling  mode - 0: disabled; 1: basic version; 2: basic+ version
-*/
-   mes_set_hw_res_pkt.oversubscription_timer = 0;
-   mes_set_hw_res_pkt.unmapped_doorbell_handling = 1;
+   mes_set_hw_res_pkt.oversubscription_timer = 50;
 
 
mes_set_hw_res_pkt.enable_mes_event_int_logging = 1;
@@ -877,13 +870,6 @@ static int mes_v12_0_mqd_init(struct amdgpu_ring *ring)
mqd->cp_hqd_iq_timer = regCP_HQD_IQ_TIMER_DEFAULT;
mqd->cp_hqd_quantum = regCP_HQD_QUANTUM_DEFAULT;
 
-   /*
-* Set CP_HQD_GFX_CONTROL.DB_UPDATED_MSG_EN[15] to enable unmapped
-* doorbell handling. This is a reserved CP internal register can
-* not be accesss by others
-*/
-   mqd->reserved_184 = BIT(15);
-
return 0;
 }
 
diff --git a/drivers/gpu/drm/amd/include/mes_v12_api_def.h 
b/drivers/gpu/drm/amd/include/mes_v12_api_def.h
index 2cdecf937acef..81cc0a5540492 100644
--- a/drivers/gpu/drm/amd/include/mes_v12_api_def.h
+++ b/drivers/gpu/drm/amd/include/mes_v12_api_def.h
@@ -238,8 +238,7 @@ union MESAPI_SET_HW_RESOURCES {
uint32_t send_write_data : 1;
uint32_t os_tdr_timeout_override : 1;
uint32_t use_rs64mem_for_proc_gang_ctx : 1;
-   uint32_t unmapped_doorbell_handling: 2;
-   uint32_t reserved : 15;
+   uint32_t reserved : 17;
};
uint32_t uint32_all;
};
-- 
2.44.0



[PATCH 05/11] drm/amdgpu/gfx: enable mes to map legacy queue support

2024-04-29 Thread Alex Deucher
From: Jack Xiao 

Enable mes to map legacy queue support.

v2: drop unused gfx_v12_0_kiq_enable_kgq() (Alex)

Signed-off-by: Jack Xiao 
Reviewed-by: Hawking Zhang 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 39 +
 drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c  | 25 ++--
 2 files changed, 36 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
index 1d955652f3ba6..8e7cc44143857 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
@@ -623,10 +623,28 @@ int amdgpu_gfx_enable_kcq(struct amdgpu_device *adev, int 
xcc_id)
queue_mask |= (1ull << 
amdgpu_queue_mask_bit_to_set_resource_bit(adev, i));
}
 
-   DRM_INFO("kiq ring mec %d pipe %d q %d\n", kiq_ring->me, kiq_ring->pipe,
-   kiq_ring->queue);
amdgpu_device_flush_hdp(adev, NULL);
 
+   if (adev->enable_mes)
+   queue_mask = ~0ULL;
+
+   if (adev->enable_uni_mes) {
+   for (i = 0; i < adev->gfx.num_compute_rings; i++) {
+   j = i + xcc_id * adev->gfx.num_compute_rings;
+   r = amdgpu_mes_map_legacy_queue(adev,
+   
>gfx.compute_ring[j]);
+   if (r) {
+   DRM_ERROR("failed to map compute queue\n");
+   return r;
+   }
+   }
+
+   return 0;
+   }
+
+   DRM_INFO("kiq ring mec %d pipe %d q %d\n", kiq_ring->me, kiq_ring->pipe,
+kiq_ring->queue);
+
spin_lock(>ring_lock);
r = amdgpu_ring_alloc(kiq_ring, kiq->pmf->map_queues_size *
adev->gfx.num_compute_rings +
@@ -637,9 +655,6 @@ int amdgpu_gfx_enable_kcq(struct amdgpu_device *adev, int 
xcc_id)
return r;
}
 
-   if (adev->enable_mes)
-   queue_mask = ~0ULL;
-
kiq->pmf->kiq_set_resources(kiq_ring, queue_mask);
for (i = 0; i < adev->gfx.num_compute_rings; i++) {
j = i + xcc_id * adev->gfx.num_compute_rings;
@@ -666,6 +681,20 @@ int amdgpu_gfx_enable_kgq(struct amdgpu_device *adev, int 
xcc_id)
 
amdgpu_device_flush_hdp(adev, NULL);
 
+   if (adev->enable_uni_mes) {
+   for (i = 0; i < adev->gfx.num_gfx_rings; i++) {
+   j = i + xcc_id * adev->gfx.num_gfx_rings;
+   r = amdgpu_mes_map_legacy_queue(adev,
+   &adev->gfx.gfx_ring[j]);
+   if (r) {
+   DRM_ERROR("failed to map gfx queue\n");
+   return r;
+   }
+   }
+
+   return 0;
+   }
+
spin_lock(>ring_lock);
/* No need to map kcq on the slave */
if (amdgpu_gfx_is_master_xcc(adev, xcc_id)) {
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
index 51955a4e47d59..3e2a6806f1c19 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
@@ -2707,28 +2707,6 @@ static int gfx_v12_0_gfx_init_queue(struct amdgpu_ring 
*ring)
return 0;
 }
 
-static int gfx_v12_0_kiq_enable_kgq(struct amdgpu_device *adev)
-{
-   struct amdgpu_kiq *kiq = >gfx.kiq[0];
-   struct amdgpu_ring *kiq_ring = >gfx.kiq[0].ring;
-   int r, i;
-
-   if (!kiq->pmf || !kiq->pmf->kiq_map_queues)
-   return -EINVAL;
-
-   r = amdgpu_ring_alloc(kiq_ring, kiq->pmf->map_queues_size *
-   adev->gfx.num_gfx_rings);
-   if (r) {
-   DRM_ERROR("Failed to lock KIQ (%d).\n", r);
-   return r;
-   }
-
-   for (i = 0; i < adev->gfx.num_gfx_rings; i++)
-   kiq->pmf->kiq_map_queues(kiq_ring, >gfx.gfx_ring[i]);
-
-   return amdgpu_ring_test_helper(kiq_ring);
-}
-
 static int gfx_v12_0_cp_async_gfx_ring_resume(struct amdgpu_device *adev)
 {
int r, i;
@@ -2751,7 +2729,8 @@ static int gfx_v12_0_cp_async_gfx_ring_resume(struct 
amdgpu_device *adev)
if (r)
goto done;
}
-   r = gfx_v12_0_kiq_enable_kgq(adev);
+
+   r = amdgpu_gfx_enable_kgq(adev, 0);
if (r)
goto done;
 
-- 
2.44.0



[PATCH 07/11] drm/amdgpu/mes12: add legacy setting hw resource interface

2024-04-29 Thread Alex Deucher
From: Jack Xiao 

For unified mes fw, add the legacy interface to set hardware
resources.

v2: remove warning (Alex)

Signed-off-by: Jack Xiao 
Reviewed-by: Hawking Zhang 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/mes_v12_0.c| 22 +--
 drivers/gpu/drm/amd/include/mes_v12_api_def.h | 22 +++
 2 files changed, 42 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
index cf6dea13cc955..0b67ff9c04924 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
@@ -376,6 +376,22 @@ static int mes_v12_0_misc_op(struct amdgpu_mes *mes,
offsetof(union MESAPI__MISC, api_status));
 }
 
+static int mes_v12_0_set_hw_resources_1(struct amdgpu_mes *mes)
+{
+   union MESAPI_SET_HW_RESOURCES_1 mes_set_hw_res_1_pkt;
+
+   memset(&mes_set_hw_res_1_pkt, 0, sizeof(mes_set_hw_res_1_pkt));
+
+   mes_set_hw_res_1_pkt.header.type = MES_API_TYPE_SCHEDULER;
+   mes_set_hw_res_1_pkt.header.opcode = MES_SCH_API_SET_HW_RSRC_1;
+   mes_set_hw_res_1_pkt.header.dwsize = API_FRAME_SIZE_IN_DWORDS;
+   mes_set_hw_res_1_pkt.mes_kiq_unmap_timeout = 100;
+
+   return mes_v12_0_submit_pkt_and_poll_completion(mes,
+   &mes_set_hw_res_1_pkt, sizeof(mes_set_hw_res_1_pkt),
+   offsetof(union MESAPI_SET_HW_RESOURCES_1, api_status));
+}
+
 static int mes_v12_0_set_hw_resources(struct amdgpu_mes *mes)
 {
int i;
@@ -424,7 +440,6 @@ static int mes_v12_0_set_hw_resources(struct amdgpu_mes 
*mes)
mes_set_hw_res_pkt.enable_reg_active_poll = 1;
mes_set_hw_res_pkt.oversubscription_timer = 50;
 
-
mes_set_hw_res_pkt.enable_mes_event_int_logging = 1;
mes_set_hw_res_pkt.event_intr_history_gpu_mc_ptr = 
mes->event_log_gpu_addr;
 
@@ -1043,7 +1058,7 @@ static int mes_v12_0_kiq_ring_init(struct amdgpu_device 
*adev)
ring = &adev->gfx.kiq[0].ring;
 
ring->me = 3;
-   ring->pipe = 1;
+   ring->pipe = adev->enable_uni_mes ? 0 : 1;
ring->queue = 0;
 
ring->adev = NULL;
@@ -1309,6 +1324,9 @@ static int mes_v12_0_hw_init(void *handle)
if (r)
goto failure;
 
+   if (adev->enable_uni_mes)
+   mes_v12_0_set_hw_resources_1(&adev->mes);
+
mes_v12_0_init_aggregated_doorbell(&adev->mes);
 
/* Enable the MES to handle doorbell ring on unmapped queue */
diff --git a/drivers/gpu/drm/amd/include/mes_v12_api_def.h 
b/drivers/gpu/drm/amd/include/mes_v12_api_def.h
index 81cc0a5540492..e3211daa9c2e4 100644
--- a/drivers/gpu/drm/amd/include/mes_v12_api_def.h
+++ b/drivers/gpu/drm/amd/include/mes_v12_api_def.h
@@ -62,6 +62,7 @@ enum MES_SCH_API_OPCODE {
MES_SCH_API_AMD_LOG = 16,
MES_SCH_API_SET_SE_MODE = 17,
MES_SCH_API_SET_GANG_SUBMIT = 18,
+   MES_SCH_API_SET_HW_RSRC_1   = 19,
 
MES_SCH_API_MAX = 0xFF
 };
@@ -252,6 +253,27 @@ union MESAPI_SET_HW_RESOURCES {
uint32_t max_dwords_in_api[API_FRAME_SIZE_IN_DWORDS];
 };
 
+union MESAPI_SET_HW_RESOURCES_1 {
+   struct {
+   union MES_API_HEADERheader;
+   struct MES_API_STATUS   api_status;
+   uint64_ttimestamp;
+   union {
+   struct {
+   uint32_t enable_mes_debug_ctx : 1;
+   uint32_t reserved : 31;
+   };
+   uint32_t uint32_all;
+   };
+   uint64_tmes_debug_ctx_mc_addr;
+   uint32_tmes_debug_ctx_size;
+   /* unit is 100ms */
+   uint32_tmes_kiq_unmap_timeout;
+   };
+
+   uint32_t max_dwords_in_api[API_FRAME_SIZE_IN_DWORDS];
+};
+
 union MESAPI__ADD_QUEUE {
struct {
union MES_API_HEADERheader;
-- 
2.44.0



[PATCH 03/11] drm/amdgpu/mes12: enable uni_mes fw on mes pipe0

2024-04-29 Thread Alex Deucher
From: Jack Xiao 

Enable the unified mes firmware on mes pipe0.

Signed-off-by: Jack Xiao 
Reviewed-by: Hawking Zhang 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/mes_v12_0.c | 51 +++---
 1 file changed, 38 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
index e9c963ac452ac..b60ed178114e9 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
@@ -39,6 +39,7 @@ MODULE_FIRMWARE("amdgpu/gc_12_0_1_mes.bin");
 MODULE_FIRMWARE("amdgpu/gc_12_0_1_mes1.bin");
 MODULE_FIRMWARE("amdgpu/gc_12_0_1_uni_mes.bin");
 
+static int mes_v12_0_hw_init(void *handle);
 static int mes_v12_0_hw_fini(void *handle);
 static int mes_v12_0_kiq_hw_init(struct amdgpu_device *adev);
 static int mes_v12_0_kiq_hw_fini(struct amdgpu_device *adev);
@@ -586,13 +587,13 @@ static void mes_v12_0_enable(struct amdgpu_device *adev, 
bool enable)
if (enable) {
data = RREG32_SOC15(GC, 0, regCP_MES_CNTL);
data = REG_SET_FIELD(data, CP_MES_CNTL, MES_PIPE0_RESET, 1);
-   data = REG_SET_FIELD(data, CP_MES_CNTL,
-MES_PIPE1_RESET, adev->enable_mes_kiq ? 1 : 0);
+   data = REG_SET_FIELD(data, CP_MES_CNTL, MES_PIPE1_RESET,
+  (!adev->enable_uni_mes && adev->enable_mes_kiq) ? 1 : 0);
WREG32_SOC15(GC, 0, regCP_MES_CNTL, data);
 
mutex_lock(&adev->srbm_mutex);
for (pipe = 0; pipe < AMDGPU_MAX_MES_PIPES; pipe++) {
-   if (!adev->enable_mes_kiq &&
+   if ((!adev->enable_mes_kiq || adev->enable_uni_mes) &&
pipe == AMDGPU_MES_KIQ_PIPE)
continue;
 
@@ -610,11 +611,13 @@ static void mes_v12_0_enable(struct amdgpu_device *adev, 
bool enable)
/* unhalt MES and activate pipe0 */
data = REG_SET_FIELD(0, CP_MES_CNTL, MES_PIPE0_ACTIVE, 1);
data = REG_SET_FIELD(data, CP_MES_CNTL, MES_PIPE1_ACTIVE,
-adev->enable_mes_kiq ? 1 : 0);
+  (!adev->enable_uni_mes && adev->enable_mes_kiq) ? 1 : 0);
WREG32_SOC15(GC, 0, regCP_MES_CNTL, data);
 
if (amdgpu_emu_mode)
msleep(100);
+   else if (adev->enable_uni_mes)
+   udelay(500);
else
udelay(50);
} else {
@@ -625,7 +628,7 @@ static void mes_v12_0_enable(struct amdgpu_device *adev, 
bool enable)
 MES_INVALIDATE_ICACHE, 1);
data = REG_SET_FIELD(data, CP_MES_CNTL, MES_PIPE0_RESET, 1);
data = REG_SET_FIELD(data, CP_MES_CNTL, MES_PIPE1_RESET,
-adev->enable_mes_kiq ? 1 : 0);
+  (!adev->enable_uni_mes && adev->enable_mes_kiq) ? 1 : 0);
data = REG_SET_FIELD(data, CP_MES_CNTL, MES_HALT, 1);
WREG32_SOC15(GC, 0, regCP_MES_CNTL, data);
}
@@ -640,6 +643,10 @@ static void mes_v12_0_set_ucode_start_addr(struct 
amdgpu_device *adev)
 
mutex_lock(&adev->srbm_mutex);
for (pipe = 0; pipe < AMDGPU_MAX_MES_PIPES; pipe++) {
+   if ((!adev->enable_mes_kiq || adev->enable_uni_mes) &&
+   pipe == AMDGPU_MES_KIQ_PIPE)
+   continue;
+
/* me=3, queue=0 */
soc21_grbm_select(adev, 3, pipe, 0, 0);
 
@@ -966,9 +973,13 @@ static int mes_v12_0_queue_init(struct amdgpu_device *adev,
return r;
 
if (pipe == AMDGPU_MES_SCHED_PIPE) {
-   r = mes_v12_0_kiq_enable_queue(adev);
-   if (r)
-   return r;
+   if (adev->enable_uni_mes) {
+   mes_v12_0_queue_init_register(ring);
+   } else {
+   r = mes_v12_0_kiq_enable_queue(adev);
+   if (r)
+   return r;
+   }
} else {
mes_v12_0_queue_init_register(ring);
}
@@ -1202,6 +1213,11 @@ static int mes_v12_0_kiq_hw_init(struct amdgpu_device 
*adev)
 {
int r = 0;
 
+   mes_v12_0_kiq_setting(&adev->gfx.kiq[0].ring);
+
+   if (adev->enable_uni_mes)
+   return mes_v12_0_hw_init(adev);
+
if (adev->firmware.load_type == AMDGPU_FW_LOAD_DIRECT) {
 
r = mes_v12_0_load_microcode(adev, AMDGPU_MES_SCHED_PIPE, 
false);
@@ -1223,8 +1239,6 @@ static int mes_v12_0_kiq_hw_init(struct amdgpu_device 
*adev)
 
mes_v12_0_enable(adev, true);
 
-   mes_v12_0_kiq_setting(&adev->gfx.kiq[0].ring);
-
r = mes_v12_0_queue_init(adev, AMDGPU_MES_KIQ_PIPE);
if (r)
goto failure;
@@ -1238,7 +1252,7 @@ static int mes_v12_0_kiq_hw_init(struct amdgpu_device 
*adev)
 
 static int 

[PATCH 01/11] drm/amdgpu/mes: add uni_mes fw loading support

2024-04-29 Thread Alex Deucher
From: Jack Xiao 

Add the unified mes firmware loading support.

Signed-off-by: Jack Xiao 
Reviewed-by: Hawking Zhang 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 5 -
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index f87d53e183c3d..c4355d72df02e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1046,6 +1046,7 @@ struct amdgpu_device {
/* mes */
boolenable_mes;
boolenable_mes_kiq;
+   boolenable_uni_mes;
struct amdgpu_mes   mes;
struct amdgpu_mqd   mqds[AMDGPU_HW_IP_NUM];
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
index ea06f8be133e0..62edf63285667 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
@@ -1511,7 +1511,10 @@ int amdgpu_mes_init_microcode(struct amdgpu_device 
*adev, int pipe)
 
amdgpu_ucode_ip_version_decode(adev, GC_HWIP, ucode_prefix,
   sizeof(ucode_prefix));
-   if (amdgpu_ip_version(adev, GC_HWIP, 0) >= IP_VERSION(11, 0, 0) &&
+   if (adev->enable_uni_mes && pipe == AMDGPU_MES_SCHED_PIPE) {
+   snprintf(fw_name, sizeof(fw_name),
+"amdgpu/%s_uni_mes.bin", ucode_prefix);
+   } else if (amdgpu_ip_version(adev, GC_HWIP, 0) >= IP_VERSION(11, 0, 0) 
&&
amdgpu_ip_version(adev, GC_HWIP, 0) < IP_VERSION(12, 0, 0)) {
snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mes%s.bin",
 ucode_prefix,
-- 
2.44.0
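
The name selection added to amdgpu_mes_init_microcode() can be sketched in isolation. The helper below is hypothetical, not driver code. It assumes the non-unified images are named as in the MODULE_FIRMWARE() entries for GC 12 (no suffix for the scheduler pipe, "1" for the KIQ pipe), and it ignores the separate GC 11 naming branch, which is cut off in the diff above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for AMDGPU_MES_SCHED_PIPE and AMDGPU_MES_KIQ_PIPE. */
enum { SCHED_PIPE = 0, KIQ_PIPE = 1 };

/*
 * Build the firmware file name for one MES pipe.
 * The unified image is only ever requested for the scheduler pipe;
 * every other combination falls through to the per-pipe image.
 */
static int mes_fw_name(char *buf, size_t len, const char *prefix,
		       bool uni_mes, int pipe)
{
	assert(buf != NULL && prefix != NULL);

	if (uni_mes && pipe == SCHED_PIPE)
		return snprintf(buf, len, "amdgpu/%s_uni_mes.bin", prefix);

	return snprintf(buf, len, "amdgpu/%s_mes%s.bin", prefix,
			pipe == SCHED_PIPE ? "" : "1");
}
```

Called with the prefix "gc_12_0_0", this would yield "amdgpu/gc_12_0_0_uni_mes.bin" when the unified flag is set for the scheduler pipe, and "amdgpu/gc_12_0_0_mes.bin" or "amdgpu/gc_12_0_0_mes1.bin" otherwise.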



[PATCH 02/11] drm/amdgpu/mes12: add uni_mes fw loading support

2024-04-29 Thread Alex Deucher
From: Jack Xiao 

Add the unified mes firmware loading support.

Signed-off-by: Jack Xiao 
Reviewed-by: Hawking Zhang 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/mes_v12_0.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
index e92478b1f298f..e9c963ac452ac 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
@@ -34,8 +34,10 @@
 
 MODULE_FIRMWARE("amdgpu/gc_12_0_0_mes.bin");
 MODULE_FIRMWARE("amdgpu/gc_12_0_0_mes1.bin");
+MODULE_FIRMWARE("amdgpu/gc_12_0_0_uni_mes.bin");
 MODULE_FIRMWARE("amdgpu/gc_12_0_1_mes.bin");
 MODULE_FIRMWARE("amdgpu/gc_12_0_1_mes1.bin");
+MODULE_FIRMWARE("amdgpu/gc_12_0_1_uni_mes.bin");
 
 static int mes_v12_0_hw_fini(void *handle);
 static int mes_v12_0_kiq_hw_init(struct amdgpu_device *adev);
@@ -1331,6 +1333,14 @@ static int mes_v12_0_early_init(void *handle)
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
int pipe, r;
 
+   if (adev->enable_uni_mes) {
+   r = amdgpu_mes_init_microcode(adev, AMDGPU_MES_SCHED_PIPE);
+   if (!r)
+   return 0;
+
+   adev->enable_uni_mes = false;
+   }
+
for (pipe = 0; pipe < AMDGPU_MAX_MES_PIPES; pipe++) {
if (!adev->enable_mes_kiq && pipe == AMDGPU_MES_KIQ_PIPE)
continue;
-- 
2.44.0
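
The early_init change tries the unified image first and falls back to the per-pipe images if that load fails. The sketch below models only that control flow. The struct, the uni_fw_present field and load_fw() are hypothetical test scaffolding standing in for struct amdgpu_device and amdgpu_mes_init_microcode().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_PIPES 2
enum { SCHED_PIPE = 0, KIQ_PIPE = 1 };

struct dev {
	bool enable_uni_mes;
	bool enable_mes_kiq;
	bool uni_fw_present;      /* hypothetical: simulates firmware availability */
	int  loaded[NUM_PIPES];   /* successful loads per pipe */
};

/* Hypothetical loader: fails if the unified image is requested but absent. */
static int load_fw(struct dev *d, int pipe)
{
	if (d->enable_uni_mes && pipe == SCHED_PIPE && !d->uni_fw_present)
		return -1;
	d->loaded[pipe]++;
	return 0;
}

static int early_init(struct dev *d)
{
	int pipe, r;

	assert(d != NULL);

	if (d->enable_uni_mes) {
		r = load_fw(d, SCHED_PIPE);
		if (!r)
			return 0;            /* unified image covers everything */
		d->enable_uni_mes = false;   /* fall back to per-pipe images */
	}

	for (pipe = 0; pipe < NUM_PIPES; pipe++) {
		if (!d->enable_mes_kiq && pipe == KIQ_PIPE)
			continue;
		r = load_fw(d, pipe);
		if (r)
			return r;
	}
	return 0;
}
```

Clearing the flag before the fallback loop matters: later code that tests enable_uni_mes then sees the configuration that was actually loaded, not the one that was requested.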



[PATCH 11/14] drm/amdkfd: fix NULL ptr for debugfs mqds on GFX v12

2024-04-29 Thread Alex Deucher
From: Eric Huang 

The mqd_stride function is not implemented for gfx v12, which
causes a NULL pointer error. Add the generic function to fix it.

Signed-off-by: Eric Huang 
Reviewed-by: Harish Kasiviswanathan 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v12.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v12.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v12.c
index 4d786b5ffd130..aa900b651eb0e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v12.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v12.c
@@ -389,6 +389,7 @@ struct mqd_manager *mqd_manager_init_v12(enum KFD_MQD_TYPE 
type,
mqd->is_occupied = kfd_is_occupied_cp;
mqd->mqd_size = sizeof(struct v12_compute_mqd);
mqd->get_wave_state = get_wave_state;
+   mqd->mqd_stride = kfd_mqd_stride;
 #if defined(CONFIG_DEBUG_FS)
mqd->debugfs_show_mqd = debugfs_show_mqd;
 #endif
@@ -404,6 +405,7 @@ struct mqd_manager *mqd_manager_init_v12(enum KFD_MQD_TYPE 
type,
mqd->destroy_mqd = kfd_destroy_mqd_cp;
mqd->is_occupied = kfd_is_occupied_cp;
mqd->mqd_size = sizeof(struct v12_compute_mqd);
+   mqd->mqd_stride = kfd_mqd_stride;
 #if defined(CONFIG_DEBUG_FS)
mqd->debugfs_show_mqd = debugfs_show_mqd;
 #endif
@@ -433,6 +435,7 @@ struct mqd_manager *mqd_manager_init_v12(enum KFD_MQD_TYPE 
type,
mqd->destroy_mqd = kfd_destroy_mqd_sdma;
mqd->is_occupied = kfd_is_occupied_sdma;
mqd->mqd_size = sizeof(struct v12_sdma_mqd);
+   mqd->mqd_stride = kfd_mqd_stride;
 #if defined(CONFIG_DEBUG_FS)
mqd->debugfs_show_mqd = debugfs_show_mqd_sdma;
 #endif
-- 
2.44.0
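
The failure mode fixed here is the usual one for ops tables: a callback slot that is never assigned stays NULL and crashes the first caller. A minimal sketch follows, using a hypothetical manager struct and assuming the generic stride is simply the MQD size. The NULL guard in mqd_offset() is an illustrative choice of this sketch, not something the patch adds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct mqd_manager;
typedef uint64_t (*stride_fn)(const struct mqd_manager *mm);

/* Hypothetical, reduced version of an MQD manager ops table. */
struct mqd_manager {
	uint32_t  mqd_size;
	stride_fn mqd_stride;   /* must be set before any caller uses it */
};

/* Generic stride: consecutive MQDs are mqd_size bytes apart (assumption). */
static uint64_t generic_stride(const struct mqd_manager *mm)
{
	return mm->mqd_size;
}

static void manager_init(struct mqd_manager *mm, uint32_t size)
{
	assert(mm != NULL);
	mm->mqd_size = size;
	mm->mqd_stride = generic_stride;   /* the assignment the patch adds */
}

/* Byte offset of the idx-th MQD; an unset callback yields 0 instead of a crash. */
static uint64_t mqd_offset(const struct mqd_manager *mm, unsigned int idx)
{
	if (!mm->mqd_stride)
		return 0;
	return (uint64_t)idx * mm->mqd_stride(mm);
}
```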



[PATCH 10/14] drm/amdkfd: enable single alu ops for gfx12

2024-04-29 Thread Alex Deucher
From: Jonathan Kim 

GFX12 debugging requires setting up precise ALU operation for catching
ALU exceptions.
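
A capability gate of this kind is a bitmask test: a requested flag is refused when the matching capability bit is absent. The sketch below uses hypothetical bit assignments and a hypothetical helper name. Note that the test has to use bitwise AND, since OR-ing with a non-zero constant is always non-zero and would never reject anything.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical bit assignments, for illustration only. */
#define CAP_PRECISE_MEM     (1u << 0)
#define CAP_PRECISE_ALU     (1u << 1)
#define FLAG_SINGLE_MEM_OP  1u
#define FLAG_SINGLE_ALU_OP  2u

/* Return 0 if every requested flag is backed by a capability, else -EACCES. */
static int check_flags(uint32_t caps, uint32_t flags)
{
	if (!(caps & CAP_PRECISE_MEM) && (flags & FLAG_SINGLE_MEM_OP))
		return -EACCES;

	if (!(caps & CAP_PRECISE_ALU) && (flags & FLAG_SINGLE_ALU_OP))
		return -EACCES;

	return 0;
}
```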

Signed-off-by: Jonathan Kim 
Tested-by: Lancelot Six 
Reviewed-by: Eric Huang 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdkfd/kfd_debug.c| 15 +--
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c |  4 
 include/uapi/linux/kfd_ioctl.h|  1 +
 include/uapi/linux/kfd_sysfs.h| 19 ++-
 4 files changed, 28 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
index d889e3545120a..45b1975b149a9 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
@@ -25,6 +25,7 @@
 #include "kfd_topology.h"
 #include 
 #include 
+#include 
 
 #define MAX_WATCH_ADDRESSES4
 
@@ -497,14 +498,24 @@ int kfd_dbg_trap_set_flags(struct kfd_process *target, 
uint32_t *flags)
int i, r = 0, rewind_count = 0;
 
for (i = 0; i < target->n_pdds; i++) {
-   if (!kfd_dbg_is_per_vmid_supported(target->pdds[i]->dev) &&
+   struct kfd_topology_device *topo_dev =
+   
kfd_topology_device_by_id(target->pdds[i]->dev->id);
+   uint32_t caps = topo_dev->node_props.capability;
+
+   if (!(caps & 
HSA_CAP_TRAP_DEBUG_PRECISE_MEMORY_OPERATIONS_SUPPORTED) &&
(*flags & KFD_DBG_TRAP_FLAG_SINGLE_MEM_OP)) {
*flags = prev_flags;
return -EACCES;
}
+
+   if (!(caps & 
HSA_CAP_TRAP_DEBUG_PRECISE_ALU_OPERATIONS_SUPPORTED) &&
+   (*flags & KFD_DBG_TRAP_FLAG_SINGLE_ALU_OP)) {
+   *flags = prev_flags;
+   return -EACCES;
+   }
}
 
-   target->dbg_flags = *flags & KFD_DBG_TRAP_FLAG_SINGLE_MEM_OP;
+   target->dbg_flags = *flags;
*flags = prev_flags;
for (i = 0; i < target->n_pdds; i++) {
struct kfd_process_device *pdd = target->pdds[i];
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index c51f131eaa2fb..11857869afb9e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -1927,6 +1927,10 @@ static void kfd_topology_set_capabilities(struct 
kfd_topology_device *dev)
if (KFD_GC_VERSION(dev->gpu) >= IP_VERSION(11, 0, 0))
dev->node_props.capability |=

HSA_CAP_TRAP_DEBUG_PRECISE_MEMORY_OPERATIONS_SUPPORTED;
+
+   if (KFD_GC_VERSION(dev->gpu) >= IP_VERSION(12, 0, 0))
+   dev->node_props.capability |=
+   
HSA_CAP_TRAP_DEBUG_PRECISE_ALU_OPERATIONS_SUPPORTED;
}
 
kfd_topology_set_dbg_firmware_support(dev);
diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index d09c4a18e5713..43fb0f4c42262 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -853,6 +853,7 @@ enum kfd_dbg_trap_address_watch_mode {
 /* Additional wave settings */
 enum kfd_dbg_trap_flags {
KFD_DBG_TRAP_FLAG_SINGLE_MEM_OP = 1,
+   KFD_DBG_TRAP_FLAG_SINGLE_ALU_OP = 2,
 };
 
 /* Trap exceptions */
diff --git a/include/uapi/linux/kfd_sysfs.h b/include/uapi/linux/kfd_sysfs.h
index a51b7331e0b4b..5e8d28617efad 100644
--- a/include/uapi/linux/kfd_sysfs.h
+++ b/include/uapi/linux/kfd_sysfs.h
@@ -51,15 +51,16 @@
 /* Old buggy user mode depends on this being 0 */
 #define HSA_CAP_RESERVED_WAS_SRAM_EDCSUPPORTED 0x0008
 
-#define HSA_CAP_MEM_EDCSUPPORTED   0x0010
-#define HSA_CAP_RASEVENTNOTIFY 0x0020
-#define HSA_CAP_ASIC_REVISION_MASK 0x03c0
-#define HSA_CAP_ASIC_REVISION_SHIFT22
-#define HSA_CAP_SRAM_EDCSUPPORTED  0x0400
-#define HSA_CAP_SVMAPI_SUPPORTED   0x0800
-#define HSA_CAP_FLAGS_COHERENTHOSTACCESS   0x1000
-#define HSA_CAP_TRAP_DEBUG_FIRMWARE_SUPPORTED   0x2000
-#define HSA_CAP_RESERVED   0xe00f8000
+#define HSA_CAP_MEM_EDCSUPPORTED   0x0010
+#define HSA_CAP_RASEVENTNOTIFY 0x0020
+#define HSA_CAP_ASIC_REVISION_MASK 0x03c0
+#define HSA_CAP_ASIC_REVISION_SHIFT22
+#define HSA_CAP_SRAM_EDCSUPPORTED  0x0400
+#define HSA_CAP_SVMAPI_SUPPORTED   0x0800
+#define HSA_CAP_FLAGS_COHERENTHOSTACCESS   0x1000
+#define HSA_CAP_TRAP_DEBUG_FIRMWARE_SUPPORTED  0x2000
+#define HSA_CAP_TRAP_DEBUG_PRECISE_ALU_OPERATIONS_SUPPORTED0x4000
+#define HSA_CAP_RESERVED   0x800f8000
 
 /* debug_prop bits in node properties */
 #define 

[PATCH 05/14] drm/amdkfd: save and restore barrier state for gfx12

2024-04-29 Thread Alex Deucher
From: Lancelot SIX 

Add support to save and restore the work group barrier state in gfx12
CWSR trap handler.

There is no support to directly restore the signal count of a barrier
state, so instead this patch repeatedly calls s_barrier_signal to
increment the signal count to the desired value.

In this patch, I have implemented the logic to restore the barrier at
the end of the block restoring the HWREGs.  This process needs to be
done by exactly 1 wave per work group.  To achieve this, the initial
value of s_restore_spi_init_hi (containing a FIRST_WAVE bit) needs to be
saved up until that point.  An alternative could be to restore the barrier
earlier in the process (around when LDS is restored, as the same wave
does both).  Doing this would break the pattern that the restore
procedure follows the CWSR area layout.

Before restoring the barrier, this patch checks if the barrier whose
state was saved has the "valid" bit set, even though I don't think this
barrier can be in an invalid state during context save.  I expect this
test to always be true.
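
The restore strategy described above can be modelled in plain C. This is a host-side sketch of the logic only, not trap handler code. The structs are hypothetical, barrier_signal() stands in for s_barrier_signal, and the barrier is assumed to start from a signal count of zero when the work group is restored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the hardware barrier. */
struct barrier {
	uint32_t signal_count;
};

/* Hypothetical model of what the save path recorded. */
struct saved_state {
	bool     valid;
	uint32_t signal_count;
};

/* Stand-in for s_barrier_signal: the count can only be moved by +1. */
static void barrier_signal(struct barrier *b)
{
	b->signal_count++;
}

/*
 * Rebuild the saved count by signalling repeatedly. Only the first
 * wave of the work group does this, so the count is restored once
 * rather than once per wave.
 */
static void restore_barrier(struct barrier *b, const struct saved_state *s,
			    bool first_wave)
{
	uint32_t i;

	assert(b != NULL && s != NULL);

	if (!first_wave || !s->valid)
		return;

	for (i = 0; i < s->signal_count; i++)
		barrier_signal(b);
}
```

If every wave ran the loop, a work group of N waves would end up with N times the saved count, which is why the FIRST_WAVE bit has to survive until this point.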

Signed-off-by: Lancelot SIX 
Reviewed-by: Jay Cornwall 
Signed-off-by: Alex Deucher 
---
 .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 312 +-
 .../amd/amdkfd/cwsr_trap_handler_gfx10.asm|  43 +++
 2 files changed, 205 insertions(+), 150 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h 
b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
index 2dd14f26d2f88..b539ac814b867 100644
--- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
+++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
@@ -3647,7 +3647,7 @@ static const uint32_t cwsr_trap_gfx9_4_3_hex[] = {
 };
 
 static const uint32_t cwsr_trap_gfx12_hex[] = {
-   0xbfa1, 0xbfa0023b,
+   0xbfa1, 0xbfa00240,
0xb0804009, 0xb8f8f804,
0x9178ff78, 0x8c00,
0xb8fbf811, 0x8b6eff78,
@@ -3781,21 +3781,57 @@ static const uint32_t cwsr_trap_gfx12_hex[] = {
0xfa71, 0x807d817d,
0xb8faf802, 0xd7610002,
0xfa7a, 0x807d817d,
-   0xbefe00ff, 0x,
-   0xbeff0080, 0xc4068070,
+   0xbefa50c1, 0xbfc7,
+   0xd7610002, 0xfa7a,
+   0x807d817d, 0xbefe00ff,
+   0x, 0xbeff0080,
+   0xc4068070, 0x008ce802,
+   0x, 0xbefe00c1,
+   0xb8f03b05, 0x80708170,
+   0xbf0d9973, 0xbfa20002,
+   0x84708970, 0xbfa1,
+   0x84708a70, 0xb8fa1e06,
+   0x847a8a7a, 0x80707a70,
+   0xbef600ff, 0x0100,
+   0xbef90080, 0xbefd0080,
+   0xbf80, 0xbe804100,
+   0xbe824102, 0xbe844104,
+   0xbe864106, 0xbe884108,
+   0xbe8a410a, 0xbe8c410c,
+   0xbe8e410e, 0xd7610002,
+   0xf200, 0x80798179,
+   0xd7610002, 0xf201,
+   0x80798179, 0xd7610002,
+   0xf202, 0x80798179,
+   0xd7610002, 0xf203,
+   0x80798179, 0xd7610002,
+   0xf204, 0x80798179,
+   0xd7610002, 0xf205,
+   0x80798179, 0xd7610002,
+   0xf206, 0x80798179,
+   0xd7610002, 0xf207,
+   0x80798179, 0xd7610002,
+   0xf208, 0x80798179,
+   0xd7610002, 0xf209,
+   0x80798179, 0xd7610002,
+   0xf20a, 0x80798179,
+   0xd7610002, 0xf20b,
+   0x80798179, 0xd7610002,
+   0xf20c, 0x80798179,
+   0xd7610002, 0xf20d,
+   0x80798179, 0xd7610002,
+   0xf20e, 0x80798179,
+   0xd7610002, 0xf20f,
+   0x80798179, 0xbf06a079,
+   0xbfa10007, 0xc4068070,
0x008ce802, 0x,
-   0xbefe00c1, 0xb8f03b05,
-   0x80708170, 0xbf0d9973,
-   0xbfa20002, 0x84708970,
-   0xbfa1, 0x84708a70,
-   0xb8fa1e06, 0x847a8a7a,
-   0x80707a70, 0xbef600ff,
-   0x0100, 0xbef90080,
-   0xbefd0080, 0xbf80,
+   0x8070ff70, 0x0080,
+   0xbef90080, 0x7e040280,
+   0x807d907d, 0xbf0aff7d,
+   0x0060, 0xbfa2ffbb,
0xbe804100, 0xbe824102,
0xbe844104, 0xbe864106,
0xbe884108, 0xbe8a410a,
-   0xbe8c410c, 0xbe8e410e,
0xd7610002, 0xf200,
0x80798179, 0xd7610002,
0xf201, 0x80798179,
@@ -3814,130 +3850,97 @@ static const uint32_t cwsr_trap_gfx12_hex[] = {
0xd7610002, 0xf20a,
0x80798179, 0xd7610002,
0xf20b, 0x80798179,
-   0xd7610002, 0xf20c,
-   0x80798179, 0xd7610002,
-   0xf20d, 0x80798179,
-   0xd7610002, 0xf20e,
-   0x80798179, 0xd7610002,
-   0xf20f, 0x80798179,
-   0xbf06a079, 0xbfa10007,
0xc4068070, 0x008ce802,
-   0x, 0x8070ff70,
-   0x0080, 0xbef90080,
-   0x7e040280, 0x807d907d,
-   0xbf0aff7d, 0x0060,
-   0xbfa2ffbb, 0xbe804100,
-   0xbe824102, 0xbe844104,
-   0xbe864106, 0xbe884108,
-   0xbe8a410a, 0xd7610002,
-   0xf200, 0x80798179,
-   0xd7610002, 0xf201,
-   0x80798179, 0xd7610002,
-   0xf202, 0x80798179,
-   0xd7610002, 0xf203,
-   0x80798179, 

[PATCH 14/14] drm/amdkfd: mark GFX12 system and peer GPU memory mappings as MTYPE_NC

2024-04-29 Thread Alex Deucher
From: Sreekant Somasekharan 

Due to a HW bug, the system memory mappings and peer GPU mappings
on GFX12 need to be marked as MTYPE_NC.

Cc: Joe Greathouse 
Cc: David Belanger 
Signed-off-by: Rajneesh Bhardwaj 
Signed-off-by: Sreekant Somasekharan 
Reviewed-by: Harish Kasiviswanathan 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c | 9 +
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c   | 9 +
 2 files changed, 18 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c
index c24f5bd3e09ce..3e6676fdc1875 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c
@@ -497,6 +497,10 @@ static void gmc_v12_0_get_vm_pte(struct amdgpu_device 
*adev,
 uint64_t *flags)
 {
struct amdgpu_bo *bo = mapping->bo_va->base.bo;
+   struct amdgpu_device *bo_adev = amdgpu_ttm_adev(bo->tbo.bdev);
+   bool coherent = bo->flags & AMDGPU_GEM_CREATE_COHERENT;
+   bool is_system = bo->tbo.resource->mem_type == TTM_PL_SYSTEM;
+
 
*flags &= ~AMDGPU_PTE_EXECUTABLE;
*flags |= mapping->flags & AMDGPU_PTE_EXECUTABLE;
@@ -515,6 +519,11 @@ static void gmc_v12_0_get_vm_pte(struct amdgpu_device 
*adev,
   AMDGPU_GEM_CREATE_UNCACHED))
*flags = (*flags & ~AMDGPU_PTE_MTYPE_GFX12_MASK) |
 AMDGPU_PTE_MTYPE_GFX12(MTYPE_UC);
+
+   /* WA for HW bug */
+   if ((bo && is_system) || ((bo_adev != adev) && coherent))
+   *flags |= AMDGPU_PTE_MTYPE_GFX12(MTYPE_NC);
+
 }
 
 static unsigned gmc_v12_0_get_vbios_fb_size(struct amdgpu_device *adev)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index fc5ede17f7c22..db90795e6245a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1248,6 +1248,15 @@ svm_range_get_pte_flags(struct kfd_node *node,
mapping_flags |= AMDGPU_VM_MTYPE_UC;
}
break;
+   case IP_VERSION(12, 0, 0):
+   if (domain == SVM_RANGE_VRAM_DOMAIN) {
+   if (bo_node != node)
+   mapping_flags |= AMDGPU_VM_MTYPE_NC;
+   } else {
+   mapping_flags |= coherent ?
+   AMDGPU_VM_MTYPE_UC : AMDGPU_VM_MTYPE_NC;
+   }
+   break;
default:
mapping_flags |= coherent ?
AMDGPU_VM_MTYPE_UC : AMDGPU_VM_MTYPE_NC;
-- 
2.44.0



[PATCH 09/14] drm/amdkfd: fix support for trap on wave start and end for gfx12

2024-04-29 Thread Alex Deucher
From: Jonathan Kim 

Similar to GFX11, GFX12 supports trapping on wave start and end.

Signed-off-by: Jonathan Kim 
Signed-off-by: Alex Deucher 
---
 .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v12.c| 48 +--
 1 file changed, 43 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v12.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v12.c
index efb4bed2d900a..0dfe7093bd8a0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v12.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v12.c
@@ -224,7 +224,10 @@ static int 
kgd_gfx_v12_validate_trap_override_request(struct amdgpu_device *adev
KFD_DBG_TRAP_MASK_FP_INEXACT |
KFD_DBG_TRAP_MASK_INT_DIVIDE_BY_ZERO |
KFD_DBG_TRAP_MASK_DBG_ADDRESS_WATCH |
-   KFD_DBG_TRAP_MASK_DBG_MEMORY_VIOLATION;
+   KFD_DBG_TRAP_MASK_DBG_MEMORY_VIOLATION |
+   KFD_DBG_TRAP_MASK_TRAP_ON_WAVE_START |
+   KFD_DBG_TRAP_MASK_TRAP_ON_WAVE_END;
+
 
if (trap_override != KFD_DBG_TRAP_OVERRIDE_OR &&
trap_override != KFD_DBG_TRAP_OVERRIDE_REPLACE)
@@ -233,6 +236,41 @@ static int 
kgd_gfx_v12_validate_trap_override_request(struct amdgpu_device *adev
return 0;
 }
 
+static uint32_t trap_mask_map_sw_to_hw(uint32_t mask)
+{
+   uint32_t trap_on_start = (mask & KFD_DBG_TRAP_MASK_TRAP_ON_WAVE_START) 
? 1 : 0;
+   uint32_t trap_on_end = (mask & KFD_DBG_TRAP_MASK_TRAP_ON_WAVE_END) ? 1 
: 0;
+   uint32_t excp_en = mask & (KFD_DBG_TRAP_MASK_FP_INVALID |
+   KFD_DBG_TRAP_MASK_FP_INPUT_DENORMAL |
+   KFD_DBG_TRAP_MASK_FP_DIVIDE_BY_ZERO |
+   KFD_DBG_TRAP_MASK_FP_OVERFLOW |
+   KFD_DBG_TRAP_MASK_FP_UNDERFLOW |
+   KFD_DBG_TRAP_MASK_FP_INEXACT |
+   KFD_DBG_TRAP_MASK_INT_DIVIDE_BY_ZERO |
+   KFD_DBG_TRAP_MASK_DBG_ADDRESS_WATCH |
+   KFD_DBG_TRAP_MASK_DBG_MEMORY_VIOLATION);
+   uint32_t ret;
+
+   ret = REG_SET_FIELD(0, SPI_GDBG_PER_VMID_CNTL, EXCP_EN, excp_en);
+   ret = REG_SET_FIELD(ret, SPI_GDBG_PER_VMID_CNTL, TRAP_ON_START, 
trap_on_start);
+   ret = REG_SET_FIELD(ret, SPI_GDBG_PER_VMID_CNTL, TRAP_ON_END, 
trap_on_end);
+
+   return ret;
+}
+
+static uint32_t trap_mask_map_hw_to_sw(uint32_t mask)
+{
+   uint32_t ret = REG_GET_FIELD(mask, SPI_GDBG_PER_VMID_CNTL, EXCP_EN);
+
+   if (REG_GET_FIELD(mask, SPI_GDBG_PER_VMID_CNTL, TRAP_ON_START))
+   ret |= KFD_DBG_TRAP_MASK_TRAP_ON_WAVE_START;
+
+   if (REG_GET_FIELD(mask, SPI_GDBG_PER_VMID_CNTL, TRAP_ON_END))
+   ret |= KFD_DBG_TRAP_MASK_TRAP_ON_WAVE_END;
+
+   return ret;
+}
+
 /* returns TRAP_EN, EXCP_EN and EXCP_REPLACE. */
 static uint32_t kgd_gfx_v12_set_wave_launch_trap_override(struct amdgpu_device 
*adev,
uint32_t vmid,
@@ -245,12 +283,12 @@ static uint32_t 
kgd_gfx_v12_set_wave_launch_trap_override(struct amdgpu_device *
 {
uint32_t data = 0;
 
-   *trap_mask_prev = REG_GET_FIELD(kfd_dbg_trap_cntl_prev, 
SPI_GDBG_PER_VMID_CNTL, EXCP_EN);
-   trap_mask_bits = (trap_mask_bits & trap_mask_request) |
-   (*trap_mask_prev & ~trap_mask_request);
+   *trap_mask_prev = trap_mask_map_hw_to_sw(kfd_dbg_trap_cntl_prev);
+
+   data = (trap_mask_bits & trap_mask_request) | (*trap_mask_prev & 
~trap_mask_request);
+   data = trap_mask_map_sw_to_hw(data);
 
data = REG_SET_FIELD(data, SPI_GDBG_PER_VMID_CNTL, TRAP_EN, 1);
-   data = REG_SET_FIELD(data, SPI_GDBG_PER_VMID_CNTL, EXCP_EN, 
trap_mask_bits);
data = REG_SET_FIELD(data, SPI_GDBG_PER_VMID_CNTL, EXCP_REPLACE, 
trap_override);
 
return data;
-- 
2.44.0



[PATCH 12/14] drm/amdkfd: Enable atomic support for GFX12

2024-04-29 Thread Alex Deucher
From: David Belanger 

Enable flag in KFD and set the atomic support bit in MQD.

Signed-off-by: David Belanger 
Reviewed-by: Harish Kasiviswanathan 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdkfd/kfd_device.c  | 2 ++
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v12.c | 3 +++
 2 files changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 093987b1e373e..5e5c6acb08bea 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -230,6 +230,8 @@ static void kfd_device_info_init(struct kfd_dev *kfd,
 */
kfd->device_info.needs_pci_atomics = true;
kfd->device_info.no_atomic_fw_version = 
kfd->adev->gfx.rs64_enable ? 509 : 0;
+   } else {
+   kfd->device_info.needs_pci_atomics = true;
}
} else {
kfd->device_info.doorbell_size = 4;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v12.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v12.c
index aa900b651eb0e..b7a08e7a44234 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v12.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v12.c
@@ -135,6 +135,9 @@ static void init_mqd(struct mqd_manager *mm, void **mqd,
 */
m->cp_hqd_hq_status0 = 1 << 14;
 
+   if (amdgpu_amdkfd_have_atomics_support(mm->dev->adev))
+   m->cp_hqd_hq_status0 |= 1 << 29;
+
if (q->format == KFD_QUEUE_FORMAT_AQL) {
m->cp_hqd_aql_control =
1 << CP_HQD_AQL_CONTROL__CONTROL0__SHIFT;
-- 
2.44.0



[PATCH 07/14] drm/amdkfd: Enable GFX12 trap handler

2024-04-29 Thread Alex Deucher
From: David Belanger 

Updated switch statement to use GFX12 trap handler.

Signed-off-by: David Belanger 
Reviewed-by: Harish Kasiviswanathan 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdkfd/kfd_device.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 1e21908e972fa..093987b1e373e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -525,10 +525,9 @@ static void kfd_cwsr_init(struct kfd_dev *kfd)
kfd->cwsr_isa = cwsr_trap_gfx11_hex;
kfd->cwsr_isa_size = sizeof(cwsr_trap_gfx11_hex);
} else {
-   /* GFX12_TODO: Change to gfx12 struct when available. */
-   BUILD_BUG_ON(sizeof(cwsr_trap_gfx11_hex) > PAGE_SIZE);
-   kfd->cwsr_isa = cwsr_trap_gfx11_hex;
-   kfd->cwsr_isa_size = sizeof(cwsr_trap_gfx11_hex);
+   BUILD_BUG_ON(sizeof(cwsr_trap_gfx12_hex) > PAGE_SIZE);
+   kfd->cwsr_isa = cwsr_trap_gfx12_hex;
+   kfd->cwsr_isa_size = sizeof(cwsr_trap_gfx12_hex);
}
 
kfd->cwsr_enabled = true;
-- 
2.44.0
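
The BUILD_BUG_ON() guard makes the build fail if the trap handler image no longer fits in one page. Outside the kernel the same idea can be expressed with C11 _Static_assert. The sketch below uses placeholder arrays and an assumed 4096-byte page; neither the array contents nor the names are taken from cwsr_trap_handler.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_ASSUMED 4096u   /* assumption; the kernel's PAGE_SIZE is arch-dependent */

/* Placeholder words only, not real trap handler code. */
static const uint32_t trap_gfx11_hex[] = { 0x11111111u, 0x22222222u };
static const uint32_t trap_gfx12_hex[] = { 0x11111111u, 0x22222222u, 0x33333333u };

/* Compile-time size guards, playing the role of BUILD_BUG_ON(). */
_Static_assert(sizeof(trap_gfx11_hex) <= PAGE_SIZE_ASSUMED,
	       "gfx11 trap handler must fit in one page");
_Static_assert(sizeof(trap_gfx12_hex) <= PAGE_SIZE_ASSUMED,
	       "gfx12 trap handler must fit in one page");

struct cwsr {
	const void *isa;
	size_t      isa_size;
};

/* Pick the image for a GFX major version; 12 and later get their own image. */
static void cwsr_select(struct cwsr *c, unsigned int gfx_major)
{
	assert(c != NULL);

	if (gfx_major >= 12) {
		c->isa = trap_gfx12_hex;
		c->isa_size = sizeof(trap_gfx12_hex);
	} else {
		c->isa = trap_gfx11_hex;
		c->isa_size = sizeof(trap_gfx11_hex);
	}
}
```

Keeping the pointer and sizeof() on the same array in each branch avoids the mismatch where one is updated and the other still refers to the older image.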



[PATCH 13/14] drm/amd/amdkfd: Add GFX12 PTE flag to SVM get PTE function

2024-04-29 Thread Alex Deucher
From: Sreekant Somasekharan 

Add new GFX12 PTE flag AMDGPU_PTE_IS_PTE to svm_range_get_pte_flags
function. This resolves the issues related to SVM enablement in GFX12.

Signed-off-by: Sreekant Somasekharan 
Reviewed-by: Felix Kuehling 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 386875e6eb96b..fc5ede17f7c22 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1263,6 +1263,8 @@ svm_range_get_pte_flags(struct kfd_node *node,
pte_flags = AMDGPU_PTE_VALID;
pte_flags |= (domain == SVM_RANGE_VRAM_DOMAIN) ? 0 : AMDGPU_PTE_SYSTEM;
pte_flags |= snoop ? AMDGPU_PTE_SNOOPED : 0;
+   if (KFD_GC_VERSION(node) >= IP_VERSION(12, 0, 0))
+   pte_flags |= AMDGPU_PTE_IS_PTE;
 
pte_flags |= amdgpu_gem_va_map_flags(node->adev, mapping_flags);
return pte_flags;
-- 
2.44.0



[PATCH 01/14] drm/amdkfd: Added device queue manager files for GFX12.

2024-04-29 Thread Alex Deucher
From: David Belanger 

Initial implementation, based on GFX11.

v2: squash in include fix from David (Alex)

Signed-off-by: David Belanger 
Reviewed-by: Harish Kasiviswanathan 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdkfd/Makefile   |  1 +
 .../drm/amd/amdkfd/kfd_device_queue_manager.c |  4 +-
 .../drm/amd/amdkfd/kfd_device_queue_manager.h |  2 +
 .../amd/amdkfd/kfd_device_queue_manager_v12.c | 81 +++
 4 files changed, 87 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager_v12.c

diff --git a/drivers/gpu/drm/amd/amdkfd/Makefile 
b/drivers/gpu/drm/amd/amdkfd/Makefile
index f0d9eebf242b6..0d3d8972240da 100644
--- a/drivers/gpu/drm/amd/amdkfd/Makefile
+++ b/drivers/gpu/drm/amd/amdkfd/Makefile
@@ -50,6 +50,7 @@ AMDKFD_FILES  := $(AMDKFD_PATH)/kfd_module.o \
$(AMDKFD_PATH)/kfd_device_queue_manager_v9.o \
$(AMDKFD_PATH)/kfd_device_queue_manager_v10.o \
$(AMDKFD_PATH)/kfd_device_queue_manager_v11.o \
+   $(AMDKFD_PATH)/kfd_device_queue_manager_v12.o \
$(AMDKFD_PATH)/kfd_interrupt.o \
$(AMDKFD_PATH)/kfd_events.o \
$(AMDKFD_PATH)/cik_event_interrupt.o \
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index c08b6ee252898..4721b2fccd068 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -2597,7 +2597,9 @@ struct device_queue_manager 
*device_queue_manager_init(struct kfd_node *dev)
break;
 
default:
-   if (KFD_GC_VERSION(dev) >= IP_VERSION(11, 0, 0))
+   if (KFD_GC_VERSION(dev) >= IP_VERSION(12, 0, 0))
+   device_queue_manager_init_v12(>asic_ops);
+   else if (KFD_GC_VERSION(dev) >= IP_VERSION(11, 0, 0))
device_queue_manager_init_v11(>asic_ops);
else if (KFD_GC_VERSION(dev) >= IP_VERSION(10, 1, 1))
device_queue_manager_init_v10(>asic_ops);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
index cf7e182588f80..fcc0ee67f5441 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
@@ -277,6 +277,8 @@ void device_queue_manager_init_v10(
struct device_queue_manager_asic_ops *asic_ops);
 void device_queue_manager_init_v11(
struct device_queue_manager_asic_ops *asic_ops);
+void device_queue_manager_init_v12(
+   struct device_queue_manager_asic_ops *asic_ops);
 void program_sh_mem_settings(struct device_queue_manager *dqm,
struct qcm_process_device *qpd);
 unsigned int get_cp_queues_num(struct device_queue_manager *dqm);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager_v12.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager_v12.c
new file mode 100644
index 0..4f3295b29dfb1
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager_v12.c
@@ -0,0 +1,81 @@
+// SPDX-License-Identifier: GPL-2.0 OR MIT
+/*
+ * Copyright 2023 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#include "kfd_device_queue_manager.h"
+#include "gc/gc_12_0_0_sh_mask.h"
+#include "soc24_enum.h"
+
+static int update_qpd_v12(struct device_queue_manager *dqm,
+struct qcm_process_device *qpd);
+static void init_sdma_vm_v12(struct device_queue_manager *dqm, struct queue *q,
+   struct qcm_process_device *qpd);
+
+void device_queue_manager_init_v12(
+   struct device_queue_manager_asic_ops *asic_ops)
+{
+   asic_ops->update_qpd = update_qpd_v12;
+   asic_ops->init_sdma_vm = 

[PATCH 06/14] drm/amdkfd: enable missed single-step workaround for gfx12

2024-04-29 Thread Alex Deucher
From: Laurent Morichetti 

When trap_ctrl.trap_after_inst is set, it is possible for a wave to
enter the trap handler, after single-stepping an instruction and a
save_context is raised, with only save_context set in excp_flag_priv.

Because excp_flag_priv.trap_after_inst is not reliably set, we need to
use the missed single-step workaround for gfx12 as well.

Also add wave_start and wave_end as exceptions that should be handled
by the 2nd level trap handler.

Signed-off-by: Laurent Morichetti 
Tested-by: Lancelot Six 
Reviewed-by: Jonathan Kim 
Signed-off-by: Alex Deucher 
---
 .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 783 +-
 .../amd/amdkfd/cwsr_trap_handler_gfx10.asm|  20 +-
 2 files changed, 409 insertions(+), 394 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h 
b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
index b539ac814b867..73d3772cdb76b 100644
--- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
+++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
@@ -3647,191 +3647,159 @@ static const uint32_t cwsr_trap_gfx9_4_3_hex[] = {
 };
 
 static const uint32_t cwsr_trap_gfx12_hex[] = {
-   0xbfa1, 0xbfa00240,
+   0xbfa1, 0xbfa00243,
0xb0804009, 0xb8f8f804,
0x9178ff78, 0x8c00,
0xb8fbf811, 0x8b6eff78,
0x4000, 0xbfa10008,
0x8b6eff7b, 0x0080,
-   0xbfa20015, 0x8b6ea07b,
-   0xbfa2003e, 0xbf830010,
+   0xbfa20018, 0x8b6ea07b,
+   0xbfa20041, 0xbf830010,
0xb8fbf811, 0xbfa0fffb,
-   0x8b6eff7b, 0x0050,
-   0xbfa2000d, 0xb8eef812,
+   0x8b6eff7b, 0x0bd0,
+   0xbfa20010, 0xb8eef812,
0x8b6f8f7b, 0xbfa10002,
0x8c6eff6e, 0x0080,
-   0xb8eff813, 0x8b6f6e6f,
-   0xbfa20005, 0x8b6eff6d,
-   0xf000, 0xbfa20002,
-   0x8b6ea07b, 0xbfa2002b,
-   0xbefa4d82, 0xbf89fc07,
-   0x84fa887a, 0xbf0d8f7b,
-   0xbfa10002, 0x8c7bff7b,
-   0x, 0xf4601bbd,
-   0xf810, 0xbf89fc07,
-   0x846e976e, 0x9177ff77,
-   0x0080, 0x8c776e77,
-   0xf4603bbd, 0xf800,
-   0xbf89fc07, 0xf4603ebd,
-   0xf808, 0xbf89fc07,
-   0x8bee6e6e, 0xbfa10001,
-   0xbe80486e, 0x8b6eff6d,
-   0xf000, 0xbfa20009,
-   0xb8eef811, 0x8b6eff6e,
-   0x0080, 0xbfa20007,
-   0x8c78ff78, 0x4000,
-   0x80ec886c, 0x82ed806d,
-   0xbfa2, 0x806c846c,
-   0x826d806d, 0x8b6dff6d,
-   0x, 0x8bfe7e7e,
-   0x8bea6a6a, 0xb978f804,
-   0xbe804a6c, 0x8b6dff6d,
-   0x, 0xbefa0080,
-   0xb97a0151, 0xbeee007e,
-   0xbeef007f, 0xbefe0180,
-   0xbefe4d84, 0xbf89fc07,
-   0x8b7aff7f, 0x0400,
-   0x847a857a, 0x8c6d7a6d,
-   0xbefa007e, 0x8b7bff7f,
-   0x, 0xbefe00c1,
-   0xbeff00c1, 0xee0a407a,
-   0x000c, 0x,
-   0x7e000280, 0xbefe007a,
-   0xbeff007b, 0xb8fb0742,
-   0x847b997b, 0xb8fa3b05,
-   0x807a817a, 0xbf0d997b,
-   0xbfa20002, 0x847a897a,
-   0xbfa1, 0x847a8a7a,
-   0xb8fb1e06, 0x847b8a7b,
-   0x807a7b7a, 0x8b7bff7f,
-   0x, 0x807aff7a,
-   0x0200, 0x807a7e7a,
-   0x827b807b, 0xd761,
-   0x00010870, 0xd761,
-   0x00010a71, 0xd761,
-   0x00010c72, 0xd761,
-   0x00010e73, 0xd761,
-   0x00011074, 0xd761,
-   0x00011275, 0xd761,
-   0x00011476, 0xd761,
-   0x00011677, 0xd761,
-   0x00011a79, 0xd761,
-   0x00011c7e, 0xd761,
-   0x00011e7f, 0xbefe00ff,
-   0x3fff, 0xbeff0080,
+   0xb8eff813, 0x8b6e6e6f,
+   0xbfa20008, 0x8b6eff6d,
+   0xf000, 0xbfa20005,
+   0x8b6fff6f, 0x0200,
+   0xbfa20002, 0x8b6ea07b,
+   0xbfa2002b, 0xbefa4d82,
+   0xbf89fc07, 0x84fa887a,
+   0xbf0d8f7b, 0xbfa10002,
+   0x8c7bff7b, 0x,
+   0xf4601bbd, 0xf810,
+   0xbf89fc07, 0x846e976e,
+   0x9177ff77, 0x0080,
+   0x8c776e77, 0xf4603bbd,
+   0xf800, 0xbf89fc07,
+   0xf4603ebd, 0xf808,
+   0xbf89fc07, 0x8bee6e6e,
+   0xbfa10001, 0xbe80486e,
+   0x8b6eff6d, 0xf000,
+   0xbfa20009, 0xb8eef811,
+   0x8b6eff6e, 0x0080,
+   0xbfa20007, 0x8c78ff78,
+   0x4000, 0x80ec886c,
+   0x82ed806d, 0xbfa2,
+   0x806c846c, 0x826d806d,
+   0x8b6dff6d, 0x,
+   0x8bfe7e7e, 0x8bea6a6a,
+   0xb978f804, 0xbe804a6c,
+   0x8b6dff6d, 0x,
+   0xbefa0080, 0xb97a0151,
+   0xbeee007e, 0xbeef007f,
+   0xbefe0180, 0xbefe4d84,
+   0xbf89fc07, 0x8b7aff7f,
+   0x0400, 0x847a857a,
+   0x8c6d7a6d, 0xbefa007e,
+   0x8b7bff7f, 0x,
+   0xbefe00c1, 0xbeff00c1,
0xee0a407a, 0x000c,
-   0x4000, 0xd760007a,
-   0x00011d00, 0xd760007b,
-   0x00011f00, 0xbefe007a,
-   0xbeff007b, 0xbef4007e,
-   0x8b75ff7f, 0x,
-   0x8c75ff75, 0x0004,
- 

[PATCH 08/14] drm/amdkfd: always enable ttmp setup for gfx12

2024-04-29 Thread Alex Deucher
From: Jonathan Kim 

Similar to GFX11, always enable the setup of trap temporaries on GFX12.
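
As a reading aid, the version check this patch extends can be sketched as a standalone predicate. This is an illustration only, not the driver code: the packed IP_VERSION encoding, the MES_VERSION_MASK value, the function name and the pre-GFX11 branch are assumptions, since only part of the real helper is visible in the diff below.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed encoding: major/minor/revision packed into one integer. */
#define IP_VERSION(mj, mn, rv) \
	(((uint32_t)(mj) << 16) | ((uint32_t)(mn) << 8) | (uint32_t)(rv))

/* Hypothetical mask value; the driver defines the real AMDGPU_MES_VERSION_MASK. */
#define MES_VERSION_MASK 0x00000fffu

/* Hypothetical standalone version of the "ttmps always set up" check. */
static bool has_ttmps_always_setup(uint32_t gc_version,
				   uint32_t mes_sched_version)
{
	/* GFX12 and newer: always set up, which is what this patch adds. */
	if (gc_version >= IP_VERSION(12, 0, 0))
		return true;

	/* GFX11: depends on the MES scheduler firmware version. */
	if (gc_version >= IP_VERSION(11, 0, 0))
		return (mes_sched_version & MES_VERSION_MASK) >= 70;

	/* Older parts: assumed to exclude only GC 9.4.2. */
	return gc_version != IP_VERSION(9, 4, 2);
}
```

With these assumptions, a GFX12 device passes the check regardless of the MES version, while a GFX11 device still needs scheduler version 70 or later.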

Signed-off-by: Jonathan Kim 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v12.c | 2 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c | 1 +
 drivers/gpu/drm/amd/amdkfd/kfd_debug.h | 3 ++-
 3 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v12.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v12.c
index 5aa2fd147d99d..efb4bed2d900a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v12.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v12.c
@@ -205,7 +205,7 @@ static uint32_t kgd_gfx_v12_disable_debug_trap(struct 
amdgpu_device *adev,
 {
uint32_t data = 0;
 
-   data = REG_SET_FIELD(data, SPI_GDBG_PER_VMID_CNTL, TRAP_EN, 
keep_trap_enabled);
+   data = REG_SET_FIELD(data, SPI_GDBG_PER_VMID_CNTL, TRAP_EN, 1);
data = REG_SET_FIELD(data, SPI_GDBG_PER_VMID_CNTL, EXCP_EN, 0);
data = REG_SET_FIELD(data, SPI_GDBG_PER_VMID_CNTL, EXCP_REPLACE, 0);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
index c94ed3b929cb4..51955a4e47d59 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
@@ -1446,6 +1446,7 @@ static void gfx_v12_0_init_compute_vmid(struct 
amdgpu_device *adev)
/* Enable trap for each kfd vmid. */
data = RREG32_SOC15(GC, 0, regSPI_GDBG_PER_VMID_CNTL);
data = REG_SET_FIELD(data, SPI_GDBG_PER_VMID_CNTL, TRAP_EN, 1);
+   WREG32_SOC15(GC, 0, regSPI_GDBG_PER_VMID_CNTL, data);
}
soc24_grbm_select(adev, 0, 0, 0, 0);
mutex_unlock(>srbm_mutex);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_debug.h
index fd0ff64d4184a..da9a3cb329f13 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.h
@@ -134,6 +134,7 @@ static inline bool kfd_dbg_has_ttmps_always_setup(struct 
kfd_node *dev)
KFD_GC_VERSION(dev) != IP_VERSION(9, 4, 2)) ||
   (KFD_GC_VERSION(dev) >= IP_VERSION(11, 0, 0) &&
KFD_GC_VERSION(dev) < IP_VERSION(12, 0, 0) &&
-   (dev->adev->mes.sched_version & 
AMDGPU_MES_VERSION_MASK) >= 70);
+   (dev->adev->mes.sched_version & 
AMDGPU_MES_VERSION_MASK) >= 70) ||
+  (KFD_GC_VERSION(dev) >= IP_VERSION(12, 0, 0));
 }
 #endif
-- 
2.44.0



[PATCH 04/14] drm/amdkfd: Add gfx12 trap handler support

2024-04-29 Thread Alex Deucher
From: Jay Cornwall 

- HWREG changes since gfx11
- Save/restore barrier state
- get_wave_size is now reserved by assembler

v2: rebase (Alex)

Signed-off-by: Jay Cornwall 
Reviewed-by: Harish Kasiviswanathan 
Signed-off-by: Alex Deucher 
---
 .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 465 ++
 .../amd/amdkfd/cwsr_trap_handler_gfx10.asm| 184 +--
 2 files changed, 607 insertions(+), 42 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h 
b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
index 5a0308d26b53c..2dd14f26d2f88 100644
--- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
+++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
@@ -3645,3 +3645,468 @@ static const uint32_t cwsr_trap_gfx9_4_3_hex[] = {
0xb97a0002, 0xbf8a,
0xbe801f6c, 0xbf9b,
 };
+
+static const uint32_t cwsr_trap_gfx12_hex[] = {
+   0xbfa1, 0xbfa0023b,
+   0xb0804009, 0xb8f8f804,
+   0x9178ff78, 0x8c00,
+   0xb8fbf811, 0x8b6eff78,
+   0x4000, 0xbfa10008,
+   0x8b6eff7b, 0x0080,
+   0xbfa20015, 0x8b6ea07b,
+   0xbfa2003e, 0xbf830010,
+   0xb8fbf811, 0xbfa0fffb,
+   0x8b6eff7b, 0x0050,
+   0xbfa2000d, 0xb8eef812,
+   0x8b6f8f7b, 0xbfa10002,
+   0x8c6eff6e, 0x0080,
+   0xb8eff813, 0x8b6f6e6f,
+   0xbfa20005, 0x8b6eff6d,
+   0xf000, 0xbfa20002,
+   0x8b6ea07b, 0xbfa2002b,
+   0xbefa4d82, 0xbf89fc07,
+   0x84fa887a, 0xbf0d8f7b,
+   0xbfa10002, 0x8c7bff7b,
+   0x, 0xf4601bbd,
+   0xf810, 0xbf89fc07,
+   0x846e976e, 0x9177ff77,
+   0x0080, 0x8c776e77,
+   0xf4603bbd, 0xf800,
+   0xbf89fc07, 0xf4603ebd,
+   0xf808, 0xbf89fc07,
+   0x8bee6e6e, 0xbfa10001,
+   0xbe80486e, 0x8b6eff6d,
+   0xf000, 0xbfa20009,
+   0xb8eef811, 0x8b6eff6e,
+   0x0080, 0xbfa20007,
+   0x8c78ff78, 0x4000,
+   0x80ec886c, 0x82ed806d,
+   0xbfa2, 0x806c846c,
+   0x826d806d, 0x8b6dff6d,
+   0x, 0x8bfe7e7e,
+   0x8bea6a6a, 0xb978f804,
+   0xbe804a6c, 0x8b6dff6d,
+   0x, 0xbefa0080,
+   0xb97a0151, 0xbeee007e,
+   0xbeef007f, 0xbefe0180,
+   0xbefe4d84, 0xbf89fc07,
+   0x8b7aff7f, 0x0400,
+   0x847a857a, 0x8c6d7a6d,
+   0xbefa007e, 0x8b7bff7f,
+   0x, 0xbefe00c1,
+   0xbeff00c1, 0xee0a407a,
+   0x000c, 0x,
+   0x7e000280, 0xbefe007a,
+   0xbeff007b, 0xb8fb0742,
+   0x847b997b, 0xb8fa3b05,
+   0x807a817a, 0xbf0d997b,
+   0xbfa20002, 0x847a897a,
+   0xbfa1, 0x847a8a7a,
+   0xb8fb1e06, 0x847b8a7b,
+   0x807a7b7a, 0x8b7bff7f,
+   0x, 0x807aff7a,
+   0x0200, 0x807a7e7a,
+   0x827b807b, 0xd761,
+   0x00010870, 0xd761,
+   0x00010a71, 0xd761,
+   0x00010c72, 0xd761,
+   0x00010e73, 0xd761,
+   0x00011074, 0xd761,
+   0x00011275, 0xd761,
+   0x00011476, 0xd761,
+   0x00011677, 0xd761,
+   0x00011a79, 0xd761,
+   0x00011c7e, 0xd761,
+   0x00011e7f, 0xbefe00ff,
+   0x3fff, 0xbeff0080,
+   0xee0a407a, 0x000c,
+   0x4000, 0xd760007a,
+   0x00011d00, 0xd760007b,
+   0x00011f00, 0xbefe007a,
+   0xbeff007b, 0xbef4007e,
+   0x8b75ff7f, 0x,
+   0x8c75ff75, 0x0004,
+   0xbef60080, 0xbef700ff,
+   0x10807fac, 0xbef1007d,
+   0xbef00080, 0xb8f30742,
+   0x84739973, 0xbefe00c1,
+   0x857d9973, 0x8b7d817d,
+   0xbf06817d, 0xbfa20002,
+   0xbeff0080, 0xbfa2,
+   0xbeff00c1, 0xbfac,
+   0xbef600ff, 0x0100,
+   0xc4068070, 0x008ce801,
+   0x8000, 0xc4068070,
+   0x008ce802, 0x0001,
+   0xc4068070, 0x008ce803,
+   0x00018000, 0xbfab,
+   0xbef600ff, 0x0100,
+   0xc4068070, 0x008ce801,
+   0x0001, 0xc4068070,
+   0x008ce802, 0x0002,
+   0xc4068070, 0x008ce803,
+   0x0003, 0xb8f03b05,
+   0x80708170, 0xbf0d9973,
+   0xbfa20002, 0x84708970,
+   0xbfa1, 0x84708a70,
+   0xb8fa1e06, 0x847a8a7a,
+   0x80707a70, 0x8070ff70,
+   0x0200, 0xbef600ff,
+   0x0100, 0x7e000280,
+   0x7e020280, 0x7e040280,
+   0xbefd0080, 0xbe804ec2,
+   0xbf94fffe, 0xd7610002,
+   0xfa71, 0x807d817d,
+   0xd7610002, 0xfa6c,
+   0x807d817d, 0x917aff6d,
+   0x8000, 0xd7610002,
+   0xfa7a, 0x807d817d,
+   0xd7610002, 0xfa6e,
+   0x807d817d, 0xd7610002,
+   0xfa6f, 0x807d817d,
+   0xd7610002, 0xfa78,
+   0x807d817d, 0xb8faf811,
+   0xd7610002, 0xfa7a,
+   0x807d817d, 0xd7610002,
+   0xfa7b, 0x807d817d,
+   0xb8f1f801, 0xd7610002,
+   0xfa71, 0x807d817d,
+   0xb8f1f814, 0xd7610002,
+   0xfa71, 0x807d817d,
+   0xb8f1f815, 0xd7610002,
+   0xfa71, 0x807d817d,
+   0xb8f1f812, 

[PATCH 03/14] drm/amdkfd: Move trap handler coherence flags to preprocessor

2024-04-29 Thread Alex Deucher
From: Jay Cornwall 

No functional change. Preparation for gfx12 support.

v2: drop unrelated change (Alex)

Signed-off-by: Jay Cornwall 
Reviewed-by: Harish Kasiviswanathan 
Signed-off-by: Alex Deucher 
---
 .../amd/amdkfd/cwsr_trap_handler_gfx10.asm| 127 +-
 1 file changed, 65 insertions(+), 62 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx10.asm 
b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx10.asm
index e1aaa5ce0784e..dae912688c955 100644
--- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx10.asm
+++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx10.asm
@@ -46,6 +46,9 @@
 #define SW_SA_TRAP (ASIC_FAMILY >= CHIP_PLUM_BONITO)
 #define SAVE_AFTER_XNACK_ERROR (HAVE_XNACK && !NO_SQC_STORE) // workaround for 
TCP store failure after XNACK error when ALLOW_REPLAY=0, for debugger
 
+#define S_COHERENCE glc:1
+#define V_COHERENCE slc:1 glc:1
+
 var SINGLE_STEP_MISSED_WORKAROUND  = 1 //workaround for lost 
MODE.DEBUG_EN exception when SAVECTX raised
 
 var SQ_WAVE_STATUS_SPI_PRIO_MASK   = 0x0006
@@ -298,15 +301,15 @@ L_FETCH_2ND_TRAP:
s_or_b32ttmp15, ttmp15, 0x
 L_NO_SIGN_EXTEND_TMA:
 
-   s_load_dwordttmp2, [ttmp14, ttmp15], 0x10 glc:1 
// debug trap enabled flag
+   s_load_dwordttmp2, [ttmp14, ttmp15], 0x10 S_COHERENCE   
// debug trap enabled flag
s_waitcnt   lgkmcnt(0)
s_lshl_b32  ttmp2, ttmp2, TTMP11_DEBUG_TRAP_ENABLED_SHIFT
s_andn2_b32 ttmp11, ttmp11, TTMP11_DEBUG_TRAP_ENABLED_MASK
s_or_b32ttmp11, ttmp11, ttmp2
 
-   s_load_dwordx2  [ttmp2, ttmp3], [ttmp14, ttmp15], 0x0 glc:1 
// second-level TBA
+   s_load_dwordx2  [ttmp2, ttmp3], [ttmp14, ttmp15], 0x0 S_COHERENCE   
// second-level TBA
s_waitcnt   lgkmcnt(0)
-   s_load_dwordx2  [ttmp14, ttmp15], [ttmp14, ttmp15], 0x8 glc:1   
// second-level TMA
+   s_load_dwordx2  [ttmp14, ttmp15], [ttmp14, ttmp15], 0x8 S_COHERENCE 
// second-level TMA
s_waitcnt   lgkmcnt(0)
 
s_and_b64   [ttmp2, ttmp3], [ttmp2, ttmp3], [ttmp2, ttmp3]
@@ -399,7 +402,7 @@ L_SLEEP:
s_and_b32   s_save_ttmps_hi, exec_hi, 0x
s_mov_b32   exec_lo, 0x
s_mov_b32   exec_hi, 0x
-   global_store_dword_addtid   v0, [s_save_ttmps_lo, s_save_ttmps_hi] 
slc:1 glc:1
+   global_store_dword_addtid   v0, [s_save_ttmps_lo, s_save_ttmps_hi] 
V_COHERENCE
v_mov_b32   v0, 0x0
s_mov_b32   exec_lo, s_save_ttmps_lo
s_mov_b32   exec_hi, s_save_ttmps_hi
@@ -431,15 +434,15 @@ L_SLEEP:
 
s_mov_b32   exec_lo, 0x3FFF
s_mov_b32   exec_hi, 0x0
-   global_store_dword_addtid   v0, [s_save_ttmps_lo, s_save_ttmps_hi] 
inst_offset:0x40 slc:1 glc:1
+   global_store_dword_addtid   v0, [s_save_ttmps_lo, s_save_ttmps_hi] 
inst_offset:0x40 V_COHERENCE
v_readlane_b32  ttmp14, v0, 0xE
v_readlane_b32  ttmp15, v0, 0xF
s_mov_b32   exec_lo, ttmp14
s_mov_b32   exec_hi, ttmp15
 #else
-   s_store_dwordx4 [ttmp4, ttmp5, ttmp6, ttmp7], [s_save_ttmps_lo, 
s_save_ttmps_hi], 0x50 glc:1
-   s_store_dwordx4 [ttmp8, ttmp9, ttmp10, ttmp11], [s_save_ttmps_lo, 
s_save_ttmps_hi], 0x60 glc:1
-   s_store_dword   ttmp13, [s_save_ttmps_lo, s_save_ttmps_hi], 0x74 glc:1
+   s_store_dwordx4 [ttmp4, ttmp5, ttmp6, ttmp7], [s_save_ttmps_lo, 
s_save_ttmps_hi], 0x50 S_COHERENCE
+   s_store_dwordx4 [ttmp8, ttmp9, ttmp10, ttmp11], [s_save_ttmps_lo, 
s_save_ttmps_hi], 0x60 S_COHERENCE
+   s_store_dword   ttmp13, [s_save_ttmps_lo, s_save_ttmps_hi], 0x74 
S_COHERENCE
 #endif
 
/* setup Resource Contants */
@@ -488,11 +491,11 @@ L_SAVE_FIRST_VGPRS32_WITH_TCP:
 #endif
 
 #if !NO_SQC_STORE
-   buffer_store_dword  v0, v0, s_save_buf_rsrc0, s_save_mem_offset 
slc:1 glc:1
+   buffer_store_dword  v0, v0, s_save_buf_rsrc0, s_save_mem_offset 
V_COHERENCE
 #endif
-   buffer_store_dword  v1, v0, s_save_buf_rsrc0, s_save_mem_offset 
slc:1 glc:1 offset:128
-   buffer_store_dword  v2, v0, s_save_buf_rsrc0, s_save_mem_offset 
slc:1 glc:1 offset:128*2
-   buffer_store_dword  v3, v0, s_save_buf_rsrc0, s_save_mem_offset 
slc:1 glc:1 offset:128*3
+   buffer_store_dword  v1, v0, s_save_buf_rsrc0, s_save_mem_offset 
V_COHERENCE offset:128
+   buffer_store_dword  v2, v0, s_save_buf_rsrc0, s_save_mem_offset 
V_COHERENCE offset:128*2
+   buffer_store_dword  v3, v0, s_save_buf_rsrc0, s_save_mem_offset 
V_COHERENCE offset:128*3
s_branchL_SAVE_HWREG
 
 L_SAVE_4VGPR_WAVE64:
@@ -511,11 +514,11 @@ L_SAVE_FIRST_VGPRS64_WITH_TCP:
 #endif
 
 #if !NO_SQC_STORE
-   buffer_store_dword  v0, v0, s_save_buf_rsrc0, s_save_mem_offset 
slc:1 glc:1
+   buffer_store_dword  v0, v0, s_save_buf_rsrc0, 

[PATCH 02/14] drm/amdkfd: Added gfx_v12_kfd2kgd interface for GFX12.

2024-04-29 Thread Alex Deucher
From: David Belanger 

Initial implementation, based on GFX11.

v2: Removed functions not needed by cp scheduler.
v3: Fixed typos.
v4: squash in warning fix (Alex)

Signed-off-by: David Belanger 
Acked-by: Jonathan Kim 
Reviewed-by: Harish Kasiviswanathan 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/Makefile   |   3 +-
 .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v12.c| 339 ++
 drivers/gpu/drm/amd/amdkfd/kfd_device.c   |   5 +-
 3 files changed, 344 insertions(+), 3 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v12.c

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile
index 099a47b3e0496..de7b76327f5ba 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -281,7 +281,8 @@ amdgpu-y += \
amdgpu_amdkfd_gc_9_4_3.o \
amdgpu_amdkfd_gfx_v10.o \
amdgpu_amdkfd_gfx_v10_3.o \
-   amdgpu_amdkfd_gfx_v11.o
+   amdgpu_amdkfd_gfx_v11.o \
+   amdgpu_amdkfd_gfx_v12.o
 
 ifneq ($(CONFIG_DRM_AMDGPU_CIK),)
 amdgpu-y += amdgpu_amdkfd_gfx_v7.o
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v12.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v12.c
new file mode 100644
index 0..5aa2fd147d99d
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v12.c
@@ -0,0 +1,339 @@
+/*
+ * Copyright 2023 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#include "amdgpu.h"
+#include "amdgpu_amdkfd.h"
+#include "gc/gc_12_0_0_offset.h"
+#include "gc/gc_12_0_0_sh_mask.h"
+#include "soc24.h"
+#include 
+
+static void lock_srbm(struct amdgpu_device *adev, uint32_t mec, uint32_t pipe,
+   uint32_t queue, uint32_t vmid)
+{
+   mutex_lock(>srbm_mutex);
+   soc24_grbm_select(adev, mec, pipe, queue, vmid);
+}
+
+static void unlock_srbm(struct amdgpu_device *adev)
+{
+   soc24_grbm_select(adev, 0, 0, 0, 0);
+   mutex_unlock(>srbm_mutex);
+}
+
+static void acquire_queue(struct amdgpu_device *adev, uint32_t pipe_id,
+   uint32_t queue_id)
+{
+   uint32_t mec = (pipe_id / adev->gfx.mec.num_pipe_per_mec) + 1;
+   uint32_t pipe = (pipe_id % adev->gfx.mec.num_pipe_per_mec);
+
+   lock_srbm(adev, mec, pipe, queue_id, 0);
+}
+
+static void release_queue(struct amdgpu_device *adev)
+{
+   unlock_srbm(adev);
+}
+
+static int init_interrupts_v12(struct amdgpu_device *adev, uint32_t pipe_id, 
uint32_t inst)
+{
+   uint32_t mec;
+   uint32_t pipe;
+
+   mec = (pipe_id / adev->gfx.mec.num_pipe_per_mec) + 1;
+   pipe = (pipe_id % adev->gfx.mec.num_pipe_per_mec);
+
+   lock_srbm(adev, mec, pipe, 0, 0);
+
+   WREG32_SOC15(GC, 0, regCPC_INT_CNTL,
+   CP_INT_CNTL_RING0__TIME_STAMP_INT_ENABLE_MASK |
+   CP_INT_CNTL_RING0__OPCODE_ERROR_INT_ENABLE_MASK);
+
+   unlock_srbm(adev);
+
+   return 0;
+}
+
+static uint32_t get_sdma_rlc_reg_offset(struct amdgpu_device *adev,
+   unsigned int engine_id,
+   unsigned int queue_id)
+{
+   uint32_t sdma_engine_reg_base = 0;
+   uint32_t sdma_rlc_reg_offset;
+
+   switch (engine_id) {
+   case 0:
+   sdma_engine_reg_base = SOC15_REG_OFFSET(SDMA0, 0,
+   regSDMA0_QUEUE0_RB_CNTL) - 
regSDMA0_QUEUE0_RB_CNTL;
+   break;
+   case 1:
+   sdma_engine_reg_base = SOC15_REG_OFFSET(SDMA1, 0,
+   regSDMA1_QUEUE0_RB_CNTL) - 
regSDMA0_QUEUE0_RB_CNTL;
+   break;
+   default:
+   BUG();
+   }
+
+   sdma_rlc_reg_offset = sdma_engine_reg_base
+   + queue_id * (regSDMA0_QUEUE1_RB_CNTL - 
regSDMA0_QUEUE0_RB_CNTL);
+
+   pr_debug("RLC register offset for SDMA%d RLC%d: 0x%x\n", engine_id,
+   queue_id, 

Re: [PATCH 3/3] drm/amdgpu: Fix pinned GART area accounting and fdinfo reporting

2024-04-29 Thread Felix Kuehling

On 2024-04-29 5:43, Tvrtko Ursulin wrote:


On 26/04/2024 23:24, Felix Kuehling wrote:


On 2024-04-26 12:43, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

When commit b453e42a6e8b ("drm/amdgpu: Add new placement for preemptible
SG BOs") added a new TTM region, it failed to notice the conceptual
imbalance in GART pin size accounting as done in amdgpu_bo_pin/unpin.

That imbalance leads to such objects being accounted against the
resource when pinned, but not un-accounted when unpinned.


AMDGPU_PL_PREEMPT is mostly used for userptr BOs, which cannot be 
pinned. In any case you should make sure that the accounting is 
consistent between amdgpu_bo_pin_restricted and amdgpu_bo_unpin. This 
patch breaks that consistency.


You mean amdgpu_bo_pin(_restricted) and amdgpu_bo_unpin do not run for 
such objects, or something else?


Right. amdgpu_bo_pin_restricted will return an error for userptr BOs:

if (amdgpu_ttm_tt_get_usermm(bo->tbo.ttm))
return -EPERM;




If they run, then at the end of pin there is:

 domain = amdgpu_mem_type_to_domain(bo->tbo.resource->mem_type);
...
 } else if (domain == AMDGPU_GEM_DOMAIN_GTT) {
     atomic64_add(amdgpu_bo_size(bo), >gart_pin_size);


You changed that in your patch 2:

-   } else if (domain == AMDGPU_GEM_DOMAIN_GTT) {
+   } else if (bo->tbo.resource->mem_type == TTM_PL_TT ||
+  bo->tbo.resource->mem_type == AMDGPU_PL_PREEMPT) {
atomic64_add(amdgpu_bo_size(bo), >gart_pin_size);
}

I was suggesting you just change this in patch 2 like this, so it 
matches what's done on unpin:


-   } else if (domain == AMDGPU_GEM_DOMAIN_GTT) {
+   } else if (bo->tbo.resource->mem_type == TTM_PL_TT) {
atomic64_add(amdgpu_bo_size(bo), >gart_pin_size);
}
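
The consistency point above (pin and unpin must apply the same placement criteria) can be illustrated with a small standalone sketch. It is not the amdgpu code: the enum, the struct and all function names are hypothetical stand-ins, and the choice to count only the TT placement simply mirrors the suggestion above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the placements discussed in this thread. */
enum mem_type { PL_SYSTEM, PL_TT, PL_VRAM, PL_PREEMPT };

struct pin_stats {
	int64_t vram_pin_size;
	int64_t gart_pin_size;
};

/* One predicate shared by pin and unpin, so the two cannot drift apart. */
static bool counts_as_gart_pin(enum mem_type type)
{
	return type == PL_TT;
}

static void account_pin(struct pin_stats *s, enum mem_type type, int64_t size)
{
	if (type == PL_VRAM)
		s->vram_pin_size += size;
	else if (counts_as_gart_pin(type))
		s->gart_pin_size += size;
}

static void account_unpin(struct pin_stats *s, enum mem_type type, int64_t size)
{
	if (type == PL_VRAM)
		s->vram_pin_size -= size;
	else if (counts_as_gart_pin(type))
		s->gart_pin_size -= size;
}
```

Because both paths go through counts_as_gart_pin(), any pin followed by an unpin of the same object leaves the counters where they started, whichever placement the object is in.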




And unpin has no handling for AMDGPU_PL_PREEMPT.

Ah I see.. does it rely on amdgpu_mem_type_to_domain returning 0 for 
AMDGPU_PL_PREEMPT? My confusion was I misread the pinning check as 
checking the domain as stored in the bo at creation time.


Although I am still confused by the statement that userptr BOs are not 
pinned. Is it not necessary to map them via GART on AMD hardware for the 
GPU to be able to access them?
Fix by extending the accounting criteria in amdgpu_bo_unpin.

What also appears to need fixing is not reporting their size from
amdgpu_bo_get_memory, which is used to implement fdinfo stats, so they are
not mixed with the regular userspace created and driver owned objects.


I think that's true. It's a very fine distinction. AMDGPU_PL_PREEMPT 
does use system memory and it is GPU accessible, just like GTT. The 
only difference is that it's not subject to the GTT limits, because 
its eviction is handled by callbacks other than TTM evictions and 
doesn't need to wait for fences.


As in you think those two hunks of the patch are correct?


Yes. It seems, Christian agrees but wants to show preemptible memory 
separately in debugfs instead of not showing it at all.


Regards,
  Felix




Regards,

Tvrtko



Regards,
   Felix




And also amdgpu_bo_print_info for debugfs reporting.

Note that the patch depends on the previous one which broke down the
relevant checks from the domain based to placement based.

Signed-off-by: Tvrtko Ursulin 
Fixes: b453e42a6e8b ("drm/amdgpu: Add new placement for preemptible SG BOs")
Cc: Felix Kuehling 
Cc: Christian König 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 5 ++---
  1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c

index fb984669fc3a..5a2bbc793953 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -1032,7 +1032,8 @@ void amdgpu_bo_unpin(struct amdgpu_bo *bo)
  atomic64_sub(amdgpu_bo_size(bo), >vram_pin_size);
  atomic64_sub(amdgpu_vram_mgr_bo_visible_size(bo),
   >visible_pin_size);
-    } else if (bo->tbo.resource->mem_type == TTM_PL_TT) {
+    } else if (bo->tbo.resource->mem_type == TTM_PL_TT ||
+   bo->tbo.resource->mem_type == AMDGPU_PL_PREEMPT) {
  atomic64_sub(amdgpu_bo_size(bo), >gart_pin_size);
  }
@@ -1298,7 +1299,6 @@ void amdgpu_bo_get_memory(struct amdgpu_bo *bo,
  stats->vram_shared += size;
  break;
  case TTM_PL_TT:
-    case AMDGPU_PL_PREEMPT:
  stats->gtt += size;
  if (shared)
  stats->gtt_shared += size;
@@ -1599,7 +1599,6 @@ u64 amdgpu_bo_print_info(int id, struct 
amdgpu_bo *bo, struct seq_file *m)

  placement = "VRAM";
  break;
  case TTM_PL_TT:
-    case AMDGPU_PL_PREEMPT:
  placement = "GTT";
  break;
  case TTM_PL_SYSTEM:


[PATCH 26/31] drm/amdgpu: fix active rb and cu number for gfx12

2024-04-29 Thread Alex Deucher
From: Likun Gao 

Correct the algorithm of active CU and RB to bypass
the disabled SA for gfx12.
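
The core of the new computation, expanding an active-SA bitmap into a per-SA RB bitmap, can be sketched in isolation as follows. This is a simplified illustration, not the driver function: the helper name is hypothetical, the 0x3 per-SA pattern is taken from the loop in the diff below, and how the result is combined with the global RB bitmap is not shown because that part of the patch is cut off in this archive.

```c
#include <stdint.h>

/*
 * Hypothetical helper: for every active shader array (SA), set its RB bits
 * in the result. The caller must keep max_sa * rb_width_per_sa within 32.
 */
static uint32_t expand_sa_to_rb_bitmap(uint32_t active_sa_bitmap,
				       uint32_t max_sa,
				       uint32_t rb_width_per_sa)
{
	uint32_t rb_bitmap = 0;
	uint32_t i;

	for (i = 0; i < max_sa; i++) {
		/* Disabled SAs are skipped, so their RB bits stay clear. */
		if (active_sa_bitmap & (1u << i))
			rb_bitmap |= 0x3u << (i * rb_width_per_sa);
	}

	return rb_bitmap;
}
```

For example, with four SAs, two RB bits per SA and SAs 0 and 2 active (bitmap 0x5), the helper returns 0x33.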

Signed-off-by: Likun Gao 
Reviewed-by: Hawking Zhang 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c | 81 +-
 1 file changed, 55 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
index 882e00234e33a..6a2af12b5e29d 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
@@ -1354,44 +1354,70 @@ static void gfx_v12_0_select_se_sh(struct amdgpu_device 
*adev, u32 se_num,
WREG32_SOC15(GC, 0, regGRBM_GFX_INDEX, data);
 }
 
-static u32 gfx_v12_0_get_rb_active_bitmap(struct amdgpu_device *adev)
+static u32 gfx_v12_0_get_sa_active_bitmap(struct amdgpu_device *adev)
 {
-   u32 data, mask;
+   u32 gc_disabled_sa_mask, gc_user_disabled_sa_mask, sa_mask;
 
-   data = RREG32_SOC15(GC, 0, regCC_RB_BACKEND_DISABLE);
-   data |= RREG32_SOC15(GC, 0, regGC_USER_RB_BACKEND_DISABLE);
+   gc_disabled_sa_mask = RREG32_SOC15(GC, 0, 
regGRBM_CC_GC_SA_UNIT_DISABLE);
+   gc_disabled_sa_mask = REG_GET_FIELD(gc_disabled_sa_mask,
+   GRBM_CC_GC_SA_UNIT_DISABLE,
+   SA_DISABLE);
+   gc_user_disabled_sa_mask = RREG32_SOC15(GC, 0, 
regGRBM_GC_USER_SA_UNIT_DISABLE);
+   gc_user_disabled_sa_mask = REG_GET_FIELD(gc_user_disabled_sa_mask,
+GRBM_GC_USER_SA_UNIT_DISABLE,
+SA_DISABLE);
+   sa_mask = amdgpu_gfx_create_bitmask(adev->gfx.config.max_sh_per_se *
+   
adev->gfx.config.max_shader_engines);
 
-   data &= CC_RB_BACKEND_DISABLE__BACKEND_DISABLE_MASK;
-   data >>= GC_USER_RB_BACKEND_DISABLE__BACKEND_DISABLE__SHIFT;
+   return sa_mask & (~(gc_disabled_sa_mask | gc_user_disabled_sa_mask));
+}
 
-   mask = amdgpu_gfx_create_bitmask(adev->gfx.config.max_backends_per_se /
-adev->gfx.config.max_sh_per_se);
+static u32 gfx_v12_0_get_rb_active_bitmap(struct amdgpu_device *adev)
+{
+   u32 gc_disabled_rb_mask, gc_user_disabled_rb_mask;
+   u32 rb_mask;
 
-   return (~data) & mask;
+   gc_disabled_rb_mask = RREG32_SOC15(GC, 0, regCC_RB_BACKEND_DISABLE);
+   gc_disabled_rb_mask = REG_GET_FIELD(gc_disabled_rb_mask,
+   CC_RB_BACKEND_DISABLE,
+   BACKEND_DISABLE);
+   gc_user_disabled_rb_mask = RREG32_SOC15(GC, 0, 
regGC_USER_RB_BACKEND_DISABLE);
+   gc_user_disabled_rb_mask = REG_GET_FIELD(gc_user_disabled_rb_mask,
+GC_USER_RB_BACKEND_DISABLE,
+BACKEND_DISABLE);
+   rb_mask = 
amdgpu_gfx_create_bitmask(adev->gfx.config.max_backends_per_se *
+   
adev->gfx.config.max_shader_engines);
+
+   return rb_mask & (~(gc_disabled_rb_mask | gc_user_disabled_rb_mask));
 }
 
 static void gfx_v12_0_setup_rb(struct amdgpu_device *adev)
 {
-   int i, j;
-   u32 data;
-   u32 active_rbs = 0;
-   u32 rb_bitmap_width_per_sh = adev->gfx.config.max_backends_per_se /
-adev->gfx.config.max_sh_per_se;
+   u32 rb_bitmap_width_per_sa;
+   u32 max_sa;
+   u32 active_sa_bitmap;
+   u32 global_active_rb_bitmap;
+   u32 active_rb_bitmap = 0;
+   u32 i;
 
-   mutex_lock(>grbm_idx_mutex);
-   for (i = 0; i < adev->gfx.config.max_shader_engines; i++) {
-   for (j = 0; j < adev->gfx.config.max_sh_per_se; j++) {
-   gfx_v12_0_select_se_sh(adev, i, j, 0x, 0);
-   data = gfx_v12_0_get_rb_active_bitmap(adev);
-   active_rbs |= data << ((i * 
adev->gfx.config.max_sh_per_se + j) *
-  rb_bitmap_width_per_sh);
-   }
+   /* query sa bitmap from SA_UNIT_DISABLE registers */
+   active_sa_bitmap = gfx_v12_0_get_sa_active_bitmap(adev);
+   /* query rb bitmap from RB_BACKEND_DISABLE registers */
+   global_active_rb_bitmap = gfx_v12_0_get_rb_active_bitmap(adev);
+
+   /* generate active rb bitmap according to active sa bitmap */
+   max_sa = adev->gfx.config.max_shader_engines *
+adev->gfx.config.max_sh_per_se;
+   rb_bitmap_width_per_sa = adev->gfx.config.max_backends_per_se /
+adev->gfx.config.max_sh_per_se;
+   for (i = 0; i < max_sa; i++) {
+   if (active_sa_bitmap & (1 << i))
+   active_rb_bitmap |= (0x3 << (i * 
rb_bitmap_width_per_sa));
}
-   gfx_v12_0_select_se_sh(adev, 0x, 0x, 0x, 0);
-   

[PATCH 20/31] drm/amd: Move fw init from sw_init to early_init for imu v12

2024-04-29 Thread Alex Deucher
From: Likun Gao 

Move microcode loading from sw_init to early_init to align with
the previous version of the IMU init sequence.

Signed-off-by: Likun Gao 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
index df5873ba54e76..e8505c77e12e8 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
@@ -461,6 +461,14 @@ static int gfx_v12_0_init_microcode(struct amdgpu_device 
*adev)
/* only one MEC for gfx 12 */
adev->gfx.mec2_fw = NULL;
 
+   if (adev->gfx.imu.funcs && (amdgpu_dpm > 0)) {
+   if (adev->gfx.imu.funcs->init_microcode) {
+   err = adev->gfx.imu.funcs->init_microcode(adev);
+   if (err)
+   dev_err(adev->dev, "Failed to load imu 
firmware!\n");
+   }
+   }
+
 out:
if (err) {
amdgpu_ucode_release(>gfx.pfp_fw);
@@ -1172,14 +1180,6 @@ static int gfx_v12_0_sw_init(void *handle)
 
adev->gfx.gfx_current_status = AMDGPU_GFX_NORMAL_MODE;
 
-   if (adev->gfx.imu.funcs && (amdgpu_dpm > 0)) {
-   if (adev->gfx.imu.funcs->init_microcode) {
-   r = adev->gfx.imu.funcs->init_microcode(adev);
-   if (r)
-   dev_err(adev->dev, "Failed to load imu 
firmware!\n");
-   }
-   }
-
gfx_v12_0_me_init(adev);
 
r = gfx_v12_0_rlc_init(adev);
-- 
2.44.0



[PATCH 28/31] drm/amdgpu: init gfxhub setting to align with mmhub

2024-04-29 Thread Alex Deucher
From: Likun Gao 

Align gfxhub settings with mmhub when programming the RLC RAM.
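
The register-by-register substitution in the diff below could also be expressed as a lookup table. The following is only a sketch of that alternative, not what the patch does: the struct, the function name and all register offsets are made-up placeholders, and a real version would use the SOC15 register macros and read the MMHUB register through the driver's accessors.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pairing of a GC hub register with its MMHUB counterpart. */
struct reg_alias {
	uint32_t gc_reg;
	uint32_t mmhub_reg;
};

/* Placeholder offsets for illustration only; not real hardware offsets. */
static const struct reg_alias gfxhub_aliases[] = {
	{ 0x1000, 0x2000 },	/* e.g. FB_LOCATION_BASE */
	{ 0x1004, 0x2004 },	/* e.g. FB_LOCATION_TOP */
	{ 0x1008, 0x2008 },	/* e.g. FB_OFFSET */
};

/* Returns true and stores the MMHUB register if gc_reg has a counterpart. */
static bool mmhub_alias_for(uint32_t gc_reg, uint32_t *mmhub_reg)
{
	size_t i;

	for (i = 0; i < sizeof(gfxhub_aliases) / sizeof(gfxhub_aliases[0]); i++) {
		if (gfxhub_aliases[i].gc_reg == gc_reg) {
			*mmhub_reg = gfxhub_aliases[i].mmhub_reg;
			return true;
		}
	}

	return false;
}
```

A caller would read the returned MMHUB register when a match is found and otherwise keep the original data value; that fallback is an assumption, since the end of the function is not visible in this archive.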

Signed-off-by: Likun Gao 
Reviewed-by: Hawking Zhang 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/imu_v12_0.c | 39 ++
 1 file changed, 39 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/imu_v12_0.c 
b/drivers/gpu/drm/amd/amdgpu/imu_v12_0.c
index 5baef51660637..d67807d5c14c0 100644
--- a/drivers/gpu/drm/amd/amdgpu/imu_v12_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/imu_v12_0.c
@@ -30,6 +30,7 @@
 
 #include "gc/gc_12_0_0_offset.h"
 #include "gc/gc_12_0_0_sh_mask.h"
+#include "mmhub/mmhub_4_1_0_offset.h"
 
 MODULE_FIRMWARE("amdgpu/gc_12_0_1_imu.bin");
 
@@ -295,6 +296,43 @@ static u32 imu_v12_0_grbm_gfx_index_remap(struct 
amdgpu_device *adev,
return val;
 }
 
+static u32 imu_v12_init_gfxhub_settings(struct amdgpu_device *adev,
+   u32 reg, u32 data)
+{
+   if (reg == SOC15_REG_OFFSET(GC, 0, regGCMC_VM_FB_LOCATION_BASE))
+   return RREG32_SOC15(MMHUB, 0, regMMMC_VM_FB_LOCATION_BASE);
+   else if (reg == SOC15_REG_OFFSET(GC, 0, regGCMC_VM_FB_LOCATION_TOP))
+   return RREG32_SOC15(MMHUB, 0, regMMMC_VM_FB_LOCATION_TOP);
+   else if (reg == SOC15_REG_OFFSET(GC, 0, regGCMC_VM_FB_OFFSET))
+   return RREG32_SOC15(MMHUB, 0, regMMMC_VM_FB_OFFSET);
+   else if (reg == SOC15_REG_OFFSET(GC, 0, regGCMC_VM_AGP_BASE))
+   return RREG32_SOC15(MMHUB, 0, regMMMC_VM_AGP_BASE);
+   else if (reg == SOC15_REG_OFFSET(GC, 0, regGCMC_VM_AGP_BOT))
+   return RREG32_SOC15(MMHUB, 0, regMMMC_VM_AGP_BOT);
+   else if (reg == SOC15_REG_OFFSET(GC, 0, regGCMC_VM_AGP_TOP))
+   return RREG32_SOC15(MMHUB, 0, regMMMC_VM_AGP_TOP);
+   else if (reg == SOC15_REG_OFFSET(GC, 0, regGCMC_VM_MX_L1_TLB_CNTL))
+   return RREG32_SOC15(MMHUB, 0, regMMMC_VM_MX_L1_TLB_CNTL);
+   else if (reg == SOC15_REG_OFFSET(GC, 0, 
regGCMC_VM_SYSTEM_APERTURE_LOW_ADDR))
+   return RREG32_SOC15(MMHUB, 0, 
regMMMC_VM_SYSTEM_APERTURE_LOW_ADDR);
+   else if (reg == SOC15_REG_OFFSET(GC, 0, 
regGCMC_VM_SYSTEM_APERTURE_HIGH_ADDR))
+   return RREG32_SOC15(MMHUB, 0, 
regMMMC_VM_SYSTEM_APERTURE_HIGH_ADDR);
+   else if (reg == SOC15_REG_OFFSET(GC, 0, 
regGCMC_VM_LOCAL_FB_ADDRESS_START))
+   return RREG32_SOC15(MMHUB, 0, 
regMMMC_VM_LOCAL_FB_ADDRESS_START);
+   else if (reg == SOC15_REG_OFFSET(GC, 0, 
regGCMC_VM_LOCAL_FB_ADDRESS_END))
+   return RREG32_SOC15(MMHUB, 0, regMMMC_VM_LOCAL_FB_ADDRESS_END);
+   else if (reg == SOC15_REG_OFFSET(GC, 0, 
regGCMC_VM_LOCAL_SYSMEM_ADDRESS_START))
+   return RREG32_SOC15(MMHUB, 0, 
regMMMC_VM_LOCAL_SYSMEM_ADDRESS_START);
+   else if (reg == SOC15_REG_OFFSET(GC, 0, 
regGCMC_VM_LOCAL_SYSMEM_ADDRESS_END))
+   return RREG32_SOC15(MMHUB, 0, 
regMMMC_VM_LOCAL_SYSMEM_ADDRESS_END);
+   else if (reg == SOC15_REG_OFFSET(GC, 0, 
regGCMC_VM_SYSTEM_APERTURE_DEFAULT_ADDR_LSB))
+   return RREG32_SOC15(MMHUB, 0, 
regMMMC_VM_SYSTEM_APERTURE_DEFAULT_ADDR_LSB);
+   else if (reg == SOC15_REG_OFFSET(GC, 0, 
regGCMC_VM_SYSTEM_APERTURE_DEFAULT_ADDR_MSB))
+   return RREG32_SOC15(MMHUB, 0, 
regMMMC_VM_SYSTEM_APERTURE_DEFAULT_ADDR_MSB);
+   else
+   return data;
+}
+
 static void program_imu_rlc_ram(struct amdgpu_device *adev,
const u32 *regs,
const u32 array_size)
@@ -308,6 +346,7 @@ static void program_imu_rlc_ram(struct amdgpu_device *adev,
for (i = 0; i < array_size; i += 3) {
reg = regs[i + 0];
data = regs[i + 2];
+   data = imu_v12_init_gfxhub_settings(adev, reg, data);
if (reg == SOC15_REG_OFFSET(GC, 0, regGRBM_GFX_INDEX)) {
val_l = imu_v12_0_grbm_gfx_index_remap(adev, data, 
false);
val_h = imu_v12_0_grbm_gfx_index_remap(adev, data, 
true);
-- 
2.44.0
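
The if/else chain in imu_v12_init_gfxhub_settings() pairs each gfxhub
register with the mmhub register whose live value replaces the golden
data. As a rough illustration only, the same idea can be written as a
table lookup. This is not how the patch implements it, and every name
and offset below (sketch_reg_pair, sketch_pairs, the 0x1000/0x2000
offsets, the fake reader) is hypothetical rather than taken from the
driver headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pairing of a gfxhub register with its mmhub counterpart. */
struct sketch_reg_pair {
	uint32_t gfxhub_reg;
	uint32_t mmhub_reg;
};

/* Made-up offsets; the real ones come from the gc/mmhub offset headers. */
static const struct sketch_reg_pair sketch_pairs[] = {
	{ 0x1000u, 0x2000u },	/* stands in for FB_LOCATION_BASE */
	{ 0x1004u, 0x2004u },	/* stands in for FB_LOCATION_TOP */
};

typedef uint32_t (*sketch_read_fn)(uint32_t reg);

/* Fake register reader so the sketch can run without hardware. */
static uint32_t sketch_fake_mmhub_read(uint32_t reg)
{
	return reg ^ 0xA5A5A5A5u;
}

/*
 * Return the mmhub value if reg is one of the mirrored gfxhub registers,
 * otherwise pass the golden data through unchanged.
 */
static uint32_t sketch_gfxhub_override(const struct sketch_reg_pair *pairs,
				       size_t n, sketch_read_fn read_reg,
				       uint32_t reg, uint32_t data)
{
	size_t i;

	assert(pairs != NULL && read_reg != NULL);

	for (i = 0; i < n; i++) {
		if (pairs[i].gfxhub_reg == reg)
			return read_reg(pairs[i].mmhub_reg);
	}
	return data;
}
```

Registers that are not in the table keep their golden value, which
matches the final "else return data" branch of the patch.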



[PATCH 11/31] drm/amdgpu: Enable MES to handle doorbell ring on unmapped queue

2024-04-29 Thread Alex Deucher
From: shaoyunl 

On MES12, the HW can monitor up to 2048 doorbells that are not
currently mapped and trigger an interrupt to MES when one of these
unmapped doorbells is rung.

Signed-off-by: shaoyunl 
Reviewed-by: Felix Kuehling 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/mes_v12_0.c | 24 
 1 file changed, 24 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
index d20bb78280b15..d8ccf580bcf4b 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
@@ -454,6 +454,27 @@ static void mes_v12_0_init_aggregated_doorbell(struct 
amdgpu_mes *mes)
WREG32_SOC15(GC, 0, regCP_HQD_GFX_CONTROL, data);
 }
 
+
+static void mes_v12_0_enable_unmapped_doorbell_handling(
+   struct amdgpu_mes *mes, bool enable)
+{
+   struct amdgpu_device *adev = mes->adev;
+   uint32_t data = RREG32_SOC15(GC, 0, regCP_UNMAPPED_DOORBELL);
+
+   /*
+* The default PROC_LSB setting is 0xc which means doorbell
+* addr[16:12] gives the doorbell page number. For kfd, each
+* process will use 2 pages of doorbell, we need to change the
+* setting to 0xd
+*/
+   data &= ~CP_UNMAPPED_DOORBELL__PROC_LSB_MASK;
+   data |= 0xd <<  CP_UNMAPPED_DOORBELL__PROC_LSB__SHIFT;
+
+   data |= (enable ? 1 : 0) << CP_UNMAPPED_DOORBELL__ENABLE__SHIFT;
+
+   WREG32_SOC15(GC, 0, regCP_UNMAPPED_DOORBELL, data);
+}
+
 static const struct amdgpu_mes_funcs mes_v12_0_funcs = {
.add_hw_queue = mes_v12_0_add_hw_queue,
.remove_hw_queue = mes_v12_0_remove_hw_queue,
@@ -1233,6 +1254,9 @@ static int mes_v12_0_hw_init(void *handle)
 
mes_v12_0_init_aggregated_doorbell(&adev->mes);
 
+   /* Enable the MES to handle doorbell ring on unmapped queue */
+   mes_v12_0_enable_unmapped_doorbell_handling(&adev->mes, true);
+
r = mes_v12_0_query_sched_status(&adev->mes);
if (r) {
DRM_ERROR("MES is busy\n");
-- 
2.44.0
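
The comment in the patch says PROC_LSB selects which doorbell address
bit starts the per-process index: 0xc means one 4 KiB page per process,
and 0xd means two pages. A minimal sketch of that arithmetic and of the
register read-modify-write follows. The field positions are made up for
illustration (the real ones live in the gc_12_0_0 sh_mask header), and
unlike the patch this sketch also clears the enable bit before setting
it, which is an assumption about the intended behaviour when enable is
false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical field layout, not the real CP_UNMAPPED_DOORBELL layout. */
#define SKETCH_PROC_LSB_SHIFT	0
#define SKETCH_PROC_LSB_MASK	0x0000001fu
#define SKETCH_ENABLE_MASK	0x80000000u

/* Index of the process that owns a doorbell byte address. */
static uint32_t sketch_doorbell_proc_index(uint32_t addr, uint32_t proc_lsb)
{
	assert(proc_lsb < 32);
	return addr >> proc_lsb;
}

/* Compute the new register value from the old one. */
static uint32_t sketch_unmapped_doorbell_value(uint32_t data,
					       uint32_t proc_lsb, bool enable)
{
	assert(proc_lsb <= (SKETCH_PROC_LSB_MASK >> SKETCH_PROC_LSB_SHIFT));

	/* Clear the field first so stale bits cannot survive the update. */
	data &= ~SKETCH_PROC_LSB_MASK;
	data |= proc_lsb << SKETCH_PROC_LSB_SHIFT;

	data &= ~SKETCH_ENABLE_MASK;
	if (enable)
		data |= SKETCH_ENABLE_MASK;

	return data;
}
```

With proc_lsb = 0xc the addresses 0x2000 and 0x3000 fall into different
slots (2 and 3); with 0xd they share slot 1, which is the two-pages-per-
process grouping the comment describes for kfd.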



[PATCH 21/31] drm/amd/amdgpu: workaround for the imu fw loading

2024-04-29 Thread Alex Deucher
From: Kenneth Feng 

Workaround for the imu fw loading on gfx 12.0 without psp.

Signed-off-by: Kenneth Feng 
Reviewed-by: Likun Gao 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/imu_v12_0.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/imu_v12_0.c 
b/drivers/gpu/drm/amd/amdgpu/imu_v12_0.c
index be140ee4d9173..7112e4b2d6489 100644
--- a/drivers/gpu/drm/amd/amdgpu/imu_v12_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/imu_v12_0.c
@@ -143,6 +143,11 @@ static void imu_v12_0_setup(struct amdgpu_device *adev)
imu_reg_val = RREG32_SOC15(GC, 0, regGFX_IMU_C2PMSG_16);
imu_reg_val |= 0x1;
WREG32_SOC15(GC, 0, regGFX_IMU_C2PMSG_16, imu_reg_val);
+
+   imu_reg_val = RREG32_SOC15(GC, 0, regGFX_IMU_SCRATCH_10);
+   imu_reg_val |= 0x20010007;
+   WREG32_SOC15(GC, 0, regGFX_IMU_SCRATCH_10, imu_reg_val);
+
}
 }
 
-- 
2.44.0



[PATCH 22/31] drm/amdgpu: set different fw data addr for mec pipe

2024-04-29 Thread Alex Deucher
From: Likun Gao 

For MEC fw data, each pipe should be programmed with a
different address.

Signed-off-by: Likun Gao 
Reviewed-by: Hawking Zhang 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
index e8505c77e12e8..68a66ccb0100d 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
@@ -2432,7 +2432,9 @@ static int 
gfx_v12_0_cp_compute_load_microcode_rs64(struct amdgpu_device *adev)
return r;
}
 
-   r = amdgpu_bo_create_reserved(adev, fw_data_size,
+   r = amdgpu_bo_create_reserved(adev,
+ ALIGN(fw_data_size, 64 * 1024) *
+ adev->gfx.mec.num_pipe_per_mec,
  64 * 1024, AMDGPU_GEM_DOMAIN_VRAM,
  &adev->gfx.mec.mec_fw_data_obj,
  &adev->gfx.mec.mec_fw_data_gpu_addr,
@@ -2444,7 +2446,9 @@ static int 
gfx_v12_0_cp_compute_load_microcode_rs64(struct amdgpu_device *adev)
}
 
memcpy(fw_ucode_ptr, fw_ucode, fw_ucode_size);
-   memcpy(fw_data_ptr, fw_data, fw_data_size);
+   for (i = 0; i < adev->gfx.mec.num_pipe_per_mec; i++) {
+   memcpy(fw_data_ptr + i * ALIGN(fw_data_size, 64 * 1024) / 4, 
fw_data, fw_data_size);
+   }
 
amdgpu_bo_kunmap(adev->gfx.mec.mec_fw_obj);
amdgpu_bo_kunmap(adev->gfx.mec.mec_fw_data_obj);
@@ -2467,9 +2471,11 @@ static int 
gfx_v12_0_cp_compute_load_microcode_rs64(struct amdgpu_device *adev)
soc24_grbm_select(adev, 1, i, 0, 0);
 
WREG32_SOC15(GC, 0, regCP_MEC_MDBASE_LO,
-lower_32_bits(adev->gfx.mec.mec_fw_data_gpu_addr));
+lower_32_bits(adev->gfx.mec.mec_fw_data_gpu_addr +
+  i * ALIGN(fw_data_size, 64 * 1024)));
WREG32_SOC15(GC, 0, regCP_MEC_MDBASE_HI,
-upper_32_bits(adev->gfx.mec.mec_fw_data_gpu_addr));
+upper_32_bits(adev->gfx.mec.mec_fw_data_gpu_addr +
+  i * ALIGN(fw_data_size, 64 * 1024)));
 
WREG32_SOC15(GC, 0, regCP_CPC_IC_BASE_LO,
 lower_32_bits(adev->gfx.mec.mec_fw_gpu_addr));
-- 
2.44.0
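
The layout this patch sets up is one copy of the MEC fw data per pipe,
each copy starting on a 64 KiB boundary. A small user-space sketch of
the offset arithmetic is shown below. The helper names are invented for
the example; the kernel code uses the ALIGN() macro and a u32 pointer,
which is why the diff divides the byte offset by 4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_STRIDE_ALIGN ((size_t)64 * 1024)

/* Round size up to a power-of-two boundary, like the kernel's ALIGN(). */
static size_t sketch_align_up(size_t size, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (size + align - 1) & ~(align - 1);
}

/* Byte offset of the fw data copy that belongs to a given pipe. */
static size_t sketch_pipe_offset(size_t fw_data_size, unsigned int pipe)
{
	return (size_t)pipe * sketch_align_up(fw_data_size, SKETCH_STRIDE_ALIGN);
}

/* Total buffer size needed for all pipes. */
static size_t sketch_total_size(size_t fw_data_size, unsigned int num_pipes)
{
	return sketch_pipe_offset(fw_data_size, num_pipes);
}

/* Place one copy of the fw data at the start of each pipe's slot. */
static void sketch_copy_per_pipe(uint8_t *dst, const uint8_t *fw_data,
				 size_t fw_data_size, unsigned int num_pipes)
{
	unsigned int i;

	assert(dst != NULL && fw_data != NULL);

	for (i = 0; i < num_pipes; i++)
		memcpy(dst + sketch_pipe_offset(fw_data_size, i),
		       fw_data, fw_data_size);
}
```

The same per-pipe offset is what gets added to mec_fw_data_gpu_addr
before it is split into the CP_MEC_MDBASE_LO/HI registers.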



[PATCH 31/31] drm/amdgpu: Enable event log on MES 12

2024-04-29 Thread Alex Deucher
From: shaoyunl 

Enable event log through the HW specific FW API

Signed-off-by: shaoyunl 
Reviewed-by: Harish Kasiviswanathan 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/mes_v12_0.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
index 4a041cc22f68a..e92478b1f298f 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
@@ -403,6 +403,10 @@ static int mes_v12_0_set_hw_resources(struct amdgpu_mes 
*mes)
mes_set_hw_res_pkt.oversubscription_timer = 0;
mes_set_hw_res_pkt.unmapped_doorbell_handling = 1;
 
+
+   mes_set_hw_res_pkt.enable_mes_event_int_logging = 1;
+   mes_set_hw_res_pkt.event_intr_history_gpu_mc_ptr = 
mes->event_log_gpu_addr;
+
return mes_v12_0_submit_pkt_and_poll_completion(mes,
&mes_set_hw_res_pkt, sizeof(mes_set_hw_res_pkt),
offsetof(union MESAPI_SET_HW_RESOURCES, api_status));
-- 
2.44.0



[PATCH 17/31] drm/amdgpu: skip imu related function if dpm=0

2024-04-29 Thread Alex Deucher
From: Likun Gao 

Only execute IMU related functions if dpm>0.

Signed-off-by: Likun Gao 
Reviewed-by: Hawking Zhang 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
index 1253053d10339..f3f8601d6e184 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
@@ -1076,7 +1076,7 @@ static int gfx_v12_0_rlc_backdoor_autoload_enable(struct 
amdgpu_device *adev)
 
WREG32_SOC15(GC, 0, regGFX_IMU_RLC_BOOTLOADER_SIZE, rlc_g_size);
 
-   if (adev->gfx.imu.funcs) {
+   if (adev->gfx.imu.funcs && (amdgpu_dpm > 0)) {
/* RLC autoload sequence 3: load IMU fw */
if (adev->gfx.imu.funcs->load_microcode)
adev->gfx.imu.funcs->load_microcode(adev);
@@ -1149,7 +1149,7 @@ static int gfx_v12_0_sw_init(void *handle)
 
adev->gfx.gfx_current_status = AMDGPU_GFX_NORMAL_MODE;
 
-   if (adev->gfx.imu.funcs) {
+   if (adev->gfx.imu.funcs && (amdgpu_dpm > 0)) {
if (adev->gfx.imu.funcs->init_microcode) {
r = adev->gfx.imu.funcs->init_microcode(adev);
if (r)
@@ -3215,7 +3215,7 @@ static int gfx_v12_0_hw_init(void *handle)
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
 
if (adev->firmware.load_type == AMDGPU_FW_LOAD_RLC_BACKDOOR_AUTO) {
-   if (adev->gfx.imu.funcs) {
+   if (adev->gfx.imu.funcs && (amdgpu_dpm > 0)) {
/* RLC autoload sequence 1: Program rlc ram */
if (adev->gfx.imu.funcs->program_rlc_ram)
adev->gfx.imu.funcs->program_rlc_ram(adev);
-- 
2.44.0



[PATCH 23/31] drm/amd/amdgpu: update GFX12 wave data registers

2024-04-29 Thread Alex Deucher
From: Tom St Denis 

Signed-off-by: Tom St Denis 
Reviewed-by: Jonathan Kim 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
index 68a66ccb0100d..730d57a10077f 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
@@ -659,8 +659,8 @@ static void gfx_v12_0_read_wave_data(struct amdgpu_device 
*adev,
 * zero here */
WARN_ON(simd != 0);
 
-   /* type 3 wave data */
-   dst[(*no_fields)++] = 3;
+   /* type 4 wave data */
+   dst[(*no_fields)++] = 4;
dst[(*no_fields)++] = wave_read_ind(adev, wave, ixSQ_WAVE_STATUS);
dst[(*no_fields)++] = wave_read_ind(adev, wave, ixSQ_WAVE_PC_LO);
dst[(*no_fields)++] = wave_read_ind(adev, wave, ixSQ_WAVE_PC_HI);
@@ -675,6 +675,15 @@ static void gfx_v12_0_read_wave_data(struct amdgpu_device 
*adev,
dst[(*no_fields)++] = wave_read_ind(adev, wave, ixSQ_WAVE_IB_DBG1);
dst[(*no_fields)++] = wave_read_ind(adev, wave, ixSQ_WAVE_M0);
dst[(*no_fields)++] = wave_read_ind(adev, wave, ixSQ_WAVE_MODE);
+   dst[(*no_fields)++] = wave_read_ind(adev, wave, ixSQ_WAVE_STATE_PRIV);
+   dst[(*no_fields)++] = wave_read_ind(adev, wave, 
ixSQ_WAVE_EXCP_FLAG_PRIV);
+   dst[(*no_fields)++] = wave_read_ind(adev, wave, 
ixSQ_WAVE_EXCP_FLAG_USER);
+   dst[(*no_fields)++] = wave_read_ind(adev, wave, ixSQ_WAVE_TRAP_CTRL);
+   dst[(*no_fields)++] = wave_read_ind(adev, wave, ixSQ_WAVE_ACTIVE);
+   dst[(*no_fields)++] = wave_read_ind(adev, wave, 
ixSQ_WAVE_VALID_AND_IDLE);
+   dst[(*no_fields)++] = wave_read_ind(adev, wave, 
ixSQ_WAVE_DVGPR_ALLOC_LO);
+   dst[(*no_fields)++] = wave_read_ind(adev, wave, 
ixSQ_WAVE_DVGPR_ALLOC_HI);
+   dst[(*no_fields)++] = wave_read_ind(adev, wave, ixSQ_WAVE_SCHED_MODE);
 }
 
 static void gfx_v12_0_read_wave_sgprs(struct amdgpu_device *adev,
-- 
2.44.0



[PATCH 27/31] drm/amdgpu: skip dpm check to init imu fw

2024-04-29 Thread Alex Deucher
From: Likun Gao 

Skip dpm check to init imu firmware for imu v12.

Signed-off-by: Likun Gao 
Reviewed-by: Hawking Zhang 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
index 6a2af12b5e29d..33fe519e617d6 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
@@ -461,7 +461,7 @@ static int gfx_v12_0_init_microcode(struct amdgpu_device 
*adev)
/* only one MEC for gfx 12 */
adev->gfx.mec2_fw = NULL;
 
-   if (adev->gfx.imu.funcs && (amdgpu_dpm > 0)) {
+   if (adev->gfx.imu.funcs) {
if (adev->gfx.imu.funcs->init_microcode) {
err = adev->gfx.imu.funcs->init_microcode(adev);
if (err)
-- 
2.44.0



[PATCH 19/31] drm/amdgpu: support S&R fw load for gfx v12

2024-04-29 Thread Alex Deucher
From: Likun Gao 

Support Save & Restore related fw load with backdoor RLC
autoload type on gfx v12.

Signed-off-by: Likun Gao 
Reviewed-by: Kenneth Feng 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
index 06244d97c2831..df5873ba54e76 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
@@ -915,6 +915,7 @@ gfx_v12_0_rlc_backdoor_autoload_copy_gfx_ucode(struct 
amdgpu_device *adev)
uint32_t fw_size;
const struct gfx_firmware_header_v2_0 *cpv2_hdr;
const struct rlc_firmware_header_v2_0 *rlc_hdr;
+   const struct rlc_firmware_header_v2_1 *rlcv21_hdr;
const struct rlc_firmware_header_v2_2 *rlcv22_hdr;
uint16_t version_major, version_minor;
 
@@ -986,6 +987,21 @@ gfx_v12_0_rlc_backdoor_autoload_copy_gfx_ucode(struct 
amdgpu_device *adev)
version_major = le16_to_cpu(rlc_hdr->header.header_version_major);
version_minor = le16_to_cpu(rlc_hdr->header.header_version_minor);
if (version_major == 2) {
+   if (version_minor >= 1) {
+   rlcv21_hdr = (const struct rlc_firmware_header_v2_1 
*)adev->gfx.rlc_fw->data;
+
+   fw_data = (const __le32 *)(adev->gfx.rlc_fw->data +
+   
le32_to_cpu(rlcv21_hdr->save_restore_list_gpm_offset_bytes));
+   fw_size = 
le32_to_cpu(rlcv21_hdr->save_restore_list_gpm_size_bytes);
+   gfx_v12_0_rlc_backdoor_autoload_copy_ucode(adev, 
SOC24_FIRMWARE_ID_RLCG_SCRATCH,
+  fw_data, fw_size);
+
+   fw_data = (const __le32 *)(adev->gfx.rlc_fw->data +
+   
le32_to_cpu(rlcv21_hdr->save_restore_list_srm_offset_bytes));
+   fw_size = 
le32_to_cpu(rlcv21_hdr->save_restore_list_srm_size_bytes);
+   gfx_v12_0_rlc_backdoor_autoload_copy_ucode(adev, 
SOC24_FIRMWARE_ID_RLC_SRM_ARAM,
+  fw_data, fw_size);
+   }
if (version_minor >= 2) {
rlcv22_hdr = (const struct rlc_firmware_header_v2_2 
*)adev->gfx.rlc_fw->data;
 
-- 
2.44.0



[PATCH 15/31] drm/amdgpu: set cp fw address set for gfx v12

2024-04-29 Thread Alex Deucher
From: Likun Gao 

Split the PFP/ME/MEC firmware address setting functions out of
the related load microcode functions, as they are also
needed for rlc autoload.

Signed-off-by: Likun Gao 
Reviewed-by: Hawking Zhang 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c | 186 -
 1 file changed, 122 insertions(+), 64 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
index 2075797b8b762..afb977e1dfc81 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
@@ -1749,6 +1749,110 @@ static void gfx_v12_0_config_gfx_rs64(struct 
amdgpu_device *adev)
WREG32_SOC15(GC, 0, regCP_MEC_RS64_CNTL, tmp);
 }
 
+static void gfx_v12_0_set_pfp_ucode_start_addr(struct amdgpu_device *adev)
+{
+   const struct gfx_firmware_header_v2_0 *cp_hdr;
+   unsigned pipe_id, tmp;
+
+   cp_hdr = (const struct gfx_firmware_header_v2_0 *)
+   adev->gfx.pfp_fw->data;
+   mutex_lock(&adev->srbm_mutex);
+   for (pipe_id = 0; pipe_id < adev->gfx.me.num_pipe_per_me; pipe_id++) {
+   soc24_grbm_select(adev, 0, pipe_id, 0, 0);
+   WREG32_SOC15(GC, 0, regCP_PFP_PRGRM_CNTR_START,
+(cp_hdr->ucode_start_addr_hi << 30) |
+(cp_hdr->ucode_start_addr_lo >> 2));
+   WREG32_SOC15(GC, 0, regCP_PFP_PRGRM_CNTR_START_HI,
+cp_hdr->ucode_start_addr_hi>>2);
+
+   /*
+* Program CP_ME_CNTL to reset given PIPE to take
+* effect of CP_PFP_PRGRM_CNTR_START.
+*/
+   tmp = RREG32_SOC15(GC, 0, regCP_ME_CNTL);
+   if (pipe_id == 0)
+   tmp = REG_SET_FIELD(tmp, CP_ME_CNTL,
+   PFP_PIPE0_RESET, 1);
+   else
+   tmp = REG_SET_FIELD(tmp, CP_ME_CNTL,
+   PFP_PIPE1_RESET, 1);
+   WREG32_SOC15(GC, 0, regCP_ME_CNTL, tmp);
+
+   /* Clear pfp pipe0 reset bit. */
+   if (pipe_id == 0)
+   tmp = REG_SET_FIELD(tmp, CP_ME_CNTL,
+   PFP_PIPE0_RESET, 0);
+   else
+   tmp = REG_SET_FIELD(tmp, CP_ME_CNTL,
+   PFP_PIPE1_RESET, 0);
+   WREG32_SOC15(GC, 0, regCP_ME_CNTL, tmp);
+   }
+   soc24_grbm_select(adev, 0, 0, 0, 0);
+   mutex_unlock(&adev->srbm_mutex);
+}
+
+static void gfx_v12_0_set_me_ucode_start_addr(struct amdgpu_device *adev)
+{
+   const struct gfx_firmware_header_v2_0 *cp_hdr;
+   unsigned pipe_id, tmp;
+
+   cp_hdr = (const struct gfx_firmware_header_v2_0 *)
+   adev->gfx.me_fw->data;
+   mutex_lock(&adev->srbm_mutex);
+   for (pipe_id = 0; pipe_id < adev->gfx.me.num_pipe_per_me; pipe_id++) {
+   soc24_grbm_select(adev, 0, pipe_id, 0, 0);
+   WREG32_SOC15(GC, 0, regCP_ME_PRGRM_CNTR_START,
+(cp_hdr->ucode_start_addr_hi << 30) |
+(cp_hdr->ucode_start_addr_lo >> 2) );
+   WREG32_SOC15(GC, 0, regCP_ME_PRGRM_CNTR_START_HI,
+cp_hdr->ucode_start_addr_hi>>2);
+
+   /*
+* Program CP_ME_CNTL to reset given PIPE to take
+* effect of CP_ME_PRGRM_CNTR_START.
+*/
+   tmp = RREG32_SOC15(GC, 0, regCP_ME_CNTL);
+   if (pipe_id == 0)
+   tmp = REG_SET_FIELD(tmp, CP_ME_CNTL,
+   ME_PIPE0_RESET, 1);
+   else
+   tmp = REG_SET_FIELD(tmp, CP_ME_CNTL,
+   ME_PIPE1_RESET, 1);
+   WREG32_SOC15(GC, 0, regCP_ME_CNTL, tmp);
+
+   /* Clear me pipe reset bit. */
+   if (pipe_id == 0)
+   tmp = REG_SET_FIELD(tmp, CP_ME_CNTL,
+   ME_PIPE0_RESET, 0);
+   else
+   tmp = REG_SET_FIELD(tmp, CP_ME_CNTL,
+   ME_PIPE1_RESET, 0);
+   WREG32_SOC15(GC, 0, regCP_ME_CNTL, tmp);
+   }
+   soc24_grbm_select(adev, 0, 0, 0, 0);
+   mutex_unlock(&adev->srbm_mutex);
+}
+
+static void gfx_v12_0_set_mec_ucode_start_addr(struct amdgpu_device *adev)
+{
+   const struct gfx_firmware_header_v2_0 *cp_hdr;
+   unsigned pipe_id;
+
+   cp_hdr = (const struct gfx_firmware_header_v2_0 *)
+   adev->gfx.mec_fw->data;
+   mutex_lock(&adev->srbm_mutex);
+   for (pipe_id = 0; pipe_id < adev->gfx.mec.num_pipe_per_mec; pipe_id++) {
+   soc24_grbm_select(adev, 1, pipe_id, 0, 0);
+   WREG32_SOC15(GC, 0, regCP_MEC_RS64_PRGRM_CNTR_START,
+cp_hdr->ucode_start_addr_lo >> 2 |
+  

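The PFP and ME helpers in this patch write the ucode start address as
(hi << 30) | (lo >> 2) into the START register and hi >> 2 into the
START_HI register. Read as plain arithmetic, that is the 64-bit byte
address shifted right by two (a dword address) and split into two
32-bit halves. The sketch below checks that reading against a 64-bit
reference; the function names are made up, and it ignores any
endianness conversion of the header fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Split form, mirroring the expressions used in the patch. */
static void sketch_split_start_addr(uint32_t lo, uint32_t hi,
				    uint32_t *start, uint32_t *start_hi)
{
	assert(start != NULL && start_hi != NULL);

	/* Low two bits of hi become the top two bits of the low register. */
	*start = (hi << 30) | (lo >> 2);
	*start_hi = hi >> 2;
}

/* Reference: build the 64-bit byte address, then convert to dwords. */
static uint64_t sketch_dword_addr(uint32_t lo, uint32_t hi)
{
	return (((uint64_t)hi << 32) | lo) >> 2;
}
```

For lo = 0x00001000 and hi = 0x3 both forms give a dword address of
0xC0000400 with a zero high half.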
[PATCH 25/31] drm/amdgpu: use new method to program rlc ram

2024-04-29 Thread Alex Deucher
From: Likun Gao 

Program rlc ram with golden setting data instead.
The old method (program_imu_rlc_ram_old) should be
retired in the future.

Signed-off-by: Likun Gao 
Reviewed-by: Hawking Zhang 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/imu_v12_0.c | 70 ++
 1 file changed, 61 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/imu_v12_0.c 
b/drivers/gpu/drm/amd/amdgpu/imu_v12_0.c
index 7112e4b2d6489..5baef51660637 100644
--- a/drivers/gpu/drm/amd/amdgpu/imu_v12_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/imu_v12_0.c
@@ -33,6 +33,8 @@
 
 MODULE_FIRMWARE("amdgpu/gc_12_0_1_imu.bin");
 
+#define TRANSFER_RAM_MASK  0x001c
+
 static int imu_v12_0_init_microcode(struct amdgpu_device *adev)
 {
char fw_name[40];
@@ -245,9 +247,9 @@ static const struct imu_rlc_ram_golden 
imu_rlc_ram_golden_12_0_1[] = {
IMU_RLC_RAM_GOLDEN_VALUE(GC, 0, 
regGCMC_VM_SYSTEM_APERTURE_DEFAULT_ADDR_MSB, 0, 0x1c)
 };
 
-static void program_imu_rlc_ram(struct amdgpu_device *adev,
-   const struct imu_rlc_ram_golden *regs,
-   const u32 array_size)
+static void program_imu_rlc_ram_old(struct amdgpu_device *adev,
+   const struct imu_rlc_ram_golden *regs,
+   const u32 array_size)
 {
const struct imu_rlc_ram_golden *entry;
u32 reg, data;
@@ -271,21 +273,66 @@ static void program_imu_rlc_ram(struct amdgpu_device 
*adev,
WREG32_SOC15(GC, 0, regGFX_IMU_RLC_RAM_ADDR_LOW, reg);
WREG32_SOC15(GC, 0, regGFX_IMU_RLC_RAM_DATA, data);
}
-   //Indicate the latest entry
-   WREG32_SOC15(GC, 0, regGFX_IMU_RLC_RAM_ADDR_HIGH, 0);
-   WREG32_SOC15(GC, 0, regGFX_IMU_RLC_RAM_ADDR_LOW, 0);
-   WREG32_SOC15(GC, 0, regGFX_IMU_RLC_RAM_DATA, 0);
+}
+
+static u32 imu_v12_0_grbm_gfx_index_remap(struct amdgpu_device *adev,
+ u32 data, bool high)
+{
+   u32 val, inst_index;
+
+   inst_index = REG_GET_FIELD(data, GRBM_GFX_INDEX, INSTANCE_INDEX);
+
+   if (high)
+   val = inst_index >> 5;
+   else
+   val = REG_GET_FIELD(data, GRBM_GFX_INDEX, SE_BROADCAST_WRITES) 
<< 18 |
+ REG_GET_FIELD(data, GRBM_GFX_INDEX, SA_BROADCAST_WRITES) 
<< 19 |
+ REG_GET_FIELD(data, GRBM_GFX_INDEX, 
INSTANCE_BROADCAST_WRITES) << 20 |
+ REG_GET_FIELD(data, GRBM_GFX_INDEX, SE_INDEX) << 21 |
+ REG_GET_FIELD(data, GRBM_GFX_INDEX, SA_INDEX) << 25 |
+ (inst_index & 0x1f);
+
+   return val;
+}
+
+static void program_imu_rlc_ram(struct amdgpu_device *adev,
+   const u32 *regs,
+   const u32 array_size)
+{
+   u32 reg, data, val_h = 0, val_l = TRANSFER_RAM_MASK;
+   int i;
+
+   if (array_size % 3)
+   return;
+
+   for (i = 0; i < array_size; i += 3) {
+   reg = regs[i + 0];
+   data = regs[i + 2];
+   if (reg == SOC15_REG_OFFSET(GC, 0, regGRBM_GFX_INDEX)) {
+   val_l = imu_v12_0_grbm_gfx_index_remap(adev, data, 
false);
+   val_h = imu_v12_0_grbm_gfx_index_remap(adev, data, 
true);
+   } else {
+   WREG32_SOC15(GC, 0, regGFX_IMU_RLC_RAM_ADDR_HIGH, 
val_h);
+   WREG32_SOC15(GC, 0, regGFX_IMU_RLC_RAM_ADDR_LOW, reg | 
val_l);
+   WREG32_SOC15(GC, 0, regGFX_IMU_RLC_RAM_DATA, data);
+   }
+   }
 }
 
 static void imu_v12_0_program_rlc_ram(struct amdgpu_device *adev)
 {
-   u32 reg_data;
+   u32 reg_data, size;
+   const u32 *data;
+   int r = -EINVAL;
 
WREG32_SOC15(GC, 0, regGFX_IMU_RLC_RAM_INDEX, 0x2);
 
switch (amdgpu_ip_version(adev, GC_HWIP, 0)) {
case IP_VERSION(12, 0, 1):
-   program_imu_rlc_ram(adev, imu_rlc_ram_golden_12_0_1,
+   if (!r)
+   program_imu_rlc_ram(adev, data, (const u32)size);
+   else
+   program_imu_rlc_ram_old(adev, imu_rlc_ram_golden_12_0_1,
(const 
u32)ARRAY_SIZE(imu_rlc_ram_golden_12_0_1));
break;
default:
@@ -293,6 +340,11 @@ static void imu_v12_0_program_rlc_ram(struct amdgpu_device 
*adev)
break;
}
 
+   //Indicate the latest entry
+   WREG32_SOC15(GC, 0, regGFX_IMU_RLC_RAM_ADDR_HIGH, 0);
+   WREG32_SOC15(GC, 0, regGFX_IMU_RLC_RAM_ADDR_LOW, 0);
+   WREG32_SOC15(GC, 0, regGFX_IMU_RLC_RAM_DATA, 0);
+
reg_data = RREG32_SOC15(GC, 0, regGFX_IMU_RLC_RAM_INDEX);
reg_data |= GFX_IMU_RLC_RAM_INDEX__RAM_VALID_MASK;
WREG32_SOC15(GC, 0, regGFX_IMU_RLC_RAM_INDEX, reg_data);
-- 
2.44.0
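
The new program_imu_rlc_ram() walks a flat u32 array in steps of three,
using element 0 of each triple as the register and element 2 as the
data, and treats GRBM_GFX_INDEX entries as selector updates instead of
writes. A stripped-down, user-space sketch of that loop is given below.
It records writes into an array instead of touching hardware, the
register offset is a placeholder, and the selector handling is reduced
to storing the data word directly, whereas the patch remaps it through
imu_v12_0_grbm_gfx_index_remap().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder for SOC15_REG_OFFSET(GC, 0, regGRBM_GFX_INDEX). */
#define SKETCH_GRBM_GFX_INDEX 0x2200u

/* One recorded (ADDR_HIGH, ADDR_LOW, DATA) write. */
struct sketch_ram_write {
	uint32_t addr_high;
	uint32_t addr_low;
	uint32_t data;
};

/*
 * Returns the number of recorded writes, or -1 if array_size is not a
 * multiple of three or the output array is too small.
 */
static int sketch_walk_triples(const uint32_t *regs, size_t array_size,
			       uint32_t default_low,
			       struct sketch_ram_write *out, size_t out_cap)
{
	uint32_t val_h = 0, val_l = default_low;
	size_t i, n = 0;

	assert(regs != NULL && out != NULL);

	if (array_size % 3)
		return -1;

	for (i = 0; i < array_size; i += 3) {
		uint32_t reg = regs[i + 0];
		uint32_t data = regs[i + 2];

		if (reg == SKETCH_GRBM_GFX_INDEX) {
			/* Selector entry: changes the bits OR-ed into later writes. */
			val_l = data;
			val_h = 0;
			continue;
		}

		if (n == out_cap)
			return -1;

		out[n].addr_high = val_h;
		out[n].addr_low = reg | val_l;
		out[n].data = data;
		n++;
	}

	return (int)n;
}
```

A selector entry therefore produces no write of its own; it only
affects the address bits of the entries that follow it.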



[PATCH 24/31] drm/amd/amdgpu: add cgcg interface for gfx 12.0

2024-04-29 Thread Alex Deucher
From: Kenneth Feng 

add cgcg interface for gfx 12.0

Signed-off-by: Kenneth Feng 
Reviewed-by: Likun Gao 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c | 190 -
 drivers/gpu/drm/amd/amdgpu/soc24.c |   3 +
 2 files changed, 191 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
index 730d57a10077f..882e00234e33a 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
@@ -1472,7 +1472,7 @@ static void gfx_v12_0_constants_init(struct amdgpu_device 
*adev)
 }
 
 static void gfx_v12_0_enable_gui_idle_interrupt(struct amdgpu_device *adev,
-  bool enable)
+   bool enable)
 {
u32 tmp;
 
@@ -3594,10 +3594,196 @@ static int gfx_v12_0_set_powergating_state(void 
*handle,
return 0;
 }
 
+static void gfx_v12_0_update_coarse_grain_clock_gating(struct amdgpu_device 
*adev,
+  bool enable)
+{
+   uint32_t def, data;
+
+   if (!(adev->cg_flags &
+ (AMD_CG_SUPPORT_GFX_CGCG |
+ AMD_CG_SUPPORT_GFX_CGLS |
+ AMD_CG_SUPPORT_GFX_3D_CGCG |
+ AMD_CG_SUPPORT_GFX_3D_CGLS)))
+   return;
+
+   if (enable) {
+   def = data = RREG32_SOC15(GC, 0, regRLC_CGTT_MGCG_OVERRIDE);
+
+   /* unset CGCG override */
+   if (adev->cg_flags & AMD_CG_SUPPORT_GFX_CGCG)
+   data &= 
~RLC_CGTT_MGCG_OVERRIDE__GFXIP_CGCG_OVERRIDE_MASK;
+   if (adev->cg_flags & AMD_CG_SUPPORT_GFX_CGLS)
+   data &= 
~RLC_CGTT_MGCG_OVERRIDE__GFXIP_CGLS_OVERRIDE_MASK;
+   if (adev->cg_flags & AMD_CG_SUPPORT_GFX_3D_CGCG ||
+   adev->cg_flags & AMD_CG_SUPPORT_GFX_3D_CGLS)
+   data &= 
~RLC_CGTT_MGCG_OVERRIDE__GFXIP_GFX3D_CG_OVERRIDE_MASK;
+
+   /* update CGCG override bits */
+   if (def != data)
+   WREG32_SOC15(GC, 0, regRLC_CGTT_MGCG_OVERRIDE, data);
+
+   /* enable cgcg FSM(0x363F) */
+   def = data = RREG32_SOC15(GC, 0, regRLC_CGCG_CGLS_CTRL);
+
+   if (adev->cg_flags & AMD_CG_SUPPORT_GFX_CGCG) {
+   data &= 
~RLC_CGCG_CGLS_CTRL__CGCG_GFX_IDLE_THRESHOLD_MASK;
+   data |= (0x36 << 
RLC_CGCG_CGLS_CTRL__CGCG_GFX_IDLE_THRESHOLD__SHIFT) |
+RLC_CGCG_CGLS_CTRL__CGCG_EN_MASK;
+   }
+
+   if (adev->cg_flags & AMD_CG_SUPPORT_GFX_CGLS) {
+   data &= 
~RLC_CGCG_CGLS_CTRL__CGLS_REP_COMPANSAT_DELAY_MASK;
+   data |= (0x000F << 
RLC_CGCG_CGLS_CTRL__CGLS_REP_COMPANSAT_DELAY__SHIFT) |
+RLC_CGCG_CGLS_CTRL__CGLS_EN_MASK;
+   }
+
+   if (def != data)
+   WREG32_SOC15(GC, 0, regRLC_CGCG_CGLS_CTRL, data);
+
+   /* Program RLC_CGCG_CGLS_CTRL_3D */
+   def = data = RREG32_SOC15(GC, 0, regRLC_CGCG_CGLS_CTRL_3D);
+
+   if (adev->cg_flags & AMD_CG_SUPPORT_GFX_3D_CGCG) {
+   data &= 
~RLC_CGCG_CGLS_CTRL_3D__CGCG_GFX_IDLE_THRESHOLD_MASK;
+   data |= (0x36 << 
RLC_CGCG_CGLS_CTRL_3D__CGCG_GFX_IDLE_THRESHOLD__SHIFT) |
+RLC_CGCG_CGLS_CTRL_3D__CGCG_EN_MASK;
+   }
+
+   if (adev->cg_flags & AMD_CG_SUPPORT_GFX_3D_CGLS) {
+   data &= 
~RLC_CGCG_CGLS_CTRL_3D__CGLS_REP_COMPANSAT_DELAY_MASK;
+   data |= (0xf << 
RLC_CGCG_CGLS_CTRL_3D__CGLS_REP_COMPANSAT_DELAY__SHIFT) |
+RLC_CGCG_CGLS_CTRL_3D__CGLS_EN_MASK;
+   }
+
+   if (def != data)
+   WREG32_SOC15(GC, 0, regRLC_CGCG_CGLS_CTRL_3D, data);
+
+   /* set IDLE_POLL_COUNT(0x00900100) */
+   def = data = RREG32_SOC15(GC, 0, regCP_RB_WPTR_POLL_CNTL);
+
+   data &= ~(CP_RB_WPTR_POLL_CNTL__POLL_FREQUENCY_MASK | 
CP_RB_WPTR_POLL_CNTL__IDLE_POLL_COUNT_MASK);
+   data |= (0x0100 << CP_RB_WPTR_POLL_CNTL__POLL_FREQUENCY__SHIFT) 
|
+   (0x0090 << 
CP_RB_WPTR_POLL_CNTL__IDLE_POLL_COUNT__SHIFT);
+
+   if (def != data)
+   WREG32_SOC15(GC, 0, regCP_RB_WPTR_POLL_CNTL, data);
+
+   data = RREG32_SOC15(GC, 0, regCP_INT_CNTL);
+   data = REG_SET_FIELD(data, CP_INT_CNTL, CNTX_BUSY_INT_ENABLE, 
1);
+   data = REG_SET_FIELD(data, CP_INT_CNTL, CNTX_EMPTY_INT_ENABLE, 
1);
+   data = REG_SET_FIELD(data, CP_INT_CNTL, CMP_BUSY_INT_ENABLE, 1);
+   data = REG_SET_FIELD(data, CP_INT_CNTL, GFX_IDLE_INT_ENABLE, 1);
+   WREG32_SOC15(GC, 0, regCP_INT_CNTL, data);
+
+  

[PATCH 18/31] drm/amdgpu/gfx12: recalculate available compute rings to use

2024-04-29 Thread Alex Deucher
From: Jack Xiao 

Recalculate the number of compute rings to use based on
the gfx hardware configuration. Half of the compute rings need to be
reserved for mes, so kgd can't use up all compute rings.

Signed-off-by: Jack Xiao 
Reviewed-by: Hawking Zhang 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
index f3f8601d6e184..06244d97c2831 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
@@ -1103,6 +1103,7 @@ static int gfx_v12_0_rlc_backdoor_autoload_enable(struct 
amdgpu_device *adev)
 static int gfx_v12_0_sw_init(void *handle)
 {
int i, j, k, r, ring_id = 0;
+   unsigned num_compute_rings;
int xcc_id = 0;
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
 
@@ -1126,6 +1127,12 @@ static int gfx_v12_0_sw_init(void *handle)
break;
}
 
+   /* recalculate compute rings to use based on hardware configuration */
+   num_compute_rings = (adev->gfx.mec.num_pipe_per_mec *
+adev->gfx.mec.num_queue_per_pipe) / 2;
+   adev->gfx.num_compute_rings = min(adev->gfx.num_compute_rings,
+ num_compute_rings);
+
/* EOP Event */
r = amdgpu_irq_add_id(adev, SOC21_IH_CLIENTID_GRBM_CP,
  GFX_11_0_0__SRCID__CP_EOP_INTERRUPT,
-- 
2.44.0
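
The clamp added to gfx_v12_0_sw_init() can be read as: the rings
available to kgd are half of (pipes per MEC * queues per pipe), and the
configured ring count is capped at that value. A tiny stand-alone sketch
of the calculation, with a hypothetical function name and plain
unsigned ints in place of the adev fields:

```c
#include <assert.h>
#include <limits.h>

/* Cap the requested ring count at half of the hardware queues. */
static unsigned int sketch_kgd_compute_rings(unsigned int requested,
					     unsigned int pipes_per_mec,
					     unsigned int queues_per_pipe)
{
	unsigned int available;

	/* Guard against overflow in the product; not needed for real HW sizes. */
	assert(queues_per_pipe == 0 ||
	       pipes_per_mec <= UINT_MAX / queues_per_pipe);

	available = (pipes_per_mec * queues_per_pipe) / 2;

	return requested < available ? requested : available;
}
```

For example, with 4 pipes of 8 queues each, 16 rings are available and a
request for 8 is left unchanged, while with 2 pipes of 4 queues the same
request is reduced to 4.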



[PATCH 06/31] drm/amdgpu: Add mes_v12_api_def.h for gfx12

2024-04-29 Thread Alex Deucher
From: Harish Kasiviswanathan 

Add MES_v12 header definition for gfx12

v2: Modify SET_SHADER_DEBUGGER to match mes_v11 definition. This doesn't
change the structure layout

v3: Removed unnecessary comment and spaces

Signed-off-by: Harish Kasiviswanathan 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/include/mes_v12_api_def.h | 775 ++
 1 file changed, 775 insertions(+)
 create mode 100644 drivers/gpu/drm/amd/include/mes_v12_api_def.h

diff --git a/drivers/gpu/drm/amd/include/mes_v12_api_def.h 
b/drivers/gpu/drm/amd/include/mes_v12_api_def.h
new file mode 100644
index 0..81cc0a5540492
--- /dev/null
+++ b/drivers/gpu/drm/amd/include/mes_v12_api_def.h
@@ -0,0 +1,775 @@
+/*
+ * Copyright 2023 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#ifndef __MES_API_DEF_H__
+#define __MES_API_DEF_H__
+
+#pragma pack(push, 8)
+
+#define MES_API_VERSION 0x14
+
+/* Driver submits one API(cmd) as a single Frame and this command size is same 
for all API
+ * to ease the debugging and parsing of ring buffer.
+ */
+enum {API_FRAME_SIZE_IN_DWORDS = 64};
+
+/* To avoid command in scheduler context to be overwritten whenever multiple 
interrupts come in,
+ * this creates another queue
+ */
+enum {API_NUMBER_OF_COMMAND_MAX   = 32};
+
+enum MES_API_TYPE {
+   MES_API_TYPE_SCHEDULER = 1,
+   MES_API_TYPE_MAX
+};
+
+enum MES_SCH_API_OPCODE {
+   MES_SCH_API_SET_HW_RSRC = 0,
+   MES_SCH_API_SET_SCHEDULING_CONFIG   = 1, /* aggregated db, 
quantums, etc */
+   MES_SCH_API_ADD_QUEUE   = 2,
+   MES_SCH_API_REMOVE_QUEUE= 3,
+   MES_SCH_API_PERFORM_YIELD   = 4,
+   MES_SCH_API_SET_GANG_PRIORITY_LEVEL = 5, /* For windows GANG = Context */
+   MES_SCH_API_SUSPEND = 6,
+   MES_SCH_API_RESUME  = 7,
+   MES_SCH_API_RESET   = 8,
+   MES_SCH_API_SET_LOG_BUFFER  = 9,
+   MES_SCH_API_CHANGE_GANG_PRORITY = 10,
+   MES_SCH_API_QUERY_SCHEDULER_STATUS  = 11,
+   MES_SCH_API_SET_DEBUG_VMID  = 13,
+   MES_SCH_API_MISC= 14,
+   MES_SCH_API_UPDATE_ROOT_PAGE_TABLE  = 15,
+   MES_SCH_API_AMD_LOG = 16,
+   MES_SCH_API_SET_SE_MODE = 17,
+   MES_SCH_API_SET_GANG_SUBMIT = 18,
+
+   MES_SCH_API_MAX = 0xFF
+};
+
+union MES_API_HEADER {
+   struct {
+   uint32_t type : 4; /* 0 - Invalid; 1 - Scheduling; 2 - TBD */
+   uint32_t opcode   : 8;
+   uint32_t dwsize   : 8; /* including header */
+   uint32_t reserved : 12;
+   };
+
+   uint32_t u32All;
+};
+
+enum MES_AMD_PRIORITY_LEVEL {
+   AMD_PRIORITY_LEVEL_LOW  = 0,
+   AMD_PRIORITY_LEVEL_NORMAL   = 1,
+   AMD_PRIORITY_LEVEL_MEDIUM   = 2,
+   AMD_PRIORITY_LEVEL_HIGH = 3,
+   AMD_PRIORITY_LEVEL_REALTIME = 4,
+
+   AMD_PRIORITY_NUM_LEVELS
+};
+
+enum MES_QUEUE_TYPE {
+   MES_QUEUE_TYPE_GFX,
+   MES_QUEUE_TYPE_COMPUTE,
+   MES_QUEUE_TYPE_SDMA,
+
+   MES_QUEUE_TYPE_MAX,
+};
+
+struct MES_API_STATUS {
+   uint64_t api_completion_fence_addr;
+   uint64_t api_completion_fence_value;
+};
+
+
+enum { MAX_COMPUTE_PIPES = 8 };
+enum { MAX_GFX_PIPES= 2 };
+enum { MAX_SDMA_PIPES   = 2 };
+
+enum { MAX_COMPUTE_HQD_PER_PIPE= 8 };
+enum { MAX_GFX_HQD_PER_PIPE= 8 };
+enum { MAX_SDMA_HQD_PER_PIPE   = 10 };
+enum { MAX_SDMA_HQD_PER_PIPE_11_0  = 8 };
+
+
+enum { MAX_QUEUES_IN_A_GANG = 8 };
+
+enum VM_HUB_TYPE {
+   VM_HUB_TYPE_GC = 0,
+   VM_HUB_TYPE_MM = 1,
+
+   VM_HUB_TYPE_MAX,
+};
+
+enum { VMID_INVALID = 0xffff };
+
+enum { MAX_VMID_GCHUB = 16 };
+enum { MAX_VMID_MMHUB = 16 };
+
+enum SET_DEBUG_VMID_OPERATIONS {

[PATCH 30/31] drm/amdgpu: Enable unmapped doorbell handling basic mode on mes 12

2024-04-29 Thread Alex Deucher
From: shaoyunl 

Enable basic mode handling for doorbell rings on unmapped CP queues.
In this mode, MES can start scheduling the queue mapping based on a HW
interrupt instead of a timer.

Signed-off-by: shaoyunl 
Reviewed-by: Harish Kasiviswanthan 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/mes_v12_0.c| 16 +++-
 drivers/gpu/drm/amd/include/mes_v12_api_def.h |  3 ++-
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
index 2d713e7b976aa..4a041cc22f68a 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
@@ -394,7 +394,14 @@ static int mes_v12_0_set_hw_resources(struct amdgpu_mes *mes)
mes_set_hw_res_pkt.disable_mes_log = 1;
mes_set_hw_res_pkt.use_different_vmid_compute = 1;
mes_set_hw_res_pkt.enable_reg_active_poll = 1;
-   mes_set_hw_res_pkt.oversubscription_timer = 50;
+
+   /*
+* No need to enable oversubscribe timer when we have unmapped doorbell
+* handling support.
+* handling  mode - 0: disabled; 1: basic version; 2: basic+ version
+*/
+   mes_set_hw_res_pkt.oversubscription_timer = 0;
+   mes_set_hw_res_pkt.unmapped_doorbell_handling = 1;
 
return mes_v12_0_submit_pkt_and_poll_completion(mes,
&mes_set_hw_res_pkt, sizeof(mes_set_hw_res_pkt),
@@ -831,6 +838,13 @@ static int mes_v12_0_mqd_init(struct amdgpu_ring *ring)
mqd->cp_hqd_iq_timer = regCP_HQD_IQ_TIMER_DEFAULT;
mqd->cp_hqd_quantum = regCP_HQD_QUANTUM_DEFAULT;
 
+   /*
+* Set CP_HQD_GFX_CONTROL.DB_UPDATED_MSG_EN[15] to enable unmapped
+* doorbell handling. This is a reserved CP internal register can
+* not be accesss by others
+*/
+   mqd->reserved_184 = BIT(15);
+
return 0;
 }
 
diff --git a/drivers/gpu/drm/amd/include/mes_v12_api_def.h b/drivers/gpu/drm/amd/include/mes_v12_api_def.h
index 81cc0a5540492..2cdecf937acef 100644
--- a/drivers/gpu/drm/amd/include/mes_v12_api_def.h
+++ b/drivers/gpu/drm/amd/include/mes_v12_api_def.h
@@ -238,7 +238,8 @@ union MESAPI_SET_HW_RESOURCES {
uint32_t send_write_data : 1;
uint32_t os_tdr_timeout_override : 1;
uint32_t use_rs64mem_for_proc_gang_ctx : 1;
-   uint32_t reserved : 17;
+   uint32_t unmapped_doorbell_handling: 2;
+   uint32_t reserved : 15;
};
uint32_t uint32_all;
};
-- 
2.44.0
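
A side note on the bitfield hunk above: the new 2-bit field is carved out of
the reserved bits (17 -> 2 + 15), so the flags word should stay exactly one
dword. A minimal sketch of that arithmetic follows. The union names and the
`earlier_flags` placeholder are hypothetical; the placeholder assumes the
fields ahead of the change add up to 15 bits (32 - 17), which the hunk implies
but does not show.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reduced layout: `earlier_flags` stands in for whatever
 * fields precede the changed ones in the real union (assumed 15 bits). */
union hw_res_flags_old {
	struct {
		uint32_t earlier_flags : 15;
		uint32_t reserved      : 17;
	};
	uint32_t uint32_all;
};

union hw_res_flags_new {
	struct {
		uint32_t earlier_flags              : 15;
		uint32_t unmapped_doorbell_handling : 2;  /* 0: disabled, 1: basic, 2: basic+ */
		uint32_t reserved                   : 15;
	};
	uint32_t uint32_all;
};

/* Both layouts must still occupy exactly one 32-bit word. */
_Static_assert(sizeof(union hw_res_flags_old) == sizeof(uint32_t),
	       "old layout is one dword");
_Static_assert(sizeof(union hw_res_flags_new) == sizeof(uint32_t),
	       "new layout is one dword");

/* Stores a mode in the 2-bit field and returns what reads back. */
static unsigned int store_doorbell_mode(unsigned int mode)
{
	union hw_res_flags_new f = { .uint32_all = 0 };

	f.unmapped_doorbell_handling = mode;
	return f.unmapped_doorbell_handling;
}
```

Two bits are enough for the three modes named in the comment (0, 1, 2) and
leave one spare encoding.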



[PATCH 29/31] drm/amdgpu: Switch to smuio func to get gpu clk counter

2024-04-29 Thread Alex Deucher
From: Hawking Zhang 

Switch to smuio callback to query gpu clock counter

Signed-off-by: Hawking Zhang 
Reviewed-by: Likun Gao 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c | 14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
index 33fe519e617d6..c94ed3b929cb4 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
@@ -36,8 +36,6 @@
 
 #include "gc/gc_12_0_0_offset.h"
 #include "gc/gc_12_0_0_sh_mask.h"
-#include "smuio/smuio_14_0_2_offset.h"
-#include "smuio/smuio_14_0_2_sh_mask.h"
 #include "soc24_enum.h"
 #include "ivsrcid/gfx/irqsrcs_gfx_11_0_0.h"
 
@@ -3469,12 +3467,12 @@ static uint64_t gfx_v12_0_get_gpu_clock_counter(struct amdgpu_device *adev)
 {
uint64_t clock;
 
-   amdgpu_gfx_off_ctrl(adev, false);
-   mutex_lock(&adev->gfx.gpu_clock_mutex);
-   clock = (uint64_t)RREG32_SOC15(SMUIO, 0, regGOLDEN_TSC_COUNT_LOWER) |
-   ((uint64_t)RREG32_SOC15(SMUIO, 0, regGOLDEN_TSC_COUNT_UPPER) << 32ULL);
-   mutex_unlock(&adev->gfx.gpu_clock_mutex);
-   amdgpu_gfx_off_ctrl(adev, true);
+   if (adev->smuio.funcs &&
+   adev->smuio.funcs->get_gpu_clock_counter)
+   clock = adev->smuio.funcs->get_gpu_clock_counter(adev);
+   else
+   dev_warn(adev->dev, "query gpu clock counter is not supported\n");
+
return clock;
 }
 
-- 
2.44.0
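
One observation on the hunk above: `clock` is only assigned when the smuio
callback exists, so the warning path returns whatever happened to be in the
variable. A minimal sketch of the same optional-callback dispatch with a
defined fallback is below. All type and function names here are hypothetical
stand-ins, and returning 0 when the callback is missing is an assumption, not
something the patch specifies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct device;  /* hypothetical stand-in for the driver's device struct */

struct clock_funcs {
	uint64_t (*get_gpu_clock_counter)(struct device *dev);
};

struct device {
	const struct clock_funcs *funcs;  /* may be NULL */
};

/* Dispatches to the callback if present, otherwise returns 0. */
static uint64_t get_gpu_clock_counter(struct device *dev)
{
	uint64_t clock = 0;  /* defined value for the unsupported case */

	if (dev->funcs && dev->funcs->get_gpu_clock_counter)
		clock = dev->funcs->get_gpu_clock_counter(dev);

	return clock;
}

/* Example callback: combines an upper and a lower 32-bit half. */
static uint64_t fake_counter(struct device *dev)
{
	(void)dev;
	return ((uint64_t)0x1 << 32) | 0x2345;
}
```

With the initializer in place, callers see a stable value on hardware that
does not provide the callback instead of an indeterminate one.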



[PATCH 02/31] drm/amdgpu/discovery: Set GC family for GC 12.0 IP

2024-04-29 Thread Alex Deucher
From: Likun Gao 

Set GC family for GC 12.0 IPs.

v2: squash in updates (Alex)

Signed-off-by: Likun Gao 
Reviewed-by: Hawking Zhang 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index 2aad1ba0ab9d2..7187968226a81 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -2503,6 +2503,10 @@ int amdgpu_discovery_set_ip_blocks(struct amdgpu_device *adev)
case IP_VERSION(11, 5, 1):
adev->family = AMDGPU_FAMILY_GC_11_5_0;
break;
+   case IP_VERSION(12, 0, 0):
+   case IP_VERSION(12, 0, 1):
+   adev->family = AMDGPU_FAMILY_GC_12_0_0;
+   break;
default:
return -EINVAL;
}
-- 
2.44.0
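
The switch above maps a GC IP version to a family id, with two versions
sharing one family. A standalone sketch of that mapping follows. The
`SKETCH_` names are hypothetical, the version packing is an assumption for
illustration (the kernel's IP_VERSION() macro may pack differently), and the
family values are the ones listed in the amdgpu_drm.h hunk of patch 01/31.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed packing, for illustration only. */
#define SKETCH_IP_VERSION(maj, min, rev) \
	(((uint32_t)(maj) << 16) | ((uint32_t)(min) << 8) | (uint32_t)(rev))

/* Family ids as listed in the uapi header hunk of patch 01/31. */
#define SKETCH_FAMILY_GC_11_5_0 150
#define SKETCH_FAMILY_GC_12_0_0 152

/* Returns the family id for a GC version, or -EINVAL if unknown. */
static int family_for_gc_version(uint32_t gc_version)
{
	switch (gc_version) {
	case SKETCH_IP_VERSION(11, 5, 1):
		return SKETCH_FAMILY_GC_11_5_0;
	case SKETCH_IP_VERSION(12, 0, 0):
	case SKETCH_IP_VERSION(12, 0, 1):
		return SKETCH_FAMILY_GC_12_0_0;
	default:
		return -EINVAL;
	}
}
```

Both 12.0.0 and 12.0.1 fall through to the same family, matching the stacked
case labels in the patch; anything not listed takes the default error path.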



Re: [PATCH 3/3] drm/amdgpu: Fix pinned GART area accounting and fdinfo reporting

2024-04-29 Thread Felix Kuehling




On 2024-04-29 9:45, Tvrtko Ursulin wrote:


On 29/04/2024 12:11, Christian König wrote:

Am 29.04.24 um 11:43 schrieb Tvrtko Ursulin:


On 26/04/2024 23:24, Felix Kuehling wrote:


On 2024-04-26 12:43, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

When commit b453e42a6e8b ("drm/amdgpu: Add new placement for preemptible
SG BOs") added a new TTM region it missed to notice the conceptual
imbalance in GART pin size accounting as done in amdgpu_bo_pin/unpin.

That imbalance leads to such objects getting accounted against the
resource, but are not un-accounted when unpinned.


AMDGPU_PL_PREEMPT is mostly used for userptr BOs, which cannot be 
pinned. In any case you should make sure that the accounting is 
consistent between amdgpu_bo_pin_restricted and amdgpu_bo_unpin. 
This patch breaks that consistency.


You mean amdgpu_bo_pin(_restricted) and amdgpu_bo_unpin do not run 
for such objects, or something else?


If they run, then at the end of pin there is:

domain = amdgpu_mem_type_to_domain(bo->tbo.resource->mem_type);
...
} else if (domain == AMDGPU_GEM_DOMAIN_GTT) {
    atomic64_add(amdgpu_bo_size(bo), &adev->gart_pin_size);

And unpin has no handling for AMDGPU_PL_PREEMPT.

Ah I see.. does it rely on amdgpu_mem_type_to_domain returning 0 for 
AMDGPU_PL_PREEMPT? My confusion was I misread the pinning check as 
checking the domain as stored in the bo at creation time.


Although I am still confused by the statement userptr BOs are not 
pinned. It is not needed to map them via GART on AMD hardware for GPU 
to be able to access them?


No, a GART mapping is only needed if you want to scanout from them or 
otherwise use them from the kernel on the GPU.


Background is that the kernel doesn't have a VM with page tables.


Got it, thanks!

Presumably somewhere else in the code then it is prevented to call 
pin/unpin on those?


I was referring to this condition in amdgpu_bo_pin_restricted:

if (amdgpu_ttm_tt_get_usermm(bo->tbo.ttm))
return -EPERM;

However, when I look into it more, I see that AMDGPU_PL_PREEMPT is used 
for other SG BOs that actually are pinned, specifically BOs created by 
KFD with KFD_IOC_ALLOC_MEM_FLAGS_DOORBELL or 
KFD_IOC_ALLOC_MEM_FLAGS_MMIO_REMAP. These are very small BOs (one or two 
pages), and only one per process, per GPU, so I'm not sure it's worth 
adding special handling for them in the BO pin accounting.


Regards,
  Felix




What to do, if anything, with the attempt to address the asymmetry in 
the accounting criteria between the pin and unpin?


I mean domain based on pin:

 domain = amdgpu_mem_type_to_domain(bo->tbo.resource->mem_type);
 if (domain == AMDGPU_GEM_DOMAIN_VRAM) {
     atomic64_add(amdgpu_bo_size(bo), &adev->vram_pin_size);
     atomic64_add(amdgpu_vram_mgr_bo_visible_size(bo),
                  &adev->visible_pin_size);
 } else if (domain == AMDGPU_GEM_DOMAIN_GTT) {
     atomic64_add(amdgpu_bo_size(bo), &adev->gart_pin_size);
 }

Versus placement based on unpin:

 if (bo->tbo.resource->mem_type == TTM_PL_VRAM) {
     atomic64_sub(amdgpu_bo_size(bo), &adev->vram_pin_size);
     atomic64_sub(amdgpu_vram_mgr_bo_visible_size(bo),
                  &adev->visible_pin_size);
 } else if (bo->tbo.resource->mem_type == TTM_PL_TT) {
     atomic64_sub(amdgpu_bo_size(bo), &adev->gart_pin_size);
 }

The fact amdgpu_mem_type_to_domain never translates back to 
AMDGPU_PL_PREEMPT means there is indeed currently no bug.
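
To make the "no bug today" argument concrete, here is a small standalone
model of the two criteria quoted above. It is only a sketch: the enum and
function names are hypothetical, and the mapping assumes (as discussed in
this thread) that the preemptible placement translates to neither the GTT nor
the VRAM domain.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the TTM placements and GEM domains. */
enum mem_type { PL_SYSTEM, PL_TT, PL_VRAM, PL_PREEMPT };
enum domain   { DOMAIN_NONE = 0, DOMAIN_GTT, DOMAIN_VRAM };

struct pin_stats {
	int64_t vram_pin_size;
	int64_t gart_pin_size;
};

/* Assumed mapping: only TT and VRAM translate to a pinned domain. */
static enum domain mem_type_to_domain(enum mem_type t)
{
	switch (t) {
	case PL_VRAM: return DOMAIN_VRAM;
	case PL_TT:   return DOMAIN_GTT;
	default:      return DOMAIN_NONE;
	}
}

/* Pin side: accounts by the domain derived from the placement. */
static void account_pin(struct pin_stats *s, enum mem_type t, int64_t size)
{
	enum domain d = mem_type_to_domain(t);

	if (d == DOMAIN_VRAM)
		s->vram_pin_size += size;
	else if (d == DOMAIN_GTT)
		s->gart_pin_size += size;
}

/* Unpin side: accounts by the placement directly. */
static void account_unpin(struct pin_stats *s, enum mem_type t, int64_t size)
{
	if (t == PL_VRAM)
		s->vram_pin_size -= size;
	else if (t == PL_TT)
		s->gart_pin_size -= size;
}

/* Returns 1 if pin followed by unpin leaves both counters at zero. */
static int pin_unpin_balanced(enum mem_type t, int64_t size)
{
	struct pin_stats s = { 0, 0 };

	account_pin(&s, t, size);
	account_unpin(&s, t, size);
	return s.vram_pin_size == 0 && s.gart_pin_size == 0;
}
```

Under that assumption every placement balances, including the preemptible
one, because it is skipped on both sides. The two criteria only diverge if
the mapping ever starts returning GTT for the preemptible placement.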


Is 2/3 still desirable to convert the check in pin to be mem_type based?


Fix by extending the accounting criteria in amdgpu_bo_unpin.

What also appears to need fixing is not reporting their size from
amdgpu_bo_get_memory, which is used to implement fdinfo stats, so they are
not mixed with the regular userspace created and driver owned objects.


I think that's true. It's a very fine distinction. AMDGPU_PL_PREEMPT 
does use system memory and it is GPU accessible, just like GTT. The 
only difference is, that it's not subject to the GTT limits because 
their eviction is handled by callbacks other than TTM evictions and 
doesn't need to wait for fences.


As in you think those two hunks of the patch are correct?


I think so as well, yes. But we still need a name for preemptible BOs 
while printing them in debugfs.


Currently it looks the name is 'CPU':

amdgpu_bo_print_info()
...
     case AMDGPU_GEM_DOMAIN_CPU:
     default:
     placement = "CPU";
     break;


Also, where to account them in struct amdgpu_mem_stats?

Regards,

Tvrtko



Regards,
Christian.



Regards,

Tvrtko



Regards,
   Felix




And also amdgpu_bo_print_info for debugfs reporting.

Note that the patch depends on the previous one which broke down the
relevant checks from the domain based to placement based.

Signed-off-by: Tvrtko Ursulin 
Fixes: b453e42a6e8b ("drm/amdgpu: Add new placement for preemptible SG BOs")

Cc: Felix Kuehling 
Cc: Christian König 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 5 

[PATCH 16/31] drm/amd/amdgpu: imu fw loading support

2024-04-29 Thread Alex Deucher
From: Kenneth Feng 

support imu related function for gfx v12.

Signed-off-by: Kenneth Feng 
Signed-off-by: Likun Gao 
Reviewed-by: Hawking Zhang 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/Makefile|   3 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c |   4 +-
 drivers/gpu/drm/amd/amdgpu/imu_v12_0.c | 303 +
 drivers/gpu/drm/amd/amdgpu/imu_v12_0.h |  30 +++
 4 files changed, 337 insertions(+), 3 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/imu_v12_0.c
 create mode 100644 drivers/gpu/drm/amd/amdgpu/imu_v12_0.h

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
index a6b5cb32ddf9a..099a47b3e0496 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -168,7 +168,8 @@ amdgpu-y += \
gfx_v11_0.o \
gfx_v11_0_3.o \
imu_v11_0_3.o \
-   gfx_v12_0.o
+   gfx_v12_0.o \
+   imu_v12_0.o
 
 # add async DMA block
 amdgpu-y += \
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
index afb977e1dfc81..1253053d10339 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
@@ -30,6 +30,7 @@
 #include "amdgpu_psp.h"
 #include "amdgpu_smu.h"
 #include "amdgpu_atomfirmware.h"
+#include "imu_v12_0.h"
 #include "soc24.h"
 #include "nvd.h"
 
@@ -4523,8 +4524,7 @@ static void gfx_v12_0_set_imu_funcs(struct amdgpu_device *adev)
else
adev->gfx.imu.mode = DEBUG_MODE;
 
-   /* TODO */
-   //adev->gfx.imu.funcs = &gfx_v12_0_imu_funcs;
+   adev->gfx.imu.funcs = &gfx_v12_0_imu_funcs;
 }
 
 static void gfx_v12_0_set_rlc_funcs(struct amdgpu_device *adev)
diff --git a/drivers/gpu/drm/amd/amdgpu/imu_v12_0.c b/drivers/gpu/drm/amd/amdgpu/imu_v12_0.c
new file mode 100644
index 0000000000000..be140ee4d9173
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/imu_v12_0.c
@@ -0,0 +1,303 @@
+/*
+ * Copyright 2023 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#include 
+#include "amdgpu.h"
+#include "amdgpu_imu.h"
+#include "amdgpu_dpm.h"
+
+#include "imu_v12_0.h"
+
+#include "gc/gc_12_0_0_offset.h"
+#include "gc/gc_12_0_0_sh_mask.h"
+
+MODULE_FIRMWARE("amdgpu/gc_12_0_1_imu.bin");
+
+static int imu_v12_0_init_microcode(struct amdgpu_device *adev)
+{
+   char fw_name[40];
+   char ucode_prefix[30];
+   int err;
+   const struct imu_firmware_header_v1_0 *imu_hdr;
+   struct amdgpu_firmware_info *info = NULL;
+
+   DRM_DEBUG("\n");
+
+   amdgpu_ucode_ip_version_decode(adev, GC_HWIP, ucode_prefix, sizeof(ucode_prefix));
+
+   snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_imu.bin", ucode_prefix);
+   err = amdgpu_ucode_request(adev, &adev->gfx.imu_fw, fw_name);
+   if (err)
+   goto out;
+   imu_hdr = (const struct imu_firmware_header_v1_0 *)adev->gfx.imu_fw->data;
+   adev->gfx.imu_fw_version = le32_to_cpu(imu_hdr->header.ucode_version);
+
+   if (adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) {
+   info = &adev->firmware.ucode[AMDGPU_UCODE_ID_IMU_I];
+   info->ucode_id = AMDGPU_UCODE_ID_IMU_I;
+   info->fw = adev->gfx.imu_fw;
+   adev->firmware.fw_size +=
+   ALIGN(le32_to_cpu(imu_hdr->imu_iram_ucode_size_bytes), PAGE_SIZE);
+   info = &adev->firmware.ucode[AMDGPU_UCODE_ID_IMU_D];
+   info->ucode_id = AMDGPU_UCODE_ID_IMU_D;
+   info->fw = adev->gfx.imu_fw;
+   adev->firmware.fw_size +=
+   ALIGN(le32_to_cpu(imu_hdr->imu_dram_ucode_size_bytes), PAGE_SIZE);
+   }
+
+out:
+   if (err) {
+   dev_err(adev->dev,
+   "gfx12: Failed to load firmware \"%s\"\n",
+   fw_name);
+   amdgpu_ucode_release(&adev->gfx.imu_fw);
+   }
+
+   return err;
+}
+
+static int 
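
The init_microcode function above grows fw_size by each IMU image size
rounded up to a whole page. A minimal sketch of that rounding is below. The
helper names are hypothetical and the 4096-byte page size is an assumption
for the example; the kernel uses its own ALIGN() and PAGE_SIZE.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size for the example */

/* Rounds `size` up to the next multiple of `align` (a power of two). */
static uint32_t align_up(uint32_t size, uint32_t align)
{
	return (size + align - 1) & ~(align - 1);
}

/* Sum of the page-aligned IRAM and DRAM image sizes. */
static uint32_t imu_fw_reserved_bytes(uint32_t iram_bytes, uint32_t dram_bytes)
{
	return align_up(iram_bytes, SKETCH_PAGE_SIZE) +
	       align_up(dram_bytes, SKETCH_PAGE_SIZE);
}
```

For example, a 100-byte IRAM image and a 5000-byte DRAM image would reserve
one page plus two pages under these assumptions.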

[PATCH 12/31] drm/amdgpu: fix trap enablement for gfx12

2024-04-29 Thread Alex Deucher
From: Jonathan Kim 

Fix request to MES to set SQ_SHADER_TBA_HI.trap_en for GFX12.

Signed-off-by: Jonathan Kim 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/mes_v12_0.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
index d8ccf580bcf4b..8ab85e6231922 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
@@ -336,6 +336,7 @@ static int mes_v12_0_misc_op(struct amdgpu_mes *mes,
memcpy(misc_pkt.set_shader_debugger.tcp_watch_cntl,
input->set_shader_debugger.tcp_watch_cntl,
sizeof(misc_pkt.set_shader_debugger.tcp_watch_cntl));
+   misc_pkt.set_shader_debugger.trap_en = input->set_shader_debugger.trap_en;
break;
default:
DRM_ERROR("unsupported misc op (%d) \n", input->op);
-- 
2.44.0



[PATCH 13/31] drm/amdgpu/mes12: update data cache boundary

2024-04-29 Thread Alex Deucher
From: Jack Xiao 

Enlarge the data cache boundary.

v2: use the fix data cache boundary.

Signed-off-by: Jack Xiao 
Reviewed-by: Hawking Zhang 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/mes_v12_0.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
index 8ab85e6231922..2d713e7b976aa 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
@@ -685,8 +685,8 @@ static int mes_v12_0_load_microcode(struct amdgpu_device *adev,
WREG32_SOC15(GC, 0, regCP_MES_MDBASE_HI,
 upper_32_bits(adev->mes.data_fw_gpu_addr[pipe]));
 
-   /* Set 0x3FFFF (256K-1) to CP_MES_MDBOUND_LO */
-   WREG32_SOC15(GC, 0, regCP_MES_MDBOUND_LO, 0x3FFFF);
+   /* Set data cache boundary CP_MES_MDBOUND_LO */
+   WREG32_SOC15(GC, 0, regCP_MES_MDBOUND_LO, 0x7FFFF);
 
if (prime_icache) {
/* invalidate ICACHE */
-- 
2.44.0
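
The removed comment describes the old bound as 256K-1, i.e. the last offset
inside a 256 KiB window. A tiny sketch of that arithmetic is below. The
helper name is hypothetical, and reading the enlarged bound as the matching
512 KiB window is an inference from the values, not something the commit
message states.

```c
#include <assert.h>
#include <stdint.h>

/* Largest offset inside a window of `kib` KiB (size minus one). */
static uint32_t bound_for_kib(uint32_t kib)
{
	return kib * 1024u - 1u;
}
```

Under that reading, bound_for_kib(256) gives 0x3FFFF and bound_for_kib(512)
gives 0x7FFFF, so the change doubles the addressable data cache window.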



[PATCH 01/31] drm/amdgpu: Add gfx v12_0_0 family id

2024-04-29 Thread Alex Deucher
From: Likun Gao 

Add gfx v12_0_0 family id

Signed-off-by: Likun Gao 
Reviewed-by: Hawking Zhang 
Signed-off-by: Alex Deucher 
---
 include/uapi/drm/amdgpu_drm.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index feb47623458a8..5b6c0055cfcf8 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -1279,7 +1279,7 @@ struct drm_amdgpu_info_gpuvm_fault {
 #define AMDGPU_FAMILY_GC_10_3_6	149 /* GC 10.3.6 */
 #define AMDGPU_FAMILY_GC_10_3_7	151 /* GC 10.3.7 */
 #define AMDGPU_FAMILY_GC_11_5_0	150 /* GC 11.5.0 */
-#define AMDGPU_FAMILY_GC_12_0_0 152 /* GC 12.0.0 */
+#define AMDGPU_FAMILY_GC_12_0_0	152 /* GC 12.0.0 */
 
 #if defined(__cplusplus)
 }
-- 
2.44.0



[PATCH 08/31] drm/amdgpu: Add mes v12_0 ip block support (v4)

2024-04-29 Thread Alex Deucher
From: Jack Xiao 

v1: Add mes v12_0 ip block support. (Jack)
v2: Switch to gfx.kiq array. (Hawking)
v3: Switch to AMDGPU_GFXHUB(0). (Hawking)
v4: Rebase (Alex)

Signed-off-by: Jack Xiao 
Reviewed-by: Hawking Zhang 
Signed-off-by: Hawking Zhang 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/Makefile|3 +-
 drivers/gpu/drm/amd/amdgpu/mes_v12_0.c | 1306 
 drivers/gpu/drm/amd/amdgpu/mes_v12_0.h |   29 +
 3 files changed, 1337 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
 create mode 100644 drivers/gpu/drm/amd/amdgpu/mes_v12_0.h

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
index ce460523f28c3..7d03e89f15d00 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -186,7 +186,8 @@ amdgpu-y += \
 amdgpu-y += \
amdgpu_mes.o \
mes_v10_1.o \
-   mes_v11_0.o
+   mes_v11_0.o \
+   mes_v12_0.o
 
 # add UVD block
 amdgpu-y += \
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
new file mode 100644
index 0000000000000..1bf12fc1f72e5
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
@@ -0,0 +1,1306 @@
+/*
+ * Copyright 2023 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#include 
+#include 
+#include "amdgpu.h"
+#include "soc15_common.h"
+#include "soc21.h"
+#include "gc/gc_12_0_0_offset.h"
+#include "gc/gc_12_0_0_sh_mask.h"
+#include "gc/gc_11_0_0_default.h"
+#include "v12_structs.h"
+#include "mes_v12_api_def.h"
+
+MODULE_FIRMWARE("amdgpu/gc_12_0_0_mes.bin");
+MODULE_FIRMWARE("amdgpu/gc_12_0_0_mes1.bin");
+MODULE_FIRMWARE("amdgpu/gc_12_0_1_mes.bin");
+MODULE_FIRMWARE("amdgpu/gc_12_0_1_mes1.bin");
+
+static int mes_v12_0_hw_fini(void *handle);
+static int mes_v12_0_kiq_hw_init(struct amdgpu_device *adev);
+static int mes_v12_0_kiq_hw_fini(struct amdgpu_device *adev);
+
+#define MES_EOP_SIZE   2048
+
+static void mes_v12_0_ring_set_wptr(struct amdgpu_ring *ring)
+{
+   struct amdgpu_device *adev = ring->adev;
+
+   if (ring->use_doorbell) {
+   atomic64_set((atomic64_t *)ring->wptr_cpu_addr,
+ring->wptr);
+   WDOORBELL64(ring->doorbell_index, ring->wptr);
+   } else {
+   BUG();
+   }
+}
+
+static u64 mes_v12_0_ring_get_rptr(struct amdgpu_ring *ring)
+{
+   return *ring->rptr_cpu_addr;
+}
+
+static u64 mes_v12_0_ring_get_wptr(struct amdgpu_ring *ring)
+{
+   u64 wptr;
+
+   if (ring->use_doorbell)
+   wptr = atomic64_read((atomic64_t *)ring->wptr_cpu_addr);
+   else
+   BUG();
+   return wptr;
+}
+
+static const struct amdgpu_ring_funcs mes_v12_0_ring_funcs = {
+   .type = AMDGPU_RING_TYPE_MES,
+   .align_mask = 1,
+   .nop = 0,
+   .support_64bit_ptrs = true,
+   .get_rptr = mes_v12_0_ring_get_rptr,
+   .get_wptr = mes_v12_0_ring_get_wptr,
+   .set_wptr = mes_v12_0_ring_set_wptr,
+   .insert_nop = amdgpu_ring_insert_nop,
+};
+
+static int mes_v12_0_submit_pkt_and_poll_completion(struct amdgpu_mes *mes,
+   void *pkt, int size,
+   int api_status_off)
+{
+   int ndw = size / 4;
+   signed long r;
+   union MESAPI__ADD_QUEUE *x_pkt = pkt;
+   struct MES_API_STATUS *api_status;
+   struct amdgpu_device *adev = mes->adev;
+   struct amdgpu_ring *ring = &mes->ring;
+   unsigned long flags;
+   signed long timeout = adev->usec_timeout;
+
+   if (amdgpu_emu_mode) {
+   timeout *= 100;
+   } else if (amdgpu_sriov_vf(adev)) {
+   /* Worst case in sriov where all other 15 VF timeout, each VF needs about 600ms */
+   timeout = 15 * 600 * 1000;
+   }
+   BUG_ON(size % 4 != 0);
+
+   

[PATCH 10/31] drm/amdgpu: enable mes v12 self test

2024-04-29 Thread Alex Deucher
From: Jack Xiao 

1. fix available compute queue to use
2. enable mes v12 self test

Signed-off-by: Jack Xiao 
Reviewed-by: Hawking Zhang 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 2 +-
 drivers/gpu/drm/amd/amdgpu/mes_v12_0.c  | 6 ++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
index 02ce69e3d1ddc..ea06f8be133e0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
@@ -156,7 +156,7 @@ int amdgpu_mes_init(struct amdgpu_device *adev)
 
for (i = 0; i < AMDGPU_MES_MAX_COMPUTE_PIPES; i++) {
/* use only 1st MEC pipes */
-   if (i >= 4)
+   if (i >= adev->gfx.mec.num_pipe_per_mec)
continue;
adev->mes.compute_hqd_mask[i] = 0xc;
}
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
index a9bf06ad0202b..d20bb78280b15 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
@@ -1301,6 +1301,12 @@ static int mes_v12_0_early_init(void *handle)
 
 static int mes_v12_0_late_init(void *handle)
 {
+   struct amdgpu_device *adev = (struct amdgpu_device *)handle;
+
+   /* it's only intended for use in mes_self_test case, not for s0ix and reset */
+   if (!amdgpu_in_reset(adev) && !adev->in_s0ix && !adev->in_suspend)
+   amdgpu_mes_self_test(adev);
+
return 0;
 }
 
-- 
2.44.0
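
The amdgpu_mes.c hunk above replaces the hard-coded pipe limit of 4 with the
per-MEC pipe count. A standalone sketch of the resulting loop follows. The
function and constant names are hypothetical, the array size of 8 is an
assumption, and the sketch zeroes skipped entries explicitly, whereas the
driver loop simply skips them.

```c
#include <assert.h>
#include <stdint.h>

enum { SKETCH_MAX_COMPUTE_PIPES = 8 };  /* assumed array size */

/* Enables HQD bits 2 and 3 (mask 0xc) on the first MEC's pipes only. */
static void init_compute_hqd_mask(uint32_t mask[SKETCH_MAX_COMPUTE_PIPES],
				  unsigned int num_pipe_per_mec)
{
	for (unsigned int i = 0; i < SKETCH_MAX_COMPUTE_PIPES; i++) {
		if (i >= num_pipe_per_mec) {
			mask[i] = 0;  /* pipe belongs to a later MEC: unused */
			continue;
		}
		mask[i] = 0xc;
	}
}
```

With num_pipe_per_mec set to 4 this reproduces the old behaviour; a part with
fewer pipes per MEC no longer gets queues assigned to pipes it does not have
on the first MEC.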



[PATCH 09/31] drm/amdgpu: set mes fw address for mes v12

2024-04-29 Thread Alex Deucher
From: Likun Gao 

Split the mes firmware start address setting out of
the mes firmware load for mes v12, as it's also
needed for rlc autoload.

Signed-off-by: Likun Gao 
Reviewed-by: Hawking Zhang 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/mes_v12_0.c | 37 +++---
 1 file changed, 28 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
index 1bf12fc1f72e5..a9bf06ad0202b 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
@@ -596,13 +596,36 @@ static void mes_v12_0_enable(struct amdgpu_device *adev, bool enable)
}
 }
 
+static void mes_v12_0_set_ucode_start_addr(struct amdgpu_device *adev)
+{
+   uint64_t ucode_addr;
+   int pipe;
+
+   mes_v12_0_enable(adev, false);
+
+   mutex_lock(&adev->srbm_mutex);
+   for (pipe = 0; pipe < AMDGPU_MAX_MES_PIPES; pipe++) {
+   /* me=3, queue=0 */
+   soc21_grbm_select(adev, 3, pipe, 0, 0);
+
+   /* set ucode start address */
+   ucode_addr = adev->mes.uc_start_addr[pipe] >> 2;
+   WREG32_SOC15(GC, 0, regCP_MES_PRGRM_CNTR_START,
+   lower_32_bits(ucode_addr));
+   WREG32_SOC15(GC, 0, regCP_MES_PRGRM_CNTR_START_HI,
+   upper_32_bits(ucode_addr));
+
+   soc21_grbm_select(adev, 0, 0, 0, 0);
+   }
+   mutex_unlock(&adev->srbm_mutex);
+}
+
 /* This function is for backdoor MES firmware */
 static int mes_v12_0_load_microcode(struct amdgpu_device *adev,
enum admgpu_mes_pipe pipe, bool prime_icache)
 {
int r;
uint32_t data;
-   uint64_t ucode_addr;
 
mes_v12_0_enable(adev, false);
 
@@ -625,13 +648,6 @@ static int mes_v12_0_load_microcode(struct amdgpu_device *adev,
 
WREG32_SOC15(GC, 0, regCP_MES_IC_BASE_CNTL, 0);
 
-   /* set ucode start address */
-   ucode_addr = adev->mes.uc_start_addr[pipe] >> 2;
-   WREG32_SOC15(GC, 0, regCP_MES_PRGRM_CNTR_START,
-lower_32_bits(ucode_addr));
-   WREG32_SOC15(GC, 0, regCP_MES_PRGRM_CNTR_START_HI,
-upper_32_bits(ucode_addr));
-
/* set ucode fimrware address */
WREG32_SOC15(GC, 0, regCP_MES_IC_BASE_LO,
 lower_32_bits(adev->mes.ucode_fw_gpu_addr[pipe]));
@@ -1158,7 +1174,10 @@ static int mes_v12_0_kiq_hw_init(struct amdgpu_device *adev)
return r;
}
 
-   }
+   mes_v12_0_set_ucode_start_addr(adev);
+
+   } else if (adev->firmware.load_type == AMDGPU_FW_LOAD_RLC_BACKDOOR_AUTO)
+   mes_v12_0_set_ucode_start_addr(adev);
 
mes_v12_0_enable(adev, true);
 
-- 
2.44.0
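
The new helper above shifts the ucode start address right by 2 and writes the
result as two 32-bit register values. A minimal sketch of that split is
below. The struct and function names are hypothetical, and describing the
shifted value as a dword address is an interpretation of the `>> 2`, not
something the patch states.

```c
#include <assert.h>
#include <stdint.h>

struct start_addr_regs {
	uint32_t lo;  /* value for the ..._PRGRM_CNTR_START register */
	uint32_t hi;  /* value for the ..._PRGRM_CNTR_START_HI register */
};

/* Splits the shifted start address into its low and high 32-bit halves. */
static struct start_addr_regs split_ucode_start(uint64_t uc_start_addr)
{
	uint64_t dword_addr = uc_start_addr >> 2;  /* same shift as the patch */
	struct start_addr_regs r = {
		.lo = (uint32_t)(dword_addr & 0xffffffffu),
		.hi = (uint32_t)(dword_addr >> 32),
	};

	return r;
}
```

A start address of 0x1000 yields lo = 0x400 and hi = 0; the high half only
becomes non-zero once the shifted address exceeds 32 bits.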



[PATCH 07/31] drm/amdgpu: init mes ucode name for gfx v12

2024-04-29 Thread Alex Deucher
From: Likun Gao 

Keep gfx v12 mes fw name to gc_12_x_x_mes.bin
and gc_12_x_x_mes1.bin.

Signed-off-by: Likun Gao 
Reviewed-by: Hawking Zhang 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
index 5ca5c47ab54ed..02ce69e3d1ddc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
@@ -1511,7 +1511,8 @@ int amdgpu_mes_init_microcode(struct amdgpu_device *adev, int pipe)
 
amdgpu_ucode_ip_version_decode(adev, GC_HWIP, ucode_prefix,
   sizeof(ucode_prefix));
-   if (amdgpu_ip_version(adev, GC_HWIP, 0) >= IP_VERSION(11, 0, 0)) {
+   if (amdgpu_ip_version(adev, GC_HWIP, 0) >= IP_VERSION(11, 0, 0) &&
+   amdgpu_ip_version(adev, GC_HWIP, 0) < IP_VERSION(12, 0, 0)) {
snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mes%s.bin",
 ucode_prefix,
 pipe == AMDGPU_MES_SCHED_PIPE ? "_2" : "1");
-- 
2.44.0
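
The version check above restricts the "_2" suffix to GC 11.x, so GC 12 falls
back to the plain names given in the commit message. A standalone sketch of
the name selection follows. The function and enum names are hypothetical, and
the empty scheduler-pipe suffix for the non-11.x branch is an assumption
taken from the commit message, since that branch is not part of the hunk.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum sketch_pipe { SKETCH_SCHED_PIPE = 0, SKETCH_KIQ_PIPE = 1 };

/* Builds the MES firmware file name; gc_major selects the naming scheme. */
static int mes_fw_name(char *buf, size_t len, const char *ucode_prefix,
		       int gc_major, enum sketch_pipe pipe)
{
	/* GC 11.x scheduler pipe uses "_2"; other versions use no suffix. */
	const char *sched_suffix = (gc_major == 11) ? "_2" : "";

	return snprintf(buf, len, "amdgpu/%s_mes%s.bin", ucode_prefix,
			pipe == SKETCH_SCHED_PIPE ? sched_suffix : "1");
}
```

For the prefix "gc_12_0_0" this produces amdgpu/gc_12_0_0_mes.bin for the
scheduler pipe and amdgpu/gc_12_0_0_mes1.bin for the other pipe, which lines
up with the MODULE_FIRMWARE entries in the mes v12_0 patch.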



[PATCH 05/31] drm/amdgpu: add rlc TOC header file for soc24

2024-04-29 Thread Alex Deucher
From: Likun Gao 

Add RLC autoload TOC header file for soc24 ASIC.

Signed-off-by: Likun Gao 
Reviewed-by: Hawking Zhang 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.h | 47 +
 1 file changed, 47 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.h
index 0614de6c122cb..fce22d3f816b2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.h
@@ -112,6 +112,53 @@ typedef enum _SOC21_FIRMWARE_ID_ {
 SOC21_FIRMWARE_ID_MAX = 37
 } SOC21_FIRMWARE_ID;
 
+typedef enum _SOC24_FIRMWARE_ID_ {
+SOC24_FIRMWARE_ID_INVALID = 0,
+SOC24_FIRMWARE_ID_RLC_G_UCODE = 1,
+SOC24_FIRMWARE_ID_RLC_TOC = 2,
+SOC24_FIRMWARE_ID_RLCG_SCRATCH= 3,
+SOC24_FIRMWARE_ID_RLC_SRM_ARAM= 4,
+SOC24_FIRMWARE_ID_RLC_P_UCODE = 5,
+SOC24_FIRMWARE_ID_RLC_V_UCODE = 6,
+SOC24_FIRMWARE_ID_RLX6_UCODE  = 7,
+SOC24_FIRMWARE_ID_RLX6_UCODE_CORE1= 8,
+SOC24_FIRMWARE_ID_RLX6_DRAM_BOOT  = 9,
+SOC24_FIRMWARE_ID_RLX6_DRAM_BOOT_CORE1= 10,
+SOC24_FIRMWARE_ID_SDMA_UCODE_TH0  = 11,
+SOC24_FIRMWARE_ID_SDMA_UCODE_TH1  = 12,
+SOC24_FIRMWARE_ID_CP_PFP  = 13,
+SOC24_FIRMWARE_ID_CP_ME   = 14,
+SOC24_FIRMWARE_ID_CP_MEC  = 15,
+SOC24_FIRMWARE_ID_RS64_MES_P0 = 16,
+SOC24_FIRMWARE_ID_RS64_MES_P1 = 17,
+SOC24_FIRMWARE_ID_RS64_PFP= 18,
+SOC24_FIRMWARE_ID_RS64_ME = 19,
+SOC24_FIRMWARE_ID_RS64_MEC= 20,
+SOC24_FIRMWARE_ID_RS64_MES_P0_STACK   = 21,
+SOC24_FIRMWARE_ID_RS64_MES_P1_STACK   = 22,
+SOC24_FIRMWARE_ID_RS64_PFP_P0_STACK   = 23,
+SOC24_FIRMWARE_ID_RS64_PFP_P1_STACK   = 24,
+SOC24_FIRMWARE_ID_RS64_ME_P0_STACK= 25,
+SOC24_FIRMWARE_ID_RS64_ME_P1_STACK= 26,
+SOC24_FIRMWARE_ID_RS64_MEC_P0_STACK   = 27,
+SOC24_FIRMWARE_ID_RS64_MEC_P1_STACK   = 28,
+SOC24_FIRMWARE_ID_RS64_MEC_P2_STACK   = 29,
+SOC24_FIRMWARE_ID_RS64_MEC_P3_STACK   = 30,
+SOC24_FIRMWARE_ID_RLC_SRM_DRAM_SR = 31,
+SOC24_FIRMWARE_ID_RLCG_SCRATCH_SR = 32,
+SOC24_FIRMWARE_ID_RLCP_SCRATCH_SR = 33,
+SOC24_FIRMWARE_ID_RLCV_SCRATCH_SR = 34,
+SOC24_FIRMWARE_ID_RLX6_DRAM_SR= 35,
+SOC24_FIRMWARE_ID_RLX6_DRAM_SR_CORE1  = 36,
+SOC24_FIRMWARE_ID_RLCDEBUGLOG = 37,
+SOC24_FIRMWARE_ID_SRIOV_DEBUG = 38,
+SOC24_FIRMWARE_ID_SRIOV_CSA_RLC   = 39,
+SOC24_FIRMWARE_ID_SRIOV_CSA_SDMA  = 40,
+SOC24_FIRMWARE_ID_SRIOV_CSA_CP= 41,
+SOC24_FIRMWARE_ID_UMF_ZONE_PAD= 42,
+SOC24_FIRMWARE_ID_MAX = 43
+} SOC24_FIRMWARE_ID;
+
 typedef struct _RLC_TABLE_OF_CONTENT {
union {
unsigned intDW0;
-- 
2.44.0



[PATCH 04/31] drm/amdgpu: add new TOC structure

2024-04-29 Thread Alex Deucher
From: Likun Gao 

Add new RLC_TABLE_OF_CONTENT structure definition.

Signed-off-by: Likun Gao 
Reviewed-by: Hawking Zhang 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.h | 27 +
 1 file changed, 27 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.h
index 5a17e0ff2ab89..0614de6c122cb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.h
@@ -155,6 +155,33 @@ typedef struct _RLC_TABLE_OF_CONTENT {
};
 } RLC_TABLE_OF_CONTENT;
 
+typedef struct _RLC_TABLE_OF_CONTENT_V2 {
+   union {
+   unsigned intDW0;
+   struct {
+   uint32_t offset : 25;
+   uint32_t id : 7;
+   };
+   };
+
+   union {
+   unsigned intDW1;
+   struct {
+   uint32_t reserved0  : 1;
+   uint32_t reserved1  : 1;
+   uint32_t reserved2  : 1;
+   uint32_t memory_destination : 2;
+   uint32_t vfflr_image_code   : 4;
+   uint32_t reserved9  : 1;
+   uint32_t reserved10 : 1;
+   uint32_t reserved11 : 1;
+   uint32_t size_x16   : 1;
+   uint32_t reserved13 : 1;
+   uint32_t size   : 18;
+   };
+   };
+} RLC_TABLE_OF_CONTENT_V2;
+
 #define RLC_TOC_MAX_SIZE   64
 
 struct amdgpu_rlc_funcs {
-- 
2.44.0
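
For readers following along, the two dwords of the V2 entry can also be unpacked with explicit shifts and masks. This is only a minimal sketch: it assumes bit-fields are allocated starting from the least significant bit (as on the little-endian targets the driver normally builds for), and `toc_v2_fields`, `toc_bits` and `toc_v2_decode` are hypothetical names that do not exist in the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decoded view of one V2 TOC entry; not a driver type. */
struct toc_v2_fields {
	uint32_t offset;             /* DW0 bits  0..24 */
	uint32_t id;                 /* DW0 bits 25..31 */
	uint32_t memory_destination; /* DW1 bits  3..4  */
	uint32_t vfflr_image_code;   /* DW1 bits  5..8  */
	uint32_t size_x16;           /* DW1 bit  12     */
	uint32_t size;               /* DW1 bits 14..31 */
};

/* Extract `width` bits starting at bit `shift`, counting from the LSB. */
static uint32_t toc_bits(uint32_t dw, unsigned int shift, unsigned int width)
{
	assert(width > 0 && width < 32 && shift + width <= 32);
	return (dw >> shift) & ((1u << width) - 1u);
}

/* Assumes LSB-first bit-field allocation; reserved bits are skipped. */
static struct toc_v2_fields toc_v2_decode(uint32_t dw0, uint32_t dw1)
{
	struct toc_v2_fields f;

	f.offset = toc_bits(dw0, 0, 25);
	f.id = toc_bits(dw0, 25, 7);
	f.memory_destination = toc_bits(dw1, 3, 2);
	f.vfflr_image_code = toc_bits(dw1, 5, 4);
	f.size_x16 = toc_bits(dw1, 12, 1);
	f.size = toc_bits(dw1, 14, 18);
	return f;
}
```

The field widths add up to 32 bits in each dword (25 + 7, and 3 + 2 + 4 + 3 + 1 + 1 + 18), which is why the shifts above line up with the reservedN names in the structure.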



[PATCH 03/31] drm/amdgpu: add gfx12 clearstate header

2024-04-29 Thread Alex Deucher
From: Likun Gao 

Add gfx12 clearstate register arrays.

Signed-off-by: Likun Gao 
Reviewed-by: Hawking Zhang 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/clearstate_gfx12.h | 121 ++
 1 file changed, 121 insertions(+)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/clearstate_gfx12.h

diff --git a/drivers/gpu/drm/amd/amdgpu/clearstate_gfx12.h 
b/drivers/gpu/drm/amd/amdgpu/clearstate_gfx12.h
new file mode 100644
index 0..2f6c9d11d5aef
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/clearstate_gfx12.h
@@ -0,0 +1,121 @@
+/*
+ * Copyright 2023 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+#ifndef __CLEARSTATE_GFX12_H_
+#define __CLEARSTATE_GFX12_H_
+
+static const unsigned int gfx12_SECT_CONTEXT_def_1[] = {
+0x, //mmSC_MEM_TEMPORAL
+0x, //mmSC_MEM_SPEC_READ
+0x, //mmPA_SC_VPORT_0_TL
+0x, //mmPA_SC_VPORT_0_BR
+0x, //mmPA_SC_VPORT_1_TL
+0x, //mmPA_SC_VPORT_1_BR
+0x, //mmPA_SC_VPORT_2_TL
+0x, //mmPA_SC_VPORT_2_BR
+0x, //mmPA_SC_VPORT_3_TL
+0x, //mmPA_SC_VPORT_3_BR
+0x, //mmPA_SC_VPORT_4_TL
+0x, //mmPA_SC_VPORT_4_BR
+0x, //mmPA_SC_VPORT_5_TL
+0x, //mmPA_SC_VPORT_5_BR
+0x, //mmPA_SC_VPORT_6_TL
+0x, //mmPA_SC_VPORT_6_BR
+0x, //mmPA_SC_VPORT_7_TL
+0x, //mmPA_SC_VPORT_7_BR
+0x, //mmPA_SC_VPORT_8_TL
+0x, //mmPA_SC_VPORT_8_BR
+0x, //mmPA_SC_VPORT_9_TL
+0x, //mmPA_SC_VPORT_9_BR
+0x, //mmPA_SC_VPORT_10_TL
+0x, //mmPA_SC_VPORT_10_BR
+0x, //mmPA_SC_VPORT_11_TL
+0x, //mmPA_SC_VPORT_11_BR
+0x, //mmPA_SC_VPORT_12_TL
+0x, //mmPA_SC_VPORT_12_BR
+0x, //mmPA_SC_VPORT_13_TL
+0x, //mmPA_SC_VPORT_13_BR
+0x, //mmPA_SC_VPORT_14_TL
+0x, //mmPA_SC_VPORT_14_BR
+0x, //mmPA_SC_VPORT_15_TL
+0x, //mmPA_SC_VPORT_15_BR
+};
+
+static const unsigned int gfx12_SECT_CONTEXT_def_2[] = {
+0x, //mmPA_CL_PROG_NEAR_CLIP_Z
+0x, //mmPA_RATE_CNTL
+};
+
+static const unsigned int gfx12_SECT_CONTEXT_def_3[] = {
+0x, //mmCP_PERFMON_CNTX_CNTL
+};
+
+static const unsigned int gfx12_SECT_CONTEXT_def_4[] = {
+0x, //mmCONTEXT_RESERVED_REG0
+0x, //mmCONTEXT_RESERVED_REG1
+0x, //mmPA_SC_CLIPRECT_0_EXT
+0x, //mmPA_SC_CLIPRECT_1_EXT
+0x, //mmPA_SC_CLIPRECT_2_EXT
+0x, //mmPA_SC_CLIPRECT_3_EXT
+};
+
+static const unsigned int gfx12_SECT_CONTEXT_def_5[] = {
+0x, //mmPA_SC_HIZ_INFO
+0x, //mmPA_SC_HIS_INFO
+0x, //mmPA_SC_HIZ_BASE
+0x, //mmPA_SC_HIZ_BASE_EXT
+0x, //mmPA_SC_HIZ_SIZE_XY
+0x, //mmPA_SC_HIS_BASE
+0x, //mmPA_SC_HIS_BASE_EXT
+0x, //mmPA_SC_HIS_SIZE_XY
+0x, //mmPA_SC_BINNER_OUTPUT_TIMEOUT_CNTL
+0x, //mmPA_SC_BINNER_DYNAMIC_BATCH_LIMIT
+0x, //mmPA_SC_HISZ_CONTROL
+};
+
+static const unsigned int gfx12_SECT_CONTEXT_def_6[] = {
+0x, //mmCB_MEM0_INFO
+0x, //mmCB_MEM1_INFO
+0x, //mmCB_MEM2_INFO
+0x, //mmCB_MEM3_INFO
+0x, //mmCB_MEM4_INFO
+0x, //mmCB_MEM5_INFO
+0x, //mmCB_MEM6_INFO
+0x, //mmCB_MEM7_INFO
+};
+
+static const struct cs_extent_def gfx12_SECT_CONTEXT_defs[] = {
+{gfx12_SECT_CONTEXT_def_1, 0xa03e, 34 },
+{gfx12_SECT_CONTEXT_def_2, 0xa0cc, 2 },
+{gfx12_SECT_CONTEXT_def_3, 0xa0d8, 1 },
+{gfx12_SECT_CONTEXT_def_4, 0xa0db, 6 },
+{gfx12_SECT_CONTEXT_def_5, 0xa2e5, 11 },
+{gfx12_SECT_CONTEXT_def_6, 0xa3c0, 8 },
+{ 0, 0, 0 }
+};
+
+static const struct cs_section_def gfx12_cs_data[] = {
+{ gfx12_SECT_CONTEXT_defs, SECT_CONTEXT },
+{ 0, SECT_NONE }
+};
+
+#endif /* __CLEARSTATE_GFX12_H_ */
-- 
2.44.0
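
The extent table in this header is terminated by an all-zero entry, so a consumer can walk it without knowing its length. The sketch below shows that pattern under stated assumptions: `extent_def_stub` and its field names are stand-ins shaped like the initializers in the patch, and `count_extent_regs` is a hypothetical helper, not a function from the driver.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the extent descriptor; field names are assumptions. */
struct extent_def_stub {
	const unsigned int *extent; /* default values; NULL ends the table */
	unsigned int reg_index;     /* first register offset of the range */
	unsigned int reg_count;     /* number of consecutive registers */
};

/* Sum reg_count over a table terminated by an entry with extent == NULL. */
static unsigned int count_extent_regs(const struct extent_def_stub *defs)
{
	unsigned int total = 0;

	assert(defs != NULL);
	for (; defs->extent != NULL; defs++)
		total += defs->reg_count;
	return total;
}
```

With the six counts listed in gfx12_SECT_CONTEXT_defs (34, 2, 1, 6, 11, 8) such a walk would total 62 registers.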



RE: [PATCH v3] drm/amdgpu: IB test encode test package change for VCN5

2024-04-29 Thread Liu, Leo
[AMD Official Use Only - General]

Reviewed-by: Leo Liu 

> -Original Message-
> From: amd-gfx  On Behalf Of Sonny
> Jiang
> Sent: Thursday, April 25, 2024 4:11 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Jiang, Sonny ; Jiang, Sonny
> 
> Subject: [PATCH v3] drm/amdgpu: IB test encode test package change for VCN5
>
> From: Sonny Jiang 
>
> VCN5 session info package interface changed
>
> Signed-off-by: Sonny Jiang 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> index 677eb141554e..b89605b400c0 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> @@ -885,7 +885,7 @@ static int amdgpu_vcn_enc_get_create_msg(struct
> amdgpu_ring *ring, uint32_t hand
>   ib->ptr[ib->length_dw++] = handle;
>   ib->ptr[ib->length_dw++] = upper_32_bits(addr);
>   ib->ptr[ib->length_dw++] = addr;
> - ib->ptr[ib->length_dw++] = 0x000b;
> + ib->ptr[ib->length_dw++] = 0x;
>
>   ib->ptr[ib->length_dw++] = 0x0014;
>   ib->ptr[ib->length_dw++] = 0x0002; /* task info */ @@ -952,7
> +952,7 @@ static int amdgpu_vcn_enc_get_destroy_msg(struct amdgpu_ring
> *ring, uint32_t han
>   ib->ptr[ib->length_dw++] = handle;
>   ib->ptr[ib->length_dw++] = upper_32_bits(addr);
>   ib->ptr[ib->length_dw++] = addr;
> - ib->ptr[ib->length_dw++] = 0x000b;
> + ib->ptr[ib->length_dw++] = 0x;
>
>   ib->ptr[ib->length_dw++] = 0x0014;
>   ib->ptr[ib->length_dw++] = 0x0002;
> --
> 2.43.2



Re: [PATCH v3] drm/amdgpu: Fix the uninitialized variable warning

2024-04-29 Thread Alex Deucher
On Fri, Apr 26, 2024 at 5:57 AM Ma Jun  wrote:
>
> Check the user input and phy_id value range to fix
> "Using uninitialized value phy_id"
>
> Signed-off-by: Ma Jun 

Reviewed-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_securedisplay.c | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_securedisplay.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_securedisplay.c
> index 8ed0e073656f..41ebe690eeff 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_securedisplay.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_securedisplay.c
> @@ -135,6 +135,10 @@ static ssize_t amdgpu_securedisplay_debugfs_write(struct 
> file *f, const char __u
> mutex_unlock(>securedisplay_context.mutex);
> break;
> case 2:
> +   if (size < 3 || phy_id >= TA_SECUREDISPLAY_MAX_PHY) {
> +   dev_err(adev->dev, "Invalid input: %s\n", str);
> +   return -EINVAL;
> +   }
> mutex_lock(>securedisplay_context.mutex);
> psp_prep_securedisplay_cmd_buf(psp, _cmd,
> TA_SECUREDISPLAY_COMMAND__SEND_ROI_CRC);
> --
> 2.34.1
>


Re: [PATCH 1/3] drm/amdgpu: Add amdgpu_bo_is_vm_bo helper

2024-04-29 Thread Christian König

Am 29.04.24 um 15:34 schrieb Tvrtko Ursulin:


On 29/04/2024 12:02, Christian König wrote:

Am 26.04.24 um 18:43 schrieb Tvrtko Ursulin:

From: Tvrtko Ursulin 

Help code readability by replacing a bunch of:

bo->tbo.base.resv == vm->root.bo->tbo.base.resv

With:

amdgpu_bo_is_vm_bo(bo, vm)

No functional changes.
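
The substitution described above can be illustrated outside the kernel. This is a minimal sketch with stand-in types: `resv_stub`, `bo_stub`, `vm_stub` and `bo_shares_vm_resv` are hypothetical and only mirror the shape of the comparison; they are not the amdgpu structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the reservation object, the BO and the VM. */
struct resv_stub { int unused; };
struct bo_stub { struct resv_stub *resv; };
struct vm_stub { struct bo_stub *root; };

/*
 * True when the BO shares the reservation object of the VM root,
 * false for a NULL BO. Same shape as the open-coded comparison.
 */
static bool bo_shares_vm_resv(const struct bo_stub *bo, const struct vm_stub *vm)
{
	assert(vm != NULL && vm->root != NULL);
	return bo && bo->resv == vm->root->resv;
}
```

Note that call sites which previously tested for inequality need the result negated, and that folding the NULL check into the helper changes nothing for callers that already guaranteed a non-NULL BO.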


Ah, yes, that was on my TODO list as well.

But I would have rather added this to the VM instead. In other words 
move it to amdgpu_vm.h and call it amdgpu_vm_is_bo_always_valid() or 
something like that.


I am happy to move it around as long as you are sure amdgpu_vm.h is 
the location.


For instance, the main API there seems to be built around amdgpu_vm_bo. At 
least, none of the amdgpu_bo usages need anything more than the 
forward-declared struct.


So if I move the helper there I either need to make it include another 
header, or move the helper out of line to amdgpu_vm.c.


Thoughts?


amdgpu_vm.c is fine as well. I just thought that something so simple 
could be an inline function in the header as well.


Regards,
Christian.



Regards,

Tvrtko



Signed-off-by: Tvrtko Ursulin 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c    |  2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 14 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c     | 31 +-
  3 files changed, 28 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c

index 67c234bcf89f..32e4a9c6e805 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -174,7 +174,7 @@ static int amdgpu_gem_object_open(struct 
drm_gem_object *obj,

  return -EPERM;
  if (abo->flags & AMDGPU_GEM_CREATE_VM_ALWAYS_VALID &&
-    abo->tbo.base.resv != vm->root.bo->tbo.base.resv)
+    !amdgpu_bo_is_vm_bo(abo, vm))
  return -EPERM;
  r = amdgpu_bo_reserve(abo, false);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h

index be679c42b0b8..f2bb6965cc77 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
@@ -307,6 +307,20 @@ static inline struct amdgpu_bo 
*amdgpu_bo_shadowed(struct amdgpu_bo *bo)

  return NULL;
  }
+/**
+ * amdgpu_bo_is_vm_bo - check if the BO is VM always valid
+ *
+ * @bo: BO to be tested.
+ * @vm: VM to test against.
+ *
+ * Returns true if the BO is VM always valid.
+ */
+static inline bool amdgpu_bo_is_vm_bo(struct amdgpu_bo *bo,
+  struct amdgpu_vm *vm)
+{
+    return bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv;
+}
+
  bool amdgpu_bo_is_amdgpu_bo(struct ttm_buffer_object *bo);
  void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 
domain);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c

index 8af3f0fd3073..6d6f0e325172 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -333,7 +333,7 @@ void amdgpu_vm_bo_base_init(struct 
amdgpu_vm_bo_base *base,

  base->next = bo->vm_bo;
  bo->vm_bo = base;
-    if (bo->tbo.base.resv != vm->root.bo->tbo.base.resv)
+    if (!amdgpu_bo_is_vm_bo(bo, vm))
  return;
  dma_resv_assert_held(vm->root.bo->tbo.base.resv);
@@ -1101,13 +1101,12 @@ static void amdgpu_vm_bo_get_memory(struct 
amdgpu_bo_va *bo_va,

   * For now ignore BOs which are currently locked and potentially
   * changing their location.
   */
-    if (bo->tbo.base.resv != vm->root.bo->tbo.base.resv &&
-    !dma_resv_trylock(bo->tbo.base.resv))
+    if (!amdgpu_bo_is_vm_bo(bo, vm) && 
!dma_resv_trylock(bo->tbo.base.resv))

  return;
  amdgpu_bo_get_memory(bo, stats);
-    if (bo->tbo.base.resv != vm->root.bo->tbo.base.resv)
-    dma_resv_unlock(bo->tbo.base.resv);
+    if (!amdgpu_bo_is_vm_bo(bo, vm))
+    dma_resv_unlock(bo->tbo.base.resv);
  }
  void amdgpu_vm_get_memory(struct amdgpu_vm *vm,
@@ -1203,8 +1202,7 @@ int amdgpu_vm_bo_update(struct amdgpu_device 
*adev, struct amdgpu_bo_va *bo_va,

  uncached = false;
  }
-    if (clear || (bo && bo->tbo.base.resv ==
-  vm->root.bo->tbo.base.resv))
+    if (clear || amdgpu_bo_is_vm_bo(bo, vm))
  last_update = >last_update;
  else
  last_update = _va->last_pt_update;
@@ -1246,7 +1244,7 @@ int amdgpu_vm_bo_update(struct amdgpu_device 
*adev, struct amdgpu_bo_va *bo_va,

   * the evicted list so that it gets validated again on the
   * next command submission.
   */
-    if (bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv) {
+    if (amdgpu_bo_is_vm_bo(bo, vm)) {
  uint32_t mem_type = bo->tbo.resource->mem_type;
  if (!(bo->preferred_domains &
@@ -1640,10 +1638,9 @@ static void amdgpu_vm_bo_insert_map(struct 
amdgpu_device *adev,

  if (mapping->flags & AMDGPU_PTE_PRT)
  amdgpu_vm_prt_get(adev);
-    if (bo && bo->tbo.base.resv == 

Re: [PATCH v3] drm/amdgpu: IB test encode test package change for VCN5

2024-04-29 Thread Jiang, Sonny
[AMD Official Use Only - General]

Ping.

Sonny

From: Jiang, Sonny 
Sent: Thursday, April 25, 2024 4:12 PM
To: amd-gfx@lists.freedesktop.org 
Subject: Re: [PATCH v3] drm/amdgpu: IB test encode test package change for VCN5

In my tests, I found no errors on VCN1 through VCN4.

Thanks,
Sonny


From: Jiang, Sonny 
Sent: Thursday, April 25, 2024 4:10 PM
To: amd-gfx@lists.freedesktop.org 
Cc: Jiang, Sonny ; Jiang, Sonny 
Subject: [PATCH v3] drm/amdgpu: IB test encode test package change for VCN5

From: Sonny Jiang 

VCN5 session info package interface changed

Signed-off-by: Sonny Jiang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
index 677eb141554e..b89605b400c0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
@@ -885,7 +885,7 @@ static int amdgpu_vcn_enc_get_create_msg(struct amdgpu_ring 
*ring, uint32_t hand
 ib->ptr[ib->length_dw++] = handle;
 ib->ptr[ib->length_dw++] = upper_32_bits(addr);
 ib->ptr[ib->length_dw++] = addr;
-   ib->ptr[ib->length_dw++] = 0x000b;
+   ib->ptr[ib->length_dw++] = 0x;

 ib->ptr[ib->length_dw++] = 0x0014;
 ib->ptr[ib->length_dw++] = 0x0002; /* task info */
@@ -952,7 +952,7 @@ static int amdgpu_vcn_enc_get_destroy_msg(struct 
amdgpu_ring *ring, uint32_t han
 ib->ptr[ib->length_dw++] = handle;
 ib->ptr[ib->length_dw++] = upper_32_bits(addr);
 ib->ptr[ib->length_dw++] = addr;
-   ib->ptr[ib->length_dw++] = 0x000b;
+   ib->ptr[ib->length_dw++] = 0x;

 ib->ptr[ib->length_dw++] = 0x0014;
 ib->ptr[ib->length_dw++] = 0x0002;
--
2.43.2



Re: [PATCH] drm/amd/pm: fix uninitialized variable warning for smu_v13

2024-04-29 Thread Alex Deucher
On Mon, Apr 29, 2024 at 3:52 AM Tim Huang  wrote:
>
> Clear the warning about using an uninitialized variable when DPM is
> not enabled, and reuse the code for SMU13 to get the boot frequency.
>
> Signed-off-by: Tim Huang 

Acked-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/pm/swsmu/inc/smu_v13_0.h  |  4 ++
>  .../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c| 55 +--
>  .../drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.c  | 28 +-
>  .../drm/amd/pm/swsmu/smu13/smu_v13_0_5_ppt.c  | 28 +-
>  .../drm/amd/pm/swsmu/smu13/yellow_carp_ppt.c  | 28 +-
>  5 files changed, 51 insertions(+), 92 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/inc/smu_v13_0.h 
> b/drivers/gpu/drm/amd/pm/swsmu/inc/smu_v13_0.h
> index d9700a3f28d2..e58220a7ee2f 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/inc/smu_v13_0.h
> +++ b/drivers/gpu/drm/amd/pm/swsmu/inc/smu_v13_0.h
> @@ -298,5 +298,9 @@ int smu_v13_0_enable_uclk_shadow(struct smu_context *smu, 
> bool enable);
>
>  int smu_v13_0_set_wbrf_exclusion_ranges(struct smu_context *smu,
>  struct freq_band_range 
> *exclusion_ranges);
> +
> +int smu_v13_0_get_boot_freq_by_index(struct smu_context *smu,
> +enum smu_clk_type clk_type,
> +uint32_t *value);
>  #endif
>  #endif
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c 
> b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
> index a8d34adc7d3f..ed5a7a83c9e2 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
> @@ -1559,22 +1559,9 @@ int smu_v13_0_get_dpm_ultimate_freq(struct smu_context 
> *smu, enum smu_clk_type c
> uint32_t clock_limit;
>
> if (!smu_cmn_clk_dpm_is_enabled(smu, clk_type)) {
> -   switch (clk_type) {
> -   case SMU_MCLK:
> -   case SMU_UCLK:
> -   clock_limit = smu->smu_table.boot_values.uclk;
> -   break;
> -   case SMU_GFXCLK:
> -   case SMU_SCLK:
> -   clock_limit = smu->smu_table.boot_values.gfxclk;
> -   break;
> -   case SMU_SOCCLK:
> -   clock_limit = smu->smu_table.boot_values.socclk;
> -   break;
> -   default:
> -   clock_limit = 0;
> -   break;
> -   }
> +   ret = smu_v13_0_get_boot_freq_by_index(smu, clk_type, 
> _limit);
> +   if (ret)
> +   return ret;
>
> /* clock in Mhz unit */
> if (min)
> @@ -1894,6 +1881,40 @@ int smu_v13_0_set_power_source(struct smu_context *smu,
>NULL);
>  }
>
> +int smu_v13_0_get_boot_freq_by_index(struct smu_context *smu,
> +enum smu_clk_type clk_type,
> +uint32_t *value)
> +{
> +   int ret = 0;
> +
> +   switch (clk_type) {
> +   case SMU_MCLK:
> +   case SMU_UCLK:
> +   *value = smu->smu_table.boot_values.uclk;
> +   break;
> +   case SMU_FCLK:
> +   *value = smu->smu_table.boot_values.fclk;
> +   break;
> +   case SMU_GFXCLK:
> +   case SMU_SCLK:
> +   *value = smu->smu_table.boot_values.gfxclk;
> +   break;
> +   case SMU_SOCCLK:
> +   *value = smu->smu_table.boot_values.socclk;
> +   break;
> +   case SMU_VCLK:
> +   *value = smu->smu_table.boot_values.vclk;
> +   break;
> +   case SMU_DCLK:
> +   *value = smu->smu_table.boot_values.dclk;
> +   break;
> +   default:
> +   ret = -EINVAL;
> +   break;
> +   }
> +   return ret;
> +}
> +
>  int smu_v13_0_get_dpm_freq_by_index(struct smu_context *smu,
> enum smu_clk_type clk_type, uint16_t 
> level,
> uint32_t *value)
> @@ -1905,7 +1926,7 @@ int smu_v13_0_get_dpm_freq_by_index(struct smu_context 
> *smu,
> return -EINVAL;
>
> if (!smu_cmn_clk_dpm_is_enabled(smu, clk_type))
> -   return 0;
> +   return smu_v13_0_get_boot_freq_by_index(smu, clk_type, value);
>
> clk_id = smu_cmn_to_asic_specific_index(smu,
> CMN2ASIC_MAPPING_CLK,
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.c 
> b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.c
> index 88f1a0d878f3..e283b282ec27 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.c
> @@ -756,31 +756,9 @@ static int smu_v13_0_4_get_dpm_ultimate_freq(struct 
> smu_context *smu,
> int ret = 0;
>
> if 

[PATCH] Revert "drm/amd/pm: fixes a random hang in S4 for SMU v13.0.4/11"

2024-04-29 Thread Alex Deucher
This reverts commit 31729e8c21ecfd671458e02b6511eb68c2225113.

This causes problems with reboots/shutdowns for some users.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3351
Signed-off-by: Alex Deucher 
Cc: Tim Huang 
---
 drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.c | 12 +---
 1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.c
index 88f1a0d878f33..e8119918ef6b1 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.c
@@ -226,18 +226,8 @@ static int smu_v13_0_4_system_features_control(struct 
smu_context *smu, bool en)
struct amdgpu_device *adev = smu->adev;
int ret = 0;
 
-   if (!en && !adev->in_s0ix) {
-   /* Adds a GFX reset as workaround just before sending the
-* MP1_UNLOAD message to prevent GC/RLC/PMFW from entering
-* an invalid state.
-*/
-   ret = smu_cmn_send_smc_msg_with_param(smu, 
SMU_MSG_GfxDeviceDriverReset,
- SMU_RESET_MODE_2, NULL);
-   if (ret)
-   return ret;
-
+   if (!en && !adev->in_s0ix)
ret = smu_cmn_send_smc_msg(smu, SMU_MSG_PrepareMp1ForUnload, 
NULL);
-   }
 
return ret;
 }
-- 
2.44.0



Re: [PATCH] drm/amdgpu/vpe: fix vpe dpm clk ratio setup failed

2024-04-29 Thread Alex Deucher
On Mon, Apr 29, 2024 at 3:07 AM Peyton Lee  wrote:
>
> Some BIOS versions do not enable all clock levels,
> so the higher clock levels report a frequency of 0.
> The number of valid clocks must be confirmed in advance.
>
> Signed-off-by: Peyton Lee 

Acked-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c | 13 +++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c
> index c23d97d34b7e..49881073ff58 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c
> @@ -128,6 +128,7 @@ int amdgpu_vpe_configure_dpm(struct amdgpu_vpe *vpe)
> struct dpm_clock *VPEClks;
> struct dpm_clock *SOCClks;
> uint32_t idx;
> +   uint32_t vpeclk_enalbled_num = 0;
> uint32_t pratio_vmax_vnorm = 0, pratio_vnorm_vmid = 0, 
> pratio_vmid_vmin = 0;
> uint16_t pratio_vmin_freq = 0, pratio_vmid_freq = 0, 
> pratio_vnorm_freq = 0, pratio_vmax_freq = 0;
>
> @@ -144,6 +145,14 @@ int amdgpu_vpe_configure_dpm(struct amdgpu_vpe *vpe)
> SOCClks = clock_table.SocClocks;
> VPEClks = clock_table.VPEClocks;
>
> +   /* Confirm enabled vpe clk num
> +* Enabled VPE clocks are ordered from low to high in VPEClks
> +* The highest valid clock index+1 is the number of VPEClks
> +*/
> +   for (idx = PP_SMU_NUM_VPECLK_DPM_LEVELS; idx && 
> !vpeclk_enalbled_num; idx--)
> +   if (VPEClks[idx-1].Freq)
> +   vpeclk_enalbled_num = idx;
> +
> /* vpe dpm only cares 4 levels. */
> for (idx = 0; idx < VPE_MAX_DPM_LEVEL; idx++) {
> uint32_t soc_dpm_level;
> @@ -155,8 +164,8 @@ int amdgpu_vpe_configure_dpm(struct amdgpu_vpe *vpe)
> soc_dpm_level = (idx * 2) + 1;
>
> /* clamp the max level */
> -   if (soc_dpm_level > PP_SMU_NUM_VPECLK_DPM_LEVELS - 1)
> -   soc_dpm_level = PP_SMU_NUM_VPECLK_DPM_LEVELS 
> - 1;
> +   if (soc_dpm_level > vpeclk_enalbled_num - 1)
> +   soc_dpm_level = vpeclk_enalbled_num - 1;
>
> min_freq = (SOCClks[soc_dpm_level].Freq < 
> VPEClks[soc_dpm_level].Freq) ?
>SOCClks[soc_dpm_level].Freq : 
> VPEClks[soc_dpm_level].Freq;
> --
> 2.34.1
>


RE: [PATCH 00/46] DC Patches April 29, 2024

2024-04-29 Thread Wheeler, Daniel
[Public]

Hi all,

This week this patchset was tested on the following systems:
* Lenovo ThinkBook T13s Gen4 with AMD Ryzen 5 6600U
* MSI Gaming X Trio RX 6800
* Gigabyte Gaming OC RX 7900 XTX

These systems were tested on the following display/connection types:
* eDP, (1080p 60hz [5650U]) (1920x1200 60hz [6600U]) (2560x1600 
120hz[6600U])
* VGA and DVI (1680x1050 60hz [DP to VGA/DVI, USB-C to VGA/DVI])
* DP/HDMI/USB-C (1440p 170hz, 4k 60hz, 4k 144hz, 4k 240hz [Includes 
USB-C to DP/HDMI adapters])
* Thunderbolt (LG Ultrafine 5k)
* MST (Startech MST14DP123DP [DP to 3x DP] and 2x 4k 60Hz displays)
* DSC (with Cable Matters 101075 [DP to 3x DP] with 3x 4k60 displays, 
and HP Hook G2 with 1 4k60 display)
* USB 4 (Kensington SD5700T and 1x 4k 60Hz display)
* PCON (Club3D CAC-1085 and 1x 4k 144Hz display [at 4k 120HZ, as that 
is the max the adapter supports])

The testing is a mix of automated and manual tests. Manual testing includes 
(but is not limited to):
* Changing display configurations and settings
* Benchmark testing
* Feature testing (Freesync, etc.)

Automated testing includes (but is not limited to):
* Script testing (scripts to automate some of the manual checks)
* IGT testing

The patchset consists of the amd-staging-drm-next branch (Head commit - 
6fdf2d7a8aaa drm/amd/display: 3.2.282) with new patches added on top of it.

Tested on Ubuntu 22.04.3, on Wayland and X11, using KDE Plasma and Gnome.


Tested-by: Daniel Wheeler 


Thank you,

Dan Wheeler
Sr. Technologist | AMD
SW Display
--
1 Commerce Valley Dr E, Thornhill, ON L3T 7X6
amd.com

-Original Message-
From: amd-gfx  On Behalf Of Wayne Lin
Sent: Wednesday, April 24, 2024 4:49 AM
To: amd-gfx@lists.freedesktop.org
Cc: Wentland, Harry ; Li, Sun peng (Leo) 
; Siqueira, Rodrigo ; Pillai, 
Aurabindo ; Li, Roman ; Lin, Wayne 
; Gutierrez, Agustin ; Chung, 
ChiaHsuan (Tom) ; Wu, Hersen ; 
Zuo, Jerry ; Lin, Wayne 
Subject: [PATCH 00/46] DC Patches April 29, 2024

This DC patchset brings improvements in multiple areas. In summary, we 
highlight:

- Disable seamless boot on 128b/132b encoding
- Change ASSR disable sequence to avoid corruption
- Fix few IPS problems
- Enable Replay for DCN315
- Fix few ODM problems
- Fix FEC_READY write timing
- Fix few FPO problems
- Adjust DML21 gpuvm_enable assignment
- Fix divide by 0 error in VM environment
- Fix few DCN35 problems
- Fix flickering on DCN321
- Fix mst resume problem
- Fix multi-disp FAMS problem
- Refactor Replay
- Update some of the dcn303 parameters
- Enable legacy fast update for dcn301
- Add VCO parameter for DCN31 FPU
- Have cursor and surface updates together
- Fix problems reported by Coverity

---
Alex Hung (9):
  drm/amd/display: Check index msg_id before read or write
  drm/amd/display: Check pipe offset before setting vblank
  drm/amd/display: Skip finding free audio for unknown engine_id
  drm/amd/display: Do not return negative stream id for array
  drm/amd/display: ASSERT when failing to find index by plane/stream id
  drm/amd/display: Remove redundant include file
  drm/amd/display: Fix uninitialized variables in DM
  drm/amd/display: Fix uninitialized variables in DC
  drm/amd/display: Fix uninitialized variables in DC

Alvin Lee (3):
  drm/amd/display: Only program P-State force if pipe config changed
  drm/amd/display: Assign linear_pitch_alignment even for VM
  drm/amd/display: For FPO + Vactive check that all pipes support VA

Aric Cyr (1):
  drm/amd/display: 3.2.283

Daniel Miess (1):
  drm/amd/display: Enable RCO for PHYSYMCLK in DCN35

Dennis Chan (1):
  drm/amd/display: Refactor for Replay Link off frame count

Harry Wentland (2):
  drm/amd/display: Do cursor programming with rest of pipe
  drm/amd/display: Always use legacy way of setting cursor on DCE

Hersen Wu (2):
  drm/amd/display: Add NULL pointer check for kzalloc
  drm/amd/display: Fix overlapping copy within dml_core_mode_programming

Ilya Bakoulin (1):
  drm/amd/display: Fix FEC_READY write on DP LT

Iswara Nagulendran (1):
  drm/amd/display: Restrict multi-disp support for in-game FAMS

Joan Lee (1):
  drm/amd/display: Enable Replay for DCN315

Leo Ma (1):
  drm/amd/display: Fix DC mode screen flickering on DCN321

Nevenko Stupar (1):
  drm/amd/display: gpuvm handling in DML21

Nicholas Kazlauskas (2):
  drm/amd/display: Add trigger FIFO resync path for DCN35
  drm/amd/display: Notify idle link detection through shared state

Revalla Hari Krishna (1):
  drm/amd/display: Refactor HUBBUB into component folder

Rodrigo Siqueira (10):
  drm/amd/display: Improve registers write
  drm/amd/display: Add missing SMU version
  drm/amd/display: Adjust codestyle for dcn31 and hdcp_msg
  drm/amd/display: Add VCO speed parameter for DCN31 FPU
  drm/amd/display: Adjust functions 

Re: [PATCH 3/3] drm/amdgpu: Fix pinned GART area accounting and fdinfo reporting

2024-04-29 Thread Tvrtko Ursulin



On 26/04/2024 23:24, Felix Kuehling wrote:


On 2024-04-26 12:43, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

When commit b453e42a6e8b ("drm/amdgpu: Add new placement for preemptible
SG BOs") added a new TTM region, it failed to notice the conceptual
imbalance in GART pin size accounting as done in amdgpu_bo_pin/unpin.

That imbalance leads to such objects being accounted against the
resource when pinned, but not un-accounted when unpinned.


AMDGPU_PL_PREEMPT is mostly used for userptr BOs, which cannot be 
pinned. In any case you should make sure that the accounting is 
consistent between amdgpu_bo_pin_restricted and amdgpu_bo_unpin. This 
patch breaks that consistency.


You mean amdgpu_bo_pin(_restricted) and amdgpu_bo_unpin do not run for 
such objects, or something else?


If they run, then at the end of pin there is:

domain = amdgpu_mem_type_to_domain(bo->tbo.resource->mem_type);
...
} else if (domain == AMDGPU_GEM_DOMAIN_GTT) {
atomic64_add(amdgpu_bo_size(bo), >gart_pin_size);

And unpin has no handling for AMDGPU_PL_PREEMPT.

Ah, I see. Does it rely on amdgpu_mem_type_to_domain returning 0 for 
AMDGPU_PL_PREEMPT? My confusion was that I misread the pinning check as 
checking the domain stored in the bo at creation time.


I am still confused by the statement that userptr BOs are not pinned, 
though. Is it not necessary to map them via GART on AMD hardware for the 
GPU to be able to access them?

Fix by extending the accounting criteria in amdgpu_bo_unpin.

What also appears to need fixing is to stop reporting their size from
amdgpu_bo_get_memory, which is used to implement fdinfo stats, so they are
not mixed with the regular userspace-created and driver-owned objects.


I think that's true. It's a very fine distinction. AMDGPU_PL_PREEMPT 
does use system memory and it is GPU accessible, just like GTT. The only 
difference is that it is not subject to the GTT limits, because eviction 
of these BOs is handled by callbacks other than TTM evictions and does 
not need to wait for fences.


As in you think those two hunks of the patch are correct?

Regards,

Tvrtko



Regards,
   Felix




And also amdgpu_bo_print_info for debugfs reporting.

Note that the patch depends on the previous one which broke down the
relevant checks from the domain based to placement based.

Signed-off-by: Tvrtko Ursulin 
Fixes: b453e42a6e8b ("drm/amdgpu: Add new placement for preemptible SG BOs")
Cc: Felix Kuehling 
Cc: Christian König 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 5 ++---
  1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c

index fb984669fc3a..5a2bbc793953 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -1032,7 +1032,8 @@ void amdgpu_bo_unpin(struct amdgpu_bo *bo)
  atomic64_sub(amdgpu_bo_size(bo), >vram_pin_size);
  atomic64_sub(amdgpu_vram_mgr_bo_visible_size(bo),
   >visible_pin_size);
-    } else if (bo->tbo.resource->mem_type == TTM_PL_TT) {
+    } else if (bo->tbo.resource->mem_type == TTM_PL_TT ||
+   bo->tbo.resource->mem_type == AMDGPU_PL_PREEMPT) {
  atomic64_sub(amdgpu_bo_size(bo), >gart_pin_size);
  }
@@ -1298,7 +1299,6 @@ void amdgpu_bo_get_memory(struct amdgpu_bo *bo,
  stats->vram_shared += size;
  break;
  case TTM_PL_TT:
-    case AMDGPU_PL_PREEMPT:
  stats->gtt += size;
  if (shared)
  stats->gtt_shared += size;
@@ -1599,7 +1599,6 @@ u64 amdgpu_bo_print_info(int id, struct 
amdgpu_bo *bo, struct seq_file *m)

  placement = "VRAM";
  break;
  case TTM_PL_TT:
-    case AMDGPU_PL_PREEMPT:
  placement = "GTT";
  break;
  case TTM_PL_SYSTEM:


Re: [PATCH] drm/amdgpu: fix doorbell regression

2024-04-29 Thread Christian König

Am 29.04.24 um 14:50 schrieb Shashank Sharma:

This patch adds the missing handling of the doorbell PL domain while
handling VRAM faults.

Fixes: a6ff969fe9cb ("drm/amdgpu: fix visible VRAM handling during faults")
Cc: Christian Koenig 
Cc: Alex Deucher 
Signed-off-by: Shashank Sharma 
Signed-off-by: Arvind Yadav 


Reviewed-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 1d71729e3f6b..c71eeb6a04e6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -419,7 +419,7 @@ bool amdgpu_res_cpu_visible(struct amdgpu_device *adev,
return false;
  
  	if (res->mem_type == TTM_PL_SYSTEM || res->mem_type == TTM_PL_TT ||

-   res->mem_type == AMDGPU_PL_PREEMPT)
+   res->mem_type == AMDGPU_PL_PREEMPT || res->mem_type == 
AMDGPU_PL_DOORBELL)
return true;
  
  	if (res->mem_type != TTM_PL_VRAM)




[PATCH] drm/amdgpu: fix doorbell regression

2024-04-29 Thread Shashank Sharma
This patch adds the missing handling of the doorbell PL domain while
handling VRAM faults.

Fixes: a6ff969fe9cb ("drm/amdgpu: fix visible VRAM handling during faults")
Cc: Christian Koenig 
Cc: Alex Deucher 
Signed-off-by: Shashank Sharma 
Signed-off-by: Arvind Yadav 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 1d71729e3f6b..c71eeb6a04e6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -419,7 +419,7 @@ bool amdgpu_res_cpu_visible(struct amdgpu_device *adev,
return false;
 
if (res->mem_type == TTM_PL_SYSTEM || res->mem_type == TTM_PL_TT ||
-   res->mem_type == AMDGPU_PL_PREEMPT)
+   res->mem_type == AMDGPU_PL_PREEMPT || res->mem_type == AMDGPU_PL_DOORBELL)
return true;
 
if (res->mem_type != TTM_PL_VRAM)
-- 
2.43.2



Re: [PATCH 3/3] drm/amdgpu: Fix pinned GART area accounting and fdinfo reporting

2024-04-29 Thread Christian König

On 29.04.24 at 11:43, Tvrtko Ursulin wrote:


On 26/04/2024 23:24, Felix Kuehling wrote:


On 2024-04-26 12:43, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

When commit b453e42a6e8b ("drm/amdgpu: Add new placement for preemptible
SG BOs") added a new TTM region, it missed the conceptual
imbalance in GART pin size accounting as done in amdgpu_bo_pin/unpin.

That imbalance leads to such objects being accounted against the
resource but not un-accounted when unpinned.


AMDGPU_PL_PREEMPT is mostly used for userptr BOs, which cannot be 
pinned. In any case you should make sure that the accounting is 
consistent between amdgpu_bo_pin_restricted and amdgpu_bo_unpin. This 
patch breaks that consistency.


You mean amdgpu_bo_pin(_restricted) and amdgpu_bo_unpin do not run for 
such objects, or something else?


If they run, then at the end of pin there is:

domain = amdgpu_mem_type_to_domain(bo->tbo.resource->mem_type);
...
} else if (domain == AMDGPU_GEM_DOMAIN_GTT) {
    atomic64_add(amdgpu_bo_size(bo), &adev->gart_pin_size);

And unpin has no handling for AMDGPU_PL_PREEMPT.
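
The imbalance discussed above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: the `model_*` names and enum values are hypothetical stand-ins, not kernel symbols, and the pin side is assumed to account both TT and PREEMPT placements, as the patch description in this thread states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for TTM_PL_* / AMDGPU_PL_PREEMPT; values are illustrative. */
enum model_placement {
	MODEL_PL_SYSTEM,
	MODEL_PL_TT,
	MODEL_PL_VRAM,
	MODEL_PL_PREEMPT
};

/* Toy counterpart of the driver's gart_pin_size counter. */
struct model_stats {
	int64_t gart_pin_size;
};

/* Pin side (assumed): TT and PREEMPT placements are both accounted. */
static void model_pin(struct model_stats *s, enum model_placement pl,
		      int64_t size)
{
	if (pl == MODEL_PL_TT || pl == MODEL_PL_PREEMPT)
		s->gart_pin_size += size;
}

/* Unpin side; 'fixed' selects the extended criteria proposed in the patch. */
static void model_unpin(struct model_stats *s, enum model_placement pl,
			int64_t size, bool fixed)
{
	if (pl == MODEL_PL_TT || (fixed && pl == MODEL_PL_PREEMPT))
		s->gart_pin_size -= size;
}

/* Bytes still accounted after one pin/unpin cycle; 0 means balanced. */
static int64_t model_leak(enum model_placement pl, int64_t size, bool fixed)
{
	struct model_stats s = { 0 };

	assert(size >= 0);
	model_pin(&s, pl, size);
	model_unpin(&s, pl, size, fixed);
	return s.gart_pin_size;
}
```

In this model, a pin/unpin cycle on a PREEMPT placement leaves the full size accounted unless the unpin check is extended, while TT placements balance either way.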

Ah I see.. does it rely on amdgpu_mem_type_to_domain returning 0 for 
AMDGPU_PL_PREEMPT? My confusion was I misread the pinning check as 
checking the domain as stored in the bo at creation time.


Although I am still confused by the statement that userptr BOs are not 
pinned. Is it not necessary to map them via GART on AMD hardware for the 
GPU to be able to access them?


No, a GART mapping is only needed if you want to scanout from them or 
otherwise use them from the kernel on the GPU.


The background is that the kernel doesn't have a VM with page tables.


Fix by extending the accounting criteria in amdgpu_bo_unpin.

What also appears to need fixing is reporting their size from
amdgpu_bo_get_memory, which is used to implement fdinfo stats. They should not
be reported there, so they are not mixed with the regular userspace-created
and driver-owned objects.


I think that's true. It's a very fine distinction. AMDGPU_PL_PREEMPT 
does use system memory and it is GPU accessible, just like GTT. The 
only difference is that it is not subject to the GTT limits, because 
eviction of these BOs is handled by callbacks other than TTM evictions 
and doesn't need to wait for fences.


As in you think those two hunks of the patch are correct?


I think so as well, yes. But we still need a name for preemptible BOs 
while printing them in debugfs.


Regards,
Christian.



Regards,

Tvrtko



Regards,
   Felix




And also amdgpu_bo_print_info for debugfs reporting.

Note that the patch depends on the previous one which broke down the
relevant checks from the domain based to placement based.

Signed-off-by: Tvrtko Ursulin 
Fixes: b453e42a6e8b ("drm/amdgpu: Add new placement for preemptible SG BOs")

Cc: Felix Kuehling 
Cc: Christian König 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 5 ++---
  1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c

index fb984669fc3a..5a2bbc793953 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -1032,7 +1032,8 @@ void amdgpu_bo_unpin(struct amdgpu_bo *bo)
  atomic64_sub(amdgpu_bo_size(bo), &adev->vram_pin_size);
  atomic64_sub(amdgpu_vram_mgr_bo_visible_size(bo),
   &adev->visible_pin_size);
-    } else if (bo->tbo.resource->mem_type == TTM_PL_TT) {
+    } else if (bo->tbo.resource->mem_type == TTM_PL_TT ||
+   bo->tbo.resource->mem_type == AMDGPU_PL_PREEMPT) {
  atomic64_sub(amdgpu_bo_size(bo), &adev->gart_pin_size);
  }
@@ -1298,7 +1299,6 @@ void amdgpu_bo_get_memory(struct amdgpu_bo *bo,
  stats->vram_shared += size;
  break;
  case TTM_PL_TT:
-    case AMDGPU_PL_PREEMPT:
  stats->gtt += size;
  if (shared)
  stats->gtt_shared += size;
@@ -1599,7 +1599,6 @@ u64 amdgpu_bo_print_info(int id, struct amdgpu_bo *bo, struct seq_file *m)

  placement = "VRAM";
  break;
  case TTM_PL_TT:
-    case AMDGPU_PL_PREEMPT:
  placement = "GTT";
  break;
  case TTM_PL_SYSTEM:




[PATCH] drm/amdkfd: update buffer_{store, load}_* modifiers for gfx940

2024-04-29 Thread Lancelot SIX
Instruction modifiers of the untyped vector memory buffer instructions
(MUBUF encoded) changed in gfx940.  The slc, scc and glc modifiers have
been replaced with sc0, sc1 and nt.

The current CWSR trap handler is written using pre-gfx940 modifier
names, making the source incompatible with a strict gfx940 assembler.

This patch updates the cwsr_trap_handler_gfx9.asm source file to be
compatible with all gfx9 variants of the ISA.  The binary assembled code
is unchanged (so the behaviour is unchanged as well), only the source
representation is updated.
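
The compile-time selection described above can be sketched in plain C. This is an illustration only: the numeric values given to CHIP_ALDEBARAN and CHIP_GC_9_4_3 and the default ASIC_FAMILY are assumptions made for the example (only their ordering matters), and `vmem_modifiers()` is a hypothetical helper that does not exist in the trap handler source.

```c
#include <assert.h>
#include <string.h>

/* Illustrative ordering of ASIC families; the real values live elsewhere. */
#define CHIP_ALDEBARAN 1
#define CHIP_GC_9_4_3  2

/* Assumed default so the sketch compiles on its own. */
#ifndef ASIC_FAMILY
#define ASIC_FAMILY CHIP_GC_9_4_3
#endif

/* Pre-gfx940 assemblers accept slc/glc; gfx940 spells them sc0/nt. */
#if ASIC_FAMILY < CHIP_GC_9_4_3
#define VMEM_MODIFIERS "slc:1 glc:1"
#else
#define VMEM_MODIFIERS "sc0:1 nt:1"
#endif

/* Hypothetical helper: returns the modifier text chosen at compile time. */
static const char *vmem_modifiers(void)
{
	return VMEM_MODIFIERS;
}
```

Because the choice is made by the preprocessor, each build contains only one spelling of the modifiers, which is why the source can stay compatible with both assemblers.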

Signed-off-by: Lancelot SIX 
---
 .../drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm | 24 ---
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm 
b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm
index bb26338204f4..a2d597d7fb57 100644
--- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm
+++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm
@@ -48,6 +48,12 @@ var ACK_SQC_STORE=   1   //workaround for suspected SQC store bug causing
 var SAVE_AFTER_XNACK_ERROR =   1   //workaround for TCP store failure after XNACK error when ALLOW_REPLAY=0, for debugger
 var SINGLE_STEP_MISSED_WORKAROUND   =  (ASIC_FAMILY <= CHIP_ALDEBARAN) //workaround for lost MODE.DEBUG_EN exception when SAVECTX raised
 
+#if ASIC_FAMILY < CHIP_GC_9_4_3
+#define VMEM_MODIFIERS slc:1 glc:1
+#else
+#define VMEM_MODIFIERS sc0:1 nt:1
+#endif
+
 /**/
 /* variables */
 /**/
@@ -581,7 +587,7 @@ end
 L_SAVE_LDS_LOOP_VECTOR:
   ds_read_b64 v[0:1], v2   //x =LDS[a], byte address
   s_waitcnt lgkmcnt(0)
-  buffer_store_dwordx2  v[0:1], v2, s_save_buf_rsrc0, s_save_mem_offset offen:1  glc:1  slc:1
+  buffer_store_dwordx2  v[0:1], v2, s_save_buf_rsrc0, s_save_mem_offset VMEM_MODIFIERS offen:1
 // s_waitcnt vmcnt(0)
 // v_add_u32 v2, vcc[0:1], v2, v3
   v_add_u32 v2, v2, v3
@@ -979,17 +985,17 @@ L_TCP_STORE_CHECK_DONE:
 end
 
 function write_4vgprs_to_mem(s_rsrc, s_mem_offset)
-   buffer_store_dword v0, v0, s_rsrc, s_mem_offset slc:1 glc:1
-   buffer_store_dword v1, v0, s_rsrc, s_mem_offset slc:1 glc:1  offset:256
-   buffer_store_dword v2, v0, s_rsrc, s_mem_offset slc:1 glc:1  offset:256*2
-   buffer_store_dword v3, v0, s_rsrc, s_mem_offset slc:1 glc:1  offset:256*3
+   buffer_store_dword v0, v0, s_rsrc, s_mem_offset VMEM_MODIFIERS
+   buffer_store_dword v1, v0, s_rsrc, s_mem_offset VMEM_MODIFIERS offset:256
+   buffer_store_dword v2, v0, s_rsrc, s_mem_offset VMEM_MODIFIERS offset:256*2
+   buffer_store_dword v3, v0, s_rsrc, s_mem_offset VMEM_MODIFIERS offset:256*3
 end
 
 function read_4vgprs_from_mem(s_rsrc, s_mem_offset)
-   buffer_load_dword v0, v0, s_rsrc, s_mem_offset slc:1 glc:1
-   buffer_load_dword v1, v0, s_rsrc, s_mem_offset slc:1 glc:1 offset:256
-   buffer_load_dword v2, v0, s_rsrc, s_mem_offset slc:1 glc:1 offset:256*2
-   buffer_load_dword v3, v0, s_rsrc, s_mem_offset slc:1 glc:1 offset:256*3
+   buffer_load_dword v0, v0, s_rsrc, s_mem_offset VMEM_MODIFIERS
+   buffer_load_dword v1, v0, s_rsrc, s_mem_offset VMEM_MODIFIERS offset:256
+   buffer_load_dword v2, v0, s_rsrc, s_mem_offset VMEM_MODIFIERS offset:256*2
+   buffer_load_dword v3, v0, s_rsrc, s_mem_offset VMEM_MODIFIERS offset:256*3
s_waitcnt vmcnt(0)
 end
 

base-commit: cf743996352e327f483dc7d66606c90276f57380
-- 
2.34.1



Re: [PATCH 3/3] drm/amdgpu: Fix pinned GART area accounting and fdinfo reporting

2024-04-29 Thread Christian König

On 26.04.24 at 18:43, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

When commit b453e42a6e8b ("drm/amdgpu: Add new placement for preemptible
SG BOs") added a new TTM region, it missed the conceptual
imbalance in GART pin size accounting as done in amdgpu_bo_pin/unpin.

That imbalance leads to such objects being accounted against the
resource but not un-accounted when unpinned.

Fix by extending the accounting criteria in amdgpu_bo_unpin.

What also appears to need fixing is reporting their size from
amdgpu_bo_get_memory, which is used to implement fdinfo stats. They should not
be reported there, so they are not mixed with the regular userspace-created
and driver-owned objects.

And also amdgpu_bo_print_info for debugfs reporting.

Note that the patch depends on the previous one which broke down the
relevant checks from the domain based to placement based.

Signed-off-by: Tvrtko Ursulin 
Fixes: b453e42a6e8b ("drm/amdgpu: Add new placement for preemptible SG BOs")
Cc: Felix Kuehling 
Cc: Christian König 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 5 ++---
  1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index fb984669fc3a..5a2bbc793953 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -1032,7 +1032,8 @@ void amdgpu_bo_unpin(struct amdgpu_bo *bo)
atomic64_sub(amdgpu_bo_size(bo), &adev->vram_pin_size);
atomic64_sub(amdgpu_vram_mgr_bo_visible_size(bo),
 &adev->visible_pin_size);
-   } else if (bo->tbo.resource->mem_type == TTM_PL_TT) {
+   } else if (bo->tbo.resource->mem_type == TTM_PL_TT ||
+  bo->tbo.resource->mem_type == AMDGPU_PL_PREEMPT) {


Good catch, but please separate that one from the other changes since we 
probably want to backport it.



atomic64_sub(amdgpu_bo_size(bo), &adev->gart_pin_size);
}
  
@@ -1298,7 +1299,6 @@ void amdgpu_bo_get_memory(struct amdgpu_bo *bo,

stats->vram_shared += size;
break;
case TTM_PL_TT:
-   case AMDGPU_PL_PREEMPT:
stats->gtt += size;
if (shared)
stats->gtt_shared += size;
@@ -1599,7 +1599,6 @@ u64 amdgpu_bo_print_info(int id, struct amdgpu_bo *bo, struct seq_file *m)
placement = "VRAM";
break;
case TTM_PL_TT:
-   case AMDGPU_PL_PREEMPT:


Yeah, that makes sense as well. But we need a case for AMDGPU_PL_PREEMPT 
here as well then.


Regards,
Christian.


placement = "GTT";
break;
case TTM_PL_SYSTEM:




Re: [PATCH 2/3] drm/amdgpu: Reduce mem_type to domain double indirection

2024-04-29 Thread Christian König

On 26.04.24 at 18:43, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

All memory domains apart from AMDGPU_GEM_DOMAIN_GTT map 1:1 to TTM
placements. The former can be either AMDGPU_PL_PREEMPT or TTM_PL_TT,
depending on AMDGPU_GEM_CREATE_PREEMPTIBLE.

Simplify a few places in the code which convert the TTM placement into
a domain by checking against the current placement directly.
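
The simplification described above can be sketched as follows. The enum names and values here are hypothetical stand-ins for the TTM placements and GEM domains, and the mapping function is an assumed simplification of `amdgpu_mem_type_to_domain`, not the driver's actual implementation.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for TTM_PL_* and AMDGPU_PL_PREEMPT. */
enum sketch_placement {
	SKETCH_PL_SYSTEM,
	SKETCH_PL_TT,
	SKETCH_PL_VRAM,
	SKETCH_PL_PREEMPT,
	SKETCH_PL_COUNT
};

/* Hypothetical stand-ins for the AMDGPU_GEM_DOMAIN_* bits. */
enum sketch_domain {
	SKETCH_DOMAIN_CPU  = 0x1,
	SKETCH_DOMAIN_GTT  = 0x2,
	SKETCH_DOMAIN_VRAM = 0x4
};

/* Assumed mapping: two placements fold into the single GTT domain. */
static unsigned int sketch_placement_to_domain(enum sketch_placement pl)
{
	switch (pl) {
	case SKETCH_PL_VRAM:
		return SKETCH_DOMAIN_VRAM;
	case SKETCH_PL_TT:
	case SKETCH_PL_PREEMPT:
		return SKETCH_DOMAIN_GTT;
	default:
		return SKETCH_DOMAIN_CPU;
	}
}

/* Old style: convert the placement to a domain, then test the domain. */
static bool sketch_is_gtt_via_domain(enum sketch_placement pl)
{
	return (sketch_placement_to_domain(pl) & SKETCH_DOMAIN_GTT) != 0;
}

/* New style: test the current placement directly. */
static bool sketch_is_gtt_direct(enum sketch_placement pl)
{
	return pl == SKETCH_PL_TT || pl == SKETCH_PL_PREEMPT;
}
```

Under that assumed mapping the two checks agree for every placement, so the direct form removes an indirection without changing behaviour, and it leaves room for later patches to treat the two placements differently.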

Signed-off-by: Tvrtko Ursulin 


Reviewed-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c |  4 +--
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c  | 30 ++---
  2 files changed, 16 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
index 055ba2ea4c12..ff83f8d8628c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
@@ -165,8 +165,8 @@ static struct sg_table *amdgpu_dma_buf_map(struct 
dma_buf_attachment *attach,
if (r)
return ERR_PTR(r);
  
-	} else if (!(amdgpu_mem_type_to_domain(bo->tbo.resource->mem_type) &

-AMDGPU_GEM_DOMAIN_GTT)) {
+   } else if (bo->tbo.resource->mem_type != TTM_PL_TT &&
+  bo->tbo.resource->mem_type != AMDGPU_PL_PREEMPT) {
return ERR_PTR(-EBUSY);
}
  
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c

index 8bc79924d171..fb984669fc3a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -976,12 +976,12 @@ int amdgpu_bo_pin_restricted(struct amdgpu_bo *bo, u32 
domain,
  
  	ttm_bo_pin(&bo->tbo);
  
-	domain = amdgpu_mem_type_to_domain(bo->tbo.resource->mem_type);

-   if (domain == AMDGPU_GEM_DOMAIN_VRAM) {
+   if (bo->tbo.resource->mem_type == TTM_PL_VRAM) {
atomic64_add(amdgpu_bo_size(bo), &adev->vram_pin_size);
atomic64_add(amdgpu_vram_mgr_bo_visible_size(bo),
 &adev->visible_pin_size);
-   } else if (domain == AMDGPU_GEM_DOMAIN_GTT) {
+   } else if (bo->tbo.resource->mem_type == TTM_PL_TT ||
+  bo->tbo.resource->mem_type == AMDGPU_PL_PREEMPT) {
atomic64_add(amdgpu_bo_size(bo), &adev->gart_pin_size);
}
  
@@ -1280,7 +1280,6 @@ void amdgpu_bo_get_memory(struct amdgpu_bo *bo,

  {
uint64_t size = amdgpu_bo_size(bo);
struct drm_gem_object *obj;
-   unsigned int domain;
bool shared;
  
  	/* Abort if the BO doesn't currently have a backing store */

@@ -1290,21 +1289,21 @@ void amdgpu_bo_get_memory(struct amdgpu_bo *bo,
obj = &bo->tbo.base;
shared = drm_gem_object_is_shared_for_memory_stats(obj);
  
-	domain = amdgpu_mem_type_to_domain(bo->tbo.resource->mem_type);

-   switch (domain) {
-   case AMDGPU_GEM_DOMAIN_VRAM:
+   switch (bo->tbo.resource->mem_type) {
+   case TTM_PL_VRAM:
stats->vram += size;
if (amdgpu_bo_in_cpu_visible_vram(bo))
stats->visible_vram += size;
if (shared)
stats->vram_shared += size;
break;
-   case AMDGPU_GEM_DOMAIN_GTT:
+   case TTM_PL_TT:
+   case AMDGPU_PL_PREEMPT:
stats->gtt += size;
if (shared)
stats->gtt_shared += size;
break;
-   case AMDGPU_GEM_DOMAIN_CPU:
+   case TTM_PL_SYSTEM:
default:
stats->cpu += size;
if (shared)
@@ -1317,7 +1316,7 @@ void amdgpu_bo_get_memory(struct amdgpu_bo *bo,
if (bo->flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED)
stats->requested_visible_vram += size;
  
-		if (domain != AMDGPU_GEM_DOMAIN_VRAM) {

+   if (bo->tbo.resource->mem_type != TTM_PL_VRAM) {
stats->evicted_vram += size;
if (bo->flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED)
stats->evicted_visible_vram += size;
@@ -1592,19 +1591,18 @@ u64 amdgpu_bo_print_info(int id, struct amdgpu_bo *bo, 
struct seq_file *m)
u64 size;
  
  	if (dma_resv_trylock(bo->tbo.base.resv)) {

-   unsigned int domain;
-   domain = amdgpu_mem_type_to_domain(bo->tbo.resource->mem_type);
-   switch (domain) {
-   case AMDGPU_GEM_DOMAIN_VRAM:
+   switch (bo->tbo.resource->mem_type) {
+   case TTM_PL_VRAM:
if (amdgpu_bo_in_cpu_visible_vram(bo))
placement = "VRAM VISIBLE";
else
placement = "VRAM";
break;
-   case AMDGPU_GEM_DOMAIN_GTT:
+   case TTM_PL_TT:
+   case AMDGPU_PL_PREEMPT:
placement = "GTT";
break;
-   

Re: [PATCH 1/3] drm/amdgpu: Add amdgpu_bo_is_vm_bo helper

2024-04-29 Thread Christian König

On 26.04.24 at 18:43, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

Help code readability by replacing a bunch of:

bo->tbo.base.resv == vm->root.bo->tbo.base.resv

With:

amdgpu_bo_is_vm_bo(bo, vm)

No functional changes.
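
A minimal user-space sketch of the helper's idea follows. The `sketch_*` structures are hypothetical, heavily simplified stand-ins for the kernel types: a BO counts as belonging to the VM when it shares the reservation object of the VM's root BO.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical stand-ins for the kernel structures. */
struct sketch_resv {
	int unused;
};

struct sketch_bo {
	struct sketch_resv *resv;
};

struct sketch_vm {
	struct sketch_bo *root;
};

/* True when the BO shares the reservation object of the VM's root BO. */
static bool sketch_bo_is_vm_bo(const struct sketch_bo *bo,
			       const struct sketch_vm *vm)
{
	assert(vm && vm->root);
	return bo && bo->resv == vm->root->resv;
}
```

Folding the NULL check into the helper lets call sites that previously wrote `bo && ...` use the same predicate as those that did not.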


Ah, yes, that was on my TODO list as well.

But I would have rather added this to the VM instead. In other words 
move it to amdgpu_vm.h and call it amdgpu_vm_is_bo_always_valid() or 
something like that.


Regards,
Christian.



Signed-off-by: Tvrtko Ursulin 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c|  2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 14 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 31 +-
  3 files changed, 28 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 67c234bcf89f..32e4a9c6e805 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -174,7 +174,7 @@ static int amdgpu_gem_object_open(struct drm_gem_object 
*obj,
return -EPERM;
  
  	if (abo->flags & AMDGPU_GEM_CREATE_VM_ALWAYS_VALID &&

-   abo->tbo.base.resv != vm->root.bo->tbo.base.resv)
+   !amdgpu_bo_is_vm_bo(abo, vm))
return -EPERM;
  
  	r = amdgpu_bo_reserve(abo, false);

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
index be679c42b0b8..f2bb6965cc77 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
@@ -307,6 +307,20 @@ static inline struct amdgpu_bo *amdgpu_bo_shadowed(struct 
amdgpu_bo *bo)
return NULL;
  }
  
+/**

+ * amdgpu_bo_is_vm_bo - check if the BO is VM always valid
+ *
+ * @bo: BO to be tested.
+ * @vm: VM to test against.
+ *
+ * Returns true if the BO is VM always valid.
+ */
+static inline bool amdgpu_bo_is_vm_bo(struct amdgpu_bo *bo,
+ struct amdgpu_vm *vm)
+{
+   return bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv;
+}
+
  bool amdgpu_bo_is_amdgpu_bo(struct ttm_buffer_object *bo);
  void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain);
  
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c

index 8af3f0fd3073..6d6f0e325172 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -333,7 +333,7 @@ void amdgpu_vm_bo_base_init(struct amdgpu_vm_bo_base *base,
base->next = bo->vm_bo;
bo->vm_bo = base;
  
-	if (bo->tbo.base.resv != vm->root.bo->tbo.base.resv)

+   if (!amdgpu_bo_is_vm_bo(bo, vm))
return;
  
  	dma_resv_assert_held(vm->root.bo->tbo.base.resv);

@@ -1101,13 +1101,12 @@ static void amdgpu_vm_bo_get_memory(struct amdgpu_bo_va 
*bo_va,
 * For now ignore BOs which are currently locked and potentially
 * changing their location.
 */
-   if (bo->tbo.base.resv != vm->root.bo->tbo.base.resv &&
-   !dma_resv_trylock(bo->tbo.base.resv))
+   if (!amdgpu_bo_is_vm_bo(bo, vm) && !dma_resv_trylock(bo->tbo.base.resv))
return;
  
  	amdgpu_bo_get_memory(bo, stats);

-   if (bo->tbo.base.resv != vm->root.bo->tbo.base.resv)
-   dma_resv_unlock(bo->tbo.base.resv);
+   if (!amdgpu_bo_is_vm_bo(bo, vm))
+   dma_resv_unlock(bo->tbo.base.resv);
  }
  
  void amdgpu_vm_get_memory(struct amdgpu_vm *vm,

@@ -1203,8 +1202,7 @@ int amdgpu_vm_bo_update(struct amdgpu_device *adev, 
struct amdgpu_bo_va *bo_va,
uncached = false;
}
  
-	if (clear || (bo && bo->tbo.base.resv ==

- vm->root.bo->tbo.base.resv))
+   if (clear || amdgpu_bo_is_vm_bo(bo, vm))
last_update = &vm->last_update;
else
last_update = &bo_va->last_pt_update;
@@ -1246,7 +1244,7 @@ int amdgpu_vm_bo_update(struct amdgpu_device *adev, 
struct amdgpu_bo_va *bo_va,
 * the evicted list so that it gets validated again on the
 * next command submission.
 */
-   if (bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv) {
+   if (amdgpu_bo_is_vm_bo(bo, vm)) {
uint32_t mem_type = bo->tbo.resource->mem_type;
  
  		if (!(bo->preferred_domains &

@@ -1640,10 +1638,9 @@ static void amdgpu_vm_bo_insert_map(struct amdgpu_device 
*adev,
if (mapping->flags & AMDGPU_PTE_PRT)
amdgpu_vm_prt_get(adev);
  
-	if (bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv &&

-   !bo_va->base.moved) {
+   if (amdgpu_bo_is_vm_bo(bo, vm) && !bo_va->base.moved)
amdgpu_vm_bo_moved(&bo_va->base);
-   }
+
trace_amdgpu_vm_bo_map(bo_va, mapping);
  }
  
@@ -1922,8 +1919,7 @@ int amdgpu_vm_bo_clear_mappings(struct amdgpu_device *adev,

if (before->flags & AMDGPU_PTE_PRT)
amdgpu_vm_prt_get(adev);
  
-		if (bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv &&

RE: [PATCH 1/2] drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs

2024-04-29 Thread Yu, Lang
[Public]

>-Original Message-
>From: Kuehling, Felix 
>Sent: Saturday, April 27, 2024 6:52 AM
>To: Yu, Lang ; amd-gfx@lists.freedesktop.org
>Cc: Yang, Philip ; Koenig, Christian
>; Zhang, Yifan ; Liu,
>Aaron 
>Subject: Re: [PATCH 1/2] drm/amdkfd: Let VRAM allocations go to GTT
>domain on small APUs
>
>
>On 2024-04-26 04:37, Lang Yu wrote:
>> Small APUs (i.e., consumer and embedded products) usually have a small
>> carveout of device memory, which can't satisfy the memory allocation
>> requirements of most compute workloads.
>>
>> We can't even run a basic MNIST example with a default 512MB carveout.
>> https://github.com/pytorch/examples/tree/main/mnist.
>>
>> We can change BIOS settings to enlarge the carveout, but that is
>> inflexible and may draw complaints. Moreover, the memory can't be
>> used effectively between host and device.
>>
>> The solution is the MI300A approach, i.e., let VRAM allocations go to GTT.
>>
>> Signed-off-by: Lang Yu 
>
>Two nit-picks inline. Other than that, this patch looks reasonable to me.

Thanks. Will update them accordingly.

Regards,
Lang

>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c|  6 +-
>>   .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 21 +++-
>---
>>   drivers/gpu/drm/amd/amdkfd/kfd_migrate.c  |  2 +-
>>   drivers/gpu/drm/amd/amdkfd/kfd_svm.c  |  6 --
>>   drivers/gpu/drm/amd/amdkfd/kfd_svm.h  |  3 ++-
>>   5 files changed, 24 insertions(+), 14 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>> index 7ba05f030dd1..3295838e9a1d 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>> @@ -456,7 +456,9 @@ void amdgpu_amdkfd_get_local_mem_info(struct amdgpu_device *adev,
>>  mem_info->local_mem_size_private =
>>  KFD_XCP_MEMORY_SIZE(adev, xcp->id);
>>  } else {
>> -mem_info->local_mem_size_public = adev->gmc.visible_vram_size;
>> +mem_info->local_mem_size_public = adev->flags & AMD_IS_APU ?
>> +  (ttm_tt_pages_limit() << PAGE_SHIFT) :
>> +  adev->gmc.visible_vram_size;
>>  mem_info->local_mem_size_private = adev->gmc.real_vram_size -
>>  adev->gmc.visible_vram_size;
>
>On an APU the private size should be reported as 0.
>
>
>>  }
>> @@ -824,6 +826,8 @@ u64 amdgpu_amdkfd_xcp_memory_size(struct
>amdgpu_device *adev, int xcp_id)
>>  }
>>  do_div(tmp, adev->xcp_mgr->num_xcp_per_mem_partition);
>>  return ALIGN_DOWN(tmp, PAGE_SIZE);
>> +} else if (adev->flags & AMD_IS_APU) {
>> +return (ttm_tt_pages_limit() << PAGE_SHIFT);
>>  } else {
>>  return adev->gmc.real_vram_size;
>>  }
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> index c4f9960dafbb..7eb5afcc4895 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> @@ -196,7 +196,7 @@ int amdgpu_amdkfd_reserve_mem_limit(struct
>amdgpu_device *adev,
>>  return -EINVAL;
>>
>>  vram_size = KFD_XCP_MEMORY_SIZE(adev, xcp_id);
>> -if (adev->gmc.is_app_apu) {
>> +if (adev->gmc.is_app_apu || adev->flags & AMD_IS_APU) {
>>  system_mem_needed = size;
>>  ttm_mem_needed = size;
>>  }
>> @@ -232,7 +232,8 @@ int amdgpu_amdkfd_reserve_mem_limit(struct
>amdgpu_device *adev,
>>"adev reference can't be null when vram is used");
>>  if (adev && xcp_id >= 0) {
>>  adev->kfd.vram_used[xcp_id] += vram_needed;
>> -adev->kfd.vram_used_aligned[xcp_id] += adev-
>>gmc.is_app_apu ?
>> +adev->kfd.vram_used_aligned[xcp_id] +=
>> +(adev->gmc.is_app_apu || adev->flags &
>AMD_IS_APU) ?
>>  vram_needed :
>>  ALIGN(vram_needed,
>VRAM_AVAILABLITY_ALIGN);
>>  }
>> @@ -260,7 +261,7 @@ void
>amdgpu_amdkfd_unreserve_mem_limit(struct
>> amdgpu_device *adev,
>>
>>  if (adev) {
>>  adev->kfd.vram_used[xcp_id] -= size;
>> -if (adev->gmc.is_app_apu) {
>> +if (adev->gmc.is_app_apu || adev->flags &
>AMD_IS_APU) {
>>  adev->kfd.vram_used_aligned[xcp_id] -= size;
>>  kfd_mem_limit.system_mem_used -= size;
>>  kfd_mem_limit.ttm_mem_used -= size; @@ -
>889,7 +890,7 @@ static
>> int kfd_mem_attach(struct amdgpu_device *adev, struct kgd_mem *mem,
>>   * if peer device has large BAR. In contrast, access over xGMI is
>>   * 
