RE: [PATCH] drm/amdgpu: Add fatal error handling in nbio v4_3

2023-03-22 Thread Zhou1, Tao
[AMD Official Use Only - General]

Reviewed-by: Tao Zhou 

> -Original Message-
> From: Zhang, Hawking 
> Sent: Thursday, March 23, 2023 10:24 AM
> To: amd-gfx@lists.freedesktop.org; Zhou1, Tao ; Yang,
> Stanley ; Li, Candice ; Chai,
> Thomas 
> Cc: Zhang, Hawking 
> Subject: [PATCH] drm/amdgpu: Add fatal error handling in nbio v4_3
> 
> The GPU will stop working once a fatal error is detected. It will inform
> the driver to do a reset to recover from the fatal error.
> 
> Signed-off-by: Hawking Zhang 

RE: [PATCH] drm/amdgpu: Add fatal error handling in nbio v4_3

2023-03-22 Thread Li, Candice
[Public]

Reviewed-by: Candice Li 



Thanks,
Candice

-Original Message-
From: Zhang, Hawking  
Sent: Thursday, March 23, 2023 10:24 AM
To: amd-gfx@lists.freedesktop.org; Zhou1, Tao ; Yang, 
Stanley ; Li, Candice ; Chai, Thomas 

Cc: Zhang, Hawking 
Subject: [PATCH] drm/amdgpu: Add fatal error handling in nbio v4_3

The GPU will stop working once a fatal error is detected. It will inform
the driver to do a reset to recover from the fatal error.

Signed-off-by: Hawking Zhang 

[PATCH] drm/amdgpu: Add fatal error handling in nbio v4_3

2023-03-22 Thread Hawking Zhang
The GPU will stop working once a fatal error is detected. It will inform
the driver to do a reset to recover from the fatal error.

Signed-off-by: Hawking Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 11 
 drivers/gpu/drm/amd/amdgpu/nbio_v4_3.c  | 79 +
 drivers/gpu/drm/amd/amdgpu/nbio_v4_3.h  |  1 +
 drivers/gpu/drm/amd/amdgpu/soc21.c  | 15 -
 4 files changed, 105 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index c6dc3cd2a9de..5b1779021881 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -34,6 +34,7 @@
 #include "amdgpu_atomfirmware.h"
 #include "amdgpu_xgmi.h"
 #include "ivsrcid/nbio/irqsrcs_nbif_7_4.h"
+#include "nbio_v4_3.h"
 #include "atom.h"
 #include "amdgpu_reset.h"
 
@@ -2562,6 +2563,16 @@ int amdgpu_ras_init(struct amdgpu_device *adev)
if (!adev->gmc.xgmi.connected_to_cpu)
			adev->nbio.ras = &nbio_v7_4_ras;
break;
+	case IP_VERSION(4, 3, 0):
+		if (adev->ras_hw_enabled & (1 << AMDGPU_RAS_BLOCK__DF))
+			/* unlike other generations of nbio ras,
+			 * nbio v4_3 only supports a fatal error interrupt
+			 * to inform software that DF is frozen due to a
+			 * system fatal error event. the driver should not
+			 * enable nbio ras in such a case. Instead,
+			 * check DF RAS */
+			adev->nbio.ras = &nbio_v4_3_ras;
+		break;
default:
/* nbio ras is not available */
break;
diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v4_3.c 
b/drivers/gpu/drm/amd/amdgpu/nbio_v4_3.c
index 09fdcd20cb91..d5ed9e0e1a5f 100644
--- a/drivers/gpu/drm/amd/amdgpu/nbio_v4_3.c
+++ b/drivers/gpu/drm/amd/amdgpu/nbio_v4_3.c
@@ -26,6 +26,7 @@
 
 #include "nbio/nbio_4_3_0_offset.h"
 #include "nbio/nbio_4_3_0_sh_mask.h"
+#include "ivsrcid/nbio/irqsrcs_nbif_7_4.h"
 #include 
 
 static void nbio_v4_3_remap_hdp_registers(struct amdgpu_device *adev)
@@ -538,3 +539,81 @@ const struct amdgpu_nbio_funcs nbio_v4_3_sriov_funcs = {
.remap_hdp_registers = nbio_v4_3_remap_hdp_registers,
.get_rom_offset = nbio_v4_3_get_rom_offset,
 };
+
+static int nbio_v4_3_set_ras_err_event_athub_irq_state(struct amdgpu_device *adev,
+						       struct amdgpu_irq_src *src,
+						       unsigned type,
+						       enum amdgpu_interrupt_state state)
+{
+	/* The ras_controller_irq enablement should be done in psp bl when it
+	 * tries to enable the ras feature. The driver only needs to set the
+	 * correct interrupt vector for the bare-metal and sriov use cases
+	 * respectively.
+	 */
+   uint32_t bif_doorbell_int_cntl;
+
+	bif_doorbell_int_cntl = RREG32_SOC15(NBIO, 0, regBIF_BX0_BIF_DOORBELL_INT_CNTL);
+	bif_doorbell_int_cntl = REG_SET_FIELD(bif_doorbell_int_cntl,
+					      BIF_BX0_BIF_DOORBELL_INT_CNTL,
+					      RAS_ATHUB_ERR_EVENT_INTERRUPT_DISABLE,
+					      (state == AMDGPU_IRQ_STATE_ENABLE) ? 0 : 1);
+	WREG32_SOC15(NBIO, 0, regBIF_BX0_BIF_DOORBELL_INT_CNTL, bif_doorbell_int_cntl);
+
+   return 0;
+}
+
+static int nbio_v4_3_process_err_event_athub_irq(struct amdgpu_device *adev,
+struct amdgpu_irq_src *source,
+struct amdgpu_iv_entry *entry)
+{
+	/* By design, the ih cookie for err_event_athub_irq should be written
+	 * to the bif ring. Since the bif ring is not enabled, just leave the
+	 * process callback as a dummy one.
+	 */
+   return 0;
+}
+
+static const struct amdgpu_irq_src_funcs nbio_v4_3_ras_err_event_athub_irq_funcs = {
+   .set = nbio_v4_3_set_ras_err_event_athub_irq_state,
+   .process = nbio_v4_3_process_err_event_athub_irq,
+};
+
+static void nbio_v4_3_handle_ras_err_event_athub_intr_no_bifring(struct amdgpu_device *adev)
+{
+   uint32_t bif_doorbell_int_cntl;
+
+	bif_doorbell_int_cntl = RREG32_SOC15(NBIO, 0, regBIF_BX0_BIF_DOORBELL_INT_CNTL);
+	if (REG_GET_FIELD(bif_doorbell_int_cntl,
+			  BIF_BX0_BIF_DOORBELL_INT_CNTL,
+			  RAS_ATHUB_ERR_EVENT_INTERRUPT_STATUS)) {
+		/* driver has to clear the interrupt status when bif ring is disabled */
+		bif_doorbell_int_cntl = REG_SET_FIELD(bif_doorbell_int_cntl,
+					BIF_BX0_BIF_DOORBELL_INT_CNTL,
+					RAS_ATHUB_ERR_EVENT_INTERRUPT_CLEAR, 1);
+		WREG32_SOC15(NBIO, 0, regBIF_BX0_BIF_DOORBELL_INT_CNTL, bif_doorbell_int_cntl);
+   

RE: [Resend PATCH v1 3/3] drm/amd/pm: vangogh: support to send SMT enable message

2023-03-22 Thread Yuan, Perry
[AMD Official Use Only - General]



> -Original Message-
> From: Wenyou Yang 
> Sent: Wednesday, March 22, 2023 5:16 PM
> To: Deucher, Alexander ; Koenig, Christian
> ; Pan, Xinhui 
> Cc: Yuan, Perry ; Liang, Richard qi
> ; Li, Ying ; Liu, Kun
> ; amd-gfx@lists.freedesktop.org; Yang, WenYou
> 
> Subject: [Resend PATCH v1 3/3] drm/amd/pm: vangogh: support to send SMT
> enable message
> 
> Add support for the PPSMC_MSG_SetCClkSMTEnable (0x58) message to the pmfw
> for vangogh.
> 
> Signed-off-by: Wenyou Yang 
> ---
>  .../pm/swsmu/inc/pmfw_if/smu_v11_5_ppsmc.h    |  3 ++-
>  drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h  |  3 ++-
>  .../gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c  | 19 +++
>  3 files changed, 23 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu_v11_5_ppsmc.h
> b/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu_v11_5_ppsmc.h
> index 7471e2df2828..2b182dbc6f9c 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu_v11_5_ppsmc.h
> +++ b/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu_v11_5_ppsmc.h
> @@ -111,7 +111,8 @@
>  #define PPSMC_MSG_GetGfxOffStatus   0x50
>  #define PPSMC_MSG_GetGfxOffEntryCount   0x51
>  #define PPSMC_MSG_LogGfxOffResidency    0x52
> -#define PPSMC_Message_Count             0x53
> +#define PPSMC_MSG_SetCClkSMTEnable      0x58
> +#define PPSMC_Message_Count             0x59
> 
>  //Argument for PPSMC_MSG_GfxDeviceDriverReset
>  enum {
> 
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h
> b/drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h
> index 297b70b9388f..820812d910bf 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h
> +++ b/drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h
> @@ -245,7 +245,8 @@
>   __SMU_DUMMY_MAP(AllowGpo),  \
>   __SMU_DUMMY_MAP(Mode2Reset),\
>   __SMU_DUMMY_MAP(RequestI2cTransaction), \
> - __SMU_DUMMY_MAP(GetMetricsTable),
> + __SMU_DUMMY_MAP(GetMetricsTable), \
> + __SMU_DUMMY_MAP(SetCClkSMTEnable),
> 
>  #undef __SMU_DUMMY_MAP
>  #define __SMU_DUMMY_MAP(type)	SMU_MSG_##type
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c
> b/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c
> index 7433dcaa16e0..f0eeb42df96b 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c
> @@ -141,6 +141,7 @@ static struct cmn2asic_msg_mapping vangogh_message_map[SMU_MSG_MAX_COUNT] = {
>  	MSG_MAP(GetGfxOffStatus,		PPSMC_MSG_GetGfxOffStatus,		0),
>  	MSG_MAP(GetGfxOffEntryCount,		PPSMC_MSG_GetGfxOffEntryCount,		0),
>  	MSG_MAP(LogGfxOffResidency,		PPSMC_MSG_LogGfxOffResidency,		0),
> +	MSG_MAP(SetCClkSMTEnable,		PPSMC_MSG_SetCClkSMTEnable,		0),
>  };
> 
>  static struct cmn2asic_mapping vangogh_feature_mask_map[SMU_FEATURE_COUNT] = {
> @@ -2428,6 +2429,23 @@ static u32 vangogh_get_gfxoff_entrycount(struct smu_context *smu, uint64_t *entr
>   return ret;
>  }
> 
> +static int vangogh_set_cpu_smt_enable(struct smu_context *smu, bool enable)
> +{
> +	int ret = 0;
> +
> +	if (enable) {
> +		ret = smu_cmn_send_smc_msg_with_param(smu,
> +						      SMU_MSG_SetCClkSMTEnable,
> +						      1, NULL);
> +	} else {
> +		ret = smu_cmn_send_smc_msg_with_param(smu,
> +						      SMU_MSG_SetCClkSMTEnable,
> +						      0, NULL);
> +	}
> +
> +	return ret;
> +}
> +
>  static const struct pptable_funcs vangogh_ppt_funcs = {
> 
>  	.check_fw_status = smu_v11_0_check_fw_status,
> @@ -2474,6 +2492,7 @@ static const struct pptable_funcs vangogh_ppt_funcs = {
>   .get_power_limit = vangogh_get_power_limit,
>   .set_power_limit = vangogh_set_power_limit,
>   .get_vbios_bootup_values = smu_v11_0_get_vbios_bootup_values,
> + .set_cpu_smt_enable = vangogh_set_cpu_smt_enable,

Maybe we can rename the function with a cclk dpm string?
For example:
.set_cclk_pd_limit = vangogh_set_cpu_smt_enable,

Perry. 

>  };
> 
>  void vangogh_set_ppt_funcs(struct smu_context *smu)
> --
> 2.39.2


Re: [PATCH 32/32] drm/amdkfd: bump kfd ioctl minor version for debug api availability

2023-03-22 Thread Felix Kuehling

On 2023-01-25 at 14:54, Jonathan Kim wrote:

Bump the minor version to declare debugging capability is now
available.

Signed-off-by: Jonathan Kim 
Reviewed-by: Felix Kuehling 


This needs to be bumped to 1.13 once you rebase on the latest staging. 
With that fixed, the patch is


Reviewed-by: Felix Kuehling 



---
  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 1 -
  include/uapi/linux/kfd_ioctl.h   | 3 ++-
  2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index da74a6ef4d9b..c28d4b2dd0ef 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -2896,7 +2896,6 @@ static int kfd_ioctl_set_debug_trap(struct file *filep, struct kfd_process *p, void *data)
 		if (!r)
 			target->exception_enable_mask = args->enable.exception_mask;
  
-		pr_warn("Debug functions limited\n");

break;
case KFD_IOC_DBG_TRAP_DISABLE:
r = kfd_dbg_trap_disable(target);
diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index 9ef4eed45c19..a0efe1ccdbd6 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -37,9 +37,10 @@
   * - 1.9 - Add available memory ioctl
   * - 1.10 - Add SMI profiler event log
   * - 1.11 - Add unified memory for ctx save/restore area
+ * - 1.12 - Add debugger API
   */
  #define KFD_IOCTL_MAJOR_VERSION 1
-#define KFD_IOCTL_MINOR_VERSION 11
+#define KFD_IOCTL_MINOR_VERSION 12
  
  struct kfd_ioctl_get_version_args {

__u32 major_version;/* from KFD */
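
For reference, a user-space debugger would gate on this bump before issuing
any KFD_IOC_DBG_TRAP_* calls. A minimal sketch (error handling omitted; the
ioctl and struct come from include/uapi/linux/kfd_ioctl.h):

	struct kfd_ioctl_get_version_args ver = {0};

	if (ioctl(kfd_fd, AMDKFD_IOC_GET_VERSION, &ver) == 0 &&
	    (ver.major_version > 1 || ver.minor_version >= 12))
		debug_api_available = true;	/* >= 13 after the rebase noted above */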


Re: [PATCH 31/32] drm/amdkfd: add debug device snapshot operation

2023-03-22 Thread Felix Kuehling

On 2023-01-25 at 14:54, Jonathan Kim wrote:

Similar to queue snapshot, return an array of device information using
an entry_size check and return.
Unlike queue snapshots, the debugger needs to pass the correct number of
devices that exist.  If it fails to do so, the KFD will return the
number of actual devices so that the debugger can make a subsequent
successful call.

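In practice this is a two-call protocol driven from user space. A rough
sketch (the wrapper name kfd_dbg_trap_ioctl is hypothetical; the argument
layout follows the device_snapshot args used below):

	/* call 1: probe with num_devices == 0 to learn the real count */
	args.device_snapshot.num_devices = 0;
	kfd_dbg_trap_ioctl(fd, &args);
	n = args.device_snapshot.num_devices;	/* KFD wrote back target->n_pdds */

	/* call 2: retry with a buffer sized for the reported count */
	args.device_snapshot.snapshot_buf_ptr =
		(uint64_t)(uintptr_t)calloc(n, args.device_snapshot.entry_size);
	args.device_snapshot.num_devices = n;
	kfd_dbg_trap_ioctl(fd, &args);
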
v3: was reviewed but re-requesting review with new revision and
subvendor information.
memset 0 device info entry to clear padding.

v2: change buf_size are to num_devices for more clarity.
expand device entry new members on copy.
fix minimum entry size calculation for queue and device snapshot.
change device snapshot implementation to match queue snapshot
implementation.

Signed-off-by: Jonathan Kim 


Reviewed-by: Felix Kuehling 



---
  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c |  7 ++-
  drivers/gpu/drm/amd/amdkfd/kfd_debug.c   | 72 
  drivers/gpu/drm/amd/amdkfd/kfd_debug.h   |  5 ++
  3 files changed, 82 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 93b288233577..da74a6ef4d9b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -2972,8 +2972,11 @@ static int kfd_ioctl_set_debug_trap(struct file *filep, struct kfd_process *p, void *data)
 			&args->queue_snapshot.entry_size);
break;
case KFD_IOC_DBG_TRAP_GET_DEVICE_SNAPSHOT:
-   pr_warn("Debug op %i not supported yet\n", args->op);
-   r = -EACCES;
+   r = kfd_dbg_trap_device_snapshot(target,
+   args->device_snapshot.exception_mask,
+			(void __user *)args->device_snapshot.snapshot_buf_ptr,
+			&args->device_snapshot.num_devices,
+			&args->device_snapshot.entry_size);
break;
default:
pr_err("Invalid option: %i\n", args->op);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
index db316f0625f8..d1c4eb9652fd 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
@@ -22,6 +22,7 @@
  
  #include "kfd_debug.h"

  #include "kfd_device_queue_manager.h"
+#include "kfd_topology.h"
  #include 
  #include 
  
@@ -998,6 +999,77 @@ int kfd_dbg_trap_query_exception_info(struct kfd_process *target,

return r;
  }
  
+int kfd_dbg_trap_device_snapshot(struct kfd_process *target,

+   uint64_t exception_clear_mask,
+   void __user *user_info,
+   uint32_t *number_of_device_infos,
+   uint32_t *entry_size)
+{
+   struct kfd_dbg_device_info_entry device_info;
+   uint32_t tmp_entry_size = *entry_size, tmp_num_devices;
+   int i, r = 0;
+
+   if (!(target && user_info && number_of_device_infos && entry_size))
+   return -EINVAL;
+
+   tmp_num_devices = min_t(size_t, *number_of_device_infos, 
target->n_pdds);
+   *number_of_device_infos = target->n_pdds;
+   *entry_size = min_t(size_t, *entry_size, sizeof(device_info));
+
+   if (!tmp_num_devices)
+   return 0;
+
+	memset(&device_info, 0, sizeof(device_info));
+
+	mutex_lock(&target->event_mutex);
+
+   /* Run over all pdd of the process */
+   for (i = 0; i < tmp_num_devices; i++) {
+   struct kfd_process_device *pdd = target->pdds[i];
+   struct kfd_topology_device *topo_dev = 
kfd_topology_device_by_id(pdd->dev->id);
+
+   device_info.gpu_id = pdd->dev->id;
+   device_info.exception_status = pdd->exception_status;
+   device_info.lds_base = pdd->lds_base;
+   device_info.lds_limit = pdd->lds_limit;
+   device_info.scratch_base = pdd->scratch_base;
+   device_info.scratch_limit = pdd->scratch_limit;
+   device_info.gpuvm_base = pdd->gpuvm_base;
+   device_info.gpuvm_limit = pdd->gpuvm_limit;
+   device_info.location_id = topo_dev->node_props.location_id;
+   device_info.vendor_id = topo_dev->node_props.vendor_id;
+   device_info.device_id = topo_dev->node_props.device_id;
+   device_info.revision_id = pdd->dev->adev->pdev->revision;
+   device_info.subsystem_vendor_id = 
pdd->dev->adev->pdev->subsystem_vendor;
+   device_info.subsystem_device_id = 
pdd->dev->adev->pdev->subsystem_device;
+   device_info.fw_version = pdd->dev->mec_fw_version;
+   device_info.gfx_target_version =
+   topo_dev->node_props.gfx_target_version;
+   device_info.simd_count = topo_dev->node_props.simd_count;
+   device_info.max_waves_per_simd =
+   topo_dev->node_props.max_waves_per_simd;
+   device_info.array_count = 

Re: [PATCH 30/32] drm/amdkfd: add debug queue snapshot operation

2023-03-22 Thread Felix Kuehling



On 2023-01-25 at 14:53, Jonathan Kim wrote:

Allow the debugger to get a snapshot of a specified number of queues
containing various queue property information that is copied to the
debugger.

Since the debugger doesn't know how many queues exist at any given time,
allow the debugger to pass the requested number of snapshots as 0 to get
the actual number of potential snapshots to use for a subsequent snapshot
request for actual information.

To prevent future ABI breakage, pass in the requested entry_size.
The KFD will return its own entry_size in case the debugger still wants
to log the information in a core dump on sizing failure.

Also allow the debugger to clear exceptions when doing a snapshot.

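The entry_size handshake amounts to copying min(user, kernel) bytes per
entry, so an older or newer debugger still receives the fields it knows
about. A simplified sketch of that copy loop (not the exact pqm code):

	/* negotiated size: never copy more than either side understands */
	copy_size = min_t(uint32_t, user_entry_size,
			  sizeof(struct kfd_queue_snapshot_entry));

	for (i = 0; i < num_entries; i++) {
		if (copy_to_user(buf + i * copy_size, &entries[i], copy_size))
			return -EFAULT;
	}
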
v3: fix uninitialized return and change queue snapshot to type void for
proper increment on buffer copy.
use memset 0 to init snapshot entry to clear struct padding.

v2: change buf_size arg to num_queues for clarity.
fix minimum entry size calculation.

Signed-off-by: Jonathan Kim 
---
  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  |  6 +++
  .../drm/amd/amdkfd/kfd_device_queue_manager.c | 36 
  .../drm/amd/amdkfd/kfd_device_queue_manager.h |  3 ++
  drivers/gpu/drm/amd/amdkfd/kfd_priv.h |  5 +++
  .../amd/amdkfd/kfd_process_queue_manager.c| 41 +++
  5 files changed, 91 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index d3d2026b6e65..93b288233577 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -2965,6 +2965,12 @@ static int kfd_ioctl_set_debug_trap(struct file *filep, struct kfd_process *p, void *data)
 			&args->query_exception_info.info_size);
break;
case KFD_IOC_DBG_TRAP_GET_QUEUE_SNAPSHOT:
+		r = pqm_get_queue_snapshot(&p->pqm,
+				args->queue_snapshot.exception_mask,
+				(void __user *)args->queue_snapshot.snapshot_buf_ptr,
+				&args->queue_snapshot.num_queues,
+				&args->queue_snapshot.entry_size);
+   break;
case KFD_IOC_DBG_TRAP_GET_DEVICE_SNAPSHOT:
pr_warn("Debug op %i not supported yet\n", args->op);
r = -EACCES;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 7792fe9491c5..5ae504a512f0 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -3000,6 +3000,42 @@ int suspend_queues(struct kfd_process *p,
return total_suspended;
  }
  
+static uint32_t set_queue_type_for_user(struct queue_properties *q_props)

+{
+   switch (q_props->type) {
+   case KFD_QUEUE_TYPE_COMPUTE:
+   return q_props->format == KFD_QUEUE_FORMAT_PM4
+   ? KFD_IOC_QUEUE_TYPE_COMPUTE
+   : KFD_IOC_QUEUE_TYPE_COMPUTE_AQL;
+   case KFD_QUEUE_TYPE_SDMA:
+   return KFD_IOC_QUEUE_TYPE_SDMA;
+   case KFD_QUEUE_TYPE_SDMA_XGMI:
+   return KFD_IOC_QUEUE_TYPE_SDMA_XGMI;
+   default:
+   WARN_ONCE(true, "queue type not recognized!");
+   return 0x;
+   };
+}
+
+void set_queue_snapshot_entry(struct queue *q,
+ uint64_t exception_clear_mask,
+ struct kfd_queue_snapshot_entry *qss_entry)
+{
+   qss_entry->ring_base_address = q->properties.queue_address;
+   qss_entry->write_pointer_address = (uint64_t)q->properties.write_ptr;
+   qss_entry->read_pointer_address = (uint64_t)q->properties.read_ptr;
+   qss_entry->ctx_save_restore_address =
+   q->properties.ctx_save_restore_area_address;
+   qss_entry->ctx_save_restore_area_size =
+   q->properties.ctx_save_restore_area_size;
+   qss_entry->exception_status = q->properties.exception_status;
+   qss_entry->queue_id = q->properties.queue_id;
+   qss_entry->gpu_id = q->device->id;
+   qss_entry->ring_size = (uint32_t)q->properties.queue_size;
+	qss_entry->queue_type = set_queue_type_for_user(&q->properties);
+   q->properties.exception_status &= ~exception_clear_mask;
+}
+
  int debug_lock_and_unmap(struct device_queue_manager *dqm)
  {
int r;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
index 7ccf8d0d1867..89d4a5b293a5 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
@@ -296,6 +296,9 @@ int suspend_queues(struct kfd_process *p,
  int resume_queues(struct kfd_process *p,
uint32_t num_queues,
uint32_t *usr_queue_id_array);
+void set_queue_snapshot_entry(struct queue 

Re: [PATCH 27/32] drm/amdkfd: add debug set flags operation

2023-03-22 Thread Felix Kuehling



On 2023-01-25 at 14:53, Jonathan Kim wrote:

Allow the debugger to set single memory and single ALU operations.

Some exceptions are imprecise (memory violations, address watch) in the
sense that a trap occurs only when the exception interrupt occurs and
not at the non-halting faulty instruction.  Trap temporaries 0 & 1 save
the program counter address, which means that these values will not point
to the faulty instruction address but to wherever execution was when the
interrupt was raised.

Setting the Single Memory Operations flag will inject an automatic wait
on every memory operation instruction forcing imprecise memory exceptions
to become precise at the cost of performance.  This setting is not
permitted on debug devices that support only a global setting of this
option.

Return the previous set flags to the debugger as well.

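Because the flags argument is in/out, one call both applies the new mode
and hands back the old one. A small sketch of the contract (illustrative,
not an exact call site):

	uint32_t flags = KFD_DBG_TRAP_FLAG_SINGLE_MEM_OP;
	int r = kfd_dbg_trap_set_flags(target, &flags);
	/* on success, flags now holds the previously set flags; the
	 * debugger can pass that value back later to restore the old mode */
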
v3: make precise mem op the only available flag for now.

v2: add gfx11 support.

Signed-off-by: Jonathan Kim 
---
  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c |  2 ++
  drivers/gpu/drm/amd/amdkfd/kfd_debug.c   | 38 
  drivers/gpu/drm/amd/amdkfd/kfd_debug.h   |  1 +
  3 files changed, 41 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 8f2ede781863..c34caa14b84e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -2947,6 +2947,8 @@ static int kfd_ioctl_set_debug_trap(struct file *filep, 
struct kfd_process *p, v
args->clear_node_address_watch.id);
break;
case KFD_IOC_DBG_TRAP_SET_FLAGS:
+		r = kfd_dbg_trap_set_flags(target, &args->set_flags.flags);
+   break;
case KFD_IOC_DBG_TRAP_QUERY_DEBUG_EVENT:
case KFD_IOC_DBG_TRAP_QUERY_EXCEPTION_INFO:
case KFD_IOC_DBG_TRAP_GET_QUEUE_SNAPSHOT:
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
index 8d2e1adb442d..77ba7da2bb9d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
@@ -23,6 +23,7 @@
  #include "kfd_debug.h"
  #include "kfd_device_queue_manager.h"
  #include 
+#include 
  
  #define MAX_WATCH_ADDRESSES	4
  
@@ -425,6 +426,40 @@ static void kfd_dbg_clear_process_address_watch(struct kfd_process *target)

kfd_dbg_trap_clear_dev_address_watch(target->pdds[i], 
j);
  }
  
+int kfd_dbg_trap_set_flags(struct kfd_process *target, uint32_t *flags)

+{
+   uint32_t prev_flags = target->dbg_flags;
+   int i, r = 0;
+
+   for (i = 0; i < target->n_pdds; i++) {
+   if (!kfd_dbg_is_per_vmid_supported(target->pdds[i]->dev) &&
+   (*flags & KFD_DBG_TRAP_FLAG_SINGLE_MEM_OP)) {
+   *flags = prev_flags;
+   return -EACCES;
+   }
+   }
+
+   target->dbg_flags = *flags & KFD_DBG_TRAP_FLAG_SINGLE_MEM_OP;
+   *flags = prev_flags;
+   for (i = 0; i < target->n_pdds; i++) {
+   struct kfd_process_device *pdd = target->pdds[i];
+
+   if (!kfd_dbg_is_per_vmid_supported(pdd->dev))
+   continue;
+
+   if (!pdd->dev->shared_resources.enable_mes)
+   r = debug_refresh_runlist(pdd->dev->dqm);
+   else
+   r = kfd_dbg_set_mes_debug_mode(pdd);
+
+   if (r) {
+   target->dbg_flags = prev_flags;
+   break;


Do we need to roll back changes on the other GPUs when this happens?



+   }
+   }
+
+   return r;
+}
  
  /* kfd_dbg_trap_deactivate:

   *target: target process
@@ -439,9 +474,12 @@ void kfd_dbg_trap_deactivate(struct kfd_process *target, 
bool unwind, int unwind
int i, count = 0;
  
  	if (!unwind) {

+   uint32_t flags = 0;


checkpatch.pl will complain without an empty line after the variable 
declaration.


Regards,
  Felix



 		cancel_work_sync(&target->debug_event_workarea);
kfd_dbg_clear_process_address_watch(target);
kfd_dbg_trap_set_wave_launch_mode(target, 0);
+
+		kfd_dbg_trap_set_flags(target, &flags);
}
  
  	for (i = 0; i < target->n_pdds; i++) {

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_debug.h
index 63c716ce5ab9..782362d82890 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.h
@@ -57,6 +57,7 @@ int kfd_dbg_trap_set_dev_address_watch(struct 
kfd_process_device *pdd,
uint32_t watch_address_mask,
uint32_t *watch_id,
uint32_t watch_mode);
+int kfd_dbg_trap_set_flags(struct kfd_process *target, uint32_t *flags);
  int kfd_dbg_send_exception_to_runtime(struct kfd_process *p,
unsigned int dev_id,

Re: [PATCH 26/32] drm/amdkfd: add debug set and clear address watch points operation

2023-03-22 Thread Felix Kuehling



On 2023-01-25 at 14:53, Jonathan Kim wrote:

Shader read, write and atomic memory operations can be alerted to the
debugger as an address watch exception.

Allow the debugger to pass in a watch point to a particular memory
address per device.

Note that there exist only 4 watch points per device to date, so have
the KFD keep track of which watch points are allocated or not.

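The bookkeeping can be pictured as a per-device bitmap of the 4 slots,
claimed under the device's spinlock. A sketch under those assumptions
(field names here are illustrative; the real tracking lives in kfd_debug.c):

	/* allocate the first free of MAX_WATCH_ADDRESSES (4) watch slots */
	spin_lock(&dev->watch_points_lock);
	for (i = 0; i < MAX_WATCH_ADDRESSES; i++) {
		if (!(dev->alloc_watch_ids & (1 << i))) {
			dev->alloc_watch_ids |= (1 << i);
			*watch_id = i;
			break;
		}
	}
	spin_unlock(&dev->watch_points_lock);
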
v3: add gfx11 support.
cleanup gfx9 kgd calls to set and clear address watch.
use per device spinlock to set watch points.
fixup runlist refresh calls on set/clear address watch.

v2: change dev_id arg to gpu_id for consistency

Signed-off-by: Jonathan Kim 
---
  .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c  |  51 +++
  .../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c   |   2 +
  .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c|  78 ++
  .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.h|   8 ++
  .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10_3.c  |   5 +-
  .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v11.c|  52 ++-
  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c |  77 ++
  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h |   8 ++
  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  |  24 
  drivers/gpu/drm/amd/amdkfd/kfd_debug.c| 136 ++
  drivers/gpu/drm/amd/amdkfd/kfd_debug.h|   8 +-
  drivers/gpu/drm/amd/amdkfd/kfd_device.c   |   1 +
  drivers/gpu/drm/amd/amdkfd/kfd_priv.h |   6 +-
  13 files changed, 451 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
index 4de2066215b4..18baf1cd8c01 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
@@ -118,6 +118,55 @@ static uint32_t kgd_aldebaran_set_wave_launch_mode(struct 
amdgpu_device *adev,
return data;
  }
  
+#define TCP_WATCH_STRIDE (regTCP_WATCH1_ADDR_H - regTCP_WATCH0_ADDR_H)

+static uint32_t kgd_gfx_aldebaran_set_address_watch(
+   struct amdgpu_device *adev,
+   uint64_t watch_address,
+   uint32_t watch_address_mask,
+   uint32_t watch_id,
+   uint32_t watch_mode,
+   uint32_t debug_vmid)
+{
+   uint32_t watch_address_high;
+   uint32_t watch_address_low;
+   uint32_t watch_address_cntl;
+
+   watch_address_cntl = 0;
+   watch_address_low = lower_32_bits(watch_address);
+	watch_address_high = upper_32_bits(watch_address) & 0xffff;
+
+   watch_address_cntl = REG_SET_FIELD(watch_address_cntl,
+   TCP_WATCH0_CNTL,
+   MODE,
+   watch_mode);
+
+   watch_address_cntl = REG_SET_FIELD(watch_address_cntl,
+   TCP_WATCH0_CNTL,
+   MASK,
+   watch_address_mask >> 6);
+
+   watch_address_cntl = REG_SET_FIELD(watch_address_cntl,
+   TCP_WATCH0_CNTL,
+   VALID,
+   1);
+
+   WREG32_RLC((SOC15_REG_OFFSET(GC, 0, regTCP_WATCH0_ADDR_H) +
+   (watch_id * TCP_WATCH_STRIDE)),
+   watch_address_high);
+
+   WREG32_RLC((SOC15_REG_OFFSET(GC, 0, regTCP_WATCH0_ADDR_L) +
+   (watch_id * TCP_WATCH_STRIDE)),
+   watch_address_low);
+
+   return watch_address_cntl;
+}
+
+uint32_t kgd_gfx_aldebaran_clear_address_watch(struct amdgpu_device *adev,
+   uint32_t watch_id)
+{
+   return 0;
+}
+
  const struct kfd2kgd_calls aldebaran_kfd2kgd = {
.program_sh_mem_settings = kgd_gfx_v9_program_sh_mem_settings,
.set_pasid_vmid_mapping = kgd_gfx_v9_set_pasid_vmid_mapping,
@@ -140,6 +189,8 @@ const struct kfd2kgd_calls aldebaran_kfd2kgd = {
.validate_trap_override_request = 
kgd_aldebaran_validate_trap_override_request,
.set_wave_launch_trap_override = 
kgd_aldebaran_set_wave_launch_trap_override,
.set_wave_launch_mode = kgd_aldebaran_set_wave_launch_mode,
+   .set_address_watch = kgd_gfx_aldebaran_set_address_watch,
+   .clear_address_watch = kgd_gfx_aldebaran_clear_address_watch,
.get_iq_wait_times = kgd_gfx_v9_get_iq_wait_times,
.build_grace_period_packet_info = 
kgd_gfx_v9_build_grace_period_packet_info,
.program_trap_handler_settings = 
kgd_gfx_v9_program_trap_handler_settings,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
index 500013540356..a7fb5ef13166 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
@@ -413,6 +413,8 @@ const struct kfd2kgd_calls arcturus_kfd2kgd = {
.validate_trap_override_request = 

[PATCH AUTOSEL 6.1 26/34] drm/amdkfd: Fixed kfd_process cleanup on module exit.

2023-03-22 Thread Sasha Levin
From: David Belanger 

[ Upstream commit 20bc9f76b6a2455c6b54b91ae7634f147f64987f ]

Handle the case when the module is unloaded (kfd_exit) before a process
space (mm_struct) is released.

v2: Fixed potential race conditions by removing all kfd_process from
the process table first, then working on releasing the resources.

v3: Fixed loop element access / synchronization.  Fixed extra empty lines.

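The subtle part is the ordering in the cleanup loop: each node must be
unlinked and its SRCU grace period waited out before the node can be
reused to link into the temp list. In miniature (as in the hunk below):

	hash_del_rcu(&p->kfd_processes);        /* unlink under kfd_processes_mutex */
	synchronize_srcu(&kfd_processes_srcu);  /* wait for readers still walking the table */
	hlist_add_head(&p->kfd_processes, &cleanup_list); /* node now safe to reuse */
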
Signed-off-by: David Belanger 
Reviewed-by: Felix Kuehling 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/amdkfd/kfd_module.c  |  1 +
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h|  1 +
 drivers/gpu/drm/amd/amdkfd/kfd_process.c | 67 +---
 3 files changed, 62 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_module.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_module.c
index 09b966dc37681..aee2212e52f69 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_module.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_module.c
@@ -77,6 +77,7 @@ static int kfd_init(void)
 
 static void kfd_exit(void)
 {
+   kfd_cleanup_processes();
kfd_debugfs_fini();
kfd_process_destroy_wq();
kfd_procfs_shutdown();
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index bf610e3b683bb..6d6588b9beed7 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -928,6 +928,7 @@ bool kfd_dev_is_large_bar(struct kfd_dev *dev);
 
 int kfd_process_create_wq(void);
 void kfd_process_destroy_wq(void);
+void kfd_cleanup_processes(void);
 struct kfd_process *kfd_create_process(struct file *filep);
 struct kfd_process *kfd_get_process(const struct task_struct *task);
 struct kfd_process *kfd_lookup_process_by_pasid(u32 pasid);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index dd351105c1bcf..7f68d51541e8e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1167,6 +1167,17 @@ static void kfd_process_free_notifier(struct 
mmu_notifier *mn)
kfd_unref_process(container_of(mn, struct kfd_process, mmu_notifier));
 }
 
+static void kfd_process_notifier_release_internal(struct kfd_process *p)
+{
+	cancel_delayed_work_sync(&p->eviction_work);
+	cancel_delayed_work_sync(&p->restore_work);
+
+	/* Indicate to other users that MM is no longer valid */
+	p->mm = NULL;
+
+	mmu_notifier_put(&p->mmu_notifier);
+}
+
 static void kfd_process_notifier_release(struct mmu_notifier *mn,
struct mm_struct *mm)
 {
@@ -1181,17 +1192,22 @@ static void kfd_process_notifier_release(struct 
mmu_notifier *mn,
return;
 
 	mutex_lock(&kfd_processes_mutex);
+	/*
+	 * Do an early return if the table is empty.
+	 *
+	 * This could potentially happen if this function is called concurrently
+	 * by mmu_notifier and by kfd_cleanup_processes.
+	 */
+	if (hash_empty(kfd_processes_table)) {
+		mutex_unlock(&kfd_processes_mutex);
+		return;
+	}
 	hash_del_rcu(&p->kfd_processes);
 	mutex_unlock(&kfd_processes_mutex);
 	synchronize_srcu(&kfd_processes_srcu);
 
-	cancel_delayed_work_sync(&p->eviction_work);
-	cancel_delayed_work_sync(&p->restore_work);
-
-	/* Indicate to other users that MM is no longer valid */
-	p->mm = NULL;
-
-	mmu_notifier_put(&p->mmu_notifier);
+	kfd_process_notifier_release_internal(p);
 }
 
 static const struct mmu_notifier_ops kfd_process_mmu_notifier_ops = {
@@ -1200,6 +1216,43 @@ static const struct mmu_notifier_ops 
kfd_process_mmu_notifier_ops = {
.free_notifier = kfd_process_free_notifier,
 };
 
+/*
+ * This code handles the case when driver is being unloaded before all
+ * mm_struct are released.  We need to safely free the kfd_process and
+ * avoid race conditions with mmu_notifier that might try to free them.
+ *
+ */
+void kfd_cleanup_processes(void)
+{
+   struct kfd_process *p;
+   struct hlist_node *p_temp;
+   unsigned int temp;
+   HLIST_HEAD(cleanup_list);
+
+   /*
+* Move all remaining kfd_process from the process table to a
+* temp list for processing.   Once done, callback from mmu_notifier
+	 * release will not see the kfd_process in the table and do an early return,
+* avoiding double free issues.
+*/
+	mutex_lock(&kfd_processes_mutex);
+	hash_for_each_safe(kfd_processes_table, temp, p_temp, p, kfd_processes) {
+		hash_del_rcu(&p->kfd_processes);
+		synchronize_srcu(&kfd_processes_srcu);
+		hlist_add_head(&p->kfd_processes, &cleanup_list);
+	}
+	mutex_unlock(&kfd_processes_mutex);
+
+	hlist_for_each_entry_safe(p, p_temp, &cleanup_list, kfd_processes)
+		kfd_process_notifier_release_internal(p);
+
+   /*
+* Ensures that all outstanding free_notifier get called, triggering
+* the 

[PATCH AUTOSEL 6.1 18/34] drm/amdkfd: fix potential kgd_mem UAFs

2023-03-22 Thread Sasha Levin
From: Chia-I Wu 

[ Upstream commit 9da050b0d9e04439d225a2ec3044af70cdfb3933 ]

kgd_mem pointers returned by kfd_process_device_translate_handle are
only guaranteed to be valid while p->mutex is held. As soon as the mutex
is unlocked, another thread can free the BO.

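The lifetime rule the fix enforces, in miniature (a sketch, not the actual
ioctl body):

	mutex_lock(&p->mutex);
	mem = kfd_process_device_translate_handle(pdd, GET_IDR_HANDLE(args->handle));
	err = amdgpu_amdkfd_gpuvm_sync_memory(dev->adev, (struct kgd_mem *)mem, true);
	mutex_unlock(&p->mutex);	/* only after the last dereference of mem */
	/* past this point another thread may free the BO, so mem is dead */
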
Signed-off-by: Chia-I Wu 
Signed-off-by: Felix Kuehling 
Reviewed-by: Felix Kuehling 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index f79b8e964140e..e191d38f3da62 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1298,14 +1298,14 @@ static int kfd_ioctl_map_memory_to_gpu(struct file 
*filep,
args->n_success = i+1;
}
 
-	mutex_unlock(&p->mutex);
-
err = amdgpu_amdkfd_gpuvm_sync_memory(dev->adev, (struct kgd_mem *) 
mem, true);
if (err) {
pr_debug("Sync memory failed, wait interrupted by user 
signal\n");
goto sync_memory_failed;
}
 
+	mutex_unlock(&p->mutex);
+
/* Flush TLBs after waiting for the page table updates to complete */
for (i = 0; i < args->n_devices; i++) {
peer_pdd = kfd_process_device_data_by_id(p, devices_arr[i]);
@@ -1321,9 +1321,9 @@ static int kfd_ioctl_map_memory_to_gpu(struct file *filep,
 bind_process_to_device_failed:
 get_mem_obj_from_handle_failed:
 map_memory_to_gpu_failed:
+sync_memory_failed:
 	mutex_unlock(&p->mutex);
 copy_from_user_failed:
-sync_memory_failed:
kfree(devices_arr);
 
return err;
@@ -1337,6 +1337,7 @@ static int kfd_ioctl_unmap_memory_from_gpu(struct file 
*filep,
void *mem;
long err = 0;
uint32_t *devices_arr = NULL, i;
+   bool flush_tlb;
 
if (!args->n_devices) {
pr_debug("Device IDs array empty\n");
@@ -1389,16 +1390,19 @@ static int kfd_ioctl_unmap_memory_from_gpu(struct file 
*filep,
}
args->n_success = i+1;
}
-	mutex_unlock(&p->mutex);
 
-   if (kfd_flush_tlb_after_unmap(pdd->dev)) {
+   flush_tlb = kfd_flush_tlb_after_unmap(pdd->dev);
+   if (flush_tlb) {
err = amdgpu_amdkfd_gpuvm_sync_memory(pdd->dev->adev,
(struct kgd_mem *) mem, true);
if (err) {
pr_debug("Sync memory failed, wait interrupted by user 
signal\n");
goto sync_memory_failed;
}
+   }
+	mutex_unlock(&p->mutex);
 
+   if (flush_tlb) {
/* Flush TLBs after waiting for the page table updates to 
complete */
for (i = 0; i < args->n_devices; i++) {
peer_pdd = kfd_process_device_data_by_id(p, 
devices_arr[i]);
@@ -1414,9 +1418,9 @@ static int kfd_ioctl_unmap_memory_from_gpu(struct file 
*filep,
 bind_process_to_device_failed:
 get_mem_obj_from_handle_failed:
 unmap_memory_from_gpu_failed:
+sync_memory_failed:
 	mutex_unlock(&p->mutex);
 copy_from_user_failed:
-sync_memory_failed:
kfree(devices_arr);
return err;
 }
-- 
2.39.2



[PATCH AUTOSEL 6.1 17/34] drm/amdkfd: fix a potential double free in pqm_create_queue

2023-03-22 Thread Sasha Levin
From: Chia-I Wu 

[ Upstream commit b2ca5c5d416b4e72d1e9d0293fc720e2d525fd42 ]

Set *q to NULL on errors, otherwise pqm_create_queue would free it
again.

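For context, the caller's cleanup path frees whatever *q points at, which
is why the callee must clear it on its own failure; uninit_queue() is
assumed to return early on NULL. A simplified view (not the exact
pqm_create_queue code):

	retval = init_user_queue(pqm, dev, &q, &q_properties, f, ...);
	if (retval != 0)
		goto err_create_queue;
	...
err_create_queue:
	uninit_queue(q);	/* frees q a second time unless init_user_queue cleared it */
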
Signed-off-by: Chia-I Wu 
Signed-off-by: Felix Kuehling 
Reviewed-by: Felix Kuehling 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
index 5137476ec18e6..4236539d9f932 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
@@ -218,8 +218,8 @@ static int init_user_queue(struct process_queue_manager 
*pqm,
return 0;
 
 cleanup:
-   if (dev->shared_resources.enable_mes)
-   uninit_queue(*q);
+   uninit_queue(*q);
+   *q = NULL;
return retval;
 }
 
-- 
2.39.2



[PATCH AUTOSEL 6.1 16/34] drm/amdkfd: Fix BO offset for multi-VMA page migration

2023-03-22 Thread Sasha Levin
From: Xiaogang Chen 

[ Upstream commit b4ee9606378bb9520c94d8b96f0305c3696f5c29 ]

svm_migrate_ram_to_vram migrates a prange from system RAM to VRAM. The
prange may cross multiple VMAs, so the current destination VRAM offset
within the TTM resource must be remembered across the per-VMA migrations.

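For example, if a 16-page prange is covered by a 6-page VMA followed by a
10-page VMA, the second svm_migrate_vma_to_vram() call must start at
ttm_res_offset + 6 * PAGE_SIZE in the TTM resource rather than at offset 0
again, which is what the ttm_res_offset += next - addr accumulation below
provides.
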
v2: squash in warning fix (Alex)

Signed-off-by: Xiaogang Chen 
Reviewed-by: Felix Kuehling 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index 22b077ac9a196..fad500dd224d8 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -295,7 +295,7 @@ static unsigned long svm_migrate_unsuccessful_pages(struct 
migrate_vma *migrate)
 static int
 svm_migrate_copy_to_vram(struct amdgpu_device *adev, struct svm_range *prange,
 struct migrate_vma *migrate, struct dma_fence **mfence,
-dma_addr_t *scratch)
+dma_addr_t *scratch, uint64_t ttm_res_offset)
 {
uint64_t npages = migrate->npages;
struct device *dev = adev->dev;
@@ -305,8 +305,8 @@ svm_migrate_copy_to_vram(struct amdgpu_device *adev, struct 
svm_range *prange,
uint64_t i, j;
int r;
 
-   pr_debug("svms 0x%p [0x%lx 0x%lx]\n", prange->svms, prange->start,
-prange->last);
+   pr_debug("svms 0x%p [0x%lx 0x%lx 0x%llx]\n", prange->svms, 
prange->start,
+prange->last, ttm_res_offset);
 
src = scratch;
dst = (uint64_t *)(scratch + npages);
@@ -317,7 +317,7 @@ svm_migrate_copy_to_vram(struct amdgpu_device *adev, struct 
svm_range *prange,
goto out;
}
 
-	amdgpu_res_first(prange->ttm_res, prange->offset << PAGE_SHIFT,
+	amdgpu_res_first(prange->ttm_res, ttm_res_offset,
 			 npages << PAGE_SHIFT, &cursor);
for (i = j = 0; i < npages; i++) {
struct page *spage;
@@ -404,7 +404,7 @@ svm_migrate_copy_to_vram(struct amdgpu_device *adev, struct 
svm_range *prange,
 static long
 svm_migrate_vma_to_vram(struct amdgpu_device *adev, struct svm_range *prange,
struct vm_area_struct *vma, uint64_t start,
-   uint64_t end, uint32_t trigger)
+   uint64_t end, uint32_t trigger, uint64_t ttm_res_offset)
 {
struct kfd_process *p = container_of(prange->svms, struct kfd_process, 
svms);
uint64_t npages = (end - start) >> PAGE_SHIFT;
@@ -457,7 +457,7 @@ svm_migrate_vma_to_vram(struct amdgpu_device *adev, struct 
svm_range *prange,
else
pr_debug("0x%lx pages migrated\n", cpages);
 
-	r = svm_migrate_copy_to_vram(adev, prange, &migrate, &mfence, scratch);
+	r = svm_migrate_copy_to_vram(adev, prange, &migrate, &mfence, scratch, ttm_res_offset);
 	migrate_vma_pages(&migrate);
 
pr_debug("successful/cpages/npages 0x%lx/0x%lx/0x%lx\n",
@@ -505,6 +505,7 @@ svm_migrate_ram_to_vram(struct svm_range *prange, uint32_t 
best_loc,
unsigned long addr, start, end;
struct vm_area_struct *vma;
struct amdgpu_device *adev;
+   uint64_t ttm_res_offset;
unsigned long cpages = 0;
long r = 0;
 
@@ -525,6 +526,7 @@ svm_migrate_ram_to_vram(struct svm_range *prange, uint32_t 
best_loc,
 
start = prange->start << PAGE_SHIFT;
end = (prange->last + 1) << PAGE_SHIFT;
+   ttm_res_offset = prange->offset << PAGE_SHIFT;
 
for (addr = start; addr < end;) {
unsigned long next;
@@ -534,13 +536,14 @@ svm_migrate_ram_to_vram(struct svm_range *prange, 
uint32_t best_loc,
break;
 
next = min(vma->vm_end, end);
-   r = svm_migrate_vma_to_vram(adev, prange, vma, addr, next, 
trigger);
+   r = svm_migrate_vma_to_vram(adev, prange, vma, addr, next, 
trigger, ttm_res_offset);
if (r < 0) {
pr_debug("failed %ld to migrate\n", r);
break;
} else {
cpages += r;
}
+   ttm_res_offset += next - addr;
addr = next;
}
 
-- 
2.39.2



[PATCH AUTOSEL 6.2 37/45] drm/amdkfd: Fixed kfd_process cleanup on module exit.

2023-03-22 Thread Sasha Levin
From: David Belanger 

[ Upstream commit 20bc9f76b6a2455c6b54b91ae7634f147f64987f ]

Handle the case when the module is unloaded (kfd_exit) before a process
space (mm_struct) is released.

v2: Fixed potential race conditions by removing all kfd_process from
the process table first, then working on releasing the resources.

v3: Fixed loop element access / synchronization.  Fixed extra empty lines.

Signed-off-by: David Belanger 
Reviewed-by: Felix Kuehling 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/amdkfd/kfd_module.c  |  1 +
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h|  1 +
 drivers/gpu/drm/amd/amdkfd/kfd_process.c | 67 +---
 3 files changed, 62 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_module.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_module.c
index 09b966dc37681..aee2212e52f69 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_module.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_module.c
@@ -77,6 +77,7 @@ static int kfd_init(void)
 
 static void kfd_exit(void)
 {
+   kfd_cleanup_processes();
kfd_debugfs_fini();
kfd_process_destroy_wq();
kfd_procfs_shutdown();
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 552c3ac85a132..7dc55919993c0 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -926,6 +926,7 @@ bool kfd_dev_is_large_bar(struct kfd_dev *dev);
 
 int kfd_process_create_wq(void);
 void kfd_process_destroy_wq(void);
+void kfd_cleanup_processes(void);
 struct kfd_process *kfd_create_process(struct file *filep);
 struct kfd_process *kfd_get_process(const struct task_struct *task);
 struct kfd_process *kfd_lookup_process_by_pasid(u32 pasid);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 51b1683ac5c1e..4d9f2d1c49b1d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1167,6 +1167,17 @@ static void kfd_process_free_notifier(struct 
mmu_notifier *mn)
kfd_unref_process(container_of(mn, struct kfd_process, mmu_notifier));
 }
 
+static void kfd_process_notifier_release_internal(struct kfd_process *p)
+{
+	cancel_delayed_work_sync(&p->eviction_work);
+	cancel_delayed_work_sync(&p->restore_work);
+
+	/* Indicate to other users that MM is no longer valid */
+	p->mm = NULL;
+
+	mmu_notifier_put(&p->mmu_notifier);
+}
+
 static void kfd_process_notifier_release(struct mmu_notifier *mn,
struct mm_struct *mm)
 {
@@ -1181,17 +1192,22 @@ static void kfd_process_notifier_release(struct 
mmu_notifier *mn,
return;
 
 	mutex_lock(&kfd_processes_mutex);
+	/*
+	 * Do an early return if the table is empty.
+	 *
+	 * This could potentially happen if this function is called concurrently
+	 * by mmu_notifier and by kfd_cleanup_processes.
+	 */
+	if (hash_empty(kfd_processes_table)) {
+		mutex_unlock(&kfd_processes_mutex);
+		return;
+	}
 	hash_del_rcu(&p->kfd_processes);
 	mutex_unlock(&kfd_processes_mutex);
 	synchronize_srcu(&kfd_processes_srcu);
 
-	cancel_delayed_work_sync(&p->eviction_work);
-	cancel_delayed_work_sync(&p->restore_work);
-
-	/* Indicate to other users that MM is no longer valid */
-	p->mm = NULL;
-
-	mmu_notifier_put(&p->mmu_notifier);
+	kfd_process_notifier_release_internal(p);
 }
 
 static const struct mmu_notifier_ops kfd_process_mmu_notifier_ops = {
@@ -1200,6 +1216,43 @@ static const struct mmu_notifier_ops 
kfd_process_mmu_notifier_ops = {
.free_notifier = kfd_process_free_notifier,
 };
 
+/*
+ * This code handles the case when driver is being unloaded before all
+ * mm_struct are released.  We need to safely free the kfd_process and
+ * avoid race conditions with mmu_notifier that might try to free them.
+ *
+ */
+void kfd_cleanup_processes(void)
+{
+   struct kfd_process *p;
+   struct hlist_node *p_temp;
+   unsigned int temp;
+   HLIST_HEAD(cleanup_list);
+
+   /*
+* Move all remaining kfd_process from the process table to a
+* temp list for processing.   Once done, callback from mmu_notifier
+	 * release will not see the kfd_process in the table and do an early return,
+* avoiding double free issues.
+*/
+	mutex_lock(&kfd_processes_mutex);
+	hash_for_each_safe(kfd_processes_table, temp, p_temp, p, kfd_processes) {
+		hash_del_rcu(&p->kfd_processes);
+		synchronize_srcu(&kfd_processes_srcu);
+		hlist_add_head(&p->kfd_processes, &cleanup_list);
+	}
+	mutex_unlock(&kfd_processes_mutex);
+
+	hlist_for_each_entry_safe(p, p_temp, &cleanup_list, kfd_processes)
+		kfd_process_notifier_release_internal(p);
+
+   /*
+* Ensures that all outstanding free_notifier get called, triggering
+* the 

[PATCH AUTOSEL 6.2 28/45] drm/amd/display: Fix HDCP failing to enable after suspend

2023-03-22 Thread Sasha Levin
From: Bhawanpreet Lakha 

[ Upstream commit 728cefa53a36ba378ed4a7f31a0c08289687d824 ]

[Why]
On resume some displays are not ready for HDCP, so they will fail if we
start the HDCP authentication too soon.

Add a delay so that the displays can be ready before we start.

NOTE: Previously this delay was set to 3 seconds, but that was causing
issues with compliance; 2 seconds should be enough for compliance and the
S3 resume case.

[How]
Change the Delay to 2 seconds.

Reviewed-by: Aurabindo Pillai 
Acked-by: Qingqing Zhuo 
Signed-off-by: Bhawanpreet Lakha 
Tested-by: Daniel Wheeler 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_hdcp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_hdcp.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_hdcp.c
index a7fd98f57f94c..dc62375a8e2c4 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_hdcp.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_hdcp.c
@@ -495,7 +495,7 @@ static void update_config(void *handle, struct 
cp_psp_stream_config *config)
link->dp.mst_enabled = config->mst_enabled;
link->dp.usb4_enabled = config->usb4_enabled;
display->adjust.disable = MOD_HDCP_DISPLAY_DISABLE_AUTHENTICATION;
-   link->adjust.auth_delay = 0;
+   link->adjust.auth_delay = 2;
link->adjust.hdcp1.disable = 0;
conn_state = aconnector->base.state;
 
-- 
2.39.2



[PATCH AUTOSEL 6.2 27/45] drm/amdkfd: fix potential kgd_mem UAFs

2023-03-22 Thread Sasha Levin
From: Chia-I Wu 

[ Upstream commit 9da050b0d9e04439d225a2ec3044af70cdfb3933 ]

kgd_mem pointers returned by kfd_process_device_translate_handle are
only guaranteed to be valid while p->mutex is held. As soon as the mutex
is unlocked, another thread can free the BO.

Signed-off-by: Chia-I Wu 
Signed-off-by: Felix Kuehling 
Reviewed-by: Felix Kuehling 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index f79b8e964140e..e191d38f3da62 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1298,14 +1298,14 @@ static int kfd_ioctl_map_memory_to_gpu(struct file 
*filep,
args->n_success = i+1;
}
 
-	mutex_unlock(&p->mutex);
-
err = amdgpu_amdkfd_gpuvm_sync_memory(dev->adev, (struct kgd_mem *) 
mem, true);
if (err) {
pr_debug("Sync memory failed, wait interrupted by user 
signal\n");
goto sync_memory_failed;
}
 
+	mutex_unlock(&p->mutex);
+
/* Flush TLBs after waiting for the page table updates to complete */
for (i = 0; i < args->n_devices; i++) {
peer_pdd = kfd_process_device_data_by_id(p, devices_arr[i]);
@@ -1321,9 +1321,9 @@ static int kfd_ioctl_map_memory_to_gpu(struct file *filep,
 bind_process_to_device_failed:
 get_mem_obj_from_handle_failed:
 map_memory_to_gpu_failed:
+sync_memory_failed:
 	mutex_unlock(&p->mutex);
 copy_from_user_failed:
-sync_memory_failed:
kfree(devices_arr);
 
return err;
@@ -1337,6 +1337,7 @@ static int kfd_ioctl_unmap_memory_from_gpu(struct file 
*filep,
void *mem;
long err = 0;
uint32_t *devices_arr = NULL, i;
+   bool flush_tlb;
 
if (!args->n_devices) {
pr_debug("Device IDs array empty\n");
@@ -1389,16 +1390,19 @@ static int kfd_ioctl_unmap_memory_from_gpu(struct file 
*filep,
}
args->n_success = i+1;
}
-	mutex_unlock(&p->mutex);
 
-   if (kfd_flush_tlb_after_unmap(pdd->dev)) {
+   flush_tlb = kfd_flush_tlb_after_unmap(pdd->dev);
+   if (flush_tlb) {
err = amdgpu_amdkfd_gpuvm_sync_memory(pdd->dev->adev,
(struct kgd_mem *) mem, true);
if (err) {
pr_debug("Sync memory failed, wait interrupted by user 
signal\n");
goto sync_memory_failed;
}
+   }
+	mutex_unlock(&p->mutex);
 
+   if (flush_tlb) {
/* Flush TLBs after waiting for the page table updates to 
complete */
for (i = 0; i < args->n_devices; i++) {
peer_pdd = kfd_process_device_data_by_id(p, 
devices_arr[i]);
@@ -1414,9 +1418,9 @@ static int kfd_ioctl_unmap_memory_from_gpu(struct file 
*filep,
 bind_process_to_device_failed:
 get_mem_obj_from_handle_failed:
 unmap_memory_from_gpu_failed:
+sync_memory_failed:
mutex_unlock(>mutex);
 copy_from_user_failed:
-sync_memory_failed:
kfree(devices_arr);
return err;
 }
-- 
2.39.2



[PATCH AUTOSEL 6.2 26/45] drm/amdgpu/vcn: custom video info caps for sriov

2023-03-22 Thread Sasha Levin
From: Jane Jian 

[ Upstream commit d71e38df3b730a17ab6b25cabb2ccfe8a7f04385 ]

For SRIOV, we added a new flag to indicate AV1 support;
this will override the original caps info.
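
The new amdgpu_sriov_is_av1_support() macro gives the rest of the driver a
one-line capability check. A minimal sketch of the intended use, with
hypothetical array names (the real selection logic in soc21.c is not fully
shown in this diff):

    /* Sketch: report AV1 caps only when the host feature flag is set. */
    if (amdgpu_sriov_is_av1_support(adev))
        *codecs = &codecs_with_av1;     /* hypothetical array */
    else
        *codecs = &codecs_without_av1;  /* hypothetical array */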

Signed-off-by: Jane Jian 
Acked-by: Alex Deucher 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h|   4 +
 drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h |   3 +-
 drivers/gpu/drm/amd/amdgpu/soc21.c  | 103 ++--
 3 files changed, 99 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
index 2b9d806e23afb..10a0a510910b6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
@@ -123,6 +123,8 @@ enum AMDGIM_FEATURE_FLAG {
AMDGIM_FEATURE_PP_ONE_VF = (1 << 4),
/* Indirect Reg Access enabled */
AMDGIM_FEATURE_INDIRECT_REG_ACCESS = (1 << 5),
+   /* AV1 Support MODE*/
+   AMDGIM_FEATURE_AV1_SUPPORT = (1 << 6),
 };
 
 enum AMDGIM_REG_ACCESS_FLAG {
@@ -321,6 +323,8 @@ static inline bool is_virtual_machine(void)
((!amdgpu_in_reset(adev)) && adev->virt.tdr_debug)
 #define amdgpu_sriov_is_normal(adev) \
((!amdgpu_in_reset(adev)) && (!adev->virt.tdr_debug))
+#define amdgpu_sriov_is_av1_support(adev) \
+   ((adev)->virt.gim_feature & AMDGIM_FEATURE_AV1_SUPPORT)
 bool amdgpu_virt_mmio_blocked(struct amdgpu_device *adev);
 void amdgpu_virt_init_setting(struct amdgpu_device *adev);
 void amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h 
b/drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h
index 6c97148ca0ed3..24d42d24e6a01 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h
@@ -93,7 +93,8 @@ union amd_sriov_msg_feature_flags {
uint32_t mm_bw_management  : 1;
uint32_t pp_one_vf_mode: 1;
uint32_t reg_indirect_acc  : 1;
-   uint32_t reserved  : 26;
+   uint32_t av1_support   : 1;
+   uint32_t reserved  : 25;
} flags;
uint32_t all;
 };
diff --git a/drivers/gpu/drm/amd/amdgpu/soc21.c 
b/drivers/gpu/drm/amd/amdgpu/soc21.c
index 3d938b52178e3..9eedc1a1494c0 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc21.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc21.c
@@ -101,6 +101,59 @@ static const struct amdgpu_video_codecs 
vcn_4_0_0_video_codecs_decode_vcn1 =
.codec_array = vcn_4_0_0_video_codecs_decode_array_vcn1,
 };
 
+/* SRIOV SOC21, not const since data is controlled by host */
+static struct amdgpu_video_codec_info 
sriov_vcn_4_0_0_video_codecs_encode_array_vcn0[] = {
+   {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_MPEG4_AVC, 4096, 
2304, 0)},
+   {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_HEVC, 4096, 2304, 
0)},
+   {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_AV1, 8192, 4352, 0)},
+};
+
+static struct amdgpu_video_codec_info 
sriov_vcn_4_0_0_video_codecs_encode_array_vcn1[] = {
+   {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_MPEG4_AVC, 4096, 
2304, 0)},
+   {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_HEVC, 4096, 2304, 
0)},
+};
+
+static struct amdgpu_video_codecs sriov_vcn_4_0_0_video_codecs_encode_vcn0 = {
+   .codec_count = 
ARRAY_SIZE(sriov_vcn_4_0_0_video_codecs_encode_array_vcn0),
+   .codec_array = sriov_vcn_4_0_0_video_codecs_encode_array_vcn0,
+};
+
+static struct amdgpu_video_codecs sriov_vcn_4_0_0_video_codecs_encode_vcn1 = {
+   .codec_count = 
ARRAY_SIZE(sriov_vcn_4_0_0_video_codecs_encode_array_vcn1),
+   .codec_array = sriov_vcn_4_0_0_video_codecs_encode_array_vcn1,
+};
+
+static struct amdgpu_video_codec_info 
sriov_vcn_4_0_0_video_codecs_decode_array_vcn0[] = {
+   {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_MPEG2, 4096, 4096, 
3)},
+   {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_MPEG4, 4096, 4096, 
5)},
+   {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_MPEG4_AVC, 4096, 
4096, 52)},
+   {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_VC1, 4096, 4096, 4)},
+   {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_HEVC, 8192, 4352, 
186)},
+   {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_JPEG, 4096, 4096, 
0)},
+   {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_VP9, 8192, 4352, 0)},
+   {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_AV1, 8192, 4352, 0)},
+};
+
+static struct amdgpu_video_codec_info 
sriov_vcn_4_0_0_video_codecs_decode_array_vcn1[] = {
+   {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_MPEG2, 4096, 4096, 
3)},
+   {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_MPEG4, 4096, 4096, 
5)},
+   {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_MPEG4_AVC, 4096, 
4096, 52)},
+   {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_VC1, 4096, 4096, 4)},
+   

[PATCH AUTOSEL 6.2 25/45] drm/amdkfd: fix a potential double free in pqm_create_queue

2023-03-22 Thread Sasha Levin
From: Chia-I Wu 

[ Upstream commit b2ca5c5d416b4e72d1e9d0293fc720e2d525fd42 ]

Set *q to NULL on errors, otherwise pqm_create_queue would free it
again.
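
A minimal sketch of the ownership rule the fix restores (all names below are
hypothetical, for illustration only): a helper that frees a caller-visible
object on its error path must also clear the caller's pointer, otherwise the
caller's own cleanup frees it a second time.

    /* Sketch of the rule, not the actual kfd code. */
    static int init_obj(struct obj **q)
    {
        *q = alloc_obj();
        if (do_setup(*q)) {
            free_obj(*q);
            *q = NULL;   /* caller must not free it again */
            return -EINVAL;
        }
        return 0;
    }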

Signed-off-by: Chia-I Wu 
Signed-off-by: Felix Kuehling 
Reviewed-by: Felix Kuehling 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
index 5137476ec18e6..4236539d9f932 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
@@ -218,8 +218,8 @@ static int init_user_queue(struct process_queue_manager 
*pqm,
return 0;
 
 cleanup:
-   if (dev->shared_resources.enable_mes)
-   uninit_queue(*q);
+   uninit_queue(*q);
+   *q = NULL;
return retval;
 }
 
-- 
2.39.2



[PATCH AUTOSEL 6.2 24/45] drm/amdkfd: Fix BO offset for multi-VMA page migration

2023-03-22 Thread Sasha Levin
From: Xiaogang Chen 

[ Upstream commit b4ee9606378bb9520c94d8b96f0305c3696f5c29 ]

svm_migrate_ram_to_vram migrates a prange from system RAM to VRAM. The prange
may cross multiple VMAs, so we need to remember the current destination VRAM
offset in the TTM resource for each migration.

v2: squash in warning fix (Alex)
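
A minimal sketch of the bookkeeping this adds (illustrative only; see the
hunks below for the real code): walk the range VMA by VMA and advance the
destination offset by however many bytes the previous VMA covered.

    /* Sketch: per-VMA migration with a running destination offset. */
    ttm_res_offset = prange->offset << PAGE_SHIFT;
    for (addr = start; addr < end; addr = next) {
        vma = find_vma(mm, addr);                        /* simplified */
        next = min(vma->vm_end, end);
        r = migrate_one_vma(addr, next, ttm_res_offset); /* hypothetical */
        ttm_res_offset += next - addr;
    }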

Signed-off-by: Xiaogang Chen 
Reviewed-by: Felix Kuehling 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index 10048ce16aea4..5c319007b4701 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -289,7 +289,7 @@ static unsigned long svm_migrate_unsuccessful_pages(struct 
migrate_vma *migrate)
 static int
 svm_migrate_copy_to_vram(struct amdgpu_device *adev, struct svm_range *prange,
 struct migrate_vma *migrate, struct dma_fence **mfence,
-dma_addr_t *scratch)
+dma_addr_t *scratch, uint64_t ttm_res_offset)
 {
uint64_t npages = migrate->npages;
struct device *dev = adev->dev;
@@ -299,8 +299,8 @@ svm_migrate_copy_to_vram(struct amdgpu_device *adev, struct 
svm_range *prange,
uint64_t i, j;
int r;
 
-   pr_debug("svms 0x%p [0x%lx 0x%lx]\n", prange->svms, prange->start,
-prange->last);
+   pr_debug("svms 0x%p [0x%lx 0x%lx 0x%llx]\n", prange->svms, 
prange->start,
+prange->last, ttm_res_offset);
 
src = scratch;
dst = (uint64_t *)(scratch + npages);
@@ -311,7 +311,7 @@ svm_migrate_copy_to_vram(struct amdgpu_device *adev, struct 
svm_range *prange,
goto out;
}
 
-   amdgpu_res_first(prange->ttm_res, prange->offset << PAGE_SHIFT,
+   amdgpu_res_first(prange->ttm_res, ttm_res_offset,
 npages << PAGE_SHIFT, );
for (i = j = 0; i < npages; i++) {
struct page *spage;
@@ -398,7 +398,7 @@ svm_migrate_copy_to_vram(struct amdgpu_device *adev, struct 
svm_range *prange,
 static long
 svm_migrate_vma_to_vram(struct amdgpu_device *adev, struct svm_range *prange,
struct vm_area_struct *vma, uint64_t start,
-   uint64_t end, uint32_t trigger)
+   uint64_t end, uint32_t trigger, uint64_t ttm_res_offset)
 {
struct kfd_process *p = container_of(prange->svms, struct kfd_process, 
svms);
uint64_t npages = (end - start) >> PAGE_SHIFT;
@@ -451,7 +451,7 @@ svm_migrate_vma_to_vram(struct amdgpu_device *adev, struct 
svm_range *prange,
else
pr_debug("0x%lx pages migrated\n", cpages);
 
-   r = svm_migrate_copy_to_vram(adev, prange, , , scratch);
+   r = svm_migrate_copy_to_vram(adev, prange, , , scratch, 
ttm_res_offset);
migrate_vma_pages();
 
pr_debug("successful/cpages/npages 0x%lx/0x%lx/0x%lx\n",
@@ -499,6 +499,7 @@ svm_migrate_ram_to_vram(struct svm_range *prange, uint32_t 
best_loc,
unsigned long addr, start, end;
struct vm_area_struct *vma;
struct amdgpu_device *adev;
+   uint64_t ttm_res_offset;
unsigned long cpages = 0;
long r = 0;
 
@@ -519,6 +520,7 @@ svm_migrate_ram_to_vram(struct svm_range *prange, uint32_t 
best_loc,
 
start = prange->start << PAGE_SHIFT;
end = (prange->last + 1) << PAGE_SHIFT;
+   ttm_res_offset = prange->offset << PAGE_SHIFT;
 
for (addr = start; addr < end;) {
unsigned long next;
@@ -528,13 +530,14 @@ svm_migrate_ram_to_vram(struct svm_range *prange, 
uint32_t best_loc,
break;
 
next = min(vma->vm_end, end);
-   r = svm_migrate_vma_to_vram(adev, prange, vma, addr, next, 
trigger);
+   r = svm_migrate_vma_to_vram(adev, prange, vma, addr, next, 
trigger, ttm_res_offset);
if (r < 0) {
pr_debug("failed %ld to migrate\n", r);
break;
} else {
cpages += r;
}
+   ttm_res_offset += next - addr;
addr = next;
}
 
-- 
2.39.2



Re: [PATCH] drm/display: Add missing OLED Vesa brightnesses definitions

2023-03-22 Thread Harry Wentland



On 3/22/23 12:05, Rodrigo Siqueira wrote:
> Cc: Anthony Koo 
> Cc: Iswara Negulendran 
> Cc: Felipe Clark 
> Cc: Harry Wentland 
> Signed-off-by: Rodrigo Siqueira 

Reviewed-by: Harry Wentland 

Harry

> ---
>  include/drm/display/drm_dp.h | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/include/drm/display/drm_dp.h b/include/drm/display/drm_dp.h
> index 632376c291db..d30a9b2f450c 100644
> --- a/include/drm/display/drm_dp.h
> +++ b/include/drm/display/drm_dp.h
> @@ -977,6 +977,8 @@
>  # define DP_EDP_BACKLIGHT_FREQ_AUX_SET_CAP   (1 << 5)
>  # define DP_EDP_DYNAMIC_BACKLIGHT_CAP(1 << 6)
>  # define DP_EDP_VBLANK_BACKLIGHT_UPDATE_CAP  (1 << 7)
> +#define DP_EDP_OLED_VESA_BRIGHTNESS_ON  0x80
> +# define DP_EDP_OLED_VESA_CAP(1 << 4)
>  
>  #define DP_EDP_GENERAL_CAP_2 0x703
>  # define DP_EDP_OVERDRIVE_ENGINE_ENABLED (1 << 0)



[PATCH] drm/display: Add missing OLED Vesa brightnesses definitions

2023-03-22 Thread Rodrigo Siqueira
Cc: Anthony Koo 
Cc: Iswara Negulendran 
Cc: Felipe Clark 
Cc: Harry Wentland 
Signed-off-by: Rodrigo Siqueira 
---
 include/drm/display/drm_dp.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/drm/display/drm_dp.h b/include/drm/display/drm_dp.h
index 632376c291db..d30a9b2f450c 100644
--- a/include/drm/display/drm_dp.h
+++ b/include/drm/display/drm_dp.h
@@ -977,6 +977,8 @@
 # define DP_EDP_BACKLIGHT_FREQ_AUX_SET_CAP (1 << 5)
 # define DP_EDP_DYNAMIC_BACKLIGHT_CAP  (1 << 6)
 # define DP_EDP_VBLANK_BACKLIGHT_UPDATE_CAP(1 << 7)
+#define DP_EDP_OLED_VESA_BRIGHTNESS_ON  0x80
+# define DP_EDP_OLED_VESA_CAP  (1 << 4)
 
 #define DP_EDP_GENERAL_CAP_2   0x703
 # define DP_EDP_OVERDRIVE_ENGINE_ENABLED   (1 << 0)
-- 
2.39.2



Re: [PATCH] drm/amd/display: Clean up some inconsistent indenting

2023-03-22 Thread Hamza Mahfooz

On 3/21/23 23:14, Jiapeng Chong wrote:

No functional modification involved.

Reported-by: Abaci Robot 
Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=4585
Signed-off-by: Jiapeng Chong 


Applied, thanks!


---
  drivers/gpu/drm/amd/display/modules/power/power_helpers.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/modules/power/power_helpers.c 
b/drivers/gpu/drm/amd/display/modules/power/power_helpers.c
index fa469de3e935..0d3a983cb9ec 100644
--- a/drivers/gpu/drm/amd/display/modules/power/power_helpers.c
+++ b/drivers/gpu/drm/amd/display/modules/power/power_helpers.c
@@ -758,8 +758,8 @@ bool dmcu_load_iram(struct dmcu *dmcu,
  
  	if (dmcu->dmcu_version.abm_version == 0x24) {

fill_iram_v_2_3((struct iram_table_v_2_2 *)ram_table, params, 
true);
-   result = dmcu->funcs->load_iram(
-   dmcu, 0, (char *)(_table), 
IRAM_RESERVE_AREA_START_V2_2);
+   result = dmcu->funcs->load_iram(dmcu, 0, (char *)(_table),
+   IRAM_RESERVE_AREA_START_V2_2);
} else if (dmcu->dmcu_version.abm_version == 0x23) {
fill_iram_v_2_3((struct iram_table_v_2_2 *)ram_table, params, 
true);
  


--
Hamza



Re: [PATCH] drm/amd/display: Remove the unused variable dppclk_delay_subtotal

2023-03-22 Thread Hamza Mahfooz

On 3/21/23 21:59, Jiapeng Chong wrote:

The variable dppclk_delay_subtotal is set but never used, so delete it.

drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn314/display_rq_dlg_calc_314.c:1004:15:
 warning: variable 'dppclk_delay_subtotal' set but not used.

Reported-by: Abaci Robot 
Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=4584
Signed-off-by: Jiapeng Chong 


Applied, thanks!


---
  .../display/dc/dml/dcn314/display_rq_dlg_calc_314.c| 10 --
  1 file changed, 10 deletions(-)

diff --git 
a/drivers/gpu/drm/amd/display/dc/dml/dcn314/display_rq_dlg_calc_314.c 
b/drivers/gpu/drm/amd/display/dc/dml/dcn314/display_rq_dlg_calc_314.c
index 6576b897a512..d1c2693a2e28 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn314/display_rq_dlg_calc_314.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn314/display_rq_dlg_calc_314.c
@@ -1001,7 +1001,6 @@ static void dml_rq_dlg_get_dlg_params(
unsigned int vupdate_width;
unsigned int vready_offset;
  
-	unsigned int dppclk_delay_subtotal;

unsigned int dispclk_delay_subtotal;
  
  	unsigned int vstartup_start;

@@ -1130,17 +1129,8 @@ static void dml_rq_dlg_get_dlg_params(
vupdate_offset = dst->vupdate_offset;
vupdate_width = dst->vupdate_width;
vready_offset = dst->vready_offset;
-
-   dppclk_delay_subtotal = mode_lib->ip.dppclk_delay_subtotal;
dispclk_delay_subtotal = mode_lib->ip.dispclk_delay_subtotal;
  
-	if (scl_enable)

-   dppclk_delay_subtotal += mode_lib->ip.dppclk_delay_scl;
-   else
-   dppclk_delay_subtotal += mode_lib->ip.dppclk_delay_scl_lb_only;
-
-   dppclk_delay_subtotal += mode_lib->ip.dppclk_delay_cnvc_formatter + 
src->num_cursors * mode_lib->ip.dppclk_delay_cnvc_cursor;
-
if (dout->dsc_enable) {
double dsc_delay = get_dsc_delay(mode_lib, e2e_pipe_param, 
num_pipes, pipe_idx); // FROM VBA
  


--
Hamza



Re: [PATCH] drm/amd/display: Slightly optimize dm_dmub_outbox1_low_irq()

2023-03-22 Thread Hamza Mahfooz

On 3/21/23 13:58, Christophe JAILLET wrote:

A kzalloc()+memcpy() pair can be replaced by a single kmemdup().
This saves a few cycles because the memory doesn't need to be zeroed first.
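
A minimal sketch of the transformation in general form (not the amdgpu_dm
code itself):

    /* Before: zeroed allocation that is immediately overwritten. */
    dst = kzalloc(sizeof(*src), GFP_ATOMIC);
    if (dst)
        memcpy(dst, src, sizeof(*src));

    /* After: one call, no redundant zeroing. */
    dst = kmemdup(src, sizeof(*src), GFP_ATOMIC);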

Signed-off-by: Christophe JAILLET 


Applied, thanks!


---
  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 5 ++---
  1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 5bac5781a06b..57a5fbdab890 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -820,15 +820,14 @@ static void dm_dmub_outbox1_low_irq(void 
*interrupt_params)
DRM_ERROR("Failed to allocate 
dmub_hpd_wrk");
return;
}
-   dmub_hpd_wrk->dmub_notify = 
kzalloc(sizeof(struct dmub_notification), GFP_ATOMIC);
+   dmub_hpd_wrk->dmub_notify = kmemdup(, 
sizeof(struct dmub_notification),
+   GFP_ATOMIC);
if (!dmub_hpd_wrk->dmub_notify) {
kfree(dmub_hpd_wrk);
DRM_ERROR("Failed to allocate 
dmub_hpd_wrk->dmub_notify");
return;
}
INIT_WORK(_hpd_wrk->handle_hpd_work, 
dm_handle_hpd_work);
-   if (dmub_hpd_wrk->dmub_notify)
-   memcpy(dmub_hpd_wrk->dmub_notify, 
, sizeof(struct dmub_notification));
dmub_hpd_wrk->adev = adev;
if (notify.type == DMUB_NOTIFICATION_HPD) {
plink = 
adev->dm.dc->links[notify.link_index];


--
Hamza



Re: [PATCH v3 1/5] drm/amdgpu: add UAPI for workload hints to ctx ioctl

2023-03-22 Thread Marek Olšák
The uapi would make sense if somebody wrote and implemented a Vulkan
extension exposing the hints and if we had customers who require that
extension. Without that, userspace knows almost nothing. If anything, this
effort should be led by our customers, especially in the case of Vulkan
(writing the extension spec, etc.)

This is not a stack issue as much as it is an interface designed around
Windows that doesn't fit Linux, and for that reason, putting it into uapi in
the current form doesn't seem to be a good idea.

Marek

On Wed, Mar 22, 2023 at 10:52 AM Alex Deucher  wrote:

> On Wed, Mar 22, 2023 at 10:37 AM Marek Olšák  wrote:
> >
> > It sounds like the kernel should set the hint based on which queues are
> used, so that every UMD doesn't have to duplicate the same logic.
>
> Userspace has a better idea of what they are doing than the kernel.
> That said, we already set the video hint in the kernel when we submit
> work to VCN/UVD/VCE and we already set hint COMPUTE when user queues
> are active in ROCm because user queues don't go through the kernel.  I
> guess we could just set 3D by default.  On Windows there is a separate
> API for fullscreen 3D games, so 3D is only enabled in that case.  I
> assumed UMDs would want to select a hint, but maybe we should just
> let the kernel set something.  I figured Vulkan or OpenGL would
> select 3D vs COMPUTE depending on what queues/extensions the app uses.
>
> Thinking about it more, if we do keep the hints, maybe it makes more
> sense to select the hint at context init.  Then we can set the hint to
> the hardware at context init time.  If multiple hints come in from
> different contexts we'll automatically select the most aggressive one.
> That would also be compatible with user mode queues.
>
> Alex
>
> >
> > Marek
> >
> > On Wed, Mar 22, 2023 at 10:29 AM Christian König <
> christian.koe...@amd.com> wrote:
> >>
> >> Well that sounds like being able to optionally set it after context
> creation is actually the right approach.
> >>
> >> VA-API could set it as soon as we know that this is a video codec
> application.
> >>
> >> Vulkan can set it depending on what features are used by the
> application.
> >>
> >> But yes, Shashank (or whoever requested that) should come up with some
> code for Mesa to actually use it. Otherwise we don't have the justification
> to push it into the kernel driver.
> >>
> >> Christian.
> >>
> >> Am 22.03.23 um 15:24 schrieb Marek Olšák:
> >>
> >> The hint is static per API (one of graphics, video, compute, unknown).
> In the case of Vulkan, which exposes all queues, the hint is unknown, so
> Vulkan won't use it. (or make it based on the queue being used and not the
> uapi context state) GL won't use it because the default hint is already 3D.
> That makes VAAPI the only user that only sets the hint once, and maybe it's
> not worth even adding this uapi just for VAAPI.
> >>
> >> Marek
> >>
> >> On Wed, Mar 22, 2023 at 10:08 AM Christian König <
> christian.koe...@amd.com> wrote:
> >>>
> >>> Well completely agree that we shouldn't have unused API. That's why I
> said we should remove the getting the hint from the UAPI.
> >>>
> >>> But what's wrong with setting it after creating the context? Don't you
> know enough about the use case? I need to understand the background a bit
> better here.
> >>>
> >>> Christian.
> >>>
> >>> Am 22.03.23 um 15:05 schrieb Marek Olšák:
> >>>
> >>> The option to change the hint after context creation and get the hint
> would be unused uapi, and AFAIK we are not supposed to add unused uapi.
> What I asked is to change it to a uapi that userspace will actually use.
> >>>
> >>> Marek
> >>>
> >>> On Tue, Mar 21, 2023 at 9:54 AM Christian König <
> ckoenig.leichtzumer...@gmail.com> wrote:
> 
>  Yes, I would like to avoid having multiple code paths for context
> creation.
> 
>  Setting it later on should be equivalent to specifying it on creation
> since we only need it during CS.
> 
>  Regards,
>  Christian.
> 
>  Am 21.03.23 um 14:00 schrieb Sharma, Shashank:
> 
>  [AMD Official Use Only - General]
> 
> 
> 
>  When we started this patch series, the workload hint was a part of
> the ctx_flag only,
> 
>  But we changed that after the design review, to make it more like how
> we are handling PSTATE.
> 
> 
> 
>  Details:
> 
>  https://patchwork.freedesktop.org/patch/496111/
> 
> 
> 
>  Regards
> 
>  Shashank
> 
> 
> 
>  From: Marek Olšák 
>  Sent: 21 March 2023 04:05
>  To: Sharma, Shashank 
>  Cc: amd-gfx@lists.freedesktop.org; Deucher, Alexander <
> alexander.deuc...@amd.com>; Somalapuram, Amaranath <
> amaranath.somalapu...@amd.com>; Koenig, Christian <
> christian.koe...@amd.com>
>  Subject: Re: [PATCH v3 1/5] drm/amdgpu: add UAPI for workload hints
> to ctx ioctl
> 
> 
> 
>  I think we should do it differently because this interface will be
> mostly 

Re: [PATCH v3 1/5] drm/amdgpu: add UAPI for workload hints to ctx ioctl

2023-03-22 Thread Alex Deucher
On Wed, Mar 22, 2023 at 10:37 AM Marek Olšák  wrote:
>
> It sounds like the kernel should set the hint based on which queues are used, 
> so that every UMD doesn't have to duplicate the same logic.

Userspace has a better idea of what they are doing than the kernel.
That said, we already set the video hint in the kernel when we submit
work to VCN/UVD/VCE and we already set hint COMPUTE when user queues
are active in ROCm because user queues don't go through the kernel.  I
guess we could just set 3D by default.  On Windows there is a separate
API for fullscreen 3D games, so 3D is only enabled in that case.  I
assumed UMDs would want to select a hint, but maybe we should just
let the kernel set something.  I figured Vulkan or OpenGL would
select 3D vs COMPUTE depending on what queues/extensions the app uses.

Thinking about it more, if we do keep the hints, maybe it makes more
sense to select the hint at context init.  Then we can set the hint to
the hardware at context init time.  If multiple hints come in from
different contexts we'll automatically select the most aggressive one.
That would also be compatible with user mode queues.

Alex

>
> Marek
>
> On Wed, Mar 22, 2023 at 10:29 AM Christian König  
> wrote:
>>
>> Well that sounds like being able to optionally set it after context creation 
>> is actually the right approach.
>>
>> VA-API could set it as soon as we know that this is a video codec 
>> application.
>>
>> Vulkan can set it depending on what features are used by the application.
>>
>> But yes, Shashank (or whoever requested that) should come up with some code 
>> for Mesa to actually use it. Otherwise we don't have the justification to 
>> push it into the kernel driver.
>>
>> Christian.
>>
>> Am 22.03.23 um 15:24 schrieb Marek Olšák:
>>
>> The hint is static per API (one of graphics, video, compute, unknown). In 
>> the case of Vulkan, which exposes all queues, the hint is unknown, so Vulkan 
>> won't use it. (or make it based on the queue being used and not the uapi 
>> context state) GL won't use it because the default hint is already 3D. That 
>> makes VAAPI the only user that only sets the hint once, and maybe it's not 
>> worth even adding this uapi just for VAAPI.
>>
>> Marek
>>
>> On Wed, Mar 22, 2023 at 10:08 AM Christian König  
>> wrote:
>>>
>>> Well completely agree that we shouldn't have unused API. That's why I said 
>>> we should remove the getting the hint from the UAPI.
>>>
>>> But what's wrong with setting it after creating the context? Don't you know 
>>> enough about the use case? I need to understand the background a bit better 
>>> here.
>>>
>>> Christian.
>>>
>>> Am 22.03.23 um 15:05 schrieb Marek Olšák:
>>>
>>> The option to change the hint after context creation and get the hint would 
>>> be unused uapi, and AFAIK we are not supposed to add unused uapi. What I 
>>> asked is to change it to a uapi that userspace will actually use.
>>>
>>> Marek
>>>
>>> On Tue, Mar 21, 2023 at 9:54 AM Christian König 
>>>  wrote:

 Yes, I would like to avoid having multiple code paths for context creation.

 Setting it later on should be equivalent to specifying it on creation since 
 we only need it during CS.

 Regards,
 Christian.

 Am 21.03.23 um 14:00 schrieb Sharma, Shashank:

 [AMD Official Use Only - General]



 When we started this patch series, the workload hint was a part of the 
 ctx_flag only,

 But we changed that after the design review, to make it more like how we 
 are handling PSTATE.



 Details:

 https://patchwork.freedesktop.org/patch/496111/



 Regards

 Shashank



 From: Marek Olšák 
 Sent: 21 March 2023 04:05
 To: Sharma, Shashank 
 Cc: amd-gfx@lists.freedesktop.org; Deucher, Alexander 
 ; Somalapuram, Amaranath 
 ; Koenig, Christian 
 
 Subject: Re: [PATCH v3 1/5] drm/amdgpu: add UAPI for workload hints to ctx 
 ioctl



 I think we should do it differently because this interface will be mostly 
 unused by open source userspace in its current form.



 Let's set the workload hint in drm_amdgpu_ctx_in::flags, and that will be 
 immutable for the lifetime of the context. No other interface is needed.



 Marek



 On Mon, Sep 26, 2022 at 5:41 PM Shashank Sharma  
 wrote:

 Allow the user to specify a workload hint to the kernel.
 We can use these to tweak the dpm heuristics to better match
 the workload for improved performance.

 V3: Create only set() workload UAPI (Christian)

 Signed-off-by: Alex Deucher 
 Signed-off-by: Shashank Sharma 
 ---
  include/uapi/drm/amdgpu_drm.h | 17 +
  1 file changed, 17 insertions(+)

 diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
 index c2c9c674a223..23d354242699 100644

RE: [PATCH v3 1/5] drm/amdgpu: add UAPI for workload hints to ctx ioctl

2023-03-22 Thread Sharma, Shashank
[AMD Official Use Only - General]

From the exposed workload hints:
+#define AMDGPU_CTX_WORKLOAD_HINT_NONE
+#define AMDGPU_CTX_WORKLOAD_HINT_3D
+#define AMDGPU_CTX_WORKLOAD_HINT_VIDEO
+#define AMDGPU_CTX_WORKLOAD_HINT_VR
+#define AMDGPU_CTX_WORKLOAD_HINT_COMPUTE

I guess the only option we do not know how to use is HINT_VR; everything
else is known. I find it a limitation of the stack that we can't differentiate
between a VR workload and a 3D one, because at some point we might have to give
VR higher privilege or special attention as it becomes more demanding. But for
now, I can remove this one option from the patch:

+#define AMDGPU_CTX_WORKLOAD_HINT_VR
Regards
Shashank

From: Koenig, Christian 
Sent: 22 March 2023 15:29
To: Marek Olšák 
Cc: Christian König ; Sharma, Shashank 
; Deucher, Alexander ; 
Somalapuram, Amaranath ; 
amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH v3 1/5] drm/amdgpu: add UAPI for workload hints to ctx ioctl

Well that sounds like being able to optionally set it after context creation is 
actually the right approach.

VA-API could set it as soon as we know that this is a video codec application.

Vulkan can set it depending on what features are used by the application.

But yes, Shashank (or whoever requested that) should come up with some code for 
Mesa to actually use it. Otherwise we don't have the justification to push it 
into the kernel driver.

Christian.
Am 22.03.23 um 15:24 schrieb Marek Olšák:
The hint is static per API (one of graphics, video, compute, unknown). In the 
case of Vulkan, which exposes all queues, the hint is unknown, so Vulkan won't 
use it. (or make it based on the queue being used and not the uapi context 
state) GL won't use it because the default hint is already 3D. That makes VAAPI 
the only user that only sets the hint once, and maybe it's not worth even 
adding this uapi just for VAAPI.

Marek

On Wed, Mar 22, 2023 at 10:08 AM Christian König 
mailto:christian.koe...@amd.com>> wrote:
Well completely agree that we shouldn't have unused API. That's why I said we 
should remove the getting the hint from the UAPI.

But what's wrong with setting it after creating the context? Don't you know 
enough about the use case? I need to understand the background a bit better 
here.

Christian.
Am 22.03.23 um 15:05 schrieb Marek Olšák:
The option to change the hint after context creation and get the hint would be 
unused uapi, and AFAIK we are not supposed to add unused uapi. What I asked is 
to change it to a uapi that userspace will actually use.

Marek

On Tue, Mar 21, 2023 at 9:54 AM Christian König 
mailto:ckoenig.leichtzumer...@gmail.com>> 
wrote:
Yes, I would like to avoid having multiple code paths for context creation.

Setting it later on should be equivalent to specifying it on creation since we 
only need it during CS.

Regards,
Christian.
Am 21.03.23 um 14:00 schrieb Sharma, Shashank:

[AMD Official Use Only - General]

When we started this patch series, the workload hint was a part of the ctx_flag 
only,
But we changed that after the design review, to make it more like how we are 
handling PSTATE.

Details:
https://patchwork.freedesktop.org/patch/496111/

Regards
Shashank

From: Marek Olšák 
Sent: 21 March 2023 04:05
To: Sharma, Shashank 
Cc: amd-gfx@lists.freedesktop.org; 
Deucher, Alexander 
; Somalapuram, 
Amaranath 
; Koenig, 
Christian 
Subject: Re: [PATCH v3 1/5] drm/amdgpu: add UAPI for workload hints to ctx ioctl

I think we should do it differently because this interface will be mostly 
unused by open source userspace in its current form.

Let's set the workload hint in drm_amdgpu_ctx_in::flags, and that will be 
immutable for the lifetime of the context. No other interface is needed.

Marek

On Mon, Sep 26, 2022 at 5:41 PM Shashank Sharma 
mailto:shashank.sha...@amd.com>> wrote:
Allow the user to specify a workload hint to the kernel.
We can use these to tweak the dpm heuristics to better match
the workload for improved performance.

V3: Create only set() workload UAPI (Christian)

Signed-off-by: Alex Deucher 
mailto:alexander.deuc...@amd.com>>
Signed-off-by: Shashank Sharma 
mailto:shashank.sha...@amd.com>>
---
 include/uapi/drm/amdgpu_drm.h | 17 +
 1 file changed, 17 insertions(+)

diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index c2c9c674a223..23d354242699 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -212,6 +212,7 @@ union drm_amdgpu_bo_list {
 #define AMDGPU_CTX_OP_QUERY_STATE2 4
 #define AMDGPU_CTX_OP_GET_STABLE_PSTATE5
 #define AMDGPU_CTX_OP_SET_STABLE_PSTATE6
+#define AMDGPU_CTX_OP_SET_WORKLOAD_PROFILE 7

 /* GPU reset status */
 #define AMDGPU_CTX_NO_RESET0
@@ -252,6 +253,17 @@ union drm_amdgpu_bo_list {
 #define 

Re: [PATCH v3 1/5] drm/amdgpu: add UAPI for workload hints to ctx ioctl

2023-03-22 Thread Marek Olšák
It sounds like the kernel should set the hint based on which queues are
used, so that every UMD doesn't have to duplicate the same logic.

Marek

On Wed, Mar 22, 2023 at 10:29 AM Christian König 
wrote:

> Well that sounds like being able to optionally set it after context
> creation is actually the right approach.
>
> VA-API could set it as soon as we know that this is a video codec
> application.
>
> Vulkan can set it depending on what features are used by the application.
>
> But yes, Shashank (or whoever requested that) should come up with some
> code for Mesa to actually use it. Otherwise we don't have the justification
> to push it into the kernel driver.
>
> Christian.
>
> Am 22.03.23 um 15:24 schrieb Marek Olšák:
>
> The hint is static per API (one of graphics, video, compute, unknown). In
> the case of Vulkan, which exposes all queues, the hint is unknown, so
> Vulkan won't use it. (or make it based on the queue being used and not the
> uapi context state) GL won't use it because the default hint is already 3D.
> That makes VAAPI the only user that only sets the hint once, and maybe it's
> not worth even adding this uapi just for VAAPI.
>
> Marek
>
> On Wed, Mar 22, 2023 at 10:08 AM Christian König 
> wrote:
>
>> Well completely agree that we shouldn't have unused API. That's why I
>> said we should remove the getting the hint from the UAPI.
>>
>> But what's wrong with setting it after creating the context? Don't you
>> know enough about the use case? I need to understand the background a bit
>> better here.
>>
>> Christian.
>>
>> Am 22.03.23 um 15:05 schrieb Marek Olšák:
>>
>> The option to change the hint after context creation and get the hint
>> would be unused uapi, and AFAIK we are not supposed to add unused uapi.
>> What I asked is to change it to a uapi that userspace will actually use.
>>
>> Marek
>>
>> On Tue, Mar 21, 2023 at 9:54 AM Christian König <
>> ckoenig.leichtzumer...@gmail.com> wrote:
>>
>>> Yes, I would like to avoid having multiple code paths for context
>>> creation.
>>>
>>> Setting it later on should be equivalent to specifying it on creation since
>>> we only need it during CS.
>>>
>>> Regards,
>>> Christian.
>>>
>>> Am 21.03.23 um 14:00 schrieb Sharma, Shashank:
>>>
>>> [AMD Official Use Only - General]
>>>
>>>
>>>
>>> When we started this patch series, the workload hint was a part of the
>>> ctx_flag only,
>>>
>>> But we changed that after the design review, to make it more like how we
>>> are handling PSTATE.
>>>
>>>
>>>
>>> Details:
>>>
>>> https://patchwork.freedesktop.org/patch/496111/
>>>
>>>
>>>
>>> Regards
>>>
>>> Shashank
>>>
>>>
>>>
>>> *From:* Marek Olšák  
>>> *Sent:* 21 March 2023 04:05
>>> *To:* Sharma, Shashank 
>>> 
>>> *Cc:* amd-gfx@lists.freedesktop.org; Deucher, Alexander
>>>  ; Somalapuram,
>>> Amaranath 
>>> ; Koenig, Christian
>>>  
>>> *Subject:* Re: [PATCH v3 1/5] drm/amdgpu: add UAPI for workload hints
>>> to ctx ioctl
>>>
>>>
>>>
>>> I think we should do it differently because this interface will be
>>> mostly unused by open source userspace in its current form.
>>>
>>>
>>>
>>> Let's set the workload hint in drm_amdgpu_ctx_in::flags, and that will
>>> be immutable for the lifetime of the context. No other interface is needed.
>>>
>>>
>>>
>>> Marek
>>>
>>>
>>>
>>> On Mon, Sep 26, 2022 at 5:41 PM Shashank Sharma 
>>> wrote:
>>>
>>> Allow the user to specify a workload hint to the kernel.
>>> We can use these to tweak the dpm heuristics to better match
>>> the workload for improved performance.
>>>
>>> V3: Create only set() workload UAPI (Christian)
>>>
>>> Signed-off-by: Alex Deucher 
>>> Signed-off-by: Shashank Sharma 
>>> ---
>>>  include/uapi/drm/amdgpu_drm.h | 17 +
>>>  1 file changed, 17 insertions(+)
>>>
>>> diff --git a/include/uapi/drm/amdgpu_drm.h
>>> b/include/uapi/drm/amdgpu_drm.h
>>> index c2c9c674a223..23d354242699 100644
>>> --- a/include/uapi/drm/amdgpu_drm.h
>>> +++ b/include/uapi/drm/amdgpu_drm.h
>>> @@ -212,6 +212,7 @@ union drm_amdgpu_bo_list {
>>>  #define AMDGPU_CTX_OP_QUERY_STATE2 4
>>>  #define AMDGPU_CTX_OP_GET_STABLE_PSTATE5
>>>  #define AMDGPU_CTX_OP_SET_STABLE_PSTATE6
>>> +#define AMDGPU_CTX_OP_SET_WORKLOAD_PROFILE 7
>>>
>>>  /* GPU reset status */
>>>  #define AMDGPU_CTX_NO_RESET0
>>> @@ -252,6 +253,17 @@ union drm_amdgpu_bo_list {
>>>  #define AMDGPU_CTX_STABLE_PSTATE_MIN_MCLK  3
>>>  #define AMDGPU_CTX_STABLE_PSTATE_PEAK  4
>>>
>>> +/* GPU workload hints, flag bits 8-15 */
>>> +#define AMDGPU_CTX_WORKLOAD_HINT_SHIFT 8
>>> +#define AMDGPU_CTX_WORKLOAD_HINT_MASK  (0xff <<
>>> AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
>>> +#define AMDGPU_CTX_WORKLOAD_HINT_NONE  (0 <<
>>> AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
>>> +#define AMDGPU_CTX_WORKLOAD_HINT_3D(1 <<
>>> AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
>>> +#define AMDGPU_CTX_WORKLOAD_HINT_VIDEO (2 <<
>>> AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
>>> +#define AMDGPU_CTX_WORKLOAD_HINT_VR(3 <<
>>> AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
>>> 

Re: [PATCH 07/11] drm/amdgpu: add UAPI to query GFX shadow sizes

2023-03-22 Thread Alex Deucher
On Wed, Mar 22, 2023 at 10:12 AM Marek Olšák  wrote:
>
> On Tue, Mar 21, 2023 at 3:51 PM Alex Deucher  wrote:
>>
>> On Mon, Mar 20, 2023 at 8:30 PM Marek Olšák  wrote:
>> >
>> >
>> > On Mon, Mar 20, 2023 at 1:38 PM Alex Deucher  
>> > wrote:
>> >>
>> >> Add UAPI to query the GFX shadow buffer requirements
>> >> for preemption on GFX11.  UMDs need to specify the shadow
>> >> areas for preemption.
>> >>
>> >> Signed-off-by: Alex Deucher 
>> >> ---
>> >>  include/uapi/drm/amdgpu_drm.h | 10 ++
>> >>  1 file changed, 10 insertions(+)
>> >>
>> >> diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
>> >> index 3d9474af6566..19a806145371 100644
>> >> --- a/include/uapi/drm/amdgpu_drm.h
>> >> +++ b/include/uapi/drm/amdgpu_drm.h
>> >> @@ -886,6 +886,7 @@ struct drm_amdgpu_cs_chunk_cp_gfx_shadow {
>> >> #define AMDGPU_INFO_VIDEO_CAPS_DECODE   0
>> >> /* Subquery id: Encode */
>> >> #define AMDGPU_INFO_VIDEO_CAPS_ENCODE   1
>> >> +#define AMDGPU_INFO_CP_GFX_SHADOW_SIZE 0x22
>> >>
>> >>  #define AMDGPU_INFO_MMR_SE_INDEX_SHIFT 0
>> >>  #define AMDGPU_INFO_MMR_SE_INDEX_MASK  0xff
>> >> @@ -1203,6 +1204,15 @@ struct drm_amdgpu_info_video_caps {
>> >> struct drm_amdgpu_info_video_codec_info 
>> >> codec_info[AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_COUNT];
>> >>  };
>> >>
>> >> +struct drm_amdgpu_info_cp_gfx_shadow_size {
>> >> +   __u32 shadow_size;
>> >> +   __u32 shadow_alignment;
>> >> +   __u32 csa_size;
>> >> +   __u32 csa_alignment;
>> >> +   __u32 gds_size;
>> >> +   __u32 gds_alignment;
>> >
>> >
>> > Can you document the fields? What is CSA? Also, why is GDS there when the 
>> > hw deprecated it and replaced it with GDS registers?
>>
>> Will add documentation.  For reference:
>> CSA (Context Save Area) - used as a scratch area for FW for saving
>> various things
>> Shadow - stores the pipeline state
>> GDS backup - stores the GDS state used by the pipeline.  I'm not sure
>> if this is registers or the old GDS memory.  Presumably the former.
>
>
> 1. The POR for gfx11 was not to use GDS memory. I don't know why it's there, 
> but it would be unused uapi.

It still needs to be allocated because the FW requires it.

>
> 2. Is it secure to give userspace write access to the CSA and shadow buffers? 
> In the case of CSA, it looks like userspace could break the firmware.

Yes, it should be fine. It's the same way it has always been, it's
just that on older chips the kernel mapped them into all of the GPUVMs
because it was global. If the userspace screws it up, they are only
hurting themselves because the FW uses it to save the UMD's hardware
state.

Alex

>
> Marek
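
For reference, a minimal userspace sketch of how the proposed query would be
issued through libdrm, assuming the patch lands as-is (the query id and the
struct come from the hunk above):

    struct drm_amdgpu_info_cp_gfx_shadow_size shadow = {0};
    struct drm_amdgpu_info request = {0};
    int ret;

    request.return_pointer = (uintptr_t)&shadow;
    request.return_size = sizeof(shadow);
    request.query = AMDGPU_INFO_CP_GFX_SHADOW_SIZE;

    /* A not-supported error here means the kernel lacks the query. */
    ret = drmCommandWrite(fd, DRM_AMDGPU_INFO, &request, sizeof(request));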


Re: [PATCH 1/2] drm/amdgpu: track MQD size for gfx and compute

2023-03-22 Thread Felix Kuehling
MQDs are smaller than a page. The BO size will always be exactly one
page.


KFD can allocate MQDs with a suballocator. On some GPUs we allocate MQDs 
together with the queue's control stack in a single BO. And on some GPUs 
we allocate SDMA "MQDs" in bulk together with the HIQ MQD. So relying on 
the BO size would not work for us.


Regards,
  Felix


Am 2023-03-22 um 09:58 schrieb Christian König:

Am 22.03.23 um 14:26 schrieb Alex Deucher:

On Wed, Mar 22, 2023 at 4:48 AM Christian König
 wrote:

Am 21.03.23 um 20:39 schrieb Alex Deucher:

It varies by generation and we need to know the size
to expose this via debugfs.

I suspect we can't just use the BO size for this?

We could, but it may be larger than the actual MQD.  Maybe that's not
a big deal?


I don't really know either. Maybe just go ahead with this approach 
here, but I usually try to avoid stuff like that because it can be an 
additional source of errors when the allocation size is not correct.


Christian.



Alex


If yes the series is Reviewed-by: Christian König 




Signed-off-by: Alex Deucher 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c  | 2 ++
   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 1 +
   2 files changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c

index c50d59855011..5435f41a3b7f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
@@ -404,6 +404,7 @@ int amdgpu_gfx_mqd_sw_init(struct amdgpu_device 
*adev,

   return r;
   }

+ ring->mqd_size = mqd_size;
   /* prepare MQD backup */
   adev->gfx.me.mqd_backup[i] = 
kmalloc(mqd_size, GFP_KERNEL);

   if (!adev->gfx.me.mqd_backup[i])
@@ -424,6 +425,7 @@ int amdgpu_gfx_mqd_sw_init(struct amdgpu_device 
*adev,

   return r;
   }

+ ring->mqd_size = mqd_size;
   /* prepare MQD backup */
   adev->gfx.mec.mqd_backup[i] = 
kmalloc(mqd_size, GFP_KERNEL);

   if (!adev->gfx.mec.mqd_backup[i])
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h

index 7942cb62e52c..deb9f7bead02 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -257,6 +257,7 @@ struct amdgpu_ring {
   struct amdgpu_bo    *mqd_obj;
   uint64_t    mqd_gpu_addr;
   void    *mqd_ptr;
+ unsigned    mqd_size;
   uint64_t    eop_gpu_addr;
   u32 doorbell_index;
   bool    use_doorbell;
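
With ring->mqd_size recorded, the debugfs exposure mentioned above becomes
straightforward. A minimal sketch, assuming the MQD stays CPU-mapped at
ring->mqd_ptr; the file name and the parent dentry "root" are hypothetical:

    /* Sketch: expose the MQD contents as a read-only debugfs blob. */
    static struct debugfs_blob_wrapper mqd_blob;

    mqd_blob.data = ring->mqd_ptr;
    mqd_blob.size = ring->mqd_size;
    debugfs_create_blob("amdgpu_ring_mqd", 0444, root, &mqd_blob);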




Re: [PATCH v3 1/5] drm/amdgpu: add UAPI for workload hints to ctx ioctl

2023-03-22 Thread Christian König
Well that sounds like being able to optionally set it after context 
creation is actually the right approach.


VA-API could set it as soon as we know that this is a video codec 
application.


Vulkan can set it depending on what features are used by the application.

But yes, Shashank (or whoever requested that) should come up with some 
code for Mesa to actually use it. Otherwise we don't have the 
justification to push it into the kernel driver.


Christian.

Am 22.03.23 um 15:24 schrieb Marek Olšák:
The hint is static per API (one of graphics, video, compute, unknown). 
In the case of Vulkan, which exposes all queues, the hint is unknown, 
so Vulkan won't use it. (or make it based on the queue being used and 
not the uapi context state) GL won't use it because the default hint 
is already 3D. That makes VAAPI the only user that only sets the hint 
once, and maybe it's not worth even adding this uapi just for VAAPI.


Marek

On Wed, Mar 22, 2023 at 10:08 AM Christian König 
 wrote:


Well completely agree that we shouldn't have unused API. That's
why I said we should remove the getting the hint from the UAPI.

But what's wrong with setting it after creating the context? Don't
you know enough about the use case? I need to understand the
background a bit better here.

Christian.

Am 22.03.23 um 15:05 schrieb Marek Olšák:

The option to change the hint after context creation and get the
hint would be unused uapi, and AFAIK we are not supposed to add
unused uapi. What I asked is to change it to a uapi that
userspace will actually use.

Marek

On Tue, Mar 21, 2023 at 9:54 AM Christian König
 wrote:

Yes, I would like to avoid having multiple code paths for
context creation.

Setting it later on should be equivalent to specifying it on
creation since we only need it during CS.

Regards,
Christian.

Am 21.03.23 um 14:00 schrieb Sharma, Shashank:


[AMD Official Use Only - General]

When we started this patch series, the workload hint was a
part of the ctx_flag only,

But we changed that after the design review, to make it more
like how we are handling PSTATE.

Details:

https://patchwork.freedesktop.org/patch/496111/

Regards

Shashank

*From:*Marek Olšák  
*Sent:* 21 March 2023 04:05
*To:* Sharma, Shashank 

*Cc:* amd-gfx@lists.freedesktop.org; Deucher, Alexander

; Somalapuram, Amaranath

; Koenig, Christian
 
*Subject:* Re: [PATCH v3 1/5] drm/amdgpu: add UAPI for
workload hints to ctx ioctl

I think we should do it differently because this interface
will be mostly unused by open source userspace in its
current form.

Let's set the workload hint in drm_amdgpu_ctx_in::flags, and
that will be immutable for the lifetime of the context. No
other interface is needed.

Marek

On Mon, Sep 26, 2022 at 5:41 PM Shashank Sharma
 wrote:

Allow the user to specify a workload hint to the kernel.
We can use these to tweak the dpm heuristics to better match
the workload for improved performance.

V3: Create only set() workload UAPI (Christian)

Signed-off-by: Alex Deucher 
Signed-off-by: Shashank Sharma 
---
 include/uapi/drm/amdgpu_drm.h | 17 +
 1 file changed, 17 insertions(+)

diff --git a/include/uapi/drm/amdgpu_drm.h
b/include/uapi/drm/amdgpu_drm.h
index c2c9c674a223..23d354242699 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -212,6 +212,7 @@ union drm_amdgpu_bo_list {
 #define AMDGPU_CTX_OP_QUERY_STATE2  4
 #define AMDGPU_CTX_OP_GET_STABLE_PSTATE        5
 #define AMDGPU_CTX_OP_SET_STABLE_PSTATE        6
+#define AMDGPU_CTX_OP_SET_WORKLOAD_PROFILE     7

 /* GPU reset status */
 #define AMDGPU_CTX_NO_RESET 0
@@ -252,6 +253,17 @@ union drm_amdgpu_bo_list {
 #define AMDGPU_CTX_STABLE_PSTATE_MIN_MCLK  3
 #define AMDGPU_CTX_STABLE_PSTATE_PEAK 4

+/* GPU workload hints, flag bits 8-15 */
+#define AMDGPU_CTX_WORKLOAD_HINT_SHIFT    8
+#define AMDGPU_CTX_WORKLOAD_HINT_MASK     (0xff <<
AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
+#define AMDGPU_CTX_WORKLOAD_HINT_NONE     (0 <<
AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
+#define AMDGPU_CTX_WORKLOAD_HINT_3D     (1 <<
AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
+#define 

Re: [PATCH v3 1/5] drm/amdgpu: add UAPI for workload hints to ctx ioctl

2023-03-22 Thread Marek Olšák
The hint is static per API (one of graphics, video, compute, unknown). In
the case of Vulkan, which exposes all queues, the hint is unknown, so
Vulkan won't use it. (or make it based on the queue being used and not the
uapi context state) GL won't use it because the default hint is already 3D.
That makes VAAPI the only user that only sets the hint once, and maybe it's
not worth even adding this uapi just for VAAPI.

Marek

On Wed, Mar 22, 2023 at 10:08 AM Christian König 
wrote:

> Well completely agree that we shouldn't have unused API. That's why I said
> we should remove the getting the hint from the UAPI.
>
> But what's wrong with setting it after creating the context? Don't you
> know enough about the use case? I need to understand the background a bit
> better here.
>
> Christian.
>
> Am 22.03.23 um 15:05 schrieb Marek Olšák:
>
> The option to change the hint after context creation and get the hint
> would be unused uapi, and AFAIK we are not supposed to add unused uapi.
> What I asked is to change it to a uapi that userspace will actually use.
>
> Marek
>
> On Tue, Mar 21, 2023 at 9:54 AM Christian König <
> ckoenig.leichtzumer...@gmail.com> wrote:
>
>> Yes, I would like to avoid having multiple code paths for context
>> creation.
>>
>> Setting it later on should be equivalent to specifying it on creation since
>> we only need it during CS.
>>
>> Regards,
>> Christian.
>>
>> Am 21.03.23 um 14:00 schrieb Sharma, Shashank:
>>
>> [AMD Official Use Only - General]
>>
>>
>>
>> When we started this patch series, the workload hint was a part of the
>> ctx_flag only,
>>
>> But we changed that after the design review, to make it more like how we
>> are handling PSTATE.
>>
>>
>>
>> Details:
>>
>> https://patchwork.freedesktop.org/patch/496111/
>>
>>
>>
>> Regards
>>
>> Shashank
>>
>>
>>
>> *From:* Marek Olšák  
>> *Sent:* 21 March 2023 04:05
>> *To:* Sharma, Shashank 
>> 
>> *Cc:* amd-gfx@lists.freedesktop.org; Deucher, Alexander
>>  ; Somalapuram,
>> Amaranath  ;
>> Koenig, Christian  
>> *Subject:* Re: [PATCH v3 1/5] drm/amdgpu: add UAPI for workload hints to
>> ctx ioctl
>>
>>
>>
>> I think we should do it differently because this interface will be mostly
>> unused by open source userspace in its current form.
>>
>>
>>
>> Let's set the workload hint in drm_amdgpu_ctx_in::flags, and that will be
>> immutable for the lifetime of the context. No other interface is needed.
>>
>>
>>
>> Marek
>>
>>
>>
>> On Mon, Sep 26, 2022 at 5:41 PM Shashank Sharma 
>> wrote:
>>
>> Allow the user to specify a workload hint to the kernel.
>> We can use these to tweak the dpm heuristics to better match
>> the workload for improved performance.
>>
>> V3: Create only set() workload UAPI (Christian)
>>
>> Signed-off-by: Alex Deucher 
>> Signed-off-by: Shashank Sharma 
>> ---
>>  include/uapi/drm/amdgpu_drm.h | 17 +
>>  1 file changed, 17 insertions(+)
>>
>> diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
>> index c2c9c674a223..23d354242699 100644
>> --- a/include/uapi/drm/amdgpu_drm.h
>> +++ b/include/uapi/drm/amdgpu_drm.h
>> @@ -212,6 +212,7 @@ union drm_amdgpu_bo_list {
>>  #define AMDGPU_CTX_OP_QUERY_STATE2 4
>>  #define AMDGPU_CTX_OP_GET_STABLE_PSTATE5
>>  #define AMDGPU_CTX_OP_SET_STABLE_PSTATE6
>> +#define AMDGPU_CTX_OP_SET_WORKLOAD_PROFILE 7
>>
>>  /* GPU reset status */
>>  #define AMDGPU_CTX_NO_RESET0
>> @@ -252,6 +253,17 @@ union drm_amdgpu_bo_list {
>>  #define AMDGPU_CTX_STABLE_PSTATE_MIN_MCLK  3
>>  #define AMDGPU_CTX_STABLE_PSTATE_PEAK  4
>>
>> +/* GPU workload hints, flag bits 8-15 */
>> +#define AMDGPU_CTX_WORKLOAD_HINT_SHIFT 8
>> +#define AMDGPU_CTX_WORKLOAD_HINT_MASK  (0xff <<
>> AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
>> +#define AMDGPU_CTX_WORKLOAD_HINT_NONE  (0 <<
>> AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
>> +#define AMDGPU_CTX_WORKLOAD_HINT_3D(1 <<
>> AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
>> +#define AMDGPU_CTX_WORKLOAD_HINT_VIDEO (2 <<
>> AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
>> +#define AMDGPU_CTX_WORKLOAD_HINT_VR(3 <<
>> AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
>> +#define AMDGPU_CTX_WORKLOAD_HINT_COMPUTE   (4 <<
>> AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
>> +#define AMDGPU_CTX_WORKLOAD_HINT_MAX
>> AMDGPU_CTX_WORKLOAD_HINT_COMPUTE
>> +#define AMDGPU_CTX_WORKLOAD_INDEX(n)  (n >>
>> AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
>> +
>>  struct drm_amdgpu_ctx_in {
>> /** AMDGPU_CTX_OP_* */
>> __u32   op;
>> @@ -281,6 +293,11 @@ union drm_amdgpu_ctx_out {
>> __u32   flags;
>> __u32   _pad;
>> } pstate;
>> +
>> +   struct {
>> +   __u32   flags;
>> +   __u32   _pad;
>> +   } workload;
>>  };
>>
>>  union drm_amdgpu_ctx {
>> --
>> 2.34.1
>>
>>
>>
>


Re: [PATCH 07/11] drm/amdgpu: add UAPI to query GFX shadow sizes

2023-03-22 Thread Marek Olšák
On Tue, Mar 21, 2023 at 3:51 PM Alex Deucher  wrote:

> On Mon, Mar 20, 2023 at 8:30 PM Marek Olšák  wrote:
> >
> >
> > On Mon, Mar 20, 2023 at 1:38 PM Alex Deucher 
> wrote:
> >>
> >> Add UAPI to query the GFX shadow buffer requirements
> >> for preemption on GFX11.  UMDs need to specify the shadow
> >> areas for preemption.
> >>
> >> Signed-off-by: Alex Deucher 
> >> ---
> >>  include/uapi/drm/amdgpu_drm.h | 10 ++
> >>  1 file changed, 10 insertions(+)
> >>
> >> diff --git a/include/uapi/drm/amdgpu_drm.h
> b/include/uapi/drm/amdgpu_drm.h
> >> index 3d9474af6566..19a806145371 100644
> >> --- a/include/uapi/drm/amdgpu_drm.h
> >> +++ b/include/uapi/drm/amdgpu_drm.h
> >> @@ -886,6 +886,7 @@ struct drm_amdgpu_cs_chunk_cp_gfx_shadow {
> >> #define AMDGPU_INFO_VIDEO_CAPS_DECODE   0
> >> /* Subquery id: Encode */
> >> #define AMDGPU_INFO_VIDEO_CAPS_ENCODE   1
> >> +#define AMDGPU_INFO_CP_GFX_SHADOW_SIZE 0x22
> >>
> >>  #define AMDGPU_INFO_MMR_SE_INDEX_SHIFT 0
> >>  #define AMDGPU_INFO_MMR_SE_INDEX_MASK  0xff
> >> @@ -1203,6 +1204,15 @@ struct drm_amdgpu_info_video_caps {
> >> struct drm_amdgpu_info_video_codec_info
> codec_info[AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_COUNT];
> >>  };
> >>
> >> +struct drm_amdgpu_info_cp_gfx_shadow_size {
> >> +   __u32 shadow_size;
> >> +   __u32 shadow_alignment;
> >> +   __u32 csa_size;
> >> +   __u32 csa_alignment;
> >> +   __u32 gds_size;
> >> +   __u32 gds_alignment;
> >
> >
> > Can you document the fields? What is CSA? Also, why is GDS there when
> the hw deprecated it and replaced it with GDS registers?
>
> Will add documentation.  For reference:
> CSA (Context Save Area) - used as a scratch area for FW for saving
> various things
> Shadow - stores the pipeline state
> GDS backup - stores the GDS state used by the pipeline.  I'm not sure
> if this is registers or the old GDS memory.  Presumably the former.
>

1. The POR for gfx11 was not to use GDS memory. I don't know why it's
there, but it would be unused uapi.

2. Is it secure to give userspace write access to the CSA and shadow
buffers? In the case of CSA, it looks like userspace could break the
firmware.

Marek


Re: [PATCH v3 1/5] drm/amdgpu: add UAPI for workload hints to ctx ioctl

2023-03-22 Thread Christian König
Well completely agree that we shouldn't have unused API. That's why I 
said we should remove the getting the hint from the UAPI.


But what's wrong with setting it after creating the context? Don't you 
know enough about the use case? I need to understand the background a 
bit better here.


Christian.

Am 22.03.23 um 15:05 schrieb Marek Olšák:
The option to change the hint after context creation and get the hint 
would be unused uapi, and AFAIK we are not supposed to add unused 
uapi. What I asked is to change it to a uapi that userspace will 
actually use.


Marek

On Tue, Mar 21, 2023 at 9:54 AM Christian König 
 wrote:


Yes, I would like to avoid having multiple code paths for context
creation.

Setting it later on should be equivalent to specifying it on creation
since we only need it during CS.

Regards,
Christian.

Am 21.03.23 um 14:00 schrieb Sharma, Shashank:


[AMD Official Use Only - General]

When we started this patch series, the workload hint was a part
of the ctx_flag only,

But we changed that after the design review, to make it more like
how we are handling PSTATE.

Details:

https://patchwork.freedesktop.org/patch/496111/

Regards

Shashank

*From:*Marek Olšák  
*Sent:* 21 March 2023 04:05
*To:* Sharma, Shashank 

*Cc:* amd-gfx@lists.freedesktop.org; Deucher, Alexander
 ;
Somalapuram, Amaranath 
; Koenig, Christian
 
*Subject:* Re: [PATCH v3 1/5] drm/amdgpu: add UAPI for workload
hints to ctx ioctl

I think we should do it differently because this interface will
be mostly unused by open source userspace in its current form.

Let's set the workload hint in drm_amdgpu_ctx_in::flags, and that
will be immutable for the lifetime of the context. No other
interface is needed.

Marek

On Mon, Sep 26, 2022 at 5:41 PM Shashank Sharma
 wrote:

Allow the user to specify a workload hint to the kernel.
We can use these to tweak the dpm heuristics to better match
the workload for improved performance.

V3: Create only set() workload UAPI (Christian)

Signed-off-by: Alex Deucher 
Signed-off-by: Shashank Sharma 
---
 include/uapi/drm/amdgpu_drm.h | 17 +
 1 file changed, 17 insertions(+)

diff --git a/include/uapi/drm/amdgpu_drm.h
b/include/uapi/drm/amdgpu_drm.h
index c2c9c674a223..23d354242699 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -212,6 +212,7 @@ union drm_amdgpu_bo_list {
 #define AMDGPU_CTX_OP_QUERY_STATE2     4
 #define AMDGPU_CTX_OP_GET_STABLE_PSTATE        5
 #define AMDGPU_CTX_OP_SET_STABLE_PSTATE        6
+#define AMDGPU_CTX_OP_SET_WORKLOAD_PROFILE     7

 /* GPU reset status */
 #define AMDGPU_CTX_NO_RESET            0
@@ -252,6 +253,17 @@ union drm_amdgpu_bo_list {
 #define AMDGPU_CTX_STABLE_PSTATE_MIN_MCLK  3
 #define AMDGPU_CTX_STABLE_PSTATE_PEAK  4

+/* GPU workload hints, flag bits 8-15 */
+#define AMDGPU_CTX_WORKLOAD_HINT_SHIFT     8
+#define AMDGPU_CTX_WORKLOAD_HINT_MASK      (0xff <<
AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
+#define AMDGPU_CTX_WORKLOAD_HINT_NONE      (0 <<
AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
+#define AMDGPU_CTX_WORKLOAD_HINT_3D        (1 <<
AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
+#define AMDGPU_CTX_WORKLOAD_HINT_VIDEO     (2 <<
AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
+#define AMDGPU_CTX_WORKLOAD_HINT_VR        (3 <<
AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
+#define AMDGPU_CTX_WORKLOAD_HINT_COMPUTE   (4 <<
AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
+#define AMDGPU_CTX_WORKLOAD_HINT_MAX
AMDGPU_CTX_WORKLOAD_HINT_COMPUTE
+#define AMDGPU_CTX_WORKLOAD_INDEX(n)      (n >>
AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
+
 struct drm_amdgpu_ctx_in {
        /** AMDGPU_CTX_OP_* */
        __u32   op;
@@ -281,6 +293,11 @@ union drm_amdgpu_ctx_out {
                        __u32   flags;
                        __u32   _pad;
                } pstate;
+
+               struct {
+                       __u32   flags;
+                       __u32   _pad;
+               } workload;
 };

 union drm_amdgpu_ctx {
-- 
2.34.1
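
For context, a minimal userspace sketch of driving the proposed op through
libdrm, assuming the v3 uapi above where the hint travels in
drm_amdgpu_ctx_in::flags (error handling omitted):

    union drm_amdgpu_ctx ctx = {0};
    int ret;

    ctx.in.op = AMDGPU_CTX_OP_SET_WORKLOAD_PROFILE;
    ctx.in.ctx_id = ctx_id;                        /* an existing context */
    ctx.in.flags = AMDGPU_CTX_WORKLOAD_HINT_VIDEO; /* hint in flag bits 8-15 */

    ret = drmCommandWriteRead(fd, DRM_AMDGPU_CTX, &ctx, sizeof(ctx));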






Re: [PATCH 07/11] drm/amdgpu: add UAPI to query GFX shadow sizes

2023-03-22 Thread Marek Olšák
On Tue, Mar 21, 2023 at 3:54 PM Alex Deucher  wrote:

> On Mon, Mar 20, 2023 at 8:31 PM Marek Olšák  wrote:
> >
> > On Mon, Mar 20, 2023 at 1:38 PM Alex Deucher 
> wrote:
> >>
> >> Add UAPI to query the GFX shadow buffer requirements
> >> for preemption on GFX11.  UMDs need to specify the shadow
> >> areas for preemption.
> >>
> >> Signed-off-by: Alex Deucher 
> >> ---
> >>  include/uapi/drm/amdgpu_drm.h | 10 ++
> >>  1 file changed, 10 insertions(+)
> >>
> >> diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
> >> index 3d9474af6566..19a806145371 100644
> >> --- a/include/uapi/drm/amdgpu_drm.h
> >> +++ b/include/uapi/drm/amdgpu_drm.h
> >> @@ -886,6 +886,7 @@ struct drm_amdgpu_cs_chunk_cp_gfx_shadow {
> >> #define AMDGPU_INFO_VIDEO_CAPS_DECODE   0
> >> /* Subquery id: Encode */
> >> #define AMDGPU_INFO_VIDEO_CAPS_ENCODE   1
> >> +#define AMDGPU_INFO_CP_GFX_SHADOW_SIZE 0x22
> >
> >
> > Can you put this into the device structure instead? Let's minimize the
> > number of kernel queries as much as possible.
>
> I guess, but one nice thing about this is that we can use the query as
> a way to determine if the kernel supports this functionality or not.
> If not, the query returns -ENOTSUP.
>

That should be another flag in the device info structure or the sizes
should be 0. There is never a reason to add a new single-value INFO query.

Marek
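
For reference, the query as proposed would follow the usual amdgpu INFO
ioctl convention, along these lines (a sketch; the result layout is
hypothetical, only the query id comes from the patch):

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include "amdgpu_drm.h"

/* Hypothetical result layout -- the patch does not define it here. */
struct gfx_shadow_sizes {
	__u32	shadow_size;
	__u32	shadow_alignment;
	__u32	csa_size;
	__u32	csa_alignment;
};

static int query_gfx_shadow_sizes(int fd, struct gfx_shadow_sizes *out)
{
	struct drm_amdgpu_info req;

	memset(&req, 0, sizeof(req));
	req.return_pointer = (uintptr_t)out;
	req.return_size = sizeof(*out);
	req.query = AMDGPU_INFO_CP_GFX_SHADOW_SIZE;

	/* An error return from an older kernel doubles as the feature
	 * probe Alex mentions above. */
	return ioctl(fd, DRM_IOCTL_AMDGPU_INFO, &req);
}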


Re: [PATCH v3 1/5] drm/amdgpu: add UAPI for workload hints to ctx ioctl

2023-03-22 Thread Marek Olšák
The option to change the hint after context creation and get the hint would
be unused uapi, and AFAIK we are not supposed to add unused uapi. What I
asked is to change it to a uapi that userspace will actually use.

Marek

On Tue, Mar 21, 2023 at 9:54 AM Christian König <ckoenig.leichtzumer...@gmail.com> wrote:

> Yes, I would like to avoid having multiple code paths for context creation.
>
> Setting it later on should be equivalent to specifying it on creation,
> since we only need it during CS.
>
> Regards,
> Christian.
>
> On 21.03.23 at 14:00, Sharma, Shashank wrote:
>
> [AMD Official Use Only - General]
>
>
>
> When we started this patch series, the workload hint was part of the
> ctx_flag only, but we changed that after the design review to make it
> more like how we are handling PSTATE.
>
>
>
> Details:
>
> https://patchwork.freedesktop.org/patch/496111/
>
>
>
> Regards
>
> Shashank
>
>
>
> *From:* Marek Olšák
> *Sent:* 21 March 2023 04:05
> *To:* Sharma, Shashank
> *Cc:* amd-gfx@lists.freedesktop.org; Deucher, Alexander; Somalapuram, Amaranath; Koenig, Christian
> *Subject:* Re: [PATCH v3 1/5] drm/amdgpu: add UAPI for workload hints to ctx ioctl
>
>
>
> I think we should do it differently because this interface will be mostly
> unused by open source userspace in its current form.
>
>
>
> Let's set the workload hint in drm_amdgpu_ctx_in::flags, and that will be
> immutable for the lifetime of the context. No other interface is needed.
>
>
>
> Marek
>
>
>
> On Mon, Sep 26, 2022 at 5:41 PM Shashank Sharma wrote:
>
> Allow the user to specify a workload hint to the kernel.
> We can use these to tweak the dpm heuristics to better match
> the workload for improved performance.
>
> V3: Create only set() workload UAPI (Christian)
>
> Signed-off-by: Alex Deucher 
> Signed-off-by: Shashank Sharma 
> ---
>  include/uapi/drm/amdgpu_drm.h | 17 +
>  1 file changed, 17 insertions(+)
>
> diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
> index c2c9c674a223..23d354242699 100644
> --- a/include/uapi/drm/amdgpu_drm.h
> +++ b/include/uapi/drm/amdgpu_drm.h
> @@ -212,6 +212,7 @@ union drm_amdgpu_bo_list {
>  #define AMDGPU_CTX_OP_QUERY_STATE2         4
>  #define AMDGPU_CTX_OP_GET_STABLE_PSTATE    5
>  #define AMDGPU_CTX_OP_SET_STABLE_PSTATE    6
> +#define AMDGPU_CTX_OP_SET_WORKLOAD_PROFILE 7
>
>  /* GPU reset status */
>  #define AMDGPU_CTX_NO_RESET                0
> @@ -252,6 +253,17 @@ union drm_amdgpu_bo_list {
>  #define AMDGPU_CTX_STABLE_PSTATE_MIN_MCLK  3
>  #define AMDGPU_CTX_STABLE_PSTATE_PEAK  4
>
> +/* GPU workload hints, flag bits 8-15 */
> +#define AMDGPU_CTX_WORKLOAD_HINT_SHIFT 8
> +#define AMDGPU_CTX_WORKLOAD_HINT_MASK      (0xff << AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
> +#define AMDGPU_CTX_WORKLOAD_HINT_NONE      (0 << AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
> +#define AMDGPU_CTX_WORKLOAD_HINT_3D        (1 << AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
> +#define AMDGPU_CTX_WORKLOAD_HINT_VIDEO     (2 << AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
> +#define AMDGPU_CTX_WORKLOAD_HINT_VR        (3 << AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
> +#define AMDGPU_CTX_WORKLOAD_HINT_COMPUTE   (4 << AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
> +#define AMDGPU_CTX_WORKLOAD_HINT_MAX       AMDGPU_CTX_WORKLOAD_HINT_COMPUTE
> +#define AMDGPU_CTX_WORKLOAD_INDEX(n)       (n >> AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
> +
>  struct drm_amdgpu_ctx_in {
> /** AMDGPU_CTX_OP_* */
> __u32   op;
> @@ -281,6 +293,11 @@ union drm_amdgpu_ctx_out {
> __u32   flags;
> __u32   _pad;
> } pstate;
> +
> +   struct {
> +   __u32   flags;
> +   __u32   _pad;
> +   } workload;
>  };
>
>  union drm_amdgpu_ctx {
> --
> 2.34.1
>
>
>


Re: [PATCH 1/2] drm/amdgpu: track MQD size for gfx and compute

2023-03-22 Thread Christian König

On 22.03.23 at 14:26, Alex Deucher wrote:

On Wed, Mar 22, 2023 at 4:48 AM Christian König wrote:

On 21.03.23 at 20:39, Alex Deucher wrote:

It varies by generation and we need to know the size
to expose this via debugfs.

I suspect we can't just use the BO size for this?

We could, but it may be larger than the actual MQD.  Maybe that's not
a big deal?


I don't really know either. Maybe just go ahead with this approach here, 
but I usually try to avoid stuff like that because it can be an 
additional source of errors when the allocation size is not correct.


Christian.



Alex



If yes the series is Reviewed-by: Christian König 


Signed-off-by: Alex Deucher 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c  | 2 ++
   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 1 +
   2 files changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
index c50d59855011..5435f41a3b7f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
@@ -404,6 +404,7 @@ int amdgpu_gfx_mqd_sw_init(struct amdgpu_device *adev,
   return r;
   }

+ ring->mqd_size = mqd_size;
   /* prepare MQD backup */
   adev->gfx.me.mqd_backup[i] = kmalloc(mqd_size, 
GFP_KERNEL);
   if (!adev->gfx.me.mqd_backup[i])
@@ -424,6 +425,7 @@ int amdgpu_gfx_mqd_sw_init(struct amdgpu_device *adev,
   return r;
   }

+ ring->mqd_size = mqd_size;
   /* prepare MQD backup */
   adev->gfx.mec.mqd_backup[i] = kmalloc(mqd_size, 
GFP_KERNEL);
   if (!adev->gfx.mec.mqd_backup[i])
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index 7942cb62e52c..deb9f7bead02 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -257,6 +257,7 @@ struct amdgpu_ring {
	struct amdgpu_bo	*mqd_obj;
	uint64_t		mqd_gpu_addr;
	void			*mqd_ptr;
+	unsigned		mqd_size;
	uint64_t		eop_gpu_addr;
	u32			doorbell_index;
	bool			use_doorbell;
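
To make the motivation concrete, the debugfs read needs the exact size,
roughly like this (a sketch only, not patch 2/2; assumes the usual
debugfs plumbing sets i_private to the ring):

static ssize_t amdgpu_debugfs_mqd_read(struct file *f, char __user *buf,
				       size_t size, loff_t *pos)
{
	struct amdgpu_ring *ring = file_inode(f)->i_private;

	if (!ring->mqd_ptr)
		return -EINVAL;

	/* Falling back to the BO size here could expose padding past
	 * the real MQD -- the concern raised above. */
	return simple_read_from_buffer(buf, size, pos, ring->mqd_ptr,
				       ring->mqd_size);
}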




Re: [PATCH 1/2] drm/amdgpu: track MQD size for gfx and compute

2023-03-22 Thread Alex Deucher
On Wed, Mar 22, 2023 at 4:48 AM Christian König wrote:
>
> On 21.03.23 at 20:39, Alex Deucher wrote:
> > It varies by generation and we need to know the size
> > to expose this via debugfs.
>
> I suspect we can't just use the BO size for this?

We could, but it may be larger than the actual MQD.  Maybe that's not
a big deal?

Alex


>
> If yes the series is Reviewed-by: Christian König 
>
> >
> > Signed-off-by: Alex Deucher 
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c  | 2 ++
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 1 +
> >   2 files changed, 3 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> > index c50d59855011..5435f41a3b7f 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> > @@ -404,6 +404,7 @@ int amdgpu_gfx_mqd_sw_init(struct amdgpu_device *adev,
> >   return r;
> >   }
> >
> > + ring->mqd_size = mqd_size;
> >   /* prepare MQD backup */
> >   adev->gfx.me.mqd_backup[i] = 
> > kmalloc(mqd_size, GFP_KERNEL);
> >   if (!adev->gfx.me.mqd_backup[i])
> > @@ -424,6 +425,7 @@ int amdgpu_gfx_mqd_sw_init(struct amdgpu_device *adev,
> >   return r;
> >   }
> >
> > + ring->mqd_size = mqd_size;
> >   /* prepare MQD backup */
> >   adev->gfx.mec.mqd_backup[i] = kmalloc(mqd_size, 
> > GFP_KERNEL);
> >   if (!adev->gfx.mec.mqd_backup[i])
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> > index 7942cb62e52c..deb9f7bead02 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> > @@ -257,6 +257,7 @@ struct amdgpu_ring {
> >	struct amdgpu_bo	*mqd_obj;
> >	uint64_t		mqd_gpu_addr;
> >	void			*mqd_ptr;
> > +	unsigned		mqd_size;
> >	uint64_t		eop_gpu_addr;
> >	u32			doorbell_index;
> >	bool			use_doorbell;
>


[bug report] drm/amd/display: move eDP panel control logic to link_edp_panel_control

2023-03-22 Thread Dan Carpenter
The recent function renames made these warnings show up as new again:

drivers/gpu/drm/amd/amdgpu/../display/dc/link/protocols/link_edp_panel_control.c:358
edp_receiver_ready_T9() warn: potential negative cast to bool 'result'

drivers/gpu/drm/amd/amdgpu/../display/dc/link/protocols/link_edp_panel_control.c:393
edp_receiver_ready_T7() warn: potential negative cast to bool 'result'

drivers/gpu/drm/amd/amdgpu/../display/dc/link/protocols/link_edp_panel_control.c
336 bool edp_receiver_ready_T9(struct dc_link *link)
337 {
338 unsigned int tries = 0;
339 unsigned char sinkstatus = 0;
340 unsigned char edpRev = 0;
341 enum dc_status result = DC_OK;
342 
343 result = core_link_read_dpcd(link, DP_EDP_DPCD_REV, &edpRev, sizeof(edpRev));
344 
345 /* start from eDP version 1.2, SINK_STAUS indicate the sink is ready.*/
346 if (result == DC_OK && edpRev >= DP_EDP_12) {
347 do {
348 sinkstatus = 1;
349 result = core_link_read_dpcd(link, DP_SINK_STATUS, &sinkstatus, sizeof(sinkstatus));
350 if (sinkstatus == 0)
351 break;
352 if (result != DC_OK)
353 break;
354 udelay(100); //MAx T9
355 } while (++tries < 50);
356 }
357 
--> 358 return result;
               ^^^^^^
result is a non-zero enum so this always returns true, which is fine
because the caller doesn't check.

359 }
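
One way to silence the warning without changing behaviour (the caller
ignores the value anyway) would be to make the conversion explicit,
e.g. this untested sketch:

-	return result;
+	return result == DC_OK;

so that the bool return actually means success.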

regards,
dan carpenter


[Resend PATCH v1 2/3] drm/amd/pm: send the SMT-enable message to pmfw

2023-03-22 Thread Wenyou Yang
When the CPU SMT status changes on the fly, send the SMT-enable
message to the pmfw to notify it that the SMT status has changed.

Signed-off-by: Wenyou Yang 
---
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 41 +++
 drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h |  5 +++
 2 files changed, 46 insertions(+)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
index b5d64749990e..5cd85a9d149d 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
@@ -22,6 +22,7 @@
 
 #define SWSMU_CODE_LAYER_L1
 
+#include <linux/cpu.h>
 #include <linux/firmware.h>
 #include <linux/pci.h>
 
@@ -69,6 +70,14 @@ static int smu_set_fan_speed_rpm(void *handle, uint32_t speed);
 static int smu_set_gfx_cgpg(struct smu_context *smu, bool enabled);
 static int smu_set_mp1_state(void *handle, enum pp_mp1_state mp1_state);
 
+static int smt_notifier_callback(struct notifier_block *nb, unsigned long action, void *data);
+
+extern struct raw_notifier_head smt_notifier_head;
+
+static struct notifier_block smt_notifier = {
+   .notifier_call = smt_notifier_callback,
+};
+
 static int smu_sys_get_pp_feature_mask(void *handle,
   char *buf)
 {
@@ -625,6 +634,8 @@ static int smu_set_funcs(struct amdgpu_device *adev)
return 0;
 }
 
+static struct smu_context *current_smu;
+
 static int smu_early_init(void *handle)
 {
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
@@ -645,6 +656,7 @@ static int smu_early_init(void *handle)
mutex_init(>message_lock);
 
adev->powerplay.pp_handle = smu;
+   current_smu = smu;
	adev->powerplay.pp_funcs = &swsmu_pm_funcs;
 
r = smu_set_funcs(adev);
@@ -1105,6 +1117,8 @@ static int smu_sw_init(void *handle)
if (!smu->ppt_funcs->get_fan_control_mode)
smu->adev->pm.no_fan = true;
 
+   raw_notifier_chain_register(&smt_notifier_head, &smt_notifier);
+
return 0;
 }
 
@@ -1122,6 +1136,8 @@ static int smu_sw_fini(void *handle)
 
smu_fini_microcode(smu);
 
+   raw_notifier_chain_unregister(&smt_notifier_head, &smt_notifier);
+
return 0;
 }
 
@@ -3241,3 +3257,28 @@ int smu_send_hbm_bad_channel_flag(struct smu_context *smu, uint32_t size)
 
return ret;
 }
+
+static int smu_set_cpu_smt_enable(struct smu_context *smu, bool enable)
+{
+   int ret = -EINVAL;
+
+   if (smu->ppt_funcs && smu->ppt_funcs->set_cpu_smt_enable)
+   ret = smu->ppt_funcs->set_cpu_smt_enable(smu, enable);
+
+   return ret;
+}
+
+static int smt_notifier_callback(struct notifier_block *nb,
+unsigned long action, void *data)
+{
+   struct smu_context *smu = current_smu;
+   int ret = NOTIFY_OK;
+
+   ret = (action == SMT_ENABLED) ?
+   smu_set_cpu_smt_enable(smu, true) :
+   smu_set_cpu_smt_enable(smu, false);
+   if (ret)
+   ret = NOTIFY_BAD;
+
+   return ret;
+}
diff --git a/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h b/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
index 09469c750a96..7c6594bba796 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
+++ b/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
@@ -1354,6 +1354,11 @@ struct pptable_funcs {
 * @init_pptable_microcode: Prepare the pptable microcode to upload via PSP
 */
int (*init_pptable_microcode)(struct smu_context *smu);
+
+   /**
+* @set_cpu_smt_enable: Set the CPU SMT status
+*/
+   int (*set_cpu_smt_enable)(struct smu_context *smu, bool enable);
 };
 
 typedef enum {
-- 
2.39.2



[Resend PATCH v1 3/3] drm/amd/pm: vangogh: support to send SMT enable message

2023-03-22 Thread Wenyou Yang
Add support for the PPSMC_MSG_SetCClkSMTEnable (0x58) message to the
pmfw for vangogh.

Signed-off-by: Wenyou Yang 
---
 .../pm/swsmu/inc/pmfw_if/smu_v11_5_ppsmc.h|  3 ++-
 drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h  |  3 ++-
 .../gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c  | 19 +++
 3 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu_v11_5_ppsmc.h b/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu_v11_5_ppsmc.h
index 7471e2df2828..2b182dbc6f9c 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu_v11_5_ppsmc.h
+++ b/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu_v11_5_ppsmc.h
@@ -111,7 +111,8 @@
 #define PPSMC_MSG_GetGfxOffStatus 0x50
 #define PPSMC_MSG_GetGfxOffEntryCount 0x51
 #define PPSMC_MSG_LogGfxOffResidency  0x52
-#define PPSMC_Message_Count           0x53
+#define PPSMC_MSG_SetCClkSMTEnable    0x58
+#define PPSMC_Message_Count           0x54
 
 //Argument for PPSMC_MSG_GfxDeviceDriverReset
 enum {
diff --git a/drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h b/drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h
index 297b70b9388f..820812d910bf 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h
+++ b/drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h
@@ -245,7 +245,8 @@
__SMU_DUMMY_MAP(AllowGpo),  \
__SMU_DUMMY_MAP(Mode2Reset),\
__SMU_DUMMY_MAP(RequestI2cTransaction), \
-   __SMU_DUMMY_MAP(GetMetricsTable),
+   __SMU_DUMMY_MAP(GetMetricsTable), \
+   __SMU_DUMMY_MAP(SetCClkSMTEnable),
 
 #undef __SMU_DUMMY_MAP
 #define __SMU_DUMMY_MAP(type)  SMU_MSG_##type
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c
index 7433dcaa16e0..f0eeb42df96b 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c
@@ -141,6 +141,7 @@ static struct cmn2asic_msg_mapping 
vangogh_message_map[SMU_MSG_MAX_COUNT] = {
MSG_MAP(GetGfxOffStatus,PPSMC_MSG_GetGfxOffStatus,  
0),
MSG_MAP(GetGfxOffEntryCount,
PPSMC_MSG_GetGfxOffEntryCount,  0),
MSG_MAP(LogGfxOffResidency, 
PPSMC_MSG_LogGfxOffResidency,   0),
+   MSG_MAP(SetCClkSMTEnable,   PPSMC_MSG_SetCClkSMTEnable, 
0),
 };
 
 static struct cmn2asic_mapping vangogh_feature_mask_map[SMU_FEATURE_COUNT] = {
@@ -2428,6 +2429,23 @@ static u32 vangogh_get_gfxoff_entrycount(struct smu_context *smu, uint64_t *entr
return ret;
 }
 
+static int vangogh_set_cpu_smt_enable(struct smu_context *smu, bool enable)
+{
+   int ret = 0;
+
+   if (enable) {
+   ret = smu_cmn_send_smc_msg_with_param(smu,
+ SMU_MSG_SetCClkSMTEnable,
+ 1, NULL);
+   } else {
+   ret = smu_cmn_send_smc_msg_with_param(smu,
+ SMU_MSG_SetCClkSMTEnable,
+ 0, NULL);
+   }
+
+   return ret;
+}
+
 static const struct pptable_funcs vangogh_ppt_funcs = {
 
.check_fw_status = smu_v11_0_check_fw_status,
@@ -2474,6 +2492,7 @@ static const struct pptable_funcs vangogh_ppt_funcs = {
.get_power_limit = vangogh_get_power_limit,
.set_power_limit = vangogh_set_power_limit,
.get_vbios_bootup_values = smu_v11_0_get_vbios_bootup_values,
+   .set_cpu_smt_enable = vangogh_set_cpu_smt_enable,
 };
 
 void vangogh_set_ppt_funcs(struct smu_context *smu)
-- 
2.39.2
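
A side note on the helper above: since the two branches differ only in
the message argument, it could be collapsed to a single call (equivalent
sketch, untested):

static int vangogh_set_cpu_smt_enable(struct smu_context *smu, bool enable)
{
	/* The message argument mirrors the bool directly. */
	return smu_cmn_send_smc_msg_with_param(smu,
					       SMU_MSG_SetCClkSMTEnable,
					       enable ? 1 : 0, NULL);
}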



[Resend PATCH v1 1/3] cpu/smt: add a notifier to notify the SMT changes

2023-03-22 Thread Wenyou Yang
Add a notifier chain to notify listeners of CPU SMT status changes.

Signed-off-by: Wenyou Yang 
---
 include/linux/cpu.h |  5 +
 kernel/cpu.c| 11 ++-
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index 314802f98b9d..9a842317fe2d 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -213,6 +213,11 @@ enum cpuhp_smt_control {
CPU_SMT_NOT_IMPLEMENTED,
 };
 
+enum cpuhp_smt_status {
+   SMT_ENABLED,
+   SMT_DISABLED,
+};
+
 #if defined(CONFIG_SMP) && defined(CONFIG_HOTPLUG_SMT)
 extern enum cpuhp_smt_control cpu_smt_control;
 extern void cpu_smt_disable(bool force);
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 6c0a92ca6bb5..accae0fa9868 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -89,6 +89,9 @@ static DEFINE_PER_CPU(struct cpuhp_cpu_state, cpuhp_state) = {
 cpumask_t cpus_booted_once_mask;
 #endif
 
+RAW_NOTIFIER_HEAD(smt_notifier_head);
+EXPORT_SYMBOL(smt_notifier_head);
+
 #if defined(CONFIG_LOCKDEP) && defined(CONFIG_SMP)
 static struct lockdep_map cpuhp_state_up_map =
	STATIC_LOCKDEP_MAP_INIT("cpuhp_state-up", &cpuhp_state_up_map);
@@ -2281,8 +2284,10 @@ int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval)
 */
cpuhp_offline_cpu_device(cpu);
}
-   if (!ret)
+   if (!ret) {
cpu_smt_control = ctrlval;
+   raw_notifier_call_chain(&smt_notifier_head, SMT_DISABLED, NULL);
+   }
cpu_maps_update_done();
return ret;
 }
@@ -2303,7 +2308,11 @@ int cpuhp_smt_enable(void)
/* See comment in cpuhp_smt_disable() */
cpuhp_online_cpu_device(cpu);
}
+   if (!ret)
+   raw_notifier_call_chain(&smt_notifier_head, SMT_ENABLED, NULL);
+
cpu_maps_update_done();
+
return ret;
 }
 #endif
-- 
2.39.2



[Resend PATCH v1 0/3] send message to pmfw when SMT changes

2023-03-22 Thread Wenyou Yang
When the CPU SMT status changes on the fly, send a message to the
pmfw to notify it that the SMT status has changed.

Wenyou Yang (3):
  cpu/smt: add a notifier to notify the SMT changes
  drm/amd/pm: send the SMT-enable message to pmfw
  drm/amd/pm: vangogh: support to send SMT enable message

 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 41 +++
 drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h |  5 +++
 .../pm/swsmu/inc/pmfw_if/smu_v11_5_ppsmc.h|  3 +-
 drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h  |  3 +-
 .../gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c  | 19 +
 include/linux/cpu.h   |  5 +++
 kernel/cpu.c  | 11 -
 7 files changed, 84 insertions(+), 3 deletions(-)

-- 
2.39.2



Re: [PATCH 07/11] drm/amdgpu: add UAPI to query GFX shadow sizes

2023-03-22 Thread Christian König




On 21.03.23 at 20:53, Alex Deucher wrote:

On Mon, Mar 20, 2023 at 8:31 PM Marek Olšák  wrote:

On Mon, Mar 20, 2023 at 1:38 PM Alex Deucher  wrote:

Add UAPI to query the GFX shadow buffer requirements
for preemption on GFX11.  UMDs need to specify the shadow
areas for preemption.

Signed-off-by: Alex Deucher 
---
  include/uapi/drm/amdgpu_drm.h | 10 ++
  1 file changed, 10 insertions(+)

diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index 3d9474af6566..19a806145371 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -886,6 +886,7 @@ struct drm_amdgpu_cs_chunk_cp_gfx_shadow {
 #define AMDGPU_INFO_VIDEO_CAPS_DECODE   0
 /* Subquery id: Encode */
 #define AMDGPU_INFO_VIDEO_CAPS_ENCODE   1
+#define AMDGPU_INFO_CP_GFX_SHADOW_SIZE 0x22


Can you put this into the device structure instead? Let's minimize the number 
of kernel queries as much as possible.

I guess, but one nice thing about this is that we can use the query as
a way to determine if the kernel supports this functionality or not.
If not, the query returns -ENOTSUP.


Well, if we put it at the end of the device info structure, the sizes
should be zero on older kernels/fw.
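
Concretely, that would mean something like this (field names are
hypothetical): new members appended at the tail of struct
drm_amdgpu_info_device, so userspace that zero-initializes its buffer
reads back zeros from older kernels, which copy out a shorter struct:

struct drm_amdgpu_info_device {
	/* ... existing members unchanged ... */

	/* appended at the end; reads back as 0 on older kernels/fw */
	__u64	shadow_size;		/* 0 means not supported */
	__u64	shadow_alignment;
	__u64	csa_size;
	__u64	csa_alignment;
};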


That will also work nicely.

Christian.



Alex



Thanks,
Marek





Re: [PATCH 1/2] drm/amdgpu: track MQD size for gfx and compute

2023-03-22 Thread Christian König

On 21.03.23 at 20:39, Alex Deucher wrote:

It varies by generation and we need to know the size
to expose this via debugfs.


I suspect we can't just use the BO size for this?

If yes the series is Reviewed-by: Christian König 



Signed-off-by: Alex Deucher 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c  | 2 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 1 +
  2 files changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
index c50d59855011..5435f41a3b7f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
@@ -404,6 +404,7 @@ int amdgpu_gfx_mqd_sw_init(struct amdgpu_device *adev,
return r;
}
  
+ring->mqd_size = mqd_size;

/* prepare MQD backup */
adev->gfx.me.mqd_backup[i] = kmalloc(mqd_size, 
GFP_KERNEL);
if (!adev->gfx.me.mqd_backup[i])
@@ -424,6 +425,7 @@ int amdgpu_gfx_mqd_sw_init(struct amdgpu_device *adev,
return r;
}
  
+			ring->mqd_size = mqd_size;

/* prepare MQD backup */
adev->gfx.mec.mqd_backup[i] = kmalloc(mqd_size, 
GFP_KERNEL);
if (!adev->gfx.mec.mqd_backup[i])
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index 7942cb62e52c..deb9f7bead02 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -257,6 +257,7 @@ struct amdgpu_ring {
	struct amdgpu_bo	*mqd_obj;
	uint64_t		mqd_gpu_addr;
	void			*mqd_ptr;
+	unsigned		mqd_size;
	uint64_t		eop_gpu_addr;
	u32			doorbell_index;
	bool			use_doorbell;