Re: [PATCH v2 2/4] drm: Add drm_get_acpi_edid() helper

2024-01-30 Thread kernel test robot
Hi Mario,

kernel test robot noticed the following build errors:

[auto build test ERROR on rafael-pm/linux-next]
[also build test ERROR on rafael-pm/acpi-bus linus/master v6.8-rc2 
next-20240131]
[cannot apply to drm-misc/drm-misc-next rafael-pm/devprop]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:
https://github.com/intel-lab-lkp/linux/commits/Mario-Limonciello/ACPI-video-Handle-fetching-EDID-that-is-longer-than-256-bytes/20240131-032909
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git 
linux-next
patch link:
https://lore.kernel.org/r/20240130192608.11666-3-mario.limonciello%40amd.com
patch subject: [PATCH v2 2/4] drm: Add drm_get_acpi_edid() helper
config: i386-buildonly-randconfig-002-20240131 
(https://download.01.org/0day-ci/archive/20240131/202401311541.bde2glwr-...@intel.com/config)
compiler: clang version 17.0.6 (https://github.com/llvm/llvm-project 
6009708b4367171ccdbf4b5905cb6a803753fe18)
reproduce (this is a W=1 build): 
(https://download.01.org/0day-ci/archive/20240131/202401311541.bde2glwr-...@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot 
| Closes: 
https://lore.kernel.org/oe-kbuild-all/202401311541.bde2glwr-...@intel.com/

All errors (new ones prefixed by >>):

>> drivers/platform/x86/dell/dell-wmi-ddv.c:647:12: error: call to undeclared function 'acpi_device_uid'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
 647 | uid_str = acpi_device_uid(acpi_dev);
 |   ^
>> drivers/platform/x86/dell/dell-wmi-ddv.c:647:10: error: incompatible integer to pointer conversion assigning to 'const char *' from 'int' [-Wint-conversion]
 647 | uid_str = acpi_device_uid(acpi_dev);
 | ^ ~
>> drivers/platform/x86/dell/dell-wmi-ddv.c:660:35: error: call to undeclared function 'to_acpi_device'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
 660 | ret = dell_wmi_ddv_battery_index(to_acpi_device(dev->parent), );
 |  ^
>> drivers/platform/x86/dell/dell-wmi-ddv.c:660:35: error: incompatible integer to pointer conversion passing 'int' to parameter of type 'struct acpi_device *' [-Wint-conversion]
 660 | ret = dell_wmi_ddv_battery_index(to_acpi_device(dev->parent), );
 |  ^~~
   drivers/platform/x86/dell/dell-wmi-ddv.c:643:59: note: passing argument to parameter 'acpi_dev' here
 643 | static int dell_wmi_ddv_battery_index(struct acpi_device *acpi_dev, u32 *index)
 |   ^
   drivers/platform/x86/dell/dell-wmi-ddv.c:679:35: error: call to undeclared function 'to_acpi_device'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
 679 | ret = dell_wmi_ddv_battery_index(to_acpi_device(dev->parent), );
 |  ^
   drivers/platform/x86/dell/dell-wmi-ddv.c:679:35: error: incompatible integer to pointer conversion passing 'int' to parameter of type 'struct acpi_device *' [-Wint-conversion]
 679 | ret = dell_wmi_ddv_battery_index(to_acpi_device(dev->parent), );
 |  ^~~
   drivers/platform/x86/dell/dell-wmi-ddv.c:643:59: note: passing argument to parameter 'acpi_dev' here
 643 | static int dell_wmi_ddv_battery_index(struct acpi_device *acpi_dev, u32 *index)
 |   ^
   drivers/platform/x86/dell/dell-wmi-ddv.c:705:35: error: call to undeclared function 'to_acpi_device'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
 705 | ret = dell_wmi_ddv_battery_index(to_acpi_device(battery->dev.parent), );
 |  ^
   drivers/platform/x86/dell/dell-wmi-ddv.c:705:35: error: incompatible integer to pointer conversion passing 'int' to parameter of type 'struct acpi_device *' [-Wint-conversion]
 705 | ret = dell_wmi_ddv_battery_index(to_acpi_device(battery->dev.parent), );
 |  ^~~
   drivers/platform/x86/dell/dell-wmi-ddv.c:643:59: note: passing argument to parameter 'acpi_dev' here
 643 | static int dell_wmi_ddv_battery_index(struct acpi_device *acpi_dev, u32 *index)
 |  

RE: [PATCH] drm/amdgpu: remove asymmetrical irq disabling in vcn 4.0.5 suspend

2024-01-30 Thread Jamadar, Saleemkhan
[AMD Official Use Only - General]

Acked-By: Saleemkhan Jamadar 

-Original Message-
From: Zhang, Yifan 
Sent: Tuesday, January 30, 2024 6:45 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Gopalakrishnan, 
Veerabadhran (Veera) ; Jamadar, Saleemkhan 
; Zhang, Yifan 
Subject: [PATCH] drm/amdgpu: remove asymmetrical irq disabling in vcn 4.0.5 
suspend

There is no irq enabled in vcn 4.0.5 resume, causing wrong amdgpu_irq_src status.
Besides, the current set function callbacks are empty and have no real effect.

Signed-off-by: Yifan Zhang 
---
 drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c   | 17 -
 drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c | 19 ---
 2 files changed, 36 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c 
b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
index 169ed400ee7b..8ab01ae919d2 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
@@ -2017,22 +2017,6 @@ static int vcn_v4_0_set_powergating_state(void *handle, 
enum amd_powergating_sta
return ret;
 }

-/**
- * vcn_v4_0_set_interrupt_state - set VCN block interrupt state
- *
- * @adev: amdgpu_device pointer
- * @source: interrupt sources
- * @type: interrupt types
- * @state: interrupt states
- *
- * Set VCN block interrupt state
- */
-static int vcn_v4_0_set_interrupt_state(struct amdgpu_device *adev, struct 
amdgpu_irq_src *source,
-  unsigned type, enum amdgpu_interrupt_state state)
-{
-   return 0;
-}
-
 /**
  * vcn_v4_0_set_ras_interrupt_state - set VCN block RAS interrupt state
  *
@@ -2097,7 +2081,6 @@ static int vcn_v4_0_process_interrupt(struct amdgpu_device *adev, struct amdgpu_
 }

 static const struct amdgpu_irq_src_funcs vcn_v4_0_irq_funcs = {
-   .set = vcn_v4_0_set_interrupt_state,
    .process = vcn_v4_0_process_interrupt,
 };

diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c 
b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c
index 2eda30e78f61..49e4c3c09aca 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c
@@ -269,8 +269,6 @@ static int vcn_v4_0_5_hw_fini(void *handle)
vcn_v4_0_5_set_powergating_state(adev, 
AMD_PG_STATE_GATE);
}
}
-
-   amdgpu_irq_put(adev, &adev->vcn.inst[i].irq, 0);
}

return 0;
@@ -1668,22 +1666,6 @@ static int vcn_v4_0_5_set_powergating_state(void 
*handle, enum amd_powergating_s
return ret;
 }

-/**
- * vcn_v4_0_5_set_interrupt_state - set VCN block interrupt state
- *
- * @adev: amdgpu_device pointer
- * @source: interrupt sources
- * @type: interrupt types
- * @state: interrupt states
- *
- * Set VCN block interrupt state
- */
-static int vcn_v4_0_5_set_interrupt_state(struct amdgpu_device *adev, struct 
amdgpu_irq_src *source,
-   unsigned type, enum amdgpu_interrupt_state state)
-{
-   return 0;
-}
-
 /**
  * vcn_v4_0_5_process_interrupt - process VCN block interrupt
  *
@@ -1726,7 +1708,6 @@ static int vcn_v4_0_5_process_interrupt(struct amdgpu_device *adev, struct amdgp
 }

 static const struct amdgpu_irq_src_funcs vcn_v4_0_5_irq_funcs = {
-   .set = vcn_v4_0_5_set_interrupt_state,
    .process = vcn_v4_0_5_process_interrupt,
 };

--
2.37.3
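
The imbalance described in the commit message above can be pictured with a small user-space sketch. It assumes the interrupt source is reference counted, so that every "put" needs an earlier "get". The names below (irq_src_sketch, irq_sketch_get, irq_sketch_put) are hypothetical and are not amdgpu API; they only illustrate why a put in the teardown path without a matching get in the resume path leaves the bookkeeping wrong.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted interrupt source; this is not
 * the real struct amdgpu_irq_src. */
struct irq_src_sketch {
	int enabled_refs;	/* number of outstanding "get" calls */
};

/* Take a reference; the source counts as enabled while refs > 0. */
static void irq_sketch_get(struct irq_src_sketch *src)
{
	src->enabled_refs++;
}

/* Drop a reference. Returns false when there is nothing to drop, which
 * is the situation an unpaired put in a teardown path runs into. */
static bool irq_sketch_put(struct irq_src_sketch *src)
{
	if (src->enabled_refs == 0)
		return false;
	src->enabled_refs--;
	return true;
}
```

With this model, a put on a freshly initialised source is rejected, while a get followed by a put returns the count to zero. Removing the unpaired put, as the patch does, is one way to restore the symmetry.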



RE: [PATCH] drm/amdgpu/pm: Use inline function for IP version check

2024-01-30 Thread Wang, Yang(Kevin)
Reviewed-by: Yang Wang 

Best Regards,
Kevin

-Original Message-
From: Ma, Jun 
Sent: Wednesday, January 31, 2024 1:59 PM
To: amd-gfx@lists.freedesktop.org
Cc: Feng, Kenneth ; Deucher, Alexander 
; Wang, Yang(Kevin) ; Ma, 
Jun 
Subject: [PATCH] drm/amdgpu/pm: Use inline function for IP version check

Use existing inline function for IP version check.

Signed-off-by: Ma Jun 
---
 drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c
index bc8bd67c48ac..9c72c36260ff 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c
@@ -2944,7 +2944,7 @@ static bool smu_v13_0_0_wbrf_support_check(struct smu_context *smu)
 {
struct amdgpu_device *adev = smu->adev;

-   switch (adev->ip_versions[MP1_HWIP][0]) {
+   switch (amdgpu_ip_version(adev, MP1_HWIP, 0)) {
case IP_VERSION(13, 0, 0):
return smu->smc_fw_version >= 0x004e6300;
case IP_VERSION(13, 0, 10):
--
2.34.1



[PATCH] drm/amdgpu/pm: Use inline function for IP version check

2024-01-30 Thread Ma Jun
Use existing inline function for IP version check.

Signed-off-by: Ma Jun 
---
 drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c
index bc8bd67c48ac..9c72c36260ff 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c
@@ -2944,7 +2944,7 @@ static bool smu_v13_0_0_wbrf_support_check(struct 
smu_context *smu)
 {
struct amdgpu_device *adev = smu->adev;
 
-   switch (adev->ip_versions[MP1_HWIP][0]) {
+   switch (amdgpu_ip_version(adev, MP1_HWIP, 0)) {
case IP_VERSION(13, 0, 0):
return smu->smc_fw_version >= 0x004e6300;
case IP_VERSION(13, 0, 10):
-- 
2.34.1
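
For readers unfamiliar with the pattern, here is a minimal user-space sketch of what "use an inline accessor instead of indexing the array directly" buys. Everything prefixed SKETCH_ or sketch_ is hypothetical. The bit packing shown (major, minor, revision in successive bytes) is an assumption for illustration; the real IP_VERSION() macro and amdgpu_ip_version() helper may carry extra fields or masking.

```c
#include <stdint.h>

/* Assumed packing: major in bits 23..16, minor in bits 15..8,
 * revision in bits 7..0. Illustration only. */
#define SKETCH_IP_VERSION(mj, mn, rv) \
	(((uint32_t)(mj) << 16) | ((uint32_t)(mn) << 8) | (uint32_t)(rv))

#define SKETCH_HWIP_MAX		4
#define SKETCH_HWIP_INSTANCES	2

struct sketch_device {
	uint32_t ip_versions[SKETCH_HWIP_MAX][SKETCH_HWIP_INSTANCES];
};

/* Accessor that hides the array layout from callers, so the storage
 * format can change without touching every switch statement. */
static inline uint32_t sketch_ip_version(const struct sketch_device *dev,
					 unsigned int ip, unsigned int inst)
{
	return dev->ip_versions[ip][inst];
}
```

A caller then writes switch (sketch_ip_version(dev, ip, 0)) with case SKETCH_IP_VERSION(13, 0, 0): and so on, which is the shape of the one-line change in the patch.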



[PATCH] drm/amd/display: Add NULL test for 'timing generator' in 'dcn21_set_pipe()'

2024-01-30 Thread Srinivasan Shanmugam
In "u32 otg_inst = pipe_ctx->stream_res.tg->inst;",
pipe_ctx->stream_res.tg could be NULL; the code relies on the caller to
ensure that tg is not NULL.

Fixes: 474ac4a875ca ("drm/amd/display: Implement some asic specific abm call backs.")
Cc: Yongqiang Sun 
Cc: Anthony Koo 
Cc: Rodrigo Siqueira 
Cc: Aurabindo Pillai 
Signed-off-by: Srinivasan Shanmugam 
---
 .../amd/display/dc/hwss/dcn21/dcn21_hwseq.c   | 24 +++
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn21/dcn21_hwseq.c 
b/drivers/gpu/drm/amd/display/dc/hwss/dcn21/dcn21_hwseq.c
index 5d2d8fd64d98..4e21af0942ea 100644
--- a/drivers/gpu/drm/amd/display/dc/hwss/dcn21/dcn21_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn21/dcn21_hwseq.c
@@ -206,28 +206,32 @@ void dcn21_set_abm_immediate_disable(struct pipe_ctx 
*pipe_ctx)
 void dcn21_set_pipe(struct pipe_ctx *pipe_ctx)
 {
struct abm *abm = pipe_ctx->stream_res.abm;
-   uint32_t otg_inst = pipe_ctx->stream_res.tg->inst;
+   struct timing_generator *tg = pipe_ctx->stream_res.tg;
struct panel_cntl *panel_cntl = pipe_ctx->stream->link->panel_cntl;
struct dmcu *dmcu = pipe_ctx->stream->ctx->dc->res_pool->dmcu;
+   u32 otg_inst;
+
+   if (!abm && !tg && !panel_cntl)
+   return;
+
+   otg_inst = tg->inst;
 
if (dmcu) {
dce110_set_pipe(pipe_ctx);
return;
}
 
-   if (abm && panel_cntl) {
-   if (abm->funcs && abm->funcs->set_pipe_ex) {
-   abm->funcs->set_pipe_ex(abm,
+   if (abm->funcs && abm->funcs->set_pipe_ex) {
+   abm->funcs->set_pipe_ex(abm,
otg_inst,
SET_ABM_PIPE_NORMAL,
panel_cntl->inst,
panel_cntl->pwrseq_inst);
-   } else {
-   dmub_abm_set_pipe(abm, otg_inst,
-   SET_ABM_PIPE_NORMAL,
-   panel_cntl->inst,
-   panel_cntl->pwrseq_inst);
-   }
+   } else {
+   dmub_abm_set_pipe(abm, otg_inst,
+ SET_ABM_PIPE_NORMAL,
+ panel_cntl->inst,
+ panel_cntl->pwrseq_inst);
}
 }
 
-- 
2.34.1
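
A note on the guard condition in the patch above, as a point a reviewer may want to check. This is an observation about C semantics only, not a statement about what the author intended. A condition of the form (!a && !b && !c) is true only when all three pointers are NULL at once, whereas (!a || !b || !c) is true as soon as any one of them is NULL. The helper names below are hypothetical and exist only to make the difference testable.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true when every pointer is usable, i.e. none is NULL.
 * Written with ||: a single NULL is enough to reject. */
static bool all_present(const void *abm, const void *tg,
			const void *panel_cntl)
{
	if (!abm || !tg || !panel_cntl)
		return false;
	return true;
}

/* Same shape with &&: rejects only when all three are NULL together,
 * so a lone NULL pointer passes the check. */
static bool not_all_missing(const void *abm, const void *tg,
			    const void *panel_cntl)
{
	if (!abm && !tg && !panel_cntl)
		return false;
	return true;
}
```

If the goal is to make a later dereference of each pointer safe, the || form is the one that guarantees it.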



[PATCH v2] drm/amd/display: Fix 'panel_cntl' could be null in 'dcn21_set_backlight_level()'

2024-01-30 Thread Srinivasan Shanmugam
The 'panel_cntl' structure used to control the display panel could be
NULL; dereferencing it could lead to a NULL pointer access.

Fixes the below:
drivers/gpu/drm/amd/amdgpu/../display/dc/hwss/dcn21/dcn21_hwseq.c:269 dcn21_set_backlight_level() error: we previously assumed 'panel_cntl' could be null (see line 250)

Fixes: 474ac4a875ca ("drm/amd/display: Implement some asic specific abm call backs.")
Cc: Yongqiang Sun 
Cc: Anthony Koo 
Cc: Rodrigo Siqueira 
Cc: Aurabindo Pillai 
Signed-off-by: Srinivasan Shanmugam 
---
v2:
 - Add NULL check for timing generator also which controls CRTC (Anthony)

 .../amd/display/dc/hwss/dcn21/dcn21_hwseq.c   | 39 ++-
 1 file changed, 20 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn21/dcn21_hwseq.c 
b/drivers/gpu/drm/amd/display/dc/hwss/dcn21/dcn21_hwseq.c
index 8e88dcaf88f5..5d2d8fd64d98 100644
--- a/drivers/gpu/drm/amd/display/dc/hwss/dcn21/dcn21_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn21/dcn21_hwseq.c
@@ -237,34 +237,35 @@ bool dcn21_set_backlight_level(struct pipe_ctx *pipe_ctx,
 {
struct dc_context *dc = pipe_ctx->stream->ctx;
struct abm *abm = pipe_ctx->stream_res.abm;
+   struct timing_generator *tg = pipe_ctx->stream_res.tg;
struct panel_cntl *panel_cntl = pipe_ctx->stream->link->panel_cntl;
+   u32 otg_inst;
+
+   if (!abm && !tg && !panel_cntl)
+   return false;
+
+   otg_inst = tg->inst;
 
if (dc->dc->res_pool->dmcu) {
dce110_set_backlight_level(pipe_ctx, backlight_pwm_u16_16, 
frame_ramp);
return true;
}
 
-   if (abm != NULL) {
-   uint32_t otg_inst = pipe_ctx->stream_res.tg->inst;
-
-   if (abm && panel_cntl) {
-   if (abm->funcs && abm->funcs->set_pipe_ex) {
-   abm->funcs->set_pipe_ex(abm,
-   otg_inst,
-   SET_ABM_PIPE_NORMAL,
-   panel_cntl->inst,
-   panel_cntl->pwrseq_inst);
-   } else {
-   dmub_abm_set_pipe(abm,
-   otg_inst,
-   SET_ABM_PIPE_NORMAL,
-   panel_cntl->inst,
-   panel_cntl->pwrseq_inst);
-   }
-   }
+   if (abm->funcs && abm->funcs->set_pipe_ex) {
+   abm->funcs->set_pipe_ex(abm,
+   otg_inst,
+   SET_ABM_PIPE_NORMAL,
+   panel_cntl->inst,
+   panel_cntl->pwrseq_inst);
+   } else {
+   dmub_abm_set_pipe(abm,
+ otg_inst,
+ SET_ABM_PIPE_NORMAL,
+ panel_cntl->inst,
+ panel_cntl->pwrseq_inst);
}
 
-   if (abm && abm->funcs && abm->funcs->set_backlight_level_pwm)
+   if (abm->funcs && abm->funcs->set_backlight_level_pwm)
abm->funcs->set_backlight_level_pwm(abm, backlight_pwm_u16_16,
frame_ramp, 0, panel_cntl->inst);
else
-- 
2.34.1



RE: [PATCH 2/2] use PSP address query command

2024-01-30 Thread Zhang, Hawking
Series is

Reviewed-by: Hawking Zhang 

Regards,
Hawking
-Original Message-
From: amd-gfx  On Behalf Of Tao Zhou
Sent: Tuesday, January 30, 2024 19:09
To: amd-gfx@lists.freedesktop.org
Cc: Zhou1, Tao 
Subject: [PATCH 2/2] use PSP address query command

Get UMC physical address from PSP in RAS error address conversion.

Signed-off-by: Tao Zhou 
---
 drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 46 ++
 1 file changed, 39 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c 
b/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c
index 836a4cc1134e..14ef7a24be7b 100644
--- a/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c
@@ -203,14 +203,14 @@ static bool umc_v12_0_bit_wise_xor(uint32_t val)
return result;
 }

-static void umc_v12_0_convert_error_address(struct amdgpu_device *adev,
-   struct ras_err_data *err_data, 
uint64_t err_addr,
-   uint32_t ch_inst, uint32_t umc_inst,
-   uint32_t node_inst)
+static void umc_v12_0_mca_addr_to_pa(struct amdgpu_device *adev,
+   uint64_t err_addr, uint32_t ch_inst, 
uint32_t umc_inst,
+   uint32_t node_inst,
+   struct ta_ras_query_address_output 
*addr_out)
 {
uint32_t channel_index, i;
-   uint64_t soc_pa, na, retired_page, column;
-   uint32_t bank_hash0, bank_hash1, bank_hash2, bank_hash3, col, row, 
row_xor;
+   uint64_t na, soc_pa;
+   uint32_t bank_hash0, bank_hash1, bank_hash2, bank_hash3, col, row;
uint32_t bank0, bank1, bank2, bank3, bank;

    bank_hash0 = (err_addr >> UMC_V12_0_MCA_B0_BIT) & 0x1ULL;
@@ -260,12 +260,44 @@ static void umc_v12_0_convert_error_address(struct amdgpu_device *adev,
/* the umc channel bits are not original values, they are hashed */
UMC_V12_0_SET_CHANNEL_HASH(channel_index, soc_pa);

+   addr_out->pa.pa = soc_pa;
+   addr_out->pa.bank = bank;
+   addr_out->pa.channel_idx = channel_index;
+}
+
+static void umc_v12_0_convert_error_address(struct amdgpu_device *adev,
+   struct ras_err_data *err_data, 
uint64_t err_addr,
+   uint32_t ch_inst, uint32_t umc_inst,
+   uint32_t node_inst)
+{
+   uint32_t col, row, row_xor, bank, channel_index;
+   uint64_t soc_pa, retired_page, column;
+   struct ta_ras_query_address_input addr_in;
+   struct ta_ras_query_address_output addr_out;
+
+   addr_in.addr_type = TA_RAS_MCA_TO_PA;
+   addr_in.ma.err_addr = err_addr;
+   addr_in.ma.ch_inst = ch_inst;
+   addr_in.ma.umc_inst = umc_inst;
+   addr_in.ma.node_inst = node_inst;
+
+   if (psp_ras_query_address(&adev->psp, &addr_in, &addr_out))
+       /* fallback to old path if fail to get pa from psp */
+       umc_v12_0_mca_addr_to_pa(adev, err_addr, ch_inst, umc_inst,
+                   node_inst, &addr_out);
+
+   soc_pa = addr_out.pa.pa;
+   bank = addr_out.pa.bank;
+   channel_index = addr_out.pa.channel_idx;
+
+   col = (err_addr >> 1) & 0x1fULL;
+   row = (err_addr >> 10) & 0x3fffULL;
+   row_xor = row ^ (0x1ULL << 13);
/* clear [C3 C2] in soc physical address */
soc_pa &= ~(0x3ULL << UMC_V12_0_PA_C2_BIT);
/* clear [C4] in soc physical address */
soc_pa &= ~(0x1ULL << UMC_V12_0_PA_C4_BIT);

-   row_xor = row ^ (0x1ULL << 13);
/* loop for all possibilities of [C4 C3 C2] */
for (column = 0; column < UMC_V12_0_NA_MAP_PA_NUM; column++) {
retired_page = soc_pa | ((column & 0x3) << UMC_V12_0_PA_C2_BIT);
--
2.34.1
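
The control flow of the patch above, "ask the firmware first, fall back to the driver computation if that fails", plus the column/row extraction it keeps, can be sketched in user space as follows. All sketch_ names are hypothetical. The local translation is a placeholder that simply copies the address, since the real UMC translation is hardware specific; the two query functions stand in for a firmware call that succeeds or fails. Only the shifts and masks for col, row and row_xor are taken from the patch.

```c
#include <stdint.h>

struct sketch_pa_out {
	uint64_t pa;
	uint32_t bank;
	uint32_t channel_idx;
};

/* Hypothetical query hook: returns 0 on success, nonzero on failure. */
typedef int (*sketch_query_fn)(uint64_t err_addr, struct sketch_pa_out *out);

/* Placeholder for the driver-side computation (not the real algorithm). */
static void sketch_local_translate(uint64_t err_addr,
				   struct sketch_pa_out *out)
{
	out->pa = err_addr;
	out->bank = 0;
	out->channel_idx = 0;
}

/* Prefer the firmware answer; fall back to the local path on error. */
static void sketch_convert(uint64_t err_addr, sketch_query_fn query,
			   struct sketch_pa_out *out)
{
	if (!query || query(err_addr, out))
		sketch_local_translate(err_addr, out);
}

/* Example hooks: one that always fails, one that always succeeds. */
static int sketch_query_fail(uint64_t a, struct sketch_pa_out *o)
{
	(void)a;
	(void)o;
	return -1;
}

static int sketch_query_ok(uint64_t a, struct sketch_pa_out *o)
{
	o->pa = a + 0x1000;
	o->bank = 3;
	o->channel_idx = 7;
	return 0;
}

/* Field extraction as written in the patch: column from bits 5..1,
 * row from bits 23..10, and the row with bit 13 flipped. */
static uint32_t sketch_col(uint64_t err_addr)
{
	return (uint32_t)((err_addr >> 1) & 0x1fULL);
}

static uint32_t sketch_row(uint64_t err_addr)
{
	return (uint32_t)((err_addr >> 10) & 0x3fffULL);
}

static uint32_t sketch_row_xor(uint64_t err_addr)
{
	return sketch_row(err_addr) ^ (0x1U << 13);
}
```

The fallback overwrites every output field, so a query that fails after partially filling the output cannot leak stale values into the result.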



Re: [PATCH v3 1/2] drm/buddy: Implement tracking clear page feature

2024-01-30 Thread Arunpravin Paneer Selvam

Hi Matthew,

On 12/21/2023 12:51 AM, Matthew Auld wrote:

Hi,

On 14/12/2023 13:42, Arunpravin Paneer Selvam wrote:

- Add tracking clear page feature.

- Driver should enable the DRM_BUDDY_CLEARED flag if it
   successfully clears the blocks in the free path. On the other hand,
   DRM buddy marks each block as cleared.

- Track the available cleared pages size

- If driver requests cleared memory we prefer cleared memory
   but fall back to uncleared if we can't find the cleared blocks.
   When driver requests uncleared memory we try to use uncleared but
   fall back to cleared memory if necessary.

- When a block gets freed we clear it and mark the freed block as cleared;
   when there are buddies which are cleared as well we can merge them.
   Otherwise, we prefer to keep the blocks separated.


I was not involved, but it looks like we have also tried enabling the 
clear-on-free idea for VRAM in i915 and then also tracking that in the 
allocator, however that work unfortunately is not upstream. The code 
is open source though: 
https://github.com/intel-gpu/intel-gpu-i915-backports/blob/backport/main/drivers/gpu/drm/i915/i915_buddy.c#L300


It looks like some of the design differences there are having two 
separate free lists, so mm->clean and mm->dirty (sounds reasonable to 
me). And also the inclusion of a de-fragmentation routine, since buddy 
blocks are now not always merged back, we might choose to run the 
defrag in some cases, which also sounds reasonable. IIRC in amdgpu 
userspace can control the page-size for an allocation, so perhaps you 
would want to run it first if the allocation fails, before trying to 
evict stuff?

I checked the clear-on-free idea implemented in i915. In the amdgpu version, 
we clear all the blocks in the amdgpu free routine, and DRM buddy 
expects only the DRM_BUDDY_CLEARED flag. Basically, we keep the 
cleared blocks ready to be allocated when the user requests 
cleared memory. We observed that this improves performance in games 
and resolves the stutter issues as well. I see that the i915 active fences part 
does the same job for i915. Could we move this part into the i915 free 
routine and set the DRM_BUDDY_CLEARED flag?


On de-fragmentation, I have included a function which can be called at 
places where we get -ENOSPC. This routine merges the clear and 
dirty blocks back together to form a larger block of the requested size. I am 
wondering where we could use this routine, since for non-contiguous 
memory we have the fallback method and for contiguous memory we have 
the try-harder method, which searches through the tree.


I agree we can have 2 lists (clear list and dirty list) and this would 
reduce the search iterations. But we would need to handle the 2-list design 
in all the functions, which might require more time for testing on all 
platforms. Could we just go ahead with 1 list (free list) for now? I 
will take up this work as my next task.


Thanks,
Arun.
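
To make the merge rule discussed in this thread concrete, here is a minimal user-space sketch. It assumes a simplified block descriptor (sketch_block, hypothetical, not struct drm_buddy_block). The first helper mirrors the rule that two free buddies are merged only when their clear state matches. The second models the de-fragmentation case, where clear and dirty buddies are merged anyway; treating the merged block as dirty unless both halves were clear is an assumption made here for illustration.

```c
#include <stdbool.h>

/* Hypothetical, simplified block descriptor. */
struct sketch_block {
	bool free;
	bool clear;
};

/* Normal free path: merge only free buddies whose clear state matches,
 * so a merged block is never half cleared. */
static bool sketch_can_merge(const struct sketch_block *block,
			     const struct sketch_block *buddy)
{
	if (!block->free || !buddy->free)
		return false;
	if (block->clear != buddy->clear)
		return false;
	return true;
}

/* De-fragmentation path (e.g. after -ENOSPC): merge regardless of clear
 * state. Assumption: the result is clear only if both halves were. */
static bool sketch_force_merge(const struct sketch_block *block,
			       const struct sketch_block *buddy,
			       struct sketch_block *merged)
{
	if (!block->free || !buddy->free)
		return false;
	merged->free = true;
	merged->clear = block->clear && buddy->clear;
	return true;
}
```

The trade-off is the one raised above: keeping mismatched buddies apart preserves cleared memory for later requests, at the cost of fragmentation that a forced merge can undo when a large allocation would otherwise fail.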




v1: (Christian)
   - Depends on the flag check DRM_BUDDY_CLEARED, enable the block as
 cleared. Else, reset the clear flag for each block in the list.

   - For merging the 2 cleared blocks compare as below,
 drm_buddy_is_clear(block) != drm_buddy_is_clear(buddy)

Signed-off-by: Arunpravin Paneer Selvam 


Suggested-by: Christian König 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c  |   6 +-
  drivers/gpu/drm/drm_buddy.c   | 169 +++---
  drivers/gpu/drm/i915/i915_ttm_buddy_manager.c |   6 +-
  drivers/gpu/drm/tests/drm_buddy_test.c    |  10 +-
  include/drm/drm_buddy.h   |  18 +-
  5 files changed, 168 insertions(+), 41 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c

index 08916538a615..d0e199cc8f17 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -556,7 +556,7 @@ static int amdgpu_vram_mgr_new(struct 
ttm_resource_manager *man,

  return 0;
    error_free_blocks:
-    drm_buddy_free_list(mm, >blocks);
+    drm_buddy_free_list(mm, >blocks, 0);
  mutex_unlock(>lock);
  error_fini:
  ttm_resource_fini(man, >base);
@@ -589,7 +589,7 @@ static void amdgpu_vram_mgr_del(struct 
ttm_resource_manager *man,

    amdgpu_vram_mgr_do_reserve(man);
  -    drm_buddy_free_list(mm, >blocks);
+    drm_buddy_free_list(mm, >blocks, 0);
  mutex_unlock(>lock);
    atomic64_sub(vis_usage, >vis_usage);
@@ -897,7 +897,7 @@ void amdgpu_vram_mgr_fini(struct amdgpu_device 
*adev)

  kfree(rsv);
    list_for_each_entry_safe(rsv, temp, >reserved_pages, 
blocks) {

-    drm_buddy_free_list(>mm, >allocated);
+    drm_buddy_free_list(>mm, >allocated, 0);
  kfree(rsv);
  }
  if (!adev->gmc.is_app_apu)
diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
index f57e6d74fb0e..d44172f23f05 100644
--- a/drivers/gpu/drm/drm_buddy.c
+++ 

Re: [PATCH v3] drm/amdkfd: reserve the BO before validating it

2024-01-30 Thread Felix Kuehling

On 2024-01-30 04:45, Lang Yu wrote:

Fixes: 410f08516e0f ("drm/amdkfd: Move dma unmapping after TLB flush")

v2: Avoid unmapping attachment twice when ERESTARTSYS.

v3: Lock the BO before accessing ttm->sg to avoid race conditions.(Felix)

[   41.708711] WARNING: CPU: 0 PID: 1463 at drivers/gpu/drm/ttm/ttm_bo.c:846 
ttm_bo_validate+0x146/0x1b0 [ttm]
[   41.708989] Call Trace:
[   41.708992]  
[   41.708996]  ? show_regs+0x6c/0x80
[   41.709000]  ? ttm_bo_validate+0x146/0x1b0 [ttm]
[   41.709008]  ? __warn+0x93/0x190
[   41.709014]  ? ttm_bo_validate+0x146/0x1b0 [ttm]
[   41.709024]  ? report_bug+0x1f9/0x210
[   41.709035]  ? handle_bug+0x46/0x80
[   41.709041]  ? exc_invalid_op+0x1d/0x80
[   41.709048]  ? asm_exc_invalid_op+0x1f/0x30
[   41.709057]  ? amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x2c/0x80 [amdgpu]
[   41.709185]  ? ttm_bo_validate+0x146/0x1b0 [ttm]
[   41.709197]  ? amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x2c/0x80 [amdgpu]
[   41.709337]  ? srso_alias_return_thunk+0x5/0x7f
[   41.709346]  kfd_mem_dmaunmap_attachment+0x9e/0x1e0 [amdgpu]
[   41.709467]  amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x56/0x80 [amdgpu]
[   41.709586]  kfd_ioctl_unmap_memory_from_gpu+0x1b7/0x300 [amdgpu]
[   41.709710]  kfd_ioctl+0x1ec/0x650 [amdgpu]
[   41.709822]  ? __pfx_kfd_ioctl_unmap_memory_from_gpu+0x10/0x10 [amdgpu]
[   41.709945]  ? srso_alias_return_thunk+0x5/0x7f
[   41.709949]  ? tomoyo_file_ioctl+0x20/0x30
[   41.709959]  __x64_sys_ioctl+0x9c/0xd0
[   41.709967]  do_syscall_64+0x3f/0x90
[   41.709973]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8

Signed-off-by: Lang Yu 


Reviewed-by: Felix Kuehling 



---
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h|  2 +-
  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 20 ---
  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  |  4 +++-
  3 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 298fc52a35bc..e60f63ccf79a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -313,7 +313,7 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(struct 
amdgpu_device *adev,
  struct kgd_mem *mem, void *drm_priv);
  int amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu(
struct amdgpu_device *adev, struct kgd_mem *mem, void 
*drm_priv);
-void amdgpu_amdkfd_gpuvm_dmaunmap_mem(struct kgd_mem *mem, void *drm_priv);
+int amdgpu_amdkfd_gpuvm_dmaunmap_mem(struct kgd_mem *mem, void *drm_priv);
  int amdgpu_amdkfd_gpuvm_sync_memory(
struct amdgpu_device *adev, struct kgd_mem *mem, bool intr);
  int amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(struct kgd_mem *mem,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 6f3a4cb2a9ef..ef71b12062a1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -2088,21 +2088,35 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
return ret;
  }
  
-void amdgpu_amdkfd_gpuvm_dmaunmap_mem(struct kgd_mem *mem, void *drm_priv)

+int amdgpu_amdkfd_gpuvm_dmaunmap_mem(struct kgd_mem *mem, void *drm_priv)
  {
struct kfd_mem_attachment *entry;
struct amdgpu_vm *vm;
+   int ret;
  
  	vm = drm_priv_to_vm(drm_priv);
  
  	mutex_lock(&mem->lock);
  
+	ret = amdgpu_bo_reserve(mem->bo, true);

+   if (ret)
+   goto out;
+
list_for_each_entry(entry, &mem->attachments, list) {
-   if (entry->bo_va->base.vm == vm)
-   kfd_mem_dmaunmap_attachment(mem, entry);
+   if (entry->bo_va->base.vm != vm)
+   continue;
+   if (entry->bo_va->base.bo->tbo.ttm &&
+   !entry->bo_va->base.bo->tbo.ttm->sg)
+   continue;
+
+   kfd_mem_dmaunmap_attachment(mem, entry);
}
  
+	amdgpu_bo_unreserve(mem->bo);

+out:
mutex_unlock(&mem->lock);
+
+   return ret;
  }
  
  int amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu(

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index ce4c52ec34d8..80e90fdef291 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1442,7 +1442,9 @@ static int kfd_ioctl_unmap_memory_from_gpu(struct file 
*filep,
kfd_flush_tlb(peer_pdd, TLB_FLUSH_HEAVYWEIGHT);
  
  		/* Remove dma mapping after tlb flush to avoid IO_PAGE_FAULT */

-   amdgpu_amdkfd_gpuvm_dmaunmap_mem(mem, peer_pdd->drm_priv);
+   err = amdgpu_amdkfd_gpuvm_dmaunmap_mem(mem, peer_pdd->drm_priv);
+   if (err)
+   goto sync_memory_failed;
}
  
  	mutex_unlock(>mutex);
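
The locking shape introduced by this patch (take the outer lock, then reserve the buffer object, and unwind with a goto label if the reservation fails) can be sketched in user space as below. The struct and function names are hypothetical, and plain booleans stand in for the mutex and the reservation so that the state can be inspected afterwards; this is not the amdgpu implementation.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for a mutex and a buffer-object reservation. */
struct sketch_mem {
	bool locked;
	bool reserved;
	bool reserve_fails;	/* knob: make the reservation fail */
	int attachments;	/* number of attachments to unmap */
	int unmapped;		/* how many were actually unmapped */
};

static int sketch_reserve(struct sketch_mem *mem)
{
	if (mem->reserve_fails)
		return -1;
	mem->reserved = true;
	return 0;
}

/* Take the outer lock, then the reservation. On failure, skip the body
 * and release only what was actually taken. */
static int sketch_dmaunmap(struct sketch_mem *mem)
{
	int ret;

	mem->locked = true;

	ret = sketch_reserve(mem);
	if (ret)
		goto out;

	/* work that must run with both the lock and the reservation held */
	mem->unmapped += mem->attachments;

	mem->reserved = false;
out:
	mem->locked = false;
	return ret;
}
```

Because the function now returns an error, the caller has to check it, which is why the ioctl path in the last hunk gains an error branch.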


[PATCH 2/2] drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole (v2)

2024-01-30 Thread Felix Kuehling
The TBA and TMA, along with an unused IB allocation, reside at low
addresses in the VM address space. A stray VM fault which hits these
pages must be serviced by making their page table entries invalid.
The scheduler depends upon these pages being resident and fails,
preventing a debugger from inspecting the failure state.

By relocating these pages above 47 bits in the VM address space they
can only be reached when bits [63:48] are set to 1. This makes it much
less likely for a misbehaving program to generate accesses to them.
The current placement at VA (PAGE_SIZE*2) is readily hit by a NULL
access with a small offset.

v2:
- Move it to the reserved space to avoid conflicts with Mesa
- Add macros to make reserved space management easier

Cc: Arunpravin Paneer Selvam 
Cc: Christian Koenig 
Signed-off-by: Jay Cornwall 
Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c  |  4 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c|  7 ++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h   | 10 ++-
 drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c | 30 +++-
 4 files changed, 29 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
index 823d31f4a2a3..53d0a458d78e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
@@ -28,9 +28,9 @@
 
 uint64_t amdgpu_csa_vaddr(struct amdgpu_device *adev)
 {
-   uint64_t addr = adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT;
+   uint64_t addr = AMDGPU_VA_RESERVED_CSA_START(
+   adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT);
 
-   addr -= AMDGPU_VA_RESERVED_CSA_SIZE;
addr = amdgpu_gmc_sign_extend(addr);
 
return addr;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
index 3d0d56087d41..9e769ef50f2e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
@@ -45,11 +45,8 @@
  */
 static inline u64 amdgpu_seq64_get_va_base(struct amdgpu_device *adev)
 {
-   u64 addr = adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT;
-
-   addr -= AMDGPU_VA_RESERVED_TOP;
-
-   return addr;
+   return AMDGPU_VA_RESERVED_SEQ64_START(
+   adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT);
 }
 
 /**
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 2c4053b29bb3..c2407f6a7e83 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -137,9 +137,17 @@ struct amdgpu_mem_stats;
 
 /* Reserve space at top/bottom of address space for kernel use */
 #define AMDGPU_VA_RESERVED_CSA_SIZE            (2ULL << 20)
+#define AMDGPU_VA_RESERVED_CSA_START(top)      ((top) \
+                                                - AMDGPU_VA_RESERVED_CSA_SIZE)
 #define AMDGPU_VA_RESERVED_SEQ64_SIZE          (2ULL << 20)
+#define AMDGPU_VA_RESERVED_SEQ64_START(top)    (AMDGPU_VA_RESERVED_CSA_START(top) \
+                                                - AMDGPU_VA_RESERVED_SEQ64_SIZE)
+#define AMDGPU_VA_RESERVED_TRAP_SIZE           (2ULL << 12)
+#define AMDGPU_VA_RESERVED_TRAP_START(top)     (AMDGPU_VA_RESERVED_SEQ64_START(top) \
+                                                - AMDGPU_VA_RESERVED_TRAP_SIZE)
 #define AMDGPU_VA_RESERVED_BOTTOM              (1ULL << 16)
-#define AMDGPU_VA_RESERVED_TOP                 (AMDGPU_VA_RESERVED_SEQ64_SIZE + \
+#define AMDGPU_VA_RESERVED_TOP                 (AMDGPU_VA_RESERVED_TRAP_SIZE + \
+                                                AMDGPU_VA_RESERVED_SEQ64_SIZE + \
                                                 AMDGPU_VA_RESERVED_CSA_SIZE)
 
 /* See vm_update_mode */
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
index 6604a3f99c5e..f899cce25b2a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include "amdgpu_vm.h"
 
 /*
  * The primary memory I/O features being added for revisions of gfxip
@@ -326,10 +327,16 @@ static void kfd_init_apertures_vi(struct 
kfd_process_device *pdd, uint8_t id)
 * with small reserved space for kernel.
 * Set them to CANONICAL addresses.
 */
-   pdd->gpuvm_base = SVM_USER_BASE;
+   pdd->gpuvm_base = max(SVM_USER_BASE, AMDGPU_VA_RESERVED_BOTTOM);
pdd->gpuvm_limit =
pdd->dev->kfd->shared_resources.gpuvm_size - 1;
 
+   /* dGPUs: the reserved space for kernel
+* before SVM
+*/
+   pdd->qpd.cwsr_base = SVM_CWSR_BASE;
+   pdd->qpd.ib_base = SVM_IB_BASE;
+
pdd->scratch_base = MAKE_SCRATCH_APP_BASE_VI();
pdd->scratch_limit = MAKE_SCRATCH_APP_LIMIT(pdd->scratch_base);
 }
@@ -339,18 +346,19 @@ static void kfd_init_apertures_v9(struct 
kfd_process_device *pdd, 

[PATCH 1/2] drm/amdgpu: Reduce VA_RESERVED_BOTTOM to 64KB

2024-01-30 Thread Felix Kuehling
The reservation is there to catch NULL pointer dereferences from the
GPU. Reduce the size to 64KB to make sure that shared virtual address
programming models can map all CPU-accessible virtual addresses for GPU
access. This is also the default for CPU virtual address mappings as
seen in /proc/sys/vm/mmap_min_addr.

Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 98a57192..2c4053b29bb3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -135,10 +135,10 @@ struct amdgpu_mem_stats;
 #define AMDGPU_IS_MMHUB0(x) ((x) >= AMDGPU_MMHUB0_START && (x) < AMDGPU_MMHUB1_START)
 #define AMDGPU_IS_MMHUB1(x) ((x) >= AMDGPU_MMHUB1_START && (x) < AMDGPU_MAX_VMHUBS)
 
-/* Reserve 2MB at top/bottom of address space for kernel use */
+/* Reserve space at top/bottom of address space for kernel use */
 #define AMDGPU_VA_RESERVED_CSA_SIZE(2ULL << 20)
 #define AMDGPU_VA_RESERVED_SEQ64_SIZE  (2ULL << 20)
-#define AMDGPU_VA_RESERVED_BOTTOM  (2ULL << 20)
+#define AMDGPU_VA_RESERVED_BOTTOM  (1ULL << 16)
 #define AMDGPU_VA_RESERVED_TOP (AMDGPU_VA_RESERVED_SEQ64_SIZE + \
 AMDGPU_VA_RESERVED_CSA_SIZE)
 
-- 
2.34.1
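
For readers who want to check the arithmetic, here is a minimal userspace sketch of the sizes involved. It is not part of the patch: the macro names drop the AMDGPU_ prefix, va_is_mappable() is a hypothetical helper, and the assumption that the usable range is [bottom, va_size - top) is mine, as is the 47-bit address space used in the usage note.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sizes as they appear in the patch. */
#define VA_RESERVED_CSA_SIZE    (2ULL << 20)   /* 2 MiB */
#define VA_RESERVED_SEQ64_SIZE  (2ULL << 20)   /* 2 MiB */
#define VA_RESERVED_BOTTOM_OLD  (2ULL << 20)   /* 2 MiB, before the patch */
#define VA_RESERVED_BOTTOM_NEW  (1ULL << 16)   /* 64 KiB, after the patch */
#define VA_RESERVED_TOP         (VA_RESERVED_SEQ64_SIZE + VA_RESERVED_CSA_SIZE)

/*
 * Hypothetical helper: true if @va lies outside both reserved regions.
 * Assumes the usable range is [bottom, va_size - VA_RESERVED_TOP).
 */
static bool va_is_mappable(uint64_t va, uint64_t va_size, uint64_t bottom)
{
	assert(va_size > bottom + VA_RESERVED_TOP);
	return va >= bottom && va < va_size - VA_RESERVED_TOP;
}
```

With a 47-bit address space, va_is_mappable(0x10000, 1ULL << 47, VA_RESERVED_BOTTOM_OLD) is false while the same call with VA_RESERVED_BOTTOM_NEW is true, which is the range the commit message says shared virtual address models need.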



[PATCH v5 3/3] drm/buddy: Add defragmentation support

2024-01-30 Thread Arunpravin Paneer Selvam
Add a function to support defragmentation.

v5: Defragment the freelist order array beginning
from min_order.

Signed-off-by: Arunpravin Paneer Selvam 
Suggested-by: Matthew Auld 
---
 drivers/gpu/drm/drm_buddy.c | 70 ++---
 1 file changed, 58 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
index d44172f23f05..8aa6d31cb826 100644
--- a/drivers/gpu/drm/drm_buddy.c
+++ b/drivers/gpu/drm/drm_buddy.c
@@ -276,10 +276,12 @@ drm_get_buddy(struct drm_buddy_block *block)
 }
 EXPORT_SYMBOL(drm_get_buddy);
 
-static void __drm_buddy_free(struct drm_buddy *mm,
-struct drm_buddy_block *block)
+static unsigned int __drm_buddy_free(struct drm_buddy *mm,
+struct drm_buddy_block *block,
+bool defrag)
 {
struct drm_buddy_block *parent;
+   unsigned int order;
 
while ((parent = block->parent)) {
struct drm_buddy_block *buddy;
@@ -289,12 +291,14 @@ static void __drm_buddy_free(struct drm_buddy *mm,
if (!drm_buddy_block_is_free(buddy))
break;
 
-   if (drm_buddy_block_is_clear(block) !=
-   drm_buddy_block_is_clear(buddy))
-   break;
+   if (!defrag) {
+   if (drm_buddy_block_is_clear(block) !=
+   drm_buddy_block_is_clear(buddy))
+   break;
 
-   if (drm_buddy_block_is_clear(block))
-   mark_cleared(parent);
+   if (drm_buddy_block_is_clear(block))
+   mark_cleared(parent);
+   }
 
list_del(&buddy->link);
 
@@ -304,7 +308,37 @@ static void __drm_buddy_free(struct drm_buddy *mm,
block = parent;
}
 
+   order = drm_buddy_block_order(block);
mark_free(mm, block);
+
+   return order;
+}
+
+static void drm_buddy_defrag(struct drm_buddy *mm,
+unsigned int min_order)
+{
+   struct drm_buddy_block *block;
+   struct list_head *list;
+   unsigned int order;
+   int i;
+
+   if (min_order > mm->max_order)
+   return;
+
+   for (i = min_order - 1; i >= 0; i--) {
+   list = &mm->free_list[i];
+   if (list_empty(list))
+   continue;
+
+   list_for_each_entry_reverse(block, list, link) {
+   if (!block->parent)
+   continue;
+
+   order = __drm_buddy_free(mm, block, 1);
+   if (order >= min_order)
+   return;
+   }
+   }
 }
 
 /**
@@ -321,7 +355,7 @@ void drm_buddy_free_block(struct drm_buddy *mm,
if (drm_buddy_block_is_clear(block))
mm->clear_avail += drm_buddy_block_size(mm, block);
 
-   __drm_buddy_free(mm, block);
+   __drm_buddy_free(mm, block, 0);
 }
 EXPORT_SYMBOL(drm_buddy_free_block);
 
@@ -447,7 +481,7 @@ __alloc_range_bias(struct drm_buddy *mm,
if (buddy &&
(drm_buddy_block_is_free(block) &&
 drm_buddy_block_is_free(buddy)))
-   __drm_buddy_free(mm, block);
+   __drm_buddy_free(mm, block, 0);
return ERR_PTR(err);
 }
 
@@ -577,7 +611,7 @@ alloc_from_freelist(struct drm_buddy *mm,
 
 err_undo:
if (tmp != order)
-   __drm_buddy_free(mm, block);
+   __drm_buddy_free(mm, block, 0);
return ERR_PTR(err);
 }
 
@@ -657,7 +691,7 @@ static int __alloc_range(struct drm_buddy *mm,
if (buddy &&
(drm_buddy_block_is_free(block) &&
 drm_buddy_block_is_free(buddy)))
-   __drm_buddy_free(mm, block);
+   __drm_buddy_free(mm, block, 0);
 
 err_free:
if (err == -ENOSPC && total_allocated_on_err) {
@@ -903,7 +937,17 @@ int drm_buddy_alloc_blocks(struct drm_buddy *mm,
 
if (order-- == min_order) {
if (flags & DRM_BUDDY_CONTIGUOUS_ALLOCATION &&
-   !(flags & DRM_BUDDY_RANGE_ALLOCATION))
+   !(flags & DRM_BUDDY_RANGE_ALLOCATION)) {
+   /*
+* Defragment the freelist
+*/
+   drm_buddy_defrag(mm, min_order);
+   /*
+* Try contiguous block allocation 
again!
+*/
+   block = alloc_from_freelist(mm, 
min_order, flags);
+   if (!IS_ERR(block))
+   break;
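
To illustrate the merge rule this patch changes, here is a toy userspace model. It is a sketch only and does not use the drm_buddy API: the heap-indexed array, the toy_* names and TOY_MAX_ORDER are all hypothetical. A normal free refuses to merge a cleared block with a dirty buddy, while a defrag pass merges them anyway and leaves the parent dirty.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_ORDER 2
#define TOY_NODES (1u << (TOY_MAX_ORDER + 1))  /* heap layout, index 0 unused */

enum toy_state { TOY_UNUSED, TOY_SPLIT, TOY_FREE, TOY_ALLOCATED };

struct toy_buddy {
	enum toy_state state[TOY_NODES];
	bool clear[TOY_NODES];
};

/* Order of a node: the root (index 1) has TOY_MAX_ORDER, leaves have 0. */
static unsigned int toy_order(unsigned int idx)
{
	unsigned int depth = 0;

	while (idx > 1) {
		idx >>= 1;
		depth++;
	}
	return TOY_MAX_ORDER - depth;
}

/* Free node @idx, merging upwards; returns the order of the final block. */
static unsigned int toy_free(struct toy_buddy *t, unsigned int idx, bool defrag)
{
	assert(idx >= 1 && idx < TOY_NODES);

	while (idx > 1) {
		unsigned int buddy = idx ^ 1u;
		unsigned int parent = idx >> 1;

		if (t->state[buddy] != TOY_FREE)
			break;

		if (!defrag) {
			/* Normal path: keep cleared and dirty blocks apart. */
			if (t->clear[idx] != t->clear[buddy])
				break;
			if (t->clear[idx])
				t->clear[parent] = true;
		}

		/* Both halves disappear into the parent. */
		t->state[idx] = TOY_UNUSED;
		t->state[buddy] = TOY_UNUSED;
		t->clear[idx] = false;
		t->clear[buddy] = false;
		idx = parent;
	}

	t->state[idx] = TOY_FREE;
	return toy_order(idx);
}
```

In this model a split node never carries the clear flag, so a block merged under defrag is treated as dirty. That seems to match the intent of skipping mark_cleared() in the defrag path, but it is my reading rather than something the patch states.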

[PATCH v5 2/3] drm/amdgpu: Enable clear page functionality

2024-01-30 Thread Arunpravin Paneer Selvam
Add clear page support in the VRAM memory region.

v1:(Christian)
  - Don't handle clear page as a TTM flag, since we don't need that when
    moving the BO back in from GTT again.
  - Make a specialized version of amdgpu_fill_buffer() which only
clears the VRAM areas which are not already cleared
  - Drop the TTM_PL_FLAG_WIPE_ON_RELEASE check in
amdgpu_object.c

v2:
  - Modify the function name amdgpu_ttm_* (Alex)
  - Drop the delayed parameter (Christian)
  - handle amdgpu_res_cleared() just above the size
calculation (Christian)
  - Use AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE for clearing the buffers
    in the free path to properly wait for fences etc. (Christian)

v3:(Christian)
  - Remove the buffer clear code in the VRAM manager; instead, change the
    AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE handling to set
    the DRM_BUDDY_CLEARED flag.
  - Remove ! from amdgpu_res_cleared() check.

Signed-off-by: Arunpravin Paneer Selvam 
Suggested-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c| 22 ---
 .../gpu/drm/amd/amdgpu/amdgpu_res_cursor.h| 25 
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c   | 61 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h   |  5 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c  |  6 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.h  |  5 ++
 6 files changed, 111 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index b2e9a2f81d82..be32f9852d19 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -39,6 +39,7 @@
 #include "amdgpu.h"
 #include "amdgpu_trace.h"
 #include "amdgpu_amdkfd.h"
+#include "amdgpu_vram_mgr.h"
 
 /**
  * DOC: amdgpu_object
@@ -595,8 +596,7 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
if (!amdgpu_bo_support_uswc(bo->flags))
bo->flags &= ~AMDGPU_GEM_CREATE_CPU_GTT_USWC;
 
-   if (adev->ras_enabled)
-   bo->flags |= AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE;
+   bo->flags |= AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE;
 
bo->tbo.bdev = >mman.bdev;
if (bp->domain & (AMDGPU_GEM_DOMAIN_GWS | AMDGPU_GEM_DOMAIN_OA |
@@ -626,15 +626,17 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
 
if (bp->flags & AMDGPU_GEM_CREATE_VRAM_CLEARED &&
bo->tbo.resource->mem_type == TTM_PL_VRAM) {
-   struct dma_fence *fence;
+   struct dma_fence *fence = NULL;
 
-   r = amdgpu_fill_buffer(bo, 0, bo->tbo.base.resv, , true);
+   r = amdgpu_ttm_clear_buffer(bo, bo->tbo.base.resv, );
if (unlikely(r))
goto fail_unreserve;
 
-   dma_resv_add_fence(bo->tbo.base.resv, fence,
-  DMA_RESV_USAGE_KERNEL);
-   dma_fence_put(fence);
+   if (fence) {
+   dma_resv_add_fence(bo->tbo.base.resv, fence,
+  DMA_RESV_USAGE_KERNEL);
+   dma_fence_put(fence);
+   }
}
if (!bp->resv)
amdgpu_bo_unreserve(bo);
@@ -1357,8 +1359,12 @@ void amdgpu_bo_release_notify(struct ttm_buffer_object 
*bo)
if (WARN_ON_ONCE(!dma_resv_trylock(bo->base.resv)))
return;
 
-   r = amdgpu_fill_buffer(abo, AMDGPU_POISON, bo->base.resv, , true);
+   r = amdgpu_fill_buffer(abo, 0, bo->base.resv, , true);
if (!WARN_ON(r)) {
+   struct amdgpu_vram_mgr_resource *vres;
+
+   vres = to_amdgpu_vram_mgr_resource(bo->resource);
+   vres->flags |= DRM_BUDDY_CLEARED;
amdgpu_bo_fence(abo, fence, false);
dma_fence_put(fence);
}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
index 381101d2bf05..50fcd86e1033 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
@@ -164,4 +164,29 @@ static inline void amdgpu_res_next(struct 
amdgpu_res_cursor *cur, uint64_t size)
}
 }
 
+/**
+ * amdgpu_res_cleared - check if blocks are cleared
+ *
+ * @cur: the cursor to extract the block
+ *
+ * Check if the @cur block is cleared
+ */
+static inline bool amdgpu_res_cleared(struct amdgpu_res_cursor *cur)
+{
+   struct drm_buddy_block *block;
+
+   switch (cur->mem_type) {
+   case TTM_PL_VRAM:
+   block = cur->node;
+
+   if (!amdgpu_vram_mgr_is_cleared(block))
+   return false;
+   break;
+   default:
+   return false;
+   }
+
+   return true;
+}
+
 #endif
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 46a24d2308aa..15cdda77573c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -378,11 +378,15 

[PATCH v5 1/3] drm/buddy: Implement tracking clear page feature

2024-01-30 Thread Arunpravin Paneer Selvam
- Add tracking clear page feature.

- The driver should set the DRM_BUDDY_CLEARED flag if it
  successfully clears the blocks in the free path. On the other hand,
  DRM buddy marks each block as cleared.

- Track the available cleared pages size

- If the driver requests cleared memory, we prefer cleared memory
  but fall back to uncleared if we can't find cleared blocks.
  When the driver requests uncleared memory, we try to use uncleared but
  fall back to cleared memory if necessary.

- When a block gets freed, we clear it and mark the freed block as cleared.
  If its buddies are cleared as well, we can merge them.
  Otherwise, we prefer to keep the blocks separate.

v1: (Christian)
  - Depending on the DRM_BUDDY_CLEARED flag check, mark the block as
    cleared. Otherwise, reset the clear flag for each block in the list.

  - For merging the 2 cleared blocks compare as below,
drm_buddy_is_clear(block) != drm_buddy_is_clear(buddy)

Signed-off-by: Arunpravin Paneer Selvam 
Suggested-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c  |   6 +-
 drivers/gpu/drm/drm_buddy.c   | 169 +++---
 drivers/gpu/drm/i915/i915_ttm_buddy_manager.c |   6 +-
 drivers/gpu/drm/tests/drm_buddy_test.c|  10 +-
 include/drm/drm_buddy.h   |  18 +-
 5 files changed, 168 insertions(+), 41 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index 08916538a615..d0e199cc8f17 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -556,7 +556,7 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager 
*man,
return 0;
 
 error_free_blocks:
-   drm_buddy_free_list(mm, >blocks);
+   drm_buddy_free_list(mm, >blocks, 0);
mutex_unlock(>lock);
 error_fini:
ttm_resource_fini(man, >base);
@@ -589,7 +589,7 @@ static void amdgpu_vram_mgr_del(struct ttm_resource_manager 
*man,
 
amdgpu_vram_mgr_do_reserve(man);
 
-   drm_buddy_free_list(mm, >blocks);
+   drm_buddy_free_list(mm, >blocks, 0);
mutex_unlock(>lock);
 
atomic64_sub(vis_usage, >vis_usage);
@@ -897,7 +897,7 @@ void amdgpu_vram_mgr_fini(struct amdgpu_device *adev)
kfree(rsv);
 
list_for_each_entry_safe(rsv, temp, >reserved_pages, blocks) {
-   drm_buddy_free_list(>mm, >allocated);
+   drm_buddy_free_list(>mm, >allocated, 0);
kfree(rsv);
}
if (!adev->gmc.is_app_apu)
diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
index f57e6d74fb0e..d44172f23f05 100644
--- a/drivers/gpu/drm/drm_buddy.c
+++ b/drivers/gpu/drm/drm_buddy.c
@@ -57,6 +57,16 @@ static void list_insert_sorted(struct drm_buddy *mm,
__list_add(>link, node->link.prev, >link);
 }
 
+static void clear_reset(struct drm_buddy_block *block)
+{
+   block->header &= ~DRM_BUDDY_HEADER_CLEAR;
+}
+
+static void mark_cleared(struct drm_buddy_block *block)
+{
+   block->header |= DRM_BUDDY_HEADER_CLEAR;
+}
+
 static void mark_allocated(struct drm_buddy_block *block)
 {
block->header &= ~DRM_BUDDY_HEADER_STATE;
@@ -223,6 +233,12 @@ static int split_block(struct drm_buddy *mm,
mark_free(mm, block->left);
mark_free(mm, block->right);
 
+   if (drm_buddy_block_is_clear(block)) {
+   mark_cleared(block->left);
+   mark_cleared(block->right);
+   clear_reset(block);
+   }
+
mark_split(block);
 
return 0;
@@ -273,6 +289,13 @@ static void __drm_buddy_free(struct drm_buddy *mm,
if (!drm_buddy_block_is_free(buddy))
break;
 
+   if (drm_buddy_block_is_clear(block) !=
+   drm_buddy_block_is_clear(buddy))
+   break;
+
+   if (drm_buddy_block_is_clear(block))
+   mark_cleared(parent);
+
list_del(&buddy->link);
 
drm_block_free(mm, block);
@@ -295,6 +318,9 @@ void drm_buddy_free_block(struct drm_buddy *mm,
 {
BUG_ON(!drm_buddy_block_is_allocated(block));
mm->avail += drm_buddy_block_size(mm, block);
+   if (drm_buddy_block_is_clear(block))
+   mm->clear_avail += drm_buddy_block_size(mm, block);
+
__drm_buddy_free(mm, block);
 }
 EXPORT_SYMBOL(drm_buddy_free_block);
@@ -305,10 +331,20 @@ EXPORT_SYMBOL(drm_buddy_free_block);
  * @mm: DRM buddy manager
  * @objects: input list head to free blocks
  */
-void drm_buddy_free_list(struct drm_buddy *mm, struct list_head *objects)
+void drm_buddy_free_list(struct drm_buddy *mm,
+struct list_head *objects,
+unsigned long flags)
 {
struct drm_buddy_block *block, *on;
 
+   if (flags & DRM_BUDDY_CLEARED) {
+   list_for_each_entry(block, objects, link)
+   mark_cleared(block);
+ 
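
As a rough illustration of the flag handling described in the commit message, the following userspace sketch shows how a clear bit in a block header can be set, reset, and pushed down to both halves on a split. The struct layout, the toy_* names and the bit position are hypothetical and are not taken from drm_buddy.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_HEADER_CLEAR (1ULL << 9)   /* hypothetical bit position */

struct toy_block {
	uint64_t header;
	struct toy_block *left;
	struct toy_block *right;
};

static bool toy_is_clear(const struct toy_block *b)
{
	return (b->header & TOY_HEADER_CLEAR) != 0;
}

static void toy_mark_cleared(struct toy_block *b)
{
	b->header |= TOY_HEADER_CLEAR;
}

static void toy_clear_reset(struct toy_block *b)
{
	b->header &= ~TOY_HEADER_CLEAR;
}

/*
 * Split @parent into @l and @r. If the parent was cleared, both halves
 * inherit the flag and the parent drops it, since a split node no longer
 * describes memory on its own.
 */
static void toy_split(struct toy_block *parent,
		      struct toy_block *l, struct toy_block *r)
{
	l->header = 0;
	r->header = 0;
	l->left = l->right = NULL;
	r->left = r->right = NULL;
	parent->left = l;
	parent->right = r;

	if (toy_is_clear(parent)) {
		toy_mark_cleared(l);
		toy_mark_cleared(r);
		toy_clear_reset(parent);
	}
}
```

The merge direction is the mirror image: two free buddies are only merged on the normal free path when their clear flags match, and the parent is marked cleared if they are both cleared.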

[PATCH v2 3/4] drm/amd: Fetch the EDID from _DDC if available for eDP

2024-01-30 Thread Mario Limonciello
Some manufacturers intentionally ship an EDID in the BIOS that differs
from the EDID on the laptop's internal panel.

Attempt to fetch this EDID if it exists and prefer it over the EDID
that is provided by the panel.

Signed-off-by: Mario Limonciello 
---
v2:
 * Use drm helper which will run more validation
 * Move eDP check to DRM helper
 * Add module parameter
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h|  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c |  4 
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c|  8 
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c  | 10 --
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c  |  9 ++---
 5 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 3d8a48f46b01..5d5be3e20687 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -217,6 +217,7 @@ extern int amdgpu_smartshift_bias;
 extern int amdgpu_use_xgmi_p2p;
 extern int amdgpu_mtype_local;
 extern bool enforce_isolation;
+extern bool acpi_edid;
 #ifdef CONFIG_HSA_AMD
 extern int sched_policy;
 extern bool debug_evictions;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c
index 9caba10315a8..6aa8cc431abe 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c
@@ -278,6 +278,10 @@ static void amdgpu_connector_get_edid(struct drm_connector 
*connector)
struct amdgpu_device *adev = drm_to_adev(dev);
struct amdgpu_connector *amdgpu_connector = 
to_amdgpu_connector(connector);
 
+   /* if the BIOS specifies the EDID via _DDC, prefer this */
+   if (acpi_edid && !amdgpu_connector->edid)
+   amdgpu_connector->edid = drm_get_acpi_edid(connector);
+
if (amdgpu_connector->edid)
return;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index cc69005f5b46..be7a4da85a8b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -166,6 +166,7 @@ uint amdgpu_sdma_phase_quantum = 32;
 char *amdgpu_disable_cu;
 char *amdgpu_virtual_display;
 bool enforce_isolation;
+bool acpi_edid = true;
 /*
  * OverDrive(bit 14) disabled by default
  * GFX DCS(bit 19) disabled by default
@@ -990,6 +991,13 @@ MODULE_PARM_DESC(wbrf,
"Enable Wifi RFI interference mitigation (0 = disabled, 1 = enabled, -1 
= auto(default)");
 module_param_named(wbrf, amdgpu_wbrf, int, 0444);
 
+/**
+ * DOC: acpi_edid (bool)
+ * Try to fetch EDID for eDP display from BIOS using ACPI _DDC method.
+ */
+module_param(acpi_edid, bool, 0444);
+MODULE_PARM_DESC(acpi_edid, "Fetch EDID for eDP display from BIOS");
+
 /* These devices are not supported by amdgpu.
  * They are supported by the mach64, r128, radeon drivers
  */
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 202c6ad443a3..688d615c6687 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -6589,7 +6589,11 @@ static void amdgpu_dm_connector_funcs_force(struct 
drm_connector *connector)
struct amdgpu_connector *amdgpu_connector = 
to_amdgpu_connector(connector);
struct dc_link *dc_link = aconnector->dc_link;
struct dc_sink *dc_em_sink = aconnector->dc_em_sink;
-   struct edid *edid;
+   struct edid *edid = NULL;
+
+   /* prefer ACPI over panel for eDP */
+   if (acpi_edid)
+   edid = drm_get_acpi_edid(connector);
 
/*
 * Note: drm_get_edid gets edid in the following order:
@@ -6597,7 +6601,9 @@ static void amdgpu_dm_connector_funcs_force(struct 
drm_connector *connector)
 * 2) firmware EDID if set via edid_firmware module parameter
 * 3) regular DDC read.
 */
-   edid = drm_get_edid(connector, _connector->ddc_bus->aux.ddc);
+   if (!edid)
+   edid = drm_get_edid(connector, 
_connector->ddc_bus->aux.ddc);
+
if (!edid) {
DRM_ERROR("No EDID found on connector: %s.\n", connector->name);
return;
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c
index 85b7f58a7f35..cc39b1c14aa8 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c
@@ -899,7 +899,7 @@ enum dc_edid_status dm_helpers_read_local_edid(
struct i2c_adapter *ddc;
int retry = 3;
enum dc_edid_status edid_status;
-   struct edid *edid;
+   struct edid *edid = NULL;
 
if (link->aux_mode)
ddc = >dm_dp_aux.aux.ddc;
@@ -910,8 +910,11 @@ enum dc_edid_status dm_helpers_read_local_edid(
 * do check 
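
The lookup order this patch introduces can be sketched in plain C as follows. This is only an illustration of the preference chain: pick_edid() and the edid_source_fn type are hypothetical, and strings stand in for EDID blobs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical EDID source: returns a blob or NULL if none is available. */
typedef const char *(*edid_source_fn)(void);

/*
 * Try the ACPI source first when @prefer_acpi is set (mirroring the
 * acpi_edid module parameter), then fall back to the panel source.
 */
static const char *pick_edid(bool prefer_acpi, edid_source_fn acpi,
			     edid_source_fn panel)
{
	const char *edid = NULL;

	assert(acpi || panel);

	if (prefer_acpi && acpi)
		edid = acpi();
	if (!edid && panel)
		edid = panel();

	return edid;
}

/* Stand-in sources for the usage example. */
static const char *acpi_present(void) { return "acpi"; }
static const char *acpi_absent(void) { return NULL; }
static const char *panel_present(void) { return "panel"; }
```

pick_edid(true, acpi_absent, panel_present) falls through to the panel, and passing false as the first argument skips the ACPI source entirely, which is the behaviour one would expect from acpi_edid=0.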

[PATCH v2 4/4] drm/nouveau: Use drm_get_acpi_edid() helper

2024-01-30 Thread Mario Limonciello
Rather than inventing a wrapper around acpi_video_get_edid(), use the
one provided by drm. This fixes two problems:
1. A memory leak: the memory provided by the ACPI call was
   never freed.
2. Missing validation of the BIOS-provided blob.

Signed-off-by: Mario Limonciello 
---
v1->v2:
 * New patch
---
 drivers/gpu/drm/nouveau/nouveau_acpi.c  | 27 -
 drivers/gpu/drm/nouveau/nouveau_acpi.h  |  2 --
 drivers/gpu/drm/nouveau/nouveau_connector.c |  2 +-
 3 files changed, 1 insertion(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_acpi.c 
b/drivers/gpu/drm/nouveau/nouveau_acpi.c
index 8f0c69aad248..de9daafb3fbb 100644
--- a/drivers/gpu/drm/nouveau/nouveau_acpi.c
+++ b/drivers/gpu/drm/nouveau/nouveau_acpi.c
@@ -360,33 +360,6 @@ void nouveau_unregister_dsm_handler(void) {}
 void nouveau_switcheroo_optimus_dsm(void) {}
 #endif
 
-void *
-nouveau_acpi_edid(struct drm_device *dev, struct drm_connector *connector)
-{
-   struct acpi_device *acpidev;
-   int type, ret;
-   void *edid;
-
-   switch (connector->connector_type) {
-   case DRM_MODE_CONNECTOR_LVDS:
-   case DRM_MODE_CONNECTOR_eDP:
-   type = ACPI_VIDEO_DISPLAY_LCD;
-   break;
-   default:
-   return NULL;
-   }
-
-   acpidev = ACPI_COMPANION(dev->dev);
-   if (!acpidev)
-   return NULL;
-
-   ret = acpi_video_get_edid(acpidev, type, -1, &edid);
-   if (ret < 0)
-   return NULL;
-
-   return kmemdup(edid, EDID_LENGTH, GFP_KERNEL);
-}
-
 bool nouveau_acpi_video_backlight_use_native(void)
 {
return acpi_video_backlight_use_native();
diff --git a/drivers/gpu/drm/nouveau/nouveau_acpi.h 
b/drivers/gpu/drm/nouveau/nouveau_acpi.h
index e39dd8b94b8b..6a3def8e6cca 100644
--- a/drivers/gpu/drm/nouveau/nouveau_acpi.h
+++ b/drivers/gpu/drm/nouveau/nouveau_acpi.h
@@ -10,7 +10,6 @@ bool nouveau_is_v1_dsm(void);
 void nouveau_register_dsm_handler(void);
 void nouveau_unregister_dsm_handler(void);
 void nouveau_switcheroo_optimus_dsm(void);
-void *nouveau_acpi_edid(struct drm_device *, struct drm_connector *);
 bool nouveau_acpi_video_backlight_use_native(void);
 void nouveau_acpi_video_register_backlight(void);
 #else
@@ -19,7 +18,6 @@ static inline bool nouveau_is_v1_dsm(void) { return false; };
 static inline void nouveau_register_dsm_handler(void) {}
 static inline void nouveau_unregister_dsm_handler(void) {}
 static inline void nouveau_switcheroo_optimus_dsm(void) {}
-static inline void *nouveau_acpi_edid(struct drm_device *dev, struct 
drm_connector *connector) { return NULL; }
 static inline bool nouveau_acpi_video_backlight_use_native(void) { return 
true; }
 static inline void nouveau_acpi_video_register_backlight(void) {}
 #endif
diff --git a/drivers/gpu/drm/nouveau/nouveau_connector.c 
b/drivers/gpu/drm/nouveau/nouveau_connector.c
index 856b3ef5edb8..746571d4cac0 100644
--- a/drivers/gpu/drm/nouveau/nouveau_connector.c
+++ b/drivers/gpu/drm/nouveau/nouveau_connector.c
@@ -713,7 +713,7 @@ nouveau_connector_detect_lvds(struct drm_connector 
*connector, bool force)
 * valid - it's not (rh#613284)
 */
if (nv_encoder->dcb->lvdsconf.use_acpi_for_edid) {
-   edid = nouveau_acpi_edid(dev, connector);
+   edid = drm_get_acpi_edid(connector);
if (edid) {
status = connector_status_connected;
goto out;
-- 
2.34.1
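
The leak described in the commit message comes from duplicating a buffer without releasing the original. A minimal userspace sketch of the corrected ownership pattern follows, with malloc/free standing in for the kernel allocators. fake_firmware_get_edid() is a hypothetical stand-in for a call that hands the caller an allocated buffer, not the real acpi_video_get_edid() interface.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define EDID_LENGTH 128

/* Hypothetical firmware call: allocates a buffer the caller must free. */
static int fake_firmware_get_edid(unsigned char **out)
{
	unsigned char *buf = malloc(EDID_LENGTH);

	if (!buf)
		return -1;
	memset(buf, 0xAB, EDID_LENGTH);
	*out = buf;
	return EDID_LENGTH;
}

/* Return a private copy of the firmware EDID, releasing the original. */
static unsigned char *get_edid_copy(void)
{
	unsigned char *fw = NULL;
	unsigned char *copy;
	int len = fake_firmware_get_edid(&fw);

	if (len < EDID_LENGTH) {
		free(fw);        /* free(NULL) is a no-op */
		return NULL;
	}

	copy = malloc(EDID_LENGTH);
	if (copy)
		memcpy(copy, fw, EDID_LENGTH);

	free(fw);                /* the step a duplicate-only wrapper omits */
	return copy;
}
```

Switching to the shared helper avoids having each driver repeat this bookkeeping.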



[PATCH v2 2/4] drm: Add drm_get_acpi_edid() helper

2024-01-30 Thread Mario Limonciello
Some manufacturers intentionally ship an EDID in the BIOS that differs
from the EDID on the laptop's internal panel.  Drivers can call this
helper to attempt to fetch the EDID from the BIOS's ACPI _DDC method.

Signed-off-by: Mario Limonciello 
---
v1->v2:
 * Split code from previous amdgpu specific helper to generic drm helper.
---
 drivers/gpu/drm/Kconfig|  4 +++
 drivers/gpu/drm/drm_edid.c | 73 ++
 include/drm/drm_edid.h |  1 +
 3 files changed, 78 insertions(+)

diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index 2520db0b776e..0065dcb63745 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -21,6 +21,10 @@ menuconfig DRM
select KCMP
select VIDEO_CMDLINE
select VIDEO_NOMODESET
+   select ACPI_VIDEO if ACPI
+   select BACKLIGHT_CLASS_DEVICE if ACPI
+   select INPUT if ACPI
+   select ACPI_WMI if X86
help
  Kernel-level support for the Direct Rendering Infrastructure (DRI)
  introduced in XFree86 4.0. If you say Y here, you need to select
diff --git a/drivers/gpu/drm/drm_edid.c b/drivers/gpu/drm/drm_edid.c
index 69c68804023f..1fbbeaa664b2 100644
--- a/drivers/gpu/drm/drm_edid.c
+++ b/drivers/gpu/drm/drm_edid.c
@@ -28,6 +28,7 @@
  * DEALINGS IN THE SOFTWARE.
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -2188,6 +2189,47 @@ drm_do_probe_ddc_edid(void *data, u8 *buf, unsigned int 
block, size_t len)
return ret == xfers ? 0 : -1;
 }
 
+/**
+ * drm_do_probe_acpi_edid() - get EDID information via ACPI _DDC
+ * @data: struct drm_device
+ * @buf: EDID data buffer to be filled
+ * @block: 128 byte EDID block to start fetching from
+ * @len: EDID data buffer length to fetch
+ *
+ * Try to fetch EDID information by calling acpi_video_get_edid() function.
+ *
+ * Return: 0 on success or error code on failure.
+ */
+static int
+drm_do_probe_acpi_edid(void *data, u8 *buf, unsigned int block, size_t len)
+{
+   struct drm_device *ddev = data;
+   struct acpi_device *acpidev = ACPI_COMPANION(ddev->dev);
+   unsigned int start = block * EDID_LENGTH;
+   void *edid;
+   int r;
+
+   if (!acpidev)
+   return -ENODEV;
+
+   /* fetch the entire edid from BIOS */
+   r = acpi_video_get_edid(acpidev, ACPI_VIDEO_DISPLAY_LCD, -1, &edid);
+   if (r < 0) {
+   DRM_DEBUG_KMS("Failed to get EDID from ACPI: %d\n", r);
+   return -EINVAL;
+   }
+   if (len > r || start > r || start + len > r) {
+   r = -EINVAL;
+   goto cleanup;
+   }
+
+   memcpy(buf, edid + start, len);
+   r = 0;
+cleanup:
+   kfree(edid);
+   return r;
+}
+
 static void connector_bad_edid(struct drm_connector *connector,
   const struct edid *edid, int num_blocks)
 {
@@ -2643,6 +2685,37 @@ struct edid *drm_get_edid(struct drm_connector 
*connector,
 }
 EXPORT_SYMBOL(drm_get_edid);
 
+/**
+ * drm_get_acpi_edid - get EDID data, if available
+ * @connector: connector we're probing
+ *
+ * Use the BIOS to attempt to grab EDID data if possible.  If found,
+ * attach it to the connector.
+ *
+ * Return: Pointer to valid EDID or NULL if we couldn't find any.
+ */
+struct edid *drm_get_acpi_edid(struct drm_connector *connector)
+{
+   struct edid *edid = NULL;
+
+   switch (connector->connector_type) {
+   case DRM_MODE_CONNECTOR_LVDS:
+   case DRM_MODE_CONNECTOR_eDP:
+   break;
+   default:
+   return NULL;
+   }
+
+   if (connector->force == DRM_FORCE_OFF)
+   return NULL;
+
+   edid = _drm_do_get_edid(connector, drm_do_probe_acpi_edid, 
connector->dev, NULL);
+
+   drm_connector_update_edid_property(connector, edid);
+   return edid;
+}
+EXPORT_SYMBOL(drm_get_acpi_edid);
+
 /**
  * drm_edid_read_custom - Read EDID data using given EDID block read function
  * @connector: Connector to use
diff --git a/include/drm/drm_edid.h b/include/drm/drm_edid.h
index 518d1b8106c7..60fbdc06badc 100644
--- a/include/drm/drm_edid.h
+++ b/include/drm/drm_edid.h
@@ -412,6 +412,7 @@ struct edid *drm_do_get_edid(struct drm_connector 
*connector,
void *data);
 struct edid *drm_get_edid(struct drm_connector *connector,
  struct i2c_adapter *adapter);
+struct edid *drm_get_acpi_edid(struct drm_connector *connector);
 u32 drm_edid_get_panel_id(struct i2c_adapter *adapter);
 struct edid *drm_get_edid_switcheroo(struct drm_connector *connector,
 struct i2c_adapter *adapter);
-- 
2.34.1
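
The block read callback boils down to a bounds-checked copy out of a firmware blob. Here is a minimal userspace sketch of that step, assuming the whole blob has already been fetched. copy_edid_block() is a hypothetical name; it uses size_t for the offset so that blocks 2 and 3 of a 512-byte EDID do not wrap, and it returns a negative errno on failure.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define EDID_LENGTH 128

/*
 * Copy @len bytes starting at EDID block @block out of @blob.
 * Returns 0 on success or -EINVAL if the request runs past @blob_len.
 */
static int copy_edid_block(const unsigned char *blob, size_t blob_len,
			   unsigned int block, unsigned char *buf, size_t len)
{
	size_t start = (size_t)block * EDID_LENGTH;

	/* Written so that neither comparison can overflow. */
	if (start > blob_len || len > blob_len - start)
		return -EINVAL;

	memcpy(buf, blob + start, len);
	return 0;
}
```

Asking for block 3 of a 512-byte blob succeeds, while asking for block 2 of a 256-byte blob fails with -EINVAL.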



[PATCH v2 1/4] ACPI: video: Handle fetching EDID that is longer than 256 bytes

2024-01-30 Thread Mario Limonciello
The ACPI specification allows for an EDID to be up to 512 bytes but
the _DDC EDID fetching code will only try up to 256 bytes.

Modify the code to start at 512 bytes and work its way
down instead.

As _DDC is now called up to 4 times on a machine, the messages
are noisier than necessary.  Decrease them from info to debug.

Link: 
https://uefi.org/htmlspecs/ACPI_Spec_6_4_html/Apx_B_Video_Extensions/output-device-specific-methods.html#ddc-return-the-edid-for-this-device
Signed-off-by: Mario Limonciello 
---
v1->v2:
 * Use for loop for acpi_video_get_edid()
 * Use one of Rafael's suggestions for acpi_video_device_EDID()
 * Decrease message level too

I was going to split this out separately, but decided to keep it in the same
series in case there is any decision to change the interface to
acpi_video_get_edid() in the same series.
---
 drivers/acpi/acpi_video.c | 25 +
 1 file changed, 9 insertions(+), 16 deletions(-)

diff --git a/drivers/acpi/acpi_video.c b/drivers/acpi/acpi_video.c
index 4afdda9db019..3bfd013e09d2 100644
--- a/drivers/acpi/acpi_video.c
+++ b/drivers/acpi/acpi_video.c
@@ -625,12 +625,9 @@ acpi_video_device_EDID(struct acpi_video_device *device,
 
if (!device)
return -ENODEV;
-   if (length == 128)
-   arg0.integer.value = 1;
-   else if (length == 256)
-   arg0.integer.value = 2;
-   else
+   if (!length || (length % 128))
return -EINVAL;
+   arg0.integer.value = length / 128;
 
status = acpi_evaluate_object(device->dev->handle, "_DDC", , 
);
if (ACPI_FAILURE(status))
@@ -641,7 +638,8 @@ acpi_video_device_EDID(struct acpi_video_device *device,
if (obj && obj->type == ACPI_TYPE_BUFFER)
*edid = obj;
else {
-   acpi_handle_info(device->dev->handle, "Invalid _DDC data\n");
+   acpi_handle_debug(device->dev->handle,
+"Invalid _DDC data for length %ld\n", length);
status = -EFAULT;
kfree(obj);
}
@@ -1447,7 +1445,6 @@ int acpi_video_get_edid(struct acpi_device *device, int 
type, int device_id,
 
for (i = 0; i < video->attached_count; i++) {
video_device = video->attached_array[i].bind_info;
-   length = 256;
 
if (!video_device)
continue;
@@ -1478,18 +1475,14 @@ int acpi_video_get_edid(struct acpi_device *device, int 
type, int device_id,
continue;
}
 
-   status = acpi_video_device_EDID(video_device, , length);
-
-   if (ACPI_FAILURE(status) || !buffer ||
-   buffer->type != ACPI_TYPE_BUFFER) {
-   length = 128;
+   for (length = 512; length > 0; length -= 128) {
status = acpi_video_device_EDID(video_device, ,
length);
-   if (ACPI_FAILURE(status) || !buffer ||
-   buffer->type != ACPI_TYPE_BUFFER) {
-   continue;
-   }
+   if (ACPI_SUCCESS(status))
+   break;
}
+   if (!length)
+   continue;
 
*edid = buffer->buffer.pointer;
return length;
-- 
2.34.1
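
The length handling in this patch can be modelled in userspace as below. This is a sketch under assumptions: ddc_eval_fn and fake_ddc() are hypothetical stand-ins for evaluating _DDC, and only the arithmetic (length must be a non-zero multiple of 128, the argument is length / 128, and lengths are tried from 512 downwards) comes from the diff.

```c
#include <assert.h>
#include <stddef.h>

#define EDID_BLOCK 128

/* Hypothetical stand-in for evaluating _DDC with @blocks as argument. */
typedef int (*ddc_eval_fn)(int blocks, void *ctx);

/* Map a byte length to the _DDC argument, or -1 if the length is invalid. */
static int ddc_arg_for_length(int length)
{
	if (length <= 0 || length % EDID_BLOCK)
		return -1;
	return length / EDID_BLOCK;
}

/* Try 512, 384, 256, 128 bytes in turn; return the first length that works. */
static int fetch_edid_length(ddc_eval_fn eval, void *ctx)
{
	int length;

	for (length = 512; length > 0; length -= EDID_BLOCK) {
		int arg = ddc_arg_for_length(length);

		if (arg > 0 && eval(arg, ctx) == 0)
			return length;
	}
	return 0;   /* nothing worked */
}

/* Example firmware that supports at most *ctx blocks. */
static int fake_ddc(int blocks, void *ctx)
{
	int max_blocks = *(int *)ctx;

	return blocks <= max_blocks ? 0 : -1;
}
```

With a firmware that only supports two blocks, the loop makes three calls before succeeding at 256 bytes, which is why the commit message lowers the failure message from info to debug.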



[PATCH v2 0/4] Add support for fetching EDID from ACPI _DDC

2024-01-30 Thread Mario Limonciello
Some laptops ship an EDID in the BIOS encoded in the _DDC method that
differs from the EDID directly on the laptop panel for $REASONS.

This is the EDID that is used by the AMD Windows driver, and so sometimes
different results are found in different operating systems.

This series adds a new DRM helper that will use acpi_video to fetch the
EDID.

On amdgpu, when an eDP panel is found, the BIOS
is checked first for an EDID, and that EDID is preferred if found.

On nouveau it replaces the previous local function doing a similar role.

This does *not* use struct drm_edid as this will require more involved
amdgpu display driver work that will come separately as part of follow-ups
to: https://lore.kernel.org/amd-gfx/20240126163429.56714-1-m...@igalia.com/

Mario Limonciello (4):
  ACPI: video: Handle fetching EDID that is longer than 256 bytes
  drm: Add drm_get_acpi_edid() helper
  drm/amd: Fetch the EDID from _DDC if available for eDP
  drm/nouveau: Use drm_get_acpi_edid() helper

 drivers/acpi/acpi_video.c | 25 +++
 drivers/gpu/drm/Kconfig   |  4 +
 drivers/gpu/drm/amd/amdgpu/amdgpu.h   |  1 +
 .../gpu/drm/amd/amdgpu/amdgpu_connectors.c|  4 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c   |  8 ++
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 10 ++-
 .../amd/display/amdgpu_dm/amdgpu_dm_helpers.c |  9 ++-
 drivers/gpu/drm/drm_edid.c| 73 +++
 drivers/gpu/drm/nouveau/nouveau_acpi.c| 27 ---
 drivers/gpu/drm/nouveau/nouveau_acpi.h|  2 -
 drivers/gpu/drm/nouveau/nouveau_connector.c   |  2 +-
 include/drm/drm_edid.h|  1 +
 12 files changed, 115 insertions(+), 51 deletions(-)

-- 
2.34.1



RE: [PATCH] drm/amdgpu: Clear the hotplug interrupt ack bit before hpd initialization

2024-01-30 Thread Deucher, Alexander
[Public]

> -Original Message-
> From: amd-gfx  On Behalf Of Qiang
> Ma
> Sent: Tuesday, January 30, 2024 4:35 AM
> To: lexander.deuc...@amd.com; Koenig, Christian
> ; Pan, Xinhui ;
> airl...@gmail.com; dan...@ffwll.ch; sunran...@208suo.com;
> SHANMUGAM, SRINIVASAN 
> Cc: Qiang Ma ; dri-de...@lists.freedesktop.org;
> amd-gfx@lists.freedesktop.org; linux-ker...@vger.kernel.org
> Subject: [PATCH] drm/amdgpu: Clear the hotplug interrupt ack bit before hpd
> initialization
>
> Problem:
> If the HDMI display is unplugged while the BIOS is initializing and
> plugged back in once the system is up, the hotplug interrupt function
> is never entered and the display stays dark.
>
> Fix:
> In this situation the hpd interrupt ack bit is 1, so the interrupt
> should be cleared during hpd_init so that, once the driver is ready,
> it can respond to the hpd interrupt normally.
>
> Signed-off-by: Qiang Ma 
> ---
>  drivers/gpu/drm/amd/amdgpu/dce_v10_0.c |  2 ++
> drivers/gpu/drm/amd/amdgpu/dce_v11_0.c |  2 ++
> drivers/gpu/drm/amd/amdgpu/dce_v6_0.c  | 20 +---
> drivers/gpu/drm/amd/amdgpu/dce_v8_0.c  | 20 +---
>  4 files changed, 38 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c
> b/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c
> index bb666cb7522e..11859059fd10 100644
> --- a/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c
> @@ -51,6 +51,7 @@
>
>  static void dce_v10_0_set_display_funcs(struct amdgpu_device *adev);
> static void dce_v10_0_set_irq_funcs(struct amdgpu_device *adev);
> +static void dce_v10_0_hpd_int_ack(struct amdgpu_device *adev, int hpd);
>
>  static const u32 crtc_offsets[] = {
>   CRTC0_REGISTER_OFFSET,
> @@ -363,6 +364,7 @@ static void dce_v10_0_hpd_init(struct
> amdgpu_device *adev)
>
> AMDGPU_HPD_DISCONNECT_INT_DELAY_IN_MS);
>   WREG32(mmDC_HPD_TOGGLE_FILT_CNTL +
> hpd_offsets[amdgpu_connector->hpd.hpd], tmp);
>
> + dce_v6_0_hpd_int_ack(adev, amdgpu_connector->hpd.hpd);


Should be dce_v10_0_hpd_int_ack().
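
For reference, the ack helper this patch adds per DCE version is a bounds-checked read-modify-write of the per-pin interrupt control register. A minimal user-space sketch of that pattern follows. The register array, NUM_HPD, and HPD_INT_ACK_MASK are stand-ins invented for illustration, not the real amdgpu definitions, and the sketch does not model how the hardware reacts to the ack bit.

```c
#include <stdint.h>
#include <stdio.h>

#define NUM_HPD 6              /* hypothetical number of HPD pins */
#define HPD_INT_ACK_MASK 0x1u  /* hypothetical position of the ack bit */

/* Stand-in for the per-pin interrupt control registers. */
static uint32_t hpd_int_control[NUM_HPD];

/* Set the ack bit for one HPD pin; returns 0 on success, -1 if out of range. */
static int hpd_int_ack(int hpd)
{
	uint32_t tmp;

	if (hpd < 0 || hpd >= NUM_HPD) {
		fprintf(stderr, "invalid hpd %d\n", hpd);
		return -1;
	}

	tmp = hpd_int_control[hpd];  /* read the current register value */
	tmp |= HPD_INT_ACK_MASK;     /* set only the ack bit */
	hpd_int_control[hpd] = tmp;  /* write the result back */
	return 0;
}
```

Because each DCE file would carry its own static copy of such a helper, the call site has to use the name declared in the same file.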

>   dce_v10_0_hpd_set_polarity(adev, amdgpu_connector-
> >hpd.hpd);
>   amdgpu_irq_get(adev, &adev->hpd_irq,
>  amdgpu_connector->hpd.hpd);
> diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
> b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
> index 7af277f61cca..745e4fdffade 100644
> --- a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
> @@ -51,6 +51,7 @@
>
>  static void dce_v11_0_set_display_funcs(struct amdgpu_device *adev);
> static void dce_v11_0_set_irq_funcs(struct amdgpu_device *adev);
> +static void dce_v11_0_hpd_int_ack(struct amdgpu_device *adev, int hpd);
>
>  static const u32 crtc_offsets[] =
>  {
> @@ -387,6 +388,7 @@ static void dce_v11_0_hpd_init(struct
> amdgpu_device *adev)
>
> AMDGPU_HPD_DISCONNECT_INT_DELAY_IN_MS);
>   WREG32(mmDC_HPD_TOGGLE_FILT_CNTL +
> hpd_offsets[amdgpu_connector->hpd.hpd], tmp);
>
> + dce_v11_0_hpd_int_ack(adev, amdgpu_connector-
> >hpd.hpd);
>   dce_v11_0_hpd_set_polarity(adev, amdgpu_connector-
> >hpd.hpd);
>   amdgpu_irq_get(adev, &adev->hpd_irq, amdgpu_connector-
> >hpd.hpd);
>   }
> diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v6_0.c
> b/drivers/gpu/drm/amd/amdgpu/dce_v6_0.c
> index 143efc37a17f..f8e15ebf74b4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/dce_v6_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/dce_v6_0.c
> @@ -272,6 +272,21 @@ static void dce_v6_0_hpd_set_polarity(struct
> amdgpu_device *adev,
>   WREG32(mmDC_HPD1_INT_CONTROL + hpd_offsets[hpd], tmp);
>  }
>
> +static void dce_v6_0_hpd_int_ack(struct amdgpu_device *adev,
> +  int hpd)
> +{
> + u32 tmp;
> +
> + if (hpd >= adev->mode_info.num_hpd) {
> + DRM_DEBUG("invalid hdp %d\n", hpd);
> + return;
> + }
> +
> + tmp = RREG32(mmDC_HPD1_INT_CONTROL + hpd_offsets[hpd]);
> + tmp |= DC_HPD1_INT_CONTROL__DC_HPD1_INT_ACK_MASK;
> + WREG32(mmDC_HPD1_INT_CONTROL + hpd_offsets[hpd], tmp);
> +}
> +
>  /**
>   * dce_v6_0_hpd_init - hpd setup callback.
>   *
> @@ -311,6 +326,7 @@ static void dce_v6_0_hpd_init(struct amdgpu_device
> *adev)
>   continue;
>   }
>
> + dce_v6_0_hpd_int_ack(adev, amdgpu_connector->hpd.hpd);
>   dce_v6_0_hpd_set_polarity(adev, amdgpu_connector-
> >hpd.hpd);
>   amdgpu_irq_get(adev, &adev->hpd_irq, amdgpu_connector-
> >hpd.hpd);
>   }
> @@ -3101,9 +3117,7 @@ static int dce_v6_0_hpd_irq(struct amdgpu_device
> *adev,
>   mask = interrupt_status_offsets[hpd].hpd;
>
>   if (disp_int & mask) {
> - tmp = RREG32(mmDC_HPD1_INT_CONTROL +
> hpd_offsets[hpd]);
> - tmp |=
> DC_HPD1_INT_CONTROL__DC_HPD1_INT_ACK_MASK;
> - WREG32(mmDC_HPD1_INT_CONTROL + hpd_offsets[hpd],
> tmp);
> + 

Re: [PATCH] drm/amdgpu: remove golden setting for gfx 11.5.0

2024-01-30 Thread Deucher, Alexander
[AMD Official Use Only - General]

Acked-by: Alex Deucher 

From: Zhang, Yifan 
Sent: Monday, January 29, 2024 4:06 AM
To: amd-gfx@lists.freedesktop.org 
Cc: Deucher, Alexander ; Koenig, Christian 
; Huang, Tim ; Yu, Lang 
; Zhang, Yifan 
Subject: [PATCH] drm/amdgpu: remove golden setting for gfx 11.5.0

No need to set golden settings in driver from gfx 11.5.0 onwards

Signed-off-by: Yifan Zhang 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 32 ++
 1 file changed, 2 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index c1e10760..4e99af904e04 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -90,10 +90,6 @@ MODULE_FIRMWARE("amdgpu/gc_11_5_0_me.bin");
 MODULE_FIRMWARE("amdgpu/gc_11_5_0_mec.bin");
 MODULE_FIRMWARE("amdgpu/gc_11_5_0_rlc.bin");

-static const struct soc15_reg_golden golden_settings_gc_11_0[] = {
-   SOC15_REG_GOLDEN_VALUE(GC, 0, regTCP_CNTL, 0x2000, 0x2000)
-};
-
 static const struct soc15_reg_golden golden_settings_gc_11_0_1[] =
 {
 SOC15_REG_GOLDEN_VALUE(GC, 0, regCGTT_GS_NGG_CLK_CTRL, 0x9fff8fff, 
0x0010),
@@ -104,24 +100,8 @@ static const struct soc15_reg_golden 
golden_settings_gc_11_0_1[] =
 SOC15_REG_GOLDEN_VALUE(GC, 0, regPA_SC_ENHANCE_3, 0xfffd, 
0x0008),
 SOC15_REG_GOLDEN_VALUE(GC, 0, regPA_SC_VRS_SURFACE_CNTL_1, 0xfff891ff, 
0x55480100),
 SOC15_REG_GOLDEN_VALUE(GC, 0, regTA_CNTL_AUX, 0xf7f7, 0x0103),
-   SOC15_REG_GOLDEN_VALUE(GC, 0, regTCP_CNTL2, 0xfcff, 0x000a)
-};
-
-static const struct soc15_reg_golden golden_settings_gc_11_5_0[] = {
-   SOC15_REG_GOLDEN_VALUE(GC, 0, regDB_DEBUG5, 0x, 0x0800),
-   SOC15_REG_GOLDEN_VALUE(GC, 0, regGB_ADDR_CONFIG, 0x0c1807ff, 
0x0242),
-   SOC15_REG_GOLDEN_VALUE(GC, 0, regGCR_GENERAL_CNTL, 0x1ff1, 
0x0500),
-   SOC15_REG_GOLDEN_VALUE(GC, 0, regGL2A_ADDR_MATCH_MASK, 0x, 
0xfff3),
-   SOC15_REG_GOLDEN_VALUE(GC, 0, regGL2C_ADDR_MATCH_MASK, 0x, 
0xfff3),
-   SOC15_REG_GOLDEN_VALUE(GC, 0, regGL2C_CTRL, 0x, 0xf37fff3f),
-   SOC15_REG_GOLDEN_VALUE(GC, 0, regGL2C_CTRL3, 0xfffb, 0x00f40188),
-   SOC15_REG_GOLDEN_VALUE(GC, 0, regGL2C_CTRL4, 0xf0ff, 0x80009007),
-   SOC15_REG_GOLDEN_VALUE(GC, 0, regPA_CL_ENHANCE, 0xf1ff, 0x00880007),
-   SOC15_REG_GOLDEN_VALUE(GC, 0, regPC_CONFIG_CNTL_1, 0x, 
0x0001),
-   SOC15_REG_GOLDEN_VALUE(GC, 0, regTA_CNTL_AUX, 0xf7f7, 0x0103),
-   SOC15_REG_GOLDEN_VALUE(GC, 0, regTA_CNTL2, 0x007f, 0x),
-   SOC15_REG_GOLDEN_VALUE(GC, 0, regTCP_CNTL2, 0xffcf, 0x200a),
-   SOC15_REG_GOLDEN_VALUE(GC, 0, regUTCL1_CTRL_2, 0x, 0x048f)
+   SOC15_REG_GOLDEN_VALUE(GC, 0, regTCP_CNTL2, 0xfcff, 0x000a),
+   SOC15_REG_GOLDEN_VALUE(GC, 0, regTCP_CNTL, 0x2000, 0x2000)
 };

 #define DEFAULT_SH_MEM_CONFIG \
@@ -304,17 +284,9 @@ static void gfx_v11_0_init_golden_registers(struct 
amdgpu_device *adev)
 golden_settings_gc_11_0_1,
 (const 
u32)ARRAY_SIZE(golden_settings_gc_11_0_1));
 break;
-   case IP_VERSION(11, 5, 0):
-   soc15_program_register_sequence(adev,
-   golden_settings_gc_11_5_0,
-   (const 
u32)ARRAY_SIZE(golden_settings_gc_11_5_0));
-   break;
 default:
 break;
 }
-   soc15_program_register_sequence(adev,
-   golden_settings_gc_11_0,
-   (const 
u32)ARRAY_SIZE(golden_settings_gc_11_0));

 }

--
2.37.3



Re: [PATCH 5/6] drm/i915: Update shared stats to use the new gem helper

2024-01-30 Thread Tvrtko Ursulin




On 30/01/2024 16:12, Alex Deucher wrote:

Switch to using the new gem shared memory stats helper
rather than hand rolling it.

Link: 
https://lore.kernel.org/all/20231207180225.439482-1-alexander.deuc...@amd.com/
Signed-off-by: Alex Deucher 
---
  drivers/gpu/drm/i915/i915_drm_client.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drm_client.c 
b/drivers/gpu/drm/i915/i915_drm_client.c
index fa6852713bee..f58682505491 100644
--- a/drivers/gpu/drm/i915/i915_drm_client.c
+++ b/drivers/gpu/drm/i915/i915_drm_client.c
@@ -53,7 +53,7 @@ obj_meminfo(struct drm_i915_gem_object *obj,
obj->mm.region->id : INTEL_REGION_SMEM;
const u64 sz = obj->base.size;
  
-	if (obj->base.handle_count > 1)

+   if (drm_gem_object_is_shared_for_memory_stats(&obj->base))
stats[id].shared += sz;
else
stats[id].private += sz;


Reviewed-by: Tvrtko Ursulin 

Good that you remembered this story, I completely forgot!

Regards,

Tvrtko


Re: [PATCH 2/6] drm: add drm_gem_object_is_shared_for_memory_stats() helper

2024-01-30 Thread Tvrtko Ursulin



On 30/01/2024 16:12, Alex Deucher wrote:

Add a helper so that drm drivers can consistently report
shared status via the fdinfo shared memory stats interface.

In addition to handle count, show buffers as shared if they
are shared via dma-buf as well (e.g., shared with v4l or some
other subsystem).

Link: 
https://lore.kernel.org/all/20231207180225.439482-1-alexander.deuc...@amd.com/
Signed-off-by: Alex Deucher 
---
  drivers/gpu/drm/drm_gem.c | 16 
  include/drm/drm_gem.h |  1 +
  2 files changed, 17 insertions(+)

diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index 44a948b80ee1..71b5f628d828 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -1506,3 +1506,19 @@ int drm_gem_evict(struct drm_gem_object *obj)
return 0;
  }
  EXPORT_SYMBOL(drm_gem_evict);
+
+/**
+ * drm_gem_object_is_shared_for_memory_stats - helper for shared memory stats
+ *
+ * This helper should only be used for fdinfo shared memory stats to determine
+ * if a GEM object is shared.
+ *
+ * @obj: obj in question
+ */
+bool drm_gem_object_is_shared_for_memory_stats(struct drm_gem_object *obj)
+{
+   if ((obj->handle_count > 1) || obj->dma_buf)
+   return true;
+   return false;
+}
+EXPORT_SYMBOL(drm_gem_object_is_shared_for_memory_stats);
diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
index 369505447acd..86a9c696f038 100644
--- a/include/drm/drm_gem.h
+++ b/include/drm/drm_gem.h
@@ -552,6 +552,7 @@ unsigned long drm_gem_lru_scan(struct drm_gem_lru *lru,
   bool (*shrink)(struct drm_gem_object *obj));
  
  int drm_gem_evict(struct drm_gem_object *obj);

+bool drm_gem_object_is_shared_for_memory_stats(struct drm_gem_object *obj);
  
  #ifdef CONFIG_LOCKDEP

  /**


Not sure what the local view on static inlines is, but fine nevertheless.

Reviewed-by: Tvrtko Ursulin 

Regards,

Tvrtko


Re: [PATCH 3/6] drm: update drm_show_memory_stats() for dma-bufs

2024-01-30 Thread Tvrtko Ursulin




On 30/01/2024 16:12, Alex Deucher wrote:

Show buffers as shared if they are shared via dma-buf as well
(e.g., shared with v4l or some other subsystem).

v2: switch to gem helper

Link: 
https://lore.kernel.org/all/20231207180225.439482-1-alexander.deuc...@amd.com/
Reviewed-by: Rob Clark  (v1)
Signed-off-by: Alex Deucher 
Cc: Rob Clark 
---
  drivers/gpu/drm/drm_file.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
index 8c87287c3e16..638ffaf5 100644
--- a/drivers/gpu/drm/drm_file.c
+++ b/drivers/gpu/drm/drm_file.c
@@ -913,7 +913,7 @@ void drm_show_memory_stats(struct drm_printer *p, struct 
drm_file *file)
DRM_GEM_OBJECT_PURGEABLE;
}
  
-		if (obj->handle_count > 1) {

+   if (drm_gem_object_is_shared_for_memory_stats(obj)) {
status.shared += obj->size;
} else {
status.private += obj->size;


Reviewed-by: Tvrtko Ursulin 

Regards,

Tvrtko


Re: [PATCH 1/6] Documentation/gpu: Update documentation on drm-shared-*

2024-01-30 Thread Tvrtko Ursulin



On 30/01/2024 16:12, Alex Deucher wrote:

Clarify the documentation in preparation for updated
helpers which check the handle count as well as whether
a dma-buf has been attached.

Link: 
https://lore.kernel.org/all/20231207180225.439482-1-alexander.deuc...@amd.com/
Signed-off-by: Alex Deucher 
---
  Documentation/gpu/drm-usage-stats.rst | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/gpu/drm-usage-stats.rst 
b/Documentation/gpu/drm-usage-stats.rst
index 7aca5c7a7b1d..6dc299343b48 100644
--- a/Documentation/gpu/drm-usage-stats.rst
+++ b/Documentation/gpu/drm-usage-stats.rst
@@ -138,7 +138,7 @@ indicating kibi- or mebi-bytes.
  
  - drm-shared-:  [KiB|MiB]
  
-The total size of buffers that are shared with another file (ie. have more

+The total size of buffers that are shared with another file (e.g., have more
  than a single handle).
  
  - drm-total-:  [KiB|MiB]


Reviewed-by: Tvrtko Ursulin 

Regards,

Tvrtko


Re: [PATCH 2/6] drm: add drm_gem_object_is_shared_for_memory_stats() helper

2024-01-30 Thread Hamza Mahfooz

On 1/30/24 11:12, Alex Deucher wrote:

Add a helper so that drm drivers can consistently report
shared status via the fdinfo shared memory stats interface.

In addition to handle count, show buffers as shared if they
are shared via dma-buf as well (e.g., shared with v4l or some
other subsystem).

Link: 
https://lore.kernel.org/all/20231207180225.439482-1-alexander.deuc...@amd.com/
Signed-off-by: Alex Deucher 
---
  drivers/gpu/drm/drm_gem.c | 16 
  include/drm/drm_gem.h |  1 +
  2 files changed, 17 insertions(+)

diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index 44a948b80ee1..71b5f628d828 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -1506,3 +1506,19 @@ int drm_gem_evict(struct drm_gem_object *obj)
return 0;
  }
  EXPORT_SYMBOL(drm_gem_evict);
+
+/**
+ * drm_gem_object_is_shared_for_memory_stats - helper for shared memory stats
+ *
+ * This helper should only be used for fdinfo shared memory stats to determine
+ * if a GEM object is shared.
+ *
+ * @obj: obj in question
+ */
+bool drm_gem_object_is_shared_for_memory_stats(struct drm_gem_object *obj)
+{
+   if ((obj->handle_count > 1) || obj->dma_buf)
+   return true;
+   return false;


nit: you can simplify this to:
return (obj->handle_count > 1) || obj->dma_buf;

(It may be worth just inlining this in drm_gem.h).
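
A minimal sketch of that simplification, written as a static inline the way it might look in a header. The struct gem_object_sketch type is a cut-down stand-in holding only the two fields the check reads, not the real struct drm_gem_object, and the function name is likewise hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Cut-down stand-in for struct drm_gem_object (illustration only). */
struct gem_object_sketch {
	unsigned int handle_count;  /* number of userspace handles */
	void *dma_buf;              /* non-NULL once exported via dma-buf */
};

/* Shared if more than one handle exists or the object has a dma-buf. */
static inline bool
gem_object_is_shared_sketch(const struct gem_object_sketch *obj)
{
	return (obj->handle_count > 1) || obj->dma_buf;
}
```

The logical OR already yields 0 or 1, so the explicit if/return true/return false adds nothing.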


+}
+EXPORT_SYMBOL(drm_gem_object_is_shared_for_memory_stats);
diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
index 369505447acd..86a9c696f038 100644
--- a/include/drm/drm_gem.h
+++ b/include/drm/drm_gem.h
@@ -552,6 +552,7 @@ unsigned long drm_gem_lru_scan(struct drm_gem_lru *lru,
   bool (*shrink)(struct drm_gem_object *obj));
  
  int drm_gem_evict(struct drm_gem_object *obj);

+bool drm_gem_object_is_shared_for_memory_stats(struct drm_gem_object *obj);
  
  #ifdef CONFIG_LOCKDEP

  /**

--
Hamza



[PATCH 5/6] drm/i915: Update shared stats to use the new gem helper

2024-01-30 Thread Alex Deucher
Switch to using the new gem shared memory stats helper
rather than hand rolling it.

Link: 
https://lore.kernel.org/all/20231207180225.439482-1-alexander.deuc...@amd.com/
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/i915/i915_drm_client.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drm_client.c 
b/drivers/gpu/drm/i915/i915_drm_client.c
index fa6852713bee..f58682505491 100644
--- a/drivers/gpu/drm/i915/i915_drm_client.c
+++ b/drivers/gpu/drm/i915/i915_drm_client.c
@@ -53,7 +53,7 @@ obj_meminfo(struct drm_i915_gem_object *obj,
obj->mm.region->id : INTEL_REGION_SMEM;
const u64 sz = obj->base.size;
 
-   if (obj->base.handle_count > 1)
+   if (drm_gem_object_is_shared_for_memory_stats(&obj->base))
stats[id].shared += sz;
else
stats[id].private += sz;
-- 
2.42.0



[PATCH 6/6] drm/xe: Update shared stats to use the new gem helper

2024-01-30 Thread Alex Deucher
Switch to using the new gem shared memory stats helper
rather than hand rolling it.

Link: 
https://lore.kernel.org/all/20231207180225.439482-1-alexander.deuc...@amd.com/
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/xe/xe_drm_client.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_drm_client.c 
b/drivers/gpu/drm/xe/xe_drm_client.c
index 82d1305e831f..ecf2eb67d310 100644
--- a/drivers/gpu/drm/xe/xe_drm_client.c
+++ b/drivers/gpu/drm/xe/xe_drm_client.c
@@ -113,7 +113,7 @@ static void bo_meminfo(struct xe_bo *bo,
else
mem_type = XE_PL_TT;
 
-   if (bo->ttm.base.handle_count > 1)
+   if (drm_gem_object_is_shared_for_memory_stats(&bo->ttm.base))
stats[mem_type].shared += sz;
else
stats[mem_type].private += sz;
-- 
2.42.0



[PATCH 2/6] drm: add drm_gem_object_is_shared_for_memory_stats() helper

2024-01-30 Thread Alex Deucher
Add a helper so that drm drivers can consistently report
shared status via the fdinfo shared memory stats interface.

In addition to handle count, show buffers as shared if they
are shared via dma-buf as well (e.g., shared with v4l or some
other subsystem).

Link: 
https://lore.kernel.org/all/20231207180225.439482-1-alexander.deuc...@amd.com/
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/drm_gem.c | 16 
 include/drm/drm_gem.h |  1 +
 2 files changed, 17 insertions(+)

diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index 44a948b80ee1..71b5f628d828 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -1506,3 +1506,19 @@ int drm_gem_evict(struct drm_gem_object *obj)
return 0;
 }
 EXPORT_SYMBOL(drm_gem_evict);
+
+/**
+ * drm_gem_object_is_shared_for_memory_stats - helper for shared memory stats
+ *
+ * This helper should only be used for fdinfo shared memory stats to determine
+ * if a GEM object is shared.
+ *
+ * @obj: obj in question
+ */
+bool drm_gem_object_is_shared_for_memory_stats(struct drm_gem_object *obj)
+{
+   if ((obj->handle_count > 1) || obj->dma_buf)
+   return true;
+   return false;
+}
+EXPORT_SYMBOL(drm_gem_object_is_shared_for_memory_stats);
diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
index 369505447acd..86a9c696f038 100644
--- a/include/drm/drm_gem.h
+++ b/include/drm/drm_gem.h
@@ -552,6 +552,7 @@ unsigned long drm_gem_lru_scan(struct drm_gem_lru *lru,
   bool (*shrink)(struct drm_gem_object *obj));
 
 int drm_gem_evict(struct drm_gem_object *obj);
+bool drm_gem_object_is_shared_for_memory_stats(struct drm_gem_object *obj);
 
 #ifdef CONFIG_LOCKDEP
 /**
-- 
2.42.0



[PATCH 1/6] Documentation/gpu: Update documentation on drm-shared-*

2024-01-30 Thread Alex Deucher
Clarify the documentation in preparation for updated
helpers which check the handle count as well as whether
a dma-buf has been attached.

Link: 
https://lore.kernel.org/all/20231207180225.439482-1-alexander.deuc...@amd.com/
Signed-off-by: Alex Deucher 
---
 Documentation/gpu/drm-usage-stats.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/gpu/drm-usage-stats.rst 
b/Documentation/gpu/drm-usage-stats.rst
index 7aca5c7a7b1d..6dc299343b48 100644
--- a/Documentation/gpu/drm-usage-stats.rst
+++ b/Documentation/gpu/drm-usage-stats.rst
@@ -138,7 +138,7 @@ indicating kibi- or mebi-bytes.
 
 - drm-shared-:  [KiB|MiB]
 
-The total size of buffers that are shared with another file (ie. have more
+The total size of buffers that are shared with another file (e.g., have more
 than a single handle).
 
 - drm-total-:  [KiB|MiB]
-- 
2.42.0



[PATCH 3/6] drm: update drm_show_memory_stats() for dma-bufs

2024-01-30 Thread Alex Deucher
Show buffers as shared if they are shared via dma-buf as well
(e.g., shared with v4l or some other subsystem).

v2: switch to gem helper

Link: 
https://lore.kernel.org/all/20231207180225.439482-1-alexander.deuc...@amd.com/
Reviewed-by: Rob Clark  (v1)
Signed-off-by: Alex Deucher 
Cc: Rob Clark 
---
 drivers/gpu/drm/drm_file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
index 8c87287c3e16..638ffaf5 100644
--- a/drivers/gpu/drm/drm_file.c
+++ b/drivers/gpu/drm/drm_file.c
@@ -913,7 +913,7 @@ void drm_show_memory_stats(struct drm_printer *p, struct 
drm_file *file)
DRM_GEM_OBJECT_PURGEABLE;
}
 
-   if (obj->handle_count > 1) {
+   if (drm_gem_object_is_shared_for_memory_stats(obj)) {
status.shared += obj->size;
} else {
status.private += obj->size;
-- 
2.42.0



[PATCH 4/6] drm/amdgpu: add shared fdinfo stats

2024-01-30 Thread Alex Deucher
Add shared stats.  Useful for seeing shared memory.

v2: take dma-buf into account as well
v3: use the new gem helper

Link: 
https://lore.kernel.org/all/20231207180225.439482-1-alexander.deuc...@amd.com/
Signed-off-by: Alex Deucher 
Cc: Rob Clark 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c |  4 
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 11 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |  6 ++
 3 files changed, 21 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
index 5706b282a0c7..c7df7fa3459f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
@@ -97,6 +97,10 @@ void amdgpu_show_fdinfo(struct drm_printer *p, struct 
drm_file *file)
   stats.requested_visible_vram/1024UL);
drm_printf(p, "amd-requested-gtt:\t%llu KiB\n",
   stats.requested_gtt/1024UL);
+   drm_printf(p, "drm-shared-vram:\t%llu KiB\n", stats.vram_shared/1024UL);
+   drm_printf(p, "drm-shared-gtt:\t%llu KiB\n", stats.gtt_shared/1024UL);
+   drm_printf(p, "drm-shared-cpu:\t%llu KiB\n", stats.cpu_shared/1024UL);
+
for (hw_ip = 0; hw_ip < AMDGPU_HW_IP_NUM; ++hw_ip) {
if (!usage[hw_ip])
continue;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 425cebcc5cbf..e6f69fce539b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -1276,25 +1276,36 @@ void amdgpu_bo_get_memory(struct amdgpu_bo *bo,
  struct amdgpu_mem_stats *stats)
 {
uint64_t size = amdgpu_bo_size(bo);
+   struct drm_gem_object *obj;
unsigned int domain;
+   bool shared;
 
/* Abort if the BO doesn't currently have a backing store */
if (!bo->tbo.resource)
return;
 
+   obj = &bo->tbo.base;
+   shared = drm_gem_object_is_shared_for_memory_stats(obj);
+
domain = amdgpu_mem_type_to_domain(bo->tbo.resource->mem_type);
switch (domain) {
case AMDGPU_GEM_DOMAIN_VRAM:
stats->vram += size;
if (amdgpu_bo_in_cpu_visible_vram(bo))
stats->visible_vram += size;
+   if (shared)
+   stats->vram_shared += size;
break;
case AMDGPU_GEM_DOMAIN_GTT:
stats->gtt += size;
+   if (shared)
+   stats->gtt_shared += size;
break;
case AMDGPU_GEM_DOMAIN_CPU:
default:
stats->cpu += size;
+   if (shared)
+   stats->cpu_shared += size;
break;
}
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
index a3ea8a82db23..be679c42b0b8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
@@ -138,12 +138,18 @@ struct amdgpu_bo_vm {
 struct amdgpu_mem_stats {
/* current VRAM usage, includes visible VRAM */
uint64_t vram;
+   /* current shared VRAM usage, includes visible VRAM */
+   uint64_t vram_shared;
/* current visible VRAM usage */
uint64_t visible_vram;
/* current GTT usage */
uint64_t gtt;
+   /* current shared GTT usage */
+   uint64_t gtt_shared;
/* current system memory usage */
uint64_t cpu;
+   /* current shared system memory usage */
+   uint64_t cpu_shared;
/* sum of evicted buffers, includes visible VRAM */
uint64_t evicted_vram;
/* sum of evicted buffers due to CPU access */
-- 
2.42.0



[PATCH 0/6 V3] fdinfo shared stats

2024-01-30 Thread Alex Deucher
We had a request to add shared buffer stats to fdinfo for amdgpu and
while implementing that, Christian mentioned that just looking at
the GEM handle count doesn't take into account buffers shared with other
subsystems like V4L or RDMA.  Those subsystems don't use GEM, so it
doesn't really matter from a GPU top perspective, but it's more
correct if you actually want to see shared buffers.

After further discussions, add a helper and update all fdinfo
implementations to use that helper for consistency.
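
As a rough illustration of the accounting change, the sketch below totals buffer sizes into shared and private buckets using the combined test (more than one handle, or a dma-buf attached). The struct buf_sketch and struct mem_totals types and both function names are invented for this example and do not match the kernel structures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-buffer record, reduced to what the check needs. */
struct buf_sketch {
	uint64_t size;
	unsigned int handle_count;
	bool has_dma_buf;
};

/* Hypothetical totals, analogous to the shared/private fdinfo counters. */
struct mem_totals {
	uint64_t shared_bytes;
	uint64_t private_bytes;
};

/* Combined test: multiple handles or a dma-buf attachment. */
static bool buf_is_shared(const struct buf_sketch *b)
{
	return b->handle_count > 1 || b->has_dma_buf;
}

/* Walk the buffers once and add each size to exactly one bucket. */
static struct mem_totals tally_buffers(const struct buf_sketch *bufs, size_t n)
{
	struct mem_totals t = { 0, 0 };

	for (size_t i = 0; i < n; i++) {
		if (buf_is_shared(&bufs[i]))
			t.shared_bytes += bufs[i].size;
		else
			t.private_bytes += bufs[i].size;
	}
	return t;
}
```

A buffer with a single handle that has been exported through dma-buf lands in the shared bucket here, whereas a handle-count-only check would have counted it as private.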

Alex Deucher (6):
  Documentation/gpu: Update documentation on drm-shared-*
  drm: add drm_gem_object_is_shared_for_memory_stats() helper
  drm: update drm_show_memory_stats() for dma-bufs
  drm/amdgpu: add shared fdinfo stats
  drm/i915: Update shared stats to use the new gem helper
  drm/xe: Update shared stats to use the new gem helper

 Documentation/gpu/drm-usage-stats.rst  |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c |  4 
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 11 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |  6 ++
 drivers/gpu/drm/drm_file.c |  2 +-
 drivers/gpu/drm/drm_gem.c  | 16 
 drivers/gpu/drm/i915/i915_drm_client.c |  2 +-
 drivers/gpu/drm/xe/xe_drm_client.c |  2 +-
 include/drm/drm_gem.h  |  1 +
 9 files changed, 42 insertions(+), 4 deletions(-)

-- 
2.42.0



Re: [PATCH] drm/amd/display: Add NULL check for kzalloc in 'amdgpu_dm_atomic_commit_tail()'

2024-01-30 Thread Pillai, Aurabindo
[AMD Official Use Only - General]

Prefer drm_err() over DRM_ERROR(): 
https://elixir.bootlin.com/linux/latest/source/include/drm/drm_print.h#L468

With or without that fixed, patch is

Reviewed-by: Aurabindo Pillai 

--

Regards,
Jay

From: SHANMUGAM, SRINIVASAN 
Sent: Tuesday, January 30, 2024 4:45 AM
To: Siqueira, Rodrigo ; Pillai, Aurabindo 

Cc: amd-gfx@lists.freedesktop.org ; Julia Lawall 
; Hung, Alex ; Deucher, Alexander 
; Chung, ChiaHsuan (Tom) 
Subject: Re: [PATCH] drm/amd/display: Add NULL check for kzalloc in 
'amdgpu_dm_atomic_commit_tail()'

+ Cc: Tom Chung 

On 1/30/2024 2:11 PM, SHANMUGAM, SRINIVASAN wrote:
> Add a NULL check for the kzalloc call that allocates memory for
> dummy_updates in the amdgpu_dm_atomic_commit_tail function. Previously,
> if kzalloc failed to allocate memory and returned NULL, the code would
> attempt to use the NULL pointer.
>
> The fix is to check if kzalloc returns NULL, and if so, log an error
> message and skip the rest of the current loop iteration with the
> continue statement.  This prevents the code from attempting to use the
> NULL pointer.
>
> Cc: Julia Lawall 
> Cc: Aurabindo Pillai 
> Cc: Rodrigo Siqueira 
> Cc: Alex Hung 
> Cc: Alex Deucher 
> Signed-off-by: Srinivasan Shanmugam 
> ---
>   drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 4 
>   1 file changed, 4 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> index 0bf1bc7ced7d..8590c9f1dda6 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> @@ -9236,6 +9236,10 @@ static void amdgpu_dm_atomic_commit_tail(struct 
> drm_atomic_state *state)
> * To fix this, DC should permit updating only stream 
> properties.
> */
>dummy_updates = kzalloc(sizeof(struct dc_surface_update) * 
> MAX_SURFACES, GFP_ATOMIC);
> + if (!dummy_updates) {
> + DRM_ERROR("Failed to allocate memory for 
> dummy_updates.\n");
> + continue;
> + }
>for (j = 0; j < status->plane_count; j++)
>dummy_updates[j].surface = status->plane_states[0];
>
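
The pattern described in the commit message can be sketched in user-space C, with calloc() standing in for kzalloc(). The struct update_sketch type, MAX_SURFACES_SKETCH, and the injectable allocator are assumptions made so the failure path can be exercised; they are not the driver's definitions.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_SURFACES_SKETCH 3  /* hypothetical array length */

/* Hypothetical stand-in for the per-surface update record. */
struct update_sketch {
	const void *surface;
};

/* Allocator signature matching calloc(), so a failing one can be injected. */
typedef void *(*alloc_fn)(size_t nmemb, size_t size);

/* Handle `count` items; an item whose allocation fails is logged and skipped. */
static int process_streams(int count, alloc_fn alloc)
{
	int handled = 0;

	for (int i = 0; i < count; i++) {
		struct update_sketch *updates =
			alloc(MAX_SURFACES_SKETCH, sizeof(*updates));

		if (!updates) {
			fprintf(stderr, "Failed to allocate memory for updates.\n");
			continue;  /* never dereference the NULL pointer */
		}

		for (int j = 0; j < MAX_SURFACES_SKETCH; j++)
			updates[j].surface = NULL;

		handled++;
		free(updates);
	}
	return handled;
}

/* Allocator that always fails, used to exercise the skip path. */
static void *failing_alloc(size_t nmemb, size_t size)
{
	(void)nmemb;
	(void)size;
	return NULL;
}
```

Passing calloc handles every item, while passing failing_alloc skips them all without touching the unallocated array.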


[PATCH v3 9/9] drm/ci: uprev IGT and update testlist

2024-01-30 Thread Vignesh Raman
Uprev IGT and add amd, v3d, vc4 and vgem specific
tests to testlist. Have a testlist.txt per driver
and include a base testlist so that the driver
specific tests run only on the matching hardware.

Signed-off-by: Vignesh Raman 
---

v3:
  - New patch in series to uprev IGT and update testlist.

---
 drivers/gpu/drm/ci/gitlab-ci.yml  |   2 +-
 drivers/gpu/drm/ci/igt_runner.sh  |  12 +-
 drivers/gpu/drm/ci/testlist-amdgpu.txt| 151 ++
 drivers/gpu/drm/ci/testlist-msm.txt   |  50 ++
 drivers/gpu/drm/ci/testlist-panfrost.txt  |  17 ++
 drivers/gpu/drm/ci/testlist-v3d.txt   |  73 +
 drivers/gpu/drm/ci/testlist-vc4.txt   |  49 ++
 drivers/gpu/drm/ci/testlist.txt   | 100 
 .../gpu/drm/ci/xfails/amdgpu-stoney-fails.txt |  24 ++-
 .../drm/ci/xfails/amdgpu-stoney-flakes.txt|   9 +-
 .../gpu/drm/ci/xfails/amdgpu-stoney-skips.txt |  10 +-
 11 files changed, 427 insertions(+), 70 deletions(-)
 create mode 100644 drivers/gpu/drm/ci/testlist-amdgpu.txt
 create mode 100644 drivers/gpu/drm/ci/testlist-msm.txt
 create mode 100644 drivers/gpu/drm/ci/testlist-panfrost.txt
 create mode 100644 drivers/gpu/drm/ci/testlist-v3d.txt
 create mode 100644 drivers/gpu/drm/ci/testlist-vc4.txt

diff --git a/drivers/gpu/drm/ci/gitlab-ci.yml b/drivers/gpu/drm/ci/gitlab-ci.yml
index bc8cb3420476..e2b021616a8e 100644
--- a/drivers/gpu/drm/ci/gitlab-ci.yml
+++ b/drivers/gpu/drm/ci/gitlab-ci.yml
@@ -5,7 +5,7 @@ variables:
   UPSTREAM_REPO: git://anongit.freedesktop.org/drm/drm
   TARGET_BRANCH: drm-next
 
-  IGT_VERSION: d2af13d9f5be5ce23d996e4afd3e45990f5ab977
+  IGT_VERSION: b0cc8160ebdc87ce08b7fd83bb3c99ff7a4d8610
 
   DEQP_RUNNER_GIT_URL: https://gitlab.freedesktop.org/anholt/deqp-runner.git
   DEQP_RUNNER_GIT_TAG: v0.15.0
diff --git a/drivers/gpu/drm/ci/igt_runner.sh b/drivers/gpu/drm/ci/igt_runner.sh
index f001e015d135..2fd09b9b7cf6 100755
--- a/drivers/gpu/drm/ci/igt_runner.sh
+++ b/drivers/gpu/drm/ci/igt_runner.sh
@@ -64,10 +64,20 @@ if ! grep -q "core_getversion" /install/testlist.txt; then
 fi
 
 set +e
+if [ "$DRIVER_NAME" = "amdgpu" ]; then
+TEST_LIST="/install/testlist-amdgpu.txt"
+elif [ "$DRIVER_NAME" = "msm" ]; then
+TEST_LIST="/install/testlist-msm.txt"
+elif [ "$DRIVER_NAME" = "panfrost" ]; then
+TEST_LIST="/install/testlist-panfrost.txt"
+else
+TEST_LIST="/install/testlist.txt"
+fi
+
 igt-runner \
 run \
 --igt-folder /igt/libexec/igt-gpu-tools \
---caselist /install/testlist.txt \
+--caselist $TEST_LIST \
 --output /results \
 $IGT_SKIPS \
 $IGT_FLAKES \
diff --git a/drivers/gpu/drm/ci/testlist-amdgpu.txt 
b/drivers/gpu/drm/ci/testlist-amdgpu.txt
new file mode 100644
index ..4486f86d340b
--- /dev/null
+++ b/drivers/gpu/drm/ci/testlist-amdgpu.txt
@@ -0,0 +1,151 @@
+testlist.txt
+amdgpu/amd_abm@dpms_cycle
+amdgpu/amd_abm@backlight_monotonic_basic
+amdgpu/amd_abm@backlight_monotonic_abm
+amdgpu/amd_abm@abm_enabled
+amdgpu/amd_abm@abm_gradual
+amdgpu/amd_bo@amdgpu_bo_export_import
+amdgpu/amd_bo@amdgpu_bo_metadata
+amdgpu/amd_bo@amdgpu_bo_map_unmap
+amdgpu/amd_bo@amdgpu_memory_alloc
+amdgpu/amd_bo@amdgpu_mem_fail_alloc
+amdgpu/amd_bo@amdgpu_bo_find_by_cpu_mapping
+amdgpu/amd_cp_dma_misc@GTT_to_VRAM-AMDGPU_HW_IP_GFX0
+amdgpu/amd_cp_dma_misc@GTT_to_VRAM-AMDGPU_HW_IP_COMPUTE0
+amdgpu/amd_cp_dma_misc@VRAM_to_GTT-AMDGPU_HW_IP_GFX0
+amdgpu/amd_cp_dma_misc@VRAM_to_GTT-AMDGPU_HW_IP_COMPUTE0
+amdgpu/amd_cp_dma_misc@VRAM_to_VRAM-AMDGPU_HW_IP_GFX0
+amdgpu/amd_cp_dma_misc@VRAM_to_VRAM-AMDGPU_HW_IP_COMPUTE0
+amdgpu/amd_dispatch@amdgpu-dispatch-test-compute-with-IP-COMPUTE
+amdgpu/amd_dispatch@amdgpu-dispatch-test-gfx-with-IP-GFX
+amdgpu/amd_dispatch@amdgpu-dispatch-hang-test-gfx-with-IP-GFX
+amdgpu/amd_dispatch@amdgpu-dispatch-hang-test-compute-with-IP-COMPUTE
+amdgpu/amd_dispatch@amdgpu-reset-test-gfx-with-IP-GFX-and-COMPUTE
+amdgpu/amd_hotplug@basic
+amdgpu/amd_hotplug@basic-suspend
+amdgpu/amd_jpeg_dec@amdgpu_cs_jpeg_decode
+amdgpu/amd_max_bpc@4k-mode-max-bpc
+amdgpu/amd_module_load@reload
+amdgpu/amd_plane@test-mpo-4k
+amdgpu/amd_plane@mpo-swizzle-toggle
+amdgpu/amd_plane@mpo-swizzle-toggle-multihead
+amdgpu/amd_plane@mpo-pan-rgb
+amdgpu/amd_plane@mpo-pan-rgb-multihead
+amdgpu/amd_plane@mpo-pan-nv12
+amdgpu/amd_plane@mpo-pan-nv12-multihead
+amdgpu/amd_plane@mpo-pan-p010
+amdgpu/amd_plane@mpo-pan-p010-multihead
+amdgpu/amd_plane@mpo-pan-multi-rgb
+amdgpu/amd_plane@mpo-pan-multi-nv12
+amdgpu/amd_plane@mpo-pan-multi-p010
+amdgpu/amd_plane@multi-overlay
+amdgpu/amd_plane@multi-overlay-invalid
+amdgpu/amd_plane@mpo-scale-rgb
+amdgpu/amd_plane@mpo-scale-rgb-multihead
+amdgpu/amd_plane@mpo-scale-nv12
+amdgpu/amd_plane@mpo-scale-nv12-multihead
+amdgpu/amd_plane@mpo-scale-p010
+amdgpu/amd_plane@mpo-scale-p010-multihead
+amdgpu/amd_pstate@amdgpu_pstate
+amdgpu/amd_subvp@dual-4k60
+amdgpu/amd_uvd_enc@uvd_enc_create
+amdgpu/amd_uvd_enc@amdgpu_uvd_enc_session_init

[PATCH v3 6/9] drm/ci: rockchip: Rename existing job

2024-01-30 Thread Vignesh Raman
For rockchip rk3288 and rk3399, the display driver is rockchip.
Currently, in drm-ci for rockchip, only the display driver is
tested. So rename the rockchip job to indicate that the display
driver is tested.

Rename the xfail files for rockchip (rk3288 and rk3399) to
include information about the tested driver, and update the xfails
accordingly.

Signed-off-by: Vignesh Raman 
---

v2:
  - Refactor the patch to rename job to indicate display driver testing,
rename the existing xfail files.

v3:
  - Add the job name in GPU_VERSION and use it for xfail file names
instead of using DRIVER_NAME. Also update xfails.

---
 drivers/gpu/drm/ci/test.yml   | 36 -
 .../xfails/rockchip-rk3288-display-fails.txt  | 21 
 .../xfails/rockchip-rk3288-display-flakes.txt | 17 ++
 .../xfails/rockchip-rk3288-display-skips.txt  |  8 +++
 .../drm/ci/xfails/rockchip-rk3288-fails.txt   | 54 ---
 .../drm/ci/xfails/rockchip-rk3288-skips.txt   | 52 --
 txt => rockchip-rk3399-display-fails.txt} | 38 +++--
 .../xfails/rockchip-rk3399-display-flakes.txt | 23 
 .../xfails/rockchip-rk3399-display-skips.txt  |  6 +++
 .../drm/ci/xfails/rockchip-rk3399-flakes.txt  |  7 ---
 .../drm/ci/xfails/rockchip-rk3399-skips.txt   |  5 --
 11 files changed, 117 insertions(+), 150 deletions(-)
 create mode 100644 drivers/gpu/drm/ci/xfails/rockchip-rk3288-display-fails.txt
 create mode 100644 drivers/gpu/drm/ci/xfails/rockchip-rk3288-display-flakes.txt
 create mode 100644 drivers/gpu/drm/ci/xfails/rockchip-rk3288-display-skips.txt
 delete mode 100644 drivers/gpu/drm/ci/xfails/rockchip-rk3288-fails.txt
 delete mode 100644 drivers/gpu/drm/ci/xfails/rockchip-rk3288-skips.txt
 rename drivers/gpu/drm/ci/xfails/{rockchip-rk3399-fails.txt => rockchip-rk3399-display-fails.txt} (71%)
 create mode 100644 drivers/gpu/drm/ci/xfails/rockchip-rk3399-display-flakes.txt
 create mode 100644 drivers/gpu/drm/ci/xfails/rockchip-rk3399-display-skips.txt
 delete mode 100644 drivers/gpu/drm/ci/xfails/rockchip-rk3399-flakes.txt
 delete mode 100644 drivers/gpu/drm/ci/xfails/rockchip-rk3399-skips.txt

diff --git a/drivers/gpu/drm/ci/test.yml b/drivers/gpu/drm/ci/test.yml
index f4053bc0e365..1b8846c6bdbf 100644
--- a/drivers/gpu/drm/ci/test.yml
+++ b/drivers/gpu/drm/ci/test.yml
@@ -150,33 +150,45 @@ msm:sdm845:
   script:
 - ./install/bare-metal/cros-servo.sh
 
-rockchip:rk3288:
-  extends:
-- .lava-igt:arm32
+.rockchip:
   stage: rockchip
   variables:
-DRIVER_NAME: rockchip
-DEVICE_TYPE: rk3288-veyron-jaq
 DTB: ${DEVICE_TYPE}
 BOOT_METHOD: depthcharge
+
+.rk3288:
+  extends:
+- .lava-igt:arm32
+- .rockchip
+  variables:
+DEVICE_TYPE: rk3288-veyron-jaq
 KERNEL_IMAGE_TYPE: "zimage"
-GPU_VERSION: rockchip-rk3288
 RUNNER_TAG: mesa-ci-x86-64-lava-rk3288-veyron-jaq
 
-rockchip:rk3399:
+.rk3399:
   extends:
 - .lava-igt:arm64
-  stage: rockchip
+- .rockchip
   parallel: 2
   variables:
-DRIVER_NAME: rockchip
 DEVICE_TYPE: rk3399-gru-kevin
-DTB: ${DEVICE_TYPE}
-BOOT_METHOD: depthcharge
 KERNEL_IMAGE_TYPE: ""
-GPU_VERSION: rockchip-rk3399
 RUNNER_TAG: mesa-ci-x86-64-lava-rk3399-gru-kevin
 
+rockchip:rk3288-display:
+  extends:
+- .rk3288
+  variables:
+GPU_VERSION: rockchip-rk3288-display
+DRIVER_NAME: rockchip
+
+rockchip:rk3399-display:
+  extends:
+- .rk3399
+  variables:
+GPU_VERSION: rockchip-rk3399-display
+DRIVER_NAME: rockchip
+
 .i915:
   extends:
 - .lava-igt:x86_64
diff --git a/drivers/gpu/drm/ci/xfails/rockchip-rk3288-display-fails.txt b/drivers/gpu/drm/ci/xfails/rockchip-rk3288-display-fails.txt
new file mode 100644
index ..6fae7d85c2c3
--- /dev/null
+++ b/drivers/gpu/drm/ci/xfails/rockchip-rk3288-display-fails.txt
@@ -0,0 +1,21 @@
+kms_cursor_crc@cursor-onscreen-32x10,Crash
+kms_cursor_crc@cursor-onscreen-64x21,Crash
+kms_cursor_crc@cursor-onscreen-64x64,Crash
+kms_cursor_crc@cursor-random-32x10,Crash
+kms_cursor_crc@cursor-random-64x21,Crash
+kms_cursor_crc@cursor-sliding-32x10,Crash
+kms_cursor_legacy@cursor-vs-flip-atomic,Fail
+kms_cursor_legacy@cursor-vs-flip-atomic-transitions,Fail
+kms_cursor_legacy@cursor-vs-flip-toggle,Fail
+kms_cursor_legacy@flip-vs-cursor-crc-atomic,Crash
+kms_flip@flip-vs-modeset-vs-hang,Crash
+kms_flip@flip-vs-panning-vs-hang,Crash
+kms_invalid_mode@int-max-clock,Crash
+kms_pipe_crc_basic@read-crc-frame-sequence,Crash
+kms_plane@pixel-format,Crash
+kms_plane_cursor@primary,Crash
+kms_prop_blob@invalid-set-prop,Crash
+kms_prop_blob@invalid-set-prop-any,Crash
+kms_properties@connector-properties-legacy,Crash
+kms_properties@get_properties-sanity-atomic,Crash
+kms_properties@get_properties-sanity-non-atomic,Crash
diff --git a/drivers/gpu/drm/ci/xfails/rockchip-rk3288-display-flakes.txt b/drivers/gpu/drm/ci/xfails/rockchip-rk3288-display-flakes.txt
new file mode 100644
index ..0bd27b8d41ce
--- /dev/null
+++ 

[PATCH v3 8/9] drm/ci: uprev mesa version

2024-01-30 Thread Vignesh Raman
zlib.net no longer allows tarball downloads, which causes the
following errors in the kernel+rootfs_arm32 container build:
urllib.error.HTTPError: HTTP Error 403: Forbidden
urllib.error.HTTPError: HTTP Error 415: Unsupported Media Type

Uprev mesa to a version that includes a fix for this issue:
https://gitlab.freedesktop.org/mesa/mesa/-/commit/908f444ec10fe44ae2df004909b2e6206188a71a

Signed-off-by: Vignesh Raman 
---

v3:
  - New patch in series to uprev mesa.

---
 drivers/gpu/drm/ci/container.yml  | 6 +++---
 drivers/gpu/drm/ci/gitlab-ci.yml  | 6 +++---
 drivers/gpu/drm/ci/image-tags.yml | 3 ++-
 3 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/ci/container.yml b/drivers/gpu/drm/ci/container.yml
index 9764e7921a4f..1060eb380b02 100644
--- a/drivers/gpu/drm/ci/container.yml
+++ b/drivers/gpu/drm/ci/container.yml
@@ -40,11 +40,11 @@ debian/x86_64_test-android:
   rules:
 - when: never
 
-windows_build_vs2019:
+windows_build_msvc:
   rules:
 - when: never
 
-windows_test_vs2019:
+windows_test_msvc:
   rules:
 - when: never
 
@@ -56,7 +56,7 @@ rustfmt:
rules:
 - when: never
 
-windows_vs2019:
+windows_msvc:
rules:
 - when: never
 
diff --git a/drivers/gpu/drm/ci/gitlab-ci.yml b/drivers/gpu/drm/ci/gitlab-ci.yml
index 084e3ff8e3f4..bc8cb3420476 100644
--- a/drivers/gpu/drm/ci/gitlab-ci.yml
+++ b/drivers/gpu/drm/ci/gitlab-ci.yml
@@ -1,6 +1,6 @@
 variables:
   DRM_CI_PROJECT_PATH:  mesa/mesa
-  DRM_CI_COMMIT_SHA:  
9d162de9a05155e1c4041857a5848842749164cf
+  DRM_CI_COMMIT_SHA:  
c4b32f9e90b7204735e6adf1f60c178bf85752e7
 
   UPSTREAM_REPO: git://anongit.freedesktop.org/drm/drm
   TARGET_BRANCH: drm-next
@@ -26,7 +26,7 @@ variables:
   JOB_ARTIFACTS_BASE: ${PIPELINE_ARTIFACTS_BASE}/${CI_JOB_ID}
   # default kernel for rootfs before injecting the current kernel tree
   KERNEL_REPO: "gfx-ci/linux"
-  KERNEL_TAG: "v6.6.4-for-mesa-ci-e4f4c500f7fb"
+  KERNEL_TAG: "v6.6.13-mesa-9916"
   KERNEL_IMAGE_BASE: https://${S3_HOST}/mesa-lava/${KERNEL_REPO}/${KERNEL_TAG}
   LAVA_TAGS: subset-1-gfx
   LAVA_JOB_PRIORITY: 30
@@ -98,6 +98,7 @@ include:
 stages:
   - sanity
   - container
+  - code-validation
   - git-archive
   - build
   - amdgpu
@@ -107,7 +108,6 @@ stages:
   - msm
   - rockchip
   - virtio-gpu
-  - lint
 
 # YAML anchors for rule conditions
 # 
diff --git a/drivers/gpu/drm/ci/image-tags.yml b/drivers/gpu/drm/ci/image-tags.yml
index 7ab4f2514da8..cf07c3e09b8c 100644
--- a/drivers/gpu/drm/ci/image-tags.yml
+++ b/drivers/gpu/drm/ci/image-tags.yml
@@ -1,5 +1,5 @@
 variables:
-   CONTAINER_TAG: "2023-10-11-mesa-uprev"
+   CONTAINER_TAG: "2022-01-29-mesa-uprev"
DEBIAN_X86_64_BUILD_BASE_IMAGE: "debian/x86_64_build-base"
DEBIAN_BASE_TAG: "${CONTAINER_TAG}"
 
@@ -7,6 +7,7 @@ variables:
DEBIAN_BUILD_TAG: "2023-10-08-config"
 
KERNEL_ROOTFS_TAG: "2023-10-06-amd"
+   PKG_REPO_REV: "67f2c46b"
 
DEBIAN_X86_64_TEST_BASE_IMAGE: "debian/x86_64_test-base"
DEBIAN_X86_64_TEST_IMAGE_GL_PATH: "debian/x86_64_test-gl"
-- 
2.40.1



[PATCH v3 7/9] drm/ci: rockchip: Add job to test panfrost GPU driver

2024-01-30 Thread Vignesh Raman
For rockchip rk3288 and rk3399, the GPU driver is panfrost.
So add support in drm-ci to test the panfrost driver on rockchip
SOCs and update xfails. Skip KMS tests for the panfrost driver
since it is not a KMS driver.

Signed-off-by: Vignesh Raman 
---

v2:
  - Add panfrost GPU jobs for rockchip SOC with new xfails.

v3:
  - Skip KMS tests for panfrost driver since it is not a KMS
driver and update xfails. Add the job name in GPU_VERSION
and use it for xfail file names instead of using DRIVER_NAME.

---
 drivers/gpu/drm/ci/test.yml| 14 ++
 .../drm/ci/xfails/rockchip-rk3288-gpu-fails.txt|  1 +
 .../drm/ci/xfails/rockchip-rk3288-gpu-skips.txt|  2 ++
 .../drm/ci/xfails/rockchip-rk3399-gpu-fails.txt|  1 +
 .../drm/ci/xfails/rockchip-rk3399-gpu-skips.txt|  2 ++
 5 files changed, 20 insertions(+)
 create mode 100644 drivers/gpu/drm/ci/xfails/rockchip-rk3288-gpu-fails.txt
 create mode 100644 drivers/gpu/drm/ci/xfails/rockchip-rk3288-gpu-skips.txt
 create mode 100644 drivers/gpu/drm/ci/xfails/rockchip-rk3399-gpu-fails.txt
 create mode 100644 drivers/gpu/drm/ci/xfails/rockchip-rk3399-gpu-skips.txt

diff --git a/drivers/gpu/drm/ci/test.yml b/drivers/gpu/drm/ci/test.yml
index 1b8846c6bdbf..8ab8a8f56d6a 100644
--- a/drivers/gpu/drm/ci/test.yml
+++ b/drivers/gpu/drm/ci/test.yml
@@ -175,6 +175,13 @@ msm:sdm845:
 KERNEL_IMAGE_TYPE: ""
 RUNNER_TAG: mesa-ci-x86-64-lava-rk3399-gru-kevin
 
+rockchip:rk3288-gpu:
+  extends:
+- .rk3288
+  variables:
+GPU_VERSION: rockchip-rk3288-gpu
+DRIVER_NAME: panfrost
+
 rockchip:rk3288-display:
   extends:
 - .rk3288
@@ -182,6 +189,13 @@ rockchip:rk3288-display:
 GPU_VERSION: rockchip-rk3288-display
 DRIVER_NAME: rockchip
 
+rockchip:rk3399-gpu:
+  extends:
+- .rk3399
+  variables:
+GPU_VERSION: rockchip-rk3399-gpu
+DRIVER_NAME: panfrost
+
 rockchip:rk3399-display:
   extends:
 - .rk3399
diff --git a/drivers/gpu/drm/ci/xfails/rockchip-rk3288-gpu-fails.txt b/drivers/gpu/drm/ci/xfails/rockchip-rk3288-gpu-fails.txt
new file mode 100644
index ..abd35a8ef6f4
--- /dev/null
+++ b/drivers/gpu/drm/ci/xfails/rockchip-rk3288-gpu-fails.txt
@@ -0,0 +1 @@
+panfrost_prime@gem-prime-import,Crash
diff --git a/drivers/gpu/drm/ci/xfails/rockchip-rk3288-gpu-skips.txt b/drivers/gpu/drm/ci/xfails/rockchip-rk3288-gpu-skips.txt
new file mode 100644
index ..2ea09d1648bc
--- /dev/null
+++ b/drivers/gpu/drm/ci/xfails/rockchip-rk3288-gpu-skips.txt
@@ -0,0 +1,2 @@
+# Panfrost is not a KMS driver, so skip the KMS tests
+kms_.*
diff --git a/drivers/gpu/drm/ci/xfails/rockchip-rk3399-gpu-fails.txt b/drivers/gpu/drm/ci/xfails/rockchip-rk3399-gpu-fails.txt
new file mode 100644
index ..6f5e760d5ec0
--- /dev/null
+++ b/drivers/gpu/drm/ci/xfails/rockchip-rk3399-gpu-fails.txt
@@ -0,0 +1 @@
+panfrost_prime@gem-prime-import,Fail
diff --git a/drivers/gpu/drm/ci/xfails/rockchip-rk3399-gpu-skips.txt b/drivers/gpu/drm/ci/xfails/rockchip-rk3399-gpu-skips.txt
new file mode 100644
index ..2ea09d1648bc
--- /dev/null
+++ b/drivers/gpu/drm/ci/xfails/rockchip-rk3399-gpu-skips.txt
@@ -0,0 +1,2 @@
+# Panfrost is not a KMS driver, so skip the KMS tests
+kms_.*
-- 
2.40.1



[PATCH v3 2/9] drm/ci: mediatek: Rename existing job

2024-01-30 Thread Vignesh Raman
For mediatek mt8173 and mt8183, the display driver is mediatek.
Currently, drm-ci tests only the display driver on mediatek, so
rename the mediatek job to indicate that the display driver is
tested. Rename the xfail files for mediatek (mt8173 and mt8183) to
include the name of the tested driver, and update the xfails
accordingly. Since the job now passes the correct driver name for
both GPU and display testing, remove the check that sets
IGT_FORCE_DRIVER based on the driver name.

Also add the job name to GPU_VERSION and use it for xfail file names
instead of using DRIVER_NAME.
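
The lookup that this patch leaves in igt_runner.sh can be sketched roughly
as below. This is a minimal sketch, not the script itself: the xfail_args
helper and its directory argument are hypothetical, introduced only so the
logic can be tried outside the CI image. The file suffixes and the --skips,
--flakes and --baseline options are the ones visible in the diff.

```shell
#!/bin/sh
# Sketch: derive the xfail arguments from GPU_VERSION alone.
# xfail_args is a hypothetical helper, not part of igt_runner.sh.
xfail_args() {
    dir=$1      # directory holding the xfail lists
    version=$2  # GPU_VERSION, e.g. mediatek-mt8183-display
    args=""
    # Pair each list suffix with the option passed for it in the diff.
    for pair in skips:--skips flakes:--flakes fails:--baseline; do
        suffix=${pair%%:*}
        option=${pair#*:}
        file="$dir/$version-$suffix.txt"
        # Only lists that exist are passed on, as in the runner script.
        if [ -e "$file" ]; then
            args="$args $option $file"
        fi
    done
    # Drop the leading space before printing the argument string.
    printf '%s\n' "${args# }"
}
```

With GPU_VERSION set to mediatek-mt8183-gpu, for instance, only files named
mediatek-mt8183-gpu-*.txt are picked up, so the display and GPU jobs on the
same board do not share xfail lists.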

Signed-off-by: Vignesh Raman 
---

v2:
  - Refactor the patch to rename job to indicate display driver testing,
rename the existing xfail files, and remove IGT_FORCE_DRIVER from the
script since it's now set by the job.

v3:
  - Add the job name in GPU_VERSION and use it for xfail file names instead
of using DRIVER_NAME. Also update xfails.

---
 drivers/gpu/drm/ci/igt_runner.sh  | 22 ++-
 drivers/gpu/drm/ci/test.yml   | 57 +++
 txt => mediatek-mt8173-display-fails.txt} | 13 -
 .../xfails/mediatek-mt8173-display-flakes.txt | 13 +
 .../xfails/mediatek-mt8183-display-fails.txt  | 16 ++
 .../xfails/mediatek-mt8183-display-flakes.txt |  8 +++
 .../drm/ci/xfails/mediatek-mt8183-fails.txt   | 13 -
 7 files changed, 77 insertions(+), 65 deletions(-)
 rename drivers/gpu/drm/ci/xfails/{mediatek-mt8173-fails.txt => mediatek-mt8173-display-fails.txt} (59%)
 create mode 100644 drivers/gpu/drm/ci/xfails/mediatek-mt8173-display-flakes.txt
 create mode 100644 drivers/gpu/drm/ci/xfails/mediatek-mt8183-display-fails.txt
 create mode 100644 drivers/gpu/drm/ci/xfails/mediatek-mt8183-display-flakes.txt
 delete mode 100644 drivers/gpu/drm/ci/xfails/mediatek-mt8183-fails.txt

diff --git a/drivers/gpu/drm/ci/igt_runner.sh b/drivers/gpu/drm/ci/igt_runner.sh
index f1a08b9b146f..f001e015d135 100755
--- a/drivers/gpu/drm/ci/igt_runner.sh
+++ b/drivers/gpu/drm/ci/igt_runner.sh
@@ -20,16 +20,6 @@ cat /sys/kernel/debug/dri/*/state
 set -e
 
 case "$DRIVER_NAME" in
-rockchip|meson)
-export IGT_FORCE_DRIVER="panfrost"
-;;
-mediatek)
-if [ "$GPU_VERSION" = "mt8173" ]; then
-export IGT_FORCE_DRIVER=${DRIVER_NAME}
-elif [ "$GPU_VERSION" = "mt8183" ]; then
-export IGT_FORCE_DRIVER="panfrost"
-fi
-;;
 amdgpu)
 # Cannot use HWCI_KERNEL_MODULES as at that point we don't have the module in /lib
 mv /install/modules/lib/modules/* /lib/modules/.
@@ -37,16 +27,16 @@ case "$DRIVER_NAME" in
 ;;
 esac
 
-if [ -e "/install/xfails/$DRIVER_NAME-$GPU_VERSION-skips.txt" ]; then
-IGT_SKIPS="--skips /install/xfails/$DRIVER_NAME-$GPU_VERSION-skips.txt"
+if [ -e "/install/xfails/$GPU_VERSION-skips.txt" ]; then
+IGT_SKIPS="--skips /install/xfails/$GPU_VERSION-skips.txt"
 fi
 
-if [ -e "/install/xfails/$DRIVER_NAME-$GPU_VERSION-flakes.txt" ]; then
-IGT_FLAKES="--flakes /install/xfails/$DRIVER_NAME-$GPU_VERSION-flakes.txt"
+if [ -e "/install/xfails/$GPU_VERSION-flakes.txt" ]; then
+IGT_FLAKES="--flakes /install/xfails/$GPU_VERSION-flakes.txt"
 fi
 
-if [ -e "/install/xfails/$DRIVER_NAME-$GPU_VERSION-fails.txt" ]; then
-IGT_FAILS="--baseline /install/xfails/$DRIVER_NAME-$GPU_VERSION-fails.txt"
+if [ -e "/install/xfails/$GPU_VERSION-fails.txt" ]; then
+IGT_FAILS="--baseline /install/xfails/$GPU_VERSION-fails.txt"
 fi
 
 if [ "`uname -m`" = "aarch64" ]; then
diff --git a/drivers/gpu/drm/ci/test.yml b/drivers/gpu/drm/ci/test.yml
index 355b794ef2b1..0cd44e6ea18b 100644
--- a/drivers/gpu/drm/ci/test.yml
+++ b/drivers/gpu/drm/ci/test.yml
@@ -98,7 +98,7 @@ msm:sc7180-trogdor-lazor-limozeen:
   variables:
 DEVICE_TYPE: sc7180-trogdor-lazor-limozeen
 DTB: sc7180-trogdor-lazor-limozeen-nots-r5
-GPU_VERSION: ${DEVICE_TYPE}
+GPU_VERSION: msm-${DEVICE_TYPE}
 RUNNER_TAG: mesa-ci-x86-64-lava-sc7180-trogdor-lazor-limozeen
 
 msm:sc7180-trogdor-kingoftown:
@@ -108,7 +108,7 @@ msm:sc7180-trogdor-kingoftown:
   variables:
 DEVICE_TYPE: sc7180-trogdor-kingoftown
 DTB: sc7180-trogdor-kingoftown
-GPU_VERSION: ${DEVICE_TYPE}
+GPU_VERSION: msm-${DEVICE_TYPE}
 RUNNER_TAG: mesa-ci-x86-64-lava-sc7180-trogdor-kingoftown
 
 msm:apq8016:
@@ -118,7 +118,7 @@ msm:apq8016:
   variables:
 DRIVER_NAME: msm
 BM_DTB: https://${PIPELINE_ARTIFACTS_BASE}/arm64/apq8016-sbc-usb-host.dtb
-GPU_VERSION: apq8016
+GPU_VERSION: msm-apq8016
 BM_CMDLINE: "ip=dhcp console=ttyMSM0,115200n8 $BM_KERNEL_EXTRA_ARGS 
root=/dev/nfs rw nfsrootdebug nfsroot=,tcp,nfsvers=4.2 init=/init 
$BM_KERNELARGS"
 RUNNER_TAG: google-freedreno-db410c
   script:
@@ -132,7 +132,7 @@ msm:apq8096:
 DRIVER_NAME: msm
 BM_KERNEL_EXTRA_ARGS: maxcpus=2
 BM_DTB: https://${PIPELINE_ARTIFACTS_BASE}/arm64/apq8096-db820c.dtb
-GPU_VERSION: apq8096
+GPU_VERSION: msm-apq8096

[PATCH v3 1/9] drm/ci: arm64.config: Enable CONFIG_DRM_ANALOGIX_ANX7625

2024-01-30 Thread Vignesh Raman
Enable CONFIG_DRM_ANALOGIX_ANX7625 in the arm64 CI config so that the
display driver is probed on the mt8183-kukui-jacuzzi-juniper machine.

arch/arm64/configs/defconfig has CONFIG_DRM_ANALOGIX_ANX7625=m,
but drm-ci doesn't have an initrd with modules, so set
CONFIG_DRM_ANALOGIX_ANX7625=y in the CI arm64 config.

Signed-off-by: Vignesh Raman 
---

v2:
  - No changes

v3:
  - No changes

---
 drivers/gpu/drm/ci/arm64.config | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/ci/arm64.config b/drivers/gpu/drm/ci/arm64.config
index 8dbce9919a57..37d23fd7a367 100644
--- a/drivers/gpu/drm/ci/arm64.config
+++ b/drivers/gpu/drm/ci/arm64.config
@@ -187,6 +187,7 @@ CONFIG_MTK_DEVAPC=y
 CONFIG_PWM_MTK_DISP=y
 CONFIG_MTK_CMDQ=y
 CONFIG_REGULATOR_DA9211=y
+CONFIG_DRM_ANALOGIX_ANX7625=y
 
 # For nouveau.  Note that DRM must be a module so that it's loaded after NFS is up to provide the firmware.
 CONFIG_ARCH_TEGRA=y
-- 
2.40.1



[PATCH v3 5/9] drm/ci: meson: Add job to test panfrost GPU driver

2024-01-30 Thread Vignesh Raman
For amlogic meson SOC the GPU driver is panfrost. So add
support in drm-ci to test the panfrost driver on amlogic meson
SOC and update xfails. Skip KMS tests for the panfrost driver
since it is not a KMS driver.

Signed-off-by: Vignesh Raman 
---

v2:
  - Add panfrost GPU jobs for amlogic meson SOC with new xfails.

v3:
  - Skip KMS tests for panfrost driver since it is not a KMS
driver and update xfails. Add the job name in GPU_VERSION and use
it for xfail file names instead of using DRIVER_NAME.

---
 drivers/gpu/drm/ci/test.yml| 7 +++
 drivers/gpu/drm/ci/xfails/meson-g12b-gpu-fails.txt | 1 +
 drivers/gpu/drm/ci/xfails/meson-g12b-gpu-skips.txt | 2 ++
 3 files changed, 10 insertions(+)
 create mode 100644 drivers/gpu/drm/ci/xfails/meson-g12b-gpu-fails.txt
 create mode 100644 drivers/gpu/drm/ci/xfails/meson-g12b-gpu-skips.txt

diff --git a/drivers/gpu/drm/ci/test.yml b/drivers/gpu/drm/ci/test.yml
index bf4c303a65f2..f4053bc0e365 100644
--- a/drivers/gpu/drm/ci/test.yml
+++ b/drivers/gpu/drm/ci/test.yml
@@ -358,6 +358,13 @@ mediatek:mt8183-display:
 DEVICE_TYPE: meson-g12b-a311d-khadas-vim3
 RUNNER_TAG: mesa-ci-x86-64-lava-meson-g12b-a311d-khadas-vim3
 
+meson:g12b-gpu:
+  extends:
+- .g12b
+  variables:
+GPU_VERSION: meson-g12b-gpu
+DRIVER_NAME: panfrost
+
 meson:g12b-display:
   extends:
 - .g12b
diff --git a/drivers/gpu/drm/ci/xfails/meson-g12b-gpu-fails.txt b/drivers/gpu/drm/ci/xfails/meson-g12b-gpu-fails.txt
new file mode 100644
index ..6f5e760d5ec0
--- /dev/null
+++ b/drivers/gpu/drm/ci/xfails/meson-g12b-gpu-fails.txt
@@ -0,0 +1 @@
+panfrost_prime@gem-prime-import,Fail
diff --git a/drivers/gpu/drm/ci/xfails/meson-g12b-gpu-skips.txt b/drivers/gpu/drm/ci/xfails/meson-g12b-gpu-skips.txt
new file mode 100644
index ..2ea09d1648bc
--- /dev/null
+++ b/drivers/gpu/drm/ci/xfails/meson-g12b-gpu-skips.txt
@@ -0,0 +1,2 @@
+# Panfrost is not a KMS driver, so skip the KMS tests
+kms_.*
-- 
2.40.1



[PATCH v3 4/9] drm/ci: meson: Rename existing job

2024-01-30 Thread Vignesh Raman
For Amlogic Meson SOC the display driver is meson. Currently,
drm-ci tests only the display driver on meson, so rename the
meson job to indicate that the display driver is tested.

Rename the xfail files for meson (g12b) to include the name of
the tested driver, and update the xfails accordingly.

Signed-off-by: Vignesh Raman 
---

v2:
  - Refactor the patch to rename job to indicate display driver testing,
rename the existing xfail files.

v3:
  - Add the job name in GPU_VERSION and use it for xfail file names instead
of using DRIVER_NAME.

---
 drivers/gpu/drm/ci/test.yml   | 11 ---
 ...on-g12b-fails.txt => meson-g12b-display-fails.txt} |  3 ---
 2 files changed, 8 insertions(+), 6 deletions(-)
 rename drivers/gpu/drm/ci/xfails/{meson-g12b-fails.txt => meson-g12b-display-fails.txt} (84%)

diff --git a/drivers/gpu/drm/ci/test.yml b/drivers/gpu/drm/ci/test.yml
index e153c5a7ad80..bf4c303a65f2 100644
--- a/drivers/gpu/drm/ci/test.yml
+++ b/drivers/gpu/drm/ci/test.yml
@@ -346,20 +346,25 @@ mediatek:mt8183-display:
 - .lava-igt:arm64
   stage: meson
   variables:
-DRIVER_NAME: meson
 DTB: ${DEVICE_TYPE}
 BOOT_METHOD: u-boot
 KERNEL_IMAGE_TYPE: "image"
 
-meson:g12b:
+.g12b:
   extends:
 - .meson
   parallel: 3
   variables:
 DEVICE_TYPE: meson-g12b-a311d-khadas-vim3
-GPU_VERSION: meson-g12b
 RUNNER_TAG: mesa-ci-x86-64-lava-meson-g12b-a311d-khadas-vim3
 
+meson:g12b-display:
+  extends:
+- .g12b
+  variables:
+GPU_VERSION: meson-g12b-display
+DRIVER_NAME: meson
+
 virtio_gpu:none:
   stage: virtio-gpu
   variables:
diff --git a/drivers/gpu/drm/ci/xfails/meson-g12b-fails.txt b/drivers/gpu/drm/ci/xfails/meson-g12b-display-fails.txt
similarity index 84%
rename from drivers/gpu/drm/ci/xfails/meson-g12b-fails.txt
rename to drivers/gpu/drm/ci/xfails/meson-g12b-display-fails.txt
index 56a2ae7047b4..f123fb0cb820 100644
--- a/drivers/gpu/drm/ci/xfails/meson-g12b-fails.txt
+++ b/drivers/gpu/drm/ci/xfails/meson-g12b-display-fails.txt
@@ -7,9 +7,6 @@ kms_cursor_legacy@torture-bo,Fail
 kms_cursor_legacy@torture-move,Fail
 kms_force_connector_basic@force-edid,Fail
 kms_hdmi_inject@inject-4k,Fail
-kms_plane_cursor@overlay,Fail
-kms_plane_cursor@primary,Fail
-kms_plane_cursor@viewport,Fail
 kms_properties@connector-properties-atomic,Fail
 kms_properties@connector-properties-legacy,Fail
 kms_properties@get_properties-sanity-atomic,Fail
-- 
2.40.1



[PATCH v3 3/9] drm/ci: mediatek: Add job to test panfrost and powervr GPU driver

2024-01-30 Thread Vignesh Raman
For mediatek mt8173, the GPU driver is powervr and for mediatek
mt8183, the GPU driver is panfrost. So add support in drm-ci to
test the panfrost and powervr GPU drivers on mediatek SOCs and update
xfails. The powervr driver was merged in the linux kernel, but there
is no mediatek support yet, so disable the mt8173-gpu job, which uses
the powervr driver.

Add panfrost specific tests to the testlist and skip KMS tests for
the panfrost driver since it is not a KMS driver. Also update
the MAINTAINERS file to include xfails for the panfrost driver.
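
The effect of the kms_.* entry in the skips file can be sketched with a
plain grep filter. This is an illustration only: filter_skips is a
hypothetical helper, and it assumes that a skip entry is a regular
expression matched against the whole test name, which the patch itself
does not spell out.

```shell
#!/bin/sh
# Sketch: drop every test whose full name matches a skip regex.
# filter_skips is a hypothetical helper; test names arrive on stdin.
filter_skips() {
    # -E: extended regex, -x: match the whole line, -v: keep non-matches.
    grep -v -E -x -- "$1"
}

# Example: of these three testlist entries only the panfrost test survives.
printf '%s\n' \
    'kms_cursor_crc@cursor-onscreen-64x64' \
    'panfrost_prime@gem-prime-import' \
    'kms_flip@flip-vs-modeset-vs-hang' |
    filter_skips 'kms_.*'
```

A single pattern therefore keeps the whole KMS part of the shared testlist
away from the panfrost jobs without listing each test.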

Signed-off-by: Vignesh Raman 
---

v2:
  - Add panfrost and PVR GPU jobs for mediatek SOC with new xfails, add xfail
entry to MAINTAINERS.

v3:
  - Add panfrost specific tests to testlist and skip KMS tests for
panfrost driver since it is not a KMS driver and update xfails.
Update the MAINTAINERS file to include xfails for panfrost driver.
Add the job name in GPU_VERSION and use it for xfail file names instead
of using DRIVER_NAME.

---
 MAINTAINERS|  1 +
 drivers/gpu/drm/ci/test.yml| 18 ++
 drivers/gpu/drm/ci/testlist.txt| 16 
 .../ci/xfails/mediatek-mt8183-gpu-skips.txt|  2 ++
 4 files changed, 37 insertions(+)
 create mode 100644 drivers/gpu/drm/ci/xfails/mediatek-mt8183-gpu-skips.txt

diff --git a/MAINTAINERS b/MAINTAINERS
index 9d959a6881f7..bcdc17d1aa26 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1645,6 +1645,7 @@ L:dri-de...@lists.freedesktop.org
 S: Supported
 T: git git://anongit.freedesktop.org/drm/drm-misc
 F: Documentation/gpu/panfrost.rst
+F: drivers/gpu/drm/ci/xfails/panfrost*
 F: drivers/gpu/drm/panfrost/
 F: include/uapi/drm/panfrost_drm.h
 
diff --git a/drivers/gpu/drm/ci/test.yml b/drivers/gpu/drm/ci/test.yml
index 0cd44e6ea18b..e153c5a7ad80 100644
--- a/drivers/gpu/drm/ci/test.yml
+++ b/drivers/gpu/drm/ci/test.yml
@@ -299,6 +299,17 @@ amdgpu:stoney:
 DEVICE_TYPE: mt8183-kukui-jacuzzi-juniper-sku16
 RUNNER_TAG: mesa-ci-x86-64-lava-mt8183-kukui-jacuzzi-juniper-sku16
 
+mediatek:mt8173-gpu:
+  extends:
+- .mt8173
+  variables:
+GPU_VERSION: mediatek-mt8173-gpu
+DRIVER_NAME: powervr
+  rules:
+# TODO: powervr driver was merged in linux kernel, but there's no mediatek support yet
+# Remove the rule once mediatek support is added for powervr
+- when: never
+
 mediatek:mt8173-display:
   extends:
 - .mt8173
@@ -306,6 +317,13 @@ mediatek:mt8173-display:
 GPU_VERSION: mediatek-mt8173-display
 DRIVER_NAME: mediatek
 
+mediatek:mt8183-gpu:
+  extends:
+- .mt8183
+  variables:
+GPU_VERSION: mediatek-mt8183-gpu
+DRIVER_NAME: panfrost
+
 mediatek:mt8183-display:
   extends:
 - .mt8183
diff --git a/drivers/gpu/drm/ci/testlist.txt b/drivers/gpu/drm/ci/testlist.txt
index eaeb751bb0ad..772fc025b1f8 100644
--- a/drivers/gpu/drm/ci/testlist.txt
+++ b/drivers/gpu/drm/ci/testlist.txt
@@ -2959,3 +2959,19 @@ msm_submit@invalid-duplicate-bo-submit
 msm_submit@invalid-cmd-idx-submit
 msm_submit@invalid-cmd-type-submit
 msm_submit@valid-submit
+panfrost_get_param@base-params
+panfrost_get_param@get-bad-param
+panfrost_get_param@get-bad-padding
+panfrost_gem_new@gem-new-4096
+panfrost_gem_new@gem-new-0
+panfrost_gem_new@gem-new-zeroed
+panfrost_prime@gem-prime-import
+panfrost_submit@pan-submit
+panfrost_submit@pan-submit-error-no-jc
+panfrost_submit@pan-submit-error-bad-in-syncs
+panfrost_submit@pan-submit-error-bad-bo-handles
+panfrost_submit@pan-submit-error-bad-requirements
+panfrost_submit@pan-submit-error-bad-out-sync
+panfrost_submit@pan-reset
+panfrost_submit@pan-submit-and-close
+panfrost_submit@pan-unhandled-pagefault
diff --git a/drivers/gpu/drm/ci/xfails/mediatek-mt8183-gpu-skips.txt b/drivers/gpu/drm/ci/xfails/mediatek-mt8183-gpu-skips.txt
new file mode 100644
index ..2ea09d1648bc
--- /dev/null
+++ b/drivers/gpu/drm/ci/xfails/mediatek-mt8183-gpu-skips.txt
@@ -0,0 +1,2 @@
+# Panfrost is not a KMS driver, so skip the KMS tests
+kms_.*
-- 
2.40.1



[PATCH v3 0/9] drm/ci: Add support for GPU and display testing

2024-01-30 Thread Vignesh Raman
Some ARM SOCs have a separate display controller and GPU, each with
different drivers. For mediatek mt8173, the GPU driver is powervr,
and the display driver is mediatek. In the case of mediatek mt8183,
the GPU driver is panfrost, and the display driver is mediatek.
With rockchip rk3288/rk3399, the GPU driver is panfrost, while the
display driver is rockchip. For amlogic meson, the GPU driver is
panfrost, and the display driver is meson.
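
The pairing described above can be written down as a small lookup. This is
only a sketch of the mapping stated in this cover letter: drivers_for_soc
is a hypothetical helper, and the g12b key is taken from the meson job
names used later in the series.

```shell
#!/bin/sh
# Sketch: GPU and display driver per SoC, as listed in the cover letter.
# drivers_for_soc is a hypothetical helper, not part of drm-ci.
drivers_for_soc() {
    case "$1" in
        mt8173)        echo "gpu=powervr display=mediatek" ;;
        mt8183)        echo "gpu=panfrost display=mediatek" ;;
        rk3288|rk3399) echo "gpu=panfrost display=rockchip" ;;
        g12b)          echo "gpu=panfrost display=meson" ;;
        *)             return 1 ;;  # SoC not covered by this series
    esac
}
```

Each SoC then gets one job per driver, with DRIVER_NAME set to the
matching value from this table.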

IGT tests run various tests with different xfails and can test both
GPU devices and KMS/display devices. Currently, in drm-ci for MediaTek,
Rockchip, and Amlogic Meson platforms, only the GPU driver is tested.
This leads to incomplete coverage since the display is never tested on
these platforms. This commit series adds support in drm-ci to run tests
for both GPU and display drivers for MediaTek, Rockchip, and Amlogic
Meson platforms.

Uprev mesa and IGT in drm-ci and add amd, v3d, vc4 and vgem specific
tests to the testlist. Use a testlist per driver together with a base
testlist so that driver-specific tests run only on the matching hardware.

Vignesh Raman (9):
  drm/ci: arm64.config: Enable CONFIG_DRM_ANALOGIX_ANX7625
  drm/ci: mediatek: Rename existing job
  drm/ci: mediatek: Add job to test panfrost and powervr GPU driver
  drm/ci: meson: Rename existing job
  drm/ci: meson: Add job to test panfrost GPU driver
  drm/ci: rockchip: Rename existing job
  drm/ci: rockchip: Add job to test panfrost GPU driver
  drm/ci: uprev mesa version
  drm/ci: uprev IGT and update testlist

 MAINTAINERS   |   1 +
 drivers/gpu/drm/ci/arm64.config   |   1 +
 drivers/gpu/drm/ci/container.yml  |   6 +-
 drivers/gpu/drm/ci/gitlab-ci.yml  |   8 +-
 drivers/gpu/drm/ci/igt_runner.sh  |  34 ++--
 drivers/gpu/drm/ci/image-tags.yml |   3 +-
 drivers/gpu/drm/ci/test.yml   | 137 
 drivers/gpu/drm/ci/testlist-amdgpu.txt| 151 ++
 drivers/gpu/drm/ci/testlist-msm.txt   |  50 ++
 drivers/gpu/drm/ci/testlist-panfrost.txt  |  17 ++
 drivers/gpu/drm/ci/testlist-v3d.txt   |  73 +
 drivers/gpu/drm/ci/testlist-vc4.txt   |  49 ++
 drivers/gpu/drm/ci/testlist.txt   |  84 --
 .../gpu/drm/ci/xfails/amdgpu-stoney-fails.txt |  24 ++-
 .../drm/ci/xfails/amdgpu-stoney-flakes.txt|   9 +-
 .../gpu/drm/ci/xfails/amdgpu-stoney-skips.txt |  10 +-
 txt => mediatek-mt8173-display-fails.txt} |  13 --
 .../xfails/mediatek-mt8173-display-flakes.txt |  13 ++
 .../xfails/mediatek-mt8183-display-fails.txt  |  16 ++
 .../xfails/mediatek-mt8183-display-flakes.txt |   8 +
 .../drm/ci/xfails/mediatek-mt8183-fails.txt   |  13 --
 .../ci/xfails/mediatek-mt8183-gpu-skips.txt   |   2 +
 ...fails.txt => meson-g12b-display-fails.txt} |   3 -
 .../drm/ci/xfails/meson-g12b-gpu-fails.txt|   1 +
 .../drm/ci/xfails/meson-g12b-gpu-skips.txt|   2 +
 .../xfails/rockchip-rk3288-display-fails.txt  |  21 +++
 .../xfails/rockchip-rk3288-display-flakes.txt |  17 ++
 .../xfails/rockchip-rk3288-display-skips.txt  |   8 +
 .../drm/ci/xfails/rockchip-rk3288-fails.txt   |  54 ---
 .../ci/xfails/rockchip-rk3288-gpu-fails.txt   |   1 +
 .../ci/xfails/rockchip-rk3288-gpu-skips.txt   |   2 +
 .../drm/ci/xfails/rockchip-rk3288-skips.txt   |  52 --
 txt => rockchip-rk3399-display-fails.txt} |  38 +++--
 .../xfails/rockchip-rk3399-display-flakes.txt |  23 +++
 .../xfails/rockchip-rk3399-display-skips.txt  |   6 +
 .../drm/ci/xfails/rockchip-rk3399-flakes.txt  |   7 -
 .../ci/xfails/rockchip-rk3399-gpu-fails.txt   |   1 +
 .../ci/xfails/rockchip-rk3399-gpu-skips.txt   |   2 +
 .../drm/ci/xfails/rockchip-rk3399-skips.txt   |   5 -
 39 files changed, 686 insertions(+), 279 deletions(-)
 create mode 100644 drivers/gpu/drm/ci/testlist-amdgpu.txt
 create mode 100644 drivers/gpu/drm/ci/testlist-msm.txt
 create mode 100644 drivers/gpu/drm/ci/testlist-panfrost.txt
 create mode 100644 drivers/gpu/drm/ci/testlist-v3d.txt
 create mode 100644 drivers/gpu/drm/ci/testlist-vc4.txt
 rename drivers/gpu/drm/ci/xfails/{mediatek-mt8173-fails.txt => mediatek-mt8173-display-fails.txt} (59%)
 create mode 100644 drivers/gpu/drm/ci/xfails/mediatek-mt8173-display-flakes.txt
 create mode 100644 drivers/gpu/drm/ci/xfails/mediatek-mt8183-display-fails.txt
 create mode 100644 drivers/gpu/drm/ci/xfails/mediatek-mt8183-display-flakes.txt
 delete mode 100644 drivers/gpu/drm/ci/xfails/mediatek-mt8183-fails.txt
 create mode 100644 drivers/gpu/drm/ci/xfails/mediatek-mt8183-gpu-skips.txt
 rename drivers/gpu/drm/ci/xfails/{meson-g12b-fails.txt => meson-g12b-display-fails.txt} (84%)
 create mode 100644 drivers/gpu/drm/ci/xfails/meson-g12b-gpu-fails.txt
 create mode 100644 drivers/gpu/drm/ci/xfails/meson-g12b-gpu-skips.txt
 create mode 100644 drivers/gpu/drm/ci/xfails/rockchip-rk3288-display-fails.txt
 create mode 100644 

Re: [PATCH v2 1/1] drm/virtio: Implement device_attach

2024-01-30 Thread Christian König

On 30.01.24 at 12:16, Daniel Vetter wrote:

On Tue, Jan 30, 2024 at 12:10:31PM +0100, Daniel Vetter wrote:

On Mon, Jan 29, 2024 at 06:31:19PM +0800, Julia Zhang wrote:

vram objects don't have backing pages and thus can't implement the
drm_gem_object_funcs.get_sg_table callback. This removes the drm dma-buf
callbacks in virtgpu_gem_map_dma_buf()/virtgpu_gem_unmap_dma_buf()
and implements virtgpu specific map/unmap/attach callbacks to support
both shmem objects and vram objects.

Signed-off-by: Julia Zhang 
---
  drivers/gpu/drm/virtio/virtgpu_prime.c | 40 +++---
  1 file changed, 36 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_prime.c b/drivers/gpu/drm/virtio/virtgpu_prime.c
index 44425f20d91a..b490a5343b06 100644
--- a/drivers/gpu/drm/virtio/virtgpu_prime.c
+++ b/drivers/gpu/drm/virtio/virtgpu_prime.c
@@ -49,11 +49,26 @@ virtgpu_gem_map_dma_buf(struct dma_buf_attachment *attach,
  {
struct drm_gem_object *obj = attach->dmabuf->priv;
struct virtio_gpu_object *bo = gem_to_virtio_gpu_obj(obj);
+   struct sg_table *sgt;
+   int ret;
  
  	if (virtio_gpu_is_vram(bo))

return virtio_gpu_vram_map_dma_buf(bo, attach->dev, dir);
  
-	return drm_gem_map_dma_buf(attach, dir);

+   sgt = drm_prime_pages_to_sg(obj->dev,
+   to_drm_gem_shmem_obj(obj)->pages,
+   obj->size >> PAGE_SHIFT);
+   if (IS_ERR(sgt))
+   return sgt;
+
+   ret = dma_map_sgtable(attach->dev, sgt, dir, DMA_ATTR_SKIP_CPU_SYNC);
+   if (ret) {
+   sg_free_table(sgt);
+   kfree(sgt);
+   return ERR_PTR(ret);
+   }
+
+   return sgt;
  }
  
  static void virtgpu_gem_unmap_dma_buf(struct dma_buf_attachment *attach,

@@ -63,12 +78,29 @@ static void virtgpu_gem_unmap_dma_buf(struct dma_buf_attachment *attach,
struct drm_gem_object *obj = attach->dmabuf->priv;
struct virtio_gpu_object *bo = gem_to_virtio_gpu_obj(obj);
  
+	if (!sgt)

+   return;
+
if (virtio_gpu_is_vram(bo)) {
virtio_gpu_vram_unmap_dma_buf(attach->dev, sgt, dir);
-   return;
+   } else {
+   dma_unmap_sgtable(attach->dev, sgt, dir, DMA_ATTR_SKIP_CPU_SYNC);
+   sg_free_table(sgt);
+   kfree(sgt);
}
+}
+
+static int virtgpu_gem_device_attach(struct dma_buf *dma_buf,
+struct dma_buf_attachment *attach)
+{
+   struct drm_gem_object *obj = attach->dmabuf->priv;
+   struct virtio_gpu_object *bo = gem_to_virtio_gpu_obj(obj);
+   int ret = 0;
+
+   if (!virtio_gpu_is_vram(bo) && obj->funcs->pin)
+   ret = obj->funcs->pin(obj);
  
-	drm_gem_unmap_dma_buf(attach, sgt, dir);

+   return ret;

This doesn't look like what I've expected. There should be no need to
change the map/unmap functions, especially not for the usual gem bo case.
We should definitely keep using the exact same code for that. Instead all
I expected is roughly

virtgpu_gem_device_attach()
{
	if (virtio_gpu_is_vram(bo)) {
		if (can_access_virtio_vram_directly(attach->dev))
			return 0;
		else
			return -EBUSY;
	} else {
		return drm_gem_map_attach();
	}
}

Note that I think can_access_virtio_vram_directly() needs to be
implemented first. I'm not even sure it's possible, might be that all the
importers need to set the attachment->peer2peer flag. Which is why this
thing exists really. But that's a pile more work to do.


Yeah, that is really just speculative. All importers need to set the 
peer2peer flag just in case.


What happens under the hood is that IOMMU redirects the "VRAM" memory 
access to whatever address the DMA-buf on the host is pointing to 
(system, VRAM, doorbell, IOMMU, whatever).


I'm also not 100% sure if all the cache snooping is done correctly in 
all cases, but for now it seems to work.




Frankly the more I look at the original patch that added vram export
support the more this just looks like a "pls revert, this is just too
broken".

The commit I mean is this one: ea5ea3d8a117 ("drm/virtio: support mapping
exported vram"). The commit message definitely needs to cite that one, and
also needs a cc: stable because not rejecting invalid imports is a pretty
big deal.


Yeah, I've pointed out that commit in an internal discussion as well. I 
was just not aware that it's that severely broken.


Regards,
Christian.



Also adding David.
-Sima


We should definitely not open-code any functions for the gem_bo export
case, which your patch seems to do? Or maybe I'm just extremely confused.
-Sima

  
  static const struct virtio_dma_buf_ops virtgpu_dmabuf_ops =  {

@@ -83,7 +115,7 @@ static const struct virtio_dma_buf_ops virtgpu_dmabuf_ops =  
{
.vmap = drm_gem_dmabuf_vmap,
.vunmap = 

[PATCH] drm/amdgpu: Clear the hotplug interrupt ack bit before hpd initialization

2024-01-30 Thread Qiang Ma
Problem:
If the HDMI display is unplugged while the BIOS is initializing and is
plugged back in after the system is up, the hotplug interrupt function
is never entered and the display stays dark.

Fix:
When this happens the hpd interrupt ack bit is 1, so clear the interrupt
during hpd_init. Once the driver is ready it can then respond to hpd
interrupts normally.
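
The idea can be modelled in user space. The sketch below is a toy, not the amdgpu register interface: the `toy_*` names, the bit positions, and the assumption that writing the ack bit clears a latched status bit are all made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NUM_HPD          6
#define TOY_INT_ACK_MASK     (1u << 0)  /* hypothetical bit position */
#define TOY_INT_STATUS_MASK  (1u << 1)  /* hypothetical bit position */

/* Toy device: one control word and one latched status word per HPD pin. */
struct toy_dev {
	uint32_t int_control[TOY_NUM_HPD];
	uint32_t int_status[TOY_NUM_HPD];
};

/*
 * Model of a register write (assumption): setting the ack bit clears the
 * latched status, and the ack bit itself is not stored.
 */
static void toy_write_control(struct toy_dev *dev, int hpd, uint32_t val)
{
	if (val & TOY_INT_ACK_MASK)
		dev->int_status[hpd] &= ~TOY_INT_STATUS_MASK;
	dev->int_control[hpd] = val & ~TOY_INT_ACK_MASK;
}

/* Read-modify-write ack with a bounds check; returns 0 or -1. */
int toy_hpd_int_ack(struct toy_dev *dev, int hpd)
{
	uint32_t tmp;

	if (hpd < 0 || hpd >= TOY_NUM_HPD)
		return -1;

	tmp = dev->int_control[hpd];
	tmp |= TOY_INT_ACK_MASK;
	toy_write_control(dev, hpd, tmp);
	return 0;
}

/* Init path: drop any stale interrupt left over from before the driver loaded. */
void toy_hpd_init(struct toy_dev *dev)
{
	for (int hpd = 0; hpd < TOY_NUM_HPD; hpd++)
		toy_hpd_int_ack(dev, hpd);
}
```

In this model a status bit latched before `toy_hpd_init()` runs is gone afterwards, which is the behaviour the fix is after.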

Signed-off-by: Qiang Ma 
---
 drivers/gpu/drm/amd/amdgpu/dce_v10_0.c |  2 ++
 drivers/gpu/drm/amd/amdgpu/dce_v11_0.c |  2 ++
 drivers/gpu/drm/amd/amdgpu/dce_v6_0.c  | 20 +---
 drivers/gpu/drm/amd/amdgpu/dce_v8_0.c  | 20 +---
 4 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c
index bb666cb7522e..11859059fd10 100644
--- a/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c
@@ -51,6 +51,7 @@
 
 static void dce_v10_0_set_display_funcs(struct amdgpu_device *adev);
 static void dce_v10_0_set_irq_funcs(struct amdgpu_device *adev);
+static void dce_v10_0_hpd_int_ack(struct amdgpu_device *adev, int hpd);
 
 static const u32 crtc_offsets[] = {
CRTC0_REGISTER_OFFSET,
@@ -363,6 +364,7 @@ static void dce_v10_0_hpd_init(struct amdgpu_device *adev)
AMDGPU_HPD_DISCONNECT_INT_DELAY_IN_MS);
WREG32(mmDC_HPD_TOGGLE_FILT_CNTL + 
hpd_offsets[amdgpu_connector->hpd.hpd], tmp);
 
+   dce_v10_0_hpd_int_ack(adev, amdgpu_connector->hpd.hpd);
dce_v10_0_hpd_set_polarity(adev, amdgpu_connector->hpd.hpd);
amdgpu_irq_get(adev, &adev->hpd_irq,
   amdgpu_connector->hpd.hpd);
diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
index 7af277f61cca..745e4fdffade 100644
--- a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
@@ -51,6 +51,7 @@
 
 static void dce_v11_0_set_display_funcs(struct amdgpu_device *adev);
 static void dce_v11_0_set_irq_funcs(struct amdgpu_device *adev);
+static void dce_v11_0_hpd_int_ack(struct amdgpu_device *adev, int hpd);
 
 static const u32 crtc_offsets[] =
 {
@@ -387,6 +388,7 @@ static void dce_v11_0_hpd_init(struct amdgpu_device *adev)
AMDGPU_HPD_DISCONNECT_INT_DELAY_IN_MS);
WREG32(mmDC_HPD_TOGGLE_FILT_CNTL + 
hpd_offsets[amdgpu_connector->hpd.hpd], tmp);
 
+   dce_v11_0_hpd_int_ack(adev, amdgpu_connector->hpd.hpd);
dce_v11_0_hpd_set_polarity(adev, amdgpu_connector->hpd.hpd);
amdgpu_irq_get(adev, &adev->hpd_irq, amdgpu_connector->hpd.hpd);
}
diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v6_0.c 
b/drivers/gpu/drm/amd/amdgpu/dce_v6_0.c
index 143efc37a17f..f8e15ebf74b4 100644
--- a/drivers/gpu/drm/amd/amdgpu/dce_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/dce_v6_0.c
@@ -272,6 +272,21 @@ static void dce_v6_0_hpd_set_polarity(struct amdgpu_device 
*adev,
WREG32(mmDC_HPD1_INT_CONTROL + hpd_offsets[hpd], tmp);
 }
 
+static void dce_v6_0_hpd_int_ack(struct amdgpu_device *adev,
+int hpd)
+{
+   u32 tmp;
+
+   if (hpd >= adev->mode_info.num_hpd) {
+   DRM_DEBUG("invalid hdp %d\n", hpd);
+   return;
+   }
+
+   tmp = RREG32(mmDC_HPD1_INT_CONTROL + hpd_offsets[hpd]);
+   tmp |= DC_HPD1_INT_CONTROL__DC_HPD1_INT_ACK_MASK;
+   WREG32(mmDC_HPD1_INT_CONTROL + hpd_offsets[hpd], tmp);
+}
+
 /**
  * dce_v6_0_hpd_init - hpd setup callback.
  *
@@ -311,6 +326,7 @@ static void dce_v6_0_hpd_init(struct amdgpu_device *adev)
continue;
}
 
+   dce_v6_0_hpd_int_ack(adev, amdgpu_connector->hpd.hpd);
dce_v6_0_hpd_set_polarity(adev, amdgpu_connector->hpd.hpd);
amdgpu_irq_get(adev, &adev->hpd_irq, amdgpu_connector->hpd.hpd);
}
@@ -3101,9 +3117,7 @@ static int dce_v6_0_hpd_irq(struct amdgpu_device *adev,
mask = interrupt_status_offsets[hpd].hpd;
 
if (disp_int & mask) {
-   tmp = RREG32(mmDC_HPD1_INT_CONTROL + hpd_offsets[hpd]);
-   tmp |= DC_HPD1_INT_CONTROL__DC_HPD1_INT_ACK_MASK;
-   WREG32(mmDC_HPD1_INT_CONTROL + hpd_offsets[hpd], tmp);
+   dce_v6_0_hpd_int_ack(adev, hpd);
schedule_delayed_work(&adev->hotplug_work, 0);
DRM_DEBUG("IH: HPD%d\n", hpd + 1);
}
diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v8_0.c 
b/drivers/gpu/drm/amd/amdgpu/dce_v8_0.c
index adeddfb7ff12..141e33a01686 100644
--- a/drivers/gpu/drm/amd/amdgpu/dce_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/dce_v8_0.c
@@ -264,6 +264,21 @@ static void dce_v8_0_hpd_set_polarity(struct amdgpu_device 
*adev,
WREG32(mmDC_HPD1_INT_CONTROL + hpd_offsets[hpd], tmp);
 }
 
+static void dce_v8_0_hpd_int_ack(struct amdgpu_device *adev,
+   

[PATCH] drm/amdgpu: remove asymmetrical irq disabling in vcn 4.0.5 suspend

2024-01-30 Thread Yifan Zhang
No irq is enabled in vcn 4.0.5 resume, so disabling it in suspend leaves
amdgpu_irq_src in a wrong state.
Besides, the current set function callbacks are empty and have no real effect.
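
Why an unmatched put is a problem can be shown with a toy reference count. This is a simplified model, not the amdgpu implementation: the `toy_*` names and the exact error handling are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>

/* Toy interrupt source: counts how many users currently want it enabled. */
struct toy_irq_src {
	int enabled_count;
};

/* Take a reference; the source is considered enabled while the count is > 0. */
int toy_irq_get(struct toy_irq_src *src)
{
	src->enabled_count++;
	return 0;
}

/* Drop a reference; a put without a matching get is reported as an error. */
int toy_irq_put(struct toy_irq_src *src)
{
	if (src->enabled_count == 0)
		return -EINVAL;

	src->enabled_count--;
	return 0;
}
```

If resume never calls the get side, a put in the suspend path hits the error branch on every cycle. Removing the put restores the symmetry.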

Signed-off-by: Yifan Zhang 
---
 drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c   | 17 -
 drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c | 19 ---
 2 files changed, 36 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c 
b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
index 169ed400ee7b..8ab01ae919d2 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
@@ -2017,22 +2017,6 @@ static int vcn_v4_0_set_powergating_state(void *handle, 
enum amd_powergating_sta
return ret;
 }
 
-/**
- * vcn_v4_0_set_interrupt_state - set VCN block interrupt state
- *
- * @adev: amdgpu_device pointer
- * @source: interrupt sources
- * @type: interrupt types
- * @state: interrupt states
- *
- * Set VCN block interrupt state
- */
-static int vcn_v4_0_set_interrupt_state(struct amdgpu_device *adev, struct 
amdgpu_irq_src *source,
-  unsigned type, enum amdgpu_interrupt_state state)
-{
-   return 0;
-}
-
 /**
  * vcn_v4_0_set_ras_interrupt_state - set VCN block RAS interrupt state
  *
@@ -2097,7 +2081,6 @@ static int vcn_v4_0_process_interrupt(struct 
amdgpu_device *adev, struct amdgpu_
 }
 
 static const struct amdgpu_irq_src_funcs vcn_v4_0_irq_funcs = {
-   .set = vcn_v4_0_set_interrupt_state,
.process = vcn_v4_0_process_interrupt,
 };
 
diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c 
b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c
index 2eda30e78f61..49e4c3c09aca 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c
@@ -269,8 +269,6 @@ static int vcn_v4_0_5_hw_fini(void *handle)
vcn_v4_0_5_set_powergating_state(adev, 
AMD_PG_STATE_GATE);
}
}
-
-   amdgpu_irq_put(adev, &adev->vcn.inst[i].irq, 0);
}
 
return 0;
@@ -1668,22 +1666,6 @@ static int vcn_v4_0_5_set_powergating_state(void 
*handle, enum amd_powergating_s
return ret;
 }
 
-/**
- * vcn_v4_0_5_set_interrupt_state - set VCN block interrupt state
- *
- * @adev: amdgpu_device pointer
- * @source: interrupt sources
- * @type: interrupt types
- * @state: interrupt states
- *
- * Set VCN block interrupt state
- */
-static int vcn_v4_0_5_set_interrupt_state(struct amdgpu_device *adev, struct 
amdgpu_irq_src *source,
-   unsigned type, enum amdgpu_interrupt_state state)
-{
-   return 0;
-}
-
 /**
  * vcn_v4_0_5_process_interrupt - process VCN block interrupt
  *
@@ -1726,7 +1708,6 @@ static int vcn_v4_0_5_process_interrupt(struct 
amdgpu_device *adev, struct amdgp
 }
 
 static const struct amdgpu_irq_src_funcs vcn_v4_0_5_irq_funcs = {
-   .set = vcn_v4_0_5_set_interrupt_state,
.process = vcn_v4_0_5_process_interrupt,
 };
 
-- 
2.37.3



Re: [PATCH v3] drm/amdgpu: Fix missing error code in 'gmc_v6/7/8/9_0_hw_init()'

2024-01-30 Thread Christian König

Am 30.01.24 um 11:19 schrieb Srinivasan Shanmugam:

Return 0 for success scenarios in 'gmc_v6/7/8/9_0_hw_init()'

Fixes the below:
drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c:920 gmc_v6_0_hw_init() warn: missing 
error code? 'r'
drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c:1104 gmc_v7_0_hw_init() warn: missing 
error code? 'r'
drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c:1224 gmc_v8_0_hw_init() warn: missing 
error code? 'r'
drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c:2347 gmc_v9_0_hw_init() warn: missing 
error code? 'r'
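
The control flow behind these warnings can be sketched outside the kernel. The model below assumes that any earlier failure has already returned, so `r` is known to be 0 by the time the final return is reached. The `toy_*` names are hypothetical stand-ins, not amdgpu functions.

```c
#include <assert.h>
#include <errno.h>

/* Stand-in for the VRAM check; -EIO on failure as in the Fixes: commit title. */
static int toy_vram_checking(int vram_ok)
{
	return vram_ok ? 0 : -EIO;
}

/*
 * Shape of the fixed hw_init: errors return early, the emulation path
 * returns the check result, and the normal path returns a literal 0
 * instead of a variable that is already known to be 0.
 */
int toy_hw_init(int enable_result, int emu_mode, int vram_ok)
{
	int r = enable_result;

	if (r)
		return r;

	if (emu_mode == 1)
		return toy_vram_checking(vram_ok);

	return 0;
}
```

Returning the literal makes the success path explicit, which is what the static checker was asking for.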

Fixes: 8301de8fcadc ("drm/amdgpu: Fix with right return code '-EIO' in 
'amdgpu_gmc_vram_checking()'")
Cc: Christian König 
Cc: Alex Deucher 
Signed-off-by: Srinivasan Shanmugam 


Reviewed-by: Christian König 


---
v3:
   - Changed from 'return r;' to 'return 0' (Christian)

  drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c | 4 ++--
  drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c | 4 ++--
  drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c | 4 ++--
  drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 4 ++--
  4 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
index 229263e407e0..23b478639921 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
@@ -916,8 +916,8 @@ static int gmc_v6_0_hw_init(void *handle)
  
  	if (amdgpu_emu_mode == 1)

return amdgpu_gmc_vram_checking(adev);
-   else
-   return r;
+
+   return 0;
  }
  
  static int gmc_v6_0_hw_fini(void *handle)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
index d95f719eec55..3da7b6a2b00d 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
@@ -1100,8 +1100,8 @@ static int gmc_v7_0_hw_init(void *handle)
  
  	if (amdgpu_emu_mode == 1)

return amdgpu_gmc_vram_checking(adev);
-   else
-   return r;
+
+   return 0;
  }
  
  static int gmc_v7_0_hw_fini(void *handle)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
index 4eb0cccdb413..969a9e867170 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
@@ -1220,8 +1220,8 @@ static int gmc_v8_0_hw_init(void *handle)
  
  	if (amdgpu_emu_mode == 1)

return amdgpu_gmc_vram_checking(adev);
-   else
-   return r;
+
+   return 0;
  }
  
  static int gmc_v8_0_hw_fini(void *handle)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index a3a11538207b..4a50537252ac 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -2343,8 +2343,8 @@ static int gmc_v9_0_hw_init(void *handle)
  
  	if (amdgpu_emu_mode == 1)

return amdgpu_gmc_vram_checking(adev);
-   else
-   return r;
+
+   return 0;
  }
  
  /**




RE: [PATCH] drm/amdgpu: Need to resume ras during gpu reset for gfx v9_4_3 sriov

2024-01-30 Thread Zhang, Hawking
[AMD Official Use Only - General]

Reviewed-by: Hawking Zhang 

Regards,
Hawking
-Original Message-
From: amd-gfx  On Behalf Of YiPeng Chai
Sent: Tuesday, January 30, 2024 20:10
To: amd-gfx@lists.freedesktop.org
Cc: Wang, Yang(Kevin); Zhou1, Tao; Chai, Thomas; Yang, Stanley;
Chai, Thomas; Li, Candice; Zhang, Hawking
Subject: [PATCH] drm/amdgpu: Need to resume ras during gpu reset for gfx v9_4_3 
sriov

Need to resume ras during gpu reset for
gfx v9_4_3 sriov

Signed-off-by: YiPeng Chai 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index afc0b4eb7f8e..3c393d7d9672 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5724,6 +5724,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
/* Aldebaran and gfx_11_0_3 support ras in SRIOV, so need 
resume ras during reset */
if (amdgpu_ip_version(adev, GC_HWIP, 0) ==
IP_VERSION(9, 4, 2) ||
+   amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 3) 
||
amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(11, 0, 3))
amdgpu_ras_resume(adev);
} else {
--
2.34.1



[PATCH] drm/amdgpu: Need to resume ras during gpu reset for gfx v9_4_3 sriov

2024-01-30 Thread YiPeng Chai
Need to resume ras during gpu reset for
gfx v9_4_3 sriov
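
The version check in the diff compares a packed IP version against a short allow-list. A minimal sketch of that pattern follows. The packing used by `TOY_IP_VERSION` is an assumption for illustration and may not match the driver's `IP_VERSION()` macro.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical packing: major, minor and revision in separate bit fields. */
#define TOY_IP_VERSION(major, minor, rev) \
	(((uint32_t)(major) << 16) | ((uint32_t)(minor) << 8) | (uint32_t)(rev))

/* True for the GC versions that the patched condition lists. */
bool toy_needs_ras_resume(uint32_t gc_version)
{
	return gc_version == TOY_IP_VERSION(9, 4, 2) ||
	       gc_version == TOY_IP_VERSION(9, 4, 3) ||
	       gc_version == TOY_IP_VERSION(11, 0, 3);
}
```

Adding a new part to the list is then a one-line change, as in the patch.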

Signed-off-by: YiPeng Chai 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index afc0b4eb7f8e..3c393d7d9672 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5724,6 +5724,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
/* Aldebaran and gfx_11_0_3 support ras in SRIOV, so need 
resume ras during reset */
if (amdgpu_ip_version(adev, GC_HWIP, 0) ==
IP_VERSION(9, 4, 2) ||
+   amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 3) 
||
amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(11, 0, 3))
amdgpu_ras_resume(adev);
} else {
-- 
2.34.1



Flaky tests for amdgpu

2024-01-30 Thread Vignesh Raman

Hi Maintainers,

There are some flaky tests reported for amdgpu driver testing in drm-ci.

# Board Name: hp-11A-G6-EE-grunt
# IGT Version: 1.28-gb0cc8160e
# Linux Version: 6.7.0-rc3

Pipeline url:
https://gitlab.freedesktop.org/vigneshraman/linux/-/jobs/54373774

# Reported by deqp-runner
amdgpu/amd_pci_unplug@amdgpu_hotunplug_simple
amdgpu/amd_pci_unplug@amdgpu_hotunplug_with_exported_bo

Will add these tests in 
drivers/gpu/drm/ci/xfails/amdgpu-stoney-flakes.txt 
(https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/Documentation/gpu/automated_testing.rst#n70)


Please could you have a look at these test results and let us know if 
you need more information. Thank you.


Regards,
Vignesh


Re: [PATCH v2 1/1] drm/virtio: Implement device_attach

2024-01-30 Thread Daniel Vetter
On Tue, Jan 30, 2024 at 12:10:31PM +0100, Daniel Vetter wrote:
> On Mon, Jan 29, 2024 at 06:31:19PM +0800, Julia Zhang wrote:
> > vram objects don't have backing pages and thus can't implement the
> > drm_gem_object_funcs.get_sg_table callback. This removes the drm dma-buf
> > callbacks in virtgpu_gem_map_dma_buf()/virtgpu_gem_unmap_dma_buf()
> > and implements virtgpu specific map/unmap/attach callbacks to support
> > both shmem objects and vram objects.
> > 
> > Signed-off-by: Julia Zhang 
> > ---
> >  drivers/gpu/drm/virtio/virtgpu_prime.c | 40 +++---
> >  1 file changed, 36 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/virtio/virtgpu_prime.c 
> > b/drivers/gpu/drm/virtio/virtgpu_prime.c
> > index 44425f20d91a..b490a5343b06 100644
> > --- a/drivers/gpu/drm/virtio/virtgpu_prime.c
> > +++ b/drivers/gpu/drm/virtio/virtgpu_prime.c
> > @@ -49,11 +49,26 @@ virtgpu_gem_map_dma_buf(struct dma_buf_attachment 
> > *attach,
> >  {
> > struct drm_gem_object *obj = attach->dmabuf->priv;
> > struct virtio_gpu_object *bo = gem_to_virtio_gpu_obj(obj);
> > +   struct sg_table *sgt;
> > +   int ret;
> >  
> > if (virtio_gpu_is_vram(bo))
> > return virtio_gpu_vram_map_dma_buf(bo, attach->dev, dir);
> >  
> > -   return drm_gem_map_dma_buf(attach, dir);
> > +   sgt = drm_prime_pages_to_sg(obj->dev,
> > +   to_drm_gem_shmem_obj(obj)->pages,
> > +   obj->size >> PAGE_SHIFT);
> > +   if (IS_ERR(sgt))
> > +   return sgt;
> > +
> > +   ret = dma_map_sgtable(attach->dev, sgt, dir, DMA_ATTR_SKIP_CPU_SYNC);
> > +   if (ret) {
> > +   sg_free_table(sgt);
> > +   kfree(sgt);
> > +   return ERR_PTR(ret);
> > +   }
> > +
> > +   return sgt;
> >  }
> >  
> >  static void virtgpu_gem_unmap_dma_buf(struct dma_buf_attachment *attach,
> > @@ -63,12 +78,29 @@ static void virtgpu_gem_unmap_dma_buf(struct 
> > dma_buf_attachment *attach,
> > struct drm_gem_object *obj = attach->dmabuf->priv;
> > struct virtio_gpu_object *bo = gem_to_virtio_gpu_obj(obj);
> >  
> > +   if (!sgt)
> > +   return;
> > +
> > if (virtio_gpu_is_vram(bo)) {
> > virtio_gpu_vram_unmap_dma_buf(attach->dev, sgt, dir);
> > -   return;
> > +   } else {
> > +   dma_unmap_sgtable(attach->dev, sgt, dir, 
> > DMA_ATTR_SKIP_CPU_SYNC);
> > +   sg_free_table(sgt);
> > +   kfree(sgt);
> > }
> > +}
> > +
> > +static int virtgpu_gem_device_attach(struct dma_buf *dma_buf,
> > +struct dma_buf_attachment *attach)
> > +{
> > +   struct drm_gem_object *obj = attach->dmabuf->priv;
> > +   struct virtio_gpu_object *bo = gem_to_virtio_gpu_obj(obj);
> > +   int ret = 0;
> > +
> > +   if (!virtio_gpu_is_vram(bo) && obj->funcs->pin)
> > +   ret = obj->funcs->pin(obj);
> >  
> > -   drm_gem_unmap_dma_buf(attach, sgt, dir);
> > +   return ret;
> 
> This doesn't look like what I've expected. There should be no need to
> change the map/unmap functions, especially not for the usual gem bo case.
> We should definitely keep using the exact same code for that. Instead all
> I expected is roughly
> 
> virtgpu_gem_device_attach()
> {
> 	if (virtio_gpu_is_vram(bo)) {
> 		if (can_access_virtio_vram_directly(attach->dev))
> 			return 0;
> 		else
> 			return -EBUSY;
> 	} else {
> 		return drm_gem_map_attach();
> 	}
> }
> 
> Note that I think can_access_virtio_vram_directly() needs to be
> implemented first. I'm not even sure it's possible, might be that all the
> importers need to set the attachment->peer2peer flag. Which is why this
> thing exists really. But that's a pile more work to do.
> 
> Frankly the more I look at the original patch that added vram export
> support the more this just looks like a "pls revert, this is just too
> broken".

The commit I mean is this one: ea5ea3d8a117 ("drm/virtio: support mapping
exported vram"). The commit message definitely needs to cite that one, and
also needs a cc: stable because not rejecting invalid imports is a pretty
big deal.

Also adding David.
-Sima

> 
> We should definitely not open-code any functions for the gem_bo export
> case, which your patch seems to do? Or maybe I'm just extremely confused.
> -Sima
> 
> >  
> >  static const struct virtio_dma_buf_ops virtgpu_dmabuf_ops =  {
> > @@ -83,7 +115,7 @@ static const struct virtio_dma_buf_ops 
> > virtgpu_dmabuf_ops =  {
> > .vmap = drm_gem_dmabuf_vmap,
> > .vunmap = drm_gem_dmabuf_vunmap,
> > },
> > -   .device_attach = drm_gem_map_attach,
> > +   .device_attach = virtgpu_gem_device_attach,
> > .get_uuid = virtgpu_virtio_get_uuid,
> >  };
> >  
> > -- 
> > 2.34.1
> > 
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH v2 1/1] drm/virtio: Implement device_attach

2024-01-30 Thread Daniel Vetter
On Mon, Jan 29, 2024 at 06:31:19PM +0800, Julia Zhang wrote:
> vram objects don't have backing pages and thus can't implement the
> drm_gem_object_funcs.get_sg_table callback. This removes the drm dma-buf
> callbacks in virtgpu_gem_map_dma_buf()/virtgpu_gem_unmap_dma_buf()
> and implements virtgpu specific map/unmap/attach callbacks to support
> both shmem objects and vram objects.
> 
> Signed-off-by: Julia Zhang 
> ---
>  drivers/gpu/drm/virtio/virtgpu_prime.c | 40 +++---
>  1 file changed, 36 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/virtio/virtgpu_prime.c 
> b/drivers/gpu/drm/virtio/virtgpu_prime.c
> index 44425f20d91a..b490a5343b06 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_prime.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_prime.c
> @@ -49,11 +49,26 @@ virtgpu_gem_map_dma_buf(struct dma_buf_attachment *attach,
>  {
>   struct drm_gem_object *obj = attach->dmabuf->priv;
>   struct virtio_gpu_object *bo = gem_to_virtio_gpu_obj(obj);
> + struct sg_table *sgt;
> + int ret;
>  
>   if (virtio_gpu_is_vram(bo))
>   return virtio_gpu_vram_map_dma_buf(bo, attach->dev, dir);
>  
> - return drm_gem_map_dma_buf(attach, dir);
> + sgt = drm_prime_pages_to_sg(obj->dev,
> + to_drm_gem_shmem_obj(obj)->pages,
> + obj->size >> PAGE_SHIFT);
> + if (IS_ERR(sgt))
> + return sgt;
> +
> + ret = dma_map_sgtable(attach->dev, sgt, dir, DMA_ATTR_SKIP_CPU_SYNC);
> + if (ret) {
> + sg_free_table(sgt);
> + kfree(sgt);
> + return ERR_PTR(ret);
> + }
> +
> + return sgt;
>  }
>  
>  static void virtgpu_gem_unmap_dma_buf(struct dma_buf_attachment *attach,
> @@ -63,12 +78,29 @@ static void virtgpu_gem_unmap_dma_buf(struct 
> dma_buf_attachment *attach,
>   struct drm_gem_object *obj = attach->dmabuf->priv;
>   struct virtio_gpu_object *bo = gem_to_virtio_gpu_obj(obj);
>  
> + if (!sgt)
> + return;
> +
>   if (virtio_gpu_is_vram(bo)) {
>   virtio_gpu_vram_unmap_dma_buf(attach->dev, sgt, dir);
> - return;
> + } else {
> + dma_unmap_sgtable(attach->dev, sgt, dir, 
> DMA_ATTR_SKIP_CPU_SYNC);
> + sg_free_table(sgt);
> + kfree(sgt);
>   }
> +}
> +
> +static int virtgpu_gem_device_attach(struct dma_buf *dma_buf,
> +  struct dma_buf_attachment *attach)
> +{
> + struct drm_gem_object *obj = attach->dmabuf->priv;
> + struct virtio_gpu_object *bo = gem_to_virtio_gpu_obj(obj);
> + int ret = 0;
> +
> + if (!virtio_gpu_is_vram(bo) && obj->funcs->pin)
> + ret = obj->funcs->pin(obj);
>  
> - drm_gem_unmap_dma_buf(attach, sgt, dir);
> + return ret;

This doesn't look like what I've expected. There should be no need to
change the map/unmap functions, especially not for the usual gem bo case.
We should definitely keep using the exact same code for that. Instead all
I expected is roughly

virtgpu_gem_device_attach()
{
	if (virtio_gpu_is_vram(bo)) {
		if (can_access_virtio_vram_directly(attach->dev))
			return 0;
		else
			return -EBUSY;
	} else {
		return drm_gem_map_attach();
	}
}

Note that I think can_access_virtio_vram_directly() needs to be
implemented first. I'm not even sure it's possible, might be that all the
importers need to set the attachment->peer2peer flag. Which is why this
thing exists really. But that's a pile more work to do.
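
The attach-time decision described above can be modelled as a small user-space sketch. It is only an illustration: the `toy_*` types are invented, and treating the importer's peer2peer flag as the answer to "can this device access vram directly" is an assumption, since the check itself does not exist yet.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy buffer object: either vram-backed or shmem-backed. */
struct toy_bo {
	bool is_vram;
};

/* Toy attachment: records whether the importer opted in to peer-to-peer. */
struct toy_attachment {
	bool peer2peer;
};

/* Stand-in for the generic shmem attach path, which is left unchanged. */
static int toy_shmem_attach(void)
{
	return 0;
}

/*
 * Reject vram imports the importer cannot handle at attach time,
 * and defer to the generic path for everything else.
 */
int toy_device_attach(const struct toy_bo *bo,
		      const struct toy_attachment *attach)
{
	if (bo->is_vram)
		return attach->peer2peer ? 0 : -EBUSY;

	return toy_shmem_attach();
}
```

The point of this shape is that map and unmap stay untouched; only attach gains a new way to fail.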

Frankly the more I look at the original patch that added vram export
support the more this just looks like a "pls revert, this is just too
broken".

We should definitely not open-code any functions for the gem_bo export
case, which your patch seems to do? Or maybe I'm just extremely confused.
-Sima

>  
>  static const struct virtio_dma_buf_ops virtgpu_dmabuf_ops =  {
> @@ -83,7 +115,7 @@ static const struct virtio_dma_buf_ops virtgpu_dmabuf_ops 
> =  {
>   .vmap = drm_gem_dmabuf_vmap,
>   .vunmap = drm_gem_dmabuf_vunmap,
>   },
> - .device_attach = drm_gem_map_attach,
> + .device_attach = virtgpu_gem_device_attach,
>   .get_uuid = virtgpu_virtio_get_uuid,
>  };
>  
> -- 
> 2.34.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch



[PATCH 2/2] use PSP address query command

2024-01-30 Thread Tao Zhou
Get UMC physical address from PSP in RAS error address conversion.
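
The diff below asks firmware first and falls back to the driver's own decoding if that fails. A rough user-space sketch of that pattern follows, together with the column and row bit extraction used in the diff. The `toy_*` names and the stub decoders are hypothetical, and the software decode here is a placeholder rather than the real hash logic.

```c
#include <assert.h>
#include <stdint.h>

/* Subset of the translated address that the caller consumes. */
struct toy_pa {
	uint64_t pa;
	uint32_t bank;
	uint32_t channel_idx;
};

typedef int (*toy_query_fn)(uint64_t err_addr, struct toy_pa *out);

/* Try the firmware query first; use the software path only if it fails. */
int toy_resolve(uint64_t err_addr, toy_query_fn firmware,
		toy_query_fn fallback, struct toy_pa *out)
{
	if (firmware && firmware(err_addr, out) == 0)
		return 0;

	return fallback(err_addr, out);
}

/* Stub: firmware that always reports failure. */
int toy_firmware_unavailable(uint64_t err_addr, struct toy_pa *out)
{
	(void)err_addr;
	(void)out;
	return -1;
}

/* Stub: placeholder software decode that passes the address through. */
int toy_software_decode(uint64_t err_addr, struct toy_pa *out)
{
	out->pa = err_addr;
	out->bank = 0;
	out->channel_idx = 0;
	return 0;
}

/* Field extraction with the same shifts and masks as the diff. */
uint32_t toy_col(uint64_t err_addr) { return (err_addr >> 1) & 0x1fULL; }
uint32_t toy_row(uint64_t err_addr) { return (err_addr >> 10) & 0x3fffULL; }
uint32_t toy_row_xor(uint32_t row)  { return row ^ (1u << 13); }
```

Because both paths fill the same output structure, the code after the query does not need to know which one produced the result.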

Signed-off-by: Tao Zhou 
---
 drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 46 ++
 1 file changed, 39 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c 
b/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c
index 836a4cc1134e..14ef7a24be7b 100644
--- a/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c
@@ -203,14 +203,14 @@ static bool umc_v12_0_bit_wise_xor(uint32_t val)
return result;
 }
 
-static void umc_v12_0_convert_error_address(struct amdgpu_device *adev,
-   struct ras_err_data *err_data, 
uint64_t err_addr,
-   uint32_t ch_inst, uint32_t umc_inst,
-   uint32_t node_inst)
+static void umc_v12_0_mca_addr_to_pa(struct amdgpu_device *adev,
+   uint64_t err_addr, uint32_t ch_inst, 
uint32_t umc_inst,
+   uint32_t node_inst,
+   struct ta_ras_query_address_output 
*addr_out)
 {
uint32_t channel_index, i;
-   uint64_t soc_pa, na, retired_page, column;
-   uint32_t bank_hash0, bank_hash1, bank_hash2, bank_hash3, col, row, 
row_xor;
+   uint64_t na, soc_pa;
+   uint32_t bank_hash0, bank_hash1, bank_hash2, bank_hash3, col, row;
uint32_t bank0, bank1, bank2, bank3, bank;
 
bank_hash0 = (err_addr >> UMC_V12_0_MCA_B0_BIT) & 0x1ULL;
@@ -260,12 +260,44 @@ static void umc_v12_0_convert_error_address(struct 
amdgpu_device *adev,
/* the umc channel bits are not original values, they are hashed */
UMC_V12_0_SET_CHANNEL_HASH(channel_index, soc_pa);
 
+   addr_out->pa.pa = soc_pa;
+   addr_out->pa.bank = bank;
+   addr_out->pa.channel_idx = channel_index;
+}
+
+static void umc_v12_0_convert_error_address(struct amdgpu_device *adev,
+   struct ras_err_data *err_data, 
uint64_t err_addr,
+   uint32_t ch_inst, uint32_t umc_inst,
+   uint32_t node_inst)
+{
+   uint32_t col, row, row_xor, bank, channel_index;
+   uint64_t soc_pa, retired_page, column;
+   struct ta_ras_query_address_input addr_in;
+   struct ta_ras_query_address_output addr_out;
+
+   addr_in.addr_type = TA_RAS_MCA_TO_PA;
+   addr_in.ma.err_addr = err_addr;
+   addr_in.ma.ch_inst = ch_inst;
+   addr_in.ma.umc_inst = umc_inst;
+   addr_in.ma.node_inst = node_inst;
+
+   if (psp_ras_query_address(&adev->psp, &addr_in, &addr_out))
+   /* fallback to old path if fail to get pa from psp */
+   umc_v12_0_mca_addr_to_pa(adev, err_addr, ch_inst, umc_inst,
+   node_inst, &addr_out);
+
+   soc_pa = addr_out.pa.pa;
+   bank = addr_out.pa.bank;
+   channel_index = addr_out.pa.channel_idx;
+
+   col = (err_addr >> 1) & 0x1fULL;
+   row = (err_addr >> 10) & 0x3fffULL;
+   row_xor = row ^ (0x1ULL << 13);
/* clear [C3 C2] in soc physical address */
soc_pa &= ~(0x3ULL << UMC_V12_0_PA_C2_BIT);
/* clear [C4] in soc physical address */
soc_pa &= ~(0x1ULL << UMC_V12_0_PA_C4_BIT);
 
-   row_xor = row ^ (0x1ULL << 13);
/* loop for all possibilities of [C4 C3 C2] */
for (column = 0; column < UMC_V12_0_NA_MAP_PA_NUM; column++) {
retired_page = soc_pa | ((column & 0x3) << UMC_V12_0_PA_C2_BIT);
-- 
2.34.1



[PATCH 1/2] add PSP RAS address query command

2024-01-30 Thread Tao Zhou
Convert mca address to physical address or vice versa via RAS TA.
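
The new helper in the diff below follows a request/response pattern over a shared buffer: clear the buffer, fill in the command and input, invoke, check status, copy the output. A minimal sketch of that pattern is given here. Everything prefixed `toy_` is invented for illustration, and the "firmware" is a stub with a made-up translation.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

enum toy_cmd { TOY_CMD_QUERY_ADDRESS = 5 };  /* value is arbitrary in this model */

struct toy_addr_in  { uint64_t err_addr; };
struct toy_addr_out { uint64_t pa; };

/* Buffer shared between the caller and the (stubbed) firmware. */
struct toy_shared_mem {
	uint32_t cmd_id;
	uint32_t status;
	struct toy_addr_in in;
	struct toy_addr_out out;
};

struct toy_ctx {
	int initialized;
	struct toy_shared_mem shared;
};

/* Stub firmware: unknown commands set a non-zero status. */
static int toy_invoke(struct toy_ctx *ctx)
{
	if (ctx->shared.cmd_id != TOY_CMD_QUERY_ADDRESS) {
		ctx->shared.status = 1;
		return 0;
	}

	ctx->shared.out.pa = ctx->shared.in.err_addr + 0x1000; /* made-up translation */
	ctx->shared.status = 0;
	return 0;
}

/* Returns 0 and fills *out on success, -EINVAL otherwise. */
int toy_query_address(struct toy_ctx *ctx, const struct toy_addr_in *in,
		      struct toy_addr_out *out)
{
	struct toy_shared_mem *cmd = &ctx->shared;

	if (!ctx->initialized)
		return -EINVAL;

	memset(cmd, 0, sizeof(*cmd));
	cmd->cmd_id = TOY_CMD_QUERY_ADDRESS;
	cmd->in = *in;

	if (toy_invoke(ctx) || cmd->status)
		return -EINVAL;

	*out = cmd->out;
	return 0;
}
```

The output is copied only after both the transport result and the status word have been checked, so a caller never sees a half-filled result.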

Signed-off-by: Tao Zhou 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 25 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h |  3 +++
 drivers/gpu/drm/amd/amdgpu/ta_ras_if.h  | 36 +
 3 files changed, 64 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index 9eff8753f9b9..bb2d419fe914 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -1782,6 +1782,31 @@ int psp_ras_trigger_error(struct psp_context *psp,
 
return 0;
 }
+
+int psp_ras_query_address(struct psp_context *psp,
+ struct ta_ras_query_address_input *addr_in,
+ struct ta_ras_query_address_output *addr_out)
+{
+   struct ta_ras_shared_memory *ras_cmd;
+   int ret;
+
+   if (!psp->ras_context.context.initialized)
+   return -EINVAL;
+
+   ras_cmd = (struct ta_ras_shared_memory *)psp->ras_context.context.mem_context.shared_buf;
+   memset(ras_cmd, 0, sizeof(struct ta_ras_shared_memory));
+
+   ras_cmd->cmd_id = TA_RAS_COMMAND__QUERY_ADDRESS;
+   ras_cmd->ras_in_message.address = *addr_in;
+
+   ret = psp_ras_invoke(psp, ras_cmd->cmd_id);
+   if (ret || ras_cmd->ras_status || psp->cmd_buf_mem->resp.status)
+   return -EINVAL;
+
+   *addr_out = ras_cmd->ras_out_message.address;
+
+   return 0;
+}
 // ras end
 
 // HDCP start
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h
index 652b0a01854a..9951bdd022de 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h
@@ -502,6 +502,9 @@ int psp_ras_enable_features(struct psp_context *psp,
 int psp_ras_trigger_error(struct psp_context *psp,
  struct ta_ras_trigger_error_input *info, uint32_t instance_mask);
 int psp_ras_terminate(struct psp_context *psp);
+int psp_ras_query_address(struct psp_context *psp,
+ struct ta_ras_query_address_input *addr_in,
+ struct ta_ras_query_address_output *addr_out);
 
 int psp_hdcp_invoke(struct psp_context *psp, uint32_t ta_cmd_id);
 int psp_dtm_invoke(struct psp_context *psp, uint32_t ta_cmd_id);
diff --git a/drivers/gpu/drm/amd/amdgpu/ta_ras_if.h 
b/drivers/gpu/drm/amd/amdgpu/ta_ras_if.h
index 879bb7af297c..056d4df8fa1f 100644
--- a/drivers/gpu/drm/amd/amdgpu/ta_ras_if.h
+++ b/drivers/gpu/drm/amd/amdgpu/ta_ras_if.h
@@ -36,6 +36,9 @@ enum ras_command {
TA_RAS_COMMAND__ENABLE_FEATURES = 0,
TA_RAS_COMMAND__DISABLE_FEATURES,
TA_RAS_COMMAND__TRIGGER_ERROR,
+   TA_RAS_COMMAND__QUERY_BLOCK_INFO,
+   TA_RAS_COMMAND__QUERY_SUB_BLOCK_INFO,
+   TA_RAS_COMMAND__QUERY_ADDRESS,
 };
 
 enum ta_ras_status {
@@ -105,6 +108,11 @@ enum ta_ras_error_type {
TA_RAS_ERROR__POISON= 8,
 };
 
+enum ta_ras_address_type {
+   TA_RAS_MCA_TO_PA,
+   TA_RAS_PA_TO_MCA,
+};
+
 /* Input/output structures for RAS commands */
 /**/
 
@@ -133,12 +141,38 @@ struct ta_ras_init_flags {
uint8_t channel_dis_num;
 };
 
+struct ta_ras_mca_addr {
+   uint64_t err_addr;
+   uint32_t ch_inst;
+   uint32_t umc_inst;
+   uint32_t node_inst;
+};
+
+struct ta_ras_phy_addr {
+   uint64_t pa;
+   uint32_t bank;
+   uint32_t channel_idx;
+};
+
+struct ta_ras_query_address_input {
+   enum ta_ras_address_type addr_type;
+   struct ta_ras_mca_addr ma;
+   struct ta_ras_phy_addr pa;
+};
+
 struct ta_ras_output_flags {
uint8_t ras_init_success_flag;
uint8_t err_inject_switch_disable_flag;
uint8_t reg_access_failure_flag;
 };
 
+struct ta_ras_query_address_output {
+   /* don't use the flags here */
+   struct ta_ras_output_flags flags;
+   struct ta_ras_mca_addr ma;
+   struct ta_ras_phy_addr pa;
+};
+
 /* Common input structure for RAS callbacks */
 /**/
 union ta_ras_cmd_input {
@@ -146,12 +180,14 @@ union ta_ras_cmd_input {
struct ta_ras_enable_features_input enable_features;
struct ta_ras_disable_features_inputdisable_features;
struct ta_ras_trigger_error_input   trigger_error;
+   struct ta_ras_query_address_input   address;
 
uint32_t reserve_pad[256];
 };
 
 union ta_ras_cmd_output {
struct ta_ras_output_flags flags;
+   struct ta_ras_query_address_output address;
 
uint32_t reserve_pad[256];
 };
-- 
2.34.1
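
As a rough illustration of the request/response flow this patch adds, the sketch below mimics it in user space: clear a shared buffer, copy the input in, invoke the "TA", check the status, copy the output out. Everything prefixed `sketch_` is hypothetical, including the shared-buffer layout, the command id, and the two fake firmware callbacks; only the shape of the address structs follows the patch.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Field layouts follow the patch; the names are stand-ins. */
enum sketch_address_type { SKETCH_MCA_TO_PA, SKETCH_PA_TO_MCA };

struct sketch_mca_addr {
	uint64_t err_addr;
	uint32_t ch_inst;
	uint32_t umc_inst;
	uint32_t node_inst;
};

struct sketch_phy_addr {
	uint64_t pa;
	uint32_t bank;
	uint32_t channel_idx;
};

struct sketch_query_in {
	enum sketch_address_type addr_type;
	struct sketch_mca_addr ma;
	struct sketch_phy_addr pa;
};

struct sketch_query_out {
	struct sketch_mca_addr ma;
	struct sketch_phy_addr pa;
};

/* Hypothetical shared buffer: command id, status, one input, one output. */
struct sketch_shared_mem {
	uint32_t cmd_id;
	uint32_t status;
	struct sketch_query_in in;
	struct sketch_query_out out;
};

/* Stand-in for the firmware side; returns nonzero on transport failure. */
typedef int (*sketch_invoke_fn)(struct sketch_shared_mem *mem);

static int sketch_query_address(struct sketch_shared_mem *mem,
				sketch_invoke_fn invoke,
				const struct sketch_query_in *in,
				struct sketch_query_out *out)
{
	memset(mem, 0, sizeof(*mem));
	mem->cmd_id = 5;	/* arbitrary id chosen for the sketch */
	mem->in = *in;

	/* Any transport or status failure collapses to -EINVAL, as in the patch. */
	if (invoke(mem) || mem->status)
		return -EINVAL;

	*out = mem->out;
	return 0;
}

/* Fake firmware used only to exercise the sketch: pa = err_addr + 0x1000. */
static int sketch_fake_ta(struct sketch_shared_mem *mem)
{
	mem->out.pa.pa = mem->in.ma.err_addr + 0x1000;
	mem->out.pa.channel_idx = mem->in.ma.ch_inst;
	return 0;
}

/* Fake firmware that reports a failure status. */
static int sketch_failing_ta(struct sketch_shared_mem *mem)
{
	mem->status = 1;
	return 0;
}
```

A caller would fill `addr_type` and the `ma` fields, call `sketch_query_address()`, and fall back to another translation path when it returns nonzero, which is how the UMC patch earlier in this thread uses the real helper.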



Re: [PATCH v3 3/3] drm/amdgpu: Implement check_async_props for planes

2024-01-30 Thread Simon Ser
> Do we really need this much flexibility, especially for the first driver
> adding the first few additional properties?

AFAIU we'd like to allow more props as well, e.g. cursor position…


Re: [PATCH v3 3/3] drm/amdgpu: Implement check_async_props for planes

2024-01-30 Thread Daniel Vetter
On Sun, Jan 28, 2024 at 06:25:15PM -0300, André Almeida wrote:
> AMD GPUs can do async flips with changes on more properties than just
> the FB ID, so implement a custom check_async_props for AMD planes.
> 
> Allow amdgpu to do async flips with overlay planes as well.
> 
> Signed-off-by: André Almeida 
> ---
> v3: allow overlay planes

This comment is very much written with a lack of clearly better ideas, but:

Do we really need this much flexibility, especially for the first driver
adding the first few additional properties?

A simple bool on struct drm_plane to indicate whether async flips are ok
or not should also do this job here? Maybe a bit of work to roll that out
to the primary planes for current drivers, but not much. And wouldn't need
drivers to implement some very uapi-marshalling atomic code ...

Also we could probably remove the current drm_mode_config.async_flip flag
and entirely replace it with the per-plane one.
-Sima
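
To make the per-plane bool suggestion above concrete, here is a minimal sketch of what such a check could look like. It is purely illustrative: `struct sketch_plane`, its `async_flip` field, and `sketch_check_async_flip()` are hypothetical names, not existing DRM API, and the real core check would also have to look at which properties changed.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins; not the real DRM structures. */
enum sketch_plane_type {
	SKETCH_PLANE_PRIMARY,
	SKETCH_PLANE_OVERLAY,
	SKETCH_PLANE_CURSOR,
};

struct sketch_plane {
	enum sketch_plane_type type;
	bool async_flip;	/* hypothetical per-plane capability flag */
};

/* The core would only need to consult the flag; no driver callback involved. */
static int sketch_check_async_flip(const struct sketch_plane *plane)
{
	return plane->async_flip ? 0 : -EINVAL;
}
```

With this shape a driver opts a plane in by setting the flag at plane init time, and the plane type no longer needs to be checked in driver code.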
> 
>  .../amd/display/amdgpu_dm/amdgpu_dm_plane.c   | 29 +++
>  1 file changed, 29 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c 
> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
> index 116121e647ca..ed75b69636b4 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
> @@ -25,6 +25,7 @@
>   */
>  
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -1430,6 +1431,33 @@ static void 
> amdgpu_dm_plane_drm_plane_destroy_state(struct drm_plane *plane,
>   drm_atomic_helper_plane_destroy_state(plane, state);
>  }
>  
> +static int amdgpu_dm_plane_check_async_props(struct drm_property *prop,
> +   struct drm_plane *plane,
> +   struct drm_plane_state *plane_state,
> +   struct drm_mode_object *obj,
> +   u64 prop_value, u64 old_val)
> +{
> + struct drm_mode_config *config = &plane->dev->mode_config;
> + int ret;
> +
> + if (prop != config->prop_fb_id &&
> + prop != config->prop_in_fence_fd) {
> + ret = drm_atomic_plane_get_property(plane, plane_state,
> + prop, &old_val);
> + return drm_atomic_check_prop_changes(ret, old_val, prop_value, prop);
> + }
> +
> + if (plane_state->plane->type != DRM_PLANE_TYPE_PRIMARY &&
> + plane_state->plane->type != DRM_PLANE_TYPE_OVERLAY) {
> + drm_dbg_atomic(prop->dev,
> +"[OBJECT:%d] Only primary or overlay planes can be changed during async flip\n",
> +obj->id);
> + return -EINVAL;
> + }
> +
> + return 0;
> +}
> +
>  static const struct drm_plane_funcs dm_plane_funcs = {
>   .update_plane   = drm_atomic_helper_update_plane,
>   .disable_plane  = drm_atomic_helper_disable_plane,
> @@ -1438,6 +1466,7 @@ static const struct drm_plane_funcs dm_plane_funcs = {
>   .atomic_duplicate_state = amdgpu_dm_plane_drm_plane_duplicate_state,
>   .atomic_destroy_state = amdgpu_dm_plane_drm_plane_destroy_state,
>   .format_mod_supported = amdgpu_dm_plane_format_mod_supported,
> + .check_async_props = amdgpu_dm_plane_check_async_props,
>  };
>  
>  int amdgpu_dm_plane_init(struct amdgpu_display_manager *dm,
> -- 
> 2.43.0
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


[PATCH v3] drm/amdgpu: Fix missing error code in 'gmc_v6/7/8/9_0_hw_init()'

2024-01-30 Thread Srinivasan Shanmugam
Return 0 for success scenarios in 'gmc_v6/7/8/9_0_hw_init()'

Fixes the below:
drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c:920 gmc_v6_0_hw_init() warn: missing 
error code? 'r'
drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c:1104 gmc_v7_0_hw_init() warn: missing 
error code? 'r'
drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c:1224 gmc_v8_0_hw_init() warn: missing 
error code? 'r'
drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c:2347 gmc_v9_0_hw_init() warn: missing 
error code? 'r'

Fixes: 8301de8fcadc ("drm/amdgpu: Fix with right return code '-EIO' in 
'amdgpu_gmc_vram_checking()'")
Cc: Christian König 
Cc: Alex Deucher 
Signed-off-by: Srinivasan Shanmugam 
---
v3: 
  - Changed from 'return r;' to 'return 0' (Christian)

 drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c | 4 ++--
 drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c | 4 ++--
 drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c | 4 ++--
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 4 ++--
 4 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
index 229263e407e0..23b478639921 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
@@ -916,8 +916,8 @@ static int gmc_v6_0_hw_init(void *handle)
 
if (amdgpu_emu_mode == 1)
return amdgpu_gmc_vram_checking(adev);
-   else
-   return r;
+
+   return 0;
 }
 
 static int gmc_v6_0_hw_fini(void *handle)
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
index d95f719eec55..3da7b6a2b00d 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
@@ -1100,8 +1100,8 @@ static int gmc_v7_0_hw_init(void *handle)
 
if (amdgpu_emu_mode == 1)
return amdgpu_gmc_vram_checking(adev);
-   else
-   return r;
+
+   return 0;
 }
 
 static int gmc_v7_0_hw_fini(void *handle)
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
index 4eb0cccdb413..969a9e867170 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
@@ -1220,8 +1220,8 @@ static int gmc_v8_0_hw_init(void *handle)
 
if (amdgpu_emu_mode == 1)
return amdgpu_gmc_vram_checking(adev);
-   else
-   return r;
+
+   return 0;
 }
 
 static int gmc_v8_0_hw_fini(void *handle)
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index a3a11538207b..4a50537252ac 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -2343,8 +2343,8 @@ static int gmc_v9_0_hw_init(void *handle)
 
if (amdgpu_emu_mode == 1)
return amdgpu_gmc_vram_checking(adev);
-   else
-   return r;
+
+   return 0;
 }
 
 /**
-- 
2.34.1
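
The reason `return 0` is safe here can be shown with a small stand-alone sketch. It assumes, as the review of v2 notes for GMC v6, that an earlier `if (r) return r;` guarantees `r` is zero by the time the tail of the function runs. The function and parameter names below are hypothetical and only model the control flow.

```c
/*
 * Models the tail of a hw_init() function:
 *   enable_result     - result of the earlier setup step (the 'r' in the patch)
 *   emu_mode          - stands in for amdgpu_emu_mode
 *   vram_check_result - what the VRAM check would return in emulation mode
 */
static int sketch_hw_init(int emu_mode, int enable_result, int vram_check_result)
{
	int r = enable_result;

	if (r)
		return r;	/* any failure leaves the function here */

	if (emu_mode == 1)
		return vram_check_result;

	/* r can only be 0 at this point, so returning 0 is equivalent and clearer. */
	return 0;
}
```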



Re: [PATCH] drm/amd/display: Fix buffer overflow in 'get_host_router_total_dp_tunnel_bw()'

2024-01-30 Thread Chung, ChiaHsuan (Tom)

It looks good to me.

Reviewed-by: Tom Chung  



On 1/30/2024 5:49 PM, SHANMUGAM, SRINIVASAN wrote:

[AMD Official Use Only - General]

*From:* SHANMUGAM, SRINIVASAN
*Sent:* Tuesday, January 30, 2024 3:18 PM
*To:* Siqueira, Rodrigo ; Pillai, Aurabindo
*Cc:* Cyr, Aric ; amd-gfx@lists.freedesktop.org; Somasundaram, Meenakshikumar ; Huang, PeiChen (Pei-Chen)
*Subject:* Re: [PATCH] drm/amd/display: Fix buffer overflow in 'get_host_router_total_dp_tunnel_bw()'

+ Cc: Tom Chung

On 1/29/2024 9:19 PM, Srinivasan Shanmugam wrote:

The error message buffer overflow 'dc->links' 12 <= 12 suggests that the
code is trying to access an element of the dc->links array that is
beyond its bounds. In C, arrays are zero-indexed, so an array with 12
elements has valid indices from 0 to 11. Trying to access dc->links[12]
would be an attempt to access the 13th element of a 12-element array,
which is a buffer overflow.

To fix this, ensure that the loop does not go beyond the last valid
index when accessing dc->links[i + 1] by subtracting 1 from the loop
condition.

This would ensure that i + 1 is always a valid index in the array.

Fixes the below:
drivers/gpu/drm/amd/amdgpu/../display/dc/link/protocols/link_dp_dpia_bw.c:208 
get_host_router_total_dp_tunnel_bw() error: buffer overflow 'dc->links' 12 <= 12

Fixes: 9ed0893b7c58 ("drm/amd/display: Add dpia display mode validation logic")
Cc: PeiChen Huang
Cc: Aric Cyr
Cc: Rodrigo Siqueira
Cc: Aurabindo Pillai
Cc: Meenakshikumar Somasundaram
Signed-off-by: Srinivasan Shanmugam
---
  drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_dpia_bw.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_dpia_bw.c 
b/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_dpia_bw.c
index dd0d2b206462..5491b707cec8 100644
--- a/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_dpia_bw.c
+++ b/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_dpia_bw.c
@@ -196,7 +196,7 @@ static int get_host_router_total_dp_tunnel_bw(const struct dc *dc, uint8_t hr_in
 	struct dc_link *link_dpia_primary, *link_dpia_secondary;
 	int total_bw = 0;
 
-	for (uint8_t i = 0; i < MAX_PIPES * 2; ++i) {
+	for (uint8_t i = 0; i < (MAX_PIPES * 2) - 1; ++i) {
 
 		if (!dc->links[i] || dc->links[i]->ep_type != DISPLAY_ENDPOINT_USB4_DPIA)
 			continue;


RE: [PATCH] drm/amd/display: Fix buffer overflow in 'get_host_router_total_dp_tunnel_bw()'

2024-01-30 Thread SHANMUGAM, SRINIVASAN
[AMD Official Use Only - General]

From: SHANMUGAM, SRINIVASAN
Sent: Tuesday, January 30, 2024 3:18 PM
To: Siqueira, Rodrigo ; Pillai, Aurabindo
Cc: Cyr, Aric ; amd-gfx@lists.freedesktop.org; Somasundaram, Meenakshikumar ; Huang, PeiChen (Pei-Chen)
Subject: Re: [PATCH] drm/amd/display: Fix buffer overflow in 'get_host_router_total_dp_tunnel_bw()'

+ Cc: Tom Chung

On 1/29/2024 9:19 PM, Srinivasan Shanmugam wrote:

The error message buffer overflow 'dc->links' 12 <= 12 suggests that the
code is trying to access an element of the dc->links array that is
beyond its bounds. In C, arrays are zero-indexed, so an array with 12
elements has valid indices from 0 to 11. Trying to access dc->links[12]
would be an attempt to access the 13th element of a 12-element array,
which is a buffer overflow.

To fix this, ensure that the loop does not go beyond the last valid
index when accessing dc->links[i + 1] by subtracting 1 from the loop
condition.

This would ensure that i + 1 is always a valid index in the array.

Fixes the below:
drivers/gpu/drm/amd/amdgpu/../display/dc/link/protocols/link_dp_dpia_bw.c:208 
get_host_router_total_dp_tunnel_bw() error: buffer overflow 'dc->links' 12 <= 12

Fixes: 9ed0893b7c58 ("drm/amd/display: Add dpia display mode validation logic")
Cc: PeiChen Huang
Cc: Aric Cyr
Cc: Rodrigo Siqueira
Cc: Aurabindo Pillai
Cc: Meenakshikumar Somasundaram
Signed-off-by: Srinivasan Shanmugam
---
 drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_dpia_bw.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_dpia_bw.c 
b/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_dpia_bw.c
index dd0d2b206462..5491b707cec8 100644
--- a/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_dpia_bw.c
+++ b/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_dpia_bw.c
@@ -196,7 +196,7 @@ static int get_host_router_total_dp_tunnel_bw(const struct dc *dc, uint8_t hr_in
 	struct dc_link *link_dpia_primary, *link_dpia_secondary;
 	int total_bw = 0;
 
-	for (uint8_t i = 0; i < MAX_PIPES * 2; ++i) {
+	for (uint8_t i = 0; i < (MAX_PIPES * 2) - 1; ++i) {
 
 		if (!dc->links[i] || dc->links[i]->ep_type != DISPLAY_ENDPOINT_USB4_DPIA)
 			continue;


Re: [PATCH] drm/amd/display: Fix buffer overflow in 'get_host_router_total_dp_tunnel_bw()'

2024-01-30 Thread SRINIVASAN SHANMUGAM

+ Cc: Tom Chung 

On 1/29/2024 9:19 PM, Srinivasan Shanmugam wrote:

The error message buffer overflow 'dc->links' 12 <= 12 suggests that the
code is trying to access an element of the dc->links array that is
beyond its bounds. In C, arrays are zero-indexed, so an array with 12
elements has valid indices from 0 to 11. Trying to access dc->links[12]
would be an attempt to access the 13th element of a 12-element array,
which is a buffer overflow.

To fix this, ensure that the loop does not go beyond the last valid
index when accessing dc->links[i + 1] by subtracting 1 from the loop
condition.

This would ensure that i + 1 is always a valid index in the array.

Fixes the below:
drivers/gpu/drm/amd/amdgpu/../display/dc/link/protocols/link_dp_dpia_bw.c:208 
get_host_router_total_dp_tunnel_bw() error: buffer overflow 'dc->links' 12 <= 12

Fixes: 9ed0893b7c58 ("drm/amd/display: Add dpia display mode validation logic")
Cc: PeiChen Huang
Cc: Aric Cyr
Cc: Rodrigo Siqueira
Cc: Aurabindo Pillai
Cc: Meenakshikumar Somasundaram
Signed-off-by: Srinivasan Shanmugam
---
  drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_dpia_bw.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_dpia_bw.c 
b/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_dpia_bw.c
index dd0d2b206462..5491b707cec8 100644
--- a/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_dpia_bw.c
+++ b/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_dpia_bw.c
@@ -196,7 +196,7 @@ static int get_host_router_total_dp_tunnel_bw(const struct 
dc *dc, uint8_t hr_in
struct dc_link *link_dpia_primary, *link_dpia_secondary;
int total_bw = 0;
  
-	for (uint8_t i = 0; i < MAX_PIPES * 2; ++i) {
+	for (uint8_t i = 0; i < (MAX_PIPES * 2) - 1; ++i) {
  
 		if (!dc->links[i] || dc->links[i]->ep_type != DISPLAY_ENDPOINT_USB4_DPIA)
continue;
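
The off-by-one described in the commit message can be reproduced with a small stand-alone sketch. It assumes a 12-entry array (the size reported in the static checker message) and a loop body that reads both `links[i]` and `links[i + 1]`; the names are hypothetical and the body is much simpler than the real function.

```c
#include <stddef.h>

#define SKETCH_MAX_LINKS 12	/* array size from the checker message */

/*
 * Counts adjacent non-NULL pairs. Because the body reads links[i + 1],
 * the loop must stop at SKETCH_MAX_LINKS - 1 so that i + 1 stays <= 11.
 */
static int sketch_count_adjacent_pairs(void *const links[SKETCH_MAX_LINKS])
{
	int pairs = 0;

	for (size_t i = 0; i < SKETCH_MAX_LINKS - 1; ++i) {
		if (links[i] && links[i + 1])
			pairs++;
	}
	return pairs;
}
```

With the original bound of `SKETCH_MAX_LINKS`, the last iteration would read `links[12]`, one element past the end of the array.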

[PATCH v3] drm/amdkfd: reserve the BO before validating it

2024-01-30 Thread Lang Yu
Fixes: 410f08516e0f ("drm/amdkfd: Move dma unmapping after TLB flush")

v2: Avoid unmapping attachment twice when ERESTARTSYS.

v3: Lock the BO before accessing ttm->sg to avoid race conditions. (Felix)

[   41.708711] WARNING: CPU: 0 PID: 1463 at drivers/gpu/drm/ttm/ttm_bo.c:846 
ttm_bo_validate+0x146/0x1b0 [ttm]
[   41.708989] Call Trace:
[   41.708992]  
[   41.708996]  ? show_regs+0x6c/0x80
[   41.709000]  ? ttm_bo_validate+0x146/0x1b0 [ttm]
[   41.709008]  ? __warn+0x93/0x190
[   41.709014]  ? ttm_bo_validate+0x146/0x1b0 [ttm]
[   41.709024]  ? report_bug+0x1f9/0x210
[   41.709035]  ? handle_bug+0x46/0x80
[   41.709041]  ? exc_invalid_op+0x1d/0x80
[   41.709048]  ? asm_exc_invalid_op+0x1f/0x30
[   41.709057]  ? amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x2c/0x80 [amdgpu]
[   41.709185]  ? ttm_bo_validate+0x146/0x1b0 [ttm]
[   41.709197]  ? amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x2c/0x80 [amdgpu]
[   41.709337]  ? srso_alias_return_thunk+0x5/0x7f
[   41.709346]  kfd_mem_dmaunmap_attachment+0x9e/0x1e0 [amdgpu]
[   41.709467]  amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x56/0x80 [amdgpu]
[   41.709586]  kfd_ioctl_unmap_memory_from_gpu+0x1b7/0x300 [amdgpu]
[   41.709710]  kfd_ioctl+0x1ec/0x650 [amdgpu]
[   41.709822]  ? __pfx_kfd_ioctl_unmap_memory_from_gpu+0x10/0x10 [amdgpu]
[   41.709945]  ? srso_alias_return_thunk+0x5/0x7f
[   41.709949]  ? tomoyo_file_ioctl+0x20/0x30
[   41.709959]  __x64_sys_ioctl+0x9c/0xd0
[   41.709967]  do_syscall_64+0x3f/0x90
[   41.709973]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8

Signed-off-by: Lang Yu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h|  2 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 20 ---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  |  4 +++-
 3 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 298fc52a35bc..e60f63ccf79a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -313,7 +313,7 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(struct 
amdgpu_device *adev,
  struct kgd_mem *mem, void *drm_priv);
 int amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu(
struct amdgpu_device *adev, struct kgd_mem *mem, void 
*drm_priv);
-void amdgpu_amdkfd_gpuvm_dmaunmap_mem(struct kgd_mem *mem, void *drm_priv);
+int amdgpu_amdkfd_gpuvm_dmaunmap_mem(struct kgd_mem *mem, void *drm_priv);
 int amdgpu_amdkfd_gpuvm_sync_memory(
struct amdgpu_device *adev, struct kgd_mem *mem, bool intr);
 int amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(struct kgd_mem *mem,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 6f3a4cb2a9ef..ef71b12062a1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -2088,21 +2088,35 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
return ret;
 }
 
-void amdgpu_amdkfd_gpuvm_dmaunmap_mem(struct kgd_mem *mem, void *drm_priv)
+int amdgpu_amdkfd_gpuvm_dmaunmap_mem(struct kgd_mem *mem, void *drm_priv)
 {
struct kfd_mem_attachment *entry;
struct amdgpu_vm *vm;
+   int ret;
 
vm = drm_priv_to_vm(drm_priv);
 
mutex_lock(&mem->lock);
 
+   ret = amdgpu_bo_reserve(mem->bo, true);
+   if (ret)
+   goto out;
+
list_for_each_entry(entry, &mem->attachments, list) {
-   if (entry->bo_va->base.vm == vm)
-   kfd_mem_dmaunmap_attachment(mem, entry);
+   if (entry->bo_va->base.vm != vm)
+   continue;
+   if (entry->bo_va->base.bo->tbo.ttm &&
+   !entry->bo_va->base.bo->tbo.ttm->sg)
+   continue;
+
+   kfd_mem_dmaunmap_attachment(mem, entry);
}
 
+   amdgpu_bo_unreserve(mem->bo);
+out:
mutex_unlock(&mem->lock);
+
+   return ret;
 }
 
 int amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu(
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index ce4c52ec34d8..80e90fdef291 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1442,7 +1442,9 @@ static int kfd_ioctl_unmap_memory_from_gpu(struct file 
*filep,
kfd_flush_tlb(peer_pdd, TLB_FLUSH_HEAVYWEIGHT);
 
/* Remove dma mapping after tlb flush to avoid IO_PAGE_FAULT */
-   amdgpu_amdkfd_gpuvm_dmaunmap_mem(mem, peer_pdd->drm_priv);
+   err = amdgpu_amdkfd_gpuvm_dmaunmap_mem(mem, peer_pdd->drm_priv);
+   if (err)
+   goto sync_memory_failed;
}
 
mutex_unlock(>mutex);
-- 
2.25.1
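
The locking order this patch introduces (take the mutex, reserve the BO, do the work, unreserve, drop the mutex, and propagate the reserve error) can be sketched in user space as below. All types and helpers are hypothetical stand-ins: a counter replaces the mutex, a flag replaces the reservation, and `-EINTR` replaces the kernel-only `-ERESTARTSYS`.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the BO and the kgd_mem object. */
struct sketch_bo {
	bool reserved;
	bool fail_reserve;	/* lets a caller simulate an interrupted reserve */
};

struct sketch_mem {
	int lock_depth;		/* stands in for the mutex */
	int unmapped;		/* counts simulated unmap passes */
	struct sketch_bo bo;
};

static int sketch_bo_reserve(struct sketch_bo *bo)
{
	if (bo->fail_reserve)
		return -EINTR;
	bo->reserved = true;
	return 0;
}

static void sketch_bo_unreserve(struct sketch_bo *bo)
{
	bo->reserved = false;
}

/* Same shape as the patched function: a single exit path releases the lock. */
static int sketch_dmaunmap(struct sketch_mem *mem)
{
	int ret;

	mem->lock_depth++;

	ret = sketch_bo_reserve(&mem->bo);
	if (ret)
		goto out;

	mem->unmapped++;	/* the attachment walk would happen here */

	sketch_bo_unreserve(&mem->bo);
out:
	mem->lock_depth--;
	return ret;
}
```

Returning the error instead of `void` is what lets the ioctl path bail out when the reserve is interrupted rather than continuing with an unvalidated BO.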



Re: [PATCH] drm/amd/display: Add NULL check for kzalloc in 'amdgpu_dm_atomic_commit_tail()'

2024-01-30 Thread SRINIVASAN SHANMUGAM

+ Cc: Tom Chung 

On 1/30/2024 2:11 PM, SHANMUGAM, SRINIVASAN wrote:

Add a NULL check for the kzalloc call that allocates memory for
dummy_updates in the amdgpu_dm_atomic_commit_tail function. Previously,
if kzalloc failed to allocate memory and returned NULL, the code would
attempt to use the NULL pointer.

The fix is to check if kzalloc returns NULL, and if so, log an error
message and skip the rest of the current loop iteration with the
continue statement.  This prevents the code from attempting to use the
NULL pointer.

Cc: Julia Lawall 
Cc: Aurabindo Pillai 
Cc: Rodrigo Siqueira 
Cc: Alex Hung 
Cc: Alex Deucher 
Signed-off-by: Srinivasan Shanmugam 
---
  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 0bf1bc7ced7d..8590c9f1dda6 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -9236,6 +9236,10 @@ static void amdgpu_dm_atomic_commit_tail(struct 
drm_atomic_state *state)
 * To fix this, DC should permit updating only stream properties.
 */
dummy_updates = kzalloc(sizeof(struct dc_surface_update) * MAX_SURFACES, GFP_ATOMIC);
+   if (!dummy_updates) {
+   DRM_ERROR("Failed to allocate memory for dummy_updates.\n");
+   continue;
+   }
for (j = 0; j < status->plane_count; j++)
dummy_updates[j].surface = status->plane_states[0];
  


Re: [PATCH v2] drm/amdgpu: Fix missing error code in 'gmc_v6/7/8/9_0_hw_init()'

2024-01-30 Thread Christian König

Am 30.01.24 um 09:27 schrieb Srinivasan Shanmugam:

Return r for success scenarios in 'gmc_v6/7/8/9_0_hw_init()'

Fixes the below:
drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c:920 gmc_v6_0_hw_init() warn: missing 
error code? 'r'
drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c:1104 gmc_v7_0_hw_init() warn: missing 
error code? 'r'
drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c:1224 gmc_v8_0_hw_init() warn: missing 
error code? 'r'
drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c:2347 gmc_v9_0_hw_init() warn: missing 
error code? 'r'

Fixes: 8301de8fcadc ("drm/amdgpu: Fix with right return code '-EIO' in 
'amdgpu_gmc_vram_checking()'")
Cc: Christian König 
Cc: Alex Deucher 
Signed-off-by: Srinivasan Shanmugam 
---
v2:
Changed 'return 0;' to 'return r;' in 'gmc_v9_0_hw_init' in v1.


I think return 0 is actually better since at least in the GMC v6 case 
I've checked the "if(r) return r;" actually guarantees that it's zero.


Regards,
Christian.



  drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c | 4 ++--
  drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c | 4 ++--
  drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c | 4 ++--
  drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 4 ++--
  4 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
index 229263e407e0..7e53b7b043a8 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
@@ -916,8 +916,8 @@ static int gmc_v6_0_hw_init(void *handle)
  
  	if (amdgpu_emu_mode == 1)
  		return amdgpu_gmc_vram_checking(adev);
-   else
-   return r;
+
+   return r;
  }
  
  static int gmc_v6_0_hw_fini(void *handle)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
index d95f719eec55..d30b57820c9c 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
@@ -1100,8 +1100,8 @@ static int gmc_v7_0_hw_init(void *handle)
  
  	if (amdgpu_emu_mode == 1)
  		return amdgpu_gmc_vram_checking(adev);
-   else
-   return r;
+
+   return r;
  }
  
  static int gmc_v7_0_hw_fini(void *handle)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
index 4eb0cccdb413..5d55e2313345 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
@@ -1220,8 +1220,8 @@ static int gmc_v8_0_hw_init(void *handle)
  
  	if (amdgpu_emu_mode == 1)
  		return amdgpu_gmc_vram_checking(adev);
-   else
-   return r;
+
+   return r;
  }
  
  static int gmc_v8_0_hw_fini(void *handle)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index a3a11538207b..b5651e0426f1 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -2343,8 +2343,8 @@ static int gmc_v9_0_hw_init(void *handle)
  
  	if (amdgpu_emu_mode == 1)
  		return amdgpu_gmc_vram_checking(adev);
-   else
-   return r;
+
+   return r;
  }
  
  /**




RE: [PATCH 2/2] drm/amdgpu: reset gpu for pm abort case

2024-01-30 Thread Liang, Prike
[AMD Official Use Only - General]

> From: Lazar, Lijo 
> Sent: Monday, January 29, 2024 2:48 PM
> To: Liang, Prike ; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Sharma, Deepak
> 
> Subject: Re: [PATCH 2/2] drm/amdgpu: reset gpu for pm abort case
>
>
>
> On 1/26/2024 2:30 PM, Liang, Prike wrote:
> > [AMD Official Use Only - General]
> >
> >>
> >> On 1/25/2024 8:52 AM, Prike Liang wrote:
> >>> In the pm abort case the gfx power rail not turn off from FCH side
> >>> and this will lead to the gfx reinitialized failed base on the
> >>> unknown gfx HW status, so let's reset the gpu to a known good power
> state.
> >>>
> >>
> >> From the description, this an APU only problem (or this patch could
> >> only resolve APU abort sequence). However, there is no check for APU
> >> in the patch below.
> >>
> > [Prike]  IIRC, there also has a similar problem on the dGPU side when
> > suspend abort and now this patch is only drafted for a hot issue on
> > the RV series. If need we can add a TODO item for drafting a more generic
> solution.
> >
>
> If this addresses a specific issue, then better to check the specific IP 
> revision
> before presenting this as a generic one. Presently the patch logic considers
> this as a generic for all soc15 asics.
>
Until someone can confirm whether there's a similar problem on the dGPU
side, I prefer to limit this quirk to specific ASICs.

> >>
> >>> Signed-off-by: Prike Liang 
> >>> ---
> >>>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 5 +
> >>>  drivers/gpu/drm/amd/amdgpu/soc15.c | 8 +++-
> >>>  2 files changed, 12 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >>> index 56d9dfa61290..4c40ffaaa5c2 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >>> @@ -4627,6 +4627,11 @@ int amdgpu_device_resume(struct
> drm_device
> >> *dev, bool fbcon)
> >>> return r;
> >>> }
> >>>
> >>> +   if(amdgpu_asic_need_reset_on_init(adev)) {
> >>> +   DRM_INFO("PM abort case and let's reset asic \n");
> >>> +   amdgpu_asic_reset(adev);
> >>> +   }
> >>> +
> >>
> >> suspend_noirq is specific for suspend scenarios and not valid for
> freeze/thaw.
> >> I guess this could trigger reset for successful restore on APUs.
> >>
> > [Prike] If doesn't run into noirq_suspend then still need further check
> whether the PSP TOS is still alive before gpu reset.
> >
>
> AFAIU, for a successful resume from hibernate on APUs, TOS will still be
> running. The patch will trigger a reset in such cases also.
>
> Thanks,
> Lijo
>
Yes, while the system is restoring the saved image the TOS should be running,
so I will filter out the hibernate resume case in a later patch.

Thanks,
Prike
> >>> if (dev->switch_power_state == DRM_SWITCH_POWER_OFF)
> >>> return 0;
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c
> >>> b/drivers/gpu/drm/amd/amdgpu/soc15.c
> >>> index 15033efec2ba..9329a00b6abc 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
> >>> @@ -804,9 +804,16 @@ static bool soc15_need_reset_on_init(struct
> >> amdgpu_device *adev)
> >>> if (adev->asic_type == CHIP_RENOIR)
> >>> return true;
> >>>
> >>> +   sol_reg = RREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_81);
> >>> +
> >>> /* Just return false for soc15 GPUs.  Reset does not seem to
> >>>  * be necessary.
> >>>  */
> >>
> >> The comment now doesn't make sense.
> >>
> >> Thanks,
> >> Lijo
> >>
> >>> +   if (adev->in_suspend && !adev->in_s0ix &&
> >>> +   !adev->pm_complete &&
> >>> +   sol_reg)
> >>> +   return true;
> >>> +
> >>> if (!amdgpu_passthrough(adev))
> >>> return false;
> >>>
> >>> @@ -816,7 +823,6 @@ static bool soc15_need_reset_on_init(struct
> >> amdgpu_device *adev)
> >>> /* Check sOS sign of life register to confirm sys driver and sOS
> >>>  * are already been loaded.
> >>>  */
> >>> -   sol_reg = RREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_81);
> >>> if (sol_reg)
> >>> return true;
> >>>


[PATCH] drm/amd/display: Add NULL check for kzalloc in 'amdgpu_dm_atomic_commit_tail()'

2024-01-30 Thread Srinivasan Shanmugam
Add a NULL check for the kzalloc call that allocates memory for
dummy_updates in the amdgpu_dm_atomic_commit_tail function. Previously,
if kzalloc failed to allocate memory and returned NULL, the code would
attempt to use the NULL pointer.

The fix is to check if kzalloc returns NULL, and if so, log an error
message and skip the rest of the current loop iteration with the
continue statement.  This prevents the code from attempting to use the
NULL pointer.

Cc: Julia Lawall 
Cc: Aurabindo Pillai 
Cc: Rodrigo Siqueira 
Cc: Alex Hung 
Cc: Alex Deucher 
Signed-off-by: Srinivasan Shanmugam 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 0bf1bc7ced7d..8590c9f1dda6 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -9236,6 +9236,10 @@ static void amdgpu_dm_atomic_commit_tail(struct 
drm_atomic_state *state)
 * To fix this, DC should permit updating only stream properties.
 */
dummy_updates = kzalloc(sizeof(struct dc_surface_update) * MAX_SURFACES, GFP_ATOMIC);
+   if (!dummy_updates) {
+   DRM_ERROR("Failed to allocate memory for dummy_updates.\n");
+   continue;
+   }
for (j = 0; j < status->plane_count; j++)
dummy_updates[j].surface = status->plane_states[0];
 
-- 
2.34.1
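
The check-and-continue pattern from this patch can be exercised outside the kernel with a short sketch. It uses `calloc()` in place of `kzalloc()` and takes the allocator as a parameter so a failing allocator can be injected; `sketch_process()` and `sketch_failing_alloc()` are hypothetical names used only for illustration.

```c
#include <stdlib.h>

/* Allocator signature matching calloc(), so a failing one can be swapped in. */
typedef void *(*sketch_alloc_fn)(size_t n, size_t size);

/* Returns how many iterations had a usable buffer; failed ones are skipped. */
static int sketch_process(int count, sketch_alloc_fn alloc)
{
	int handled = 0;

	for (int i = 0; i < count; i++) {
		int *buf = alloc(4, sizeof(*buf));

		if (!buf)
			continue;	/* skip this iteration instead of dereferencing NULL */

		buf[0] = i;
		handled++;
		free(buf);
	}
	return handled;
}

/* Always fails, to simulate memory pressure. */
static void *sketch_failing_alloc(size_t n, size_t size)
{
	(void)n;
	(void)size;
	return NULL;
}
```

Skipping the iteration keeps the loop alive for the remaining entries; whether skipping is the right recovery for a given caller is a separate design question that the sketch does not answer.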



Re: Bug#1061449: linux-image-6.7-amd64: a boot message from amdgpu

2024-01-30 Thread Salvatore Bonaccorso
Hi,

[for this reply dropping the Debian bugreport to avoid later followups
sending the ack to the mailinglist and adding noise]

On Sun, Jan 28, 2024 at 11:44:59AM +0100, Linux regression tracking (Thorsten 
Leemhuis) wrote:
> On 27.01.24 14:14, Salvatore Bonaccorso wrote:
> >
> > In Debian (https://bugs.debian.org/1061449) we got the following
> > quoted report:
> > 
> > On Wed, Jan 24, 2024 at 07:38:16PM +0100, Patrice Duroux wrote:
> >>
> >> Giving a try to 6.7, here is a message extracted from dmesg:
> >> [4.177226] ------------[ cut here ]------------
> >> [4.177227] WARNING: CPU: 6 PID: 248 at
> >> drivers/gpu/drm/amd/amdgpu/../display/dc/link/link_factory.c:387
> >> construct_phy+0xb26/0xd60 [amdgpu]
> > [...]
> 
> Not my area of expertise, but looks a lot like a duplicate of
> https://gitlab.freedesktop.org/drm/amd/-/issues/3122#note_2252835
> 
> Mario (now CCed) already prepared a patch for that issue that seems to work.

#regzbot link: https://gitlab.freedesktop.org/drm/amd/-/issues/3122

Thanks. Indeed the reporter confirmed in
https://bugs.debian.org/1061449#55 that the patch fixes the issue.

So a duplicate of the above.

Regards,
Salvatore


[PATCH v2 0/1] drm/amd: Don't init MEC2 firmware when it fails to load

2024-01-30 Thread David McFarland
> Sorry to be pedantic; but I realized after I tried to apply this is
> missing a S-o-b.  Can you please add one?

Of course, here you go.  I left it off because I wasn't 100% sure about
the intention of your previous change.


David McFarland (1):
  drm/amd: Don't init MEC2 firmware when it fails to load

 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 2 --
 1 file changed, 2 deletions(-)

-- 
2.40.1



Re: drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole (v2)

2024-01-30 Thread Shengyu Qu
Hi Felix,

Thanks for the reply. I'll record a backtrace when I'm free. Besides, here is
a dmesg log from someone else in the issue discussion about this problem:
https://projects.blender.org/attachments/ea7b7db5-ac16-479d-935b-9e1da33cd6f0
Tested using next-20240129 with this patch applied; the setup is Plasma 6.0
RC1 (Wayland) + RX 6600 XT.

Best regards,
Shengyu

On 2024/1/30 1:47, Felix Kuehling wrote:
> On 2024-01-29 10:24, Shengyu Qu wrote:
>> Hello Felix,
>> I think you are right. This problem has existed for years (just look at
>> the issue creation time in my link), and is thought to be caused by
>> OpenGL-ROCM interop (that's why I think this patch might help). It is
>> very easy to trigger this problem in Blender (the method is also
>> mentioned in the link).
> 
> This doesn't help you, but it's unlikely that this has been the same
> issue for two years for everybody who chimed into this bug report.
> Different kernel versions, GPUs, user mode ROCm and Mesa versions etc.
> 
> Case in point, it's possible that you're seeing an issue specific to
> RDNA3, which hasn't even been around for that long.
> 
> 
>> Do
>> you have any idea about this?
> 
> Not without seeing a lot more diagnostic information. A full backtrace
> from your kernel log would be a good start.
> 
> Regards,
>   Felix
> 
> 
>> Best regards,
>> Shengyu
>> On 2024/1/29 22:51, Felix Kuehling wrote:
>>> On 2024-01-29 8:58, Shengyu Qu wrote:
 Hi,
 Seems rocm-opengl interop hang problem still exists[1]. Btw have you
 looked into this problem?
 Best regards,
 Shengyu
 [1]
 https://projects.blender.org/blender/blender/issues/100353#issuecomment-599
>>>
>>> Maybe you're having a different problem. Do you see this issue also
>>> without any version of the "Relocate TBA/TMA ..." patch?
>>>
>>> Regards,
>>>   Felix
>>>
>>>

 On 2024/1/27 03:15, Shengyu Qu wrote:
> Hello Felix,
> This patch seems to work on my system, and it also seems to fix the
> ROCM/OpenGL interop problem.
> Is this intended to happen or not? Maybe we need more users to test it.
> Besides,
> Tested-by: Shengyu Qu 
> Best Regards,
> Shengyu
>
> On 2024/1/26 06:27, Felix Kuehling wrote:
>> The TBA and TMA, along with an unused IB allocation, reside at low
>> addresses in the VM address space. A stray VM fault which hits these
>> pages must be serviced by making their page table entries invalid.
>> The scheduler depends upon these pages being resident and fails,
>> preventing a debugger from inspecting the failure state.
>>
>> By relocating these pages above 47 bits in the VM address space they
>> can only be reached when bits [63:48] are set to 1. This makes it
>> much
>> less likely for a misbehaving program to generate accesses to them.
>> The current placement at VA (PAGE_SIZE*2) is readily hit by a NULL
>> access with a small offset.
>>
>> v2:
>> - Move it to the reserved space to avoid conflicts with Mesa
>> - Add macros to make reserved space management easier
>>
>> Cc: Arunpravin Paneer Selvam 
>> Cc: Christian Koenig 
>> Signed-off-by: Jay Cornwall 
>> Signed-off-by: Felix Kuehling 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c  |  4 +--
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c    |  7 ++---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h   | 12 ++--
>>   drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c | 30
>> +++-
>>   4 files changed, 30 insertions(+), 23 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
>> index 823d31f4a2a3..53d0a458d78e 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
>> @@ -28,9 +28,9 @@
>>     uint64_t amdgpu_csa_vaddr(struct amdgpu_device *adev)
>>   {
>> -    uint64_t addr = adev->vm_manager.max_pfn <<
>> AMDGPU_GPU_PAGE_SHIFT;
>> +    uint64_t addr = AMDGPU_VA_RESERVED_CSA_START(
>> +    adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT);
>>   -    addr -= AMDGPU_VA_RESERVED_CSA_SIZE;
>>   addr = amdgpu_gmc_sign_extend(addr);
>>     return addr;
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
>> index 3d0d56087d41..9e769ef50f2e 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
>> @@ -45,11 +45,8 @@
>>    */
>>   static inline u64 amdgpu_seq64_get_va_base(struct amdgpu_device
>> *adev)
>>   {
>> -    u64 addr = adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT;
>> -
>> -    addr -= AMDGPU_VA_RESERVED_TOP;
>> -
>> -    return addr;
>> +    return AMDGPU_VA_RESERVED_SEQ64_START(
>> +    adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT);
>>   }
>>     

Re: drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole (v2)

2024-01-30 Thread Shengyu Qu

Hello Felix,
I think you are right. This problem has existed for years (just look at the
issue creation time in my link), and is thought to be caused by OpenGL-ROCM
interop (that's why I think this patch might help). It is very easy to
trigger this problem in Blender (the method is also mentioned in the link). Do
you have any idea about this?
Best regards,
Shengyu
On 2024/1/29 22:51, Felix Kuehling wrote:

On 2024-01-29 8:58, Shengyu Qu wrote:

Hi,
Seems rocm-opengl interop hang problem still exists[1]. Btw have you
looked into this problem?
Best regards,
Shengyu
[1] 
https://projects.blender.org/blender/blender/issues/100353#issuecomment-599


Maybe you're having a different problem. Do you see this issue also 
without any version of the "Relocate TBA/TMA ..." patch?


Regards,
  Felix




On 2024/1/27 03:15, Shengyu Qu wrote:

Hello Felix,
This patch seems to work on my system, and it also seems to fix the
ROCM/OpenGL interop problem.
Is this intended to happen or not? Maybe we need more users to test it.
Besides,
Tested-by: Shengyu Qu 
Best Regards,
Shengyu

On 2024/1/26 06:27, Felix Kuehling wrote:

The TBA and TMA, along with an unused IB allocation, reside at low
addresses in the VM address space. A stray VM fault which hits these
pages must be serviced by making their page table entries invalid.
The scheduler depends upon these pages being resident and fails,
preventing a debugger from inspecting the failure state.

By relocating these pages above 47 bits in the VM address space they
can only be reached when bits [63:48] are set to 1. This makes it much
less likely for a misbehaving program to generate accesses to them.
The current placement at VA (PAGE_SIZE*2) is readily hit by a NULL
access with a small offset.

v2:
- Move it to the reserved space to avoid conflicts with Mesa
- Add macros to make reserved space management easier

Cc: Arunpravin Paneer Selvam 
Cc: Christian Koenig 
Signed-off-by: Jay Cornwall 
Signed-off-by: Felix Kuehling 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c  |  4 +--
  drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c    |  7 ++---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h   | 12 ++--
  drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c | 30 
+++-

  4 files changed, 30 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c

index 823d31f4a2a3..53d0a458d78e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
@@ -28,9 +28,9 @@
    uint64_t amdgpu_csa_vaddr(struct amdgpu_device *adev)
  {
-    uint64_t addr = adev->vm_manager.max_pfn << 
AMDGPU_GPU_PAGE_SHIFT;

+    uint64_t addr = AMDGPU_VA_RESERVED_CSA_START(
+    adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT);
  -    addr -= AMDGPU_VA_RESERVED_CSA_SIZE;
  addr = amdgpu_gmc_sign_extend(addr);
    return addr;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c

index 3d0d56087d41..9e769ef50f2e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
@@ -45,11 +45,8 @@
   */
  static inline u64 amdgpu_seq64_get_va_base(struct amdgpu_device 
*adev)

  {
-    u64 addr = adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT;
-
-    addr -= AMDGPU_VA_RESERVED_TOP;
-
-    return addr;
+    return AMDGPU_VA_RESERVED_SEQ64_START(
+    adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT);
  }
    /**
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h

index 98a57192..f23b6153d310 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -135,11 +135,19 @@ struct amdgpu_mem_stats;
  #define AMDGPU_IS_MMHUB0(x) ((x) >= AMDGPU_MMHUB0_START && (x) < 
AMDGPU_MMHUB1_START)
  #define AMDGPU_IS_MMHUB1(x) ((x) >= AMDGPU_MMHUB1_START && (x) < 
AMDGPU_MAX_VMHUBS)

  -/* Reserve 2MB at top/bottom of address space for kernel use */
+/* Reserve space at top/bottom of address space for kernel use */
  #define AMDGPU_VA_RESERVED_CSA_SIZE    (2ULL << 20)
+#define AMDGPU_VA_RESERVED_CSA_START(top)    ((top) \
+ - AMDGPU_VA_RESERVED_CSA_SIZE)
  #define AMDGPU_VA_RESERVED_SEQ64_SIZE    (2ULL << 20)
+#define AMDGPU_VA_RESERVED_SEQ64_START(top) 
(AMDGPU_VA_RESERVED_CSA_START(top) \

+ - AMDGPU_VA_RESERVED_SEQ64_SIZE)
+#define AMDGPU_VA_RESERVED_TRAP_SIZE    (2ULL << 12)
+#define AMDGPU_VA_RESERVED_TRAP_START(top) 
(AMDGPU_VA_RESERVED_SEQ64_START(top) \

+ - AMDGPU_VA_RESERVED_TRAP_SIZE)
  #define AMDGPU_VA_RESERVED_BOTTOM    (2ULL << 20)
-#define AMDGPU_VA_RESERVED_TOP (AMDGPU_VA_RESERVED_SEQ64_SIZE + \
+#define AMDGPU_VA_RESERVED_TOP (AMDGPU_VA_RESERVED_TRAP_SIZE + \
+ AMDGPU_VA_RESERVED_SEQ64_SIZE + \
   AMDGPU_VA_RESERVED_CSA_SIZE)
    /* See vm_update_mode */
diff --git 

Re: drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole (v2)

2024-01-30 Thread Shengyu Qu

Hi,

Seems rocm-opengl interop hang problem still exists[1]. Btw have you

looked into this problem?

Best regards,

Shengyu

[1] 
https://projects.blender.org/blender/blender/issues/100353#issuecomment-599


On 2024/1/27 03:15, Shengyu Qu wrote:

Hello Felix,

This patch seems to work on my system, and it also seems to fix the
ROCM/OpenGL interop problem.

Is this intended to happen or not? Maybe we need more users to test it.

Besides,

Tested-by: Shengyu Qu 

Best Regards,

Shengyu

On 2024/1/26 06:27, Felix Kuehling wrote:

The TBA and TMA, along with an unused IB allocation, reside at low
addresses in the VM address space. A stray VM fault which hits these
pages must be serviced by making their page table entries invalid.
The scheduler depends upon these pages being resident and fails,
preventing a debugger from inspecting the failure state.

By relocating these pages above 47 bits in the VM address space they
can only be reached when bits [63:48] are set to 1. This makes it much
less likely for a misbehaving program to generate accesses to them.
The current placement at VA (PAGE_SIZE*2) is readily hit by a NULL
access with a small offset.

v2:
- Move it to the reserved space to avoid conflicts with Mesa
- Add macros to make reserved space management easier

Cc: Arunpravin Paneer Selvam 
Cc: Christian Koenig 
Signed-off-by: Jay Cornwall 
Signed-off-by: Felix Kuehling 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c  |  4 +--
  drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c    |  7 ++---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h   | 12 ++--
  drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c | 30 +++-
  4 files changed, 30 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c

index 823d31f4a2a3..53d0a458d78e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
@@ -28,9 +28,9 @@
    uint64_t amdgpu_csa_vaddr(struct amdgpu_device *adev)
  {
-    uint64_t addr = adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT;
+    uint64_t addr = AMDGPU_VA_RESERVED_CSA_START(
+    adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT);
  -    addr -= AMDGPU_VA_RESERVED_CSA_SIZE;
  addr = amdgpu_gmc_sign_extend(addr);
    return addr;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c

index 3d0d56087d41..9e769ef50f2e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
@@ -45,11 +45,8 @@
   */
  static inline u64 amdgpu_seq64_get_va_base(struct amdgpu_device *adev)
  {
-    u64 addr = adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT;
-
-    addr -= AMDGPU_VA_RESERVED_TOP;
-
-    return addr;
+    return AMDGPU_VA_RESERVED_SEQ64_START(
+    adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT);
  }
    /**
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h

index 98a57192..f23b6153d310 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -135,11 +135,19 @@ struct amdgpu_mem_stats;
  #define AMDGPU_IS_MMHUB0(x) ((x) >= AMDGPU_MMHUB0_START && (x) < 
AMDGPU_MMHUB1_START)
  #define AMDGPU_IS_MMHUB1(x) ((x) >= AMDGPU_MMHUB1_START && (x) < 
AMDGPU_MAX_VMHUBS)

  -/* Reserve 2MB at top/bottom of address space for kernel use */
+/* Reserve space at top/bottom of address space for kernel use */
  #define AMDGPU_VA_RESERVED_CSA_SIZE    (2ULL << 20)
+#define AMDGPU_VA_RESERVED_CSA_START(top)    ((top) \
+ - AMDGPU_VA_RESERVED_CSA_SIZE)
  #define AMDGPU_VA_RESERVED_SEQ64_SIZE    (2ULL << 20)
+#define AMDGPU_VA_RESERVED_SEQ64_START(top) 
(AMDGPU_VA_RESERVED_CSA_START(top) \

+ - AMDGPU_VA_RESERVED_SEQ64_SIZE)
+#define AMDGPU_VA_RESERVED_TRAP_SIZE    (2ULL << 12)
+#define AMDGPU_VA_RESERVED_TRAP_START(top) 
(AMDGPU_VA_RESERVED_SEQ64_START(top) \

+ - AMDGPU_VA_RESERVED_TRAP_SIZE)
  #define AMDGPU_VA_RESERVED_BOTTOM    (2ULL << 20)
-#define AMDGPU_VA_RESERVED_TOP (AMDGPU_VA_RESERVED_SEQ64_SIZE + \
+#define AMDGPU_VA_RESERVED_TOP (AMDGPU_VA_RESERVED_TRAP_SIZE + \
+ AMDGPU_VA_RESERVED_SEQ64_SIZE + \
   AMDGPU_VA_RESERVED_CSA_SIZE)
    /* See vm_update_mode */
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c

index 6604a3f99c5e..f899cce25b2a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
@@ -36,6 +36,7 @@
  #include 
  #include 
  #include 
+#include "amdgpu_vm.h"
    /*
   * The primary memory I/O features being added for revisions of gfxip
@@ -326,10 +327,16 @@ static void kfd_init_apertures_vi(struct 
kfd_process_device *pdd, uint8_t id)

   * with small reserved space for kernel.
   * Set them to CANONICAL addresses.
   */
-    
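
Note on the patch quoted above: the stacking of the reserved regions below the top of the address space can be checked with a few lines of arithmetic. This is only a sketch under stated assumptions. The macro values are copied from the quoted amdgpu_vm.h hunk with the AMDGPU_ prefix dropped. The helper functions csa_start, seq64_start, trap_start and sign_extend_48 are hypothetical, and the sign extension is a simplified stand-in that assumes a 48-bit virtual address space.

```c
#include <stdint.h>

/* Sizes as they appear in the quoted hunk (2ULL << 20 is 2 MiB, 2ULL << 12 is 8 KiB). */
#define VA_RESERVED_CSA_SIZE	(2ULL << 20)
#define VA_RESERVED_SEQ64_SIZE	(2ULL << 20)
#define VA_RESERVED_TRAP_SIZE	(2ULL << 12)
#define VA_RESERVED_TOP		(VA_RESERVED_TRAP_SIZE + \
				 VA_RESERVED_SEQ64_SIZE + \
				 VA_RESERVED_CSA_SIZE)

/* Each region starts directly below the one above it. */
static uint64_t csa_start(uint64_t top)
{
	return top - VA_RESERVED_CSA_SIZE;
}

static uint64_t seq64_start(uint64_t top)
{
	return csa_start(top) - VA_RESERVED_SEQ64_SIZE;
}

static uint64_t trap_start(uint64_t top)
{
	return seq64_start(top) - VA_RESERVED_TRAP_SIZE;
}

/*
 * Simplified canonical-address helper (assumption: 48-bit VA space).
 * If bit 47 is set, bits [63:48] are filled with ones.
 */
static uint64_t sign_extend_48(uint64_t addr)
{
	if (addr & (1ULL << 47))
		addr |= 0xffff000000000000ULL;
	return addr;
}
```

With top = 1ULL << 48, trap_start(top) lies exactly VA_RESERVED_TOP bytes below the top, and its sign-extended form has bits [63:48] set, whereas a low address such as 0x2000 is left unchanged.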

Re: [RFC PATCH 0/2] drm/amd/display: switch amdgpu_dm_connector to

2024-01-30 Thread Jani Nikula
On Fri, 26 Jan 2024, Mario Limonciello  wrote:
> On 1/26/2024 10:28, Melissa Wen wrote:
>> Hi,
>> 
>> I'm debugging a null-pointer dereference when running
>> igt@kms_connector_force_edid and the way I found to solve the bug is to
>> stop using raw edid handler in amdgpu_connector_funcs_force and
> >> create_eml_sink in favor of managing resources via struct drm_edid helpers
>> (Patch 1). The proper solution seems to be switch amdgpu_dm_connector
>> from struct edid to struct drm_edid and avoid the usage of different
>> approaches in the driver (Patch 2). However, doing it implies a good
>> amount of work and validation, therefore I decided to send this RFC
>> first to collect opinions and check if there is any parallel work on
> >> this side. It's a work in progress.
> >> 
> >> The null-pointer error triggered by the igt@kms_connector_force_edid test
>> was introduced by:
>> - e54ed41620f ("drm/amd/display: Remove unwanted drm edid references")
>> 
>> You can check the error trace in the first patch.
>> 
>> This series was tested with kms_hdmi_inject and kms_force_connector. No
> >> null-pointer error, kms_hdmi_inject is successful and kms_force_connector
> >> is successful after the second execution - the force-edid subtest
>> still fails in the first run (I'm still investigating).
>> 
> >> There are also a couple of cast warnings to be addressed - I'm looking
>> for the best replacement.
>> 
>> I appreciate any feedback and testing.
>
> So I'm actually a little bit worried by hardcoding EDID_LENGTH in this 
> series.
>
> I have some other patches that I'm posting later on that let you get the 
> EDID from _DDC BIOS method too.  My observation was that the EDID can be 
> anywhere up to 512 bytes according to the ACPI spec.
>
> An earlier version of my patch was using EDID_LENGTH when fetching it 
> and the EDID checksum failed.
>
> I'll CC you on the post, we probably want to get your changes and mine 
> merged together.

One of the main points of struct drm_edid is that it tracks the
allocation size separately.

We should simply not trust edid->extensions, because most of the time it
originates from outside the kernel.

Using drm_edid and then immediately calling drm_edid_raw() falls short.
That function should only be used as an aid during migration. And yeah, it
also means EDID parsing should be done in drm_edid.c, and not spread out
all over the subsystem.


BR,
Jani.


>
>> 
>> Melissa
>> 
>> Melissa Wen (2):
>>drm/amd/display: fix null-pointer dereference on edid reading
>>drm/amd/display: switch amdgpu_dm_connector to use struct drm_edid
>> 
>>   .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 78 ++-
>>   .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h |  4 +-
>>   .../amd/display/amdgpu_dm/amdgpu_dm_helpers.c |  9 ++-
>>   .../display/amdgpu_dm/amdgpu_dm_mst_types.c   | 23 +++---
>>   4 files changed, 60 insertions(+), 54 deletions(-)
>> 
>

-- 
Jani Nikula, Intel
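
Note on the discussion above: the point about tracking the allocation size separately, rather than trusting the extension count stored in the EDID itself, can be sketched in standalone C. This is a hedged illustration and not the drm_edid.c implementation. struct sized_edid and both helper functions are hypothetical names. The sketch relies only on the well-known EDID layout facts that a block is 128 bytes, that byte 126 of the base block holds the extension count, and that the bytes of a valid block sum to zero modulo 256.

```c
#include <stddef.h>
#include <stdint.h>

#define EDID_BLOCK_SIZE		128
#define EDID_EXT_COUNT_OFFSET	126 /* extension count in the base block */

/* Hypothetical container: the data plus the size that was really allocated. */
struct sized_edid {
	size_t size;
	const uint8_t *data;
};

/* A block is valid when all of its bytes sum to 0 modulo 256. */
static int edid_block_checksum_ok(const uint8_t *block)
{
	uint8_t sum = 0;

	for (size_t i = 0; i < EDID_BLOCK_SIZE; i++)
		sum += block[i];

	return sum == 0;
}

/*
 * Number of blocks that may be parsed safely: the claimed count
 * (extensions + 1), clamped to what the allocation can actually hold.
 * Returns 0 when not even the base block fits.
 */
static size_t edid_usable_blocks(const struct sized_edid *e)
{
	size_t claimed, available;

	if (!e || !e->data || e->size < EDID_BLOCK_SIZE)
		return 0;

	claimed = (size_t)e->data[EDID_EXT_COUNT_OFFSET] + 1;
	available = e->size / EDID_BLOCK_SIZE;

	return claimed < available ? claimed : available;
}
```

A 256-byte buffer whose base block claims three extensions yields two usable blocks here, so a parser bounded by this value never reads past the allocation, no matter what the externally supplied data claims.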


[PATCH v2 1/1] drm/amd: Don't init MEC2 firmware when it fails to load

2024-01-30 Thread David McFarland
The same calls are made directly above, but conditional on the firmware
loading and validating successfully.

Fixes: 9931b67690cf ("drm/amd: Load GFX10 microcode during early_init")
Signed-off-by: David McFarland 
---
v2: signed off

 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index d63cab294883..b0ba68016a02 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -4027,8 +4027,6 @@ static int gfx_v10_0_init_microcode(struct amdgpu_device 
*adev)
err = 0;
adev->gfx.mec2_fw = NULL;
}
-   amdgpu_gfx_cp_init_microcode(adev, AMDGPU_UCODE_ID_CP_MEC2);
-   amdgpu_gfx_cp_init_microcode(adev, AMDGPU_UCODE_ID_CP_MEC2_JT);
 
gfx_v10_0_check_fw_write_wait(adev);
 out:
-- 
2.40.1
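
Note on the patch above: the control flow it leaves behind, where an optional firmware image is initialized only if it loaded and a load failure is not fatal, can be sketched in standalone C. This is an illustration under assumptions, not the gfx_v10_0.c code. struct device_state, request_optional_fw, init_fw_section and init_microcode are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

struct device_state {
	const char *mec2_fw;	/* NULL when the optional firmware is absent */
	int init_calls;		/* counts per-section init calls */
};

/* Stand-in for requesting and validating an optional firmware image. */
static int request_optional_fw(bool available, const char **fw)
{
	if (!available) {
		*fw = NULL;
		return -1;
	}
	*fw = "mec2";
	return 0;
}

static void init_fw_section(struct device_state *dev)
{
	dev->init_calls++;
}

static int init_microcode(struct device_state *dev, bool mec2_available)
{
	int err = request_optional_fw(mec2_available, &dev->mec2_fw);

	if (!err) {
		/* Only touch the image when it loaded and validated. */
		init_fw_section(dev);	/* main section */
		init_fw_section(dev);	/* jump table section */
	} else {
		/* Optional firmware: a failure is not an error. */
		err = 0;
		dev->mec2_fw = NULL;
	}

	/* No unconditional init calls follow; they would use a NULL image. */
	return err;
}
```

With the firmware available, init_microcode() performs both section inits. Without it, the function still returns 0 but performs none, which is the behaviour the two removed lines had defeated.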



RE: [PATCH] drm/amdgpu: remove imu start dependency on amdgpu_dpm.

2024-01-30 Thread Yu, Lang
[Public]

Reviewed-by: Lang Yu 

>-----Original Message-----
>From: amd-gfx  On Behalf Of Yifan Zhang
>Sent: Saturday, January 20, 2024 4:32 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Deucher, Alexander ; Huang, Tim
>; Feng, Kenneth ; Ma, Li
>; Zhang, Yifan 
>Subject: [PATCH] drm/amdgpu: remove imu start dependency on amdgpu_dpm.
>
>IMU starts anyway when dpm is disabled in backdoor loading.
>
>Signed-off-by: Yifan Zhang 
>---
> drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>index a2d3cced8f19..c5b1d036c95d 100644
>--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>@@ -4324,7 +4324,7 @@ static int gfx_v11_0_hw_init(void *handle)
>   return r;
>   } else {
>   if (adev->firmware.load_type == AMDGPU_FW_LOAD_DIRECT) {
>-  if (adev->gfx.imu.funcs && (amdgpu_dpm > 0)) {
>+  if (adev->gfx.imu.funcs) {
>   if (adev->gfx.imu.funcs->load_microcode)
>   adev->gfx.imu.funcs-
>>load_microcode(adev);
>   if (adev->gfx.imu.funcs->setup_imu)
>--
>2.37.3



RE: [PATCH v2] drm/amdgpu: drm/amdgpu: remove golden setting for gfx 11.5.0

2024-01-30 Thread Yu, Lang
[Public]

Reviewed-by: Lang Yu 

>-----Original Message-----
>From: Zhang, Yifan 
>Sent: Tuesday, January 30, 2024 1:20 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Deucher, Alexander ; Koenig, Christian
>; Huang, Tim ; Yu, Lang
>; Zhang, Yifan 
>Subject: [PATCH v2] drm/amdgpu: drm/amdgpu: remove golden setting for gfx
>11.5.0
>
>No need to set GC golden settings in driver from gfx 11.5.0 onwards.
>
>Signed-off-by: Yifan Zhang 
>---
> drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 22 --
> 1 file changed, 22 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>index c1e10760..2fb1342d5bd9 100644
>--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>@@ -107,23 +107,6 @@ static const struct soc15_reg_golden
>golden_settings_gc_11_0_1[] =
>   SOC15_REG_GOLDEN_VALUE(GC, 0, regTCP_CNTL2, 0xfcff,
>0x000a)  };
>
>-static const struct soc15_reg_golden golden_settings_gc_11_5_0[] = {
>-  SOC15_REG_GOLDEN_VALUE(GC, 0, regDB_DEBUG5, 0x,
>0x0800),
>-  SOC15_REG_GOLDEN_VALUE(GC, 0, regGB_ADDR_CONFIG, 0x0c1807ff,
>0x0242),
>-  SOC15_REG_GOLDEN_VALUE(GC, 0, regGCR_GENERAL_CNTL, 0x1ff1,
>0x0500),
>-  SOC15_REG_GOLDEN_VALUE(GC, 0, regGL2A_ADDR_MATCH_MASK,
>0x, 0xfff3),
>-  SOC15_REG_GOLDEN_VALUE(GC, 0, regGL2C_ADDR_MATCH_MASK,
>0x, 0xfff3),
>-  SOC15_REG_GOLDEN_VALUE(GC, 0, regGL2C_CTRL, 0x, 0xf37fff3f),
>-  SOC15_REG_GOLDEN_VALUE(GC, 0, regGL2C_CTRL3, 0xfffb,
>0x00f40188),
>-  SOC15_REG_GOLDEN_VALUE(GC, 0, regGL2C_CTRL4, 0xf0ff,
>0x80009007),
>-  SOC15_REG_GOLDEN_VALUE(GC, 0, regPA_CL_ENHANCE, 0xf1ff,
>0x00880007),
>-  SOC15_REG_GOLDEN_VALUE(GC, 0, regPC_CONFIG_CNTL_1, 0x,
>0x0001),
>-  SOC15_REG_GOLDEN_VALUE(GC, 0, regTA_CNTL_AUX, 0xf7f7,
>0x0103),
>-  SOC15_REG_GOLDEN_VALUE(GC, 0, regTA_CNTL2, 0x007f,
>0x),
>-  SOC15_REG_GOLDEN_VALUE(GC, 0, regTCP_CNTL2, 0xffcf,
>0x200a),
>-  SOC15_REG_GOLDEN_VALUE(GC, 0, regUTCL1_CTRL_2, 0x,
>0x048f)
>-};
>-
> #define DEFAULT_SH_MEM_CONFIG \
>   ((SH_MEM_ADDRESS_MODE_64 <<
>SH_MEM_CONFIG__ADDRESS_MODE__SHIFT) | \
>(SH_MEM_ALIGNMENT_MODE_UNALIGNED <<
>SH_MEM_CONFIG__ALIGNMENT_MODE__SHIFT) | \ @@ -304,11 +287,6 @@
>static void gfx_v11_0_init_golden_registers(struct amdgpu_device *adev)
>   golden_settings_gc_11_0_1,
>   (const
>u32)ARRAY_SIZE(golden_settings_gc_11_0_1));
>   break;
>-  case IP_VERSION(11, 5, 0):
>-  soc15_program_register_sequence(adev,
>-  golden_settings_gc_11_5_0,
>-  (const
>u32)ARRAY_SIZE(golden_settings_gc_11_5_0));
>-  break;
>   default:
>   break;
>   }
>--
>2.37.3
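
Note on the patch above: each removed table entry pairs a register with two values, and such "golden setting" entries are typically applied as a masked read-modify-write. The sketch below shows that general idea only. It assumes the first value is an AND mask selecting the bits to change and the second is the OR value to place there. apply_golden is a hypothetical helper, and the values in the usage note are made up rather than taken from the removed table, whose constants are truncated in this archive.

```c
#include <stdint.h>

/*
 * Apply one golden setting to a register value.
 * Assumption: bits selected by and_mask are replaced with the matching
 * bits of or_mask; all other bits keep their current value.
 */
static uint32_t apply_golden(uint32_t current, uint32_t and_mask,
			     uint32_t or_mask)
{
	uint32_t tmp;

	if (and_mask == 0xffffffffu) {
		/* Full mask: the register is simply overwritten. */
		tmp = or_mask;
	} else {
		tmp = current & ~and_mask;
		tmp |= or_mask & and_mask;
	}

	return tmp;
}
```

For instance, apply_golden(0x12345678, 0x0000ff00, 0x0000ab00) changes only the second byte and gives 0x1234ab78.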



[PATCH v2] drm/amdgpu: Fix missing error code in 'gmc_v6/7/8/9_0_hw_init()'

2024-01-30 Thread Srinivasan Shanmugam
Return r for success scenarios in 'gmc_v6/7/8/9_0_hw_init()'

Fixes the below:
drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c:920 gmc_v6_0_hw_init() warn: missing 
error code? 'r'
drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c:1104 gmc_v7_0_hw_init() warn: missing 
error code? 'r'
drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c:1224 gmc_v8_0_hw_init() warn: missing 
error code? 'r'
drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c:2347 gmc_v9_0_hw_init() warn: missing 
error code? 'r'

Fixes: 8301de8fcadc ("drm/amdgpu: Fix with right return code '-EIO' in 
'amdgpu_gmc_vram_checking()'")
Cc: Christian König 
Cc: Alex Deucher 
Signed-off-by: Srinivasan Shanmugam 
---
v2: 
   Changed 'return 0;' to 'return r;' in 'gmc_v9_0_hw_init' in v1.

 drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c | 4 ++--
 drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c | 4 ++--
 drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c | 4 ++--
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 4 ++--
 4 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
index 229263e407e0..7e53b7b043a8 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
@@ -916,8 +916,8 @@ static int gmc_v6_0_hw_init(void *handle)
 
if (amdgpu_emu_mode == 1)
return amdgpu_gmc_vram_checking(adev);
-   else
-   return r;
+
+   return r;
 }
 
 static int gmc_v6_0_hw_fini(void *handle)
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
index d95f719eec55..d30b57820c9c 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
@@ -1100,8 +1100,8 @@ static int gmc_v7_0_hw_init(void *handle)
 
if (amdgpu_emu_mode == 1)
return amdgpu_gmc_vram_checking(adev);
-   else
-   return r;
+
+   return r;
 }
 
 static int gmc_v7_0_hw_fini(void *handle)
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
index 4eb0cccdb413..5d55e2313345 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
@@ -1220,8 +1220,8 @@ static int gmc_v8_0_hw_init(void *handle)
 
if (amdgpu_emu_mode == 1)
return amdgpu_gmc_vram_checking(adev);
-   else
-   return r;
+
+   return r;
 }
 
 static int gmc_v8_0_hw_fini(void *handle)
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index a3a11538207b..b5651e0426f1 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -2343,8 +2343,8 @@ static int gmc_v9_0_hw_init(void *handle)
 
if (amdgpu_emu_mode == 1)
return amdgpu_gmc_vram_checking(adev);
-   else
-   return r;
+
+   return r;
 }
 
 /**
-- 
2.34.1



[PATCH] drm/amdgpu: Fix missing error code in 'gmc_v6/7/8/9_0_hw_init()'

2024-01-30 Thread Srinivasan Shanmugam
Return r for success scenarios in 'gmc_v6/7/8/9_0_hw_init()'

Fixes the below:
drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c:920 gmc_v6_0_hw_init() warn: missing 
error code? 'r'
drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c:1104 gmc_v7_0_hw_init() warn: missing 
error code? 'r'
drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c:1224 gmc_v8_0_hw_init() warn: missing 
error code? 'r'
drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c:2347 gmc_v9_0_hw_init() warn: missing 
error code? 'r'

Fixes: 8301de8fcadc ("drm/amdgpu: Fix with right return code '-EIO' in 
'amdgpu_gmc_vram_checking()'")
Cc: Christian König 
Cc: Alex Deucher 
Signed-off-by: Srinivasan Shanmugam 
---
 drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c | 4 ++--
 drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c | 4 ++--
 drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c | 4 ++--
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 4 ++--
 4 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
index 229263e407e0..7e53b7b043a8 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
@@ -916,8 +916,8 @@ static int gmc_v6_0_hw_init(void *handle)
 
if (amdgpu_emu_mode == 1)
return amdgpu_gmc_vram_checking(adev);
-   else
-   return r;
+
+   return r;
 }
 
 static int gmc_v6_0_hw_fini(void *handle)
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
index d95f719eec55..d30b57820c9c 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
@@ -1100,8 +1100,8 @@ static int gmc_v7_0_hw_init(void *handle)
 
if (amdgpu_emu_mode == 1)
return amdgpu_gmc_vram_checking(adev);
-   else
-   return r;
+
+   return r;
 }
 
 static int gmc_v7_0_hw_fini(void *handle)
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
index 4eb0cccdb413..5d55e2313345 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
@@ -1220,8 +1220,8 @@ static int gmc_v8_0_hw_init(void *handle)
 
if (amdgpu_emu_mode == 1)
return amdgpu_gmc_vram_checking(adev);
-   else
-   return r;
+
+   return r;
 }
 
 static int gmc_v8_0_hw_fini(void *handle)
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index a3a11538207b..4a50537252ac 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -2343,8 +2343,8 @@ static int gmc_v9_0_hw_init(void *handle)
 
if (amdgpu_emu_mode == 1)
return amdgpu_gmc_vram_checking(adev);
-   else
-   return r;
+
+   return 0;
 }
 
 /**
-- 
2.34.1