RE: [PATCH v2] drm/amdgpu: Call amdgpu_device_unmap_mmio() iff device is unplugged to prevent crash in GPU initialization failure

2021-12-15 Thread Chen, Guchun
[Public]

My bad for misunderstanding this.

There are spelling typos in both the patch subject and body: s/iff/if.

The patch is: Reviewed-by: Guchun Chen 

Please wait for the ack from Andrey and Christian before pushing this.

Regards,
Guchun

-Original Message-
From: Shi, Leslie  
Sent: Thursday, December 16, 2021 3:00 PM
To: Chen, Guchun ; Grodzovsky, Andrey 
; Koenig, Christian ; Pan, 
Xinhui ; Deucher, Alexander ; 
amd-gfx@lists.freedesktop.org
Subject: RE: [PATCH v2] drm/amdgpu: Call amdgpu_device_unmap_mmio() iff device 
is unplugged to prevent crash in GPU initialization failure

[Public]

Hi Guchun,

As Andrey says, "we should not call amdgpu_device_unmap_mmio unless device is 
unplugged", I think we should call amdgpu_device_unmap_mmio() only if the 
device is unplugged (drm_dev_enter() returns false).

+	if (!drm_dev_enter(adev_to_drm(adev), &idx))
+		amdgpu_device_unmap_mmio(adev);
+	else
+		drm_dev_exit(idx);


Regards,
Leslie

-Original Message-
From: Chen, Guchun  
Sent: Thursday, December 16, 2021 2:46 PM
To: Shi, Leslie ; Grodzovsky, Andrey 
; Koenig, Christian ; Pan, 
Xinhui ; Deucher, Alexander ; 
amd-gfx@lists.freedesktop.org
Subject: RE: [PATCH v2] drm/amdgpu: Call amdgpu_device_unmap_mmio() iff device 
is unplugged to prevent crash in GPU initialization failure

[Public]

Hi Leslie,

I think we need to modify it like:

+	if (drm_dev_enter(adev_to_drm(adev), &idx)) {
+		amdgpu_device_unmap_mmio(adev);
+		drm_dev_exit(idx);
+	}

Also, you need to credit Andrey with a 'Suggested-by' tag in your patch.

Regards,
Guchun

-Original Message-
From: Shi, Leslie  
Sent: Thursday, December 16, 2021 2:14 PM
To: Grodzovsky, Andrey ; Koenig, Christian 
; Pan, Xinhui ; Deucher, 
Alexander ; amd-gfx@lists.freedesktop.org
Cc: Chen, Guchun ; Shi, Leslie 
Subject: [PATCH v2] drm/amdgpu: Call amdgpu_device_unmap_mmio() iff device is 
unplugged to prevent crash in GPU initialization failure

[Why]
In amdgpu_driver_load_kms, when amdgpu_device_init returns an error during 
driver modprobe, it starts the error handling path immediately and calls into 
amdgpu_device_unmap_mmio as well to release the mapped VRAM. However, in the 
following release callback, the driver still visits the unmapped memory, such 
as vcn.inst[i].fw_shared_cpu_addr in vcn_v3_0_sw_fini, so a kernel crash occurs.

[How]
Call amdgpu_device_unmap_mmio() only if the device is unplugged, to prevent 
accessing invalid memory addresses in vcn_v3_0_sw_fini() when GPU 
initialization fails.

Signed-off-by: Leslie Shi 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index fb03d75880ec..d3656e7b60c2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3845,6 +3845,8 @@ static void amdgpu_device_unmap_mmio(struct amdgpu_device 
*adev)
  */
 void amdgpu_device_fini_hw(struct amdgpu_device *adev)
 {
+   int idx;
+
dev_info(adev->dev, "amdgpu: finishing device.\n");
	flush_delayed_work(&adev->delayed_init_work);
if (adev->mman.initialized) {
@@ -3888,7 +3890,11 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev)
 
amdgpu_gart_dummy_page_fini(adev);
 
-   amdgpu_device_unmap_mmio(adev);
+	if (!drm_dev_enter(adev_to_drm(adev), &idx))
+   amdgpu_device_unmap_mmio(adev);
+   else
+   drm_dev_exit(idx);
+
 }
 
 void amdgpu_device_fini_sw(struct amdgpu_device *adev)
--
2.25.1


RE: [PATCH v2] drm/amdgpu: Call amdgpu_device_unmap_mmio() iff device is unplugged to prevent crash in GPU initialization failure

2021-12-15 Thread Shi, Leslie
[Public]

Hi Guchun,

As Andrey says, "we should not call amdgpu_device_unmap_mmio unless device is 
unplugged", I think we should call amdgpu_device_unmap_mmio() only if the 
device is unplugged (drm_dev_enter() returns false).

+	if (!drm_dev_enter(adev_to_drm(adev), &idx))
+		amdgpu_device_unmap_mmio(adev);
+	else
+		drm_dev_exit(idx);


Regards,
Leslie

-Original Message-
From: Chen, Guchun  
Sent: Thursday, December 16, 2021 2:46 PM
To: Shi, Leslie ; Grodzovsky, Andrey 
; Koenig, Christian ; Pan, 
Xinhui ; Deucher, Alexander ; 
amd-gfx@lists.freedesktop.org
Subject: RE: [PATCH v2] drm/amdgpu: Call amdgpu_device_unmap_mmio() iff device 
is unplugged to prevent crash in GPU initialization failure

[Public]

Hi Leslie,

I think we need to modify it like:

+	if (drm_dev_enter(adev_to_drm(adev), &idx)) {
+		amdgpu_device_unmap_mmio(adev);
+		drm_dev_exit(idx);
+	}

Also, you need to credit Andrey with a 'Suggested-by' tag in your patch.

Regards,
Guchun

-Original Message-
From: Shi, Leslie  
Sent: Thursday, December 16, 2021 2:14 PM
To: Grodzovsky, Andrey ; Koenig, Christian 
; Pan, Xinhui ; Deucher, 
Alexander ; amd-gfx@lists.freedesktop.org
Cc: Chen, Guchun ; Shi, Leslie 
Subject: [PATCH v2] drm/amdgpu: Call amdgpu_device_unmap_mmio() iff device is 
unplugged to prevent crash in GPU initialization failure

[Why]
In amdgpu_driver_load_kms, when amdgpu_device_init returns an error during 
driver modprobe, it starts the error handling path immediately and calls into 
amdgpu_device_unmap_mmio as well to release the mapped VRAM. However, in the 
following release callback, the driver still visits the unmapped memory, such 
as vcn.inst[i].fw_shared_cpu_addr in vcn_v3_0_sw_fini, so a kernel crash occurs.

[How]
Call amdgpu_device_unmap_mmio() only if the device is unplugged, to prevent 
accessing invalid memory addresses in vcn_v3_0_sw_fini() when GPU 
initialization fails.

Signed-off-by: Leslie Shi 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index fb03d75880ec..d3656e7b60c2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3845,6 +3845,8 @@ static void amdgpu_device_unmap_mmio(struct amdgpu_device 
*adev)
  */
 void amdgpu_device_fini_hw(struct amdgpu_device *adev)
 {
+   int idx;
+
dev_info(adev->dev, "amdgpu: finishing device.\n");
	flush_delayed_work(&adev->delayed_init_work);
if (adev->mman.initialized) {
@@ -3888,7 +3890,11 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev)
 
amdgpu_gart_dummy_page_fini(adev);
 
-   amdgpu_device_unmap_mmio(adev);
+	if (!drm_dev_enter(adev_to_drm(adev), &idx))
+   amdgpu_device_unmap_mmio(adev);
+   else
+   drm_dev_exit(idx);
+
 }
 
 void amdgpu_device_fini_sw(struct amdgpu_device *adev)
--
2.25.1


RE: [PATCH v2] drm/amdgpu: Call amdgpu_device_unmap_mmio() iff device is unplugged to prevent crash in GPU initialization failure

2021-12-15 Thread Chen, Guchun
[Public]

Hi Leslie,

I think we need to modify it like:

+	if (drm_dev_enter(adev_to_drm(adev), &idx)) {
+		amdgpu_device_unmap_mmio(adev);
+		drm_dev_exit(idx);
+	}

Also, you need to credit Andrey with a 'Suggested-by' tag in your patch.

Regards,
Guchun

-Original Message-
From: Shi, Leslie  
Sent: Thursday, December 16, 2021 2:14 PM
To: Grodzovsky, Andrey ; Koenig, Christian 
; Pan, Xinhui ; Deucher, 
Alexander ; amd-gfx@lists.freedesktop.org
Cc: Chen, Guchun ; Shi, Leslie 
Subject: [PATCH v2] drm/amdgpu: Call amdgpu_device_unmap_mmio() iff device is 
unplugged to prevent crash in GPU initialization failure

[Why]
In amdgpu_driver_load_kms, when amdgpu_device_init returns an error during 
driver modprobe, it starts the error handling path immediately and calls into 
amdgpu_device_unmap_mmio as well to release the mapped VRAM. However, in the 
following release callback, the driver still visits the unmapped memory, such 
as vcn.inst[i].fw_shared_cpu_addr in vcn_v3_0_sw_fini, so a kernel crash occurs.

[How]
Call amdgpu_device_unmap_mmio() only if the device is unplugged, to prevent 
accessing invalid memory addresses in vcn_v3_0_sw_fini() when GPU 
initialization fails.

Signed-off-by: Leslie Shi 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index fb03d75880ec..d3656e7b60c2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3845,6 +3845,8 @@ static void amdgpu_device_unmap_mmio(struct amdgpu_device 
*adev)
  */
 void amdgpu_device_fini_hw(struct amdgpu_device *adev)
 {
+   int idx;
+
dev_info(adev->dev, "amdgpu: finishing device.\n");
	flush_delayed_work(&adev->delayed_init_work);
if (adev->mman.initialized) {
@@ -3888,7 +3890,11 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev)
 
amdgpu_gart_dummy_page_fini(adev);
 
-   amdgpu_device_unmap_mmio(adev);
+	if (!drm_dev_enter(adev_to_drm(adev), &idx))
+   amdgpu_device_unmap_mmio(adev);
+   else
+   drm_dev_exit(idx);
+
 }
 
 void amdgpu_device_fini_sw(struct amdgpu_device *adev)
--
2.25.1


[PATCH v2] drm/amdgpu: Call amdgpu_device_unmap_mmio() iff device is unplugged to prevent crash in GPU initialization failure

2021-12-15 Thread Leslie Shi
[Why]
In amdgpu_driver_load_kms, when amdgpu_device_init returns an error during 
driver modprobe, it starts the error handling path immediately and calls into 
amdgpu_device_unmap_mmio as well to release the mapped VRAM. However, in the 
following release callback, the driver still visits the unmapped memory, such 
as vcn.inst[i].fw_shared_cpu_addr in vcn_v3_0_sw_fini, so a kernel crash occurs.

[How]
Call amdgpu_device_unmap_mmio() only if the device is unplugged, to prevent 
accessing invalid memory addresses in vcn_v3_0_sw_fini() when GPU 
initialization fails.

Signed-off-by: Leslie Shi 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index fb03d75880ec..d3656e7b60c2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3845,6 +3845,8 @@ static void amdgpu_device_unmap_mmio(struct amdgpu_device 
*adev)
  */
 void amdgpu_device_fini_hw(struct amdgpu_device *adev)
 {
+   int idx;
+
dev_info(adev->dev, "amdgpu: finishing device.\n");
	flush_delayed_work(&adev->delayed_init_work);
if (adev->mman.initialized) {
@@ -3888,7 +3890,11 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev)
 
amdgpu_gart_dummy_page_fini(adev);
 
-   amdgpu_device_unmap_mmio(adev);
+	if (!drm_dev_enter(adev_to_drm(adev), &idx))
+   amdgpu_device_unmap_mmio(adev);
+   else
+   drm_dev_exit(idx);
+
 }
 
 void amdgpu_device_fini_sw(struct amdgpu_device *adev)
-- 
2.25.1



RE: [PATCH] drm/amd/pm: Fix xgmi link control on aldebaran

2021-12-15 Thread Quan, Evan
[AMD Official Use Only]

Reviewed-by: Evan Quan 

> -Original Message-
> From: amd-gfx  On Behalf Of Lijo
> Lazar
> Sent: Wednesday, December 15, 2021 11:50 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Yang, Stanley
> ; Zhang, Hawking 
> Subject: [PATCH] drm/amd/pm: Fix xgmi link control on aldebaran
> 
> Fix the message argument.
>   0: Allow power down
>   1: Disallow power down
> 
> Signed-off-by: Lijo Lazar 
> ---
>  drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
> b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
> index 0907da022197..7433a051e795 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
> @@ -1625,7 +1625,7 @@ static int
> aldebaran_allow_xgmi_power_down(struct smu_context *smu, bool en)  {
>   return smu_cmn_send_smc_msg_with_param(smu,
>  SMU_MSG_GmiPwrDnControl,
> -en ? 1 : 0,
> +en ? 0 : 1,
>  NULL);
>  }
> 
> --
> 2.25.1


Re: [PATCH v3] drm/amdgpu: introduce new amdgpu_fence object to indicate the job embedded fence

2021-12-15 Thread Huang Rui
On Wed, Dec 15, 2021 at 09:30:31PM +0800, kernel test robot wrote:
> Hi Huang,
> 
> I love your patch! Perhaps something to improve:
> 
> [auto build test WARNING on drm/drm-next]
> [also build test WARNING on drm-intel/for-linux-next drm-tip/drm-tip 
> v5.16-rc5 next-20211214]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch]
> 
> url:
> https://github.com/0day-ci/linux/commits/Huang-Rui/drm-amdgpu-introduce-new-amdgpu_fence-object-to-indicate-the-job-embedded-fence/20211215-143731
> base:   git://anongit.freedesktop.org/drm/drm drm-next
> config: x86_64-allyesconfig 
> (https://download.01.org/0day-ci/archive/20211215/202112152115.sqAqnvG7-lkp@intel.com/config)
> compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
> reproduce (this is a W=1 build):
> # https://github.com/0day-ci/linux/commit/a47becf231b123760625c45242e89f5e5b5b4915
> git remote add linux-review https://github.com/0day-ci/linux
> git fetch --no-tags linux-review 
> Huang-Rui/drm-amdgpu-introduce-new-amdgpu_fence-object-to-indicate-the-job-embedded-fence/20211215-143731
> git checkout a47becf231b123760625c45242e89f5e5b5b4915
> # save the config file to linux build tree
> mkdir build_dir
> make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash drivers/gpu/drm/amd/amdgpu/
> 
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot 
> 
> All warnings (new ones prefixed by >>):
> 
> >> drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c:631: warning: expecting 
> >> prototype for amdgpu_fence_clear_job_fences(). Prototype was for 
> >> amdgpu_fence_driver_clear_job_fences() instead
> 

Nice catch, thank you! It's my typo and I will fix it.

Thanks,
Ray

> 
> vim +631 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> 
>623
>624/**
>625 * amdgpu_fence_clear_job_fences - clear job embedded fences of 
> ring
>626 *
>627 * @ring: fence of the ring to be cleared
>628 *
>629 */
>630void amdgpu_fence_driver_clear_job_fences(struct amdgpu_ring 
> *ring)
>  > 631{
>632int i;
>633struct dma_fence *old, **ptr;
>634
>635for (i = 0; i <= ring->fence_drv.num_fences_mask; i++) {
>636ptr = &ring->fence_drv.fences[i];
>637old = rcu_dereference_protected(*ptr, 1);
>638if (old && old->ops == &amdgpu_job_fence_ops)
>639RCU_INIT_POINTER(*ptr, NULL);
>640}
>641}
>642
> 
> ---
> 0-DAY CI Kernel Test Service, Intel Corporation

RE: [PATCH V5 00/16] Unified entry point for other blocks to interact with power

2021-12-15 Thread Quan, Evan
[AMD Official Use Only]

Hi Lijo,

Please check the latest series. All your comments were addressed except those 
about the return value (EOPNOTSUPP) when an API is unimplemented.
Those I would like to handle separately (with follow-up patches).

BR
Evan
> -Original Message-
> From: Quan, Evan 
> Sent: Monday, December 13, 2021 11:52 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Lazar, Lijo
> ; Quan, Evan 
> Subject: [PATCH V5 00/16] Unified entry point for other blocks to interact
> with power
> 
> There are several problems with current power implementations:
> 1. Too many internal details are exposed to other blocks. Thus to interact
> with
>    power, they need to know which power framework is used (powerplay vs
> swsmu)
>or even whether some API is implemented.
> 2. A lot of cross calls exist, which makes it hard to get a whole picture of
>    the code hierarchy. And that makes any code change/increment error-prone.
> 3. Many different types of lock are used. In total, 13 different locks are
>    used within power, and some of them are even designed for the same
>    purpose.
> 
> To ease the problems above, this patch series tries to
> 1. provide unified entry point for other blocks to interact with power.
> 2. relocate some source code piece/headers to avoid cross callings.
> 3. enforce a unified lock protection on those entry point APIs above.
>That makes the future optimization for unnecessary power locks possible.
> 
> Evan Quan (16):
>   drm/amd/pm: do not expose implementation details to other blocks out
> of power
>   drm/amd/pm: do not expose power implementation details to
> amdgpu_pm.c
>   drm/amd/pm: do not expose power implementation details to display
>   drm/amd/pm: do not expose those APIs used internally only in
> amdgpu_dpm.c
>   drm/amd/pm: do not expose those APIs used internally only in si_dpm.c
>   drm/amd/pm: do not expose the API used internally only in kv_dpm.c
>   drm/amd/pm: create a new holder for those APIs used only by legacy
> ASICs(si/kv)
>   drm/amd/pm: move pp_force_state_enabled member to amdgpu_pm
> structure
>   drm/amd/pm: optimize the amdgpu_pm_compute_clocks()
> implementations
>   drm/amd/pm: move those code piece used by Stoney only to
> smu8_hwmgr.c
>   drm/amd/pm: drop redundant or unused APIs and data structures
>   drm/amd/pm: do not expose the smu_context structure used internally in
> power
>   drm/amd/pm: relocate the power related headers
>   drm/amd/pm: drop unnecessary gfxoff controls
>   drm/amd/pm: revise the performance level setting APIs
>   drm/amd/pm: unified lock protections in amdgpu_dpm.c
> 
>  drivers/gpu/drm/amd/amdgpu/aldebaran.c|2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h   |7 -
>  drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c   |   25 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c|6 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c   |   18 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h   |7 -
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c   |5 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c   |5 +-
>  drivers/gpu/drm/amd/amdgpu/dce_v10_0.c|2 +-
>  drivers/gpu/drm/amd/amdgpu/dce_v11_0.c|2 +-
>  drivers/gpu/drm/amd/amdgpu/dce_v6_0.c |2 +-
>  drivers/gpu/drm/amd/amdgpu/dce_v8_0.c |2 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c   |2 +-
>  .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c |6 +-
>  .../amd/display/amdgpu_dm/amdgpu_dm_pp_smu.c  |  248 +-
>  drivers/gpu/drm/amd/include/amd_shared.h  |2 -
>  .../gpu/drm/amd/include/kgd_pp_interface.h|8 +
>  drivers/gpu/drm/amd/pm/Makefile   |   14 +-
>  drivers/gpu/drm/amd/pm/amdgpu_dpm.c   | 2467 -
>  drivers/gpu/drm/amd/pm/amdgpu_dpm_internal.c  |   94 +
>  drivers/gpu/drm/amd/pm/amdgpu_pm.c|  552 ++--
>  drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h   |  341 +--
>  .../gpu/drm/amd/pm/inc/amdgpu_dpm_internal.h  |   32 +
>  drivers/gpu/drm/amd/pm/legacy-dpm/Makefile|   32 +
>  .../pm/{powerplay => legacy-dpm}/cik_dpm.h|0
>  .../amd/pm/{powerplay => legacy-dpm}/kv_dpm.c |   47 +-
>  .../amd/pm/{powerplay => legacy-dpm}/kv_dpm.h |0
>  .../amd/pm/{powerplay => legacy-dpm}/kv_smc.c |0
>  .../gpu/drm/amd/pm/legacy-dpm/legacy_dpm.c| 1081 
>  .../gpu/drm/amd/pm/legacy-dpm/legacy_dpm.h|   38 +
>  .../amd/pm/{powerplay => legacy-dpm}/ppsmc.h  |0
>  .../pm/{powerplay => legacy-dpm}/r600_dpm.h   |0
>  .../amd/pm/{powerplay => legacy-dpm}/si_dpm.c |  163 +-
>  .../amd/pm/{powerplay => legacy-dpm}/si_dpm.h |   15 +-
>  .../amd/pm/{powerplay => legacy-dpm}/si_smc.c |0
>  .../{powerplay => legacy-dpm}/sislands_smc.h  |0
>  drivers/gpu/drm/amd/pm/powerplay/Makefile |4 -
>  .../gpu/drm/amd/pm/powerplay/amd_powerplay.c  |   51 +-
>  .../drm/amd/pm/powerplay/hwmgr/smu8_hwmgr.c   |   10 +-
>  .../pm/{ => 

[pull] amdgpu drm-fixes-5.16

2021-12-15 Thread Alex Deucher
Hi Dave, Daniel,

Fixes for 5.16.

The following changes since commit 2585cf9dfaaddf00b069673f27bb3f8530e2039c:

  Linux 5.16-rc5 (2021-12-12 14:53:01 -0800)

are available in the Git repository at:

  https://gitlab.freedesktop.org/agd5f/linux.git 
tags/amd-drm-fixes-5.16-2021-12-15

for you to fetch changes up to aa464957f7e660abd554f2546a588f6533720e21:

  drm/amd/pm: fix a potential gpu_metrics_table memory leak (2021-12-14 
17:59:19 -0500)


amd-drm-fixes-5.16-2021-12-15:

amdgpu:
- Fix RLC register offset
- GMC fix
- Properly cache SMU FW version on Yellow Carp
- Fix missing callback on DCN3.1
- Reset DMCUB before HW init
- Fix for GMC powergating on PCO
- Fix a possible memory leak in GPU metrics table handling on RN


Evan Quan (1):
  drm/amdgpu: correct the wrong cached state for GMC on PICASSO

Hawking Zhang (1):
  drm/amdgpu: don't override default ECO_BITs setting

Lang Yu (1):
  drm/amd/pm: fix a potential gpu_metrics_table memory leak

Le Ma (1):
  drm/amdgpu: correct register access for RLC_JUMP_TABLE_RESTORE

Mario Limonciello (1):
  drm/amd/pm: fix reading SMU FW version from amdgpu_firmware_info on YC

Nicholas Kazlauskas (2):
  drm/amd/display: Set exit_optimized_pwr_state for DCN31
  drm/amd/display: Reset DMCUB before HW init

 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 4 ++--
 drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c  | 1 -
 drivers/gpu/drm/amd/amdgpu/gfxhub_v2_0.c  | 1 -
 drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c  | 1 -
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 8 
 drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c   | 9 -
 drivers/gpu/drm/amd/amdgpu/mmhub_v1_7.c   | 1 -
 drivers/gpu/drm/amd/amdgpu/mmhub_v2_0.c   | 1 -
 drivers/gpu/drm/amd/amdgpu/mmhub_v2_3.c   | 1 -
 drivers/gpu/drm/amd/amdgpu/mmhub_v9_4.c   | 2 --
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 5 +
 drivers/gpu/drm/amd/display/dc/dcn31/dcn31_init.c | 1 +
 drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c  | 7 ++-
 drivers/gpu/drm/amd/pm/swsmu/smu12/smu_v12_0.c| 3 +++
 drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c| 3 +++
 15 files changed, 32 insertions(+), 16 deletions(-)


RE: [PATCH] drm/amdgpu: fixup bad vram size on gmc v8

2021-12-15 Thread 周宗敏
The problematic board that I have tested is an [AMD/ATI] Lexa PRO [Radeon RX 
550/550X]; the vbios version is 113-RXF9310-C09-BT.

When an exception occurs, I can see garbage in the upper 16 bits of the vram 
size value read from RREG32(mmCONFIG_MEMSIZE), and dmesg messages like the 
following.

When the vram size register has garbage, we may see an error message like:
amdgpu :09:00.0: VRAM: 4286582784M 0x00F4 - 0x000FF8F4 (4286582784M used)

The correct message should look like:
amdgpu :09:00.0: VRAM: 4096M 0x00F4 - 0x00F4 (4096M used)

If you have any problems, please send me mail. Thanks very much.

Subject: Re: [PATCH] drm/amdgpu: fixup bad vram size on gmc v8
Date: 2021-12-16 04:23
From: Alex Deucher
To: Zongmin Zhou

On Wed, Dec 15, 2021 at 10:31 AM Zongmin Zhou wrote:
>
> Some boards (like RX550) seem to have garbage in the upper
> 16 bits of the vram size register.  Check for
> this and clamp the size properly.  Fixes
> boards reporting bogus amounts of vram.
>
> After adding this patch, the maximum GPU VRAM size is 64GB;
> otherwise only 64GB vram size will be used.

Can you provide some examples of problematic boards and possibly a
vbios image from the problematic board?  What values are you seeing?
It would be nice to see what the boards are reporting and whether the
lower 16 bits are actually correct or if it is some other issue.  This
register is undefined until the asic has been initialized.  The vbios
programs it as part of its asic init sequence (either via vesa/gop or
the OS driver).

Alex

> Signed-off-by: Zongmin Zhou
> ---
>  drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c | 13 ++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
> index 492ebed2915b..63b890f1e8af 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
> @@ -515,10 +515,10 @@ static void gmc_v8_0_mc_program(struct amdgpu_device *adev)
>  static int gmc_v8_0_mc_init(struct amdgpu_device *adev)
>  {
>         int r;
> +       u32 tmp;
>
>         adev->gmc.vram_width = amdgpu_atombios_get_vram_width(adev);
>         if (!adev->gmc.vram_width) {
> -               u32 tmp;
>                 int chansize, numchan;
>
>                 /* Get VRAM informations */
> @@ -562,8 +562,15 @@ static int gmc_v8_0_mc_init(struct amdgpu_device *adev)
>                 adev->gmc.vram_width = numchan * chansize;
>         }
>         /* size in MB on si */
> -       adev->gmc.mc_vram_size = RREG32(mmCONFIG_MEMSIZE) * 1024ULL * 1024ULL;
> -       adev->gmc.real_vram_size = RREG32(mmCONFIG_MEMSIZE) * 1024ULL * 1024ULL;
> +       tmp = RREG32(mmCONFIG_MEMSIZE);
> +       /* some boards may have garbage in the upper 16 bits */
> +       if (tmp & 0xffff0000) {
> +               DRM_INFO("Probable bad vram size: 0x%08x\n", tmp);
> +               if (tmp & 0xffff)
> +                       tmp &= 0xffff;
> +       }
> +       adev->gmc.mc_vram_size = tmp * 1024ULL * 1024ULL;
> +       adev->gmc.real_vram_size = adev->gmc.mc_vram_size;
>
>         if (!(adev->flags & AMD_IS_APU)) {
>                 r = amdgpu_device_resize_fb_bar(adev);
> --
> 2.25.1

[PATCH] drm/amdgpu: add support for IP discovery gc_info table v2

2021-12-15 Thread Alex Deucher
Used on gfx9 based systems. Fixes incorrect CU counts reported
in the kernel log.

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1833
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 76 +--
 drivers/gpu/drm/amd/include/discovery.h   | 49 
 2 files changed, 103 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index ea00090b3fb3..bcc9343353b5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -526,10 +526,15 @@ void amdgpu_discovery_harvest_ip(struct amdgpu_device 
*adev)
}
 }
 
+union gc_info {
+   struct gc_info_v1_0 v1;
+   struct gc_info_v2_0 v2;
+};
+
 int amdgpu_discovery_get_gfx_info(struct amdgpu_device *adev)
 {
struct binary_header *bhdr;
-   struct gc_info_v1_0 *gc_info;
+   union gc_info *gc_info;
 
if (!adev->mman.discovery_bin) {
DRM_ERROR("ip discovery uninitialized\n");
@@ -537,28 +542,55 @@ int amdgpu_discovery_get_gfx_info(struct amdgpu_device 
*adev)
}
 
bhdr = (struct binary_header *)adev->mman.discovery_bin;
-   gc_info = (struct gc_info_v1_0 *)(adev->mman.discovery_bin +
+   gc_info = (union gc_info *)(adev->mman.discovery_bin +
le16_to_cpu(bhdr->table_list[GC].offset));
-
-   adev->gfx.config.max_shader_engines = le32_to_cpu(gc_info->gc_num_se);
-   adev->gfx.config.max_cu_per_sh = 2 * 
(le32_to_cpu(gc_info->gc_num_wgp0_per_sa) +
- 
le32_to_cpu(gc_info->gc_num_wgp1_per_sa));
-   adev->gfx.config.max_sh_per_se = le32_to_cpu(gc_info->gc_num_sa_per_se);
-   adev->gfx.config.max_backends_per_se = 
le32_to_cpu(gc_info->gc_num_rb_per_se);
-   adev->gfx.config.max_texture_channel_caches = 
le32_to_cpu(gc_info->gc_num_gl2c);
-   adev->gfx.config.max_gprs = le32_to_cpu(gc_info->gc_num_gprs);
-   adev->gfx.config.max_gs_threads = 
le32_to_cpu(gc_info->gc_num_max_gs_thds);
-   adev->gfx.config.gs_vgt_table_depth = 
le32_to_cpu(gc_info->gc_gs_table_depth);
-   adev->gfx.config.gs_prim_buffer_depth = 
le32_to_cpu(gc_info->gc_gsprim_buff_depth);
-   adev->gfx.config.double_offchip_lds_buf = 
le32_to_cpu(gc_info->gc_double_offchip_lds_buffer);
-   adev->gfx.cu_info.wave_front_size = le32_to_cpu(gc_info->gc_wave_size);
-   adev->gfx.cu_info.max_waves_per_simd = 
le32_to_cpu(gc_info->gc_max_waves_per_simd);
-   adev->gfx.cu_info.max_scratch_slots_per_cu = 
le32_to_cpu(gc_info->gc_max_scratch_slots_per_cu);
-   adev->gfx.cu_info.lds_size = le32_to_cpu(gc_info->gc_lds_size);
-   adev->gfx.config.num_sc_per_sh = le32_to_cpu(gc_info->gc_num_sc_per_se) 
/
-le32_to_cpu(gc_info->gc_num_sa_per_se);
-   adev->gfx.config.num_packer_per_sc = 
le32_to_cpu(gc_info->gc_num_packer_per_sc);
-
+   switch (gc_info->v1.header.version_major) {
+   case 1:
+   adev->gfx.config.max_shader_engines = 
le32_to_cpu(gc_info->v1.gc_num_se);
+   adev->gfx.config.max_cu_per_sh = 2 * 
(le32_to_cpu(gc_info->v1.gc_num_wgp0_per_sa) +
+ 
le32_to_cpu(gc_info->v1.gc_num_wgp1_per_sa));
+   adev->gfx.config.max_sh_per_se = 
le32_to_cpu(gc_info->v1.gc_num_sa_per_se);
+   adev->gfx.config.max_backends_per_se = 
le32_to_cpu(gc_info->v1.gc_num_rb_per_se);
+   adev->gfx.config.max_texture_channel_caches = 
le32_to_cpu(gc_info->v1.gc_num_gl2c);
+   adev->gfx.config.max_gprs = 
le32_to_cpu(gc_info->v1.gc_num_gprs);
+   adev->gfx.config.max_gs_threads = 
le32_to_cpu(gc_info->v1.gc_num_max_gs_thds);
+   adev->gfx.config.gs_vgt_table_depth = 
le32_to_cpu(gc_info->v1.gc_gs_table_depth);
+   adev->gfx.config.gs_prim_buffer_depth = 
le32_to_cpu(gc_info->v1.gc_gsprim_buff_depth);
+   adev->gfx.config.double_offchip_lds_buf = 
le32_to_cpu(gc_info->v1.gc_double_offchip_lds_buffer);
+   adev->gfx.cu_info.wave_front_size = 
le32_to_cpu(gc_info->v1.gc_wave_size);
+   adev->gfx.cu_info.max_waves_per_simd = 
le32_to_cpu(gc_info->v1.gc_max_waves_per_simd);
+   adev->gfx.cu_info.max_scratch_slots_per_cu = 
le32_to_cpu(gc_info->v1.gc_max_scratch_slots_per_cu);
+   adev->gfx.cu_info.lds_size = 
le32_to_cpu(gc_info->v1.gc_lds_size);
+   adev->gfx.config.num_sc_per_sh = 
le32_to_cpu(gc_info->v1.gc_num_sc_per_se) /
+   le32_to_cpu(gc_info->v1.gc_num_sa_per_se);
+   adev->gfx.config.num_packer_per_sc = 
le32_to_cpu(gc_info->v1.gc_num_packer_per_sc);
+   break;
+   case 2:
+   adev->gfx.config.max_shader_engines = 
le32_to_cpu(gc_info->v2.gc_num_se);
+   adev->gfx.config.max_cu_per_sh = 

RE: [PATCH 1/2] drm/amdgpu: Separate vf2pf work item init from virt data exchange

2021-12-15 Thread Liu, Shaoyun
[AMD Official Use Only]

Looks ok to me. This series is Reviewed-by: Shaoyun Liu 

Regards
Shaoyun.liu

-Original Message-
From: amd-gfx  On Behalf Of Victor 
Skvortsov
Sent: Thursday, December 9, 2021 11:48 AM
To: amd-gfx@lists.freedesktop.org
Cc: Skvortsov, Victor 
Subject: [PATCH 1/2] drm/amdgpu: Separate vf2pf work item init from virt data 
exchange

We want to be able to call virt data exchange conditionally after gmc sw init 
to reserve bad pages as early as possible.
Since this is a conditional call, we will need to call it again unconditionally 
later in the init sequence.

Refactor the data exchange function so it can be called multiple times without 
re-initializing the work item.

Signed-off-by: Victor Skvortsov 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 20 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c   | 42 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h   |  5 +--
 drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c  |  2 +-
 drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c  |  2 +-
 5 files changed, 45 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index ce9bdef185c0..3992c4086d26 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2181,7 +2181,7 @@ static int amdgpu_device_ip_early_init(struct 
amdgpu_device *adev)
 
/*get pf2vf msg info at it's earliest time*/
if (amdgpu_sriov_vf(adev))
-   amdgpu_virt_init_data_exchange(adev);
+   amdgpu_virt_exchange_data(adev);
 
}
}
@@ -2345,8 +2345,10 @@ static int amdgpu_device_ip_init(struct amdgpu_device 
*adev)
}
}
 
-   if (amdgpu_sriov_vf(adev))
-   amdgpu_virt_init_data_exchange(adev);
+   if (amdgpu_sriov_vf(adev)) {
+   amdgpu_virt_exchange_data(adev);
+   amdgpu_virt_init_vf2pf_work_item(adev);
+   }
 
r = amdgpu_ib_pool_init(adev);
if (r) {
@@ -2949,7 +2951,7 @@ int amdgpu_device_ip_suspend(struct amdgpu_device *adev)
int r;
 
if (amdgpu_sriov_vf(adev)) {
-   amdgpu_virt_fini_data_exchange(adev);
+   amdgpu_virt_fini_vf2pf_work_item(adev);
amdgpu_virt_request_full_gpu(adev, false);
}
 
@@ -3839,7 +3841,7 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev)
 * */
if (amdgpu_sriov_vf(adev)) {
amdgpu_virt_request_full_gpu(adev, false);
-   amdgpu_virt_fini_data_exchange(adev);
+   amdgpu_virt_fini_vf2pf_work_item(adev);
}
 
/* disable all interrupts */
@@ -4317,7 +4319,9 @@ static int amdgpu_device_reset_sriov(struct amdgpu_device 
*adev,
if (r)
goto error;
 
-   amdgpu_virt_init_data_exchange(adev);
+   amdgpu_virt_exchange_data(adev);
+   amdgpu_virt_init_vf2pf_work_item(adev);
+
/* we need recover gart prior to run SMC/CP/SDMA resume */
	amdgpu_gtt_mgr_recover(ttm_manager_type(&adev->mman.bdev, TTM_PL_TT));
 
@@ -4495,7 +4499,7 @@ int amdgpu_device_pre_asic_reset(struct amdgpu_device 
*adev,
 
if (amdgpu_sriov_vf(adev)) {
/* stop the data exchange thread */
-   amdgpu_virt_fini_data_exchange(adev);
+   amdgpu_virt_fini_vf2pf_work_item(adev);
}
 
	/* block all schedulers and reset given job's ring */
@@ -4898,7 +4902,7 @@ static void amdgpu_device_recheck_guilty_jobs(
 retry:
/* do hw reset */
if (amdgpu_sriov_vf(adev)) {
-   amdgpu_virt_fini_data_exchange(adev);
+   amdgpu_virt_fini_vf2pf_work_item(adev);
r = amdgpu_device_reset_sriov(adev, false);
if (r)
adev->asic_reset_res = r;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
index 3fc49823f527..b6e3d379a86a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
@@ -611,16 +611,7 @@ static void amdgpu_virt_update_vf2pf_work_item(struct 
work_struct *work)
schedule_delayed_work(&(adev->virt.vf2pf_work), 
adev->virt.vf2pf_update_interval_ms);
 }
 
-void amdgpu_virt_fini_data_exchange(struct amdgpu_device *adev)
-{
-   if (adev->virt.vf2pf_update_interval_ms != 0) {
-   DRM_INFO("clean up the vf2pf work item\n");
-   cancel_delayed_work_sync(&adev->virt.vf2pf_work);
-   adev->virt.vf2pf_update_interval_ms = 0;
-   }
-}
-
-void amdgpu_virt_init_data_exchange(struct amdgpu_device *adev)
+void amdgpu_virt_exchange_data(struct amdgpu_device *adev)
 {
uint64_t bp_block_offset = 0;
uint32_t bp_block_size = 0;
@@ 

[PATCH] drm/amdgpu: Filter security violation registers

2021-12-15 Thread Bokun Zhang
Recently, there has been a security policy update under SRIOV.
We need to filter the registers that hit the violation
and move the code to the host driver side so that
the guest driver can execute correctly.

Signed-off-by: Bokun Zhang 
Change-Id: Ida893bb17de17a80e865c7662f04c5562f5d2727
---
 drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c | 83 ++
 1 file changed, 46 insertions(+), 37 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c 
b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
index 4f546f632223..d3d6d5b045b8 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
@@ -542,9 +542,6 @@ static void sdma_v5_2_ctx_switch_enable(struct 
amdgpu_device *adev, bool enable)
}
 
for (i = 0; i < adev->sdma.num_instances; i++) {
-   f32_cntl = RREG32(sdma_v5_2_get_reg_offset(adev, i, 
mmSDMA0_CNTL));
-   f32_cntl = REG_SET_FIELD(f32_cntl, SDMA0_CNTL,
-   AUTO_CTXSW_ENABLE, enable ? 1 : 0);
if (enable && amdgpu_sdma_phase_quantum) {
WREG32_SOC15_IP(GC, sdma_v5_2_get_reg_offset(adev, i, 
mmSDMA0_PHASE0_QUANTUM),
   phase_quantum);
@@ -553,7 +550,13 @@ static void sdma_v5_2_ctx_switch_enable(struct 
amdgpu_device *adev, bool enable)
WREG32_SOC15_IP(GC, sdma_v5_2_get_reg_offset(adev, i, 
mmSDMA0_PHASE2_QUANTUM),
   phase_quantum);
}
-   WREG32(sdma_v5_2_get_reg_offset(adev, i, mmSDMA0_CNTL), 
f32_cntl);
+
+   if (!amdgpu_sriov_vf(adev)) {
+   f32_cntl = RREG32(sdma_v5_2_get_reg_offset(adev, i, 
mmSDMA0_CNTL));
+   f32_cntl = REG_SET_FIELD(f32_cntl, SDMA0_CNTL,
+   AUTO_CTXSW_ENABLE, enable ? 1 : 0);
+   WREG32(sdma_v5_2_get_reg_offset(adev, i, mmSDMA0_CNTL), 
f32_cntl);
+   }
}
 
 }
@@ -576,10 +579,12 @@ static void sdma_v5_2_enable(struct amdgpu_device *adev, 
bool enable)
sdma_v5_2_rlc_stop(adev);
}
 
-   for (i = 0; i < adev->sdma.num_instances; i++) {
-   f32_cntl = RREG32(sdma_v5_2_get_reg_offset(adev, i, 
mmSDMA0_F32_CNTL));
-   f32_cntl = REG_SET_FIELD(f32_cntl, SDMA0_F32_CNTL, HALT, enable 
? 0 : 1);
-   WREG32(sdma_v5_2_get_reg_offset(adev, i, mmSDMA0_F32_CNTL), 
f32_cntl);
+   if (!amdgpu_sriov_vf(adev)) {
+   for (i = 0; i < adev->sdma.num_instances; i++) {
+   f32_cntl = RREG32(sdma_v5_2_get_reg_offset(adev, i, 
mmSDMA0_F32_CNTL));
+   f32_cntl = REG_SET_FIELD(f32_cntl, SDMA0_F32_CNTL, 
HALT, enable ? 0 : 1);
+   WREG32(sdma_v5_2_get_reg_offset(adev, i, 
mmSDMA0_F32_CNTL), f32_cntl);
+   }
}
 }
 
@@ -608,7 +613,8 @@ static int sdma_v5_2_gfx_resume(struct amdgpu_device *adev)
ring = >sdma.instance[i].ring;
wb_offset = (ring->rptr_offs * 4);
 
-   WREG32_SOC15_IP(GC, sdma_v5_2_get_reg_offset(adev, i, 
mmSDMA0_SEM_WAIT_FAIL_TIMER_CNTL), 0);
+   if (!amdgpu_sriov_vf(adev))
+   WREG32_SOC15_IP(GC, sdma_v5_2_get_reg_offset(adev, i, 
mmSDMA0_SEM_WAIT_FAIL_TIMER_CNTL), 0);
 
/* Set ring buffer size in dwords */
rb_bufsz = order_base_2(ring->ring_size / 4);
@@ -683,32 +689,34 @@ static int sdma_v5_2_gfx_resume(struct amdgpu_device 
*adev)
sdma_v5_2_ring_set_wptr(ring);
 
/* set minor_ptr_update to 0 after wptr programed */
-   WREG32_SOC15_IP(GC, sdma_v5_2_get_reg_offset(adev, i, 
mmSDMA0_GFX_MINOR_PTR_UPDATE), 0);
 
-   /* set utc l1 enable flag always to 1 */
-   temp = RREG32(sdma_v5_2_get_reg_offset(adev, i, mmSDMA0_CNTL));
-   temp = REG_SET_FIELD(temp, SDMA0_CNTL, UTC_L1_ENABLE, 1);
-
-   /* enable MCBP */
-   temp = REG_SET_FIELD(temp, SDMA0_CNTL, MIDCMD_PREEMPT_ENABLE, 
1);
-   WREG32(sdma_v5_2_get_reg_offset(adev, i, mmSDMA0_CNTL), temp);
-
-   /* Set up RESP_MODE to non-copy addresses */
-   temp = RREG32_SOC15_IP(GC, sdma_v5_2_get_reg_offset(adev, i, 
mmSDMA0_UTCL1_CNTL));
-   temp = REG_SET_FIELD(temp, SDMA0_UTCL1_CNTL, RESP_MODE, 3);
-   temp = REG_SET_FIELD(temp, SDMA0_UTCL1_CNTL, REDO_DELAY, 9);
-   WREG32_SOC15_IP(GC, sdma_v5_2_get_reg_offset(adev, i, 
mmSDMA0_UTCL1_CNTL), temp);
-
-   /* program default cache read and write policy */
-   temp = RREG32_SOC15_IP(GC, sdma_v5_2_get_reg_offset(adev, i, 
mmSDMA0_UTCL1_PAGE));
-   /* clean read policy and write policy bits */
-   temp &= 0xFF0FFF;
-   temp |= ((CACHE_READ_POLICY_L2__DEFAULT << 12) |
-(CACHE_WRITE_POLICY_L2__DEFAULT << 

RE: [PATCH 4/5] drm/amdgpu: Initialize Aldebaran RLC function pointers

2021-12-15 Thread Skvortsov, Victor
[AMD Official Use Only]

Hey Alex,

This change was based on the fact that amd-mainline-dkms-5.13 calls 
get_xgmi_info() in gmc_v9_0_early_init(). But I can see that in drm-next it's 
instead called in gmc_v9_0_sw_init(). So, I'm not sure what's the correct 
behavior. But I do agree that the change is kind of ugly. I don't know where 
else to put it if we do need to call get_xgmi_info() in early_init.

Thanks,
Victor

-Original Message-
From: Alex Deucher  
Sent: Wednesday, December 15, 2021 4:38 PM
To: Skvortsov, Victor 
Cc: amd-gfx list ; Deng, Emily 
; Liu, Monk ; Ming, Davis 
; Liu, Shaoyun ; Zhou, Peng Ju 
; Chen, JingWen ; Chen, Horace 
; Nieto, David M 
Subject: Re: [PATCH 4/5] drm/amdgpu: Initialize Aldebaran RLC function pointers

[CAUTION: External Email]

On Wed, Dec 15, 2021 at 1:56 PM Victor Skvortsov  
wrote:
>
> In SRIOV, RLC function pointers must be initialized early as we rely 
> on the RLCG interface for all GC register access.
>
> Signed-off-by: Victor Skvortsov 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 2 ++
>  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 3 +--
>  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.h | 2 ++
>  3 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> index 65e1f6cc59dd..1bc92a38d124 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> @@ -844,6 +844,8 @@ static int amdgpu_discovery_set_gc_ip_blocks(struct 
> amdgpu_device *adev)
> case IP_VERSION(9, 4, 1):
> case IP_VERSION(9, 4, 2):
> amdgpu_device_ip_block_add(adev, &gfx_v9_0_ip_block);
> +   if (amdgpu_sriov_vf(adev) && adev->ip_versions[GC_HWIP][0] == 
> IP_VERSION(9, 4, 2))
> +   gfx_v9_0_set_rlc_funcs(adev);

amdgpu_discovery.c is IP independent.  I'd rather not add random IP specific 
function calls.  gfx_v9_0_set_rlc_funcs() already gets called in 
gfx_v9_0_early_init().  Is that not early enough?  In general we shouldn't be 
touching the hardware much if at all in early_init.

Alex

> break;
> case IP_VERSION(10, 1, 10):
> case IP_VERSION(10, 1, 2):
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> index edb3e3b08eed..d252b06efa43 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> @@ -816,7 +816,6 @@ static void gfx_v9_0_sriov_wreg(struct 
> amdgpu_device *adev, u32 offset,  static void 
> gfx_v9_0_set_ring_funcs(struct amdgpu_device *adev);  static void 
> gfx_v9_0_set_irq_funcs(struct amdgpu_device *adev);  static void 
> gfx_v9_0_set_gds_init(struct amdgpu_device *adev); -static void 
> gfx_v9_0_set_rlc_funcs(struct amdgpu_device *adev);  static int 
> gfx_v9_0_get_cu_info(struct amdgpu_device *adev,
> struct amdgpu_cu_info *cu_info);  
> static uint64_t gfx_v9_0_get_gpu_clock_counter(struct amdgpu_device 
> *adev); @@ -7066,7 +7065,7 @@ static void gfx_v9_0_set_irq_funcs(struct 
> amdgpu_device *adev)
> adev->gfx.cp_ecc_error_irq.funcs = 
> &gfx_v9_0_cp_ecc_error_irq_funcs;  }
>
> -static void gfx_v9_0_set_rlc_funcs(struct amdgpu_device *adev)
> +void gfx_v9_0_set_rlc_funcs(struct amdgpu_device *adev)
>  {
> switch (adev->ip_versions[GC_HWIP][0]) {
> case IP_VERSION(9, 0, 1):
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.h 
> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.h
> index dfe8d4841f58..1817e252354f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.h
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.h
> @@ -29,4 +29,6 @@ extern const struct amdgpu_ip_block_version 
> gfx_v9_0_ip_block;  void gfx_v9_0_select_se_sh(struct amdgpu_device *adev, 
> u32 se_num, u32 sh_num,
>u32 instance);
>
> +void gfx_v9_0_set_rlc_funcs(struct amdgpu_device *adev);
> +
>  #endif
> --
> 2.25.1
>


Re: Various problems trying to vga-passthrough a Renoir iGPU to a xen/qubes-os hvm

2021-12-15 Thread Alex Deucher
Thinking about this more, I think the problem might be related to CPU
access to "VRAM".  APUs don't have dedicated VRAM, they use a reserved
carve out region at the top of system memory.  For CPU access to this
memory, we kmap the physical address of the carve out region of system
memory.  You'll need to make sure that region is accessible to the
guest.

Alex

On Mon, Dec 13, 2021 at 3:29 PM Alex Deucher  wrote:
>
> On Sun, Dec 12, 2021 at 5:19 PM Yann Dirson  wrote:
> >
> > Alex wrote:
> > > On Mon, Dec 6, 2021 at 4:36 PM Yann Dirson  wrote:
> > > >
> > > > Hi Alex,
> > > >
> > > > > We have not validated virtualization of our integrated GPUs.  I
> > > > > don't
> > > > > know that it will work at all.  We had done a bit of testing but
> > > > > ran
> > > > > into the same issues with the PSP, but never had a chance to
> > > > > debug
> > > > > further because this feature is not productized.
> > > > ...
> > > > > You need a functional PSP to get the GPU driver up and running.
> > > >
> > > > Ah, thanks for the hint :)
> > > >
> > > > I guess that if I want to have any chance to get the PSP working
> > > > I'm
> > > > going to need more details on it.  A quick search some time ago
> > > > mostly
> > > > brought reverse-engineering work, rather than official AMD doc.
> > > >  Are
> > > > there some AMD resources I missed ?
> > >
> > > The driver code is pretty much it.
> >
> > Let's try to shed some more light on how things work, taking as excuse
> > psp_v12_0_ring_create().
> >
> > First, register access through [RW]REG32_SOC15() is implemented in
> > terms of __[RW]REG32_SOC15_RLC__(), which is basically a [RW]REG32(),
> > except it has to be more complex in the SR-IOV case.
> > Has the RLC anything to do with SR-IOV ?
>
> When running the driver on a SR-IOV virtual function (VF), some
> registers are not available directly via the VF's MMIO aperture so
> they need to go through the RLC.  For bare metal or passthrough this
> is not relevant.
>
> >
> > It accesses registers in the MMIO range of the MP0 IP, and the "MP0"
> > name correlates highly with MMIO accesses in PSP-handling code.
> > Is "MP0" another name for PSP (and "MP1" for SMU) ?  The MP0 version
>
> Yes.
>
> > reported at v11.0.3 by discovery seems to contradict the use of v12.0
> > for RENOIR as set by soc15_set_ip_blocks(), or do I miss something ?
>
> Typo in the ip discovery table on renoir.
>
> >
> > More generally (and mostly out of curiosity while we're at it), do we
> > have a way to match IPs listed at discovery time with the ones used
> > in the driver ?
>
> In general, barring typos, the code is shared at the major version
> level.  The actual code may or may not need changes to handle minor
> revision changes in an IP.  The driver maps the IP versions from the
> ip discovery table to the code contained in the driver.
>
> >
> > ---
> >
> > As for the register names, maybe we could have a short explanation of
> > how they are structured ?  Eg. mmMP0_SMN_C2PMSG_69: that seems to be
> > a MMIO register named "C2PMSG_69" in the "MP0" IP, but I'm not sure
> > of the "SMN" part -- that could refer to the "System Management Network",
> > described in [0] as an internal bus.  Are we accessing this register
> > through this SMN ?
>
> These registers are just mailboxes for the PSP firmware.  All of the
> C2PMSG registers functionality is defined by the PSP firmware.  They
> are basically scratch registers used to communicate between the driver
> and the PSP firmware.
>
> >
> >
> > >  On APUs, the PSP is shared with
> > > the CPU and the rest of the platform.  The GPU driver just interacts
> > > with it for a few specific tasks:
> > > 1. Loading Trusted Applications (e.g., trusted firmware applications
> > > that run on the PSP for specific functionality, e.g., HDCP and
> > > content
> > > protection, etc.)
> > > 2. Validating and loading firmware for other engines on the SoC.
> > >  This
> > > is required to use those engines.
> >
> > Trying to understand in more details how we start the PSP up, I noticed
> > that psp_v12_0 has support for loading a sOS firmware, but never calls
> > init_sos_microcode() - and anyway there is no sos firmware for renoir
> > and green_sardine, which seem to be the only ASICs with this PSP version.
> > Is it something that's just not been completely wired up yet ?
>
> On APUs, the PSP is shared with the CPU so the PSP firmware is part of
> the sbios image.  The driver doesn't load it.  We only load it on
> dGPUs where the driver is responsible for the chip initialization.
>
> >
> > That also rings a bell, that we have nothing about Secure OS in the doc
> > yet (not even the acronym in the glossary).
> >
> >
> > > I'm not too familiar with the PSP's path to memory from the GPU
> > > perspective.  IIRC, most memory used by the PSP goes through carve
> > > out
> > > "vram" on APUs so it should work, but I would double check if there
> > > are any system memory allocations that used to interact with the PSP
> > > 

Re: [PATCH 4/5] drm/amdgpu: Initialize Aldebaran RLC function pointers

2021-12-15 Thread Alex Deucher
On Wed, Dec 15, 2021 at 1:56 PM Victor Skvortsov
 wrote:
>
> In SRIOV, RLC function pointers must be initialized early as
> we rely on the RLCG interface for all GC register access.
>
> Signed-off-by: Victor Skvortsov 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 2 ++
>  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 3 +--
>  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.h | 2 ++
>  3 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> index 65e1f6cc59dd..1bc92a38d124 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> @@ -844,6 +844,8 @@ static int amdgpu_discovery_set_gc_ip_blocks(struct 
> amdgpu_device *adev)
> case IP_VERSION(9, 4, 1):
> case IP_VERSION(9, 4, 2):
> amdgpu_device_ip_block_add(adev, &gfx_v9_0_ip_block);
> +   if (amdgpu_sriov_vf(adev) && adev->ip_versions[GC_HWIP][0] == 
> IP_VERSION(9, 4, 2))
> +   gfx_v9_0_set_rlc_funcs(adev);

amdgpu_discovery.c is IP independent.  I'd rather not add random IP
specific function calls.  gfx_v9_0_set_rlc_funcs() already gets called
in gfx_v9_0_early_init().  Is that not early enough?  In general we
shouldn't be touching the hardware much if at all in early_init.

Alex

> break;
> case IP_VERSION(10, 1, 10):
> case IP_VERSION(10, 1, 2):
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> index edb3e3b08eed..d252b06efa43 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> @@ -816,7 +816,6 @@ static void gfx_v9_0_sriov_wreg(struct amdgpu_device 
> *adev, u32 offset,
>  static void gfx_v9_0_set_ring_funcs(struct amdgpu_device *adev);
>  static void gfx_v9_0_set_irq_funcs(struct amdgpu_device *adev);
>  static void gfx_v9_0_set_gds_init(struct amdgpu_device *adev);
> -static void gfx_v9_0_set_rlc_funcs(struct amdgpu_device *adev);
>  static int gfx_v9_0_get_cu_info(struct amdgpu_device *adev,
> struct amdgpu_cu_info *cu_info);
>  static uint64_t gfx_v9_0_get_gpu_clock_counter(struct amdgpu_device *adev);
> @@ -7066,7 +7065,7 @@ static void gfx_v9_0_set_irq_funcs(struct amdgpu_device 
> *adev)
> adev->gfx.cp_ecc_error_irq.funcs = &gfx_v9_0_cp_ecc_error_irq_funcs;
>  }
>
> -static void gfx_v9_0_set_rlc_funcs(struct amdgpu_device *adev)
> +void gfx_v9_0_set_rlc_funcs(struct amdgpu_device *adev)
>  {
> switch (adev->ip_versions[GC_HWIP][0]) {
> case IP_VERSION(9, 0, 1):
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.h 
> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.h
> index dfe8d4841f58..1817e252354f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.h
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.h
> @@ -29,4 +29,6 @@ extern const struct amdgpu_ip_block_version 
> gfx_v9_0_ip_block;
>  void gfx_v9_0_select_se_sh(struct amdgpu_device *adev, u32 se_num, u32 
> sh_num,
>u32 instance);
>
> +void gfx_v9_0_set_rlc_funcs(struct amdgpu_device *adev);
> +
>  #endif
> --
> 2.25.1
>


Re: [PATCH 6/7] drm/amdgpu: Ensure kunmap is called on error

2021-12-15 Thread Ira Weiny
On Tue, Dec 14, 2021 at 08:09:29AM +0100, Christian König wrote:
> Am 14.12.21 um 04:37 schrieb Ira Weiny:
> > On Mon, Dec 13, 2021 at 09:37:32PM +0100, Christian König wrote:
> > > Am 11.12.21 um 00:24 schrieb ira.we...@intel.com:
> > > > From: Ira Weiny 
> > > > 
> > > > The default case leaves the buffer object mapped in error.
> > > > 
> > > > Add amdgpu_bo_kunmap() to that case to ensure the mapping is cleaned up.
> > > Mhm, good catch. But why do you want to do this in the first place?
> > I'm not sure I understand the question.
> > 
> > Any mapping of memory should be paired with an unmapping when no longer 
> > needed.
> > And this is supported by the call to amdgpu_bo_kunmap() in the other
> > non-default cases.
> > 
> > Do you believe the mapping is not needed?
> 
> No, the unmapping is not needed here. See the function amdgpu_bo_kmap(), it
> either creates the mapping or return the cached pointer.

Ah I missed that.  Thanks.

> 
> A call to amdgpu_bo_kunmap() is only done in a few places where we know that
> the created mapping most likely won't be needed any more. If that's not done
> the mapping is automatically destroyed when the BO is moved or freed up.
> 
> I mean good bug fix, but you seem to see this as some kind of prerequisite
> to some follow up work converting TTM to use kmap_local() which most likely
> won't work in the first place.

Sure.  I see now that it is more complicated than I thought but I never thought
of this as a strict prerequisite.  Just something I found while trying to
figure out how this works.

How much of a speed up is it to maintain the ttm_bo_map_kmap map type?  Could
this all be done with vmap and just remove the kmap stuff?

Ira

> 
> Regards,
> Christian.
> 
> > 
> > Ira
> > 
> > > Christian.
> > > 
> > > > Signed-off-by: Ira Weiny 
> > > > 
> > > > ---
> > > > NOTE: It seems like this function could use a fair bit of refactoring
> > > > but this is the easiest way to fix the actual bug.
> > > > ---
> > > >drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c | 1 +
> > > >1 file changed, 1 insertion(+)
> > > > nice
> > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c 
> > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
> > > > index 6f8de11a17f1..b3ffd0f6b35f 100644
> > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
> > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
> > > > @@ -889,6 +889,7 @@ static int amdgpu_uvd_cs_msg(struct 
> > > > amdgpu_uvd_cs_ctx *ctx,
> > > > return 0;
> > > > default:
> > > > +   amdgpu_bo_kunmap(bo);
> > > > DRM_ERROR("Illegal UVD message type (%d)!\n", msg_type);
> > > > }
> 


Re: [PATCH v4 4/6] drm: implement a method to free unused pages

2021-12-15 Thread Arunpravin



On 14/12/21 12:10 am, Matthew Auld wrote:
> On 01/12/2021 16:39, Arunpravin wrote:
>> On contiguous allocation, we round up the size
>> to the *next* power of 2, implement a function
>> to free the unused pages after the newly allocate block.
>>
>> v2(Matthew Auld):
>>- replace function name 'drm_buddy_free_unused_pages' with
>>  drm_buddy_block_trim
>>- replace input argument name 'actual_size' with 'new_size'
>>- add more validation checks for input arguments
>>- add overlaps check to avoid needless searching and splitting
>>- merged the below patch to see the feature in action
>>  - add free unused pages support to i915 driver
>>- lock drm_buddy_block_trim() function as it calls mark_free/mark_split
>>  are all globally visible
>>
>> v3:
>>- remove drm_buddy_block_trim() error handling and
>>  print a warn message if it fails
>>
>> Signed-off-by: Arunpravin 
>> ---
>>   drivers/gpu/drm/drm_buddy.c   | 72 ++-
>>   drivers/gpu/drm/i915/i915_ttm_buddy_manager.c | 10 +++
>>   include/drm/drm_buddy.h   |  4 ++
>>   3 files changed, 83 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
>> index eddc1eeda02e..707efc82216d 100644
>> --- a/drivers/gpu/drm/drm_buddy.c
>> +++ b/drivers/gpu/drm/drm_buddy.c
>> @@ -434,7 +434,8 @@ alloc_from_freelist(struct drm_buddy_mm *mm,
>>   static int __alloc_range(struct drm_buddy_mm *mm,
>>   struct list_head *dfs,
>>   u64 start, u64 size,
>> - struct list_head *blocks)
>> + struct list_head *blocks,
>> + bool trim_path)
>>   {
>>  struct drm_buddy_block *block;
>>  struct drm_buddy_block *buddy;
>> @@ -480,8 +481,20 @@ static int __alloc_range(struct drm_buddy_mm *mm,
>>   
>>  if (!drm_buddy_block_is_split(block)) {
>>  err = split_block(mm, block);
>> -if (unlikely(err))
>> +if (unlikely(err)) {
>> +if (trim_path)
>> +/*
>> + * Here in case of trim, we return and 
>> dont goto
>> + * split failure path as it removes 
>> from the
>> + * original list and potentially also 
>> freeing
>> + * the block. so we could leave as it 
>> is,
>> + * worse case we get some internal 
>> fragmentation
>> + * and leave the decision to the user
>> + */
>> +return err;
> 
> Hmm, ideally we don't want to leave around blocks where both buddies are 
> free without then also merging them back(not sure if that trips some 
> BUG_ON). Also IIUC, if we hit this failure path, depending on where the 
> split_block() fails we might be allocating something less than new_size? 
> Also if it's the first split_block() that fails then the user just gets 
> an empty list?
> 
> Could we perhaps just turn this node into a temporary root node to 
> prevent recursively freeing itself, but still retain the 
> un-splitting/freeing of the other nodes i.e something like:
> 
> list_del(&block->link);
> mark_free(mm, block);
> mm->avail += ...;
> 
> /* Prevent recursively freeing this node */
> parent = block->parent;
> block->parent = NULL;
> 
> list_add(&block->tmp_link, &dfs);
> ret = __alloc_range(mm, &dfs, new_start, new_size, blocks);
> if (ret) {
>  mm->avail -= ...;
>  mark_allocated(block);
>  list_add(&block->link, blocks);
> }
> 
> block->parent = parent;
> return ret;
> 
> That way we can also drop the special trim_path handling. Thoughts?

That's a nice idea. It will work.
> 
>> +
>>  goto err_undo;
>> +}
>>  }
>>   
>>  list_add(>right->tmp_link, dfs);
>> @@ -535,8 +548,61 @@ static int __drm_buddy_alloc_range(struct drm_buddy_mm 
>> *mm,
>>  for (i = 0; i < mm->n_roots; ++i)
>>  list_add_tail(&mm->roots[i]->tmp_link, &dfs);
>>   
>> -return __alloc_range(mm, &dfs, start, size, blocks);
>> +return __alloc_range(mm, &dfs, start, size, blocks, 0);
>> +}
>> +
>> +/**
>> + * drm_buddy_block_trim - free unused pages
>> + *
>> + * @mm: DRM buddy manager
>> + * @new_size: original size requested
>> + * @blocks: output list head to add allocated blocks
>> + *
>> + * For contiguous allocation, we round up the size to the nearest
>> + * power of two value, drivers consume *actual* size, so remaining
>> + * portions are unused and it can be freed.
>> + *
>> + * Returns:
>> + * 0 on success, error code on failure.
>> + */
>> +int drm_buddy_block_trim(struct drm_buddy_mm *mm,
>> + u64 new_size,
>> + struct list_head *blocks)
>> +{
>> +

[PATCH v2 5/5] drm/amdgpu: Modify indirect register access for gfx9 sriov

2021-12-15 Thread Victor Skvortsov
Expand RLCG interface for new GC read & write commands.
New interface will only be used if the PF enables the flag in pf2vf msg.

v2: Added a description for the scratch registers

Signed-off-by: Victor Skvortsov 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 117 --
 1 file changed, 89 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index d252b06efa43..7a754cb8236e 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -63,6 +63,13 @@
 #define mmGCEA_PROBE_MAP0x070c
 #define mmGCEA_PROBE_MAP_BASE_IDX   0
 
+#define GFX9_RLCG_GC_WRITE_OLD (0x8 << 28)
+#define GFX9_RLCG_GC_WRITE (0x0 << 28)
+#define GFX9_RLCG_GC_READ  (0x1 << 28)
+#define GFX9_RLCG_VFGATE_DISABLED  0x400
+#define GFX9_RLCG_WRONG_OPERATION_TYPE 0x200
+#define GFX9_RLCG_NOT_IN_RANGE 0x100
+
 MODULE_FIRMWARE("amdgpu/vega10_ce.bin");
 MODULE_FIRMWARE("amdgpu/vega10_pfp.bin");
 MODULE_FIRMWARE("amdgpu/vega10_me.bin");
@@ -739,7 +746,7 @@ static const u32 GFX_RLC_SRM_INDEX_CNTL_DATA_OFFSETS[] =
mmRLC_SRM_INDEX_CNTL_DATA_7 - mmRLC_SRM_INDEX_CNTL_DATA_0,
 };
 
-static void gfx_v9_0_rlcg_w(struct amdgpu_device *adev, u32 offset, u32 v, u32 
flag)
+static u32 gfx_v9_0_rlcg_rw(struct amdgpu_device *adev, u32 offset, u32 v, 
uint32_t flag)
 {
static void *scratch_reg0;
static void *scratch_reg1;
@@ -748,21 +755,20 @@ static void gfx_v9_0_rlcg_w(struct amdgpu_device *adev, 
u32 offset, u32 v, u32 f
static void *spare_int;
static uint32_t grbm_cntl;
static uint32_t grbm_idx;
+   uint32_t i = 0;
+   uint32_t retries = 5;
+   u32 ret = 0;
+   u32 tmp;
 
scratch_reg0 = adev->rmmio + 
(adev->reg_offset[GC_HWIP][0][mmSCRATCH_REG0_BASE_IDX] + mmSCRATCH_REG0)*4;
scratch_reg1 = adev->rmmio + 
(adev->reg_offset[GC_HWIP][0][mmSCRATCH_REG1_BASE_IDX] + mmSCRATCH_REG1)*4;
-   scratch_reg2 = adev->rmmio + 
(adev->reg_offset[GC_HWIP][0][mmSCRATCH_REG1_BASE_IDX] + mmSCRATCH_REG2)*4;
-   scratch_reg3 = adev->rmmio + 
(adev->reg_offset[GC_HWIP][0][mmSCRATCH_REG1_BASE_IDX] + mmSCRATCH_REG3)*4;
+   scratch_reg2 = adev->rmmio + 
(adev->reg_offset[GC_HWIP][0][mmSCRATCH_REG2_BASE_IDX] + mmSCRATCH_REG2)*4;
+   scratch_reg3 = adev->rmmio + 
(adev->reg_offset[GC_HWIP][0][mmSCRATCH_REG3_BASE_IDX] + mmSCRATCH_REG3)*4;
spare_int = adev->rmmio + 
(adev->reg_offset[GC_HWIP][0][mmRLC_SPARE_INT_BASE_IDX] + mmRLC_SPARE_INT)*4;
 
grbm_cntl = adev->reg_offset[GC_HWIP][0][mmGRBM_GFX_CNTL_BASE_IDX] + 
mmGRBM_GFX_CNTL;
grbm_idx = adev->reg_offset[GC_HWIP][0][mmGRBM_GFX_INDEX_BASE_IDX] + 
mmGRBM_GFX_INDEX;
 
-   if (amdgpu_sriov_runtime(adev)) {
-   pr_err("shouldn't call rlcg write register during runtime\n");
-   return;
-   }
-
if (offset == grbm_cntl || offset == grbm_idx) {
if (offset  == grbm_cntl)
writel(v, scratch_reg2);
@@ -771,41 +777,95 @@ static void gfx_v9_0_rlcg_w(struct amdgpu_device *adev, 
u32 offset, u32 v, u32 f
 
writel(v, ((void __iomem *)adev->rmmio) + (offset * 4));
} else {
-   uint32_t i = 0;
-   uint32_t retries = 5;
-
+   /*
+* SCRATCH_REG0 = read/write value
+* SCRATCH_REG1[30:28]  = command
+* SCRATCH_REG1[19:0]   = address in dword
+* SCRATCH_REG1[26:24]  = Error reporting
+*/
writel(v, scratch_reg0);
-   writel(offset | 0x80000000, scratch_reg1);
+   writel(offset | flag, scratch_reg1);
writel(1, spare_int);
-   for (i = 0; i < retries; i++) {
-   u32 tmp;
 
+   for (i = 0; i < retries; i++) {
tmp = readl(scratch_reg1);
-   if (!(tmp & 0x80000000))
+   if (!(tmp & flag))
break;
 
udelay(10);
}
-   if (i >= retries)
-   pr_err("timeout: rlcg program reg:0x%05x failed !\n", 
offset);
+
+   if (i >= retries) {
+   if (amdgpu_sriov_reg_indirect_gc(adev)) {
+   if (tmp & GFX9_RLCG_VFGATE_DISABLED)
+   pr_err("The vfgate is disabled, program 
reg:0x%05x failed!\n", offset);
+   else if (tmp & GFX9_RLCG_WRONG_OPERATION_TYPE)
+   pr_err("Wrong operation type, program 
reg:0x%05x failed!\n", offset);
+   else if (tmp & GFX9_RLCG_NOT_IN_RANGE)
+   pr_err("The 

[PATCH v2 4/5] drm/amdgpu: Initialize Aldebaran RLC function pointers

2021-12-15 Thread Victor Skvortsov
In SRIOV, RLC function pointers must be initialized early as
we rely on the RLCG interface for all GC register access.

v2: Make aldebaran a separate case

Signed-off-by: Victor Skvortsov 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 4 
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 3 +--
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.h | 2 ++
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index 65e1f6cc59dd..d7e1b503cd3b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -842,8 +842,12 @@ static int amdgpu_discovery_set_gc_ip_blocks(struct 
amdgpu_device *adev)
case IP_VERSION(9, 3, 0):
case IP_VERSION(9, 4, 0):
case IP_VERSION(9, 4, 1):
+   amdgpu_device_ip_block_add(adev, &gfx_v9_0_ip_block);
+   break;
case IP_VERSION(9, 4, 2):
amdgpu_device_ip_block_add(adev, &gfx_v9_0_ip_block);
+   if (amdgpu_sriov_vf(adev))
+   gfx_v9_0_set_rlc_funcs(adev);
break;
case IP_VERSION(10, 1, 10):
case IP_VERSION(10, 1, 2):
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index edb3e3b08eed..d252b06efa43 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -816,7 +816,6 @@ static void gfx_v9_0_sriov_wreg(struct amdgpu_device *adev, 
u32 offset,
 static void gfx_v9_0_set_ring_funcs(struct amdgpu_device *adev);
 static void gfx_v9_0_set_irq_funcs(struct amdgpu_device *adev);
 static void gfx_v9_0_set_gds_init(struct amdgpu_device *adev);
-static void gfx_v9_0_set_rlc_funcs(struct amdgpu_device *adev);
 static int gfx_v9_0_get_cu_info(struct amdgpu_device *adev,
struct amdgpu_cu_info *cu_info);
 static uint64_t gfx_v9_0_get_gpu_clock_counter(struct amdgpu_device *adev);
@@ -7066,7 +7065,7 @@ static void gfx_v9_0_set_irq_funcs(struct amdgpu_device 
*adev)
adev->gfx.cp_ecc_error_irq.funcs = _v9_0_cp_ecc_error_irq_funcs;
 }
 
-static void gfx_v9_0_set_rlc_funcs(struct amdgpu_device *adev)
+void gfx_v9_0_set_rlc_funcs(struct amdgpu_device *adev)
 {
switch (adev->ip_versions[GC_HWIP][0]) {
case IP_VERSION(9, 0, 1):
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.h 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.h
index dfe8d4841f58..1817e252354f 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.h
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.h
@@ -29,4 +29,6 @@ extern const struct amdgpu_ip_block_version gfx_v9_0_ip_block;
 void gfx_v9_0_select_se_sh(struct amdgpu_device *adev, u32 se_num, u32 sh_num,
   u32 instance);
 
+void gfx_v9_0_set_rlc_funcs(struct amdgpu_device *adev);
+
 #endif
-- 
2.25.1



[PATCH v2 1/5] drm/amdgpu: Add *_SOC15_IP_NO_KIQ() macro definitions

2021-12-15 Thread Victor Skvortsov
Add helper macros to change register access
from direct to indirect.

Signed-off-by: Victor Skvortsov 
---
 drivers/gpu/drm/amd/amdgpu/soc15_common.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/soc15_common.h 
b/drivers/gpu/drm/amd/amdgpu/soc15_common.h
index 8a9ca87d8663..473767e03676 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc15_common.h
+++ b/drivers/gpu/drm/amd/amdgpu/soc15_common.h
@@ -51,6 +51,8 @@
 
 #define RREG32_SOC15_IP(ip, reg) __RREG32_SOC15_RLC__(reg, 0, ip##_HWIP)
 
+#define RREG32_SOC15_IP_NO_KIQ(ip, reg) __RREG32_SOC15_RLC__(reg, 
AMDGPU_REGS_NO_KIQ, ip##_HWIP)
+
 #define RREG32_SOC15_NO_KIQ(ip, inst, reg) \
__RREG32_SOC15_RLC__(adev->reg_offset[ip##_HWIP][inst][reg##_BASE_IDX] 
+ reg, \
 AMDGPU_REGS_NO_KIQ, ip##_HWIP)
@@ -65,6 +67,9 @@
 #define WREG32_SOC15_IP(ip, reg, value) \
 __WREG32_SOC15_RLC__(reg, value, 0, ip##_HWIP)
 
+#define WREG32_SOC15_IP_NO_KIQ(ip, reg, value) \
+__WREG32_SOC15_RLC__(reg, value, AMDGPU_REGS_NO_KIQ, ip##_HWIP)
+
 #define WREG32_SOC15_NO_KIQ(ip, inst, reg, value) \
__WREG32_SOC15_RLC__(adev->reg_offset[ip##_HWIP][inst][reg##_BASE_IDX] 
+ reg, \
 value, AMDGPU_REGS_NO_KIQ, ip##_HWIP)
-- 
2.25.1



[PATCH v2 2/5] drm/amdgpu: Modify indirect register access for gmc_v9_0 sriov

2021-12-15 Thread Victor Skvortsov
Modify GC register access from MMIO to RLCG if the
indirect flag is set

v2: Replaced ternary operator with if-else for better
readability

Signed-off-by: Victor Skvortsov 
---
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 57 ---
 1 file changed, 43 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index db2ec84f7237..e85a264c971a 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -478,9 +478,18 @@ static int gmc_v9_0_vm_fault_interrupt_state(struct 
amdgpu_device *adev,
hub = >vmhub[j];
for (i = 0; i < 16; i++) {
reg = hub->vm_context0_cntl + i;
-   tmp = RREG32(reg);
+
+   if (j == AMDGPU_GFXHUB_0)
+   tmp = RREG32_SOC15_IP(GC, reg);
+   else
+   tmp = RREG32_SOC15_IP(MMHUB, reg);
+
tmp &= ~bits;
-   WREG32(reg, tmp);
+
+   if (j == AMDGPU_GFXHUB_0)
+   WREG32_SOC15_IP(GC, reg, tmp);
+   else
+   WREG32_SOC15_IP(MMHUB, reg, tmp);
}
}
break;
@@ -489,9 +498,18 @@ static int gmc_v9_0_vm_fault_interrupt_state(struct 
amdgpu_device *adev,
hub = >vmhub[j];
for (i = 0; i < 16; i++) {
reg = hub->vm_context0_cntl + i;
-   tmp = RREG32(reg);
+
+   if (j == AMDGPU_GFXHUB_0)
+   tmp = RREG32_SOC15_IP(GC, reg);
+   else
+   tmp = RREG32_SOC15_IP(MMHUB, reg);
+
tmp |= bits;
-   WREG32(reg, tmp);
+
+   if (j == AMDGPU_GFXHUB_0)
+   WREG32_SOC15_IP(GC, reg, tmp);
+   else
+   WREG32_SOC15_IP(MMHUB, reg, tmp);
}
}
break;
@@ -788,9 +806,12 @@ static void gmc_v9_0_flush_gpu_tlb(struct amdgpu_device 
*adev, uint32_t vmid,
/* TODO: It needs to continue working on debugging with semaphore for 
GFXHUB as well. */
if (use_semaphore) {
for (j = 0; j < adev->usec_timeout; j++) {
-   /* a read return value of 1 means semaphore acuqire */
-   tmp = RREG32_NO_KIQ(hub->vm_inv_eng0_sem +
-   hub->eng_distance * eng);
+   /* a read return value of 1 means semaphore acquire */
+   if (vmhub == AMDGPU_GFXHUB_0)
+   tmp = RREG32_SOC15_IP_NO_KIQ(GC, 
hub->vm_inv_eng0_sem + hub->eng_distance * eng);
+   else
+   tmp = RREG32_SOC15_IP_NO_KIQ(MMHUB, 
hub->vm_inv_eng0_sem + hub->eng_distance * eng);
+
if (tmp & 0x1)
break;
udelay(1);
@@ -801,8 +822,10 @@ static void gmc_v9_0_flush_gpu_tlb(struct amdgpu_device 
*adev, uint32_t vmid,
}
 
do {
-   WREG32_NO_KIQ(hub->vm_inv_eng0_req +
- hub->eng_distance * eng, inv_req);
+   if (vmhub == AMDGPU_GFXHUB_0)
+   WREG32_SOC15_IP_NO_KIQ(GC, hub->vm_inv_eng0_req + 
hub->eng_distance * eng, inv_req);
+   else
+   WREG32_SOC15_IP_NO_KIQ(MMHUB, hub->vm_inv_eng0_req + 
hub->eng_distance * eng, inv_req);
 
/*
 * Issue a dummy read to wait for the ACK register to
@@ -815,8 +838,11 @@ static void gmc_v9_0_flush_gpu_tlb(struct amdgpu_device 
*adev, uint32_t vmid,
  hub->eng_distance * eng);
 
for (j = 0; j < adev->usec_timeout; j++) {
-   tmp = RREG32_NO_KIQ(hub->vm_inv_eng0_ack +
-   hub->eng_distance * eng);
+   if (vmhub == AMDGPU_GFXHUB_0)
+   tmp = RREG32_SOC15_IP_NO_KIQ(GC, 
hub->vm_inv_eng0_ack + hub->eng_distance * eng);
+   else
+   tmp = RREG32_SOC15_IP_NO_KIQ(MMHUB, 
hub->vm_inv_eng0_ack + hub->eng_distance * eng);
+
if (tmp & (1 << vmid))
break;
udelay(1);
@@ -827,13 +853,16 @@ static void gmc_v9_0_flush_gpu_tlb(struct amdgpu_device 
*adev, uint32_t vmid,
} while (inv_req);
 

[PATCH v2 3/5] drm/amdgpu: Modify indirect register access for amdkfd_gfx_v9 sriov

2021-12-15 Thread Victor Skvortsov
Modify GC register access from MMIO to RLCG if the indirect
flag is set

Signed-off-by: Victor Skvortsov 
---
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 27 +--
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
index ddfe7aff919d..1abf662a0e91 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
@@ -166,7 +166,7 @@ int kgd_gfx_v9_init_interrupts(struct amdgpu_device *adev, 
uint32_t pipe_id)
 
lock_srbm(adev, mec, pipe, 0, 0);
 
-   WREG32(SOC15_REG_OFFSET(GC, 0, mmCPC_INT_CNTL),
+   WREG32_SOC15(GC, 0, mmCPC_INT_CNTL,
CP_INT_CNTL_RING0__TIME_STAMP_INT_ENABLE_MASK |
CP_INT_CNTL_RING0__OPCODE_ERROR_INT_ENABLE_MASK);
 
@@ -279,7 +279,7 @@ int kgd_gfx_v9_hqd_load(struct amdgpu_device *adev, void 
*mqd,
   lower_32_bits((uintptr_t)wptr));
WREG32_RLC(SOC15_REG_OFFSET(GC, 0, 
mmCP_HQD_PQ_WPTR_POLL_ADDR_HI),
   upper_32_bits((uintptr_t)wptr));
-   WREG32(SOC15_REG_OFFSET(GC, 0, mmCP_PQ_WPTR_POLL_CNTL1),
+   WREG32_SOC15(GC, 0, mmCP_PQ_WPTR_POLL_CNTL1,
   (uint32_t)get_queue_mask(adev, pipe_id, queue_id));
}
 
@@ -488,13 +488,13 @@ bool kgd_gfx_v9_hqd_is_occupied(struct amdgpu_device 
*adev,
uint32_t low, high;
 
acquire_queue(adev, pipe_id, queue_id);
-   act = RREG32(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_ACTIVE));
+   act = RREG32_SOC15(GC, 0, mmCP_HQD_ACTIVE);
if (act) {
low = lower_32_bits(queue_address >> 8);
high = upper_32_bits(queue_address >> 8);
 
-   if (low == RREG32(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_PQ_BASE)) &&
-  high == RREG32(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_PQ_BASE_HI)))
+   if (low == RREG32_SOC15(GC, 0, mmCP_HQD_PQ_BASE) &&
+  high == RREG32_SOC15(GC, 0, mmCP_HQD_PQ_BASE_HI))
retval = true;
}
release_queue(adev);
@@ -556,7 +556,7 @@ int kgd_gfx_v9_hqd_destroy(struct amdgpu_device *adev, void 
*mqd,
 
end_jiffies = (utimeout * HZ / 1000) + jiffies;
while (true) {
-   temp = RREG32(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_ACTIVE));
+   temp = RREG32_SOC15(GC, 0, mmCP_HQD_ACTIVE);
if (!(temp & CP_HQD_ACTIVE__ACTIVE_MASK))
break;
if (time_after(jiffies, end_jiffies)) {
@@ -645,7 +645,7 @@ int kgd_gfx_v9_wave_control_execute(struct amdgpu_device 
*adev,
mutex_lock(>grbm_idx_mutex);
 
WREG32_SOC15_RLC_SHADOW(GC, 0, mmGRBM_GFX_INDEX, gfx_index_val);
-   WREG32(SOC15_REG_OFFSET(GC, 0, mmSQ_CMD), sq_cmd);
+   WREG32_SOC15(GC, 0, mmSQ_CMD, sq_cmd);
 
data = REG_SET_FIELD(data, GRBM_GFX_INDEX,
INSTANCE_BROADCAST_WRITES, 1);
@@ -722,7 +722,7 @@ static void get_wave_count(struct amdgpu_device *adev, int 
queue_idx,
pipe_idx = queue_idx / adev->gfx.mec.num_queue_per_pipe;
queue_slot = queue_idx % adev->gfx.mec.num_queue_per_pipe;
soc15_grbm_select(adev, 1, pipe_idx, queue_slot, 0);
-   reg_val = RREG32(SOC15_REG_OFFSET(GC, 0, mmSPI_CSQ_WF_ACTIVE_COUNT_0) +
+   reg_val = RREG32_SOC15_IP(GC, SOC15_REG_OFFSET(GC, 0, 
mmSPI_CSQ_WF_ACTIVE_COUNT_0) +
 queue_slot);
*wave_cnt = reg_val & SPI_CSQ_WF_ACTIVE_COUNT_0__COUNT_MASK;
if (*wave_cnt != 0)
@@ -809,8 +809,7 @@ void kgd_gfx_v9_get_cu_occupancy(struct amdgpu_device 
*adev, int pasid,
for (sh_idx = 0; sh_idx < sh_cnt; sh_idx++) {
 
gfx_v9_0_select_se_sh(adev, se_idx, sh_idx, 0xffffffff);
-   queue_map = RREG32(SOC15_REG_OFFSET(GC, 0,
-  mmSPI_CSQ_WF_ACTIVE_STATUS));
+   queue_map = RREG32_SOC15(GC, 0, 
mmSPI_CSQ_WF_ACTIVE_STATUS);
 
/*
 * Assumption: queue map encodes following schema: four
@@ -860,17 +859,17 @@ void kgd_gfx_v9_program_trap_handler_settings(struct 
amdgpu_device *adev,
/*
 * Program TBA registers
 */
-   WREG32(SOC15_REG_OFFSET(GC, 0, mmSQ_SHADER_TBA_LO),
+   WREG32_SOC15(GC, 0, mmSQ_SHADER_TBA_LO,
 lower_32_bits(tba_addr >> 8));
-   WREG32(SOC15_REG_OFFSET(GC, 0, mmSQ_SHADER_TBA_HI),
+   WREG32_SOC15(GC, 0, mmSQ_SHADER_TBA_HI,
 upper_32_bits(tba_addr >> 8));
 
/*
 * Program TMA registers
 */
-   WREG32(SOC15_REG_OFFSET(GC, 0, mmSQ_SHADER_TMA_LO),
+   WREG32_SOC15(GC, 0, mmSQ_SHADER_TMA_LO,
lower_32_bits(tma_addr >> 8));
-   WREG32(SOC15_REG_OFFSET(GC, 0, mmSQ_SHADER_TMA_HI),
+   WREG32_SOC15(GC, 0, mmSQ_SHADER_TMA_HI,
 

[PATCH v2 0/5] *** GFX9 RLCG Interface modifications ***

2021-12-15 Thread Victor Skvortsov
This patchset introduces an expanded sriov RLCG interface.
This interface will be used by Aldebaran in sriov mode
for indirect GC register access during full access.

v2: Added descriptions to scratch registers, and 
improved code readability.

Victor Skvortsov (5):
  drm/amdgpu: Add *_SOC15_IP_NO_KIQ() macro definitions
  drm/amdgpu: Modify indirect register access for gmc_v9_0 sriov
  drm/amdgpu: Modify indirect register access for amdkfd_gfx_v9 sriov
  drm/amdgpu: Initialize Aldebaran RLC function pointers
  drm/amdgpu: Modify indirect register access for gfx9 sriov

 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c |  27 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c |   4 +
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 120 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.h |   2 +
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c |  57 +++--
 drivers/gpu/drm/amd/amdgpu/soc15_common.h |   5 +
 6 files changed, 157 insertions(+), 58 deletions(-)

-- 
2.25.1



Re: [PATCH v4 2/6] drm: improve drm_buddy_alloc function

2021-12-15 Thread Arunpravin



On 14/12/21 12:29 am, Matthew Auld wrote:
> On 09/12/2021 15:47, Paneer Selvam, Arunpravin wrote:
>> [AMD Official Use Only]
>>
>> Hi Matthew,
>>
>> Ping on this?
> 
> No new comments from me :) I guess just a question of what we should do 
> with the selftests, and then ofc at some point being able to throw this 
> at CI, or at least test locally, once the series builds.
> 
Sure :) I think we should rewrite the i915 buddy selftests since we now
have a single function for range and non-range requirements. Shall I
rewrite the i915 buddy selftests and move them to the drm selftests folder?
And, for the time being, shall I remove i915_buddy_mock_selftest() from the
i915_mock_selftests.h list to avoid build errors?
>>
>> Regards,
>> Arun
>> -Original Message-
>> From: amd-gfx  On Behalf Of Arunpravin
>> Sent: Wednesday, December 1, 2021 10:10 PM
>> To: dri-de...@lists.freedesktop.org; intel-...@lists.freedesktop.org; 
>> amd-gfx@lists.freedesktop.org
>> Cc: dan...@ffwll.ch; Paneer Selvam, Arunpravin 
>> ; jani.nik...@linux.intel.com; 
>> matthew.a...@intel.com; tzimmerm...@suse.de; Deucher, Alexander 
>> ; Koenig, Christian 
>> Subject: [PATCH v4 2/6] drm: improve drm_buddy_alloc function
>>
>> - Make drm_buddy_alloc a single function to handle
>>range allocation and non-range allocation demands
>>
>> - Implemented a new function alloc_range() which allocates
>>the requested power-of-two block comply with range limitations
>>
>> - Moved order computation and memory alignment logic from
>>i915 driver to drm buddy
>>
>> v2:
>>merged below changes to keep the build unbroken
>> - drm_buddy_alloc_range() becomes obsolete and may be removed
>> - enable ttm range allocation (fpfn / lpfn) support in i915 driver
>> - apply enhanced drm_buddy_alloc() function to i915 driver
>>
>> v3(Matthew Auld):
>>- Fix alignment issues and remove unnecessary list_empty check
>>- add more validation checks for input arguments
>>- make alloc_range() block allocations as bottom-up
>>- optimize order computation logic
>>- replace uint64_t with u64, which is preferred in the kernel
>>
>> v4(Matthew Auld):
>>- keep drm_buddy_alloc_range() function implementation for generic
>>  actual range allocations
>>- keep alloc_range() implementation for end bias allocations
>>
>> Signed-off-by: Arunpravin 
>> ---
>>   drivers/gpu/drm/drm_buddy.c   | 316 +-
>>   drivers/gpu/drm/i915/i915_ttm_buddy_manager.c |  67 ++--
>>   drivers/gpu/drm/i915/i915_ttm_buddy_manager.h |   2 +
>>   include/drm/drm_buddy.h   |  22 +-
>>   4 files changed, 285 insertions(+), 122 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c index 
>> 9340a4b61c5a..7f47632821f4 100644
>> --- a/drivers/gpu/drm/drm_buddy.c
>> +++ b/drivers/gpu/drm/drm_buddy.c
>> @@ -280,23 +280,97 @@ void drm_buddy_free_list(struct drm_buddy_mm *mm, 
>> struct list_head *objects)  }  EXPORT_SYMBOL(drm_buddy_free_list);
>>   
>> -/**
>> - * drm_buddy_alloc - allocate power-of-two blocks
>> - *
>> - * @mm: DRM buddy manager to allocate from
>> - * @order: size of the allocation
>> - *
>> - * The order value here translates to:
>> - *
>> - * 0 = 2^0 * mm->chunk_size
>> - * 1 = 2^1 * mm->chunk_size
>> - * 2 = 2^2 * mm->chunk_size
>> - *
>> - * Returns:
>> - * allocated ptr to the _buddy_block on success
>> - */
>> -struct drm_buddy_block *
>> -drm_buddy_alloc(struct drm_buddy_mm *mm, unsigned int order)
>> +static inline bool overlaps(u64 s1, u64 e1, u64 s2, u64 e2) {
>> +return s1 <= e2 && e1 >= s2;
>> +}
>> +
>> +static inline bool contains(u64 s1, u64 e1, u64 s2, u64 e2) {
>> +return s1 <= s2 && e1 >= e2;
>> +}
>> +
>> +static struct drm_buddy_block *
>> +alloc_range_bias(struct drm_buddy_mm *mm,
>> + u64 start, u64 end,
>> + unsigned int order)
>> +{
>> +struct drm_buddy_block *block;
>> +struct drm_buddy_block *buddy;
>> +LIST_HEAD(dfs);
>> +int err;
>> +int i;
>> +
>> +end = end - 1;
>> +
>> +for (i = 0; i < mm->n_roots; ++i)
>> +list_add_tail(>roots[i]->tmp_link, );
>> +
>> +do {
>> +u64 block_start;
>> +u64 block_end;
>> +
>> +block = list_first_entry_or_null(,
>> + struct drm_buddy_block,
>> + tmp_link);
>> +if (!block)
>> +break;
>> +
>> +list_del(>tmp_link);
>> +
>> +if (drm_buddy_block_order(block) < order)
>> +continue;
>> +
>> +block_start = drm_buddy_block_offset(block);
>> +block_end = block_start + drm_buddy_block_size(mm, block) - 1;
>> +
>> +if (!overlaps(start, end, block_start, block_end))
>> +continue;
>> +
>> +if (drm_buddy_block_is_allocated(block))
>> +continue;
>> +
>> +   

Re: [PATCH] drm/amdgpu: fixup bad vram size on gmc v8

2021-12-15 Thread Alex Deucher
On Wed, Dec 15, 2021 at 10:31 AM Zongmin Zhou  wrote:
>
> Some boards(like RX550) seem to have garbage in the upper
> 16 bits of the vram size register.  Check for
> this and clamp the size properly.  Fixes
> boards reporting bogus amounts of vram.
>
> After adding this patch, the maximum GPU VRAM size is 64GB,
> otherwise only 64GB vram size will be used.

Can you provide some examples of problematic boards and possibly a
vbios image from the problematic board?  What values are you seeing?
It would be nice to see what the boards are reporting and whether the
lower 16 bits are actually correct or if it is some other issue.  This
register is undefined until the asic has been initialized.  The vbios
programs it as part of its asic init sequence (either via vesa/gop or
the OS driver).

Alex


>
> Signed-off-by: Zongmin Zhou
> ---
>  drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c | 13 ++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
> index 492ebed2915b..63b890f1e8af 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
> @@ -515,10 +515,10 @@ static void gmc_v8_0_mc_program(struct amdgpu_device 
> *adev)
>  static int gmc_v8_0_mc_init(struct amdgpu_device *adev)
>  {
> int r;
> +   u32 tmp;
>
> adev->gmc.vram_width = amdgpu_atombios_get_vram_width(adev);
> if (!adev->gmc.vram_width) {
> -   u32 tmp;
> int chansize, numchan;
>
> /* Get VRAM informations */
> @@ -562,8 +562,15 @@ static int gmc_v8_0_mc_init(struct amdgpu_device *adev)
> adev->gmc.vram_width = numchan * chansize;
> }
> /* size in MB on si */
> -   adev->gmc.mc_vram_size = RREG32(mmCONFIG_MEMSIZE) * 1024ULL * 1024ULL;
> -   adev->gmc.real_vram_size = RREG32(mmCONFIG_MEMSIZE) * 1024ULL * 
> 1024ULL;
> +   tmp = RREG32(mmCONFIG_MEMSIZE);
> +   /* some boards may have garbage in the upper 16 bits */
> +   if (tmp & 0xffff0000) {
> +   DRM_INFO("Probable bad vram size: 0x%08x\n", tmp);
> +   if (tmp & 0xffff)
> +   tmp &= 0xffff;
> +   }
> +   adev->gmc.mc_vram_size = tmp * 1024ULL * 1024ULL;
> +   adev->gmc.real_vram_size = adev->gmc.mc_vram_size;
>
> if (!(adev->flags & AMD_IS_APU)) {
> r = amdgpu_device_resize_fb_bar(adev);
> --
> 2.25.1
>
>


Re: [PATCH 5/5] drm/amdgpu: Modify indirect register access for gfx9 sriov

2021-12-15 Thread Nieto, David M
[Public]

Gotcha,

Can you add, prior to the implementation, a small description of how the 
interface and the different scratch registers work? It may be easier to review 
with a clear idea of the operation. I know the earlier implementation did not 
include one, but now that we are modifying it, it would be good to document it.



From: Skvortsov, Victor 
Sent: Wednesday, December 15, 2021 11:18 AM
To: Nieto, David M ; amd-gfx@lists.freedesktop.org 
; Deng, Emily ; Liu, Monk 
; Ming, Davis ; Liu, Shaoyun 
; Zhou, Peng Ju ; Chen, JingWen 
; Chen, Horace 
Subject: RE: [PATCH 5/5] drm/amdgpu: Modify indirect register access for gfx9 
sriov


[AMD Official Use Only]



This was a bug in the original definition, but functionally it makes no 
difference (in both cases the macros resolve to the same value).



From: Nieto, David M 
Sent: Wednesday, December 15, 2021 2:16 PM
To: Skvortsov, Victor ; 
amd-gfx@lists.freedesktop.org; Deng, Emily ; Liu, Monk 
; Ming, Davis ; Liu, Shaoyun 
; Zhou, Peng Ju ; Chen, JingWen 
; Chen, Horace 
Subject: Re: [PATCH 5/5] drm/amdgpu: Modify indirect register access for gfx9 
sriov



[AMD Official Use Only]



 scratch_reg0 = adev->rmmio + 
(adev->reg_offset[GC_HWIP][0][mmSCRATCH_REG0_BASE_IDX] + mmSCRATCH_REG0)*4;
 scratch_reg1 = adev->rmmio + 
(adev->reg_offset[GC_HWIP][0][mmSCRATCH_REG1_BASE_IDX] + mmSCRATCH_REG1)*4;
-   scratch_reg2 = adev->rmmio + 
(adev->reg_offset[GC_HWIP][0][mmSCRATCH_REG1_BASE_IDX] + mmSCRATCH_REG2)*4;
-   scratch_reg3 = adev->rmmio + 
(adev->reg_offset[GC_HWIP][0][mmSCRATCH_REG1_BASE_IDX] + mmSCRATCH_REG3)*4;
+   scratch_reg2 = adev->rmmio + 
(adev->reg_offset[GC_HWIP][0][mmSCRATCH_REG2_BASE_IDX] + mmSCRATCH_REG2)*4;
+   scratch_reg3 = adev->rmmio + 
(adev->reg_offset[GC_HWIP][0][mmSCRATCH_REG3_BASE_IDX] + mmSCRATCH_REG3)*4;
 spare_int = adev->rmmio + 
(adev->reg_offset[GC_HWIP][0][mmRLC_SPARE_INT_BASE_IDX] + mmRLC_SPARE_INT)*4;



The definition of scratch_reg2 and scratch_reg3 has changed here; will this be 
backwards compatible? Was it a bug in the definition?



From: Skvortsov, Victor 
mailto:victor.skvort...@amd.com>>
Sent: Wednesday, December 15, 2021 10:55 AM
To: amd-gfx@lists.freedesktop.org 
mailto:amd-gfx@lists.freedesktop.org>>; Deng, 
Emily mailto:emily.d...@amd.com>>; Liu, Monk 
mailto:monk@amd.com>>; Ming, Davis 
mailto:davis.m...@amd.com>>; Liu, Shaoyun 
mailto:shaoyun@amd.com>>; Zhou, Peng Ju 
mailto:pengju.z...@amd.com>>; Chen, JingWen 
mailto:jingwen.ch...@amd.com>>; Chen, Horace 
mailto:horace.c...@amd.com>>; Nieto, David M 
mailto:david.ni...@amd.com>>
Cc: Skvortsov, Victor 
mailto:victor.skvort...@amd.com>>
Subject: [PATCH 5/5] drm/amdgpu: Modify indirect register access for gfx9 sriov



Expand RLCG interface for new GC read & write commands.
New interface will only be used if the PF enables the flag in pf2vf msg.

Signed-off-by: Victor Skvortsov 
mailto:victor.skvort...@amd.com>>
---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 111 +++---
 1 file changed, 83 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index d252b06efa43..bce6ab52cae0 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -63,6 +63,13 @@
 #define mmGCEA_PROBE_MAP   0x070c
 #define mmGCEA_PROBE_MAP_BASE_IDX   0

+#define GFX9_RLCG_GC_WRITE_OLD (0x8 << 28)
+#define GFX9_RLCG_GC_WRITE (0x0 << 28)
+#define GFX9_RLCG_GC_READ  (0x1 << 28)
+#define GFX9_RLCG_VFGATE_DISABLED  0x4000000
+#define GFX9_RLCG_WRONG_OPERATION_TYPE 0x2000000
+#define GFX9_RLCG_NOT_IN_RANGE 0x1000000
+
 MODULE_FIRMWARE("amdgpu/vega10_ce.bin");
 MODULE_FIRMWARE("amdgpu/vega10_pfp.bin");
 MODULE_FIRMWARE("amdgpu/vega10_me.bin");
@@ -739,7 +746,7 @@ static const u32 GFX_RLC_SRM_INDEX_CNTL_DATA_OFFSETS[] =
 mmRLC_SRM_INDEX_CNTL_DATA_7 - mmRLC_SRM_INDEX_CNTL_DATA_0,
 };

-static void gfx_v9_0_rlcg_w(struct amdgpu_device *adev, u32 offset, u32 v, u32 
flag)
+static u32 gfx_v9_0_rlcg_rw(struct amdgpu_device *adev, u32 offset, u32 v, 
uint32_t flag)
 {
 static void *scratch_reg0;
 static void *scratch_reg1;
@@ -748,21 +755,20 @@ static void gfx_v9_0_rlcg_w(struct amdgpu_device *adev, 
u32 offset, u32 v, u32 f
 static void *spare_int;
 static uint32_t grbm_cntl;
 static uint32_t grbm_idx;
+   uint32_t i = 0;
+   uint32_t retries = 5;
+   u32 ret = 0;
+   u32 tmp;

 scratch_reg0 = adev->rmmio + 
(adev->reg_offset[GC_HWIP][0][mmSCRATCH_REG0_BASE_IDX] + mmSCRATCH_REG0)*4;
 scratch_reg1 = adev->rmmio + 
(adev->reg_offset[GC_HWIP][0][mmSCRATCH_REG1_BASE_IDX] + mmSCRATCH_REG1)*4;
-   scratch_reg2 = adev->rmmio + 

Re: [PATCH 3/5] drm/amdgpu: Modify indirect register access for amdkfd_gfx_v9 sriov

2021-12-15 Thread Nieto, David M
[AMD Official Use Only]

Reviewed-by: David Nieto 

From: Skvortsov, Victor 
Sent: Wednesday, December 15, 2021 10:55 AM
To: amd-gfx@lists.freedesktop.org ; Deng, Emily 
; Liu, Monk ; Ming, Davis 
; Liu, Shaoyun ; Zhou, Peng Ju 
; Chen, JingWen ; Chen, Horace 
; Nieto, David M 
Cc: Skvortsov, Victor 
Subject: [PATCH 3/5] drm/amdgpu: Modify indirect register access for 
amdkfd_gfx_v9 sriov

Modify GC register access from MMIO to RLCG if the indirect
flag is set

Signed-off-by: Victor Skvortsov 
---
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 27 +--
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
index ddfe7aff919d..1abf662a0e91 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
@@ -166,7 +166,7 @@ int kgd_gfx_v9_init_interrupts(struct amdgpu_device *adev, 
uint32_t pipe_id)

 lock_srbm(adev, mec, pipe, 0, 0);

-   WREG32(SOC15_REG_OFFSET(GC, 0, mmCPC_INT_CNTL),
+   WREG32_SOC15(GC, 0, mmCPC_INT_CNTL,
 CP_INT_CNTL_RING0__TIME_STAMP_INT_ENABLE_MASK |
 CP_INT_CNTL_RING0__OPCODE_ERROR_INT_ENABLE_MASK);

@@ -279,7 +279,7 @@ int kgd_gfx_v9_hqd_load(struct amdgpu_device *adev, void 
*mqd,
lower_32_bits((uintptr_t)wptr));
 WREG32_RLC(SOC15_REG_OFFSET(GC, 0, 
mmCP_HQD_PQ_WPTR_POLL_ADDR_HI),
upper_32_bits((uintptr_t)wptr));
-   WREG32(SOC15_REG_OFFSET(GC, 0, mmCP_PQ_WPTR_POLL_CNTL1),
+   WREG32_SOC15(GC, 0, mmCP_PQ_WPTR_POLL_CNTL1,
(uint32_t)get_queue_mask(adev, pipe_id, queue_id));
 }

@@ -488,13 +488,13 @@ bool kgd_gfx_v9_hqd_is_occupied(struct amdgpu_device 
*adev,
 uint32_t low, high;

 acquire_queue(adev, pipe_id, queue_id);
-   act = RREG32(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_ACTIVE));
+   act = RREG32_SOC15(GC, 0, mmCP_HQD_ACTIVE);
 if (act) {
 low = lower_32_bits(queue_address >> 8);
 high = upper_32_bits(queue_address >> 8);

-   if (low == RREG32(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_PQ_BASE)) &&
-  high == RREG32(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_PQ_BASE_HI)))
+   if (low == RREG32_SOC15(GC, 0, mmCP_HQD_PQ_BASE) &&
+  high == RREG32_SOC15(GC, 0, mmCP_HQD_PQ_BASE_HI))
 retval = true;
 }
 release_queue(adev);
@@ -556,7 +556,7 @@ int kgd_gfx_v9_hqd_destroy(struct amdgpu_device *adev, void 
*mqd,

 end_jiffies = (utimeout * HZ / 1000) + jiffies;
 while (true) {
-   temp = RREG32(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_ACTIVE));
+   temp = RREG32_SOC15(GC, 0, mmCP_HQD_ACTIVE);
 if (!(temp & CP_HQD_ACTIVE__ACTIVE_MASK))
 break;
 if (time_after(jiffies, end_jiffies)) {
@@ -645,7 +645,7 @@ int kgd_gfx_v9_wave_control_execute(struct amdgpu_device 
*adev,
 mutex_lock(>grbm_idx_mutex);

 WREG32_SOC15_RLC_SHADOW(GC, 0, mmGRBM_GFX_INDEX, gfx_index_val);
-   WREG32(SOC15_REG_OFFSET(GC, 0, mmSQ_CMD), sq_cmd);
+   WREG32_SOC15(GC, 0, mmSQ_CMD, sq_cmd);

 data = REG_SET_FIELD(data, GRBM_GFX_INDEX,
 INSTANCE_BROADCAST_WRITES, 1);
@@ -722,7 +722,7 @@ static void get_wave_count(struct amdgpu_device *adev, int 
queue_idx,
 pipe_idx = queue_idx / adev->gfx.mec.num_queue_per_pipe;
 queue_slot = queue_idx % adev->gfx.mec.num_queue_per_pipe;
 soc15_grbm_select(adev, 1, pipe_idx, queue_slot, 0);
-   reg_val = RREG32(SOC15_REG_OFFSET(GC, 0, mmSPI_CSQ_WF_ACTIVE_COUNT_0) +
+   reg_val = RREG32_SOC15_IP(GC, SOC15_REG_OFFSET(GC, 0, 
mmSPI_CSQ_WF_ACTIVE_COUNT_0) +
  queue_slot);
 *wave_cnt = reg_val & SPI_CSQ_WF_ACTIVE_COUNT_0__COUNT_MASK;
 if (*wave_cnt != 0)
@@ -809,8 +809,7 @@ void kgd_gfx_v9_get_cu_occupancy(struct amdgpu_device 
*adev, int pasid,
 for (sh_idx = 0; sh_idx < sh_cnt; sh_idx++) {

gfx_v9_0_select_se_sh(adev, se_idx, sh_idx, 0xffffffff);
-   queue_map = RREG32(SOC15_REG_OFFSET(GC, 0,
-  mmSPI_CSQ_WF_ACTIVE_STATUS));
+   queue_map = RREG32_SOC15(GC, 0, 
mmSPI_CSQ_WF_ACTIVE_STATUS);

 /*
  * Assumption: queue map encodes following schema: four
@@ -860,17 +859,17 @@ void kgd_gfx_v9_program_trap_handler_settings(struct 
amdgpu_device *adev,
 /*
  * Program TBA registers
  */
-   WREG32(SOC15_REG_OFFSET(GC, 0, mmSQ_SHADER_TBA_LO),
+   WREG32_SOC15(GC, 0, mmSQ_SHADER_TBA_LO,
 lower_32_bits(tba_addr >> 8));
-   WREG32(SOC15_REG_OFFSET(GC, 0, 

Re: [PATCH 1/5] drm/amdgpu: Add *_SOC15_IP_NO_KIQ() macro definitions

2021-12-15 Thread Nieto, David M
[AMD Official Use Only]

Reviewed-by: David Nieto 

From: Skvortsov, Victor 
Sent: Wednesday, December 15, 2021 10:55 AM
To: amd-gfx@lists.freedesktop.org ; Deng, Emily 
; Liu, Monk ; Ming, Davis 
; Liu, Shaoyun ; Zhou, Peng Ju 
; Chen, JingWen ; Chen, Horace 
; Nieto, David M 
Cc: Skvortsov, Victor 
Subject: [PATCH 1/5] drm/amdgpu: Add *_SOC15_IP_NO_KIQ() macro definitions

Add helper macros to change register access
from direct to indirect.

Signed-off-by: Victor Skvortsov 
---
 drivers/gpu/drm/amd/amdgpu/soc15_common.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/soc15_common.h 
b/drivers/gpu/drm/amd/amdgpu/soc15_common.h
index 8a9ca87d8663..473767e03676 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc15_common.h
+++ b/drivers/gpu/drm/amd/amdgpu/soc15_common.h
@@ -51,6 +51,8 @@

 #define RREG32_SOC15_IP(ip, reg) __RREG32_SOC15_RLC__(reg, 0, ip##_HWIP)

+#define RREG32_SOC15_IP_NO_KIQ(ip, reg) __RREG32_SOC15_RLC__(reg, 
AMDGPU_REGS_NO_KIQ, ip##_HWIP)
+
 #define RREG32_SOC15_NO_KIQ(ip, inst, reg) \
 __RREG32_SOC15_RLC__(adev->reg_offset[ip##_HWIP][inst][reg##_BASE_IDX] 
+ reg, \
  AMDGPU_REGS_NO_KIQ, ip##_HWIP)
@@ -65,6 +67,9 @@
 #define WREG32_SOC15_IP(ip, reg, value) \
  __WREG32_SOC15_RLC__(reg, value, 0, ip##_HWIP)

+#define WREG32_SOC15_IP_NO_KIQ(ip, reg, value) \
+__WREG32_SOC15_RLC__(reg, value, AMDGPU_REGS_NO_KIQ, ip##_HWIP)
+
 #define WREG32_SOC15_NO_KIQ(ip, inst, reg, value) \
 __WREG32_SOC15_RLC__(adev->reg_offset[ip##_HWIP][inst][reg##_BASE_IDX] 
+ reg, \
  value, AMDGPU_REGS_NO_KIQ, ip##_HWIP)
--
2.25.1



Re: [PATCH 2/5] drm/amdgpu: Modify indirect register access for gmc_v9_0 sriov

2021-12-15 Thread Nieto, David M
[AMD Official Use Only]

I don't know what others may think, but this code, while correct:

-   WREG32_NO_KIQ(hub->vm_inv_eng0_req +
- hub->eng_distance * eng, inv_req);
+   (vmhub == AMDGPU_GFXHUB_0) ?
+   WREG32_SOC15_IP_NO_KIQ(GC, hub->vm_inv_eng0_req + 
hub->eng_distance * eng, inv_req) :
+   WREG32_SOC15_IP_NO_KIQ(MMHUB, hub->vm_inv_eng0_req + 
hub->eng_distance * eng, inv_req);

is a bit difficult to read. I wouldn't mind if the results of those function 
calls were stored in a variable, but here you are using them as an if/else or a 
switch statement. It is better to do it like this:

switch(vmhub) {
case AMDGPU_GFXHUB_0):
//yadayada

or

if (vmhub == AMDGPU_GFXHUB_0)
 //yadayada
else {
 // yada du
}

From: Skvortsov, Victor 
Sent: Wednesday, December 15, 2021 10:55 AM
To: amd-gfx@lists.freedesktop.org ; Deng, Emily 
; Liu, Monk ; Ming, Davis 
; Liu, Shaoyun ; Zhou, Peng Ju 
; Chen, JingWen ; Chen, Horace 
; Nieto, David M 
Cc: Skvortsov, Victor 
Subject: [PATCH 2/5] drm/amdgpu: Modify indirect register access for gmc_v9_0 
sriov

Modify GC register access from MMIO to RLCG if the
indirect flag is set

Signed-off-by: Victor Skvortsov 
---
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 45 +++
 1 file changed, 32 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index db2ec84f7237..345ce7fc6463 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -478,9 +478,16 @@ static int gmc_v9_0_vm_fault_interrupt_state(struct 
amdgpu_device *adev,
 hub = >vmhub[j];
 for (i = 0; i < 16; i++) {
 reg = hub->vm_context0_cntl + i;
-   tmp = RREG32(reg);
+
+   tmp = (j == AMDGPU_GFXHUB_0) ?
+   RREG32_SOC15_IP(GC, reg) :
+   RREG32_SOC15_IP(MMHUB, reg);
+
 tmp &= ~bits;
-   WREG32(reg, tmp);
+
+   (j == AMDGPU_GFXHUB_0) ?
+   WREG32_SOC15_IP(GC, reg, tmp) :
+   WREG32_SOC15_IP(MMHUB, reg, tmp);
 }
 }
 break;
@@ -489,9 +496,16 @@ static int gmc_v9_0_vm_fault_interrupt_state(struct 
amdgpu_device *adev,
 hub = >vmhub[j];
 for (i = 0; i < 16; i++) {
 reg = hub->vm_context0_cntl + i;
-   tmp = RREG32(reg);
+
+   tmp = (j == AMDGPU_GFXHUB_0) ?
+   RREG32_SOC15_IP(GC, reg) :
+   RREG32_SOC15_IP(MMHUB, reg);
+
 tmp |= bits;
-   WREG32(reg, tmp);
+
+   (j == AMDGPU_GFXHUB_0) ?
+   WREG32_SOC15_IP(GC, reg, tmp) :
+   WREG32_SOC15_IP(MMHUB, reg, tmp);
 }
 }
 break;
@@ -789,8 +803,9 @@ static void gmc_v9_0_flush_gpu_tlb(struct amdgpu_device 
*adev, uint32_t vmid,
 if (use_semaphore) {
 for (j = 0; j < adev->usec_timeout; j++) {
 /* a read return value of 1 means semaphore acuqire */
-   tmp = RREG32_NO_KIQ(hub->vm_inv_eng0_sem +
-   hub->eng_distance * eng);
+   tmp = (vmhub == AMDGPU_GFXHUB_0) ?
+   RREG32_SOC15_IP_NO_KIQ(GC, 
hub->vm_inv_eng0_sem + hub->eng_distance * eng) :
+   RREG32_SOC15_IP_NO_KIQ(MMHUB, 
hub->vm_inv_eng0_sem + hub->eng_distance * eng);
 if (tmp & 0x1)
 break;
 udelay(1);
@@ -801,8 +816,9 @@ static void gmc_v9_0_flush_gpu_tlb(struct amdgpu_device 
*adev, uint32_t vmid,
 }

 do {
-   WREG32_NO_KIQ(hub->vm_inv_eng0_req +
- hub->eng_distance * eng, inv_req);
+   (vmhub == AMDGPU_GFXHUB_0) ?
+   WREG32_SOC15_IP_NO_KIQ(GC, hub->vm_inv_eng0_req + 
hub->eng_distance * eng, inv_req) :
+   WREG32_SOC15_IP_NO_KIQ(MMHUB, hub->vm_inv_eng0_req + 
hub->eng_distance * eng, inv_req);

 /*
  * Issue a dummy read to wait for the ACK register to
@@ -815,8 +831,9 @@ static void gmc_v9_0_flush_gpu_tlb(struct amdgpu_device 
*adev, uint32_t vmid,
   hub->eng_distance * eng);

 

RE: [PATCH 5/5] drm/amdgpu: Modify indirect register access for gfx9 sriov

2021-12-15 Thread Skvortsov, Victor
[AMD Official Use Only]

This was a bug in the original definition, but functionally it makes no 
difference (in both cases the macros resolve to the same value).

From: Nieto, David M 
Sent: Wednesday, December 15, 2021 2:16 PM
To: Skvortsov, Victor ; 
amd-gfx@lists.freedesktop.org; Deng, Emily ; Liu, Monk 
; Ming, Davis ; Liu, Shaoyun 
; Zhou, Peng Ju ; Chen, JingWen 
; Chen, Horace 
Subject: Re: [PATCH 5/5] drm/amdgpu: Modify indirect register access for gfx9 
sriov


[AMD Official Use Only]

 scratch_reg0 = adev->rmmio + 
(adev->reg_offset[GC_HWIP][0][mmSCRATCH_REG0_BASE_IDX] + mmSCRATCH_REG0)*4;
 scratch_reg1 = adev->rmmio + 
(adev->reg_offset[GC_HWIP][0][mmSCRATCH_REG1_BASE_IDX] + mmSCRATCH_REG1)*4;
-   scratch_reg2 = adev->rmmio + 
(adev->reg_offset[GC_HWIP][0][mmSCRATCH_REG1_BASE_IDX] + mmSCRATCH_REG2)*4;
-   scratch_reg3 = adev->rmmio + 
(adev->reg_offset[GC_HWIP][0][mmSCRATCH_REG1_BASE_IDX] + mmSCRATCH_REG3)*4;
+   scratch_reg2 = adev->rmmio + 
(adev->reg_offset[GC_HWIP][0][mmSCRATCH_REG2_BASE_IDX] + mmSCRATCH_REG2)*4;
+   scratch_reg3 = adev->rmmio + 
(adev->reg_offset[GC_HWIP][0][mmSCRATCH_REG3_BASE_IDX] + mmSCRATCH_REG3)*4;
 spare_int = adev->rmmio + 
(adev->reg_offset[GC_HWIP][0][mmRLC_SPARE_INT_BASE_IDX] + mmRLC_SPARE_INT)*4;

The definition of scratch_reg2 and scratch_reg3 has changed here. Will this be 
backwards compatible? Was it a bug in the definition?

From: Skvortsov, Victor 
mailto:victor.skvort...@amd.com>>
Sent: Wednesday, December 15, 2021 10:55 AM
To: amd-gfx@lists.freedesktop.org 
mailto:amd-gfx@lists.freedesktop.org>>; Deng, 
Emily mailto:emily.d...@amd.com>>; Liu, Monk 
mailto:monk@amd.com>>; Ming, Davis 
mailto:davis.m...@amd.com>>; Liu, Shaoyun 
mailto:shaoyun@amd.com>>; Zhou, Peng Ju 
mailto:pengju.z...@amd.com>>; Chen, JingWen 
mailto:jingwen.ch...@amd.com>>; Chen, Horace 
mailto:horace.c...@amd.com>>; Nieto, David M 
mailto:david.ni...@amd.com>>
Cc: Skvortsov, Victor 
mailto:victor.skvort...@amd.com>>
Subject: [PATCH 5/5] drm/amdgpu: Modify indirect register access for gfx9 sriov

Expand RLCG interface for new GC read & write commands.
New interface will only be used if the PF enables the flag in pf2vf msg.

Signed-off-by: Victor Skvortsov 
mailto:victor.skvort...@amd.com>>
---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 111 +++---
 1 file changed, 83 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index d252b06efa43..bce6ab52cae0 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -63,6 +63,13 @@
 #define mmGCEA_PROBE_MAP0x070c
 #define mmGCEA_PROBE_MAP_BASE_IDX   0

+#define GFX9_RLCG_GC_WRITE_OLD (0x8 << 28)
+#define GFX9_RLCG_GC_WRITE (0x0 << 28)
+#define GFX9_RLCG_GC_READ  (0x1 << 28)
+#define GFX9_RLCG_VFGATE_DISABLED  0x400
+#define GFX9_RLCG_WRONG_OPERATION_TYPE 0x200
+#define GFX9_RLCG_NOT_IN_RANGE 0x100
+
 MODULE_FIRMWARE("amdgpu/vega10_ce.bin");
 MODULE_FIRMWARE("amdgpu/vega10_pfp.bin");
 MODULE_FIRMWARE("amdgpu/vega10_me.bin");
@@ -739,7 +746,7 @@ static const u32 GFX_RLC_SRM_INDEX_CNTL_DATA_OFFSETS[] =
 mmRLC_SRM_INDEX_CNTL_DATA_7 - mmRLC_SRM_INDEX_CNTL_DATA_0,
 };

-static void gfx_v9_0_rlcg_w(struct amdgpu_device *adev, u32 offset, u32 v, u32 
flag)
+static u32 gfx_v9_0_rlcg_rw(struct amdgpu_device *adev, u32 offset, u32 v, 
uint32_t flag)
 {
 static void *scratch_reg0;
 static void *scratch_reg1;
@@ -748,21 +755,20 @@ static void gfx_v9_0_rlcg_w(struct amdgpu_device *adev, 
u32 offset, u32 v, u32 f
 static void *spare_int;
 static uint32_t grbm_cntl;
 static uint32_t grbm_idx;
+   uint32_t i = 0;
+   uint32_t retries = 5;
+   u32 ret = 0;
+   u32 tmp;

 scratch_reg0 = adev->rmmio + 
(adev->reg_offset[GC_HWIP][0][mmSCRATCH_REG0_BASE_IDX] + mmSCRATCH_REG0)*4;
 scratch_reg1 = adev->rmmio + 
(adev->reg_offset[GC_HWIP][0][mmSCRATCH_REG1_BASE_IDX] + mmSCRATCH_REG1)*4;
-   scratch_reg2 = adev->rmmio + 
(adev->reg_offset[GC_HWIP][0][mmSCRATCH_REG1_BASE_IDX] + mmSCRATCH_REG2)*4;
-   scratch_reg3 = adev->rmmio + 
(adev->reg_offset[GC_HWIP][0][mmSCRATCH_REG1_BASE_IDX] + mmSCRATCH_REG3)*4;
+   scratch_reg2 = adev->rmmio + 
(adev->reg_offset[GC_HWIP][0][mmSCRATCH_REG2_BASE_IDX] + mmSCRATCH_REG2)*4;
+   scratch_reg3 = adev->rmmio + 
(adev->reg_offset[GC_HWIP][0][mmSCRATCH_REG3_BASE_IDX] + mmSCRATCH_REG3)*4;
 spare_int = adev->rmmio + 
(adev->reg_offset[GC_HWIP][0][mmRLC_SPARE_INT_BASE_IDX] + mmRLC_SPARE_INT)*4;

 grbm_cntl = adev->reg_offset[GC_HWIP][0][mmGRBM_GFX_CNTL_BASE_IDX] + 
mmGRBM_GFX_CNTL;
 grbm_idx = adev->reg_offset[GC_HWIP][0][mmGRBM_GFX_INDEX_BASE_IDX] + 

Re: [PATCH 5/5] drm/amdgpu: Modify indirect register access for gfx9 sriov

2021-12-15 Thread Nieto, David M
[AMD Official Use Only]

 scratch_reg0 = adev->rmmio + 
(adev->reg_offset[GC_HWIP][0][mmSCRATCH_REG0_BASE_IDX] + mmSCRATCH_REG0)*4;
 scratch_reg1 = adev->rmmio + 
(adev->reg_offset[GC_HWIP][0][mmSCRATCH_REG1_BASE_IDX] + mmSCRATCH_REG1)*4;
-   scratch_reg2 = adev->rmmio + 
(adev->reg_offset[GC_HWIP][0][mmSCRATCH_REG1_BASE_IDX] + mmSCRATCH_REG2)*4;
-   scratch_reg3 = adev->rmmio + 
(adev->reg_offset[GC_HWIP][0][mmSCRATCH_REG1_BASE_IDX] + mmSCRATCH_REG3)*4;
+   scratch_reg2 = adev->rmmio + 
(adev->reg_offset[GC_HWIP][0][mmSCRATCH_REG2_BASE_IDX] + mmSCRATCH_REG2)*4;
+   scratch_reg3 = adev->rmmio + 
(adev->reg_offset[GC_HWIP][0][mmSCRATCH_REG3_BASE_IDX] + mmSCRATCH_REG3)*4;
 spare_int = adev->rmmio + 
(adev->reg_offset[GC_HWIP][0][mmRLC_SPARE_INT_BASE_IDX] + mmRLC_SPARE_INT)*4;

The definition of scratch_reg2 and scratch_reg3 has changed here. Will this be 
backwards compatible? Was it a bug in the definition?

From: Skvortsov, Victor 
Sent: Wednesday, December 15, 2021 10:55 AM
To: amd-gfx@lists.freedesktop.org ; Deng, Emily 
; Liu, Monk ; Ming, Davis 
; Liu, Shaoyun ; Zhou, Peng Ju 
; Chen, JingWen ; Chen, Horace 
; Nieto, David M 
Cc: Skvortsov, Victor 
Subject: [PATCH 5/5] drm/amdgpu: Modify indirect register access for gfx9 sriov

Expand RLCG interface for new GC read & write commands.
New interface will only be used if the PF enables the flag in pf2vf msg.

Signed-off-by: Victor Skvortsov 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 111 +++---
 1 file changed, 83 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index d252b06efa43..bce6ab52cae0 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -63,6 +63,13 @@
 #define mmGCEA_PROBE_MAP0x070c
 #define mmGCEA_PROBE_MAP_BASE_IDX   0

+#define GFX9_RLCG_GC_WRITE_OLD (0x8 << 28)
+#define GFX9_RLCG_GC_WRITE (0x0 << 28)
+#define GFX9_RLCG_GC_READ  (0x1 << 28)
+#define GFX9_RLCG_VFGATE_DISABLED  0x400
+#define GFX9_RLCG_WRONG_OPERATION_TYPE 0x200
+#define GFX9_RLCG_NOT_IN_RANGE 0x100
+
 MODULE_FIRMWARE("amdgpu/vega10_ce.bin");
 MODULE_FIRMWARE("amdgpu/vega10_pfp.bin");
 MODULE_FIRMWARE("amdgpu/vega10_me.bin");
@@ -739,7 +746,7 @@ static const u32 GFX_RLC_SRM_INDEX_CNTL_DATA_OFFSETS[] =
 mmRLC_SRM_INDEX_CNTL_DATA_7 - mmRLC_SRM_INDEX_CNTL_DATA_0,
 };

-static void gfx_v9_0_rlcg_w(struct amdgpu_device *adev, u32 offset, u32 v, u32 
flag)
+static u32 gfx_v9_0_rlcg_rw(struct amdgpu_device *adev, u32 offset, u32 v, 
uint32_t flag)
 {
 static void *scratch_reg0;
 static void *scratch_reg1;
@@ -748,21 +755,20 @@ static void gfx_v9_0_rlcg_w(struct amdgpu_device *adev, 
u32 offset, u32 v, u32 f
 static void *spare_int;
 static uint32_t grbm_cntl;
 static uint32_t grbm_idx;
+   uint32_t i = 0;
+   uint32_t retries = 5;
+   u32 ret = 0;
+   u32 tmp;

 scratch_reg0 = adev->rmmio + 
(adev->reg_offset[GC_HWIP][0][mmSCRATCH_REG0_BASE_IDX] + mmSCRATCH_REG0)*4;
 scratch_reg1 = adev->rmmio + 
(adev->reg_offset[GC_HWIP][0][mmSCRATCH_REG1_BASE_IDX] + mmSCRATCH_REG1)*4;
-   scratch_reg2 = adev->rmmio + 
(adev->reg_offset[GC_HWIP][0][mmSCRATCH_REG1_BASE_IDX] + mmSCRATCH_REG2)*4;
-   scratch_reg3 = adev->rmmio + 
(adev->reg_offset[GC_HWIP][0][mmSCRATCH_REG1_BASE_IDX] + mmSCRATCH_REG3)*4;
+   scratch_reg2 = adev->rmmio + 
(adev->reg_offset[GC_HWIP][0][mmSCRATCH_REG2_BASE_IDX] + mmSCRATCH_REG2)*4;
+   scratch_reg3 = adev->rmmio + 
(adev->reg_offset[GC_HWIP][0][mmSCRATCH_REG3_BASE_IDX] + mmSCRATCH_REG3)*4;
 spare_int = adev->rmmio + 
(adev->reg_offset[GC_HWIP][0][mmRLC_SPARE_INT_BASE_IDX] + mmRLC_SPARE_INT)*4;

 grbm_cntl = adev->reg_offset[GC_HWIP][0][mmGRBM_GFX_CNTL_BASE_IDX] + 
mmGRBM_GFX_CNTL;
 grbm_idx = adev->reg_offset[GC_HWIP][0][mmGRBM_GFX_INDEX_BASE_IDX] + 
mmGRBM_GFX_INDEX;

-   if (amdgpu_sriov_runtime(adev)) {
-   pr_err("shouldn't call rlcg write register during runtime\n");
-   return;
-   }
-
 if (offset == grbm_cntl || offset == grbm_idx) {
 if (offset  == grbm_cntl)
 writel(v, scratch_reg2);
@@ -771,41 +777,89 @@ static void gfx_v9_0_rlcg_w(struct amdgpu_device *adev, 
u32 offset, u32 v, u32 f

 writel(v, ((void __iomem *)adev->rmmio) + (offset * 4));
 } else {
-   uint32_t i = 0;
-   uint32_t retries = 5;
-
 writel(v, scratch_reg0);
-   writel(offset | 0x8000, scratch_reg1);
+   writel(offset | flag, scratch_reg1);
 writel(1, spare_int);
-   for (i = 0; i < retries; i++) {
-   u32 tmp;

+ 

Re: [PATCH 4/5] drm/amdgpu: Initialize Aldebaran RLC function pointers

2021-12-15 Thread Nieto, David M
[AMD Official Use Only]

 case IP_VERSION(9, 4, 1):
 case IP_VERSION(9, 4, 2):
 amdgpu_device_ip_block_add(adev, &gfx_v9_0_ip_block);
+   if (amdgpu_sriov_vf(adev) && adev->ip_versions[GC_HWIP][0] == 
IP_VERSION(9, 4, 2))
+   gfx_v9_0_set_rlc_funcs(adev);
 break;
 case IP_VERSION(10, 1, 10):

I think for the above, it would be clearer just to separate them:

case IP_VERSION(9, 4, 1):
        amdgpu_device_ip_block_add(adev, &gfx_v9_0_ip_block);
        break;
case IP_VERSION(9, 4, 2):
        amdgpu_device_ip_block_add(adev, &gfx_v9_0_ip_block);
        if (amdgpu_sriov_vf(adev))
                gfx_v9_0_set_rlc_funcs(adev);
        break;

From: Skvortsov, Victor 
Sent: Wednesday, December 15, 2021 10:55 AM
To: amd-gfx@lists.freedesktop.org ; Deng, Emily 
; Liu, Monk ; Ming, Davis 
; Liu, Shaoyun ; Zhou, Peng Ju 
; Chen, JingWen ; Chen, Horace 
; Nieto, David M 
Cc: Skvortsov, Victor 
Subject: [PATCH 4/5] drm/amdgpu: Initialize Aldebaran RLC function pointers

In SRIOV, RLC function pointers must be initialized early as
we rely on the RLCG interface for all GC register access.

Signed-off-by: Victor Skvortsov 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 2 ++
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 3 +--
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.h | 2 ++
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index 65e1f6cc59dd..1bc92a38d124 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -844,6 +844,8 @@ static int amdgpu_discovery_set_gc_ip_blocks(struct 
amdgpu_device *adev)
 case IP_VERSION(9, 4, 1):
 case IP_VERSION(9, 4, 2):
 amdgpu_device_ip_block_add(adev, &gfx_v9_0_ip_block);
+   if (amdgpu_sriov_vf(adev) && adev->ip_versions[GC_HWIP][0] == 
IP_VERSION(9, 4, 2))
+   gfx_v9_0_set_rlc_funcs(adev);
 break;
 case IP_VERSION(10, 1, 10):
 case IP_VERSION(10, 1, 2):
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index edb3e3b08eed..d252b06efa43 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -816,7 +816,6 @@ static void gfx_v9_0_sriov_wreg(struct amdgpu_device *adev, 
u32 offset,
 static void gfx_v9_0_set_ring_funcs(struct amdgpu_device *adev);
 static void gfx_v9_0_set_irq_funcs(struct amdgpu_device *adev);
 static void gfx_v9_0_set_gds_init(struct amdgpu_device *adev);
-static void gfx_v9_0_set_rlc_funcs(struct amdgpu_device *adev);
 static int gfx_v9_0_get_cu_info(struct amdgpu_device *adev,
 struct amdgpu_cu_info *cu_info);
 static uint64_t gfx_v9_0_get_gpu_clock_counter(struct amdgpu_device *adev);
@@ -7066,7 +7065,7 @@ static void gfx_v9_0_set_irq_funcs(struct amdgpu_device 
*adev)
 adev->gfx.cp_ecc_error_irq.funcs = &gfx_v9_0_cp_ecc_error_irq_funcs;
 }

-static void gfx_v9_0_set_rlc_funcs(struct amdgpu_device *adev)
+void gfx_v9_0_set_rlc_funcs(struct amdgpu_device *adev)
 {
 switch (adev->ip_versions[GC_HWIP][0]) {
 case IP_VERSION(9, 0, 1):
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.h 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.h
index dfe8d4841f58..1817e252354f 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.h
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.h
@@ -29,4 +29,6 @@ extern const struct amdgpu_ip_block_version gfx_v9_0_ip_block;
 void gfx_v9_0_select_se_sh(struct amdgpu_device *adev, u32 se_num, u32 sh_num,
u32 instance);

+void gfx_v9_0_set_rlc_funcs(struct amdgpu_device *adev);
+
 #endif
--
2.25.1



[PATCH 4/5] drm/amdgpu: Initialize Aldebaran RLC function pointers

2021-12-15 Thread Victor Skvortsov
In SRIOV, RLC function pointers must be initialized early as
we rely on the RLCG interface for all GC register access.

Signed-off-by: Victor Skvortsov 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 2 ++
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 3 +--
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.h | 2 ++
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index 65e1f6cc59dd..1bc92a38d124 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -844,6 +844,8 @@ static int amdgpu_discovery_set_gc_ip_blocks(struct 
amdgpu_device *adev)
case IP_VERSION(9, 4, 1):
case IP_VERSION(9, 4, 2):
amdgpu_device_ip_block_add(adev, &gfx_v9_0_ip_block);
+   if (amdgpu_sriov_vf(adev) && adev->ip_versions[GC_HWIP][0] == 
IP_VERSION(9, 4, 2))
+   gfx_v9_0_set_rlc_funcs(adev);
break;
case IP_VERSION(10, 1, 10):
case IP_VERSION(10, 1, 2):
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index edb3e3b08eed..d252b06efa43 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -816,7 +816,6 @@ static void gfx_v9_0_sriov_wreg(struct amdgpu_device *adev, 
u32 offset,
 static void gfx_v9_0_set_ring_funcs(struct amdgpu_device *adev);
 static void gfx_v9_0_set_irq_funcs(struct amdgpu_device *adev);
 static void gfx_v9_0_set_gds_init(struct amdgpu_device *adev);
-static void gfx_v9_0_set_rlc_funcs(struct amdgpu_device *adev);
 static int gfx_v9_0_get_cu_info(struct amdgpu_device *adev,
struct amdgpu_cu_info *cu_info);
 static uint64_t gfx_v9_0_get_gpu_clock_counter(struct amdgpu_device *adev);
@@ -7066,7 +7065,7 @@ static void gfx_v9_0_set_irq_funcs(struct amdgpu_device 
*adev)
adev->gfx.cp_ecc_error_irq.funcs = &gfx_v9_0_cp_ecc_error_irq_funcs;
 }
 
-static void gfx_v9_0_set_rlc_funcs(struct amdgpu_device *adev)
+void gfx_v9_0_set_rlc_funcs(struct amdgpu_device *adev)
 {
switch (adev->ip_versions[GC_HWIP][0]) {
case IP_VERSION(9, 0, 1):
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.h 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.h
index dfe8d4841f58..1817e252354f 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.h
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.h
@@ -29,4 +29,6 @@ extern const struct amdgpu_ip_block_version gfx_v9_0_ip_block;
 void gfx_v9_0_select_se_sh(struct amdgpu_device *adev, u32 se_num, u32 sh_num,
   u32 instance);
 
+void gfx_v9_0_set_rlc_funcs(struct amdgpu_device *adev);
+
 #endif
-- 
2.25.1



[PATCH 2/5] drm/amdgpu: Modify indirect register access for gmc_v9_0 sriov

2021-12-15 Thread Victor Skvortsov
Modify GC register access from MMIO to RLCG if the
indirect flag is set

Signed-off-by: Victor Skvortsov 
---
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 45 +++
 1 file changed, 32 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index db2ec84f7237..345ce7fc6463 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -478,9 +478,16 @@ static int gmc_v9_0_vm_fault_interrupt_state(struct 
amdgpu_device *adev,
hub = &adev->vmhub[j];
for (i = 0; i < 16; i++) {
reg = hub->vm_context0_cntl + i;
-   tmp = RREG32(reg);
+
+   tmp = (j == AMDGPU_GFXHUB_0) ?
+   RREG32_SOC15_IP(GC, reg) :
+   RREG32_SOC15_IP(MMHUB, reg);
+
tmp &= ~bits;
-   WREG32(reg, tmp);
+
+   (j == AMDGPU_GFXHUB_0) ?
+   WREG32_SOC15_IP(GC, reg, tmp) :
+   WREG32_SOC15_IP(MMHUB, reg, tmp);
}
}
break;
@@ -489,9 +496,16 @@ static int gmc_v9_0_vm_fault_interrupt_state(struct 
amdgpu_device *adev,
hub = &adev->vmhub[j];
for (i = 0; i < 16; i++) {
reg = hub->vm_context0_cntl + i;
-   tmp = RREG32(reg);
+
+   tmp = (j == AMDGPU_GFXHUB_0) ?
+   RREG32_SOC15_IP(GC, reg) :
+   RREG32_SOC15_IP(MMHUB, reg);
+
tmp |= bits;
-   WREG32(reg, tmp);
+
+   (j == AMDGPU_GFXHUB_0) ?
+   WREG32_SOC15_IP(GC, reg, tmp) :
+   WREG32_SOC15_IP(MMHUB, reg, tmp);
}
}
break;
@@ -789,8 +803,9 @@ static void gmc_v9_0_flush_gpu_tlb(struct amdgpu_device 
*adev, uint32_t vmid,
if (use_semaphore) {
for (j = 0; j < adev->usec_timeout; j++) {
/* a read return value of 1 means semaphore acquire */
-   tmp = RREG32_NO_KIQ(hub->vm_inv_eng0_sem +
-   hub->eng_distance * eng);
+   tmp = (vmhub == AMDGPU_GFXHUB_0) ?
+   RREG32_SOC15_IP_NO_KIQ(GC, 
hub->vm_inv_eng0_sem + hub->eng_distance * eng) :
+   RREG32_SOC15_IP_NO_KIQ(MMHUB, 
hub->vm_inv_eng0_sem + hub->eng_distance * eng);
if (tmp & 0x1)
break;
udelay(1);
@@ -801,8 +816,9 @@ static void gmc_v9_0_flush_gpu_tlb(struct amdgpu_device 
*adev, uint32_t vmid,
}
 
do {
-   WREG32_NO_KIQ(hub->vm_inv_eng0_req +
- hub->eng_distance * eng, inv_req);
+   (vmhub == AMDGPU_GFXHUB_0) ?
+   WREG32_SOC15_IP_NO_KIQ(GC, hub->vm_inv_eng0_req + 
hub->eng_distance * eng, inv_req) :
+   WREG32_SOC15_IP_NO_KIQ(MMHUB, hub->vm_inv_eng0_req + 
hub->eng_distance * eng, inv_req);
 
/*
 * Issue a dummy read to wait for the ACK register to
@@ -815,8 +831,9 @@ static void gmc_v9_0_flush_gpu_tlb(struct amdgpu_device 
*adev, uint32_t vmid,
  hub->eng_distance * eng);
 
for (j = 0; j < adev->usec_timeout; j++) {
-   tmp = RREG32_NO_KIQ(hub->vm_inv_eng0_ack +
-   hub->eng_distance * eng);
+   tmp = (vmhub == AMDGPU_GFXHUB_0) ?
+   RREG32_SOC15_IP_NO_KIQ(GC, hub->vm_inv_eng0_ack 
+ hub->eng_distance * eng) :
+   RREG32_SOC15_IP_NO_KIQ(MMHUB, 
hub->vm_inv_eng0_ack + hub->eng_distance * eng);
if (tmp & (1 << vmid))
break;
udelay(1);
@@ -827,13 +844,15 @@ static void gmc_v9_0_flush_gpu_tlb(struct amdgpu_device 
*adev, uint32_t vmid,
} while (inv_req);
 
/* TODO: It needs to continue working on debugging with semaphore for 
GFXHUB as well. */
-   if (use_semaphore)
+   if (use_semaphore) {
/*
 * add semaphore release after invalidation,
 * write with 0 means semaphore release
 */
-   WREG32_NO_KIQ(hub->vm_inv_eng0_sem +
- hub->eng_distance * eng, 0);
+   (vmhub == AMDGPU_GFXHUB_0) ?
+  

[PATCH 5/5] drm/amdgpu: Modify indirect register access for gfx9 sriov

2021-12-15 Thread Victor Skvortsov
Expand RLCG interface for new GC read & write commands.
New interface will only be used if the PF enables the flag in pf2vf msg.

Signed-off-by: Victor Skvortsov 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 111 +++---
 1 file changed, 83 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index d252b06efa43..bce6ab52cae0 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -63,6 +63,13 @@
 #define mmGCEA_PROBE_MAP0x070c
 #define mmGCEA_PROBE_MAP_BASE_IDX   0
 
+#define GFX9_RLCG_GC_WRITE_OLD (0x8 << 28)
+#define GFX9_RLCG_GC_WRITE (0x0 << 28)
+#define GFX9_RLCG_GC_READ  (0x1 << 28)
+#define GFX9_RLCG_VFGATE_DISABLED  0x400
+#define GFX9_RLCG_WRONG_OPERATION_TYPE 0x200
+#define GFX9_RLCG_NOT_IN_RANGE 0x100
+
 MODULE_FIRMWARE("amdgpu/vega10_ce.bin");
 MODULE_FIRMWARE("amdgpu/vega10_pfp.bin");
 MODULE_FIRMWARE("amdgpu/vega10_me.bin");
@@ -739,7 +746,7 @@ static const u32 GFX_RLC_SRM_INDEX_CNTL_DATA_OFFSETS[] =
mmRLC_SRM_INDEX_CNTL_DATA_7 - mmRLC_SRM_INDEX_CNTL_DATA_0,
 };
 
-static void gfx_v9_0_rlcg_w(struct amdgpu_device *adev, u32 offset, u32 v, u32 
flag)
+static u32 gfx_v9_0_rlcg_rw(struct amdgpu_device *adev, u32 offset, u32 v, 
uint32_t flag)
 {
static void *scratch_reg0;
static void *scratch_reg1;
@@ -748,21 +755,20 @@ static void gfx_v9_0_rlcg_w(struct amdgpu_device *adev, 
u32 offset, u32 v, u32 f
static void *spare_int;
static uint32_t grbm_cntl;
static uint32_t grbm_idx;
+   uint32_t i = 0;
+   uint32_t retries = 5;
+   u32 ret = 0;
+   u32 tmp;
 
scratch_reg0 = adev->rmmio + 
(adev->reg_offset[GC_HWIP][0][mmSCRATCH_REG0_BASE_IDX] + mmSCRATCH_REG0)*4;
scratch_reg1 = adev->rmmio + 
(adev->reg_offset[GC_HWIP][0][mmSCRATCH_REG1_BASE_IDX] + mmSCRATCH_REG1)*4;
-   scratch_reg2 = adev->rmmio + 
(adev->reg_offset[GC_HWIP][0][mmSCRATCH_REG1_BASE_IDX] + mmSCRATCH_REG2)*4;
-   scratch_reg3 = adev->rmmio + 
(adev->reg_offset[GC_HWIP][0][mmSCRATCH_REG1_BASE_IDX] + mmSCRATCH_REG3)*4;
+   scratch_reg2 = adev->rmmio + 
(adev->reg_offset[GC_HWIP][0][mmSCRATCH_REG2_BASE_IDX] + mmSCRATCH_REG2)*4;
+   scratch_reg3 = adev->rmmio + 
(adev->reg_offset[GC_HWIP][0][mmSCRATCH_REG3_BASE_IDX] + mmSCRATCH_REG3)*4;
spare_int = adev->rmmio + 
(adev->reg_offset[GC_HWIP][0][mmRLC_SPARE_INT_BASE_IDX] + mmRLC_SPARE_INT)*4;
 
grbm_cntl = adev->reg_offset[GC_HWIP][0][mmGRBM_GFX_CNTL_BASE_IDX] + 
mmGRBM_GFX_CNTL;
grbm_idx = adev->reg_offset[GC_HWIP][0][mmGRBM_GFX_INDEX_BASE_IDX] + 
mmGRBM_GFX_INDEX;
 
-   if (amdgpu_sriov_runtime(adev)) {
-   pr_err("shouldn't call rlcg write register during runtime\n");
-   return;
-   }
-
if (offset == grbm_cntl || offset == grbm_idx) {
if (offset  == grbm_cntl)
writel(v, scratch_reg2);
@@ -771,41 +777,89 @@ static void gfx_v9_0_rlcg_w(struct amdgpu_device *adev, 
u32 offset, u32 v, u32 f
 
writel(v, ((void __iomem *)adev->rmmio) + (offset * 4));
} else {
-   uint32_t i = 0;
-   uint32_t retries = 5;
-
writel(v, scratch_reg0);
-   writel(offset | 0x8000, scratch_reg1);
+   writel(offset | flag, scratch_reg1);
writel(1, spare_int);
-   for (i = 0; i < retries; i++) {
-   u32 tmp;
 
+   for (i = 0; i < retries; i++) {
tmp = readl(scratch_reg1);
-   if (!(tmp & 0x8000))
+   if (!(tmp & flag))
break;
 
udelay(10);
}
-   if (i >= retries)
-   pr_err("timeout: rlcg program reg:0x%05x failed !\n", 
offset);
+
+   if (i >= retries) {
+   if (amdgpu_sriov_reg_indirect_gc(adev)) {
+   if (tmp & GFX9_RLCG_VFGATE_DISABLED)
+   pr_err("The vfgate is disabled, program 
reg:0x%05x failed!\n", offset);
+   else if (tmp & GFX9_RLCG_WRONG_OPERATION_TYPE)
+   pr_err("Wrong operation type, program 
reg:0x%05x failed!\n", offset);
+   else if (tmp & GFX9_RLCG_NOT_IN_RANGE)
+   pr_err("The register is not in range, 
program reg:0x%05x failed!\n", offset);
+   else
+   pr_err("Unknown error type, program 
reg:0x%05x failed!\n", offset);
+   } else
+   pr_err("timeout: rlcg program reg:0x%05x 

[PATCH 1/5] drm/amdgpu: Add *_SOC15_IP_NO_KIQ() macro definitions

2021-12-15 Thread Victor Skvortsov
Add helper macros to change register access
from direct to indirect.

Signed-off-by: Victor Skvortsov 
---
 drivers/gpu/drm/amd/amdgpu/soc15_common.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/soc15_common.h 
b/drivers/gpu/drm/amd/amdgpu/soc15_common.h
index 8a9ca87d8663..473767e03676 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc15_common.h
+++ b/drivers/gpu/drm/amd/amdgpu/soc15_common.h
@@ -51,6 +51,8 @@
 
 #define RREG32_SOC15_IP(ip, reg) __RREG32_SOC15_RLC__(reg, 0, ip##_HWIP)
 
+#define RREG32_SOC15_IP_NO_KIQ(ip, reg) __RREG32_SOC15_RLC__(reg, 
AMDGPU_REGS_NO_KIQ, ip##_HWIP)
+
 #define RREG32_SOC15_NO_KIQ(ip, inst, reg) \
__RREG32_SOC15_RLC__(adev->reg_offset[ip##_HWIP][inst][reg##_BASE_IDX] 
+ reg, \
 AMDGPU_REGS_NO_KIQ, ip##_HWIP)
@@ -65,6 +67,9 @@
 #define WREG32_SOC15_IP(ip, reg, value) \
 __WREG32_SOC15_RLC__(reg, value, 0, ip##_HWIP)
 
+#define WREG32_SOC15_IP_NO_KIQ(ip, reg, value) \
+__WREG32_SOC15_RLC__(reg, value, AMDGPU_REGS_NO_KIQ, ip##_HWIP)
+
 #define WREG32_SOC15_NO_KIQ(ip, inst, reg, value) \
__WREG32_SOC15_RLC__(adev->reg_offset[ip##_HWIP][inst][reg##_BASE_IDX] 
+ reg, \
 value, AMDGPU_REGS_NO_KIQ, ip##_HWIP)
-- 
2.25.1



[PATCH 3/5] drm/amdgpu: Modify indirect register access for amdkfd_gfx_v9 sriov

2021-12-15 Thread Victor Skvortsov
Modify GC register access from MMIO to RLCG if the indirect
flag is set

Signed-off-by: Victor Skvortsov 
---
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 27 +--
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
index ddfe7aff919d..1abf662a0e91 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
@@ -166,7 +166,7 @@ int kgd_gfx_v9_init_interrupts(struct amdgpu_device *adev, 
uint32_t pipe_id)
 
lock_srbm(adev, mec, pipe, 0, 0);
 
-   WREG32(SOC15_REG_OFFSET(GC, 0, mmCPC_INT_CNTL),
+   WREG32_SOC15(GC, 0, mmCPC_INT_CNTL,
CP_INT_CNTL_RING0__TIME_STAMP_INT_ENABLE_MASK |
CP_INT_CNTL_RING0__OPCODE_ERROR_INT_ENABLE_MASK);
 
@@ -279,7 +279,7 @@ int kgd_gfx_v9_hqd_load(struct amdgpu_device *adev, void 
*mqd,
   lower_32_bits((uintptr_t)wptr));
WREG32_RLC(SOC15_REG_OFFSET(GC, 0, 
mmCP_HQD_PQ_WPTR_POLL_ADDR_HI),
   upper_32_bits((uintptr_t)wptr));
-   WREG32(SOC15_REG_OFFSET(GC, 0, mmCP_PQ_WPTR_POLL_CNTL1),
+   WREG32_SOC15(GC, 0, mmCP_PQ_WPTR_POLL_CNTL1,
   (uint32_t)get_queue_mask(adev, pipe_id, queue_id));
}
 
@@ -488,13 +488,13 @@ bool kgd_gfx_v9_hqd_is_occupied(struct amdgpu_device 
*adev,
uint32_t low, high;
 
acquire_queue(adev, pipe_id, queue_id);
-   act = RREG32(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_ACTIVE));
+   act = RREG32_SOC15(GC, 0, mmCP_HQD_ACTIVE);
if (act) {
low = lower_32_bits(queue_address >> 8);
high = upper_32_bits(queue_address >> 8);
 
-   if (low == RREG32(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_PQ_BASE)) &&
-  high == RREG32(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_PQ_BASE_HI)))
+   if (low == RREG32_SOC15(GC, 0, mmCP_HQD_PQ_BASE) &&
+  high == RREG32_SOC15(GC, 0, mmCP_HQD_PQ_BASE_HI))
retval = true;
}
release_queue(adev);
@@ -556,7 +556,7 @@ int kgd_gfx_v9_hqd_destroy(struct amdgpu_device *adev, void 
*mqd,
 
end_jiffies = (utimeout * HZ / 1000) + jiffies;
while (true) {
-   temp = RREG32(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_ACTIVE));
+   temp = RREG32_SOC15(GC, 0, mmCP_HQD_ACTIVE);
if (!(temp & CP_HQD_ACTIVE__ACTIVE_MASK))
break;
if (time_after(jiffies, end_jiffies)) {
@@ -645,7 +645,7 @@ int kgd_gfx_v9_wave_control_execute(struct amdgpu_device 
*adev,
mutex_lock(&adev->grbm_idx_mutex);
 
WREG32_SOC15_RLC_SHADOW(GC, 0, mmGRBM_GFX_INDEX, gfx_index_val);
-   WREG32(SOC15_REG_OFFSET(GC, 0, mmSQ_CMD), sq_cmd);
+   WREG32_SOC15(GC, 0, mmSQ_CMD, sq_cmd);
 
data = REG_SET_FIELD(data, GRBM_GFX_INDEX,
INSTANCE_BROADCAST_WRITES, 1);
@@ -722,7 +722,7 @@ static void get_wave_count(struct amdgpu_device *adev, int 
queue_idx,
pipe_idx = queue_idx / adev->gfx.mec.num_queue_per_pipe;
queue_slot = queue_idx % adev->gfx.mec.num_queue_per_pipe;
soc15_grbm_select(adev, 1, pipe_idx, queue_slot, 0);
-   reg_val = RREG32(SOC15_REG_OFFSET(GC, 0, mmSPI_CSQ_WF_ACTIVE_COUNT_0) +
+   reg_val = RREG32_SOC15_IP(GC, SOC15_REG_OFFSET(GC, 0, 
mmSPI_CSQ_WF_ACTIVE_COUNT_0) +
 queue_slot);
*wave_cnt = reg_val & SPI_CSQ_WF_ACTIVE_COUNT_0__COUNT_MASK;
if (*wave_cnt != 0)
@@ -809,8 +809,7 @@ void kgd_gfx_v9_get_cu_occupancy(struct amdgpu_device 
*adev, int pasid,
for (sh_idx = 0; sh_idx < sh_cnt; sh_idx++) {
 
gfx_v9_0_select_se_sh(adev, se_idx, sh_idx, 0x);
-   queue_map = RREG32(SOC15_REG_OFFSET(GC, 0,
-  mmSPI_CSQ_WF_ACTIVE_STATUS));
+   queue_map = RREG32_SOC15(GC, 0, 
mmSPI_CSQ_WF_ACTIVE_STATUS);
 
/*
 * Assumption: queue map encodes following schema: four
@@ -860,17 +859,17 @@ void kgd_gfx_v9_program_trap_handler_settings(struct 
amdgpu_device *adev,
/*
 * Program TBA registers
 */
-   WREG32(SOC15_REG_OFFSET(GC, 0, mmSQ_SHADER_TBA_LO),
+   WREG32_SOC15(GC, 0, mmSQ_SHADER_TBA_LO,
 lower_32_bits(tba_addr >> 8));
-   WREG32(SOC15_REG_OFFSET(GC, 0, mmSQ_SHADER_TBA_HI),
+   WREG32_SOC15(GC, 0, mmSQ_SHADER_TBA_HI,
 upper_32_bits(tba_addr >> 8));
 
/*
 * Program TMA registers
 */
-   WREG32(SOC15_REG_OFFSET(GC, 0, mmSQ_SHADER_TMA_LO),
+   WREG32_SOC15(GC, 0, mmSQ_SHADER_TMA_LO,
lower_32_bits(tma_addr >> 8));
-   WREG32(SOC15_REG_OFFSET(GC, 0, mmSQ_SHADER_TMA_HI),
+   WREG32_SOC15(GC, 0, mmSQ_SHADER_TMA_HI,
 

[PATCH 0/5] *** GFX9 RLCG Interface modifications ***

2021-12-15 Thread Victor Skvortsov
This patchset introduces an expanded sriov RLCG interface.
This interface will be used by Aldebaran in sriov mode for
indirect GC register access during full access.

Victor Skvortsov (5):
  drm/amdgpu: Add *_SOC15_IP_NO_KIQ() macro definitions
  drm/amdgpu: Modify indirect register access for gmc_v9_0 sriov
  drm/amdgpu: Modify indirect register access for amdkfd_gfx_v9 sriov
  drm/amdgpu: Initialize Aldebaran RLC function pointers
  drm/amdgpu: Modify indirect register access for gfx9 sriov

 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c |  27 ++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c |   2 +
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 114 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.h |   2 +
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c |  45 +--
 drivers/gpu/drm/amd/amdgpu/soc15_common.h |   5 +
 6 files changed, 138 insertions(+), 57 deletions(-)

-- 
2.25.1



RE: [PATCH] drm/amd/pm: Fix xgmi link control on aldebaran

2021-12-15 Thread Zhang, Hawking
Reviewed-by: Hawking Zhang 

Regards,
Hawking
-Original Message-
From: Lazar, Lijo  
Sent: Wednesday, December 15, 2021 23:50
To: amd-gfx@lists.freedesktop.org
Cc: Zhang, Hawking ; Deucher, Alexander 
; Yang, Stanley 
Subject: [PATCH] drm/amd/pm: Fix xgmi link control on aldebaran

Fix the message argument.
0: Allow power down
1: Disallow power down

Signed-off-by: Lijo Lazar 
---
 drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
index 0907da022197..7433a051e795 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
@@ -1625,7 +1625,7 @@ static int aldebaran_allow_xgmi_power_down(struct 
smu_context *smu, bool en)  {
return smu_cmn_send_smc_msg_with_param(smu,
   SMU_MSG_GmiPwrDnControl,
-  en ? 1 : 0,
+  en ? 0 : 1,
   NULL);
 }
 
--
2.25.1



[PATCH] drm/amd/pm: Fix xgmi link control on aldebaran

2021-12-15 Thread Lijo Lazar
Fix the message argument.
0: Allow power down
1: Disallow power down

Signed-off-by: Lijo Lazar 
---
 drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
index 0907da022197..7433a051e795 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
@@ -1625,7 +1625,7 @@ static int aldebaran_allow_xgmi_power_down(struct 
smu_context *smu, bool en)
 {
return smu_cmn_send_smc_msg_with_param(smu,
   SMU_MSG_GmiPwrDnControl,
-  en ? 1 : 0,
+  en ? 0 : 1,
   NULL);
 }
 
-- 
2.25.1



[PATCH] drm/amdkfd: use max() and min() to make code cleaner

2021-12-15 Thread cgel . zte
From: Changcheng Deng 

Use max() and min() in order to make code cleaner.

Reported-by: Zeal Robot 
Signed-off-by: Changcheng Deng 
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 7e92dcea4ce8..c6d3555b5be6 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -2254,8 +2254,8 @@ svm_range_cpu_invalidate_pagetables(struct 
mmu_interval_notifier *mni,
 
start = mni->interval_tree.start;
last = mni->interval_tree.last;
-   start = (start > range->start ? start : range->start) >> PAGE_SHIFT;
-   last = (last < (range->end - 1) ? last : range->end - 1) >> PAGE_SHIFT;
+   start = max(start, range->start) >> PAGE_SHIFT;
+   last = min(last, range->end - 1) >> PAGE_SHIFT;
pr_debug("[0x%lx 0x%lx] range[0x%lx 0x%lx] notifier[0x%lx 0x%lx] %d\n",
 start, last, range->start >> PAGE_SHIFT,
 (range->end - 1) >> PAGE_SHIFT,
-- 
2.25.1



[PATCH] drm/amdgpu: fixup bad vram size on gmc v8

2021-12-15 Thread Zongmin Zhou
Some boards (like the RX550) seem to have garbage in the upper
16 bits of the vram size register.  Check for
this and clamp the size properly.  Fixes
boards reporting bogus amounts of vram.

With this patch applied, the VRAM size is clamped to the lower 16 bits of
the register, i.e. a maximum of 64GB; without it, affected boards report
bogus VRAM sizes.

Signed-off-by: Zongmin Zhou
---
 drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
index 492ebed2915b..63b890f1e8af 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
@@ -515,10 +515,10 @@ static void gmc_v8_0_mc_program(struct amdgpu_device 
*adev)
 static int gmc_v8_0_mc_init(struct amdgpu_device *adev)
 {
int r;
+   u32 tmp;
 
adev->gmc.vram_width = amdgpu_atombios_get_vram_width(adev);
if (!adev->gmc.vram_width) {
-   u32 tmp;
int chansize, numchan;
 
/* Get VRAM informations */
@@ -562,8 +562,15 @@ static int gmc_v8_0_mc_init(struct amdgpu_device *adev)
adev->gmc.vram_width = numchan * chansize;
}
/* size in MB on si */
-   adev->gmc.mc_vram_size = RREG32(mmCONFIG_MEMSIZE) * 1024ULL * 1024ULL;
-   adev->gmc.real_vram_size = RREG32(mmCONFIG_MEMSIZE) * 1024ULL * 1024ULL;
+   tmp = RREG32(mmCONFIG_MEMSIZE);
+   /* some boards may have garbage in the upper 16 bits */
+	if (tmp & 0xffff0000) {
+		DRM_INFO("Probable bad vram size: 0x%08x\n", tmp);
+		if (tmp & 0xffff)
+			tmp &= 0xffff;
+   }
+   adev->gmc.mc_vram_size = tmp * 1024ULL * 1024ULL;
+   adev->gmc.real_vram_size = adev->gmc.mc_vram_size;
 
if (!(adev->flags & AMD_IS_APU)) {
r = amdgpu_device_resize_fb_bar(adev);
-- 
2.25.1




Re: [PATCH] drm/amdgpu: add drm_dev_unplug() in GPU initialization failure to prevent crash

2021-12-15 Thread Andrey Grodzovsky
I think that we should not call amdgpu_device_unmap_mmio unless device 
is unplugged (as in amdgpu_pci_remove) because the point of this 
function is to prevent accesses to MMIO range the device was occupying 
before removal.
There is no point to prevent MMIO accesses when init failed and we want 
to do an orderly HW shutdown... So probably we should just change to


if (drm_dev_enter()) {
	amdgpu_device_unmap_mmio()
	drm_dev_exit()
}

Andrey

On 2021-12-15 8:28 a.m., Chen, Guchun wrote:

[Public]

Hi Christian,

Your question is a really good one. The patch to unmap MMIO in such an early phase
is from Andrey's patch: drm/amdgpu: Unmap all MMIO mappings. It's a patch from half
a year ago, and everything looked fine until this case.

Regards,
Guchun

-Original Message-
From: Koenig, Christian 
Sent: Wednesday, December 15, 2021 7:00 PM
To: Shi, Leslie ; Grodzovsky, Andrey ; Pan, 
Xinhui ; Deucher, Alexander ; 
amd-gfx@lists.freedesktop.org
Cc: Chen, Guchun 
Subject: Re: [PATCH] drm/amdgpu: add drm_dev_unplug() in GPU initialization 
failure to prevent crash

Am 15.12.21 um 09:46 schrieb Leslie Shi:

[Why]
In amdgpu_driver_load_kms, when amdgpu_device_init returns an error
during driver modprobe, it starts the error handling path
immediately and calls into amdgpu_device_unmap_mmio as well to release
mapped VRAM. However, in the following release callback, the driver still visits
the unmapped memory, like vcn.inst[i].fw_shared_cpu_addr in vcn_v3_0_sw_fini. So
a kernel crash occurs.

Mhm, interesting workaround but I'm not sure that's the right thing to do.

Question is why are we unmapping the MMIO space on driver load failure so early 
in the first place? I mean don't we need to clean up a bit?

If that's really the way to go then we should at least add a comment explaining 
why it's done that way.

Regards,
Christian.


[How]
Add drm_dev_unplug() before executing amdgpu_driver_unload_kms to prevent such a
crash.
GPU initialization failure is tolerated, but a kernel crash in this case
should never happen.

Signed-off-by: Leslie Shi 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 2 ++
   1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 651c7abfde03..7bf6aecdbb92 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -268,6 +268,8 @@ int amdgpu_driver_load_kms(struct amdgpu_device *adev, 
unsigned long flags)
/* balance pm_runtime_get_sync in amdgpu_driver_unload_kms */
if (adev->rmmio && adev->runpm)
pm_runtime_put_noidle(dev->dev);
+
+   drm_dev_unplug(dev);
amdgpu_driver_unload_kms(dev);
}
   


Re: [PATCH v3] drm/amdgpu: introduce new amdgpu_fence object to indicate the job embedded fence

2021-12-15 Thread kernel test robot
Hi Huang,

I love your patch! Perhaps something to improve:

[auto build test WARNING on drm/drm-next]
[also build test WARNING on drm-intel/for-linux-next drm-tip/drm-tip v5.16-rc5 
next-20211214]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Huang-Rui/drm-amdgpu-introduce-new-amdgpu_fence-object-to-indicate-the-job-embedded-fence/20211215-143731
base:   git://anongit.freedesktop.org/drm/drm drm-next
config: x86_64-allyesconfig 
(https://download.01.org/0day-ci/archive/20211215/202112152115.sqaqnvg7-...@intel.com/config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce (this is a W=1 build):
# 
https://github.com/0day-ci/linux/commit/a47becf231b123760625c45242e89f5e5b5b4915
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review 
Huang-Rui/drm-amdgpu-introduce-new-amdgpu_fence-object-to-indicate-the-job-embedded-fence/20211215-143731
git checkout a47becf231b123760625c45242e89f5e5b5b4915
# save the config file to linux build tree
mkdir build_dir
make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash 
drivers/gpu/drm/amd/amdgpu/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All warnings (new ones prefixed by >>):

>> drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c:631: warning: expecting prototype 
>> for amdgpu_fence_clear_job_fences(). Prototype was for 
>> amdgpu_fence_driver_clear_job_fences() instead


vim +631 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c

   623  
   624  /**
   625   * amdgpu_fence_clear_job_fences - clear job embedded fences of ring
   626   *
   627   * @ring: fence of the ring to be cleared
   628   *
   629   */
   630  void amdgpu_fence_driver_clear_job_fences(struct amdgpu_ring *ring)
 > 631  {
   632  int i;
   633  struct dma_fence *old, **ptr;
   634  
   635  for (i = 0; i <= ring->fence_drv.num_fences_mask; i++) {
   636  ptr = &ring->fence_drv.fences[i];
   637  old = rcu_dereference_protected(*ptr, 1);
   638  if (old && old->ops == &amdgpu_job_fence_ops)
   639  RCU_INIT_POINTER(*ptr, NULL);
   640  }
   641  }
   642  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


RE: [PATCH] drm/amdgpu: add drm_dev_unplug() in GPU initialization failure to prevent crash

2021-12-15 Thread Chen, Guchun
[Public]

Hi Christian,

Your question is a really good one. The patch to unmap MMIO in such an early phase
is from Andrey's patch: drm/amdgpu: Unmap all MMIO mappings. It's a patch from half
a year ago, and everything looked fine until this case.

Regards,
Guchun

-Original Message-
From: Koenig, Christian  
Sent: Wednesday, December 15, 2021 7:00 PM
To: Shi, Leslie ; Grodzovsky, Andrey 
; Pan, Xinhui ; Deucher, 
Alexander ; amd-gfx@lists.freedesktop.org
Cc: Chen, Guchun 
Subject: Re: [PATCH] drm/amdgpu: add drm_dev_unplug() in GPU initialization 
failure to prevent crash

Am 15.12.21 um 09:46 schrieb Leslie Shi:
> [Why]
> In amdgpu_driver_load_kms, when amdgpu_device_init returns an error
> during driver modprobe, it starts the error handling path
> immediately and calls into amdgpu_device_unmap_mmio as well to release
> mapped VRAM. However, in the following release callback, the driver still
> visits the unmapped memory, like vcn.inst[i].fw_shared_cpu_addr in
> vcn_v3_0_sw_fini. So a kernel crash occurs.

Mhm, interesting workaround but I'm not sure that's the right thing to do.

Question is why are we unmapping the MMIO space on driver load failure so early 
in the first place? I mean don't we need to clean up a bit?

If that's really the way to go then we should at least add a comment explaining 
why it's done that way.

Regards,
Christian.

>
> [How]
> Add drm_dev_unplug() before executing amdgpu_driver_unload_kms to prevent
> such a crash.
> GPU initialization failure is tolerated, but a kernel crash in this
> case should never happen.
>
> Signed-off-by: Leslie Shi 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 2 ++
>   1 file changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> index 651c7abfde03..7bf6aecdbb92 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> @@ -268,6 +268,8 @@ int amdgpu_driver_load_kms(struct amdgpu_device *adev, 
> unsigned long flags)
>   /* balance pm_runtime_get_sync in amdgpu_driver_unload_kms */
>   if (adev->rmmio && adev->runpm)
>   pm_runtime_put_noidle(dev->dev);
> +
> + drm_dev_unplug(dev);
>   amdgpu_driver_unload_kms(dev);
>   }
>   


Re: [PATCH] drm/amdgpu: add drm_dev_unplug() in GPU initialization failure to prevent crash

2021-12-15 Thread Christian König

Am 15.12.21 um 09:46 schrieb Leslie Shi:

[Why]
In amdgpu_driver_load_kms, when amdgpu_device_init returns an error during driver
modprobe, it starts the error handling path immediately and calls into
amdgpu_device_unmap_mmio as well to release mapped VRAM. However, in the
following release callback, the driver still visits the unmapped memory, like
vcn.inst[i].fw_shared_cpu_addr in vcn_v3_0_sw_fini. So a kernel crash occurs.


Mhm, interesting workaround but I'm not sure that's the right thing to do.

Question is why are we unmapping the MMIO space on driver load failure 
so early in the first place? I mean don't we need to clean up a bit?


If that's really the way to go then we should at least add a comment 
explaining why it's done that way.


Regards,
Christian.



[How]
Add drm_dev_unplug() before executing amdgpu_driver_unload_kms to prevent such a
crash.
GPU initialization failure is tolerated, but a kernel crash in this case
should never happen.

Signed-off-by: Leslie Shi 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 651c7abfde03..7bf6aecdbb92 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -268,6 +268,8 @@ int amdgpu_driver_load_kms(struct amdgpu_device *adev, 
unsigned long flags)
/* balance pm_runtime_get_sync in amdgpu_driver_unload_kms */
if (adev->rmmio && adev->runpm)
pm_runtime_put_noidle(dev->dev);
+
+   drm_dev_unplug(dev);
amdgpu_driver_unload_kms(dev);
}
  




Re: [PATCH v3] drm/amdgpu: introduce new amdgpu_fence object to indicate the job embedded fence

2021-12-15 Thread Christian König

Am 15.12.21 um 07:35 schrieb Huang Rui:

The job-embedded fence doesn't initialize the flags at
dma_fence_init(). Then we take a wrong path in the
amdgpu_fence_get_timeline_name callback and trigger a null pointer panic
once we enable the trace event here. So introduce a new amdgpu_fence
object to indicate the job-embedded fence.

[  156.131790] BUG: kernel NULL pointer dereference, address: 02a0
[  156.131804] #PF: supervisor read access in kernel mode
[  156.131811] #PF: error_code(0x) - not-present page
[  156.131817] PGD 0 P4D 0
[  156.131824] Oops:  [#1] PREEMPT SMP PTI
[  156.131832] CPU: 6 PID: 1404 Comm: sdma0 Tainted: G   OE 
5.16.0-rc1-custom #1
[  156.131842] Hardware name: Gigabyte Technology Co., Ltd. 
Z170XP-SLI/Z170XP-SLI-CF, BIOS F20 11/04/2016
[  156.131848] RIP: 0010:strlen+0x0/0x20
[  156.131859] Code: 89 c0 c3 0f 1f 80 00 00 00 00 48 01 fe eb 0f 0f b6 07 38 d0 74 
10 48 83 c7 01 84 c0 74 05 48 39 f7 75 ec 31 c0 c3 48 89 f8 c3 <80> 3f 00 74 10 
48 89 f8 48 83 c0 01 80 38 00 75 f7 48 29 f8 c3 31
[  156.131872] RSP: 0018:9bd0018dbcf8 EFLAGS: 00010206
[  156.131880] RAX: 02a0 RBX: 8d0305ef01b0 RCX: 000b
[  156.131888] RDX: 8d03772ab924 RSI: 8d0305ef01b0 RDI: 02a0
[  156.131895] RBP: 9bd0018dbd60 R08: 8d03002094d0 R09: 
[  156.131901] R10: 005e R11: 0065 R12: 8d03002094d0
[  156.131907] R13: 001f R14: 00070018 R15: 0007
[  156.131914] FS:  () GS:8d062ed8() 
knlGS:
[  156.131923] CS:  0010 DS:  ES:  CR0: 80050033
[  156.131929] CR2: 02a0 CR3: 1120a005 CR4: 003706e0
[  156.131937] DR0:  DR1:  DR2: 
[  156.131942] DR3:  DR6: fffe0ff0 DR7: 0400
[  156.131949] Call Trace:
[  156.131953]  
[  156.131957]  ? trace_event_raw_event_dma_fence+0xcc/0x200
[  156.131973]  ? ring_buffer_unlock_commit+0x23/0x130
[  156.131982]  dma_fence_init+0x92/0xb0
[  156.131993]  amdgpu_fence_emit+0x10d/0x2b0 [amdgpu]
[  156.132302]  amdgpu_ib_schedule+0x2f9/0x580 [amdgpu]
[  156.132586]  amdgpu_job_run+0xed/0x220 [amdgpu]

Signed-off-by: Huang Rui 


Reviewed-by: Christian König 


---

V1 -> V2: add another amdgpu_fence_ops which is for job-embedded fence.
V2 -> V3: use amdgpu_fence_driver_clear_job_fences abstract the job fence
clearing operation.

---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  11 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 126 ++---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |   4 +-
  3 files changed, 90 insertions(+), 51 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 5625f7736e37..fecf7a09e5a2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4456,7 +4456,7 @@ int amdgpu_device_mode1_reset(struct amdgpu_device *adev)
  int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev,
 struct amdgpu_reset_context *reset_context)
  {
-   int i, j, r = 0;
+   int i, r = 0;
struct amdgpu_job *job = NULL;
bool need_full_reset =
test_bit(AMDGPU_NEED_FULL_RESET, _context->flags);
@@ -4478,15 +4478,8 @@ int amdgpu_device_pre_asic_reset(struct amdgpu_device 
*adev,
  
 		/*clear job fence from fence drv to avoid force_completion
 		 *leave NULL and vm flush fence in fence drv */
-		for (j = 0; j <= ring->fence_drv.num_fences_mask; j++) {
-			struct dma_fence *old, **ptr;
+		amdgpu_fence_driver_clear_job_fences(ring);
 
-			ptr = &ring->fence_drv.fences[j];
-			old = rcu_dereference_protected(*ptr, 1);
-			if (old && test_bit(AMDGPU_FENCE_FLAG_EMBED_IN_JOB_BIT, &old->flags)) {
-				RCU_INIT_POINTER(*ptr, NULL);
-			}
-		}
-   }
/* after all hw jobs are reset, hw fence is meaningless, so 
force_completion */
amdgpu_fence_driver_force_completion(ring);
}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index 3b7e86ea7167..db41d16838b9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -77,11 +77,13 @@ void amdgpu_fence_slab_fini(void)
   * Cast helper
   */
  static const struct dma_fence_ops amdgpu_fence_ops;
+static const struct dma_fence_ops amdgpu_job_fence_ops;
  static inline struct amdgpu_fence *to_amdgpu_fence(struct dma_fence *f)
  {
struct amdgpu_fence *__f = container_of(f, struct amdgpu_fence, base);
  
-	if (__f->base.ops == &amdgpu_fence_ops)

+	if (__f->base.ops == &amdgpu_fence_ops ||
+	    __f->base.ops == &amdgpu_job_fence_ops)
 

[PATCH] drm/amdgpu: add drm_dev_unplug() in GPU initialization failure to prevent crash

2021-12-15 Thread Leslie Shi
[Why]
In amdgpu_driver_load_kms, when amdgpu_device_init returns an error during driver
modprobe, it starts the error handling path immediately and calls into
amdgpu_device_unmap_mmio as well to release mapped VRAM. However, in the
following release callback, the driver still visits the unmapped memory, like
vcn.inst[i].fw_shared_cpu_addr in vcn_v3_0_sw_fini. So a kernel crash occurs.

[How]
Add drm_dev_unplug() before executing amdgpu_driver_unload_kms to prevent such a
crash.
GPU initialization failure is tolerated, but a kernel crash in this case
should never happen.

Signed-off-by: Leslie Shi 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 651c7abfde03..7bf6aecdbb92 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -268,6 +268,8 @@ int amdgpu_driver_load_kms(struct amdgpu_device *adev, 
unsigned long flags)
/* balance pm_runtime_get_sync in amdgpu_driver_unload_kms */
if (adev->rmmio && adev->runpm)
pm_runtime_put_noidle(dev->dev);
+
+   drm_dev_unplug(dev);
amdgpu_driver_unload_kms(dev);
}
 
-- 
2.25.1