Re: [PATCH] drm/amdgpu: add function descripion of new functions

2024-04-26 Thread Deucher, Alexander
[Public]

Reviewed-by: Alex Deucher 

From: Sunil Khatri 
Sent: Friday, April 26, 2024 3:18 AM
To: Deucher, Alexander ; Koenig, Christian 

Cc: amd-gfx@lists.freedesktop.org ; Khatri, 
Sunil 
Subject: [PATCH] drm/amdgpu: add function descripion of new functions

Add function description of the new functions added
in amd_ip_funcs.

new functions added are:
a. dump_ip_state
b. print_ip_state

Signed-off-by: Sunil Khatri 
---
 drivers/gpu/drm/amd/include/amd_shared.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/include/amd_shared.h 
b/drivers/gpu/drm/amd/include/amd_shared.h
index 7536c173a546..36ee9d3d6d9c 100644
--- a/drivers/gpu/drm/amd/include/amd_shared.h
+++ b/drivers/gpu/drm/amd/include/amd_shared.h
@@ -291,6 +291,8 @@ enum amd_dpm_forced_level;
  * @set_clockgating_state: enable/disable cg for the IP block
  * @set_powergating_state: enable/disable pg for the IP block
  * @get_clockgating_state: get current clockgating status
+ * @dump_ip_state: dump the IP state of the ASIC during a gpu hang
+ * @print_ip_state: print the IP state in devcoredump for each IP of the ASIC
  *
  * These hooks provide an interface for controlling the operational state
  * of IP blocks. After acquiring a list of IP blocks for the GPU in use,
--
2.34.1



RE: [PATCH 1/2] drm/print: drop include debugfs.h and include where needed

2024-04-24 Thread Deucher, Alexander
[Public]

> -----Original Message-----
> From: Jani Nikula 
> Sent: Wednesday, April 24, 2024 9:55 AM
> To: dri-de...@lists.freedesktop.org
> Cc: Andrzej Hajda ; Maxime Ripard
> ; Jacek Lawrynowicz
> ; Stanislaw Gruszka
> ; Oded Gabbay ;
> Russell King ; David Airlie ; Daniel
> Vetter ; Neil Armstrong ; Robert
> Foss ; Laurent Pinchart
> ; Jonas Karlman ;
> Jernej Skrabec ; Maarten Lankhorst
> ; Thomas Zimmermann
> ; Rodrigo Vivi ; Joonas
> Lahtinen ; Tvrtko Ursulin
> ; Frank Binns ; Matt Coster
> ; Rob Clark ; Abhinav
> Kumar ; Dmitry Baryshkov
> ; Sean Paul ; Marijn Suijten
> ; Karol Herbst ; Lyude
> Paul ; Danilo Krummrich ; Deucher,
> Alexander ; Koenig, Christian
> ; Pan, Xinhui ; Alain
> Volmat ; Huang, Ray ;
> Zack Rusin ; Broadcom internal kernel review list
> ; Lucas De Marchi
> ; Thomas Hellström
> ; intel-...@lists.freedesktop.org; intel-
> x...@lists.freedesktop.org; linux-arm-...@vger.kernel.org;
> freedr...@lists.freedesktop.org; nouv...@lists.freedesktop.org; amd-
> g...@lists.freedesktop.org
> Subject: Re: [PATCH 1/2] drm/print: drop include debugfs.h and include where
> needed
>
> On Mon, 22 Apr 2024, Jani Nikula  wrote:
> > Surprisingly many places depend on debugfs.h to be included via
> > drm_print.h. Fix them.
> >
> > v3: Also fix armada, ite-it6505, imagination, msm, sti, vc4, and xe
> >
> > v2: Also fix ivpu and vmwgfx
> >
> > Reviewed-by: Andrzej Hajda 
> > Acked-by: Maxime Ripard 
> > Link:
> >
> https://patchwork.freedesktop.org/patch/msgid/20240410141434.157908
> -1-
> > jani.nik...@intel.com
> > Signed-off-by: Jani Nikula 
>
> While the changes all over the place are small, mostly just adding the
> debugfs.h include, please consider acking. I've sent this a few times already.
>

For radeon:
Acked-by: Alex Deucher 

> Otherwise, I'll merge this by the end of the week, acks or not.
>
> Thanks,
> Jani.
>
>
>
> >
> > ---
> >
> > Cc: Jacek Lawrynowicz 
> > Cc: Stanislaw Gruszka 
> > Cc: Oded Gabbay 
> > Cc: Russell King 
> > Cc: David Airlie 
> > Cc: Daniel Vetter 
> > Cc: Andrzej Hajda 
> > Cc: Neil Armstrong 
> > Cc: Robert Foss 
> > Cc: Laurent Pinchart 
> > Cc: Jonas Karlman 
> > Cc: Jernej Skrabec 
> > Cc: Maarten Lankhorst 
> > Cc: Maxime Ripard 
> > Cc: Thomas Zimmermann 
> > Cc: Jani Nikula 
> > Cc: Rodrigo Vivi 
> > Cc: Joonas Lahtinen 
> > Cc: Tvrtko Ursulin 
> > Cc: Frank Binns 
> > Cc: Matt Coster 
> > Cc: Rob Clark 
> > Cc: Abhinav Kumar 
> > Cc: Dmitry Baryshkov 
> > Cc: Sean Paul 
> > Cc: Marijn Suijten 
> > Cc: Karol Herbst 
> > Cc: Lyude Paul 
> > Cc: Danilo Krummrich 
> > Cc: Alex Deucher 
> > Cc: "Christian König" 
> > Cc: "Pan, Xinhui" 
> > Cc: Alain Volmat 
> > Cc: Huang Rui 
> > Cc: Zack Rusin 
> > Cc: Broadcom internal kernel review list
> > 
> > Cc: Lucas De Marchi 
> > Cc: "Thomas Hellström" 
> > Cc: dri-de...@lists.freedesktop.org
> > Cc: intel-...@lists.freedesktop.org
> > Cc: intel...@lists.freedesktop.org
> > Cc: linux-arm-...@vger.kernel.org
> > Cc: freedr...@lists.freedesktop.org
> > Cc: nouv...@lists.freedesktop.org
> > Cc: amd-gfx@lists.freedesktop.org
> > ---
> >  drivers/accel/ivpu/ivpu_debugfs.c   | 2 ++
> >  drivers/gpu/drm/armada/armada_debugfs.c | 1 +
> >  drivers/gpu/drm/bridge/ite-it6505.c | 1 +
> >  drivers/gpu/drm/bridge/panel.c  | 2 ++
> >  drivers/gpu/drm/drm_print.c | 6 +++---
> >  drivers/gpu/drm/i915/display/intel_dmc.c| 1 +
> >  drivers/gpu/drm/imagination/pvr_fw_trace.c  | 1 +
> > drivers/gpu/drm/msm/disp/dpu1/dpu_hw_sspp.c | 2 ++
> >  drivers/gpu/drm/nouveau/dispnv50/crc.c  | 2 ++
> >  drivers/gpu/drm/radeon/r100.c   | 1 +
> >  drivers/gpu/drm/radeon/r300.c   | 1 +
> >  drivers/gpu/drm/radeon/r420.c   | 1 +
> >  drivers/gpu/drm/radeon/r600.c   | 3 ++-
> >  drivers/gpu/drm/radeon/radeon_fence.c   | 1 +
> >  drivers/gpu/drm/radeon/radeon_gem.c | 1 +
> >  drivers/gpu/drm/radeon/radeon_ib.c  | 2 ++
> >  drivers/gpu/drm/radeon/radeon_pm.c  | 1 +
> >  drivers/gpu/drm/radeon/radeon_ring.c| 2 ++
> >  drivers/gpu/drm/radeon/radeon_ttm.c | 1 +
> >  drivers/gpu/drm/radeon/rs400.c  | 1 +
> >  drivers/gpu/drm/radeon/rv515.c  | 1 +
> >  drivers/gpu/drm/sti/sti_drv.c   | 1 +
> >  d

RE: [PATCH 3/3] drm/amdgpu: Fix the uninitialized variable warning

2024-04-24 Thread Deucher, Alexander
[AMD Official Use Only - General]

> -----Original Message-----
> From: Ma, Jun 
> Sent: Wednesday, April 24, 2024 6:04 AM
> To: amd-gfx@lists.freedesktop.org; Koenig, Christian
> ; Deucher, Alexander
> 
> Cc: Ma, Jun 
> Subject: [PATCH 3/3] drm/amdgpu: Fix the uninitialized variable warning
>
> Initialize the phy_id to 0 to fix the warning of "Using uninitialized value 
> phy_id"
>
> Signed-off-by: Ma Jun 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_securedisplay.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_securedisplay.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_securedisplay.c
> index 8ed0e073656f..df81078aa26d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_securedisplay.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_securedisplay.c
> @@ -95,7 +95,7 @@ static ssize_t
> amdgpu_securedisplay_debugfs_write(struct file *f, const char __u
>   struct psp_context *psp = &adev->psp;
>   struct ta_securedisplay_cmd *securedisplay_cmd;
>   struct drm_device *dev = adev_to_drm(adev);
> - uint32_t phy_id;
> + uint32_t phy_id = 0;

Would be better to return an error in case 2: below if size < 3.  Otherwise we 
are just blindly using 0 for phy id.

Alex

>   uint32_t op;
>   char str[64];
>   int ret;
> --
> 2.34.1
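
A minimal user-space sketch of the review suggestion above: reject the command when the opcode needs a phy id and none was supplied, rather than silently using 0. The function name parse_cmd and the use of the sscanf() return count (instead of the "size < 3" check mentioned in the review) are assumptions for illustration, not the driver's code.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for the parsing step of the debugfs write handler.
 * Returns 0 on success, -EINVAL if a required field is missing.
 */
static int parse_cmd(const char *str, unsigned int *op, unsigned int *phy_id)
{
	/* sscanf() returns the number of fields converted, or EOF on empty input */
	int n = sscanf(str, "%u %u", op, phy_id);

	if (n < 1)
		return -EINVAL;	/* no opcode at all */
	if (*op == 2 && n < 2)
		return -EINVAL;	/* op 2 needs an explicit phy id */
	if (n < 2)
		*phy_id = 0;	/* not used by other ops; keep it defined */

	return 0;
}
```

With this shape, the input "2" is rejected while "2 1" and "1" are accepted, so a default phy id is never used for the operation that depends on it.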



RE: [PATCH 1/3] drm/amdgpu: Fix uninitialized variable warning in amdgpu_afmt_acr

2024-04-24 Thread Deucher, Alexander
[AMD Official Use Only - General]

> -----Original Message-----
> From: Ma, Jun 
> Sent: Wednesday, April 24, 2024 6:04 AM
> To: amd-gfx@lists.freedesktop.org; Koenig, Christian
> ; Deucher, Alexander
> 
> Cc: Ma, Jun 
> Subject: [PATCH 1/3] drm/amdgpu: Fix uninitialized variable warning in
> amdgpu_afmt_acr
>
> Assign value to clock to fix the warning below:
> "Using uninitialized value res. Field res.clock is uninitialized"
>
> Signed-off-by: Ma Jun 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_afmt.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_afmt.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_afmt.c
> index a4d65973bf7c..9e3442b2d2ec 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_afmt.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_afmt.c
> @@ -87,7 +87,7 @@ static void amdgpu_afmt_calc_cts(uint32_t clock, int
> *CTS, int *N, int freq)
>
>  struct amdgpu_afmt_acr amdgpu_afmt_acr(uint32_t clock)  {
> - struct amdgpu_afmt_acr res;
> + struct amdgpu_afmt_acr res = {0};

I think you can drop this hunk since all of the fields are initialized below.

Alex

>   u8 i;
>
>   /* Precalculated values for common clocks */
> @@ -100,6 +100,7 @@ struct amdgpu_afmt_acr amdgpu_afmt_acr(uint32_t clock)
>   amdgpu_afmt_calc_cts(clock, &res.cts_32khz, &res.n_32khz, 32000);
>   amdgpu_afmt_calc_cts(clock, &res.cts_44_1khz, &res.n_44_1khz, 44100);
>   amdgpu_afmt_calc_cts(clock, &res.cts_48khz, &res.n_48khz, 48000);
> + res.clock = clock;
>
>   return res;
>  }
> --
> 2.34.1



Re: [PATCH 1/2] drm/amdgpu: fix double free err_addr pointer warnings

2024-04-24 Thread Deucher, Alexander
[AMD Official Use Only - General]

Acked-by: Alex Deucher 

From: Bob Zhou 
Sent: Tuesday, April 23, 2024 1:32 AM
To: amd-gfx@lists.freedesktop.org ; Deucher, 
Alexander ; Koenig, Christian 

Cc: Zhou, Bob 
Subject: [PATCH 1/2] drm/amdgpu: fix double free err_addr pointer warnings

In amdgpu_umc_bad_page_polling_timeout, amdgpu_umc_handle_bad_pages
may run many times, so err_addr can be freed twice in some special cases.
Set err_addr to NULL after freeing it to avoid the warnings.

Signed-off-by: Bob Zhou 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c
index f486510fc94c..32e818d182fe 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c
@@ -170,6 +170,7 @@ static void amdgpu_umc_handle_bad_pages(struct 
amdgpu_device *adev,
 }

 kfree(err_data->err_addr);
+   err_data->err_addr = NULL;

 mutex_unlock(&con->page_retirement_lock);
 }
--
2.34.1
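
A small user-space sketch of the pattern in the patch above, using free() in place of kfree() (both treat a NULL argument as a no-op). The struct and function names are hypothetical; only the free-then-NULL idiom is taken from the diff.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the error-data structure in the patch. */
struct err_data {
	unsigned long *err_addr;
};

/*
 * Release the address buffer and clear the pointer, so that a second
 * call on the same structure frees NULL (a no-op) instead of freeing
 * the same block twice.
 */
static void release_err_addr(struct err_data *d)
{
	free(d->err_addr);
	d->err_addr = NULL;
}
```

Calling release_err_addr() repeatedly on the same structure is then safe, which matches the case in the commit message where the handler runs many times.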



Re: [PATCH] drm/amdgpu/vpe: fix vpe dpm setup failed

2024-04-18 Thread Deucher, Alexander
[AMD Official Use Only - General]

Acked-by: Alex Deucher 

From: Lee, Peyton 
Sent: Thursday, April 18, 2024 1:12 AM
To: amd-gfx@lists.freedesktop.org 
Cc: Deucher, Alexander ; Liu, HaoPing (Alan) 
; Yu, Lang ; Lee, Peyton 

Subject: [PATCH] drm/amdgpu/vpe: fix vpe dpm setup failed

The vpe dpm settings should be done before firmware is loaded.
Otherwise, the frequency cannot be successfully raised.

Signed-off-by: Peyton Lee 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c |  2 +-
 drivers/gpu/drm/amd/amdgpu/vpe_v6_1.c   | 14 +++---
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c
index 6695481f870f..c23d97d34b7e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c
@@ -205,7 +205,7 @@ int amdgpu_vpe_configure_dpm(struct amdgpu_vpe *vpe)
 dpm_ctl &= 0xfffffffe; /* Disable DPM */
 WREG32(vpe_get_reg_offset(vpe, 0, vpe->regs.dpm_enable), dpm_ctl);
 dev_dbg(adev->dev, "%s: disable vpe dpm\n", __func__);
-   return 0;
+   return -EINVAL;
 }

 int amdgpu_vpe_psp_update_sram(struct amdgpu_device *adev)
diff --git a/drivers/gpu/drm/amd/amdgpu/vpe_v6_1.c 
b/drivers/gpu/drm/amd/amdgpu/vpe_v6_1.c
index 769eb8f7bb3c..09315dd5a1ec 100644
--- a/drivers/gpu/drm/amd/amdgpu/vpe_v6_1.c
+++ b/drivers/gpu/drm/amd/amdgpu/vpe_v6_1.c
@@ -144,6 +144,12 @@ static int vpe_v6_1_load_microcode(struct amdgpu_vpe *vpe)
 WREG32(vpe_get_reg_offset(vpe, j, regVPEC_CNTL), ret);
 }

+   /* setup collaborate mode */
+   vpe_v6_1_set_collaborate_mode(vpe, true);
+   /* setup DPM */
+   if (amdgpu_vpe_configure_dpm(vpe))
+   dev_warn(adev->dev, "VPE failed to enable DPM\n");
+
 /*
  * For VPE 6.1.1, still only need to add master's offset, and psp will 
apply it to slave as well.
  * Here use instance 0 as master.
@@ -159,11 +165,7 @@ static int vpe_v6_1_load_microcode(struct amdgpu_vpe *vpe)
 adev->vpe.cmdbuf_cpu_addr[0] = f32_offset;
 adev->vpe.cmdbuf_cpu_addr[1] = f32_cntl;

-   amdgpu_vpe_psp_update_sram(adev);
-   vpe_v6_1_set_collaborate_mode(vpe, true);
-   amdgpu_vpe_configure_dpm(vpe);
-
-   return 0;
+   return amdgpu_vpe_psp_update_sram(adev);
 }

 vpe_hdr = (const struct vpe_firmware_header_v1_0 *)adev->vpe.fw->data;
@@ -196,8 +198,6 @@ static int vpe_v6_1_load_microcode(struct amdgpu_vpe *vpe)
 }

 vpe_v6_1_halt(vpe, false);
-   vpe_v6_1_set_collaborate_mode(vpe, true);
-   amdgpu_vpe_configure_dpm(vpe);

 return 0;
 }
--
2.34.1



Re: [PATCH] drm/amdgpu: Fix leak when GPU memory allocation fails

2024-04-18 Thread Deucher, Alexander
[Public]

Reviewed-by: Alex Deucher 

From: amd-gfx  on behalf of Mukul Joshi 

Sent: Thursday, April 18, 2024 12:17 PM
To: amd-gfx@lists.freedesktop.org 
Cc: Kuehling, Felix ; Joshi, Mukul 
Subject: [PATCH] drm/amdgpu: Fix leak when GPU memory allocation fails

Free the sync object if the memory allocation fails for any
reason.

Signed-off-by: Mukul Joshi 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 0ae9fd844623..bcf4a9e82075 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1854,6 +1854,7 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
 err_bo_create:
 amdgpu_amdkfd_unreserve_mem_limit(adev, aligned_size, flags, xcp_id);
 err_reserve_limit:
+   amdgpu_sync_free(&(*mem)->sync);
 mutex_destroy(&(*mem)->lock);
 if (gobj)
 drm_gem_object_put(gobj);
--
2.35.1



Re: [PATCH] drm/amdgpu: Use driver mode reset for data poison handling

2024-04-17 Thread Deucher, Alexander
[Public]

Acked-by: Alex Deucher 

From: amd-gfx  on behalf of Hawking 
Zhang 
Sent: Tuesday, April 16, 2024 1:56 AM
To: amd-gfx@lists.freedesktop.org ; Zhou1, Tao 

Cc: Zhang, Hawking 
Subject: [PATCH] drm/amdgpu: Use driver mode reset for data poison handling

mode-2 reset is the only reliable method that can get
GC/SDMA back when poison is consumed. mmhub requires
mode-1 reset.

Signed-off-by: Hawking Zhang 
---
 .../gpu/drm/amd/amdkfd/kfd_int_process_v9.c   | 22 +++
 1 file changed, 3 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
index c368c70df3f4a..94eb2493103ef 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
@@ -144,7 +144,7 @@ static void event_interrupt_poison_consumption_v9(struct 
kfd_node *dev,
 uint16_t pasid, uint16_t client_id)
 {
 enum amdgpu_ras_block block = 0;
-   int old_poison, ret = -EINVAL;
+   int old_poison;
 uint32_t reset = 0;
 struct kfd_process *p = kfd_lookup_process_by_pasid(pasid);

@@ -163,17 +163,13 @@ static void event_interrupt_poison_consumption_v9(struct 
kfd_node *dev,
 case SOC15_IH_CLIENTID_SE2SH:
 case SOC15_IH_CLIENTID_SE3SH:
 case SOC15_IH_CLIENTID_UTCL2:
-   ret = kfd_dqm_evict_pasid(dev->dqm, pasid);
 block = AMDGPU_RAS_BLOCK__GFX;
-   if (ret)
-   reset = AMDGPU_RAS_GPU_RESET_MODE2_RESET;
+   reset = AMDGPU_RAS_GPU_RESET_MODE2_RESET;
 break;
 case SOC15_IH_CLIENTID_VMC:
 case SOC15_IH_CLIENTID_VMC1:
-   ret = kfd_dqm_evict_pasid(dev->dqm, pasid);
 block = AMDGPU_RAS_BLOCK__MMHUB;
-   if (ret)
-   reset = AMDGPU_RAS_GPU_RESET_MODE1_RESET;
+   reset = AMDGPU_RAS_GPU_RESET_MODE1_RESET;
 break;
 case SOC15_IH_CLIENTID_SDMA0:
 case SOC15_IH_CLIENTID_SDMA1:
@@ -189,18 +185,6 @@ static void event_interrupt_poison_consumption_v9(struct 
kfd_node *dev,

 kfd_signal_poison_consumed_event(dev, pasid);

-   /* resetting queue passes, do page retirement without gpu reset
-* resetting queue fails, fallback to gpu reset solution
-*/
-   if (!ret)
-   dev_warn(dev->adev->dev,
-   "RAS poison consumption, unmap queue flow succeeded: 
client id %d\n",
-   client_id);
-   else
-   dev_warn(dev->adev->dev,
-   "RAS poison consumption, fall back to gpu reset flow: 
client id %d\n",
-   client_id);
-
 amdgpu_amdkfd_ras_poison_consumption_handler(dev->adev, block, reset);
 }

--
2.17.1



Re: [PATCH] drm/amdgpu: replace tmz flag into buffer flag

2024-04-12 Thread Deucher, Alexander
[AMD Official Use Only - General]

Reviewed-by: Alex Deucher 

From: Min, Frank 
Sent: Friday, April 12, 2024 8:06 AM
To: amd-gfx@lists.freedesktop.org 
Cc: Gao, Likun ; Zhang, Hawking ; 
Deucher, Alexander ; Koenig, Christian 

Subject: [PATCH] drm/amdgpu: replace tmz flag into buffer flag

From: Frank Min 

Replace the tmz flag with a buffer flag to make it easier to understand and extend.

Signed-off-by: Likun Gao 
Signed-off-by: Frank Min 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c|  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h  |  4 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c   | 18 +++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h   |  4 +++-
 drivers/gpu/drm/amd/amdgpu/cik_sdma.c |  2 +-
 drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c|  2 +-
 drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c|  2 +-
 drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c|  4 ++--
 drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c  |  4 ++--
 drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c|  4 ++--
 drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c|  4 ++--
 drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c|  4 ++--
 drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c|  5 +++--
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c  |  4 ++--
 15 files changed, 36 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c
index edc6377ec5ff..199693369c7c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c
@@ -39,7 +39,7 @@ static int amdgpu_benchmark_do_move(struct amdgpu_device 
*adev, unsigned size,
for (i = 0; i < n; i++) {
struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
r = amdgpu_copy_buffer(ring, saddr, daddr, size, NULL, &fence,
-  false, false, false);
+  false, false, 0);
if (r)
goto exit_do_move;
r = dma_fence_wait(fence, false);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 38742ff0ff49..abb1505c82ec 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -791,7 +791,7 @@ int amdgpu_bo_restore_shadow(struct amdgpu_bo *shadow, 
struct dma_fence **fence)

return amdgpu_copy_buffer(ring, shadow_addr, parent_addr,
  amdgpu_bo_size(shadow), NULL, fence,
- true, false, false);
+ true, false, 0);
 }

 /**
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h
index a22c6446817b..b5bde6652838 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h
@@ -136,7 +136,7 @@ struct amdgpu_buffer_funcs {
 uint64_t dst_offset,
 /* number of byte to transfer */
 uint32_t byte_count,
-bool tmz);
+uint32_t copy_flags);

/* maximum bytes in a single operation */
uint32_tfill_max_bytes;
@@ -154,7 +154,7 @@ struct amdgpu_buffer_funcs {
 uint32_t byte_count);
 };

-#define amdgpu_emit_copy_buffer(adev, ib, s, d, b, t) 
(adev)->mman.buffer_funcs->emit_copy_buffer((ib),  (s), (d), (b), (t))
+#define amdgpu_emit_copy_buffer(adev, ib, s, d, b, f)
+(adev)->mman.buffer_funcs->emit_copy_buffer((ib), (s), (d), (b), (f))
 #define amdgpu_emit_fill_buffer(adev, ib, s, d, b) 
(adev)->mman.buffer_funcs->emit_fill_buffer((ib), (s), (d), (b))

 struct amdgpu_sdma_instance *
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index f0fffbf2bdd5..d58ab879e125 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -267,7 +267,7 @@ static int amdgpu_ttm_map_buffer(struct ttm_buffer_object 
*bo,
dst_addr = amdgpu_bo_gpu_offset(adev->gart.bo);
dst_addr += window * AMDGPU_GTT_MAX_TRANSFER_SIZE * 8;
amdgpu_emit_copy_buffer(adev, &job->ibs[0], src_addr,
-   dst_addr, num_bytes, false);
+   dst_addr, num_bytes, 0);

amdgpu_ring_pad_ib(ring, &job->ibs[0]);
WARN_ON(job->ibs[0].length_dw > num_dw);
@@ -327,6 +327,8 @@ int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
struct dma_fence *fence = NULL;
int r = 0;

+   uint32_t copy_flags = 0;
+
if (!adev->mman.buffer_funcs_enabled) {
DRM_ERROR("Trying to move memory with ring turned off.\n");

Re: [PATCH] drm/amdgpu: increase mes submission timeout

2024-04-11 Thread Deucher, Alexander
[AMD Official Use Only - General]

Reviewed-by: Alex Deucher 

From: Kim, Jonathan 
Sent: Thursday, April 11, 2024 3:03 PM
To: amd-gfx@lists.freedesktop.org 
Cc: Deucher, Alexander ; Joshi, Mukul 
; Kim, Jonathan ; Kim, Jonathan 

Subject: [PATCH] drm/amdgpu: increase mes submission timeout

MES internally has a timeout allowance of 2 seconds.
Increase driver timeout to 3 seconds to be safe.

Signed-off-by: Jonathan Kim 
---
 drivers/gpu/drm/amd/amdgpu/mes_v11_0.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
index e5230078a4cd..81833395324a 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
@@ -111,7 +111,7 @@ static int mes_v11_0_submit_pkt_and_poll_completion(struct 
amdgpu_mes *mes,
 struct amdgpu_device *adev = mes->adev;
 struct amdgpu_ring *ring = &mes->ring;
 unsigned long flags;
-   signed long timeout = adev->usec_timeout;
+   signed long timeout = 3000000; /* 3000 ms */

 if (amdgpu_emu_mode) {
 timeout *= 100;
--
2.34.1
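
A short sketch of the unit arithmetic behind the patch above, assuming the timeout counts microseconds (inferred from the usec_timeout field it replaces). The helper name submit_timeout_us is hypothetical; the x100 scaling in emulation mode mirrors the context lines of the diff.

```c
#include <stdbool.h>

#define USEC_PER_MSEC 1000L

/*
 * Convert a timeout given in milliseconds to the microsecond count used
 * for polling, and stretch it by 100x when running under emulation.
 */
static long submit_timeout_us(long timeout_ms, bool emu_mode)
{
	long timeout = timeout_ms * USEC_PER_MSEC;

	if (emu_mode)
		timeout *= 100;

	return timeout;
}
```

Under that assumption, a 3000 ms allowance corresponds to 3,000,000 polling microseconds, which stays above the 2 second internal MES timeout mentioned in the commit message.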



Re: [PATCH] drm/amd/amdgpu: Update PF2VF Header

2024-04-09 Thread Deucher, Alexander
[AMD Official Use Only - General]

Acked-by: Alex Deucher 

From: amd-gfx  on behalf of Luqmaan 
Irshad 
Sent: Tuesday, April 2, 2024 6:01 PM
To: amd-gfx@lists.freedesktop.org 
Cc: Irshad, Luqmaan 
Subject: [PATCH] drm/amd/amdgpu: Update PF2VF Header

Adding a new field for GPU Capacity to align the header with the host.

Signed-off-by: Luqmaan Irshad 
---
 drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h 
b/drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h
index 0de78d6a83fe..fb2b394bb9c5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h
@@ -158,7 +158,7 @@ struct amd_sriov_msg_pf2vf_info_header {
 uint32_t reserved[2];
 };

-#define AMD_SRIOV_MSG_PF2VF_INFO_FILLED_SIZE (48)
+#define AMD_SRIOV_MSG_PF2VF_INFO_FILLED_SIZE (49)
 struct amd_sriov_msg_pf2vf_info {
 /* header contains size and version */
 struct amd_sriov_msg_pf2vf_info_header header;
@@ -209,6 +209,8 @@ struct amd_sriov_msg_pf2vf_info {
 struct amd_sriov_msg_uuid_info uuid_info;
 /* PCIE atomic ops support flag */
 uint32_t pcie_atomic_ops_support_flags;
+   /* Portion of GPU memory occupied by VF.  MAX value is 65535, but set 
to uint32_t to maintain alignment with reserved size */
+   uint32_t gpu_capacity;
 /* reserved */
 uint32_t reserved[256 - AMD_SRIOV_MSG_PF2VF_INFO_FILLED_SIZE];
 };
--
2.44.0
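
A reduced sketch of why the patch above bumps the FILLED_SIZE constant together with the new field: the reserved array shrinks by one dword, so the overall message size stays fixed. The struct names, the field set, and the constants here are hypothetical and much smaller than the real header; only the "reserved[TOTAL - FILLED]" layout is taken from the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MSG_TOTAL_DWORDS 256	/* fixed message size, in 32-bit words */
#define MSG_V1_FILLED 2		/* dwords used before the new field */
#define MSG_V2_FILLED 3		/* one more dword after adding the field */

struct msg_v1 {
	uint32_t size;
	uint32_t version;
	uint32_t reserved[MSG_TOTAL_DWORDS - MSG_V1_FILLED];
};

struct msg_v2 {
	uint32_t size;
	uint32_t version;
	uint32_t gpu_capacity;	/* new field takes the first reserved dword */
	uint32_t reserved[MSG_TOTAL_DWORDS - MSG_V2_FILLED];
};

/* Compile-time checks: both layouts occupy exactly the same space. */
static_assert(sizeof(struct msg_v1) == MSG_TOTAL_DWORDS * sizeof(uint32_t),
	      "v1 must fill the whole message");
static_assert(sizeof(struct msg_v2) == sizeof(struct msg_v1),
	      "adding a field must not change the message size");
```

Declaring the new field as uint32_t, even though its maximum value fits in 16 bits, keeps the accounting in whole dwords, which is the point made in the comment added by the patch.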



Re: [PATCH] drm/amdgpu: Fix VCN allocation in CPX partition

2024-03-28 Thread Deucher, Alexander
[AMD Official Use Only - General]

Acked-by: Alex Deucher 

From: Lazar, Lijo 
Sent: Wednesday, March 27, 2024 10:05 PM
To: amd-gfx@lists.freedesktop.org 
Cc: Zhang, Hawking ; Deucher, Alexander 
; Zhu, James ; Kamal, Asad 

Subject: [PATCH] drm/amdgpu: Fix VCN allocation in CPX partition

VCN need not be shared in CPX mode always for all GFX 9.4.3 SOC SKUs. In
certain configs, VCN instance can be exclusively allocated to a
partition even under CPX mode.

Signed-off-by: Lijo Lazar 
Reviewed-by: James Zhu 
Reviewed-by: Asad Kamal 
---
 drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c 
b/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c
index d6f808acfb17..fbb43ae7624f 100644
--- a/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c
+++ b/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c
@@ -62,6 +62,11 @@ void aqua_vanjaram_doorbell_index_init(struct amdgpu_device 
*adev)
 adev->doorbell_index.max_assignment = 
AMDGPU_DOORBELL_LAYOUT1_MAX_ASSIGNMENT << 1;
 }

+static bool aqua_vanjaram_xcp_vcn_shared(struct amdgpu_device *adev)
+{
+   return (adev->xcp_mgr->num_xcps > adev->vcn.num_vcn_inst);
+}
+
 static void aqua_vanjaram_set_xcp_id(struct amdgpu_device *adev,
  uint32_t inst_idx, struct amdgpu_ring *ring)
 {
@@ -87,7 +92,7 @@ static void aqua_vanjaram_set_xcp_id(struct amdgpu_device 
*adev,
 case AMDGPU_RING_TYPE_VCN_ENC:
 case AMDGPU_RING_TYPE_VCN_JPEG:
 ip_blk = AMDGPU_XCP_VCN;
-   if (adev->xcp_mgr->mode == AMDGPU_CPX_PARTITION_MODE)
+   if (aqua_vanjaram_xcp_vcn_shared(adev))
 inst_mask = 1 << (inst_idx * 2);
 break;
 default:
@@ -140,10 +145,12 @@ static int aqua_vanjaram_xcp_sched_list_update(

 aqua_vanjaram_xcp_gpu_sched_update(adev, ring, ring->xcp_id);

-   /* VCN is shared by two partitions under CPX MODE */
+   /* VCN may be shared by two partitions under CPX MODE in certain
+* configs.
+*/
 if ((ring->funcs->type == AMDGPU_RING_TYPE_VCN_ENC ||
-   ring->funcs->type == AMDGPU_RING_TYPE_VCN_JPEG) &&
-   adev->xcp_mgr->mode == AMDGPU_CPX_PARTITION_MODE)
+ring->funcs->type == AMDGPU_RING_TYPE_VCN_JPEG) &&
+   aqua_vanjaram_xcp_vcn_shared(adev))
 aqua_vanjaram_xcp_gpu_sched_update(adev, ring, 
ring->xcp_id + 1);
 }

--
2.25.1
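
A stand-alone sketch of the check introduced by the patch above: VCN is treated as shared only when there are more partitions than VCN instances, rather than whenever CPX mode is selected. The function names and plain integer parameters are hypothetical, and the non-shared mask of 1 << inst_idx is an assumption, since that line is not visible in the diff context.

```c
#include <stdbool.h>
#include <stdint.h>

/* VCN must be shared when partitions outnumber VCN instances. */
static bool vcn_shared(unsigned int num_xcps, unsigned int num_vcn_inst)
{
	return num_xcps > num_vcn_inst;
}

/*
 * Instance mask for a VCN ring. When shared, each VCN instance spans two
 * partitions, so the index is doubled as in the patch; otherwise the mask
 * is assumed to be a single bit at the instance index.
 */
static uint32_t vcn_inst_mask(unsigned int inst_idx, bool shared)
{
	return shared ? (1u << (inst_idx * 2)) : (1u << inst_idx);
}
```

For example, 8 partitions with 4 VCN instances gives a shared configuration, while 4 partitions with 4 instances does not, even if both are running in CPX mode.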



RE: [PATCH v2] drm/amdgpu: Reset dGPU if suspend got aborted

2024-03-27 Thread Deucher, Alexander
[Public]

> -----Original Message-----
> From: amd-gfx  On Behalf Of Lijo
> Lazar
> Sent: Thursday, March 28, 2024 12:20 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Zhang, Hawking ; Deucher, Alexander
> ; Wang, Yang(Kevin)
> 
> Subject: [PATCH v2] drm/amdgpu: Reset dGPU if suspend got aborted
>
> For SOC21 ASICs, there is an issue in re-enabling PM features if a suspend got
> aborted. In such cases, reset the device during resume phase. This is a
> workaround till a proper solution is finalized.
>
> Signed-off-by: Lijo Lazar 

Reviewed-by: Alex Deucher 

> ---
> v2: Read TOS status only if required (Kevin).
> Refine log message.
>
>  drivers/gpu/drm/amd/amdgpu/soc21.c | 25
> +
>  1 file changed, 25 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/soc21.c
> b/drivers/gpu/drm/amd/amdgpu/soc21.c
> index 8526282f4da1..abe319b0f063 100644
> --- a/drivers/gpu/drm/amd/amdgpu/soc21.c
> +++ b/drivers/gpu/drm/amd/amdgpu/soc21.c
> @@ -867,10 +867,35 @@ static int soc21_common_suspend(void *handle)
>   return soc21_common_hw_fini(adev);
>  }
>
> +static bool soc21_need_reset_on_resume(struct amdgpu_device *adev) {
> + u32 sol_reg1, sol_reg2;
> +
> + /* Will reset for the following suspend abort cases.
> +  * 1) Only reset dGPU side.
> +  * 2) S3 suspend got aborted and TOS is active.
> +  */
> + if (!(adev->flags & AMD_IS_APU) && adev->in_s3 &&
> + !adev->suspend_complete) {
> + sol_reg1 = RREG32_SOC15(MP0, 0,
> regMP0_SMN_C2PMSG_81);
> + msleep(100);
> + sol_reg2 = RREG32_SOC15(MP0, 0,
> regMP0_SMN_C2PMSG_81);
> +
> + return (sol_reg1 != sol_reg2);
> + }
> +
> + return false;
> +}
> +
>  static int soc21_common_resume(void *handle)  {
>   struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>
> + if (soc21_need_reset_on_resume(adev)) {
> + dev_info(adev->dev, "S3 suspend aborted, resetting...");
> + soc21_asic_reset(adev);
> + }
> +
>   return soc21_common_hw_init(adev);
>  }
>
> --
> 2.25.1
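
A user-space sketch of the detection idea in the patch above: read a status register twice and treat a change between the two samples as a sign that the firmware behind it is still active. The accessor type and the fake readers are hypothetical, and the delay between reads (msleep(100) in the patch) is left out here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register accessor: returns the current register value. */
typedef uint32_t (*read_reg_fn)(void *ctx);

/*
 * Sample the register twice and report whether the value moved. A real
 * driver would sleep between the two reads to give the value time to
 * change.
 */
static bool reg_is_changing(read_reg_fn read_reg, void *ctx)
{
	uint32_t first = read_reg(ctx);
	uint32_t second = read_reg(ctx);

	return first != second;
}

/* Fake reader for illustration: the value advances on every read. */
static uint32_t fake_ticking(void *ctx)
{
	uint32_t *counter = ctx;

	return (*counter)++;
}

/* Fake reader for illustration: the value never changes. */
static uint32_t fake_frozen(void *ctx)
{
	uint32_t *counter = ctx;

	return *counter;
}
```

In the patch this comparison is only reached for dGPUs whose S3 suspend was aborted, so the two register reads and the sleep are skipped on the normal resume path.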



Re: [PATCH] drm/amd/pm: fixes a random hang in S4 for SMU v13.0.4/11

2024-03-27 Thread Deucher, Alexander
[AMD Official Use Only - General]

Acked-by: Alex Deucher 

From: Huang, Tim 
Sent: Thursday, March 28, 2024 12:17 AM
To: amd-gfx@lists.freedesktop.org 
Cc: Deucher, Alexander ; Zhang, Yifan 
; Huang, Tim 
Subject: [PATCH] drm/amd/pm: fixes a random hang in S4 for SMU v13.0.4/11

From: Tim Huang 

While doing multiple S4 stress tests, GC/RLC/PMFW get into
an invalid state, resulting in hard hangs.

Adding a GFX reset as workaround just before sending the
MP1_UNLOAD message avoids this failure.

Signed-off-by: Tim Huang 
---
 drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.c
index e8119918ef6b..88f1a0d878f3 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.c
@@ -226,8 +226,18 @@ static int smu_v13_0_4_system_features_control(struct 
smu_context *smu, bool en)
 struct amdgpu_device *adev = smu->adev;
 int ret = 0;

-   if (!en && !adev->in_s0ix)
+   if (!en && !adev->in_s0ix) {
+   /* Adds a GFX reset as workaround just before sending the
+* MP1_UNLOAD message to prevent GC/RLC/PMFW from entering
+* an invalid state.
+*/
+   ret = smu_cmn_send_smc_msg_with_param(smu, 
SMU_MSG_GfxDeviceDriverReset,
+ SMU_RESET_MODE_2, NULL);
+   if (ret)
+   return ret;
+
 ret = smu_cmn_send_smc_msg(smu, SMU_MSG_PrepareMp1ForUnload, 
NULL);
+   }

 return ret;
 }
--
2.39.2



RE: [PATCH 10/28] drm: amdgpu: Use PCI_IRQ_INTX

2024-03-25 Thread Deucher, Alexander
[Public]

> -----Original Message-----
> From: amd-gfx  On Behalf Of
> Damien Le Moal
> Sent: Monday, March 25, 2024 3:09 AM
> To: linux-...@vger.kernel.org; Bjorn Helgaas ;
> Manivannan Sadhasivami ; linux-
> s...@vger.kernel.org; Martin K . Petersen ;
> Jaroslav Kysela ; linux-so...@vger.kernel.org; Greg Kroah-
> Hartman ; linux-...@vger.kernel.org; linux-
> ser...@vger.kernel.org; Hans de Goede ; platform-
> driver-...@vger.kernel.org; n...@lists.linux.dev; Lee Jones ;
> David Airlie ; amd-gfx@lists.freedesktop.org; Jason
> Gunthorpe ; linux-r...@vger.kernel.org; David S . Miller
> ; Eric Dumazet ;
> net...@vger.kernel.org; linux-ker...@vger.kernel.org
> Subject: [PATCH 10/28] drm: amdgpu: Use PCI_IRQ_INTX
>
> Use the macro PCI_IRQ_INTX instead of the deprecated PCI_IRQ_LEGACY
> macro.
>
> Signed-off-by: Damien Le Moal 

Feel free to take it through whatever tree makes sense.  If you want me to pick 
it up, let me know.
Acked-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> index 7e6d09730e6d..d18113017ee7 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> @@ -279,7 +279,7 @@ int amdgpu_irq_init(struct amdgpu_device *adev)
>   adev->irq.msi_enabled = false;
>
>   if (!amdgpu_msi_ok(adev))
> - flags = PCI_IRQ_LEGACY;
> + flags = PCI_IRQ_INTX;
>   else
>   flags = PCI_IRQ_ALL_TYPES;
>
> --
> 2.44.0



Re: [PATCH 2/2] drm/amdgpu: enable UMSCH 4.0.6

2024-03-22 Thread Deucher, Alexander
[AMD Official Use Only - General]

Series is:
Acked-by: Alex Deucher 

From: amd-gfx  on behalf of Lang Yu 

Sent: Thursday, March 21, 2024 10:53 PM
To: amd-gfx@lists.freedesktop.org 
Cc: Yu, Lang ; Gopalakrishnan, Veerabadhran (Veera) 

Subject: [PATCH 2/2] drm/amdgpu: enable UMSCH 4.0.6

Share the same code with 4.0.5 and enable collaborate mode for VPE.

Signed-off-by: Lang Yu 
Reviewed-by: Veerabadhran Gopalakrishnan 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c  | 12 ++--
 drivers/gpu/drm/amd/amdgpu/umsch_mm_v4_0.c|  7 +--
 3 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index 3c407164837b..07c5fca06178 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -2247,6 +2247,7 @@ static int amdgpu_discovery_set_umsch_mm_ip_blocks(struct 
amdgpu_device *adev)
 {
 switch (amdgpu_ip_version(adev, VCN_HWIP, 0)) {
 case IP_VERSION(4, 0, 5):
+   case IP_VERSION(4, 0, 6):
 if (amdgpu_umsch_mm & 0x1) {
 amdgpu_device_ip_block_add(adev, 
&umsch_mm_v4_0_ip_block);
 adev->enable_umsch_mm = true;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c
index 99210a3b1044..95f80b9131a8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c
@@ -189,10 +189,13 @@ static void setup_vpe_queue(struct amdgpu_device *adev,
 mqd->rptr_val = 0;
 mqd->unmapped = 1;

+   if (adev->vpe.collaborate_mode)
+   memcpy(++mqd, test->mqd_data_cpu_addr, sizeof(struct MQD_INFO));
+
 qinfo->mqd_addr = test->mqd_data_gpu_addr;
 qinfo->csa_addr = test->ctx_data_gpu_addr +
 offsetof(struct umsch_mm_test_ctx_data, vpe_ctx_csa);
-   qinfo->doorbell_offset_0 = (adev->doorbell_index.vpe_ring + 1) << 1;
+   qinfo->doorbell_offset_0 = 0;
 qinfo->doorbell_offset_1 = 0;
 }

@@ -287,7 +290,10 @@ static int submit_vpe_queue(struct amdgpu_device *adev, 
struct umsch_mm_test *te
 ring[5] = 0;

 mqd->wptr_val = (6 << 2);
-   // 
WDOORBELL32(adev->umsch_mm.agdb_index[CONTEXT_PRIORITY_LEVEL_NORMAL], 
mqd->wptr_val);
+   if (adev->vpe.collaborate_mode)
+   (++mqd)->wptr_val = (6 << 2);
+
+   WDOORBELL32(adev->umsch_mm.agdb_index[CONTEXT_PRIORITY_LEVEL_NORMAL], 
mqd->wptr_val);

 for (i = 0; i < adev->usec_timeout; i++) {
 if (*fence == test_pattern)
@@ -571,6 +577,7 @@ int amdgpu_umsch_mm_init_microcode(struct amdgpu_umsch_mm 
*umsch)

 switch (amdgpu_ip_version(adev, VCN_HWIP, 0)) {
 case IP_VERSION(4, 0, 5):
+   case IP_VERSION(4, 0, 6):
 fw_name = "amdgpu/umsch_mm_4_0_0.bin";
 break;
 default:
@@ -750,6 +757,7 @@ static int umsch_mm_early_init(void *handle)

 switch (amdgpu_ip_version(adev, VCN_HWIP, 0)) {
 case IP_VERSION(4, 0, 5):
+   case IP_VERSION(4, 0, 6):
 umsch_mm_v4_0_set_funcs(&adev->umsch_mm);
 break;
 default:
diff --git a/drivers/gpu/drm/amd/amdgpu/umsch_mm_v4_0.c 
b/drivers/gpu/drm/amd/amdgpu/umsch_mm_v4_0.c
index 8e7b763cfdb7..84368cf1e175 100644
--- a/drivers/gpu/drm/amd/amdgpu/umsch_mm_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/umsch_mm_v4_0.c
@@ -60,7 +60,7 @@ static int umsch_mm_v4_0_load_microcode(struct 
amdgpu_umsch_mm *umsch)

 umsch->cmd_buf_curr_ptr = umsch->cmd_buf_ptr;

-   if (amdgpu_ip_version(adev, VCN_HWIP, 0) == IP_VERSION(4, 0, 5)) {
+   if (amdgpu_ip_version(adev, VCN_HWIP, 0) >= IP_VERSION(4, 0, 5)) {
 WREG32_SOC15(VCN, 0, regUVD_IPX_DLDO_CONFIG,
 1 << UVD_IPX_DLDO_CONFIG__ONO0_PWR_CONFIG__SHIFT);
 SOC15_WAIT_ON_RREG(VCN, 0, regUVD_IPX_DLDO_STATUS,
@@ -248,7 +248,7 @@ static int umsch_mm_v4_0_ring_stop(struct amdgpu_umsch_mm 
*umsch)
 data = REG_SET_FIELD(data, VCN_UMSCH_RB_DB_CTRL, EN, 0);
 WREG32_SOC15(VCN, 0, regVCN_UMSCH_RB_DB_CTRL, data);

-   if (amdgpu_ip_version(adev, VCN_HWIP, 0) == IP_VERSION(4, 0, 5)) {
+   if (amdgpu_ip_version(adev, VCN_HWIP, 0) >= IP_VERSION(4, 0, 5)) {
 WREG32_SOC15(VCN, 0, regUVD_IPX_DLDO_CONFIG,
 2 << UVD_IPX_DLDO_CONFIG__ONO0_PWR_CONFIG__SHIFT);
 SOC15_WAIT_ON_RREG(VCN, 0, regUVD_IPX_DLDO_STATUS,
@@ -271,6 +271,8 @@ static int umsch_mm_v4_0_set_hw_resources(struct 
amdgpu_umsch_mm *umsch)

 set_hw_resources.vmid_mask_mm_vcn = umsch->vmid_mask_mm_vcn;
 set_hw_resources.vmid_mask_mm_vpe = umsch->vmid_mask_mm_vpe;
+   set_hw_resources.collaboration_mask_vpe =
+   adev->vpe.collaborate_mode ? 0x3 : 0x0;
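
Note on the `==` to `>=` change in the hunks above: a minimal userspace sketch of why the relaxed comparison picks up 4.0.6. It assumes versions are packed one byte per field so that integer order matches version order. The macro below is a simplified stand-in rather than the driver's definition, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the driver's IP_VERSION() macro (assumption):
 * major, minor and revision each take one byte, so comparing the packed
 * integers orders versions the same way as comparing the fields. */
#define IP_VERSION(mj, mn, rv) \
	((uint32_t)(((mj) << 16) | ((mn) << 8) | (rv)))

static_assert(IP_VERSION(4, 0, 6) > IP_VERSION(4, 0, 5),
	      "packed versions must compare in version order");

/* Old check (hypothetical helper): only 4.0.5 takes the DLDO path. */
static bool needs_dldo_config_old(uint32_t vcn_ver)
{
	return vcn_ver == IP_VERSION(4, 0, 5);
}

/* New check (hypothetical helper): 4.0.5 and every later version. */
static bool needs_dldo_config_new(uint32_t vcn_ver)
{
	return vcn_ver >= IP_VERSION(4, 0, 5);
}
```

With this packing, `>=` also matches any later version such as 5.0.0, not only 4.0.6, which is worth keeping in mind when new IP versions are added.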
 

RE: [PATCH v2 1/9] drm/amd/pm: Add support for DPM policies

2024-03-14 Thread Deucher, Alexander
[Public]

> -Original Message-
> From: Lazar, Lijo 
> Sent: Thursday, March 14, 2024 7:56 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Zhang, Hawking ; Deucher, Alexander
> ; Liu, Shuzhou (Bill)
> 
> Subject: [PATCH v2 1/9] drm/amd/pm: Add support for DPM policies
>
> Add support to set/get information about different DPM policies. The support
> is only available on SOCs which use swsmu architecture.
>
> A DPM policy type may be defined with different levels. For example, a policy
> may be defined to select Pstate preference and then later a pstate preference
> may be chosen.
>
> Signed-off-by: Lijo Lazar 
> Reviewed-by: Hawking Zhang 
> ---
> v2: Add NULL checks before accessing smu_dpm_policy_ctxt
>
>  .../gpu/drm/amd/include/kgd_pp_interface.h| 16 
>  drivers/gpu/drm/amd/pm/amdgpu_dpm.c   | 29 ++
>  drivers/gpu/drm/amd/pm/amdgpu_pm.c| 92 ++
>  drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h   |  4 +
>  drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 95
> +++
>  drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h | 29 ++
>  6 files changed, 265 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/include/kgd_pp_interface.h
> b/drivers/gpu/drm/amd/include/kgd_pp_interface.h
> index afb930b70615..84dd819ccc06 100644
> --- a/drivers/gpu/drm/amd/include/kgd_pp_interface.h
> +++ b/drivers/gpu/drm/amd/include/kgd_pp_interface.h
> @@ -273,6 +273,22 @@ enum pp_xgmi_plpd_mode {
>   XGMI_PLPD_COUNT,
>  };
>
> +enum pp_pm_policy {
> + PP_PM_POLICY_NONE = -1,
> + PP_PM_POLICY_SOC_PSTATE = 0,
> + PP_PM_POLICY_NUM,
> +};
> +
> +enum pp_policy_soc_pstate {
> + SOC_PSTATE_DEFAULT = 0,
> + SOC_PSTATE_0,
> + SOC_PSTATE_1,
> + SOC_PSTATE_2,
> + SOC_PSTAT_COUNT,
> +};
> +
> +#define PP_POLICY_MAX_LEVELS 5
> +
>  #define PP_GROUP_MASK0xF000
>  #define PP_GROUP_SHIFT   28
>
> diff --git a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
> b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
> index f84bfed50681..db3addd07120 100644
> --- a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
> +++ b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
> @@ -411,6 +411,35 @@ int amdgpu_dpm_set_xgmi_plpd_mode(struct
> amdgpu_device *adev, int mode)
>   return ret;
>  }
>
> +ssize_t amdgpu_dpm_get_pm_policy_info(struct amdgpu_device *adev, char
> +*buf) {
> + struct smu_context *smu = adev->powerplay.pp_handle;
> + int ret = -EOPNOTSUPP;
> +
> + if (is_support_sw_smu(adev)) {
> + mutex_lock(&adev->pm.mutex);
> + ret = smu_get_pm_policy_info(smu, buf);
> + mutex_unlock(&adev->pm.mutex);
> + }
> +
> + return ret;
> +}
> +
> +int amdgpu_dpm_set_pm_policy(struct amdgpu_device *adev, int
> policy_type,
> +  int policy_level)
> +{
> + struct smu_context *smu = adev->powerplay.pp_handle;
> + int ret = -EOPNOTSUPP;
> +
> + if (is_support_sw_smu(adev)) {
> + mutex_lock(&adev->pm.mutex);
> + ret = smu_set_pm_policy(smu, policy_type, policy_level);
> + mutex_unlock(&adev->pm.mutex);
> + }
> +
> + return ret;
> +}
> +
>  int amdgpu_dpm_enable_mgpu_fan_boost(struct amdgpu_device *adev)  {
>   void *pp_handle = adev->powerplay.pp_handle; diff --git
> a/drivers/gpu/drm/amd/pm/amdgpu_pm.c
> b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
> index efc631bddf4a..7ee11c2e3c61 100644
> --- a/drivers/gpu/drm/amd/pm/amdgpu_pm.c
> +++ b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
> @@ -2179,6 +2179,96 @@ static ssize_t
> amdgpu_set_xgmi_plpd_policy(struct device *dev,
>   return count;
>  }
>
> +static ssize_t amdgpu_get_pm_policy(struct device *dev,
> + struct device_attribute *attr, char *buf) {
> + struct drm_device *ddev = dev_get_drvdata(dev);
> + struct amdgpu_device *adev = drm_to_adev(ddev);
> +
> + if (amdgpu_in_reset(adev))
> + return -EPERM;
> + if (adev->in_suspend && !adev->in_runpm)
> + return -EPERM;
> +
> + return amdgpu_dpm_get_pm_policy_info(adev, buf); }
> +
> +static ssize_t amdgpu_set_pm_policy(struct device *dev,
> + struct device_attribute *attr,
> + const char *buf, size_t count)
> +{
> + struct drm_device *ddev = dev_get_drvdata(dev);
> + struct amdgpu_device *adev = drm_to_adev(ddev);
> + int policy_type, ret, num_params = 0;
> + char delimiter[] = " \n\t";
> + char tmp_buf[128];
> + char *tmp, *param;
> + long val;
> +
> + if (amdgpu_in_reset(adev))
> +
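
The quoted patch is cut off above, so the actual parsing in amdgpu_set_pm_policy() is not visible here. As a rough illustration of the "policy type, then level" idea from the commit message, here is a userspace sketch that parses two whitespace-separated integers from a sysfs-style write buffer. The input format, the helper names, and the rule that negative values are rejected are all assumptions, not taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define DELIMS " \n\t"	/* same delimiter set as in the quoted handler */

/* Hypothetical helper: read one non-negative decimal token at *p and
 * advance *p past it. Returns 0 on success, -EINVAL otherwise. */
static int parse_int_token(const char **p, int *out)
{
	char *end;
	long v;

	*p += strspn(*p, DELIMS);
	if (**p == '\0')
		return -EINVAL;

	errno = 0;
	v = strtol(*p, &end, 10);
	if (end == *p || errno || v < 0 || v > INT_MAX)
		return -EINVAL;
	/* The token must end at a delimiter or at the end of the string. */
	if (*end != '\0' && !strchr(DELIMS, *end))
		return -EINVAL;

	*p = end;
	*out = (int)v;
	return 0;
}

/* Hypothetical helper: parse "<policy_type> <policy_level>". */
static int parse_pm_policy(const char *buf, int *policy_type, int *policy_level)
{
	const char *p = buf;
	int type, level;

	assert(buf && policy_type && policy_level);

	if (parse_int_token(&p, &type) || parse_int_token(&p, &level))
		return -EINVAL;

	/* Reject trailing parameters. */
	p += strspn(p, DELIMS);
	if (*p != '\0')
		return -EINVAL;

	*policy_type = type;
	*policy_level = level;
	return 0;
}
```

Under these assumptions a write of "0 2\n" would select policy type 0 at level 2, and anything with a missing, extra, or non-numeric parameter is rejected with -EINVAL.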

Re: [PATCH] drm/amdgpu: correct the KGQ fallback message

2024-03-14 Thread Deucher, Alexander
[Public]

Reviewed-by: Alex Deucher 

From: Liang, Prike 
Sent: Wednesday, March 13, 2024 5:29 AM
To: amd-gfx@lists.freedesktop.org 
Cc: Deucher, Alexander ; Liang, Prike 

Subject: [PATCH] drm/amdgpu: correct the KGQ fallback message

Fix the KGQ fallback function name, as this will
help differentiate the failure in the KCQ enablement.

Signed-off-by: Prike Liang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
index 4835d6d899e7..d9dc5485 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
@@ -686,7 +686,7 @@ int amdgpu_gfx_enable_kgq(struct amdgpu_device *adev, int 
xcc_id)
 r = amdgpu_ring_test_helper(kiq_ring);
 spin_unlock(&kiq->ring_lock);
 if (r)
-   DRM_ERROR("KCQ enable failed\n");
+   DRM_ERROR("KGQ enable failed\n");

 return r;
 }
--
2.34.1



RE: [PATCH v2] drm/amdgpu: Clear the hotplug interrupt ack bit before hpd initialization

2024-03-14 Thread Deucher, Alexander
[Public]

> -Original Message-
> From: Qiang Ma 
> Sent: Wednesday, March 13, 2024 2:18 AM
> To: Deucher, Alexander ; Koenig, Christian
> ; Pan, Xinhui ;
> airl...@gmail.com; dan...@ffwll.ch; SHANMUGAM, SRINIVASAN
> ; sunran...@208suo.com
> Cc: amd-gfx@lists.freedesktop.org; dri-de...@lists.freedesktop.org; linux-
> ker...@vger.kernel.org
> Subject: Re: [PATCH v2] drm/amdgpu: Clear the hotplug interrupt ack bit
> before hpd initialization
>
> On Wed, 31 Jan 2024 15:57:03 +0800
> Qiang Ma  wrote:
>
> Hello everyone, please help review this patch.

This was applied back in January, sorry if I forgot to reply.

Alex

>
>   Qiang Ma
>
> > Problem:
> > If the HDMI display is unplugged while the BIOS is initializing and
> > plugged back in once the system is up, the hotplug interrupt function
> > is never entered and the display stays dark.
> >
> > Fix:
> > When the above problem occurs and the hpd interrupt ack bit is 1,
> > clear the interrupt during hpd_init so that, once the driver is
> > ready, it can respond to the hpd interrupt normally.
> >
> > Signed-off-by: Qiang Ma 
> > ---
> > v2:
> >  - Remove unused variable 'tmp'
> >  - Fixed function spelling errors
> >
> > drivers/gpu/drm/amd/amdgpu/dce_v10_0.c |  2 ++
> > drivers/gpu/drm/amd/amdgpu/dce_v11_0.c |  2 ++
> > drivers/gpu/drm/amd/amdgpu/dce_v6_0.c  | 22 ++---
> -
> > drivers/gpu/drm/amd/amdgpu/dce_v8_0.c  | 22 ++---
> -
> >  4 files changed, 40 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c
> > b/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c index
> > bb666cb7522e..12a8ba929a72 100644 ---
> > a/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c +++
> > b/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c @@ -51,6 +51,7 @@
> >
> >  static void dce_v10_0_set_display_funcs(struct amdgpu_device *adev);
> > static void dce_v10_0_set_irq_funcs(struct amdgpu_device *adev);
> > +static void dce_v10_0_hpd_int_ack(struct amdgpu_device *adev, int
> > hpd);
> >  static const u32 crtc_offsets[] = {
> > CRTC0_REGISTER_OFFSET,
> > @@ -363,6 +364,7 @@ static void dce_v10_0_hpd_init(struct
> > amdgpu_device *adev) AMDGPU_HPD_DISCONNECT_INT_DELAY_IN_MS);
> > WREG32(mmDC_HPD_TOGGLE_FILT_CNTL +
> > hpd_offsets[amdgpu_connector->hpd.hpd], tmp);
> > +   dce_v10_0_hpd_int_ack(adev,
> > amdgpu_connector->hpd.hpd); dce_v10_0_hpd_set_polarity(adev,
> > amdgpu_connector->hpd.hpd); amdgpu_irq_get(adev, &adev->hpd_irq,
> >amdgpu_connector->hpd.hpd); diff --git
> > a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
> > b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c index
> > 7af277f61cca..745e4fdffade 100644 ---
> > a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c +++
> > b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c @@ -51,6 +51,7 @@
> >
> >  static void dce_v11_0_set_display_funcs(struct amdgpu_device *adev);
> > static void dce_v11_0_set_irq_funcs(struct amdgpu_device *adev);
> > +static void dce_v11_0_hpd_int_ack(struct amdgpu_device *adev, int
> > hpd);
> >  static const u32 crtc_offsets[] =
> >  {
> > @@ -387,6 +388,7 @@ static void dce_v11_0_hpd_init(struct
> > amdgpu_device *adev) AMDGPU_HPD_DISCONNECT_INT_DELAY_IN_MS);
> > WREG32(mmDC_HPD_TOGGLE_FILT_CNTL +
> > hpd_offsets[amdgpu_connector->hpd.hpd], tmp);
> > +   dce_v11_0_hpd_int_ack(adev,
> > amdgpu_connector->hpd.hpd); dce_v11_0_hpd_set_polarity(adev,
> > amdgpu_connector->hpd.hpd); amdgpu_irq_get(adev, &adev->hpd_irq,
> > amdgpu_connector->hpd.hpd); } diff --git
> > a/drivers/gpu/drm/amd/amdgpu/dce_v6_0.c
> > b/drivers/gpu/drm/amd/amdgpu/dce_v6_0.c index
> > 143efc37a17f..28c4a735716b 100644 ---
> > a/drivers/gpu/drm/amd/amdgpu/dce_v6_0.c +++
> > b/drivers/gpu/drm/amd/amdgpu/dce_v6_0.c @@ -272,6 +272,21 @@
> static
> > void dce_v6_0_hpd_set_polarity(struct amdgpu_device *adev,
> > WREG32(mmDC_HPD1_INT_CONTROL + hpd_offsets[hpd], tmp); }
> >
> > +static void dce_v6_0_hpd_int_ack(struct amdgpu_device *adev,
> > +int hpd)
> > +{
> > +   u32 tmp;
> > +
> > +   if (hpd >= adev->mode_info.num_hpd) {
> > +   DRM_DEBUG("invalid hdp %d\n", hpd);
> > +   return;
> > +   }
> > +
> > +   tmp = RREG32(mmDC_HPD1_INT_CONTROL + hpd_offsets[hpd]);
> > +   tmp |= DC_HPD1_INT_CONTROL__DC_HPD1_INT_ACK_MASK;
> > +   WREG32(mmDC_HPD1_INT_CONTROL + hpd_offse
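
The quoted diff is truncated here. As an illustration of the ack step described under "Fix:", the following userspace sketch does the same read-modify-write against a plain array that stands in for the per-pin interrupt control registers. The pin count, the mask value, and all names are hypothetical, and real hardware semantics (for example whether the ack bit self-clears) are not modeled.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_HPD          6		/* hypothetical number of HPD pins */
#define HPD_INT_ACK_MASK 0x00000001u	/* hypothetical ack bit */

static_assert((HPD_INT_ACK_MASK & (HPD_INT_ACK_MASK - 1)) == 0,
	      "ack mask is expected to be a single bit");

/* Stand-in for the per-pin HPD interrupt control registers. */
static uint32_t hpd_int_control[NUM_HPD];

/* Set the ack bit for one pin and leave all other bits untouched.
 * Returns 0 on success, -1 if the pin index is out of range. */
static int hpd_int_ack(int hpd)
{
	uint32_t tmp;

	if (hpd < 0 || hpd >= NUM_HPD)
		return -1;

	tmp = hpd_int_control[hpd];	/* read */
	tmp |= HPD_INT_ACK_MASK;	/* modify */
	hpd_int_control[hpd] = tmp;	/* write back */
	return 0;
}
```

Calling the ack helper for every connector before enabling its interrupt source mirrors the ordering the patch adds to the hpd_init functions.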

RE: [PATCH] drm/amdgpu: add vm fault information to devcoredump

2024-03-06 Thread Deucher, Alexander
[Public]

> -Original Message-
> From: Sunil Khatri 
> Sent: Wednesday, March 6, 2024 1:20 PM
> To: Deucher, Alexander ; Koenig, Christian
> ; Sharma, Shashank
> 
> Cc: amd-gfx@lists.freedesktop.org; dri-de...@lists.freedesktop.org; linux-
> ker...@vger.kernel.org; Joshi, Mukul ; Paneer
> Selvam, Arunpravin ; Khatri, Sunil
> 
> Subject: [PATCH] drm/amdgpu: add vm fault information to devcoredump
>
> Add page fault information to the devcoredump.
>
> Output of devcoredump:
>  AMDGPU Device Coredump 
> version: 1
> kernel: 6.7.0-amd-staging-drm-next
> module: amdgpu
> time: 29.725011811
> process_name: soft_recovery_p PID: 1720
>
> Ring timed out details
> IP Type: 0 Ring Name: gfx_0.0.0
>
> [gfxhub] Page fault observed for GPU family:143 Faulty page starting at

I think we should add a separate section for the GPU identification information 
(family, PCI ids, IP versions, etc.).  For this patch, I think it's fine to just 
print the fault address and status.

Alex

> address 0x Protection fault status register:0x301031
>
> VRAM is lost due to GPU reset!
>
> Signed-off-by: Sunil Khatri 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c | 15 ++-
> drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h |  1 +
>  2 files changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
> index 147100c27c2d..d7fea6cdf2f9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
> @@ -203,8 +203,20 @@ amdgpu_devcoredump_read(char *buffer, loff_t
> offset, size_t count,
>  coredump->ring->name);
>   }
>
> + if (coredump->fault_info.status) {
> + struct amdgpu_vm_fault_info *fault_info =
> &coredump->fault_info;
> +
> + drm_printf(&p, "\n[%s] Page fault observed for GPU
> family:%d\n",
> +fault_info->vmhub ? "mmhub" : "gfxhub",
> +coredump->adev->family);
> + drm_printf(&p, "Faulty page starting at address 0x%016llx\n",
> +fault_info->addr);
> + drm_printf(&p, "Protection fault status register:0x%x\n",
> +fault_info->status);
> + }
> +
>   if (coredump->reset_vram_lost)
> - drm_printf(&p, "VRAM is lost due to GPU reset!\n");
> + drm_printf(&p, "\nVRAM is lost due to GPU reset!\n");
>   if (coredump->adev->reset_info.num_regs) {
>   drm_printf(&p, "AMDGPU register dumps:\nOffset:
> Value:\n");
>
> @@ -253,6 +265,7 @@ void amdgpu_coredump(struct amdgpu_device
> *adev, bool vram_lost,
>   if (job) {
>   s_job = &job->base;
>   coredump->ring = to_amdgpu_ring(s_job->sched);
> + coredump->fault_info = job->vm->fault_info;
>   }
>
>   coredump->adev = adev;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
> index 60522963aaca..3197955264f9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
> @@ -98,6 +98,7 @@ struct amdgpu_coredump_info {
>   struct timespec64   reset_time;
>   boolreset_vram_lost;
>   struct amdgpu_ring  *ring;
> + struct amdgpu_vm_fault_info fault_info;
>  };
>  #endif
>
> --
> 2.34.1
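
Following the review comment above (print only the fault address and status, and leave GPU identification to a separate section), the fault block could look roughly like this. This is a userspace sketch with snprintf() standing in for drm_printf(). The struct and function names are hypothetical and only mirror the three fields used in the patch.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mirror of the fields the patch reads from fault_info. */
struct fault_info_sketch {
	uint64_t addr;		/* faulting page address */
	uint32_t status;	/* protection fault status register */
	unsigned int vmhub;	/* 0: gfxhub, non-zero: mmhub */
};

/* Format the page fault section into out. Writes nothing when no fault
 * was recorded (status == 0). Returns the snprintf() result, or 0 when
 * the section is skipped. */
static int format_fault(char *out, size_t len,
			const struct fault_info_sketch *fi)
{
	assert(out && fi);

	if (!fi->status) {
		if (len)
			out[0] = '\0';
		return 0;
	}

	return snprintf(out, len,
			"[%s] Page fault observed\n"
			"Faulty page starting at address 0x%016" PRIx64 "\n"
			"Protection fault status register:0x%" PRIx32 "\n",
			fi->vmhub ? "mmhub" : "gfxhub",
			fi->addr, fi->status);
}
```

Keeping the section behind the status check matches the patch: a coredump taken without a recorded fault simply omits it.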



Re: Re:RE: Why has to offer "/dev/drv/render128" fd when running ROCm demo?

2024-03-06 Thread Deucher, Alexander
[Public]

No such situation exists.  There is always a render node.

Alex


From: 曹子龙 
Sent: Wednesday, March 6, 2024 12:25 AM
To: amd-gfx@lists.freedesktop.org 
Cc: Deucher, Alexander ; Yat Sin, David 
; Kuehling, Felix 
Subject: Re:RE: Why has to offer "/dev/drv/render128" fd when running ROCm demo?

Hi Felix,
   Thanks for your kind help.

I still have a question: on a pure-compute platform that has no 
/dev/dri/render node but only a single /dev/kfd node, how does the compute 
scenario work, given that no "render" fd exists?
Does such a platform (without a render node) exist?

 Thanks for your kind help!

BRs
zlcao.





At 2024-03-06 04:43:30, "Kuehling, Felix"  wrote:

[AMD Official Use Only - General]


I already answered this question in a reply to another email that was addressed 
to me:



The render nodes are used for CPU mapping of VRAM with mmap calls and an offset 
that identifies the BO. The render node also creates the GPU virtual address 
space that is used by KFD to create the GPU memory mappings. Applications that 
use both graphics and compute can share the same GPU virtual address space in 
this way.
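
To make the relationship concrete, here is a minimal sketch of the acquire step as seen from userspace. It assumes the uapi header linux/kfd_ioctl.h is installed and that its acquire-VM arguments consist of a DRM fd and a GPU id. The helper name and error handling are hypothetical, and the GPU id must come from the KFD topology (typically read from sysfs), which is not shown.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/kfd_ioctl.h>

/* Hypothetical helper: open the KFD node and a render node, then ask
 * KFD to use the GPU virtual address space that belongs to the render
 * node's file descriptor.
 *
 * Returns 0 on success and leaves both descriptors open for the caller;
 * returns a negative errno value on failure with nothing left open. */
static int kfd_acquire_vm(const char *kfd_path, const char *render_path,
			  uint32_t gpu_id, int *kfd_fd, int *drm_fd)
{
	struct kfd_ioctl_acquire_vm_args args = { 0 };
	int kfd, drm, err;

	assert(kfd_fd && drm_fd);

	kfd = open(kfd_path, O_RDWR);
	if (kfd < 0)
		return -errno;

	drm = open(render_path, O_RDWR);
	if (drm < 0) {
		err = -errno;
		close(kfd);
		return err;
	}

	args.drm_fd = (uint32_t)drm;	/* render node that owns the VM */
	args.gpu_id = gpu_id;		/* id from the KFD topology */

	if (ioctl(kfd, AMDKFD_IOC_ACQUIRE_VM, &args) < 0) {
		err = -errno;
		close(drm);
		close(kfd);
		return err;
	}

	*kfd_fd = kfd;
	*drm_fd = drm;
	return 0;
}
```

A caller would pass "/dev/kfd" and the render node of the same GPU, then use the returned KFD descriptor for memory allocation ioctls and the render node descriptor for the mmap calls mentioned above.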



Zlcao, in the future, this type of question may be better addressed to a 
mailing list such as amd-gfx@lists.freedesktop.org, instead of writing 
separately to different maintainers.



Regards,
  Felix





From: Deucher, Alexander 
Sent: Tuesday, March 5, 2024 3:17 PM
To: 曹子龙 ; Kuehling, Felix ; Yat 
Sin, David 
Subject: Re: Why has to offer "/dev/drv/render128" fd when running ROCm demo?



[AMD Official Use Only - General]



+ Felix, David to comment.



From: 曹子龙 <gainery...@163.com>
Sent: Tuesday, March 5, 2024 8:46 AM
To: Deucher, Alexander <alexander.deuc...@amd.com>
Subject: Why has to offer "/dev/drv/render128" fd when running ROCm demo?



Hi Alexander,

  Sorry to bother you, but I really need some help with a puzzle.

I am new to the AMD GPU driver. I am trying to write a simple demo that 
uses "/dev/kfd" to run some VRAM alloc/free tests, but I found that you must 
issue the "AMDKFD_IOC_ACQUIRE_VM" ioctl command correctly before asking the GPU 
to do VRAM allocation and other things. From the kfd driver code, the 
precondition for a successful "AMDKFD_IOC_ACQUIRE_VM" is to pass a 
"/dev/dri/render128" fd in the parameters.



So why is this needed? kfd is used for compute, but "/dev/dri/render128" is 
specific to gfx usage, so why must "/dev/dri/render128" be opened in the KFD 
compute scenario?

Thanks for your kind help!



BRs

zlcao.




Re: [PATCH] drm/amdgpu: Fix missing break in ATOM_ARG_IMM Case of atom_get_src_int()

2024-02-26 Thread Deucher, Alexander
[Public]

Reviewed-by: Alex Deucher 

From: SHANMUGAM, SRINIVASAN 
Sent: Saturday, February 24, 2024 1:38 AM
To: Koenig, Christian ; Deucher, Alexander 

Cc: amd-gfx@lists.freedesktop.org ; SHANMUGAM, 
SRINIVASAN ; Jammy Zhou 
Subject: [PATCH] drm/amdgpu: Fix missing break in ATOM_ARG_IMM Case of 
atom_get_src_int()

The ATOM_ARG_IMM case of the switch statement is missing a break
statement. Add it so that the program's control flow is as intended.

Fixes the below:
drivers/gpu/drm/amd/amdgpu/atom.c:323 atom_get_src_int() warn: ignoring 
unreachable code.

Fixes: d38ceaf99ed0 ("drm/amdgpu: add core driver (v4)")
Cc: Jammy Zhou 
Cc: Christian König 
Cc: Alex Deucher 
Signed-off-by: Srinivasan Shanmugam 
---
 drivers/gpu/drm/amd/amdgpu/atom.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/atom.c 
b/drivers/gpu/drm/amd/amdgpu/atom.c
index b888613f653f..72362df352f6 100644
--- a/drivers/gpu/drm/amd/amdgpu/atom.c
+++ b/drivers/gpu/drm/amd/amdgpu/atom.c
@@ -320,7 +320,7 @@ static uint32_t atom_get_src_int(atom_exec_context *ctx, 
uint8_t attr,
 DEBUG("IMM 0x%02X\n", val);
 return val;
 }
-   return 0;
+   break;
 case ATOM_ARG_PLL:
 idx = U8(*ptr);
 (*ptr)++;
--
2.34.1



RE: [PATCH] drm/amd: Only allow one entity to control ABM

2024-02-20 Thread Deucher, Alexander
[Public]

> -Original Message-
> From: amd-gfx  On Behalf Of
> Christian König
> Sent: Tuesday, February 20, 2024 9:10 AM
> To: Alex Deucher 
> Cc: Limonciello, Mario ; Wentland, Harry
> ; amd-gfx@lists.freedesktop.org; dri-
> de...@lists.freedesktop.org; Mahfooz, Hamza ;
> Li, Sun peng (Leo) 
> Subject: Re: [PATCH] drm/amd: Only allow one entity to control ABM
>
> Am 19.02.24 um 16:28 schrieb Alex Deucher:
> > On Mon, Feb 19, 2024 at 10:19 AM Christian König
> >  wrote:
> >> Am 16.02.24 um 19:37 schrieb Alex Deucher:
> >>> On Fri, Feb 16, 2024 at 10:42 AM Christian König
> >>>  wrote:
>  Am 16.02.24 um 16:12 schrieb Mario Limonciello:
> > On 2/16/2024 09:05, Harry Wentland wrote:
> >> On 2024-02-16 09:47, Christian König wrote:
> >>> Am 16.02.24 um 15:42 schrieb Mario Limonciello:
>  On 2/16/2024 08:38, Christian König wrote:
> > Am 16.02.24 um 15:07 schrieb Mario Limonciello:
> >> By exporting ABM to sysfs it's possible that DRM master and
> >> software controlling the sysfs file fight over the value programmed
> >> for ABM.
> >>
> >> Adjust the module parameter behavior to control who controls ABM:
> >> -2: DRM
> >> -1: sysfs (IE via software like power-profiles-daemon)
> > Well that sounds extremely awkward. Why should a
> > power-profiles-daemon have control over the panel power saving
> > features?
> >
> > I mean we are talking about things like reducing backlight
> > level when there is inactivity, aren't we?
>  We're talking about activating the ABM algorithm when the
>  system is in power saving mode; not from inactivity.  This
>  allows the user to squeeze out some extra power "just" in that
> situation.
> 
>  But given the comments on the other patch, I tend to agree with
>  Harry's proposal instead that we just drop the DRM property
>  entirely as there are no consumers of it.
> >>> Yeah, but even then the design to let this be controlled by a
> >>> userspace daemon is questionable. Stuff like that is handled
> >>> inside the kernel and not exposed to userspace usually.
> >>>
> > Regarding the "how" and "why" of PPD; besides this panel power
> > savings sysfs file there are two other things that are nominally 
> > changed.
> >
> > ACPI platform profile:
> > https://www.kernel.org/doc/html/latest/userspace-api/sysfs-platfor
> > m_profile.html
> >
> > AMD-Pstate EPP value:
> > https://www.kernel.org/doc/html//latest/admin-guide/pm/amd-
> pstate.
> > html
> >
> > When a user goes into "power saving" mode both of those are tweaked.
> > Before we introduced the EPP tweaking in PPD we did discuss a
> > callback within the kernel so that userspace could change "just"
> > the ACPI platform profile and everything else would react.  There
> > was pushback on this, and so instead knobs are offered for things
> > that should be tweaked and the userspace daemon can set up policy
> > for what to do when a user uses a userspace client (such as
> > GNOME or KDE) to change the desired system profile.
>  Ok, well who came up with the idea of the userspace daemon? Cause I
>  think there will be even more push back on this approach.
> 
>  Basically when we go from AC to battery (or whatever) the drivers
>  usually handle that all inside the kernel today. Involving
>  userspace is only done when there is a need for that, e.g.
>  inactivity detection or similar.
> >>> Well, we don't want policy in the kernel unless it's a platform or
> >>> hardware requirement.  Kernel should provide the knobs and then
> >>> userspace can set them however they want depending on user preference.
> >> Well, you don't have the policy itself but usually the handling inside
> >> the kernel.
> >>
> >> In other words when I connect/disconnect AC from my laptop I can hear
> >> the fan changing, which is a switch in power state. Only the beep
> >> which comes out of the speakers as confirmation is handled in
> >> userspace I think.
> >>
> >> And IIRC changing background light is also handled completely inside
> >> the kernel and when I close the lid the display turns off on its own
> >> and not because of some userspace daemon.
> >>
> >> So why is a userspace daemon suddenly involved for this?
> > It's a user preference.  Some people won't like ABM, some will.  They
> > set the policy from user space.  It's similar to the backlight level.
> > Some users always prefer a bright backlight regardless of AC/DC state,
> > others want the backlight to get brighter when on AC power.  The
> > kernel provides the knobs to set the ABM level and then user space can
> > specify the level and also decide when they want it enabled (never,
> > only on DC, etc.).  The kernel driver for the backlight doesn't change
> > the backlight at AC/DC switch, 

Re: [PATCH] drm/amdgpu: Drop redundant parameter in amdgpu_gfx_kiq_init_ring

2024-02-19 Thread Deucher, Alexander
[Public]

Reviewed-by: Alex Deucher 

From: Ma, Jun 
Sent: Monday, February 19, 2024 1:40 AM
To: amd-gfx@lists.freedesktop.org ; Koenig, 
Christian ; Deucher, Alexander 

Cc: Ma, Jun 
Subject: [PATCH] drm/amdgpu: Drop redundant parameter in 
amdgpu_gfx_kiq_init_ring

Drop redundant parameters in function amdgpu_gfx_kiq_init_ring
to simplify the code

Signed-off-by: Ma Jun 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 6 +++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h | 4 +---
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c  | 5 ++---
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c  | 5 ++---
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c   | 5 ++---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c   | 5 ++---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 4 +---
 7 files changed, 13 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
index e114694d1131..4835d6d899e7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
@@ -304,11 +304,11 @@ static int amdgpu_gfx_kiq_acquire(struct amdgpu_device 
*adev,
 return -EINVAL;
 }

-int amdgpu_gfx_kiq_init_ring(struct amdgpu_device *adev,
-struct amdgpu_ring *ring,
-struct amdgpu_irq_src *irq, int xcc_id)
+int amdgpu_gfx_kiq_init_ring(struct amdgpu_device *adev, int xcc_id)
 {
 struct amdgpu_kiq *kiq = &adev->gfx.kiq[xcc_id];
+   struct amdgpu_irq_src *irq = &kiq->irq;
+   struct amdgpu_ring *ring = &kiq->ring;
 int r = 0;

 spin_lock_init(&kiq->ring_lock);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
index f23bafec71c5..8fcf889ddce9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
@@ -471,9 +471,7 @@ static inline u32 amdgpu_gfx_create_bitmask(u32 bit_width)
 void amdgpu_gfx_parse_disable_cu(unsigned *mask, unsigned max_se,
  unsigned max_sh);

-int amdgpu_gfx_kiq_init_ring(struct amdgpu_device *adev,
-struct amdgpu_ring *ring,
-struct amdgpu_irq_src *irq, int xcc_id);
+int amdgpu_gfx_kiq_init_ring(struct amdgpu_device *adev, int xcc_id);

 void amdgpu_gfx_kiq_free_ring(struct amdgpu_ring *ring);

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index b02d63328f1c..691fa40e4e01 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -4490,7 +4490,7 @@ static int gfx_v10_0_compute_ring_init(struct 
amdgpu_device *adev, int ring_id,
 static int gfx_v10_0_sw_init(void *handle)
 {
 int i, j, k, r, ring_id = 0;
-   struct amdgpu_kiq *kiq;
+   int xcc_id = 0;
 struct amdgpu_device *adev = (struct amdgpu_device *)handle;

 switch (amdgpu_ip_version(adev, GC_HWIP, 0)) {
@@ -4619,8 +4619,7 @@ static int gfx_v10_0_sw_init(void *handle)
 return r;
 }

-   kiq = &adev->gfx.kiq[0];
-   r = amdgpu_gfx_kiq_init_ring(adev, &kiq->ring, &kiq->irq, 0);
+   r = amdgpu_gfx_kiq_init_ring(adev, xcc_id);
 if (r)
 return r;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index 2fb1342d5bd9..9d8ec709cd52 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -1329,7 +1329,7 @@ static int gfx_v11_0_rlc_backdoor_autoload_enable(struct 
amdgpu_device *adev)
 static int gfx_v11_0_sw_init(void *handle)
 {
 int i, j, k, r, ring_id = 0;
-   struct amdgpu_kiq *kiq;
+   int xcc_id = 0;
 struct amdgpu_device *adev = (struct amdgpu_device *)handle;

 switch (amdgpu_ip_version(adev, GC_HWIP, 0)) {
@@ -1454,8 +1454,7 @@ static int gfx_v11_0_sw_init(void *handle)
 return r;
 }

-   kiq = &adev->gfx.kiq[0];
-   r = amdgpu_gfx_kiq_init_ring(adev, &kiq->ring, &kiq->irq, 0);
+   r = amdgpu_gfx_kiq_init_ring(adev, xcc_id);
 if (r)
 return r;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index ea174b76ee70..b97ea62212b6 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -1900,8 +1900,8 @@ static void gfx_v8_0_sq_irq_work_func(struct work_struct 
*work);
 static int gfx_v8_0_sw_init(void *handle)
 {
 int i, j, k, r, ring_id;
+   int xcc_id = 0;
 struct amdgpu_ring *ring;
-   struct amdgpu_kiq *kiq;
 struct amdgpu_device *adev = (struct amdgpu_device *)handle;

 switch (adev->asic_type) {
@@ -2022,8 +2022,7 @@ static int gfx_v8_0_sw_init(void *handle)
 return r;
 }

-  

RE: [PATCH 1/3] drm/radeon: Use RMW accessors for changing LNKCTL2

2024-02-15 Thread Deucher, Alexander
[Public]

> -Original Message-
> From: Ilpo Järvinen 
> Sent: Thursday, February 15, 2024 8:32 AM
> To: Deucher, Alexander ; amd-
> g...@lists.freedesktop.org; Daniel Vetter ; David Airlie
> ; Dennis Dalessandro
> ; dri-
> de...@lists.freedesktop.org; Jason Gunthorpe ; Leon
> Romanovsky ; linux-ker...@vger.kernel.org; linux-
> r...@vger.kernel.org; Pan, Xinhui ; Koenig, Christian
> 
> Cc: Ilpo Järvinen ; Lukas Wunner
> 
> Subject: [PATCH 1/3] drm/radeon: Use RMW accessors for changing LNKCTL2
>
> Convert open coded RMW accesses for LNKCTL2 to use
> pcie_capability_clear_and_set_word() which makes it easier to understand
> what the code tries to do.
>
> LNKCTL2 is not really owned by any driver because it is a collection of
> control bits that PCI core might need to touch. RMW accessors already have
> support for proper locking for a selected set of registers (LNKCTL2 is not
> yet among them but likely will be in the future) to avoid losing
> concurrent updates.
>
> Suggested-by: Lukas Wunner 
> Signed-off-by: Ilpo Järvinen 

The radeon and amdgpu patches are:
Acked-by: Alex Deucher 

Are you looking for me to pick them up or do you want to land them as part of 
some larger change?  Either way is fine with me.

Alex

> ---
>  drivers/gpu/drm/radeon/cik.c | 40 ++--
>  drivers/gpu/drm/radeon/si.c  | 40 ++--
>  2 files changed, 30 insertions(+), 50 deletions(-)
>
> diff --git a/drivers/gpu/drm/radeon/cik.c b/drivers/gpu/drm/radeon/cik.c
> index 10be30366c2b..b5e96a8fc2c1 100644
> --- a/drivers/gpu/drm/radeon/cik.c
> +++ b/drivers/gpu/drm/radeon/cik.c
> @@ -9592,28 +9592,18 @@ static void cik_pcie_gen3_enable(struct
> radeon_device *rdev)
>
> PCI_EXP_LNKCTL_HAWD);
>
>   /* linkctl2 */
> - pcie_capability_read_word(root,
> PCI_EXP_LNKCTL2,
> -   &tmp16);
> - tmp16 &= ~(PCI_EXP_LNKCTL2_ENTER_COMP
> |
> -PCI_EXP_LNKCTL2_TX_MARGIN);
> - tmp16 |= (bridge_cfg2 &
> -   (PCI_EXP_LNKCTL2_ENTER_COMP |
> -PCI_EXP_LNKCTL2_TX_MARGIN));
> - pcie_capability_write_word(root,
> -PCI_EXP_LNKCTL2,
> -tmp16);
> -
> - pcie_capability_read_word(rdev->pdev,
> -   PCI_EXP_LNKCTL2,
> -   &tmp16);
> - tmp16 &= ~(PCI_EXP_LNKCTL2_ENTER_COMP
> |
> -PCI_EXP_LNKCTL2_TX_MARGIN);
> - tmp16 |= (gpu_cfg2 &
> -   (PCI_EXP_LNKCTL2_ENTER_COMP |
> -PCI_EXP_LNKCTL2_TX_MARGIN));
> - pcie_capability_write_word(rdev->pdev,
> -PCI_EXP_LNKCTL2,
> -tmp16);
> + pcie_capability_clear_and_set_word(root,
> PCI_EXP_LNKCTL2,
> +
> PCI_EXP_LNKCTL2_ENTER_COMP |
> +
> PCI_EXP_LNKCTL2_TX_MARGIN,
> +bridge_cfg2
> |
> +
> (PCI_EXP_LNKCTL2_ENTER_COMP |
> +
> PCI_EXP_LNKCTL2_TX_MARGIN));
> + pcie_capability_clear_and_set_word(rdev-
> >pdev, PCI_EXP_LNKCTL2,
> +
> PCI_EXP_LNKCTL2_ENTER_COMP |
> +
> PCI_EXP_LNKCTL2_TX_MARGIN,
> +gpu_cfg2 |
> +
> (PCI_EXP_LNKCTL2_ENTER_COMP |
> +
> PCI_EXP_LNKCTL2_TX_MARGIN));
>
>   tmp = RREG32_PCIE_PORT(PCIE_LC_CNTL4);
>   tmp &= ~LC_SET_QUIESCE;
> @@ -9627,15 +9617,15 @@ static void cik_pcie_gen3_enable(struct
> radeon_device *rdev)
>   speed_cntl &= ~LC_FORCE_DIS_SW_SPEED_CHANGE;
>   WREG32_PCIE_PORT(PCIE_LC_SPEED_CNTL, speed_cntl);
>
> - pcie_capability_read_word(rdev->pdev, PCI_EXP_LNKCTL2, &tmp16);
> - tmp16 &= ~PCI_EXP_LNKCTL2_TLS;
> + tmp16 = 0;
>   if (speed_cap == PCIE_SPEED_8_0GT)
>   tmp16 |= PCI_EXP_LNKCTL2_TLS_8_0GT; /* gen3 */
>   else if (speed_cap == PCIE_SPEED_5_0GT)
>   tmp16 |= PCI_EXP_LNKCTL2_TLS_5_0GT; /* gen2 */
>   else
>   tmp16 |= PCI_EXP_LNK

RE: [PATCH] drm/amdgpu: bail on INFO IOCTL if the GPU is in reset

2024-02-12 Thread Deucher, Alexander
[AMD Official Use Only - General]

Ping?

> -Original Message-
> From: Deucher, Alexander 
> Sent: Monday, January 29, 2024 10:56 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander 
> Subject: [PATCH] drm/amdgpu: bail on INFO IOCTL if the GPU is in reset
>
> This avoids queries to read registers or query the SMU for telemetry data
> while the GPU is in reset. This mirrors what we already do for sysfs.
>
> Signed-off-by: Alex Deucher 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> index a2df3025a754..d522e99c6f81 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> @@ -607,6 +607,9 @@ int amdgpu_info_ioctl(struct drm_device *dev, void
> *data, struct drm_file *filp)
>   int i, found, ret;
>   int ui32_size = sizeof(ui32);
>
> + if (amdgpu_in_reset(adev))
> + return -EPERM;
> +
>   if (!info->return_size || !info->return_pointer)
>   return -EINVAL;
>
> --
> 2.42.0
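
The ordering in the patch above (reject the call while the device is in reset, before any argument validation or hardware access) can be illustrated with a small stand-alone model. Everything here is hypothetical: struct fake_dev, struct fake_info and fake_info_ioctl() are made-up stand-ins for illustration, not amdgpu types or functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the device and the ioctl argument. */
struct fake_dev {
	bool in_reset;
};

struct fake_info {
	size_t return_size;
	void *return_pointer;
};

static int fake_info_ioctl(const struct fake_dev *dev,
			   const struct fake_info *info)
{
	/* Bail out first: nothing below may touch the hardware during reset. */
	if (dev->in_reset)
		return -EPERM;

	if (!info->return_size || !info->return_pointer)
		return -EINVAL;

	return 0; /* a real handler would service the query here */
}

/* The reset check wins even when the arguments are also invalid. */
static void self_check(void)
{
	struct fake_dev dev = { .in_reset = true };
	struct fake_info bad = { 0, NULL };

	assert(fake_info_ioctl(&dev, &bad) == -EPERM);
}
```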



RE: [PATCH] drm/amdgpu/soc21: update VCN 4 max HEVC encoding resolution

2024-02-06 Thread Deucher, Alexander
[AMD Official Use Only - General]

> -Original Message-
> From: amd-gfx  On Behalf Of Thong
> Sent: Tuesday, February 6, 2024 6:28 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Thai, Thong 
> Subject: [PATCH] drm/amdgpu/soc21: update VCN 4 max HEVC encoding
> resolution
>
> Update the maximum resolution reported for HEVC encoding on VCN 4 devices
> to reflect its 8K encoding capability.
>

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3159

With that added,
Acked-by: Alex Deucher 

> Signed-off-by: Thong 
> ---
>  drivers/gpu/drm/amd/amdgpu/soc21.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/soc21.c
> b/drivers/gpu/drm/amd/amdgpu/soc21.c
> index 48c6efcdeac9..4d7188912edf 100644
> --- a/drivers/gpu/drm/amd/amdgpu/soc21.c
> +++ b/drivers/gpu/drm/amd/amdgpu/soc21.c
> @@ -50,13 +50,13 @@ static const struct amd_ip_funcs
> soc21_common_ip_funcs;
>  /* SOC21 */
>  static const struct amdgpu_video_codec_info
> vcn_4_0_0_video_codecs_encode_array_vcn0[] = {
>
>   {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_MPEG4
> _AVC, 4096, 2304, 0)},
> - {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_HEVC,
> 4096, 2304, 0)},
> + {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_HEVC,
> 8192, 4352,
> +0)},
>   {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_AV1,
> 8192, 4352, 0)},  };
>
>  static const struct amdgpu_video_codec_info
> vcn_4_0_0_video_codecs_encode_array_vcn1[] = {
>
>   {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_MPEG4
> _AVC, 4096, 2304, 0)},
> - {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_HEVC,
> 4096, 2304, 0)},
> + {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_HEVC,
> 8192, 4352,
> +0)},
>  };
>
>  static const struct amdgpu_video_codecs
> vcn_4_0_0_video_codecs_encode_vcn0 = {
> --
> 2.34.1



RE: [PATCH] drm/amd: Set s0i3/s3 in prepare() callback instead of suspend() callback

2024-02-06 Thread Deucher, Alexander
[AMD Official Use Only - General]

> -Original Message-
> From: amd-gfx  On Behalf Of Mario
> Limonciello
> Sent: Tuesday, February 6, 2024 4:32 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Limonciello, Mario ; Jürg Billeter
> 
> Subject: [PATCH] drm/amd: Set s0i3/s3 in prepare() callback instead of
> suspend() callback
>
> commit 5095d5418193 ("drm/amd: Evict resources during PM ops prepare()
> callback") intentionally moved the eviction of resources to earlier in the
> suspend process, but this introduced a subtle change that it occurs before
> adev->in_s0ix or adev->in_s3 are set. This meant that APUs actually started to
> evict resources at suspend time as well.
>
> Move the s0i3/s3 setting flags into prepare() to ensure that they're set
> during eviction. Drop the existing call to return 1 in this case because the
> suspend() callback looks for the flags too.
>
> Reported-by: Jürg Billeter 
> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3132#note_2271038
> Fixes: 5095d5418193 ("drm/amd: Evict resources during PM ops prepare()
> callback")
> Signed-off-by: Mario Limonciello 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 14 --
>  1 file changed, 4 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index b74f68a15802..190b2ee9e36b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -2464,12 +2464,10 @@ static int amdgpu_pmops_prepare(struct device
> *dev)
>   pm_runtime_suspended(dev))
>   return 1;
>
> - /* if we will not support s3 or s2i for the device
> -  *  then skip suspend
> -  */
> - if (!amdgpu_acpi_is_s0ix_active(adev) &&
> - !amdgpu_acpi_is_s3_active(adev))
> - return 1;
> + if (amdgpu_acpi_is_s0ix_active(adev))
> + adev->in_s0ix = true;
> + else if (amdgpu_acpi_is_s3_active(adev))
> + adev->in_s3 = true;
>

Will resume always get called to clear these after prepare?  Will these
ever get set and then not unset?

Alex

>   return amdgpu_device_prepare(drm_dev);  } @@ -2484,10 +2482,6
> @@ static int amdgpu_pmops_suspend(struct device *dev)
>   struct drm_device *drm_dev = dev_get_drvdata(dev);
>   struct amdgpu_device *adev = drm_to_adev(drm_dev);
>
> - if (amdgpu_acpi_is_s0ix_active(adev))
> - adev->in_s0ix = true;
> - else if (amdgpu_acpi_is_s3_active(adev))
> - adev->in_s3 = true;
>   if (!adev->in_s0ix && !adev->in_s3)
>   return 0;
>   return amdgpu_device_suspend(drm_dev, true);
> --
> 2.34.1



RE: [PATCH] drm/amdkfd: Initialize kfd_gpu_cache_info for KFD topology

2024-02-06 Thread Deucher, Alexander
[AMD Official Use Only - General]

> -Original Message-
> From: Kuehling, Felix 
> Sent: Tuesday, February 6, 2024 4:15 PM
> To: Greathouse, Joseph ; amd-
> g...@lists.freedesktop.org; Deucher, Alexander
> 
> Subject: Re: [PATCH] drm/amdkfd: Initialize kfd_gpu_cache_info for KFD
> topology
>
>
> On 2024-02-06 15:55, Joseph Greathouse wrote:
> > The current kfd_gpu_cache_info structure is only partially filled in
> > for some architectures. This means that for devices where we do not
> > fill in some fields, we can return uninitialized values through the
> > KFD topology.
> > Zero out the kfd_gpu_cache_info before asking the remaining fields to
> > be filled in by lower-level functions.
> >
> > Signed-off-by: Joseph Greathouse 
>
> This fixes your previous patch "drm/amdkfd: Add cache line sizes to KFD
> topology". Alex, I think the previous patch hasn't gone upstream yet. Do you
> want a Fixes: tag or is it possible to squash this with Joe's previous patch
> before upstreaming?

Either way.  I can fix up the tag when we upstream or squash it.

Alex

>
> One nit-pick below.
>
>
> > ---
> >   drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 1 +
> >   1 file changed, 1 insertion(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> > b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> > index 3df2a8ad86fb..67c1e7f84750 100644
> > --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> > @@ -1707,6 +1707,7 @@ static void kfd_fill_cache_non_crat_info(struct
> > kfd_topology_device *dev, struct
> >
> > gpu_processor_id = dev->node_props.simd_id_base;
> >
> > +   memset(cache_info, 0, sizeof(struct kfd_gpu_cache_info) *
> > +KFD_MAX_CACHE_TYPES);
>
> Just use sizeof(cache_info). No need to calculate the size of the array and
> risk getting it wrong.
>
> Regards,
>Felix
>
>
> > pcache_info = cache_info;
> > num_of_cache_types = kfd_get_gpu_cache_info(kdev, &pcache_info);
> > if (!num_of_cache_types) {
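
The sizeof suggestion in the review above relies on cache_info being an array in the scope where memset() is called. A minimal sketch of the difference, using hypothetical names (struct cache_info, MAX_CACHE_TYPES and both helpers are illustrative, not the KFD definitions):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_CACHE_TYPES 6 /* hypothetical array length */

struct cache_info {
	unsigned int cache_size;
	unsigned int cache_level;
	unsigned int cache_line_size;
};

/*
 * The array is declared in this scope, so sizeof(cache_info) is the size
 * of the whole array and there is no element count to get wrong.
 */
static size_t zero_local_array(void)
{
	struct cache_info cache_info[MAX_CACHE_TYPES];
	size_t i;

	memset(cache_info, 0, sizeof(cache_info));
	for (i = 0; i < MAX_CACHE_TYPES; i++)
		assert(cache_info[i].cache_size == 0);

	return sizeof(cache_info);
}

/*
 * Through a pointer, sizeof(p) would only be the size of the pointer,
 * so the element count has to be passed explicitly.
 */
static size_t zero_via_pointer(struct cache_info *p, size_t n)
{
	memset(p, 0, n * sizeof(*p));
	return n * sizeof(*p);
}
```

The shorthand therefore only applies where the array declaration itself is visible; once the buffer has been passed to another function as a pointer, the explicit count is needed again.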


Re: drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole (v2)

2024-02-06 Thread Deucher, Alexander
[AMD Official Use Only - General]

The firmware has not been released yet.  It's still undergoing regression
testing.

Alex



From: Shengyu Qu
Sent: Tuesday, February 6, 2024 5:08 AM
To: Deucher, Alexander; Kuehling, Felix; amd-gfx@lists.freedesktop.org
Cc: wiagn...@outlook.com; Cornwall, Jay; Koenig, Christian; Paneer Selvam, 
Arunpravin
Subject: Re: drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole (v2)

Hi Alexander,

On 2024/2/6 1:12, Deucher, Alexander wrote:

Are you only seeing the problem with this patch applied or in general?  If you
are seeing it in general, it is likely related to a firmware issue that was
recently fixed and will be resolved with an updated CP firmware image.
Driver side changes:
https://gitlab.freedesktop.org/agd5f/linux/-/commit/0eb6c664b780dd1b4080e047ad51b100cd7840a3
https://gitlab.freedesktop.org/agd5f/linux/-/commit/40970e60070ed3d1390ec65e38e819f6d81b8f0c

Alex


This problem is not affected by this patch, so it is possibly the firmware
issue. Where can I get the newest firmware image? Or has it already been
pushed to the linux-firmware repo?

Best regards,
Shengyu



Re: [PATCH v2] amdkfd: pass debug exceptions to second-level trap handler

2024-02-06 Thread Deucher, Alexander
[AMD Official Use Only - General]

Acked-by: Alex Deucher 

From: amd-gfx  on behalf of Laurent 
Morichetti 
Sent: Thursday, February 1, 2024 4:33 PM
To: amd-gfx@lists.freedesktop.org 
Cc: jay.cornwall@amd.com ; Morichetti, Laurent 
; Six, Lancelot ; Cornwall, 
Jay 
Subject: [PATCH v2] amdkfd: pass debug exceptions to second-level trap handler

Call the 2nd level trap handler if the cwsr handler is entered with any
one of wave_start, wave_end, or trap_after_inst exceptions.

Signed-off-by: Laurent Morichetti 
Tested-by: Lancelot Six 
Reviewed-by: Jay Cornwall 
---
 drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h  |  2 +-
 .../drm/amd/amdkfd/cwsr_trap_handler_gfx10.asm  | 17 -
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h 
b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
index d1caaf0e6a7c..2e9b64edb8d2 100644
--- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
+++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
@@ -2518,7 +2518,7 @@ static const uint32_t cwsr_trap_gfx11_hex[] = {
 0x8b6eff7b, 0x0400,
 0xbfa20045, 0xbf830010,
 0xb8fbf803, 0xbfa0fffa,
-   0x8b6eff7b, 0x0900,
+   0x8b6eff7b, 0x00160900,
 0xbfa20015, 0x8b6eff7b,
 0x71ff, 0xbfa10008,
 0x8b6fff7b, 0x7080,
diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx10.asm 
b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx10.asm
index 71b3dc0c7363..7568ff3af978 100644
--- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx10.asm
+++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx10.asm
@@ -81,6 +81,11 @@ var SQ_WAVE_TRAPSTS_POST_SAVECTX_SHIFT   = 11
 var SQ_WAVE_TRAPSTS_POST_SAVECTX_SIZE   = 21
 var SQ_WAVE_TRAPSTS_ILLEGAL_INST_MASK   = 0x800
 var SQ_WAVE_TRAPSTS_EXCP_HI_MASK= 0x7000
+#if ASIC_FAMILY >= CHIP_PLUM_BONITO
+var SQ_WAVE_TRAPSTS_WAVE_START_MASK= 0x2
+var SQ_WAVE_TRAPSTS_WAVE_END_MASK  = 0x4
+var SQ_WAVE_TRAPSTS_TRAP_AFTER_INST_MASK   = 0x10
+#endif

 var SQ_WAVE_MODE_EXCP_EN_SHIFT  = 12
 var SQ_WAVE_MODE_EXCP_EN_ADDR_WATCH_SHIFT   = 19
@@ -92,6 +97,16 @@ var SQ_WAVE_IB_STS_RCNT_FIRST_REPLAY_MASK= 0x003F8000

 var SQ_WAVE_MODE_DEBUG_EN_MASK  = 0x800

+#if ASIC_FAMILY < CHIP_PLUM_BONITO
+var S_TRAPSTS_NON_MASKABLE_EXCP_MASK   = 
SQ_WAVE_TRAPSTS_MEM_VIOL_MASK|SQ_WAVE_TRAPSTS_ILLEGAL_INST_MASK
+#else
+var S_TRAPSTS_NON_MASKABLE_EXCP_MASK   = SQ_WAVE_TRAPSTS_MEM_VIOL_MASK 
|\
+ 
SQ_WAVE_TRAPSTS_ILLEGAL_INST_MASK  |\
+ 
SQ_WAVE_TRAPSTS_WAVE_START_MASK|\
+ SQ_WAVE_TRAPSTS_WAVE_END_MASK 
 |\
+ 
SQ_WAVE_TRAPSTS_TRAP_AFTER_INST_MASK
+#endif
+
 // bits [31:24] unused by SPI debug data
 var TTMP11_SAVE_REPLAY_W64H_SHIFT   = 31
 var TTMP11_SAVE_REPLAY_W64H_MASK= 0x8000
@@ -224,7 +239,7 @@ L_NOT_HALTED:
 // Check non-maskable exceptions. memory_violation, illegal_instruction
 // and xnack_error exceptions always cause the wave to enter the trap
 // handler.
-   s_and_b32   ttmp2, s_save_trapsts, 
SQ_WAVE_TRAPSTS_MEM_VIOL_MASK|SQ_WAVE_TRAPSTS_ILLEGAL_INST_MASK
+   s_and_b32   ttmp2, s_save_trapsts, S_TRAPSTS_NON_MASKABLE_EXCP_MASK
 s_cbranch_scc1  L_FETCH_2ND_TRAP

 // Check for maskable exceptions in trapsts.excp and trapsts.excp_hi.

base-commit: c4b562a17829454713e45219fa754be1bfda9004
--
2.25.1



RE: drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole (v2)

2024-02-05 Thread Deucher, Alexander
[Public]

> -Original Message-
> From: amd-gfx  On Behalf Of
> Shengyu Qu
> Sent: Saturday, February 3, 2024 8:05 AM
> To: Kuehling, Felix ; amd-gfx@lists.freedesktop.org
> Cc: wiagn...@outlook.com; Cornwall, Jay ;
> Koenig, Christian ; Paneer Selvam, Arunpravin
> 
> Subject: Re: drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole (v2)
>
> Hi Felix,
> Sorry for my late reply. I was busy this week.
> I just did some more tests using next-20240202 branch. Testing using blender
> 4.0.2, when only one HIP render task is running, there's no problem.
> However, when two tasks run together, the software always crashes, but it
> does not crash the whole system. Dmesg reports a GPU reset in most cases,
> for example:
>
> [  176.071823] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=32608, emitted seq=32610
> [  176.072000] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process blender pid 4256 thread blender:cs0 pid 4297
> [  176.072143] amdgpu :03:00.0: amdgpu: GPU reset begin!
> [  176.073571] amdgpu :03:00.0: amdgpu: Guilty job already signaled, skipping HW reset
> [  176.073593] amdgpu :03:00.0: amdgpu: GPU reset(4) succeeded!
>
> And in some rare cases, there would be a page fault report, see dmesg.log.
> Do you have any idea? Can I make it print more detailed diagnostic
> information?

Are you only seeing the problem with this patch applied or in general?  If you
are seeing it in general, it is likely related to a firmware issue that was
recently fixed and will be resolved with an updated CP firmware image.
Driver side changes:
https://gitlab.freedesktop.org/agd5f/linux/-/commit/0eb6c664b780dd1b4080e047ad51b100cd7840a3
https://gitlab.freedesktop.org/agd5f/linux/-/commit/40970e60070ed3d1390ec65e38e819f6d81b8f0c

Alex


>
> Best regards,
> Shengyu
>
>
> On 2024/1/30 01:47, Felix Kuehling wrote:
> > On 2024-01-29 10:24, Shengyu Qu wrote:
> >> Hello Felix,
> >> I think you are right. This problem has existed for years (just look
> >> at the issue creation time in my link), and is thought to be caused by
> >> OpenGL-ROCM interop (that's why I think this patch might help). It is
> >> very easy to trigger this problem in blender (the method is also mentioned
> >> in the link).
> >
> > This doesn't help you, but it's unlikely that this has been the same
> > issue for two years for everybody who chimed into this bug report.
> > Different kernel versions, GPUs, user mode ROCm and Mesa versions etc.
> >
> > Case in point, it's possible that you're seeing an issue specific to
> > RDNA3, which hasn't even been around for that long.
> >
> >
> >> Do
> >> you have any idea about this?
> >
> > Not without seeing a lot more diagnostic information. A full backtrace
> > from your kernel log would be a good start.
> >
> > Regards,
> >   Felix
> >
> >
> >> Best regards,
> >> Shengyu
> >> On 2024/1/29 22:51, Felix Kuehling wrote:
> >>> On 2024-01-29 8:58, Shengyu Qu wrote:
>  Hi,
>  It seems the rocm-opengl interop hang problem still exists [1]. Btw, have
>  you looked into this problem?
>  Best regards,
>  Shengyu
>  [1]
>  https://projects.blender.org/blender/blender/issues/100353#issuecomment-599
> >>>
> >>> Maybe you're having a different problem. Do you see this issue also
> >>> without any version of the "Relocate TBA/TMA ..." patch?
> >>>
> >>> Regards,
> >>>   Felix
> >>>
> >>>
> 
>  On 2024/1/27 03:15, Shengyu Qu wrote:
> > Hello Felix,
> > This patch seems to work on my system; it also seems to fix the
> > ROCM/OpenGL interop problem.
> > Is this intended to happen or not? Maybe we need more users to
> > test it.
> > Besides,
> > Tested-by: Shengyu Qu
> > Best Regards,
> > Shengyu
> >
> > On 2024/1/26 06:27, Felix Kuehling wrote:
> >> The TBA and TMA, along with an unused IB allocation, reside at
> >> low addresses in the VM address space. A stray VM fault which
> >> hits these pages must be serviced by making their page table entries
> invalid.
> >> The scheduler depends upon these pages being resident and fails,
> >> preventing a debugger from inspecting the failure state.
> >>
> >> By relocating these pages above 47 bits in the VM address space
> >> they can only be reached when bits [63:48] are set to 1. This
> >> makes it much less likely for a misbehaving program to generate
> >> accesses to them.
> >> The current placement at VA (PAGE_SIZE*2) is readily hit by a
> >> NULL access with a small offset.
> >>
> >> v2:
> >> - Move it to the reserved space to avoid conflicts with Mesa
> >> - Add macros to make reserved space management easier
> >>
> >> Cc: Arunpravin Paneer Selvam 
> >> Cc: Christian Koenig 
> >> Signed-off-by: Jay Cornwall 
> >> Signed-off-by: Felix Kuehling 
> >> ---
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c  |  4 +--
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c|  7 ++---
> >> 

Re: [PATCH] drm/amd/display: Clear phantom stream count and plane count

2024-02-05 Thread Deucher, Alexander
[Public]

Acked-by: Alex Deucher 

From: amd-gfx  on behalf of Mario 
Limonciello 
Sent: Friday, February 2, 2024 7:30 PM
To: amd-gfx@lists.freedesktop.org 
Cc: Limonciello, Mario 
Subject: [PATCH] drm/amd/display: Clear phantom stream count and plane count

When dc_state_destruct() was refactored the new phantom_stream_count
and phantom_plane_count members weren't cleared.

Fixes: 012a04b1d6af ("drm/amd/display: Refactor phantom resource allocation")
Signed-off-by: Mario Limonciello 
---
 drivers/gpu/drm/amd/display/dc/core/dc_state.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_state.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_state.c
index 88c6436b28b6..180ac47868c2 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_state.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_state.c
@@ -291,11 +291,14 @@ void dc_state_destruct(struct dc_state *state)
 dc_stream_release(state->phantom_streams[i]);
 state->phantom_streams[i] = NULL;
 }
+   state->phantom_stream_count = 0;

 for (i = 0; i < state->phantom_plane_count; i++) {
 dc_plane_state_release(state->phantom_planes[i]);
 state->phantom_planes[i] = NULL;
 }
+   state->phantom_plane_count = 0;
+
 state->stream_mask = 0;
 memset(&state->res_ctx, 0, sizeof(state->res_ctx));
 memset(&state->pp_display_cfg, 0, sizeof(state->pp_display_cfg));
--
2.34.1
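
Why the counts need to be reset along with the pointers can be seen in a small stand-alone model. This is a sketch under stated assumptions: struct item, struct state and the two helpers are hypothetical and only mimic the release-then-clear pattern, they are not the DC data structures.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ITEMS 4 /* hypothetical capacity */

struct item {
	int refcount;
};

struct state {
	struct item *items[MAX_ITEMS];
	int item_count;
};

/* Take a reference and append the item to the state. */
static void state_add(struct state *s, struct item *it)
{
	assert(s->item_count < MAX_ITEMS);
	it->refcount++;
	s->items[s->item_count++] = it;
}

/* Release every tracked item, then reset the count as well. */
static void state_destruct(struct state *s)
{
	int i;

	for (i = 0; i < s->item_count; i++) {
		s->items[i]->refcount--;
		s->items[i] = NULL;
	}
	/*
	 * Without this line a second destruct would walk the stale count
	 * and dereference the NULL slots, and a later add would append
	 * behind them instead of starting at index 0.
	 */
	s->item_count = 0;
}
```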



RE: [PATCH 2/2] drm/amdgpu: reset gpu for s3 suspend abort case

2024-02-02 Thread Deucher, Alexander
[Public]

> -Original Message-
> From: Liang, Prike 
> Sent: Thursday, February 1, 2024 3:58 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Sharma, Deepak
> ; Liang, Prike 
> Subject: [PATCH 2/2] drm/amdgpu: reset gpu for s3 suspend abort case
>
> In the s3 suspend abort case, some types of gfx9 power rail are not turned
> off from the FCH side, which leaves the GPU in an unknown power state. Reset
> the GPU to a known good power state before reinitializing the device.
>
> Signed-off-by: Prike Liang 

Acked-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/amdgpu/soc15.c | 22 ++
>  1 file changed, 22 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c
> b/drivers/gpu/drm/amd/amdgpu/soc15.c
> index 15033efec2ba..c64c01e2944a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
> @@ -1298,10 +1298,32 @@ static int soc15_common_suspend(void
> *handle)
>   return soc15_common_hw_fini(adev);
>  }
>
> +static bool soc15_need_reset_on_resume(struct amdgpu_device *adev) {
> + u32 sol_reg;
> +
> + sol_reg = RREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_81);
> +
> + /* Will reset for the following suspend abort cases.
> +  * 1) Only reset limit on APU side, dGPU hasn't checked yet.
> +  * 2) S3 suspend abort and TOS already launched.
> +  */
> + if (adev->flags & AMD_IS_APU && adev->in_s3 &&
> + !adev->suspend_complete &&
> + sol_reg)
> + return true;
> +
> + return false;
> +}
> +
>  static int soc15_common_resume(void *handle)  {
>   struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>
> + if (soc15_need_reset_on_resume(adev)) {
> + dev_info(adev->dev, "S3 suspend abort case, let's reset
> ASIC.\n");
> + soc15_asic_reset(adev);
> + }
>   return soc15_common_hw_init(adev);
>  }
>
> --
> 2.34.1



RE: [PATCH 1/2] drm/amdgpu: skip to program GFXDEC registers for suspend abort

2024-02-02 Thread Deucher, Alexander
[AMD Official Use Only - General]

> -Original Message-
> From: Liang, Prike 
> Sent: Thursday, February 1, 2024 3:58 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Sharma, Deepak
> ; Liang, Prike 
> Subject: [PATCH 1/2] drm/amdgpu: skip to program GFXDEC registers for
> suspend abort
>
> In the suspend abort cases, the gfx power rail doesn't turn off, so some
> GFXDEC registers/CSB can't be reset to their default values, and
> reinitializing GFXDEC/CSB at this point results in an unexpected error.
> So let's skip that programming sequence for the suspend abort case.
>
> Signed-off-by: Prike Liang 

Reviewed-by: Alex Deucher 


> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 ++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 2 ++
>  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c   | 8 
>  3 files changed, 12 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index c5f3859fd682..312dfaec7b4a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -1079,6 +1079,8 @@ struct amdgpu_device {
>   boolin_s3;
>   boolin_s4;
>   boolin_s0ix;
> + /* indicate amdgpu suspension status */
> + boolsuspend_complete;
>
>   enum pp_mp1_state   mp1_state;
>   struct amdgpu_doorbell_index doorbell_index; diff --git
> a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index 475bd59c9ac2..59254144916c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -2472,6 +2472,7 @@ static int amdgpu_pmops_suspend(struct device
> *dev)
>   struct drm_device *drm_dev = dev_get_drvdata(dev);
>   struct amdgpu_device *adev = drm_to_adev(drm_dev);
>
> + adev->suspend_complete = false;
>   if (amdgpu_acpi_is_s0ix_active(adev))
>   adev->in_s0ix = true;
>   else if (amdgpu_acpi_is_s3_active(adev)) @@ -2486,6 +2487,7 @@
> static int amdgpu_pmops_suspend_noirq(struct device *dev)
>   struct drm_device *drm_dev = dev_get_drvdata(dev);
>   struct amdgpu_device *adev = drm_to_adev(drm_dev);
>
> + adev->suspend_complete = true;
>   if (amdgpu_acpi_should_gpu_reset(adev))
>   return amdgpu_asic_reset(adev);
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> index 57808be6e3ec..169d45268ef6 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> @@ -3034,6 +3034,14 @@ static int gfx_v9_0_cp_gfx_start(struct
> amdgpu_device *adev)
>
>   gfx_v9_0_cp_gfx_enable(adev, true);
>
> + /* Now only limit the quirk on the APU gfx9 series and already
> +  * confirmed that the APU gfx10/gfx11 needn't such update.
> +  */
> + if (adev->flags & AMD_IS_APU &&
> + adev->in_s3 && !adev->suspend_complete) {
> + DRM_INFO(" Will skip the CSB packet resubmit\n");
> + return 0;
> + }
>   r = amdgpu_ring_alloc(ring, gfx_v9_0_get_csb_size(adev) + 4 + 3);
>   if (r) {
>   DRM_ERROR("amdgpu: cp failed to lock ring (%d).\n", r);
> --
> 2.34.1



RE: [PATCH] drm/amdgpu: Fix potential out-of-bounds access in 'amdgpu_discovery_reg_base_init()'

2024-02-02 Thread Deucher, Alexander
[Public]

> -Original Message-
> From: SHANMUGAM, SRINIVASAN 
> Sent: Thursday, February 1, 2024 12:36 PM
> To: Deucher, Alexander ; Koenig, Christian
> 
> Cc: amd-gfx@lists.freedesktop.org; SHANMUGAM, SRINIVASAN
> 
> Subject: [PATCH] drm/amdgpu: Fix potential out-of-bounds access in
> 'amdgpu_discovery_reg_base_init()'
>
> The issue arises when the array 'adev->vcn.vcn_config' is accessed before
> checking if the index 'adev->vcn.num_vcn_inst' is within the bounds of the
> array.
>
> The fix involves moving the bounds check before the array access. This ensures
> that 'adev->vcn.num_vcn_inst' is within the bounds of the array before it is
> used as an index.
>
> Fixes the below:
> drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c:1289
> amdgpu_discovery_reg_base_init() error: testing array offset 'adev-
> >vcn.num_vcn_inst' after use.
>
> Cc: Christian König 
> Cc: Alex Deucher 
> Signed-off-by: Srinivasan Shanmugam 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> index ef800590c1ab..83da46d73f70 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> @@ -1282,11 +1282,11 @@ static int
> amdgpu_discovery_reg_base_init(struct amdgpu_device *adev)
>* 0b10 : encode is disabled
>* 0b01 : decode is disabled
>*/
> - adev->vcn.vcn_config[adev-
> >vcn.num_vcn_inst] =
> - ip->revision & 0xc0;
> - ip->revision &= ~0xc0;
>   if (adev->vcn.num_vcn_inst <
>   AMDGPU_MAX_VCN_INSTANCES) {
> + adev->vcn.vcn_config[adev-
> >vcn.num_vcn_inst] =
> + ip->revision & 0xc0;
> + ip->revision &= ~0xc0;

I have vague recollections of this being this way for a reason, but I can't
recall why at this time.  That said, the `ip->revision &= ~0xc0;` should
always be executed, not just if the number of instances < MAX_VCN_INSTANCES.
So I would move that line after the if/else block.

Alex


>   adev->vcn.num_vcn_inst++;
>   adev->vcn.inst_mask |=
>   (1U << ip->instance_number);
> --
> 2.34.1
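
The arrangement suggested in the review comment above (bounds check before the array store, with the config bits stripped from the revision unconditionally) might look roughly like the following stand-alone sketch. The names are hypothetical: struct vcn_model, MAX_INST and record_instance() only model the idea and are not the amdgpu discovery code.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_INST 4 /* hypothetical instance limit */

struct vcn_model {
	uint8_t config[MAX_INST];
	unsigned int num_inst;
};

/*
 * Store the two config bits (0xc0) for a new instance only if there is
 * room, and return the revision with those bits stripped in every case.
 */
static uint8_t record_instance(struct vcn_model *v, uint8_t revision)
{
	if (v->num_inst < MAX_INST) {
		v->config[v->num_inst] = revision & 0xc0;
		v->num_inst++;
	}

	/* Executed whether or not the instance was recorded. */
	return (uint8_t)(revision & 0x3f);
}

/* A full table must not be written past its end. */
static void self_check(void)
{
	struct vcn_model v = { { 0 }, MAX_INST };

	assert(record_instance(&v, 0xc5) == 0x05);
	assert(v.num_inst == MAX_INST);
}
```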



RE: [PATCH] drm/amdgpu: Clear the hotplug interrupt ack bit before hpd initialization

2024-01-30 Thread Deucher, Alexander
[Public]

> -Original Message-
> From: amd-gfx  On Behalf Of Qiang
> Ma
> Sent: Tuesday, January 30, 2024 4:35 AM
> To: lexander.deuc...@amd.com; Koenig, Christian
> ; Pan, Xinhui ;
> airl...@gmail.com; dan...@ffwll.ch; sunran...@208suo.com;
> SHANMUGAM, SRINIVASAN 
> Cc: Qiang Ma ; dri-de...@lists.freedesktop.org;
> amd-gfx@lists.freedesktop.org; linux-ker...@vger.kernel.org
> Subject: [PATCH] drm/amdgpu: Clear the hotplug interrupt ack bit before hpd
> initialization
>
> Problem:
> The computer in the bios initialization process, unplug the HDMI display, wait
> until the system up, plug in the HDMI display, did not enter the hotplug
> interrupt function, the display is not bright.
>
> Fix:
> After the above problem occurs, and the hpd ack interrupt bit is 1, the
> interrupt should be cleared during hpd_init initialization so that when the
> driver is ready, it can respond to the hpd interrupt normally.
>
> Signed-off-by: Qiang Ma 
> ---
>  drivers/gpu/drm/amd/amdgpu/dce_v10_0.c |  2 ++
> drivers/gpu/drm/amd/amdgpu/dce_v11_0.c |  2 ++
> drivers/gpu/drm/amd/amdgpu/dce_v6_0.c  | 20 +---
> drivers/gpu/drm/amd/amdgpu/dce_v8_0.c  | 20 +---
>  4 files changed, 38 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c
> b/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c
> index bb666cb7522e..11859059fd10 100644
> --- a/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c
> @@ -51,6 +51,7 @@
>
>  static void dce_v10_0_set_display_funcs(struct amdgpu_device *adev);
> static void dce_v10_0_set_irq_funcs(struct amdgpu_device *adev);
> +static void dce_v10_0_hpd_int_ack(struct amdgpu_device *adev, int hpd);
>
>  static const u32 crtc_offsets[] = {
>   CRTC0_REGISTER_OFFSET,
> @@ -363,6 +364,7 @@ static void dce_v10_0_hpd_init(struct
> amdgpu_device *adev)
>
> AMDGPU_HPD_DISCONNECT_INT_DELAY_IN_MS);
>   WREG32(mmDC_HPD_TOGGLE_FILT_CNTL +
> hpd_offsets[amdgpu_connector->hpd.hpd], tmp);
>
> + dce_v6_0_hpd_int_ack(adev, amdgpu_connector->hpd.hpd);


Should be dce_v10_0_hpd_int_ack().

>   dce_v10_0_hpd_set_polarity(adev, amdgpu_connector-
> >hpd.hpd);
>   amdgpu_irq_get(adev, &adev->hpd_irq,
>  amdgpu_connector->hpd.hpd);
> diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
> b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
> index 7af277f61cca..745e4fdffade 100644
> --- a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
> @@ -51,6 +51,7 @@
>
>  static void dce_v11_0_set_display_funcs(struct amdgpu_device *adev);
> static void dce_v11_0_set_irq_funcs(struct amdgpu_device *adev);
> +static void dce_v11_0_hpd_int_ack(struct amdgpu_device *adev, int hpd);
>
>  static const u32 crtc_offsets[] =
>  {
> @@ -387,6 +388,7 @@ static void dce_v11_0_hpd_init(struct
> amdgpu_device *adev)
>
> AMDGPU_HPD_DISCONNECT_INT_DELAY_IN_MS);
>   WREG32(mmDC_HPD_TOGGLE_FILT_CNTL +
> hpd_offsets[amdgpu_connector->hpd.hpd], tmp);
>
> + dce_v11_0_hpd_int_ack(adev, amdgpu_connector-
> >hpd.hpd);
>   dce_v11_0_hpd_set_polarity(adev, amdgpu_connector-
> >hpd.hpd);
>   amdgpu_irq_get(adev, &adev->hpd_irq, amdgpu_connector-
> >hpd.hpd);
>   }
> diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v6_0.c
> b/drivers/gpu/drm/amd/amdgpu/dce_v6_0.c
> index 143efc37a17f..f8e15ebf74b4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/dce_v6_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/dce_v6_0.c
> @@ -272,6 +272,21 @@ static void dce_v6_0_hpd_set_polarity(struct
> amdgpu_device *adev,
>   WREG32(mmDC_HPD1_INT_CONTROL + hpd_offsets[hpd], tmp);  }
>
> +static void dce_v6_0_hpd_int_ack(struct amdgpu_device *adev,
> +  int hpd)
> +{
> + u32 tmp;
> +
> + if (hpd >= adev->mode_info.num_hpd) {
> + DRM_DEBUG("invalid hdp %d\n", hpd);
> + return;
> + }
> +
> + tmp = RREG32(mmDC_HPD1_INT_CONTROL + hpd_offsets[hpd]);
> + tmp |= DC_HPD1_INT_CONTROL__DC_HPD1_INT_ACK_MASK;
> + WREG32(mmDC_HPD1_INT_CONTROL + hpd_offsets[hpd], tmp); }
> +
>  /**
>   * dce_v6_0_hpd_init - hpd setup callback.
>   *
> @@ -311,6 +326,7 @@ static void dce_v6_0_hpd_init(struct amdgpu_device
> *adev)
>   continue;
>   }
>
> + dce_v6_0_hpd_int_ack(adev, amdgpu_connector->hpd.hpd);
>   dce_v6_0_hpd_set_polarity(adev, amdgpu_connector-
> >hpd.hpd);
>   amdgpu_irq_get(adev, &adev->hpd_irq, amdgpu_connector-
> >hpd.hpd);
>   }
> @@ -3101,9 +3117,7 @@ static int dce_v6_0_hpd_irq(struct amdgpu_device
> *adev,
>   mask = interrupt_status_offsets[hpd].hpd;
>
>   if (disp_int & mask) {
> - tmp = RREG32(mmDC_HPD1_INT_CONTROL +
> hpd_offsets[hpd]);
> - tmp |=
> DC_HPD1_INT_CONTROL__DC_HPD1_INT_ACK_MASK;
> - WREG32(mmDC_HPD1_INT_CONTROL + hpd_offsets[hpd],
> tmp);
> + 

Re: [PATCH] drm/amdgpu: remove golden setting for gfx 11.5.0

2024-01-30 Thread Deucher, Alexander
[AMD Official Use Only - General]

Acked-by: Alex Deucher 

From: Zhang, Yifan 
Sent: Monday, January 29, 2024 4:06 AM
To: amd-gfx@lists.freedesktop.org 
Cc: Deucher, Alexander ; Koenig, Christian 
; Huang, Tim ; Yu, Lang 
; Zhang, Yifan 
Subject: [PATCH] drm/amdgpu: remove golden setting for gfx 11.5.0

No need to set golden settings in driver from gfx 11.5.0 onwards

Signed-off-by: Yifan Zhang 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 32 ++
 1 file changed, 2 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index c1e10760..4e99af904e04 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -90,10 +90,6 @@ MODULE_FIRMWARE("amdgpu/gc_11_5_0_me.bin");
 MODULE_FIRMWARE("amdgpu/gc_11_5_0_mec.bin");
 MODULE_FIRMWARE("amdgpu/gc_11_5_0_rlc.bin");

-static const struct soc15_reg_golden golden_settings_gc_11_0[] = {
-   SOC15_REG_GOLDEN_VALUE(GC, 0, regTCP_CNTL, 0x2000, 0x2000)
-};
-
 static const struct soc15_reg_golden golden_settings_gc_11_0_1[] =
 {
 SOC15_REG_GOLDEN_VALUE(GC, 0, regCGTT_GS_NGG_CLK_CTRL, 0x9fff8fff, 
0x0010),
@@ -104,24 +100,8 @@ static const struct soc15_reg_golden 
golden_settings_gc_11_0_1[] =
 SOC15_REG_GOLDEN_VALUE(GC, 0, regPA_SC_ENHANCE_3, 0xfffd, 
0x0008),
 SOC15_REG_GOLDEN_VALUE(GC, 0, regPA_SC_VRS_SURFACE_CNTL_1, 0xfff891ff, 
0x55480100),
 SOC15_REG_GOLDEN_VALUE(GC, 0, regTA_CNTL_AUX, 0xf7f7, 0x0103),
-   SOC15_REG_GOLDEN_VALUE(GC, 0, regTCP_CNTL2, 0xfcff, 0x000a)
-};
-
-static const struct soc15_reg_golden golden_settings_gc_11_5_0[] = {
-   SOC15_REG_GOLDEN_VALUE(GC, 0, regDB_DEBUG5, 0x, 0x0800),
-   SOC15_REG_GOLDEN_VALUE(GC, 0, regGB_ADDR_CONFIG, 0x0c1807ff, 
0x0242),
-   SOC15_REG_GOLDEN_VALUE(GC, 0, regGCR_GENERAL_CNTL, 0x1ff1, 
0x0500),
-   SOC15_REG_GOLDEN_VALUE(GC, 0, regGL2A_ADDR_MATCH_MASK, 0x, 
0xfff3),
-   SOC15_REG_GOLDEN_VALUE(GC, 0, regGL2C_ADDR_MATCH_MASK, 0x, 
0xfff3),
-   SOC15_REG_GOLDEN_VALUE(GC, 0, regGL2C_CTRL, 0x, 0xf37fff3f),
-   SOC15_REG_GOLDEN_VALUE(GC, 0, regGL2C_CTRL3, 0xfffb, 0x00f40188),
-   SOC15_REG_GOLDEN_VALUE(GC, 0, regGL2C_CTRL4, 0xf0ff, 0x80009007),
-   SOC15_REG_GOLDEN_VALUE(GC, 0, regPA_CL_ENHANCE, 0xf1ff, 0x00880007),
-   SOC15_REG_GOLDEN_VALUE(GC, 0, regPC_CONFIG_CNTL_1, 0x, 
0x0001),
-   SOC15_REG_GOLDEN_VALUE(GC, 0, regTA_CNTL_AUX, 0xf7f7, 0x0103),
-   SOC15_REG_GOLDEN_VALUE(GC, 0, regTA_CNTL2, 0x007f, 0x),
-   SOC15_REG_GOLDEN_VALUE(GC, 0, regTCP_CNTL2, 0xffcf, 0x200a),
-   SOC15_REG_GOLDEN_VALUE(GC, 0, regUTCL1_CTRL_2, 0x, 0x048f)
+   SOC15_REG_GOLDEN_VALUE(GC, 0, regTCP_CNTL2, 0xfcff, 0x000a),
+   SOC15_REG_GOLDEN_VALUE(GC, 0, regTCP_CNTL, 0x2000, 0x2000)
 };

 #define DEFAULT_SH_MEM_CONFIG \
@@ -304,17 +284,9 @@ static void gfx_v11_0_init_golden_registers(struct 
amdgpu_device *adev)
 golden_settings_gc_11_0_1,
 (const 
u32)ARRAY_SIZE(golden_settings_gc_11_0_1));
 break;
-   case IP_VERSION(11, 5, 0):
-   soc15_program_register_sequence(adev,
-   golden_settings_gc_11_5_0,
-   (const 
u32)ARRAY_SIZE(golden_settings_gc_11_5_0));
-   break;
 default:
 break;
 }
-   soc15_program_register_sequence(adev,
-   golden_settings_gc_11_0,
-   (const 
u32)ARRAY_SIZE(golden_settings_gc_11_0));

 }

--
2.37.3



RE: Have WX 3200 Radeon graphics card -- cannot get X11 session to work

2024-01-29 Thread Deucher, Alexander
[Public]

> -Original Message-
> From: William Bulley 
> Sent: Sunday, January 28, 2024 2:38 PM
> To: Deucher, Alexander 
> Cc: amd-gfx@lists.freedesktop.org
> Subject: Re: Have WX 3200 Radeon graphics card -- cannot get X11 session to
> work
>
> According to "Deucher, Alexander"  on Fri,
> 01/26/24 at 16:28:
> >
> > [AMD Official Use Only - General]
> >
> > Make sure you have OS mouse and keyboard drivers loaded and configured
> > within your X config?
>
> I got it to work!!!  Thanks to all who helped.


Glad to hear you got it working!

Alex

>
> I got the clue I needed from this page this morning:
>
>https://fedoraproject.org/wiki/Input_device_configuration
>
> Here is the config that finally works:
>
> unix% pwd
> /usr/local/etc/X11/xorg.conf.d
> unix% cat 10-driver.conf
>
> Section "InputClass"
> Identifier  "Keyboard0"
> MatchIsKeyboard "on"
> Driver  "libinput"
> EndSection
>
> Section "InputClass"
> Identifier  "Mouse0"
> MatchIsPointer  "on"
> Driver  "libinput"
> EndSection
>
> Section "Device"
> Identifier  "Card0"
> Driver  "amdgpu"
> BusID   "PCI:41:0:0"
> Option  "DisplayPort-0" "Monitor0"
> EndSection
>
> --
> William Bulley
> E-MAIL: w...@umich.edu
> 


Re: [PATCH] drm/amdgpu: Fix the warning info in mode1 reset

2024-01-29 Thread Deucher, Alexander
[AMD Official Use Only - General]

Ping?


From: Deucher, Alexander 
Sent: Thursday, January 25, 2024 11:15 AM
To: amd-gfx@lists.freedesktop.org 
Cc: Ma, Jun ; Deucher, Alexander ; 
Prosyak, Vitaly 
Subject: [PATCH] drm/amdgpu: Fix the warning info in mode1 reset

From: Ma Jun 

Fix the warning info below during mode1 reset.
[  +0.04] Call Trace:
[  +0.04]  
[  +0.06]  ? show_regs+0x6e/0x80
[  +0.11]  ? __flush_work.isra.0+0x2e8/0x390
[  +0.05]  ? __warn+0x91/0x150
[  +0.09]  ? __flush_work.isra.0+0x2e8/0x390
[  +0.06]  ? report_bug+0x19d/0x1b0
[  +0.13]  ? handle_bug+0x46/0x80
[  +0.12]  ? exc_invalid_op+0x1d/0x80
[  +0.11]  ? asm_exc_invalid_op+0x1f/0x30
[  +0.14]  ? __flush_work.isra.0+0x2e8/0x390
[  +0.07]  ? __flush_work.isra.0+0x208/0x390
[  +0.07]  ? _prb_read_valid+0x216/0x290
[  +0.08]  __cancel_work_timer+0x11d/0x1a0
[  +0.07]  ? try_to_grab_pending+0xe8/0x190
[  +0.12]  cancel_work_sync+0x14/0x20
[  +0.08]  amddrm_sched_stop+0x3c/0x1d0 [amd_sched]
[  +0.32]  amdgpu_device_gpu_recover+0x29a/0xe90 [amdgpu]

This warning info was printed after applying the patch
"drm/sched: Convert drm scheduler to use a work queue rather than kthread".
The root cause is that the amdgpu driver tries to use the uninitialized
work_struct in struct drm_gpu_scheduler.

v2:
 - Rename the function to amdgpu_ring_sched_ready and move it to
amdgpu_ring.c (Alex)
v3:
- Fix a few more checks based on Vitaly's patch (Alex)

Fixes: 11b3b9f461c5 ("drm/sched: Check scheduler ready before calling timeout 
handling")
Reviewed-by: Alex Deucher 
Signed-off-by: Vitaly Prosyak 
Signed-off-by: Ma Jun 
Signed-off-by: Alex Deucher 
---
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c|  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c|  8 
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 12 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c   | 14 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  2 +-
 5 files changed, 25 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
index 899e31e3a5e8..3a3f3ce09f00 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
@@ -290,7 +290,7 @@ static int suspend_resume_compute_scheduler(struct 
amdgpu_device *adev, bool sus
 for (i = 0; i < adev->gfx.num_compute_rings; i++) {
 struct amdgpu_ring *ring = &adev->gfx.compute_ring[i];

-   if (!(ring && drm_sched_wqueue_ready(&ring->sched)))
+   if (!amdgpu_ring_sched_ready(ring))
 continue;

 /* stop secheduler and drain ring. */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index e485dd3357c6..1afbb2e932c6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -1678,7 +1678,7 @@ static int amdgpu_debugfs_test_ib_show(struct seq_file 
*m, void *unused)
 for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
 struct amdgpu_ring *ring = adev->rings[i];

-   if (!ring || !drm_sched_wqueue_ready(&ring->sched))
+   if (!amdgpu_ring_sched_ready(ring))
 continue;
 drm_sched_wqueue_stop(&ring->sched);
 }
@@ -1694,7 +1694,7 @@ static int amdgpu_debugfs_test_ib_show(struct seq_file 
*m, void *unused)
 for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
 struct amdgpu_ring *ring = adev->rings[i];

-   if (!ring || !drm_sched_wqueue_ready(&ring->sched))
+   if (!amdgpu_ring_sched_ready(ring))
 continue;
 drm_sched_wqueue_start(&ring->sched);
 }
@@ -1916,8 +1916,8 @@ static int amdgpu_debugfs_ib_preempt(void *data, u64 val)

 ring = adev->rings[val];

-   if (!ring || !ring->funcs->preempt_ib ||
-   !drm_sched_wqueue_ready(&ring->sched))
+   if (!amdgpu_ring_sched_ready(ring) ||
+   !ring->funcs->preempt_ib)
 return -EINVAL;

 /* the last preemption failed */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 1a04ccba9542..7ff17df7a5ce 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5042,7 +5042,7 @@ bool amdgpu_device_has_job_running(struct amdgpu_device 
*adev)
 for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
 struct amdgpu_ring *ring = adev->rings[i];

-   if (!ring || !drm_sched_wqueue_ready(&ring->sched))
+   if (!amdgpu_ring_sched_ready(ring))
 continue;

 spin_lock(>sched.job_
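
The amdgpu_ring.c hunk that defines amdgpu_ring_sched_ready() is cut off in this archived copy. As a rough userspace sketch of what such a helper centralizes (a NULL check plus a scheduler readiness check), assuming mock types: `mock_ring`, `mock_sched`, `mock_ring_sched_ready` and `count_ready` are hypothetical names, and the `no_scheduler` flag is an assumption of this sketch rather than something shown in the patch text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock stand-ins; the real structs live in the DRM scheduler and amdgpu. */
struct mock_sched {
	bool ready;          /* set once the scheduler has been initialised */
};

struct mock_ring {
	bool no_scheduler;   /* assumption: ring is driven without a scheduler */
	struct mock_sched sched;
};

/* One place for the checks the call sites used to open-code. */
static bool mock_ring_sched_ready(const struct mock_ring *ring)
{
	if (!ring)
		return false;
	if (ring->no_scheduler || !ring->sched.ready)
		return false;
	return true;
}

/* A call site skips rings that fail the check, like the loops in the diff. */
static int count_ready(struct mock_ring *const *rings, size_t n)
{
	int ready = 0;

	for (size_t i = 0; i < n; i++) {
		if (!mock_ring_sched_ready(rings[i]))
			continue;
		ready++;
	}
	return ready;
}

/* Small self-check: one usable ring, one NULL slot, one uninitialised ring. */
static void self_check(void)
{
	struct mock_ring up = { .no_scheduler = false, .sched = { .ready = true } };
	struct mock_ring down = { .no_scheduler = false, .sched = { .ready = false } };
	struct mock_ring *rings[] = { &up, NULL, &down };

	assert(count_ready(rings, 3) == 1);
}
```

Folding the NULL test into the helper is what lets each call site shrink to a single condition, and it keeps a ring whose scheduler was never initialised away from code that would touch its work_struct.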

RE: Have WX 3200 Radeon graphics card -- cannot get X11 session to work

2024-01-26 Thread Deucher, Alexander
[AMD Official Use Only - General]

> -Original Message-
> From: William Bulley 
> Sent: Friday, January 26, 2024 4:19 PM
> To: Deucher, Alexander 
> Cc: amd-gfx@lists.freedesktop.org
> Subject: Re: Have WX 3200 Radeon graphics card -- cannot get X11 session to
> work
>
> According to "Deucher, Alexander"  on Fri,
> 01/26/24 at 15:50:
> >
> > [Public]
> >
> > Kernel driver looks like it's loaded properly.
> >
> > I don't really have much experience with freebsd, but it doesn't seem
> > to be able to open the kernel driver.  Perhaps X starts before the
> > kernel driver has finished loading?  Can you try and load the kernel driver
> > and then start X?
>
> After sending this and researching the forums some more, I made a few
> changes and have had some success, but not quite there yet...
>
> Using what I found in the forums I now have this one file in my
> /usr/local/etc/X11/xorg.conf.d directory:
>
> unix% cat /usr/local/etc/X11/xorg.conf.d/10-driver.conf
> Section "Device"
> Identifier  "Card0"
> Driver  "amdgpu"
> BusID   "PCI:41:0:0"
> EndSection
>
> I got to this point today after learning about this the other day:
>
> unix# pciconf -lv | grep -A4 vgapci
> vgapci0@pci0:41:0:0:class=0x03 rev=0x10 hdr=0x00 vendor=0x1002
> device=0x6981 subvendor=0x1002 subdevice=0x0b0d
> vendor = 'Advanced Micro Devices, Inc. [AMD/ATI]'
> device = 'Lexa XT [Radeon PRO WX 3200]'
> class  = display
> subclass   = VGA
>
> I am not loading any relevant modules in my /boot/loader.conf file.
>
> This line was added to my /etc/rc.conf file:
>
>kld_list="amdgpu"
>
> I felt the amdgpu driver would support my WX 3200 (RS780) graphics card,
> and after a reboot, I was proved correct.  Previously I was starting my
> x11 session using the "startx" command from the vt0 virtual terminal.
>
> Just minutes ago I logged into the virtual terminal vt2 on this system as a
> non-root user.  There I entered the "startx" command and I was completely
> surprised by the beautiful x11 session that appeared!!!
>
> Unfortunately, the mouse pointer is frozen at the exact center of the
> 3440 x 1440 monitor, and it will not move.  I don't know how to fix this.  Any
> help or ideas or suggestions would be greatly appreciated.

Make sure you have OS mouse and keyboard drivers loaded and configured within 
your X config?

Alex



RE: Have WX 3200 Radeon graphics card -- cannot get X11 session to work

2024-01-26 Thread Deucher, Alexander
[Public]

> -Original Message-
> From: amd-gfx  On Behalf Of
> William Bulley
> Sent: Friday, January 26, 2024 1:45 PM
> To: amd-gfx@lists.freedesktop.org
> Subject: Have WX 3200 Radeon graphics card -- cannot get X11 session to
> work
>
> I am running FreeBSD 14.0-STABLE from January 4th.  I have read the
> handbook and followed the instructions there.  I have these drivers:
>
> unix# ls -al /usr/local/lib/xorg/modules/drivers
> total 476
> drwxr-xr-x  2 root wheel  8 Jan 21 14:18 .
> drwxr-xr-x  5 root wheel 13 Jan 21 14:18 ..
> -rwxr-xr-x  1 root wheel 146216 Jan  6 11:41 amdgpu_drv.so
> -rwxr-xr-x  1 root wheel   7344 Jan 11 13:18 ati_drv.so
> -rwxr-xr-x  1 root wheel 112320 Jan 21 14:18 modesetting_drv.so
> -rwxr-xr-x  1 root wheel 501696 Jan 11 13:18 radeon_drv.so
> -rwxr-xr-x  1 root wheel  19800 Jan  6 11:41 scfb_drv.so
> -rwxr-xr-x  1 root wheel  27392 Jan  6 11:41 vesa_drv.so
>
> I have these modules:
>
> unix# cd /boot/modules
> unix# ls -al *amdgpu*
> -r--r--r--  1 root wheel 8581752 Jan 22 15:14 amdgpu.ko
> < other amdgpu*.ko modules>>
> unix# ls -al *kms*
> -r--r--r--  1 root wheel 3013512 Jan 22 15:14 i915kms.ko
> -r--r--r--  1 root wheel 2394600 Jan 22 15:14 radeonkms.ko
>
> I have followed the instructions in the handbook Chapter 5.1 but have never
> gotten an x11 session to appear.  Whenever I run startx as a non-root user,
> the error message is always "(EE) no screens found".
>
> The WX 3200 (RS780?) is a newer card, so I put this line in my /etc/rc.conf
> file:
>
>kld_list+="amdgpu"
>
> During the booting of the O/S these 90 lines appear in my /var/log/messages
> file:
>
> Jan 26 10:35:20 msi1 kernel: [drm] amdgpu kernel modesetting enabled.
> Jan 26 10:35:20 msi1 kernel: drmn0:  on vgapci0
> Jan 26 10:35:20 msi1 kernel: vgapci0: child drmn0 requested pci_enable_io
> Jan 26 10:35:20 msi1 kernel: vgapci0: child drmn0 requested pci_enable_io
> Jan 26 10:35:20 msi1 kernel: [drm] initializing kernel modesetting (POLARIS12 0x1002:0x6981 0x1002:0x0B0D 0x10).
> Jan 26 10:35:20 msi1 kernel: drmn0: Trusted Memory Zone (TMZ) feature not supported
> Jan 26 10:35:20 msi1 kernel: [drm] register mmio base: 0xFCC0
> Jan 26 10:35:20 msi1 kernel: [drm] register mmio size: 262144
> Jan 26 10:35:20 msi1 kernel: [drm] add ip block number 0
> Jan 26 10:35:20 msi1 kernel: [drm] add ip block number 1
> Jan 26 10:35:20 msi1 kernel: [drm] add ip block number 2
> Jan 26 10:35:20 msi1 kernel: [drm] add ip block number 3
> Jan 26 10:35:20 msi1 kernel: [drm] add ip block number 4
> Jan 26 10:35:20 msi1 kernel: [drm] add ip block number 5
> Jan 26 10:35:20 msi1 kernel: [drm] add ip block number 6
> Jan 26 10:35:20 msi1 kernel: [drm] add ip block number 7
> Jan 26 10:35:20 msi1 kernel: [drm] add ip block number 8
> Jan 26 10:35:20 msi1 kernel: drmn0: Fetched VBIOS from VFCT
> Jan 26 10:35:20 msi1 kernel: amdgpu: ATOM BIOS: 113-D0155100-101
> Jan 26 10:35:20 msi1 kernel: [drm] UVD is enabled in VM mode
> Jan 26 10:35:20 msi1 kernel: [drm] UVD ENC is enabled in VM mode
> Jan 26 10:35:20 msi1 kernel: [drm] VCE enabled in VM mode
> Jan 26 10:35:20 msi1 kernel: [drm] vm size is 256 GB, 2 levels, block size is 10-bit, fragment size is 9-bit
> Jan 26 10:35:20 msi1 kernel: drmn0: successfully loaded firmware image 'amdgpu/polaris12_k_mc.bin'
> Jan 26 10:35:20 msi1 kernel: drmn0: VRAM: 4096M 0x00F4 - 0x00F4 (4096M used)
> Jan 26 10:35:20 msi1 kernel: drmn0: GART: 256M 0x00FF - 0x00FF0FFF
> Jan 26 10:35:20 msi1 kernel: [drm] Detected VRAM RAM=4096M, BAR=4096M
> Jan 26 10:35:20 msi1 kernel: [drm] RAM width 128bits GDDR5
> Jan 26 10:35:20 msi1 kernel: [drm] amdgpu: 4096M of VRAM memory ready
> Jan 26 10:35:20 msi1 kernel: [drm] amdgpu: 4096M of GTT memory ready.
> Jan 26 10:35:20 msi1 kernel: [drm] GART: num cpu pages 65536, num gpu pages 65536
> Jan 26 10:35:20 msi1 kernel: [drm] PCIE GART of 256M enabled (table at 0x00F40030).
> Jan 26 10:35:20 msi1 kernel: drmn0: successfully loaded firmware image
> 'amdgpu/polaris12_pfp_2.bin'
> Jan 26 10:35:20 msi1 kernel: drmn0: successfully loaded firmware image
> 'amdgpu/polaris12_me_2.bin'
> Jan 26 10:35:20 msi1 kernel: drmn0: successfully loaded firmware image
> 'amdgpu/polaris12_ce_2.bin'
> Jan 26 10:35:20 msi1 kernel: [drm] Chained IB support enabled!
> Jan 26 10:35:20 msi1 kernel: drmn0: successfully loaded firmware image
> 'amdgpu/polaris12_rlc.bin'
> Jan 26 10:35:20 msi1 kernel: drmn0: successfully loaded firmware image
> 'amdgpu/polaris12_mec_2.bin'
> Jan 26 10:35:20 msi1 kernel: drmn0: successfully loaded firmware image
> 'amdgpu/polaris12_mec2_2.bin'
> Jan 26 10:35:20 msi1 kernel: drmn0: successfully loaded firmware image
> 'amdgpu/polaris12_sdma.bin'
> Jan 26 10:35:20 msi1 kernel: drmn0: successfully loaded firmware image
> 'amdgpu/polaris12_sdma1.bin'
> Jan 26 10:35:20 msi1 kernel: amdgpu: hwmgr_sw_init smu backed is polaris10_smu
> Jan 26 10:35:20 msi1 kernel: drmn0: successfully

Re: [PATCH] Revert "drm/amd/pm: fix the high voltage and temperature issue"

2024-01-19 Thread Deucher, Alexander
[AMD Official Use Only - General]

Acked-by: Alex Deucher 

From: amd-gfx  on behalf of Mario 
Limonciello 
Sent: Friday, January 19, 2024 4:16 AM
To: amd-gfx@lists.freedesktop.org 
Cc: Feng, Kenneth ; Limonciello, Mario 
; sta...@vger.kernel.org 
Subject: [PATCH] Revert "drm/amd/pm: fix the high voltage and temperature issue"

This reverts commit 5f38ac54e60562323ea4abb1bfb37d043ee23357.
This causes issues with rebooting and the 7800XT.

Cc: Kenneth Feng 
Cc: sta...@vger.kernel.org
Fixes: 5f38ac54e605 ("drm/amd/pm: fix the high voltage and temperature issue")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3062
Signed-off-by: Mario Limonciello 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c| 24 --
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 33 ++-
 drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h |  1 -
 .../drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c  |  8 +
 .../drm/amd/pm/swsmu/smu13/smu_v13_0_7_ppt.c  |  8 +
 5 files changed, 11 insertions(+), 63 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index b4c19e3d0bf1..56d9dfa61290 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4131,23 +4131,13 @@ int amdgpu_device_init(struct amdgpu_device *adev,
 }
 }
 } else {
-   switch (amdgpu_ip_version(adev, MP1_HWIP, 0)) {
-   case IP_VERSION(13, 0, 0):
-   case IP_VERSION(13, 0, 7):
-   case IP_VERSION(13, 0, 10):
-   r = psp_gpu_reset(adev);
-   break;
-   default:
-   tmp = amdgpu_reset_method;
-   /* It should do a default reset when loading or 
reloading the driver,
-* regardless of the module parameter 
reset_method.
-*/
-   amdgpu_reset_method = AMD_RESET_METHOD_NONE;
-   r = amdgpu_asic_reset(adev);
-   amdgpu_reset_method = tmp;
-   break;
-   }
-
+   tmp = amdgpu_reset_method;
+   /* It should do a default reset when loading or 
reloading the driver,
+* regardless of the module parameter reset_method.
+*/
+   amdgpu_reset_method = AMD_RESET_METHOD_NONE;
+   r = amdgpu_asic_reset(adev);
+   amdgpu_reset_method = tmp;
 if (r) {
 dev_err(adev->dev, "asic reset on init 
failed\n");
 goto failed;
diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c 
b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
index c16703868e5c..961cd2aaf137 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
@@ -733,7 +733,7 @@ static int smu_early_init(void *handle)
 smu->adev = adev;
 smu->pm_enabled = !!amdgpu_dpm;
 smu->is_apu = false;
-   smu->smu_baco.state = SMU_BACO_STATE_NONE;
+   smu->smu_baco.state = SMU_BACO_STATE_EXIT;
 smu->smu_baco.platform_support = false;
 smu->user_dpm_profile.fan_mode = -1;

@@ -1961,31 +1961,10 @@ static int smu_smc_hw_cleanup(struct smu_context *smu)
 return 0;
 }

-static int smu_reset_mp1_state(struct smu_context *smu)
-{
-   struct amdgpu_device *adev = smu->adev;
-   int ret = 0;
-
-   if ((!adev->in_runpm) && (!adev->in_suspend) &&
-   (!amdgpu_in_reset(adev)))
-   switch (amdgpu_ip_version(adev, MP1_HWIP, 0)) {
-   case IP_VERSION(13, 0, 0):
-   case IP_VERSION(13, 0, 7):
-   case IP_VERSION(13, 0, 10):
-   ret = smu_set_mp1_state(smu, PP_MP1_STATE_UNLOAD);
-   break;
-   default:
-   break;
-   }
-
-   return ret;
-}
-
 static int smu_hw_fini(void *handle)
 {
 struct amdgpu_device *adev = (struct amdgpu_device *)handle;
 struct smu_context *smu = adev->powerplay.pp_handle;
-   int ret;

 if (amdgpu_sriov_vf(adev) && !amdgpu_sriov_is_pp_one_vf(adev))
 return 0;
@@ -2003,15 +1982,7 @@ static int smu_hw_fini(void *handle)

 adev->pm.dpm_enabled = false;

-   ret = smu_smc_hw_cleanup(smu);
-   if (ret)
-   return ret;
-
-   ret = smu_reset_mp1_state(smu);
-   if (ret)
-   return ret;
-
-   return 0;
+   return smu_smc_hw_cleanup(smu);
 }

 static void smu_late_fini(void *handle)
diff --git 

RE: [PATCH 1/2] drm/amdgpu: check PS, WS index

2024-01-12 Thread Deucher, Alexander
[Public]

> -Original Message-
> From: amd-gfx  On Behalf Of
> Alexander
> Sent: Thursday, January 11, 2024 10:05 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Alexander Richards 
> Subject: [PATCH 1/2] drm/amdgpu: check PS, WS index
>
> From: Alexander Richards 
>
> Theoretically, it would be possible for a buggy or malicious VBIOS to 
> overwrite
> past the bounds of the passed parameters (or its own workspace); add
> bounds checking to prevent this from happening.
>
> Signed-off-by: Alexander Richards 

Applied the series.  Thanks!
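
For readers skimming the series, the bounds-checking idea from the commit message can be sketched in a few lines. This is only an illustration under stated assumptions, not the code from the patch: `param_space`, `ps_read` and `ps_write` are hypothetical names, and the size is counted in 32-bit words here purely for simplicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the parameter space handed to a command table. */
struct param_space {
	uint32_t *words;   /* caller-owned buffer */
	size_t    nwords;  /* buffer size in 32-bit words */
};

/* Read word idx; returns false instead of reading past the buffer. */
static bool ps_read(const struct param_space *ps, size_t idx, uint32_t *out)
{
	assert(out != NULL);  /* a NULL destination is a caller bug */

	if (!ps || !ps->words || idx >= ps->nwords)
		return false;
	*out = ps->words[idx];
	return true;
}

/* Write word idx; returns false instead of writing past the buffer. */
static bool ps_write(struct param_space *ps, size_t idx, uint32_t val)
{
	if (!ps || !ps->words || idx >= ps->nwords)
		return false;
	ps->words[idx] = val;
	return true;
}
```

The interpreter can only make this check if it knows how large the caller's buffer is, which is why the call sites quoted below gain a sizeof(args) argument.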


> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c  | 24 +++
> .../gpu/drm/amd/amdgpu/amdgpu_atomfirmware.c  |  3 +-
>  drivers/gpu/drm/amd/amdgpu/atom.c | 41 +-
>  drivers/gpu/drm/amd/amdgpu/atom.h |  2 +-
>  drivers/gpu/drm/amd/amdgpu/atombios_crtc.c| 28 ++---
>  drivers/gpu/drm/amd/amdgpu/atombios_dp.c  |  4 +-
>  .../gpu/drm/amd/amdgpu/atombios_encoders.c| 16 +++
>  drivers/gpu/drm/amd/amdgpu/atombios_i2c.c |  4 +-
>  .../drm/amd/display/dc/bios/command_table.c   |  2 +-
>  .../drm/amd/display/dc/bios/command_table2.c  |  2 +-
>  .../drm/amd/pm/powerplay/hwmgr/ppatomctrl.c   | 42 +--
>  .../drm/amd/pm/powerplay/hwmgr/ppatomfwctrl.c |  4 +-
>  .../gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c|  2 +-
>  .../gpu/drm/amd/pm/swsmu/smu12/smu_v12_0.c|  2 +-
>  14 files changed, 102 insertions(+), 74 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c
> index dce9e7d5e..52b12c171 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c
> @@ -1018,7 +1018,8 @@ int amdgpu_atombios_get_clock_dividers(struct
> amdgpu_device *adev,
>   if (clock_type == COMPUTE_ENGINE_PLL_PARAM) {
>   args.v3.ulClockParams = cpu_to_le32((clock_type <<
> 24) | clock);
>
> - amdgpu_atom_execute_table(adev->mode_info.atom_context, index, (uint32_t *)&args);
> + amdgpu_atom_execute_table(adev->mode_info.atom_context, index, (uint32_t *)&args,
> + sizeof(args));
>
>   dividers->post_div = args.v3.ucPostDiv;
>   dividers->enable_post_div = (args.v3.ucCntlFlag &
> @@ -1038,7 +1039,8 @@ int amdgpu_atombios_get_clock_dividers(struct
> amdgpu_device *adev,
>   if (strobe_mode)
>   args.v5.ucInputFlag =
> ATOM_PLL_INPUT_FLAG_PLL_STROBE_MODE_EN;
>
> - amdgpu_atom_execute_table(adev->mode_info.atom_context, index, (uint32_t *)&args);
> + amdgpu_atom_execute_table(adev->mode_info.atom_context, index, (uint32_t *)&args,
> + sizeof(args));
>
>   dividers->post_div = args.v5.ucPostDiv;
>   dividers->enable_post_div = (args.v5.ucCntlFlag &
> @@ -1056,7 +1058,8 @@ int amdgpu_atombios_get_clock_dividers(struct
> amdgpu_device *adev,
>   /* fusion */
>   args.v4.ulClock = cpu_to_le32(clock);   /* 10 khz */
>
> - amdgpu_atom_execute_table(adev->mode_info.atom_context, index, (uint32_t *)&args);
> + amdgpu_atom_execute_table(adev->mode_info.atom_context, index, (uint32_t *)&args,
> + sizeof(args));
>
>   dividers->post_divider = dividers->post_div =
> args.v4.ucPostDiv;
>   dividers->real_clock = le32_to_cpu(args.v4.ulClock); @@ -
> 1067,7 +1070,8 @@ int amdgpu_atombios_get_clock_dividers(struct
> amdgpu_device *adev,
>   args.v6_in.ulClock.ulComputeClockFlag = clock_type;
>   args.v6_in.ulClock.ulClockFreq = cpu_to_le32(clock);/* 10
> khz */
>
> - amdgpu_atom_execute_table(adev->mode_info.atom_context, index, (uint32_t *)&args);
> + amdgpu_atom_execute_table(adev->mode_info.atom_context, index, (uint32_t *)&args,
> + sizeof(args));
>
>   dividers->whole_fb_div =
> le16_to_cpu(args.v6_out.ulFbDiv.usFbDiv);
>   dividers->frac_fb_div =
> le16_to_cpu(args.v6_out.ulFbDiv.usFbDivFrac);
> @@ -1109,7 +1113,8 @@ int
> amdgpu_atombios_get_memory_pll_dividers(struct amdgpu_device *adev,
>   if (strobe_mode)
>   args.ucInputFlag |=
> MPLL_INPUT_FLAG_STROBE_MODE_EN;
>
> - amdgpu_atom_execute_table(adev->mode_info.atom_context, index, (uint32_t *)&args);
> + amdgpu_atom_execute_table(adev->mode_info.atom_context, index, (uint32_t *)&args,
> + sizeof(args));
>
>   mpll_param->clkfrac =
> le16_to_cpu(args.ulFbDiv.usFbDivFrac);
>   mpll_param->clkf =
> le16_to_cpu(args.ulFbDiv.usFbDiv);
> @@ -1151,7 +1156,8 @@ void
> amdgpu_atombios_set_engine_dram_timings(struct amdgpu_device *adev,
>   if 

Re: [PATCH] drm/amdgpu: Fix the null pointer when load rlc firmware

2024-01-12 Thread Deucher, Alexander
[Public]

Acked-by: Alex Deucher 

From: Ma, Jun 
Sent: Friday, January 12, 2024 1:26 AM
To: amd-gfx@lists.freedesktop.org ; Koenig, 
Christian ; Deucher, Alexander 

Cc: Ma, Jun 
Subject: [PATCH] drm/amdgpu: Fix the null pointer when load rlc firmware

If the RLC firmware is invalid because of wrong header size,
the pointer to the rlc firmware is released in function
amdgpu_ucode_request. There will be a null pointer error
in subsequent use. So skip validation to fix it.

Signed-off-by: Ma Jun 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 15 ++-
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index d2c34436aefc..4d90e570b3cd 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -3996,16 +3996,13 @@ static int gfx_v10_0_init_microcode(struct 
amdgpu_device *adev)

 if (!amdgpu_sriov_vf(adev)) {
 snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_rlc.bin", 
ucode_prefix);
-   err = amdgpu_ucode_request(adev, &adev->gfx.rlc_fw, fw_name);
-   /* don't check this.  There are apparently firmwares in the 
wild with
-* incorrect size in the header
-*/
-   if (err == -ENODEV)
-   goto out;
+   err = request_firmware(&adev->gfx.rlc_fw, fw_name, adev->dev);
 if (err)
-   dev_dbg(adev->dev,
-   "gfx10: amdgpu_ucode_request() failed \"%s\"\n",
-   fw_name);
+   goto out;
+
+   /* don't validate this firmware.  There are apparently firmwares
+* in the wild with incorrect size in the header
+*/
 rlc_hdr = (const struct rlc_firmware_header_v2_0 
*)adev->gfx.rlc_fw->data;
 version_major = 
le16_to_cpu(rlc_hdr->header.header_version_major);
 version_minor = 
le16_to_cpu(rlc_hdr->header.header_version_minor);
--
2.34.1



Re: [PATCH] drm/amdgpu: revert "Adjust removal control flow for smu v13_0_2"

2024-01-10 Thread Deucher, Alexander
[Public]

Acked-by: Alex Deucher 

From: Christian König 
Sent: Wednesday, January 10, 2024 9:31 AM
To: Lazar, Lijo ; Chai, Thomas ; 
Deucher, Alexander ; amd-gfx@lists.freedesktop.org 

Subject: [PATCH] drm/amdgpu: revert "Adjust removal control flow for smu 
v13_0_2"

Calling amdgpu_device_ip_resume_phase1() during shutdown leaves the
HW in an active state and is an unbalanced use of the IP callbacks.

Using the IP callbacks like this can lead to memory leaks, double
free and imbalanced reference counters.

Leaving the HW in an active state can lead to DMA accesses to memory now
freed by the driver.

Both are a complete no-go for driver unload, so completely revert the
workaround for now.

This reverts commit f5c7e7797060255dbc8160734ccc5ad6183c5e04.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 32 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c| 32 --
 drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h  |  1 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h   |  1 -
 4 files changed, 1 insertion(+), 65 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index a39c9fea55c4..313316009039 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5232,7 +5232,6 @@ int amdgpu_do_asic_reset(struct list_head 
*device_list_handle,
 struct amdgpu_device *tmp_adev = NULL;
 bool need_full_reset, skip_hw_reset, vram_lost = false;
 int r = 0;
-   bool gpu_reset_for_dev_remove = 0;

 /* Try reset handler method first */
 tmp_adev = list_first_entry(device_list_handle, struct amdgpu_device,
@@ -5252,10 +5251,6 @@ int amdgpu_do_asic_reset(struct list_head 
*device_list_handle,
 test_bit(AMDGPU_NEED_FULL_RESET, &reset_context->flags);
 skip_hw_reset = test_bit(AMDGPU_SKIP_HW_RESET, &reset_context->flags);

-   gpu_reset_for_dev_remove =
-   test_bit(AMDGPU_RESET_FOR_DEVICE_REMOVE, &reset_context->flags) &&
-   test_bit(AMDGPU_NEED_FULL_RESET, &reset_context->flags);
-
 /*
  * ASIC reset has to be done on all XGMI hive nodes ASAP
  * to allow proper links negotiation in FW (within 1 sec)
@@ -5298,18 +5293,6 @@ int amdgpu_do_asic_reset(struct list_head 
*device_list_handle,
 amdgpu_ras_intr_cleared();
 }

-   /* Since the mode1 reset affects base ip blocks, the
-* phase1 ip blocks need to be resumed. Otherwise there
-* will be a BIOS signature error and the psp bootloader
-* can't load kdb on the next amdgpu install.
-*/
-   if (gpu_reset_for_dev_remove) {
-   list_for_each_entry(tmp_adev, device_list_handle, reset_list)
-   amdgpu_device_ip_resume_phase1(tmp_adev);
-
-   goto end;
-   }
-
 list_for_each_entry(tmp_adev, device_list_handle, reset_list) {
 if (need_full_reset) {
 /* post card */
@@ -5543,11 +5526,6 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 int i, r = 0;
 bool need_emergency_restart = false;
 bool audio_suspended = false;
-   bool gpu_reset_for_dev_remove = false;
-
-   gpu_reset_for_dev_remove =
-   test_bit(AMDGPU_RESET_FOR_DEVICE_REMOVE, &reset_context->flags) &&
-   test_bit(AMDGPU_NEED_FULL_RESET, &reset_context->flags);

 /*
  * Special case: RAS triggered and full reset isn't supported
@@ -5585,7 +5563,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 if (!amdgpu_sriov_vf(adev) && (adev->gmc.xgmi.num_physical_nodes > 1)) {
 list_for_each_entry(tmp_adev, &hive->device_list, gmc.xgmi.head) {
 list_add_tail(&tmp_adev->reset_list, &device_list);
-   if (gpu_reset_for_dev_remove && adev->shutdown)
+   if (adev->shutdown)
 tmp_adev->shutdown = true;
 }
 if (!list_is_first(&adev->reset_list, &device_list))
@@ -5670,10 +5648,6 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,

 retry:  /* Rest of adevs pre asic reset from XGMI hive. */
 list_for_each_entry(tmp_adev, device_list_handle, reset_list) {
-   if (gpu_reset_for_dev_remove) {
-   /* Workaroud for ASICs need to disable SMC first */
-   amdgpu_device_smu_fini_early(tmp_adev);
-   }
 r = amdgpu_device_pre_asic_reset(tmp_adev, reset_context);
 /*TODO Should we stop ?*/
 if (r) {
@@ -5705,9 +5679,6 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 r = amdgpu_do_asic_reset(device_list_handle, reset_c

RE: Documentation for RGB strip on RX 7900 XTX (Reference)

2024-01-10 Thread Deucher, Alexander
[Public]

> -Original Message-
> From: amd-gfx  On Behalf Of
> Alexander Koskovich
> Sent: Tuesday, January 9, 2024 9:21 PM
> To: Christian König 
> Cc: amd-gfx@lists.freedesktop.org
> Subject: Re: Documentation for RGB strip on RX 7900 XTX (Reference)
>
> Are there any userspace utilities for checking out the ATOMBIOS tables? Have
> never done so and all the utilities I've found online are too old for this
> card (at least it refuses to open the VBIOS for this card).


There is atomdis, and I think there is some limited BIOS parsing in umr. That
said, you'd probably need to update the headers in those projects with the
latest ones from the kernel and probably add new parsers for the newer data
table versions.

Alex

>
>
> On Tuesday, January 9th, 2024 at 3:02 AM, Christian König
>  wrote:
>
>
> >
> >
> > Am 08.01.24 um 23:32 schrieb Deucher, Alexander:
> >
> > > [Public]
> > >
> > > > -Original Message-
> > > > From: amd-gfx amd-gfx-boun...@lists.freedesktop.org On Behalf Of
> > > > Alexander Koskovich
> > > > Sent: Sunday, January 7, 2024 11:19 PM
> > > > To: amd-gfx@lists.freedesktop.org
> > > > Subject: Documentation for RGB strip on RX 7900 XTX (Reference)
> > > >
> > > > Hello,
> > > >
> > > > I was wondering if AMD would be able to provide any documentation for
> > > > the RGB strip on the reference cooler
> > > > (https://www.amd.com/en/products/graphics/amd-radeon-rx-7900xtx)?
> > > > It looks to be handled via I2C commands to the SMU, but having
> > > > proper documentation would be extremely helpful.
> > >
> > > It depends on the AIB/OEM and how they designed the specific board.
> > > The RGB controller will either be attached to the DDCVGA i2c bus on the
> > > display hardware or the second SMU i2c bus. The former will require
> > > changes to the amdgpu display code to register display i2c buses that
> > > are not used by the display connectors on the board so they can be used
> > > by 3rd party applications. Currently we only register i2c buses used for
> > > display connectors. The latter buses are already registered with the i2c
> > > subsystem since they are used for other things like EEPROMs on server
> > > and workstation cards and should be available via standard Linux i2c
> > > APIs. I'm not sure offhand what i2c LED controllers each AIB vendor
> > > uses. https://openrgb.org/index.html would probably be a good resource
> > > for that information.
> >
> >
> >
> > It might also be a good idea to look some of the ATOMBIOS tables found
> > on your device.
> >
> > Those tables are filled in by the AIB/OEM with the information which
> > connectors (HDMI, DVI, DP etc...) are on the board and I bet that the
> > information which RGB controller is used and where to find it is
> > somewhere in there as well.
> >
> > Adding Harry from our display team, might be that he has some more
> > hints as well.
> >
> > Christian.
> >
> > > Alex


RE: Documentation for RGB strip on RX 7900 XTX (Reference)

2024-01-10 Thread Deucher, Alexander
[Public]

> -Original Message-
> From: Alexander Koskovich 
> Sent: Tuesday, January 9, 2024 6:01 PM
> To: Deucher, Alexander 
> Cc: amd-gfx@lists.freedesktop.org
> Subject: RE: Documentation for RGB strip on RX 7900 XTX (Reference)
>
> I initially tried reaching out to the Coolermaster technical email, but I got
> no response. Is there a better contact for something like this?
>

I'm not really sure.  That sounds like a good start.  It may work similarly to 
other CoolerMaster designs in openRGB.

Alex


>
>
> On Tuesday, January 9th, 2024 at 5:58 PM, Deucher, Alexander
>  wrote:
>
>
> >
> >
> > [Public]
> >
> > > -Original Message-
> > > From: amd-gfx amd-gfx-boun...@lists.freedesktop.org On Behalf Of
> > > Deucher, Alexander
> > > Sent: Tuesday, January 9, 2024 5:29 PM
> > > To: Alexander Koskovich akoskov...@protonmail.com
> > > Cc: amd-gfx@lists.freedesktop.org
> > > Subject: RE: Documentation for RGB strip on RX 7900 XTX (Reference)
> > >
> > > > -Original Message-
> > > > From: Alexander Koskovich akoskov...@protonmail.com
> > > > Sent: Tuesday, January 9, 2024 4:59 PM
> > > > To: Deucher, Alexander alexander.deuc...@amd.com
> > > > Cc: amd-gfx@lists.freedesktop.org
> > > > Subject: RE: Documentation for RGB strip on RX 7900 XTX
> > > > (Reference)
> > > >
> > > > Is the AIB/OEM for this board not AMD?
> > > > https://www.amd.com/en/products/graphics/amd-radeon-rx-7900xtx
> > >
> > > I'll double check (we usually don't produce reference boards with
> > > RGB), but my understanding is that if any of the boards available
> > > for sale on amd.com has RGB controls, the RGB control is provided by a
> third party vendor.
> >
> >
> >
> > CoolerMaster provides the RGB solution. See:
> > > https://www.amd.com/en/support/graphics/amd-radeon-rx-7000-series/amd-radeon-rx-7900-series/amd-radeon-rx-7900xtx
> >
> > Alex
> >
> > > Alex
> > >
> > > > On Tuesday, January 9th, 2024 at 4:53 PM, Deucher, Alexander
> > > > alexander.deuc...@amd.com wrote:
> > > >
> > > > > [AMD Official Use Only - General]
> > > > >
> > > > > > -Original Message-
> > > > > > From: Alexander Koskovich akoskov...@protonmail.com
> > > > > > Sent: Tuesday, January 9, 2024 3:27 PM
> > > > > > To: Deucher, Alexander alexander.deuc...@amd.com
> > > > > > Cc: amd-gfx@lists.freedesktop.org
> > > > > > Subject: RE: Documentation for RGB strip on RX 7900 XTX
> > > > > > (Reference)
> > > > > >
> > > > > > > Does AMD have documentation on the i2c data that gets sent
> > > > > > currently though? I was hoping to figure out what you need to
> > > > > > change in the command that gets sent to change stuff like
> > > > > > brightness, color (red, green, blue), rainbow, morse code, etc.
> > > > >
> > > > > It depends on the LED controller used by the AIB/OEM. The
> > > > > programming sequence is dependent on the LED controller.
> > > > >
> > > > > Alex
> > > > >
> > > > > > On Tuesday, January 9th, 2024 at 10:10 AM, Deucher, Alexander
> > > > > > alexander.deuc...@amd.com wrote:
> > > > > >
> > > > > > > [Public]
> > > > > > >
> > > > > > > > -Original Message-
> > > > > > > > From: Alexander Koskovich akoskov...@protonmail.com
> > > > > > > > Sent: Monday, January 8, 2024 7:22 PM
> > > > > > > > To: Deucher, Alexander alexander.deuc...@amd.com
> > > > > > > > Cc: amd-gfx@lists.freedesktop.org
> > > > > > > > Subject: RE: Documentation for RGB strip on RX 7900 XTX
> > > > > > > > (Reference)
> > > > > > > >
> > > > > > > > Currently the reference cooler from AMD does not have an
> > > > > > > > existing RGB controller for OpenRGB, that's why I was
> > > > > > > > looking for documentation on the I2C commands to send to
> > > > > > > > the second SMU, so I don't risk bricking my card by
> > > > > > > > sending wrong commands during development somehow.
> > > 

RE: [PATCH] drm/amdgpu: drop exp hw support check for GC 9.4.3

2024-01-10 Thread Deucher, Alexander
[Public]

Ping!

> -Original Message-
> From: Deucher, Alexander 
> Sent: Tuesday, January 9, 2024 10:46 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander 
> Subject: [PATCH] drm/amdgpu: drop exp hw support check for GC 9.4.3
>
> No longer needed.
>
> Signed-off-by: Alex Deucher 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> index b8fde08aec8e..f96811bbe40e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> @@ -1963,8 +1963,6 @@ static int
> amdgpu_discovery_set_gc_ip_blocks(struct amdgpu_device *adev)
>   amdgpu_device_ip_block_add(adev, _v9_0_ip_block);
>   break;
>   case IP_VERSION(9, 4, 3):
> - if (!amdgpu_exp_hw_support)
> - return -EINVAL;
>   amdgpu_device_ip_block_add(adev, _v9_4_3_ip_block);
>   break;
>   case IP_VERSION(10, 1, 10):
> --
> 2.42.0



RE: Documentation for RGB strip on RX 7900 XTX (Reference)

2024-01-09 Thread Deucher, Alexander
[Public]

> -Original Message-
> From: amd-gfx  On Behalf Of
> Deucher, Alexander
> Sent: Tuesday, January 9, 2024 5:29 PM
> To: Alexander Koskovich 
> Cc: amd-gfx@lists.freedesktop.org
> Subject: RE: Documentation for RGB strip on RX 7900 XTX (Reference)
> >
> > -Original Message-
> > From: Alexander Koskovich 
> > Sent: Tuesday, January 9, 2024 4:59 PM
> > To: Deucher, Alexander 
> > Cc: amd-gfx@lists.freedesktop.org
> > Subject: RE: Documentation for RGB strip on RX 7900 XTX (Reference)
> >
> > Is the AIB/OEM for this board not AMD?
> > https://www.amd.com/en/products/graphics/amd-radeon-rx-7900xtx
> >
>
> I'll double check (we usually don't produce reference boards with RGB), but
> my understanding is that if any of the boards available for sale on amd.com
> has RGB controls, the RGB control is provided by a third party vendor.


CoolerMaster provides the RGB solution.  See:
https://www.amd.com/en/support/graphics/amd-radeon-rx-7000-series/amd-radeon-rx-7900-series/amd-radeon-rx-7900xtx

Alex


>
> Alex
>
> >
> >
> > On Tuesday, January 9th, 2024 at 4:53 PM, Deucher, Alexander
> >  wrote:
> >
> >
> > >
> > >
> > > [AMD Official Use Only - General]
> > >
> > > > -Original Message-
> > > > From: Alexander Koskovich akoskov...@protonmail.com
> > > > Sent: Tuesday, January 9, 2024 3:27 PM
> > > > To: Deucher, Alexander alexander.deuc...@amd.com
> > > > Cc: amd-gfx@lists.freedesktop.org
> > > > Subject: RE: Documentation for RGB strip on RX 7900 XTX
> > > > (Reference)
> > > >
> > > > > Does AMD have documentation on the i2c data that gets sent
> > > > currently though? I was hoping to figure out what you need to
> > > > change in the command that gets sent to change stuff like
> > > > brightness, color (red, green, blue), rainbow, morse code, etc.
> > >
> > >
> > > It depends on the LED controller used by the AIB/OEM. The
> > > programming
> > sequence is dependent on the LED controller.
> > >
> > > Alex
> > >
> > > > On Tuesday, January 9th, 2024 at 10:10 AM, Deucher, Alexander
> > > > alexander.deuc...@amd.com wrote:
> > > >
> > > > > [Public]
> > > > >
> > > > > > -Original Message-
> > > > > > From: Alexander Koskovich akoskov...@protonmail.com
> > > > > > Sent: Monday, January 8, 2024 7:22 PM
> > > > > > To: Deucher, Alexander alexander.deuc...@amd.com
> > > > > > Cc: amd-gfx@lists.freedesktop.org
> > > > > > Subject: RE: Documentation for RGB strip on RX 7900 XTX
> > > > > > (Reference)
> > > > > >
> > > > > > Currently the reference cooler from AMD does not have an
> > > > > > existing RGB controller for OpenRGB, that's why I was looking
> > > > > > for documentation on the I2C commands to send to the second
> > > > > > SMU, so I don't risk bricking my card by sending wrong
> > > > > > commands during development somehow.
> > > > > >
> > > > > > writeSetCMDWithData:
> > > > > >
> > **
> > > > > > adli2c.iSize = sizeof(ADLI2C)
> > > > > > adli2c.iAction = ADL_DL_I2C_ACTIONWRITE adli2c.iAddress = 0xb4
> > > > > > adli2c.iSpeed = 100
> > > > > > 0 --
> > > > > > Dev 0: ADL_Display_WriteAndReadSMUI2C(0, ) = 0
> > > > > > adli2c.iDataSize =
> > > > > > 24 i2cData[0]~[24]
> > > > > > 40 51 2c 01 00 00 ff 00 ff ff ff cc 00 cc 00 00 00 ff ff ff ff
> > > > > > ff ff ff
> > > > > >
> > > > > > From the RGB app's logs this is an example of what the
> > > > > > official AMD application on Windows is sending when it changes
> > > > > > colors on the
> > RGB strip.
> > > > > >
> > > > > > From this can it be assumed the AMD card is using the latter
> > > > > > method you mentioned with the second SMU I2C bus, in which
> > > > > > case no driver changes would be needed?
> > > > >
> > > > > IIRC, each AIB/OEM uses its own preferred RGB controller. The
> > > > > referen

RE: Documentation for RGB strip on RX 7900 XTX (Reference)

2024-01-09 Thread Deucher, Alexander
[Public]

> -Original Message-
> From: Alexander Koskovich 
> Sent: Tuesday, January 9, 2024 4:59 PM
> To: Deucher, Alexander 
> Cc: amd-gfx@lists.freedesktop.org
> Subject: RE: Documentation for RGB strip on RX 7900 XTX (Reference)
>
> Is the AIB/OEM for this board not AMD?
> https://www.amd.com/en/products/graphics/amd-radeon-rx-7900xtx
>

I'll double check (we usually don't produce reference boards with RGB), but my 
understanding is that if any of the boards available for sale on amd.com has 
RGB controls, the RGB control is provided by a third party vendor.

Alex

>
>
> On Tuesday, January 9th, 2024 at 4:53 PM, Deucher, Alexander
>  wrote:
>
>
> >
> >
> > [AMD Official Use Only - General]
> >
> > > -Original Message-
> > > From: Alexander Koskovich akoskov...@protonmail.com
> > > Sent: Tuesday, January 9, 2024 3:27 PM
> > > To: Deucher, Alexander alexander.deuc...@amd.com
> > > Cc: amd-gfx@lists.freedesktop.org
> > > Subject: RE: Documentation for RGB strip on RX 7900 XTX (Reference)
> > >
> > > > Does AMD have documentation on the i2c data that gets sent currently
> > > though? I was hoping to figure out what you need to change in the
> > > command that gets sent to change stuff like brightness, color (red,
> > > green, blue), rainbow, morse code, etc.
> >
> >
> > It depends on the LED controller used by the AIB/OEM. The programming
> sequence is dependent on the LED controller.
> >
> > Alex
> >
> > > On Tuesday, January 9th, 2024 at 10:10 AM, Deucher, Alexander
> > > alexander.deuc...@amd.com wrote:
> > >
> > > > [Public]
> > > >
> > > > > -Original Message-
> > > > > From: Alexander Koskovich akoskov...@protonmail.com
> > > > > Sent: Monday, January 8, 2024 7:22 PM
> > > > > To: Deucher, Alexander alexander.deuc...@amd.com
> > > > > Cc: amd-gfx@lists.freedesktop.org
> > > > > Subject: RE: Documentation for RGB strip on RX 7900 XTX
> > > > > (Reference)
> > > > >
> > > > > Currently the reference cooler from AMD does not have an
> > > > > existing RGB controller for OpenRGB, that's why I was looking
> > > > > for documentation on the I2C commands to send to the second SMU,
> > > > > so I don't risk bricking my card by sending wrong commands
> > > > > during development somehow.
> > > > >
> > > > > writeSetCMDWithData:
> > > > >
> **
> > > > > adli2c.iSize = sizeof(ADLI2C)
> > > > > adli2c.iAction = ADL_DL_I2C_ACTIONWRITE adli2c.iAddress = 0xb4
> > > > > adli2c.iSpeed = 100
> > > > > 0 --
> > > > > Dev 0: ADL_Display_WriteAndReadSMUI2C(0, ) = 0
> > > > > adli2c.iDataSize =
> > > > > 24 i2cData[0]~[24]
> > > > > 40 51 2c 01 00 00 ff 00 ff ff ff cc 00 cc 00 00 00 ff ff ff ff
> > > > > ff ff ff
> > > > >
> > > > > From the RGB app's logs this is an example of what the official
> > > > > AMD application on Windows is sending when it changes colors on the
> RGB strip.
> > > > >
> > > > > From this can it be assumed the AMD card is using the latter
> > > > > method you mentioned with the second SMU I2C bus, in which case
> > > > > no driver changes would be needed?
> > > >
> > > > IIRC, each AIB/OEM uses its own preferred RGB controller. The
> > > > reference board just defines which i2c buses can be used. The RGB
> > > > control application is just a userspace app provided by the
> > > > AIB/OEM that calls ADL to talk to whichever i2c bus the vendor put
> > > > their RGB controller on. On Linux you can do something similar
> > > > using the i2c_dev module to open a connection to the i2c bus driver
> provided by the kernel. I believe that is what openRGB does today.
> > > > It looks like you already have the programming sequence above.
> > > >
> > > > Alex
> > > >
> > > > > On Monday, January 8th, 2024 at 5:32 PM, Deucher, Alexander
> > > > > alexander.deuc...@amd.com wrote:
> > > > >
> > > > > > [Public]
> > > > > >
> > > > > > > -Original Message-
> > > > > > > From: amd-gfx a

RE: Documentation for RGB strip on RX 7900 XTX (Reference)

2024-01-09 Thread Deucher, Alexander
[AMD Official Use Only - General]

> -Original Message-
> From: Alexander Koskovich 
> Sent: Tuesday, January 9, 2024 3:27 PM
> To: Deucher, Alexander 
> Cc: amd-gfx@lists.freedesktop.org
> Subject: RE: Documentation for RGB strip on RX 7900 XTX (Reference)
>
> Does AMD have documentation on the i2c data that gets sent currently
> though? I was hoping to figure out what you need to change in the command
> that gets sent to change stuff like brightness, color (red, green, blue),
> rainbow, morse code, etc.
>

It depends on the LED controller used by the AIB/OEM.  The programming sequence 
is dependent on the LED controller.

Alex


> On Tuesday, January 9th, 2024 at 10:10 AM, Deucher, Alexander
>  wrote:
>
>
> >
> >
> > [Public]
> >
> > > -Original Message-
> > > From: Alexander Koskovich akoskov...@protonmail.com
> > > Sent: Monday, January 8, 2024 7:22 PM
> > > To: Deucher, Alexander alexander.deuc...@amd.com
> > > Cc: amd-gfx@lists.freedesktop.org
> > > Subject: RE: Documentation for RGB strip on RX 7900 XTX (Reference)
> > >
> > > Currently the reference cooler from AMD does not have an existing
> > > RGB controller for OpenRGB, that's why I was looking for
> > > documentation on the I2C commands to send to the second SMU, so I
> > > don't risk bricking my card by sending wrong commands during
> development somehow.
> > >
> > > writeSetCMDWithData:
> > > **
> > > adli2c.iSize = sizeof(ADLI2C)
> > > adli2c.iAction = ADL_DL_I2C_ACTIONWRITE adli2c.iAddress = 0xb4
> > > adli2c.iSpeed = 100
> > > 0 --
> > > Dev 0: ADL_Display_WriteAndReadSMUI2C(0, ) = 0
> > > adli2c.iDataSize =
> > > 24 i2cData[0]~[24]
> > > 40 51 2c 01 00 00 ff 00 ff ff ff cc 00 cc 00 00 00 ff ff ff ff ff ff
> > > ff
> > >
> > > From the RGB app's logs this is an example of what the official AMD
> > > application on Windows is sending when it changes colors on the RGB strip.
> > >
> > > From this can it be assumed the AMD card is using the latter method
> > > you mentioned with the second SMU I2C bus, in which case no driver
> > > changes would be needed?
> >
> >
> >
> > IIRC, each AIB/OEM uses its own preferred RGB controller. The reference
> board just defines which i2c buses can be used. The RGB control application is
> just a userspace app provided by the AIB/OEM that calls ADL to talk to
> whichever i2c bus the vendor put their RGB controller on. On Linux you can do
> something similar using the i2c_dev module to open a connection to the i2c
> bus driver provided by the kernel. I believe that is what openRGB does today.
> It looks like you already have the programming sequence above.
> >
> > Alex
> >
> > > On Monday, January 8th, 2024 at 5:32 PM, Deucher, Alexander
> > > alexander.deuc...@amd.com wrote:
> > >
> > > > [Public]
> > > >
> > > > > -Original Message-
> > > > > From: amd-gfx amd-gfx-boun...@lists.freedesktop.org On Behalf Of
> > > > > Alexander Koskovich
> > > > > Sent: Sunday, January 7, 2024 11:19 PM
> > > > > To: amd-gfx@lists.freedesktop.org
> > > > > Subject: Documentation for RGB strip on RX 7900 XTX (Reference)
> > > > >
> > > > > Hello,
> > > > >
> > > > > > I was wondering if AMD would be able to provide any documentation
> > > > > > for the RGB strip on the reference cooler
> > > > > > (https://www.amd.com/en/products/graphics/amd-radeon-rx-7900xtx)?
> > > > > > It looks to be handled via I2C commands to the SMU, but having
> > > > > > proper documentation would be extremely helpful.
> > > >
> > > > It depends on the AIB/OEM and how they designed the specific
> > > > board. The RGB controller will either be attached to the DDCVGA
> > > > i2c bus on the display hardware or the second SMU i2c bus. The
> > > > former will require changes to the amdgpu display code to register
> > > > display i2c buses that are not used by the display connectors on the
> board so they can be used by 3rd party applications.
> > > > Currently we only register i2c buses used for display connectors.
> > > > The latter buses are already registered with the i2c subsystem
> > > > since they are used for other things like EEPROMs on server and
> > > > workstation cards and should be available via standard Linux i2c
> > > > APIs. I'm not sure what i2c LED controllers each AIB vendor uses
> > > > off hand. https://openrgb.org/index.html would probably be a good
> resource for that information.
> > > >
> > > > Alex


RE: Documentation for RGB strip on RX 7900 XTX (Reference)

2024-01-09 Thread Deucher, Alexander
[Public]

> -Original Message-
> From: Alexander Koskovich 
> Sent: Monday, January 8, 2024 7:22 PM
> To: Deucher, Alexander 
> Cc: amd-gfx@lists.freedesktop.org
> Subject: RE: Documentation for RGB strip on RX 7900 XTX (Reference)
>
> Currently the reference cooler from AMD does not have an existing RGB
> controller for OpenRGB, that's why I was looking for documentation on the
> I2C commands to send to the second SMU, so I don't risk bricking my card by
> sending wrong commands during development somehow.
>
> writeSetCMDWithData:
> **
> adli2c.iSize = sizeof(ADLI2C)
> adli2c.iAction = ADL_DL_I2C_ACTIONWRITE
> adli2c.iAddress = 0xb4
> adli2c.iSpeed = 100
> 0 --
> Dev 0: ADL_Display_WriteAndReadSMUI2C(0, ) = 0
> adli2c.iDataSize = 24
> i2cData[0]~[24]
> 40 51 2c 01 00 00 ff 00 ff ff ff cc 00 cc 00 00 00 ff ff ff ff ff ff ff
>
> From the RGB app's logs this is an example of what the official AMD
> application on Windows is sending when it changes colors on the RGB strip.
>
> From this can it be assumed the AMD card is using the latter method you
> mentioned with the second SMU I2C bus, in which case no driver changes
> would be needed?


IIRC, each AIB/OEM uses its own preferred RGB controller.  The reference board 
just defines which i2c buses can be used.  The RGB control application is just 
a userspace app provided by the AIB/OEM that calls ADL to talk to whichever i2c 
bus the vendor put their RGB controller on.  On Linux you can do something 
similar using the i2c_dev module to open a connection to the i2c bus driver 
provided by the kernel.  I believe that is what openRGB does today.  It looks 
like you already have the programming sequence above.
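
For reference, a minimal sketch of the i2c_dev approach described above. It
assumes the i2c-dev module is loaded and that the RGB controller sits on an
adapter exposed as /dev/i2c-N. The helper names (to_7bit_addr,
i2c_write_block) are hypothetical, and treating the 0xb4 from the ADL log as
an 8-bit write address (7-bit 0x5a) is an assumption that has not been
confirmed for this board. It is not a tested programming sequence; writing to
the wrong bus or address can upset other devices on that bus.

```c
#include <fcntl.h>
#include <linux/i2c.h>
#include <linux/i2c-dev.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Assumption: the address in the ADL log (0xb4) is in 8-bit "write address"
 * form. i2c-dev expects the 7-bit form, which would then be 0xb4 >> 1 = 0x5a.
 */
static uint8_t to_7bit_addr(uint8_t addr8)
{
	return addr8 >> 1;
}

/*
 * Send one write message of len bytes to the device at addr7 on
 * /dev/i2c-<bus>. Returns 0 on success, -1 on any failure.
 */
static int i2c_write_block(int bus, uint8_t addr7, const uint8_t *buf,
			   uint16_t len)
{
	char path[32];
	struct i2c_msg msg;
	struct i2c_rdwr_ioctl_data xfer;
	int fd, ret;

	snprintf(path, sizeof(path), "/dev/i2c-%d", bus);
	fd = open(path, O_RDWR);
	if (fd < 0)
		return -1;

	msg.addr = addr7;
	msg.flags = 0;			/* 0 = write */
	msg.len = len;
	msg.buf = (uint8_t *)buf;

	xfer.msgs = &msg;
	xfer.nmsgs = 1;

	/* On success I2C_RDWR returns the number of messages transferred. */
	ret = ioctl(fd, I2C_RDWR, &xfer);
	close(fd);

	return ret == 1 ? 0 : -1;
}
```

A caller would pass the bus number of the SMU adapter on its own system, the
converted address, and the payload captured from the vendor application. Which
bus number that is has to be determined per machine.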

Alex

>
>
> On Monday, January 8th, 2024 at 5:32 PM, Deucher, Alexander
>  wrote:
>
>
> >
> >
> > [Public]
> >
> > > -Original Message-
> > > From: amd-gfx amd-gfx-boun...@lists.freedesktop.org On Behalf Of
> > > Alexander Koskovich
> > > Sent: Sunday, January 7, 2024 11:19 PM
> > > To: amd-gfx@lists.freedesktop.org
> > > Subject: Documentation for RGB strip on RX 7900 XTX (Reference)
> > >
> > > Hello,
> > >
> > > I was wondering if AMD would be able to provide any documentation for
> > > the RGB strip on the reference cooler
> > > (https://www.amd.com/en/products/graphics/amd-radeon-rx-7900xtx)? It
> > > looks to be handled via I2C commands to the SMU, but having proper
> > > documentation would be extremely helpful.
> >
> >
> > It depends on the AIB/OEM and how they designed the specific board. The
> RGB controller will either be attached to the DDCVGA i2c bus on the display
> hardware or the second SMU i2c bus. The former will require changes to the
> amdgpu display code to register display i2c buses that are not used by the
> display connectors on the board so they can be used by 3rd party applications.
> Currently we only register i2c buses used for display connectors. The latter
> buses are already registered with the i2c subsystem since they are used for
> other things like EEPROMs on server and workstation cards and should be
> available via standard Linux i2c APIs. I'm not sure what i2c LED controllers
> each AIB vendor uses off hand. https://openrgb.org/index.html would probably
> be a good resource for that information.
> >
> > Alex


RE: Documentation for RGB strip on RX 7900 XTX (Reference)

2024-01-08 Thread Deucher, Alexander
[Public]

> -Original Message-
> From: amd-gfx  On Behalf Of
> Alexander Koskovich
> Sent: Sunday, January 7, 2024 11:19 PM
> To: amd-gfx@lists.freedesktop.org
> Subject: Documentation for RGB strip on RX 7900 XTX (Reference)
>
> Hello,
>
> I was wondering if AMD would be able to provide any documentation for the
> RGB strip on the reference cooler
> (https://www.amd.com/en/products/graphics/amd-radeon-rx-7900xtx)? It
> looks to be handled via I2C commands to the SMU, but having proper
> documentation would be extremely helpful.

It depends on the AIB/OEM and how they designed the specific board.  The RGB 
controller will either be attached to the DDCVGA i2c bus on the display 
hardware or the second SMU i2c bus.  The former will require changes to the 
amdgpu display code to register display i2c buses that are not used by the 
display connectors on the board so they can be used by 3rd party applications.  
Currently we only register i2c buses used for display connectors.  The latter 
buses are already registered with the i2c subsystem since they are used for 
other things like EEPROMs on server and workstation cards and should be 
available via standard Linux i2c APIs.  I'm not sure what i2c LED controllers 
each AIB vendor uses off hand.  https://openrgb.org/index.html would probably 
be a good resource for that information.
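
As a rough illustration of "available via standard Linux i2c APIs": the sketch
below lists the i2c adapters the kernel has registered by walking
/sys/bus/i2c/devices and printing each adapter's name attribute. The function
names (parse_bus_number, list_i2c_adapters) are made up for this example, and
what name the SMU bus reports on a given card is not something this sketch
assumes; it only prints whatever the kernel exposes.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Parse "i2c-<N>" and return N, or -1 if the entry has any other form
 * (client devices such as "0-0050" live in the same directory). */
static int parse_bus_number(const char *name)
{
	int n;
	char extra;

	if (sscanf(name, "i2c-%d%c", &n, &extra) != 1 || n < 0)
		return -1;
	return n;
}

/* Print every registered adapter with its name.
 * Returns the number of adapters printed, or -1 if sysfs is unavailable. */
static int list_i2c_adapters(void)
{
	DIR *dir = opendir("/sys/bus/i2c/devices");
	struct dirent *ent;
	int count = 0;

	if (!dir)
		return -1;

	while ((ent = readdir(dir)) != NULL) {
		char path[300], name[128];
		FILE *f;
		int bus = parse_bus_number(ent->d_name);

		if (bus < 0)
			continue;

		snprintf(path, sizeof(path), "/sys/bus/i2c/devices/%s/name",
			 ent->d_name);
		f = fopen(path, "r");
		if (!f)
			continue;

		if (fgets(name, sizeof(name), f)) {
			name[strcspn(name, "\n")] = '\0';
			printf("/dev/i2c-%d: %s\n", bus, name);
			count++;
		}
		fclose(f);
	}

	closedir(dir);
	return count;
}
```

The /dev/i2c-N nodes themselves only appear once the i2c-dev module is loaded;
the sysfs entries exist whenever an adapter is registered.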

Alex



Re: [PATCH 2/2] drm/amdgpu: skip gpu_info fw loading on navi12

2024-01-02 Thread Deucher, Alexander
[AMD Official Use Only - General]

Ping on this series?

Alex

From: Deucher, Alexander 
Sent: Thursday, December 21, 2023 1:11 PM
To: amd-gfx@lists.freedesktop.org 
Cc: Deucher, Alexander 
Subject: [PATCH 2/2] drm/amdgpu: skip gpu_info fw loading on navi12

It's no longer required.

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2318
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 11 ++-
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 9c1ff893c03c..71e8fe2144b4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2251,15 +2251,8 @@ static int amdgpu_device_parse_gpu_info_fw(struct 
amdgpu_device *adev)

 adev->firmware.gpu_info_fw = NULL;

-   if (adev->mman.discovery_bin) {
-   /*
-* FIXME: The bounding box is still needed by Navi12, so
-* temporarily read it from gpu_info firmware. Should be dropped
-* when DAL no longer needs it.
-*/
-   if (adev->asic_type != CHIP_NAVI12)
-   return 0;
-   }
+   if (adev->mman.discovery_bin)
+   return 0;

 switch (adev->asic_type) {
 default:
--
2.42.0



Re: [PATCH] drm/amdgpu: Remove unreachable code in 'atom_skip_src_int()'

2024-01-02 Thread Deucher, Alexander
[Public]

Reviewed-by: Alex Deucher 

From: SHANMUGAM, SRINIVASAN 
Sent: Friday, December 29, 2023 4:43 AM
To: Deucher, Alexander ; Koenig, Christian 
; Kuehling, Felix 
Cc: amd-gfx@lists.freedesktop.org ; SHANMUGAM, 
SRINIVASAN 
Subject: [PATCH] drm/amdgpu: Remove unreachable code in 'atom_skip_src_int()'

Fixes the below:
drivers/gpu/drm/amd/amdgpu/atom.c:398 atom_skip_src_int() warn: ignoring 
unreachable code.

Cc: Christian König 
Cc: Alex Deucher 
Signed-off-by: Srinivasan Shanmugam 
---
 drivers/gpu/drm/amd/amdgpu/atom.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/atom.c 
b/drivers/gpu/drm/amd/amdgpu/atom.c
index 2c221000782c..a33e890c70d9 100644
--- a/drivers/gpu/drm/amd/amdgpu/atom.c
+++ b/drivers/gpu/drm/amd/amdgpu/atom.c
@@ -395,7 +395,6 @@ static void atom_skip_src_int(atom_exec_context *ctx, 
uint8_t attr, int *ptr)
 (*ptr)++;
 return;
 }
-   return;
 }
 }

--
2.34.1



Re: [PATCH 4/5] drm/amdgpu: Query boot status if discovery failed

2024-01-02 Thread Deucher, Alexander
[AMD Official Use Only - General]

Is mmIP_DISCOVERY_VERSION at the same offset across ASIC families?

Alex


From: Hawking Zhang 
Sent: Monday, January 1, 2024 10:43 PM
To: amd-gfx@lists.freedesktop.org ; Zhou1, Tao 
; Yang, Stanley ; Wang, Yang(Kevin) 
; Chai, Thomas ; Li, Candice 

Cc: Zhang, Hawking ; Deucher, Alexander 
; Lazar, Lijo ; Ma, Le 

Subject: [PATCH 4/5] drm/amdgpu: Query boot status if discovery failed

Check and report boot status if discovery failed.

Signed-off-by: Hawking Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index b8fde08aec8e..302b71e9f1e2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -27,6 +27,7 @@
 #include "amdgpu_discovery.h"
 #include "soc15_hw_ip.h"
 #include "discovery.h"
+#include "amdgpu_ras.h"

 #include "soc15.h"
 #include "gfx_v9_0.h"
@@ -98,6 +99,7 @@
 #define FIRMWARE_IP_DISCOVERY "amdgpu/ip_discovery.bin"
 MODULE_FIRMWARE(FIRMWARE_IP_DISCOVERY);

+#define mmIP_DISCOVERY_VERSION  0x16A00
 #define mmRCC_CONFIG_MEMSIZE0xde3
 #define mmMP0_SMN_C2PMSG_33 0x16061
 #define mmMM_INDEX  0x0
@@ -518,7 +520,9 @@ static int amdgpu_discovery_init(struct amdgpu_device *adev)
 out:
 kfree(adev->mman.discovery_bin);
 adev->mman.discovery_bin = NULL;
-
+   if ((amdgpu_discovery != 2) &&
+   (RREG32(mmIP_DISCOVERY_VERSION) == 4))
+   amdgpu_ras_query_boot_status(adev, 4);
 return r;
 }

--
2.17.1



Re: [PATCH v2] drm/amd: Add missing definitions for `SMU_MAX_LEVELS_VDDGFX`

2023-12-15 Thread Deucher, Alexander
[Public]

Would be cleaner to just add to the SMU_MAX_LEVELS_VDDC case.  E.g.,

   case SMU_MAX_LEVELS_VDDC:
+   case SMU_MAX_LEVELS_VDDGFX:
   return SMU71_MAX_LEVELS_VDDC;

With that change, the patch is:
Reviewed-by: Alex Deucher 
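
A standalone sketch of the stacked-case form suggested above, so the pattern
can be compiled outside the driver. Everything prefixed DEMO_ or demo_ is a
hypothetical stand-in; the numeric values are illustrative only and are not
taken from the kernel headers.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the driver's selectors and limits. */
enum demo_max_levels {
	DEMO_MAX_LEVELS_VDDC = 1,
	DEMO_MAX_LEVELS_VDDGFX,
	DEMO_MAX_LEVELS_VDDCI,
};

#define DEMO_SMU71_MAX_LEVELS_VDDC   8
#define DEMO_SMU71_MAX_LEVELS_VDDCI  4

static uint32_t demo_get_mac_definition(uint32_t value)
{
	switch (value) {
	case DEMO_MAX_LEVELS_VDDC:
	case DEMO_MAX_LEVELS_VDDGFX:	/* VDDGFX shares the VDDC limit */
		return DEMO_SMU71_MAX_LEVELS_VDDC;
	case DEMO_MAX_LEVELS_VDDCI:
		return DEMO_SMU71_MAX_LEVELS_VDDCI;
	}

	return 0;	/* unknown selector */
}
```

Stacking the two labels keeps a single return statement for the shared limit,
so the VDDGFX and VDDC results cannot drift apart later.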

From: amd-gfx  on behalf of Mario 
Limonciello 
Sent: Friday, December 15, 2023 3:37 PM
To: amd-gfx@lists.freedesktop.org 
Cc: Limonciello, Mario 
Subject: [PATCH v2] drm/amd: Add missing definitions for `SMU_MAX_LEVELS_VDDGFX`

It is reported that on a Topaz dGPU the kernel emits:
amdgpu: can't get the mac of 5

This is because there is no definition for max levels of VDDGFX
declared for SMU71 or SMU7. The correct definition is VDDC so
use this.

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3049
Signed-off-by: Mario Limonciello 
---
v1->v2:
 * s/VDDGFX/VDDC/
---
 drivers/gpu/drm/amd/pm/powerplay/smumgr/ci_smumgr.c  | 2 ++
 drivers/gpu/drm/amd/pm/powerplay/smumgr/iceland_smumgr.c | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/pm/powerplay/smumgr/ci_smumgr.c 
b/drivers/gpu/drm/amd/pm/powerplay/smumgr/ci_smumgr.c
index 9e4228232f02..afe5e18f28db 100644
--- a/drivers/gpu/drm/amd/pm/powerplay/smumgr/ci_smumgr.c
+++ b/drivers/gpu/drm/amd/pm/powerplay/smumgr/ci_smumgr.c
@@ -2303,6 +2303,8 @@ static uint32_t ci_get_mac_definition(uint32_t value)
 return SMU7_MAX_LEVELS_VDDCI;
 case SMU_MAX_LEVELS_MVDD:
 return SMU7_MAX_LEVELS_MVDD;
+   case SMU_MAX_LEVELS_VDDGFX:
+   return SMU7_MAX_LEVELS_VDDC;
 }

 pr_debug("can't get the mac of %x\n", value);
diff --git a/drivers/gpu/drm/amd/pm/powerplay/smumgr/iceland_smumgr.c 
b/drivers/gpu/drm/amd/pm/powerplay/smumgr/iceland_smumgr.c
index 97d9802fe673..b4b2a3c96679 100644
--- a/drivers/gpu/drm/amd/pm/powerplay/smumgr/iceland_smumgr.c
+++ b/drivers/gpu/drm/amd/pm/powerplay/smumgr/iceland_smumgr.c
@@ -2268,6 +2268,8 @@ static uint32_t iceland_get_mac_definition(uint32_t value)
 return SMU71_MAX_LEVELS_VDDCI;
 case SMU_MAX_LEVELS_MVDD:
 return SMU71_MAX_LEVELS_MVDD;
+   case SMU_MAX_LEVELS_VDDGFX:
+   return SMU71_MAX_LEVELS_VDDC;
 }

 pr_warn("can't get the mac of %x\n", value);
--
2.34.1



Re: [PATCH 2/2] Documentation/amdgpu: Remove a spurious character

2023-12-15 Thread Deucher, Alexander
[Public]

Series is:
Reviewed-by: Alex Deucher 

From: amd-gfx  on behalf of Mario 
Limonciello 
Sent: Friday, December 15, 2023 4:46 PM
To: amd-gfx@lists.freedesktop.org 
Cc: Limonciello, Mario 
Subject: [PATCH 2/2] Documentation/amdgpu: Remove a spurious character

`/` wasn't meant to be in the Dragon Range line

Signed-off-by: Mario Limonciello 
---
 Documentation/gpu/amdgpu/apu-asic-info-table.csv | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/gpu/amdgpu/apu-asic-info-table.csv 
b/Documentation/gpu/amdgpu/apu-asic-info-table.csv
index b8ada69919f3..18868abe2a91 100644
--- a/Documentation/gpu/amdgpu/apu-asic-info-table.csv
+++ b/Documentation/gpu/amdgpu/apu-asic-info-table.csv
@@ -7,7 +7,7 @@ SteamDeck, VANGOGH, DCN 3.0.1, 10.3.1, VCN 3.1.0, 5.2.1, 11.5.0
 Ryzen 5000 series / Ryzen 7x30 series, GREEN SARDINE / Cezanne / Barcelo / 
Barcelo-R, DCN 2.1, 9.3, VCN 2.2, 4.1.1, 12.0.1
 Ryzen 6000 series / Ryzen 7x35 series / Ryzen 7x36 series, YELLOW CARP / 
Rembrandt / Rembrandt-R, 3.1.2, 10.3.3, VCN 3.1.1, 5.2.3, 13.0.3
 Ryzen 7000 series (AM5), Raphael, 3.1.5, 10.3.6, 3.1.2, 5.2.6, 13.0.5
-Ryzen 7x45 series (FL1), / Dragon Range, 3.1.5, 10.3.6, 3.1.2, 5.2.6, 13.0.5
+Ryzen 7x45 series (FL1), Dragon Range, 3.1.5, 10.3.6, 3.1.2, 5.2.6, 13.0.5
 Ryzen 7x20 series, Mendocino, 3.1.6, 10.3.7, 3.1.1, 5.2.7, 13.0.8
 Ryzen 7x40 series, Phoenix, 3.1.4, 11.0.1 / 11.0.4, 4.0.2, 6.0.1, 13.0.4 / 
13.0.11
 Ryzen 8x40 series, Hawk Point, 3.1.4, 11.0.1 / 11.0.4, 4.0.2, 6.0.1, 13.0.4 / 
13.0.11
--
2.34.1



Re: [PATCH] drm/amd: Add missing definitions for `SMU_MAX_LEVELS_VDDGFX`

2023-12-15 Thread Deucher, Alexander
[Public]

VDDGFX should be matched to VDDC (e.g., SMU7_MAX_LEVELS_VDDC).

Alex

From: amd-gfx  on behalf of Mario 
Limonciello 
Sent: Thursday, December 14, 2023 4:11 PM
To: amd-gfx@lists.freedesktop.org 
Cc: Limonciello, Mario 
Subject: [PATCH] drm/amd: Add missing definitions for `SMU_MAX_LEVELS_VDDGFX`

It is reported that on a Topaz dGPU the kernel emits:
amdgpu: can't get the mac of 5

This is because there is no definition for max levels of VDDGFX
declared for SMU71 or SMU7. There is however an unused definition of
VDDNB. Use this to return the max levels for VDDGFX.

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3049
Signed-off-by: Mario Limonciello 
---
 drivers/gpu/drm/amd/pm/powerplay/smumgr/ci_smumgr.c  | 2 ++
 drivers/gpu/drm/amd/pm/powerplay/smumgr/iceland_smumgr.c | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/pm/powerplay/smumgr/ci_smumgr.c 
b/drivers/gpu/drm/amd/pm/powerplay/smumgr/ci_smumgr.c
index 9e4228232f02..c5bccd382196 100644
--- a/drivers/gpu/drm/amd/pm/powerplay/smumgr/ci_smumgr.c
+++ b/drivers/gpu/drm/amd/pm/powerplay/smumgr/ci_smumgr.c
@@ -2303,6 +2303,8 @@ static uint32_t ci_get_mac_definition(uint32_t value)
 return SMU7_MAX_LEVELS_VDDCI;
 case SMU_MAX_LEVELS_MVDD:
 return SMU7_MAX_LEVELS_MVDD;
+   case SMU_MAX_LEVELS_VDDGFX:
+   return SMU7_MAX_LEVELS_VDDNB;
 }

 pr_debug("can't get the mac of %x\n", value);
diff --git a/drivers/gpu/drm/amd/pm/powerplay/smumgr/iceland_smumgr.c 
b/drivers/gpu/drm/amd/pm/powerplay/smumgr/iceland_smumgr.c
index 97d9802fe673..c9115eaa63c4 100644
--- a/drivers/gpu/drm/amd/pm/powerplay/smumgr/iceland_smumgr.c
+++ b/drivers/gpu/drm/amd/pm/powerplay/smumgr/iceland_smumgr.c
@@ -2268,6 +2268,8 @@ static uint32_t iceland_get_mac_definition(uint32_t value)
 return SMU71_MAX_LEVELS_VDDCI;
 case SMU_MAX_LEVELS_MVDD:
 return SMU71_MAX_LEVELS_MVDD;
+   case SMU_MAX_LEVELS_VDDGFX:
+   return SMU71_MAX_LEVELS_VDDNB;
 }

 pr_warn("can't get the mac of %x\n", value);
--
2.34.1
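
For readers following the thread, the failure mode described in the commit message can be sketched in isolation. This is only a minimal model, not the powerplay code: every DEMO_* name and every numeric value below is a made-up placeholder, and which ASIC constant the VDDGFX case should return is the point under review above, so the sketch takes no position on it.

```c
#include <stdint.h>
#include <stdio.h>

/* Generic identifiers; names and values are illustrative assumptions. */
enum demo_limit_id {
	DEMO_MAX_LEVELS_VDDC = 1,
	DEMO_MAX_LEVELS_MVDD = 2,
	DEMO_MAX_LEVELS_VDDGFX = 3,
};

/* Per-ASIC limits; placeholder values, not taken from the SMU headers. */
#define DEMO_ASIC_MAX_LEVELS_VDDC   8u
#define DEMO_ASIC_MAX_LEVELS_MVDD   4u
#define DEMO_ASIC_MAX_LEVELS_VDDGFX 8u

/*
 * Map a generic identifier to the ASIC-specific limit.
 * With handle_vddgfx == 0 the VDDGFX case is effectively missing, as it
 * was before the patch, so the lookup falls through to the message and
 * returns 0.
 */
static uint32_t demo_get_mac_definition(uint32_t value, int handle_vddgfx)
{
	switch (value) {
	case DEMO_MAX_LEVELS_VDDC:
		return DEMO_ASIC_MAX_LEVELS_VDDC;
	case DEMO_MAX_LEVELS_MVDD:
		return DEMO_ASIC_MAX_LEVELS_MVDD;
	case DEMO_MAX_LEVELS_VDDGFX:
		if (handle_vddgfx)
			return DEMO_ASIC_MAX_LEVELS_VDDGFX;
		break;
	}

	fprintf(stderr, "can't get the mac of %x\n", (unsigned int)value);
	return 0;
}
```

Calling demo_get_mac_definition(DEMO_MAX_LEVELS_VDDGFX, 0) prints the message and yields 0, while passing 1 as the second argument returns the mapped limit silently.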



RE: Crashes under Xen with Radeon graphics card

2023-12-15 Thread Deucher, Alexander
[AMD Official Use Only - General]

> -Original Message-
> From: Juergen Gross 
> Sent: Friday, December 15, 2023 11:13 AM
> To: Deucher, Alexander ; lkml ; xen-de...@lists.xenproject.org;
> amd-g...@lists.freedesktop.org
> Cc: Koenig, Christian ; Pan, Xinhui
> 
> Subject: Re: Crashes under Xen with Radeon graphics card
>
> On 15.12.23 17:04, Deucher, Alexander wrote:
> > [Public]
> >
> >> -Original Message-
> >> From: Juergen Gross 
> >> Sent: Friday, December 15, 2023 6:57 AM
> >> To: lkml ;
> >> xen-de...@lists.xenproject.org; amd- g...@lists.freedesktop.org
> >> Cc: Deucher, Alexander ; Koenig, Christian
> >> ; Pan, Xinhui 
> >> Subject: Crashes under Xen with Radeon graphics card
> >>
> >> Hi,
> >>
> >> I recently stumbled over a test system which showed crashes probably
> >> resulting from memory being overwritten randomly.
> >>
> >> The problem is occurring only in Dom0 when running under Xen. It
> >> seems to be present since at least kernel 6.3 (I didn't go back
> >> further yet), and it seems NOT to be present in kernel 5.14.
> >>
> >> I tracked the problem down to the initialization of the graphics card
> >> (the problem might surface only later, but at least an early
> >> initialization failure made the problem go away).
> >>
> >> # lspci
> >> 01:00.0 VGA compatible controller: Advanced Micro Devices, Inc.
> >> [AMD/ATI] Caicos XTX [Radeon HD 8490 / R5 235X OEM]
> >> 01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Caicos
> >> HDMI Audio [Radeon HD 6450 / 7450/8450/8490 OEM / R5
> 230/235/235X
> >> OEM]
> >>
> >> I had a working .config and one which did produce the crashes, so I
> >> narrowed the problem down to detect that the important difference was
> >> in the area of firmware loading (the working .config didn't have
> >> CONFIG_FW_LOADER_COMPRESS_XZ set, causing firmware loading for the
> >> card to fail). This was of course not the real problem, but it caused
> >> the card initialization to fail.
> >>
> >> I manually decompressed the firmware files one by one to see whether
> >> the problem would be in the decompressor or probably in the driver of the
> card.
> >>
> >> The last step without crash was:
> >>
> >> # dmesg | grep radeon
> >> [   10.106405] [drm] radeon kernel modesetting enabled.
> >> [   10.106455] radeon :01:00.0: vgaarb: deactivate vga console
> >> [   10.222944] radeon :01:00.0: VRAM: 1024M
> 0x
> >> -
> >> 0x3FFF (1024M used)
> >> [   10.252921] radeon :01:00.0: GTT: 1024M 0x4000
> -
> >> 0x7FFF
> >> [   10.278255] [drm] radeon: 1024M of VRAM memory ready
> >> [   10.295828] [drm] radeon: 1024M of GTT memory ready.
> >> [   10.295867] radeon :01:00.0: Direct firmware load for
> >> radeon/CAICOS_pfp.bin succeeded
> >> [   10.330846] radeon :01:00.0: Direct firmware load for
> >> radeon/CAICOS_me.bin succeeded
> >> [   10.330858] radeon :01:00.0: Direct firmware load for
> >> radeon/BTC_rlc.bin
> >> succeeded
> >> [   10.330870] radeon :01:00.0: Direct firmware load for
> >> radeon/CAICOS_mc.bin failed with error -2
> >> [   10.380979] ni_cp: Failed to load firmware "radeon/CAICOS_mc.bin"
> >> [   10.381006] [drm:evergreen_init [radeon]] *ERROR* Failed to load
> >> firmware!
> >> [   10.405765] radeon :01:00.0: Fatal error during GPU init
> >> [   10.432107] [drm] radeon: finishing device.
> >> [   10.439179] [drm] radeon: ttm finalized
> >> [   10.463203] radeon: probe of :01:00.0 failed with error -2
> >>
> >> And with decompressing radeon/CAICOS_mc.bin I got:
> >>
> >> # dmesg | grep radeon
> >> [   10.266491] [drm] radeon kernel modesetting enabled.
> >> [   10.266552] radeon :01:00.0: vgaarb: deactivate vga console
> >> [   10.456047] radeon :01:00.0: VRAM: 1024M
> 0x
> >> -
> >> 0x3FFF (1024M used)
> >> [   10.470270] radeon :01:00.0: GTT: 1024M 0x4000
> -
> >> 0x7FFF
> >> [   10.566946] [drm] radeon: 1024M of VRAM memory ready
> >> [   10.576891] [drm] radeon: 1024M of GTT memory ready.
> >> [   10.586971] radeon :01:00.0: Direct firmware load for
> >> radeon/CAIC

RE: Crashes under Xen with Radeon graphics card

2023-12-15 Thread Deucher, Alexander
[Public]

> -Original Message-
> From: Juergen Gross 
> Sent: Friday, December 15, 2023 6:57 AM
> To: lkml ; xen-de...@lists.xenproject.org; amd-
> g...@lists.freedesktop.org
> Cc: Deucher, Alexander ; Koenig, Christian
> ; Pan, Xinhui 
> Subject: Crashes under Xen with Radeon graphics card
>
> Hi,
>
> I recently stumbled over a test system which showed crashes probably
> resulting from memory being overwritten randomly.
>
> The problem is occurring only in Dom0 when running under Xen. It seems to
> be present since at least kernel 6.3 (I didn't go back further yet), and it 
> seems
> NOT to be present in kernel 5.14.
>
> I tracked the problem down to the initialization of the graphics card (the
> problem might surface only later, but at least an early initialization 
> failure made
> the problem go away).
>
> # lspci
> 01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI]
> Caicos XTX [Radeon HD 8490 / R5 235X OEM]
> 01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Caicos HDMI
> Audio [Radeon HD 6450 / 7450/8450/8490 OEM / R5 230/235/235X OEM]
>
> I had a working .config and one which did produce the crashes, so I narrowed
> the problem down to detect that the important difference was in the area of
> firmware loading (the working .config didn't have
> CONFIG_FW_LOADER_COMPRESS_XZ set, causing firmware loading for the
> card to fail). This was of course not the real problem, but it caused the card
> initialization to fail.
>
> I manually decompressed the firmware files one by one to see whether the
> problem would be in the decompressor or probably in the driver of the card.
>
> The last step without crash was:
>
> # dmesg | grep radeon
> [   10.106405] [drm] radeon kernel modesetting enabled.
> [   10.106455] radeon :01:00.0: vgaarb: deactivate vga console
> [   10.222944] radeon :01:00.0: VRAM: 1024M 0x
> -
> 0x3FFF (1024M used)
> [   10.252921] radeon :01:00.0: GTT: 1024M 0x4000 -
> 0x7FFF
> [   10.278255] [drm] radeon: 1024M of VRAM memory ready
> [   10.295828] [drm] radeon: 1024M of GTT memory ready.
> [   10.295867] radeon :01:00.0: Direct firmware load for
> radeon/CAICOS_pfp.bin succeeded
> [   10.330846] radeon :01:00.0: Direct firmware load for
> radeon/CAICOS_me.bin succeeded
> [   10.330858] radeon :01:00.0: Direct firmware load for
> radeon/BTC_rlc.bin
> succeeded
> [   10.330870] radeon :01:00.0: Direct firmware load for
> radeon/CAICOS_mc.bin failed with error -2
> [   10.380979] ni_cp: Failed to load firmware "radeon/CAICOS_mc.bin"
> [   10.381006] [drm:evergreen_init [radeon]] *ERROR* Failed to load
> firmware!
> [   10.405765] radeon :01:00.0: Fatal error during GPU init
> [   10.432107] [drm] radeon: finishing device.
> [   10.439179] [drm] radeon: ttm finalized
> [   10.463203] radeon: probe of :01:00.0 failed with error -2
>
> And with decompressing radeon/CAICOS_mc.bin I got:
>
> # dmesg | grep radeon
> [   10.266491] [drm] radeon kernel modesetting enabled.
> [   10.266552] radeon :01:00.0: vgaarb: deactivate vga console
> [   10.456047] radeon :01:00.0: VRAM: 1024M 0x
> -
> 0x3FFF (1024M used)
> [   10.470270] radeon :01:00.0: GTT: 1024M 0x4000 -
> 0x7FFF
> [   10.566946] [drm] radeon: 1024M of VRAM memory ready
> [   10.576891] [drm] radeon: 1024M of GTT memory ready.
> [   10.586971] radeon :01:00.0: Direct firmware load for
> radeon/CAICOS_pfp.bin succeeded
> [   10.611886] radeon :01:00.0: Direct firmware load for
> radeon/CAICOS_me.bin succeeded
> [   10.611909] radeon :01:00.0: Direct firmware load for
> radeon/BTC_rlc.bin
> succeeded
> [   10.611938] radeon :01:00.0: Direct firmware load for
> radeon/CAICOS_mc.bin succeeded
> [   10.660599] radeon :01:00.0: Direct firmware load for
> radeon/CAICOS_smc.bin failed with error -2
> [   10.660601] smc: error loading firmware "radeon/CAICOS_smc.bin"

You also need to make sure CAICOS_smc.bin is available.

> [   10.661676] [drm] radeon: power management initialized
> [   10.713666] radeon :01:00.0: Direct firmware load for
> radeon/SUMO_uvd.bin
> failed with error -2
> [   10.713668] radeon :01:00.0: radeon_uvd: Can't load firmware
> "radeon/SUMO_uvd.bin"
> [   10.713669] radeon :01:00.0: failed UVD (-2) init.

And SUMO_uvd.bin.

> [   10.714787] [drm] enabling PCIE gen 2 link speeds, disable with
> radeon.pcie_gen2=0
> [   10.809213] radeon :01:00.0: WB enabled
> [   10.817528] radeon :01:00.0: fence driver on ring 0 use gpu addr
> 0x4c00
> [   1

RE: [PATCH] drm/amd/display: fix documentation for dm_crtc_additional_color_mgmt()

2023-12-14 Thread Deucher, Alexander
[Public]

> -Original Message-
> From: Melissa Wen 
> Sent: Thursday, December 14, 2023 2:45 PM
> To: Wentland, Harry ; Li, Sun peng (Leo)
> ; Siqueira, Rodrigo ;
> Deucher, Alexander ; Koenig, Christian
> ; Pan, Xinhui ;
> airl...@gmail.com; dan...@ffwll.ch
> Cc: kernel test robot ; amd-gfx@lists.freedesktop.org; dri-
> de...@lists.freedesktop.org; kernel-...@igalia.com
> Subject: [PATCH] drm/amd/display: fix documentation for
> dm_crtc_additional_color_mgmt()
>
> warning: expecting prototype for drm_crtc_additional_color_mgmt().
> Prototype was for dm_crtc_additional_color_mgmt() instead
>
> Reported-by: kernel test robot 
> Closes: https://lore.kernel.org/oe-kbuild-all/202312141801.o9eBCxt9-
> l...@intel.com/
> Signed-off-by: Melissa Wen 

Applied.  Thanks!

Alex

> ---
>  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c
> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c
> index 8b3aa674741d..4439e5a27362 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c
> @@ -292,7 +292,7 @@ static int amdgpu_dm_crtc_late_register(struct
> drm_crtc *crtc)
>
>  #ifdef AMD_PRIVATE_COLOR
>  /**
> - * drm_crtc_additional_color_mgmt - enable additional color properties
> + * dm_crtc_additional_color_mgmt - enable additional color properties
>   * @crtc: DRM CRTC
>   *
>   * This function lets the driver enable post-blending CRTC regamma transfer
> --
> 2.42.0



RE: [PATCH 2/3] drm/amdgpu/sdma5.0: add begin/end_use ring callbacks

2023-12-08 Thread Deucher, Alexander
[Public]

> -Original Message-
> From: Deucher, Alexander 
> Sent: Friday, December 8, 2023 5:19 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander 
> Subject: [PATCH 2/3] drm/amdgpu/sdma5.0: add begin/end_use ring
> callbacks
>
> Add begin/end_use ring callbacks to disallow GFXOFF when SDMA work is
> submitted and allow it again afterward.
>
> Signed-off-by: Alex Deucher 

This one can probably be dropped.  It's only needed if anyone on navi1x is 
experiencing a similar issue.

Alex

> ---
>  drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
> b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
> index 5c1bb6d07a76..1a68cd2de522 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
> @@ -1790,6 +1790,8 @@ static const struct amdgpu_ring_funcs
> sdma_v5_0_ring_funcs = {
>   .test_ib = sdma_v5_0_ring_test_ib,
>   .insert_nop = sdma_v5_0_ring_insert_nop,
>   .pad_ib = sdma_v5_0_ring_pad_ib,
> + .begin_use = amdgpu_sdma_ring_begin_use,
> + .end_use = amdgpu_sdma_ring_end_use,
>   .emit_wreg = sdma_v5_0_ring_emit_wreg,
>   .emit_reg_wait = sdma_v5_0_ring_emit_reg_wait,
>   .emit_reg_write_reg_wait =
> sdma_v5_0_ring_emit_reg_write_reg_wait,
> --
> 2.42.0



Re: [PATCH] drm/amd/pm: fix pp_*clk_od typo

2023-12-06 Thread Deucher, Alexander
[AMD Official Use Only - General]

Reviewed-by: Alex Deucher 

From: amd-gfx  on behalf of Dmitrii 
Galantsev 
Sent: Wednesday, December 6, 2023 2:39 PM
To: amd-gfx@lists.freedesktop.org 
Cc: Galantsev, Dmitrii 
Subject: [PATCH] drm/amd/pm: fix pp_*clk_od typo

Fix pp_dpm_sclk_od and pp_dpm_mclk_od typos.
Those were defined as pp_*clk_od but used as pp_dpm_*clk_od instead.
This change removes the _dpm part.

Signed-off-by: Dmitrii Galantsev 
---
 drivers/gpu/drm/amd/pm/amdgpu_pm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c 
b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
index e1497296afee..2cd995b0ceba 100644
--- a/drivers/gpu/drm/amd/pm/amdgpu_pm.c
+++ b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
@@ -2238,10 +2238,10 @@ static int default_attr_update(struct amdgpu_device 
*adev, struct amdgpu_device_
 } else if (DEVICE_ATTR_IS(xgmi_plpd_policy)) {
 if (amdgpu_dpm_get_xgmi_plpd_mode(adev, NULL) == 
XGMI_PLPD_NONE)
 *states = ATTR_STATE_UNSUPPORTED;
-   } else if (DEVICE_ATTR_IS(pp_dpm_mclk_od)) {
+   } else if (DEVICE_ATTR_IS(pp_mclk_od)) {
 if (amdgpu_dpm_get_mclk_od(adev) == -EOPNOTSUPP)
 *states = ATTR_STATE_UNSUPPORTED;
-   } else if (DEVICE_ATTR_IS(pp_dpm_sclk_od)) {
+   } else if (DEVICE_ATTR_IS(pp_sclk_od)) {
 if (amdgpu_dpm_get_sclk_od(adev) == -EOPNOTSUPP)
 *states = ATTR_STATE_UNSUPPORTED;
 } else if (DEVICE_ATTR_IS(apu_thermal_cap)) {
--
2.43.0



RE: [PATCH 1/6] Revert "drm/prime: Unexport helpers for fd/handle conversion"

2023-12-04 Thread Deucher, Alexander
[AMD Official Use Only - General]

> -Original Message-
> From: Kuehling, Felix 
> Sent: Friday, December 1, 2023 6:40 PM
> To: amd-gfx@lists.freedesktop.org; dri-de...@lists.freedesktop.org; Deucher,
> Alexander 
> Cc: Daniel Vetter ; Koenig, Christian
> ; Thomas Zimmermann
> 
> Subject: Re: [PATCH 1/6] Revert "drm/prime: Unexport helpers for fd/handle
> conversion"
>
> Hi Alex,
>
> I'm about to push patches 1-3 to the rebased amd-staging-drm-next. It would
> be good to get patch 1 into drm-fixes so that Linux 6.6 will be the only 
> kernel
> without these prime helpers. That would minimize the hassle for DKMS driver
> installations on future distros.

Already done:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0514f63cfff38a0dcb7ba9c5f245827edc0c5107

Alex

>
> Thanks,
>Felix
>
>
> On 2023-12-01 18:34, Felix Kuehling wrote:
> > This reverts commit 71a7974ac7019afeec105a54447ae1dc7216cbb3.
> >
> > These helper functions are needed for KFD to export and import DMABufs
> > the right way without duplicating the tracking of DMABufs associated
> > with GEM objects while ensuring that move notifier callbacks are
> > working as intended.
> >
> > Acked-by: Christian König 
> > Acked-by: Thomas Zimmermann 
> > Acked-by: Daniel Vetter 
> > Signed-off-by: Felix Kuehling 
> > ---
> >   drivers/gpu/drm/drm_prime.c | 33 ++---
> >   include/drm/drm_prime.h |  7 +++
> >   2 files changed, 25 insertions(+), 15 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
> > index 63b709a67471..834a5e28abbe 100644
> > --- a/drivers/gpu/drm/drm_prime.c
> > +++ b/drivers/gpu/drm/drm_prime.c
> > @@ -278,7 +278,7 @@ void drm_gem_dmabuf_release(struct dma_buf
> *dma_buf)
> >   }
> >   EXPORT_SYMBOL(drm_gem_dmabuf_release);
> >
> > -/*
> > +/**
> >* drm_gem_prime_fd_to_handle - PRIME import function for GEM drivers
> >* @dev: drm_device to import into
> >* @file_priv: drm file-private structure @@ -292,9 +292,9 @@
> > EXPORT_SYMBOL(drm_gem_dmabuf_release);
> >*
> >* Returns 0 on success or a negative error code on failure.
> >*/
> > -static int drm_gem_prime_fd_to_handle(struct drm_device *dev,
> > - struct drm_file *file_priv, int prime_fd,
> > - uint32_t *handle)
> > +int drm_gem_prime_fd_to_handle(struct drm_device *dev,
> > +  struct drm_file *file_priv, int prime_fd,
> > +  uint32_t *handle)
> >   {
> > struct dma_buf *dma_buf;
> > struct drm_gem_object *obj;
> > @@ -360,6 +360,7 @@ static int drm_gem_prime_fd_to_handle(struct
> drm_device *dev,
> > dma_buf_put(dma_buf);
> > return ret;
> >   }
> > +EXPORT_SYMBOL(drm_gem_prime_fd_to_handle);
> >
> >   int drm_prime_fd_to_handle_ioctl(struct drm_device *dev, void *data,
> >  struct drm_file *file_priv)
> > @@ -408,7 +409,7 @@ static struct dma_buf
> *export_and_register_object(struct drm_device *dev,
> > return dmabuf;
> >   }
> >
> > -/*
> > +/**
> >* drm_gem_prime_handle_to_fd - PRIME export function for GEM drivers
> >* @dev: dev to export the buffer from
> >* @file_priv: drm file-private structure @@ -421,10 +422,10 @@
> > static struct dma_buf *export_and_register_object(struct drm_device *dev,
> >* The actual exporting from GEM object to a dma-buf is done through the
> >* _gem_object_funcs.export callback.
> >*/
> > -static int drm_gem_prime_handle_to_fd(struct drm_device *dev,
> > - struct drm_file *file_priv, uint32_t 
> > handle,
> > - uint32_t flags,
> > - int *prime_fd)
> > +int drm_gem_prime_handle_to_fd(struct drm_device *dev,
> > +  struct drm_file *file_priv, uint32_t handle,
> > +  uint32_t flags,
> > +  int *prime_fd)
> >   {
> > struct drm_gem_object *obj;
> > int ret = 0;
> > @@ -506,6 +507,7 @@ static int drm_gem_prime_handle_to_fd(struct
> > drm_device *dev,
> >
> > return ret;
> >   }
> > +EXPORT_SYMBOL(drm_gem_prime_handle_to_fd);
> >
> >   int drm_prime_handle_to_fd_ioctl(struct drm_device *dev, void *data,
> >  struct drm_file *file_priv)
>

RE: [PATCH] drm/amd/swsmu: update smu v14_0_0 driver if version and metrics table

2023-12-04 Thread Deucher, Alexander
[Public]

> -Original Message-
> From: Ma, Li 
> Sent: Monday, December 4, 2023 3:52 AM
> To: Deucher, Alexander ; amd-
> g...@lists.freedesktop.org
> Cc: Koenig, Christian ; Zhang, Yifan
> ; Yu, Lang ; Wang,
> Yang(Kevin) 
> Subject: RE: [PATCH] drm/amd/swsmu: update smu v14_0_0 driver if version
> and metrics table
>
> [Public]
>
> Hi Alex,
>
> Sorry for the late reply. Only smu14 uses this gpu_metrics_v3_0 struct, and
> the patch is already upstream. As far as I know, umr uses gpu_metrics_v3_0,
> and I will submit a patch to umr.
> Does this struct need to be backward compatible at this point? If yes, I
> will revert this patch and add a new gpu_metrics_v3_1.

Ok.  If we don't yet have a released kernel with v3_0 support we should be 
fine.  I'll just include the updates in 6.7.

Alex

>
> Best Regards,
> Li
>
> -Original Message-
> From: Deucher, Alexander 
> Sent: Tuesday, November 28, 2023 4:47 AM
> To: Ma, Li ; amd-gfx@lists.freedesktop.org
> Cc: Koenig, Christian ; Zhang, Yifan
> ; Yu, Lang 
> Subject: RE: [PATCH] drm/amd/swsmu: update smu v14_0_0 driver if version
> and metrics table
>
> [Public]
>
> > -Original Message-----
> > From: Ma, Li 
> > Sent: Thursday, November 23, 2023 5:07 AM
> > To: amd-gfx@lists.freedesktop.org
> > Cc: Deucher, Alexander ; Koenig, Christian
> > ; Zhang, Yifan ; Yu,
> > Lang ; Ma, Li 
> > Subject: [PATCH] drm/amd/swsmu: update smu v14_0_0 driver if version
> > and metrics table
> >
> > Increment the driver if version and add new members to the metrics table.
> >
> > Signed-off-by: Li Ma 
> > ---
> >  .../gpu/drm/amd/include/kgd_pp_interface.h| 17 
> >  drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h | 10 +++
> >  .../inc/pmfw_if/smu14_driver_if_v14_0_0.h | 77 +++
> >  .../drm/amd/pm/swsmu/smu14/smu_v14_0_0_ppt.c  | 46 ++-
> >  4 files changed, 115 insertions(+), 35 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/include/kgd_pp_interface.h
> > b/drivers/gpu/drm/amd/include/kgd_pp_interface.h
> > index 8ebba87f4289..eaea1c65e526 100644
> > --- a/drivers/gpu/drm/amd/include/kgd_pp_interface.h
> > +++ b/drivers/gpu/drm/amd/include/kgd_pp_interface.h
> > @@ -1086,6 +1086,10 @@ struct gpu_metrics_v3_0 {
> >   uint16_taverage_dram_reads;
> >   /* time filtered DRAM write bandwidth [MB/sec] */
> >   uint16_taverage_dram_writes;
> > + /* time filtered IPU read bandwidth [MB/sec] */
> > + uint16_taverage_ipu_reads;
> > + /* time filtered IPU write bandwidth [MB/sec] */
> > + uint16_taverage_ipu_writes;
> >
> >   /* Driver attached timestamp (in ns) */
> >   uint64_tsystem_clock_counter;
> > @@ -1105,6 +1109,8 @@ struct gpu_metrics_v3_0 {
> >   uint32_taverage_all_core_power;
> >   /* calculated core power [mW] */
> >   uint16_taverage_core_power[16];
> > + /* time filtered total system power [mW] */
> > + uint16_taverage_sys_power;
> >   /* maximum IRM defined STAPM power limit [mW] */
> >   uint16_tstapm_power_limit;
> >   /* time filtered STAPM power limit [mW] */ @@ -1117,6 +1123,8 @@
> > struct gpu_metrics_v3_0 {
> >   uint16_taverage_ipuclk_frequency;
> >   uint16_taverage_fclk_frequency;
> >   uint16_taverage_vclk_frequency;
> > + uint16_taverage_uclk_frequency;
> > + uint16_taverage_mpipu_frequency;
> >
> >   /* Current clocks */
> >   /* target core frequency [MHz] */ @@ -1126,6 +1134,15 @@ struct
> > gpu_metrics_v3_0 {
> >   /* GFXCLK frequency limit enforced on GFX [MHz] */
> >   uint16_tcurrent_gfx_maxfreq;
> >
> > + /* Throttle Residency (ASIC dependent) */
> > + uint32_t throttle_residency_prochot;
> > + uint32_t throttle_residency_spl;
> > + uint32_t throttle_residency_fppt;
> > + uint32_t throttle_residency_sppt;
> > + uint32_t throttle_residency_thm_core;
> > + uint32_t throttle_residency_thm_gfx;
> > + uint32_t throttle_residency_thm_soc;
> > +
> >   /* Metrics table alpha filter time constant [us] */
> >   uint32_ttime_filter_alphavalue;
> >  };
>
>

Re: [PATCH 3/3] drm/amdgpu: Avoid querying DRM MGCG status

2023-12-01 Thread Deucher, Alexander
[AMD Official Use Only - General]

For the series.

From: Alex Deucher 
Sent: Friday, December 1, 2023 9:00 AM
To: Lazar, Lijo 
Cc: amd-gfx@lists.freedesktop.org ; Deucher, 
Alexander ; Zhang, Hawking 
Subject: Re: [PATCH 3/3] drm/amdgpu: Avoid querying DRM MGCG status

Acked-by: Alex Deucher 

On Fri, Dec 1, 2023 at 3:32 AM Lijo Lazar  wrote:
>
> MP0 v13.0.6 SOCs don't support DRM MGCG.
>
> Signed-off-by: Lijo Lazar 
> ---
>  drivers/gpu/drm/amd/amdgpu/soc15.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c 
> b/drivers/gpu/drm/amd/amdgpu/soc15.c
> index 9043ebf1e161..15033efec2ba 100644
> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
> @@ -1430,7 +1430,8 @@ static void soc15_common_get_clockgating_state(void 
> *handle, u64 *flags)
> if (adev->hdp.funcs && adev->hdp.funcs->get_clock_gating_state)
> adev->hdp.funcs->get_clock_gating_state(adev, flags);
>
> -   if (amdgpu_ip_version(adev, MP0_HWIP, 0) != IP_VERSION(13, 0, 2)) {
> +   if ((amdgpu_ip_version(adev, MP0_HWIP, 0) != IP_VERSION(13, 0, 2)) &&
> +   (amdgpu_ip_version(adev, MP0_HWIP, 0) != IP_VERSION(13, 0, 6))) {
> /* AMD_CG_SUPPORT_DRM_MGCG */
> data = RREG32(SOC15_REG_OFFSET(MP0, 0, 
> mmMP0_MISC_CGTT_CTRL0));
> if (!(data & 0x0100))
> --
> 2.25.1
>
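
The condition added by this patch is a plain exclusion list on the MP0 IP version. A minimal sketch of the same shape follows; DEMO_IP_VERSION and demo_should_query_drm_mgcg are hypothetical names, and the bit packing used by the macro is an assumption made for illustration, not necessarily what the driver's IP_VERSION() does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed packing of major/minor/revision into one comparable value. */
#define DEMO_IP_VERSION(maj, min, rev) \
	(((uint32_t)(maj) << 16) | ((uint32_t)(min) << 8) | (uint32_t)(rev))

/*
 * Return true when the DRM MGCG status register may be read.
 * Versions 13.0.2 and 13.0.6 are skipped, mirroring the patch.
 */
static bool demo_should_query_drm_mgcg(uint32_t mp0_version)
{
	return mp0_version != DEMO_IP_VERSION(13, 0, 2) &&
	       mp0_version != DEMO_IP_VERSION(13, 0, 6);
}
```

If more versions need to be excluded later, a switch on the version or a small lookup table may read better than a growing chain of comparisons.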


Re: [PATCH] drm/amdgpu: disable MCBP by default

2023-11-30 Thread Deucher, Alexander
[Public]

Acked-by: Alex Deucher 

From: amd-gfx  on behalf of 
jiadong@amd.com 
Sent: Thursday, November 30, 2023 7:57 PM
To: amd-gfx@lists.freedesktop.org 
Cc: Zhu, Jiadong 
Subject: [PATCH] drm/amdgpu: disable MCBP by default

From: Jiadong Zhu 

Disable MCBP (mid command buffer preemption) by default, as old Mesa
versions hang with it. We should not enable by default a feature that
breaks old usermode drivers.

Signed-off-by: Jiadong Zhu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 21b8a8f2b622..280fcad9ce93 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3690,10 +3690,6 @@ static void amdgpu_device_set_mcbp(struct amdgpu_device 
*adev)
 adev->gfx.mcbp = true;
 else if (amdgpu_mcbp == 0)
 adev->gfx.mcbp = false;
-   else if ((amdgpu_ip_version(adev, GC_HWIP, 0) >= IP_VERSION(9, 0, 0)) &&
-(amdgpu_ip_version(adev, GC_HWIP, 0) < IP_VERSION(10, 0, 0)) &&
-adev->gfx.num_gfx_rings)
-   adev->gfx.mcbp = true;

 if (amdgpu_sriov_vf(adev))
 adev->gfx.mcbp = true;
--
2.25.1



RE: [PATCH] drm/amd/swsmu: update smu v14_0_0 driver if version and metrics table

2023-11-27 Thread Deucher, Alexander
[Public]

> -Original Message-
> From: Ma, Li 
> Sent: Thursday, November 23, 2023 5:07 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Koenig, Christian
> ; Zhang, Yifan ; Yu,
> Lang ; Ma, Li 
> Subject: [PATCH] drm/amd/swsmu: update smu v14_0_0 driver if version and
> metrics table
>
> Increment the driver if version and add new members to the metrics table.
>
> Signed-off-by: Li Ma 
> ---
>  .../gpu/drm/amd/include/kgd_pp_interface.h| 17 
>  drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h | 10 +++
>  .../inc/pmfw_if/smu14_driver_if_v14_0_0.h | 77 +++
>  .../drm/amd/pm/swsmu/smu14/smu_v14_0_0_ppt.c  | 46 ++-
>  4 files changed, 115 insertions(+), 35 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/include/kgd_pp_interface.h
> b/drivers/gpu/drm/amd/include/kgd_pp_interface.h
> index 8ebba87f4289..eaea1c65e526 100644
> --- a/drivers/gpu/drm/amd/include/kgd_pp_interface.h
> +++ b/drivers/gpu/drm/amd/include/kgd_pp_interface.h
> @@ -1086,6 +1086,10 @@ struct gpu_metrics_v3_0 {
>   uint16_taverage_dram_reads;
>   /* time filtered DRAM write bandwidth [MB/sec] */
>   uint16_taverage_dram_writes;
> + /* time filtered IPU read bandwidth [MB/sec] */
> + uint16_taverage_ipu_reads;
> + /* time filtered IPU write bandwidth [MB/sec] */
> + uint16_taverage_ipu_writes;
>
>   /* Driver attached timestamp (in ns) */
>   uint64_tsystem_clock_counter;
> @@ -1105,6 +1109,8 @@ struct gpu_metrics_v3_0 {
>   uint32_taverage_all_core_power;
>   /* calculated core power [mW] */
>   uint16_taverage_core_power[16];
> + /* time filtered total system power [mW] */
> + uint16_taverage_sys_power;
>   /* maximum IRM defined STAPM power limit [mW] */
>   uint16_tstapm_power_limit;
>   /* time filtered STAPM power limit [mW] */ @@ -1117,6 +1123,8
> @@ struct gpu_metrics_v3_0 {
>   uint16_taverage_ipuclk_frequency;
>   uint16_taverage_fclk_frequency;
>   uint16_taverage_vclk_frequency;
> + uint16_taverage_uclk_frequency;
> + uint16_taverage_mpipu_frequency;
>
>   /* Current clocks */
>   /* target core frequency [MHz] */
> @@ -1126,6 +1134,15 @@ struct gpu_metrics_v3_0 {
>   /* GFXCLK frequency limit enforced on GFX [MHz] */
>   uint16_tcurrent_gfx_maxfreq;
>
> + /* Throttle Residency (ASIC dependent) */
> + uint32_t throttle_residency_prochot;
> + uint32_t throttle_residency_spl;
> + uint32_t throttle_residency_fppt;
> + uint32_t throttle_residency_sppt;
> + uint32_t throttle_residency_thm_core;
> + uint32_t throttle_residency_thm_gfx;
> + uint32_t throttle_residency_thm_soc;
> +
>   /* Metrics table alpha filter time constant [us] */
>   uint32_ttime_filter_alphavalue;
>  };

Is anything else besides smu14 using v3 of this struct?  If so, we can't change 
the layout, otherwise it will break existing tools.  In that case, bump the 
minor version and append the new items to the end.

Alex
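
To illustrate the layout concern, here is a small self-contained sketch. The demo_metrics_* structs are hypothetical and much smaller than gpu_metrics_v3_0; they only show why a member added in the middle moves everything after it, while a member appended under a new minor version leaves the existing offsets alone.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout already consumed by existing tools (hypothetical v1.0). */
struct demo_metrics_v1_0 {
	uint16_t dram_reads;
	uint16_t dram_writes;
	uint32_t socket_power;
};

/* New member inserted in the middle: socket_power moves. */
struct demo_metrics_inserted {
	uint16_t dram_reads;
	uint16_t dram_writes;
	uint16_t ipu_reads;	/* new */
	uint32_t socket_power;
};

/* New member appended under a bumped minor version: old offsets stay. */
struct demo_metrics_v1_1 {
	uint16_t dram_reads;
	uint16_t dram_writes;
	uint32_t socket_power;
	uint16_t ipu_reads;	/* new */
};

_Static_assert(offsetof(struct demo_metrics_v1_1, socket_power) ==
	       offsetof(struct demo_metrics_v1_0, socket_power),
	       "appending must not move existing members");
_Static_assert(offsetof(struct demo_metrics_inserted, socket_power) !=
	       offsetof(struct demo_metrics_v1_0, socket_power),
	       "inserting moves later members");
```

A tool built against the v1.0 layout would read socket_power from the wrong offset in the "inserted" variant, but could keep reading a v1.1 table as long as it ignores the trailing members it does not know about.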


> diff --git a/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
> b/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
> index c125253df20b..c2265e027ca8 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
> +++ b/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
> @@ -1418,6 +1418,16 @@ typedef enum {
>   METRICS_PCIE_WIDTH,
>   METRICS_CURR_FANPWM,
>   METRICS_CURR_SOCKETPOWER,
> + METRICS_AVERAGE_VPECLK,
> + METRICS_AVERAGE_IPUCLK,
> + METRICS_AVERAGE_MPIPUCLK,
> + METRICS_THROTTLER_RESIDENCY_PROCHOT,
> + METRICS_THROTTLER_RESIDENCY_SPL,
> + METRICS_THROTTLER_RESIDENCY_FPPT,
> + METRICS_THROTTLER_RESIDENCY_SPPT,
> + METRICS_THROTTLER_RESIDENCY_THM_CORE,
> + METRICS_THROTTLER_RESIDENCY_THM_GFX,
> + METRICS_THROTTLER_RESIDENCY_THM_SOC,
>  } MetricsMember_t;
>
>  enum smu_cmn2asic_mapping_type {
> diff --git
> a/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu14_driver_if_v14_0_0
> .h
> b/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu14_driver_if_v14_0_0
> .h
> index 22f88842a7fd..8f42771e1f0a 100644
> ---
> a/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu14_driver_if_v14_0_0
> .h
> +++
> b/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu14_driver_if_v14_0_0
> .h
> @@ -27,7 +27,7 @@
>  // *** IMPORTANT ***
>  // SMU TEAM: Always incr

Re: [PATCH v2] drm/amdgpu: correct the amdgpu runtime dereference usage count

2023-11-16 Thread Deucher, Alexander
[AMD Official Use Only - General]

Reviewed-by: Alex Deucher 

From: Liang, Prike 
Sent: Thursday, November 16, 2023 10:35 PM
To: amd-gfx@lists.freedesktop.org 
Cc: Deucher, Alexander ; Li, Sun peng (Leo) 
; Wentland, Harry ; Feng, Kenneth 
; Liang, Prike 
Subject: [PATCH v2] drm/amdgpu: correct the amdgpu runtime dereference usage 
count

Fix the amdgpu runpm dereference usage count.

Signed-off-by: Prike Liang 
---
v2: remove goto clause and return directly (Alex)

 drivers/gpu/drm/amd/amdgpu/amdgpu_display.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
index 0cacd0b9f8be..b8fbe97efe1d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
@@ -340,14 +340,11 @@ int amdgpu_display_crtc_set_config(struct drm_mode_set 
*set,
 adev->have_disp_power_ref = true;
 return ret;
 }
-   /* if we have no active crtcs, then drop the power ref
-* we got before
+   /* if we have no active crtcs, then go to
+* drop the power ref we got before
  */
-   if (!active && adev->have_disp_power_ref) {
-   pm_runtime_put_autosuspend(dev->dev);
+   if (!active && adev->have_disp_power_ref)
 adev->have_disp_power_ref = false;
-   }
-
 out:
 /* drop the power reference we got coming in here */
 pm_runtime_put_autosuspend(dev->dev);
--
2.34.1



Re: [PATCH] drm/amdgpu: exclude domain start when calucales offset for AGP aperture BOs

2023-11-10 Thread Deucher, Alexander
[Public]

In that case, how do we know we can skip the gart setup in 
amdgpu_ttm_alloc_gart()?

Alex

From: Koenig, Christian 
Sent: Friday, November 10, 2023 9:20 AM
To: Deucher, Alexander ; Zhang, Yifan 
; amd-gfx@lists.freedesktop.org 

Cc: Zhang, Jesse(Jie) 
Subject: Re: [PATCH] drm/amdgpu: exclude domain start when calucales offset for 
AGP aperture BOs

No, that's broken as well.

The problem is in amdgpu_ttm_alloc_gart():

if (addr != AMDGPU_BO_INVALID_OFFSET) {
bo->resource->start = addr >> PAGE_SHIFT;
return 0;
}

bo->resource->start is relative to the GART address, so we can't assign the AGP 
address here in the first place.

What we need to do is to drop this and call amdgpu_gmc_agp_addr() from 
amdgpu_bo_gpu_offset_no_check().

Regards,
Christian.
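
The address arithmetic behind this objection can be modeled in a few lines. This is a sketch under stated assumptions, not the amdgpu code: demo_apertures, demo_gart_offset and demo_agp_addr are hypothetical helpers, the page shift and all addresses are arbitrary example values, and the model only shows why storing an absolute AGP address in a field that is later treated as GART-relative works by accident while the GART base is 0.

```c
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12

/* Hypothetical aperture base addresses in the GPU address space. */
struct demo_apertures {
	uint64_t gart_start;
	uint64_t agp_start;
};

/* GART-domain offset: the stored start page is relative to the GART base. */
static uint64_t demo_gart_offset(const struct demo_apertures *ap,
				 uint64_t start_page)
{
	return (start_page << DEMO_PAGE_SHIFT) + ap->gart_start;
}

/* AGP address computed on demand from an offset inside the AGP aperture. */
static uint64_t demo_agp_addr(const struct demo_apertures *ap,
			      uint64_t offset_in_agp)
{
	return ap->agp_start + offset_in_agp;
}
```

If the page number of demo_agp_addr() is stored as the start page and later passed through demo_gart_offset(), the result equals the AGP address only when gart_start is 0. Once the GART base is non-zero the two diverge, which is consistent with the problem surfacing after the GART placement change mentioned in the patch description.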

Am 10.11.23 um 15:17 schrieb Deucher, Alexander:

[Public]

I think the proper fix is probably to just drop the addition of agp_start in 
amdgpu_gmc_agp_addr().

Alex
________
From: Deucher, Alexander <alexander.deuc...@amd.com>
Sent: Friday, November 10, 2023 9:16 AM
To: Koenig, Christian <christian.koe...@amd.com>; Zhang, Yifan 
<yifan1.zh...@amd.com>; amd-gfx@lists.freedesktop.org
Cc: Zhang, Jesse(Jie) <jesse.zh...@amd.com>
Subject: Re: [PATCH] drm/amdgpu: exclude domain start when calucales offset for 
AGP aperture BOs

It happens in amdgpu_gmc_agp_addr() which is called from 
amdgpu_ttm_alloc_gart().

Alex

From: Koenig, Christian 
<mailto:christian.koe...@amd.com>
Sent: Friday, November 10, 2023 9:14 AM
To: Zhang, Yifan <mailto:yifan1.zh...@amd.com>; 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org> 
<mailto:amd-gfx@lists.freedesktop.org>
Cc: Deucher, Alexander 
<mailto:alexander.deuc...@amd.com>; Zhang, 
Jesse(Jie) <mailto:jesse.zh...@amd.com>
Subject: Re: [PATCH] drm/amdgpu: exclude domain start when calucales offset for 
AGP aperture BOs

On 10.11.23 at 13:52, Yifan Zhang wrote:
> For BOs in AGP aperture, tbo.resource->start includes AGP aperture start.


Well big NAK to that. tbo.resource->start should never ever include the
AGP aperture start in the first place.

How did that happen?

Regards,
Christian.

> Don't add it again in amdgpu_bo_gpu_offset. This issue was masked because the
> GART aperture start was 0 until the patch a013c94d5aca ("drm/amdgpu/gmc11:
> set gart placement GC11") changed the GART start to a non-zero value.
>
> Reported-by: Jesse Zhang <mailto:jesse.zh...@amd.com>
> Signed-off-by: Yifan Zhang <mailto:yifan1.zh...@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c|  7 +++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h|  1 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 10 --
>   3 files changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> index 5f71414190e9..00e940eb69ab 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> @@ -169,6 +169,13 @@ int amdgpu_gmc_set_pte_pde(struct amdgpu_device *adev, 
> void *cpu_pt_addr,
>return 0;
>   }
>
> +bool bo_in_agp_aperture(struct amdgpu_bo *bo)
> +{
> + struct ttm_buffer_object *tbo = &(bo->tbo);
> + struct amdgpu_device *adev = amdgpu_ttm_adev(tbo->bdev);
> +
> + return (tbo->resource->start << PAGE_SHIFT) > adev->gmc.agp_start;
> +}
>   /**
>* amdgpu_gmc_agp_addr - return the address in the AGP address space
>*
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
> index e699d1ca8deb..448dc08e83de 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
> @@ -393,6 +393,7 @@ int amdgpu_gmc_set_pte_pde(struct amdgpu_device *adev, 
> void *cpu_pt_addr,
>uint64_t flags);
>   uint64_t amdgpu_gmc_pd_addr(struct amdgpu_bo *bo);
>   uint64_t amdgpu_gmc_agp_addr(struct ttm_buffer_object *bo);
> +bool bo_in_agp_aperture(struct amdgpu_bo *bo);
>   void amdgpu_gmc_sysvm_location(struct amdgpu_device *adev, struct 
> amdgpu_gmc *mc);
>   void amdgpu_gmc_vram_location(struct amdgpu_device *adev, struct amdgpu_gmc 
> *mc,
>  u64 base);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> index cef920a93924..91a011d63ab4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object

Re: [PATCH] drm/amdgpu: exclude domain start when calucales offset for AGP aperture BOs

2023-11-10 Thread Deucher, Alexander
[Public]

I think the proper fix is probably to just drop the addition of agp_start in 
amdgpu_gmc_agp_addr().

Alex

From: Deucher, Alexander 
Sent: Friday, November 10, 2023 9:16 AM
To: Koenig, Christian ; Zhang, Yifan 
; amd-gfx@lists.freedesktop.org 

Cc: Zhang, Jesse(Jie) 
Subject: Re: [PATCH] drm/amdgpu: exclude domain start when calucales offset for 
AGP aperture BOs

It happens in amdgpu_gmc_agp_addr() which is called from 
amdgpu_ttm_alloc_gart().

Alex

From: Koenig, Christian 
Sent: Friday, November 10, 2023 9:14 AM
To: Zhang, Yifan ; amd-gfx@lists.freedesktop.org 

Cc: Deucher, Alexander ; Zhang, Jesse(Jie) 

Subject: Re: [PATCH] drm/amdgpu: exclude domain start when calucales offset for 
AGP aperture BOs

On 10.11.23 at 13:52, Yifan Zhang wrote:
> For BOs in AGP aperture, tbo.resource->start includes AGP aperture start.


Well big NAK to that. tbo.resource->start should never ever include the
AGP aperture start in the first place.

How did that happen?

Regards,
Christian.

> Don't add it again in amdgpu_bo_gpu_offset. This issue was masked because the
> GART aperture start was 0 until the patch a013c94d5aca ("drm/amdgpu/gmc11:
> set gart placement GC11") changed the GART start to a non-zero value.
>
> Reported-by: Jesse Zhang 
> Signed-off-by: Yifan Zhang 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c|  7 +++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h|  1 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 10 --
>   3 files changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> index 5f71414190e9..00e940eb69ab 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> @@ -169,6 +169,13 @@ int amdgpu_gmc_set_pte_pde(struct amdgpu_device *adev, 
> void *cpu_pt_addr,
>return 0;
>   }
>
> +bool bo_in_agp_aperture(struct amdgpu_bo *bo)
> +{
> + struct ttm_buffer_object *tbo = &(bo->tbo);
> + struct amdgpu_device *adev = amdgpu_ttm_adev(tbo->bdev);
> +
> + return (tbo->resource->start << PAGE_SHIFT) > adev->gmc.agp_start;
> +}
>   /**
>* amdgpu_gmc_agp_addr - return the address in the AGP address space
>*
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
> index e699d1ca8deb..448dc08e83de 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
> @@ -393,6 +393,7 @@ int amdgpu_gmc_set_pte_pde(struct amdgpu_device *adev, 
> void *cpu_pt_addr,
>uint64_t flags);
>   uint64_t amdgpu_gmc_pd_addr(struct amdgpu_bo *bo);
>   uint64_t amdgpu_gmc_agp_addr(struct ttm_buffer_object *bo);
> +bool bo_in_agp_aperture(struct amdgpu_bo *bo);
>   void amdgpu_gmc_sysvm_location(struct amdgpu_device *adev, struct 
> amdgpu_gmc *mc);
>   void amdgpu_gmc_vram_location(struct amdgpu_device *adev, struct amdgpu_gmc 
> *mc,
>  u64 base);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> index cef920a93924..91a011d63ab4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> @@ -39,6 +39,7 @@
>   #include "amdgpu.h"
>   #include "amdgpu_trace.h"
>   #include "amdgpu_amdkfd.h"
> +#include "amdgpu_gmc.h"
>
>   /**
>* DOC: amdgpu_object
> @@ -1529,8 +1530,13 @@ u64 amdgpu_bo_gpu_offset_no_check(struct amdgpu_bo *bo)
>struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
>uint64_t offset;
>
> - offset = (bo->tbo.resource->start << PAGE_SHIFT) +
> -  amdgpu_ttm_domain_start(adev, bo->tbo.resource->mem_type);
> + /* tbo.resource->start includes agp_start for AGP BOs */
> + if (bo_in_agp_aperture(bo)) {
> + offset = (bo->tbo.resource->start << PAGE_SHIFT);
> + } else {
> + offset = (bo->tbo.resource->start << PAGE_SHIFT) +
> +  amdgpu_ttm_domain_start(adev, 
> bo->tbo.resource->mem_type);
> + }
>
>return amdgpu_gmc_sign_extend(offset);
>   }



Re: [PATCH] drm/amdgpu: exclude domain start when calucales offset for AGP aperture BOs

2023-11-10 Thread Deucher, Alexander
[Public]

It happens in amdgpu_gmc_agp_addr() which is called from 
amdgpu_ttm_alloc_gart().

Alex

From: Koenig, Christian 
Sent: Friday, November 10, 2023 9:14 AM
To: Zhang, Yifan ; amd-gfx@lists.freedesktop.org 

Cc: Deucher, Alexander ; Zhang, Jesse(Jie) 

Subject: Re: [PATCH] drm/amdgpu: exclude domain start when calucales offset for 
AGP aperture BOs

On 10.11.23 at 13:52, Yifan Zhang wrote:
> For BOs in AGP aperture, tbo.resource->start includes AGP aperture start.


Well big NAK to that. tbo.resource->start should never ever include the
AGP aperture start in the first place.

How did that happen?

Regards,
Christian.

> Don't add it again in amdgpu_bo_gpu_offset. This issue was masked because the
> GART aperture start was 0 until the patch a013c94d5aca ("drm/amdgpu/gmc11:
> set gart placement GC11") changed the GART start to a non-zero value.
>
> Reported-by: Jesse Zhang 
> Signed-off-by: Yifan Zhang 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c|  7 +++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h|  1 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 10 --
>   3 files changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> index 5f71414190e9..00e940eb69ab 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> @@ -169,6 +169,13 @@ int amdgpu_gmc_set_pte_pde(struct amdgpu_device *adev, 
> void *cpu_pt_addr,
>return 0;
>   }
>
> +bool bo_in_agp_aperture(struct amdgpu_bo *bo)
> +{
> + struct ttm_buffer_object *tbo = &(bo->tbo);
> + struct amdgpu_device *adev = amdgpu_ttm_adev(tbo->bdev);
> +
> + return (tbo->resource->start << PAGE_SHIFT) > adev->gmc.agp_start;
> +}
>   /**
>* amdgpu_gmc_agp_addr - return the address in the AGP address space
>*
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
> index e699d1ca8deb..448dc08e83de 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
> @@ -393,6 +393,7 @@ int amdgpu_gmc_set_pte_pde(struct amdgpu_device *adev, 
> void *cpu_pt_addr,
>uint64_t flags);
>   uint64_t amdgpu_gmc_pd_addr(struct amdgpu_bo *bo);
>   uint64_t amdgpu_gmc_agp_addr(struct ttm_buffer_object *bo);
> +bool bo_in_agp_aperture(struct amdgpu_bo *bo);
>   void amdgpu_gmc_sysvm_location(struct amdgpu_device *adev, struct 
> amdgpu_gmc *mc);
>   void amdgpu_gmc_vram_location(struct amdgpu_device *adev, struct amdgpu_gmc 
> *mc,
>  u64 base);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> index cef920a93924..91a011d63ab4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> @@ -39,6 +39,7 @@
>   #include "amdgpu.h"
>   #include "amdgpu_trace.h"
>   #include "amdgpu_amdkfd.h"
> +#include "amdgpu_gmc.h"
>
>   /**
>* DOC: amdgpu_object
> @@ -1529,8 +1530,13 @@ u64 amdgpu_bo_gpu_offset_no_check(struct amdgpu_bo *bo)
>struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
>uint64_t offset;
>
> - offset = (bo->tbo.resource->start << PAGE_SHIFT) +
> -  amdgpu_ttm_domain_start(adev, bo->tbo.resource->mem_type);
> + /* tbo.resource->start includes agp_start for AGP BOs */
> + if (bo_in_agp_aperture(bo)) {
> + offset = (bo->tbo.resource->start << PAGE_SHIFT);
> + } else {
> + offset = (bo->tbo.resource->start << PAGE_SHIFT) +
> +  amdgpu_ttm_domain_start(adev, 
> bo->tbo.resource->mem_type);
> + }
>
>return amdgpu_gmc_sign_extend(offset);
>   }



RE: [PATCH 2/2] drm/amdgpu: add amdgpu runpm usage trace for separate funcs

2023-11-09 Thread Deucher, Alexander
[Public]

> -Original Message-
> From: Liang, Prike 
> Sent: Thursday, November 9, 2023 2:37 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Liang, Prike
> 
> Subject: [PATCH 2/2] drm/amdgpu: add amdgpu runpm usage trace for
> separate funcs
>
> Add a trace event for runpm usage in the separate amdgpu functions; this
> helps debug cases where a runpm usage reference is never dropped.
> In the normal case the runpm usage count is taken and released pairwise by
> one kind of functionality and should change from 1 to 0; otherwise
> there is an issue in the amdgpu runpm usage dereference.
>
> Signed-off-by: Prike Liang 

Looks good.  Not sure if you want to add tracepoints to the other call sites as 
well.  These are probably the trickiest however.
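
For readers following along, a minimal user-space sketch of what the traced values are meant to show: each event carries 1 for a reference taken and 0 for a reference dropped, plus the calling function, and a balanced sequence ends with no outstanding usage. The sketch_* names are hypothetical and only model the bookkeeping; they are not the tracepoint itself.

```c
struct sketch_runpm {
	int usage;             /* outstanding runpm references seen so far */
	const char *last_func; /* call site of the most recent event */
};

/*
 * Record one event. index mirrors the first argument in the patch:
 * 1 when a reference is taken, 0 when it is dropped.
 * Returns the usage count after the event.
 */
static int sketch_runpm_event(struct sketch_runpm *s, unsigned int index,
			      const char *func)
{
	s->usage += index ? 1 : -1;
	s->last_func = func;
	return s->usage;
}

/* A balanced get/put sequence leaves no outstanding references. */
static int sketch_runpm_balanced(const struct sketch_runpm *s)
{
	return s->usage == 0;
}
```

Feeding it a get event from an attach-style function followed by a put event from the matching detach-style function returns the counter to zero; a missing put leaves it at 1, which is the pattern the trace output is intended to expose.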

Acked-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c |  4 
>  drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c   |  7 ++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h   | 15 +++
>  3 files changed, 25 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
> index e7e87a3b2601..decbbe3d4f06 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
> @@ -42,6 +42,7 @@
>  #include 
>  #include 
>  #include 
> +#include "amdgpu_trace.h"
>
>  /**
>   * amdgpu_dma_buf_attach - &dma_buf_ops.attach implementation
> @@ -63,6 +64,7 @@ static int amdgpu_dma_buf_attach(struct dma_buf *dmabuf,
>   attach->peer2peer = false;
>
>   r = pm_runtime_get_sync(adev_to_drm(adev)->dev);
> + trace_amdgpu_runpm_reference_dumps(1, __func__);
>   if (r < 0)
>   goto out;
>
> @@ -70,6 +72,7 @@ static int amdgpu_dma_buf_attach(struct dma_buf
> *dmabuf,
>
>  out:
>   pm_runtime_put_autosuspend(adev_to_drm(adev)->dev);
> + trace_amdgpu_runpm_reference_dumps(0, __func__);
>   return r;
>  }
>
> @@ -90,6 +93,7 @@ static void amdgpu_dma_buf_detach(struct dma_buf
> *dmabuf,
>
>   pm_runtime_mark_last_busy(adev_to_drm(adev)->dev);
>   pm_runtime_put_autosuspend(adev_to_drm(adev)->dev);
> + trace_amdgpu_runpm_reference_dumps(0, __func__);
>  }
>
>  /**
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> index 709a2c1b9d63..1026a9fa0c0f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> @@ -183,6 +183,7 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring,
> struct dma_fence **f, struct amd
>   amdgpu_ring_emit_fence(ring, ring->fence_drv.gpu_addr,
>  seq, flags | AMDGPU_FENCE_FLAG_INT);
>   pm_runtime_get_noresume(adev_to_drm(adev)->dev);
> + trace_amdgpu_runpm_reference_dumps(1, __func__);
>   ptr = &ring->fence_drv.fences[seq & ring->fence_drv.num_fences_mask];
>   if (unlikely(rcu_dereference_protected(*ptr, 1))) {
>   struct dma_fence *old;
> @@ -286,8 +287,11 @@ bool amdgpu_fence_process(struct amdgpu_ring
> *ring)
>   seq != ring->fence_drv.sync_seq)
>   amdgpu_fence_schedule_fallback(ring);
>
> - if (unlikely(seq == last_seq))
> + if (unlikely(seq == last_seq)) {
> + pm_runtime_put_autosuspend(adev_to_drm(adev)->dev);
> + trace_amdgpu_runpm_reference_dumps(0, __func__);
>   return false;
> + }
>
>   last_seq &= drv->num_fences_mask;
>   seq &= drv->num_fences_mask;
> @@ -310,6 +314,7 @@ bool amdgpu_fence_process(struct amdgpu_ring
> *ring)
>   dma_fence_put(fence);
>   pm_runtime_mark_last_busy(adev_to_drm(adev)->dev);
>   pm_runtime_put_autosuspend(adev_to_drm(adev)->dev);
> + trace_amdgpu_runpm_reference_dumps(0, __func__);
>   } while (last_seq != seq);
>
>   return true;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
> index 2fd1bfb35916..5d4792645540 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
> @@ -554,6 +554,21 @@ TRACE_EVENT(amdgpu_reset_reg_dumps,
> __entry->value)
>  );
>
> +TRACE_EVENT(amdgpu_runpm_reference_dumps,
> + TP_PROTO(uint32_t index, const char *func),
> + TP_ARGS(index, func),
> + TP_STRUCT__entry(
> +  __field(uint32_t, index)
> +  __string(func, func)
> +  ),
> + TP_fast_assign(
> +__entry->index = index;
> +__assign_str(func, func);
> +),
> + TP_printk("amdgpu runpm reference dump 0x%d: 0x%s\n",
> +   __entry->index,
> +   __get_str(func))
> +);
>  #undef AMDGPU_JOB_GET_TIMELINE_NAME
>  #endif
>
> --
> 2.34.1



RE: [PATCH 1/2] drm/amdgpu: correct the amdgpu runtime dereference usage count

2023-11-09 Thread Deucher, Alexander
[Public]

> -Original Message-
> From: Liang, Prike 
> Sent: Thursday, November 9, 2023 2:37 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Liang, Prike
> 
> Subject: [PATCH 1/2] drm/amdgpu: correct the amdgpu runtime dereference
> usage count
>
> Fix the amdgpu runpm dereference usage count.
>
> Signed-off-by: Prike Liang 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 2 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_display.c | 1 +
>  2 files changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> index a53f436fa9f1..f6e5d9f7 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> @@ -1992,7 +1992,7 @@ static int amdgpu_debugfs_sclk_set(void *data,
> u64 val)
>
>   ret = amdgpu_dpm_set_soft_freq_range(adev, PP_SCLK,
> (uint32_t)val, (uint32_t)val);
>   if (ret)
> - ret = -EINVAL;
> + goto out;

I think this hunk can be dropped.  It doesn't really change anything.  Or you
could just drop the whole ret check since we just return ret at the end anyway.
Not sure if changing the error code is important here or not.

>
>  out:
>   pm_runtime_mark_last_busy(adev_to_drm(adev)->dev);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
> index 0cacd0b9f8be..ff1f42ae6d8e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
> @@ -346,6 +346,7 @@ int amdgpu_display_crtc_set_config(struct
> drm_mode_set *set,
>   if (!active && adev->have_disp_power_ref) {
>   pm_runtime_put_autosuspend(dev->dev);
>   adev->have_disp_power_ref = false;
> + return ret;
>   }

I think it would be cleaner to just drop the runtime_put above and update the 
comment.  We'll just fall through to the end of the function.

Alex

>
>  out:
> --
> 2.34.1



Re: [PATCH] drm/amd: Explicitly check for GFXOFF to be enabled for s0ix

2023-11-09 Thread Deucher, Alexander
[Public]

Reviewed-by: Alex Deucher 

From: amd-gfx  on behalf of Mario 
Limonciello 
Sent: Thursday, November 9, 2023 11:27 AM
To: amd-gfx@lists.freedesktop.org 
Cc: Limonciello, Mario 
Subject: [PATCH] drm/amd: Explicitly check for GFXOFF to be enabled for s0ix

If a user has disabled GFXOFF this may cause problems for the suspend
sequence.  Ensure that it is enabled in amdgpu_acpi_is_s0ix_active().

The system won't reach the deepest state but it also won't hang.
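
A minimal sketch of the kind of feature-mask gate this adds, assuming a hypothetical bit position for the GFXOFF flag and collapsing the remaining conditions of amdgpu_acpi_is_s0ix_active() into one boolean. The sketch_* names are illustrative and not part of the driver.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_GFXOFF_MASK (1u << 15) /* hypothetical bit, for illustration only */

/*
 * Decide whether the s0ix path may be used. other_checks_ok stands in for
 * the ASIC-type and ACPI checks the real function performs.
 */
static bool sketch_s0ix_allowed(uint32_t pp_feature, bool other_checks_ok)
{
	if (!other_checks_ok)
		return false;

	/* With GFXOFF disabled by the user, do not take the s0ix path. */
	if (!(pp_feature & SKETCH_GFXOFF_MASK))
		return false;

	return true;
}
```

Clearing the bit in the feature mask makes the function report false even when every other condition holds, which corresponds to the trade-off in the commit message: no deepest state, but no hang.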

Signed-off-by: Mario Limonciello 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
index d62e49758635..e550067e5c5d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
@@ -1497,6 +1497,9 @@ bool amdgpu_acpi_is_s0ix_active(struct amdgpu_device 
*adev)
 if (adev->asic_type < CHIP_RAVEN)
 return false;

+   if (!(adev->pm.pp_feature & PP_GFXOFF_MASK))
+   return false;
+
 /*
  * If ACPI_FADT_LOW_POWER_S0 is not set in the FADT, it is generally
  * risky to do any special firmware-related preparations for entering
--
2.34.1



RE: [PATCH v2] drm/amd/swsmu: update smu v14_0_0 driver if and metrics table

2023-10-31 Thread Deucher, Alexander
[AMD Official Use Only - General]

> -Original Message-
> From: amd-gfx  On Behalf Of Li Ma
> Sent: Monday, October 30, 2023 6:55 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Zhang, Yifan
> ; Feng, Kenneth ; Ma, Li
> ; Du, Xiaojian 
> Subject: [PATCH v2] drm/amd/swsmu: update smu v14_0_0 driver if and
> metrics table
>
> Update driver if headers and metrics table in smu v14_0_0 after smu fw
> promotion. And drop the legacy metrics table.
> v1:
> update header files
> v2:
> drop legacy metrics table and add warning of checking pmfw version.
>
> Signed-off-by: Li Ma 

Acked-by: Alex Deucher 


RE: [PATCH] drm/radeon: replace 1-element arrays with flexible-array members

2023-10-27 Thread Deucher, Alexander
[Public]

> -Original Message-
> From: José Pekkarinen 
> Sent: Friday, October 27, 2023 12:59 PM
> To: Deucher, Alexander ; Koenig, Christian
> ; Pan, Xinhui ;
> sk...@linuxfoundation.org
> Cc: José Pekkarinen ; airl...@gmail.com;
> dan...@ffwll.ch; amd-gfx@lists.freedesktop.org; dri-
> de...@lists.freedesktop.org; linux-ker...@vger.kernel.org; linux-kernel-
> ment...@lists.linuxfoundation.org
> Subject: [PATCH] drm/radeon: replace 1-element arrays with flexible-array
> members
>
> Reported by coccinelle, the following patch will move the following 1 element
> arrays to flexible arrays.
>
> drivers/gpu/drm/radeon/atombios.h:5523:32-48: WARNING use flexible-
> array member instead
> (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-
> length-and-one-element-arrays)
> drivers/gpu/drm/radeon/atombios.h:5545:32-48: WARNING use flexible-
> array member instead
> (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-
> length-and-one-element-arrays)
> drivers/gpu/drm/radeon/atombios.h:5461:34-44: WARNING use flexible-
> array member instead
> (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-
> length-and-one-element-arrays)
> drivers/gpu/drm/radeon/atombios.h:4447:30-40: WARNING use flexible-
> array member instead
> (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-
> length-and-one-element-arrays)
> drivers/gpu/drm/radeon/atombios.h:4236:30-41: WARNING use flexible-
> array member instead
> (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-
> length-and-one-element-arrays)
> drivers/gpu/drm/radeon/atombios.h:7044:24-37: WARNING use flexible-
> array member instead
> (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-
> length-and-one-element-arrays)
> drivers/gpu/drm/radeon/atombios.h:7054:24-37: WARNING use flexible-
> array member instead
> (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-
> length-and-one-element-arrays)
> drivers/gpu/drm/radeon/atombios.h:7095:28-45: WARNING use flexible-
> array member instead
> (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-
> length-and-one-element-arrays)
> drivers/gpu/drm/radeon/atombios.h:7553:8-17: WARNING use flexible-array
> member instead
> (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-
> length-and-one-element-arrays)
> drivers/gpu/drm/radeon/atombios.h:7559:8-17: WARNING use flexible-array
> member instead
> (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-
> length-and-one-element-arrays)
> drivers/gpu/drm/radeon/atombios.h:3896:27-37: WARNING use flexible-
> array member instead
> (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-
> length-and-one-element-arrays)
> drivers/gpu/drm/radeon/atombios.h:5443:16-25: WARNING use flexible-
> array member instead
> (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-
> length-and-one-element-arrays)
> drivers/gpu/drm/radeon/atombios.h:5454:34-43: WARNING use flexible-
> array member instead
> (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-
> length-and-one-element-arrays)
> drivers/gpu/drm/radeon/atombios.h:4603:21-32: WARNING use flexible-
> array member instead
> (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-
> length-and-one-element-arrays)
> drivers/gpu/drm/radeon/atombios.h:6299:32-44: WARNING use flexible-
> array member instead
> (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-
> length-and-one-element-arrays)
> drivers/gpu/drm/radeon/atombios.h:4628:32-46: WARNING use flexible-
> array member instead
> (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-
> length-and-one-element-arrays)
> drivers/gpu/drm/radeon/atombios.h:6285:29-39: WARNING use flexible-
> array member instead
> (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-
> length-and-one-element-arrays)
> drivers/gpu/drm/radeon/atombios.h:4296:30-36: WARNING use flexible-
> array member instead
> (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-
> length-and-one-element-arrays)
> drivers/gpu/drm/radeon/atombios.h:4756:28-36: WARNING use flexible-
> array member instead
> (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-
> length-and-one-element-arrays)
> drivers/gpu/drm/radeon/atombios.h:4064:22-35: WARNING use flexible-
> array member instead
> (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-
> length-and-one-element-arrays)
> drivers/gpu/drm/radeon/atombios.h:7327:9-24: WARNING use flexible-array
> member instead
> (https://www.kernel.org/doc/html/latest/proces

Re: [PATCH] drm/amdgpu: add unmap latency when gfx11 set kiq resources

2023-10-27 Thread Deucher, Alexander
[Public]

Acked-by: Alex Deucher 

From: Tong Liu01 
Sent: Thursday, October 26, 2023 11:41 PM
To: amd-gfx@lists.freedesktop.org 
Cc: Evan Quan ; Chen, Horace ; Tuikov, 
Luben ; Koenig, Christian ; 
Deucher, Alexander ; Xiao, Jack ; 
Zhang, Hawking ; Liu, Monk ; Xu, 
Feifei ; Chang, HaiJun ; Liu01, Tong 
(Esther) 
Subject: [PATCH] drm/amdgpu: add unmap latency when gfx11 set kiq resources

[why]
If the driver does not set an unmap latency for KIQ, the default value is
zero. When a queue is unmapped, KIQ then returns almost immediately after
receiving the unmap command, so the queue status may be saved to the MQD
incorrectly or lost.

[how]
Set the unmap latency when setting KIQ resources. The latency is set to
1 second, matching the Windows driver.
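
As an illustration of how such a field ends up in the packet dword, here is a sketch of packing the three values into one 32-bit word. The shift positions below are assumptions chosen for the example; the authoritative PACKET3_SET_RESOURCES_* macros are in the driver headers and should be consulted for the real layout.

```c
#include <stdint.h>

/* Assumed field positions, for illustration only. */
#define SKETCH_VMID_MASK(x)     ((uint32_t)(x) << 0)
#define SKETCH_UNMAP_LATENCY(x) ((uint32_t)(x) << 16)
#define SKETCH_QUEUE_TYPE(x)    ((uint32_t)(x) << 29)

/* Build the dword that carries vmid mask, unmap latency and queue type. */
static uint32_t sketch_set_resources_dword(uint32_t vmid_mask,
					   uint32_t unmap_latency,
					   uint32_t queue_type)
{
	return SKETCH_VMID_MASK(vmid_mask) |
	       SKETCH_UNMAP_LATENCY(unmap_latency) |
	       SKETCH_QUEUE_TYPE(queue_type);
}
```

Under these assumed positions, the values used in the patch (vmid mask 0, latency 0xa, queue type 0) differ from the previous all-zero word only in the latency field.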

Signed-off-by: Tong Liu01 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index fd22943685f7..7aef7a3a340f 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -155,6 +155,7 @@ static void gfx11_kiq_set_resources(struct amdgpu_ring 
*kiq_ring, uint64_t queue
 {
 amdgpu_ring_write(kiq_ring, PACKET3(PACKET3_SET_RESOURCES, 6));
 amdgpu_ring_write(kiq_ring, PACKET3_SET_RESOURCES_VMID_MASK(0) |
+ PACKET3_SET_RESOURCES_UNMAP_LATENTY(0xa) | /* unmap_latency: 0xa (~ 1s) */
   PACKET3_SET_RESOURCES_QUEUE_TYPE(0));  /* vmid_mask:0 queue_type:0 (KIQ) */
 amdgpu_ring_write(kiq_ring, lower_32_bits(queue_mask)); /* queue mask lo */
 amdgpu_ring_write(kiq_ring, upper_32_bits(queue_mask)); /* queue mask hi */
--
2.34.1



Re: [PATCH] drm/amd/amdgpu: avoid to disable gfxhub interrupt when driver is unloaded

2023-10-24 Thread Deucher, Alexander
[Public]

Reviewed-by: Alex Deucher 

From: Kenneth Feng 
Sent: Monday, October 23, 2023 11:32 PM
To: amd-gfx@lists.freedesktop.org 
Cc: Deucher, Alexander ; Feng, Kenneth 

Subject: [PATCH] drm/amd/amdgpu: avoid to disable gfxhub interrupt when driver 
is unloaded

Avoid disabling the gfxhub interrupt when the driver is unloaded on gmc 11.

Signed-off-by: Kenneth Feng 
---
 drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
index 80ca2c05b0b8..8e36a8395464 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
@@ -73,7 +73,8 @@ gmc_v11_0_vm_fault_interrupt_state(struct amdgpu_device *adev,
  * fini/suspend, so the overall state doesn't
  * change over the course of suspend/resume.
  */
-   if (!adev->in_s0ix)
+   if (!adev->in_s0ix && (adev->in_runpm || adev->in_suspend ||
+  amdgpu_in_reset(adev)))
 amdgpu_gmc_set_vm_fault_masks(adev, AMDGPU_GFXHUB(0), false);
 break;
 case AMDGPU_IRQ_STATE_ENABLE:
--
2.34.1



Re: [PATCH] drm/amd/pm: call smu_cmn_get_smc_version in is_mode1_reset_supported.

2023-10-24 Thread Deucher, Alexander
[Public]

Acked-by: Alex Deucher 

From: Zhang, Yifan 
Sent: Tuesday, October 24, 2023 9:41 AM
To: amd-gfx@lists.freedesktop.org 
Cc: Deucher, Alexander ; Koenig, Christian 
; Li, Candice ; Feng, Kenneth 
; Zhang, Yifan 
Subject: [PATCH] drm/amd/pm: call smu_cmn_get_smc_version in 
is_mode1_reset_supported.

is_mode1_reset_supported may be called before smu init, when smu_context
is uninitialized in the driver load/unload test. Call smu_cmn_get_smc_version
explicitly in is_mode1_reset_supported.
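
The version thresholds compared in the diff below can be read with a small decoder. This is a sketch that assumes the packed layout implied by the in-code comments (0x003a1a00 described as 58.26, 0x004e2900 as 78.41): major in the upper 16 bits, minor in bits 15:8, and a debug/patch byte in bits 7:0. The sketch_* names are hypothetical.

```c
#include <stdint.h>

struct sketch_fw_version {
	unsigned int major;
	unsigned int minor;
	unsigned int debug;
};

/* Split a packed firmware version into its assumed components. */
static struct sketch_fw_version sketch_fw_decode(uint32_t packed)
{
	struct sketch_fw_version v;

	v.major = (packed >> 16) & 0xffff;
	v.minor = (packed >> 8) & 0xff;
	v.debug = packed & 0xff;
	return v;
}

/* Because major sits above minor, a plain integer compare orders versions. */
static int sketch_fw_at_least(uint32_t packed, uint32_t minimum)
{
	return packed >= minimum;
}
```

Decoding 0x003a1a00 this way gives 58.26.0 and 0x004e2900 gives 78.41.0, which matches the comments next to the two checks.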

Fixes: 5fe5098c64d9 ("drm/amd/pm: drop most smu_cmn_get_smc_version in smu")
Signed-off-by: Yifan Zhang 
---
 drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c | 8 +++-
 drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c| 8 +++-
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
index 090249b6422a..77c3d76c76a2 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
@@ -2461,12 +2461,18 @@ static bool 
sienna_cichlid_is_mode1_reset_supported(struct smu_context *smu)
 {
 struct amdgpu_device *adev = smu->adev;
 uint32_t val;
+   uint32_t smu_version;
+   int ret;

 /**
  * SRIOV env will not support SMU mode1 reset
  * PM FW support mode1 reset from 58.26
  */
-   if (amdgpu_sriov_vf(adev) || (smu->smc_fw_version < 0x003a1a00))
+   ret = smu_cmn_get_smc_version(smu, NULL, &smu_version);
+   if (ret)
+   return false;
+
+   if (amdgpu_sriov_vf(adev) || (smu_version < 0x003a1a00))
 return false;

 /**
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c
index b1433973380b..648d5eafb27b 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c
@@ -2615,13 +2615,19 @@ static int smu_v13_0_0_baco_exit(struct smu_context 
*smu)
 static bool smu_v13_0_0_is_mode1_reset_supported(struct smu_context *smu)
 {
 struct amdgpu_device *adev = smu->adev;
+   u32 smu_version;
+   int ret;

 /* SRIOV does not support SMU mode1 reset */
 if (amdgpu_sriov_vf(adev))
 return false;

 /* PMFW support is available since 78.41 */
-   if (smu->smc_fw_version < 0x004e2900)
+   ret = smu_cmn_get_smc_version(smu, NULL, &smu_version);
+   if (ret)
+   return false;
+
+   if (smu_version < 0x004e2900)
 return false;

 return true;
--
2.37.3



Re: [PATCH] drm/amdxcp: fix amdxcp unloads incompletely

2023-10-23 Thread Deucher, Alexander
[Public]

Acked-by: Alex Deucher 

From: amd-gfx  on behalf of James Zhu 

Sent: Thursday, September 7, 2023 10:41 AM
To: amd-gfx@lists.freedesktop.org 
Cc: Lin, Amber ; Zhu, James ; Kamal, Asad 

Subject: [PATCH] drm/amdxcp: fix amdxcp unloads incompletely

amdxcp unloads incompletely, and the error below is seen during load/unload:
sysfs: cannot create duplicate filename '/devices/platform/amdgpu_xcp.0'

devres_release_group() frees the xcp device first; the platform device is
unregistered later in platform_device_unregister().
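
One way to read the change is as a release-ordering fix: the pointer to the platform device is cached before the managed wrapper that holds it is freed. The sketch below models that pattern in user space with malloc/free standing in for the devres-managed allocation; all sketch_* names are hypothetical and this is not the driver code.

```c
#include <stdlib.h>

struct sketch_pdev {
	int registered;
};

struct sketch_xcp {
	struct sketch_pdev *pdev;
};

/* Stand-in for the managed release: frees the wrapper and clears the slot. */
static void sketch_release_managed(struct sketch_xcp **slot)
{
	free(*slot);
	*slot = NULL;
}

/* Stand-in for unregistering the platform device. */
static void sketch_unregister(struct sketch_pdev *pdev)
{
	pdev->registered = 0;
}

/*
 * Cache the pdev pointer first, so nothing is read from the wrapper
 * after it has been released.
 */
static void sketch_release(struct sketch_xcp **slot)
{
	struct sketch_pdev *pdev = (*slot)->pdev;

	sketch_release_managed(slot);
	sketch_unregister(pdev);
}
```

The design point is only the local variable: once the wrapper is gone, the sole remaining path to the platform device is the cached pointer.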

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdxcp/amdgpu_xcp_drv.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdxcp/amdgpu_xcp_drv.c 
b/drivers/gpu/drm/amd/amdxcp/amdgpu_xcp_drv.c
index 353597fc908d..90ddd8371176 100644
--- a/drivers/gpu/drm/amd/amdxcp/amdgpu_xcp_drv.c
+++ b/drivers/gpu/drm/amd/amdxcp/amdgpu_xcp_drv.c
@@ -89,9 +89,10 @@ EXPORT_SYMBOL(amdgpu_xcp_drm_dev_alloc);
 void amdgpu_xcp_drv_release(void)
 {
 for (--pdev_num; pdev_num >= 0; --pdev_num) {
-   devres_release_group(&xcp_dev[pdev_num]->pdev->dev, NULL);
-   platform_device_unregister(xcp_dev[pdev_num]->pdev);
-   xcp_dev[pdev_num]->pdev = NULL;
+   struct platform_device *pdev = xcp_dev[pdev_num]->pdev;
+
+   devres_release_group(&pdev->dev, NULL);
+   platform_device_unregister(pdev);
 xcp_dev[pdev_num] = NULL;
 }
 pdev_num = 0;
--
2.34.1



RE: [PATCH] drm/amd: Disable ASPM for VI w/ all Intel systems

2023-10-23 Thread Deucher, Alexander
[Public]

> -Original Message-
> From: amd-gfx  On Behalf Of Mario
> Limonciello
> Sent: Monday, October 23, 2023 9:45 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Limonciello, Mario ;
> paolo.gent...@canonical.com
> Subject: [PATCH] drm/amd: Disable ASPM for VI w/ all Intel systems
>
> Originally we were quirking ASPM disabled specifically for VI when used with
> Alder Lake, but it appears to have problems with Rocket Lake as well.
>
> Like we've done in the case of dpm for newer platforms, disable ASPM for all
> Intel systems.
>
> Cc: sta...@vger.kernel.org # 5.15+
> Fixes: 0064b0ce85bb ("drm/amd/pm: enable ASPM by default")
> Reported-and-tested-by: Paolo Gentili 
> Closes: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2036742
> Signed-off-by: Mario Limonciello 

Reviewed-by: Alex Deucher 

As a follow on, we probably want to apply this to all of the program_aspm() 
functions for each asic family.

Alex

> ---
>  drivers/gpu/drm/amd/amdgpu/vi.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/vi.c
> b/drivers/gpu/drm/amd/amdgpu/vi.c index 6a8494f98d3e..fe8ba9e9837b
> 100644
> --- a/drivers/gpu/drm/amd/amdgpu/vi.c
> +++ b/drivers/gpu/drm/amd/amdgpu/vi.c
> @@ -1124,7 +1124,7 @@ static void vi_program_aspm(struct
> amdgpu_device *adev)
>   bool bL1SS = false;
>   bool bClkReqSupport = true;
>
> - if (!amdgpu_device_should_use_aspm(adev) ||
> !amdgpu_device_aspm_support_quirk())
> + if (!amdgpu_device_should_use_aspm(adev) ||
> +!amdgpu_device_pcie_dynamic_switching_supported())
>   return;
>
>   if (adev->flags & AMD_IS_APU ||
> --
> 2.34.1



Re: [PATCH] drm/amdkfd: reserve a fence slot while locking the BO

2023-10-20 Thread Deucher, Alexander
[Public]

Acked-by: Alex Deucher 

From: amd-gfx  on behalf of Christian 
König 
Sent: Friday, October 20, 2023 8:33 AM
To: Shi, Leslie ; Kuehling, Felix 
; amd-gfx@lists.freedesktop.org 

Cc: Koenig, Christian 
Subject: [PATCH] drm/amdkfd: reserve a fence slot while locking the BO

Looks like the KFD still needs this.

Signed-off-by: Christian König 
Fixes: 8abc1eb2987a ("drm/amdkfd: switch over to using drm_exec v3")
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 7d6daf8d2bfa..e036011137aa 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1103,7 +1103,7 @@ static int reserve_bo_and_vm(struct kgd_mem *mem,
 if (unlikely(ret))
 goto error;

-   ret = drm_exec_lock_obj(&ctx->exec, &bo->tbo.base);
+   ret = drm_exec_prepare_obj(&ctx->exec, &bo->tbo.base, 1);
 drm_exec_retry_on_contention(&ctx->exec);
 if (unlikely(ret))
 goto error;
--
2.34.1



Re: [PATCH] drm/amd: Add missing kernel doc for prepare_suspend()

2023-10-17 Thread Deucher, Alexander
[Public]

Reviewed-by: Alex Deucher 

From: amd-gfx  on behalf of Mario 
Limonciello 
Sent: Tuesday, October 17, 2023 3:37 PM
To: amd-gfx@lists.freedesktop.org 
Cc: Stephen Rothwell ; Limonciello, Mario 

Subject: [PATCH] drm/amd: Add missing kernel doc for prepare_suspend()

prepare_suspend() is intended to be used for any IP blocks
that must allocate memory during the suspend sequence.

Reported-by: Stephen Rothwell 
Closes: https://lore.kernel.org/all/20231017143555.6a645...@canb.auug.org.au/
Fixes: cb11ca3233aa ("drm/amd: Add concept of running prepare_suspend() 
sequence for IP blocks")
Signed-off-by: Mario Limonciello 
---
 drivers/gpu/drm/amd/include/amd_shared.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/include/amd_shared.h 
b/drivers/gpu/drm/amd/include/amd_shared.h
index 98e60bc868dd..579977f6ad52 100644
--- a/drivers/gpu/drm/amd/include/amd_shared.h
+++ b/drivers/gpu/drm/amd/include/amd_shared.h
@@ -271,6 +271,8 @@ enum amd_dpm_forced_level;
  * @hw_init: sets up the hw state
  * @hw_fini: tears down the hw state
  * @late_fini: final cleanup
+ * @prepare_suspend: handle IP specific changes to prepare for suspend
+ *   (such as allocating any required memory)
  * @suspend: handles IP specific hw/sw changes for suspend
  * @resume: handles IP specific hw/sw changes for resume
  * @is_idle: returns current IP block idle status
--
2.34.1



Re: [PATCH 3/3] drm/amd: Read IMU FW version from scratch register during hw_init

2023-10-13 Thread Deucher, Alexander
[AMD Official Use Only - General]

Series is:
Reviewed-by: Alex Deucher 

From: amd-gfx  on behalf of Mario 
Limonciello 
Sent: Friday, October 13, 2023 3:26 PM
To: amd-gfx@lists.freedesktop.org 
Cc: Limonciello, Mario 
Subject: [PATCH 3/3] drm/amd: Read IMU FW version from scratch register during 
hw_init

If the IMU version wasn't discovered from the header, such as when
the firmware was directly loaded by PSP then there is no firmware
version to show to userspace from sysfs or IOCTL.

The IMU F/W stores the version in the first scratch register though,
so fetch it in these cases to let the driver export.

Signed-off-by: Mario Limonciello 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index f0957d060750..154b20492123 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -4373,6 +4373,10 @@ static int gfx_v11_0_hw_init(void *handle)
 if (r)
 return r;

+   /* get IMU version from HW if it's not set */
+   if (!adev->gfx.imu_fw_version)
+   adev->gfx.imu_fw_version = RREG32_SOC15(GC, 0, 
regGFX_IMU_SCRATCH_0);
+
 return r;
 }

--
2.34.1



Re: [PATCH] drm/amdgpu/umsch: add suspend and resume callback

2023-10-13 Thread Deucher, Alexander
[Public]

Acked-by: Alex Deucher 

From: amd-gfx  on behalf of Lang Yu 

Sent: Friday, October 13, 2023 1:58 AM
To: amd-gfx@lists.freedesktop.org 
Cc: Gopalakrishnan, Veerabadhran (Veera) ; 
Yu, Lang 
Subject: [PATCH] drm/amdgpu/umsch: add suspend and resume callback

Add missing IP callbacks.

Signed-off-by: Lang Yu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c
index 4bd076e9e367..f5fdde5181f7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c
@@ -844,6 +844,20 @@ static int umsch_mm_hw_fini(void *handle)
 return 0;
 }

+static int umsch_mm_suspend(void *handle)
+{
+   struct amdgpu_device *adev = (struct amdgpu_device *)handle;
+
+   return umsch_mm_hw_fini(adev);
+}
+
+static int umsch_mm_resume(void *handle)
+{
+   struct amdgpu_device *adev = (struct amdgpu_device *)handle;
+
+   return umsch_mm_hw_init(adev);
+}
+
 static const struct amd_ip_funcs umsch_mm_v4_0_ip_funcs = {
 .name = "umsch_mm_v4_0",
 .early_init = umsch_mm_early_init,
@@ -852,6 +866,8 @@ static const struct amd_ip_funcs umsch_mm_v4_0_ip_funcs = {
 .sw_fini = umsch_mm_sw_fini,
 .hw_init = umsch_mm_hw_init,
 .hw_fini = umsch_mm_hw_fini,
+   .suspend = umsch_mm_suspend,
+   .resume = umsch_mm_resume,
 };

 const struct amdgpu_ip_block_version umsch_mm_v4_0_ip_block = {
--
2.25.1



Re: [PATCH 1/2] drm/amdgpu: correct NBIO v7.11 programing

2023-10-12 Thread Deucher, Alexander
[Public]

Acked-by: Alex Deucher 

From: Yu, Lang 
Sent: Thursday, October 12, 2023 3:31 AM
To: amd-gfx@lists.freedesktop.org 
Cc: Deucher, Alexander ; Zhang, Yifan 
; Yu, Lang 
Subject: [PATCH 1/2] drm/amdgpu: correct NBIO v7.11 programing

The v7.7 register definitions were used before; switch to v7.11 now.
Fix the incorrect programming.

Signed-off-by: Lang Yu 
---
 drivers/gpu/drm/amd/amdgpu/nbio_v7_11.c   | 56 +--
 .../asic_reg/nbio/nbio_7_11_0_offset.h|  9 ++-
 2 files changed, 33 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v7_11.c 
b/drivers/gpu/drm/amd/amdgpu/nbio_v7_11.c
index 6873eead1e19..3a94f249929e 100644
--- a/drivers/gpu/drm/amd/amdgpu/nbio_v7_11.c
+++ b/drivers/gpu/drm/amd/amdgpu/nbio_v7_11.c
@@ -66,19 +66,19 @@ static void nbio_v7_11_sdma_doorbell_range(struct 
amdgpu_device *adev, int insta
   bool use_doorbell, int 
doorbell_index,
   int doorbell_size)
 {
-   u32 reg = SOC15_REG_OFFSET(NBIO, 0, regGDC0_BIF_SDMA0_DOORBELL_RANGE);
+   u32 reg = SOC15_REG_OFFSET(NBIO, 0, regGDC0_BIF_CSDMA_DOORBELL_RANGE);
 u32 doorbell_range = RREG32_PCIE_PORT(reg);

 if (use_doorbell) {
 doorbell_range = REG_SET_FIELD(doorbell_range,
-  GDC0_BIF_SDMA0_DOORBELL_RANGE,
+  GDC0_BIF_CSDMA_DOORBELL_RANGE,
OFFSET, doorbell_index);
 doorbell_range = REG_SET_FIELD(doorbell_range,
-  GDC0_BIF_SDMA0_DOORBELL_RANGE,
+  GDC0_BIF_CSDMA_DOORBELL_RANGE,
SIZE, doorbell_size);
 } else {
 doorbell_range = REG_SET_FIELD(doorbell_range,
-  GDC0_BIF_SDMA0_DOORBELL_RANGE,
+  GDC0_BIF_CSDMA_DOORBELL_RANGE,
SIZE, 0);
 }

@@ -145,27 +145,25 @@ static void nbio_v7_11_enable_doorbell_aperture(struct 
amdgpu_device *adev,
 static void nbio_v7_11_enable_doorbell_selfring_aperture(struct amdgpu_device 
*adev,
 bool enable)
 {
-/* u32 tmp = 0;
+   u32 tmp = 0;

 if (enable) {
-   tmp = REG_SET_FIELD(tmp, 
BIF_BX_PF0_DOORBELL_SELFRING_GPA_APER_CNTL,
+   tmp = REG_SET_FIELD(tmp, 
BIF_BX_PF1_DOORBELL_SELFRING_GPA_APER_CNTL,
 DOORBELL_SELFRING_GPA_APER_EN, 1) |
-   REG_SET_FIELD(tmp, 
BIF_BX_PF0_DOORBELL_SELFRING_GPA_APER_CNTL,
+ REG_SET_FIELD(tmp, 
BIF_BX_PF1_DOORBELL_SELFRING_GPA_APER_CNTL,
 DOORBELL_SELFRING_GPA_APER_MODE, 1) |
-   REG_SET_FIELD(tmp, 
BIF_BX_PF0_DOORBELL_SELFRING_GPA_APER_CNTL,
+ REG_SET_FIELD(tmp, 
BIF_BX_PF1_DOORBELL_SELFRING_GPA_APER_CNTL,
 DOORBELL_SELFRING_GPA_APER_SIZE, 0);

 WREG32_SOC15(NBIO, 0,
-   regBIF_BX_PF0_DOORBELL_SELFRING_GPA_APER_BASE_LOW,
+   regBIF_BX_PF1_DOORBELL_SELFRING_GPA_APER_BASE_LOW,
 lower_32_bits(adev->doorbell.base));
 WREG32_SOC15(NBIO, 0,
-   regBIF_BX_PF0_DOORBELL_SELFRING_GPA_APER_BASE_HIGH,
+   regBIF_BX_PF1_DOORBELL_SELFRING_GPA_APER_BASE_HIGH,
 upper_32_bits(adev->doorbell.base));
 }

-   WREG32_SOC15(NBIO, 0, regBIF_BX_PF0_DOORBELL_SELFRING_GPA_APER_CNTL,
-   tmp);
-*/
+   WREG32_SOC15(NBIO, 0, regBIF_BX_PF1_DOORBELL_SELFRING_GPA_APER_CNTL, 
tmp);
 }


@@ -216,12 +214,12 @@ static void nbio_v7_11_ih_control(struct amdgpu_device 
*adev)

 static u32 nbio_v7_11_get_hdp_flush_req_offset(struct amdgpu_device *adev)
 {
-   return SOC15_REG_OFFSET(NBIO, 0, regBIF_BX_PF0_GPU_HDP_FLUSH_REQ);
+   return SOC15_REG_OFFSET(NBIO, 0, regBIF_BX_PF1_GPU_HDP_FLUSH_REQ);
 }

 static u32 nbio_v7_11_get_hdp_flush_done_offset(struct amdgpu_device *adev)
 {
-   return SOC15_REG_OFFSET(NBIO, 0, regBIF_BX_PF0_GPU_HDP_FLUSH_DONE);
+   return SOC15_REG_OFFSET(NBIO, 0, regBIF_BX_PF1_GPU_HDP_FLUSH_DONE);
 }

 static u32 nbio_v7_11_get_pcie_index_offset(struct amdgpu_device *adev)
@@ -236,27 +234,27 @@ static u32 nbio_v7_11_get_pcie_data_offset(struct 
amdgpu_device *adev)

 static u32 nbio_v7_11_get_pcie_port_index_offset(struct amdgpu_device *adev)
 {
-   return SOC15_REG_OFFSET(NBIO, 0, regBIF_BX_PF0_RSMU_INDEX);
+   return SOC15_REG_OFFSET(NBIO, 0, regBIF_BX_PF1_RSMU_INDEX);
 }

 static u32 nbio_v7_11_get_pcie_port_data_offset(struct amdgpu_device *adev)
 {
-   return SOC15_REG_OFFSET(N

RE: [PATCH] drm/amd/swsmu: update smu v14_0_0 header files and metrics table

2023-10-10 Thread Deucher, Alexander
[Public]

> -Original Message-
> From: Ma, Li 
> Sent: Tuesday, October 10, 2023 9:48 AM
> To: amd-gfx@lists.freedesktop.org; Feng, Kenneth
> 
> Cc: Deucher, Alexander ; Zhang, Yifan
> ; Ma, Li 
> Subject: [PATCH] drm/amd/swsmu: update smu v14_0_0 header files and
> metrics table
>
> Update driver if, pmfw and ppsmc header files.
> Add new gpu_metrics_v3_0 for metrics table updated in driver if and reserve
> legacy metrics table to maintain backward compatibility.
>
> Signed-off-by: Li Ma 
> Reviewed-by: Yifan Zhang 

Acked-by: Alex Deucher 



RE: [PATCH 2/3] drm/amdgpu/umsch: power on/off UMSCH by DLDO

2023-10-09 Thread Deucher, Alexander
[Public]

> -Original Message-
> From: Yu, Lang 
> Sent: Saturday, October 7, 2023 4:54 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Zhang, Yifan
> ; Gopalakrishnan, Veerabadhran (Veera)
> ; Yu, Lang 
> Subject: [PATCH 2/3] drm/amdgpu/umsch: power on/off UMSCH by DLDO
>
> VCN 4.0.5 uses DLDO.
>
> Signed-off-by: Lang Yu 
> ---
>  drivers/gpu/drm/amd/amdgpu/umsch_mm_v4_0.c | 26
> ++
>  1 file changed, 26 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/umsch_mm_v4_0.c
> b/drivers/gpu/drm/amd/amdgpu/umsch_mm_v4_0.c
> index a60178156c77..7e79954c833b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/umsch_mm_v4_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/umsch_mm_v4_0.c
> @@ -34,6 +34,16 @@
>  #include "umsch_mm_4_0_api_def.h"
>  #include "umsch_mm_v4_0.h"
>
> +#define regUVD_IPX_DLDO_CONFIG 0x0064
> +#define regUVD_IPX_DLDO_CONFIG_BASE_IDX 1
> +#define regUVD_IPX_DLDO_STATUS 0x0065
> +#define regUVD_IPX_DLDO_STATUS_BASE_IDX 1
> +
> +#define UVD_IPX_DLDO_CONFIG__ONO0_PWR_CONFIG__SHIFT
> 0x0002
> +#define UVD_IPX_DLDO_CONFIG__ONO0_PWR_CONFIG_MASK
> 0x000cUL
> +#define UVD_IPX_DLDO_STATUS__ONO0_PWR_STATUS__SHIFT
> 0x0001
> +#define UVD_IPX_DLDO_STATUS__ONO0_PWR_STATUS_MASK
> 0x0002UL
> +
>  static int umsch_mm_v4_0_load_microcode(struct amdgpu_umsch_mm
> *umsch)  {
>   struct amdgpu_device *adev = umsch->ring.adev; @@ -50,6 +60,14
> @@ static int umsch_mm_v4_0_load_microcode(struct amdgpu_umsch_mm
> *umsch)
>
>   umsch->cmd_buf_curr_ptr = umsch->cmd_buf_ptr;
>
> + if (adev->ip_versions[VCN_HWIP][0] == IP_VERSION(4, 0, 5)) {
> + WREG32_SOC15(VCN, 0, regUVD_IPX_DLDO_CONFIG,
> + 1 <<
> UVD_IPX_DLDO_CONFIG__ONO0_PWR_CONFIG__SHIFT);
> + SOC15_WAIT_ON_RREG(VCN, 0, regUVD_IPX_DLDO_STATUS,
> + 0 <<
> UVD_IPX_DLDO_STATUS__ONO0_PWR_STATUS__SHIFT,
> +
>   UVD_IPX_DLDO_STATUS__ONO0_PWR_STATUS_MASK);
> + }
> +

Is this the right place for this?  umsch_mm_hw_init() only calls this for 
FW_LOAD_DIRECT.  Maybe that check needs to be dropped?

Alex

>   data = RREG32_SOC15(VCN, 0, regUMSCH_MES_RESET_CTRL);
>   data = REG_SET_FIELD(data, UMSCH_MES_RESET_CTRL,
> MES_CORE_SOFT_RESET, 0);
>   WREG32_SOC15_UMSCH(regUMSCH_MES_RESET_CTRL, data); @@ -
> 229,6 +247,14 @@ static int umsch_mm_v4_0_ring_stop(struct
> amdgpu_umsch_mm *umsch)
>   data = REG_SET_FIELD(data, VCN_UMSCH_RB_DB_CTRL, EN, 0);
>   WREG32_SOC15(VCN, 0, regVCN_UMSCH_RB_DB_CTRL, data);
>
> + if (adev->ip_versions[VCN_HWIP][0] == IP_VERSION(4, 0, 5)) {
> + WREG32_SOC15(VCN, 0, regUVD_IPX_DLDO_CONFIG,
> + 2 <<
> UVD_IPX_DLDO_CONFIG__ONO0_PWR_CONFIG__SHIFT);
> + SOC15_WAIT_ON_RREG(VCN, 0, regUVD_IPX_DLDO_STATUS,
> + 1 <<
> UVD_IPX_DLDO_STATUS__ONO0_PWR_STATUS__SHIFT,
> +
>   UVD_IPX_DLDO_STATUS__ONO0_PWR_STATUS_MASK);
> + }
> +
>   return 0;
>  }
>
> --
> 2.25.1
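
For reference, the shift and mask values in the quoted patch work out as
follows. This is a standalone sketch with shortened, hypothetical macro
names. It assumes the wait macro polls until (reg & mask) equals the expected
value, which is how the quoted calls read.

```c
#include <stdint.h>

/* Values taken from the quoted patch; names shortened for illustration. */
#define PWR_CONFIG_SHIFT 2
#define PWR_CONFIG_MASK  0x0000000cu
#define PWR_STATUS_SHIFT 1
#define PWR_STATUS_MASK  0x00000002u

/* Place a 2-bit power config value into its register field. */
static uint32_t encode_pwr_config(uint32_t value)
{
	return (value << PWR_CONFIG_SHIFT) & PWR_CONFIG_MASK;
}

/* Assumed wait condition: masked register equals the shifted expectation. */
static int pwr_status_matches(uint32_t reg, uint32_t expected)
{
	return (reg & PWR_STATUS_MASK) == (expected << PWR_STATUS_SHIFT);
}
```

So the power-on path writes 0x4 and waits for the status bit to read 0,
while the power-off path writes 0x8 and waits for the status bit to read 1.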



RE: [PATCH 2/3] drm/amdgpu/umsch: power on/off UMSCH by DLDO

2023-10-09 Thread Deucher, Alexander
[Public]

> -Original Message-
> From: Yu, Lang 
> Sent: Saturday, October 7, 2023 4:54 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Zhang, Yifan
> ; Gopalakrishnan, Veerabadhran (Veera)
> ; Yu, Lang 
> Subject: [PATCH 2/3] drm/amdgpu/umsch: power on/off UMSCH by DLDO
>
> VCN 4.0.5 uses DLDO.
>
> Signed-off-by: Lang Yu 
> ---
>  drivers/gpu/drm/amd/amdgpu/umsch_mm_v4_0.c | 26
> ++
>  1 file changed, 26 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/umsch_mm_v4_0.c
> b/drivers/gpu/drm/amd/amdgpu/umsch_mm_v4_0.c
> index a60178156c77..7e79954c833b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/umsch_mm_v4_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/umsch_mm_v4_0.c
> @@ -34,6 +34,16 @@
>  #include "umsch_mm_4_0_api_def.h"
>  #include "umsch_mm_v4_0.h"
>
> +#define regUVD_IPX_DLDO_CONFIG 0x0064
> +#define regUVD_IPX_DLDO_CONFIG_BASE_IDX 1
> +#define regUVD_IPX_DLDO_STATUS 0x0065
> +#define regUVD_IPX_DLDO_STATUS_BASE_IDX 1
> +
> +#define UVD_IPX_DLDO_CONFIG__ONO0_PWR_CONFIG__SHIFT
> 0x0002
> +#define UVD_IPX_DLDO_CONFIG__ONO0_PWR_CONFIG_MASK
> 0x000cUL
> +#define UVD_IPX_DLDO_STATUS__ONO0_PWR_STATUS__SHIFT
> 0x0001
> +#define UVD_IPX_DLDO_STATUS__ONO0_PWR_STATUS_MASK
> 0x0002UL
> +
>  static int umsch_mm_v4_0_load_microcode(struct amdgpu_umsch_mm
> *umsch)  {
>   struct amdgpu_device *adev = umsch->ring.adev; @@ -50,6 +60,14
> @@ static int umsch_mm_v4_0_load_microcode(struct amdgpu_umsch_mm
> *umsch)
>
>   umsch->cmd_buf_curr_ptr = umsch->cmd_buf_ptr;
>
> + if (adev->ip_versions[VCN_HWIP][0] == IP_VERSION(4, 0, 5)) {

This has switched to a function call: amdgpu_ip_version().

> + WREG32_SOC15(VCN, 0, regUVD_IPX_DLDO_CONFIG,
> + 1 <<
> UVD_IPX_DLDO_CONFIG__ONO0_PWR_CONFIG__SHIFT);
> + SOC15_WAIT_ON_RREG(VCN, 0, regUVD_IPX_DLDO_STATUS,
> + 0 <<
> UVD_IPX_DLDO_STATUS__ONO0_PWR_STATUS__SHIFT,
> +
>   UVD_IPX_DLDO_STATUS__ONO0_PWR_STATUS_MASK);
> + }
> +
>   data = RREG32_SOC15(VCN, 0, regUMSCH_MES_RESET_CTRL);
>   data = REG_SET_FIELD(data, UMSCH_MES_RESET_CTRL,
> MES_CORE_SOFT_RESET, 0);
>   WREG32_SOC15_UMSCH(regUMSCH_MES_RESET_CTRL, data); @@ -
> 229,6 +247,14 @@ static int umsch_mm_v4_0_ring_stop(struct
> amdgpu_umsch_mm *umsch)
>   data = REG_SET_FIELD(data, VCN_UMSCH_RB_DB_CTRL, EN, 0);
>   WREG32_SOC15(VCN, 0, regVCN_UMSCH_RB_DB_CTRL, data);
>
> + if (adev->ip_versions[VCN_HWIP][0] == IP_VERSION(4, 0, 5)) {

Same here.

Alex

> + WREG32_SOC15(VCN, 0, regUVD_IPX_DLDO_CONFIG,
> + 2 <<
> UVD_IPX_DLDO_CONFIG__ONO0_PWR_CONFIG__SHIFT);
> + SOC15_WAIT_ON_RREG(VCN, 0, regUVD_IPX_DLDO_STATUS,
> + 1 <<
> UVD_IPX_DLDO_STATUS__ONO0_PWR_STATUS__SHIFT,
> +
>   UVD_IPX_DLDO_STATUS__ONO0_PWR_STATUS_MASK);
> + }
> +
>   return 0;
>  }
>
> --
> 2.25.1



Re: [PATCH] drm/amdgpu: Increase IP discovery region size

2023-10-06 Thread Deucher, Alexander
[AMD Official Use Only - General]

Acked-by: Alex Deucher 

From: amd-gfx  on behalf of Lijo Lazar 

Sent: Friday, October 6, 2023 1:00 AM
To: amd-gfx@lists.freedesktop.org 
Cc: Deucher, Alexander ; Zhang, Hawking 

Subject: [PATCH] drm/amdgpu: Increase IP discovery region size

The IP discovery region has grown beyond 8K on some SoCs. The maximum
reserved size is up to 12K, but not all of it is used. For now, increase
to 10K.

Signed-off-by: Lijo Lazar 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.h
index 3a2f347bd50d..4d03cd5b3410 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.h
@@ -24,7 +24,7 @@
 #ifndef __AMDGPU_DISCOVERY__
 #define __AMDGPU_DISCOVERY__

-#define DISCOVERY_TMR_SIZE  (8 << 10)
+#define DISCOVERY_TMR_SIZE  (10 << 10)
 #define DISCOVERY_TMR_OFFSET    (64 << 10)

 void amdgpu_discovery_fini(struct amdgpu_device *adev);
--
2.25.1
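
The sizes in this change are plain shifts by 10, i.e. multiples of 1024
bytes. A small sketch of the arithmetic follows. The KIB() macro and the
fits_in_region() helper are hypothetical, and the 9000-byte table used in the
note below is an invented figure for illustration only.

```c
/* n KiB expressed the same way as the header does: n << 10. */
#define KIB(n) ((unsigned int)(n) << 10)

/* Hypothetical check: does a discovery table fit in the reserved region? */
static int fits_in_region(unsigned int table_bytes, unsigned int region_bytes)
{
	return table_bytes <= region_bytes;
}
```

With these definitions KIB(8) is 8192, KIB(10) is 10240 and KIB(12) is
12288, so a table of, say, 9000 bytes overflows the old region but fits in
the new one.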



Re: [PATCH v2 1/5] drm/amdgpu: Move package type enum to amdgpu_smuio

2023-10-04 Thread Deucher, Alexander
[AMD Official Use Only - General]

Series is:
Reviewed-by: Alex Deucher 

From: Lazar, Lijo 
Sent: Wednesday, October 4, 2023 3:39 AM
To: amd-gfx@lists.freedesktop.org 
Cc: Zhang, Hawking ; Deucher, Alexander 

Subject: [PATCH v2 1/5] drm/amdgpu: Move package type enum to amdgpu_smuio

Move definition of package type to amdgpu_smuio header and add new
package types for CEM and OAM.

Signed-off-by: Lijo Lazar 
---

v2: Move definition to amdgpu_smuio.h instead of amdgpu.h (Christian/Hawking)

 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h   | 5 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_smuio.h | 7 +++
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
index 42ac6d1bf9ca..7088c5015675 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
@@ -69,11 +69,6 @@ enum amdgpu_gfx_partition {

 #define NUM_XCC(x) hweight16(x)

-enum amdgpu_pkg_type {
-   AMDGPU_PKG_TYPE_APU = 2,
-   AMDGPU_PKG_TYPE_UNKNOWN,
-};
-
 enum amdgpu_gfx_ras_mem_id_type {
 AMDGPU_GFX_CP_MEM = 0,
 AMDGPU_GFX_GCEA_MEM,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_smuio.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_smuio.h
index 89c38d864471..5910d50ac74d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_smuio.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_smuio.h
@@ -23,6 +23,13 @@
 #ifndef __AMDGPU_SMUIO_H__
 #define __AMDGPU_SMUIO_H__

+enum amdgpu_pkg_type {
+   AMDGPU_PKG_TYPE_APU = 2,
+   AMDGPU_PKG_TYPE_CEM = 3,
+   AMDGPU_PKG_TYPE_OAM = 4,
+   AMDGPU_PKG_TYPE_UNKNOWN,
+};
+
 struct amdgpu_smuio_funcs {
 u32 (*get_rom_index_offset)(struct amdgpu_device *adev);
 u32 (*get_rom_data_offset)(struct amdgpu_device *adev);
--
2.25.1
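
One side effect worth keeping in mind: AMDGPU_PKG_TYPE_UNKNOWN has no
explicit value, so inserting CEM and OAM ahead of it changes its numeric
value. The sketch below copies the two enum shapes under hypothetical names
to show this.

```c
/* Shape of the enum before the patch (names changed to avoid clashes). */
enum pkg_type_before {
	BEFORE_PKG_TYPE_APU = 2,
	BEFORE_PKG_TYPE_UNKNOWN,	/* implicit: previous + 1 */
};

/* Shape of the enum after the patch. */
enum pkg_type_after {
	AFTER_PKG_TYPE_APU = 2,
	AFTER_PKG_TYPE_CEM = 3,
	AFTER_PKG_TYPE_OAM = 4,
	AFTER_PKG_TYPE_UNKNOWN,		/* implicit: previous + 1 */
};
```

UNKNOWN moves from 3 to 5, so any comparison should use the enumerator name
rather than a raw number.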



RE: [PATCH v3 1/4] drm/amd: Add support for prepare() and complete() callbacks

2023-10-03 Thread Deucher, Alexander
[Public]

> -Original Message-
> From: Limonciello, Mario 
> Sent: Tuesday, October 3, 2023 5:17 PM
> To: Deucher, Alexander ; amd-
> g...@lists.freedesktop.org
> Cc: Wentland, Harry 
> Subject: Re: [PATCH v3 1/4] drm/amd: Add support for prepare() and
> complete() callbacks
>
> On 10/3/2023 16:11, Deucher, Alexander wrote:
> > [Public]
> >
> >> -Original Message-
> >> From: amd-gfx  On Behalf Of
> >> Mario Limonciello
> >> Sent: Tuesday, October 3, 2023 4:55 PM
> >> To: amd-gfx@lists.freedesktop.org
> >> Cc: Wentland, Harry ; Limonciello, Mario
> >> 
> >> Subject: [PATCH v3 1/4] drm/amd: Add support for prepare() and
> >> complete() callbacks
> >>
> >> The Linux PM core has a prepare() callback that runs before suspend and
> >> a complete() callback that runs after resume() for devices to use.  Add
> >> plumbing to bring prepare() to amdgpu.
> >>
> >> The idea with the new vfuncs for amdgpu is that all IP blocks that need
> >> memory allocations during suspend should do the allocation from this
> >> call instead of the suspend() callback.
> >>
> >> By moving the allocations to prepare(), the system suspend will fail
> >> before any IP block has run any suspend code.
> >>
> >> If the suspend fails, then do any cleanups in the complete() callback.
> >>
> >> Signed-off-by: Mario Limonciello 
> >> ---
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu.h|  2 ++
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 39
> >> --
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c| 11 +++---
> >>   3 files changed, 46 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> >> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> >> index 73e825d20259..5d651552822c 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> >> @@ -1415,6 +1415,8 @@ void amdgpu_driver_postclose_kms(struct
> >> drm_device *dev,  void amdgpu_driver_release_kms(struct drm_device
> >> *dev);
> >>
> >>   int amdgpu_device_ip_suspend(struct amdgpu_device *adev);
> >> +int amdgpu_device_prepare(struct drm_device *dev); void
> >> +amdgpu_device_complete(struct drm_device *dev);
> >>   int amdgpu_device_suspend(struct drm_device *dev, bool fbcon);  int
> >> amdgpu_device_resume(struct drm_device *dev, bool fbcon);
> >>   u32 amdgpu_get_vblank_counter_kms(struct drm_crtc *crtc); diff
> >> --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >> index bad2b5577e96..f53cf675c3ce 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >> @@ -4259,6 +4259,43 @@ static int
> >> amdgpu_device_evict_resources(struct
> >> amdgpu_device *adev)
> >>   /*
> >>* Suspend & resume.
> >>*/
> >> +/**
> >> + * amdgpu_device_prepare - prepare for device suspend
> >> + *
> >> + * @dev: drm dev pointer
> >> + *
> >> + * Prepare to put the hw in the suspend state (all asics).
> >> + * Returns 0 for success or an error on failure.
> >> + * Called at driver suspend.
> >> + */
> >> +int amdgpu_device_prepare(struct drm_device *dev) {
> >> + struct amdgpu_device *adev = drm_to_adev(dev);
> >> + int r;
> >> +
> >> + if (dev->switch_power_state == DRM_SWITCH_POWER_OFF)
> >> + return 0;
> >> +
> >> + adev->in_suspend = true;
> >> +
> >> + return 0;
> >> +}
> >> +
> >> +/**
> >> + * amdgpu_device_complete - complete the device after resume
> >> + *
> >> + * @dev: drm dev pointer
> >> + *
> >> + * Clean up any actions that the prepare step did.
> >> + * Called after driver resume.
> >> + */
> >> +void amdgpu_device_complete(struct drm_device *dev) {
> >> + struct amdgpu_device *adev = drm_to_adev(dev);
> >> +
> >> + adev->in_suspend = false;
> >> +}
> >> +
> >>   /**
> >>* amdgpu_device_suspend - initiate device suspend
> >>*
> >> @@ -4277,8 +4314,6 @@ int amdgpu_device_suspend(struct drm_device
> >> *dev, bool fbcon)
> >>if (dev->switch_power_state == DRM_SWITCH_POWER_OFF)

RE: [PATCH v3 1/4] drm/amd: Add support for prepare() and complete() callbacks

2023-10-03 Thread Deucher, Alexander
[Public]

> -Original Message-
> From: amd-gfx  On Behalf Of Mario
> Limonciello
> Sent: Tuesday, October 3, 2023 4:55 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Wentland, Harry ; Limonciello, Mario
> 
> Subject: [PATCH v3 1/4] drm/amd: Add support for prepare() and complete()
> callbacks
>
> The Linux PM core has a prepare() callback that runs before suspend and a
> complete() callback that runs after resume() for devices to use.  Add
> plumbing to bring prepare() to amdgpu.
>
> The idea with the new vfuncs for amdgpu is that all IP blocks that need
> memory allocations during suspend should do the allocation from this call
> instead of the suspend() callback.
>
> By moving the allocations to prepare(), the system suspend will fail before
> any IP block has run any suspend code.
>
> If the suspend fails, then do any cleanups in the complete() callback.
>
> Signed-off-by: Mario Limonciello 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h|  2 ++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 39
> --
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c| 11 +++---
>  3 files changed, 46 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 73e825d20259..5d651552822c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -1415,6 +1415,8 @@ void amdgpu_driver_postclose_kms(struct
> drm_device *dev,  void amdgpu_driver_release_kms(struct drm_device *dev);
>
>  int amdgpu_device_ip_suspend(struct amdgpu_device *adev);
> +int amdgpu_device_prepare(struct drm_device *dev); void
> +amdgpu_device_complete(struct drm_device *dev);
>  int amdgpu_device_suspend(struct drm_device *dev, bool fbcon);  int
> amdgpu_device_resume(struct drm_device *dev, bool fbcon);
>  u32 amdgpu_get_vblank_counter_kms(struct drm_crtc *crtc); diff --git
> a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index bad2b5577e96..f53cf675c3ce 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -4259,6 +4259,43 @@ static int amdgpu_device_evict_resources(struct
> amdgpu_device *adev)
>  /*
>   * Suspend & resume.
>   */
> +/**
> + * amdgpu_device_prepare - prepare for device suspend
> + *
> + * @dev: drm dev pointer
> + *
> + * Prepare to put the hw in the suspend state (all asics).
> + * Returns 0 for success or an error on failure.
> + * Called at driver suspend.
> + */
> +int amdgpu_device_prepare(struct drm_device *dev) {
> + struct amdgpu_device *adev = drm_to_adev(dev);
> + int r;
> +
> + if (dev->switch_power_state == DRM_SWITCH_POWER_OFF)
> + return 0;
> +
> + adev->in_suspend = true;
> +
> + return 0;
> +}
> +
> +/**
> + * amdgpu_device_complete - complete the device after resume
> + *
> + * @dev: drm dev pointer
> + *
> + * Clean up any actions that the prepare step did.
> + * Called after driver resume.
> + */
> +void amdgpu_device_complete(struct drm_device *dev) {
> + struct amdgpu_device *adev = drm_to_adev(dev);
> +
> + adev->in_suspend = false;
> +}
> +
>  /**
>   * amdgpu_device_suspend - initiate device suspend
>   *
> @@ -4277,8 +4314,6 @@ int amdgpu_device_suspend(struct drm_device
> *dev, bool fbcon)
>   if (dev->switch_power_state == DRM_SWITCH_POWER_OFF)
>   return 0;
>
> - adev->in_suspend = true;
> -

We also set this to false in amdgpu_device_resume() so that should be fixed up 
as well.  But, I'm not sure we want to move this out of 
amdgpu_device_suspend().  There are places we use 
amdgpu_device_suspend/resume() outside of pmops that also rely on these being 
set.  Those places may need to be fixed up if we do.  IIRC, the switcheroo code 
uses this.

Alex

>   /* Evict the majority of BOs before grabbing the full access */
>   r = amdgpu_device_evict_resources(adev);
>   if (r)
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index e3471293846f..4c6fb852516a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -2425,8 +2425,9 @@ static int amdgpu_pmops_prepare(struct device
> *dev)
>   /* Return a positive number here so
>* DPM_FLAG_SMART_SUSPEND works properly
>*/
> - if (amdgpu_device_supports_boco(drm_dev))
> - return pm_runtime_suspended(dev);
> + if (amdgpu_device_supports_boco(drm_dev) &&
> + pm_runtime_suspended(dev))
> + return 1;
>
>   /* if we will not support s3 or s2i for the device
>*  then skip suspend
> @@ -2435,12 +2436,14 @@ static int amdgpu_pmops_prepare(struct device
> *dev)
>   !amdgpu_acpi_is_s3_active(adev))
>   return 1;
>
> - return 0;
> + return amdgpu_device_prepare(drm_dev);
>  }
>
>  static void amdgpu_pmops_complete(struct device *dev)  {
> - /* nothing to do */
> + 

RE: [PATCH v2 2/3] drm/amd: Move evict resources suspend step to prepare()

2023-10-03 Thread Deucher, Alexander
[AMD Official Use Only - General]

> -Original Message-
> From: amd-gfx  On Behalf Of Mario
> Limonciello
> Sent: Monday, October 2, 2023 6:45 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Wentland, Harry ; Limonciello, Mario
> 
> Subject: [PATCH v2 2/3] drm/amd: Move evict resources suspend step to
> prepare()
>
> If the system is under high memory pressure, the resources may need to be
> evicted into swap instead.  If the storage backing for swap is offlined during
> the suspend() step then such a call may fail.
>
> So instead move this step into prepare(), while leaving all other steps that 
> put
> the GPU into a low power state in suspend().
>
> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2362
> Signed-off-by: Mario Limonciello 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 12 +---
>  1 file changed, 5 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index c41d69e7a8f5..bb0e753fb6f8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -4276,6 +4276,11 @@ int amdgpu_device_prepare(struct drm_device
> *dev)
>   if (dev->switch_power_state == DRM_SWITCH_POWER_OFF)
>   return 0;
>
> + /* Evict the majority of BOs before grabbing the full access */
> + r = amdgpu_device_evict_resources(adev);
> + if (r)
> + return r;
> +
>   return 0;
>  }
>
> @@ -4297,13 +4302,6 @@ int amdgpu_device_suspend(struct drm_device
> *dev, bool fbcon)
>   if (dev->switch_power_state == DRM_SWITCH_POWER_OFF)
>   return 0;
>
> - adev->in_suspend = true;

This shouldn't be dropped.

Alex

> -
> - /* Evict the majority of BOs before grabbing the full access */
> - r = amdgpu_device_evict_resources(adev);
> - if (r)
> - return r;
> -
>   if (amdgpu_sriov_vf(adev)) {
>   amdgpu_virt_fini_data_exchange(adev);
>   r = amdgpu_virt_request_full_gpu(adev, false);
> --
> 2.34.1
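
The split proposed in the quoted patch can be sketched as a toy model: the
step that may fail (eviction) runs in prepare(), and a failure there aborts
the transition before suspend() puts anything into a low power state. The
struct, field and function names below are hypothetical and only model the
ordering, not the real PM core or amdgpu code.

```c
struct dev_model {
	int fail_evict;	/* force the eviction step to fail */
	int evicted;
	int low_power;
};

/* Models prepare(): the fallible eviction happens here. */
static int model_prepare(struct dev_model *d)
{
	if (d->fail_evict)
		return -12;	/* -ENOMEM */
	d->evicted = 1;
	return 0;
}

/* Models suspend(): everything that puts the GPU into low power. */
static int model_suspend(struct dev_model *d)
{
	d->low_power = 1;
	return 0;
}

/* Models the ordering: suspend() only runs if prepare() succeeded. */
static int model_system_suspend(struct dev_model *d)
{
	int r = model_prepare(d);

	if (r)
		return r;
	return model_suspend(d);
}
```

In the failing case the device is left untouched by suspend(), which is the
property the commit message is after.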



RE: [PATCH 1/3] drm/amd: Fix detection of _PR3 on the PCIe root port

2023-09-28 Thread Deucher, Alexander
[Public]

> -Original Message-
> From: Limonciello, Mario 
> Sent: Tuesday, September 26, 2023 7:00 PM
> To: amd-gfx@lists.freedesktop.org; Sebastian Reichel ;
> Deucher, Alexander 
> Cc: linux...@vger.kernel.org; linux-ker...@vger.kernel.org; Ma, Jun
> ; Limonciello, Mario 
> Subject: [PATCH 1/3] drm/amd: Fix detection of _PR3 on the PCIe root port
>
> On some systems, a Navi3x dGPU will attempt to use BACO for runtime PM but
> fail to resume properly.  This is because on these systems the root port
> goes into D3cold, which is incompatible with BACO.
>
> This happens because in this case the dGPU is connected to a bridge below
> the root port, which causes the BOCO detection logic to fail.  Fix the
> intent of the logic by looking at the root port, not the immediate upstream
> bridge, for _PR3.
>
> Cc: sta...@vger.kernel.org
> Suggested-by: Jun Ma 
> Tested-by: David Perry 
> Fixes: b10c1c5b3a4e ("drm/amdgpu: add check for ACPI power resources")
> Signed-off-by: Mario Limonciello 

Series is:
Reviewed-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index e4627d92e1d0..bad2b5577e96 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -2238,7 +2238,7 @@ static int amdgpu_device_ip_early_init(struct
> amdgpu_device *adev)
>   adev->flags |= AMD_IS_PX;
>
>   if (!(adev->flags & AMD_IS_APU)) {
> - parent = pci_upstream_bridge(adev->pdev);
> + parent = pcie_find_root_port(adev->pdev);
>   adev->has_pr3 = parent ? pci_pr3_present(parent) : false;
>   }
>
> --
> 2.34.1
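
The difference between the two lookups in the quoted patch can be sketched with a toy hierarchy. This is only an illustration under stated assumptions: toy_pci_dev and the toy_* helpers are hypothetical stand-ins, not the kernel's struct pci_dev, pci_upstream_bridge(), pcie_find_root_port() or pci_pr3_present(). The topology (root port with _PR3, then a bridge without it, then the dGPU) is assumed from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a PCI device; not the kernel type. */
struct toy_pci_dev {
	struct toy_pci_dev *upstream; /* NULL at the top of the hierarchy */
	bool is_root_port;
	bool has_pr3;                 /* models an ACPI _PR3 object on this device */
};

/* Models the old lookup: only the immediate upstream bridge. */
static struct toy_pci_dev *toy_upstream_bridge(struct toy_pci_dev *dev)
{
	assert(dev != NULL);
	return dev->upstream;
}

/* Models the new lookup: walk upward until a root port is found. */
static struct toy_pci_dev *toy_find_root_port(struct toy_pci_dev *dev)
{
	assert(dev != NULL);
	for (; dev; dev = dev->upstream)
		if (dev->is_root_port)
			return dev;
	return NULL;
}

/* Models the _PR3 check, tolerating a missing parent. */
static bool toy_pr3_present(const struct toy_pci_dev *dev)
{
	return dev && dev->has_pr3;
}

/* Build root port (with _PR3) -> bridge (no _PR3) -> dGPU and
 * report what each lookup strategy concludes. */
static bool toy_has_pr3(bool use_root_port)
{
	struct toy_pci_dev root = { NULL, true, true };
	struct toy_pci_dev bridge = { &root, false, false };
	struct toy_pci_dev gpu = { &bridge, false, false };
	struct toy_pci_dev *parent;

	parent = use_root_port ? toy_find_root_port(&gpu)
			       : toy_upstream_bridge(&gpu);
	return toy_pr3_present(parent);
}
```

With the intermediate bridge in place, the immediate-parent lookup misses the _PR3 object while the root-port walk finds it, which is the situation the commit message describes.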



Re: [PATCH] drm/amd/pm: Disallow managing power profiles on SRIOV for gc11.0.3

2023-09-27 Thread Deucher, Alexander
[Public]

Acked-by: Alex Deucher 

From: amd-gfx  on behalf of Victor Zhao 

Sent: Monday, September 25, 2023 11:08 PM
To: amd-gfx@lists.freedesktop.org 
Cc: Zhao, Victor 
Subject: [PATCH] drm/amd/pm: Disallow managing power profiles on SRIOV for gc11.0.3

Disable pp_power_profile_mode for SR-IOV on gc11.0.3, as it is not
supported by the SMU.

Signed-off-by: Victor Zhao 
---
 drivers/gpu/drm/amd/pm/amdgpu_pm.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
index 2d19282e4fbe..b6f32d57b81f 100644
--- a/drivers/gpu/drm/amd/pm/amdgpu_pm.c
+++ b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
@@ -2122,7 +2122,8 @@ static int default_attr_update(struct amdgpu_device *adev, struct amdgpu_device_
 } else if (DEVICE_ATTR_IS(pp_power_profile_mode)) {
 if (amdgpu_dpm_get_power_profile_mode(adev, NULL) == -EOPNOTSUPP)
 *states = ATTR_STATE_UNSUPPORTED;
-   else if (gc_ver == IP_VERSION(10, 3, 0) && amdgpu_sriov_vf(adev))
+   else if ((gc_ver == IP_VERSION(10, 3, 0) ||
+ gc_ver == IP_VERSION(11, 0, 3)) && amdgpu_sriov_vf(adev))
 *states = ATTR_STATE_UNSUPPORTED;
 }

--
2.34.1
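
The condition changed by the patch above can be sketched as a small standalone predicate. This is a hedged illustration, not the driver code: TOY_IP_VERSION and toy_profile_mode_unsupported are hypothetical names, the version packing is an assumption made for the sketch, and the SR-IOV state and the result of the profile-mode query are passed in as plain parameters instead of being read from a device.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed packing of major/minor/revision into one value, for this sketch only. */
#define TOY_IP_VERSION(mj, mn, rv) \
	(((uint32_t)(mj) << 16) | ((uint32_t)(mn) << 8) | (uint32_t)(rv))

/*
 * Returns true when the pp_power_profile_mode attribute would be marked
 * unsupported: either the profile-mode query reports -EOPNOTSUPP, or the
 * device is an SR-IOV VF on one of the two listed GC versions.
 */
static bool toy_profile_mode_unsupported(uint32_t gc_ver, bool is_sriov_vf,
					 int get_mode_ret)
{
	if (get_mode_ret == -EOPNOTSUPP)
		return true;

	return (gc_ver == TOY_IP_VERSION(10, 3, 0) ||
		gc_ver == TOY_IP_VERSION(11, 0, 3)) && is_sriov_vf;
}
```

The extra parentheses around the two version comparisons matter: without them, `&&` would bind only to the second comparison, and gc 10.3.0 would be treated as unsupported even on bare metal.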



RE: [PATCH 2/2] drm/amdkfd: drop struct kfd_cu_info

2023-09-26 Thread Deucher, Alexander
[Public]

> -Original Message-
> From: Arnd Bergmann 
> Sent: Tuesday, September 26, 2023 1:49 PM
> To: Deucher, Alexander ; amd-
> g...@lists.freedesktop.org
> Subject: Re: [PATCH 2/2] drm/amdkfd: drop struct kfd_cu_info
>
> On Tue, Sep 26, 2023, at 18:39, Alex Deucher wrote:
> > I think this was an abstraction back from when kfd supported both
> > radeon and amdgpu.  Since we just support amdgpu now, there is no more
> > need for this and we can use the amdgpu structures directly.
> >
> > This also avoids having the kfd_cu_info structures on the stack when
> > inlining which can blow up the stack.
> >
> > Cc: Arnd Bergmann 
> > Signed-off-by: Alex Deucher 
>
> Nice cleanup!
>
> Acked-by: Arnd Bergmann 
>
> I guess you could fold patch 1/2 into this as it removes all the added code
> from that anyway.

I left it as a separate patch because I didn't get a chance to see when the
stack warning first appeared. I figured it might be a good way to mitigate
that on stable kernels, if necessary, without pulling in the whole rework.
If not, I can just squash it into the second patch.

Alex


