Re: [PATCH] drm/amdgpu: release correct lock in amdgpu_gfx_enable_kgq()

2023-05-10 Thread Alex Deucher
Applied.  Thanks!

Alex

On Tue, May 9, 2023 at 10:32 AM Dan Carpenter  wrote:
>
> This function was releasing the incorrect lock on the error path.
>
> Reported-by: kernel test robot 
> Fixes: 9bfa241d1289 ("drm/amdgpu: add [en/dis]able_kgq() functions")
> Signed-off-by: Dan Carpenter 
> ---
> The LKP robot sent me an email about this after I had already written
> the patch.  (I review LKP Smatch emails and hit forward).
>
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> index 969f256aa003..7d2f119d9223 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> @@ -644,7 +644,7 @@ int amdgpu_gfx_enable_kgq(struct amdgpu_device *adev, int 
> xcc_id)
> adev->gfx.num_gfx_rings);
> if (r) {
> DRM_ERROR("Failed to lock KIQ (%d).\n", r);
> -   spin_unlock(&adev->gfx.kiq[0].ring_lock);
> +   spin_unlock(&kiq->ring_lock);
> return r;
> }
>
> --
> 2.39.2
>
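
For reference, a minimal sketch of the locking pattern at issue (simplified;
kiq_ring and num_dw stand in for the real arguments): once the ring lock is
taken through the local kiq pointer for the given xcc_id, the error path has
to drop that same lock rather than the hardcoded instance-0 lock:

    struct amdgpu_kiq *kiq = &adev->gfx.kiq[xcc_id];

    spin_lock(&kiq->ring_lock);
    r = amdgpu_ring_alloc(kiq_ring, num_dw);
    if (r) {
        DRM_ERROR("Failed to lock KIQ (%d).\n", r);
        spin_unlock(&kiq->ring_lock);   /* not kiq[0].ring_lock */
        return r;
    }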


RE: [PATCH 1/2] drm/amdgpu: fix amdgpu_irq_put call trace in jpeg_v4_0_hw_fini

2023-05-10 Thread Zhang, Horatio
[AMD Official Use Only - General]

Got it!

Thanks,
Horatio

-Original Message-
From: Zhang, Hawking  
Sent: Thursday, May 11, 2023 10:28 AM
To: Zhang, Horatio ; Zhou1, Tao ; 
amd-gfx@lists.freedesktop.org
Cc: Xu, Feifei ; Liu, Leo ; Jiang, Sonny 
; Limonciello, Mario ; Liu, 
HaoPing (Alan) ; Zhou, Bob 
Subject: RE: [PATCH 1/2] drm/amdgpu: fix amdgpu_irq_put call trace in 
jpeg_v4_0_hw_fini

[AMD Official Use Only - General]

Please register dedicated ras_irq src and funcs for UVD_POISON, which should 
allow you to create vcn ras sw calls like gfx/sdma ip block.

Regards,
Hawking

-Original Message-
From: Zhang, Horatio 
Sent: Wednesday, May 10, 2023 18:55
To: Zhang, Hawking ; Zhou1, Tao ; 
amd-gfx@lists.freedesktop.org
Cc: Xu, Feifei ; Liu, Leo ; Jiang, Sonny 
; Limonciello, Mario ; Liu, 
HaoPing (Alan) ; Zhou, Bob 
Subject: RE: [PATCH 1/2] drm/amdgpu: fix amdgpu_irq_put call trace in 
jpeg_v4_0_hw_fini

[AMD Official Use Only - General]

Hi Hawking,

At modprobe time, the jpeg/vcn interrupt is enabled in
amdgpu_fence_driver_hw_init(). If the amdgpu_irq_get call is added in
amdgpu_xxx_ras_late_init/xxx_v4_0_late_init, it will enable the instance
interrupt twice.
My previous modification plan also had this issue. Perhaps we should remove
the amdgpu_irq_put call from jpeg/vcn_v4_0_hw_fini.

Regards,
Horatio

-Original Message-
From: Zhang, Hawking 
Sent: Monday, May 8, 2023 8:32 PM
To: Zhou1, Tao ; Zhang, Horatio ; 
amd-gfx@lists.freedesktop.org
Cc: Xu, Feifei ; Liu, Leo ; Jiang, Sonny 
; Limonciello, Mario ; Liu, 
HaoPing (Alan) ; Zhang, Horatio 
Subject: RE: [PATCH 1/2] drm/amdgpu: fix amdgpu_irq_put call trace in 
jpeg_v4_0_hw_fini

[AMD Official Use Only - General]

Shall we consider creating amdgpu_vcn_ras_late_init as a common helper for 
interrupt enablement, like other IP blocks. This also reduces further effort 
when RAS feature is introduced in new version of vcn/jpeg

Regards,
Hawking

-Original Message-
From: Zhou1, Tao 
Sent: Monday, May 8, 2023 19:06
To: Zhang, Horatio ; amd-gfx@lists.freedesktop.org
Cc: Zhang, Hawking ; Xu, Feifei ; 
Liu, Leo ; Jiang, Sonny ; Limonciello, 
Mario ; Liu, HaoPing (Alan) ; 
Zhang, Horatio 
Subject: RE: [PATCH 1/2] drm/amdgpu: fix amdgpu_irq_put call trace in 
jpeg_v4_0_hw_fini

[AMD Official Use Only - General]

The series is:

Reviewed-by: Tao Zhou 

> -Original Message-
> From: Horatio Zhang 
> Sent: Monday, May 8, 2023 6:20 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Zhang, Hawking ; Zhou1, Tao 
> ; Xu, Feifei ; Liu, Leo 
> ; Jiang, Sonny ; Limonciello, 
> Mario ; Liu, HaoPing (Alan) 
> ; Zhang, Horatio 
> Subject: [PATCH 1/2] drm/amdgpu: fix amdgpu_irq_put call trace in 
> jpeg_v4_0_hw_fini
> 
> During the suspend, the jpeg_v4_0_hw_fini function will use
> amdgpu_irq_put to disable the irq of jpeg.inst, but that irq was not enabled
> during the resume process, which resulted in a call trace during the GPU
> reset process.
> 
> [   50.497562] RIP: 0010:amdgpu_irq_put+0xa4/0xc0 [amdgpu]
> [   50.497619] RSP: 0018:aa2400fcfcb0 EFLAGS: 00010246
> [   50.497620] RAX:  RBX: 0001 RCX:
> 
> [   50.497621] RDX:  RSI:  RDI:
> 
> [   50.497621] RBP: aa2400fcfcd0 R08:  R09:
> 
> [   50.497622] R10:  R11:  R12:
> 99b2105242d8
> [   50.497622] R13:  R14: 99b21050 R15:
> 99b21050
> [   50.497623] FS:  () GS:99b51848()
> knlGS:
> [   50.497623] CS:  0010 DS:  ES:  CR0: 80050033
> [   50.497624] CR2: 7f9d32aa91e8 CR3: 0001ba21 CR4:
> 00750ee0
> [   50.497624] PKRU: 5554
> [   50.497625] Call Trace:
> [   50.497625]  
> [   50.497627]  jpeg_v4_0_hw_fini+0x43/0xc0 [amdgpu]
> [   50.497693]  jpeg_v4_0_suspend+0x13/0x30 [amdgpu]
> [   50.497751]  amdgpu_device_ip_suspend_phase2+0x240/0x470 [amdgpu]
> [   50.497802]  amdgpu_device_ip_suspend+0x41/0x80 [amdgpu]
> [   50.497854]  amdgpu_device_pre_asic_reset+0xd9/0x4a0 [amdgpu]
> [   50.497905]  amdgpu_device_gpu_recover.cold+0x548/0xcf1 [amdgpu]
> [   50.498005]  amdgpu_debugfs_reset_work+0x4c/0x80 [amdgpu]
> [   50.498060]  process_one_work+0x21f/0x400
> [   50.498063]  worker_thread+0x200/0x3f0
> [   50.498064]  ? process_one_work+0x400/0x400
> [   50.498065]  kthread+0xee/0x120
> [   50.498067]  ? kthread_complete_and_exit+0x20/0x20
> [   50.498068]  ret_from_fork+0x22/0x30
> 
> Fixes: 86e8255f941e ("drm/amdgpu: add JPEG 4.0 RAS poison consumption
> handling")
> Signed-off-by: Horatio Zhang 
> ---
>  drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c
> b/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c
> index 77e1e64aa1d1..b5c14a166063 100644
> --- a/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c
> 

RE: [PATCH 1/2] drm/amdgpu: fix amdgpu_irq_put call trace in jpeg_v4_0_hw_fini

2023-05-10 Thread Zhang, Hawking
[AMD Official Use Only - General]

Please register dedicated ras_irq src and funcs for UVD_POISON, which should 
allow you to create vcn ras sw calls like gfx/sdma ip block.

Regards,
Hawking

-Original Message-
From: Zhang, Horatio  
Sent: Wednesday, May 10, 2023 18:55
To: Zhang, Hawking ; Zhou1, Tao ; 
amd-gfx@lists.freedesktop.org
Cc: Xu, Feifei ; Liu, Leo ; Jiang, Sonny 
; Limonciello, Mario ; Liu, 
HaoPing (Alan) ; Zhou, Bob 
Subject: RE: [PATCH 1/2] drm/amdgpu: fix amdgpu_irq_put call trace in 
jpeg_v4_0_hw_fini

[AMD Official Use Only - General]

Hi Hawking,

At modprobe time, the jpeg/vcn interrupt is enabled in
amdgpu_fence_driver_hw_init(). If the amdgpu_irq_get call is added in
amdgpu_xxx_ras_late_init/xxx_v4_0_late_init, it will enable the instance
interrupt twice.
My previous modification plan also had this issue. Perhaps we should remove
the amdgpu_irq_put call from jpeg/vcn_v4_0_hw_fini.

Regards,
Horatio

-Original Message-
From: Zhang, Hawking 
Sent: Monday, May 8, 2023 8:32 PM
To: Zhou1, Tao ; Zhang, Horatio ; 
amd-gfx@lists.freedesktop.org
Cc: Xu, Feifei ; Liu, Leo ; Jiang, Sonny 
; Limonciello, Mario ; Liu, 
HaoPing (Alan) ; Zhang, Horatio 
Subject: RE: [PATCH 1/2] drm/amdgpu: fix amdgpu_irq_put call trace in 
jpeg_v4_0_hw_fini

[AMD Official Use Only - General]

Shall we consider creating amdgpu_vcn_ras_late_init as a common helper for 
interrupt enablement, like other IP blocks. This also reduces further effort 
when RAS feature is introduced in new version of vcn/jpeg

Regards,
Hawking

-Original Message-
From: Zhou1, Tao 
Sent: Monday, May 8, 2023 19:06
To: Zhang, Horatio ; amd-gfx@lists.freedesktop.org
Cc: Zhang, Hawking ; Xu, Feifei ; 
Liu, Leo ; Jiang, Sonny ; Limonciello, 
Mario ; Liu, HaoPing (Alan) ; 
Zhang, Horatio 
Subject: RE: [PATCH 1/2] drm/amdgpu: fix amdgpu_irq_put call trace in 
jpeg_v4_0_hw_fini

[AMD Official Use Only - General]

The series is:

Reviewed-by: Tao Zhou 

> -Original Message-
> From: Horatio Zhang 
> Sent: Monday, May 8, 2023 6:20 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Zhang, Hawking ; Zhou1, Tao 
> ; Xu, Feifei ; Liu, Leo 
> ; Jiang, Sonny ; Limonciello, 
> Mario ; Liu, HaoPing (Alan) 
> ; Zhang, Horatio 
> Subject: [PATCH 1/2] drm/amdgpu: fix amdgpu_irq_put call trace in 
> jpeg_v4_0_hw_fini
> 
> During the suspend, the jpeg_v4_0_hw_fini function will use
> amdgpu_irq_put to disable the irq of jpeg.inst, but that irq was not enabled
> during the resume process, which resulted in a call trace during the GPU
> reset process.
> 
> [   50.497562] RIP: 0010:amdgpu_irq_put+0xa4/0xc0 [amdgpu]
> [   50.497619] RSP: 0018:aa2400fcfcb0 EFLAGS: 00010246
> [   50.497620] RAX:  RBX: 0001 RCX:
> 
> [   50.497621] RDX:  RSI:  RDI:
> 
> [   50.497621] RBP: aa2400fcfcd0 R08:  R09:
> 
> [   50.497622] R10:  R11:  R12:
> 99b2105242d8
> [   50.497622] R13:  R14: 99b21050 R15:
> 99b21050
> [   50.497623] FS:  () GS:99b51848()
> knlGS:
> [   50.497623] CS:  0010 DS:  ES:  CR0: 80050033
> [   50.497624] CR2: 7f9d32aa91e8 CR3: 0001ba21 CR4:
> 00750ee0
> [   50.497624] PKRU: 5554
> [   50.497625] Call Trace:
> [   50.497625]  
> [   50.497627]  jpeg_v4_0_hw_fini+0x43/0xc0 [amdgpu]
> [   50.497693]  jpeg_v4_0_suspend+0x13/0x30 [amdgpu]
> [   50.497751]  amdgpu_device_ip_suspend_phase2+0x240/0x470 [amdgpu]
> [   50.497802]  amdgpu_device_ip_suspend+0x41/0x80 [amdgpu]
> [   50.497854]  amdgpu_device_pre_asic_reset+0xd9/0x4a0 [amdgpu]
> [   50.497905]  amdgpu_device_gpu_recover.cold+0x548/0xcf1 [amdgpu]
> [   50.498005]  amdgpu_debugfs_reset_work+0x4c/0x80 [amdgpu]
> [   50.498060]  process_one_work+0x21f/0x400
> [   50.498063]  worker_thread+0x200/0x3f0
> [   50.498064]  ? process_one_work+0x400/0x400
> [   50.498065]  kthread+0xee/0x120
> [   50.498067]  ? kthread_complete_and_exit+0x20/0x20
> [   50.498068]  ret_from_fork+0x22/0x30
> 
> Fixes: 86e8255f941e ("drm/amdgpu: add JPEG 4.0 RAS poison consumption
> handling")
> Signed-off-by: Horatio Zhang 
> ---
>  drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c
> b/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c
> index 77e1e64aa1d1..b5c14a166063 100644
> --- a/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c
> @@ -66,6 +66,13 @@ static int jpeg_v4_0_early_init(void *handle)
>   return 0;
>  }
> 
> +static int jpeg_v4_0_late_init(void *handle)
> +{
> + struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> +
> + return amdgpu_irq_get(adev, &adev->jpeg.inst->irq, 0);
> +}
> +
>  /**
>   * jpeg_v4_0_sw_init - sw init for JPEG block
>   *
> @@ -696,7 

Re: [PATCH] drm/amd/amdgpu: Remove redundant else branch in amdgpu_encoders.c

2023-05-10 Thread Alex Deucher
On Tue, May 9, 2023 at 1:17 AM SHANMUGAM, SRINIVASAN
 wrote:
>
> [AMD Official Use Only - General]
>
>
>
> -Original Message-
> From: Alex Deucher 
> Sent: Monday, May 8, 2023 9:27 PM
> To: SHANMUGAM, SRINIVASAN 
> Cc: Koenig, Christian ; Deucher, Alexander 
> ; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH] drm/amd/amdgpu: Remove redundant else branch in 
> amdgpu_encoders.c
>
> On Mon, May 8, 2023 at 11:29 AM Srinivasan Shanmugam 
>  wrote:
> >
> > Adhere to Linux kernel coding style.
> >
> > Reported by checkpatch:
> >
> > WARNING: else is not generally useful after a break or return
> >
>
> What about the else in the previous case statement?
>
> Alex
>
> Hi Alex,
>
> Thanks a lot for your feedbacks,
>
> The else in the previous case is bound to the if statement "if 
> (amdgpu_connector->use_digital) {", am I correct? Please correct me if my 
> understanding is wrong, and please suggest the best way to fix it so that I 
> can edit and resend the patch.
>

Yes that one.  It follows a similar pattern to the case you changed.
Shouldn't checkpatch warn on both?

Alex

> Much appreciate for your help in advance,
>
> > Cc: Christian König 
> > Cc: Alex Deucher 
> > Signed-off-by: Srinivasan Shanmugam 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_encoders.c | 26
> > ++--
> >  1 file changed, 13 insertions(+), 13 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_encoders.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_encoders.c
> > index c96e458ed088..049e9976ff34 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_encoders.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_encoders.c
> > @@ -242,19 +242,18 @@ bool amdgpu_dig_monitor_is_duallink(struct 
> > drm_encoder *encoder,
> > if ((dig_connector->dp_sink_type == 
> > CONNECTOR_OBJECT_ID_DISPLAYPORT) ||
> > (dig_connector->dp_sink_type == 
> > CONNECTOR_OBJECT_ID_eDP))
> > return false;
> > -   else {
> > -   /* HDMI 1.3 supports up to 340 Mhz over single link 
> > */
> > -   if (connector->display_info.is_hdmi) {
> > -   if (pixel_clock > 340000)
> > -   return true;
> > -   else
> > -   return false;
> > -   } else {
> > -   if (pixel_clock > 165000)
> > -   return true;
> > -   else
> > -   return false;
> > -   }
> > +
> > +   /* HDMI 1.3 supports up to 340 Mhz over single link */
> > +   if (connector->display_info.is_hdmi) {
> > +   if (pixel_clock > 340000)
> > +   return true;
> > +   else
> > +   return false;
> > +   } else {
> > +   if (pixel_clock > 165000)
> > +   return true;
> > +   else
> > +   return false;
> > }
> > default:
> > return false;
> > --
> > 2.25.1
> >
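
As a possible follow-up (an untested sketch, not part of the patch above),
each nested if/else chain that only returns true or false could collapse
into a single boolean return:

    /* HDMI 1.3 supports up to 340 MHz over a single link */
    if (connector->display_info.is_hdmi)
        return pixel_clock > 340000;

    return pixel_clock > 165000;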


[PATCH] drm/amdgpu: change gfx 11.0.4 external_id range

2023-05-10 Thread Yifan Zhang
gfx 11.0.4 range starts from 0x80.

Fixes: 311d52367d0a ("drm/amdgpu: add soc21 common ip block support for GC 
11.0.4")
Cc: stable@vger.kernel.org
Signed-off-by: Yifan Zhang 
Reported-by: Yogesh Mohan Marimuthu 
Acked-by: Alex Deucher 
Reviewed-by: Tim Huang 
---
 drivers/gpu/drm/amd/amdgpu/soc21.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/soc21.c 
b/drivers/gpu/drm/amd/amdgpu/soc21.c
index 0f82b8e83acb..6bff936a6e55 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc21.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc21.c
@@ -711,7 +711,7 @@ static int soc21_common_early_init(void *handle)
AMD_PG_SUPPORT_VCN_DPG |
AMD_PG_SUPPORT_GFX_PG |
AMD_PG_SUPPORT_JPEG;
-   adev->external_rev_id = adev->rev_id + 0x1;
+   adev->external_rev_id = adev->rev_id + 0x80;
break;
 
default:
-- 
2.37.3



Re: [PATCH] drm/amd/amdgpu: Fix warnings in amdgpu _object, _ring.c

2023-05-10 Thread Alex Deucher
On Tue, May 9, 2023 at 10:03 AM Srinivasan Shanmugam
 wrote:
>
> Fix below warnings reported by checkpatch:
>
> WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
> WARNING: static const char * array should probably be static const char * 
> const
> WARNING: space prohibited between function name and open parenthesis '('
> WARNING: braces {} are not necessary for single statement blocks
> WARNING: Symbolic permissions 'S_IRUGO' are not preferred. Consider using 
> octal permissions '0444'.
>
> Cc: Christian König 
> Cc: Alex Deucher 
> Signed-off-by: Srinivasan Shanmugam 

Acked-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 10 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c   |  9 -
>  2 files changed, 9 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> index 7c9b788ae0a9..fbd906ac556e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> @@ -130,7 +130,7 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo 
> *abo, u32 domain)
> u32 c = 0;
>
> if (domain & AMDGPU_GEM_DOMAIN_VRAM) {
> -   unsigned visible_pfn = adev->gmc.visible_vram_size >> 
> PAGE_SHIFT;
> +   unsigned int visible_pfn = adev->gmc.visible_vram_size >> 
> PAGE_SHIFT;
>
> places[c].fpfn = 0;
> places[c].lpfn = 0;
> @@ -935,7 +935,7 @@ int amdgpu_bo_pin_restricted(struct amdgpu_bo *bo, u32 
> domain,
> bo->flags |= AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED;
> amdgpu_bo_placement_from_domain(bo, domain);
> for (i = 0; i < bo->placement.num_placement; i++) {
> -   unsigned fpfn, lpfn;
> +   unsigned int fpfn, lpfn;
>
> fpfn = min_offset >> PAGE_SHIFT;
> lpfn = max_offset >> PAGE_SHIFT;
> @@ -1016,7 +1016,7 @@ void amdgpu_bo_unpin(struct amdgpu_bo *bo)
> }
>  }
>
> -static const char *amdgpu_vram_names[] = {
> +static const char * const amdgpu_vram_names[] = {
> "UNKNOWN",
> "GDDR1",
> "DDR2",
> @@ -1148,8 +1148,8 @@ void amdgpu_bo_get_tiling_flags(struct amdgpu_bo *bo, 
> u64 *tiling_flags)
>   * Returns:
>   * 0 for success or a negative error code on failure.
>   */
> -int amdgpu_bo_set_metadata (struct amdgpu_bo *bo, void *metadata,
> -   uint32_t metadata_size, uint64_t flags)
> +int amdgpu_bo_set_metadata(struct amdgpu_bo *bo, void *metadata,
> +  u32 metadata_size, uint64_t flags)
>  {
> struct amdgpu_bo_user *ubo;
> void *buffer;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> index a1d480b7fd1f..7429b20257a6 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> @@ -78,7 +78,7 @@ unsigned int amdgpu_ring_max_ibs(enum amdgpu_ring_type type)
>   * Allocate @ndw dwords in the ring buffer (all asics).
>   * Returns 0 on success, error on failure.
>   */
> -int amdgpu_ring_alloc(struct amdgpu_ring *ring, unsigned ndw)
> +int amdgpu_ring_alloc(struct amdgpu_ring *ring, unsigned int ndw)
>  {
> /* Align requested size with padding so unlock_commit can
>  * pad safely */
> @@ -315,9 +315,8 @@ int amdgpu_ring_init(struct amdgpu_device *adev, struct 
> amdgpu_ring *ring,
>  amdgpu_ring_max_ibs(ring->funcs->type) * 
> ring->funcs->emit_ib_size;
> max_ibs_dw = (max_ibs_dw + ring->funcs->align_mask) & 
> ~ring->funcs->align_mask;
>
> -   if (WARN_ON(max_ibs_dw > max_dw)) {
> +   if (WARN_ON(max_ibs_dw > max_dw))
> max_dw = max_ibs_dw;
> -   }
>
> ring->ring_size = roundup_pow_of_two(max_dw * 4 * 
> sched_hw_submission);
>
> @@ -591,7 +590,7 @@ void amdgpu_debugfs_ring_init(struct amdgpu_device *adev,
> char name[32];
>
> sprintf(name, "amdgpu_ring_%s", ring->name);
> -   debugfs_create_file_size(name, S_IFREG | S_IRUGO, root, ring,
> +   debugfs_create_file_size(name, S_IFREG | 0444, root, ring,
>  &amdgpu_debugfs_ring_fops,
>  ring->ring_size + 12);
>
> @@ -601,7 +600,7 @@ void amdgpu_debugfs_ring_init(struct amdgpu_device *adev,
>
> if (ring->mqd_obj) {
> sprintf(name, "amdgpu_mqd_%s", ring->name);
> -   debugfs_create_file_size(name, S_IFREG | S_IRUGO, root, ring,
> +   debugfs_create_file_size(name, S_IFREG | 0444, root, ring,
>  &amdgpu_debugfs_mqd_fops,
>  ring->mqd_size);
> }
> --
> 2.25.1
>
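
For reference, a small standalone example of what the "static const char *
array should probably be static const char * const" warning is about
(illustrative only, not driver code):

    static const char *names_a[]         = { "GDDR1", "DDR2" };
    static const char * const names_b[]  = { "GDDR1", "DDR2" };

    void demo(void)
    {
        names_a[0] = "HBM"; /* compiles: the chars are const, but the
                             * pointers in the array are still writable */
        names_b[0] = "HBM"; /* compile error: the array of pointers is
                             * itself const */
    }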


RE: [PATCH] drm/amdgpu: change gfx 11.0.4 external_id range

2023-05-10 Thread Huang, Tim
[AMD Official Use Only - General]

This patch is

Reviewed-by: Tim Huang 

Best Regards,
Tim Huang



-Original Message-
From: Zhang, Yifan 
Sent: Wednesday, May 10, 2023 4:38 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Huang, Tim 
; Du, Xiaojian ; Limonciello, Mario 
; Mohan Marimuthu, Yogesh 
; Zhang, Yifan 
Subject: [PATCH] drm/amdgpu: change gfx 11.0.4 external_id range

gfx 11.0.4 range starts from 0x80.

Fixes: 311d52367d0a ("drm/amdgpu: add soc21 common ip block support for GC 
11.0.4")

Signed-off-by: Yifan Zhang 
---
 drivers/gpu/drm/amd/amdgpu/soc21.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/soc21.c 
b/drivers/gpu/drm/amd/amdgpu/soc21.c
index 0f82b8e83acb..6bff936a6e55 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc21.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc21.c
@@ -711,7 +711,7 @@ static int soc21_common_early_init(void *handle)
AMD_PG_SUPPORT_VCN_DPG |
AMD_PG_SUPPORT_GFX_PG |
AMD_PG_SUPPORT_JPEG;
-   adev->external_rev_id = adev->rev_id + 0x1;
+   adev->external_rev_id = adev->rev_id + 0x80;
break;

default:
--
2.37.3



[PATCH 5/5] drm/amdgpu: add check for RAS instance mask

2023-05-10 Thread Alex Deucher
From: Tao Zhou 

The mask only needs to be set when the RAS block instance number is
more than 1, and invalid bits should also be masked out.
We only check valid bits for the GFX and SDMA blocks for now, and will
add checks for other RAS blocks in the future.

v2: move the check under injection operation since the mask is only
used by RAS error inject.
v3: add valid bits handling for SDMA.
v4: print message if the mask is adjusted.

Signed-off-by: Tao Zhou 
Reviewed-by: Hawking Zhang 
Reviewed-by: Stanley.Yang 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 38 +
 1 file changed, 38 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index b7d8250a9281..6bb438642cc0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -333,6 +333,42 @@ static int amdgpu_ras_debugfs_ctrl_parse_data(struct file 
*f,
return 0;
 }
 
+static void amdgpu_ras_instance_mask_check(struct amdgpu_device *adev,
+   struct ras_debug_if *data)
+{
+   int num_xcc = adev->gfx.xcc_mask ? NUM_XCC(adev->gfx.xcc_mask) : 1;
+   uint32_t mask, inst_mask = data->inject.instance_mask;
+
+   /* no need to set instance mask if there is only one instance */
+   if (num_xcc <= 1 && inst_mask) {
+   data->inject.instance_mask = 0;
+   dev_dbg(adev->dev,
+   "RAS inject mask(0x%x) isn't supported and force it to 
0.\n",
+   inst_mask);
+
+   return;
+   }
+
+   switch (data->head.block) {
+   case AMDGPU_RAS_BLOCK__GFX:
+   mask = GENMASK(num_xcc - 1, 0);
+   break;
+   case AMDGPU_RAS_BLOCK__SDMA:
+   mask = GENMASK(adev->sdma.num_instances - 1, 0);
+   break;
+   default:
+   mask = 0;
+   break;
+   }
+
+   /* remove invalid bits in instance mask */
+   data->inject.instance_mask &= mask;
+   if (inst_mask != data->inject.instance_mask)
+   dev_dbg(adev->dev,
+   "Adjust RAS inject mask 0x%x to 0x%x\n",
+   inst_mask, data->inject.instance_mask);
+}
+
 /**
  * DOC: AMDGPU RAS debugfs control interface
  *
@@ -468,6 +504,8 @@ static ssize_t amdgpu_ras_debugfs_ctrl_write(struct file *f,
break;
}
 
+   amdgpu_ras_instance_mask_check(adev, &data);
+
/* data.inject.address is offset instead of absolute gpu 
address */
ret = amdgpu_ras_error_inject(adev, &data);
break;
-- 
2.40.1
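
A worked example of the trimming above, with made-up numbers: for a GFX
block with num_xcc = 4, mask = GENMASK(3, 0) = 0xf, so a user-supplied
instance mask of 0x1f would be adjusted to 0xf and the dev_dbg message
printed:

    mask = GENMASK(num_xcc - 1, 0);         /* 0xf for num_xcc = 4 */
    data->inject.instance_mask &= mask;     /* 0x1f -> 0xf */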



[PATCH 3/5] drm/amdgpu: reorganize RAS injection flow

2023-05-10 Thread Alex Deucher
From: Tao Zhou 

So GFX RAS injection can use the default function if it doesn't define its
own injection interface.

Signed-off-by: Tao Zhou 
Reviewed-by: Hawking Zhang 
Reviewed-by: Stanley.Yang 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 7ae08f168f99..b7d8250a9281 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -1123,16 +1123,15 @@ int amdgpu_ras_error_inject(struct amdgpu_device *adev,
  block_info.address);
}
 
-   if (info->head.block == AMDGPU_RAS_BLOCK__GFX) {
-   if (block_obj->hw_ops->ras_error_inject)
+   if (block_obj->hw_ops->ras_error_inject) {
+   if (info->head.block == AMDGPU_RAS_BLOCK__GFX)
ret = block_obj->hw_ops->ras_error_inject(adev, info, 
info->instance_mask);
-   } else {
-   /* If defined special ras_error_inject(e.g: xgmi), implement 
special ras_error_inject */
-   if (block_obj->hw_ops->ras_error_inject)
+   else /* Special ras_error_inject is defined (e.g: xgmi) */
ret = block_obj->hw_ops->ras_error_inject(adev, &block_info,
info->instance_mask);
-   else  /*If not defined .ras_error_inject, use default 
ras_error_inject*/
-   ret = psp_ras_trigger_error(&adev->psp, &block_info, 
info->instance_mask);
+   } else {
+   /* default path */
+   ret = psp_ras_trigger_error(&adev->psp, &block_info, 
info->instance_mask);
}
 
if (ret)
-- 
2.40.1



[PATCH 2/5] drm/amdgpu: add instance mask for RAS inject

2023-05-10 Thread Alex Deucher
From: Tao Zhou 

The user can specify the injected instances by the mask. For backward
compatibility, the mask value is incorporated into the sub block index
without an interface change in the RAS TA.
The user passes a logical mask, and the driver should convert it to a
physical value before sending it to the RAS TA.

v2: update parameter name.

Signed-off-by: Tao Zhou 
Reviewed-by: Hawking Zhang 
Reviewed-by: Stanley.Yang 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c  | 21 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h  |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c  | 23 ---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h  |  9 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c |  5 +++--
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c|  6 +++---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_4.c|  4 ++--
 drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c  |  5 +++--
 8 files changed, 56 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index ec79a5c2f500..59b8b26e2caf 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -1672,14 +1672,33 @@ int psp_ras_initialize(struct psp_context *psp)
 }
 
 int psp_ras_trigger_error(struct psp_context *psp,
- struct ta_ras_trigger_error_input *info)
+ struct ta_ras_trigger_error_input *info, uint32_t 
instance_mask)
 {
struct ta_ras_shared_memory *ras_cmd;
+   struct amdgpu_device *adev = psp->adev;
int ret;
+   uint32_t dev_mask;
 
if (!psp->ras_context.context.initialized)
return -EINVAL;
 
+   switch (info->block_id) {
+   case TA_RAS_BLOCK__GFX:
+   dev_mask = GET_MASK(GC, instance_mask);
+   break;
+   case TA_RAS_BLOCK__SDMA:
+   dev_mask = GET_MASK(SDMA0, instance_mask);
+   break;
+   default:
+   dev_mask = instance_mask;
+   break;
+   }
+
+   /* reuse sub_block_index for backward compatibility */
+   dev_mask <<= AMDGPU_RAS_INST_SHIFT;
+   dev_mask &= AMDGPU_RAS_INST_MASK;
+   info->sub_block_index |= dev_mask;
+
ras_cmd = (struct ta_ras_shared_memory 
*)psp->ras_context.context.mem_context.shared_buf;
memset(ras_cmd, 0, sizeof(struct ta_ras_shared_memory));
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h
index 0a409da749d1..d84323923a3f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h
@@ -486,7 +486,7 @@ int psp_ras_invoke(struct psp_context *psp, uint32_t 
ta_cmd_id);
 int psp_ras_enable_features(struct psp_context *psp,
union ta_ras_cmd_input *info, bool enable);
 int psp_ras_trigger_error(struct psp_context *psp,
- struct ta_ras_trigger_error_input *info);
+ struct ta_ras_trigger_error_input *info, uint32_t 
instance_mask);
 int psp_ras_terminate(struct psp_context *psp);
 
 int psp_hdcp_invoke(struct psp_context *psp, uint32_t ta_cmd_id);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 64f80e8cbd63..7ae08f168f99 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -256,6 +256,8 @@ static int amdgpu_ras_debugfs_ctrl_parse_data(struct file 
*f,
int block_id;
uint32_t sub_block;
u64 address, value;
+   /* default value is 0 if the mask is not set by user */
+   u32 instance_mask = 0;
 
if (*pos)
return -EINVAL;
@@ -306,7 +308,11 @@ static int amdgpu_ras_debugfs_ctrl_parse_data(struct file 
*f,
data->op = op;
 
if (op == 2) {
-   if (sscanf(str, "%*s %*s %*s 0x%x 0x%llx 0x%llx",
+   if (sscanf(str, "%*s %*s %*s 0x%x 0x%llx 0x%llx 0x%x",
+  &sub_block, &address, &value, &instance_mask) != 4 &&
+   sscanf(str, "%*s %*s %*s %u %llu %llu %u",
+  &sub_block, &address, &value, &instance_mask) != 4 &&
+   sscanf(str, "%*s %*s %*s 0x%x 0x%llx 0x%llx",
   &sub_block, &address, &value) != 3 &&
sscanf(str, "%*s %*s %*s %u %llu %llu",
   &sub_block, &address, &value) != 3)
@@ -314,6 +320,7 @@ static int amdgpu_ras_debugfs_ctrl_parse_data(struct file 
*f,
data->head.sub_block_index = sub_block;
data->inject.address = address;
data->inject.value = value;
+   data->inject.instance_mask = instance_mask;
}
} else {
if (size < sizeof(*data))
@@ -341,7 +348,7 @@ static int amdgpu_ras_debugfs_ctrl_parse_data(struct file 
*f,
  * sub_block_index: some IPs have subcomponets. say, GFX, sDMA.
  * name: 
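
Based on the extended sscanf format above, an injection request that carries
an instance mask would look something like this (hypothetical values; <N> is
the DRM card index):

    echo "inject gfx ue 0x0 0x0 0x0 0x3" > /sys/kernel/debug/dri/<N>/ras/ras_ctrl

Here the trailing 0x3 selects logical instances 0 and 1; the old three-value
format without a mask still parses, since instance_mask then defaults to 0.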

[PATCH 4/5] drm/amdgpu: remove RAS GFX injection for gfx_v9_4/gfx_v9_4_2

2023-05-10 Thread Alex Deucher
From: Tao Zhou 

There is no special requirement in RAS injection for the two versions, so
switch to the default injection interface.

Signed-off-by: Tao Zhou 
Reviewed-by: Hawking Zhang 
Reviewed-by: Stanley.Yang 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_4.c   | 24 
 drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c | 24 
 2 files changed, 48 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4.c
index 59abe162bbaf..bc8416afb62c 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4.c
@@ -970,29 +970,6 @@ static void gfx_v9_4_reset_ras_error_count(struct 
amdgpu_device *adev)
WREG32_SOC15(GC, 0, mmATC_L2_CACHE_4K_DSM_INDEX, 255);
 }
 
-static int gfx_v9_4_ras_error_inject(struct amdgpu_device *adev,
-void *inject_if, uint32_t instance_mask)
-{
-   struct ras_inject_if *info = (struct ras_inject_if *)inject_if;
-   int ret;
-   struct ta_ras_trigger_error_input block_info = { 0 };
-
-   if (!amdgpu_ras_is_supported(adev, AMDGPU_RAS_BLOCK__GFX))
-   return -EINVAL;
-
-   block_info.block_id = amdgpu_ras_block_to_ta(info->head.block);
-   block_info.sub_block_index = info->head.sub_block_index;
-   block_info.inject_error_type = amdgpu_ras_error_to_ta(info->head.type);
-   block_info.address = info->address;
-   block_info.value = info->value;
-
-   mutex_lock(&adev->grbm_idx_mutex);
-   ret = psp_ras_trigger_error(&adev->psp, &block_info, instance_mask);
-   mutex_unlock(&adev->grbm_idx_mutex);
-
-   return ret;
-}
-
 static const struct soc15_reg_entry gfx_v9_4_ea_err_status_regs =
{ SOC15_REG_ENTRY(GC, 0, mmGCEA_ERR_STATUS), 0, 1, 32 };
 
@@ -1030,7 +1007,6 @@ static void gfx_v9_4_query_ras_error_status(struct 
amdgpu_device *adev)
 
 
 const struct amdgpu_ras_block_hw_ops  gfx_v9_4_ras_ops = {
-   .ras_error_inject = &gfx_v9_4_ras_error_inject,
.query_ras_error_count = &gfx_v9_4_query_ras_error_count,
.reset_ras_error_count = &gfx_v9_4_reset_ras_error_count,
.query_ras_error_status = &gfx_v9_4_query_ras_error_status,
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c
index 4906affa6f8c..2cc3a7cb1f54 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c
@@ -1699,29 +1699,6 @@ static void gfx_v9_4_2_reset_ras_error_count(struct 
amdgpu_device *adev)
gfx_v9_4_2_query_utc_edc_count(adev, NULL, NULL);
 }
 
-static int gfx_v9_4_2_ras_error_inject(struct amdgpu_device *adev,
-   void *inject_if, uint32_t instance_mask)
-{
-   struct ras_inject_if *info = (struct ras_inject_if *)inject_if;
-   int ret;
-   struct ta_ras_trigger_error_input block_info = { 0 };
-
-   if (!amdgpu_ras_is_supported(adev, AMDGPU_RAS_BLOCK__GFX))
-   return -EINVAL;
-
-   block_info.block_id = amdgpu_ras_block_to_ta(info->head.block);
-   block_info.sub_block_index = info->head.sub_block_index;
-   block_info.inject_error_type = amdgpu_ras_error_to_ta(info->head.type);
-   block_info.address = info->address;
-   block_info.value = info->value;
-
-   mutex_lock(&adev->grbm_idx_mutex);
-   ret = psp_ras_trigger_error(&adev->psp, &block_info, instance_mask);
-   mutex_unlock(&adev->grbm_idx_mutex);
-
-   return ret;
-}
-
 static void gfx_v9_4_2_query_ea_err_status(struct amdgpu_device *adev)
 {
uint32_t i, j;
@@ -1945,7 +1922,6 @@ static bool gfx_v9_4_2_query_uctl2_poison_status(struct 
amdgpu_device *adev)
 }
 
 struct amdgpu_ras_block_hw_ops  gfx_v9_4_2_ras_ops = {
-   .ras_error_inject = &gfx_v9_4_2_ras_error_inject,
.query_ras_error_count = &gfx_v9_4_2_query_ras_error_count,
.reset_ras_error_count = &gfx_v9_4_2_reset_ras_error_count,
.query_ras_error_status = &gfx_v9_4_2_query_ras_error_status,
-- 
2.40.1



[PATCH 1/5] drm/amdgpu: convert logical instance mask to physical one

2023-05-10 Thread Alex Deucher
From: Tao Zhou 

Convert instance mask for the convenience of RAS TA.

Signed-off-by: Tao Zhou 
Reviewed-by: Hawking Zhang 
Reviewed-by: Stanley.Yang 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h|  6 --
 .../drm/amd/amdgpu/aqua_vanjaram_reg_init.c| 18 ++
 drivers/gpu/drm/amd/amdgpu/soc15_common.h  |  7 ++-
 3 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 4fb43baddf96..22f1e197cc09 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -698,12 +698,14 @@ enum amd_hw_ip_block_type {
 #define IP_VERSION_REV(ver) ((ver) & 0xFF)
 
 struct amdgpu_ip_map_info {
-   /* Map of logical to actual dev instances */
+   /* Map of logical to actual dev instances/mask */
uint32_tdev_inst[MAX_HWIP][HWIP_MAX_INSTANCE];
int8_t (*logical_to_dev_inst)(struct amdgpu_device *adev,
  enum amd_hw_ip_block_type block,
  int8_t inst);
-
+   uint32_t (*logical_to_dev_mask)(struct amdgpu_device *adev,
+   enum amd_hw_ip_block_type block,
+   uint32_t mask);
 };
 
 struct amd_powerplay {
diff --git a/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram_reg_init.c 
b/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram_reg_init.c
index 93e9f947a85d..68d1a0fc5f5d 100644
--- a/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram_reg_init.c
+++ b/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram_reg_init.c
@@ -229,6 +229,23 @@ static int8_t aqua_vanjaram_logical_to_dev_inst(struct 
amdgpu_device *adev,
return dev_inst;
 }
 
+static uint32_t aqua_vanjaram_logical_to_dev_mask(struct amdgpu_device *adev,
+enum amd_hw_ip_block_type block,
+uint32_t mask)
+{
+   uint32_t dev_mask = 0;
+   int8_t log_inst, dev_inst;
+
+   while (mask) {
+   log_inst = ffs(mask) - 1;
+   dev_inst = aqua_vanjaram_logical_to_dev_inst(adev, block, 
log_inst);
+   dev_mask |= (1 << dev_inst);
+   mask &= ~(1 << log_inst);
+   }
+
+   return dev_mask;
+}
+
 static void aqua_vanjaram_populate_ip_map(struct amdgpu_device *adev,
  enum amd_hw_ip_block_type ip_block,
  uint32_t inst_mask)
@@ -257,6 +274,7 @@ void aqua_vanjaram_ip_map_init(struct amdgpu_device *adev)
aqua_vanjaram_populate_ip_map(adev, ip_map[i][0], ip_map[i][1]);
 
adev->ip_map.logical_to_dev_inst = aqua_vanjaram_logical_to_dev_inst;
+   adev->ip_map.logical_to_dev_mask = aqua_vanjaram_logical_to_dev_mask;
 }
 
 /* Fixed pattern for smn addressing on different AIDs:
diff --git a/drivers/gpu/drm/amd/amdgpu/soc15_common.h 
b/drivers/gpu/drm/amd/amdgpu/soc15_common.h
index 3730c5ec202f..96948a59f8dd 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc15_common.h
+++ b/drivers/gpu/drm/amd/amdgpu/soc15_common.h
@@ -25,7 +25,12 @@
 #define __SOC15_COMMON_H__
 
 /* GET_INST returns the physical instance corresponding to a logical instance 
*/
-#define GET_INST(ip, inst) (adev->ip_map.logical_to_dev_inst? 
adev->ip_map.logical_to_dev_inst(adev, ip##_HWIP, inst): inst)
+#define GET_INST(ip, inst) \
+   (adev->ip_map.logical_to_dev_inst ? \
+   adev->ip_map.logical_to_dev_inst(adev, ip##_HWIP, inst) : inst)
+#define GET_MASK(ip, mask) \
+   (adev->ip_map.logical_to_dev_mask ? \
+   adev->ip_map.logical_to_dev_mask(adev, ip##_HWIP, mask) : mask)
 
 /* Register Access Macros */
 #define SOC15_REG_OFFSET(ip, inst, reg)
(adev->reg_offset[ip##_HWIP][inst][reg##_BASE_IDX] + reg)
-- 
2.40.1
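
A worked example of the conversion, with made-up mappings: if logical GC
instances 0 and 1 map to physical instances 2 and 3, the loop in
aqua_vanjaram_logical_to_dev_mask turns a logical mask of 0x3 into a
physical mask of 0xc, one ffs()-selected bit at a time:

    mask = 0x3
    pass 1: log_inst = 0 -> dev_inst = 2, dev_mask = 0x4, mask = 0x2
    pass 2: log_inst = 1 -> dev_inst = 3, dev_mask = 0xc, mask = 0x0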



[PATCH] drm/amdgpu: Enable IH CAM on GFX9.4.3

2023-05-10 Thread Alex Deucher
From: Mukul Joshi 

This patch enables IH CAM on GFX9.4.3 ASIC.

Signed-off-by: Mukul Joshi 
Reviewed-by: Felix Kuehling 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/nbio_v7_9.c | 4 ++--
 drivers/gpu/drm/amd/amdgpu/vega20_ih.c | 3 ++-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v7_9.c 
b/drivers/gpu/drm/amd/amdgpu/nbio_v7_9.c
index e1552d645308..755259e96bbc 100644
--- a/drivers/gpu/drm/amd/amdgpu/nbio_v7_9.c
+++ b/drivers/gpu/drm/amd/amdgpu/nbio_v7_9.c
@@ -265,7 +265,7 @@ static void nbio_v7_9_ih_doorbell_range(struct 
amdgpu_device *adev,
ih_doorbell_range = REG_SET_FIELD(ih_doorbell_range,
DOORBELL0_CTRL_ENTRY_0,
BIF_DOORBELL0_RANGE_SIZE_ENTRY,
-   0x4);
+   0x8);
 
ih_doorbell_ctrl = REG_SET_FIELD(ih_doorbell_ctrl,
S2A_DOORBELL_ENTRY_1_CTRL,
@@ -278,7 +278,7 @@ static void nbio_v7_9_ih_doorbell_range(struct 
amdgpu_device *adev,
S2A_DOORBELL_PORT1_RANGE_OFFSET, 0);
ih_doorbell_ctrl = REG_SET_FIELD(ih_doorbell_ctrl,
S2A_DOORBELL_ENTRY_1_CTRL,
-   S2A_DOORBELL_PORT1_RANGE_SIZE, 0x4);
+   S2A_DOORBELL_PORT1_RANGE_SIZE, 0x8);
ih_doorbell_ctrl = REG_SET_FIELD(ih_doorbell_ctrl,
S2A_DOORBELL_ENTRY_1_CTRL,
S2A_DOORBELL_PORT1_AWADDR_31_28_VALUE, 0);
diff --git a/drivers/gpu/drm/amd/amdgpu/vega20_ih.c 
b/drivers/gpu/drm/amd/amdgpu/vega20_ih.c
index 17ccf02462ab..4d719df376a7 100644
--- a/drivers/gpu/drm/amd/amdgpu/vega20_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/vega20_ih.c
@@ -334,7 +334,8 @@ static int vega20_ih_irq_init(struct amdgpu_device *adev)

vega20_setup_retry_doorbell(adev->irq.retry_cam_doorbell_index));
 
/* Enable IH Retry CAM */
-   if (adev->ip_versions[OSSSYS_HWIP][0] == IP_VERSION(4, 4, 0))
+   if (adev->ip_versions[OSSSYS_HWIP][0] == IP_VERSION(4, 4, 0) ||
+   adev->ip_versions[OSSSYS_HWIP][0] == IP_VERSION(4, 4, 2))
WREG32_FIELD15(OSSSYS, 0, IH_RETRY_INT_CAM_CNTL_ALDEBARAN,
   ENABLE, 1);
else
-- 
2.40.1



[PATCH 27/29] drm/amdgpu: route ioctls on primary node of XCPs to primary device

2023-05-10 Thread Alex Deucher
From: Shiwu Zhang 

During XCP init, unlike for the primary device, there is no amdgpu_device
attached to each XCP's drm_device.

In case the user tries to open/close the primary node of an XCP drm_device,
this rerouting solves the NULL pointer issue caused by referring
to any member of the amdgpu_device.

 BUG: unable to handle page fault for address: 00020c80
 #PF: supervisor write access in kernel mode
 #PF: error_code(0x0002) - not-present page
 Oops: 0002 [#1] PREEMPT SMP NOPTI
 Call Trace:
  
  lock_timer_base+0x6b/0x90
  try_to_del_timer_sync+0x2b/0x80
  del_timer_sync+0x29/0x40
  flush_delayed_work+0x1c/0x50
  amdgpu_driver_open_kms+0x2c/0x280 [amdgpu]
  drm_file_alloc+0x1b3/0x260 [drm]
  drm_open+0xaa/0x280 [drm]
  drm_stub_open+0xa2/0x120 [drm]
  chrdev_open+0xa6/0x1c0

Signed-off-by: Shiwu Zhang 
Reviewed-by: Le Ma 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c
index 610c32c4f5af..daeb6bcc9245 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c
@@ -241,6 +241,7 @@ static int amdgpu_xcp_dev_alloc(struct amdgpu_device *adev)
 
/* Redirect all IOCTLs to the primary device */
p_ddev->render->dev = ddev;
+   p_ddev->primary->dev = ddev;
p_ddev->vma_offset_manager = ddev->vma_offset_manager;
adev->xcp_mgr->xcp[i].ddev = p_ddev;
}
-- 
2.40.1



[PATCH 29/29] drm/amdgpu: Correct get_xcp_mem_id calculation

2023-05-10 Thread Alex Deucher
From: Philip Yang 

The current calculation only works for NPS4/QPX mode; correct it for
NPS4/CPX mode.

Signed-off-by: Philip Yang 
Reviewed-by: Lijo Lazar 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/aqua_vanjaram_reg_init.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram_reg_init.c 
b/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram_reg_init.c
index 4ca932a62ce6..93e9f947a85d 100644
--- a/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram_reg_init.c
+++ b/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram_reg_init.c
@@ -518,10 +518,9 @@ static int aqua_vanjaram_switch_partition_mode(struct 
amdgpu_xcp_mgr *xcp_mgr,
 static int __aqua_vanjaram_get_xcp_mem_id(struct amdgpu_device *adev,
  int xcc_id, uint8_t *mem_id)
 {
-   /* TODO: Check if any validation is required based on current
-* memory/spatial modes
-*/
+   /* memory/spatial modes validation check is already done */
*mem_id = xcc_id / adev->gfx.num_xcc_per_xcp;
+   *mem_id /= adev->xcp_mgr->num_xcp_per_mem_partition;
 
return 0;
 }
-- 
2.40.1
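
A quick sanity check of the corrected arithmetic, assuming a part with 8
XCCs: in NPS4/CPX mode there are 8 XCPs (one XCC each) and 4 memory
partitions, so num_xcp_per_mem_partition = 2 and xcc_id 5 gives 5 / 1 = 5,
then 5 / 2 = 2, a valid mem_id. The old single division returned 5, which
is out of range for 4 partitions. In NPS4/QPX mode (4 XCPs of two XCCs
each, num_xcp_per_mem_partition = 1) the added division is a no-op, which
is why the bug only showed up in CPX.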



[PATCH 18/29] drm/amdkfd: Update MTYPE for far memory partition

2023-05-10 Thread Alex Deucher
From: Philip Yang 

Use MTYPE_RW/MTYPE_CC for mappings of system memory or VRAM to a KFD node
within the same memory partition; use MTYPE_NC for mappings on a KFD node
from the far memory partition of the same socket or from another socket
on the same XGMI hive.

On NPS4 or 4P system, MTYPE will be overridden per page depending on
the memory NUMA node id and vm->mem_id.

Signed-off-by: Philip Yang 
Reviewed-by: Felix Kuehling 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 15 +++
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c  |  9 +
 2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 7dfe6a8ca91a..ee5d4d67b423 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -1191,7 +1191,7 @@ static void gmc_v9_0_get_coherence_flags(struct 
amdgpu_device *adev,
bool is_vram = bo->tbo.resource->mem_type == TTM_PL_VRAM;
bool coherent = bo->flags & AMDGPU_GEM_CREATE_COHERENT;
bool uncached = bo->flags & AMDGPU_GEM_CREATE_UNCACHED;
-   /* TODO: memory partitions struct amdgpu_vm *vm = 
mapping->bo_va->base.vm;*/
+   struct amdgpu_vm *vm = mapping->bo_va->base.vm;
unsigned int mtype_local, mtype;
bool snoop = false;
bool is_local;
@@ -1252,8 +1252,8 @@ static void gmc_v9_0_get_coherence_flags(struct 
amdgpu_device *adev,
}
is_local = (!is_vram && (adev->flags & AMD_IS_APU) &&
num_possible_nodes() <= 1) ||
-  (is_vram && adev == bo_adev /* TODO: memory 
partitions &&
-   bo->mem_id == vm->mem_id*/);
+  (is_vram && adev == bo_adev &&
+   bo->mem_id == vm->mem_id);
snoop = true;
if (uncached) {
mtype = MTYPE_UC;
@@ -1340,13 +1340,12 @@ static void gmc_v9_0_override_vm_pte_flags(struct 
amdgpu_device *adev,
return;
}
 
-   /* TODO: memory partitions. mem_id is hard-coded to 0 for now.
-* FIXME: Only supported on native mode for now. For carve-out, the
+   /* FIXME: Only supported on native mode for now. For carve-out, the
 * NUMA affinity of the GPU/VM needs to come from the PCI info because
 * memory partitions are not associated with different NUMA nodes.
 */
-   if (adev->gmc.is_app_apu) {
-   local_node = 
adev->gmc.mem_partitions[/*vm->mem_id*/0].numa.node;
+   if (adev->gmc.is_app_apu && vm->mem_id >= 0) {
+   local_node = adev->gmc.mem_partitions[vm->mem_id].numa.node;
} else {
dev_dbg(adev->dev, "Only native mode APU is supported.\n");
return;
@@ -1361,7 +1360,7 @@ static void gmc_v9_0_override_vm_pte_flags(struct 
amdgpu_device *adev,
}
nid = pfn_to_nid(addr >> PAGE_SHIFT);
dev_dbg(adev->dev, "vm->mem_id=%d, local_node=%d, nid=%d\n",
-   /*vm->mem_id*/0, local_node, nid);
+   vm->mem_id, local_node, nid);
if (nid == local_node) {
uint64_t old_flags = *flags;
unsigned int mtype_local = MTYPE_RW;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index f6a886d9e902..8b5453fd304a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1202,8 +1202,8 @@ svm_range_get_pte_flags(struct kfd_node *node,
mapping_flags |= AMDGPU_VM_MTYPE_UC;
} else if (domain == SVM_RANGE_VRAM_DOMAIN) {
/* local HBM region close to partition */
-   if (bo_node->adev == node->adev /* TODO: memory 
partitions &&
-   bo_node->mem_id == node->mem_id*/)
+   if (bo_node->adev == node->adev &&
+   (!bo_node->xcp || !node->xcp || 
bo_node->xcp->mem_id == node->xcp->mem_id))
mapping_flags |= mtype_local;
/* local HBM region far from partition or remote XGMI 
GPU */
else if (svm_nodes_in_same_hive(bo_node, node))
@@ -1357,8 +1357,9 @@ svm_range_map_to_gpu(struct kfd_process_device *pdd, 
struct svm_range *prange,
 (last_domain == SVM_RANGE_VRAM_DOMAIN) ? 1 : 0,
 pte_flags);
 
-   /* TODO: we still need to determine the 
vm_manager.vram_base_offset based on
-* the memory partition.
+   /* For dGPU mode, we use same vm_manager to allocate VRAM for
+* different memory partition based on fpfn/lpfn, we should use
+* same vm_manager.vram_base_offset regardless memory partition.
 */
r = amdgpu_vm_update_range(adev, vm, false, false, flush_tlb, 
NULL,
  

[PATCH 24/29] drm/amdkfd: Move local_mem_info to kfd_node

2023-05-10 Thread Alex Deucher
From: Mukul Joshi 

We need to track memory usage on a per partition basis. To do
that, store the local memory information in KFD node instead
of kfd device.

v2: squash in fix ("amdkfd: Use mem_id to access mem_partition info")

Signed-off-by: Mukul Joshi 
Reviewed-by: Felix Kuehling 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 17 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 12 +++-
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c   |  7 ---
 drivers/gpu/drm/amd/amdkfd/kfd_crat.c  |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_device.c|  7 +--
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h  |  3 ++-
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c  |  7 ---
 7 files changed, 36 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index 00edb13d2124..85df73f2c85e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -428,14 +428,23 @@ uint32_t amdgpu_amdkfd_get_fw_version(struct 
amdgpu_device *adev,
 }
 
 void amdgpu_amdkfd_get_local_mem_info(struct amdgpu_device *adev,
- struct kfd_local_mem_info *mem_info)
+ struct kfd_local_mem_info *mem_info,
+ uint8_t xcp_id)
 {
memset(mem_info, 0, sizeof(*mem_info));
 
-   mem_info->local_mem_size_public = adev->gmc.visible_vram_size;
-   mem_info->local_mem_size_private = adev->gmc.real_vram_size -
+   if (adev->ip_versions[GC_HWIP][0] == IP_VERSION(9, 4, 3)) {
+   if (adev->gmc.real_vram_size == adev->gmc.visible_vram_size)
+   mem_info->local_mem_size_public =
+   KFD_XCP_MEMORY_SIZE(adev, xcp_id);
+   else
+   mem_info->local_mem_size_private =
+   KFD_XCP_MEMORY_SIZE(adev, xcp_id);
+   } else {
+   mem_info->local_mem_size_public = adev->gmc.visible_vram_size;
+   mem_info->local_mem_size_private = adev->gmc.real_vram_size -
adev->gmc.visible_vram_size;
-
+   }
mem_info->vram_width = adev->gmc.vram_width;
 
pr_debug("Address base: %pap public 0x%llx private 0x%llx\n",
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 4e6221bccffe..4bf6f5659568 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -231,7 +231,8 @@ int amdgpu_amdkfd_remove_gws_from_process(void *info, void 
*mem);
 uint32_t amdgpu_amdkfd_get_fw_version(struct amdgpu_device *adev,
  enum kgd_engine_type type);
 void amdgpu_amdkfd_get_local_mem_info(struct amdgpu_device *adev,
- struct kfd_local_mem_info *mem_info);
+ struct kfd_local_mem_info *mem_info,
+ uint8_t xcp_id);
 uint64_t amdgpu_amdkfd_get_gpu_clock_counter(struct amdgpu_device *adev);
 
 uint32_t amdgpu_amdkfd_get_max_engine_clock_in_mhz(struct amdgpu_device *adev);
@@ -334,10 +335,11 @@ void amdgpu_amdkfd_unreserve_mem_limit(struct 
amdgpu_device *adev,
((adev)->xcp_mgr && (xcp_id) >= 0 ?\
(adev)->xcp_mgr->xcp[(xcp_id)].mem_id : -1)
 
-#define KFD_XCP_MEMORY_SIZE(n) ((n)->adev->gmc.num_mem_partitions ?\
-   (n)->adev->gmc.mem_partitions[(n)->xcp->mem_id].size /\
-   (n)->adev->xcp_mgr->num_xcp_per_mem_partition :\
-   (n)->adev->gmc.real_vram_size)
+#define KFD_XCP_MEMORY_SIZE(adev, xcp_id)\
+   ((adev)->gmc.num_mem_partitions && (xcp_id) >= 0 ?\
+   (adev)->gmc.mem_partitions[KFD_XCP_MEM_ID((adev), 
(xcp_id))].size /\
+   (adev)->xcp_mgr->num_xcp_per_mem_partition :\
+   (adev)->gmc.real_vram_size)
 
 #if IS_ENABLED(CONFIG_HSA_AMD)
 void amdgpu_amdkfd_gpuvm_init_mem_limits(void);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 344b238d6771..089e1d498670 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1023,11 +1023,12 @@ bool kfd_dev_is_large_bar(struct kfd_node *dev)
if (dev->kfd->use_iommu_v2)
return false;
 
-   if (dev->kfd->local_mem_info.local_mem_size_private == 0 &&
-   dev->kfd->local_mem_info.local_mem_size_public > 0)
+   if (dev->local_mem_info.local_mem_size_private == 0 &&
+   dev->local_mem_info.local_mem_size_public > 0)
return true;
 
-   if (dev->kfd->local_mem_info.local_mem_size_public == 0 && 
dev->kfd->adev->gmc.is_app_apu) {
+   if (dev->local_mem_info.local_mem_size_public == 0 &&
+   

[PATCH 23/29] drm/amdgpu: use xcp partition ID for amdgpu_gem

2023-05-10 Thread Alex Deucher
From: James Zhu 

Find xcp_id from amdgpu_fpriv, use it for amdgpu_gem_object_create.

Signed-off-by: James Zhu 
Acked-by: Felix Kuehling 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index b02d106d5a0c..aad860667ab5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -336,7 +336,7 @@ int amdgpu_gem_create_ioctl(struct drm_device *dev, void 
*data,
 retry:
r = amdgpu_gem_object_create(adev, size, args->in.alignment,
 initial_domain,
-flags, ttm_bo_type_device, resv, &gobj, 0);
+flags, ttm_bo_type_device, resv, &gobj, fpriv->xcp_id + 1);
if (r && r != -ERESTARTSYS) {
if (flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) {
flags &= ~AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED;
@@ -379,6 +379,7 @@ int amdgpu_gem_userptr_ioctl(struct drm_device *dev, void 
*data,
struct ttm_operation_ctx ctx = { true, false };
struct amdgpu_device *adev = drm_to_adev(dev);
struct drm_amdgpu_gem_userptr *args = data;
+   struct amdgpu_fpriv *fpriv = filp->driver_priv;
struct drm_gem_object *gobj;
struct hmm_range *range;
struct amdgpu_bo *bo;
@@ -405,7 +406,7 @@ int amdgpu_gem_userptr_ioctl(struct drm_device *dev, void 
*data,
 
/* create a gem object to contain this object in */
r = amdgpu_gem_object_create(adev, args->size, 0, AMDGPU_GEM_DOMAIN_CPU,
-0, ttm_bo_type_device, NULL, &gobj, 0);
+0, ttm_bo_type_device, NULL, &gobj, fpriv->xcp_id + 1);
if (r)
return r;
 
@@ -908,6 +909,7 @@ int amdgpu_mode_dumb_create(struct drm_file *file_priv,
struct drm_mode_create_dumb *args)
 {
struct amdgpu_device *adev = drm_to_adev(dev);
+   struct amdgpu_fpriv *fpriv = file_priv->driver_priv;
struct drm_gem_object *gobj;
uint32_t handle;
u64 flags = AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED |
@@ -931,7 +933,7 @@ int amdgpu_mode_dumb_create(struct drm_file *file_priv,
domain = amdgpu_bo_get_preferred_domain(adev,
amdgpu_display_supported_domains(adev, flags));
r = amdgpu_gem_object_create(adev, args->size, 0, domain, flags,
-ttm_bo_type_device, NULL, &gobj, 0);
+ttm_bo_type_device, NULL, &gobj, fpriv->xcp_id + 1);
if (r)
return -ENOMEM;
 
-- 
2.40.1



[PATCH 19/29] drm/amdgpu: Alloc page table on correct memory partition

2023-05-10 Thread Alex Deucher
From: Philip Yang 

Kernel mode page table bo allocation uses amdgpu_vm->mem_id + 1 as the bp
mem_id_plus1 parameter. For APU mode, select the correct TTM pool to
alloc pages from the corresponding memory partition; this will be the
closest NUMA node. For dGPU mode, select the correct address range for
the vram manager.

Signed-off-by: Philip Yang 
Reviewed-by: Felix Kuehling 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
index 60b1da93b06d..62fc7e8d326e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
@@ -534,6 +534,8 @@ int amdgpu_vm_pt_create(struct amdgpu_device *adev, struct 
amdgpu_vm *vm,
 
bp.type = ttm_bo_type_kernel;
bp.no_wait_gpu = immediate;
+   bp.mem_id_plus1 = vm->mem_id + 1;
+
if (vm->root.bo)
bp.resv = vm->root.bo->tbo.base.resv;
 
@@ -558,6 +560,7 @@ int amdgpu_vm_pt_create(struct amdgpu_device *adev, struct 
amdgpu_vm *vm,
bp.type = ttm_bo_type_kernel;
bp.resv = bo->tbo.base.resv;
bp.bo_ptr_size = sizeof(struct amdgpu_bo);
+   bp.mem_id_plus1 = vm->mem_id + 1;
 
r = amdgpu_bo_create(adev, &bp, &(*vmbo)->shadow);
 
-- 
2.40.1
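
The +1 here follows the sentinel convention used across this series (a
reading of the surrounding patches, not spelled out in this one):
mem_id_plus1 == 0 means "no specific partition", so mem_id == -1 (any NUMA
node) and valid partition ids can share one field:

    bp.mem_id_plus1 = vm->mem_id + 1;   /* -1 (any) -> 0, id N -> N + 1 */
    /* the consumer presumably decodes it back as mem_id_plus1 - 1 */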



[PATCH 15/29] drm/amdkfd: Alloc memory of GPU support memory partition

2023-05-10 Thread Alex Deucher
From: Philip Yang 

For dGPU mode VRAM allocation, create amdgpu_bo from amdgpu_vm->mem_id,
to alloc from the correct memory range.

For APU mode VRAM allocation, set alloc domain to GTT, and set
bp->mem_id_plus1 from amdgpu_vm->mem_id + 1 to create amdgpu_bo, to
allocate system memory from correct NUMA node.

For GTT allocation, use mem_id -1 to allocate system memory from any
NUMA nodes.

Remove amdgpu_ttm_tt_set_mem_pool, to avoid the confusion that memory
may be allocated from a different mem_id.

Signed-off-by: Philip Yang 
Reviewed-by: Felix Kuehling 
Signed-off-by: Alex Deucher 
---
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 24 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c   | 20 +---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h   |  1 -
 3 files changed, 8 insertions(+), 37 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 6d0c25e34af1..71b22d61dd27 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1640,9 +1640,9 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
struct drm_gem_object *gobj = NULL;
u32 domain, alloc_domain;
uint64_t aligned_size;
+   int8_t mem_id = -1;
u64 alloc_flags;
int ret;
-   int mem_id = 0; /* Fixme : to be changed when mem_id support patch 
lands, until then NPS1, SPX only */
 
/*
 * Check on which domain to allocate BO
@@ -1652,13 +1652,14 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
 
if (adev->gmc.is_app_apu) {
domain = AMDGPU_GEM_DOMAIN_GTT;
-   alloc_domain = AMDGPU_GEM_DOMAIN_CPU;
+   alloc_domain = AMDGPU_GEM_DOMAIN_GTT;
alloc_flags = 0;
} else {
alloc_flags = AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE;
alloc_flags |= (flags & KFD_IOC_ALLOC_MEM_FLAGS_PUBLIC) 
?
AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED : 0;
}
+   mem_id = avm->mem_id;
} else if (flags & KFD_IOC_ALLOC_MEM_FLAGS_GTT) {
domain = alloc_domain = AMDGPU_GEM_DOMAIN_GTT;
alloc_flags = 0;
@@ -1716,11 +1717,12 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
goto err_reserve_limit;
}
 
-   pr_debug("\tcreate BO VA 0x%llx size 0x%llx domain %s\n",
-   va, (*mem)->aql_queue ? size << 1 : size, 
domain_string(alloc_domain));
+   pr_debug("\tcreate BO VA 0x%llx size 0x%llx domain %s mem_id %d\n",
+va, (*mem)->aql_queue ? size << 1 : size,
+domain_string(alloc_domain), mem_id);
 
ret = amdgpu_gem_object_create(adev, aligned_size, 1, alloc_domain, 
alloc_flags,
-  bo_type, NULL, &gobj, 0);
+  bo_type, NULL, &gobj, mem_id + 1);
if (ret) {
pr_debug("Failed to create BO on domain %s. ret %d\n",
 domain_string(alloc_domain), ret);
@@ -1746,17 +1748,6 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
(*mem)->mapped_to_gpu_memory = 0;
(*mem)->process_info = avm->process_info;
 
-   if (adev->gmc.is_app_apu &&
-   ((*mem)->alloc_flags & KFD_IOC_ALLOC_MEM_FLAGS_VRAM)) {
-   bo->allowed_domains = AMDGPU_GEM_DOMAIN_GTT;
-   bo->preferred_domains = AMDGPU_GEM_DOMAIN_GTT;
-   ret = amdgpu_ttm_tt_set_mem_pool(&bo->tbo, mem_id);
-   if (ret) {
-   pr_debug("failed to set ttm mem pool %d\n", ret);
-   goto err_set_mem_partition;
-   }
-   }
-
add_kgd_mem_to_kfd_bo_list(*mem, avm->process_info, user_addr);
 
if (user_addr) {
@@ -1783,7 +1774,6 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
 allocate_init_user_pages_failed:
 err_pin_bo:
remove_kgd_mem_from_kfd_bo_list(*mem, avm->process_info);
-err_set_mem_partition:
drm_vma_node_revoke(&gobj->vma_node, drm_priv);
 err_node_allow:
/* Don't unreserve system mem limit twice */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 254927c596ba..395edca3b7f9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1064,7 +1064,7 @@ static struct ttm_tt *amdgpu_ttm_tt_create(struct 
ttm_buffer_object *bo,
return NULL;
}
gtt->gobj = &bo->base;
-   gtt->pool_id = NUMA_NO_NODE;
+   gtt->pool_id = abo->mem_id;
 
if (abo->flags & AMDGPU_GEM_CREATE_CPU_GTT_USWC)
caching = ttm_write_combined;
@@ -1159,24 +1159,6 @@ static void amdgpu_ttm_tt_unpopulate(struct ttm_device 
*bdev,
return ttm_pool_free(pool, ttm);
 }
 
-/**
- * amdgpu_ttm_tt_set_mem_pool - Set the TTM memory pool for the 

[PATCH 25/29] drm/amdkfd: Fix memory reporting on GFX 9.4.3

2023-05-10 Thread Alex Deucher
From: Mukul Joshi 

This patch fixes memory reporting on the GFX 9.4.3 APU and dGPU
by reporting available memory on a per-partition basis. If it's an
APU, available and used memory calculations take into account
system and TTM memory.

v2: squash in fix ("drm/amdkfd: Fix array out of bound warning")
squash in fix ("drm/amdgpu: Update memory reporting for GFX9.4.3")

Signed-off-by: Mukul Joshi 
Reviewed-by: Felix Kuehling 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h| 12 +--
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 81 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.h   |  5 ++
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  |  3 +-
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c  | 14 ++--
 5 files changed, 84 insertions(+), 31 deletions(-)
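
The per-partition limit check at the heart of this change can be
sketched on its own; a minimal, hedged illustration (the helper name is
ours; vram_used[] and the two-argument KFD_XCP_MEMORY_SIZE() are the
ones this patch introduces):

	/* Sketch: does this allocation fit in its partition's VRAM budget?
	 * vram_used[] is per-xcp; reserved_for_pt is carved out for page
	 * tables as in the existing limit logic.
	 */
	static int kfd_xcp_vram_fits(struct amdgpu_device *adev, int8_t xcp_id,
				     uint64_t vram_needed, uint64_t reserved_for_pt)
	{
		uint64_t vram_size = KFD_XCP_MEMORY_SIZE(adev, xcp_id);

		if (adev->kfd.vram_used[xcp_id] + vram_needed >
		    vram_size - reserved_for_pt)
			return -ENOMEM;
		return 0;
	}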

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 4bf6f5659568..948d362adabb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -35,6 +35,7 @@
 #include 
 #include "amdgpu_sync.h"
 #include "amdgpu_vm.h"
+#include "amdgpu_xcp.h"
 
 extern uint64_t amdgpu_amdkfd_total_mem_size;
 
@@ -98,8 +99,8 @@ struct amdgpu_amdkfd_fence {
 
 struct amdgpu_kfd_dev {
struct kfd_dev *dev;
-   int64_t vram_used;
-   uint64_t vram_used_aligned;
+   int64_t vram_used[MAX_XCP];
+   uint64_t vram_used_aligned[MAX_XCP];
bool init_complete;
struct work_struct reset_work;
 
@@ -287,7 +288,8 @@ int amdgpu_amdkfd_gpuvm_acquire_process_vm(struct 
amdgpu_device *adev,
 void amdgpu_amdkfd_gpuvm_release_process_vm(struct amdgpu_device *adev,
void *drm_priv);
 uint64_t amdgpu_amdkfd_gpuvm_get_process_page_dir(void *drm_priv);
-size_t amdgpu_amdkfd_get_available_memory(struct amdgpu_device *adev);
+size_t amdgpu_amdkfd_get_available_memory(struct amdgpu_device *adev,
+   uint8_t xcp_id);
 int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
struct amdgpu_device *adev, uint64_t va, uint64_t size,
void *drm_priv, struct kgd_mem **mem,
@@ -327,9 +329,9 @@ void amdgpu_amdkfd_block_mmu_notifications(void *p);
 int amdgpu_amdkfd_criu_resume(void *p);
 bool amdgpu_amdkfd_ras_query_utcl2_poison_status(struct amdgpu_device *adev);
 int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
-   uint64_t size, u32 alloc_flag);
+   uint64_t size, u32 alloc_flag, int8_t xcp_id);
 void amdgpu_amdkfd_unreserve_mem_limit(struct amdgpu_device *adev,
-   uint64_t size, u32 alloc_flag);
+   uint64_t size, u32 alloc_flag, int8_t xcp_id);
 
 #define KFD_XCP_MEM_ID(adev, xcp_id) \
((adev)->xcp_mgr && (xcp_id) >= 0 ?\
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index cf8f80e4ef56..fa4057da0d7f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -156,12 +156,13 @@ void amdgpu_amdkfd_reserve_system_mem(uint64_t size)
  * Return: returns -ENOMEM in case of error, ZERO otherwise
  */
 int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
-   uint64_t size, u32 alloc_flag)
+   uint64_t size, u32 alloc_flag, int8_t xcp_id)
 {
uint64_t reserved_for_pt =
ESTIMATE_PT_SIZE(amdgpu_amdkfd_total_mem_size);
size_t system_mem_needed, ttm_mem_needed, vram_needed;
int ret = 0;
+   uint64_t vram_size = 0;
 
system_mem_needed = 0;
ttm_mem_needed = 0;
@@ -176,6 +177,17 @@ int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device 
*adev,
 * 2M BO chunk.
 */
vram_needed = size;
+		/*
+		 * For GFX 9.4.3, get the VRAM size from XCP structs
+		 */
+		if (WARN_ONCE(xcp_id < 0, "invalid XCP ID %d", xcp_id))
+			return -EINVAL;
+
+		vram_size = KFD_XCP_MEMORY_SIZE(adev, xcp_id);
+		if (adev->gmc.is_app_apu) {
+			system_mem_needed = size;
+			ttm_mem_needed = size;
+		}
} else if (alloc_flag & KFD_IOC_ALLOC_MEM_FLAGS_USERPTR) {
system_mem_needed = size;
} else if (!(alloc_flag &
@@ -195,8 +207,8 @@ int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device 
*adev,
 kfd_mem_limit.max_system_mem_limit && !no_system_mem_limit) ||
(kfd_mem_limit.ttm_mem_used + ttm_mem_needed >
 kfd_mem_limit.max_ttm_mem_limit) ||
-   (adev && adev->kfd.vram_used + vram_needed >
-adev->gmc.real_vram_size - reserved_for_pt)) {
+   (adev && xcp_id >= 0 && adev->kfd.vram_used[xcp_id] + vram_needed >
+vram_size - reserved_for_pt)) {
ret = -ENOMEM;
   

[PATCH 22/29] drm/amdgpu: KFD graphics interop support compute partition

2023-05-10 Thread Alex Deucher
From: Philip Yang 

kfd_ioctl_get_dmabuf_info uses the amdgpu bo xcp_id to get the gpu_id
of the KFD node from the exported dmabuf_adev, and then creates the kfd
bo on the correct adev and KFD node when importing the amdgpu bo to KFD.

Remove the function kfd_device_by_adev; it is not needed, as it gives
the same result as dmabuf_adev->kfd.dev->nodes[0]->id.

Signed-off-by: Philip Yang 
Reviewed-by: Felix Kuehling 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c |  4 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c   | 14 ++
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h  |  1 -
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c  | 18 --
 5 files changed, 10 insertions(+), 29 deletions(-)
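
The gpu_id lookup that replaces kfd_device_by_adev is small enough to
sketch standalone; a hedged illustration (the helper name is ours, node
layout as populated by kgd2kfd_device_init):

	/* Sketch: an exported bo's xcp_id indexes straight into the
	 * exporter's KFD node array; xcp_id < 0 means "not partitioned",
	 * so fall back to node 0.
	 */
	static uint32_t kfd_gpu_id_from_xcp(struct amdgpu_device *dmabuf_adev,
					    int8_t xcp_id)
	{
		int idx = xcp_id >= 0 ? xcp_id : 0;

		return dmabuf_adev->kfd.dev->nodes[idx]->id;
	}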

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index bbbfe9ec4adf..00edb13d2124 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -498,7 +498,7 @@ int amdgpu_amdkfd_get_dmabuf_info(struct amdgpu_device 
*adev, int dma_buf_fd,
  struct amdgpu_device **dmabuf_adev,
  uint64_t *bo_size, void *metadata_buffer,
  size_t buffer_size, uint32_t *metadata_size,
- uint32_t *flags)
+ uint32_t *flags, int8_t *xcp_id)
 {
struct dma_buf *dma_buf;
struct drm_gem_object *obj;
@@ -542,6 +542,8 @@ int amdgpu_amdkfd_get_dmabuf_info(struct amdgpu_device 
*adev, int dma_buf_fd,
if (bo->flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED)
*flags |= KFD_IOC_ALLOC_MEM_FLAGS_PUBLIC;
}
+   if (xcp_id)
+   *xcp_id = bo->xcp_id;
 
 out_put:
dma_buf_put(dma_buf);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 05c54776951b..4e6221bccffe 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -241,7 +241,7 @@ int amdgpu_amdkfd_get_dmabuf_info(struct amdgpu_device 
*adev, int dma_buf_fd,
  struct amdgpu_device **dmabuf_adev,
  uint64_t *bo_size, void *metadata_buffer,
  size_t buffer_size, uint32_t *metadata_size,
- uint32_t *flags);
+ uint32_t *flags, int8_t *xcp_id);
 uint8_t amdgpu_amdkfd_get_xgmi_hops_count(struct amdgpu_device *dst,
  struct amdgpu_device *src);
 int amdgpu_amdkfd_get_xgmi_bandwidth_mbytes(struct amdgpu_device *dst,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 8c86d69938ea..344b238d6771 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1499,6 +1499,7 @@ static int kfd_ioctl_get_dmabuf_info(struct file *filep,
struct amdgpu_device *dmabuf_adev;
void *metadata_buffer = NULL;
uint32_t flags;
+   int8_t xcp_id;
unsigned int i;
int r;
 
@@ -1519,17 +1520,14 @@ static int kfd_ioctl_get_dmabuf_info(struct file *filep,
 	r = amdgpu_amdkfd_get_dmabuf_info(dev->adev, args->dmabuf_fd,
 					  &dmabuf_adev, &args->size,
 					  metadata_buffer, args->metadata_size,
-					  &args->metadata_size, &flags);
+					  &args->metadata_size, &flags, &xcp_id);
if (r)
goto exit;
 
-   /* Reverse-lookup gpu_id from kgd pointer */
-   dev = kfd_device_by_adev(dmabuf_adev);
-   if (!dev) {
-   r = -EINVAL;
-   goto exit;
-   }
-   args->gpu_id = dev->id;
+   if (xcp_id >= 0)
+   args->gpu_id = dmabuf_adev->kfd.dev->nodes[xcp_id]->id;
+   else
+   args->gpu_id = dmabuf_adev->kfd.dev->nodes[0]->id;
args->flags = flags;
 
/* Copy metadata buffer to user mode */
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 214d950f948e..44f4d5509db6 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -1068,7 +1068,6 @@ struct kfd_topology_device 
*kfd_topology_device_by_proximity_domain_no_lock(
 struct kfd_topology_device *kfd_topology_device_by_id(uint32_t gpu_id);
 struct kfd_node *kfd_device_by_id(uint32_t gpu_id);
 struct kfd_node *kfd_device_by_pci_dev(const struct pci_dev *pdev);
-struct kfd_node *kfd_device_by_adev(const struct amdgpu_device *adev);
 static inline bool kfd_irq_is_from_node(struct kfd_node *node, uint32_t 
node_id,
uint32_t vmid)
 {
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c 

[PATCH 21/29] drm/amdkfd: Store xcp partition id to amdgpu bo

2023-05-10 Thread Alex Deucher
From: Philip Yang 

For memory accounting per compute partition, and for exporting a drm
amdgpu bo and then importing it to KFD, we need the xcp id to account
the memory usage and to find the KFD node of the original amdgpu bo, so
that the KFD bo is created on the correct adev KFD node.

Set xcp_id_plus1 of amdgpu_bo_param to create the bo and store xcp_id
in the amdgpu bo. Add a helper macro to get the mem_id from adev and
xcp_id.

v2: squash in fix ("drm/amdgpu: Fix BO creation failure on GFX 9.4.3 dGPU")

Signed-off-by: Philip Yang 
Reviewed-by: Felix Kuehling 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h   |  4 
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 11 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c  |  4 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.h  |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c   | 15 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h   | 12 
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c  |  6 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c|  5 +++--
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c|  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c |  4 ++--
 10 files changed, 42 insertions(+), 23 deletions(-)
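
A note on the encoding: xcp_id_plus1 reserves 0 for "no partition" so
that zero-initialized bo params stay neutral. A minimal sketch of the
convention (illustrative, not literal patch code):

	/* Encode: callers pass xcp_id + 1 (so 0 == unset).
	 * Decode: the bo stores the real id, -1 when unset.
	 */
	bp.xcp_id_plus1 = xcp_id + 1;		/* create side */
	bo->xcp_id = bp.xcp_id_plus1 - 1;	/* store side  */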

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 324cb566ca2f..05c54776951b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -330,6 +330,10 @@ int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device 
*adev,
 void amdgpu_amdkfd_unreserve_mem_limit(struct amdgpu_device *adev,
uint64_t size, u32 alloc_flag);
 
+#define KFD_XCP_MEM_ID(adev, xcp_id) \
+   ((adev)->xcp_mgr && (xcp_id) >= 0 ?\
+   (adev)->xcp_mgr->xcp[(xcp_id)].mem_id : -1)
+
 #define KFD_XCP_MEMORY_SIZE(n) ((n)->adev->gmc.num_mem_partitions ?\
(n)->adev->gmc.mem_partitions[(n)->xcp->mem_id].size /\
(n)->adev->xcp_mgr->num_xcp_per_mem_partition :\
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 71b22d61dd27..cf8f80e4ef56 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1633,6 +1633,7 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
uint64_t *offset, uint32_t flags, bool criu_resume)
 {
struct amdgpu_vm *avm = drm_priv_to_vm(drm_priv);
+   struct amdgpu_fpriv *fpriv = container_of(avm, struct amdgpu_fpriv, vm);
enum ttm_bo_type bo_type = ttm_bo_type_device;
struct sg_table *sg = NULL;
uint64_t user_addr = 0;
@@ -1640,7 +1641,7 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
struct drm_gem_object *gobj = NULL;
u32 domain, alloc_domain;
uint64_t aligned_size;
-   int8_t mem_id = -1;
+   int8_t xcp_id = -1;
u64 alloc_flags;
int ret;
 
@@ -1659,7 +1660,7 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
 			alloc_flags |= (flags & KFD_IOC_ALLOC_MEM_FLAGS_PUBLIC) ?
 				AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED : 0;
}
-   mem_id = avm->mem_id;
+   xcp_id = fpriv->xcp_id == ~0 ? 0 : fpriv->xcp_id;
} else if (flags & KFD_IOC_ALLOC_MEM_FLAGS_GTT) {
domain = alloc_domain = AMDGPU_GEM_DOMAIN_GTT;
alloc_flags = 0;
@@ -1717,12 +1718,12 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
goto err_reserve_limit;
}
 
-   pr_debug("\tcreate BO VA 0x%llx size 0x%llx domain %s mem_id %d\n",
+   pr_debug("\tcreate BO VA 0x%llx size 0x%llx domain %s xcp_id %d\n",
 va, (*mem)->aql_queue ? size << 1 : size,
-domain_string(alloc_domain), mem_id);
+domain_string(alloc_domain), xcp_id);
 
ret = amdgpu_gem_object_create(adev, aligned_size, 1, alloc_domain, 
alloc_flags,
-				       bo_type, NULL, &gobj, mem_id + 1);
+				       bo_type, NULL, &gobj, xcp_id + 1);
if (ret) {
pr_debug("Failed to create BO on domain %s. ret %d\n",
 domain_string(alloc_domain), ret);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 01029b495f5a..b02d106d5a0c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -97,7 +97,7 @@ int amdgpu_gem_object_create(struct amdgpu_device *adev, 
unsigned long size,
 int alignment, u32 initial_domain,
 u64 flags, enum ttm_bo_type type,
 struct dma_resv *resv,
-struct drm_gem_object **obj, int8_t mem_id_plus1)
+struct drm_gem_object **obj, int8_t xcp_id_plus1)
 {
struct amdgpu_bo *bo;

[PATCH 17/29] drm/amdgpu: dGPU mode placement support memory partition

2023-05-10 Thread Alex Deucher
From: Philip Yang 

dGPU mode uses the VRAM manager to validate the bo; amdgpu bo placement
uses the mem_id to get the allocation range (first and last page frame
number) from the xcp manager and passes it to the drm buddy allocator
as the allowed range.

Signed-off-by: Philip Yang 
Reviewed-by: Felix Kuehling 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)
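
The placement decision reduces to picking a pfn window; a standalone
sketch under the same assumptions as the hunk below (0 means "no limit"
for a TTM place; the helper name is ours):

	static void pick_vram_window(struct amdgpu_device *adev, int mem_id,
				     unsigned int *fpfn, unsigned int *lpfn)
	{
		if (adev->gmc.mem_partitions && mem_id >= 0) {
			/* restrict the buddy allocator to this partition */
			*fpfn = adev->gmc.mem_partitions[mem_id].range.fpfn;
			*lpfn = adev->gmc.mem_partitions[mem_id].range.lpfn;
		} else {
			*fpfn = 0;
			*lpfn = 0;
		}
	}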

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 155b62971a33..cfa14b56c419 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -132,13 +132,18 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain)
 	if (domain & AMDGPU_GEM_DOMAIN_VRAM) {
 		unsigned visible_pfn = adev->gmc.visible_vram_size >> PAGE_SHIFT;
 
-		places[c].fpfn = 0;
-		places[c].lpfn = 0;
+		if (adev->gmc.mem_partitions && abo->mem_id >= 0) {
+			places[c].fpfn = adev->gmc.mem_partitions[abo->mem_id].range.fpfn;
+			places[c].lpfn = adev->gmc.mem_partitions[abo->mem_id].range.lpfn;
+		} else {
+			places[c].fpfn = 0;
+			places[c].lpfn = 0;
+		}
 		places[c].mem_type = TTM_PL_VRAM;
 		places[c].flags = 0;
 
 		if (flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED)
-			places[c].lpfn = visible_pfn;
+			places[c].lpfn = min_not_zero(places[c].lpfn, visible_pfn);
 		else if (adev->gmc.real_vram_size != adev->gmc.visible_vram_size)
 			places[c].flags |= TTM_PL_FLAG_TOPDOWN;
 
-- 
2.40.1



[PATCH 26/29] drm/amdkfd: APU mode set max svm range pages

2023-05-10 Thread Alex Deucher
From: Philip Yang 

svm_migrate_init sets the max svm range pages based on the KFD nodes'
partition size. APU mode doesn't init pgmap because there is no
migration.

kgd2kfd_device_init calls svm_migrate_init after the KFD nodes are
allocated and initialized.

Signed-off-by: Philip Yang 
Reviewed-by: Felix Kuehling 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdkfd/kfd_device.c  |  5 ++---
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c |  7 +--
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 15 ++-
 3 files changed, 17 insertions(+), 10 deletions(-)
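
As a hedged worked example of the sizing loop below, take a
hypothetical 16 GiB partition:

	uint64_t pages = (16ULL << 30) >> 17;		/* 131072 = 2^17     */
	pages = clamp(pages, 1ULL << 9, 1ULL << 18);	/* within [2^9,2^18] */
	pages = rounddown_pow_of_two(pages);		/* already pow2      */
	/* 2^17 pages * 4 KiB = 512 MiB max per svm range; the smallest
	 * non-zero result across all KFD nodes wins (min_not_zero).
	 */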

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index d41da964d2f5..882ff86bba08 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -724,9 +724,6 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
 
kfd_cwsr_init(kfd);
 
-   svm_migrate_init(kfd->adev);
-
-
dev_info(kfd_device, "Total number of KFD nodes to be created: %d\n",
kfd->num_nodes);
 
@@ -794,6 +791,8 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
kfd->nodes[i] = node;
}
 
+   svm_migrate_init(kfd->adev);
+
if (kfd_resume_iommu(kfd))
goto kfd_resume_iommu_error;
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index 199d32c7c289..2512bf681112 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -1000,6 +1000,11 @@ int svm_migrate_init(struct amdgpu_device *adev)
if (!KFD_IS_SOC15(kfddev->dev))
return -EINVAL;
 
+   svm_range_set_max_pages(adev);
+
+   if (adev->gmc.is_app_apu)
+   return 0;
+
 	pgmap = &kfddev->pgmap;
memset(pgmap, 0, sizeof(*pgmap));
 
@@ -1042,8 +1047,6 @@ int svm_migrate_init(struct amdgpu_device *adev)
 
amdgpu_amdkfd_reserve_system_mem(SVM_HMM_PAGE_STRUCT_SIZE(size));
 
-   svm_range_set_max_pages(adev);
-
pr_info("HMM registered %ldMB device memory\n", size >> 20);
 
return 0;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 2dbbdad3f392..41dacc015983 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1937,14 +1937,19 @@ void svm_range_set_max_pages(struct amdgpu_device *adev)
 {
uint64_t max_pages;
uint64_t pages, _pages;
+   uint64_t min_pages = 0;
+   int i;
+
+	for (i = 0; i < adev->kfd.dev->num_nodes; i++) {
+		pages = KFD_XCP_MEMORY_SIZE(adev, adev->kfd.dev->nodes[i]->xcp->id) >> 17;
+		pages = clamp(pages, 1ULL << 9, 1ULL << 18);
+		pages = rounddown_pow_of_two(pages);
+		min_pages = min_not_zero(min_pages, pages);
+	}
 
-   /* 1/32 VRAM size in pages */
-   pages = adev->gmc.real_vram_size >> 17;
-   pages = clamp(pages, 1ULL << 9, 1ULL << 18);
-   pages = rounddown_pow_of_two(pages);
do {
max_pages = READ_ONCE(max_svm_range_pages);
-   _pages = min_not_zero(max_pages, pages);
+   _pages = min_not_zero(max_pages, min_pages);
 	} while (cmpxchg(&max_svm_range_pages, max_pages, _pages) != max_pages);
 }
 
-- 
2.40.1



[PATCH 28/29] drm/amdkfd: Refactor migrate init to support partition switch

2023-05-10 Thread Alex Deucher
From: Philip Yang 

Rename svm_migrate_init to the better name kgd2kfd_init_zone_device
because it sets up the zone device pgmap for page migration, and keep
it in kfd_migrate.c so it can access the static svm_migrate_pgmap_ops.
Call it only once, in amdgpu_device_ip_init after the adev ip blocks
are initialized but before amdgpu_amdkfd_device_init initializes the
kfd nodes, which enable SVM support based on pgmap.

svm_range_set_max_pages is called by kgd2kfd_device_init every time
after switching the compute partition mode.

Signed-off-by: Philip Yang 
Reviewed-by: Felix Kuehling 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 11 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  4 +++-
 drivers/gpu/drm/amd/amdkfd/kfd_device.c|  3 ++-
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c   |  8 +++-
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.h   |  9 -
 drivers/gpu/drm/amd/amdkfd/kfd_svm.h   |  4 
 6 files changed, 23 insertions(+), 16 deletions(-)
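
A simplified sketch of the call ordering this refactor establishes
(error handling elided; an illustration, not the literal code):

	/* amdgpu_device_ip_init(): once per device */
	kgd2kfd_init_zone_device(adev);		/* pgmap before KFD nodes    */
	amdgpu_amdkfd_device_init(adev);	/* -> kgd2kfd_device_init()  */

	/* kgd2kfd_device_init(): re-run on every partition mode switch */
	svm_range_set_max_pages(kfd->adev);	/* re-derive per-node limit  */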

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 948d362adabb..48d12dbff968 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -372,6 +372,17 @@ void amdgpu_amdkfd_release_notify(struct amdgpu_bo *bo)
 {
 }
 #endif
+
+#if IS_ENABLED(CONFIG_HSA_AMD_SVM)
+int kgd2kfd_init_zone_device(struct amdgpu_device *adev);
+#else
+static inline
+int kgd2kfd_init_zone_device(struct amdgpu_device *adev)
+{
+   return 0;
+}
+#endif
+
 /* KGD2KFD callbacks */
 int kgd2kfd_quiesce_mm(struct mm_struct *mm, uint32_t trigger);
 int kgd2kfd_resume_mm(struct mm_struct *mm);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 321b689db601..9c1a8ace6c31 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2632,8 +2632,10 @@ static int amdgpu_device_ip_init(struct amdgpu_device 
*adev)
goto init_failed;
 
/* Don't init kfd if whole hive need to be reset during init */
-   if (!adev->gmc.xgmi.pending_reset)
+   if (!adev->gmc.xgmi.pending_reset) {
+   kgd2kfd_init_zone_device(adev);
amdgpu_amdkfd_device_init(adev);
+   }
 
amdgpu_fru_get_product_info(adev);
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 882ff86bba08..bf32e547182c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -32,6 +32,7 @@
 #include "kfd_iommu.h"
 #include "amdgpu_amdkfd.h"
 #include "kfd_smi_events.h"
+#include "kfd_svm.h"
 #include "kfd_migrate.h"
 #include "amdgpu.h"
 #include "amdgpu_xcp.h"
@@ -791,7 +792,7 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
kfd->nodes[i] = node;
}
 
-   svm_migrate_init(kfd->adev);
+   svm_range_set_max_pages(kfd->adev);
 
if (kfd_resume_iommu(kfd))
goto kfd_resume_iommu_error;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index 2512bf681112..35cf6558cf1b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -988,7 +988,7 @@ static const struct dev_pagemap_ops svm_migrate_pgmap_ops = 
{
 /* Each VRAM page uses sizeof(struct page) on system memory */
 #define SVM_HMM_PAGE_STRUCT_SIZE(size) ((size)/PAGE_SIZE * sizeof(struct page))
 
-int svm_migrate_init(struct amdgpu_device *adev)
+int kgd2kfd_init_zone_device(struct amdgpu_device *adev)
 {
 	struct amdgpu_kfd_dev *kfddev = &adev->kfd;
struct dev_pagemap *pgmap;
@@ -996,12 +996,10 @@ int svm_migrate_init(struct amdgpu_device *adev)
unsigned long size;
void *r;
 
-   /* Page migration works on Vega10 or newer */
-   if (!KFD_IS_SOC15(kfddev->dev))
+   /* Page migration works on gfx9 or newer */
+   if (adev->ip_versions[GC_HWIP][0] < IP_VERSION(9, 0, 1))
return -EINVAL;
 
-   svm_range_set_max_pages(adev);
-
if (adev->gmc.is_app_apu)
return 0;
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.h
index a5d7e6d22264..487f26368164 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.h
@@ -47,15 +47,6 @@ int svm_migrate_vram_to_ram(struct svm_range *prange, struct 
mm_struct *mm,
 unsigned long
 svm_migrate_addr_to_pfn(struct amdgpu_device *adev, unsigned long addr);
 
-int svm_migrate_init(struct amdgpu_device *adev);
-
-#else
-
-static inline int svm_migrate_init(struct amdgpu_device *adev)
-{
-   return 0;
-}
-
 #endif /* IS_ENABLED(CONFIG_HSA_AMD_SVM) */
 
 #endif /* KFD_MIGRATE_H_ */
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.h
index 021def496f5a..762679835e31 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.h
+++ 

[PATCH 16/29] drm/amdkfd: SVM range allocation support memory partition

2023-05-10 Thread Alex Deucher
From: Philip Yang 

Pass kfd node->xcp->mem_id to amdgpu bo create parameter mem_id_plus1 to
allocate new svm_bo on the specified memory partition.

This is only for dGPU mode as we don't migrate with APU mode.

Signed-off-by: Philip Yang 
Reviewed-by: Felix Kuehling 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index c5675c7e3b9e..f6a886d9e902 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -554,16 +554,20 @@ svm_range_vram_node_new(struct kfd_node *node, struct 
svm_range *prange,
bp.flags |= AMDGPU_GEM_CREATE_DISCARDABLE;
bp.type = ttm_bo_type_device;
bp.resv = NULL;
+   if (node->xcp)
+   bp.mem_id_plus1 = node->xcp->mem_id + 1;
 
-   /* TODO: Allocate memory from the right memory partition. We can sort
-* out the details later, once basic memory partitioning is working
-*/
 	r = amdgpu_bo_create_user(node->adev, &bp, &ubo);
if (r) {
pr_debug("failed %d to create bo\n", r);
goto create_bo_failed;
}
bo = >bo;
+
+   pr_debug("alloc bo at offset 0x%lx size 0x%lx on partition %d\n",
+bo->tbo.resource->start << PAGE_SHIFT, bp.size,
+bp.mem_id_plus1 - 1);
+
r = amdgpu_bo_reserve(bo, true);
if (r) {
pr_debug("failed %d to reserve bo\n", r);
-- 
2.40.1



[PATCH 14/29] drm/amdgpu: Add memory partition mem_id to amdgpu_bo

2023-05-10 Thread Alex Deucher
From: Philip Yang 

Add a mem_id_plus1 parameter to amdgpu_gem_object_create and pass it to
amdgpu_bo_create. For dGPU mode allocation, mem_id is used by the VRAM
manager to get the memory partition fpfn, lpfn from the xcp manager.
For APU native mode allocation, mem_id is used to get the NUMA node id
from the xcp manager, which is then passed to TTM as the numa pool id
to allocate memory from that specific NUMA node. mem_id -1 means the
entire VRAM or any NUMA node.

Signed-off-by: Philip Yang 
Reviewed-by: Felix Kuehling 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 4 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c  | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c  | 9 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.h  | 3 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c   | 3 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h   | 5 +
 6 files changed, 17 insertions(+), 9 deletions(-)
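
A usage sketch of the new parameter, showing both call styles (values
illustrative):

	/* 0 = no partition preference */
	r = amdgpu_gem_object_create(adev, size, 1, AMDGPU_GEM_DOMAIN_VRAM,
				     flags, ttm_bo_type_device, NULL, &gobj, 0);

	/* mem_id + 1 = pin the allocation to memory partition mem_id */
	r = amdgpu_gem_object_create(adev, size, 1, AMDGPU_GEM_DOMAIN_VRAM,
				     flags, ttm_bo_type_device, NULL, &gobj,
				     mem_id + 1);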

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 12149b317b88..6d0c25e34af1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -289,7 +289,7 @@ create_dmamap_sg_bo(struct amdgpu_device *adev,
 
 	ret = amdgpu_gem_object_create(adev, mem->bo->tbo.base.size, 1,
 			AMDGPU_GEM_DOMAIN_CPU, AMDGPU_GEM_CREATE_PREEMPTIBLE | flags,
-			ttm_bo_type_sg, mem->bo->tbo.base.resv, &gem_obj);
+			ttm_bo_type_sg, mem->bo->tbo.base.resv, &gem_obj, 0);
 
amdgpu_bo_unreserve(mem->bo);
 
@@ -1720,7 +1720,7 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
 		va, (*mem)->aql_queue ? size << 1 : size, domain_string(alloc_domain));
 
 	ret = amdgpu_gem_object_create(adev, aligned_size, 1, alloc_domain, alloc_flags,
-				       bo_type, NULL, &gobj);
+				       bo_type, NULL, &gobj, 0);
if (ret) {
pr_debug("Failed to create BO on domain %s. ret %d\n",
 domain_string(alloc_domain), ret);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
index e97b1eef2c9d..8b162f05d1fd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
@@ -335,7 +335,7 @@ amdgpu_dma_buf_create_obj(struct drm_device *dev, struct 
dma_buf *dma_buf)
 
ret = amdgpu_gem_object_create(adev, dma_buf->size, PAGE_SIZE,
   AMDGPU_GEM_DOMAIN_CPU, flags,
-				       ttm_bo_type_sg, resv, &gobj);
+				       ttm_bo_type_sg, resv, &gobj, 0);
if (ret)
goto error;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 6936cd63df42..01029b495f5a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -97,7 +97,7 @@ int amdgpu_gem_object_create(struct amdgpu_device *adev, 
unsigned long size,
 int alignment, u32 initial_domain,
 u64 flags, enum ttm_bo_type type,
 struct dma_resv *resv,
-struct drm_gem_object **obj)
+struct drm_gem_object **obj, int8_t mem_id_plus1)
 {
struct amdgpu_bo *bo;
struct amdgpu_bo_user *ubo;
@@ -115,6 +115,7 @@ int amdgpu_gem_object_create(struct amdgpu_device *adev, 
unsigned long size,
bp.flags = flags;
bp.domain = initial_domain;
bp.bo_ptr_size = sizeof(struct amdgpu_bo);
+   bp.mem_id_plus1 = mem_id_plus1;
 
 	r = amdgpu_bo_create_user(adev, &bp, &ubo);
if (r)
@@ -335,7 +336,7 @@ int amdgpu_gem_create_ioctl(struct drm_device *dev, void 
*data,
 retry:
r = amdgpu_gem_object_create(adev, size, args->in.alignment,
 initial_domain,
-				     flags, ttm_bo_type_device, resv, &gobj);
+				     flags, ttm_bo_type_device, resv, &gobj, 0);
if (r && r != -ERESTARTSYS) {
if (flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) {
flags &= ~AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED;
@@ -404,7 +405,7 @@ int amdgpu_gem_userptr_ioctl(struct drm_device *dev, void 
*data,
 
/* create a gem object to contain this object in */
r = amdgpu_gem_object_create(adev, args->size, 0, AMDGPU_GEM_DOMAIN_CPU,
-				     0, ttm_bo_type_device, NULL, &gobj);
+				     0, ttm_bo_type_device, NULL, &gobj, 0);
if (r)
return r;
 
@@ -930,7 +931,7 @@ int amdgpu_mode_dumb_create(struct drm_file *file_priv,
domain = amdgpu_bo_get_preferred_domain(adev,
amdgpu_display_supported_domains(adev, flags));
r = 

[PATCH 20/29] drm/amdgpu: dGPU mode set VRAM range lpfn as exclusive

2023-05-10 Thread Alex Deucher
From: Philip Yang 

TTM place lpfn is exclusive, used as end (start + size) by the drm
buddy allocator, while the adev->gmc memory partition range lpfn is
inclusive (start + size - 1); add 1 when setting the TTM place lpfn.

Signed-off-by: Philip Yang 
Reviewed-by: Felix Kuehling 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)
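
A hedged numeric example of the off-by-one: a 1 GiB partition starting
at pfn 0 covers 2^18 pages, so:

	/* inclusive gmc range:  fpfn = 0x00000, lpfn = 0x3ffff */
	places[c].fpfn = range.fpfn;		/* 0x00000                 */
	places[c].lpfn = range.lpfn + 1;	/* 0x40000 = start + size,
						 * exclusive, as TTM and the
						 * buddy allocator expect */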

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index cfa14b56c419..3002d431ce3d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -134,7 +134,11 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain)
 
 		if (adev->gmc.mem_partitions && abo->mem_id >= 0) {
 			places[c].fpfn = adev->gmc.mem_partitions[abo->mem_id].range.fpfn;
-			places[c].lpfn = adev->gmc.mem_partitions[abo->mem_id].range.lpfn;
+			/*
+			 * memory partition range lpfn is inclusive: start + size - 1
+			 * TTM place lpfn is exclusive: start + size
+			 */
+			places[c].lpfn = adev->gmc.mem_partitions[abo->mem_id].range.lpfn + 1;
 		} else {
 			places[c].fpfn = 0;
 			places[c].lpfn = 0;
-- 
2.40.1



[PATCH 07/29] drm/amdgpu: add partition schedule for GC(9, 4, 3)

2023-05-10 Thread Alex Deucher
From: James Zhu 

Implement partition scheduling for GC(9, 4, 3).

Signed-off-by: James Zhu 
Acked-by: Lijo Lazar 
Signed-off-by: Alex Deucher 
---
 .../drm/amd/amdgpu/aqua_vanjaram_reg_init.c   | 41 +++
 1 file changed, 41 insertions(+)
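
The first-open partition choice is a least-loaded pick over the per-xcp
refcounts; a standalone sketch of just that loop (the helper name is
ours, not from the patch):

	static u32 pick_least_used_xcp(struct amdgpu_xcp_mgr *mgr)
	{
		u32 best = 0, least = ~0u;
		int i;

		for (i = 0; i < mgr->num_xcps; i++) {
			u32 cnt = atomic_read(&mgr->xcp[i].ref_cnt);

			if (cnt < least) {	/* fewest live entities wins */
				least = cnt;
				best = i;
			}
		}
		return best;
	}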

diff --git a/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram_reg_init.c 
b/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram_reg_init.c
index 073ae95e6dd6..4ca932a62ce6 100644
--- a/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram_reg_init.c
+++ b/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram_reg_init.c
@@ -166,6 +166,46 @@ static int 
aqua_vanjaram_update_partition_sched_list(struct amdgpu_device *adev)
return aqua_vanjaram_xcp_sched_list_update(adev);
 }
 
+int aqua_vanjaram_select_scheds(
+   struct amdgpu_device *adev,
+   u32 hw_ip,
+   u32 hw_prio,
+   struct amdgpu_fpriv *fpriv,
+   unsigned int *num_scheds,
+   struct drm_gpu_scheduler ***scheds)
+{
+   u32 sel_xcp_id;
+   int i;
+
+   if (fpriv->xcp_id == ~0) {
+   u32 least_ref_cnt = ~0;
+
+   fpriv->xcp_id = 0;
+   for (i = 0; i < adev->xcp_mgr->num_xcps; i++) {
+   u32 total_ref_cnt;
+
+			total_ref_cnt = atomic_read(&adev->xcp_mgr->xcp[i].ref_cnt);
+   if (total_ref_cnt < least_ref_cnt) {
+   fpriv->xcp_id = i;
+   least_ref_cnt = total_ref_cnt;
+   }
+   }
+   }
+   sel_xcp_id = fpriv->xcp_id;
+
+	if (adev->xcp_mgr->xcp[sel_xcp_id].gpu_sched[hw_ip][hw_prio].num_scheds) {
+		*num_scheds = adev->xcp_mgr->xcp[fpriv->xcp_id].gpu_sched[hw_ip][hw_prio].num_scheds;
+		*scheds = adev->xcp_mgr->xcp[fpriv->xcp_id].gpu_sched[hw_ip][hw_prio].sched;
+		atomic_inc(&adev->xcp_mgr->xcp[sel_xcp_id].ref_cnt);
+		DRM_DEBUG("Selected partition #%d", sel_xcp_id);
+   } else {
+   DRM_ERROR("Failed to schedule partition #%d.", sel_xcp_id);
+   return -ENOENT;
+   }
+
+   return 0;
+}
+
 static int8_t aqua_vanjaram_logical_to_dev_inst(struct amdgpu_device *adev,
 enum amd_hw_ip_block_type block,
 int8_t inst)
@@ -548,6 +588,7 @@ struct amdgpu_xcp_mgr_funcs aqua_vanjaram_xcp_funcs = {
 	.query_partition_mode = &aqua_vanjaram_query_partition_mode,
 	.get_ip_details = &aqua_vanjaram_get_xcp_ip_details,
 	.get_xcp_mem_id = &aqua_vanjaram_get_xcp_mem_id,
+	.select_scheds = &aqua_vanjaram_select_scheds,
 	.update_partition_sched_list = &aqua_vanjaram_update_partition_sched_list
 };
 
-- 
2.40.1



[PATCH 05/29] drm/amdgpu: add partition scheduler list update

2023-05-10 Thread Alex Deucher
From: James Zhu 

Update the partition scheduler list during late init and on xcp
partition mode switches.

Signed-off-by: James Zhu 
Acked-by: Lijo Lazar 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c|  2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c   |  2 +
 .../drm/amd/amdgpu/aqua_vanjaram_reg_init.c   | 67 ++-
 3 files changed, 70 insertions(+), 1 deletion(-)
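
At its core, the rebuild registers each ready ring's scheduler under
its partition, keyed by ring type and priority; a minimal sketch (the
helper name is ours):

	static void xcp_sched_add(struct amdgpu_xcp *xcp, struct amdgpu_ring *ring)
	{
		struct amdgpu_sched *s =
			&xcp->gpu_sched[ring->funcs->type][ring->hw_prio];

		/* append this ring's drm scheduler to the partition's list */
		s->sched[s->num_scheds++] = &ring->sched;
	}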

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 40c5845c78df..321b689db601 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2473,6 +2473,8 @@ static int amdgpu_device_init_schedulers(struct 
amdgpu_device *adev)
}
}
 
+   amdgpu_xcp_update_partition_sched_list(adev);
+
return 0;
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c
index 9b627a8b1d5c..78fce5aab218 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c
@@ -118,6 +118,7 @@ static void __amdgpu_xcp_add_block(struct amdgpu_xcp_mgr 
*xcp_mgr, int xcp_id,
 
 int amdgpu_xcp_init(struct amdgpu_xcp_mgr *xcp_mgr, int num_xcps, int mode)
 {
+   struct amdgpu_device *adev = xcp_mgr->adev;
struct amdgpu_xcp_ip ip;
uint8_t mem_id;
int i, j, ret;
@@ -153,6 +154,7 @@ int amdgpu_xcp_init(struct amdgpu_xcp_mgr *xcp_mgr, int 
num_xcps, int mode)
}
 
xcp_mgr->num_xcps = num_xcps;
+   amdgpu_xcp_update_partition_sched_list(adev);
 
return 0;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram_reg_init.c 
b/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram_reg_init.c
index c90ea34ef9ec..073ae95e6dd6 100644
--- a/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram_reg_init.c
+++ b/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram_reg_init.c
@@ -102,6 +102,70 @@ static void aqua_vanjaram_set_xcp_id(struct amdgpu_device 
*adev,
}
 }
 
+static void aqua_vanjaram_xcp_gpu_sched_update(
+		struct amdgpu_device *adev,
+		struct amdgpu_ring *ring,
+		unsigned int sel_xcp_id)
+{
+	unsigned int *num_gpu_sched;
+
+	num_gpu_sched = &adev->xcp_mgr->xcp[sel_xcp_id]
+			.gpu_sched[ring->funcs->type][ring->hw_prio].num_scheds;
+	adev->xcp_mgr->xcp[sel_xcp_id].gpu_sched[ring->funcs->type][ring->hw_prio]
+			.sched[(*num_gpu_sched)++] = &ring->sched;
+	DRM_DEBUG("%s :[%d] gpu_sched[%d][%d] = %d", ring->name,
+			sel_xcp_id, ring->funcs->type,
+			ring->hw_prio, *num_gpu_sched);
+}
+
+static int aqua_vanjaram_xcp_sched_list_update(
+		struct amdgpu_device *adev)
+{
+	struct amdgpu_ring *ring;
+	int i;
+
+	for (i = 0; i < MAX_XCP; i++) {
+		atomic_set(&adev->xcp_mgr->xcp[i].ref_cnt, 0);
+		memset(adev->xcp_mgr->xcp[i].gpu_sched, 0, sizeof(adev->xcp_mgr->xcp->gpu_sched));
+	}
+
+   if (adev->xcp_mgr->mode == AMDGPU_XCP_MODE_NONE)
+   return 0;
+
+   for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
+   ring = adev->rings[i];
+   if (!ring || !ring->sched.ready)
+   continue;
+
+   aqua_vanjaram_xcp_gpu_sched_update(adev, ring, ring->xcp_id);
+
+   /* VCN is shared by two partitions under CPX MODE */
+   if ((ring->funcs->type == AMDGPU_RING_TYPE_VCN_ENC ||
+   ring->funcs->type == AMDGPU_RING_TYPE_VCN_JPEG) &&
+   adev->xcp_mgr->mode == AMDGPU_CPX_PARTITION_MODE)
+			aqua_vanjaram_xcp_gpu_sched_update(adev, ring, ring->xcp_id + 1);
+   }
+
+   return 0;
+}
+
+static int aqua_vanjaram_update_partition_sched_list(struct amdgpu_device 
*adev)
+{
+   int i;
+
+   for (i = 0; i < adev->num_rings; i++) {
+   struct amdgpu_ring *ring = adev->rings[i];
+
+   if (ring->funcs->type == AMDGPU_RING_TYPE_COMPUTE ||
+   ring->funcs->type == AMDGPU_RING_TYPE_KIQ)
+   aqua_vanjaram_set_xcp_id(adev, ring->xcc_id, ring);
+   else
+   aqua_vanjaram_set_xcp_id(adev, ring->me, ring);
+   }
+
+   return aqua_vanjaram_xcp_sched_list_update(adev);
+}
+
 static int8_t aqua_vanjaram_logical_to_dev_inst(struct amdgpu_device *adev,
 enum amd_hw_ip_block_type block,
 int8_t inst)
@@ -483,7 +547,8 @@ struct amdgpu_xcp_mgr_funcs aqua_vanjaram_xcp_funcs = {
 	.switch_partition_mode = &aqua_vanjaram_switch_partition_mode,
 	.query_partition_mode = &aqua_vanjaram_query_partition_mode,
 	.get_ip_details = &aqua_vanjaram_get_xcp_ip_details,
-	.get_xcp_mem_id = &aqua_vanjaram_get_xcp_mem_id
+	.get_xcp_mem_id = &aqua_vanjaram_get_xcp_mem_id,
+	.update_partition_sched_list = 

[PATCH 08/29] drm/amdgpu: run partition schedule if it is supported

2023-05-10 Thread Alex Deucher
From: James Zhu 

Use the partition scheduler, if supported, during ctx entity init.

Signed-off-by: James Zhu 
Acked-by: Lijo Lazar 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index 06d68a08251a..e579bb054a58 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -222,8 +222,19 @@ static int amdgpu_ctx_init_entity(struct amdgpu_ctx *ctx, 
u32 hw_ip,
drm_prio = amdgpu_ctx_to_drm_sched_prio(ctx_prio);
 
hw_ip = array_index_nospec(hw_ip, AMDGPU_HW_IP_NUM);
-   scheds = adev->gpu_sched[hw_ip][hw_prio].sched;
-   num_scheds = adev->gpu_sched[hw_ip][hw_prio].num_scheds;
+
+	if (!(adev)->xcp_mgr) {
+		scheds = adev->gpu_sched[hw_ip][hw_prio].sched;
+		num_scheds = adev->gpu_sched[hw_ip][hw_prio].num_scheds;
+	} else {
+		struct amdgpu_fpriv *fpriv;
+
+		fpriv = container_of(ctx->ctx_mgr, struct amdgpu_fpriv, ctx_mgr);
+		r = amdgpu_xcp_select_scheds(adev, hw_ip, hw_prio, fpriv,
+					     &num_scheds, &scheds);
+		if (r)
+			goto cleanup_entity;
+	}
 
 	/* disable load balance if the hw engine retains context among dependent jobs */
if (hw_ip == AMDGPU_HW_IP_VCN_ENC ||
-- 
2.40.1



[PATCH 09/29] drm/amdgpu: update ref_cnt before ctx free

2023-05-10 Thread Alex Deucher
From: James Zhu 

Update ref_cnt before ctx free.

Signed-off-by: James Zhu 
Acked-by: Lijo Lazar 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c |  7 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c | 16 
 drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.h |  2 ++
 3 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index e579bb054a58..3ccd709ae76a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -266,7 +266,8 @@ static int amdgpu_ctx_init_entity(struct amdgpu_ctx *ctx, 
u32 hw_ip,
return r;
 }
 
-static ktime_t amdgpu_ctx_fini_entity(struct amdgpu_ctx_entity *entity)
+static ktime_t amdgpu_ctx_fini_entity(struct amdgpu_device *adev,
+ struct amdgpu_ctx_entity *entity)
 {
ktime_t res = ns_to_ktime(0);
int i;
@@ -279,6 +280,8 @@ static ktime_t amdgpu_ctx_fini_entity(struct 
amdgpu_ctx_entity *entity)
dma_fence_put(entity->fences[i]);
}
 
+   amdgpu_xcp_release_sched(adev, entity);
+
kfree(entity);
return res;
 }
@@ -412,7 +415,7 @@ static void amdgpu_ctx_fini(struct kref *ref)
for (j = 0; j < AMDGPU_MAX_ENTITY_NUM; ++j) {
ktime_t spend;
 
-			spend = amdgpu_ctx_fini_entity(ctx->entities[i][j]);
+			spend = amdgpu_ctx_fini_entity(adev, ctx->entities[i][j]);
atomic64_add(ktime_to_ns(spend), >time_spend[i]);
}
}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c
index 78fce5aab218..9b960ba0b7ac 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c
@@ -366,3 +366,19 @@ int amdgpu_xcp_open_device(struct amdgpu_device *adev,
return 0;
 }
 
+void amdgpu_xcp_release_sched(struct amdgpu_device *adev,
+ struct amdgpu_ctx_entity *entity)
+{
+   struct drm_gpu_scheduler *sched;
+   struct amdgpu_ring *ring;
+
+   if (!adev->xcp_mgr)
+   return;
+
+   sched = entity->entity.rq->sched;
+   if (sched->ready) {
+   ring = to_amdgpu_ring(entity->entity.rq->sched);
+		atomic_dec(&adev->xcp_mgr->xcp[ring->xcp_id].ref_cnt);
+   }
+}
+
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.h
index cca06d38b03d..39aca87ce204 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.h
@@ -128,6 +128,8 @@ void amdgpu_xcp_dev_unplug(struct amdgpu_device *adev);
 int amdgpu_xcp_open_device(struct amdgpu_device *adev,
   struct amdgpu_fpriv *fpriv,
   struct drm_file *file_priv);
+void amdgpu_xcp_release_sched(struct amdgpu_device *adev,
+ struct amdgpu_ctx_entity *entity);
 
 #define amdgpu_xcp_select_scheds(adev, e, c, d, x, y) \
((adev)->xcp_mgr && (adev)->xcp_mgr->funcs && \
-- 
2.40.1



[PATCH 06/29] drm/amdgpu: keep amdgpu_ctx_mgr in ctx structure

2023-05-10 Thread Alex Deucher
From: James Zhu 

Keep a pointer to the amdgpu_ctx_mgr in the ctx structure so the owning
fpriv can be tracked.

Signed-off-by: James Zhu 
Acked-by: Lijo Lazar 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h | 1 +
 2 files changed, 2 insertions(+)
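
Storing the manager pointer is what lets later patches recover the
owning fpriv from any ctx; the recovery is a single container_of:

	/* fpriv embeds its ctx_mgr, so the pointer walks back to it */
	struct amdgpu_fpriv *fpriv =
		container_of(ctx->ctx_mgr, struct amdgpu_fpriv, ctx_mgr);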

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index e3d047663d61..06d68a08251a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -332,6 +332,7 @@ static int amdgpu_ctx_init(struct amdgpu_ctx_mgr *mgr, 
int32_t priority,
else
ctx->stable_pstate = current_stable_pstate;
 
+   ctx->ctx_mgr = &(fpriv->ctx_mgr);
return 0;
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
index 5fd79f94e2d0..85376baaa92f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
@@ -57,6 +57,7 @@ struct amdgpu_ctx {
unsigned long   ras_counter_ce;
unsigned long   ras_counter_ue;
uint32_tstable_pstate;
+   struct amdgpu_ctx_mgr   *ctx_mgr;
 };
 
 struct amdgpu_ctx_mgr {
-- 
2.40.1



[PATCH 11/29] drm/amdkfd: Store drm node minor number for kfd nodes

2023-05-10 Thread Alex Deucher
From: Philip Yang 

From the KFD topology, the application will find the kfd node with the
corresponding drm device node minor number; for example, if the
partition drm nodes start from /dev/dri/renderD129, then KFD node 0
will store drm node minor number 129. The application will open drm
node /dev/dri/renderD129 to create the amdgpu vm for kfd node 0 with
the correct vm->mem_id to indicate the memory partition.

Signed-off-by: Philip Yang 
Reviewed-by: Felix Kuehling 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 6d6243b978e1..a8e25aecf839 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -1942,8 +1942,12 @@ int kfd_topology_add_device(struct kfd_node *gpu)
amdgpu_amdkfd_get_max_engine_clock_in_mhz(dev->gpu->adev);
dev->node_props.max_engine_clk_ccompute =
cpufreq_quick_get_max(0) / 1000;
-   dev->node_props.drm_render_minor =
-   gpu->kfd->shared_resources.drm_render_minor;
+
+	if (gpu->xcp)
+		dev->node_props.drm_render_minor = gpu->xcp->ddev->render->index;
+	else
+		dev->node_props.drm_render_minor =
+				gpu->kfd->shared_resources.drm_render_minor;
 
dev->node_props.hive_id = gpu->kfd->hive_id;
dev->node_props.num_sdma_engines = kfd_get_num_sdma_engines(gpu);
-- 
2.40.1



[PATCH 13/29] drm/amdkfd: Show KFD node memory partition info

2023-05-10 Thread Alex Deucher
From: Philip Yang 

Show the KFD node memory partition id and size, and add the helper
macro KFD_XCP_MEMORY_SIZE to get a kfd node's memory size; it will be
used later to support memory accounting per partition.

Signed-off-by: Philip Yang 
Reviewed-by: Felix Kuehling 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 5 +
 drivers/gpu/drm/amd/amdkfd/kfd_device.c| 7 ++-
 2 files changed, 11 insertions(+), 1 deletion(-)
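
A hedged worked example of the macro: with, say, 64 GiB of VRAM split
into 4 memory partitions and 2 xcps per partition, each KFD node
reports 16 GiB / 2 = 8 GiB; without partitions it falls back to the
full VRAM size. Restated outside the macro:

	uint64_t node_mem = adev->gmc.num_mem_partitions ?
		adev->gmc.mem_partitions[mem_id].size /
			adev->xcp_mgr->num_xcp_per_mem_partition :
		adev->gmc.real_vram_size;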

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index e4e1dbba060a..324cb566ca2f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -330,6 +330,11 @@ int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device 
*adev,
 void amdgpu_amdkfd_unreserve_mem_limit(struct amdgpu_device *adev,
uint64_t size, u32 alloc_flag);
 
+#define KFD_XCP_MEMORY_SIZE(n) ((n)->adev->gmc.num_mem_partitions ?\
+   (n)->adev->gmc.mem_partitions[(n)->xcp->mem_id].size /\
+   (n)->adev->xcp_mgr->num_xcp_per_mem_partition :\
+   (n)->adev->gmc.real_vram_size)
+
 #if IS_ENABLED(CONFIG_HSA_AMD)
 void amdgpu_amdkfd_gpuvm_init_mem_limits(void);
 void amdgpu_amdkfd_gpuvm_destroy_cb(struct amdgpu_device *adev,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index b5497d2ee984..db5b53fcdf11 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -724,7 +724,6 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
 
kfd_cwsr_init(kfd);
 
-   /* TODO: Needs to be updated for memory partitioning */
svm_migrate_init(kfd->adev);
 
 	amdgpu_amdkfd_get_local_mem_info(kfd->adev, &kfd->local_mem_info);
@@ -754,6 +753,12 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
(1U << NUM_XCC(kfd->adev->gfx.xcc_mask)) - 1;
}
 
+	if (node->xcp) {
+		dev_info(kfd_device, "KFD node %d partition %d size %lldM\n",
+			 node->node_id, node->xcp->mem_id,
+			 KFD_XCP_MEMORY_SIZE(node) >> 20);
+	}
+
if (KFD_GC_VERSION(kfd) == IP_VERSION(9, 4, 3) &&
partition_mode == AMDGPU_CPX_PARTITION_MODE &&
kfd->num_nodes != 1) {
-- 
2.40.1



[PATCH 12/29] drm/amdgpu: Add memory partition id to amdgpu_vm

2023-05-10 Thread Alex Deucher
From: Philip Yang 

If xcp_mgr is initialized, add mem_id to the amdgpu_vm structure to
store the memory partition number when creating the amdgpu_vm for the
xcp. The xcp number is decided when opening the render device; for
example, /dev/dri/renderD129 is xcp_id 0 and /dev/dri/renderD130 is
xcp_id 1.

Signed-off-by: Philip Yang 
Reviewed-by: Felix Kuehling 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 8 
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h  | 3 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c | 3 +++
 3 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 879718598fa4..815098be4c2f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -1223,10 +1223,6 @@ int amdgpu_driver_open_kms(struct drm_device *dev, 
struct drm_file *file_priv)
goto out_suspend;
}
 
-   r = amdgpu_xcp_open_device(adev, fpriv, file_priv);
-   if (r)
-   return r;
-
pasid = amdgpu_pasid_alloc(16);
if (pasid < 0) {
dev_warn(adev->dev, "No more PASIDs available!");
@@ -1237,6 +1233,10 @@ int amdgpu_driver_open_kms(struct drm_device *dev, 
struct drm_file *file_priv)
if (r)
goto error_pasid;
 
+   r = amdgpu_xcp_open_device(adev, fpriv, file_priv);
+   if (r)
+   goto error_vm;
+
 	r = amdgpu_vm_set_pasid(adev, &fpriv->vm, pasid);
if (r)
goto error_vm;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 2fdec4114627..d551fca1780e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -332,6 +332,9 @@ struct amdgpu_vm {
struct ttm_lru_bulk_move lru_bulk_move;
/* Flag to indicate if VM is used for compute */
boolis_compute_context;
+
+   /* Memory partition number, -1 means any partition */
+   int8_t  mem_id;
 };
 
 struct amdgpu_vm_manager {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c
index f2981d21d4e0..610c32c4f5af 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c
@@ -364,6 +364,9 @@ int amdgpu_xcp_open_device(struct amdgpu_device *adev,
break;
}
}
+
+   fpriv->vm.mem_id = fpriv->xcp_id == ~0 ? -1 :
+   adev->xcp_mgr->xcp[fpriv->xcp_id].mem_id;
return 0;
 }
 
-- 
2.40.1



[PATCH 10/29] drm/amdgpu: Add xcp manager num_xcp_per_mem_partition

2023-05-10 Thread Alex Deucher
From: Philip Yang 

Used by KFD to check memory limit accounting.

Signed-off-by: Philip Yang 
Reviewed-by: Felix Kuehling 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c | 1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.h | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c
index 9b960ba0b7ac..f2981d21d4e0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c
@@ -156,6 +156,7 @@ int amdgpu_xcp_init(struct amdgpu_xcp_mgr *xcp_mgr, int 
num_xcps, int mode)
xcp_mgr->num_xcps = num_xcps;
amdgpu_xcp_update_partition_sched_list(adev);
 
+	xcp_mgr->num_xcp_per_mem_partition = num_xcps / xcp_mgr->adev->gmc.num_mem_partitions;
return 0;
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.h
index 39aca87ce204..68b63b970ce8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.h
@@ -83,6 +83,9 @@ struct amdgpu_xcp_mgr {
struct amdgpu_xcp xcp[MAX_XCP];
uint8_t num_xcps;
int8_t mode;
+
+/* Used to determine KFD memory size limits per XCP */
+   unsigned int num_xcp_per_mem_partition;
 };
 
 struct amdgpu_xcp_mgr_funcs {
-- 
2.40.1



[PATCH 03/29] drm/amdgpu: add partition ID track in ring

2023-05-10 Thread Alex Deucher
From: James Zhu 

Keep track of the partition ID in the ring.

Signed-off-by: James Zhu 
Acked-by: Lijo Lazar 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h  |  1 +
 .../drm/amd/amdgpu/aqua_vanjaram_reg_init.c   | 41 +++
 2 files changed, 42 insertions(+)
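
One subtlety worth calling out: under CPX mode a single VCN instance
serves two partitions, which is why the diff widens the VCN instance
mask. Sketched:

	u32 inst_mask = 1 << inst_idx;		/* default: 1:1 mapping */

	if (ip_blk == AMDGPU_XCP_VCN &&
	    adev->xcp_mgr->mode == AMDGPU_CPX_PARTITION_MODE)
		inst_mask = 1 << (inst_idx * 2);	/* VCN i covers xcp 2i */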

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index 5192e3577e99..baa03527bf8b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -252,6 +252,7 @@ struct amdgpu_ring {
uint32_tbuf_mask;
u32 idx;
u32 xcc_id;
+   u32 xcp_id;
u32 me;
u32 pipe;
u32 queue;
diff --git a/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram_reg_init.c 
b/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram_reg_init.c
index 97011e7e031d..c90ea34ef9ec 100644
--- a/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram_reg_init.c
+++ b/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram_reg_init.c
@@ -61,6 +61,47 @@ void aqua_vanjaram_doorbell_index_init(struct amdgpu_device 
*adev)
adev->doorbell_index.max_assignment = 
AMDGPU_DOORBELL_LAYOUT1_MAX_ASSIGNMENT << 1;
 }
 
+static void aqua_vanjaram_set_xcp_id(struct amdgpu_device *adev,
+uint32_t inst_idx, struct amdgpu_ring *ring)
+{
+   int xcp_id;
+   enum AMDGPU_XCP_IP_BLOCK ip_blk;
+   uint32_t inst_mask;
+
+   ring->xcp_id = ~0;
+   if (adev->xcp_mgr->mode == AMDGPU_XCP_MODE_NONE)
+   return;
+
+   inst_mask = 1 << inst_idx;
+
+   switch (ring->funcs->type) {
+   case AMDGPU_HW_IP_GFX:
+   case AMDGPU_RING_TYPE_COMPUTE:
+   case AMDGPU_RING_TYPE_KIQ:
+   ip_blk = AMDGPU_XCP_GFX;
+   break;
+   case AMDGPU_RING_TYPE_SDMA:
+   ip_blk = AMDGPU_XCP_SDMA;
+   break;
+   case AMDGPU_RING_TYPE_VCN_ENC:
+   case AMDGPU_RING_TYPE_VCN_JPEG:
+   ip_blk = AMDGPU_XCP_VCN;
+   if (adev->xcp_mgr->mode == AMDGPU_CPX_PARTITION_MODE)
+   inst_mask = 1 << (inst_idx * 2);
+   break;
+   default:
+   DRM_ERROR("Not support ring type %d!", ring->funcs->type);
+   return;
+   }
+
+   for (xcp_id = 0; xcp_id < adev->xcp_mgr->num_xcps; xcp_id++) {
+   if (adev->xcp_mgr->xcp[xcp_id].ip[ip_blk].inst_mask & 
inst_mask) {
+   ring->xcp_id = xcp_id;
+   break;
+   }
+   }
+}
+
 static int8_t aqua_vanjaram_logical_to_dev_inst(struct amdgpu_device *adev,
 enum amd_hw_ip_block_type block,
 int8_t inst)
-- 
2.40.1



[PATCH 02/29] drm/amdgpu: find partition ID when open device

2023-05-10 Thread Alex Deucher
From: James Zhu 

Find the partition ID from the render device minor when opening the
device.

Signed-off-by: Christian König 
Signed-off-by: James Zhu 
Reviewed-and-tested-by: Philip Yang
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h |  2 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c |  4 
 drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c | 29 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.h |  3 +++
 4 files changed, 38 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 45c6522ee854..4fb43baddf96 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -488,6 +488,8 @@ struct amdgpu_fpriv {
struct mutexbo_list_lock;
struct idr  bo_list_handles;
struct amdgpu_ctx_mgr   ctx_mgr;
+   /** GPU partition selection */
+   uint32_txcp_id;
 };
 
 int amdgpu_file_to_fpriv(struct file *filp, struct amdgpu_fpriv **fpriv);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 44997c7ee89d..879718598fa4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -1223,6 +1223,10 @@ int amdgpu_driver_open_kms(struct drm_device *dev, 
struct drm_file *file_priv)
goto out_suspend;
}
 
+   r = amdgpu_xcp_open_device(adev, fpriv, file_priv);
+   if (r)
+   return r;
+
pasid = amdgpu_pasid_alloc(16);
if (pasid < 0) {
dev_warn(adev->dev, "No more PASIDs available!");
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c
index 8b28b18e4291..9b627a8b1d5c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c
@@ -335,3 +335,32 @@ void amdgpu_xcp_dev_unplug(struct amdgpu_device *adev)
drm_dev_unplug(adev->xcp_mgr->xcp[i].ddev);
 }
 
+int amdgpu_xcp_open_device(struct amdgpu_device *adev,
+  struct amdgpu_fpriv *fpriv,
+  struct drm_file *file_priv)
+{
+   int i;
+
+   if (!adev->xcp_mgr)
+   return 0;
+
+   fpriv->xcp_id = ~0;
+   for (i = 0; i < MAX_XCP; ++i) {
+   if (!adev->xcp_mgr->xcp[i].ddev)
+   break;
+
+		if (file_priv->minor == adev->xcp_mgr->xcp[i].ddev->render) {
+			if (adev->xcp_mgr->xcp[i].valid == FALSE) {
+				dev_err(adev->dev, "renderD%d partition %d not valid!",
+					file_priv->minor->index, i);
+				return -ENOENT;
+			}
+			dev_dbg(adev->dev, "renderD%d partition %d opened!",
+				file_priv->minor->index, i);
+			fpriv->xcp_id = i;
+			break;
+		}
+   }
+   return 0;
+}
+
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.h
index dad0b98d1ae7..ad60520f952c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.h
@@ -119,6 +119,9 @@ int amdgpu_xcp_get_inst_details(struct amdgpu_xcp *xcp,
 int amdgpu_xcp_dev_register(struct amdgpu_device *adev,
const struct pci_device_id *ent);
 void amdgpu_xcp_dev_unplug(struct amdgpu_device *adev);
+int amdgpu_xcp_open_device(struct amdgpu_device *adev,
+  struct amdgpu_fpriv *fpriv,
+  struct drm_file *file_priv);
 
 static inline int amdgpu_xcp_get_num_xcp(struct amdgpu_xcp_mgr *xcp_mgr)
 {
-- 
2.40.1



[PATCH 04/29] drm/amdgpu: update header to support partition scheduling

2023-05-10 Thread Alex Deucher
From: James Zhu 

Update header to support partition scheduling.

Signed-off-by: James Zhu 
Acked-by: Lijo Lazar 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.h | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.h
index ad60520f952c..cca06d38b03d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.h
@@ -70,7 +70,9 @@ struct amdgpu_xcp {
uint8_t id;
uint8_t mem_id;
bool valid;
+   atomic_tref_cnt;
struct drm_device *ddev;
+	struct amdgpu_sched gpu_sched[AMDGPU_HW_IP_NUM][AMDGPU_RING_PRIO_MAX];
 };
 
 struct amdgpu_xcp_mgr {
@@ -97,6 +99,10 @@ struct amdgpu_xcp_mgr_funcs {
int (*suspend)(struct amdgpu_xcp_mgr *xcp_mgr, int xcp_id);
int (*prepare_resume)(struct amdgpu_xcp_mgr *xcp_mgr, int xcp_id);
int (*resume)(struct amdgpu_xcp_mgr *xcp_mgr, int xcp_id);
+	int (*select_scheds)(struct amdgpu_device *adev,
+			     u32 hw_ip, u32 hw_prio, struct amdgpu_fpriv *fpriv,
+			     unsigned int *num_scheds,
+			     struct drm_gpu_scheduler ***scheds);
+   int (*update_partition_sched_list)(struct amdgpu_device *adev);
 };
 
 int amdgpu_xcp_prepare_suspend(struct amdgpu_xcp_mgr *xcp_mgr, int xcp_id);
@@ -123,6 +129,15 @@ int amdgpu_xcp_open_device(struct amdgpu_device *adev,
   struct amdgpu_fpriv *fpriv,
   struct drm_file *file_priv);
 
+#define amdgpu_xcp_select_scheds(adev, e, c, d, x, y) \
+	((adev)->xcp_mgr && (adev)->xcp_mgr->funcs && \
+	(adev)->xcp_mgr->funcs->select_scheds ? \
+	(adev)->xcp_mgr->funcs->select_scheds((adev), (e), (c), (d), (x), (y)) : -ENOENT)
+#define amdgpu_xcp_update_partition_sched_list(adev) \
+	((adev)->xcp_mgr && (adev)->xcp_mgr->funcs && \
+	(adev)->xcp_mgr->funcs->update_partition_sched_list ? \
+	(adev)->xcp_mgr->funcs->update_partition_sched_list(adev) : 0)
+
 static inline int amdgpu_xcp_get_num_xcp(struct amdgpu_xcp_mgr *xcp_mgr)
 {
if (!xcp_mgr)
-- 
2.40.1



[PATCH 01/29] drm/amdgpu: support partition drm devices

2023-05-10 Thread Alex Deucher
From: James Zhu 

Support partition drm devices on GC_HWIP IP_VERSION(9, 4, 3).

This is a temporary solution and will be superseded.

Signed-off-by: Christian König 
Signed-off-by: James Zhu 
Reviewed-and-tested-by: Philip Yang
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h|  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c| 32 
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.h|  2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c| 59 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.h|  5 ++
 6 files changed, 99 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index bed6d1d09ac2..45c6522ee854 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -108,6 +108,7 @@
 #include "amdgpu_fdinfo.h"
 #include "amdgpu_mca.h"
 #include "amdgpu_ras.h"
+#include "amdgpu_xcp.h"
 
 #define MAX_GPU_INSTANCE   64
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index c2136accd523..40c5845c78df 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -6062,6 +6062,7 @@ void amdgpu_device_halt(struct amdgpu_device *adev)
struct pci_dev *pdev = adev->pdev;
struct drm_device *ddev = adev_to_drm(adev);
 
+   amdgpu_xcp_dev_unplug(adev);
drm_dev_unplug(ddev);
 
amdgpu_irq_disable_all(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 562e65ab48fa..4589cb2255a2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -2194,6 +2194,10 @@ static int amdgpu_pci_probe(struct pci_dev *pdev,
goto err_pci;
}
 
+   ret = amdgpu_xcp_dev_register(adev, ent);
+   if (ret)
+   goto err_pci;
+
/*
 * 1. don't init fbdev on hw without DCE
 * 2. don't init fbdev if there are no connectors
@@ -2266,6 +2270,7 @@ amdgpu_pci_remove(struct pci_dev *pdev)
struct drm_device *dev = pci_get_drvdata(pdev);
struct amdgpu_device *adev = drm_to_adev(dev);
 
+   amdgpu_xcp_dev_unplug(adev);
drm_dev_unplug(dev);
 
if (adev->pm.rpm_mode != AMDGPU_RUNPM_NONE) {
@@ -2849,6 +2854,33 @@ static const struct drm_driver amdgpu_kms_driver = {
.patchlevel = KMS_DRIVER_PATCHLEVEL,
 };
 
+const struct drm_driver amdgpu_partition_driver = {
+   .driver_features =
+   DRIVER_GEM | DRIVER_RENDER | DRIVER_SYNCOBJ |
+   DRIVER_SYNCOBJ_TIMELINE,
+   .open = amdgpu_driver_open_kms,
+   .postclose = amdgpu_driver_postclose_kms,
+   .lastclose = amdgpu_driver_lastclose_kms,
+   .ioctls = amdgpu_ioctls_kms,
+   .num_ioctls = ARRAY_SIZE(amdgpu_ioctls_kms),
+   .dumb_create = amdgpu_mode_dumb_create,
+   .dumb_map_offset = amdgpu_mode_dumb_mmap,
+   .fops = &amdgpu_driver_kms_fops,
+   .release = &amdgpu_driver_release_kms,
+
+   .prime_handle_to_fd = drm_gem_prime_handle_to_fd,
+   .prime_fd_to_handle = drm_gem_prime_fd_to_handle,
+   .gem_prime_import = amdgpu_gem_prime_import,
+   .gem_prime_mmap = drm_gem_prime_mmap,
+
+   .name = DRIVER_NAME,
+   .desc = DRIVER_DESC,
+   .date = DRIVER_DATE,
+   .major = KMS_DRIVER_MAJOR,
+   .minor = KMS_DRIVER_MINOR,
+   .patchlevel = KMS_DRIVER_PATCHLEVEL,
+};
+
 static struct pci_error_handlers amdgpu_pci_err_handler = {
.error_detected = amdgpu_pci_error_detected,
.mmio_enabled   = amdgpu_pci_mmio_enabled,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.h
index 8178323e4bef..5bc2cb661af7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.h
@@ -42,6 +42,8 @@
 #define DRIVER_DESC"AMD GPU"
 #define DRIVER_DATE"20150101"
 
+extern const struct drm_driver amdgpu_partition_driver;
+
 long amdgpu_drm_ioctl(struct file *filp,
  unsigned int cmd, unsigned long arg);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c
index bca226cc4e0b..8b28b18e4291 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c
@@ -22,6 +22,9 @@
  */
 #include "amdgpu.h"
 #include "amdgpu_xcp.h"
+#include "amdgpu_drv.h"
+
+#include <drm/drm_drv.h>
 
 static int __amdgpu_xcp_run(struct amdgpu_xcp_mgr *xcp_mgr,
struct amdgpu_xcp_ip *xcp_ip, int xcp_state)
@@ -217,6 +220,31 @@ int amdgpu_xcp_query_partition_mode(struct amdgpu_xcp_mgr 
*xcp_mgr, u32 flags)
return mode;
 }
 
+static int amdgpu_xcp_dev_alloc(struct amdgpu_device *adev)
+{
+   struct drm_device *p_ddev;
+   struct pci_dev *pdev;
+   struct drm_device *ddev;
+   int i;
+
+   pdev = 

Re: [PATCH 10/66] drm/amd/display: Do not set drr on pipe commit

2023-05-10 Thread Aurabindo Pillai



On 5/10/23 09:20, Michel Dänzer wrote:
> On 5/9/23 23:07, Pillai, Aurabindo wrote:
>>
>> Sorry - the firmware in the previous message is for DCN32. For Navi2x, 
>> please use the firmware attached here.
> 
> Same problem (contents of /sys/kernel/debug/dri/0/amdgpu_firmware_info below).
> 
> Even if it did work with newer FW, the kernel must keep working with older 
> FW, so in that case the new behaviour would need to be guarded by the FW 
> version.
> 

Agreed. Were you able to repro the hang on any other modes/monitors? 

> 
> VCE feature version: 0, firmware version: 0x
> UVD feature version: 0, firmware version: 0x
> MC feature version: 0, firmware version: 0x
> ME feature version: 44, firmware version: 0x0040
> PFP feature version: 44, firmware version: 0x0061
> CE feature version: 44, firmware version: 0x0025
> RLC feature version: 1, firmware version: 0x0060
> RLC SRLC feature version: 0, firmware version: 0x
> RLC SRLG feature version: 0, firmware version: 0x
> RLC SRLS feature version: 0, firmware version: 0x
> RLCP feature version: 0, firmware version: 0x
> RLCV feature version: 0, firmware version: 0x
> MEC feature version: 44, firmware version: 0x0071
> MEC2 feature version: 44, firmware version: 0x0071
> IMU feature version: 0, firmware version: 0x
> SOS feature version: 0, firmware version: 0x00210c64
> ASD feature version: 553648297, firmware version: 0x21a9
> TA XGMI feature version: 0x, firmware version: 0x200f
> TA RAS feature version: 0x, firmware version: 0x1b00013e
> TA HDCP feature version: 0x, firmware version: 0x1738
> TA DTM feature version: 0x, firmware version: 0x1215
> TA RAP feature version: 0x, firmware version: 0x07000213
> TA SECUREDISPLAY feature version: 0x, firmware version: 0x
> SMC feature version: 0, program: 0, firmware version: 0x003a5800 (58.88.0)
> SDMA0 feature version: 52, firmware version: 0x0053
> SDMA1 feature version: 52, firmware version: 0x0053
> SDMA2 feature version: 52, firmware version: 0x0053
> SDMA3 feature version: 52, firmware version: 0x0053
> VCN feature version: 0, firmware version: 0x0211b000
> DMCU feature version: 0, firmware version: 0x
> DMCUB feature version: 0, firmware version: 0x0202001c
> TOC feature version: 0, firmware version: 0x
> MES_KIQ feature version: 0, firmware version: 0x
> MES feature version: 0, firmware version: 0x
> VBIOS version: 113-D4300100-051
> 
> 
> --
>> *From:* Pillai, Aurabindo 
>> *Sent:* Tuesday, May 9, 2023 4:44 PM
>> *To:* Michel Dänzer ; Zhuo, Qingqing (Lillian) 
>> ; amd-gfx@lists.freedesktop.org 
>> ; Chalmers, Wesley 
>> *Cc:* Wang, Chao-kai (Stylon) ; Li, Sun peng (Leo) 
>> ; Wentland, Harry ; Siqueira, 
>> Rodrigo ; Li, Roman ; Chiu, 
>> Solomon ; Lin, Wayne ; Lakha, 
>> Bhawanpreet ; Gutierrez, Agustin 
>> ; Kotarac, Pavle 
>> *Subject:* Re: [PATCH 10/66] drm/amd/display: Do not set drr on pipe commit
>>  
>> Hi Michel,
>>
>> Could you please try with the attached firmware package if you see the hang 
>> without any reverts?  If you do see hangs, please send dmesg with 
>> "drm.debug=0x156 log_buf_len=30M" in the kernel cmdline.
>>
>> The attached fw is not released to the public yet, but we will be updating 
>> them in linux-firmware tree next week. Please do backup your existing 
>> firmware, and put the attached files into /usr/lib/firmware/updates/amdgpu 
>> and regenerate your ramdisk. On ubuntu the following should do:
>>
>> sudo update-initramfs -u -k `uname -r`
>>
>> --
>>
>> Regards,
>> Jay
>> 

[PATCH 07/10] drm/amd/display: Make unbounded req update separate from dlg/ttu

2023-05-10 Thread Aurabindo Pillai
From: Alvin Lee 

[Description]
- Updates to unbounded requesting should not be conditional
  on updates to dlg / ttu, as this could prevent unbounded
  requesting from being updated if dlg / ttu does not change

Reviewed-by: Jun Lei 
Acked-by: Aurabindo Pillai 
Signed-off-by: Alvin Lee 
---
 drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c | 11 ---
 drivers/gpu/drm/amd/display/dc/inc/core_types.h|  1 +
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c 
b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c
index b3e187b1347d..e74c3ce561ab 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c
@@ -1361,6 +1361,7 @@ static void dcn20_detect_pipe_changes(struct pipe_ctx 
*old_pipe, struct pipe_ctx
new_pipe->update_flags.bits.dppclk = 1;
new_pipe->update_flags.bits.hubp_interdependent = 1;
new_pipe->update_flags.bits.hubp_rq_dlg_ttu = 1;
+   new_pipe->update_flags.bits.unbounded_req = 1;
new_pipe->update_flags.bits.gamut_remap = 1;
new_pipe->update_flags.bits.scaler = 1;
new_pipe->update_flags.bits.viewport = 1;
@@ -1504,6 +1505,9 @@ static void dcn20_detect_pipe_changes(struct pipe_ctx 
*old_pipe, struct pipe_ctx
memcmp(&old_pipe->rq_regs, &new_pipe->rq_regs, sizeof(old_pipe->rq_regs)))
new_pipe->update_flags.bits.hubp_rq_dlg_ttu = 1;
}
+
+   if (old_pipe->unbounded_req != new_pipe->unbounded_req)
+   new_pipe->update_flags.bits.unbounded_req = 1;
 }
 
 static void dcn20_update_dchubp_dpp(
@@ -1537,10 +1541,11 @@ static void dcn20_update_dchubp_dpp(
&pipe_ctx->ttu_regs,
&pipe_ctx->rq_regs,
&pipe_ctx->pipe_dlg_param);
-
-   if (hubp->funcs->set_unbounded_requesting)
-   hubp->funcs->set_unbounded_requesting(hubp, 
pipe_ctx->unbounded_req);
}
+
+   if (pipe_ctx->update_flags.bits.unbounded_req && 
hubp->funcs->set_unbounded_requesting)
+   hubp->funcs->set_unbounded_requesting(hubp, 
pipe_ctx->unbounded_req);
+
if (pipe_ctx->update_flags.bits.hubp_interdependent)
hubp->funcs->hubp_setup_interdependent(
hubp,
diff --git a/drivers/gpu/drm/amd/display/dc/inc/core_types.h 
b/drivers/gpu/drm/amd/display/dc/inc/core_types.h
index b4c1cc6dc857..d8dd143cf6ea 100644
--- a/drivers/gpu/drm/amd/display/dc/inc/core_types.h
+++ b/drivers/gpu/drm/amd/display/dc/inc/core_types.h
@@ -374,6 +374,7 @@ union pipe_update_flags {
uint32_t viewport : 1;
uint32_t plane_changed : 1;
uint32_t det_size : 1;
+   uint32_t unbounded_req : 1;
} bits;
uint32_t raw;
 };
-- 
2.40.0



[PATCH 09/10] drm/amd/display: Remove v_startup workaround for dcn3+

2023-05-10 Thread Aurabindo Pillai
From: Daniel Miess 

[Why]
Calls to dcn20_adjust_freesync_v_startup are no longer
needed as of dcn3+ and can cause underflow in some cases

[How]
Move calls to dcn20_adjust_freesync_v_startup up into
validate_bandwidth for dcn2.x

Reviewed-by: Jun Lei 
Acked-by: Aurabindo Pillai 
Signed-off-by: Daniel Miess 
---
 .../drm/amd/display/dc/dml/dcn20/dcn20_fpu.c  | 24 +++
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c 
b/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c
index 3407f9a2c6a1..8ae5ddbd1b27 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c
@@ -1099,10 +1099,6 @@ void dcn20_calculate_dlg_params(struct dc *dc,
context->res_ctx.pipe_ctx[i].plane_res.bw.dppclk_khz =

pipes[pipe_idx].clks_cfg.dppclk_mhz * 1000;
context->res_ctx.pipe_ctx[i].pipe_dlg_param = 
pipes[pipe_idx].pipe.dest;
-   if 
(context->res_ctx.pipe_ctx[i].stream->adaptive_sync_infopacket.valid)
-   dcn20_adjust_freesync_v_startup(
-   &context->res_ctx.pipe_ctx[i].stream->timing,
-   &context->res_ctx.pipe_ctx[i].pipe_dlg_param.vstartup_start);
 
pipe_idx++;
}
@@ -1931,6 +1927,7 @@ static bool dcn20_validate_bandwidth_internal(struct dc 
*dc, struct dc_state *co
int vlevel = 0;
int pipe_split_from[MAX_PIPES];
int pipe_cnt = 0;
+   int i = 0;
display_e2e_pipe_params_st *pipes = kzalloc(dc->res_pool->pipe_count * 
sizeof(display_e2e_pipe_params_st), GFP_ATOMIC);
DC_LOGGER_INIT(dc->ctx->logger);
 
@@ -1954,6 +1951,15 @@ static bool dcn20_validate_bandwidth_internal(struct dc 
*dc, struct dc_state *co
	dcn20_calculate_wm(dc, context, pipes, &pipe_cnt, pipe_split_from, 
vlevel, fast_validate);
dcn20_calculate_dlg_params(dc, context, pipes, pipe_cnt, vlevel);
 
+   for (i = 0; i < dc->res_pool->pipe_count; i++) {
+   if (!context->res_ctx.pipe_ctx[i].stream)
+   continue;
+   if 
(context->res_ctx.pipe_ctx[i].stream->adaptive_sync_infopacket.valid)
+   dcn20_adjust_freesync_v_startup(
+   &context->res_ctx.pipe_ctx[i].stream->timing,
+   &context->res_ctx.pipe_ctx[i].pipe_dlg_param.vstartup_start);
+   }
+
BW_VAL_TRACE_END_WATERMARKS();
 
goto validate_out;
@@ -2226,6 +2232,7 @@ bool dcn21_validate_bandwidth_fp(struct dc *dc,
int vlevel = 0;
int pipe_split_from[MAX_PIPES];
int pipe_cnt = 0;
+   int i = 0;
display_e2e_pipe_params_st *pipes = kzalloc(dc->res_pool->pipe_count * 
sizeof(display_e2e_pipe_params_st), GFP_ATOMIC);
DC_LOGGER_INIT(dc->ctx->logger);
 
@@ -2254,6 +2261,15 @@ bool dcn21_validate_bandwidth_fp(struct dc *dc,
	dcn21_calculate_wm(dc, context, pipes, &pipe_cnt, pipe_split_from, 
vlevel, fast_validate);
dcn20_calculate_dlg_params(dc, context, pipes, pipe_cnt, vlevel);
 
+   for (i = 0; i < dc->res_pool->pipe_count; i++) {
+   if (!context->res_ctx.pipe_ctx[i].stream)
+   continue;
+   if 
(context->res_ctx.pipe_ctx[i].stream->adaptive_sync_infopacket.valid)
+   dcn20_adjust_freesync_v_startup(
+   &context->res_ctx.pipe_ctx[i].stream->timing,
+   &context->res_ctx.pipe_ctx[i].pipe_dlg_param.vstartup_start);
+   }
+
BW_VAL_TRACE_END_WATERMARKS();
 
goto validate_out;
-- 
2.40.0



[PATCH 10/10] drm/amd/display: 3.2.236

2023-05-10 Thread Aurabindo Pillai
From: Aric Cyr 

Acked-by: Aurabindo Pillai 
Signed-off-by: Aric Cyr 
---
 drivers/gpu/drm/amd/display/dc/dc.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dc.h 
b/drivers/gpu/drm/amd/display/dc/dc.h
index 8be2e6d6d888..2dff1a5cf3b1 100644
--- a/drivers/gpu/drm/amd/display/dc/dc.h
+++ b/drivers/gpu/drm/amd/display/dc/dc.h
@@ -45,7 +45,7 @@ struct aux_payload;
 struct set_config_cmd_payload;
 struct dmub_notification;
 
-#define DC_VER "3.2.235"
+#define DC_VER "3.2.236"
 
 #define MAX_SURFACES 3
 #define MAX_PLANES 6
-- 
2.40.0



[PATCH 08/10] drm/amd/display: Remove unnecessary variable

2023-05-10 Thread Aurabindo Pillai
From: Rodrigo Siqueira 

There is no need to use dc_version in the dc_construct_ctx since this
value is copied to dc_ctx->dce_version later. This commit removes the
extra steps.

Reviewed-by: Alex Hung 
Acked-by: Aurabindo Pillai 
Signed-off-by: Rodrigo Siqueira 
---
 drivers/gpu/drm/amd/display/dc/core/dc.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c 
b/drivers/gpu/drm/amd/display/dc/core/dc.c
index adf5d0e1a7c5..f864fd3b6f29 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
@@ -857,7 +857,6 @@ static bool dc_construct_ctx(struct dc *dc,
const struct dc_init_data *init_params)
 {
struct dc_context *dc_ctx;
-   enum dce_version dc_version = DCE_VERSION_UNKNOWN;
 
dc_ctx = kzalloc(sizeof(*dc_ctx), GFP_KERNEL);
if (!dc_ctx)
@@ -875,8 +874,7 @@ static bool dc_construct_ctx(struct dc *dc,
 
/* Create logger */
 
-   dc_version = resource_parse_asic_id(init_params->asic_id);
-   dc_ctx->dce_version = dc_version;
+   dc_ctx->dce_version = resource_parse_asic_id(init_params->asic_id);
 
dc_ctx->perf_trace = dc_perf_trace_create();
if (!dc_ctx->perf_trace) {
-- 
2.40.0



[PATCH 06/10] drm/amd/display: Add visual confirm color support for MCLK switch

2023-05-10 Thread Aurabindo Pillai
From: "Leo (Hanghong) Ma" 

[Why && How]
We would like to have visual confirm color support for MCLK switch.
1. Set visual confirm color to yellow: Vblank MCLK switch.
2. Set visual confirm color to cyan: FPO + Vblank MCLK
switch.
3. Set visual confirm color to pink: Vactive MCLK switch.

Reviewed-by: Jun Lei 
Acked-by: Aurabindo Pillai 
Signed-off-by: Leo (Hanghong) Ma 
---
 drivers/gpu/drm/amd/display/dc/core/dc.c  | 47 +++--
 .../drm/amd/display/dc/core/dc_hw_sequencer.c | 50 +--
 drivers/gpu/drm/amd/display/dc/dc.h   |  1 +
 .../amd/display/dc/dcn10/dcn10_hw_sequencer.c | 22 +++-
 .../amd/display/dc/dcn10/dcn10_hw_sequencer.h |  1 -
 .../drm/amd/display/dc/dcn20/dcn20_hwseq.c| 26 +-
 .../drm/amd/display/dc/dcn20/dcn20_hwseq.h|  5 --
 .../gpu/drm/amd/display/dc/dcn20/dcn20_init.c |  2 +-
 .../drm/amd/display/dc/dcn201/dcn201_hwseq.c  |  4 +-
 .../drm/amd/display/dc/dcn201/dcn201_init.c   |  2 +-
 .../gpu/drm/amd/display/dc/dcn21/dcn21_init.c |  2 +-
 .../gpu/drm/amd/display/dc/dcn30/dcn30_init.c |  2 +-
 .../drm/amd/display/dc/dcn301/dcn301_init.c   |  2 +-
 .../gpu/drm/amd/display/dc/dcn31/dcn31_init.c |  2 +-
 .../drm/amd/display/dc/dcn314/dcn314_init.c   |  2 +-
 .../gpu/drm/amd/display/dc/dcn32/dcn32_init.c |  2 +-
 .../drm/amd/display/dc/dml/dcn32/dcn32_fpu.c  |  7 +++
 .../gpu/drm/amd/display/dc/inc/core_types.h   |  2 +
 .../gpu/drm/amd/display/dc/inc/hw_sequencer.h |  9 +++-
 19 files changed, 125 insertions(+), 65 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c 
b/drivers/gpu/drm/amd/display/dc/core/dc.c
index 9be18ebb1c17..adf5d0e1a7c5 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
@@ -1119,6 +1119,33 @@ static void phantom_pipe_blank(
hws->funcs.wait_for_blank_complete(opp);
 }
 
+static void dc_update_viusal_confirm_color(struct dc *dc, struct dc_state 
*context, struct pipe_ctx *pipe_ctx)
+{
+   if (dc->ctx->dce_version >= DCN_VERSION_1_0) {
+   memset(&pipe_ctx->visual_confirm_color, 0, sizeof(struct tg_color));
+
+   if (dc->debug.visual_confirm == VISUAL_CONFIRM_HDR)
+   get_hdr_visual_confirm_color(pipe_ctx, 
&(pipe_ctx->visual_confirm_color));
+   else if (dc->debug.visual_confirm == VISUAL_CONFIRM_SURFACE)
+   get_surface_visual_confirm_color(pipe_ctx, 
&(pipe_ctx->visual_confirm_color));
+   else if (dc->debug.visual_confirm == VISUAL_CONFIRM_SWIZZLE)
+   get_surface_tile_visual_confirm_color(pipe_ctx, 
&(pipe_ctx->visual_confirm_color));
+   else {
+   if (dc->ctx->dce_version < DCN_VERSION_2_0)
+   color_space_to_black_color(
+   dc, 
pipe_ctx->stream->output_color_space, &(pipe_ctx->visual_confirm_color));
+   }
+   if (dc->ctx->dce_version >= DCN_VERSION_2_0) {
+   if (dc->debug.visual_confirm == VISUAL_CONFIRM_MPCTREE)
+   get_mpctree_visual_confirm_color(pipe_ctx, 
&(pipe_ctx->visual_confirm_color));
+   else if (dc->debug.visual_confirm == 
VISUAL_CONFIRM_SUBVP)
+   get_subvp_visual_confirm_color(dc, context, 
pipe_ctx, &(pipe_ctx->visual_confirm_color));
+   else if (dc->debug.visual_confirm == 
VISUAL_CONFIRM_MCLK_SWITCH)
+   get_mclk_switch_visual_confirm_color(dc, 
context, pipe_ctx, &(pipe_ctx->visual_confirm_color));
+   }
+   }
+}
+
 static void disable_dangling_plane(struct dc *dc, struct dc_state *context)
 {
int i, j;
@@ -1189,6 +1216,9 @@ static void disable_dangling_plane(struct dc *dc, struct 
dc_state *context)
dc_rem_all_planes_for_stream(dc, old_stream, 
dangling_context);
disable_all_writeback_pipes_for_stream(dc, old_stream, 
dangling_context);
 
+   if (pipe->stream && pipe->plane_state)
+   dc_update_viusal_confirm_color(dc, context, 
pipe);
+
if (dc->hwss.apply_ctx_for_surface) {
apply_ctx_interdependent_lock(dc, 
dc->current_state, old_stream, true);
dc->hwss.apply_ctx_for_surface(dc, old_stream, 
0, dangling_context);
@@ -3456,6 +3486,14 @@ static void commit_planes_for_stream(struct dc *dc,
}
}
 
+   if (dc->debug.visual_confirm)
+   for (i = 0; i < dc->res_pool->pipe_count; i++) {
+   struct pipe_ctx *pipe = &context->res_ctx.pipe_ctx[i];
+
+   if (pipe->stream && pipe->plane_state)
+   dc_update_viusal_confirm_color(dc, context, 
pipe);
+   }
+
if (stream->test_pattern.type != 

[PATCH 05/10] drm/amd/display: Fix possible underflow for displays with large vblank

2023-05-10 Thread Aurabindo Pillai
From: Daniel Miess 

[Why]
Underflow observed when using a display with a large vblank region
and low refresh rate

[How]
Simplify calculation of vblank_nom

Increase value for VBlankNomDefaultUS to 800us

Reviewed-by: Jun Lei 
Acked-by: Aurabindo Pillai 
Signed-off-by: Daniel Miess 
---
 .../amd/display/dc/dml/dcn314/dcn314_fpu.c| 19 +++
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn314/dcn314_fpu.c 
b/drivers/gpu/drm/amd/display/dc/dml/dcn314/dcn314_fpu.c
index 1d00eb9e73c6..554152371eb5 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn314/dcn314_fpu.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn314/dcn314_fpu.c
@@ -33,7 +33,7 @@
 #include "dml/display_mode_vba.h"
 
 struct _vcs_dpi_ip_params_st dcn3_14_ip = {
-   .VBlankNomDefaultUS = 668,
+   .VBlankNomDefaultUS = 800,
.gpuvm_enable = 1,
.gpuvm_max_page_table_levels = 1,
.hostvm_enable = 1,
@@ -286,7 +286,7 @@ int dcn314_populate_dml_pipes_from_context_fpu(struct dc 
*dc, struct dc_state *c
struct resource_context *res_ctx = >res_ctx;
struct pipe_ctx *pipe;
bool upscaled = false;
-   bool isFreesyncVideo = false;
+   const unsigned int max_allowed_vblank_nom = 1023;
 
dc_assert_fp_enabled();
 
@@ -300,16 +300,11 @@ int dcn314_populate_dml_pipes_from_context_fpu(struct dc 
*dc, struct dc_state *c
pipe = _ctx->pipe_ctx[i];
timing = >stream->timing;
 
-   isFreesyncVideo = pipe->stream->adjust.v_total_max == 
pipe->stream->adjust.v_total_min;
-   isFreesyncVideo = isFreesyncVideo && 
pipe->stream->adjust.v_total_min > timing->v_total;
-
-   if (!isFreesyncVideo) {
-   pipes[pipe_cnt].pipe.dest.vblank_nom =
-   dcn3_14_ip.VBlankNomDefaultUS / 
(timing->h_total / (timing->pix_clk_100hz / 1.0));
-   } else {
-   pipes[pipe_cnt].pipe.dest.vtotal = 
pipe->stream->adjust.v_total_min;
-   pipes[pipe_cnt].pipe.dest.vblank_nom = timing->v_total 
- pipes[pipe_cnt].pipe.dest.vactive;
-   }
+   pipes[pipe_cnt].pipe.dest.vtotal = 
pipe->stream->adjust.v_total_min;
+   pipes[pipe_cnt].pipe.dest.vblank_nom = timing->v_total - 
pipes[pipe_cnt].pipe.dest.vactive;
+   pipes[pipe_cnt].pipe.dest.vblank_nom = 
min(pipes[pipe_cnt].pipe.dest.vblank_nom, dcn3_14_ip.VBlankNomDefaultUS);
+   pipes[pipe_cnt].pipe.dest.vblank_nom = 
max(pipes[pipe_cnt].pipe.dest.vblank_nom, timing->v_sync_width);
+   pipes[pipe_cnt].pipe.dest.vblank_nom = 
min(pipes[pipe_cnt].pipe.dest.vblank_nom, max_allowed_vblank_nom);
 
if (pipe->plane_state &&
(pipe->plane_state->src_rect.height < 
pipe->plane_state->dst_rect.height ||
-- 
2.40.0



[PATCH 04/10] drm/amd/display: Convert connector signal id to string

2023-05-10 Thread Aurabindo Pillai
From: Rodrigo Siqueira 

To improve the readability of the log, this commit introduces a
function that converts the signal type id to a human-readable string.

Reviewed-by: Jerry Zuo 
Acked-by: Aurabindo Pillai 
Signed-off-by: Rodrigo Siqueira 
---
 .../drm/amd/display/dc/link/link_factory.c|  6 ++--
 .../drm/amd/display/include/signal_types.h| 28 +++
 2 files changed, 30 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/link/link_factory.c 
b/drivers/gpu/drm/amd/display/dc/link/link_factory.c
index 1515c817f03b..ac1c3e2e7c1d 100644
--- a/drivers/gpu/drm/amd/display/dc/link/link_factory.c
+++ b/drivers/gpu/drm/amd/display/dc/link/link_factory.c
@@ -563,11 +563,9 @@ static bool construct_phy(struct dc_link *link,
goto create_fail;
}
 
-   /* TODO: #DAL3 Implement id to str function.*/
-   LINK_INFO("Connector[%d] description:"
- "signal %d\n",
+   LINK_INFO("Connector[%d] description: signal: %s\n",
  init_params->connector_index,
- link->connector_signal);
+ signal_type_to_string(link->connector_signal));
 
ddc_service_init_data.ctx = link->ctx;
ddc_service_init_data.id = link->link_id;
diff --git a/drivers/gpu/drm/amd/display/include/signal_types.h 
b/drivers/gpu/drm/amd/display/include/signal_types.h
index 23a308c3eccb..325c5ba4c82a 100644
--- a/drivers/gpu/drm/amd/display/include/signal_types.h
+++ b/drivers/gpu/drm/amd/display/include/signal_types.h
@@ -44,6 +44,34 @@ enum signal_type {
SIGNAL_TYPE_VIRTUAL = (1 << 9), /* Virtual Display */
 };
 
+static inline const char *signal_type_to_string(const int type)
+{
+   switch (type) {
+   case SIGNAL_TYPE_NONE:
+   return "No signal";
+   case SIGNAL_TYPE_DVI_SINGLE_LINK:
+   return "DVI: Single Link";
+   case SIGNAL_TYPE_DVI_DUAL_LINK:
+   return "DVI: Dual Link";
+   case SIGNAL_TYPE_HDMI_TYPE_A:
+   return "HDMI: TYPE A";
+   case SIGNAL_TYPE_LVDS:
+   return "LVDS";
+   case SIGNAL_TYPE_RGB:
+   return "RGB";
+   case SIGNAL_TYPE_DISPLAY_PORT:
+   return "Display Port";
+   case SIGNAL_TYPE_DISPLAY_PORT_MST:
+   return "Display Port: MST";
+   case SIGNAL_TYPE_EDP:
+   return "Embedded Display Port";
+   case SIGNAL_TYPE_VIRTUAL:
+   return "Virtual";
+   default:
+   return "Unknown";
+   }
+}
+
 /* help functions for signal types manipulation */
 static inline bool dc_is_hdmi_tmds_signal(enum signal_type signal)
 {
-- 
2.40.0



[PATCH 01/10] drm/amd/display: enable dpia validate

2023-05-10 Thread Aurabindo Pillai
From: Mustapha Ghaddar 

Use dpia_validate_usb4_bw() function

Fixes: 6d86146dd62f ("drm/amd/display: Add function pointer for validate bw 
usb4")
Reviewed-by: Roman Li 
Reviewed-by: Meenakshikumar Somasundaram 
Acked-by: Aurabindo Pillai 
Signed-off-by: Mustapha Ghaddar 
---
 drivers/gpu/drm/amd/display/dc/link/link_validation.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/dc/link/link_validation.c 
b/drivers/gpu/drm/amd/display/dc/link/link_validation.c
index d4b7da526f0a..e8b2fc4002a5 100644
--- a/drivers/gpu/drm/amd/display/dc/link/link_validation.c
+++ b/drivers/gpu/drm/amd/display/dc/link/link_validation.c
@@ -359,5 +359,8 @@ bool link_validate_dpia_bandwidth(const struct 
dc_stream_state *stream, const un
link[i] = stream[i].link;
bw_needed[i] = 
dc_bandwidth_in_kbps_from_timing(&stream[i].timing);
}
+
+   ret = dpia_validate_usb4_bw(link, bw_needed, num_streams);
+
return ret;
 }
-- 
2.40.0



[PATCH 03/10] drm/amd/display: Update vactive margin and max vblank for fpo + vactive

2023-05-10 Thread Aurabindo Pillai
From: Alvin Lee 

[Description]
- Some 1920x1080@60hz displays have VBLANK time > 600us which we
  still want to accept for FPO + Vactive configs based on testing
- Increase max VBLANK time to 1000us to allow these configs
  for FPO + Vactive
- Increase minimum vactive switch margin for FPO + Vactive to 200us
- Based on testing, 1920x1080@120hz can have a switch margin
  of ~160us, which would require a significantly longer FPO
  stretch margin (5ms) that we don't want to accept for now
- Also move margins into debug option

Reviewed-by: Jun Lei 
Reviewed-by: Nevenko Stupar 
Acked-by: Aurabindo Pillai 
Signed-off-by: Alvin Lee 
---
 drivers/gpu/drm/amd/display/dc/dc.h   | 2 ++
 drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource.c | 2 ++
 drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource.h | 1 -
 drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource_helpers.c | 2 +-
 drivers/gpu/drm/amd/display/dc/dcn321/dcn321_resource.c   | 2 ++
 drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c  | 3 +--
 6 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dc.h 
b/drivers/gpu/drm/amd/display/dc/dc.h
index e89de1078964..1ebb8d3573f4 100644
--- a/drivers/gpu/drm/amd/display/dc/dc.h
+++ b/drivers/gpu/drm/amd/display/dc/dc.h
@@ -893,6 +893,8 @@ struct dc_debug_options {
bool minimize_dispclk_using_odm;
bool disable_subvp_high_refresh;
bool disable_dp_plus_plus_wa;
+   uint32_t fpo_vactive_min_active_margin_us;
+   uint32_t fpo_vactive_max_blank_us;
 };
 
 struct gpu_info_soc_bounding_box_v1_0;
diff --git a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource.c 
b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource.c
index 4de2f8813dce..98c394f9f8cf 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource.c
@@ -730,6 +730,8 @@ static const struct dc_debug_options debug_defaults_drv = {
.disable_boot_optimizations = false,
.disable_subvp_high_refresh = true,
.disable_dp_plus_plus_wa = true,
+   .fpo_vactive_min_active_margin_us = 200,
+   .fpo_vactive_max_blank_us = 1000,
 };
 
 static const struct dc_debug_options debug_defaults_diags = {
diff --git a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource.h 
b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource.h
index 42ccfd13a37c..58826e0aa76e 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource.h
+++ b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource.h
@@ -39,7 +39,6 @@
 #define DCN3_2_MBLK_HEIGHT_8BPE 64
 #define DCN3_2_VMIN_DISPCLK_HZ 71700
 #define DCN3_2_DCFCLK_DS_INIT_KHZ 1 // Choose 10Mhz for init DCFCLK DS freq
-#define DCN3_2_MIN_ACTIVE_SWITCH_MARGIN_FPO_US 100 // Only allow FPO + Vactive 
if active margin >= 100
 #define SUBVP_HIGH_REFRESH_LIST_LEN 3
 #define DCN3_2_MAX_SUBVP_PIXEL_RATE_MHZ 1800
 
diff --git a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource_helpers.c 
b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource_helpers.c
index df912c333bbd..a8082580df92 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource_helpers.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource_helpers.c
@@ -626,7 +626,7 @@ struct dc_stream_state 
*dcn32_can_support_mclk_switch_using_fw_based_vblank_stre
DC_FP_END();
 
DC_FP_START();
-   is_fpo_vactive = dcn32_find_vactive_pipe(dc, context, 
DCN3_2_MIN_ACTIVE_SWITCH_MARGIN_FPO_US);
+   is_fpo_vactive = dcn32_find_vactive_pipe(dc, context, 
dc->debug.fpo_vactive_min_active_margin_us);
DC_FP_END();
if (!is_fpo_vactive || dc->debug.disable_fpo_vactive)
return NULL;
diff --git a/drivers/gpu/drm/amd/display/dc/dcn321/dcn321_resource.c 
b/drivers/gpu/drm/amd/display/dc/dcn321/dcn321_resource.c
index 4c1e0f5a5f09..f4cd9749ffdf 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn321/dcn321_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn321/dcn321_resource.c
@@ -728,6 +728,8 @@ static const struct dc_debug_options debug_defaults_drv = {
.disable_fpo_vactive = false,
.disable_boot_optimizations = false,
.disable_subvp_high_refresh = true,
+   .fpo_vactive_min_active_margin_us = 200,
+   .fpo_vactive_max_blank_us = 1000,
 };
 
 static const struct dc_debug_options debug_defaults_diags = {
diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c 
b/drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c
index f7e45d935a29..8c60b88c7d1a 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c
@@ -35,7 +35,6 @@
 
 #define DC_LOGGER_INIT(logger)
 
-static const unsigned int MAX_FPO_VACTIVE_BLANK_US = 600;
 static const struct subvp_high_refresh_list subvp_high_refresh_list = {
.min_refresh = 120,
 

[PATCH 02/10] drm/amd/display: Only skip update for DCFCLK, UCLK, FCLK on overclock

2023-05-10 Thread Aurabindo Pillai
From: Alvin Lee 

[Description]
- Update clocks is skipped in the GPU overclock sequence
- However, we still need to update DISPCLK, DPPCLK, and DTBCLK
  because the GPU overclock sequence could temporarily disable
  ODM 2:1 combine because we disable all planes in the sequence

Reviewed-by: Jun Lei 
Acked-by: Aurabindo Pillai 
Signed-off-by: Alvin Lee 
---
 .../display/dc/clk_mgr/dcn32/dcn32_clk_mgr.c  | 24 +++
 drivers/gpu/drm/amd/display/dc/dc.h   |  7 +-
 2 files changed, 20 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn32/dcn32_clk_mgr.c 
b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn32/dcn32_clk_mgr.c
index 85e963ec25ab..1df623b298a9 100644
--- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn32/dcn32_clk_mgr.c
+++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn32/dcn32_clk_mgr.c
@@ -460,9 +460,6 @@ static void dcn32_update_clocks(struct clk_mgr 
*clk_mgr_base,
bool p_state_change_support;
bool fclk_p_state_change_support;
 
-   if (dc->work_arounds.skip_clock_update)
-   return;
-
if (clk_mgr_base->clks.dispclk_khz == 0 ||
(dc->debug.force_clock_mode & 0x1)) {
/* This is from resume or boot up, if forced_clock cfg option 
used,
@@ -489,7 +486,8 @@ static void dcn32_update_clocks(struct clk_mgr 
*clk_mgr_base,
 
fclk_p_state_change_support = 
new_clocks->fclk_p_state_change_support;
 
-   if (should_update_pstate_support(safe_to_lower, 
fclk_p_state_change_support, clk_mgr_base->clks.fclk_p_state_change_support)) {
+   if (should_update_pstate_support(safe_to_lower, 
fclk_p_state_change_support, clk_mgr_base->clks.fclk_p_state_change_support) &&
+   
!dc->work_arounds.clock_update_disable_mask.fclk) {
clk_mgr_base->clks.fclk_p_state_change_support = 
fclk_p_state_change_support;
 
/* To enable FCLK P-state switching, send 
FCLK_PSTATE_SUPPORTED message to PMFW */
@@ -503,12 +501,14 @@ static void dcn32_update_clocks(struct clk_mgr 
*clk_mgr_base,
new_clocks->dcfclk_khz = (new_clocks->dcfclk_khz > 
(dc->debug.force_min_dcfclk_mhz * 1000)) ?
new_clocks->dcfclk_khz : 
(dc->debug.force_min_dcfclk_mhz * 1000);
 
-   if (should_set_clock(safe_to_lower, new_clocks->dcfclk_khz, 
clk_mgr_base->clks.dcfclk_khz)) {
+   if (should_set_clock(safe_to_lower, new_clocks->dcfclk_khz, 
clk_mgr_base->clks.dcfclk_khz) &&
+   
!dc->work_arounds.clock_update_disable_mask.dcfclk) {
clk_mgr_base->clks.dcfclk_khz = new_clocks->dcfclk_khz;
dcn32_smu_set_hard_min_by_freq(clk_mgr, PPCLK_DCFCLK, 
khz_to_mhz_ceil(clk_mgr_base->clks.dcfclk_khz));
}
 
-   if (should_set_clock(safe_to_lower, 
new_clocks->dcfclk_deep_sleep_khz, clk_mgr_base->clks.dcfclk_deep_sleep_khz)) {
+   if (should_set_clock(safe_to_lower, 
new_clocks->dcfclk_deep_sleep_khz, clk_mgr_base->clks.dcfclk_deep_sleep_khz) &&
+   
!dc->work_arounds.clock_update_disable_mask.dcfclk_ds) {
clk_mgr_base->clks.dcfclk_deep_sleep_khz = 
new_clocks->dcfclk_deep_sleep_khz;
dcn30_smu_set_min_deep_sleep_dcef_clk(clk_mgr, 
khz_to_mhz_ceil(clk_mgr_base->clks.dcfclk_deep_sleep_khz));
}
@@ -527,7 +527,8 @@ static void dcn32_update_clocks(struct clk_mgr 
*clk_mgr_base,
}
 
p_state_change_support = new_clocks->p_state_change_support;
-   if (should_update_pstate_support(safe_to_lower, 
p_state_change_support, clk_mgr_base->clks.p_state_change_support)) {
+   if (should_update_pstate_support(safe_to_lower, 
p_state_change_support, clk_mgr_base->clks.p_state_change_support) &&
+   
!dc->work_arounds.clock_update_disable_mask.uclk) {
clk_mgr_base->clks.p_state_change_support = 
p_state_change_support;
 
/* to disable P-State switching, set UCLK min = max */
@@ -541,20 +542,23 @@ static void dcn32_update_clocks(struct clk_mgr 
*clk_mgr_base,
update_fclk = true;
}
 
-   if (clk_mgr_base->ctx->dce_version != DCN_VERSION_3_21 && 
!clk_mgr_base->clks.fclk_p_state_change_support && update_fclk) {
+   if (clk_mgr_base->ctx->dce_version != DCN_VERSION_3_21 && 
!clk_mgr_base->clks.fclk_p_state_change_support && update_fclk &&
+   
!dc->work_arounds.clock_update_disable_mask.fclk) {
/* Handle code for sending a message to PMFW that FCLK 
P-state change is not supported */
dcn32_smu_send_fclk_pstate_message(clk_mgr, 
FCLK_PSTATE_NOTSUPPORTED);
}
 
/* Always 
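The quoted message is cut off above, and the dc.h hunk that adds the new
workaround field is not shown. Judging from the call sites, it is presumably
a per-clock disable mask along these lines (name and layout are assumptions,
not the actual hunk):

	struct dc_clocks_update_disable_mask {
		uint32_t uclk : 1;
		uint32_t fclk : 1;
		uint32_t dcfclk : 1;
		uint32_t dcfclk_ds : 1;
	};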

[PATCH 00/10] DC Patches for 15 May 2023

2023-05-10 Thread Aurabindo Pillai
This DC patchset brings improvements in multiple areas. In summary, we 
highlight:

* DC v3.2.236
* Fixes related to DCN clock sequencing
* Changes to FPO acceptance heuristics for various modelines
* Dmesg log readability, visual debug improvements and various bug fixes.

Cc: Daniel Wheeler 

---

Alvin Lee (3):
  drm/amd/display: Only skip update for DCFCLK, UCLK, FCLK on overclock
  drm/amd/display: Update vactive margin and max vblank for fpo +
vactive
  drm/amd/display: Make unbounded req update separate from dlg/ttu

Aric Cyr (1):
  drm/amd/display: 3.2.236

Daniel Miess (2):
  drm/amd/display: Fix possible underflow for displays with large vblank
  drm/amd/display: Remove v_startup workaround for dcn3+

Leo (Hanghong) Ma (1):
  drm/amd/display: Add visual confirm color support for MCLK switch

Mustapha Ghaddar (1):
  drm/amd/display: enable dpia validate

Rodrigo Siqueira (2):
  drm/amd/display: Convert connector signal id to string
  drm/amd/display: Remove unnecessary variable

 .../display/dc/clk_mgr/dcn32/dcn32_clk_mgr.c  | 24 +
 drivers/gpu/drm/amd/display/dc/core/dc.c  | 51 ---
 .../drm/amd/display/dc/core/dc_hw_sequencer.c | 50 --
 drivers/gpu/drm/amd/display/dc/dc.h   | 12 -
 .../amd/display/dc/dcn10/dcn10_hw_sequencer.c | 22 +++-
 .../amd/display/dc/dcn10/dcn10_hw_sequencer.h |  1 -
 .../drm/amd/display/dc/dcn20/dcn20_hwseq.c| 37 --
 .../drm/amd/display/dc/dcn20/dcn20_hwseq.h|  5 --
 .../gpu/drm/amd/display/dc/dcn20/dcn20_init.c |  2 +-
 .../drm/amd/display/dc/dcn201/dcn201_hwseq.c  |  4 +-
 .../drm/amd/display/dc/dcn201/dcn201_init.c   |  2 +-
 .../gpu/drm/amd/display/dc/dcn21/dcn21_init.c |  2 +-
 .../gpu/drm/amd/display/dc/dcn30/dcn30_init.c |  2 +-
 .../drm/amd/display/dc/dcn301/dcn301_init.c   |  2 +-
 .../gpu/drm/amd/display/dc/dcn31/dcn31_init.c |  2 +-
 .../drm/amd/display/dc/dcn314/dcn314_init.c   |  2 +-
 .../gpu/drm/amd/display/dc/dcn32/dcn32_init.c |  2 +-
 .../drm/amd/display/dc/dcn32/dcn32_resource.c |  2 +
 .../drm/amd/display/dc/dcn32/dcn32_resource.h |  1 -
 .../display/dc/dcn32/dcn32_resource_helpers.c |  2 +-
 .../amd/display/dc/dcn321/dcn321_resource.c   |  2 +
 .../drm/amd/display/dc/dml/dcn20/dcn20_fpu.c  | 24 +++--
 .../amd/display/dc/dml/dcn314/dcn314_fpu.c| 19 +++
 .../drm/amd/display/dc/dml/dcn32/dcn32_fpu.c  | 10 +++-
 .../gpu/drm/amd/display/dc/inc/core_types.h   |  3 ++
 .../gpu/drm/amd/display/dc/inc/hw_sequencer.h |  9 +++-
 .../drm/amd/display/dc/link/link_factory.c|  6 +--
 .../drm/amd/display/dc/link/link_validation.c |  3 ++
 .../drm/amd/display/include/signal_types.h| 28 ++
 29 files changed, 224 insertions(+), 107 deletions(-)

-- 
2.40.0



Re: [RFC PATCH 0/4] Add support for DRM cgroup memory accounting.

2023-05-10 Thread Tejun Heo
Hello,

On Wed, May 10, 2023 at 04:59:01PM +0200, Maarten Lankhorst wrote:
> The misc controller is not granular enough. A single computer may have any 
> number of
> graphics cards, some of them with multiple regions of vram inside a single 
> card.

Extending the misc controller to support dynamic keys shouldn't be that
difficult.
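
For comparison, the misc controller exposes flat "name value" keys today;
dynamic, driver-registered keys would presumably extend the same syntax,
e.g. (the layout below is a hypothetical illustration, not an existing
interface):

  $ cat drm-misc.max
  card0/vram0 268435456
  card1/vram0 max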

...
> In the next version, I will move all the code for handling the resource limit 
> to
> TTM's eviction layer, because otherwise it cannot handle the resource limit 
> correctly.
> 
> The effect of moving the code to TTM, is that it will make the code even more 
> generic
> for drivers that have vram and use TTM. When using TTM, you only have to 
> describe your
> VRAM, update some fields in the TTM manager and (un)register your device with 
> the
> cgroup handler on (un)load. It's quite trivial to add vram accounting to 
> amdgpu and
> nouveau. [2]
> 
> If you want to add a knob for scheduling weight for a process, it makes sense 
> to
> also add resource usage as a knob, otherwise the effect of that knob is very
> limited. So even for Tvrtko's original proposed usecase, it would make sense.

It does make sense but unlike Tvrtko's scheduling weights what's being
proposed doesn't seem to encapsulate GPU memory resource in a generic enough
manner at least to my untrained eyes. ie. w/ drm.weight, I don't need any
specific knowledge of how a specific GPU operates to say "this guy should
get 2x processing power over that guy". This more or less holds for other
major resources including CPU, memory and IO. What you're proposing seems a
lot more tied to hardware details and users would have to know a lot more
about how memory is configured on that particular GPU.

Now, if this is inherent to how all, or at least most, GPUs operate, sure,
but otherwise let's start small in terms of interface and not take up space
which should be for something universal. If this turns out to be the way,
expanding to take up the generic interface space isn't difficult.

I don't know GPU space so please educate me where I'm wrong.

Thanks.

-- 
tejun


[PATCH 6/6] drm/amdgpu/bu: update mtype_local parameter settings

2023-05-10 Thread Alex Deucher
From: Graham Sider 

Update mtype_local module parameter to use MTYPE_RW by default.

0: MTYPE_RW (default)
1: MTYPE_NC
2: MTYPE_CC
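
For instance, selecting the non-default NC mapping at module load time
(an illustrative invocation of the parameter updated below):

  modprobe amdgpu mtype_local=1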

Signed-off-by: Graham Sider 
Reviewed-by: Harish Kasiviswanathan 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |  2 +-
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c   | 12 ++--
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c|  3 ++-
 3 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 8163abcc420c..562e65ab48fa 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -835,7 +835,7 @@ module_param_named(no_queue_eviction_on_vm_fault, 
amdgpu_no_queue_eviction_on_vm
  * DOC: mtype_local (int)
  */
 int amdgpu_mtype_local;
-MODULE_PARM_DESC(mtype_local, "MTYPE for local memory (0 = MTYPE_CC (default), 
1 = MTYPE_NC, 2 = MTYPE_RW)");
+MODULE_PARM_DESC(mtype_local, "MTYPE for local memory (0 = MTYPE_RW (default), 
1 = MTYPE_NC, 2 = MTYPE_CC)");
 module_param_named(mtype_local, amdgpu_mtype_local, int, 0444);
 
 /**
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 5f7e6e15842b..7dfe6a8ca91a 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -1240,15 +1240,15 @@ static void gmc_v9_0_get_coherence_flags(struct 
amdgpu_device *adev,
 * NUMA systems. Their MTYPE can be overridden per-page in
 * gmc_v9_0_override_vm_pte_flags.
 */
-   mtype_local = MTYPE_CC;
+   mtype_local = MTYPE_RW;
if (amdgpu_mtype_local == 1) {
DRM_INFO_ONCE("Using MTYPE_NC for local memory\n");
mtype_local = MTYPE_NC;
} else if (amdgpu_mtype_local == 2) {
-   DRM_INFO_ONCE("Using MTYPE_RW for local memory\n");
-   mtype_local = MTYPE_RW;
-   } else {
DRM_INFO_ONCE("Using MTYPE_CC for local memory\n");
+   mtype_local = MTYPE_CC;
+   } else {
+   DRM_INFO_ONCE("Using MTYPE_RW for local memory\n");
}
is_local = (!is_vram && (adev->flags & AMD_IS_APU) &&
num_possible_nodes() <= 1) ||
@@ -1364,12 +1364,12 @@ static void gmc_v9_0_override_vm_pte_flags(struct 
amdgpu_device *adev,
/*vm->mem_id*/0, local_node, nid);
if (nid == local_node) {
uint64_t old_flags = *flags;
-   unsigned int mtype_local = MTYPE_CC;
+   unsigned int mtype_local = MTYPE_RW;
 
if (amdgpu_mtype_local == 1)
mtype_local = MTYPE_NC;
else if (amdgpu_mtype_local == 2)
-   mtype_local = MTYPE_RW;
+   mtype_local = MTYPE_CC;
 
*flags = (*flags & ~AMDGPU_PTE_MTYPE_VG10_MASK) |
 AMDGPU_PTE_MTYPE_VG10(mtype_local);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 9053202ab534..c5675c7e3b9e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1191,7 +1191,8 @@ svm_range_get_pte_flags(struct kfd_node *node,
}
break;
case IP_VERSION(9, 4, 3):
-   mtype_local = amdgpu_mtype_local == 1 ? AMDGPU_VM_MTYPE_NC : 
(amdgpu_mtype_local == 2 ? AMDGPU_VM_MTYPE_RW : AMDGPU_VM_MTYPE_CC);
+   mtype_local = amdgpu_mtype_local == 1 ? AMDGPU_VM_MTYPE_NC :
+(amdgpu_mtype_local == 2 ? AMDGPU_VM_MTYPE_CC : 
AMDGPU_VM_MTYPE_RW);
snoop = true;
if (uncached) {
mapping_flags |= AMDGPU_VM_MTYPE_UC;
-- 
2.40.1



[PATCH 5/6] drm/amdgpu/bu: add mtype_local as a module parameter

2023-05-10 Thread Alex Deucher
From: David Francis 

Selects the MTYPE to be used for local memory
(0 = MTYPE_CC (default), 1 = MTYPE_NC, 2 = MTYPE_RW).

This change is for internal testing only - do not upstream.

v2: squash in build fix (Alex)

Reviewed-by: Graham Sider 
Signed-off-by: David Francis 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |  8 
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c   | 19 ---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c|  3 +--
 4 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index a3a0dbeb251f..bed6d1d09ac2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -213,7 +213,7 @@ extern int amdgpu_noretry;
 extern int amdgpu_force_asic_type;
 extern int amdgpu_smartshift_bias;
 extern int amdgpu_use_xgmi_p2p;
-extern bool amdgpu_use_mtype_cc_wa;
+extern int amdgpu_mtype_local;
 #ifdef CONFIG_HSA_AMD
 extern int sched_policy;
 extern bool debug_evictions;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 2f38c49aa597..8163abcc420c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -832,11 +832,11 @@ module_param_named(no_queue_eviction_on_vm_fault, 
amdgpu_no_queue_eviction_on_vm
 #endif
 
 /**
- * DOC: use_mtype_cc_wa (bool)
+ * DOC: mtype_local (int)
  */
-bool amdgpu_use_mtype_cc_wa = true;
-MODULE_PARM_DESC(use_mtype_cc_wa, "Use MTYPE_CC workaround (0 = use MTYPE_RW 
where applicable, 1 = use MTYPE_CC where applicable (default))");
-module_param_named(use_mtype_cc_wa, amdgpu_use_mtype_cc_wa, bool, 0444);
+int amdgpu_mtype_local;
+MODULE_PARM_DESC(mtype_local, "MTYPE for local memory (0 = MTYPE_CC (default), 
1 = MTYPE_NC, 2 = MTYPE_RW)");
+module_param_named(mtype_local, amdgpu_mtype_local, int, 0444);
 
 /**
  * DOC: pcie_p2p (bool)
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 5c9f0169292e..5f7e6e15842b 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -1240,7 +1240,16 @@ static void gmc_v9_0_get_coherence_flags(struct 
amdgpu_device *adev,
 * NUMA systems. Their MTYPE can be overridden per-page in
 * gmc_v9_0_override_vm_pte_flags.
 */
-   mtype_local = amdgpu_use_mtype_cc_wa ? MTYPE_CC : MTYPE_RW;
+   mtype_local = MTYPE_CC;
+   if (amdgpu_mtype_local == 1) {
+   DRM_INFO_ONCE("Using MTYPE_NC for local memory\n");
+   mtype_local = MTYPE_NC;
+   } else if (amdgpu_mtype_local == 2) {
+   DRM_INFO_ONCE("Using MTYPE_RW for local memory\n");
+   mtype_local = MTYPE_RW;
+   } else {
+   DRM_INFO_ONCE("Using MTYPE_CC for local memory\n");
+   }
is_local = (!is_vram && (adev->flags & AMD_IS_APU) &&
num_possible_nodes() <= 1) ||
   (is_vram && adev == bo_adev /* TODO: memory 
partitions &&
@@ -1354,9 +1363,13 @@ static void gmc_v9_0_override_vm_pte_flags(struct 
amdgpu_device *adev,
dev_dbg(adev->dev, "vm->mem_id=%d, local_node=%d, nid=%d\n",
/*vm->mem_id*/0, local_node, nid);
if (nid == local_node) {
-   unsigned int mtype_local =
-   amdgpu_use_mtype_cc_wa ? MTYPE_CC : MTYPE_RW;
uint64_t old_flags = *flags;
+   unsigned int mtype_local = MTYPE_CC;
+
+   if (amdgpu_mtype_local == 1)
+   mtype_local = MTYPE_NC;
+   else if (amdgpu_mtype_local == 2)
+   mtype_local = MTYPE_RW;
 
*flags = (*flags & ~AMDGPU_PTE_MTYPE_VG10_MASK) |
 AMDGPU_PTE_MTYPE_VG10(mtype_local);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index ab1acf97d049..9053202ab534 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1191,8 +1191,7 @@ svm_range_get_pte_flags(struct kfd_node *node,
}
break;
case IP_VERSION(9, 4, 3):
-   mtype_local = amdgpu_use_mtype_cc_wa ? AMDGPU_VM_MTYPE_CC :
-  AMDGPU_VM_MTYPE_RW;
+   mtype_local = amdgpu_mtype_local == 1 ? AMDGPU_VM_MTYPE_NC : 
(amdgpu_mtype_local == 2 ? AMDGPU_VM_MTYPE_RW : AMDGPU_VM_MTYPE_CC);
snoop = true;
if (uncached) {
mapping_flags |= AMDGPU_VM_MTYPE_UC;
-- 
2.40.1



[PATCH 3/6] drm/amdgpu: Fix per-BO MTYPE selection for GFXv9.4.3

2023-05-10 Thread Alex Deucher
From: Felix Kuehling 

Treat system memory on NUMA systems as remote by default. Overriding with
a more efficient MTYPE per page will be implemented in the next patch.

No need for a special case for APP APUs. System memory is handled the same
for carve-out and native mode. And VRAM doesn't exist in native mode.

Signed-off-by: Felix Kuehling 
Reviewed-by: Philip Yang 
Reviewed-and-tested-by: Rajneesh Bhardwaj 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 40 +++
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c  | 24 +---
 2 files changed, 30 insertions(+), 34 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 59ce741dfa73..52f5bab5fcb7 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -1191,9 +1191,10 @@ static void gmc_v9_0_get_coherence_flags(struct 
amdgpu_device *adev,
bool is_vram = bo->tbo.resource->mem_type == TTM_PL_VRAM;
bool coherent = bo->flags & AMDGPU_GEM_CREATE_COHERENT;
bool uncached = bo->flags & AMDGPU_GEM_CREATE_UNCACHED;
-   unsigned int mtype;
-   unsigned int mtype_default;
+   /* TODO: memory partitions struct amdgpu_vm *vm = 
mapping->bo_va->base.vm;*/
+   unsigned int mtype_local, mtype;
bool snoop = false;
+   bool is_local;
 
switch (adev->ip_versions[GC_HWIP][0]) {
case IP_VERSION(9, 4, 1):
@@ -1233,35 +1234,26 @@ static void gmc_v9_0_get_coherence_flags(struct 
amdgpu_device *adev,
}
break;
case IP_VERSION(9, 4, 3):
-   /* FIXME: Needs more work for handling multiple memory
-* partitions (> NPS1 mode) e.g. NPS4 for both APU and dGPU
-* modes.
-* FIXME: Temporarily using MTYPE_CC instead of MTYPE_RW where 
applicable.
-* To force use of MTYPE_RW, set use_mtype_cc_wa=0
+   /* Only local VRAM BOs or system memory on non-NUMA APUs
+* can be assumed to be local in their entirety. Choose
+* MTYPE_NC as safe fallback for all system memory BOs on
+* NUMA systems. Their MTYPE can be overridden per-page in
+* gmc_v9_0_override_vm_pte_flags.
 */
-   mtype_default = amdgpu_use_mtype_cc_wa ? MTYPE_CC : MTYPE_RW;
+   mtype_local = amdgpu_use_mtype_cc_wa ? MTYPE_CC : MTYPE_RW;
+   is_local = (!is_vram && (adev->flags & AMD_IS_APU) &&
+   num_possible_nodes() <= 1) ||
+  (is_vram && adev == bo_adev /* TODO: memory 
partitions &&
+   bo->mem_id == vm->mem_id*/);
snoop = true;
if (uncached) {
mtype = MTYPE_UC;
-   } else if (adev->gmc.is_app_apu) {
-   /* FIXME: APU in native mode, NPS1 single socket only
-*
-* For suporting NUMA partitioned APU e.g. in NPS4 mode,
-* this need to look at the NUMA node on which the
-* system memory allocation was done.
-*
-* Memory access by a different partition within same
-* socket should be treated as remote access so MTYPE_RW
-* cannot be used always.
-*/
-   mtype = mtype_default;
} else if (adev->flags & AMD_IS_APU) {
-   /* APU on carve out mode */
-   mtype = mtype_default;
+   mtype = is_local ? mtype_local : MTYPE_NC;
} else {
/* dGPU */
-   if (is_vram && bo_adev == adev)
-   mtype = mtype_default;
+   if (is_local)
+   mtype = mtype_local;
else if (is_vram)
mtype = MTYPE_NC;
else
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index c55b9754c506..ab1acf97d049 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1150,6 +1150,7 @@ svm_range_get_pte_flags(struct kfd_node *node,
bool snoop = (domain != SVM_RANGE_VRAM_DOMAIN);
bool coherent = flags & KFD_IOCTL_SVM_FLAG_COHERENT;
bool uncached = flags & KFD_IOCTL_SVM_FLAG_UNCACHED;
+   unsigned int mtype_local;
 
if (domain == SVM_RANGE_VRAM_DOMAIN)
bo_node = prange->svm_bo->node;
@@ -1190,19 +1191,16 @@ svm_range_get_pte_flags(struct kfd_node *node,
}
break;
case IP_VERSION(9, 4, 3):
-   //TODO: Need more work for handling multiple memory partitions
-   //e.g. NPS4. 

[PATCH 4/6] drm/amdgpu: Override MTYPE per page on GFXv9.4.3 APUs

2023-05-10 Thread Alex Deucher
From: Felix Kuehling 

On GFXv9.4.3 NUMA APUs, system memory locality must be determined per
page to choose the correct MTYPE. This patch adds a GMC callback that
can provide this per-page override and implements it for native mode.

Carve-out mode is not yet supported and will use the safe default
(remote) MTYPE for system memory.

Signed-off-by: Felix Kuehling 
Reviewed-by: Philip Yang 
Reviewed-and-tested-by: Rajneesh Bhardwaj 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h   |  7 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 22 ++--
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 64 +++
 3 files changed, 90 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
index 43357d699e6e..6794edd1d2d2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
@@ -148,6 +148,10 @@ struct amdgpu_gmc_funcs {
void (*get_vm_pte)(struct amdgpu_device *adev,
   struct amdgpu_bo_va_mapping *mapping,
   uint64_t *flags);
+   /* override per-page pte flags */
+   void (*override_vm_pte_flags)(struct amdgpu_device *dev,
+ struct amdgpu_vm *vm,
+ uint64_t addr, uint64_t *flags);
/* get the amount of memory used by the vbios for pre-OS console */
unsigned int (*get_vbios_fb_size)(struct amdgpu_device *adev);
 
@@ -336,6 +340,9 @@ struct amdgpu_gmc {
 #define amdgpu_gmc_map_mtype(adev, flags) 
(adev)->gmc.gmc_funcs->map_mtype((adev),(flags))
 #define amdgpu_gmc_get_vm_pde(adev, level, dst, flags) 
(adev)->gmc.gmc_funcs->get_vm_pde((adev), (level), (dst), (flags))
 #define amdgpu_gmc_get_vm_pte(adev, mapping, flags) 
(adev)->gmc.gmc_funcs->get_vm_pte((adev), (mapping), (flags))
+#define amdgpu_gmc_override_vm_pte_flags(adev, vm, addr, pte_flags)\
+   (adev)->gmc.gmc_funcs->override_vm_pte_flags\
+   ((adev), (vm), (addr), (pte_flags))
 #define amdgpu_gmc_get_vbios_fb_size(adev) 
(adev)->gmc.gmc_funcs->get_vbios_fb_size((adev))
 
 /**
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
index bc5d126b600b..60b1da93b06d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
@@ -786,13 +786,14 @@ static void amdgpu_vm_pte_update_flags(struct 
amdgpu_vm_update_params *params,
   uint64_t pe, uint64_t addr,
   unsigned int count, uint32_t incr,
   uint64_t flags)
-
 {
+   struct amdgpu_device *adev = params->adev;
+
if (level != AMDGPU_VM_PTB) {
flags |= AMDGPU_PDE_PTE;
-   amdgpu_gmc_get_vm_pde(params->adev, level, &addr, &flags);
+   amdgpu_gmc_get_vm_pde(adev, level, &addr, &flags);
 
-   } else if (params->adev->asic_type >= CHIP_VEGA10 &&
+   } else if (adev->asic_type >= CHIP_VEGA10 &&
   !(flags & AMDGPU_PTE_VALID) &&
   !(flags & AMDGPU_PTE_PRT)) {
 
@@ -800,6 +801,21 @@ static void amdgpu_vm_pte_update_flags(struct 
amdgpu_vm_update_params *params,
flags |= AMDGPU_PTE_EXECUTABLE;
}
 
+   /* APUs mapping system memory may need different MTYPEs on different
+* NUMA nodes. Only do this for contiguous ranges that can be assumed
+* to be on the same NUMA node.
+*/
+   if ((flags & AMDGPU_PTE_SYSTEM) && (adev->flags & AMD_IS_APU) &&
+   adev->gmc.gmc_funcs->override_vm_pte_flags &&
+   num_possible_nodes() > 1) {
+   if (!params->pages_addr)
+   amdgpu_gmc_override_vm_pte_flags(adev, params->vm,
+addr, &flags);
+   else
+   dev_dbg(adev->dev,
+   "override_vm_pte_flags skipped: 
non-contiguous\n");
+   }
+
params->vm->update_funcs->update(params, pt, pe, addr, count, incr,
 flags);
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 52f5bab5fcb7..5c9f0169292e 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -1302,6 +1302,69 @@ static void gmc_v9_0_get_vm_pte(struct amdgpu_device 
*adev,
 mapping, flags);
 }
 
+static void gmc_v9_0_override_vm_pte_flags(struct amdgpu_device *adev,
+  struct amdgpu_vm *vm,
+  uint64_t addr, uint64_t *flags)
+{
+   int local_node, nid;
+
+   /* Only GFX 9.4.3 APUs associate GPUs with NUMA nodes. Local system
+* memory can use more efficient MTYPEs.
+*/
+   if 
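The remainder of gmc_v9_0_override_vm_pte_flags is cut off above. In outline,
the per-page override maps the page's system address to its NUMA node and
only upgrades the MTYPE when that node matches the GPU's local node, as the
nid == local_node check visible in patch 5/6 suggests. A sketch under those
assumptions, not the actual hunk:

	/* Sketch only: the quoted hunk is truncated. mtype_local follows
	 * the same selection as in gmc_v9_0_get_coherence_flags. */
	nid = pfn_to_nid(addr >> PAGE_SHIFT);
	if (nid == local_node)
		*flags = (*flags & ~AMDGPU_PTE_MTYPE_VG10_MASK) |
			 AMDGPU_PTE_MTYPE_VG10(mtype_local);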

[PATCH 2/6] drm/amdgpu/bu: Add use_mtype_cc_wa module param

2023-05-10 Thread Alex Deucher
From: Graham Sider 

Set use_mtype_cc_wa to 1 by default so the PTE coherence flag is MTYPE_CC
instead of MTYPE_RW. This is required for the time being to
mitigate a bug causing XCCs to hit stale data due to TCC marking fully
dirty lines as exclusive.

Signed-off-by: Graham Sider 
Reviewed-by: Joseph Greathouse 
Reviewed-by: Felix Kuehling 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |  7 +++
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c   | 10 +++---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c|  7 +--
 4 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 9904ce78b8fc..a3a0dbeb251f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -213,6 +213,7 @@ extern int amdgpu_noretry;
 extern int amdgpu_force_asic_type;
 extern int amdgpu_smartshift_bias;
 extern int amdgpu_use_xgmi_p2p;
+extern bool amdgpu_use_mtype_cc_wa;
 #ifdef CONFIG_HSA_AMD
 extern int sched_policy;
 extern bool debug_evictions;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index e4d09bf0887d..2f38c49aa597 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -831,6 +831,13 @@ MODULE_PARM_DESC(no_queue_eviction_on_vm_fault, "No queue 
eviction on VM fault (
 module_param_named(no_queue_eviction_on_vm_fault, 
amdgpu_no_queue_eviction_on_vm_fault, int, 0444);
 #endif
 
+/**
+ * DOC: use_mtype_cc_wa (bool)
+ */
+bool amdgpu_use_mtype_cc_wa = true;
+MODULE_PARM_DESC(use_mtype_cc_wa, "Use MTYPE_CC workaround (0 = use MTYPE_RW 
where applicable, 1 = use MTYPE_CC where applicable (default))");
+module_param_named(use_mtype_cc_wa, amdgpu_use_mtype_cc_wa, bool, 0444);
+
 /**
  * DOC: pcie_p2p (bool)
  * Enable PCIe P2P (requires large-BAR). Default value: true (on)
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index d28ffdb07ae6..59ce741dfa73 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -1192,6 +1192,7 @@ static void gmc_v9_0_get_coherence_flags(struct 
amdgpu_device *adev,
bool coherent = bo->flags & AMDGPU_GEM_CREATE_COHERENT;
bool uncached = bo->flags & AMDGPU_GEM_CREATE_UNCACHED;
unsigned int mtype;
+   unsigned int mtype_default;
bool snoop = false;
 
switch (adev->ip_versions[GC_HWIP][0]) {
@@ -1235,7 +1236,10 @@ static void gmc_v9_0_get_coherence_flags(struct 
amdgpu_device *adev,
/* FIXME: Needs more work for handling multiple memory
 * partitions (> NPS1 mode) e.g. NPS4 for both APU and dGPU
 * modes.
+* FIXME: Temporarily using MTYPE_CC instead of MTYPE_RW where 
applicable.
+* To force use of MTYPE_RW, set use_mtype_cc_wa=0
 */
+   mtype_default = amdgpu_use_mtype_cc_wa ? MTYPE_CC : MTYPE_RW;
snoop = true;
if (uncached) {
mtype = MTYPE_UC;
@@ -1250,14 +1254,14 @@ static void gmc_v9_0_get_coherence_flags(struct 
amdgpu_device *adev,
 * socket should be treated as remote access so MTYPE_RW
 * cannot be used always.
 */
-   mtype = MTYPE_RW;
+   mtype = mtype_default;
} else if (adev->flags & AMD_IS_APU) {
/* APU on carve out mode */
-   mtype = MTYPE_RW;
+   mtype = mtype_default;
} else {
/* dGPU */
if (is_vram && bo_adev == adev)
-   mtype = MTYPE_RW;
+   mtype = mtype_default;
else if (is_vram)
mtype = MTYPE_NC;
else
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 83f8e4e50315..c55b9754c506 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1197,9 +1197,12 @@ svm_range_get_pte_flags(struct kfd_node *node,
if (uncached) {
mapping_flags |= AMDGPU_VM_MTYPE_UC;
} else if (domain == SVM_RANGE_VRAM_DOMAIN) {
-   /* local HBM region close to partition */
+   /* local HBM region close to partition
+* FIXME: Temporarily using MTYPE_CC instead of 
MTYPE_RW where applicable.
+* To force use of MTYPE_RW, set use_mtype_cc_wa=0
+*/
if (bo_node == node)
-   mapping_flags |= AMDGPU_VM_MTYPE_RW;
+   mapping_flags |= 
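
A usage note on the new parameter above: use_mtype_cc_wa is a 0444
module parameter, so it can only be set at module load time, for example
on the kernel command line or via modprobe (illustrative commands):

  amdgpu.use_mtype_cc_wa=0                           # kernel command line
  sudo modprobe amdgpu use_mtype_cc_wa=0             # manual module load
  cat /sys/module/amdgpu/parameters/use_mtype_cc_wa  # verify active value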

[PATCH 1/6] drm/amdgpu/bu: Use legacy TLB flush for gfx943

2023-05-10 Thread Alex Deucher
From: Graham Sider 

Invalidate TLBs via a legacy flush request (flush_type=0) prior to the
heavyweight flush requests (flush_type=2) in gmc_v9_0.c. This is
temporarily required to mitigate a bug causing CPC UTCL1 to return stale
translations after invalidation requests in address range mode.

Signed-off-by: Graham Sider 
Reviewed-by: Philip Yang 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index f000e0e89bd0..d28ffdb07ae6 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -833,6 +833,14 @@ static void gmc_v9_0_flush_gpu_tlb(struct amdgpu_device 
*adev, uint32_t vmid,
 */
inv_req = gmc_v9_0_get_invalidate_req(vmid, 2);
inv_req2 = gmc_v9_0_get_invalidate_req(vmid, flush_type);
+   } else if (flush_type == 2 &&
+  adev->ip_versions[GC_HWIP][0] == IP_VERSION(9, 4, 3)) {
+   /* FIXME: Temporarily add a legacy flush (type 0) before 
heavyweight
+* flush for gfx943 to mitigate a bug which causes CPC UTCL1 to 
return
+* stale translations even after TLB heavyweight flush.
+*/
+   inv_req = gmc_v9_0_get_invalidate_req(vmid, 0);
+   inv_req2 = gmc_v9_0_get_invalidate_req(vmid, flush_type);
} else {
inv_req = gmc_v9_0_get_invalidate_req(vmid, flush_type);
inv_req2 = 0;
@@ -976,6 +984,15 @@ static int gmc_v9_0_flush_gpu_tlb_pasid(struct 
amdgpu_device *adev,
if (vega20_xgmi_wa)
kiq->pmf->kiq_invalidate_tlbs(ring,
  pasid, 2, all_hub);
+
+   /* FIXME: Temporarily add a legacy flush (type 0) before 
heavyweight
+* flush for gfx943 to mitigate a bug which causes CPC UTCL1 to 
return
+* stale translations even after TLB heavyweight flush.
+*/
+   if (flush_type == 2 && adev->ip_versions[GC_HWIP][0] == 
IP_VERSION(9, 4, 3))
+   kiq->pmf->kiq_invalidate_tlbs(ring,
+   pasid, 0, all_hub);
+
kiq->pmf->kiq_invalidate_tlbs(ring,
pasid, flush_type, all_hub);
r = amdgpu_fence_emit_polling(ring, &fence, MAX_KIQ_REG_WAIT);
-- 
2.40.1



[linux-next:master] BUILD SUCCESS WITH WARNING 578215f3e21c472c08d70b8796edf1ac58f88578

2023-05-10 Thread kernel test robot
tree/branch: 
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
branch HEAD: 578215f3e21c472c08d70b8796edf1ac58f88578  Add linux-next specific 
files for 20230510

Warning reports:

https://lore.kernel.org/oe-kbuild-all/202304140707.coh337ux-...@intel.com

Warning: (recently discovered and may have been fixed)

drivers/base/regmap/regcache-maple.c:113:23: warning: 'lower_index' is used 
uninitialized [-Wuninitialized]
drivers/base/regmap/regcache-maple.c:113:36: warning: 'lower_last' is used 
uninitialized [-Wuninitialized]
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:6395:21: warning: 
variable 'count' set but not used [-Wunused-but-set-variable]
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:499:13: warning: variable 'j' set but 
not used [-Wunused-but-set-variable]
drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c:48:38: warning: unused variable 
'golden_settings_gc_9_4_3' [-Wunused-const-variable]

Unverified Warning (likely false positive, please contact us if interested):

drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:648:3-9: preceding lock on line 640
drivers/gpu/drm/i915/display/intel_psr.c:2999:0-23: WARNING: 
i915_edp_psr_debug_fops should be defined with DEFINE_DEBUGFS_ATTRIBUTE
fs/ext4/super.c:4724 ext4_check_feature_compatibility() warn: bitwise AND 
condition is false here
fs/ext4/verity.c:316 ext4_get_verity_descriptor_location() error: uninitialized 
symbol 'desc_size_disk'.
fs/xfs/scrub/fscounters.c:459 xchk_fscounters() warn: ignoring unreachable code.

Warning ids grouped by kconfigs:

gcc_recent_errors
|-- alpha-allyesconfig
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-amdgpu_dm-amdgpu_dm.c:warning:variable-count-set-but-not-used
|   `-- 
drivers-gpu-drm-amd-amdgpu-amdgpu_gfx.c:warning:variable-j-set-but-not-used
|-- arc-allyesconfig
|   |-- 
drivers-base-regmap-regcache-maple.c:warning:lower_index-is-used-uninitialized
|   |-- 
drivers-base-regmap-regcache-maple.c:warning:lower_last-is-used-uninitialized
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-amdgpu_dm-amdgpu_dm.c:warning:variable-count-set-but-not-used
|   `-- 
drivers-gpu-drm-amd-amdgpu-amdgpu_gfx.c:warning:variable-j-set-but-not-used
|-- arc-randconfig-r025-20230509
|   |-- 
drivers-base-regmap-regcache-maple.c:warning:lower_index-is-used-uninitialized
|   `-- 
drivers-base-regmap-regcache-maple.c:warning:lower_last-is-used-uninitialized
|-- arm-allmodconfig
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-amdgpu_dm-amdgpu_dm.c:warning:variable-count-set-but-not-used
|   `-- 
drivers-gpu-drm-amd-amdgpu-amdgpu_gfx.c:warning:variable-j-set-but-not-used
|-- arm-allyesconfig
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-amdgpu_dm-amdgpu_dm.c:warning:variable-count-set-but-not-used
|   `-- 
drivers-gpu-drm-amd-amdgpu-amdgpu_gfx.c:warning:variable-j-set-but-not-used
|-- arm64-allyesconfig
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-amdgpu_dm-amdgpu_dm.c:warning:variable-count-set-but-not-used
|   `-- 
drivers-gpu-drm-amd-amdgpu-amdgpu_gfx.c:warning:variable-j-set-but-not-used
|-- csky-allmodconfig
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-amdgpu_dm-amdgpu_dm.c:warning:variable-count-set-but-not-used
|   `-- 
drivers-gpu-drm-amd-amdgpu-amdgpu_gfx.c:warning:variable-j-set-but-not-used
|-- i386-allyesconfig
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-amdgpu_dm-amdgpu_dm.c:warning:variable-count-set-but-not-used
|   `-- 
drivers-gpu-drm-amd-amdgpu-amdgpu_gfx.c:warning:variable-j-set-but-not-used
|-- ia64-allmodconfig
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-amdgpu_dm-amdgpu_dm.c:warning:variable-count-set-but-not-used
|   `-- 
drivers-gpu-drm-amd-amdgpu-amdgpu_gfx.c:warning:variable-j-set-but-not-used
|-- ia64-randconfig-s052-20230509
|   `-- 
drivers-gpu-drm-amd-amdgpu-amdgpu_gfx.c:warning:variable-j-set-but-not-used
|-- loongarch-allmodconfig
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-amdgpu_dm-amdgpu_dm.c:warning:variable-count-set-but-not-used
|   `-- 
drivers-gpu-drm-amd-amdgpu-amdgpu_gfx.c:warning:variable-j-set-but-not-used
|-- loongarch-defconfig
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-amdgpu_dm-amdgpu_dm.c:warning:variable-count-set-but-not-used
|   `-- 
drivers-gpu-drm-amd-amdgpu-amdgpu_gfx.c:warning:variable-j-set-but-not-used
|-- loongarch-randconfig-c023-20230509
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-amdgpu_dm-amdgpu_dm.c:warning:variable-count-set-but-not-used
|   `-- 
drivers-gpu-drm-amd-amdgpu-amdgpu_gfx.c:warning:variable-j-set-but-not-used
|-- loongarch-randconfig-s051-20230509
|   `-- 
drivers-gpu-drm-amd-amdgpu-amdgpu_gfx.c:warning:variable-j-set-but-not-used
|-- microblaze-randconfig-m031-20230509
|   `-- 
fs-ext4-super.c-ext4_check_feature_compatibility()-warn:bitwise-AND-condition-is-false-here
|-- microblaze-randconfig-r035-20230509
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-amdgpu_dm-amdgpu_dm.c:warning:variable-count-set-but-not-used
|   `-- 
drivers-gpu-drm-amd-amdgpu-amdgpu_gfx.c:warning:variable-j-set-but-not-used
|-- microblaze-randconfig-s032

Re: [PATCH] drm/amdkfd: Remove skipping userptr buffer mapping when mmu notifier marks it as invalid

2023-05-10 Thread Alex Deucher
On Wed, May 10, 2023 at 11:00 AM Felix Kuehling  wrote:
>
> On 2023-05-09 at 18:17, Alex Deucher wrote:
> > From: Xiaogang Chen 
> >
> > mmu notifier does not always hold mm->sem during callback. That causes
> > a race condition between kfd userptr buffer mapping and the mmu notifier
> > which leads to a gpu shader or SDMA accessing the userptr buffer before
> > it has been mapped to the gpu VM. Always map the userptr buffer to avoid
> > that, though it may make some userptr buffers mapped two times.
> >
> > Suggested-by: Felix Kuehling 
> > Signed-off-by: Xiaogang Chen 
> > Reviewed-by: Felix Kuehling 
> > Signed-off-by: Alex Deucher 
>
> This patch is no longer needed and should not be applied. It was
> originally applied to amd-staging-drm-next as patch
> fcf00f8d29f2fc6bf00531a1447be28b99073cc3 in November 2022. This fixed a
> race condition due to incorrect assumptions about the mmap lock and MMU
> notifiers. This hunk was added back by my later patch f95f51a4c335
> ("drm/amdgpu: Add notifier lock for KFD userptrs") in December, using
> our own notifier lock that doesn't suffer from those races.
>

Thanks.  Dropped.

Alex

> Regards,
>Felix
>
>
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 10 --
> >   1 file changed, 10 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> > index 58a774647573..40078c0a5585 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> > @@ -1942,16 +1942,6 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
> >*/
> >   mutex_lock(&mem->process_info->lock);
> >
> > - /* Lock notifier lock. If we find an invalid userptr BO, we can be
> > -  * sure that the MMU notifier is no longer running
> > -  * concurrently and the queues are actually stopped
> > -  */
> > - if (amdgpu_ttm_tt_get_usermm(bo->tbo.ttm)) {
> > - mutex_lock(&mem->process_info->notifier_lock);
> > - is_invalid_userptr = !!mem->invalid;
> > - mutex_unlock(&mem->process_info->notifier_lock);
> > - }
> > -
> >   mutex_lock(&mem->lock);
> >
> >   domain = mem->domain;


Re: [PATCH] drm/amdkfd: Remove skipping userptr buffer mapping when mmu notifier marks it as invalid

2023-05-10 Thread Felix Kuehling

On 2023-05-09 at 18:17, Alex Deucher wrote:

From: Xiaogang Chen 

mmu notifier does not always hold mm->sem during callback. That causes
a race condition between kfd userptr buffer mapping and the mmu notifier
which leads to a gpu shader or SDMA accessing the userptr buffer before
it has been mapped to the gpu VM. Always map the userptr buffer to avoid
that, though it may make some userptr buffers mapped two times.

Suggested-by: Felix Kuehling 
Signed-off-by: Xiaogang Chen 
Reviewed-by: Felix Kuehling 
Signed-off-by: Alex Deucher 


This patch is no longer needed and should not be applied. It was 
originally applied to amd-staging-drm-next as patch 
fcf00f8d29f2fc6bf00531a1447be28b99073cc3 in November 2022. This fixed a 
race condition due to incorrect assumptions about the mmap lock and MMU 
notifiers. This hunk was added back by my later patch f95f51a4c335 
("drm/amdgpu: Add notifier lock for KFD userptrs") in December, using 
our own notifier lock that doesn't suffer from those races.


Regards,
  Felix



---
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 10 --
  1 file changed, 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 58a774647573..40078c0a5585 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1942,16 +1942,6 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
 */
	mutex_lock(&mem->process_info->lock);
  
-	/* Lock notifier lock. If we find an invalid userptr BO, we can be

-* sure that the MMU notifier is no longer running
-* concurrently and the queues are actually stopped
-*/
-   if (amdgpu_ttm_tt_get_usermm(bo->tbo.ttm)) {
-   mutex_lock(&mem->process_info->notifier_lock);
-   is_invalid_userptr = !!mem->invalid;
-   mutex_unlock(&mem->process_info->notifier_lock);
-   }
-
	mutex_lock(&mem->lock);
  
  	domain = mem->domain;


Re: [RFC PATCH 0/4] Add support for DRM cgroup memory accounting.

2023-05-10 Thread Maarten Lankhorst

Hey,

On 2023-05-05 21:50, Tejun Heo wrote:

Hello,

On Wed, May 03, 2023 at 10:34:56AM +0200, Maarten Lankhorst wrote:

RFC as I'm looking for comments.

For long running compute, it can be beneficial to partition the GPU memory
between cgroups, so each cgroup can use its maximum amount of memory without
interfering with other scheduled jobs. Done properly, this can alleviate the
need for eviction, which might result in a job being terminated if the GPU
doesn't support mid-thread preemption or recoverable page faults.

This is done by adding a bunch of knobs to cgroup:
drm.capacity: Shows maximum capacity of each resource region.
drm.max: Display or limit max amount of memory.
drm.current: Current amount of memory in use.

TTM has not been made cgroup aware yet, so instead of evicting from
the current cgroup to stay within the cgroup limits, it simply returns
the error -ENOSPC to userspace.

I've used Tvrtko's cgroup controller series as a base, but it implemented
scheduling weight, not memory accounting, so I only ended up keeping the
base patch.

Xe is not upstream yet, so the driver specific patch will only apply on
https://gitlab.freedesktop.org/drm/xe/kernel

Some high-level feedbacks.

* There have been multiple attempts at this but the track record is kinda
   poor. People don't seem to agree what should constitute DRM memory and how
   they should be accounted / controlled.


Thanks for the feedback.

I think what counts as VRAM might mean something different for each
driver, but the intention is that it is accounted in the same way. Most
drivers use TTM, which has a standard way of allocating memory, and a
standard way of evicting VRAM.

This makes it very useful for the use case I'm looking at: long running
compute. When you have long running jobs, you don't want them to be
interrupted because a completely unrelated process needs some VRAM and
one of the compute job's buffers gets evicted.

Some hardware does not support mid-thread preemption or page fault
recovery; this means that when memory is evicted, the compute job is
terminated.

The full problem statement is in drm-compute.rst in the memory accounting patch.


* I like Tvrtko's scheduling patchset because it exposes a generic interface
   which makes sense regardless of hardware details and then each driver can
   implement the configured control in whatever way they can. However, even
   for that, there doesn't seem much buy-in from other drivers.


Yeah, that is correct. But it tries to solve a different part of the problem.


* This proposal seems narrowly scoped trying to solve a specific problem
   which may not translate to different hardware configurations. Please let
   me know if I got that wrong, but if that's the case, I think a better and
   easier approach might be just being a part of the misc controller. That
   doesn't require much extra code and should be able to provide everything
   necessary for statically limiting specific resources.


The misc controller is not granular enough. A single computer may have any 
number of
graphics cards, some of them with multiple regions of vram inside a single card.

For compute and shared hosting you might want to limit the usage of a
single memory region on a single card, and then set the same limits for
the rest too, to prevent triggering eviction.
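
For example, with the knobs from the cover letter, capping one job's
VRAM on one region could look like this (a sketch; the exact
device/region key syntax is an assumption, see the RFC for the
authoritative format):

  mkdir /sys/fs/cgroup/compute-job0
  echo "card0/vram0 8G" > /sys/fs/cgroup/compute-job0/drm.max
  cat /sys/fs/cgroup/compute-job0/drm.current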

The current version doesn't handle eviction correctly, because I was
still working on it and I wanted to post an RFC. As a result, the case
where a resource limit is hit will evict the device's entire memory or
get stuck in a loop. With some changes, the next version will not have
this bug. This results in a few changes to the core code. [1]

In the next version, I will move all the code for handling the resource limit to
TTM's eviction layer, because otherwise it cannot handle the resource limit 
correctly.

The effect of moving the code to TTM is that it will make the code even
more generic for drivers that have vram and use TTM. When using TTM, you
only have to describe your VRAM, update some fields in the TTM manager
and (un)register your device with the cgroup handler on (un)load. It's
quite trivial to add vram accounting to amdgpu and nouveau. [2]
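
To illustrate that driver-side hookup, a sketch of what the load path
might look like; apart from struct drmcgroup_device, which appears in
the footnote [1] below, the registration call and region fields are
hypothetical names for this RFC, not an existing API:

	/* Hypothetical registration sketch based on the RFC. */
	static struct drmcgroup_device my_cgdev;

	static int my_driver_load(struct my_driver_device *mdev)
	{
		/* Describe one VRAM region so the cgroup controller can
		 * report it in drm.capacity and police drm.max against it.
		 */
		my_cgdev.regions[0].name = "vram0";          /* assumption */
		my_cgdev.regions[0].size = mdev->vram_size;  /* assumption */

		return drmcg_register_device(&mdev->drm, &my_cgdev);
	}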

If you want to add a knob for scheduling weight for a process, it makes
sense to also add resource usage as a knob; otherwise the effect of that
knob is very limited. So even for Tvrtko's original proposed use case,
it would make sense.

Cheers,
~Maarten


[1] Compared to this version:
 static inline int drmcg_try_charge(struct drmcgroup_state **drmcs,
+  struct drmcgroup_state **limitcs,
   struct drmcgroup_device *cgdev,
   u32 index, u64 size)

This now returns which cgroup's limit is hit on -EAGAIN.

+bool drmcs_grouped(struct drmcgroup_state *limitcs,
+  struct drmcgroup_state *testcs);
Tells if testcs is the same as limitcs, or a subgroup 

Re: [PATCH] drm/sched: Check scheduler work queue before calling timeout handling

2023-05-10 Thread Luben Tuikov
On 2023-05-10 10:24, vitaly prosyak wrote:
> 
> On 2023-05-10 10:19, Luben Tuikov wrote:
>> On 2023-05-10 09:51, vitaly.pros...@amd.com wrote:
>>> From: Vitaly Prosyak 
>>>
>>> During an IGT GPU reset test we still see an oops, despite
>>> commit 0c8c901aaaebc9 ("drm/sched: Check scheduler ready before calling
>>> timeout handling").
>>>
>>> That commit uses the ready condition to decide whether to call
>>> drm_sched_fault, which starts the TDR that leads to a GPU reset.
>>> However, it looks like the ready condition is overloaded with other
>>> meanings; for example, the following stack, which is related to GPU
>>> reset:
>>>
>>> 0  gfx_v9_0_cp_gfx_start
>>> 1  gfx_v9_0_cp_gfx_resume
>>> 2  gfx_v9_0_cp_resume
>>> 3  gfx_v9_0_hw_init
>>> 4  gfx_v9_0_resume
>>> 5  amdgpu_device_ip_resume_phase2
>>>
>>> does the following:
>>> /* start the ring */
>>> gfx_v9_0_cp_gfx_start(adev);
>>> ring->sched.ready = true;
>>>
>>> The same approach is used for other ASICs as well:
>>> gfx_v8_0_cp_gfx_resume
>>> gfx_v10_0_kiq_resume, etc...
>>>
>>> As a result, our GPU reset test causes a GPU fault which
>>> unconditionally calls gfx_v9_0_fault and then drm_sched_fault.
>>> Whether an oops occurs now depends on timing: if the interrupt
>>> service routine drm_sched_fault runs after gfx_v9_0_cp_gfx_start has
>>> completed, the ready field of the scheduler is already true even for
>>> uninitialized schedulers and we oops; if the ISR completes before
>>> gfx_v9_0_cp_gfx_start, there is no fault and no NULL pointer
>>> dereference.
>>>
>>> Use the timeout_wq field to prevent the oops for uninitialized
>>> schedulers. The field may be initialized with the work queue of the
>>> reset domain.
>>>
>>> Fixes: 0c8c901aaaebc9 ("drm/sched: Check scheduler ready before calling 
>>> timeout handling")
>>>
>>> v1: Corrections to commit message (Luben)
>>> Signed-off-by: Vitaly Prosyak 
>>> Reviewed-by: Luben Tuikov 
>> I didn't give my RB to this patch so I'm not sure what it is doing here.
> I removed your rb; also, if you do not know what it is doing here, why do
> you want to push this to amd-staging-drm-next and to drm-misc-fixed?

I'll add my RB as I push it to those two branches.
I'll also add a Link tag and fix the commit SHA for the Fixes tag to
one which is found in drm-misc-fixes.

Thanks for the patch fixing this long-standing bug.

Regards,
Luben


>>
>> The fixes tag should be before the SOB tag, and the v1 line should be 
>> separated
>> by a line before the Git tags.
>>
>> Since this is a good patch and I want it in both drm-misc-fixed and 
>> amd-staging-drm-next,
>> I'll submit it to drm-misc-fixed with a Link: and RB/SOB tag there and then 
>> cherry-pick
>> that into amd-staging-drm-next.
>>
>> Don't push it to amd-staging-drm-next.
>>
>> I'll fix this and submit to amd-staging-drm-next and to drm-misc-fixed with
>> a Link: tag.
>>
>> Regards,
>> Luben
>>
>>
>>> ---
>>>  drivers/gpu/drm/scheduler/sched_main.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
>>> b/drivers/gpu/drm/scheduler/sched_main.c
>>> index 649fac2e1ccb..670b7997f389 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>> @@ -308,7 +308,7 @@ static void drm_sched_start_timeout(struct 
>>> drm_gpu_scheduler *sched)
>>>   */
>>>  void drm_sched_fault(struct drm_gpu_scheduler *sched)
>>>  {
>>> -   if (sched->ready)
>>> +   if (sched->timeout_wq)
>>> mod_delayed_work(sched->timeout_wq, &sched->work_tdr, 0);
>>>  }
>>>  EXPORT_SYMBOL(drm_sched_fault);



Re: [PATCH] drm/sched: Check scheduler work queue before calling timeout handling

2023-05-10 Thread vitaly prosyak


On 2023-05-10 10:19, Luben Tuikov wrote:
> On 2023-05-10 09:51, vitaly.pros...@amd.com wrote:
>> From: Vitaly Prosyak 
>>
>> During an IGT GPU reset test we still see an oops, despite
>> commit 0c8c901aaaebc9 ("drm/sched: Check scheduler ready before calling
>> timeout handling").
>>
>> That commit uses the ready condition to decide whether to call
>> drm_sched_fault, which starts the TDR that leads to a GPU reset.
>> However, it looks like the ready condition is overloaded with other
>> meanings; for example, the following stack, which is related to GPU
>> reset:
>>
>> 0  gfx_v9_0_cp_gfx_start
>> 1  gfx_v9_0_cp_gfx_resume
>> 2  gfx_v9_0_cp_resume
>> 3  gfx_v9_0_hw_init
>> 4  gfx_v9_0_resume
>> 5  amdgpu_device_ip_resume_phase2
>>
>> does the following:
>>  /* start the ring */
>>  gfx_v9_0_cp_gfx_start(adev);
>>  ring->sched.ready = true;
>>
>> The same approach is used for other ASICs as well:
>> gfx_v8_0_cp_gfx_resume
>> gfx_v10_0_kiq_resume, etc...
>>
>> As a result, our GPU reset test causes a GPU fault which
>> unconditionally calls gfx_v9_0_fault and then drm_sched_fault.
>> Whether an oops occurs now depends on timing: if the interrupt
>> service routine drm_sched_fault runs after gfx_v9_0_cp_gfx_start has
>> completed, the ready field of the scheduler is already true even for
>> uninitialized schedulers and we oops; if the ISR completes before
>> gfx_v9_0_cp_gfx_start, there is no fault and no NULL pointer
>> dereference.
>>
>> Use the timeout_wq field to prevent the oops for uninitialized
>> schedulers. The field may be initialized with the work queue of the
>> reset domain.
>>
>> Fixes: 0c8c901aaaebc9 ("drm/sched: Check scheduler ready before calling 
>> timeout handling")
>>
>> v1: Corrections to commit message (Luben)
>> Signed-off-by: Vitaly Prosyak 
>> Reviewed-by: Luben Tuikov 
> I didn't give my RB to this patch so I'm not sure what it is doing here.
I removed your rb; also, if you do not know what it is doing here, why do
you want to push this to amd-staging-drm-next and to drm-misc-fixed?
>
> The fixes tag should be before the SOB tag, and the v1 line should be 
> separated
> by a line before the Git tags.
>
> Since this is a good patch and I want it in both drm-misc-fixed and 
> amd-staging-drm-next,
> I'll submit it to drm-misc-fixed with a Link: and RB/SOB tag there and then 
> cherry-pick
> that into amd-staging-drm-next.
>
> Don't push it to amd-staging-drm-next.
>
> I'll fix this and submit to amd-staging-drm-next and to drm-misc-fixed with
> a Link: tag.
>
> Regards,
> Luben
>
>
>> ---
>>  drivers/gpu/drm/scheduler/sched_main.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
>> b/drivers/gpu/drm/scheduler/sched_main.c
>> index 649fac2e1ccb..670b7997f389 100644
>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>> @@ -308,7 +308,7 @@ static void drm_sched_start_timeout(struct 
>> drm_gpu_scheduler *sched)
>>   */
>>  void drm_sched_fault(struct drm_gpu_scheduler *sched)
>>  {
>> -if (sched->ready)
>> +if (sched->timeout_wq)
>>  mod_delayed_work(sched->timeout_wq, &sched->work_tdr, 0);
>>  }
>>  EXPORT_SYMBOL(drm_sched_fault);


Re: [PATCH] drm/sched: Check scheduler work queue before calling timeout handling

2023-05-10 Thread Luben Tuikov
On 2023-05-10 09:51, vitaly.pros...@amd.com wrote:
> From: Vitaly Prosyak 
> 
> During an IGT GPU reset test we still see an oops, despite
> commit 0c8c901aaaebc9 ("drm/sched: Check scheduler ready before calling
> timeout handling").
> 
> That commit uses the ready condition to decide whether to call
> drm_sched_fault, which starts the TDR that leads to a GPU reset.
> However, it looks like the ready condition is overloaded with other
> meanings; for example, the following stack, which is related to GPU
> reset:
> 
> 0  gfx_v9_0_cp_gfx_start
> 1  gfx_v9_0_cp_gfx_resume
> 2  gfx_v9_0_cp_resume
> 3  gfx_v9_0_hw_init
> 4  gfx_v9_0_resume
> 5  amdgpu_device_ip_resume_phase2
> 
> does the following:
>   /* start the ring */
>   gfx_v9_0_cp_gfx_start(adev);
>   ring->sched.ready = true;
> 
> The same approach is used for other ASICs as well:
> gfx_v8_0_cp_gfx_resume
> gfx_v10_0_kiq_resume, etc...
> 
> As a result, our GPU reset test causes a GPU fault which
> unconditionally calls gfx_v9_0_fault and then drm_sched_fault.
> Whether an oops occurs now depends on timing: if the interrupt
> service routine drm_sched_fault runs after gfx_v9_0_cp_gfx_start has
> completed, the ready field of the scheduler is already true even for
> uninitialized schedulers and we oops; if the ISR completes before
> gfx_v9_0_cp_gfx_start, there is no fault and no NULL pointer
> dereference.
> 
> Use the timeout_wq field to prevent the oops for uninitialized
> schedulers. The field may be initialized with the work queue of the
> reset domain.
> 
> Fixes: 0c8c901aaaebc9 ("drm/sched: Check scheduler ready before calling 
> timeout handling")
> 
> v1: Corrections to commit message (Luben)
> Signed-off-by: Vitaly Prosyak 
> Reviewed-by: Luben Tuikov 

I didn't give my RB to this patch so I'm not sure what it is doing here.

The fixes tag should be before the SOB tag, and the v1 line should be separated
by a line before the Git tags.

Since this is a good patch and I want it in both drm-misc-fixed and 
amd-staging-drm-next,
I'll submit it to drm-misc-fixed with a Link: and RB/SOB tag there and then 
cherry-pick
that into amd-staging-drm-next.

Don't push it to amd-staging-drm-next.

I'll fix this and submit to amd-staging-drm-next and to drm-misc-fixed with
a Link: tag.

Regards,
Luben


> ---
>  drivers/gpu/drm/scheduler/sched_main.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
> b/drivers/gpu/drm/scheduler/sched_main.c
> index 649fac2e1ccb..670b7997f389 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -308,7 +308,7 @@ static void drm_sched_start_timeout(struct 
> drm_gpu_scheduler *sched)
>   */
>  void drm_sched_fault(struct drm_gpu_scheduler *sched)
>  {
> - if (sched->ready)
> + if (sched->timeout_wq)
>   mod_delayed_work(sched->timeout_wq, &sched->work_tdr, 0);
>  }
>  EXPORT_SYMBOL(drm_sched_fault);



[PATCH] drm/sched: Check scheduler work queue before calling timeout handling

2023-05-10 Thread vitaly.prosyak
From: Vitaly Prosyak 

During an IGT GPU reset test we still see an oops, despite
commit 0c8c901aaaebc9 ("drm/sched: Check scheduler ready before calling
timeout handling").

That commit uses the ready condition to decide whether to call
drm_sched_fault, which starts the TDR that leads to a GPU reset.
However, it looks like the ready condition is overloaded with other
meanings; for example, the following stack, which is related to GPU
reset:

0  gfx_v9_0_cp_gfx_start
1  gfx_v9_0_cp_gfx_resume
2  gfx_v9_0_cp_resume
3  gfx_v9_0_hw_init
4  gfx_v9_0_resume
5  amdgpu_device_ip_resume_phase2

does the following:
/* start the ring */
gfx_v9_0_cp_gfx_start(adev);
ring->sched.ready = true;

The same approach is used for other ASICs as well:
gfx_v8_0_cp_gfx_resume
gfx_v10_0_kiq_resume, etc...

As a result, our GPU reset test causes a GPU fault which unconditionally
calls gfx_v9_0_fault and then drm_sched_fault. Whether an oops occurs
now depends on timing: if the interrupt service routine drm_sched_fault
runs after gfx_v9_0_cp_gfx_start has completed, the ready field of the
scheduler is already true even for uninitialized schedulers and we oops;
if the ISR completes before gfx_v9_0_cp_gfx_start, there is no fault and
no NULL pointer dereference.

Use the timeout_wq field to prevent the oops for uninitialized
schedulers. The field may be initialized with the work queue of the
reset domain.

Fixes: 0c8c901aaaebc9 ("drm/sched: Check scheduler ready before calling timeout 
handling")

v1: Corrections to commit message (Luben)
Signed-off-by: Vitaly Prosyak 
---
 drivers/gpu/drm/scheduler/sched_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index 649fac2e1ccb..670b7997f389 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -308,7 +308,7 @@ static void drm_sched_start_timeout(struct 
drm_gpu_scheduler *sched)
  */
 void drm_sched_fault(struct drm_gpu_scheduler *sched)
 {
-   if (sched->ready)
+   if (sched->timeout_wq)
mod_delayed_work(sched->timeout_wq, &sched->work_tdr, 0);
 }
 EXPORT_SYMBOL(drm_sched_fault);
-- 
2.25.1
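
To spell out the ordering problem the commit message describes, an
illustrative sketch (not driver code; the init ordering is taken from
the message above):

	/* The scheduler embedded in a ring starts out zero-initialized,
	 * so sched->timeout_wq is NULL until drm_sched_init() runs.
	 */
	static void hw_init_path(struct amdgpu_device *adev,
				 struct amdgpu_ring *ring)
	{
		/* gfx_v9_0_cp_gfx_resume() and friends do this: */
		gfx_v9_0_cp_gfx_start(adev);
		ring->sched.ready = true; /* set without drm_sched_init() */
	}

	/* A fault interrupt can fire at any point after the lines above:
	 *  - old check: sched->ready is true, so mod_delayed_work()
	 *    touches the uninitialized work_tdr -> NULL dereference;
	 *  - new check: sched->timeout_wq is still NULL (only
	 *    drm_sched_init() sets it), so drm_sched_fault() is a no-op.
	 */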



[PATCH] drm/sched: Check scheduler work queue before calling timeout handling

2023-05-10 Thread vitaly.prosyak
From: Vitaly Prosyak 

During an IGT GPU reset test we still see an oops, despite
commit 0c8c901aaaebc9 ("drm/sched: Check scheduler ready before calling
timeout handling").

That commit uses the ready condition to decide whether to call
drm_sched_fault, which starts the TDR that leads to a GPU reset.
However, it looks like the ready condition is overloaded with other
meanings; for example, the following stack, which is related to GPU
reset:

0  gfx_v9_0_cp_gfx_start
1  gfx_v9_0_cp_gfx_resume
2  gfx_v9_0_cp_resume
3  gfx_v9_0_hw_init
4  gfx_v9_0_resume
5  amdgpu_device_ip_resume_phase2

does the following:
/* start the ring */
gfx_v9_0_cp_gfx_start(adev);
ring->sched.ready = true;

The same approach is used for other ASICs as well:
gfx_v8_0_cp_gfx_resume
gfx_v10_0_kiq_resume, etc...

As a result, our GPU reset test causes a GPU fault which unconditionally
calls gfx_v9_0_fault and then drm_sched_fault. Whether an oops occurs
now depends on timing: if the interrupt service routine drm_sched_fault
runs after gfx_v9_0_cp_gfx_start has completed, the ready field of the
scheduler is already true even for uninitialized schedulers and we oops;
if the ISR completes before gfx_v9_0_cp_gfx_start, there is no fault and
no NULL pointer dereference.

Use the timeout_wq field to prevent the oops for uninitialized
schedulers. The field may be initialized with the work queue of the
reset domain.

Fixes: 0c8c901aaaebc9 ("drm/sched: Check scheduler ready before calling timeout 
handling")

v1: Corrections to commit message (Luben)
Signed-off-by: Vitaly Prosyak 
Reviewed-by: Luben Tuikov 
---
 drivers/gpu/drm/scheduler/sched_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index 649fac2e1ccb..670b7997f389 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -308,7 +308,7 @@ static void drm_sched_start_timeout(struct 
drm_gpu_scheduler *sched)
  */
 void drm_sched_fault(struct drm_gpu_scheduler *sched)
 {
-   if (sched->ready)
+   if (sched->timeout_wq)
mod_delayed_work(sched->timeout_wq, &sched->work_tdr, 0);
 }
 EXPORT_SYMBOL(drm_sched_fault);
-- 
2.25.1



Re: [PATCH 10/66] drm/amd/display: Do not set drr on pipe commit

2023-05-10 Thread Michel Dänzer
On 5/9/23 23:07, Pillai, Aurabindo wrote:
> 
> Sorry - the firmware in the previous message is for DCN32. For Navi2x, please 
> use the firmware attached here.

Same problem (contents of /sys/kernel/debug/dri/0/amdgpu_firmware_info below).

Even if it did work with newer FW, the kernel must keep working with older FW, 
so in that case the new behaviour would need to be guarded by the FW version.


VCE feature version: 0, firmware version: 0x
UVD feature version: 0, firmware version: 0x
MC feature version: 0, firmware version: 0x
ME feature version: 44, firmware version: 0x0040
PFP feature version: 44, firmware version: 0x0061
CE feature version: 44, firmware version: 0x0025
RLC feature version: 1, firmware version: 0x0060
RLC SRLC feature version: 0, firmware version: 0x
RLC SRLG feature version: 0, firmware version: 0x
RLC SRLS feature version: 0, firmware version: 0x
RLCP feature version: 0, firmware version: 0x
RLCV feature version: 0, firmware version: 0x
MEC feature version: 44, firmware version: 0x0071
MEC2 feature version: 44, firmware version: 0x0071
IMU feature version: 0, firmware version: 0x
SOS feature version: 0, firmware version: 0x00210c64
ASD feature version: 553648297, firmware version: 0x21a9
TA XGMI feature version: 0x, firmware version: 0x200f
TA RAS feature version: 0x, firmware version: 0x1b00013e
TA HDCP feature version: 0x, firmware version: 0x1738
TA DTM feature version: 0x, firmware version: 0x1215
TA RAP feature version: 0x, firmware version: 0x07000213
TA SECUREDISPLAY feature version: 0x, firmware version: 0x
SMC feature version: 0, program: 0, firmware version: 0x003a5800 (58.88.0)
SDMA0 feature version: 52, firmware version: 0x0053
SDMA1 feature version: 52, firmware version: 0x0053
SDMA2 feature version: 52, firmware version: 0x0053
SDMA3 feature version: 52, firmware version: 0x0053
VCN feature version: 0, firmware version: 0x0211b000
DMCU feature version: 0, firmware version: 0x
DMCUB feature version: 0, firmware version: 0x0202001c
TOC feature version: 0, firmware version: 0x
MES_KIQ feature version: 0, firmware version: 0x
MES feature version: 0, firmware version: 0x
VBIOS version: 113-D4300100-051


--
> *From:* Pillai, Aurabindo 
> *Sent:* Tuesday, May 9, 2023 4:44 PM
> *To:* Michel Dänzer ; Zhuo, Qingqing (Lillian) 
> ; amd-gfx@lists.freedesktop.org 
> ; Chalmers, Wesley 
> *Cc:* Wang, Chao-kai (Stylon) ; Li, Sun peng (Leo) 
> ; Wentland, Harry ; Siqueira, 
> Rodrigo ; Li, Roman ; Chiu, 
> Solomon ; Lin, Wayne ; Lakha, 
> Bhawanpreet ; Gutierrez, Agustin 
> ; Kotarac, Pavle 
> *Subject:* Re: [PATCH 10/66] drm/amd/display: Do not set drr on pipe commit
>  
> Hi Michel,
> 
> Could you please try with the attached firmware package if you see the hang 
> without any reverts?  If you do see hangs, please send dmesg with 
> "drm.debug=0x156 log_buf_len=30M" in the kernel cmdline.
> 
> The attached fw is not released to the public yet, but we will be updating 
> them in linux-firmware tree next week. Please do backup your existing 
> firmware, and put the attached files into /usr/lib/firmware/updates/amdgpu and 
> regenerate your ramdisk. On ubuntu the following should do:
> 
> sudo update-initramfs -u -k `uname -r`
> 
> --
> 
> Regards,
> Jay
> 

Re: [PATCH] drm/amdgpu: change gfx 11.0.4 external_id range

2023-05-10 Thread Alex Deucher
On Wed, May 10, 2023 at 4:38 AM Yifan Zhang  wrote:
>
> gfx 11.0.4 range starts from 0x80.
>
> Fixes: 311d52367d0a ("drm/amdgpu: add soc21 common ip block support for GC 
> 11.0.4")
>
> Signed-off-by: Yifan Zhang 

Acked-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/amdgpu/soc21.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/soc21.c 
> b/drivers/gpu/drm/amd/amdgpu/soc21.c
> index 0f82b8e83acb..6bff936a6e55 100644
> --- a/drivers/gpu/drm/amd/amdgpu/soc21.c
> +++ b/drivers/gpu/drm/amd/amdgpu/soc21.c
> @@ -711,7 +711,7 @@ static int soc21_common_early_init(void *handle)
> AMD_PG_SUPPORT_VCN_DPG |
> AMD_PG_SUPPORT_GFX_PG |
> AMD_PG_SUPPORT_JPEG;
> -   adev->external_rev_id = adev->rev_id + 0x1;
> +   adev->external_rev_id = adev->rev_id + 0x80;
> break;
>
> default:
> --
> 2.37.3
>


Fwd: Kernel 5.11 crashes when it boots, it produces black screen.

2023-05-10 Thread Bagas Sanjaya
Hi,

I noticed a regression report on Bugzilla ([1]). As many developers
don't look at it, I decided to forward it by email. See the report
for the full thread.

Quoting from the report:

>  Azamat S. Kalimoulline 2021-04-06 15:45:08 UTC
> 
> Same as in https://bugzilla.kernel.org/show_bug.cgi?id=212133, but not 
> StoneyRidge related. I have same issue in 5.11.9 kernel, but on Renoir 
> architecture. I have AMD Ryzen 5 PRO 4650U with Radeon Graphics. Same stuck 
> on loading initial ramdisk. modprobe.blacklist=amdgpu 3` didn't help to boot. 
> Same stuck. Also iommu=off and acpi=off too. 5.10.26 boots fine. I boot via 
> efi and I have no option boot without it.

Azamat, can you try reproducing this issue on latest mainline?

Anyway, let me add this regression to regzbot:

#regzbot introduced: v5.10..v5.11 
https://bugzilla.kernel.org/show_bug.cgi?id=212579
#regzbot title: Booting kernel on AMD Ryzen 5 PRO stucks in loading initrd

Thanks.

[1]: https://bugzilla.kernel.org/show_bug.cgi?id=212579

-- 
An old man doll... just what I always wanted! - Clara


Re: Fwd: Kernel 5.11 crashes when it boots, it produces black screen.

2023-05-10 Thread Linux regression tracking (Thorsten Leemhuis)
Hi!

On 10.05.23 10:26, Bagas Sanjaya wrote:
> 
> I noticed a regression report on Bugzilla ([1]). As many developers
> don't look at it, I decided to forward it by email. See the report
> for the full thread.
> 
> Quoting from the report:
> 
>>  Azamat S. Kalimoulline 2021-04-06 15:45:08 UTC
>>
>> Same as in https://bugzilla.kernel.org/show_bug.cgi?id=212133, but not 
>> StoneyRidge related. I have same issue in 5.11.9 kernel, but on Renoir 
>> architecture. I have AMD Ryzen 5 PRO 4650U with Radeon Graphics. Same stuck 
>> on loading initial ramdisk. modprobe.blacklist=amdgpu 3` didn't help to 
>> boot. Same stuck. Also iommu=off and acpi=off too. 5.10.26 boots fine. I 
>> boot via efi and I have no option boot without it.
> 
> Azamat, can you try reproducing this issue on latest mainline?
>
> [1]: https://bugzilla.kernel.org/show_bug.cgi?id=212579

Bagas, thx for all your help with regression tracking, much appreciated
(side note, as I've been curious for a while already: what is your
motivation? Just want to help? But whatever, any help is great!).

That being said: I'm not sure if I like what you did in this particular
case, as developers might start getting annoyed by regression tracking
if we throw too many bug reports of lesser quality at their feet --
and then they might start to ignore us, which we really need to prevent.

That's why I would not have forwarded that report at this point of time,
mainly for these reasons:

 * The initial report is quite old already, as it fell through the
cracks (not good, but happens; sorry Azamat!). Hence in this case it
would definitely be better to *first* ask the reporter to check if the
problem still happens with latest mainline (or at least latest stable)
before involving the kernel developers, as it might have been fixed
already.

 * This might not be an amdgpu bug at all; in fact the other bug the
reporter mentioned was an iommu thing. Hence this might be one of those
regressions where a bisection is the only way to get down to the
problem. Sure, sending a few developers a quick inquiry along the lines
of "do you maybe have an idea what's up there" is fine, but that's not
what you did in your mail. Your list of recipients is also quite long;
that's risky: if you do that too often, they might start ignoring mail
from you.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.


RE: [PATCH 1/2] drm/amdgpu: fix amdgpu_irq_put call trace in jpeg_v4_0_hw_fini

2023-05-10 Thread Zhang, Horatio
[AMD Official Use Only - General]

Hi Hawking,

When modprobe, the interrupt of jpeg/vcn was enabled in 
amdgpu_fence_driver_hw_init(). If the amdgpu_irq_get function is added in 
amdgpu_xxx_ras_late_init/xxx_v4_0_late_init, it will enable the instance 
interrupt twice. 
My previous modification plan also had this issue. Perhaps we should remove the 
amdgpu_irq_put function from jpeg/vcn_v4_0_hw_fini.

Regards,
Horatio

-Original Message-
From: Zhang, Hawking  
Sent: Monday, May 8, 2023 8:32 PM
To: Zhou1, Tao ; Zhang, Horatio ; 
amd-gfx@lists.freedesktop.org
Cc: Xu, Feifei ; Liu, Leo ; Jiang, Sonny 
; Limonciello, Mario ; Liu, 
HaoPing (Alan) ; Zhang, Horatio 
Subject: RE: [PATCH 1/2] drm/amdgpu: fix amdgpu_irq_put call trace in 
jpeg_v4_0_hw_fini

[AMD Official Use Only - General]

Shall we consider creating amdgpu_vcn_ras_late_init as a common helper for 
interrupt enablement, like other IP blocks. This also reduces further effort 
when RAS feature is introduced in new version of vcn/jpeg

Regards,
Hawking

-Original Message-
From: Zhou1, Tao 
Sent: Monday, May 8, 2023 19:06
To: Zhang, Horatio ; amd-gfx@lists.freedesktop.org
Cc: Zhang, Hawking ; Xu, Feifei ; 
Liu, Leo ; Jiang, Sonny ; Limonciello, 
Mario ; Liu, HaoPing (Alan) ; 
Zhang, Horatio 
Subject: RE: [PATCH 1/2] drm/amdgpu: fix amdgpu_irq_put call trace in 
jpeg_v4_0_hw_fini

[AMD Official Use Only - General]

The series is:

Reviewed-by: Tao Zhou 

> -Original Message-
> From: Horatio Zhang 
> Sent: Monday, May 8, 2023 6:20 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Zhang, Hawking ; Zhou1, Tao 
> ; Xu, Feifei ; Liu, Leo 
> ; Jiang, Sonny ; Limonciello, 
> Mario ; Liu, HaoPing (Alan) 
> ; Zhang, Horatio 
> Subject: [PATCH 1/2] drm/amdgpu: fix amdgpu_irq_put call trace in 
> jpeg_v4_0_hw_fini
> 
> During suspend, the jpeg_v4_0_hw_fini function uses amdgpu_irq_put to 
> disable the irq of jpeg.inst, but the irq was not enabled during the 
> resume process, which results in a call trace during the GPU reset 
> process.
> 
> [   50.497562] RIP: 0010:amdgpu_irq_put+0xa4/0xc0 [amdgpu]
> [   50.497619] RSP: 0018:aa2400fcfcb0 EFLAGS: 00010246
> [   50.497620] RAX:  RBX: 0001 RCX:
> 
> [   50.497621] RDX:  RSI:  RDI:
> 
> [   50.497621] RBP: aa2400fcfcd0 R08:  R09:
> 
> [   50.497622] R10:  R11:  R12:
> 99b2105242d8
> [   50.497622] R13:  R14: 99b21050 R15:
> 99b21050
> [   50.497623] FS:  () GS:99b51848()
> knlGS:
> [   50.497623] CS:  0010 DS:  ES:  CR0: 80050033
> [   50.497624] CR2: 7f9d32aa91e8 CR3: 0001ba21 CR4:
> 00750ee0
> [   50.497624] PKRU: 5554
> [   50.497625] Call Trace:
> [   50.497625]  
> [   50.497627]  jpeg_v4_0_hw_fini+0x43/0xc0 [amdgpu]
> [   50.497693]  jpeg_v4_0_suspend+0x13/0x30 [amdgpu]
> [   50.497751]  amdgpu_device_ip_suspend_phase2+0x240/0x470 [amdgpu]
> [   50.497802]  amdgpu_device_ip_suspend+0x41/0x80 [amdgpu]
> [   50.497854]  amdgpu_device_pre_asic_reset+0xd9/0x4a0 [amdgpu]
> [   50.497905]  amdgpu_device_gpu_recover.cold+0x548/0xcf1 [amdgpu]
> [   50.498005]  amdgpu_debugfs_reset_work+0x4c/0x80 [amdgpu]
> [   50.498060]  process_one_work+0x21f/0x400
> [   50.498063]  worker_thread+0x200/0x3f0
> [   50.498064]  ? process_one_work+0x400/0x400
> [   50.498065]  kthread+0xee/0x120
> [   50.498067]  ? kthread_complete_and_exit+0x20/0x20
> [   50.498068]  ret_from_fork+0x22/0x30
> 
> Fixes: 86e8255f941e ("drm/amdgpu: add JPEG 4.0 RAS poison consumption
> handling")
> Signed-off-by: Horatio Zhang 
> ---
>  drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c
> b/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c
> index 77e1e64aa1d1..b5c14a166063 100644
> --- a/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c
> @@ -66,6 +66,13 @@ static int jpeg_v4_0_early_init(void *handle)
>   return 0;
>  }
> 
> +static int jpeg_v4_0_late_init(void *handle) {
> + struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> +
> + return amdgpu_irq_get(adev, &adev->jpeg.inst->irq, 0); }
> +
>  /**
>   * jpeg_v4_0_sw_init - sw init for JPEG block
>   *
> @@ -696,7 +703,7 @@ static int jpeg_v4_0_process_interrupt(struct
> amdgpu_device *adev,  static const struct amd_ip_funcs jpeg_v4_0_ip_funcs = {
>   .name = "jpeg_v4_0",
>   .early_init = jpeg_v4_0_early_init,
> - .late_init = NULL,
> + .late_init = jpeg_v4_0_late_init,
>   .sw_init = jpeg_v4_0_sw_init,
>   .sw_fini = jpeg_v4_0_sw_fini,
>   .hw_init = jpeg_v4_0_hw_init,
> --
> 2.34.1
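
For reference, the common helper Hawking suggests upthread could look
roughly like this; a sketch that assumes a dedicated per-instance RAS
poison irq source, with the ras_poison_irq field name being an
assumption rather than existing API:

	/* Sketch: enable the VCN RAS poison interrupt for every present
	 * instance from one shared late_init helper, instead of
	 * duplicating the amdgpu_irq_get() calls per ASIC.
	 */
	int amdgpu_vcn_ras_late_init(struct amdgpu_device *adev)
	{
		int r, i;

		for (i = 0; i < adev->vcn.num_vcn_inst; i++) {
			if (adev->vcn.harvest_config & (1 << i))
				continue;

			r = amdgpu_irq_get(adev,
					   &adev->vcn.inst[i].ras_poison_irq, 0);
			if (r)
				return r;
		}

		return 0;
	}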

Re: [PATCH] drm/sched: Check scheduler work queue before calling timeout handling

2023-05-10 Thread Luben Tuikov
On 2023-05-09 17:43, vitaly.pros...@amd.com wrote:
> From: Vitaly Prosyak 
> 
> During an IGT GPU reset test we see again oops despite of
> commit 0c8c901aaaebc9bf8bf189ffc116e678f7a2dc16
> drm/sched: Check scheduler ready before calling timeout handling.

You can probably use the more succinct fixes line:
0c8c901aaaebc9 ("drm/sched: Check scheduler ready before calling timeout 
handling")

> 
> It uses ready condition whether to call drm_sched_fault which unwind
> the TDR leads to GPU reset.
> However it looks the ready condition is overloaded with other meanings,
> for example, for the following stack is related GPU reset :
> 
> 0  gfx_v9_0_cp_gfx_start
> 1  gfx_v9_0_cp_gfx_resume
> 2  gfx_v9_0_cp_resume
> 3  gfx_v9_0_hw_init
> 4  gfx_v9_0_resume
> 5  amdgpu_device_ip_resume_phase2
> 
> does the following:
>   /* start the ring */
>   gfx_v9_0_cp_gfx_start(adev);
>   ring->sched.ready = true;
> 
> The same approach is for other ASICs as well :
> gfx_v8_0_cp_gfx_resume
> gfx_v10_0_kiq_resume, etc...
> 
> As a result, our GPU reset test causes GPU fault which calls unconditionally 
> gfx_v9_0_fault
> and then drm_sched_fault. However now it depends on whether the interrupt 
> service routine
> drm_sched_fault is executed after gfx_v9_0_cp_gfx_start is completed which 
> sets the ready
> field of the scheduler to true even  for not initialized schedulers and 
> causes oops vs

"not initialized" --> "uninitialized" reads better.

> no fault or when ISR  drm_sched_fault is completed prior  
> gfx_v9_0_cp_gfx_start and
> NULL pointer dereference does not occur.
> 
> Use the field timeout_wq  to prevent oops for uninitialized schedulers.
> The field could be initialized by the work queue of resetting the domain.
> 
> Signed-off-by: Vitaly Prosyak 

Add a fixes tag,

Fixes: 0c8c901aaaebc9 ("drm/sched: Check scheduler ready before calling timeout 
handling")

Before the SOB tag.

> ---
>  drivers/gpu/drm/scheduler/sched_main.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
> b/drivers/gpu/drm/scheduler/sched_main.c
> index 649fac2e1ccb..670b7997f389 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -308,7 +308,7 @@ static void drm_sched_start_timeout(struct 
> drm_gpu_scheduler *sched)
>   */
>  void drm_sched_fault(struct drm_gpu_scheduler *sched)
>  {
> - if (sched->ready)
> + if (sched->timeout_wq)
>   mod_delayed_work(sched->timeout_wq, &sched->work_tdr, 0);
>  }
>  EXPORT_SYMBOL(drm_sched_fault);

Yes, this does indeed seem more correct.

Apply the comments above and repost the patch to amd-gfx and dri-devel and
I'll push it to drm-misc-fixes and amd-staging-drm-next.
-- 
Regards,
Luben



[PATCH] drm/amdgpu: change gfx 11.0.4 external_id range

2023-05-10 Thread Yifan Zhang
gfx 11.0.4 range starts from 0x80.

Fixes: 311d52367d0a ("drm/amdgpu: add soc21 common ip block support for GC 
11.0.4")

Signed-off-by: Yifan Zhang 
---
 drivers/gpu/drm/amd/amdgpu/soc21.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/soc21.c 
b/drivers/gpu/drm/amd/amdgpu/soc21.c
index 0f82b8e83acb..6bff936a6e55 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc21.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc21.c
@@ -711,7 +711,7 @@ static int soc21_common_early_init(void *handle)
AMD_PG_SUPPORT_VCN_DPG |
AMD_PG_SUPPORT_GFX_PG |
AMD_PG_SUPPORT_JPEG;
-   adev->external_rev_id = adev->rev_id + 0x1;
+   adev->external_rev_id = adev->rev_id + 0x80;
break;
 
default:
-- 
2.37.3