Re: [PATCH] drm/amdgpu: release correct lock in amdgpu_gfx_enable_kgq()
Applied.  Thanks!

Alex

On Tue, May 9, 2023 at 10:32 AM Dan Carpenter wrote:
>
> This function was releasing the incorrect lock on the error path.
>
> Reported-by: kernel test robot
> Fixes: 9bfa241d1289 ("drm/amdgpu: add [en/dis]able_kgq() functions")
> Signed-off-by: Dan Carpenter
> ---
> The LKP robot sent me an email about this after I had already written
> the patch.  (I review LKP Smatch emails and hit forward).
>
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> index 969f256aa003..7d2f119d9223 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> @@ -644,7 +644,7 @@ int amdgpu_gfx_enable_kgq(struct amdgpu_device *adev, int xcc_id)
>  					adev->gfx.num_gfx_rings);
>  	if (r) {
>  		DRM_ERROR("Failed to lock KIQ (%d).\n", r);
> -		spin_unlock(&adev->gfx.kiq[0].ring_lock);
> +		spin_unlock(&kiq->ring_lock);
>  		return r;
>  	}
>
> --
> 2.39.2
>
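For readers following the fix: the bug class is an error path that releases a different lock from the one it acquired. The sketch below is a toy model only, not driver code. `demo_lock`, `demo_kiq` and `demo_enable_queue` are hypothetical names, and the "lock" merely records whether it is held, so that the pairing can be checked.

```c
#include <stdbool.h>

/* Toy lock that only records whether it is held; not a real spinlock. */
struct demo_lock {
	bool held;
};

static void demo_lock_acquire(struct demo_lock *l) { l->held = true; }
static void demo_lock_release(struct demo_lock *l) { l->held = false; }

/* Hypothetical per-instance queue object, each with its own lock. */
struct demo_kiq {
	struct demo_lock ring_lock;
	int queued;
};

/* Returns 0 on success or -1 when the simulated ring allocation fails. */
static int demo_enable_queue(struct demo_kiq *kiq, bool alloc_fails)
{
	demo_lock_acquire(&kiq->ring_lock);

	if (alloc_fails) {
		/* Error path: release the same lock that was taken above. */
		demo_lock_release(&kiq->ring_lock);
		return -1;
	}

	kiq->queued++;
	demo_lock_release(&kiq->ring_lock);
	return 0;
}
```

Had the error path released instance 0's lock while working on instance 1, instance 1's lock would stay held after the failure, which is the kind of imbalance static checkers such as Smatch are designed to flag.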
RE: [PATCH 1/2] drm/amdgpu: fix amdgpu_irq_put call trace in jpeg_v4_0_hw_fini
[AMD Official Use Only - General] Got it! Thanks, Horatio -Original Message- From: Zhang, Hawking Sent: Thursday, May 11, 2023 10:28 AM To: Zhang, Horatio ; Zhou1, Tao ; amd-gfx@lists.freedesktop.org Cc: Xu, Feifei ; Liu, Leo ; Jiang, Sonny ; Limonciello, Mario ; Liu, HaoPing (Alan) ; Zhou, Bob Subject: RE: [PATCH 1/2] drm/amdgpu: fix amdgpu_irq_put call trace in jpeg_v4_0_hw_fini [AMD Official Use Only - General] Please register dedicated ras_irq src and funcs for UVD_POISON, which should allow you to create vcn ras sw calls like gfx/sdma ip block. Regards, Hawking -Original Message- From: Zhang, Horatio Sent: Wednesday, May 10, 2023 18:55 To: Zhang, Hawking ; Zhou1, Tao ; amd-gfx@lists.freedesktop.org Cc: Xu, Feifei ; Liu, Leo ; Jiang, Sonny ; Limonciello, Mario ; Liu, HaoPing (Alan) ; Zhou, Bob Subject: RE: [PATCH 1/2] drm/amdgpu: fix amdgpu_irq_put call trace in jpeg_v4_0_hw_fini [AMD Official Use Only - General] Hi Hawking, When modprobe, the interrupt of jpeg/vcn was enabled in amdgpu_fence_driver_hw_init(). If the amdgpu_irq_get function is added in amdgpu_xxx_ras_late_init/xxx_v4_0_late_init, it will enable the instance interrupt twice. My previous modification plan also had this issue. Perhaps we should remove the amdgpu_irq_put function from jpeg/vcn_v4_0_hw_fini. Regards, Horatio -Original Message- From: Zhang, Hawking Sent: Monday, May 8, 2023 8:32 PM To: Zhou1, Tao ; Zhang, Horatio ; amd-gfx@lists.freedesktop.org Cc: Xu, Feifei ; Liu, Leo ; Jiang, Sonny ; Limonciello, Mario ; Liu, HaoPing (Alan) ; Zhang, Horatio Subject: RE: [PATCH 1/2] drm/amdgpu: fix amdgpu_irq_put call trace in jpeg_v4_0_hw_fini [AMD Official Use Only - General] Shall we consider creating amdgpu_vcn_ras_late_init as a common helper for interrupt enablement, like other IP blocks. 
This also reduces further effort when RAS feature is introduced in new version of vcn/jpeg Regards, Hawking -Original Message- From: Zhou1, Tao Sent: Monday, May 8, 2023 19:06 To: Zhang, Horatio ; amd-gfx@lists.freedesktop.org Cc: Zhang, Hawking ; Xu, Feifei ; Liu, Leo ; Jiang, Sonny ; Limonciello, Mario ; Liu, HaoPing (Alan) ; Zhang, Horatio Subject: RE: [PATCH 1/2] drm/amdgpu: fix amdgpu_irq_put call trace in jpeg_v4_0_hw_fini [AMD Official Use Only - General] The series is: Reviewed-by: Tao Zhou > -Original Message- > From: Horatio Zhang > Sent: Monday, May 8, 2023 6:20 PM > To: amd-gfx@lists.freedesktop.org > Cc: Zhang, Hawking ; Zhou1, Tao > ; Xu, Feifei ; Liu, Leo > ; Jiang, Sonny ; Limonciello, > Mario ; Liu, HaoPing (Alan) > ; Zhang, Horatio > Subject: [PATCH 1/2] drm/amdgpu: fix amdgpu_irq_put call trace in > jpeg_v4_0_hw_fini > > During the suspend, the jpeg_v4_0_hw_init function will use the > amdgpu_irq_put to disable the irq of jpeg.inst, but it was not enabled > during the resume process, which resulted in a call trace during the GPU > reset process. 
> > [ 50.497562] RIP: 0010:amdgpu_irq_put+0xa4/0xc0 [amdgpu] > [ 50.497619] RSP: 0018:aa2400fcfcb0 EFLAGS: 00010246 > [ 50.497620] RAX: RBX: 0001 RCX: > > [ 50.497621] RDX: RSI: RDI: > > [ 50.497621] RBP: aa2400fcfcd0 R08: R09: > > [ 50.497622] R10: R11: R12: > 99b2105242d8 > [ 50.497622] R13: R14: 99b21050 R15: > 99b21050 > [ 50.497623] FS: () GS:99b51848() > knlGS: > [ 50.497623] CS: 0010 DS: ES: CR0: 80050033 > [ 50.497624] CR2: 7f9d32aa91e8 CR3: 0001ba21 CR4: > 00750ee0 > [ 50.497624] PKRU: 5554 > [ 50.497625] Call Trace: > [ 50.497625] > [ 50.497627] jpeg_v4_0_hw_fini+0x43/0xc0 [amdgpu] > [ 50.497693] jpeg_v4_0_suspend+0x13/0x30 [amdgpu] > [ 50.497751] amdgpu_device_ip_suspend_phase2+0x240/0x470 [amdgpu] > [ 50.497802] amdgpu_device_ip_suspend+0x41/0x80 [amdgpu] > [ 50.497854] amdgpu_device_pre_asic_reset+0xd9/0x4a0 [amdgpu] > [ 50.497905] amdgpu_device_gpu_recover.cold+0x548/0xcf1 [amdgpu] > [ 50.498005] amdgpu_debugfs_reset_work+0x4c/0x80 [amdgpu] > [ 50.498060] process_one_work+0x21f/0x400 > [ 50.498063] worker_thread+0x200/0x3f0 > [ 50.498064] ? process_one_work+0x400/0x400 > [ 50.498065] kthread+0xee/0x120 > [ 50.498067] ? kthread_complete_and_exit+0x20/0x20 > [ 50.498068] ret_from_fork+0x22/0x30 > > Fixes: 86e8255f941e ("drm/amdgpu: add JPEG 4.0 RAS poison consumption > handling") > Signed-off-by: Horatio Zhang > --- > drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c | 9 - > 1 file changed, 8 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c > b/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c > index 77e1e64aa1d1..b5c14a166063 100644 > --- a/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c >
RE: [PATCH 1/2] drm/amdgpu: fix amdgpu_irq_put call trace in jpeg_v4_0_hw_fini
[AMD Official Use Only - General] Please register dedicated ras_irq src and funcs for UVD_POISON, which should allow you to create vcn ras sw calls like gfx/sdma ip block. Regards, Hawking -Original Message- From: Zhang, Horatio Sent: Wednesday, May 10, 2023 18:55 To: Zhang, Hawking ; Zhou1, Tao ; amd-gfx@lists.freedesktop.org Cc: Xu, Feifei ; Liu, Leo ; Jiang, Sonny ; Limonciello, Mario ; Liu, HaoPing (Alan) ; Zhou, Bob Subject: RE: [PATCH 1/2] drm/amdgpu: fix amdgpu_irq_put call trace in jpeg_v4_0_hw_fini [AMD Official Use Only - General] Hi Hawking, When modprobe, the interrupt of jpeg/vcn was enabled in amdgpu_fence_driver_hw_init(). If the amdgpu_irq_get function is added in amdgpu_xxx_ras_late_init/xxx_v4_0_late_init, it will enable the instance interrupt twice. My previous modification plan also had this issue. Perhaps we should remove the amdgpu_irq_put function from jpeg/vcn_v4_0_hw_fini. Regards, Horatio -Original Message- From: Zhang, Hawking Sent: Monday, May 8, 2023 8:32 PM To: Zhou1, Tao ; Zhang, Horatio ; amd-gfx@lists.freedesktop.org Cc: Xu, Feifei ; Liu, Leo ; Jiang, Sonny ; Limonciello, Mario ; Liu, HaoPing (Alan) ; Zhang, Horatio Subject: RE: [PATCH 1/2] drm/amdgpu: fix amdgpu_irq_put call trace in jpeg_v4_0_hw_fini [AMD Official Use Only - General] Shall we consider creating amdgpu_vcn_ras_late_init as a common helper for interrupt enablement, like other IP blocks. 
This also reduces further effort when RAS feature is introduced in new version of vcn/jpeg Regards, Hawking -Original Message- From: Zhou1, Tao Sent: Monday, May 8, 2023 19:06 To: Zhang, Horatio ; amd-gfx@lists.freedesktop.org Cc: Zhang, Hawking ; Xu, Feifei ; Liu, Leo ; Jiang, Sonny ; Limonciello, Mario ; Liu, HaoPing (Alan) ; Zhang, Horatio Subject: RE: [PATCH 1/2] drm/amdgpu: fix amdgpu_irq_put call trace in jpeg_v4_0_hw_fini [AMD Official Use Only - General] The series is: Reviewed-by: Tao Zhou > -Original Message- > From: Horatio Zhang > Sent: Monday, May 8, 2023 6:20 PM > To: amd-gfx@lists.freedesktop.org > Cc: Zhang, Hawking ; Zhou1, Tao > ; Xu, Feifei ; Liu, Leo > ; Jiang, Sonny ; Limonciello, > Mario ; Liu, HaoPing (Alan) > ; Zhang, Horatio > Subject: [PATCH 1/2] drm/amdgpu: fix amdgpu_irq_put call trace in > jpeg_v4_0_hw_fini > > During the suspend, the jpeg_v4_0_hw_init function will use the > amdgpu_irq_put to disable the irq of jpeg.inst, but it was not enabled > during the resume process, which resulted in a call trace during the GPU > reset process. 
> > [ 50.497562] RIP: 0010:amdgpu_irq_put+0xa4/0xc0 [amdgpu] > [ 50.497619] RSP: 0018:aa2400fcfcb0 EFLAGS: 00010246 > [ 50.497620] RAX: RBX: 0001 RCX: > > [ 50.497621] RDX: RSI: RDI: > > [ 50.497621] RBP: aa2400fcfcd0 R08: R09: > > [ 50.497622] R10: R11: R12: > 99b2105242d8 > [ 50.497622] R13: R14: 99b21050 R15: > 99b21050 > [ 50.497623] FS: () GS:99b51848() > knlGS: > [ 50.497623] CS: 0010 DS: ES: CR0: 80050033 > [ 50.497624] CR2: 7f9d32aa91e8 CR3: 0001ba21 CR4: > 00750ee0 > [ 50.497624] PKRU: 5554 > [ 50.497625] Call Trace: > [ 50.497625] > [ 50.497627] jpeg_v4_0_hw_fini+0x43/0xc0 [amdgpu] > [ 50.497693] jpeg_v4_0_suspend+0x13/0x30 [amdgpu] > [ 50.497751] amdgpu_device_ip_suspend_phase2+0x240/0x470 [amdgpu] > [ 50.497802] amdgpu_device_ip_suspend+0x41/0x80 [amdgpu] > [ 50.497854] amdgpu_device_pre_asic_reset+0xd9/0x4a0 [amdgpu] > [ 50.497905] amdgpu_device_gpu_recover.cold+0x548/0xcf1 [amdgpu] > [ 50.498005] amdgpu_debugfs_reset_work+0x4c/0x80 [amdgpu] > [ 50.498060] process_one_work+0x21f/0x400 > [ 50.498063] worker_thread+0x200/0x3f0 > [ 50.498064] ? process_one_work+0x400/0x400 > [ 50.498065] kthread+0xee/0x120 > [ 50.498067] ? 
kthread_complete_and_exit+0x20/0x20 > [ 50.498068] ret_from_fork+0x22/0x30 > > Fixes: 86e8255f941e ("drm/amdgpu: add JPEG 4.0 RAS poison consumption > handling") > Signed-off-by: Horatio Zhang > --- > drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c | 9 - > 1 file changed, 8 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c > b/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c > index 77e1e64aa1d1..b5c14a166063 100644 > --- a/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c > +++ b/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c > @@ -66,6 +66,13 @@ static int jpeg_v4_0_early_init(void *handle) > return 0; > } > > +static int jpeg_v4_0_late_init(void *handle) { > + struct amdgpu_device *adev = (struct amdgpu_device *)handle; > + > + return amdgpu_irq_get(adev, >jpeg.inst->irq, 0); } > + > /** > * jpeg_v4_0_sw_init - sw init for JPEG block > * > @@ -696,7
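The get/put imbalance discussed in this thread can be modelled with a simple reference count. This is a hedged toy model, not amdgpu's implementation: `demo_irq_src`, `demo_irq_get` and `demo_irq_put` are hypothetical, and the `-EINVAL` return for an unbalanced put is an assumption standing in for the warning seen in the call trace.

```c
#include <errno.h>

/* Hypothetical interrupt source with an enable reference count. */
struct demo_irq_src {
	unsigned int enabled_refs;
};

/* Take one enable reference. */
static int demo_irq_get(struct demo_irq_src *src)
{
	src->enabled_refs++;
	return 0;
}

/* Drop one enable reference; a put without a matching get is an error. */
static int demo_irq_put(struct demo_irq_src *src)
{
	if (src->enabled_refs == 0)
		return -EINVAL;

	src->enabled_refs--;
	return 0;
}
```

In this model a put during suspend fails unless some init path took a reference first, and a second get in a late-init hook leaves one reference outstanding after a single put, which mirrors the "enabled twice" concern raised in the reply.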
Re: [PATCH] drm/amd/amdgpu: Remove redundant else branch in amdgpu_encoders.c
On Tue, May 9, 2023 at 1:17 AM SHANMUGAM, SRINIVASAN wrote: > > [AMD Official Use Only - General] > > > > -Original Message- > From: Alex Deucher > Sent: Monday, May 8, 2023 9:27 PM > To: SHANMUGAM, SRINIVASAN > Cc: Koenig, Christian ; Deucher, Alexander > ; amd-gfx@lists.freedesktop.org > Subject: Re: [PATCH] drm/amd/amdgpu: Remove redundant else branch in > amdgpu_encoders.c > > On Mon, May 8, 2023 at 11:29 AM Srinivasan Shanmugam > wrote: > > > > Adhere to Linux kernel coding style. > > > > Reported by checkpatch: > > > > WARNING: else is not generally useful after a break or return > > > > What about the else in the previous case statement? > > Alex > > Hi Alex, > > Thanks a lot for your feedbacks, > > the else in the previous case ie., is binded to if statement ie., "if > (amdgpu_connector->use_digital) {", am I correct please?, please correct me, > if my understanding is wrong? & the best solution with your tips pls, so that > I can edit & resend the patch please? > Yes that one. It follows a similar pattern to the case you changed. Shouldn't checkpatch warn on both? 
Alex > Much appreciate for your help in advance, > > > Cc: Christian König > > Cc: Alex Deucher > > Signed-off-by: Srinivasan Shanmugam > > --- > > drivers/gpu/drm/amd/amdgpu/amdgpu_encoders.c | 26 > > ++-- > > 1 file changed, 13 insertions(+), 13 deletions(-) > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_encoders.c > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_encoders.c > > index c96e458ed088..049e9976ff34 100644 > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_encoders.c > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_encoders.c > > @@ -242,19 +242,18 @@ bool amdgpu_dig_monitor_is_duallink(struct > > drm_encoder *encoder, > > if ((dig_connector->dp_sink_type == > > CONNECTOR_OBJECT_ID_DISPLAYPORT) || > > (dig_connector->dp_sink_type == > > CONNECTOR_OBJECT_ID_eDP)) > > return false; > > - else { > > - /* HDMI 1.3 supports up to 340 Mhz over single link > > */ > > - if (connector->display_info.is_hdmi) { > > - if (pixel_clock > 34) > > - return true; > > - else > > - return false; > > - } else { > > - if (pixel_clock > 165000) > > - return true; > > - else > > - return false; > > - } > > + > > + /* HDMI 1.3 supports up to 340 Mhz over single link */ > > + if (connector->display_info.is_hdmi) { > > + if (pixel_clock > 34) > > + return true; > > + else > > + return false; > > + } else { > > + if (pixel_clock > 165000) > > + return true; > > + else > > + return false; > > } > > default: > > return false; > > -- > > 2.25.1 > >
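As an illustration of the checkpatch rule under discussion ("else is not generally useful after a break or return"), here is a minimal standalone sketch of the same decision with every redundant else removed. It is not the driver function. `demo_is_duallink` and its parameters are hypothetical, and the thresholds assume the pixel clock is in kHz, so that the 340 MHz limit in the quoted comment becomes 340000.

```c
#include <stdbool.h>

/*
 * Hypothetical flattened form of the dual-link decision.
 * Assumption: pixel_clock_khz is in kHz (340 MHz == 340000).
 */
static bool demo_is_duallink(bool is_dp_sink, bool is_hdmi, int pixel_clock_khz)
{
	if (is_dp_sink)
		return false;

	/* No else needed: the branch above has already returned. */
	if (is_hdmi)
		return pixel_clock_khz > 340000;

	return pixel_clock_khz > 165000;
}
```

Returning the comparison directly also removes the inner `if (...) return true; else return false;` pairs, which follow the same pattern Alex points out in the previous case statement.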
[PATCH] drm/amdgpu: change gfx 11.0.4 external_id range
gfx 11.0.4 range starts from 0x80.

Fixes: 311d52367d0a ("drm/amdgpu: add soc21 common ip block support for GC 11.0.4")
Cc: sta...@vger.kernel.org
Signed-off-by: Yifan Zhang
Reported-by: Yogesh Mohan Marimuthu
Acked-by: Alex Deucher
Reviewed-by: Tim Huang
---
 drivers/gpu/drm/amd/amdgpu/soc21.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/soc21.c b/drivers/gpu/drm/amd/amdgpu/soc21.c
index 0f82b8e83acb..6bff936a6e55 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc21.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc21.c
@@ -711,7 +711,7 @@ static int soc21_common_early_init(void *handle)
 			AMD_PG_SUPPORT_VCN_DPG |
 			AMD_PG_SUPPORT_GFX_PG |
 			AMD_PG_SUPPORT_JPEG;
-		adev->external_rev_id = adev->rev_id + 0x1;
+		adev->external_rev_id = adev->rev_id + 0x80;
 		break;

 	default:
--
2.37.3
Re: [PATCH] drm/amd/amdgpu: Fix warnings in amdgpu _object, _ring.c
On Tue, May 9, 2023 at 10:03 AM Srinivasan Shanmugam wrote: > > Fix below warnings reported by checkpatch: > > WARNING: Prefer 'unsigned int' to bare use of 'unsigned' > WARNING: static const char * array should probably be static const char * > const > WARNING: space prohibited between function name and open parenthesis '(' > WARNING: braces {} are not necessary for single statement blocks > WARNING: Symbolic permissions 'S_IRUGO' are not preferred. Consider using > octal permissions '0444'. > > Cc: Christian König > Cc: Alex Deucher > Signed-off-by: Srinivasan Shanmugam Acked-by: Alex Deucher > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 10 +- > drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 9 - > 2 files changed, 9 insertions(+), 10 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > index 7c9b788ae0a9..fbd906ac556e 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > @@ -130,7 +130,7 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo > *abo, u32 domain) > u32 c = 0; > > if (domain & AMDGPU_GEM_DOMAIN_VRAM) { > - unsigned visible_pfn = adev->gmc.visible_vram_size >> > PAGE_SHIFT; > + unsigned int visible_pfn = adev->gmc.visible_vram_size >> > PAGE_SHIFT; > > places[c].fpfn = 0; > places[c].lpfn = 0; > @@ -935,7 +935,7 @@ int amdgpu_bo_pin_restricted(struct amdgpu_bo *bo, u32 > domain, > bo->flags |= AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED; > amdgpu_bo_placement_from_domain(bo, domain); > for (i = 0; i < bo->placement.num_placement; i++) { > - unsigned fpfn, lpfn; > + unsigned int fpfn, lpfn; > > fpfn = min_offset >> PAGE_SHIFT; > lpfn = max_offset >> PAGE_SHIFT; > @@ -1016,7 +1016,7 @@ void amdgpu_bo_unpin(struct amdgpu_bo *bo) > } > } > > -static const char *amdgpu_vram_names[] = { > +static const char * const amdgpu_vram_names[] = { > "UNKNOWN", > "GDDR1", > "DDR2", > @@ -1148,8 +1148,8 @@ void amdgpu_bo_get_tiling_flags(struct 
amdgpu_bo *bo, > u64 *tiling_flags) > * Returns: > * 0 for success or a negative error code on failure. > */ > -int amdgpu_bo_set_metadata (struct amdgpu_bo *bo, void *metadata, > - uint32_t metadata_size, uint64_t flags) > +int amdgpu_bo_set_metadata(struct amdgpu_bo *bo, void *metadata, > + u32 metadata_size, uint64_t flags) > { > struct amdgpu_bo_user *ubo; > void *buffer; > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c > index a1d480b7fd1f..7429b20257a6 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c > @@ -78,7 +78,7 @@ unsigned int amdgpu_ring_max_ibs(enum amdgpu_ring_type type) > * Allocate @ndw dwords in the ring buffer (all asics). > * Returns 0 on success, error on failure. > */ > -int amdgpu_ring_alloc(struct amdgpu_ring *ring, unsigned ndw) > +int amdgpu_ring_alloc(struct amdgpu_ring *ring, unsigned int ndw) > { > /* Align requested size with padding so unlock_commit can > * pad safely */ > @@ -315,9 +315,8 @@ int amdgpu_ring_init(struct amdgpu_device *adev, struct > amdgpu_ring *ring, > amdgpu_ring_max_ibs(ring->funcs->type) * > ring->funcs->emit_ib_size; > max_ibs_dw = (max_ibs_dw + ring->funcs->align_mask) & > ~ring->funcs->align_mask; > > - if (WARN_ON(max_ibs_dw > max_dw)) { > + if (WARN_ON(max_ibs_dw > max_dw)) > max_dw = max_ibs_dw; > - } > > ring->ring_size = roundup_pow_of_two(max_dw * 4 * > sched_hw_submission); > > @@ -591,7 +590,7 @@ void amdgpu_debugfs_ring_init(struct amdgpu_device *adev, > char name[32]; > > sprintf(name, "amdgpu_ring_%s", ring->name); > - debugfs_create_file_size(name, S_IFREG | S_IRUGO, root, ring, > + debugfs_create_file_size(name, S_IFREG | 0444, root, ring, > _debugfs_ring_fops, > ring->ring_size + 12); > > @@ -601,7 +600,7 @@ void amdgpu_debugfs_ring_init(struct amdgpu_device *adev, > > if (ring->mqd_obj) { > sprintf(name, "amdgpu_mqd_%s", ring->name); > - debugfs_create_file_size(name, S_IFREG | S_IRUGO, 
root, ring, > + debugfs_create_file_size(name, S_IFREG | 0444, root, ring, > _debugfs_mqd_fops, > ring->mqd_size); > } > -- > 2.25.1 >
RE: [PATCH] drm/amdgpu: change gfx 11.0.4 external_id range
[AMD Official Use Only - General] This patch is Reviewed-by: Tim Huang Best Regards, Tim Huang -Original Message- From: Zhang, Yifan Sent: Wednesday, May 10, 2023 4:38 PM To: amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; Huang, Tim ; Du, Xiaojian ; Limonciello, Mario ; Mohan Marimuthu, Yogesh ; Zhang, Yifan Subject: [PATCH] drm/amdgpu: change gfx 11.0.4 external_id range gfx 11.0.4 range starts from 0x80. Fixes: 311d52367d0a ("drm/amdgpu: add soc21 common ip block support for GC 11.0.4") Signed-off-by: Yifan Zhang --- drivers/gpu/drm/amd/amdgpu/soc21.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/soc21.c b/drivers/gpu/drm/amd/amdgpu/soc21.c index 0f82b8e83acb..6bff936a6e55 100644 --- a/drivers/gpu/drm/amd/amdgpu/soc21.c +++ b/drivers/gpu/drm/amd/amdgpu/soc21.c @@ -711,7 +711,7 @@ static int soc21_common_early_init(void *handle) AMD_PG_SUPPORT_VCN_DPG | AMD_PG_SUPPORT_GFX_PG | AMD_PG_SUPPORT_JPEG; - adev->external_rev_id = adev->rev_id + 0x1; + adev->external_rev_id = adev->rev_id + 0x80; break; default: -- 2.37.3
[PATCH 5/5] drm/amdgpu: add check for RAS instance mask
From: Tao Zhou

The mask is only needed to be set when RAS block instance number is
more than 1 and invalid bits should be also masked out.

We only check valid bits for GFX and SDMA block for now, and will add
check for other RAS blocks in the future.

v2: move the check under injection operation since the mask is only
    used by RAS error inject.
v3: add valid bits handling for SDMA.
v4: print message if the mask is adjusted.

Signed-off-by: Tao Zhou Hawking Zhang
Reviewed-by: Stanley.Yang
Signed-off-by: Alex Deucher
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 38 +
 1 file changed, 38 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index b7d8250a9281..6bb438642cc0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -333,6 +333,42 @@ static int amdgpu_ras_debugfs_ctrl_parse_data(struct file *f,
 	return 0;
 }

+static void amdgpu_ras_instance_mask_check(struct amdgpu_device *adev,
+				struct ras_debug_if *data)
+{
+	int num_xcc = adev->gfx.xcc_mask ? NUM_XCC(adev->gfx.xcc_mask) : 1;
+	uint32_t mask, inst_mask = data->inject.instance_mask;
+
+	/* no need to set instance mask if there is only one instance */
+	if (num_xcc <= 1 && inst_mask) {
+		data->inject.instance_mask = 0;
+		dev_dbg(adev->dev,
+			"RAS inject mask(0x%x) isn't supported and force it to 0.\n",
+			inst_mask);
+
+		return;
+	}
+
+	switch (data->head.block) {
+	case AMDGPU_RAS_BLOCK__GFX:
+		mask = GENMASK(num_xcc - 1, 0);
+		break;
+	case AMDGPU_RAS_BLOCK__SDMA:
+		mask = GENMASK(adev->sdma.num_instances - 1, 0);
+		break;
+	default:
+		mask = 0;
+		break;
+	}
+
+	/* remove invalid bits in instance mask */
+	data->inject.instance_mask &= mask;
+	if (inst_mask != data->inject.instance_mask)
+		dev_dbg(adev->dev,
+			"Adjust RAS inject mask 0x%x to 0x%x\n",
+			inst_mask, data->inject.instance_mask);
+}
+
 /**
  * DOC: AMDGPU RAS debugfs control interface
  *
@@ -468,6 +504,8 @@ static ssize_t amdgpu_ras_debugfs_ctrl_write(struct file *f,
 			break;
 		}

+		amdgpu_ras_instance_mask_check(adev, &data);
+
 		/* data.inject.address is offset instead of absolute gpu address */
 		ret = amdgpu_ras_error_inject(adev, &data.inject);
 		break;
--
2.40.1
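The mask adjustment in this patch amounts to "force the mask to 0 for a single instance, otherwise keep only the bits that name an existing instance". A minimal userspace sketch of that idea follows. It is a simplification, not the kernel function: `demo_low_bits` stands in for `GENMASK(n - 1, 0)`, `demo_clamp_instance_mask` is a hypothetical name, and a single instance count replaces the per-block lookup.

```c
#include <stdint.h>

/* Low n bits set, for n in 1..32; stand-in for GENMASK(n - 1, 0). */
static uint32_t demo_low_bits(unsigned int n)
{
	return n >= 32 ? UINT32_MAX : ((UINT32_C(1) << n) - 1);
}

/*
 * Hypothetical helper: drop mask bits that do not correspond to an
 * existing instance. With one instance (or none) the mask is unused.
 */
static uint32_t demo_clamp_instance_mask(uint32_t requested,
					 unsigned int num_instances)
{
	if (num_instances <= 1)
		return 0;

	return requested & demo_low_bits(num_instances);
}
```

For example, a request of 0xff on a part with four instances is reduced to 0xf, which matches the "Adjust RAS inject mask" debug message path in the patch.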
[PATCH 3/5] drm/amdgpu: reorganize RAS injection flow
From: Tao Zhou

So GFX RAS injection could use default function if it doesn't define its
own injection interface.

Signed-off-by: Tao Zhou
Reviewed-by: Hawking Zhang
Reviewed-by: Stanley.Yang
Signed-off-by: Alex Deucher
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 7ae08f168f99..b7d8250a9281 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -1123,16 +1123,15 @@ int amdgpu_ras_error_inject(struct amdgpu_device *adev,
 						block_info.address);
 	}

-	if (info->head.block == AMDGPU_RAS_BLOCK__GFX) {
-		if (block_obj->hw_ops->ras_error_inject)
+	if (block_obj->hw_ops->ras_error_inject) {
+		if (info->head.block == AMDGPU_RAS_BLOCK__GFX)
 			ret = block_obj->hw_ops->ras_error_inject(adev, info, info->instance_mask);
-	} else {
-		/* If defined special ras_error_inject(e.g: xgmi), implement special ras_error_inject */
-		if (block_obj->hw_ops->ras_error_inject)
+		else /* Special ras_error_inject is defined (e.g: xgmi) */
 			ret = block_obj->hw_ops->ras_error_inject(adev, &block_info, info->instance_mask);
-		else /*If not defined .ras_error_inject, use default ras_error_inject*/
-			ret = psp_ras_trigger_error(&adev->psp, &block_info, info->instance_mask);
+	} else {
+		/* default path */
+		ret = psp_ras_trigger_error(&adev->psp, &block_info, info->instance_mask);
 	}

 	if (ret)
--
2.40.1
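The reorganized flow is an "optional callback with a default" dispatch: use the block's own hook if it defines one, otherwise fall back to the common path. The sketch below shows only that shape. All names (`demo_hw_ops`, `demo_default_inject`, `demo_special_inject`, `demo_error_inject`) are hypothetical, and the return values 1 and 2 are markers for which path ran rather than real error codes.

```c
#include <stddef.h>
#include <stdint.h>

typedef int (*demo_inject_fn)(uint32_t instance_mask);

/* Hypothetical per-block ops table; the hook may be left NULL. */
struct demo_hw_ops {
	demo_inject_fn ras_error_inject;
};

/* Common fallback path; returns 1 as a marker for "default path ran". */
static int demo_default_inject(uint32_t instance_mask)
{
	(void)instance_mask;
	return 1;
}

/* Block-specific hook; returns 2 as a marker for "special path ran". */
static int demo_special_inject(uint32_t instance_mask)
{
	(void)instance_mask;
	return 2;
}

/* Prefer the block's own hook, otherwise use the default. */
static int demo_error_inject(const struct demo_hw_ops *ops,
			     uint32_t instance_mask)
{
	if (ops->ras_error_inject)
		return ops->ras_error_inject(instance_mask);

	return demo_default_inject(instance_mask);
}
```

With this shape, a block that needs nothing special can drop its hook and still be served by the default path.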
[PATCH 2/5] drm/amdgpu: add instance mask for RAS inject
From: Tao Zhou User can specify injected instances by the mask. For backward compatibility, the mask value is incorporated into sub block index without interface change of RAS TA. User uses logical mask and driver should convert it to physical value before sending it to RAS TA. v2: update parameter name. Signed-off-by: Tao Zhou Reviewed-by: Hawking Zhang Reviewed-by: Stanley.Yang Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 21 - drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 23 --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 9 - drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 5 +++-- drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c| 6 +++--- drivers/gpu/drm/amd/amdgpu/gfx_v9_4.c| 4 ++-- drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c | 5 +++-- 8 files changed, 56 insertions(+), 19 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c index ec79a5c2f500..59b8b26e2caf 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c @@ -1672,14 +1672,33 @@ int psp_ras_initialize(struct psp_context *psp) } int psp_ras_trigger_error(struct psp_context *psp, - struct ta_ras_trigger_error_input *info) + struct ta_ras_trigger_error_input *info, uint32_t instance_mask) { struct ta_ras_shared_memory *ras_cmd; + struct amdgpu_device *adev = psp->adev; int ret; + uint32_t dev_mask; if (!psp->ras_context.context.initialized) return -EINVAL; + switch (info->block_id) { + case TA_RAS_BLOCK__GFX: + dev_mask = GET_MASK(GC, instance_mask); + break; + case TA_RAS_BLOCK__SDMA: + dev_mask = GET_MASK(SDMA0, instance_mask); + break; + default: + dev_mask = instance_mask; + break; + } + + /* reuse sub_block_index for backward compatibility */ + dev_mask <<= AMDGPU_RAS_INST_SHIFT; + dev_mask &= AMDGPU_RAS_INST_MASK; + info->sub_block_index |= dev_mask; + ras_cmd = (struct ta_ras_shared_memory *)psp->ras_context.context.mem_context.shared_buf; memset(ras_cmd, 
0, sizeof(struct ta_ras_shared_memory)); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h index 0a409da749d1..d84323923a3f 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h @@ -486,7 +486,7 @@ int psp_ras_invoke(struct psp_context *psp, uint32_t ta_cmd_id); int psp_ras_enable_features(struct psp_context *psp, union ta_ras_cmd_input *info, bool enable); int psp_ras_trigger_error(struct psp_context *psp, - struct ta_ras_trigger_error_input *info); + struct ta_ras_trigger_error_input *info, uint32_t instance_mask); int psp_ras_terminate(struct psp_context *psp); int psp_hdcp_invoke(struct psp_context *psp, uint32_t ta_cmd_id); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c index 64f80e8cbd63..7ae08f168f99 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c @@ -256,6 +256,8 @@ static int amdgpu_ras_debugfs_ctrl_parse_data(struct file *f, int block_id; uint32_t sub_block; u64 address, value; + /* default value is 0 if the mask is not set by user */ + u32 instance_mask = 0; if (*pos) return -EINVAL; @@ -306,7 +308,11 @@ static int amdgpu_ras_debugfs_ctrl_parse_data(struct file *f, data->op = op; if (op == 2) { - if (sscanf(str, "%*s %*s %*s 0x%x 0x%llx 0x%llx", + if (sscanf(str, "%*s %*s %*s 0x%x 0x%llx 0x%llx 0x%x", + _block, , , _mask) != 4 && + sscanf(str, "%*s %*s %*s %u %llu %llu %u", + _block, , , _mask) != 4 && + sscanf(str, "%*s %*s %*s 0x%x 0x%llx 0x%llx", _block, , ) != 3 && sscanf(str, "%*s %*s %*s %u %llu %llu", _block, , ) != 3) @@ -314,6 +320,7 @@ static int amdgpu_ras_debugfs_ctrl_parse_data(struct file *f, data->head.sub_block_index = sub_block; data->inject.address = address; data->inject.value = value; + data->inject.instance_mask = instance_mask; } } else { if (size < sizeof(*data)) @@ -341,7 +348,7 @@ static int amdgpu_ras_debugfs_ctrl_parse_data(struct file *f, * 
sub_block_index: some IPs have subcomponets. say, GFX, sDMA. * name:
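The "reuse sub_block_index for backward compatibility" step in psp_ras_trigger_error() packs the device mask into the upper bits of an existing field. A hedged sketch of that packing is shown below. The shift and mask values are assumptions chosen for illustration (the amdgpu_ras.h hunk that defines AMDGPU_RAS_INST_SHIFT and AMDGPU_RAS_INST_MASK is not visible in this message), and the `DEMO_` and `demo_` names are hypothetical.

```c
#include <stdint.h>

/* Assumed layout for illustration: instance bits live above bit 11. */
#define DEMO_INST_SHIFT 12
#define DEMO_INST_MASK  UINT32_C(0xfffff000)

/*
 * Fold a device instance mask into the upper bits of sub_block_index,
 * leaving the low bits (the original sub block index) untouched.
 */
static uint32_t demo_pack_sub_block(uint32_t sub_block_index, uint32_t dev_mask)
{
	dev_mask <<= DEMO_INST_SHIFT;
	dev_mask &= DEMO_INST_MASK;

	return sub_block_index | dev_mask;
}
```

Under these assumed values, sub block 0x3 with device mask 0x5 becomes 0x5003, and a zero mask leaves the index unchanged, so the interface to the RAS TA keeps its old meaning when no mask is given.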
[PATCH 4/5] drm/amdgpu: remove RAS GFX injection for gfx_v9_4/gfx_v9_4_2
From: Tao Zhou No special requirement in RAS injection for the two versions, switch to use default injection interface. Signed-off-by: Tao Zhou Reviewed-by: Hawking Zhang Reviewed-by: Stanley.Yang Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v9_4.c | 24 drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c | 24 2 files changed, 48 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4.c index 59abe162bbaf..bc8416afb62c 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4.c @@ -970,29 +970,6 @@ static void gfx_v9_4_reset_ras_error_count(struct amdgpu_device *adev) WREG32_SOC15(GC, 0, mmATC_L2_CACHE_4K_DSM_INDEX, 255); } -static int gfx_v9_4_ras_error_inject(struct amdgpu_device *adev, -void *inject_if, uint32_t instance_mask) -{ - struct ras_inject_if *info = (struct ras_inject_if *)inject_if; - int ret; - struct ta_ras_trigger_error_input block_info = { 0 }; - - if (!amdgpu_ras_is_supported(adev, AMDGPU_RAS_BLOCK__GFX)) - return -EINVAL; - - block_info.block_id = amdgpu_ras_block_to_ta(info->head.block); - block_info.sub_block_index = info->head.sub_block_index; - block_info.inject_error_type = amdgpu_ras_error_to_ta(info->head.type); - block_info.address = info->address; - block_info.value = info->value; - - mutex_lock(>grbm_idx_mutex); - ret = psp_ras_trigger_error(>psp, _info, instance_mask); - mutex_unlock(>grbm_idx_mutex); - - return ret; -} - static const struct soc15_reg_entry gfx_v9_4_ea_err_status_regs = { SOC15_REG_ENTRY(GC, 0, mmGCEA_ERR_STATUS), 0, 1, 32 }; @@ -1030,7 +1007,6 @@ static void gfx_v9_4_query_ras_error_status(struct amdgpu_device *adev) const struct amdgpu_ras_block_hw_ops gfx_v9_4_ras_ops = { - .ras_error_inject = _v9_4_ras_error_inject, .query_ras_error_count = _v9_4_query_ras_error_count, .reset_ras_error_count = _v9_4_reset_ras_error_count, .query_ras_error_status = _v9_4_query_ras_error_status, diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c index 4906affa6f8c..2cc3a7cb1f54 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c @@ -1699,29 +1699,6 @@ static void gfx_v9_4_2_reset_ras_error_count(struct amdgpu_device *adev) gfx_v9_4_2_query_utc_edc_count(adev, NULL, NULL); } -static int gfx_v9_4_2_ras_error_inject(struct amdgpu_device *adev, - void *inject_if, uint32_t instance_mask) -{ - struct ras_inject_if *info = (struct ras_inject_if *)inject_if; - int ret; - struct ta_ras_trigger_error_input block_info = { 0 }; - - if (!amdgpu_ras_is_supported(adev, AMDGPU_RAS_BLOCK__GFX)) - return -EINVAL; - - block_info.block_id = amdgpu_ras_block_to_ta(info->head.block); - block_info.sub_block_index = info->head.sub_block_index; - block_info.inject_error_type = amdgpu_ras_error_to_ta(info->head.type); - block_info.address = info->address; - block_info.value = info->value; - - mutex_lock(>grbm_idx_mutex); - ret = psp_ras_trigger_error(>psp, _info, instance_mask); - mutex_unlock(>grbm_idx_mutex); - - return ret; -} - static void gfx_v9_4_2_query_ea_err_status(struct amdgpu_device *adev) { uint32_t i, j; @@ -1945,7 +1922,6 @@ static bool gfx_v9_4_2_query_uctl2_poison_status(struct amdgpu_device *adev) } struct amdgpu_ras_block_hw_ops gfx_v9_4_2_ras_ops = { - .ras_error_inject = _v9_4_2_ras_error_inject, .query_ras_error_count = _v9_4_2_query_ras_error_count, .reset_ras_error_count = _v9_4_2_reset_ras_error_count, .query_ras_error_status = _v9_4_2_query_ras_error_status, -- 2.40.1
[PATCH 1/5] drm/amdgpu: convert logical instance mask to physical one
From: Tao Zhou

Convert instance mask for the convenience of RAS TA.

Signed-off-by: Tao Zhou
Reviewed-by: Hawking Zhang
Reviewed-by: Stanley.Yang
Signed-off-by: Alex Deucher
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h            |  6 ++++--
 .../drm/amd/amdgpu/aqua_vanjaram_reg_init.c    | 18 ++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/soc15_common.h      |  7 ++++++-
 3 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 4fb43baddf96..22f1e197cc09 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -698,12 +698,14 @@ enum amd_hw_ip_block_type {
 #define IP_VERSION_REV(ver) ((ver) & 0xFF)

 struct amdgpu_ip_map_info {
-	/* Map of logical to actual dev instances */
+	/* Map of logical to actual dev instances/mask */
 	uint32_t		dev_inst[MAX_HWIP][HWIP_MAX_INSTANCE];
 	int8_t (*logical_to_dev_inst)(struct amdgpu_device *adev,
 				      enum amd_hw_ip_block_type block,
 				      int8_t inst);
-
+	uint32_t (*logical_to_dev_mask)(struct amdgpu_device *adev,
+					enum amd_hw_ip_block_type block,
+					uint32_t mask);
 };

 struct amd_powerplay {
diff --git a/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram_reg_init.c b/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram_reg_init.c
index 93e9f947a85d..68d1a0fc5f5d 100644
--- a/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram_reg_init.c
+++ b/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram_reg_init.c
@@ -229,6 +229,23 @@ static int8_t aqua_vanjaram_logical_to_dev_inst(struct amdgpu_device *adev,
 	return dev_inst;
 }

+static uint32_t aqua_vanjaram_logical_to_dev_mask(struct amdgpu_device *adev,
+					 enum amd_hw_ip_block_type block,
+					 uint32_t mask)
+{
+	uint32_t dev_mask = 0;
+	int8_t log_inst, dev_inst;
+
+	while (mask) {
+		log_inst = ffs(mask) - 1;
+		dev_inst = aqua_vanjaram_logical_to_dev_inst(adev, block, log_inst);
+		dev_mask |= (1 << dev_inst);
+		mask &= ~(1 << log_inst);
+	}
+
+	return dev_mask;
+}
+
 static void aqua_vanjaram_populate_ip_map(struct amdgpu_device *adev,
 					  enum amd_hw_ip_block_type ip_block,
 					  uint32_t inst_mask)
@@ -257,6 +274,7 @@ void aqua_vanjaram_ip_map_init(struct amdgpu_device *adev)
 		aqua_vanjaram_populate_ip_map(adev, ip_map[i][0], ip_map[i][1]);

 	adev->ip_map.logical_to_dev_inst = aqua_vanjaram_logical_to_dev_inst;
+	adev->ip_map.logical_to_dev_mask = aqua_vanjaram_logical_to_dev_mask;
 }

 /* Fixed pattern for smn addressing on different AIDs:
diff --git a/drivers/gpu/drm/amd/amdgpu/soc15_common.h b/drivers/gpu/drm/amd/amdgpu/soc15_common.h
index 3730c5ec202f..96948a59f8dd 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc15_common.h
+++ b/drivers/gpu/drm/amd/amdgpu/soc15_common.h
@@ -25,7 +25,12 @@
 #define __SOC15_COMMON_H__

 /* GET_INST returns the physical instance corresponding to a logical instance */
-#define GET_INST(ip, inst) (adev->ip_map.logical_to_dev_inst? adev->ip_map.logical_to_dev_inst(adev, ip##_HWIP, inst): inst)
+#define GET_INST(ip, inst) \
+	(adev->ip_map.logical_to_dev_inst ? \
+	adev->ip_map.logical_to_dev_inst(adev, ip##_HWIP, inst) : inst)
+#define GET_MASK(ip, mask) \
+	(adev->ip_map.logical_to_dev_mask ? \
+	adev->ip_map.logical_to_dev_mask(adev, ip##_HWIP, mask) : mask)

 /* Register Access Macros */
 #define SOC15_REG_OFFSET(ip, inst, reg) (adev->reg_offset[ip##_HWIP][inst][reg##_BASE_IDX] + reg)
-- 
2.40.1
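For readers following along, the mask conversion in this patch can be tried outside the kernel. The sketch below is only an illustration: the `example_` names and the four-entry mapping table are hypothetical, and it walks the bits with a bounded loop instead of the `ffs()` loop used in the patch.

```c
#include <stdint.h>

#define EXAMPLE_NUM_INST 4

/* Hypothetical logical-to-physical instance table (index = logical instance). */
static const uint8_t example_dev_inst[EXAMPLE_NUM_INST] = { 2, 0, 3, 1 };

/* Translate each set bit of a logical instance mask into its physical bit. */
static uint32_t example_logical_to_dev_mask(uint32_t mask)
{
	uint32_t dev_mask = 0;
	unsigned int log_inst;

	for (log_inst = 0; log_inst < EXAMPLE_NUM_INST; log_inst++) {
		/* Bits beyond the table size are ignored in this sketch. */
		if (mask & (1u << log_inst))
			dev_mask |= 1u << example_dev_inst[log_inst];
	}

	return dev_mask;
}
```

With this table, logical mask 0x3 (instances 0 and 1) maps to physical instances 2 and 0, that is 0x5.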
[PATCH] drm/amdgpu: Enable IH CAM on GFX9.4.3
From: Mukul Joshi

This patch enables IH CAM on GFX9.4.3 ASIC.

Signed-off-by: Mukul Joshi
Reviewed-by: Felix Kuehling
Signed-off-by: Alex Deucher
---
 drivers/gpu/drm/amd/amdgpu/nbio_v7_9.c | 4 ++--
 drivers/gpu/drm/amd/amdgpu/vega20_ih.c | 3 ++-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v7_9.c b/drivers/gpu/drm/amd/amdgpu/nbio_v7_9.c
index e1552d645308..755259e96bbc 100644
--- a/drivers/gpu/drm/amd/amdgpu/nbio_v7_9.c
+++ b/drivers/gpu/drm/amd/amdgpu/nbio_v7_9.c
@@ -265,7 +265,7 @@ static void nbio_v7_9_ih_doorbell_range(struct amdgpu_device *adev,
 		ih_doorbell_range = REG_SET_FIELD(ih_doorbell_range,
 					DOORBELL0_CTRL_ENTRY_0,
 					BIF_DOORBELL0_RANGE_SIZE_ENTRY,
-					0x4);
+					0x8);

 		ih_doorbell_ctrl = REG_SET_FIELD(ih_doorbell_ctrl,
 					S2A_DOORBELL_ENTRY_1_CTRL,
@@ -278,7 +278,7 @@ static void nbio_v7_9_ih_doorbell_range(struct amdgpu_device *adev,
 					S2A_DOORBELL_PORT1_RANGE_OFFSET, 0);
 		ih_doorbell_ctrl = REG_SET_FIELD(ih_doorbell_ctrl,
 					S2A_DOORBELL_ENTRY_1_CTRL,
-					S2A_DOORBELL_PORT1_RANGE_SIZE, 0x4);
+					S2A_DOORBELL_PORT1_RANGE_SIZE, 0x8);
 		ih_doorbell_ctrl = REG_SET_FIELD(ih_doorbell_ctrl,
 					S2A_DOORBELL_ENTRY_1_CTRL,
 					S2A_DOORBELL_PORT1_AWADDR_31_28_VALUE, 0);
diff --git a/drivers/gpu/drm/amd/amdgpu/vega20_ih.c b/drivers/gpu/drm/amd/amdgpu/vega20_ih.c
index 17ccf02462ab..4d719df376a7 100644
--- a/drivers/gpu/drm/amd/amdgpu/vega20_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/vega20_ih.c
@@ -334,7 +334,8 @@ static int vega20_ih_irq_init(struct amdgpu_device *adev)
 		vega20_setup_retry_doorbell(adev->irq.retry_cam_doorbell_index));

 	/* Enable IH Retry CAM */
-	if (adev->ip_versions[OSSSYS_HWIP][0] == IP_VERSION(4, 4, 0))
+	if (adev->ip_versions[OSSSYS_HWIP][0] == IP_VERSION(4, 4, 0) ||
+	    adev->ip_versions[OSSSYS_HWIP][0] == IP_VERSION(4, 4, 2))
 		WREG32_FIELD15(OSSSYS, 0, IH_RETRY_INT_CAM_CNTL_ALDEBARAN,
 			       ENABLE, 1);
 	else
-- 
2.40.1
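As a reading aid, the version test added to vega20_ih_irq_init() can be modelled as a small predicate. This is a rough sketch under assumptions: `EXAMPLE_IP_VERSION` packs major/minor/revision into one integer in the way the driver's `IP_VERSION()` is assumed to, and `example_has_ih_retry_cam()` is a hypothetical helper, not a function in the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed packing: major in bits 23:16, minor in bits 15:8, revision in bits 7:0. */
#define EXAMPLE_IP_VERSION(mj, mn, rv) \
	(((uint32_t)(mj) << 16) | ((uint32_t)(mn) << 8) | (uint32_t)(rv))

/* Hypothetical helper: true for the OSSSYS versions that take the CAM enable path. */
static bool example_has_ih_retry_cam(uint32_t osssys_version)
{
	switch (osssys_version) {
	case EXAMPLE_IP_VERSION(4, 4, 0):
	case EXAMPLE_IP_VERSION(4, 4, 2):
		return true;
	default:
		return false;
	}
}
```

A switch keeps the list easy to extend if further versions need the same path, which is a style choice of the sketch and not something the patch does.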
[PATCH 27/29] drm/amdgpu: route ioctls on primary node of XCPs to primary device
From: Shiwu Zhang

During XCP init, unlike for the primary device, no amdgpu_device is
attached to each XCP's drm_device.

If a user tries to open or close the primary node of an XCP drm_device,
this rerouting avoids the NULL pointer issue caused by referring to any
member of the amdgpu_device:

BUG: unable to handle page fault for address: 00020c80
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
Oops: 0002 [#1] PREEMPT SMP NOPTI
Call Trace:
 lock_timer_base+0x6b/0x90
 try_to_del_timer_sync+0x2b/0x80
 del_timer_sync+0x29/0x40
 flush_delayed_work+0x1c/0x50
 amdgpu_driver_open_kms+0x2c/0x280 [amdgpu]
 drm_file_alloc+0x1b3/0x260 [drm]
 drm_open+0xaa/0x280 [drm]
 drm_stub_open+0xa2/0x120 [drm]
 chrdev_open+0xa6/0x1c0

Signed-off-by: Shiwu Zhang
Reviewed-by: Le Ma
Signed-off-by: Alex Deucher
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c
index 610c32c4f5af..daeb6bcc9245 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c
@@ -241,6 +241,7 @@ static int amdgpu_xcp_dev_alloc(struct amdgpu_device *adev)

 		/* Redirect all IOCTLs to the primary device */
 		p_ddev->render->dev = ddev;
+		p_ddev->primary->dev = ddev;
 		p_ddev->vma_offset_manager = ddev->vma_offset_manager;
 		adev->xcp_mgr->xcp[i].ddev = p_ddev;
 	}
-- 
2.40.1
[PATCH 29/29] drm/amdgpu: Correct get_xcp_mem_id calculation
From: Philip Yang

The current calculation only works for NPS4/QPX mode; correct it for
NPS4/CPX mode.

Signed-off-by: Philip Yang
Reviewed-by: Lijo Lazar
Signed-off-by: Alex Deucher
---
 drivers/gpu/drm/amd/amdgpu/aqua_vanjaram_reg_init.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram_reg_init.c b/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram_reg_init.c
index 4ca932a62ce6..93e9f947a85d 100644
--- a/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram_reg_init.c
+++ b/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram_reg_init.c
@@ -518,10 +518,9 @@ static int aqua_vanjaram_switch_partition_mode(struct amdgpu_xcp_mgr *xcp_mgr,
 static int __aqua_vanjaram_get_xcp_mem_id(struct amdgpu_device *adev,
 					  int xcc_id, uint8_t *mem_id)
 {
-	/* TODO: Check if any validation is required based on current
-	 * memory/spatial modes
-	 */
+	/* memory/spatial modes validation check is already done */
 	*mem_id = xcc_id / adev->gfx.num_xcc_per_xcp;
+	*mem_id /= adev->xcp_mgr->num_xcp_per_mem_partition;

 	return 0;
 }
-- 
2.40.1
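To see why the extra division matters, here is a small worked sketch of the two-step calculation. The function name is hypothetical, and the example configuration is an assumption made for illustration only: a device with 8 XCCs and 4 memory partitions, where CPX is taken to mean one XCC per XCP (two XCPs per memory partition) and QPX two XCCs per XCP (one XCP per memory partition).

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the patched calculation:
 * XCC -> XCP index first, then XCP -> memory partition index.
 */
static uint8_t example_xcp_mem_id(int xcc_id, int num_xcc_per_xcp,
				  int num_xcp_per_mem_partition)
{
	int xcp_id = xcc_id / num_xcc_per_xcp;

	return (uint8_t)(xcp_id / num_xcp_per_mem_partition);
}
```

Under those assumptions, XCC 5 lands in memory partition 2 in both modes. Without the second division the CPX case would yield 5, which is out of range for four partitions; in the QPX case the divisor is 1, which is why the old code happened to work there.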
[PATCH 18/29] drm/amdkfd: Update MTYPE for far memory partition
From: Philip Yang Use MTYPE RW/MTYPE_CC for mapping system memory or VRAM to KFD node within the same memory partition, use MTYPE_NC for mapping on KFD node from the far memory partition of the same socket or from another socket on same XGMI hive. On NPS4 or 4P system, MTYPE will be overridden per page depending on the memory NUMA node id and vm->mem_id. Signed-off-by: Philip Yang Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 15 +++ drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 9 + 2 files changed, 12 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c index 7dfe6a8ca91a..ee5d4d67b423 100644 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c @@ -1191,7 +1191,7 @@ static void gmc_v9_0_get_coherence_flags(struct amdgpu_device *adev, bool is_vram = bo->tbo.resource->mem_type == TTM_PL_VRAM; bool coherent = bo->flags & AMDGPU_GEM_CREATE_COHERENT; bool uncached = bo->flags & AMDGPU_GEM_CREATE_UNCACHED; - /* TODO: memory partitions struct amdgpu_vm *vm = mapping->bo_va->base.vm;*/ + struct amdgpu_vm *vm = mapping->bo_va->base.vm; unsigned int mtype_local, mtype; bool snoop = false; bool is_local; @@ -1252,8 +1252,8 @@ static void gmc_v9_0_get_coherence_flags(struct amdgpu_device *adev, } is_local = (!is_vram && (adev->flags & AMD_IS_APU) && num_possible_nodes() <= 1) || - (is_vram && adev == bo_adev /* TODO: memory partitions && - bo->mem_id == vm->mem_id*/); + (is_vram && adev == bo_adev && + bo->mem_id == vm->mem_id); snoop = true; if (uncached) { mtype = MTYPE_UC; @@ -1340,13 +1340,12 @@ static void gmc_v9_0_override_vm_pte_flags(struct amdgpu_device *adev, return; } - /* TODO: memory partitions. mem_id is hard-coded to 0 for now. -* FIXME: Only supported on native mode for now. For carve-out, the + /* FIXME: Only supported on native mode for now. 
For carve-out, the * NUMA affinity of the GPU/VM needs to come from the PCI info because * memory partitions are not associated with different NUMA nodes. */ - if (adev->gmc.is_app_apu) { - local_node = adev->gmc.mem_partitions[/*vm->mem_id*/0].numa.node; + if (adev->gmc.is_app_apu && vm->mem_id >= 0) { + local_node = adev->gmc.mem_partitions[vm->mem_id].numa.node; } else { dev_dbg(adev->dev, "Only native mode APU is supported.\n"); return; @@ -1361,7 +1360,7 @@ static void gmc_v9_0_override_vm_pte_flags(struct amdgpu_device *adev, } nid = pfn_to_nid(addr >> PAGE_SHIFT); dev_dbg(adev->dev, "vm->mem_id=%d, local_node=%d, nid=%d\n", - /*vm->mem_id*/0, local_node, nid); + vm->mem_id, local_node, nid); if (nid == local_node) { uint64_t old_flags = *flags; unsigned int mtype_local = MTYPE_RW; diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c index f6a886d9e902..8b5453fd304a 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c @@ -1202,8 +1202,8 @@ svm_range_get_pte_flags(struct kfd_node *node, mapping_flags |= AMDGPU_VM_MTYPE_UC; } else if (domain == SVM_RANGE_VRAM_DOMAIN) { /* local HBM region close to partition */ - if (bo_node->adev == node->adev /* TODO: memory partitions && - bo_node->mem_id == node->mem_id*/) + if (bo_node->adev == node->adev && + (!bo_node->xcp || !node->xcp || bo_node->xcp->mem_id == node->xcp->mem_id)) mapping_flags |= mtype_local; /* local HBM region far from partition or remote XGMI GPU */ else if (svm_nodes_in_same_hive(bo_node, node)) @@ -1357,8 +1357,9 @@ svm_range_map_to_gpu(struct kfd_process_device *pdd, struct svm_range *prange, (last_domain == SVM_RANGE_VRAM_DOMAIN) ? 1 : 0, pte_flags); - /* TODO: we still need to determine the vm_manager.vram_base_offset based on -* the memory partition. 
+ /* For dGPU mode, we use same vm_manager to allocate VRAM for +* different memory partition based on fpfn/lpfn, we should use +* same vm_manager.vram_base_offset regardless memory partition. */ r = amdgpu_vm_update_range(adev, vm, false, false, flush_tlb, NULL,
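The locality test that the kfd_svm.c hunk introduces can be read as a three-way condition. Below is a minimal standalone sketch of that predicate; the `example_` types are hypothetical simplifications (an integer stands in for the `adev` pointer), and the sketch covers only the "local" decision, not the MTYPE values chosen afterwards.

```c
#include <stdbool.h>
#include <stddef.h>

struct example_xcp {
	int mem_id;			/* memory partition of this XCP */
};

struct example_node {
	int adev_id;			/* stands in for the adev pointer */
	const struct example_xcp *xcp;	/* NULL when the node has no XCP */
};

/* True when the BO's VRAM counts as local to the mapping node. */
static bool example_is_local_vram(const struct example_node *bo_node,
				  const struct example_node *node)
{
	if (bo_node->adev_id != node->adev_id)
		return false;

	/* Without partition info on either side, same device means local. */
	if (!bo_node->xcp || !node->xcp)
		return true;

	return bo_node->xcp->mem_id == node->xcp->mem_id;
}
```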
[PATCH 24/29] drm/amdkfd: Move local_mem_info to kfd_node
From: Mukul Joshi We need to track memory usage on a per partition basis. To do that, store the local memory information in KFD node instead of kfd device. v2: squash in fix ("amdkfd: Use mem_id to access mem_partition info") Signed-off-by: Mukul Joshi Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 17 + drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 12 +++- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 7 --- drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_device.c| 7 +-- drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 3 ++- drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 7 --- 7 files changed, 36 insertions(+), 19 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c index 00edb13d2124..85df73f2c85e 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c @@ -428,14 +428,23 @@ uint32_t amdgpu_amdkfd_get_fw_version(struct amdgpu_device *adev, } void amdgpu_amdkfd_get_local_mem_info(struct amdgpu_device *adev, - struct kfd_local_mem_info *mem_info) + struct kfd_local_mem_info *mem_info, + uint8_t xcp_id) { memset(mem_info, 0, sizeof(*mem_info)); - mem_info->local_mem_size_public = adev->gmc.visible_vram_size; - mem_info->local_mem_size_private = adev->gmc.real_vram_size - + if (adev->ip_versions[GC_HWIP][0] == IP_VERSION(9, 4, 3)) { + if (adev->gmc.real_vram_size == adev->gmc.visible_vram_size) + mem_info->local_mem_size_public = + KFD_XCP_MEMORY_SIZE(adev, xcp_id); + else + mem_info->local_mem_size_private = + KFD_XCP_MEMORY_SIZE(adev, xcp_id); + } else { + mem_info->local_mem_size_public = adev->gmc.visible_vram_size; + mem_info->local_mem_size_private = adev->gmc.real_vram_size - adev->gmc.visible_vram_size; - + } mem_info->vram_width = adev->gmc.vram_width; pr_debug("Address base: %pap public 0x%llx private 0x%llx\n", diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h index 4e6221bccffe..4bf6f5659568 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h @@ -231,7 +231,8 @@ int amdgpu_amdkfd_remove_gws_from_process(void *info, void *mem); uint32_t amdgpu_amdkfd_get_fw_version(struct amdgpu_device *adev, enum kgd_engine_type type); void amdgpu_amdkfd_get_local_mem_info(struct amdgpu_device *adev, - struct kfd_local_mem_info *mem_info); + struct kfd_local_mem_info *mem_info, + uint8_t xcp_id); uint64_t amdgpu_amdkfd_get_gpu_clock_counter(struct amdgpu_device *adev); uint32_t amdgpu_amdkfd_get_max_engine_clock_in_mhz(struct amdgpu_device *adev); @@ -334,10 +335,11 @@ void amdgpu_amdkfd_unreserve_mem_limit(struct amdgpu_device *adev, ((adev)->xcp_mgr && (xcp_id) >= 0 ?\ (adev)->xcp_mgr->xcp[(xcp_id)].mem_id : -1) -#define KFD_XCP_MEMORY_SIZE(n) ((n)->adev->gmc.num_mem_partitions ?\ - (n)->adev->gmc.mem_partitions[(n)->xcp->mem_id].size /\ - (n)->adev->xcp_mgr->num_xcp_per_mem_partition :\ - (n)->adev->gmc.real_vram_size) +#define KFD_XCP_MEMORY_SIZE(adev, xcp_id)\ + ((adev)->gmc.num_mem_partitions && (xcp_id) >= 0 ?\ + (adev)->gmc.mem_partitions[KFD_XCP_MEM_ID((adev), (xcp_id))].size /\ + (adev)->xcp_mgr->num_xcp_per_mem_partition :\ + (adev)->gmc.real_vram_size) #if IS_ENABLED(CONFIG_HSA_AMD) void amdgpu_amdkfd_gpuvm_init_mem_limits(void); diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c index 344b238d6771..089e1d498670 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c @@ -1023,11 +1023,12 @@ bool kfd_dev_is_large_bar(struct kfd_node *dev) if (dev->kfd->use_iommu_v2) return false; - if (dev->kfd->local_mem_info.local_mem_size_private == 0 && - dev->kfd->local_mem_info.local_mem_size_public > 0) + if (dev->local_mem_info.local_mem_size_private == 0 && + dev->local_mem_info.local_mem_size_public > 0) return true; - if 
(dev->kfd->local_mem_info.local_mem_size_public == 0 && dev->kfd->adev->gmc.is_app_apu) { + if (dev->local_mem_info.local_mem_size_public == 0 && +
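The reworked KFD_XCP_MEMORY_SIZE() macro divides a memory partition evenly among the XCPs that share it and falls back to the whole VRAM size otherwise. A hedged sketch of that rule as a plain function follows; `struct example_dev` and its fields are hypothetical stand-ins for the driver structures, with array sizes picked arbitrarily.

```c
#include <stdint.h>

#define EXAMPLE_MAX_PARTITIONS 4
#define EXAMPLE_MAX_XCP 8

/* Hypothetical, flattened view of the fields the macro reads. */
struct example_dev {
	unsigned int num_mem_partitions;	/* 0 when memory is not partitioned */
	unsigned int num_xcp_per_mem_partition;
	uint64_t real_vram_size;
	uint64_t partition_size[EXAMPLE_MAX_PARTITIONS];
	int xcp_mem_id[EXAMPLE_MAX_XCP];	/* memory partition of each XCP */
};

/* Per-XCP share of its memory partition, or all of VRAM as a fallback. */
static uint64_t example_xcp_memory_size(const struct example_dev *dev, int xcp_id)
{
	if (!dev->num_mem_partitions || xcp_id < 0)
		return dev->real_vram_size;

	return dev->partition_size[dev->xcp_mem_id[xcp_id]] /
	       dev->num_xcp_per_mem_partition;
}
```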
[PATCH 23/29] drm/amdgpu: use xcp partition ID for amdgpu_gem
From: James Zhu

Find xcp_id from amdgpu_fpriv, use it for amdgpu_gem_object_create.

Signed-off-by: James Zhu
Acked-by: Felix Kuehling
Signed-off-by: Alex Deucher
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index b02d106d5a0c..aad860667ab5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -336,7 +336,7 @@ int amdgpu_gem_create_ioctl(struct drm_device *dev, void *data,
 retry:
 	r = amdgpu_gem_object_create(adev, size, args->in.alignment,
 				     initial_domain,
-				     flags, ttm_bo_type_device, resv, &gobj, 0);
+				     flags, ttm_bo_type_device, resv, &gobj, fpriv->xcp_id + 1);
 	if (r && r != -ERESTARTSYS) {
 		if (flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) {
 			flags &= ~AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED;
@@ -379,6 +379,7 @@ int amdgpu_gem_userptr_ioctl(struct drm_device *dev, void *data,
 	struct ttm_operation_ctx ctx = { true, false };
 	struct amdgpu_device *adev = drm_to_adev(dev);
 	struct drm_amdgpu_gem_userptr *args = data;
+	struct amdgpu_fpriv *fpriv = filp->driver_priv;
 	struct drm_gem_object *gobj;
 	struct hmm_range *range;
 	struct amdgpu_bo *bo;
@@ -405,7 +406,7 @@ int amdgpu_gem_userptr_ioctl(struct drm_device *dev, void *data,

 	/* create a gem object to contain this object in */
 	r = amdgpu_gem_object_create(adev, args->size, 0, AMDGPU_GEM_DOMAIN_CPU,
-				     0, ttm_bo_type_device, NULL, &gobj, 0);
+				     0, ttm_bo_type_device, NULL, &gobj, fpriv->xcp_id + 1);
 	if (r)
 		return r;

@@ -908,6 +909,7 @@ int amdgpu_mode_dumb_create(struct drm_file *file_priv,
 			    struct drm_mode_create_dumb *args)
 {
 	struct amdgpu_device *adev = drm_to_adev(dev);
+	struct amdgpu_fpriv *fpriv = file_priv->driver_priv;
 	struct drm_gem_object *gobj;
 	uint32_t handle;
 	u64 flags = AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED |
@@ -931,7 +933,7 @@ int amdgpu_mode_dumb_create(struct drm_file *file_priv,
 	domain = amdgpu_bo_get_preferred_domain(adev,
 				amdgpu_display_supported_domains(adev, flags));
 	r = amdgpu_gem_object_create(adev, args->size, 0, domain, flags,
-				     ttm_bo_type_device, NULL, &gobj, 0);
+				     ttm_bo_type_device, NULL, &gobj, fpriv->xcp_id + 1);
 	if (r)
 		return -ENOMEM;

-- 
2.40.1
[PATCH 19/29] drm/amdgpu: Alloc page table on correct memory partition
From: Philip Yang

Allocating a kernel-mode page table BO uses amdgpu_vm->mem_id + 1 as
the bp mem_id_plus1 parameter.

In APU mode, this selects the correct TTM pool so that pages are
allocated from the corresponding memory partition, which is the
closest NUMA node.

In dGPU mode, it selects the correct address range for the VRAM
manager.

Signed-off-by: Philip Yang
Reviewed-by: Felix Kuehling
Signed-off-by: Alex Deucher
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
index 60b1da93b06d..62fc7e8d326e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
@@ -534,6 +534,8 @@ int amdgpu_vm_pt_create(struct amdgpu_device *adev, struct amdgpu_vm *vm,

 	bp.type = ttm_bo_type_kernel;
 	bp.no_wait_gpu = immediate;
+	bp.mem_id_plus1 = vm->mem_id + 1;
+
 	if (vm->root.bo)
 		bp.resv = vm->root.bo->tbo.base.resv;

@@ -558,6 +560,7 @@ int amdgpu_vm_pt_create(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 	bp.type = ttm_bo_type_kernel;
 	bp.resv = bo->tbo.base.resv;
 	bp.bo_ptr_size = sizeof(struct amdgpu_bo);
+	bp.mem_id_plus1 = vm->mem_id + 1;

 	r = amdgpu_bo_create(adev, &bp, &(*vmbo)->shadow);

-- 
2.40.1
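A note on the "plus1" convention used here: storing the ID plus one appears to let a zero-initialized parameter mean "no specific partition" (ID -1), so callers that never set the field keep the old behaviour. The helpers below are hypothetical and only illustrate that encoding; they are not functions in the driver.

```c
#include <stdint.h>

#define EXAMPLE_ID_NONE (-1)	/* hypothetical name for "no specific partition" */

/* Encode an ID (or EXAMPLE_ID_NONE) so that "none" becomes 0. */
static int8_t example_encode_plus1(int8_t id)
{
	return (int8_t)(id + 1);
}

/* Recover the ID; a zeroed field decodes to EXAMPLE_ID_NONE. */
static int8_t example_decode_plus1(int8_t id_plus1)
{
	return (int8_t)(id_plus1 - 1);
}
```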
[PATCH 15/29] drm/amdkfd: Alloc memory of GPU support memory partition
From: Philip Yang For dGPU mode VRAM allocation, create amdgpu_bo from amdgpu_vm->mem_id, to alloc from the correct memory range. For APU mode VRAM allocation, set alloc domain to GTT, and set bp->mem_id_plus1 from amdgpu_vm->mem_id + 1 to create amdgpu_bo, to allocate system memory from correct NUMA node. For GTT allocation, use mem_id -1 to allocate system memory from any NUMA nodes. Remove amdgpu_ttm_tt_set_mem_pool, to avoid the confusion that memory maybe allocated from different mem_id. Signed-off-by: Philip Yang Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 24 ++- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 20 +--- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 1 - 3 files changed, 8 insertions(+), 37 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index 6d0c25e34af1..71b22d61dd27 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @@ -1640,9 +1640,9 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu( struct drm_gem_object *gobj = NULL; u32 domain, alloc_domain; uint64_t aligned_size; + int8_t mem_id = -1; u64 alloc_flags; int ret; - int mem_id = 0; /* Fixme : to be changed when mem_id support patch lands, until then NPS1, SPX only */ /* * Check on which domain to allocate BO @@ -1652,13 +1652,14 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu( if (adev->gmc.is_app_apu) { domain = AMDGPU_GEM_DOMAIN_GTT; - alloc_domain = AMDGPU_GEM_DOMAIN_CPU; + alloc_domain = AMDGPU_GEM_DOMAIN_GTT; alloc_flags = 0; } else { alloc_flags = AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE; alloc_flags |= (flags & KFD_IOC_ALLOC_MEM_FLAGS_PUBLIC) ? 
AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED : 0; } + mem_id = avm->mem_id; } else if (flags & KFD_IOC_ALLOC_MEM_FLAGS_GTT) { domain = alloc_domain = AMDGPU_GEM_DOMAIN_GTT; alloc_flags = 0; @@ -1716,11 +1717,12 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu( goto err_reserve_limit; } - pr_debug("\tcreate BO VA 0x%llx size 0x%llx domain %s\n", - va, (*mem)->aql_queue ? size << 1 : size, domain_string(alloc_domain)); + pr_debug("\tcreate BO VA 0x%llx size 0x%llx domain %s mem_id %d\n", +va, (*mem)->aql_queue ? size << 1 : size, +domain_string(alloc_domain), mem_id); ret = amdgpu_gem_object_create(adev, aligned_size, 1, alloc_domain, alloc_flags, - bo_type, NULL, , 0); + bo_type, NULL, , mem_id + 1); if (ret) { pr_debug("Failed to create BO on domain %s. ret %d\n", domain_string(alloc_domain), ret); @@ -1746,17 +1748,6 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu( (*mem)->mapped_to_gpu_memory = 0; (*mem)->process_info = avm->process_info; - if (adev->gmc.is_app_apu && - ((*mem)->alloc_flags & KFD_IOC_ALLOC_MEM_FLAGS_VRAM)) { - bo->allowed_domains = AMDGPU_GEM_DOMAIN_GTT; - bo->preferred_domains = AMDGPU_GEM_DOMAIN_GTT; - ret = amdgpu_ttm_tt_set_mem_pool(>tbo, mem_id); - if (ret) { - pr_debug("failed to set ttm mem pool %d\n", ret); - goto err_set_mem_partition; - } - } - add_kgd_mem_to_kfd_bo_list(*mem, avm->process_info, user_addr); if (user_addr) { @@ -1783,7 +1774,6 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu( allocate_init_user_pages_failed: err_pin_bo: remove_kgd_mem_from_kfd_bo_list(*mem, avm->process_info); -err_set_mem_partition: drm_vma_node_revoke(>vma_node, drm_priv); err_node_allow: /* Don't unreserve system mem limit twice */ diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index 254927c596ba..395edca3b7f9 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -1064,7 +1064,7 @@ static struct ttm_tt *amdgpu_ttm_tt_create(struct ttm_buffer_object *bo, return NULL; } 
gtt->gobj = >base; - gtt->pool_id = NUMA_NO_NODE; + gtt->pool_id = abo->mem_id; if (abo->flags & AMDGPU_GEM_CREATE_CPU_GTT_USWC) caching = ttm_write_combined; @@ -1159,24 +1159,6 @@ static void amdgpu_ttm_tt_unpopulate(struct ttm_device *bdev, return ttm_pool_free(pool, ttm); } -/** - * amdgpu_ttm_tt_set_mem_pool - Set the TTM memory pool for the
[PATCH 25/29] drm/amdkfd: Fix memory reporting on GFX 9.4.3
From: Mukul Joshi This patch fixes memory reporting on the GFX 9.4.3 APU and dGPU by reporting available memory on a per partition basis. If its an APU, available and used memory calculations take into account system and TTM memory. v2: squash in fix ("drm/amdkfd: Fix array out of bound warning") squash in fix ("drm/amdgpu: Update memory reporting for GFX9.4.3") Signed-off-by: Mukul Joshi Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h| 12 +-- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 81 ++- drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.h | 5 ++ drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 3 +- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 14 ++-- 5 files changed, 84 insertions(+), 31 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h index 4bf6f5659568..948d362adabb 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h @@ -35,6 +35,7 @@ #include #include "amdgpu_sync.h" #include "amdgpu_vm.h" +#include "amdgpu_xcp.h" extern uint64_t amdgpu_amdkfd_total_mem_size; @@ -98,8 +99,8 @@ struct amdgpu_amdkfd_fence { struct amdgpu_kfd_dev { struct kfd_dev *dev; - int64_t vram_used; - uint64_t vram_used_aligned; + int64_t vram_used[MAX_XCP]; + uint64_t vram_used_aligned[MAX_XCP]; bool init_complete; struct work_struct reset_work; @@ -287,7 +288,8 @@ int amdgpu_amdkfd_gpuvm_acquire_process_vm(struct amdgpu_device *adev, void amdgpu_amdkfd_gpuvm_release_process_vm(struct amdgpu_device *adev, void *drm_priv); uint64_t amdgpu_amdkfd_gpuvm_get_process_page_dir(void *drm_priv); -size_t amdgpu_amdkfd_get_available_memory(struct amdgpu_device *adev); +size_t amdgpu_amdkfd_get_available_memory(struct amdgpu_device *adev, + uint8_t xcp_id); int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu( struct amdgpu_device *adev, uint64_t va, uint64_t size, void *drm_priv, struct kgd_mem **mem, @@ -327,9 +329,9 @@ void 
amdgpu_amdkfd_block_mmu_notifications(void *p); int amdgpu_amdkfd_criu_resume(void *p); bool amdgpu_amdkfd_ras_query_utcl2_poison_status(struct amdgpu_device *adev); int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev, - uint64_t size, u32 alloc_flag); + uint64_t size, u32 alloc_flag, int8_t xcp_id); void amdgpu_amdkfd_unreserve_mem_limit(struct amdgpu_device *adev, - uint64_t size, u32 alloc_flag); + uint64_t size, u32 alloc_flag, int8_t xcp_id); #define KFD_XCP_MEM_ID(adev, xcp_id) \ ((adev)->xcp_mgr && (xcp_id) >= 0 ?\ diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index cf8f80e4ef56..fa4057da0d7f 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @@ -156,12 +156,13 @@ void amdgpu_amdkfd_reserve_system_mem(uint64_t size) * Return: returns -ENOMEM in case of error, ZERO otherwise */ int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev, - uint64_t size, u32 alloc_flag) + uint64_t size, u32 alloc_flag, int8_t xcp_id) { uint64_t reserved_for_pt = ESTIMATE_PT_SIZE(amdgpu_amdkfd_total_mem_size); size_t system_mem_needed, ttm_mem_needed, vram_needed; int ret = 0; + uint64_t vram_size = 0; system_mem_needed = 0; ttm_mem_needed = 0; @@ -176,6 +177,17 @@ int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev, * 2M BO chunk. 
*/ vram_needed = size; + /* +* For GFX 9.4.3, get the VRAM size from XCP structs +*/ + if (WARN_ONCE(xcp_id < 0, "invalid XCP ID %d", xcp_id)) + return -EINVAL; + + vram_size = KFD_XCP_MEMORY_SIZE(adev, xcp_id); + if (adev->gmc.is_app_apu) { + system_mem_needed = size; + ttm_mem_needed = size; + } } else if (alloc_flag & KFD_IOC_ALLOC_MEM_FLAGS_USERPTR) { system_mem_needed = size; } else if (!(alloc_flag & @@ -195,8 +207,8 @@ int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev, kfd_mem_limit.max_system_mem_limit && !no_system_mem_limit) || (kfd_mem_limit.ttm_mem_used + ttm_mem_needed > kfd_mem_limit.max_ttm_mem_limit) || - (adev && adev->kfd.vram_used + vram_needed > -adev->gmc.real_vram_size - reserved_for_pt)) { + (adev && xcp_id >= 0 && adev->kfd.vram_used[xcp_id] + vram_needed > +vram_size - reserved_for_pt)) { ret = -ENOMEM;
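The per-partition limit check in this hunk compares the usage of one XCP against that XCP's share of VRAM instead of the whole device. A simplified sketch under assumptions: the function and array are hypothetical, the accounting update after a successful check is assumed (that part of the patch is not shown above), and the comparison mirrors the patch without guarding against `reserved_for_pt` exceeding `vram_size`.

```c
#include <errno.h>
#include <stdint.h>

#define EXAMPLE_MAX_XCP 8

/*
 * Hypothetical per-XCP reservation: fail when the request would push this
 * partition's usage past its VRAM share minus the page-table reserve.
 */
static int example_reserve_vram(uint64_t vram_used[EXAMPLE_MAX_XCP], int xcp_id,
				uint64_t vram_size, uint64_t reserved_for_pt,
				uint64_t vram_needed)
{
	if (xcp_id < 0 || xcp_id >= EXAMPLE_MAX_XCP)
		return -EINVAL;

	if (vram_used[xcp_id] + vram_needed > vram_size - reserved_for_pt)
		return -ENOMEM;

	/* Assumed follow-up step: charge the request to this partition only. */
	vram_used[xcp_id] += vram_needed;
	return 0;
}
```

The point of the array is that exhausting one partition does not block allocations charged to another.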
[PATCH 22/29] drm/amdgpu: KFD graphics interop support compute partition
From: Philip Yang kfd_ioctl_get_dmabuf use the amdgpu bo xcp_id to get the gpu_id of the KFD node from the exported dmabuf_adev, and then create kfd bo on the correct adev and KFD node when importing the amdgpu bo to KFD. Remove function kfd_device_by_adev, it is not needed as it is the same result as dmabuf_adev->kfd.dev->nodes[0]->id. Signed-off-by: Philip Yang Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 4 +++- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 14 ++ drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 1 - drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 18 -- 5 files changed, 10 insertions(+), 29 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c index bbbfe9ec4adf..00edb13d2124 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c @@ -498,7 +498,7 @@ int amdgpu_amdkfd_get_dmabuf_info(struct amdgpu_device *adev, int dma_buf_fd, struct amdgpu_device **dmabuf_adev, uint64_t *bo_size, void *metadata_buffer, size_t buffer_size, uint32_t *metadata_size, - uint32_t *flags) + uint32_t *flags, int8_t *xcp_id) { struct dma_buf *dma_buf; struct drm_gem_object *obj; @@ -542,6 +542,8 @@ int amdgpu_amdkfd_get_dmabuf_info(struct amdgpu_device *adev, int dma_buf_fd, if (bo->flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) *flags |= KFD_IOC_ALLOC_MEM_FLAGS_PUBLIC; } + if (xcp_id) + *xcp_id = bo->xcp_id; out_put: dma_buf_put(dma_buf); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h index 05c54776951b..4e6221bccffe 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h @@ -241,7 +241,7 @@ int amdgpu_amdkfd_get_dmabuf_info(struct amdgpu_device *adev, int dma_buf_fd, struct amdgpu_device **dmabuf_adev, uint64_t *bo_size, void *metadata_buffer, size_t buffer_size, 
uint32_t *metadata_size, - uint32_t *flags); + uint32_t *flags, int8_t *xcp_id); uint8_t amdgpu_amdkfd_get_xgmi_hops_count(struct amdgpu_device *dst, struct amdgpu_device *src); int amdgpu_amdkfd_get_xgmi_bandwidth_mbytes(struct amdgpu_device *dst, diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c index 8c86d69938ea..344b238d6771 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c @@ -1499,6 +1499,7 @@ static int kfd_ioctl_get_dmabuf_info(struct file *filep, struct amdgpu_device *dmabuf_adev; void *metadata_buffer = NULL; uint32_t flags; + int8_t xcp_id; unsigned int i; int r; @@ -1519,17 +1520,14 @@ static int kfd_ioctl_get_dmabuf_info(struct file *filep, r = amdgpu_amdkfd_get_dmabuf_info(dev->adev, args->dmabuf_fd, _adev, >size, metadata_buffer, args->metadata_size, - >metadata_size, ); + >metadata_size, , _id); if (r) goto exit; - /* Reverse-lookup gpu_id from kgd pointer */ - dev = kfd_device_by_adev(dmabuf_adev); - if (!dev) { - r = -EINVAL; - goto exit; - } - args->gpu_id = dev->id; + if (xcp_id >= 0) + args->gpu_id = dmabuf_adev->kfd.dev->nodes[xcp_id]->id; + else + args->gpu_id = dmabuf_adev->kfd.dev->nodes[0]->id; args->flags = flags; /* Copy metadata buffer to user mode */ diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h index 214d950f948e..44f4d5509db6 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h @@ -1068,7 +1068,6 @@ struct kfd_topology_device *kfd_topology_device_by_proximity_domain_no_lock( struct kfd_topology_device *kfd_topology_device_by_id(uint32_t gpu_id); struct kfd_node *kfd_device_by_id(uint32_t gpu_id); struct kfd_node *kfd_device_by_pci_dev(const struct pci_dev *pdev); -struct kfd_node *kfd_device_by_adev(const struct amdgpu_device *adev); static inline bool kfd_irq_is_from_node(struct kfd_node *node, uint32_t node_id, uint32_t vmid) { diff --git 
a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
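The gpu_id lookup that replaces kfd_device_by_adev() boils down to indexing the node array by the BO's xcp_id, with node 0 as the fallback. A tiny sketch of that selection, using a hypothetical flat array of node IDs in place of the driver's node structures:

```c
#include <stdint.h>

/* Pick the KFD node ID for an imported BO; a negative xcp_id falls back to node 0. */
static uint32_t example_dmabuf_gpu_id(const uint32_t *node_ids, int xcp_id)
{
	return node_ids[xcp_id >= 0 ? xcp_id : 0];
}
```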
[PATCH 21/29] drm/amdkfd: Store xcp partition id to amdgpu bo
From: Philip Yang For memory accounting per compute partition and export drm amdgpu bo and then import to KFD, we need the xcp id to account the memory usage or find the KFD node of the original amdgpu bo to create the KFD bo on the correct adev KFD node. Set xcp_id_plus1 of amdgpu_bo_param to create bo and store xcp_id to amddgpu bo. Add helper macro to get the mem_id from adev and xcp_id. v2: squash in fix ("drm/amdgpu: Fix BO creation failure on GFX 9.4.3 dGPU") Signed-off-by: Philip Yang Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 4 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 11 ++- drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 4 ++-- drivers/gpu/drm/amd/amdgpu/amdgpu_gem.h | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 15 ++- drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 12 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 6 +- drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c| 5 +++-- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c| 2 +- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 4 ++-- 10 files changed, 42 insertions(+), 23 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h index 324cb566ca2f..05c54776951b 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h @@ -330,6 +330,10 @@ int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev, void amdgpu_amdkfd_unreserve_mem_limit(struct amdgpu_device *adev, uint64_t size, u32 alloc_flag); +#define KFD_XCP_MEM_ID(adev, xcp_id) \ + ((adev)->xcp_mgr && (xcp_id) >= 0 ?\ + (adev)->xcp_mgr->xcp[(xcp_id)].mem_id : -1) + #define KFD_XCP_MEMORY_SIZE(n) ((n)->adev->gmc.num_mem_partitions ?\ (n)->adev->gmc.mem_partitions[(n)->xcp->mem_id].size /\ (n)->adev->xcp_mgr->num_xcp_per_mem_partition :\ diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index 71b22d61dd27..cf8f80e4ef56 100644 --- 
a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @@ -1633,6 +1633,7 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu( uint64_t *offset, uint32_t flags, bool criu_resume) { struct amdgpu_vm *avm = drm_priv_to_vm(drm_priv); + struct amdgpu_fpriv *fpriv = container_of(avm, struct amdgpu_fpriv, vm); enum ttm_bo_type bo_type = ttm_bo_type_device; struct sg_table *sg = NULL; uint64_t user_addr = 0; @@ -1640,7 +1641,7 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu( struct drm_gem_object *gobj = NULL; u32 domain, alloc_domain; uint64_t aligned_size; - int8_t mem_id = -1; + int8_t xcp_id = -1; u64 alloc_flags; int ret; @@ -1659,7 +1660,7 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu( alloc_flags |= (flags & KFD_IOC_ALLOC_MEM_FLAGS_PUBLIC) ? AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED : 0; } - mem_id = avm->mem_id; + xcp_id = fpriv->xcp_id == ~0 ? 0 : fpriv->xcp_id; } else if (flags & KFD_IOC_ALLOC_MEM_FLAGS_GTT) { domain = alloc_domain = AMDGPU_GEM_DOMAIN_GTT; alloc_flags = 0; @@ -1717,12 +1718,12 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu( goto err_reserve_limit; } - pr_debug("\tcreate BO VA 0x%llx size 0x%llx domain %s mem_id %d\n", + pr_debug("\tcreate BO VA 0x%llx size 0x%llx domain %s xcp_id %d\n", va, (*mem)->aql_queue ? size << 1 : size, -domain_string(alloc_domain), mem_id); +domain_string(alloc_domain), xcp_id); ret = amdgpu_gem_object_create(adev, aligned_size, 1, alloc_domain, alloc_flags, - bo_type, NULL, , mem_id + 1); + bo_type, NULL, , xcp_id + 1); if (ret) { pr_debug("Failed to create BO on domain %s. 
ret %d\n", domain_string(alloc_domain), ret); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c index 01029b495f5a..b02d106d5a0c 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c @@ -97,7 +97,7 @@ int amdgpu_gem_object_create(struct amdgpu_device *adev, unsigned long size, int alignment, u32 initial_domain, u64 flags, enum ttm_bo_type type, struct dma_resv *resv, -struct drm_gem_object **obj, int8_t mem_id_plus1) +struct drm_gem_object **obj, int8_t xcp_id_plus1) { struct amdgpu_bo *bo;
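A minimal sketch of the lookup that KFD_XCP_MEM_ID performs in this patch, together with the fallback used in amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu() when the file's xcp_id is still unset. All toy_* types and names are hypothetical stand-ins for illustration, not the driver's real structures.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver structures; not the real layouts. */
struct toy_xcp { int8_t mem_id; };
struct toy_xcp_mgr { struct toy_xcp xcp[8]; };
struct toy_dev { struct toy_xcp_mgr *xcp_mgr; };

/* Same shape as KFD_XCP_MEM_ID: -1 means "no specific memory partition". */
static int toy_xcp_mem_id(const struct toy_dev *dev, int xcp_id)
{
	if (!dev->xcp_mgr || xcp_id < 0)
		return -1;
	return dev->xcp_mgr->xcp[xcp_id].mem_id;
}

/* An unset id (all bits set, the ~0 in the patch) falls back to partition 0. */
static int toy_effective_xcp_id(uint32_t xcp_id)
{
	return xcp_id == UINT32_MAX ? 0 : (int)xcp_id;
}
```

Keeping -1 as the "any partition" value lets callers without an xcp manager go through the same code path unchanged.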
[PATCH 17/29] drm/amdgpu: dGPU mode placement support memory partition
From: Philip Yang

In dGPU mode the VRAM manager validates the bo. The amdgpu bo placement uses mem_id to get the allocation range (first and last page frame number) of the memory partition from the xcp manager, and passes it to the drm buddy allocator as the allowed range.

Signed-off-by: Philip Yang
Reviewed-by: Felix Kuehling
Signed-off-by: Alex Deucher
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 155b62971a33..cfa14b56c419 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -132,13 +132,18 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain)
 	if (domain & AMDGPU_GEM_DOMAIN_VRAM) {
 		unsigned visible_pfn = adev->gmc.visible_vram_size >> PAGE_SHIFT;
 
-		places[c].fpfn = 0;
-		places[c].lpfn = 0;
+		if (adev->gmc.mem_partitions && abo->mem_id >= 0) {
+			places[c].fpfn = adev->gmc.mem_partitions[abo->mem_id].range.fpfn;
+			places[c].lpfn = adev->gmc.mem_partitions[abo->mem_id].range.lpfn;
+		} else {
+			places[c].fpfn = 0;
+			places[c].lpfn = 0;
+		}
 		places[c].mem_type = TTM_PL_VRAM;
 		places[c].flags = 0;
 
 		if (flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED)
-			places[c].lpfn = visible_pfn;
+			places[c].lpfn = min_not_zero(places[c].lpfn, visible_pfn);
 		else if (adev->gmc.real_vram_size != adev->gmc.visible_vram_size)
 			places[c].flags |= TTM_PL_FLAG_TOPDOWN;
 
-- 
2.40.1
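The placement logic in this patch can be sketched in plain C as follows. This is an illustration under assumptions: toy_range and toy_vram_place are hypothetical names, and toy_min_not_zero only mimics the behaviour of the kernel's min_not_zero(), where a zero operand means "no limit".

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical page-frame range, mirroring the fpfn/lpfn pair of a place. */
struct toy_range { uint32_t fpfn, lpfn; };

/* Like min_not_zero(): zero means "no limit", so it never wins. */
static uint32_t toy_min_not_zero(uint32_t a, uint32_t b)
{
	if (a == 0)
		return b;
	if (b == 0)
		return a;
	return a < b ? a : b;
}

/*
 * part == NULL stands for "no memory partition": the range stays 0..0,
 * which the allocator treats as unrestricted. With CPU access required,
 * the upper bound is additionally capped by the visible VRAM size.
 */
static struct toy_range toy_vram_place(const struct toy_range *part,
				       uint32_t visible_pfn, int cpu_access)
{
	struct toy_range place = { 0, 0 };

	if (part)
		place = *part;
	if (cpu_access)
		place.lpfn = toy_min_not_zero(place.lpfn, visible_pfn);
	return place;
}
```

Without a partition, the CPU-access case still ends up with lpfn equal to visible_pfn, the same result as before the patch.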
[PATCH 26/29] drm/amdkfd: APU mode set max svm range pages
From: Philip Yang svm_migrate_init set the max svm range pages based on the KFD nodes partition size. APU mode don't init pgmap because there is no migration. kgd2kfd_device_init calls svm_migrate_init after KFD nodes allocation and initialization. Signed-off-by: Philip Yang Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdkfd/kfd_device.c | 5 ++--- drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 7 +-- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 15 ++- 3 files changed, 17 insertions(+), 10 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c index d41da964d2f5..882ff86bba08 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c @@ -724,9 +724,6 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd, kfd_cwsr_init(kfd); - svm_migrate_init(kfd->adev); - - dev_info(kfd_device, "Total number of KFD nodes to be created: %d\n", kfd->num_nodes); @@ -794,6 +791,8 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd, kfd->nodes[i] = node; } + svm_migrate_init(kfd->adev); + if (kfd_resume_iommu(kfd)) goto kfd_resume_iommu_error; diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c index 199d32c7c289..2512bf681112 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c @@ -1000,6 +1000,11 @@ int svm_migrate_init(struct amdgpu_device *adev) if (!KFD_IS_SOC15(kfddev->dev)) return -EINVAL; + svm_range_set_max_pages(adev); + + if (adev->gmc.is_app_apu) + return 0; + pgmap = >pgmap; memset(pgmap, 0, sizeof(*pgmap)); @@ -1042,8 +1047,6 @@ int svm_migrate_init(struct amdgpu_device *adev) amdgpu_amdkfd_reserve_system_mem(SVM_HMM_PAGE_STRUCT_SIZE(size)); - svm_range_set_max_pages(adev); - pr_info("HMM registered %ldMB device memory\n", size >> 20); return 0; diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c index 2dbbdad3f392..41dacc015983 100644 --- 
a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c @@ -1937,14 +1937,19 @@ void svm_range_set_max_pages(struct amdgpu_device *adev) { uint64_t max_pages; uint64_t pages, _pages; + uint64_t min_pages = 0; + int i; + + for (i = 0; i < adev->kfd.dev->num_nodes; i++) { + pages = KFD_XCP_MEMORY_SIZE(adev, adev->kfd.dev->nodes[i]->xcp->id) >> 17; + pages = clamp(pages, 1ULL << 9, 1ULL << 18); + pages = rounddown_pow_of_two(pages); + min_pages = min_not_zero(min_pages, pages); + } - /* 1/32 VRAM size in pages */ - pages = adev->gmc.real_vram_size >> 17; - pages = clamp(pages, 1ULL << 9, 1ULL << 18); - pages = rounddown_pow_of_two(pages); do { max_pages = READ_ONCE(max_svm_range_pages); - _pages = min_not_zero(max_pages, pages); + _pages = min_not_zero(max_pages, min_pages); } while (cmpxchg(_svm_range_pages, max_pages, _pages) != max_pages); } -- 2.40.1
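The per-node calculation in svm_range_set_max_pages() can be sketched as below. This is a simplified illustration, not the driver code: the toy_* helpers are hypothetical re-implementations of clamp(), rounddown_pow_of_two() and min_not_zero(), the ">> 17" is read as "1/32 of the size in 4 KiB pages" (an assumption about the page size), and the cmpxchg loop that publishes the result is left out.

```c
#include <stdint.h>

static uint64_t toy_clamp_u64(uint64_t v, uint64_t lo, uint64_t hi)
{
	return v < lo ? lo : (v > hi ? hi : v);
}

/* Largest power of two that is <= v; v must be non-zero. */
static uint64_t toy_rounddown_pow_of_two(uint64_t v)
{
	uint64_t p = 1;

	while (p <= v / 2)
		p *= 2;
	return p;
}

static uint64_t toy_min_not_zero_u64(uint64_t a, uint64_t b)
{
	if (a == 0)
		return b;
	if (b == 0)
		return a;
	return a < b ? a : b;
}

/* 1/32 of the node memory in pages, clamped to [2^9, 2^18], power of two. */
static uint64_t toy_node_range_pages(uint64_t mem_bytes)
{
	uint64_t pages = mem_bytes >> 17;

	pages = toy_clamp_u64(pages, 1ULL << 9, 1ULL << 18);
	return toy_rounddown_pow_of_two(pages);
}

/* The smallest per-node value wins, so every node can hold a full range. */
static uint64_t toy_max_range_pages(const uint64_t *node_bytes, int num_nodes)
{
	uint64_t min_pages = 0;
	int i;

	for (i = 0; i < num_nodes; i++)
		min_pages = toy_min_not_zero_u64(min_pages,
						 toy_node_range_pages(node_bytes[i]));
	return min_pages;
}
```

For example, nodes with 48 GiB and 24 GiB of memory give 262144 and 131072 pages respectively, so the limit becomes 131072 pages.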
[PATCH 28/29] drm/amdkfd: Refactor migrate init to support partition switch
From: Philip Yang Rename smv_migrate_init to a better name kgd2kfd_init_zone_device because it setup zone devive pgmap for page migration and keep it in kfd_migrate.c to access static functions svm_migrate_pgmap_ops. Call it only once in amdgpu_device_ip_init after adev ip blocks are initialized, but before amdgpu_amdkfd_device_init initialize kfd nodes which enable SVM support based on pgmap. svm_range_set_max_pages is called by kgd2kfd_device_init everytime after switching compute partition mode. Signed-off-by: Philip Yang Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 11 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +++- drivers/gpu/drm/amd/amdkfd/kfd_device.c| 3 ++- drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 8 +++- drivers/gpu/drm/amd/amdkfd/kfd_migrate.h | 9 - drivers/gpu/drm/amd/amdkfd/kfd_svm.h | 4 6 files changed, 23 insertions(+), 16 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h index 948d362adabb..48d12dbff968 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h @@ -372,6 +372,17 @@ void amdgpu_amdkfd_release_notify(struct amdgpu_bo *bo) { } #endif + +#if IS_ENABLED(CONFIG_HSA_AMD_SVM) +int kgd2kfd_init_zone_device(struct amdgpu_device *adev); +#else +static inline +int kgd2kfd_init_zone_device(struct amdgpu_device *adev) +{ + return 0; +} +#endif + /* KGD2KFD callbacks */ int kgd2kfd_quiesce_mm(struct mm_struct *mm, uint32_t trigger); int kgd2kfd_resume_mm(struct mm_struct *mm); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 321b689db601..9c1a8ace6c31 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -2632,8 +2632,10 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev) goto init_failed; /* Don't init kfd if whole hive need to be reset during init 
*/ - if (!adev->gmc.xgmi.pending_reset) + if (!adev->gmc.xgmi.pending_reset) { + kgd2kfd_init_zone_device(adev); amdgpu_amdkfd_device_init(adev); + } amdgpu_fru_get_product_info(adev); diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c index 882ff86bba08..bf32e547182c 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c @@ -32,6 +32,7 @@ #include "kfd_iommu.h" #include "amdgpu_amdkfd.h" #include "kfd_smi_events.h" +#include "kfd_svm.h" #include "kfd_migrate.h" #include "amdgpu.h" #include "amdgpu_xcp.h" @@ -791,7 +792,7 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd, kfd->nodes[i] = node; } - svm_migrate_init(kfd->adev); + svm_range_set_max_pages(kfd->adev); if (kfd_resume_iommu(kfd)) goto kfd_resume_iommu_error; diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c index 2512bf681112..35cf6558cf1b 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c @@ -988,7 +988,7 @@ static const struct dev_pagemap_ops svm_migrate_pgmap_ops = { /* Each VRAM page uses sizeof(struct page) on system memory */ #define SVM_HMM_PAGE_STRUCT_SIZE(size) ((size)/PAGE_SIZE * sizeof(struct page)) -int svm_migrate_init(struct amdgpu_device *adev) +int kgd2kfd_init_zone_device(struct amdgpu_device *adev) { struct amdgpu_kfd_dev *kfddev = >kfd; struct dev_pagemap *pgmap; @@ -996,12 +996,10 @@ int svm_migrate_init(struct amdgpu_device *adev) unsigned long size; void *r; - /* Page migration works on Vega10 or newer */ - if (!KFD_IS_SOC15(kfddev->dev)) + /* Page migration works on gfx9 or newer */ + if (adev->ip_versions[GC_HWIP][0] < IP_VERSION(9, 0, 1)) return -EINVAL; - svm_range_set_max_pages(adev); - if (adev->gmc.is_app_apu) return 0; diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.h b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.h index a5d7e6d22264..487f26368164 100644 --- 
a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.h @@ -47,15 +47,6 @@ int svm_migrate_vram_to_ram(struct svm_range *prange, struct mm_struct *mm, unsigned long svm_migrate_addr_to_pfn(struct amdgpu_device *adev, unsigned long addr); -int svm_migrate_init(struct amdgpu_device *adev); - -#else - -static inline int svm_migrate_init(struct amdgpu_device *adev) -{ - return 0; -} - #endif /* IS_ENABLED(CONFIG_HSA_AMD_SVM) */ #endif /* KFD_MIGRATE_H_ */ diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.h b/drivers/gpu/drm/amd/amdkfd/kfd_svm.h index 021def496f5a..762679835e31 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.h +++
[PATCH 16/29] drm/amdkfd: SVM range allocation support memory partition
From: Philip Yang Pass kfd node->xcp->mem_id to amdgpu bo create parameter mem_id_plus1 to allocate new svm_bo on the specified memory partition. This is only for dGPU mode as we don't migrate with APU mode. Signed-off-by: Philip Yang Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c index c5675c7e3b9e..f6a886d9e902 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c @@ -554,16 +554,20 @@ svm_range_vram_node_new(struct kfd_node *node, struct svm_range *prange, bp.flags |= AMDGPU_GEM_CREATE_DISCARDABLE; bp.type = ttm_bo_type_device; bp.resv = NULL; + if (node->xcp) + bp.mem_id_plus1 = node->xcp->mem_id + 1; - /* TODO: Allocate memory from the right memory partition. We can sort -* out the details later, once basic memory partitioning is working -*/ r = amdgpu_bo_create_user(node->adev, , ); if (r) { pr_debug("failed %d to create bo\n", r); goto create_bo_failed; } bo = >bo; + + pr_debug("alloc bo at offset 0x%lx size 0x%lx on partition %d\n", +bo->tbo.resource->start << PAGE_SHIFT, bp.size, +bp.mem_id_plus1 - 1); + r = amdgpu_bo_reserve(bo, true); if (r) { pr_debug("failed %d to reserve bo\n", r); -- 2.40.1
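A small sketch of the "plus one" encoding used by mem_id_plus1. The struct and helper names are hypothetical; the point illustrated is an inference from the patches, namely that a zero-filled parameter block decodes to mem_id -1, so callers that never set the field keep the "any partition" behaviour.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, reduced version of a BO create parameter block. */
struct toy_bo_param {
	uint64_t size;
	int8_t mem_id_plus1;	/* 0 = no partition, n + 1 = partition n */
};

static void toy_param_set_mem_id(struct toy_bo_param *bp, int mem_id)
{
	bp->mem_id_plus1 = (int8_t)(mem_id + 1);
}

static int toy_param_mem_id(const struct toy_bo_param *bp)
{
	return bp->mem_id_plus1 - 1;
}
```

This matches the debug print in the patch, which reports bp.mem_id_plus1 - 1 as the partition.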
[PATCH 14/29] drm/amdgpu: Add memory partition mem_id to amdgpu_bo
From: Philip Yang Add mem_id_plus1 parameter to amdgpu_gem_object_create and pass it to amdgpu_bo_create. For dGPU mode allocation, mem_id is used by VRAM manager to get the memory partition fpfn, lpfn from xcp manager. For APU native mode allocation, mem_id is used to get NUMA node id from xcp manager, then pass to TTM as numa pool id to alloc memory from the specific NUMA node. mem_id -1 means for entire VRAM or any NUMA nodes. Signed-off-by: Philip Yang Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 4 ++-- drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 9 + drivers/gpu/drm/amd/amdgpu/amdgpu_gem.h | 3 +-- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 3 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 5 + 6 files changed, 17 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index 12149b317b88..6d0c25e34af1 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @@ -289,7 +289,7 @@ create_dmamap_sg_bo(struct amdgpu_device *adev, ret = amdgpu_gem_object_create(adev, mem->bo->tbo.base.size, 1, AMDGPU_GEM_DOMAIN_CPU, AMDGPU_GEM_CREATE_PREEMPTIBLE | flags, - ttm_bo_type_sg, mem->bo->tbo.base.resv, _obj); + ttm_bo_type_sg, mem->bo->tbo.base.resv, _obj, 0); amdgpu_bo_unreserve(mem->bo); @@ -1720,7 +1720,7 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu( va, (*mem)->aql_queue ? size << 1 : size, domain_string(alloc_domain)); ret = amdgpu_gem_object_create(adev, aligned_size, 1, alloc_domain, alloc_flags, - bo_type, NULL, ); + bo_type, NULL, , 0); if (ret) { pr_debug("Failed to create BO on domain %s. 
ret %d\n", domain_string(alloc_domain), ret); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c index e97b1eef2c9d..8b162f05d1fd 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c @@ -335,7 +335,7 @@ amdgpu_dma_buf_create_obj(struct drm_device *dev, struct dma_buf *dma_buf) ret = amdgpu_gem_object_create(adev, dma_buf->size, PAGE_SIZE, AMDGPU_GEM_DOMAIN_CPU, flags, - ttm_bo_type_sg, resv, ); + ttm_bo_type_sg, resv, , 0); if (ret) goto error; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c index 6936cd63df42..01029b495f5a 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c @@ -97,7 +97,7 @@ int amdgpu_gem_object_create(struct amdgpu_device *adev, unsigned long size, int alignment, u32 initial_domain, u64 flags, enum ttm_bo_type type, struct dma_resv *resv, -struct drm_gem_object **obj) +struct drm_gem_object **obj, int8_t mem_id_plus1) { struct amdgpu_bo *bo; struct amdgpu_bo_user *ubo; @@ -115,6 +115,7 @@ int amdgpu_gem_object_create(struct amdgpu_device *adev, unsigned long size, bp.flags = flags; bp.domain = initial_domain; bp.bo_ptr_size = sizeof(struct amdgpu_bo); + bp.mem_id_plus1 = mem_id_plus1; r = amdgpu_bo_create_user(adev, , ); if (r) @@ -335,7 +336,7 @@ int amdgpu_gem_create_ioctl(struct drm_device *dev, void *data, retry: r = amdgpu_gem_object_create(adev, size, args->in.alignment, initial_domain, -flags, ttm_bo_type_device, resv, ); +flags, ttm_bo_type_device, resv, , 0); if (r && r != -ERESTARTSYS) { if (flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) { flags &= ~AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED; @@ -404,7 +405,7 @@ int amdgpu_gem_userptr_ioctl(struct drm_device *dev, void *data, /* create a gem object to contain this object in */ r = amdgpu_gem_object_create(adev, args->size, 0, AMDGPU_GEM_DOMAIN_CPU, -0, ttm_bo_type_device, NULL, ); +0, 
ttm_bo_type_device, NULL, , 0); if (r) return r; @@ -930,7 +931,7 @@ int amdgpu_mode_dumb_create(struct drm_file *file_priv, domain = amdgpu_bo_get_preferred_domain(adev, amdgpu_display_supported_domains(adev, flags)); r =
[PATCH 20/29] drm/amdgpu: dGPU mode set VRAM range lpfn as exclusive
From: Philip Yang

The TTM place lpfn is exclusive: drm and the buddy allocator use it as the end of the range (start + size). The adev->gmc memory partition range lpfn is inclusive (start + size - 1), so add 1 when setting the TTM place lpfn.

Signed-off-by: Philip Yang
Reviewed-by: Felix Kuehling
Signed-off-by: Alex Deucher
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index cfa14b56c419..3002d431ce3d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -134,7 +134,11 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain)
 
 		if (adev->gmc.mem_partitions && abo->mem_id >= 0) {
 			places[c].fpfn = adev->gmc.mem_partitions[abo->mem_id].range.fpfn;
-			places[c].lpfn = adev->gmc.mem_partitions[abo->mem_id].range.lpfn;
+			/*
+			 * memory partition range lpfn is inclusive start + size - 1
+			 * TTM place lpfn is exclusive start + size
+			 */
+			places[c].lpfn = adev->gmc.mem_partitions[abo->mem_id].range.lpfn + 1;
 		} else {
 			places[c].fpfn = 0;
 			places[c].lpfn = 0;
-- 
2.40.1
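The inclusive-to-exclusive conversion described in this patch, as a tiny sketch. The two struct types are hypothetical and only serve to keep the two conventions apart.

```c
#include <stdint.h>

/* Partition range: lpfn is the last valid page (start + size - 1). */
struct toy_part_range { uint32_t fpfn, lpfn; };

/* Place range: lpfn is one past the last valid page (start + size). */
struct toy_place_range { uint32_t fpfn, lpfn; };

static struct toy_place_range toy_place_from_partition(struct toy_part_range r)
{
	struct toy_place_range p = { r.fpfn, r.lpfn + 1 };

	return p;
}

/* With an exclusive end, the page count is a plain difference. */
static uint32_t toy_place_num_pages(struct toy_place_range p)
{
	return p.lpfn - p.fpfn;
}
```

Without the + 1, the last page of every partition would be left out of the allowed range.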
[PATCH 07/29] drm/amdgpu: add partition schedule for GC(9, 4, 3)
From: James Zhu Implement partition schedule for GC(9, 4, 3). Signed-off-by: James Zhu Acked-by: Lijo Lazar Signed-off-by: Alex Deucher --- .../drm/amd/amdgpu/aqua_vanjaram_reg_init.c | 41 +++ 1 file changed, 41 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram_reg_init.c b/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram_reg_init.c index 073ae95e6dd6..4ca932a62ce6 100644 --- a/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram_reg_init.c +++ b/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram_reg_init.c @@ -166,6 +166,46 @@ static int aqua_vanjaram_update_partition_sched_list(struct amdgpu_device *adev) return aqua_vanjaram_xcp_sched_list_update(adev); } +int aqua_vanjaram_select_scheds( + struct amdgpu_device *adev, + u32 hw_ip, + u32 hw_prio, + struct amdgpu_fpriv *fpriv, + unsigned int *num_scheds, + struct drm_gpu_scheduler ***scheds) +{ + u32 sel_xcp_id; + int i; + + if (fpriv->xcp_id == ~0) { + u32 least_ref_cnt = ~0; + + fpriv->xcp_id = 0; + for (i = 0; i < adev->xcp_mgr->num_xcps; i++) { + u32 total_ref_cnt; + + total_ref_cnt = atomic_read(>xcp_mgr->xcp[i].ref_cnt); + if (total_ref_cnt < least_ref_cnt) { + fpriv->xcp_id = i; + least_ref_cnt = total_ref_cnt; + } + } + } + sel_xcp_id = fpriv->xcp_id; + + if (adev->xcp_mgr->xcp[sel_xcp_id].gpu_sched[hw_ip][hw_prio].num_scheds) { + *num_scheds = adev->xcp_mgr->xcp[fpriv->xcp_id].gpu_sched[hw_ip][hw_prio].num_scheds; + *scheds = adev->xcp_mgr->xcp[fpriv->xcp_id].gpu_sched[hw_ip][hw_prio].sched; + atomic_inc(>xcp_mgr->xcp[sel_xcp_id].ref_cnt); + DRM_DEBUG("Selected partition #%d", sel_xcp_id); + } else { + DRM_ERROR("Failed to schedule partition #%d.", sel_xcp_id); + return -ENOENT; + } + + return 0; +} + static int8_t aqua_vanjaram_logical_to_dev_inst(struct amdgpu_device *adev, enum amd_hw_ip_block_type block, int8_t inst) @@ -548,6 +588,7 @@ struct amdgpu_xcp_mgr_funcs aqua_vanjaram_xcp_funcs = { .query_partition_mode = _vanjaram_query_partition_mode, .get_ip_details = _vanjaram_get_xcp_ip_details, .get_xcp_mem_id = 
_vanjaram_get_xcp_mem_id, + .select_scheds = _vanjaram_select_scheds, .update_partition_sched_list = _vanjaram_update_partition_sched_list }; -- 2.40.1
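The selection policy in aqua_vanjaram_select_scheds() can be sketched in userspace C with C11 atomics. This is an illustration only: toy_xcp_sel, toy_pick_partition and toy_select are hypothetical names, a negative id stands in for the driver's "unset" value of ~0, and the scheduler list itself is reduced to a count.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical, reduced view of one partition. */
struct toy_xcp_sel {
	atomic_uint ref_cnt;		/* entities currently using it */
	unsigned int num_scheds;	/* schedulers for the requested ip/prio */
};

/* Pick the partition with the fewest users; the first one wins on a tie. */
static int toy_pick_partition(struct toy_xcp_sel *xcp, int num_xcps)
{
	uint32_t least = UINT32_MAX;
	int pick = 0;
	int i;

	for (i = 0; i < num_xcps; i++) {
		uint32_t cnt = atomic_load(&xcp[i].ref_cnt);

		if (cnt < least) {
			pick = i;
			least = cnt;
		}
	}
	return pick;
}

/*
 * *xcp_id < 0 means "not chosen yet". Once chosen, the id sticks, and each
 * successful selection takes one reference on the partition.
 */
static int toy_select(struct toy_xcp_sel *xcp, int num_xcps, int *xcp_id)
{
	if (*xcp_id < 0)
		*xcp_id = toy_pick_partition(xcp, num_xcps);

	if (!xcp[*xcp_id].num_scheds)
		return -1;	/* the driver returns -ENOENT here */

	atomic_fetch_add(&xcp[*xcp_id].ref_cnt, 1);
	return 0;
}
```

The matching decrement happens when the entity is freed, so ref_cnt tracks how many live entities each partition serves.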
[PATCH 05/29] drm/amdgpu: add partition scheduler list update
From: James Zhu Add partition scheduler list update in late init and xcp partition mode switch. Signed-off-by: James Zhu Acked-by: Lijo Lazar Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c| 2 + drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c | 2 + .../drm/amd/amdgpu/aqua_vanjaram_reg_init.c | 67 ++- 3 files changed, 70 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 40c5845c78df..321b689db601 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -2473,6 +2473,8 @@ static int amdgpu_device_init_schedulers(struct amdgpu_device *adev) } } + amdgpu_xcp_update_partition_sched_list(adev); + return 0; } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c index 9b627a8b1d5c..78fce5aab218 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c @@ -118,6 +118,7 @@ static void __amdgpu_xcp_add_block(struct amdgpu_xcp_mgr *xcp_mgr, int xcp_id, int amdgpu_xcp_init(struct amdgpu_xcp_mgr *xcp_mgr, int num_xcps, int mode) { + struct amdgpu_device *adev = xcp_mgr->adev; struct amdgpu_xcp_ip ip; uint8_t mem_id; int i, j, ret; @@ -153,6 +154,7 @@ int amdgpu_xcp_init(struct amdgpu_xcp_mgr *xcp_mgr, int num_xcps, int mode) } xcp_mgr->num_xcps = num_xcps; + amdgpu_xcp_update_partition_sched_list(adev); return 0; } diff --git a/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram_reg_init.c b/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram_reg_init.c index c90ea34ef9ec..073ae95e6dd6 100644 --- a/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram_reg_init.c +++ b/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram_reg_init.c @@ -102,6 +102,70 @@ static void aqua_vanjaram_set_xcp_id(struct amdgpu_device *adev, } } +static void aqua_vanjaram_xcp_gpu_sched_update( + struct amdgpu_device *adev, + struct amdgpu_ring *ring, + unsigned int sel_xcp_id) +{ + unsigned int *num_gpu_sched; + + 
num_gpu_sched = >xcp_mgr->xcp[sel_xcp_id] + .gpu_sched[ring->funcs->type][ring->hw_prio].num_scheds; + adev->xcp_mgr->xcp[sel_xcp_id].gpu_sched[ring->funcs->type][ring->hw_prio] + .sched[(*num_gpu_sched)++] = >sched; + DRM_DEBUG("%s :[%d] gpu_sched[%d][%d] = %d", ring->name, + sel_xcp_id, ring->funcs->type, + ring->hw_prio, *num_gpu_sched); +} + +static int aqua_vanjaram_xcp_sched_list_update( + struct amdgpu_device *adev) +{ + struct amdgpu_ring *ring; + int i; + + for (i = 0; i < MAX_XCP; i++) { + atomic_set(>xcp_mgr->xcp[i].ref_cnt, 0); + memset(adev->xcp_mgr->xcp[i].gpu_sched, 0, sizeof(adev->xcp_mgr->xcp->gpu_sched)); + } + + if (adev->xcp_mgr->mode == AMDGPU_XCP_MODE_NONE) + return 0; + + for (i = 0; i < AMDGPU_MAX_RINGS; i++) { + ring = adev->rings[i]; + if (!ring || !ring->sched.ready) + continue; + + aqua_vanjaram_xcp_gpu_sched_update(adev, ring, ring->xcp_id); + + /* VCN is shared by two partitions under CPX MODE */ + if ((ring->funcs->type == AMDGPU_RING_TYPE_VCN_ENC || + ring->funcs->type == AMDGPU_RING_TYPE_VCN_JPEG) && + adev->xcp_mgr->mode == AMDGPU_CPX_PARTITION_MODE) + aqua_vanjaram_xcp_gpu_sched_update(adev, ring, ring->xcp_id + 1); + } + + return 0; +} + +static int aqua_vanjaram_update_partition_sched_list(struct amdgpu_device *adev) +{ + int i; + + for (i = 0; i < adev->num_rings; i++) { + struct amdgpu_ring *ring = adev->rings[i]; + + if (ring->funcs->type == AMDGPU_RING_TYPE_COMPUTE || + ring->funcs->type == AMDGPU_RING_TYPE_KIQ) + aqua_vanjaram_set_xcp_id(adev, ring->xcc_id, ring); + else + aqua_vanjaram_set_xcp_id(adev, ring->me, ring); + } + + return aqua_vanjaram_xcp_sched_list_update(adev); +} + static int8_t aqua_vanjaram_logical_to_dev_inst(struct amdgpu_device *adev, enum amd_hw_ip_block_type block, int8_t inst) @@ -483,7 +547,8 @@ struct amdgpu_xcp_mgr_funcs aqua_vanjaram_xcp_funcs = { .switch_partition_mode = _vanjaram_switch_partition_mode, .query_partition_mode = _vanjaram_query_partition_mode, .get_ip_details = 
_vanjaram_get_xcp_ip_details, - .get_xcp_mem_id = _vanjaram_get_xcp_mem_id + .get_xcp_mem_id = _vanjaram_get_xcp_mem_id, + .update_partition_sched_list =
[PATCH 08/29] drm/amdgpu: run partition schedule if it is supported
From: James Zhu

Run the partition schedule, if it is supported, when initializing a ctx entity.

Signed-off-by: James Zhu
Acked-by: Lijo Lazar
Signed-off-by: Alex Deucher
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index 06d68a08251a..e579bb054a58 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -222,8 +222,19 @@ static int amdgpu_ctx_init_entity(struct amdgpu_ctx *ctx, u32 hw_ip,
 	drm_prio = amdgpu_ctx_to_drm_sched_prio(ctx_prio);
 
 	hw_ip = array_index_nospec(hw_ip, AMDGPU_HW_IP_NUM);
-	scheds = adev->gpu_sched[hw_ip][hw_prio].sched;
-	num_scheds = adev->gpu_sched[hw_ip][hw_prio].num_scheds;
+
+	if (!(adev)->xcp_mgr) {
+		scheds = adev->gpu_sched[hw_ip][hw_prio].sched;
+		num_scheds = adev->gpu_sched[hw_ip][hw_prio].num_scheds;
+	} else {
+		struct amdgpu_fpriv *fpriv;
+
+		fpriv = container_of(ctx->ctx_mgr, struct amdgpu_fpriv, ctx_mgr);
+		r = amdgpu_xcp_select_scheds(adev, hw_ip, hw_prio, fpriv,
+						&num_scheds, &scheds);
+		if (r)
+			goto cleanup_entity;
+	}
 
 	/* disable load balance if the hw engine retains context among dependent jobs */
 	if (hw_ip == AMDGPU_HW_IP_VCN_ENC ||
-- 
2.40.1
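The container_of() step in this patch recovers the per-file private data from the context manager pointer stored in the ctx. A userspace sketch of the same pattern, with hypothetical toy_* types and a simplified macro that skips the type checking the kernel version does:

```c
#include <stddef.h>

/* Simplified container_of(): step back from a member to its enclosing struct. */
#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct toy_ctx_mgr { int unused; };

/* The manager is embedded in the per-file data, not pointed to. */
struct toy_fpriv {
	unsigned int xcp_id;
	struct toy_ctx_mgr ctx_mgr;
};

/* Each ctx only remembers its manager. */
struct toy_ctx { struct toy_ctx_mgr *ctx_mgr; };

static struct toy_fpriv *toy_ctx_to_fpriv(const struct toy_ctx *ctx)
{
	return toy_container_of(ctx->ctx_mgr, struct toy_fpriv, ctx_mgr);
}
```

This only works because the manager is embedded in the fpriv structure. A manager allocated on its own would make the computed pointer invalid.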
[PATCH 09/29] drm/amdgpu: update ref_cnt before ctx free
From: James Zhu Update ref_cnt before ctx free. Signed-off-by: James Zhu Acked-by: Lijo Lazar Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 7 +-- drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c | 16 drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.h | 2 ++ 3 files changed, 23 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c index e579bb054a58..3ccd709ae76a 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c @@ -266,7 +266,8 @@ static int amdgpu_ctx_init_entity(struct amdgpu_ctx *ctx, u32 hw_ip, return r; } -static ktime_t amdgpu_ctx_fini_entity(struct amdgpu_ctx_entity *entity) +static ktime_t amdgpu_ctx_fini_entity(struct amdgpu_device *adev, + struct amdgpu_ctx_entity *entity) { ktime_t res = ns_to_ktime(0); int i; @@ -279,6 +280,8 @@ static ktime_t amdgpu_ctx_fini_entity(struct amdgpu_ctx_entity *entity) dma_fence_put(entity->fences[i]); } + amdgpu_xcp_release_sched(adev, entity); + kfree(entity); return res; } @@ -412,7 +415,7 @@ static void amdgpu_ctx_fini(struct kref *ref) for (j = 0; j < AMDGPU_MAX_ENTITY_NUM; ++j) { ktime_t spend; - spend = amdgpu_ctx_fini_entity(ctx->entities[i][j]); + spend = amdgpu_ctx_fini_entity(adev, ctx->entities[i][j]); atomic64_add(ktime_to_ns(spend), >time_spend[i]); } } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c index 78fce5aab218..9b960ba0b7ac 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c @@ -366,3 +366,19 @@ int amdgpu_xcp_open_device(struct amdgpu_device *adev, return 0; } +void amdgpu_xcp_release_sched(struct amdgpu_device *adev, + struct amdgpu_ctx_entity *entity) +{ + struct drm_gpu_scheduler *sched; + struct amdgpu_ring *ring; + + if (!adev->xcp_mgr) + return; + + sched = entity->entity.rq->sched; + if (sched->ready) { + ring = to_amdgpu_ring(entity->entity.rq->sched); + 
atomic_dec(>xcp_mgr->xcp[ring->xcp_id].ref_cnt); + } +} + diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.h index cca06d38b03d..39aca87ce204 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.h @@ -128,6 +128,8 @@ void amdgpu_xcp_dev_unplug(struct amdgpu_device *adev); int amdgpu_xcp_open_device(struct amdgpu_device *adev, struct amdgpu_fpriv *fpriv, struct drm_file *file_priv); +void amdgpu_xcp_release_sched(struct amdgpu_device *adev, + struct amdgpu_ctx_entity *entity); #define amdgpu_xcp_select_scheds(adev, e, c, d, x, y) \ ((adev)->xcp_mgr && (adev)->xcp_mgr->funcs && \ -- 2.40.1
[PATCH 06/29] drm/amdgpu: keep amdgpu_ctx_mgr in ctx structure
From: James Zhu

Keep a pointer to amdgpu_ctx_mgr in the ctx structure so that a ctx can be traced back to its fpriv.

Signed-off-by: James Zhu
Acked-by: Lijo Lazar
Signed-off-by: Alex Deucher
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index e3d047663d61..06d68a08251a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -332,6 +332,7 @@ static int amdgpu_ctx_init(struct amdgpu_ctx_mgr *mgr, int32_t priority,
 	else
 		ctx->stable_pstate = current_stable_pstate;
 
+	ctx->ctx_mgr = &(fpriv->ctx_mgr);
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
index 5fd79f94e2d0..85376baaa92f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
@@ -57,6 +57,7 @@ struct amdgpu_ctx {
 	unsigned long			ras_counter_ce;
 	unsigned long			ras_counter_ue;
 	uint32_t			stable_pstate;
+	struct amdgpu_ctx_mgr		*ctx_mgr;
 };
 
 struct amdgpu_ctx_mgr {
-- 
2.40.1
[PATCH 11/29] drm/amdkfd: Store drm node minor number for kfd nodes
From: Philip Yang

From the KFD topology, an application finds each kfd node together with the minor number of the corresponding drm device node. For example, if the partition drm nodes start at /dev/dri/renderD129, then KFD node 0 will store drm node minor number 129. The application opens drm node /dev/dri/renderD129 to create the amdgpu vm for kfd node 0, with the correct vm->mem_id to indicate the memory partition.

Signed-off-by: Philip Yang
Reviewed-by: Felix Kuehling
Signed-off-by: Alex Deucher
---
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 6d6243b978e1..a8e25aecf839 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -1942,8 +1942,12 @@ int kfd_topology_add_device(struct kfd_node *gpu)
 		amdgpu_amdkfd_get_max_engine_clock_in_mhz(dev->gpu->adev);
 	dev->node_props.max_engine_clk_ccompute =
 		cpufreq_quick_get_max(0) / 1000;
-	dev->node_props.drm_render_minor =
-		gpu->kfd->shared_resources.drm_render_minor;
+
+	if (gpu->xcp)
+		dev->node_props.drm_render_minor = gpu->xcp->ddev->render->index;
+	else
+		dev->node_props.drm_render_minor =
+				gpu->kfd->shared_resources.drm_render_minor;
 
 	dev->node_props.hive_id = gpu->kfd->hive_id;
 	dev->node_props.num_sdma_engines = kfd_get_num_sdma_engines(gpu);
-- 
2.40.1
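On the application side, the stored minor number is what turns a KFD node into a render node path. A minimal sketch, assuming the usual /dev/dri/renderD<minor> naming mentioned in the commit message; toy_render_node_path is a hypothetical helper, not part of any library.

```c
#include <stdio.h>

/*
 * Build the render node path for a drm render minor number.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int toy_render_node_path(char *buf, size_t len, int minor)
{
	int n = snprintf(buf, len, "/dev/dri/renderD%d", minor);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

An application would read drm_render_minor from the topology for the KFD node it wants and open the resulting path to get a vm bound to that partition.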
[PATCH 13/29] drm/amdkfd: Show KFD node memory partition info
From: Philip Yang Show KFD node memory partition id and size, add helper function KFD_XCP_MEMORY_SIZE to get kfd node memory size, will be used later to support memory accounting per partition. Signed-off-by: Philip Yang Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 5 + drivers/gpu/drm/amd/amdkfd/kfd_device.c| 7 ++- 2 files changed, 11 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h index e4e1dbba060a..324cb566ca2f 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h @@ -330,6 +330,11 @@ int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev, void amdgpu_amdkfd_unreserve_mem_limit(struct amdgpu_device *adev, uint64_t size, u32 alloc_flag); +#define KFD_XCP_MEMORY_SIZE(n) ((n)->adev->gmc.num_mem_partitions ?\ + (n)->adev->gmc.mem_partitions[(n)->xcp->mem_id].size /\ + (n)->adev->xcp_mgr->num_xcp_per_mem_partition :\ + (n)->adev->gmc.real_vram_size) + #if IS_ENABLED(CONFIG_HSA_AMD) void amdgpu_amdkfd_gpuvm_init_mem_limits(void); void amdgpu_amdkfd_gpuvm_destroy_cb(struct amdgpu_device *adev, diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c index b5497d2ee984..db5b53fcdf11 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c @@ -724,7 +724,6 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd, kfd_cwsr_init(kfd); - /* TODO: Needs to be updated for memory partitioning */ svm_migrate_init(kfd->adev); amdgpu_amdkfd_get_local_mem_info(kfd->adev, >local_mem_info); @@ -754,6 +753,12 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd, (1U << NUM_XCC(kfd->adev->gfx.xcc_mask)) - 1; } + if (node->xcp) { + dev_info(kfd_device, "KFD node %d partition %d size %lldM\n", + node->node_id, node->xcp->mem_id, + KFD_XCP_MEMORY_SIZE(node) >> 20); + } + if (KFD_GC_VERSION(kfd) == IP_VERSION(9, 4, 3) && 
partition_mode == AMDGPU_CPX_PARTITION_MODE && kfd->num_nodes != 1) { -- 2.40.1
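The size selection done by KFD_XCP_MEMORY_SIZE can be sketched outside the driver. This is a minimal model, not the driver code: struct node_model, struct mem_partition and node_memory_size() are hypothetical stand-ins that flatten the adev/gmc/xcp_mgr fields the macro reads.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the driver structures. */
struct mem_partition {
	uint64_t size;
};

struct node_model {
	unsigned int num_mem_partitions;        /* 0: VRAM is not partitioned */
	unsigned int num_xcp_per_mem_partition; /* XCPs sharing one memory partition */
	unsigned int mem_id;                    /* memory partition of this node's XCP */
	const struct mem_partition *mem_partitions;
	uint64_t real_vram_size;
};

/* Mirrors the selection logic of KFD_XCP_MEMORY_SIZE in the patch. */
static uint64_t node_memory_size(const struct node_model *n)
{
	if (!n->num_mem_partitions)
		return n->real_vram_size;

	/* Each XCP gets an equal share of its memory partition. */
	return n->mem_partitions[n->mem_id].size / n->num_xcp_per_mem_partition;
}
```

With two 64 GiB memory partitions and four XCPs per partition, each node would report 16 GiB; without memory partitions the full VRAM size is reported.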
[PATCH 12/29] drm/amdgpu: Add memory partition id to amdgpu_vm
From: Philip Yang If xcp_mgr is initialized, add mem_id to the amdgpu_vm structure to store the memory partition number when creating the amdgpu_vm for the xcp. The xcp number is decided when opening the render device, for example /dev/dri/renderD129 is xcp_id 0, /dev/dri/renderD130 is xcp_id 1. Signed-off-by: Philip Yang Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 8 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 3 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c | 3 +++ 3 files changed, 10 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c index 879718598fa4..815098be4c2f 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c @@ -1223,10 +1223,6 @@ int amdgpu_driver_open_kms(struct drm_device *dev, struct drm_file *file_priv) goto out_suspend; } - r = amdgpu_xcp_open_device(adev, fpriv, file_priv); - if (r) - return r; - pasid = amdgpu_pasid_alloc(16); if (pasid < 0) { dev_warn(adev->dev, "No more PASIDs available!"); @@ -1237,6 +1233,10 @@ int amdgpu_driver_open_kms(struct drm_device *dev, struct drm_file *file_priv) if (r) goto error_pasid; + r = amdgpu_xcp_open_device(adev, fpriv, file_priv); + if (r) + goto error_vm; + r = amdgpu_vm_set_pasid(adev, >vm, pasid); if (r) goto error_vm; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h index 2fdec4114627..d551fca1780e 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h @@ -332,6 +332,9 @@ struct amdgpu_vm { struct ttm_lru_bulk_move lru_bulk_move; /* Flag to indicate if VM is used for compute */ boolis_compute_context; + + /* Memory partition number, -1 means any partition */ + int8_t mem_id; }; struct amdgpu_vm_manager { diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c index f2981d21d4e0..610c32c4f5af 100644 ---
a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c @@ -364,6 +364,9 @@ int amdgpu_xcp_open_device(struct amdgpu_device *adev, break; } } + + fpriv->vm.mem_id = fpriv->xcp_id == ~0 ? -1 : + adev->xcp_mgr->xcp[fpriv->xcp_id].mem_id; return 0; } -- 2.40.1
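The mem_id assignment at the end of amdgpu_xcp_open_device() maps "no partition selected" (xcp_id of ~0) to -1, which the amdgpu_vm comment describes as "any partition". A minimal sketch of that mapping, assuming a hypothetical struct xcp_model and helper vm_mem_id() that are not part of the driver:

```c
#include <stdint.h>

#define XCP_ID_NONE (~0u) /* same value the patch stores in fpriv->xcp_id */

/* Hypothetical stand-in for the per-partition state used here. */
struct xcp_model {
	uint8_t mem_id;
};

/* Returns the memory partition for a VM, or -1 for "any partition". */
static int8_t vm_mem_id(uint32_t xcp_id, const struct xcp_model *xcp)
{
	return xcp_id == XCP_ID_NONE ? -1 : (int8_t)xcp[xcp_id].mem_id;
}
```

A file opened through the primary device keeps xcp_id at ~0 and so ends up with mem_id -1, while a file opened through a partition's render node inherits that partition's mem_id.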
[PATCH 10/29] drm/amdgpu: Add xcp manager num_xcp_per_mem_partition
From: Philip Yang

Used by KFD to check memory limit accounting.

Signed-off-by: Philip Yang
Reviewed-by: Felix Kuehling
Signed-off-by: Alex Deucher
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c | 1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.h | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c
index 9b960ba0b7ac..f2981d21d4e0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c
@@ -156,6 +156,7 @@ int amdgpu_xcp_init(struct amdgpu_xcp_mgr *xcp_mgr, int num_xcps, int mode)
 	xcp_mgr->num_xcps = num_xcps;
 	amdgpu_xcp_update_partition_sched_list(adev);
+	xcp_mgr->num_xcp_per_mem_partition = num_xcps / xcp_mgr->adev->gmc.num_mem_partitions;
 	return 0;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.h
index 39aca87ce204..68b63b970ce8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.h
@@ -83,6 +83,9 @@ struct amdgpu_xcp_mgr {
 	struct amdgpu_xcp xcp[MAX_XCP];
 	uint8_t num_xcps;
 	int8_t mode;
+
+/* Used to determine KFD memory size limits per XCP */
+	unsigned int num_xcp_per_mem_partition;
 };
 struct amdgpu_xcp_mgr_funcs {
-- 
2.40.1
[PATCH 03/29] drm/amdgpu: add partition ID track in ring
From: James Zhu Keep track partition ID in ring. Signed-off-by: James Zhu Acked-by: Lijo Lazar Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 1 + .../drm/amd/amdgpu/aqua_vanjaram_reg_init.c | 41 +++ 2 files changed, 42 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h index 5192e3577e99..baa03527bf8b 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h @@ -252,6 +252,7 @@ struct amdgpu_ring { uint32_tbuf_mask; u32 idx; u32 xcc_id; + u32 xcp_id; u32 me; u32 pipe; u32 queue; diff --git a/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram_reg_init.c b/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram_reg_init.c index 97011e7e031d..c90ea34ef9ec 100644 --- a/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram_reg_init.c +++ b/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram_reg_init.c @@ -61,6 +61,47 @@ void aqua_vanjaram_doorbell_index_init(struct amdgpu_device *adev) adev->doorbell_index.max_assignment = AMDGPU_DOORBELL_LAYOUT1_MAX_ASSIGNMENT << 1; } +static void aqua_vanjaram_set_xcp_id(struct amdgpu_device *adev, +uint32_t inst_idx, struct amdgpu_ring *ring) +{ + int xcp_id; + enum AMDGPU_XCP_IP_BLOCK ip_blk; + uint32_t inst_mask; + + ring->xcp_id = ~0; + if (adev->xcp_mgr->mode == AMDGPU_XCP_MODE_NONE) + return; + + inst_mask = 1 << inst_idx; + + switch (ring->funcs->type) { + case AMDGPU_HW_IP_GFX: + case AMDGPU_RING_TYPE_COMPUTE: + case AMDGPU_RING_TYPE_KIQ: + ip_blk = AMDGPU_XCP_GFX; + break; + case AMDGPU_RING_TYPE_SDMA: + ip_blk = AMDGPU_XCP_SDMA; + break; + case AMDGPU_RING_TYPE_VCN_ENC: + case AMDGPU_RING_TYPE_VCN_JPEG: + ip_blk = AMDGPU_XCP_VCN; + if (adev->xcp_mgr->mode == AMDGPU_CPX_PARTITION_MODE) + inst_mask = 1 << (inst_idx * 2); + break; + default: + DRM_ERROR("Not support ring type %d!", ring->funcs->type); + return; + } + + for (xcp_id = 0; xcp_id < adev->xcp_mgr->num_xcps; xcp_id++) { + if (adev->xcp_mgr->xcp[xcp_id].ip[ip_blk].inst_mask & inst_mask) 
{ + ring->xcp_id = xcp_id; + break; + } + } +} + static int8_t aqua_vanjaram_logical_to_dev_inst(struct amdgpu_device *adev, enum amd_hw_ip_block_type block, int8_t inst) -- 2.40.1
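The lookup in aqua_vanjaram_set_xcp_id() amounts to finding the first partition whose instance mask for the ring's IP block contains the ring's instance bit. A minimal sketch of that search follows; enum ip_block, struct xcp_model and find_xcp_id() are hypothetical names for illustration, and the CPX-mode adjustment for VCN instances is left out.

```c
#include <stdint.h>

#define XCP_ID_NONE (~0u) /* value the patch uses for "no partition" */

/* Hypothetical, reduced set of IP blocks for this sketch. */
enum ip_block { IP_GFX, IP_SDMA, IP_VCN, IP_COUNT };

/* Hypothetical stand-in: one instance bitmask per IP block. */
struct xcp_model {
	uint32_t inst_mask[IP_COUNT];
};

/* Returns the index of the partition that owns instance inst_idx of blk. */
static uint32_t find_xcp_id(const struct xcp_model *xcp, unsigned int num_xcps,
			    enum ip_block blk, uint32_t inst_idx)
{
	uint32_t inst_mask = 1u << inst_idx;
	unsigned int i;

	for (i = 0; i < num_xcps; i++) {
		if (xcp[i].inst_mask[blk] & inst_mask)
			return i;
	}

	return XCP_ID_NONE;
}
```

If no partition claims the instance, the sketch returns the same ~0 sentinel that the patch leaves in ring->xcp_id.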
[PATCH 02/29] drm/amdgpu: find partition ID when open device
From: James Zhu Find partition ID when open device from render device minor. Signed-off-by: Christian König Signed-off-by: James Zhu Reviewed-and-tested-by: Philip Yang Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 4 drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c | 29 + drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.h | 3 +++ 4 files changed, 38 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index 45c6522ee854..4fb43baddf96 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -488,6 +488,8 @@ struct amdgpu_fpriv { struct mutexbo_list_lock; struct idr bo_list_handles; struct amdgpu_ctx_mgr ctx_mgr; + /** GPU partition selection */ + uint32_txcp_id; }; int amdgpu_file_to_fpriv(struct file *filp, struct amdgpu_fpriv **fpriv); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c index 44997c7ee89d..879718598fa4 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c @@ -1223,6 +1223,10 @@ int amdgpu_driver_open_kms(struct drm_device *dev, struct drm_file *file_priv) goto out_suspend; } + r = amdgpu_xcp_open_device(adev, fpriv, file_priv); + if (r) + return r; + pasid = amdgpu_pasid_alloc(16); if (pasid < 0) { dev_warn(adev->dev, "No more PASIDs available!"); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c index 8b28b18e4291..9b627a8b1d5c 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c @@ -335,3 +335,32 @@ void amdgpu_xcp_dev_unplug(struct amdgpu_device *adev) drm_dev_unplug(adev->xcp_mgr->xcp[i].ddev); } +int amdgpu_xcp_open_device(struct amdgpu_device *adev, + struct amdgpu_fpriv *fpriv, + struct drm_file *file_priv) +{ + int i; + + if (!adev->xcp_mgr) + return 0; + + fpriv->xcp_id = ~0; + for (i = 0; i < MAX_XCP; ++i) { + if 
(!adev->xcp_mgr->xcp[i].ddev) + break; + + if (file_priv->minor == adev->xcp_mgr->xcp[i].ddev->render) { + if (adev->xcp_mgr->xcp[i].valid == FALSE) { + dev_err(adev->dev, "renderD%d partition %d not valid!", + file_priv->minor->index, i); + return -ENOENT; + } + dev_dbg(adev->dev, "renderD%d partition %d openned!", + file_priv->minor->index, i); + fpriv->xcp_id = i; + break; + } + } + return 0; +} + diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.h index dad0b98d1ae7..ad60520f952c 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.h @@ -119,6 +119,9 @@ int amdgpu_xcp_get_inst_details(struct amdgpu_xcp *xcp, int amdgpu_xcp_dev_register(struct amdgpu_device *adev, const struct pci_device_id *ent); void amdgpu_xcp_dev_unplug(struct amdgpu_device *adev); +int amdgpu_xcp_open_device(struct amdgpu_device *adev, + struct amdgpu_fpriv *fpriv, + struct drm_file *file_priv); static inline int amdgpu_xcp_get_num_xcp(struct amdgpu_xcp_mgr *xcp_mgr) { -- 2.40.1
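The partition lookup in amdgpu_xcp_open_device() can be modelled as a linear search over the registered partitions, keyed by the render minor of the opened file. The sketch below is an assumption-laden model: struct xcp_model, MODEL_MAX_XCP and open_partition() are hypothetical, and an opaque pointer stands in for the drm_minor comparison.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_XCP 8   /* hypothetical bound for this sketch */
#define XCP_ID_NONE (~0u)

/* Hypothetical stand-in for a partition entry. */
struct xcp_model {
	const void *render_minor; /* NULL: no device registered in this slot */
	bool valid;
};

/*
 * Stores the matching partition index in *xcp_id, or XCP_ID_NONE when the
 * file was not opened through a partition render node.
 */
static int open_partition(const struct xcp_model *xcp, const void *minor,
			  uint32_t *xcp_id)
{
	unsigned int i;

	*xcp_id = XCP_ID_NONE;
	for (i = 0; i < MODEL_MAX_XCP; i++) {
		if (!xcp[i].render_minor)
			break; /* registered partitions are contiguous */

		if (xcp[i].render_minor == minor) {
			if (!xcp[i].valid)
				return -ENOENT;
			*xcp_id = i;
			break;
		}
	}

	return 0;
}
```

As in the patch, a miss is not an error: only opening a partition that exists but is not valid fails with -ENOENT.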
[PATCH 04/29] drm/amdgpu: update header to support partition scheduling
From: James Zhu Update header to support partition scheduling. Signed-off-by: James Zhu Acked-by: Lijo Lazar Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.h | 15 +++ 1 file changed, 15 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.h index ad60520f952c..cca06d38b03d 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.h @@ -70,7 +70,9 @@ struct amdgpu_xcp { uint8_t id; uint8_t mem_id; bool valid; + atomic_tref_cnt; struct drm_device *ddev; + struct amdgpu_sched gpu_sched[AMDGPU_HW_IP_NUM][AMDGPU_RING_PRIO_MAX]; }; struct amdgpu_xcp_mgr { @@ -97,6 +99,10 @@ struct amdgpu_xcp_mgr_funcs { int (*suspend)(struct amdgpu_xcp_mgr *xcp_mgr, int xcp_id); int (*prepare_resume)(struct amdgpu_xcp_mgr *xcp_mgr, int xcp_id); int (*resume)(struct amdgpu_xcp_mgr *xcp_mgr, int xcp_id); + int (*select_scheds)(struct amdgpu_device *adev, + u32 hw_ip, u32 hw_prio, struct amdgpu_fpriv *fpriv, + unsigned int *num_scheds, struct drm_gpu_scheduler ***scheds); + int (*update_partition_sched_list)(struct amdgpu_device *adev); }; int amdgpu_xcp_prepare_suspend(struct amdgpu_xcp_mgr *xcp_mgr, int xcp_id); @@ -123,6 +129,15 @@ int amdgpu_xcp_open_device(struct amdgpu_device *adev, struct amdgpu_fpriv *fpriv, struct drm_file *file_priv); +#define amdgpu_xcp_select_scheds(adev, e, c, d, x, y) \ + ((adev)->xcp_mgr && (adev)->xcp_mgr->funcs && \ + (adev)->xcp_mgr->funcs->select_scheds ? \ + (adev)->xcp_mgr->funcs->select_scheds((adev), (e), (c), (d), (x), (y)) : -ENOENT) +#define amdgpu_xcp_update_partition_sched_list(adev) \ + ((adev)->xcp_mgr && (adev)->xcp_mgr->funcs && \ + (adev)->xcp_mgr->funcs->update_partition_sched_list ? \ + (adev)->xcp_mgr->funcs->update_partition_sched_list(adev) : 0) + static inline int amdgpu_xcp_get_num_xcp(struct amdgpu_xcp_mgr *xcp_mgr) { if (!xcp_mgr) -- 2.40.1
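The two wrapper macros added here follow a common optional-callback pattern: call the hook if the manager, its function table and the hook all exist, otherwise fall back to a default (-ENOENT for select_scheds, 0 for update_partition_sched_list). A minimal sketch of the same pattern with inline functions instead of macros; struct mgr, struct mgr_funcs and the demo callback are hypothetical names for illustration only.

```c
#include <errno.h>
#include <stddef.h>

struct mgr;

/* Hypothetical function table with two optional hooks. */
struct mgr_funcs {
	int (*select)(struct mgr *m, int arg);
	int (*update)(struct mgr *m);
};

struct mgr {
	const struct mgr_funcs *funcs;
};

/* A missing hook is reported to the caller as -ENOENT. */
static inline int mgr_select(struct mgr *m, int arg)
{
	if (m && m->funcs && m->funcs->select)
		return m->funcs->select(m, arg);
	return -ENOENT;
}

/* A missing hook is treated as "nothing to do" and succeeds. */
static inline int mgr_update(struct mgr *m)
{
	if (m && m->funcs && m->funcs->update)
		return m->funcs->update(m);
	return 0;
}

/* Example hook used to exercise the wrappers. */
static int demo_select(struct mgr *m, int arg)
{
	(void)m;
	return arg * 2;
}
```

The different fallbacks carry meaning: a caller of the select wrapper cannot proceed without a result, whereas skipping an update on hardware without partition support is harmless.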
[PATCH 01/29] drm/amdgpu: support partition drm devices
From: James Zhu Support partition drm devices on GC_HWIP IP_VERSION(9, 4, 3). This is a temporary solution and will be superceded. Signed-off-by: Christian König Signed-off-by: James Zhu Reviewed-and-tested-by: Philip Yang Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu.h| 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c| 32 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.h| 2 + drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c| 59 +- drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.h| 5 ++ 6 files changed, 99 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index bed6d1d09ac2..45c6522ee854 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -108,6 +108,7 @@ #include "amdgpu_fdinfo.h" #include "amdgpu_mca.h" #include "amdgpu_ras.h" +#include "amdgpu_xcp.h" #define MAX_GPU_INSTANCE 64 diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index c2136accd523..40c5845c78df 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -6062,6 +6062,7 @@ void amdgpu_device_halt(struct amdgpu_device *adev) struct pci_dev *pdev = adev->pdev; struct drm_device *ddev = adev_to_drm(adev); + amdgpu_xcp_dev_unplug(adev); drm_dev_unplug(ddev); amdgpu_irq_disable_all(adev); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c index 562e65ab48fa..4589cb2255a2 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c @@ -2194,6 +2194,10 @@ static int amdgpu_pci_probe(struct pci_dev *pdev, goto err_pci; } + ret = amdgpu_xcp_dev_register(adev, ent); + if (ret) + goto err_pci; + /* * 1. don't init fbdev on hw without DCE * 2. 
don't init fbdev if there are no connectors @@ -2266,6 +2270,7 @@ amdgpu_pci_remove(struct pci_dev *pdev) struct drm_device *dev = pci_get_drvdata(pdev); struct amdgpu_device *adev = drm_to_adev(dev); + amdgpu_xcp_dev_unplug(adev); drm_dev_unplug(dev); if (adev->pm.rpm_mode != AMDGPU_RUNPM_NONE) { @@ -2849,6 +2854,33 @@ static const struct drm_driver amdgpu_kms_driver = { .patchlevel = KMS_DRIVER_PATCHLEVEL, }; +const struct drm_driver amdgpu_partition_driver = { + .driver_features = + DRIVER_GEM | DRIVER_RENDER | DRIVER_SYNCOBJ | + DRIVER_SYNCOBJ_TIMELINE, + .open = amdgpu_driver_open_kms, + .postclose = amdgpu_driver_postclose_kms, + .lastclose = amdgpu_driver_lastclose_kms, + .ioctls = amdgpu_ioctls_kms, + .num_ioctls = ARRAY_SIZE(amdgpu_ioctls_kms), + .dumb_create = amdgpu_mode_dumb_create, + .dumb_map_offset = amdgpu_mode_dumb_mmap, + .fops = _driver_kms_fops, + .release = _driver_release_kms, + + .prime_handle_to_fd = drm_gem_prime_handle_to_fd, + .prime_fd_to_handle = drm_gem_prime_fd_to_handle, + .gem_prime_import = amdgpu_gem_prime_import, + .gem_prime_mmap = drm_gem_prime_mmap, + + .name = DRIVER_NAME, + .desc = DRIVER_DESC, + .date = DRIVER_DATE, + .major = KMS_DRIVER_MAJOR, + .minor = KMS_DRIVER_MINOR, + .patchlevel = KMS_DRIVER_PATCHLEVEL, +}; + static struct pci_error_handlers amdgpu_pci_err_handler = { .error_detected = amdgpu_pci_error_detected, .mmio_enabled = amdgpu_pci_mmio_enabled, diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.h index 8178323e4bef..5bc2cb661af7 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.h @@ -42,6 +42,8 @@ #define DRIVER_DESC"AMD GPU" #define DRIVER_DATE"20150101" +extern const struct drm_driver amdgpu_partition_driver; + long amdgpu_drm_ioctl(struct file *filp, unsigned int cmd, unsigned long arg); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c index bca226cc4e0b..8b28b18e4291 
100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c @@ -22,6 +22,9 @@ */ #include "amdgpu.h" #include "amdgpu_xcp.h" +#include "amdgpu_drv.h" + +#include static int __amdgpu_xcp_run(struct amdgpu_xcp_mgr *xcp_mgr, struct amdgpu_xcp_ip *xcp_ip, int xcp_state) @@ -217,6 +220,31 @@ int amdgpu_xcp_query_partition_mode(struct amdgpu_xcp_mgr *xcp_mgr, u32 flags) return mode; } +static int amdgpu_xcp_dev_alloc(struct amdgpu_device *adev) +{ + struct drm_device *p_ddev; + struct pci_dev *pdev; + struct drm_device *ddev; + int i; + + pdev =
Re: [PATCH 10/66] drm/amd/display: Do not set drr on pipe commit
On 5/10/23 09:20, Michel Dänzer wrote: > On 5/9/23 23:07, Pillai, Aurabindo wrote: >> >> Sorry - the firmware in the previous message is for DCN32. For Navi2x, >> please use the firmware attached here. > > Same problem (contents of /sys/kernel/debug/dri/0/amdgpu_firmware_info below). > > Even if it did work with newer FW, the kernel must keep working with older > FW, so in that case the new behaviour would need to be guarded by the FW > version. > Agreed. Were you able to repro the hang on any other modes/monitors? > > VCE feature version: 0, firmware version: 0x > UVD feature version: 0, firmware version: 0x > MC feature version: 0, firmware version: 0x > ME feature version: 44, firmware version: 0x0040 > PFP feature version: 44, firmware version: 0x0061 > CE feature version: 44, firmware version: 0x0025 > RLC feature version: 1, firmware version: 0x0060 > RLC SRLC feature version: 0, firmware version: 0x > RLC SRLG feature version: 0, firmware version: 0x > RLC SRLS feature version: 0, firmware version: 0x > RLCP feature version: 0, firmware version: 0x > RLCV feature version: 0, firmware version: 0x > MEC feature version: 44, firmware version: 0x0071 > MEC2 feature version: 44, firmware version: 0x0071 > IMU feature version: 0, firmware version: 0x > SOS feature version: 0, firmware version: 0x00210c64 > ASD feature version: 553648297, firmware version: 0x21a9 > TA XGMI feature version: 0x, firmware version: 0x200f > TA RAS feature version: 0x, firmware version: 0x1b00013e > TA HDCP feature version: 0x, firmware version: 0x1738 > TA DTM feature version: 0x, firmware version: 0x1215 > TA RAP feature version: 0x, firmware version: 0x07000213 > TA SECUREDISPLAY feature version: 0x, firmware version: 0x > SMC feature version: 0, program: 0, firmware version: 0x003a5800 (58.88.0) > SDMA0 feature version: 52, firmware version: 0x0053 > SDMA1 feature version: 52, firmware version: 0x0053 > SDMA2 feature version: 52, firmware version: 0x0053 > SDMA3 feature version: 52, 
firmware version: 0x0053 > VCN feature version: 0, firmware version: 0x0211b000 > DMCU feature version: 0, firmware version: 0x > DMCUB feature version: 0, firmware version: 0x0202001c > TOC feature version: 0, firmware version: 0x > MES_KIQ feature version: 0, firmware version: 0x > MES feature version: 0, firmware version: 0x > VBIOS version: 113-D4300100-051 > > > -- >> *From:* Pillai, Aurabindo >> *Sent:* Tuesday, May 9, 2023 4:44 PM >> *To:* Michel Dänzer ; Zhuo, Qingqing (Lillian) >> ; amd-gfx@lists.freedesktop.org >> ; Chalmers, Wesley >> *Cc:* Wang, Chao-kai (Stylon) ; Li, Sun peng (Leo) >> ; Wentland, Harry ; Siqueira, >> Rodrigo ; Li, Roman ; Chiu, >> Solomon ; Lin, Wayne ; Lakha, >> Bhawanpreet ; Gutierrez, Agustin >> ; Kotarac, Pavle >> *Subject:* Re: [PATCH 10/66] drm/amd/display: Do not set drr on pipe commit >> >> Hi Michel, >> >> Could you please try with the attached firmware package if you see the hang >> without any reverts? If you do see hangs, please send dmesg with >> "drm.debug=0x156 log_buf_len=30M" in the kernel cmdline. >> >> The attached fw is not released to the public yet, but we will be updating >> them in linux-firmware tree next week. Please do backup your existing >> firmware, and put the attached files into /usr/lib/firmware/updates/amgpu >> and regenerate your ramdisk. On ubuntu the following should do: >> >> sudo update-initramfs -u -k `uname -r` >> >> -- >> >> Regards, >> Jay >>
[PATCH 07/10] drm/amd/display: Make unbounded req update separate from dlg/ttu
From: Alvin Lee [Description] - Updates to unbounded requesting should not be conditional on updates to dlg / ttu, as this could prevent unbounded requesting from being updated if dlg / ttu does not change Reviewed-by: Jun Lei Acked-by: Aurabindo Pillai Signed-off-by: Alvin Lee --- drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c | 11 --- drivers/gpu/drm/amd/display/dc/inc/core_types.h| 1 + 2 files changed, 9 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c index b3e187b1347d..e74c3ce561ab 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c +++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c @@ -1361,6 +1361,7 @@ static void dcn20_detect_pipe_changes(struct pipe_ctx *old_pipe, struct pipe_ctx new_pipe->update_flags.bits.dppclk = 1; new_pipe->update_flags.bits.hubp_interdependent = 1; new_pipe->update_flags.bits.hubp_rq_dlg_ttu = 1; + new_pipe->update_flags.bits.unbounded_req = 1; new_pipe->update_flags.bits.gamut_remap = 1; new_pipe->update_flags.bits.scaler = 1; new_pipe->update_flags.bits.viewport = 1; @@ -1504,6 +1505,9 @@ static void dcn20_detect_pipe_changes(struct pipe_ctx *old_pipe, struct pipe_ctx memcmp(_pipe->rq_regs, _pipe->rq_regs, sizeof(old_pipe->rq_regs))) new_pipe->update_flags.bits.hubp_rq_dlg_ttu = 1; } + + if (old_pipe->unbounded_req != new_pipe->unbounded_req) + new_pipe->update_flags.bits.unbounded_req = 1; } static void dcn20_update_dchubp_dpp( @@ -1537,10 +1541,11 @@ static void dcn20_update_dchubp_dpp( _ctx->ttu_regs, _ctx->rq_regs, _ctx->pipe_dlg_param); - - if (hubp->funcs->set_unbounded_requesting) - hubp->funcs->set_unbounded_requesting(hubp, pipe_ctx->unbounded_req); } + + if (pipe_ctx->update_flags.bits.unbounded_req && hubp->funcs->set_unbounded_requesting) + hubp->funcs->set_unbounded_requesting(hubp, pipe_ctx->unbounded_req); + if (pipe_ctx->update_flags.bits.hubp_interdependent) 
hubp->funcs->hubp_setup_interdependent( hubp, diff --git a/drivers/gpu/drm/amd/display/dc/inc/core_types.h b/drivers/gpu/drm/amd/display/dc/inc/core_types.h index b4c1cc6dc857..d8dd143cf6ea 100644 --- a/drivers/gpu/drm/amd/display/dc/inc/core_types.h +++ b/drivers/gpu/drm/amd/display/dc/inc/core_types.h @@ -374,6 +374,7 @@ union pipe_update_flags { uint32_t viewport : 1; uint32_t plane_changed : 1; uint32_t det_size : 1; + uint32_t unbounded_req : 1; } bits; uint32_t raw; }; -- 2.40.0
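The point of the dedicated unbounded_req bit is that each piece of state gets its own dirty flag, so a change in one does not depend on a change in another. A minimal sketch of that idea, assuming a hypothetical struct pipe_model and union update_flags reduced to two flags (the real union has many more bits):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced version of a per-pipe update flag set. */
union update_flags {
	struct {
		unsigned int rq_dlg_ttu : 1;
		unsigned int unbounded_req : 1;
	} bits;
	uint32_t raw;
};

/* Hypothetical stand-in for the pipe state compared between commits. */
struct pipe_model {
	uint32_t dlg_regs;
	bool unbounded_req;
	union update_flags update_flags;
};

/* Flags each piece of state independently of the others. */
static void detect_changes(const struct pipe_model *old_pipe,
			   struct pipe_model *new_pipe)
{
	new_pipe->update_flags.raw = 0;

	if (old_pipe->dlg_regs != new_pipe->dlg_regs)
		new_pipe->update_flags.bits.rq_dlg_ttu = 1;

	if (old_pipe->unbounded_req != new_pipe->unbounded_req)
		new_pipe->update_flags.bits.unbounded_req = 1;
}
```

When only unbounded_req changes, the sketch sets just that bit, which is the case the commit message says was previously missed.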
[PATCH 09/10] drm/amd/display: Remove v_startup workaround for dcn3+
From: Daniel Miess [Why] Calls to dcn20_adjust_freesync_v_startup are no longer needed as of dcn3+ and can cause underflow in some cases [How] Move calls to dcn20_adjust_freesync_v_startup up into validate_bandwidth for dcn2.x Reviewed-by: Jun Lei Acked-by: Aurabindo Pillai Signed-off-by: Daniel Miess --- .../drm/amd/display/dc/dml/dcn20/dcn20_fpu.c | 24 +++ 1 file changed, 20 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c b/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c index 3407f9a2c6a1..8ae5ddbd1b27 100644 --- a/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c +++ b/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c @@ -1099,10 +1099,6 @@ void dcn20_calculate_dlg_params(struct dc *dc, context->res_ctx.pipe_ctx[i].plane_res.bw.dppclk_khz = pipes[pipe_idx].clks_cfg.dppclk_mhz * 1000; context->res_ctx.pipe_ctx[i].pipe_dlg_param = pipes[pipe_idx].pipe.dest; - if (context->res_ctx.pipe_ctx[i].stream->adaptive_sync_infopacket.valid) - dcn20_adjust_freesync_v_startup( - >res_ctx.pipe_ctx[i].stream->timing, - >res_ctx.pipe_ctx[i].pipe_dlg_param.vstartup_start); pipe_idx++; } @@ -1931,6 +1927,7 @@ static bool dcn20_validate_bandwidth_internal(struct dc *dc, struct dc_state *co int vlevel = 0; int pipe_split_from[MAX_PIPES]; int pipe_cnt = 0; + int i = 0; display_e2e_pipe_params_st *pipes = kzalloc(dc->res_pool->pipe_count * sizeof(display_e2e_pipe_params_st), GFP_ATOMIC); DC_LOGGER_INIT(dc->ctx->logger); @@ -1954,6 +1951,15 @@ static bool dcn20_validate_bandwidth_internal(struct dc *dc, struct dc_state *co dcn20_calculate_wm(dc, context, pipes, _cnt, pipe_split_from, vlevel, fast_validate); dcn20_calculate_dlg_params(dc, context, pipes, pipe_cnt, vlevel); + for (i = 0; i < dc->res_pool->pipe_count; i++) { + if (!context->res_ctx.pipe_ctx[i].stream) + continue; + if (context->res_ctx.pipe_ctx[i].stream->adaptive_sync_infopacket.valid) + dcn20_adjust_freesync_v_startup( + >res_ctx.pipe_ctx[i].stream->timing, + 
>res_ctx.pipe_ctx[i].pipe_dlg_param.vstartup_start); + } + BW_VAL_TRACE_END_WATERMARKS(); goto validate_out; @@ -2226,6 +2232,7 @@ bool dcn21_validate_bandwidth_fp(struct dc *dc, int vlevel = 0; int pipe_split_from[MAX_PIPES]; int pipe_cnt = 0; + int i = 0; display_e2e_pipe_params_st *pipes = kzalloc(dc->res_pool->pipe_count * sizeof(display_e2e_pipe_params_st), GFP_ATOMIC); DC_LOGGER_INIT(dc->ctx->logger); @@ -2254,6 +2261,15 @@ bool dcn21_validate_bandwidth_fp(struct dc *dc, dcn21_calculate_wm(dc, context, pipes, _cnt, pipe_split_from, vlevel, fast_validate); dcn20_calculate_dlg_params(dc, context, pipes, pipe_cnt, vlevel); + for (i = 0; i < dc->res_pool->pipe_count; i++) { + if (!context->res_ctx.pipe_ctx[i].stream) + continue; + if (context->res_ctx.pipe_ctx[i].stream->adaptive_sync_infopacket.valid) + dcn20_adjust_freesync_v_startup( + >res_ctx.pipe_ctx[i].stream->timing, + >res_ctx.pipe_ctx[i].pipe_dlg_param.vstartup_start); + } + BW_VAL_TRACE_END_WATERMARKS(); goto validate_out; -- 2.40.0
[PATCH 10/10] drm/amd/display: 3.2.236
From: Aric Cyr

Acked-by: Aurabindo Pillai
Signed-off-by: Aric Cyr
---
 drivers/gpu/drm/amd/display/dc/dc.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dc.h b/drivers/gpu/drm/amd/display/dc/dc.h
index 8be2e6d6d888..2dff1a5cf3b1 100644
--- a/drivers/gpu/drm/amd/display/dc/dc.h
+++ b/drivers/gpu/drm/amd/display/dc/dc.h
@@ -45,7 +45,7 @@ struct aux_payload;
 struct set_config_cmd_payload;
 struct dmub_notification;
-#define DC_VER "3.2.235"
+#define DC_VER "3.2.236"
 #define MAX_SURFACES 3
 #define MAX_PLANES 6
-- 
2.40.0
[PATCH 08/10] drm/amd/display: Remove unnecessary variable
From: Rodrigo Siqueira

There is no need for the local dc_version variable in dc_construct_ctx(), since its value is only copied to dc_ctx->dce_version afterwards. This commit removes the extra step.

Reviewed-by: Alex Hung
Acked-by: Aurabindo Pillai
Signed-off-by: Rodrigo Siqueira
---
 drivers/gpu/drm/amd/display/dc/core/dc.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c b/drivers/gpu/drm/amd/display/dc/core/dc.c
index adf5d0e1a7c5..f864fd3b6f29 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
@@ -857,7 +857,6 @@ static bool dc_construct_ctx(struct dc *dc,
 		const struct dc_init_data *init_params)
 {
 	struct dc_context *dc_ctx;
-	enum dce_version dc_version = DCE_VERSION_UNKNOWN;
 	dc_ctx = kzalloc(sizeof(*dc_ctx), GFP_KERNEL);
 	if (!dc_ctx)
@@ -875,8 +874,7 @@ static bool dc_construct_ctx(struct dc *dc,
 	/* Create logger */
-	dc_version = resource_parse_asic_id(init_params->asic_id);
-	dc_ctx->dce_version = dc_version;
+	dc_ctx->dce_version = resource_parse_asic_id(init_params->asic_id);
 	dc_ctx->perf_trace = dc_perf_trace_create();
 	if (!dc_ctx->perf_trace) {
-- 
2.40.0
[PATCH 06/10] drm/amd/display: Add visual confirm color support for MCLK switch
From: "Leo (Hanghong) Ma" [Why && How] We would like to have visual confirm color support for MCLK switch. 1. Set visual confirm color to yellow: Vblank MCLK switch. 2. Set visual confirm color to cyan: FPO + Vblank MCLK switch. 3. Set visual confirm color to pink: Vactive MCLK switch. Reviewed-by: Jun Lei Acked-by: Aurabindo Pillai Signed-off-by: Leo (Hanghong) Ma --- drivers/gpu/drm/amd/display/dc/core/dc.c | 47 +++-- .../drm/amd/display/dc/core/dc_hw_sequencer.c | 50 +-- drivers/gpu/drm/amd/display/dc/dc.h | 1 + .../amd/display/dc/dcn10/dcn10_hw_sequencer.c | 22 +++- .../amd/display/dc/dcn10/dcn10_hw_sequencer.h | 1 - .../drm/amd/display/dc/dcn20/dcn20_hwseq.c| 26 +- .../drm/amd/display/dc/dcn20/dcn20_hwseq.h| 5 -- .../gpu/drm/amd/display/dc/dcn20/dcn20_init.c | 2 +- .../drm/amd/display/dc/dcn201/dcn201_hwseq.c | 4 +- .../drm/amd/display/dc/dcn201/dcn201_init.c | 2 +- .../gpu/drm/amd/display/dc/dcn21/dcn21_init.c | 2 +- .../gpu/drm/amd/display/dc/dcn30/dcn30_init.c | 2 +- .../drm/amd/display/dc/dcn301/dcn301_init.c | 2 +- .../gpu/drm/amd/display/dc/dcn31/dcn31_init.c | 2 +- .../drm/amd/display/dc/dcn314/dcn314_init.c | 2 +- .../gpu/drm/amd/display/dc/dcn32/dcn32_init.c | 2 +- .../drm/amd/display/dc/dml/dcn32/dcn32_fpu.c | 7 +++ .../gpu/drm/amd/display/dc/inc/core_types.h | 2 + .../gpu/drm/amd/display/dc/inc/hw_sequencer.h | 9 +++- 19 files changed, 125 insertions(+), 65 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c b/drivers/gpu/drm/amd/display/dc/core/dc.c index 9be18ebb1c17..adf5d0e1a7c5 100644 --- a/drivers/gpu/drm/amd/display/dc/core/dc.c +++ b/drivers/gpu/drm/amd/display/dc/core/dc.c @@ -1119,6 +1119,33 @@ static void phantom_pipe_blank( hws->funcs.wait_for_blank_complete(opp); } +static void dc_update_viusal_confirm_color(struct dc *dc, struct dc_state *context, struct pipe_ctx *pipe_ctx) +{ + if (dc->ctx->dce_version >= DCN_VERSION_1_0) { + memset(_ctx->visual_confirm_color, 0, sizeof(struct tg_color)); + + if 
(dc->debug.visual_confirm == VISUAL_CONFIRM_HDR) + get_hdr_visual_confirm_color(pipe_ctx, &(pipe_ctx->visual_confirm_color)); + else if (dc->debug.visual_confirm == VISUAL_CONFIRM_SURFACE) + get_surface_visual_confirm_color(pipe_ctx, &(pipe_ctx->visual_confirm_color)); + else if (dc->debug.visual_confirm == VISUAL_CONFIRM_SWIZZLE) + get_surface_tile_visual_confirm_color(pipe_ctx, &(pipe_ctx->visual_confirm_color)); + else { + if (dc->ctx->dce_version < DCN_VERSION_2_0) + color_space_to_black_color( + dc, pipe_ctx->stream->output_color_space, &(pipe_ctx->visual_confirm_color)); + } + if (dc->ctx->dce_version >= DCN_VERSION_2_0) { + if (dc->debug.visual_confirm == VISUAL_CONFIRM_MPCTREE) + get_mpctree_visual_confirm_color(pipe_ctx, &(pipe_ctx->visual_confirm_color)); + else if (dc->debug.visual_confirm == VISUAL_CONFIRM_SUBVP) + get_subvp_visual_confirm_color(dc, context, pipe_ctx, &(pipe_ctx->visual_confirm_color)); + else if (dc->debug.visual_confirm == VISUAL_CONFIRM_MCLK_SWITCH) + get_mclk_switch_visual_confirm_color(dc, context, pipe_ctx, &(pipe_ctx->visual_confirm_color)); + } + } +} + static void disable_dangling_plane(struct dc *dc, struct dc_state *context) { int i, j; @@ -1189,6 +1216,9 @@ static void disable_dangling_plane(struct dc *dc, struct dc_state *context) dc_rem_all_planes_for_stream(dc, old_stream, dangling_context); disable_all_writeback_pipes_for_stream(dc, old_stream, dangling_context); + if (pipe->stream && pipe->plane_state) + dc_update_viusal_confirm_color(dc, context, pipe); + if (dc->hwss.apply_ctx_for_surface) { apply_ctx_interdependent_lock(dc, dc->current_state, old_stream, true); dc->hwss.apply_ctx_for_surface(dc, old_stream, 0, dangling_context); @@ -3456,6 +3486,14 @@ static void commit_planes_for_stream(struct dc *dc, } } + if (dc->debug.visual_confirm) + for (i = 0; i < dc->res_pool->pipe_count; i++) { + struct pipe_ctx *pipe = >res_ctx.pipe_ctx[i]; + + if (pipe->stream && pipe->plane_state) + dc_update_viusal_confirm_color(dc, 
context, pipe); + } + if (stream->test_pattern.type !=
[PATCH 05/10] drm/amd/display: Fix possible underflow for displays with large vblank
From: Daniel Miess

[Why]
Underflow observed when using a display with a large vblank region
and low refresh rate

[How]
Simplify calculation of vblank_nom
Increase value for VBlankNomDefaultUS to 800us

Reviewed-by: Jun Lei
Acked-by: Aurabindo Pillai
Signed-off-by: Daniel Miess
---
 .../amd/display/dc/dml/dcn314/dcn314_fpu.c    | 19 +++
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn314/dcn314_fpu.c b/drivers/gpu/drm/amd/display/dc/dml/dcn314/dcn314_fpu.c
index 1d00eb9e73c6..554152371eb5 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn314/dcn314_fpu.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn314/dcn314_fpu.c
@@ -33,7 +33,7 @@
 #include "dml/display_mode_vba.h"

 struct _vcs_dpi_ip_params_st dcn3_14_ip = {
-	.VBlankNomDefaultUS = 668,
+	.VBlankNomDefaultUS = 800,
 	.gpuvm_enable = 1,
 	.gpuvm_max_page_table_levels = 1,
 	.hostvm_enable = 1,
@@ -286,7 +286,7 @@ int dcn314_populate_dml_pipes_from_context_fpu(struct dc *dc, struct dc_state *c
 	struct resource_context *res_ctx = &context->res_ctx;
 	struct pipe_ctx *pipe;
 	bool upscaled = false;
-	bool isFreesyncVideo = false;
+	const unsigned int max_allowed_vblank_nom = 1023;

 	dc_assert_fp_enabled();

@@ -300,16 +300,11 @@ int dcn314_populate_dml_pipes_from_context_fpu(struct dc *dc, struct dc_state *c
 		pipe = &res_ctx->pipe_ctx[i];
 		timing = &pipe->stream->timing;

-		isFreesyncVideo = pipe->stream->adjust.v_total_max == pipe->stream->adjust.v_total_min;
-		isFreesyncVideo = isFreesyncVideo && pipe->stream->adjust.v_total_min > timing->v_total;
-
-		if (!isFreesyncVideo) {
-			pipes[pipe_cnt].pipe.dest.vblank_nom =
-				dcn3_14_ip.VBlankNomDefaultUS / (timing->h_total / (timing->pix_clk_100hz / 1.0));
-		} else {
-			pipes[pipe_cnt].pipe.dest.vtotal = pipe->stream->adjust.v_total_min;
-			pipes[pipe_cnt].pipe.dest.vblank_nom = timing->v_total - pipes[pipe_cnt].pipe.dest.vactive;
-		}
+		pipes[pipe_cnt].pipe.dest.vtotal = pipe->stream->adjust.v_total_min;
+		pipes[pipe_cnt].pipe.dest.vblank_nom = timing->v_total - pipes[pipe_cnt].pipe.dest.vactive;
+		pipes[pipe_cnt].pipe.dest.vblank_nom = min(pipes[pipe_cnt].pipe.dest.vblank_nom, dcn3_14_ip.VBlankNomDefaultUS);
+		pipes[pipe_cnt].pipe.dest.vblank_nom = max(pipes[pipe_cnt].pipe.dest.vblank_nom, timing->v_sync_width);
+		pipes[pipe_cnt].pipe.dest.vblank_nom = min(pipes[pipe_cnt].pipe.dest.vblank_nom, max_allowed_vblank_nom);

 		if (pipe->plane_state &&
 				(pipe->plane_state->src_rect.height < pipe->plane_state->dst_rect.height ||
--
2.40.0
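The new vblank_nom calculation in the hunk above can be read as a three-step clamp. The following is only a stand-alone sketch of that arithmetic, not the driver code: clamp_vblank_nom(), sketch_umin() and sketch_umax() are hypothetical names, and the kernel uses its own min()/max() macros. As in the patch, the line count is compared directly against the VBlankNomDefaultUS value.

```c
/* Hypothetical helpers standing in for the kernel's min()/max() macros. */
static unsigned int sketch_umin(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

static unsigned int sketch_umax(unsigned int a, unsigned int b)
{
	return a > b ? a : b;
}

/*
 * Mirrors the order of operations in the patch:
 *   1. start from v_total - vactive,
 *   2. cap at the IP default (800 after this patch),
 *   3. raise to at least v_sync_width,
 *   4. cap at max_allowed_vblank_nom (1023).
 * Assumes v_total >= vactive; the sketch does not guard against wraparound.
 */
static unsigned int clamp_vblank_nom(unsigned int v_total,
				     unsigned int vactive,
				     unsigned int v_sync_width,
				     unsigned int vblank_nom_default)
{
	const unsigned int max_allowed_vblank_nom = 1023;
	unsigned int vblank_nom = v_total - vactive;

	vblank_nom = sketch_umin(vblank_nom, vblank_nom_default);
	vblank_nom = sketch_umax(vblank_nom, v_sync_width);
	vblank_nom = sketch_umin(vblank_nom, max_allowed_vblank_nom);

	return vblank_nom;
}
```

For example, a timing with v_total = 3000 and vactive = 1080 would be capped at the default of 800, while a 1125/1080 timing keeps its real 45-line blank.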
[PATCH 04/10] drm/amd/display: Convert connector signal id to string
From: Rodrigo Siqueira

To improve the readability of the log, this commit introduces a
function that converts the signal type id to a human-readable string.

Reviewed-by: Jerry Zuo
Acked-by: Aurabindo Pillai
Signed-off-by: Rodrigo Siqueira
---
 .../drm/amd/display/dc/link/link_factory.c    |  6 ++--
 .../drm/amd/display/include/signal_types.h    | 28 +++
 2 files changed, 30 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/link/link_factory.c b/drivers/gpu/drm/amd/display/dc/link/link_factory.c
index 1515c817f03b..ac1c3e2e7c1d 100644
--- a/drivers/gpu/drm/amd/display/dc/link/link_factory.c
+++ b/drivers/gpu/drm/amd/display/dc/link/link_factory.c
@@ -563,11 +563,9 @@ static bool construct_phy(struct dc_link *link,
 		goto create_fail;
 	}

-	/* TODO: #DAL3 Implement id to str function.*/
-	LINK_INFO("Connector[%d] description:"
-		  "signal %d\n",
+	LINK_INFO("Connector[%d] description: signal: %s\n",
 		  init_params->connector_index,
-		  link->connector_signal);
+		  signal_type_to_string(link->connector_signal));

 	ddc_service_init_data.ctx = link->ctx;
 	ddc_service_init_data.id = link->link_id;
diff --git a/drivers/gpu/drm/amd/display/include/signal_types.h b/drivers/gpu/drm/amd/display/include/signal_types.h
index 23a308c3eccb..325c5ba4c82a 100644
--- a/drivers/gpu/drm/amd/display/include/signal_types.h
+++ b/drivers/gpu/drm/amd/display/include/signal_types.h
@@ -44,6 +44,34 @@ enum signal_type {
 	SIGNAL_TYPE_VIRTUAL = (1 << 9),	/* Virtual Display */
 };

+static inline const char *signal_type_to_string(const int type)
+{
+	switch (type) {
+	case SIGNAL_TYPE_NONE:
+		return "No signal";
+	case SIGNAL_TYPE_DVI_SINGLE_LINK:
+		return "DVI: Single Link";
+	case SIGNAL_TYPE_DVI_DUAL_LINK:
+		return "DVI: Dual Link";
+	case SIGNAL_TYPE_HDMI_TYPE_A:
+		return "HDMI: TYPE A";
+	case SIGNAL_TYPE_LVDS:
+		return "LVDS";
+	case SIGNAL_TYPE_RGB:
+		return "RGB";
+	case SIGNAL_TYPE_DISPLAY_PORT:
+		return "Display Port";
+	case SIGNAL_TYPE_DISPLAY_PORT_MST:
+		return "Display Port: MST";
+	case SIGNAL_TYPE_EDP:
+		return "Embedded Display Port";
+	case SIGNAL_TYPE_VIRTUAL:
+		return "Virtual";
+	default:
+		return "Unknown";
+	}
+}
+
 /* help functions for signal types manipulation */
 static inline bool dc_is_hdmi_tmds_signal(enum signal_type signal)
 {
--
2.40.0
[PATCH 01/10] drm/amd/display: enable dpia validate
From: Mustapha Ghaddar

Use dpia_validate_usb4_bw() function

Fixes: 6d86146dd62f ("drm/amd/display: Add function pointer for validate bw usb4")
Reviewed-by: Roman Li
Reviewed-by: Meenakshikumar Somasundaram
Acked-by: Aurabindo Pillai
Signed-off-by: Mustapha Ghaddar
---
 drivers/gpu/drm/amd/display/dc/link/link_validation.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/dc/link/link_validation.c b/drivers/gpu/drm/amd/display/dc/link/link_validation.c
index d4b7da526f0a..e8b2fc4002a5 100644
--- a/drivers/gpu/drm/amd/display/dc/link/link_validation.c
+++ b/drivers/gpu/drm/amd/display/dc/link/link_validation.c
@@ -359,5 +359,8 @@ bool link_validate_dpia_bandwidth(const struct dc_stream_state *stream, const un
 		link[i] = stream[i].link;
 		bw_needed[i] = dc_bandwidth_in_kbps_from_timing(&stream[i].timing);
 	}
+
+	ret = dpia_validate_usb4_bw(link, bw_needed, num_streams);
+
 	return ret;
 }
--
2.40.0
[PATCH 03/10] drm/amd/display: Update vactive margin and max vblank for fpo + vactive
From: Alvin Lee [Description] - Some 1920x1080@60hz displays have VBLANK time > 600us which we still want to accept for FPO + Vactive configs based on testing - Increase max VBLANK time to 1000us to allow these configs for FPO + Vactive - Increase minimum vactive switch margin for FPO + Vactive to 200us - Based on testing, 1920x1080@120hz can have a switch margin of ~160us which requires significantly longer FPO stretch margin (5ms) which we don't want to accept for now - Also move margins into debug option Reviewed-by: Jun Lei Reviewed-by: Nevenko Stupar Acked-by: Aurabindo Pillai Signed-off-by: Alvin Lee --- drivers/gpu/drm/amd/display/dc/dc.h | 2 ++ drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource.c | 2 ++ drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource.h | 1 - drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource_helpers.c | 2 +- drivers/gpu/drm/amd/display/dc/dcn321/dcn321_resource.c | 2 ++ drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c | 3 +-- 6 files changed, 8 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/dc.h b/drivers/gpu/drm/amd/display/dc/dc.h index e89de1078964..1ebb8d3573f4 100644 --- a/drivers/gpu/drm/amd/display/dc/dc.h +++ b/drivers/gpu/drm/amd/display/dc/dc.h @@ -893,6 +893,8 @@ struct dc_debug_options { bool minimize_dispclk_using_odm; bool disable_subvp_high_refresh; bool disable_dp_plus_plus_wa; + uint32_t fpo_vactive_min_active_margin_us; + uint32_t fpo_vactive_max_blank_us; }; struct gpu_info_soc_bounding_box_v1_0; diff --git a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource.c b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource.c index 4de2f8813dce..98c394f9f8cf 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource.c +++ b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource.c @@ -730,6 +730,8 @@ static const struct dc_debug_options debug_defaults_drv = { .disable_boot_optimizations = false, .disable_subvp_high_refresh = true, .disable_dp_plus_plus_wa = true, + 
.fpo_vactive_min_active_margin_us = 200, + .fpo_vactive_max_blank_us = 1000, }; static const struct dc_debug_options debug_defaults_diags = { diff --git a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource.h b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource.h index 42ccfd13a37c..58826e0aa76e 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource.h +++ b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource.h @@ -39,7 +39,6 @@ #define DCN3_2_MBLK_HEIGHT_8BPE 64 #define DCN3_2_VMIN_DISPCLK_HZ 71700 #define DCN3_2_DCFCLK_DS_INIT_KHZ 1 // Choose 10Mhz for init DCFCLK DS freq -#define DCN3_2_MIN_ACTIVE_SWITCH_MARGIN_FPO_US 100 // Only allow FPO + Vactive if active margin >= 100 #define SUBVP_HIGH_REFRESH_LIST_LEN 3 #define DCN3_2_MAX_SUBVP_PIXEL_RATE_MHZ 1800 diff --git a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource_helpers.c b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource_helpers.c index df912c333bbd..a8082580df92 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource_helpers.c +++ b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource_helpers.c @@ -626,7 +626,7 @@ struct dc_stream_state *dcn32_can_support_mclk_switch_using_fw_based_vblank_stre DC_FP_END(); DC_FP_START(); - is_fpo_vactive = dcn32_find_vactive_pipe(dc, context, DCN3_2_MIN_ACTIVE_SWITCH_MARGIN_FPO_US); + is_fpo_vactive = dcn32_find_vactive_pipe(dc, context, dc->debug.fpo_vactive_min_active_margin_us); DC_FP_END(); if (!is_fpo_vactive || dc->debug.disable_fpo_vactive) return NULL; diff --git a/drivers/gpu/drm/amd/display/dc/dcn321/dcn321_resource.c b/drivers/gpu/drm/amd/display/dc/dcn321/dcn321_resource.c index 4c1e0f5a5f09..f4cd9749ffdf 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn321/dcn321_resource.c +++ b/drivers/gpu/drm/amd/display/dc/dcn321/dcn321_resource.c @@ -728,6 +728,8 @@ static const struct dc_debug_options debug_defaults_drv = { .disable_fpo_vactive = false, .disable_boot_optimizations = false, .disable_subvp_high_refresh = true, + 
.fpo_vactive_min_active_margin_us = 200, + .fpo_vactive_max_blank_us = 1000, }; static const struct dc_debug_options debug_defaults_diags = { diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c b/drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c index f7e45d935a29..8c60b88c7d1a 100644 --- a/drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c +++ b/drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c @@ -35,7 +35,6 @@ #define DC_LOGGER_INIT(logger) -static const unsigned int MAX_FPO_VACTIVE_BLANK_US = 600; static const struct subvp_high_refresh_list subvp_high_refresh_list = { .min_refresh = 120,
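A minimal sketch of how the two new debug options could gate an FPO + Vactive config, for readers who want the thresholds in one place. Only the two field names and their defaults (200 and 1000) come from the patch; the struct name, the function name and the exact comparison operators (inclusive on both bounds) are assumptions, since the code that consumes fpo_vactive_max_blank_us is not shown in this message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the two new dc_debug_options fields. */
struct sketch_fpo_vactive_limits {
	uint32_t fpo_vactive_min_active_margin_us;	/* 200 in debug_defaults_drv */
	uint32_t fpo_vactive_max_blank_us;		/* 1000 in debug_defaults_drv */
};

/*
 * Accept a config only if the vactive switch margin is large enough and
 * the VBLANK time is not too long. Inclusive comparisons are an assumption.
 */
static bool sketch_fpo_vactive_allowed(const struct sketch_fpo_vactive_limits *lim,
				       uint32_t active_margin_us,
				       uint32_t blank_us)
{
	return active_margin_us >= lim->fpo_vactive_min_active_margin_us &&
	       blank_us <= lim->fpo_vactive_max_blank_us;
}
```

Under these assumptions a ~160us switch margin (the 1920x1080@120hz case from the description) is rejected by the 200us floor, while a 700us VBLANK that the old 600us limit would have refused is now accepted.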
[PATCH 02/10] drm/amd/display: Only skip update for DCFCLK, UCLK, FCLK on overclock
From: Alvin Lee [Description] - Update clocks is skipped in the GPU overclock sequence - However, we still need to update DISPCLK, DPPCLK, and DTBCLK because the GPU overclock sequence could temporarily disable ODM 2:1 combine because we disable all planes in the sequence Reviewed-by: Jun Lei Acked-by: Aurabindo Pillai Signed-off-by: Alvin Lee --- .../display/dc/clk_mgr/dcn32/dcn32_clk_mgr.c | 24 +++ drivers/gpu/drm/amd/display/dc/dc.h | 7 +- 2 files changed, 20 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn32/dcn32_clk_mgr.c b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn32/dcn32_clk_mgr.c index 85e963ec25ab..1df623b298a9 100644 --- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn32/dcn32_clk_mgr.c +++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn32/dcn32_clk_mgr.c @@ -460,9 +460,6 @@ static void dcn32_update_clocks(struct clk_mgr *clk_mgr_base, bool p_state_change_support; bool fclk_p_state_change_support; - if (dc->work_arounds.skip_clock_update) - return; - if (clk_mgr_base->clks.dispclk_khz == 0 || (dc->debug.force_clock_mode & 0x1)) { /* This is from resume or boot up, if forced_clock cfg option used, @@ -489,7 +486,8 @@ static void dcn32_update_clocks(struct clk_mgr *clk_mgr_base, fclk_p_state_change_support = new_clocks->fclk_p_state_change_support; - if (should_update_pstate_support(safe_to_lower, fclk_p_state_change_support, clk_mgr_base->clks.fclk_p_state_change_support)) { + if (should_update_pstate_support(safe_to_lower, fclk_p_state_change_support, clk_mgr_base->clks.fclk_p_state_change_support) && + !dc->work_arounds.clock_update_disable_mask.fclk) { clk_mgr_base->clks.fclk_p_state_change_support = fclk_p_state_change_support; /* To enable FCLK P-state switching, send FCLK_PSTATE_SUPPORTED message to PMFW */ @@ -503,12 +501,14 @@ static void dcn32_update_clocks(struct clk_mgr *clk_mgr_base, new_clocks->dcfclk_khz = (new_clocks->dcfclk_khz > (dc->debug.force_min_dcfclk_mhz * 1000)) ? 
new_clocks->dcfclk_khz : (dc->debug.force_min_dcfclk_mhz * 1000); - if (should_set_clock(safe_to_lower, new_clocks->dcfclk_khz, clk_mgr_base->clks.dcfclk_khz)) { + if (should_set_clock(safe_to_lower, new_clocks->dcfclk_khz, clk_mgr_base->clks.dcfclk_khz) && + !dc->work_arounds.clock_update_disable_mask.dcfclk) { clk_mgr_base->clks.dcfclk_khz = new_clocks->dcfclk_khz; dcn32_smu_set_hard_min_by_freq(clk_mgr, PPCLK_DCFCLK, khz_to_mhz_ceil(clk_mgr_base->clks.dcfclk_khz)); } - if (should_set_clock(safe_to_lower, new_clocks->dcfclk_deep_sleep_khz, clk_mgr_base->clks.dcfclk_deep_sleep_khz)) { + if (should_set_clock(safe_to_lower, new_clocks->dcfclk_deep_sleep_khz, clk_mgr_base->clks.dcfclk_deep_sleep_khz) && + !dc->work_arounds.clock_update_disable_mask.dcfclk_ds) { clk_mgr_base->clks.dcfclk_deep_sleep_khz = new_clocks->dcfclk_deep_sleep_khz; dcn30_smu_set_min_deep_sleep_dcef_clk(clk_mgr, khz_to_mhz_ceil(clk_mgr_base->clks.dcfclk_deep_sleep_khz)); } @@ -527,7 +527,8 @@ static void dcn32_update_clocks(struct clk_mgr *clk_mgr_base, } p_state_change_support = new_clocks->p_state_change_support; - if (should_update_pstate_support(safe_to_lower, p_state_change_support, clk_mgr_base->clks.p_state_change_support)) { + if (should_update_pstate_support(safe_to_lower, p_state_change_support, clk_mgr_base->clks.p_state_change_support) && + !dc->work_arounds.clock_update_disable_mask.uclk) { clk_mgr_base->clks.p_state_change_support = p_state_change_support; /* to disable P-State switching, set UCLK min = max */ @@ -541,20 +542,23 @@ static void dcn32_update_clocks(struct clk_mgr *clk_mgr_base, update_fclk = true; } - if (clk_mgr_base->ctx->dce_version != DCN_VERSION_3_21 && !clk_mgr_base->clks.fclk_p_state_change_support && update_fclk) { + if (clk_mgr_base->ctx->dce_version != DCN_VERSION_3_21 && !clk_mgr_base->clks.fclk_p_state_change_support && update_fclk && + !dc->work_arounds.clock_update_disable_mask.fclk) { /* Handle code for sending a message to PMFW that FCLK P-state 
change is not supported */ dcn32_smu_send_fclk_pstate_message(clk_mgr, FCLK_PSTATE_NOTSUPPORTED); } /* Always
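The pattern in the hunks above is that each clock keeps its usual "should we program it" test and additionally checks its own bit in clock_update_disable_mask. A stand-alone sketch of that pattern for DCFCLK follows. The bitfield layout is an assumption (the dc.h part of the patch is not shown here; only the four field names appear in the diff), and sketch_should_set_clock() is a simplified stand-in for the driver's should_set_clock(), not a copy of it.

```c
#include <stdbool.h>

/* Assumed layout; only the field names are taken from the diff. */
struct sketch_clock_update_disable_mask {
	unsigned int dcfclk : 1;
	unsigned int dcfclk_ds : 1;
	unsigned int fclk : 1;
	unsigned int uclk : 1;
};

/* Simplified stand-in: always allow raising, allow lowering only when safe. */
static bool sketch_should_set_clock(bool safe_to_lower, int new_khz, int cur_khz)
{
	return (safe_to_lower && new_khz < cur_khz) || new_khz > cur_khz;
}

/*
 * Returns the DCFCLK value that would be programmed: the new value if the
 * usual test passes and the dcfclk bit is clear, otherwise the current one.
 */
static int sketch_next_dcfclk_khz(int cur_khz, int new_khz, bool safe_to_lower,
				  struct sketch_clock_update_disable_mask mask)
{
	if (sketch_should_set_clock(safe_to_lower, new_khz, cur_khz) &&
	    !mask.dcfclk)
		return new_khz;

	return cur_khz;
}
```

Setting only the fclk bit leaves DCFCLK updates untouched in this sketch, which is the point of replacing the single skip_clock_update early return with a per-clock mask.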
[PATCH 00/10] DC Patches for 15 May 2023
This DC patchset brings improvements in multiple areas. In summary, we highlight:

* DC v3.2.236
* Fixes related to DCN clock sequencing
* Changes to FPO acceptance heuristics for various modelines
* Dmesg log readability, visual debug improvements and various bug fixes.

Cc: Daniel Wheeler
---

Alvin Lee (3):
  drm/amd/display: Only skip update for DCFCLK, UCLK, FCLK on overclock
  drm/amd/display: Update vactive margin and max vblank for fpo + vactive
  drm/amd/display: Make unbounded req update separate from dlg/ttu

Aric Cyr (1):
  drm/amd/display: 3.2.236

Daniel Miess (2):
  drm/amd/display: Fix possible underflow for displays with large vblank
  drm/amd/display: Remove v_startup workaround for dcn3+

Leo (Hanghong) Ma (1):
  drm/amd/display: Add visual confirm color support for MCLK switch

Mustapha Ghaddar (1):
  drm/amd/display: enable dpia validate

Rodrigo Siqueira (2):
  drm/amd/display: Convert connector signal id to string
  drm/amd/display: Remove unnecessary variable

 .../display/dc/clk_mgr/dcn32/dcn32_clk_mgr.c  | 24 +
 drivers/gpu/drm/amd/display/dc/core/dc.c      | 51 ---
 .../drm/amd/display/dc/core/dc_hw_sequencer.c | 50 --
 drivers/gpu/drm/amd/display/dc/dc.h           | 12 -
 .../amd/display/dc/dcn10/dcn10_hw_sequencer.c | 22 +++-
 .../amd/display/dc/dcn10/dcn10_hw_sequencer.h |  1 -
 .../drm/amd/display/dc/dcn20/dcn20_hwseq.c    | 37 --
 .../drm/amd/display/dc/dcn20/dcn20_hwseq.h    |  5 --
 .../gpu/drm/amd/display/dc/dcn20/dcn20_init.c |  2 +-
 .../drm/amd/display/dc/dcn201/dcn201_hwseq.c  |  4 +-
 .../drm/amd/display/dc/dcn201/dcn201_init.c   |  2 +-
 .../gpu/drm/amd/display/dc/dcn21/dcn21_init.c |  2 +-
 .../gpu/drm/amd/display/dc/dcn30/dcn30_init.c |  2 +-
 .../drm/amd/display/dc/dcn301/dcn301_init.c   |  2 +-
 .../gpu/drm/amd/display/dc/dcn31/dcn31_init.c |  2 +-
 .../drm/amd/display/dc/dcn314/dcn314_init.c   |  2 +-
 .../gpu/drm/amd/display/dc/dcn32/dcn32_init.c |  2 +-
 .../drm/amd/display/dc/dcn32/dcn32_resource.c |  2 +
 .../drm/amd/display/dc/dcn32/dcn32_resource.h |  1 -
 .../display/dc/dcn32/dcn32_resource_helpers.c |  2 +-
 .../amd/display/dc/dcn321/dcn321_resource.c   |  2 +
 .../drm/amd/display/dc/dml/dcn20/dcn20_fpu.c  | 24 +++--
 .../amd/display/dc/dml/dcn314/dcn314_fpu.c    | 19 +++
 .../drm/amd/display/dc/dml/dcn32/dcn32_fpu.c  | 10 +++-
 .../gpu/drm/amd/display/dc/inc/core_types.h   |  3 ++
 .../gpu/drm/amd/display/dc/inc/hw_sequencer.h |  9 +++-
 .../drm/amd/display/dc/link/link_factory.c    |  6 +--
 .../drm/amd/display/dc/link/link_validation.c |  3 ++
 .../drm/amd/display/include/signal_types.h    | 28 ++
 29 files changed, 224 insertions(+), 107 deletions(-)

--
2.40.0
Re: [RFC PATCH 0/4] Add support for DRM cgroup memory accounting.
Hello,

On Wed, May 10, 2023 at 04:59:01PM +0200, Maarten Lankhorst wrote:
> The misc controller is not granular enough. A single computer may have any
> number of graphics cards, some of them with multiple regions of vram inside
> a single card.

Extending the misc controller to support dynamic keys shouldn't be that
difficult.

...

> In the next version, I will move all the code for handling the resource limit
> to TTM's eviction layer, because otherwise it cannot handle the resource limit
> correctly.
>
> The effect of moving the code to TTM, is that it will make the code even more
> generic for drivers that have vram and use TTM. When using TTM, you only have
> to describe your VRAM, update some fields in the TTM manager and (un)register
> your device with the cgroup handler on (un)load. It's quite trivial to add
> vram accounting to amdgpu and nouveau. [2]
>
> If you want to add a knob for scheduling weight for a process, it makes sense
> to also add resource usage as a knob, otherwise the effect of that knob is
> very limited. So even for Tvrtko's original proposed usecase, it would make
> sense.

It does make sense but unlike Tvrtko's scheduling weights what's being
proposed doesn't seem to encapsulate GPU memory resource in a generic enough
manner at least to my untrained eyes. ie. w/ drm.weight, I don't need any
specific knowledge of how a specific GPU operates to say "this guy should get
2x processing power over that guy". This more or less holds for other major
resources including CPU, memory and IO. What you're proposing seems a lot more
tied to hardware details and users would have to know a lot more about how
memory is configured on that particular GPU.

Now, if this is inherent to how all, or at least most, GPUs operate, sure, but
otherwise let's start small in terms of interface and not take up space which
should be for something universal. If this turns out to be the way, expanding
to take up the generic interface space isn't difficult.

I don't know GPU space so please educate me where I'm wrong.

Thanks.

--
tejun
[PATCH 6/6] drm/amdgpu/bu: update mtype_local parameter settings
From: Graham Sider

Update mtype_local module parameter to use MTYPE_RW by default.

0: MTYPE_RW (default)
1: MTYPE_NC
2: MTYPE_CC

Signed-off-by: Graham Sider
Reviewed-by: Harish Kasiviswanathan
Signed-off-by: Alex Deucher
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |  2 +-
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c   | 12 ++--
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c    |  3 ++-
 3 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 8163abcc420c..562e65ab48fa 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -835,7 +835,7 @@ module_param_named(no_queue_eviction_on_vm_fault, amdgpu_no_queue_eviction_on_vm
  * DOC: mtype_local (int)
  */
 int amdgpu_mtype_local;
-MODULE_PARM_DESC(mtype_local, "MTYPE for local memory (0 = MTYPE_CC (default), 1 = MTYPE_NC, 2 = MTYPE_RW)");
+MODULE_PARM_DESC(mtype_local, "MTYPE for local memory (0 = MTYPE_RW (default), 1 = MTYPE_NC, 2 = MTYPE_CC)");
 module_param_named(mtype_local, amdgpu_mtype_local, int, 0444);

 /**
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 5f7e6e15842b..7dfe6a8ca91a 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -1240,15 +1240,15 @@ static void gmc_v9_0_get_coherence_flags(struct amdgpu_device *adev,
 		 * NUMA systems. Their MTYPE can be overridden per-page in
 		 * gmc_v9_0_override_vm_pte_flags.
 		 */
-		mtype_local = MTYPE_CC;
+		mtype_local = MTYPE_RW;
 		if (amdgpu_mtype_local == 1) {
 			DRM_INFO_ONCE("Using MTYPE_NC for local memory\n");
 			mtype_local = MTYPE_NC;
 		} else if (amdgpu_mtype_local == 2) {
-			DRM_INFO_ONCE("Using MTYPE_RW for local memory\n");
-			mtype_local = MTYPE_RW;
-		} else {
 			DRM_INFO_ONCE("Using MTYPE_CC for local memory\n");
+			mtype_local = MTYPE_CC;
+		} else {
+			DRM_INFO_ONCE("Using MTYPE_RW for local memory\n");
 		}
 		is_local = (!is_vram && (adev->flags & AMD_IS_APU) &&
 			    num_possible_nodes() <= 1) ||
@@ -1364,12 +1364,12 @@ static void gmc_v9_0_override_vm_pte_flags(struct amdgpu_device *adev,
 			/*vm->mem_id*/0, local_node, nid);
 	if (nid == local_node) {
 		uint64_t old_flags = *flags;
-		unsigned int mtype_local = MTYPE_CC;
+		unsigned int mtype_local = MTYPE_RW;

 		if (amdgpu_mtype_local == 1)
 			mtype_local = MTYPE_NC;
 		else if (amdgpu_mtype_local == 2)
-			mtype_local = MTYPE_RW;
+			mtype_local = MTYPE_CC;

 		*flags = (*flags & ~AMDGPU_PTE_MTYPE_VG10_MASK) |
 			 AMDGPU_PTE_MTYPE_VG10(mtype_local);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 9053202ab534..c5675c7e3b9e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1191,7 +1191,8 @@ svm_range_get_pte_flags(struct kfd_node *node,
 		}
 		break;
 	case IP_VERSION(9, 4, 3):
-		mtype_local = amdgpu_mtype_local == 1 ? AMDGPU_VM_MTYPE_NC : (amdgpu_mtype_local == 2 ? AMDGPU_VM_MTYPE_RW : AMDGPU_VM_MTYPE_CC);
+		mtype_local = amdgpu_mtype_local == 1 ? AMDGPU_VM_MTYPE_NC :
+			     (amdgpu_mtype_local == 2 ? AMDGPU_VM_MTYPE_CC : AMDGPU_VM_MTYPE_RW);
 		snoop = true;
 		if (uncached) {
 			mapping_flags |= AMDGPU_VM_MTYPE_UC;
--
2.40.1
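The same parameter-to-MTYPE mapping is open-coded in three places in this patch. A small sketch of the mapping as a single function, purely for illustration: the enum and function names are hypothetical, and the enum values are placeholders rather than the hardware MTYPE encodings.

```c
/* Placeholder values, not the hardware MTYPE encodings. */
enum sketch_mtype {
	SKETCH_MTYPE_NC,
	SKETCH_MTYPE_CC,
	SKETCH_MTYPE_RW,
};

/*
 * Maps the mtype_local module parameter to an MTYPE after this patch:
 * 1 selects NC, 2 selects CC, and 0 or any other value falls back to RW.
 */
static enum sketch_mtype sketch_select_mtype_local(int mtype_local_param)
{
	switch (mtype_local_param) {
	case 1:
		return SKETCH_MTYPE_NC;
	case 2:
		return SKETCH_MTYPE_CC;
	default:
		return SKETCH_MTYPE_RW;
	}
}
```

Note that, as in the patch, out-of-range values are not rejected; they silently take the default branch.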
[PATCH 5/6] drm/amdgpu/bu: add mtype_local as a module parameter
From: David Francis Selects the MTYPE to be used for local memory, (0 = MTYPE_CC (default), 1 = MTYPE_NC, 2 = MTYPE_RW) This change is for internal testing only - do not upstream. v2: squash in build fix (Alex) Reviewed-by: Graham Sider Signed-off-by: David Francis Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 8 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 19 --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c| 3 +-- 4 files changed, 22 insertions(+), 10 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index a3a0dbeb251f..bed6d1d09ac2 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -213,7 +213,7 @@ extern int amdgpu_noretry; extern int amdgpu_force_asic_type; extern int amdgpu_smartshift_bias; extern int amdgpu_use_xgmi_p2p; -extern bool amdgpu_use_mtype_cc_wa; +extern int amdgpu_mtype_local; #ifdef CONFIG_HSA_AMD extern int sched_policy; extern bool debug_evictions; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c index 2f38c49aa597..8163abcc420c 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c @@ -832,11 +832,11 @@ module_param_named(no_queue_eviction_on_vm_fault, amdgpu_no_queue_eviction_on_vm #endif /** - * DOC: use_mtype_cc_wa (bool) + * DOC: mtype_local (int) */ -bool amdgpu_use_mtype_cc_wa = true; -MODULE_PARM_DESC(use_mtype_cc_wa, "Use MTYPE_CC workaround (0 = use MTYPE_RW where applicable, 1 = use MTYPE_CC where applicable (default))"); -module_param_named(use_mtype_cc_wa, amdgpu_use_mtype_cc_wa, bool, 0444); +int amdgpu_mtype_local; +MODULE_PARM_DESC(mtype_local, "MTYPE for local memory (0 = MTYPE_CC (default), 1 = MTYPE_NC, 2 = MTYPE_RW)"); +module_param_named(mtype_local, amdgpu_mtype_local, int, 0444); /** * DOC: pcie_p2p (bool) diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c index 5c9f0169292e..5f7e6e15842b 100644 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c @@ -1240,7 +1240,16 @@ static void gmc_v9_0_get_coherence_flags(struct amdgpu_device *adev, * NUMA systems. Their MTYPE can be overridden per-page in * gmc_v9_0_override_vm_pte_flags. */ - mtype_local = amdgpu_use_mtype_cc_wa ? MTYPE_CC : MTYPE_RW; + mtype_local = MTYPE_CC; + if (amdgpu_mtype_local == 1) { + DRM_INFO_ONCE("Using MTYPE_NC for local memory\n"); + mtype_local = MTYPE_NC; + } else if (amdgpu_mtype_local == 2) { + DRM_INFO_ONCE("Using MTYPE_RW for local memory\n"); + mtype_local = MTYPE_RW; + } else { + DRM_INFO_ONCE("Using MTYPE_CC for local memory\n"); + } is_local = (!is_vram && (adev->flags & AMD_IS_APU) && num_possible_nodes() <= 1) || (is_vram && adev == bo_adev /* TODO: memory partitions && @@ -1354,9 +1363,13 @@ static void gmc_v9_0_override_vm_pte_flags(struct amdgpu_device *adev, dev_dbg(adev->dev, "vm->mem_id=%d, local_node=%d, nid=%d\n", /*vm->mem_id*/0, local_node, nid); if (nid == local_node) { - unsigned int mtype_local = - amdgpu_use_mtype_cc_wa ? MTYPE_CC : MTYPE_RW; uint64_t old_flags = *flags; + unsigned int mtype_local = MTYPE_CC; + + if (amdgpu_mtype_local == 1) + mtype_local = MTYPE_NC; + else if (amdgpu_mtype_local == 2) + mtype_local = MTYPE_RW; *flags = (*flags & ~AMDGPU_PTE_MTYPE_VG10_MASK) | AMDGPU_PTE_MTYPE_VG10(mtype_local); diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c index ab1acf97d049..9053202ab534 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c @@ -1191,8 +1191,7 @@ svm_range_get_pte_flags(struct kfd_node *node, } break; case IP_VERSION(9, 4, 3): - mtype_local = amdgpu_use_mtype_cc_wa ? AMDGPU_VM_MTYPE_CC : - AMDGPU_VM_MTYPE_RW; + mtype_local = amdgpu_mtype_local == 1 ? AMDGPU_VM_MTYPE_NC : (amdgpu_mtype_local == 2 ? 
AMDGPU_VM_MTYPE_RW : AMDGPU_VM_MTYPE_CC); snoop = true; if (uncached) { mapping_flags |= AMDGPU_VM_MTYPE_UC; -- 2.40.1
[PATCH 3/6] drm/amdgpu: Fix per-BO MTYPE selection for GFXv9.4.3
From: Felix Kuehling Treat system memory on NUMA systems as remote by default. Overriding with a more efficient MTYPE per page will be implemented in the next patch. No need for a special case for APP APUs. System memory is handled the same for carve-out and native mode. And VRAM doesn't exist in native mode. Signed-off-by: Felix Kuehling Reviewed-by: Philip Yang Reviewed-and-tested-by: Rajneesh Bhardwaj Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 40 +++ drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 24 +--- 2 files changed, 30 insertions(+), 34 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c index 59ce741dfa73..52f5bab5fcb7 100644 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c @@ -1191,9 +1191,10 @@ static void gmc_v9_0_get_coherence_flags(struct amdgpu_device *adev, bool is_vram = bo->tbo.resource->mem_type == TTM_PL_VRAM; bool coherent = bo->flags & AMDGPU_GEM_CREATE_COHERENT; bool uncached = bo->flags & AMDGPU_GEM_CREATE_UNCACHED; - unsigned int mtype; - unsigned int mtype_default; + /* TODO: memory partitions struct amdgpu_vm *vm = mapping->bo_va->base.vm;*/ + unsigned int mtype_local, mtype; bool snoop = false; + bool is_local; switch (adev->ip_versions[GC_HWIP][0]) { case IP_VERSION(9, 4, 1): @@ -1233,35 +1234,26 @@ static void gmc_v9_0_get_coherence_flags(struct amdgpu_device *adev, } break; case IP_VERSION(9, 4, 3): - /* FIXME: Needs more work for handling multiple memory -* partitions (> NPS1 mode) e.g. NPS4 for both APU and dGPU -* modes. -* FIXME: Temporarily using MTYPE_CC instead of MTYPE_RW where applicable. -* To force use of MTYPE_RW, set use_mtype_cc_wa=0 + /* Only local VRAM BOs or system memory on non-NUMA APUs +* can be assumed to be local in their entirety. Choose +* MTYPE_NC as safe fallback for all system memory BOs on +* NUMA systems. Their MTYPE can be overridden per-page in +* gmc_v9_0_override_vm_pte_flags. 
*/ - mtype_default = amdgpu_use_mtype_cc_wa ? MTYPE_CC : MTYPE_RW; + mtype_local = amdgpu_use_mtype_cc_wa ? MTYPE_CC : MTYPE_RW; + is_local = (!is_vram && (adev->flags & AMD_IS_APU) && + num_possible_nodes() <= 1) || + (is_vram && adev == bo_adev /* TODO: memory partitions && + bo->mem_id == vm->mem_id*/); snoop = true; if (uncached) { mtype = MTYPE_UC; - } else if (adev->gmc.is_app_apu) { - /* FIXME: APU in native mode, NPS1 single socket only -* -* For suporting NUMA partitioned APU e.g. in NPS4 mode, -* this need to look at the NUMA node on which the -* system memory allocation was done. -* -* Memory access by a different partition within same -* socket should be treated as remote access so MTYPE_RW -* cannot be used always. -*/ - mtype = mtype_default; } else if (adev->flags & AMD_IS_APU) { - /* APU on carve out mode */ - mtype = mtype_default; + mtype = is_local ? mtype_local : MTYPE_NC; } else { /* dGPU */ - if (is_vram && bo_adev == adev) - mtype = mtype_default; + if (is_local) + mtype = mtype_local; else if (is_vram) mtype = MTYPE_NC; else diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c index c55b9754c506..ab1acf97d049 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c @@ -1150,6 +1150,7 @@ svm_range_get_pte_flags(struct kfd_node *node, bool snoop = (domain != SVM_RANGE_VRAM_DOMAIN); bool coherent = flags & KFD_IOCTL_SVM_FLAG_COHERENT; bool uncached = flags & KFD_IOCTL_SVM_FLAG_UNCACHED; + unsigned int mtype_local; if (domain == SVM_RANGE_VRAM_DOMAIN) bo_node = prange->svm_bo->node; @@ -1190,19 +1191,16 @@ svm_range_get_pte_flags(struct kfd_node *node, } break; case IP_VERSION(9, 4, 3): - //TODO: Need more work for handling multiple memory partitions - //e.g. NPS4.
[PATCH 4/6] drm/amdgpu: Override MTYPE per page on GFXv9.4.3 APUs
From: Felix Kuehling On GFXv9.4.3 NUMA APUs, system memory locality must be determined per page to choose the correct MTYPE. This patch adds a GMC callback that can provide this per-page override and implements it for native mode. Carve-out mode is not yet supported and will use the safe default (remote) MTYPE for system memory. Signed-off-by: Felix Kuehling Reviewed-by: Philip Yang Reviewed-and-tested-by: Rajneesh Bhardwaj Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h | 7 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 22 ++-- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 64 +++ 3 files changed, 90 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h index 43357d699e6e..6794edd1d2d2 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h @@ -148,6 +148,10 @@ struct amdgpu_gmc_funcs { void (*get_vm_pte)(struct amdgpu_device *adev, struct amdgpu_bo_va_mapping *mapping, uint64_t *flags); + /* override per-page pte flags */ + void (*override_vm_pte_flags)(struct amdgpu_device *dev, + struct amdgpu_vm *vm, + uint64_t addr, uint64_t *flags); /* get the amount of memory used by the vbios for pre-OS console */ unsigned int (*get_vbios_fb_size)(struct amdgpu_device *adev); @@ -336,6 +340,9 @@ struct amdgpu_gmc { #define amdgpu_gmc_map_mtype(adev, flags) (adev)->gmc.gmc_funcs->map_mtype((adev),(flags)) #define amdgpu_gmc_get_vm_pde(adev, level, dst, flags) (adev)->gmc.gmc_funcs->get_vm_pde((adev), (level), (dst), (flags)) #define amdgpu_gmc_get_vm_pte(adev, mapping, flags) (adev)->gmc.gmc_funcs->get_vm_pte((adev), (mapping), (flags)) +#define amdgpu_gmc_override_vm_pte_flags(adev, vm, addr, pte_flags)\ + (adev)->gmc.gmc_funcs->override_vm_pte_flags\ + ((adev), (vm), (addr), (pte_flags)) #define amdgpu_gmc_get_vbios_fb_size(adev) (adev)->gmc.gmc_funcs->get_vbios_fb_size((adev)) /** diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c index bc5d126b600b..60b1da93b06d 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c @@ -786,13 +786,14 @@ static void amdgpu_vm_pte_update_flags(struct amdgpu_vm_update_params *params, uint64_t pe, uint64_t addr, unsigned int count, uint32_t incr, uint64_t flags) - { + struct amdgpu_device *adev = params->adev; + if (level != AMDGPU_VM_PTB) { flags |= AMDGPU_PDE_PTE; - amdgpu_gmc_get_vm_pde(params->adev, level, , ); + amdgpu_gmc_get_vm_pde(adev, level, , ); - } else if (params->adev->asic_type >= CHIP_VEGA10 && + } else if (adev->asic_type >= CHIP_VEGA10 && !(flags & AMDGPU_PTE_VALID) && !(flags & AMDGPU_PTE_PRT)) { @@ -800,6 +801,21 @@ static void amdgpu_vm_pte_update_flags(struct amdgpu_vm_update_params *params, flags |= AMDGPU_PTE_EXECUTABLE; } + /* APUs mapping system memory may need different MTYPEs on different +* NUMA nodes. Only do this for contiguous ranges that can be assumed +* to be on the same NUMA node. +*/ + if ((flags & AMDGPU_PTE_SYSTEM) && (adev->flags & AMD_IS_APU) && + adev->gmc.gmc_funcs->override_vm_pte_flags && + num_possible_nodes() > 1) { + if (!params->pages_addr) + amdgpu_gmc_override_vm_pte_flags(adev, params->vm, +addr, ); + else + dev_dbg(adev->dev, + "override_vm_pte_flags skipped: non-contiguous\n"); + } + params->vm->update_funcs->update(params, pt, pe, addr, count, incr, flags); } diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c index 52f5bab5fcb7..5c9f0169292e 100644 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c @@ -1302,6 +1302,69 @@ static void gmc_v9_0_get_vm_pte(struct amdgpu_device *adev, mapping, flags); } +static void gmc_v9_0_override_vm_pte_flags(struct amdgpu_device *adev, + struct amdgpu_vm *vm, + uint64_t addr, uint64_t *flags) +{ + int local_node, nid; + + /* Only GFX 9.4.3 APUs associate GPUs with NUMA nodes. 
Local system +* memory can use more efficient MTYPEs. +*/ + if
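The archived patch is cut off before the body of the override callback, so the following is only a user-space sketch of the gating logic visible in amdgpu_vm_pte_update_flags(), plus one plausible per-page decision. All sk_* names, the SK_PTE_SYSTEM bit value, and the rule "same NUMA node means local, anything else falls back to the safe remote default" are assumptions made for illustration; they are not taken from the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for AMDGPU_PTE_SYSTEM; the real bit is not shown here. */
#define SK_PTE_SYSTEM (UINT64_C(1) << 1)

/* Hypothetical locality result; the real code selects an MTYPE instead. */
enum sk_locality { SK_REMOTE = 0, SK_LOCAL = 1 };

/* Models the inputs the patch inspects before calling the override. */
struct sk_update {
	uint64_t flags;        /* PTE flags for the range */
	bool is_apu;           /* models adev->flags & AMD_IS_APU */
	bool has_override_cb;  /* models gmc_funcs->override_vm_pte_flags != NULL */
	int possible_nodes;    /* models num_possible_nodes() */
	bool contiguous;       /* models params->pages_addr == NULL */
};

/* True only when every condition from the patch hunk holds. */
static bool sk_should_override(const struct sk_update *u)
{
	if (!(u->flags & SK_PTE_SYSTEM) || !u->is_apu)
		return false;
	if (!u->has_override_cb || u->possible_nodes <= 1)
		return false;
	/* Non-contiguous ranges are skipped (the patch only logs a debug message). */
	return u->contiguous;
}

/* Assumed rule: unknown nodes are treated as remote, the safe default. */
static enum sk_locality sk_classify_page(int gpu_node, int page_node)
{
	if (gpu_node < 0 || page_node < 0)
		return SK_REMOTE;
	return gpu_node == page_node ? SK_LOCAL : SK_REMOTE;
}
```

Keeping the gate and the classification separate mirrors the split in the patch between the generic VM code (which decides whether to ask) and the GMC callback (which decides the answer).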
[PATCH 2/6] drm/amdgpu/bu: Add use_mtype_cc_wa module param
From: Graham Sider By default, set use_mtype_cc_wa to 1 to set PTE coherence flag MTYPE_CC instead of MTYPE_RW by default. This is required for the time being to mitigate a bug causing XCCs to hit stale data due to TCC marking fully dirty lines as exclusive. Signed-off-by: Graham Sider Reviewed-by: Joseph Greathouse Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 7 +++ drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 10 +++--- drivers/gpu/drm/amd/amdkfd/kfd_svm.c| 7 +-- 4 files changed, 20 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index 9904ce78b8fc..a3a0dbeb251f 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -213,6 +213,7 @@ extern int amdgpu_noretry; extern int amdgpu_force_asic_type; extern int amdgpu_smartshift_bias; extern int amdgpu_use_xgmi_p2p; +extern bool amdgpu_use_mtype_cc_wa; #ifdef CONFIG_HSA_AMD extern int sched_policy; extern bool debug_evictions; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c index e4d09bf0887d..2f38c49aa597 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c @@ -831,6 +831,13 @@ MODULE_PARM_DESC(no_queue_eviction_on_vm_fault, "No queue eviction on VM fault ( module_param_named(no_queue_eviction_on_vm_fault, amdgpu_no_queue_eviction_on_vm_fault, int, 0444); #endif +/** + * DOC: use_mtype_cc_wa (bool) + */ +bool amdgpu_use_mtype_cc_wa = true; +MODULE_PARM_DESC(use_mtype_cc_wa, "Use MTYPE_CC workaround (0 = use MTYPE_RW where applicable, 1 = use MTYPE_CC where applicable (default))"); +module_param_named(use_mtype_cc_wa, amdgpu_use_mtype_cc_wa, bool, 0444); + /** * DOC: pcie_p2p (bool) * Enable PCIe P2P (requires large-BAR). 
Default value: true (on) diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c index d28ffdb07ae6..59ce741dfa73 100644 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c @@ -1192,6 +1192,7 @@ static void gmc_v9_0_get_coherence_flags(struct amdgpu_device *adev, bool coherent = bo->flags & AMDGPU_GEM_CREATE_COHERENT; bool uncached = bo->flags & AMDGPU_GEM_CREATE_UNCACHED; unsigned int mtype; + unsigned int mtype_default; bool snoop = false; switch (adev->ip_versions[GC_HWIP][0]) { @@ -1235,7 +1236,10 @@ static void gmc_v9_0_get_coherence_flags(struct amdgpu_device *adev, /* FIXME: Needs more work for handling multiple memory * partitions (> NPS1 mode) e.g. NPS4 for both APU and dGPU * modes. +* FIXME: Temporarily using MTYPE_CC instead of MTYPE_RW where applicable. +* To force use of MTYPE_RW, set use_mtype_cc_wa=0 */ + mtype_default = amdgpu_use_mtype_cc_wa ? MTYPE_CC : MTYPE_RW; snoop = true; if (uncached) { mtype = MTYPE_UC; @@ -1250,14 +1254,14 @@ static void gmc_v9_0_get_coherence_flags(struct amdgpu_device *adev, * socket should be treated as remote access so MTYPE_RW * cannot be used always. 
*/ - mtype = MTYPE_RW; + mtype = mtype_default; } else if (adev->flags & AMD_IS_APU) { /* APU on carve out mode */ - mtype = MTYPE_RW; + mtype = mtype_default; } else { /* dGPU */ if (is_vram && bo_adev == adev) - mtype = MTYPE_RW; + mtype = mtype_default; else if (is_vram) mtype = MTYPE_NC; else diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c index 83f8e4e50315..c55b9754c506 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c @@ -1197,9 +1197,12 @@ svm_range_get_pte_flags(struct kfd_node *node, if (uncached) { mapping_flags |= AMDGPU_VM_MTYPE_UC; } else if (domain == SVM_RANGE_VRAM_DOMAIN) { - /* local HBM region close to partition */ + /* local HBM region close to partition +* FIXME: Temporarily using MTYPE_CC instead of MTYPE_RW where applicable. +* To force use of MTYPE_RW, set use_mtype_cc_wa=0 +*/ if (bo_node == node) - mapping_flags |= AMDGPU_VM_MTYPE_RW; + mapping_flags |=
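As a rough illustration of the workaround in this patch, the sketch below models only the two decisions that are visible in the diff: the default MTYPE follows the use_mtype_cc_wa setting, and uncached buffers still get MTYPE_UC. The sk_* names are hypothetical, and the other branches of the real function (coherent buffers, remote VRAM, system memory) are deliberately left out because the archived diff does not show them in full.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the MTYPE_* values named in the patch. */
enum sk_mtype { SK_MTYPE_RW, SK_MTYPE_CC, SK_MTYPE_UC };

/* Mirrors: mtype_default = amdgpu_use_mtype_cc_wa ? MTYPE_CC : MTYPE_RW; */
static enum sk_mtype sk_default_mtype(bool use_mtype_cc_wa)
{
	return use_mtype_cc_wa ? SK_MTYPE_CC : SK_MTYPE_RW;
}

/* MTYPE for memory local to the device, limited to the cases shown in the diff. */
static enum sk_mtype sk_local_mtype(bool uncached, bool use_mtype_cc_wa)
{
	if (uncached)
		return SK_MTYPE_UC;
	return sk_default_mtype(use_mtype_cc_wa);
}
```

In the patch the parameter is registered with permissions 0444, so it is read-only once the module is loaded and would have to be chosen at load time.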
[PATCH 1/6] drm/amdgpu/bu: Use legacy TLB flush for gfx943
From: Graham Sider

Invalidate TLBs via a legacy flush request (flush_type=0) prior to the
heavyweight flush requests (flush_type=2) in gmc_v9_0.c. This is
temporarily required to mitigate a bug causing CPC UTCL1 to return stale
translations after invalidation requests in address range mode.

Signed-off-by: Graham Sider
Reviewed-by: Philip Yang
Signed-off-by: Alex Deucher
---
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index f000e0e89bd0..d28ffdb07ae6 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -833,6 +833,14 @@ static void gmc_v9_0_flush_gpu_tlb(struct amdgpu_device *adev, uint32_t vmid,
 		 */
 		inv_req = gmc_v9_0_get_invalidate_req(vmid, 2);
 		inv_req2 = gmc_v9_0_get_invalidate_req(vmid, flush_type);
+	} else if (flush_type == 2 &&
+		   adev->ip_versions[GC_HWIP][0] == IP_VERSION(9, 4, 3)) {
+		/* FIXME: Temporarily add a legacy flush (type 0) before heavyweight
+		 * flush for gfx943 to mitigate a bug which causes CPC UTCL1 to return
+		 * stale translations even after TLB heavyweight flush.
+		 */
+		inv_req = gmc_v9_0_get_invalidate_req(vmid, 0);
+		inv_req2 = gmc_v9_0_get_invalidate_req(vmid, flush_type);
 	} else {
 		inv_req = gmc_v9_0_get_invalidate_req(vmid, flush_type);
 		inv_req2 = 0;
@@ -976,6 +984,15 @@ static int gmc_v9_0_flush_gpu_tlb_pasid(struct amdgpu_device *adev,
 		if (vega20_xgmi_wa)
 			kiq->pmf->kiq_invalidate_tlbs(ring,
 						      pasid, 2, all_hub);
+
+		/* FIXME: Temporarily add a legacy flush (type 0) before heavyweight
+		 * flush for gfx943 to mitigate a bug which causes CPC UTCL1 to return
+		 * stale translations even after TLB heavyweight flush.
+		 */
+		if (flush_type == 2 && adev->ip_versions[GC_HWIP][0] == IP_VERSION(9, 4, 3))
+			kiq->pmf->kiq_invalidate_tlbs(ring,
+						pasid, 0, all_hub);
+
 		kiq->pmf->kiq_invalidate_tlbs(ring,
 					pasid, flush_type, all_hub);
 		r = amdgpu_fence_emit_polling(ring, &seq, MAX_KIQ_REG_WAIT);
-- 
2.40.1
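The ordering this patch introduces can be sketched in user space as a small planning function: for a heavyweight flush (type 2) on GC 9.4.3, a legacy flush (type 0) is issued first, and every other case issues only the requested flush. The names plan_tlb_flushes and SK_IP_VERSION are hypothetical, and the separate Vega20 XGMI workaround branch that precedes this check in the real function is not modeled.

```c
#include <stddef.h>
#include <stdint.h>

/* Flush type values as named in the patch description. */
enum sk_flush_type { SK_FLUSH_LEGACY = 0, SK_FLUSH_HEAVYWEIGHT = 2 };

/* Hypothetical packed version number, loosely modeled on IP_VERSION(maj, min, rev). */
#define SK_IP_VERSION(maj, min, rev) (((maj) << 16) | ((min) << 8) | (rev))

/*
 * Write the flush types to issue, in order, into out[] and return how many
 * entries were written (1 or 2).
 */
static size_t plan_tlb_flushes(uint32_t gc_version, uint32_t flush_type,
			       uint32_t out[2])
{
	size_t n = 0;

	/* Workaround: legacy flush before a heavyweight flush on GC 9.4.3 only. */
	if (flush_type == SK_FLUSH_HEAVYWEIGHT &&
	    gc_version == SK_IP_VERSION(9, 4, 3))
		out[n++] = SK_FLUSH_LEGACY;

	/* The requested flush is always issued last. */
	out[n++] = flush_type;
	return n;
}
```

For example, calling plan_tlb_flushes(SK_IP_VERSION(9, 4, 3), 2, seq) fills seq with 0 followed by 2, while any other version or flush type yields a single entry.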
[linux-next:master] BUILD SUCCESS WITH WARNING 578215f3e21c472c08d70b8796edf1ac58f88578
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master branch HEAD: 578215f3e21c472c08d70b8796edf1ac58f88578 Add linux-next specific files for 20230510 Warning reports: https://lore.kernel.org/oe-kbuild-all/202304140707.coh337ux-...@intel.com Warning: (recently discovered and may have been fixed) drivers/base/regmap/regcache-maple.c:113:23: warning: 'lower_index' is used uninitialized [-Wuninitialized] drivers/base/regmap/regcache-maple.c:113:36: warning: 'lower_last' is used uninitialized [-Wuninitialized] drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:6395:21: warning: variable 'count' set but not used [-Wunused-but-set-variable] drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:499:13: warning: variable 'j' set but not used [-Wunused-but-set-variable] drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c:48:38: warning: unused variable 'golden_settings_gc_9_4_3' [-Wunused-const-variable] Unverified Warning (likely false positive, please contact us if interested): drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:648:3-9: preceding lock on line 640 drivers/gpu/drm/i915/display/intel_psr.c:2999:0-23: WARNING: i915_edp_psr_debug_fops should be defined with DEFINE_DEBUGFS_ATTRIBUTE fs/ext4/super.c:4724 ext4_check_feature_compatibility() warn: bitwise AND condition is false here fs/ext4/verity.c:316 ext4_get_verity_descriptor_location() error: uninitialized symbol 'desc_size_disk'. fs/xfs/scrub/fscounters.c:459 xchk_fscounters() warn: ignoring unreachable code. 
Warning ids grouped by kconfigs: gcc_recent_errors |-- alpha-allyesconfig | |-- drivers-gpu-drm-amd-amdgpu-..-display-amdgpu_dm-amdgpu_dm.c:warning:variable-count-set-but-not-used | `-- drivers-gpu-drm-amd-amdgpu-amdgpu_gfx.c:warning:variable-j-set-but-not-used |-- arc-allyesconfig | |-- drivers-base-regmap-regcache-maple.c:warning:lower_index-is-used-uninitialized | |-- drivers-base-regmap-regcache-maple.c:warning:lower_last-is-used-uninitialized | |-- drivers-gpu-drm-amd-amdgpu-..-display-amdgpu_dm-amdgpu_dm.c:warning:variable-count-set-but-not-used | `-- drivers-gpu-drm-amd-amdgpu-amdgpu_gfx.c:warning:variable-j-set-but-not-used |-- arc-randconfig-r025-20230509 | |-- drivers-base-regmap-regcache-maple.c:warning:lower_index-is-used-uninitialized | `-- drivers-base-regmap-regcache-maple.c:warning:lower_last-is-used-uninitialized |-- arm-allmodconfig | |-- drivers-gpu-drm-amd-amdgpu-..-display-amdgpu_dm-amdgpu_dm.c:warning:variable-count-set-but-not-used | `-- drivers-gpu-drm-amd-amdgpu-amdgpu_gfx.c:warning:variable-j-set-but-not-used |-- arm-allyesconfig | |-- drivers-gpu-drm-amd-amdgpu-..-display-amdgpu_dm-amdgpu_dm.c:warning:variable-count-set-but-not-used | `-- drivers-gpu-drm-amd-amdgpu-amdgpu_gfx.c:warning:variable-j-set-but-not-used |-- arm64-allyesconfig | |-- drivers-gpu-drm-amd-amdgpu-..-display-amdgpu_dm-amdgpu_dm.c:warning:variable-count-set-but-not-used | `-- drivers-gpu-drm-amd-amdgpu-amdgpu_gfx.c:warning:variable-j-set-but-not-used |-- csky-allmodconfig | |-- drivers-gpu-drm-amd-amdgpu-..-display-amdgpu_dm-amdgpu_dm.c:warning:variable-count-set-but-not-used | `-- drivers-gpu-drm-amd-amdgpu-amdgpu_gfx.c:warning:variable-j-set-but-not-used |-- i386-allyesconfig | |-- drivers-gpu-drm-amd-amdgpu-..-display-amdgpu_dm-amdgpu_dm.c:warning:variable-count-set-but-not-used | `-- drivers-gpu-drm-amd-amdgpu-amdgpu_gfx.c:warning:variable-j-set-but-not-used |-- ia64-allmodconfig | |-- 
drivers-gpu-drm-amd-amdgpu-..-display-amdgpu_dm-amdgpu_dm.c:warning:variable-count-set-but-not-used | `-- drivers-gpu-drm-amd-amdgpu-amdgpu_gfx.c:warning:variable-j-set-but-not-used |-- ia64-randconfig-s052-20230509 | `-- drivers-gpu-drm-amd-amdgpu-amdgpu_gfx.c:warning:variable-j-set-but-not-used |-- loongarch-allmodconfig | |-- drivers-gpu-drm-amd-amdgpu-..-display-amdgpu_dm-amdgpu_dm.c:warning:variable-count-set-but-not-used | `-- drivers-gpu-drm-amd-amdgpu-amdgpu_gfx.c:warning:variable-j-set-but-not-used |-- loongarch-defconfig | |-- drivers-gpu-drm-amd-amdgpu-..-display-amdgpu_dm-amdgpu_dm.c:warning:variable-count-set-but-not-used | `-- drivers-gpu-drm-amd-amdgpu-amdgpu_gfx.c:warning:variable-j-set-but-not-used |-- loongarch-randconfig-c023-20230509 | |-- drivers-gpu-drm-amd-amdgpu-..-display-amdgpu_dm-amdgpu_dm.c:warning:variable-count-set-but-not-used | `-- drivers-gpu-drm-amd-amdgpu-amdgpu_gfx.c:warning:variable-j-set-but-not-used |-- loongarch-randconfig-s051-20230509 | `-- drivers-gpu-drm-amd-amdgpu-amdgpu_gfx.c:warning:variable-j-set-but-not-used |-- microblaze-randconfig-m031-20230509 | `-- fs-ext4-super.c-ext4_check_feature_compatibility()-warn:bitwise-AND-condition-is-false-here |-- microblaze-randconfig-r035-20230509 | |-- drivers-gpu-drm-amd-amdgpu-..-display-amdgpu_dm-amdgpu_dm.c:warning:variable-count-set-but-not-used | `-- drivers-gpu-drm-amd-amdgpu-amdgpu_gfx.c:warning:variable-j-set-but-not-used |-- microblaze-randconfig-s032
Re: [PATCH] drm/amdkfd: Remove skiping userptr buffer mapping when mmu notifier marks it as invalid
On Wed, May 10, 2023 at 11:00 AM Felix Kuehling wrote:
>
> On 2023-05-09 at 18:17, Alex Deucher wrote:
> > From: Xiaogang Chen
> >
> > mmu notifier does not always hold mm->sem during call back. That causes
> > a race condition between kfd userptr buffer mapping and mmu notifier
> > which leads to gpu shader or SDMA access userptr buffer before it has been
> > mapped to gpu VM. Always map userptr buffer to avoid that though it may make
> > some userptr buffers mapped two times.
> >
> > Suggested-by: Felix Kuehling
> > Signed-off-by: Xiaogang Chen
> > Reviewed-by: Felix Kuehling
> > Signed-off-by: Alex Deucher
>
> This patch is no longer needed and should not be applied. It was
> originally applied to amd-staging-drm-next as patch
> fcf00f8d29f2fc6bf00531a1447be28b99073cc3 in November 2022. This fixed a
> race condition due to incorrect assumptions about the mmap lock and MMU
> notifiers. This hunk was added back by my later patch f95f51a4c335
> ("drm/amdgpu: Add notifier lock for KFD userptrs") in December, using
> our own notifier lock that doesn't suffer from those races.
>

Thanks. Dropped.

Alex

> Regards,
>    Felix
>
>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 10 --
> >  1 file changed, 10 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> > index 58a774647573..40078c0a5585 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> > @@ -1942,16 +1942,6 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
> >  	 */
> >  	mutex_lock(&mem->process_info->lock);
> >
> > -	/* Lock notifier lock. If we find an invalid userptr BO, we can be
> > -	 * sure that the MMU notifier is no longer running
> > -	 * concurrently and the queues are actually stopped
> > -	 */
> > -	if (amdgpu_ttm_tt_get_usermm(bo->tbo.ttm)) {
> > -		mutex_lock(&mem->process_info->notifier_lock);
> > -		is_invalid_userptr = !!mem->invalid;
> > -		mutex_unlock(&mem->process_info->notifier_lock);
> > -	}
> > -
> >  	mutex_lock(&mem->lock);
> >
> >  	domain = mem->domain;
Re: [PATCH] drm/amdkfd: Remove skiping userptr buffer mapping when mmu notifier marks it as invalid
On 2023-05-09 at 18:17, Alex Deucher wrote:
> From: Xiaogang Chen
>
> mmu notifier does not always hold mm->sem during call back. That causes
> a race condition between kfd userptr buffer mapping and mmu notifier
> which leads to gpu shader or SDMA access userptr buffer before it has been
> mapped to gpu VM. Always map userptr buffer to avoid that though it may make
> some userptr buffers mapped two times.
>
> Suggested-by: Felix Kuehling
> Signed-off-by: Xiaogang Chen
> Reviewed-by: Felix Kuehling
> Signed-off-by: Alex Deucher

This patch is no longer needed and should not be applied. It was
originally applied to amd-staging-drm-next as patch
fcf00f8d29f2fc6bf00531a1447be28b99073cc3 in November 2022. This fixed a
race condition due to incorrect assumptions about the mmap lock and MMU
notifiers. This hunk was added back by my later patch f95f51a4c335
("drm/amdgpu: Add notifier lock for KFD userptrs") in December, using
our own notifier lock that doesn't suffer from those races.

Regards,
   Felix

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 10 --
>  1 file changed, 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index 58a774647573..40078c0a5585 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -1942,16 +1942,6 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
>  	 */
>  	mutex_lock(&mem->process_info->lock);
>
> -	/* Lock notifier lock. If we find an invalid userptr BO, we can be
> -	 * sure that the MMU notifier is no longer running
> -	 * concurrently and the queues are actually stopped
> -	 */
> -	if (amdgpu_ttm_tt_get_usermm(bo->tbo.ttm)) {
> -		mutex_lock(&mem->process_info->notifier_lock);
> -		is_invalid_userptr = !!mem->invalid;
> -		mutex_unlock(&mem->process_info->notifier_lock);
> -	}
> -
>  	mutex_lock(&mem->lock);
>
>  	domain = mem->domain;
Re: [RFC PATCH 0/4] Add support for DRM cgroup memory accounting.
Hey,

On 2023-05-05 21:50, Tejun Heo wrote:
> Hello,
>
> On Wed, May 03, 2023 at 10:34:56AM +0200, Maarten Lankhorst wrote:
>> RFC as I'm looking for comments.
>>
>> For long running compute, it can be beneficial to partition the GPU memory
>> between cgroups, so each cgroup can use its maximum amount of memory without
>> interfering with other scheduled jobs. Done properly, this can alleviate the
>> need for eviction, which might result in a job being terminated if the GPU
>> doesn't support mid-thread preemption or recoverable page faults.
>>
>> This is done by adding a bunch of knobs to cgroup:
>> drm.capacity: Shows maximum capacity of each resource region.
>> drm.max: Display or limit max amount of memory.
>> drm.current: Current amount of memory in use.
>>
>> TTM has not been made cgroup aware yet, so instead of evicting from
>> the current cgroup to stay within the cgroup limits, it simply returns
>> the error -ENOSPC to userspace.
>>
>> I've used Tvrtko's cgroup controller series as a base, but it implemented
>> scheduling weight, not memory accounting, so I only ended up keeping the
>> base patch.
>>
>> Xe is not upstream yet, so the driver specific patch will only apply on
>> https://gitlab.freedesktop.org/drm/xe/kernel
>
> Some high-level feedbacks.
>
> * There have been multiple attempts at this but the track record is kinda
>   poor. People don't seem to agree what should constitute DRM memory and how
>   they should be accounted / controlled.

Thanks for the feedback.

I think for a lot of drivers, what is VRAM might have different meaning, but
the intention is it being accounted in the same way. Most drivers use TTM,
which has a standard way of allocating memory, and a standard way of evicting
VRAM.

This makes it very useful for the usecase which I'm looking at, long running
compute. When you have long running jobs, you don't want them to be
interrupted because a completely unrelated process needs some VRAM, and one of
the compute jobs buffers are being evicted. Some hardware does not support
mid-thread preemption or page fault recovery, this means that when memory is
evicted, the compute job is terminated.

The full problem statement is in drm-compute.rst in the memory accounting
patch.

> * I like Tvrtko's scheduling patchset because it exposes a generic interface
>   which makes sense regardless of hardware details and then each driver can
>   implement the configured control in whatever way they can. However, even
>   for that, there doesn't seem much buy-in from other drivers.

Yeah, that is correct. But it tries to solve a different part of the problem.

> * This proposal seems narrowly scoped trying to solve a specific problem
>   which may not translate to different hardware configurations. Please let
>   me know if I got that wrong, but if that's the case, I think a better and
>   easier approach might be just being a part of the misc controller. That
>   doesn't require much extra code and should be able to provide everything
>   necessary for statically limiting specific resources.

The misc controller is not granular enough. A single computer may have any
number of graphics cards, some of them with multiple regions of vram inside a
single card.

For compute and shared hosting you might want to limit the usage of a single
memory region on a single card, and then limit the same limits for the rest
too, to prevent triggering eviction.

The current version doesn't handle eviction correctly, because I was still
working on it and I wanted to post a RFC. As a result, the case where resource
limit is hit will evict the device's entire memory or get stuck in a loop.
With some changes, the next version will not have this bug. This results in a
few changes to the core code. [1]

In the next version, I will move all the code for handling the resource limit
to TTM's eviction layer, because otherwise it cannot handle the resource limit
correctly.

The effect of moving the code to TTM, is that it will make the code even more
generic for drivers that have vram and use TTM. When using TTM, you only have
to describe your VRAM, update some fields in the TTM manager and (un)register
your device with the cgroup handler on (un)load. It's quite trivial to add
vram accounting to amdgpu and nouveau. [2]

If you want to add a knob for scheduling weight for a process, it makes sense
to also add resource usage as a knob, otherwise the effect of that knob is
very limited. So even for Tvrtko's original proposed usecase, it would make
sense.

Cheers,
~Maarten

[1] Compared to this version:

 static inline int drmcg_try_charge(struct drmcgroup_state **drmcs,
+				   struct drmcgroup_state **limitcs,
 				   struct drmcgroup_device *cgdev,
 				   u32 index, u64 size)

This now returns which cgroup's limit is hit on -EAGAIN.

+bool drmcs_grouped(struct drmcgroup_state *limitcs,
+		   struct drmcgroup_state *testcs);

Tells if testcs is the same as limitcs, or a subgroup
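To make the two helpers in footnote [1] easier to picture, here is a self-contained user-space sketch of a hierarchical charge that reports which group's limit was hit, together with an ancestry test. It is not the RFC's implementation: struct sk_cgroup, sk_try_charge() and sk_grouped() are hypothetical names, there is no locking, and a single memory region is assumed.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified cgroup node tracking one memory region. */
struct sk_cgroup {
	struct sk_cgroup *parent; /* NULL for the root group */
	uint64_t max;             /* limit in bytes */
	uint64_t used;            /* bytes currently charged */
};

/*
 * Charge 'size' bytes to 'cg' and all of its ancestors. If any group on the
 * path would exceed its limit, nothing is charged, *limit is set to that
 * group and -EAGAIN is returned. On success *limit is NULL and 0 is returned.
 */
static int sk_try_charge(struct sk_cgroup *cg, uint64_t size,
			 struct sk_cgroup **limit)
{
	struct sk_cgroup *c;

	/* First pass: check every level before modifying anything. */
	for (c = cg; c; c = c->parent) {
		if (c->used > c->max || size > c->max - c->used) {
			*limit = c;
			return -EAGAIN;
		}
	}

	/* Second pass: commit the charge along the whole path. */
	for (c = cg; c; c = c->parent)
		c->used += size;

	*limit = NULL;
	return 0;
}

/* True if 'test' is 'limit' itself or one of its descendants. */
static bool sk_grouped(const struct sk_cgroup *limit,
		       const struct sk_cgroup *test)
{
	for (; test; test = test->parent)
		if (test == limit)
			return true;
	return false;
}
```

An eviction path could use the returned group to pick victims: only buffers whose owner satisfies sk_grouped(limit, owner) would free space under the limit that was actually hit.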
Re: [PATCH] drm/sched: Check scheduler work queue before calling timeout handling
On 2023-05-10 10:24, vitaly prosyak wrote: > > On 2023-05-10 10:19, Luben Tuikov wrote: >> On 2023-05-10 09:51, vitaly.pros...@amd.com wrote: >>> From: Vitaly Prosyak >>> >>> During an IGT GPU reset test we see again oops despite of >>> commit 0c8c901aaaebc9 (drm/sched: Check scheduler ready before calling >>> timeout handling). >>> >>> It uses ready condition whether to call drm_sched_fault which unwind >>> the TDR leads to GPU reset. >>> However it looks the ready condition is overloaded with other meanings, >>> for example, for the following stack is related GPU reset : >>> >>> 0 gfx_v9_0_cp_gfx_start >>> 1 gfx_v9_0_cp_gfx_resume >>> 2 gfx_v9_0_cp_resume >>> 3 gfx_v9_0_hw_init >>> 4 gfx_v9_0_resume >>> 5 amdgpu_device_ip_resume_phase2 >>> >>> does the following: >>> /* start the ring */ >>> gfx_v9_0_cp_gfx_start(adev); >>> ring->sched.ready = true; >>> >>> The same approach is for other ASICs as well : >>> gfx_v8_0_cp_gfx_resume >>> gfx_v10_0_kiq_resume, etc... >>> >>> As a result, our GPU reset test causes GPU fault which calls >>> unconditionally gfx_v9_0_fault >>> and then drm_sched_fault. However now it depends on whether the interrupt >>> service routine >>> drm_sched_fault is executed after gfx_v9_0_cp_gfx_start is completed which >>> sets the ready >>> field of the scheduler to true even for uninitialized schedulers and >>> causes oops vs >>> no fault or when ISR drm_sched_fault is completed prior >>> gfx_v9_0_cp_gfx_start and >>> NULL pointer dereference does not occur. >>> >>> Use the field timeout_wq to prevent oops for uninitialized schedulers. >>> The field could be initialized by the work queue of resetting the domain. >>> >>> Fixes: 0c8c901aaaebc9 ("drm/sched: Check scheduler ready before calling >>> timeout handling") >>> >>> v1: Corrections to commit message (Luben) >>> Signed-off-by: Vitaly Prosyak >>> Reviewed-by: Luben Tuikov >> I didn't give my RB to this patch so I'm not sure what it is doing here. 
> I removed your rb, also if you do not know what is doing here why do you want > to push this to amd-staging-drm-next and to drm-misc-fixed? I'll add my RB as I push it to those two branches. I'll also add a Link tag and fix the commit SHA for the Fixes tag to one which is found in drm-misc-fixes. Thanks for the patch fixing this long-standing bug. Regards, Luben >> >> The fixes tag should be before the SOB tag, and the v1 line should be >> separated >> by a line before the Git tags. >> >> Since this is a good patch and I want it in both drm-misc-fixed and >> amd-staging-drm-next, >> I'll submit it to drm-misc-fixed with a Link: and RB/SOB tag there and then >> cherry-pick >> that into amd-staging-drm-next. >> >> Don't push it to amd-staging-drm-next. >> >> I'll fix this and submit to amd-staging-drm-next and to drm-misc-fixed with >> a Link: tag. >> >> Regards, >> Luben >> >> >>> --- >>> drivers/gpu/drm/scheduler/sched_main.c | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c >>> b/drivers/gpu/drm/scheduler/sched_main.c >>> index 649fac2e1ccb..670b7997f389 100644 >>> --- a/drivers/gpu/drm/scheduler/sched_main.c >>> +++ b/drivers/gpu/drm/scheduler/sched_main.c >>> @@ -308,7 +308,7 @@ static void drm_sched_start_timeout(struct >>> drm_gpu_scheduler *sched) >>> */ >>> void drm_sched_fault(struct drm_gpu_scheduler *sched) >>> { >>> - if (sched->ready) >>> + if (sched->timeout_wq) >>> mod_delayed_work(sched->timeout_wq, &sched->work_tdr, 0); >>> } >>> EXPORT_SYMBOL(drm_sched_fault);
Re: [PATCH] drm/sched: Check scheduler work queue before calling timeout handling
On 2023-05-10 10:19, Luben Tuikov wrote: > On 2023-05-10 09:51, vitaly.pros...@amd.com wrote: >> From: Vitaly Prosyak >> >> During an IGT GPU reset test we see again oops despite of >> commit 0c8c901aaaebc9 (drm/sched: Check scheduler ready before calling >> timeout handling). >> >> It uses ready condition whether to call drm_sched_fault which unwind >> the TDR leads to GPU reset. >> However it looks the ready condition is overloaded with other meanings, >> for example, for the following stack is related GPU reset : >> >> 0 gfx_v9_0_cp_gfx_start >> 1 gfx_v9_0_cp_gfx_resume >> 2 gfx_v9_0_cp_resume >> 3 gfx_v9_0_hw_init >> 4 gfx_v9_0_resume >> 5 amdgpu_device_ip_resume_phase2 >> >> does the following: >> /* start the ring */ >> gfx_v9_0_cp_gfx_start(adev); >> ring->sched.ready = true; >> >> The same approach is for other ASICs as well : >> gfx_v8_0_cp_gfx_resume >> gfx_v10_0_kiq_resume, etc... >> >> As a result, our GPU reset test causes GPU fault which calls unconditionally >> gfx_v9_0_fault >> and then drm_sched_fault. However now it depends on whether the interrupt >> service routine >> drm_sched_fault is executed after gfx_v9_0_cp_gfx_start is completed which >> sets the ready >> field of the scheduler to true even for uninitialized schedulers and causes >> oops vs >> no fault or when ISR drm_sched_fault is completed prior >> gfx_v9_0_cp_gfx_start and >> NULL pointer dereference does not occur. >> >> Use the field timeout_wq to prevent oops for uninitialized schedulers. >> The field could be initialized by the work queue of resetting the domain. >> >> Fixes: 0c8c901aaaebc9 ("drm/sched: Check scheduler ready before calling >> timeout handling") >> >> v1: Corrections to commit message (Luben) >> Signed-off-by: Vitaly Prosyak >> Reviewed-by: Luben Tuikov > I didn't give my RB to this patch so I'm not sure what it is doing here. I removed your rb, also if you do not know what is doing here why do you want to push this to amd-staging-drm-next and to drm-misc-fixed? 
> > The fixes tag should be before the SOB tag, and the v1 line should be > separated > by a line before the Git tags. > > Since this is a good patch and I want it in both drm-misc-fixed and > amd-staging-drm-next, > I'll submit it to drm-misc-fixed with a Link: and RB/SOB tag there and then > cherry-pick > that into amd-staging-drm-next. > > Don't push it to amd-staging-drm-next. > > I'll fix this and submit to amd-staging-drm-next and to drm-misc-fixed with > a Link: tag. > > Regards, > Luben > > >> --- >> drivers/gpu/drm/scheduler/sched_main.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/drivers/gpu/drm/scheduler/sched_main.c >> b/drivers/gpu/drm/scheduler/sched_main.c >> index 649fac2e1ccb..670b7997f389 100644 >> --- a/drivers/gpu/drm/scheduler/sched_main.c >> +++ b/drivers/gpu/drm/scheduler/sched_main.c >> @@ -308,7 +308,7 @@ static void drm_sched_start_timeout(struct >> drm_gpu_scheduler *sched) >> */ >> void drm_sched_fault(struct drm_gpu_scheduler *sched) >> { >> -if (sched->ready) >> +if (sched->timeout_wq) >> mod_delayed_work(sched->timeout_wq, &sched->work_tdr, 0); >> } >> EXPORT_SYMBOL(drm_sched_fault);
Re: [PATCH] drm/sched: Check scheduler work queue before calling timeout handling
On 2023-05-10 09:51, vitaly.pros...@amd.com wrote: > From: Vitaly Prosyak > > During an IGT GPU reset test we see again oops despite of > commit 0c8c901aaaebc9 (drm/sched: Check scheduler ready before calling > timeout handling). > > It uses ready condition whether to call drm_sched_fault which unwind > the TDR leads to GPU reset. > However it looks the ready condition is overloaded with other meanings, > for example, for the following stack is related GPU reset : > > 0 gfx_v9_0_cp_gfx_start > 1 gfx_v9_0_cp_gfx_resume > 2 gfx_v9_0_cp_resume > 3 gfx_v9_0_hw_init > 4 gfx_v9_0_resume > 5 amdgpu_device_ip_resume_phase2 > > does the following: > /* start the ring */ > gfx_v9_0_cp_gfx_start(adev); > ring->sched.ready = true; > > The same approach is for other ASICs as well : > gfx_v8_0_cp_gfx_resume > gfx_v10_0_kiq_resume, etc... > > As a result, our GPU reset test causes GPU fault which calls unconditionally > gfx_v9_0_fault > and then drm_sched_fault. However now it depends on whether the interrupt > service routine > drm_sched_fault is executed after gfx_v9_0_cp_gfx_start is completed which > sets the ready > field of the scheduler to true even for uninitialized schedulers and causes > oops vs > no fault or when ISR drm_sched_fault is completed prior > gfx_v9_0_cp_gfx_start and > NULL pointer dereference does not occur. > > Use the field timeout_wq to prevent oops for uninitialized schedulers. > The field could be initialized by the work queue of resetting the domain. > > Fixes: 0c8c901aaaebc9 ("drm/sched: Check scheduler ready before calling > timeout handling") > > v1: Corrections to commit message (Luben) > Signed-off-by: Vitaly Prosyak > Reviewed-by: Luben Tuikov I didn't give my RB to this patch so I'm not sure what it is doing here. The fixes tag should be before the SOB tag, and the v1 line should be separated by a line before the Git tags. 
Since this is a good patch and I want it in both drm-misc-fixed and amd-staging-drm-next, I'll submit it to drm-misc-fixed with a Link: and RB/SOB tag there and then cherry-pick that into amd-staging-drm-next. Don't push it to amd-staging-drm-next. I'll fix this and submit to amd-staging-drm-next and to drm-misc-fixed with a Link: tag. Regards, Luben > --- > drivers/gpu/drm/scheduler/sched_main.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c > b/drivers/gpu/drm/scheduler/sched_main.c > index 649fac2e1ccb..670b7997f389 100644 > --- a/drivers/gpu/drm/scheduler/sched_main.c > +++ b/drivers/gpu/drm/scheduler/sched_main.c > @@ -308,7 +308,7 @@ static void drm_sched_start_timeout(struct > drm_gpu_scheduler *sched) > */ > void drm_sched_fault(struct drm_gpu_scheduler *sched) > { > - if (sched->ready) > + if (sched->timeout_wq) > mod_delayed_work(sched->timeout_wq, &sched->work_tdr, 0); > } > EXPORT_SYMBOL(drm_sched_fault);
[PATCH] drm/sched: Check scheduler work queue before calling timeout handling
From: Vitaly Prosyak

During an IGT GPU reset test we again see an oops, despite
commit 0c8c901aaaebc9 (drm/sched: Check scheduler ready before calling
timeout handling).

That commit uses the ready condition to decide whether to call
drm_sched_fault, which unwinds the TDR and leads to a GPU reset.
However, the ready condition appears to be overloaded with other meanings.
For example, the following stack, which is related to GPU reset:

0  gfx_v9_0_cp_gfx_start
1  gfx_v9_0_cp_gfx_resume
2  gfx_v9_0_cp_resume
3  gfx_v9_0_hw_init
4  gfx_v9_0_resume
5  amdgpu_device_ip_resume_phase2

does the following:
	/* start the ring */
	gfx_v9_0_cp_gfx_start(adev);
	ring->sched.ready = true;

The same approach is used for other ASICs as well:
gfx_v8_0_cp_gfx_resume
gfx_v10_0_kiq_resume, etc...

As a result, our GPU reset test causes a GPU fault, which unconditionally
calls gfx_v9_0_fault and then drm_sched_fault. The outcome now depends on
timing. If the interrupt service routine drm_sched_fault is executed after
gfx_v9_0_cp_gfx_start has completed, which sets the ready field of the
scheduler to true even for uninitialized schedulers, an oops occurs. If the
ISR drm_sched_fault completes before gfx_v9_0_cp_gfx_start, there is no
fault and the NULL pointer dereference does not occur.

Use the field timeout_wq to prevent the oops for uninitialized schedulers.
The field could be initialized by the work queue of resetting the domain.

Fixes: 0c8c901aaaebc9 ("drm/sched: Check scheduler ready before calling timeout handling")

v1: Corrections to commit message (Luben)
Signed-off-by: Vitaly Prosyak
---
 drivers/gpu/drm/scheduler/sched_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 649fac2e1ccb..670b7997f389 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -308,7 +308,7 @@ static void drm_sched_start_timeout(struct drm_gpu_scheduler *sched)
  */
 void drm_sched_fault(struct drm_gpu_scheduler *sched)
 {
-	if (sched->ready)
+	if (sched->timeout_wq)
 		mod_delayed_work(sched->timeout_wq, &sched->work_tdr, 0);
 }
 EXPORT_SYMBOL(drm_sched_fault);
-- 
2.25.1
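The difference between the two guards can be shown with a small user-space model. The sketch below is not kernel code: struct sk_sched, struct sk_workqueue and sk_sched_fault() are hypothetical stand-ins, and "queueing work" is reduced to incrementing a counter. It only illustrates why testing the pointer that is about to be dereferenced is safer than testing a flag that other code may set early.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a work queue; counts how often work was queued. */
struct sk_workqueue {
	int queued;
};

/* Hypothetical, reduced scheduler state. */
struct sk_sched {
	bool ready;                      /* may be set by ring start code */
	struct sk_workqueue *timeout_wq; /* NULL until the scheduler is initialized */
};

/*
 * Model of the patched fault handler: act only when the work queue exists.
 * Returns true if timeout work was queued.
 */
static bool sk_sched_fault(struct sk_sched *sched)
{
	if (!sched->timeout_wq)
		return false;

	sched->timeout_wq->queued++;
	return true;
}
```

With the old guard, a scheduler in the state { .ready = true, .timeout_wq = NULL } would pass the check and then dereference a NULL pointer; with the pointer check the same state is simply ignored.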
[PATCH] drm/sched: Check scheduler work queue before calling timeout handling
From: Vitaly Prosyak During an IGT GPU reset test we again see an oops, despite commit 0c8c901aaaebc9 (drm/sched: Check scheduler ready before calling timeout handling). That commit uses the ready condition to decide whether to call drm_sched_fault, which triggers the TDR that leads to a GPU reset. However, it looks like the ready condition is overloaded with other meanings. For example, the following stack, which is related to GPU reset: 0 gfx_v9_0_cp_gfx_start 1 gfx_v9_0_cp_gfx_resume 2 gfx_v9_0_cp_resume 3 gfx_v9_0_hw_init 4 gfx_v9_0_resume 5 amdgpu_device_ip_resume_phase2 does the following: /* start the ring */ gfx_v9_0_cp_gfx_start(adev); ring->sched.ready = true; The same approach is used for other ASICs as well: gfx_v8_0_cp_gfx_resume, gfx_v10_0_kiq_resume, etc. As a result, our GPU reset test causes a GPU fault, which unconditionally calls gfx_v9_0_fault and then drm_sched_fault. The outcome now depends on timing. If the interrupt service routine drm_sched_fault runs after gfx_v9_0_cp_gfx_start has completed, which sets the ready field of the scheduler to true even for uninitialized schedulers, we get an oops. If drm_sched_fault completes before gfx_v9_0_cp_gfx_start, no NULL pointer dereference occurs. Use the field timeout_wq to prevent the oops for uninitialized schedulers. The field could be initialized by the work queue of resetting the domain. 
Fixes: 0c8c901aaaebc9 ("drm/sched: Check scheduler ready before calling timeout handling") v1: Corrections to commit message (Luben) Signed-off-by: Vitaly Prosyak Reviewed-by: Luben Tuikov --- drivers/gpu/drm/scheduler/sched_main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index 649fac2e1ccb..670b7997f389 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -308,7 +308,7 @@ static void drm_sched_start_timeout(struct drm_gpu_scheduler *sched) */ void drm_sched_fault(struct drm_gpu_scheduler *sched) { - if (sched->ready) + if (sched->timeout_wq) mod_delayed_work(sched->timeout_wq, &sched->work_tdr, 0); } EXPORT_SYMBOL(drm_sched_fault); -- 2.25.1
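For readers following the thread, the race described in the commit message can be modeled in userspace. This is only a sketch under stated assumptions: model_sched, model_wq, fault_old and fault_new are hypothetical stand-ins, not the kernel's types or functions, and the -1 return value merely represents the NULL pointer dereference that the real code would hit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real definitions. */
struct model_wq {
	int queued;                  /* counts how often TDR work was queued */
};

struct model_sched {
	bool ready;                  /* set by the ring resume path */
	struct model_wq *timeout_wq; /* non-NULL only for an initialized scheduler */
};

/* Old guard: trusts 'ready', which resume may set on an uninitialized scheduler. */
static int fault_old(struct model_sched *s)
{
	if (!s->ready)
		return 0;            /* nothing queued */
	if (!s->timeout_wq)
		return -1;           /* models the NULL dereference (oops) */
	s->timeout_wq->queued++;
	return 1;
}

/* New guard: checks the pointer that is about to be used. */
static int fault_new(struct model_sched *s)
{
	if (!s->timeout_wq)
		return 0;            /* uninitialized scheduler: skip silently */
	s->timeout_wq->queued++;
	return 1;
}
```

With ready set to true and timeout_wq left NULL, fault_old reports the modeled oops while fault_new simply skips the scheduler, which is the behaviour the one-line patch aims for.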
Re: [PATCH 10/66] drm/amd/display: Do not set drr on pipe commit
On 5/9/23 23:07, Pillai, Aurabindo wrote: > > Sorry - the firmware in the previous message is for DCN32. For Navi2x, please > use the firmware attached here. Same problem (contents of /sys/kernel/debug/dri/0/amdgpu_firmware_info below). Even if it did work with newer FW, the kernel must keep working with older FW, so in that case the new behaviour would need to be guarded by the FW version. VCE feature version: 0, firmware version: 0x UVD feature version: 0, firmware version: 0x MC feature version: 0, firmware version: 0x ME feature version: 44, firmware version: 0x0040 PFP feature version: 44, firmware version: 0x0061 CE feature version: 44, firmware version: 0x0025 RLC feature version: 1, firmware version: 0x0060 RLC SRLC feature version: 0, firmware version: 0x RLC SRLG feature version: 0, firmware version: 0x RLC SRLS feature version: 0, firmware version: 0x RLCP feature version: 0, firmware version: 0x RLCV feature version: 0, firmware version: 0x MEC feature version: 44, firmware version: 0x0071 MEC2 feature version: 44, firmware version: 0x0071 IMU feature version: 0, firmware version: 0x SOS feature version: 0, firmware version: 0x00210c64 ASD feature version: 553648297, firmware version: 0x21a9 TA XGMI feature version: 0x, firmware version: 0x200f TA RAS feature version: 0x, firmware version: 0x1b00013e TA HDCP feature version: 0x, firmware version: 0x1738 TA DTM feature version: 0x, firmware version: 0x1215 TA RAP feature version: 0x, firmware version: 0x07000213 TA SECUREDISPLAY feature version: 0x, firmware version: 0x SMC feature version: 0, program: 0, firmware version: 0x003a5800 (58.88.0) SDMA0 feature version: 52, firmware version: 0x0053 SDMA1 feature version: 52, firmware version: 0x0053 SDMA2 feature version: 52, firmware version: 0x0053 SDMA3 feature version: 52, firmware version: 0x0053 VCN feature version: 0, firmware version: 0x0211b000 DMCU feature version: 0, firmware version: 0x DMCUB feature version: 0, firmware version: 0x0202001c TOC 
feature version: 0, firmware version: 0x MES_KIQ feature version: 0, firmware version: 0x MES feature version: 0, firmware version: 0x VBIOS version: 113-D4300100-051 -- > *From:* Pillai, Aurabindo > *Sent:* Tuesday, May 9, 2023 4:44 PM > *To:* Michel Dänzer ; Zhuo, Qingqing (Lillian) > ; amd-gfx@lists.freedesktop.org > ; Chalmers, Wesley > *Cc:* Wang, Chao-kai (Stylon) ; Li, Sun peng (Leo) > ; Wentland, Harry ; Siqueira, > Rodrigo ; Li, Roman ; Chiu, > Solomon ; Lin, Wayne ; Lakha, > Bhawanpreet ; Gutierrez, Agustin > ; Kotarac, Pavle > *Subject:* Re: [PATCH 10/66] drm/amd/display: Do not set drr on pipe commit > > Hi Michel, > > Could you please try with the attached firmware package if you see the hang > without any reverts? If you do see hangs, please send dmesg with > "drm.debug=0x156 log_buf_len=30M" in the kernel cmdline. > > The attached fw is not released to the public yet, but we will be updating > them in linux-firmware tree next week. Please do back up your existing > firmware, and put the attached files into /usr/lib/firmware/updates/amdgpu and > regenerate your ramdisk. On Ubuntu the following should do: > > sudo update-initramfs -u -k `uname -r` > > -- > > Regards, > Jay >
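Michel's point above, that any new behaviour would need to be guarded by the FW version, could look roughly like the sketch below. Everything in it is an assumption made for illustration: the thread does not say which firmware component matters or which version carries the fix, so HYPOTHETICAL_MIN_FW_VERSION and use_new_drr_path are invented names, and the comparison assumes version numbers increase monotonically.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical threshold. The thread names neither the firmware component
 * nor the first fixed version, so this value is a placeholder only.
 */
#define HYPOTHETICAL_MIN_FW_VERSION 0x02020020u

/* Take the new code path only when the loaded firmware is new enough. */
static bool use_new_drr_path(uint32_t fw_version)
{
	return fw_version >= HYPOTHETICAL_MIN_FW_VERSION;
}
```

Older firmware then keeps the previous behaviour, so the kernel continues to work on systems that have not updated linux-firmware.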
Re: [PATCH] drm/amdgpu: change gfx 11.0.4 external_id range
On Wed, May 10, 2023 at 4:38 AM Yifan Zhang wrote: > > gfx 11.0.4 range starts from 0x80. > > Fixes: 311d52367d0a ("drm/amdgpu: add soc21 common ip block support for GC > 11.0.4") > > Signed-off-by: Yifan Zhang Acked-by: Alex Deucher > --- > drivers/gpu/drm/amd/amdgpu/soc21.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/soc21.c > b/drivers/gpu/drm/amd/amdgpu/soc21.c > index 0f82b8e83acb..6bff936a6e55 100644 > --- a/drivers/gpu/drm/amd/amdgpu/soc21.c > +++ b/drivers/gpu/drm/amd/amdgpu/soc21.c > @@ -711,7 +711,7 @@ static int soc21_common_early_init(void *handle) > AMD_PG_SUPPORT_VCN_DPG | > AMD_PG_SUPPORT_GFX_PG | > AMD_PG_SUPPORT_JPEG; > - adev->external_rev_id = adev->rev_id + 0x1; > + adev->external_rev_id = adev->rev_id + 0x80; > break; > > default: > -- > 2.37.3 >
Fwd: Kernel 5.11 crashes when it boots, it produces black screen.
Hi, I noticed a regression report on Bugzilla ([1]). As many developers don't look at Bugzilla, I decided to forward it by email. See the report for the full thread. Quoting from the report: > Azamat S. Kalimoulline 2021-04-06 15:45:08 UTC > > Same as in https://bugzilla.kernel.org/show_bug.cgi?id=212133, but not > StoneyRidge related. I have same issue in 5.11.9 kernel, but on Renoir > architecture. I have AMD Ryzen 5 PRO 4650U with Radeon Graphics. Same stuck > on loading initial ramdisk. modprobe.blacklist=amdgpu 3` didn't help to boot. > Same stuck. Also iommu=off and acpi=off too. 5.10.26 boots fine. I boot via > efi and I have no option boot without it. Azamat, can you try reproducing this issue on the latest mainline? Anyway, let me add this regression to regzbot: #regzbot introduced: v5.10..v5.11 https://bugzilla.kernel.org/show_bug.cgi?id=212579 #regzbot title: Booting kernel on AMD Ryzen 5 PRO stucks in loading initrd Thanks. [1]: https://bugzilla.kernel.org/show_bug.cgi?id=212579 -- An old man doll... just what I always wanted! - Clara
Re: Fwd: Kernel 5.11 crashes when it boots, it produces black screen.
Hi! On 10.05.23 10:26, Bagas Sanjaya wrote: > > I noticed a regression report on Bugzilla ([1]). As many developers don't > have a look on it, I decided to forward it by email. See the report > for the full thread. > > Quoting from the report: > >> Azamat S. Kalimoulline 2021-04-06 15:45:08 UTC >> >> Same as in https://bugzilla.kernel.org/show_bug.cgi?id=212133, but not >> StoneyRidge related. I have same issue in 5.11.9 kernel, but on Renoir >> architecture. I have AMD Ryzen 5 PRO 4650U with Radeon Graphics. Same stuck >> on loading initial ramdisk. modprobe.blacklist=amdgpu 3` didn't help to >> boot. Same stuck. Also iommu=off and acpi=off too. 5.10.26 boots fine. I >> boot via efi and I have no option boot without it. > > Azamat, can you try reproducing this issue on latest mainline? > > [1]: https://bugzilla.kernel.org/show_bug.cgi?id=212579 Bagas, thx for all your help with regression tracking, much appreciated (side note, as I've been curious for a while already: what is your motivation? Just want to help? But whatever, any help is great!). That being said: I'm not sure if I like what you did in this particular case, as developers might start getting annoyed by regression tracking if we throw too many bug reports of lesser quality at their feet -- and then they might start to ignore us, which we really need to prevent. That's why I would not have forwarded that report at this point in time, mainly for these reasons: * The initial report is quite old already, as it fell through the cracks (not good, but happens; sorry Azamat!). Hence in this case it would definitely be better to *first* ask the reporter to check if the problem still happens with latest mainline (or at least latest stable) before involving the kernel developers, as it might have been fixed already. * This might not be an amdgpu bug at all; in fact the other bug the reporter mentioned was an iommu thing. 
Hence this might be one of those regressions where a bisection is the only way to get down to the problem. Sure, sending a few developers a quick inquiry along the lines of "do you maybe have an idea what's up there" is fine, but that's not what you did in your mail. Your list of recipients is also quite long; that's risky: if you do that too often, they might start ignoring mail from you. Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) -- Everything you wanna know about Linux kernel regression tracking: https://linux-regtracking.leemhuis.info/about/#tldr If I did something stupid, please tell me, as explained on that page.
RE: [PATCH 1/2] drm/amdgpu: fix amdgpu_irq_put call trace in jpeg_v4_0_hw_fini
[AMD Official Use Only - General] Hi Hawking, During modprobe, the interrupt of jpeg/vcn was enabled in amdgpu_fence_driver_hw_init(). If the amdgpu_irq_get function is added in amdgpu_xxx_ras_late_init/xxx_v4_0_late_init, it will enable the instance interrupt twice. My previous modification plan also had this issue. Perhaps we should remove the amdgpu_irq_put function from jpeg/vcn_v4_0_hw_fini. Regards, Horatio -Original Message- From: Zhang, Hawking Sent: Monday, May 8, 2023 8:32 PM To: Zhou1, Tao ; Zhang, Horatio ; amd-gfx@lists.freedesktop.org Cc: Xu, Feifei ; Liu, Leo ; Jiang, Sonny ; Limonciello, Mario ; Liu, HaoPing (Alan) ; Zhang, Horatio Subject: RE: [PATCH 1/2] drm/amdgpu: fix amdgpu_irq_put call trace in jpeg_v4_0_hw_fini [AMD Official Use Only - General] Shall we consider creating amdgpu_vcn_ras_late_init as a common helper for interrupt enablement, like other IP blocks? This also reduces further effort when the RAS feature is introduced in a new version of vcn/jpeg. Regards, Hawking -Original Message- From: Zhou1, Tao Sent: Monday, May 8, 2023 19:06 To: Zhang, Horatio ; amd-gfx@lists.freedesktop.org Cc: Zhang, Hawking ; Xu, Feifei ; Liu, Leo ; Jiang, Sonny ; Limonciello, Mario ; Liu, HaoPing (Alan) ; Zhang, Horatio Subject: RE: [PATCH 1/2] drm/amdgpu: fix amdgpu_irq_put call trace in jpeg_v4_0_hw_fini [AMD Official Use Only - General] The series is: Reviewed-by: Tao Zhou > -Original Message- > From: Horatio Zhang > Sent: Monday, May 8, 2023 6:20 PM > To: amd-gfx@lists.freedesktop.org > Cc: Zhang, Hawking ; Zhou1, Tao > ; Xu, Feifei ; Liu, Leo > ; Jiang, Sonny ; Limonciello, > Mario ; Liu, HaoPing (Alan) > ; Zhang, Horatio > Subject: [PATCH 1/2] drm/amdgpu: fix amdgpu_irq_put call trace in > jpeg_v4_0_hw_fini > > During the suspend, the jpeg_v4_0_hw_fini function will use the > amdgpu_irq_put to disable the irq of jpeg.inst, but it was not enabled > during the resume process, which resulted in a call trace during the GPU > reset process. 
> > [ 50.497562] RIP: 0010:amdgpu_irq_put+0xa4/0xc0 [amdgpu] > [ 50.497619] RSP: 0018:aa2400fcfcb0 EFLAGS: 00010246 > [ 50.497620] RAX: RBX: 0001 RCX: > > [ 50.497621] RDX: RSI: RDI: > > [ 50.497621] RBP: aa2400fcfcd0 R08: R09: > > [ 50.497622] R10: R11: R12: > 99b2105242d8 > [ 50.497622] R13: R14: 99b21050 R15: > 99b21050 > [ 50.497623] FS: () GS:99b51848() > knlGS: > [ 50.497623] CS: 0010 DS: ES: CR0: 80050033 > [ 50.497624] CR2: 7f9d32aa91e8 CR3: 0001ba21 CR4: > 00750ee0 > [ 50.497624] PKRU: 5554 > [ 50.497625] Call Trace: > [ 50.497625] > [ 50.497627] jpeg_v4_0_hw_fini+0x43/0xc0 [amdgpu] > [ 50.497693] jpeg_v4_0_suspend+0x13/0x30 [amdgpu] > [ 50.497751] amdgpu_device_ip_suspend_phase2+0x240/0x470 [amdgpu] > [ 50.497802] amdgpu_device_ip_suspend+0x41/0x80 [amdgpu] > [ 50.497854] amdgpu_device_pre_asic_reset+0xd9/0x4a0 [amdgpu] > [ 50.497905] amdgpu_device_gpu_recover.cold+0x548/0xcf1 [amdgpu] > [ 50.498005] amdgpu_debugfs_reset_work+0x4c/0x80 [amdgpu] > [ 50.498060] process_one_work+0x21f/0x400 > [ 50.498063] worker_thread+0x200/0x3f0 > [ 50.498064] ? process_one_work+0x400/0x400 > [ 50.498065] kthread+0xee/0x120 > [ 50.498067] ? 
kthread_complete_and_exit+0x20/0x20 > [ 50.498068] ret_from_fork+0x22/0x30 > > Fixes: 86e8255f941e ("drm/amdgpu: add JPEG 4.0 RAS poison consumption > handling") > Signed-off-by: Horatio Zhang > --- > drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c | 9 ++++++++- > 1 file changed, 8 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c > b/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c > index 77e1e64aa1d1..b5c14a166063 100644 > --- a/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c > +++ b/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c > @@ -66,6 +66,13 @@ static int jpeg_v4_0_early_init(void *handle) > return 0; > } > > +static int jpeg_v4_0_late_init(void *handle) > +{ > + struct amdgpu_device *adev = (struct amdgpu_device *)handle; > + > + return amdgpu_irq_get(adev, &adev->jpeg.inst->irq, 0); > +} > + > /** > * jpeg_v4_0_sw_init - sw init for JPEG block > * > @@ -696,7 +703,7 @@ static int jpeg_v4_0_process_interrupt(struct > amdgpu_device *adev, static const struct amd_ip_funcs jpeg_v4_0_ip_funcs = { > .name = "jpeg_v4_0", > .early_init = jpeg_v4_0_early_init, > - .late_init = NULL, > + .late_init = jpeg_v4_0_late_init, > .sw_init = jpeg_v4_0_sw_init, > .sw_fini = jpeg_v4_0_sw_fini, > .hw_init = jpeg_v4_0_hw_init, > -- > 2.34.1
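The get/put imbalance discussed in this thread can be illustrated with a small userspace model. This is a sketch only: model_irq, model_irq_get and model_irq_put are hypothetical names, not the amdgpu API, and the model assumes a plain reference count where a put without a matching get is reported as an error (standing in for the warning in the call trace above).

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of an interrupt source with an enable refcount. */
struct model_irq {
	int refcount;
};

/* Take a reference; the first reference would enable the interrupt. */
static int model_irq_get(struct model_irq *irq)
{
	irq->refcount++;
	return 0;
}

/* Drop a reference; dropping one that was never taken is an error. */
static int model_irq_put(struct model_irq *irq)
{
	if (irq->refcount == 0)
		return -ENOENT;      /* models the warning seen on suspend */
	irq->refcount--;
	return 0;
}
```

In this model, a put in hw_fini succeeds only if something took a reference earlier, and taking it in two places (Horatio's concern about enabling the instance interrupt twice) would leave the count at one after a single put.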
Re: [PATCH] drm/sched: Check scheduler work queue before calling timeout handling
On 2023-05-09 17:43, vitaly.pros...@amd.com wrote: > From: Vitaly Prosyak > > During an IGT GPU reset test we see again oops despite of > commit 0c8c901aaaebc9bf8bf189ffc116e678f7a2dc16 > drm/sched: Check scheduler ready before calling timeout handling. You can probably use the more succinct fixes line: 0c8c901aaaebc9 ("drm/sched: Check scheduler ready before calling timeout handling") > > It uses ready condition whether to call drm_sched_fault which unwind > the TDR leads to GPU reset. > However it looks the ready condition is overloaded with other meanings, > for example, for the following stack is related GPU reset : > > 0 gfx_v9_0_cp_gfx_start > 1 gfx_v9_0_cp_gfx_resume > 2 gfx_v9_0_cp_resume > 3 gfx_v9_0_hw_init > 4 gfx_v9_0_resume > 5 amdgpu_device_ip_resume_phase2 > > does the following: > /* start the ring */ > gfx_v9_0_cp_gfx_start(adev); > ring->sched.ready = true; > > The same approach is for other ASICs as well : > gfx_v8_0_cp_gfx_resume > gfx_v10_0_kiq_resume, etc... > > As a result, our GPU reset test causes GPU fault which calls unconditionally > gfx_v9_0_fault > and then drm_sched_fault. However now it depends on whether the interrupt > service routine > drm_sched_fault is executed after gfx_v9_0_cp_gfx_start is completed which > sets the ready > field of the scheduler to true even for not initialized schedulers and > causes oops vs "not initialized" --> "uninitialized" reads better. > no fault or when ISR drm_sched_fault is completed prior > gfx_v9_0_cp_gfx_start and > NULL pointer dereference does not occur. > > Use the field timeout_wq to prevent oops for uninitialized schedulers. > The field could be initialized by the work queue of resetting the domain. > > Signed-off-by: Vitaly Prosyak Add, a fixes tag, Fixes: 0c8c901aaaebc9 ("drm/sched: Check scheduler ready before calling timeout handling") Before the SOB tag. 
> --- > drivers/gpu/drm/scheduler/sched_main.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c > b/drivers/gpu/drm/scheduler/sched_main.c > index 649fac2e1ccb..670b7997f389 100644 > --- a/drivers/gpu/drm/scheduler/sched_main.c > +++ b/drivers/gpu/drm/scheduler/sched_main.c > @@ -308,7 +308,7 @@ static void drm_sched_start_timeout(struct > drm_gpu_scheduler *sched) > */ > void drm_sched_fault(struct drm_gpu_scheduler *sched) > { > - if (sched->ready) > + if (sched->timeout_wq) > mod_delayed_work(sched->timeout_wq, >work_tdr, 0); > } > EXPORT_SYMBOL(drm_sched_fault); Yes, this does indeed seem more correct. Apply the comments above and repost the patch to amd-gfx and dri-devel and I'll push it to drm-misc-fixes and amd-staging-drm-next. -- Regards, Luben
[PATCH] drm/amdgpu: change gfx 11.0.4 external_id range
gfx 11.0.4 range starts from 0x80. Fixes: 311d52367d0a ("drm/amdgpu: add soc21 common ip block support for GC 11.0.4") Signed-off-by: Yifan Zhang --- drivers/gpu/drm/amd/amdgpu/soc21.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/soc21.c b/drivers/gpu/drm/amd/amdgpu/soc21.c index 0f82b8e83acb..6bff936a6e55 100644 --- a/drivers/gpu/drm/amd/amdgpu/soc21.c +++ b/drivers/gpu/drm/amd/amdgpu/soc21.c @@ -711,7 +711,7 @@ static int soc21_common_early_init(void *handle) AMD_PG_SUPPORT_VCN_DPG | AMD_PG_SUPPORT_GFX_PG | AMD_PG_SUPPORT_JPEG; - adev->external_rev_id = adev->rev_id + 0x1; + adev->external_rev_id = adev->rev_id + 0x80; break; default: -- 2.37.3