Re: [PATCH 03/13] drm: add Makefile support for drm buddy
On 20/10/21 1:51 pm, Thomas Zimmermann wrote: > Hi > > Am 20.10.21 um 00:53 schrieb Arunpravin: >> - Include drm buddy to DRM root Makefile >> - Add drm buddy init and exit function calls >>to drm core > > Is there a hard requirement to have this code in the core? > > IMHO there's already too much code in the DRM core that should rather go > into helpers. The allocator should live in it's own module and driver > should init it if needed. > Hi Thomas, I will check on this @Daniel, Could you please share your opinion Regards, Arun >> >> Signed-off-by: Arunpravin >> --- >> drivers/gpu/drm/Makefile | 2 +- >> drivers/gpu/drm/drm_drv.c | 3 +++ >> 2 files changed, 4 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile >> index 0dff40bb863c..dc61e91a3154 100644 >> --- a/drivers/gpu/drm/Makefile >> +++ b/drivers/gpu/drm/Makefile >> @@ -18,7 +18,7 @@ drm-y := drm_aperture.o drm_auth.o drm_cache.o \ >> drm_dumb_buffers.o drm_mode_config.o drm_vblank.o \ >> drm_syncobj.o drm_lease.o drm_writeback.o drm_client.o \ >> drm_client_modeset.o drm_atomic_uapi.o drm_hdcp.o \ >> -drm_managed.o drm_vblank_work.o >> +drm_managed.o drm_vblank_work.o drm_buddy.o >> >> drm-$(CONFIG_DRM_LEGACY) += drm_agpsupport.o drm_bufs.o drm_context.o >> drm_dma.o \ >> drm_legacy_misc.o drm_lock.o drm_memory.o >> drm_scatter.o \ >> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c >> index 7a5097467ba5..6707eec21bef 100644 >> --- a/drivers/gpu/drm/drm_drv.c >> +++ b/drivers/gpu/drm/drm_drv.c >> @@ -43,6 +43,7 @@ >> #include >> #include >> #include >> +#include > > In alphabetical order please. Here and everywhere else. > > Best regards > Thomas > >> >> #include "drm_crtc_internal.h" >> #include "drm_internal.h" >> @@ -1034,6 +1035,7 @@ static void drm_core_exit(void) >> drm_sysfs_destroy(); >> idr_destroy(_minors_idr); >> drm_connector_ida_destroy(); >> +drm_buddy_module_exit(); >> } >> >> static int __init drm_core_init(void) >> @@ -1043,6 +1045,7 @@ static int __init drm_core_init(void) >> drm_connector_ida_init(); >> idr_init(_minors_idr); >> drm_memcpy_init_early(); >> +drm_buddy_module_init(); >> >> ret = drm_sysfs_init(); >> if (ret < 0) { >> >
Re: [PATCH] drm/amd/amdgpu: fix potential bad job hw_fence underflow
What do you mean by underflow in this case ? You mean use after free because of extra dma_fence_put() ? On 2021-10-22 4:14 a.m., JingWen Chen wrote: ping On 2021/10/22 AM11:33, Jingwen Chen wrote: [Why] In advance tdr mode, the real bad job will be resubmitted twice, while in drm_sched_resubmit_jobs_ext, there's a dma_fence_put, so the bad job is put one more time than other jobs. [How] Adding dma_fence_get before resbumit job in amdgpu_device_recheck_guilty_jobs and put the fence for normal jobs Signed-off-by: Jingwen Chen --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 41ce86244144..975f069f6fe8 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -4841,6 +4841,9 @@ static void amdgpu_device_recheck_guilty_jobs( /* clear job's guilty and depend the folowing step to decide the real one */ drm_sched_reset_karma(s_job); + /* for the real bad job, it will be resubmitted twice, adding a dma_fence_get +* to make sure fence is balanced */ But that put in drm_sched_resubmit_jobs_ext is for the previous parent fence. fence = sched->ops->run_job(s_job); returns a new HW fence and the put drops the refcount on the old one. Andrey + dma_fence_get(s_job->s_fence->parent); drm_sched_resubmit_jobs_ext(>sched, 1); ret = dma_fence_wait_timeout(s_job->s_fence->parent, false, ring->sched.timeout); @@ -4876,6 +4879,7 @@ static void amdgpu_device_recheck_guilty_jobs( /* got the hw fence, signal finished fence */ atomic_dec(ring->sched.score); + dma_fence_put(s_job->s_fence->parent); dma_fence_get(_job->s_fence->finished); dma_fence_signal(_job->s_fence->finished); dma_fence_put(_job->s_fence->finished);
[pull] amdgpu, amdkfd drm-next-5.16
Hi Dave, Daniel, Fixes for 5.16. The following changes since commit 48737ac4d70faffeb516e2a9847e24f9a7eee05f: drm/amdgpu/psp: add some missing cases to psp_check_pmfw_centralized_cstate_management (2021-10-13 22:51:41 -0400) are available in the Git repository at: https://gitlab.freedesktop.org/agd5f/linux.git tags/amd-drm-next-5.16-2021-10-22 for you to fetch changes up to 41ad36623fabe7d02c9f89aff077dd4c8ba5d602: amd/display: remove ChromeOS workaround (2021-10-22 10:11:52 -0400) amd-drm-next-5.16-2021-10-22: amdgpu: - PSP fix for resume - XGMI fixes - Interrupt fix in device tear down - Renoir USB-C DP alt mode fix for resume - DP 2.0 fixes - Yellow Carp display fixes - Misc display fixes - RAS fixes - IP Discovery enumeration fixes - VGH fixes - SR-IOV fixes - Revert ChromeOS workaround in display code - Cyan Skillfish fixes amdkfd: - Fix error handling in gpu memory allocation - Fix build warnings with some configs - SVM fixes Aaron Liu (1): drm/amdgpu: support B0 external revision id for yellow carp Agustin Gutierrez (2): Revert "drm/amd/display: Fix error in dmesg at boot" Revert "drm/amd/display: Add helper for blanking all dp displays" Alex Deucher (15): drm/amdgpu/pm: properly handle sclk for profiling modes on vangogh drm/amdgpu/gfx10: fix typo in gfx_v10_0_update_gfx_clock_gating() drm/amdkfd: protect hawaii_device_info with CONFIG_DRM_AMDGPU_CIK drm/amdkfd: protect raven_device_info with KFD_SUPPORT_IOMMU_V2 drm/amdgpu: drop soc15_set_ip_blocks() drm/amdgpu: drop nv_set_ip_blocks() drm/amdgpu: drop navi reg init functions drm/amdgpu/vcn3.0: handle harvesting in firmware setup drm/amdgpu: Consolidate VCN firmware setup code drm/amdgpu/vcn2.0: remove intermediate variable drm/amdgpu/vcn3.0: remove intermediate variable drm/amdgpu: Workaround harvesting info for some navy flounder boards drm/amdgpu/swsmu: handle VCN harvesting for VCN SMU setup drm/amdgpu/smu11.0: add missing IP version check drm/amdgpu/nbio7.4: use original HDP_FLUSH bits Anthony Koo (2): drm/amd/display: Change initializer to single brace drm/amd/display: [FW Promotion] Release 0.0.88 Aric Cyr (2): drm/amd/display: Validate plane rects before use drm/amd/display: 3.2.157 Eric Yang (1): drm/amd/display: increase Z9 latency to workaround underflow in Z9 Guchun Chen (1): drm/amdgpu/discovery: parse hw_id_name for SDMA instance 2 and 3 Hansen (1): drm/amd/display: Fix DP2 SE and LE SYMCLK selection for B0 PHY Huang Rui (1): drm/amdgpu: remove grbm cam index/data operations for gfx v10 Jake Wang (6): drm/amd/display: Disable dpp root clock when not being used drm/amd/display: Disable dsc root clock when not being used drm/amd/display: Disable dpstreamclk, symclk32_se, and symclk32_le drm/amd/display: Removed z10 save after dsc disable drm/amd/display: Moved dccg init to after bios golden init drm/amd/display: Disable hdmistream and hdmichar clocks Jimmy Kizito (2): drm/amd/display: Clear encoder assignment for copied streams drm/amd/display: Do not skip link training on DP quick hot plug Jingwen Chen (1): drm/amd/amdgpu: add dummy_page_addr to sriov msg Jonathan Kim (1): drm/amdkfd: map gpu hive id to xgmi connected cpu Josip Pavic (1): drm/amd/display: do not compare integers of different widths Kent Russell (1): drm/amdgpu: Clarify error when hitting bad page threshold Lai, Derek (1): drm/amd/display: Removed power down on boot from DCN31 Lang Yu (1): drm/amdkfd: Fix an inappropriate error handling in allloc memory of gpu Lijo Lazar (1): drm/amd/pm: Disable fan control if not supported Michael Strauss (1): drm/amd/display: Clean Up VPG Low Mem Power Mikita Lipski (1): drm/amd/display: Add missing PSR state Nevenko Stupar (1): drm/amd/display: Add bios parser support for latest firmware_info Nicholas Kazlauskas (2): drm/amd/display: Fix prefetch bandwidth calculation for DCN3.1 drm/amd/display: Require immediate flip support for DCN3.1 planes Nikola Cornij (2): drm/amd/display: Limit display scaling to up to true 4k for DCN 3.1 drm/amd/display: Increase watermark latencies for DCN3.1 Philip Yang (2): drm/amdkfd: clarify the origin of cpages returned by migration functions drm/amdkfd: debug message to count successfully migrated pages Qing Wang (1): drm/amdgpu: replace snprintf in show functions with sysfs_emit Roman Li (1): drm/amd/display: Fully switch to dmub for all dcn21 asics Simon Ser (1): amd/display: remove ChromeOS workaround Tao Zhou (3): drm/amdgpu: load PSP RL in resume path drm/amdgpu: centralize checking for RAS TA
Re: [PATCH] drm/amdkfd: restore userptr ignore bad address error
Am 2021-10-22 um 1:06 p.m. schrieb Philip Yang: > The userptr can be unmapped by application and still registered to > driver, restore userptr work return user pages will get -EFAULT bad > address error. Pretend this error as succeed. GPU access this userptr > will have VM fault later, it is better than application soft hangs with > stalled user mode queues. > > Signed-off-by: Philip Yang Reviewed-by: Felix Kuehling > --- > .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 27 --- > drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 3 +++ > 2 files changed, 20 insertions(+), 10 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c > index cdf46bd0d8d5..6f01c6145a87 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c > @@ -2041,19 +2041,26 @@ static int update_invalid_user_pages(struct > amdkfd_process_info *process_info, > /* Get updated user pages */ > ret = amdgpu_ttm_tt_get_user_pages(bo, bo->tbo.ttm->pages); > if (ret) { > - pr_debug("%s: Failed to get user pages: %d\n", > - __func__, ret); > + pr_debug("Failed %d to get user pages\n", ret); > + > + /* Return -EFAULT bad address error as success. It will > + * fail later with a VM fault if the GPU tries to access > + * it. Better than hanging indefinitely with stalled > + * user mode queues. > + * > + * Return other error -EBUSY or -ENOMEM to retry restore > + */ > + if (ret != -EFAULT) > + return ret; > + } else { > > - /* Return error -EBUSY or -ENOMEM, retry restore */ > - return ret; > + /* > + * FIXME: Cannot ignore the return code, must hold > + * notifier_lock > + */ > + amdgpu_ttm_tt_get_user_pages_done(bo->tbo.ttm); > } > > - /* > - * FIXME: Cannot ignore the return code, must hold > - * notifier_lock > - */ > - amdgpu_ttm_tt_get_user_pages_done(bo->tbo.ttm); > - > /* Mark the BO as valid unless it was invalidated >* again concurrently. >*/ > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c > index d784f8d3a834..ae6694f2c73d 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c > @@ -693,6 +693,9 @@ int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo *bo, > struct page **pages) > r = amdgpu_hmm_range_get_pages(>notifier, mm, pages, start, > ttm->num_pages, >range, readonly, > false, NULL); > + if (r) > + pr_debug("failed %d to get user pages 0x%llx\n", r, start); > + > out_putmm: > mmput(mm); >
[PATCH] drm/amdkfd: restore userptr ignore bad address error
The userptr can be unmapped by application and still registered to driver, restore userptr work return user pages will get -EFAULT bad address error. Pretend this error as succeed. GPU access this userptr will have VM fault later, it is better than application soft hangs with stalled user mode queues. Signed-off-by: Philip Yang --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 27 --- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 3 +++ 2 files changed, 20 insertions(+), 10 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index cdf46bd0d8d5..6f01c6145a87 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @@ -2041,19 +2041,26 @@ static int update_invalid_user_pages(struct amdkfd_process_info *process_info, /* Get updated user pages */ ret = amdgpu_ttm_tt_get_user_pages(bo, bo->tbo.ttm->pages); if (ret) { - pr_debug("%s: Failed to get user pages: %d\n", - __func__, ret); + pr_debug("Failed %d to get user pages\n", ret); + + /* Return -EFAULT bad address error as success. It will +* fail later with a VM fault if the GPU tries to access +* it. Better than hanging indefinitely with stalled +* user mode queues. +* +* Return other error -EBUSY or -ENOMEM to retry restore +*/ + if (ret != -EFAULT) + return ret; + } else { - /* Return error -EBUSY or -ENOMEM, retry restore */ - return ret; + /* +* FIXME: Cannot ignore the return code, must hold +* notifier_lock +*/ + amdgpu_ttm_tt_get_user_pages_done(bo->tbo.ttm); } - /* -* FIXME: Cannot ignore the return code, must hold -* notifier_lock -*/ - amdgpu_ttm_tt_get_user_pages_done(bo->tbo.ttm); - /* Mark the BO as valid unless it was invalidated * again concurrently. */ diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index d784f8d3a834..ae6694f2c73d 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -693,6 +693,9 @@ int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo *bo, struct page **pages) r = amdgpu_hmm_range_get_pages(>notifier, mm, pages, start, ttm->num_pages, >range, readonly, false, NULL); + if (r) + pr_debug("failed %d to get user pages 0x%llx\n", r, start); + out_putmm: mmput(mm); -- 2.17.1
Re: FW: [PATCH 1/3] drm/amdgpu: fix a potential memory leak in amdgpu_device_fini_sw()
On 2021-10-21 7:33 p.m., Yu, Lang wrote: [AMD Official Use Only] -Original Message- From: Grodzovsky, Andrey Sent: Thursday, October 21, 2021 11:18 PM To: Yu, Lang ; amd-gfx@lists.freedesktop.org Subject: Re: FW: [PATCH 1/3] drm/amdgpu: fix a potential memory leak in amdgpu_device_fini_sw() On 2021-10-21 3:19 a.m., Yu, Lang wrote: [AMD Official Use Only] -Original Message- From: Yu, Lang Sent: Thursday, October 21, 2021 3:18 PM To: Grodzovsky, Andrey Cc: Deucher, Alexander ; Koenig, Christian ; Huang, Ray ; Yu, Lang Subject: [PATCH 1/3] drm/amdgpu: fix a potential memory leak in amdgpu_device_fini_sw() amdgpu_fence_driver_sw_fini() should be executed before amdgpu_device_ip_fini(), otherwise fence driver resource won't be properly freed as adev->rings have been tore down. Cam you clarify more where exactly the memleak happens ? Andrey See amdgpu_fence_driver_sw_fini(), ring->fence_drv.fences will only be freed when adev->rings[i] is not NULL. void amdgpu_fence_driver_sw_fini(struct amdgpu_device *adev) { unsigned int i, j; for (i = 0; i < AMDGPU_MAX_RINGS; i++) { struct amdgpu_ring *ring = adev->rings[i]; if (!ring || !ring->fence_drv.initialized) continue; if (!ring->no_scheduler) drm_sched_fini(>sched); for (j = 0; j <= ring->fence_drv.num_fences_mask; ++j) dma_fence_put(ring->fence_drv.fences[j]); kfree(ring->fence_drv.fences); ring->fence_drv.fences = NULL; ring->fence_drv.initialized = false; } } If amdgpu_device_ip_fini() is executed before amdgpu_fence_driver_sw_fini(), amdgpu_device_ip_fini() will call gfx_vX_0_sw_fini() then call amdgpu_ring_fini() and set adev->rings[i] to NULL. Nothing will be freed in amdgpu_fence_driver_sw_fini(). ring->fence_drv.fences memory leak happened! void amdgpu_ring_fini(struct amdgpu_ring *ring) { .. ring->adev->rings[ring->idx] = NULL; } Regards, Lang Got it, Looks good to me. Reviewed-by: Andrey Grodzovsky Andrey Fixes: 72c8c97b1522 ("drm/amdgpu: Split amdgpu_device_fini into early and late") Signed-off-by: Lang Yu --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 41ce86244144..5654c4790773 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -3843,8 +3843,8 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev) void amdgpu_device_fini_sw(struct amdgpu_device *adev) { - amdgpu_device_ip_fini(adev); amdgpu_fence_driver_sw_fini(adev); + amdgpu_device_ip_fini(adev); release_firmware(adev->firmware.gpu_info_fw); adev->firmware.gpu_info_fw = NULL; adev->accel_working = false; -- 2.25.1
Re: [PATCH] amd/display: remove ChromeOS workaround
On Fri, Oct 22, 2021 at 11:02 AM Simon Ser wrote: > > On Friday, October 22nd, 2021 at 15:58, Alex Deucher > wrote: > > > > Agreed that this patch is good but we'll need to also revert the > > > is_chromeos w/a. > > > > I've reverted that and applied this one. Thanks! > > Ah, didn't realize I needed to revert that one too. Thank you! Well it was only applied to our amd-staging-drm-next branch, not to upstream. Alex
Re: [PATCH] amd/display: remove ChromeOS workaround
On Friday, October 22nd, 2021 at 15:58, Alex Deucher wrote: > > Agreed that this patch is good but we'll need to also revert the > > is_chromeos w/a. > > I've reverted that and applied this one. Thanks! Ah, didn't realize I needed to revert that one too. Thank you!
Re: [PATCH] amd/display: remove ChromeOS workaround
On Thu, Oct 21, 2021 at 2:19 PM Harry Wentland wrote: > > > > On 2021-10-21 13:55, Rodrigo Siqueira Jordao wrote: > > Hi Simon, > > > > I tested this patch and it lgtm. I also agree to revert it. > > > > Btw, did you send the revert patch for "amd/display: only require overlay > > plane to cover whole CRTC on ChromeOS"? I think we need to revert it as > > well. > > > > Agreed that this patch is good but we'll need to also revert the is_chromeos > w/a. I've reverted that and applied this one. Thanks! Alex > > This patch is > Reviewed-by: Harry Wentland > > Harry > > > Sean, Mark > > > > For ChromeOS, we should ignore this patch. Do we need to take any action to > > avoid landing this patch on ChromeOS tree? > > > > Thanks > > Siqueira > > > > On 2021-10-14 11:35 a.m., Simon Ser wrote: > >> This reverts commits ddab8bd788f5 ("drm/amd/display: Fix two cursor > >> duplication > >> when using overlay") and e7d9560aeae5 ("Revert "drm/amd/display: Fix > >> overlay > >> validation by considering cursors""). > >> > >> tl;dr ChromeOS uses the atomic interface for everything except the cursor. > >> This > >> is incorrect and forces amdgpu to disable some hardware features. Let's > >> revert > >> the ChromeOS-specific workaround in mainline and allow the Chrome team to > >> keep > >> it internally in their own tree. > >> > >> See [1] for more details. This patch is an alternative to [2], which added > >> ChromeOS detection. > >> > >> [1]: > >> https://lore.kernel.org/amd-gfx/JIQ_93_cHcshiIDsrMU1huBzx9P9LVQxucx8hQArpQu7Wk5DrCl_vTXj_Q20m_L-8C8A5dSpNcSJ8ehfcCrsQpfB5QG_Spn14EYkH9chtg0=@emersion.fr/>> > >> [2]: > >> https://lore.kernel.org/amd-gfx/20211011151609.452132-1-cont...@emersion.fr/>> > >> Signed-off-by: Simon Ser > >> Cc: Alex Deucher > >> Cc: Harry Wentland > >> Cc: Nicholas Kazlauskas > >> Cc: Bas Nieuwenhuizen > >> Cc: Rodrigo Siqueira > >> Cc: Sean Paul > >> Fixes: ddab8bd788f5 ("drm/amd/display: Fix two cursor duplication when > >> using overlay") > >> Fixes: e7d9560aeae5 ("Revert "drm/amd/display: Fix overlay validation by > >> considering cursors"") > >> --- > >> .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 51 --- > >> 1 file changed, 51 deletions(-) > >> > >> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > >> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > >> index 20065a145851..014c5a9fe461 100644 > >> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > >> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > >> @@ -10628,53 +10628,6 @@ static int add_affected_mst_dsc_crtcs(struct > >> drm_atomic_state *state, struct drm > >> } > >> #endif > >> -static int validate_overlay(struct drm_atomic_state *state) > >> -{ > >> -int i; > >> -struct drm_plane *plane; > >> -struct drm_plane_state *new_plane_state; > >> -struct drm_plane_state *primary_state, *overlay_state = NULL; > >> - > >> -/* Check if primary plane is contained inside overlay */ > >> -for_each_new_plane_in_state_reverse(state, plane, new_plane_state, i) > >> { > >> -if (plane->type == DRM_PLANE_TYPE_OVERLAY) { > >> -if (drm_atomic_plane_disabling(plane->state, new_plane_state)) > >> -return 0; > >> - > >> -overlay_state = new_plane_state; > >> -continue; > >> -} > >> -} > >> - > >> -/* check if we're making changes to the overlay plane */ > >> -if (!overlay_state) > >> -return 0; > >> - > >> -/* check if overlay plane is enabled */ > >> -if (!overlay_state->crtc) > >> -return 0; > >> - > >> -/* find the primary plane for the CRTC that the overlay is enabled on > >> */ > >> -primary_state = drm_atomic_get_plane_state(state, > >> overlay_state->crtc->primary); > >> -if (IS_ERR(primary_state)) > >> -return PTR_ERR(primary_state); > >> - > >> -/* check if primary plane is enabled */ > >> -if (!primary_state->crtc) > >> -return 0; > >> - > >> -/* Perform the bounds check to ensure the overlay plane covers the > >> primary */ > >> -if (primary_state->crtc_x < overlay_state->crtc_x || > >> -primary_state->crtc_y < overlay_state->crtc_y || > >> -primary_state->crtc_x + primary_state->crtc_w > > >> overlay_state->crtc_x + overlay_state->crtc_w || > >> -primary_state->crtc_y + primary_state->crtc_h > > >> overlay_state->crtc_y + overlay_state->crtc_h) { > >> -DRM_DEBUG_ATOMIC("Overlay plane is enabled with hardware cursor > >> but does not fully cover primary plane\n"); > >> -return -EINVAL; > >> -} > >> - > >> -return 0; > >> -} > >> - > >> /** > >>* amdgpu_dm_atomic_check() - Atomic check implementation for AMDgpu DM. > >>* @dev: The DRM device > >> @@ -10856,10 +10809,6 @@ static int amdgpu_dm_atomic_check(struct > >> drm_device *dev, > >> goto fail; > >> } > >> -ret = validate_overlay(state); > >> -
Re: [PATCH] drm/amd/pm: Disable fan control if not supported
On Fri, Oct 22, 2021 at 3:44 AM Lijo Lazar wrote: > > On arcturus, not all platforms use PMFW based fan control. On such > ASICs fan control by PMFW will be disabled in PPTable. Disable hwmon > knobs for fan control also as it is not possible to report or control > fan speed on such platforms through driver. > > Signed-off-by: Lijo Lazar Reviewed-by: Alex Deucher > --- > drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c | 14 ++ > 1 file changed, 14 insertions(+) > > diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c > b/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c > index 082f01893f3d..fd1d30a93db5 100644 > --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c > +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c > @@ -436,6 +436,19 @@ static void arcturus_check_bxco_support(struct > smu_context *smu) > } > } > > +static void arcturus_check_fan_support(struct smu_context *smu) > +{ > + struct smu_table_context *table_context = >smu_table; > + PPTable_t *pptable = table_context->driver_pptable; > + > + /* No sort of fan control possible if PPTable has it disabled */ > + smu->adev->pm.no_fan = > + !(pptable->FeaturesToRun[0] & FEATURE_FAN_CONTROL_MASK); > + if (smu->adev->pm.no_fan) > + dev_info_once(smu->adev->dev, > + "PMFW based fan control disabled"); > +} > + > static int arcturus_check_powerplay_table(struct smu_context *smu) > { > struct smu_table_context *table_context = >smu_table; > @@ -443,6 +456,7 @@ static int arcturus_check_powerplay_table(struct > smu_context *smu) > table_context->power_play_table; > > arcturus_check_bxco_support(smu); > + arcturus_check_fan_support(smu); > > table_context->thermal_controller_type = > powerplay_table->thermal_controller_type; > -- > 2.17.1 >
Re: [PATCH v4 1/3] drm/amdgpu: do not pass ttm_resource_manager to gtt_mgr
On 10/22/2021 1:46 PM, Christian König wrote: Reviewed-by: Christian König for the entire series. But please keep in mind that this here needs extensive testing. I will do multiple round of test again on gfx7-10 cards and gfx9 APU that I have but I don't have gfx6 and sriov cards to test. Regards, Nirmoy Thanks, Christian. Am 22.10.21 um 12:54 schrieb Nirmoy Das: Do not allow exported amdgpu_gtt_mgr_*() to accept any ttm_resource_manager pointer. Also there is no need to force other module to call a ttm function just to eventually call gtt_mgr functions. v4: remove unused adev. v3: upcast mgr from ttm resopurce manager instead of getting it from adev. v2: pass adev's gtt_mgr instead of adev. Signed-off-by: Nirmoy Das --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++-- drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c | 22 ++--- drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 4 ++-- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 4 ++-- 4 files changed, 16 insertions(+), 18 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 41ce86244144..2b53d86aebac 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -4287,7 +4287,7 @@ static int amdgpu_device_reset_sriov(struct amdgpu_device *adev, amdgpu_virt_init_data_exchange(adev); /* we need recover gart prior to run SMC/CP/SDMA resume */ - amdgpu_gtt_mgr_recover(ttm_manager_type(>mman.bdev, TTM_PL_TT)); + amdgpu_gtt_mgr_recover(>mman.gtt_mgr); r = amdgpu_device_fw_loading(adev); if (r) @@ -4604,7 +4604,7 @@ int amdgpu_do_asic_reset(struct list_head *device_list_handle, amdgpu_inc_vram_lost(tmp_adev); } - r = amdgpu_gtt_mgr_recover(ttm_manager_type(_adev->mman.bdev, TTM_PL_TT)); + r = amdgpu_gtt_mgr_recover(_adev->mman.gtt_mgr); if (r) goto out; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c index c18f16b3be9c..9151950e0cc3 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c @@ -77,10 +77,8 @@ static ssize_t amdgpu_mem_info_gtt_used_show(struct device *dev, { struct drm_device *ddev = dev_get_drvdata(dev); struct amdgpu_device *adev = drm_to_adev(ddev); - struct ttm_resource_manager *man; - man = ttm_manager_type(>mman.bdev, TTM_PL_TT); - return sysfs_emit(buf, "%llu\n", amdgpu_gtt_mgr_usage(man)); + return sysfs_emit(buf, "%llu\n", amdgpu_gtt_mgr_usage(>mman.gtt_mgr)); } static DEVICE_ATTR(mem_info_gtt_total, S_IRUGO, @@ -206,14 +204,15 @@ static void amdgpu_gtt_mgr_del(struct ttm_resource_manager *man, /** * amdgpu_gtt_mgr_usage - return usage of GTT domain * - * @man: TTM memory type manager + * @mgr: amdgpu_gtt_mgr pointer * * Return how many bytes are used in the GTT domain */ -uint64_t amdgpu_gtt_mgr_usage(struct ttm_resource_manager *man) +uint64_t amdgpu_gtt_mgr_usage(struct amdgpu_gtt_mgr *mgr) { - struct amdgpu_gtt_mgr *mgr = to_gtt_mgr(man); - s64 result = man->size - atomic64_read(>available); + s64 result; + + result = mgr->manager.size - atomic64_read(>available); return (result > 0 ? result : 0) * PAGE_SIZE; } @@ -221,16 +220,15 @@ uint64_t amdgpu_gtt_mgr_usage(struct ttm_resource_manager *man) /** * amdgpu_gtt_mgr_recover - re-init gart * - * @man: TTM memory type manager + * @mgr: amdgpu_gtt_mgr pointer * * Re-init the gart for each known BO in the GTT. */ -int amdgpu_gtt_mgr_recover(struct ttm_resource_manager *man) +int amdgpu_gtt_mgr_recover(struct amdgpu_gtt_mgr *mgr) { - struct amdgpu_gtt_mgr *mgr = to_gtt_mgr(man); - struct amdgpu_device *adev; struct amdgpu_gtt_node *node; struct drm_mm_node *mm_node; + struct amdgpu_device *adev; int r = 0; adev = container_of(mgr, typeof(*adev), mman.gtt_mgr); @@ -267,7 +265,7 @@ static void amdgpu_gtt_mgr_debug(struct ttm_resource_manager *man, drm_printf(printer, "man size:%llu pages, gtt available:%lld pages, usage:%lluMB\n", man->size, (u64)atomic64_read(>available), - amdgpu_gtt_mgr_usage(man) >> 20); + amdgpu_gtt_mgr_usage(mgr) >> 20); } static const struct ttm_resource_manager_func amdgpu_gtt_mgr_func = { diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c index d2955ea4a62b..603ce32db5c5 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c @@ -678,7 +678,7 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file *filp) ui64 = amdgpu_vram_mgr_vis_usage(ttm_manager_type(>mman.bdev, TTM_PL_VRAM)); return copy_to_user(out, , min(size, 8u)) ? -EFAULT : 0; case
Re: [Intel-gfx] [PATCH v3 13/13] drm/i915: replace drm_detect_hdmi_monitor() with drm_display_info.is_hdmi
On Fri, Oct 22, 2021 at 03:01:52PM +0300, Ville Syrjälä wrote: > On Fri, Oct 22, 2021 at 12:25:33PM +0200, Claudio Suarez wrote: > > On Thu, Oct 21, 2021 at 04:49:59PM +0300, Ville Syrjälä wrote: > > > On Wed, Oct 20, 2021 at 12:51:21AM +0200, Claudio Suarez wrote: > > > > drm_get_edid() internally calls to drm_connector_update_edid_property() > > > > and then drm_add_display_info(), which parses the EDID. > > > > This happens in the function intel_hdmi_set_edid() and > > > > intel_sdvo_tmds_sink_detect() (via intel_sdvo_get_edid()). > > > > > > > > Once EDID is parsed, the monitor HDMI support information is available > > > > through drm_display_info.is_hdmi. Retriving the same information with > > > > drm_detect_hdmi_monitor() is less efficient. Change to > > > > drm_display_info.is_hdmi > > > > > > I meant we need to examine all call chains that can lead to > > > .detect() to make sure all of them do in fact update the > > > display_info beforehand. > > > > Well, I studied it carefully and, yes, all call chains that can lead to > > drm_display_info.is_hdmi / drm_detect_hdmi_monitor() update display_info > > beforehand. In the case that this doesn't happen, the code is unchanged. > > > > Do you want I explain the changes in the code here again ? Or do you want > > to me change the commit message to be more clear ? In the first case, I can > > write here a detailed explanation. In the second case I can make a longer > > commit > > message. > > > > Or both? > > I want all those call chains explained in the commit message, > otherwise I have no easy way to confirm whether the change > is correct or not. Hmm. OK, so I had a bit of a dig around and seems that what we do now .detect()->drm_get_edid()->drm_connector_update_edid_property()->drm_add_display_info() Now the question is when did that start happening? Looks like it was commit 4b4df570b41d ("drm: Update edid-derived drm_display_info fields at edid property set [v2]") that started to call drm_add_display_info() from drm_connector_update_edid_property(), and then commit 5186421cbfe2 ("drm: Introduce epoch counter to drm_connector") started to call drm_connector_update_edid_property() from drm_get_edid(). Before both of those commits were in place display_info would still contain some stale garbage during .detect(). That is the story I think we want in these commit messages since it a) explains why the old code was directly parsing the edid instead b) why it's now safe to change this PS. connector->force handling in drm_get_edid() looks a bit busted since it doesn't call drm_connector_update_edid_property() at all in some cases. I think there might be some path that leads there anywya if/when we change connector->force, but we should fix drm_get_edid() to do the right thing regarless. -- Ville Syrjälä Intel
Re: [PATCH v3 13/13] drm/i915: replace drm_detect_hdmi_monitor() with drm_display_info.is_hdmi
On Fri, Oct 22, 2021 at 12:25:33PM +0200, Claudio Suarez wrote: > On Thu, Oct 21, 2021 at 04:49:59PM +0300, Ville Syrjälä wrote: > > On Wed, Oct 20, 2021 at 12:51:21AM +0200, Claudio Suarez wrote: > > > drm_get_edid() internally calls to drm_connector_update_edid_property() > > > and then drm_add_display_info(), which parses the EDID. > > > This happens in the function intel_hdmi_set_edid() and > > > intel_sdvo_tmds_sink_detect() (via intel_sdvo_get_edid()). > > > > > > Once EDID is parsed, the monitor HDMI support information is available > > > through drm_display_info.is_hdmi. Retriving the same information with > > > drm_detect_hdmi_monitor() is less efficient. Change to > > > drm_display_info.is_hdmi > > > > I meant we need to examine all call chains that can lead to > > .detect() to make sure all of them do in fact update the > > display_info beforehand. > > Well, I studied it carefully and, yes, all call chains that can lead to > drm_display_info.is_hdmi / drm_detect_hdmi_monitor() update display_info > beforehand. In the case that this doesn't happen, the code is unchanged. > > Do you want I explain the changes in the code here again ? Or do you want > to me change the commit message to be more clear ? In the first case, I can > write here a detailed explanation. In the second case I can make a longer > commit > message. > > Or both? I want all those call chains explained in the commit message, otherwise I have no easy way to confirm whether the change is correct or not. -- Ville Syrjälä Intel
Re: [PATCH v4 1/3] drm/amdgpu: do not pass ttm_resource_manager to gtt_mgr
Reviewed-by: Christian König for the entire series. But please keep in mind that this here needs extensive testing. Thanks, Christian. Am 22.10.21 um 12:54 schrieb Nirmoy Das: Do not allow exported amdgpu_gtt_mgr_*() to accept any ttm_resource_manager pointer. Also there is no need to force other module to call a ttm function just to eventually call gtt_mgr functions. v4: remove unused adev. v3: upcast mgr from ttm resopurce manager instead of getting it from adev. v2: pass adev's gtt_mgr instead of adev. Signed-off-by: Nirmoy Das --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++-- drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c | 22 ++--- drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 4 ++-- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 4 ++-- 4 files changed, 16 insertions(+), 18 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 41ce86244144..2b53d86aebac 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -4287,7 +4287,7 @@ static int amdgpu_device_reset_sriov(struct amdgpu_device *adev, amdgpu_virt_init_data_exchange(adev); /* we need recover gart prior to run SMC/CP/SDMA resume */ - amdgpu_gtt_mgr_recover(ttm_manager_type(>mman.bdev, TTM_PL_TT)); + amdgpu_gtt_mgr_recover(>mman.gtt_mgr); r = amdgpu_device_fw_loading(adev); if (r) @@ -4604,7 +4604,7 @@ int amdgpu_do_asic_reset(struct list_head *device_list_handle, amdgpu_inc_vram_lost(tmp_adev); } - r = amdgpu_gtt_mgr_recover(ttm_manager_type(_adev->mman.bdev, TTM_PL_TT)); + r = amdgpu_gtt_mgr_recover(_adev->mman.gtt_mgr); if (r) goto out; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c index c18f16b3be9c..9151950e0cc3 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c @@ -77,10 +77,8 @@ static ssize_t amdgpu_mem_info_gtt_used_show(struct device *dev, { struct drm_device *ddev = dev_get_drvdata(dev); struct amdgpu_device *adev = drm_to_adev(ddev); - struct ttm_resource_manager *man; - man = ttm_manager_type(>mman.bdev, TTM_PL_TT); - return sysfs_emit(buf, "%llu\n", amdgpu_gtt_mgr_usage(man)); + return sysfs_emit(buf, "%llu\n", amdgpu_gtt_mgr_usage(>mman.gtt_mgr)); } static DEVICE_ATTR(mem_info_gtt_total, S_IRUGO, @@ -206,14 +204,15 @@ static void amdgpu_gtt_mgr_del(struct ttm_resource_manager *man, /** * amdgpu_gtt_mgr_usage - return usage of GTT domain * - * @man: TTM memory type manager + * @mgr: amdgpu_gtt_mgr pointer * * Return how many bytes are used in the GTT domain */ -uint64_t amdgpu_gtt_mgr_usage(struct ttm_resource_manager *man) +uint64_t amdgpu_gtt_mgr_usage(struct amdgpu_gtt_mgr *mgr) { - struct amdgpu_gtt_mgr *mgr = to_gtt_mgr(man); - s64 result = man->size - atomic64_read(>available); + s64 result; + + result = mgr->manager.size - atomic64_read(>available); return (result > 0 ? result : 0) * PAGE_SIZE; } @@ -221,16 +220,15 @@ uint64_t amdgpu_gtt_mgr_usage(struct ttm_resource_manager *man) /** * amdgpu_gtt_mgr_recover - re-init gart * - * @man: TTM memory type manager + * @mgr: amdgpu_gtt_mgr pointer * * Re-init the gart for each known BO in the GTT. */ -int amdgpu_gtt_mgr_recover(struct ttm_resource_manager *man) +int amdgpu_gtt_mgr_recover(struct amdgpu_gtt_mgr *mgr) { - struct amdgpu_gtt_mgr *mgr = to_gtt_mgr(man); - struct amdgpu_device *adev; struct amdgpu_gtt_node *node; struct drm_mm_node *mm_node; + struct amdgpu_device *adev; int r = 0; adev = container_of(mgr, typeof(*adev), mman.gtt_mgr); @@ -267,7 +265,7 @@ static void amdgpu_gtt_mgr_debug(struct ttm_resource_manager *man, drm_printf(printer, "man size:%llu pages, gtt available:%lld pages, usage:%lluMB\n", man->size, (u64)atomic64_read(>available), - amdgpu_gtt_mgr_usage(man) >> 20); + amdgpu_gtt_mgr_usage(mgr) >> 20); } static const struct ttm_resource_manager_func amdgpu_gtt_mgr_func = { diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c index d2955ea4a62b..603ce32db5c5 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c @@ -678,7 +678,7 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file *filp) ui64 = amdgpu_vram_mgr_vis_usage(ttm_manager_type(>mman.bdev, TTM_PL_VRAM)); return copy_to_user(out, , min(size, 8u)) ? -EFAULT : 0; case AMDGPU_INFO_GTT_USAGE:
Re: [PATCH v3 13/13] drm/i915: replace drm_detect_hdmi_monitor() with drm_display_info.is_hdmi
On Thu, Oct 21, 2021 at 04:49:59PM +0300, Ville Syrjälä wrote: > On Wed, Oct 20, 2021 at 12:51:21AM +0200, Claudio Suarez wrote: > > drm_get_edid() internally calls to drm_connector_update_edid_property() > > and then drm_add_display_info(), which parses the EDID. > > This happens in the function intel_hdmi_set_edid() and > > intel_sdvo_tmds_sink_detect() (via intel_sdvo_get_edid()). > > > > Once EDID is parsed, the monitor HDMI support information is available > > through drm_display_info.is_hdmi. Retriving the same information with > > drm_detect_hdmi_monitor() is less efficient. Change to > > drm_display_info.is_hdmi > > I meant we need to examine all call chains that can lead to > .detect() to make sure all of them do in fact update the > display_info beforehand. Well, I studied it carefully and, yes, all call chains that can lead to drm_display_info.is_hdmi / drm_detect_hdmi_monitor() update display_info beforehand. In the case that this doesn't happen, the code is unchanged. Do you want I explain the changes in the code here again ? Or do you want to me change the commit message to be more clear ? In the first case, I can write here a detailed explanation. In the second case I can make a longer commit message. Or both? Best Regards, Claudio Suarez.
Re: [PATCH 3/3] drm/amdgpu: Implement bad_page_threshold = -2 case
On 10/21/2021 7:26 PM, Russell, Kent wrote: [AMD Official Use Only] -Original Message- From: Lazar, Lijo Sent: Thursday, October 21, 2021 1:25 AM To: Russell, Kent ; amd-gfx@lists.freedesktop.org Cc: Tuikov, Luben ; Joshi, Mukul Subject: Re: [PATCH 3/3] drm/amdgpu: Implement bad_page_threshold = -2 case On 10/20/2021 10:05 PM, Kent Russell wrote: If the bad_page_threshold kernel parameter is set to -2, continue to post the GPU. Print a warning to dmesg that this action has been done, and that page retirement will obviously not work for said GPU Cc: Luben Tuikov Cc: Mukul Joshi Signed-off-by: Kent Russell --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c | 13 + 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c index 1ede0f0d6f55..31852330c1db 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c @@ -1115,11 +1115,16 @@ int amdgpu_ras_eeprom_init(struct amdgpu_ras_eeprom_control *control, res = amdgpu_ras_eeprom_correct_header_tag(control, RAS_TABLE_HDR_VAL); } else { - *exceed_err_limit = true; - dev_err(adev->dev, - "RAS records:%d exceed threshold:%d, " - "GPU will not be initialized. Replace this GPU or increase the threshold", + dev_err(adev->dev, "RAS records:%d exceed threshold:%d", control->ras_num_recs, ras->bad_page_cnt_threshold); + if (amdgpu_bad_page_threshold == -2) { + dev_warn(adev->dev, "GPU will be initialized due to bad_page_threshold = -2."); + dev_warn(adev->dev, "Page retirement will not work for this GPU in this state."); Now, this looks as good as booting with 0 = disable bad page retirement. I thought page retirement will work as long as EEPROM has space, but it won't bother about the threshold. If the intent is to ignore bad page retirement, then 0 is good enough and -2 is not required. Also, when user passes threshold=-2, what is the threshold being compared against to say *exceed_err_limit = true? My thought on having the -2 option is that we'll still enable page retirement, we just won't shut the GPU down when it hits the threshold. The bad pages will still be retired as we hit them, it will just never disable the GPU. The comment about retirement not working is definitely incorrect now (leftover from previous local patches), so I'll remove that. In this case, I don't think we'd ever exceed the error limit. exceed_err_limit is only really used when we are disabling the GPU, so we wouldn't want to set that to true. Otherwise we wouldn't be loading the bad pages in gpu_recovery_init, and we'll still return 0 from gpu_recovery_init. Probably, this is too late as this patch is superseded. Replying to this to keep the context. I realized that the question went in the wrong direction because of *exceed_err_limit = true. Saw it in the earlier line but didn't realize that it was removed. Anyway, main question I wanted to ask was - "When user passes threshold=-2, what is the threshold being compared against?" Looked it up today and saw this piece of code. if (amdgpu_bad_page_threshold < 0) { u64 val = adev->gmc.mc_vram_size; do_div(val, RAS_BAD_PAGE_COVER); con->bad_page_cnt_threshold = min(lower_32_bits(val), max_count); For a user who has been specifying say bad_page_threshold=100 in the command line and unable to boot, now this is doing something else. The comparison in next boot is done against a different threshold. In fact, it won't even come to this comparison logic in the next boot as the threshold is raised by default when this parameter is specified. The only case this parameter takes effect is when a user is unable to boot because bad page retirements have exceeded the "default threshold". In other cases it just resets the comparison limit. Hope that is the intention. Thanks, Lijo Kent Thanks, Lijo + res = 0; + } else { + *exceed_err_limit = true; + dev_err(adev->dev, "GPU will not be initialized. Replace this GPU or increase the threshold."); + } } } else { DRM_INFO("Creating a new EEPROM table");
[PATCH 1/1] drm/amdgpu: remove unnecessary checks
amdgpu_ttm_backend_bind() only needed for TTM_PL_TT and AMDGPU_PL_PREEMPT. Signed-off-by: Nirmoy Das --- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 5 - 1 file changed, 5 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index d784f8d3a834..eb872fc4ad92 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -914,11 +914,6 @@ static int amdgpu_ttm_backend_bind(struct ttm_device *bdev, ttm->num_pages, bo_mem, ttm); } - if (bo_mem->mem_type == AMDGPU_PL_GDS || - bo_mem->mem_type == AMDGPU_PL_GWS || - bo_mem->mem_type == AMDGPU_PL_OA) - return -EINVAL; - if (bo_mem->mem_type != TTM_PL_TT || !amdgpu_gtt_mgr_has_gart_addr(bo_mem)) { gtt->offset = AMDGPU_BO_INVALID_OFFSET; -- 2.32.0
Re: [PATCH v4 3/3] drm/amdgpu: recover gart table at resume
On 10/22/21 12:54 PM, Nirmoy Das wrote: Get rid off pin/unpin of gart BO at resume/suspend and instead pin only once and try to recover gart content at resume time. This is much more stable in case there is OOM situation at 2nd call to amdgpu_device_evict_resources() while evicting GART table. v4: return amdgpu_bo_create_kernel() directly without checking its return value. v3: remove gart recovery from other places v2: pin gart at amdgpu_gart_table_vram_alloc() Signed-off-by: Nirmoy Das --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 11 --- drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c | 82 ++ drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 3 +- drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c | 3 +- drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c | 3 +- drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c | 3 +- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 3 +- 7 files changed, 11 insertions(+), 97 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 2b53d86aebac..f0c70e9d37fb 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -3935,16 +3935,11 @@ int amdgpu_device_suspend(struct drm_device *dev, bool fbcon) if (!adev->in_s0ix) amdgpu_amdkfd_suspend(adev, adev->in_runpm); - /* First evict vram memory */ amdgpu_device_evict_resources(adev); amdgpu_fence_driver_hw_fini(adev); amdgpu_device_ip_suspend_phase2(adev); - /* This second call to evict device resources is to evict -* the gart page table using the CPU. -*/ - amdgpu_device_evict_resources(adev); return 0; } @@ -4286,8 +4281,6 @@ static int amdgpu_device_reset_sriov(struct amdgpu_device *adev, goto error; amdgpu_virt_init_data_exchange(adev); - /* we need recover gart prior to run SMC/CP/SDMA resume */ - amdgpu_gtt_mgr_recover(>mman.gtt_mgr); r = amdgpu_device_fw_loading(adev); if (r) @@ -4604,10 +4597,6 @@ int amdgpu_do_asic_reset(struct list_head *device_list_handle, amdgpu_inc_vram_lost(tmp_adev); } - r = amdgpu_gtt_mgr_recover(_adev->mman.gtt_mgr); - if (r) - goto out; - r = amdgpu_device_fw_loading(tmp_adev); if (r) return r; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c index d3e4203f6217..3525f87dc1af 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c @@ -116,78 +116,12 @@ int amdgpu_gart_table_vram_alloc(struct amdgpu_device *adev) { int r; I forgot to remove this unused r and now resent[1] this patch again but forgot add in-reply-to. https://patchwork.freedesktop.org/patch/460939/ Regards, Nirmoy - if (adev->gart.bo == NULL) { - struct amdgpu_bo_param bp; - - memset(, 0, sizeof(bp)); - bp.size = adev->gart.table_size; - bp.byte_align = PAGE_SIZE; - bp.domain = AMDGPU_GEM_DOMAIN_VRAM; - bp.flags = AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED | - AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS; - bp.type = ttm_bo_type_kernel; - bp.resv = NULL; - bp.bo_ptr_size = sizeof(struct amdgpu_bo); - - r = amdgpu_bo_create(adev, , >gart.bo); - if (r) { - return r; - } - } - return 0; -} - -/** - * amdgpu_gart_table_vram_pin - pin gart page table in vram - * - * @adev: amdgpu_device pointer - * - * Pin the GART page table in vram so it will not be moved - * by the memory manager (pcie r4xx, r5xx+). These asics require the - * gart table to be in video memory. - * Returns 0 for success, error for failure. - */ -int amdgpu_gart_table_vram_pin(struct amdgpu_device *adev) -{ - int r; - - r = amdgpu_bo_reserve(adev->gart.bo, false); - if (unlikely(r != 0)) - return r; - r = amdgpu_bo_pin(adev->gart.bo, AMDGPU_GEM_DOMAIN_VRAM); - if (r) { - amdgpu_bo_unreserve(adev->gart.bo); - return r; - } - r = amdgpu_bo_kmap(adev->gart.bo, >gart.ptr); - if (r) - amdgpu_bo_unpin(adev->gart.bo); - amdgpu_bo_unreserve(adev->gart.bo); - return r; -} - -/** - * amdgpu_gart_table_vram_unpin - unpin gart page table in vram - * - * @adev: amdgpu_device pointer - * - * Unpin the GART page table in vram (pcie r4xx, r5xx+). - * These asics require the gart table to be in video memory. - */ -void amdgpu_gart_table_vram_unpin(struct amdgpu_device *adev) -{ - int r; +
[PATCH 3/3] drm/amdgpu: recover gart table at resume
Get rid off pin/unpin of gart BO at resume/suspend and instead pin only once and try to recover gart content at resume time. This is much more stable in case there is OOM situation at 2nd call to amdgpu_device_evict_resources() while evicting GART table. v3: remove gart recovery from other places v2: pin gart at amdgpu_gart_table_vram_alloc() Signed-off-by: Nirmoy Das --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 11 --- drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c | 84 ++ drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 3 +- drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c | 3 +- drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c | 3 +- drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c | 3 +- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 3 +- 7 files changed, 11 insertions(+), 99 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 2b53d86aebac..f0c70e9d37fb 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -3935,16 +3935,11 @@ int amdgpu_device_suspend(struct drm_device *dev, bool fbcon) if (!adev->in_s0ix) amdgpu_amdkfd_suspend(adev, adev->in_runpm); - /* First evict vram memory */ amdgpu_device_evict_resources(adev); amdgpu_fence_driver_hw_fini(adev); amdgpu_device_ip_suspend_phase2(adev); - /* This second call to evict device resources is to evict -* the gart page table using the CPU. -*/ - amdgpu_device_evict_resources(adev); return 0; } @@ -4286,8 +4281,6 @@ static int amdgpu_device_reset_sriov(struct amdgpu_device *adev, goto error; amdgpu_virt_init_data_exchange(adev); - /* we need recover gart prior to run SMC/CP/SDMA resume */ - amdgpu_gtt_mgr_recover(>mman.gtt_mgr); r = amdgpu_device_fw_loading(adev); if (r) @@ -4604,10 +4597,6 @@ int amdgpu_do_asic_reset(struct list_head *device_list_handle, amdgpu_inc_vram_lost(tmp_adev); } - r = amdgpu_gtt_mgr_recover(_adev->mman.gtt_mgr); - if (r) - goto out; - r = amdgpu_device_fw_loading(tmp_adev); if (r) return r; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c index d3e4203f6217..645950a653a0 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c @@ -114,80 +114,12 @@ void amdgpu_gart_dummy_page_fini(struct amdgpu_device *adev) */ int amdgpu_gart_table_vram_alloc(struct amdgpu_device *adev) { - int r; - - if (adev->gart.bo == NULL) { - struct amdgpu_bo_param bp; - - memset(, 0, sizeof(bp)); - bp.size = adev->gart.table_size; - bp.byte_align = PAGE_SIZE; - bp.domain = AMDGPU_GEM_DOMAIN_VRAM; - bp.flags = AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED | - AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS; - bp.type = ttm_bo_type_kernel; - bp.resv = NULL; - bp.bo_ptr_size = sizeof(struct amdgpu_bo); - - r = amdgpu_bo_create(adev, , >gart.bo); - if (r) { - return r; - } - } - return 0; -} - -/** - * amdgpu_gart_table_vram_pin - pin gart page table in vram - * - * @adev: amdgpu_device pointer - * - * Pin the GART page table in vram so it will not be moved - * by the memory manager (pcie r4xx, r5xx+). These asics require the - * gart table to be in video memory. - * Returns 0 for success, error for failure. - */ -int amdgpu_gart_table_vram_pin(struct amdgpu_device *adev) -{ - int r; - - r = amdgpu_bo_reserve(adev->gart.bo, false); - if (unlikely(r != 0)) - return r; - r = amdgpu_bo_pin(adev->gart.bo, AMDGPU_GEM_DOMAIN_VRAM); - if (r) { - amdgpu_bo_unreserve(adev->gart.bo); - return r; - } - r = amdgpu_bo_kmap(adev->gart.bo, >gart.ptr); - if (r) - amdgpu_bo_unpin(adev->gart.bo); - amdgpu_bo_unreserve(adev->gart.bo); - return r; -} - -/** - * amdgpu_gart_table_vram_unpin - unpin gart page table in vram - * - * @adev: amdgpu_device pointer - * - * Unpin the GART page table in vram (pcie r4xx, r5xx+). - * These asics require the gart table to be in video memory. - */ -void amdgpu_gart_table_vram_unpin(struct amdgpu_device *adev) -{ - int r; + if (adev->gart.bo != NULL) + return 0; - if (adev->gart.bo == NULL) { - return; - } - r = amdgpu_bo_reserve(adev->gart.bo, true); - if (likely(r == 0)) { -
[PATCH 2/3] drm/amdgpu: do not pass ttm_resource_manager to vram_mgr
Do not allow exported amdgpu_vram_mgr_*() to accept any ttm_resource_manager pointer. Also there is no need to force other module to call a ttm function just to eventually call vram_mgr functions. v2: pass adev's vram_mgr instead of adev Signed-off-by: Nirmoy Das Reviewed-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 3 +- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 5 +-- drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 10 ++--- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 6 +-- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 8 ++-- drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 5 +-- drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 40 7 files changed, 31 insertions(+), 46 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c index 7077f21f0021..df818e145d9a 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c @@ -531,9 +531,8 @@ int amdgpu_amdkfd_get_dmabuf_info(struct kgd_dev *kgd, int dma_buf_fd, uint64_t amdgpu_amdkfd_get_vram_usage(struct kgd_dev *kgd) { struct amdgpu_device *adev = (struct amdgpu_device *)kgd; - struct ttm_resource_manager *vram_man = ttm_manager_type(>mman.bdev, TTM_PL_VRAM); - return amdgpu_vram_mgr_usage(vram_man); + return amdgpu_vram_mgr_usage(>mman.vram_mgr); } uint64_t amdgpu_amdkfd_get_hive_id(struct kgd_dev *kgd) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c index 76fe5b71e35d..7e745164a624 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c @@ -298,7 +298,6 @@ static void amdgpu_cs_get_threshold_for_moves(struct amdgpu_device *adev, { s64 time_us, increment_us; u64 free_vram, total_vram, used_vram; - struct ttm_resource_manager *vram_man = ttm_manager_type(>mman.bdev, TTM_PL_VRAM); /* Allow a maximum of 200 accumulated ms. This is basically per-IB * throttling. * @@ -315,7 +314,7 @@ static void amdgpu_cs_get_threshold_for_moves(struct amdgpu_device *adev, } total_vram = adev->gmc.real_vram_size - atomic64_read(>vram_pin_size); - used_vram = amdgpu_vram_mgr_usage(vram_man); + used_vram = amdgpu_vram_mgr_usage(>mman.vram_mgr); free_vram = used_vram >= total_vram ? 0 : total_vram - used_vram; spin_lock(>mm_stats.lock); @@ -362,7 +361,7 @@ static void amdgpu_cs_get_threshold_for_moves(struct amdgpu_device *adev, if (!amdgpu_gmc_vram_full_visible(>gmc)) { u64 total_vis_vram = adev->gmc.visible_vram_size; u64 used_vis_vram = - amdgpu_vram_mgr_vis_usage(vram_man); + amdgpu_vram_mgr_vis_usage(>mman.vram_mgr); if (used_vis_vram < total_vis_vram) { u64 free_vis_vram = total_vis_vram - used_vis_vram; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c index 603ce32db5c5..b426e03ad630 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c @@ -672,10 +672,10 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file *filp) ui64 = atomic64_read(>num_vram_cpu_page_faults); return copy_to_user(out, , min(size, 8u)) ? -EFAULT : 0; case AMDGPU_INFO_VRAM_USAGE: - ui64 = amdgpu_vram_mgr_usage(ttm_manager_type(>mman.bdev, TTM_PL_VRAM)); + ui64 = amdgpu_vram_mgr_usage(>mman.vram_mgr); return copy_to_user(out, , min(size, 8u)) ? -EFAULT : 0; case AMDGPU_INFO_VIS_VRAM_USAGE: - ui64 = amdgpu_vram_mgr_vis_usage(ttm_manager_type(>mman.bdev, TTM_PL_VRAM)); + ui64 = amdgpu_vram_mgr_vis_usage(>mman.vram_mgr); return copy_to_user(out, , min(size, 8u)) ? -EFAULT : 0; case AMDGPU_INFO_GTT_USAGE: ui64 = amdgpu_gtt_mgr_usage(>mman.gtt_mgr); @@ -709,8 +709,6 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file *filp) } case AMDGPU_INFO_MEMORY: { struct drm_amdgpu_memory_info mem; - struct ttm_resource_manager *vram_man = - ttm_manager_type(>mman.bdev, TTM_PL_VRAM); struct ttm_resource_manager *gtt_man = ttm_manager_type(>mman.bdev, TTM_PL_TT); memset(, 0, sizeof(mem)); @@ -719,7 +717,7 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file *filp) atomic64_read(>vram_pin_size) - AMDGPU_VM_RESERVED_VRAM; mem.vram.heap_usage = - amdgpu_vram_mgr_usage(vram_man); + amdgpu_vram_mgr_usage(>mman.vram_mgr); mem.vram.max_allocation = mem.vram.usable_heap_size
[PATCH v4 3/3] drm/amdgpu: recover gart table at resume
Get rid off pin/unpin of gart BO at resume/suspend and instead pin only once and try to recover gart content at resume time. This is much more stable in case there is OOM situation at 2nd call to amdgpu_device_evict_resources() while evicting GART table. v4: return amdgpu_bo_create_kernel() directly without checking its return value. v3: remove gart recovery from other places v2: pin gart at amdgpu_gart_table_vram_alloc() Signed-off-by: Nirmoy Das --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 11 --- drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c | 82 ++ drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 3 +- drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c | 3 +- drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c | 3 +- drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c | 3 +- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 3 +- 7 files changed, 11 insertions(+), 97 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 2b53d86aebac..f0c70e9d37fb 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -3935,16 +3935,11 @@ int amdgpu_device_suspend(struct drm_device *dev, bool fbcon) if (!adev->in_s0ix) amdgpu_amdkfd_suspend(adev, adev->in_runpm); - /* First evict vram memory */ amdgpu_device_evict_resources(adev); amdgpu_fence_driver_hw_fini(adev); amdgpu_device_ip_suspend_phase2(adev); - /* This second call to evict device resources is to evict -* the gart page table using the CPU. -*/ - amdgpu_device_evict_resources(adev); return 0; } @@ -4286,8 +4281,6 @@ static int amdgpu_device_reset_sriov(struct amdgpu_device *adev, goto error; amdgpu_virt_init_data_exchange(adev); - /* we need recover gart prior to run SMC/CP/SDMA resume */ - amdgpu_gtt_mgr_recover(>mman.gtt_mgr); r = amdgpu_device_fw_loading(adev); if (r) @@ -4604,10 +4597,6 @@ int amdgpu_do_asic_reset(struct list_head *device_list_handle, amdgpu_inc_vram_lost(tmp_adev); } - r = amdgpu_gtt_mgr_recover(_adev->mman.gtt_mgr); - if (r) - goto out; - r = amdgpu_device_fw_loading(tmp_adev); if (r) return r; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c index d3e4203f6217..3525f87dc1af 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c @@ -116,78 +116,12 @@ int amdgpu_gart_table_vram_alloc(struct amdgpu_device *adev) { int r; - if (adev->gart.bo == NULL) { - struct amdgpu_bo_param bp; - - memset(, 0, sizeof(bp)); - bp.size = adev->gart.table_size; - bp.byte_align = PAGE_SIZE; - bp.domain = AMDGPU_GEM_DOMAIN_VRAM; - bp.flags = AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED | - AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS; - bp.type = ttm_bo_type_kernel; - bp.resv = NULL; - bp.bo_ptr_size = sizeof(struct amdgpu_bo); - - r = amdgpu_bo_create(adev, , >gart.bo); - if (r) { - return r; - } - } - return 0; -} - -/** - * amdgpu_gart_table_vram_pin - pin gart page table in vram - * - * @adev: amdgpu_device pointer - * - * Pin the GART page table in vram so it will not be moved - * by the memory manager (pcie r4xx, r5xx+). These asics require the - * gart table to be in video memory. - * Returns 0 for success, error for failure. - */ -int amdgpu_gart_table_vram_pin(struct amdgpu_device *adev) -{ - int r; - - r = amdgpu_bo_reserve(adev->gart.bo, false); - if (unlikely(r != 0)) - return r; - r = amdgpu_bo_pin(adev->gart.bo, AMDGPU_GEM_DOMAIN_VRAM); - if (r) { - amdgpu_bo_unreserve(adev->gart.bo); - return r; - } - r = amdgpu_bo_kmap(adev->gart.bo, >gart.ptr); - if (r) - amdgpu_bo_unpin(adev->gart.bo); - amdgpu_bo_unreserve(adev->gart.bo); - return r; -} - -/** - * amdgpu_gart_table_vram_unpin - unpin gart page table in vram - * - * @adev: amdgpu_device pointer - * - * Unpin the GART page table in vram (pcie r4xx, r5xx+). - * These asics require the gart table to be in video memory. - */ -void amdgpu_gart_table_vram_unpin(struct amdgpu_device *adev) -{ - int r; + if (adev->gart.bo != NULL) + return 0; - if (adev->gart.bo == NULL) { - return; - } - r = amdgpu_bo_reserve(adev->gart.bo, true); - if (likely(r == 0)) { -
[PATCH v4 1/3] drm/amdgpu: do not pass ttm_resource_manager to gtt_mgr
Do not allow exported amdgpu_gtt_mgr_*() to accept any ttm_resource_manager pointer. Also there is no need to force other module to call a ttm function just to eventually call gtt_mgr functions. v4: remove unused adev. v3: upcast mgr from ttm resopurce manager instead of getting it from adev. v2: pass adev's gtt_mgr instead of adev. Signed-off-by: Nirmoy Das --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++-- drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c | 22 ++--- drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 4 ++-- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 4 ++-- 4 files changed, 16 insertions(+), 18 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 41ce86244144..2b53d86aebac 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -4287,7 +4287,7 @@ static int amdgpu_device_reset_sriov(struct amdgpu_device *adev, amdgpu_virt_init_data_exchange(adev); /* we need recover gart prior to run SMC/CP/SDMA resume */ - amdgpu_gtt_mgr_recover(ttm_manager_type(>mman.bdev, TTM_PL_TT)); + amdgpu_gtt_mgr_recover(>mman.gtt_mgr); r = amdgpu_device_fw_loading(adev); if (r) @@ -4604,7 +4604,7 @@ int amdgpu_do_asic_reset(struct list_head *device_list_handle, amdgpu_inc_vram_lost(tmp_adev); } - r = amdgpu_gtt_mgr_recover(ttm_manager_type(_adev->mman.bdev, TTM_PL_TT)); + r = amdgpu_gtt_mgr_recover(_adev->mman.gtt_mgr); if (r) goto out; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c index c18f16b3be9c..9151950e0cc3 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c @@ -77,10 +77,8 @@ static ssize_t amdgpu_mem_info_gtt_used_show(struct device *dev, { struct drm_device *ddev = dev_get_drvdata(dev); struct amdgpu_device *adev = drm_to_adev(ddev); - struct ttm_resource_manager *man; - man = ttm_manager_type(>mman.bdev, TTM_PL_TT); - return sysfs_emit(buf, "%llu\n", amdgpu_gtt_mgr_usage(man)); + return sysfs_emit(buf, "%llu\n", amdgpu_gtt_mgr_usage(>mman.gtt_mgr)); } static DEVICE_ATTR(mem_info_gtt_total, S_IRUGO, @@ -206,14 +204,15 @@ static void amdgpu_gtt_mgr_del(struct ttm_resource_manager *man, /** * amdgpu_gtt_mgr_usage - return usage of GTT domain * - * @man: TTM memory type manager + * @mgr: amdgpu_gtt_mgr pointer * * Return how many bytes are used in the GTT domain */ -uint64_t amdgpu_gtt_mgr_usage(struct ttm_resource_manager *man) +uint64_t amdgpu_gtt_mgr_usage(struct amdgpu_gtt_mgr *mgr) { - struct amdgpu_gtt_mgr *mgr = to_gtt_mgr(man); - s64 result = man->size - atomic64_read(>available); + s64 result; + + result = mgr->manager.size - atomic64_read(>available); return (result > 0 ? result : 0) * PAGE_SIZE; } @@ -221,16 +220,15 @@ uint64_t amdgpu_gtt_mgr_usage(struct ttm_resource_manager *man) /** * amdgpu_gtt_mgr_recover - re-init gart * - * @man: TTM memory type manager + * @mgr: amdgpu_gtt_mgr pointer * * Re-init the gart for each known BO in the GTT. */ -int amdgpu_gtt_mgr_recover(struct ttm_resource_manager *man) +int amdgpu_gtt_mgr_recover(struct amdgpu_gtt_mgr *mgr) { - struct amdgpu_gtt_mgr *mgr = to_gtt_mgr(man); - struct amdgpu_device *adev; struct amdgpu_gtt_node *node; struct drm_mm_node *mm_node; + struct amdgpu_device *adev; int r = 0; adev = container_of(mgr, typeof(*adev), mman.gtt_mgr); @@ -267,7 +265,7 @@ static void amdgpu_gtt_mgr_debug(struct ttm_resource_manager *man, drm_printf(printer, "man size:%llu pages, gtt available:%lld pages, usage:%lluMB\n", man->size, (u64)atomic64_read(>available), - amdgpu_gtt_mgr_usage(man) >> 20); + amdgpu_gtt_mgr_usage(mgr) >> 20); } static const struct ttm_resource_manager_func amdgpu_gtt_mgr_func = { diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c index d2955ea4a62b..603ce32db5c5 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c @@ -678,7 +678,7 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file *filp) ui64 = amdgpu_vram_mgr_vis_usage(ttm_manager_type(>mman.bdev, TTM_PL_VRAM)); return copy_to_user(out, , min(size, 8u)) ? -EFAULT : 0; case AMDGPU_INFO_GTT_USAGE: - ui64 = amdgpu_gtt_mgr_usage(ttm_manager_type(>mman.bdev, TTM_PL_TT)); + ui64 = amdgpu_gtt_mgr_usage(>mman.gtt_mgr); return copy_to_user(out, , min(size, 8u)) ?
Re: [PATCH v3 3/3] drm/amdgpu: recover gart table at resume
Am 22.10.21 um 11:32 schrieb Nirmoy Das: Get rid off pin/unpin of gart BO at resume/suspend and instead pin only once and try to recover gart content at resume time. This is much more stable in case there is OOM situation at 2nd call to amdgpu_device_evict_resources() while evicting GART table. v3: remove gart recovery from other places v2: pin gart at amdgpu_gart_table_vram_alloc() Signed-off-by: Nirmoy Das --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 11 --- drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c | 80 ++ drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 3 +- drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c | 3 +- drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c | 3 +- drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c | 3 +- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 3 +- 7 files changed, 12 insertions(+), 94 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 2b53d86aebac..f0c70e9d37fb 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -3935,16 +3935,11 @@ int amdgpu_device_suspend(struct drm_device *dev, bool fbcon) if (!adev->in_s0ix) amdgpu_amdkfd_suspend(adev, adev->in_runpm); - /* First evict vram memory */ amdgpu_device_evict_resources(adev); amdgpu_fence_driver_hw_fini(adev); amdgpu_device_ip_suspend_phase2(adev); - /* This second call to evict device resources is to evict -* the gart page table using the CPU. -*/ - amdgpu_device_evict_resources(adev); return 0; } @@ -4286,8 +4281,6 @@ static int amdgpu_device_reset_sriov(struct amdgpu_device *adev, goto error; amdgpu_virt_init_data_exchange(adev); - /* we need recover gart prior to run SMC/CP/SDMA resume */ - amdgpu_gtt_mgr_recover(>mman.gtt_mgr); r = amdgpu_device_fw_loading(adev); if (r) @@ -4604,10 +4597,6 @@ int amdgpu_do_asic_reset(struct list_head *device_list_handle, amdgpu_inc_vram_lost(tmp_adev); } - r = amdgpu_gtt_mgr_recover(_adev->mman.gtt_mgr); - if (r) - goto out; - r = amdgpu_device_fw_loading(tmp_adev); if (r) return r; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c index d3e4203f6217..679eec122bb5 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c @@ -116,78 +116,16 @@ int amdgpu_gart_table_vram_alloc(struct amdgpu_device *adev) { int r; - if (adev->gart.bo == NULL) { - struct amdgpu_bo_param bp; - - memset(, 0, sizeof(bp)); - bp.size = adev->gart.table_size; - bp.byte_align = PAGE_SIZE; - bp.domain = AMDGPU_GEM_DOMAIN_VRAM; - bp.flags = AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED | - AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS; - bp.type = ttm_bo_type_kernel; - bp.resv = NULL; - bp.bo_ptr_size = sizeof(struct amdgpu_bo); - - r = amdgpu_bo_create(adev, , >gart.bo); - if (r) { - return r; - } - } - return 0; -} - -/** - * amdgpu_gart_table_vram_pin - pin gart page table in vram - * - * @adev: amdgpu_device pointer - * - * Pin the GART page table in vram so it will not be moved - * by the memory manager (pcie r4xx, r5xx+). These asics require the - * gart table to be in video memory. - * Returns 0 for success, error for failure. - */ -int amdgpu_gart_table_vram_pin(struct amdgpu_device *adev) -{ - int r; + if (adev->gart.bo != NULL) + return 0; - r = amdgpu_bo_reserve(adev->gart.bo, false); - if (unlikely(r != 0)) - return r; - r = amdgpu_bo_pin(adev->gart.bo, AMDGPU_GEM_DOMAIN_VRAM); + r = amdgpu_bo_create_kernel(adev, adev->gart.table_size, PAGE_SIZE, + AMDGPU_GEM_DOMAIN_VRAM, >gart.bo, + NULL, (void *)>gart.ptr); if (r) { - amdgpu_bo_unreserve(adev->gart.bo); return r; } Just "return amdgpu_bo_create_kernel(..);" should do it now as far as I can see. Apart from that looks really good now. Christian. - r = amdgpu_bo_kmap(adev->gart.bo, >gart.ptr); - if (r) - amdgpu_bo_unpin(adev->gart.bo); - amdgpu_bo_unreserve(adev->gart.bo); - return r; -} - -/** - * amdgpu_gart_table_vram_unpin - unpin gart page table in vram - * - * @adev: amdgpu_device pointer - * - * Unpin the GART page table in vram (pcie r4xx, r5xx+).
Re: [PATCH v3 1/3] drm/amdgpu: do not pass ttm_resource_manager to gtt_mgr
On 10/22/21 11:44 AM, Christian König wrote: Am 22.10.21 um 11:32 schrieb Nirmoy Das: Do not allow exported amdgpu_gtt_mgr_*() to accept any ttm_resource_manager pointer. Also there is no need to force other module to call a ttm function just to eventually call gtt_mgr functions. v3: upcast mgr from ttm resopurce manager instead of getting it from adev. v2: pass adev's gtt_mgr instead of adev Signed-off-by: Nirmoy Das --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++-- drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c | 23 ++--- drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 4 ++-- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 4 ++-- 4 files changed, 17 insertions(+), 18 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 41ce86244144..2b53d86aebac 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -4287,7 +4287,7 @@ static int amdgpu_device_reset_sriov(struct amdgpu_device *adev, amdgpu_virt_init_data_exchange(adev); /* we need recover gart prior to run SMC/CP/SDMA resume */ - amdgpu_gtt_mgr_recover(ttm_manager_type(>mman.bdev, TTM_PL_TT)); + amdgpu_gtt_mgr_recover(>mman.gtt_mgr); r = amdgpu_device_fw_loading(adev); if (r) @@ -4604,7 +4604,7 @@ int amdgpu_do_asic_reset(struct list_head *device_list_handle, amdgpu_inc_vram_lost(tmp_adev); } - r = amdgpu_gtt_mgr_recover(ttm_manager_type(_adev->mman.bdev, TTM_PL_TT)); + r = amdgpu_gtt_mgr_recover(_adev->mman.gtt_mgr); if (r) goto out; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c index c18f16b3be9c..62e14efa61fe 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c @@ -77,10 +77,8 @@ static ssize_t amdgpu_mem_info_gtt_used_show(struct device *dev, { struct drm_device *ddev = dev_get_drvdata(dev); struct amdgpu_device *adev = drm_to_adev(ddev); - struct ttm_resource_manager *man; - man = ttm_manager_type(>mman.bdev, TTM_PL_TT); - return sysfs_emit(buf, "%llu\n", amdgpu_gtt_mgr_usage(man)); + return sysfs_emit(buf, "%llu\n", amdgpu_gtt_mgr_usage(>mman.gtt_mgr)); } static DEVICE_ATTR(mem_info_gtt_total, S_IRUGO, @@ -206,14 +204,15 @@ static void amdgpu_gtt_mgr_del(struct ttm_resource_manager *man, /** * amdgpu_gtt_mgr_usage - return usage of GTT domain * - * @man: TTM memory type manager + * @mgr: amdgpu_gtt_mgr pointer * * Return how many bytes are used in the GTT domain */ -uint64_t amdgpu_gtt_mgr_usage(struct ttm_resource_manager *man) +uint64_t amdgpu_gtt_mgr_usage(struct amdgpu_gtt_mgr *mgr) { - struct amdgpu_gtt_mgr *mgr = to_gtt_mgr(man); - s64 result = man->size - atomic64_read(>available); + s64 result; + + result = mgr->manager.size - atomic64_read(>available); return (result > 0 ? result : 0) * PAGE_SIZE; } @@ -221,16 +220,15 @@ uint64_t amdgpu_gtt_mgr_usage(struct ttm_resource_manager *man) /** * amdgpu_gtt_mgr_recover - re-init gart * - * @man: TTM memory type manager + * @mgr: amdgpu_gtt_mgr pointer * * Re-init the gart for each known BO in the GTT. */ -int amdgpu_gtt_mgr_recover(struct ttm_resource_manager *man) +int amdgpu_gtt_mgr_recover(struct amdgpu_gtt_mgr *mgr) { - struct amdgpu_gtt_mgr *mgr = to_gtt_mgr(man); - struct amdgpu_device *adev; struct amdgpu_gtt_node *node; struct drm_mm_node *mm_node; + struct amdgpu_device *adev; int r = 0; adev = container_of(mgr, typeof(*adev), mman.gtt_mgr); @@ -260,6 +258,7 @@ static void amdgpu_gtt_mgr_debug(struct ttm_resource_manager *man, struct drm_printer *printer) { struct amdgpu_gtt_mgr *mgr = to_gtt_mgr(man); + struct amdgpu_device *adev = container_of(mgr, typeof(*adev), mman.gtt_mgr); I think that variable is now unused, isn't it? Apart from that looks good to me now. Yes it is. I will remove it. Strangely GCC didn't complain. Nirmoy Christian. spin_lock(>lock); drm_mm_print(>mm, printer); @@ -267,7 +266,7 @@ static void amdgpu_gtt_mgr_debug(struct ttm_resource_manager *man, drm_printf(printer, "man size:%llu pages, gtt available:%lld pages, usage:%lluMB\n", man->size, (u64)atomic64_read(>available), - amdgpu_gtt_mgr_usage(man) >> 20); + amdgpu_gtt_mgr_usage(mgr) >> 20); } static const struct ttm_resource_manager_func amdgpu_gtt_mgr_func = { diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c index d2955ea4a62b..603ce32db5c5 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c @@ -678,7 +678,7 @@ int amdgpu_info_ioctl(struct drm_device *dev, void
Re: [PATCH v2 2/3] drm/amdgpu: do not pass ttm_resource_manager to vram_mgr
Am 22.10.21 um 11:32 schrieb Nirmoy Das: Do not allow exported amdgpu_vram_mgr_*() to accept any ttm_resource_manager pointer. Also there is no need to force other module to call a ttm function just to eventually call vram_mgr functions. v2: pass adev's vram_mgr instead of adev Signed-off-by: Nirmoy Das Reviewed-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 3 +- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 5 +-- drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 10 ++--- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 6 +-- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 8 ++-- drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 5 +-- drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 40 7 files changed, 31 insertions(+), 46 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c index 7077f21f0021..df818e145d9a 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c @@ -531,9 +531,8 @@ int amdgpu_amdkfd_get_dmabuf_info(struct kgd_dev *kgd, int dma_buf_fd, uint64_t amdgpu_amdkfd_get_vram_usage(struct kgd_dev *kgd) { struct amdgpu_device *adev = (struct amdgpu_device *)kgd; - struct ttm_resource_manager *vram_man = ttm_manager_type(>mman.bdev, TTM_PL_VRAM); - return amdgpu_vram_mgr_usage(vram_man); + return amdgpu_vram_mgr_usage(>mman.vram_mgr); } uint64_t amdgpu_amdkfd_get_hive_id(struct kgd_dev *kgd) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c index 76fe5b71e35d..7e745164a624 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c @@ -298,7 +298,6 @@ static void amdgpu_cs_get_threshold_for_moves(struct amdgpu_device *adev, { s64 time_us, increment_us; u64 free_vram, total_vram, used_vram; - struct ttm_resource_manager *vram_man = ttm_manager_type(>mman.bdev, TTM_PL_VRAM); /* Allow a maximum of 200 accumulated ms. This is basically per-IB * throttling. * @@ -315,7 +314,7 @@ static void amdgpu_cs_get_threshold_for_moves(struct amdgpu_device *adev, } total_vram = adev->gmc.real_vram_size - atomic64_read(>vram_pin_size); - used_vram = amdgpu_vram_mgr_usage(vram_man); + used_vram = amdgpu_vram_mgr_usage(>mman.vram_mgr); free_vram = used_vram >= total_vram ? 0 : total_vram - used_vram; spin_lock(>mm_stats.lock); @@ -362,7 +361,7 @@ static void amdgpu_cs_get_threshold_for_moves(struct amdgpu_device *adev, if (!amdgpu_gmc_vram_full_visible(>gmc)) { u64 total_vis_vram = adev->gmc.visible_vram_size; u64 used_vis_vram = - amdgpu_vram_mgr_vis_usage(vram_man); + amdgpu_vram_mgr_vis_usage(>mman.vram_mgr); if (used_vis_vram < total_vis_vram) { u64 free_vis_vram = total_vis_vram - used_vis_vram; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c index 603ce32db5c5..b426e03ad630 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c @@ -672,10 +672,10 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file *filp) ui64 = atomic64_read(>num_vram_cpu_page_faults); return copy_to_user(out, , min(size, 8u)) ? -EFAULT : 0; case AMDGPU_INFO_VRAM_USAGE: - ui64 = amdgpu_vram_mgr_usage(ttm_manager_type(>mman.bdev, TTM_PL_VRAM)); + ui64 = amdgpu_vram_mgr_usage(>mman.vram_mgr); return copy_to_user(out, , min(size, 8u)) ? -EFAULT : 0; case AMDGPU_INFO_VIS_VRAM_USAGE: - ui64 = amdgpu_vram_mgr_vis_usage(ttm_manager_type(>mman.bdev, TTM_PL_VRAM)); + ui64 = amdgpu_vram_mgr_vis_usage(>mman.vram_mgr); return copy_to_user(out, , min(size, 8u)) ? -EFAULT : 0; case AMDGPU_INFO_GTT_USAGE: ui64 = amdgpu_gtt_mgr_usage(>mman.gtt_mgr); @@ -709,8 +709,6 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file *filp) } case AMDGPU_INFO_MEMORY: { struct drm_amdgpu_memory_info mem; - struct ttm_resource_manager *vram_man = - ttm_manager_type(>mman.bdev, TTM_PL_VRAM); struct ttm_resource_manager *gtt_man = ttm_manager_type(>mman.bdev, TTM_PL_TT); memset(, 0, sizeof(mem)); @@ -719,7 +717,7 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file *filp) atomic64_read(>vram_pin_size) - AMDGPU_VM_RESERVED_VRAM; mem.vram.heap_usage = - amdgpu_vram_mgr_usage(vram_man); + amdgpu_vram_mgr_usage(>mman.vram_mgr);
Re: [PATCH v3 1/3] drm/amdgpu: do not pass ttm_resource_manager to gtt_mgr
Am 22.10.21 um 11:32 schrieb Nirmoy Das: Do not allow exported amdgpu_gtt_mgr_*() to accept any ttm_resource_manager pointer. Also there is no need to force other module to call a ttm function just to eventually call gtt_mgr functions. v3: upcast mgr from ttm resopurce manager instead of getting it from adev. v2: pass adev's gtt_mgr instead of adev Signed-off-by: Nirmoy Das --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++-- drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c | 23 ++--- drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 4 ++-- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 4 ++-- 4 files changed, 17 insertions(+), 18 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 41ce86244144..2b53d86aebac 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -4287,7 +4287,7 @@ static int amdgpu_device_reset_sriov(struct amdgpu_device *adev, amdgpu_virt_init_data_exchange(adev); /* we need recover gart prior to run SMC/CP/SDMA resume */ - amdgpu_gtt_mgr_recover(ttm_manager_type(>mman.bdev, TTM_PL_TT)); + amdgpu_gtt_mgr_recover(>mman.gtt_mgr); r = amdgpu_device_fw_loading(adev); if (r) @@ -4604,7 +4604,7 @@ int amdgpu_do_asic_reset(struct list_head *device_list_handle, amdgpu_inc_vram_lost(tmp_adev); } - r = amdgpu_gtt_mgr_recover(ttm_manager_type(_adev->mman.bdev, TTM_PL_TT)); + r = amdgpu_gtt_mgr_recover(_adev->mman.gtt_mgr); if (r) goto out; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c index c18f16b3be9c..62e14efa61fe 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c @@ -77,10 +77,8 @@ static ssize_t amdgpu_mem_info_gtt_used_show(struct device *dev, { struct drm_device *ddev = dev_get_drvdata(dev); struct amdgpu_device *adev = drm_to_adev(ddev); - struct ttm_resource_manager *man; - man = ttm_manager_type(>mman.bdev, TTM_PL_TT); - return sysfs_emit(buf, "%llu\n", amdgpu_gtt_mgr_usage(man)); + return sysfs_emit(buf, "%llu\n", amdgpu_gtt_mgr_usage(>mman.gtt_mgr)); } static DEVICE_ATTR(mem_info_gtt_total, S_IRUGO, @@ -206,14 +204,15 @@ static void amdgpu_gtt_mgr_del(struct ttm_resource_manager *man, /** * amdgpu_gtt_mgr_usage - return usage of GTT domain * - * @man: TTM memory type manager + * @mgr: amdgpu_gtt_mgr pointer * * Return how many bytes are used in the GTT domain */ -uint64_t amdgpu_gtt_mgr_usage(struct ttm_resource_manager *man) +uint64_t amdgpu_gtt_mgr_usage(struct amdgpu_gtt_mgr *mgr) { - struct amdgpu_gtt_mgr *mgr = to_gtt_mgr(man); - s64 result = man->size - atomic64_read(>available); + s64 result; + + result = mgr->manager.size - atomic64_read(>available); return (result > 0 ? result : 0) * PAGE_SIZE; } @@ -221,16 +220,15 @@ uint64_t amdgpu_gtt_mgr_usage(struct ttm_resource_manager *man) /** * amdgpu_gtt_mgr_recover - re-init gart * - * @man: TTM memory type manager + * @mgr: amdgpu_gtt_mgr pointer * * Re-init the gart for each known BO in the GTT. */ -int amdgpu_gtt_mgr_recover(struct ttm_resource_manager *man) +int amdgpu_gtt_mgr_recover(struct amdgpu_gtt_mgr *mgr) { - struct amdgpu_gtt_mgr *mgr = to_gtt_mgr(man); - struct amdgpu_device *adev; struct amdgpu_gtt_node *node; struct drm_mm_node *mm_node; + struct amdgpu_device *adev; int r = 0; adev = container_of(mgr, typeof(*adev), mman.gtt_mgr); @@ -260,6 +258,7 @@ static void amdgpu_gtt_mgr_debug(struct ttm_resource_manager *man, struct drm_printer *printer) { struct amdgpu_gtt_mgr *mgr = to_gtt_mgr(man); + struct amdgpu_device *adev = container_of(mgr, typeof(*adev), mman.gtt_mgr); I think that variable is now unused, isn't it? Apart from that looks good to me now. Christian. spin_lock(>lock); drm_mm_print(>mm, printer); @@ -267,7 +266,7 @@ static void amdgpu_gtt_mgr_debug(struct ttm_resource_manager *man, drm_printf(printer, "man size:%llu pages, gtt available:%lld pages, usage:%lluMB\n", man->size, (u64)atomic64_read(>available), - amdgpu_gtt_mgr_usage(man) >> 20); + amdgpu_gtt_mgr_usage(mgr) >> 20); } static const struct ttm_resource_manager_func amdgpu_gtt_mgr_func = { diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c index d2955ea4a62b..603ce32db5c5 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c @@ -678,7
[PATCH v2 2/3] drm/amdgpu: do not pass ttm_resource_manager to vram_mgr
Do not allow exported amdgpu_vram_mgr_*() to accept any ttm_resource_manager pointer. Also there is no need to force other module to call a ttm function just to eventually call vram_mgr functions. v2: pass adev's vram_mgr instead of adev Signed-off-by: Nirmoy Das --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 3 +- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 5 +-- drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 10 ++--- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 6 +-- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 8 ++-- drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 5 +-- drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 40 7 files changed, 31 insertions(+), 46 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c index 7077f21f0021..df818e145d9a 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c @@ -531,9 +531,8 @@ int amdgpu_amdkfd_get_dmabuf_info(struct kgd_dev *kgd, int dma_buf_fd, uint64_t amdgpu_amdkfd_get_vram_usage(struct kgd_dev *kgd) { struct amdgpu_device *adev = (struct amdgpu_device *)kgd; - struct ttm_resource_manager *vram_man = ttm_manager_type(>mman.bdev, TTM_PL_VRAM); - return amdgpu_vram_mgr_usage(vram_man); + return amdgpu_vram_mgr_usage(>mman.vram_mgr); } uint64_t amdgpu_amdkfd_get_hive_id(struct kgd_dev *kgd) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c index 76fe5b71e35d..7e745164a624 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c @@ -298,7 +298,6 @@ static void amdgpu_cs_get_threshold_for_moves(struct amdgpu_device *adev, { s64 time_us, increment_us; u64 free_vram, total_vram, used_vram; - struct ttm_resource_manager *vram_man = ttm_manager_type(>mman.bdev, TTM_PL_VRAM); /* Allow a maximum of 200 accumulated ms. This is basically per-IB * throttling. * @@ -315,7 +314,7 @@ static void amdgpu_cs_get_threshold_for_moves(struct amdgpu_device *adev, } total_vram = adev->gmc.real_vram_size - atomic64_read(>vram_pin_size); - used_vram = amdgpu_vram_mgr_usage(vram_man); + used_vram = amdgpu_vram_mgr_usage(>mman.vram_mgr); free_vram = used_vram >= total_vram ? 0 : total_vram - used_vram; spin_lock(>mm_stats.lock); @@ -362,7 +361,7 @@ static void amdgpu_cs_get_threshold_for_moves(struct amdgpu_device *adev, if (!amdgpu_gmc_vram_full_visible(>gmc)) { u64 total_vis_vram = adev->gmc.visible_vram_size; u64 used_vis_vram = - amdgpu_vram_mgr_vis_usage(vram_man); + amdgpu_vram_mgr_vis_usage(>mman.vram_mgr); if (used_vis_vram < total_vis_vram) { u64 free_vis_vram = total_vis_vram - used_vis_vram; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c index 603ce32db5c5..b426e03ad630 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c @@ -672,10 +672,10 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file *filp) ui64 = atomic64_read(>num_vram_cpu_page_faults); return copy_to_user(out, , min(size, 8u)) ? -EFAULT : 0; case AMDGPU_INFO_VRAM_USAGE: - ui64 = amdgpu_vram_mgr_usage(ttm_manager_type(>mman.bdev, TTM_PL_VRAM)); + ui64 = amdgpu_vram_mgr_usage(>mman.vram_mgr); return copy_to_user(out, , min(size, 8u)) ? -EFAULT : 0; case AMDGPU_INFO_VIS_VRAM_USAGE: - ui64 = amdgpu_vram_mgr_vis_usage(ttm_manager_type(>mman.bdev, TTM_PL_VRAM)); + ui64 = amdgpu_vram_mgr_vis_usage(>mman.vram_mgr); return copy_to_user(out, , min(size, 8u)) ? -EFAULT : 0; case AMDGPU_INFO_GTT_USAGE: ui64 = amdgpu_gtt_mgr_usage(>mman.gtt_mgr); @@ -709,8 +709,6 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file *filp) } case AMDGPU_INFO_MEMORY: { struct drm_amdgpu_memory_info mem; - struct ttm_resource_manager *vram_man = - ttm_manager_type(>mman.bdev, TTM_PL_VRAM); struct ttm_resource_manager *gtt_man = ttm_manager_type(>mman.bdev, TTM_PL_TT); memset(, 0, sizeof(mem)); @@ -719,7 +717,7 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file *filp) atomic64_read(>vram_pin_size) - AMDGPU_VM_RESERVED_VRAM; mem.vram.heap_usage = - amdgpu_vram_mgr_usage(vram_man); + amdgpu_vram_mgr_usage(>mman.vram_mgr); mem.vram.max_allocation = mem.vram.usable_heap_size * 3 / 4;
[PATCH v3 3/3] drm/amdgpu: recover gart table at resume
Get rid off pin/unpin of gart BO at resume/suspend and instead pin only once and try to recover gart content at resume time. This is much more stable in case there is OOM situation at 2nd call to amdgpu_device_evict_resources() while evicting GART table. v3: remove gart recovery from other places v2: pin gart at amdgpu_gart_table_vram_alloc() Signed-off-by: Nirmoy Das --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 11 --- drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c | 80 ++ drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 3 +- drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c | 3 +- drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c | 3 +- drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c | 3 +- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 3 +- 7 files changed, 12 insertions(+), 94 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 2b53d86aebac..f0c70e9d37fb 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -3935,16 +3935,11 @@ int amdgpu_device_suspend(struct drm_device *dev, bool fbcon) if (!adev->in_s0ix) amdgpu_amdkfd_suspend(adev, adev->in_runpm); - /* First evict vram memory */ amdgpu_device_evict_resources(adev); amdgpu_fence_driver_hw_fini(adev); amdgpu_device_ip_suspend_phase2(adev); - /* This second call to evict device resources is to evict -* the gart page table using the CPU. -*/ - amdgpu_device_evict_resources(adev); return 0; } @@ -4286,8 +4281,6 @@ static int amdgpu_device_reset_sriov(struct amdgpu_device *adev, goto error; amdgpu_virt_init_data_exchange(adev); - /* we need recover gart prior to run SMC/CP/SDMA resume */ - amdgpu_gtt_mgr_recover(>mman.gtt_mgr); r = amdgpu_device_fw_loading(adev); if (r) @@ -4604,10 +4597,6 @@ int amdgpu_do_asic_reset(struct list_head *device_list_handle, amdgpu_inc_vram_lost(tmp_adev); } - r = amdgpu_gtt_mgr_recover(_adev->mman.gtt_mgr); - if (r) - goto out; - r = amdgpu_device_fw_loading(tmp_adev); if (r) return r; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c index d3e4203f6217..679eec122bb5 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c @@ -116,78 +116,16 @@ int amdgpu_gart_table_vram_alloc(struct amdgpu_device *adev) { int r; - if (adev->gart.bo == NULL) { - struct amdgpu_bo_param bp; - - memset(, 0, sizeof(bp)); - bp.size = adev->gart.table_size; - bp.byte_align = PAGE_SIZE; - bp.domain = AMDGPU_GEM_DOMAIN_VRAM; - bp.flags = AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED | - AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS; - bp.type = ttm_bo_type_kernel; - bp.resv = NULL; - bp.bo_ptr_size = sizeof(struct amdgpu_bo); - - r = amdgpu_bo_create(adev, , >gart.bo); - if (r) { - return r; - } - } - return 0; -} - -/** - * amdgpu_gart_table_vram_pin - pin gart page table in vram - * - * @adev: amdgpu_device pointer - * - * Pin the GART page table in vram so it will not be moved - * by the memory manager (pcie r4xx, r5xx+). These asics require the - * gart table to be in video memory. - * Returns 0 for success, error for failure. - */ -int amdgpu_gart_table_vram_pin(struct amdgpu_device *adev) -{ - int r; + if (adev->gart.bo != NULL) + return 0; - r = amdgpu_bo_reserve(adev->gart.bo, false); - if (unlikely(r != 0)) - return r; - r = amdgpu_bo_pin(adev->gart.bo, AMDGPU_GEM_DOMAIN_VRAM); + r = amdgpu_bo_create_kernel(adev, adev->gart.table_size, PAGE_SIZE, + AMDGPU_GEM_DOMAIN_VRAM, >gart.bo, + NULL, (void *)>gart.ptr); if (r) { - amdgpu_bo_unreserve(adev->gart.bo); return r; } - r = amdgpu_bo_kmap(adev->gart.bo, >gart.ptr); - if (r) - amdgpu_bo_unpin(adev->gart.bo); - amdgpu_bo_unreserve(adev->gart.bo); - return r; -} - -/** - * amdgpu_gart_table_vram_unpin - unpin gart page table in vram - * - * @adev: amdgpu_device pointer - * - * Unpin the GART page table in vram (pcie r4xx, r5xx+). - * These asics require the gart table to be in video memory. - */ -void amdgpu_gart_table_vram_unpin(struct amdgpu_device *adev) -{ - int r; - - if (adev->gart.bo == NULL) { -
[PATCH v3 1/3] drm/amdgpu: do not pass ttm_resource_manager to gtt_mgr
Do not allow exported amdgpu_gtt_mgr_*() to accept any ttm_resource_manager pointer. Also there is no need to force other module to call a ttm function just to eventually call gtt_mgr functions. v3: upcast mgr from ttm resopurce manager instead of getting it from adev. v2: pass adev's gtt_mgr instead of adev Signed-off-by: Nirmoy Das --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++-- drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c | 23 ++--- drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 4 ++-- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 4 ++-- 4 files changed, 17 insertions(+), 18 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 41ce86244144..2b53d86aebac 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -4287,7 +4287,7 @@ static int amdgpu_device_reset_sriov(struct amdgpu_device *adev, amdgpu_virt_init_data_exchange(adev); /* we need recover gart prior to run SMC/CP/SDMA resume */ - amdgpu_gtt_mgr_recover(ttm_manager_type(>mman.bdev, TTM_PL_TT)); + amdgpu_gtt_mgr_recover(>mman.gtt_mgr); r = amdgpu_device_fw_loading(adev); if (r) @@ -4604,7 +4604,7 @@ int amdgpu_do_asic_reset(struct list_head *device_list_handle, amdgpu_inc_vram_lost(tmp_adev); } - r = amdgpu_gtt_mgr_recover(ttm_manager_type(_adev->mman.bdev, TTM_PL_TT)); + r = amdgpu_gtt_mgr_recover(_adev->mman.gtt_mgr); if (r) goto out; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c index c18f16b3be9c..62e14efa61fe 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c @@ -77,10 +77,8 @@ static ssize_t amdgpu_mem_info_gtt_used_show(struct device *dev, { struct drm_device *ddev = dev_get_drvdata(dev); struct amdgpu_device *adev = drm_to_adev(ddev); - struct ttm_resource_manager *man; - man = ttm_manager_type(>mman.bdev, TTM_PL_TT); - return sysfs_emit(buf, "%llu\n", amdgpu_gtt_mgr_usage(man)); + return sysfs_emit(buf, "%llu\n", amdgpu_gtt_mgr_usage(>mman.gtt_mgr)); } static DEVICE_ATTR(mem_info_gtt_total, S_IRUGO, @@ -206,14 +204,15 @@ static void amdgpu_gtt_mgr_del(struct ttm_resource_manager *man, /** * amdgpu_gtt_mgr_usage - return usage of GTT domain * - * @man: TTM memory type manager + * @mgr: amdgpu_gtt_mgr pointer * * Return how many bytes are used in the GTT domain */ -uint64_t amdgpu_gtt_mgr_usage(struct ttm_resource_manager *man) +uint64_t amdgpu_gtt_mgr_usage(struct amdgpu_gtt_mgr *mgr) { - struct amdgpu_gtt_mgr *mgr = to_gtt_mgr(man); - s64 result = man->size - atomic64_read(>available); + s64 result; + + result = mgr->manager.size - atomic64_read(>available); return (result > 0 ? result : 0) * PAGE_SIZE; } @@ -221,16 +220,15 @@ uint64_t amdgpu_gtt_mgr_usage(struct ttm_resource_manager *man) /** * amdgpu_gtt_mgr_recover - re-init gart * - * @man: TTM memory type manager + * @mgr: amdgpu_gtt_mgr pointer * * Re-init the gart for each known BO in the GTT. */ -int amdgpu_gtt_mgr_recover(struct ttm_resource_manager *man) +int amdgpu_gtt_mgr_recover(struct amdgpu_gtt_mgr *mgr) { - struct amdgpu_gtt_mgr *mgr = to_gtt_mgr(man); - struct amdgpu_device *adev; struct amdgpu_gtt_node *node; struct drm_mm_node *mm_node; + struct amdgpu_device *adev; int r = 0; adev = container_of(mgr, typeof(*adev), mman.gtt_mgr); @@ -260,6 +258,7 @@ static void amdgpu_gtt_mgr_debug(struct ttm_resource_manager *man, struct drm_printer *printer) { struct amdgpu_gtt_mgr *mgr = to_gtt_mgr(man); + struct amdgpu_device *adev = container_of(mgr, typeof(*adev), mman.gtt_mgr); spin_lock(>lock); drm_mm_print(>mm, printer); @@ -267,7 +266,7 @@ static void amdgpu_gtt_mgr_debug(struct ttm_resource_manager *man, drm_printf(printer, "man size:%llu pages, gtt available:%lld pages, usage:%lluMB\n", man->size, (u64)atomic64_read(>available), - amdgpu_gtt_mgr_usage(man) >> 20); + amdgpu_gtt_mgr_usage(mgr) >> 20); } static const struct ttm_resource_manager_func amdgpu_gtt_mgr_func = { diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c index d2955ea4a62b..603ce32db5c5 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c @@ -678,7 +678,7 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file *filp) ui64 = amdgpu_vram_mgr_vis_usage(ttm_manager_type(>mman.bdev,
Re: [PATCH] drm/amd/amdgpu: fix potential bad job hw_fence underflow
ping On 2021/10/22 AM11:33, Jingwen Chen wrote: > [Why] > In advance tdr mode, the real bad job will be resubmitted twice, while > in drm_sched_resubmit_jobs_ext, there's a dma_fence_put, so the bad job > is put one more time than other jobs. > > [How] > Adding dma_fence_get before resbumit job in > amdgpu_device_recheck_guilty_jobs and put the fence for normal jobs > > Signed-off-by: Jingwen Chen > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 > 1 file changed, 4 insertions(+) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > index 41ce86244144..975f069f6fe8 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > @@ -4841,6 +4841,9 @@ static void amdgpu_device_recheck_guilty_jobs( > > /* clear job's guilty and depend the folowing step to decide > the real one */ > drm_sched_reset_karma(s_job); > + /* for the real bad job, it will be resubmitted twice, adding a > dma_fence_get > + * to make sure fence is balanced */ > + dma_fence_get(s_job->s_fence->parent); > drm_sched_resubmit_jobs_ext(>sched, 1); > > ret = dma_fence_wait_timeout(s_job->s_fence->parent, false, > ring->sched.timeout); > @@ -4876,6 +4879,7 @@ static void amdgpu_device_recheck_guilty_jobs( > > /* got the hw fence, signal finished fence */ > atomic_dec(ring->sched.score); > + dma_fence_put(s_job->s_fence->parent); > dma_fence_get(_job->s_fence->finished); > dma_fence_signal(_job->s_fence->finished); > dma_fence_put(_job->s_fence->finished);
[PATCH] drm/amd/pm: Disable fan control if not supported
On arcturus, not all platforms use PMFW based fan control. On such ASICs fan control by PMFW will be disabled in PPTable. Disable hwmon knobs for fan control also as it is not possible to report or control fan speed on such platforms through driver. Signed-off-by: Lijo Lazar --- drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c | 14 ++ 1 file changed, 14 insertions(+) diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c index 082f01893f3d..fd1d30a93db5 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c @@ -436,6 +436,19 @@ static void arcturus_check_bxco_support(struct smu_context *smu) } } +static void arcturus_check_fan_support(struct smu_context *smu) +{ + struct smu_table_context *table_context = >smu_table; + PPTable_t *pptable = table_context->driver_pptable; + + /* No sort of fan control possible if PPTable has it disabled */ + smu->adev->pm.no_fan = + !(pptable->FeaturesToRun[0] & FEATURE_FAN_CONTROL_MASK); + if (smu->adev->pm.no_fan) + dev_info_once(smu->adev->dev, + "PMFW based fan control disabled"); +} + static int arcturus_check_powerplay_table(struct smu_context *smu) { struct smu_table_context *table_context = >smu_table; @@ -443,6 +456,7 @@ static int arcturus_check_powerplay_table(struct smu_context *smu) table_context->power_play_table; arcturus_check_bxco_support(smu); + arcturus_check_fan_support(smu); table_context->thermal_controller_type = powerplay_table->thermal_controller_type; -- 2.17.1