RE: [PATCH] drm/ttm: Fix the memory delay free issue
Hi David,

You are right, it will copy the per-VM resv. But it still has the delayed-free issue, which non-per-VM BOs don't have. Maybe new fences have already been appended to this resv object before the copy.

Hi Christian,

Do you have any suggestion about this? For per-VM BOs, it seems freeing the ttm BO is always delayed.

Best wishes
Emily Deng

> -----Original Message-----
> From: Zhou, David(ChunMing)
> Sent: Wednesday, July 10, 2019 9:28 PM
> To: Deng, Emily ; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH] drm/ttm: Fix the memory delay free issue
>
> It doesn't make sense that freeing a BO still uses the per-vm resv.
>
> I remember that when a BO is on the release list, its resv is a copy of the per-vm resv.
> Could you check it?
>
> -David
>
> On 2019/7/10 17:29, Emily Deng wrote:
>> The vulkan CTS allocation test cases create a series of BOs and then
>> free them. Since there are lots of allocation test cases using the
>> same VM, and the per-VM-BO feature is enabled, all of those BOs share
>> the same resv object. But freeing the BOs is quite slow: every time a
>> BO is freed, the code checks whether the shared resv has signaled, and
>> only frees the BO if it has. As the test cases keep creating BOs, the
>> number of fences on the resv keeps growing, so freeing becomes slower
>> than creating. This eventually exhausts memory.
>>
>> Method:
>> When the resv signals, release all the BOs which use the same resv
>> object.
>>
>> Signed-off-by: Emily Deng
>> ---
>>  drivers/gpu/drm/ttm/ttm_bo.c | 29 ++++++++++++++++++++++++-----
>>  1 file changed, 24 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
>> index f9a3d4c..57ec59b 100644
>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>> @@ -543,6 +543,7 @@ static int ttm_bo_cleanup_refs(struct ttm_buffer_object *bo,
>>  {
>>  	struct ttm_bo_global *glob = bo->bdev->glob;
>>  	struct reservation_object *resv;
>> +	struct ttm_buffer_object *resv_bo, *resv_bo_next;
>>  	int ret;
>>
>>  	if (unlikely(list_empty(&bo->ddestroy)))
>> @@ -566,10 +567,14 @@ static int ttm_bo_cleanup_refs(struct ttm_buffer_object *bo,
>>  						 interruptible,
>>  						 30 * HZ);
>>
>> -		if (lret < 0)
>> +		if (lret < 0) {
>> +			kref_put(&bo->list_kref, ttm_bo_release_list);
>>  			return lret;
>> -		else if (lret == 0)
>> +		} else if (lret == 0) {
>> +			kref_put(&bo->list_kref, ttm_bo_release_list);
>>  			return -EBUSY;
>> +		}
>>
>>  		spin_lock(&glob->lru_lock);
>>  		if (unlock_resv && !kcl_reservation_object_trylock(bo->resv)) {
>> @@ -582,6 +587,7 @@ static int ttm_bo_cleanup_refs(struct ttm_buffer_object *bo,
>>  			 * here.
>>  			 */
>>  			spin_unlock(&glob->lru_lock);
>> +			kref_put(&bo->list_kref, ttm_bo_release_list);
>>  			return 0;
>>  		}
>>  		ret = 0;
>> @@ -591,15 +597,29 @@ static int ttm_bo_cleanup_refs(struct ttm_buffer_object *bo,
>>  		if (unlock_resv)
>>  			kcl_reservation_object_unlock(bo->resv);
>>  		spin_unlock(&glob->lru_lock);
>> +		kref_put(&bo->list_kref, ttm_bo_release_list);
>>  		return ret;
>>  	}
>>
>>  	ttm_bo_del_from_lru(bo);
>>  	list_del_init(&bo->ddestroy);
>>  	kref_put(&bo->list_kref, ttm_bo_ref_bug);
>> -
>>  	spin_unlock(&glob->lru_lock);
>>  	ttm_bo_cleanup_memtype_use(bo);
>> +	kref_put(&bo->list_kref, ttm_bo_release_list);
>> +
>> +	spin_lock(&glob->lru_lock);
>> +	list_for_each_entry_safe(resv_bo, resv_bo_next, &bo->bdev->ddestroy, ddestroy) {
>> +		if (resv_bo->resv == bo->resv) {
>> +			ttm_bo_del_from_lru(resv_bo);
>> +			list_del_init(&resv_bo->ddestroy);
>> +			spin_unlock(&glob->lru_lock);
>> +			ttm_bo_cleanup_memtype_use(resv_bo);
>> +			kref_put(&resv_bo->list_kref, ttm_bo_release_list);
>> +			spin_lock(&glob->lru_lock);
>> +		}
>> +	}
>> +	spin_unlock(&glob->lru_lock);
>>
>>  	if (unlock_resv)
>>  		kcl_reservation_object_unlock(bo->resv);
>> @@ -639,9 +659,8 @@ static bool ttm_bo_delayed_delete(struct ttm_bo_device *bdev, bool remove_all)
>>  			ttm_bo_cleanup_refs(bo, false, !remove_all, true);
>>  		} else {
>>  			spin_unlock(&glob->lru_lock);
>> +			kref_put(&bo->list_kref, ttm_bo_release_list);
>>  		}
>> -
>> -		kref_put(&bo->list_kref, ttm_bo_release_list);
>>  		spin_lock(&glob->lru_lock);
>>  	}
>>  	list_splice_tail(&removed, &bdev->ddestroy);
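The core idea of the patch — once one BO's shared reservation object has signaled, sweep the delayed-destroy list and free every other BO sharing that reservation, instead of revisiting them one at a time — can be sketched as a minimal userspace model. This is a hypothetical simplification: the `struct resv`/`struct bo` types and the singly-linked list stand in for the real reservation objects, kref accounting, and LRU locking.

```c
#include <assert.h>
#include <stddef.h>

struct resv { int signaled; };          /* stand-in for reservation_object */

struct bo {
    struct resv *resv;                  /* shared per-VM reservation */
    int freed;
    struct bo *next;                    /* next entry on the delayed-destroy list */
};

/* If bo's reservation has signaled, free bo AND every other list entry
 * sharing the same reservation object (the batched sweep the patch adds).
 * Returns the number of BOs freed in this pass. */
int cleanup_refs_batched(struct bo *list, struct bo *bo)
{
    int freed = 0;

    if (!bo->resv->signaled)
        return 0;                       /* still busy: nothing can be freed yet */

    for (struct bo *it = list; it != NULL; it = it->next) {
        if (it->resv == bo->resv && !it->freed) {
            it->freed = 1;              /* stands in for ttm_bo_cleanup_memtype_use() */
            freed++;
        }
    }
    return freed;
}
```

With many BOs sharing one reservation, one signal now frees the whole group in a single pass, so freeing can keep up with allocation.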
Re: [PATCH 27/27] drm/amdgpu: Fix GTT size calculation
On 13.07.19 at 22:24, Felix Kuehling wrote:

On 2019-04-30 at 1:03 p.m., Koenig, Christian wrote: The only real solution I can see is to be able to reliably kill shaders in an OOM situation.

Well, we can in fact preempt our compute shaders with low latency. Killing a KFD process will do exactly that.

I've taken a look at that as well, and to be honest it is not even remotely sufficient. We need something which stops the hardware *immediately* from accessing system memory, and does not wait for the SQ to kill all waves, flush caches, etc...

One possibility I've been playing around with for a while is to replace the root PD for the VMIDs in question on the fly. E.g. we just let it point to some dummy which redirects everything into nirvana. But implementing this is easier said than done...

Warming up this thread, since I just fixed another bug that was enabled by artificial memory pressure due to the GTT limit.

I think disabling the PD for the VMIDs is a good idea. A problem is that the HWS firmware updates PD pointers in the background for its VMIDs, so this would require a reliable and fast way to kill the HWS first.

Well, we don't necessarily need to completely kill the HWS. What we need is to suspend it, kill a specific process, and resume it later on.

As far as I can see, the concept behind the HWS interaction was to use a ring buffer with async feedback when something is done. That is really convenient for performant and reliable operation, but unfortunately not when you need to kill off some processing immediately.

So what is needed here is something like setting a bit in a register to suspend the HWS, killing the VMIDs, setting a flag in the HWS runlist to stop it from scheduling the specific process once more, and then resuming the HWS.

An alternative I thought about is disabling bus access at the BIF level, if that's possible somehow.
Basically we would instantaneously kill all GPU system memory access, signal all fences or just remove all fences from all BO reservations (reservation_object_add_excl_fence(resv, NULL)) to allow memory to be freed, let the OOM killer do its thing, and when the dust settles, reset the GPU.

Yeah, I thought about that as well. The problem with this approach is that it is rather invasive. E.g. stopping the BIF means stopping it for everybody, not just the process which is currently being killed, and when we reset the GPU it is actually quite likely that we lose the contents of VRAM.

Regards,
Christian.

Regards,
Felix

Regards,
Christian.

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
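The suspend/kill/resume sequence Christian describes can be sketched as a toy state model. This is entirely hypothetical: none of the fields below correspond to real registers, runlist formats, or firmware interfaces — it only captures the ordering (stop the scheduler first, invalidate the VMIDs, block the process in the runlist, then resume for everyone else).

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NUM_VMIDS 16

/* Toy model of the hardware scheduler state (all names are invented). */
struct hws_model {
    bool suspended;
    bool vmid_valid[MODEL_NUM_VMIDS];       /* root PD pointer still valid?   */
    bool runlist_blocked[MODEL_NUM_VMIDS];  /* process barred from scheduling */
};

/* Kill one process without tearing down the whole scheduler. */
int hws_kill_process(struct hws_model *hws, int vmid)
{
    if (vmid < 0 || vmid >= MODEL_NUM_VMIDS)
        return -1;

    hws->suspended = true;              /* 1. suspend the scheduler first      */
    hws->vmid_valid[vmid] = false;      /* 2. redirect the VMID "into nirvana" */
    hws->runlist_blocked[vmid] = true;  /* 3. never schedule this process again */
    hws->suspended = false;             /* 4. resume for everyone else         */
    return 0;
}
```

The point of the ordering is that steps 2 and 3 only happen while the scheduler is quiesced, so the firmware cannot rewrite the PD pointer behind the driver's back.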
[PATCH v3 05/24] drm/amdgpu: remove memset after kzalloc
kzalloc has already zeroed the memory during the allocation, so the following memset is unneeded.

Signed-off-by: Fuqian Huang
---
Changes in v3:
  - Fix subject prefix: gpu/drm -> drm/amdgpu

 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_irq.c       | 2 --
 drivers/gpu/drm/amd/powerplay/hwmgr/process_pptables_v1_0.c | 2 --
 drivers/gpu/drm/amd/powerplay/smumgr/ci_smumgr.c            | 2 --
 drivers/gpu/drm/amd/powerplay/smumgr/iceland_smumgr.c       | 2 --
 drivers/gpu/drm/amd/powerplay/smumgr/tonga_smumgr.c         | 2 --
 5 files changed, 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_irq.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_irq.c
index fd22b4474dbf..4e6da61d1a93 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_irq.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_irq.c
@@ -279,8 +279,6 @@ void *amdgpu_dm_irq_register_interrupt(struct amdgpu_device *adev,
 		return DAL_INVALID_IRQ_HANDLER_IDX;
 	}

-	memset(handler_data, 0, sizeof(*handler_data));
-
 	init_handler_common_data(handler_data, ih, handler_args, &adev->dm);

 	irq_source = int_params->irq_source;
diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/process_pptables_v1_0.c b/drivers/gpu/drm/amd/powerplay/hwmgr/process_pptables_v1_0.c
index 1cd5a8b5cdc1..b760f95e7fa7 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/process_pptables_v1_0.c
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/process_pptables_v1_0.c
@@ -1067,8 +1067,6 @@ static int pp_tables_v1_0_initialize(struct pp_hwmgr *hwmgr)
 	PP_ASSERT_WITH_CODE((NULL != hwmgr->pptable),
 			    "Failed to allocate hwmgr->pptable!", return -ENOMEM);

-	memset(hwmgr->pptable, 0x00, sizeof(struct phm_ppt_v1_information));
-
 	powerplay_table = get_powerplay_table(hwmgr);

 	PP_ASSERT_WITH_CODE((NULL != powerplay_table),
diff --git a/drivers/gpu/drm/amd/powerplay/smumgr/ci_smumgr.c b/drivers/gpu/drm/amd/powerplay/smumgr/ci_smumgr.c
index 669bd0c2a16c..d55e264c5df5 100644
--- a/drivers/gpu/drm/amd/powerplay/smumgr/ci_smumgr.c
+++ b/drivers/gpu/drm/amd/powerplay/smumgr/ci_smumgr.c
@@ -2702,8 +2702,6 @@ static int ci_initialize_mc_reg_table(struct pp_hwmgr *hwmgr)
 	cgs_write_register(hwmgr->device, mmMC_SEQ_PMG_CMD_MRS2_LP, cgs_read_register(hwmgr->device, mmMC_PMG_CMD_MRS2));
 	cgs_write_register(hwmgr->device, mmMC_SEQ_WR_CTL_2_LP, cgs_read_register(hwmgr->device, mmMC_SEQ_WR_CTL_2));

-	memset(table, 0x00, sizeof(pp_atomctrl_mc_reg_table));
-
 	result = atomctrl_initialize_mc_reg_table(hwmgr, module_index, table);

 	if (0 == result)
diff --git a/drivers/gpu/drm/amd/powerplay/smumgr/iceland_smumgr.c b/drivers/gpu/drm/amd/powerplay/smumgr/iceland_smumgr.c
index 375ccf6ff5f2..c123b4d9c621 100644
--- a/drivers/gpu/drm/amd/powerplay/smumgr/iceland_smumgr.c
+++ b/drivers/gpu/drm/amd/powerplay/smumgr/iceland_smumgr.c
@@ -2631,8 +2631,6 @@ static int iceland_initialize_mc_reg_table(struct pp_hwmgr *hwmgr)
 	cgs_write_register(hwmgr->device, mmMC_SEQ_PMG_CMD_MRS2_LP, cgs_read_register(hwmgr->device, mmMC_PMG_CMD_MRS2));
 	cgs_write_register(hwmgr->device, mmMC_SEQ_WR_CTL_2_LP, cgs_read_register(hwmgr->device, mmMC_SEQ_WR_CTL_2));

-	memset(table, 0x00, sizeof(pp_atomctrl_mc_reg_table));
-
 	result = atomctrl_initialize_mc_reg_table(hwmgr, module_index, table);

 	if (0 == result)
diff --git a/drivers/gpu/drm/amd/powerplay/smumgr/tonga_smumgr.c b/drivers/gpu/drm/amd/powerplay/smumgr/tonga_smumgr.c
index 3ed6c5f1e5cf..60462c7211e3 100644
--- a/drivers/gpu/drm/amd/powerplay/smumgr/tonga_smumgr.c
+++ b/drivers/gpu/drm/amd/powerplay/smumgr/tonga_smumgr.c
@@ -3114,8 +3114,6 @@ static int tonga_initialize_mc_reg_table(struct pp_hwmgr *hwmgr)
 	cgs_write_register(hwmgr->device, mmMC_SEQ_WR_CTL_2_LP,
 			cgs_read_register(hwmgr->device, mmMC_SEQ_WR_CTL_2));

-	memset(table, 0x00, sizeof(pp_atomctrl_mc_reg_table));
-
 	result = atomctrl_initialize_mc_reg_table(hwmgr, module_index, table);

 	if (!result)
--
2.11.0
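The rationale behind the patch — `kzalloc(size, flags)` is `kmalloc()` with `__GFP_ZERO`, so the returned block is already zero-filled and a following `memset(..., 0, ...)` does no additional work — can be demonstrated with the userspace analogue `calloc()`. The `struct handler_data` below is a made-up stand-in, not the driver's real type.

```c
#include <assert.h>
#include <stdlib.h>
#include <stddef.h>

/* Hypothetical stand-in for a driver allocation. */
struct handler_data {
    int irq_source;
    void *ctx;
    char buf[32];
};

/* calloc() plays the role of kzalloc() here: the returned memory is
 * guaranteed zero-filled, so memset(h, 0, sizeof(*h)) afterwards is redundant. */
struct handler_data *alloc_handler(void)
{
    return calloc(1, sizeof(struct handler_data));
}

/* Check every byte of a buffer for zero. */
int is_all_zero(const void *p, size_t n)
{
    const unsigned char *b = p;
    for (size_t i = 0; i < n; i++)
        if (b[i] != 0)
            return 0;
    return 1;
}
```

Removing the redundant memset is purely a cleanup: behavior is unchanged, and it saves one pass over the freshly allocated memory.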
Re: [GIT PULL] Please pull hmm changes
The pull request you sent on Tue, 9 Jul 2019 19:24:21 +:

> git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git tags/for-linus-hmm

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/fec88ab0af9706b2201e5daf377c5031c62d11f7

Thank you!

--
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker
Re: [GIT PULL] Please pull hmm changes
On Tue, Jul 9, 2019 at 12:24 PM Jason Gunthorpe wrote: > > I'm sending it early as it is now a dependency for several patches in > mm's quilt. .. but I waited to merge it until I had time to review it more closely, because I expected the review to be painful. I'm happy to say that I was overly pessimistic, and that instead of finding things to hate, I found it all looking good. Particularly the whole "use reference counts properly, so that lifetimes make sense and all those nasty cases can't happen" parts. It's all merged, just waiting for the test-build to verify that I didn't miss anything (well, at least nothing obvious). Linus
RE: [PATCH v2] drm/amd/powerplay: add helper of smu_clk_dpm_is_enabled for smu
Usually an *_is_enabled API should return either 0 or 1. Alternatively, you could give the API a different name. With that addressed, the patch is reviewed-by: Evan Quan

Regards,
Evan

> -----Original Message-----
> From: Wang, Kevin(Yang)
> Sent: Friday, July 12, 2019 6:00 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Feng, Kenneth ; Quan, Evan ; Wang, Kevin(Yang)
> Subject: [PATCH v2] drm/amd/powerplay: add helper of smu_clk_dpm_is_enabled for smu
>
> v2: change function name to smu_clk_dpm_is_enabled.
> add this helper function to check whether a dpm clk feature is enabled.
>
> Change-Id: I7f9949033c318fec618a9701df4a082d54a626c8
> Signed-off-by: Kevin Wang
> ---
>  drivers/gpu/drm/amd/powerplay/amdgpu_smu.c    | 69 +++++++++++-------
>  .../gpu/drm/amd/powerplay/inc/amdgpu_smu.h    |  1 +
>  2 files changed, 47 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
> index 787a293fde97..c16195e19078 100644
> --- a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
> +++ b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
> @@ -69,6 +69,10 @@ int smu_set_soft_freq_range(struct smu_context *smu, enum smu_clk_type clk_type,
>  	if (min <= 0 && max <= 0)
>  		return -EINVAL;
>
> +	ret = smu_clk_dpm_is_enabled(smu, clk_type);
> +	if (ret)
> +		return ret;
> +
>  	clk_id = smu_clk_get_index(smu, clk_type);
>  	if (clk_id < 0)
>  		return clk_id;
> @@ -102,6 +106,10 @@ int smu_set_hard_freq_range(struct smu_context *smu, enum smu_clk_type clk_type,
>  	if (min <= 0 && max <= 0)
>  		return -EINVAL;
>
> +	ret = smu_clk_dpm_is_enabled(smu, clk_type);
> +	if (ret)
> +		return ret;
> +
>  	clk_id = smu_clk_get_index(smu, clk_type);
>  	if (clk_id < 0)
>  		return clk_id;
> @@ -135,29 +143,9 @@ int smu_get_dpm_freq_range(struct smu_context *smu, enum smu_clk_type clk_type,
>  	if (!min && !max)
>  		return -EINVAL;
>
> -	switch (clk_type) {
> -	case SMU_MCLK:
> -	case SMU_UCLK:
> -		if (!smu_feature_is_enabled(smu, SMU_FEATURE_DPM_UCLK_BIT)) {
> -			pr_warn("uclk dpm is not enabled\n");
> -			return 0;
> -		}
> -		break;
> -	case SMU_GFXCLK:
> -	case SMU_SCLK:
> -		if (!smu_feature_is_enabled(smu, SMU_FEATURE_DPM_GFXCLK_BIT)) {
> -			pr_warn("gfxclk dpm is not enabled\n");
> -			return 0;
> -		}
> -	case SMU_SOCCLK:
> -		if (!smu_feature_is_enabled(smu, SMU_FEATURE_DPM_SOCCLK_BIT)) {
> -			pr_warn("sockclk dpm is not enabled\n");
> -			return 0;
> -		}
> -		break;
> -	default:
> -		break;
> -	}
> +	ret = smu_clk_dpm_is_enabled(smu, clk_type);
> +	if (ret)
> +		return ret;
>
>  	mutex_lock(&smu->mutex);
>  	clk_id = smu_clk_get_index(smu, clk_type);
> @@ -200,6 +188,10 @@ int smu_get_dpm_freq_by_index(struct smu_context *smu, enum smu_clk_type clk_typ
>  	if (!value)
>  		return -EINVAL;
>
> +	ret = smu_clk_dpm_is_enabled(smu, clk_type);
> +	if (ret)
> +		return ret;
> +
>  	clk_id = smu_clk_get_index(smu, clk_type);
>  	if (clk_id < 0)
>  		return clk_id;
> @@ -228,6 +220,37 @@ int smu_get_dpm_level_count(struct smu_context *smu, enum smu_clk_type clk_type,
>  	return smu_get_dpm_freq_by_index(smu, clk_type, 0xff, value);
>  }
>
> +int smu_clk_dpm_is_enabled(struct smu_context *smu, enum smu_clk_type clk_type)
> +{
> +	int ret = 0;
> +	enum smu_feature_mask feature_id = 0;
> +
> +	switch (clk_type) {
> +	case SMU_MCLK:
> +	case SMU_UCLK:
> +		feature_id = SMU_FEATURE_DPM_UCLK_BIT;
> +		break;
> +	case SMU_GFXCLK:
> +	case SMU_SCLK:
> +		feature_id = SMU_FEATURE_DPM_GFXCLK_BIT;
> +		break;
> +	case SMU_SOCCLK:
> +		feature_id = SMU_FEATURE_DPM_SOCCLK_BIT;
> +		break;
> +	default:
> +		return 0;
> +	}
> +
> +	ret = smu_feature_is_enabled(smu, feature_id);
> +	if (ret) {
> +		pr_warn("smu %d clk dpm feature %d is not enabled\n", clk_type, feature_id);
> +		return -EACCES;
> +	}
> +
> +	return ret;
> +}
> +
> +
>  int smu_dpm_set_power_gate(struct smu_context *smu, uint32_t block_type,
>  			   bool gate)
>  {
> diff --git a/drivers/gpu/drm/amd/powerplay/inc/amdgpu_smu.h b/drivers/gpu/drm/amd/powerplay/inc/amdgpu_smu.h
> index c97324ef7db2..4629a64a90ed 100644
> --- a/drivers/gpu/drm/amd/powerplay/inc/amdgpu_smu.h
> +++ b/drivers/gpu/drm/amd/powerplay/inc/amdgpu_smu.h
> @@ -973,5 +973,6 @@ int smu_set_hard_freq_range(struct smu_context *smu, enum smu_clk_type clk_type,
>  enum amd_dpm_forced_level smu_get_performance_level
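Evan's review point — a `*_is_enabled()` name reads as a predicate, so it should return 0/1 (or `bool`), while a helper using the errno convention (0 on success, negative on failure) deserves a different name — can be illustrated with a small standalone sketch. The feature-mask variable and both function names below are hypothetical, not the driver's real API.

```c
#include <assert.h>
#include <stdbool.h>
#include <errno.h>

/* Stand-in for the SMU feature-enable bitmask (hypothetical). */
unsigned long enabled_features;

/* Predicate style: the name promises a 0/1 answer, and it delivers one. */
bool clk_dpm_is_enabled(unsigned int feature_bit)
{
    return (enabled_features >> feature_bit) & 1UL;
}

/* Errno style: returns 0 when usable, -EACCES when not.  Because the return
 * convention differs, the name says "check" rather than "is". */
int clk_dpm_check_enabled(unsigned int feature_bit)
{
    return clk_dpm_is_enabled(feature_bit) ? 0 : -EACCES;
}
```

Callers that want to bail out early use the check-style helper directly (`ret = clk_dpm_check_enabled(...); if (ret) return ret;`), which is the usage pattern the patch adds to the `smu_*_freq_*` functions.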
Re: [PATCH v4] drm/amdgpu: fix scheduler timeout calc
Reviewed-by: Christian König

Sorry for the delay, I've been on vacation for two weeks.

Christian.

On 05.07.19 at 07:37, Xu, Feifei wrote:

Reviewed-by: feifei...@amd.com

-----Original Message-----
From: amd-gfx On Behalf Of Cui, Flora
Sent: Monday, July 1, 2019 11:37 AM
To: amd-gfx@lists.freedesktop.org
Cc: Cui, Flora
Subject: [PATCH v4] drm/amdgpu: fix scheduler timeout calc

The scheduler timeout is in jiffies.

v2: move the timeout check to amdgpu_device_get_job_timeout_settings after parsing the value
v3: add a lockup_timeout param check. 0: keep default value. negative: infinite timeout.
v4: refactor the code.

Change-Id: I26708c163db943ff8d930dd81bcab4b4b9d84eb2
Signed-off-by: Flora Cui
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index e74a175..e448f8e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -245,7 +245,8 @@ module_param_named(msi, amdgpu_msi, int, 0444);
  * By default(with no lockup_timeout settings), the timeout for all non-compute(GFX, SDMA and Video)
  * jobs is 10000. And there is no timeout enforced on compute jobs.
  */
-MODULE_PARM_DESC(lockup_timeout, "GPU lockup timeout in ms (default: 10000 for non-compute jobs and no timeout for compute jobs), "
+MODULE_PARM_DESC(lockup_timeout, "GPU lockup timeout in ms (default: 10000 for non-compute jobs and infinity timeout for compute jobs."
+		 " 0: keep default value. negative: infinity timeout), "
 		 "format is [Non-Compute] or [GFX,Compute,SDMA,Video]");
 module_param_string(lockup_timeout, amdgpu_lockup_timeout, sizeof(amdgpu_lockup_timeout), 0444);

@@ -1300,7 +1301,8 @@ int amdgpu_device_get_job_timeout_settings(struct amdgpu_device *adev)
 	 * By default timeout for non compute jobs is 10000.
 	 * And there is no timeout enforced on compute jobs.
 	 */
-	adev->gfx_timeout = adev->sdma_timeout = adev->video_timeout = 10000;
+	adev->gfx_timeout = msecs_to_jiffies(10000);
+	adev->sdma_timeout = adev->video_timeout = adev->gfx_timeout;
 	adev->compute_timeout = MAX_SCHEDULE_TIMEOUT;

 	if (strnlen(input, AMDGPU_MAX_TIMEOUT_PARAM_LENTH)) {
@@ -1310,10 +1312,13 @@ int amdgpu_device_get_job_timeout_settings(struct amdgpu_device *adev)
 		if (ret)
 			return ret;

-		/* Invalidate 0 and negative values */
-		if (timeout <= 0) {
+		if (timeout == 0) {
 			index++;
 			continue;
+		} else if (timeout < 0) {
+			timeout = MAX_SCHEDULE_TIMEOUT;
+		} else {
+			timeout = msecs_to_jiffies(timeout);
 		}

 		switch (index++) {
--
2.7.4
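The bug the patch fixes is storing a millisecond value where the scheduler expects jiffies, and the v4 parsing rule (0 keeps the default, negative means infinite, positive is converted with `msecs_to_jiffies()`) can be modeled in userspace. The `MODEL_HZ` value and the rounding formula are assumptions for this sketch — the kernel's conversion depends on the configured HZ — and `LONG_MAX` stands in for `MAX_SCHEDULE_TIMEOUT`.

```c
#include <assert.h>
#include <limits.h>

#define MODEL_HZ 250L                       /* assumed tick rate for this sketch */
#define MODEL_MAX_SCHEDULE_TIMEOUT LONG_MAX /* stand-in for the kernel constant  */

/* Convert milliseconds to jiffies, rounding up (like the kernel does). */
long model_msecs_to_jiffies(long ms)
{
    return (ms * MODEL_HZ + 999) / 1000;
}

/* Apply one parsed lockup_timeout value, following the v4 rules:
 *   0        -> keep the current (default) value
 *   negative -> infinite timeout
 *   positive -> milliseconds, converted to jiffies */
long apply_lockup_timeout(long current_jiffies, long timeout_ms)
{
    if (timeout_ms == 0)
        return current_jiffies;
    if (timeout_ms < 0)
        return MODEL_MAX_SCHEDULE_TIMEOUT;
    return model_msecs_to_jiffies(timeout_ms);
}
```

With HZ = 250, storing a raw `10000` instead of `msecs_to_jiffies(10000)` would make the scheduler wait 10000 ticks (40 seconds) rather than the intended 10 seconds, which is exactly the miscalculation the patch corrects.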