RE: [RFC PATCH v2] drm/amdkfd: Run restore_workers on freezable WQs

2023-11-10 Thread Pan, Xinhui
softirqs last disabled at (1342671): [] __irq_exit_rcu+0xd3/0x140 [ 84.167692] ---[ end trace ]--- [ 84.189957] PM: suspe Thanks xinhui -Original Message- From: Pan, Xinhui Sent: Friday, November 10, 2023 12:51 PM To: Kuehling, Felix ; amd-gfx@lists.freedesktop.org Cc

RE: [RFC PATCH v2] drm/amdkfd: Run restore_workers on freezable WQs

2023-11-09 Thread Pan, Xinhui
@lists.freedesktop.org Cc: Deng, Emily ; Pan, Xinhui ; Koenig, Christian Subject: [RFC PATCH v2] drm/amdkfd: Run restore_workers on freezable WQs Make restore workers freezable so we don't have to explicitly flush them in suspend and GPU reset code paths, and we don't accidentally try to restore BOs while the GPU
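The approach described in this thread maps to the workqueue WQ_FREEZABLE flag: work queued on a freezable workqueue is drained by the freezer during suspend, so the driver no longer needs explicit flushes in its suspend and reset paths. A minimal kernel-style sketch of the pattern (names are illustrative, not the actual amdkfd code, and this fragment is not standalone-buildable):

```
/* Sketch only: a freezable ordered workqueue.  The freezer drains it
 * during suspend, so restore work cannot run while the GPU is off. */
static struct workqueue_struct *restore_wq;

static int restore_wq_init(void)
{
	restore_wq = alloc_ordered_workqueue("kfd_restore_wq", WQ_FREEZABLE);
	return restore_wq ? 0 : -ENOMEM;
}
```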

RE: Re: [PATCH] drm/amdgpu: Ignore first evction failure during suspend

2023-09-13 Thread Pan, Xinhui
, Christian Sent: Wednesday, September 13, 2023 10:29 PM To: Kuehling, Felix ; Christian König ; Pan, Xinhui ; amd-gfx@lists.freedesktop.org; Wentland, Harry Cc: Deucher, Alexander ; Fan, Shikang Subject: Re: Re: [PATCH] drm/amdgpu: Ignore first evction failure during suspend [+Harry] On

Re: [PATCH] drm/amdgpu: Ignore first evction failure during suspend

2023-09-12 Thread Pan, Xinhui
tTest.BasicTest pm-suspend thanks xinhui From: Christian König Sent: September 12, 2023 17:01 To: Pan, Xinhui ; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; Koenig, Christian ; Fan, Shikang Subject: Re: [PATCH] drm/amdgpu: Ignore first evction failure du

RE: [PATCH] drm/amdgpu: Ignore first evction failure during suspend

2023-09-11 Thread Pan, Xinhui
in its suspend callback. So the first eviction before the kfd callback likely fails. -Original Message- From: Christian König Sent: Friday, September 8, 2023 2:49 PM To: Pan, Xinhui ; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; Koenig, Christian ; Fan, Shikang Subject: Re: [PATCH

RE: [PATCH] drm/scheduler: Partially revert "drm/scheduler: track GPU active time per entity"

2023-08-16 Thread Pan, Xinhui
[AMD Official Use Only - General] Can we just add kref for entity? Or just collect such job time usage somewhere else? -Original Message- From: Pan, Xinhui Sent: Thursday, August 17, 2023 1:05 PM To: amd-gfx@lists.freedesktop.org Cc: Tuikov, Luben ; airl...@gmail.com; dri-de

Re: Re: [PATCH v4] drm: Optimise for continuous memory allocation

2022-11-29 Thread Pan, Xinhui
[AMD Official Use Only - General] comments inline. From: Koenig, Christian Sent: November 29, 2022 20:07 To: Pan, Xinhui; amd-gfx@lists.freedesktop.org Cc: dan...@ffwll.ch; matthew.a...@intel.com; dri-de...@lists.freedesktop.org; linux-ker...@vger.kernel.org

Re: [PATCH v4] drm: Optimise for continuous memory allocation

2022-11-29 Thread Pan, Xinhui
[AMD Official Use Only - General] comments inline. From: Koenig, Christian Sent: November 29, 2022 19:32 To: Pan, Xinhui; amd-gfx@lists.freedesktop.org Cc: dan...@ffwll.ch; matthew.a...@intel.com; dri-de...@lists.freedesktop.org; linux-ker...@vger.kernel.org

Re: [PATCH v4] drm: Optimise for continuous memory allocation

2022-11-29 Thread Pan, Xinhui
free; thanks xinhui ____________ From: Pan, Xinhui Sent: November 29, 2022 18:56 To: amd-gfx@lists.freedesktop.org Cc: dan...@ffwll.ch; matthew.a...@intel.com; Koenig, Christian; dri-de...@lists.freedesktop.org; linux-ker...@vger.kernel.org; Paneer Selvam

Re: [PATCH] drm/amdgpu: New method to check block continuous

2022-11-28 Thread Pan, Xinhui
just re-sort these blocks in ascending order if memory is indeed continuous? thanks xinhui From: Christian König Sent: November 29, 2022 1:11 To: Pan, Xinhui; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander Subject: Re: [PATCH] drm/amdgpu: New method to check

Re: [PATCH v3] drm: Optimise for continuous memory allocation

2022-11-28 Thread Pan, Xinhui
[AMD Official Use Only - General] Hi Arun, Thanks for your reply. comments are inline. From: Paneer Selvam, Arunpravin Sent: November 29, 2022 1:09 To: Pan, Xinhui; amd-gfx@lists.freedesktop.org Cc: linux-ker...@vger.kernel.org; dri-de...@lists.freedesktop.org

RE: [PATCH] drm/amdgpu: Fix a NULL pointer of fence

2022-07-07 Thread Pan, Xinhui
Christian König ; Pan, Xinhui ; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; Koenig, Christian Subject: Re: [PATCH] drm/amdgpu: Fix a NULL pointer of fence On 2022-07-07 at 05:54, Christian König wrote: > On 07.07.22 at 11:50, xinhui pan wrote: >> Fence is accessed by dma_r

Re: [PATCH] drm/amdgpu: Make sure ttm delayed work finished

2022-04-13 Thread Pan, Xinhui
: April 13, 2022 15:30 To: Pan, Xinhui; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander Subject: Re: [PATCH] drm/amdgpu: Make sure ttm delayed work finished We don't need that. TTM only reschedules when the BOs are still busy. And if the BOs are still busy when you unload the driver we have much

Re: [PATCH] drm/amdgpu: Fix one use-after-free of VM

2022-04-12 Thread Pan, Xinhui
out; + if (intr && signal_pending(current)) { ret = -ERESTARTSYS; goto out; From: Koenig, Christian Sent: April 12, 2022 20:11 To: Pan, Xinhui; amd-gfx@lists.freedesktop.org; Daniel Vetter Cc: Deucher, Alexander Subject: Re:

Re: Re: Re: [PATCH] drm/ttm: Put BO in its memory manager's lru list

2021-11-09 Thread Pan, Xinhui
Christian Sent: November 9, 2021 21:18 To: Pan, Xinhui; amd-gfx@lists.freedesktop.org Cc: dri-de...@lists.freedesktop.org Subject: Re: Re: Re: [PATCH] drm/ttm: Put BO in its memory manager's lru list Exactly that's the reason why we should have the double check in TTM I've mentioned in the other mail.

Re: Re: [PATCH] drm/ttm: Put BO in its memory manager's lru list

2021-11-09 Thread Pan, Xinhui
ist is on vram domain) to sMem. From: Pan, Xinhui Sent: November 9, 2021 21:05 To: Koenig, Christian; amd-gfx@lists.freedesktop.org Cc: dri-de...@lists.freedesktop.org Subject: Re: Re: [PATCH] drm/ttm: Put BO in its memory manager's lru list Yes, a stable tag is nee

Re: Re: [PATCH] drm/ttm: Put BO in its memory manager's lru list

2021-11-09 Thread Pan, Xinhui
_start(adev, mem->mem_type) + 209 mm_cur->start; 210 return 0; 211 } line 208, *addr is zero. So when amdgpu_copy_buffer submits a job with such an addr, a page fault happens. From: Koenig, Christian Sent: 20

Re: [PATCH] drm/ttm: Put BO in its memory manager's lru list

2021-11-09 Thread Pan, Xinhui
, Christian Sent: November 9, 2021 20:20 To: Pan, Xinhui; amd-gfx@lists.freedesktop.org Cc: dri-de...@lists.freedesktop.org Subject: Re: [PATCH] drm/ttm: Put BO in its memory manager's lru list On 09.11.21 at 12:19, xinhui pan wrote: > After we move BO to a new memory region, we should put it to > the new

Re: [PATCH] drm/amdgpu: Let BO created in its allowed_domain

2021-09-17 Thread Pan, Xinhui
[AMD Official Use Only] Why? just to evict some inactive vram BOs? From: Koenig, Christian Sent: Friday, September 17, 2021 3:06:16 PM To: Pan, Xinhui ; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander Subject: Re: [PATCH] drm/amdgpu: Let BO created in its

RE: [PATCH] drm/amdgpu: Fix crash on device remove/driver unload

2021-09-16 Thread Pan, Xinhui
[AMD Official Use Only] Reviewed-by: xinhui pan -Original Message- From: amd-gfx On Behalf Of Andrey Grodzovsky Sent: September 16, 2021 3:42 To: amd-gfx@lists.freedesktop.org Cc: Quan, Evan ; Pan, Xinhui ; Deucher, Alexander ; Grodzovsky, Andrey Subject: [PATCH] drm/amdgpu: Fix crash

Re: [PATCH v2] drm/amdgpu: Put drm_dev_enter/exit outside hot codepath

2021-09-15 Thread Pan, Xinhui
From: Pan, Xinhui Sent: September 15, 2021 14:37 To: amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander; Koenig, Christian; Grodzovsky, Andrey; Pan, Xinhui Subject: [PATCH v2] drm/amdgpu: Put drm_dev_enter/exit outside hot codepath We hit soft hang while doing memory

Re: Re: [PATCH v2] drm/amdgpu: Fix a race of IB test

2021-09-13 Thread Pan, Xinhui
; Pan, Xinhui; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander Subject: Re: Re: [PATCH v2] drm/amdgpu: Fix a race of IB test On 9/13/2021 12:21 PM, Christian König wrote: > Keep in mind that we don't try to avoid contention here. The goal is > rather to have as few locks as possible to

Re: Re: [PATCH] drm/amdgpu: Fix a race of IB test

2021-09-13 Thread Pan, Xinhui
[AMD Official Use Only] These IB tests are all using direct IB submission including the delayed init work. From: Koenig, Christian Sent: September 13, 2021 14:19 To: Pan, Xinhui; Christian König; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander Subject: Re: Re

Re: [PATCH v3 1/3] drm/amdgpu: UVD avoid memory allocation during IB test

2021-09-13 Thread Pan, Xinhui
to understand. From: Koenig, Christian Sent: September 13, 2021 14:31 To: Pan, Xinhui; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander Subject: Re: [PATCH v3 1/3] drm/amdgpu: UVD avoid memory allocation during IB test On 11.09.21 at 03:34, xinhui pan wrote: > m

Re: [RFC PATCH] drm/ttm: Try to check if new ttm man out of bounds during compile

2021-09-13 Thread Pan, Xinhui
: Christian König Sent: September 13, 2021 14:35 To: Pan, Xinhui; amd-gfx@lists.freedesktop.org Cc: Koenig, Christian; dan...@ffwll.ch; dri-de...@lists.freedesktop.org; Chen, Guchun Subject: Re: [RFC PATCH] drm/ttm: Try to check if new ttm man out of bounds during compile On 13.09.21 at 05:36,

Re: [PATCH v2] drm/amdgpu: Fix a race of IB test

2021-09-12 Thread Pan, Xinhui
[AMD Official Use Only] yep, that is a lazy way to fix it. I am thinking of adding one amdgpu_ring.direct_access_mutex before we issue test_ib on each ring. From: Lazar, Lijo Sent: September 13, 2021 12:00 To: Pan, Xinhui; amd-gfx@lists.freedesktop.org Cc

Re: [PATCH] drm/amdgpu: Fix a race of IB test

2021-09-11 Thread Pan, Xinhui
sync method. But I see device resume itself would flush it. So there is no race between them as userspace is still frozen. I will drop this flush in V2. From: Christian König Sent: September 11, 2021 15:45 To: Pan, Xinhui; amd-gfx@lists.freedesktop.org Cc: Deucher

Re: Re: Re: Re: [PATCH 2/4] drm/amdgpu: UVD avoid memory allocation during IB test

2021-09-10 Thread Pan, Xinhui
it. From: Koenig, Christian Sent: September 10, 2021 19:10 To: Pan, Xinhui; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander Subject: Re: Re: Re: Re: [PATCH 2/4] drm/amdgpu: UVD avoid memory allocation during IB test Yeah, but that IB test should use the indirect submission through the scheduler

Re: Re: Re: [PATCH 2/4] drm/amdgpu: UVD avoid memory allocation during IB test

2021-09-10 Thread Pan, Xinhui
[AMD Official Use Only] we need to take this lock. IB tests can be triggered through debugfs. These days I usually test it by cat'ing the gpu recovery and amdgpu_test_ib files in debugfs. From: Koenig, Christian Sent: September 10, 2021 18:02 To: Pan, Xinhui; amd-gfx

Re: Re: [PATCH 2/4] drm/amdgpu: UVD avoid memory allocation during IB test

2021-09-10 Thread Pan, Xinhui
should use DIRECT pool. Looks like we should only use the reserved BO for direct IB submission. As for delayed IB submission, we could alloc a new one dynamically. From: Koenig, Christian Sent: September 10, 2021 16:53 To: Pan, Xinhui; amd-gfx@lists.freedesktop.org Cc

Re: [PATCH 2/4] drm/amdgpu: UVD avoid memory allocation during IB test

2021-09-10 Thread Pan, Xinhui
[AMD Official Use Only] I am wondering if amdgpu_bo_pin would change the BO's placement in the future. For now, the new placement is calculated by new = old ∩ new. From: Koenig, Christian Sent: September 10, 2021 14:24 To: Pan, Xinhui; amd-gfx@lists.freedesktop.org

Re: [PATCH 4/4] drm/amdgpu: VCN avoid memory allocation during IB test

2021-09-10 Thread Pan, Xinhui
[AMD Official Use Only] I am using vim with set tabstop=8 set shiftwidth=8 set softtabstop=8 From: Koenig, Christian Sent: September 10, 2021 14:33 To: Pan, Xinhui; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander Subject: Re: [PATCH 4/4] drm/amdgpu: VCN avoid

Re: [PATCH] drm/ttm: add a BUG_ON in ttm_set_driver_manager when array bounds

2021-09-09 Thread Pan, Xinhui
; Koenig, Christian ; Pan, Xinhui ; Deucher, Alexander Cc: Chen, Guchun ; Shi, Leslie Subject: [PATCH] drm/ttm: add a BUG_ON in ttm_set_driver_manager when array bounds Vendor will define their own memory types on top of TTM_PL_PRIV, but call ttm_set_driver_manager directly without checking mem_type

Re: [PATCH 2/2] drm/amdgpu: alloc IB extra msg from IB pool

2021-09-09 Thread Pan, Xinhui
[AMD Official Use Only] well, if the IB test fails because we use the gtt domain or vram above 256MB, then the failure is expected. Doesn't the IB test exist to detect such issues? From: Koenig, Christian Sent: Thursday, September 9, 2021 15:16 To: Pan, Xinhui; amd-gfx

Re: [PATCH 1/2] drm/amdgpu: Increase direct IB pool size

2021-09-09 Thread Pan, Xinhui
[AMD Official Use Only] yep, vcn needs 128kb extra memory. I will make the pool size a constant 256kb. From: Koenig, Christian Sent: Thursday, September 9, 2021 3:14:15 PM To: Pan, Xinhui ; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander Subject: Re

[PATCH 2/2] drm/amdgpu: alloc IB extra msg from IB pool

2021-09-08 Thread Pan, Xinhui
[AMD Official Use Only] There is one dedicated IB pool for IB test. So lets use it for extra msg too. For UVD on older HW, use one reserved BO at specific range. Signed-off-by: xinhui pan --- drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c | 173 +++-

[PATCH 1/2] drm/amdgpu: Increase direct IB pool size

2021-09-08 Thread Pan, Xinhui
[AMD Official Use Only] Direct IB pool is used for vce/uvd/vcn IB extra msg too. Increase its size to 64 pages. Signed-off-by: xinhui pan --- drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c

Re: [RFC PATCH v3] drm/amdgpu: alloc uvd msg from IB pool

2021-09-08 Thread Pan, Xinhui
> On September 8, 2021 14:23, Christian König wrote: > > On 08.09.21 at 03:25, Pan, Xinhui wrote: >>> On September 7, 2021 20:37, Koenig, Christian wrote: >>> >>> On 07.09.21 at 14:26, xinhui pan wrote: >>>> There is one dedicated IB pool for IB test. So lets use it fo

Re: [RFC PATCH v3] drm/amdgpu: alloc uvd msg from IB pool

2021-09-07 Thread Pan, Xinhui
> On September 7, 2021 20:37, Koenig, Christian wrote: > > On 07.09.21 at 14:26, xinhui pan wrote: >> There is one dedicated IB pool for IB test. So lets use it for uvd msg >> too. >> >> For some older HW, use one reserved BO at specific range. >> >> Signed-off-by: xinhui pan >> --- >>

RE: [PATCH v2 1/2] drm/ttm: Fix a deadlock if the target BO is not idle during swap

2021-09-06 Thread Pan, Xinhui
[AMD Official Use Only] It is the internal staging drm-next. -Original Message- From: Koenig, Christian Sent: September 6, 2021 19:26 To: Pan, Xinhui ; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; che...@uniontech.com; dri-de...@lists.freedesktop.org Subject: Re: [PATCH v2 1/2] drm

Re: [PATCH v2 0/2] Fix a hung during memory pressure test

2021-09-06 Thread Pan, Xinhui
> On September 6, 2021 17:04, Christian König wrote: > > > > On 06.09.21 at 03:12, xinhui pan wrote: >> A long time ago, someone reported that the system hung during a memory test. >> In recent days, I have been trying to find and understand the potential >> deadlock in the ttm/amdgpu code. >> >> This patchset aims to

[PATCH v2 2/2] drm/amdpgu: Use VRAM domain in UVD IB test

2021-09-05 Thread Pan, Xinhui
[AMD Official Use Only] Like vce/vcn does, visible VRAM is OK for ib test. While commit a11d9ff3ebe0 ("drm/amdgpu: use GTT for uvd_get_create/destory_msg") says VRAM is not mapped correctly in his platform which is likely an arm64. So lets change back to use VRAM on x86_64 platform.

[PATCH v2 1/2] drm/ttm: Fix a deadlock if the target BO is not idle during swap

2021-09-05 Thread Pan, Xinhui
[AMD Official Use Only] The ret value might be -EBUSY, and the caller will then think the lru lock is still locked when actually it is NOT. So return -ENOSPC instead. Otherwise we hit list corruption. ttm_bo_cleanup_refs might fail too if the BO is not idle. If we return 0, the caller (ttm_tt_populate -> ttm_global_swapout

Subject: [PATCH v2 0/2] Fix a hung during memory pressure test

2021-09-05 Thread Pan, Xinhui
[AMD Official Use Only] A long time ago, someone reported that the system hung during a memory test. In recent days, I have been trying to find and understand the potential deadlock in the ttm/amdgpu code. This patchset aims to fix the deadlock during ttm populate. TTM has a parameter called pages_limit; when

[PATCH v3] drm/amdgpu: Fix a deadlock if previous GEM object allocation fails

2021-08-31 Thread Pan, Xinhui
Fall through to handle the error instead of return. Fixes: f8aab60422c37 ("drm/amdgpu: Initialise drm_gem_object_funcs for imported BOs") Cc: sta...@vger.kernel.org Signed-off-by: xinhui pan --- drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 23 ++- 1 file changed, 10

[PATCH v2] drm/amdgpu: Fix a deadlock if previous GEM object allocation fails

2021-08-30 Thread Pan, Xinhui
Fall through to handle the error instead of return. Signed-off-by: xinhui pan --- drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c index 85b292ed5c43..7ddd429052ea 100644

Re: [PATCH] drm/amdgpu: Fix a deadlock if previous GEM object allocation fails

2021-08-30 Thread Pan, Xinhui
On 2021/8/31 13:38, "Pan, Xinhui" wrote: On 2021/8/31 12:03, "Grodzovsky, Andrey" wrote: On 2021-08-30 11:24 p.m., Pan, Xinhui wrote: > [AMD Official Use Only] > > Unreserve root B

Re: [PATCH] drm/amdgpu: Fix a deadlock if previous GEM object allocation fails

2021-08-30 Thread Pan, Xinhui
On 2021/8/31 12:03, "Grodzovsky, Andrey" wrote: On 2021-08-30 11:24 p.m., Pan, Xinhui wrote: > [AMD Official Use Only] > > Unreserve root BO before return otherwise next allocation got deadlock. > > Sign

[PATCH] drm/amdgpu: Fix a deadlock if previous GEM object allocation fails

2021-08-30 Thread Pan, Xinhui
[AMD Official Use Only] Unreserve the root BO before returning, otherwise the next allocation will deadlock. Signed-off-by: xinhui pan --- drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 11 +-- 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c

[PATCH] drm/amdgpu: Fix out-of-bounds read when update mapping

2021-07-27 Thread Pan, Xinhui
[AMD Official Use Only] If one GTT BO has been evicted/swapped out, it should sit in the CPU domain. TTM only allocates a struct ttm_resource instead of a struct ttm_range_mgr_node for sysMem. Now when we update the mapping for such invalidated BOs, we might walk out of bounds of struct ttm_resource. Three

Re: [PATCH] drm/amdgpu: further lower VRAM allocation overhead

2021-07-14 Thread Pan, Xinhui
> On July 14, 2021 16:33, Christian König wrote: > > Hi Eric, > > feel free to push into amd-staging-dkms-5.11, but please don't push it into > amd-staging-drm-next. > > The latter will just cause a merge failure which Alex needs to resolve > manually. > > I can take care of pushing to

Re: [PATCH v2 1/2] drm/amdkfd: Fix some double free when destroy queue fails

2021-06-17 Thread Pan, Xinhui
Felix, what I am wondering is: if the CP hangs, can we assume all usermode queues have stopped? If so, we can do the cleanup work regardless of the retval of execute_queues_cpsch(). > On June 17, 2021 20:11, Pan, Xinhui wrote: > > Felix > what I am thinking of like below looks like

Re: [PATCH v2 1/2] drm/amdkfd: Fix some double free when destroy queue fails

2021-06-17 Thread Pan, Xinhui
ang) { + retval = -EIO; + goto failed_try_destroy_debugged_queue; + } + if (qpd->is_debug) { /* * error, currently we do not allow to destroy a queue > On June 17, 2021 20:02, Pan, Xinhui wrote: > > Handle queue destroy failur

Re: [PATCH 1/2] drm/amdkfd: Fix some double free when destroy queue fails

2021-06-16 Thread Pan, Xinhui
> On June 17, 2021 06:55, Kuehling, Felix wrote: > > On 2021-06-16 4:35 a.m., xinhui pan wrote: >> Some resources are freed even when destroy queue fails. > > Looks like you're keeping this behaviour for -ETIME. That is consistent with > what pqn_destroy_queue does. What you're fixing here is the behaviour

Re: [PATCH] drm/amdkfd: Fix circular lock in nocpsch path

2021-06-16 Thread Pan, Xinhui
> On June 16, 2021 12:36, Kuehling, Felix wrote: > > On 2021-06-16 at 12:01 a.m., Pan, Xinhui wrote: >>> On June 16, 2021 02:22, Kuehling, Felix wrote: >>> >>> [+Xinhui] >>> >>> >>> On 2021-06-15 at 1:50 p.m., Amber Lin wrote: >>

Re: [PATCH] drm/amdkfd: Fix circular lock in nocpsch path

2021-06-16 Thread Pan, Xinhui
> On June 16, 2021 02:22, Kuehling, Felix wrote: > > [+Xinhui] > > > On 2021-06-15 at 1:50 p.m., Amber Lin wrote: >> Calling free_mqd inside of destroy_queue_nocpsch_locked can cause a >> circular lock. destroy_queue_nocpsch_locked is called under a DQM lock, >> which is taken in MMU notifiers,

Re: [RFC PATCH] drm/ttm: Do page counting after populate callback succeed

2021-06-15 Thread Pan, Xinhui
> On June 15, 2021 20:01, Christian König wrote: > > On 15.06.21 at 13:57, xinhui pan wrote: >> Amdgpu set SG flag in populate callback. So TTM still count pages in SG >> BO. > > It's probably better to fix this instead. E.g. why does amdgpu modify the SG > flag during populate and not during initial

Re: [PATCH] drm/amdgpu: Use dma_resv_lock instead in BO release_notify

2021-05-21 Thread Pan, Xinhui
ru_lock); ret = amdgpu_amdkfd_remove_eviction_fence(bo, ef); dma_resv_unlock(bo->tbo.base.resv); From: Kuehling, Felix Sent: May 22, 2021 2:24 To: Pan, Xinhui; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander; Koenig, Christian Subject: Re: [PATCH] drm/a

Re: Re: Re: [RFC PATCH 1/2] drm/amdgpu: Fix memory corruption due to swapout and swapin

2021-05-20 Thread Pan, Xinhui
[AMD Official Use Only] I just sent out the patch below yesterday. Swapping an unpopulated bo is useless indeed. [RFC PATCH 2/2] drm/ttm: skip swapout when ttm has no backend page. From: Christian König Sent: May 20, 2021 14:39 To: Pan, Xinhui; Kuehling, Felix

Re: Re: Re: [RFC PATCH 1/2] drm/amdgpu: Fix memory corruption due to swapout and swapin

2021-05-20 Thread Pan, Xinhui
I just sent out the patch below yesterday. Swapping an unpopulated bo is useless indeed. [RFC PATCH 2/2] drm/ttm: skip swapout when ttm has no backend page.

Re: Re: [RFC PATCH 1/2] drm/amdgpu: Fix memory corruption due to swapout and swapin

2021-05-19 Thread Pan, Xinhui
König; Pan, Xinhui; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander; dan...@ffwll.ch; Koenig, Christian; dri-de...@lists.freedesktop.org Subject: Re: Re: [RFC PATCH 1/2] drm/amdgpu: Fix memory corruption due to swapout and swapin Looks like we're creating the userptr BO as ttm_bo_type_device. I

Re: Re: [RFC PATCH 1/2] drm/amdgpu: Fix memory corruption due to swapout and swapin

2021-05-19 Thread Pan, Xinhui
ate as TTM_PAGE_FLAG_SWAPPED is set. Now here is the problem: we swap data back in to the ttm backend memory from swap storage. That just causes the memory to be overwritten. From: Christian König Sent: May 19, 2021 18:01 To: Pan, Xinhui; Kuehling, Felix; amd-gfx@lists.freedesktop.org

Re: [RFC PATCH 1/2] drm/amdgpu: Fix memory corruption due to swapout and swapin

2021-05-18 Thread Pan, Xinhui
++i) { - list_for_each_entry(bo, >swap_lru[i], swap) { [snip] + for (i = TTM_PL_SYSTEM; i < TTM_NUM_MEM_TYPES; ++i) { + for (j = 0; j < TTM_MAX_BO_PRIORITY; ++j) { ________ From: Pan, Xinhui Sent: May 19, 2021 12:09 To: Kuehling

Re: [RFC PATCH 1/2] drm/amdgpu: Fix memory corruption due to swapout and swapin

2021-05-18 Thread Pan, Xinhui
Chris' patch as I think it doesn't help. Or I can have a try later. From: Kuehling, Felix Sent: May 19, 2021 11:29 To: Pan, Xinhui; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander; Koenig, Christian; dri-de...@lists.freedesktop.org; dan...@ffwll.ch Subject: Re

Re: [RFC PATCH 1/2] drm/amdgpu: Fix memory corruption due to swapout and swapin

2021-05-18 Thread Pan, Xinhui
Memory TEST_F(KFDMemoryTest, MemoryAlloc) { TEST_START(TESTPROFILE_RUNALL) -- 2.25.1 ____________ From: Pan, Xinhui Sent: May 19, 2021 10:28 To: amd-gfx@lists.freedesktop.org Cc: Kuehling, Felix; Deucher, Alexander; Koenig, Christian; dri-de...@lists.freedeskt

Re: [PATCH] drm/amdgpu: fix PM reference leak in amdgpu_debugfs_gfxoff_rea()

2021-05-17 Thread Pan, Xinhui
_ From: Yu Kuai Sent: May 17, 2021 16:16 To: Deucher, Alexander; Koenig, Christian; Pan, Xinhui; airl...@linux.ie; dan...@ffwll.ch Cc: amd-gfx@lists.freedesktop.org; dri-de...@lists.freedesktop.org; linux-ker...@vger.kernel.org; yuku...@huawei.com; yi.zh...@huawei.com Subject: [PATCH] drm/amdgp

Re: [PATCH] MAINTAINERS: Add Xinhui Pan as another AMDGPU contact

2021-05-10 Thread Pan, Xinhui
[AMD Official Use Only - Internal Distribution Only] Reviewed-by: xinhui pan From: Christian König Sent: Wednesday, May 5, 2021 7:01:46 PM To: Pan, Xinhui ; Deucher, Alexander ; amd-gfx@lists.freedesktop.org Subject: [PATCH] MAINTAINERS: Add Xinhui Pan

RE: [PATCH 1/2] drm/amdgpu: use zero as start for dummy resource walks

2021-03-23 Thread Pan, Xinhui
[AMD Official Use Only - Internal Distribution Only] I don't think so. Start is the offset here. We get the valid physical address from pages_addr[offset] when we update the mapping. Btw, what issue are we seeing? -Original Message- From: amd-gfx On Behalf Of Christian König Sent: March 23, 2021

RE: [PATCH 5/8] drm/amdgpu: use the new cursor in amdgpu_ttm_access_memory

2021-03-21 Thread Pan, Xinhui
[AMD Official Use Only - Internal Distribution Only] Because this is not a deadlock of the lock itself. It is just something like while(true) { LOCKIRQ ... UNLOCKIRQ ... } I think the scheduler policy is voluntary, so it never schedules out if there is no sleeping function, and then the soft lockup

RE: [PATCH 5/8] drm/amdgpu: use the new cursor in amdgpu_ttm_access_memory

2021-03-21 Thread Pan, Xinhui
[AMD Official Use Only - Internal Distribution Only] No, the patch from Nirmoy did not fully fix this issue. I will send another fix patch later. -Original Message- From: amd-gfx On Behalf Of Christian König Sent: March 20, 2021 17:08 To: Kuehling, Felix ; Paneer Selvam, Arunpravin ;

RE: [PATCH] amd/amdgpu: Fix resv shared fence overflow

2020-09-28 Thread Pan, Xinhui
[AMD Official Use Only - Internal Distribution Only] Please ignore this patch. -Original Message- From: Pan, Xinhui Sent: September 29, 2020 13:17 To: amd-gfx@lists.freedesktop.org Cc: Koenig, Christian ; Deucher, Alexander ; Pan, Xinhui Subject: [PATCH] amd/amdgpu: Fix resv shared fence

Re: [PATCH] drm/amdgpu: fix max_entries calculation v4

2020-09-03 Thread Pan, Xinhui
Reviewed-by: xinhui pan > On September 3, 2020 17:03, Christian König wrote: > > Calculate the correct value for max_entries or we might run after the > page_address array. > > v2: Xinhui pointed out we don't need the shift > v3: use local copy of start and simplify some calculation > v4: fix the case that

Re: [PATCH 0/3] Use implicit kref infra

2020-09-02 Thread Pan, Xinhui
> On September 2, 2020 22:50, Tuikov, Luben wrote: > > On 2020-09-02 00:43, Pan, Xinhui wrote: >> >> >>> On September 2, 2020 11:46, Tuikov, Luben wrote: >>> >>> On 2020-09-01 21:42, Pan, Xinhui wrote: >>>> If you take a look at the below function, yo

Re: [PATCH] drm/amdgpu: fix max_entries calculation v3

2020-09-02 Thread Pan, Xinhui
> On September 2, 2020 23:21, Christian König wrote: > > Calculate the correct value for max_entries or we might run after the > page_address array. > > v2: Xinhui pointed out we don't need the shift > v3: use local copy of start and simplify some calculation > > Signed-off-by: Christian König > Fixes:

Re: [PATCH] drm/amdgpu: fix max_entries calculation v2

2020-09-02 Thread Pan, Xinhui
> On September 2, 2020 22:31, Christian König wrote: > > On 02.09.20 at 16:27, Pan, Xinhui wrote: >> >>> On September 2, 2020 22:05, Christian König wrote: >>> >>> Calculate the correct value for max_entries or we might run after the >>> page_address array. >

Re: [PATCH] drm/amdgpu: fix max_entries calculation v2

2020-09-02 Thread Pan, Xinhui
> On September 2, 2020 22:05, Christian König wrote: > > Calculate the correct value for max_entries or we might run after the > page_address array. > > v2: Xinhui pointed out we don't need the shift > > Signed-off-by: Christian König > Fixes: 1e691e244487 drm/amdgpu: stop allocating dummy GTT nodes > ---

Re: [PATCH] drm/amdgpu: fix max_entries calculation

2020-09-02 Thread Pan, Xinhui
> On September 2, 2020 20:05, Christian König wrote: > > Calculate the correct value for max_entries or we might run after the > page_address array. > > Signed-off-by: Christian König > Fixes: 1e691e244487 drm/amdgpu: stop allocating dummy GTT nodes > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 3 ++- >

Re: [PATCH V2] drm/amdgpu: Do not move root PT bo to relocated list

2020-09-01 Thread Pan, Xinhui
list_move(_bo->vm_status, _bo->vm->relocated); >> 286 else >> 287 amdgpu_vm_bo_idle(vm_bo); >> 288 } >> >> Why do you need to do the bo->parent check outside? Because it was me who moved such logic into amdgpu_vm_bo_relocated.

Re: [PATCH 0/3] Use implicit kref infra

2020-09-01 Thread Pan, Xinhui
> On September 2, 2020 11:46, Tuikov, Luben wrote: > > On 2020-09-01 21:42, Pan, Xinhui wrote: >> If you take a look at the below function, you should not use driver's >> release to free adev. As dev is embedded in adev. > > Do you mean "look at the function below",

Re: [PATCH 0/3] Use implicit kref infra

2020-09-01 Thread Pan, Xinhui
Luben" Date: Wednesday, September 2, 2020 09:07 To: "amd-gfx@lists.freedesktop.org" , "dri-de...@lists.freedesktop.org" Cc: "Deucher, Alexander" , Daniel Vetter , "Pan, Xinhui" , "Tuikov, Luben" Subject: [PATCH 0/3] Use implicit kref infra Use the implic

Re: [PATCH] drm/amdgpu: Fix a redundant kfree

2020-09-01 Thread Pan, Xinhui
of total release sequence. Or still use the final_kfree to free adev and our release callback just do some other cleanup work. From: Tuikov, Luben Sent: Wednesday, September 2, 2020 4:35:32 AM To: Alex Deucher ; Pan, Xinhui ; Daniel Vetter Cc: amd-gfx@lists.freed

[PATCH] drm/amd/display: Fix a list corruption

2020-09-01 Thread Pan, Xinhui
[AMD Official Use Only - Internal Distribution Only] Remove the private obj from the internal list before we free aconnector. [ 56.925828] BUG: unable to handle page fault for address: 8f84a870a560 [ 56.933272] #PF: supervisor read access in kernel mode [ 56.938801] #PF:

[PATCH] drm/amdgpu: Fix a redundant kfree

2020-09-01 Thread Pan, Xinhui
[AMD Official Use Only - Internal Distribution Only] drm_dev_alloc() allocs *dev* and sets managed.final_kfree to dev to free itself. Now from commit 5cdd68498918("drm/amdgpu: Embed drm_device into amdgpu_device (v3)") we alloc *adev* and ddev is just a member of it. So drm_dev_release tries to free

Re: [PATCH] drm/amdgpu: fix kernel page fault issue by ras recovery on sGPU

2020-04-17 Thread Pan, Xinhui
that breaks the device list in gpu recovery. From: Pan, Xinhui Sent: Friday, April 17, 2020 7:11:40 PM To: Chen, Guchun ; amd-gfx@lists.freedesktop.org ; Zhang, Hawking ; Li, Dennis ; Clements, John ; Koenig, Christian Subject: Re: [PATCH] drm/amdgpu: fix

Re: [PATCH] drm/amdgpu: fix kernel page fault issue by ras recovery on sGPU

2020-04-17 Thread Pan, Xinhui
[AMD Official Use Only - Internal Distribution Only] This patch should fix the panic, but I would like you to NOT add the adev xgmi head to the local device list if a ras UE occurs while the gpu is already in gpu recovery. From: amd-gfx on behalf of Christian König

Re: [PATCH v2] drm/amdgpu: fix gfx hang during suspend with video playback (v2)

2020-04-12 Thread Pan, Xinhui
ils : Image[0]: Size(61952 Bytes), Type(Legacy Image) Image[1]: Size(43520 Bytes), Type(EFI Image) From: "Liang, Prike" Date: Monday, April 13, 2020 12:23 To: "Pan, Xinhui" , Johannes Hirte Cc: "Deucher, Alexander" , "Huang, Ray" , "Quan, Evan" ,

Re: [PATCH v2] drm/amdgpu: fix gfx hang during suspend with video playback (v2)

2020-04-12 Thread Pan, Xinhui
Prike, I hit this issue too. Reboot hung with my vega10; it is ok with navi10. From: amd-gfx on behalf of Liang, Prike Sent: Sunday, April 12, 2020 11:49:39 AM To: Johannes Hirte Cc: Deucher, Alexander ; Huang, Ray ; Quan, Evan ; amd-gfx@lists.freedesktop.org

Re: [PATCH] drm/ttm: Schedule out if possibe in bo delayed delete worker

2020-04-09 Thread Pan, Xinhui
t cond_resched() has the advantage that we > could spend more time on cleaning up old BOs if there is nothing else for the > CPU TODO. > > Regards, > Christian. > > On 09.04.20 at 16:24, Pan, Xinhui wrote: >> https://elixir.bootlin.com/linux/latest/source/mm/slab.c

Re: [PATCH] drm/ttm: Schedule out if possibe in bo delayed delete worker

2020-04-09 Thread Pan, Xinhui
https://elixir.bootlin.com/linux/latest/source/mm/slab.c#L4026 This is another example of the usage of cond_resched(). From: Pan, Xinhui Sent: Thursday, April 9, 2020 10:11:08 PM To: Lucas Stach ; amd-gfx@lists.freedesktop.org ; Koenig, Christian Cc: dri-de
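The pattern the thread is debating can be illustrated with a userspace analogue. This is a minimal sketch, not the actual TTM worker: `process_item()` is a hypothetical stand-in for freeing one BO, and `sched_yield()` plays the role that `cond_resched()` plays in the kernel, letting a long-running cleanup loop voluntarily give up the CPU between items.

```c
#include <assert.h>
#include <sched.h>

/* Userspace sketch of a long-running cleanup loop that yields between
 * items so it does not hog a worker thread. In the kernel, the yield
 * point would be cond_resched(); sched_yield() is the closest
 * userspace equivalent. process_item() is purely illustrative. */
static int processed;

static void process_item(int i)
{
	processed += i;   /* stand-in for freeing one delayed-delete BO */
}

static void delayed_delete_loop(int nitems)
{
	for (int i = 0; i < nitems; i++) {
		process_item(i);
		sched_yield();  /* give other work a chance to run */
	}
}
```

The point raised in the thread holds here too: yielding is cheap when nothing else is runnable, so the loop loses little by offering to reschedule after every item.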

Re: [PATCH] drm/ttm: Schedule out if possibe in bo delayed delete worker

2020-04-09 Thread Pan, Xinhui
I think it doesn't matter if the work item schedules out. Even if we did not schedule out, the workqueue itself will schedule out later. So this patch does not break anything, I think. From: Pan, Xinhui Sent: Thursday, April 9, 2020 10:07:09 PM To: Lucas Stach ; amd

Re: [PATCH] drm/ttm: Schedule out if possibe in bo delayed delete worker

2020-04-09 Thread Pan, Xinhui
From: Koenig, Christian Sent: Thursday, April 9, 2020 9:38:24 PM To: Lucas Stach ; Pan, Xinhui ; amd-gfx@lists.freedesktop.org Cc: dri-de...@lists.freedesktop.org Subject: Re: [PATCH] drm/ttm: Schedule out if possibe in bo delayed delete worker Am 09.04.20 um 15:25 schrieb Lucas Stach

Re: [PATCH] drm/amdgpu: fix fence handling in amdgpu_gem_object_close

2020-03-31 Thread Pan, Xinhui
Reviewed-by: xinhui pan > On March 31, 2020 at 22:25, Christian König wrote: > > The exclusive fence is only optional. > > Signed-off-by: Christian König > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 6 -- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git

Re: [PATCH 1/2] drm/amdgpu: fix and cleanup amdgpu_gem_object_close v2

2020-03-30 Thread Pan, Xinhui
Reviewed-by: xinhui pan > On March 30, 2020 at 18:50, Christian König wrote: > > The problem is that we can't add the clear fence to the BO > when there is an exclusive fence on it since we can't > guarantee that the clear fence will complete after the > exclusive one. > > To fix this refactor the function

Re: [PATCH v2] drm/amdgpu: implement more ib pools

2020-03-27 Thread Pan, Xinhui
> On March 27, 2020 at 16:24, Koenig, Christian wrote: > > On 27.03.20 at 04:08, xinhui pan wrote: >> We have three IB pools: the normal, VM, and direct pools. >> >> Any jobs which schedule IBs without dependence on the gpu scheduler should >> use the DIRECT pool. >> >> Any jobs schedule direct VM update IBs
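The selection rule the commit message describes can be sketched as a small helper. This is illustrative only: the enum names echo amdgpu's pool naming, but `pick_ib_pool()` and its parameters are hypothetical, not the driver's actual code.

```c
#include <assert.h>

/* Sketch of the three-pool selection rule from the patch description.
 * Names mirror the discussion (normal/VM/direct), but this helper is
 * an illustration, not amdgpu's real implementation. */
enum ib_pool {
	IB_POOL_NORMAL,  /* ordinary jobs that go through the GPU scheduler */
	IB_POOL_VM,      /* VM-update jobs submitted via the scheduler */
	IB_POOL_DIRECT,  /* jobs submitted directly, bypassing the scheduler */
};

static enum ib_pool pick_ib_pool(int uses_scheduler, int is_vm_update)
{
	if (!uses_scheduler)
		return IB_POOL_DIRECT;  /* e.g. IB tests, direct VM updates */
	return is_vm_update ? IB_POOL_VM : IB_POOL_NORMAL;
}
```

Keeping direct submissions in their own pool means a parked scheduler cannot starve them of IBs, which is the motivation given later in the thread.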

Re: [RFC PATCH 1/2] drm/amdgpu: add direct ib pool

2020-03-26 Thread Pan, Xinhui
> On March 26, 2020 at 14:51, Koenig, Christian wrote: > > > > On 26.03.2020 07:45, "Pan, Xinhui" wrote: > > > > On March 26, 2020 at 14:36, Koenig, Christian wrote: > > > > > > > > On 26.03.2020 07:15, "Pan, Xinhui" wrote: > > >

Re: [RFC PATCH 1/2] drm/amdgpu: add direct ib pool

2020-03-26 Thread Pan, Xinhui
> On March 26, 2020 at 14:36, Koenig, Christian wrote: > > > > On 26.03.2020 07:15, "Pan, Xinhui" wrote: > > > > On March 26, 2020 at 13:38, Koenig, Christian wrote: > > > > Yeah that's on my TODO list for quite a while as well. > > > > But we e

Re: [RFC PATCH 1/2] drm/amdgpu: add direct ib pool

2020-03-26 Thread Pan, Xinhui
IB tests pool. > > Thanks, > Christian. > > On 26.03.2020 03:02, "Pan, Xinhui" wrote: > Another IB pool for direct submit. > Any jobs that schedule IBs without dependence on the gpu scheduler should > use this pool first. > > Signed-off-by: xinhui pan > --- > dri

Re: [RFC PATCH 0/2] add direct IB pool

2020-03-26 Thread Pan, Xinhui
have a little time to fix this deadlock. If you want to repro it, set the gpu timeout to 50ms, then run vulkan, ocl, amdgputest, etc. together. I believe you will see more weird issues. From: Liu, Monk Sent: Thursday, March 26, 2020 1:31:04 PM To: Pan, Xinhui ; amd-gfx

Re: [PATCH] drm/amdgpu: Check entity rq

2020-03-25 Thread Pan, Xinhui
well, submitting a job with HW disabled should be no harm. The only concern is that we might use up IBs if we park the scheduler thread during recovery. I have seen recovery stuck in the SA new function. The ring test allocates IBs to test whether recovery succeeded or not. But if there are not enough IBs it will wait
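The guard the thread title refers to, checking an entity's run queue before pushing a job, can be sketched with minimal stand-in structs. This is not drm_sched's real code: the struct layouts and the `-2` error value are placeholders for the actual `drm_sched_entity` check and `-ENOENT`-style return.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-ins for the scheduler structs under discussion; the
 * real check lives in the drm_sched / amdgpu submission path, not in
 * this exact form. */
struct sched_rq { int dummy; };

struct sched_entity {
	struct sched_rq *rq;  /* becomes NULL once the entity is torn down */
};

/* Refuse to queue a job if the entity lost its run queue, e.g.
 * because its scheduler died during GPU recovery. Returns 0 on
 * success, -2 (placeholder for a -ENOENT-style error) on failure. */
static int push_job_checked(struct sched_entity *entity)
{
	if (!entity->rq)
		return -2;
	return 0;  /* a real implementation would queue the job here */
}
```

Failing fast here avoids the scenario described above, where submissions against a dead entity consume IBs that the recovery-time ring tests then cannot allocate.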

Re: [PATCH] drm/amdgpu: Check entity rq

2020-03-25 Thread Pan, Xinhui
. From: Koenig, Christian Sent: Wednesday, March 25, 2020 7:13:13 PM To: Das, Nirmoy Cc: Pan, Xinhui ; amd-gfx@lists.freedesktop.org ; Deucher, Alexander ; Kuehling, Felix Subject: Re: [PATCH] drm/amdgpu: Check entity rq Hi guys, thanks for pointing this out Nirmoy. Yeah, could be that I
