Re: [PATCH 1/2] drm/amdgpu: make duplicated EOP packet for GFX7/8 have real content

2024-06-17 Thread Christian König
Am 17.06.24 um 17:35 schrieb Xi Ruoyao: On Mon, 2024-06-17 at 22:30 +0800, Icenowy Zheng wrote: Two consecutive writes to the same bus address are perfectly legal from the PCIe specification and can happen all the time, even without this specific hw workaround. Yes I know it, and I am not from

Re: [PATCH 1/2] drm/amdgpu: make duplicated EOP packet for GFX7/8 have real content

2024-06-17 Thread Christian König
Am 17.06.24 um 18:09 schrieb Icenowy Zheng: BTW is there any operation that could be taken to examine this specific workaround? Is there any case possible to reproduce? No idea, I mean that's for GFX7/8 which was released between 2013 and 2017. My educated guess is that you could create a tes

Re: [PATCH AUTOSEL 6.1 13/14] drm/amdgpu: fix dereference null return value for the function amdgpu_vm_pt_parent

2024-06-18 Thread Christian König
Am 18.06.24 um 11:11 schrieb Pavel Machek: Hi! [ Upstream commit a0cf36546cc24ae1c95d72253c7795d4d2fc77aa ] The pointer parent may be NULLed by the function amdgpu_vm_pt_parent. To make the code more robust, check the pointer parent. If this can happen, it should not WARN(). If this can not

Re: [PATCH 1/6] drm/amdgpu: allow ioctls to opt-out of runtime pm

2024-06-19 Thread Christian König
Am 18.06.24 um 17:23 schrieb Pierre-Eric Pelloux-Prayer: Waking up a device can take multiple seconds, so if it's not going to be used we might as well not resume it. The safest default behavior for all ioctls is to resume the GPU, so this change allows specific ioctls to opt-out of generic runt
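The opt-out described in this post can be pictured as a per-ioctl flag that the dispatcher checks before waking the device. What follows is a minimal userspace sketch under that assumption. `IOCTL_FLAG_NO_RUNTIME_PM`, `struct ioctl_desc`, `ioctl_table`, and the resume counter are hypothetical stand-ins, not the amdgpu or DRM API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical flag: marks ioctls that never touch the hardware. */
#define IOCTL_FLAG_NO_RUNTIME_PM (1u << 0)

struct ioctl_desc {
	unsigned int cmd;
	unsigned int flags;
	int (*handler)(void);
};

/* Counts how often the simulated GPU was woken up. */
static int resume_count;

static void device_resume(void)
{
	resume_count++;
}

static int query_cached_info(void) { return 0; } /* no HW access */
static int submit_work(void) { return 0; }       /* needs the GPU */

static const struct ioctl_desc ioctl_table[] = {
	{ 1, IOCTL_FLAG_NO_RUNTIME_PM, query_cached_info },
	{ 2, 0, submit_work },
};

/* Resume by default (the safe choice); skip only for opted-out ioctls. */
static int dispatch_ioctl(unsigned int cmd)
{
	size_t n = sizeof(ioctl_table) / sizeof(ioctl_table[0]);

	for (size_t i = 0; i < n; i++) {
		if (ioctl_table[i].cmd != cmd)
			continue;
		if (!(ioctl_table[i].flags & IOCTL_FLAG_NO_RUNTIME_PM))
			device_resume();
		return ioctl_table[i].handler();
	}
	return -EINVAL; /* unknown command */
}
```

Keeping resume as the default means a forgotten flag only costs an unnecessary wake-up, never a hardware access on a suspended device.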

Re: [PATCH 4/6] drm/amdgpu: add AMDGPU_INFO_GB_ADDR_CONFIG query

2024-06-19 Thread Christian König
I would try to avoid that. Putting everything into amdgpu_info_device was a mistake only done because people assumed that IOCTLs on Linux are too expensive to query all information separately. We should rather have distinct IOCTLs for each value because that is way more flexible and we won't

Re: [PATCH] drm/amdgpu: track bo memory stats at runtime

2024-06-20 Thread Christian König
Am 20.06.24 um 02:30 schrieb Yunxiang Li: Before, every time fdinfo is queried we try to lock all the BOs in the VM and calculate memory usage from scratch. This works okay if the fdinfo is rarely read and the VMs don't have a ton of BOs. If either of these conditions is not true, we get a massiv
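One way to picture "track bo memory stats at runtime" is to keep per-placement counters that are adjusted whenever a buffer object is created, moved, or freed, so that reading fdinfo no longer has to walk and lock every BO. This is a hedged userspace sketch; the placement enum, `struct vm_stats`, and the helper names are hypothetical, and it uses C11 atomics rather than any particular kernel locking scheme.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical placements a buffer object can live in. */
enum placement { PL_SYSTEM, PL_GTT, PL_VRAM, PL_COUNT };

/* Per-VM counters, updated incrementally instead of recomputed on read. */
struct vm_stats {
	_Atomic uint64_t bytes[PL_COUNT];
};

struct bo {
	uint64_t size;
	enum placement pl;
};

static void stats_add(struct vm_stats *s, const struct bo *bo)
{
	atomic_fetch_add(&s->bytes[bo->pl], bo->size);
}

static void stats_sub(struct vm_stats *s, const struct bo *bo)
{
	atomic_fetch_sub(&s->bytes[bo->pl], bo->size);
}

/* On a move, shift the BO's size from the old placement to the new one. */
static void bo_move(struct vm_stats *s, struct bo *bo, enum placement to)
{
	stats_sub(s, bo);
	bo->pl = to;
	stats_add(s, bo);
}

/* A read is a single atomic load per placement and takes no BO locks. */
static uint64_t stats_read(struct vm_stats *s, enum placement pl)
{
	return atomic_load(&s->bytes[pl]);
}
```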

Re: [PATCH 1/6] drm/amdgpu: allow ioctls to opt-out of runtime pm

2024-06-20 Thread Christian König
Am 20.06.24 um 15:06 schrieb Pierre-Eric Pelloux-Prayer: Le 19/06/2024 à 11:26, Christian König a écrit : Am 18.06.24 um 17:23 schrieb Pierre-Eric Pelloux-Prayer: Waking up a device can take multiple seconds, so if it's not going to be used we might as well not resume it. The safest de

Re: [PATCH] drm/amdgpu: track bo memory stats at runtime

2024-06-20 Thread Christian König
Am 20.06.24 um 16:30 schrieb Li, Yunxiang (Teddy): [Public] + dma_resv_lock(bo->tbo.base.resv, NULL); Why do you grab the BO lock to update the stats? That doesn't seem to make any sense. + update_stats = !(bo->flags & AMDGPU_GEM_WAS_EXPORTED); + if (update_stats) + amdgpu_bo

Re: [PATCH 1/2] drm/amdgpu: Unmap BO memory before calling amdgpu_bo_unref()

2024-06-20 Thread Christian König
Am 20.06.24 um 16:44 schrieb Thomas Zimmermann: Prepares for using ttm_bo_vmap() and ttm_bo_vunmap() in amdgpu. Both require the caller to hold the GEM reservation lock, which is not the case while releasing a buffer object. Hence, push a possible call to unmap out from the buffer-object release

Re: [PATCH] drm/amdgpu: clear IH_RB_W/RPTR during enabling interrupts in sriov case

2024-06-20 Thread Christian König
Am 20.06.24 um 15:55 schrieb Danijel Slivka: Clearing the IH_RB_W/RPTR during interrupts disable is not clearing the RB_OVERFLOW bit. Adding workaround to clear the wptr when enabling interrupts in case RB_OVERFLOW bit is set. Signed-off-by: Danijel Slivka --- drivers/gpu/drm/amd/amdgpu/ih_v6

Re: [PATCH] drm/radeon: remove load callback

2024-06-21 Thread Christian König
Am 21.06.24 um 09:16 schrieb Thomas Zimmermann: Hi Am 20.06.24 um 16:30 schrieb Hoi Pok Wu: Dear Thomas, Thank you for testing my patch. The dev->dev_private is indeed the problem. However, most of the functions that use dev->dev_private pass drm_device as a parameter, and then use de

Re: [PATCH 1/2] drm/amdgpu: Unmap BO memory before calling amdgpu_bo_unref()

2024-06-21 Thread Christian König
Am 21.06.24 um 09:32 schrieb Thomas Zimmermann: Hi Am 20.06.24 um 17:50 schrieb Christian König: Am 20.06.24 um 16:44 schrieb Thomas Zimmermann: Prepares for using ttm_bo_vmap() and ttm_bo_vunmap() in amdgpu. Both require the caller to hold the GEM reservation lock, which is not the case

Re: [PATCH] drm/amdgpu: add missing error handling for amdgpu_ring_alloc()

2024-06-21 Thread Christian König
Am 21.06.24 um 11:24 schrieb Bob Zhou: Fix the unchecked return value warning reported by Coverity, so add error handling. That seems to be completely superfluous. The only case when amdgpu_ring_alloc() returns an error is when we try to allocate more than the maximum submission size. And i

[PATCH] drm/amdgpu: revert "allow write access to mapped userptrs"

2024-06-24 Thread Christian König
This reverts commit 358c258a816baed4c6997b59c2117578a1360498. Jerome actually pointed out why that stuff doesn't work in 2016: https://lists.freedesktop.org/archives/dri-devel/2016-March/103062.html Unfortunately the revert somehow got lost. Signed-off-by: Christian König

Re: [PATCH] drm/amdgpu: clear RB_OVERFLOW bit if detected when enabling interrupts

2024-06-24 Thread Christian König
Am 24.06.24 um 08:58 schrieb Danijel Slivka: Why: Setting IH_RB_WPTR register to 0 will not clear the RB_OVERFLOW bit if RB_ENABLE is not set. How to fix: Set WPTR_OVERFLOW_CLEAR bit after RB_ENABLE bit is set. The RB_ENABLE bit is required to be set, together with WPTR_OVERFLOW_ENABLE bit so th

Re: [PATCH] drm/amdgpu: normalize registers as local xcc to read/write under sriov in TLB flush

2024-06-24 Thread Christian König
Am 24.06.24 um 11:13 schrieb Jane Jian: [WHY] sriov has the higher bit violation when flushing tlb [HOW] normalize the registers to keep lower 16-bit(dword aligned) to avoid higher bit violation RLCG will mask xcd out and always assume it's accessing its own xcd [TODO] later will add the norma
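The [HOW] part of this commit message amounts to masking each register offset down to its lower 16 bits before it is handed to the RLCG path. A minimal sketch, assuming a plain 32-bit dword register offset; `normalize_local_reg()` is a hypothetical name, not the helper the patch adds.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: keep only the lower 16 bits of a dword register
 * offset, dropping the higher bits that would otherwise select another
 * XCC. Per the commit message, the RLCG then masks the XCD out and
 * applies the access to its own XCD. */
static inline uint32_t normalize_local_reg(uint32_t reg_offset)
{
	return reg_offset & 0xFFFFu;
}
```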

Re: [PATCH] drm/amdgpu: drop kiq access while in reset

2024-06-24 Thread Christian König
Am 24.06.24 um 08:34 schrieb Lazar, Lijo: On 6/24/2024 12:01 PM, Vignesh Chander wrote: correctly handle the case when trylock fails when gpu is about to be reset by dropping the request instead of using mmio Signed-off-by: Vignesh Chander Reviewed-by: Lijo Lazar Thanks, Lijo --- drive

Re: [PATCH] drm/amdgpu: drop kiq access while in reset

2024-06-24 Thread Christian König
Am 24.06.24 um 11:52 schrieb Lazar, Lijo: On 6/24/2024 3:08 PM, Christian König wrote: Am 24.06.24 um 08:34 schrieb Lazar, Lijo: On 6/24/2024 12:01 PM, Vignesh Chander wrote: correctly handle the case when trylock fails when gpu is about to be reset by dropping the request instead of using

Re: [PATCH] drm/amdgpu: normalize registers as local xcc to read/write under sriov in TLB flush

2024-06-24 Thread Christian König
ions. Regards, Christian. Thanks, Jane -----Original Message----- From: Christian König Sent: Monday, June 24, 2024 5:35 PM To: Jian, Jane ; Lazar, Lijo ; Chang, HaiJun ; Zhao, Victor Cc: amd-gfx@lists.freedesktop.org Subject: Re: [PATCH] drm/amdgpu: normalize registers as local xcc to read/write

Re: [PATCH] drm/amdgpu: clear RB_OVERFLOW bit when enabling interrupts

2024-06-24 Thread Christian König
eg_id, tmp)) + return -ETIMEDOUT; + } else { + WREG32_NO_KIQ(ih_regs->ih_rb_cntl, tmp); + } The last write can be merged with the one below to eventually enable the interrupt. So this chunk here can be droppe

Re: [PATCH] drm/amdgpu: normalize registers as local xcc to read/write under sriov in TLB flush

2024-06-24 Thread Christian König
Am 24.06.24 um 11:13 schrieb Jane Jian: [WHY] sriov has the higher bit violation when flushing tlb [HOW] normalize the registers to keep lower 16-bit(dword aligned) to avoid higher bit violation RLCG will mask xcd out and always assume it's accessing its own xcd [TODO] later will add the norma

Re: [PATCH] drm/amdgpu: drop kiq access while in reset

2024-06-24 Thread Christian König
Am 24.06.24 um 13:57 schrieb Lazar, Lijo: On 6/24/2024 5:19 PM, Christian König wrote: Am 24.06.24 um 11:52 schrieb Lazar, Lijo: On 6/24/2024 3:08 PM, Christian König wrote: Am 24.06.24 um 08:34 schrieb Lazar, Lijo: On 6/24/2024 12:01 PM, Vignesh Chander wrote: correctly handle the case

Re: [PATCH] drm/amdgpu: drop kiq access while in reset

2024-06-24 Thread Christian König
Am 24.06.24 um 14:24 schrieb Lazar, Lijo: On 6/24/2024 5:31 PM, Christian König wrote: Am 24.06.24 um 13:57 schrieb Lazar, Lijo: On 6/24/2024 5:19 PM, Christian König wrote: Am 24.06.24 um 11:52 schrieb Lazar, Lijo: On 6/24/2024 3:08 PM, Christian König wrote: Am 24.06.24 um 08:34 schrieb

Re: [PATCH] drm/amdgpu: normalize registers as local xcc to read/write under sriov in TLB flush

2024-06-24 Thread Christian König
al Message----- From: Christian König Sent: Monday, June 24, 2024 7:58 PM To: Jian, Jane ; Lazar, Lijo ; Chang, HaiJun ; Zhao, Victor Cc: amd-gfx@lists.freedesktop.org Subject: Re: [PATCH] drm/amdgpu: normalize registers as local xcc to read/write under sriov in TLB flush Am 24.06.24 um 11:13 sc

Re: [PATCH] drm/radeon: check bo_va->bo is non-NULL before using it

2024-06-25 Thread Christian König
Am 25.06.24 um 14:41 schrieb Pierre-Eric Pelloux-Prayer: The call to radeon_vm_clear_freed might clear bo_va->bo, so we have to check it before dereferencing it. Signed-off-by: Pierre-Eric Pelloux-Prayer --- drivers/gpu/drm/radeon/radeon_gem.c | 11 ++- 1 file changed, 10 insertions(

Re: [PATCH] drm/amdgpu: normalize registers as local xcc to read/write in gfx_v9_4_3

2024-06-25 Thread Christian König
Jian Lijo has a better technical understanding of the background than I do, so he should have the last word. But this looks like I would have expected it to look like so feel free to add Acked-by: Christian König . Regards, Christian. --- drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c

Re: [PATCH v2] drm/radeon: check bo_va->bo is non-NULL before using it

2024-06-26 Thread Christian König
Am 25.06.24 um 19:44 schrieb Alex Deucher: On Tue, Jun 25, 2024 at 10:32 AM Pierre-Eric Pelloux-Prayer wrote: The call to radeon_vm_clear_freed might clear bo_va->bo, so we have to check it before dereferencing it. Signed-off-by: Pierre-Eric Pelloux-Prayer Acked-by: Alex Deucher Should I

Re: [PATCH 4/4] drm/etnaviv: export client GPU usage statistics via fdinfo

2024-07-02 Thread Christian König
Am 01.07.24 um 19:14 schrieb Lucas Stach: This exposes an accumulated GPU active time per client via the fdinfo infrastructure. Signed-off-by: Lucas Stach Acked-by: Christian König Sorry that I couldn't find time to finalize and upstream that patch set myself. Regards, Chri

Re: [PATCH v1 1/2] drm/amdgpu: fix out of bounds access in gfx10 during ip dump

2024-07-02 Thread Christian König
Am 02.07.24 um 10:26 schrieb Sunil Khatri: During ip dump in gfx10 the index variable is reused but is not reinitialized to 0, which causes the index calculation to be wrong and leads to an out-of-bounds access. Acked-by: Christian König for the series. Regards, Christian. Signed-off-by
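The bug class described here, an index that is reused across loop iterations without being reset, can be illustrated with a small dump routine. This is a sketch only; `NUM_INST`, `NUM_REGS`, and `dump_regs()` are hypothetical and do not mirror the gfx10 IP dump code.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_INST 2 /* hypothetical number of instances */
#define NUM_REGS 3 /* hypothetical registers dumped per instance */

/* Copy per-instance register values into one flat dump buffer. The
 * register index must restart at 0 for every instance; carrying it over
 * from the previous instance would push the computed slot past the end
 * of dst, which is the out-of-bounds pattern described above. */
static void dump_regs(unsigned int src[NUM_INST][NUM_REGS],
		      unsigned int dst[NUM_INST * NUM_REGS])
{
	size_t reg;

	for (size_t inst = 0; inst < NUM_INST; inst++) {
		for (reg = 0; reg < NUM_REGS; reg++) /* reset on every pass */
			dst[inst * NUM_REGS + reg] = src[inst][reg];
	}
}
```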

Re: [PATCH 1/4] drm/scheduler: implement hardware time accounting

2024-07-02 Thread Christian König
Am 02.07.24 um 10:42 schrieb Tvrtko Ursulin: Hi, A few questions below. On 01/07/2024 18:14, Lucas Stach wrote: From: Christian König Multiple drivers came up with the requirement to measure how much runtime each entity accumulated on the HW. A previous attempt of accounting this had to

Re: [PATCH] drm/amdgpu: disallow multiple BO_HANDLES chunks in one submit

2024-07-02 Thread Christian König
like /* Only allow a single BO list to avoid memory leak. */ With that fixed Reviewed-by: Christian König Regards, Christian. + if (p->bo_list) + return -EINVAL; + r = amdgpu_bo_create_list_entry_array(data, &info); if (r) return r;
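The suggested comment and the `if (p->bo_list) return -EINVAL;` hunk quoted above reject a second BO_HANDLES chunk instead of silently replacing (and leaking) the first list. A reduced sketch of that check; `struct cs_parser` and `parse_bo_handles_chunk()` are hypothetical simplifications of the real command submission parser.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, heavily reduced parser state. */
struct cs_parser {
	void *bo_list; /* set by the first BO_HANDLES chunk */
};

static int dummy_list; /* stands in for a freshly created BO list */

/* Only allow a single BO list to avoid a memory leak: a second
 * BO_HANDLES chunk would otherwise overwrite the first pointer. */
static int parse_bo_handles_chunk(struct cs_parser *p)
{
	if (p->bo_list)
		return -EINVAL;

	p->bo_list = &dummy_list;
	return 0;
}
```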

Re: [PATCH 2/2] drm/ttm: Make ttm shrinkers NUMA aware

2024-07-02 Thread Christian König
Am 02.07.24 um 20:20 schrieb Alex Deucher: + dri-devel On Tue, Jul 2, 2024 at 1:40 PM Rajneesh Bhardwaj wrote: Otherwise the nid is always passed as 0 during memory reclaim so make TTM shrinkers NUMA aware. Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/ttm/ttm_pool.c | 2 +- 1 f

Re: [PATCH 2/2] drm/ttm: Make ttm shrinkers NUMA aware

2024-07-02 Thread Christian König
Am 02.07.24 um 23:54 schrieb Bhardwaj, Rajneesh: -----Original Message----- From: Koenig, Christian Sent: Tuesday, July 2, 2024 2:25 PM To: Alex Deucher ; Bhardwaj, Rajneesh ; Maling list - DRI developers Cc: amd-gfx@lists.freedesktop.

Re: [PATCH] drm/amdgpu: add missing error handling for amdgpu_ring_alloc()

2024-07-03 Thread Christian König
Error handling is to resolve or at least mitigate problems caused by the user or the hardware. When the driver tries to allocate more than allowed from the ring then that's a driver bug and not something for error handling. So the error handling here doesn't seem to make sense in the first pl
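The distinction drawn here, driver bugs versus errors caused by userspace or hardware, is often expressed by flagging the impossible case loudly instead of adding recovery paths in every caller. A hedged userspace sketch; `struct ring`, `ring_alloc()`, and `warn_on()` (a stand-in for the kernel's `WARN_ON()`) are hypothetical, and the `-ENOMEM` return value is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Userspace stand-in for WARN_ON(): report the condition, return it. */
static int warn_on(int cond, const char *what)
{
	if (cond)
		fprintf(stderr, "WARNING: %s\n", what);
	return cond;
}

/* Hypothetical ring limited to a fixed maximum submission size. */
struct ring {
	unsigned int max_dw;   /* largest allowed allocation in dwords */
	unsigned int count_dw; /* dwords reserved by the last allocation */
};

static int ring_alloc(struct ring *ring, unsigned int ndw)
{
	/* Exceeding the maximum is a driver bug, not a runtime condition:
	 * complain loudly here rather than expecting callers to recover. */
	if (warn_on(ndw > ring->max_dw, "ndw > ring->max_dw"))
		return -ENOMEM;

	ring->count_dw = ndw;
	return 0;
}
```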

Re: [PATCH] drm/amdgpu: : Fix the null pointer dereference for amdgpu_device_switch_gang

2024-07-03 Thread Christian König
Am 03.07.24 um 11:01 schrieb Bob Zhou: To avoid null pointer dereference reported by Coverity, so add null pointer check for the return of amdgpu_device_get_gang(). NAK, that's complete nonsense in the first place. The pointer is guaranteed to be never NULL or otherwise the logic would have c

Re: [PATCH] drm/amdgpu: : Fix the null pointer dereference for amdgpu_device_switch_gang

2024-07-03 Thread Christian König
Well Coverity probably can't know that and will keep complaining. Maybe we can add some extra code to point out that old can never be NULL here? Regards, Christian. Am 03.07.24 um 13:06 schrieb Zhou, Bob: Hi Christian Thanks for your

Re: [PATCH] drm/amdgpu: disallow multiple BO_HANDLES chunks in one submit

2024-07-03 Thread Christian König
Am 03.07.24 um 14:48 schrieb Pierre-Eric Pelloux-Prayer: Le 02/07/2024 à 15:35, Christian König a écrit : Am 02.07.24 um 15:23 schrieb Pierre-Eric Pelloux-Prayer: Before this commit, only submits with both a BO_HANDLES chunk and a 'bo_list_handle' would be rejected (by amdgpu_cs_

Re: [PATCH 2/2] MAINTAINERS: fix Xinhui's name

2024-07-04 Thread Christian König
Am 03.07.24 um 21:36 schrieb Alex Deucher: Switch to first last for consistency. Signed-off-by: Alex Deucher Cc: Xinhui Pan Ah, that's probably my fault. I copy&pasted that from a mail. Reviewed-by: Christian König --- MAINTAINERS | 2 +- 1 file changed, 1 insertion(+), 1

Re: [PATCH 1/2] MAINTAINERS: update powerplay and swsmu

2024-07-04 Thread Christian König
Am 03.07.24 um 21:36 schrieb Alex Deucher: Evan is no longer maintaining powerplay and swsmu. Add Kenneth Feng as his replacement. Signed-off-by: Alex Deucher Cc: Kenneth Feng Acked-by: Christian König --- MAINTAINERS | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a

Re: [PATCH 2/2] drm/amdgpu: Add address alignment support to DCC buffers

2024-07-04 Thread Christian König
) - Set the Alignment to a default value if the callback doesn't exist. - Add the callback to amdgpu_gmc_funcs. Signed-off-by: Arunpravin Paneer Selvam Acked-by: Christian König for the series. --- drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h | 3 ++ drivers/gpu/drm/amd/a

Re: [PATCH v1 1/2] drm:amdgpu: enable IH ring1 for IH v7.0

2024-07-04 Thread Christian König
Am 03.07.24 um 19:49 schrieb Sunil Khatri: We need IH ring1 for handling the page fault interrupts that overflow the default ring in specific use cases. Enabling ring1 allows software to redirect high interrupts to ring1 from the default IH ring. Signed-off-by: Sunil Khatri Reviewed-by: Christian

Re: [PATCH v3 0/6] drm/radeon: remove load callback & drm_dev_alloc

2024-07-04 Thread Christian König
Am 04.07.24 um 06:58 schrieb Hoi Pok Wu: Thanks a lot for your help Thomas. On Wed, Jul 3, 2024 at 4:52 AM Thomas Zimmermann wrote: Hi Am 30.06.24 um 18:59 schrieb Wu Hoi Pok: .load and drm_dev_alloc are deprecated. These patch series aims to remove them. v3: Both v1 and v2 suck. v3 improv

Re: [PATCH] drm/amdgpu: remove redundant semicolons in RAS_EVENT_LOG

2024-07-04 Thread Christian König
mmon mistakes of all times :) Reviewed-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h index 9224fc6418e4..518b10f190ec 100644 -

Re: [PATCH] drm/amdgpu/job: Replace DRM_INFO/ERROR logging

2024-07-09 Thread Christian König
Am 08.07.24 um 21:04 schrieb Alex Deucher: Use the dev_info/err variants so we get per device logging in multi-GPU cases. Signed-off-by: Alex Deucher Reviewed-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 21 +++-- 1 file changed, 11 insertions

Re: [PATCH v2] drm/amdgpu: set start timestamp of fence in the right place

2024-07-10 Thread Christian König
Am 10.07.24 um 02:31 schrieb jiadong@amd.com: From: Jiadong Zhu The job's embedded fence is dma_fence which shall not be converted to amdgpu_fence. Good catch. The start timestamp shall be saved on job for hw_fence. But NAK to that approach. Why do we need the start time here in the

Re: [PATCH v2] drm/amdgpu: set start timestamp of fence in the right place

2024-07-10 Thread Christian König
Am 10.07.24 um 09:54 schrieb Zhu, Jiadong: -----Original Message----- From: Christian König Sent: Wednesday, July 10, 2024 3:17 PM To: Zhu, Jiadong ; amd-gfx@lists.freedesktop.org Subject: Re: [PATCH v2] drm/amdgpu: set start timestamp

Re: [PATCH v2] drm/amdgpu: set start timestamp of fence in the right place

2024-07-10 Thread Christian König
Am 10.07.24 um 12:15 schrieb Zhu, Jiadong: -----Original Message----- From: Christian König Sent: Wednesday, July 10, 2024 5:27 PM To: Zhu, Jiadong ; amd-gfx@lists.freedesktop.org; Deucher, Alexander Subject: Re: [PATCH v2] drm/amdgpu

Re: [PATCH v2] drm/amdgpu: set start timestamp of fence in the right place

2024-07-11 Thread Christian König
Am 11.07.24 um 03:31 schrieb Zhu, Jiadong: -----Original Message----- From: Christian König Sent: Wednesday, July 10, 2024 8:46 PM To: Zhu, Jiadong ; amd-gfx@lists.freedesktop.org; Deucher, Alexander Subject: Re: [PATCH v2] drm/amdgpu

Re: [PATCH v3] drm/amdgpu: set start timestamp of fence in the right place

2024-07-11 Thread Christian König
Am 11.07.24 um 07:37 schrieb jiadong@amd.com: From: Jiadong Zhu The job's embedded fence is dma_fence which shall not be converted to amdgpu_fence. The start timestamp shall be saved on job for hw_fence. Again big NAK to adding the start time to the job. Regards, Christian. v2: optimi

Re: [PATCH] drm/amdgpu: Mark amdgpu_bo as invalid after moved

2024-07-11 Thread Christian König
Yeah, completely agree. This patch doesn't really make sense. Please explain why you would want to do this? Regards, Christian. Am 11.07.24 um 13:56 schrieb Huang, Trigger: This patch seems to be wrong. Quite a lot of preparations have

Re: [RFC] drm/amdgpu: More efficient ring padding

2024-07-12 Thread Christian König
Am 11.07.24 um 20:17 schrieb Tvrtko Ursulin: From: Tvrtko Ursulin From the department of questionable optimisations today we have a minor improvement to how padding / filling the rings with nops is done. Having noticed that typically 200+ nops per submission are filled into the ring, using a
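The idea of filling ring padding in bulk, rather than one nop per write call, can be sketched as: compute how many dwords are needed to reach the next alignment boundary, then fill them in at most two contiguous runs, before and after the wrap point. This illustrates the idea under stated assumptions (power-of-two ring, monotonically increasing `wptr`); it is not the RFC's implementation, and `struct ring`, `ring_fill_nop()`, `ring_pad()`, and the nop value used in the test are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical ring: power-of-two size, wptr counts dwords monotonically. */
struct ring {
	uint32_t *buf;
	uint32_t buf_mask;   /* ring size in dwords minus 1 */
	uint32_t align_mask; /* submissions must end on (align_mask + 1) */
	uint64_t wptr;
};

/* Write `count` nops as at most two contiguous runs (up to the wrap
 * point, then from the start) instead of one ring_write() per dword. */
static void ring_fill_nop(struct ring *ring, uint32_t nop, uint32_t count)
{
	uint32_t pos = (uint32_t)(ring->wptr & ring->buf_mask);
	uint32_t first = ring->buf_mask + 1 - pos; /* space until wrap */

	if (first > count)
		first = count;
	for (uint32_t i = 0; i < first; i++)
		ring->buf[pos + i] = nop;
	for (uint32_t i = 0; i < count - first; i++)
		ring->buf[i] = nop;
	ring->wptr += count;
}

/* Pad so that wptr ends on an alignment boundary; returns nops written. */
static uint32_t ring_pad(struct ring *ring, uint32_t nop)
{
	uint32_t count = (ring->align_mask + 1 -
			  (uint32_t)(ring->wptr & ring->align_mask)) &
			 ring->align_mask;

	ring_fill_nop(ring, nop, count);
	return count;
}
```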

Re: [PATCH] drm/amdgpu: Mark amdgpu_bo as invalid after moved

2024-07-12 Thread Christian König
been synced to move notify, not the move action. Thanks River -----Original Message----- From: Christian König Sent: Thursday, July 11, 2024 8:39 PM To: Huang, Trigger ; YuanShang Mao (River) ; amd-gfx@lists.freedesktop.org Subject: Re: [PATCH] drm/amdgpu: Mark amdgpu_bo as invalid after moved

Re: [PATCH] drm/amdgpu: Mark amdgpu_bo as invalid after moved

2024-07-12 Thread Christian König
because compute shader is still accessing the "invalidated" BO. I am not familiar with amdgpu_vm_bo state machine, so I don’t know if it is a code error or a design error. Thanks River -----Original Message----- From: YuanShang Mao (River) Sent: Friday, July 12, 2024 10:55 AM To:

Re: [RFC] drm/amdgpu: More efficient ring padding

2024-07-12 Thread Christian König
Am 12.07.24 um 11:14 schrieb Tvrtko Ursulin: On 12/07/2024 08:33, Christian König wrote: Am 11.07.24 um 20:17 schrieb Tvrtko Ursulin: From: Tvrtko Ursulin  From the department of questionable optimisations today we have a minor improvement to how padding / filling the rings with nops is

Re: [Patch v2] drm/ttm: Allow direct reclaim to allocate local memory

2024-07-12 Thread Christian König
compaction. (/proc/sys/vm/compact_memory) Note: On certain distros such as RHEL, the proactive compaction is disabled. (https://tinyurl.com/4f32f7rs) Cc: Dave Airlie Cc: Vlastimil Babka Cc: Daniel Vetter Reviewed-by: Christian König Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/ttm

Re: [PATCH] drm/amdgpu: Mark amdgpu_bo as invalid after moved

2024-07-15 Thread Christian König
en compute and graphics. So graphics and compute submissions at the same time are possible. @Christian, this is a consequence of using libdrm and insisting that each process uses only a single VM per GPU. Regards, Felix On 2024-07-12 3:39, Christian König wrote: Hi River, well that isn

Re: [PATCH 1/3] drm/amdgpu: Add empty HDP flush function to JPEG v4.0.3

2024-07-15 Thread Christian König
Am 15.07.24 um 16:47 schrieb Jane Jian: From: Lijo Lazar JPEG v4.0.3 doesn't support HDP flush when RRMT is enabled. Instead, mmsch fw will do the flush. Signed-off-by: Lijo Lazar Signed-off-by: Jane Jian --- drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.c | 9 + 1 file changed, 9 ins

Re: [PATCH 1/3] drm/amdgpu: Add empty HDP flush function to JPEG v4.0.3

2024-07-15 Thread Christian König
Am 15.07.24 um 17:08 schrieb Lazar, Lijo: On 7/15/2024 8:28 PM, Christian König wrote: Am 15.07.24 um 16:47 schrieb Jane Jian: From: Lijo Lazar JPEG v4.0.3 doesn't support HDP flush when RRMT is enabled. Instead, mmsch fw will do the flush. Signed-off-by: Lijo Lazar Signed-off-by:

Re: [PATCH v5 2/2] drm/amdgpu: Add address alignment support to DCC buffers

2024-07-17 Thread Christian König
- Set the Alignment to a default value if the callback doesn't exist.   - Add the callback to amdgpu_gmc_funcs. v6:   - Fix checkpatch error reported by Intel CI. Signed-off-by: Arunpravin Paneer Selvam Acked-by: Alex Deucher Acked-by: Christian König

Re: [PATCH v5 2/2] drm/amdgpu: Add address alignment support to DCC buffers

2024-07-17 Thread Christian König
G_CONTIGUOUS) && (amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(12, 0, 0) || amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(12, 0, 1))) { } Regards, Arun. On 7/17/2024 2:38 PM, Christian König wrote: Well that approach was discussed before and seemed to be too complicated. But I to

Re: [PATCH v6 2/2] drm/amdgpu: Add address alignment support to DCC buffers

2024-07-18 Thread Christian König
ic hw generation. Signed-off-by: Arunpravin Paneer Selvam Acked-by: Alex Deucher Acked-by: Christian König Reviewed-by: Frank Min --- drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h | 6 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 39 +++- drivers/gpu/drm/amd/amdgpu/gmc_

Re: [PATCH v2 0/9] KFD user queue validation

2024-07-19 Thread Christian König
The series is Reviewed-by: Felix Kuehling I only skimmed over it and will probably find something to complain on later. But we need to get this out of the door, so feel free to add Acked-by: Christian König to the series for now. Thanks, Christian. Philip Yang (9): drm/amdkfd

Re: [PATCH] drm/amdgpu: reset vm state machine after gpu reset(vram lost)

2024-07-19 Thread Christian König
Am 19.07.24 um 11:19 schrieb ZhenGuo Yin: [Why] Page table of compute VM in the VRAM will be lost after gpu reset. VRAM won't be restored since compute VM has no shadows. [How] Use higher 32-bit of vm->generation to record a vram_lost_counter. Reset the VM state machine when the counter is not equa

Re: [PATCH] drm/amdgpu: reset vm state machine after gpu reset(vram lost)

2024-07-19 Thread Christian König
Am 19.07.24 um 11:36 schrieb Yin, ZhenGuo (Chris): Hi, Christian Why would losing VRAM result in the vm entity becoming invalid? I think only if a fence error appeared (like a pending SDMA job that timed out or was cancelled), then

Re: [PATCH] drm/scheduler: Fix drm_sched_entity_set_priority()

2024-07-19 Thread Christian König
: ac4eb83ab255 ("drm/sched: select new rq even if there is only one v3") References: 981b04d96856 ("drm/sched: improve docs around drm_sched_entity") Cc: Christian König Cc: Alex Deucher Cc: Luben Tuikov Cc: Matthew Brost Cc: Daniel Vetter Cc: amd-gfx@lists.freedesktop.or

Re: [PATCH] drm/scheduler: Fix drm_sched_entity_set_priority()

2024-07-19 Thread Christian König
Am 19.07.24 um 15:02 schrieb Christian König: Am 19.07.24 um 11:47 schrieb Tvrtko Ursulin: From: Tvrtko Ursulin Long time ago in commit b3ac17667f11 ("drm/scheduler: rework entity creation") a change was made which prevented priority changes for entities with only one assigned

Re: [PATCH] drm/amdgpu/mes: fix mes ring buffer overflow

2024-07-19 Thread Christian König
Am 19.07.24 um 11:16 schrieb Jack Xiao: wait until there is enough room in the ring before writing mes packets to avoid ring buffer overflow. Signed-off-by: Jack Xiao --- drivers/gpu/drm/amd/amdgpu/mes_v11_0.c | 18 ++ drivers/gpu/drm/amd/amdgpu/mes_v12_0.c | 18 ++ 2 file
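Waiting for room before writing packets boils down to comparing the free space between the driver's write pointer and the firmware's read pointer against the packet size, with a bounded number of polls. A minimal sketch; `struct mes_ring`, the helpers, and `simulate_fw_progress()` (which fakes firmware consumption so the example runs) are hypothetical and do not reflect the actual MES code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical ring state; both pointers are free-running dword counters.
 * rptr is advanced by the firmware, wptr by the driver. */
struct mes_ring {
	uint32_t size_dw;
	uint32_t rptr;
	uint32_t wptr;
};

static uint32_t ring_free_dw(const struct mes_ring *r)
{
	/* Unsigned subtraction stays correct across counter wrap-around. */
	return r->size_dw - (r->wptr - r->rptr);
}

/* Illustration only: pretend the firmware consumed up to 4 dwords. */
static void simulate_fw_progress(struct mes_ring *r)
{
	uint32_t pending = r->wptr - r->rptr;

	r->rptr += pending < 4 ? pending : 4;
}

/* Poll until ndw dwords are free, giving up after max_polls polls.
 * `poll` stands in for a delay plus re-reading the hardware rptr. */
static int ring_wait_for_room(struct mes_ring *r, uint32_t ndw,
			      unsigned int max_polls,
			      void (*poll)(struct mes_ring *))
{
	for (unsigned int i = 0;; i++) {
		if (ring_free_dw(r) >= ndw)
			return 0;
		if (i == max_polls)
			return -ETIMEDOUT;
		poll(r);
	}
}
```

A request larger than the whole ring can never succeed and simply times out, which is one reason a bounded wait is preferable to an unbounded one.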

[PATCH] drm/amdgpu: harden the HW access lockdep check

2024-07-19 Thread Christian König
't overflow the logs with them. Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 30 +++--- 1 file changed, 9 insertions(+), 21 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

Re: [PATCH] drm/amdgpu/mes: fix mes ring buffer overflow

2024-07-22 Thread Christian König
k we rather need to increase the MES ring size instead. Unfortunately, it doesn't work. I guess the MES firmware has a limitation. Regards, Jack From: Christian König Sent: Friday, 19 July 2024 23:44 To: Xiao

[PATCH] drm/scheduler: remove full_recover from drm_sched_start

2024-07-22 Thread Christian König
a bit. Signed-off-by: Christian König --- .../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c| 4 +-- drivers/gpu/drm/etnaviv/etnaviv_sched.c | 4 +-- drivers/gpu/drm/imagination/pvr_queue.c | 4 +-- drivers/gpu/drm/lima

Re: 6.10/bisected/regression - Since commit e356d321d024 in the kernel log appears the message "MES failed to respond to msg=MISC (WAIT_REG_MEM)" which were never seen before

2024-07-22 Thread Christian König
11 command submission git bisect bad e356d321d0240663a09b139fa3658ddbca163e27 # first bad commit: [e356d321d0240663a09b139fa3658ddbca163e27] drm/amdgpu: cleanup MES11 command submission Author: Christian König Date: Fri May 31 10:56:00 2024 +0200 drm/amdgpu: cleanup MES11 command s

Re: [PATCH] drm/amdgpu/mes: fix mes ring buffer overflow

2024-07-22 Thread Christian König
d more waiting time. Regards, Jack From: Christian König Sent: Monday, 22 July 2024 16:20 To: Xiao, Jack ; amd-gfx@lists.freedesktop.org ; Deucher, Alexander Subject: Re: [PATCH] drm/amdgpu/mes: fix mes ring buffer o

Re: [PATCH 1/7] drm/amdgpu/gfx7: enable wave kill for compute queues

2024-07-22 Thread Christian König
Am 17.07.24 um 22:37 schrieb Alex Deucher: It should work the same for compute as well as gfx. Signed-off-by: Alex Deucher Reviewed-by: Christian König for the whole series. --- drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm

Re: [PATCH 1/4] drm/amdgpu/gfx10: properly handle error ints on all pipes

2024-07-22 Thread Christian König
Am 17.07.24 um 22:38 schrieb Alex Deucher: Need to handle the interrupt enables for all pipes. v2: fix indexing (Jessie) Signed-off-by: Alex Deucher Acked-by: Christian König for the whole series. --- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 130 + 1 file

Re: [PATCH 2/6] drm/amdgpu/gfx11: Enable bad opcode interrupt

2024-07-22 Thread Christian König
Am 17.07.24 um 22:40 schrieb Alex Deucher: From: Jesse Zhang For the bad opcode case, it will cause CP/ME hang. The firmware will prevent the ME side from hanging by raising a bad opcode interrupt. And the driver needs to perform a vmid reset when receiving the interrupt. v2: update irq namin

Re: [PATCH] drm/scheduler: Fix drm_sched_entity_set_priority()

2024-07-22 Thread Christian König
Am 22.07.24 um 15:52 schrieb Tvrtko Ursulin: On 19/07/2024 16:18, Christian König wrote: Am 19.07.24 um 15:02 schrieb Christian König: Am 19.07.24 um 11:47 schrieb Tvrtko Ursulin: From: Tvrtko Ursulin Long time ago in commit b3ac17667f11 ("drm/scheduler: rework entity creation"

Re: [PATCH] drm/scheduler: Fix drm_sched_entity_set_priority()

2024-07-22 Thread Christian König
Am 22.07.24 um 16:43 schrieb Tvrtko Ursulin: On 22/07/2024 15:06, Christian König wrote: Am 22.07.24 um 15:52 schrieb Tvrtko Ursulin: On 19/07/2024 16:18, Christian König wrote: Am 19.07.24 um 15:02 schrieb Christian König: Am 19.07.24 um 11:47 schrieb Tvrtko Ursulin: From: Tvrtko Ursulin

Re: [PATCH v2] drm/amdgpu: reset vm state machine after gpu reset(vram lost)

2024-07-23 Thread Christian König
Am 23.07.24 um 05:05 schrieb ZhenGuo Yin: [Why] Page table of compute VM in the VRAM will be lost after gpu reset. VRAM won't be restored since compute VM has no shadows. [How] Use higher 32-bit of vm->generation to record a vram_lost_counter. Reset the VM state machine when vm->generation is not e

Re: [PATCH] drm/amdgpu/mes: refine for maximum packet execution

2024-07-23 Thread Christian König
Am 23.07.24 um 10:27 schrieb Jack Xiao: Only allow API_NUMBER_OF_COMMAND_MAX packet in mes ring buffer, refine the code for maximum packet execution. Signed-off-by: Jack Xiao --- drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 2 ++ drivers/gpu/drm/amd/amdgpu/mes_v11_0.c | 2 +- drivers/gpu/dr

Re: [PATCH v2] drm/amdgpu: reset vm state machine after gpu reset(vram lost)

2024-07-23 Thread Christian König
valid, but let me comment on the patch itself. Regards, Christian. Best, Zhenguo Cloud-GPU Core team, SRDC -----Original Message----- From: Christian König Sent: Tuesday, July 23, 2024 3:13 PM To: Yin, ZhenGuo (Chris) ; amd-gfx@lists.freedesktop.org Cc: Koenig, Christian Subject: Re: [PATCH v

Re: [PATCH v2] drm/amdgpu: reset vm state machine after gpu reset(vram lost)

2024-07-23 Thread Christian König
Am 23.07.24 um 05:05 schrieb ZhenGuo Yin: [Why] Page table of compute VM in the VRAM will be lost after gpu reset. VRAM won't be restored since compute VM has no shadows. [How] Use higher 32-bit of vm->generation to record a vram_lost_counter. Reset the VM state machine when vm->generation is not e

Re: [PATCH v3] drm/amdgpu: reset vm state machine after gpu reset(vram lost)

2024-07-23 Thread Christian König
,7 @@ uint64_t amdgpu_vm_generation(struct amdgpu_device *adev, struct amdgpu_vm *vm) if (!vm) return result; - result += vm->generation; + result += (vm->generation & 0xULL); Please use the lower_32_bits() macro here. With that fixed the patch is Revi
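The review asks for `lower_32_bits()` instead of an open-coded mask. The sketch below shows the split described in the commit message: the VRAM-lost counter in the upper half of the generation value and the regular counter in the lower half. The two macros are userspace re-implementations modelled on the kernel helpers of the same name; `pack_generation()`, `vm_needs_reset()`, and `vm_generation_value()` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace versions modelled on the kernel's 32-bit split helpers. */
#define lower_32_bits(n) ((uint32_t)((n) & 0xffffffffULL))
#define upper_32_bits(n) ((uint32_t)(((uint64_t)(n)) >> 32))

/* Hypothetical packing: VRAM-lost counter on top, generation below. */
static uint64_t pack_generation(uint32_t vram_lost_counter, uint32_t gen)
{
	return ((uint64_t)vram_lost_counter << 32) | gen;
}

/* The VM state machine needs a reset if VRAM was lost since packing. */
static int vm_needs_reset(uint64_t generation, uint32_t cur_vram_lost)
{
	return upper_32_bits(generation) != cur_vram_lost;
}

/* Mirrors the reviewed hunk: only the lower half is added to result. */
static uint64_t vm_generation_value(uint64_t result, uint64_t generation)
{
	return result + lower_32_bits(generation);
}
```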

Re: [PATCH] amdgpu: allow setting contiguous on non-kernel bos for placement

2024-07-24 Thread Christian König
Am 24.07.24 um 09:02 schrieb Dave Airlie: From: Dave Airlie This is a partial revert of drm/amdgpu: Modify the contiguous flags behaviour. This broke VCN AV1 decoding on radv video on GFX11. On VCN4 only the first VCN block has AV1 decode support, so the kernel has a hacky heuristic to work

Re: [PATCH] drm/amdgpu: fix Coverity explicit null dereferenced warnings

2024-07-24 Thread Christian König
Am 24.07.24 um 09:06 schrieb Tim Huang: This is to address the Coverity explicit null dereferenced warnings by NULL returns from amdgpu_mes_ctx_get_offs* but without follow-up checks. Meanwhile, refactor the code to keep only one *_get_gpu/cpu_addr. Well nice that you are looking into that, but

[PATCH] drm/amdgpu: fix contiguous handling for IB parsing

2024-07-24 Thread Christian König
Otherwise we won't get correct access to the IB. Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 15 +++ 1 file changed, 7 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c

Re: [PATCH] drm/scheduler: Fix drm_sched_entity_set_priority()

2024-07-24 Thread Christian König
Am 24.07.24 um 10:16 schrieb Tvrtko Ursulin: [SNIP] Absolutely. Absolutely good and absolutely me, or absolutely you? :) You, I don't even have time to finish all the stuff I already started :/ These are the TODO points and their opens: - Adjust amdgpu_ctx_set_entity_

Re: [PATCH v7 1/2] drm/buddy: Add start address support to trim function

2024-07-24 Thread Christian König
_start alignment with min chunk_size - use range_overflows() Signed-off-by: Arunpravin Paneer Selvam <arunpravin.paneersel...@amd.com> Acked-by: Alex Deucher <alexander.deuc...@amd.com> Acked-by: Chr
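The v7 changelog mentions checking the start alignment against the minimum chunk size and using `range_overflows()`. A rough sketch of such a validation follows, assuming the usual `start >= max || size > max - start` formulation of `range_overflows()` (which never computes `start + size` and so cannot wrap). `trim_range_valid()` and its parameters are hypothetical and are not the drm_buddy API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same check as range_overflows(): true if [start, start + size) does not
 * fit below max. start + size is never formed, so it cannot wrap around. */
static bool range_overflows_u64(uint64_t start, uint64_t size, uint64_t max)
{
	return start >= max || size > max - start;
}

/* Hypothetical validation for trimming a block of block_size bytes down
 * to [start, start + new_size). */
static bool trim_range_valid(uint64_t block_size, uint64_t start,
			     uint64_t new_size, uint64_t chunk_size)
{
	if (chunk_size == 0 || (chunk_size & (chunk_size - 1)))
		return false;	/* chunk size must be a power of two */
	if (start & (chunk_size - 1))
		return false;	/* start must be aligned to the min chunk size */
	return !range_overflows_u64(start, new_size, block_size);
}
```

The subtraction form matters for inputs like `new_size = UINT64_MAX`, where a naive `start + new_size > block_size` would wrap and wrongly pass.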

Re: [PATCH -next] drm/amd/display: use swap() in sort()

2024-07-24 Thread Christian König
Am 24.07.24 um 09:37 schrieb Jiapeng Chong: Use existing swap() function rather than duplicating its implementation. ./drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_pmo/dml2_pmo_dcn3.c:17:29-30: WARNING opportunity for swap(). Reported-by: Abaci Robot Closes: https://bugzilla.openanolis.
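The Coccinelle warning asks for the open-coded three-line exchange to be replaced by `swap()`. This is a small illustration using a copy of the kernel's `swap()` macro (GNU C `__typeof__`) inside a bubble sort. `sort_ints()` is a hypothetical stand-in, not the dml2 `sort()` function from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace copy of the kernel's swap() macro (GNU C __typeof__). */
#define swap(a, b) \
	do { __typeof__(a) __tmp = (a); (a) = (b); (b) = __tmp; } while (0)

/* Hypothetical bubble sort: the exchange step is a single swap() call
 * instead of a temporary variable and three assignments. */
static void sort_ints(int *list, size_t n)
{
	for (size_t pass = 1; pass < n; pass++)
		for (size_t j = 0; j + pass < n; j++)
			if (list[j] > list[j + 1])
				swap(list[j], list[j + 1]);
}
```

The `j + pass < n` bound keeps `list[j + 1]` in range and avoids the unsigned underflow that `n - 1` would cause for an empty array.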

Re: [PATCH] drm/sched: Add error code parameter to drm_sched_start

2024-07-25 Thread Christian König
Am 24.07.24 um 20:43 schrieb vitaly.pros...@amd.com: From: Vitaly Prosyak The current implementation of drm_sched_start uses a hardcoded -ECANCELED to dispose of a job when the parent/hw fence is NULL. This results in drm_sched_job_done being called with -ECANCELED for each job with a NULL pa

[PATCH] drm/amdgpu: fix contiguous handling for IB parsing v2

2024-07-25 Thread Christian König
Otherwise we won't get correct access to the IB. v2: keep setting AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS to avoid problems in the VRAM backend. Signed-off-by: Christian König Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3501 Fixes: e362b7c8f8c7 ("drm/amdgpu: Modify the

Re: [PATCH] drm/radeon/evergreen_cs: fix int overflow errors in cs track offsets

2024-07-26 Thread Christian König
Am 25.07.24 um 20:09 schrieb Nikita Zhandarovich: Several cs track offsets (such as 'track->db_s_read_offset') either are initialized with or plainly take big enough values that, once shifted 8 bits left, may be hit with integer overflow if the resulting values end up going over u32 limit. Some
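The arithmetic the patch is concerned with is easy to demonstrate: a `u32` shifted left by 8 is evaluated in 32 bits, so the top 8 bits are silently dropped. The sketch below shows only that arithmetic. Whether such offsets can actually reach this code path is exactly what the follow-ups in this thread dispute. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Shift done in 32-bit arithmetic: bits shifted past bit 31 are lost,
 * even though the result is returned as a 64-bit value. */
static uint64_t offset_bytes_u32(uint32_t offset)
{
	return offset << 8;
}

/* Widening before the shift keeps the full value. */
static uint64_t offset_bytes_u64(uint32_t offset)
{
	return (uint64_t)offset << 8;
}
```

For `offset = 0x01000000` the first helper returns 0 while the second returns `0x100000000`. For values below `0x01000000` the two agree.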

Re: [PATCH] drm/radeon/evergreen_cs: fix int overflow errors in cs track offsets

2024-07-26 Thread Christian König
I strongly suggest to revert that again. See my other mail. Christian. Am 25.07.24 um 22:59 schrieb Alex Deucher: Applied. Thanks! Alex On Thu, Jul 25, 2024 at 2:20 PM Nikita Zhandarovich wrote: Several cs track offsets (such as 'track->db_s_read_offset') either are initialized with or pla

[PATCH] drm/sched: add optional errno to drm_sched_start()

2024-07-26 Thread Christian König
ssage, drop the new function again and update all callers, apply the errno also to scheduler fences with hw fences Signed-off-by: Jesse Zhang Signed-off-by: Vitaly Prosyak Signed-off-by: Christian König Cc: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.
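The series replaces the hardcoded `-ECANCELED` used for jobs whose parent/hw fence is NULL with a caller-supplied error. The following is a toy model of that restart loop, assuming "optional" means that 0 keeps the old `-ECANCELED` default. `struct fake_job` and `model_sched_start()` are hypothetical and are not `drm_gpu_scheduler` code. The parameter is called `err` because `errno` is a macro in userspace `<errno.h>`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a scheduler job. */
struct fake_job {
	const void *hw_fence;	/* NULL if the job never reached the hardware */
	int done_error;		/* error the job completed with, 0 = success */
};

/* Toy model of the restart path: jobs without a hw fence are completed
 * with the caller's error, falling back to -ECANCELED when err is 0
 * (assumed semantics of the "optional" errno). */
static void model_sched_start(struct fake_job *jobs, size_t n, int err)
{
	for (size_t i = 0; i < n; i++) {
		if (!jobs[i].hw_fence)
			jobs[i].done_error = err ? err : -ECANCELED;
	}
}
```

A reset handler could then pass something like `-ENODEV` so that userspace sees why its jobs were dropped, while existing callers passing 0 keep today's behaviour.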

Re: [PATCH] drm/radeon/evergreen_cs: fix int overflow errors in cs track offsets

2024-07-29 Thread Christian König
Am 26.07.24 um 14:52 schrieb Alex Deucher: On Fri, Jul 26, 2024 at 3:05 AM Christian König wrote: Am 25.07.24 um 20:09 schrieb Nikita Zhandarovich: Several cs track offsets (such as 'track->db_s_read_offset') either are initialized with or plainly take big enough values that, o

Re: [PATCH] drm/amdgpu: always allocate cleared VRAM for GEM allocations

2024-07-29 Thread Christian König
Am 29.07.24 um 12:42 schrieb Michel Dänzer: On 2024-07-26 17:25, Alex Deucher wrote: On Fri, Jul 26, 2024 at 9:50 AM Alex Deucher wrote: This adds allocation latency, but aligns better with user expectations. The latency should improve with the drm buddy clearing patches that Arun has been wo

Re: [PATCH] drm/radeon/evergreen_cs: fix int overflow errors in cs track offsets

2024-07-29 Thread Christian König
Am 29.07.24 um 19:26 schrieb Nikita Zhandarovich: Hi, On 7/29/24 02:23, Christian König wrote: Am 26.07.24 um 14:52 schrieb Alex Deucher: On Fri, Jul 26, 2024 at 3:05 AM Christian König wrote: Am 25.07.24 um 20:09 schrieb Nikita Zhandarovich: Several cs track offsets (such as '

Re: [PATCH] drm/radeon/evergreen_cs: fix int overflow errors in cs track offsets

2024-07-29 Thread Christian König
Am 29.07.24 um 20:04 schrieb Christian König: Am 29.07.24 um 19:26 schrieb Nikita Zhandarovich: Hi, On 7/29/24 02:23, Christian König wrote: Am 26.07.24 um 14:52 schrieb Alex Deucher: On Fri, Jul 26, 2024 at 3:05 AM Christian König wrote: Am 25.07.24 um 20:09 schrieb Nikita Zhandarovich

Re: [PATCH v3] drm/amdkfd: Change kfd/svm page fault drain handling

2024-07-29 Thread Christian König
Am 25.07.24 um 20:19 schrieb Xiaogang.Chen: From: Xiaogang Chen When an app unmaps VM ranges (munmap), kfd/svm starts draining pending page faults and does not handle any incoming page faults of this process until a deferred work item gets executed by the default system wq. The time period of "not handle page faul

Re: [PATCH] drm/sched: add optional errno to drm_sched_start()

2024-07-29 Thread Christian König
Am 26.07.24 um 14:30 schrieb Matthew Brost: On Fri, Jul 26, 2024 at 09:55:50AM +0200, Christian König wrote: The current implementation of drm_sched_start uses a hardcoded -ECANCELED to dispose of a job when the parent/hw fence is NULL. This results in drm_sched_job_done being called with

Re: [PATCH] drm/sched: add optional errno to drm_sched_start()

2024-07-29 Thread Christian König
Am 26.07.24 um 16:21 schrieb Daniel Vetter: On Fri, Jul 26, 2024 at 09:55:50AM +0200, Christian König wrote: The current implementation of drm_sched_start uses a hardcoded -ECANCELED to dispose of a job when the parent/hw fence is NULL. This results in drm_sched_job_done being called with
