Re: [PATCH] drm/ttm: set max_active to recommened default

2023-11-13 Thread Christian König
Am 11.11.23 um 14:11 schrieb Rajneesh Bhardwaj: To maximize per cpu execution context for the work items, use the recommended settings i.e. WQ_DFL_ACTIVE(256). There is no apparent reason to throttle to 16 while process tear down. Well big NAK to this. During process tear down it can be that hu

Re: [PATCH] drm/amdgpu: Skip execution of pending reset jobs

2023-11-10 Thread Christian König
Am 10.11.23 um 16:07 schrieb Lazar, Lijo: On 11/10/2023 8:18 PM, Christian König wrote: Am 09.11.23 um 08:38 schrieb Lijo Lazar: cancel_work is not backported to all custom kernels. Well this is pretty clear NAK to pushing this upstream. We absolutely can't add workaround for

Re: [PATCH] drm/amdgpu: fix AGP addressing when GART is not at 0

2023-11-10 Thread Christian König
AGP address setup into amdgpu_bo_gpu_offset_no_check(). v2: check mem_type before checking agp Fixes: 67318cb84341 ("drm/amdgpu/gmc11: set gart placement GC11") Reported-by: Jesse Zhang Reported-by: Yifan Zhang Signed-off-by: Alex Deucher Cc: christian.koe...@amd.com Reviewed-by: Ch

Re: [PATCH] drm/amdgpu: fix AGP addressing when GART is not at 0

2023-11-10 Thread Christian König
Am 10.11.23 um 15:47 schrieb Alex Deucher: This worked by luck if the GART aperture ended up at 0. When we ended up moving GART on some chips, the GART aperture ended up offsetting the AGP address since the resource->start is a GART offset, not an MC address. Fix this by moving the AGP addr

Re: [PATCH] drm/amdgpu: Skip execution of pending reset jobs

2023-11-10 Thread Christian König
Am 09.11.23 um 08:38 schrieb Lijo Lazar: cancel_work is not backported to all custom kernels. Well this is pretty clear NAK to pushing this upstream. We absolutely can't add workaround for older kernels. You could keep this in the backported kernel, but why should cancel_work not be availab

Re: [PATCH] drm/amdgpu: exclude domain start when calucales offset for AGP aperture BOs

2023-11-10 Thread Christian König
Just call amdgpu_gmc_agp_addr() and check the return value for != AMDGPU_BO_INVALID_OFFSET; The problem is simply that we can't cache that result anywhere because bo->resource->start is essentially the offset into the GART and not the MC address. That must have been sneaked in years ago when

Re: [PATCH] drm/amdgpu: exclude domain start when calucales offset for AGP aperture BOs

2023-11-10 Thread Christian König
No, that's broken as well. The problem is in amdgpu_ttm_alloc_gart():     if (addr != AMDGPU_BO_INVALID_OFFSET) {     bo->resource->start = addr >> PAGE_SHIFT;     return 0;     } bo->resource->start is relative to the GART address, so we can't assign the AGP ad
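
The arithmetic behind that objection can be shown with a minimal standalone sketch. It assumes only what the message states, namely that the start field is a page offset relative to the GART base. The struct, the function names, and every constant below are hypothetical illustrations, not the amdgpu definitions.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumed 4 KiB pages, for illustration only */

/* Hypothetical aperture layout; real values depend on the chip. */
struct sketch_apertures {
    uint64_t gart_start; /* MC address where the GART aperture begins */
    uint64_t agp_start;  /* MC address where the AGP aperture begins */
};

/* Decode a start field the way a GART-relative page offset must be decoded. */
static uint64_t sketch_gart_mc_addr(const struct sketch_apertures *ap,
                                    uint64_t start_page)
{
    return ap->gart_start + (start_page << SKETCH_PAGE_SHIFT);
}

/*
 * Store an absolute AGP MC address in the start field (the assignment the
 * message objects to) and then decode it as a GART-relative offset.
 */
static uint64_t sketch_misdecoded_agp_addr(const struct sketch_apertures *ap,
                                           uint64_t agp_mc_addr)
{
    uint64_t start_page = agp_mc_addr >> SKETCH_PAGE_SHIFT;

    assert((agp_mc_addr & ((UINT64_C(1) << SKETCH_PAGE_SHIFT) - 1)) == 0);
    return sketch_gart_mc_addr(ap, start_page);
}
```

In this sketch the decoded address equals the intended AGP address only when `gart_start` is 0; with any other GART base the result is shifted by exactly that base.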

Re: [PATCH] drm/amdgpu: exclude domain start when calucales offset for AGP aperture BOs

2023-11-10 Thread Christian König
Am 10.11.23 um 13:52 schrieb Yifan Zhang: For BOs in AGP aperture, tbo.resource->start includes AGP aperture start. Well big NAK to that. tbo.resource->start should never ever include the AGP aperture start in the first place. How did that happen? Regards, Christian. Don't add it again i

Re: [PATCH 1/3] drm/amdgpu/gmc11: disable AGP aperture

2023-11-10 Thread Christian König
Am 09.11.23 um 15:41 schrieb Alex Deucher: We've had misc reports of random IOMMU page faults when this is used. It's just a rarely used optimization anyway, so let's just disable it. Signed-off-by: Alex Deucher Acked-by: Christian König for the series. --- drivers/gpu

Re: [PATCH] drm/amdgpu: move UVD and VCE sched entity init after sched init

2023-11-10 Thread Christian König
ated after scheduler init, so change the ordering to fix this. v2: Leave logic in UVD and VCE code Fixes: 56e449603f0ac5 ("drm/sched: Convert the GPU scheduler to variable number of run-queues") Signed-off-by: Alex Deucher Cc: ltuiko...@gmail.com Reviewed-by: Christian König ---

[PATCH 1/2] drm/amdgpu: fix error handling in amdgpu_bo_list_get()

2023-11-09 Thread Christian König
We should not leak the pointer where we couldn't grab the reference on to the caller because it can be that the error handling still tries to put the reference then. Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c | 1 + 1 file changed, 1 insertion(+) diff
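
The rule in that description, never hand the caller a pointer on which no reference was taken, can be sketched in standalone C. This is an illustration under assumptions, not the patch itself: the `sketch_*` names are hypothetical, and the plain integer counter stands in for an atomic reference count, so the sketch is single-threaded only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical refcounted object; 0 means it is already being destroyed. */
struct sketch_list {
    int refcount;
};

/* Take a reference only if the object is still alive (non-atomic stand-in). */
static int sketch_get_unless_zero(struct sketch_list *list)
{
    if (list->refcount == 0)
        return 0;
    list->refcount++;
    return 1;
}

/* Drop a reference previously taken. */
static void sketch_put(struct sketch_list *list)
{
    assert(list->refcount > 0);
    list->refcount--;
}

/*
 * Look up an object and return it with a reference held. 'found' stands in
 * for the result of a table lookup and may be NULL.
 */
static int sketch_list_get(struct sketch_list *found, struct sketch_list **result)
{
    *result = found;
    if (*result && sketch_get_unless_zero(*result))
        return 0;

    /* No reference was taken, so do not leak the pointer to the caller. */
    *result = NULL;
    return -ENOENT;
}
```

With `*result` reset to NULL on failure, a caller whose error path unconditionally puts a non-NULL result cannot drop a reference it never held.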

[PATCH 2/2] drm/amdgpu: lower CS errors to debug severity

2023-11-09 Thread Christian König
Otherwise userspace can spam the logs by using incorrect input values. Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c

Re: [PATCH 5/6] drm/amdkfd: Import DMABufs for interop through DRM

2023-11-09 Thread Christian König
-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h| 9 ++- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 64 +-- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 15 ++--- 3 files changed, 52 insertions(+), 36 deletions(-) diff --git a/drivers/gpu/drm/

Re: [PATCH 2/6] drm/amdgpu: New VM state for evicted user BOs

2023-11-09 Thread Christian König
Am 08.11.23 um 22:23 schrieb Felix Kuehling: On 2023-11-08 07:28, Christian König wrote: Not necessarily objections to this patch here, but rather how this new state is used later on. The fundamental problem is that re-validating things in amdgpu_vm_handle_moved() won't work in all

Re: [Patch v2] drm/ttm: Schedule delayed_delete worker closer

2023-11-08 Thread Christian König
such as GFXIP9.4.3. Acked-by: Felix Kuehling Signed-off-by: Rajneesh Bhardwaj Reviewed-by: Christian König Going to push this to drm-misc-next. Thanks, Christian. --- Changes in v2: - Absorbed the feedback provided by Christian in the commit message and the comment. drivers/gpu/drm

Re: [PATCH 2/6] drm/amdgpu: New VM state for evicted user BOs

2023-11-08 Thread Christian König
Not necessarily objections to this patch here, but rather how this new state is used later on. The fundamental problem is that re-validating things in amdgpu_vm_handle_moved() won't work in all cases. The general approach for both classic CS IOCTL as well as user queues is the following: 1.

Re: [PATCH 1/6] drm/amdgpu: Fix possible null pointer dereference

2023-11-08 Thread Christian König
Am 07.11.23 um 17:58 schrieb Felix Kuehling: mem = bo->tbo.resource may be NULL in amdgpu_vm_bo_update. Fixes: 180253782038 ("drm/ttm: stop allocating dummy resources during BO creation") Signed-off-by: Felix Kuehling Reviewed-by: Christian König --- drivers/gpu/d

Re: [PATCH] drm/ttm: Schedule delayed_delete worker closer

2023-11-08 Thread Christian König
Am 07.11.23 um 20:45 schrieb Rajneesh Bhardwaj: When a TTM BO is getting freed, to optimize the clearing operation on the workqueue, schedule it closer to a NUMA node where the memory was allocated. This avoids the cases where the ttm_bo_delayed_delete gets scheduled on the CPU cores that are acr

Re: [PATCH] drm/amdgpu: fix AGP init order

2023-11-07 Thread Christian König
g/archives/amd-gfx/2023-November/100966.html Signed-off-by: Alex Deucher Cc: Mikhail Gavrilov Reviewed-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 3 --- drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 1 + drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c | 1 + drivers/g

Re: [PATCH] drm/amdgpu: move buffer funcs setting up a level

2023-11-07 Thread Christian König
Am 07.11.23 um 16:14 schrieb Alex Deucher: On Tue, Nov 7, 2023 at 9:52 AM Christian König wrote: This is only needed for certain UVD/VCE hardware/firmware versions. We currently enable it for all UVD and VCE versions. Leo needs to decide that, but I would rather like to keep the call to

Re: [PATCH] drm/amdgpu: move buffer funcs setting up a level

2023-11-07 Thread Christian König
wrote: On Tue, Nov 7, 2023 at 5:52 AM Christian König wrote: Am 03.11.23 um 23:10 schrieb Alex Deucher: On Fri, Nov 3, 2023 at 4:17 PM Alex Deucher wrote: On Thu, Oct 26, 2023 at 4:17 PM Luben Tuikov wrote: Pushed to drm-misc-next. BTW, I'm seeing the following on older GPUs with VC

Re: [PATCH] drm/amdgpu: move buffer funcs setting up a level

2023-11-07 Thread Christian König
0-26 15:52, Luben Tuikov wrote: On 2023-10-26 15:32, Alex Deucher wrote: On Thu, Oct 26, 2023 at 2:22 AM Christian König wrote: Am 25.10.23 um 19:19 schrieb Alex Deucher: Rather than doing this in the IP code for the SDMA paging engine, move it up to the core device level init level. This

Re: [PATCH v2 2/3] drm/amdgpu: Add flag to disable implicit sync for GEM operations.

2023-11-06 Thread Christian König
Am 06.11.23 um 16:47 schrieb Tatsuyuki Ishi: On Nov 6, 2023, at 22:44, Christian König wrote: Am 02.11.23 um 15:04 schrieb Tatsuyuki Ishi: In Vulkan, it is the application's responsibility to perform adequate synchronization before a sparse unmap, replace or BO destroy operation. Unti

Re: [PATCH v2 2/3] drm/amdgpu: Add flag to disable implicit sync for GEM operations.

2023-11-06 Thread Christian König
 Am 02.11.23 um 15:04 schrieb Tatsuyuki Ishi: In Vulkan, it is the application's responsibility to perform adequate synchronization before a sparse unmap, replace or BO destroy operation. Until now, the kernel applied the same rule as implicitly-synchronized APIs like OpenGL, which with per-VM BO

Re: [PATCH 3/6] drm/amdgpu: Flush VM updates for split bindings eagerly.

2023-11-06 Thread Christian König
Am 06.11.23 um 08:56 schrieb Tatsuyuki Ishi: On Oct 31, 2023, at 23:07, Christian König wrote: Am 31.10.23 um 14:59 schrieb Bas Nieuwenhuizen: On Tue, Oct 31, 2023 at 2:57 PM Christian König wrote: Am 31.10.23 um 14:40 schrieb Tatsuyuki Ishi: > The curr

Re: [PATCH] drm/amdgpu: fix error handling in amdgpu_vm_init

2023-11-06 Thread Christian König
Am 01.11.23 um 23:00 schrieb Felix Kuehling: On 2023-10-31 11:18, Alex Deucher wrote: On Tue, Oct 31, 2023 at 11:12 AM Christian König wrote: When clearing the root PD fails we need to properly release it again. Signed-off-by: Christian König Acked-by: Alex Deucher Has this been submitted

Re: [PATCH] drm/amdgpu: Fix the vram base start address

2023-11-02 Thread Christian König
Am 01.11.23 um 20:13 schrieb Arunpravin Paneer Selvam: Hi Christian, On 10/30/2023 9:34 PM, Christian König wrote: Am 30.10.23 um 13:22 schrieb Arunpravin Paneer Selvam: If the size returned by drm buddy allocator is higher than the required size, we take the higher size to calculate the

Re: [PATCH 2/2] drm/amdgpu: Use drm_exec for seq64 bo lock

2023-11-02 Thread Christian König
Am 01.11.23 um 17:26 schrieb Arunpravin Paneer Selvam: Replace seq64 bo lock sequences with drm_exec. Signed-off-by: Alex Deucher Reviewed-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c | 73 ++- 1 file changed, 33 insertions(+), 40 deletions

Re: [PATCH] drm/amdgpu: don't put MQDs in VRAM on ARM | ARM64

2023-11-02 Thread Christian König
s in VRAM") Signed-off-by: Alex Deucher Cc: alexey.kli...@linaro.org Acked-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c index c92

Re: [PATCH 1/2] drm/amdgpu: Enable seq64 manager and fix bugs

2023-11-02 Thread Christian König
needs to be mapped read only in the user VM (Alex) v2: - Instead of just one define for TOP/BOTTOM reserved space separate them into two (Christian) - Fix the CPU and VA calculations and while at it also cleanup error handling and kerneldoc (Christian) Signed-off-by: Christian König

Re: [PATCH 3/6] drm/amdgpu: Flush VM updates for split bindings eagerly.

2023-11-01 Thread Christian König
Am 02.11.23 um 03:36 schrieb Lang Yu: On 10/31/ , Christian König wrote: Am 31.10.23 um 14:59 schrieb Bas Nieuwenhuizen: On Tue, Oct 31, 2023 at 2:57 PM Christian König wrote: Am 31.10.23 um 14:40 schrieb Tatsuyuki Ishi: > The current amdgpu_gem_va_update_vm only tries to perf

[PATCH] drm/amdgpu: fix error handling in amdgpu_vm_init

2023-10-31 Thread Christian König
When clearing the root PD fails we need to properly release it again. Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 31 +- 1 file changed, 16 insertions(+), 15 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu
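
The shape of such a fix, releasing what was allocated when a later initialization step fails, is commonly written with a goto-based unwind. The following standalone sketch shows only that general pattern. The `sketch_*` names, the allocation counter, and the `force_fail` switch are hypothetical and do not reproduce amdgpu_vm_init().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical VM object owning one root allocation. */
struct sketch_vm {
    int *root;
};

/* Counts outstanding root allocations so leaks are observable. */
static int sketch_live_allocs;

static int *sketch_alloc_root(void)
{
    int *root = malloc(sizeof(*root));

    if (root)
        sketch_live_allocs++;
    return root;
}

static void sketch_free_root(int *root)
{
    free(root);
    sketch_live_allocs--;
}

/* Stand-in for the clearing step; fails when force_fail is non-zero. */
static int sketch_clear_root(int *root, int force_fail)
{
    if (force_fail)
        return -EIO;
    *root = 0;
    return 0;
}

static int sketch_vm_init(struct sketch_vm *vm, int force_fail)
{
    int r;

    vm->root = sketch_alloc_root();
    if (!vm->root)
        return -ENOMEM;

    r = sketch_clear_root(vm->root, force_fail);
    if (r)
        goto error_free_root;

    return 0;

error_free_root:
    /* Clearing failed: release the root again instead of leaking it. */
    sketch_free_root(vm->root);
    vm->root = NULL;
    return r;
}
```

Resetting `vm->root` to NULL on the error path also keeps a later teardown from freeing the same allocation twice.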

Re: [PATCH 2/6] drm/amdgpu: Separate eviction from VM status.

2023-10-31 Thread Christian König
Am 31.10.23 um 15:39 schrieb Tatsuyuki Ishi: On Oct 31, 2023, at 22:55, Christian König wrote: Am 31.10.23 um 14:40 schrieb Tatsuyuki Ishi: In short, eviction never really belonged to the vm_status state machine. I strongly disagree to that. Even when evicted, the BO could belong to

Re: [PATCH 5/6] drm/amdgpu: Add flag to disable implicit sync for GEM operations.

2023-10-31 Thread Christian König
Am 31.10.23 um 15:14 schrieb Michel Dänzer: On 10/31/23 14:40, Tatsuyuki Ishi wrote: In Vulkan, it is the application's responsibility to perform adequate synchronization before a sparse unmap, replace or BO destroy operation. Until now, the kernel applied the same rule as implicitly-synchroni

Re: [PATCH 3/6] drm/amdgpu: Flush VM updates for split bindings eagerly.

2023-10-31 Thread Christian König
Am 31.10.23 um 14:59 schrieb Bas Nieuwenhuizen: On Tue, Oct 31, 2023 at 2:57 PM Christian König wrote: Am 31.10.23 um 14:40 schrieb Tatsuyuki Ishi: > The current amdgpu_gem_va_update_vm only tries to perform updates for the > BO specified in the GEM ioctl; however,

Re: [PATCH 4/6] drm/amdgpu: Remove redundant state change after validation.

2023-10-31 Thread Christian König
Am 31.10.23 um 14:40 schrieb Tatsuyuki Ishi: All the state changes are handled in the TTM move callback; doing it again here just leads to more confusion. The state move here is because we need to track which PDs/PTs are already validated and which have new locations reflected in the PDEs. W

Re: [PATCH 3/6] drm/amdgpu: Flush VM updates for split bindings eagerly.

2023-10-31 Thread Christian König
Am 31.10.23 um 14:40 schrieb Tatsuyuki Ishi: The current amdgpu_gem_va_update_vm only tries to perform updates for the BO specified in the GEM ioctl; however, when a binding is split, the adjacent bindings also need to be updated. Such updates currently end up getting deferred until next submiss

Re: [PATCH 2/6] drm/amdgpu: Separate eviction from VM status.

2023-10-31 Thread Christian König
Am 31.10.23 um 14:40 schrieb Tatsuyuki Ishi: In short, eviction never really belonged to the vm_status state machine. I strongly disagree to that. Even when evicted, the BO could belong to either the moved or done state. The "evicted" state needed to handle both cases, causing greater confusi

Re: [REGRESSION] rx7600 stopped working after "1cfb4d612127 drm/amdgpu: put MQDs in VRAM"

2023-10-31 Thread Christian König
Hi Alexey, trying to answer some of the questions since Alex is currently on vacation. Am 30.10.23 um 17:01 schrieb Alexey Klimov: Hi Alex, On Thu, 26 Oct 2023 at 19:53, Alex Deucher wrote: On Thu, Oct 26, 2023 at 1:33 PM Alexey Klimov wrote: #regzbot introduced: 1cfb4d612127 #regzbot titl

Re: [PATCH] /drm/amdgpu: correct chunk_ptr to a pointer to chunk.

2023-10-31 Thread Christian König
Am 31.10.23 um 03:55 schrieb YuanShang: The variable "chunk_ptr" should be a pointer pointing to a struct drm_amdgpu_cs_chunk instead of to a pointer of that. Signed-off-by: YuanShang Good catch, Reviewed-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +

Re: [RFC PATCH] drm/amdkfd: Run restore_workers on freezable WQs

2023-10-30 Thread Christian König
Am 30.10.23 um 18:56 schrieb Felix Kuehling: On 2023-10-30 13:48, Christian König wrote: Am 30.10.23 um 18:38 schrieb Felix Kuehling: On 2023-10-30 12:16, Christian König wrote: @@ -1904,6 +1906,19 @@ kfd_process_gpuid_from_node(struct kfd_process *p, struct kfd_node *node,   return

Re: [RFC PATCH] drm/amdkfd: Run restore_workers on freezable WQs

2023-10-30 Thread Christian König
Am 30.10.23 um 18:38 schrieb Felix Kuehling: On 2023-10-30 12:16, Christian König wrote: @@ -1904,6 +1906,19 @@ kfd_process_gpuid_from_node(struct kfd_process *p, struct kfd_node *node,   return -EINVAL;   }   +static void signal_eviction_fence(struct kfd_process *p) +{ +    spin_lock

Re: [RFC PATCH] drm/amdkfd: Run restore_workers on freezable WQs

2023-10-30 Thread Christian König
Am 30.10.23 um 16:16 schrieb Felix Kuehling: On 2023-10-30 4:23, Christian König wrote: Am 28.10.23 um 00:39 schrieb Felix Kuehling: Make restore workers freezable so we don't have to explicitly flush them in suspend and GPU reset code paths, and we don't accidentally try to restore

Re: [PATCH] drm/amdgpu: Fix the vram base start address

2023-10-30 Thread Christian König
d-off-by: Arunpravin Paneer Selvam Acked-by: Christian König IIRC that hack with the start address is actually not needed any more, but we need to double check this. Christian. --- drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 15 +-- 1 file changed, 13 insertions(+), 2 deleti

Re: [PATCH 6/7] drm/exec: Pass in initial # of objects

2023-10-30 Thread Christian König
Am 30.10.23 um 14:38 schrieb Rob Clark: On Mon, Oct 30, 2023 at 1:05 AM Christian König wrote: Am 27.10.23 um 18:58 schrieb Rob Clark: From: Rob Clark In cases where the # is known ahead of time, it is silly to do the table resize dance. Ah, yes that was my initial implementation as well

Re: [PATCH] drm/amdgpu: Fixes uninitialized variable usage in amdgpu_dm_setup_replay

2023-10-30 Thread Christian König
Am 28.10.23 um 02:48 schrieb Yuran Pereira: Hello, On Fri, Oct 27, 2023 at 11:57:45AM -0400, Hamza Mahfooz wrote: On 10/27/23 11:55, Lakha, Bhawanpreet wrote: [AMD Official Use Only - General] There was a consensus to use memset instead of {0}. I remember making changes related to that previ

Re: [RFC PATCH] drm/amdkfd: Run restore_workers on freezable WQs

2023-10-30 Thread Christian König
Am 28.10.23 um 00:39 schrieb Felix Kuehling: Make restore workers freezable so we don't have to explicitly flush them in suspend and GPU reset code paths, and we don't accidentally try to restore BOs while the GPU is suspended. Not having to flush restore_work also helps avoid lock/fence dependen

Re: [PATCH 6/7] drm/exec: Pass in initial # of objects

2023-10-30 Thread Christian König
+ exec->objects = kmalloc(sz, GFP_KERNEL); Please use k*v*malloc() here since we can't predict how large that will be. With that fixed the patch is Reviewed-by: Christian König . Regards, Christian. /* If allocation here fails, just delay that till the first use

Re: [PATCH] drm/amd: Fix UBSAN array-index-out-of-bounds for Powerplay headers

2023-10-29 Thread Christian König
Am 27.10.23 um 22:41 schrieb Alex Deucher: For pptable structs that use flexible array sizes, use flexible arrays. Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2039926 Signed-off-by: Alex Deucher Acked-by: Christian König --- .../drm/amd/pm/powerplay/hwmgr/pptable_v1_0.h

Re: [PATCH] drm/amdgpu/gfx10,11: use memcpy_to/fromio for MQDs

2023-10-27 Thread Christian König
Am 26.10.23 um 20:56 schrieb Alex Deucher: Since they were moved to VRAM, we need to use the IO variants of memcpy. Fixes: 1cfb4d612127 ("drm/amdgpu: put MQDs in VRAM") Signed-off-by: Alex Deucher Reviewed-by: Christian König --- drivers/gpu/drm/amd/amdgpu/gfx_v1

Re: [PATCH] MAINTAINERS: Update the GPU Scheduler email

2023-10-27 Thread Christian König
Am 26.10.23 um 21:32 schrieb Alex Deucher: On Thu, Oct 26, 2023 at 1:45 PM Luben Tuikov wrote: Update the GPU Scheduler maintainer email. Cc: Alex Deucher Cc: Christian König Cc: Daniel Vetter Cc: Dave Airlie Cc: AMD Graphics Cc: Direct Rendering Infrastructure - Development Signed-off

Re: [PATCH 1/2] drm/amdgpu: Remove duplicate fdinfo fields

2023-10-27 Thread Christian König
veloped-by: Umio Yasuno Signed-off-by: Umio Yasuno Reviewed-by: Christian König for the series. --- This thread has been inactive for nearly 5 months, so I re-created the patch. drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/drivers/gpu/drm/

Re: [PATCH] drm/amdgpu: remove amdgpu_mes_self_test in gpu recover

2023-10-26 Thread Christian König
remove hardware queue, queue id = 3 Fixes: d0c860f33553 ("drm/amdgpu: rework lock handling for flush_tlb v2") Reported-by: Li Ma Signed-off-by: Yifan Zhang Reviewed-by: Christian König Thanks for looking into this, Christian. --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |

Re: [PATCH v7 11/12] drm/amdgpu: introduce userqueue eviction fence

2023-10-26 Thread Christian König
Am 10.10.23 um 12:17 schrieb Shashank Sharma: This patch adds support for userqueue eviction fences. In general, when a process wants to map VRAM memory but TTM can't find enough space, it attempts to evict BOs from its LRU list. This fence will prevent the TTM manager from evicting the process's

Re: [PATCH] drm/amdgpu: remove the seq64 map sequence temporarily

2023-10-26 Thread Christian König
Am 26.10.23 um 12:24 schrieb Arunpravin Paneer Selvam: Temporarily remove the seq64 mapping sequence. Signed-off-by: Arunpravin Paneer Selvam Reviewed-by: Christian König Please push to amd-staging-drm-next ASAP and ping Kenny when that's merged (or if it doesn't merge aut

Re: [PATCH v3 1/2] drm/amdgpu: Acquire ttm locks for dmaunmap

2023-10-25 Thread Christian König
Am 25.10.23 um 20:45 schrieb Felix Kuehling: On 2023-10-25 02:12, Christian König wrote: Am 24.10.23 um 21:20 schrieb David Francis: dmaunmap can call ttm_bo_validate, which expects the ttm dma_resv to be held. Well first of all the dma_resv object isn't related to TTM. Acquire the

Re: [PATCH] drm/amdgpu: move buffer funcs setting up a level

2023-10-25 Thread Christian König
Deucher I don't know off hand if the high-level function really covers everything, so only Acked-by: Christian König for now. Christian. --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 15 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c | 21 - drivers/gp

Re: [PATCH v3 1/2] drm/amdgpu: Acquire ttm locks for dmaunmap

2023-10-24 Thread Christian König
Am 24.10.23 um 21:20 schrieb David Francis: dmaunmap can call ttm_bo_validate, which expects the ttm dma_resv to be held. Well first of all the dma_resv object isn't related to TTM. Acquire the locks in amdgpu_amdkfd_gpuvm_dmaunmap_mem. Because the dmaunmap step can now fail, two new number

Re: [PATCH] drm/amdgpu: move buffer funcs setting up a level (v2)

2023-10-24 Thread Christian König
Am 25.10.23 um 06:24 schrieb Luben Tuikov: From: Alex Deucher Rather than doing this in the IP code for the SDMA paging engine, move it up to the core device level init level. This should fix the scheduler init ordering. v2: Fix checkpatch parens complaint; long lines. (Luben) Signed-off-by:

Re: [PATCH] drm/amdgpu: Initialize schedulers before using them

2023-10-24 Thread Christian König
ned-off-by: Andrey Grodzovsky Reviewed-by: Christian König Link: https://www.spinics.net/lists/amd-gfx/msg74112.html Andrey separated the scheduler initialization from the ring init because we need some of the rings for XGMI initialization which in turn is necessary to figure out the

Re: [PATCH] Revert "drm/amdgpu: remove vm sanity check from amdgpu_vm_make_compute"

2023-10-24 Thread Christian König
Am 24.10.23 um 01:41 schrieb Felix Kuehling: [sorry, I hit send too early] On 2023-10-23 11:15, Christian König wrote: Am 23.10.23 um 15:06 schrieb Daniel Tang: That commit causes the screen to freeze a few moments after running clinfo on v6.6-rc7 and ROCm 5.6. Sometimes the rest of the

Re: [PATCH 2/2] drm/amdgpu: Add timeout for sync wait

2023-10-23 Thread Christian König
Am 20.10.23 um 11:59 schrieb Emily Deng: Issue: A deadlock happens during GPU recovery; the call sequence is as below: amdgpu_device_gpu_recover->amdgpu_amdkfd_pre_reset->flush_delayed_work-> amdgpu_amdkfd_gpuvm_restore_process_bos->amdgpu_sync_wait Resolving a deadlock with a timeout is illegal in gene

Re: [PATCH] drm/amdgpu: Initialize schedulers before using them

2023-10-23 Thread Christian König
Am 24.10.23 um 04:55 schrieb Luben Tuikov: On 2023-10-23 01:49, Christian König wrote: Am 23.10.23 um 05:23 schrieb Luben Tuikov: Initialize ring schedulers before using them, very early in the amdgpu boot, at PCI probe time, specifically at frame-buffer dumb-create at fill-buffer. This was

Re: [PATCH] Revert "drm/amdgpu: remove vm sanity check from amdgpu_vm_make_compute"

2023-10-23 Thread Christian König
Am 23.10.23 um 15:06 schrieb Daniel Tang: That commit causes the screen to freeze a few moments after running clinfo on v6.6-rc7 and ROCm 5.6. Sometimes the rest of the computer including ssh also freezes. On v6.5-rc1, it only results in a NULL pointer dereference message in dmesg and the process t

Re: [PATCH 1/2] drm/amdgpu: handle the return for sync wait

2023-10-23 Thread Christian König
Am 20.10.23 um 11:59 schrieb Emily Deng: Add error handling for amdgpu_sync_wait. Signed-off-by: Emily Deng Reviewed-by: Christian König for this one. Going to discuss with Felix later today what we do with the timeout. Christian. --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c

Re: [PATCH 1/2] drm/amdgpu: Add timeout for sync wait

2023-10-23 Thread Christian König
Am 20.10.23 um 21:47 schrieb Felix Kuehling: On 2023-10-20 09:10, Christian König wrote: No, the wait forever is what is expected and perfectly valid user experience. Waiting with a timeout on the other hand sounds like a really bad idea to me. Every wait with a timeout needs a

Re: [PATCH] drm/amdgpu: Initialize schedulers before using them

2023-10-22 Thread Christian König
lized. Cc: Christian König Cc: Alex Deucher Cc: Felix Kuehling Cc: AMD Graphics Signed-off-by: Luben Tuikov --- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 14 ++ 1 file changed, 14 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/am

Re: [PATCH 1/2] drm/amdgpu: Add timeout for sync wait

2023-10-20 Thread Christian König
ssage- From: Christian König Sent: Friday, October 20, 2023 3:29 PM To: Deng, Emily ; amd-gfx@lists.freedesktop.org Subject: Re: [PATCH 1/2] drm/amdgpu: Add timeout for sync wait Am 20.10.23 um 08:13 schrieb Emily Deng: Issue: A deadlock happens during GPU recovery; the call se

[PATCH] drm/amdkfd: reserve a fence slot while locking the BO

2023-10-20 Thread Christian König
Looks like the KFD still needs this. Signed-off-by: Christian König Fixes: 8abc1eb2987a ("drm/amdkfd: switch over to using drm_exec v3") --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/

Re: [PATCH Review 1/1] drm/amdgpu: Workaround to skip kiq ring test during ras gpu recovery

2023-10-20 Thread Christian König
Am 17.10.23 um 16:36 schrieb Stanley.Yang: This is workaround, kiq ring test failed in suspend stage when do ras recovery for gfx v9_4_3. Any idea why that failed? Problems like this usually point to an incorrect init or in this case re-init procedure and are actually what the ring test shoul

Re: [PATCH 2/2] drm/amdgpu: handle the return for sync wait

2023-10-20 Thread Christian König
Am 20.10.23 um 08:13 schrieb Emily Deng: You need a patch description and this patch here needs to come first and not second. Christian. Signed-off-by: Emily Deng --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 9 ++--- drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 6

Re: [PATCH 1/2] drm/amdgpu: Add timeout for sync wait

2023-10-20 Thread Christian König
Am 20.10.23 um 08:13 schrieb Emily Deng: Issue: A deadlock happens during GPU recovery; the call sequence is as below: amdgpu_device_gpu_recover->amdgpu_amdkfd_pre_reset->flush_delayed_work-> amdgpu_amdkfd_gpuvm_restore_process_bos->amdgpu_sync_wait It is because the amdgpu_sync_wait is waiting for the b

Re: [PATCH] drm/amdgpu: ignore duplicate BOs again

2023-10-19 Thread Christian König
Am 17.10.23 um 15:04 schrieb Alex Deucher: On Tue, Oct 17, 2023 at 8:22 AM Christian König wrote: Looks like RADV is actually hitting this. Signed-off-by: Christian König Fixes: ca6c1e210aa7 ("drm/amdgpu: use the new drm_exec object for CS v3") Acked-by: Alex Deucher Pushed t

Re: [PATCH v4 13/17] platform/x86/amd/pmf: Add PMF-AMDGPU get interface

2023-10-19 Thread Christian König
Am 18.10.23 um 19:05 schrieb Shyam Sundar S K: On 10/18/2023 9:37 PM, Christian König wrote: Am 18.10.23 um 17:47 schrieb Mario Limonciello: On 10/18/2023 08:40, Christian König wrote: Am 18.10.23 um 11:28 schrieb Shyam Sundar S K: On 10/18/2023 2:50 PM, Ilpo Järvinen wrote: On Wed, 18

Re: [PATCH v4 13/17] platform/x86/amd/pmf: Add PMF-AMDGPU get interface

2023-10-18 Thread Christian König
Am 18.10.23 um 17:47 schrieb Mario Limonciello: On 10/18/2023 08:40, Christian König wrote: Am 18.10.23 um 11:28 schrieb Shyam Sundar S K: On 10/18/2023 2:50 PM, Ilpo Järvinen wrote: On Wed, 18 Oct 2023, Shyam Sundar S K wrote: In order to provide GPU inputs to TA for the Smart PC

Re: [PATCH v4 13/17] platform/x86/amd/pmf: Add PMF-AMDGPU get interface

2023-10-18 Thread Christian König
Am 18.10.23 um 11:28 schrieb Shyam Sundar S K: On 10/18/2023 2:50 PM, Ilpo Järvinen wrote: On Wed, 18 Oct 2023, Shyam Sundar S K wrote: In order to provide GPU inputs to TA for the Smart PC solution to work, we need to have interface between the PMF driver and the AMDGPU driver. Add the i

Re: [PATCH] drm/amdgpu: Remove redundant call to priority_is_valid()

2023-10-18 Thread Christian König
alled amdgpu_ctx_priority_is_valid() already first thing in the function. Cc: Alex Deucher Cc: Christian König Signed-off-by: Luben Tuikov Acked-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 15 --- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/dr

Re: [PATCH 01/11] drm/amdgpu: Fix possible null pointer dereference

2023-10-17 Thread Christian König
Am 17.10.23 um 23:13 schrieb Felix Kuehling: abo->tbo.resource may be NULL in amdgpu_vm_bo_update. Fixes: 180253782038 ("drm/ttm: stop allocating dummy resources during BO creation") Signed-off-by: Felix Kuehling Reviewed-by: Christian König --- drivers/gpu/drm/amd/amdgp

Re: [PATCH] drm/amdgpu remove restriction of sriov max_pfn on Vega10

2023-10-17 Thread Christian König
-by: Christian König --- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 7 ++- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c index 3a1050344b59..b1eb81ca64bc 100644 --- a/drivers/gpu/drm/amd/amdgpu

Re: [PATCH] drm/amdgpu: ignore duplicate BOs again

2023-10-17 Thread Christian König
Am 17.10.23 um 14:42 schrieb Hamza Mahfooz: On 10/17/23 08:10, Christian König wrote: Looks like RADV is actually hitting this. Signed-off-by: Christian König Fixes: ca6c1e210aa7 ("drm/amdgpu: use the new drm_exec object for CS v3") Do you think this will fix the following iss

[PATCH] drm/amdgpu: ignore duplicate BOs again

2023-10-17 Thread Christian König
Looks like RADV is actually hitting this. Signed-off-by: Christian König Fixes: ca6c1e210aa7 ("drm/amdgpu: use the new drm_exec object for CS v3") --- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/

Re: [PATCH] drm/amdgpu: move task_info to amdgpu_fpriv

2023-10-17 Thread Christian König
Am 17.10.23 um 09:25 schrieb Shashank Sharma: Hello Christian, Felix, Thanks for your comments, mine inline. On 17/10/2023 07:55, Christian König wrote: Am 17.10.23 um 00:15 schrieb Felix Kuehling: On 2023-10-16 13:08, Shashank Sharma wrote: This patch does the following: - moves vm

Re: [PATCH] drm/amdgpu: move task_info to amdgpu_fpriv

2023-10-16 Thread Christian König
Am 17.10.23 um 00:15 schrieb Felix Kuehling: On 2023-10-16 13:08, Shashank Sharma wrote: This patch does the following: - moves vm->task_info struct to fpriv->task_info. - makes task_info allocation dynamic. - adds reference counting support for task_info structure. - adds some new helper functi

Re: [PATCH v4 1/4] drm/amd/amdgpu/vcn: Add RB decouple feature under SRIOV - P1

2023-10-16 Thread Christian König
Am 16.10.23 um 18:52 schrieb Bokun Zhang: - Update SRIOV header with RB decouple flag Signed-off-by: Bokun Zhang --- drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h b/drivers/gpu

Re: [PATCH] drm/amdgpu: flush the correct vmid tlb for specific pasid

2023-10-12 Thread Christian König
Am 12.10.23 um 09:41 schrieb Yifan Zhang: flush the correct vmid tlb for specific pasid on gmc 11. Signed-off-by: Yifan Zhang Good catch, Reviewed-by: Christian König --- drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers

Re: [PATCH] Find bo_va before create it when map bo into compute VM

2023-10-12 Thread Christian König
when map the Bos into same VM, otherwise we may trigger kernel general protection when iterate mappings from bo_va. Signed-off-by: Felix Kuehling Acked-by: Christian König Reviewed-by: Ramesh Errabolu Reviewed-By: Xiaogang Chen Tested-By: Xiaogang Chen --- drivers/gpu/drm/amd/amdgpu

Re: [PATCH] drm/amdgpu:Check gfx poweron when skip flush_gpu_tlb

2023-10-10 Thread Christian König
Hi Feifei, yeah, that is correct behavior. The GMC callback should *not* get called during resume in a reset, because the reset needs to take care of invalidating the TLB anyway. If the latter doesn't work any more we need to re-iterate the reset procedure and not mess with this here. Regar

Re: [PATCH AUTOSEL 5.10 13/22] drm/amdgpu: install stub fence into potential unused fence pointers

2023-10-09 Thread Christian König
Am 07.10.23 um 11:50 schrieb Greg KH: On Sun, Sep 10, 2023 at 03:43:01PM -0500, Bryan Jennings wrote: This is also causing log spam on 5.15. It was included in 5.15.128 as commit 4921792e04f2125b5eadef9dbe9417a8354c7eff. I encountered this and found https://gitlab.freedesktop.org/drm/amd/-/iss

Re: [PATCH] drm/amdgpu: add missing NULL check

2023-10-09 Thread Christian König
Am 06.10.23 um 16:41 schrieb Alex Deucher: On Fri, Oct 6, 2023 at 9:07 AM Christian König wrote: bo->tbo.resource can easily be NULL here. Signed-off-by: Christian König Add a link to the bug report? Ah, crap. Forgotten to add the link before pushing that. But I've added a C

Re: [PATCH v5 5/7] drm/amd/display: Catch errors from drm_atomic_helper_suspend()

2023-10-09 Thread Christian König
tps://gitlab.freedesktop.org/drm/amd/-/issues/2362 Signed-off-by: Mario Limonciello Acked-by: Christian König --- v4->v5: * New patch --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/d

Re: [PATCH v5 4/7] drm/amd: Capture errors in amdgpu_switcheroo_set_state()

2023-10-09 Thread Christian König
ff-by: Mario Limonciello Acked-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 36 +- 1 file changed, 29 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index a362152

Re: [PATCH] drm/amdgpu:Check gfx poweron when skip flush_gpu_tlb

2023-10-09 Thread Christian König
Am 09.10.23 um 03:50 schrieb Xu, Feifei: [AMD Official Use Only - General] Hi, Based on your description, the above code should use "||" instead of "&&", && is to add more restriction here. To avoid skipping necessary TLB flush by return. For Asics < GFX11, !adev->gfx.is_poweron is always t

Re: [PATCH v5 2/7] drm/amd: Add concept of running prepare() sequence for IP blocks

2023-10-09 Thread Christian König
to allocate memory before suspend starts so that it can potentially be evicted into swap instead. Signed-off-by: Mario Limonciello Apart from the commit message Reviewed-by: Christian König . Regards, Christian. --- v4->v5: * New patch --- drivers/gpu/drm/amd/amdgpu/amdgpu_devic

Re: [PATCH v5 1/7] drm/amd: Evict resources during PM ops prepare() callback

2023-10-09 Thread Christian König
may fail. So move this step into prepare() to move evict majority of resources and update all non-pmops callers to call the same callback. Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2362 Signed-off-by: Mario Limonciello Reviewed-by: Christian König for this one. --- v4->

Re: [PATCH v2 4/4] drm/amd/pm: Add sysfs attribute to get pm log

2023-10-09 Thread Christian König
Am 06.10.23 um 16:24 schrieb Alex Deucher: On Fri, Oct 6, 2023 at 4:32 AM Christian König wrote: Am 06.10.23 um 07:21 schrieb Lijo Lazar: Add sysfs attribute to read power management log. A snapshot is captured to the buffer when the attribute is read. Signed-off-by: Lijo Lazar --- v2

[PATCH] drm/amdgpu: add missing NULL check

2023-10-06 Thread Christian König
bo->tbo.resource can easily be NULL here. Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h index f3ee83cdf

Re: [PATCH v2 4/4] drm/amd/pm: Add sysfs attribute to get pm log

2023-10-06 Thread Christian König
Am 06.10.23 um 07:21 schrieb Lijo Lazar: Add sysfs attribute to read power management log. A snapshot is captured to the buffer when the attribute is read. Signed-off-by: Lijo Lazar --- v2: Pass PAGE_SIZE as the max size of input buffer drivers/gpu/drm/amd/pm/amdgpu_pm.c | 40 ++

Re: [PATCH v4 1/3] drm/amd: Evict resources during PM ops prepare() callback

2023-10-06 Thread Christian König
Am 05.10.23 um 16:59 schrieb Mario Limonciello: On 10/5/2023 09:39, Christian König wrote: Am 04.10.23 um 19:18 schrieb Mario Limonciello: Linux PM core has a prepare() callback run before suspend. If the system is under high memory pressure, the resources may need to be evicted into swap

Re: [PATCH v4 1/3] drm/amd: Evict resources during PM ops prepare() callback

2023-10-05 Thread Christian König
Am 04.10.23 um 19:18 schrieb Mario Limonciello: Linux PM core has a prepare() callback run before suspend. If the system is under high memory pressure, the resources may need to be evicted into swap instead. If the storage backing for swap is offlined during the suspend() step then such a call

Re: [PATCH] drm/amdgpu: update ib start and size alignment

2023-10-05 Thread Christian König
: 256 bytes Encode IB size alignment: 4 bytes Also bump amdgpu driver version for this update. Signed-off-by: Boyuan Zhang Reviewed-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 3 ++- drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 22 +++--- 2 files
