[PATCH 4/6] drm/amdgpu: New VM state for evicted user BOs

2023-12-01 Thread Felix Kuehling
Create a new VM state to track user BOs that are in the system domain. In the next patch this will be used to conditionally re-validate them in amdgpu_vm_handle_moved. Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 17 + drivers/gpu/drm/amd/amdgpu

[PATCH 1/6] Revert "drm/prime: Unexport helpers for fd/handle conversion"

2023-12-01 Thread Felix Kuehling
-by: Christian König Acked-by: Thomas Zimmermann Acked-by: Daniel Vetter Signed-off-by: Felix Kuehling --- drivers/gpu/drm/drm_prime.c | 33 ++--- include/drm/drm_prime.h | 7 +++ 2 files changed, 25 insertions(+), 15 deletions(-) diff --git a/drivers/gpu/drm

Re: [PATCH v3] drm/amdkfd: Run restore_workers on freezable WQs

2023-11-28 Thread Felix Kuehling
On 2023-11-24 8:40, Lazar, Lijo wrote: On 11/24/2023 4:25 AM, Felix Kuehling wrote: Make restore workers freezable so we don't have to explicitly flush them in suspend and GPU reset code paths, and we don't accidentally try to restore BOs while the GPU is suspended. Not having to flush

Re: [PATCH 1/3] Revert "drm/prime: Unexport helpers for fd/handle conversion"

2023-11-28 Thread Felix Kuehling
On 2023-11-28 12:22, Alex Deucher wrote: On Thu, Nov 23, 2023 at 6:12 PM Felix Kuehling wrote: [+Alex] On 2023-11-17 16:44, Felix Kuehling wrote: This reverts commit 71a7974ac7019afeec105a54447ae1dc7216cbb3. These helper functions are needed for KFD to export and import DMABufs the right

Re: [PATCH] drm/amdgpu: Fix uninitialized return value

2023-11-28 Thread Felix Kuehling
On 2023-11-28 8:18, Christian König wrote: Am 28.11.23 um 10:49 schrieb Lazar, Lijo: On 11/28/2023 3:07 PM, Christian König wrote: Am 27.11.23 um 22:55 schrieb Alex Deucher: On Mon, Nov 27, 2023 at 2:22 PM Christian König wrote: Am 27.11.23 um 19:29 schrieb Lijo Lazar: The return value is

Re: [PATCH 21/24] drm/amdkfd: add queue remapping

2023-11-23 Thread Felix Kuehling
On 2023-11-23 17:41, Greathouse, Joseph wrote: [Public] -Original Message- From: Zhu, James Sent: Thursday, November 23, 2023 1:49 PM On 2023-11-23 14:02, Felix Kuehling wrote: On 2023-11-23 11:25, James Zhu wrote: On 2023-11-22 17:35, Felix Kuehling wrote: On 2023-11-03 09:11

[PATCH v3] drm/amdkfd: Run restore_workers on freezable WQs

2023-11-23 Thread Felix Kuehling
-by: Felix Kuehling Acked-by: Christian König --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 68 +++ drivers/gpu/drm/amd/amdkfd/kfd_process.c | 87 +++ drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 4 +- 3 files changed, 104 insertions(+), 55 deletions

Re: [PATCH v3] drm/amdkfd: Free gang_ctx_bo and wptr_bo in pqm_uninit

2023-11-23 Thread Felix Kuehling
ess_info, pqn->q->gws); + pdd->qpd.num_gws = 0; With that fixed, the patch is Reviewed-by: Felix Kuehling + } + + if (dev->kfd->shared_resources.enable_mes) { + amdgpu_amdkfd_free_gtt_mem(dev->adev, pqn->q->gang_ctx_bo); +

Re: [PATCH] drm/amdgpu: Enable event log on MES 11

2023-11-23 Thread Felix Kuehling
On 2023-11-23 14:55, shaoyunl wrote: Enable event log through the HW specific FW API Signed-off-by: shaoyunl I'm assuming that enabling the log unconditionally has no noticeable performance impact. In that case, the patch is Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu

Re: [PATCH] drm/amdgpu: SW part of MES event log enablement

2023-11-23 Thread Felix Kuehling
On 2023-11-23 16:29, Felix Kuehling wrote: On 2023-11-23 14:48, shaoyunl wrote: This is the generic SW part, prepare the event log buffer and dump it through debugfs Signed-off-by: shaoyunl Reviewed-by: Felix Kuehling Sorry, I just realized a potential problem, see inline

Re: [PATCH] drm/amdgpu: SW part of MES event log enablement

2023-11-23 Thread Felix Kuehling
On 2023-11-23 14:48, shaoyunl wrote: This is the generic SW part, prepare the event log buffer and dump it through debugfs Signed-off-by: shaoyunl Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 2 + drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.h | 2

Re: [RFC PATCH v2] drm/amdkfd: Run restore_workers on freezable WQs

2023-11-23 Thread Felix Kuehling
because evict/restore workers can run independently of it. Instead call a new restore_process_helper directly. This is an RFC and request for testing. v2: - Reworked eviction fence signaling - Introduced restore_process_helper Signed-off-by: Felix Kuehling --- .../gpu/drm/amd/amdgpu/amdgp

Re: [PATCH 07/24] drm/amdkfd: check pcs_enrty valid

2023-11-23 Thread Felix Kuehling
On 2023-11-23 15:18, James Zhu wrote: On 2023-11-22 17:15, Felix Kuehling wrote: On 2023-11-03 09:11, James Zhu wrote: Check pcs_enrty valid for pc sampling ioctl. Signed-off-by: James Zhu ---   drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 30 ++--   1 file changed, 27

Re: [PATCH 18/24] drm/amdkfd: enable pc sampling start

2023-11-23 Thread Felix Kuehling
On 2023-11-23 15:01, James Zhu wrote: On 2023-11-22 17:27, Felix Kuehling wrote: On 2023-11-03 09:11, James Zhu wrote: Enable pc sampling start. Signed-off-by: James Zhu ---   drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 26 +---   drivers/gpu/drm/amd/amdkfd/kfd_priv.h

Re: [PATCH 1/3] Revert "drm/prime: Unexport helpers for fd/handle conversion"

2023-11-23 Thread Felix Kuehling
[+Alex] On 2023-11-17 16:44, Felix Kuehling wrote: This reverts commit 71a7974ac7019afeec105a54447ae1dc7216cbb3. These helper functions are needed for KFD to export and import DMABufs the right way without duplicating the tracking of DMABufs associated with GEM objects while ensuring

Re: [PATCH] drm/amdgpu: SW part of MES event log enablement

2023-11-23 Thread Felix Kuehling
orbell cleanup: error_doorbell: amdgpu_mes_doorbell_free(adev); With that fixed, the patch is Reviewed-by: Felix Kuehling error: @@ -198,6 +224,10 @@ int amdgpu_mes_init(struct amdgpu_device *adev) void amdgpu_mes_fini(struct amdgpu_device *adev) { + amdgpu_bo_fre

Re: [PATCH 20/24] drm/amdkfd: enable pc sampling work to trigger trap

2023-11-23 Thread Felix Kuehling
On 2023-11-23 13:27, James Zhu wrote: On 2023-11-22 17:31, Felix Kuehling wrote: On 2023-11-03 09:11, James Zhu wrote: Enable a delay work to trigger pc sampling trap. Signed-off-by: James Zhu ---   drivers/gpu/drm/amd/amdkfd/kfd_device.c  |  3 ++   drivers/gpu/drm/amd/amdkfd

Re: [PATCH 21/24] drm/amdkfd: add queue remapping

2023-11-23 Thread Felix Kuehling
On 2023-11-23 11:25, James Zhu wrote: On 2023-11-22 17:35, Felix Kuehling wrote: On 2023-11-03 09:11, James Zhu wrote: Add queue remapping to force the waves in any running processes to complete a CWSR trap. Please add an explanation why this is needed. [JZ] Even though the profiling

Re: [PATCH 23/24] drm/amdkfd: add pc sampling capability check

2023-11-22 Thread Felix Kuehling
On 2023-11-03 09:11, James Zhu wrote: From: David Yat Sin Add pc sampling capability check. This should be squashed into patch 2. Or if you want to keep it separate, put this patch before patch 2 and define AMDKFD_IOC_PC_SAMPLE with KFD_IOC_FLAG_PERFMON from the beginning. Regards,  

Re: [PATCH 21/24] drm/amdkfd: add queue remapping

2023-11-22 Thread Felix Kuehling
On 2023-11-03 09:11, James Zhu wrote: Add queue remapping to force the waves in any running processes to complete a CWSR trap. Please add an explanation why this is needed. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 11 +++

Re: [PATCH 20/24] drm/amdkfd: enable pc sampling work to trigger trap

2023-11-22 Thread Felix Kuehling
On 2023-11-03 09:11, James Zhu wrote: Enable a delay work to trigger pc sampling trap. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_device.c | 3 ++ drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 39 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h |

Re: [PATCH 18/24] drm/amdkfd: enable pc sampling start

2023-11-22 Thread Felix Kuehling
On 2023-11-03 09:11, James Zhu wrote: Enable pc sampling start. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 26 +--- drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 2 ++ 2 files changed, 25 insertions(+), 3 deletions(-) diff --git

Re: [PATCH 06/24] drm/amdkfd: add trace_id return

2023-11-22 Thread Felix Kuehling
On 2023-11-03 09:11, James Zhu wrote: Add trace_id return for new pc sampling creation per device. Use IDR to quickly locate pc_sampling_entry for reference. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_device.c | 2 ++ drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c |

Re: [PATCH 07/24] drm/amdkfd: check pcs_enrty valid

2023-11-22 Thread Felix Kuehling
On 2023-11-03 09:11, James Zhu wrote: Check pcs_enrty valid for pc sampling ioctl. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 30 ++-- 1 file changed, 27 insertions(+), 3 deletions(-) diff --git

Re: [PATCH 06/24] drm/amdkfd: add trace_id return

2023-11-22 Thread Felix Kuehling
On 2023-11-03 09:11, James Zhu wrote: Add trace_id return for new pc sampling creation per device. Use IDR to quickly locate pc_sampling_entry for reference. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_device.c | 2 ++ drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c |

Re: [PATCH 05/24] drm/amdkfd: enable pc sampling create

2023-11-22 Thread Felix Kuehling
On 2023-11-03 09:11, James Zhu wrote: From: David Yat Sin Enable pc sampling create. Co-developed-by: James Zhu Signed-off-by: James Zhu Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 54 +++- drivers/gpu/drm/amd/amdkfd/kfd_priv.h

Re: [PATCH 01/24] drm/amdkfd/kfd_ioctl: add pc sampling support

2023-11-22 Thread Felix Kuehling
On 2023-11-03 09:11, James Zhu wrote: From: David Yat Sin Add pc sampling support in kfd_ioctl. Co-developed-by: James Zhu Signed-off-by: James Zhu Signed-off-by: David Yat Sin --- include/uapi/linux/kfd_ioctl.h | 57 +- 1 file changed, 56 insertions(+),

Re: [PATCH] drm/amdkfd: Use partial migrations/mapping for GPU/CPU page faults in SVM

2023-11-22 Thread Felix Kuehling
On 2023-11-14 16:01, Xiaogang.Chen wrote: From: Xiaogang Chen This patch implements partial migration/mapping for gpu/cpu page faults in SVM according to migration granularity (default 2MB). A svm range may include pages from both system ram and vram of one gpu now. These changes are expected

Re: [PATCH v2 2/4] drm/prime: Helper to export dmabuf without fd

2023-11-22 Thread Felix Kuehling
back to v1 of this patch set, which was consistent at least. I think I'd prefer that because I don't really understand what you're trying to achieve. Thanks,   Felix Best regards Thomas Am 22.11.23 um 00:11 schrieb Felix Kuehling: Change drm_gem_prime_handle_to_fd to drm_gem_prime_handle_to_dmabuf

Re: [PATCH v2] drm/amdkfd: Free gang_ctx_bo and wptr_bo in pqm_uninit

2023-11-22 Thread Felix Kuehling
On 2023-11-20 02:17, ZhenGuo Yin wrote: [Why] Memory leaks of gang_ctx_bo and wptr_bo. [How] Free gang_ctx_bo and wptr_bo in pqm_uninit. v2: add a common function pqm_clean_queue_resource to free queue's resources. Signed-off-by: ZhenGuo Yin --- .../amd/amdkfd/kfd_process_queue_manager.c

[PATCH v2 4/4] drm/amdkfd: Import DMABufs for interop through DRM

2023-11-21 Thread Felix Kuehling
Use drm_gem_prime_fd_to_handle to import DMABufs for interop. This ensures that a GEM handle is created on import and that obj->dma_buf will be set and remain set as long as the object is imported into KFD. Signed-off-by: Felix Kuehling Reviewed-by: Ramesh Errabolu Reviewed-by: Xiaogang.C

[PATCH v2 3/4] drm/amdkfd: Export DMABufs from KFD using GEM handles

2023-11-21 Thread Felix Kuehling
Create GEM handles for exporting DMABufs using GEM-Prime APIs. The GEM handles are created in a drm_client_dev context to avoid exposing them in user mode contexts through a DMABuf import. Signed-off-by: Felix Kuehling Reviewed-by: Ramesh Errabolu --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c

[PATCH v2 2/4] drm/prime: Helper to export dmabuf without fd

2023-11-21 Thread Felix Kuehling
Change drm_gem_prime_handle_to_fd to drm_gem_prime_handle_to_dmabuf to export a dmabuf without creating an FD as a user mode handle. This is more useful for users in kernel mode. Suggested-by: Thomas Zimmermann Signed-off-by: Felix Kuehling --- drivers/gpu/drm/drm_prime.c | 63

[PATCH v2 1/4] Revert "drm/prime: Unexport helpers for fd/handle conversion"

2023-11-21 Thread Felix Kuehling
: Christian König CC: Thomas Zimmermann Signed-off-by: Felix Kuehling --- drivers/gpu/drm/drm_prime.c | 33 ++--- include/drm/drm_prime.h | 7 +++ 2 files changed, 25 insertions(+), 15 deletions(-) diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm

Re: [RFC PATCH v2] drm/amdkfd: Run restore_workers on freezable WQs

2023-11-21 Thread Felix Kuehling
can run independently of it. Instead call a new restore_process_helper directly. This is an RFC and request for testing. v2: - Reworked eviction fence signaling - Introduced restore_process_helper Signed-off-by: Felix Kuehling --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 34 ++-- driv

Re: [Bug 218168] New: amdgpu: kfd_topology.c warning: the frame size of 1408 bytes is larger than 1024 bytes

2023-11-21 Thread Felix Kuehling
on the stack when inlining which can blow up the stack. Cc: Arnd Bergmann Acked-by: Arnd Bergmann Reviewed-by: Felix Kuehling Acked-by: Christian König Signed-off-by: Alex Deucher commit 1f3b515578a1d73926993629a06a7f3b60535b59 Author: Alex Deucher Date: Thu Sep 21 10:32

Re: [PATCH] drm/amdgpu: Force order between a read and write to the same address

2023-11-21 Thread Felix Kuehling
On 2023-11-20 12:41, Alex Sierra wrote: Setting register to force ordering to prevent read/write or write/read hazards for un-cached modes. Signed-off-by: Alex Sierra Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c| 8 .../gpu/drm

Re: [PATCH 1/3] Revert "drm/prime: Unexport helpers for fd/handle conversion"

2023-11-20 Thread Felix Kuehling
On 2023-11-20 11:02, Thomas Zimmermann wrote: Hi Christian Am 20.11.23 um 16:22 schrieb Christian König: Am 20.11.23 um 16:18 schrieb Thomas Zimmermann: Hi Am 20.11.23 um 16:06 schrieb Felix Kuehling: On 2023-11-20 6:54, Thomas Zimmermann wrote: Hi Am 17.11.23 um 22:44 schrieb Felix

Re: [PATCH 1/3] Revert "drm/prime: Unexport helpers for fd/handle conversion"

2023-11-20 Thread Felix Kuehling
On 2023-11-20 6:54, Thomas Zimmermann wrote: Hi Am 17.11.23 um 22:44 schrieb Felix Kuehling: This reverts commit 71a7974ac7019afeec105a54447ae1dc7216cbb3. These helper functions are needed for KFD to export and import DMABufs the right way without duplicating the tracking of DMABufs

[PATCH 3/3] drm/amdkfd: Import DMABufs for interop through DRM

2023-11-17 Thread Felix Kuehling
Use drm_gem_prime_fd_to_handle to import DMABufs for interop. This ensures that a GEM handle is created on import and that obj->dma_buf will be set and remain set as long as the object is imported into KFD. Signed-off-by: Felix Kuehling Reviewed-by: Ramesh Errabolu Reviewed-by: Xiaogang.C

[PATCH 2/3] drm/amdkfd: Export DMABufs from KFD using GEM handles

2023-11-17 Thread Felix Kuehling
Create GEM handles for exporting DMABufs using GEM-Prime APIs. The GEM handles are created in a drm_client_dev context to avoid exposing them in user mode contexts through a DMABuf import. Signed-off-by: Felix Kuehling Reviewed-by: Ramesh Errabolu --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c

[PATCH 1/3] Revert "drm/prime: Unexport helpers for fd/handle conversion"

2023-11-17 Thread Felix Kuehling
: Christian König CC: Thomas Zimmermann Signed-off-by: Felix Kuehling --- drivers/gpu/drm/drm_prime.c | 33 ++--- include/drm/drm_prime.h | 7 +++ 2 files changed, 25 insertions(+), 15 deletions(-) diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm

Re: [PATCH] drm/amdkfd: Copy HW exception data to user event

2023-11-17 Thread Felix Kuehling
On 2023-11-17 00:20, David Yat Sin wrote: Fixes issue where user events of type KFD_EVENT_TYPE_HW_EXCEPTION do not have valid data Signed-off-by: David Yat Sin Looks good to me. Do you need a KFD API version bump so ROCr can decide whether the information is valid? Regards,   Felix ---

Re: [PATCH] drm/amd: Enable checkpoint and restore of VRAM Bos with no VA

2023-11-16 Thread Felix Kuehling
On 2023-11-16 06:11, Christian König wrote: Am 16.11.23 um 03:47 schrieb Ramesh Errabolu: Tag VRAM BOs that do not have a VA with a unique Id, a 128-bit UUID. This unique Id is used to distinguish BOs that might otherwise be of same size. Checkpoint and restore assumes that these BOs are

Re: [PATCH 4/6] drm/amdkfd: Export DMABufs from KFD using GEM handles

2023-11-16 Thread Felix Kuehling
On 2023-11-07 11:58, Felix Kuehling wrote: Create GEM handles for exporting DMABufs using GEM-Prime APIs. The GEM handles are created in a drm_client_dev context to avoid exposing them in user mode contexts through a DMABuf import. This patch (and the next one) won't apply upstream because

Re: [PATCH v2] Add function parameter 'xcc_mask' not described in 'amdgpu_vm_flush_compute_tlb'

2023-11-16 Thread Felix Kuehling
On 2023-11-15 11:15, Srinivasan Shanmugam wrote: Fixes the below: drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:1373: warning: Function parameter or member 'xcc_mask' not described in 'amdgpu_vm_flush_compute_tlb' Cc: Felix Kuehling Cc: Christian König Cc: Alex Deucher Cc: "Pan, Xinhui"

Re: [PATCH] Add function parameter 'xcc_mask' not described in 'amdgpu_vm_flush_compute_tlb'

2023-11-15 Thread Felix Kuehling
12.11.23 um 05:45 schrieb Srinivasan Shanmugam: Fixes the below: drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:1373: warning: Function parameter or member 'xcc_mask' not described in 'amdgpu_vm_flush_compute_tlb' Cc: Felix Kuehling Cc: Christian König Cc: Alex Deucher Cc: "Pan, Xinhui"

[PATCH 2/2] drm/amdgpu: Auto-validate DMABuf imports in compute VMs

2023-11-14 Thread Felix Kuehling
VM. Revalidation after evictions is handled in the VM code. Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h| 3 + .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 45 --- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c| 6 +- drivers/gpu/drm/amd/amdgpu/amdgpu_dma_bu

[PATCH 1/2] drm/amdgpu: New VM state for evicted user BOs

2023-11-14 Thread Felix Kuehling
Create a new VM state to track user BOs that are in the system domain. In the next patch this will be used to conditionally re-validate them in amdgpu_vm_handle_moved. Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 17 + drivers/gpu/drm/amd/amdgpu

Re: [PATCH 2/6] drm/amdgpu: New VM state for evicted user BOs

2023-11-14 Thread Felix Kuehling
On 2023-11-09 03:12, Christian König wrote: Am 08.11.23 um 22:23 schrieb Felix Kuehling: On 2023-11-08 07:28, Christian König wrote: Not necessarily objections to this patch here, but rather to how this new state is used later on. The fundamental problem is that re-validating things

Re: [PATCH 5/6] drm/amdkfd: Import DMABufs for interop through DRM

2023-11-08 Thread Felix Kuehling
On 2023-11-08 18:20, Chen, Xiaogang wrote: On 11/7/2023 10:58 AM, Felix Kuehling wrote: Use drm_gem_prime_fd_to_handle to import DMABufs for interop. This ensures that a GEM handle is created on import and that obj->dma_buf will be set and remain set as long as the object is imported into

Re: [PATCH 2/6] drm/amdgpu: New VM state for evicted user BOs

2023-11-08 Thread Felix Kuehling
states into one? Regards,   Felix Regards, Christian. Am 07.11.23 um 23:11 schrieb Felix Kuehling: Hi Christian, I know you have objected to this patch before. I still think this is the best solution for what I need. I can talk you through my reasoning by email or offline. If I can't

Re: [Patch v2] drm/ttm: Schedule delayed_delete worker closer

2023-11-08 Thread Felix Kuehling
on NUMA systems (dGPU) and AMD APU platforms such as GFXIP9.4.3. Acked-by: Felix Kuehling Signed-off-by: Rajneesh Bhardwaj Reviewed-by: Christian König Going to push this to drm-misc-next. Hold on. Rajneesh just pointed out a WARN regression from testing. I think the problem is that the bdev

Re: [PATCH v2] drm/amdgpu: Change extended-scope MTYPE on GC 9.4.3

2023-11-08 Thread Felix Kuehling
On 2023-11-08 12:25, David Yat Sin wrote: Change local memory type to MTYPE_UC on revision id 0 Signed-off-by: David Yat Sin Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 7 +-- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 8 +--- 2 files changed, 10

Re: [PATCH] drm/amdkfd: Skip locking KFD when unbinding GPU

2023-11-07 Thread Felix Kuehling
On 2023-11-07 17:03, Alex Deucher wrote: On Mon, Nov 6, 2023 at 6:17 PM Felix Kuehling wrote: On 2023-11-06 2:14, Lawrence Yiu wrote: After unbinding a GPU, KFD becomes locked and unusable, resulting in applications not being able to use ROCm for compute anymore and rocminfo outputting

Re: [PATCH 2/6] drm/amdgpu: New VM state for evicted user BOs

2023-11-07 Thread Felix Kuehling
, Felix Kuehling wrote: Create a new VM state to track user BOs that are in the system domain. In the next patch this will be used to conditionally re-validate them in amdgpu_vm_handle_moved. Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 17 + drivers

Re: [PATCH] drm/ttm: Schedule delayed_delete worker closer

2023-11-07 Thread Felix Kuehling
that are across interconnect boundaries such as xGMI, PCIe etc. This change helps USWC GTT allocations on NUMA systems (dGPU) and AMD APU platforms such as GFXIP9.4.3. Signed-off-by: Rajneesh Bhardwaj Acked-by: Felix Kuehling --- drivers/gpu/drm/ttm/ttm_bo.c | 10 +- drivers/gpu/drm/ttm

[RFC PATCH v2] drm/amdkfd: Run restore_workers on freezable WQs

2023-11-07 Thread Felix Kuehling
because evict/restore workers can run independently of it. Instead call a new restore_process_helper directly. This is an RFC and request for testing. v2: - Reworked eviction fence signaling - Introduced restore_process_helper Signed-off-by: Felix Kuehling --- .../gpu/drm/amd/amdgpu

Re: [PATCH 4/6] drm/amdkfd: Export DMABufs from KFD using GEM handles

2023-11-07 Thread Felix Kuehling
, Ramesh Subject: [PATCH 4/6] drm/amdkfd: Export DMABufs from KFD using GEM handles Create GEM handles for exporting DMABufs using GEM-Prime APIs. The GEM handles are created in a drm_client_dev context to avoid exposing them in user mode contexts through a DMABuf import. Signed-off-by: Felix

[PATCH 5/6] drm/amdkfd: Import DMABufs for interop through DRM

2023-11-07 Thread Felix Kuehling
Use drm_gem_prime_fd_to_handle to import DMABufs for interop. This ensures that a GEM handle is created on import and that obj->dma_buf will be set and remain set as long as the object is imported into KFD. Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h|

[PATCH 3/6] drm/amdgpu: Auto-validate DMABuf imports in compute VMs

2023-11-07 Thread Felix Kuehling
VM. Revalidation after evictions is handled in the VM code. Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h| 3 + .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 15 ++- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c| 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_dma_bu

[PATCH 6/6] drm/amdkfd: Bump KFD ioctl version

2023-11-07 Thread Felix Kuehling
This is not strictly a change in the IOCTL API. This version bump is meant to indicate to user mode the presence of a number of changes and fixes that enable the management of VA mappings in compute VMs using the GEM_VA ioctl for DMABufs exported from KFD. Signed-off-by: Felix Kuehling

[PATCH 2/6] drm/amdgpu: New VM state for evicted user BOs

2023-11-07 Thread Felix Kuehling
Create a new VM state to track user BOs that are in the system domain. In the next patch this will be used to conditionally re-validate them in amdgpu_vm_handle_moved. Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 17 + drivers/gpu/drm/amd/amdgpu

[PATCH 4/6] drm/amdkfd: Export DMABufs from KFD using GEM handles

2023-11-07 Thread Felix Kuehling
Create GEM handles for exporting DMABufs using GEM-Prime APIs. The GEM handles are created in a drm_client_dev context to avoid exposing them in user mode contexts through a DMABuf import. Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c| 11 +++ drivers/gpu

[PATCH 1/6] drm/amdgpu: Fix possible null pointer dereference

2023-11-07 Thread Felix Kuehling
mem = bo->tbo.resource may be NULL in amdgpu_vm_bo_update. Fixes: 180253782038 ("drm/ttm: stop allocating dummy resources during BO creation") Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-

Re: [PATCH] drm/amdkfd: Skip locking KFD when unbinding GPU

2023-11-06 Thread Felix Kuehling
On 2023-11-06 2:14, Lawrence Yiu wrote: After unbinding a GPU, KFD becomes locked and unusable, resulting in applications not being able to use ROCm for compute anymore and rocminfo outputting the following error message: ROCk module is loaded Unable to open /dev/kfd read-write: Invalid

Re: [PATCH] drm/amdkfd: Free gang_ctx_bo and wptr_bo in pqm_uninit

2023-11-06 Thread Felix Kuehling
On 2023-11-06 5:40, ZhenGuo Yin wrote: [Why] There will be a warning trace when cleaning up the gtt drm_mm allocator during unloading driver since gang_ctx_bo and wptr_bo do not get freed. This isn't just a problem with module unloading, but a more general memory leak. pqm_uninit runs not

Re: [PATCH] drm/amdgpu: fix error handling in amdgpu_vm_init

2023-11-01 Thread Felix Kuehling
On 2023-10-31 11:18, Alex Deucher wrote: On Tue, Oct 31, 2023 at 11:12 AM Christian König wrote: When clearing the root PD fails we need to properly release it again. Signed-off-by: Christian König Acked-by: Alex Deucher Has this been submitted? I see some intermittent failures in the PSDB

Re: [PATCH 03/11] drm/amdkfd: Improve amdgpu_vm_handle_moved

2023-11-01 Thread Felix Kuehling
On 2023-10-17 17:13, Felix Kuehling wrote: Let amdgpu_vm_handle_moved update all BO VA mappings of BOs reserved by the caller. This will be useful for handling extra BO VA mappings in KFD VMs that are managed through the render node API. Signed-off-by: Felix Kuehling Reviewed-by: Christian

Re: [RFC PATCH] drm/amdkfd: Run restore_workers on freezable WQs

2023-10-30 Thread Felix Kuehling
On 2023-10-30 13:48, Christian König wrote: Am 30.10.23 um 18:38 schrieb Felix Kuehling: On 2023-10-30 12:16, Christian König wrote: @@ -1904,6 +1906,19 @@ kfd_process_gpuid_from_node(struct kfd_process *p, struct kfd_node *node,   return -EINVAL;   }   +static void

Re: [RFC PATCH] drm/amdkfd: Run restore_workers on freezable WQs

2023-10-30 Thread Felix Kuehling
On 2023-10-30 12:16, Christian König wrote: @@ -1904,6 +1906,19 @@ kfd_process_gpuid_from_node(struct kfd_process *p, struct kfd_node *node,   return -EINVAL;   }   +static void signal_eviction_fence(struct kfd_process *p) +{ +    spin_lock(&p->ef_lock); +    if (!p->ef) +    goto

Re: [RFC PATCH] drm/amdkfd: Run restore_workers on freezable WQs

2023-10-30 Thread Felix Kuehling
On 2023-10-30 4:23, Christian König wrote: Am 28.10.23 um 00:39 schrieb Felix Kuehling: Make restore workers freezable so we don't have to explicitly flush them in suspend and GPU reset code paths, and we don't accidentally try to restore BOs while the GPU is suspended. Not having to flush

[RFC PATCH] drm/amdkfd: Run restore_workers on freezable WQs

2023-10-27 Thread Felix Kuehling
igned-off-by: Felix Kuehling --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 9 ++-- drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 1 + drivers/gpu/drm/amd/amdkfd/kfd_process.c | 49 +-- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 4 +- 4 files changed, 44 insertio

Re: [PATCHv2 1/2] drm/amdkfd: Populate cache info for GFX 9.4.3

2023-10-27 Thread Felix Kuehling
or all cache levels, I believe. Given that L3 is likely the largest, I'm a bit suspicious of this conversion. Other than that, the series is Reviewed-by: Felix Kuehling + pcache_info[i].cache_level = 3; + pca

Re: [PATCH v3 1/2] drm/amdgpu: Acquire ttm locks for dmaunmap

2023-10-25 Thread Felix Kuehling
On 2023-10-25 02:12, Christian König wrote: Am 24.10.23 um 21:20 schrieb David Francis: dmaunmap can call ttm_bo_validate, which expects the ttm dma_resv to be held. Well first of all the dma_resv object isn't related to TTM. Acquire the locks in amdgpu_amdkfd_gpuvm_dmaunmap_mem. Because

Re: [PATCH v3] drm/amdgpu: Add EXT_COHERENT support for APU and NUMA systems

2023-10-24 Thread Felix Kuehling
memory MTYPE_CC. Add an option in the override function for this case, and add a check to ensure it is not used on UNCACHED memory. V2: Combined APU and NUMA code into one patch V3: Fixed a potential nullptr in amdgpu_vm_bo_update Signed-off-by: David Francis Reviewed-by: Felix Kuehling

Re: [PATCH v3 2/2] drm/amdgpu: Permit PCIe transfer over links with XGMI

2023-10-24 Thread Felix Kuehling
function amdgpu_device_is_peer_accessible and into the topology path. Signed-off-by: David Francis This patch is Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +--- drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 6 -- 2 files changed, 5 insertions(+), 5

Re: [PATCH v3 1/2] drm/amdgpu: Acquire ttm locks for dmaunmap

2023-10-24 Thread Felix Kuehling
On 2023-10-24 15:20, David Francis wrote: dmaunmap can call ttm_bo_validate, which expects the ttm dma_resv to be held. Acquire the locks in amdgpu_amdkfd_gpuvm_dmaunmap_mem. Because the dmaunmap step can now fail, two new numbers need to be tracked. n_dmaunmap_success tracks the number of

Re: [PATCH] Revert "drm/amdgpu: remove vm sanity check from amdgpu_vm_make_compute"

2023-10-23 Thread Felix Kuehling
[sorry, I hit send too early] On 2023-10-23 11:15, Christian König wrote: Am 23.10.23 um 15:06 schrieb Daniel Tang: That commit causes the screen to freeze a few moments after running clinfo on v6.6-rc7 and ROCm 5.6. Sometimes the rest of the computer including ssh also freezes. On v6.5-rc1,

Re: [PATCH] Revert "drm/amdgpu: remove vm sanity check from amdgpu_vm_make_compute"

2023-10-23 Thread Felix Kuehling
On 2023-10-23 11:15, Christian König wrote: Am 23.10.23 um 15:06 schrieb Daniel Tang: That commit causes the screen to freeze a few moments after running clinfo on v6.6-rc7 and ROCm 5.6. Sometimes the rest of the computer including ssh also freezes. On v6.5-rc1, it only results in a NULL

Re: [PATCH v3] drm/amdkfd: Use partial mapping in GPU page faults

2023-10-23 Thread Felix Kuehling
On 2023-10-20 17:53, Xiaogang.Chen wrote: From: Xiaogang Chen After partial migration to recover GPU page fault this patch does GPU vm space mapping for same page range that got migrated instead of mapping all pages of svm range in which the page fault happened. Signed-off-by: Xiaogang Chen

Re: [PATCH 3/3] Revert "[PATCH] drm/amdkfd: Use partial migrations in GPU page faults"

2023-10-23 Thread Felix Kuehling
mapping to GPU patch later. Signed-off-by: Philip Yang The series is Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 150 ++- drivers/gpu/drm/amd/amdkfd/kfd_migrate.h | 6 +- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 83

Re: [PATCH] drm/amdkfd: Address 'remap_list' not described in 'svm_range_add'

2023-10-23 Thread Felix Kuehling
On 2023-10-23 12:12, Srinivasan Shanmugam wrote: Fixes the below: drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_svm.c:2073: warning: Function parameter or member 'remap_list' not described in 'svm_range_add' Cc: Felix Kuehling Cc: Christian König Cc: Alex Deucher Cc: "Pan, Xinhui"

Re: [PATCH 1/2] drm/amdgpu: Add timeout for sync wait

2023-10-20 Thread Felix Kuehling
On 2023-10-20 09:10, Christian König wrote: No, the wait forever is what is expected and perfectly valid user experience. Waiting with a timeout on the other hand sounds like a really bad idea to me. Every wait with a timeout needs a justification, e.g. for example that userspace

Re: [PATCH] drm/amdkfd: reserve a fence slot while locking the BO

2023-10-20 Thread Felix Kuehling
On 2023-10-20 08:33, Christian König wrote: Looks like the KFD still needs this. Signed-off-by: Christian König Fixes: 8abc1eb2987a ("drm/amdkfd: switch over to using drm_exec v3") To fix the immediate problem, this patch is Acked-by: Felix Kuehling As I understand it, thi

Re: [PATCH] drm/amdkfd: remap unaligned svm ranges that have split

2023-10-19 Thread Felix Kuehling
Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 43 +--- 1 file changed, 32 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c index 7b81233bc9ae..aa2996d6f818 100644 --- a/drivers

Re: [PATCH] drm/amdkfd: remap unaligned svm ranges that have split

2023-10-19 Thread Felix Kuehling
On 2023-10-19 16:53, Philip Yang wrote: On 2023-10-19 16:05, Felix Kuehling wrote: On 2023-10-18 18:26, Alex Sierra wrote: Split SVM ranges that have been mapped into 2MB page table entries, require to be remap in case the split has happened in a non-aligned VA. [WHY]: This condition

Re: [PATCH] drm/amdkfd: remap unaligned svm ranges that have split

2023-10-19 Thread Felix Kuehling
On 2023-10-18 18:26, Alex Sierra wrote: Split SVM ranges that have been mapped into 2MB page table entries, require to be remap in case the split has happened in a non-aligned VA. [WHY]: This condition causes the 2MB page table entries be split into 4KB PTEs. Signed-off-by: Alex Sierra ---

Re: [PATCH] drm/amdgpu: Add timeout for sync wait

2023-10-19 Thread Felix Kuehling
On 2023-10-19 05:31, Emily Deng wrote: Issue: Dead happen during gpu recover [56433.829492] amdgpu :04:00.0: amdgpu: GPU reset begin! [56550.499625] INFO: task kworker/u80:0:10 blocked for more than 120 seconds. [56550.520215] Tainted: G OE 6.2.0-34-generic

Re: [PATCH v2] drm/amdgpu: Add EXT_COHERENT support for APU and NUMA systems

2023-10-19 Thread Felix Kuehling
. +* MTYPE_UC will be present if the memory is external-coherent ext_coherent stands for "extended coherent", not "external". With that fixed, the patch is Reviewed-by: Felix Kuehling +* and can also be overridden. */ if ((*flags & AM

[PATCH 10/11] drm/amdkfd: Import DMABufs for interop through DRM

2023-10-17 Thread Felix Kuehling
Use drm_gem_prime_fd_to_handle to import DMABufs for interop. This ensures that a GEM handle is created on import and that obj->dma_buf will be set and remain set as long as the object is imported into KFD. Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h|

[PATCH 07/11] drm/amdgpu: New VM state for evicted user BOs

2023-10-17 Thread Felix Kuehling
Create a new VM state to track user BOs that are in the system domain. In the next patch this will be used to conditionally re-validate them in amdgpu_vm_handle_moved. Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 17 + drivers/gpu/drm/amd/amdgpu

[PATCH 09/11] drm/amdkfd: Export DMABufs from KFD using GEM handles

2023-10-17 Thread Felix Kuehling
Create GEM handles for exporting DMABufs using GEM-Prime APIs. The GEM handles are created in a drm_client_dev context to avoid exposing them in user mode contexts through a DMABuf import. Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c| 11 +++ drivers/gpu

[PATCH 06/11] drm/amdkfd: Move TLB flushing logic into amdgpu

2023-10-17 Thread Felix Kuehling
. This is not a production use case. Signed-off-by: Felix Kuehling Reviewed-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 29 -- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 5 --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 44 ++ drivers/gpu/drm/amd/amdgpu

[PATCH 11/11] drm/amdkfd: Bump KFD ioctl version

2023-10-17 Thread Felix Kuehling
This is not strictly a change in the IOCTL API. This version bump is meant to indicate to user mode the presence of a number of changes and fixes that enable the management of VA mappings in compute VMs using the GEM_VA ioctl for DMABufs exported from KFD. Signed-off-by: Felix Kuehling

[PATCH 08/11] drm/amdgpu: Auto-validate DMABuf imports in compute VMs

2023-10-17 Thread Felix Kuehling
VM. Revalidation after evictions is handled in the VM code. Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h| 3 + .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 15 ++- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c| 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_dma_bu

[PATCH 04/11] drm/amdgpu: Attach eviction fence on alloc

2023-10-17 Thread Felix Kuehling
Instead of attaching the eviction fence when a KFD BO is first mapped, attach it when it is allocated or imported. This in preparation to allow KFD BOs to be mapped using the render node API. Signed-off-by: Felix Kuehling Acked-by: Christian König --- .../gpu/drm/amd/amdgpu

[PATCH 05/11] drm/amdgpu: update mappings not managed by KFD

2023-10-17 Thread Felix Kuehling
When restoring after an eviction, use amdgpu_vm_handle_moved to update BO VA mappings in KFD VMs that are not managed through the KFD API. This should allow using the render node API to create more flexible memory mappings in KFD VMs. Signed-off-by: Felix Kuehling Acked-by: Christian König

[PATCH 03/11] drm/amdkfd: Improve amdgpu_vm_handle_moved

2023-10-17 Thread Felix Kuehling
Let amdgpu_vm_handle_moved update all BO VA mappings of BOs reserved by the caller. This will be useful for handling extra BO VA mappings in KFD VMs that are managed through the render node API. Signed-off-by: Felix Kuehling Reviewed-by: Christian König --- drivers/gpu/drm/amd/amdgpu

[PATCH 02/11] drm/amdgpu: Reserve fences for VM update

2023-10-17 Thread Felix Kuehling
In amdgpu_dma_buf_move_notify reserve fences for the page table updates in amdgpu_vm_clear_freed and amdgpu_vm_handle_moved. This fixes a BUG_ON in dma_resv_add_fence when using SDMA for page table updates. Signed-off-by: Felix Kuehling Reviewed-by: Christian König --- drivers/gpu/drm/amd
