Re: [PATCH 1/6] drm/amdgpu: Generalize KFD dmabuf import

2023-01-16 Thread Felix Kuehling
On 2023-01-16 17:04, Errabolu, Ramesh wrote: [AMD Official Use Only - General] A minor comment, unrelated to the patch. The comments are inline. Regards, Ramesh -Original Message- From: amd-gfx On Behalf Of Felix Kuehling Sent: Thursday, January 12, 2023 7:02 AM To: amd-gfx

Re: [PATCH 1/6] drm/amdgpu: Generalize KFD dmabuf import

2023-01-16 Thread Felix Kuehling
On 2023-01-16 at 06:42, Christian König wrote: [SNIP] When the BO is imported into the same GPU, you get a reference to the same BO, so the imported BO has the same mmap_offset as the original BO. When the BO is imported into a different GPU, it is a new BO with a new mmap_offset. That wo

Re: [PATCH 1/6] drm/amdgpu: Generalize KFD dmabuf import

2023-01-15 Thread Felix Kuehling
On 2023-01-15 at 11:43, Christian König wrote: On 14.01.23 at 00:15, Felix Kuehling wrote: On 2023-01-13 18:00, Chen, Xiaogang wrote: On 1/13/2023 4:26 PM, Felix Kuehling wrote: On 2023-01-12 17:41, Chen, Xiaogang wrote: On 1/11/2023 7:31 PM, Felix Kuehling wrote: Use proper

Re: [PATCH 1/6] drm/amdgpu: Generalize KFD dmabuf import

2023-01-13 Thread Felix Kuehling
On 2023-01-13 18:00, Chen, Xiaogang wrote: On 1/13/2023 4:26 PM, Felix Kuehling wrote: On 2023-01-12 17:41, Chen, Xiaogang wrote: On 1/11/2023 7:31 PM, Felix Kuehling wrote: Use proper amdgpu_gem_prime_import function to handle all kinds of imports. Remember the dmabuf reference to enable

Re: [PATCH 1/6] drm/amdgpu: Generalize KFD dmabuf import

2023-01-13 Thread Felix Kuehling
On 2023-01-12 17:41, Chen, Xiaogang wrote: On 1/11/2023 7:31 PM, Felix Kuehling wrote: Use proper amdgpu_gem_prime_import function to handle all kinds of imports. Remember the dmabuf reference to enable proper multi-GPU attachment to multiple VMs without erroneously re-exporting the underlying

Re: [PATCH] drm/amdkfd: Support process XNACK mode dynamic change

2023-01-13 Thread Felix Kuehling
create queues. Add helper macro KFD_SUPPORT_XNACK_PER_PROCESS to remove duplicate code and add new ASICs support in future. Signed-off-by: Philip Yang Reviewed-by: Felix Kuehling --- .../amd/amdkfd/kfd_device_queue_manager_v9.c | 27 +-- drivers/gpu/drm/amd/amdkfd/kfd_priv.h

[PATCH 6/6] drm/amdgpu: Do bo_va ref counting for KFD BOs

2023-01-11 Thread Felix Kuehling
This is needed to correctly handle BOs imported into the GEM API, which would otherwise get added twice to the same VM. Signed-off-by: Felix Kuehling --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 28 +++ 1 file changed, 22 insertions(+), 6 deletions(-) diff --git a

[PATCH 5/6] drm/amdgpu: update mappings not managed by KFD

2023-01-11 Thread Felix Kuehling
comments, remove TODOs that are no longer applicable Signed-off-by: Felix Kuehling --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 28 +++ 1 file changed, 22 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd

[PATCH 4/6] drm/amdgpu: Attach eviction fence on alloc

2023-01-11 Thread Felix Kuehling
Instead of attaching the eviction fence when a KFD BO is first mapped, attach it when it is allocated or imported. This is in preparation to allow KFD BOs to be mapped using the render node API. Signed-off-by: Felix Kuehling --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 63

[PATCH 3/6] drm/amdkfd: Improve amdgpu_vm_handle_moved

2023-01-11 Thread Felix Kuehling
Let amdgpu_vm_handle_moved update all BO VA mappings of BOs reserved by the caller. This will be useful for handling extra BO VA mappings in KFD VMs that are managed through the render node API. Signed-off-by: Felix Kuehling Reviewed-by: Christian König --- drivers/gpu/drm/amd/amdgpu

[PATCH 2/6] drm/amdkfd: Implement DMA buf fd export from KFD

2023-01-11 Thread Felix Kuehling
user mode change (Thunk API and kfdtest) is here: https://github.com/fxkamd/ROCT-Thunk-Interface/commits/fxkamd/dmabuf Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h| 2 + .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 45 +++ drivers/gpu/drm/amd

[PATCH 0/6] Enable KFD to use render node BO mappings

2023-01-11 Thread Felix Kuehling
single render node FD and GPUVM address space. The DMABuf export API will also be used later for upstream IPC and RDMA implementations. Felix Kuehling (6): drm/amdgpu: Generalize KFD dmabuf import drm/amdkfd: Implement DMA buf fd export from KFD drm/amdkfd: Improve amdgpu_vm_handle_moved drm

[PATCH 1/6] drm/amdgpu: Generalize KFD dmabuf import

2023-01-11 Thread Felix Kuehling
Use proper amdgpu_gem_prime_import function to handle all kinds of imports. Remember the dmabuf reference to enable proper multi-GPU attachment to multiple VMs without erroneously re-exporting the underlying BO multiple times. Signed-off-by: Felix Kuehling --- .../gpu/drm/amd/amdgpu

Re: [PATCH v4] drm/amdkfd: Page aligned memory reserve size

2023-01-10 Thread Felix Kuehling
trigger WARN_ONCE(adev && adev->kfd.vram_used < 0, "..."), to help debug the accounting issue with warning and backtrace. Signed-off-by: Philip Yang Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 2 +- drivers/gpu/drm/amd/amdgpu/am

Re: [PATCH] drm/amdkfd: Page aligned memory reserve size

2023-01-10 Thread Felix Kuehling
On 2023-01-10 at 15:44, Philip Yang wrote: On 2023-01-10 13:33, Felix Kuehling wrote: On 2023-01-10 at 12:11, Philip Yang wrote: Use the page-aligned size to reserve memory usage because the page-aligned TTM BO size is used to unreserve memory usage; otherwise a non-page-aligned size causes memory

Re: [PATCH] drm/amdkfd: Page aligned memory reserve size

2023-01-10 Thread Felix Kuehling
On 2023-01-10 at 12:11, Philip Yang wrote: Use the page-aligned size to reserve memory usage because the page-aligned TTM BO size is used to unreserve memory usage; otherwise a non-page-aligned size causes the memory usage accounting to become unbalanced. Change the vram_used definition type to int64_t to be able to trigge

Re: [PATCH] drm/amdkfd: Fix NULL pointer error for GC 11.0.1 on mGPU

2023-01-10 Thread Felix Kuehling
On 2023-01-05 at 14:28, Eric Huang wrote: The pointer bo->kfd_bo is NULL for the queue's write pointer BO when creating a queue on mGPU. Avoiding use of the pointer fixes the error. Signed-off-by: Eric Huang Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd

Re: [regression, bisected, pci/iommu] Bug 216865 - Black screen when amdgpu started during 6.2-rc1 boot with AMD IOMMU enabled

2023-01-10 Thread Felix Kuehling
On 2023-01-10 at 10:19, Jason Gunthorpe wrote: On Tue, Jan 10, 2023 at 10:05:44AM -0500, Felix Kuehling wrote: On 2023-01-10 at 08:45, Christian König wrote: And I'm like 99% sure that Kabini/Wani should be identical to that. Kabini is not supported by KFD. There should be no cal

Re: [regression, bisected, pci/iommu] Bug 216865 - Black screen when amdgpu started during 6.2-rc1 boot with AMD IOMMU enabled

2023-01-10 Thread Felix Kuehling
On 2023-01-10 at 08:45, Christian König wrote: And I'm like 99% sure that Kabini/Wani should be identical to that. Kabini is not supported by KFD. There should be no calls to amd_iommu_... functions on Kabini, at least not from kfd_iommu.c. And I'm not aware of any other callers in amdgpu.ko

Re: [PATCH] drm/amdkfd: Page aligned VRAM reserve size

2023-01-09 Thread Felix Kuehling
On 2023-01-09 at 19:01, Philip Yang wrote: Use the page-aligned size to reserve VRAM usage because the page-aligned TTM BO size is used to unreserve VRAM usage; otherwise the vram_used accounting becomes unbalanced. Change the vram_used definition type to int64_t to be able to trigger WARN_ONCE(adev && adev

Re: [PATCH] drm/amdkfd: Use resource_size() helper function

2023-01-09 Thread Felix Kuehling
On 2023-01-07 at 15:09, Deepak R Varma wrote: On Fri, Dec 23, 2022 at 02:45:00AM +0530, Deepak R Varma wrote: Use the resource_size() function instead of an open-coded computation of the resource size. It makes the code more readable. Issue identified using resource_size.cocci coccinelle semantic pa

Re: [PATCH] drm/amdkfd: Add sync after creating vram bo

2023-01-09 Thread Felix Kuehling
On 2023-01-09 at 15:23, Felix Kuehling wrote: On 2023-01-09 at 15:18, Philip Yang wrote: On 2023-01-09 14:27, Eric Huang wrote: There will be data corruption on vram allocated by svm if initialization is not being done. Adding sync is to resolve this issue. Signed-off-by: Eric Huang

Re: [PATCH] drm/amdkfd: Add sync after creating vram bo

2023-01-09 Thread Felix Kuehling
On 2023-01-09 at 15:18, Philip Yang wrote: On 2023-01-09 14:27, Eric Huang wrote: There will be data corruption on vram allocated by svm if initialization is not being done. Adding sync is to resolve this issue. Signed-off-by: Eric Huang ---   drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 7 +++

Re: [PATCH] drm/amdkfd: Add sync after creating vram bo

2023-01-09 Thread Felix Kuehling
done. It is being done as a result of setting AMDGPU_GEM_CREATE_VRAM_CLEARED. The problem is that the initialization is not complete yet, so it can corrupt data written by the application unless we wait for it to finish first. Other than that the patch is Reviewed-by: Felix Kuehling

Re: [RFC 3/7] drm/amdgpu: Create MQD for userspace queue

2023-01-04 Thread Felix Kuehling
On 2023-01-04 at 04:23, Shashank Sharma wrote: On 04/01/2023 10:17, Christian König wrote: On 04.01.23 at 10:13, Shashank Sharma wrote: On 04/01/2023 10:10, Christian König wrote: On 04.01.23 at 07:21, Yadav, Arvind wrote: On 1/4/2023 12:07 AM, Felix Kuehling wrote: On 2023-01-03 at

Re: [PATCH] drm/amdkfd: simplify cases

2023-01-03 Thread Felix Kuehling
On 2022-12-27 at 12:12, Alex Deucher wrote: On Tue, Dec 27, 2022 at 12:10 PM Alex Deucher wrote: A number of of the gfx8 cases where the same. Clean them up. typos here fixed up locally. Alex Signed-off-by: Alex Deucher Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd

Re: [RFC 1/7] drm/amdgpu: UAPI for user queue management

2023-01-03 Thread Felix Kuehling
goal, hence the flag for AQL vs PM4. Alex Regards Shaoyun.liu -Original Message- From: amd-gfx On Behalf Of Felix Kuehling Sent: Tuesday, January 3, 2023 1:30 PM To: Sharma, Shashank ; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; Koenig, Christian ; Yadav, Arvind ; Paneer S

Re: [RFC 3/7] drm/amdgpu: Create MQD for userspace queue

2023-01-03 Thread Felix Kuehling
On 2023-01-03 at 04:36, Shashank Sharma wrote: /*MQD struct for usermode Queue*/ +struct amdgpu_usermode_queue_mqd This is specific to GC 11.  Every IP and version will have its own MQD format.  That should live in the IP specific code, not the generic code.  We already have the generic MQD par

Re: [RFC 1/7] drm/amdgpu: UAPI for user queue management

2023-01-03 Thread Felix Kuehling
On 2022-12-23 at 14:36, Shashank Sharma wrote: From: Alex Deucher This patch introduces a new UAPI/IOCTL for usermode graphics queue. The userspace app will fill this structure and request the graphics driver to add a graphics work queue for it. The output of this UAPI is a queue id. This UAPI

Re: [syzbot] WARNING: locking bug in inet_autobind

2023-01-03 Thread Felix Kuehling
On 2023-01-03 at 11:05, Waiman Long wrote: On 1/3/23 10:39, Felix Kuehling wrote: The regression point doesn't make sense. The kernel config doesn't enable CONFIG_DRM_AMDGPU, so there is no way that a change in AMDGPU could have caused this regression. I agree. It is likely a pr

Re: [syzbot] WARNING: locking bug in inet_autobind

2023-01-03 Thread Felix Kuehling
The regression point doesn't make sense. The kernel config doesn't enable CONFIG_DRM_AMDGPU, so there is no way that a change in AMDGPU could have caused this regression. Regards,   Felix On 2022-12-29 at 01:26, syzbot wrote: syzbot has found a reproducer for the following issue on: HEAD c

Re: [PATCH 1/1] drm/amdkfd: Cleanup vm process info if init vm failed

2022-12-20 Thread Felix Kuehling
off-by: Philip Yang Reviewed-by: Felix Kuehling I'm still curious what caused the acquire_vm failure in the first place. Regards,   Felix --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 4 ++-- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 18 ++ drivers/gpu/drm

[RFC PATCH 3/3] drm/amdkfd: Use per-process notifier_lock for SVM

2022-12-20 Thread Felix Kuehling
00400 RSI: 0400 RDI: 7f32831ae000 [ 84.727944] RBP: 7fffb06c4750 R08: 7fffb06c4548 R09: 55e7570ad230 [ 84.735809] R10: 55e757088010 R11: 0246 R12: 55e75453cefa [ 84.743688] R13: R14: 0021 R15: [ 84.7

[RFC PATCH 2/3] drm/amdgpu: Add range param to amdgpu_vm_update_range

2022-12-20 Thread Felix Kuehling
König Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c| 27 ++- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h| 58 ++- drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 6 ++- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 4 +- 4 files changed, 77

[RFC PATCH 1/3] drm/amdgpu: Add vm->notifier_lock

2022-12-20 Thread Felix Kuehling
This points to a mutex to serialize with MMU notifiers during page table updates. For graphics contexts, the notifier lock is per adev. For compute contexts the lock is per process. Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 3 +++ drivers/gpu/drm/amd

Re: [PATCH] drm/amdkfd: Fix kernel warning during topology setup

2022-12-20 Thread Felix Kuehling
t_rcu+0xd7/0x130 [ +0.004205] softirqs last disabled at (59649): [] irq_exit_rcu+0xd7/0x130 [ +0.004203] ---[ end trace ]--- Fixes: 0f28cca87e9a ("drm/amdkfd: Extend KFD device topology to surface peer-to-peer links") Signed-off-by: Mukul Joshi Reviewe

Re: [PATCH 2/2] drm/amdkfd: Fix double release compute pasid

2022-12-14 Thread Felix Kuehling
6/0x270 [drm] drm_close_helper.isra.14+0x60/0x70 [drm] drm_release+0x6e/0xf0 [drm] __fput+0xcc/0x280 fput+0xe/0x20 task_work_run+0x96/0xc0 do_exit+0x3d0/0xc10 Suggested-by: Felix Kuehling I don't think I suggested this fix. I didn't realize that the problem only a

Re: [PATCH 1/1] drm/amdgpu: Fix double release KFD pasid

2022-12-13 Thread Felix Kuehling
run+0x96/0xc0    do_exit+0x3d0/0xc10 Suggested-by: Felix Kuehling Signed-off-by: Philip Yang ---   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 8 +++-   1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_k

Re: [PATCH 1/2] drm/amdgpu: Enable IH retry CAM on GFX9

2022-12-12 Thread Felix Kuehling
: Mukul Joshi Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h | 2 + drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 51 --- drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c| 2 +- drivers/gpu/drm/amd/amdgpu/vega20_ih.c| 46

Re: [PATCH] drm/amdgpu: revert "generally allow over-commit during BO allocation"

2022-12-12 Thread Felix Kuehling
to evict first. Signed-off-by: Christian König Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c| 16 +--- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 6 +- 2 files changed, 18 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd

Re: [PATCH 1/9] drm/amdgpu: generally allow over-commit during BO allocation

2022-12-10 Thread Felix Kuehling
On 2022-12-10 at 09:12, Christian König wrote: On 10.12.22 at 07:15, Felix Kuehling wrote: On 2022-11-25 05:21, Christian König wrote: We already fall back to a dummy BO with no backing store when we allocate GDS, GWS and OA resources and to GTT when we allocate VRAM. Drop all those

Re: [PATCH 1/9] drm/amdgpu: generally allow over-commit during BO allocation

2022-12-09 Thread Felix Kuehling
On 2022-11-25 05:21, Christian König wrote: We already fall back to a dummy BO with no backing store when we allocate GDS, GWS and OA resources and to GTT when we allocate VRAM. Drop all those workarounds and generalize this for GTT as well. This fixes ENOMEM issues with runaway applications which

Re: Another circular lock dependency involving fs-reclaim and drm_buddy

2022-12-08 Thread Felix Kuehling
On 2022-12-08 at 12:39, Christian König wrote: On 08.12.22 at 17:28, Felix Kuehling wrote: On 2022-12-08 at 10:44, Christian König wrote: On 08.12.22 at 16:19, Felix Kuehling wrote: On 2022-12-08 at 07:32, Christian König wrote: Hi Felix, digging through the code I think I know now how

Re: Another circular lock dependency involving fs-reclaim and drm_buddy

2022-12-08 Thread Felix Kuehling
On 2022-12-08 at 10:44, Christian König wrote: On 08.12.22 at 16:19, Felix Kuehling wrote: On 2022-12-08 at 07:32, Christian König wrote: Hi Felix, digging through the code I think I know now how we can solve this. The lock which needs to protect the validity of the pages is the vm

Re: Another circular lock dependency involving fs-reclaim and drm_buddy

2022-12-08 Thread Felix Kuehling
holding vram_mgr->lock. Regards,   Felix Or am I missing something here? Regards, Christian. On 06.12.22 at 16:57, Christian König wrote: On 06.12.22 at 16:14, Felix Kuehling wrote: On 2022-12-06 at 03:20, Christian König wrote: Hi Felix, to be honest I think the whole approach yo

Re: [PATCH] drm/amdgpu: fixx NULL pointer deref in gmc_v9_0_get_vm_pte

2022-12-07 Thread Felix Kuehling
h from drm-next, not amd-staging-drm-next) Other than that, the patch is Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9

Re: Another circular lock dependency involving fs-reclaim and drm_buddy

2022-12-06 Thread Felix Kuehling
  Felix Invalidating the mappings and eventually scheduling that they are re-created is a separate step which should come independent of this if I'm not completely mistaken. Regards, Christian. On 06.12.22 at 01:04, Felix Kuehling wrote: We fixed a similar issue with Philip's patc

[PATCH] drm/amdgpu: Add notifier lock for KFD userptrs

2022-12-05 Thread Felix Kuehling
Add a per-process MMU notifier lock for processing notifiers from userptrs. Use that lock to properly synchronize page table updates with MMU notifiers. v2: rebased Signed-off-by: Felix Kuehling Reviewed-by: Xiaogang Chen (v1) --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h| 13 +- .../gpu

Another circular lock dependency involving fs-reclaim and drm_buddy

2022-12-05 Thread Felix Kuehling
We fixed a similar issue with Philip's patch "drm/amdgpu: Drop eviction lock when allocating PT BO", but there was another one hiding underneath that (see the log below). The problem is that we're still allocating page tables while holding the prange->lock in the kfd_svm code, which is also he

Re: [PATCH 3/3] drm/amdgpu: mention RDNA support in docu

2022-12-01 Thread Felix Kuehling
On 2022-12-01 10:38, Peter Maucher wrote: The amdgpu kernel module has supported RDNA for a while, mention that in the module description. Signed-off-by: Peter Maucher --- Documentation/gpu/amdgpu/index.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/gpu

Re: [PATCH 2/3] drm/amdgpu: add GART and GTT to glossary

2022-12-01 Thread Felix Kuehling
On 2022-12-01 10:38, Peter Maucher wrote: GART and GTT are two abbreviations that should be mentioned in the glossary. Signed-off-by: Peter Maucher --- Documentation/gpu/amdgpu/amdgpu-glossary.rst | 6 ++ 1 file changed, 6 insertions(+) diff --git a/Documentation/gpu/amdgpu/amdgpu-gloss

Re: [PATCH 05/29] drm/amdgpu: setup hw debug registers on driver initialization

2022-11-30 Thread Felix Kuehling
On 2022-10-31 12:23, Jonathan Kim wrote: Add missing debug trap registers references and initialize all debug registers on boot by clearing the hardware exception overrides and the wave allocation ID index. For debug devices that only support single process debugging, enable trap temporary set

Re: [PATCH 05/29] drm/amdgpu: setup hw debug registers on driver initialization

2022-11-30 Thread Felix Kuehling
On 2022-11-22 18:38, Felix Kuehling wrote: On 2022-10-31 12:23, Jonathan Kim wrote: Add missing debug trap registers references and initialize all debug registers on boot by clearing the hardware exception overrides and the wave allocation ID index. For debug devices that only support

Re: [PATCH 04/29] drm/amdgpu: add kgd hw debug mode setting interface

2022-11-30 Thread Felix Kuehling
On 2022-10-31 12:23, Jonathan Kim wrote: Introduce the required KGD debug calls that will execute hardware debug mode setting. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- .../gpu/drm/amd/include/kgd_kfd_interface.h | 34 +++ 1 file changed, 34

Re: [PATCH 21/29] drm/amdkfd: add debug wave launch mode operation

2022-11-30 Thread Felix Kuehling
On 2022-10-31 12:23, Jonathan Kim wrote: Allow the debugger to set wave behaviour to either operate normally, halt at launch, trap on every instruction, terminate immediately or stall on allocation. Signed-off-by: Jonathan Kim --- .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 18 ++

Re: [PATCH 29/29] drm/amdkfd: bump kfd ioctl minor version for debug api availability

2022-11-30 Thread Felix Kuehling
On 2022-10-31 12:23, Jonathan Kim wrote: Bump the minor version to declare debugging capability is now available. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 1 - include/uapi/linux/kfd_ioctl.h | 3 ++- 2 files

Re: [PATCH 28/29] drm/amdkfd: add debug device snapshot operation

2022-11-30 Thread Felix Kuehling
y_size) +{ + struct kfd_dbg_device_info_entry device_info = {0}; Use memset. With that fixed, the patch is Reviewed-by: Felix Kuehling + uint32_t tmp_entry_size = *entry_size, tmp_num_devices; + int i, r = 0; + + if (!(target && user_info && number_of_de

Re: [PATCH 27/29] drm/amdkfd: add debug queue snapshot operation

2022-11-30 Thread Felix Kuehling
On 2022-10-31 12:23, Jonathan Kim wrote: Allow the debugger to get a snapshot of a specified number of queues containing various queue property information that is copied to the debugger. Since the debugger doesn't know how many queues exist at any given time, allow the debugger to pass the re

Re: [PATCH 26/29] drm/amdkfd: add debug query exception info operation

2022-11-29 Thread Felix Kuehling
the queue exception status. The debugger has the option of clearing the target exception on query. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 7 ++ drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 120 +++ drivers/g

Re: [PATCH 25/29] drm/amdkfd: add debug query event operation

2022-11-29 Thread Felix Kuehling
FIFO statement. Other than that, this patch is Reviewed-by: Felix Kuehling The KFD should also return the GPU or Queue id of the exception. The debugger also has the option of clearing exceptions after being queried. Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c

Re: [PATCH 24/29] drm/amdkfd: add debug set flags operation

2022-11-29 Thread Felix Kuehling
On 2022-10-31 12:23, Jonathan Kim wrote: Allow the debugger to set single memory and single ALU operations. Some exceptions are imprecise (memory violations, address watch) in the sense that a trap occurs only when the exception interrupt occurs and not at the non-halting faulty instruction.

Re: [PATCH 23/29] drm/amdkfd: add debug set and clear address watch points operation

2022-11-29 Thread Felix Kuehling
On 2022-10-31 12:23, Jonathan Kim wrote: Shader read, write and atomic memory operations can be alerted to the debugger as an address watch exception. Allow the debugger to pass in a watch point to a particular memory address per device. Note that there exist only 4 watch points per device

Re: [PATCH 22/29] drm/amdkfd: add debug suspend and resume process queues operation

2022-11-29 Thread Felix Kuehling
On 2022-10-31 12:23, Jonathan Kim wrote: In order to inspect waves from the saved context at any point during a debug session, the debugger must be able to preempt queues to trigger context save by suspending them. On queue suspend, the KFD will copy the context save header information so that

Re: [PATCH 20/29] drm/amdkfd: add debug wave launch override operation

2022-11-29 Thread Felix Kuehling
to support unique EPERM on PTRACE failure. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 47 ++ .../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c | 2 + .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c| 55

Re: [PATCH 3/9] drm/ttm: use per BO cleanup workers

2022-11-29 Thread Felix Kuehling
s work queue is not about freeing ttm_resources but about freeing the BOs. But it affects freeing of ghost_objs that are holding the ttm_resources being freed. If those assumptions all make sense, patches 1-3 are Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.

Re: [PATCH] drm/amdkfd: Fix memory leakage

2022-11-28 Thread Felix Kuehling
Fixes: d4ec4bdc0bd5 ("drm/amdkfd: Allow access for mmapping KFD BOs") Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu

Re: [PATCH 17/29] drm/amdkfd: Add debug trap enabled flag to TMA

2022-11-25 Thread Felix Kuehling
Reviewed-by: Felix Kuehling Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 4 drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 2 ++ drivers/gpu/drm/amd/amdkfd/kfd_process.c | 16 3 files changed, 22 insertions(+) diff --git a/drivers/gpu

Re: [PATCH] drm/amdkfd: Remove unnecessary condition in kfd_topology_add_device()

2022-11-25 Thread Felix Kuehling
On 2022-11-25 02:39, Dan Carpenter wrote: We re-arranged this code recently so "ret" is always zero at this point. Signed-off-by: Dan Carpenter Reviewed-by: Felix Kuehling I'm applying your patch to amd-staging-drm-next. Thank you! Felix --- drivers/gpu/drm/amd/amdkfd

Re: [PATCH 19/29] drm/amdkfd: add debug set exceptions enabled operation

2022-11-24 Thread Felix Kuehling
On 2022-10-31 at 12:23, Jonathan Kim wrote: The debugger subscribes to notification for requested exceptions on attach. Allow the debugger to change its subscription later on. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 3

Re: [PATCH 07/29] drm/amdgpu: add gfx9.4.1 hw debug mode enable and disable calls

2022-11-24 Thread Felix Kuehling
On 2022-11-24 at 09:58, Kim, Jonathan wrote: [AMD Official Use Only - General] -Original Message- From: Kuehling, Felix Sent: November 22, 2022 6:59 PM To: Kim, Jonathan ; amd- g...@lists.freedesktop.org Subject: Re: [PATCH 07/29] drm/amdgpu: add gfx9.4.1 hw debug mode enable and disa

Re: [PATCH 17/29] drm/amdkfd: Add debug trap enabled flag to TMA

2022-11-24 Thread Felix Kuehling
Cornwall Reviewed-by: Felix Kuehling Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 4 drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 2 ++ drivers/gpu/drm/amd/amdkfd/kfd_process.c | 16 3 files changed, 22 insertions(+) diff --git a/drivers

Re: [PATCH 17/29] drm/amdkfd: Add debug trap enabled flag to TMA

2022-11-22 Thread Felix Kuehling
on APUs Signed-off-by: Jay Cornwall Reviewed-by: Felix Kuehling Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 4 drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 2 ++ drivers/gpu/drm/amd/amdkfd/kfd_process.c | 16 3 files changed, 22 insertions

Re: [PATCH 16/29] drm/amdkfd: add runtime enable operation

2022-11-22 Thread Felix Kuehling
On 2022-10-31 12:23, Jonathan Kim wrote: This operation coordinates the debugger with the target HSA runtime process. The main motive for this coordination is due to CP performance overhead I wouldn't call that the main motivation. The main motivation for synchronizing runtime enable with t

Re: [PATCH 10/29] drm/amdgpu: add configurable grace period for unmap queues

2022-11-22 Thread Felix Kuehling
On 2022-10-31 12:23, Jonathan Kim wrote: The HWS schedule allows a grace period for wave completion prior to preemption but the debugger requires good performance since it preempts on every HW debug mode setting transaction request. For good performance, allow immediate preemption by setting the

Re: [PATCH 06/29] drm/amdgpu: add gfx9 hw debug mode enable and disable calls

2022-11-22 Thread Felix Kuehling
On 2022-10-31 12:23, Jonathan Kim wrote: Implement the per-device calls to enable or disable HW debug mode for GFX9 prior to GFX9.4.1. GFX9.4.1 and onward will require their own enable/disable sequence as follow-on patches. When hardware debug mode setting is requested, waves will inherit the

Re: [PATCH 07/29] drm/amdgpu: add gfx9.4.1 hw debug mode enable and disable calls

2022-11-22 Thread Felix Kuehling
On 2022-10-31 12:23, Jonathan Kim wrote: On GFX9.4.1, the implicit wait count instruction on s_barrier is disabled by default in the driver during normal operation for performance requirements. There is a hardware bug in GFX9.4.1 where if the implicit wait count instruction after an s_barrier

Re: [PATCH 03/29] drm/amdkfd: prepare per-process debug enable and disable

2022-11-22 Thread Felix Kuehling
On 2022-10-31 12:23, Jonathan Kim wrote: The ROCm debugger will attach to a process to debug by PTRACE and will expect the KFD to prepare a process for the target PID, whether the target PID has opened the KFD device or not. This patch is to explicitly handle this requirement. Further HW mode

Re: [PATCH 05/29] drm/amdgpu: setup hw debug registers on driver initialization

2022-11-22 Thread Felix Kuehling
On 2022-10-31 12:23, Jonathan Kim wrote: Add missing debug trap registers references and initialize all debug registers on boot by clearing the hardware exception overrides and the wave allocation ID index. For debug devices that only support single process debugging, enable trap temporary set

Re: [PATCH 02/29] drm/amdkfd: display debug capabilities

2022-11-22 Thread Felix Kuehling
athan Kim Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 88 +-- drivers/gpu/drm/amd/amdkfd/kfd_topology.h | 6 ++ include/uapi/linux/kfd_sysfs.h| 15 3 files changed, 104 insertions(+), 5 deletions(-) diff --git a/dr

Re: [PATCH 01/29] drm/amdkfd: add debug and runtime enable interface

2022-11-22 Thread Felix Kuehling
pshot semantics to match queue snapshot semantics This looks really good. I have 3 more nit-picks inline. Other than that, this patch is Reviewed-by: Felix Kuehling Do we have a debugger branch that uses the API yet? We should make this public in order to complete this upstream code review. S

Re: [PATCH v3] drm/amdgpu: fix stall on CPU when allocate large system memory

2022-11-22 Thread Felix Kuehling
[ 185.439463] amdgpu_ttm_tt_get_user_pages+0xc2/0x190 [amdgpu] [ 185.439603] amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu+0x49f/0x7a0 [amdgpu] [ 185.439774] kfd_ioctl_alloc_memory_of_gpu+0xfb/0x410 [amdgpu] Signed-off-by: James Zhu Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu

Re: [PATCH] drm/amdkfd: Release the topology_lock in error case

2022-11-21 Thread Felix Kuehling
On 2022-11-21 at 00:13, Ma Jun wrote: From: Felix Kuehling Move the topology-locked part of kfd_topology_add_device into a separate function to simplify error handling and release the topology lock consistently. Reported-by: Dan Carpenter Signed-off-by: Felix Kuehling Signed-off-by: Ma Jun

[PATCH 3/6] drm/amdkfd: Improve amdgpu_vm_handle_moved

2022-11-18 Thread Felix Kuehling
Let amdgpu_vm_handle_moved update all BO VA mappings of BOs reserved by the caller. This will be useful for handling extra BO VA mappings in KFD VMs that are managed through the render node API. Signed-off-by: Felix Kuehling Reviewed-by: Christian König --- drivers/gpu/drm/amd/amdgpu

[PATCH 4/6] drm/amdgpu: Attach eviction fence on alloc

2022-11-18 Thread Felix Kuehling
Instead of attaching the eviction fence when a KFD BO is first mapped, attach it when it is allocated or imported. This is in preparation to allow KFD BOs to be mapped using the render node API. Signed-off-by: Felix Kuehling --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 63

[PATCH 6/6] drm/amdgpu: Do bo_va ref counting for KFD BOs

2022-11-18 Thread Felix Kuehling
This is needed to correctly handle BOs imported into the GEM API, which would otherwise get added twice to the same VM. Signed-off-by: Felix Kuehling --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 28 +++ 1 file changed, 22 insertions(+), 6 deletions(-) diff --git a

[PATCH 2/6] drm/amdkfd: Implement DMA buf fd export from KFD

2022-11-18 Thread Felix Kuehling
user mode change (Thunk API and kfdtest) is here: https://github.com/fxkamd/ROCT-Thunk-Interface/commits/fxkamd/dmabuf Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h| 2 + .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 45 +++ drivers/gpu/drm/amd

[PATCH 5/6] drm/amdgpu: update mappings not managed by KFD

2022-11-18 Thread Felix Kuehling
comments, remove TODOs that are no longer applicable Signed-off-by: Felix Kuehling --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 28 +++ 1 file changed, 22 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd

[PATCH 1/6] drm/amdgpu: Generalize KFD dmabuf import

2022-11-18 Thread Felix Kuehling
Use proper amdgpu_gem_prime_import function to handle all kinds of imports. Remember the dmabuf reference to enable proper multi-GPU attachment to multiple VMs without erroneously re-exporting the underlying BO multiple times. Signed-off-by: Felix Kuehling --- .../gpu/drm/amd/amdgpu

[PATCH 0/6] Enable KFD to use render node BO mappings

2022-11-18 Thread Felix Kuehling
implementation. Felix Kuehling (6): drm/amdgpu: Generalize KFD dmabuf import drm/amdkfd: Implement DMA buf fd export from KFD drm/amdkfd: Improve amdgpu_vm_handle_moved drm/amdgpu: Attach eviction fence on alloc drm/amdgpu: update mappings not managed by KFD drm/amdgpu: Do bo_va ref counting for

Re: [PATCH] drm/amdgpu: Add notifier lock for KFD userptrs

2022-11-17 Thread Felix Kuehling
- From: amd-gfx On Behalf Of Felix Kuehling Sent: Wednesday, November 2, 2022 9:00 PM To: amd-gfx@lists.freedesktop.org Subject: [PATCH] drm/amdgpu: Add notifier lock for KFD userptrs Caution: This message originated from an External Source. Use proper caution when opening attachments, cli

Re: [PATCH] drm/amdgpu: fix stall on CPU when allocate large system memory

2022-11-17 Thread Felix Kuehling
On 2022-11-17 at 16:38, James Zhu wrote: When applications try to allocate large system memory (more than 128GB), "stall cpu" is reported. For such large system memory, walk_page_range usually takes more than 20s. The warning message can be removed when splitting hmm range into smaller ones which is

Re: [PATCH] drm/amdkfd: Release the topology_lock in error case

2022-11-17 Thread Felix Kuehling
d_destroy_crat_image(crat_image); + return res; } On 11/17/2022 4:49 AM, Felix Kuehling wrote: On 2022-11-16 at 03:04, Ma Jun wrote: Release the topology_lock in error case Signed-off-by: Ma Jun Reported-by: Dan Carpenter Dan, did you change your email address, is this one cor

Re: [PATCH] drm/amdkfd: enable cooperative launch for gfx10.3

2022-11-17 Thread Felix Kuehling
On 2022-10-12 at 15:07, Jonathan Kim wrote: FW fix available to enable cooperative launch for GFX10.3. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_device.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers

Re: [PATCH] drm/amdgpu: Enable Aldebaran devices to report CU Occupancy

2022-11-16 Thread Felix Kuehling
On 2022-11-16 at 11:54, Ramesh Errabolu wrote: Allow user to know number of compute units (CU) that are in use at any given moment. Enable access to the method kgd_gfx_v9_get_cu_occupancy that computes CU occupancy. Signed-off-by: Ramesh Errabolu Reviewed-by: Felix Kuehling

Re: [PATCH 2/2] drm/amdgpu: make psp_ring_init common

2022-11-16 Thread Felix Kuehling
On 2022-11-16 at 11:40, Alex Deucher wrote: All of the IP specific versions are the same now, so we can just use a common function. Signed-off-by: Alex Deucher The series is Acked-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 26 +++ drivers

Re: [PATCH] drm/amdkfd: Release the topology_lock in error case

2022-11-16 Thread Felix Kuehling
res = -ENODEV; + up_write(&topology_lock); goto err; } From ceb79972cdd490de181a6895836e40bf4e93c631 Mon Sep 17 00:00:00 2001 From: Felix Kuehling Date: Wed, 16 Nov 2022 15:38:44 -0500 Subject: [PATCH] drm/amdkf

Re: [PATCH 2/4] drm/amdgpu: fix userptr HMM range handling v2

2022-11-15 Thread Felix Kuehling
mutex protected bo list for now. v2: make sure range is set to NULL in case of an error Signed-off-by: Christian König Reviewed-by: Alex Deucher Reviewed-by: Felix Kuehling CC: sta...@vger.kernel.org --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 12 +++-- drivers/gpu/drm/amd/a

Re: [PATCH 2/4] drm/amdgpu: fix userptr HMM range handling

2022-11-15 Thread Felix Kuehling
On 2022-11-15 at 08:37, Christian König wrote: On 10.11.22 at 22:55, Felix Kuehling wrote: On 2022-11-10 at 08:00, Christian König wrote: The basic problem here is that it's not allowed to page fault while holding the reservation lock. So it can happen that multiple processes t

Re: [PATCH] amd/amdkfd: Fix a memory limit issue

2022-11-14 Thread Felix Kuehling
. So removing vram_pin_size will resolve it. Signed-off-by: Eric Huang Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu

Re: [PATCH 3/4] drm/amdgpu: rename the files for HMM handling

2022-11-10 Thread Felix Kuehling
On 2022-11-10 at 08:00, Christian König wrote: Clean that up a bit, no functional change. Signed-off-by: Christian König Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/Makefile | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 - .../gpu/drm/amd
