Re: [PATCH] drm/amdkfd: Set pte_flags for actual BO location

2022-08-30 Thread Felix Kuehling
On 2022-08-30 02:00, Christian König wrote: On 29.08.22 at 21:30, Felix Kuehling wrote: On 2022-08-29 at 14:59, Christian König wrote: On 29.08.22 at 18:07, Felix Kuehling wrote: On 2022-08-29 at 11:38, Christian König wrote: On 27.08.22 at 01:16, Felix Kuehling wrote: BOs can be in a

Re: [PATCH] drm/amdkfd: Set pte_flags for actual BO location

2022-08-29 Thread Felix Kuehling
On 2022-08-29 at 14:59, Christian König wrote: On 29.08.22 at 18:07, Felix Kuehling wrote: On 2022-08-29 at 11:38, Christian König wrote: On 27.08.22 at 01:16, Felix Kuehling wrote: BOs can be in a different location than was intended at allocation time, for example when restoring fails

Re: [PATCH] drm/amdkfd: Set pte_flags for actual BO location

2022-08-29 Thread Felix Kuehling
On 2022-08-29 at 11:38, Christian König wrote: On 27.08.22 at 01:16, Felix Kuehling wrote: BOs can be in a different location than was intended at allocation time, for example when restoring fails after an eviction or BOs get pinned in system memory. On some GPUs the MTYPE for coherent

Re: [PATCH] drm/amdgpu: ensure no PCIe peer access for CPU XGMI iolinks

2022-08-29 Thread Felix Kuehling
On 2022-08-29 at 11:00, Alex Sierra wrote: [Why] Devices with CPU XGMI iolink do not support PCIe peer access. Signed-off-by: Alex Sierra Acked-by: Alex Deucher Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 ++- 1 file changed, 2 insertions(+), 1

Re: [PATCH] drm/amdgpu: ensure no PCIe peer access for CPU XGMI iolinks

2022-08-29 Thread Felix Kuehling
On 2022-08-28 at 11:28, Christian König wrote: On 26.08.22 at 23:49, Felix Kuehling wrote: On 2022-08-26 11:47, Alex Sierra wrote: [Why] Devices with CPU XGMI iolink do not support PCIe peer access. Signed-off-by: Alex Sierra ---   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 ++-   1

[PATCH] drm/amdkfd: Set pte_flags for actual BO location

2022-08-26 Thread Felix Kuehling
every time the page tables are updated. Signed-off-by: Felix Kuehling --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 9 - drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c| 19 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h| 1 + 3 files changed, 28 insertions(+), 1

Re: [PATCH] drm/amdgpu: ensure no PCIe peer access for CPU XGMI iolinks

2022-08-26 Thread Felix Kuehling
On 2022-08-26 11:47, Alex Sierra wrote: [Why] Devices with CPU XGMI iolink do not support PCIe peer access. Signed-off-by: Alex Sierra --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device

Re: [PATCH 1/2] drm/amdgpu: Move HDP remapping earlier during init

2022-08-25 Thread Felix Kuehling
On 2022-08-25 at 04:58, Lijo Lazar wrote: HDP flush is used early in the init sequence as part of memory controller block initialization. Hence remapping of HDP registers needed for flush needs to happen earlier. This also fixes the AER error reported as Unsupported Request during driver load.

Re: [PATCH 03/11] drm/amdgpu: use DMA_RESV_USAGE_BOOKKEEP

2022-08-25 Thread Felix Kuehling
On 2022-08-25 at 09:31, Christian König wrote: Use DMA_RESV_USAGE_BOOKKEEP for VM page table updates and KFD preemption fence. Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c | 3 ++- 2 files chang

Re: [Bug 216373] New: Uncorrected errors reported for AMD GPU

2022-08-25 Thread Felix Kuehling
On 2022-08-24 at 01:10, Lazar, Lijo wrote: On 8/23/2022 10:34 PM, Tom Seewald wrote: On Sat, Aug 20, 2022 at 2:53 AM Lazar, Lijo wrote: Missed the remap part, the offset is here - https://elixir.bootlin.com/linux/v6.0-rc1/source/drivers/gpu/drm/amd/amdgpu/nv.c#L680 The trace is com

Re: [PATCH 0/2] Use kfd_lock/unlock_pdd helpers

2022-08-25 Thread Felix Kuehling
Do you mind squashing the two patches? It would make them easier to review because it makes it easier to see that the same functions are using both. Thanks,   Felix On 2022-08-24 16:01, Daniel Phillips wrote: Patch 1 adds kfd_lock_pdd_by_id and patch 2 adds kfd_unlock_pdd helpers, broken out

Re: [PATCH] drm/amdgpu: Fix page table setup on Arcturus

2022-08-23 Thread Felix Kuehling
er to extend UTCL2 reach" Signed-off-by: Mukul Joshi Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/mmhub_v9_4.c | 12 ++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/mmhub_v9_4.c b/drivers/gpu/drm/amd/amdgpu/mmhub_v9

Re: [PATCH 1/1] drm/amdgpu: Use kfd_lock_pdd_by_id helper in more places

2022-08-22 Thread Felix Kuehling
On 2022-08-22 at 11:30, Daniel Phillips wrote: Convert most of the "mutex_lock; kfd_process_device_data_by_id" occurrences in kfd_chardev to use the kfd_lock_pdd_by_id. These will now consistently log debug output if the lookup fails. Sites where kfd_process_device_data_by_id is used without loc

Re: [PATCH] powerpc: export cpu_smallcore_map for modules

2022-08-19 Thread Felix Kuehling
n same core") Signed-off-by: Randy Dunlap Cc: Gautham R. Shenoy Cc: Michael Ellerman Cc: Nicholas Piggin Cc: Christophe Leroy Cc: linuxppc-...@lists.ozlabs.org Cc: amd-gfx@lists.freedesktop.org Cc: Felix Kuehling Cc: Alex Deucher Cc: Christian König Cc: "Pan, Xinhui" Acked-by: Fel

Re: build failure of next-20220817 for amdgpu due to 7bc913085765 ("drm/amdkfd: Try to schedule bottom half on same core")

2022-08-19 Thread Felix Kuehling
On 2022-08-18 15:34, Randy Dunlap wrote: Hi-- On 8/18/22 12:15, Sudip Mukherjee wrote: On Thu, Aug 18, 2022 at 4:10 PM Randy Dunlap wrote: On 8/18/22 03:43, Sudip Mukherjee wrote: On Thu, Aug 18, 2022 at 3:09 AM Randy Dunlap wrote: On 8/17/22 19:01, Alex Deucher wrote: On Wed, Aug 17, 2

Re: [PATCH] drm/amdgpu: Remove the additional kfd pre reset call for sriov

2022-08-18 Thread Felix Kuehling
On 2022-08-18 at 14:19, shaoyunl wrote: The additional call is caused by merge conflict Signed-off-by: shaoyunl Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

Re: [PATCH 1/6] drm/ttm: Add usage to ttm_validate_buffer.

2022-08-17 Thread Felix Kuehling
On 2022-08-12 at 21:27, Bas Nieuwenhuizen wrote: This way callsites can choose between READ/BOOKKEEP reservations. Signed-off-by: Bas Nieuwenhuizen --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 5 + drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 9 +++-- drivers/gpu/dr

Re: [PATCH] drm/amdkfd: add family_id property for kfd_node

2022-08-17 Thread Felix Kuehling
On 2022-08-17 at 11:04, Felix Kuehling wrote: On 2022-08-16 at 23:09, Lang Yu wrote: Then we can remove the burden of maintaining codes to parse family_id from gfx version in rocr, i.e., remove DevIDToAddrLibFamily(). I'm OK with the change. But you won't be able

Re: [PATCH] drm/amdkfd: add family_id property for kfd_node

2022-08-17 Thread Felix Kuehling
On 2022-08-16 at 23:09, Lang Yu wrote: Then we can remove the burden of maintaining codes to parse family_id from gfx version in rocr, i.e., remove DevIDToAddrLibFamily(). I'm OK with the change. But you won't be able to remove DevIDToAddrLibFamily as long as ROCr needs to support older kerne

Re: [PATCHv2] drm/amdgpu: Fix interrupt handling on ih_soft ring

2022-08-16 Thread Felix Kuehling
On 2022-08-15 at 15:25, Mukul Joshi wrote: There are no backing hardware registers for ih_soft ring. As a result, don't try to access hardware registers for read and write pointers when processing interrupts on the IH soft ring. Signed-off-by: Mukul Joshi Reviewed-by: Felix Kue

Re: [PATCH] drm/amdgpu: Fix interrupt handling on ih_soft ring

2022-08-15 Thread Felix Kuehling
On 2022-08-14 at 13:27, Christian König wrote: On 12.08.22 at 22:56, Mukul Joshi wrote: There are no backing hardware registers for ih_soft ring. As a result, don't try to access hardware registers for read and write pointers when processing interrupts on the IH soft ring. Mhm, the original

Re: [PATCH] drm/amdgpu: fix reset domain xgmi hive info reference leak

2022-08-12 Thread Felix Kuehling
On 2022-08-12 18:05, Andrey Grodzovsky wrote: On 2022-08-12 14:38, Kim, Jonathan wrote: [Public] Hi Andrey, Here's the load/unload stack trace.  This is a 2 GPU xGMI system.  I put dbg_xgmi_hive_get/put refcount print post kobj get/put. It's stuck at 2 on unload.  If it's an 8 GPU system,

Re: [PATCH] drm/amdgpu: Fix interrupt handling on ih_soft ring

2022-08-12 Thread Felix Kuehling
On 2022-08-12 16:56, Mukul Joshi wrote: There are no backing hardware registers for ih_soft ring. As a result, don't try to access hardware registers for read and write pointers when processing interrupts on the IH soft ring. Signed-off-by: Mukul Joshi The patch looks good to me. But you pr

Re: [PATCH] drm/amdkfd: potential crash in kfd_create_indirect_link_prop()

2022-08-12 Thread Felix Kuehling
x for this: https://lore.kernel.org/all/20220706183302.1719795-1-ramesh.errab...@amd.com/ You commented on a version of his patch: https://lore.kernel.org/all/20220629161241.GM11460@kadam/ Did this get lost somehow? Anyway, your patch looks good to me and I'm going to apply it to amd-stagin

Re: Selecting CPUs for queuing work on

2022-08-12 Thread Felix Kuehling
On 2022-08-12 16:30, Tejun Heo wrote: On Fri, Aug 12, 2022 at 04:26:47PM -0400, Felix Kuehling wrote: Hi workqueue maintainers, In the KFD (amdgpu) driver we found a need to schedule bottom half interrupt handlers on CPU cores different from the one where the top-half interrupt handler runs to

Selecting CPUs for queuing work on

2022-08-12 Thread Felix Kuehling
Hi workqueue maintainers, In the KFD (amdgpu) driver we found a need to schedule bottom half interrupt handlers on CPU cores different from the one where the top-half interrupt handler runs to avoid the interrupt handler stalling the bottom half in extreme scenarios. See my latest patch that t

Re: [PATCH v2] drm/amdkfd: Try to schedule bottom half on same core

2022-08-12 Thread Felix Kuehling
On 2022-08-12 09:55, Philip Yang wrote: On 2022-08-11 15:04, Felix Kuehling wrote: On systems that support SMT (hyperthreading) schedule the bottom half of the KFD interrupt handler on the same core. This makes it possible to reserve a core for interrupt handling and have the bottom half run

[PATCH v2] drm/amdkfd: Try to schedule bottom half on same core

2022-08-11 Thread Felix Kuehling
before. Use for_each_cpu_wrap instead of open-coding it. Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_device.c | 20 1 file changed, 16 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd
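
For readers unfamiliar with the helper named in this patch description: the kernel's for_each_cpu_wrap(cpu, mask, start) visits every CPU in a mask, beginning at a chosen position and wrapping around to the low-numbered CPUs. The following is only a userspace sketch of that iteration order, not the patch's code. The function cpus_in_wrap_order, the 32-bit mask, and the nbits parameter are hypothetical stand-ins for the kernel's cpumask machinery.

```c
#include <stdint.h>

/*
 * Hypothetical userspace stand-in for iterating a CPU mask in "wrap" order.
 * Visits the set bits of `mask` starting at bit `start`, wrapping around after
 * bit nbits-1, and stores the visited bit numbers in `out`.
 * Assumes 0 < nbits <= 32 and that `out` has room for nbits entries.
 * Returns the number of bits visited.
 */
static int cpus_in_wrap_order(uint32_t mask, unsigned start, unsigned nbits,
                              unsigned *out)
{
    int n = 0;

    for (unsigned i = 0; i < nbits; i++) {
        unsigned cpu = (start + i) % nbits;   /* wrap past the last CPU */

        if (mask & (UINT32_C(1) << cpu))
            out[n++] = cpu;
    }
    return n;
}
```

With CPUs 0, 2, 3 and 5 set in an 8-CPU mask and a start position of 3, this sketch visits them in the order 3, 5, 0, 2, so a caller that takes the first match prefers CPUs at or after the starting one.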

Re: [PATCH] drm/amdgpu: fix reset domain xgmi hive info reference leak

2022-08-11 Thread Felix Kuehling
On 2022-08-11 at 09:42, Jonathan Kim wrote: When an xgmi node is added to the hive, it takes another hive reference for its reset domain. This extra reference was not dropped on device removal from the hive so drop it. Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c

Re: [PATCH] drm/amdkfd: Try to schedule bottom half on same core

2022-08-11 Thread Felix Kuehling
On 2022-08-10 at 19:41, Felix Kuehling wrote: On systems that support SMT (hyperthreading) schedule the bottom half of the KFD interrupt handler on the same core. This makes it possible to reserve a core for interrupt handling and have the bottom half run on that same core. On systems without

[PATCH] drm/amdkfd: Try to schedule bottom half on same core

2022-08-10 Thread Felix Kuehling
before. Use for_each_cpu_wrap instead of open-coding it. Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_device.c | 20 1 file changed, 16 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd

[PATCH] drm/amdkfd: Fix mm reference in SVM eviction worker

2022-08-08 Thread Felix Kuehling
and drop the svm_bo reference if it isn't. Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 17 +++-- drivers/gpu/drm/amd/amdkfd/kfd_svm.h | 1 - 2 files changed, 7 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/dri

[PATCH] drm/amdkfd: Handle restart of kfd_ioctl_wait_events

2022-08-04 Thread Felix Kuehling
hangs when kfd_ioctl_wait_events is interrupted by a signal. Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_events.c | 24 drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 2 +- 3 files changed, 14 insertions

Re: [PATCH] drm/amdgpu: Enable translate_further to extend UTCL2 reach

2022-08-04 Thread Felix Kuehling
On 2022-08-04 12:01, Joseph Greathouse wrote: Enable translate_further on Arcturus and Aldebaran server chips in order to increase the UTCL2 reach from 8 GiB to 64 GiB, which is more in line with the amount of framebuffer DRAM in the devices. Signed-off-by: Joseph Greathouse Acked-by: Felix

[PATCH] drm/amdkfd: Allocate doorbells only when needed

2022-08-03 Thread Felix Kuehling
Only allocate doorbells when the first queue is created on a GPU or the doorbells need to be mapped into CPU or GPU virtual address space. This avoids allocating doorbells unnecessarily and can allow more processes to use KFD on multi-GPU systems. Signed-off-by: Felix Kuehling --- drivers/gpu

Re: [PATCH 2/2] drm/amdgpu: Pessimistic availability based on rounded up allocations

2022-07-29 Thread Felix Kuehling
Your patches are missing Signed-off-by lines. If you use "git commit -s", git should add those automatically for your convenience. Other than that, the patches look good to me. With Signed-off-by added, the series is Reviewed-by: Felix Kuehling On 2022-07-28 at 23:16, Danie

Re: [PATCH v2] drm/amdkfd: use time_is_before_jiffies(a + b) to replace "jiffies - a > b"

2022-07-28 Thread Felix Kuehling
On 2022-07-27 at 23:30, Yu Zhe wrote: time_is_before_jiffies deals with timer wrapping correctly. Signed-off-by: Yu Zhe Thank you. This patch looks good to me. I'm applying it to amd-staging-drm-next. Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_interrupt.
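
As background to the wrapping remark: in the kernel, time_is_before_jiffies(a) expands to time_after(jiffies, a), and time_after() compares two counter values through the sign of their difference, which stays correct when the counter overflows. Below is a minimal userspace sketch of that idea, assuming a 32-bit tick counter. The names tick_after, deadline_passed and deadline_passed_naive are hypothetical and do not come from the patch; the naive variant is a generic contrast case, not the expression the patch replaced.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Wrap-safe "a is after b": looks at the sign of the difference, the same
 * idea as the kernel's time_after(). Relies on the usual two's-complement
 * conversion from uint32_t to int32_t.
 */
static bool tick_after(uint32_t a, uint32_t b)
{
    return (int32_t)(uint32_t)(b - a) < 0;
}

/* Hypothetical analogue of time_is_before_jiffies(deadline). */
static bool deadline_passed(uint32_t now, uint32_t deadline)
{
    return tick_after(now, deadline);
}

/* Direct comparison, shown only for contrast: wrong once `deadline` wraps. */
static bool deadline_passed_naive(uint32_t now, uint32_t deadline)
{
    return now > deadline;
}
```

If the deadline is computed shortly before the counter overflows, it wraps to a small number. The direct comparison then reports the deadline as passed immediately, while the signed-difference form still waits for the intended number of ticks.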

Re: [PATCH] drm/amdgpu: Fix stub fence refcount underflow

2022-07-28 Thread Felix Kuehling
On 2022-07-27 at 23:52, Alex Deucher wrote: On Wed, Jul 27, 2022 at 7:08 PM Felix Kuehling wrote: Don't drop the stub fence reference after installing it as a replacement for the eviction fence. dma_resv_replace_fences doesn't take another reference to the fence, so it takes owners

Re: [PATCH 3/3] drm/amdkfd: remove an unnecessary amdgpu_bo_ref

2022-07-27 Thread Felix Kuehling
On 2022-07-25 at 06:32, Lang Yu wrote: No need to reference the BO here, dmabuf framework will handle that. OK. I guess I needed to do that manually for the userptr attachment, and copy/pasted it unnecessarily for the dmabuf attachment. Reviewed-by: Felix Kuehling Signed-off-by

[PATCH] drm/amdgpu: Fix stub fence refcount underflow

2022-07-27 Thread Felix Kuehling
_fences v2") CC: Christian König Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index 87a3a3ae9448..a6c7dcd8c345

Re: [PATCH] drm/amdkfd: use time_is_before_jiffies(a + b) to replace "jiffies - a > b"

2022-07-27 Thread Felix Kuehling
This patch introduces a build warning for me: CC [M] drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_interrupt.o In file included from /home/fkuehlin/compute/kernel/include/linux/spinlock.h:54, from /home/fkuehlin/compute/kernel/include/linux/mmzone.h:8, from /home/f

Re: [PATCH] drm/amdgpu: Avoid direct cast to amdgpu_ttm_tt

2022-07-26 Thread Felix Kuehling
On 2022-07-26 at 19:43, Rajneesh Bhardwaj wrote: For typesafety, use container_of() instead of implicit cast from struct ttm_tt to struct amdgpu_ttm_tt. Cc: Christian König Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 44 ++--- 1 file c
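
The type-safety argument can be illustrated outside the kernel. A plain pointer cast from an embedded structure to its container is only valid while the embedded member sits at offset zero, whereas container_of() subtracts the member's real offset. The sketch below uses a simplified container_of macro (the kernel version adds compile-time type checks) and the hypothetical types base_tt and driver_tt; it does not reproduce the layout of struct ttm_tt or struct amdgpu_ttm_tt.

```c
#include <stddef.h>

/* Simplified userspace container_of: no type checking, unlike the kernel's. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical "base" object that generic code passes around. */
struct base_tt {
    int num_pages;
};

/*
 * Hypothetical driver object embedding the base. The base is deliberately
 * not the first member, so a direct cast from base_tt * would be wrong here.
 */
struct driver_tt {
    int flags;
    struct base_tt ttm;
};

/* Recover the enclosing driver object from a pointer to its embedded base. */
static struct driver_tt *to_driver_tt(struct base_tt *ttm)
{
    return container_of(ttm, struct driver_tt, ttm);
}
```

Because the offset is taken from the structure definition, this conversion keeps working if members are later added in front of the embedded base, which is the failure mode a bare cast would hide.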

Re: [PATCH v3 2/3] drm/amdkfd: Set svm range max pages

2022-07-26 Thread Felix Kuehling
On 2022-07-25 at 17:17, Philip Yang wrote: This will be used to split giant svm range into smaller ranges, to support VRAM overcommitment by giant range and improve GPU retry fault recover on giant range. Signed-off-by: Philip Yang Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd

Re: [PATCH] drm/amdkfd: fix kgd_mem memory leak when importing dmabuf

2022-07-26 Thread Felix Kuehling
On 2022-07-26 at 07:15, Lang Yu wrote: The kgd_mem memory allocated in amdgpu_amdkfd_gpuvm_import_dmabuf() is not freed properly. Explicitly free it in amdgpu_amdkfd_gpuvm_free_memory_of_gpu() under condition "mem->bo->kfd_bo != mem". Suggested-by: Felix Kuehling Signed

Re: [PATCH 2/3] drm/amdkfd: refine the gfx BO based dmabuf handling

2022-07-25 Thread Felix Kuehling
On 2022-07-25 at 22:18, Lang Yu wrote: On 07/25/ , Felix Kuehling wrote: On 2022-07-25 at 20:40, Lang Yu wrote: On 07/25/ , Felix Kuehling wrote: On 2022-07-25 at 20:15, Lang Yu wrote: On 07/25/ , Felix Kuehling wrote: On 2022-07-25 at 06:32, Lang Yu wrote: We have memory leak issue

Re: [PATCH 2/3] drm/amdkfd: refine the gfx BO based dmabuf handling

2022-07-25 Thread Felix Kuehling
On 2022-07-25 at 20:40, Lang Yu wrote: On 07/25/ , Felix Kuehling wrote: On 2022-07-25 at 20:15, Lang Yu wrote: On 07/25/ , Felix Kuehling wrote: On 2022-07-25 at 06:32, Lang Yu wrote: We have memory leak issue in current implementation, i.e., the allocated struct kgd_mem memory is not

Re: [PATCH 2/3] drm/amdkfd: refine the gfx BO based dmabuf handling

2022-07-25 Thread Felix Kuehling
On 2022-07-25 at 20:15, Lang Yu wrote: On 07/25/ , Felix Kuehling wrote: On 2022-07-25 at 06:32, Lang Yu wrote: We have memory leak issue in current implementation, i.e., the allocated struct kgd_mem memory is not handled properly. The idea is always creating a buffer object when importing a

Re: [PATCH v2 3/3] drm/amdkfd: Split giant svm range

2022-07-25 Thread Felix Kuehling
On 2022-07-25 at 13:19, Philip Yang wrote: Giant svm range split to smaller ranges, align the range start address to max svm range pages to improve MMU TLB usage. Signed-off-by: Philip Yang Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 49

Re: [PATCH v2 2/3] drm/amdkfd: Set svm range max pages

2022-07-25 Thread Felix Kuehling
On 2022-07-25 at 13:19, Philip Yang wrote: This will be used to split giant svm range into smaller ranges, to support VRAM overcommitment by giant range and improve GPU retry fault recover on giant range. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 2 ++ dri

Re: [PATCH 1/1] drm/amdgpu: Remove rounding from vram allocation path

2022-07-25 Thread Felix Kuehling
On 2022-07-25 at 13:14, Daniel Phillips wrote: Rounding up allocations in the allocation path caused test regressions, so now just round in the availability path. --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 11 +-- 1 file changed, 5 insertions(+), 6 deletions(-) diff --git

Re: [PATCH 1/3] drm/amdgpu: Allow TTM to evict svm bo from same process

2022-07-25 Thread Felix Kuehling
"Check whether to prevent eviction of @f by @mm". With that fixed, the patch is Reviewed-by: Felix Kuehling * * @f: [IN] fence * @mm: [IN] mm that needs to be verified + * + * Check if @mm is same as that of the fence @f, if same return TRUE else + * return FALSE. + *

Re: [PATCH 3/3] drm/amdkfd: Split giant svm range

2022-07-25 Thread Felix Kuehling
On 2022-07-25 at 08:23, Philip Yang wrote: Giant svm range split to smaller ranges, align the range start address to max svm range pages to improve MMU TLB usage. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 52 +++- 1 file changed, 36 insert

Re: [PATCH 2/3] drm/amdkfd: Set svm range max pages

2022-07-25 Thread Felix Kuehling
On 2022-07-25 at 08:23, Philip Yang wrote: This will be used to split giant svm range into smaller ranges, to support VRAM overcommitment by giant range and improve GPU retry fault recover on giant range. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 2 ++ drive

Re: [PATCH 2/3] drm/amdkfd: refine the gfx BO based dmabuf handling

2022-07-25 Thread Felix Kuehling
On 2022-07-25 at 06:32, Lang Yu wrote: We have memory leak issue in current implementation, i.e., the allocated struct kgd_mem memory is not handled properly. The idea is always creating a buffer object when importing a gfx BO based dmabuf. Signed-off-by: Lang Yu --- .../gpu/drm/amd/amdgpu

Re: [PATCH 1/1] drm/amdkfd: Correct mmu_notifier_get failure handling

2022-07-21 Thread Felix Kuehling
ses system crash with different backtrace. The fix is to increase process refcount and then decrease the refcount after mmu_notifier_get success. Signed-off-by: Philip Yang Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_process.c | 6 ++ 1 file changed, 6 inserti

Re: [PATCH] drm/amdgpu: Refactor code to handle non coherent and uncached

2022-07-20 Thread Felix Kuehling
+ snoop = true; + + With the two extra blank lines removed, this patch is Reviewed-by: Felix Kuehling Please check whether a similar cleanup can be made in svm_range_get_pte_flags, or maybe even, whether common code can be factored out of those two functions. Re

Re: [PATCH] drm/amdkfd: track unified memory reservation with xnack off

2022-07-18 Thread Felix Kuehling
prange creation and free. Signed-off-by: Alex Sierra Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h| 4 ++ .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 23 --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 60 +-- 3 files changed

Re: [PATCH v9 02/14] mm: move page zone helpers from mm.h to mmzone.h

2022-07-18 Thread Felix Kuehling
On 2022-07-18 06:50, David Hildenbrand wrote: On 15.07.22 17:05, Alex Sierra wrote: [WHY] It makes more sense to have these helpers in zone specific header file, rather than the generic mm.h Signed-off-by: Alex Sierra Acked-by: David Hildenbrand Thank you! I don't think I have the authorit

Re: [PATCH] drm/amdgpu: Fix a NULL pointer of fence

2022-07-18 Thread Felix Kuehling
Xinhui submitted this patch instead, which should address the same issue: "drm/amdgpu: Remove one duplicated ef removal" Alex, can you pick up that patch for drm-fixes for 5.19, if it's not too late? Thanks,   Felix On 2022-07-18 10:58, Mike Lothian wrote: Is this likely to land before 5.1

Re: [PATCH 2/3] drm/amdkfd: track unified memory reservation with xnack off

2022-07-16 Thread Felix Kuehling
On 2022-07-11 21:56, Alex Sierra wrote: [WHY] Unified memory with xnack off should be tracked, as userptr mappings and legacy allocations do. To avoid oversubscribe system memory when xnack off. [How] Exposing functions reserve_mem_limit and unreserve_mem_limit to SVM API and call them on every pr

[PATCH] drm/ttm: fix missing NULL check in ttm_device_swapout

2022-07-14 Thread Felix Kuehling
itted. Signed-off-by: Felix Kuehling --- drivers/gpu/drm/ttm/ttm_device.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c index be24bb6cefd0..165a6cbb45d5 100644 --- a/drivers/gpu/drm/ttm/ttm_device.c +++ b/drivers/gpu/dr

Re: [PATCH] x86/configs: Update defconfig with peer-to-peer configs

2022-07-13 Thread Felix Kuehling
On 2022-07-08 19:17, Ramesh Errabolu wrote: - Update defconfig for PCI_P2PDMA - Update defconfig for DMABUF_MOVE_NOTIFY - Update defconfig for HSA_AMD_P2P --- The patch is missing a Signed-off-by. With that fixed Reviewed-by: Felix Kuehling Notes: Following procedure

Re: [PATCH] drm/amdkfd: Remove Align VRAM allocations to 1MB on APU ASIC

2022-07-13 Thread Felix Kuehling
On 2022-07-13 at 05:14, shikai guo wrote: From: Shikai Guo While executing KFDMemoryTest.MMBench, test case will allocate 4KB size memory 1000 times. Every time, user space will get 2M memory. APU VRAM is 512M, there is not enough memory to be allocated. So the 2M aligned feature is not sui

Re: [PATCH] drm/amdkfd: bump KFD version for unified ctx save/restore memory

2022-07-12 Thread Felix Kuehling
Eric sent out the corresponding user mode patches to the mailing list as well. It looks a bit weird, because it looks like they're part of the same patch series. But patch 2 and 3 are actually user mode patches. The interesting one is patch 3. Do we still need a link to a user mode patch in t

Re: [PATCH 1/1] drm/amdkfd: Process notifier release callback don't take mutex

2022-07-11 Thread Felix Kuehling
; lock((work_completion)(&svms->deferred_list_work)); lock(&process->mutex); Signed-off-by: Philip Yang Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_process.c | 21 + 1 file changed, 9 insertions(+), 12 deletions(-)

Re: [PATCH] drm/amdkfd: bump KFD version for unified ctx save/restore memory

2022-07-11 Thread Felix Kuehling
On 2022-07-11 14:41, Eric Huang wrote: To expose unified memory for ctx save/restore area feature availability to libhsakmt. Signed-off-by: Eric Huang Reviewed-by: Felix Kuehling --- include/uapi/linux/kfd_ioctl.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a

Re: [PATCH v8 02/15] mm: move page zone helpers into new header-specific file

2022-07-08 Thread Felix Kuehling
On 2022-07-08 07:28, David Hildenbrand wrote: On 07.07.22 21:03, Alex Sierra wrote: [WHY] Have a cleaner way to expose all page zone helpers in one header What exactly is a "page zone"? Do you mean a buddy zone as in include/linux/mmzone.h ? Zone as in ZONE_DEVICE. Maybe we could extend mmzone

Re: [PATCH 1/1] drm/amdkfd: Process notifier release callback don't take mutex

2022-07-08 Thread Felix Kuehling
I think this could also be fixed by not taking the process_info lock in svm_range_restore_work and svm_range_set_attr. I'm not even sure why we're taking this lock in the SVM code. I think that was copied from the restore workers in amdgpu_amdkfd_gpuvm.c because there it's used to protect the B

Re: [PATCH] drm/amdgpu: Remove one duplicated ef removal

2022-07-08 Thread Felix Kuehling
On 2022-07-07 21:53, xinhui pan wrote: That has been done in BO release notify. Signed-off-by: xinhui pan Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 5 - 1 file changed, 5 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu

Re: [PATCH 1/1] drm/amdgpu: Don't clear memory when PTB BO is released

2022-07-07 Thread Felix Kuehling
On 2022-07-07 12:45, Philip Yang wrote: MMU notifier callback unmap the svm range update page table may free the PTB BO, then amdgpu_fill_buffer zero BO memory could cause deadlock as kmalloc may trigger MMU notifier. amdgpu_vm_pt_clear setup PTB BO memory with initial value, and no sensitive da

Re: [PATCH] drm/amdgpu: Fix one list corruption when create queue fails

2022-07-07 Thread Felix Kuehling
On 2022-07-07 at 09:39, philip yang wrote: On 2022-07-07 06:28, xinhui pan wrote: Queue would be freed when create_queue_cpsch fails So let's do queue cleanup otherwise various list and memory issues happen. This bug was introduced when adding MES support, as we used to ignore execute_queue

Re: [PATCH] drm/amdgpu: Fix a NULL pointer of fence

2022-07-07 Thread Felix Kuehling
On 2022-07-07 at 05:54, Christian König wrote: On 07.07.22 at 11:50, xinhui pan wrote: Fence is accessed by dma_resv_add_fence() now. Use amdgpu_amdkfd_remove_eviction_fence instead. Signed-off-by: xinhui pan ---   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 4 ++--   1 file changed, 2

Re: [PATCH] drm/amdkfd: Fix warnings from static analyzer Smatch

2022-07-06 Thread Felix Kuehling
"drm/amdkfd: Extend KFD device topology to surface peer-to-peer links") Signed-off-by: Ramesh Errabolu Reported-by: Dan Carpenter Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 34 +++ 1 file changed, 17 insertions(+), 17 deletions(-)

Re: [PATCH 1/1] drm/amdkfd: Remove queue sysfs and doorbell after unmapping

2022-07-05 Thread Felix Kuehling
On 2022-07-05 15:02, philip yang wrote: On 2022-07-05 14:48, Felix Kuehling wrote: On 2022-06-10 21:03, Philip Yang wrote: If destroying/unmapping queue failed, application may destroy queue again, cause below kernel warning backtrace. For outstanding queues, either applications forget to

Re: [PATCH 1/1] drm/amdkfd: Remove queue sysfs and doorbell after unmapping

2022-07-05 Thread Felix Kuehling
On 2022-06-10 21:03, Philip Yang wrote: If destroying/unmapping queue failed, application may destroy queue again, cause below kernel warning backtrace. For outstanding queues, either applications forget to destroy or failed to destroy, kfd_process_notifier_release will remove queue sysfs object

Re: [PATCH 2/3] drm/amdkfd: track unified memory reservation with xnack off

2022-07-05 Thread Felix Kuehling
On 2022-07-05 12:09, philip yang wrote: On 2022-06-27 20:23, Alex Sierra wrote: [WHY] Unified memory with xnack off should be tracked, as userptr mappings and legacy allocations do. To avoid oversubscribe system memory when xnack off. I think this also apply to XNACK ON (remove p->xnack_enabl

Re: [PATCH 5/5] libhsakmt: allocate unified memory for ctx save restore area

2022-06-30 Thread Felix Kuehling
header->ErrorEventId = Event->EventId; + header->ErrorReason = ErrPayload; + header->DebugOffset = q->ctx_save_restore_size; + header->DebugSize = q->debug_memory_size; + } Is there a way to refac

Re: [PATCH v3] drm/amdkfd: simplify vm_validate_pt_pd_bos

2022-06-30 Thread Felix Kuehling
gets mapped. So just map root PD after updating vm->update_funcs in amdgpu_vm_make_compute whether the vm_update_mode changed or not. v3: - Add some comments suggested by Christian. v2: - Don't rename vm_validate_pt_pd_bos and make it public. Signed-off-by: Lang Yu Reviewed-by: F

Re: [PATCH 2/2] drm/amdkfd: change svm range evict

2022-06-30 Thread Felix Kuehling
On 2022-06-30 11:19, Eric Huang wrote: On 2022-06-29 19:29, Felix Kuehling wrote: On 2022-06-29 18:53, Eric Huang wrote: On 2022-06-29 18:20, Felix Kuehling wrote: On 2022-06-28 17:43, Eric Huang wrote: Two changes: 1. reducing unnecessary evict/unmap when range is not mapped to gpu. 2

Re: [PATCH v5 1/11] drm/amdkfd: Add KFD SMI event IDs and triggers

2022-06-30 Thread Felix Kuehling
: Philip Yang Reviewed-by: Felix Kuehling --- include/uapi/linux/kfd_ioctl.h | 37 ++ 1 file changed, 37 insertions(+) diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h index c648ed7c5ff1..f239e260796b 100644 --- a/include/uapi/linux

Re: [PATCH v5 8/11] drm/amdkfd: Bump KFD API version for SMI profiling event

2022-06-30 Thread Felix Kuehling
On 2022-06-28 at 10:50, Philip Yang wrote: Indicate SMI profiling events available. Signed-off-by: Philip Yang Reviewed-by: Felix Kuehling --- include/uapi/linux/kfd_ioctl.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/kfd_ioctl.h b/include

Re: [PATCH v5 7/11] drm/amdkfd: Asynchronously free smi_client

2022-06-30 Thread Felix Kuehling
ang Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c | 14 ++ 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c index e5896b7a16dd..0472b56de

Re: [PATCH v5 6/11] drm/amdkfd: Add unmap from GPU SMI event

2022-06-30 Thread Felix Kuehling
On 2022-06-28 at 10:50, Philip Yang wrote: SVM range unmapped from GPUs when range is unmapped from CPU, or with xnack on from MMU notifier when range is evicted or migrated. Signed-off-by: Philip Yang Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c | 9

Re: [PATCH v5 5/11] drm/amdkfd: Add user queue eviction restore SMI event

2022-06-30 Thread Felix Kuehling
restore. Signed-off-by: Philip Yang Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h| 2 +- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 12 --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 4 +-- drivers/gpu/drm/amd/amdkfd/kfd_device.c | 4

Re: [PATCH v5 4/11] drm/amdkfd: Add migration SMI event

2022-06-30 Thread Felix Kuehling
eviction. Signed-off-by: Philip Yang Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_migrate.c| 53 - drivers/gpu/drm/amd/amdkfd/kfd_migrate.h| 5 +- drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c | 22 + drivers/gpu/drm/amd/amdkfd

Re: [PATCH v5 3/11] drm/amdkfd: Add GPU recoverable fault SMI event

2022-06-30 Thread Felix Kuehling
tamp < AMDGPU_SVM_RANGE_RETRY_FAULT_PENDING) { + if (ktime_before(timestamp, ktime_add_ns(prange->validate_timestamp, + AMDGPU_SVM_RANGE_RETRY_FAULT_PENDING))) { You changed the timestamp units from us to ns. I think you'll need to update AMDGPU_SVM_RANGE_RETRY_FAULT_PENDING (multiply with 100
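The unit concern raised in this review can be illustrated with a minimal user-space sketch. It assumes plain 64-bit nanosecond timestamps in place of the kernel's ktime_t, and every name in it (EXAMPLE_RETRY_PENDING_US, EXAMPLE_RETRY_PENDING_NS, fault_within_pending_window) is hypothetical; the window length is an illustrative value, not the driver's actual constant. The only fact relied on is that one microsecond is 1000 nanoseconds, so a window defined in microseconds has to be rescaled before it is added to a nanosecond timestamp.

```c
#include <stdbool.h>
#include <stdint.h>

/* 1 microsecond = 1000 nanoseconds. */
#define NSEC_PER_USEC 1000ULL

/* Hypothetical window length in microseconds; NOT the driver's real value. */
#define EXAMPLE_RETRY_PENDING_US 2000ULL

/* The same window rescaled to nanoseconds, so it can be added to ns timestamps. */
#define EXAMPLE_RETRY_PENDING_NS (EXAMPLE_RETRY_PENDING_US * NSEC_PER_USEC)

/*
 * Hypothetical helper mirroring the shape of the quoted check:
 * true when the fault timestamp is strictly before validate + window.
 * Both timestamps are assumed to be in nanoseconds.
 */
static bool fault_within_pending_window(uint64_t fault_ns, uint64_t validate_ns)
{
    return fault_ns < validate_ns + EXAMPLE_RETRY_PENDING_NS;
}
```

If the window were left in microseconds while the timestamps moved to nanoseconds, the comparison would still compile, but the window would be 1000 times shorter than intended.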

Re: [PATCH 2/2] drm/amdkfd: change svm range evict

2022-06-29 Thread Felix Kuehling
On 2022-06-29 18:53, Eric Huang wrote: On 2022-06-29 18:20, Felix Kuehling wrote: On 2022-06-28 17:43, Eric Huang wrote: Two changes: 1. reducing unnecessary evict/unmap when range is not mapped to gpu. 2. adding always evict when flags is set to always_mapped. Signed-off-by: Eric Huang

Re: [PATCH] drm/amdkfd: fix cu mask for asics with wgps

2022-06-29 Thread Felix Kuehling
distribute CUs across the SEs in a pairwise manner to assume WGP mode at all times. Signed-off-by: Jonathan Kim Looks good to me. Three nit-picks inline. With that fixed, the patch is Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c | 12

Re: [PATCH 2/2] drm/amdkfd: change svm range evict

2022-06-29 Thread Felix Kuehling
On 2022-06-28 17:43, Eric Huang wrote: Two changes: 1. reducing unnecessary evict/unmap when range is not mapped to gpu. 2. adding always evict when flags is set to always_mapped. Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 10 -- 1 file changed, 8 insertions

Re: [PATCH v7 01/14] mm: rename is_pinnable_pages to is_pinnable_longterm_pages

2022-06-29 Thread Felix Kuehling
On 2022-06-29 03:33, David Hildenbrand wrote: On 29.06.22 05:54, Alex Sierra wrote: is_pinnable_page() and folio_is_pinnable() were renamed to is_longterm_pinnable_page() and folio_is_longterm_pinnable() respectively. These functions are used in the FOLL_LONGTERM flag context. Subject talks abo

Re: [PATCH] drm/amdkfd: Fix warnings from static analyzer Smatch

2022-06-29 Thread Felix Kuehling
On 2022-06-28 20:03, Errabolu, Ramesh wrote: [AMD Official Use Only - General] My responses are inline -Original Message- From: Kuehling, Felix Sent: Tuesday, June 28, 2022 6:41 PM To: Errabolu, Ramesh ; amd-gfx@lists.freedesktop.org; dan.carpen...@oracle.com Subject: Re: [PATCH]

Re: [PATCH] drm/amdkfd: Fix warnings from static analyzer Smatch

2022-06-28 Thread Felix Kuehling
On 2022-06-28 19:25, Ramesh Errabolu wrote: The patch fixes couple of warnings, as reported by Smatch a static analyzer Signed-off-by: Ramesh Errabolu Reported-by: Dan Carpenter --- drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 36 --- 1 file changed, 19 insertions(+

Re: [PATCH 1/3] drm/amdkfd: add new flags for svm

2022-06-28 Thread Felix Kuehling
On 2022-06-27 12:01, Eric Huang wrote: No. There is only internal link for now, because it is under review. Once it is submitted, external link should be in gerritgit for libhsakmt. Hi Eric, For anything that requires ioctl API changes, the user mode and kernel mode changes need to be rev

Re: [PATCH v5 01/13] mm: add zone device coherent type memory support

2022-06-21 Thread Felix Kuehling
. Signed-off-by: Alex Sierra Acked-by: Felix Kuehling Reviewed-by: Alistair Popple [hch: rebased ontop of the refcount changes, removed is_dev_private_or_coherent_page] Signed-off-by: Christoph Hellwig --- include/linux/memremap.h | 19 +++ mm/memcontrol.c

Re: [PATCH v2 1/3] drm/amdgpu: Fetch MES scheduler/KIQ versions

2022-06-10 Thread Felix Kuehling
On 2022-06-10 13:13, Graham Sider wrote: Store MES scheduler and MES KIQ version numbers in amdgpu_mes for GFX11. Signed-off-by: Graham Sider Acked-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h | 3 +++ drivers/gpu/drm/amd/amdgpu/mes_v11_0.c | 12

Re: [PATCH v2 3/3] drm/amdgpu: Update mes_v11_api_def.h

2022-06-10 Thread Felix Kuehling
On 2022-06-10 13:13, Graham Sider wrote: Update MES API to support oversubscription without aggregated doorbell for usermode queues. Signed-off-by: Graham Sider Acked-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 1 + drivers/gpu/drm/amd/amdgpu

Re: [PATCH v2 2/3] drm/amdkfd: Enable GFX11 usermode queue oversubscription

2022-06-10 Thread Felix Kuehling
On 2022-06-10 13:13, Graham Sider wrote: Starting with GFX11, MES requires wptr BOs to be GTT allocated/mapped to GART for usermode queues in order to support oversubscription. In the case that work is submitted to an unmapped queue, MES must have a GART wptr address to determine whether the

Re: [PATCH v3] drm/amdkfd: Add available memory ioctl

2022-06-10 Thread Felix Kuehling
: David Yat Sin Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h| 1 + .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 38 +-- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 34 + include/uapi/linux/kfd_ioctl.h

Re: [PATCH v2] drm/amdkfd: fix warning when CONFIG_HSA_AMD_P2P is not set

2022-06-10 Thread Felix Kuehling
On 2022-06-10 11:46, Alex Deucher wrote: drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c:1542:11: warning: variable 'i' set but not used [-Wunused-but-set-variable] Reported-by: kernel test robot Signed-off-by: Alex Deucher Thank you for taking care of this. Reviewed

Re: [PATCH] drm/amdgpu: Fix error handling in amdgpu_amdkfd_gpuvm_free_memory_of_gpu

2022-06-10 Thread Felix Kuehling
le to which BO belongs + * + * Return: void I don't think you need to state a void return explicitly. [+David], since you were looking into KFD documentation and kernel-doc comments lately, do you have any feedback on the kernel-doc syntax? Other than that, this patch is Reviewed-by: Felix Kue
