On 2024-03-04 19:20, Rehman, Ahmad wrote:
[AMD Official Use Only - General]
Hey,
Due to mode-1 reset (pending_reset), the amdgpu_amdkfd_device_init
will not be called and hence adev->kfd.init_complete will not be set.
The function amdgpu_amdkfd_drm_client_create has condition:
if
On 2024-03-04 10:19, Samir Dhume wrote:
Signed-off-by: Samir Dhume
Please add a meaningful commit description to all the patches in the
series. See one more comment below.
---
drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 34 +++-
1 file changed, 27 insertions(+), 7
On 2024-03-04 17:05, Ahmad Rehman wrote:
In passthrough environment, when amdgpu is reloaded after unload, mode-1
is triggered after initializing the necessary IPs. That init does not
include KFD, and KFD init waits until the reset is completed. KFD init
is called in the reset handler, but in
On 2024-02-29 01:04, Jesse.Zhang wrote:
Fix the issue:
"amdgpu: Failed to create process VM object".
[Why] When amdgpu is initialized, seq64 does its mapping and updates the BO mapping in the VM
page table.
But when clinfo runs, it also initializes a VM for a process device through the
function
put last
in vm_fini()
Cc: Christian Koenig
Cc: Alex Deucher
Cc: Felix Kuehling
Signed-off-by: Shashank Sharma
One nit-pick and one bug inline. With those fixed, the patch
Reviewed-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 9 +-
drivers/gpu/drm/a
On 2024-02-28 01:41, Christian König wrote:
On 28.02.24 at 06:04, Jesse.Zhang wrote:
Fix the issue when running clinfo:
"amdgpu: Failed to create process VM object".
When amdgpu is initialized, seq64 does its mapping and updates the BO mapping in
the VM page table.
But when clinfo runs, it also initializes a VM
+TMA reserved memory size
to two pages.
Signed-off-by: Laurent Morichetti
Reviewed-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdkfd/kfd_device.c | 23 ---
drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 6 +++---
2 files changed, 19 insertions(+), 10 deletions(-)
diff
On 2024-02-21 05:54, Jonathan Kim wrote:
Prevent dropping the KFD process reference at the end of a debug
IOCTL call where the acquired process value is an error.
Signed-off-by: Jonathan Kim
Reviewed-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 1 +
1 file
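A minimal userspace sketch of the pattern described above, assuming the acquired pointer may encode an errno. The names `err_ptr`, `is_err`, `struct proc`, `proc_get`, `proc_put` and `debug_ioctl` are hypothetical stand-ins, not the KFD code or the actual one-line patch. It only illustrates why the common exit path must not release a reference that was never taken.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's ERR_PTR()/IS_ERR() helpers. */
#define MAX_ERRNO 4095

static void *err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static int is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical refcounted object standing in for a process. */
struct proc {
	int refcount;
};

/* Takes a reference, or returns an encoded error when 'fail' is set. */
static struct proc *proc_get(struct proc *p, int fail)
{
	if (fail)
		return err_ptr(-ESRCH);
	p->refcount++;
	return p;
}

static void proc_put(struct proc *p)
{
	p->refcount--;
}

/* Returns 0 on success or a negative errno. */
static int debug_ioctl(struct proc *candidate, int fail)
{
	struct proc *target;
	int r = 0;

	assert(candidate != NULL);
	target = proc_get(candidate, fail);
	if (is_err(target)) {
		r = (int)(intptr_t)target;
		target = NULL; /* nothing was acquired, so nothing to release */
		goto out;
	}
	/* ... the real work on 'target' would happen here ... */
out:
	if (target)
		proc_put(target);
	return r;
}
```

Without the `target = NULL` assignment, the exit path would call `proc_put()` on an error-encoded pointer.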
On 2024-02-15 10:18, Philip Yang wrote:
Document how to use SMI system management interface to receive SVM
events.
Define an SVM event message string format macro that user mode can use
with sscanf to parse the event. Add it to the uAPI header file to make
it obvious that is changing uAPI in
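As a rough illustration of the idea of a shared format macro, the sketch below parses one event line with sscanf. The macro `DEMO_EVENT_FMT`, the line layout and `struct demo_event` are all assumptions made up for this example; the real uAPI header defines its own formats, which are not reproduced here.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical line layout: "<event id, hex> <timestamp ns> -<pid> @<address, hex>".
 * This is NOT the real uAPI format string.
 */
#define DEMO_EVENT_FMT "%x %lld -%d @%lx"

struct demo_event {
	unsigned int id;
	long long ns;
	int pid;
	unsigned long addr;
};

/* Returns 0 if all four fields were parsed, -1 otherwise. */
static int parse_demo_event(const char *line, struct demo_event *ev)
{
	int n;

	assert(line != NULL && ev != NULL);
	n = sscanf(line, DEMO_EVENT_FMT, &ev->id, &ev->ns, &ev->pid, &ev->addr);
	return n == 4 ? 0 : -1;
}
```

Keeping the format in one macro means the producer and the parser cannot drift apart silently, which is presumably why the patch places it in the uAPI header.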
Signed-off-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c | 3 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c| 6 +---
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 11 +++-
drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c | 29 ++--
4 files changed, 27
Signed-off-by: Rajneesh Bhardwaj
Reviewed-by: Felix Kuehling
---
* Change the enum bitfield to 4 to avoid ORing condition of previous
member flags.
* Incorporate review feedback from Felix from
https://www.mail-archive.com/amd-gfx@lists.freedesktop.org/msg102840.html
and split one
-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c
b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c
index d722cbd31783..826bc4f6c8a7 100644
--- a/drivers
On 2024-02-09 20:49, Rajneesh Bhardwaj wrote:
In certain cooperative group dispatch scenarios the default SPI resource
allocation may cause reduced per-CU workgroup occupancy. Set
COMPUTE_RESOURCE_LIMITS.FORCE_SIMD_DIST=1 to mitigate soft hang
scenarios.
Suggested-by: Joseph Greathouse
On 2024-02-08 15:01, Bhardwaj, Rajneesh wrote:
On 2/8/2024 2:41 PM, Felix Kuehling wrote:
On 2024-02-07 23:14, Rajneesh Bhardwaj wrote:
In certain cooperative group dispatch scenarios the default SPI
resource
allocation may cause reduced per-CU workgroup occupancy. Set
On 2024-02-07 23:14, Rajneesh Bhardwaj wrote:
In certain cooperative group dispatch scenarios the default SPI resource
allocation may cause reduced per-CU workgroup occupancy. Set
COMPUTE_RESOURCE_LIMITS.FORCE_SIMD_DIST=1 to mitigate soft hang
scenarios.
Suggested-by: Joseph Greathouse
the kfd_gpu_cache_info before asking the remaining
fields to be filled in by lower-level functions.
Fixes: 04756ac9a24c ("drm/amdkfd: Add cache line sizes to KFD topology")
Signed-off-by: Joseph Greathouse
Reviewed-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 1 +
1 file
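The zero-before-fill approach described in this thread can be sketched as follows. `struct cache_info`, `fill_partial` and `get_cache_info` are hypothetical names for illustration only, not the kfd_topology.c code.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical cache descriptor; the field names are assumptions. */
struct cache_info {
	unsigned int cache_size;
	unsigned int cache_level;
	unsigned int cache_line_size;
	unsigned int flags;
};

/* A lower-level helper that fills only some of the fields. */
static void fill_partial(struct cache_info *ci)
{
	ci->cache_size = 16;
	ci->cache_level = 1;
}

/* Zero the whole struct first so unfilled fields read as 0, not garbage. */
static void get_cache_info(struct cache_info *out)
{
	assert(out != NULL);
	memset(out, 0, sizeof(*out));
	fill_partial(out);
}
```

Any field the helper skips is then reported as 0 rather than as whatever happened to be in memory.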
On 2024-02-06 16:24, Kent Russell wrote:
Partition mode only affects L3 cache size. After removing the L2 check in
the previous patch, make sure we aren't dividing all cache sizes by
partition mode, just L3.
Fixes: a75bfb3c4045 ("drm/amdkfd: Fix L2 cache size reporting in GFX9.4.3")
The fixes
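A small sketch of the rule stated in this commit message, under the assumption that the partition mode can be expressed as a partition count used as a divisor. `struct cache_entry` and `apply_partition_mode` are hypothetical and do not mirror the driver's data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cache entry: level (1, 2, 3) and size in KiB. */
struct cache_entry {
	unsigned int level;
	unsigned int size_kb;
};

/* Only the L3 size is split across partitions; L1 and L2 stay as they are. */
static void apply_partition_mode(struct cache_entry *c, unsigned int num_partitions)
{
	assert(c != NULL);
	if (c->level == 3 && num_partitions > 1)
		c->size_kb /= num_partitions;
}
```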
On 2024-02-06 15:55, Joseph Greathouse wrote:
The current kfd_gpu_cache_info structure is only partially
filled in for some architectures. This means that for devices
where we do not fill in some fields, we can return
uninitialized values through the KFD topology.
Zero out the
On 2024-02-01 11:50, Philip Yang wrote:
SVM migration unmaps pages from the GPU and then updates the mapping to the GPU to
recover from the page fault. Currently unmap clears the PDE entry for range
length >= huge page and frees the PTB BO; update mapping allocates a new PT BO.
There is a race bug in that the freed entry bo
Thanks for checking. The patch is
Reviewed-by: Felix Kuehling
Thanks,
-Joe
Regards,
Felix
+ m->compute_resource_limits = q->is_gws ?
+ COMPUTE_RESOURCE_LIMITS__FORCE_SIMD_DIST_MASK : 0;
+
q->is_active = QUEUE_IS_ACTIVE(*q);
}
On 2024-02-01 13:54, Rajneesh Bhardwaj wrote:
In certain cooperative group dispatch scenarios the default SPI resource
allocation may cause reduced per-CU workgroup occupancy. Set
COMPUTE_RESOURCE_LIMITS.FORCE_SIMD_DIST=1 to mitigate soft hang
scenarios.
Suggested-by: Joseph Greathouse
_64+0x3f/0x90
[ 41.709973] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
Signed-off-by: Lang Yu
Reviewed-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h| 2 +-
.../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 20 ---
drivers/gpu/drm/amd/amdkfd/kfd_charde
by a NULL
access with a small offset.
v2:
- Move it to the reserved space to avoid conflicts with Mesa
- Add macros to make reserved space management easier
Cc: Arunpravin Paneer Selvam
Cc: Christian Koenig
Signed-off-by: Jay Cornwall
Signed-off-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdgpu
/vm/mmap_min_addr.
Signed-off-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 98a57192..2c4053b29bb3 100644
--- a/drivers
On 2024-01-29 11:50, Arunpravin Paneer Selvam wrote:
@@ -339,18 +346,19 @@ static void kfd_init_apertures_v9(struct
kfd_process_device *pdd, uint8_t id)
pdd->lds_base = MAKE_LDS_APP_BASE_V9();
pdd->lds_limit = MAKE_LDS_APP_LIMIT(pdd->lds_base);
- /* Raven needs SVM to
, Felix Kuehling wrote:
On 2024-01-29 8:58, Shengyu Qu wrote:
Hi,
It seems the rocm-opengl interop hang problem still exists [1]. By the way, have you
looked into this problem?
Best regards,
Shengyu
[1]
https://projects.blender.org/blender/blender/issues/100353#issuecomment-599
Maybe you're having a different
On 2024-01-28 21:30, Yu, Lang wrote:
[AMD Official Use Only - General]
-Original Message-
From: Kuehling, Felix
Sent: Saturday, January 27, 2024 3:22 AM
To: Yu, Lang ; amd-gfx@lists.freedesktop.org
Cc: Francis, David
Subject: Re: [PATCH v2] drm/amdkfd: reserve the BO before
On 2024-01-29 3:45, Yu, Lang wrote:
[AMD Official Use Only - General]
-Original Message-
From: amd-gfx On Behalf Of Felix
Kuehling
Sent: Friday, January 26, 2024 6:28 AM
To: amd-gfx@lists.freedesktop.org
Cc: Cornwall, Jay ; Koenig, Christian
; Paneer Selvam, Arunpravin
Subject
be we need more users to test it.
Besides,
Tested-by: Shengyu Qu
Best Regards,
Shengyu
On 2024/1/26 06:27, Felix Kuehling wrote:
The TBA and TMA, along with an unused IB allocation, reside at low
addresses in the VM address space. A stray VM fault which hits these
pages must be serviced
On 2024-01-25 20:59, Yu, Lang wrote:
[AMD Official Use Only - General]
-Original Message-
From: Kuehling, Felix
Sent: Thursday, January 25, 2024 5:41 AM
To: Yu, Lang ; amd-gfx@lists.freedesktop.org
Cc: Francis, David
Subject: Re: [PATCH v2] drm/amdkfd: reserve the BO before
by a NULL
access with a small offset.
v2:
- Move it to the reserved space to avoid conflicts with Mesa
- Add macros to make reserved space management easier
Cc: Arunpravin Paneer Selvam
Cc: Christian Koenig
Signed-off-by: Jay Cornwall
Signed-off-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdgpu
ight place for this to ensure it only gets called once.
The fix looks reasonable to me.
Reviewed-by: Felix Kuehling
This looks fine to me, needs to be checked by Felix anyway.
Thanks,
Lijo
And re-locating the drm client creation following after drm_dev_register
looks like a more proper flow.
v2: wr
On 2024-01-22 4:08, Lang Yu wrote:
Fixes: 410f08516e0f ("drm/amdkfd: Move dma unmapping after TLB flush")
v2:
Avoid unmapping attachment twice when ERESTARTSYS.
[ 41.708711] WARNING: CPU: 0 PID: 1463 at drivers/gpu/drm/ttm/ttm_bo.c:846
ttm_bo_validate+0x146/0x1b0 [ttm]
[ 41.708989] Call
On 2024-01-24 9:32, Shashank Sharma wrote:
On 19/01/2024 21:23, Felix Kuehling wrote:
On 2024-01-18 14:21, Shashank Sharma wrote:
This patch changes the handling and lifecycle of vm->task_info object.
The major changes are:
- vm->task_info is a dynamically allocated ptr now, and its
* On various Navis, most cache lines are 128 except L1 scalar data and
instruction caches as well as L3 cache
* You fixed L1 scalar data and instruction cache sizes for Carrizo.
Was that intentional?
If that sounds correct and how it's meant to be, you can add my
Reviewed-by: Felix Kuehling
of vm->task_info.
V2: Do not block all the prints when task_info not found (Felix)
Cc: Christian Koenig
Cc: Alex Deucher
Cc: Felix Kuehling
Signed-off-by: Shashank Sharma
Nit-picks inline.
---
drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 7 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
On 2024-01-18 07:07, Christian König wrote:
On 18.01.24 at 00:44, Friedrich Vock wrote:
On 18.01.24 00:00, Alex Deucher wrote:
[SNIP]
Right now, IH overflows, even if they occur repeatedly, only get
registered once. If not registering IH overflows can trivially
lead to
system crashes, it's
ras_ctrl debugfs
Charlene Liu (1):
drm/amd/display: Update z8 latency
Dafna Hirschfeld (1):
drm/amdkfd: fixes for HMM mem allocation
Daniel Miess (1):
Revert "drm/amd/display: Fix conversions between bytes and KB"
Felix Kuehling (4):
drm/amdkfd: Fix lock
A static checker pointed out that bo_va->base.bo was already dereferenced
earlier in the same scope. Therefore this check is unnecessary here.
Reported-by: Dan Carpenter
Fixes: 79e7fdec71f2 ("drm/amdgpu: Auto-validate DMABuf imports in compute VMs")
Signed-off-by: Felix Kuehling
---
be generalized later if there is interest then.
Regards,
Felix
On 2023-12-06 16:23, Felix Kuehling wrote:
Executive Summary: We need to add CRIU support to DRM render nodes in
order to maintain CRIU support for ROCm applications once they start
relying on render nodes for more GPU memory management
On 2024-01-12 3:05, Flora Cui wrote:
otherwise drm_client_dev_unregister() would try to
kfree(>kfd.client).
Signed-off-by: Flora Cui
Thank you for finding and fixing this bug. You can add:
Fixes: 1819200166ce ("drm/amdkfd: Export DMABufs from KFD using GEM
handles")
Revie
won't be able to start. The VA range allocator is in libdrm.
Marek
On Fri, Jan 5, 2024, 15:20 Felix Kuehling wrote:
TBA/TMA were relocated to the upper half of the canonical address space.
I don't think that qualifies as 32-bit by definition. But maybe you're
using a different definition
On 2024-01-10 10:56, Srinivasan Shanmugam wrote:
Fixes the below:
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_debug.c:1024
kfd_dbg_trap_device_snapshot() warn: variable dereferenced before check
'entry_size' (see line 1021)
Cc: Felix Kuehling
Cc: Christian König
Cc: Alex Deucher
Signed-off
On 2024-01-10 17:01, Philip Yang wrote:
While partially migrating an SVM range to system memory, clear the dma_addr vram
domain flag; otherwise a future split will get incorrect vram_pages
and actual loc.
After range splitting, set new range and old range actual_loc:
new range actual_loc is 0 if
On 2024-01-11 02:22, Lang Yu wrote:
Fixes: 410f08516e0f ("drm/amdkfd: Move dma unmapping after TLB flush")
[ 41.708711] WARNING: CPU: 0 PID: 1463 at drivers/gpu/drm/ttm/ttm_bo.c:846
ttm_bo_validate+0x146/0x1b0 [ttm]
[ 41.708989] Call Trace:
[ 41.708992]
[ 41.708996] ?
[+Jon]
On 2024-01-11 01:05, Ma, Jun wrote:
Hi Felix,
On 1/10/2024 11:57 PM, Felix Kuehling wrote:
On 2024-01-10 04:39, Ma Jun wrote:
There is the following shift-out-of-bounds warning if ecode=0.
"shift exponent 4294967295 is too large for 64-bit type 'long long unsigned
int'"
On 2024-01-09 15:05, Philip Yang wrote:
After partially migrating an SVM range to system memory, unmap to clean up the
corresponding dma_addr vram domain flag; otherwise a future split will
get incorrect vram_pages and actual loc.
After range splitting, set new range and old range actual_loc:
new
On 2024-01-10 04:39, Ma Jun wrote:
There is the following shift-out-of-bounds warning if ecode=0.
"shift exponent 4294967295 is too large for 64-bit type 'long long unsigned
int'"
Signed-off-by: Ma Jun
---
include/uapi/linux/kfd_ioctl.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
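The warning's exponent, 4294967295, is what a 32-bit unsigned `ecode - 1` wraps to when ecode is 0. A minimal sketch of a guarded mask helper follows, assuming the mask is built as a shift by `ecode - 1`. The function name `ec_mask` is hypothetical and this is not the macro from kfd_ioctl.h.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Build a one-bit mask for a 1-based exception code.
 * An unguarded "1ULL << (ecode - 1)" is undefined for ecode == 0,
 * because the unsigned subtraction wraps to 4294967295.
 */
static uint64_t ec_mask(uint32_t ecode)
{
	/* Codes above 64 do not fit in a 64-bit mask; treat them as a caller bug. */
	assert(ecode <= 64);
	return ecode ? 1ULL << (ecode - 1) : 0ULL;
}
```

Code 0 maps to an empty mask, so callers that OR or AND masks together see no bit set for it.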
mdkfd/kfd_svm.c:2691
svm_range_get_range_boundaries() warn: can 'node' even be NULL?
Suggested-by: Philip Yang
Cc: Felix Kuehling
Cc: Christian König
Cc: Alex Deucher
Signed-off-by: Srinivasan Shanmugam
Reviewed-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 10 +-
1 file changed, 5 insert
On 2024-01-07 08:07, Dafna Hirschfeld wrote:
Fix err return value and reset pgmap->type after checking it.
Fixes: c83dee9b6394 ("drm/amdkfd: add SPM support for SVM")
Reviewed-by: Felix Kuehling
Signed-off-by: Dafna Hirschfeld
---
v2: remove unrelated DOC fix and add 'Fixes'
Properly mark kfd_process->ef as __rcu and consistently use the right
accessor functions.
Reported-by: kernel test robot
Closes:
https://lore.kernel.org/oe-kbuild-all/202312052245.yfpbsgnh-...@intel.com/
Signed-off-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
why UMDs can't allocate
anything in that range.
Marek
On Wed, Jan 3, 2024 at 2:50 PM Jay Cornwall wrote:
On 1/3/2024 12:58, Felix Kuehling wrote:
A segfault in Mesa seems to be a different issue from what's mentioned
in the commit message. I'd let Christian or Marek comment on
compatibility
On 2024-01-04 4:33, Christian König wrote:
On 04.01.24 at 00:15, Felix Kuehling wrote:
DMABuf imports in compute VMs are not wrapped in a kgd_mem object on the
process_info->kfd_bo_list. There is no explicit KFD API call to validate
them or add eviction fences to them.
This pa
s long as all
imports are from KFD, with the exports already reserved, validated and
fenced by the KFD restore worker.
v5: Reintroduced separate evicted_user state to simplify the state machine
and CS error handling when amdgpu_vm_validate is called without a ticket.
Signed-off-by: Felix Ku
This is not strictly a change in the IOCTL API. This version bump is meant
to indicate to user mode the presence of a number of changes and fixes
that enable the management of VA mappings in compute VMs using the GEM_VA
ioctl for DMABufs exported from KFD.
Signed-off-by: Felix Kuehling
letion)(>deferred_list_work));
sync(srcu);
Signed-off-by: Philip Yang
Reviewed-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 6 --
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
b/drivers/gpu/drm/amd/amdk
't see how Jay's patch could have caused that. I
made another change in that code recently that could make a difference
for this issue:
commit 8f08c5b24ced1be7eb49692e4816c1916233c79b
Author: Felix Kuehling
Date: Fri Oct 27 18:21:55 2023 -0400
drm/amdkfd: Run restore_workers on fr
t are already done in
amdkfd_fence_enable_signaling.
Signed-off-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 26 ++
1 file changed, 10 insertions(+), 16 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
b/drivers/gpu/drm/amd/amdkf
/* Disable SVM support capability */
+ pgmap->type = 0;
Ooff, thanks for catching that. For the KFD driver changes you can add
Fixes: c83dee9b6394 ("drm/amdkfd: add SPM support for SVM")
Reviewed-by: Felix Kuehling
return PTR_ERR(r);
ogistical changes required for existing usage
of vm->task_info.
Cc: Christian Koenig
Cc: Alex Deucher
Cc: Felix Kuehling
Signed-off-by: Shashank Sharma
---
drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 7 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 15 ++-
drivers/gpu/drm/amd/amdgpu/
On 2024-01-02 09:07, Hawking Zhang wrote:
Check and report boot status if discovery failed.
Signed-off-by: Hawking Zhang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 6 +-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
ology to surface
peer-to-peer links")'
Suggested-by: Lijo Lazar
Suggested-by: Felix Kuehling
Cc: Felix Kuehling
Cc: Christian König
Cc: Alex Deucher
Signed-off-by: Srinivasan Shanmugam
Reviewed-by: Felix Kuehling
---
v2:
Changed to "if (list_empty(>io_link_props)) retur
On 2023-12-28 18:11, Philip Yang wrote:
On 2023-12-21 15:40, Felix Kuehling wrote:
==
WARNING: possible circular locking dependency detected
6.5.0-kfd-fkuehlin #276 Not tainted
--
kworker
On 2023-12-29 04:43, Srinivasan Shanmugam wrote:
Fix the following about iterator use:
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c:1456 kfd_add_peer_prop()
warn: iterator used outside loop: 'iolink3'
Cc: Felix Kuehling
Cc: Christian König
Cc: Alex Deucher
Signed-off-by: Srinivasan
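One common way to address an "iterator used outside loop" warning is to record the match in a separate pointer that is only set inside the loop. The sketch below shows that pattern on a plain singly linked list. `struct link` and `find_link` are hypothetical, and this is not necessarily how the patch changes kfd_add_peer_prop(). With the kernel's list_for_each_entry(), the iterator does not become NULL when the loop runs to completion, which is why using it afterwards is flagged.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical link record; node ids are non-negative in this sketch. */
struct link {
	int node_to;
	struct link *next;
};

/* Returns the first link to 'node_to', or NULL if there is none. */
static struct link *find_link(struct link *head, int node_to)
{
	struct link *found = NULL;
	struct link *it;

	assert(node_to >= 0);
	for (it = head; it; it = it->next) {
		if (it->node_to == node_to) {
			found = it; /* set only on a real match */
			break;
		}
	}
	/* Only 'found' is used after the loop; 'it' is not touched again. */
	return found;
}
```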
as well?
I also see a bunch of unrelated indentation changes in this patch.
Regards,
Felix
Cc: Felix Kuehling
Cc: Christian König
Cc: Alex Deucher
Signed-off-by: Srinivasan Shanmugam
---
v3:
- updated u32, u16, u64 for missed variables in v2
drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 448
?
Cc: Felix Kuehling
Cc: Christian König
Cc: Alex Deucher
Signed-off-by: Srinivasan Shanmugam
Reviewed-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu
ange_schedule_evict_svm_bo instead of in the worker. That way it's
impossible for a BO to get freed while eviction work is pending and the
cancel_work_sync call in svm_range_bo_release can be eliminated.
Signed-off-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 13 -
1
/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c:1428 kfd_add_peer_prop()
warn: can 'iolink1' even be NULL?
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c:1433 kfd_add_peer_prop()
warn: can 'iolink2' even be NULL?
Cc: Felix Kuehling
Cc: Christian König
Cc: Alex Deucher
Signed-off-by: Srinivasan
()
warn: maybe use && instead of &
Please add a Fixes-tag:
Fixes: 0de4ec9a0353 ("drm/amdgpu: prepare map process for multi-process
debug devices")
Suggested-by: Lijo Lazar
Cc: Felix Kuehling
Cc: Christian König
Cc: Alex Deucher
Signed-off-by: Srinivasan Shanmugam
---
On 2023-12-11 10:56, Felix Kuehling wrote:
On 2023-12-08 05:11, Christian König wrote:
On 07.12.23 at 20:14, Felix Kuehling wrote:
On 2023-12-05 17:20, Felix Kuehling wrote:
Properly mark kfd_process->ef as __rcu and consistently access it with
rcu_dereference_protected.
Repor
On 2023-12-20 8:58, Christian König wrote:
On 19.12.23 at 23:43, Felix Kuehling wrote:
On 2023-12-19 3:10, Christian König wrote:
On 15.12.23 at 16:19, Felix Kuehling wrote:
On 2023-12-15 07:30, Christian König wrote:
@@ -1425,11 +1451,21 @@ int amdgpu_vm_handle_moved(struct
amdgpu_device
On 2023-12-19 3:10, Christian König wrote:
On 15.12.23 at 16:19, Felix Kuehling wrote:
On 2023-12-15 07:30, Christian König wrote:
@@ -1425,11 +1451,21 @@ int amdgpu_vm_handle_moved(struct
amdgpu_device *adev,
}
r = amdgpu_vm_bo_update(adev, bo_va, clear
Change the rules for amdgpu_sync_resv to let KFD synchronize with VM
fences on page table reservations. This fixes intermittent memory
corruption after evictions when using amdgpu_vm_handle_moved to update
page tables for VM mappings managed through render nodes.
Signed-off-by: Felix Kuehling
On 2023-12-15 07:30, Christian König wrote:
@@ -1425,11 +1451,21 @@ int amdgpu_vm_handle_moved(struct
amdgpu_device *adev,
}
r = amdgpu_vm_bo_update(adev, bo_va, clear);
- if (r)
- return r;
if (unlock)
dma_resv_unlock(resv);
+
s long as all
imports are from KFD, with the exports already reserved, validated and
fenced by the KFD restore worker.
Signed-off-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h| 10
.../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 39 --
drivers/gpu/drm/amd/
This is not strictly a change in the IOCTL API. This version bump is meant
to indicate to user mode the presence of a number of changes and fixes
that enable the management of VA mappings in compute VMs using the GEM_VA
ioctl for DMABufs exported from KFD.
Signed-off-by: Felix Kuehling
On 2023-12-14 16:40, Felix Kuehling wrote:
Fence slot reservation should be done by the caller and not here.
The caller doesn't necessarily have the BO list to create all those
fences. The whole point of doing this in the VM code was to use the
"BO lists" maintained by th
On 2023-12-13 09:30, Christian König wrote:
On 06.12.23 at 22:44, Felix Kuehling wrote:
DMABuf imports in compute VMs are not wrapped in a kgd_mem object on the
process_info->kfd_bo_list. There is no explicit KFD API call to validate
them or add eviction fences to them.
This pa
the DRM_BUDDY_CLEARED flag.
- Remove ! from amdgpu_res_cleared() check.
Signed-off-by: Arunpravin Paneer Selvam
Suggested-by: Christian König
Acked-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdgpu/amdgpu_object.c| 22 ---
.../gpu/drm/amd/amdgpu/amdgpu_res_cursor.h| 25
On 2023-12-14 10:06, Alex Deucher wrote:
On Thu, Dec 14, 2023 at 9:24 AM Liu, Shaoyun wrote:
[AMD Official Use Only - General]
The gmc flush tlb function is used on both baremetal and sriov. But the
function amdgpu_virt_kiq_reg_write_reg_wait is defined in amdgpu_virt.c with
name
be mapped to all GPUs
after this change. This side effect will be fixed with Thunk change to
set CWSR svm range with ACCESS_IN_PLACE attribute on the GPU that user
queue is created.
Signed-off-by: Philip Yang
With the commit description fixed, this patch is
Reviewed-by: Felix Kuehling
---
On 2023-12-07 13:02, Alex Deucher wrote:
Show buffers as shared if they are shared via dma-buf as well
(e.g., shared with v4l or some other subsystem).
You can add KFD to that list. With the in-progress CUDA11 VM changes and
improved interop between KFD and render nodes, sharing DMABufs
in those cases.
There are also some FIXMEs in this code that should be addressed at the
same time.
That said, as a short-term fix, this patch is
Acked-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git
On 2023-12-13 9:20, Christian König wrote:
On 12.12.23 at 00:32, Felix Kuehling wrote:
On 2023-12-11 04:50, Christian König wrote:
On 08.12.23 at 20:53, Alex Deucher wrote:
[SNIP]
You also need a functionality which resets all cleared blocks to
uncleared after suspend/resume.
No idea how
On 2023-12-11 05:38, Christian König wrote:
On 09.12.23 at 00:01, James Zhu wrote:
There is no need to schedule for each hmm_range_fault; use cond_resched
instead of schedule.
cond_resched() is usually NAKed upstream since it is a NO-OP in most
situations.
That's weird, because
On 2023-12-11 04:50, Christian König wrote:
On 08.12.23 at 20:53, Alex Deucher wrote:
[SNIP]
You also need a functionality which resets all cleared blocks to
uncleared after suspend/resume.
No idea how to do this, maybe Alex knows offhand.
Since the buffers are cleared on creation, is
On 2023-12-08 05:11, Christian König wrote:
On 07.12.23 at 20:14, Felix Kuehling wrote:
On 2023-12-05 17:20, Felix Kuehling wrote:
Properly mark kfd_process->ef as __rcu and consistently access it with
rcu_dereference_protected.
Reported-by: kernel test robot
Closes:
ht
On 2023-12-05 17:20, Felix Kuehling wrote:
Properly mark kfd_process->ef as __rcu and consistently access it with
rcu_dereference_protected.
Reported-by: kernel test robot
Closes:
https://lore.kernel.org/oe-kbuild-all/202312052245.yfpbsgnh-...@intel.com/
Signed-off-by: Felix Kuehl
ces for amdgpu_vm_fence_imports into
amdgpu_vm_validate, outside the vm->status_lock
* Added dummy version of amdgpu_amdkfd_bo_validate_and_fence for builds
without KFD
Signed-off-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h| 10 ++
.../gpu/drm/amd/
This is not strictly a change in the IOCTL API. This version bump is meant
to indicate to user mode the presence of a number of changes and fixes
that enable the management of VA mappings in compute VMs using the GEM_VA
ioctl for DMABufs exported from KFD.
Signed-off-by: Felix Kuehling
Executive Summary: We need to add CRIU support to DRM render nodes in
order to maintain CRIU support for ROCm applications once they start
relying on render nodes for more GPU memory management. In this email
I'm providing some background why we are doing this, and outlining some
of the
On 2023-12-04 03:40, Christian König wrote:
@@ -416,6 +423,28 @@ int amdgpu_vm_validate_pt_bos(struct
amdgpu_device *adev, struct amdgpu_vm *vm,
}
spin_lock(>status_lock);
}
+ while (ticket && !list_empty(>evicted_user)) {
+ bo_base =
Properly mark kfd_process->ef as __rcu and consistently access it with
rcu_dereference_protected.
Reported-by: kernel test robot
Closes:
https://lore.kernel.org/oe-kbuild-all/202312052245.yfpbsgnh-...@intel.com/
Signed-off-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkf
6.
Cheers,
Felix
Alex
Thanks,
Felix
On 2023-12-01 18:34, Felix Kuehling wrote:
This reverts commit 71a7974ac7019afeec105a54447ae1dc7216cbb3.
These helper functions are needed for KFD to export and import DMABufs
the right way without duplicating the tracking of DMABufs associated
with G
On 2023-12-01 18:34, Felix Kuehling wrote:
This reverts commit 71a7974ac7019afeec105a54447ae1dc7216cbb3.
These helper functions are needed for KFD to export and import DMABufs
the right way without duplicating the tracking of DMABufs associated with
GEM objects while ensuring that move notifier
VM. Revalidation after evictions is handled
in the VM code.
Signed-off-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h| 3 +
.../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 45 ---
drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c| 6 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_dma_bu
Use drm_gem_prime_fd_to_handle to import DMABufs for interop. This
ensures that a GEM handle is created on import and that obj->dma_buf
will be set and remain set as long as the object is imported into KFD.
Signed-off-by: Felix Kuehling
Reviewed-by: Ramesh Errabolu
Reviewed-by: Xiaogang.C
Create GEM handles for exporting DMABufs using GEM-Prime APIs. The GEM
handles are created in a drm_client_dev context to avoid exposing them
in user mode contexts through a DMABuf import.
Signed-off-by: Felix Kuehling
Reviewed-by: Ramesh Errabolu
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
This is not strictly a change in the IOCTL API. This version bump is meant
to indicate to user mode the presence of a number of changes and fixes
that enable the management of VA mappings in compute VMs using the GEM_VA
ioctl for DMABufs exported from KFD.
Signed-off-by: Felix Kuehling
Create a new VM state to track user BOs that are in the system domain.
In the next patch this will be used to conditionally re-validate them in
amdgpu_vm_handle_moved.
Signed-off-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 17 +
drivers/gpu/drm/amd/amdgpu