[PATCH] drm/amdgpu: Show retry fault message if process xnack on

2024-05-07 Thread Philip Yang
to amdgpu_vm_handle_fault and then to gmc interrupt handler to show vm fault message. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 5 +++-- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 2 +- drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 2 +- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 7

Re: [PATCH] drm/amdkfd: Remove arbitrary timeout for hmm_range_fault

2024-05-02 Thread Philip Yang
On 2024-05-02 08:42, James Zhu wrote: On 2024-05-01 18:56, Philip Yang wrote: On a system with khugepaged enabled and use cases with THP buffers, hmm_range_fault may take > 15 seconds to return -EB

Re: [PATCH] drm/amdkfd: Remove arbitrary timeout for hmm_range_fault

2024-05-02 Thread Philip Yang
On 2024-05-02 00:09, Chen, Xiaogang wrote: On 5/1/2024 5:56 PM, Philip Yang wrote:

[PATCH] drm/amdkfd: Remove arbitrary timeout for hmm_range_fault

2024-05-01 Thread Philip Yang
urn EBUSY, then userspace libdrm and Thunk will call ioctl again. Change EAGAIN to a debug message as this is not an error. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 5 - drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c | 12 +++- drivers/gpu/drm/

Re: [PATCH] drm/amd/amdkfd: Fix a resource leak in svm_range_validate_and_map()

2024-05-01 Thread Philip Yang
-by: Ramesh Errabolu Reviewed-by: Philip Yang --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c index 386875e6eb96..dcb1d5d3f860 100644

[PATCH v6 1/5] drm/amdgpu: Support contiguous VRAM allocation

2024-04-24 Thread Philip Yang
TTM_PL_FLAG_CONTIGUOUS flag, and ask VRAM buddy allocator to get contiguous VRAM. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 4 include/uapi/linux/kfd_ioctl.h | 1 + 2 files changed, 5 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu

[PATCH v6 5/5] drm/amdkfd: Bump kfd version for contiguous VRAM allocation

2024-04-24 Thread Philip Yang
Bump the kfd ioctl minor version to declare the contiguous VRAM allocation flag support. Signed-off-by: Philip Yang --- include/uapi/linux/kfd_ioctl.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h index

[PATCH v6 4/5] drm/amdkfd: Evict BO itself for contiguous allocation

2024-04-24 Thread Philip Yang
If the BO pages pinned for RDMA are not contiguous on VRAM, evict the BO to system memory first to free the VRAM space, then allocate contiguous VRAM space, and then move it from system memory back to VRAM. v6: user context should use interruptible call (Felix) Signed-off-by: Philip Yang

[PATCH v6 0/5] Best effort contiguous VRAM allocation

2024-04-24 Thread Philip Yang
GPU prefix to the macro name. v6: use shorter flag name, use interruptible wait ctx, drop patch 5/6 (Felix) Philip Yang (5): drm/amdgpu: Support contiguous VRAM allocation drm/amdgpu: Handle sg size limit for contiguous allocation drm/amdgpu: Evict BOs from same process for contiguous alloca

[PATCH v6 3/5] drm/amdgpu: Evict BOs from same process for contiguous allocation

2024-04-24 Thread Philip Yang
from the same process, this will evict the user queues first, and restore the queues later after contiguous VRAM allocation. Signed-off-by: Philip Yang Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git

[PATCH v6 2/5] drm/amdgpu: Handle sg size limit for contiguous allocation

2024-04-24 Thread Philip Yang
VRAM memory. To work around the sg table segment size limit, allocate multiple segments if the contiguous size is bigger than AMDGPU_MAX_SG_SEGMENT_SIZE. Signed-off-by: Philip Yang Reviewed-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 12 ++-- 1 file changed, 6

Re: [PATCH v5 1/6] drm/amdgpu: Support contiguous VRAM allocation

2024-04-24 Thread Philip Yang
On 2024-04-23 18:17, Felix Kuehling wrote: On 2024-04-23 11:28, Philip Yang wrote: RDMA device with limited scatter-gather ability requires contiguous VRAM buffer allocation for RDMA peer direct support

Re: [PATCH v5 4/6] drm/amdkfd: Evict BO itself for contiguous allocation

2024-04-24 Thread Philip Yang
On 2024-04-23 18:15, Felix Kuehling wrote: On 2024-04-23 11:28, Philip Yang wrote: If the BO pages pinned for RDMA are not contiguous on VRAM, evict the BO to system memory first to free the VRAM space, then allocate

[PATCH v5 6/6] drm/amdkfd: Bump kfd version for contiguous VRAM allocation

2024-04-23 Thread Philip Yang
Bump the kfd ioctl minor version to declare the contiguous VRAM allocation flag support. Signed-off-by: Philip Yang --- include/uapi/linux/kfd_ioctl.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h index

[PATCH v5 4/6] drm/amdkfd: Evict BO itself for contiguous allocation

2024-04-23 Thread Philip Yang
If the BO pages pinned for RDMA are not contiguous on VRAM, evict the BO to system memory first to free the VRAM space, then allocate contiguous VRAM space, and then move it from system memory back to VRAM. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 16

[PATCH v5 2/6] drm/amdgpu: Handle sg size limit for contiguous allocation

2024-04-23 Thread Philip Yang
memory. To work around the sg table segment size limit, allocate multiple segments if the contiguous size is bigger than MAX_SG_SEGMENT_SIZE. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git

[PATCH v5 3/6] drm/amdgpu: Evict BOs from same process for contiguous allocation

2024-04-23 Thread Philip Yang
from the same process, this will evict the user queues first, and restore the queues later after contiguous VRAM allocation. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu

[PATCH v5 5/6] drm/amdkfd: Increase KFD bo restore wait time

2024-04-23 Thread Philip Yang
time to 2 seconds, long enough for RDMA pin BO to alloc the contiguous VRAM. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h

[PATCH v5 1/6] drm/amdgpu: Support contiguous VRAM allocation

2024-04-23 Thread Philip Yang
TTM_PL_FLAG_CONTIGUOUS flag, and ask VRAM buddy allocator to get contiguous VRAM. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 4 include/uapi/linux/kfd_ioctl.h | 1 + 2 files changed, 5 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu

[PATCH v5 0/6] Best effort contiguous VRAM allocation

2024-04-23 Thread Philip Yang
GPU prefix to the macro name. Philip Yang (6): drm/amdgpu: Support contiguous VRAM allocation drm/amdgpu: Handle sg size limit for contiguous allocation drm/amdgpu: Evict BOs from same process for contiguous allocation drm/amdkfd: Evict BO itself for contiguous allocation drm/amdkfd: Incre

Re: [PATCH v4 6/7] drm/amdgpu: Skip dma map resource for null RDMA device

2024-04-23 Thread Philip Yang
On 2024-04-23 09:32, Christian König wrote: Am 23.04.24 um 15:04 schrieb Philip Yang: To test RDMA using dummy driver on the system without NIC/RDMA device, the get/put dma pages pass in null device pointer, skip

[PATCH v4 1/7] drm/amdgpu: Support contiguous VRAM allocation

2024-04-23 Thread Philip Yang
TTM_PL_FLAG_CONTIGUOUS flag, and ask VRAM buddy allocator to get contiguous VRAM. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 4 include/uapi/linux/kfd_ioctl.h | 1 + 2 files changed, 5 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu

[PATCH v4 3/7] drm/amdgpu: Evict BOs from same process for contiguous allocation

2024-04-23 Thread Philip Yang
from the same process, this will evict the user queues first, and restore the queues later after contiguous VRAM allocation. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu

[PATCH v4 7/7] drm/amdkfd: Bump kfd version for contiguous VRAM allocation

2024-04-23 Thread Philip Yang
Bump the kfd ioctl minor version to declare the contiguous VRAM allocation flag support. Signed-off-by: Philip Yang --- include/uapi/linux/kfd_ioctl.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h index

[PATCH v4 4/7] drm/amdkfd: Evict BO itself for contiguous allocation

2024-04-23 Thread Philip Yang
If the BO pages pinned for RDMA are not contiguous on VRAM, evict the BO to system memory first to free the VRAM space, then allocate contiguous VRAM space, and then move it from system memory back to VRAM. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 16

[PATCH v4 6/7] drm/amdgpu: Skip dma map resource for null RDMA device

2024-04-23 Thread Philip Yang
To test RDMA using dummy driver on the system without NIC/RDMA device, the get/put dma pages pass in null device pointer, skip the dma map/unmap resource and sg table to avoid null pointer access. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 33

[PATCH v4 5/7] drm/amdkfd: Increase KFD bo restore wait time

2024-04-23 Thread Philip Yang
time to 2 seconds, long enough for RDMA pin BO to alloc the contiguous VRAM. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h

[PATCH v4 2/7] drm/amdgpu: Handle sg size limit for contiguous allocation

2024-04-23 Thread Philip Yang
memory. To work around the sg table segment size limit, allocate multiple segments if the contiguous size is bigger than MAX_SG_SEGMENT_SIZE. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git

[PATCH v4 0/7] Best effort contiguous VRAM allocation

2024-04-23 Thread Philip Yang
w GEM flag v3: add patch 2 to handle sg segment size limit (Christian) v4: remove the buddy block size limit from vram mgr because sg table creation already remove the limit, and resource uses u64 to handle block start, size (Christian) Philip Yang (7): drm/amdgpu: Support contiguous VRAM

Re: [PATCH v3 6/7] drm/amdgpu: Skip dma map resource for null RDMA device

2024-04-22 Thread Philip Yang
On 2024-04-22 10:56, Christian König wrote: Am 22.04.24 um 15:57 schrieb Philip Yang: To test RDMA using dummy driver on the system without NIC/RDMA device, the get/put dma pages pass in null device pointer, skip

Re: [PATCH v3 2/7] drm/amdgpu: Handle sg size limit for contiguous allocation

2024-04-22 Thread Philip Yang
On 2024-04-22 10:40, Christian König wrote: Am 22.04.24 um 15:57 schrieb Philip Yang: Define macro MAX_SG_SEGMENT_SIZE 2GB, because struct scatterlist length is unsigned int, and some users of it cast to a signed int, so

[PATCH v3 6/7] drm/amdgpu: Skip dma map resource for null RDMA device

2024-04-22 Thread Philip Yang
To test RDMA using dummy driver on the system without NIC/RDMA device, the get/put dma pages pass in null device pointer, skip the dma map/unmap resource and sg table to avoid null pointer access. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 33

[PATCH v3 0/7] Best effort contiguous VRAM allocation

2024-04-22 Thread Philip Yang
w GEM flag v3: add patch 2 to handle sg segment size limit (Christian) Philip Yang (7): drm/amdgpu: Support contiguous VRAM allocation drm/amdgpu: Handle sg size limit for contiguous allocation drm/amdgpu: Evict BOs from same process for contiguous allocation drm/amdkfd: Evict BO itself for

[PATCH v3 5/7] drm/amdkfd: Increase KFD bo restore wait time

2024-04-22 Thread Philip Yang
time to 2 seconds, long enough for RDMA pin BO to alloc the contiguous VRAM. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h

[PATCH v3 7/7] drm/amdkfd: Bump kfd version for contiguous VRAM allocation

2024-04-22 Thread Philip Yang
Bump the kfd ioctl minor version to declare the contiguous VRAM allocation flag support. Signed-off-by: Philip Yang --- include/uapi/linux/kfd_ioctl.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h index

[PATCH v3 3/7] drm/amdgpu: Evict BOs from same process for contiguous allocation

2024-04-22 Thread Philip Yang
from the same process, this will evict the user queues first, and restore the queues later after contiguous VRAM allocation. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu

[PATCH v3 2/7] drm/amdgpu: Handle sg size limit for contiguous allocation

2024-04-22 Thread Philip Yang
memory. To work around the sg table segment size limit, allocate multiple segments if the contiguous size is bigger than MAX_SG_SEGMENT_SIZE. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 17 - 1 file changed, 12 insertions(+), 5 deletions(-) diff --git

[PATCH v3 4/7] drm/amdkfd: Evict BO itself for contiguous allocation

2024-04-22 Thread Philip Yang
If the BO pages pinned for RDMA are not contiguous on VRAM, evict the BO to system memory first to free the VRAM space, then allocate contiguous VRAM space, and then move it from system memory back to VRAM. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 16

[PATCH v3 1/7] drm/amdgpu: Support contiguous VRAM allocation

2024-04-22 Thread Philip Yang
TTM_PL_FLAG_CONTIGUOUS flag, and ask VRAM buddy allocator to get contiguous VRAM. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 4 include/uapi/linux/kfd_ioctl.h | 1 + 2 files changed, 5 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu

Re: [PATCH] drm/amdkfd: Fix rescheduling of restore worker

2024-04-19 Thread Philip Yang
-by: Felix Kuehling Reviewed-by: Philip Yang --- drivers/gpu/drm/amd/amdkfd/kfd_process.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c index aafdf064651f..58c1fe542

Re: [PATCH v2 1/6] drm/amdgpu: Support contiguous VRAM allocation

2024-04-18 Thread Philip Yang
On 2024-04-18 10:37, Christian König wrote: Am 18.04.24 um 15:57 schrieb Philip Yang: RDMA device with limited scatter-gather ability requires contiguous VRAM buffer allocation for RDMA peer direct support

Re: [PATCH] drm/amdkfd: Fix eviction fence handling

2024-04-18 Thread Philip Yang
("drm/amdkfd: Run restore_workers on freezable WQs") Signed-off-by: Felix Kuehling Reviewed-by: Philip Yang --- drivers/gpu/drm/amd/amdkfd/kfd_process.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_p

[PATCH v2 3/6] drm/amdkfd: Evict BO itself for contiguous allocation

2024-04-18 Thread Philip Yang
If the BO pages pinned for RDMA are not contiguous on VRAM, evict the BO to system memory first to free the VRAM space, then allocate contiguous VRAM space, and then move it from system memory back to VRAM. Signed-off-by: Philip Yang --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c| 17

[PATCH v2 1/6] drm/amdgpu: Support contiguous VRAM allocation

2024-04-18 Thread Philip Yang
TTM_PL_FLAG_CONTIGUOUS flag, and ask VRAM buddy allocator to get contiguous VRAM. Remove the 2GB max memory block size limit for contiguous allocation. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 4 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 9

[PATCH v2 5/6] drm/amdgpu: Skip dma map resource for null RDMA device

2024-04-18 Thread Philip Yang
To test RDMA using dummy driver on the system without NIC/RDMA device, the get/put dma pages pass in null device pointer, skip the dma map/unmap resource to avoid null pointer access. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 33 +++- 1 file

[PATCH v2 6/6] drm/amdkfd: Bump kfd version for contiguous VRAM allocation

2024-04-18 Thread Philip Yang
Bump the kfd ioctl minor version to declare the contiguous VRAM allocation flag support. Signed-off-by: Philip Yang --- include/uapi/linux/kfd_ioctl.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h index

[PATCH v2 4/6] drm/amdkfd: Increase KFD bo restore wait time

2024-04-18 Thread Philip Yang
time to 2 seconds, long enough for RDMA pin BO to alloc the contiguous VRAM. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h

[PATCH v2 2/6] drm/amdgpu: Evict BOs from same process for contiguous allocation

2024-04-18 Thread Philip Yang
from the same process, this will evict the user queues first, and restore the queues later after contiguous VRAM allocation. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu

[PATCH v2 0/6] Best effort contiguous VRAM allocation

2024-04-18 Thread Philip Yang
w GEM flag Philip Yang (6): drm/amdgpu: Support contiguous VRAM allocation drm/amdgpu: Evict BOs from same process for contiguous allocation drm/amdkfd: Evict BO itself for contiguous allocation drm/amdkfd: Increase KFD bo restore wait time drm/amdgpu: Skip dma map resource for null RDMA de

Re: [PATCH v2] drm/amdgpu: Modify the contiguous flags behaviour

2024-04-17 Thread Philip Yang
On 2024-04-17 10:32, Paneer Selvam, Arunpravin wrote: Hi Christian, On 4/17/2024 6:57 PM, Paneer Selvam, Arunpravin wrote: Hi Christian, On 4/17/2024 12:19 PM, Christian König wrote:

Re: [PATCH 1/6] drm/amdgpu: Support contiguous VRAM allocation

2024-04-16 Thread Philip Yang
On 2024-04-15 08:02, Christian König wrote: Am 12.04.24 um 22:12 schrieb Philip Yang: RDMA device with limited scatter-gather capability requires physical address contiguous VRAM buffer for RDMA peer direct access

Re: [PATCH] drm/amdgpu: Modify the contiguous flags behaviour

2024-04-16 Thread Philip Yang
On 2024-04-16 02:50, Paneer Selvam, Arunpravin wrote: On 4/16/2024 3:32 AM, Philip Yang wrote: On 2024-04-14 10:57, Arunpravin Paneer Selvam wrote: Now we have two flags for contiguous

Re: [PATCH] drm/amdgpu: Modify the contiguous flags behaviour

2024-04-15 Thread Philip Yang
On 2024-04-14 10:57, Arunpravin Paneer Selvam wrote: Now we have two flags for contiguous VRAM buffer allocation. If the application request for AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS, it would set the ttm place TTM_PL_FLAG_CONTIGUOUS flag in the buffer's

[PATCH 5/6] drm/amdgpu: Skip dma map resource for null RDMA device

2024-04-12 Thread Philip Yang
To test RDMA using dummy driver on the system without NIC/RDMA device, the get dma pages pass in null device pointer, skip the dma map resource to avoid null device pointer access. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 33 +++- 1 file

[PATCH 2/6] drm/amdgpu: Evict BOs from same process for contiguous allocation

2024-04-12 Thread Philip Yang
KFD BOs from the same process, this will evict the user queues first, and restore the queues later after contiguous VRAM allocation. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd

[PATCH 6/6] drm/amdkfd: Bump kfd version for contiguous VRAM allocation

2024-04-12 Thread Philip Yang
Bump the kfd ioctl minor version to declare the contiguous VRAM allocation flag support. Signed-off-by: Philip Yang --- include/uapi/linux/kfd_ioctl.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h index

[PATCH 4/6] drm/amdkfd: Increase KFD bo restore wait time

2024-04-12 Thread Philip Yang
wait time to 2 seconds, long enough for RDMA pin BO to finish the contiguous VRAM allocation. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd

[PATCH 3/6] drm/amdkfd: Evict BO itself for contiguous allocation

2024-04-12 Thread Philip Yang
If the BO pages pinned for RDMA are not contiguous on VRAM, evict the BO to system memory first to free the VRAM space, then allocate contiguous VRAM and then move it from system memory back to VRAM. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 15

[PATCH 1/6] drm/amdgpu: Support contiguous VRAM allocation

2024-04-12 Thread Philip Yang
AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS flag, and then vram_mgr will set TTM_PL_FLAG_CONTIGUOUS flag to ask VRAM buddy allocator to get contiguous VRAM. Remove the 2GB max memory block size limit for contiguous allocation. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 7 +++ drivers/gpu/drm

[PATCH 0/6] Best effort contiguous VRAM allocation

2024-04-12 Thread Philip Yang
This patch series implements a new KFD memory alloc flag for best-effort contiguous VRAM allocation, to support peer direct access for RDMA devices with limited scatter-gather DMA capability. Philip Yang (6): drm/amdgpu: Support contiguous VRAM allocation drm/amdgpu: Evict BOs from same process

[PATCH] drm/amdgpu: Fix tlb_cb memory leaking

2024-04-08 Thread Philip Yang
] kfd_process_wq_release+0x273/0x3c0 [amdgpu] process_scheduled_works+0x2a7/0x500 worker_thread+0x186/0x340 Fixes: 220ecde84bc8 ("drm/amdgpu: implement TLB flush fence") Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 7 ++- 1 file changed, 2

Re: [PATCH 2/3] amd/amdgpu: wait no process running in kfd before resuming device

2024-03-26 Thread Philip Yang
On 2024-03-26 11:01, Felix Kuehling wrote: On 2024-03-26 10:53, Philip Yang wrote: On 2024-03-25 14:45, Felix Kuehling wrote: On 2024-03-22 15:57, Zhigang Luo wrote: it will cause

Re: [PATCH 2/3] amd/amdgpu: wait no process running in kfd before resuming device

2024-03-26 Thread Philip Yang
On 2024-03-25 14:45, Felix Kuehling wrote: On 2024-03-22 15:57, Zhigang Luo wrote: it will cause page fault after device recovered if there is a process running. Signed-off-by: Zhigang Luo

Re: [PATCH] drm/amdkfd: return negative error code in svm_ioctl()

2024-03-25 Thread Philip Yang
drm-next. Reviewed-by: Philip Yang --- PS: When I try to compile this file, there is an error: drivers/gpu/drm/amd/amdkfd/kfd_migrate.c:28:10: fatal error: amdgpu_sync.h: No such file or directory. Maybe there are some steps I missed, or this place needs to be corrected?

[PATCH] drm/amdgpu: amdgpu_ttm_gart_bind set gtt bound flag

2024-03-11 Thread Philip Yang
Otherwise amdgpu_ttm_backend_unbind will not clear the gart page table and leave valid mapping entry to the stale system page. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b

Re: [PATCH v5 1/2] drm/amdgpu: implement TLB flush fence

2024-03-07 Thread Philip Yang
On 2024-03-06 09:41, Shashank Sharma wrote: From: Christian König The problem is that when (for example) 4k pages are replaced with a single 2M page we need to wait for change to be flushed out by invalidating the TLB before the PT can be freed. Solve

Re: [PATCH v4 2/2] drm/amdgpu: sync page table freeing with tlb flush

2024-03-01 Thread Philip Yang
On 2024-03-01 06:07, Shashank Sharma wrote: The idea behind this patch is to delay the freeing of PT entry objects until the TLB flush is done. This patch: - Adds a tlb_flush_waitlist which will keep the objects that need to be freed after tlb_flush -

Re: [PATCH v4 1/2] drm/amdgpu: implement TLB flush fence

2024-03-01 Thread Philip Yang
On 2024-03-01 06:07, Shashank Sharma wrote: From: Christian König The problem is that when (for example) 4k pages are replaced with a single 2M page we need to wait for change to be flushed out by invalidating the TLB before the PT can be freed. Solve

Re: [PATCH v3 3/3] drm/amdgpu: sync page table freeing with tlb flush

2024-02-26 Thread Philip Yang
On 2024-02-23 08:42, Shashank Sharma wrote: This patch: - adds a new list in amdgpu_vm to hold the VM PT entries being freed - waits for the TLB flush using the vm->tlb_flush_fence - actually frees the PT BOs V2: rebase V3: Do not attach the tlb_fence to

Re: [PATCH v3 2/3] drm/amdgpu: implement TLB flush fence

2024-02-26 Thread Philip Yang
On 2024-02-23 11:58, Philip Yang wrote: On 2024-02-23 08:42, Shashank Sharma wrote: From: Christian König The problem is that when (for example) 4k pages are replaced with a single 2M page we need to wait

Re: [PATCH v3 2/3] drm/amdgpu: implement TLB flush fence

2024-02-23 Thread Philip Yang
On 2024-02-23 08:42, Shashank Sharma wrote: From: Christian König The problem is that when (for example) 4k pages are replaced with a single 2M page we need to wait for change to be flushed out by invalidating the TLB before the PT can be freed. Solve

Re: [PATCH] drm/amdgpu: break COW for user ptr during fork()

2024-02-22 Thread Philip Yang
On 2024-02-21 21:01, Lang Yu wrote: This is useful to prevent copy-on-write semantics from changing the physical location of a page if the parent writes to it after a fork(). Signed-off-by: Lang Yu --- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 1 +

Re: [PATCH 1/2] drm/amdkfd: Document and define SVM event tracing macro

2024-02-20 Thread Philip Yang
On 2024-02-16 15:16, Felix Kuehling wrote: On 2024-02-15 10:18, Philip Yang wrote: Document how to use SMI system management interface to receive SVM events. Define SVM events message

Re: [PATCH 1/2] drm/amdkfd: Document and define SVM event tracing macro

2024-02-15 Thread Philip Yang
On 2024-02-15 12:54, Chen, Xiaogang wrote: On 2/15/2024 9:18 AM, Philip Yang wrote:

[PATCH 2/2] drm/amdkfd: Output migrate end event if migration failed

2024-02-15 Thread Philip Yang
To track the migrate end-event in case of a migration failure, always output migrate end event, with the failure result added to the existing migrate end event string. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdkfd/kfd_migrate.c| 16 drivers/gpu/drm/amd/amdkfd

[PATCH 1/2] drm/amdkfd: Document and define SVM event tracing macro

2024-02-15 Thread Philip Yang
-by: Philip Yang --- drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c | 51 +++--- include/uapi/linux/kfd_ioctl.h | 77 - 2 files changed, 102 insertions(+), 26 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c b/drivers/gpu/drm/amd/amdkfd

[PATCH 2/2] drm/amdgpu: Improve huge page mapping update

2024-02-01 Thread Philip Yang
loc and free. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c index a3d609655ce3..ef3ef03e50ab 100

[PATCH 1/2] drm/amdgpu: Unmap only clear the page table leaves

2024-02-01 Thread Philip Yang
ree the PTB bo. With this change, the vm->pt_freed list and work is not needed. Add WARN_ON(unlocked) in amdgpu_vm_pt_free_dfs to catch if unmap to free the PTB. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c| 4 --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h| 4 --- d

Re: [PATCH] drm/amdgpu: Support >=4GB GTT memory mapping

2024-01-29 Thread Philip Yang
On 2024-01-29 11:30, Christian König wrote: Am 29.01.24 um 17:25 schrieb Philip Yang: On 2024-01-29 05:06, Christian König wrote: Am 26.01.24 um 20:47 schrieb Philip Yang: This is to work

Re: [PATCH] drm/amdgpu: Support >=4GB GTT memory mapping

2024-01-29 Thread Philip Yang
On 2024-01-29 05:06, Christian König wrote: Am 26.01.24 um 20:47 schrieb Philip Yang: This is to work around a bug in function drm_prime_pages_to_sg if length of nr_pages >= 4GB, by doing the same check for max_segm

[PATCH] drm/amdgpu: Support >=4GB GTT memory mapping

2024-01-26 Thread Philip Yang
GB GTT memory mapping for mGPUs with IOMMU isolation mode. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 50 ++--- 1 file changed, 34 insertions(+), 16 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c b/drivers/gpu/drm/amd/

Re: [PATCH] drm/amdgpu: Limit the maximum fragment to granularity size

2024-01-26 Thread Philip Yang
On 2024-01-26 10:35, Christian König wrote: Am 26.01.24 um 16:17 schrieb Philip Yang: On 2024-01-26 09:59, Christian König wrote: Am 26.01.24 um 15:38 schrieb Philip Yang: svm range support

Re: [PATCH] drm/amdgpu: Limit the maximum fragment to granularity size

2024-01-26 Thread Philip Yang
On 2024-01-26 09:59, Christian König wrote: Am 26.01.24 um 15:38 schrieb Philip Yang: svm range support partial migration and mapping update, for size 4MB virtual address 4MB alignment and physical address continuous

[PATCH] drm/amdgpu: Limit the maximum fragment to granularity size

2024-01-26 Thread Philip Yang
to the stale mapping. Limit the maximum fragment size to granularity size, 2MB by default, with the mapping and unmapping based on granularity size, to solve this issue. The change is only for SVM map/unmap range, no change for gfx and legacy API path. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd

[PATCH v4 7/7] drm/amdkfd: Wait update sdma fence before tlb flush

2024-01-15 Thread Philip Yang
ble once, have to wait sdma update fence and then flush tlb. No change if using CPU update GPU page table for large bar because no vm update fence. Remove wait parameter in svm_range_validate_and_map because it is always called with true now. Signed-off-by: Philip Yang --- drivers/gpu/drm/

[PATCH v4 2/7] drm/amdkfd: Add helper function align range start last

2024-01-15 Thread Philip Yang
Calculate range start, last address aligned to the range granularity size. This removes the duplicate code, and the helper function will be used in the future patch to handle map, unmap to GPU based on range granularity. No functional change. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd

[PATCH v4 5/7] drm/amdkfd: Change range granularity update bitmap_map

2024-01-15 Thread Philip Yang
When changing the svm range granularity, update the svm range bitmap_map based on new range granularity. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 49 +++- 1 file changed, 48 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd

[PATCH v4 4/7] amd/amdkfd: Unmap range from GPU based on granularity

2024-01-15 Thread Philip Yang
, split range may get incorrect bitmap_map for the remaining ranges. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 42 +++- 1 file changed, 29 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd

[PATCH v4 6/7] drm/amdkfd: Check bitmap_map flag to skip retry fault

2024-01-15 Thread Philip Yang
Remove prange validate_timestamp which is not accurate for multiple GPUs. Use the bitmap_map flag to skip the retry fault from different pages of the same granularity range if the granularity range is already mapped on the specific GPU. Signed-off-by: Philip Yang Reviewed-by: Felix Kuehling

[PATCH v4 3/7] drm/amdkfd: Add granularity size based bitmap map flag

2024-01-15 Thread Philip Yang
retry fault recover. svm_range_partial_mapped is false only if no part of the range is mapped on any GPU. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 258 ++- drivers/gpu/drm/amd/amdkfd/kfd_svm.h | 7 +- 2 files changed, 219 insertions(

[PATCH v4 1/7] drm/amdkfd: Add helper function svm_range_need_access_gpus

2024-01-15 Thread Philip Yang
Add a helper function to get the bitmap of all GPUs that need access to the svm range. This helper will be used in the following patch to check if prange is mapped to all GPUs. Refactor svm_range_validate_and_map to use the helper function, no functional change. Signed-off-by: Philip Yang Reviewed

[PATCH v4] drm/amdkfd: Set correct svm range actual loc after spliting

2024-01-15 Thread Philip Yang
s 0 if old->vram_pages - new->vram_pages == 0. new range takes svm_bo ref only if vram_pages not equal to 0. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 8 + drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 42 ++-- drivers/gpu/drm/a

[PATCH] drm/amdkfd: Correct partial migration virtual addr

2024-01-15 Thread Philip Yang
Partial migration to system memory should use migrate.addr, not prange->start as virtual address to allocate system memory page. Fixes: 18eb61bd5a6a ("drm/amdkfd: Use partial migrations/mapping for GPU/CPU page faults in SVM") Signed-off-by: Philip Yang --- drivers/gpu/d

Re: [PATCH v3] amd/amdkfd: Set correct svm range actual loc after spliting

2024-01-11 Thread Philip Yang
On 2024-01-11 12:37, Chen, Xiaogang wrote: On 1/11/2024 10:54 AM, Felix Kuehling wrote: On 2024-01-10 17:01, Philip Yang wrote: While svm range partial migrating to system memory, clear

Re: [PATCH v3] amd/amdkfd: Set correct svm range actual loc after spliting

2024-01-11 Thread Philip Yang
On 2024-01-11 11:54, Felix Kuehling wrote: On 2024-01-10 17:01, Philip Yang wrote: While svm range partial migrating to system memory, clear dma_addr vram domain flag, otherwise the future split will get incorrect

[PATCH v3] amd/amdkfd: Set correct svm range actual loc after spliting

2024-01-10 Thread Philip Yang
s 0 if old->vram_pages - new->vram_pages == 0. new range takes svm_bo ref only if vram_pages not equal to 0. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 20 +++- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 24 ++-- 2 files ch

Re: [PATCH v2] amd/amdkfd: Set correct svm range actual loc after spliting

2024-01-10 Thread Philip Yang
On 2024-01-10 11:30, Felix Kuehling wrote: On 2024-01-09 15:05, Philip Yang wrote: After svm range partial migrating to system memory, unmap to cleanup the corresponding dma_addr vram domain flag, otherwise

Re: [PATCH v2] amd/amdkfd: Set correct svm range actual loc after spliting

2024-01-10 Thread Philip Yang
On 2024-01-09 17:29, Chen, Xiaogang wrote: On 1/9/2024 2:05 PM, Philip Yang wrote: After svm range partial migrating to system memory, unmap to cleanup the corresponding dma_addr vram domain flag, otherwise

[PATCH v2] amd/amdkfd: Set correct svm range actual loc after spliting

2024-01-09 Thread Philip Yang
ges is 0. old range actual_loc is 0 if old->vram_pages - new->vram_pages == 0. new range takes svm_bo ref only if vram_pages not equal to 0. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 3 ++ drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 35 +++-

Re: [PATCH] amd/amdkfd: Set correct svm range actual loc after spliting

2024-01-09 Thread Philip Yang
On 2024-01-08 18:17, Chen, Xiaogang wrote: With a nitpick below, this patch is Reviewed-by: Xiaogang Chen On 1/8/2024 4:36 PM, Philip Yang wrote: After range splitting, set new range and old range
