Re: [PATCH v6 02/13] mm: remove extra ZONE_DEVICE struct page refcount

2021-08-17 Thread Felix Kuehling
Am 2021-08-17 um 8:01 p.m. schrieb Ralph Campbell: > On 8/12/21 11:31 PM, Alex Sierra wrote: >> From: Ralph Campbell >> >> ZONE_DEVICE struct pages have an extra reference count that >> complicates the >> code for put_page() and several places in the kernel that need to >> check the >>

Re: [PATCH v1 03/14] mm: add iomem vma selection for memory migration

2021-08-26 Thread Felix Kuehling
Am 2021-08-25 um 2:24 p.m. schrieb Sierra Guiza, Alejandro (Alex): > > On 8/25/2021 2:46 AM, Christoph Hellwig wrote: >> On Tue, Aug 24, 2021 at 10:48:17PM -0500, Alex Sierra wrote: >>>   } else { >>> -    if (!(migrate->flags & MIGRATE_VMA_SELECT_SYSTEM)) >>> +    if

Re: [PATCH v1 03/14] mm: add iomem vma selection for memory migration

2021-09-01 Thread Felix Kuehling
Am 2021-09-01 um 4:29 a.m. schrieb Christoph Hellwig: > On Mon, Aug 30, 2021 at 01:04:43PM -0400, Felix Kuehling wrote: >>>> driver code is not really involved in updating the CPU mappings. Maybe >>>> it's something we need to do in the migration helpers. >

Re: [PATCH v1 03/14] mm: add iomem vma selection for memory migration

2021-08-25 Thread Felix Kuehling
Am 2021-08-24 um 11:48 p.m. schrieb Alex Sierra: > In this case, this is used to migrate pages from device memory, back to > system memory. This particular device memory type should be accessible > by the CPU, through IOMEM access. Typically, zone device public type > memory falls into this

Re: [PATCH v1 06/14] drm/amdkfd: add SPM support for SVM

2021-08-25 Thread Felix Kuehling
DEVICE_PRIVATE or DEVICE_PUBLIC to create the device > page map region. > > Signed-off-by: Alex Sierra > Reviewed-by: Felix Kuehling > --- > v7: > Remove lookup_resource call, so export symbol for this function > is not longer required. Patch dropped "kernel: re

Re: [PATCH v1 05/14] drm/amdkfd: ref count init for device pages

2021-08-25 Thread Felix Kuehling
Am 2021-08-24 um 11:48 p.m. schrieb Alex Sierra: > Ref counter from device pages is init to zero during memmap init zone. > The first time a new device page is allocated to migrate data into it, > its ref counter needs to be initialized to one. > > Signed-off-by: Alex Sierra > --- >

Re: [PATCH v6 04/13] drm/amdkfd: add SPM support for SVM

2021-08-16 Thread Felix Kuehling
Am 2021-08-15 um 5:10 a.m. schrieb Christoph Hellwig: >> @@ -880,17 +881,22 @@ int svm_migrate_init(struct amdgpu_device *adev) >> * should remove reserved size >> */ >> size = ALIGN(adev->gmc.real_vram_size, 2ULL << 20); >> -res = devm_request_free_mem_region(adev->dev,

Re: [PATCH v6 02/13] mm: remove extra ZONE_DEVICE struct page refcount

2021-08-16 Thread Felix Kuehling
Am 2021-08-15 um 4:40 p.m. schrieb John Hubbard: > On 8/15/21 8:37 AM, Christoph Hellwig wrote: >>> diff --git a/include/linux/mm.h b/include/linux/mm.h >>> index 8ae31622deef..d48a1f0889d1 100644 >>> --- a/include/linux/mm.h >>> +++ b/include/linux/mm.h >>> @@ -1218,7 +1218,7 @@ __maybe_unused

Re: [PATCH v6 08/13] mm: call pgmap->ops->page_free for DEVICE_GENERIC pages

2021-08-16 Thread Felix Kuehling
Am 2021-08-15 um 11:40 a.m. schrieb Christoph Hellwig: > On Fri, Aug 13, 2021 at 01:31:45AM -0500, Alex Sierra wrote: >> Add MEMORY_DEVICE_GENERIC case to free_zone_device_page callback. >> Device generic type memory case is now able to free its pages properly. > How is this going to work for

Re: [PATCH v6 02/13] mm: remove extra ZONE_DEVICE struct page refcount

2021-08-19 Thread Felix Kuehling
Am 2021-08-19 um 2:00 p.m. schrieb Sierra Guiza, Alejandro (Alex): > > On 8/18/2021 2:28 PM, Ralph Campbell wrote: >> On 8/17/21 5:35 PM, Felix Kuehling wrote: >>> Am 2021-08-17 um 8:01 p.m. schrieb Ralph Campbell: >>>> On 8/12/21 11:31 PM, Alex Sierra

Re: [PATCH v6 05/13] drm/amdkfd: generic type as sys mem on migration to ram

2021-08-16 Thread Felix Kuehling
Am 2021-08-16 um 6:06 p.m. schrieb Zeng, Oak: > Regards, > Oak > > > > On 2021-08-16, 3:53 PM, "amd-gfx on behalf of Sierra Guiza, Alejandro > (Alex)" alex.sie...@amd.com> wrote: > > > On 8/15/2021 10:38 AM, Christoph Hellwig wrote: > > On Fri, Aug 13, 2021 at 01:31:42AM -0500, Alex

Re: [PATCH v1 03/14] mm: add iomem vma selection for memory migration

2021-09-01 Thread Felix Kuehling
On 2021-09-01 6:03 p.m., Dave Chinner wrote: On Wed, Sep 01, 2021 at 11:40:43AM -0400, Felix Kuehling wrote: Am 2021-09-01 um 4:29 a.m. schrieb Christoph Hellwig: On Mon, Aug 30, 2021 at 01:04:43PM -0400, Felix Kuehling wrote: driver code is not really involved in updating the CPU mappings

Re: [PATCH v1 03/14] mm: add iomem vma selection for memory migration

2021-09-08 Thread Felix Kuehling
Am 2021-09-02 um 4:18 a.m. schrieb Christoph Hellwig: > On Wed, Sep 01, 2021 at 11:40:43AM -0400, Felix Kuehling wrote: >>>>> It looks like I'm totally misunderstanding what you are adding here >>>>> then. Why do we need any special treatment at all for memory t

Re: [PATCH] drm/amd/amdkfd: fix possible memory leak in svm_range_restore_pages

2021-09-08 Thread Felix Kuehling
Hi Xiyu Yang, This bug was already fixed by this commit: https://gitlab.freedesktop.org/agd5f/linux/-/commit/598a118db0d85a432f8cd541a6a5d31e31c56b6b Regards,  Felix Am 2021-09-09 um 12:27 a.m. schrieb Xiyu Yang: > The memory leak issue may take place in an error handling path. When >

Re: [PATCH v1 03/14] mm: add iomem vma selection for memory migration

2021-09-08 Thread Felix Kuehling
Am 2021-09-01 um 9:14 p.m. schrieb Dave Chinner: > On Wed, Sep 01, 2021 at 07:07:34PM -0400, Felix Kuehling wrote: >> On 2021-09-01 6:03 p.m., Dave Chinner wrote: >>> On Wed, Sep 01, 2021 at 11:40:43AM -0400, Felix Kuehling wrote: >>>> Am 2021-09-01 um 4:29

[PATCH 1/1] drm/amdkfd: Add sysfs bitfields and enums to uAPI

2021-09-10 Thread Felix Kuehling
These bits are de-facto part of the uAPI, so declare them in a uAPI header. Signed-off-by: Felix Kuehling --- MAINTAINERS | 1 + drivers/gpu/drm/amd/amdkfd/kfd_topology.h | 46 + include/uapi/linux/kfd_sysfs.h| 108 ++ 3

Re: [PATCH v3 1/1] drm/ttm: Fix COW check

2021-07-13 Thread Felix Kuehling
Am 2021-07-13 um 2:57 a.m. schrieb Christian König: > > > Am 13.07.21 um 00:06 schrieb Felix Kuehling: >> KFD Thunk maps invisible VRAM BOs with PROT_NONE, MAP_PRIVATE. >> is_cow_mapping returns true for these mappings. Add a check for >> vm_flags & VM_WRITE to avoi

Re: [PATCH v3 1/1] drm/ttm: Fix COW check

2021-07-14 Thread Felix Kuehling
Am 2021-07-14 um 6:51 a.m. schrieb Christian König: > Am 14.07.21 um 12:44 schrieb Daniel Vetter: >> On Mon, Jul 12, 2021 at 06:06:36PM -0400, Felix Kuehling wrote: >>> KFD Thunk maps invisible VRAM BOs with PROT_NONE, MAP_PRIVATE. >>> is_cow_mapping returns true for

Re: [PATCH v3 0/8] Support DEVICE_GENERIC memory in migrate_vma_*

2021-07-30 Thread Felix Kuehling
Am 2021-07-23 um 6:46 p.m. schrieb Sierra Guiza, Alejandro (Alex): > > On 7/17/2021 2:54 PM, Sierra Guiza, Alejandro (Alex) wrote: >> >> On 7/16/2021 5:14 PM, Felix Kuehling wrote: >>> Am 2021-07-16 um 11:07 a.m. schrieb Theodore Y. Ts'o: >>>> On Wed,

Re: [PATCH v4 10/13] lib: test_hmm add module param for zone device type

2021-07-30 Thread Felix Kuehling
Am 2021-07-28 um 7:45 p.m. schrieb Sierra Guiza, Alejandro (Alex): > > On 7/22/2021 12:26 PM, Jason Gunthorpe wrote: >> On Thu, Jul 22, 2021 at 11:59:17AM -0500, Sierra Guiza, Alejandro >> (Alex) wrote: >>> On 7/22/2021 7:23 AM, Jason Gunthorpe wrote: On Sat, Jul 17, 2021 at 02:21:32PM -0500,

Re: [PATCH] Whitelist AMD host bridge device(s) to enable P2P DMA

2021-08-11 Thread Felix Kuehling
Am 2021-08-11 um 3:29 p.m. schrieb Alex Deucher: > On Wed, Aug 11, 2021 at 3:11 PM Ramesh Errabolu > wrote: >> Current implementation will disallow P2P DMA if the participating >> devices belong to different root complexes. Implementation allows >> this default behavior to be overridden for

BoF at LPC: Documenting the Heterogeneous Memory Model Architecture

2021-09-21 Thread Felix Kuehling
As the programming models for GPU-based high-performance computing applications are evolving, HMM is helping us integrate the GPU memory management more closely with the kernel's virtual memory management. As a result we can provide a shared virtual address space with demand-paging and

Re: BoF at LPC: Documenting the Heterogeneous Memory Model Architecture

2021-09-23 Thread Felix Kuehling
00 Pacific, 11:40-13:00 Eastern, 15:40-17:00 UTC. I hope to see you all tomorrow,   Felix On 2021-09-21 3:19 p.m., Felix Kuehling wrote: As the programming models for GPU-based high-performance computing applications are evolving, HMM is helping us integrate the GPU memory management mo

Re: [PATCH v1 00/12] MEMORY_DEVICE_COHERENT for CPU-accessible coherent device memory

2021-10-12 Thread Felix Kuehling
Am 2021-10-12 um 3:03 p.m. schrieb Andrew Morton: > On Tue, 12 Oct 2021 15:56:29 -0300 Jason Gunthorpe wrote: > >>> To what other uses will this infrastructure be put? >>> >>> Because I must ask: if this feature is for one single computer which >>> presumably has a custom kernel, why add it to

Re: [PATCH v1 00/12] MEMORY_DEVICE_COHERENT for CPU-accessible coherent device memory

2021-10-12 Thread Felix Kuehling
Am 2021-10-12 um 2:39 p.m. schrieb Andrew Morton: > On Tue, 12 Oct 2021 12:12:35 -0500 Alex Sierra wrote: > >> This patch series introduces MEMORY_DEVICE_COHERENT, a type of memory >> owned by a device that can be mapped into CPU page tables like >> MEMORY_DEVICE_GENERIC and can also be migrated

Re: [PATCH 12/28] drm/amdgpu: use new iterator in amdgpu_ttm_bo_eviction_valuable

2021-10-19 Thread Felix Kuehling
Am 2021-10-19 um 7:36 a.m. schrieb Christian König: > Am 13.10.21 um 16:07 schrieb Daniel Vetter: >> On Tue, Oct 05, 2021 at 01:37:26PM +0200, Christian König wrote: >>> Simplifying the code a bit. >>> >>> Signed-off-by: Christian König >>> --- >>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 14

Re: [PATCH v1 00/12] MEMORY_DEVICE_COHERENT for CPU-accessible coherent device memory

2021-10-12 Thread Felix Kuehling
Am 2021-10-12 um 3:11 p.m. schrieb Matthew Wilcox: > On Tue, Oct 12, 2021 at 11:39:57AM -0700, Andrew Morton wrote: >> Because I must ask: if this feature is for one single computer which >> presumably has a custom kernel, why add it to mainline Linux? > I think in particular patch 2 deserves to

Re: [PATCH v1 03/14] mm: add iomem vma selection for memory migration

2021-08-30 Thread Felix Kuehling
Am 2021-08-30 um 4:28 a.m. schrieb Christoph Hellwig: > On Thu, Aug 26, 2021 at 06:27:31PM -0400, Felix Kuehling wrote: >> I think we're missing something here. As far as I can tell, all the work >> we did first with DEVICE_GENERIC and now DEVICE_PUBLIC always used >> normal

Re: [PATCH] drm/ttm: Don't inherit GEM object VMAs in child process

2022-01-04 Thread Felix Kuehling
[+Adrian] Am 2021-12-23 um 2:05 a.m. schrieb Christian König: > Am 22.12.21 um 21:53 schrieb Daniel Vetter: >> On Mon, Dec 20, 2021 at 01:12:51PM -0500, Bhardwaj, Rajneesh wrote: >> >> [SNIP] >> Still sounds funky. I think minimally we should have an ack from CRIU >> developers that this is

Re: [PATCH] drm/ttm: Don't inherit GEM object VMAs in child process

2022-01-05 Thread Felix Kuehling
Am 2022-01-05 um 3:08 a.m. schrieb Christian König: > Am 04.01.22 um 19:08 schrieb Felix Kuehling: >> [+Adrian] >> >> Am 2021-12-23 um 2:05 a.m. schrieb Christian König: >> >>> Am 22.12.21 um 21:53 schrieb Daniel Vetter: >>>> On Mon, Dec 20, 20

Re: [PATCH] drm/amdkfd: Check for null pointer after calling kmemdup

2022-01-05 Thread Felix Kuehling
rm/amdkfd: Add topology support for dGPUs") > Signed-off-by: Jiasheng Jiang Reviewed-by: Felix Kuehling > --- > drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c > b/drivers/g

Re: [PATCH] drm/ttm: Don't inherit GEM object VMAs in child process

2022-01-05 Thread Felix Kuehling
Am 2022-01-05 um 11:16 a.m. schrieb Felix Kuehling: >> I was already wondering which mmaps through the KFD node we have left >> which cause problems here. > We still use the KFD FD for mapping doorbells and HDP flushing. These > are both SG BOs, so they cannot be CPU-mapped th

Re: [PATCH] drm/amd/amdgpu: fix potential memleak

2021-11-15 Thread Felix Kuehling
Am 2021-11-14 um 9:58 p.m. schrieb Bernard Zhao: > In function amdgpu_get_xgmi_hive, when kobject_init_and_add failed > There is a potential memleak if not call kobject_put. > > Signed-off-by: Bernard Zhao Reviewed-by: Felix Kuehling > --- > drivers/gpu/drm/amd/amdgpu

Re: [PATCH 2/2] drm/amdkfd: Slighly optimize 'init_doorbell_bitmap()'

2021-11-22 Thread Felix Kuehling
the patches. I think the same sort of change (at least the allocation/freeing part) could be applied to the queue_slot_bitmap in kfd_process_queue_manager.c. Would you like to submit another revision of this patch series that handles that as well? Either way, this series is Reviewed-by:

Re: [PATCH v1 1/9] mm: add zone device coherent type memory support

2021-11-18 Thread Felix Kuehling
Am 2021-11-18 um 1:53 a.m. schrieb Alistair Popple: > On Tuesday, 16 November 2021 6:30:18 AM AEDT Alex Sierra wrote: >> Device memory that is cache coherent from device and CPU point of view. >> This is used on platforms that have an advanced system bus (like CAPI >> or CXL). Any page of a

Re: [PATCH v1 7/9] lib: add support for device coherent type in test_hmm

2021-11-24 Thread Felix Kuehling
Am 2021-11-15 um 2:30 p.m. schrieb Alex Sierra: > Device Coherent type uses device memory that is coherently accesible by > the CPU. This could be shown as SP (special purpose) memory range > at the BIOS-e820 memory enumeration. If no SP memory is supported in > system, this could be faked by

Re: [PATCH v1 1/9] mm: add zone device coherent type memory support

2021-11-22 Thread Felix Kuehling
Am 2021-11-21 um 9:40 p.m. schrieb Alistair Popple: diff --git a/mm/migrate.c b/mm/migrate.c index 1852d787e6ab..f74422a42192 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -362,7 +362,7 @@ static int expected_page_refs(struct address_space *mapping, struct page

[PATCH 1/2] drm/amdgpu: Generalize KFD dmabuf import

2021-11-17 Thread Felix Kuehling
Use proper amdgpu_gem_prime_import function to handle all kinds of imports. Remember the dmabuf reference to enable proper multi-GPU attachment to multiple VMs without erroneously re-exporting the underlying BO multiple times. Signed-off-by: Felix Kuehling --- .../gpu/drm/amd/amdgpu

[PATCH 2/2] drm/amdkfd: Implement DMA buf fd export for RDMA

2021-11-17 Thread Felix Kuehling
-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h| 2 + .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 45 +++ drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 55 +++ include/uapi/linux/kfd_ioctl.h| 14 - 4 files changed, 104

Re: [PATCH v2 2/2] drm/amdkfd: Slighly optimize 'init_doorbell_bitmap()'

2021-11-25 Thread Felix Kuehling
Am 2021-11-23 um 3:46 p.m. schrieb Christophe JAILLET: > The 'doorbell_bitmap' bitmap has just been allocated. So we can use the > non-atomic '__set_bit()' function to save a few cycles as no concurrent > access can happen. > > Reviewed-by: Felix Kuehling > Signed-off-by:

Re: [PATCH] mm/migrate.c: Remove MIGRATE_PFN_LOCKED

2021-10-26 Thread Felix Kuehling
> Signed-off-by: Alistair Popple It makes sense to me. Do you have any empirical data on how much more likely migrations are going to fail with this change due to contested page locks? Either way, the patch is Acked-by: Felix Kuehling > --- > Documentation/vm/hmm.rst | 2 +

Re: [PATCH] mm/migrate.c: Remove MIGRATE_PFN_LOCKED

2021-10-28 Thread Felix Kuehling
Am 2021-10-27 um 9:42 p.m. schrieb Alistair Popple: > On Wednesday, 27 October 2021 3:09:57 AM AEDT Felix Kuehling wrote: >> Am 2021-10-25 um 12:16 a.m. schrieb Alistair Popple: >>> MIGRATE_PFN_LOCKED is used to indicate to migrate_vma_prepare() that a >>> source pag

Re: [PATCH v2 03/11] mm/gup: migrate PIN_LONGTERM dev coherent pages to system

2021-12-09 Thread Felix Kuehling
Am 2021-12-09 um 5:53 a.m. schrieb Alistair Popple: > On Thursday, 9 December 2021 5:55:26 AM AEDT Sierra Guiza, Alejandro (Alex) > wrote: >> On 12/8/2021 11:30 AM, Felix Kuehling wrote: >>> Am 2021-12-08 um 11:58 a.m. schrieb Felix Kuehling: >>>> Am 2021-12

Re: [PATCH] drm/ttm: Don't inherit GEM object VMAs in child process

2021-12-09 Thread Felix Kuehling
>>>>>> we had >>>>>> worked around earlier in the user space inside the thunk library. >>>>>> >>>>>> Additionally, we faced this issue when using CRIU to checkpoint >>>>>> restore >>>>

Re: [PATCH] drm/amdkfd: Fix a wild pointer dereference in svm_range_add()

2021-12-07 Thread Felix Kuehling
. On Wed, Dec 1, 2021 at 1:35 AM Felix Kuehling wrote: Am 2021-11-30 um 11:51 a.m. schrieb philip yang: > > > On 2021-11-30 6:26 a.m., Zhou Qingyang wrote: >> In svm_range_add(), the return value of svm_range_new() is a

Re: [PATCH v2 03/11] mm/gup: migrate PIN_LONGTERM dev coherent pages to system

2021-12-08 Thread Felix Kuehling
Am 2021-12-08 um 6:31 a.m. schrieb Alistair Popple: > On Tuesday, 7 December 2021 5:52:43 AM AEDT Alex Sierra wrote: >> Avoid long term pinning for Coherent device type pages. This could >> interfere with their own device memory manager. >> If caller tries to get user device coherent pages with

Re: [PATCH v2 03/11] mm/gup: migrate PIN_LONGTERM dev coherent pages to system

2021-12-08 Thread Felix Kuehling
Am 2021-12-08 um 11:58 a.m. schrieb Felix Kuehling: > Am 2021-12-08 um 6:31 a.m. schrieb Alistair Popple: >> On Tuesday, 7 December 2021 5:52:43 AM AEDT Alex Sierra wrote: >>> Avoid long term pinning for Coherent device type pages. This could >>> interfere with thei

Re: [PATCH v2 03/11] mm/gup: migrate PIN_LONGTERM dev coherent pages to system

2021-12-10 Thread Felix Kuehling
On 2021-12-09 8:31 p.m., Alistair Popple wrote: On Friday, 10 December 2021 3:54:31 AM AEDT Sierra Guiza, Alejandro (Alex) wrote: On 12/9/2021 10:29 AM, Felix Kuehling wrote: Am 2021-12-09 um 5:53 a.m. schrieb Alistair Popple: On Thursday, 9 December 2021 5:55:26 AM AEDT Sierra Guiza

Re: [PATCH] drm/amdkfd: Fix a wild pointer dereference in svm_range_add()

2021-11-30 Thread Felix Kuehling
Am 2021-11-30 um 11:51 a.m. schrieb philip yang: > > > On 2021-11-30 6:26 a.m., Zhou Qingyang wrote: >> In svm_range_add(), the return value of svm_range_new() is assigned >> to prange and >insert_list is used in list_add(). There is a >> a dereference of >insert_list in list_add(), which could

Re: [Patch v2] drm/amdgpu: Don't inherit GEM object VMAs in child process

2021-12-10 Thread Felix Kuehling
space consumers such as OpenGL etc, limit it to KFD BOs only. Cc: Felix Kuehling Signed-off-by: David Yat Sin Signed-off-by: Rajneesh Bhardwaj Acked-by: Felix Kuehling --- Changes in v2: * Addressed Christian's concerns for user space impact * Further reduced the scope to KFD BOs only

Re: Reuse framebuffer after a kexec (amdgpu / efifb)

2021-12-10 Thread Felix Kuehling
On 2021-12-10 10:13 a.m., Christian König wrote: Am 10.12.21 um 15:25 schrieb Guilherme G. Piccoli: On 10/12/2021 11:16, Alex Deucher wrote:> [...] Why not just reload the driver after kexec? Alex Because the original issue is the kdump case, and we want a very very tiny kernel - also, the

Re: [PATCH v3 0/8] Support DEVICE_GENERIC memory in migrate_vma_*

2021-07-16 Thread Felix Kuehling
Am 2021-07-16 um 11:07 a.m. schrieb Theodore Y. Ts'o: > On Wed, Jun 23, 2021 at 05:49:55PM -0400, Felix Kuehling wrote: >> I can think of two ways to test the changes for MEMORY_DEVICE_GENERIC in >> this patch series in a way that is reproducible without special hardware

[PATCH 1/1] drm/amdgpu: workaround failed COW checks for Thunk VMAs

2021-07-15 Thread Felix Kuehling
is_cow_mapping(vm_flags) false. Fixes: f91142c62161 ("drm/ttm: nuke VM_MIXEDMAP on BO mappings v3") Suggested-by: Daniel Vetter Tested-by: Felix Kuehling Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 9 + 1 file changed, 9 insertions(+) diff --git a/d

Re: [PATCH] drm/amdkfd: move PTR_ERR under IS_ERR() condition

2021-07-27 Thread Felix Kuehling
Hi Huoqing, Your patch is technically correct. However, I don't think it fixes any actual bug, and it changes a code path that has no performance implications. Therefore I would just leave it as it is. Regards,   Felix Am 2021-07-20 um 2:34 a.m. schrieb Cai Huoqing: > no need to get error code

Re: [PATCH] drm/ttm: Don't inherit GEM object VMAs in child process

2022-01-07 Thread Felix Kuehling
Am 2022-01-07 um 3:56 a.m. schrieb Christian König: > Am 06.01.22 um 17:51 schrieb Felix Kuehling: >> Am 2022-01-06 um 11:48 a.m. schrieb Christian König: >>> Am 06.01.22 um 17:45 schrieb Felix Kuehling: >>>> Am 2022-01-06 um 4:05 a.m. schrieb Christian König: >>

Re: [PATCH 1/1] Add available memory ioctl for libhsakmt

2022-01-10 Thread Felix Kuehling
by libhsakmt per node, allowing for space consumed by page translation tables. Other than the missing signed-off-by, this patch is Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 1 + .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c| 14 ++ drivers/gpu

Re: [PATCH 1/1] Add test for hsaKmtAvailableMemory available memory inquiry

2022-01-10 Thread Felix Kuehling
On 2022-01-10 4:48 p.m., Daniel Phillips wrote: Basic test for the new hsaKmtAvailableMemory library call. This is a standalone test, does not modify any of the other tests just to be on the safe side. More elaborate tests coming soon. Change-Id: I738600d4b74cc5dba6b857e4c793f6b14b7d2283

Re: [Patch v4 18/24] drm/amdkfd: CRIU checkpoint and restore xnack mode

2022-01-10 Thread Felix Kuehling
On 2022-01-05 10:22 a.m., philip yang wrote: On 2021-12-22 7:37 p.m., Rajneesh Bhardwaj wrote: Recoverable page faults are represented by the xnack mode setting inside a kfd process and are used to represent the device page faults. For CR, we don't consider negative values which are typically

Re: [PATCH] drm/amdkfd: Check for null pointer after calling kmemdup

2022-01-10 Thread Felix Kuehling
On 2022-01-05 10:56 a.m., Felix Kuehling wrote: Am 2022-01-05 um 4:09 a.m. schrieb Jiasheng Jiang: As the possible failure of the allocation, kmemdup() may return NULL pointer. Therefore, it should be better to check the 'props2' in order to prevent the dereference of NULL pointer. Fixes

Re: [Patch v4 04/24] drm/amdkfd: CRIU Implement KFD process_info ioctl

2022-01-10 Thread Felix Kuehling
On 2021-12-22 7:36 p.m., Rajneesh Bhardwaj wrote: This IOCTL is expected to be called as a precursor to the actual Checkpoint operation. This does the basic discovery into the target process seized by CRIU and relays the information to the userspace that utilizes it to start the Checkpoint

Re: [Patch v4 24/24] drm/amdkfd: CRIU resume shared virtual memory ranges

2022-01-10 Thread Felix Kuehling
On 2021-12-22 7:37 p.m., Rajneesh Bhardwaj wrote: In CRIU resume stage, resume all the shared virtual memory ranges from the data stored inside the resuming kfd process during CRIU restore phase. Also setup xnack mode and free up the resources. Signed-off-by: Rajneesh Bhardwaj ---

Re: [Patch v4 23/24] drm/amdkfd: CRIU prepare for svm resume

2022-01-10 Thread Felix Kuehling
On 2022-01-05 9:43 a.m., philip yang wrote: On 2021-12-22 7:37 p.m., Rajneesh Bhardwaj wrote: During CRIU restore phase, the VMAs for the virtual address ranges are not at their final location yet so in this stage, only cache the data required to successfully resume the svm ranges during an

Re: [Patch v4 06/24] drm/amdkfd: CRIU Implement KFD restore ioctl

2022-01-10 Thread Felix Kuehling
On 2021-12-22 7:36 p.m., Rajneesh Bhardwaj wrote: This implements the KFD CRIU Restore ioctl that lays the basic foundation for the CRIU restore operation. It provides support to create the buffer objects corresponding to Non-Paged system memory mapped for GPU and/or CPU access and lays basic

Re: [Patch v4 13/24] drm/amdkfd: CRIU checkpoint and restore queue mqds

2022-01-10 Thread Felix Kuehling
On 2021-12-22 7:37 p.m., Rajneesh Bhardwaj wrote: From: David Yat Sin Checkpoint contents of queue MQD's on CRIU dump and restore them during CRIU restore. Signed-off-by: David Yat Sin David has an update for this patch to fix up the doorbell offset in the restored SDMA MQD. Regards,  

Re: [Patch v4 03/24] drm/amdkfd: CRIU Introduce Checkpoint-Restore APIs

2022-01-10 Thread Felix Kuehling
criu plugin which has elevated ptrace attached privileges and CAP_CHECKPOINT_RESTORE capabilities attached with the file descriptors so modify KFD to allow such calls. (API redesigned by David Yat Sin) Suggested-by: Felix Kuehling Signed-off-by: David Yat Sin Signed-off-by: Rajneesh Bhardwaj

Re: [Patch v4 07/24] drm/amdkfd: CRIU Implement KFD resume ioctl

2022-01-10 Thread Felix Kuehling
On 2021-12-22 7:36 p.m., Rajneesh Bhardwaj wrote: This adds support to create userptr BOs on restore and introduces a new ioctl to restart memory notifiers for the restored userptr BOs. When doing CRIU restore MMU notifications can happen anytime after we call amdgpu_mn_register. Prevent MMU

Re: [PATCH v3 00/10] Add MEMORY_DEVICE_COHERENT for coherent device memory mapping

2022-01-12 Thread Felix Kuehling
Am 2022-01-12 um 6:16 a.m. schrieb David Hildenbrand: > On 10.01.22 23:31, Alex Sierra wrote: >> This patch series introduces MEMORY_DEVICE_COHERENT, a type of memory >> owned by a device that can be mapped into CPU page tables like >> MEMORY_DEVICE_GENERIC and can also be migrated like >>

Re: [PATCH] drm/ttm: Don't inherit GEM object VMAs in child process

2022-01-17 Thread Felix Kuehling
Am 2022-01-17 um 6:44 a.m. schrieb Christian König: > Am 14.01.22 um 18:40 schrieb Felix Kuehling: >> Am 2022-01-14 um 12:26 p.m. schrieb Christian König: >>> Am 14.01.22 um 17:44 schrieb Daniel Vetter: >>>> Top post because I tried to catch up on the entire

Re: [PATCH] drm/ttm: Don't inherit GEM object VMAs in child process

2022-01-17 Thread Felix Kuehling
Am 2022-01-17 um 9:21 a.m. schrieb Christian König: > Am 17.01.22 um 15:17 schrieb Felix Kuehling: >> Am 2022-01-17 um 6:44 a.m. schrieb Christian König: >>> Am 14.01.22 um 18:40 schrieb Felix Kuehling: >>>> Am 2022-01-14 um 12:26 p.m. schrieb Christian König: >

Re: [PATCH v6 01/10] mm: add zone device coherent type memory support

2022-02-11 Thread Felix Kuehling
. However, no one should be allowed to pin such memory so that it can always be evicted. Signed-off-by: Alex Sierra Acked-by: Felix Kuehling Reviewed-by: Alistair Popple So, I'm currently messing with PageAnon() pages and CoW semantics ... all these PageAnon() ZONE_DEVICE variants don't

Re: [PATCH v6 01/10] mm: add zone device coherent type memory support

2022-02-11 Thread Felix Kuehling
of a process can be migrated to such memory. However, no one should be allowed to pin such memory so that it can always be evicted. Signed-off-by: Alex Sierra Acked-by: Felix Kuehling Reviewed-by: Alistair Popple So, I'm currently messing with PageAnon() pages and CoW semantics ... all

Re: [PATCH v6 01/10] mm: add zone device coherent type memory support

2022-02-15 Thread Felix Kuehling
On 2022-02-15 14:41, Jason Gunthorpe wrote: On Tue, Feb 15, 2022 at 07:32:09PM +0100, Christoph Hellwig wrote: On Tue, Feb 15, 2022 at 10:45:24AM -0400, Jason Gunthorpe wrote: Do you know if DEVICE_GENERIC pages would end up as PageAnon()? My assumption was that they would be part of a

Re: [PATCH v6 01/10] mm: add zone device coherent type memory support

2022-02-15 Thread Felix Kuehling
On 2022-02-15 16:47, Jason Gunthorpe wrote: On Tue, Feb 15, 2022 at 04:35:56PM -0500, Felix Kuehling wrote: On 2022-02-15 14:41, Jason Gunthorpe wrote: On Tue, Feb 15, 2022 at 07:32:09PM +0100, Christoph Hellwig wrote: On Tue, Feb 15, 2022 at 10:45:24AM -0400, Jason Gunthorpe wrote: Do you

Re: [PATCH 2/2] drm/amdkfd: CRIU Refactor restore BO function

2022-03-08 Thread Felix Kuehling
. With that fixed, the series is Reviewed-by: Felix Kuehling Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 271 +++ 1 file changed, 129 insertions(+), 142 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd

Re: [PATCH v1 1/3] mm: split vm_normal_pages for LRU and non-LRU handling

2022-03-11 Thread Felix Kuehling
On 2022-03-11 04:16, David Hildenbrand wrote: On 10.03.22 18:26, Alex Sierra wrote: DEVICE_COHERENT pages introduce a subtle distinction in the way "normal" pages can be used by various callers throughout the kernel. They behave like normal pages for purposes of mapping in CPU page tables, and

[PATCH v2 1/2] drm/amdgpu: Generalize KFD dmabuf import

2022-03-16 Thread Felix Kuehling
Use proper amdgpu_gem_prime_import function to handle all kinds of imports. Remember the dmabuf reference to enable proper multi-GPU attachment to multiple VMs without erroneously re-exporting the underlying BO multiple times. Signed-off-by: Felix Kuehling --- .../gpu/drm/amd/amdgpu

[PATCH v2 2/2] drm/amdkfd: Implement DMA buf fd export for RDMA

2022-03-16 Thread Felix Kuehling
-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h| 2 + .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 45 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 54 +++ include/uapi/linux/kfd_ioctl.h| 14 - 4 files changed, 103

[RFC PATCH 1/4] drm/amdkfd: Improve amdgpu_vm_handle_moved

2022-03-16 Thread Felix Kuehling
. Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 6 +- drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 18 +- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 3 ++- 4 files changed, 21 insertions

[RFC PATCH 2/4] drm/amdgpu: Attach eviction fence on alloc

2022-03-16 Thread Felix Kuehling
Instead of attaching the eviction fence when a KFD BO is first mapped, attach it when it is allocated or imported. This in preparation to allow KFD BOs to be mapped using the render node API. Signed-off-by: Felix Kuehling --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 62

[RFC PATCH 3/4] drm/amdgpu: update mappings not managed by KFD

2022-03-16 Thread Felix Kuehling
When restoring after an eviction, use amdgpu_vm_handle_moved to update BO VA mappings in KFD VMs that are not managed through the KFD API. This should allow using the render node API to create more flexible memory mappings in KFD VMs. Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu

[RFC PATCH 4/4] drm/amdgpu: Do bo_va ref counting for KFD BOs

2022-03-16 Thread Felix Kuehling
This is needed to correctly handle BOs imported into the GEM API, which would otherwise get added twice to the same VM. Signed-off-by: Felix Kuehling --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 26 +++ 1 file changed, 21 insertions(+), 5 deletions(-) diff --git

[RFC PATCH 0/4] Enable render node VA mapping API for KFD BOs

2022-03-16 Thread Felix Kuehling
s with GEM handles. Doesn't help because there is no way to import GEM handles into libdrm-amdgpu Felix Kuehling (4): drm/amdkfd: Improve amdgpu_vm_handle_moved drm/amdgpu: Attach eviction fence on alloc drm/amdgpu: update mappings not managed by KFD drm/amdgpu: Do bo_va ref counting f

Re: [PATCH 1/1] drm/amdkfd: Protect the Client whilst it is being operated on

2022-03-17 Thread Felix Kuehling
Am 2022-03-17 um 09:16 schrieb Lee Jones: Presently the Client can be freed whilst still in use. Use the already provided lock to prevent this. Cc: Felix Kuehling Cc: Alex Deucher Cc: "Christian König" Cc: "Pan, Xinhui" Cc: David Airlie Cc: Daniel Vetter Cc: amd-...@l

Re: amd-gfx Digest, Vol 70, Issue 199

2022-03-17 Thread Felix Kuehling
Am 2022-03-16 um 21:57 schrieb Yat Sin, David: Use proper amdgpu_gem_prime_import function to handle all kinds of imports. Remember the dmabuf reference to enable proper multi-GPU attachment to multiple VMs without erroneously re-exporting the underlying BO multiple times. Signed-off-by: Felix

Re: [PATCH 1/1] drm/amdkfd: Protect the Client whilst it is being operated on

2022-03-17 Thread Felix Kuehling
On 2022-03-17 at 11:00, Lee Jones wrote: Good afternoon Felix, Thanks for your review. On 2022-03-17 at 09:16, Lee Jones wrote: Presently the Client can be freed whilst still in use. Use the already provided lock to prevent this. Cc: Felix Kuehling Cc: Alex Deucher Cc: "Chri

Re: [PATCH] drm/amdkfd: CRIU export dmabuf handles for GTT BOs

2022-03-08 Thread Felix Kuehling
On 2022-03-08 at 14:11, David Yat Sin wrote: Export dmabuf handles for GTT BOs so that their contents can be accessed using SDMA during checkpoint/restore. This deserves a minor version bump. The plugin should depend on that bumped version when it starts using dmabuf handles for GTT BOs.

Re: [PATCH] drm/amdkfd: Set handle to invalid for non GTT/VRAM BOs

2022-03-09 Thread Felix Kuehling
On 2022-03-09 12:41, David Yat Sin wrote: Set dmabuf handle to invalid for BOs that cannot be accessed using SDMA during checkpoint/restore. Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 8 ++-- include/uapi/linux/kfd_ioctl.h | 2 ++ 2 files

Re: [PATCH] fixup! drm/amdkfd: CRIU export dmabuf handles for GTT BOs

2022-03-09 Thread Felix Kuehling
On 2022-03-09 16:20, David Yat Sin wrote: Signed-off-by: David Yat Sin Please add the commit description back. And let's wait for Alex to confirm that the fixup-method is OK. With that fixed, the patch is Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 6

Re: [PATCH v2] drm/amdkfd: CRIU export dmabuf handles for GTT BOs

2022-03-08 Thread Felix Kuehling
On 2022-03-08 at 16:08, David Yat Sin wrote: Export dmabuf handles for GTT BOs so that their contents can be accessed using SDMA during checkpoint/restore. Signed-off-by: David Yat Sin Looks good to me. Please also post a link to the user mode change for this. Note that the user mode

Re: [RFC PATCH 1/4] drm/amdkfd: Improve amdgpu_vm_handle_moved

2022-03-17 Thread Felix Kuehling
On 2022-03-17 at 04:21, Christian König wrote: On 17.03.22 at 01:20, Felix Kuehling wrote: Let amdgpu_vm_handle_moved update all BO VA mappings of BOs reserved by the caller. This will be useful for handling extra BO VA mappings in KFD VMs that are managed through the render node API. Yes

Re: [RFC PATCH 1/4] drm/amdkfd: Improve amdgpu_vm_handle_moved

2022-03-18 Thread Felix Kuehling
On 2022-03-18 at 08:38, Christian König wrote: On 17.03.22 at 20:11, Felix Kuehling wrote: On 2022-03-17 at 04:21, Christian König wrote: On 17.03.22 at 01:20, Felix Kuehling wrote: Let amdgpu_vm_handle_moved update all BO VA mappings of BOs reserved by the caller. This will be useful

Re: [PATCH v1 1/3] mm: split vm_normal_pages for LRU and non-LRU handling

2022-03-10 Thread Felix Kuehling
On 2022-03-10 at 14:25, Matthew Wilcox wrote: On Thu, Mar 10, 2022 at 11:26:31AM -0600, Alex Sierra wrote: @@ -606,7 +606,7 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr, * PFNMAP mappings in order to support COWable mappings. * */ -struct page

Re: [PATCH v6 01/10] mm: add zone device coherent type memory support

2022-02-16 Thread Felix Kuehling
On 2022-02-15 at 21:01, Jason Gunthorpe wrote: On Tue, Feb 15, 2022 at 05:49:07PM -0500, Felix Kuehling wrote: Userspace does 1) mmap(MAP_PRIVATE) to allocate anon memory 2) something to trigger migration to install a ZONE_DEVICE page 3) munmap() Who decrements the refcount

Re: [PATCH v6 01/10] mm: add zone device coherent type memory support

2022-02-18 Thread Felix Kuehling
On 2022-02-17 at 19:19, Jason Gunthorpe wrote: On Thu, Feb 17, 2022 at 04:12:20PM -0500, Felix Kuehling wrote: I'm thinking of a more theoretical approach: Instead of auditing all users, I'd ask, what are the invariants that a vm_normal_page should have. Then check, whether our

Re: [PATCH] drm/amdkfd: Fix for possible integer overflow

2022-02-18 Thread Felix Kuehling
On 2022-02-18 at 18:08, David Yat Sin wrote: Fix for possible integer overflow when doing addition. Reported-by: Dan Carpenter Signed-off-by: David Yat Sin Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c | 2 +- 1 file changed, 1 insertion
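
The snippet above does not show the actual one-line change, so what follows is only a generic userspace sketch of how an unsigned addition can be checked before it is performed. The helper name `checked_add_u32` is hypothetical and is not taken from the amdkfd patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not kernel code: add two 32-bit unsigned values.
 * Returns true if the addition would wrap around, in which case *sum is
 * left untouched. Returns false and stores the result in *sum otherwise.
 */
static bool checked_add_u32(uint32_t a, uint32_t b, uint32_t *sum)
{
	/* a + b exceeds UINT32_MAX exactly when a > UINT32_MAX - b. */
	if (a > UINT32_MAX - b)
		return true;

	*sum = a + b;
	return false;
}
```

The comparison is done before the addition, so the wrapped value is never computed or used as a size or index.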

Re: [PATCH v6 01/10] mm: add zone device coherent type memory support

2022-02-18 Thread Felix Kuehling
On 2022-02-18 at 14:26, Jason Gunthorpe wrote: On Fri, Feb 18, 2022 at 02:20:45PM -0500, Felix Kuehling wrote: On 2022-02-17 at 19:19, Jason Gunthorpe wrote: On Thu, Feb 17, 2022 at 04:12:20PM -0500, Felix Kuehling wrote: I'm thinking of a more theoretical approach: Instead of auditing all

Re: [PATCH] drm/amdkfd: rework criu_restore_bos error handling

2022-02-18 Thread Felix Kuehling
On 2022-02-18 at 12:39, t...@redhat.com wrote: From: Tom Rix Clang static analysis reports this problem kfd_chardev.c:2327:2: warning: 1st function call argument is an uninitialized value kvfree(bo_privs); ^~~~ If the copy_from_users(bo_buckets, ...) fails, there is a

Re: [PATCH] drm/amdkfd: rework criu_restore_bos error handling

2022-02-18 Thread Felix Kuehling
On 2022-02-18 at 21:34, Tom Rix wrote: On 2/18/22 10:35 AM, Felix Kuehling wrote: On 2022-02-18 at 12:39, t...@redhat.com wrote: From: Tom Rix Clang static analysis reports this problem kfd_chardev.c:2327:2: warning: 1st function call argument is an uninitialized value kvfree

Re: [PATCH] mm: split vm_normal_pages for LRU and non-LRU handling

2022-02-28 Thread Felix Kuehling
. We also introduced a FOLL_LRU flag that adds the same behaviour to follow_page and related APIs, to allow callers to specify that they expect to put pages on an LRU list. Signed-off-by: Alex Sierra Acked-by: Felix Kuehling FWIW. Full disclosure, Alex and I worked on this together, but it's a

Re: [PATCH 2/6] drm/ttm: add resource iterator v3

2022-02-15 Thread Felix Kuehling
(ttm_bo_get_unless_zero(res->bo)) {     bo = res->bo;     break; } if (locked)     dma_resv_unlock(res->bo->base.resv); Either way, the patch is Reviewed-by: Felix Kuehling + if (locked) + dma_resv_unlock(res->bo->
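
The `ttm_bo_get_unless_zero()` call in the snippet above follows the common pattern of taking a reference only if the count has not already dropped to zero. A minimal userspace sketch with C11 atomics, assuming a hypothetical `struct obj` and helper `obj_get_unless_zero` (neither is TTM code):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reference-counted object, for illustration only. */
struct obj {
	atomic_uint refcount;
};

/*
 * Take a reference only if the object is still alive (refcount != 0).
 * Returns true when a reference was taken, false when the count had
 * already reached zero and the object must not be used.
 */
static bool obj_get_unless_zero(struct obj *o)
{
	unsigned int old = atomic_load(&o->refcount);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&o->refcount, &old, old + 1))
			return true;
	}
	return false;
}
```

The compare-and-exchange loop ensures that a count of zero is never incremented back to one, so an object that is already being torn down is skipped rather than resurrected.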
