Am 2021-08-17 um 8:01 p.m. schrieb Ralph Campbell:
> On 8/12/21 11:31 PM, Alex Sierra wrote:
>> From: Ralph Campbell
>>
>> ZONE_DEVICE struct pages have an extra reference count that
>> complicates the
>> code for put_page() and several places in the kernel that need to
>> check the
>>
Am 2021-08-25 um 2:24 p.m. schrieb Sierra Guiza, Alejandro (Alex):
>
> On 8/25/2021 2:46 AM, Christoph Hellwig wrote:
>> On Tue, Aug 24, 2021 at 10:48:17PM -0500, Alex Sierra wrote:
>>> } else {
>>> - if (!(migrate->flags & MIGRATE_VMA_SELECT_SYSTEM))
>>> + if
Am 2021-09-01 um 4:29 a.m. schrieb Christoph Hellwig:
> On Mon, Aug 30, 2021 at 01:04:43PM -0400, Felix Kuehling wrote:
>>>> driver code is not really involved in updating the CPU mappings. Maybe
>>>> it's something we need to do in the migration helpers.
>
Am 2021-08-24 um 11:48 p.m. schrieb Alex Sierra:
> In this case, this is used to migrate pages from device memory, back to
> system memory. This particular device memory type should be accessible
> by the CPU, through IOMEM access. Typically, zone device public type
> memory falls into this
DEVICE_PRIVATE or DEVICE_PUBLIC to create the device
> page map region.
>
> Signed-off-by: Alex Sierra
> Reviewed-by: Felix Kuehling
> ---
> v7:
> Remove lookup_resource call, so export symbol for this function
> is no longer required. Patch dropped "kernel: re
Am 2021-08-24 um 11:48 p.m. schrieb Alex Sierra:
> The ref counter of device pages is initialized to zero during memmap init zone.
> The first time a new device page is allocated to migrate data into it,
> its ref counter needs to be initialized to one.
>
> Signed-off-by: Alex Sierra
> ---
>
Am 2021-08-15 um 5:10 a.m. schrieb Christoph Hellwig:
>> @@ -880,17 +881,22 @@ int svm_migrate_init(struct amdgpu_device *adev)
>> * should remove reserved size
>> */
>> size = ALIGN(adev->gmc.real_vram_size, 2ULL << 20);
>> -res = devm_request_free_mem_region(adev->dev,
Am 2021-08-15 um 4:40 p.m. schrieb John Hubbard:
> On 8/15/21 8:37 AM, Christoph Hellwig wrote:
>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>> index 8ae31622deef..d48a1f0889d1 100644
>>> --- a/include/linux/mm.h
>>> +++ b/include/linux/mm.h
>>> @@ -1218,7 +1218,7 @@ __maybe_unused
Am 2021-08-15 um 11:40 a.m. schrieb Christoph Hellwig:
> On Fri, Aug 13, 2021 at 01:31:45AM -0500, Alex Sierra wrote:
>> Add MEMORY_DEVICE_GENERIC case to free_zone_device_page callback.
>> Device generic type memory case is now able to free its pages properly.
> How is this going to work for
Am 2021-08-19 um 2:00 p.m. schrieb Sierra Guiza, Alejandro (Alex):
>
> On 8/18/2021 2:28 PM, Ralph Campbell wrote:
>> On 8/17/21 5:35 PM, Felix Kuehling wrote:
>>> Am 2021-08-17 um 8:01 p.m. schrieb Ralph Campbell:
>>>> On 8/12/21 11:31 PM, Alex Sierra
Am 2021-08-16 um 6:06 p.m. schrieb Zeng, Oak:
> Regards,
> Oak
>
>
>
> On 2021-08-16, 3:53 PM, "amd-gfx on behalf of Sierra Guiza, Alejandro
> (Alex)" <alex.sie...@amd.com> wrote:
>
>
> On 8/15/2021 10:38 AM, Christoph Hellwig wrote:
> > On Fri, Aug 13, 2021 at 01:31:42AM -0500, Alex
On 2021-09-01 6:03 p.m., Dave Chinner wrote:
On Wed, Sep 01, 2021 at 11:40:43AM -0400, Felix Kuehling wrote:
Am 2021-09-01 um 4:29 a.m. schrieb Christoph Hellwig:
On Mon, Aug 30, 2021 at 01:04:43PM -0400, Felix Kuehling wrote:
driver code is not really involved in updating the CPU mappings
Am 2021-09-02 um 4:18 a.m. schrieb Christoph Hellwig:
> On Wed, Sep 01, 2021 at 11:40:43AM -0400, Felix Kuehling wrote:
>>>>> It looks like I'm totally misunderstanding what you are adding here
>>>>> then. Why do we need any special treatment at all for memory t
Hi Xiyu Yang,
This bug was already fixed by this commit:
https://gitlab.freedesktop.org/agd5f/linux/-/commit/598a118db0d85a432f8cd541a6a5d31e31c56b6b
Regards,
Felix
Am 2021-09-09 um 12:27 a.m. schrieb Xiyu Yang:
> The memory leak issue may take place in an error handling path. When
>
Am 2021-09-01 um 9:14 p.m. schrieb Dave Chinner:
> On Wed, Sep 01, 2021 at 07:07:34PM -0400, Felix Kuehling wrote:
>> On 2021-09-01 6:03 p.m., Dave Chinner wrote:
>>> On Wed, Sep 01, 2021 at 11:40:43AM -0400, Felix Kuehling wrote:
>>>> Am 2021-09-01 um 4:29
These bits are de-facto part of the uAPI, so declare them in a uAPI header.
Signed-off-by: Felix Kuehling
---
MAINTAINERS | 1 +
drivers/gpu/drm/amd/amdkfd/kfd_topology.h | 46 +
include/uapi/linux/kfd_sysfs.h| 108 ++
3
Am 2021-07-13 um 2:57 a.m. schrieb Christian König:
>
>
> Am 13.07.21 um 00:06 schrieb Felix Kuehling:
>> KFD Thunk maps invisible VRAM BOs with PROT_NONE, MAP_PRIVATE.
>> is_cow_mapping returns true for these mappings. Add a check for
>> vm_flags & VM_WRITE to avoi
Am 2021-07-14 um 6:51 a.m. schrieb Christian König:
> Am 14.07.21 um 12:44 schrieb Daniel Vetter:
>> On Mon, Jul 12, 2021 at 06:06:36PM -0400, Felix Kuehling wrote:
>>> KFD Thunk maps invisible VRAM BOs with PROT_NONE, MAP_PRIVATE.
>>> is_cow_mapping returns true for
Am 2021-07-23 um 6:46 p.m. schrieb Sierra Guiza, Alejandro (Alex):
>
> On 7/17/2021 2:54 PM, Sierra Guiza, Alejandro (Alex) wrote:
>>
>> On 7/16/2021 5:14 PM, Felix Kuehling wrote:
>>> Am 2021-07-16 um 11:07 a.m. schrieb Theodore Y. Ts'o:
>>>> On Wed,
Am 2021-07-28 um 7:45 p.m. schrieb Sierra Guiza, Alejandro (Alex):
>
> On 7/22/2021 12:26 PM, Jason Gunthorpe wrote:
>> On Thu, Jul 22, 2021 at 11:59:17AM -0500, Sierra Guiza, Alejandro
>> (Alex) wrote:
>>> On 7/22/2021 7:23 AM, Jason Gunthorpe wrote:
On Sat, Jul 17, 2021 at 02:21:32PM -0500,
Am 2021-08-11 um 3:29 p.m. schrieb Alex Deucher:
> On Wed, Aug 11, 2021 at 3:11 PM Ramesh Errabolu
> wrote:
>> Current implementation will disallow P2P DMA if the participating
>> devices belong to different root complexes. Implementation allows
>> this default behavior to be overridden for
As the programming models for GPU-based high-performance computing
applications are evolving, HMM is helping us integrate the GPU memory
management more closely with the kernel's virtual memory management. As
a result we can provide a shared virtual address space with
demand-paging and
00 Pacific, 11:40-13:00
Eastern, 15:40-17:00 UTC.
I hope to see you all tomorrow,
Felix
On 2021-09-21 3:19 p.m., Felix Kuehling wrote:
As the programming models for GPU-based high-performance computing
applications are evolving, HMM is helping us integrate the GPU memory
management mo
Am 2021-10-12 um 3:03 p.m. schrieb Andrew Morton:
> On Tue, 12 Oct 2021 15:56:29 -0300 Jason Gunthorpe wrote:
>
>>> To what other uses will this infrastructure be put?
>>>
>>> Because I must ask: if this feature is for one single computer which
>>> presumably has a custom kernel, why add it to
Am 2021-10-12 um 2:39 p.m. schrieb Andrew Morton:
> On Tue, 12 Oct 2021 12:12:35 -0500 Alex Sierra wrote:
>
>> This patch series introduces MEMORY_DEVICE_COHERENT, a type of memory
>> owned by a device that can be mapped into CPU page tables like
>> MEMORY_DEVICE_GENERIC and can also be migrated
Am 2021-10-19 um 7:36 a.m. schrieb Christian König:
> Am 13.10.21 um 16:07 schrieb Daniel Vetter:
>> On Tue, Oct 05, 2021 at 01:37:26PM +0200, Christian König wrote:
>>> Simplifying the code a bit.
>>>
>>> Signed-off-by: Christian König
>>> ---
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 14
Am 2021-10-12 um 3:11 p.m. schrieb Matthew Wilcox:
> On Tue, Oct 12, 2021 at 11:39:57AM -0700, Andrew Morton wrote:
>> Because I must ask: if this feature is for one single computer which
>> presumably has a custom kernel, why add it to mainline Linux?
> I think in particular patch 2 deserves to
Am 2021-08-30 um 4:28 a.m. schrieb Christoph Hellwig:
> On Thu, Aug 26, 2021 at 06:27:31PM -0400, Felix Kuehling wrote:
>> I think we're missing something here. As far as I can tell, all the work
>> we did first with DEVICE_GENERIC and now DEVICE_PUBLIC always used
>> normal
[+Adrian]
Am 2021-12-23 um 2:05 a.m. schrieb Christian König:
> Am 22.12.21 um 21:53 schrieb Daniel Vetter:
>> On Mon, Dec 20, 2021 at 01:12:51PM -0500, Bhardwaj, Rajneesh wrote:
>>
>> [SNIP]
>> Still sounds funky. I think minimally we should have an ack from CRIU
>> developers that this is
Am 2022-01-05 um 3:08 a.m. schrieb Christian König:
> Am 04.01.22 um 19:08 schrieb Felix Kuehling:
>> [+Adrian]
>>
>> Am 2021-12-23 um 2:05 a.m. schrieb Christian König:
>>
>>> Am 22.12.21 um 21:53 schrieb Daniel Vetter:
>>>> On Mon, Dec 20, 20
rm/amdkfd: Add topology support for dGPUs")
> Signed-off-by: Jiasheng Jiang
Reviewed-by: Felix Kuehling
> ---
> drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> b/drivers/g
Am 2022-01-05 um 11:16 a.m. schrieb Felix Kuehling:
>> I was already wondering which mmaps through the KFD node we have left
>> which cause problems here.
> We still use the KFD FD for mapping doorbells and HDP flushing. These
> are both SG BOs, so they cannot be CPU-mapped th
Am 2021-11-14 um 9:58 p.m. schrieb Bernard Zhao:
> In function amdgpu_get_xgmi_hive, when kobject_init_and_add fails,
> there is a potential memory leak if kobject_put is not called.
>
> Signed-off-by: Bernard Zhao
Reviewed-by: Felix Kuehling
> ---
> drivers/gpu/drm/amd/amdgpu
the patches. I think the same sort of change (at least the
allocation/freeing part) could be applied to the queue_slot_bitmap in
kfd_process_queue_manager.c. Would you like to submit another revision
of this patch series that handles that as well?
Either way, this series is
Reviewed-by:
Am 2021-11-18 um 1:53 a.m. schrieb Alistair Popple:
> On Tuesday, 16 November 2021 6:30:18 AM AEDT Alex Sierra wrote:
>> Device memory that is cache coherent from device and CPU point of view.
>> This is used on platforms that have an advanced system bus (like CAPI
>> or CXL). Any page of a
Am 2021-11-15 um 2:30 p.m. schrieb Alex Sierra:
> Device Coherent type uses device memory that is coherently accessible by
> the CPU. This could be shown as SP (special purpose) memory range
> at the BIOS-e820 memory enumeration. If no SP memory is supported in
> system, this could be faked by
Am 2021-11-21 um 9:40 p.m. schrieb Alistair Popple:
diff --git a/mm/migrate.c b/mm/migrate.c
index 1852d787e6ab..f74422a42192 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -362,7 +362,7 @@ static int expected_page_refs(struct address_space
*mapping, struct page
Use proper amdgpu_gem_prime_import function to handle all kinds of
imports. Remember the dmabuf reference to enable proper multi-GPU
attachment to multiple VMs without erroneously re-exporting the
underlying BO multiple times.
Signed-off-by: Felix Kuehling
---
.../gpu/drm/amd/amdgpu
-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h| 2 +
.../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 45 +++
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 55 +++
include/uapi/linux/kfd_ioctl.h| 14 -
4 files changed, 104
Am 2021-11-23 um 3:46 p.m. schrieb Christophe JAILLET:
> The 'doorbell_bitmap' bitmap has just been allocated. So we can use the
> non-atomic '__set_bit()' function to save a few cycles as no concurrent
> access can happen.
>
> Reviewed-by: Felix Kuehling
> Signed-off-by:
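A minimal userspace sketch of the reasoning in the patch description above: a bitmap that has just been allocated is not yet visible to any other thread, so a plain read-modify-write is enough to set bits in it. The helper names (bitmap_zalloc_sketch, set_bit_nonatomic, test_bit_sketch) are hypothetical stand-ins for illustration only; they are not the kernel's bitmap API and not code from the patch.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical helper: allocate a zero-filled bitmap holding nbits bits. */
static unsigned long *bitmap_zalloc_sketch(size_t nbits)
{
	size_t words = (nbits + BITS_PER_WORD - 1) / BITS_PER_WORD;

	return calloc(words, sizeof(unsigned long));
}

/*
 * Plain (non-atomic) read-modify-write. This is only safe while no other
 * thread can reach the bitmap, for example right after allocation.
 */
static void set_bit_nonatomic(size_t nr, unsigned long *map)
{
	map[nr / BITS_PER_WORD] |= 1UL << (nr % BITS_PER_WORD);
}

/* Return true if bit nr is set in map. */
static bool test_bit_sketch(size_t nr, const unsigned long *map)
{
	return (map[nr / BITS_PER_WORD] >> (nr % BITS_PER_WORD)) & 1UL;
}
```

Once the bitmap has been published to other threads, the plain store above would no longer be sufficient and an atomic operation would be needed instead.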
> Signed-off-by: Alistair Popple
It makes sense to me. Do you have any empirical data on how much more
likely migrations are going to fail with this change due to contested
page locks?
Either way, the patch is
Acked-by: Felix Kuehling
> ---
> Documentation/vm/hmm.rst | 2 +
Am 2021-10-27 um 9:42 p.m. schrieb Alistair Popple:
> On Wednesday, 27 October 2021 3:09:57 AM AEDT Felix Kuehling wrote:
>> Am 2021-10-25 um 12:16 a.m. schrieb Alistair Popple:
>>> MIGRATE_PFN_LOCKED is used to indicate to migrate_vma_prepare() that a
>>> source pag
Am 2021-12-09 um 5:53 a.m. schrieb Alistair Popple:
> On Thursday, 9 December 2021 5:55:26 AM AEDT Sierra Guiza, Alejandro (Alex)
> wrote:
>> On 12/8/2021 11:30 AM, Felix Kuehling wrote:
>>> Am 2021-12-08 um 11:58 a.m. schrieb Felix Kuehling:
>>>> Am 2021-12
>>>>>> we had
>>>>>> worked around earlier in the user space inside the thunk library.
>>>>>>
>>>>>> Additionally, we faced this issue when using CRIU to checkpoint
>>>>>> restore
>>>>
.
On Wed, Dec 1, 2021 at 1:35 AM Felix Kuehling <felix.kuehl...@amd.com> wrote:
Am 2021-11-30 um 11:51 a.m. schrieb philip yang:
>
>
> On 2021-11-30 6:26 a.m., Zhou Qingyang wrote:
>> In svm_range_add(), the return value of svm_range_new() is a
Am 2021-12-08 um 6:31 a.m. schrieb Alistair Popple:
> On Tuesday, 7 December 2021 5:52:43 AM AEDT Alex Sierra wrote:
>> Avoid long term pinning for Coherent device type pages. This could
>> interfere with their own device memory manager.
>> If caller tries to get user device coherent pages with
Am 2021-12-08 um 11:58 a.m. schrieb Felix Kuehling:
> Am 2021-12-08 um 6:31 a.m. schrieb Alistair Popple:
>> On Tuesday, 7 December 2021 5:52:43 AM AEDT Alex Sierra wrote:
>>> Avoid long term pinning for Coherent device type pages. This could
>>> interfere with thei
On 2021-12-09 8:31 p.m., Alistair Popple wrote:
On Friday, 10 December 2021 3:54:31 AM AEDT Sierra Guiza, Alejandro (Alex)
wrote:
On 12/9/2021 10:29 AM, Felix Kuehling wrote:
Am 2021-12-09 um 5:53 a.m. schrieb Alistair Popple:
On Thursday, 9 December 2021 5:55:26 AM AEDT Sierra Guiza
Am 2021-11-30 um 11:51 a.m. schrieb philip yang:
>
>
> On 2021-11-30 6:26 a.m., Zhou Qingyang wrote:
>> In svm_range_add(), the return value of svm_range_new() is assigned
>> to prange and &prange->insert_list is used in list_add(). There is a
>> dereference of &prange->insert_list in list_add(), which could
space consumers such as OpenGL
etc, limit it to KFD BOs only.
Cc: Felix Kuehling
Signed-off-by: David Yat Sin
Signed-off-by: Rajneesh Bhardwaj
Acked-by: Felix Kuehling
---
Changes in v2:
* Addressed Christian's concerns for user space impact
* Further reduced the scope to KFD BOs only
On 2021-12-10 10:13 a.m., Christian König wrote:
Am 10.12.21 um 15:25 schrieb Guilherme G. Piccoli:
On 10/12/2021 11:16, Alex Deucher wrote:
> [...]
Why not just reload the driver after kexec?
Alex
Because the original issue is the kdump case, and we want a very very
tiny kernel - also, the
Am 2021-07-16 um 11:07 a.m. schrieb Theodore Y. Ts'o:
> On Wed, Jun 23, 2021 at 05:49:55PM -0400, Felix Kuehling wrote:
>> I can think of two ways to test the changes for MEMORY_DEVICE_GENERIC in
>> this patch series in a way that is reproducible without special hardware
is_cow_mapping(vm_flags) false.
Fixes: f91142c62161 ("drm/ttm: nuke VM_MIXEDMAP on BO mappings v3")
Suggested-by: Daniel Vetter
Tested-by: Felix Kuehling
Signed-off-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 9 +
1 file changed, 9 insertions(+)
diff --git a/d
Hi Huoqing,
Your patch is technically correct. However, I don't think it fixes any
actual bug, and it changes a code path that has no performance
implications. Therefore I would just leave it as it is.
Regards,
Felix
Am 2021-07-20 um 2:34 a.m. schrieb Cai Huoqing:
> no need to get error code
Am 2022-01-07 um 3:56 a.m. schrieb Christian König:
> Am 06.01.22 um 17:51 schrieb Felix Kuehling:
>> Am 2022-01-06 um 11:48 a.m. schrieb Christian König:
>>> Am 06.01.22 um 17:45 schrieb Felix Kuehling:
>>>> Am 2022-01-06 um 4:05 a.m. schrieb Christian König:
>>
by libhsakmt
per node, allowing for space consumed by page translation tables.
Other than the missing signed-off-by, this patch is
Reviewed-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 1 +
.../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c| 14 ++
drivers/gpu
On 2022-01-10 4:48 p.m., Daniel Phillips wrote:
Basic test for the new hsaKmtAvailableMemory library call. This is
a standalone test, does not modify any of the other tests just to
be on the safe side. More elaborate tests coming soon.
Change-Id: I738600d4b74cc5dba6b857e4c793f6b14b7d2283
On 2022-01-05 10:22 a.m., philip yang wrote:
On 2021-12-22 7:37 p.m., Rajneesh Bhardwaj wrote:
Recoverable page faults are represented by the xnack mode setting inside
a kfd process and are used to represent the device page faults. For CR,
we don't consider negative values which are typically
On 2022-01-05 10:56 a.m., Felix Kuehling wrote:
Am 2022-01-05 um 4:09 a.m. schrieb Jiasheng Jiang:
Because the allocation may fail, kmemdup() may return a NULL pointer.
Therefore, 'props2' should be checked in order to prevent
a NULL pointer dereference.
Fixes
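A minimal userspace sketch of the pattern this patch description asks for: check the result of a duplicating allocation before dereferencing it. memdup_sketch() is a hypothetical stand-in for the kernel's kmemdup(), built on malloc() and memcpy(); struct props_sketch, copy_props() and the -1 error value are likewise assumptions made for illustration and are not taken from kfd_crat.c.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical structure standing in for the duplicated properties. */
struct props_sketch {
	int id;
	int count;
};

/* Userspace analogue of a duplicating allocator; returns NULL on failure. */
static void *memdup_sketch(const void *src, size_t len)
{
	void *dst = malloc(len);

	if (dst)
		memcpy(dst, src, len);
	return dst;
}

/*
 * Duplicate src and modify the copy. Returns 0 on success, or -1
 * (standing in for -ENOMEM) if the allocation failed.
 */
static int copy_props(const struct props_sketch *src, struct props_sketch **out)
{
	struct props_sketch *props2 = memdup_sketch(src, sizeof(*src));

	if (!props2)
		return -1;	/* bail out before any dereference */

	props2->count += 1;	/* safe: props2 is known to be non-NULL here */
	*out = props2;
	return 0;
}
```

The caller owns the returned copy and releases it with free() in this sketch.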
On 2021-12-22 7:36 p.m., Rajneesh Bhardwaj wrote:
This IOCTL is expected to be called as a precursor to the actual
Checkpoint operation. This does the basic discovery into the target
process seized by CRIU and relays the information to the userspace that
utilizes it to start the Checkpoint
On 2021-12-22 7:37 p.m., Rajneesh Bhardwaj wrote:
In CRIU resume stage, resume all the shared virtual memory ranges from
the data stored inside the resuming kfd process during CRIU restore
phase. Also set up xnack mode and free up the resources.
Signed-off-by: Rajneesh Bhardwaj
---
On 2022-01-05 9:43 a.m., philip yang wrote:
On 2021-12-22 7:37 p.m., Rajneesh Bhardwaj wrote:
During CRIU restore phase, the VMAs for the virtual address ranges are
not at their final location yet so in this stage, only cache the data
required to successfully resume the svm ranges during an
On 2021-12-22 7:36 p.m., Rajneesh Bhardwaj wrote:
This implements the KFD CRIU Restore ioctl that lays the basic
foundation for the CRIU restore operation. It provides support to
create the buffer objects corresponding to Non-Paged system memory
mapped for GPU and/or CPU access and lays basic
On 2021-12-22 7:37 p.m., Rajneesh Bhardwaj wrote:
From: David Yat Sin
Checkpoint contents of queue MQD's on CRIU dump and restore them during
CRIU restore.
Signed-off-by: David Yat Sin
David has an update for this patch to fix up the doorbell offset in the
restored SDMA MQD.
Regards,
criu plugin which has elevated ptrace
attached privileges and CAP_CHECKPOINT_RESTORE capabilities attached with
the file descriptors so modify KFD to allow such calls.
(API redesigned by David Yat Sin)
Suggested-by: Felix Kuehling
Signed-off-by: David Yat Sin
Signed-off-by: Rajneesh Bhardwaj
On 2021-12-22 7:36 p.m., Rajneesh Bhardwaj wrote:
This adds support to create userptr BOs on restore and introduces a new
ioctl to restart memory notifiers for the restored userptr BOs.
When doing CRIU restore MMU notifications can happen anytime after we call
amdgpu_mn_register. Prevent MMU
Am 2022-01-12 um 6:16 a.m. schrieb David Hildenbrand:
> On 10.01.22 23:31, Alex Sierra wrote:
>> This patch series introduces MEMORY_DEVICE_COHERENT, a type of memory
>> owned by a device that can be mapped into CPU page tables like
>> MEMORY_DEVICE_GENERIC and can also be migrated like
>>
Am 2022-01-17 um 6:44 a.m. schrieb Christian König:
> Am 14.01.22 um 18:40 schrieb Felix Kuehling:
>> Am 2022-01-14 um 12:26 p.m. schrieb Christian König:
>>> Am 14.01.22 um 17:44 schrieb Daniel Vetter:
>>>> Top post because I tried to catch up on the entire
Am 2022-01-17 um 9:21 a.m. schrieb Christian König:
> Am 17.01.22 um 15:17 schrieb Felix Kuehling:
>> Am 2022-01-17 um 6:44 a.m. schrieb Christian König:
>>> Am 14.01.22 um 18:40 schrieb Felix Kuehling:
>>>> Am 2022-01-14 um 12:26 p.m. schrieb Christian König:
>
. However,
no one should be allowed to pin such memory so that it can always be
evicted.
Signed-off-by: Alex Sierra
Acked-by: Felix Kuehling
Reviewed-by: Alistair Popple
So, I'm currently messing with PageAnon() pages and CoW semantics ...
all these PageAnon() ZONE_DEVICE variants don't
of a process can be migrated to such memory. However,
no one should be allowed to pin such memory so that it can always be
evicted.
Signed-off-by: Alex Sierra
Acked-by: Felix Kuehling
Reviewed-by: Alistair Popple
So, I'm currently messing with PageAnon() pages and CoW semantics ...
all
On 2022-02-15 14:41, Jason Gunthorpe wrote:
On Tue, Feb 15, 2022 at 07:32:09PM +0100, Christoph Hellwig wrote:
On Tue, Feb 15, 2022 at 10:45:24AM -0400, Jason Gunthorpe wrote:
Do you know if DEVICE_GENERIC pages would end up as PageAnon()? My
assumption was that they would be part of a
On 2022-02-15 16:47, Jason Gunthorpe wrote:
On Tue, Feb 15, 2022 at 04:35:56PM -0500, Felix Kuehling wrote:
On 2022-02-15 14:41, Jason Gunthorpe wrote:
On Tue, Feb 15, 2022 at 07:32:09PM +0100, Christoph Hellwig wrote:
On Tue, Feb 15, 2022 at 10:45:24AM -0400, Jason Gunthorpe wrote:
Do you
.
With that fixed, the series is
Reviewed-by: Felix Kuehling
Signed-off-by: David Yat Sin
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 271 +++
1 file changed, 129 insertions(+), 142 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
b/drivers/gpu/drm/amd
On 2022-03-11 04:16, David Hildenbrand wrote:
On 10.03.22 18:26, Alex Sierra wrote:
DEVICE_COHERENT pages introduce a subtle distinction in the way
"normal" pages can be used by various callers throughout the kernel.
They behave like normal pages for purposes of mapping in CPU page
tables, and
Use proper amdgpu_gem_prime_import function to handle all kinds of
imports. Remember the dmabuf reference to enable proper multi-GPU
attachment to multiple VMs without erroneously re-exporting the
underlying BO multiple times.
Signed-off-by: Felix Kuehling
---
.../gpu/drm/amd/amdgpu
-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h| 2 +
.../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 45
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 54 +++
include/uapi/linux/kfd_ioctl.h| 14 -
4 files changed, 103
.
Signed-off-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 6 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 18 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 3 ++-
4 files changed, 21 insertions
Instead of attaching the eviction fence when a KFD BO is first mapped,
attach it when it is allocated or imported. This is in preparation to allow
KFD BOs to be mapped using the render node API.
Signed-off-by: Felix Kuehling
---
.../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 62
When restoring after an eviction, use amdgpu_vm_handle_moved to update
BO VA mappings in KFD VMs that are not managed through the KFD API. This
should allow using the render node API to create more flexible memory
mappings in KFD VMs.
Signed-off-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdgpu
This is needed to correctly handle BOs imported into the GEM API, which
would otherwise get added twice to the same VM.
Signed-off-by: Felix Kuehling
---
.../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 26 +++
1 file changed, 21 insertions(+), 5 deletions(-)
diff --git
s with GEM handles. Doesn't help because there is no
way to import GEM handles into libdrm-amdgpu
Felix Kuehling (4):
drm/amdkfd: Improve amdgpu_vm_handle_moved
drm/amdgpu: Attach eviction fence on alloc
drm/amdgpu: update mappings not managed by KFD
drm/amdgpu: Do bo_va ref counting f
Am 2022-03-17 um 09:16 schrieb Lee Jones:
Presently the Client can be freed whilst still in use.
Use the already provided lock to prevent this.
Cc: Felix Kuehling
Cc: Alex Deucher
Cc: "Christian König"
Cc: "Pan, Xinhui"
Cc: David Airlie
Cc: Daniel Vetter
Cc: amd-...@l
Am 2022-03-16 um 21:57 schrieb Yat Sin, David:
Use proper amdgpu_gem_prime_import function to handle all kinds of
imports. Remember the dmabuf reference to enable proper multi-GPU
attachment to multiple VMs without erroneously re-exporting the underlying
BO multiple times.
Signed-off-by: Felix
Am 2022-03-17 um 11:00 schrieb Lee Jones:
Good afternoon Felix,
Thanks for your review.
Am 2022-03-17 um 09:16 schrieb Lee Jones:
Presently the Client can be freed whilst still in use.
Use the already provided lock to prevent this.
Cc: Felix Kuehling
Cc: Alex Deucher
Cc: "Chri
Am 2022-03-08 um 14:11 schrieb David Yat Sin:
Export dmabuf handles for GTT BOs so that their contents can be accessed
using SDMA during checkpoint/restore.
This deserves a minor version bump. The plugin should depend on that
bumped version when it starts using dmabuf handles for GTT BOs.
On 2022-03-09 12:41, David Yat Sin wrote:
Set dmabuf handle to invalid for BOs that cannot be accessed using SDMA
during checkpoint/restore.
Signed-off-by: David Yat Sin
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 8 ++--
include/uapi/linux/kfd_ioctl.h | 2 ++
2 files
On 2022-03-09 16:20, David Yat Sin wrote:
Signed-off-by: David Yat Sin
Please add the commit description back. And let's wait for Alex to
confirm that the fixup-method is OK. With that fixed, the patch is
Reviewed-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 6
Am 2022-03-08 um 16:08 schrieb David Yat Sin:
Export dmabuf handles for GTT BOs so that their contents can be accessed
using SDMA during checkpoint/restore.
Signed-off-by: David Yat Sin
Looks good to me. Please also post a link to the user mode change for this.
Note that the user mode
Am 2022-03-17 um 04:21 schrieb Christian König:
Am 17.03.22 um 01:20 schrieb Felix Kuehling:
Let amdgpu_vm_handle_moved update all BO VA mappings of BOs reserved by
the caller. This will be useful for handling extra BO VA mappings in
KFD VMs that are managed through the render node API.
Yes
Am 2022-03-18 um 08:38 schrieb Christian König:
Am 17.03.22 um 20:11 schrieb Felix Kuehling:
Am 2022-03-17 um 04:21 schrieb Christian König:
Am 17.03.22 um 01:20 schrieb Felix Kuehling:
Let amdgpu_vm_handle_moved update all BO VA mappings of BOs
reserved by
the caller. This will be useful
Am 2022-03-10 um 14:25 schrieb Matthew Wilcox:
On Thu, Mar 10, 2022 at 11:26:31AM -0600, Alex Sierra wrote:
@@ -606,7 +606,7 @@ static void print_bad_pte(struct vm_area_struct *vma,
unsigned long addr,
* PFNMAP mappings in order to support COWable mappings.
*
*/
-struct page
Am 2022-02-15 um 21:01 schrieb Jason Gunthorpe:
On Tue, Feb 15, 2022 at 05:49:07PM -0500, Felix Kuehling wrote:
Userspace does
1) mmap(MAP_PRIVATE) to allocate anon memory
2) something to trigger migration to install a ZONE_DEVICE page
3) munmap()
Who decrements the refcout
Am 2022-02-17 um 19:19 schrieb Jason Gunthorpe:
On Thu, Feb 17, 2022 at 04:12:20PM -0500, Felix Kuehling wrote:
I'm thinking of a more theoretical approach: Instead of auditing all users,
I'd ask, what are the invariants that a vm_normal_page should have. Then
check, whether our
Am 2022-02-18 um 18:08 schrieb David Yat Sin:
Fix for possible integer overflow when doing addition.
Reported-by: Dan Carpenter
Signed-off-by: David Yat Sin
Reviewed-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c | 2 +-
1 file changed, 1 insertion
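A minimal sketch of one common way to guard an addition against integer overflow, as a general illustration of the class of bug named above. It does not reproduce the actual one-line change in kfd_process_queue_manager.c, which is not visible here; add_u32_checked() is a hypothetical helper. The kernel itself offers check_add_overflow() in <linux/overflow.h> for the same purpose.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: add two 32-bit sizes. Returns false and leaves
 * *sum untouched if the result would not fit in 32 bits.
 */
static bool add_u32_checked(uint32_t a, uint32_t b, uint32_t *sum)
{
	if (a > UINT32_MAX - b)
		return false;	/* a + b would wrap around */

	*sum = a + b;
	return true;
}
```

Testing before adding, rather than inspecting the wrapped result afterwards, keeps the check correct regardless of how the sum is used later.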
Am 2022-02-18 um 14:26 schrieb Jason Gunthorpe:
On Fri, Feb 18, 2022 at 02:20:45PM -0500, Felix Kuehling wrote:
Am 2022-02-17 um 19:19 schrieb Jason Gunthorpe:
On Thu, Feb 17, 2022 at 04:12:20PM -0500, Felix Kuehling wrote:
I'm thinking of a more theoretical approach: Instead of auditing all
Am 2022-02-18 um 12:39 schrieb t...@redhat.com:
From: Tom Rix
Clang static analysis reports this problem
kfd_chardev.c:2327:2: warning: 1st function call argument
is an uninitialized value
kvfree(bo_privs);
^~~~
If the copy_from_users(bo_buckets, ...) fails, there is a
Am 2022-02-18 um 21:34 schrieb Tom Rix:
On 2/18/22 10:35 AM, Felix Kuehling wrote:
Am 2022-02-18 um 12:39 schrieb t...@redhat.com:
From: Tom Rix
Clang static analysis reports this problem
kfd_chardev.c:2327:2: warning: 1st function call argument
is an uninitialized value
kvfree
.
We also introduced a FOLL_LRU flag that adds the same behaviour to
follow_page and related APIs, to allow callers to specify that they
expect to put pages on an LRU list.
Signed-off-by: Alex Sierra
Acked-by: Felix Kuehling
FWIW. Full disclosure, Alex and I worked on this together, but it's a
(ttm_bo_get_unless_zero(res->bo)) {
bo = res->bo;
break;
}
if (locked)
dma_resv_unlock(res->bo->base.resv);
Either way, the patch is
Reviewed-by: Felix Kuehling
+ if (locked)
+ dma_resv_unlock(res->bo->