Re: [PATCH v5 01/13] mm: add zone device coherent type memory support

2022-06-24 Thread Sierra Guiza, Alejandro (Alex)



On 6/23/2022 1:21 PM, David Hildenbrand wrote:

On 23.06.22 20:20, Sierra Guiza, Alejandro (Alex) wrote:

On 6/23/2022 2:57 AM, David Hildenbrand wrote:

On 23.06.22 01:16, Sierra Guiza, Alejandro (Alex) wrote:

On 6/21/2022 11:16 AM, David Hildenbrand wrote:

On 21.06.22 18:08, Sierra Guiza, Alejandro (Alex) wrote:

On 6/21/2022 7:25 AM, David Hildenbrand wrote:

On 21.06.22 13:55, Alistair Popple wrote:

David Hildenbrand  writes:


On 21.06.22 13:25, Felix Kuehling wrote:

Am 6/17/22 um 23:19 schrieb David Hildenbrand:

On 17.06.22 21:27, Sierra Guiza, Alejandro (Alex) wrote:

On 6/17/2022 12:33 PM, David Hildenbrand wrote:

On 17.06.22 19:20, Sierra Guiza, Alejandro (Alex) wrote:

On 6/17/2022 4:40 AM, David Hildenbrand wrote:

On 31.05.22 22:00, Alex Sierra wrote:

Device memory that is cache coherent from device and CPU point of view.
This is used on platforms that have an advanced system bus (like CAPI
or CXL). Any page of a process can be migrated to such memory. However,
no one should be allowed to pin such memory so that it can always be
evicted.

Signed-off-by: Alex Sierra
Acked-by: Felix Kuehling
Reviewed-by: Alistair Popple
[hch: rebased ontop of the refcount changes,
 removed is_dev_private_or_coherent_page]
Signed-off-by: Christoph Hellwig
---
include/linux/memremap.h | 19 +++
mm/memcontrol.c  |  7 ---
mm/memory-failure.c  |  8 ++--
mm/memremap.c| 10 ++
mm/migrate_device.c  | 16 +++-
mm/rmap.c|  5 +++--
6 files changed, 49 insertions(+), 16 deletions(-)

diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 8af304f6b504..9f752ebed613 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -41,6 +41,13 @@ struct vmem_altmap {
 * A more complete discussion of unaddressable memory may be found in
 * include/linux/hmm.h and Documentation/vm/hmm.rst.
 *
+ * MEMORY_DEVICE_COHERENT:
+ * Device memory that is cache coherent from device and CPU point of view. This
+ * is used on platforms that have an advanced system bus (like CAPI or CXL). A
+ * driver can hotplug the device memory using ZONE_DEVICE and with that memory
+ * type. Any page of a process can be migrated to such memory. However no one

Any page might not be right, I'm pretty sure. ... just thinking about special 
pages
like vdso, shared zeropage, ... pinned pages ...

Well, you cannot migrate long term pages, that's what I meant :)


+ * should be allowed to pin such memory so that it can always be evicted.
+ *
 * MEMORY_DEVICE_FS_DAX:
 * Host memory that has similar access semantics as System RAM i.e. DMA
 * coherent and supports page pinning. In support of coordinating page
@@ -61,6 +68,7 @@ struct vmem_altmap {
enum memory_type {
/* 0 is reserved to catch uninitialized type fields */
MEMORY_DEVICE_PRIVATE = 1,
+   MEMORY_DEVICE_COHERENT,
MEMORY_DEVICE_FS_DAX,
MEMORY_DEVICE_GENERIC,
MEMORY_DEVICE_PCI_P2PDMA,
@@ -143,6 +151,17 @@ static inline bool folio_is_device_private(const struct 
folio *folio)

In general, this LGTM, and it should be correct with PageAnonExclusive I think.


However, where exactly is pinning forbidden?

Long-term pinning is forbidden since it would interfere with the device
memory manager owning the
device-coherent pages (e.g. evictions in TTM). However, normal pinning
is allowed on this device type.

I don't see updates to folio_is_pinnable() in this patch.

Device coherent type pages should return true here, as they are pinnable
pages.

That function is only called for long-term pinnings in try_grab_folio().


So wouldn't try_grab_folio() simply pin these pages? What am I missing?

As far as I understand this return NULL for long term pin pages.
Otherwise they get refcount incremented.

I don't follow.

You're saying

a) folio_is_pinnable() returns true for device coherent pages

and that

b) device coherent pages don't get long-term pinned


Yet, the code says

struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags)
{
if (flags & FOLL_GET)
return try_get_folio(page, refs);
else if (flags & FOLL_PIN) {
struct folio *folio;

/*
 * Can't do FOLL_LONGTERM + FOLL_PIN gup fast path if not in a
 * right zone, so fail and let the caller fall back to the slow
 * path.
 */
if (unlikely((flags & FOLL_LONGTERM) &&
 !is_pinnable_page(page)))
return NULL;
...
return folio;
}
}


What prevents these pages from getting long-term pinned as stated in this patch?

Long-term pinning is handled by __gup_longterm_locked, which migrates
pages returned 

Re: [PATCH v5 01/13] mm: add zone device coherent type memory support

2022-06-23 Thread David Hildenbrand
On 23.06.22 20:20, Sierra Guiza, Alejandro (Alex) wrote:
> 
> On 6/23/2022 2:57 AM, David Hildenbrand wrote:
>> On 23.06.22 01:16, Sierra Guiza, Alejandro (Alex) wrote:
>>> On 6/21/2022 11:16 AM, David Hildenbrand wrote:
 On 21.06.22 18:08, Sierra Guiza, Alejandro (Alex) wrote:
> On 6/21/2022 7:25 AM, David Hildenbrand wrote:
>> On 21.06.22 13:55, Alistair Popple wrote:
>>> David Hildenbrand  writes:
>>>
 On 21.06.22 13:25, Felix Kuehling wrote:
> Am 6/17/22 um 23:19 schrieb David Hildenbrand:
>> On 17.06.22 21:27, Sierra Guiza, Alejandro (Alex) wrote:
>>> On 6/17/2022 12:33 PM, David Hildenbrand wrote:
 On 17.06.22 19:20, Sierra Guiza, Alejandro (Alex) wrote:
> On 6/17/2022 4:40 AM, David Hildenbrand wrote:
>> On 31.05.22 22:00, Alex Sierra wrote:
>>> Device memory that is cache coherent from device and CPU point 
>>> of view.
>>> This is used on platforms that have an advanced system bus 
>>> (like CAPI
>>> or CXL). Any page of a process can be migrated to such memory. 
>>> However,
>>> no one should be allowed to pin such memory so that it can 
>>> always be
>>> evicted.
>>>
>>> Signed-off-by: Alex Sierra
>>> Acked-by: Felix Kuehling
>>> Reviewed-by: Alistair Popple
>>> [hch: rebased ontop of the refcount changes,
>>> removed is_dev_private_or_coherent_page]
>>> Signed-off-by: Christoph Hellwig
>>> ---
>>>include/linux/memremap.h | 19 +++
>>>mm/memcontrol.c  |  7 ---
>>>mm/memory-failure.c  |  8 ++--
>>>mm/memremap.c| 10 ++
>>>mm/migrate_device.c  | 16 +++-
>>>mm/rmap.c|  5 +++--
>>>6 files changed, 49 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/include/linux/memremap.h b/include/linux/memremap.h
>>> index 8af304f6b504..9f752ebed613 100644
>>> --- a/include/linux/memremap.h
>>> +++ b/include/linux/memremap.h
>>> @@ -41,6 +41,13 @@ struct vmem_altmap {
>>> * A more complete discussion of unaddressable memory 
>>> may be found in
>>> * include/linux/hmm.h and Documentation/vm/hmm.rst.
>>> *
>>> + * MEMORY_DEVICE_COHERENT:
>>> + * Device memory that is cache coherent from device and CPU 
>>> point of view. This
>>> + * is used on platforms that have an advanced system bus (like 
>>> CAPI or CXL). A
>>> + * driver can hotplug the device memory using ZONE_DEVICE and 
>>> with that memory
>>> + * type. Any page of a process can be migrated to such memory. 
>>> However no one
>> Any page might not be right, I'm pretty sure. ... just thinking 
>> about special pages
>> like vdso, shared zeropage, ... pinned pages ...
 Well, you cannot migrate long term pages, that's what I meant :)

>>> + * should be allowed to pin such memory so that it can always 
>>> be evicted.
>>> + *
>>> * MEMORY_DEVICE_FS_DAX:
>>> * Host memory that has similar access semantics as 
>>> System RAM i.e. DMA
>>> * coherent and supports page pinning. In support of 
>>> coordinating page
>>> @@ -61,6 +68,7 @@ struct vmem_altmap {
>>>enum memory_type {
>>> /* 0 is reserved to catch uninitialized type fields */
>>> MEMORY_DEVICE_PRIVATE = 1,
>>> +   MEMORY_DEVICE_COHERENT,
>>> MEMORY_DEVICE_FS_DAX,
>>> MEMORY_DEVICE_GENERIC,
>>> MEMORY_DEVICE_PCI_P2PDMA,
>>> @@ -143,6 +151,17 @@ static inline bool 
>>> folio_is_device_private(const struct folio *folio)
>> In general, this LGTM, and it should be correct with 
>> PageAnonExclusive I think.
>>
>>
>> However, where exactly is pinning forbidden?
> Long-term pinning is forbidden since it would interfere with the 
> device
> memory manager owning the
> device-coherent pages (e.g. evictions in TTM). However, normal 
> pinning
> is allowed on this device type.
 I don't see updates to folio_is_pinnable() in this patch.
>>> Device coherent type pages should return true here, as they are 
>>> pinnable
>>> pages.

Re: [PATCH v5 01/13] mm: add zone device coherent type memory support

2022-06-23 Thread Sierra Guiza, Alejandro (Alex)



On 6/23/2022 2:57 AM, David Hildenbrand wrote:

On 23.06.22 01:16, Sierra Guiza, Alejandro (Alex) wrote:

On 6/21/2022 11:16 AM, David Hildenbrand wrote:

On 21.06.22 18:08, Sierra Guiza, Alejandro (Alex) wrote:

On 6/21/2022 7:25 AM, David Hildenbrand wrote:

On 21.06.22 13:55, Alistair Popple wrote:

David Hildenbrand  writes:


On 21.06.22 13:25, Felix Kuehling wrote:

Am 6/17/22 um 23:19 schrieb David Hildenbrand:

On 17.06.22 21:27, Sierra Guiza, Alejandro (Alex) wrote:

On 6/17/2022 12:33 PM, David Hildenbrand wrote:

On 17.06.22 19:20, Sierra Guiza, Alejandro (Alex) wrote:

On 6/17/2022 4:40 AM, David Hildenbrand wrote:

On 31.05.22 22:00, Alex Sierra wrote:

Device memory that is cache coherent from device and CPU point of view.
This is used on platforms that have an advanced system bus (like CAPI
or CXL). Any page of a process can be migrated to such memory. However,
no one should be allowed to pin such memory so that it can always be
evicted.

Signed-off-by: Alex Sierra
Acked-by: Felix Kuehling
Reviewed-by: Alistair Popple
[hch: rebased ontop of the refcount changes,
removed is_dev_private_or_coherent_page]
Signed-off-by: Christoph Hellwig
---
   include/linux/memremap.h | 19 +++
   mm/memcontrol.c  |  7 ---
   mm/memory-failure.c  |  8 ++--
   mm/memremap.c| 10 ++
   mm/migrate_device.c  | 16 +++-
   mm/rmap.c|  5 +++--
   6 files changed, 49 insertions(+), 16 deletions(-)

diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 8af304f6b504..9f752ebed613 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -41,6 +41,13 @@ struct vmem_altmap {
* A more complete discussion of unaddressable memory may be found in
* include/linux/hmm.h and Documentation/vm/hmm.rst.
*
+ * MEMORY_DEVICE_COHERENT:
+ * Device memory that is cache coherent from device and CPU point of view. This
+ * is used on platforms that have an advanced system bus (like CAPI or CXL). A
+ * driver can hotplug the device memory using ZONE_DEVICE and with that memory
+ * type. Any page of a process can be migrated to such memory. However no one

Any page might not be right, I'm pretty sure. ... just thinking about special 
pages
like vdso, shared zeropage, ... pinned pages ...

Well, you cannot migrate long term pages, that's what I meant :)


+ * should be allowed to pin such memory so that it can always be evicted.
+ *
* MEMORY_DEVICE_FS_DAX:
* Host memory that has similar access semantics as System RAM i.e. DMA
* coherent and supports page pinning. In support of coordinating page
@@ -61,6 +68,7 @@ struct vmem_altmap {
   enum memory_type {
/* 0 is reserved to catch uninitialized type fields */
MEMORY_DEVICE_PRIVATE = 1,
+   MEMORY_DEVICE_COHERENT,
MEMORY_DEVICE_FS_DAX,
MEMORY_DEVICE_GENERIC,
MEMORY_DEVICE_PCI_P2PDMA,
@@ -143,6 +151,17 @@ static inline bool folio_is_device_private(const struct 
folio *folio)

In general, this LGTM, and it should be correct with PageAnonExclusive I think.


However, where exactly is pinning forbidden?

Long-term pinning is forbidden since it would interfere with the device
memory manager owning the
device-coherent pages (e.g. evictions in TTM). However, normal pinning
is allowed on this device type.

I don't see updates to folio_is_pinnable() in this patch.

Device coherent type pages should return true here, as they are pinnable
pages.

That function is only called for long-term pinnings in try_grab_folio().


So wouldn't try_grab_folio() simply pin these pages? What am I missing?

As far as I understand this return NULL for long term pin pages.
Otherwise they get refcount incremented.

I don't follow.

You're saying

a) folio_is_pinnable() returns true for device coherent pages

and that

b) device coherent pages don't get long-term pinned


Yet, the code says

struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags)
{
if (flags & FOLL_GET)
return try_get_folio(page, refs);
else if (flags & FOLL_PIN) {
struct folio *folio;

/*
 * Can't do FOLL_LONGTERM + FOLL_PIN gup fast path if not in a
 * right zone, so fail and let the caller fall back to the slow
 * path.
 */
if (unlikely((flags & FOLL_LONGTERM) &&
 !is_pinnable_page(page)))
return NULL;
...
return folio;
}
}


What prevents these pages from getting long-term pinned as stated in this patch?

Long-term pinning is handled by __gup_longterm_locked, which migrates
pages returned by __get_user_pages_locked that cannot be long-term
pinned. try_grab_folio is OK to grab the pages. Anything that can't be
long-term pinned will be migrated 

Re: [PATCH v5 01/13] mm: add zone device coherent type memory support

2022-06-23 Thread David Hildenbrand
On 23.06.22 01:16, Sierra Guiza, Alejandro (Alex) wrote:
> 
> On 6/21/2022 11:16 AM, David Hildenbrand wrote:
>> On 21.06.22 18:08, Sierra Guiza, Alejandro (Alex) wrote:
>>> On 6/21/2022 7:25 AM, David Hildenbrand wrote:
 On 21.06.22 13:55, Alistair Popple wrote:
> David Hildenbrand  writes:
>
>> On 21.06.22 13:25, Felix Kuehling wrote:
>>> Am 6/17/22 um 23:19 schrieb David Hildenbrand:
 On 17.06.22 21:27, Sierra Guiza, Alejandro (Alex) wrote:
> On 6/17/2022 12:33 PM, David Hildenbrand wrote:
>> On 17.06.22 19:20, Sierra Guiza, Alejandro (Alex) wrote:
>>> On 6/17/2022 4:40 AM, David Hildenbrand wrote:
 On 31.05.22 22:00, Alex Sierra wrote:
> Device memory that is cache coherent from device and CPU point of 
> view.
> This is used on platforms that have an advanced system bus (like 
> CAPI
> or CXL). Any page of a process can be migrated to such memory. 
> However,
> no one should be allowed to pin such memory so that it can always 
> be
> evicted.
>
> Signed-off-by: Alex Sierra
> Acked-by: Felix Kuehling
> Reviewed-by: Alistair Popple
> [hch: rebased ontop of the refcount changes,
>removed is_dev_private_or_coherent_page]
> Signed-off-by: Christoph Hellwig
> ---
>   include/linux/memremap.h | 19 +++
>   mm/memcontrol.c  |  7 ---
>   mm/memory-failure.c  |  8 ++--
>   mm/memremap.c| 10 ++
>   mm/migrate_device.c  | 16 +++-
>   mm/rmap.c|  5 +++--
>   6 files changed, 49 insertions(+), 16 deletions(-)
>
> diff --git a/include/linux/memremap.h b/include/linux/memremap.h
> index 8af304f6b504..9f752ebed613 100644
> --- a/include/linux/memremap.h
> +++ b/include/linux/memremap.h
> @@ -41,6 +41,13 @@ struct vmem_altmap {
>* A more complete discussion of unaddressable memory may 
> be found in
>* include/linux/hmm.h and Documentation/vm/hmm.rst.
>*
> + * MEMORY_DEVICE_COHERENT:
> + * Device memory that is cache coherent from device and CPU 
> point of view. This
> + * is used on platforms that have an advanced system bus (like 
> CAPI or CXL). A
> + * driver can hotplug the device memory using ZONE_DEVICE and 
> with that memory
> + * type. Any page of a process can be migrated to such memory. 
> However no one
 Any page might not be right, I'm pretty sure. ... just thinking 
 about special pages
 like vdso, shared zeropage, ... pinned pages ...
>> Well, you cannot migrate long term pages, that's what I meant :)
>>
> + * should be allowed to pin such memory so that it can always be 
> evicted.
> + *
>* MEMORY_DEVICE_FS_DAX:
>* Host memory that has similar access semantics as System 
> RAM i.e. DMA
>* coherent and supports page pinning. In support of 
> coordinating page
> @@ -61,6 +68,7 @@ struct vmem_altmap {
>   enum memory_type {
>   /* 0 is reserved to catch uninitialized type fields */
>   MEMORY_DEVICE_PRIVATE = 1,
> + MEMORY_DEVICE_COHERENT,
>   MEMORY_DEVICE_FS_DAX,
>   MEMORY_DEVICE_GENERIC,
>   MEMORY_DEVICE_PCI_P2PDMA,
> @@ -143,6 +151,17 @@ static inline bool 
> folio_is_device_private(const struct folio *folio)
 In general, this LGTM, and it should be correct with 
 PageAnonExclusive I think.


 However, where exactly is pinning forbidden?
>>> Long-term pinning is forbidden since it would interfere with the 
>>> device
>>> memory manager owning the
>>> device-coherent pages (e.g. evictions in TTM). However, normal 
>>> pinning
>>> is allowed on this device type.
>> I don't see updates to folio_is_pinnable() in this patch.
> Device coherent type pages should return true here, as they are 
> pinnable
> pages.
 That function is only called for long-term pinnings in 
 try_grab_folio().

>> So wouldn't try_grab_folio() simply pin these pages? What am I 
>> missing?
> As far as I understand this return NULL for long term pin pages.
> Otherwise they get refcount 

Re: [PATCH v5 01/13] mm: add zone device coherent type memory support

2022-06-22 Thread Sierra Guiza, Alejandro (Alex)



On 6/21/2022 11:16 AM, David Hildenbrand wrote:

On 21.06.22 18:08, Sierra Guiza, Alejandro (Alex) wrote:

On 6/21/2022 7:25 AM, David Hildenbrand wrote:

On 21.06.22 13:55, Alistair Popple wrote:

David Hildenbrand  writes:


On 21.06.22 13:25, Felix Kuehling wrote:

Am 6/17/22 um 23:19 schrieb David Hildenbrand:

On 17.06.22 21:27, Sierra Guiza, Alejandro (Alex) wrote:

On 6/17/2022 12:33 PM, David Hildenbrand wrote:

On 17.06.22 19:20, Sierra Guiza, Alejandro (Alex) wrote:

On 6/17/2022 4:40 AM, David Hildenbrand wrote:

On 31.05.22 22:00, Alex Sierra wrote:

Device memory that is cache coherent from device and CPU point of view.
This is used on platforms that have an advanced system bus (like CAPI
or CXL). Any page of a process can be migrated to such memory. However,
no one should be allowed to pin such memory so that it can always be
evicted.

Signed-off-by: Alex Sierra
Acked-by: Felix Kuehling
Reviewed-by: Alistair Popple
[hch: rebased ontop of the refcount changes,
   removed is_dev_private_or_coherent_page]
Signed-off-by: Christoph Hellwig
---
  include/linux/memremap.h | 19 +++
  mm/memcontrol.c  |  7 ---
  mm/memory-failure.c  |  8 ++--
  mm/memremap.c| 10 ++
  mm/migrate_device.c  | 16 +++-
  mm/rmap.c|  5 +++--
  6 files changed, 49 insertions(+), 16 deletions(-)

diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 8af304f6b504..9f752ebed613 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -41,6 +41,13 @@ struct vmem_altmap {
   * A more complete discussion of unaddressable memory may be found in
   * include/linux/hmm.h and Documentation/vm/hmm.rst.
   *
+ * MEMORY_DEVICE_COHERENT:
+ * Device memory that is cache coherent from device and CPU point of view. This
+ * is used on platforms that have an advanced system bus (like CAPI or CXL). A
+ * driver can hotplug the device memory using ZONE_DEVICE and with that memory
+ * type. Any page of a process can be migrated to such memory. However no one

Any page might not be right, I'm pretty sure. ... just thinking about special 
pages
like vdso, shared zeropage, ... pinned pages ...

Well, you cannot migrate long term pages, that's what I meant :)


+ * should be allowed to pin such memory so that it can always be evicted.
+ *
   * MEMORY_DEVICE_FS_DAX:
   * Host memory that has similar access semantics as System RAM i.e. DMA
   * coherent and supports page pinning. In support of coordinating page
@@ -61,6 +68,7 @@ struct vmem_altmap {
  enum memory_type {
/* 0 is reserved to catch uninitialized type fields */
MEMORY_DEVICE_PRIVATE = 1,
+   MEMORY_DEVICE_COHERENT,
MEMORY_DEVICE_FS_DAX,
MEMORY_DEVICE_GENERIC,
MEMORY_DEVICE_PCI_P2PDMA,
@@ -143,6 +151,17 @@ static inline bool folio_is_device_private(const struct 
folio *folio)

In general, this LGTM, and it should be correct with PageAnonExclusive I think.


However, where exactly is pinning forbidden?

Long-term pinning is forbidden since it would interfere with the device
memory manager owning the
device-coherent pages (e.g. evictions in TTM). However, normal pinning
is allowed on this device type.

I don't see updates to folio_is_pinnable() in this patch.

Device coherent type pages should return true here, as they are pinnable
pages.

That function is only called for long-term pinnings in try_grab_folio().


So wouldn't try_grab_folio() simply pin these pages? What am I missing?

As far as I understand this return NULL for long term pin pages.
Otherwise they get refcount incremented.

I don't follow.

You're saying

a) folio_is_pinnable() returns true for device coherent pages

and that

b) device coherent pages don't get long-term pinned


Yet, the code says

struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags)
{
if (flags & FOLL_GET)
return try_get_folio(page, refs);
else if (flags & FOLL_PIN) {
struct folio *folio;

/*
 * Can't do FOLL_LONGTERM + FOLL_PIN gup fast path if not in a
 * right zone, so fail and let the caller fall back to the slow
 * path.
 */
if (unlikely((flags & FOLL_LONGTERM) &&
 !is_pinnable_page(page)))
return NULL;
...
return folio;
}
}


What prevents these pages from getting long-term pinned as stated in this patch?

Long-term pinning is handled by __gup_longterm_locked, which migrates
pages returned by __get_user_pages_locked that cannot be long-term
pinned. try_grab_folio is OK to grab the pages. Anything that can't be
long-term pinned will be migrated afterwards, and
__get_user_pages_locked will be retried. The migration of
DEVICE_COHERENT pages was implemented by Alistair in 

Re: [PATCH v5 01/13] mm: add zone device coherent type memory support

2022-06-22 Thread Sierra Guiza, Alejandro (Alex)



On 6/21/2022 7:16 PM, Alistair Popple wrote:

David Hildenbrand  writes:


On 21.06.22 18:08, Sierra Guiza, Alejandro (Alex) wrote:

On 6/21/2022 7:25 AM, David Hildenbrand wrote:

On 21.06.22 13:55, Alistair Popple wrote:

David Hildenbrand  writes:


On 21.06.22 13:25, Felix Kuehling wrote:

Am 6/17/22 um 23:19 schrieb David Hildenbrand:

On 17.06.22 21:27, Sierra Guiza, Alejandro (Alex) wrote:

On 6/17/2022 12:33 PM, David Hildenbrand wrote:

On 17.06.22 19:20, Sierra Guiza, Alejandro (Alex) wrote:

On 6/17/2022 4:40 AM, David Hildenbrand wrote:

On 31.05.22 22:00, Alex Sierra wrote:

Device memory that is cache coherent from device and CPU point of view.
This is used on platforms that have an advanced system bus (like CAPI
or CXL). Any page of a process can be migrated to such memory. However,
no one should be allowed to pin such memory so that it can always be
evicted.

Signed-off-by: Alex Sierra
Acked-by: Felix Kuehling
Reviewed-by: Alistair Popple
[hch: rebased ontop of the refcount changes,
   removed is_dev_private_or_coherent_page]
Signed-off-by: Christoph Hellwig
---
  include/linux/memremap.h | 19 +++
  mm/memcontrol.c  |  7 ---
  mm/memory-failure.c  |  8 ++--
  mm/memremap.c| 10 ++
  mm/migrate_device.c  | 16 +++-
  mm/rmap.c|  5 +++--
  6 files changed, 49 insertions(+), 16 deletions(-)

diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 8af304f6b504..9f752ebed613 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -41,6 +41,13 @@ struct vmem_altmap {
   * A more complete discussion of unaddressable memory may be found in
   * include/linux/hmm.h and Documentation/vm/hmm.rst.
   *
+ * MEMORY_DEVICE_COHERENT:
+ * Device memory that is cache coherent from device and CPU point of view. This
+ * is used on platforms that have an advanced system bus (like CAPI or CXL). A
+ * driver can hotplug the device memory using ZONE_DEVICE and with that memory
+ * type. Any page of a process can be migrated to such memory. However no one

Any page might not be right, I'm pretty sure. ... just thinking about special 
pages
like vdso, shared zeropage, ... pinned pages ...

Well, you cannot migrate long term pages, that's what I meant :)


+ * should be allowed to pin such memory so that it can always be evicted.
+ *
   * MEMORY_DEVICE_FS_DAX:
   * Host memory that has similar access semantics as System RAM i.e. DMA
   * coherent and supports page pinning. In support of coordinating page
@@ -61,6 +68,7 @@ struct vmem_altmap {
  enum memory_type {
/* 0 is reserved to catch uninitialized type fields */
MEMORY_DEVICE_PRIVATE = 1,
+   MEMORY_DEVICE_COHERENT,
MEMORY_DEVICE_FS_DAX,
MEMORY_DEVICE_GENERIC,
MEMORY_DEVICE_PCI_P2PDMA,
@@ -143,6 +151,17 @@ static inline bool folio_is_device_private(const struct 
folio *folio)

In general, this LGTM, and it should be correct with PageAnonExclusive I think.


However, where exactly is pinning forbidden?

Long-term pinning is forbidden since it would interfere with the device
memory manager owning the
device-coherent pages (e.g. evictions in TTM). However, normal pinning
is allowed on this device type.

I don't see updates to folio_is_pinnable() in this patch.

Device coherent type pages should return true here, as they are pinnable
pages.

That function is only called for long-term pinnings in try_grab_folio().


So wouldn't try_grab_folio() simply pin these pages? What am I missing?

As far as I understand this return NULL for long term pin pages.
Otherwise they get refcount incremented.

I don't follow.

You're saying

a) folio_is_pinnable() returns true for device coherent pages

and that

b) device coherent pages don't get long-term pinned


Yet, the code says

struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags)
{
if (flags & FOLL_GET)
return try_get_folio(page, refs);
else if (flags & FOLL_PIN) {
struct folio *folio;

/*
 * Can't do FOLL_LONGTERM + FOLL_PIN gup fast path if not in a
 * right zone, so fail and let the caller fall back to the slow
 * path.
 */
if (unlikely((flags & FOLL_LONGTERM) &&
 !is_pinnable_page(page)))
return NULL;
...
return folio;
}
}


What prevents these pages from getting long-term pinned as stated in this patch?

Long-term pinning is handled by __gup_longterm_locked, which migrates
pages returned by __get_user_pages_locked that cannot be long-term
pinned. try_grab_folio is OK to grab the pages. Anything that can't be
long-term pinned will be migrated afterwards, and
__get_user_pages_locked will be retried. The migration of
DEVICE_COHERENT pages was 

Re: [PATCH v5 01/13] mm: add zone device coherent type memory support

2022-06-21 Thread Alistair Popple


David Hildenbrand  writes:

> On 21.06.22 18:08, Sierra Guiza, Alejandro (Alex) wrote:
>>
>> On 6/21/2022 7:25 AM, David Hildenbrand wrote:
>>> On 21.06.22 13:55, Alistair Popple wrote:
 David Hildenbrand  writes:

> On 21.06.22 13:25, Felix Kuehling wrote:
>> Am 6/17/22 um 23:19 schrieb David Hildenbrand:
>>> On 17.06.22 21:27, Sierra Guiza, Alejandro (Alex) wrote:
 On 6/17/2022 12:33 PM, David Hildenbrand wrote:
> On 17.06.22 19:20, Sierra Guiza, Alejandro (Alex) wrote:
>> On 6/17/2022 4:40 AM, David Hildenbrand wrote:
>>> On 31.05.22 22:00, Alex Sierra wrote:
 Device memory that is cache coherent from device and CPU point of 
 view.
 This is used on platforms that have an advanced system bus (like 
 CAPI
 or CXL). Any page of a process can be migrated to such memory. 
 However,
 no one should be allowed to pin such memory so that it can always 
 be
 evicted.

 Signed-off-by: Alex Sierra
 Acked-by: Felix Kuehling
 Reviewed-by: Alistair Popple
 [hch: rebased ontop of the refcount changes,
   removed is_dev_private_or_coherent_page]
 Signed-off-by: Christoph Hellwig
 ---
  include/linux/memremap.h | 19 +++
  mm/memcontrol.c  |  7 ---
  mm/memory-failure.c  |  8 ++--
  mm/memremap.c| 10 ++
  mm/migrate_device.c  | 16 +++-
  mm/rmap.c|  5 +++--
  6 files changed, 49 insertions(+), 16 deletions(-)

 diff --git a/include/linux/memremap.h b/include/linux/memremap.h
 index 8af304f6b504..9f752ebed613 100644
 --- a/include/linux/memremap.h
 +++ b/include/linux/memremap.h
 @@ -41,6 +41,13 @@ struct vmem_altmap {
   * A more complete discussion of unaddressable memory may be 
 found in
   * include/linux/hmm.h and Documentation/vm/hmm.rst.
   *
 + * MEMORY_DEVICE_COHERENT:
 + * Device memory that is cache coherent from device and CPU point 
 of view. This
 + * is used on platforms that have an advanced system bus (like 
 CAPI or CXL). A
 + * driver can hotplug the device memory using ZONE_DEVICE and 
 with that memory
 + * type. Any page of a process can be migrated to such memory. 
 However no one
>>> Any page might not be right, I'm pretty sure. ... just thinking 
>>> about special pages
>>> like vdso, shared zeropage, ... pinned pages ...
> Well, you cannot migrate long term pages, that's what I meant :)
>
 + * should be allowed to pin such memory so that it can always be 
 evicted.
 + *
   * MEMORY_DEVICE_FS_DAX:
   * Host memory that has similar access semantics as System 
 RAM i.e. DMA
   * coherent and supports page pinning. In support of 
 coordinating page
 @@ -61,6 +68,7 @@ struct vmem_altmap {
  enum memory_type {
/* 0 is reserved to catch uninitialized type fields */
MEMORY_DEVICE_PRIVATE = 1,
 +  MEMORY_DEVICE_COHERENT,
MEMORY_DEVICE_FS_DAX,
MEMORY_DEVICE_GENERIC,
MEMORY_DEVICE_PCI_P2PDMA,
 @@ -143,6 +151,17 @@ static inline bool 
 folio_is_device_private(const struct folio *folio)
>>> In general, this LGTM, and it should be correct with 
>>> PageAnonExclusive I think.
>>>
>>>
>>> However, where exactly is pinning forbidden?
>> Long-term pinning is forbidden since it would interfere with the 
>> device
>> memory manager owning the
>> device-coherent pages (e.g. evictions in TTM). However, normal 
>> pinning
>> is allowed on this device type.
> I don't see updates to folio_is_pinnable() in this patch.
 Device coherent type pages should return true here, as they are 
 pinnable
 pages.
>>> That function is only called for long-term pinnings in try_grab_folio().
>>>
> So wouldn't try_grab_folio() simply pin these pages? What am I 
> missing?
 As far as I understand this return NULL for long term pin pages.
 Otherwise they get refcount incremented.
>>> I don't follow.
>>>
>>> You're saying
>>>
>>> a) folio_is_pinnable() returns true for device coherent pages
>>>
>>> and that
>>>
>>> b) device 

Re: [PATCH v5 01/13] mm: add zone device coherent type memory support

2022-06-21 Thread David Hildenbrand
On 21.06.22 18:08, Sierra Guiza, Alejandro (Alex) wrote:
> 
> On 6/21/2022 7:25 AM, David Hildenbrand wrote:
>> On 21.06.22 13:55, Alistair Popple wrote:
>>> David Hildenbrand  writes:
>>>
 On 21.06.22 13:25, Felix Kuehling wrote:
> Am 6/17/22 um 23:19 schrieb David Hildenbrand:
>> On 17.06.22 21:27, Sierra Guiza, Alejandro (Alex) wrote:
>>> On 6/17/2022 12:33 PM, David Hildenbrand wrote:
 On 17.06.22 19:20, Sierra Guiza, Alejandro (Alex) wrote:
> On 6/17/2022 4:40 AM, David Hildenbrand wrote:
>> On 31.05.22 22:00, Alex Sierra wrote:
>>> Device memory that is cache coherent from device and CPU point of 
>>> view.
>>> This is used on platforms that have an advanced system bus (like 
>>> CAPI
>>> or CXL). Any page of a process can be migrated to such memory. 
>>> However,
>>> no one should be allowed to pin such memory so that it can always be
>>> evicted.
>>>
>>> Signed-off-by: Alex Sierra
>>> Acked-by: Felix Kuehling
>>> Reviewed-by: Alistair Popple
>>> [hch: rebased ontop of the refcount changes,
>>>   removed is_dev_private_or_coherent_page]
>>> Signed-off-by: Christoph Hellwig
>>> ---
>>>  include/linux/memremap.h | 19 +++
>>>  mm/memcontrol.c  |  7 ---
>>>  mm/memory-failure.c  |  8 ++--
>>>  mm/memremap.c| 10 ++
>>>  mm/migrate_device.c  | 16 +++-
>>>  mm/rmap.c|  5 +++--
>>>  6 files changed, 49 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/include/linux/memremap.h b/include/linux/memremap.h
>>> index 8af304f6b504..9f752ebed613 100644
>>> --- a/include/linux/memremap.h
>>> +++ b/include/linux/memremap.h
>>> @@ -41,6 +41,13 @@ struct vmem_altmap {
>>>   * A more complete discussion of unaddressable memory may be 
>>> found in
>>>   * include/linux/hmm.h and Documentation/vm/hmm.rst.
>>>   *
>>> + * MEMORY_DEVICE_COHERENT:
>>> + * Device memory that is cache coherent from device and CPU point 
>>> of view. This
>>> + * is used on platforms that have an advanced system bus (like 
>>> CAPI or CXL). A
>>> + * driver can hotplug the device memory using ZONE_DEVICE and with 
>>> that memory
>>> + * type. Any page of a process can be migrated to such memory. 
>>> However no one
>> Any page might not be right, I'm pretty sure. ... just thinking 
>> about special pages
>> like vdso, shared zeropage, ... pinned pages ...
 Well, you cannot migrate long term pages, that's what I meant :)

>>> + * should be allowed to pin such memory so that it can always be 
>>> evicted.
>>> + *
>>>   * MEMORY_DEVICE_FS_DAX:
>>>   * Host memory that has similar access semantics as System RAM 
>>> i.e. DMA
>>>   * coherent and supports page pinning. In support of 
>>> coordinating page
>>> @@ -61,6 +68,7 @@ struct vmem_altmap {
>>>  enum memory_type {
>>> /* 0 is reserved to catch uninitialized type fields */
>>> MEMORY_DEVICE_PRIVATE = 1,
>>> +   MEMORY_DEVICE_COHERENT,
>>> MEMORY_DEVICE_FS_DAX,
>>> MEMORY_DEVICE_GENERIC,
>>> MEMORY_DEVICE_PCI_P2PDMA,
>>> @@ -143,6 +151,17 @@ static inline bool 
>>> folio_is_device_private(const struct folio *folio)
>> In general, this LGTM, and it should be correct with 
>> PageAnonExclusive I think.
>>
>>
>> However, where exactly is pinning forbidden?
> Long-term pinning is forbidden since it would interfere with the 
> device
> memory manager owning the
> device-coherent pages (e.g. evictions in TTM). However, normal pinning
> is allowed on this device type.
 I don't see updates to folio_is_pinnable() in this patch.
>>> Device coherent type pages should return true here, as they are pinnable
>>> pages.
>> That function is only called for long-term pinnings in try_grab_folio().
>>
 So wouldn't try_grab_folio() simply pin these pages? What am I missing?
>>> As far as I understand this return NULL for long term pin pages.
>>> Otherwise they get refcount incremented.
>> I don't follow.
>>
>> You're saying
>>
>> a) folio_is_pinnable() returns true for device coherent pages
>>
>> and that
>>
>> b) device coherent pages don't get long-term pinned
>>
>>
>> Yet, the code says
>>
>> struct folio *try_grab_folio(struct page *page, int refs, unsigned int 
>> 

Re: [PATCH v5 01/13] mm: add zone device coherent type memory support

2022-06-21 Thread Sierra Guiza, Alejandro (Alex)



On 6/21/2022 7:25 AM, David Hildenbrand wrote:

On 21.06.22 13:55, Alistair Popple wrote:

David Hildenbrand  writes:


On 21.06.22 13:25, Felix Kuehling wrote:

Am 6/17/22 um 23:19 schrieb David Hildenbrand:

On 17.06.22 21:27, Sierra Guiza, Alejandro (Alex) wrote:

On 6/17/2022 12:33 PM, David Hildenbrand wrote:

On 17.06.22 19:20, Sierra Guiza, Alejandro (Alex) wrote:

On 6/17/2022 4:40 AM, David Hildenbrand wrote:

On 31.05.22 22:00, Alex Sierra wrote:

Device memory that is cache coherent from device and CPU point of view.
This is used on platforms that have an advanced system bus (like CAPI
or CXL). Any page of a process can be migrated to such memory. However,
no one should be allowed to pin such memory so that it can always be
evicted.

Signed-off-by: Alex Sierra
Acked-by: Felix Kuehling
Reviewed-by: Alistair Popple
[hch: rebased ontop of the refcount changes,
  removed is_dev_private_or_coherent_page]
Signed-off-by: Christoph Hellwig
---
 include/linux/memremap.h | 19 +++
 mm/memcontrol.c  |  7 ---
 mm/memory-failure.c  |  8 ++--
 mm/memremap.c| 10 ++
 mm/migrate_device.c  | 16 +++-
 mm/rmap.c|  5 +++--
 6 files changed, 49 insertions(+), 16 deletions(-)

diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 8af304f6b504..9f752ebed613 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -41,6 +41,13 @@ struct vmem_altmap {
  * A more complete discussion of unaddressable memory may be found in
  * include/linux/hmm.h and Documentation/vm/hmm.rst.
  *
+ * MEMORY_DEVICE_COHERENT:
+ * Device memory that is cache coherent from device and CPU point of view. This
+ * is used on platforms that have an advanced system bus (like CAPI or CXL). A
+ * driver can hotplug the device memory using ZONE_DEVICE and with that memory
+ * type. Any page of a process can be migrated to such memory. However no one

Any page might not be right, I'm pretty sure. ... just thinking about special 
pages
like vdso, shared zeropage, ... pinned pages ...

Well, you cannot migrate long term pages, that's what I meant :)


+ * should be allowed to pin such memory so that it can always be evicted.
+ *
  * MEMORY_DEVICE_FS_DAX:
  * Host memory that has similar access semantics as System RAM i.e. DMA
  * coherent and supports page pinning. In support of coordinating page
@@ -61,6 +68,7 @@ struct vmem_altmap {
 enum memory_type {
/* 0 is reserved to catch uninitialized type fields */
MEMORY_DEVICE_PRIVATE = 1,
+   MEMORY_DEVICE_COHERENT,
MEMORY_DEVICE_FS_DAX,
MEMORY_DEVICE_GENERIC,
MEMORY_DEVICE_PCI_P2PDMA,
@@ -143,6 +151,17 @@ static inline bool folio_is_device_private(const struct 
folio *folio)

In general, this LGTM, and it should be correct with PageAnonExclusive I think.


However, where exactly is pinning forbidden?

Long-term pinning is forbidden since it would interfere with the device
memory manager owning the
device-coherent pages (e.g. evictions in TTM). However, normal pinning
is allowed on this device type.

I don't see updates to folio_is_pinnable() in this patch.

Device coherent type pages should return true here, as they are pinnable
pages.

That function is only called for long-term pinnings in try_grab_folio().


So wouldn't try_grab_folio() simply pin these pages? What am I missing?

As far as I understand this return NULL for long term pin pages.
Otherwise they get refcount incremented.

I don't follow.

You're saying

a) folio_is_pinnable() returns true for device coherent pages

and that

b) device coherent pages don't get long-term pinned


Yet, the code says

struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags)
{
if (flags & FOLL_GET)
return try_get_folio(page, refs);
else if (flags & FOLL_PIN) {
struct folio *folio;

/*
 * Can't do FOLL_LONGTERM + FOLL_PIN gup fast path if not in a
 * right zone, so fail and let the caller fall back to the slow
 * path.
 */
if (unlikely((flags & FOLL_LONGTERM) &&
 !is_pinnable_page(page)))
return NULL;
...
return folio;
}
}


What prevents these pages from getting long-term pinned as stated in this patch?

Long-term pinning is handled by __gup_longterm_locked, which migrates
pages returned by __get_user_pages_locked that cannot be long-term
pinned. try_grab_folio is OK to grab the pages. Anything that can't be
long-term pinned will be migrated afterwards, and
__get_user_pages_locked will be retried. The migration of
DEVICE_COHERENT pages was implemented by Alistair in patch 5/13
("mm/gup: migrate device coherent pages when pinning instead of failing").

Thanks.


Re: [PATCH v5 01/13] mm: add zone device coherent type memory support

2022-06-21 Thread David Hildenbrand
On 21.06.22 13:55, Alistair Popple wrote:
> 
> David Hildenbrand  writes:
> 
>> On 21.06.22 13:25, Felix Kuehling wrote:
>>>
>>> Am 6/17/22 um 23:19 schrieb David Hildenbrand:
 On 17.06.22 21:27, Sierra Guiza, Alejandro (Alex) wrote:
> On 6/17/2022 12:33 PM, David Hildenbrand wrote:
>> On 17.06.22 19:20, Sierra Guiza, Alejandro (Alex) wrote:
>>> On 6/17/2022 4:40 AM, David Hildenbrand wrote:
 On 31.05.22 22:00, Alex Sierra wrote:
> Device memory that is cache coherent from device and CPU point of 
> view.
> This is used on platforms that have an advanced system bus (like CAPI
> or CXL). Any page of a process can be migrated to such memory. 
> However,
> no one should be allowed to pin such memory so that it can always be
> evicted.
>
> Signed-off-by: Alex Sierra 
> Acked-by: Felix Kuehling 
> Reviewed-by: Alistair Popple 
> [hch: rebased ontop of the refcount changes,
>  removed is_dev_private_or_coherent_page]
> Signed-off-by: Christoph Hellwig 
> ---
> include/linux/memremap.h | 19 +++
> mm/memcontrol.c  |  7 ---
> mm/memory-failure.c  |  8 ++--
> mm/memremap.c| 10 ++
> mm/migrate_device.c  | 16 +++-
> mm/rmap.c|  5 +++--
> 6 files changed, 49 insertions(+), 16 deletions(-)
>
> diff --git a/include/linux/memremap.h b/include/linux/memremap.h
> index 8af304f6b504..9f752ebed613 100644
> --- a/include/linux/memremap.h
> +++ b/include/linux/memremap.h
> @@ -41,6 +41,13 @@ struct vmem_altmap {
>  * A more complete discussion of unaddressable memory may be 
> found in
>  * include/linux/hmm.h and Documentation/vm/hmm.rst.
>  *
> + * MEMORY_DEVICE_COHERENT:
> + * Device memory that is cache coherent from device and CPU point of 
> view. This
> + * is used on platforms that have an advanced system bus (like CAPI 
> or CXL). A
> + * driver can hotplug the device memory using ZONE_DEVICE and with 
> that memory
> + * type. Any page of a process can be migrated to such memory. 
> However no one
 Any page might not be right, I'm pretty sure. ... just thinking about 
 special pages
 like vdso, shared zeropage, ... pinned pages ...
>> Well, you cannot migrate long term pages, that's what I meant :)
>>
> + * should be allowed to pin such memory so that it can always be 
> evicted.
> + *
>  * MEMORY_DEVICE_FS_DAX:
>  * Host memory that has similar access semantics as System RAM 
> i.e. DMA
>  * coherent and supports page pinning. In support of coordinating 
> page
> @@ -61,6 +68,7 @@ struct vmem_altmap {
> enum memory_type {
>   /* 0 is reserved to catch uninitialized type fields */
>   MEMORY_DEVICE_PRIVATE = 1,
> + MEMORY_DEVICE_COHERENT,
>   MEMORY_DEVICE_FS_DAX,
>   MEMORY_DEVICE_GENERIC,
>   MEMORY_DEVICE_PCI_P2PDMA,
> @@ -143,6 +151,17 @@ static inline bool folio_is_device_private(const 
> struct folio *folio)
 In general, this LGTM, and it should be correct with PageAnonExclusive 
 I think.


 However, where exactly is pinning forbidden?
>>> Long-term pinning is forbidden since it would interfere with the device
>>> memory manager owning the
>>> device-coherent pages (e.g. evictions in TTM). However, normal pinning
>>> is allowed on this device type.
>> I don't see updates to folio_is_pinnable() in this patch.
> Device coherent type pages should return true here, as they are pinnable
> pages.
 That function is only called for long-term pinnings in try_grab_folio().

>> So wouldn't try_grab_folio() simply pin these pages? What am I missing?
> As far as I understand this return NULL for long term pin pages.
> Otherwise they get refcount incremented.
 I don't follow.

 You're saying

 a) folio_is_pinnable() returns true for device coherent pages

 and that

 b) device coherent pages don't get long-term pinned


 Yet, the code says

 struct folio *try_grab_folio(struct page *page, int refs, unsigned int 
 flags)
 {
if (flags & FOLL_GET)
return try_get_folio(page, refs);
else if (flags & FOLL_PIN) {
struct folio *folio;

/*
 * Can't do FOLL_LONGTERM + FOLL_PIN gup fast path if not in a
 * right zone, so fail and let the caller fall back to the slow
 * path.
   

Re: [PATCH v5 01/13] mm: add zone device coherent type memory support

2022-06-21 Thread Alistair Popple


David Hildenbrand  writes:

> On 21.06.22 13:25, Felix Kuehling wrote:
>>
>> Am 6/17/22 um 23:19 schrieb David Hildenbrand:
>>> On 17.06.22 21:27, Sierra Guiza, Alejandro (Alex) wrote:
 On 6/17/2022 12:33 PM, David Hildenbrand wrote:
> On 17.06.22 19:20, Sierra Guiza, Alejandro (Alex) wrote:
>> On 6/17/2022 4:40 AM, David Hildenbrand wrote:
>>> On 31.05.22 22:00, Alex Sierra wrote:
 Device memory that is cache coherent from device and CPU point of view.
 This is used on platforms that have an advanced system bus (like CAPI
 or CXL). Any page of a process can be migrated to such memory. However,
 no one should be allowed to pin such memory so that it can always be
 evicted.

 Signed-off-by: Alex Sierra 
 Acked-by: Felix Kuehling 
 Reviewed-by: Alistair Popple 
 [hch: rebased ontop of the refcount changes,
  removed is_dev_private_or_coherent_page]
 Signed-off-by: Christoph Hellwig 
 ---
 include/linux/memremap.h | 19 +++
 mm/memcontrol.c  |  7 ---
 mm/memory-failure.c  |  8 ++--
 mm/memremap.c| 10 ++
 mm/migrate_device.c  | 16 +++-
 mm/rmap.c|  5 +++--
 6 files changed, 49 insertions(+), 16 deletions(-)

 diff --git a/include/linux/memremap.h b/include/linux/memremap.h
 index 8af304f6b504..9f752ebed613 100644
 --- a/include/linux/memremap.h
 +++ b/include/linux/memremap.h
 @@ -41,6 +41,13 @@ struct vmem_altmap {
  * A more complete discussion of unaddressable memory may be found 
 in
  * include/linux/hmm.h and Documentation/vm/hmm.rst.
  *
 + * MEMORY_DEVICE_COHERENT:
 + * Device memory that is cache coherent from device and CPU point of 
 view. This
 + * is used on platforms that have an advanced system bus (like CAPI 
 or CXL). A
 + * driver can hotplug the device memory using ZONE_DEVICE and with 
 that memory
 + * type. Any page of a process can be migrated to such memory. 
 However no one
>>> Any page might not be right, I'm pretty sure. ... just thinking about 
>>> special pages
>>> like vdso, shared zeropage, ... pinned pages ...
> Well, you cannot migrate long term pages, that's what I meant :)
>
 + * should be allowed to pin such memory so that it can always be 
 evicted.
 + *
  * MEMORY_DEVICE_FS_DAX:
  * Host memory that has similar access semantics as System RAM 
 i.e. DMA
  * coherent and supports page pinning. In support of coordinating 
 page
 @@ -61,6 +68,7 @@ struct vmem_altmap {
 enum memory_type {
/* 0 is reserved to catch uninitialized type fields */
MEMORY_DEVICE_PRIVATE = 1,
 +  MEMORY_DEVICE_COHERENT,
MEMORY_DEVICE_FS_DAX,
MEMORY_DEVICE_GENERIC,
MEMORY_DEVICE_PCI_P2PDMA,
 @@ -143,6 +151,17 @@ static inline bool folio_is_device_private(const 
 struct folio *folio)
>>> In general, this LGTM, and it should be correct with PageAnonExclusive 
>>> I think.
>>>
>>>
>>> However, where exactly is pinning forbidden?
>> Long-term pinning is forbidden since it would interfere with the device
>> memory manager owning the
>> device-coherent pages (e.g. evictions in TTM). However, normal pinning
>> is allowed on this device type.
> I don't see updates to folio_is_pinnable() in this patch.
 Device coherent type pages should return true here, as they are pinnable
 pages.
>>> That function is only called for long-term pinnings in try_grab_folio().
>>>
> So wouldn't try_grab_folio() simply pin these pages? What am I missing?
 As far as I understand this return NULL for long term pin pages.
 Otherwise they get refcount incremented.
>>> I don't follow.
>>>
>>> You're saying
>>>
>>> a) folio_is_pinnable() returns true for device coherent pages
>>>
>>> and that
>>>
>>> b) device coherent pages don't get long-term pinned
>>>
>>>
>>> Yet, the code says
>>>
>>> struct folio *try_grab_folio(struct page *page, int refs, unsigned int 
>>> flags)
>>> {
>>> if (flags & FOLL_GET)
>>> return try_get_folio(page, refs);
>>> else if (flags & FOLL_PIN) {
>>> struct folio *folio;
>>>
>>> /*
>>>  * Can't do FOLL_LONGTERM + FOLL_PIN gup fast path if not in a
>>>  * right zone, so fail and let the caller fall back to the slow
>>>  * path.
>>>  */
>>> if (unlikely((flags & FOLL_LONGTERM) &&
>>>  !is_pinnable_page(page)))
>>> return NULL;
>>>   

Re: [PATCH v5 01/13] mm: add zone device coherent type memory support

2022-06-21 Thread David Hildenbrand
On 21.06.22 13:25, Felix Kuehling wrote:
> 
> Am 6/17/22 um 23:19 schrieb David Hildenbrand:
>> On 17.06.22 21:27, Sierra Guiza, Alejandro (Alex) wrote:
>>> On 6/17/2022 12:33 PM, David Hildenbrand wrote:
 On 17.06.22 19:20, Sierra Guiza, Alejandro (Alex) wrote:
> On 6/17/2022 4:40 AM, David Hildenbrand wrote:
>> On 31.05.22 22:00, Alex Sierra wrote:
>>> Device memory that is cache coherent from device and CPU point of view.
>>> This is used on platforms that have an advanced system bus (like CAPI
>>> or CXL). Any page of a process can be migrated to such memory. However,
>>> no one should be allowed to pin such memory so that it can always be
>>> evicted.
>>>
>>> Signed-off-by: Alex Sierra 
>>> Acked-by: Felix Kuehling 
>>> Reviewed-by: Alistair Popple 
>>> [hch: rebased ontop of the refcount changes,
>>>  removed is_dev_private_or_coherent_page]
>>> Signed-off-by: Christoph Hellwig 
>>> ---
>>> include/linux/memremap.h | 19 +++
>>> mm/memcontrol.c  |  7 ---
>>> mm/memory-failure.c  |  8 ++--
>>> mm/memremap.c| 10 ++
>>> mm/migrate_device.c  | 16 +++-
>>> mm/rmap.c|  5 +++--
>>> 6 files changed, 49 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/include/linux/memremap.h b/include/linux/memremap.h
>>> index 8af304f6b504..9f752ebed613 100644
>>> --- a/include/linux/memremap.h
>>> +++ b/include/linux/memremap.h
>>> @@ -41,6 +41,13 @@ struct vmem_altmap {
>>>  * A more complete discussion of unaddressable memory may be found 
>>> in
>>>  * include/linux/hmm.h and Documentation/vm/hmm.rst.
>>>  *
>>> + * MEMORY_DEVICE_COHERENT:
>>> + * Device memory that is cache coherent from device and CPU point of 
>>> view. This
>>> + * is used on platforms that have an advanced system bus (like CAPI or 
>>> CXL). A
>>> + * driver can hotplug the device memory using ZONE_DEVICE and with 
>>> that memory
>>> + * type. Any page of a process can be migrated to such memory. However 
>>> no one
>> Any page might not be right, I'm pretty sure. ... just thinking about 
>> special pages
>> like vdso, shared zeropage, ... pinned pages ...
 Well, you cannot migrate long term pages, that's what I meant :)

>>> + * should be allowed to pin such memory so that it can always be 
>>> evicted.
>>> + *
>>>  * MEMORY_DEVICE_FS_DAX:
>>>  * Host memory that has similar access semantics as System RAM i.e. 
>>> DMA
>>>  * coherent and supports page pinning. In support of coordinating 
>>> page
>>> @@ -61,6 +68,7 @@ struct vmem_altmap {
>>> enum memory_type {
>>> /* 0 is reserved to catch uninitialized type fields */
>>> MEMORY_DEVICE_PRIVATE = 1,
>>> +   MEMORY_DEVICE_COHERENT,
>>> MEMORY_DEVICE_FS_DAX,
>>> MEMORY_DEVICE_GENERIC,
>>> MEMORY_DEVICE_PCI_P2PDMA,
>>> @@ -143,6 +151,17 @@ static inline bool folio_is_device_private(const 
>>> struct folio *folio)
>> In general, this LGTM, and it should be correct with PageAnonExclusive I 
>> think.
>>
>>
>> However, where exactly is pinning forbidden?
> Long-term pinning is forbidden since it would interfere with the device
> memory manager owning the
> device-coherent pages (e.g. evictions in TTM). However, normal pinning
> is allowed on this device type.
 I don't see updates to folio_is_pinnable() in this patch.
>>> Device coherent type pages should return true here, as they are pinnable
>>> pages.
>> That function is only called for long-term pinnings in try_grab_folio().
>>
 So wouldn't try_grab_folio() simply pin these pages? What am I missing?
>>> As far as I understand this return NULL for long term pin pages.
>>> Otherwise they get refcount incremented.
>> I don't follow.
>>
>> You're saying
>>
>> a) folio_is_pinnable() returns true for device coherent pages
>>
>> and that
>>
>> b) device coherent pages don't get long-term pinned
>>
>>
>> Yet, the code says
>>
>> struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags)
>> {
>>  if (flags & FOLL_GET)
>>  return try_get_folio(page, refs);
>>  else if (flags & FOLL_PIN) {
>>  struct folio *folio;
>>
>>  /*
>>   * Can't do FOLL_LONGTERM + FOLL_PIN gup fast path if not in a
>>   * right zone, so fail and let the caller fall back to the slow
>>   * path.
>>   */
>>  if (unlikely((flags & FOLL_LONGTERM) &&
>>   !is_pinnable_page(page)))
>>  return NULL;
>>  ...
>>  return folio;
>>  }
>> }
>>
>>
>> What prevents these pages from getting long-term pinned as 

Re: [PATCH v5 01/13] mm: add zone device coherent type memory support

2022-06-21 Thread Felix Kuehling



Am 6/17/22 um 23:19 schrieb David Hildenbrand:

On 17.06.22 21:27, Sierra Guiza, Alejandro (Alex) wrote:

On 6/17/2022 12:33 PM, David Hildenbrand wrote:

On 17.06.22 19:20, Sierra Guiza, Alejandro (Alex) wrote:

On 6/17/2022 4:40 AM, David Hildenbrand wrote:

On 31.05.22 22:00, Alex Sierra wrote:

Device memory that is cache coherent from device and CPU point of view.
This is used on platforms that have an advanced system bus (like CAPI
or CXL). Any page of a process can be migrated to such memory. However,
no one should be allowed to pin such memory so that it can always be
evicted.

Signed-off-by: Alex Sierra 
Acked-by: Felix Kuehling 
Reviewed-by: Alistair Popple 
[hch: rebased ontop of the refcount changes,
 removed is_dev_private_or_coherent_page]
Signed-off-by: Christoph Hellwig 
---
include/linux/memremap.h | 19 +++
mm/memcontrol.c  |  7 ---
mm/memory-failure.c  |  8 ++--
mm/memremap.c| 10 ++
mm/migrate_device.c  | 16 +++-
mm/rmap.c|  5 +++--
6 files changed, 49 insertions(+), 16 deletions(-)

diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 8af304f6b504..9f752ebed613 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -41,6 +41,13 @@ struct vmem_altmap {
 * A more complete discussion of unaddressable memory may be found in
 * include/linux/hmm.h and Documentation/vm/hmm.rst.
 *
+ * MEMORY_DEVICE_COHERENT:
+ * Device memory that is cache coherent from device and CPU point of view. This
+ * is used on platforms that have an advanced system bus (like CAPI or CXL). A
+ * driver can hotplug the device memory using ZONE_DEVICE and with that memory
+ * type. Any page of a process can be migrated to such memory. However no one

Any page might not be right, I'm pretty sure. ... just thinking about special 
pages
like vdso, shared zeropage, ... pinned pages ...

Well, you cannot migrate long term pages, that's what I meant :)


+ * should be allowed to pin such memory so that it can always be evicted.
+ *
 * MEMORY_DEVICE_FS_DAX:
 * Host memory that has similar access semantics as System RAM i.e. DMA
 * coherent and supports page pinning. In support of coordinating page
@@ -61,6 +68,7 @@ struct vmem_altmap {
enum memory_type {
/* 0 is reserved to catch uninitialized type fields */
MEMORY_DEVICE_PRIVATE = 1,
+   MEMORY_DEVICE_COHERENT,
MEMORY_DEVICE_FS_DAX,
MEMORY_DEVICE_GENERIC,
MEMORY_DEVICE_PCI_P2PDMA,
@@ -143,6 +151,17 @@ static inline bool folio_is_device_private(const struct 
folio *folio)

In general, this LGTM, and it should be correct with PageAnonExclusive I think.


However, where exactly is pinning forbidden?

Long-term pinning is forbidden since it would interfere with the device
memory manager owning the
device-coherent pages (e.g. evictions in TTM). However, normal pinning
is allowed on this device type.

I don't see updates to folio_is_pinnable() in this patch.

Device coherent type pages should return true here, as they are pinnable
pages.

That function is only called for long-term pinnings in try_grab_folio().


So wouldn't try_grab_folio() simply pin these pages? What am I missing?

As far as I understand this return NULL for long term pin pages.
Otherwise they get refcount incremented.

I don't follow.

You're saying

a) folio_is_pinnable() returns true for device coherent pages

and that

b) device coherent pages don't get long-term pinned


Yet, the code says

struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags)
{
if (flags & FOLL_GET)
return try_get_folio(page, refs);
else if (flags & FOLL_PIN) {
struct folio *folio;

/*
 * Can't do FOLL_LONGTERM + FOLL_PIN gup fast path if not in a
 * right zone, so fail and let the caller fall back to the slow
 * path.
 */
if (unlikely((flags & FOLL_LONGTERM) &&
 !is_pinnable_page(page)))
return NULL;
...
return folio;
}
}


What prevents these pages from getting long-term pinned as stated in this patch?


Long-term pinning is handled by __gup_longterm_locked, which migrates 
pages returned by __get_user_pages_locked that cannot be long-term 
pinned. try_grab_folio is OK to grab the pages. Anything that can't be 
long-term pinned will be migrated afterwards, and 
__get_user_pages_locked will be retried. The migration of 
DEVICE_COHERENT pages was implemented by Alistair in patch 5/13 
("mm/gup: migrate device coherent pages when pinning instead of failing").


Regards,
  Felix




I am probably missing something important.

P.S.: I'm on vacation and looking at a tiny screen. Hope I didn't miss 
anything myself.


Re: [PATCH v5 01/13] mm: add zone device coherent type memory support

2022-06-20 Thread Oded Gabbay
On Mon, Jun 20, 2022 at 11:50 AM Alistair Popple  wrote:
>
>
> Oded Gabbay  writes:
>
> > On Mon, Jun 20, 2022 at 3:33 AM Alistair Popple  wrote:
> >>
> >>
> >> Oded Gabbay  writes:
> >>
> >> > On Fri, Jun 17, 2022 at 8:20 PM Sierra Guiza, Alejandro (Alex)
> >> >  wrote:
> >> >>
> >> >>
> >> >> On 6/17/2022 4:40 AM, David Hildenbrand wrote:
> >> >> > On 31.05.22 22:00, Alex Sierra wrote:
> >> >> >> Device memory that is cache coherent from device and CPU point of 
> >> >> >> view.
> >> >> >> This is used on platforms that have an advanced system bus (like CAPI
> >> >> >> or CXL). Any page of a process can be migrated to such memory. 
> >> >> >> However,
> >> >> >> no one should be allowed to pin such memory so that it can always be
> >> >> >> evicted.
> >> >> >>
> >> >> >> Signed-off-by: Alex Sierra 
> >> >> >> Acked-by: Felix Kuehling 
> >> >> >> Reviewed-by: Alistair Popple 
> >> >> >> [hch: rebased ontop of the refcount changes,
> >> >> >>removed is_dev_private_or_coherent_page]
> >> >> >> Signed-off-by: Christoph Hellwig 
> >> >> >> ---
> >> >> >>   include/linux/memremap.h | 19 +++
> >> >> >>   mm/memcontrol.c  |  7 ---
> >> >> >>   mm/memory-failure.c  |  8 ++--
> >> >> >>   mm/memremap.c| 10 ++
> >> >> >>   mm/migrate_device.c  | 16 +++-
> >> >> >>   mm/rmap.c|  5 +++--
> >> >> >>   6 files changed, 49 insertions(+), 16 deletions(-)
> >> >> >>
> >> >> >> diff --git a/include/linux/memremap.h b/include/linux/memremap.h
> >> >> >> index 8af304f6b504..9f752ebed613 100644
> >> >> >> --- a/include/linux/memremap.h
> >> >> >> +++ b/include/linux/memremap.h
> >> >> >> @@ -41,6 +41,13 @@ struct vmem_altmap {
> >> >> >>* A more complete discussion of unaddressable memory may be found 
> >> >> >> in
> >> >> >>* include/linux/hmm.h and Documentation/vm/hmm.rst.
> >> >> >>*
> >> >> >> + * MEMORY_DEVICE_COHERENT:
> >> >> >> + * Device memory that is cache coherent from device and CPU point 
> >> >> >> of view. This
> >> >> >> + * is used on platforms that have an advanced system bus (like CAPI 
> >> >> >> or CXL). A
> >> >> >> + * driver can hotplug the device memory using ZONE_DEVICE and with 
> >> >> >> that memory
> >> >> >> + * type. Any page of a process can be migrated to such memory. 
> >> >> >> However no one
> >> >> > Any page might not be right, I'm pretty sure. ... just thinking about 
> >> >> > special pages
> >> >> > like vdso, shared zeropage, ... pinned pages ...
> >> >>
> >> >> Hi David,
> >> >>
> >> >> Yes, I think you're right. This type does not cover all special pages.
> >> >> I need to correct that on the cover letter.
> >> >> Pinned pages are allowed as long as they're not long term pinned.
> >> >>
> >> >> Regards,
> >> >> Alex Sierra
> >> >
> >> > What if I want to hotplug this device's coherent memory, but I do
> >> > *not* want the OS
> >> > to migrate any page to it ?
> >> > I want to fully-control what resides on this memory, as I can consider
> >> > this memory
> >> > "expensive". i.e. I don't have a lot of it, I want to use it for
> >> > specific purposes and
> >> > I don't want the OS to start using it when there is some memory pressure 
> >> > in
> >> > the system.
> >>
> >> This is exactly what MEMORY_DEVICE_COHERENT is for. Device coherent
> >> pages are only allocated by a device driver and exposed to user-space by
> >> a driver migrating pages to them with migrate_vma. The OS can't just
> >> start using them due to memory pressure for example.
> >>
> >>  - Alistair
> > Thanks for the explanation.
> >
> > I guess the commit message confused me a bit, especially these two 
> > sentences:
> >
> > "Any page of a process can be migrated to such memory. However no one 
> > should be
> > allowed to pin such memory so that it can always be evicted."
> >
> > I read them as if the OS is free to choose which pages are migrated to
> > this memory,
> > and anything is eligible for migration to that memory (and that's why
> > we also don't
> > allow it to pin memory there).
> >
> > If we are not allowed to pin anything there, can the device driver
> > decide to disable
> > any option for oversubscription of this memory area ?
>
> I'm not sure I follow your thinking on how oversubscription would work
> here, however all allocations are controlled by the driver. So if a
> device's coherent memory is full a driver would be unable to migrate
> pages to that device until pages are freed by the OS due to being
> unmapped or the driver evicts pages by migrating them back to normal CPU
> memory.
>
> Pinning of pages is allowed, and could prevent such migrations. However
> this patch series prevents device coherent pages from being pinned
> longterm (ie. with FOLL_LONGTERM), so it should always be able to evict
> pages eventually.
>
> > Let's assume the user uses this memory area for doing p2p with other
> > CXL devices.
> > In that case, I wouldn't want the driver/OS to migrate pages in and
> 

Re: [PATCH v5 01/13] mm: add zone device coherent type memory support

2022-06-20 Thread Alistair Popple


Oded Gabbay  writes:

> On Mon, Jun 20, 2022 at 3:33 AM Alistair Popple  wrote:
>>
>>
>> Oded Gabbay  writes:
>>
>> > On Fri, Jun 17, 2022 at 8:20 PM Sierra Guiza, Alejandro (Alex)
>> >  wrote:
>> >>
>> >>
>> >> On 6/17/2022 4:40 AM, David Hildenbrand wrote:
>> >> > On 31.05.22 22:00, Alex Sierra wrote:
>> >> >> Device memory that is cache coherent from device and CPU point of view.
>> >> >> This is used on platforms that have an advanced system bus (like CAPI
>> >> >> or CXL). Any page of a process can be migrated to such memory. However,
>> >> >> no one should be allowed to pin such memory so that it can always be
>> >> >> evicted.
>> >> >>
>> >> >> Signed-off-by: Alex Sierra 
>> >> >> Acked-by: Felix Kuehling 
>> >> >> Reviewed-by: Alistair Popple 
>> >> >> [hch: rebased ontop of the refcount changes,
>> >> >>removed is_dev_private_or_coherent_page]
>> >> >> Signed-off-by: Christoph Hellwig 
>> >> >> ---
>> >> >>   include/linux/memremap.h | 19 +++
>> >> >>   mm/memcontrol.c  |  7 ---
>> >> >>   mm/memory-failure.c  |  8 ++--
>> >> >>   mm/memremap.c| 10 ++
>> >> >>   mm/migrate_device.c  | 16 +++-
>> >> >>   mm/rmap.c|  5 +++--
>> >> >>   6 files changed, 49 insertions(+), 16 deletions(-)
>> >> >>
>> >> >> diff --git a/include/linux/memremap.h b/include/linux/memremap.h
>> >> >> index 8af304f6b504..9f752ebed613 100644
>> >> >> --- a/include/linux/memremap.h
>> >> >> +++ b/include/linux/memremap.h
>> >> >> @@ -41,6 +41,13 @@ struct vmem_altmap {
>> >> >>* A more complete discussion of unaddressable memory may be found in
>> >> >>* include/linux/hmm.h and Documentation/vm/hmm.rst.
>> >> >>*
>> >> >> + * MEMORY_DEVICE_COHERENT:
>> >> >> + * Device memory that is cache coherent from device and CPU point of 
>> >> >> view. This
>> >> >> + * is used on platforms that have an advanced system bus (like CAPI 
>> >> >> or CXL). A
>> >> >> + * driver can hotplug the device memory using ZONE_DEVICE and with 
>> >> >> that memory
>> >> >> + * type. Any page of a process can be migrated to such memory. 
>> >> >> However no one
>> >> > Any page might not be right, I'm pretty sure. ... just thinking about 
>> >> > special pages
>> >> > like vdso, shared zeropage, ... pinned pages ...
>> >>
>> >> Hi David,
>> >>
>> >> Yes, I think you're right. This type does not cover all special pages.
>> >> I need to correct that on the cover letter.
>> >> Pinned pages are allowed as long as they're not long term pinned.
>> >>
>> >> Regards,
>> >> Alex Sierra
>> >
>> > What if I want to hotplug this device's coherent memory, but I do
>> > *not* want the OS
>> > to migrate any page to it ?
>> > I want to fully-control what resides on this memory, as I can consider
>> > this memory
>> > "expensive". i.e. I don't have a lot of it, I want to use it for
>> > specific purposes and
>> > I don't want the OS to start using it when there is some memory pressure in
>> > the system.
>>
>> This is exactly what MEMORY_DEVICE_COHERENT is for. Device coherent
>> pages are only allocated by a device driver and exposed to user-space by
>> a driver migrating pages to them with migrate_vma. The OS can't just
>> start using them due to memory pressure for example.
>>
>>  - Alistair
> Thanks for the explanation.
>
> I guess the commit message confused me a bit, especially these two sentences:
>
> "Any page of a process can be migrated to such memory. However no one should 
> be
> allowed to pin such memory so that it can always be evicted."
>
> I read them as if the OS is free to choose which pages are migrated to
> this memory,
> and anything is eligible for migration to that memory (and that's why
> we also don't
> allow it to pin memory there).
>
> If we are not allowed to pin anything there, can the device driver
> decide to disable
> any option for oversubscription of this memory area ?

I'm not sure I follow your thinking on how oversubscription would work
here, however all allocations are controlled by the driver. So if a
device's coherent memory is full a driver would be unable to migrate
pages to that device until pages are freed by the OS due to being
unmapped or the driver evicts pages by migrating them back to normal CPU
memory.

Pinning of pages is allowed, and could prevent such migrations. However
this patch series prevents device coherent pages from being pinned
longterm (ie. with FOLL_LONGTERM), so it should always be able to evict
pages eventually.

> Let's assume the user uses this memory area for doing p2p with other
> CXL devices.
> In that case, I wouldn't want the driver/OS to migrate pages in and
> out of that memory...

The OS will not migrate pages in or out (although it may free them if no
longer required), but a driver might choose to. So at the moment it's
really up to the driver to implement what you want in this regards.

> So either I should let the user pin those pages, or prevent him from
> doing (accidently or 

Re: [PATCH v5 01/13] mm: add zone device coherent type memory support

2022-06-20 Thread Oded Gabbay
On Mon, Jun 20, 2022 at 3:33 AM Alistair Popple  wrote:
>
>
> Oded Gabbay  writes:
>
> > On Fri, Jun 17, 2022 at 8:20 PM Sierra Guiza, Alejandro (Alex)
> >  wrote:
> >>
> >>
> >> On 6/17/2022 4:40 AM, David Hildenbrand wrote:
> >> > On 31.05.22 22:00, Alex Sierra wrote:
> >> >> Device memory that is cache coherent from device and CPU point of view.
> >> >> This is used on platforms that have an advanced system bus (like CAPI
> >> >> or CXL). Any page of a process can be migrated to such memory. However,
> >> >> no one should be allowed to pin such memory so that it can always be
> >> >> evicted.
> >> >>
> >> >> Signed-off-by: Alex Sierra 
> >> >> Acked-by: Felix Kuehling 
> >> >> Reviewed-by: Alistair Popple 
> >> >> [hch: rebased ontop of the refcount changes,
> >> >>removed is_dev_private_or_coherent_page]
> >> >> Signed-off-by: Christoph Hellwig 
> >> >> ---
> >> >>   include/linux/memremap.h | 19 +++
> >> >>   mm/memcontrol.c  |  7 ---
> >> >>   mm/memory-failure.c  |  8 ++--
> >> >>   mm/memremap.c| 10 ++
> >> >>   mm/migrate_device.c  | 16 +++-
> >> >>   mm/rmap.c|  5 +++--
> >> >>   6 files changed, 49 insertions(+), 16 deletions(-)
> >> >>
> >> >> diff --git a/include/linux/memremap.h b/include/linux/memremap.h
> >> >> index 8af304f6b504..9f752ebed613 100644
> >> >> --- a/include/linux/memremap.h
> >> >> +++ b/include/linux/memremap.h
> >> >> @@ -41,6 +41,13 @@ struct vmem_altmap {
> >> >>* A more complete discussion of unaddressable memory may be found in
> >> >>* include/linux/hmm.h and Documentation/vm/hmm.rst.
> >> >>*
> >> >> + * MEMORY_DEVICE_COHERENT:
> >> >> + * Device memory that is cache coherent from device and CPU point of 
> >> >> view. This
> >> >> + * is used on platforms that have an advanced system bus (like CAPI or 
> >> >> CXL). A
> >> >> + * driver can hotplug the device memory using ZONE_DEVICE and with 
> >> >> that memory
> >> >> + * type. Any page of a process can be migrated to such memory. However 
> >> >> no one
> >> > Any page might not be right, I'm pretty sure. ... just thinking about 
> >> > special pages
> >> > like vdso, shared zeropage, ... pinned pages ...
> >>
> >> Hi David,
> >>
> >> Yes, I think you're right. This type does not cover all special pages.
> >> I need to correct that on the cover letter.
> >> Pinned pages are allowed as long as they're not long term pinned.
> >>
> >> Regards,
> >> Alex Sierra
> >
> > What if I want to hotplug this device's coherent memory, but I do
> > *not* want the OS
> > to migrate any page to it ?
> > I want to fully-control what resides on this memory, as I can consider
> > this memory
> > "expensive". i.e. I don't have a lot of it, I want to use it for
> > specific purposes and
> > I don't want the OS to start using it when there is some memory pressure in
> > the system.
>
> This is exactly what MEMORY_DEVICE_COHERENT is for. Device coherent
> pages are only allocated by a device driver and exposed to user-space by
> a driver migrating pages to them with migrate_vma. The OS can't just
> start using them due to memory pressure for example.
>
>  - Alistair
Thanks for the explanation.

I guess the commit message confused me a bit, especially these two sentences:

"Any page of a process can be migrated to such memory. However no one should be
allowed to pin such memory so that it can always be evicted."

I read them as if the OS is free to choose which pages are migrated to
this memory,
and anything is eligible for migration to that memory (and that's why
we also don't
allow it to pin memory there).

If we are not allowed to pin anything there, can the device driver
decide to disable
any option for oversubscription of this memory area ?

Let's assume the user uses this memory area for doing p2p with other
CXL devices.
In that case, I wouldn't want the driver/OS to migrate pages in and
out of that memory...

So either I should let the user pin those pages, or prevent him from
doing (accidently or not)
oversubscription in this memory area.

wdyt ?

>
> > Oded
> >
> >>
> >> >
> >> >> + * should be allowed to pin such memory so that it can always be 
> >> >> evicted.
> >> >> + *
> >> >>* MEMORY_DEVICE_FS_DAX:
> >> >>* Host memory that has similar access semantics as System RAM i.e. 
> >> >> DMA
> >> >>* coherent and supports page pinning. In support of coordinating page
> >> >> @@ -61,6 +68,7 @@ struct vmem_altmap {
> >> >>   enum memory_type {
> >> >>  /* 0 is reserved to catch uninitialized type fields */
> >> >>  MEMORY_DEVICE_PRIVATE = 1,
> >> >> +MEMORY_DEVICE_COHERENT,
> >> >>  MEMORY_DEVICE_FS_DAX,
> >> >>  MEMORY_DEVICE_GENERIC,
> >> >>  MEMORY_DEVICE_PCI_P2PDMA,
> >> >> @@ -143,6 +151,17 @@ static inline bool folio_is_device_private(const 
> >> >> struct folio *folio)
> >> > In general, this LGTM, and it should be correct with PageAnonExclusive I 
> >> > think.
> >> >
> >> >
> >> > 

Re: [PATCH v5 01/13] mm: add zone device coherent type memory support

2022-06-19 Thread Alistair Popple


Oded Gabbay  writes:

> On Fri, Jun 17, 2022 at 8:20 PM Sierra Guiza, Alejandro (Alex)
>  wrote:
>>
>>
>> On 6/17/2022 4:40 AM, David Hildenbrand wrote:
>> > On 31.05.22 22:00, Alex Sierra wrote:
>> >> Device memory that is cache coherent from device and CPU point of view.
>> >> This is used on platforms that have an advanced system bus (like CAPI
>> >> or CXL). Any page of a process can be migrated to such memory. However,
>> >> no one should be allowed to pin such memory so that it can always be
>> >> evicted.
>> >>
>> >> Signed-off-by: Alex Sierra 
>> >> Acked-by: Felix Kuehling 
>> >> Reviewed-by: Alistair Popple 
>> >> [hch: rebased ontop of the refcount changes,
>> >>removed is_dev_private_or_coherent_page]
>> >> Signed-off-by: Christoph Hellwig 
>> >> ---
>> >>   include/linux/memremap.h | 19 +++
>> >>   mm/memcontrol.c  |  7 ---
>> >>   mm/memory-failure.c  |  8 ++--
>> >>   mm/memremap.c| 10 ++
>> >>   mm/migrate_device.c  | 16 +++-
>> >>   mm/rmap.c|  5 +++--
>> >>   6 files changed, 49 insertions(+), 16 deletions(-)
>> >>
>> >> diff --git a/include/linux/memremap.h b/include/linux/memremap.h
>> >> index 8af304f6b504..9f752ebed613 100644
>> >> --- a/include/linux/memremap.h
>> >> +++ b/include/linux/memremap.h
>> >> @@ -41,6 +41,13 @@ struct vmem_altmap {
>> >>* A more complete discussion of unaddressable memory may be found in
>> >>* include/linux/hmm.h and Documentation/vm/hmm.rst.
>> >>*
>> >> + * MEMORY_DEVICE_COHERENT:
>> >> + * Device memory that is cache coherent from device and CPU point of 
>> >> view. This
>> >> + * is used on platforms that have an advanced system bus (like CAPI or 
>> >> CXL). A
>> >> + * driver can hotplug the device memory using ZONE_DEVICE and with that 
>> >> memory
>> >> + * type. Any page of a process can be migrated to such memory. However 
>> >> no one
>> > Any page might not be right, I'm pretty sure. ... just thinking about 
>> > special pages
>> > like vdso, shared zeropage, ... pinned pages ...
>>
>> Hi David,
>>
>> Yes, I think you're right. This type does not cover all special pages.
>> I need to correct that on the cover letter.
>> Pinned pages are allowed as long as they're not long term pinned.
>>
>> Regards,
>> Alex Sierra
>
> What if I want to hotplug this device's coherent memory, but I do
> *not* want the OS
> to migrate any page to it ?
> I want to fully-control what resides on this memory, as I can consider
> this memory
> "expensive". i.e. I don't have a lot of it, I want to use it for
> specific purposes and
> I don't want the OS to start using it when there is some memory pressure in
> the system.

This is exactly what MEMORY_DEVICE_COHERENT is for. Device coherent
pages are only allocated by a device driver and exposed to user-space by
a driver migrating pages to them with migrate_vma. The OS can't just
start using them due to memory pressure for example.

 - Alistair

> Oded
>
>>
>> >
>> >> + * should be allowed to pin such memory so that it can always be evicted.
>> >> + *
>> >>* MEMORY_DEVICE_FS_DAX:
>> >>* Host memory that has similar access semantics as System RAM i.e. DMA
>> >>* coherent and supports page pinning. In support of coordinating page
>> >> @@ -61,6 +68,7 @@ struct vmem_altmap {
>> >>   enum memory_type {
>> >>  /* 0 is reserved to catch uninitialized type fields */
>> >>  MEMORY_DEVICE_PRIVATE = 1,
>> >> +MEMORY_DEVICE_COHERENT,
>> >>  MEMORY_DEVICE_FS_DAX,
>> >>  MEMORY_DEVICE_GENERIC,
>> >>  MEMORY_DEVICE_PCI_P2PDMA,
>> >> @@ -143,6 +151,17 @@ static inline bool folio_is_device_private(const 
>> >> struct folio *folio)
>> > In general, this LGTM, and it should be correct with PageAnonExclusive I 
>> > think.
>> >
>> >
>> > However, where exactly is pinning forbidden?
>>
>> Long-term pinning is forbidden since it would interfere with the device
>> memory manager owning the
>> device-coherent pages (e.g. evictions in TTM). However, normal pinning
>> is allowed on this device type.
>>
>> Regards,
>> Alex Sierra
>>
>> >


Re: [PATCH v5 01/13] mm: add zone device coherent type memory support

2022-06-18 Thread Oded Gabbay
On Fri, Jun 17, 2022 at 8:20 PM Sierra Guiza, Alejandro (Alex)
 wrote:
>
>
> On 6/17/2022 4:40 AM, David Hildenbrand wrote:
> > On 31.05.22 22:00, Alex Sierra wrote:
> >> Device memory that is cache coherent from device and CPU point of view.
> >> This is used on platforms that have an advanced system bus (like CAPI
> >> or CXL). Any page of a process can be migrated to such memory. However,
> >> no one should be allowed to pin such memory so that it can always be
> >> evicted.
> >>
> >> Signed-off-by: Alex Sierra 
> >> Acked-by: Felix Kuehling 
> >> Reviewed-by: Alistair Popple 
> >> [hch: rebased ontop of the refcount changes,
> >>removed is_dev_private_or_coherent_page]
> >> Signed-off-by: Christoph Hellwig 
> >> ---
> >>   include/linux/memremap.h | 19 +++
> >>   mm/memcontrol.c  |  7 ---
> >>   mm/memory-failure.c  |  8 ++--
> >>   mm/memremap.c| 10 ++
> >>   mm/migrate_device.c  | 16 +++-
> >>   mm/rmap.c|  5 +++--
> >>   6 files changed, 49 insertions(+), 16 deletions(-)
> >>
> >> diff --git a/include/linux/memremap.h b/include/linux/memremap.h
> >> index 8af304f6b504..9f752ebed613 100644
> >> --- a/include/linux/memremap.h
> >> +++ b/include/linux/memremap.h
> >> @@ -41,6 +41,13 @@ struct vmem_altmap {
> >>* A more complete discussion of unaddressable memory may be found in
> >>* include/linux/hmm.h and Documentation/vm/hmm.rst.
> >>*
> >> + * MEMORY_DEVICE_COHERENT:
> >> + * Device memory that is cache coherent from device and CPU point of 
> >> view. This
> >> + * is used on platforms that have an advanced system bus (like CAPI or 
> >> CXL). A
> >> + * driver can hotplug the device memory using ZONE_DEVICE and with that 
> >> memory
> >> + * type. Any page of a process can be migrated to such memory. However no 
> >> one
> > Any page might not be right, I'm pretty sure. ... just thinking about 
> > special pages
> > like vdso, shared zeropage, ... pinned pages ...
>
> Hi David,
>
> Yes, I think you're right. This type does not cover all special pages.
> I need to correct that on the cover letter.
> Pinned pages are allowed as long as they're not long term pinned.
>
> Regards,
> Alex Sierra

What if I want to hotplug this device's coherent memory, but I do
*not* want the OS
to migrate any page to it ?
I want to fully-control what resides on this memory, as I can consider
this memory
"expensive". i.e. I don't have a lot of it, I want to use it for
specific purposes and
I don't want the OS to start using it when there is some memory pressure in
the system.

Oded

>
> >
> >> + * should be allowed to pin such memory so that it can always be evicted.
> >> + *
> >>* MEMORY_DEVICE_FS_DAX:
> >>* Host memory that has similar access semantics as System RAM i.e. DMA
> >>* coherent and supports page pinning. In support of coordinating page
> >> @@ -61,6 +68,7 @@ struct vmem_altmap {
> >>   enum memory_type {
> >>  /* 0 is reserved to catch uninitialized type fields */
> >>  MEMORY_DEVICE_PRIVATE = 1,
> >> +MEMORY_DEVICE_COHERENT,
> >>  MEMORY_DEVICE_FS_DAX,
> >>  MEMORY_DEVICE_GENERIC,
> >>  MEMORY_DEVICE_PCI_P2PDMA,
> >> @@ -143,6 +151,17 @@ static inline bool folio_is_device_private(const 
> >> struct folio *folio)
> > In general, this LGTM, and it should be correct with PageAnonExclusive I 
> > think.
> >
> >
> > However, where exactly is pinning forbidden?
>
> Long-term pinning is forbidden since it would interfere with the device
> memory manager owning the
> device-coherent pages (e.g. evictions in TTM). However, normal pinning
> is allowed on this device type.
>
> Regards,
> Alex Sierra
>
> >


Re: [PATCH v5 01/13] mm: add zone device coherent type memory support

2022-06-17 Thread David Hildenbrand
On 17.06.22 21:27, Sierra Guiza, Alejandro (Alex) wrote:
> 
> On 6/17/2022 12:33 PM, David Hildenbrand wrote:
>> On 17.06.22 19:20, Sierra Guiza, Alejandro (Alex) wrote:
>>> On 6/17/2022 4:40 AM, David Hildenbrand wrote:
 On 31.05.22 22:00, Alex Sierra wrote:
> Device memory that is cache coherent from device and CPU point of view.
> This is used on platforms that have an advanced system bus (like CAPI
> or CXL). Any page of a process can be migrated to such memory. However,
> no one should be allowed to pin such memory so that it can always be
> evicted.
>
> Signed-off-by: Alex Sierra 
> Acked-by: Felix Kuehling 
> Reviewed-by: Alistair Popple 
> [hch: rebased ontop of the refcount changes,
> removed is_dev_private_or_coherent_page]
> Signed-off-by: Christoph Hellwig 
> ---
>include/linux/memremap.h | 19 +++
>mm/memcontrol.c  |  7 ---
>mm/memory-failure.c  |  8 ++--
>mm/memremap.c| 10 ++
>mm/migrate_device.c  | 16 +++-
>mm/rmap.c|  5 +++--
>6 files changed, 49 insertions(+), 16 deletions(-)
>
> diff --git a/include/linux/memremap.h b/include/linux/memremap.h
> index 8af304f6b504..9f752ebed613 100644
> --- a/include/linux/memremap.h
> +++ b/include/linux/memremap.h
> @@ -41,6 +41,13 @@ struct vmem_altmap {
> * A more complete discussion of unaddressable memory may be found in
> * include/linux/hmm.h and Documentation/vm/hmm.rst.
> *
> + * MEMORY_DEVICE_COHERENT:
> + * Device memory that is cache coherent from device and CPU point of 
> view. This
> + * is used on platforms that have an advanced system bus (like CAPI or 
> CXL). A
> + * driver can hotplug the device memory using ZONE_DEVICE and with that 
> memory
> + * type. Any page of a process can be migrated to such memory. However 
> no one
 Any page might not be right, I'm pretty sure. ... just thinking about 
 special pages
 like vdso, shared zeropage, ... pinned pages ...
>> Well, you cannot migrate long term pages, that's what I meant :)
>>
> + * should be allowed to pin such memory so that it can always be evicted.
> + *
> * MEMORY_DEVICE_FS_DAX:
> * Host memory that has similar access semantics as System RAM i.e. DMA
> * coherent and supports page pinning. In support of coordinating page
> @@ -61,6 +68,7 @@ struct vmem_altmap {
>enum memory_type {
>   /* 0 is reserved to catch uninitialized type fields */
>   MEMORY_DEVICE_PRIVATE = 1,
> + MEMORY_DEVICE_COHERENT,
>   MEMORY_DEVICE_FS_DAX,
>   MEMORY_DEVICE_GENERIC,
>   MEMORY_DEVICE_PCI_P2PDMA,
> @@ -143,6 +151,17 @@ static inline bool folio_is_device_private(const 
> struct folio *folio)
 In general, this LGTM, and it should be correct with PageAnonExclusive I 
 think.


 However, where exactly is pinning forbidden?
>>> Long-term pinning is forbidden since it would interfere with the device
>>> memory manager owning the
>>> device-coherent pages (e.g. evictions in TTM). However, normal pinning
>>> is allowed on this device type.
>> I don't see updates to folio_is_pinnable() in this patch.
> Device coherent type pages should return true here, as they are pinnable 
> pages.

That function is only called for long-term pinnings in try_grab_folio().

>>
>> So wouldn't try_grab_folio() simply pin these pages? What am I missing?
> 
> As far as I understand this return NULL for long term pin pages. 
> Otherwise they get refcount incremented.

I don't follow.

You're saying

a) folio_is_pinnable() returns true for device coherent pages

and that

b) device coherent pages don't get long-term pinned


Yet, the code says

struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags)
{
if (flags & FOLL_GET)
return try_get_folio(page, refs);
else if (flags & FOLL_PIN) {
struct folio *folio;

/*
 * Can't do FOLL_LONGTERM + FOLL_PIN gup fast path if not in a
 * right zone, so fail and let the caller fall back to the slow
 * path.
 */
if (unlikely((flags & FOLL_LONGTERM) &&
 !is_pinnable_page(page)))
return NULL;
...
return folio;
}
}


What prevents these pages from getting long-term pinned as stated in this patch?

I am probably missing something important.

-- 
Thanks,

David / dhildenb



Re: [PATCH v5 01/13] mm: add zone device coherent type memory support

2022-06-17 Thread Sierra Guiza, Alejandro (Alex)



On 6/17/2022 12:33 PM, David Hildenbrand wrote:

On 17.06.22 19:20, Sierra Guiza, Alejandro (Alex) wrote:

On 6/17/2022 4:40 AM, David Hildenbrand wrote:

On 31.05.22 22:00, Alex Sierra wrote:

Device memory that is cache coherent from device and CPU point of view.
This is used on platforms that have an advanced system bus (like CAPI
or CXL). Any page of a process can be migrated to such memory. However,
no one should be allowed to pin such memory so that it can always be
evicted.

Signed-off-by: Alex Sierra 
Acked-by: Felix Kuehling 
Reviewed-by: Alistair Popple 
[hch: rebased ontop of the refcount changes,
removed is_dev_private_or_coherent_page]
Signed-off-by: Christoph Hellwig 
---
   include/linux/memremap.h | 19 +++
   mm/memcontrol.c  |  7 ---
   mm/memory-failure.c  |  8 ++--
   mm/memremap.c| 10 ++
   mm/migrate_device.c  | 16 +++-
   mm/rmap.c|  5 +++--
   6 files changed, 49 insertions(+), 16 deletions(-)

diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 8af304f6b504..9f752ebed613 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -41,6 +41,13 @@ struct vmem_altmap {
* A more complete discussion of unaddressable memory may be found in
* include/linux/hmm.h and Documentation/vm/hmm.rst.
*
+ * MEMORY_DEVICE_COHERENT:
+ * Device memory that is cache coherent from device and CPU point of view. This
+ * is used on platforms that have an advanced system bus (like CAPI or CXL). A
+ * driver can hotplug the device memory using ZONE_DEVICE and with that memory
+ * type. Any page of a process can be migrated to such memory. However no one

Any page might not be right, I'm pretty sure. ... just thinking about special 
pages
like vdso, shared zeropage, ... pinned pages ...

Well, you cannot migrate long term pages, that's what I meant :)


+ * should be allowed to pin such memory so that it can always be evicted.
+ *
* MEMORY_DEVICE_FS_DAX:
* Host memory that has similar access semantics as System RAM i.e. DMA
* coherent and supports page pinning. In support of coordinating page
@@ -61,6 +68,7 @@ struct vmem_altmap {
   enum memory_type {
/* 0 is reserved to catch uninitialized type fields */
MEMORY_DEVICE_PRIVATE = 1,
+   MEMORY_DEVICE_COHERENT,
MEMORY_DEVICE_FS_DAX,
MEMORY_DEVICE_GENERIC,
MEMORY_DEVICE_PCI_P2PDMA,
@@ -143,6 +151,17 @@ static inline bool folio_is_device_private(const struct 
folio *folio)

In general, this LGTM, and it should be correct with PageAnonExclusive I think.


However, where exactly is pinning forbidden?

Long-term pinning is forbidden since it would interfere with the device
memory manager owning the
device-coherent pages (e.g. evictions in TTM). However, normal pinning
is allowed on this device type.

I don't see updates to folio_is_pinnable() in this patch.
Device coherent type pages should return true here, as they are pinnable 
pages.


So wouldn't try_grab_folio() simply pin these pages? What am I missing?


As far as I understand this return NULL for long term pin pages. 
Otherwise they get refcount incremented.


Regards,
Alex Sierra





Re: [PATCH v5 01/13] mm: add zone device coherent type memory support

2022-06-17 Thread David Hildenbrand
On 17.06.22 19:20, Sierra Guiza, Alejandro (Alex) wrote:
> 
> On 6/17/2022 4:40 AM, David Hildenbrand wrote:
>> On 31.05.22 22:00, Alex Sierra wrote:
>>> Device memory that is cache coherent from device and CPU point of view.
>>> This is used on platforms that have an advanced system bus (like CAPI
>>> or CXL). Any page of a process can be migrated to such memory. However,
>>> no one should be allowed to pin such memory so that it can always be
>>> evicted.
>>>
>>> Signed-off-by: Alex Sierra 
>>> Acked-by: Felix Kuehling 
>>> Reviewed-by: Alistair Popple 
>>> [hch: rebased ontop of the refcount changes,
>>>removed is_dev_private_or_coherent_page]
>>> Signed-off-by: Christoph Hellwig 
>>> ---
>>>   include/linux/memremap.h | 19 +++
>>>   mm/memcontrol.c  |  7 ---
>>>   mm/memory-failure.c  |  8 ++--
>>>   mm/memremap.c| 10 ++
>>>   mm/migrate_device.c  | 16 +++-
>>>   mm/rmap.c|  5 +++--
>>>   6 files changed, 49 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/include/linux/memremap.h b/include/linux/memremap.h
>>> index 8af304f6b504..9f752ebed613 100644
>>> --- a/include/linux/memremap.h
>>> +++ b/include/linux/memremap.h
>>> @@ -41,6 +41,13 @@ struct vmem_altmap {
>>>* A more complete discussion of unaddressable memory may be found in
>>>* include/linux/hmm.h and Documentation/vm/hmm.rst.
>>>*
>>> + * MEMORY_DEVICE_COHERENT:
>>> + * Device memory that is cache coherent from device and CPU point of view. 
>>> This
>>> + * is used on platforms that have an advanced system bus (like CAPI or 
>>> CXL). A
>>> + * driver can hotplug the device memory using ZONE_DEVICE and with that 
>>> memory
>>> + * type. Any page of a process can be migrated to such memory. However no 
>>> one
>> Any page might not be right, I'm pretty sure. ... just thinking about 
>> special pages
>> like vdso, shared zeropage, ... pinned pages ...
> 

Well, you cannot migrate long term pages, that's what I meant :)

>>
>>> + * should be allowed to pin such memory so that it can always be evicted.
>>> + *
>>>* MEMORY_DEVICE_FS_DAX:
>>>* Host memory that has similar access semantics as System RAM i.e. DMA
>>>* coherent and supports page pinning. In support of coordinating page
>>> @@ -61,6 +68,7 @@ struct vmem_altmap {
>>>   enum memory_type {
>>> /* 0 is reserved to catch uninitialized type fields */
>>> MEMORY_DEVICE_PRIVATE = 1,
>>> +   MEMORY_DEVICE_COHERENT,
>>> MEMORY_DEVICE_FS_DAX,
>>> MEMORY_DEVICE_GENERIC,
>>> MEMORY_DEVICE_PCI_P2PDMA,
>>> @@ -143,6 +151,17 @@ static inline bool folio_is_device_private(const 
>>> struct folio *folio)
>> In general, this LGTM, and it should be correct with PageAnonExclusive I 
>> think.
>>
>>
>> However, where exactly is pinning forbidden?
> 
> Long-term pinning is forbidden since it would interfere with the device 
> memory manager owning the
> device-coherent pages (e.g. evictions in TTM). However, normal pinning 
> is allowed on this device type.

I don't see updates to folio_is_pinnable() in this patch.

So wouldn't try_grab_folio() simply pin these pages? What am I missing?

-- 
Thanks,

David / dhildenb



Re: [PATCH v5 01/13] mm: add zone device coherent type memory support

2022-06-17 Thread Sierra Guiza, Alejandro (Alex)



On 6/17/2022 4:40 AM, David Hildenbrand wrote:

On 31.05.22 22:00, Alex Sierra wrote:

Device memory that is cache coherent from device and CPU point of view.
This is used on platforms that have an advanced system bus (like CAPI
or CXL). Any page of a process can be migrated to such memory. However,
no one should be allowed to pin such memory so that it can always be
evicted.

Signed-off-by: Alex Sierra 
Acked-by: Felix Kuehling 
Reviewed-by: Alistair Popple 
[hch: rebased ontop of the refcount changes,
   removed is_dev_private_or_coherent_page]
Signed-off-by: Christoph Hellwig 
---
  include/linux/memremap.h | 19 +++
  mm/memcontrol.c  |  7 ---
  mm/memory-failure.c  |  8 ++--
  mm/memremap.c| 10 ++
  mm/migrate_device.c  | 16 +++-
  mm/rmap.c|  5 +++--
  6 files changed, 49 insertions(+), 16 deletions(-)

diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 8af304f6b504..9f752ebed613 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -41,6 +41,13 @@ struct vmem_altmap {
   * A more complete discussion of unaddressable memory may be found in
   * include/linux/hmm.h and Documentation/vm/hmm.rst.
   *
+ * MEMORY_DEVICE_COHERENT:
+ * Device memory that is cache coherent from device and CPU point of view. This
+ * is used on platforms that have an advanced system bus (like CAPI or CXL). A
+ * driver can hotplug the device memory using ZONE_DEVICE and with that memory
+ * type. Any page of a process can be migrated to such memory. However no one

Any page might not be right, I'm pretty sure. ... just thinking about special 
pages
like vdso, shared zeropage, ... pinned pages ...


Hi David,

Yes, I think you're right. This type does not cover all special pages.  
I need to correct that on the cover letter.

Pinned pages are allowed as long as they're not long term pinned.

Regards,
Alex Sierra




+ * should be allowed to pin such memory so that it can always be evicted.
+ *
   * MEMORY_DEVICE_FS_DAX:
   * Host memory that has similar access semantics as System RAM i.e. DMA
   * coherent and supports page pinning. In support of coordinating page
@@ -61,6 +68,7 @@ struct vmem_altmap {
  enum memory_type {
/* 0 is reserved to catch uninitialized type fields */
MEMORY_DEVICE_PRIVATE = 1,
+   MEMORY_DEVICE_COHERENT,
MEMORY_DEVICE_FS_DAX,
MEMORY_DEVICE_GENERIC,
MEMORY_DEVICE_PCI_P2PDMA,
@@ -143,6 +151,17 @@ static inline bool folio_is_device_private(const struct 
folio *folio)

In general, this LGTM, and it should be correct with PageAnonExclusive I think.


However, where exactly is pinning forbidden?


Long-term pinning is forbidden since it would interfere with the device 
memory manager owning the
device-coherent pages (e.g. evictions in TTM). However, normal pinning 
is allowed on this device type.


Regards,
Alex Sierra





Re: [PATCH v5 01/13] mm: add zone device coherent type memory support

2022-06-17 Thread David Hildenbrand
On 31.05.22 22:00, Alex Sierra wrote:
> Device memory that is cache coherent from device and CPU point of view.
> This is used on platforms that have an advanced system bus (like CAPI
> or CXL). Any page of a process can be migrated to such memory. However,
> no one should be allowed to pin such memory so that it can always be
> evicted.
> 
> Signed-off-by: Alex Sierra 
> Acked-by: Felix Kuehling 
> Reviewed-by: Alistair Popple 
> [hch: rebased ontop of the refcount changes,
>   removed is_dev_private_or_coherent_page]
> Signed-off-by: Christoph Hellwig 
> ---
>  include/linux/memremap.h | 19 +++
>  mm/memcontrol.c  |  7 ---
>  mm/memory-failure.c  |  8 ++--
>  mm/memremap.c| 10 ++
>  mm/migrate_device.c  | 16 +++-
>  mm/rmap.c|  5 +++--
>  6 files changed, 49 insertions(+), 16 deletions(-)
> 
> diff --git a/include/linux/memremap.h b/include/linux/memremap.h
> index 8af304f6b504..9f752ebed613 100644
> --- a/include/linux/memremap.h
> +++ b/include/linux/memremap.h
> @@ -41,6 +41,13 @@ struct vmem_altmap {
>   * A more complete discussion of unaddressable memory may be found in
>   * include/linux/hmm.h and Documentation/vm/hmm.rst.
>   *
> + * MEMORY_DEVICE_COHERENT:
> + * Device memory that is cache coherent from device and CPU point of view. 
> This
> + * is used on platforms that have an advanced system bus (like CAPI or CXL). 
> A
> + * driver can hotplug the device memory using ZONE_DEVICE and with that 
> memory
> + * type. Any page of a process can be migrated to such memory. However no one

Any page might not be right, I'm pretty sure. ... just thinking about special 
pages
like vdso, shared zeropage, ... pinned pages ...

> + * should be allowed to pin such memory so that it can always be evicted.
> + *
>   * MEMORY_DEVICE_FS_DAX:
>   * Host memory that has similar access semantics as System RAM i.e. DMA
>   * coherent and supports page pinning. In support of coordinating page
> @@ -61,6 +68,7 @@ struct vmem_altmap {
>  enum memory_type {
>   /* 0 is reserved to catch uninitialized type fields */
>   MEMORY_DEVICE_PRIVATE = 1,
> + MEMORY_DEVICE_COHERENT,
>   MEMORY_DEVICE_FS_DAX,
>   MEMORY_DEVICE_GENERIC,
>   MEMORY_DEVICE_PCI_P2PDMA,
> @@ -143,6 +151,17 @@ static inline bool folio_is_device_private(const struct 
> folio *folio)

In general, this LGTM, and it should be correct with PageAnonExclusive I think.


However, where exactly is pinning forbidden?

-- 
Thanks,

David / dhildenb



[PATCH v5 01/13] mm: add zone device coherent type memory support

2022-05-31 Thread Alex Sierra
Device memory that is cache coherent from device and CPU point of view.
This is used on platforms that have an advanced system bus (like CAPI
or CXL). Any page of a process can be migrated to such memory. However,
no one should be allowed to pin such memory so that it can always be
evicted.

Signed-off-by: Alex Sierra 
Acked-by: Felix Kuehling 
Reviewed-by: Alistair Popple 
[hch: rebased ontop of the refcount changes,
  removed is_dev_private_or_coherent_page]
Signed-off-by: Christoph Hellwig 
---
 include/linux/memremap.h | 19 +++
 mm/memcontrol.c  |  7 ---
 mm/memory-failure.c  |  8 ++--
 mm/memremap.c| 10 ++
 mm/migrate_device.c  | 16 +++-
 mm/rmap.c|  5 +++--
 6 files changed, 49 insertions(+), 16 deletions(-)

diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 8af304f6b504..9f752ebed613 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -41,6 +41,13 @@ struct vmem_altmap {
  * A more complete discussion of unaddressable memory may be found in
  * include/linux/hmm.h and Documentation/vm/hmm.rst.
  *
+ * MEMORY_DEVICE_COHERENT:
+ * Device memory that is cache coherent from device and CPU point of view. This
+ * is used on platforms that have an advanced system bus (like CAPI or CXL). A
+ * driver can hotplug the device memory using ZONE_DEVICE and with that memory
+ * type. Any page of a process can be migrated to such memory. However no one
+ * should be allowed to pin such memory so that it can always be evicted.
+ *
  * MEMORY_DEVICE_FS_DAX:
  * Host memory that has similar access semantics as System RAM i.e. DMA
  * coherent and supports page pinning. In support of coordinating page
@@ -61,6 +68,7 @@ struct vmem_altmap {
 enum memory_type {
/* 0 is reserved to catch uninitialized type fields */
MEMORY_DEVICE_PRIVATE = 1,
+   MEMORY_DEVICE_COHERENT,
MEMORY_DEVICE_FS_DAX,
MEMORY_DEVICE_GENERIC,
MEMORY_DEVICE_PCI_P2PDMA,
@@ -143,6 +151,17 @@ static inline bool folio_is_device_private(const struct 
folio *folio)
return is_device_private_page(>page);
 }
 
+static inline bool is_device_coherent_page(const struct page *page)
+{
+   return is_zone_device_page(page) &&
+   page->pgmap->type == MEMORY_DEVICE_COHERENT;
+}
+
+static inline bool folio_is_device_coherent(const struct folio *folio)
+{
+   return is_device_coherent_page(>page);
+}
+
 static inline bool is_pci_p2pdma_page(const struct page *page)
 {
return IS_ENABLED(CONFIG_PCI_P2PDMA) &&
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index abec50f31fe6..93f80d7ca148 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5665,8 +5665,8 @@ static int mem_cgroup_move_account(struct page *page,
  *   2(MC_TARGET_SWAP): if the swap entry corresponding to this pte is a
  * target for charge migration. if @target is not NULL, the entry is stored
  * in target->ent.
- *   3(MC_TARGET_DEVICE): like MC_TARGET_PAGE  but page is 
MEMORY_DEVICE_PRIVATE
- * (so ZONE_DEVICE page and thus not on the lru).
+ *   3(MC_TARGET_DEVICE): like MC_TARGET_PAGE  but page is device memory and
+ *   thus not on the lru.
  * For now we such page is charge like a regular page would be as for all
  * intent and purposes it is just special memory taking the place of a
  * regular page.
@@ -5704,7 +5704,8 @@ static enum mc_target_type get_mctgt_type(struct 
vm_area_struct *vma,
 */
if (page_memcg(page) == mc.from) {
ret = MC_TARGET_PAGE;
-   if (is_device_private_page(page))
+   if (is_device_private_page(page) ||
+   is_device_coherent_page(page))
ret = MC_TARGET_DEVICE;
if (target)
target->page = page;
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index b85661cbdc4a..0b6a0a01ee09 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1683,12 +1683,16 @@ static int memory_failure_dev_pagemap(unsigned long 
pfn, int flags,
goto unlock;
}
 
-   if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
+   switch (pgmap->type) {
+   case MEMORY_DEVICE_PRIVATE:
+   case MEMORY_DEVICE_COHERENT:
/*
-* TODO: Handle HMM pages which may need coordination
+* TODO: Handle device pages which may need coordination
 * with device-side memory.
 */
goto unlock;
+   default:
+   break;
}
 
/*
diff --git a/mm/memremap.c b/mm/memremap.c
index 2b92e97cb25b..dbd2631b3520 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -315,6 +315,16 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid)
return ERR_PTR(-EINVAL);
}
break;
+   case