Re: [PATCH v5 01/13] mm: add zone device coherent type memory support
On 6/23/2022 1:21 PM, David Hildenbrand wrote: On 23.06.22 20:20, Sierra Guiza, Alejandro (Alex) wrote: On 6/23/2022 2:57 AM, David Hildenbrand wrote: On 23.06.22 01:16, Sierra Guiza, Alejandro (Alex) wrote: On 6/21/2022 11:16 AM, David Hildenbrand wrote: On 21.06.22 18:08, Sierra Guiza, Alejandro (Alex) wrote: On 6/21/2022 7:25 AM, David Hildenbrand wrote: On 21.06.22 13:55, Alistair Popple wrote: David Hildenbrand writes: On 21.06.22 13:25, Felix Kuehling wrote: Am 6/17/22 um 23:19 schrieb David Hildenbrand: On 17.06.22 21:27, Sierra Guiza, Alejandro (Alex) wrote: On 6/17/2022 12:33 PM, David Hildenbrand wrote: On 17.06.22 19:20, Sierra Guiza, Alejandro (Alex) wrote: On 6/17/2022 4:40 AM, David Hildenbrand wrote: On 31.05.22 22:00, Alex Sierra wrote: Device memory that is cache coherent from device and CPU point of view. This is used on platforms that have an advanced system bus (like CAPI or CXL). Any page of a process can be migrated to such memory. However, no one should be allowed to pin such memory so that it can always be evicted. Signed-off-by: Alex Sierra Acked-by: Felix Kuehling Reviewed-by: Alistair Popple [hch: rebased ontop of the refcount changes, removed is_dev_private_or_coherent_page] Signed-off-by: Christoph Hellwig --- include/linux/memremap.h | 19 +++ mm/memcontrol.c | 7 --- mm/memory-failure.c | 8 ++-- mm/memremap.c| 10 ++ mm/migrate_device.c | 16 +++- mm/rmap.c| 5 +++-- 6 files changed, 49 insertions(+), 16 deletions(-) diff --git a/include/linux/memremap.h b/include/linux/memremap.h index 8af304f6b504..9f752ebed613 100644 --- a/include/linux/memremap.h +++ b/include/linux/memremap.h @@ -41,6 +41,13 @@ struct vmem_altmap { * A more complete discussion of unaddressable memory may be found in * include/linux/hmm.h and Documentation/vm/hmm.rst. * + * MEMORY_DEVICE_COHERENT: + * Device memory that is cache coherent from device and CPU point of view. This + * is used on platforms that have an advanced system bus (like CAPI or CXL). A + * driver can hotplug the device memory using ZONE_DEVICE and with that memory + * type. Any page of a process can be migrated to such memory. However no one Any page might not be right, I'm pretty sure. ... just thinking about special pages like vdso, shared zeropage, ... pinned pages ... Well, you cannot migrate long term pages, that's what I meant :) + * should be allowed to pin such memory so that it can always be evicted. + * * MEMORY_DEVICE_FS_DAX: * Host memory that has similar access semantics as System RAM i.e. DMA * coherent and supports page pinning. In support of coordinating page @@ -61,6 +68,7 @@ struct vmem_altmap { enum memory_type { /* 0 is reserved to catch uninitialized type fields */ MEMORY_DEVICE_PRIVATE = 1, + MEMORY_DEVICE_COHERENT, MEMORY_DEVICE_FS_DAX, MEMORY_DEVICE_GENERIC, MEMORY_DEVICE_PCI_P2PDMA, @@ -143,6 +151,17 @@ static inline bool folio_is_device_private(const struct folio *folio) In general, this LGTM, and it should be correct with PageAnonExclusive I think. However, where exactly is pinning forbidden? Long-term pinning is forbidden since it would interfere with the device memory manager owning the device-coherent pages (e.g. evictions in TTM). However, normal pinning is allowed on this device type. I don't see updates to folio_is_pinnable() in this patch. Device coherent type pages should return true here, as they are pinnable pages. That function is only called for long-term pinnings in try_grab_folio(). So wouldn't try_grab_folio() simply pin these pages? What am I missing? As far as I understand this return NULL for long term pin pages. Otherwise they get refcount incremented. I don't follow. You're saying a) folio_is_pinnable() returns true for device coherent pages and that b) device coherent pages don't get long-term pinned Yet, the code says struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags) { if (flags & FOLL_GET) return try_get_folio(page, refs); else if (flags & FOLL_PIN) { struct folio *folio; /* * Can't do FOLL_LONGTERM + FOLL_PIN gup fast path if not in a * right zone, so fail and let the caller fall back to the slow * path. */ if (unlikely((flags & FOLL_LONGTERM) && !is_pinnable_page(page))) return NULL; ... return folio; } } What prevents these pages from getting long-term pinned as stated in this patch? Long-term pinning is handled by __gup_longterm_locked, which migrates pages returned
Re: [PATCH v5 01/13] mm: add zone device coherent type memory support
On 23.06.22 20:20, Sierra Guiza, Alejandro (Alex) wrote: > > On 6/23/2022 2:57 AM, David Hildenbrand wrote: >> On 23.06.22 01:16, Sierra Guiza, Alejandro (Alex) wrote: >>> On 6/21/2022 11:16 AM, David Hildenbrand wrote: On 21.06.22 18:08, Sierra Guiza, Alejandro (Alex) wrote: > On 6/21/2022 7:25 AM, David Hildenbrand wrote: >> On 21.06.22 13:55, Alistair Popple wrote: >>> David Hildenbrand writes: >>> On 21.06.22 13:25, Felix Kuehling wrote: > Am 6/17/22 um 23:19 schrieb David Hildenbrand: >> On 17.06.22 21:27, Sierra Guiza, Alejandro (Alex) wrote: >>> On 6/17/2022 12:33 PM, David Hildenbrand wrote: On 17.06.22 19:20, Sierra Guiza, Alejandro (Alex) wrote: > On 6/17/2022 4:40 AM, David Hildenbrand wrote: >> On 31.05.22 22:00, Alex Sierra wrote: >>> Device memory that is cache coherent from device and CPU point >>> of view. >>> This is used on platforms that have an advanced system bus >>> (like CAPI >>> or CXL). Any page of a process can be migrated to such memory. >>> However, >>> no one should be allowed to pin such memory so that it can >>> always be >>> evicted. >>> >>> Signed-off-by: Alex Sierra >>> Acked-by: Felix Kuehling >>> Reviewed-by: Alistair Popple >>> [hch: rebased ontop of the refcount changes, >>> removed is_dev_private_or_coherent_page] >>> Signed-off-by: Christoph Hellwig >>> --- >>>include/linux/memremap.h | 19 +++ >>>mm/memcontrol.c | 7 --- >>>mm/memory-failure.c | 8 ++-- >>>mm/memremap.c| 10 ++ >>>mm/migrate_device.c | 16 +++- >>>mm/rmap.c| 5 +++-- >>>6 files changed, 49 insertions(+), 16 deletions(-) >>> >>> diff --git a/include/linux/memremap.h b/include/linux/memremap.h >>> index 8af304f6b504..9f752ebed613 100644 >>> --- a/include/linux/memremap.h >>> +++ b/include/linux/memremap.h >>> @@ -41,6 +41,13 @@ struct vmem_altmap { >>> * A more complete discussion of unaddressable memory >>> may be found in >>> * include/linux/hmm.h and Documentation/vm/hmm.rst. >>> * >>> + * MEMORY_DEVICE_COHERENT: >>> + * Device memory that is cache coherent from device and CPU >>> point of view. This >>> + * is used on platforms that have an advanced system bus (like >>> CAPI or CXL). A >>> + * driver can hotplug the device memory using ZONE_DEVICE and >>> with that memory >>> + * type. Any page of a process can be migrated to such memory. >>> However no one >> Any page might not be right, I'm pretty sure. ... just thinking >> about special pages >> like vdso, shared zeropage, ... pinned pages ... Well, you cannot migrate long term pages, that's what I meant :) >>> + * should be allowed to pin such memory so that it can always >>> be evicted. >>> + * >>> * MEMORY_DEVICE_FS_DAX: >>> * Host memory that has similar access semantics as >>> System RAM i.e. DMA >>> * coherent and supports page pinning. In support of >>> coordinating page >>> @@ -61,6 +68,7 @@ struct vmem_altmap { >>>enum memory_type { >>> /* 0 is reserved to catch uninitialized type fields */ >>> MEMORY_DEVICE_PRIVATE = 1, >>> + MEMORY_DEVICE_COHERENT, >>> MEMORY_DEVICE_FS_DAX, >>> MEMORY_DEVICE_GENERIC, >>> MEMORY_DEVICE_PCI_P2PDMA, >>> @@ -143,6 +151,17 @@ static inline bool >>> folio_is_device_private(const struct folio *folio) >> In general, this LGTM, and it should be correct with >> PageAnonExclusive I think. >> >> >> However, where exactly is pinning forbidden? > Long-term pinning is forbidden since it would interfere with the > device > memory manager owning the > device-coherent pages (e.g. evictions in TTM). However, normal > pinning > is allowed on this device type. I don't see updates to folio_is_pinnable() in this patch. >>> Device coherent type pages should return true here, as they are >>> pinnable >>> pages.
Re: [PATCH v5 01/13] mm: add zone device coherent type memory support
On 6/23/2022 2:57 AM, David Hildenbrand wrote: On 23.06.22 01:16, Sierra Guiza, Alejandro (Alex) wrote: On 6/21/2022 11:16 AM, David Hildenbrand wrote: On 21.06.22 18:08, Sierra Guiza, Alejandro (Alex) wrote: On 6/21/2022 7:25 AM, David Hildenbrand wrote: On 21.06.22 13:55, Alistair Popple wrote: David Hildenbrand writes: On 21.06.22 13:25, Felix Kuehling wrote: Am 6/17/22 um 23:19 schrieb David Hildenbrand: On 17.06.22 21:27, Sierra Guiza, Alejandro (Alex) wrote: On 6/17/2022 12:33 PM, David Hildenbrand wrote: On 17.06.22 19:20, Sierra Guiza, Alejandro (Alex) wrote: On 6/17/2022 4:40 AM, David Hildenbrand wrote: On 31.05.22 22:00, Alex Sierra wrote: Device memory that is cache coherent from device and CPU point of view. This is used on platforms that have an advanced system bus (like CAPI or CXL). Any page of a process can be migrated to such memory. However, no one should be allowed to pin such memory so that it can always be evicted. Signed-off-by: Alex Sierra Acked-by: Felix Kuehling Reviewed-by: Alistair Popple [hch: rebased ontop of the refcount changes, removed is_dev_private_or_coherent_page] Signed-off-by: Christoph Hellwig --- include/linux/memremap.h | 19 +++ mm/memcontrol.c | 7 --- mm/memory-failure.c | 8 ++-- mm/memremap.c| 10 ++ mm/migrate_device.c | 16 +++- mm/rmap.c| 5 +++-- 6 files changed, 49 insertions(+), 16 deletions(-) diff --git a/include/linux/memremap.h b/include/linux/memremap.h index 8af304f6b504..9f752ebed613 100644 --- a/include/linux/memremap.h +++ b/include/linux/memremap.h @@ -41,6 +41,13 @@ struct vmem_altmap { * A more complete discussion of unaddressable memory may be found in * include/linux/hmm.h and Documentation/vm/hmm.rst. * + * MEMORY_DEVICE_COHERENT: + * Device memory that is cache coherent from device and CPU point of view. This + * is used on platforms that have an advanced system bus (like CAPI or CXL). A + * driver can hotplug the device memory using ZONE_DEVICE and with that memory + * type. Any page of a process can be migrated to such memory. However no one Any page might not be right, I'm pretty sure. ... just thinking about special pages like vdso, shared zeropage, ... pinned pages ... Well, you cannot migrate long term pages, that's what I meant :) + * should be allowed to pin such memory so that it can always be evicted. + * * MEMORY_DEVICE_FS_DAX: * Host memory that has similar access semantics as System RAM i.e. DMA * coherent and supports page pinning. In support of coordinating page @@ -61,6 +68,7 @@ struct vmem_altmap { enum memory_type { /* 0 is reserved to catch uninitialized type fields */ MEMORY_DEVICE_PRIVATE = 1, + MEMORY_DEVICE_COHERENT, MEMORY_DEVICE_FS_DAX, MEMORY_DEVICE_GENERIC, MEMORY_DEVICE_PCI_P2PDMA, @@ -143,6 +151,17 @@ static inline bool folio_is_device_private(const struct folio *folio) In general, this LGTM, and it should be correct with PageAnonExclusive I think. However, where exactly is pinning forbidden? Long-term pinning is forbidden since it would interfere with the device memory manager owning the device-coherent pages (e.g. evictions in TTM). However, normal pinning is allowed on this device type. I don't see updates to folio_is_pinnable() in this patch. Device coherent type pages should return true here, as they are pinnable pages. That function is only called for long-term pinnings in try_grab_folio(). So wouldn't try_grab_folio() simply pin these pages? What am I missing? As far as I understand this return NULL for long term pin pages. Otherwise they get refcount incremented. I don't follow. You're saying a) folio_is_pinnable() returns true for device coherent pages and that b) device coherent pages don't get long-term pinned Yet, the code says struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags) { if (flags & FOLL_GET) return try_get_folio(page, refs); else if (flags & FOLL_PIN) { struct folio *folio; /* * Can't do FOLL_LONGTERM + FOLL_PIN gup fast path if not in a * right zone, so fail and let the caller fall back to the slow * path. */ if (unlikely((flags & FOLL_LONGTERM) && !is_pinnable_page(page))) return NULL; ... return folio; } } What prevents these pages from getting long-term pinned as stated in this patch? Long-term pinning is handled by __gup_longterm_locked, which migrates pages returned by __get_user_pages_locked that cannot be long-term pinned. try_grab_folio is OK to grab the pages. Anything that can't be long-term pinned will be migrated
Re: [PATCH v5 01/13] mm: add zone device coherent type memory support
On 23.06.22 01:16, Sierra Guiza, Alejandro (Alex) wrote: > > On 6/21/2022 11:16 AM, David Hildenbrand wrote: >> On 21.06.22 18:08, Sierra Guiza, Alejandro (Alex) wrote: >>> On 6/21/2022 7:25 AM, David Hildenbrand wrote: On 21.06.22 13:55, Alistair Popple wrote: > David Hildenbrand writes: > >> On 21.06.22 13:25, Felix Kuehling wrote: >>> Am 6/17/22 um 23:19 schrieb David Hildenbrand: On 17.06.22 21:27, Sierra Guiza, Alejandro (Alex) wrote: > On 6/17/2022 12:33 PM, David Hildenbrand wrote: >> On 17.06.22 19:20, Sierra Guiza, Alejandro (Alex) wrote: >>> On 6/17/2022 4:40 AM, David Hildenbrand wrote: On 31.05.22 22:00, Alex Sierra wrote: > Device memory that is cache coherent from device and CPU point of > view. > This is used on platforms that have an advanced system bus (like > CAPI > or CXL). Any page of a process can be migrated to such memory. > However, > no one should be allowed to pin such memory so that it can always > be > evicted. > > Signed-off-by: Alex Sierra > Acked-by: Felix Kuehling > Reviewed-by: Alistair Popple > [hch: rebased ontop of the refcount changes, >removed is_dev_private_or_coherent_page] > Signed-off-by: Christoph Hellwig > --- > include/linux/memremap.h | 19 +++ > mm/memcontrol.c | 7 --- > mm/memory-failure.c | 8 ++-- > mm/memremap.c| 10 ++ > mm/migrate_device.c | 16 +++- > mm/rmap.c| 5 +++-- > 6 files changed, 49 insertions(+), 16 deletions(-) > > diff --git a/include/linux/memremap.h b/include/linux/memremap.h > index 8af304f6b504..9f752ebed613 100644 > --- a/include/linux/memremap.h > +++ b/include/linux/memremap.h > @@ -41,6 +41,13 @@ struct vmem_altmap { >* A more complete discussion of unaddressable memory may > be found in >* include/linux/hmm.h and Documentation/vm/hmm.rst. >* > + * MEMORY_DEVICE_COHERENT: > + * Device memory that is cache coherent from device and CPU > point of view. This > + * is used on platforms that have an advanced system bus (like > CAPI or CXL). A > + * driver can hotplug the device memory using ZONE_DEVICE and > with that memory > + * type. Any page of a process can be migrated to such memory. > However no one Any page might not be right, I'm pretty sure. ... just thinking about special pages like vdso, shared zeropage, ... pinned pages ... >> Well, you cannot migrate long term pages, that's what I meant :) >> > + * should be allowed to pin such memory so that it can always be > evicted. > + * >* MEMORY_DEVICE_FS_DAX: >* Host memory that has similar access semantics as System > RAM i.e. DMA >* coherent and supports page pinning. In support of > coordinating page > @@ -61,6 +68,7 @@ struct vmem_altmap { > enum memory_type { > /* 0 is reserved to catch uninitialized type fields */ > MEMORY_DEVICE_PRIVATE = 1, > + MEMORY_DEVICE_COHERENT, > MEMORY_DEVICE_FS_DAX, > MEMORY_DEVICE_GENERIC, > MEMORY_DEVICE_PCI_P2PDMA, > @@ -143,6 +151,17 @@ static inline bool > folio_is_device_private(const struct folio *folio) In general, this LGTM, and it should be correct with PageAnonExclusive I think. However, where exactly is pinning forbidden? >>> Long-term pinning is forbidden since it would interfere with the >>> device >>> memory manager owning the >>> device-coherent pages (e.g. evictions in TTM). However, normal >>> pinning >>> is allowed on this device type. >> I don't see updates to folio_is_pinnable() in this patch. > Device coherent type pages should return true here, as they are > pinnable > pages. That function is only called for long-term pinnings in try_grab_folio(). >> So wouldn't try_grab_folio() simply pin these pages? What am I >> missing? > As far as I understand this return NULL for long term pin pages. > Otherwise they get refcount
Re: [PATCH v5 01/13] mm: add zone device coherent type memory support
On 6/21/2022 11:16 AM, David Hildenbrand wrote: On 21.06.22 18:08, Sierra Guiza, Alejandro (Alex) wrote: On 6/21/2022 7:25 AM, David Hildenbrand wrote: On 21.06.22 13:55, Alistair Popple wrote: David Hildenbrand writes: On 21.06.22 13:25, Felix Kuehling wrote: Am 6/17/22 um 23:19 schrieb David Hildenbrand: On 17.06.22 21:27, Sierra Guiza, Alejandro (Alex) wrote: On 6/17/2022 12:33 PM, David Hildenbrand wrote: On 17.06.22 19:20, Sierra Guiza, Alejandro (Alex) wrote: On 6/17/2022 4:40 AM, David Hildenbrand wrote: On 31.05.22 22:00, Alex Sierra wrote: Device memory that is cache coherent from device and CPU point of view. This is used on platforms that have an advanced system bus (like CAPI or CXL). Any page of a process can be migrated to such memory. However, no one should be allowed to pin such memory so that it can always be evicted. Signed-off-by: Alex Sierra Acked-by: Felix Kuehling Reviewed-by: Alistair Popple [hch: rebased ontop of the refcount changes, removed is_dev_private_or_coherent_page] Signed-off-by: Christoph Hellwig --- include/linux/memremap.h | 19 +++ mm/memcontrol.c | 7 --- mm/memory-failure.c | 8 ++-- mm/memremap.c| 10 ++ mm/migrate_device.c | 16 +++- mm/rmap.c| 5 +++-- 6 files changed, 49 insertions(+), 16 deletions(-) diff --git a/include/linux/memremap.h b/include/linux/memremap.h index 8af304f6b504..9f752ebed613 100644 --- a/include/linux/memremap.h +++ b/include/linux/memremap.h @@ -41,6 +41,13 @@ struct vmem_altmap { * A more complete discussion of unaddressable memory may be found in * include/linux/hmm.h and Documentation/vm/hmm.rst. * + * MEMORY_DEVICE_COHERENT: + * Device memory that is cache coherent from device and CPU point of view. This + * is used on platforms that have an advanced system bus (like CAPI or CXL). A + * driver can hotplug the device memory using ZONE_DEVICE and with that memory + * type. Any page of a process can be migrated to such memory. However no one Any page might not be right, I'm pretty sure. ... just thinking about special pages like vdso, shared zeropage, ... pinned pages ... Well, you cannot migrate long term pages, that's what I meant :) + * should be allowed to pin such memory so that it can always be evicted. + * * MEMORY_DEVICE_FS_DAX: * Host memory that has similar access semantics as System RAM i.e. DMA * coherent and supports page pinning. In support of coordinating page @@ -61,6 +68,7 @@ struct vmem_altmap { enum memory_type { /* 0 is reserved to catch uninitialized type fields */ MEMORY_DEVICE_PRIVATE = 1, + MEMORY_DEVICE_COHERENT, MEMORY_DEVICE_FS_DAX, MEMORY_DEVICE_GENERIC, MEMORY_DEVICE_PCI_P2PDMA, @@ -143,6 +151,17 @@ static inline bool folio_is_device_private(const struct folio *folio) In general, this LGTM, and it should be correct with PageAnonExclusive I think. However, where exactly is pinning forbidden? Long-term pinning is forbidden since it would interfere with the device memory manager owning the device-coherent pages (e.g. evictions in TTM). However, normal pinning is allowed on this device type. I don't see updates to folio_is_pinnable() in this patch. Device coherent type pages should return true here, as they are pinnable pages. That function is only called for long-term pinnings in try_grab_folio(). So wouldn't try_grab_folio() simply pin these pages? What am I missing? As far as I understand this return NULL for long term pin pages. Otherwise they get refcount incremented. I don't follow. You're saying a) folio_is_pinnable() returns true for device coherent pages and that b) device coherent pages don't get long-term pinned Yet, the code says struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags) { if (flags & FOLL_GET) return try_get_folio(page, refs); else if (flags & FOLL_PIN) { struct folio *folio; /* * Can't do FOLL_LONGTERM + FOLL_PIN gup fast path if not in a * right zone, so fail and let the caller fall back to the slow * path. */ if (unlikely((flags & FOLL_LONGTERM) && !is_pinnable_page(page))) return NULL; ... return folio; } } What prevents these pages from getting long-term pinned as stated in this patch? Long-term pinning is handled by __gup_longterm_locked, which migrates pages returned by __get_user_pages_locked that cannot be long-term pinned. try_grab_folio is OK to grab the pages. Anything that can't be long-term pinned will be migrated afterwards, and __get_user_pages_locked will be retried. The migration of DEVICE_COHERENT pages was implemented by Alistair in
Re: [PATCH v5 01/13] mm: add zone device coherent type memory support
On 6/21/2022 7:16 PM, Alistair Popple wrote: David Hildenbrand writes: On 21.06.22 18:08, Sierra Guiza, Alejandro (Alex) wrote: On 6/21/2022 7:25 AM, David Hildenbrand wrote: On 21.06.22 13:55, Alistair Popple wrote: David Hildenbrand writes: On 21.06.22 13:25, Felix Kuehling wrote: Am 6/17/22 um 23:19 schrieb David Hildenbrand: On 17.06.22 21:27, Sierra Guiza, Alejandro (Alex) wrote: On 6/17/2022 12:33 PM, David Hildenbrand wrote: On 17.06.22 19:20, Sierra Guiza, Alejandro (Alex) wrote: On 6/17/2022 4:40 AM, David Hildenbrand wrote: On 31.05.22 22:00, Alex Sierra wrote: Device memory that is cache coherent from device and CPU point of view. This is used on platforms that have an advanced system bus (like CAPI or CXL). Any page of a process can be migrated to such memory. However, no one should be allowed to pin such memory so that it can always be evicted. Signed-off-by: Alex Sierra Acked-by: Felix Kuehling Reviewed-by: Alistair Popple [hch: rebased ontop of the refcount changes, removed is_dev_private_or_coherent_page] Signed-off-by: Christoph Hellwig --- include/linux/memremap.h | 19 +++ mm/memcontrol.c | 7 --- mm/memory-failure.c | 8 ++-- mm/memremap.c| 10 ++ mm/migrate_device.c | 16 +++- mm/rmap.c| 5 +++-- 6 files changed, 49 insertions(+), 16 deletions(-) diff --git a/include/linux/memremap.h b/include/linux/memremap.h index 8af304f6b504..9f752ebed613 100644 --- a/include/linux/memremap.h +++ b/include/linux/memremap.h @@ -41,6 +41,13 @@ struct vmem_altmap { * A more complete discussion of unaddressable memory may be found in * include/linux/hmm.h and Documentation/vm/hmm.rst. * + * MEMORY_DEVICE_COHERENT: + * Device memory that is cache coherent from device and CPU point of view. This + * is used on platforms that have an advanced system bus (like CAPI or CXL). A + * driver can hotplug the device memory using ZONE_DEVICE and with that memory + * type. Any page of a process can be migrated to such memory. However no one Any page might not be right, I'm pretty sure. ... just thinking about special pages like vdso, shared zeropage, ... pinned pages ... Well, you cannot migrate long term pages, that's what I meant :) + * should be allowed to pin such memory so that it can always be evicted. + * * MEMORY_DEVICE_FS_DAX: * Host memory that has similar access semantics as System RAM i.e. DMA * coherent and supports page pinning. In support of coordinating page @@ -61,6 +68,7 @@ struct vmem_altmap { enum memory_type { /* 0 is reserved to catch uninitialized type fields */ MEMORY_DEVICE_PRIVATE = 1, + MEMORY_DEVICE_COHERENT, MEMORY_DEVICE_FS_DAX, MEMORY_DEVICE_GENERIC, MEMORY_DEVICE_PCI_P2PDMA, @@ -143,6 +151,17 @@ static inline bool folio_is_device_private(const struct folio *folio) In general, this LGTM, and it should be correct with PageAnonExclusive I think. However, where exactly is pinning forbidden? Long-term pinning is forbidden since it would interfere with the device memory manager owning the device-coherent pages (e.g. evictions in TTM). However, normal pinning is allowed on this device type. I don't see updates to folio_is_pinnable() in this patch. Device coherent type pages should return true here, as they are pinnable pages. That function is only called for long-term pinnings in try_grab_folio(). So wouldn't try_grab_folio() simply pin these pages? What am I missing? As far as I understand this return NULL for long term pin pages. Otherwise they get refcount incremented. I don't follow. You're saying a) folio_is_pinnable() returns true for device coherent pages and that b) device coherent pages don't get long-term pinned Yet, the code says struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags) { if (flags & FOLL_GET) return try_get_folio(page, refs); else if (flags & FOLL_PIN) { struct folio *folio; /* * Can't do FOLL_LONGTERM + FOLL_PIN gup fast path if not in a * right zone, so fail and let the caller fall back to the slow * path. */ if (unlikely((flags & FOLL_LONGTERM) && !is_pinnable_page(page))) return NULL; ... return folio; } } What prevents these pages from getting long-term pinned as stated in this patch? Long-term pinning is handled by __gup_longterm_locked, which migrates pages returned by __get_user_pages_locked that cannot be long-term pinned. try_grab_folio is OK to grab the pages. Anything that can't be long-term pinned will be migrated afterwards, and __get_user_pages_locked will be retried. The migration of DEVICE_COHERENT pages was
Re: [PATCH v5 01/13] mm: add zone device coherent type memory support
David Hildenbrand writes: > On 21.06.22 18:08, Sierra Guiza, Alejandro (Alex) wrote: >> >> On 6/21/2022 7:25 AM, David Hildenbrand wrote: >>> On 21.06.22 13:55, Alistair Popple wrote: David Hildenbrand writes: > On 21.06.22 13:25, Felix Kuehling wrote: >> Am 6/17/22 um 23:19 schrieb David Hildenbrand: >>> On 17.06.22 21:27, Sierra Guiza, Alejandro (Alex) wrote: On 6/17/2022 12:33 PM, David Hildenbrand wrote: > On 17.06.22 19:20, Sierra Guiza, Alejandro (Alex) wrote: >> On 6/17/2022 4:40 AM, David Hildenbrand wrote: >>> On 31.05.22 22:00, Alex Sierra wrote: Device memory that is cache coherent from device and CPU point of view. This is used on platforms that have an advanced system bus (like CAPI or CXL). Any page of a process can be migrated to such memory. However, no one should be allowed to pin such memory so that it can always be evicted. Signed-off-by: Alex Sierra Acked-by: Felix Kuehling Reviewed-by: Alistair Popple [hch: rebased ontop of the refcount changes, removed is_dev_private_or_coherent_page] Signed-off-by: Christoph Hellwig --- include/linux/memremap.h | 19 +++ mm/memcontrol.c | 7 --- mm/memory-failure.c | 8 ++-- mm/memremap.c| 10 ++ mm/migrate_device.c | 16 +++- mm/rmap.c| 5 +++-- 6 files changed, 49 insertions(+), 16 deletions(-) diff --git a/include/linux/memremap.h b/include/linux/memremap.h index 8af304f6b504..9f752ebed613 100644 --- a/include/linux/memremap.h +++ b/include/linux/memremap.h @@ -41,6 +41,13 @@ struct vmem_altmap { * A more complete discussion of unaddressable memory may be found in * include/linux/hmm.h and Documentation/vm/hmm.rst. * + * MEMORY_DEVICE_COHERENT: + * Device memory that is cache coherent from device and CPU point of view. This + * is used on platforms that have an advanced system bus (like CAPI or CXL). A + * driver can hotplug the device memory using ZONE_DEVICE and with that memory + * type. Any page of a process can be migrated to such memory. However no one >>> Any page might not be right, I'm pretty sure. ... just thinking >>> about special pages >>> like vdso, shared zeropage, ... pinned pages ... > Well, you cannot migrate long term pages, that's what I meant :) > + * should be allowed to pin such memory so that it can always be evicted. + * * MEMORY_DEVICE_FS_DAX: * Host memory that has similar access semantics as System RAM i.e. DMA * coherent and supports page pinning. In support of coordinating page @@ -61,6 +68,7 @@ struct vmem_altmap { enum memory_type { /* 0 is reserved to catch uninitialized type fields */ MEMORY_DEVICE_PRIVATE = 1, + MEMORY_DEVICE_COHERENT, MEMORY_DEVICE_FS_DAX, MEMORY_DEVICE_GENERIC, MEMORY_DEVICE_PCI_P2PDMA, @@ -143,6 +151,17 @@ static inline bool folio_is_device_private(const struct folio *folio) >>> In general, this LGTM, and it should be correct with >>> PageAnonExclusive I think. >>> >>> >>> However, where exactly is pinning forbidden? >> Long-term pinning is forbidden since it would interfere with the >> device >> memory manager owning the >> device-coherent pages (e.g. evictions in TTM). However, normal >> pinning >> is allowed on this device type. > I don't see updates to folio_is_pinnable() in this patch. Device coherent type pages should return true here, as they are pinnable pages. >>> That function is only called for long-term pinnings in try_grab_folio(). >>> > So wouldn't try_grab_folio() simply pin these pages? What am I > missing? As far as I understand this return NULL for long term pin pages. Otherwise they get refcount incremented. >>> I don't follow. >>> >>> You're saying >>> >>> a) folio_is_pinnable() returns true for device coherent pages >>> >>> and that >>> >>> b) device
Re: [PATCH v5 01/13] mm: add zone device coherent type memory support
On 21.06.22 18:08, Sierra Guiza, Alejandro (Alex) wrote: > > On 6/21/2022 7:25 AM, David Hildenbrand wrote: >> On 21.06.22 13:55, Alistair Popple wrote: >>> David Hildenbrand writes: >>> On 21.06.22 13:25, Felix Kuehling wrote: > Am 6/17/22 um 23:19 schrieb David Hildenbrand: >> On 17.06.22 21:27, Sierra Guiza, Alejandro (Alex) wrote: >>> On 6/17/2022 12:33 PM, David Hildenbrand wrote: On 17.06.22 19:20, Sierra Guiza, Alejandro (Alex) wrote: > On 6/17/2022 4:40 AM, David Hildenbrand wrote: >> On 31.05.22 22:00, Alex Sierra wrote: >>> Device memory that is cache coherent from device and CPU point of >>> view. >>> This is used on platforms that have an advanced system bus (like >>> CAPI >>> or CXL). Any page of a process can be migrated to such memory. >>> However, >>> no one should be allowed to pin such memory so that it can always be >>> evicted. >>> >>> Signed-off-by: Alex Sierra >>> Acked-by: Felix Kuehling >>> Reviewed-by: Alistair Popple >>> [hch: rebased ontop of the refcount changes, >>> removed is_dev_private_or_coherent_page] >>> Signed-off-by: Christoph Hellwig >>> --- >>> include/linux/memremap.h | 19 +++ >>> mm/memcontrol.c | 7 --- >>> mm/memory-failure.c | 8 ++-- >>> mm/memremap.c| 10 ++ >>> mm/migrate_device.c | 16 +++- >>> mm/rmap.c| 5 +++-- >>> 6 files changed, 49 insertions(+), 16 deletions(-) >>> >>> diff --git a/include/linux/memremap.h b/include/linux/memremap.h >>> index 8af304f6b504..9f752ebed613 100644 >>> --- a/include/linux/memremap.h >>> +++ b/include/linux/memremap.h >>> @@ -41,6 +41,13 @@ struct vmem_altmap { >>> * A more complete discussion of unaddressable memory may be >>> found in >>> * include/linux/hmm.h and Documentation/vm/hmm.rst. >>> * >>> + * MEMORY_DEVICE_COHERENT: >>> + * Device memory that is cache coherent from device and CPU point >>> of view. This >>> + * is used on platforms that have an advanced system bus (like >>> CAPI or CXL). A >>> + * driver can hotplug the device memory using ZONE_DEVICE and with >>> that memory >>> + * type. Any page of a process can be migrated to such memory. >>> However no one >> Any page might not be right, I'm pretty sure. ... just thinking >> about special pages >> like vdso, shared zeropage, ... pinned pages ... Well, you cannot migrate long term pages, that's what I meant :) >>> + * should be allowed to pin such memory so that it can always be >>> evicted. >>> + * >>> * MEMORY_DEVICE_FS_DAX: >>> * Host memory that has similar access semantics as System RAM >>> i.e. DMA >>> * coherent and supports page pinning. In support of >>> coordinating page >>> @@ -61,6 +68,7 @@ struct vmem_altmap { >>> enum memory_type { >>> /* 0 is reserved to catch uninitialized type fields */ >>> MEMORY_DEVICE_PRIVATE = 1, >>> + MEMORY_DEVICE_COHERENT, >>> MEMORY_DEVICE_FS_DAX, >>> MEMORY_DEVICE_GENERIC, >>> MEMORY_DEVICE_PCI_P2PDMA, >>> @@ -143,6 +151,17 @@ static inline bool >>> folio_is_device_private(const struct folio *folio) >> In general, this LGTM, and it should be correct with >> PageAnonExclusive I think. >> >> >> However, where exactly is pinning forbidden? > Long-term pinning is forbidden since it would interfere with the > device > memory manager owning the > device-coherent pages (e.g. evictions in TTM). However, normal pinning > is allowed on this device type. I don't see updates to folio_is_pinnable() in this patch. >>> Device coherent type pages should return true here, as they are pinnable >>> pages. >> That function is only called for long-term pinnings in try_grab_folio(). >> So wouldn't try_grab_folio() simply pin these pages? What am I missing? >>> As far as I understand this return NULL for long term pin pages. >>> Otherwise they get refcount incremented. >> I don't follow. >> >> You're saying >> >> a) folio_is_pinnable() returns true for device coherent pages >> >> and that >> >> b) device coherent pages don't get long-term pinned >> >> >> Yet, the code says >> >> struct folio *try_grab_folio(struct page *page, int refs, unsigned int >>
Re: [PATCH v5 01/13] mm: add zone device coherent type memory support
On 6/21/2022 7:25 AM, David Hildenbrand wrote: On 21.06.22 13:55, Alistair Popple wrote: David Hildenbrand writes: On 21.06.22 13:25, Felix Kuehling wrote: Am 6/17/22 um 23:19 schrieb David Hildenbrand: On 17.06.22 21:27, Sierra Guiza, Alejandro (Alex) wrote: On 6/17/2022 12:33 PM, David Hildenbrand wrote: On 17.06.22 19:20, Sierra Guiza, Alejandro (Alex) wrote: On 6/17/2022 4:40 AM, David Hildenbrand wrote: On 31.05.22 22:00, Alex Sierra wrote: Device memory that is cache coherent from device and CPU point of view. This is used on platforms that have an advanced system bus (like CAPI or CXL). Any page of a process can be migrated to such memory. However, no one should be allowed to pin such memory so that it can always be evicted. Signed-off-by: Alex Sierra Acked-by: Felix Kuehling Reviewed-by: Alistair Popple [hch: rebased ontop of the refcount changes, removed is_dev_private_or_coherent_page] Signed-off-by: Christoph Hellwig --- include/linux/memremap.h | 19 +++ mm/memcontrol.c | 7 --- mm/memory-failure.c | 8 ++-- mm/memremap.c| 10 ++ mm/migrate_device.c | 16 +++- mm/rmap.c| 5 +++-- 6 files changed, 49 insertions(+), 16 deletions(-) diff --git a/include/linux/memremap.h b/include/linux/memremap.h index 8af304f6b504..9f752ebed613 100644 --- a/include/linux/memremap.h +++ b/include/linux/memremap.h @@ -41,6 +41,13 @@ struct vmem_altmap { * A more complete discussion of unaddressable memory may be found in * include/linux/hmm.h and Documentation/vm/hmm.rst. * + * MEMORY_DEVICE_COHERENT: + * Device memory that is cache coherent from device and CPU point of view. This + * is used on platforms that have an advanced system bus (like CAPI or CXL). A + * driver can hotplug the device memory using ZONE_DEVICE and with that memory + * type. Any page of a process can be migrated to such memory. However no one Any page might not be right, I'm pretty sure. ... just thinking about special pages like vdso, shared zeropage, ... pinned pages ... Well, you cannot migrate long term pages, that's what I meant :) + * should be allowed to pin such memory so that it can always be evicted. + * * MEMORY_DEVICE_FS_DAX: * Host memory that has similar access semantics as System RAM i.e. DMA * coherent and supports page pinning. In support of coordinating page @@ -61,6 +68,7 @@ struct vmem_altmap { enum memory_type { /* 0 is reserved to catch uninitialized type fields */ MEMORY_DEVICE_PRIVATE = 1, + MEMORY_DEVICE_COHERENT, MEMORY_DEVICE_FS_DAX, MEMORY_DEVICE_GENERIC, MEMORY_DEVICE_PCI_P2PDMA, @@ -143,6 +151,17 @@ static inline bool folio_is_device_private(const struct folio *folio) In general, this LGTM, and it should be correct with PageAnonExclusive I think. However, where exactly is pinning forbidden? Long-term pinning is forbidden since it would interfere with the device memory manager owning the device-coherent pages (e.g. evictions in TTM). However, normal pinning is allowed on this device type. I don't see updates to folio_is_pinnable() in this patch. Device coherent type pages should return true here, as they are pinnable pages. That function is only called for long-term pinnings in try_grab_folio(). So wouldn't try_grab_folio() simply pin these pages? What am I missing? As far as I understand this return NULL for long term pin pages. Otherwise they get refcount incremented. I don't follow. You're saying a) folio_is_pinnable() returns true for device coherent pages and that b) device coherent pages don't get long-term pinned Yet, the code says struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags) { if (flags & FOLL_GET) return try_get_folio(page, refs); else if (flags & FOLL_PIN) { struct folio *folio; /* * Can't do FOLL_LONGTERM + FOLL_PIN gup fast path if not in a * right zone, so fail and let the caller fall back to the slow * path. */ if (unlikely((flags & FOLL_LONGTERM) && !is_pinnable_page(page))) return NULL; ... return folio; } } What prevents these pages from getting long-term pinned as stated in this patch? Long-term pinning is handled by __gup_longterm_locked, which migrates pages returned by __get_user_pages_locked that cannot be long-term pinned. try_grab_folio is OK to grab the pages. Anything that can't be long-term pinned will be migrated afterwards, and __get_user_pages_locked will be retried. The migration of DEVICE_COHERENT pages was implemented by Alistair in patch 5/13 ("mm/gup: migrate device coherent pages when pinning instead of failing"). Thanks.
Re: [PATCH v5 01/13] mm: add zone device coherent type memory support
On 21.06.22 13:55, Alistair Popple wrote: > > David Hildenbrand writes: > >> On 21.06.22 13:25, Felix Kuehling wrote: >>> >>> Am 6/17/22 um 23:19 schrieb David Hildenbrand: On 17.06.22 21:27, Sierra Guiza, Alejandro (Alex) wrote: > On 6/17/2022 12:33 PM, David Hildenbrand wrote: >> On 17.06.22 19:20, Sierra Guiza, Alejandro (Alex) wrote: >>> On 6/17/2022 4:40 AM, David Hildenbrand wrote: On 31.05.22 22:00, Alex Sierra wrote: > Device memory that is cache coherent from device and CPU point of > view. > This is used on platforms that have an advanced system bus (like CAPI > or CXL). Any page of a process can be migrated to such memory. > However, > no one should be allowed to pin such memory so that it can always be > evicted. > > Signed-off-by: Alex Sierra > Acked-by: Felix Kuehling > Reviewed-by: Alistair Popple > [hch: rebased ontop of the refcount changes, > removed is_dev_private_or_coherent_page] > Signed-off-by: Christoph Hellwig > --- > include/linux/memremap.h | 19 +++ > mm/memcontrol.c | 7 --- > mm/memory-failure.c | 8 ++-- > mm/memremap.c| 10 ++ > mm/migrate_device.c | 16 +++- > mm/rmap.c| 5 +++-- > 6 files changed, 49 insertions(+), 16 deletions(-) > > diff --git a/include/linux/memremap.h b/include/linux/memremap.h > index 8af304f6b504..9f752ebed613 100644 > --- a/include/linux/memremap.h > +++ b/include/linux/memremap.h > @@ -41,6 +41,13 @@ struct vmem_altmap { > * A more complete discussion of unaddressable memory may be > found in > * include/linux/hmm.h and Documentation/vm/hmm.rst. > * > + * MEMORY_DEVICE_COHERENT: > + * Device memory that is cache coherent from device and CPU point of > view. This > + * is used on platforms that have an advanced system bus (like CAPI > or CXL). A > + * driver can hotplug the device memory using ZONE_DEVICE and with > that memory > + * type. Any page of a process can be migrated to such memory. > However no one Any page might not be right, I'm pretty sure. ... just thinking about special pages like vdso, shared zeropage, ... pinned pages ... >> Well, you cannot migrate long term pages, that's what I meant :) >> > + * should be allowed to pin such memory so that it can always be > evicted. > + * > * MEMORY_DEVICE_FS_DAX: > * Host memory that has similar access semantics as System RAM > i.e. DMA > * coherent and supports page pinning. In support of coordinating > page > @@ -61,6 +68,7 @@ struct vmem_altmap { > enum memory_type { > /* 0 is reserved to catch uninitialized type fields */ > MEMORY_DEVICE_PRIVATE = 1, > + MEMORY_DEVICE_COHERENT, > MEMORY_DEVICE_FS_DAX, > MEMORY_DEVICE_GENERIC, > MEMORY_DEVICE_PCI_P2PDMA, > @@ -143,6 +151,17 @@ static inline bool folio_is_device_private(const > struct folio *folio) In general, this LGTM, and it should be correct with PageAnonExclusive I think. However, where exactly is pinning forbidden? >>> Long-term pinning is forbidden since it would interfere with the device >>> memory manager owning the >>> device-coherent pages (e.g. evictions in TTM). However, normal pinning >>> is allowed on this device type. >> I don't see updates to folio_is_pinnable() in this patch. > Device coherent type pages should return true here, as they are pinnable > pages. That function is only called for long-term pinnings in try_grab_folio(). >> So wouldn't try_grab_folio() simply pin these pages? What am I missing? > As far as I understand this return NULL for long term pin pages. > Otherwise they get refcount incremented. I don't follow. You're saying a) folio_is_pinnable() returns true for device coherent pages and that b) device coherent pages don't get long-term pinned Yet, the code says struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags) { if (flags & FOLL_GET) return try_get_folio(page, refs); else if (flags & FOLL_PIN) { struct folio *folio; /* * Can't do FOLL_LONGTERM + FOLL_PIN gup fast path if not in a * right zone, so fail and let the caller fall back to the slow * path.
Re: [PATCH v5 01/13] mm: add zone device coherent type memory support
David Hildenbrand writes: > On 21.06.22 13:25, Felix Kuehling wrote: >> >> Am 6/17/22 um 23:19 schrieb David Hildenbrand: >>> On 17.06.22 21:27, Sierra Guiza, Alejandro (Alex) wrote: On 6/17/2022 12:33 PM, David Hildenbrand wrote: > On 17.06.22 19:20, Sierra Guiza, Alejandro (Alex) wrote: >> On 6/17/2022 4:40 AM, David Hildenbrand wrote: >>> On 31.05.22 22:00, Alex Sierra wrote: Device memory that is cache coherent from device and CPU point of view. This is used on platforms that have an advanced system bus (like CAPI or CXL). Any page of a process can be migrated to such memory. However, no one should be allowed to pin such memory so that it can always be evicted. Signed-off-by: Alex Sierra Acked-by: Felix Kuehling Reviewed-by: Alistair Popple [hch: rebased ontop of the refcount changes, removed is_dev_private_or_coherent_page] Signed-off-by: Christoph Hellwig --- include/linux/memremap.h | 19 +++ mm/memcontrol.c | 7 --- mm/memory-failure.c | 8 ++-- mm/memremap.c| 10 ++ mm/migrate_device.c | 16 +++- mm/rmap.c| 5 +++-- 6 files changed, 49 insertions(+), 16 deletions(-) diff --git a/include/linux/memremap.h b/include/linux/memremap.h index 8af304f6b504..9f752ebed613 100644 --- a/include/linux/memremap.h +++ b/include/linux/memremap.h @@ -41,6 +41,13 @@ struct vmem_altmap { * A more complete discussion of unaddressable memory may be found in * include/linux/hmm.h and Documentation/vm/hmm.rst. * + * MEMORY_DEVICE_COHERENT: + * Device memory that is cache coherent from device and CPU point of view. This + * is used on platforms that have an advanced system bus (like CAPI or CXL). A + * driver can hotplug the device memory using ZONE_DEVICE and with that memory + * type. Any page of a process can be migrated to such memory. However no one >>> Any page might not be right, I'm pretty sure. ... just thinking about >>> special pages >>> like vdso, shared zeropage, ... pinned pages ... > Well, you cannot migrate long term pages, that's what I meant :) > + * should be allowed to pin such memory so that it can always be evicted. + * * MEMORY_DEVICE_FS_DAX: * Host memory that has similar access semantics as System RAM i.e. DMA * coherent and supports page pinning. In support of coordinating page @@ -61,6 +68,7 @@ struct vmem_altmap { enum memory_type { /* 0 is reserved to catch uninitialized type fields */ MEMORY_DEVICE_PRIVATE = 1, + MEMORY_DEVICE_COHERENT, MEMORY_DEVICE_FS_DAX, MEMORY_DEVICE_GENERIC, MEMORY_DEVICE_PCI_P2PDMA, @@ -143,6 +151,17 @@ static inline bool folio_is_device_private(const struct folio *folio) >>> In general, this LGTM, and it should be correct with PageAnonExclusive >>> I think. >>> >>> >>> However, where exactly is pinning forbidden? >> Long-term pinning is forbidden since it would interfere with the device >> memory manager owning the >> device-coherent pages (e.g. evictions in TTM). However, normal pinning >> is allowed on this device type. > I don't see updates to folio_is_pinnable() in this patch. Device coherent type pages should return true here, as they are pinnable pages. >>> That function is only called for long-term pinnings in try_grab_folio(). >>> > So wouldn't try_grab_folio() simply pin these pages? What am I missing? As far as I understand this return NULL for long term pin pages. Otherwise they get refcount incremented. >>> I don't follow. >>> >>> You're saying >>> >>> a) folio_is_pinnable() returns true for device coherent pages >>> >>> and that >>> >>> b) device coherent pages don't get long-term pinned >>> >>> >>> Yet, the code says >>> >>> struct folio *try_grab_folio(struct page *page, int refs, unsigned int >>> flags) >>> { >>> if (flags & FOLL_GET) >>> return try_get_folio(page, refs); >>> else if (flags & FOLL_PIN) { >>> struct folio *folio; >>> >>> /* >>> * Can't do FOLL_LONGTERM + FOLL_PIN gup fast path if not in a >>> * right zone, so fail and let the caller fall back to the slow >>> * path. >>> */ >>> if (unlikely((flags & FOLL_LONGTERM) && >>> !is_pinnable_page(page))) >>> return NULL; >>>
Re: [PATCH v5 01/13] mm: add zone device coherent type memory support
On 21.06.22 13:25, Felix Kuehling wrote: > > Am 6/17/22 um 23:19 schrieb David Hildenbrand: >> On 17.06.22 21:27, Sierra Guiza, Alejandro (Alex) wrote: >>> On 6/17/2022 12:33 PM, David Hildenbrand wrote: On 17.06.22 19:20, Sierra Guiza, Alejandro (Alex) wrote: > On 6/17/2022 4:40 AM, David Hildenbrand wrote: >> On 31.05.22 22:00, Alex Sierra wrote: >>> Device memory that is cache coherent from device and CPU point of view. >>> This is used on platforms that have an advanced system bus (like CAPI >>> or CXL). Any page of a process can be migrated to such memory. However, >>> no one should be allowed to pin such memory so that it can always be >>> evicted. >>> >>> Signed-off-by: Alex Sierra >>> Acked-by: Felix Kuehling >>> Reviewed-by: Alistair Popple >>> [hch: rebased ontop of the refcount changes, >>> removed is_dev_private_or_coherent_page] >>> Signed-off-by: Christoph Hellwig >>> --- >>> include/linux/memremap.h | 19 +++ >>> mm/memcontrol.c | 7 --- >>> mm/memory-failure.c | 8 ++-- >>> mm/memremap.c| 10 ++ >>> mm/migrate_device.c | 16 +++- >>> mm/rmap.c| 5 +++-- >>> 6 files changed, 49 insertions(+), 16 deletions(-) >>> >>> diff --git a/include/linux/memremap.h b/include/linux/memremap.h >>> index 8af304f6b504..9f752ebed613 100644 >>> --- a/include/linux/memremap.h >>> +++ b/include/linux/memremap.h >>> @@ -41,6 +41,13 @@ struct vmem_altmap { >>> * A more complete discussion of unaddressable memory may be found >>> in >>> * include/linux/hmm.h and Documentation/vm/hmm.rst. >>> * >>> + * MEMORY_DEVICE_COHERENT: >>> + * Device memory that is cache coherent from device and CPU point of >>> view. This >>> + * is used on platforms that have an advanced system bus (like CAPI or >>> CXL). A >>> + * driver can hotplug the device memory using ZONE_DEVICE and with >>> that memory >>> + * type. Any page of a process can be migrated to such memory. However >>> no one >> Any page might not be right, I'm pretty sure. ... just thinking about >> special pages >> like vdso, shared zeropage, ... pinned pages ... Well, you cannot migrate long term pages, that's what I meant :) >>> + * should be allowed to pin such memory so that it can always be >>> evicted. >>> + * >>> * MEMORY_DEVICE_FS_DAX: >>> * Host memory that has similar access semantics as System RAM i.e. >>> DMA >>> * coherent and supports page pinning. In support of coordinating >>> page >>> @@ -61,6 +68,7 @@ struct vmem_altmap { >>> enum memory_type { >>> /* 0 is reserved to catch uninitialized type fields */ >>> MEMORY_DEVICE_PRIVATE = 1, >>> + MEMORY_DEVICE_COHERENT, >>> MEMORY_DEVICE_FS_DAX, >>> MEMORY_DEVICE_GENERIC, >>> MEMORY_DEVICE_PCI_P2PDMA, >>> @@ -143,6 +151,17 @@ static inline bool folio_is_device_private(const >>> struct folio *folio) >> In general, this LGTM, and it should be correct with PageAnonExclusive I >> think. >> >> >> However, where exactly is pinning forbidden? > Long-term pinning is forbidden since it would interfere with the device > memory manager owning the > device-coherent pages (e.g. evictions in TTM). However, normal pinning > is allowed on this device type. I don't see updates to folio_is_pinnable() in this patch. >>> Device coherent type pages should return true here, as they are pinnable >>> pages. >> That function is only called for long-term pinnings in try_grab_folio(). >> So wouldn't try_grab_folio() simply pin these pages? What am I missing? >>> As far as I understand this return NULL for long term pin pages. >>> Otherwise they get refcount incremented. >> I don't follow. >> >> You're saying >> >> a) folio_is_pinnable() returns true for device coherent pages >> >> and that >> >> b) device coherent pages don't get long-term pinned >> >> >> Yet, the code says >> >> struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags) >> { >> if (flags & FOLL_GET) >> return try_get_folio(page, refs); >> else if (flags & FOLL_PIN) { >> struct folio *folio; >> >> /* >> * Can't do FOLL_LONGTERM + FOLL_PIN gup fast path if not in a >> * right zone, so fail and let the caller fall back to the slow >> * path. >> */ >> if (unlikely((flags & FOLL_LONGTERM) && >> !is_pinnable_page(page))) >> return NULL; >> ... >> return folio; >> } >> } >> >> >> What prevents these pages from getting long-term pinned as
Re: [PATCH v5 01/13] mm: add zone device coherent type memory support
Am 6/17/22 um 23:19 schrieb David Hildenbrand: On 17.06.22 21:27, Sierra Guiza, Alejandro (Alex) wrote: On 6/17/2022 12:33 PM, David Hildenbrand wrote: On 17.06.22 19:20, Sierra Guiza, Alejandro (Alex) wrote: On 6/17/2022 4:40 AM, David Hildenbrand wrote: On 31.05.22 22:00, Alex Sierra wrote: Device memory that is cache coherent from device and CPU point of view. This is used on platforms that have an advanced system bus (like CAPI or CXL). Any page of a process can be migrated to such memory. However, no one should be allowed to pin such memory so that it can always be evicted. Signed-off-by: Alex Sierra Acked-by: Felix Kuehling Reviewed-by: Alistair Popple [hch: rebased ontop of the refcount changes, removed is_dev_private_or_coherent_page] Signed-off-by: Christoph Hellwig --- include/linux/memremap.h | 19 +++ mm/memcontrol.c | 7 --- mm/memory-failure.c | 8 ++-- mm/memremap.c| 10 ++ mm/migrate_device.c | 16 +++- mm/rmap.c| 5 +++-- 6 files changed, 49 insertions(+), 16 deletions(-) diff --git a/include/linux/memremap.h b/include/linux/memremap.h index 8af304f6b504..9f752ebed613 100644 --- a/include/linux/memremap.h +++ b/include/linux/memremap.h @@ -41,6 +41,13 @@ struct vmem_altmap { * A more complete discussion of unaddressable memory may be found in * include/linux/hmm.h and Documentation/vm/hmm.rst. * + * MEMORY_DEVICE_COHERENT: + * Device memory that is cache coherent from device and CPU point of view. This + * is used on platforms that have an advanced system bus (like CAPI or CXL). A + * driver can hotplug the device memory using ZONE_DEVICE and with that memory + * type. Any page of a process can be migrated to such memory. However no one Any page might not be right, I'm pretty sure. ... just thinking about special pages like vdso, shared zeropage, ... pinned pages ... Well, you cannot migrate long term pages, that's what I meant :) + * should be allowed to pin such memory so that it can always be evicted. + * * MEMORY_DEVICE_FS_DAX: * Host memory that has similar access semantics as System RAM i.e. DMA * coherent and supports page pinning. In support of coordinating page @@ -61,6 +68,7 @@ struct vmem_altmap { enum memory_type { /* 0 is reserved to catch uninitialized type fields */ MEMORY_DEVICE_PRIVATE = 1, + MEMORY_DEVICE_COHERENT, MEMORY_DEVICE_FS_DAX, MEMORY_DEVICE_GENERIC, MEMORY_DEVICE_PCI_P2PDMA, @@ -143,6 +151,17 @@ static inline bool folio_is_device_private(const struct folio *folio) In general, this LGTM, and it should be correct with PageAnonExclusive I think. However, where exactly is pinning forbidden? Long-term pinning is forbidden since it would interfere with the device memory manager owning the device-coherent pages (e.g. evictions in TTM). However, normal pinning is allowed on this device type. I don't see updates to folio_is_pinnable() in this patch. Device coherent type pages should return true here, as they are pinnable pages. That function is only called for long-term pinnings in try_grab_folio(). So wouldn't try_grab_folio() simply pin these pages? What am I missing? As far as I understand this return NULL for long term pin pages. Otherwise they get refcount incremented. I don't follow. You're saying a) folio_is_pinnable() returns true for device coherent pages and that b) device coherent pages don't get long-term pinned Yet, the code says struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags) { if (flags & FOLL_GET) return try_get_folio(page, refs); else if (flags & FOLL_PIN) { struct folio *folio; /* * Can't do FOLL_LONGTERM + FOLL_PIN gup fast path if not in a * right zone, so fail and let the caller fall back to the slow * path. */ if (unlikely((flags & FOLL_LONGTERM) && !is_pinnable_page(page))) return NULL; ... return folio; } } What prevents these pages from getting long-term pinned as stated in this patch? Long-term pinning is handled by __gup_longterm_locked, which migrates pages returned by __get_user_pages_locked that cannot be long-term pinned. try_grab_folio is OK to grab the pages. Anything that can't be long-term pinned will be migrated afterwards, and __get_user_pages_locked will be retried. The migration of DEVICE_COHERENT pages was implemented by Alistair in patch 5/13 ("mm/gup: migrate device coherent pages when pinning instead of failing"). Regards, Felix I am probably missing something important. P.S.: I'm on vacation and looking at a tiny screen. Hope I didn't miss anything myself.
Re: [PATCH v5 01/13] mm: add zone device coherent type memory support
On Mon, Jun 20, 2022 at 11:50 AM Alistair Popple wrote: > > > Oded Gabbay writes: > > > On Mon, Jun 20, 2022 at 3:33 AM Alistair Popple wrote: > >> > >> > >> Oded Gabbay writes: > >> > >> > On Fri, Jun 17, 2022 at 8:20 PM Sierra Guiza, Alejandro (Alex) > >> > wrote: > >> >> > >> >> > >> >> On 6/17/2022 4:40 AM, David Hildenbrand wrote: > >> >> > On 31.05.22 22:00, Alex Sierra wrote: > >> >> >> Device memory that is cache coherent from device and CPU point of > >> >> >> view. > >> >> >> This is used on platforms that have an advanced system bus (like CAPI > >> >> >> or CXL). Any page of a process can be migrated to such memory. > >> >> >> However, > >> >> >> no one should be allowed to pin such memory so that it can always be > >> >> >> evicted. > >> >> >> > >> >> >> Signed-off-by: Alex Sierra > >> >> >> Acked-by: Felix Kuehling > >> >> >> Reviewed-by: Alistair Popple > >> >> >> [hch: rebased ontop of the refcount changes, > >> >> >>removed is_dev_private_or_coherent_page] > >> >> >> Signed-off-by: Christoph Hellwig > >> >> >> --- > >> >> >> include/linux/memremap.h | 19 +++ > >> >> >> mm/memcontrol.c | 7 --- > >> >> >> mm/memory-failure.c | 8 ++-- > >> >> >> mm/memremap.c| 10 ++ > >> >> >> mm/migrate_device.c | 16 +++- > >> >> >> mm/rmap.c| 5 +++-- > >> >> >> 6 files changed, 49 insertions(+), 16 deletions(-) > >> >> >> > >> >> >> diff --git a/include/linux/memremap.h b/include/linux/memremap.h > >> >> >> index 8af304f6b504..9f752ebed613 100644 > >> >> >> --- a/include/linux/memremap.h > >> >> >> +++ b/include/linux/memremap.h > >> >> >> @@ -41,6 +41,13 @@ struct vmem_altmap { > >> >> >>* A more complete discussion of unaddressable memory may be found > >> >> >> in > >> >> >>* include/linux/hmm.h and Documentation/vm/hmm.rst. > >> >> >>* > >> >> >> + * MEMORY_DEVICE_COHERENT: > >> >> >> + * Device memory that is cache coherent from device and CPU point > >> >> >> of view. This > >> >> >> + * is used on platforms that have an advanced system bus (like CAPI > >> >> >> or CXL). A > >> >> >> + * driver can hotplug the device memory using ZONE_DEVICE and with > >> >> >> that memory > >> >> >> + * type. Any page of a process can be migrated to such memory. > >> >> >> However no one > >> >> > Any page might not be right, I'm pretty sure. ... just thinking about > >> >> > special pages > >> >> > like vdso, shared zeropage, ... pinned pages ... > >> >> > >> >> Hi David, > >> >> > >> >> Yes, I think you're right. This type does not cover all special pages. > >> >> I need to correct that on the cover letter. > >> >> Pinned pages are allowed as long as they're not long term pinned. > >> >> > >> >> Regards, > >> >> Alex Sierra > >> > > >> > What if I want to hotplug this device's coherent memory, but I do > >> > *not* want the OS > >> > to migrate any page to it ? > >> > I want to fully-control what resides on this memory, as I can consider > >> > this memory > >> > "expensive". i.e. I don't have a lot of it, I want to use it for > >> > specific purposes and > >> > I don't want the OS to start using it when there is some memory pressure > >> > in > >> > the system. > >> > >> This is exactly what MEMORY_DEVICE_COHERENT is for. Device coherent > >> pages are only allocated by a device driver and exposed to user-space by > >> a driver migrating pages to them with migrate_vma. The OS can't just > >> start using them due to memory pressure for example. > >> > >> - Alistair > > Thanks for the explanation. > > > > I guess the commit message confused me a bit, especially these two > > sentences: > > > > "Any page of a process can be migrated to such memory. However no one > > should be > > allowed to pin such memory so that it can always be evicted." > > > > I read them as if the OS is free to choose which pages are migrated to > > this memory, > > and anything is eligible for migration to that memory (and that's why > > we also don't > > allow it to pin memory there). > > > > If we are not allowed to pin anything there, can the device driver > > decide to disable > > any option for oversubscription of this memory area ? > > I'm not sure I follow your thinking on how oversubscription would work > here, however all allocations are controlled by the driver. So if a > device's coherent memory is full a driver would be unable to migrate > pages to that device until pages are freed by the OS due to being > unmapped or the driver evicts pages by migrating them back to normal CPU > memory. > > Pinning of pages is allowed, and could prevent such migrations. However > this patch series prevents device coherent pages from being pinned > longterm (ie. with FOLL_LONGTERM), so it should always be able to evict > pages eventually. > > > Let's assume the user uses this memory area for doing p2p with other > > CXL devices. > > In that case, I wouldn't want the driver/OS to migrate pages in and >
Re: [PATCH v5 01/13] mm: add zone device coherent type memory support
Oded Gabbay writes: > On Mon, Jun 20, 2022 at 3:33 AM Alistair Popple wrote: >> >> >> Oded Gabbay writes: >> >> > On Fri, Jun 17, 2022 at 8:20 PM Sierra Guiza, Alejandro (Alex) >> > wrote: >> >> >> >> >> >> On 6/17/2022 4:40 AM, David Hildenbrand wrote: >> >> > On 31.05.22 22:00, Alex Sierra wrote: >> >> >> Device memory that is cache coherent from device and CPU point of view. >> >> >> This is used on platforms that have an advanced system bus (like CAPI >> >> >> or CXL). Any page of a process can be migrated to such memory. However, >> >> >> no one should be allowed to pin such memory so that it can always be >> >> >> evicted. >> >> >> >> >> >> Signed-off-by: Alex Sierra >> >> >> Acked-by: Felix Kuehling >> >> >> Reviewed-by: Alistair Popple >> >> >> [hch: rebased ontop of the refcount changes, >> >> >>removed is_dev_private_or_coherent_page] >> >> >> Signed-off-by: Christoph Hellwig >> >> >> --- >> >> >> include/linux/memremap.h | 19 +++ >> >> >> mm/memcontrol.c | 7 --- >> >> >> mm/memory-failure.c | 8 ++-- >> >> >> mm/memremap.c| 10 ++ >> >> >> mm/migrate_device.c | 16 +++- >> >> >> mm/rmap.c| 5 +++-- >> >> >> 6 files changed, 49 insertions(+), 16 deletions(-) >> >> >> >> >> >> diff --git a/include/linux/memremap.h b/include/linux/memremap.h >> >> >> index 8af304f6b504..9f752ebed613 100644 >> >> >> --- a/include/linux/memremap.h >> >> >> +++ b/include/linux/memremap.h >> >> >> @@ -41,6 +41,13 @@ struct vmem_altmap { >> >> >>* A more complete discussion of unaddressable memory may be found in >> >> >>* include/linux/hmm.h and Documentation/vm/hmm.rst. >> >> >>* >> >> >> + * MEMORY_DEVICE_COHERENT: >> >> >> + * Device memory that is cache coherent from device and CPU point of >> >> >> view. This >> >> >> + * is used on platforms that have an advanced system bus (like CAPI >> >> >> or CXL). A >> >> >> + * driver can hotplug the device memory using ZONE_DEVICE and with >> >> >> that memory >> >> >> + * type. Any page of a process can be migrated to such memory. >> >> >> However no one >> >> > Any page might not be right, I'm pretty sure. ... just thinking about >> >> > special pages >> >> > like vdso, shared zeropage, ... pinned pages ... >> >> >> >> Hi David, >> >> >> >> Yes, I think you're right. This type does not cover all special pages. >> >> I need to correct that on the cover letter. >> >> Pinned pages are allowed as long as they're not long term pinned. >> >> >> >> Regards, >> >> Alex Sierra >> > >> > What if I want to hotplug this device's coherent memory, but I do >> > *not* want the OS >> > to migrate any page to it ? >> > I want to fully-control what resides on this memory, as I can consider >> > this memory >> > "expensive". i.e. I don't have a lot of it, I want to use it for >> > specific purposes and >> > I don't want the OS to start using it when there is some memory pressure in >> > the system. >> >> This is exactly what MEMORY_DEVICE_COHERENT is for. Device coherent >> pages are only allocated by a device driver and exposed to user-space by >> a driver migrating pages to them with migrate_vma. The OS can't just >> start using them due to memory pressure for example. >> >> - Alistair > Thanks for the explanation. > > I guess the commit message confused me a bit, especially these two sentences: > > "Any page of a process can be migrated to such memory. However no one should > be > allowed to pin such memory so that it can always be evicted." > > I read them as if the OS is free to choose which pages are migrated to > this memory, > and anything is eligible for migration to that memory (and that's why > we also don't > allow it to pin memory there). > > If we are not allowed to pin anything there, can the device driver > decide to disable > any option for oversubscription of this memory area ? I'm not sure I follow your thinking on how oversubscription would work here, however all allocations are controlled by the driver. So if a device's coherent memory is full a driver would be unable to migrate pages to that device until pages are freed by the OS due to being unmapped or the driver evicts pages by migrating them back to normal CPU memory. Pinning of pages is allowed, and could prevent such migrations. However this patch series prevents device coherent pages from being pinned longterm (ie. with FOLL_LONGTERM), so it should always be able to evict pages eventually. > Let's assume the user uses this memory area for doing p2p with other > CXL devices. > In that case, I wouldn't want the driver/OS to migrate pages in and > out of that memory... The OS will not migrate pages in or out (although it may free them if no longer required), but a driver might choose to. So at the moment it's really up to the driver to implement what you want in this regards. > So either I should let the user pin those pages, or prevent him from > doing (accidently or
Re: [PATCH v5 01/13] mm: add zone device coherent type memory support
On Mon, Jun 20, 2022 at 3:33 AM Alistair Popple wrote: > > > Oded Gabbay writes: > > > On Fri, Jun 17, 2022 at 8:20 PM Sierra Guiza, Alejandro (Alex) > > wrote: > >> > >> > >> On 6/17/2022 4:40 AM, David Hildenbrand wrote: > >> > On 31.05.22 22:00, Alex Sierra wrote: > >> >> Device memory that is cache coherent from device and CPU point of view. > >> >> This is used on platforms that have an advanced system bus (like CAPI > >> >> or CXL). Any page of a process can be migrated to such memory. However, > >> >> no one should be allowed to pin such memory so that it can always be > >> >> evicted. > >> >> > >> >> Signed-off-by: Alex Sierra > >> >> Acked-by: Felix Kuehling > >> >> Reviewed-by: Alistair Popple > >> >> [hch: rebased ontop of the refcount changes, > >> >>removed is_dev_private_or_coherent_page] > >> >> Signed-off-by: Christoph Hellwig > >> >> --- > >> >> include/linux/memremap.h | 19 +++ > >> >> mm/memcontrol.c | 7 --- > >> >> mm/memory-failure.c | 8 ++-- > >> >> mm/memremap.c| 10 ++ > >> >> mm/migrate_device.c | 16 +++- > >> >> mm/rmap.c| 5 +++-- > >> >> 6 files changed, 49 insertions(+), 16 deletions(-) > >> >> > >> >> diff --git a/include/linux/memremap.h b/include/linux/memremap.h > >> >> index 8af304f6b504..9f752ebed613 100644 > >> >> --- a/include/linux/memremap.h > >> >> +++ b/include/linux/memremap.h > >> >> @@ -41,6 +41,13 @@ struct vmem_altmap { > >> >>* A more complete discussion of unaddressable memory may be found in > >> >>* include/linux/hmm.h and Documentation/vm/hmm.rst. > >> >>* > >> >> + * MEMORY_DEVICE_COHERENT: > >> >> + * Device memory that is cache coherent from device and CPU point of > >> >> view. This > >> >> + * is used on platforms that have an advanced system bus (like CAPI or > >> >> CXL). A > >> >> + * driver can hotplug the device memory using ZONE_DEVICE and with > >> >> that memory > >> >> + * type. Any page of a process can be migrated to such memory. However > >> >> no one > >> > Any page might not be right, I'm pretty sure. ... just thinking about > >> > special pages > >> > like vdso, shared zeropage, ... pinned pages ... > >> > >> Hi David, > >> > >> Yes, I think you're right. This type does not cover all special pages. > >> I need to correct that on the cover letter. > >> Pinned pages are allowed as long as they're not long term pinned. > >> > >> Regards, > >> Alex Sierra > > > > What if I want to hotplug this device's coherent memory, but I do > > *not* want the OS > > to migrate any page to it ? > > I want to fully-control what resides on this memory, as I can consider > > this memory > > "expensive". i.e. I don't have a lot of it, I want to use it for > > specific purposes and > > I don't want the OS to start using it when there is some memory pressure in > > the system. > > This is exactly what MEMORY_DEVICE_COHERENT is for. Device coherent > pages are only allocated by a device driver and exposed to user-space by > a driver migrating pages to them with migrate_vma. The OS can't just > start using them due to memory pressure for example. > > - Alistair Thanks for the explanation. I guess the commit message confused me a bit, especially these two sentences: "Any page of a process can be migrated to such memory. However no one should be allowed to pin such memory so that it can always be evicted." I read them as if the OS is free to choose which pages are migrated to this memory, and anything is eligible for migration to that memory (and that's why we also don't allow it to pin memory there). If we are not allowed to pin anything there, can the device driver decide to disable any option for oversubscription of this memory area ? Let's assume the user uses this memory area for doing p2p with other CXL devices. In that case, I wouldn't want the driver/OS to migrate pages in and out of that memory... So either I should let the user pin those pages, or prevent him from doing (accidently or not) oversubscription in this memory area. wdyt ? > > > Oded > > > >> > >> > > >> >> + * should be allowed to pin such memory so that it can always be > >> >> evicted. > >> >> + * > >> >>* MEMORY_DEVICE_FS_DAX: > >> >>* Host memory that has similar access semantics as System RAM i.e. > >> >> DMA > >> >>* coherent and supports page pinning. In support of coordinating page > >> >> @@ -61,6 +68,7 @@ struct vmem_altmap { > >> >> enum memory_type { > >> >> /* 0 is reserved to catch uninitialized type fields */ > >> >> MEMORY_DEVICE_PRIVATE = 1, > >> >> +MEMORY_DEVICE_COHERENT, > >> >> MEMORY_DEVICE_FS_DAX, > >> >> MEMORY_DEVICE_GENERIC, > >> >> MEMORY_DEVICE_PCI_P2PDMA, > >> >> @@ -143,6 +151,17 @@ static inline bool folio_is_device_private(const > >> >> struct folio *folio) > >> > In general, this LGTM, and it should be correct with PageAnonExclusive I > >> > think. > >> > > >> > > >> >
Re: [PATCH v5 01/13] mm: add zone device coherent type memory support
Oded Gabbay writes: > On Fri, Jun 17, 2022 at 8:20 PM Sierra Guiza, Alejandro (Alex) > wrote: >> >> >> On 6/17/2022 4:40 AM, David Hildenbrand wrote: >> > On 31.05.22 22:00, Alex Sierra wrote: >> >> Device memory that is cache coherent from device and CPU point of view. >> >> This is used on platforms that have an advanced system bus (like CAPI >> >> or CXL). Any page of a process can be migrated to such memory. However, >> >> no one should be allowed to pin such memory so that it can always be >> >> evicted. >> >> >> >> Signed-off-by: Alex Sierra >> >> Acked-by: Felix Kuehling >> >> Reviewed-by: Alistair Popple >> >> [hch: rebased ontop of the refcount changes, >> >>removed is_dev_private_or_coherent_page] >> >> Signed-off-by: Christoph Hellwig >> >> --- >> >> include/linux/memremap.h | 19 +++ >> >> mm/memcontrol.c | 7 --- >> >> mm/memory-failure.c | 8 ++-- >> >> mm/memremap.c| 10 ++ >> >> mm/migrate_device.c | 16 +++- >> >> mm/rmap.c| 5 +++-- >> >> 6 files changed, 49 insertions(+), 16 deletions(-) >> >> >> >> diff --git a/include/linux/memremap.h b/include/linux/memremap.h >> >> index 8af304f6b504..9f752ebed613 100644 >> >> --- a/include/linux/memremap.h >> >> +++ b/include/linux/memremap.h >> >> @@ -41,6 +41,13 @@ struct vmem_altmap { >> >>* A more complete discussion of unaddressable memory may be found in >> >>* include/linux/hmm.h and Documentation/vm/hmm.rst. >> >>* >> >> + * MEMORY_DEVICE_COHERENT: >> >> + * Device memory that is cache coherent from device and CPU point of >> >> view. This >> >> + * is used on platforms that have an advanced system bus (like CAPI or >> >> CXL). A >> >> + * driver can hotplug the device memory using ZONE_DEVICE and with that >> >> memory >> >> + * type. Any page of a process can be migrated to such memory. However >> >> no one >> > Any page might not be right, I'm pretty sure. ... just thinking about >> > special pages >> > like vdso, shared zeropage, ... pinned pages ... >> >> Hi David, >> >> Yes, I think you're right. This type does not cover all special pages. >> I need to correct that on the cover letter. >> Pinned pages are allowed as long as they're not long term pinned. >> >> Regards, >> Alex Sierra > > What if I want to hotplug this device's coherent memory, but I do > *not* want the OS > to migrate any page to it ? > I want to fully-control what resides on this memory, as I can consider > this memory > "expensive". i.e. I don't have a lot of it, I want to use it for > specific purposes and > I don't want the OS to start using it when there is some memory pressure in > the system. This is exactly what MEMORY_DEVICE_COHERENT is for. Device coherent pages are only allocated by a device driver and exposed to user-space by a driver migrating pages to them with migrate_vma. The OS can't just start using them due to memory pressure for example. - Alistair > Oded > >> >> > >> >> + * should be allowed to pin such memory so that it can always be evicted. >> >> + * >> >>* MEMORY_DEVICE_FS_DAX: >> >>* Host memory that has similar access semantics as System RAM i.e. DMA >> >>* coherent and supports page pinning. In support of coordinating page >> >> @@ -61,6 +68,7 @@ struct vmem_altmap { >> >> enum memory_type { >> >> /* 0 is reserved to catch uninitialized type fields */ >> >> MEMORY_DEVICE_PRIVATE = 1, >> >> +MEMORY_DEVICE_COHERENT, >> >> MEMORY_DEVICE_FS_DAX, >> >> MEMORY_DEVICE_GENERIC, >> >> MEMORY_DEVICE_PCI_P2PDMA, >> >> @@ -143,6 +151,17 @@ static inline bool folio_is_device_private(const >> >> struct folio *folio) >> > In general, this LGTM, and it should be correct with PageAnonExclusive I >> > think. >> > >> > >> > However, where exactly is pinning forbidden? >> >> Long-term pinning is forbidden since it would interfere with the device >> memory manager owning the >> device-coherent pages (e.g. evictions in TTM). However, normal pinning >> is allowed on this device type. >> >> Regards, >> Alex Sierra >> >> >
Re: [PATCH v5 01/13] mm: add zone device coherent type memory support
On Fri, Jun 17, 2022 at 8:20 PM Sierra Guiza, Alejandro (Alex) wrote: > > > On 6/17/2022 4:40 AM, David Hildenbrand wrote: > > On 31.05.22 22:00, Alex Sierra wrote: > >> Device memory that is cache coherent from device and CPU point of view. > >> This is used on platforms that have an advanced system bus (like CAPI > >> or CXL). Any page of a process can be migrated to such memory. However, > >> no one should be allowed to pin such memory so that it can always be > >> evicted. > >> > >> Signed-off-by: Alex Sierra > >> Acked-by: Felix Kuehling > >> Reviewed-by: Alistair Popple > >> [hch: rebased ontop of the refcount changes, > >>removed is_dev_private_or_coherent_page] > >> Signed-off-by: Christoph Hellwig > >> --- > >> include/linux/memremap.h | 19 +++ > >> mm/memcontrol.c | 7 --- > >> mm/memory-failure.c | 8 ++-- > >> mm/memremap.c| 10 ++ > >> mm/migrate_device.c | 16 +++- > >> mm/rmap.c| 5 +++-- > >> 6 files changed, 49 insertions(+), 16 deletions(-) > >> > >> diff --git a/include/linux/memremap.h b/include/linux/memremap.h > >> index 8af304f6b504..9f752ebed613 100644 > >> --- a/include/linux/memremap.h > >> +++ b/include/linux/memremap.h > >> @@ -41,6 +41,13 @@ struct vmem_altmap { > >>* A more complete discussion of unaddressable memory may be found in > >>* include/linux/hmm.h and Documentation/vm/hmm.rst. > >>* > >> + * MEMORY_DEVICE_COHERENT: > >> + * Device memory that is cache coherent from device and CPU point of > >> view. This > >> + * is used on platforms that have an advanced system bus (like CAPI or > >> CXL). A > >> + * driver can hotplug the device memory using ZONE_DEVICE and with that > >> memory > >> + * type. Any page of a process can be migrated to such memory. However no > >> one > > Any page might not be right, I'm pretty sure. ... just thinking about > > special pages > > like vdso, shared zeropage, ... pinned pages ... > > Hi David, > > Yes, I think you're right. This type does not cover all special pages. > I need to correct that on the cover letter. > Pinned pages are allowed as long as they're not long term pinned. > > Regards, > Alex Sierra What if I want to hotplug this device's coherent memory, but I do *not* want the OS to migrate any page to it ? I want to fully-control what resides on this memory, as I can consider this memory "expensive". i.e. I don't have a lot of it, I want to use it for specific purposes and I don't want the OS to start using it when there is some memory pressure in the system. Oded > > > > >> + * should be allowed to pin such memory so that it can always be evicted. > >> + * > >>* MEMORY_DEVICE_FS_DAX: > >>* Host memory that has similar access semantics as System RAM i.e. DMA > >>* coherent and supports page pinning. In support of coordinating page > >> @@ -61,6 +68,7 @@ struct vmem_altmap { > >> enum memory_type { > >> /* 0 is reserved to catch uninitialized type fields */ > >> MEMORY_DEVICE_PRIVATE = 1, > >> +MEMORY_DEVICE_COHERENT, > >> MEMORY_DEVICE_FS_DAX, > >> MEMORY_DEVICE_GENERIC, > >> MEMORY_DEVICE_PCI_P2PDMA, > >> @@ -143,6 +151,17 @@ static inline bool folio_is_device_private(const > >> struct folio *folio) > > In general, this LGTM, and it should be correct with PageAnonExclusive I > > think. > > > > > > However, where exactly is pinning forbidden? > > Long-term pinning is forbidden since it would interfere with the device > memory manager owning the > device-coherent pages (e.g. evictions in TTM). However, normal pinning > is allowed on this device type. > > Regards, > Alex Sierra > > >
Re: [PATCH v5 01/13] mm: add zone device coherent type memory support
On 17.06.22 21:27, Sierra Guiza, Alejandro (Alex) wrote: > > On 6/17/2022 12:33 PM, David Hildenbrand wrote: >> On 17.06.22 19:20, Sierra Guiza, Alejandro (Alex) wrote: >>> On 6/17/2022 4:40 AM, David Hildenbrand wrote: On 31.05.22 22:00, Alex Sierra wrote: > Device memory that is cache coherent from device and CPU point of view. > This is used on platforms that have an advanced system bus (like CAPI > or CXL). Any page of a process can be migrated to such memory. However, > no one should be allowed to pin such memory so that it can always be > evicted. > > Signed-off-by: Alex Sierra > Acked-by: Felix Kuehling > Reviewed-by: Alistair Popple > [hch: rebased ontop of the refcount changes, > removed is_dev_private_or_coherent_page] > Signed-off-by: Christoph Hellwig > --- >include/linux/memremap.h | 19 +++ >mm/memcontrol.c | 7 --- >mm/memory-failure.c | 8 ++-- >mm/memremap.c| 10 ++ >mm/migrate_device.c | 16 +++- >mm/rmap.c| 5 +++-- >6 files changed, 49 insertions(+), 16 deletions(-) > > diff --git a/include/linux/memremap.h b/include/linux/memremap.h > index 8af304f6b504..9f752ebed613 100644 > --- a/include/linux/memremap.h > +++ b/include/linux/memremap.h > @@ -41,6 +41,13 @@ struct vmem_altmap { > * A more complete discussion of unaddressable memory may be found in > * include/linux/hmm.h and Documentation/vm/hmm.rst. > * > + * MEMORY_DEVICE_COHERENT: > + * Device memory that is cache coherent from device and CPU point of > view. This > + * is used on platforms that have an advanced system bus (like CAPI or > CXL). A > + * driver can hotplug the device memory using ZONE_DEVICE and with that > memory > + * type. Any page of a process can be migrated to such memory. However > no one Any page might not be right, I'm pretty sure. ... just thinking about special pages like vdso, shared zeropage, ... pinned pages ... >> Well, you cannot migrate long term pages, that's what I meant :) >> > + * should be allowed to pin such memory so that it can always be evicted. > + * > * MEMORY_DEVICE_FS_DAX: > * Host memory that has similar access semantics as System RAM i.e. DMA > * coherent and supports page pinning. In support of coordinating page > @@ -61,6 +68,7 @@ struct vmem_altmap { >enum memory_type { > /* 0 is reserved to catch uninitialized type fields */ > MEMORY_DEVICE_PRIVATE = 1, > + MEMORY_DEVICE_COHERENT, > MEMORY_DEVICE_FS_DAX, > MEMORY_DEVICE_GENERIC, > MEMORY_DEVICE_PCI_P2PDMA, > @@ -143,6 +151,17 @@ static inline bool folio_is_device_private(const > struct folio *folio) In general, this LGTM, and it should be correct with PageAnonExclusive I think. However, where exactly is pinning forbidden? >>> Long-term pinning is forbidden since it would interfere with the device >>> memory manager owning the >>> device-coherent pages (e.g. evictions in TTM). However, normal pinning >>> is allowed on this device type. >> I don't see updates to folio_is_pinnable() in this patch. > Device coherent type pages should return true here, as they are pinnable > pages. That function is only called for long-term pinnings in try_grab_folio(). >> >> So wouldn't try_grab_folio() simply pin these pages? What am I missing? > > As far as I understand this return NULL for long term pin pages. > Otherwise they get refcount incremented. I don't follow. You're saying a) folio_is_pinnable() returns true for device coherent pages and that b) device coherent pages don't get long-term pinned Yet, the code says struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags) { if (flags & FOLL_GET) return try_get_folio(page, refs); else if (flags & FOLL_PIN) { struct folio *folio; /* * Can't do FOLL_LONGTERM + FOLL_PIN gup fast path if not in a * right zone, so fail and let the caller fall back to the slow * path. */ if (unlikely((flags & FOLL_LONGTERM) && !is_pinnable_page(page))) return NULL; ... return folio; } } What prevents these pages from getting long-term pinned as stated in this patch? I am probably missing something important. -- Thanks, David / dhildenb
Re: [PATCH v5 01/13] mm: add zone device coherent type memory support
On 6/17/2022 12:33 PM, David Hildenbrand wrote: On 17.06.22 19:20, Sierra Guiza, Alejandro (Alex) wrote: On 6/17/2022 4:40 AM, David Hildenbrand wrote: On 31.05.22 22:00, Alex Sierra wrote: Device memory that is cache coherent from device and CPU point of view. This is used on platforms that have an advanced system bus (like CAPI or CXL). Any page of a process can be migrated to such memory. However, no one should be allowed to pin such memory so that it can always be evicted. Signed-off-by: Alex Sierra Acked-by: Felix Kuehling Reviewed-by: Alistair Popple [hch: rebased ontop of the refcount changes, removed is_dev_private_or_coherent_page] Signed-off-by: Christoph Hellwig --- include/linux/memremap.h | 19 +++ mm/memcontrol.c | 7 --- mm/memory-failure.c | 8 ++-- mm/memremap.c| 10 ++ mm/migrate_device.c | 16 +++- mm/rmap.c| 5 +++-- 6 files changed, 49 insertions(+), 16 deletions(-) diff --git a/include/linux/memremap.h b/include/linux/memremap.h index 8af304f6b504..9f752ebed613 100644 --- a/include/linux/memremap.h +++ b/include/linux/memremap.h @@ -41,6 +41,13 @@ struct vmem_altmap { * A more complete discussion of unaddressable memory may be found in * include/linux/hmm.h and Documentation/vm/hmm.rst. * + * MEMORY_DEVICE_COHERENT: + * Device memory that is cache coherent from device and CPU point of view. This + * is used on platforms that have an advanced system bus (like CAPI or CXL). A + * driver can hotplug the device memory using ZONE_DEVICE and with that memory + * type. Any page of a process can be migrated to such memory. However no one Any page might not be right, I'm pretty sure. ... just thinking about special pages like vdso, shared zeropage, ... pinned pages ... Well, you cannot migrate long term pages, that's what I meant :) + * should be allowed to pin such memory so that it can always be evicted. + * * MEMORY_DEVICE_FS_DAX: * Host memory that has similar access semantics as System RAM i.e. DMA * coherent and supports page pinning. In support of coordinating page @@ -61,6 +68,7 @@ struct vmem_altmap { enum memory_type { /* 0 is reserved to catch uninitialized type fields */ MEMORY_DEVICE_PRIVATE = 1, + MEMORY_DEVICE_COHERENT, MEMORY_DEVICE_FS_DAX, MEMORY_DEVICE_GENERIC, MEMORY_DEVICE_PCI_P2PDMA, @@ -143,6 +151,17 @@ static inline bool folio_is_device_private(const struct folio *folio) In general, this LGTM, and it should be correct with PageAnonExclusive I think. However, where exactly is pinning forbidden? Long-term pinning is forbidden since it would interfere with the device memory manager owning the device-coherent pages (e.g. evictions in TTM). However, normal pinning is allowed on this device type. I don't see updates to folio_is_pinnable() in this patch. Device coherent type pages should return true here, as they are pinnable pages. So wouldn't try_grab_folio() simply pin these pages? What am I missing? As far as I understand this return NULL for long term pin pages. Otherwise they get refcount incremented. Regards, Alex Sierra
Re: [PATCH v5 01/13] mm: add zone device coherent type memory support
On 17.06.22 19:20, Sierra Guiza, Alejandro (Alex) wrote: > > On 6/17/2022 4:40 AM, David Hildenbrand wrote: >> On 31.05.22 22:00, Alex Sierra wrote: >>> Device memory that is cache coherent from device and CPU point of view. >>> This is used on platforms that have an advanced system bus (like CAPI >>> or CXL). Any page of a process can be migrated to such memory. However, >>> no one should be allowed to pin such memory so that it can always be >>> evicted. >>> >>> Signed-off-by: Alex Sierra >>> Acked-by: Felix Kuehling >>> Reviewed-by: Alistair Popple >>> [hch: rebased ontop of the refcount changes, >>>removed is_dev_private_or_coherent_page] >>> Signed-off-by: Christoph Hellwig >>> --- >>> include/linux/memremap.h | 19 +++ >>> mm/memcontrol.c | 7 --- >>> mm/memory-failure.c | 8 ++-- >>> mm/memremap.c| 10 ++ >>> mm/migrate_device.c | 16 +++- >>> mm/rmap.c| 5 +++-- >>> 6 files changed, 49 insertions(+), 16 deletions(-) >>> >>> diff --git a/include/linux/memremap.h b/include/linux/memremap.h >>> index 8af304f6b504..9f752ebed613 100644 >>> --- a/include/linux/memremap.h >>> +++ b/include/linux/memremap.h >>> @@ -41,6 +41,13 @@ struct vmem_altmap { >>>* A more complete discussion of unaddressable memory may be found in >>>* include/linux/hmm.h and Documentation/vm/hmm.rst. >>>* >>> + * MEMORY_DEVICE_COHERENT: >>> + * Device memory that is cache coherent from device and CPU point of view. >>> This >>> + * is used on platforms that have an advanced system bus (like CAPI or >>> CXL). A >>> + * driver can hotplug the device memory using ZONE_DEVICE and with that >>> memory >>> + * type. Any page of a process can be migrated to such memory. However no >>> one >> Any page might not be right, I'm pretty sure. ... just thinking about >> special pages >> like vdso, shared zeropage, ... pinned pages ... > Well, you cannot migrate long term pages, that's what I meant :) >> >>> + * should be allowed to pin such memory so that it can always be evicted. >>> + * >>>* MEMORY_DEVICE_FS_DAX: >>>* Host memory that has similar access semantics as System RAM i.e. DMA >>>* coherent and supports page pinning. In support of coordinating page >>> @@ -61,6 +68,7 @@ struct vmem_altmap { >>> enum memory_type { >>> /* 0 is reserved to catch uninitialized type fields */ >>> MEMORY_DEVICE_PRIVATE = 1, >>> + MEMORY_DEVICE_COHERENT, >>> MEMORY_DEVICE_FS_DAX, >>> MEMORY_DEVICE_GENERIC, >>> MEMORY_DEVICE_PCI_P2PDMA, >>> @@ -143,6 +151,17 @@ static inline bool folio_is_device_private(const >>> struct folio *folio) >> In general, this LGTM, and it should be correct with PageAnonExclusive I >> think. >> >> >> However, where exactly is pinning forbidden? > > Long-term pinning is forbidden since it would interfere with the device > memory manager owning the > device-coherent pages (e.g. evictions in TTM). However, normal pinning > is allowed on this device type. I don't see updates to folio_is_pinnable() in this patch. So wouldn't try_grab_folio() simply pin these pages? What am I missing? -- Thanks, David / dhildenb
Re: [PATCH v5 01/13] mm: add zone device coherent type memory support
On 6/17/2022 4:40 AM, David Hildenbrand wrote: On 31.05.22 22:00, Alex Sierra wrote: Device memory that is cache coherent from device and CPU point of view. This is used on platforms that have an advanced system bus (like CAPI or CXL). Any page of a process can be migrated to such memory. However, no one should be allowed to pin such memory so that it can always be evicted. Signed-off-by: Alex Sierra Acked-by: Felix Kuehling Reviewed-by: Alistair Popple [hch: rebased ontop of the refcount changes, removed is_dev_private_or_coherent_page] Signed-off-by: Christoph Hellwig --- include/linux/memremap.h | 19 +++ mm/memcontrol.c | 7 --- mm/memory-failure.c | 8 ++-- mm/memremap.c| 10 ++ mm/migrate_device.c | 16 +++- mm/rmap.c| 5 +++-- 6 files changed, 49 insertions(+), 16 deletions(-) diff --git a/include/linux/memremap.h b/include/linux/memremap.h index 8af304f6b504..9f752ebed613 100644 --- a/include/linux/memremap.h +++ b/include/linux/memremap.h @@ -41,6 +41,13 @@ struct vmem_altmap { * A more complete discussion of unaddressable memory may be found in * include/linux/hmm.h and Documentation/vm/hmm.rst. * + * MEMORY_DEVICE_COHERENT: + * Device memory that is cache coherent from device and CPU point of view. This + * is used on platforms that have an advanced system bus (like CAPI or CXL). A + * driver can hotplug the device memory using ZONE_DEVICE and with that memory + * type. Any page of a process can be migrated to such memory. However no one Any page might not be right, I'm pretty sure. ... just thinking about special pages like vdso, shared zeropage, ... pinned pages ... Hi David, Yes, I think you're right. This type does not cover all special pages. I need to correct that on the cover letter. Pinned pages are allowed as long as they're not long term pinned. Regards, Alex Sierra + * should be allowed to pin such memory so that it can always be evicted. + * * MEMORY_DEVICE_FS_DAX: * Host memory that has similar access semantics as System RAM i.e. DMA * coherent and supports page pinning. In support of coordinating page @@ -61,6 +68,7 @@ struct vmem_altmap { enum memory_type { /* 0 is reserved to catch uninitialized type fields */ MEMORY_DEVICE_PRIVATE = 1, + MEMORY_DEVICE_COHERENT, MEMORY_DEVICE_FS_DAX, MEMORY_DEVICE_GENERIC, MEMORY_DEVICE_PCI_P2PDMA, @@ -143,6 +151,17 @@ static inline bool folio_is_device_private(const struct folio *folio) In general, this LGTM, and it should be correct with PageAnonExclusive I think. However, where exactly is pinning forbidden? Long-term pinning is forbidden since it would interfere with the device memory manager owning the device-coherent pages (e.g. evictions in TTM). However, normal pinning is allowed on this device type. Regards, Alex Sierra
Re: [PATCH v5 01/13] mm: add zone device coherent type memory support
On 31.05.22 22:00, Alex Sierra wrote: > Device memory that is cache coherent from device and CPU point of view. > This is used on platforms that have an advanced system bus (like CAPI > or CXL). Any page of a process can be migrated to such memory. However, > no one should be allowed to pin such memory so that it can always be > evicted. > > Signed-off-by: Alex Sierra > Acked-by: Felix Kuehling > Reviewed-by: Alistair Popple > [hch: rebased ontop of the refcount changes, > removed is_dev_private_or_coherent_page] > Signed-off-by: Christoph Hellwig > --- > include/linux/memremap.h | 19 +++ > mm/memcontrol.c | 7 --- > mm/memory-failure.c | 8 ++-- > mm/memremap.c| 10 ++ > mm/migrate_device.c | 16 +++- > mm/rmap.c| 5 +++-- > 6 files changed, 49 insertions(+), 16 deletions(-) > > diff --git a/include/linux/memremap.h b/include/linux/memremap.h > index 8af304f6b504..9f752ebed613 100644 > --- a/include/linux/memremap.h > +++ b/include/linux/memremap.h > @@ -41,6 +41,13 @@ struct vmem_altmap { > * A more complete discussion of unaddressable memory may be found in > * include/linux/hmm.h and Documentation/vm/hmm.rst. > * > + * MEMORY_DEVICE_COHERENT: > + * Device memory that is cache coherent from device and CPU point of view. > This > + * is used on platforms that have an advanced system bus (like CAPI or CXL). > A > + * driver can hotplug the device memory using ZONE_DEVICE and with that > memory > + * type. Any page of a process can be migrated to such memory. However no one Any page might not be right, I'm pretty sure. ... just thinking about special pages like vdso, shared zeropage, ... pinned pages ... > + * should be allowed to pin such memory so that it can always be evicted. > + * > * MEMORY_DEVICE_FS_DAX: > * Host memory that has similar access semantics as System RAM i.e. DMA > * coherent and supports page pinning. In support of coordinating page > @@ -61,6 +68,7 @@ struct vmem_altmap { > enum memory_type { > /* 0 is reserved to catch uninitialized type fields */ > MEMORY_DEVICE_PRIVATE = 1, > + MEMORY_DEVICE_COHERENT, > MEMORY_DEVICE_FS_DAX, > MEMORY_DEVICE_GENERIC, > MEMORY_DEVICE_PCI_P2PDMA, > @@ -143,6 +151,17 @@ static inline bool folio_is_device_private(const struct > folio *folio) In general, this LGTM, and it should be correct with PageAnonExclusive I think. However, where exactly is pinning forbidden? -- Thanks, David / dhildenb
[PATCH v5 01/13] mm: add zone device coherent type memory support
Device memory that is cache coherent from device and CPU point of view. This is used on platforms that have an advanced system bus (like CAPI or CXL). Any page of a process can be migrated to such memory. However, no one should be allowed to pin such memory so that it can always be evicted. Signed-off-by: Alex Sierra Acked-by: Felix Kuehling Reviewed-by: Alistair Popple [hch: rebased ontop of the refcount changes, removed is_dev_private_or_coherent_page] Signed-off-by: Christoph Hellwig --- include/linux/memremap.h | 19 +++ mm/memcontrol.c | 7 --- mm/memory-failure.c | 8 ++-- mm/memremap.c| 10 ++ mm/migrate_device.c | 16 +++- mm/rmap.c| 5 +++-- 6 files changed, 49 insertions(+), 16 deletions(-) diff --git a/include/linux/memremap.h b/include/linux/memremap.h index 8af304f6b504..9f752ebed613 100644 --- a/include/linux/memremap.h +++ b/include/linux/memremap.h @@ -41,6 +41,13 @@ struct vmem_altmap { * A more complete discussion of unaddressable memory may be found in * include/linux/hmm.h and Documentation/vm/hmm.rst. * + * MEMORY_DEVICE_COHERENT: + * Device memory that is cache coherent from device and CPU point of view. This + * is used on platforms that have an advanced system bus (like CAPI or CXL). A + * driver can hotplug the device memory using ZONE_DEVICE and with that memory + * type. Any page of a process can be migrated to such memory. However no one + * should be allowed to pin such memory so that it can always be evicted. + * * MEMORY_DEVICE_FS_DAX: * Host memory that has similar access semantics as System RAM i.e. DMA * coherent and supports page pinning. In support of coordinating page @@ -61,6 +68,7 @@ struct vmem_altmap { enum memory_type { /* 0 is reserved to catch uninitialized type fields */ MEMORY_DEVICE_PRIVATE = 1, + MEMORY_DEVICE_COHERENT, MEMORY_DEVICE_FS_DAX, MEMORY_DEVICE_GENERIC, MEMORY_DEVICE_PCI_P2PDMA, @@ -143,6 +151,17 @@ static inline bool folio_is_device_private(const struct folio *folio) return is_device_private_page(>page); } +static inline bool is_device_coherent_page(const struct page *page) +{ + return is_zone_device_page(page) && + page->pgmap->type == MEMORY_DEVICE_COHERENT; +} + +static inline bool folio_is_device_coherent(const struct folio *folio) +{ + return is_device_coherent_page(>page); +} + static inline bool is_pci_p2pdma_page(const struct page *page) { return IS_ENABLED(CONFIG_PCI_P2PDMA) && diff --git a/mm/memcontrol.c b/mm/memcontrol.c index abec50f31fe6..93f80d7ca148 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -5665,8 +5665,8 @@ static int mem_cgroup_move_account(struct page *page, * 2(MC_TARGET_SWAP): if the swap entry corresponding to this pte is a * target for charge migration. if @target is not NULL, the entry is stored * in target->ent. - * 3(MC_TARGET_DEVICE): like MC_TARGET_PAGE but page is MEMORY_DEVICE_PRIVATE - * (so ZONE_DEVICE page and thus not on the lru). + * 3(MC_TARGET_DEVICE): like MC_TARGET_PAGE but page is device memory and + * thus not on the lru. * For now we such page is charge like a regular page would be as for all * intent and purposes it is just special memory taking the place of a * regular page. @@ -5704,7 +5704,8 @@ static enum mc_target_type get_mctgt_type(struct vm_area_struct *vma, */ if (page_memcg(page) == mc.from) { ret = MC_TARGET_PAGE; - if (is_device_private_page(page)) + if (is_device_private_page(page) || + is_device_coherent_page(page)) ret = MC_TARGET_DEVICE; if (target) target->page = page; diff --git a/mm/memory-failure.c b/mm/memory-failure.c index b85661cbdc4a..0b6a0a01ee09 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -1683,12 +1683,16 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags, goto unlock; } - if (pgmap->type == MEMORY_DEVICE_PRIVATE) { + switch (pgmap->type) { + case MEMORY_DEVICE_PRIVATE: + case MEMORY_DEVICE_COHERENT: /* -* TODO: Handle HMM pages which may need coordination +* TODO: Handle device pages which may need coordination * with device-side memory. */ goto unlock; + default: + break; } /* diff --git a/mm/memremap.c b/mm/memremap.c index 2b92e97cb25b..dbd2631b3520 100644 --- a/mm/memremap.c +++ b/mm/memremap.c @@ -315,6 +315,16 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid) return ERR_PTR(-EINVAL); } break; + case