Re: [PATCH v11 07/14] HMM: mm add helper to update page table when migrating memory v2.

2015-10-22 Thread Hillf Danton
> > > This is a multi-stage process: first we save and replace the page
> > > table entry with a special HMM entry, also flushing the TLB in the
> > > process. If we run into a non-allocated entry we either use the
> > > zero page or allocate a new page. For swapped entries we try to
> > > swap them in.
> > >
> > Please elaborate why swap entry is handled this way.
> 
> So first, this only applies when you have a device that uses HMM and
> that uses memory migration. So far it only makes sense for discrete
> GPUs, so regular workloads that do not use a GPU with HMM are not
> impacted and will not go through this code path.
> 
> Now, here we are migrating memory because the device driver asked for
> it, so presumably we expect that the device will use that memory;
> hence we want to swap in anything that has been swapped out to disk.
> Once it is swapped into memory we copy it to device memory and free
> the pages. So in the end we only need to allocate a page temporarily
> until we move things to the device.
> 
I would prefer this to be in the log message.

thanks
Hillf

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v11 07/14] HMM: mm add helper to update page table when migrating memory v2.

2015-10-22 Thread Jerome Glisse
On Thu, Oct 22, 2015 at 05:46:47PM +0800, Hillf Danton wrote:
> > 
> > This is a multi-stage process: first we save and replace the page
> > table entry with a special HMM entry, also flushing the TLB in the
> > process. If we run into a non-allocated entry we either use the
> > zero page or allocate a new page. For swapped entries we try to
> > swap them in.
> > 
> Please elaborate why swap entry is handled this way.

So first, this only applies when you have a device that uses HMM and
that uses memory migration. So far it only makes sense for discrete
GPUs, so regular workloads that do not use a GPU with HMM are not
impacted and will not go through this code path.

Now, here we are migrating memory because the device driver asked for
it, so presumably we expect that the device will use that memory; hence
we want to swap in anything that has been swapped out to disk. Once it
is swapped into memory we copy it to device memory and free the pages.
So in the end we only need to allocate a page temporarily until we move
things to the device.
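The per-entry handling described above can be sketched as a small userspace decision model. The entry kinds, action names, and `prepare_entry()` helper below are all invented for illustration and do not exist in the kernel; this is only a sketch of the logic, not the actual implementation:

```c
#include <assert.h>

/* Invented entry kinds and actions modeling the decision described
 * in the reply above; none of these names are kernel API. */
enum entry_kind { ENTRY_PRESENT, ENTRY_EMPTY, ENTRY_SWAP };
enum action { USE_PAGE, USE_ZERO_PAGE, ALLOC_PAGE, SWAPIN_FIRST };

/* For each CPU page table entry: present pages migrate directly, empty
 * entries get the zero page (read) or a fresh page (write), and swap
 * entries are faulted back in before being copied to device memory
 * (after the copy, the temporary page is freed). */
static enum action prepare_entry(enum entry_kind kind, int write)
{
	switch (kind) {
	case ENTRY_PRESENT:
		return USE_PAGE;
	case ENTRY_EMPTY:
		return write ? ALLOC_PAGE : USE_ZERO_PAGE;
	case ENTRY_SWAP:
	default:
		return SWAPIN_FIRST;	/* swap in now, migrate after */
	}
}
```

The point of `SWAPIN_FIRST` is exactly the rationale above: since the device is about to use the range, paying the swap-in cost up front is expected to be worthwhile.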

Cheers,
Jérôme


Re: [PATCH v11 07/14] HMM: mm add helper to update page table when migrating memory v2.

2015-10-22 Thread Hillf Danton
> 
> This is a multi-stage process: first we save and replace the page
> table entry with a special HMM entry, also flushing the TLB in the
> process. If we run into a non-allocated entry we either use the
> zero page or allocate a new page. For swapped entries we try to
> swap them in.
> 
Please elaborate why swap entry is handled this way.




[PATCH v11 07/14] HMM: mm add helper to update page table when migrating memory v2.

2015-10-21 Thread Jérôme Glisse
To migrate memory to remote memory we need to unmap a range of
anonymous memory from the CPU page table and replace the page table
entries with special HMM entries.

This is a multi-stage process: first we save and replace the page table
entry with a special HMM entry, also flushing the TLB in the process.
If we run into a non-allocated entry we either use the zero page or
allocate a new page. For swapped entries we try to swap them in.

Once we have set the page table entries to the special entry, we check
the page backing each address to make sure that only page table
mappings are holding references on the page, which means we can safely
migrate the page to device memory. Because the CPU page table entries
are special entries, no get_user_pages() can reference the page any
longer, so we are safe from races on that front. Note that the page can
still be referenced by get_user_pages() from another process, but in
that case the page is write protected and, as we drop neither the
mapcount nor the page count, we know that all users of
get_user_pages() are only doing read-only accesses (on a write access
they would allocate a new page).

Once we have identified all the pages that are safe to migrate, the
first function returns and lets HMM schedule the migration with the
device driver.

Finally, there is a cleanup function that drops the mapcount and
reference count on all pages that have been successfully migrated, or
restores the page table entries otherwise.
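The save/replace and cleanup stages described above can be modeled with a small self-contained userspace sketch. The special-entry encoding, struct, and function names below are invented for illustration; the real code manipulates hardware page tables under the appropriate locks and is far more involved:

```c
#include <stddef.h>

/* Stand-in for the special HMM entry that blocks CPU access. */
#define HMM_SPECIAL (~0UL)

/* Toy model of one mapped page (names invented for this sketch). */
struct model_page {
	unsigned long pte;	/* the "page table entry" */
	int extra_refs;		/* references beside the mapping itself */
};

/* Stage 1: save each entry, replace it with the special HMM entry, and
 * count how many pages only the mapping references (safe to migrate). */
static int model_migrate(struct model_page *p, unsigned long *save_pte,
			 size_t n)
{
	int safe = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		save_pte[i] = p[i].pte;		/* remember for restore */
		p[i].pte = HMM_SPECIAL;		/* block CPU access */
		if (p[i].extra_refs == 0)
			safe++;
	}
	return safe;
}

/* Final stage (cleanup): clear entries whose pages migrated, restore
 * the saved entry for pages that had extra references. */
static void model_cleanup(struct model_page *p, unsigned long *save_pte,
			  size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		p[i].pte = (p[i].extra_refs == 0) ? 0 : save_pte[i];
}
```

The middle stage, where HMM and the driver copy the safe pages to device memory, happens between these two calls in the real flow.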

Changed since v1:
  - Fix pmd/pte allocation when migrating.
  - Fix reverse logic on mm_forbids_zeropage()
  - Add comment on why we add new pages to the lru list.

Signed-off-by: Jérôme Glisse 
---
 include/linux/mm.h |  14 ++
 mm/memory.c| 471 +
 2 files changed, 485 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 3cb884f..f478076 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2345,6 +2345,20 @@ static inline void hmm_mm_init(struct mm_struct *mm)
mm->hmm = NULL;
 }
 
+int mm_hmm_migrate(struct mm_struct *mm,
+  struct vm_area_struct *vma,
+  pte_t *save_pte,
+  bool *backoff,
+  const void *mmu_notifier_exclude,
+  unsigned long start,
+  unsigned long end);
+void mm_hmm_migrate_cleanup(struct mm_struct *mm,
+   struct vm_area_struct *vma,
+   pte_t *save_pte,
+   dma_addr_t *hmm_pte,
+   unsigned long start,
+   unsigned long end);
+
 int mm_hmm_migrate_back(struct mm_struct *mm,
struct vm_area_struct *vma,
pte_t *new_pte,
diff --git a/mm/memory.c b/mm/memory.c
index 4b90e8b..268569e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -54,6 +54,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -3757,6 +3758,476 @@ void mm_hmm_migrate_back_cleanup(struct mm_struct *mm,
}
 }
 EXPORT_SYMBOL(mm_hmm_migrate_back_cleanup);
+
+/* mm_hmm_migrate() - unmap range and set special HMM pte for it.
+ *
+ * @mm: The mm struct.
+ * @vma: The vm area struct the range is in.
+ * @save_pte: Array in which to save the current CPU page table entry values.
+ * @backoff: Pointer to a boolean indicating that we need to stop.
+ * @exclude: The mmu_notifier listener to exclude from mmu_notifier callbacks.
+ * @start: Start address of the range (inclusive).
+ * @end: End address of the range (exclusive).
+ * Returns: 0 on success, -EINVAL if some arguments were invalid, -ENOMEM if
+ * it failed to allocate memory for performing the operation, -EFAULT if some
+ * memory backing the range is in a bad state, -EAGAIN if the backoff flag
+ * turned true.
+ *
+ * The process of memory migration is a bit involved: first we must set all
+ * CPU page table entries to the special HMM locked entry, ensuring exclusive
+ * control over the page table entries (ie no other process can change the
+ * page table but us).
+ *
+ * While doing that we must handle empty and swapped entries. For empty
+ * entries we either use the zero page or allocate a new page. For swap
+ * entries we call __handle_mm_fault() to try to fault the page in (a swap
+ * entry can be a number of things).
+ *
+ * Once we have unmapped, we need to check that we can effectively migrate
+ * the page, by testing that no one is holding a reference on the page
+ * besides the reference taken by each page mapping.
+ *
+ * On success every valid entry inside save_pte array is an entry that can be
+ * migrated.
+ *
+ * Note that this function does not free any of the pages, nor does it
+ * update the various memcg counters (the exception being accounting for new
+ * allocations). That happens inside the mm_hmm_migrate_cleanup() function.
+ *
+ */
+int mm_hmm_migrate(struct mm_struct *mm,
+  struct vm_area_struct *vma,
+  pte_t *save_pte,
+  bool 
