Re: [PATCH v11 07/14] HMM: mm add helper to update page table when migrating memory v2.
> > > This is a multi-stage process: first we save and replace the page table
> > > entry with a special HMM entry, also flushing the TLB in the process. If
> > > we run into a non-allocated entry we either use the zero page or
> > > allocate a new page. For swapped entries we try to swap them in.
> >
> > Please elaborate why the swap entry is handled this way.
>
> So first, this path is only taken when you have a device that uses HMM
> and that device uses memory migration. So far this only makes sense for
> discrete GPUs. Regular workloads that do not use a GPU with HMM are not
> impacted and will not go through this code path.
>
> Now, here we are migrating memory because the device driver is asking
> for it, so presumably we expect that the device will use that memory;
> hence we want to swap in anything that has been swapped out to disk.
> Once it is swapped in, we copy it to device memory and free the pages.
> So in the end we only need to allocate a page temporarily, until we
> move things to the device.

I would prefer to have this explanation in the log message.

thanks
Hillf
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel"
in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v11 07/14] HMM: mm add helper to update page table when migrating memory v2.
On Thu, Oct 22, 2015 at 05:46:47PM +0800, Hillf Danton wrote:
> > This is a multi-stage process: first we save and replace the page table
> > entry with a special HMM entry, also flushing the TLB in the process. If
> > we run into a non-allocated entry we either use the zero page or
> > allocate a new page. For swapped entries we try to swap them in.
>
> Please elaborate why the swap entry is handled this way.

So first, this path is only taken when you have a device that uses HMM
and that device uses memory migration. So far this only makes sense for
discrete GPUs. Regular workloads that do not use a GPU with HMM are not
impacted and will not go through this code path.

Now, here we are migrating memory because the device driver is asking
for it, so presumably we expect that the device will use that memory;
hence we want to swap in anything that has been swapped out to disk.
Once it is swapped in, we copy it to device memory and free the pages.
So in the end we only need to allocate a page temporarily, until we
move things to the device.

Cheers,
Jérôme
Re: [PATCH v11 07/14] HMM: mm add helper to update page table when migrating memory v2.
> This is a multi-stage process: first we save and replace the page table
> entry with a special HMM entry, also flushing the TLB in the process. If
> we run into a non-allocated entry we either use the zero page or
> allocate a new page. For swapped entries we try to swap them in.

Please elaborate why the swap entry is handled this way.
[PATCH v11 07/14] HMM: mm add helper to update page table when migrating memory v2.
For doing memory migration to remote memory we need to unmap a range of
anonymous memory from the CPU page table and replace the page table
entries with special HMM entries.

This is a multi-stage process: first we save and replace each page table
entry with a special HMM entry, also flushing the TLB in the process. If
we run into a non-allocated entry we either use the zero page or
allocate a new page. For swapped entries we try to swap them in.

Once we have set the page table entries to the special entry, we check
the page backing each address to make sure that only page table mappings
are holding a reference on the page, which means we can safely migrate
the page to device memory. Because the CPU page table entries are
special entries, no get_user_pages() can reference the pages any longer,
so we are safe from races on that front.

Note that a page can still be referenced by get_user_pages() from
another process, but in that case the page is write protected, and as we
drop neither the mapcount nor the page count, we know that all users of
get_user_pages() are only doing read-only access (on write access they
would allocate a new page).

Once we have identified all the pages that are safe to migrate, the
first function returns and lets HMM schedule the migration with the
device driver. Finally, there is a cleanup function that drops the
mapcount and reference count on all pages that have been successfully
migrated, or restores the page table entries otherwise.

Changed since v1:
  - Fix pmd/pte allocation when migrating.
  - Fix reversed logic on mm_forbids_zeropage().
  - Add comment on why we add new pages to the lru list.
Signed-off-by: Jérôme Glisse
---
 include/linux/mm.h |  14 ++
 mm/memory.c        | 471 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 485 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 3cb884f..f478076 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2345,6 +2345,20 @@ static inline void hmm_mm_init(struct mm_struct *mm)
 	mm->hmm = NULL;
 }
 
+int mm_hmm_migrate(struct mm_struct *mm,
+		   struct vm_area_struct *vma,
+		   pte_t *save_pte,
+		   bool *backoff,
+		   const void *mmu_notifier_exclude,
+		   unsigned long start,
+		   unsigned long end);
+void mm_hmm_migrate_cleanup(struct mm_struct *mm,
+			    struct vm_area_struct *vma,
+			    pte_t *save_pte,
+			    dma_addr_t *hmm_pte,
+			    unsigned long start,
+			    unsigned long end);
+
 int mm_hmm_migrate_back(struct mm_struct *mm,
 			struct vm_area_struct *vma,
 			pte_t *new_pte,
diff --git a/mm/memory.c b/mm/memory.c
index 4b90e8b..268569e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -54,6 +54,7 @@
 #include <...>
 #include <...>
 #include <...>
+#include <...>
 #include <...>
 #include <...>
 #include <...>
@@ -3757,6 +3758,476 @@ void mm_hmm_migrate_back_cleanup(struct mm_struct *mm,
 	}
 }
 EXPORT_SYMBOL(mm_hmm_migrate_back_cleanup);
+
+/* mm_hmm_migrate() - unmap range and set special HMM pte for it.
+ *
+ * @mm: The mm struct.
+ * @vma: The vm area struct the range is in.
+ * @save_pte: Array where to save the current CPU page table entry values.
+ * @backoff: Pointer toward a boolean indicating that we need to stop.
+ * @exclude: The mmu_notifier listener to exclude from mmu_notifier callbacks.
+ * @start: Start address of the range (inclusive).
+ * @end: End address of the range (exclusive).
+ * Returns: 0 on success, -EINVAL if some argument was invalid, -ENOMEM if
+ * it failed allocating memory for performing the operation, -EFAULT if some
+ * memory backing the range is in a bad state, -EAGAIN if the backoff flag
+ * turned true.
+ *
+ * The process of memory migration is a bit involved: first we must set all
+ * CPU page table entries to the special HMM locked entry, ensuring us
+ * exclusive control over the page table entries (ie no other process can
+ * change the page table but us).
+ *
+ * While doing that we must handle empty and swapped entries. For an empty
+ * entry we either use the zero page or allocate a new page. For a swap entry
+ * we call __handle_mm_fault() to try to fault in the page (a swap entry can
+ * be a number of things).
+ *
+ * Once we have unmapped, we need to check that we can effectively migrate
+ * each page, by testing that no one is holding a reference on the page
+ * beside the reference taken by each page table mapping.
+ *
+ * On success, every valid entry inside the save_pte array is an entry that
+ * can be migrated.
+ *
+ * Note that this function does not free any of the pages, nor does it update
+ * the various memcg counters (the exception being accounting for new
+ * allocations). That happens inside the mm_hmm_migrate_cleanup() function.
+ */
+int mm_hmm_migrate(struct mm_struct *mm,
+		   struct vm_area_struct *vma,
+		   pte_t *save_pte,
+		   bool