Re: [PATCH] mm: migration: fix migration of huge PMD shared pages
On Tue, Aug 14, 2018 at 05:15:57PM -0700, Mike Kravetz wrote:
> On 08/14/2018 01:48 AM, Kirill A. Shutemov wrote:
> > On Mon, Aug 13, 2018 at 11:21:41PM +, Mike Kravetz wrote:
> >> On 08/13/2018 03:58 AM, Kirill A. Shutemov wrote:
> >>> On Sun, Aug 12, 2018 at 08:41:08PM -0700, Mike Kravetz wrote:
> >>>> I am not 100% sure on the required flushing, so suggestions would be
> >>>> appreciated.  This also should go to stable.  It has been around for
> >>>> a long time so still looking for an appropriate 'fixes:'.
> >>>
> >>> I believe we need flushing.  And huge_pmd_unshare() usage in
> >>> __unmap_hugepage_range() looks suspicious: I don't see how we flush TLB
> >>> in that case.
> >>
> >> Thanks Kirill,
> >>
> >> __unmap_hugepage_range() has two callers:
> >> 1) unmap_hugepage_range, which wraps the call with tlb_gather_mmu and
> >>    tlb_finish_mmu on the range.  IIUC, this should cause an appropriate
> >>    TLB flush.
> >> 2) __unmap_hugepage_range_final via unmap_single_vma.  unmap_single_vma
> >>    has three callers:
> >>    - unmap_vmas, which assumes the caller will flush the whole range
> >>      after return.
> >>    - zap_page_range, which wraps the call with tlb_gather_mmu/tlb_finish_mmu
> >>    - zap_page_range_single, which wraps the call with tlb_gather_mmu/tlb_finish_mmu
> >>
> >> So, it appears we are covered.  But, I could be missing something.
> >
> > My problem here is that the mapping that is moved by huge_pmd_unshare()
> > is not accounted into the mmu_gather and can be missed on
> > tlb_finish_mmu().
>
> Ah, I think I now see the issue you are concerned with.
>
> When huge_pmd_unshare succeeds we effectively unmap a PUD_SIZE area.
> The routine __unmap_hugepage_range may only have been passed a range
> that is a subset of PUD_SIZE.  In the case I was trying to address,
> try_to_unmap_one(), the 'range' will certainly be less than PUD_SIZE.
> Upon further thought, I think that even in the case of try_to_unmap_one
> we should flush the PUD_SIZE range.
>
> My first thought would be to embed this flushing within huge_pmd_unshare
> itself.  Perhaps, whenever huge_pmd_unshare succeeds, we should do an
> explicit:
> 	flush_cache_range(PUD_SIZE)
> 	flush_tlb_range(PUD_SIZE)
> 	mmu_notifier_invalidate_range(PUD_SIZE)
> That would take some of the burden off the callers of huge_pmd_unshare.
> However, I am not sure if the flushing calls above play nice in all the
> calling environments.  I'll look into it some more, but would appreciate
> additional comments.

I don't think it would work: flush_tlb_range() does an IPI, and calling it
under a spinlock will not go well.

I think we need to find a way to account it properly in the mmu_gather.
It's not obvious to me how.

-- 
Kirill A. Shutemov
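To make the mmu_gather idea concrete, here is a minimal sketch of what such
accounting could look like, assuming the v4.18-era __tlb_adjust_range() helper
from include/asm-generic/tlb.h.  This is illustrative only, not part of the
patch under discussion, and it only helps callers that already carry an
mmu_gather (try_to_unmap_one() does not):

/*
 * Illustrative sketch: widen an existing mmu_gather so the PUD-sized
 * area torn down by huge_pmd_unshare() is covered by the deferred
 * flush in tlb_finish_mmu().  The helper name below is hypothetical;
 * __tlb_adjust_range() is the real v4.18 primitive it leans on.
 */
static void account_unshared_pud(struct mmu_gather *tlb,
				 unsigned long address)
{
	/* a successful unshare effectively unmaps the whole PUD */
	unsigned long start = address & PUD_MASK;

	/* grows tlb->start/tlb->end to cover [start, start + PUD_SIZE) */
	__tlb_adjust_range(tlb, start, PUD_SIZE);
}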
Re: [PATCH] mm: migration: fix migration of huge PMD shared pages
On 08/14/2018 01:48 AM, Kirill A. Shutemov wrote:
> On Mon, Aug 13, 2018 at 11:21:41PM +, Mike Kravetz wrote:
>> On 08/13/2018 03:58 AM, Kirill A. Shutemov wrote:
>>> On Sun, Aug 12, 2018 at 08:41:08PM -0700, Mike Kravetz wrote:
>>>> I am not 100% sure on the required flushing, so suggestions would be
>>>> appreciated.  This also should go to stable.  It has been around for
>>>> a long time so still looking for an appropriate 'fixes:'.
>>>
>>> I believe we need flushing.  And huge_pmd_unshare() usage in
>>> __unmap_hugepage_range() looks suspicious: I don't see how we flush TLB
>>> in that case.
>>
>> Thanks Kirill,
>>
>> __unmap_hugepage_range() has two callers:
>> 1) unmap_hugepage_range, which wraps the call with tlb_gather_mmu and
>>    tlb_finish_mmu on the range.  IIUC, this should cause an appropriate
>>    TLB flush.
>> 2) __unmap_hugepage_range_final via unmap_single_vma.  unmap_single_vma
>>    has three callers:
>>    - unmap_vmas, which assumes the caller will flush the whole range
>>      after return.
>>    - zap_page_range, which wraps the call with tlb_gather_mmu/tlb_finish_mmu
>>    - zap_page_range_single, which wraps the call with tlb_gather_mmu/tlb_finish_mmu
>>
>> So, it appears we are covered.  But, I could be missing something.
>
> My problem here is that the mapping that is moved by huge_pmd_unshare()
> is not accounted into the mmu_gather and can be missed on
> tlb_finish_mmu().

Ah, I think I now see the issue you are concerned with.

When huge_pmd_unshare succeeds we effectively unmap a PUD_SIZE area.
The routine __unmap_hugepage_range may only have been passed a range
that is a subset of PUD_SIZE.  In the case I was trying to address,
try_to_unmap_one(), the 'range' will certainly be less than PUD_SIZE.
Upon further thought, I think that even in the case of try_to_unmap_one
we should flush the PUD_SIZE range.

My first thought would be to embed this flushing within huge_pmd_unshare
itself.  Perhaps, whenever huge_pmd_unshare succeeds, we should do an
explicit:
	flush_cache_range(PUD_SIZE)
	flush_tlb_range(PUD_SIZE)
	mmu_notifier_invalidate_range(PUD_SIZE)
That would take some of the burden off the callers of huge_pmd_unshare.
However, I am not sure if the flushing calls above play nice in all the
calling environments.  I'll look into it some more, but would appreciate
additional comments.

-- 
Mike Kravetz
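As a sketch of what Mike is proposing, written at a call site rather than
inside huge_pmd_unshare() itself: the helper below is hypothetical (not from
the patch), and Kirill's reply above explains why the flush_tlb_range() call
is the sticking point.  The vma parameter is needed because
flush_cache_range() and flush_tlb_range() take a vma while huge_pmd_unshare()
does not, which is one wrinkle with pushing the flush into that function:

/*
 * Hypothetical helper: flush the full PUD_SIZE range whenever an
 * unshare succeeds.  Not the actual patch; illustrative only.
 */
static bool unshare_and_flush(struct mm_struct *mm,
			      struct vm_area_struct *vma,
			      unsigned long *addr, pte_t *ptep)
{
	unsigned long start, end;

	if (!huge_pmd_unshare(mm, addr, ptep))
		return false;

	/* a successful unshare tears down a full PUD_SIZE of mappings */
	start = *addr & PUD_MASK;
	end = start + PUD_SIZE;

	flush_cache_range(vma, start, end);
	flush_tlb_range(vma, start, end);	/* may IPI: unsafe under the pte lock */
	mmu_notifier_invalidate_range(mm, start, end);
	return true;
}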
Re: [PATCH] mm: migration: fix migration of huge PMD shared pages
On Mon, Aug 13, 2018 at 11:21:41PM +, Mike Kravetz wrote:
> On 08/13/2018 03:58 AM, Kirill A. Shutemov wrote:
> > On Sun, Aug 12, 2018 at 08:41:08PM -0700, Mike Kravetz wrote:
> >> The page migration code employs try_to_unmap() to try and unmap the
> >> source page.  This is accomplished by using rmap_walk to find all
> >> vmas where the page is mapped.  This search stops when page mapcount
> >> is zero.  For shared PMD huge pages, the page map count is always 1
> >> no matter the number of mappings.  Shared mappings are tracked via
> >> the reference count of the PMD page.  Therefore, try_to_unmap stops
> >> prematurely and does not completely unmap all mappings of the source
> >> page.
> >>
> >> This problem can result in data corruption as writes to the original
> >> source page can happen after contents of the page are copied to the
> >> target page.  Hence, data is lost.
> >>
> >> This problem was originally seen as DB corruption of shared global
> >> areas after a huge page was soft offlined.  DB developers noticed
> >> they could reproduce the issue by (hotplug) offlining memory used
> >> to back huge pages.  A simple testcase can reproduce the problem by
> >> creating a shared PMD mapping (note that this must be at least
> >> PUD_SIZE in size and PUD_SIZE aligned (1GB on x86)), and using
> >> migrate_pages() to migrate process pages between nodes.
> >>
> >> To fix, have the try_to_unmap_one routine check for huge PMD sharing
> >> by calling huge_pmd_unshare for hugetlbfs huge pages.  If it is a
> >> shared mapping it will be 'unshared', which removes the page table
> >> entry and drops the reference on the PMD page.  After this, flush
> >> caches and TLB.
> >>
> >> Signed-off-by: Mike Kravetz
> >> ---
> >> I am not 100% sure on the required flushing, so suggestions would be
> >> appreciated.  This also should go to stable.  It has been around for
> >> a long time so still looking for an appropriate 'fixes:'.
> >
> > I believe we need flushing.  And huge_pmd_unshare() usage in
> > __unmap_hugepage_range() looks suspicious: I don't see how we flush TLB
> > in that case.
>
> Thanks Kirill,
>
> __unmap_hugepage_range() has two callers:
> 1) unmap_hugepage_range, which wraps the call with tlb_gather_mmu and
>    tlb_finish_mmu on the range.  IIUC, this should cause an appropriate
>    TLB flush.
> 2) __unmap_hugepage_range_final via unmap_single_vma.  unmap_single_vma
>    has three callers:
>    - unmap_vmas, which assumes the caller will flush the whole range
>      after return.
>    - zap_page_range, which wraps the call with tlb_gather_mmu/tlb_finish_mmu
>    - zap_page_range_single, which wraps the call with tlb_gather_mmu/tlb_finish_mmu
>
> So, it appears we are covered.  But, I could be missing something.

My problem here is that the mapping that is moved by huge_pmd_unshare()
is not accounted into the mmu_gather and can be missed on
tlb_finish_mmu().

-- 
Kirill A. Shutemov
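A rough sketch of a reproducer along the lines the commit message describes
(not Mike's actual testcase): it assumes x86-64 where PUD_SIZE is 1GB, a
hugetlbfs mount at /dev/hugepages with at least 1GB of 2MB huge pages
reserved, and at least two NUMA nodes.  The file name, map address, and loop
count are invented for illustration; build with -lnuma:

/* sketch: two processes share one PUD-aligned, PUD-sized hugetlb mapping */
#include <fcntl.h>
#include <numaif.h>		/* migrate_pages(); link with -lnuma */
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define GB        (1UL << 30)
#define MAP_ADDR  ((void *)(16UL * GB))	/* 1GB-aligned hint: PUD-aligned on x86 */
#define MAP_SIZE  GB			/* at least PUD_SIZE, so PMDs can be shared */

int main(void)
{
	unsigned long node0 = 1, node1 = 2;	/* bit masks for nodes 0 and 1 */
	char *addr;
	pid_t child;
	int fd, i;

	fd = open("/dev/hugepages/pmd_share_test", O_CREAT | O_RDWR, 0666);
	if (fd < 0)
		exit(1);

	addr = mmap(MAP_ADDR, MAP_SIZE, PROT_READ | PROT_WRITE,
		    MAP_SHARED | MAP_FIXED, fd, 0);
	if (addr == MAP_FAILED)
		exit(1);

	child = fork();
	if (child == 0) {
		/* second process with the same mapping, so the PMDs get shared */
		for (;;)
			memset(addr, 'c', MAP_SIZE);
	}

	for (i = 0; i < 100; i++) {
		/* bounce this process's pages between nodes 0 and 1 */
		migrate_pages(0, 64, &node0, &node1);
		memset(addr, 'p', MAP_SIZE);	/* writes can land in the stale mapping */
		migrate_pages(0, 64, &node1, &node0);
	}

	kill(child, SIGKILL);
	return 0;
}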
Re: [PATCH] mm: migration: fix migration of huge PMD shared pages
On 08/13/2018 03:58 AM, Kirill A. Shutemov wrote:
> On Sun, Aug 12, 2018 at 08:41:08PM -0700, Mike Kravetz wrote:
>> The page migration code employs try_to_unmap() to try and unmap the
>> source page.  This is accomplished by using rmap_walk to find all
>> vmas where the page is mapped.  This search stops when page mapcount
>> is zero.  For shared PMD huge pages, the page map count is always 1
>> no matter the number of mappings.  Shared mappings are tracked via
>> the reference count of the PMD page.  Therefore, try_to_unmap stops
>> prematurely and does not completely unmap all mappings of the source
>> page.
>>
>> This problem can result in data corruption as writes to the original
>> source page can happen after contents of the page are copied to the
>> target page.  Hence, data is lost.
>>
>> This problem was originally seen as DB corruption of shared global
>> areas after a huge page was soft offlined.  DB developers noticed
>> they could reproduce the issue by (hotplug) offlining memory used
>> to back huge pages.  A simple testcase can reproduce the problem by
>> creating a shared PMD mapping (note that this must be at least
>> PUD_SIZE in size and PUD_SIZE aligned (1GB on x86)), and using
>> migrate_pages() to migrate process pages between nodes.
>>
>> To fix, have the try_to_unmap_one routine check for huge PMD sharing
>> by calling huge_pmd_unshare for hugetlbfs huge pages.  If it is a
>> shared mapping it will be 'unshared', which removes the page table
>> entry and drops the reference on the PMD page.  After this, flush
>> caches and TLB.
>>
>> Signed-off-by: Mike Kravetz
>> ---
>> I am not 100% sure on the required flushing, so suggestions would be
>> appreciated.  This also should go to stable.  It has been around for
>> a long time so still looking for an appropriate 'fixes:'.
>
> I believe we need flushing.  And huge_pmd_unshare() usage in
> __unmap_hugepage_range() looks suspicious: I don't see how we flush TLB
> in that case.

Thanks Kirill,

__unmap_hugepage_range() has two callers:
1) unmap_hugepage_range, which wraps the call with tlb_gather_mmu and
   tlb_finish_mmu on the range.  IIUC, this should cause an appropriate
   TLB flush.
2) __unmap_hugepage_range_final via unmap_single_vma.  unmap_single_vma
   has three callers:
   - unmap_vmas, which assumes the caller will flush the whole range
     after return.
   - zap_page_range, which wraps the call with tlb_gather_mmu/tlb_finish_mmu
   - zap_page_range_single, which wraps the call with tlb_gather_mmu/tlb_finish_mmu

So, it appears we are covered.  But, I could be missing something.

My primary reason for asking the question was with respect to the code
added to try_to_unmap_one.  In my testing, the changes I added appeared
to be required.  Just wanted to make sure.

I need to fix a build issue and will send another version.

-- 
Mike Kravetz
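For reference, the first wrapper Mike lists looks roughly like this in v4.18
(lightly annotated paraphrase of mm/hugetlb.c, not an exact quote):

/*
 * The gather/finish pair brackets the unmap, so whatever the unmap
 * accumulated in 'tlb' is flushed at tlb_finish_mmu().
 */
void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
			  unsigned long end, struct page *ref_page)
{
	struct mm_struct *mm = vma->vm_mm;
	struct mmu_gather tlb;

	tlb_gather_mmu(&tlb, mm, start, end);	/* start collecting the range */
	__unmap_hugepage_range(&tlb, vma, start, end, ref_page);
	tlb_finish_mmu(&tlb, start, end);	/* deferred TLB flush happens here */
}

Kirill's point below is that a successful huge_pmd_unshare() inside the unmap
tears down a PUD-sized area that this gather/finish pair never learns about.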
Re: [PATCH] mm: migration: fix migration of huge PMD shared pages
On Sun, Aug 12, 2018 at 08:41:08PM -0700, Mike Kravetz wrote:
> The page migration code employs try_to_unmap() to try and unmap the
> source page.  This is accomplished by using rmap_walk to find all
> vmas where the page is mapped.  This search stops when page mapcount
> is zero.  For shared PMD huge pages, the page map count is always 1
> no matter the number of mappings.  Shared mappings are tracked via
> the reference count of the PMD page.  Therefore, try_to_unmap stops
> prematurely and does not completely unmap all mappings of the source
> page.
>
> This problem can result in data corruption as writes to the original
> source page can happen after contents of the page are copied to the
> target page.  Hence, data is lost.
>
> This problem was originally seen as DB corruption of shared global
> areas after a huge page was soft offlined.  DB developers noticed
> they could reproduce the issue by (hotplug) offlining memory used
> to back huge pages.  A simple testcase can reproduce the problem by
> creating a shared PMD mapping (note that this must be at least
> PUD_SIZE in size and PUD_SIZE aligned (1GB on x86)), and using
> migrate_pages() to migrate process pages between nodes.
>
> To fix, have the try_to_unmap_one routine check for huge PMD sharing
> by calling huge_pmd_unshare for hugetlbfs huge pages.  If it is a
> shared mapping it will be 'unshared', which removes the page table
> entry and drops the reference on the PMD page.  After this, flush
> caches and TLB.
>
> Signed-off-by: Mike Kravetz
> ---
> I am not 100% sure on the required flushing, so suggestions would be
> appreciated.  This also should go to stable.  It has been around for
> a long time so still looking for an appropriate 'fixes:'.

I believe we need flushing.  And huge_pmd_unshare() usage in
__unmap_hugepage_range() looks suspicious: I don't see how we flush TLB
in that case.

-- 
Kirill A. Shutemov
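For context, a paraphrased excerpt (details elided) of the v4.18-era loop in
__unmap_hugepage_range() that this remark points at.  On a successful unshare
the loop simply continues, with no flush of its own:

for (; address < end; address += sz) {
	ptep = huge_pte_offset(mm, address, sz);
	if (!ptep)
		continue;

	ptl = huge_pte_lock(h, mm, ptep);
	if (huge_pmd_unshare(mm, &address, ptep)) {
		/*
		 * PUD cleared and PMD page ref dropped, but no TLB flush
		 * here: it relies entirely on the caller's mmu_gather.
		 */
		spin_unlock(ptl);
		continue;
	}
	/* ... normal per-page unmap path continues here ... */
}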
Re: [PATCH] mm: migration: fix migration of huge PMD shared pages
Hi Mike,

I love your patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.18 next-20180810]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Mike-Kravetz/mm-migration-fix-migration-of-huge-PMD-shared-pages/20180813-114549
config: i386-tinyconfig (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386

All errors (new ones prefixed by >>):

   mm/rmap.c: In function 'try_to_unmap_one':
>> mm/rmap.c:1425:7: error: implicit declaration of function 'huge_pmd_unshare'; did you mean 'do_huge_pmd_wp_page'? [-Werror=implicit-function-declaration]
        huge_pmd_unshare(mm, &address, pvmw.pte)) {
        ^~~~~~~~~~~~~~~~
        do_huge_pmd_wp_page
   cc1: some warnings being treated as errors

vim +1425 mm/rmap.c

  1382	
  1383			/*
  1384			 * If the page is mlock()d, we cannot swap it out.
  1385			 * If it's recently referenced (perhaps page_referenced
  1386			 * skipped over this mm) then we should reactivate it.
  1387			 */
  1388			if (!(flags & TTU_IGNORE_MLOCK)) {
  1389				if (vma->vm_flags & VM_LOCKED) {
  1390					/* PTE-mapped THP are never mlocked */
  1391					if (!PageTransCompound(page)) {
  1392						/*
  1393						 * Holding pte lock, we do *not* need
  1394						 * mmap_sem here
  1395						 */
  1396						mlock_vma_page(page);
  1397					}
  1398					ret = false;
  1399					page_vma_mapped_walk_done(&pvmw);
  1400					break;
  1401				}
  1402				if (flags & TTU_MUNLOCK)
  1403					continue;
  1404			}
  1405	
  1406			/* Unexpected PMD-mapped THP? */
  1407			VM_BUG_ON_PAGE(!pvmw.pte, page);
  1408	
  1409			subpage = page - page_to_pfn(page) + pte_pfn(*pvmw.pte);
  1410			address = pvmw.address;
  1411	
  1412			/*
  1413			 * PMDs for hugetlbfs pages could be shared. In this case,
  1414			 * pages with shared PMDs will have a mapcount of 1 no matter
  1415			 * how many times it is actually mapped. Map counting for
  1416			 * PMD sharing is mostly done via the reference count on the
  1417			 * PMD page itself. If the page we are trying to unmap is a
  1418			 * hugetlbfs page, attempt to 'unshare' at the PMD level.
  1419			 * huge_pmd_unshare takes care of clearing the PUD and
  1420			 * reference counting on the PMD page which effectively unmaps
  1421			 * the page. Take care of flushing cache and TLB for page in
  1422			 * this specific mapping here.
  1423			 */
  1424			if (PageHuge(page) &&
> 1425			    huge_pmd_unshare(mm, &address, pvmw.pte)) {
  1426				unsigned long end_add = address + vma_mmu_pagesize(vma);
  1427	
  1428				flush_cache_range(vma, address, end_add);
  1429				flush_tlb_range(vma, address, end_add);
  1430				mmu_notifier_invalidate_range(mm, address, end_add);
  1431				continue;
  1432			}
  1433	
  1434			if (IS_ENABLED(CONFIG_MIGRATION) &&
  1435			    (flags & TTU_MIGRATION) &&
  1436			    is_zone_device_page(page)) {
  1437				swp_entry_t entry;
  1438				pte_t swp_pte;
  1439	
  1440				pteval = ptep_get_and_clear(mm, pvmw.address, pvmw.pte);
  1441	
  1442				/*
  1443				 * Store the pfn of the page in a special migration
  1444				 * pte. do_swap_page() will wait until the migration
  1445				 * pte is removed and then restart fault handling.
  1446				 */
  1447				entry = make_migration_entry(page, 0);
  1448				swp_pte = swp_entry_to_pte(entry);
  1449				if (pte_soft_dirty(pteval))
  1450					swp_pte = pte_swp_mksoft_dirty(swp_pte);
  1451				set_pte_at(mm, pvmw.address, pvmw.pte, swp_pte);
  1452	
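The error arises because i386-tinyconfig builds without CONFIG_HUGETLB_PAGE,
so huge_pmd_unshare() has no declaration in scope at the new call site.  A
sketch of one plausible fix, not necessarily what the next version of the
patch did: provide a no-op stub for such configs in include/linux/hugetlb.h,
so the PageHuge(page) && huge_pmd_unshare(...) test compiles and folds away:

#ifndef CONFIG_HUGETLB_PAGE
/* illustrative stub: with hugetlb compiled out, a PMD is never shared */
static inline int huge_pmd_unshare(struct mm_struct *mm,
				   unsigned long *addr, pte_t *ptep)
{
	return 0;
}
#endif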
Re: [PATCH] mm: migration: fix migration of huge PMD shared pages
Hi Mike,

I love your patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.18 next-20180810]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Mike-Kravetz/mm-migration-fix-migration-of-huge-PMD-shared-pages/20180813-114549
config: i386-randconfig-x003-201832 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386

All errors (new ones prefixed by >>):

   mm/rmap.c: In function 'try_to_unmap_one':
>> mm/rmap.c:1425:7: error: implicit declaration of function 'huge_pmd_unshare'; did you mean '__NR_unshare'? [-Werror=implicit-function-declaration]
        huge_pmd_unshare(mm, &address, pvmw.pte)) {
        ^~~~~~~~~~~~~~~~
        __NR_unshare
   cc1: some warnings being treated as errors

vim +1425 mm/rmap.c

  1382	
  1383			/*
  1384			 * If the page is mlock()d, we cannot swap it out.
  1385			 * If it's recently referenced (perhaps page_referenced
  1386			 * skipped over this mm) then we should reactivate it.
  1387			 */
  1388			if (!(flags & TTU_IGNORE_MLOCK)) {
  1389				if (vma->vm_flags & VM_LOCKED) {
  1390					/* PTE-mapped THP are never mlocked */
  1391					if (!PageTransCompound(page)) {
  1392						/*
  1393						 * Holding pte lock, we do *not* need
  1394						 * mmap_sem here
  1395						 */
  1396						mlock_vma_page(page);
  1397					}
  1398					ret = false;
  1399					page_vma_mapped_walk_done(&pvmw);
  1400					break;
  1401				}
  1402				if (flags & TTU_MUNLOCK)
  1403					continue;
  1404			}
  1405	
  1406			/* Unexpected PMD-mapped THP? */
  1407			VM_BUG_ON_PAGE(!pvmw.pte, page);
  1408	
  1409			subpage = page - page_to_pfn(page) + pte_pfn(*pvmw.pte);
  1410			address = pvmw.address;
  1411	
  1412			/*
  1413			 * PMDs for hugetlbfs pages could be shared. In this case,
  1414			 * pages with shared PMDs will have a mapcount of 1 no matter
  1415			 * how many times it is actually mapped. Map counting for
  1416			 * PMD sharing is mostly done via the reference count on the
  1417			 * PMD page itself. If the page we are trying to unmap is a
  1418			 * hugetlbfs page, attempt to 'unshare' at the PMD level.
  1419			 * huge_pmd_unshare takes care of clearing the PUD and
  1420			 * reference counting on the PMD page which effectively unmaps
  1421			 * the page. Take care of flushing cache and TLB for page in
  1422			 * this specific mapping here.
  1423			 */
  1424			if (PageHuge(page) &&
> 1425			    huge_pmd_unshare(mm, &address, pvmw.pte)) {
  1426				unsigned long end_add = address + vma_mmu_pagesize(vma);
  1427	
  1428				flush_cache_range(vma, address, end_add);
  1429				flush_tlb_range(vma, address, end_add);
  1430				mmu_notifier_invalidate_range(mm, address, end_add);
  1431				continue;
  1432			}
  1433	
  1434			if (IS_ENABLED(CONFIG_MIGRATION) &&
  1435			    (flags & TTU_MIGRATION) &&
  1436			    is_zone_device_page(page)) {
  1437				swp_entry_t entry;
  1438				pte_t swp_pte;
  1439	
  1440				pteval = ptep_get_and_clear(mm, pvmw.address, pvmw.pte);
  1441	
  1442				/*
  1443				 * Store the pfn of the page in a special migration
  1444				 * pte. do_swap_page() will wait until the migration
  1445				 * pte is removed and then restart fault handling.
  1446				 */
  1447				entry = make_migration_entry(page, 0);
  1448				swp_pte = swp_entry_to_pte(entry);
  1449				if (pte_soft_dirty(pteval))
  1450					swp_pte = pte_swp_mksoft_dirty(swp_pte);
  1451				set_pte_at(mm, pvmw.address, pvmw.pte, swp_pte);
  1452	