Re: [PATCH v2] mm/huge_memory.c: reorder operations in __split_huge_page_tail()
On Sun, Feb 11, 2018 at 05:29:37PM +0300, Konstantin Khlebnikov wrote:
> +	/*
> +	 * Finally unfreeze refcount. Additional pin to radix tree.
> +	 */
> +	page_ref_unfreeze(page_tail, 1 + (!PageAnon(head) ||
> +					  PageSwapCache(head)));

Please say "Additional pin from page cache", not "radix tree".
Re: [PATCH v2] mm/huge_memory.c: reorder operations in __split_huge_page_tail()
On Sun, Feb 11, 2018 at 6:47 PM, Kirill A. Shutemov wrote:
> On Sun, Feb 11, 2018 at 06:32:10PM +0300, Konstantin Khlebnikov wrote:
>> On Sun, Feb 11, 2018 at 6:14 PM, Kirill A. Shutemov wrote:
>> > On Sun, Feb 11, 2018 at 05:29:37PM +0300, Konstantin Khlebnikov wrote:
>> >> And replace page_ref_inc()/page_ref_add() with page_ref_unfreeze() which
>> >> is made especially for that and has semantic of smp_store_release().
>> >
>> > Nak on this part.
>> >
>> > page_ref_unfreeze() uses atomic_set() which neglects the situation in the
>> > comment you're removing.
>>
>> Why? Look into x86 smp_store_release: for PPro it uses the same sequence,
>> smp_wmb + WRITE_ONCE. As I see, spin_unlock uses exactly this macro.
>>
>> Anyway, if page_ref_unfreeze() cannot handle races with
>> get_page_unless_zero() then it is completely useless.
>
> Okay, fair enough.
>
> But please call it out in the commit message.

OK, I'll review this yet again and resend tomorrow.
Re: [PATCH v2] mm/huge_memory.c: reorder operations in __split_huge_page_tail()
On Sun, Feb 11, 2018 at 06:32:10PM +0300, Konstantin Khlebnikov wrote:
> On Sun, Feb 11, 2018 at 6:14 PM, Kirill A. Shutemov wrote:
> > On Sun, Feb 11, 2018 at 05:29:37PM +0300, Konstantin Khlebnikov wrote:
> >> And replace page_ref_inc()/page_ref_add() with page_ref_unfreeze() which
> >> is made especially for that and has semantic of smp_store_release().
> >
> > Nak on this part.
> >
> > page_ref_unfreeze() uses atomic_set() which neglects the situation in the
> > comment you're removing.
>
> Why? Look into x86 smp_store_release: for PPro it uses the same sequence,
> smp_wmb + WRITE_ONCE. As I see, spin_unlock uses exactly this macro.
>
> Anyway, if page_ref_unfreeze() cannot handle races with
> get_page_unless_zero() then it is completely useless.

Okay, fair enough.

But please call it out in the commit message.

--
Kirill A. Shutemov
Re: [PATCH v2] mm/huge_memory.c: reorder operations in __split_huge_page_tail()
On Sun, Feb 11, 2018 at 6:14 PM, Kirill A. Shutemov wrote:
> On Sun, Feb 11, 2018 at 05:29:37PM +0300, Konstantin Khlebnikov wrote:
>> And replace page_ref_inc()/page_ref_add() with page_ref_unfreeze() which
>> is made especially for that and has semantic of smp_store_release().
>
> Nak on this part.
>
> page_ref_unfreeze() uses atomic_set() which neglects the situation in the
> comment you're removing.

Why? Look into x86 smp_store_release: for PPro it uses the same sequence,
smp_wmb + WRITE_ONCE. As I see, spin_unlock uses exactly this macro.

Anyway, if page_ref_unfreeze() cannot handle races with
get_page_unless_zero() then it is completely useless.

> You need to at least explain why it's safe now.
>
> I would rather leave page_ref_inc()/page_ref_add() + explicit
> smp_mb__before_atomic().
>
> --
> Kirill A. Shutemov
Re: [PATCH v2] mm/huge_memory.c: reorder operations in __split_huge_page_tail()
On Sun, Feb 11, 2018 at 05:29:37PM +0300, Konstantin Khlebnikov wrote:
> And replace page_ref_inc()/page_ref_add() with page_ref_unfreeze() which
> is made especially for that and has semantic of smp_store_release().

Nak on this part.

page_ref_unfreeze() uses atomic_set() which neglects the situation in the
comment you're removing.

You need to at least explain why it's safe now.

I would rather leave page_ref_inc()/page_ref_add() + explicit
smp_mb__before_atomic().

--
Kirill A. Shutemov
[PATCH v2] mm/huge_memory.c: reorder operations in __split_huge_page_tail()
THP split makes non-atomic change of tail page flags. This is almost ok
because tail pages are locked and isolated, but this breaks recent changes
in page locking: a non-atomic operation could clear bit PG_waiters.

As a result, the concurrent sequence get_page_unless_zero() -> lock_page()
might block forever, especially if this page was truncated later.

Fix is trivial: clone flags before unfreezing the page reference counter.

This race exists since commit 62906027091f ("mm: add PageWaiters indicating
tasks are waiting for a page bit") while the unsafe unfreeze itself was
added in commit 8df651c7059e ("thp: cleanup split_huge_page()").

clear_compound_head() also must be called before unfreezing the page
reference because a successful get_page_unless_zero() must stabilize
compound_head().

And replace page_ref_inc()/page_ref_add() with page_ref_unfreeze(), which
is made especially for that and has the semantic of smp_store_release().

Signed-off-by: Konstantin Khlebnikov
---
 mm/huge_memory.c | 34 +++++++++++++---------------------
 1 file changed, 13 insertions(+), 21 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 87ab9b8f56b5..fa577aa7ecd8 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2355,26 +2355,11 @@ static void __split_huge_page_tail(struct page *head, int tail,
 	struct page *page_tail = head + tail;
 
 	VM_BUG_ON_PAGE(atomic_read(&page_tail->_mapcount) != -1, page_tail);
-	VM_BUG_ON_PAGE(page_ref_count(page_tail) != 0, page_tail);
 
 	/*
-	 * tail_page->_refcount is zero and not changing from under us. But
-	 * get_page_unless_zero() may be running from under us on the
-	 * tail_page. If we used atomic_set() below instead of atomic_inc() or
-	 * atomic_add(), we would then run atomic_set() concurrently with
-	 * get_page_unless_zero(), and atomic_set() is implemented in C not
-	 * using locked ops. spin_unlock on x86 sometime uses locked ops
-	 * because of PPro errata 66, 92, so unless somebody can guarantee
-	 * atomic_set() here would be safe on all archs (and not only on x86),
-	 * it's safer to use atomic_inc()/atomic_add().
+	 * Clone page flags before unfreezing refcount.
+	 * lock_page() after speculative get wants to set PG_waiters.
 	 */
-	if (PageAnon(head) && !PageSwapCache(head)) {
-		page_ref_inc(page_tail);
-	} else {
-		/* Additional pin to radix tree */
-		page_ref_add(page_tail, 2);
-	}
-
 	page_tail->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
 	page_tail->flags |= (head->flags &
 			((1L << PG_referenced) |
@@ -2387,14 +2372,21 @@ static void __split_huge_page_tail(struct page *head, int tail,
 			 (1L << PG_unevictable) |
 			 (1L << PG_dirty)));
 
-	/*
-	 * After clearing PageTail the gup refcount can be released.
-	 * Page flags also must be visible before we make the page non-compound.
-	 */
+	/* Page flags must be visible before we make the page non-compound. */
 	smp_wmb();
 
+	/*
+	 * Clear PageTail before unfreezing page refcount:
+	 * speculative refcount must stabilize compound_head().
+	 */
 	clear_compound_head(page_tail);
 
+	/*
+	 * Finally unfreeze refcount. Additional pin to radix tree.
+	 */
+	page_ref_unfreeze(page_tail, 1 + (!PageAnon(head) ||
+					  PageSwapCache(head)));
+
 	if (page_is_young(head))
 		set_page_young(page_tail);
 	if (page_is_idle(head))