Re: [PATCH v3 2/2] mm/huge_memory.c: reorder operations in __split_huge_page_tail()

2018-02-12 Thread Kirill A. Shutemov
On Mon, Feb 12, 2018 at 04:58:53PM +0300, Konstantin Khlebnikov wrote:
> THP split makes a non-atomic change to tail page flags. This is almost OK
> because tail pages are locked and isolated, but it breaks recent changes
> in page locking: a non-atomic operation could clear the PG_waiters bit.
> 
> As a result, a concurrent get_page_unless_zero() -> lock_page() sequence
> might block forever, especially if the page was later truncated.
> 
> The fix is trivial: clone the flags before unfreezing the page reference
> counter.
> 
> This race has existed since commit 62906027091f ("mm: add PageWaiters
> indicating tasks are waiting for a page bit"), while the unsafe unfreeze
> itself was added in commit 8df651c7059e ("thp: cleanup split_huge_page()").
> 
> clear_compound_head() must also be called before unfreezing the page
> reference, because a successful get_page_unless_zero() might be followed
> by put_page(), which needs a correct compound_head().
> 
> Also replace page_ref_inc()/page_ref_add() with page_ref_unfreeze(), which
> is made especially for this purpose and has smp_store_release() semantics.
> 
> Signed-off-by: Konstantin Khlebnikov 

Acked-by: Kirill A. Shutemov 

-- 
 Kirill A. Shutemov


[PATCH v3 2/2] mm/huge_memory.c: reorder operations in __split_huge_page_tail()

2018-02-12 Thread Konstantin Khlebnikov
THP split makes a non-atomic change to tail page flags. This is almost OK
because tail pages are locked and isolated, but it breaks recent changes
in page locking: a non-atomic operation could clear the PG_waiters bit.

As a result, a concurrent get_page_unless_zero() -> lock_page() sequence
might block forever, especially if the page was later truncated.
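A deterministic, single-threaded C11 sketch of that lost update, replaying the bad interleaving by hand instead of racing real threads. The bit numbers (SKETCH_PG_*) and function names are hypothetical, not the kernel's enum pageflags. The split path's "page_tail->flags &= ...; page_tail->flags |= ..." is modelled as a load followed by a store of a value computed from that load, and lock_page() on an already locked page is modelled as an atomic OR of PG_waiters.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit numbers for this sketch; not the kernel's enum pageflags. */
enum { SKETCH_PG_locked = 0, SKETCH_PG_dirty = 1, SKETCH_PG_waiters = 2 };
#define SKETCH_BIT(n) (1UL << (n))

/* Stand-in for page_tail->flags. Relaxed atomics model the kernel's plain
 * accesses without making this C program itself undefined. */
static _Atomic unsigned long tail_flags;

/* Waiter side: lock_page() on a locked page atomically sets PG_waiters. */
static void waiter_sets_bit(void)
{
	atomic_fetch_or(&tail_flags, SKETCH_BIT(SKETCH_PG_waiters));
}

/*
 * Replays one interleaving. clone_first == false puts the waiter inside the
 * split path's read-modify-write window (pre-patch order); true lets it run
 * only after the flags are fully written (patched order).
 * Returns true if PG_waiters is still set at the end.
 */
static bool replay(bool clone_first)
{
	unsigned long head_flags = SKETCH_BIT(SKETCH_PG_dirty);
	unsigned long tmp;

	atomic_store(&tail_flags, SKETCH_BIT(SKETCH_PG_locked));
	assert(atomic_load(&tail_flags) & SKETCH_BIT(SKETCH_PG_locked));

	/* Split path, step 1: plain load of the tail flags. */
	tmp = atomic_load_explicit(&tail_flags, memory_order_relaxed);
	if (!clone_first)
		waiter_sets_bit();	/* pre-patch: lands inside the RMW window */

	/* Split path, step 2: plain store of a value computed from the stale load. */
	atomic_store_explicit(&tail_flags, tmp | head_flags, memory_order_relaxed);
	if (clone_first)
		waiter_sets_bit();	/* patched: only possible after the unfreeze */

	return atomic_load(&tail_flags) & SKETCH_BIT(SKETCH_PG_waiters);
}
```

replay(false) ends with PG_waiters cleared, so an unlock_page() that checks PG_waiters would skip the wakeup; replay(true) keeps the bit.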

The fix is trivial: clone the flags before unfreezing the page reference
counter.
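Why cloning first is enough: while _refcount is frozen at zero, get_page_unless_zero() fails, so no speculative reference (and hence no lock_page()) can race with the flag copy. The release store in the unfreeze then publishes the cloned flags to whoever takes the next reference. Below is a minimal C11 userspace model of that pairing; struct sketch_page and the sketch_* functions are hypothetical, and acquire ordering on the successful increment is the minimum this model relies on.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the fields involved; not struct page. */
struct sketch_page {
	_Atomic int refcount;	/* 0 means frozen: no new references */
	unsigned long flags;	/* plain field, published by the refcount store */
};

/* Model of page_ref_unfreeze(): count must be frozen, then release-store. */
static void sketch_ref_unfreeze(struct sketch_page *p, int count)
{
	assert(atomic_load_explicit(&p->refcount, memory_order_relaxed) == 0);
	assert(count != 0);
	atomic_store_explicit(&p->refcount, count, memory_order_release);
}

/* Model of get_page_unless_zero(): take a reference only if count != 0. */
static bool sketch_get_unless_zero(struct sketch_page *p)
{
	int old = atomic_load_explicit(&p->refcount, memory_order_relaxed);

	while (old != 0) {
		/* Acquire on success pairs with the release in the unfreeze, so
		 * fields written before the unfreeze are visible here. */
		if (atomic_compare_exchange_weak_explicit(&p->refcount, &old, old + 1,
							  memory_order_acquire,
							  memory_order_relaxed))
			return true;
	}
	return false;
}

/* Patched order: clone the flags while frozen, then publish the count. */
static void sketch_split_tail(struct sketch_page *tail,
			      unsigned long head_flags, int refs)
{
	tail->flags = head_flags;
	sketch_ref_unfreeze(tail, refs);
}
```

Before sketch_split_tail() runs, sketch_get_unless_zero() returns false; afterwards it succeeds and the caller is guaranteed to see the cloned flags.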

This race has existed since commit 62906027091f ("mm: add PageWaiters
indicating tasks are waiting for a page bit"), while the unsafe unfreeze
itself was added in commit 8df651c7059e ("thp: cleanup split_huge_page()").

clear_compound_head() must also be called before unfreezing the page
reference, because a successful get_page_unless_zero() might be followed
by put_page(), which needs a correct compound_head().
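The compound_head() hazard in miniature: put_page() drops its reference on compound_head(page), so if the tail still points at its head when its count becomes nonzero, a speculative get/put pair takes a reference on the tail but drops one on the head. The sequential C11 model below is hypothetical; the struct, names and starting counts are made up for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; field names do not match struct page. */
struct model_page {
	_Atomic int refcount;
	struct model_page *head;	/* non-NULL while this is a tail page */
};

struct race_result {
	int tail_refs;
	int head_refs;
};

/* Model of compound_head(): a tail page resolves to its head page. */
static struct model_page *model_compound_head(struct model_page *p)
{
	return p->head ? p->head : p;
}

/* Model of get_page_unless_zero(): acts on the page it is handed. */
static bool model_get_unless_zero(struct model_page *p)
{
	int old = atomic_load(&p->refcount);

	while (old != 0)
		if (atomic_compare_exchange_weak(&p->refcount, &old, old + 1))
			return true;
	return false;
}

/* Model of put_page(): drops the reference on compound_head(page). */
static void model_put_page(struct model_page *p)
{
	struct model_page *target = model_compound_head(p);
	int old = atomic_fetch_sub(&target->refcount, 1);

	assert(old > 0);	/* otherwise a refcount underflow */
}

/*
 * Unfreeze the tail with one reference, let a speculative get/put pair run
 * right after, and report both counts. clear_first selects the patched
 * order (clear_compound_head() before the unfreeze).
 */
static struct race_result split_then_race(bool clear_first)
{
	struct model_page head = { .refcount = 1, .head = NULL };
	struct model_page tail = { .refcount = 0, .head = &head };

	if (clear_first)
		tail.head = NULL;		/* clear_compound_head() */
	atomic_store(&tail.refcount, 1);	/* page_ref_unfreeze() */

	if (model_get_unless_zero(&tail))	/* racing lookup succeeds... */
		model_put_page(&tail);		/* ...and drops its reference */

	if (!clear_first)
		tail.head = NULL;		/* pre-patch: cleared too late */

	return (struct race_result){
		.tail_refs = atomic_load(&tail.refcount),
		.head_refs = atomic_load(&head.refcount),
	};
}
```

With the pre-patch order the tail is left holding 2 references and the head 0; with clear_compound_head() first, both stay at 1.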

Also replace page_ref_inc()/page_ref_add() with page_ref_unfreeze(), which
is made especially for this purpose and has smp_store_release() semantics.
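Because page_ref_unfreeze() sets the count in a single store rather than incrementing it from zero, the patch folds the old two-branch logic into one expression, 1 + (!PageAnon(head) || PageSwapCache(head)). A quick exhaustive check that both forms yield the same count, with plain bools standing in for PageAnon()/PageSwapCache() and hypothetical helper names:

```c
#include <assert.h>
#include <stdbool.h>

/* References the pre-patch code added to a tail page (hypothetical helper). */
static int old_tail_refs(bool anon, bool swapcache)
{
	if (anon && !swapcache)
		return 1;	/* page_ref_inc(page_tail) */
	return 2;		/* page_ref_add(page_tail, 2): extra radix-tree pin */
}

/* Count the patch passes to page_ref_unfreeze() (hypothetical helper). */
static int new_tail_refs(bool anon, bool swapcache)
{
	return 1 + (!anon || swapcache);
}

/* Compare both forms over all four flag combinations. */
static void check_tail_refs_agree(void)
{
	for (int a = 0; a <= 1; a++)
		for (int s = 0; s <= 1; s++)
			assert(old_tail_refs(a, s) == new_tail_refs(a, s));
}
```

Only anonymous pages outside the swap cache get a single reference; everything else keeps the extra page-cache (radix tree) reference.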

Signed-off-by: Konstantin Khlebnikov 
---
 mm/huge_memory.c |   36 +++++++++++++++---------------------
 1 file changed, 15 insertions(+), 21 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 87ab9b8f56b5..7a8ead256b2f 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2355,26 +2355,13 @@ static void __split_huge_page_tail(struct page *head, int tail,
 	struct page *page_tail = head + tail;
 
 	VM_BUG_ON_PAGE(atomic_read(&page_tail->_mapcount) != -1, page_tail);
-	VM_BUG_ON_PAGE(page_ref_count(page_tail) != 0, page_tail);
 
 	/*
-	 * tail_page->_refcount is zero and not changing from under us. But
-	 * get_page_unless_zero() may be running from under us on the
-	 * tail_page. If we used atomic_set() below instead of atomic_inc() or
-	 * atomic_add(), we would then run atomic_set() concurrently with
-	 * get_page_unless_zero(), and atomic_set() is implemented in C not
-	 * using locked ops. spin_unlock on x86 sometime uses locked ops
-	 * because of PPro errata 66, 92, so unless somebody can guarantee
-	 * atomic_set() here would be safe on all archs (and not only on x86),
-	 * it's safer to use atomic_inc()/atomic_add().
+	 * Clone page flags before unfreezing refcount.
+	 *
+	 * After successful get_page_unless_zero() might follow flags change,
+	 * for example lock_page() which sets PG_waiters.
 	 */
-	if (PageAnon(head) && !PageSwapCache(head)) {
-		page_ref_inc(page_tail);
-	} else {
-		/* Additional pin to radix tree */
-		page_ref_add(page_tail, 2);
-	}
-
 	page_tail->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
 	page_tail->flags |= (head->flags &
 			((1L << PG_referenced) |
@@ -2387,14 +2374,21 @@ static void __split_huge_page_tail(struct page *head, int tail,
 			 (1L << PG_unevictable) |
 			 (1L << PG_dirty)));
 
-	/*
-	 * After clearing PageTail the gup refcount can be released.
-	 * Page flags also must be visible before we make the page non-compound.
-	 */
+	/* Page flags must be visible before we make the page non-compound. */
 	smp_wmb();
 
+	/*
+	 * Clear PageTail before unfreezing page refcount.
+	 *
+	 * After successful get_page_unless_zero() might follow put_page()
+	 * which needs correct compound_head().
+	 */
 	clear_compound_head(page_tail);
 
+	/* Finally unfreeze refcount. Additional reference from page cache. */
+	page_ref_unfreeze(page_tail, 1 + (!PageAnon(head) ||
+					  PageSwapCache(head)));
+
 	if (page_is_young(head))
 		set_page_young(page_tail);
 	if (page_is_idle(head))