Re: [PATCH v2] page_alloc: Fix freeing non-compound pages

2020-10-18 Thread Nicholas Piggin
Excerpts from Andrew Morton's message of September 29, 2020 2:46 pm:
> On Tue, 29 Sep 2020 02:17:19 +0100 Matthew Wilcox  wrote:
> 
>> On Mon, Sep 28, 2020 at 06:03:07PM -0700, Andrew Morton wrote:
>> > On Sat, 26 Sep 2020 22:39:19 +0100 "Matthew Wilcox (Oracle)" 
>> >  wrote:
>> > 
>> > > Here is a very rare race which leaks memory:

Great catch! [sorry, a bit behind with emails]

>> > 
>> > Not worth a cc:stable?
>> 
>> Yes, it probably should have been.
> 
> Have you a feeling for how often this occurs?
> 
>>  I just assume the stablebot will
>> pick up anything that has a Fixes: tag.
> 
> We asked them not to do that for mm/ patches.  Crazy stuff was getting
> backported.
> 
>> > >
>> > > --- a/mm/page_alloc.c
>> > > +++ b/mm/page_alloc.c
>> > > @@ -4947,6 +4947,9 @@ void __free_pages(struct page *page, unsigned int 
>> > > order)
>> > >  {
>> > >  if (put_page_testzero(page))
>> > >  free_the_page(page, order);
>> > > +else if (!PageHead(page))
>> > > +while (order-- > 0)
>> > > +free_the_page(page + (1 << order), order);
>> > 
>> > Well that's weird and scary looking.  `page' has non-zero refcount yet
>> > we go and free random followon pages.  Methinks it merits an
>> > explanatory comment?
>> 
>> Well, poot.  I lost that comment in the shuffling of patches.  In a
>> different tree, I have:
>> 
>> @@ -4943,10 +4943,19 @@ static inline void free_the_page(struct page *page, 
>> unsi
>> gned int order)
>> __free_pages_ok(page, order);
>>  }
>>  
>> +/*
>> + * If we free a non-compound allocation, another thread may have a
> 
> "non-compound, higher-order", I suggest?
> 
>> + * speculative reference to the first page.  It has no way of knowing
>> + * about the rest of the allocation, so we have to free all but the
>> + * first page here.
>> + */
>>  void __free_pages(struct page *page, unsigned int order)
>>  {
>> if (put_page_testzero(page))
>> free_the_page(page, order);
>> +   else if (!PageHead(page))
>> +   while (order-- > 0)
>> +   free_the_page(page + (1 << order), order);
>>  }
>>  EXPORT_SYMBOL(__free_pages);
>>  
>> 
>> Although I'm now thinking of making that comment into kernel-doc and
>> turning it into advice to the caller rather than an internal note to
>> other mm developers.
> 
> hm.  But what action could the caller take?  The explanatory comment
> seems OK to me.

The version of this without the comment got merged. I didn't mind the
comment...

Thanks,
Nick


Re: [PATCH v2] page_alloc: Fix freeing non-compound pages

2020-09-30 Thread Mike Rapoport
On Tue, Sep 29, 2020 at 03:06:22PM +0100, Matthew Wilcox wrote:
> On Tue, Sep 29, 2020 at 10:26:22AM +0300, Mike Rapoport wrote:
> > This sentence presumes existing description/prior knowledge about
> > put_page().
> > 
> > Maybe
> > 
> >   This function can free multi-page allocations that were not allocated
> >   with %__GFP_COMP, unlike put_page() that would free only the first page
> >   in such case. __free_pages() does not ...
> 
> Thanks.  After waking up this morning I did a more extensive rewrite:

I like this one

Acked-by: Mike Rapoport 

> /**
>  * __free_pages - Free pages allocated with alloc_pages().
>  * @page: The page pointer returned from alloc_pages().
>  * @order: The order of the allocation.
>  *
>  * This function can free multi-page allocations that are not compound
>  * pages.  It does not check that the @order passed in matches that of
>  * the allocation, so it is easy to leak memory.  Freeing more memory
>  * than was allocated will probably emit a warning.
>  *
>  * If the last reference to this page is speculative, it will be released
>  * by put_page() which only frees the first page of a non-compound
>  * allocation.  To prevent the remaining pages from being leaked, we free
>  * the subsequent pages here.  If you want to use the page's reference
>  * count to decide when to free the allocation, you should allocate a
>  * compound page, and use put_page() instead of __free_pages().
>  *
>  * Context: May be called in interrupt context or holding a normal
>  * spinlock, but not in NMI context or while holding a raw spinlock.
>  */
> 

-- 
Sincerely yours,
Mike.


Re: [PATCH v2] page_alloc: Fix freeing non-compound pages

2020-09-29 Thread Matthew Wilcox
On Mon, Sep 28, 2020 at 09:46:56PM -0700, Andrew Morton wrote:
> On Tue, 29 Sep 2020 02:17:19 +0100 Matthew Wilcox  wrote:
> 
> > On Mon, Sep 28, 2020 at 06:03:07PM -0700, Andrew Morton wrote:
> > > On Sat, 26 Sep 2020 22:39:19 +0100 "Matthew Wilcox (Oracle)" 
> > >  wrote:
> > > 
> > > > Here is a very rare race which leaks memory:
> > > 
> > > Not worth a cc:stable?
> > 
> > Yes, it probably should have been.
> 
> Have you a feeling for how often this occurs?

I doubt it happens often.  I don't think I could construct a workload to
make it happen frequently.  Maybe more often with a virtualised workload
where a thread can be preempted between instructions.

> >  I just assume the stablebot will
> > pick up anything that has a Fixes: tag.
> 
> We asked them not to do that for mm/ patches.  Crazy stuff was getting
> backported.

That's a shame.  I'll try to remember to cc them explicitly in the future.

> > Although I'm now thinking of making that comment into kernel-doc and
> > turning it into advice to the caller rather than an internal note to
> > other mm developers.
> 
> hm.  But what action could the caller take?  The explanatory comment
> seems OK to me.

Use compound pages instead of non-compound pages.  Although Linus has
asked that people stop using __get_free_pages(), so maybe that will be
the direction we go in.

https://lore.kernel.org/lkml/ca+55afwyxj+topajznc5mpj-25xblaeu8ijp8ztyhma3lxf...@mail.gmail.com/


Re: [PATCH v2] page_alloc: Fix freeing non-compound pages

2020-09-29 Thread Matthew Wilcox
On Tue, Sep 29, 2020 at 10:26:22AM +0300, Mike Rapoport wrote:
> This sentence presumes existing description/prior knowledge about
> put_page().
> 
> Maybe
> 
>   This function can free multi-page allocations that were not allocated
>   with %__GFP_COMP, unlike put_page() that would free only the first page
>   in such case. __free_pages() does not ...

Thanks.  After waking up this morning I did a more extensive rewrite:

/**
 * __free_pages - Free pages allocated with alloc_pages().
 * @page: The page pointer returned from alloc_pages().
 * @order: The order of the allocation.
 *
 * This function can free multi-page allocations that are not compound
 * pages.  It does not check that the @order passed in matches that of
 * the allocation, so it is easy to leak memory.  Freeing more memory
 * than was allocated will probably emit a warning.
 *
 * If the last reference to this page is speculative, it will be released
 * by put_page() which only frees the first page of a non-compound
 * allocation.  To prevent the remaining pages from being leaked, we free
 * the subsequent pages here.  If you want to use the page's reference
 * count to decide when to free the allocation, you should allocate a
 * compound page, and use put_page() instead of __free_pages().
 *
 * Context: May be called in interrupt context or holding a normal
 * spinlock, but not in NMI context or while holding a raw spinlock.
 */



Re: [PATCH v2] page_alloc: Fix freeing non-compound pages

2020-09-29 Thread Mike Rapoport
On Tue, Sep 29, 2020 at 04:40:26AM +0100, Matthew Wilcox wrote:
> On Mon, Sep 28, 2020 at 06:03:07PM -0700, Andrew Morton wrote:
> > Well that's weird and scary looking.  `page' has non-zero refcount yet
> > we go and free random followon pages.  Methinks it merits an
> > explanatory comment?
> 
> Here's some kernel-doc.  Opinions?
> 
> /**
>  * __free_pages - Free pages allocated with alloc_pages().
>  * @page: The page pointer returned from alloc_pages().
>  * @order: The order of the allocation.
>  *
>  * This function differs from put_page() in that it can free multi-page

This sentence presumes existing description/prior knowledge about
put_page().

Maybe

  This function can free multi-page allocations that were not allocated
  with %__GFP_COMP, unlike put_page() that would free only the first page
  in such case. __free_pages() does not ...

>  * allocations that were not allocated with %__GFP_COMP.  This function
>  * does not check that the @order passed in matches that of the
>  * allocation, so it is possible to leak memory.  Freeing more memory than
>  * was allocated will probably be warned about by other debugging checks.
>  *
>  * It is only safe to use the page reference count to determine when
>  * to free an allocation if you use %__GFP_COMP (in which case, you may
>  * as well use put_page() to free the page).  Another thread may have a
>  * speculative reference to the first page, but it has no way of knowing
>  * about the rest of the allocation, so we have to free all but the
>  * first page here.
>  *
>  * Context: May be called in interrupt context but not NMI context.
>  */
> 

-- 
Sincerely yours,
Mike.


Re: [PATCH v2] page_alloc: Fix freeing non-compound pages

2020-09-28 Thread Andrew Morton
On Tue, 29 Sep 2020 02:17:19 +0100 Matthew Wilcox  wrote:

> On Mon, Sep 28, 2020 at 06:03:07PM -0700, Andrew Morton wrote:
> > On Sat, 26 Sep 2020 22:39:19 +0100 "Matthew Wilcox (Oracle)" 
> >  wrote:
> > 
> > > Here is a very rare race which leaks memory:
> > 
> > Not worth a cc:stable?
> 
> Yes, it probably should have been.

Have you a feeling for how often this occurs?

>  I just assume the stablebot will
> pick up anything that has a Fixes: tag.

We asked them not to do that for mm/ patches.  Crazy stuff was getting
backported.

> > >
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -4947,6 +4947,9 @@ void __free_pages(struct page *page, unsigned int 
> > > order)
> > >  {
> > >   if (put_page_testzero(page))
> > >   free_the_page(page, order);
> > > + else if (!PageHead(page))
> > > + while (order-- > 0)
> > > + free_the_page(page + (1 << order), order);
> > 
> > Well that's weird and scary looking.  `page' has non-zero refcount yet
> > we go and free random followon pages.  Methinks it merits an
> > explanatory comment?
> 
> Well, poot.  I lost that comment in the shuffling of patches.  In a
> different tree, I have:
> 
> @@ -4943,10 +4943,19 @@ static inline void free_the_page(struct page *page, 
> unsi
> gned int order)
> __free_pages_ok(page, order);
>  }
>  
> +/*
> + * If we free a non-compound allocation, another thread may have a

"non-compound, higher-order", I suggest?

> + * speculative reference to the first page.  It has no way of knowing
> + * about the rest of the allocation, so we have to free all but the
> + * first page here.
> + */
>  void __free_pages(struct page *page, unsigned int order)
>  {
> if (put_page_testzero(page))
> free_the_page(page, order);
> +   else if (!PageHead(page))
> +   while (order-- > 0)
> +   free_the_page(page + (1 << order), order);
>  }
>  EXPORT_SYMBOL(__free_pages);
>  
> 
> Although I'm now thinking of making that comment into kernel-doc and
> turning it into advice to the caller rather than an internal note to
> other mm developers.

hm.  But what action could the caller take?  The explanatory comment
seems OK to me.



Re: [PATCH v2] page_alloc: Fix freeing non-compound pages

2020-09-28 Thread Matthew Wilcox
On Mon, Sep 28, 2020 at 06:03:07PM -0700, Andrew Morton wrote:
> Well that's weird and scary looking.  `page' has non-zero refcount yet
> we go and free random followon pages.  Methinks it merits an
> explanatory comment?

Here's some kernel-doc.  Opinions?

/**
 * __free_pages - Free pages allocated with alloc_pages().
 * @page: The page pointer returned from alloc_pages().
 * @order: The order of the allocation.
 *
 * This function differs from put_page() in that it can free multi-page
 * allocations that were not allocated with %__GFP_COMP.  This function
 * does not check that the @order passed in matches that of the
 * allocation, so it is possible to leak memory.  Freeing more memory than
 * was allocated will probably be warned about by other debugging checks.
 *
 * It is only safe to use the page reference count to determine when
 * to free an allocation if you use %__GFP_COMP (in which case, you may
 * as well use put_page() to free the page).  Another thread may have a
 * speculative reference to the first page, but it has no way of knowing
 * about the rest of the allocation, so we have to free all but the
 * first page here.
 *
 * Context: May be called in interrupt context but not NMI context.
 */



Re: [PATCH v2] page_alloc: Fix freeing non-compound pages

2020-09-28 Thread Matthew Wilcox
On Mon, Sep 28, 2020 at 06:03:07PM -0700, Andrew Morton wrote:
> On Sat, 26 Sep 2020 22:39:19 +0100 "Matthew Wilcox (Oracle)" 
>  wrote:
> 
> > Here is a very rare race which leaks memory:
> 
> Not worth a cc:stable?

Yes, it probably should have been.  I just assume the stablebot will
pick up anything that has a Fixes: tag.

> > Page P0 is allocated to the page cache.  Page P1 is free.
> > 
> > Thread AThread BThread C
> > find_get_entry():
> > xas_load() returns P0
> > Removes P0 from page cache
> > P0 finds its buddy P1
> > alloc_pages(GFP_KERNEL, 1) returns P0
> > P0 has refcount 1
> > page_cache_get_speculative(P0)
> > P0 has refcount 2
> > __free_pages(P0)
> 
>   __free_pages(P0, 1), I assume.

Good catch.  That was what I meant to type.

> > P0 has refcount 1
> > put_page(P0)
> 
> but this is implicitly order 0

Right, because it's not a compound page.

> > P1 is not freed
> 
> huh.

Yeah.  Nasty, and we'll never know how often it was hit.

> > Fix this by freeing all the pages in __free_pages() that won't be freed
> > by the call to put_page().  It's usually not a good idea to split a page,
> > but this is a very unlikely scenario.
> > 
> > ...
> >
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -4947,6 +4947,9 @@ void __free_pages(struct page *page, unsigned int 
> > order)
> >  {
> > if (put_page_testzero(page))
> > free_the_page(page, order);
> > +   else if (!PageHead(page))
> > +   while (order-- > 0)
> > +   free_the_page(page + (1 << order), order);
> 
> Well that's weird and scary looking.  `page' has non-zero refcount yet
> we go and free random followon pages.  Methinks it merits an
> explanatory comment?

Well, poot.  I lost that comment in the shuffling of patches.  In a
different tree, I have:

@@ -4943,10 +4943,19 @@ static inline void free_the_page(struct page *page, unsi
gned int order)
__free_pages_ok(page, order);
 }
 
+/*
+ * If we free a non-compound allocation, another thread may have a
+ * speculative reference to the first page.  It has no way of knowing
+ * about the rest of the allocation, so we have to free all but the
+ * first page here.
+ */
 void __free_pages(struct page *page, unsigned int order)
 {
if (put_page_testzero(page))
free_the_page(page, order);
+   else if (!PageHead(page))
+   while (order-- > 0)
+   free_the_page(page + (1 << order), order);
 }
 EXPORT_SYMBOL(__free_pages);
 

Although I'm now thinking of making that comment into kernel-doc and
turning it into advice to the caller rather than an internal note to
other mm developers.


Re: [PATCH v2] page_alloc: Fix freeing non-compound pages

2020-09-28 Thread Andrew Morton
On Sat, 26 Sep 2020 22:39:19 +0100 "Matthew Wilcox (Oracle)" 
 wrote:

> Here is a very rare race which leaks memory:

Not worth a cc:stable?

> Page P0 is allocated to the page cache.  Page P1 is free.
> 
> Thread AThread BThread C
> find_get_entry():
> xas_load() returns P0
>   Removes P0 from page cache
>   P0 finds its buddy P1
>   alloc_pages(GFP_KERNEL, 1) returns P0
>   P0 has refcount 1
> page_cache_get_speculative(P0)
> P0 has refcount 2
>   __free_pages(P0)

__free_pages(P0, 1), I assume.

>   P0 has refcount 1
> put_page(P0)

but this is implicitly order 0

> P1 is not freed

huh.

> Fix this by freeing all the pages in __free_pages() that won't be freed
> by the call to put_page().  It's usually not a good idea to split a page,
> but this is a very unlikely scenario.
> 
> ...
>
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4947,6 +4947,9 @@ void __free_pages(struct page *page, unsigned int order)
>  {
>   if (put_page_testzero(page))
>   free_the_page(page, order);
> + else if (!PageHead(page))
> + while (order-- > 0)
> + free_the_page(page + (1 << order), order);

Well that's weird and scary looking.  `page' has non-zero refcount yet
we go and free random followon pages.  Methinks it merits an
explanatory comment?


[PATCH v2] page_alloc: Fix freeing non-compound pages

2020-09-26 Thread Matthew Wilcox (Oracle)
Here is a very rare race which leaks memory:

Page P0 is allocated to the page cache.  Page P1 is free.

Thread AThread BThread C
find_get_entry():
xas_load() returns P0
Removes P0 from page cache
P0 finds its buddy P1
alloc_pages(GFP_KERNEL, 1) returns P0
P0 has refcount 1
page_cache_get_speculative(P0)
P0 has refcount 2
__free_pages(P0)
P0 has refcount 1
put_page(P0)
P1 is not freed

Fix this by freeing all the pages in __free_pages() that won't be freed
by the call to put_page().  It's usually not a good idea to split a page,
but this is a very unlikely scenario.

Fixes: e286781d5f2e ("mm: speculative page references")
Signed-off-by: Matthew Wilcox (Oracle) 
---
v2: Add a test module.  Verified it works by:
 (1) loading the test module on an unmodified kernel and watching it OOM
 (2) modifying __free_pages() with my v1 patch that neglected to add the
 PageHead test and hitting VM_BUG_ON_PAGE(PageTail(page))
 (3) applying all of this patch and seeing it survive

 lib/Kconfig.debug |  9 +
 lib/Makefile  |  1 +
 lib/test_free_pages.c | 42 ++
 mm/page_alloc.c   |  3 +++
 4 files changed, 55 insertions(+)
 create mode 100644 lib/test_free_pages.c

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index e068c3c7189a..eed59af0e907 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2369,6 +2369,15 @@ config TEST_HMM
 
  If unsure, say N.
 
+config TEST_FREE_PAGES
+   tristate "Test freeing pages"
+   help
+ Test that a memory leak does not occur due to a race between
+ freeing a block of pages and a speculative page reference.
+ Loading this module is safe if your kernel has the bug fixed.
+ If the bug is not fixed, it will leak gigabytes of memory and
+ probably OOM your system.
+
 config TEST_FPU
tristate "Test floating point operations in kernel space"
depends on X86 && !KCOV_INSTRUMENT_ALL
diff --git a/lib/Makefile b/lib/Makefile
index a4a4c6864f51..071b687b7363 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -99,6 +99,7 @@ obj-$(CONFIG_TEST_BLACKHOLE_DEV) += test_blackhole_dev.o
 obj-$(CONFIG_TEST_MEMINIT) += test_meminit.o
 obj-$(CONFIG_TEST_LOCKUP) += test_lockup.o
 obj-$(CONFIG_TEST_HMM) += test_hmm.o
+obj-$(CONFIG_TEST_FREE_PAGES) += test_free_pages.o
 
 #
 # CFLAGS for compiling floating point code inside the kernel. x86/Makefile 
turns
diff --git a/lib/test_free_pages.c b/lib/test_free_pages.c
new file mode 100644
index ..074e76bd76b2
--- /dev/null
+++ b/lib/test_free_pages.c
@@ -0,0 +1,42 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * test_free_pages.c: Check that free_pages() doesn't leak memory
+ * Copyright (c) 2020 Oracle
+ * Author: Matthew Wilcox 
+ */
+
+#include 
+#include 
+#include 
+
+static void test_free_pages(gfp_t gfp)
+{
+   unsigned int i;
+
+   for (i = 0; i < 1000 * 1000; i++) {
+   unsigned long addr = __get_free_pages(gfp, 3);
+   struct page *page = virt_to_page(addr);
+
+   /* Simulate page cache getting a speculative reference */
+   get_page(page);
+   free_pages(addr, 3);
+   put_page(page);
+   }
+}
+
+static int m_in(void)
+{
+   test_free_pages(GFP_KERNEL);
+   test_free_pages(GFP_KERNEL | __GFP_COMP);
+
+   return 0;
+}
+
+static void m_ex(void)
+{
+}
+
+module_init(m_in);
+module_exit(m_ex);
+MODULE_AUTHOR("Matthew Wilcox ");
+MODULE_LICENSE("GPL");
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index fab5e97dc9ca..9b259c76e285 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4947,6 +4947,9 @@ void __free_pages(struct page *page, unsigned int order)
 {
if (put_page_testzero(page))
free_the_page(page, order);
+   else if (!PageHead(page))
+   while (order-- > 0)
+   free_the_page(page + (1 << order), order);
 }
 EXPORT_SYMBOL(__free_pages);
 
-- 
2.28.0