Re: [PATCH v2] page_alloc: Fix freeing non-compound pages
Excerpts from Andrew Morton's message of September 29, 2020 2:46 pm: > On Tue, 29 Sep 2020 02:17:19 +0100 Matthew Wilcox wrote: > >> On Mon, Sep 28, 2020 at 06:03:07PM -0700, Andrew Morton wrote: >> > On Sat, 26 Sep 2020 22:39:19 +0100 "Matthew Wilcox (Oracle)" >> > wrote: >> > >> > > Here is a very rare race which leaks memory: Great catch! [sorry, a bit behind with emails] >> > >> > Not worth a cc:stable? >> >> Yes, it probably should have been. > > Have you a feeling for how often this occurs? > >> I just assume the stablebot will >> pick up anything that has a Fixes: tag. > > We asked them not to do that for mm/ patches. Crazy stuff was getting > backported. > >> > > >> > > --- a/mm/page_alloc.c >> > > +++ b/mm/page_alloc.c >> > > @@ -4947,6 +4947,9 @@ void __free_pages(struct page *page, unsigned int >> > > order) >> > > { >> > > if (put_page_testzero(page)) >> > > free_the_page(page, order); >> > > +else if (!PageHead(page)) >> > > +while (order-- > 0) >> > > +free_the_page(page + (1 << order), order); >> > >> > Well that's weird and scary looking. `page' has non-zero refcount yet >> > we go and free random followon pages. Methinks it merits an >> > explanatory comment? >> >> Well, poot. I lost that comment in the shuffling of patches. In a >> different tree, I have: >> >> @@ -4943,10 +4943,19 @@ static inline void free_the_page(struct page *page, >> unsi >> gned int order) >> __free_pages_ok(page, order); >> } >> >> +/* >> + * If we free a non-compound allocation, another thread may have a > > "non-compound, higher-order", I suggest? > >> + * speculative reference to the first page. It has no way of knowing >> + * about the rest of the allocation, so we have to free all but the >> + * first page here. >> + */ >> void __free_pages(struct page *page, unsigned int order) >> { >> if (put_page_testzero(page)) >> free_the_page(page, order); >> + else if (!PageHead(page)) >> + while (order-- > 0) >> + free_the_page(page + (1 << order), order); >> } >> EXPORT_SYMBOL(__free_pages); >> >> >> Although I'm now thinking of making that comment into kernel-doc and >> turning it into advice to the caller rather than an internal note to >> other mm developers. > > hm. But what action could the caller take? The explanatory comment > seems OK to me. The version of this without the comment got merged. I didn't mind the comment... Thanks, Nick
Re: [PATCH v2] page_alloc: Fix freeing non-compound pages
On Tue, Sep 29, 2020 at 03:06:22PM +0100, Matthew Wilcox wrote: > On Tue, Sep 29, 2020 at 10:26:22AM +0300, Mike Rapoport wrote: > > This sentence presumes existing description/prior knowledge about > > put_page(). > > > > Maybe > > > > This function can free multi-page allocations that were not allocated > > with %__GFP_COMP, unlike put_page() that would free only the first page > > in such case. __free_pages() does not ... > > Thanks. After waking up this morning I did a more extensive rewrite: I like this one Acked-by: Mike Rapoport > /** > * __free_pages - Free pages allocated with alloc_pages(). > * @page: The page pointer returned from alloc_pages(). > * @order: The order of the allocation. > * > * This function can free multi-page allocations that are not compound > * pages. It does not check that the @order passed in matches that of > * the allocation, so it is easy to leak memory. Freeing more memory > * than was allocated will probably emit a warning. > * > * If the last reference to this page is speculative, it will be released > * by put_page() which only frees the first page of a non-compound > * allocation. To prevent the remaining pages from being leaked, we free > * the subsequent pages here. If you want to use the page's reference > * count to decide when to free the allocation, you should allocate a > * compound page, and use put_page() instead of __free_pages(). > * > * Context: May be called in interrupt context or holding a normal > * spinlock, but not in NMI context or while holding a raw spinlock. > */ > -- Sincerely yours, Mike.
Re: [PATCH v2] page_alloc: Fix freeing non-compound pages
On Mon, Sep 28, 2020 at 09:46:56PM -0700, Andrew Morton wrote: > On Tue, 29 Sep 2020 02:17:19 +0100 Matthew Wilcox wrote: > > > On Mon, Sep 28, 2020 at 06:03:07PM -0700, Andrew Morton wrote: > > > On Sat, 26 Sep 2020 22:39:19 +0100 "Matthew Wilcox (Oracle)" > > > wrote: > > > > > > > Here is a very rare race which leaks memory: > > > > > > Not worth a cc:stable? > > > > Yes, it probably should have been. > > Have you a feeling for how often this occurs? I doubt it happens often. I don't think I could construct a workload to make it happen frequently. Maybe more often with a virtualised workload where a thread can be preempted between instructions. > > I just assume the stablebot will > > pick up anything that has a Fixes: tag. > > We asked them not to do that for mm/ patches. Crazy stuff was getting > backported. That's a shame. I'll try to remember to cc them explicitly in the future. > > Although I'm now thinking of making that comment into kernel-doc and > > turning it into advice to the caller rather than an internal note to > > other mm developers. > > hm. But what action could the caller take? The explanatory comment > seems OK to me. Use compound pages instead of non-compound pages. Although Linus has asked that people stop using __get_free_pages(), so maybe that will be the direction we go in. https://lore.kernel.org/lkml/ca+55afwyxj+topajznc5mpj-25xblaeu8ijp8ztyhma3lxf...@mail.gmail.com/
Re: [PATCH v2] page_alloc: Fix freeing non-compound pages
On Tue, Sep 29, 2020 at 10:26:22AM +0300, Mike Rapoport wrote: > This sentence presumes existing description/prior knowledge about > put_page(). > > Maybe > > This function can free multi-page allocations that were not allocated > with %__GFP_COMP, unlike put_page() that would free only the first page > in such case. __free_pages() does not ... Thanks. After waking up this morning I did a more extensive rewrite: /** * __free_pages - Free pages allocated with alloc_pages(). * @page: The page pointer returned from alloc_pages(). * @order: The order of the allocation. * * This function can free multi-page allocations that are not compound * pages. It does not check that the @order passed in matches that of * the allocation, so it is easy to leak memory. Freeing more memory * than was allocated will probably emit a warning. * * If the last reference to this page is speculative, it will be released * by put_page() which only frees the first page of a non-compound * allocation. To prevent the remaining pages from being leaked, we free * the subsequent pages here. If you want to use the page's reference * count to decide when to free the allocation, you should allocate a * compound page, and use put_page() instead of __free_pages(). * * Context: May be called in interrupt context or holding a normal * spinlock, but not in NMI context or while holding a raw spinlock. */
Re: [PATCH v2] page_alloc: Fix freeing non-compound pages
On Tue, Sep 29, 2020 at 04:40:26AM +0100, Matthew Wilcox wrote: > On Mon, Sep 28, 2020 at 06:03:07PM -0700, Andrew Morton wrote: > > Well that's weird and scary looking. `page' has non-zero refcount yet > > we go and free random followon pages. Methinks it merits an > > explanatory comment? > > Here's some kernel-doc. Opinions? > > /** > * __free_pages - Free pages allocated with alloc_pages(). > * @page: The page pointer returned from alloc_pages(). > * @order: The order of the allocation. > * > * This function differs from put_page() in that it can free multi-page This sentence presumes existing description/prior knowledge about put_page(). Maybe This function can free multi-page allocations that were not allocated with %__GFP_COMP, unlike put_page() that would free only the first page in such case. __free_pages() does not ... > * allocations that were not allocated with %__GFP_COMP. This function > * does not check that the @order passed in matches that of the > * allocation, so it is possible to leak memory. Freeing more memory than > * was allocated will probably be warned about by other debugging checks. > * > * It is only safe to use the page reference count to determine when > * to free an allocation if you use %__GFP_COMP (in which case, you may > * as well use put_page() to free the page). Another thread may have a > * speculative reference to the first page, but it has no way of knowing > * about the rest of the allocation, so we have to free all but the > * first page here. > * > * Context: May be called in interrupt context but not NMI context. > */ > -- Sincerely yours, Mike.
Re: [PATCH v2] page_alloc: Fix freeing non-compound pages
On Tue, 29 Sep 2020 02:17:19 +0100 Matthew Wilcox wrote: > On Mon, Sep 28, 2020 at 06:03:07PM -0700, Andrew Morton wrote: > > On Sat, 26 Sep 2020 22:39:19 +0100 "Matthew Wilcox (Oracle)" > > wrote: > > > > > Here is a very rare race which leaks memory: > > > > Not worth a cc:stable? > > Yes, it probably should have been. Have you a feeling for how often this occurs? > I just assume the stablebot will > pick up anything that has a Fixes: tag. We asked them not to do that for mm/ patches. Crazy stuff was getting backported. > > > > > > --- a/mm/page_alloc.c > > > +++ b/mm/page_alloc.c > > > @@ -4947,6 +4947,9 @@ void __free_pages(struct page *page, unsigned int > > > order) > > > { > > > if (put_page_testzero(page)) > > > free_the_page(page, order); > > > + else if (!PageHead(page)) > > > + while (order-- > 0) > > > + free_the_page(page + (1 << order), order); > > > > Well that's weird and scary looking. `page' has non-zero refcount yet > > we go and free random followon pages. Methinks it merits an > > explanatory comment? > > Well, poot. I lost that comment in the shuffling of patches. In a > different tree, I have: > > @@ -4943,10 +4943,19 @@ static inline void free_the_page(struct page *page, > unsi > gned int order) > __free_pages_ok(page, order); > } > > +/* > + * If we free a non-compound allocation, another thread may have a "non-compound, higher-order", I suggest? > + * speculative reference to the first page. It has no way of knowing > + * about the rest of the allocation, so we have to free all but the > + * first page here. > + */ > void __free_pages(struct page *page, unsigned int order) > { > if (put_page_testzero(page)) > free_the_page(page, order); > + else if (!PageHead(page)) > + while (order-- > 0) > + free_the_page(page + (1 << order), order); > } > EXPORT_SYMBOL(__free_pages); > > > Although I'm now thinking of making that comment into kernel-doc and > turning it into advice to the caller rather than an internal note to > other mm developers. hm. But what action could the caller take? The explanatory comment seems OK to me.
Re: [PATCH v2] page_alloc: Fix freeing non-compound pages
On Mon, Sep 28, 2020 at 06:03:07PM -0700, Andrew Morton wrote: > Well that's weird and scary looking. `page' has non-zero refcount yet > we go and free random followon pages. Methinks it merits an > explanatory comment? Here's some kernel-doc. Opinions? /** * __free_pages - Free pages allocated with alloc_pages(). * @page: The page pointer returned from alloc_pages(). * @order: The order of the allocation. * * This function differs from put_page() in that it can free multi-page * allocations that were not allocated with %__GFP_COMP. This function * does not check that the @order passed in matches that of the * allocation, so it is possible to leak memory. Freeing more memory than * was allocated will probably be warned about by other debugging checks. * * It is only safe to use the page reference count to determine when * to free an allocation if you use %__GFP_COMP (in which case, you may * as well use put_page() to free the page). Another thread may have a * speculative reference to the first page, but it has no way of knowing * about the rest of the allocation, so we have to free all but the * first page here. * * Context: May be called in interrupt context but not NMI context. */
Re: [PATCH v2] page_alloc: Fix freeing non-compound pages
On Mon, Sep 28, 2020 at 06:03:07PM -0700, Andrew Morton wrote: > On Sat, 26 Sep 2020 22:39:19 +0100 "Matthew Wilcox (Oracle)" > wrote: > > > Here is a very rare race which leaks memory: > > Not worth a cc:stable? Yes, it probably should have been. I just assume the stablebot will pick up anything that has a Fixes: tag. > > Page P0 is allocated to the page cache. Page P1 is free. > > > > Thread AThread BThread C > > find_get_entry(): > > xas_load() returns P0 > > Removes P0 from page cache > > P0 finds its buddy P1 > > alloc_pages(GFP_KERNEL, 1) returns P0 > > P0 has refcount 1 > > page_cache_get_speculative(P0) > > P0 has refcount 2 > > __free_pages(P0) > > __free_pages(P0, 1), I assume. Good catch. That was what I meant to type. > > P0 has refcount 1 > > put_page(P0) > > but this is implicitly order 0 Right, because it's not a compound page. > > P1 is not freed > > huh. Yeah. Nasty, and we'll never know how often it was hit. > > Fix this by freeing all the pages in __free_pages() that won't be freed > > by the call to put_page(). It's usually not a good idea to split a page, > > but this is a very unlikely scenario. > > > > ... > > > > --- a/mm/page_alloc.c > > +++ b/mm/page_alloc.c > > @@ -4947,6 +4947,9 @@ void __free_pages(struct page *page, unsigned int > > order) > > { > > if (put_page_testzero(page)) > > free_the_page(page, order); > > + else if (!PageHead(page)) > > + while (order-- > 0) > > + free_the_page(page + (1 << order), order); > > Well that's weird and scary looking. `page' has non-zero refcount yet > we go and free random followon pages. Methinks it merits an > explanatory comment? Well, poot. I lost that comment in the shuffling of patches. In a different tree, I have: @@ -4943,10 +4943,19 @@ static inline void free_the_page(struct page *page, unsi gned int order) __free_pages_ok(page, order); } +/* + * If we free a non-compound allocation, another thread may have a + * speculative reference to the first page. It has no way of knowing + * about the rest of the allocation, so we have to free all but the + * first page here. + */ void __free_pages(struct page *page, unsigned int order) { if (put_page_testzero(page)) free_the_page(page, order); + else if (!PageHead(page)) + while (order-- > 0) + free_the_page(page + (1 << order), order); } EXPORT_SYMBOL(__free_pages); Although I'm now thinking of making that comment into kernel-doc and turning it into advice to the caller rather than an internal note to other mm developers.
Re: [PATCH v2] page_alloc: Fix freeing non-compound pages
On Sat, 26 Sep 2020 22:39:19 +0100 "Matthew Wilcox (Oracle)" wrote: > Here is a very rare race which leaks memory: Not worth a cc:stable? > Page P0 is allocated to the page cache. Page P1 is free. > > Thread AThread BThread C > find_get_entry(): > xas_load() returns P0 > Removes P0 from page cache > P0 finds its buddy P1 > alloc_pages(GFP_KERNEL, 1) returns P0 > P0 has refcount 1 > page_cache_get_speculative(P0) > P0 has refcount 2 > __free_pages(P0) __free_pages(P0, 1), I assume. > P0 has refcount 1 > put_page(P0) but this is implicitly order 0 > P1 is not freed huh. > Fix this by freeing all the pages in __free_pages() that won't be freed > by the call to put_page(). It's usually not a good idea to split a page, > but this is a very unlikely scenario. > > ... > > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -4947,6 +4947,9 @@ void __free_pages(struct page *page, unsigned int order) > { > if (put_page_testzero(page)) > free_the_page(page, order); > + else if (!PageHead(page)) > + while (order-- > 0) > + free_the_page(page + (1 << order), order); Well that's weird and scary looking. `page' has non-zero refcount yet we go and free random followon pages. Methinks it merits an explanatory comment?
[PATCH v2] page_alloc: Fix freeing non-compound pages
Here is a very rare race which leaks memory: Page P0 is allocated to the page cache. Page P1 is free. Thread AThread BThread C find_get_entry(): xas_load() returns P0 Removes P0 from page cache P0 finds its buddy P1 alloc_pages(GFP_KERNEL, 1) returns P0 P0 has refcount 1 page_cache_get_speculative(P0) P0 has refcount 2 __free_pages(P0) P0 has refcount 1 put_page(P0) P1 is not freed Fix this by freeing all the pages in __free_pages() that won't be freed by the call to put_page(). It's usually not a good idea to split a page, but this is a very unlikely scenario. Fixes: e286781d5f2e ("mm: speculative page references") Signed-off-by: Matthew Wilcox (Oracle) --- v2: Add a test module. Verified it works by: (1) loading the test module on an unmodified kernel and watching it OOM (2) modifying __free_pages() with my v1 patch that neglected to add the PageHead test and hitting VM_BUG_ON_PAGE(PageTail(page)) (3) applying all of this patch and seeing it survive lib/Kconfig.debug | 9 + lib/Makefile | 1 + lib/test_free_pages.c | 42 ++ mm/page_alloc.c | 3 +++ 4 files changed, 55 insertions(+) create mode 100644 lib/test_free_pages.c diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index e068c3c7189a..eed59af0e907 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -2369,6 +2369,15 @@ config TEST_HMM If unsure, say N. +config TEST_FREE_PAGES + tristate "Test freeing pages" + help + Test that a memory leak does not occur due to a race between + freeing a block of pages and a speculative page reference. + Loading this module is safe if your kernel has the bug fixed. + If the bug is not fixed, it will leak gigabytes of memory and + probably OOM your system. + config TEST_FPU tristate "Test floating point operations in kernel space" depends on X86 && !KCOV_INSTRUMENT_ALL diff --git a/lib/Makefile b/lib/Makefile index a4a4c6864f51..071b687b7363 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -99,6 +99,7 @@ obj-$(CONFIG_TEST_BLACKHOLE_DEV) += test_blackhole_dev.o obj-$(CONFIG_TEST_MEMINIT) += test_meminit.o obj-$(CONFIG_TEST_LOCKUP) += test_lockup.o obj-$(CONFIG_TEST_HMM) += test_hmm.o +obj-$(CONFIG_TEST_FREE_PAGES) += test_free_pages.o # # CFLAGS for compiling floating point code inside the kernel. x86/Makefile turns diff --git a/lib/test_free_pages.c b/lib/test_free_pages.c new file mode 100644 index ..074e76bd76b2 --- /dev/null +++ b/lib/test_free_pages.c @@ -0,0 +1,42 @@ +// SPDX-License-Identifier: GPL-2.0+ +/* + * test_free_pages.c: Check that free_pages() doesn't leak memory + * Copyright (c) 2020 Oracle + * Author: Matthew Wilcox + */ + +#include +#include +#include + +static void test_free_pages(gfp_t gfp) +{ + unsigned int i; + + for (i = 0; i < 1000 * 1000; i++) { + unsigned long addr = __get_free_pages(gfp, 3); + struct page *page = virt_to_page(addr); + + /* Simulate page cache getting a speculative reference */ + get_page(page); + free_pages(addr, 3); + put_page(page); + } +} + +static int m_in(void) +{ + test_free_pages(GFP_KERNEL); + test_free_pages(GFP_KERNEL | __GFP_COMP); + + return 0; +} + +static void m_ex(void) +{ +} + +module_init(m_in); +module_exit(m_ex); +MODULE_AUTHOR("Matthew Wilcox "); +MODULE_LICENSE("GPL"); diff --git a/mm/page_alloc.c b/mm/page_alloc.c index fab5e97dc9ca..9b259c76e285 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -4947,6 +4947,9 @@ void __free_pages(struct page *page, unsigned int order) { if (put_page_testzero(page)) free_the_page(page, order); + else if (!PageHead(page)) + while (order-- > 0) + free_the_page(page + (1 << order), order); } EXPORT_SYMBOL(__free_pages); -- 2.28.0