Re: [PATCH v1] mm: hugetlb: fix hugepage memory leak caused by wrong reserve count
On 11/23/2015 09:32 PM, Naoya Horiguchi wrote:
> On Fri, Nov 20, 2015 at 01:56:18PM -0800, Mike Kravetz wrote:
>> On 11/19/2015 11:57 PM, Hillf Danton wrote:
>>>> When dequeue_huge_page_vma() in alloc_huge_page() fails, we fall back to
>>>> alloc_buddy_huge_page() to directly create a hugepage from the buddy
>>>> allocator. In that case, however, if alloc_buddy_huge_page() succeeds we
>>>> don't decrement h->resv_huge_pages, which means that a successful
>>>> hugetlb_fault() returns without releasing the reserve count. As a result,
>>>> a subsequent hugetlb_fault() might fail despite there still being free
>>>> hugepages.
>>>>
>>>> This patch simply adds decrementing code on that code path.
>>
>> In general, I agree with the patch. If we allocate a huge page via the
>> buddy allocator and that page will be used to satisfy a reservation, then
>> we need to decrement the reservation count.
>>
>> As Hillf mentions, this code is not exactly the same in linux-next.
>> Specifically, there is the new call to take the memory policy of the
>> vma into account when calling the buddy allocator. I do not think
>> this impacts your proposed change, but you may want to test with that
>> in place.
>>
>>>> I reproduced this problem when testing the v4.3 kernel in the following
>>>> situation:
>>>> - the test machine/VM is a NUMA system,
>>>> - hugepage overcommitting is enabled,
>>>> - most hugepages are allocated and there's only one free hugepage,
>>>>   which is on node 0 (for example),
>>>> - another program, which calls set_mempolicy(MPOL_BIND) to bind itself to
>>>>   node 1, tries to allocate a hugepage,
>>
>> I am curious about this scenario. When this second program attempts to
>> allocate the page, I assume it creates a reservation first. Is this
>> reservation made before or after setting the mempolicy? If the mempolicy
>> was set first, I would have expected the reservation to allocate a page
>> on node 1 to satisfy the reservation.
>
> My testing called set_mempolicy() first and then called mmap(), but things
> didn't change if I reordered them, because currently hugetlb reservation is
> not NUMA-aware.

Ah right. I was looking at gather_surplus_pages() as called by
hugetlb_acct_memory() to account for a new reservation. In your case, the
global free count is still large enough to satisfy the reservation, so
gather_surplus_pages() simply increases the global reservation count. If
there were not enough free pages, alloc_buddy_huge_page() would be called
in an attempt to allocate enough free pages. As is the case in
alloc_huge_page(), the mempolicy of the task would be taken into account
(if there is no vma-specific policy). So, the new huge pages to satisfy
the reservation would 'hopefully' be allocated on the correct node.

Sorry, I was thinking your test might be allocating a new huge page at
reservation time. But it is not.

--
Mike Kravetz

> Thanks,
> Naoya Horiguchi

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v1] mm: hugetlb: fix hugepage memory leak caused by wrong reserve count
On Fri, Nov 20, 2015 at 01:56:18PM -0800, Mike Kravetz wrote:
> On 11/19/2015 11:57 PM, Hillf Danton wrote:
>>> When dequeue_huge_page_vma() in alloc_huge_page() fails, we fall back to
>>> alloc_buddy_huge_page() to directly create a hugepage from the buddy
>>> allocator. In that case, however, if alloc_buddy_huge_page() succeeds we
>>> don't decrement h->resv_huge_pages, which means that a successful
>>> hugetlb_fault() returns without releasing the reserve count. As a result,
>>> a subsequent hugetlb_fault() might fail despite there still being free
>>> hugepages.
>>>
>>> This patch simply adds decrementing code on that code path.
>
> In general, I agree with the patch. If we allocate a huge page via the
> buddy allocator and that page will be used to satisfy a reservation, then
> we need to decrement the reservation count.
>
> As Hillf mentions, this code is not exactly the same in linux-next.
> Specifically, there is the new call to take the memory policy of the
> vma into account when calling the buddy allocator. I do not think
> this impacts your proposed change, but you may want to test with that
> in place.
>
>>> I reproduced this problem when testing the v4.3 kernel in the following
>>> situation:
>>> - the test machine/VM is a NUMA system,
>>> - hugepage overcommitting is enabled,
>>> - most hugepages are allocated and there's only one free hugepage,
>>>   which is on node 0 (for example),
>>> - another program, which calls set_mempolicy(MPOL_BIND) to bind itself to
>>>   node 1, tries to allocate a hugepage,
>
> I am curious about this scenario. When this second program attempts to
> allocate the page, I assume it creates a reservation first. Is this
> reservation made before or after setting the mempolicy? If the mempolicy
> was set first, I would have expected the reservation to allocate a page
> on node 1 to satisfy the reservation.

My testing called set_mempolicy() first and then called mmap(), but things
didn't change if I reordered them, because currently hugetlb reservation is
not NUMA-aware.

Thanks,
Naoya Horiguchi
Re: [PATCH v1] mm: hugetlb: fix hugepage memory leak caused by wrong reserve count
On Fri, Nov 20, 2015 at 02:26:38PM -0800, Andrew Morton wrote:
> On Fri, 20 Nov 2015 15:57:21 +0800 "Hillf Danton" wrote:
>>> When dequeue_huge_page_vma() in alloc_huge_page() fails, we fall back to
>>> alloc_buddy_huge_page() to directly create a hugepage from the buddy
>>> allocator. In that case, however, if alloc_buddy_huge_page() succeeds we
>>> don't decrement h->resv_huge_pages, which means that a successful
>>> hugetlb_fault() returns without releasing the reserve count. As a result,
>>> a subsequent hugetlb_fault() might fail despite there still being free
>>> hugepages.
>>>
>>> This patch simply adds decrementing code on that code path.
>>>
>>> I reproduced this problem when testing the v4.3 kernel in the following
>>> situation:
>>> - the test machine/VM is a NUMA system,
>>> - hugepage overcommitting is enabled,
>>> - most hugepages are allocated and there's only one free hugepage,
>>>   which is on node 0 (for example),
>>> - another program, which calls set_mempolicy(MPOL_BIND) to bind itself to
>>>   node 1, tries to allocate a hugepage,
>>> - the allocation should fail but the reserve count is still held.
>>>
>>> Signed-off-by: Naoya Horiguchi
>>> Cc: [3.16+]
>>> ---
>>> - the reason why I set the stable target to "3.16+" is that this patch
>>>   can be applied easily/automatically on these versions. But this bug
>>>   seems to be an old one, so if you are interested in backporting to
>>>   older kernels, please let me know.
>>> ---
>>>  mm/hugetlb.c | 5 ++++-
>>>  1 files changed, 4 insertions(+), 1 deletions(-)
>>>
>>> diff --git v4.3/mm/hugetlb.c v4.3_patched/mm/hugetlb.c
>>> index 9cc7734..77c518c 100644
>>> --- v4.3/mm/hugetlb.c
>>> +++ v4.3_patched/mm/hugetlb.c
>>> @@ -1790,7 +1790,10 @@ struct page *alloc_huge_page(struct vm_area_struct *vma,
>>>  	page = alloc_buddy_huge_page(h, NUMA_NO_NODE);
>>>  	if (!page)
>>>  		goto out_uncharge_cgroup;
>>> -
>>> +	if (!avoid_reserve && vma_has_reserves(vma, gbl_chg)) {
>>> +		SetPagePrivate(page);
>>> +		h->resv_huge_pages--;
>>> +	}
>>
>> I am wondering if this patch was prepared against the next tree.
>
> It's against 4.3.

Hi Hillf, Andrew,

That's right, this was against 4.3, and I agree with the adjustment for
next as done below.

> Here's the version I have, against current -linus:
>
> --- a/mm/hugetlb.c~mm-hugetlb-fix-hugepage-memory-leak-caused-by-wrong-reserve-count
> +++ a/mm/hugetlb.c
> @@ -1886,7 +1886,10 @@ struct page *alloc_huge_page(struct vm_a
>  	page = __alloc_buddy_huge_page_with_mpol(h, vma, addr);
>  	if (!page)
>  		goto out_uncharge_cgroup;
> -
> +	if (!avoid_reserve && vma_has_reserves(vma, gbl_chg)) {
> +		SetPagePrivate(page);
> +		h->resv_huge_pages--;
> +	}
>  	spin_lock(&hugetlb_lock);
>  	list_move(&page->lru, &h->hugepage_activelist);
>  	/* Fall through */
>
> It needs a careful re-review and, preferably, retest please.

I retested and made sure that the fix works on next-20151123.

Thanks,
Naoya Horiguchi

> Probably when Greg comes to merge this he'll hit problems and we'll
> need to provide him with the against-4.3 patch.
Re: [PATCH v1] mm: hugetlb: fix hugepage memory leak caused by wrong reserve count
On Fri, 20 Nov 2015 15:57:21 +0800 "Hillf Danton" wrote:
>> When dequeue_huge_page_vma() in alloc_huge_page() fails, we fall back to
>> alloc_buddy_huge_page() to directly create a hugepage from the buddy
>> allocator. In that case, however, if alloc_buddy_huge_page() succeeds we
>> don't decrement h->resv_huge_pages, which means that a successful
>> hugetlb_fault() returns without releasing the reserve count. As a result,
>> a subsequent hugetlb_fault() might fail despite there still being free
>> hugepages.
>>
>> This patch simply adds decrementing code on that code path.
>>
>> I reproduced this problem when testing the v4.3 kernel in the following
>> situation:
>> - the test machine/VM is a NUMA system,
>> - hugepage overcommitting is enabled,
>> - most hugepages are allocated and there's only one free hugepage,
>>   which is on node 0 (for example),
>> - another program, which calls set_mempolicy(MPOL_BIND) to bind itself to
>>   node 1, tries to allocate a hugepage,
>> - the allocation should fail but the reserve count is still held.
>>
>> Signed-off-by: Naoya Horiguchi
>> Cc: [3.16+]
>> ---
>> - the reason why I set the stable target to "3.16+" is that this patch
>>   can be applied easily/automatically on these versions. But this bug
>>   seems to be an old one, so if you are interested in backporting to
>>   older kernels, please let me know.
>> ---
>>  mm/hugetlb.c | 5 ++++-
>>  1 files changed, 4 insertions(+), 1 deletions(-)
>>
>> diff --git v4.3/mm/hugetlb.c v4.3_patched/mm/hugetlb.c
>> index 9cc7734..77c518c 100644
>> --- v4.3/mm/hugetlb.c
>> +++ v4.3_patched/mm/hugetlb.c
>> @@ -1790,7 +1790,10 @@ struct page *alloc_huge_page(struct vm_area_struct *vma,
>>  	page = alloc_buddy_huge_page(h, NUMA_NO_NODE);
>>  	if (!page)
>>  		goto out_uncharge_cgroup;
>> -
>> +	if (!avoid_reserve && vma_has_reserves(vma, gbl_chg)) {
>> +		SetPagePrivate(page);
>> +		h->resv_huge_pages--;
>> +	}
>
> I am wondering if this patch was prepared against the next tree.

It's against 4.3.

Here's the version I have, against current -linus:

--- a/mm/hugetlb.c~mm-hugetlb-fix-hugepage-memory-leak-caused-by-wrong-reserve-count
+++ a/mm/hugetlb.c
@@ -1886,7 +1886,10 @@ struct page *alloc_huge_page(struct vm_a
 	page = __alloc_buddy_huge_page_with_mpol(h, vma, addr);
 	if (!page)
 		goto out_uncharge_cgroup;
-
+	if (!avoid_reserve && vma_has_reserves(vma, gbl_chg)) {
+		SetPagePrivate(page);
+		h->resv_huge_pages--;
+	}
 	spin_lock(&hugetlb_lock);
 	list_move(&page->lru, &h->hugepage_activelist);
 	/* Fall through */

It needs a careful re-review and, preferably, retest please.

Probably when Greg comes to merge this he'll hit problems and we'll
need to provide him with the against-4.3 patch.
Re: [PATCH v1] mm: hugetlb: fix hugepage memory leak caused by wrong reserve count
On 11/19/2015 11:57 PM, Hillf Danton wrote:
>> When dequeue_huge_page_vma() in alloc_huge_page() fails, we fall back to
>> alloc_buddy_huge_page() to directly create a hugepage from the buddy
>> allocator. In that case, however, if alloc_buddy_huge_page() succeeds we
>> don't decrement h->resv_huge_pages, which means that a successful
>> hugetlb_fault() returns without releasing the reserve count. As a result,
>> a subsequent hugetlb_fault() might fail despite there still being free
>> hugepages.
>>
>> This patch simply adds decrementing code on that code path.

In general, I agree with the patch. If we allocate a huge page via the
buddy allocator and that page will be used to satisfy a reservation, then
we need to decrement the reservation count.

As Hillf mentions, this code is not exactly the same in linux-next.
Specifically, there is the new call to take the memory policy of the
vma into account when calling the buddy allocator. I do not think
this impacts your proposed change, but you may want to test with that
in place.

>> I reproduced this problem when testing the v4.3 kernel in the following
>> situation:
>> - the test machine/VM is a NUMA system,
>> - hugepage overcommitting is enabled,
>> - most hugepages are allocated and there's only one free hugepage,
>>   which is on node 0 (for example),
>> - another program, which calls set_mempolicy(MPOL_BIND) to bind itself to
>>   node 1, tries to allocate a hugepage,

I am curious about this scenario. When this second program attempts to
allocate the page, I assume it creates a reservation first. Is this
reservation made before or after setting the mempolicy? If the mempolicy
was set first, I would have expected the reservation to allocate a page
on node 1 to satisfy the reservation.

--
Mike Kravetz

>> - the allocation should fail but the reserve count is still held.
>>
>> Signed-off-by: Naoya Horiguchi
>> Cc: [3.16+]
>> ---
>> - the reason why I set the stable target to "3.16+" is that this patch
>>   can be applied easily/automatically on these versions. But this bug
>>   seems to be an old one, so if you are interested in backporting to
>>   older kernels, please let me know.
>> ---
>>  mm/hugetlb.c | 5 ++++-
>>  1 files changed, 4 insertions(+), 1 deletions(-)
>>
>> diff --git v4.3/mm/hugetlb.c v4.3_patched/mm/hugetlb.c
>> index 9cc7734..77c518c 100644
>> --- v4.3/mm/hugetlb.c
>> +++ v4.3_patched/mm/hugetlb.c
>> @@ -1790,7 +1790,10 @@ struct page *alloc_huge_page(struct vm_area_struct *vma,
>>  	page = alloc_buddy_huge_page(h, NUMA_NO_NODE);
>>  	if (!page)
>>  		goto out_uncharge_cgroup;
>> -
>> +	if (!avoid_reserve && vma_has_reserves(vma, gbl_chg)) {
>> +		SetPagePrivate(page);
>> +		h->resv_huge_pages--;
>> +	}
>
> I am wondering if this patch was prepared against the next tree.
>
>>  	spin_lock(&hugetlb_lock);
>>  	list_move(&page->lru, &h->hugepage_activelist);
>>  	/* Fall through */
>> --
>> 1.7.1
Re: [PATCH v1] mm: hugetlb: fix hugepage memory leak caused by wrong reserve count
> When dequeue_huge_page_vma() in alloc_huge_page() fails, we fall back to
> alloc_buddy_huge_page() to directly create a hugepage from the buddy
> allocator. In that case, however, if alloc_buddy_huge_page() succeeds we
> don't decrement h->resv_huge_pages, which means that a successful
> hugetlb_fault() returns without releasing the reserve count. As a result,
> a subsequent hugetlb_fault() might fail despite there still being free
> hugepages.
>
> This patch simply adds decrementing code on that code path.
>
> I reproduced this problem when testing the v4.3 kernel in the following
> situation:
> - the test machine/VM is a NUMA system,
> - hugepage overcommitting is enabled,
> - most hugepages are allocated and there's only one free hugepage,
>   which is on node 0 (for example),
> - another program, which calls set_mempolicy(MPOL_BIND) to bind itself to
>   node 1, tries to allocate a hugepage,
> - the allocation should fail but the reserve count is still held.
>
> Signed-off-by: Naoya Horiguchi
> Cc: [3.16+]
> ---
> - the reason why I set the stable target to "3.16+" is that this patch
>   can be applied easily/automatically on these versions. But this bug
>   seems to be an old one, so if you are interested in backporting to
>   older kernels, please let me know.
> ---
>  mm/hugetlb.c | 5 ++++-
>  1 files changed, 4 insertions(+), 1 deletions(-)
>
> diff --git v4.3/mm/hugetlb.c v4.3_patched/mm/hugetlb.c
> index 9cc7734..77c518c 100644
> --- v4.3/mm/hugetlb.c
> +++ v4.3_patched/mm/hugetlb.c
> @@ -1790,7 +1790,10 @@ struct page *alloc_huge_page(struct vm_area_struct *vma,
>  	page = alloc_buddy_huge_page(h, NUMA_NO_NODE);
>  	if (!page)
>  		goto out_uncharge_cgroup;
> -
> +	if (!avoid_reserve && vma_has_reserves(vma, gbl_chg)) {
> +		SetPagePrivate(page);
> +		h->resv_huge_pages--;
> +	}

I am wondering if this patch was prepared against the next tree.

>  	spin_lock(&hugetlb_lock);
>  	list_move(&page->lru, &h->hugepage_activelist);
>  	/* Fall through */
> --
> 1.7.1
Re: [PATCH v1] mm: hugetlb: fix hugepage memory leak caused by wrong reserve count
>
> When dequeue_huge_page_vma() in alloc_huge_page() fails, we fall back to
> alloc_buddy_huge_page() to directly create a hugepage from the buddy
> allocator. In that case, however, if alloc_buddy_huge_page() succeeds we
> don't decrement h->resv_huge_pages, which means that a successful
> hugetlb_fault() returns without releasing the reserve count. As a result,
> a subsequent hugetlb_fault() might fail even though there are still free
> hugepages.
>
> This patch simply adds the decrement on that code path.
>
> I reproduced this problem when testing a v4.3 kernel in the following
> situation:
> - the test machine/VM is a NUMA system,
> - hugepage overcommitting is enabled,
> - most hugepages are allocated and there's only one free hugepage,
>   which is on node 0 (for example),
> - another program, which calls set_mempolicy(MPOL_BIND) to bind itself
>   to node 1, tries to allocate a hugepage,
> - the allocation should fail, but the reserve count is still held.
>
> Signed-off-by: Naoya Horiguchi
> Cc: [3.16+]
> ---
> - the reason I set the stable target to "3.16+" is that this patch can
>   be applied easily/automatically on those versions. But this bug seems
>   to be an old one, so if you are interested in backporting to older
>   kernels, please let me know.
> ---
>  mm/hugetlb.c |    5 ++++-
>  1 files changed, 4 insertions(+), 1 deletions(-)
>
> diff --git v4.3/mm/hugetlb.c v4.3_patched/mm/hugetlb.c
> index 9cc7734..77c518c 100644
> --- v4.3/mm/hugetlb.c
> +++ v4.3_patched/mm/hugetlb.c
> @@ -1790,7 +1790,10 @@ struct page *alloc_huge_page(struct vm_area_struct *vma,
> 	page = alloc_buddy_huge_page(h, NUMA_NO_NODE);
> 	if (!page)
> 		goto out_uncharge_cgroup;
> -
> +	if (!avoid_reserve && vma_has_reserves(vma, gbl_chg)) {
> +		SetPagePrivate(page);
> +		h->resv_huge_pages--;
> +	}

I am wondering if this patch was prepared against the next tree.
> 	spin_lock(&hugetlb_lock);
> 	list_move(&page->lru, &h->hugepage_activelist);
> 	/* Fall through */
> --
> 1.7.1
[PATCH v1] mm: hugetlb: fix hugepage memory leak caused by wrong reserve count
When dequeue_huge_page_vma() in alloc_huge_page() fails, we fall back to
alloc_buddy_huge_page() to directly create a hugepage from the buddy
allocator. In that case, however, if alloc_buddy_huge_page() succeeds we
don't decrement h->resv_huge_pages, which means that a successful
hugetlb_fault() returns without releasing the reserve count. As a result,
a subsequent hugetlb_fault() might fail even though there are still free
hugepages.

This patch simply adds the decrement on that code path.

I reproduced this problem when testing a v4.3 kernel in the following
situation:
- the test machine/VM is a NUMA system,
- hugepage overcommitting is enabled,
- most hugepages are allocated and there's only one free hugepage,
  which is on node 0 (for example),
- another program, which calls set_mempolicy(MPOL_BIND) to bind itself
  to node 1, tries to allocate a hugepage,
- the allocation should fail, but the reserve count is still held.

Signed-off-by: Naoya Horiguchi
Cc: [3.16+]
---
- the reason I set the stable target to "3.16+" is that this patch can be
  applied easily/automatically on those versions. But this bug seems to be
  an old one, so if you are interested in backporting to older kernels,
  please let me know.
---
 mm/hugetlb.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git v4.3/mm/hugetlb.c v4.3_patched/mm/hugetlb.c
index 9cc7734..77c518c 100644
--- v4.3/mm/hugetlb.c
+++ v4.3_patched/mm/hugetlb.c
@@ -1790,7 +1790,10 @@ struct page *alloc_huge_page(struct vm_area_struct *vma,
 	page = alloc_buddy_huge_page(h, NUMA_NO_NODE);
 	if (!page)
 		goto out_uncharge_cgroup;
-
+	if (!avoid_reserve && vma_has_reserves(vma, gbl_chg)) {
+		SetPagePrivate(page);
+		h->resv_huge_pages--;
+	}
 	spin_lock(&hugetlb_lock);
 	list_move(&page->lru, &h->hugepage_activelist);
 	/* Fall through */
--
1.7.1