Re: [RFC PATCH 0/5] mm, hugetlb: allocation API and migration improvements
On 12/20/2017 06:53 PM, Michal Hocko wrote:
> On Wed 20-12-17 05:33:36, Naoya Horiguchi wrote:
>>
>> On 12/15/2017 06:33 PM, Michal Hocko wrote:
>>> Naoya,
>>> this has passed Mike's review (thanks for that!), you have mentioned
>>> that you can pass this through your testing machinery earlier. While
>>> I've done some testing already I would really appreciate if you could
>>> do that as well. Review would be highly appreciated as well.
>>
>> Sorry for my slow response. I reviewed/tested this patchset and it looks
>> good to me overall.
>
> No need to feel sorry. This doesn't have an urgent priority. Thanks for
> the review and testing. Can I assume your {Reviewed,Acked}-by or
> Tested-by?

Yes, I tested again with the additional changes below, and hugetlb
migration works fine from mbind(2). Thank you very much for your work.

Reviewed-by: Naoya Horiguchi

for the series.

Thanks,
Naoya Horiguchi

>> I have one comment on the code path from mbind(2).
>> The callback passed to migrate_pages() in do_mbind() (i.e. new_page())
>> calls alloc_huge_page_noerr() which currently doesn't call
>> SetPageHugeTemporary(),
>> so hugetlb migration fails when h->surplus_huge_pages >=
>> h->nr_overcommit_huge_pages.
>
> Yes, I am aware of that. I should have been more explicit in the
> changelog. Sorry about that and thanks for pointing it out explicitly.
> To be honest I wasn't really sure what to do about this. The code path
> is really complex and it made my head spin. I fail to see why we have to
> call alloc_huge_page and mess with reservations at all.
>
>> I don't think this is a bug, but it would be better if mbind(2) worked
>> more similarly to other migration callers like
>> move_pages(2)/migrate_pages(2).
>
> If the fix is as easy as the following I will add it to the pile.
> Otherwise I would prefer to do this separately after I find some more
> time to understand the callpath.
> ---
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index e035002d3fb6..08a4af411e25 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -345,10 +345,9 @@ struct huge_bootmem_page {
>  struct page *alloc_huge_page(struct vm_area_struct *vma,
>  				unsigned long addr, int avoid_reserve);
>  struct page *alloc_huge_page_node(struct hstate *h, int nid);
> -struct page *alloc_huge_page_noerr(struct vm_area_struct *vma,
> -				unsigned long addr, int avoid_reserve);
>  struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
>  				nodemask_t *nmask);
> +struct page *alloc_huge_page_vma(struct vm_area_struct *vma, unsigned long address);
>  int huge_add_to_page_cache(struct page *page, struct address_space *mapping,
>  			pgoff_t idx);
> 
> @@ -526,7 +525,7 @@ struct hstate {};
>  #define alloc_huge_page(v, a, r) NULL
>  #define alloc_huge_page_node(h, nid) NULL
>  #define alloc_huge_page_nodemask(h, preferred_nid, nmask) NULL
> -#define alloc_huge_page_noerr(v, a, r) NULL
> +#define alloc_huge_page_vma(vma, address) NULL
>  #define alloc_bootmem_huge_page(h) NULL
>  #define hstate_file(f) NULL
>  #define hstate_sizelog(s) NULL
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 4426c5b23a20..e00deabe6d17 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1672,6 +1672,25 @@ struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
>  	return alloc_migrate_huge_page(h, gfp_mask, preferred_nid, nmask);
>  }
> 
> +/* mempolicy aware migration callback */
> +struct page *alloc_huge_page_vma(struct vm_area_struct *vma, unsigned long address)
> +{
> +	struct mempolicy *mpol;
> +	nodemask_t *nodemask;
> +	struct page *page;
> +	struct hstate *h;
> +	gfp_t gfp_mask;
> +	int node;
> +
> +	h = hstate_vma(vma);
> +	gfp_mask = htlb_alloc_mask(h);
> +	node = huge_node(vma, address, gfp_mask, &mpol, &nodemask);
> +	page = alloc_huge_page_nodemask(h, node, nodemask);
> +	mpol_cond_put(mpol);
> +
> +	return page;
> +}
> +
>  /*
>   * Increase the hugetlb pool such that it can accommodate a reservation
>   * of size 'delta'.
> @@ -2077,20 +2096,6 @@ struct page *alloc_huge_page(struct vm_area_struct *vma,
>  	return ERR_PTR(-ENOSPC);
>  }
> 
> -/*
> - * alloc_huge_page()'s wrapper which simply returns the page if allocation
> - * succeeds, otherwise NULL. This function is called from new_vma_page(),
> - * where no ERR_VALUE is expected to be returned.
> - */
> -struct page *alloc_huge_page_noerr(struct vm_area_struct *vma,
> -				unsigned long addr, int avoid_reserve)
> -{
> -	struct page *page = alloc_huge_page(vma, addr, avoid_reserve);
> -	if (IS_ERR(page))
> -		page = NULL;
> -	return page;
> -}
> -
>  int alloc_bootmem_huge_page(struct hstate *h)
>  	__attribute__ ((weak, alias("__alloc_bootmem_huge_page")));
>  int __alloc_bootmem_huge_page(struct hstate *h)
> diff
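[Editor's note] The failure condition Naoya describes above can be sketched as a tiny userspace model in plain C. This is a hedged illustration only: `hstate_model` and `can_alloc_surplus` are invented names, and the real kernel accounting is considerably more involved; only the field names `surplus_huge_pages` and `nr_overcommit_huge_pages` mirror the actual `struct hstate` counters.

```c
#include <stdbool.h>

/* Simplified model of the relevant struct hstate counters. */
struct hstate_model {
	long surplus_huge_pages;
	long nr_overcommit_huge_pages;
};

/*
 * A regular surplus allocation is refused once the surplus pool has
 * reached the overcommit limit; a "temporary" migration page is not
 * counted against surplus accounting at all, which is why the
 * migration-specific allocators succeed where the old
 * alloc_huge_page_noerr() path failed.
 */
bool can_alloc_surplus(const struct hstate_model *h, bool temporary)
{
	if (temporary)
		return true;	/* skips the overcommit accounting */
	return h->surplus_huge_pages < h->nr_overcommit_huge_pages;
}
```

With the pool saturated (`surplus_huge_pages >= nr_overcommit_huge_pages`), only the temporary path can still produce a page, matching the behaviour difference Naoya observed between mbind(2) and move_pages(2)/migrate_pages(2).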
Re: [RFC PATCH 0/5] mm, hugetlb: allocation API and migration improvements
On Thu 21-12-17 15:35:28, Mike Kravetz wrote:
[...]
> You can add,
>
> Reviewed-by: Mike Kravetz
>
> I had some concerns about transferring huge page state during migration
> not specific to this patch, so I did a bunch of testing.

On Fri 22-12-17 08:58:48, Naoya Horiguchi wrote:
[...]
> Yes, I tested again with additional changes below, and hugetlb migration
> works fine from mbind(2). Thank you very much for your work.
>
> Reviewed-by: Naoya Horiguchi
>
> for the series.

Thanks a lot to both of you! I have added the changelog to the last patch.
I am currently busy as hell so I will unlikely send the whole thing before
the new year, but please double check the changelog if you find some more
time.
---
>From 65e7dda5dfe90fa656285729a270855e93637caa Mon Sep 17 00:00:00 2001
From: Michal Hocko
Date: Fri, 22 Dec 2017 10:31:18 +0100
Subject: [PATCH] hugetlb, mempolicy: fix the mbind hugetlb migration

The do_mbind migration code relies on alloc_huge_page_noerr for hugetlb
pages. alloc_huge_page_noerr uses alloc_huge_page, which is a high-level
allocation function that has to take care of reserves, overcommit and
hugetlb cgroup accounting. None of that is really required for page
migration, because the new page is only temporary and will either
replace the original page or be dropped. This is essentially the same as
for other migration call paths, and there shouldn't be any reason to
handle mbind in a special way.

The current implementation is even suboptimal because the migration
might fail just because the hugetlb cgroup limit is reached, or the
overcommit is saturated.

Fix this by making mbind like other hugetlb migration paths. Add a new
migration helper, alloc_huge_page_vma, as a wrapper around
alloc_huge_page_nodemask with additional mempolicy handling.

alloc_huge_page_noerr has no more users, so it can go.
Reviewed-by: Mike Kravetz
Reviewed-by: Naoya Horiguchi
Signed-off-by: Michal Hocko
---
 include/linux/hugetlb.h |  5 ++---
 mm/hugetlb.c            | 33 +++--
 mm/mempolicy.c          |  3 +--
 3 files changed, 22 insertions(+), 19 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index e035002d3fb6..08a4af411e25 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -345,10 +345,9 @@ struct huge_bootmem_page {
 struct page *alloc_huge_page(struct vm_area_struct *vma,
 				unsigned long addr, int avoid_reserve);
 struct page *alloc_huge_page_node(struct hstate *h, int nid);
-struct page *alloc_huge_page_noerr(struct vm_area_struct *vma,
-				unsigned long addr, int avoid_reserve);
 struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
 				nodemask_t *nmask);
+struct page *alloc_huge_page_vma(struct vm_area_struct *vma, unsigned long address);
 int huge_add_to_page_cache(struct page *page, struct address_space *mapping,
 			pgoff_t idx);

@@ -526,7 +525,7 @@ struct hstate {};
 #define alloc_huge_page(v, a, r) NULL
 #define alloc_huge_page_node(h, nid) NULL
 #define alloc_huge_page_nodemask(h, preferred_nid, nmask) NULL
-#define alloc_huge_page_noerr(v, a, r) NULL
+#define alloc_huge_page_vma(vma, address) NULL
 #define alloc_bootmem_huge_page(h) NULL
 #define hstate_file(f) NULL
 #define hstate_sizelog(s) NULL
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 4426c5b23a20..e00deabe6d17 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1672,6 +1672,25 @@ struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
 	return alloc_migrate_huge_page(h, gfp_mask, preferred_nid, nmask);
 }

+/* mempolicy aware migration callback */
+struct page *alloc_huge_page_vma(struct vm_area_struct *vma, unsigned long address)
+{
+	struct mempolicy *mpol;
+	nodemask_t *nodemask;
+	struct page *page;
+	struct hstate *h;
+	gfp_t gfp_mask;
+	int node;
+
+	h = hstate_vma(vma);
+	gfp_mask = htlb_alloc_mask(h);
+	node = huge_node(vma, address, gfp_mask, &mpol, &nodemask);
+	page = alloc_huge_page_nodemask(h, node, nodemask);
+	mpol_cond_put(mpol);
+
+	return page;
+}
+
 /*
  * Increase the hugetlb pool such that it can accommodate a reservation
  * of size 'delta'.
@@ -2077,20 +2096,6 @@ struct page *alloc_huge_page(struct vm_area_struct *vma,
 	return ERR_PTR(-ENOSPC);
 }

-/*
- * alloc_huge_page()'s wrapper which simply returns the page if allocation
- * succeeds, otherwise NULL. This function is called from new_vma_page(),
- * where no ERR_VALUE is expected to be returned.
- */
-struct page *alloc_huge_page_noerr(struct vm_area_struct *vma,
-				unsigned long addr, int avoid_reserve)
-{
-	struct page *page = alloc_huge_page(vma, addr, avoid_reserve);
-	if (IS_ERR(page))
-		page = NULL;
-	return page;
-}
-
 int alloc_bootmem_huge_page(struct hstate *h)
 	__attribute__ ((weak, alias("__alloc_bootmem_huge_page")));
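[Editor's note] The wrapper this patch deletes existed only to translate the kernel's ERR_PTR convention into NULL for the migration callback. A self-contained userspace model of that convention, offered as an illustration: `err_ptr`, `is_err` and `alloc_page_model` are invented stand-ins that mirror the spirit of the kernel's include/linux/err.h, not its exact code.

```c
#include <stddef.h>

#define MAX_ERRNO	4095
#define MODEL_ENOSPC	28

/*
 * Userspace model of the kernel ERR_PTR/IS_ERR encoding: negative
 * errno values are stored in the top MAX_ERRNO addresses, which no
 * valid pointer can occupy.
 */
static inline void *err_ptr(long error)
{
	return (void *)error;
}

static inline int is_err(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Invented stand-in for alloc_huge_page(): ERR_PTR on failure. */
void *alloc_page_model(int fail)
{
	static int dummy_page;
	return fail ? err_ptr(-MODEL_ENOSPC) : (void *)&dummy_page;
}

/*
 * What alloc_huge_page_noerr() did: collapse any ERR_PTR to NULL so
 * callers such as the mbind migration callback never saw an encoded
 * errno where they expected a page pointer (or NULL).
 */
void *alloc_page_noerr_model(int fail)
{
	void *page = alloc_page_model(fail);
	return is_err(page) ? NULL : page;
}
```

Because alloc_huge_page_nodemask() already returns NULL on failure rather than an ERR_PTR, the new alloc_huge_page_vma() needs no such translation layer.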
Re: [RFC PATCH 0/5] mm, hugetlb: allocation API and migration improvements
On 12/20/2017 11:28 PM, Michal Hocko wrote:
> On Wed 20-12-17 14:43:03, Mike Kravetz wrote:
>> On 12/20/2017 01:53 AM, Michal Hocko wrote:
>>> On Wed 20-12-17 05:33:36, Naoya Horiguchi wrote:
>>>> I have one comment on the code path from mbind(2).
>>>> The callback passed to migrate_pages() in do_mbind() (i.e. new_page())
>>>> calls alloc_huge_page_noerr() which currently doesn't call
>>>> SetPageHugeTemporary(),
>>>> so hugetlb migration fails when h->surplus_huge_pages >=
>>>> h->nr_overcommit_huge_pages.
>>>
>>> Yes, I am aware of that. I should have been more explicit in the
>>> changelog. Sorry about that and thanks for pointing it out explicitly.
>>> To be honest I wasn't really sure what to do about this. The code path
>>> is really complex and it made my head spin. I fail to see why we have to
>>> call alloc_huge_page and mess with reservations at all.
>>
>> Oops! I missed that in my review.
>>
>> Since alloc_huge_page was called with avoid_reserve == 1, it should not
>> do anything with reserve counts. One potential issue with the existing
>> code is cgroup accounting done by alloc_huge_page. When the new target
>> page is allocated, it is charged against the cgroup even though the
>> original page is still accounted for. If we are 'at the cgroup limit',
>> the migration may fail because of this.
>
> Yeah, the existing code seems just broken. I strongly suspect that the
> allocation API for hugetlb was so complicated that this was just a
> natural result of a confusion with some follow up changes on top.
>
>> I like your new code below as it explicitly takes reserve and cgroup
>> accounting out of the picture for migration. Let me think about it
>> for another day before providing a Reviewed-by.
>
> Thanks a lot!

You can add,

Reviewed-by: Mike Kravetz

I had some concerns about transferring huge page state during migration
not specific to this patch, so I did a bunch of testing.

-- 
Mike Kravetz
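[Editor's note] Mike's cgroup observation can be made concrete with a toy model (invented names throughout; the kernel's hugetlb_cgroup charging is far more involved): while the original page still holds its charge, charging a second page for the migration target fails once the cgroup is at its limit, even though migration would not raise usage once the old page is freed.

```c
#include <stdbool.h>

/* Toy hugetlb cgroup: pages currently charged vs. allowed limit. */
struct hugetlb_cgroup_model {
	long charged;
	long limit;
};

bool charge_page(struct hugetlb_cgroup_model *cg)
{
	if (cg->charged >= cg->limit)
		return false;		/* at the limit: charge refused */
	cg->charged++;
	return true;
}

/*
 * Old mbind path in miniature: the migration target is charged while
 * the original page is still accounted, so a cgroup sitting exactly at
 * its limit refuses the charge and the migration fails spuriously.
 */
bool migrate_with_charge(struct hugetlb_cgroup_model *cg)
{
	if (!charge_page(cg))		/* charge the new page first ... */
		return false;
	cg->charged--;			/* ... old page uncharged only later */
	return true;
}
```

Taking the charge out of the migration path, as alloc_huge_page_vma() does, removes this transient double-accounting window entirely.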
Re: [RFC PATCH 0/5] mm, hugetlb: allocation API and migration improvements
On Wed 20-12-17 14:43:03, Mike Kravetz wrote:
> On 12/20/2017 01:53 AM, Michal Hocko wrote:
> > On Wed 20-12-17 05:33:36, Naoya Horiguchi wrote:
> >> I have one comment on the code path from mbind(2).
> >> The callback passed to migrate_pages() in do_mbind() (i.e. new_page())
> >> calls alloc_huge_page_noerr() which currently doesn't call
> >> SetPageHugeTemporary(),
> >> so hugetlb migration fails when h->surplus_huge_pages >=
> >> h->nr_overcommit_huge_pages.
> >
> > Yes, I am aware of that. I should have been more explicit in the
> > changelog. Sorry about that and thanks for pointing it out explicitly.
> > To be honest I wasn't really sure what to do about this. The code path
> > is really complex and it made my head spin. I fail to see why we have to
> > call alloc_huge_page and mess with reservations at all.
>
> Oops! I missed that in my review.
>
> Since alloc_huge_page was called with avoid_reserve == 1, it should not
> do anything with reserve counts. One potential issue with the existing
> code is cgroup accounting done by alloc_huge_page. When the new target
> page is allocated, it is charged against the cgroup even though the
> original page is still accounted for. If we are 'at the cgroup limit',
> the migration may fail because of this.

Yeah, the existing code seems just broken. I strongly suspect that the
allocation API for hugetlb was so complicated that this was just a
natural result of a confusion with some follow up changes on top.

> I like your new code below as it explicitly takes reserve and cgroup
> accounting out of the picture for migration. Let me think about it
> for another day before providing a Reviewed-by.

Thanks a lot!
-- 
Michal Hocko
SUSE Labs
Re: [RFC PATCH 0/5] mm, hugetlb: allocation API and migration improvements
On 12/20/2017 01:53 AM, Michal Hocko wrote:
> On Wed 20-12-17 05:33:36, Naoya Horiguchi wrote:
>> I have one comment on the code path from mbind(2).
>> The callback passed to migrate_pages() in do_mbind() (i.e. new_page())
>> calls alloc_huge_page_noerr() which currently doesn't call
>> SetPageHugeTemporary(),
>> so hugetlb migration fails when h->surplus_huge_pages >=
>> h->nr_overcommit_huge_pages.
>
> Yes, I am aware of that. I should have been more explicit in the
> changelog. Sorry about that and thanks for pointing it out explicitly.
> To be honest I wasn't really sure what to do about this. The code path
> is really complex and it made my head spin. I fail to see why we have to
> call alloc_huge_page and mess with reservations at all.

Oops! I missed that in my review.

Since alloc_huge_page was called with avoid_reserve == 1, it should not
do anything with reserve counts. One potential issue with the existing
code is cgroup accounting done by alloc_huge_page. When the new target
page is allocated, it is charged against the cgroup even though the
original page is still accounted for. If we are 'at the cgroup limit',
the migration may fail because of this.

I like your new code below as it explicitly takes reserve and cgroup
accounting out of the picture for migration. Let me think about it
for another day before providing a Reviewed-by.

-- 
Mike Kravetz

>> I don't think this is a bug, but it would be better if mbind(2) worked
>> more similarly to other migration callers like
>> move_pages(2)/migrate_pages(2).
>
> If the fix is as easy as the following I will add it to the pile.
> Otherwise I would prefer to do this separately after I find some more
> time to understand the callpath.
> ---
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index e035002d3fb6..08a4af411e25 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -345,10 +345,9 @@ struct huge_bootmem_page {
>  struct page *alloc_huge_page(struct vm_area_struct *vma,
>  				unsigned long addr, int avoid_reserve);
>  struct page *alloc_huge_page_node(struct hstate *h, int nid);
> -struct page *alloc_huge_page_noerr(struct vm_area_struct *vma,
> -				unsigned long addr, int avoid_reserve);
>  struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
>  				nodemask_t *nmask);
> +struct page *alloc_huge_page_vma(struct vm_area_struct *vma, unsigned long address);
>  int huge_add_to_page_cache(struct page *page, struct address_space *mapping,
>  			pgoff_t idx);
> 
> @@ -526,7 +525,7 @@ struct hstate {};
>  #define alloc_huge_page(v, a, r) NULL
>  #define alloc_huge_page_node(h, nid) NULL
>  #define alloc_huge_page_nodemask(h, preferred_nid, nmask) NULL
> -#define alloc_huge_page_noerr(v, a, r) NULL
> +#define alloc_huge_page_vma(vma, address) NULL
>  #define alloc_bootmem_huge_page(h) NULL
>  #define hstate_file(f) NULL
>  #define hstate_sizelog(s) NULL
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 4426c5b23a20..e00deabe6d17 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1672,6 +1672,25 @@ struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
>  	return alloc_migrate_huge_page(h, gfp_mask, preferred_nid, nmask);
>  }
> 
> +/* mempolicy aware migration callback */
> +struct page *alloc_huge_page_vma(struct vm_area_struct *vma, unsigned long address)
> +{
> +	struct mempolicy *mpol;
> +	nodemask_t *nodemask;
> +	struct page *page;
> +	struct hstate *h;
> +	gfp_t gfp_mask;
> +	int node;
> +
> +	h = hstate_vma(vma);
> +	gfp_mask = htlb_alloc_mask(h);
> +	node = huge_node(vma, address, gfp_mask, &mpol, &nodemask);
> +	page = alloc_huge_page_nodemask(h, node, nodemask);
> +	mpol_cond_put(mpol);
> +
> +	return page;
> +}
> +
>  /*
>   * Increase the hugetlb pool such that it can accommodate a reservation
>   * of size 'delta'.
> @@ -2077,20 +2096,6 @@ struct page *alloc_huge_page(struct vm_area_struct *vma,
>  	return ERR_PTR(-ENOSPC);
>  }
> 
> -/*
> - * alloc_huge_page()'s wrapper which simply returns the page if allocation
> - * succeeds, otherwise NULL. This function is called from new_vma_page(),
> - * where no ERR_VALUE is expected to be returned.
> - */
> -struct page *alloc_huge_page_noerr(struct vm_area_struct *vma,
> -				unsigned long addr, int avoid_reserve)
> -{
> -	struct page *page = alloc_huge_page(vma, addr, avoid_reserve);
> -	if (IS_ERR(page))
> -		page = NULL;
> -	return page;
> -}
> -
>  int alloc_bootmem_huge_page(struct hstate *h)
>  	__attribute__ ((weak, alias("__alloc_bootmem_huge_page")));
>  int __alloc_bootmem_huge_page(struct hstate *h)
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index f604b22ebb65..96823fa07f38 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -1121,8 +1121,7 @@ static struct page
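[Editor's note] The mempolicy handling that alloc_huge_page_vma() adds amounts to: resolve the VMA's policy to a preferred node plus an optional nodemask via huge_node(), allocate against that pair, then drop the policy reference with mpol_cond_put(). The node-selection half can be sketched as a userspace bitmask model; everything here (`nodemask_model_t`, `pick_node`, the wrap-around fallback order) is an invented simplification of what the page allocator actually does with (preferred_nid, nodemask).

```c
/* Toy nodemask: one bit per NUMA node. */
typedef unsigned long nodemask_model_t;

#define NODE_BIT(n)	(1UL << (n))

/*
 * Pick the first allowed node at or after the preferred one, wrapping
 * around -- a crude stand-in for the fallback order applied to a
 * (preferred_nid, nodemask) pair.
 */
int pick_node(int preferred_nid, nodemask_model_t allowed, int nr_nodes)
{
	for (int i = 0; i < nr_nodes; i++) {
		int nid = (preferred_nid + i) % nr_nodes;

		if (allowed & NODE_BIT(nid))
			return nid;
	}
	return -1;	/* no allowed node */
}
```

The point of the patch is that this policy-driven node choice, not reservation or cgroup bookkeeping, is the only thing the mbind migration callback actually needs from the allocator.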
Re: [RFC PATCH 0/5] mm, hugetlb: allocation API and migration improvements
On 12/20/2017 01:53 AM, Michal Hocko wrote: > On Wed 20-12-17 05:33:36, Naoya Horiguchi wrote: >> I have one comment on the code path from mbind(2). >> The callback passed to migrate_pages() in do_mbind() (i.e. new_page()) >> calls alloc_huge_page_noerr() which currently doesn't call >> SetPageHugeTemporary(), >> so hugetlb migration fails when h->surplus_huge_page >= >> h->nr_overcommit_huge_pages. > > Yes, I am aware of that. I should have been more explicit in the > changelog. Sorry about that and thanks for pointing it out explicitly. > To be honest I wasn't really sure what to do about this. The code path > is really complex and it made my head spin. I fail to see why we have to > call alloc_huge_page and mess with reservations at all. Oops! I missed that in my review. Since alloc_huge_page was called with avoid_reserve == 1, it should not do anything with reserve counts. One potential issue with the existing code is cgroup accounting done by alloc_huge_page. When the new target page is allocated, it is charged against the cgroup even though the original page is still accounted for. If we are 'at the cgroup limit', the migration may fail because of this. I like your new code below as it explicitly takes reserve and cgroup accounting out of the picture for migration. Let me think about it for another day before providing a Reviewed-by. -- Mike Kravetz >> I don't think this is a bug, but it would be better if mbind(2) works >> more similarly with other migration callers like >> move_pages(2)/migrate_pages(2). > > If the fix is as easy as the following I will add it to the pile. > Otherwise I would prefer to do this separately after I find some more > time to understand the callpath. 
> ---
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index e035002d3fb6..08a4af411e25 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -345,10 +345,9 @@ struct huge_bootmem_page {
>  struct page *alloc_huge_page(struct vm_area_struct *vma,
>  		unsigned long addr, int avoid_reserve);
>  struct page *alloc_huge_page_node(struct hstate *h, int nid);
> -struct page *alloc_huge_page_noerr(struct vm_area_struct *vma,
> -		unsigned long addr, int avoid_reserve);
>  struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
>  		nodemask_t *nmask);
> +struct page *alloc_huge_page_vma(struct vm_area_struct *vma, unsigned long address);
>  int huge_add_to_page_cache(struct page *page, struct address_space *mapping,
>  		pgoff_t idx);
> 
> @@ -526,7 +525,7 @@ struct hstate {};
>  #define alloc_huge_page(v, a, r) NULL
>  #define alloc_huge_page_node(h, nid) NULL
>  #define alloc_huge_page_nodemask(h, preferred_nid, nmask) NULL
> -#define alloc_huge_page_noerr(v, a, r) NULL
> +#define alloc_huge_page_vma(vma, address) NULL
>  #define alloc_bootmem_huge_page(h) NULL
>  #define hstate_file(f) NULL
>  #define hstate_sizelog(s) NULL
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 4426c5b23a20..e00deabe6d17 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1672,6 +1672,25 @@ struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
>  	return alloc_migrate_huge_page(h, gfp_mask, preferred_nid, nmask);
>  }
> 
> +/* mempolicy aware migration callback */
> +struct page *alloc_huge_page_vma(struct vm_area_struct *vma, unsigned long address)
> +{
> +	struct mempolicy *mpol;
> +	nodemask_t *nodemask;
> +	struct page *page;
> +	struct hstate *h;
> +	gfp_t gfp_mask;
> +	int node;
> +
> +	h = hstate_vma(vma);
> +	gfp_mask = htlb_alloc_mask(h);
> +	node = huge_node(vma, address, gfp_mask, &mpol, &nodemask);
> +	page = alloc_huge_page_nodemask(h, node, nodemask);
> +	mpol_cond_put(mpol);
> +
> +	return page;
> +}
> +
>  /*
>   * Increase the hugetlb pool such that it can accommodate a reservation
>   * of size 'delta'.
>   */
> @@ -2077,20 +2096,6 @@ struct page *alloc_huge_page(struct vm_area_struct *vma,
>  	return ERR_PTR(-ENOSPC);
>  }
> 
> -/*
> - * alloc_huge_page()'s wrapper which simply returns the page if allocation
> - * succeeds, otherwise NULL. This function is called from new_vma_page(),
> - * where no ERR_VALUE is expected to be returned.
> - */
> -struct page *alloc_huge_page_noerr(struct vm_area_struct *vma,
> -		unsigned long addr, int avoid_reserve)
> -{
> -	struct page *page = alloc_huge_page(vma, addr, avoid_reserve);
> -	if (IS_ERR(page))
> -		page = NULL;
> -	return page;
> -}
> -
>  int alloc_bootmem_huge_page(struct hstate *h)
>  	__attribute__ ((weak, alias("__alloc_bootmem_huge_page")));
>  int __alloc_bootmem_huge_page(struct hstate *h)
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index f604b22ebb65..96823fa07f38 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -1121,8 +1121,7 @@ static struct page *new_page(struct page *page, unsigned long start, int **x)
>  	}
> 
>  	if (PageHuge(page)) {
> -		BUG_ON(!vma);
> -		return alloc_huge_page_noerr(vma, address, 1);
> +		return alloc_huge_page_vma(vma, address);
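The cgroup-accounting hazard Mike describes can be sketched in a userspace toy model (this is not kernel code; the struct and function names below are purely illustrative): charging the destination page while the source page is still accounted fails for a group already at its limit, whereas a migration path that bypasses the charge, as the proposed alloc_huge_page_vma() does, cannot fail this way.

```c
#include <assert.h>

/* Toy model of a hugetlb cgroup counter; names are illustrative only. */
struct hugetlb_cgroup_model {
	long usage;	/* pages currently charged */
	long limit;	/* maximum allowed charge */
};

/* Charge 'pages' against the group, failing once the limit is reached. */
static int charge(struct hugetlb_cgroup_model *cg, long pages)
{
	if (cg->usage + pages > cg->limit)
		return -1;	/* over limit: allocation would fail */
	cg->usage += pages;
	return 0;
}

static void uncharge(struct hugetlb_cgroup_model *cg, long pages)
{
	cg->usage -= pages;
}

/*
 * Migration modeled on the old path: the new target page is charged
 * while the original page is still accounted for, so a group sitting
 * at its limit cannot migrate at all.
 */
static int migrate_charging_target(struct hugetlb_cgroup_model *cg)
{
	if (charge(cg, 1))	/* destination page charged up front */
		return -1;
	/* ...copy contents, retire the source page... */
	uncharge(cg, 1);	/* source page released after the copy */
	return 0;
}

/* Migration modeled on the new path: no cgroup charge for the target. */
static int migrate_without_charge(struct hugetlb_cgroup_model *cg)
{
	(void)cg;		/* target page is not charged at all */
	return 0;
}
```

With usage == limit, the charging variant fails while the uncharged one succeeds, which is exactly the 'at the cgroup limit' scenario above.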
Re: [RFC PATCH 0/5] mm, hugetlb: allocation API and migration improvements
On Wed 20-12-17 05:33:36, Naoya Horiguchi wrote:
> 
> On 12/15/2017 06:33 PM, Michal Hocko wrote:
> > Naoya,
> > this has passed Mike's review (thanks for that!), you have mentioned
> > that you can pass this through your testing machinery earlier. While
> > I've done some testing already I would really appreciate if you could
> > do that as well. Review would be highly appreciated as well.
> 
> Sorry for my slow response. I reviewed/tested this patchset and looks
> good to me overall.

No need to feel sorry. This doesn't have an urgent priority. Thanks for
the review and testing. Can I assume your {Reviewed,Acked}-by or
Tested-by?

> I have one comment on the code path from mbind(2).
> The callback passed to migrate_pages() in do_mbind() (i.e. new_page())
> calls alloc_huge_page_noerr() which currently doesn't call
> SetPageHugeTemporary(), so hugetlb migration fails when
> h->surplus_huge_pages >= h->nr_overcommit_huge_pages.

Yes, I am aware of that. I should have been more explicit in the
changelog. Sorry about that and thanks for pointing it out explicitly.
To be honest I wasn't really sure what to do about this. The code path
is really complex and it made my head spin. I fail to see why we have to
call alloc_huge_page and mess with reservations at all.

> I don't think this is a bug, but it would be better if mbind(2) works
> more similarly with other migration callers like
> move_pages(2)/migrate_pages(2).

If the fix is as easy as the following I will add it to the pile.
Otherwise I would prefer to do this separately after I find some more
time to understand the callpath.
---
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index e035002d3fb6..08a4af411e25 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -345,10 +345,9 @@ struct huge_bootmem_page {
 struct page *alloc_huge_page(struct vm_area_struct *vma,
 		unsigned long addr, int avoid_reserve);
 struct page *alloc_huge_page_node(struct hstate *h, int nid);
-struct page *alloc_huge_page_noerr(struct vm_area_struct *vma,
-		unsigned long addr, int avoid_reserve);
 struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
 		nodemask_t *nmask);
+struct page *alloc_huge_page_vma(struct vm_area_struct *vma, unsigned long address);
 int huge_add_to_page_cache(struct page *page, struct address_space *mapping,
 		pgoff_t idx);
 
@@ -526,7 +525,7 @@ struct hstate {};
 #define alloc_huge_page(v, a, r) NULL
 #define alloc_huge_page_node(h, nid) NULL
 #define alloc_huge_page_nodemask(h, preferred_nid, nmask) NULL
-#define alloc_huge_page_noerr(v, a, r) NULL
+#define alloc_huge_page_vma(vma, address) NULL
 #define alloc_bootmem_huge_page(h) NULL
 #define hstate_file(f) NULL
 #define hstate_sizelog(s) NULL
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 4426c5b23a20..e00deabe6d17 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1672,6 +1672,25 @@ struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
 	return alloc_migrate_huge_page(h, gfp_mask, preferred_nid, nmask);
 }
 
+/* mempolicy aware migration callback */
+struct page *alloc_huge_page_vma(struct vm_area_struct *vma, unsigned long address)
+{
+	struct mempolicy *mpol;
+	nodemask_t *nodemask;
+	struct page *page;
+	struct hstate *h;
+	gfp_t gfp_mask;
+	int node;
+
+	h = hstate_vma(vma);
+	gfp_mask = htlb_alloc_mask(h);
+	node = huge_node(vma, address, gfp_mask, &mpol, &nodemask);
+	page = alloc_huge_page_nodemask(h, node, nodemask);
+	mpol_cond_put(mpol);
+
+	return page;
+}
+
 /*
  * Increase the hugetlb pool such that it can accommodate a reservation
  * of size 'delta'.
  */
@@ -2077,20 +2096,6 @@ struct page *alloc_huge_page(struct vm_area_struct *vma,
 	return ERR_PTR(-ENOSPC);
 }
 
-/*
- * alloc_huge_page()'s wrapper which simply returns the page if allocation
- * succeeds, otherwise NULL. This function is called from new_vma_page(),
- * where no ERR_VALUE is expected to be returned.
- */
-struct page *alloc_huge_page_noerr(struct vm_area_struct *vma,
-		unsigned long addr, int avoid_reserve)
-{
-	struct page *page = alloc_huge_page(vma, addr, avoid_reserve);
-	if (IS_ERR(page))
-		page = NULL;
-	return page;
-}
-
 int alloc_bootmem_huge_page(struct hstate *h)
 	__attribute__ ((weak, alias("__alloc_bootmem_huge_page")));
 int __alloc_bootmem_huge_page(struct hstate *h)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index f604b22ebb65..96823fa07f38 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1121,8 +1121,7 @@ static struct page *new_page(struct page *page, unsigned long start, int **x)
 	}
 
 	if (PageHuge(page)) {
-		BUG_ON(!vma);
-		return alloc_huge_page_noerr(vma, address, 1);
+		return alloc_huge_page_vma(vma, address);
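For reference, the userspace trigger for this whole path looks roughly like the sketch below: map a hugetlb region, fault it in, then mbind() it with MPOL_MF_MOVE, which sends already-faulted pages through do_mbind() -> migrate_pages() -> new_page(). This is a hedged sketch, not a test: it calls the raw syscall to avoid a libnuma dependency, defines the needed MPOL_* constants locally, assumes a 2MB default hugepage size, and quietly skips when no hugetlb pool or NUMA support is configured.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Values from the uapi mempolicy.h, defined here to avoid libnuma. */
#define MPOL_BIND	2
#define MPOL_MF_MOVE	(1 << 1)

/*
 * Exercise the mbind() hugetlb path against node 0 if the environment
 * allows it. Returns 0 both when the call was made and when the test
 * had to be skipped (no hugetlb pool, no NUMA support).
 */
static int try_hugetlb_mbind(void)
{
	size_t len = 2 * 1024 * 1024;	/* assumes 2MB default hugepages */
	unsigned long nodemask = 1;	/* bit 0: node 0 only */
	void *addr;
	long ret;

	addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	if (addr == MAP_FAILED) {
		fprintf(stderr, "no hugetlb pages available, skipping\n");
		return 0;
	}
	memset(addr, 0, len);		/* fault the huge page in */

	/* do_mbind() -> migrate_pages() -> new_page() for the faulted page */
	ret = syscall(SYS_mbind, addr, len, MPOL_BIND, &nodemask,
		      sizeof(nodemask) * 8, MPOL_MF_MOVE);
	if (ret == 0)
		printf("mbind(MPOL_BIND node 0, MPOL_MF_MOVE) succeeded\n");
	else
		fprintf(stderr, "mbind failed: %s (skipping)\n",
			strerror(errno));
	munmap(addr, len);
	return 0;
}
```

On a kernel without the fix above, running this against an exhausted static pool (with nr_overcommit_huge_pages at 0) is where Naoya observed the migration failure.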
Re: [RFC PATCH 0/5] mm, hugetlb: allocation API and migration improvements
On 12/15/2017 06:33 PM, Michal Hocko wrote:
> Naoya,
> this has passed Mike's review (thanks for that!), you have mentioned
> that you can pass this through your testing machinery earlier. While
> I've done some testing already I would really appreciate if you could
> do that as well. Review would be highly appreciated as well.

Sorry for my slow response. I reviewed/tested this patchset and looks
good to me overall.

I have one comment on the code path from mbind(2).
The callback passed to migrate_pages() in do_mbind() (i.e. new_page())
calls alloc_huge_page_noerr() which currently doesn't call
SetPageHugeTemporary(), so hugetlb migration fails when
h->surplus_huge_pages >= h->nr_overcommit_huge_pages.

I don't think this is a bug, but it would be better if mbind(2) works
more similarly with other migration callers like
move_pages(2)/migrate_pages(2).

Thanks,
Naoya Horiguchi

> 
> Thanks!
> 
> On Mon 04-12-17 15:01:12, Michal Hocko wrote:
>> Hi,
>> this is a follow up for [1] for the allocation API and [2] for the
>> hugetlb migration. It wasn't really easy to split those into two
>> separate patch series as they share some code.
>>
>> My primary motivation to touch this code is to make the gigantic pages
>> migration working. The giga pages allocation code is just too fragile
>> and hacked into the hugetlb code now. This series tries to move giga
>> pages closer to the first class citizen. We are not there yet but having
>> 5 patches is quite a lot already and it will already make the code much
>> easier to follow. I will come with other changes on top after this sees
>> some review.
>>
>> The first two patches should be trivial to review. The third patch
>> changes the way how we migrate huge pages. Newly allocated pages are a
>> subject of the overcommit check and they participate surplus accounting
>> which is quite unfortunate as the changelog explains. This patch doesn't
>> change anything wrt. giga pages.
>> Patch #4 removes the surplus accounting hack from
>> __alloc_surplus_huge_page. I hope I didn't miss anything there and a
>> deeper review is really due there.
>> Patch #5 finally unifies allocation paths and giga pages shouldn't be
>> any special anymore. There is also some renaming going on as well.
>>
>> Shortlog
>> Michal Hocko (5):
>>   mm, hugetlb: unify core page allocation accounting and initialization
>>   mm, hugetlb: integrate giga hugetlb more naturally to the allocation
>>     path
>>   mm, hugetlb: do not rely on overcommit limit during migration
>>   mm, hugetlb: get rid of surplus page accounting tricks
>>   mm, hugetlb: further simplify hugetlb allocation API
>>
>> Diffstat:
>>  include/linux/hugetlb.h |   3 +
>>  mm/hugetlb.c            | 305 +++-
>>  mm/migrate.c            |   3 +-
>>  3 files changed, 175 insertions(+), 136 deletions(-)
>>
>> [1] http://lkml.kernel.org/r/20170622193034.28972-1-mho...@kernel.org
>> [2] http://lkml.kernel.org/r/20171122152832.iayefrlxbugph...@dhcp22.suse.cz
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majord...@kvack.org. For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: em...@kvack.org
> 
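Naoya's failure condition can be illustrated with a small userspace model of the overcommit check (not kernel code; struct and function names below are illustrative): a migration target allocated via the surplus path is refused once the surplus count reaches the overcommit limit, while a page marked temporary, as the other migration callers now get, is not counted against that limit at all.

```c
#include <assert.h>

/* Toy model of per-hstate counters; field names mirror struct hstate. */
struct hstate_model {
	long surplus_huge_pages;
	long nr_overcommit_huge_pages;
};

/* Surplus allocation path: subject to the overcommit limit. */
static int alloc_surplus_model(struct hstate_model *h)
{
	if (h->surplus_huge_pages >= h->nr_overcommit_huge_pages)
		return -1;		/* migration target refused */
	h->surplus_huge_pages++;
	return 0;
}

/*
 * Temporary-page path used by the migration callers: the fresh page is
 * never counted as surplus, so the overcommit limit does not apply.
 */
static int alloc_migrate_temporary_model(struct hstate_model *h)
{
	(void)h;			/* no surplus accounting at all */
	return 0;
}
```

With surplus_huge_pages already at nr_overcommit_huge_pages, the surplus path fails while the temporary path succeeds, which is why routing mbind(2) through the temporary-page allocator makes it behave like move_pages(2)/migrate_pages(2).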
Re: [RFC PATCH 0/5] mm, hugetlb: allocation API and migration improvements
Naoya,
this has passed Mike's review (thanks for that!), you have mentioned
that you can pass this through your testing machinery earlier. While
I've done some testing already I would really appreciate if you could
do that as well. Review would be highly appreciated as well.

Thanks!

On Mon 04-12-17 15:01:12, Michal Hocko wrote:
> Hi,
> this is a follow up for [1] for the allocation API and [2] for the
> hugetlb migration. It wasn't really easy to split those into two
> separate patch series as they share some code.
>
> My primary motivation to touch this code is to make the gigantic pages
> migration working. The giga pages allocation code is just too fragile
> and hacked into the hugetlb code now. This series tries to move giga
> pages closer to the first class citizen. We are not there yet but having
> 5 patches is quite a lot already and it will already make the code much
> easier to follow. I will come with other changes on top after this sees
> some review.
>
> The first two patches should be trivial to review. The third patch
> changes the way how we migrate huge pages. Newly allocated pages are a
> subject of the overcommit check and they participate surplus accounting
> which is quite unfortunate as the changelog explains. This patch doesn't
> change anything wrt. giga pages.
> Patch #4 removes the surplus accounting hack from
> __alloc_surplus_huge_page. I hope I didn't miss anything there and a
> deeper review is really due there.
> Patch #5 finally unifies allocation paths and giga pages shouldn't be
> any special anymore. There is also some renaming going on as well.
>
> Shortlog
> Michal Hocko (5):
>   mm, hugetlb: unify core page allocation accounting and initialization
>   mm, hugetlb: integrate giga hugetlb more naturally to the allocation
>     path
>   mm, hugetlb: do not rely on overcommit limit during migration
>   mm, hugetlb: get rid of surplus page accounting tricks
>   mm, hugetlb: further simplify hugetlb allocation API
>
> Diffstat:
>  include/linux/hugetlb.h |   3 +
>  mm/hugetlb.c            | 305 +++-
>  mm/migrate.c            |   3 +-
>  3 files changed, 175 insertions(+), 136 deletions(-)
>
> [1] http://lkml.kernel.org/r/20170622193034.28972-1-mho...@kernel.org
> [2] http://lkml.kernel.org/r/20171122152832.iayefrlxbugph...@dhcp22.suse.cz
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majord...@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: em...@kvack.org

--
Michal Hocko
SUSE Labs
[RFC PATCH 0/5] mm, hugetlb: allocation API and migration improvements
Hi,
this is a follow up for [1] for the allocation API and [2] for the
hugetlb migration. It wasn't really easy to split those into two
separate patch series as they share some code.

My primary motivation to touch this code is to make the gigantic pages
migration working. The giga pages allocation code is just too fragile
and hacked into the hugetlb code now. This series tries to move giga
pages closer to the first class citizen. We are not there yet but having
5 patches is quite a lot already and it will already make the code much
easier to follow. I will come with other changes on top after this sees
some review.

The first two patches should be trivial to review. The third patch
changes the way how we migrate huge pages. Newly allocated pages are a
subject of the overcommit check and they participate surplus accounting
which is quite unfortunate as the changelog explains. This patch doesn't
change anything wrt. giga pages.
Patch #4 removes the surplus accounting hack from
__alloc_surplus_huge_page. I hope I didn't miss anything there and a
deeper review is really due there.
Patch #5 finally unifies allocation paths and giga pages shouldn't be
any special anymore. There is also some renaming going on as well.

Shortlog
Michal Hocko (5):
  mm, hugetlb: unify core page allocation accounting and initialization
  mm, hugetlb: integrate giga hugetlb more naturally to the allocation
    path
  mm, hugetlb: do not rely on overcommit limit during migration
  mm, hugetlb: get rid of surplus page accounting tricks
  mm, hugetlb: further simplify hugetlb allocation API

Diffstat:
 include/linux/hugetlb.h |   3 +
 mm/hugetlb.c            | 305 +++-
 mm/migrate.c            |   3 +-
 3 files changed, 175 insertions(+), 136 deletions(-)

[1] http://lkml.kernel.org/r/20170622193034.28972-1-mho...@kernel.org
[2] http://lkml.kernel.org/r/20171122152832.iayefrlxbugph...@dhcp22.suse.cz