Re: [PATCH v3 4/8] mm/hugetlb: make hugetlb migration callback CMA aware

2020-06-30 Thread Mike Kravetz
On 6/30/20 12:22 AM, Joonsoo Kim wrote:
> On Tue, Jun 30, 2020 at 3:42 PM Michal Hocko wrote:
>>
>> On Tue 30-06-20 15:30:04, Joonsoo Kim wrote:
>>> On Mon, Jun 29, 2020 at 4:55 PM Michal Hocko wrote:
>> [...]
>>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>>> index 57ece74e3aae..c1595b1d36f3 100644
>>>> --- a/mm/hugetlb.c
>>>> +++ b/mm/hugetlb.c
>>>> @@ -1092,10 +1092,14 @@ static struct page *dequeue_huge_page_nodemask(struct hstate *h, gfp_t gfp_mask,
>>>>  /* Movability of hugepages depends on migration support. */
>>>>  static inline gfp_t htlb_alloc_mask(struct hstate *h)
>>>>  {
>>>> +   gfp_t gfp;
>>>> +
>>>>    if (hugepage_movable_supported(h))
>>>> -   return GFP_HIGHUSER_MOVABLE;
>>>> +   gfp = GFP_HIGHUSER_MOVABLE;
>>>>    else
>>>> -   return GFP_HIGHUSER;
>>>> +   gfp = GFP_HIGHUSER;
>>>> +
>>>> +   return current_gfp_context(gfp);
>>>>  }
>>>>
>>>>  static struct page *dequeue_huge_page_vma(struct hstate *h,
>>>>
>>>> If we eventually fix this general issue for other allocations and allow
>>>> better CMA exclusion, then it will be implemented consistently for
>>>> everybody.
>>>
>>> Yes, I have reviewed the memalloc_nocma_{} APIs and found a better way
>>> for CMA exclusion. I will do it after this patch is finished.
>>>
>>>> Does this make more sense to you, or are we still not on the same
>>>> page wrt the actual problem?
>>>
>>> Yes, but we have different opinions about it. As said above, I will make
>>> a patch for better CMA exclusion after this patchset; it will make the
>>> code consistent. I'd really appreciate it if you could wait until then.
>>
>> As I've said, I would _prefer_ simplicity over "correctness" if the
>> latter is only partial and hard to reason about from a userspace
>> perspective, but this is not something I would _insist_ on. If Mike, as a
>> maintainer of the code, is OK with that, then I will not stand in the way.
> 
> Okay.

I was OK with Joonsoo's original patch, which is why I Ack'ed it.  However,
my sense of simplicity and style may not be the norm, as I have spent too
much time with the hugetlbfs code. :)  That is why I did not chime in and
let Michal and Joonsoo discuss.  I can see both sides of the issue.  For
now, I am OK to go with Joonsoo's patch as long as the issue below is
considered in the next patchset.
-- 
Mike Kravetz

>> But please note that a missing current_gfp_context inside
>> htlb_alloc_mask is a subtle bug. I do not think it matters right now,
>> but with the growing use of scoped APIs this might actually hit some
>> day, so I believe we want to have it in place.
> 
> Okay. I will keep it in mind and consider it when fixing CMA exclusion in
> the other patchset.
> 
> Thanks.


Re: [PATCH v3 4/8] mm/hugetlb: make hugetlb migration callback CMA aware

2020-06-30 Thread Joonsoo Kim
On Tue, Jun 30, 2020 at 3:42 PM Michal Hocko wrote:
>
> On Tue 30-06-20 15:30:04, Joonsoo Kim wrote:
> > On Mon, Jun 29, 2020 at 4:55 PM Michal Hocko wrote:
> [...]
> > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > > index 57ece74e3aae..c1595b1d36f3 100644
> > > --- a/mm/hugetlb.c
> > > +++ b/mm/hugetlb.c
> > > @@ -1092,10 +1092,14 @@ static struct page *dequeue_huge_page_nodemask(struct hstate *h, gfp_t gfp_mask,
> > >  /* Movability of hugepages depends on migration support. */
> > >  static inline gfp_t htlb_alloc_mask(struct hstate *h)
> > >  {
> > > +   gfp_t gfp;
> > > +
> > > if (hugepage_movable_supported(h))
> > > -   return GFP_HIGHUSER_MOVABLE;
> > > +   gfp = GFP_HIGHUSER_MOVABLE;
> > > else
> > > -   return GFP_HIGHUSER;
> > > +   gfp = GFP_HIGHUSER;
> > > +
> > > +   return current_gfp_context(gfp);
> > >  }
> > >
> > >  static struct page *dequeue_huge_page_vma(struct hstate *h,
> > >
> > > If we eventually fix this general issue for other allocations and allow
> > > better CMA exclusion, then it will be implemented consistently for
> > > everybody.
> >
> > Yes, I have reviewed the memalloc_nocma_{} APIs and found a better way
> > for CMA exclusion. I will do it after this patch is finished.
> >
> > > Does this make more sense to you, or are we still not on the same
> > > page wrt the actual problem?
> >
> > Yes, but we have different opinions about it. As said above, I will make
> > a patch for better CMA exclusion after this patchset; it will make the
> > code consistent. I'd really appreciate it if you could wait until then.
>
> As I've said, I would _prefer_ simplicity over "correctness" if the
> latter is only partial and hard to reason about from a userspace
> perspective, but this is not something I would _insist_ on. If Mike, as a
> maintainer of the code, is OK with that, then I will not stand in the way.

Okay.

> But please note that a missing current_gfp_context inside
> htlb_alloc_mask is a subtle bug. I do not think it matters right now,
> but with the growing use of scoped APIs this might actually hit some
> day, so I believe we want to have it in place.

Okay. I will keep it in mind and consider it when fixing CMA exclusion in
the other patchset.

Thanks.


Re: [PATCH v3 4/8] mm/hugetlb: make hugetlb migration callback CMA aware

2020-06-30 Thread Michal Hocko
On Tue 30-06-20 15:30:04, Joonsoo Kim wrote:
> On Mon, Jun 29, 2020 at 4:55 PM Michal Hocko wrote:
[...]
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 57ece74e3aae..c1595b1d36f3 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -1092,10 +1092,14 @@ static struct page *dequeue_huge_page_nodemask(struct hstate *h, gfp_t gfp_mask,
> >  /* Movability of hugepages depends on migration support. */
> >  static inline gfp_t htlb_alloc_mask(struct hstate *h)
> >  {
> > +   gfp_t gfp;
> > +
> > if (hugepage_movable_supported(h))
> > -   return GFP_HIGHUSER_MOVABLE;
> > +   gfp = GFP_HIGHUSER_MOVABLE;
> > else
> > -   return GFP_HIGHUSER;
> > +   gfp = GFP_HIGHUSER;
> > +
> > +   return current_gfp_context(gfp);
> >  }
> >
> >  static struct page *dequeue_huge_page_vma(struct hstate *h,
> >
> > If we eventually fix this general issue for other allocations and allow
> > better CMA exclusion, then it will be implemented consistently for
> > everybody.
> 
> Yes, I have reviewed the memalloc_nocma_{} APIs and found a better way
> for CMA exclusion. I will do it after this patch is finished.
> 
> > Does this make more sense to you, or are we still not on the same
> > page wrt the actual problem?
> 
> Yes, but we have different opinions about it. As said above, I will make
> a patch for better CMA exclusion after this patchset; it will make the
> code consistent. I'd really appreciate it if you could wait until then.

As I've said, I would _prefer_ simplicity over "correctness" if the
latter is only partial and hard to reason about from a userspace
perspective, but this is not something I would _insist_ on. If Mike, as a
maintainer of the code, is OK with that, then I will not stand in the way.

But please note that a missing current_gfp_context inside
htlb_alloc_mask is a subtle bug. I do not think it matters right now,
but with the growing use of scoped APIs this might actually hit some
day, so I believe we want to have it in place.
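
For reference, a trimmed sketch of what current_gfp_context() does with the
task's scoped-allocation flags (modeled on include/linux/sched/mm.h from the
v5.7-v5.9 era; the PF_MEMALLOC_NOCMA branch only arrived with the later
nocma fixup discussed in this thread, so take it as illustrative, not exact):

static inline gfp_t current_gfp_context(gfp_t flags)
{
        if (unlikely(current->flags &
                     (PF_MEMALLOC_NOIO | PF_MEMALLOC_NOFS | PF_MEMALLOC_NOCMA))) {
                /* NOIO implies NOFS as well, so check it first. */
                if (current->flags & PF_MEMALLOC_NOIO)
                        flags &= ~(__GFP_IO | __GFP_FS);
                else if (current->flags & PF_MEMALLOC_NOFS)
                        flags &= ~__GFP_FS;
                /* A nocma scope strips __GFP_MOVABLE, keeping CMA out. */
                if (current->flags & PF_MEMALLOC_NOCMA)
                        flags &= ~__GFP_MOVABLE;
        }
        return flags;
}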

Thanks!

-- 
Michal Hocko
SUSE Labs


Re: [PATCH v3 4/8] mm/hugetlb: make hugetlb migration callback CMA aware

2020-06-30 Thread Joonsoo Kim
On Mon, Jun 29, 2020 at 4:55 PM Michal Hocko wrote:
>
> On Mon 29-06-20 15:27:25, Joonsoo Kim wrote:
> [...]
> > A solution that introduces a new argument avoids CMA regions without
> > causing this problem.
>
> My primary argument is that there is no real reason to treat hugetlb
> dequeuing somehow differently. So if we simply exclude __GFP_MOVABLE for
> _any_ other allocation, then this certainly has some drawbacks on the
> memory usable for the migration target, and it can lead to allocation
> failures (especially on movable_node setups where the amount of movable
> memory might be really high) and therefore longterm gup failures. And yes,
> those failures might be premature. But my point is that the behavior would
> be _consistent_, so a user wouldn't see random failures for some types of
> pages and success for others.

Hmm... I don't agree with your argument. Excluding __GFP_MOVABLE is a
*workaround* to exclude CMA regions; the dequeue implementation in this
patch is the right way to exclude them. Why use a workaround for this
case? Being consistent is important, but it's only meaningful if the
behavior is correct, and it should not get in the way of better code.
Besides, dequeuing is already a special process that is only available for
hugetlb, so I think that using a different (correct) implementation there
doesn't break any consistency.

> Let's have a look at this patch. It simply works around the restriction
> for a very limited type of page: hugetlb pages which have reserves in
> non-CMA movable pools. I would claim that many setups will simply not have
> many (if any) spare hugetlb pages in the pool, except for temporary
> periods when a workload is (re)starting, because that would effectively be
> wasted memory.

This cannot be a blocker for making the code correct.

> The patch is adding a special-case flag to state what the
> memalloc_nocma_{save,restore} API already tells the code, so the
> information is already there. Sorry I didn't bring this up earlier, but I
> had completely forgotten about its existence. With that one in place I do
> agree that dequeuing needs a fixup, but that should be something like the
> following instead.

Thanks for letting me know; I didn't know about it until now. It looks
better to use this API rather than introduce a new argument.
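
For readers following along, the scoped API is used roughly like this (the
save/restore helpers are the real v5.7 interface; the allocation call in
between is a hypothetical example):

        unsigned int flags;

        flags = memalloc_nocma_save();  /* sets PF_MEMALLOC_NOCMA on current */
        /*
         * The intent is that allocations in this scope avoid CMA
         * pageblocks; wiring __GFP_MOVABLE filtering through
         * current_gfp_context() is the follow-up work discussed here.
         */
        page = alloc_huge_page_nodemask(h, nid, nmask);  /* hypothetical call site */
        memalloc_nocma_restore(flags);  /* ends the scope */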

> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 57ece74e3aae..c1595b1d36f3 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1092,10 +1092,14 @@ static struct page *dequeue_huge_page_nodemask(struct hstate *h, gfp_t gfp_mask,
>  /* Movability of hugepages depends on migration support. */
>  static inline gfp_t htlb_alloc_mask(struct hstate *h)
>  {
> +   gfp_t gfp;
> +
> if (hugepage_movable_supported(h))
> -   return GFP_HIGHUSER_MOVABLE;
> +   gfp = GFP_HIGHUSER_MOVABLE;
> else
> -   return GFP_HIGHUSER;
> +   gfp = GFP_HIGHUSER;
> +
> +   return current_gfp_context(gfp);
>  }
>
>  static struct page *dequeue_huge_page_vma(struct hstate *h,
>
> If we eventually fix this general issue for other allocations and allow
> better CMA exclusion, then it will be implemented consistently for
> everybody.

Yes, I have reviewed the memalloc_nocma_{} APIs and found a better way
for CMA exclusion. I will do it after this patch is finished.

> Does this make more sense to you, or are we still not on the same
> page wrt the actual problem?

Yes, but we have different opinions about it. As said above, I will make
a patch for better CMA exclusion after this patchset; it will make the
code consistent. I'd really appreciate it if you could wait until then.

Thanks.


Re: [PATCH v3 4/8] mm/hugetlb: make hugetlb migration callback CMA aware

2020-06-29 Thread Michal Hocko
On Mon 29-06-20 15:27:25, Joonsoo Kim wrote:
[...]
> A solution that introduces a new argument avoids CMA regions without
> causing this problem.

My primary argument is that there is no real reason to treat hugetlb
dequeuing somehow differently. So if we simply exclude __GFP_MOVABLE for
_any_ other allocation, then this certainly has some drawbacks on the
memory usable for the migration target, and it can lead to allocation
failures (especially on movable_node setups where the amount of movable
memory might be really high) and therefore longterm gup failures. And yes,
those failures might be premature. But my point is that the behavior would
be _consistent_, so a user wouldn't see random failures for some types of
pages and success for others.

Let's have a look at this patch. It simply works around the restriction
for a very limited type of page: hugetlb pages which have reserves in
non-CMA movable pools. I would claim that many setups will simply not have
many (if any) spare hugetlb pages in the pool, except for temporary
periods when a workload is (re)starting, because that would effectively be
wasted memory.

The patch is adding a special-case flag to state what the
memalloc_nocma_{save,restore} API already tells the code, so the
information is already there. Sorry I didn't bring this up earlier, but I
had completely forgotten about its existence. With that one in place I do
agree that dequeuing needs a fixup, but that should be something like the
following instead.

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 57ece74e3aae..c1595b1d36f3 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1092,10 +1092,14 @@ static struct page *dequeue_huge_page_nodemask(struct hstate *h, gfp_t gfp_mask,
 /* Movability of hugepages depends on migration support. */
 static inline gfp_t htlb_alloc_mask(struct hstate *h)
 {
+   gfp_t gfp;
+
if (hugepage_movable_supported(h))
-   return GFP_HIGHUSER_MOVABLE;
+   gfp = GFP_HIGHUSER_MOVABLE;
else
-   return GFP_HIGHUSER;
+   gfp = GFP_HIGHUSER;
+
+   return current_gfp_context(gfp);
 }
 
 static struct page *dequeue_huge_page_vma(struct hstate *h,

If we eventually fix this general issue for other allocations and allow
better CMA exclusion, then it will be implemented consistently for
everybody.

Does this make more sense to you, or are we still not on the same
page wrt the actual problem?
-- 
Michal Hocko
SUSE Labs


Re: [PATCH v3 4/8] mm/hugetlb: make hugetlb migration callback CMA aware

2020-06-29 Thread Joonsoo Kim
On Fri, Jun 26, 2020 at 4:23 PM Michal Hocko wrote:
>
> On Fri 26-06-20 13:49:15, Joonsoo Kim wrote:
> > On Thu, Jun 25, 2020 at 8:54 PM Michal Hocko wrote:
> > >
> > > On Tue 23-06-20 15:13:44, Joonsoo Kim wrote:
> > > > From: Joonsoo Kim 
> > > >
> > > > new_non_cma_page() in gup.c, which tries to allocate the migration
> > > > target page, requires the new page to be allocated outside of the CMA
> > > > area. new_non_cma_page() implements this by removing the
> > > > __GFP_MOVABLE flag. This works well for THP and normal pages but not
> > > > for hugetlb pages.
> > >
> > > Could you explain why? I mean, why can't you simply remove the
> > > __GFP_MOVABLE flag when calling alloc_huge_page_nodemask and check for
> > > it in the dequeue path?
> >
> > If we remove __GFP_MOVABLE when calling alloc_huge_page_nodemask, we
> > cannot use pages in ZONE_MOVABLE when dequeuing.
> >
> > __GFP_MOVABLE is not only a CMA selector but also a zone selector. If we
> > clear it, we cannot use pages in ZONE_MOVABLE even when they are not CMA
> > pages. For THP or normal page allocation, there is no way to avoid this
> > weakness without introducing another flag or argument. To me, introducing
> > another flag or argument for these functions looks like over-engineering,
> > so I don't change them and leave them as they are (removing
> > __GFP_MOVABLE).
> >
> > But for alloc_huge_page_nodemask(), introducing a new argument doesn't
> > seem to be a problem since it is not a general function but just a
> > migration target allocation function.
>
> I really do not see why hugetlb, and only its dequeue path, should be
> special. This just leads to confusion. From the code point of view it
> makes perfect sense to opt out of CMA regions for !__GFP_MOVABLE when
> dequeuing. So I would rather see consistent behavior than a special case
> deep in the hugetlb allocator layer.

It seems that there is a misunderstanding. It's possible, and reasonable,
to opt out of CMA regions for !__GFP_MOVABLE when dequeuing. But for
!__GFP_MOVABLE we don't search for hugetlb pages in ZONE_MOVABLE when
dequeuing, since the dequeue zone walk is limited by gfp_zone(gfp_mask). A
solution that introduces a new argument avoids CMA regions without causing
this problem.
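
To illustrate, the dequeue path walks zones roughly like this (paraphrased
from dequeue_huge_page_nodemask(); not the exact upstream code):

        /*
         * gfp_zone(gfp_mask) caps the highest zone the walk visits, so
         * without __GFP_MOVABLE the iteration stops at ZONE_NORMAL and
         * free hugetlb pages sitting in ZONE_MOVABLE are never found.
         */
        zonelist = node_zonelist(nid, gfp_mask);
        for_each_zone_zonelist_nodemask(zone, z, zonelist,
                                        gfp_zone(gfp_mask), nmask) {
                /* try to dequeue a free huge page on this zone's node */
        }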

Thanks.


Re: [PATCH v3 4/8] mm/hugetlb: make hugetlb migration callback CMA aware

2020-06-26 Thread Michal Hocko
On Fri 26-06-20 13:49:15, Joonsoo Kim wrote:
> On Thu, Jun 25, 2020 at 8:54 PM Michal Hocko wrote:
> >
> > On Tue 23-06-20 15:13:44, Joonsoo Kim wrote:
> > > From: Joonsoo Kim 
> > >
> > > new_non_cma_page() in gup.c, which tries to allocate the migration
> > > target page, requires the new page to be allocated outside of the CMA
> > > area. new_non_cma_page() implements this by removing the __GFP_MOVABLE
> > > flag. This works well for THP and normal pages but not for hugetlb pages.
> >
> > Could you explain why? I mean, why can't you simply remove the
> > __GFP_MOVABLE flag when calling alloc_huge_page_nodemask and check for
> > it in the dequeue path?
> 
> If we remove __GFP_MOVABLE when calling alloc_huge_page_nodemask, we
> cannot use pages in ZONE_MOVABLE when dequeuing.
>
> __GFP_MOVABLE is not only a CMA selector but also a zone selector. If we
> clear it, we cannot use pages in ZONE_MOVABLE even when they are not CMA
> pages. For THP or normal page allocation, there is no way to avoid this
> weakness without introducing another flag or argument. To me, introducing
> another flag or argument for these functions looks like over-engineering,
> so I don't change them and leave them as they are (removing
> __GFP_MOVABLE).
>
> But for alloc_huge_page_nodemask(), introducing a new argument doesn't
> seem to be a problem since it is not a general function but just a
> migration target allocation function.

I really do not see why hugetlb, and only its dequeue path, should be
special. This just leads to confusion. From the code point of view it
makes perfect sense to opt out of CMA regions for !__GFP_MOVABLE when
dequeuing. So I would rather see consistent behavior than a special case
deep in the hugetlb allocator layer.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH v3 4/8] mm/hugetlb: make hugetlb migration callback CMA aware

2020-06-25 Thread Joonsoo Kim
On Thu, Jun 25, 2020 at 8:54 PM Michal Hocko wrote:
>
> On Tue 23-06-20 15:13:44, Joonsoo Kim wrote:
> > From: Joonsoo Kim 
> >
> > new_non_cma_page() in gup.c, which tries to allocate the migration
> > target page, requires the new page to be allocated outside of the CMA
> > area. new_non_cma_page() implements this by removing the __GFP_MOVABLE
> > flag. This works well for THP and normal pages but not for hugetlb pages.
>
> Could you explain why? I mean, why can't you simply remove the
> __GFP_MOVABLE flag when calling alloc_huge_page_nodemask and check for
> it in the dequeue path?

If we remove __GFP_MOVABLE when calling alloc_huge_page_nodemask, we cannot
use pages in ZONE_MOVABLE when dequeuing.

__GFP_MOVABLE is not only a CMA selector but also a zone selector. If we
clear it, we cannot use pages in ZONE_MOVABLE even when they are not CMA
pages. For THP or normal page allocation, there is no way to avoid this
weakness without introducing another flag or argument. To me, introducing
another flag or argument for these functions looks like over-engineering,
so I don't change them and leave them as they are (removing __GFP_MOVABLE).
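
To make the zone-selector point concrete, here is a simplified model of
gfp_zone() (the real one in include/linux/gfp.h is a bit-table lookup;
this sketch only illustrates the effect of clearing __GFP_MOVABLE):

        static enum zone_type gfp_zone_model(gfp_t flags)
        {
                if (flags & __GFP_DMA)
                        return ZONE_DMA;
                if (flags & __GFP_DMA32)
                        return ZONE_DMA32;
                if (flags & __GFP_MOVABLE)
                        return ZONE_MOVABLE;    /* cleared -> zone is hidden */
                if (flags & __GFP_HIGHMEM)
                        return ZONE_HIGHMEM;    /* 32-bit configurations only */
                return ZONE_NORMAL;
        }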

But for alloc_huge_page_nodemask(), introducing a new argument doesn't seem
to be a problem since it is not a general function but just a migration
target allocation function.

If you agree with this argument, I will add more description to the patch.

Thanks.


Re: [PATCH v3 4/8] mm/hugetlb: make hugetlb migration callback CMA aware

2020-06-25 Thread Michal Hocko
On Tue 23-06-20 15:13:44, Joonsoo Kim wrote:
> From: Joonsoo Kim 
> 
> new_non_cma_page() in gup.c, which tries to allocate the migration target
> page, requires the new page to be allocated outside of the CMA area.
> new_non_cma_page() implements this by removing the __GFP_MOVABLE flag.
> This works well for THP and normal pages but not for hugetlb pages.

Could you explain why? I mean, why can't you simply remove the
__GFP_MOVABLE flag when calling alloc_huge_page_nodemask and check for it
in the dequeue path?
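
Something like the following in the dequeue loop would do (untested sketch;
is_migrate_cma_page() is a real helper, the loop is paraphrased from the
hugetlb free-list walk):

        list_for_each_entry(page, &h->hugepage_freelists[nid], lru) {
                /* skip free huge pages that sit in CMA pageblocks */
                if (!(gfp_mask & __GFP_MOVABLE) && is_migrate_cma_page(page))
                        continue;
                /* otherwise this page is a valid dequeue candidate */
                return page;
        }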

Your current calling convention doesn't allow that, but as I've said in
the reply to the previous patch this should be changed, and then it would
make this one easier as well, unless I am missing something.
-- 
Michal Hocko
SUSE Labs