Re: [RFC] mm: a question about high-order check in __zone_watermark_ok()

2016-09-28 Thread Xishi Qiu
On 2016/9/28 13:52, Joonsoo Kim wrote:

> On Mon, Sep 26, 2016 at 01:02:31PM +0200, Michal Hocko wrote:
>> On Mon 26-09-16 18:17:50, Xishi Qiu wrote:
>>> On 2016/9/26 17:43, Michal Hocko wrote:
>>>
 On Mon 26-09-16 17:16:54, Xishi Qiu wrote:
> On 2016/9/26 16:58, Michal Hocko wrote:
>
>> On Mon 26-09-16 16:47:57, Xishi Qiu wrote:
>>> commit 97a16fc82a7c5b0cfce95c05dfb9561e306ca1b1
>>> (mm, page_alloc: only enforce watermarks for order-0 allocations)
>>> rewrite the high-order check in __zone_watermark_ok(), but I think it
>>> quietly fix a bug. Please see the following.
>>>
>>> Before this patch, the high-order check is this:
>>> __zone_watermark_ok()
>>> ...
>>> for (o = 0; o < order; o++) {
>>> /* At the next order, this order's pages become 
>>> unavailable */
>>> free_pages -= z->free_area[o].nr_free << o;
>>>
>>> /* Require fewer higher order pages to be free */
>>> min >>= 1;
>>>
>>> if (free_pages <= min)
>>> return false;
>>> }
>>> ...
>>>
>>> If we have cma memory, and we alloc a high-order movable page, then 
>>> it's right.
>>>
>>> But if we alloc a high-order unmovable page(e.g. alloc kernel stack in 
>>> dup_task_struct()),
>>> and there are a lot of high-order cma pages, but little high-order 
>>> unmovable
>>> pages, the it is still return *true*, but we will alloc *failed* 
>>> finally, because
>>> we cannot fallback from migrate_unmovable to migrate_cma, right?
>>
>> AFAIR CMA wmark check was always tricky and the above commit has made
>> the situation at least a bit more clear. Anyway IIRC 
>>
>> #ifdef CONFIG_CMA
>>  /* If allocation can't use CMA areas don't use free CMA pages */
>>  if (!(alloc_flags & ALLOC_CMA))
>>  free_cma = zone_page_state(z, NR_FREE_CMA_PAGES);
>> #endif
>>
>>  if (free_pages - free_cma <= min + z->lowmem_reserve[classzone_idx])
>>  return false;
>>
>> should reduce the prioblem because a lot of CMA pages should just get us
>> below the wmark + reserve boundary.
>
> Hi Michal,
>
> If we have many high-order cma pages, and the left pages 
> (unmovable/movable/reclaimable)
> are also enough, but they are fragment, then it will triger the problem.
> If we alloc a high-order unmovable page, water mark check return *true*, 
> but we
> will alloc *failed*, right?

 As Vlastimil has written. There were known issues with the wmark checks
 and high order requests.
>>>
>>> Shall we backport to stable?
>>
>> I dunno, it was a part of a larger series with high atomic reserves and
>> changes which sound a bit intrusive for the stable kernel. Considering
>> that CMA was known to be problematic and there are still some issues
>> left I do not think this is worth the trouble/risk.
> 
> CMA problem is known one. I mentioned it on my ZONE_CMA series v1 but
> removed due to Mel's high atomic reserve series.
> 
> That series is rather large and has some problems so I think that it
> is not suitable for stable tree.
> 
> Thanks.
> 

OK, I know, thank you very much.

Thanks,
Xishi Qiu

> .
> 





Re: [RFC] mm: a question about high-order check in __zone_watermark_ok()

2016-09-27 Thread Joonsoo Kim
On Mon, Sep 26, 2016 at 01:02:31PM +0200, Michal Hocko wrote:
> On Mon 26-09-16 18:17:50, Xishi Qiu wrote:
> > On 2016/9/26 17:43, Michal Hocko wrote:
> > 
> > > On Mon 26-09-16 17:16:54, Xishi Qiu wrote:
> > >> On 2016/9/26 16:58, Michal Hocko wrote:
> > >>
> > >>> On Mon 26-09-16 16:47:57, Xishi Qiu wrote:
> >  commit 97a16fc82a7c5b0cfce95c05dfb9561e306ca1b1
> >  (mm, page_alloc: only enforce watermarks for order-0 allocations)
> >  rewrite the high-order check in __zone_watermark_ok(), but I think it
> >  quietly fix a bug. Please see the following.
> > 
> >  Before this patch, the high-order check is this:
> >  __zone_watermark_ok()
> > ...
> > for (o = 0; o < order; o++) {
> > /* At the next order, this order's pages become 
> >  unavailable */
> > free_pages -= z->free_area[o].nr_free << o;
> > 
> > /* Require fewer higher order pages to be free */
> > min >>= 1;
> > 
> > if (free_pages <= min)
> > return false;
> > }
> > ...
> > 
> >  If we have cma memory, and we alloc a high-order movable page, then 
> >  it's right.
> > 
> >  But if we alloc a high-order unmovable page(e.g. alloc kernel stack in 
> >  dup_task_struct()),
> >  and there are a lot of high-order cma pages, but little high-order 
> >  unmovable
> >  pages, the it is still return *true*, but we will alloc *failed* 
> >  finally, because
> >  we cannot fallback from migrate_unmovable to migrate_cma, right?
> > >>>
> > >>> AFAIR CMA wmark check was always tricky and the above commit has made
> > >>> the situation at least a bit more clear. Anyway IIRC 
> > >>>
> > >>> #ifdef CONFIG_CMA
> > >>> /* If allocation can't use CMA areas don't use free CMA pages */
> > >>> if (!(alloc_flags & ALLOC_CMA))
> > >>> free_cma = zone_page_state(z, NR_FREE_CMA_PAGES);
> > >>> #endif
> > >>>
> > >>> if (free_pages - free_cma <= min + 
> > >>> z->lowmem_reserve[classzone_idx])
> > >>> return false;
> > >>>
> > >>> should reduce the prioblem because a lot of CMA pages should just get us
> > >>> below the wmark + reserve boundary.
> > >>
> > >> Hi Michal,
> > >>
> > >> If we have many high-order cma pages, and the left pages 
> > >> (unmovable/movable/reclaimable)
> > >> are also enough, but they are fragment, then it will triger the problem.
> > >> If we alloc a high-order unmovable page, water mark check return *true*, 
> > >> but we
> > >> will alloc *failed*, right?
> > > 
> > > As Vlastimil has written. There were known issues with the wmark checks
> > > and high order requests.
> > 
> > Shall we backport to stable?
> 
> I dunno, it was a part of a larger series with high atomic reserves and
> changes which sound a bit intrusive for the stable kernel. Considering
> that CMA was known to be problematic and there are still some issues
> left I do not think this is worth the trouble/risk.

CMA problem is known one. I mentioned it on my ZONE_CMA series v1 but
removed due to Mel's high atomic reserve series.

That series is rather large and has some problems so I think that it
is not suitable for stable tree.

Thanks.


Re: [RFC] mm: a question about high-order check in __zone_watermark_ok()

2016-09-26 Thread Michal Hocko
On Mon 26-09-16 18:17:50, Xishi Qiu wrote:
> On 2016/9/26 17:43, Michal Hocko wrote:
> 
> > On Mon 26-09-16 17:16:54, Xishi Qiu wrote:
> >> On 2016/9/26 16:58, Michal Hocko wrote:
> >>
> >>> On Mon 26-09-16 16:47:57, Xishi Qiu wrote:
>  commit 97a16fc82a7c5b0cfce95c05dfb9561e306ca1b1
>  (mm, page_alloc: only enforce watermarks for order-0 allocations)
>  rewrite the high-order check in __zone_watermark_ok(), but I think it
>  quietly fix a bug. Please see the following.
> 
>  Before this patch, the high-order check is this:
>  __zone_watermark_ok()
>   ...
>   for (o = 0; o < order; o++) {
>   /* At the next order, this order's pages become unavailable */
>   free_pages -= z->free_area[o].nr_free << o;
> 
>   /* Require fewer higher order pages to be free */
>   min >>= 1;
> 
>   if (free_pages <= min)
>   return false;
>   }
>   ...
> 
>  If we have cma memory, and we alloc a high-order movable page, then it's 
>  right.
> 
>  But if we alloc a high-order unmovable page(e.g. alloc kernel stack in 
>  dup_task_struct()),
>  and there are a lot of high-order cma pages, but little high-order 
>  unmovable
>  pages, the it is still return *true*, but we will alloc *failed* 
>  finally, because
>  we cannot fallback from migrate_unmovable to migrate_cma, right?
> >>>
> >>> AFAIR CMA wmark check was always tricky and the above commit has made
> >>> the situation at least a bit more clear. Anyway IIRC 
> >>>
> >>> #ifdef CONFIG_CMA
> >>>   /* If allocation can't use CMA areas don't use free CMA pages */
> >>>   if (!(alloc_flags & ALLOC_CMA))
> >>>   free_cma = zone_page_state(z, NR_FREE_CMA_PAGES);
> >>> #endif
> >>>
> >>>   if (free_pages - free_cma <= min + z->lowmem_reserve[classzone_idx])
> >>>   return false;
> >>>
> >>> should reduce the prioblem because a lot of CMA pages should just get us
> >>> below the wmark + reserve boundary.
> >>
> >> Hi Michal,
> >>
> >> If we have many high-order cma pages, and the left pages 
> >> (unmovable/movable/reclaimable)
> >> are also enough, but they are fragment, then it will triger the problem.
> >> If we alloc a high-order unmovable page, water mark check return *true*, 
> >> but we
> >> will alloc *failed*, right?
> > 
> > As Vlastimil has written. There were known issues with the wmark checks
> > and high order requests.
> 
> Shall we backport to stable?

I dunno, it was a part of a larger series with high atomic reserves and
changes which sound a bit intrusive for the stable kernel. Considering
that CMA was known to be problematic and there are still some issues
left I do not think this is worth the trouble/risk.
-- 
Michal Hocko
SUSE Labs


Re: [RFC] mm: a question about high-order check in __zone_watermark_ok()

2016-09-26 Thread Xishi Qiu
On 2016/9/26 17:43, Michal Hocko wrote:

> On Mon 26-09-16 17:16:54, Xishi Qiu wrote:
>> On 2016/9/26 16:58, Michal Hocko wrote:
>>
>>> On Mon 26-09-16 16:47:57, Xishi Qiu wrote:
 commit 97a16fc82a7c5b0cfce95c05dfb9561e306ca1b1
 (mm, page_alloc: only enforce watermarks for order-0 allocations)
 rewrite the high-order check in __zone_watermark_ok(), but I think it
 quietly fix a bug. Please see the following.

 Before this patch, the high-order check is this:
 __zone_watermark_ok()
...
for (o = 0; o < order; o++) {
/* At the next order, this order's pages become unavailable */
free_pages -= z->free_area[o].nr_free << o;

/* Require fewer higher order pages to be free */
min >>= 1;

if (free_pages <= min)
return false;
}
...

 If we have cma memory, and we alloc a high-order movable page, then it's 
 right.

 But if we alloc a high-order unmovable page(e.g. alloc kernel stack in 
 dup_task_struct()),
 and there are a lot of high-order cma pages, but little high-order 
 unmovable
 pages, the it is still return *true*, but we will alloc *failed* finally, 
 because
 we cannot fallback from migrate_unmovable to migrate_cma, right?
>>>
>>> AFAIR CMA wmark check was always tricky and the above commit has made
>>> the situation at least a bit more clear. Anyway IIRC 
>>>
>>> #ifdef CONFIG_CMA
>>> /* If allocation can't use CMA areas don't use free CMA pages */
>>> if (!(alloc_flags & ALLOC_CMA))
>>> free_cma = zone_page_state(z, NR_FREE_CMA_PAGES);
>>> #endif
>>>
>>> if (free_pages - free_cma <= min + z->lowmem_reserve[classzone_idx])
>>> return false;
>>>
>>> should reduce the prioblem because a lot of CMA pages should just get us
>>> below the wmark + reserve boundary.
>>
>> Hi Michal,
>>
>> If we have many high-order cma pages, and the left pages 
>> (unmovable/movable/reclaimable)
>> are also enough, but they are fragment, then it will triger the problem.
>> If we alloc a high-order unmovable page, water mark check return *true*, but 
>> we
>> will alloc *failed*, right?
> 
> As Vlastimil has written. There were known issues with the wmark checks
> and high order requests.

Shall we backport to stable?

Thanks,
Xishi Qiu



Re: [RFC] mm: a question about high-order check in __zone_watermark_ok()

2016-09-26 Thread Michal Hocko
On Mon 26-09-16 17:16:54, Xishi Qiu wrote:
> On 2016/9/26 16:58, Michal Hocko wrote:
> 
> > On Mon 26-09-16 16:47:57, Xishi Qiu wrote:
> >> commit 97a16fc82a7c5b0cfce95c05dfb9561e306ca1b1
> >> (mm, page_alloc: only enforce watermarks for order-0 allocations)
> >> rewrite the high-order check in __zone_watermark_ok(), but I think it
> >> quietly fix a bug. Please see the following.
> >>
> >> Before this patch, the high-order check is this:
> >> __zone_watermark_ok()
> >>...
> >>for (o = 0; o < order; o++) {
> >>/* At the next order, this order's pages become unavailable */
> >>free_pages -= z->free_area[o].nr_free << o;
> >>
> >>/* Require fewer higher order pages to be free */
> >>min >>= 1;
> >>
> >>if (free_pages <= min)
> >>return false;
> >>}
> >>...
> >>
> >> If we have cma memory, and we alloc a high-order movable page, then it's 
> >> right.
> >>
> >> But if we alloc a high-order unmovable page(e.g. alloc kernel stack in 
> >> dup_task_struct()),
> >> and there are a lot of high-order cma pages, but little high-order 
> >> unmovable
> >> pages, the it is still return *true*, but we will alloc *failed* finally, 
> >> because
> >> we cannot fallback from migrate_unmovable to migrate_cma, right?
> > 
> > AFAIR CMA wmark check was always tricky and the above commit has made
> > the situation at least a bit more clear. Anyway IIRC 
> > 
> > #ifdef CONFIG_CMA
> > /* If allocation can't use CMA areas don't use free CMA pages */
> > if (!(alloc_flags & ALLOC_CMA))
> > free_cma = zone_page_state(z, NR_FREE_CMA_PAGES);
> > #endif
> > 
> > if (free_pages - free_cma <= min + z->lowmem_reserve[classzone_idx])
> > return false;
> > 
> > should reduce the prioblem because a lot of CMA pages should just get us
> > below the wmark + reserve boundary.
> 
> Hi Michal,
> 
> If we have many high-order cma pages, and the left pages 
> (unmovable/movable/reclaimable)
> are also enough, but they are fragment, then it will triger the problem.
> If we alloc a high-order unmovable page, water mark check return *true*, but 
> we
> will alloc *failed*, right?

As Vlastimil has written. There were known issues with the wmark checks
and high order requests.
-- 
Michal Hocko
SUSE Labs


Re: [RFC] mm: a question about high-order check in __zone_watermark_ok()

2016-09-26 Thread Xishi Qiu
On 2016/9/26 16:58, Michal Hocko wrote:

> On Mon 26-09-16 16:47:57, Xishi Qiu wrote:
>> commit 97a16fc82a7c5b0cfce95c05dfb9561e306ca1b1
>> (mm, page_alloc: only enforce watermarks for order-0 allocations)
>> rewrite the high-order check in __zone_watermark_ok(), but I think it
>> quietly fix a bug. Please see the following.
>>
>> Before this patch, the high-order check is this:
>> __zone_watermark_ok()
>>  ...
>>  for (o = 0; o < order; o++) {
>>  /* At the next order, this order's pages become unavailable */
>>  free_pages -= z->free_area[o].nr_free << o;
>>
>>  /* Require fewer higher order pages to be free */
>>  min >>= 1;
>>
>>  if (free_pages <= min)
>>  return false;
>>  }
>>  ...
>>
>> If we have cma memory, and we alloc a high-order movable page, then it's 
>> right.
>>
>> But if we alloc a high-order unmovable page(e.g. alloc kernel stack in 
>> dup_task_struct()),
>> and there are a lot of high-order cma pages, but little high-order unmovable
>> pages, the it is still return *true*, but we will alloc *failed* finally, 
>> because
>> we cannot fallback from migrate_unmovable to migrate_cma, right?
> 
> AFAIR CMA wmark check was always tricky and the above commit has made
> the situation at least a bit more clear. Anyway IIRC 
> 
> #ifdef CONFIG_CMA
>   /* If allocation can't use CMA areas don't use free CMA pages */
>   if (!(alloc_flags & ALLOC_CMA))
>   free_cma = zone_page_state(z, NR_FREE_CMA_PAGES);
> #endif
> 
>   if (free_pages - free_cma <= min + z->lowmem_reserve[classzone_idx])
>   return false;
> 
> should reduce the prioblem because a lot of CMA pages should just get us
> below the wmark + reserve boundary.

Hi Michal,

If we have many high-order cma pages, and the left pages 
(unmovable/movable/reclaimable)
are also enough, but they are fragment, then it will triger the problem.
If we alloc a high-order unmovable page, water mark check return *true*, but we
will alloc *failed*, right?

Thanks,
Xishi Qiu



Re: [RFC] mm: a question about high-order check in __zone_watermark_ok()

2016-09-26 Thread Michal Hocko
On Mon 26-09-16 16:47:57, Xishi Qiu wrote:
> commit 97a16fc82a7c5b0cfce95c05dfb9561e306ca1b1
> (mm, page_alloc: only enforce watermarks for order-0 allocations)
> rewrite the high-order check in __zone_watermark_ok(), but I think it
> quietly fix a bug. Please see the following.
> 
> Before this patch, the high-order check is this:
> __zone_watermark_ok()
>   ...
>   for (o = 0; o < order; o++) {
>   /* At the next order, this order's pages become unavailable */
>   free_pages -= z->free_area[o].nr_free << o;
> 
>   /* Require fewer higher order pages to be free */
>   min >>= 1;
> 
>   if (free_pages <= min)
>   return false;
>   }
>   ...
> 
> If we have cma memory, and we alloc a high-order movable page, then it's 
> right.
> 
> But if we alloc a high-order unmovable page(e.g. alloc kernel stack in 
> dup_task_struct()),
> and there are a lot of high-order cma pages, but little high-order unmovable
> pages, the it is still return *true*, but we will alloc *failed* finally, 
> because
> we cannot fallback from migrate_unmovable to migrate_cma, right?

AFAIR CMA wmark check was always tricky and the above commit has made
the situation at least a bit more clear. Anyway IIRC 

#ifdef CONFIG_CMA
/* If allocation can't use CMA areas don't use free CMA pages */
if (!(alloc_flags & ALLOC_CMA))
free_cma = zone_page_state(z, NR_FREE_CMA_PAGES);
#endif

if (free_pages - free_cma <= min + z->lowmem_reserve[classzone_idx])
return false;

should reduce the prioblem because a lot of CMA pages should just get us
below the wmark + reserve boundary.
-- 
Michal Hocko
SUSE Labs


Re: [RFC] mm: a question about high-order check in __zone_watermark_ok()

2016-09-26 Thread Vlastimil Babka

[+CC Joonsoo Kim]

On 09/26/2016 10:47 AM, Xishi Qiu wrote:

commit 97a16fc82a7c5b0cfce95c05dfb9561e306ca1b1
(mm, page_alloc: only enforce watermarks for order-0 allocations)
rewrite the high-order check in __zone_watermark_ok(), but I think it
quietly fix a bug. Please see the following.

Before this patch, the high-order check is this:
__zone_watermark_ok()
...
for (o = 0; o < order; o++) {
/* At the next order, this order's pages become unavailable */
free_pages -= z->free_area[o].nr_free << o;

/* Require fewer higher order pages to be free */
min >>= 1;

if (free_pages <= min)
return false;
}
...

If we have cma memory, and we alloc a high-order movable page, then it's right.

But if we alloc a high-order unmovable page(e.g. alloc kernel stack in 
dup_task_struct()),
and there are a lot of high-order cma pages, but little high-order unmovable
pages, the it is still return *true*, but we will alloc *failed* finally, 
because
we cannot fallback from migrate_unmovable to migrate_cma, right?


Yeah I think this limitation was known to CMA people.


Also if we doing __alloc_pages_slowpath(), the compact will not work, because
__zone_watermark_ok() always return true, and it lead to alloc a high-order
unmovable page failed, then do direct reclaim.


I guess that can happen as well.


Thanks,
Xishi Qiu





[RFC] mm: a question about high-order check in __zone_watermark_ok()

2016-09-26 Thread Xishi Qiu
commit 97a16fc82a7c5b0cfce95c05dfb9561e306ca1b1
(mm, page_alloc: only enforce watermarks for order-0 allocations)
rewrite the high-order check in __zone_watermark_ok(), but I think it
quietly fix a bug. Please see the following.

Before this patch, the high-order check is this:
__zone_watermark_ok()
...
for (o = 0; o < order; o++) {
/* At the next order, this order's pages become unavailable */
free_pages -= z->free_area[o].nr_free << o;

/* Require fewer higher order pages to be free */
min >>= 1;

if (free_pages <= min)
return false;
}
...

If we have cma memory, and we alloc a high-order movable page, then it's right.

But if we alloc a high-order unmovable page(e.g. alloc kernel stack in 
dup_task_struct()),
and there are a lot of high-order cma pages, but little high-order unmovable
pages, the it is still return *true*, but we will alloc *failed* finally, 
because
we cannot fallback from migrate_unmovable to migrate_cma, right?

Also if we doing __alloc_pages_slowpath(), the compact will not work, because
__zone_watermark_ok() always return true, and it lead to alloc a high-order
unmovable page failed, then do direct reclaim.

Thanks,
Xishi Qiu