Re: [PATCH 3/4] mm: unreserve highatomic free pages fully before OOM

2016-10-11 Thread Michal Hocko
On Tue 11-10-16 16:37:16, Minchan Kim wrote:
> On Tue, Oct 11, 2016 at 09:26:06AM +0200, Michal Hocko wrote:
> > On Tue 11-10-16 16:09:45, Minchan Kim wrote:
[...]
> > > @@ -2154,12 +2156,24 @@ static void unreserve_highatomic_pageblock(const struct alloc_context *ac)
> > >* may increase.
> > >*/
> > >   set_pageblock_migratetype(page, ac->migratetype);
> > > - move_freepages_block(zone, page, ac->migratetype);
> > > - spin_unlock_irqrestore(&zone->lock, flags);
> > > - return;
> > > + ret = move_freepages_block(zone, page,
> > > + ac->migratetype);
> > > + /*
> > > +  * By race with page freeing functions, !highatomic
> > > +  * pageblocks can have free pages in highatomic free
> > > +  * list so if drain is true, try to unreserve every
> > > +  * free pages in highatomic free list without bailing
> > > +  * out.
> > > +  */
> > > + if (!drain) {
> > 
> > if (ret)
> > > + spin_unlock_irqrestore(&zone->lock, flags);
> > > + return ret;
> > > + }
> > 
> > arguably this would also work better for the !drain case, which currently
> > tries to unreserve but, in case of the race, would do nothing.
> 
> I thought about it, but I was afraid you would again say it's over-complicated.

Well, maybe there is an even better/easier solution. Anyway, if
I were you I would just split it into two patches: the first
to unreserve from should_reclaim_retry, and the second to make
unreserve_highatomic_pageblock more reliable.

> I will do it with your SOB in the next spin.

ok, thanks!
-- 
Michal Hocko
SUSE Labs


Re: [PATCH 3/4] mm: unreserve highatomic free pages fully before OOM

2016-10-11 Thread Minchan Kim
On Tue, Oct 11, 2016 at 09:26:06AM +0200, Michal Hocko wrote:
> On Tue 11-10-16 16:09:45, Minchan Kim wrote:
> > On Tue, Oct 11, 2016 at 08:50:48AM +0200, Michal Hocko wrote:
> > > On Tue 11-10-16 14:01:41, Minchan Kim wrote:
> [...]
> > > > Also, your patch makes the retry loop go beyond MAX_RECLAIM_RETRIES
> > > > if unreserve_highatomic_pageblock returns true. Theoretically,
> > > > it could cause a livelock. You might argue it's *really really* rare,
> > > > but I don't want to add such a subtle thing.
> > > > Maybe we could drain when no_progress_loops == MAX_RECLAIM_RETRIES.
> > > 
> > > What would be the scenario when we would really livelock here? How can
> > > we have unreserve_highatomic_pageblock returning true forever?
> > 
> > Another context repeatedly freeing/reallocating high-order pages while
> > a process stuck in direct reclaim is looping in should_reclaim_retry.
> 
> If we unreserve those pages then we should converge to OOM. Btw. this
> can happen even without highatomic reserves. Heavy short-lived allocations
> might keep us looping at the lowest priority. They are just too unlikely
> to care about.

Indeed.
> 
> > > > > aggressive to me. If we just do one at a time we have a chance to
> > > > > keep some reserves if the OOM situation is really ephemeral.
> > > > > 
> > > > > Does this patch work in your usecase?
> > > > 
> > > > I didn't test it, but I guess it works; it has the problems I mentioned
> > > > above, though.
> > > 
> > > Please do not make this overcomplicated, and be practical. I do not
> > > really want to dismiss your usecase, but I am really not convinced that
> > > such "perfectly fit into all memory" situations are sustainable and
> > > justify making the whole code more complex. I agree that we can at
> > > least try to do something to release those reserves, but let's do it
> > > as simply as possible.
> > 
> > If you think it's too complicated, how about this?
> 
> Definitely better than the original patch. A little bit too aggressive,
> because we could really go with one block at a time. But this is a
> minor thing and easily fixable...
> 
> > @@ -2154,12 +2156,24 @@ static void unreserve_highatomic_pageblock(const struct alloc_context *ac)
> >  * may increase.
> >  */
> > set_pageblock_migratetype(page, ac->migratetype);
> > -   move_freepages_block(zone, page, ac->migratetype);
> > -   spin_unlock_irqrestore(&zone->lock, flags);
> > -   return;
> > +   ret = move_freepages_block(zone, page,
> > +   ac->migratetype);
> > +   /*
> > +* By race with page freeing functions, !highatomic
> > +* pageblocks can have free pages in highatomic free
> > +* list so if drain is true, try to unreserve every
> > +* free pages in highatomic free list without bailing
> > +* out.
> > +*/
> > +   if (!drain) {
> 
>   if (ret)
> > +   spin_unlock_irqrestore(&zone->lock, flags);
> > +   return ret;
> > +   }
> 
> arguably this would also work better for the !drain case, which currently
> tries to unreserve but, in case of the race, would do nothing.

I thought about it, but I was afraid you would again say it's over-complicated.
I will do it with your SOB in the next spin.

Thanks, Michal.

> 
> > }
> > spin_unlock_irqrestore(&zone->lock, flags);
> > }
> > +
> > +   return ret;
> >  }
> >  
> >  /* Remove an element from the buddy allocator from the fallback list */
> -- 
> Michal Hocko
> SUSE Labs
> 


Re: [PATCH 3/4] mm: unreserve highatomic free pages fully before OOM

2016-10-11 Thread Michal Hocko
On Tue 11-10-16 16:09:45, Minchan Kim wrote:
> On Tue, Oct 11, 2016 at 08:50:48AM +0200, Michal Hocko wrote:
> > On Tue 11-10-16 14:01:41, Minchan Kim wrote:
[...]
> > > Also, your patch makes the retry loop go beyond MAX_RECLAIM_RETRIES
> > > if unreserve_highatomic_pageblock returns true. Theoretically,
> > > it could cause a livelock. You might argue it's *really really* rare,
> > > but I don't want to add such a subtle thing.
> > > Maybe we could drain when no_progress_loops == MAX_RECLAIM_RETRIES.
> > 
> > What would be the scenario when we would really livelock here? How can
> > we have unreserve_highatomic_pageblock returning true forever?
> 
> Another context repeatedly freeing/reallocating high-order pages while
> a process stuck in direct reclaim is looping in should_reclaim_retry.

If we unreserve those pages then we should converge to OOM. Btw. this
can happen even without highatomic reserves. Heavy short-lived allocations
might keep us looping at the lowest priority. They are just too unlikely
to care about.

> > > > aggressive to me. If we just do one at a time we have a chance to
> > > > keep some reserves if the OOM situation is really ephemeral.
> > > > 
> > > > Does this patch work in your usecase?
> > > 
> > > I didn't test it, but I guess it works; it has the problems I mentioned
> > > above, though.
> > 
> > Please do not make this overcomplicated, and be practical. I do not
> > really want to dismiss your usecase, but I am really not convinced that
> > such "perfectly fit into all memory" situations are sustainable and
> > justify making the whole code more complex. I agree that we can at
> > least try to do something to release those reserves, but let's do it
> > as simply as possible.
> 
> If you think it's too complicated, how about this?

Definitely better than the original patch. A little bit too aggressive,
because we could really go with one block at a time. But this is a
minor thing and easily fixable...

> @@ -2154,12 +2156,24 @@ static void unreserve_highatomic_pageblock(const struct alloc_context *ac)
>* may increase.
>*/
>   set_pageblock_migratetype(page, ac->migratetype);
> - move_freepages_block(zone, page, ac->migratetype);
> - spin_unlock_irqrestore(&zone->lock, flags);
> - return;
> + ret = move_freepages_block(zone, page,
> + ac->migratetype);
> + /*
> +  * By race with page freeing functions, !highatomic
> +  * pageblocks can have free pages in highatomic free
> +  * list so if drain is true, try to unreserve every
> +  * free pages in highatomic free list without bailing
> +  * out.
> +  */
> + if (!drain) {

if (ret)
> + spin_unlock_irqrestore(&zone->lock, flags);
> + return ret;
> + }

arguably this would also work better for the !drain case, which currently
tries to unreserve but, in case of the race, would do nothing.
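
IOW, something along these lines (just a sketch on top of your hunk above,
not even compile tested):

			ret = move_freepages_block(zone, page,
						ac->migratetype);
			/*
			 * Bail out only once something was actually moved;
			 * if the race left nothing to move here, keep
			 * scanning further pageblocks/zones instead of
			 * returning with nothing done.
			 */
			if (ret) {
				spin_unlock_irqrestore(&zone->lock, flags);
				return ret;
			}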

>   }
>   spin_unlock_irqrestore(&zone->lock, flags);
>   }
> +
> + return ret;
>  }
>  
>  /* Remove an element from the buddy allocator from the fallback list */
-- 
Michal Hocko
SUSE Labs


Re: [PATCH 3/4] mm: unreserve highatomic free pages fully before OOM

2016-10-11 Thread Minchan Kim
On Tue, Oct 11, 2016 at 08:50:48AM +0200, Michal Hocko wrote:
> On Tue 11-10-16 14:01:41, Minchan Kim wrote:
> > Hi Michal,
> > 
> > On Mon, Oct 10, 2016 at 09:41:40AM +0200, Michal Hocko wrote:
> > > On Fri 07-10-16 23:43:45, Minchan Kim wrote:
> > > > On Fri, Oct 07, 2016 at 11:09:17AM +0200, Michal Hocko wrote:
> [...]
> > > > > @@ -2102,10 +2109,12 @@ static void unreserve_highatomic_pageblock(const struct alloc_context *ac)
> > > > >   set_pageblock_migratetype(page, ac->migratetype);
> > > > >   move_freepages_block(zone, page, ac->migratetype);
> > > > >   spin_unlock_irqrestore(&zone->lock, flags);
> > > > > - return;
> > > > > + return true;
> > > > 
> > > > Such a cut-off leaves reserved pageblocks remaining before the OOM.
> > > > We call that a premature OOM kill.
> > > 
> > > Not sure I understand. The above should get rid of all atomic reserves
> > > before we go OOM. We can do it all at once but that sounds too
> > 
> > The problem is that there is a race between the page freeing path and the
> > unreserve logic, so some pages can sit on the highatomic free list even
> > though zone->nr_reserved_highatomic is already zero.
> 
> Does it make any sense to handle such an unlikely case?

I agree if it's really hard to solve, but why should we leave such a
hole in the algorithm if we can fix it easily?

> 
> > So, at least, it would be better to have a draining step at some point,
> > which was (no_progress_loops == MAX_RECLAIM_RETRIES) in my patch.
> > 
> > Also, your patch makes the retry loop go beyond MAX_RECLAIM_RETRIES
> > if unreserve_highatomic_pageblock returns true. Theoretically,
> > it could cause a livelock. You might argue it's *really really* rare,
> > but I don't want to add such a subtle thing.
> > Maybe we could drain when no_progress_loops == MAX_RECLAIM_RETRIES.
> 
> What would be the scenario when we would really livelock here? How can
> we have unreserve_highatomic_pageblock returning true forever?

Another context repeatedly freeing/reallocating high-order pages while
a process stuck in direct reclaim is looping in should_reclaim_retry.
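
To illustrate the concern with the check your patch adds to
should_reclaim_retry (annotated sketch of your hunk, not a tested patch):

	if (*no_progress_loops > MAX_RECLAIM_RETRIES) {
		/*
		 * If another context keeps reserving fresh highatomic
		 * pageblocks behind our back, this can keep returning
		 * true, so we never fall through to the OOM path and
		 * no_progress_loops grows without bound.
		 */
		if (unreserve_highatomic_pageblock(ac, true))
			return true;
		return false;
	}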

> 
> > > aggressive to me. If we just do one at a time we have a chance to
> > > keep some reserves if the OOM situation is really ephemeral.
> > > 
> > > Does this patch work in your usecase?
> > 
> > I didn't test it, but I guess it works; it has the problems I mentioned
> > above, though.
> 
> Please do not make this overcomplicated, and be practical. I do not
> really want to dismiss your usecase, but I am really not convinced that
> such "perfectly fit into all memory" situations are sustainable and
> justify making the whole code more complex. I agree that we can at
> least try to do something to release those reserves, but let's do it
> as simply as possible.

If you think it's too complicated, how about this?

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index fd91b8955b26..e3ce442e9976 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2098,7 +2098,8 @@ static void reserve_highatomic_pageblock(struct page *page, struct zone *zone,
  * intense memory pressure but failed atomic allocations should be easier
  * to recover from than an OOM.
  */
-static void unreserve_highatomic_pageblock(const struct alloc_context *ac)
+static bool unreserve_highatomic_pageblock(const struct alloc_context *ac,
+   bool drain)
 {
struct zonelist *zonelist = ac->zonelist;
unsigned long flags;
@@ -2106,11 +2107,12 @@ static void unreserve_highatomic_pageblock(const struct alloc_context *ac)
struct zone *zone;
struct page *page;
int order;
+   bool ret = false;
 
for_each_zone_zonelist_nodemask(zone, z, zonelist, ac->high_zoneidx,
ac->nodemask) {
/* Preserve at least one pageblock */
-   if (zone->nr_reserved_highatomic <= pageblock_nr_pages)
+   if (!drain && zone->nr_reserved_highatomic <= pageblock_nr_pages)
continue;
 
spin_lock_irqsave(&zone->lock, flags);
@@ -2154,12 +2156,24 @@ static void unreserve_highatomic_pageblock(const struct alloc_context *ac)
 * may increase.
 */
set_pageblock_migratetype(page, ac->migratetype);
-   move_freepages_block(zone, page, ac->migratetype);
-   spin_unlock_irqrestore(&zone->lock, flags);
-   return;
+   ret = move_freepages_block(zone, page,
+   ac->migratetype);
+   /*
+* By race with page freeing functions, !highatomic
+* pageblocks can have free pages in highatomic free
+* list so if drain is true, try to unreserve every
+* free pages in highatomic free list without bailing
+* out.
+*/
+   if (!drain) {
+   spin_unlock_irqrestore(&zone->lock, flags);
+   return ret;
+   }
}
spin_unlock_irqrestore(&zone->lock, flags);
}
+
+   return ret;
 }
 
 /* Remove an element from the buddy allocator from the fallback list */

Re: [PATCH 3/4] mm: unreserve highatomic free pages fully before OOM

2016-10-10 Thread Michal Hocko
On Tue 11-10-16 14:01:41, Minchan Kim wrote:
> Hi Michal,
> 
> On Mon, Oct 10, 2016 at 09:41:40AM +0200, Michal Hocko wrote:
> > On Fri 07-10-16 23:43:45, Minchan Kim wrote:
> > > On Fri, Oct 07, 2016 at 11:09:17AM +0200, Michal Hocko wrote:
[...]
> > > > @@ -2102,10 +2109,12 @@ static void unreserve_highatomic_pageblock(const struct alloc_context *ac)
> > > > set_pageblock_migratetype(page, ac->migratetype);
> > > > move_freepages_block(zone, page, ac->migratetype);
> > > > spin_unlock_irqrestore(&zone->lock, flags);
> > > > -   return;
> > > > +   return true;
> > > 
> > > Such a cut-off leaves reserved pageblocks remaining before the OOM.
> > > We call that a premature OOM kill.
> > 
> > Not sure I understand. The above should get rid of all atomic reserves
> > before we go OOM. We can do it all at once but that sounds too
> 
> The problem is that there is a race between the page freeing path and the
> unreserve logic, so some pages can sit on the highatomic free list even
> though zone->nr_reserved_highatomic is already zero.

Does it make any sense to handle such an unlikely case?

> So, at least, it would be better to have a draining step at some point,
> which was (no_progress_loops == MAX_RECLAIM_RETRIES) in my patch.
> 
> Also, your patch makes the retry loop go beyond MAX_RECLAIM_RETRIES
> if unreserve_highatomic_pageblock returns true. Theoretically,
> it could cause a livelock. You might argue it's *really really* rare,
> but I don't want to add such a subtle thing.
> Maybe we could drain when no_progress_loops == MAX_RECLAIM_RETRIES.

What would be the scenario when we would really livelock here? How can
we have unreserve_highatomic_pageblock returning true forever?

> > aggressive to me. If we just do one at a time we have a chance to
> > keep some reserves if the OOM situation is really ephemeral.
> > 
> > Does this patch work in your usecase?
> 
> I didn't test it, but I guess it works; it has the problems I mentioned
> above, though.

Please do not make this overcomplicated, and be practical. I do not
really want to dismiss your usecase, but I am really not convinced that
such "perfectly fit into all memory" situations are sustainable and
justify making the whole code more complex. I agree that we can at
least try to do something to release those reserves, but let's do it
as simply as possible.

-- 
Michal Hocko
SUSE Labs


Re: [PATCH 3/4] mm: unreserve highatomic free pages fully before OOM

2016-10-10 Thread Minchan Kim
Hi Michal,

On Mon, Oct 10, 2016 at 09:41:40AM +0200, Michal Hocko wrote:
> On Fri 07-10-16 23:43:45, Minchan Kim wrote:
> > On Fri, Oct 07, 2016 at 11:09:17AM +0200, Michal Hocko wrote:
> > > On Fri 07-10-16 14:45:35, Minchan Kim wrote:
> > > > After fixing the race in the highatomic page count, I still encounter
> > > > OOM with a lot of free memory reserved as highatomic.
> > > > 
> > > > One reason in my testing was that we unreserve free pages only if
> > > > reclaim has made progress. Otherwise, we never get a chance to unreserve.
> > > > 
> > > > The other problem, after fixing that, was that it doesn't guarantee
> > > > unreserving every page of the highatomic pageblocks, because it just
> > > > releases *a* pageblock, which could hold only a few free pages, so
> > > > another context can easily steal it. As a result, the process stuck in
> > > > direct reclaim can finally encounter OOM although there are free pages
> > > > which could be unreserved.
> > > > 
> > > > This patch changes the logic so that it unreserves pageblocks
> > > > proportionally to no_progress_loops. IOW, in the first reclaim retry
> > > > it will try to unreserve one pageblock, in the second retry
> > > > 1/MAX_RECLAIM_RETRIES of the reserved pageblocks, and finally all
> > > > reserved pageblocks before the OOM.
> > > > 
> > > > Signed-off-by: Minchan Kim 
> > > > ---
> > > >  mm/page_alloc.c | 57 
> > > > -
> > > >  1 file changed, 44 insertions(+), 13 deletions(-)
> > > 
> > > This sounds much more complex than it needs to be, IMHO. Why wouldn't
> > > something as simple as the following work? Please note that I didn't even
> > > try to compile this; it is just to give you an idea.
> > > ---
> > >  mm/page_alloc.c | 26 --
> > >  1 file changed, 20 insertions(+), 6 deletions(-)
> > > 
> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > index 73f60ad6315f..e575a4f38555 100644
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -2056,7 +2056,8 @@ static void reserve_highatomic_pageblock(struct page *page, struct zone *zone,
> > >   * intense memory pressure but failed atomic allocations should be easier
> > >   * to recover from than an OOM.
> > >   */
> > > -static void unreserve_highatomic_pageblock(const struct alloc_context *ac)
> > > +static bool unreserve_highatomic_pageblock(const struct alloc_context *ac,
> > > + bool force)
> > >  {
> > >   struct zonelist *zonelist = ac->zonelist;
> > >   unsigned long flags;
> > > @@ -2067,8 +2068,14 @@ static void unreserve_highatomic_pageblock(const struct alloc_context *ac)
> > >  
> > >   for_each_zone_zonelist_nodemask(zone, z, zonelist, ac->high_zoneidx,
> > >   ac->nodemask) {
> > > - /* Preserve at least one pageblock */
> > > - if (zone->nr_reserved_highatomic <= pageblock_nr_pages)
> > > + if (!zone->nr_reserved_highatomic)
> > > + continue;
> > > +
> > > + /*
> > > +  * Preserve at least one pageblock unless we are really running
> > > +  * out of memory
> > > +  */
> > > + if (!force && zone->nr_reserved_highatomic <= pageblock_nr_pages)
> > >   continue;
> > >  
> > >   spin_lock_irqsave(&zone->lock, flags);
> > > @@ -2102,10 +2109,12 @@ static void unreserve_highatomic_pageblock(const struct alloc_context *ac)
> > >   set_pageblock_migratetype(page, ac->migratetype);
> > >   move_freepages_block(zone, page, ac->migratetype);
> > >   spin_unlock_irqrestore(&zone->lock, flags);
> > > - return;
> > > + return true;
> > 
> > Such a cut-off leaves reserved pageblocks remaining before the OOM.
> > We call that a premature OOM kill.
> 
> Not sure I understand. The above should get rid of all atomic reserves
> before we go OOM. We can do it all at once but that sounds too

The problem is that there is a race between the page freeing path and the
unreserve logic, so some pages can sit on the highatomic free list even
though zone->nr_reserved_highatomic is already zero.
So, at least, it would be better to have a draining step at some point,
which was (no_progress_loops == MAX_RECLAIM_RETRIES) in my patch.
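
To illustrate the difference (sketch only; drain is a hypothetical flag
for such a draining pass, untested):

	/* your patch: the zone is skipped once the accounted reserve is gone */
	if (!zone->nr_reserved_highatomic)
		continue;

	/*
	 * draining variant: when draining, keep scanning the zone anyway, so
	 * that free pages left on the highatomic free list by the race with
	 * the freeing path are swept back as well.
	 */
	if (!drain && zone->nr_reserved_highatomic <= pageblock_nr_pages)
		continue;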

Also, your patch makes the retry loop go beyond MAX_RECLAIM_RETRIES
if unreserve_highatomic_pageblock returns true. Theoretically,
it could cause a livelock. You might argue it's *really really* rare,
but I don't want to add such a subtle thing.
Maybe we could drain when no_progress_loops == MAX_RECLAIM_RETRIES.

> aggressive to me. If we just do one at a time we have a chance to
> keep some reserves if the OOM situation is really ephemeral.
> 
> Does this patch work in your usecase?

I didn't test it, but I guess it works; it has the problems I mentioned
above, though.


Re: [PATCH 3/4] mm: unreserve highatomic free pages fully before OOM

2016-10-10 Thread Michal Hocko
On Fri 07-10-16 23:43:45, Minchan Kim wrote:
> On Fri, Oct 07, 2016 at 11:09:17AM +0200, Michal Hocko wrote:
> > On Fri 07-10-16 14:45:35, Minchan Kim wrote:
> > > After fixing the race in the highatomic page count, I still encounter
> > > OOM with a lot of free memory reserved as highatomic.
> > > 
> > > One reason in my testing was that we unreserve free pages only if
> > > reclaim has made progress. Otherwise, we never get a chance to unreserve.
> > > 
> > > The other problem, after fixing that, was that it doesn't guarantee
> > > unreserving every page of the highatomic pageblocks, because it just
> > > releases *a* pageblock, which could hold only a few free pages, so
> > > another context can easily steal it. As a result, the process stuck in
> > > direct reclaim can finally encounter OOM although there are free pages
> > > which could be unreserved.
> > > 
> > > This patch changes the logic so that it unreserves pageblocks
> > > proportionally to no_progress_loops. IOW, in the first reclaim retry
> > > it will try to unreserve one pageblock, in the second retry
> > > 1/MAX_RECLAIM_RETRIES of the reserved pageblocks, and finally all
> > > reserved pageblocks before the OOM.
> > > 
> > > Signed-off-by: Minchan Kim 
> > > ---
> > >  mm/page_alloc.c | 57 
> > > -
> > >  1 file changed, 44 insertions(+), 13 deletions(-)
> > 
> > This sounds much more complex than it needs to be, IMHO. Why wouldn't
> > something as simple as the following work? Please note that I didn't even
> > try to compile this; it is just to give you an idea.
> > ---
> >  mm/page_alloc.c | 26 --
> >  1 file changed, 20 insertions(+), 6 deletions(-)
> > 
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 73f60ad6315f..e575a4f38555 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -2056,7 +2056,8 @@ static void reserve_highatomic_pageblock(struct page *page, struct zone *zone,
> >   * intense memory pressure but failed atomic allocations should be easier
> >   * to recover from than an OOM.
> >   */
> > -static void unreserve_highatomic_pageblock(const struct alloc_context *ac)
> > +static bool unreserve_highatomic_pageblock(const struct alloc_context *ac,
> > +   bool force)
> >  {
> > struct zonelist *zonelist = ac->zonelist;
> > unsigned long flags;
> > @@ -2067,8 +2068,14 @@ static void unreserve_highatomic_pageblock(const struct alloc_context *ac)
> >  
> > for_each_zone_zonelist_nodemask(zone, z, zonelist, ac->high_zoneidx,
> > ac->nodemask) {
> > -   /* Preserve at least one pageblock */
> > -   if (zone->nr_reserved_highatomic <= pageblock_nr_pages)
> > +   if (!zone->nr_reserved_highatomic)
> > +   continue;
> > +
> > +   /*
> > +* Preserve at least one pageblock unless we are really running
> > +* out of memory
> > +*/
> > +   if (!force && zone->nr_reserved_highatomic <= pageblock_nr_pages)
> > continue;
> >  
> > spin_lock_irqsave(&zone->lock, flags);
> > @@ -2102,10 +2109,12 @@ static void unreserve_highatomic_pageblock(const struct alloc_context *ac)
> > set_pageblock_migratetype(page, ac->migratetype);
> > move_freepages_block(zone, page, ac->migratetype);
> > spin_unlock_irqrestore(&zone->lock, flags);
> > -   return;
> > +   return true;
> 
> Such a cut-off leaves reserved pageblocks remaining before the OOM.
> We call that a premature OOM kill.

Not sure I understand. The above should get rid of all atomic reserves
before we go OOM. We can do it all at once but that sounds too
aggressive to me. If we just do one at a time we have a chance to
keep some reserves if the OOM situation is really ephemeral.

Does this patch work in your usecase?
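
For clarity, the policy in the patch below boils down to two call sites
(repeated here as a sketch):

	/* direct reclaim retry path: conservative, keeps one pageblock */
	unreserve_highatomic_pageblock(ac, false);

	/* last attempt in should_reclaim_retry before giving up and going OOM */
	if (unreserve_highatomic_pageblock(ac, true))
		return true;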

> If you feel it's rather complex, we can simply drain *every*
> reserved page when no_progress_loops is greater than MAX_RECLAIM_RETRIES.
> Do you want that?
> 
> It's a rather conservative approach to keeping highatomic pageblocks.
> Anyway, I think it's a matter of policy, and my goal is just to use up
> every reserved pageblock before OOM so that no one ever says
> "Hey, there are enough free pages, more than the min watermark, so
> why should I see an OOM for a GFP_KERNEL 4K allocation?".

I agree that triggering OOM while we are above the min watermarks is less
than optimal, and we should think about how to fix it.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH 3/4] mm: unreserve highatomic free pages fully before OOM

2016-10-07 Thread Minchan Kim
On Fri, Oct 07, 2016 at 11:09:17AM +0200, Michal Hocko wrote:
> On Fri 07-10-16 14:45:35, Minchan Kim wrote:
> > After fixing the race in the highatomic page count, I still encounter
> > OOM with a lot of free memory reserved as highatomic.
> > 
> > One reason in my testing was that we unreserve free pages only if
> > reclaim has made progress. Otherwise, we never get a chance to unreserve.
> > 
> > The other problem, after fixing that, was that it doesn't guarantee
> > unreserving every page of the highatomic pageblocks, because it just
> > releases *a* pageblock, which could hold only a few free pages, so
> > another context can easily steal it. As a result, the process stuck in
> > direct reclaim can finally encounter OOM although there are free pages
> > which could be unreserved.
> > 
> > This patch changes the logic so that it unreserves pageblocks
> > proportionally to no_progress_loops. IOW, in the first reclaim retry
> > it will try to unreserve one pageblock, in the second retry
> > 1/MAX_RECLAIM_RETRIES of the reserved pageblocks, and finally all
> > reserved pageblocks before the OOM.
> > 
> > Signed-off-by: Minchan Kim 
> > ---
> >  mm/page_alloc.c | 57 
> > -
> >  1 file changed, 44 insertions(+), 13 deletions(-)
> 
> This sounds much more complex than it needs to be, IMHO. Why wouldn't
> something as simple as the following work? Please note that I didn't even
> try to compile this; it is just to give you an idea.
> ---
>  mm/page_alloc.c | 26 --
>  1 file changed, 20 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 73f60ad6315f..e575a4f38555 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2056,7 +2056,8 @@ static void reserve_highatomic_pageblock(struct page *page, struct zone *zone,
>   * intense memory pressure but failed atomic allocations should be easier
>   * to recover from than an OOM.
>   */
> -static void unreserve_highatomic_pageblock(const struct alloc_context *ac)
> +static bool unreserve_highatomic_pageblock(const struct alloc_context *ac,
> + bool force)
>  {
>   struct zonelist *zonelist = ac->zonelist;
>   unsigned long flags;
> @@ -2067,8 +2068,14 @@ static void unreserve_highatomic_pageblock(const struct alloc_context *ac)
>  
>   for_each_zone_zonelist_nodemask(zone, z, zonelist, ac->high_zoneidx,
>   ac->nodemask) {
> - /* Preserve at least one pageblock */
> - if (zone->nr_reserved_highatomic <= pageblock_nr_pages)
> + if (!zone->nr_reserved_highatomic)
> + continue;
> +
> + /*
> +  * Preserve at least one pageblock unless we are really running
> +  * out of memory
> +  */
> + if (!force && zone->nr_reserved_highatomic <= pageblock_nr_pages)
>   continue;
>  
>   spin_lock_irqsave(&zone->lock, flags);
> @@ -2102,10 +2109,12 @@ static void unreserve_highatomic_pageblock(const struct alloc_context *ac)
>   set_pageblock_migratetype(page, ac->migratetype);
>   move_freepages_block(zone, page, ac->migratetype);
>   spin_unlock_irqrestore(&zone->lock, flags);
> - return;
> + return true;

Such a cut-off leaves reserved pageblocks remaining before the OOM.
We call that a premature OOM kill.

If you feel it's rather complex, we can simply drain *every*
reserved page when no_progress_loops is greater than MAX_RECLAIM_RETRIES.
Do you want that?

It's a rather conservative approach to keeping highatomic pageblocks.
Anyway, I think it's a matter of policy, and my goal is just to use up
every reserved pageblock before OOM so that no one ever says
"Hey, there are enough free pages, more than the min watermark, so
why should I see an OOM for a GFP_KERNEL 4K allocation?".

So I'm not against it if you guys like it. State your preference.


>   }
>   spin_unlock_irqrestore(&zone->lock, flags);
>   }
> +
> + return false;
>  }
>  
>  /* Remove an element from the buddy allocator from the fallback list */
> @@ -3302,7 +3311,7 @@ __alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order,
>* Shrink them them and try again
>*/
>   if (!page && !drained) {
> - unreserve_highatomic_pageblock(ac);
> + unreserve_highatomic_pageblock(ac, false);
>   drain_all_pages(NULL);
>   drained = true;
>   goto retry;
> @@ -3418,9 +3427,14 @@ should_reclaim_retry(gfp_t gfp_mask, unsigned order,
>   /*
>* Make sure we converge to OOM if we cannot make any progress
>* several times in the row.
> +  * Do last desparate attempt to throw high atomic reserves away
> +  * before we give up
>*/
> - if (*no_progress_loops > MAX_RECLAIM_RETRIES)

Re: [PATCH 3/4] mm: unreserve highatomic free pages fully before OOM

2016-10-07 Thread Michal Hocko
On Fri 07-10-16 14:45:35, Minchan Kim wrote:
> After fixing the race in the highatomic page count, I still encounter
> OOM with a lot of free memory reserved as highatomic.
> 
> One reason in my testing was that we unreserve free pages only if
> reclaim has made progress. Otherwise, we never get a chance to unreserve.
> 
> The other problem, after fixing that, was that it doesn't guarantee
> unreserving every page of the highatomic pageblocks, because it just
> releases *a* pageblock, which could hold only a few free pages, so
> another context can easily steal it. As a result, the process stuck in
> direct reclaim can finally encounter OOM although there are free pages
> which could be unreserved.
> 
> This patch changes the logic so that it unreserves pageblocks
> proportionally to no_progress_loops. IOW, in the first reclaim retry
> it will try to unreserve one pageblock, in the second retry
> 1/MAX_RECLAIM_RETRIES of the reserved pageblocks, and finally all
> reserved pageblocks before the OOM.
> 
> Signed-off-by: Minchan Kim 
> ---
>  mm/page_alloc.c | 57 
> -
>  1 file changed, 44 insertions(+), 13 deletions(-)
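
As a rough sketch, the proportional scheme described above amounts to
something like the following per retry (reserved_pageblocks is a
hypothetical name for the number of currently reserved highatomic
pageblocks; the posted patch itself is not reproduced in this excerpt):

	/* unreserve in proportion to how close we are to declaring OOM */
	nr_to_unreserve = no_progress_loops * reserved_pageblocks /
						MAX_RECLAIM_RETRIES;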

This sounds much more complex than it needs to be, IMHO. Why wouldn't
something as simple as the following work? Please note that I didn't even
try to compile this; it is just to give you an idea.
---
 mm/page_alloc.c | 26 --
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 73f60ad6315f..e575a4f38555 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2056,7 +2056,8 @@ static void reserve_highatomic_pageblock(struct page *page, struct zone *zone,
  * intense memory pressure but failed atomic allocations should be easier
  * to recover from than an OOM.
  */
-static void unreserve_highatomic_pageblock(const struct alloc_context *ac)
+static bool unreserve_highatomic_pageblock(const struct alloc_context *ac,
+   bool force)
 {
struct zonelist *zonelist = ac->zonelist;
unsigned long flags;
@@ -2067,8 +2068,14 @@ static void unreserve_highatomic_pageblock(const struct alloc_context *ac)
 
for_each_zone_zonelist_nodemask(zone, z, zonelist, ac->high_zoneidx,
ac->nodemask) {
-   /* Preserve at least one pageblock */
-   if (zone->nr_reserved_highatomic <= pageblock_nr_pages)
+   if (!zone->nr_reserved_highatomic)
+   continue;
+
+   /*
+* Preserve at least one pageblock unless we are really running
+* out of memory
+*/
+   if (!force && zone->nr_reserved_highatomic <= pageblock_nr_pages)
continue;
 
spin_lock_irqsave(&zone->lock, flags);
@@ -2102,10 +2109,12 @@ static void unreserve_highatomic_pageblock(const struct alloc_context *ac)
set_pageblock_migratetype(page, ac->migratetype);
move_freepages_block(zone, page, ac->migratetype);
spin_unlock_irqrestore(&zone->lock, flags);
-   return;
+   return true;
}
spin_unlock_irqrestore(&zone->lock, flags);
}
+
+   return false;
 }
 
 /* Remove an element from the buddy allocator from the fallback list */
@@ -3302,7 +3311,7 @@ __alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order,
 * Shrink them them and try again
 */
if (!page && !drained) {
-   unreserve_highatomic_pageblock(ac);
+   unreserve_highatomic_pageblock(ac, false);
drain_all_pages(NULL);
drained = true;
goto retry;
@@ -3418,9 +3427,14 @@ should_reclaim_retry(gfp_t gfp_mask, unsigned order,
/*
 * Make sure we converge to OOM if we cannot make any progress
 * several times in the row.
+* Do last desparate attempt to throw high atomic reserves away
+* before we give up
 */
-   if (*no_progress_loops > MAX_RECLAIM_RETRIES)
+   if (*no_progress_loops > MAX_RECLAIM_RETRIES) {
+   if (unreserve_highatomic_pageblock(ac, true))
+   return true;
return false;
+   }
 
/*
 * Keep reclaiming pages while there is a chance this will lead
-- 
Michal Hocko
SUSE Labs