Re: [PATCH] mm/page_alloc: try oom if reclaim is unable to make forward progress

2021-03-26 Thread Aaron Tomlin
On Fri 2021-03-26 16:36 +0100, Michal Hocko wrote:
> We should be focusing on the compaction retry logic and see whether we
> can have some "run away" scenarios there.

Fair enough.

> Seeing so many retries without compaction bailing out sounds like a bug
> in that retry logic.

Agreed - I will continue to look into this.



Kind regards,

-- 
Aaron Tomlin



Re: [PATCH] mm/page_alloc: try oom if reclaim is unable to make forward progress

2021-03-26 Thread Michal Hocko
On Fri 26-03-21 11:22:54, Aaron Tomlin wrote:
[...]
> > Both reclaim and compaction maintain their own retries counters as they
> > are targeting a different operation. Although the compaction really
> > depends on the reclaim to do some progress.
> 
> Yes. Looking at should_compact_retry() if the last known compaction result
> was skipped i.e. suggesting there was not enough order-0 pages to support
> compaction, so assistance is needed from reclaim
> (see __compaction_suitable()).
> 
> I noticed that the value of compaction_retries, compact_result and
> compact_priority was 0, COMPACT_SKIPPED and 1 i.e. COMPACT_PRIO_SYNC_LIGHT,
> respectively.
> 
> > OK, this sound unexpected as it indicates that the reclaim is able to
> > make a forward progress but compaction doesn't want to give up and keeps
> > retrying. Are you able to reproduce this or could you find out which
> > specific condition keeps compaction retrying? I would expect that it is
> > one of the 3 conditions before the max_retries is checked.
> 
> Unfortunately, I have been told it is not entirely reproducible.
> I suspect it is the following in should_compact_retry() - as I indicated
> above the last known value stored in compaction_retries was 0:
> 
> 
> if (order > PAGE_ALLOC_COSTLY_ORDER)
> max_retries /= 4;
> if (*compaction_retries <= max_retries) {
> ret = true;
> goto out;
> }

OK, I kinda expected this would be not easily reproducible. The reason I
dislike your patch is that it addes yet another criterion for oom while
we already do have 2 which doesn't make the resulting code easier to
reason about. We should be focusing on the compaction retry logic and
see whether we can have some "run away" scenarios there. Seeing so many
retries without compaction bailing out sounds like a bug in that retry
logic. Vlastimil is much more familiar with that.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH] mm/page_alloc: try oom if reclaim is unable to make forward progress

2021-03-26 Thread Aaron Tomlin
Hi Michal,

On Fri 2021-03-26 09:16 +0100, Michal Hocko wrote:
> The oom killer is never triggered for costly allocation request.

Yes - I agree. Looking at __alloc_pages_may_oom() I can see for a costly
order allocation request the OOM killer is explicitly not used.
If I understand correctly, the patch I proposed was for the following
scenarios:

  1.The costly order allocation request to fail when
"some" progress is made (i.e. order-0) and the last
compaction request was "skipped"

  2.A non-costly order allocation request that is
unable to make any progress and failed over the
maximum reclaim retry count in a row and the last
known compaction result was skipped to consider
using the OOM killer for assistance

> Both reclaim and compaction maintain their own retries counters as they
> are targeting a different operation. Although the compaction really
> depends on the reclaim to do some progress.

Yes. Looking at should_compact_retry() if the last known compaction result
was skipped i.e. suggesting there was not enough order-0 pages to support
compaction, so assistance is needed from reclaim
(see __compaction_suitable()).

I noticed that the value of compaction_retries, compact_result and
compact_priority was 0, COMPACT_SKIPPED and 1 i.e. COMPACT_PRIO_SYNC_LIGHT,
respectively.

> OK, this sound unexpected as it indicates that the reclaim is able to
> make a forward progress but compaction doesn't want to give up and keeps
> retrying. Are you able to reproduce this or could you find out which
> specific condition keeps compaction retrying? I would expect that it is
> one of the 3 conditions before the max_retries is checked.

Unfortunately, I have been told it is not entirely reproducible.
I suspect it is the following in should_compact_retry() - as I indicated
above the last known value stored in compaction_retries was 0:


if (order > PAGE_ALLOC_COSTLY_ORDER)
max_retries /= 4;
if (*compaction_retries <= max_retries) {
ret = true;
goto out;
}




Kind regards,

-- 
Aaron Tomlin



Re: [PATCH] mm/page_alloc: try oom if reclaim is unable to make forward progress

2021-03-26 Thread Michal Hocko
[Cc Vlastimil]

On Thu 25-03-21 21:01:59, Aaron Tomlin wrote:
> On Mon 2021-03-22 11:47 +0100, Michal Hocko wrote:
> > Costly orders already do have heuristics for the retry in place. Could
> > you be more specific what kind of problem you see with those?
> 
> If I understand correctly, when the gfp_mask consists of
> GFP_KERNEL | __GFP_RETRY_MAYFAIL in particular, an allocation request will
> fail, if and only if reclaim is unable to make progress.
> 
> The costly order allocation retry logic is handled primarily in
> should_reclaim_retry(). Looking at should_reclaim_retry() we see that the
> no progress counter value is always incremented in the costly order case
> even when "some" progress has been made which is represented by the value
> stored in did_some_progress.
> 
> if (costly_order && !(gfp_mask & __GFP_RETRY_MAYFAIL))
> goto nopage;
> 
> if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags,
>  did_some_progress > 0, &no_progress_loops))
> goto retry;
> 
> I think after we have tried MAX_RECLAIM_RETRIES in a row without success
> and the last known compaction attempt was "skipped", perhaps it would be
> better to try and use the OOM killer or fail the allocation attempt?

The oom killer is never triggered for costly allocation request. Both
reclaim and compaction maintain their own retries counters as they are
targeting a different operation. Although the compaction really depends
on the reclaim to do some progress.

> I encountered a situation when the value of no_progress_loops was found to
> be 31,611,688 i.e. significantly over MAX_RECLAIM_RETRIES; the allocation
> order was 5. The gfp_mask contained the following:

OK, this sound unexpected as it indicates that the reclaim is able to
make a forward progress but compaction doesn't want to give up and keeps
retrying. Are you able to reproduce this or could you find out which
specific condition keeps compaction retrying? I would expect that it is
one of the 3 conditions before the max_retries is checked.

>  #define ___GFP_HIGHMEM  0x02
>  #define ___GFP_IO   0x40
>  #define ___GFP_FS   0x80
>  #define ___GFP_NOWARN   0x200
>  #define ___GFP_RETRY_MAYFAIL0x400
>  #define ___GFP_COMP 0x4000
>  #define ___GFP_HARDWALL 0x2
>  #define ___GFP_DIRECT_RECLAIM   0x20
>  #define ___GFP_KSWAPD_RECLAIM   0x40
> 
> 
> 
> -- 
> Aaron Tomlin
> 

-- 
Michal Hocko
SUSE Labs


Re: [PATCH] mm/page_alloc: try oom if reclaim is unable to make forward progress

2021-03-25 Thread Aaron Tomlin
On Mon 2021-03-22 11:47 +0100, Michal Hocko wrote:
> Costly orders already do have heuristics for the retry in place. Could
> you be more specific what kind of problem you see with those?

If I understand correctly, when the gfp_mask consists of
GFP_KERNEL | __GFP_RETRY_MAYFAIL in particular, an allocation request will
fail, if and only if reclaim is unable to make progress.

The costly order allocation retry logic is handled primarily in
should_reclaim_retry(). Looking at should_reclaim_retry() we see that the
no progress counter value is always incremented in the costly order case
even when "some" progress has been made which is represented by the value
stored in did_some_progress.

if (costly_order && !(gfp_mask & __GFP_RETRY_MAYFAIL))
goto nopage;

if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags,
 did_some_progress > 0, &no_progress_loops))
goto retry;

I think after we have tried MAX_RECLAIM_RETRIES in a row without success
and the last known compaction attempt was "skipped", perhaps it would be
better to try and use the OOM killer or fail the allocation attempt?

I encountered a situation when the value of no_progress_loops was found to
be 31,611,688 i.e. significantly over MAX_RECLAIM_RETRIES; the allocation
order was 5. The gfp_mask contained the following:

 #define ___GFP_HIGHMEM  0x02
 #define ___GFP_IO   0x40
 #define ___GFP_FS   0x80
 #define ___GFP_NOWARN   0x200
 #define ___GFP_RETRY_MAYFAIL0x400
 #define ___GFP_COMP 0x4000
 #define ___GFP_HARDWALL 0x2
 #define ___GFP_DIRECT_RECLAIM   0x20
 #define ___GFP_KSWAPD_RECLAIM   0x40



-- 
Aaron Tomlin



Re: [PATCH] mm/page_alloc: try oom if reclaim is unable to make forward progress

2021-03-22 Thread Michal Hocko
On Fri 19-03-21 17:29:01, Aaron Tomlin wrote:
> Hi Michal,
> 
> On Thu 2021-03-18 17:16 +0100, Michal Hocko wrote:
> > On Mon 15-03-21 16:58:37, Aaron Tomlin wrote:
> > > In the situation where direct reclaim is required to make progress for
> > > compaction but no_progress_loops is already over the limit of
> > > MAX_RECLAIM_RETRIES consider invoking the oom killer.
> 
> Firstly, thank you for your response.
> 
> > What is the problem you are trying to fix?
> 
> If I understand correctly, in the case of a "costly" order allocation
> request that is permitted to repeatedly retry, it is possible to exceed the
> maximum reclaim retry threshold as long as "some" progress is being made
> even at the highest compaction priority.

Costly orders already do have heuristics for the retry in place. Could
you be more specific what kind of problem you see with those?

> Furthermore, if the allocator has a fatal signal pending, this is not
> considered.

Fatal signals pending are usually not a strong reason to cut retries
count or fail allocations.

> In my opinion, it might be better to just give up straight away or try and
> use the OOM killer only in the non-costly order allocation scenario to
> assit reclaim. Looking at __alloc_pages_may_oom() the current logic is to
> entirely skip the OOM killer for a costly order request, which makes sense.

Well, opinions might differ of course. The main question is whether
there are workloads which are unhappy about the existing behavior.

-- 
Michal Hocko
SUSE Labs


Re: [PATCH] mm/page_alloc: try oom if reclaim is unable to make forward progress

2021-03-19 Thread Aaron Tomlin
Hi Michal,

On Thu 2021-03-18 17:16 +0100, Michal Hocko wrote:
> On Mon 15-03-21 16:58:37, Aaron Tomlin wrote:
> > In the situation where direct reclaim is required to make progress for
> > compaction but no_progress_loops is already over the limit of
> > MAX_RECLAIM_RETRIES consider invoking the oom killer.

Firstly, thank you for your response.

> What is the problem you are trying to fix?

If I understand correctly, in the case of a "costly" order allocation
request that is permitted to repeatedly retry, it is possible to exceed the
maximum reclaim retry threshold as long as "some" progress is being made
even at the highest compaction priority. Furthermore, if the allocator has
a fatal signal pending, this is not considered.

In my opinion, it might be better to just give up straight away or try and
use the OOM killer only in the non-costly order allocation scenario to
assit reclaim. Looking at __alloc_pages_may_oom() the current logic is to
entirely skip the OOM killer for a costly order request, which makes sense.


Regards,

-- 
Aaron Tomlin



Re: [PATCH] mm/page_alloc: try oom if reclaim is unable to make forward progress

2021-03-18 Thread Michal Hocko
On Mon 15-03-21 16:58:37, Aaron Tomlin wrote:
> In the situation where direct reclaim is required to make progress for
> compaction but no_progress_loops is already over the limit of
> MAX_RECLAIM_RETRIES consider invoking the oom killer.

What is the problem you are trying to fix?

> 
> Signed-off-by: Aaron Tomlin 
> ---
>  mm/page_alloc.c | 22 ++
>  1 file changed, 18 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 7a2c89b21115..8d748b1b8d1e 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4181,6 +4181,16 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned 
> int order,
>   return NULL;
>  }
>  
> +static inline bool
> +should_try_oom(int no_progress_loops,
> + enum compact_result last_compact_result)
> +{
> + if (no_progress_loops > MAX_RECLAIM_RETRIES && last_compact_result
> + == COMPACT_SKIPPED)
> + return true;
> + return false;
> +}
> +
>  static inline bool
>  should_compact_retry(struct alloc_context *ac, int order, int alloc_flags,
>enum compact_result compact_result,
> @@ -4547,10 +4557,11 @@ should_reclaim_retry(gfp_t gfp_mask, unsigned order,
>* Make sure we converge to OOM if we cannot make any progress
>* several times in the row.
>*/
> - if (*no_progress_loops > MAX_RECLAIM_RETRIES) {
> - /* Before OOM, exhaust highatomic_reserve */
> - return unreserve_highatomic_pageblock(ac, true);
> - }
> + if (*no_progress_loops > MAX_RECLAIM_RETRIES)
> + result false;
> + /* Last chance before OOM, try draining highatomic_reserve once */
> + else if (*no_progress_loops == MAX_RECLAIM_RETRIES)
> + return unreserve_highatomic_pageblock(ac, true)
>  
>   /*
>* Keep reclaiming pages while there is a chance this will lead
> @@ -4822,6 +4833,8 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int 
> order,
>did_some_progress > 0, &no_progress_loops))
>   goto retry;
>  
> + if (should_try_oom(no_progress_loops, compact_result))
> + goto oom:
>   /*
>* It doesn't make any sense to retry for the compaction if the order-0
>* reclaim is not able to make any progress because the current
> @@ -4839,6 +4852,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int 
> order,
>   if (check_retry_cpuset(cpuset_mems_cookie, ac))
>   goto retry_cpuset;
>  
> +oom:
>   /* Reclaim has failed us, start killing things */
>   page = __alloc_pages_may_oom(gfp_mask, order, ac, &did_some_progress);
>   if (page)
> -- 
> 2.26.2
> 

-- 
Michal Hocko
SUSE Labs


Re: [PATCH] mm/page_alloc: try oom if reclaim is unable to make forward progress

2021-03-15 Thread kernel test robot
Hi Aaron,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on hnaz-linux-mm/master]

url:
https://github.com/0day-ci/linux/commits/Aaron-Tomlin/mm-page_alloc-try-oom-if-reclaim-is-unable-to-make-forward-progress/20210316-010203
base:   https://github.com/hnaz/linux-mm master
config: riscv-randconfig-r026-20210315 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project 
a28facba1ccdc957f386b7753f4958307f1bfde8)
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# install riscv cross compiling tool for clang build
# apt-get install binutils-riscv64-linux-gnu
# 
https://github.com/0day-ci/linux/commit/77338aaff2606a7715c832545e79370e849e3b4e
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review 
Aaron-Tomlin/mm-page_alloc-try-oom-if-reclaim-is-unable-to-make-forward-progress/20210316-010203
git checkout 77338aaff2606a7715c832545e79370e849e3b4e
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=riscv 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

 ^
   In file included from mm/page_alloc.c:20:
   In file included from include/linux/highmem.h:10:
   In file included from include/linux/hardirq.h:10:
   In file included from ./arch/riscv/include/generated/asm/hardirq.h:1:
   In file included from include/asm-generic/hardirq.h:13:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/riscv/include/asm/io.h:148:
   include/asm-generic/io.h:572:9: warning: performing pointer arithmetic on a 
null pointer has undefined behavior [-Wnull-pointer-arithmetic]
   return inl(addr);
  ^
   arch/riscv/include/asm/io.h:56:76: note: expanded from macro 'inl'
   #define inl(c)  ({ u32 __v; __io_pbr(); __v = 
readl_cpu((void*)(PCI_IOBASE + (c))); __io_par(__v); __v; })
   
~~ ^
   arch/riscv/include/asm/mmio.h:89:76: note: expanded from macro 'readl_cpu'
   #define readl_cpu(c)({ u32 __r = le32_to_cpu((__force 
__le32)__raw_readl(c)); __r; })

^
   include/uapi/linux/byteorder/little_endian.h:34:51: note: expanded from 
macro '__le32_to_cpu'
   #define __le32_to_cpu(x) ((__force __u32)(__le32)(x))
 ^
   In file included from mm/page_alloc.c:20:
   In file included from include/linux/highmem.h:10:
   In file included from include/linux/hardirq.h:10:
   In file included from ./arch/riscv/include/generated/asm/hardirq.h:1:
   In file included from include/asm-generic/hardirq.h:13:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/riscv/include/asm/io.h:148:
   include/asm-generic/io.h:580:2: warning: performing pointer arithmetic on a 
null pointer has undefined behavior [-Wnull-pointer-arithmetic]
   outb(value, addr);
   ^
   arch/riscv/include/asm/io.h:58:68: note: expanded from macro 'outb'
   #define outb(v,c)   ({ __io_pbw(); writeb_cpu((v),(void*)(PCI_IOBASE + 
(c))); __io_paw(); })
 ~~ ^
   arch/riscv/include/asm/mmio.h:91:52: note: expanded from macro 'writeb_cpu'
   #define writeb_cpu(v, c)((void)__raw_writeb((v), (c)))
 ^
   In file included from mm/page_alloc.c:20:
   In file included from include/linux/highmem.h:10:
   In file included from include/linux/hardirq.h:10:
   In file included from ./arch/riscv/include/generated/asm/hardirq.h:1:
   In file included from include/asm-generic/hardirq.h:13:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/riscv/include/asm/io.h:148:
   include/asm-generic/io.h:588:2: warning: performing pointer arithmetic on a 
null pointer has undefined behavior [-Wnull-pointer-arithmetic]
   outw(value, addr);
   ^
   arch/riscv/include/asm/io.h:59:68: note: expanded from macro 'outw'
   #define outw(v,c)   ({ __io_pbw(); writew_cpu((v),(void*)(PCI_IOBASE + 
(c))); __io_paw(); })
 ~~ ^
   arch/riscv/include/asm/mmio.h:92:76: note: expanded from macro 'writew_cpu'
   #define writew_cpu(v, c)((void)__raw_writew((__force 
u16)cpu_to_

Re: [PATCH] mm/page_alloc: try oom if reclaim is unable to make forward progress

2021-03-15 Thread kernel test robot
Hi Aaron,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on hnaz-linux-mm/master]

url:
https://github.com/0day-ci/linux/commits/Aaron-Tomlin/mm-page_alloc-try-oom-if-reclaim-is-unable-to-make-forward-progress/20210316-010203
base:   https://github.com/hnaz/linux-mm master
config: powerpc64-randconfig-r012-20210315 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project 
a28facba1ccdc957f386b7753f4958307f1bfde8)
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# install powerpc64 cross compiling tool for clang build
# apt-get install binutils-powerpc64-linux-gnu
# 
https://github.com/0day-ci/linux/commit/77338aaff2606a7715c832545e79370e849e3b4e
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review 
Aaron-Tomlin/mm-page_alloc-try-oom-if-reclaim-is-unable-to-make-forward-progress/20210316-010203
git checkout 77338aaff2606a7715c832545e79370e849e3b4e
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross 
ARCH=powerpc64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

   mm/page_alloc.c:2538:5: warning: no previous prototype for function 
'find_suitable_fallback' [-Wmissing-prototypes]
   int find_suitable_fallback(struct free_area *area, unsigned int order,
   ^
   mm/page_alloc.c:2538:1: note: declare 'static' if the function is not 
intended to be used outside of this translation unit
   int find_suitable_fallback(struct free_area *area, unsigned int order,
   ^
   static 
>> mm/page_alloc.c::3: error: use of undeclared identifier 'result'
   result false;
   ^
>> mm/page_alloc.c:4447:50: error: expected ';' after return statement
   return unreserve_highatomic_pageblock(ac, true)
  ^
  ;
>> mm/page_alloc.c:4507:2: error: expected expression
   else
   ^
>> mm/page_alloc.c:4719:6: error: implicit declaration of function 
>> 'should_try_oom' [-Werror,-Wimplicit-function-declaration]
   if (should_try_oom(no_progress_loops, compact_result))
   ^
>> mm/page_alloc.c:4720:11: error: expected ';' after goto statement
   goto oom:
   ^
   ;
   mm/page_alloc.c:6136:23: warning: no previous prototype for function 
'memmap_init' [-Wmissing-prototypes]
   void __meminit __weak memmap_init(unsigned long size, int nid,
 ^
   mm/page_alloc.c:6136:1: note: declare 'static' if the function is not 
intended to be used outside of this translation unit
   void __meminit __weak memmap_init(unsigned long size, int nid,
   ^
   static 
   2 warnings and 5 errors generated.


vim +/result + mm/page_alloc.c

  4409  
  4410  /*
  4411   * Checks whether it makes sense to retry the reclaim to make a forward 
progress
  4412   * for the given allocation request.
  4413   *
  4414   * We give up when we either have tried MAX_RECLAIM_RETRIES in a row
  4415   * without success, or when we couldn't even meet the watermark if we
  4416   * reclaimed all remaining pages on the LRU lists.
  4417   *
  4418   * Returns true if a retry is viable or false to enter the oom path.
  4419   */
  4420  static inline bool
  4421  should_reclaim_retry(gfp_t gfp_mask, unsigned order,
  4422   struct alloc_context *ac, int alloc_flags,
  4423   bool did_some_progress, int *no_progress_loops)
  4424  {
  4425  struct zone *zone;
  4426  struct zoneref *z;
  4427  bool ret = false;
  4428  
  4429  /*
  4430   * Costly allocations might have made a progress but this 
doesn't mean
  4431   * their order will become available due to high fragmentation 
so
  4432   * always increment the no progress counter for them
  4433   */
  4434  if (did_some_progress && order <= PAGE_ALLOC_COSTLY_ORDER)
  4435  *no_progress_loops = 0;
  4436  else
  4437  (*no_progress_loops)++;
  4438  
  4439  /*
  4440   * Make sure we converge to OOM if we cannot make any progress
  4441   * several times in the row.
  4442   */
  4443  if (*no_progress_loops > MAX_RECLAIM_RETRIES)
>   result false;
  4445  /* Last chance before OOM, try draining highatomic_reserve once 
*/
  4446  else if (*no_progress_loops == MAX_RECLAIM_RETRIES)
> 4447  return unreserve_highatomic_pageblock(ac, true)
  4448  
  4449 

Re: [PATCH] mm/page_alloc: try oom if reclaim is unable to make forward progress

2021-03-15 Thread kernel test robot
Hi Aaron,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on hnaz-linux-mm/master]

url:
https://github.com/0day-ci/linux/commits/Aaron-Tomlin/mm-page_alloc-try-oom-if-reclaim-is-unable-to-make-forward-progress/20210316-010203
base:   https://github.com/hnaz/linux-mm master
config: arc-randconfig-r024-20210315 (attached as .config)
compiler: arceb-elf-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# 
https://github.com/0day-ci/linux/commit/77338aaff2606a7715c832545e79370e849e3b4e
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review 
Aaron-Tomlin/mm-page_alloc-try-oom-if-reclaim-is-unable-to-make-forward-progress/20210316-010203
git checkout 77338aaff2606a7715c832545e79370e849e3b4e
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=arc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All error/warnings (new ones prefixed by >>):

   mm/page_alloc.c: In function 'should_reclaim_retry':
>> mm/page_alloc.c::3: error: 'result' undeclared (first use in this 
>> function)
 |   result false;
 |   ^~
   mm/page_alloc.c::3: note: each undeclared identifier is reported only 
once for each function it appears in
>> mm/page_alloc.c::9: error: expected ';' before 'false'
 |   result false;
 | ^~
 | ;
>> mm/page_alloc.c:4447:50: error: expected ';' before 'for'
4447 |   return unreserve_highatomic_pageblock(ac, true)
 |  ^
 |  ;
   mm/page_alloc.c:4426:18: warning: unused variable 'z' [-Wunused-variable]
4426 |  struct zoneref *z;
 |  ^
   mm/page_alloc.c:4425:15: warning: unused variable 'zone' [-Wunused-variable]
4425 |  struct zone *zone;
 |   ^~~~
   mm/page_alloc.c: In function '__alloc_pages_slowpath':
>> mm/page_alloc.c:4720:11: error: expected ';' before ':' token
4720 |   goto oom:
 |   ^
 |   ;
>> mm/page_alloc.c:4556:6: warning: variable 'compaction_retries' set but not 
>> used [-Wunused-but-set-variable]
4556 |  int compaction_retries;
 |  ^~
   mm/page_alloc.c: At top level:
   mm/page_alloc.c:6136:23: warning: no previous prototype for 'memmap_init' 
[-Wmissing-prototypes]
6136 | void __meminit __weak memmap_init(unsigned long size, int nid,
 |   ^~~


vim +/result + mm/page_alloc.c

  4409  
  4410  /*
  4411   * Checks whether it makes sense to retry the reclaim to make a forward 
progress
  4412   * for the given allocation request.
  4413   *
  4414   * We give up when we either have tried MAX_RECLAIM_RETRIES in a row
  4415   * without success, or when we couldn't even meet the watermark if we
  4416   * reclaimed all remaining pages on the LRU lists.
  4417   *
  4418   * Returns true if a retry is viable or false to enter the oom path.
  4419   */
  4420  static inline bool
  4421  should_reclaim_retry(gfp_t gfp_mask, unsigned order,
  4422   struct alloc_context *ac, int alloc_flags,
  4423   bool did_some_progress, int *no_progress_loops)
  4424  {
  4425  struct zone *zone;
  4426  struct zoneref *z;
  4427  bool ret = false;
  4428  
  4429  /*
  4430   * Costly allocations might have made a progress but this 
doesn't mean
  4431   * their order will become available due to high fragmentation 
so
  4432   * always increment the no progress counter for them
  4433   */
  4434  if (did_some_progress && order <= PAGE_ALLOC_COSTLY_ORDER)
  4435  *no_progress_loops = 0;
  4436  else
  4437  (*no_progress_loops)++;
  4438  
  4439  /*
  4440   * Make sure we converge to OOM if we cannot make any progress
  4441   * several times in the row.
  4442   */
  4443  if (*no_progress_loops > MAX_RECLAIM_RETRIES)
>   result false;
  4445  /* Last chance before OOM, try draining highatomic_reserve once 
*/
  4446  else if (*no_progress_loops == MAX_RECLAIM_RETRIES)
> 4447  return unreserve_highatomic_pageblock(ac, true)
  4448  
  4449  /*
  4450   * Keep reclaiming pages while there is a chance this will lead
  4451   * somewhere.  If none of the target zones can satisfy our 
allocation
  4452   * request even if all reclaimable pages are considered then we 
are
  4453   * screwed and have to go OOM.
  4454

[PATCH] mm/page_alloc: try oom if reclaim is unable to make forward progress

2021-03-15 Thread Aaron Tomlin
In the situation where direct reclaim is required to make progress for
compaction but no_progress_loops is already over the limit of
MAX_RECLAIM_RETRIES consider invoking the oom killer.

Signed-off-by: Aaron Tomlin 
---
 mm/page_alloc.c | 22 ++
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7a2c89b21115..8d748b1b8d1e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4181,6 +4181,16 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned 
int order,
return NULL;
 }
 
+static inline bool
+should_try_oom(int no_progress_loops,
+   enum compact_result last_compact_result)
+{
+   if (no_progress_loops > MAX_RECLAIM_RETRIES && last_compact_result
+   == COMPACT_SKIPPED)
+   return true;
+   return false;
+}
+
 static inline bool
 should_compact_retry(struct alloc_context *ac, int order, int alloc_flags,
 enum compact_result compact_result,
@@ -4547,10 +4557,11 @@ should_reclaim_retry(gfp_t gfp_mask, unsigned order,
 * Make sure we converge to OOM if we cannot make any progress
 * several times in the row.
 */
-   if (*no_progress_loops > MAX_RECLAIM_RETRIES) {
-   /* Before OOM, exhaust highatomic_reserve */
-   return unreserve_highatomic_pageblock(ac, true);
-   }
+   if (*no_progress_loops > MAX_RECLAIM_RETRIES)
+   result false;
+   /* Last chance before OOM, try draining highatomic_reserve once */
+   else if (*no_progress_loops == MAX_RECLAIM_RETRIES)
+   return unreserve_highatomic_pageblock(ac, true)
 
/*
 * Keep reclaiming pages while there is a chance this will lead
@@ -4822,6 +4833,8 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 did_some_progress > 0, &no_progress_loops))
goto retry;
 
+   if (should_try_oom(no_progress_loops, compact_result))
+   goto oom:
/*
 * It doesn't make any sense to retry for the compaction if the order-0
 * reclaim is not able to make any progress because the current
@@ -4839,6 +4852,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
if (check_retry_cpuset(cpuset_mems_cookie, ac))
goto retry_cpuset;
 
+oom:
/* Reclaim has failed us, start killing things */
page = __alloc_pages_may_oom(gfp_mask, order, ac, &did_some_progress);
if (page)
-- 
2.26.2