Re: [PATCH v2 1/2] mm: disable LRU pagevec during the migration temporarily

2021-03-09 Thread Michal Hocko
On Tue 09-03-21 08:29:21, Minchan Kim wrote:
> On Tue, Mar 09, 2021 at 12:03:08PM +0100, Michal Hocko wrote:
[...]
> > Sorry for nit picking but I think the additional abstraction for
> > migrate_prep is not really needed and we can remove some more code.
> > Maybe we should even get rid of migrate_prep_local which only has a
> > single caller and open coding lru draining with a comment would be
> > better from code reading POV IMO.
> 
> Thanks for the code. I agree with you.
> However, in this moment, let's go with this one until we conclude.
> The removal of migrate_prep could be easily done after that.
> I am happy to work on it.

I will leave that up to you but I find it a bit pointless to add
migrate_finish just to remove it in the next patch.

Btw. you should also move lru_cache_disabled up in swap.c to fix up
compilation issues by 0 day bot. I didn't do that in my version.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH v2 1/2] mm: disable LRU pagevec during the migration temporarily

2021-03-09 Thread Minchan Kim
On Tue, Mar 09, 2021 at 05:31:09PM +0100, David Hildenbrand wrote:
> 
> > > > Signed-off-by: Minchan Kim 
> > > > ---
> > > > * from v1 - 
> > > > https://lore.kernel.org/lkml/20210302210949.2440120-1-minc...@kernel.org/
> > > >* introduce __lru_add_drain_all to minimize changes - mhocko
> > > >* use lru_cache_disable for memory-hotplug
> > > >* schedule for every cpu at force_all_cpus
> > > > 
> > > > * from RFC - 
> > > > http://lore.kernel.org/linux-mm/20210216170348.1513483-1-minc...@kernel.org
> > > >* use atomic and lru_add_drain_all for strict ordering - mhocko
> > > >* lru_cache_disable/enable - mhocko
> > > > 
> > > >   include/linux/migrate.h |  6 ++-
> > > >   include/linux/swap.h|  2 +
> > > >   mm/memory_hotplug.c |  3 +-
> > > >   mm/mempolicy.c  |  6 +++
> > > >   mm/migrate.c| 13 ---
> > > >   mm/page_alloc.c |  3 ++
> > > >   mm/swap.c   | 82 +
> > > >   7 files changed, 91 insertions(+), 24 deletions(-)
> > > 
> > > Sorry for nit picking but I think the additional abstraction for
> > > migrate_prep is not really needed and we can remove some more code.
> > > Maybe we should even get rid of migrate_prep_local which only has a
> > > single caller and open coding lru draining with a comment would be
> > > better from code reading POV IMO.
> > 
> > Thanks for the code. I agree with you.
> > However, in this moment, let's go with this one until we conclude.
> > The removal of migrate_prep could be easily done after that.
> > I am happy to work on it.
> 
> Can you prepare + send along these cleanups so we can have a look at the end
> result?
> 
> (either cleanups before or after your changes - doing cleanups before might
> be cleaner as we are not dealing with a fix here that we want to backport)

Okay, let me try one more time.


Re: [PATCH v2 1/2] mm: disable LRU pagevec during the migration temporarily

2021-03-09 Thread David Hildenbrand




Signed-off-by: Minchan Kim 
---
* from v1 - 
https://lore.kernel.org/lkml/20210302210949.2440120-1-minc...@kernel.org/
   * introduce __lru_add_drain_all to minimize changes - mhocko
   * use lru_cache_disable for memory-hotplug
   * schedule for every cpu at force_all_cpus

* from RFC - 
http://lore.kernel.org/linux-mm/20210216170348.1513483-1-minc...@kernel.org
   * use atomic and lru_add_drain_all for strict ordering - mhocko
   * lru_cache_disable/enable - mhocko

  include/linux/migrate.h |  6 ++-
  include/linux/swap.h|  2 +
  mm/memory_hotplug.c |  3 +-
  mm/mempolicy.c  |  6 +++
  mm/migrate.c| 13 ---
  mm/page_alloc.c |  3 ++
  mm/swap.c   | 82 +
  7 files changed, 91 insertions(+), 24 deletions(-)


Sorry for nit picking but I think the additional abstraction for
migrate_prep is not really needed and we can remove some more code.
Maybe we should even get rid of migrate_prep_local which only has a
single caller and open coding lru draining with a comment would be
better from code reading POV IMO.


Thanks for the code. I agree with you.
However, in this moment, let's go with this one until we conclude.
The removal of migrate_prep could be easily done after that.
I am happy to work on it.


Can you prepare + send along these cleanups so we can have a look at the 
end result?


(either cleanups before or after your changes - doing cleanups before 
might be cleaner as we are not dealing with a fix here that we want to 
backport)


--
Thanks,

David / dhildenb



Re: [PATCH v2 1/2] mm: disable LRU pagevec during the migration temporarily

2021-03-09 Thread Minchan Kim
On Tue, Mar 09, 2021 at 12:03:08PM +0100, Michal Hocko wrote:
> On Mon 08-03-21 21:16:27, Minchan Kim wrote:
> > LRU pagevec holds refcount of pages until the pagevec are drained.
> > It could prevent migration since the refcount of the page is greater
> > than the expection in migration logic. To mitigate the issue,
> > callers of migrate_pages drains LRU pagevec via migrate_prep or
> > lru_add_drain_all before migrate_pages call.
> > 
> > However, it's not enough because pages coming into pagevec after the
> > draining call still could stay at the pagevec so it could keep
> > preventing page migration. Since some callers of migrate_pages have
> > retrial logic with LRU draining, the page would migrate at next trail
> > but it is still fragile in that it doesn't close the fundamental race
> > between upcoming LRU pages into pagvec and migration so the migration
> > failure could cause contiguous memory allocation failure in the end.
> > 
> > To close the race, this patch disables lru caches(i.e, pagevec)
> > during ongoing migration until migrate is done.
> > 
> > Since it's really hard to reproduce, I measured how many times
> > migrate_pages retried with force mode below debug code.
> 
> It would be better to explicitly state that this is about a fallback to
> a sync migration.
> 
>  
> > int migrate_pages(struct list_head *from, new_page_t get_new_page,
> > ..
> > ..
> > 
> > if (rc && reason == MR_CONTIG_RANGE && pass > 2) {
> >printk(KERN_ERR, "pfn 0x%lx reason %d\n", page_to_pfn(page), rc);
> >dump_page(page, "fail to migrate");
> > }
> > 
> > The test was repeating android apps launching with cma allocation
> > in background every five seconds. Total cma allocation count was
> > about 500 during the testing. With this patch, the dump_page count
> > was reduced from 400 to 30.
> 
> I still find these results hard to argue about because it has really no
> relation to any measurable effect for those apps you are mentioning. I
> would expect sync migration would lead to performance difference. Is
> there any?

Think about migrating 300M pages. It needs to migrate 76800 pages.
It means page migration works(unmap + copy + map) are dominant.

> 
> > It would be also useful for memory-hotplug.
> 
> This is a statment that would deserve some explanation.
> "
> The new interface is alsow useful for memory hotplug which currently
> drains lru pcp caches after each migration failure. This is rather
> suboptimal as it has to disrupt others running during the operation.
> With the new interface the operation happens only once. This is also in
> line with pcp allocator cache which are disabled for the offlining as
> well.
> "

Much better. Thanks. 

>  
> > Signed-off-by: Minchan Kim 
> > ---
> > * from v1 - 
> > https://lore.kernel.org/lkml/20210302210949.2440120-1-minc...@kernel.org/
> >   * introduce __lru_add_drain_all to minimize changes - mhocko
> >   * use lru_cache_disable for memory-hotplug
> >   * schedule for every cpu at force_all_cpus
> > 
> > * from RFC - 
> > http://lore.kernel.org/linux-mm/20210216170348.1513483-1-minc...@kernel.org
> >   * use atomic and lru_add_drain_all for strict ordering - mhocko
> >   * lru_cache_disable/enable - mhocko
> > 
> >  include/linux/migrate.h |  6 ++-
> >  include/linux/swap.h|  2 +
> >  mm/memory_hotplug.c |  3 +-
> >  mm/mempolicy.c  |  6 +++
> >  mm/migrate.c| 13 ---
> >  mm/page_alloc.c |  3 ++
> >  mm/swap.c   | 82 +
> >  7 files changed, 91 insertions(+), 24 deletions(-)
> 
> Sorry for nit picking but I think the additional abstraction for
> migrate_prep is not really needed and we can remove some more code.
> Maybe we should even get rid of migrate_prep_local which only has a
> single caller and open coding lru draining with a comment would be
> better from code reading POV IMO.

Thanks for the code. I agree with you.
However, in this moment, let's go with this one until we conclude.
The removal of migrate_prep could be easily done after that.
I am happy to work on it.


Re: [PATCH v2 1/2] mm: disable LRU pagevec during the migration temporarily

2021-03-09 Thread Michal Hocko
On Mon 08-03-21 21:16:27, Minchan Kim wrote:
> LRU pagevec holds refcount of pages until the pagevec are drained.
> It could prevent migration since the refcount of the page is greater
> than the expection in migration logic. To mitigate the issue,
> callers of migrate_pages drains LRU pagevec via migrate_prep or
> lru_add_drain_all before migrate_pages call.
> 
> However, it's not enough because pages coming into pagevec after the
> draining call still could stay at the pagevec so it could keep
> preventing page migration. Since some callers of migrate_pages have
> retrial logic with LRU draining, the page would migrate at next trail
> but it is still fragile in that it doesn't close the fundamental race
> between upcoming LRU pages into pagvec and migration so the migration
> failure could cause contiguous memory allocation failure in the end.
> 
> To close the race, this patch disables lru caches(i.e, pagevec)
> during ongoing migration until migrate is done.
> 
> Since it's really hard to reproduce, I measured how many times
> migrate_pages retried with force mode below debug code.

It would be better to explicitly state that this is about a fallback to
a sync migration.

 
> int migrate_pages(struct list_head *from, new_page_t get_new_page,
>   ..
>   ..
> 
> if (rc && reason == MR_CONTIG_RANGE && pass > 2) {
>printk(KERN_ERR, "pfn 0x%lx reason %d\n", page_to_pfn(page), rc);
>dump_page(page, "fail to migrate");
> }
> 
> The test was repeating android apps launching with cma allocation
> in background every five seconds. Total cma allocation count was
> about 500 during the testing. With this patch, the dump_page count
> was reduced from 400 to 30.

I still find these results hard to argue about because it has really no
relation to any measurable effect for those apps you are mentioning. I
would expect sync migration would lead to performance difference. Is
there any?

> It would be also useful for memory-hotplug.

This is a statment that would deserve some explanation.
"
The new interface is alsow useful for memory hotplug which currently
drains lru pcp caches after each migration failure. This is rather
suboptimal as it has to disrupt others running during the operation.
With the new interface the operation happens only once. This is also in
line with pcp allocator cache which are disabled for the offlining as
well.
"
 
> Signed-off-by: Minchan Kim 
> ---
> * from v1 - 
> https://lore.kernel.org/lkml/20210302210949.2440120-1-minc...@kernel.org/
>   * introduce __lru_add_drain_all to minimize changes - mhocko
>   * use lru_cache_disable for memory-hotplug
>   * schedule for every cpu at force_all_cpus
> 
> * from RFC - 
> http://lore.kernel.org/linux-mm/20210216170348.1513483-1-minc...@kernel.org
>   * use atomic and lru_add_drain_all for strict ordering - mhocko
>   * lru_cache_disable/enable - mhocko
> 
>  include/linux/migrate.h |  6 ++-
>  include/linux/swap.h|  2 +
>  mm/memory_hotplug.c |  3 +-
>  mm/mempolicy.c  |  6 +++
>  mm/migrate.c| 13 ---
>  mm/page_alloc.c |  3 ++
>  mm/swap.c   | 82 +
>  7 files changed, 91 insertions(+), 24 deletions(-)

Sorry for nit picking but I think the additional abstraction for
migrate_prep is not really needed and we can remove some more code.
Maybe we should even get rid of migrate_prep_local which only has a
single caller and open coding lru draining with a comment would be
better from code reading POV IMO.

 include/linux/migrate.h |  4 +--
 include/linux/swap.h|  2 ++
 mm/memory_hotplug.c |  3 +-
 mm/mempolicy.c  | 10 --
 mm/migrate.c| 19 ++--
 mm/page_alloc.c |  5 ++-
 mm/swap.c   | 82 +++--
 7 files changed, 85 insertions(+), 40 deletions(-)

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 4594838a0f7c..dbd6311f23be 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -45,7 +45,6 @@ extern struct page *alloc_migration_target(struct page *page, 
unsigned long priv
 extern int isolate_movable_page(struct page *page, isolate_mode_t mode);
 extern void putback_movable_page(struct page *page);
 
-extern void migrate_prep(void);
 extern void migrate_prep_local(void);
 extern void migrate_page_states(struct page *newpage, struct page *page);
 extern void migrate_page_copy(struct page *newpage, struct page *page);
@@ -66,8 +65,7 @@ static inline struct page *alloc_migration_target(struct page 
*page,
 static inline int isolate_movable_page(struct page *page, isolate_mode_t mode)
{ return -EBUSY; }
 
-static inline int migrate_prep(void) { return -ENOSYS; }
-static inline int migrate_prep_local(void) { return -ENOSYS; }
+static inline void migrate_prep_local(void) { return -ENOSYS; }
 
 static inline void migrate_page_states(struct page *newpage, struct page *page)
 {

Re: [PATCH v2 1/2] mm: disable LRU pagevec during the migration temporarily

2021-03-09 Thread kernel test robot
Hi Minchan,

I love your patch! Perhaps something to improve:

[auto build test WARNING on linux/master]
[also build test WARNING on linus/master v5.12-rc2 next-20210309]
[cannot apply to hnaz-linux-mm/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Minchan-Kim/mm-disable-LRU-pagevec-during-the-migration-temporarily/20210309-131826
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
144c79ef33536b4ecb4951e07dbc1f2b7fa99d32
config: openrisc-randconfig-r026-20210308 (attached as .config)
compiler: or1k-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# 
https://github.com/0day-ci/linux/commit/e746db1a2ab13441890fa2cad8604bbec190b401
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review 
Minchan-Kim/mm-disable-LRU-pagevec-during-the-migration-temporarily/20210309-131826
git checkout e746db1a2ab13441890fa2cad8604bbec190b401
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross 
ARCH=openrisc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All warnings (new ones prefixed by >>):

   mm/swap.c: In function 'pagevec_add_and_need_flush':
   mm/swap.c:244:4: error: implicit declaration of function 
'lru_cache_disabled'; did you mean 'lru_cache_disable'? 
[-Werror=implicit-function-declaration]
 244 |lru_cache_disabled())
 |^~
 |lru_cache_disable
   mm/swap.c: At top level:
>> mm/swap.c:743:6: warning: no previous prototype for '__lru_add_drain_all' 
>> [-Wmissing-prototypes]
 743 | void __lru_add_drain_all(bool force_all_cpus)
 |  ^~~
   mm/swap.c:858:6: warning: no previous prototype for 'lru_cache_disabled' 
[-Wmissing-prototypes]
 858 | bool lru_cache_disabled(void)
 |  ^~
   mm/swap.c:858:6: error: conflicting types for 'lru_cache_disabled'
   mm/swap.c:244:4: note: previous implicit declaration of 'lru_cache_disabled' 
was here
 244 |lru_cache_disabled())
 |^~
   cc1: some warnings being treated as errors


vim +/__lru_add_drain_all +743 mm/swap.c

   742  
 > 743  void __lru_add_drain_all(bool force_all_cpus)
   744  {
   745  /*
   746   * lru_drain_gen - Global pages generation number
   747   *
   748   * (A) Definition: global lru_drain_gen = x implies that all 
generations
   749   * 0 < n <= x are already *scheduled* for draining.
   750   *
   751   * This is an optimization for the highly-contended use case 
where a
   752   * user space workload keeps constantly generating a flow of 
pages for
   753   * each CPU.
   754   */
   755  static unsigned int lru_drain_gen;
   756  static struct cpumask has_work;
   757  static DEFINE_MUTEX(lock);
   758  unsigned cpu, this_gen;
   759  
   760  /*
   761   * Make sure nobody triggers this path before mm_percpu_wq is 
fully
   762   * initialized.
   763   */
   764  if (WARN_ON(!mm_percpu_wq))
   765  return;
   766  
   767  /*
   768   * Guarantee pagevec counter stores visible by this CPU are 
visible to
   769   * other CPUs before loading the current drain generation.
   770   */
   771  smp_mb();
   772  
   773  /*
   774   * (B) Locally cache global LRU draining generation number
   775   *
   776   * The read barrier ensures that the counter is loaded before 
the mutex
   777   * is taken. It pairs with smp_mb() inside the mutex critical 
section
   778   * at (D).
   779   */
   780  this_gen = smp_load_acquire(_drain_gen);
   781  
   782  mutex_lock();
   783  
   784  /*
   785   * (C) Exit the draining operation if a newer generation, from 
another
   786   * lru_add_drain_all(), was already scheduled for draining. 
Check (A).
   787   */
   788  if (unlikely(this_gen != lru_drain_gen && !force_all_cpus))
   789  goto done;
   790  
   791  /*
   792   * (D) Increment global generation number
   793   *
   794   * Pairs with smp_load_acquire() at (B), outside of the critical
   795   * section. Use a full memory barrier to guarantee that the new 
global
   796   * drain generation number is stored before loading pagevec 
counters.
   797   *
   798

Re: [PATCH v2 1/2] mm: disable LRU pagevec during the migration temporarily

2021-03-09 Thread kernel test robot
Hi Minchan,

I love your patch! Yet something to improve:

[auto build test ERROR on linux/master]
[also build test ERROR on linus/master v5.12-rc2 next-20210309]
[cannot apply to hnaz-linux-mm/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Minchan-Kim/mm-disable-LRU-pagevec-during-the-migration-temporarily/20210309-131826
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
144c79ef33536b4ecb4951e07dbc1f2b7fa99d32
config: powerpc-skiroot_defconfig (attached as .config)
compiler: powerpc64le-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# 
https://github.com/0day-ci/linux/commit/e746db1a2ab13441890fa2cad8604bbec190b401
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review 
Minchan-Kim/mm-disable-LRU-pagevec-during-the-migration-temporarily/20210309-131826
git checkout e746db1a2ab13441890fa2cad8604bbec190b401
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross 
ARCH=powerpc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

   In file included from mm/mempolicy.c:92:
   include/linux/migrate.h: In function 'migrate_prep':
   include/linux/migrate.h:70:48: error: 'return' with a value, in function 
returning void [-Werror=return-type]
  70 | static inline void migrate_prep(void) { return -ENOSYS; }
 |^
   include/linux/migrate.h:70:20: note: declared here
  70 | static inline void migrate_prep(void) { return -ENOSYS; }
 |^~~~
   include/linux/migrate.h: In function 'migrate_prep_local':
   include/linux/migrate.h:71:54: error: 'return' with a value, in function 
returning void [-Werror=return-type]
  71 | static inline void migrate_prep_local(void) { return -ENOSYS; }
 |  ^
   include/linux/migrate.h:71:20: note: declared here
  71 | static inline void migrate_prep_local(void) { return -ENOSYS; }
 |^~
   mm/mempolicy.c: In function 'do_mbind':
>> mm/mempolicy.c:1378:3: error: implicit declaration of function 
>> 'migrate_finish'; did you mean 'migrate_done'? 
>> [-Werror=implicit-function-declaration]
1378 |   migrate_finish();
 |   ^~
 |   migrate_done
   cc1: some warnings being treated as errors


vim +1378 mm/mempolicy.c

  1277  
  1278  static long do_mbind(unsigned long start, unsigned long len,
  1279   unsigned short mode, unsigned short mode_flags,
  1280   nodemask_t *nmask, unsigned long flags)
  1281  {
  1282  struct mm_struct *mm = current->mm;
  1283  struct mempolicy *new;
  1284  unsigned long end;
  1285  int err;
  1286  int ret;
  1287  LIST_HEAD(pagelist);
  1288  
  1289  if (flags & ~(unsigned long)MPOL_MF_VALID)
  1290  return -EINVAL;
  1291  if ((flags & MPOL_MF_MOVE_ALL) && !capable(CAP_SYS_NICE))
  1292  return -EPERM;
  1293  
  1294  if (start & ~PAGE_MASK)
  1295  return -EINVAL;
  1296  
  1297  if (mode == MPOL_DEFAULT)
  1298  flags &= ~MPOL_MF_STRICT;
  1299  
  1300  len = (len + PAGE_SIZE - 1) & PAGE_MASK;
  1301  end = start + len;
  1302  
  1303  if (end < start)
  1304  return -EINVAL;
  1305  if (end == start)
  1306  return 0;
  1307  
  1308  new = mpol_new(mode, mode_flags, nmask);
  1309  if (IS_ERR(new))
  1310  return PTR_ERR(new);
  1311  
  1312  if (flags & MPOL_MF_LAZY)
  1313  new->flags |= MPOL_F_MOF;
  1314  
  1315  /*
  1316   * If we are using the default policy then operation
  1317   * on discontinuous address spaces is okay after all
  1318   */
  1319  if (!new)
  1320  flags |= MPOL_MF_DISCONTIG_OK;
  1321  
  1322  pr_debug("mbind %lx-%lx mode:%d flags:%d nodes:%lx\n",
  1323   start, start + len, mode, mode_flags,
  1324   nmask ? nodes_addr(*nmask)[0] : NUMA_NO_NODE);
  1325  
  1326  if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) {
  1327  
  1328  migrate_prep();
  1329  }
  1330  {
  1331  NODEMASK_SCRATCH(scratch);
  1332  if (scratch) {
  

Re: [PATCH v2 1/2] mm: disable LRU pagevec during the migration temporarily

2021-03-08 Thread kernel test robot
Hi Minchan,

I love your patch! Perhaps something to improve:

[auto build test WARNING on linux/master]
[also build test WARNING on linus/master v5.12-rc2 next-20210309]
[cannot apply to hnaz-linux-mm/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Minchan-Kim/mm-disable-LRU-pagevec-during-the-migration-temporarily/20210309-131826
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
144c79ef33536b4ecb4951e07dbc1f2b7fa99d32
config: arm64-randconfig-r023-20210308 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project 
820f508b08d7c94b2dd7847e9710d2bc36d3dd45)
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# install arm64 cross compiling tool for clang build
# apt-get install binutils-aarch64-linux-gnu
# 
https://github.com/0day-ci/linux/commit/e746db1a2ab13441890fa2cad8604bbec190b401
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review 
Minchan-Kim/mm-disable-LRU-pagevec-during-the-migration-temporarily/20210309-131826
git checkout e746db1a2ab13441890fa2cad8604bbec190b401
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=arm64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All warnings (new ones prefixed by >>):

   mm/swap.c:244:4: error: implicit declaration of function 
'lru_cache_disabled' [-Werror,-Wimplicit-function-declaration]
   lru_cache_disabled())
   ^
   mm/swap.c:244:4: note: did you mean 'lru_cache_disable'?
   include/linux/swap.h:342:13: note: 'lru_cache_disable' declared here
   extern void lru_cache_disable(void);
   ^
>> mm/swap.c:743:6: warning: no previous prototype for function 
>> '__lru_add_drain_all' [-Wmissing-prototypes]
   void __lru_add_drain_all(bool force_all_cpus)
^
   mm/swap.c:743:1: note: declare 'static' if the function is not intended to 
be used outside of this translation unit
   void __lru_add_drain_all(bool force_all_cpus)
   ^
   static 
   mm/swap.c:858:6: error: conflicting types for 'lru_cache_disabled'
   bool lru_cache_disabled(void)
^
   mm/swap.c:244:4: note: previous implicit declaration is here
   lru_cache_disabled())
   ^
   1 warning and 2 errors generated.


vim +/__lru_add_drain_all +743 mm/swap.c

   742  
 > 743  void __lru_add_drain_all(bool force_all_cpus)
   744  {
   745  /*
   746   * lru_drain_gen - Global pages generation number
   747   *
   748   * (A) Definition: global lru_drain_gen = x implies that all 
generations
   749   * 0 < n <= x are already *scheduled* for draining.
   750   *
   751   * This is an optimization for the highly-contended use case 
where a
   752   * user space workload keeps constantly generating a flow of 
pages for
   753   * each CPU.
   754   */
   755  static unsigned int lru_drain_gen;
   756  static struct cpumask has_work;
   757  static DEFINE_MUTEX(lock);
   758  unsigned cpu, this_gen;
   759  
   760  /*
   761   * Make sure nobody triggers this path before mm_percpu_wq is 
fully
   762   * initialized.
   763   */
   764  if (WARN_ON(!mm_percpu_wq))
   765  return;
   766  
   767  /*
   768   * Guarantee pagevec counter stores visible by this CPU are 
visible to
   769   * other CPUs before loading the current drain generation.
   770   */
   771  smp_mb();
   772  
   773  /*
   774   * (B) Locally cache global LRU draining generation number
   775   *
   776   * The read barrier ensures that the counter is loaded before 
the mutex
   777   * is taken. It pairs with smp_mb() inside the mutex critical 
section
   778   * at (D).
   779   */
   780  this_gen = smp_load_acquire(_drain_gen);
   781  
   782  mutex_lock();
   783  
   784  /*
   785   * (C) Exit the draining operation if a newer generation, from 
another
   786   * lru_add_drain_all(), was already scheduled for draining. 
Check (A).
   787   */
   788  if (unlikely(this_gen != lru_drain_gen && !force_all_cpus))
   789  goto done;
   790  
   791  /*
   792   * (D) Increment global generation number
   793   *
   794   * Pairs with smp_load_acquire() at (B), 

[PATCH v2 1/2] mm: disable LRU pagevec during the migration temporarily

2021-03-08 Thread Minchan Kim
LRU pagevec holds refcount of pages until the pagevec are drained.
It could prevent migration since the refcount of the page is greater
than the expection in migration logic. To mitigate the issue,
callers of migrate_pages drains LRU pagevec via migrate_prep or
lru_add_drain_all before migrate_pages call.

However, it's not enough because pages coming into pagevec after the
draining call still could stay at the pagevec so it could keep
preventing page migration. Since some callers of migrate_pages have
retrial logic with LRU draining, the page would migrate at next trail
but it is still fragile in that it doesn't close the fundamental race
between upcoming LRU pages into pagvec and migration so the migration
failure could cause contiguous memory allocation failure in the end.

To close the race, this patch disables lru caches(i.e, pagevec)
during ongoing migration until migrate is done.

Since it's really hard to reproduce, I measured how many times
migrate_pages retried with force mode below debug code.

int migrate_pages(struct list_head *from, new_page_t get_new_page,
..
..

if (rc && reason == MR_CONTIG_RANGE && pass > 2) {
   printk(KERN_ERR, "pfn 0x%lx reason %d\n", page_to_pfn(page), rc);
   dump_page(page, "fail to migrate");
}

The test was repeating android apps launching with cma allocation
in background every five seconds. Total cma allocation count was
about 500 during the testing. With this patch, the dump_page count
was reduced from 400 to 30.

It would be also useful for memory-hotplug.

Signed-off-by: Minchan Kim 
---
* from v1 - 
https://lore.kernel.org/lkml/20210302210949.2440120-1-minc...@kernel.org/
  * introduce __lru_add_drain_all to minimize changes - mhocko
  * use lru_cache_disable for memory-hotplug
  * schedule for every cpu at force_all_cpus

* from RFC - 
http://lore.kernel.org/linux-mm/20210216170348.1513483-1-minc...@kernel.org
  * use atomic and lru_add_drain_all for strict ordering - mhocko
  * lru_cache_disable/enable - mhocko

 include/linux/migrate.h |  6 ++-
 include/linux/swap.h|  2 +
 mm/memory_hotplug.c |  3 +-
 mm/mempolicy.c  |  6 +++
 mm/migrate.c| 13 ---
 mm/page_alloc.c |  3 ++
 mm/swap.c   | 82 +
 7 files changed, 91 insertions(+), 24 deletions(-)

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 3a389633b68f..6a23174ea081 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -47,6 +47,7 @@ extern void putback_movable_page(struct page *page);
 
 extern void migrate_prep(void);
 extern void migrate_prep_local(void);
+extern void migrate_finish(void);
 extern void migrate_page_states(struct page *newpage, struct page *page);
 extern void migrate_page_copy(struct page *newpage, struct page *page);
 extern int migrate_huge_page_move_mapping(struct address_space *mapping,
@@ -66,8 +67,9 @@ static inline struct page *alloc_migration_target(struct page 
*page,
 static inline int isolate_movable_page(struct page *page, isolate_mode_t mode)
{ return -EBUSY; }
 
-static inline int migrate_prep(void) { return -ENOSYS; }
-static inline int migrate_prep_local(void) { return -ENOSYS; }
+static inline void migrate_prep(void) { return -ENOSYS; }
+static inline void migrate_prep_local(void) { return -ENOSYS; }
+static inline void migrate_done(void) {}
 
 static inline void migrate_page_states(struct page *newpage, struct page *page)
 {
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 71166bc10d17..aaa6b9cc3f8a 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -339,6 +339,8 @@ extern void lru_note_cost(struct lruvec *lruvec, bool file,
 extern void lru_note_cost_page(struct page *);
 extern void lru_cache_add(struct page *);
 extern void mark_page_accessed(struct page *);
+extern void lru_cache_disable(void);
+extern void lru_cache_enable(void);
 extern void lru_add_drain(void);
 extern void lru_add_drain_cpu(int cpu);
 extern void lru_add_drain_cpu_zone(struct zone *zone);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index a969463bdda4..9a5af00802f9 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1571,6 +1571,7 @@ int __ref offline_pages(unsigned long start_pfn, unsigned 
long nr_pages)
 * in a way that pages from isolated pageblock are left on pcplists.
 */
zone_pcp_disable(zone);
+   lru_cache_disable();
 
/* set above range as isolated */
ret = start_isolate_page_range(start_pfn, end_pfn,
@@ -1602,7 +1603,6 @@ int __ref offline_pages(unsigned long start_pfn, unsigned 
long nr_pages)
}
 
cond_resched();
-   lru_add_drain_all();
 
ret = scan_movable_pages(pfn, end_pfn, );
if (!ret) {
@@ -1647,6 +1647,7 @@ int __ref offline_pages(unsigned long start_pfn, unsigned 
long nr_pages)