Re: [PATCH -mm -v4 03/21] mm, THP, swap: Support PMD swap mapping in swap_duplicate()

2018-07-10 Thread Huang, Ying
Dave Hansen  writes:

>> Yes.  Boolean parameter isn't good at most times.  Matthew Wilcox
>> suggested to use
>> 
>> swap_duplicate(, HPAGE_PMD_NR);
>> 
>> vs.
>> 
>> swap_duplicate(, 1);
>> 
>> He thinks this makes the interface more flexible to support other swap
>> entry size in the future.  What do you think about that?
>
> That looks great to me too.
>
if (likely(!non_swap_entry(entry))) {
 -  if (swap_duplicate(entry) < 0)
 +  if (swap_duplicate(, false) < 0)
return entry.val;
  
/* make sure dst_mm is on swapoff's mmlist. */
>>>
>>> I'll also point out that in a multi-hundred-line patch, adding arguments
>>> to a existing function would not be something I'd try to include in the
>>> patch.  I'd break it out separately unless absolutely necessary.
>> 
>> You mean add another patch, which only adds arguments to the function,
>> but not change the body of the function?
>
> Yes.  Or, just add the non-THP-swap version first.

OK, will do this.

Best Regards,
Huang, Ying


Re: [PATCH -mm -v4 03/21] mm, THP, swap: Support PMD swap mapping in swap_duplicate()

2018-07-10 Thread Huang, Ying
Dave Hansen  writes:

>> Yes.  Boolean parameter isn't good at most times.  Matthew Wilcox
>> suggested to use
>> 
>> swap_duplicate(, HPAGE_PMD_NR);
>> 
>> vs.
>> 
>> swap_duplicate(, 1);
>> 
>> He thinks this makes the interface more flexible to support other swap
>> entry size in the future.  What do you think about that?
>
> That looks great to me too.
>
if (likely(!non_swap_entry(entry))) {
 -  if (swap_duplicate(entry) < 0)
 +  if (swap_duplicate(, false) < 0)
return entry.val;
  
/* make sure dst_mm is on swapoff's mmlist. */
>>>
>>> I'll also point out that in a multi-hundred-line patch, adding arguments
>>> to a existing function would not be something I'd try to include in the
>>> patch.  I'd break it out separately unless absolutely necessary.
>> 
>> You mean add another patch, which only adds arguments to the function,
>> but not change the body of the function?
>
> Yes.  Or, just add the non-THP-swap version first.

OK, will do this.

Best Regards,
Huang, Ying


Re: [PATCH -mm -v4 03/21] mm, THP, swap: Support PMD swap mapping in swap_duplicate()

2018-07-10 Thread Dave Hansen
> Yes.  Boolean parameter isn't good at most times.  Matthew Wilcox
> suggested to use
> 
> swap_duplicate(, HPAGE_PMD_NR);
> 
> vs.
> 
> swap_duplicate(, 1);
> 
> He thinks this makes the interface more flexible to support other swap
> entry size in the future.  What do you think about that?

That looks great to me too.

>>> if (likely(!non_swap_entry(entry))) {
>>> -   if (swap_duplicate(entry) < 0)
>>> +   if (swap_duplicate(, false) < 0)
>>> return entry.val;
>>>  
>>> /* make sure dst_mm is on swapoff's mmlist. */
>>
>> I'll also point out that in a multi-hundred-line patch, adding arguments
>> to a existing function would not be something I'd try to include in the
>> patch.  I'd break it out separately unless absolutely necessary.
> 
> You mean add another patch, which only adds arguments to the function,
> but not change the body of the function?

Yes.  Or, just add the non-THP-swap version first.


Re: [PATCH -mm -v4 03/21] mm, THP, swap: Support PMD swap mapping in swap_duplicate()

2018-07-10 Thread Dave Hansen
> Yes.  Boolean parameter isn't good at most times.  Matthew Wilcox
> suggested to use
> 
> swap_duplicate(, HPAGE_PMD_NR);
> 
> vs.
> 
> swap_duplicate(, 1);
> 
> He thinks this makes the interface more flexible to support other swap
> entry size in the future.  What do you think about that?

That looks great to me too.

>>> if (likely(!non_swap_entry(entry))) {
>>> -   if (swap_duplicate(entry) < 0)
>>> +   if (swap_duplicate(, false) < 0)
>>> return entry.val;
>>>  
>>> /* make sure dst_mm is on swapoff's mmlist. */
>>
>> I'll also point out that in a multi-hundred-line patch, adding arguments
>> to a existing function would not be something I'd try to include in the
>> patch.  I'd break it out separately unless absolutely necessary.
> 
> You mean add another patch, which only adds arguments to the function,
> but not change the body of the function?

Yes.  Or, just add the non-THP-swap version first.


Re: [PATCH -mm -v4 03/21] mm, THP, swap: Support PMD swap mapping in swap_duplicate()

2018-07-10 Thread Huang, Ying
Dave Hansen  writes:

>> +static inline bool thp_swap_supported(void)
>> +{
>> +return IS_ENABLED(CONFIG_THP_SWAP);
>> +}
>
> This seems like rather useless abstraction.  Why do we need it?

I just want to make it shorter, 19 vs 27 characters.  But if you think
IS_ENABLED(CONFIG_THP_SWAP) is much better, I can use that instead.

> ...
>> -static inline int swap_duplicate(swp_entry_t swp)
>> +static inline int swap_duplicate(swp_entry_t *swp, bool cluster)
>>  {
>>  return 0;
>>  }
>
> FWIW, I despise true/false function arguments like this.  When I see
> this in code:
>
>   swap_duplicate(, false);
>
> I have no idea what false does.  I'd much rather see:
>
> enum do_swap_cluster {
>   SWP_DO_CLUSTER,
>   SWP_NO_CLUSTER
> };
>
> So you see:
>
>   swap_duplicate(, SWP_NO_CLUSTER);
>
> vs.
>
>   swap_duplicate(, SWP_DO_CLUSTER);
>

Yes.  Boolean parameter isn't good at most times.  Matthew Wilcox
suggested to use

swap_duplicate(, HPAGE_PMD_NR);

vs.

swap_duplicate(, 1);

He thinks this makes the interface more flexible to support other swap
entry size in the future.  What do you think about that?

>> diff --git a/mm/memory.c b/mm/memory.c
>> index e9cac1c4fa69..f3900282e3da 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -951,7 +951,7 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct 
>> *src_mm,
>>  swp_entry_t entry = pte_to_swp_entry(pte);
>>  
>>  if (likely(!non_swap_entry(entry))) {
>> -if (swap_duplicate(entry) < 0)
>> +if (swap_duplicate(, false) < 0)
>>  return entry.val;
>>  
>>  /* make sure dst_mm is on swapoff's mmlist. */
>
> I'll also point out that in a multi-hundred-line patch, adding arguments
> to a existing function would not be something I'd try to include in the
> patch.  I'd break it out separately unless absolutely necessary.

You mean add another patch, which only adds arguments to the function,
but not change the body of the function?

>> diff --git a/mm/swapfile.c b/mm/swapfile.c
>> index f42b1b0cdc58..48e2c54385ee 100644
>> --- a/mm/swapfile.c
>> +++ b/mm/swapfile.c
>> @@ -49,6 +49,9 @@ static bool swap_count_continued(struct swap_info_struct 
>> *, pgoff_t,
>>   unsigned char);
>>  static void free_swap_count_continuations(struct swap_info_struct *);
>>  static sector_t map_swap_entry(swp_entry_t, struct block_device**);
>> +static int add_swap_count_continuation_locked(struct swap_info_struct *si,
>> +  unsigned long offset,
>> +  struct page *page);
>>  
>>  DEFINE_SPINLOCK(swap_lock);
>>  static unsigned int nr_swapfiles;
>> @@ -319,6 +322,11 @@ static inline void unlock_cluster_or_swap_info(struct 
>> swap_info_struct *si,
>>  spin_unlock(>lock);
>>  }
>>  
>> +static inline bool is_cluster_offset(unsigned long offset)
>> +{
>> +return !(offset % SWAPFILE_CLUSTER);
>> +}
>> +
>>  static inline bool cluster_list_empty(struct swap_cluster_list *list)
>>  {
>>  return cluster_is_null(>head);
>> @@ -1166,16 +1174,14 @@ struct swap_info_struct *get_swap_device(swp_entry_t 
>> entry)
>>  return NULL;
>>  }
>>  
>> -static unsigned char __swap_entry_free(struct swap_info_struct *p,
>> -   swp_entry_t entry, unsigned char usage)
>> +static unsigned char __swap_entry_free_locked(struct swap_info_struct *p,
>> +  struct swap_cluster_info *ci,
>> +  unsigned long offset,
>> +  unsigned char usage)
>>  {
>> -struct swap_cluster_info *ci;
>> -unsigned long offset = swp_offset(entry);
>>  unsigned char count;
>>  unsigned char has_cache;
>>  
>> -ci = lock_cluster_or_swap_info(p, offset);
>> -
>>  count = p->swap_map[offset];
>>  
>>  has_cache = count & SWAP_HAS_CACHE;
>> @@ -1203,6 +1209,17 @@ static unsigned char __swap_entry_free(struct 
>> swap_info_struct *p,
>>  usage = count | has_cache;
>>  p->swap_map[offset] = usage ? : SWAP_HAS_CACHE;
>>  
>> +return usage;
>> +}
>> +
>> +static unsigned char __swap_entry_free(struct swap_info_struct *p,
>> +   swp_entry_t entry, unsigned char usage)
>> +{
>> +struct swap_cluster_info *ci;
>> +unsigned long offset = swp_offset(entry);
>> +
>> +ci = lock_cluster_or_swap_info(p, offset);
>> +usage = __swap_entry_free_locked(p, ci, offset, usage);
>>  unlock_cluster_or_swap_info(p, ci);
>>  
>>  return usage;
>> @@ -3450,32 +3467,12 @@ void si_swapinfo(struct sysinfo *val)
>>  spin_unlock(_lock);
>>  }
>>  
>> -/*
>> - * Verify that a swap entry is valid and increment its swap map count.
>> - *
>> - * Returns error code in following case.
>> - * - success -> 0
>> - * - swp_entry is invalid -> EINVAL

Re: [PATCH -mm -v4 03/21] mm, THP, swap: Support PMD swap mapping in swap_duplicate()

2018-07-10 Thread Huang, Ying
Dave Hansen  writes:

>> +static inline bool thp_swap_supported(void)
>> +{
>> +return IS_ENABLED(CONFIG_THP_SWAP);
>> +}
>
> This seems like rather useless abstraction.  Why do we need it?

I just want to make it shorter, 19 vs 27 characters.  But if you think
IS_ENABLED(CONFIG_THP_SWAP) is much better, I can use that instead.

> ...
>> -static inline int swap_duplicate(swp_entry_t swp)
>> +static inline int swap_duplicate(swp_entry_t *swp, bool cluster)
>>  {
>>  return 0;
>>  }
>
> FWIW, I despise true/false function arguments like this.  When I see
> this in code:
>
>   swap_duplicate(, false);
>
> I have no idea what false does.  I'd much rather see:
>
> enum do_swap_cluster {
>   SWP_DO_CLUSTER,
>   SWP_NO_CLUSTER
> };
>
> So you see:
>
>   swap_duplicate(, SWP_NO_CLUSTER);
>
> vs.
>
>   swap_duplicate(, SWP_DO_CLUSTER);
>

Yes.  Boolean parameter isn't good at most times.  Matthew Wilcox
suggested to use

swap_duplicate(, HPAGE_PMD_NR);

vs.

swap_duplicate(, 1);

He thinks this makes the interface more flexible to support other swap
entry size in the future.  What do you think about that?

>> diff --git a/mm/memory.c b/mm/memory.c
>> index e9cac1c4fa69..f3900282e3da 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -951,7 +951,7 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct 
>> *src_mm,
>>  swp_entry_t entry = pte_to_swp_entry(pte);
>>  
>>  if (likely(!non_swap_entry(entry))) {
>> -if (swap_duplicate(entry) < 0)
>> +if (swap_duplicate(, false) < 0)
>>  return entry.val;
>>  
>>  /* make sure dst_mm is on swapoff's mmlist. */
>
> I'll also point out that in a multi-hundred-line patch, adding arguments
> to a existing function would not be something I'd try to include in the
> patch.  I'd break it out separately unless absolutely necessary.

You mean add another patch, which only adds arguments to the function,
but not change the body of the function?

>> diff --git a/mm/swapfile.c b/mm/swapfile.c
>> index f42b1b0cdc58..48e2c54385ee 100644
>> --- a/mm/swapfile.c
>> +++ b/mm/swapfile.c
>> @@ -49,6 +49,9 @@ static bool swap_count_continued(struct swap_info_struct 
>> *, pgoff_t,
>>   unsigned char);
>>  static void free_swap_count_continuations(struct swap_info_struct *);
>>  static sector_t map_swap_entry(swp_entry_t, struct block_device**);
>> +static int add_swap_count_continuation_locked(struct swap_info_struct *si,
>> +  unsigned long offset,
>> +  struct page *page);
>>  
>>  DEFINE_SPINLOCK(swap_lock);
>>  static unsigned int nr_swapfiles;
>> @@ -319,6 +322,11 @@ static inline void unlock_cluster_or_swap_info(struct 
>> swap_info_struct *si,
>>  spin_unlock(>lock);
>>  }
>>  
>> +static inline bool is_cluster_offset(unsigned long offset)
>> +{
>> +return !(offset % SWAPFILE_CLUSTER);
>> +}
>> +
>>  static inline bool cluster_list_empty(struct swap_cluster_list *list)
>>  {
>>  return cluster_is_null(>head);
>> @@ -1166,16 +1174,14 @@ struct swap_info_struct *get_swap_device(swp_entry_t 
>> entry)
>>  return NULL;
>>  }
>>  
>> -static unsigned char __swap_entry_free(struct swap_info_struct *p,
>> -   swp_entry_t entry, unsigned char usage)
>> +static unsigned char __swap_entry_free_locked(struct swap_info_struct *p,
>> +  struct swap_cluster_info *ci,
>> +  unsigned long offset,
>> +  unsigned char usage)
>>  {
>> -struct swap_cluster_info *ci;
>> -unsigned long offset = swp_offset(entry);
>>  unsigned char count;
>>  unsigned char has_cache;
>>  
>> -ci = lock_cluster_or_swap_info(p, offset);
>> -
>>  count = p->swap_map[offset];
>>  
>>  has_cache = count & SWAP_HAS_CACHE;
>> @@ -1203,6 +1209,17 @@ static unsigned char __swap_entry_free(struct 
>> swap_info_struct *p,
>>  usage = count | has_cache;
>>  p->swap_map[offset] = usage ? : SWAP_HAS_CACHE;
>>  
>> +return usage;
>> +}
>> +
>> +static unsigned char __swap_entry_free(struct swap_info_struct *p,
>> +   swp_entry_t entry, unsigned char usage)
>> +{
>> +struct swap_cluster_info *ci;
>> +unsigned long offset = swp_offset(entry);
>> +
>> +ci = lock_cluster_or_swap_info(p, offset);
>> +usage = __swap_entry_free_locked(p, ci, offset, usage);
>>  unlock_cluster_or_swap_info(p, ci);
>>  
>>  return usage;
>> @@ -3450,32 +3467,12 @@ void si_swapinfo(struct sysinfo *val)
>>  spin_unlock(_lock);
>>  }
>>  
>> -/*
>> - * Verify that a swap entry is valid and increment its swap map count.
>> - *
>> - * Returns error code in following case.
>> - * - success -> 0
>> - * - swp_entry is invalid -> EINVAL

Re: [PATCH -mm -v4 03/21] mm, THP, swap: Support PMD swap mapping in swap_duplicate()

2018-07-09 Thread Dave Hansen
> +static inline bool thp_swap_supported(void)
> +{
> + return IS_ENABLED(CONFIG_THP_SWAP);
> +}

This seems like rather useless abstraction.  Why do we need it?

...
> -static inline int swap_duplicate(swp_entry_t swp)
> +static inline int swap_duplicate(swp_entry_t *swp, bool cluster)
>  {
>   return 0;
>  }

FWIW, I despise true/false function arguments like this.  When I see
this in code:

swap_duplicate(, false);

I have no idea what false does.  I'd much rather see:

enum do_swap_cluster {
SWP_DO_CLUSTER,
SWP_NO_CLUSTER
};

So you see:

swap_duplicate(, SWP_NO_CLUSTER);

vs.

swap_duplicate(, SWP_DO_CLUSTER);


> diff --git a/mm/memory.c b/mm/memory.c
> index e9cac1c4fa69..f3900282e3da 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -951,7 +951,7 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct 
> *src_mm,
>   swp_entry_t entry = pte_to_swp_entry(pte);
>  
>   if (likely(!non_swap_entry(entry))) {
> - if (swap_duplicate(entry) < 0)
> + if (swap_duplicate(, false) < 0)
>   return entry.val;
>  
>   /* make sure dst_mm is on swapoff's mmlist. */

I'll also point out that in a multi-hundred-line patch, adding arguments
to a existing function would not be something I'd try to include in the
patch.  I'd break it out separately unless absolutely necessary.

> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index f42b1b0cdc58..48e2c54385ee 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -49,6 +49,9 @@ static bool swap_count_continued(struct swap_info_struct *, 
> pgoff_t,
>unsigned char);
>  static void free_swap_count_continuations(struct swap_info_struct *);
>  static sector_t map_swap_entry(swp_entry_t, struct block_device**);
> +static int add_swap_count_continuation_locked(struct swap_info_struct *si,
> +   unsigned long offset,
> +   struct page *page);
>  
>  DEFINE_SPINLOCK(swap_lock);
>  static unsigned int nr_swapfiles;
> @@ -319,6 +322,11 @@ static inline void unlock_cluster_or_swap_info(struct 
> swap_info_struct *si,
>   spin_unlock(>lock);
>  }
>  
> +static inline bool is_cluster_offset(unsigned long offset)
> +{
> + return !(offset % SWAPFILE_CLUSTER);
> +}
> +
>  static inline bool cluster_list_empty(struct swap_cluster_list *list)
>  {
>   return cluster_is_null(>head);
> @@ -1166,16 +1174,14 @@ struct swap_info_struct *get_swap_device(swp_entry_t 
> entry)
>   return NULL;
>  }
>  
> -static unsigned char __swap_entry_free(struct swap_info_struct *p,
> -swp_entry_t entry, unsigned char usage)
> +static unsigned char __swap_entry_free_locked(struct swap_info_struct *p,
> +   struct swap_cluster_info *ci,
> +   unsigned long offset,
> +   unsigned char usage)
>  {
> - struct swap_cluster_info *ci;
> - unsigned long offset = swp_offset(entry);
>   unsigned char count;
>   unsigned char has_cache;
>  
> - ci = lock_cluster_or_swap_info(p, offset);
> -
>   count = p->swap_map[offset];
>  
>   has_cache = count & SWAP_HAS_CACHE;
> @@ -1203,6 +1209,17 @@ static unsigned char __swap_entry_free(struct 
> swap_info_struct *p,
>   usage = count | has_cache;
>   p->swap_map[offset] = usage ? : SWAP_HAS_CACHE;
>  
> + return usage;
> +}
> +
> +static unsigned char __swap_entry_free(struct swap_info_struct *p,
> +swp_entry_t entry, unsigned char usage)
> +{
> + struct swap_cluster_info *ci;
> + unsigned long offset = swp_offset(entry);
> +
> + ci = lock_cluster_or_swap_info(p, offset);
> + usage = __swap_entry_free_locked(p, ci, offset, usage);
>   unlock_cluster_or_swap_info(p, ci);
>  
>   return usage;
> @@ -3450,32 +3467,12 @@ void si_swapinfo(struct sysinfo *val)
>   spin_unlock(_lock);
>  }
>  
> -/*
> - * Verify that a swap entry is valid and increment its swap map count.
> - *
> - * Returns error code in following case.
> - * - success -> 0
> - * - swp_entry is invalid -> EINVAL
> - * - swp_entry is migration entry -> EINVAL
> - * - swap-cache reference is requested but there is already one. -> EEXIST
> - * - swap-cache reference is requested but the entry is not used. -> ENOENT
> - * - swap-mapped reference requested but needs continued swap count. -> 
> ENOMEM
> - */
> -static int __swap_duplicate(swp_entry_t entry, unsigned char usage)
> +static int __swap_duplicate_locked(struct swap_info_struct *p,
> +unsigned long offset, unsigned char usage)
>  {
> - struct swap_info_struct *p;
> - struct swap_cluster_info *ci;
> - unsigned long offset;
>   unsigned char count;
>   

Re: [PATCH -mm -v4 03/21] mm, THP, swap: Support PMD swap mapping in swap_duplicate()

2018-07-09 Thread Dave Hansen
> +static inline bool thp_swap_supported(void)
> +{
> + return IS_ENABLED(CONFIG_THP_SWAP);
> +}

This seems like rather useless abstraction.  Why do we need it?

...
> -static inline int swap_duplicate(swp_entry_t swp)
> +static inline int swap_duplicate(swp_entry_t *swp, bool cluster)
>  {
>   return 0;
>  }

FWIW, I despise true/false function arguments like this.  When I see
this in code:

swap_duplicate(, false);

I have no idea what false does.  I'd much rather see:

enum do_swap_cluster {
SWP_DO_CLUSTER,
SWP_NO_CLUSTER
};

So you see:

swap_duplicate(, SWP_NO_CLUSTER);

vs.

swap_duplicate(, SWP_DO_CLUSTER);


> diff --git a/mm/memory.c b/mm/memory.c
> index e9cac1c4fa69..f3900282e3da 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -951,7 +951,7 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct 
> *src_mm,
>   swp_entry_t entry = pte_to_swp_entry(pte);
>  
>   if (likely(!non_swap_entry(entry))) {
> - if (swap_duplicate(entry) < 0)
> + if (swap_duplicate(, false) < 0)
>   return entry.val;
>  
>   /* make sure dst_mm is on swapoff's mmlist. */

I'll also point out that in a multi-hundred-line patch, adding arguments
to a existing function would not be something I'd try to include in the
patch.  I'd break it out separately unless absolutely necessary.

> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index f42b1b0cdc58..48e2c54385ee 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -49,6 +49,9 @@ static bool swap_count_continued(struct swap_info_struct *, 
> pgoff_t,
>unsigned char);
>  static void free_swap_count_continuations(struct swap_info_struct *);
>  static sector_t map_swap_entry(swp_entry_t, struct block_device**);
> +static int add_swap_count_continuation_locked(struct swap_info_struct *si,
> +   unsigned long offset,
> +   struct page *page);
>  
>  DEFINE_SPINLOCK(swap_lock);
>  static unsigned int nr_swapfiles;
> @@ -319,6 +322,11 @@ static inline void unlock_cluster_or_swap_info(struct 
> swap_info_struct *si,
>   spin_unlock(>lock);
>  }
>  
> +static inline bool is_cluster_offset(unsigned long offset)
> +{
> + return !(offset % SWAPFILE_CLUSTER);
> +}
> +
>  static inline bool cluster_list_empty(struct swap_cluster_list *list)
>  {
>   return cluster_is_null(>head);
> @@ -1166,16 +1174,14 @@ struct swap_info_struct *get_swap_device(swp_entry_t 
> entry)
>   return NULL;
>  }
>  
> -static unsigned char __swap_entry_free(struct swap_info_struct *p,
> -swp_entry_t entry, unsigned char usage)
> +static unsigned char __swap_entry_free_locked(struct swap_info_struct *p,
> +   struct swap_cluster_info *ci,
> +   unsigned long offset,
> +   unsigned char usage)
>  {
> - struct swap_cluster_info *ci;
> - unsigned long offset = swp_offset(entry);
>   unsigned char count;
>   unsigned char has_cache;
>  
> - ci = lock_cluster_or_swap_info(p, offset);
> -
>   count = p->swap_map[offset];
>  
>   has_cache = count & SWAP_HAS_CACHE;
> @@ -1203,6 +1209,17 @@ static unsigned char __swap_entry_free(struct 
> swap_info_struct *p,
>   usage = count | has_cache;
>   p->swap_map[offset] = usage ? : SWAP_HAS_CACHE;
>  
> + return usage;
> +}
> +
> +static unsigned char __swap_entry_free(struct swap_info_struct *p,
> +swp_entry_t entry, unsigned char usage)
> +{
> + struct swap_cluster_info *ci;
> + unsigned long offset = swp_offset(entry);
> +
> + ci = lock_cluster_or_swap_info(p, offset);
> + usage = __swap_entry_free_locked(p, ci, offset, usage);
>   unlock_cluster_or_swap_info(p, ci);
>  
>   return usage;
> @@ -3450,32 +3467,12 @@ void si_swapinfo(struct sysinfo *val)
>   spin_unlock(_lock);
>  }
>  
> -/*
> - * Verify that a swap entry is valid and increment its swap map count.
> - *
> - * Returns error code in following case.
> - * - success -> 0
> - * - swp_entry is invalid -> EINVAL
> - * - swp_entry is migration entry -> EINVAL
> - * - swap-cache reference is requested but there is already one. -> EEXIST
> - * - swap-cache reference is requested but the entry is not used. -> ENOENT
> - * - swap-mapped reference requested but needs continued swap count. -> 
> ENOMEM
> - */
> -static int __swap_duplicate(swp_entry_t entry, unsigned char usage)
> +static int __swap_duplicate_locked(struct swap_info_struct *p,
> +unsigned long offset, unsigned char usage)
>  {
> - struct swap_info_struct *p;
> - struct swap_cluster_info *ci;
> - unsigned long offset;
>   unsigned char count;
>   

Re: [PATCH -mm -v4 03/21] mm, THP, swap: Support PMD swap mapping in swap_duplicate()

2018-07-09 Thread Huang, Ying
Dan Williams  writes:

> On Thu, Jun 21, 2018 at 8:55 PM Huang, Ying  wrote:
>>
>> From: Huang Ying 
>>
>> To support to swapin the THP as a whole, we need to create PMD swap
>> mapping during swapout, and maintain PMD swap mapping count.  This
>> patch implements the support to increase the PMD swap mapping
>> count (for swapout, fork, etc.)  and set SWAP_HAS_CACHE flag (for
>> swapin, etc.) for a huge swap cluster in swap_duplicate() function
>> family.  Although it only implements a part of the design of the swap
>> reference count with PMD swap mapping, the whole design is described
>> as follow to make it easy to understand the patch and the whole
>> picture.
>>
>> A huge swap cluster is used to hold the contents of a swapouted THP.
>> After swapout, a PMD page mapping to the THP will become a PMD
>> swap mapping to the huge swap cluster via a swap entry in PMD.  While
>> a PTE page mapping to a subpage of the THP will become the PTE swap
>> mapping to a swap slot in the huge swap cluster via a swap entry in
>> PTE.
>>
>> If there is no PMD swap mapping and the corresponding THP is removed
>> from the page cache (reclaimed), the huge swap cluster will be split
>> and become a normal swap cluster.
>>
>> The count (cluster_count()) of the huge swap cluster is
>> SWAPFILE_CLUSTER (= HPAGE_PMD_NR) + PMD swap mapping count.  Because
>> all swap slots in the huge swap cluster are mapped by PTE or PMD, or
>> has SWAP_HAS_CACHE bit set, the usage count of the swap cluster is
>> HPAGE_PMD_NR.  And the PMD swap mapping count is recorded too to make
>> it easy to determine whether there are remaining PMD swap mappings.
>>
>> The count in swap_map[offset] is the sum of PTE and PMD swap mapping
>> count.  This means when we increase the PMD swap mapping count, we
>> need to increase swap_map[offset] for all swap slots inside the swap
>> cluster.  An alternative choice is to make swap_map[offset] to record
>> PTE swap map count only, given we have recorded PMD swap mapping count
>> in the count of the huge swap cluster.  But this need to increase
>> swap_map[offset] when splitting the PMD swap mapping, that may fail
>> because of memory allocation for swap count continuation.  That is
>> hard to dealt with.  So we choose current solution.
>>
>> The PMD swap mapping to a huge swap cluster may be split when unmap a
>> part of PMD mapping etc.  That is easy because only the count of the
>> huge swap cluster need to be changed.  When the last PMD swap mapping
>> is gone and SWAP_HAS_CACHE is unset, we will split the huge swap
>> cluster (clear the huge flag).  This makes it easy to reason the
>> cluster state.
>>
>> A huge swap cluster will be split when splitting the THP in swap
>> cache, or failing to allocate THP during swapin, etc.  But when
>> splitting the huge swap cluster, we will not try to split all PMD swap
>> mappings, because we haven't enough information available for that
>> sometimes.  Later, when the PMD swap mapping is duplicated or swapin,
>> etc, the PMD swap mapping will be split and fallback to the PTE
>> operation.
>>
>> When a THP is added into swap cache, the SWAP_HAS_CACHE flag will be
>> set in the swap_map[offset] of all swap slots inside the huge swap
>> cluster backing the THP.  This huge swap cluster will not be split
>> unless the THP is split even if its PMD swap mapping count dropped to
>> 0.  Later, when the THP is removed from swap cache, the SWAP_HAS_CACHE
>> flag will be cleared in the swap_map[offset] of all swap slots inside
>> the huge swap cluster.  And this huge swap cluster will be split if
>> its PMD swap mapping count is 0.
>>
>> Signed-off-by: "Huang, Ying" 
>> Cc: "Kirill A. Shutemov" 
>> Cc: Andrea Arcangeli 
>> Cc: Michal Hocko 
>> Cc: Johannes Weiner 
>> Cc: Shaohua Li 
>> Cc: Hugh Dickins 
>> Cc: Minchan Kim 
>> Cc: Rik van Riel 
>> Cc: Dave Hansen 
>> Cc: Naoya Horiguchi 
>> Cc: Zi Yan 
>> Cc: Daniel Jordan 
>> ---
>>  include/linux/huge_mm.h |   5 +
>>  include/linux/swap.h|   9 +-
>>  mm/memory.c |   2 +-
>>  mm/rmap.c   |   2 +-
>>  mm/swap_state.c |   2 +-
>>  mm/swapfile.c   | 287 
>> +---
>>  6 files changed, 214 insertions(+), 93 deletions(-)
>
> I'm probably missing some background, but I find the patch hard to
> read. Can you disseminate some of this patch changelog into kernel-doc
> commentary so it's easier to follow which helpers do what relative to
> THP swap.

Yes.  This is a good idea.  Thanks for pointing it out.  I will add more
kernel-doc commentary to make the code easier to be understood.

>>
>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>> index d3bbf6bea9e9..213d32e57c39 100644
>> --- a/include/linux/huge_mm.h
>> +++ b/include/linux/huge_mm.h
>> @@ -80,6 +80,11 @@ extern struct kobj_attribute shmem_enabled_attr;
>>  #define HPAGE_PMD_ORDER (HPAGE_PMD_SHIFT-PAGE_SHIFT)
>>  #define HPAGE_PMD_NR (1<>
>> +static inline bool 

Re: [PATCH -mm -v4 03/21] mm, THP, swap: Support PMD swap mapping in swap_duplicate()

2018-07-09 Thread Huang, Ying
Dan Williams  writes:

> On Thu, Jun 21, 2018 at 8:55 PM Huang, Ying  wrote:
>>
>> From: Huang Ying 
>>
>> To support to swapin the THP as a whole, we need to create PMD swap
>> mapping during swapout, and maintain PMD swap mapping count.  This
>> patch implements the support to increase the PMD swap mapping
>> count (for swapout, fork, etc.)  and set SWAP_HAS_CACHE flag (for
>> swapin, etc.) for a huge swap cluster in swap_duplicate() function
>> family.  Although it only implements a part of the design of the swap
>> reference count with PMD swap mapping, the whole design is described
>> as follow to make it easy to understand the patch and the whole
>> picture.
>>
>> A huge swap cluster is used to hold the contents of a swapouted THP.
>> After swapout, a PMD page mapping to the THP will become a PMD
>> swap mapping to the huge swap cluster via a swap entry in PMD.  While
>> a PTE page mapping to a subpage of the THP will become the PTE swap
>> mapping to a swap slot in the huge swap cluster via a swap entry in
>> PTE.
>>
>> If there is no PMD swap mapping and the corresponding THP is removed
>> from the page cache (reclaimed), the huge swap cluster will be split
>> and become a normal swap cluster.
>>
>> The count (cluster_count()) of the huge swap cluster is
>> SWAPFILE_CLUSTER (= HPAGE_PMD_NR) + PMD swap mapping count.  Because
>> all swap slots in the huge swap cluster are mapped by PTE or PMD, or
>> has SWAP_HAS_CACHE bit set, the usage count of the swap cluster is
>> HPAGE_PMD_NR.  And the PMD swap mapping count is recorded too to make
>> it easy to determine whether there are remaining PMD swap mappings.
>>
>> The count in swap_map[offset] is the sum of PTE and PMD swap mapping
>> count.  This means when we increase the PMD swap mapping count, we
>> need to increase swap_map[offset] for all swap slots inside the swap
>> cluster.  An alternative choice is to make swap_map[offset] to record
>> PTE swap map count only, given we have recorded PMD swap mapping count
>> in the count of the huge swap cluster.  But this need to increase
>> swap_map[offset] when splitting the PMD swap mapping, that may fail
>> because of memory allocation for swap count continuation.  That is
>> hard to dealt with.  So we choose current solution.
>>
>> The PMD swap mapping to a huge swap cluster may be split when unmap a
>> part of PMD mapping etc.  That is easy because only the count of the
>> huge swap cluster need to be changed.  When the last PMD swap mapping
>> is gone and SWAP_HAS_CACHE is unset, we will split the huge swap
>> cluster (clear the huge flag).  This makes it easy to reason the
>> cluster state.
>>
>> A huge swap cluster will be split when splitting the THP in swap
>> cache, or failing to allocate THP during swapin, etc.  But when
>> splitting the huge swap cluster, we will not try to split all PMD swap
>> mappings, because we haven't enough information available for that
>> sometimes.  Later, when the PMD swap mapping is duplicated or swapin,
>> etc, the PMD swap mapping will be split and fallback to the PTE
>> operation.
>>
>> When a THP is added into swap cache, the SWAP_HAS_CACHE flag will be
>> set in the swap_map[offset] of all swap slots inside the huge swap
>> cluster backing the THP.  This huge swap cluster will not be split
>> unless the THP is split even if its PMD swap mapping count dropped to
>> 0.  Later, when the THP is removed from swap cache, the SWAP_HAS_CACHE
>> flag will be cleared in the swap_map[offset] of all swap slots inside
>> the huge swap cluster.  And this huge swap cluster will be split if
>> its PMD swap mapping count is 0.
>>
>> Signed-off-by: "Huang, Ying" 
>> Cc: "Kirill A. Shutemov" 
>> Cc: Andrea Arcangeli 
>> Cc: Michal Hocko 
>> Cc: Johannes Weiner 
>> Cc: Shaohua Li 
>> Cc: Hugh Dickins 
>> Cc: Minchan Kim 
>> Cc: Rik van Riel 
>> Cc: Dave Hansen 
>> Cc: Naoya Horiguchi 
>> Cc: Zi Yan 
>> Cc: Daniel Jordan 
>> ---
>>  include/linux/huge_mm.h |   5 +
>>  include/linux/swap.h|   9 +-
>>  mm/memory.c |   2 +-
>>  mm/rmap.c   |   2 +-
>>  mm/swap_state.c |   2 +-
>>  mm/swapfile.c   | 287 
>> +---
>>  6 files changed, 214 insertions(+), 93 deletions(-)
>
> I'm probably missing some background, but I find the patch hard to
> read. Can you disseminate some of this patch changelog into kernel-doc
> commentary so it's easier to follow which helpers do what relative to
> THP swap.

Yes.  This is a good idea.  Thanks for pointing it out.  I will add more
kernel-doc commentary to make the code easier to be understood.

>>
>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>> index d3bbf6bea9e9..213d32e57c39 100644
>> --- a/include/linux/huge_mm.h
>> +++ b/include/linux/huge_mm.h
>> @@ -80,6 +80,11 @@ extern struct kobj_attribute shmem_enabled_attr;
>>  #define HPAGE_PMD_ORDER (HPAGE_PMD_SHIFT-PAGE_SHIFT)
>>  #define HPAGE_PMD_NR (1<>
>> +static inline bool 

Re: [PATCH -mm -v4 03/21] mm, THP, swap: Support PMD swap mapping in swap_duplicate()

2018-07-07 Thread Dan Williams
On Thu, Jun 21, 2018 at 8:55 PM Huang, Ying  wrote:
>
> From: Huang Ying 
>
> To support to swapin the THP as a whole, we need to create PMD swap
> mapping during swapout, and maintain PMD swap mapping count.  This
> patch implements the support to increase the PMD swap mapping
> count (for swapout, fork, etc.)  and set SWAP_HAS_CACHE flag (for
> swapin, etc.) for a huge swap cluster in swap_duplicate() function
> family.  Although it only implements a part of the design of the swap
> reference count with PMD swap mapping, the whole design is described
> as follow to make it easy to understand the patch and the whole
> picture.
>
> A huge swap cluster is used to hold the contents of a swapouted THP.
> After swapout, a PMD page mapping to the THP will become a PMD
> swap mapping to the huge swap cluster via a swap entry in PMD.  While
> a PTE page mapping to a subpage of the THP will become the PTE swap
> mapping to a swap slot in the huge swap cluster via a swap entry in
> PTE.
>
> If there is no PMD swap mapping and the corresponding THP is removed
> from the page cache (reclaimed), the huge swap cluster will be split
> and become a normal swap cluster.
>
> The count (cluster_count()) of the huge swap cluster is
> SWAPFILE_CLUSTER (= HPAGE_PMD_NR) + PMD swap mapping count.  Because
> all swap slots in the huge swap cluster are mapped by PTE or PMD, or
> has SWAP_HAS_CACHE bit set, the usage count of the swap cluster is
> HPAGE_PMD_NR.  And the PMD swap mapping count is recorded too to make
> it easy to determine whether there are remaining PMD swap mappings.
>
> The count in swap_map[offset] is the sum of PTE and PMD swap mapping
> count.  This means when we increase the PMD swap mapping count, we
> need to increase swap_map[offset] for all swap slots inside the swap
> cluster.  An alternative choice is to make swap_map[offset] to record
> PTE swap map count only, given we have recorded PMD swap mapping count
> in the count of the huge swap cluster.  But this need to increase
> swap_map[offset] when splitting the PMD swap mapping, that may fail
> because of memory allocation for swap count continuation.  That is
> hard to dealt with.  So we choose current solution.
>
> The PMD swap mapping to a huge swap cluster may be split when unmap a
> part of PMD mapping etc.  That is easy because only the count of the
> huge swap cluster need to be changed.  When the last PMD swap mapping
> is gone and SWAP_HAS_CACHE is unset, we will split the huge swap
> cluster (clear the huge flag).  This makes it easy to reason the
> cluster state.
>
> A huge swap cluster will be split when splitting the THP in swap
> cache, or failing to allocate THP during swapin, etc.  But when
> splitting the huge swap cluster, we will not try to split all PMD swap
> mappings, because we haven't enough information available for that
> sometimes.  Later, when the PMD swap mapping is duplicated or swapin,
> etc, the PMD swap mapping will be split and fallback to the PTE
> operation.
>
> When a THP is added into swap cache, the SWAP_HAS_CACHE flag will be
> set in the swap_map[offset] of all swap slots inside the huge swap
> cluster backing the THP.  This huge swap cluster will not be split
> unless the THP is split even if its PMD swap mapping count dropped to
> 0.  Later, when the THP is removed from swap cache, the SWAP_HAS_CACHE
> flag will be cleared in the swap_map[offset] of all swap slots inside
> the huge swap cluster.  And this huge swap cluster will be split if
> its PMD swap mapping count is 0.
>
> Signed-off-by: "Huang, Ying" 
> Cc: "Kirill A. Shutemov" 
> Cc: Andrea Arcangeli 
> Cc: Michal Hocko 
> Cc: Johannes Weiner 
> Cc: Shaohua Li 
> Cc: Hugh Dickins 
> Cc: Minchan Kim 
> Cc: Rik van Riel 
> Cc: Dave Hansen 
> Cc: Naoya Horiguchi 
> Cc: Zi Yan 
> Cc: Daniel Jordan 
> ---
>  include/linux/huge_mm.h |   5 +
>  include/linux/swap.h|   9 +-
>  mm/memory.c |   2 +-
>  mm/rmap.c   |   2 +-
>  mm/swap_state.c |   2 +-
>  mm/swapfile.c   | 287 
> +---
>  6 files changed, 214 insertions(+), 93 deletions(-)

I'm probably missing some background, but I find the patch hard to
read. Can you disseminate some of this patch changelog into kernel-doc
commentary so it's easier to follow which helpers do what relative to
THP swap.

>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index d3bbf6bea9e9..213d32e57c39 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -80,6 +80,11 @@ extern struct kobj_attribute shmem_enabled_attr;
>  #define HPAGE_PMD_ORDER (HPAGE_PMD_SHIFT-PAGE_SHIFT)
>  #define HPAGE_PMD_NR (1<
> +static inline bool thp_swap_supported(void)
> +{
> +   return IS_ENABLED(CONFIG_THP_SWAP);
> +}
> +
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>  #define HPAGE_PMD_SHIFT PMD_SHIFT
>  #define HPAGE_PMD_SIZE ((1UL) << HPAGE_PMD_SHIFT)
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index 

Re: [PATCH -mm -v4 03/21] mm, THP, swap: Support PMD swap mapping in swap_duplicate()

2018-07-07 Thread Dan Williams
On Thu, Jun 21, 2018 at 8:55 PM Huang, Ying  wrote:
>
> From: Huang Ying 
>
> To support to swapin the THP as a whole, we need to create PMD swap
> mapping during swapout, and maintain PMD swap mapping count.  This
> patch implements the support to increase the PMD swap mapping
> count (for swapout, fork, etc.)  and set SWAP_HAS_CACHE flag (for
> swapin, etc.) for a huge swap cluster in swap_duplicate() function
> family.  Although it only implements a part of the design of the swap
> reference count with PMD swap mapping, the whole design is described
> as follow to make it easy to understand the patch and the whole
> picture.
>
> A huge swap cluster is used to hold the contents of a swapouted THP.
> After swapout, a PMD page mapping to the THP will become a PMD
> swap mapping to the huge swap cluster via a swap entry in PMD.  While
> a PTE page mapping to a subpage of the THP will become the PTE swap
> mapping to a swap slot in the huge swap cluster via a swap entry in
> PTE.
>
> If there is no PMD swap mapping and the corresponding THP is removed
> from the page cache (reclaimed), the huge swap cluster will be split
> and become a normal swap cluster.
>
> The count (cluster_count()) of the huge swap cluster is
> SWAPFILE_CLUSTER (= HPAGE_PMD_NR) + PMD swap mapping count.  Because
> all swap slots in the huge swap cluster are mapped by PTE or PMD, or
> has SWAP_HAS_CACHE bit set, the usage count of the swap cluster is
> HPAGE_PMD_NR.  And the PMD swap mapping count is recorded too to make
> it easy to determine whether there are remaining PMD swap mappings.
>
> The count in swap_map[offset] is the sum of PTE and PMD swap mapping
> count.  This means when we increase the PMD swap mapping count, we
> need to increase swap_map[offset] for all swap slots inside the swap
> cluster.  An alternative choice is to make swap_map[offset] to record
> PTE swap map count only, given we have recorded PMD swap mapping count
> in the count of the huge swap cluster.  But this need to increase
> swap_map[offset] when splitting the PMD swap mapping, that may fail
> because of memory allocation for swap count continuation.  That is
> hard to dealt with.  So we choose current solution.
>
> The PMD swap mapping to a huge swap cluster may be split when unmap a
> part of PMD mapping etc.  That is easy because only the count of the
> huge swap cluster need to be changed.  When the last PMD swap mapping
> is gone and SWAP_HAS_CACHE is unset, we will split the huge swap
> cluster (clear the huge flag).  This makes it easy to reason the
> cluster state.
>
> A huge swap cluster will be split when splitting the THP in swap
> cache, or failing to allocate THP during swapin, etc.  But when
> splitting the huge swap cluster, we will not try to split all PMD swap
> mappings, because we haven't enough information available for that
> sometimes.  Later, when the PMD swap mapping is duplicated or swapin,
> etc, the PMD swap mapping will be split and fallback to the PTE
> operation.
>
> When a THP is added into swap cache, the SWAP_HAS_CACHE flag will be
> set in the swap_map[offset] of all swap slots inside the huge swap
> cluster backing the THP.  This huge swap cluster will not be split
> unless the THP is split even if its PMD swap mapping count dropped to
> 0.  Later, when the THP is removed from swap cache, the SWAP_HAS_CACHE
> flag will be cleared in the swap_map[offset] of all swap slots inside
> the huge swap cluster.  And this huge swap cluster will be split if
> its PMD swap mapping count is 0.
>
> Signed-off-by: "Huang, Ying" 
> Cc: "Kirill A. Shutemov" 
> Cc: Andrea Arcangeli 
> Cc: Michal Hocko 
> Cc: Johannes Weiner 
> Cc: Shaohua Li 
> Cc: Hugh Dickins 
> Cc: Minchan Kim 
> Cc: Rik van Riel 
> Cc: Dave Hansen 
> Cc: Naoya Horiguchi 
> Cc: Zi Yan 
> Cc: Daniel Jordan 
> ---
>  include/linux/huge_mm.h |   5 +
>  include/linux/swap.h|   9 +-
>  mm/memory.c |   2 +-
>  mm/rmap.c   |   2 +-
>  mm/swap_state.c |   2 +-
>  mm/swapfile.c   | 287 
> +---
>  6 files changed, 214 insertions(+), 93 deletions(-)

I'm probably missing some background, but I find the patch hard to
read. Can you disseminate some of this patch changelog into kernel-doc
commentary so it's easier to follow which helpers do what relative to
THP swap.

>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index d3bbf6bea9e9..213d32e57c39 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -80,6 +80,11 @@ extern struct kobj_attribute shmem_enabled_attr;
>  #define HPAGE_PMD_ORDER (HPAGE_PMD_SHIFT-PAGE_SHIFT)
>  #define HPAGE_PMD_NR (1<
> +static inline bool thp_swap_supported(void)
> +{
> +   return IS_ENABLED(CONFIG_THP_SWAP);
> +}
> +
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>  #define HPAGE_PMD_SHIFT PMD_SHIFT
>  #define HPAGE_PMD_SIZE ((1UL) << HPAGE_PMD_SHIFT)
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index 

Re: [PATCH -mm -v4 03/21] mm, THP, swap: Support PMD swap mapping in swap_duplicate()

2018-07-01 Thread Huang, Ying
Matthew Wilcox  writes:

> On Fri, Jun 22, 2018 at 11:51:33AM +0800, Huang, Ying wrote:
>> +++ b/mm/swap_state.c
>> @@ -433,7 +433,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, 
>> gfp_t gfp_mask,
>>  /*
>>   * Swap entry may have been freed since our caller observed it.
>>   */
>> -err = swapcache_prepare(entry);
>> +err = swapcache_prepare(entry, false);
>>  if (err == -EEXIST) {
>>  radix_tree_preload_end();
>>  /*
>
> This commit should be just a textual conflict.

Yes.  Will check it.

Best Regards,
Huang, Ying


Re: [PATCH -mm -v4 03/21] mm, THP, swap: Support PMD swap mapping in swap_duplicate()

2018-07-01 Thread Huang, Ying
Matthew Wilcox  writes:

> On Fri, Jun 22, 2018 at 11:51:33AM +0800, Huang, Ying wrote:
>> +++ b/mm/swap_state.c
>> @@ -433,7 +433,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, 
>> gfp_t gfp_mask,
>>  /*
>>   * Swap entry may have been freed since our caller observed it.
>>   */
>> -err = swapcache_prepare(entry);
>> +err = swapcache_prepare(entry, false);
>>  if (err == -EEXIST) {
>>  radix_tree_preload_end();
>>  /*
>
> This commit should be just a textual conflict.

Yes.  Will check it.

Best Regards,
Huang, Ying


Re: [PATCH -mm -v4 03/21] mm, THP, swap: Support PMD swap mapping in swap_duplicate()

2018-06-29 Thread Matthew Wilcox
On Fri, Jun 22, 2018 at 11:51:33AM +0800, Huang, Ying wrote:
> +++ b/mm/swap_state.c
> @@ -433,7 +433,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, 
> gfp_t gfp_mask,
>   /*
>* Swap entry may have been freed since our caller observed it.
>*/
> - err = swapcache_prepare(entry);
> + err = swapcache_prepare(entry, false);
>   if (err == -EEXIST) {
>   radix_tree_preload_end();
>   /*

This commit should be just a textual conflict.



Re: [PATCH -mm -v4 03/21] mm, THP, swap: Support PMD swap mapping in swap_duplicate()

2018-06-29 Thread Matthew Wilcox
On Fri, Jun 22, 2018 at 11:51:33AM +0800, Huang, Ying wrote:
> +++ b/mm/swap_state.c
> @@ -433,7 +433,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, 
> gfp_t gfp_mask,
>   /*
>* Swap entry may have been freed since our caller observed it.
>*/
> - err = swapcache_prepare(entry);
> + err = swapcache_prepare(entry, false);
>   if (err == -EEXIST) {
>   radix_tree_preload_end();
>   /*

This commit should be just a textual conflict.



[PATCH -mm -v4 03/21] mm, THP, swap: Support PMD swap mapping in swap_duplicate()

2018-06-21 Thread Huang, Ying
From: Huang Ying 

To support to swapin the THP as a whole, we need to create PMD swap
mapping during swapout, and maintain PMD swap mapping count.  This
patch implements the support to increase the PMD swap mapping
count (for swapout, fork, etc.)  and set SWAP_HAS_CACHE flag (for
swapin, etc.) for a huge swap cluster in swap_duplicate() function
family.  Although it only implements a part of the design of the swap
reference count with PMD swap mapping, the whole design is described
as follow to make it easy to understand the patch and the whole
picture.

A huge swap cluster is used to hold the contents of a swapouted THP.
After swapout, a PMD page mapping to the THP will become a PMD
swap mapping to the huge swap cluster via a swap entry in PMD.  While
a PTE page mapping to a subpage of the THP will become the PTE swap
mapping to a swap slot in the huge swap cluster via a swap entry in
PTE.

If there is no PMD swap mapping and the corresponding THP is removed
from the page cache (reclaimed), the huge swap cluster will be split
and become a normal swap cluster.

The count (cluster_count()) of the huge swap cluster is
SWAPFILE_CLUSTER (= HPAGE_PMD_NR) + PMD swap mapping count.  Because
all swap slots in the huge swap cluster are mapped by PTE or PMD, or
has SWAP_HAS_CACHE bit set, the usage count of the swap cluster is
HPAGE_PMD_NR.  And the PMD swap mapping count is recorded too to make
it easy to determine whether there are remaining PMD swap mappings.

The count in swap_map[offset] is the sum of PTE and PMD swap mapping
count.  This means when we increase the PMD swap mapping count, we
need to increase swap_map[offset] for all swap slots inside the swap
cluster.  An alternative choice is to make swap_map[offset] to record
PTE swap map count only, given we have recorded PMD swap mapping count
in the count of the huge swap cluster.  But this need to increase
swap_map[offset] when splitting the PMD swap mapping, that may fail
because of memory allocation for swap count continuation.  That is
hard to dealt with.  So we choose current solution.

The PMD swap mapping to a huge swap cluster may be split when unmap a
part of PMD mapping etc.  That is easy because only the count of the
huge swap cluster need to be changed.  When the last PMD swap mapping
is gone and SWAP_HAS_CACHE is unset, we will split the huge swap
cluster (clear the huge flag).  This makes it easy to reason the
cluster state.

A huge swap cluster will be split when splitting the THP in swap
cache, or failing to allocate THP during swapin, etc.  But when
splitting the huge swap cluster, we will not try to split all PMD swap
mappings, because we haven't enough information available for that
sometimes.  Later, when the PMD swap mapping is duplicated or swapin,
etc, the PMD swap mapping will be split and fallback to the PTE
operation.

When a THP is added into swap cache, the SWAP_HAS_CACHE flag will be
set in the swap_map[offset] of all swap slots inside the huge swap
cluster backing the THP.  This huge swap cluster will not be split
unless the THP is split even if its PMD swap mapping count dropped to
0.  Later, when the THP is removed from swap cache, the SWAP_HAS_CACHE
flag will be cleared in the swap_map[offset] of all swap slots inside
the huge swap cluster.  And this huge swap cluster will be split if
its PMD swap mapping count is 0.

Signed-off-by: "Huang, Ying" 
Cc: "Kirill A. Shutemov" 
Cc: Andrea Arcangeli 
Cc: Michal Hocko 
Cc: Johannes Weiner 
Cc: Shaohua Li 
Cc: Hugh Dickins 
Cc: Minchan Kim 
Cc: Rik van Riel 
Cc: Dave Hansen 
Cc: Naoya Horiguchi 
Cc: Zi Yan 
Cc: Daniel Jordan 
---
 include/linux/huge_mm.h |   5 +
 include/linux/swap.h|   9 +-
 mm/memory.c |   2 +-
 mm/rmap.c   |   2 +-
 mm/swap_state.c |   2 +-
 mm/swapfile.c   | 287 +---
 6 files changed, 214 insertions(+), 93 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index d3bbf6bea9e9..213d32e57c39 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -80,6 +80,11 @@ extern struct kobj_attribute shmem_enabled_attr;
 #define HPAGE_PMD_ORDER (HPAGE_PMD_SHIFT-PAGE_SHIFT)
 #define HPAGE_PMD_NR (1head);
@@ -1166,16 +1174,14 @@ struct swap_info_struct *get_swap_device(swp_entry_t 
entry)
return NULL;
 }
 
-static unsigned char __swap_entry_free(struct swap_info_struct *p,
-  swp_entry_t entry, unsigned char usage)
+static unsigned char __swap_entry_free_locked(struct swap_info_struct *p,
+ struct swap_cluster_info *ci,
+ unsigned long offset,
+ 

[PATCH -mm -v4 03/21] mm, THP, swap: Support PMD swap mapping in swap_duplicate()

2018-06-21 Thread Huang, Ying
From: Huang Ying 

To support to swapin the THP as a whole, we need to create PMD swap
mapping during swapout, and maintain PMD swap mapping count.  This
patch implements the support to increase the PMD swap mapping
count (for swapout, fork, etc.)  and set SWAP_HAS_CACHE flag (for
swapin, etc.) for a huge swap cluster in swap_duplicate() function
family.  Although it only implements a part of the design of the swap
reference count with PMD swap mapping, the whole design is described
as follow to make it easy to understand the patch and the whole
picture.

A huge swap cluster is used to hold the contents of a swapouted THP.
After swapout, a PMD page mapping to the THP will become a PMD
swap mapping to the huge swap cluster via a swap entry in PMD.  While
a PTE page mapping to a subpage of the THP will become the PTE swap
mapping to a swap slot in the huge swap cluster via a swap entry in
PTE.

If there is no PMD swap mapping and the corresponding THP is removed
from the page cache (reclaimed), the huge swap cluster will be split
and become a normal swap cluster.

The count (cluster_count()) of the huge swap cluster is
SWAPFILE_CLUSTER (= HPAGE_PMD_NR) + PMD swap mapping count.  Because
all swap slots in the huge swap cluster are mapped by PTE or PMD, or
has SWAP_HAS_CACHE bit set, the usage count of the swap cluster is
HPAGE_PMD_NR.  And the PMD swap mapping count is recorded too to make
it easy to determine whether there are remaining PMD swap mappings.

The count in swap_map[offset] is the sum of PTE and PMD swap mapping
count.  This means when we increase the PMD swap mapping count, we
need to increase swap_map[offset] for all swap slots inside the swap
cluster.  An alternative choice is to make swap_map[offset] to record
PTE swap map count only, given we have recorded PMD swap mapping count
in the count of the huge swap cluster.  But this need to increase
swap_map[offset] when splitting the PMD swap mapping, that may fail
because of memory allocation for swap count continuation.  That is
hard to dealt with.  So we choose current solution.

The PMD swap mapping to a huge swap cluster may be split when unmap a
part of PMD mapping etc.  That is easy because only the count of the
huge swap cluster need to be changed.  When the last PMD swap mapping
is gone and SWAP_HAS_CACHE is unset, we will split the huge swap
cluster (clear the huge flag).  This makes it easy to reason the
cluster state.

A huge swap cluster will be split when splitting the THP in swap
cache, or failing to allocate THP during swapin, etc.  But when
splitting the huge swap cluster, we will not try to split all PMD swap
mappings, because we haven't enough information available for that
sometimes.  Later, when the PMD swap mapping is duplicated or swapin,
etc, the PMD swap mapping will be split and fallback to the PTE
operation.

When a THP is added into swap cache, the SWAP_HAS_CACHE flag will be
set in the swap_map[offset] of all swap slots inside the huge swap
cluster backing the THP.  This huge swap cluster will not be split
unless the THP is split even if its PMD swap mapping count dropped to
0.  Later, when the THP is removed from swap cache, the SWAP_HAS_CACHE
flag will be cleared in the swap_map[offset] of all swap slots inside
the huge swap cluster.  And this huge swap cluster will be split if
its PMD swap mapping count is 0.

Signed-off-by: "Huang, Ying" 
Cc: "Kirill A. Shutemov" 
Cc: Andrea Arcangeli 
Cc: Michal Hocko 
Cc: Johannes Weiner 
Cc: Shaohua Li 
Cc: Hugh Dickins 
Cc: Minchan Kim 
Cc: Rik van Riel 
Cc: Dave Hansen 
Cc: Naoya Horiguchi 
Cc: Zi Yan 
Cc: Daniel Jordan 
---
 include/linux/huge_mm.h |   5 +
 include/linux/swap.h|   9 +-
 mm/memory.c |   2 +-
 mm/rmap.c   |   2 +-
 mm/swap_state.c |   2 +-
 mm/swapfile.c   | 287 +---
 6 files changed, 214 insertions(+), 93 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index d3bbf6bea9e9..213d32e57c39 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -80,6 +80,11 @@ extern struct kobj_attribute shmem_enabled_attr;
 #define HPAGE_PMD_ORDER (HPAGE_PMD_SHIFT-PAGE_SHIFT)
 #define HPAGE_PMD_NR (1head);
@@ -1166,16 +1174,14 @@ struct swap_info_struct *get_swap_device(swp_entry_t 
entry)
return NULL;
 }
 
-static unsigned char __swap_entry_free(struct swap_info_struct *p,
-  swp_entry_t entry, unsigned char usage)
+static unsigned char __swap_entry_free_locked(struct swap_info_struct *p,
+ struct swap_cluster_info *ci,
+ unsigned long offset,
+