reference the '_numa_mem_' per cpu variable directly.
> @@ -2743,6 +2746,17 @@ static inline void zone_statistics(struct zone *preferred_zone, struct zone *z)
> #ifdef CONFIG_NUMA
> enum numa_stat_item local_stat = NUMA_LOCAL;
>
> + /*
> + *
From: Huang Ying
This patch adds a new Kconfig option VMA_SWAP_READAHEAD and wraps the VMA
based swap readahead code inside #ifdef CONFIG_VMA_SWAP_READAHEAD/#endif.
This is friendlier for tiny kernels. And, as pointed out by Minchan
Kim, it gives people who want to disable the swap readahead an
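A minimal sketch of the #ifdef pattern being described (the option name is
from the patch; the stub body is illustrative, not the exact kernel code):

#ifdef CONFIG_VMA_SWAP_READAHEAD
bool swap_use_vma_readahead(void);	/* real VMA readahead logic */
#else
static inline bool swap_use_vma_readahead(void)
{
	/* stub: callers fall back to the global readahead path */
	return false;
}
#endif

With the option disabled, the compiler sees a constant-false stub, so the
VMA readahead call sites are optimized out and tiny kernels don't carry
the code.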
Minchan Kim writes:
> On Fri, Sep 15, 2017 at 11:15:08AM +0800, Huang, Ying wrote:
>> Minchan Kim writes:
>>
>> > On Thu, Sep 14, 2017 at 08:01:30PM +0800, Huang, Ying wrote:
>> >> Minchan Kim writes:
>> >>
>> >> > On Wed, Sep
Minchan Kim writes:
> On Thu, Sep 14, 2017 at 08:01:30PM +0800, Huang, Ying wrote:
>> Minchan Kim writes:
>>
>> > On Wed, Sep 13, 2017 at 02:02:29PM -0700, Andrew Morton wrote:
>> >> On Wed, 13 Sep 2017 10:40:19 +0900 Minchan Kim wrote:
>> >>
> After all, yes, it would
> be a minimum we should do. But it still breaks users who don't/can't read/modify
> alert and program.
>
> How about this?
>
> Can't we make vma-based readahead config option?
> With that, users who have no interest in readahead don't enable
disabling people to the issue?
This sounds good to me.
Hi, Minchan, what do you think about this? I think for a low-end Android
device, the end user may have no opportunity to upgrade to the latest
kernel, so the device vendor should take care of this. For desktop users,
the warning proposed by Andrew may help to remind them of the new knob.
Best Regards,
Huang, Ying
Minchan Kim writes:
> On Tue, Sep 12, 2017 at 04:32:43PM +0800, Huang, Ying wrote:
>> Minchan Kim writes:
>>
>> > On Tue, Sep 12, 2017 at 04:07:01PM +0800, Huang, Ying wrote:
>> > < snip >
>> >> >> > My concern is users have be
Minchan Kim writes:
> On Tue, Sep 12, 2017 at 04:07:01PM +0800, Huang, Ying wrote:
> < snip >
>> >> > My concern is users who have disabled swap readahead by page-cluster
>> >> > would be regressed. Please take care of them.
Minchan Kim writes:
> On Tue, Sep 12, 2017 at 03:29:45PM +0800, Huang, Ying wrote:
>> Minchan Kim writes:
>>
>> > On Tue, Sep 12, 2017 at 02:44:36PM +0800, Huang, Ying wrote:
>> >> Minchan Kim writes:
>> >>
>> >> > On Tue, Sep 12
Minchan Kim writes:
> On Tue, Sep 12, 2017 at 02:44:36PM +0800, Huang, Ying wrote:
>> Minchan Kim writes:
>>
>> > On Tue, Sep 12, 2017 at 01:23:01PM +0800, Huang, Ying wrote:
>> >> Minchan Kim writes:
>> >>
>> >> > page_cluster
Minchan Kim writes:
> On Tue, Sep 12, 2017 at 01:23:01PM +0800, Huang, Ying wrote:
>> Minchan Kim writes:
>>
>> > page_cluster 0 means "we don't want readahead" so in the case,
>> > let's skip the readahead detection logic.
>>
Minchan Kim writes:
> page_cluster 0 means "we don't want readahead" so in the case,
> let's skip the readahead detection logic.
>
> Cc: "Huang, Ying"
> Signed-off-by: Minchan Kim
> ---
> include/linux/swap.h | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
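A minimal sketch of the check being proposed (the helper name is
illustrative; page_cluster is the existing global behind
/proc/sys/vm/page-cluster):

static inline bool swap_use_readahead(void)
{
	/*
	 * page_cluster == 0 means the user asked for no readahead,
	 * so skip the readahead detection logic entirely.
	 */
	return page_cluster > 0;
}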
Thomas Gleixner writes:
> On Tue, 5 Sep 2017, Huang, Ying wrote:
>
>> From: Huang Ying
>>
>> When developing code to bootup some APs (Application CPUs)
>> asynchronously, the following kernel panic is encountered. After
>> checking the code, it is found th
From: Huang Ying
When developing code to bootup some APs (Application CPUs)
asynchronously, the following kernel panic is encountered. After
checking the code, it is found that irq_to_desc() may return NULL
during CPU hotplug, so NULL pointer checking is added to fix
this.
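A minimal sketch of the kind of check described (illustrative, not the
exact hunk from the patch):

	struct irq_desc *desc = irq_to_desc(irq);

	/* irq_to_desc() may return NULL during CPU hotplug */
	if (!desc)
		return;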
"
Thomas Gleixner writes:
> On Mon, 4 Sep 2017, Huang, Ying wrote:
>> diff --git a/kernel/irq/cpuhotplug.c b/kernel/irq/cpuhotplug.c
>> index 638eb9c83d9f..af9029625271 100644
>> --- a/kernel/irq/cpuhotplug.c
>> +++ b/kernel/irq/cpuhotplug.c
From: Huang Ying
When developing code to bootup some APs (Application CPUs)
asynchronously, the following kernel panic is encountered. After
checking the code, it is found that the IRQ descriptor may be NULL
during CPU hotplug. So I added corresponding NULL pointer checking to
fix this. And
/mm/swapfile.c
> @@ -3053,6 +3053,7 @@ SYSCALL_DEFINE2(swapon, const char __user *,
> specialfile, int, swap_flags)
> spin_unlock(&swap_lock);
> vfree(swap_map);
> kvfree(cluster_info);
> + kvfree(frontswap_map);
> if (swap_file) {
> if (inode && S_ISREG(inode->i_mode)) {
> inode_unlock(inode);
Yes. There is a memory leak.
Reviewed-by: "Huang, Ying"
Best Regards,
Huang, Ying
the vfree calls to use kvfree.
>
> Found by running generic/357 from xfstests.
>
> Signed-off-by: Darrick J. Wong
Thanks for fixing!
Reviewed-by: "Huang, Ying"
Best Regards,
Huang, Ying
> ---
> mm/swapfile.c | 2 +-
> 1 file changed, 1 insertion(+), 1 delet
From: Huang Ying
The optimized clear_huge_page() isn't easy to read and understand.
This change was suggested by Michal Hocko to improve it.
Suggested-by: Michal Hocko
Signed-off-by: "Huang, Ying"
---
mm/memory.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
distributed in a more balanced way, so I think the
scheduler does a better job here. The problem is that the tasklist_lock
isn't scalable. But considering this is only a micro-benchmark which
specifically exercises the fork/exit/wait syscalls, this may not be a big
problem in reality.
So, all in all, I think we can ignore this regression.
Best Regards,
Huang, Ying
"Huang, Ying" writes:
> Hi, Peter,
>
> "Huang, Ying" writes:
>
>> Peter Zijlstra writes:
>>
>>> On Sat, Aug 05, 2017 at 08:47:02AM +0800, Huang, Ying wrote:
>>>> Yes. That looks good. So you will prepare the final patch?
Michal Hocko writes:
> On Tue 15-08-17 09:46:18, Huang, Ying wrote:
>> From: Huang Ying
>>
>> Huge page helps to reduce TLB miss rate, but it has higher cache
>> footprint, sometimes this may cause some issue. For example, when
>> clearing huge page on x86_64
From: Huang Ying
Huge pages help to reduce the TLB miss rate, but they have a higher
cache footprint, which may sometimes cause issues. For example, when
clearing a huge page on the x86_64 platform, the cache footprint is 2M. But
on a Xeon E5 v3 2699 CPU, there are 18 cores, 36 threads, and only 45M
LLC
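The optimization discussed in this thread, roughly sketched (illustrative,
not the exact upstream code): clear the subpage the process just faulted
on last, so it is still cache-hot when the fault returns.

	unsigned int i, target = (addr_hint - addr) / PAGE_SIZE;

	for (i = 0; i < pages_per_huge_page; i++) {
		if (i == target)
			continue;
		cond_resched();
		clear_user_highpage(page + i, addr + i * PAGE_SIZE);
	}
	/* the faulting subpage is cleared last, while it is cache-hot */
	clear_user_highpage(page + target, addr + target * PAGE_SIZE);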
Hi, Peter,
"Huang, Ying" writes:
> Peter Zijlstra writes:
>
>> On Sat, Aug 05, 2017 at 08:47:02AM +0800, Huang, Ying wrote:
>>> Yes. That looks good. So you will prepare the final patch? Or you
>>> hope me to do that?
>>
>> I was hoping yo
Hi, Andrew,
Andrew Morton writes:
> On Mon, 7 Aug 2017 15:21:31 +0800 "Huang, Ying" wrote:
>
>> From: Huang Ying
>>
>> Huge page helps to reduce TLB miss rate, but it has higher cache
>> footprint, sometimes this may cause some issue. For examp
Andrew Morton writes:
> On Mon, 7 Aug 2017 13:40:34 +0800 "Huang, Ying" wrote:
>
>> From: Huang Ying
>>
>> The statistics for total readahead pages and total readahead hits are
>> recorded and exported via the following sysfs interface.
>>
Matthew Wilcox writes:
> On Mon, Aug 07, 2017 at 03:21:31PM +0800, Huang, Ying wrote:
>> @@ -2509,7 +2509,8 @@ enum mf_action_page_type {
>> #if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLBFS)
>> extern void clear_huge_p
"Huang, Ying" writes:
> "Kirill A. Shutemov" writes:
>
>> On Mon, Aug 07, 2017 at 03:21:31PM +0800, Huang, Ying wrote:
>>> From: Huang Ying
>>>
>>> Huge page helps to reduce TLB miss rate, but it has higher cache
>>>
Christopher Lameter writes:
> On Mon, 7 Aug 2017, Huang, Ying wrote:
>
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -4374,9 +4374,31 @@ void clear_huge_page(struct page *page,
>> }
>>
>> might_sleep();
>> -for (i = 0; i <
Peter Zijlstra writes:
> On Sat, Aug 05, 2017 at 08:47:02AM +0800, Huang, Ying wrote:
>> Yes. That looks good. So you will prepare the final patch? Or you
>> hope me to do that?
>
> I was hoping you'd do it ;-)
Thanks! Here is the updated patch
Best Regards,
Mike Kravetz writes:
> On 08/07/2017 12:21 AM, Huang, Ying wrote:
>> From: Huang Ying
>>
>> Huge page helps to reduce TLB miss rate, but it has higher cache
>> footprint, sometimes this may cause some issue. For example, when
>> clearing huge page on x86_64 pl
Christopher Lameter writes:
> On Mon, 7 Aug 2017, Huang, Ying wrote:
>
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -4374,9 +4374,31 @@ void clear_huge_page(struct page *page,
>> }
>>
>> might_sleep();
>> -for (i = 0; i <
"Kirill A. Shutemov" writes:
> On Mon, Aug 07, 2017 at 03:21:31PM +0800, Huang, Ying wrote:
>> From: Huang Ying
>>
>> Huge page helps to reduce TLB miss rate, but it has higher cache
>> footprint, sometimes this may cause some issue. For example, when
>
Jan Kara writes:
> On Mon 07-08-17 15:21:31, Huang, Ying wrote:
>> From: Huang Ying
>>
>> Huge page helps to reduce TLB miss rate, but it has higher cache
>> footprint, sometimes this may cause some issue. For example, when
>> clearing huge page on x86_64 pl
From: Huang Ying
Huge pages help to reduce the TLB miss rate, but they have a higher
cache footprint, which may sometimes cause issues. For example, when
clearing a huge page on the x86_64 platform, the cache footprint is 2M. But
on a Xeon E5 v3 2699 CPU, there are 18 cores, 36 threads, and only 45M
LLC
From: Huang Ying
VMA based swap readahead will read ahead the virtual pages that are
contiguous in the virtual address space, while the original swap
readahead will read ahead the swap slots that are contiguous in the swap
device. Although VMA based swap readahead is more correct for the
swap
From: Huang Ying
The sysfs interface to control the VMA based swap readahead is added
as follows,
/sys/kernel/mm/swap/vma_ra_enabled
Enable the VMA based swap readahead algorithm, or use the original
global swap readahead algorithm.
/sys/kernel/mm/swap/vma_ra_max_order
Set the max order of
From: Huang Ying
The statistics for total readahead pages and total readahead hits are
recorded and exported via the following sysfs interface.
/sys/kernel/mm/swap/ra_hits
/sys/kernel/mm/swap/ra_total
With them, the efficiency of the swap readahead could be measured, so
that the swap readahead
From: Huang Ying
The swap readahead is an important mechanism to reduce the swap-in
latency. Although a pure sequential memory access pattern isn't very
popular for anonymous memory, the space locality is still considered
valid.
In the original swap readahead implementation, the consec
From: Huang Ying
In the original implementation, it is possible that the existing pages
in the swap cache (not newly readahead) could be marked as the
readahead pages. This will cause the statistics of swap readahead to be
wrong and influence the swap readahead algorithm too.
This is fixed via
swap readahead statistics, because that is the
interface used by other similar statistics.
- Add ABI document for newly added sysfs interface.
v3:
- Rebased on latest -mm tree
- Use percpu_counter for swap readahead statistics per Dave Hansen's comment (see the sketch below).
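A hedged sketch of the percpu_counter pattern referred to above (the
counter and function names are illustrative, not the patch's):

#include <linux/percpu_counter.h>

static struct percpu_counter swap_ra_hits;

static int __init swap_ra_stats_init(void)
{
	/* distributed counter: updates touch only per-CPU data */
	return percpu_counter_init(&swap_ra_hits, 0, GFP_KERNEL);
}

static inline void swap_ra_hit(void)
{
	percpu_counter_inc(&swap_ra_hits);	/* cheap on the hot path */
}

static s64 swap_ra_hits_show(void)
{
	return percpu_counter_sum(&swap_ra_hits);	/* slow read for sysfs */
}

Unlike a shared atomic_long_t, the hot-path increment doesn't bounce a
cache line between CPUs, which was the point of Dave Hansen's comment.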
Best Regards,
Huang, Ying
Peter Zijlstra writes:
> On Fri, Aug 04, 2017 at 10:05:55AM +0800, Huang, Ying wrote:
>> "Huang, Ying" writes:
>> > Peter Zijlstra writes:
>
>> >> +struct __call_single_data {
>> >> struct llist_node llist;
>> >> s
"Huang, Ying" writes:
> Peter Zijlstra writes:
> [snip]
>> diff --git a/include/linux/smp.h b/include/linux/smp.h
>> index 68123c1fe549..8d817cb80a38 100644
>> --- a/include/linux/smp.h
>> +++ b/include/linux/smp.h
>> @@ -14,13 +14,16 @@
>>
__call_single_data));
> +
Another requirement is that the alignment should be a power of
2. Otherwise, if, for example, someone adds a field to the struct so that
the size becomes 40 on x86_64, the alignment should be 64 instead of
40.
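For reference, the shape of the trick being discussed (a hedged sketch;
the upstream typedef may differ in detail): derive the alignment from the
size, and statically assert that the size is a power of two so the
attribute stays valid if the struct grows.

#include <linux/build_bug.h>

typedef struct __call_single_data call_single_data_t
	__aligned(sizeof(struct __call_single_data));

/* aligned(40) would be invalid; catch a non-power-of-2 size at build time */
static_assert((sizeof(struct __call_single_data) &
	       (sizeof(struct __call_single_data) - 1)) == 0,
	      "sizeof(struct __call_single_data) must be a power of 2");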
Best Regards,
Huang, Ying
> /* total number of cp
Eric Dumazet writes:
> On Wed, 2017-08-02 at 16:52 +0800, Huang, Ying wrote:
>> From: Huang Ying
>>
>> struct call_single_data is used in IPI to transfer information between
>> CPUs. Its size is bigger than sizeof(unsigned long) and less than
>> cache line si
Christopher Lameter writes:
> On Wed, 2 Aug 2017, Huang, Ying wrote:
>
>> --- a/include/linux/percpu.h
>> +++ b/include/linux/percpu.h
>> @@ -129,5 +129,8 @@ extern phys_addr_t per_cpu_ptr_to_phys(void *addr);
From: Huang Ying
To use the newly introduced alloc_percpu_aligned(), which can allocate
cache line size aligned percpu memory dynamically.
Signed-off-by: "Huang, Ying"
Cc: Joerg Roedel
Cc: io...@lists.linux-foundation.org
---
drivers/iommu/iova.c | 2 +-
1 file changed, 1 inser
From: Huang Ying
To allocate percpu memory that is aligned with the cache line size
dynamically. We can statically allocate percpu memory that is aligned
with the cache line size with DEFINE_PER_CPU_ALIGNED(), but we have no
corresponding API for dynamic allocation.
Signed-off-by: "Huang, Ying
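A hedged sketch of what such an API could look like, built on the
existing __alloc_percpu(size, align) (the exact proposed form may
differ):

#include <linux/percpu.h>
#include <linux/cache.h>

#define alloc_percpu_aligned(type)					\
	(typeof(type) __percpu *)__alloc_percpu(sizeof(type),		\
						cache_line_size())

Usage would mirror alloc_percpu(), e.g. a driver could do
rcache = alloc_percpu_aligned(struct iova_rcache);.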
From: Huang Ying
struct call_single_data is used in IPI to transfer information between
CPUs. Its size is bigger than sizeof(unsigned long) and less than
cache line size. Now, it is allocated without any alignment
requirement. This makes it possible for allocated call_single_data to
cross 2
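A small illustration of the problem (illustrative struct, not the
kernel's): a 32-byte object with no alignment can start at, say, offset
48 and span bytes 48..79, straddling two 64-byte cache lines; aligning
it to its power-of-2 size keeps it within one line.

struct csd_example {
	void *llist;
	void (*func)(void *);
	void *info;
	unsigned int flags;
} __attribute__((aligned(32)));	/* 28 bytes padded to 32, 32-byte aligned */

_Static_assert(sizeof(struct csd_example) == 32,
	       "a 32-aligned 32-byte object never crosses a 64-byte line");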
This looks like a false positive report, and it is not reported by my
compiler or the kbuild compiler (gcc-6). But anyway, we should silence it.
Best Regards,
Huang, Ying
-->8--
From 7a7ff76d7bcbd7affda169b29abcf3dafa38052e Mon Sep 17 00:00:00 2001
From: Huang Ying
Date: Tue, 1 Aug 2017
2cd503b4980b0afc ]---
> [ 113.341281] Kernel panic - not syncing: Fatal exception
> [ 113.347398] Kernel Offset: 0x700 from 0x8100 (relocation
> range: 0x8000-0xbfff)
Thanks for reporting! Did you test it on an HDD? I can reproduce this on
a
Hi, Rik,
Rik van Riel writes:
> On Tue, 2017-07-25 at 09:51 +0800, Huang, Ying wrote:
>> From: Huang Ying
>>
>> The swap cache stats could be gotten only via sysrq, which isn't
>> convenient in some situation. So the sysfs interface of swap cache
>> sta
Andrew Morton writes:
> On Tue, 25 Jul 2017 09:51:46 +0800 "Huang, Ying" wrote:
>
>> The swap cache stats could be gotten only via sysrq, which isn't
>> convenient in some situation. So the sysfs interface of swap cache
>> stats is added for that. T
Hi, Andrew,
Andrew Morton writes:
> On Tue, 25 Jul 2017 09:51:51 +0800 "Huang, Ying" wrote:
>
>> From: Huang Ying
>>
>> VMA based swap readahead will readahead the virtual pages that is
>> continuous in the virtual address space. While the original s
From: Huang Ying
The swap readahead is an important mechanism to reduce the swap-in
latency. Although a pure sequential memory access pattern isn't very
popular for anonymous memory, the space locality is still considered
valid.
In the original swap readahead implementation, the consec
From: Huang Ying
VMA based swap readahead will read ahead the virtual pages that are
contiguous in the virtual address space, while the original swap
readahead will read ahead the swap slots that are contiguous in the swap
device. Although VMA based swap readahead is more correct for the
swap
From: Huang Ying
The swap cache stats can be obtained only via sysrq, which isn't
convenient in some situations. So a sysfs interface for the swap cache
stats is added. The added sysfs directories/files are as
follows,
/sys/kernel/mm/swap
/sys/kernel/mm/swap/cache_find_total
/sys/k
From: Huang Ying
The sysfs interface to control the VMA based swap readahead is added
as follows,
/sys/kernel/mm/swap/vma_ra_enabled
Enable the VMA based swap readahead algorithm, or use the original
global swap readahead algorithm.
/sys/kernel/mm/swap/vma_ra_max_order
Set the max order of
From: Huang Ying
The statistics for total readahead pages and total readahead hits are
recorded and exported via the following sysfs interface.
/sys/kernel/mm/swap/ra_hits
/sys/kernel/mm/swap/ra_total
With them, the efficiency of the swap readahead could be measured, so
that the swap readahead
From: Huang Ying
In the original implementation, it is possible that the existing pages
in the swap cache (not newly readahead) could be marked as the
readahead pages. This will cause the statistics of swap readahead to be
wrong and influence the swap readahead algorithm too.
This is fixed via
readahead hit rate is high, which shows that the space
locality is still valid in some practical workloads.
Changelogs:
v3:
- Rebased on latest -mm tree
- Use percpu_counter for swap readahead statistics per Dave Hansen's comment.
Best Regards,
Huang, Ying
Steven Rostedt writes:
> On Mon, 24 Jul 2017 13:46:07 +0800
> "Huang, Ying" wrote:
>
>> Hi, Steven,
>>
>> We are working on parallelizing secondary CPU bootup. So we need to
>> measure the bootup time of secondary CPU, that is, measure time spen
early
(before core_initcall()?). So, do you think it is possible to use
ftrace to measure secondary CPU bootup time?
Thanks,
Huang, Ying
From: Huang Ying
PTE mapped THP (Transparent Huge Page) will be ignored when moving
memory cgroup charge. But for a THP which is in the swap cache, the
memory cgroup charge for the swap of a tail page may be moved in the
current implementation. That isn't correct, because the swap charge
for al
From: Huang Ying
In this patch, the splitting of a transparent huge page (THP) during
swap-out is delayed from after adding the THP to the swap cache until
after the swap-out finishes. After the patch, more operations for
anonymous THP reclaim, such as writing the THP to the swap device
From: Huang Ying
This patch makes mem_cgroup_swapout() work for the transparent huge
page (THP), which will move the memory cgroup charge from memory to
swap for a THP.
This will be used for THP swap support, where a THP may be
swapped out as a whole to a set of (HPAGE_PMD_NR) continuous
From: Huang Ying
After adding swapping out support for THP (Transparent Huge Page), it
is possible that a THP in the swap cache (partly swapped out) needs to be
split. To split such a THP, the swap cluster backing the THP needs to
be split too, that is, the CLUSTER_FLAG_HUGE flag needs to be cleared
From: Huang Ying
When swapping out THP (Transparent Huge Page), instead of swapping out
the THP as a whole, sometimes we have to fall back to splitting the THP
into normal pages before swapping it out, because no free swap clusters are
available, or the cgroup limit is exceeded, etc. To count the number of
the
From: Huang Ying
For a THP (Transparent Huge Page), tail_page->mem_cgroup is NULL. So
to check whether the page is charged already, we need to check the
head page. This was not an issue before because it was impossible for a
THP to be in the swap cache. But after we add delay
From: Huang Ying
It's hard to write a whole transparent huge page (THP) to a file
backed swap device during swapping out, and the file backed swap device
isn't very popular. So the huge cluster allocation for the file
backed swap device is disabled.
Signed-off-by: "Huang, Ying
From: Huang Ying
To support delaying the splitting of a THP (Transparent Huge Page)
until after it has been swapped out, we need to enhance the swap writing
code to write a THP as a whole. This will improve swap write IO performance. As
Ming Lei pointed out, this should be based on
multipage bvec support, which
From: Huang Ying
The .rw_page in struct block_device_operations is used by the swap
subsystem to read/write the page contents from/into the corresponding
swap slot in the swap device. To support the THP (Transparent Huge
Page) swap optimization, the .rw_page is enhanced to support
read/write
From: Huang Ying
After we support delaying THP (Transparent Huge Page) splitting until
after it has been swapped out, it is possible that some page table
mappings of the THP are turned into swap entries. So reuse_swap_page()
needs to check the swap count in addition to the map count as before. This patch done
From: Huang Ying
Hi, Andrew, could you help me to check whether the overall design is
reasonable?
Hi, Johannes and Minchan, thanks a lot for your review of the first
step of the THP swap optimization! Could you help me to review the
second step in this patchset?
Hi, Hugh, Shaohua, Minchan and
From: Huang Ying
The normal swap slot reclaiming can be done when the swap count
reaches SWAP_HAS_CACHE. But for the swap slot which is backing a THP,
all swap slots backing one THP must be reclaimed together, because the
swap slot may be used again when the THP is swapped out again later.
So
From: Huang Ying
Previously, swapcache_free_cluster() was used only in the error path of
shrink_page_list() to free the just-allocated swap cluster if
splitting the THP (Transparent Huge Page) fails. In this patch, it
is enhanced to clear the swap cache flag (SWAP_HAS_CACHE) for the swap
be onlined
alloc_swap_slot_cache()
mutex_lock(cache[B]->alloc_lock)
mutex_init(cache[B]->alloc_lock) !!!
The cache[B]->alloc_lock will be reinitialized while it is still held.
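The usual fix pattern for this (a hedged sketch; the exact upstream
change may differ) is to initialize the locks only once, so a later CPU
online can't reinitialize a mutex that may still be held:

	cache = &per_cpu(swp_slots, cpu);
	if (!cache->lock_initialized) {
		/* first-time setup only; safe across repeated CPU online */
		mutex_init(&cache->alloc_lock);
		spin_lock_init(&cache->free_lock);
		cache->lock_initialized = true;
	}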
Best Regards,
Huang, Ying
> Reported-by: Wenwei Tao
> Sign
Andrew Morton writes:
> On Fri, 23 Jun 2017 15:12:51 +0800 "Huang, Ying" wrote:
>
>> From: Huang Ying
>>
>> Hi, Andrew, could you help me to check whether the overall design is
>> reasonable?
>>
>> Hi, Johannes and Minchan, Thanks a lot
Dave Hansen writes:
> On 06/29/2017 06:44 PM, Huang, Ying wrote:
>>
>> static atomic_t swapin_readahead_hits = ATOMIC_INIT(4);
>> +static atomic_long_t swapin_readahead_hits_total = ATOMIC_INIT(0);
>> +static atomic_long_t swapin_readahead_total = AT
you have time to take a look at this patchset?
Best Regards,
Huang, Ying
[snip]
The swap readahead is an important mechanism to reduce the swap-in
latency. Although a pure sequential memory access pattern isn't very
popular for anonymous memory, the space locality is still considered
valid.
In the original swap readahead implementation, the consecutive blocks
in swap device ar
From: Huang Ying
The sysfs interface to control the VMA based swap readahead is added
as follows,
/sys/kernel/mm/swap/vma_ra_enabled
Enable the VMA based swap readahead algorithm, or use the original
global swap readahead algorithm.
/sys/kernel/mm/swap/vma_ra_max_order
Set the max order of
From: Huang Ying
The swap cache stats can be obtained only via sysrq, which isn't
convenient in some situations. So a sysfs interface for the swap cache
stats is added. The added sysfs directories/files are as
follows,
/sys/kernel/mm/swap
/sys/kernel/mm/swap/cache_find_total
/sys/k
From: Huang Ying
The swap readahead is an important mechanism to reduce the swap-in
latency. Although a pure sequential memory access pattern isn't very
popular for anonymous memory, the space locality is still considered
valid.
In the original swap readahead implementation, the consec
From: Huang Ying
VMA based swap readahead will read ahead the virtual pages that are
contiguous in the virtual address space, while the original swap
readahead will read ahead the swap slots that are contiguous in the swap
device. Although VMA based swap readahead is more correct for the
swap
From: Huang Ying
The statistics for total readahead pages and total readahead hits are
recorded and exported via the following sysfs interface.
/sys/kernel/mm/swap/ra_hits
/sys/kernel/mm/swap/ra_total
With them, the efficiency of the swap readahead could be measured, so
that the swap readahead
From: Huang Ying
In the original implementation, it is possible that the existing pages
in the swap cache (not newly readahead) could be marked as the
readahead pages. This will cause the statistics of swap readahead to be
wrong and influence the swap readahead algorithm too.
This is fixed via
80 SS:ESP: 0068:f54efd80
[ 10.670881] CR2: 001fe2b8
[ 10.671140] ---[ end trace f51518af57e6b531 ]---
I think this comes from a signed vs. unsigned int comparison on
i386. The gcc version is,
gcc (Debian 6.3.0-18) 6.3.0 20170516
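A tiny standalone illustration of that pitfall (an example, not the
actual kernel code):

#include <stdio.h>

int main(void)
{
	int avail = -1;		/* signed: "nothing available" */
	unsigned int want = 1;

	/*
	 * avail is converted to unsigned for the comparison, becoming
	 * UINT_MAX, so the test goes the unexpected way.
	 */
	if (avail < want)
		printf("less\n");	/* not reached */
	else
		printf("not less\n");	/* printed */
	return 0;
}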
Best Regards,
Huang, Ying
From: Huang Ying
To support delaying the splitting of a THP (Transparent Huge Page)
until after it has been swapped out, we need to enhance the swap writing
code to write a THP as a whole. This will improve swap write IO performance. As
Ming Lei pointed out, this should be based on
multipage bvec support, which
From: Huang Ying
The normal swap slot reclaiming can be done when the swap count
reaches SWAP_HAS_CACHE. But for the swap slot which is backing a THP,
all swap slots backing one THP must be reclaimed together, because the
swap slot may be used again when the THP is swapped out again later.
So
From: Huang Ying
Previously, swapcache_free_cluster() was used only in the error path of
shrink_page_list() to free the just-allocated swap cluster if
splitting the THP (Transparent Huge Page) fails. In this patch, it
is enhanced to clear the swap cache flag (SWAP_HAS_CACHE) for the swap
From: Huang Ying
The .rw_page in struct block_device_operations is used by the swap
subsystem to read/write the page contents from/into the corresponding
swap slot in the swap device. To support the THP (Transparent Huge
Page) swap optimization, the .rw_page is enhanced to support
read/write
From: Huang Ying
In this patch, the splitting of a transparent huge page (THP) during
swap-out is delayed from after adding the THP to the swap cache until
after the swap-out finishes. After the patch, more operations for
anonymous THP reclaim, such as writing the THP to the swap device
From: Huang Ying
Hi, Andrew, could you help me to check whether the overall design is
reasonable?
Hi, Johannes and Minchan, thanks a lot for your review of the first
step of the THP swap optimization! Could you help me to review the
second step in this patchset?
Hi, Hugh, Shaohua, Minchan and
From: Huang Ying
After we support delaying THP (Transparent Huge Page) splitting until
after it has been swapped out, it is possible that some page table
mappings of the THP are turned into swap entries. So reuse_swap_page()
needs to check the swap count in addition to the map count as before. This patch done
From: Huang Ying
This patch makes mem_cgroup_swapout() work for the transparent huge
page (THP), which will move the memory cgroup charge from memory to
swap for a THP.
This will be used for THP swap support, where a THP may be
swapped out as a whole to a set of (HPAGE_PMD_NR) continuous
From: Huang Ying
For a THP (Transparent Huge Page), tail_page->mem_cgroup is NULL. So
to check whether the page is charged already, we need to check the
head page. This was not an issue before because it was impossible for a
THP to be in the swap cache. But after we add delay
From: Huang Ying
After adding swapping out support for THP (Transparent Huge Page), it
is possible that a THP in the swap cache (partly swapped out) needs to be
split. To split such a THP, the swap cluster backing the THP needs to
be split too, that is, the CLUSTER_FLAG_HUGE flag needs to be cleared