Re: [PATCH v3 1/4] mm/swapfile: use percpu_ref to serialize against concurrent swapoff

2021-04-20 Thread Huang, Ying
tatic void _enable_swap_info(struct swap_info_struct *p) > { > - p->flags |= SWP_WRITEOK | SWP_VALID; > + p->flags |= SWP_WRITEOK; > atomic_long_add(p->pages, &nr_swap_pages); > total_swap_pages += p->pages; > > @@ -2497,10 +2506,9 @@ static void

Re: [PATCH v3 4/4] mm/shmem: fix shmem_swapin() race with swapoff

2021-04-20 Thread Huang, Ying
}; > > + /* Prevent swapoff from happening to us. */ > + si = get_swap_device(swap); It would be better to put get/put_swap_device() in shmem_swapin_page(); that makes it possible for us to remove get/put_swap_device() from lookup_swap_cache(). Best Regards, Huang, Ying > + if (unlikel

Re: [PATCH v3 3/4] mm/swap: remove confusing checking for non_swap_entry() in swap_ra_info()

2021-04-20 Thread Huang, Ying
race isn't important because it will not cause problem. Best Regards, Huang, Ying > But the swap_entry > isn't used in this function and we will have enough checking when we really > operate the PTE entries later. So checking for non_swap_entry() is not > really needed here and shou

Re: [PATCH v3 2/4] swap: fix do_swap_page() race with swapoff

2021-04-20 Thread Huang, Ying
y > done when system shutdown only. To reduce the performance overhead on the > hot-path as much as possible, it appears we can use the percpu_ref to close > this race window(as suggested by Huang, Ying). This needs to be revised too. Unless you squash 1/4 and 2/4. > Fixes: 0bcac06

Re: [PATCH v2 1/5] mm/swapfile: add percpu_ref support for swap

2021-04-19 Thread Huang, Ying
Miaohe Lin writes: > On 2021/4/19 15:09, Huang, Ying wrote: >> Miaohe Lin writes: >> >>> On 2021/4/19 10:48, Huang, Ying wrote: >>>> Miaohe Lin writes: >>>> >>>>> We will use percpu-refcount to serialize against concurrent

Re: [PATCH v2 5/5] mm/shmem: fix shmem_swapin() race with swapoff

2021-04-19 Thread Huang, Ying
Miaohe Lin writes: > On 2021/4/19 15:04, Huang, Ying wrote: >> Miaohe Lin writes: >> >>> On 2021/4/19 10:15, Huang, Ying wrote: >>>> Miaohe Lin writes: >>>> >>>>> When I was investigating the swap code,

Re: [PATCH v2 1/5] mm/swapfile: add percpu_ref support for swap

2021-04-19 Thread Huang, Ying
Miaohe Lin writes: > On 2021/4/19 10:48, Huang, Ying wrote: >> Miaohe Lin writes: >> >>> We will use percpu-refcount to serialize against concurrent swapoff. This >>> patch adds the percpu_ref support for swap. >>> >>> Signed-off-by:

Re: [PATCH v2 5/5] mm/shmem: fix shmem_swapin() race with swapoff

2021-04-19 Thread Huang, Ying
Miaohe Lin writes: > On 2021/4/19 10:15, Huang, Ying wrote: >> Miaohe Lin writes: >> >>> When I was investigating the swap code, I found the below possible race >>> window: >>> >>> CP

Re: [PATCH v2 2/5] mm/swapfile: use percpu_ref to serialize against concurrent swapoff

2021-04-18 Thread Huang, Ying
es, &nr_swap_pages); > total_swap_pages += p->pages; > > @@ -2507,7 +2504,7 @@ static void enable_swap_info(struct swap_info_struct > *p, int prio, > spin_unlock(&swap_lock); > /* > * Guarantee swap_map, cluster_info, etc. fields are valid >

Re: [PATCH v2 1/5] mm/swapfile: add percpu_ref support for swap

2021-04-18 Thread Huang, Ying
lags) > { > struct swap_info_struct *p; > - struct filename *name; > + struct filename *name = NULL; > struct file *swap_file = NULL; > struct address_space *mapping; > int prio; > @@ -3163,6 +3179,15 @@ SYSCALL_DEFINE2(swapon, const char __user *

Re: [PATCH v2 3/5] swap: fix do_swap_page() race with swapoff

2021-04-18 Thread Huang, Ying
is usually > done when system shutdown only. To reduce the performance overhead on the > hot-path as much as possible, it appears we can use the percpu_ref to close > this race window(as suggested by Huang, Ying). I still suggest to squash PATCH 1-3, at least PATCH 1-2. That will change th

Re: [PATCH v2 5/5] mm/shmem: fix shmem_swapin() race with swapoff

2021-04-18 Thread Huang, Ying
node *inode = si->swap_file->f_mapping->host;[oops!] > > Close this race window by using get/put_swap_device() to guard against > concurrent swapoff. > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") No. This isn't the commit that introduces the race condition

Re: [PATCH v2 4/5] mm/swap: remove confusing checking for non_swap_entry() in swap_ra_info()

2021-04-18 Thread Huang, Ying
-blame to find out it. The patch itself looks good to me. Best Regards, Huang, Ying > Signed-off-by: Miaohe Lin > --- > mm/swap_state.c | 6 -- > 1 file changed, 6 deletions(-) > > diff --git a/mm/swap_state.c b/mm/swap_state.c > index 272ea2108c9d..df5405384520 100644 > ---

Re: [PATCH 1/5] mm/swapfile: add percpu_ref support for swap

2021-04-16 Thread Huang, Ying
Miaohe Lin writes: > On 2021/4/15 22:31, Dennis Zhou wrote: >> On Thu, Apr 15, 2021 at 01:24:31PM +0800, Huang, Ying wrote: >>> Dennis Zhou writes: >>> >>>> On Wed, Apr 14, 2021 at 01:44:58PM +0800, Huang, Ying wrote: >>>>> Dennis Zhou w

Re: [RFC PATCH] percpu_ref: Make percpu_ref_tryget*() ACQUIRE operations

2021-04-16 Thread Huang, Ying
Kent Overstreet writes: > On Thu, Apr 15, 2021 at 09:42:56PM -0700, Paul E. McKenney wrote: >> On Tue, Apr 13, 2021 at 10:47:03AM +0800, Huang Ying wrote: >> > One typical use case of percpu_ref_tryget() family functions is as >> > follows, >> >

Re: [PATCH 1/5] mm/swapfile: add percpu_ref support for swap

2021-04-15 Thread Huang, Ying
Dennis Zhou writes: > On Thu, Apr 15, 2021 at 01:24:31PM +0800, Huang, Ying wrote: >> Dennis Zhou writes: >> >> > On Wed, Apr 14, 2021 at 01:44:58PM +0800, Huang, Ying wrote: >> >> Dennis Zhou writes: >> >> >> >> > On Wed, Apr 1

Re: [PATCH v2 00/16] Multigenerational LRU Framework

2021-04-15 Thread Huang, Ying
ning and rmap scanning in the page reclaiming. For example, if the working set changes, we can take advantage of the fast page table scanning to identify the new working set quickly, while we can fall back to the rmap scanning if the page table scanning doesn't help. Best Regards, Huang, Ying

Re: [v2 PATCH 6/7] mm: migrate: check mapcount for THP instead of ref count

2021-04-15 Thread Huang, Ying
"Zi Yan" writes: > On 13 Apr 2021, at 23:00, Huang, Ying wrote: > >> Yang Shi writes: >> >>> The generic migration path will check refcount, so no need check refcount >>> here. >>> But the old code actually prevents from migrating shared T

Re: [PATCH 1/5] mm/swapfile: add percpu_ref support for swap

2021-04-14 Thread Huang, Ying
Dennis Zhou writes: > On Wed, Apr 14, 2021 at 01:44:58PM +0800, Huang, Ying wrote: >> Dennis Zhou writes: >> >> > On Wed, Apr 14, 2021 at 11:59:03AM +0800, Huang, Ying wrote: >> >> Dennis Zhou writes: >> >> >> >> > Hello, >>

Re: [PATCH v2 00/16] Multigenerational LRU Framework

2021-04-14 Thread Huang, Ying
Yu Zhao writes: > On Wed, Apr 14, 2021 at 12:15 AM Huang, Ying wrote: >> >> Yu Zhao writes: >> >> > On Tue, Apr 13, 2021 at 8:30 PM Rik van Riel wrote: >> >> >> >> On Wed, 2021-04-14 at 09:14 +1000, Dave Chinner wrote: >> >

Re: [PATCH v2 00/16] Multigenerational LRU Framework

2021-04-14 Thread Huang, Ying
rmap, we need to > scan a lot of pages anyway. Why not just scan them all? This may not be the case. For rmap scanning, it's possible to scan only a small portion of memory. But with the page table scanning, you need to scan almost all of it (I understand you have some optimizations as above). As Rik showed i

Re: [PATCH 1/5] mm/swapfile: add percpu_ref support for swap

2021-04-13 Thread Huang, Ying
Dennis Zhou writes: > On Wed, Apr 14, 2021 at 11:59:03AM +0800, Huang, Ying wrote: >> Dennis Zhou writes: >> >> > Hello, >> > >> > On Wed, Apr 14, 2021 at 10:06:48AM +0800, Huang, Ying wrote: >> >> Miaohe Lin writes: >> >>

Re: [PATCH 1/5] mm/swapfile: add percpu_ref support for swap

2021-04-13 Thread Huang, Ying
Dennis Zhou writes: > Hello, > > On Wed, Apr 14, 2021 at 10:06:48AM +0800, Huang, Ying wrote: >> Miaohe Lin writes: >> >> > On 2021/4/14 9:17, Huang, Ying wrote: >> >> Miaohe Lin writes: >> >> >> >>>

Re: [PATCH 2/5] swap: fix do_swap_page() race with swapoff

2021-04-13 Thread Huang, Ying
Miaohe Lin writes: > On 2021/4/13 9:27, Huang, Ying wrote: >> Miaohe Lin writes: >> >>> When I was investigating the swap code, I found the below possible race >>> window: >>> >>>

Re: [v2 PATCH 6/7] mm: migrate: check mapcount for THP instead of ref count

2021-04-13 Thread Huang, Ying
us from migrating shared THP? If no, why not just remove the old refcount checking? Best Regards, Huang, Ying > Signed-off-by: Yang Shi > --- > mm/migrate.c | 16 > 1 file changed, 4 insertions(+), 12 deletions(-) > > diff --git a/mm/migrate.c b/mm/migrate.

Re: [v2 PATCH 3/7] mm: thp: refactor NUMA fault handling

2021-04-13 Thread Huang, Ying
/mm/huge_memory.c > @@ -1418,93 +1418,21 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf) > { > struct vm_area_struct *vma = vmf->vma; > pmd_t pmd = vmf->orig_pmd; > - struct anon_vma *anon_vma = NULL; > + pmd_t oldpmd; nit: the usage of oldpmd and

Re: [PATCH 1/5] mm/swapfile: add percpu_ref support for swap

2021-04-13 Thread Huang, Ying
Miaohe Lin writes: > On 2021/4/14 9:17, Huang, Ying wrote: >> Miaohe Lin writes: >> >>> On 2021/4/12 15:24, Huang, Ying wrote: >>>> "Huang, Ying" writes: >>>> >>>>> Miaohe Lin writes: >>>>> >>>

Re: [PATCH 1/5] mm/swapfile: add percpu_ref support for swap

2021-04-13 Thread Huang, Ying
Miaohe Lin writes: > On 2021/4/12 15:24, Huang, Ying wrote: >> "Huang, Ying" writes: >> >>> Miaohe Lin writes: >>> >>>> We will use percpu-refcount to serialize against concurrent swapoff. This >>>> patch adds the perc

Re: [PATCH 2/5] swap: fix do_swap_page() race with swapoff

2021-04-13 Thread Huang, Ying
Tim Chen writes: > On 4/12/21 6:27 PM, Huang, Ying wrote: > >> >> This isn't the commit that introduces the race. You can use `git blame` >> find out the correct commit. For this it's commit 0bcac06f27d7 "mm, >> swap: skip swapcache for swapin of synchr

Re: [RFC] mm: activate access-more-than-once page via NUMA balancing

2021-04-12 Thread Huang, Ying
Yu Zhao writes: > On Fri, Mar 26, 2021 at 12:21 AM Huang, Ying wrote: >> >> Mel Gorman writes: >> >> > On Thu, Mar 25, 2021 at 12:33:45PM +0800, Huang, Ying wrote: >> >> > I caution against this patch. >> >> > >> >>

Re: [PATCH v1 09/14] mm: multigenerational lru: mm_struct list

2021-04-12 Thread Huang, Ying
Yu Zhao writes: > On Wed, Mar 24, 2021 at 12:58 AM Huang, Ying wrote: >> >> Yu Zhao writes: >> >> > On Mon, Mar 22, 2021 at 11:13:19AM +0800, Huang, Ying wrote: >> >> Yu Zhao writes: >> >> >> >> > On Wed, Mar 17,

Re: [PATCH v1 00/14] Multigenerational LRU

2021-04-12 Thread Huang, Ying
gle-page VMAs, i.e., not returning to the PGD table for each > of such VMAs. Just a heads-up. > > The rmap, on the other hand, had to > 1) lock each (shmem) page it scans > 2) go through five levels of page tables for each page, even though > some of them have the same LCAs > during the test. The second part is worse given that I have 5 levels > of page tables configured. > > Any additional benchmarks you would suggest? Thanks. Hi, Yu, Thanks for your data. In addition to the data you measured above, is it possible for you to measure some raw data? For example, how many CPU cycles does it take to scan all pages in the system? For the page table scanning, the page tables of all processes will be scanned. For the rmap scanning, all pages in the LRU will be scanned. And we can do that with different parameters, for example, shared vs. non-shared, sparse vs. dense. Then we can get an idea about how fast the page table scanning can be. Best Regards, Huang, Ying

[RFC PATCH] percpu_ref: Make percpu_ref_tryget*() ACQUIRE operations

2021-04-12 Thread Huang Ying
rom the other fields may be invalid or inconsistent. To guarantee the correct memory ordering, the percpu_ref_tryget*() functions need to be ACQUIRE operations. This is implemented by using smp_load_acquire() in __ref_is_percpu() to read the percpu pointer. Signed-off-by: "Huang, Ying"
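The ordering requirement described above can be illustrated in user space. The following is only a minimal sketch using C11 atomics, not the kernel's percpu_ref code: `struct demo_obj`, `demo_publish()` and `demo_tryget()` are hypothetical names, and `memory_order_acquire` / `memory_order_release` stand in for the kernel's smp_load_acquire() and a matching release-ordered store on the writer side.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical object: "payload" may only be read after "live" is observed. */
struct demo_obj {
    int payload;
    atomic_bool live;
};

static void demo_obj_init(struct demo_obj *o)
{
    o->payload = 0;
    atomic_init(&o->live, false);
}

/* Writer: fill in the other fields first, then publish with RELEASE ordering. */
static void demo_publish(struct demo_obj *o, int value)
{
    o->payload = value;
    atomic_store_explicit(&o->live, true, memory_order_release);
}

/*
 * Reader: the ACQUIRE load pairs with the RELEASE store above, so a reader
 * that sees live == true also sees the payload written before publication.
 * With a relaxed load, the payload read could be reordered before the check.
 */
static bool demo_tryget(struct demo_obj *o, int *out)
{
    if (!atomic_load_explicit(&o->live, memory_order_acquire))
        return false;
    *out = o->payload;
    return true;
}
```

A caller would treat a false return from `demo_tryget()` as "not published yet" and must not touch the other fields in that case.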

Re: [RFC PATCH v1 00/11] Manage the top tier memory in a tiered memory

2021-04-12 Thread Huang, Ying
ier 0 memory used by the cgroup exceeds > this high > boundary, allocation of tier 0 memory by the cgroup will > be throttled. The tier 0 memory used by this cgroup > will also be subjected to heavy demotion. I think we

Re: [PATCH 5/5] mm/swap_state: fix swap_cluster_readahead() race with swapoff

2021-04-12 Thread Huang, Ying
p_page() has been fixed. We need to fix shmem_swapin(). Best Regards, Huang, Ying > Signed-off-by: Miaohe Lin > --- > mm/swap_state.c | 11 +-- > 1 file changed, 9 insertions(+), 2 deletions(-) > > diff --git a/mm/swap_state.c b/mm/swap_state.c > index 3bf0d0c297b

Re: [PATCH 3/5] mm/swap_state: fix get_shadow_from_swap_cache() race with swapoff

2021-04-12 Thread Huang, Ying
essary. The only caller has already guarded the swap device against swapoff. Best Regards, Huang, Ying > --- > mm/swap_state.c | 9 ++--- > 1 file changed, 6 insertions(+), 3 deletions(-) > > diff --git a/mm/swap_state.c b/mm/swap_state.c > index 272ea2108c9d..709c260d644a 100644

Re: [PATCH 2/5] swap: fix do_swap_page() race with swapoff

2021-04-12 Thread Huang, Ying
e overhead on the > hot-path as much as possible, it appears we can use the percpu_ref to close > this race window(as suggested by Huang, Ying). > > Fixes: 235b62176712 ("mm/swap: add cluster lock") This isn't the commit that introduces the race. You can use `git blame` find

Re: [PATCH 1/5] mm/swapfile: add percpu_ref support for swap

2021-04-12 Thread Huang, Ying
"Huang, Ying" writes: > Miaohe Lin writes: > >> We will use percpu-refcount to serialize against concurrent swapoff. This >> patch adds the percpu_ref support for later fixup. >> >> Signed-off-by: Miaohe Lin >> --- >> includ

Re: [PATCH 1/5] mm/swapfile: add percpu_ref support for swap

2021-04-11 Thread Huang, Ying
pecialfile); > if (IS_ERR(name)) { > error = PTR_ERR(name); > @@ -3356,6 +3374,7 @@ SYSCALL_DEFINE2(swapon, const char __user *, > specialfile, int, swap_flags) > bad_swap_unlock_inode: > inode_unlock(inode); > bad_swap: > + percpu_ref_exit(&p->users); Usually resources are freed in the reverse of the order in which they were allocated. So, if there's no special reason, please follow that rule. Best Regards, Huang, Ying > free_percpu(p->percpu_cluster); > p->percpu_cluster = NULL; > free_percpu(p->cluster_next_cpu);
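The reverse-order rule mentioned in the review can be sketched with the usual goto-unwind idiom. This is a user-space illustration under stated assumptions, not the swapon() code: `struct demo_info`, its three fields, and the `demo_*` helpers are hypothetical, and malloc()/free() stand in for the kernel allocators.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical structure owning three resources, allocated in this order. */
struct demo_info {
    int *users;
    int *cluster;
    int *next_cpu;
};

/* Records the order of releases so that it can be inspected afterwards. */
static char demo_release_log[4];
static int demo_release_pos;

static void demo_release(int **slot, char tag)
{
    free(*slot);
    *slot = NULL;
    if (demo_release_pos < (int)sizeof(demo_release_log) - 1)
        demo_release_log[demo_release_pos++] = tag;
}

static void demo_log_reset(void)
{
    memset(demo_release_log, 0, sizeof(demo_release_log));
    demo_release_pos = 0;
}

/*
 * Allocate users, cluster, next_cpu. On failure, each label undoes exactly
 * the allocations made so far, in reverse order. A nonzero fail_late
 * simulates an error after all three allocations succeeded.
 */
static int demo_setup(struct demo_info *p, int fail_late)
{
    p->users = p->cluster = p->next_cpu = NULL;
    demo_log_reset();

    p->users = malloc(sizeof(int));
    if (!p->users)
        goto err;
    p->cluster = malloc(sizeof(int));
    if (!p->cluster)
        goto err_users;
    p->next_cpu = malloc(sizeof(int));
    if (!p->next_cpu)
        goto err_cluster;
    if (fail_late)
        goto err_next_cpu;
    return 0;

err_next_cpu:
    demo_release(&p->next_cpu, 'n');
err_cluster:
    demo_release(&p->cluster, 'c');
err_users:
    demo_release(&p->users, 'u');
err:
    return -1;
}

/* Normal teardown follows the same reverse order as the error path. */
static void demo_teardown(struct demo_info *p)
{
    demo_log_reset();
    demo_release(&p->next_cpu, 'n');
    demo_release(&p->cluster, 'c');
    demo_release(&p->users, 'u');
}
```

Keeping the labels in strict reverse order means a new resource can be added by inserting one allocation and one label, without auditing the rest of the error path.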

Re: [PATCH 2/5] swap: fix do_swap_page() race with swapoff

2021-04-11 Thread Huang, Ying
.. >>p->swap_file >> = NULL; >> struct file *swap_file = sis->swap_file; >> struct address_space *mapping = swap_file->f_mapping;[oops!] >> ... >> ... >> > > Agree. This is also what I meant to illustrate. And you provide a better one. > Many thanks! For the pages that are swapped in through the swap cache, that isn't an issue. Because the page is locked, the swap entry will be marked with SWAP_HAS_CACHE, so swapoff() cannot proceed until the page has been unlocked. So the race is for the fast path as follows, if (data_race(si->flags & SWP_SYNCHRONOUS_IO) && __swap_count(entry) == 1) I found it in your original patch description. But please make it more explicit to reduce the potential confusion. Best Regards, Huang, Ying

Re: [PATCH 4/5] mm/swap_state: fix potential faulted in race in swap_ra_info()

2021-04-11 Thread Huang, Ying
Miaohe Lin writes: > On 2021/4/9 16:50, Huang, Ying wrote: >> Miaohe Lin writes: >> >>> While we released the pte lock, somebody else might faulted in this pte. >>> So we should check whether it's swap pte first to guard against such race >>> or swp

Re: [PATCH 4/5] mm/swap_state: fix potential faulted in race in swap_ra_info()

2021-04-09 Thread Huang, Ying
l issue. entry or swap_entry isn't used in this function. And we have enough checking when we really operate on the PTE entries later. But I admit it's confusing. So I suggest just removing the check. We will check it when necessary. Best Regards, Huang, Ying

Re: [RFC PATCH v1 00/11] Manage the top tier memory in a tiered memory

2021-04-08 Thread Huang, Ying
secase which divides DRAM:PMEM ratio for different jobs or memcgs > when I was with Alibaba. > > In the first place I thought about per NUMA node limit, but it was > very hard to configure it correctly for users unless you know exactly > about your memory usage and hot/cold memory distribution. > > I'm wondering, just off the top of my head, if we could extend the > semantic of low and min limit. For example, just redefine low and min > to "the limit on top tier memory". Then we could have low priority > jobs have 0 low/min limit. Per my understanding, memory.low/min are for the memory protection instead of the memory limiting. memory.high is for the memory limiting. Best Regards, Huang, Ying

Re: [PATCH -V2] NUMA balancing: reduce TLB flush via delaying mapping on hint page fault

2021-04-08 Thread Huang, Ying
Mel Gorman writes: > On Fri, Apr 02, 2021 at 04:27:17PM +0800, Huang Ying wrote: >> With NUMA balancing, in hint page fault handler, the faulting page >> will be migrated to the accessing node if necessary. During the >> migration, TLB will be shot down on all CPUs tha

[PATCH -V3] NUMA balancing: reduce TLB flush via delaying mapping on hint page fault

2021-04-08 Thread Huang Ying
) with about 9.2e6 pages (35.8GB) migrated. From the perf profile, it can be found that the CPU cycles spent by try_to_unmap() and its callees reduces from 6.02% to 0.47%. That is, the CPU cycles spent by TLB shooting down decreases greatly. Signed-off-by: "Huang, Ying" Reviewed-by: Mel

[PATCH -V2] NUMA balancing: reduce TLB flush via delaying mapping on hint page fault

2021-04-02 Thread Huang Ying
) with about 9.2e6 pages (35.8GB) migrated. From the perf profile, it can be found that the CPU cycles spent by try_to_unmap() and its callees reduces from 6.02% to 0.47%. That is, the CPU cycles spent by TLB shooting down decreases greatly. Signed-off-by: "Huang, Ying" Cc: Peter Zijlstr

Re: [RFC] NUMA balancing: reduce TLB flush via delaying mapping on hint page fault

2021-03-31 Thread Huang, Ying
Mel Gorman writes: > On Wed, Mar 31, 2021 at 07:20:09PM +0800, Huang, Ying wrote: >> Mel Gorman writes: >> >> > On Mon, Mar 29, 2021 at 02:26:51PM +0800, Huang Ying wrote: >> >> For NUMA balancing, in hint page fault handler, the faulting page will >

Re: [RFC] NUMA balancing: reduce TLB flush via delaying mapping on hint page fault

2021-03-31 Thread Huang, Ying
Mel Gorman writes: > On Mon, Mar 29, 2021 at 02:26:51PM +0800, Huang Ying wrote: >> For NUMA balancing, in hint page fault handler, the faulting page will >> be migrated to the accessing node if necessary. During the migration, >> TLB will be shot down on all CPUs tha

Re: [Question] Is there a race window between swapoff vs synchronous swap_readpage

2021-03-30 Thread Huang, Ying
Yu Zhao writes: > On Mon, Mar 29, 2021 at 9:44 PM Huang, Ying wrote: >> >> Miaohe Lin writes: >> >> > On 2021/3/30 9:57, Huang, Ying wrote: >> >> Hi, Miaohe, >> >> >> >> Miaohe Lin writes: >> >> >> >

Re: [Question] Is there a race window between swapoff vs synchronous swap_readpage

2021-03-29 Thread Huang, Ying
Miaohe Lin writes: > On 2021/3/30 9:57, Huang, Ying wrote: >> Hi, Miaohe, >> >> Miaohe Lin writes: >> >>> Hi all, >>> I am investigating the swap code, and I found the below possible race >>> window: >>> >>

Re: [Question] Is there a race window between swapoff vs synchronous swap_readpage

2021-03-29 Thread Huang, Ying
would be really grateful. Thanks! :) This appears possible. Even for the swap cache case, we can't guarantee that the swap entry read from the page table is always valid either. The underlying swap device can be swapped off at the same time. So we use get/put_swap_device() for that. Maybe we need something similar here. Best Regards, Huang, Ying
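The get/put guard idea mentioned above can be modelled in user space as follows. This is a rough sketch only: it uses a single C11 atomic counter with a "dead" bit rather than the kernel's percpu_ref, and `struct demo_dev`, `demo_get()`, `demo_put()` and `demo_off()` are hypothetical names, not the swap code's API. It assumes `demo_off()` is called at most once.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_DEAD (1u << 31)  /* set once teardown has started */

/* Hypothetical device: "backing" may only be used while a reference is held. */
struct demo_dev {
    atomic_uint refs;     /* user count in the low bits, DEMO_DEAD on top */
    const char *backing;  /* cleared when the last user is gone */
};

static void demo_dev_init(struct demo_dev *d, const char *backing)
{
    atomic_init(&d->refs, 0u);
    d->backing = backing;
}

/* Take a reference unless teardown has already started. */
static bool demo_get(struct demo_dev *d)
{
    unsigned int v = atomic_load(&d->refs);

    while (!(v & DEMO_DEAD)) {
        /* On failure, v is reloaded and the dead bit is checked again. */
        if (atomic_compare_exchange_weak(&d->refs, &v, v + 1))
            return true;
    }
    return false;
}

/* Drop a reference; the last user after teardown releases the resource. */
static void demo_put(struct demo_dev *d)
{
    if (atomic_fetch_sub(&d->refs, 1u) == (DEMO_DEAD | 1u))
        d->backing = NULL;
}

/* Start teardown: refuse new users, release now if nobody holds a reference. */
static void demo_off(struct demo_dev *d)
{
    if (atomic_fetch_or(&d->refs, DEMO_DEAD) == 0u)
        d->backing = NULL;
}
```

A reader brackets every access to `backing` with `demo_get()` and `demo_put()`, and treats a failed `demo_get()` as "the device is going away".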

Re: [PATCH 4/6] mm: thp: refactor NUMA fault handling

2021-03-29 Thread Huang, Ying
implementation and your new implementation. Originally, PMD is restored after trying to migrate the misplaced THP. I think this can reduce the TLB shooting-down IPI. Best Regards, Huang, Ying > In the old code anon_vma lock was needed to serialize THP migration > against THP split, but si

Re: [PATCH 3/6] mm: migrate: teach migrate_misplaced_page() about THP

2021-03-29 Thread Huang, Ying
; --- a/mm/migrate.c > +++ b/mm/migrate.c > @@ -2127,7 +2127,7 @@ static inline bool is_shared_exec_page(struct > vm_area_struct *vma, > * the page that will be dropped by this function before returning. > */ > int migrate_misplaced_page(struct page *page, struct vm_area_struc

[RFC] NUMA balancing: reduce TLB flush via delaying mapping on hint page fault

2021-03-29 Thread Huang Ying
inaccessible. But the difference between the accessible window is small. Because the page will be made inaccessible soon for migrating. Signed-off-by: "Huang, Ying" Cc: Peter Zijlstra Cc: Mel Gorman Cc: Peter Xu Cc: Johannes Weiner Cc: Vlastimil Babka Cc: "Matthew Wilcox"

Re: [RFC] mm: activate access-more-than-once page via NUMA balancing

2021-03-26 Thread Huang, Ying
Mel Gorman writes: > On Thu, Mar 25, 2021 at 12:33:45PM +0800, Huang, Ying wrote: >> > I caution against this patch. >> > >> > It's non-deterministic for a number of reasons. As it requires NUMA >> > balancing to be enabled, the pageout behaviour of a

Re: [RFC] mm: activate access-more-than-once page via NUMA balancing

2021-03-24 Thread Huang, Ying
Hi, Mel, Thanks for comment! Mel Gorman writes: > On Wed, Mar 24, 2021 at 04:32:09PM +0800, Huang Ying wrote: >> One idea behind the LRU page reclaiming algorithm is to put the >> access-once pages in the inactive list and access-more-than-once pages >> in the activ

[RFC] mm: activate access-more-than-once page via NUMA balancing

2021-03-24 Thread Huang Ying
and cold pages. But generally, I don't think it is a good idea to improve the performance via increasing the system overhead purely. Signed-off-by: "Huang, Ying" Inspired-by: Yu Zhao Cc: Hillf Danton Cc: Johannes Weiner Cc: Joonsoo Kim Cc: Matthew Wilcox Cc: Mel Gorman Cc: Michal

Re: [PATCH v1 09/14] mm: multigenerational lru: mm_struct list

2021-03-24 Thread Huang, Ying
Yu Zhao writes: > On Mon, Mar 22, 2021 at 11:13:19AM +0800, Huang, Ying wrote: >> Yu Zhao writes: >> >> > On Wed, Mar 17, 2021 at 11:37:38AM +0800, Huang, Ying wrote: >> >> Yu Zhao writes: >> >> >> >> > On Tue, Mar 16, 20

Re: [PATCH v1 09/14] mm: multigenerational lru: mm_struct list

2021-03-21 Thread Huang, Ying
Yu Zhao writes: > On Wed, Mar 17, 2021 at 11:37:38AM +0800, Huang, Ying wrote: >> Yu Zhao writes: >> >> > On Tue, Mar 16, 2021 at 02:44:31PM +0800, Huang, Ying wrote: >> > The scanning overhead is only one of the two major problems of the >> >

Re: [PATCH v1 09/14] mm: multigenerational lru: mm_struct list

2021-03-16 Thread Huang, Ying
Yu Zhao writes: > On Tue, Mar 16, 2021 at 02:44:31PM +0800, Huang, Ying wrote: >> Yu Zhao writes: >> >> > On Tue, Mar 16, 2021 at 10:07:36AM +0800, Huang, Ying wrote: >> >> Rik van Riel writes: >> >> >>

Re: [PATCH v1 10/14] mm: multigenerational lru: core

2021-03-16 Thread Huang, Ying
Yu Zhao writes: > On Tue, Mar 16, 2021 at 02:52:52PM +0800, Huang, Ying wrote: >> Yu Zhao writes: >> >> > On Tue, Mar 16, 2021 at 10:08:51AM +0800, Huang, Ying wrote: >> >> Yu Zhao writes: >> >> [snip] >> >> >> >>

Re: [PATCH v1 10/14] mm: multigenerational lru: core

2021-03-16 Thread Huang, Ying
Yu Zhao writes: > On Tue, Mar 16, 2021 at 10:08:51AM +0800, Huang, Ying wrote: >> Yu Zhao writes: >> [snip] >> >> > +/* Main function used by foreground, background and user-triggered aging. >> > */ >> > +static bool walk_mm_li

Re: [PATCH v1 09/14] mm: multigenerational lru: mm_struct list

2021-03-16 Thread Huang, Ying
Yu Zhao writes: > On Tue, Mar 16, 2021 at 10:07:36AM +0800, Huang, Ying wrote: >> Rik van Riel writes: >> >> > On Sat, 2021-03-13 at 00:57 -0700, Yu Zhao wrote: >> > >> >> +/* >> >> + * After pages are faulted in, they become the younge

Re: [PATCH v1 10/14] mm: multigenerational lru: core

2021-03-15 Thread Huang, Ying
ation of the function? And may be the number of mm_struct and the number of pages scanned. In comparison, in the traditional LRU algorithm, for each round, only a small subset of the whole physical memory is scanned. Best Regards, Huang, Ying > + > + if (!last) { > +

Re: [PATCH v1 09/14] mm: multigenerational lru: mm_struct list

2021-03-15 Thread Huang, Ying
scheduled after the previous scanning will not be scanned. I guess that this helps OOM kills? If so, how about just taking advantage of that information for OOM killing and page reclaiming? For example, if a process hasn't been scheduled for a long time, just reclaim its private pages. Best Regards, Huang, Ying

Re: [PATCH] vmscan: retry without cache trim mode if nothing scanned

2021-03-11 Thread Huang, Ying
Hi, Butt, Shakeel Butt writes: > On Wed, Mar 10, 2021 at 4:47 PM Huang, Ying wrote: >> >> From: Huang Ying >> >> In shrink_node(), to determine whether to enable cache trim mode, the >> LRU size is gotten via lruvec_page_state(). That gets th

[RFC -V6 5/6] memory tiering: rate limit NUMA migration throughput

2021-03-11 Thread Huang Ying
decreases 51.4% (from 213.0 MB/s to 103.6 MB/s) with the patch, while the benchmark score decreases only 1.8%. A new sysctl knob kernel.numa_balancing_rate_limit_mbps is added for the users to specify the limit. TODO: Add ABI document for new sysctl knob. Signed-off-by: "Huang, Ying"

[RFC -V6 6/6] memory tiering: adjust hot threshold automatically

2021-03-11 Thread Huang Ying
% with 32.4% fewer NUMA page migrations on a 2 socket Intel server with Optane DC Persistent Memory, because it improves the accuracy of the hot page selection. Signed-off-by: "Huang, Ying" Cc: Andrew Morton Cc: Michal Hocko Cc: Rik van Riel Cc: Mel Gorman Cc: Peter Zijlstra Cc: Ingo

[RFC -V6 4/6] memory tiering: hot page selection with hint page fault latency

2021-03-11 Thread Huang Ying
nse. - If fast response is more important for system performance, the administrator can set a higher hot threshold. Signed-off-by: "Huang, Ying" Cc: Andrew Morton Cc: Michal Hocko Cc: Rik van Riel Cc: Mel Gorman Cc: Peter Zijlstra Cc: Ingo Molnar Cc: Dave Hansen Cc: Dan Williams Cc:

[RFC -V6 3/6] memory tiering: skip to scan fast memory

2021-03-11 Thread Huang Ying
-by: "Huang, Ying" Suggested-by: Dave Hansen Cc: Andrew Morton Cc: Michal Hocko Cc: Rik van Riel Cc: Mel Gorman Cc: Peter Zijlstra Cc: Ingo Molnar Cc: Dan Williams Cc: linux-kernel@vger.kernel.org Cc: linux...@kvack.org --- mm/huge_memory.c | 30 +- mm/mprotect

[RFC -V6 0/6] NUMA balancing: optimize memory placement for memory tiering system

2021-03-11 Thread Huang Ying
cleanup. - Rebased on the latest page demotion patchset. v2: - Addressed comments for V1. - Rebased on v5.5. Huang Ying (6): NUMA balancing: optimize page placement for memory tiering system memory tiering: add page promotion counter memory tiering: skip to scan fast memory memo

[RFC -V6 1/6] NUMA balancing: optimize page placement for memory tiering system

2021-03-11 Thread Huang Ying
TODO: - Update ABI document: Documentation/sysctl/kernel.txt Signed-off-by: "Huang, Ying" Cc: Andrew Morton Cc: Michal Hocko Cc: Rik van Riel Cc: Mel Gorman Cc: Peter Zijlstra Cc: Ingo Molnar Cc: Dave Hansen Cc: Dan Williams Cc: linux-kernel@vger.kernel.org Cc: linux...@kvack.or

[RFC -V6 2/6] memory tiering: add page promotion counter

2021-03-11 Thread Huang Ying
To distinguish the number of the memory tiering promoted pages from that of the originally inter-socket NUMA balancing migrated pages. The counter is per-node (count in the target node). So this can be used to identify promotion imbalance among the NUMA nodes. Signed-off-by: "Huang, Ying

[PATCH] vmscan: retry without cache trim mode if nothing scanned

2021-03-10 Thread Huang, Ying
From: Huang Ying In shrink_node(), to determine whether to enable cache trim mode, the LRU size is gotten via lruvec_page_state(). That gets the value from a per-CPU counter (mem_cgroup_per_node->lruvec_stat[]). The error of the per-CPU counter from CPU local counting and the descendant mem

Re: [RFC -V5 1/6] NUMA balancing: optimize page placement for memory tiering system

2021-02-05 Thread Huang, Ying
Hillf Danton writes: > On Thu, 4 Feb 2021 18:10:51 +0800 Huang Ying wrote: >> With the advent of various new memory types, some machines will have >> multiple types of memory, e.g. DRAM and PMEM (persistent memory). The >> memory subsystem of these machines can be

[RFC -V5 6/6] memory tiering: add page promotion counter

2021-02-04 Thread Huang Ying
To distinguish the number of the memory tiering promoted pages from that of the originally inter-socket NUMA balancing migrated pages. The counter is per-node (count in the target node). So this can be used to identify promotion imbalance among the NUMA nodes. Signed-off-by: "Huang, Ying

[RFC -V5 4/6] memory tiering: rate limit NUMA migration throughput

2021-02-04 Thread Huang Ying
decreases 51.4% (from 213.0 MB/s to 103.6 MB/s) with the patch, while the benchmark score decreases only 1.8%. A new sysctl knob kernel.numa_balancing_rate_limit_mbps is added for the users to specify the limit. TODO: Add ABI document for new sysctl knob. Signed-off-by: "Huang, Ying"

[RFC -V5 5/6] memory tiering: adjust hot threshold automatically

2021-02-04 Thread Huang Ying
% with 32.4% fewer NUMA page migrations on a 2 socket Intel server with Optane DC Persistent Memory, because it improves the accuracy of the hot page selection. Signed-off-by: "Huang, Ying" Cc: Andrew Morton Cc: Michal Hocko Cc: Rik van Riel Cc: Mel Gorman Cc: Peter Zijlstra Cc: Ingo

[RFC -V5 3/6] memory tiering: hot page selection with hint page fault latency

2021-02-04 Thread Huang Ying
nse. - If fast response is more important for system performance, the administrator can set a higher hot threshold. Signed-off-by: "Huang, Ying" Cc: Andrew Morton Cc: Michal Hocko Cc: Rik van Riel Cc: Mel Gorman Cc: Peter Zijlstra Cc: Ingo Molnar Cc: Dave Hansen Cc: Dan Williams Cc:

[RFC -V5 2/6] memory tiering: skip to scan fast memory

2021-02-04 Thread Huang Ying
-by: "Huang, Ying" Suggested-by: Dave Hansen Cc: Andrew Morton Cc: Michal Hocko Cc: Rik van Riel Cc: Mel Gorman Cc: Peter Zijlstra Cc: Ingo Molnar Cc: Dan Williams Cc: linux-kernel@vger.kernel.org Cc: linux...@kvack.org --- include/linux/node.h | 5 + mm/huge_memory.

[RFC -V5 1/6] NUMA balancing: optimize page placement for memory tiering system

2021-02-04 Thread Huang Ying
TODO: - Update ABI document: Documentation/sysctl/kernel.txt Signed-off-by: "Huang, Ying" Cc: Andrew Morton Cc: Michal Hocko Cc: Rik van Riel Cc: Mel Gorman Cc: Peter Zijlstra Cc: Ingo Molnar Cc: Dave Hansen Cc: Dan Williams Cc: linux-kernel@vger.kernel.org Cc: linux...@kvack.or

[RFC -V5 0/6] autonuma: Optimize memory placement for memory tiering system

2021-02-04 Thread Huang Ying
ebased on the latest page demotion patchset. v2: - Addressed comments for V1. - Rebased on v5.5. Huang Ying (6): NUMA balancing: optimize page placement for memory tiering system memory tiering: skip to scan fast memory memory tiering: hot page selection with hint page fault latency memory

Re: [PATCH] mm/swap_state: Constify static struct attribute_group

2021-02-01 Thread Huang, Ying
d to me. Acked-by: "Huang, Ying" > --- > mm/swap_state.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/mm/swap_state.c b/mm/swap_state.c > index d0d417efeecc..3cdee7b11da9 100644 > --- a/mm/swap_state.c > +++ b/mm/swap_state.c > @@

Re: [PATCH -V9 2/3] NOT kernel/man2/set_mempolicy.2: Add mode flag MPOL_F_NUMA_BALANCING

2021-01-21 Thread Huang, Ying
"Alejandro Colomar (man-pages)" writes: > Hi Huang Ying, > > On 1/20/21 7:12 AM, Huang Ying wrote: >> Signed-off-by: "Huang, Ying" >> Cc: "Alejandro Colomar" > > Sorry, for the confusion. > I have a different email for reading lists.

Re: [PATCH] swap: Check nrexceptional of swap cache before being freed

2021-01-21 Thread Huang, Ying
Matthew Wilcox writes: > On Wed, Jan 20, 2021 at 03:27:11PM +0800, Huang Ying wrote: >> To catch the error in updating the swap cache shadow entries or their count. > > I just resent a patch that removes nrexceptional tracking. > > Can you use !mapping_empty() inst

Re: [PATCH] swap: Check nrexceptional of swap cache before being freed

2021-01-19 Thread Huang, Ying
Michal Hocko writes: > On Wed 20-01-21 15:27:11, Huang Ying wrote: >> To catch the error in updating the swap cache shadow entries or their count. > > What is the error? There's no error in the current code. But we will change the related code in the future. So this checki

[PATCH] swap: Check nrexceptional of swap cache before being freed

2021-01-19 Thread Huang Ying
To catch the error in updating the swap cache shadow entries or their count. Signed-off-by: "Huang, Ying" Cc: Minchan Kim Cc: Joonsoo Kim Cc: Johannes Weiner Cc: Vlastimil Babka, Hugh Dickins Cc: Mel Gorman Cc: Michal Hocko Cc: Dan Williams Cc: Christoph Hellwig Il

[PATCH -V9 2/3] NOT kernel/man2/set_mempolicy.2: Add mode flag MPOL_F_NUMA_BALANCING

2021-01-19 Thread Huang Ying
Signed-off-by: "Huang, Ying" Cc: "Alejandro Colomar" --- man2/set_mempolicy.2 | 22 ++ 1 file changed, 22 insertions(+) diff --git a/man2/set_mempolicy.2 b/man2/set_mempolicy.2 index 68011eecb..fa64a1820 100644 --- a/man2/set_mempolicy.2 +++ b/man2/set_m

[PATCH -V9 3/3] NOT kernel/numactl: Support to enable Linux kernel NUMA balancing

2021-01-19 Thread Huang Ying
be used before the --membind/-m memory policy in the command line. With it, the Linux kernel NUMA balancing will be enabled for the process if --membind/-m is used and the feature is supported by the kernel. Signed-off-by: "Huang, Ying" --- libnuma.c | 14 ++ numa.3

[PATCH -V9 1/3] numa balancing: Migrate on fault among multiple bound nodes

2021-01-19 Thread Huang Ying
from node 1 to node 3 after killing the memory eater, and the pmbench score can increase about 17.5%. Signed-off-by: "Huang, Ying" Acked-by: Mel Gorman Cc: Peter Zijlstra Cc: Ingo Molnar Cc: Rik van Riel Cc: Johannes Weiner Cc: "Matthew Wilcox (Oracle)" Cc: Dave Hanse

[PATCH -V9 0/3] numa balancing: Migrate on fault among multiple bound nodes

2021-01-19 Thread Huang Ying
necessary. v4: - Use new flags instead of reuse MPOL_MF_LAZY. v3: - Rebased on latest upstream (v5.10-rc3) - Revised the change log. v2: - Rebased on latest upstream (v5.10-rc1) Best Regards, Huang, Ying

Re: [PATCH] mm: Free unused swap cache page in write protection fault handler

2021-01-15 Thread Huang, Ying
Linus Torvalds writes: > On Tue, Jan 12, 2021 at 9:24 PM huang ying > wrote: >> > >> > Couldn't we just move it to the tail of the LRU list so it's reclaimed >> > first? Or is locking going to be a problem here? >> >> Yes. That's a way to

Re: [PATCH] mm: Free unused swap cache page in write protection fault handler

2021-01-12 Thread huang ying
On Wed, Jan 13, 2021 at 11:12 AM Matthew Wilcox wrote: > > On Wed, Jan 13, 2021 at 11:08:56AM +0800, huang ying wrote: > > On Wed, Jan 13, 2021 at 10:47 AM Linus Torvalds > > wrote: > > > > > > On Tue, Jan 12, 2021 at 6:43 PM Huang Ying wrote: > >

Re: [PATCH] mm: Free unused swap cache page in write protection fault handler

2021-01-12 Thread huang ying
On Wed, Jan 13, 2021 at 10:47 AM Linus Torvalds wrote: > > On Tue, Jan 12, 2021 at 6:43 PM Huang Ying wrote: > > > > So in this patch, at the end of wp_page_copy(), the old unused swap > > cache page will be tried to be freed. > > I'd much rather free it later

[PATCH] mm: Free unused swap cache page in write protection fault handler

2021-01-12 Thread Huang Ying
SwapCached: 1240 kB AnonPages: 1904 kB BTW: I think this should be in stable after v5.9. Fixes: 09854ba94c6a ("mm: do_wp_page() simplification") Signed-off-by: "Huang, Ying" Cc: Linus Torvalds Cc: Peter Xu Cc: Hugh Dickins Cc: Johannes Weiner Cc: Mel Gorman

Re: [PATCH -V8 1/3] numa balancing: Migrate on fault among multiple bound nodes

2021-01-11 Thread Huang, Ying
Hi, Peter, Huang Ying writes: > Now, NUMA balancing can only optimize the page placement among the > NUMA nodes if the default memory policy is used. Because the memory > policy specified explicitly should take precedence. But this seems > too strict in some situations.

[PATCH -V8 2/3] NOT kernel/man2/set_mempolicy.2: Add mode flag MPOL_F_NUMA_BALANCING

2021-01-05 Thread Huang Ying
Signed-off-by: "Huang, Ying" Cc: "Alejandro Colomar" --- man2/set_mempolicy.2 | 22 ++ 1 file changed, 22 insertions(+) diff --git a/man2/set_mempolicy.2 b/man2/set_mempolicy.2 index 68011eecb..fa64a1820 100644 --- a/man2/set_mempolicy.2 +++ b/man2/set_m

[PATCH -V8 3/3] NOT kernel/numactl: Support to enable Linux kernel NUMA balancing

2021-01-05 Thread Huang Ying
be used before the --membind/-m memory policy in the command line. With it, the Linux kernel NUMA balancing will be enabled for the process if --membind/-m is used and the feature is supported by the kernel. Signed-off-by: "Huang, Ying" --- libnuma.c | 14 ++ numa.3

[PATCH -V8 1/3] numa balancing: Migrate on fault among multiple bound nodes

2021-01-05 Thread Huang Ying
from node 1 to node 3 after killing the memory eater, and the pmbench score can increase about 17.5%. Signed-off-by: "Huang, Ying" Acked-by: Mel Gorman Cc: Andrew Morton Cc: Ingo Molnar Cc: Rik van Riel Cc: Johannes Weiner Cc: "Matthew Wilcox (Oracle)" Cc: Dave Hanse

[PATCH -V8 0/3] numa balancing: Migrate on fault among multiple bound nodes

2021-01-05 Thread Huang Ying
: - Rebased on latest upstream (v5.10-rc3) - Revised the change log. v2: - Rebased on latest upstream (v5.10-rc1) Best Regards, Huang, Ying
