Re: [PATCH v3] mm: huge_memory: a new debugfs interface for splitting THP tests.

2021-03-15 Thread Yang Shi
On Mon, Mar 15, 2021 at 11:37 AM Zi Yan wrote:
>
> On 15 Mar 2021, at 8:07, Kirill A. Shutemov wrote:
>
> > On Thu, Mar 11, 2021 at 07:57:12PM -0500, Zi Yan wrote:
> >> From: Zi Yan 
> >>
> >> We do not have a direct user interface of splitting the compound page
> >> backing a THP
> >
> > But we do. You expand it.
> >
> >> and there is no need unless we want to expose the THP
> >> implementation details to users. Make <debugfs>/split_huge_pages accept
> >> a new command to do that.
> >>
> >> By writing "<pid>,<vaddr_start>,<vaddr_end>" to
> >> <debugfs>/split_huge_pages, THPs within the given virtual address range
> >> from the process with the given pid are split. It is used to test
> >> split_huge_page function. In addition, a selftest program is added to
> >> tools/testing/selftests/vm to utilize the interface by splitting
> >> PMD THPs and PTE-mapped THPs.
> >>
> >
> > Okay, makes sense.
> >
> > But it doesn't cover non-mapped THPs. tmpfs may have a file backed by a THP
> > that is mapped nowhere. Do we want to cover this case too?
>
> Sure. It would also be useful for large pages in the page cache. I will
> send v4 with tmpfs THP split. I will definitely need a review for it, since
> I am not familiar with getting a page from a file path.

We do have some APIs that return pages for a file range, e.g.:

find_get_page
find_get_pages
find_get_entries
find_get_pages_range

They all need an address_space, so you need to convert the file path to
an address_space before using them.

Hole punching in tmpfs uses find_get_entries(); see what
shmem_undo_range() does.

>
> > Maybe have PID:,, and
> > FILE:,, ?
>
> Or just check input[0] == '/' for file path input.
>
>
> —
> Best Regards,
> Yan Zi


Re: [PATCH v3] mm: huge_memory: a new debugfs interface for splitting THP tests.

2021-03-15 Thread Zi Yan
On 15 Mar 2021, at 8:07, Kirill A. Shutemov wrote:

> On Thu, Mar 11, 2021 at 07:57:12PM -0500, Zi Yan wrote:
>> From: Zi Yan 
>>
>> We do not have a direct user interface of splitting the compound page
>> backing a THP
>
> But we do. You expand it.
>
>> and there is no need unless we want to expose the THP
>> implementation details to users. Make <debugfs>/split_huge_pages accept
>> a new command to do that.
>>
>> By writing "<pid>,<vaddr_start>,<vaddr_end>" to
>> <debugfs>/split_huge_pages, THPs within the given virtual address range
>> from the process with the given pid are split. It is used to test
>> split_huge_page function. In addition, a selftest program is added to
>> tools/testing/selftests/vm to utilize the interface by splitting
>> PMD THPs and PTE-mapped THPs.
>>
>
> Okay, makes sense.
>
> But it doesn't cover non-mapped THPs. tmpfs may have a file backed by a THP
> that is mapped nowhere. Do we want to cover this case too?

Sure. It would also be useful for large pages in the page cache. I will
send v4 with tmpfs THP split. I will definitely need a review for it, since
I am not familiar with getting a page from a file path.

> Maybe have PID:,, and
> FILE:,, ?

Or just check input[0] == '/' for file path input.
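
A minimal userspace sketch of that dispatch idea, for illustration only. It
assumes the existing "1" command and the "%d,0x%lx,0x%lx" pid form from the
patch, and treats any input starting with '/' as a file path request. The
remaining fields of a file path request are not settled in this thread, so
the sketch only detects that form. The enum and function names are
hypothetical and do not come from the patch.

```c
#include <stdio.h>

/* Hypothetical command kinds; these names are not taken from the patch. */
enum split_cmd {
	SPLIT_INVALID,
	SPLIT_ALL,		/* legacy "1": split all THPs */
	SPLIT_PID_RANGE,	/* "<pid>,0x<vaddr_start>,0x<vaddr_end>" */
	SPLIT_FILE,		/* input begins with '/' */
};

/*
 * Classify one command string. On SPLIT_PID_RANGE the three output
 * parameters are filled in; for the other kinds they are left unspecified.
 */
static enum split_cmd classify_split_cmd(const char *input, int *pid,
					 unsigned long *start,
					 unsigned long *end)
{
	int n;

	/* A leading '/' marks a file path, as suggested above. */
	if (input[0] == '/')
		return SPLIT_FILE;

	n = sscanf(input, "%d,0x%lx,0x%lx", pid, start, end);
	if (n == 3)
		return SPLIT_PID_RANGE;
	if (n == 1 && *pid == 1)
		return SPLIT_ALL;
	return SPLIT_INVALID;
}
```

Checking the first character before calling sscanf() keeps the two numeric
forms unchanged, since neither of them can start with '/'.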


—
Best Regards,
Yan Zi



Re: [PATCH v3] mm: huge_memory: a new debugfs interface for splitting THP tests.

2021-03-15 Thread Kirill A. Shutemov
On Thu, Mar 11, 2021 at 07:57:12PM -0500, Zi Yan wrote:
> From: Zi Yan 
> 
> We do not have a direct user interface of splitting the compound page
> backing a THP

But we do. You expand it.

> and there is no need unless we want to expose the THP
> implementation details to users. Make <debugfs>/split_huge_pages accept
> a new command to do that.
> 
> By writing "<pid>,<vaddr_start>,<vaddr_end>" to
> <debugfs>/split_huge_pages, THPs within the given virtual address range
> from the process with the given pid are split. It is used to test
> split_huge_page function. In addition, a selftest program is added to
> tools/testing/selftests/vm to utilize the interface by splitting
> PMD THPs and PTE-mapped THPs.
> 

Okay, makes sense.

But it doesn't cover non-mapped THPs. tmpfs may have a file backed by a THP
that is mapped nowhere. Do we want to cover this case too?

Maybe have PID:,, and
FILE:,, ?

-- 
 Kirill A. Shutemov


[PATCH v3] mm: huge_memory: a new debugfs interface for splitting THP tests.

2021-03-11 Thread Zi Yan
From: Zi Yan 

We do not have a direct user interface of splitting the compound page
backing a THP and there is no need unless we want to expose the THP
implementation details to users. Make <debugfs>/split_huge_pages accept
a new command to do that.

By writing "<pid>,<vaddr_start>,<vaddr_end>" to
<debugfs>/split_huge_pages, THPs within the given virtual address range
from the process with the given pid are split. It is used to test
split_huge_page function. In addition, a selftest program is added to
tools/testing/selftests/vm to utilize the interface by splitting
PMD THPs and PTE-mapped THPs.

This does not change the old behavior, i.e., writing 1 to the interface
to split all THPs in the system.
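
For illustration, a userspace sketch of how a test might issue the new
command. It assumes debugfs is mounted at /sys/kernel/debug and that the
addresses are written with a 0x prefix, matching the "%d,0x%lx,0x%lx" format
parsed in the patch. The helper names are hypothetical and are not taken
from the selftest.

```c
#include <stdio.h>
#include <sys/types.h>

/* Assumed debugfs mount point; adjust if debugfs is mounted elsewhere. */
#define SPLIT_THP_PATH "/sys/kernel/debug/split_huge_pages"

/*
 * Build the "<pid>,0x<vaddr_start>,0x<vaddr_end>" command string.
 * Returns the string length, or -1 if buf is too small.
 */
static int format_split_request(char *buf, size_t len, pid_t pid,
				unsigned long start, unsigned long end)
{
	int n = snprintf(buf, len, "%d,0x%lx,0x%lx", (int)pid, start, end);

	return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Write the command to the given file; returns 0 on success, -1 on error. */
static int request_thp_split(const char *path, pid_t pid,
			     unsigned long start, unsigned long end)
{
	char cmd[80];
	FILE *f;
	int n, ok;

	n = format_split_request(cmd, sizeof(cmd), pid, start, end);
	if (n < 0)
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fwrite(cmd, 1, (size_t)n, f) == (size_t)n;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

A caller would typically pass SPLIT_THP_PATH, getpid(), and the bounds of a
region it has populated with THPs; writing to debugfs normally requires root.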

Changelog:

From v2:

1. Reused existing <debugfs>/split_huge_pages interface. (suggested by
   Yang Shi)

From v1:

1. Removed unnecessary calling to vma_migratable, spotted by kernel test
   robot.
2. Dropped the use of find_mm_struct and code it directly, since there
   is no need for the permission check in that function and the function
   is only available when migration is on.
3. Added some comments in the selftest program to clarify how PTE-mapped
   THPs are formed.

Signed-off-by: Zi Yan 
---
 mm/huge_memory.c  | 122 ++-
 tools/testing/selftests/vm/.gitignore |   1 +
 tools/testing/selftests/vm/Makefile   |   1 +
 .../selftests/vm/split_huge_page_test.c   | 313 ++
 4 files changed, 430 insertions(+), 7 deletions(-)
 create mode 100644 tools/testing/selftests/vm/split_huge_page_test.c

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index bff92dea5ab3..f9fdff286a94 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -7,6 +7,7 @@
 
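 #include <linux/mm.h>
 #include <linux/sched.h>
+#include <linux/sched/mm.h>
 #include <linux/sched/coredump.h>
 #include <linux/sched/numa_balancing.h>
 #include <linux/highmem.h>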
@@ -2922,16 +2923,13 @@ static struct shrinker deferred_split_shrinker = {
 };
 
 #ifdef CONFIG_DEBUG_FS
-static int split_huge_pages_set(void *data, u64 val)
+static void split_huge_pages_all(void)
 {
 	struct zone *zone;
 	struct page *page;
 	unsigned long pfn, max_zone_pfn;
 	unsigned long total = 0, split = 0;
 
-	if (val != 1)
-		return -EINVAL;
-
 	for_each_populated_zone(zone) {
 		max_zone_pfn = zone_end_pfn(zone);
 		for (pfn = zone->zone_start_pfn; pfn < max_zone_pfn; pfn++) {
@@ -2959,11 +2957,121 @@ static int split_huge_pages_set(void *data, u64 val)
 	}
 
 	pr_info("%lu of %lu THP split\n", split, total);
+}
+
+static ssize_t split_huge_pages_write(struct file *file, const char __user *buf,
+				size_t count, loff_t *ppops)
+{
+	static DEFINE_MUTEX(mutex);
+	ssize_t ret;
+	char input_buf[80]; /* hold pid, start_vaddr, end_vaddr */
+	int pid;
+	unsigned long vaddr_start, vaddr_end, addr;
+	struct task_struct *task;
+	struct mm_struct *mm;
+	unsigned long total = 0, split = 0;
+
+	ret = mutex_lock_interruptible(&mutex);
+	if (ret)
+		return ret;
+
+	ret = -EFAULT;
+
+	memset(input_buf, 0, 80);
+	if (copy_from_user(input_buf, buf, min_t(size_t, count, 80)))
+		goto out;
+
+	input_buf[79] = '\0';
+	ret = sscanf(input_buf, "%d,0x%lx,0x%lx", &pid, &vaddr_start, &vaddr_end);
+	if (ret == 1 && pid == 1) {
+		split_huge_pages_all();
+		ret = strlen(input_buf);
+		goto out;
+	} else if (ret != 3) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	vaddr_start &= PAGE_MASK;
+	vaddr_end &= PAGE_MASK;
+
+	ret = strlen(input_buf);
+	pr_debug("split huge pages in pid: %d, vaddr: [%lx - %lx]\n",
+		 pid, vaddr_start, vaddr_end);
+
+	/* Find the task_struct from pid */
+	rcu_read_lock();
+	task = find_task_by_vpid(pid);
+	if (!task) {
+		rcu_read_unlock();
+		ret = -ESRCH;
+		goto out;
+	}
+	get_task_struct(task);
+	rcu_read_unlock();
+
+	/* Find the mm_struct */
+	mm = get_task_mm(task);
+	put_task_struct(task);
+
+	if (!mm) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	mmap_read_lock(mm);
+	/*
+	 * always increase addr by PAGE_SIZE, since we could have a PTE page
+	 * table filled with PTE-mapped THPs, each of which is distinct.
+	 */
+	for (addr = vaddr_start; addr < vaddr_end; addr += PAGE_SIZE) {
+		struct vm_area_struct *vma = find_vma(mm, addr);
+		unsigned int follflags;
+		struct page *page;
+
+		if (!vma || addr < vma->vm_start)
+			break;
+
+		/* FOLL_DUMP to ignore special (like zero) pages */
+		follflags = FOLL_GET | FOLL_DUMP;
+		page = follow_page(vma, addr, follflags);
+
+		if (IS_ERR(page))
+			break;
+		if (!page)
+