Re: [PATCH v2 5/7] riscv: mm: accelerate pagefault when badaccess

2024-04-10 Thread Kefeng Wang




On 2024/4/11 1:28, Alexandre Ghiti wrote:

On 10/04/2024 10:07, Kefeng Wang wrote:



On 2024/4/10 15:32, Alexandre Ghiti wrote:

Hi Kefeng,

On 03/04/2024 10:38, Kefeng Wang wrote:

The access_error() check on the vma is already done under the per-VMA
lock; if it is a bad access, handle the error directly, there is no need
to retry with mmap_lock again. Since the page fault is handled under the
per-VMA lock, count it as a vma lock event with VMA_LOCK_SUCCESS.

Reviewed-by: Suren Baghdasaryan 
Signed-off-by: Kefeng Wang 
---
  arch/riscv/mm/fault.c | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/riscv/mm/fault.c b/arch/riscv/mm/fault.c
index 3ba1d4dde5dd..b3fcf7d67efb 100644
--- a/arch/riscv/mm/fault.c
+++ b/arch/riscv/mm/fault.c
@@ -292,7 +292,10 @@ void handle_page_fault(struct pt_regs *regs)
  if (unlikely(access_error(cause, vma))) {
  vma_end_read(vma);
-    goto lock_mmap;
+    count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
+    tsk->thread.bad_cause = SEGV_ACCERR;



I think we should use the cause variable here instead of SEGV_ACCERR, 
as bad_cause is a riscv internal status which describes the real 
fault that happened.


Oh, I see, it is the exception cause on riscv, so it should be

diff --git a/arch/riscv/mm/fault.c b/arch/riscv/mm/fault.c
index b3fcf7d67efb..5224f3733802 100644
--- a/arch/riscv/mm/fault.c
+++ b/arch/riscv/mm/fault.c
@@ -293,8 +293,8 @@ void handle_page_fault(struct pt_regs *regs)
    if (unlikely(access_error(cause, vma))) {
    vma_end_read(vma);
    count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
-   tsk->thread.bad_cause = SEGV_ACCERR;
-   bad_area_nosemaphore(regs, code, addr);
+   tsk->thread.bad_cause = cause;
+   bad_area_nosemaphore(regs, SEGV_ACCERR, addr);
    return;
    }

Hi Alex, could you help to check it?

Hi Andrew, please help to squash it in after Alex acks it.

Thanks both.



So I have just tested Kefeng's fixup on my usual CI and with a simple 
program that triggers such a bad access; everything went fine, so with the 
fixup applied:


Reviewed-by: Alexandre Ghiti 
Tested-by: Alexandre Ghiti 


Great, thanks.



Thanks,

Alex








Thanks,

Alex



+    bad_area_nosemaphore(regs, code, addr);
+    return;
  }
  fault = handle_mm_fault(vma, addr, flags | 
FAULT_FLAG_VMA_LOCK, regs);


Re: [PATCH v2 5/7] riscv: mm: accelerate pagefault when badaccess

2024-04-10 Thread Kefeng Wang




On 2024/4/10 15:32, Alexandre Ghiti wrote:

Hi Kefeng,

On 03/04/2024 10:38, Kefeng Wang wrote:

The access_error() check on the vma is already done under the per-VMA
lock; if it is a bad access, handle the error directly, there is no need
to retry with mmap_lock again. Since the page fault is handled under the
per-VMA lock, count it as a vma lock event with VMA_LOCK_SUCCESS.

Reviewed-by: Suren Baghdasaryan 
Signed-off-by: Kefeng Wang 
---
  arch/riscv/mm/fault.c | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/riscv/mm/fault.c b/arch/riscv/mm/fault.c
index 3ba1d4dde5dd..b3fcf7d67efb 100644
--- a/arch/riscv/mm/fault.c
+++ b/arch/riscv/mm/fault.c
@@ -292,7 +292,10 @@ void handle_page_fault(struct pt_regs *regs)
  if (unlikely(access_error(cause, vma))) {
  vma_end_read(vma);
-    goto lock_mmap;
+    count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
+    tsk->thread.bad_cause = SEGV_ACCERR;



I think we should use the cause variable here instead of SEGV_ACCERR, as 
bad_cause is a riscv internal status which describes the real fault that 
happened.


Oh, I see, it is the exception cause on riscv, so it should be

diff --git a/arch/riscv/mm/fault.c b/arch/riscv/mm/fault.c
index b3fcf7d67efb..5224f3733802 100644
--- a/arch/riscv/mm/fault.c
+++ b/arch/riscv/mm/fault.c
@@ -293,8 +293,8 @@ void handle_page_fault(struct pt_regs *regs)
if (unlikely(access_error(cause, vma))) {
vma_end_read(vma);
count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
-   tsk->thread.bad_cause = SEGV_ACCERR;
-   bad_area_nosemaphore(regs, code, addr);
+   tsk->thread.bad_cause = cause;
+   bad_area_nosemaphore(regs, SEGV_ACCERR, addr);
return;
}

Hi Alex, could you help to check it?

Hi Andrew, please help to squash it in after Alex acks it.

Thanks both.




Thanks,

Alex



+    bad_area_nosemaphore(regs, code, addr);
+    return;
  }
  fault = handle_mm_fault(vma, addr, flags | FAULT_FLAG_VMA_LOCK, 
regs);


Re: [PATCH v2 0/7] arch/mm/fault: accelerate pagefault when badaccess

2024-04-07 Thread Kefeng Wang




On 2024/4/4 4:45, Andrew Morton wrote:

On Wed, 3 Apr 2024 16:37:58 +0800 Kefeng Wang  
wrote:


After VMA lock-based page fault handling is enabled, if a bad access is
met under the per-VMA lock, it falls back to mmap_lock-based handling,
which leads to an unnecessary mmap_lock acquisition and a second vma
lookup. A test from lmbench shows a 34% improvement after these changes
on arm64,

   lat_sig -P 1 prot lat_sig 0.29194 -> 0.19198

Only build-tested on archs other than arm64.


Thanks.  So we now want a bunch of architectures to runtime test this.  Do
we have a selftest in place which will adequately do this?


I didn't find such a selftest, and a bad access leads to a coredump, so
the performance should not matter for most scenarios; having no selftest
seems acceptable. lmbench is easy to use to measure the performance.
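For illustration, a minimal userspace reproducer for this kind of bad
access (a write to a read-only mapping, which the kernel reports as
SEGV_ACCERR) could look like the sketch below. This is only an example,
not an existing selftest and not the exact program used in the tests
elsewhere in this thread.

/* badaccess.c: trigger an access error on a valid vma and check si_code */
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static void handler(int sig, siginfo_t *info, void *ctx)
{
	/* SEGV_ACCERR: the mapping exists but the access was not permitted */
	_exit(info->si_code == SEGV_ACCERR ? 0 : 1);
}

int main(void)
{
	struct sigaction sa;
	char *p;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = handler;
	sa.sa_flags = SA_SIGINFO;
	sigaction(SIGSEGV, &sa, NULL);

	p = mmap(NULL, sysconf(_SC_PAGESIZE), PROT_READ,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	*(volatile char *)p = 1;	/* bad access: valid vma, no VM_WRITE */
	return 1;			/* should not be reached */
}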


[PATCH v2 7/7] x86: mm: accelerate pagefault when badaccess

2024-04-03 Thread Kefeng Wang
The access_error() check on the vma is already done under the per-VMA
lock; if it is a bad access, handle the error directly, there is no need
to retry with mmap_lock again. In order to release the correct lock, pass
the mm_struct into bad_area_access_error(): if mm is NULL, release the
vma lock, otherwise release mmap_lock. Since the page fault is handled
under the per-VMA lock, count it as a vma lock event with VMA_LOCK_SUCCESS.

Reviewed-by: Suren Baghdasaryan 
Signed-off-by: Kefeng Wang 
---
 arch/x86/mm/fault.c | 23 ++-
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index a4cc20d0036d..67b18adc75dd 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -866,14 +866,17 @@ bad_area_nosemaphore(struct pt_regs *regs, unsigned long 
error_code,
 
 static void
 __bad_area(struct pt_regs *regs, unsigned long error_code,
-  unsigned long address, u32 pkey, int si_code)
+  unsigned long address, struct mm_struct *mm,
+  struct vm_area_struct *vma, u32 pkey, int si_code)
 {
-   struct mm_struct *mm = current->mm;
/*
 * Something tried to access memory that isn't in our memory map..
 * Fix it, but check if it's kernel or user first..
 */
-   mmap_read_unlock(mm);
+   if (mm)
+   mmap_read_unlock(mm);
+   else
+   vma_end_read(vma);
 
__bad_area_nosemaphore(regs, error_code, address, pkey, si_code);
 }
@@ -897,7 +900,8 @@ static inline bool bad_area_access_from_pkeys(unsigned long 
error_code,
 
 static noinline void
 bad_area_access_error(struct pt_regs *regs, unsigned long error_code,
- unsigned long address, struct vm_area_struct *vma)
+ unsigned long address, struct mm_struct *mm,
+ struct vm_area_struct *vma)
 {
/*
 * This OSPKE check is not strictly necessary at runtime.
@@ -927,9 +931,9 @@ bad_area_access_error(struct pt_regs *regs, unsigned long 
error_code,
 */
u32 pkey = vma_pkey(vma);
 
-   __bad_area(regs, error_code, address, pkey, SEGV_PKUERR);
+   __bad_area(regs, error_code, address, mm, vma, pkey, 
SEGV_PKUERR);
} else {
-   __bad_area(regs, error_code, address, 0, SEGV_ACCERR);
+   __bad_area(regs, error_code, address, mm, vma, 0, SEGV_ACCERR);
}
 }
 
@@ -1357,8 +1361,9 @@ void do_user_addr_fault(struct pt_regs *regs,
goto lock_mmap;
 
if (unlikely(access_error(error_code, vma))) {
-   vma_end_read(vma);
-   goto lock_mmap;
+   bad_area_access_error(regs, error_code, address, NULL, vma);
+   count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
+   return;
}
fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, 
regs);
if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
@@ -1394,7 +1399,7 @@ void do_user_addr_fault(struct pt_regs *regs,
 * we can handle it..
 */
if (unlikely(access_error(error_code, vma))) {
-   bad_area_access_error(regs, error_code, address, vma);
+   bad_area_access_error(regs, error_code, address, mm, vma);
return;
}
 
-- 
2.27.0



[PATCH v2 5/7] riscv: mm: accelerate pagefault when badaccess

2024-04-03 Thread Kefeng Wang
The access_error() check on the vma is already done under the per-VMA
lock; if it is a bad access, handle the error directly, there is no need
to retry with mmap_lock again. Since the page fault is handled under the
per-VMA lock, count it as a vma lock event with VMA_LOCK_SUCCESS.

Reviewed-by: Suren Baghdasaryan 
Signed-off-by: Kefeng Wang 
---
 arch/riscv/mm/fault.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/riscv/mm/fault.c b/arch/riscv/mm/fault.c
index 3ba1d4dde5dd..b3fcf7d67efb 100644
--- a/arch/riscv/mm/fault.c
+++ b/arch/riscv/mm/fault.c
@@ -292,7 +292,10 @@ void handle_page_fault(struct pt_regs *regs)
 
if (unlikely(access_error(cause, vma))) {
vma_end_read(vma);
-   goto lock_mmap;
+   count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
+   tsk->thread.bad_cause = SEGV_ACCERR;
+   bad_area_nosemaphore(regs, code, addr);
+   return;
}
 
fault = handle_mm_fault(vma, addr, flags | FAULT_FLAG_VMA_LOCK, regs);
-- 
2.27.0



[PATCH v2 6/7] s390: mm: accelerate pagefault when badaccess

2024-04-03 Thread Kefeng Wang
The vm_flags of the vma are already checked under the per-VMA lock; if
it is a bad access, handle the error directly, there is no need to retry
with mmap_lock again. Since the page fault is handled under the per-VMA
lock, count it as a vma lock event with VMA_LOCK_SUCCESS.

Signed-off-by: Kefeng Wang 
---
 arch/s390/mm/fault.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
index c421dd44ffbe..162ca2576fd4 100644
--- a/arch/s390/mm/fault.c
+++ b/arch/s390/mm/fault.c
@@ -325,7 +325,8 @@ static void do_exception(struct pt_regs *regs, int access)
goto lock_mmap;
if (!(vma->vm_flags & access)) {
vma_end_read(vma);
-   goto lock_mmap;
+   count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
+   return handle_fault_error_nolock(regs, SEGV_ACCERR);
}
fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, 
regs);
if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
-- 
2.27.0



[PATCH v2 3/7] arm: mm: accelerate pagefault when VM_FAULT_BADACCESS

2024-04-03 Thread Kefeng Wang
The vm_flags of the vma are already checked under the per-VMA lock; if
it is a bad access, directly set fault to VM_FAULT_BADACCESS and handle
the error, there is no need to retry with mmap_lock again. Since the page
fault is handled under the per-VMA lock, count it as a vma lock event
with VMA_LOCK_SUCCESS.

Reviewed-by: Suren Baghdasaryan 
Signed-off-by: Kefeng Wang 
---
 arch/arm/mm/fault.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index 439dc6a26bb9..5c4b417e24f9 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -294,7 +294,9 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct 
pt_regs *regs)
 
if (!(vma->vm_flags & vm_flags)) {
vma_end_read(vma);
-   goto lock_mmap;
+   count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
+   fault = VM_FAULT_BADACCESS;
+   goto bad_area;
}
fault = handle_mm_fault(vma, addr, flags | FAULT_FLAG_VMA_LOCK, regs);
if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
-- 
2.27.0



[PATCH v2 4/7] powerpc: mm: accelerate pagefault when badaccess

2024-04-03 Thread Kefeng Wang
The access_[pkey]_error() checks on the vma are already done under the
per-VMA lock; if it is a bad access, handle the error directly, there is
no need to retry with mmap_lock again. In order to release the correct
lock, pass the mm_struct into bad_access_pkey()/bad_access(): if mm is
NULL, release the vma lock, otherwise release mmap_lock. Since the page
fault is handled under the per-VMA lock, count it as a vma lock event
with VMA_LOCK_SUCCESS.

Signed-off-by: Kefeng Wang 
---
 arch/powerpc/mm/fault.c | 33 -
 1 file changed, 20 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 53335ae21a40..215690452495 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -71,23 +71,26 @@ static noinline int bad_area_nosemaphore(struct pt_regs 
*regs, unsigned long add
return __bad_area_nosemaphore(regs, address, SEGV_MAPERR);
 }
 
-static int __bad_area(struct pt_regs *regs, unsigned long address, int si_code)
+static int __bad_area(struct pt_regs *regs, unsigned long address, int si_code,
+ struct mm_struct *mm, struct vm_area_struct *vma)
 {
-   struct mm_struct *mm = current->mm;
 
/*
 * Something tried to access memory that isn't in our memory map..
 * Fix it, but check if it's kernel or user first..
 */
-   mmap_read_unlock(mm);
+   if (mm)
+   mmap_read_unlock(mm);
+   else
+   vma_end_read(vma);
 
return __bad_area_nosemaphore(regs, address, si_code);
 }
 
 static noinline int bad_access_pkey(struct pt_regs *regs, unsigned long 
address,
+   struct mm_struct *mm,
struct vm_area_struct *vma)
 {
-   struct mm_struct *mm = current->mm;
int pkey;
 
/*
@@ -109,7 +112,10 @@ static noinline int bad_access_pkey(struct pt_regs *regs, 
unsigned long address,
 */
pkey = vma_pkey(vma);
 
-   mmap_read_unlock(mm);
+   if (mm)
+   mmap_read_unlock(mm);
+   else
+   vma_end_read(vma);
 
/*
 * If we are in kernel mode, bail out with a SEGV, this will
@@ -124,9 +130,10 @@ static noinline int bad_access_pkey(struct pt_regs *regs, 
unsigned long address,
return 0;
 }
 
-static noinline int bad_access(struct pt_regs *regs, unsigned long address)
+static noinline int bad_access(struct pt_regs *regs, unsigned long address,
+  struct mm_struct *mm, struct vm_area_struct *vma)
 {
-   return __bad_area(regs, address, SEGV_ACCERR);
+   return __bad_area(regs, address, SEGV_ACCERR, mm, vma);
 }
 
 static int do_sigbus(struct pt_regs *regs, unsigned long address,
@@ -479,13 +486,13 @@ static int ___do_page_fault(struct pt_regs *regs, 
unsigned long address,
 
if (unlikely(access_pkey_error(is_write, is_exec,
   (error_code & DSISR_KEYFAULT), vma))) {
-   vma_end_read(vma);
-   goto lock_mmap;
+   count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
+   return bad_access_pkey(regs, address, NULL, vma);
}
 
if (unlikely(access_error(is_write, is_exec, vma))) {
-   vma_end_read(vma);
-   goto lock_mmap;
+   count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
+   return bad_access(regs, address, NULL, vma);
}
 
fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, 
regs);
@@ -521,10 +528,10 @@ static int ___do_page_fault(struct pt_regs *regs, 
unsigned long address,
 
if (unlikely(access_pkey_error(is_write, is_exec,
   (error_code & DSISR_KEYFAULT), vma)))
-   return bad_access_pkey(regs, address, vma);
+   return bad_access_pkey(regs, address, mm, vma);
 
if (unlikely(access_error(is_write, is_exec, vma)))
-   return bad_access(regs, address);
+   return bad_access(regs, address, mm, vma);
 
/*
 * If for any reason at all we couldn't handle the fault,
-- 
2.27.0



[PATCH v2 0/7] arch/mm/fault: accelerate pagefault when badaccess

2024-04-03 Thread Kefeng Wang
After VMA lock-based page fault handling is enabled, if a bad access is
met under the per-VMA lock, it falls back to mmap_lock-based handling,
which leads to an unnecessary mmap_lock acquisition and a second vma
lookup. A test from lmbench shows a 34% improvement after these changes
on arm64,

  lat_sig -P 1 prot lat_sig 0.29194 -> 0.19198

Only build-tested on archs other than arm64.

v2: 
- a better changelog, and describe the counting changes, suggested by
  Suren Baghdasaryan
- add RB

Kefeng Wang (7):
  arm64: mm: cleanup __do_page_fault()
  arm64: mm: accelerate pagefault when VM_FAULT_BADACCESS
  arm: mm: accelerate pagefault when VM_FAULT_BADACCESS
  powerpc: mm: accelerate pagefault when badaccess
  riscv: mm: accelerate pagefault when badaccess
  s390: mm: accelerate pagefault when badaccess
  x86: mm: accelerate pagefault when badaccess

 arch/arm/mm/fault.c |  4 +++-
 arch/arm64/mm/fault.c   | 31 ++-
 arch/powerpc/mm/fault.c | 33 -
 arch/riscv/mm/fault.c   |  5 -
 arch/s390/mm/fault.c|  3 ++-
 arch/x86/mm/fault.c | 23 ++-
 6 files changed, 53 insertions(+), 46 deletions(-)

-- 
2.27.0



[PATCH v2 2/7] arm64: mm: accelerate pagefault when VM_FAULT_BADACCESS

2024-04-03 Thread Kefeng Wang
The vm_flags of the vma are already checked under the per-VMA lock; if
it is a bad access, directly set fault to VM_FAULT_BADACCESS and handle
the error, there is no need to retry with mmap_lock again; the latency of
the lmbench 'lat_sig -P 1 prot lat_sig' testcase is reduced by 34%.

Since the page fault is handled under the per-VMA lock, count it as a vma
lock event with VMA_LOCK_SUCCESS.

Reviewed-by: Suren Baghdasaryan 
Signed-off-by: Kefeng Wang 
---
 arch/arm64/mm/fault.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 9bb9f395351a..405f9aa831bd 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -572,7 +572,9 @@ static int __kprobes do_page_fault(unsigned long far, 
unsigned long esr,
 
if (!(vma->vm_flags & vm_flags)) {
vma_end_read(vma);
-   goto lock_mmap;
+   fault = VM_FAULT_BADACCESS;
+   count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
+   goto done;
}
fault = handle_mm_fault(vma, addr, mm_flags | FAULT_FLAG_VMA_LOCK, 
regs);
if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
-- 
2.27.0



[PATCH v2 1/7] arm64: mm: cleanup __do_page_fault()

2024-04-03 Thread Kefeng Wang
__do_page_fault() only calls handle_mm_fault() after vm_flags is
checked, and it is only called by do_page_fault(), so let's squash it
into do_page_fault() to clean up the code.

Reviewed-by: Suren Baghdasaryan 
Signed-off-by: Kefeng Wang 
---
 arch/arm64/mm/fault.c | 27 +++
 1 file changed, 7 insertions(+), 20 deletions(-)

diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 8251e2fea9c7..9bb9f395351a 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -486,25 +486,6 @@ static void do_bad_area(unsigned long far, unsigned long 
esr,
}
 }
 
-#define VM_FAULT_BADMAP((__force vm_fault_t)0x01)
-#define VM_FAULT_BADACCESS ((__force vm_fault_t)0x02)
-
-static vm_fault_t __do_page_fault(struct mm_struct *mm,
- struct vm_area_struct *vma, unsigned long 
addr,
- unsigned int mm_flags, unsigned long vm_flags,
- struct pt_regs *regs)
-{
-   /*
-* Ok, we have a good vm_area for this memory access, so we can handle
-* it.
-* Check that the permissions on the VMA allow for the fault which
-* occurred.
-*/
-   if (!(vma->vm_flags & vm_flags))
-   return VM_FAULT_BADACCESS;
-   return handle_mm_fault(vma, addr, mm_flags, regs);
-}
-
 static bool is_el0_instruction_abort(unsigned long esr)
 {
return ESR_ELx_EC(esr) == ESR_ELx_EC_IABT_LOW;
@@ -519,6 +500,9 @@ static bool is_write_abort(unsigned long esr)
return (esr & ESR_ELx_WNR) && !(esr & ESR_ELx_CM);
 }
 
+#define VM_FAULT_BADMAP((__force vm_fault_t)0x01)
+#define VM_FAULT_BADACCESS ((__force vm_fault_t)0x02)
+
 static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
   struct pt_regs *regs)
 {
@@ -617,7 +601,10 @@ static int __kprobes do_page_fault(unsigned long far, 
unsigned long esr,
goto done;
}
 
-   fault = __do_page_fault(mm, vma, addr, mm_flags, vm_flags, regs);
+   if (!(vma->vm_flags & vm_flags))
+   fault = VM_FAULT_BADACCESS;
+   else
+   fault = handle_mm_fault(vma, addr, mm_flags, regs);
 
/* Quick path to respond to signals */
if (fault_signal_pending(fault, regs)) {
-- 
2.27.0



Re: [PATCH 7/7] x86: mm: accelerate pagefault when badaccess

2024-04-03 Thread Kefeng Wang




On 2024/4/3 13:59, Suren Baghdasaryan wrote:

On Tue, Apr 2, 2024 at 12:53 AM Kefeng Wang  wrote:


The vm_flags of the vma are already checked under the per-VMA lock; if
it is a bad access, directly handle the error and return, there is no
need to lock_mm_and_find_vma() and check vm_flags again.

Signed-off-by: Kefeng Wang 


Looks safe to me.
Using (mm != NULL) to indicate that we are holding mmap_lock is not
ideal but I guess that works.



Yes, I will add this part into the changelog too,

The access_error() check on the vma is already done under the per-VMA
lock; if it is a bad access, handle the error directly, there is no need
to retry with mmap_lock again. In order to release the correct lock, pass
the mm_struct into bad_area_access_error(): if mm is NULL, release the
vma lock, otherwise release mmap_lock. Since the page fault is handled
under the per-VMA lock, count it as a vma lock event with VMA_LOCK_SUCCESS.

Thanks.



Reviewed-by: Suren Baghdasaryan 


---
  arch/x86/mm/fault.c | 23 ++-
  1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index a4cc20d0036d..67b18adc75dd 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -866,14 +866,17 @@ bad_area_nosemaphore(struct pt_regs *regs, unsigned long 
error_code,

  static void
  __bad_area(struct pt_regs *regs, unsigned long error_code,
-  unsigned long address, u32 pkey, int si_code)
+  unsigned long address, struct mm_struct *mm,
+  struct vm_area_struct *vma, u32 pkey, int si_code)
  {
-   struct mm_struct *mm = current->mm;
 /*
  * Something tried to access memory that isn't in our memory map..
  * Fix it, but check if it's kernel or user first..
  */
-   mmap_read_unlock(mm);
+   if (mm)
+   mmap_read_unlock(mm);
+   else
+   vma_end_read(vma);

 __bad_area_nosemaphore(regs, error_code, address, pkey, si_code);
  }
@@ -897,7 +900,8 @@ static inline bool bad_area_access_from_pkeys(unsigned long 
error_code,

  static noinline void
  bad_area_access_error(struct pt_regs *regs, unsigned long error_code,
- unsigned long address, struct vm_area_struct *vma)
+ unsigned long address, struct mm_struct *mm,
+ struct vm_area_struct *vma)
  {
 /*
  * This OSPKE check is not strictly necessary at runtime.
@@ -927,9 +931,9 @@ bad_area_access_error(struct pt_regs *regs, unsigned long 
error_code,
  */
 u32 pkey = vma_pkey(vma);

-   __bad_area(regs, error_code, address, pkey, SEGV_PKUERR);
+   __bad_area(regs, error_code, address, mm, vma, pkey, 
SEGV_PKUERR);
 } else {
-   __bad_area(regs, error_code, address, 0, SEGV_ACCERR);
+   __bad_area(regs, error_code, address, mm, vma, 0, SEGV_ACCERR);
 }
  }

@@ -1357,8 +1361,9 @@ void do_user_addr_fault(struct pt_regs *regs,
 goto lock_mmap;

 if (unlikely(access_error(error_code, vma))) {
-   vma_end_read(vma);
-   goto lock_mmap;
+   bad_area_access_error(regs, error_code, address, NULL, vma);
+   count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
+   return;
 }
 fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, 
regs);
 if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
@@ -1394,7 +1399,7 @@ void do_user_addr_fault(struct pt_regs *regs,
  * we can handle it..
  */
 if (unlikely(access_error(error_code, vma))) {
-   bad_area_access_error(regs, error_code, address, vma);
+   bad_area_access_error(regs, error_code, address, mm, vma);
 return;
 }

--
2.27.0



Re: [PATCH 2/7] arm64: mm: accelerate pagefault when VM_FAULT_BADACCESS

2024-04-03 Thread Kefeng Wang




On 2024/4/3 13:30, Suren Baghdasaryan wrote:

On Tue, Apr 2, 2024 at 10:19 PM Suren Baghdasaryan  wrote:


On Tue, Apr 2, 2024 at 12:53 AM Kefeng Wang  wrote:


The vm_flags of the vma are already checked under the per-VMA lock; if
it is a bad access, directly set fault to VM_FAULT_BADACCESS and handle
the error, there is no need to lock_mm_and_find_vma() and check vm_flags
again; the latency is reduced by 34% in lmbench 'lat_sig -P 1 prot lat_sig'.


The change makes sense to me. Per-VMA lock is enough to keep
vma->vm_flags stable, so no need to retry with mmap_lock.



Signed-off-by: Kefeng Wang 


Reviewed-by: Suren Baghdasaryan 


---
  arch/arm64/mm/fault.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 9bb9f395351a..405f9aa831bd 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -572,7 +572,9 @@ static int __kprobes do_page_fault(unsigned long far, 
unsigned long esr,

 if (!(vma->vm_flags & vm_flags)) {
 vma_end_read(vma);
-   goto lock_mmap;
+   fault = VM_FAULT_BADACCESS;
+   count_vm_vma_lock_event(VMA_LOCK_SUCCESS);


nit: VMA_LOCK_SUCCESS accounting here seems correct to me but
unrelated to the main change. Either splitting into a separate patch
or mentioning this additional fixup in the changelog would be helpful.


The above nit applies to all the patches after this one, so I won't
comment on each one separately. If you decide to split or adjust the
changelog please do that for each patch.


I will update the changelog for each patch, thanks for your review and 
suggestions.







+   goto done;
 }
 fault = handle_mm_fault(vma, addr, mm_flags | FAULT_FLAG_VMA_LOCK, 
regs);
 if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
--
2.27.0



[PATCH 7/7] x86: mm: accelerate pagefault when badaccess

2024-04-02 Thread Kefeng Wang
The vm_flags of the vma are already checked under the per-VMA lock; if
it is a bad access, directly handle the error and return, there is no
need to lock_mm_and_find_vma() and check vm_flags again.

Signed-off-by: Kefeng Wang 
---
 arch/x86/mm/fault.c | 23 ++-
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index a4cc20d0036d..67b18adc75dd 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -866,14 +866,17 @@ bad_area_nosemaphore(struct pt_regs *regs, unsigned long 
error_code,
 
 static void
 __bad_area(struct pt_regs *regs, unsigned long error_code,
-  unsigned long address, u32 pkey, int si_code)
+  unsigned long address, struct mm_struct *mm,
+  struct vm_area_struct *vma, u32 pkey, int si_code)
 {
-   struct mm_struct *mm = current->mm;
/*
 * Something tried to access memory that isn't in our memory map..
 * Fix it, but check if it's kernel or user first..
 */
-   mmap_read_unlock(mm);
+   if (mm)
+   mmap_read_unlock(mm);
+   else
+   vma_end_read(vma);
 
__bad_area_nosemaphore(regs, error_code, address, pkey, si_code);
 }
@@ -897,7 +900,8 @@ static inline bool bad_area_access_from_pkeys(unsigned long 
error_code,
 
 static noinline void
 bad_area_access_error(struct pt_regs *regs, unsigned long error_code,
- unsigned long address, struct vm_area_struct *vma)
+ unsigned long address, struct mm_struct *mm,
+ struct vm_area_struct *vma)
 {
/*
 * This OSPKE check is not strictly necessary at runtime.
@@ -927,9 +931,9 @@ bad_area_access_error(struct pt_regs *regs, unsigned long 
error_code,
 */
u32 pkey = vma_pkey(vma);
 
-   __bad_area(regs, error_code, address, pkey, SEGV_PKUERR);
+   __bad_area(regs, error_code, address, mm, vma, pkey, 
SEGV_PKUERR);
} else {
-   __bad_area(regs, error_code, address, 0, SEGV_ACCERR);
+   __bad_area(regs, error_code, address, mm, vma, 0, SEGV_ACCERR);
}
 }
 
@@ -1357,8 +1361,9 @@ void do_user_addr_fault(struct pt_regs *regs,
goto lock_mmap;
 
if (unlikely(access_error(error_code, vma))) {
-   vma_end_read(vma);
-   goto lock_mmap;
+   bad_area_access_error(regs, error_code, address, NULL, vma);
+   count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
+   return;
}
fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, 
regs);
if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
@@ -1394,7 +1399,7 @@ void do_user_addr_fault(struct pt_regs *regs,
 * we can handle it..
 */
if (unlikely(access_error(error_code, vma))) {
-   bad_area_access_error(regs, error_code, address, vma);
+   bad_area_access_error(regs, error_code, address, mm, vma);
return;
}
 
-- 
2.27.0



[PATCH 1/7] arm64: mm: cleanup __do_page_fault()

2024-04-02 Thread Kefeng Wang
__do_page_fault() only checks vma->flags and calls handle_mm_fault(),
and it is only called by do_page_fault(), so let's squash it into
do_page_fault() to clean up the code.

Signed-off-by: Kefeng Wang 
---
 arch/arm64/mm/fault.c | 27 +++
 1 file changed, 7 insertions(+), 20 deletions(-)

diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 8251e2fea9c7..9bb9f395351a 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -486,25 +486,6 @@ static void do_bad_area(unsigned long far, unsigned long 
esr,
}
 }
 
-#define VM_FAULT_BADMAP((__force vm_fault_t)0x01)
-#define VM_FAULT_BADACCESS ((__force vm_fault_t)0x02)
-
-static vm_fault_t __do_page_fault(struct mm_struct *mm,
- struct vm_area_struct *vma, unsigned long 
addr,
- unsigned int mm_flags, unsigned long vm_flags,
- struct pt_regs *regs)
-{
-   /*
-* Ok, we have a good vm_area for this memory access, so we can handle
-* it.
-* Check that the permissions on the VMA allow for the fault which
-* occurred.
-*/
-   if (!(vma->vm_flags & vm_flags))
-   return VM_FAULT_BADACCESS;
-   return handle_mm_fault(vma, addr, mm_flags, regs);
-}
-
 static bool is_el0_instruction_abort(unsigned long esr)
 {
return ESR_ELx_EC(esr) == ESR_ELx_EC_IABT_LOW;
@@ -519,6 +500,9 @@ static bool is_write_abort(unsigned long esr)
return (esr & ESR_ELx_WNR) && !(esr & ESR_ELx_CM);
 }
 
+#define VM_FAULT_BADMAP((__force vm_fault_t)0x01)
+#define VM_FAULT_BADACCESS ((__force vm_fault_t)0x02)
+
 static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
   struct pt_regs *regs)
 {
@@ -617,7 +601,10 @@ static int __kprobes do_page_fault(unsigned long far, 
unsigned long esr,
goto done;
}
 
-   fault = __do_page_fault(mm, vma, addr, mm_flags, vm_flags, regs);
+   if (!(vma->vm_flags & vm_flags))
+   fault = VM_FAULT_BADACCESS;
+   else
+   fault = handle_mm_fault(vma, addr, mm_flags, regs);
 
/* Quick path to respond to signals */
if (fault_signal_pending(fault, regs)) {
-- 
2.27.0



[PATCH 6/7] s390: mm: accelerate pagefault when badaccess

2024-04-02 Thread Kefeng Wang
The vm_flags of the vma are already checked under the per-VMA lock; if
it is a bad access, directly handle the error and return, there is no
need to lock_mm_and_find_vma() and check vm_flags again.

Signed-off-by: Kefeng Wang 
---
 arch/s390/mm/fault.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
index c421dd44ffbe..162ca2576fd4 100644
--- a/arch/s390/mm/fault.c
+++ b/arch/s390/mm/fault.c
@@ -325,7 +325,8 @@ static void do_exception(struct pt_regs *regs, int access)
goto lock_mmap;
if (!(vma->vm_flags & access)) {
vma_end_read(vma);
-   goto lock_mmap;
+   count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
+   return handle_fault_error_nolock(regs, SEGV_ACCERR);
}
fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, 
regs);
if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
-- 
2.27.0



[PATCH 0/7] arch/mm/fault: accelerate pagefault when badaccess

2024-04-02 Thread Kefeng Wang
After VMA lock-based page fault handling is enabled, if a bad access is
met under the per-VMA lock, it falls back to mmap_lock-based handling,
which leads to an unnecessary mmap_lock acquisition and a second vma
lookup. A test from lmbench shows a 34% improvement after these changes
on arm64,

  lat_sig -P 1 prot lat_sig 0.29194 -> 0.19198

Only build-tested on archs other than arm64.

Kefeng Wang (7):
  arm64: mm: cleanup __do_page_fault()
  arm64: mm: accelerate pagefault when VM_FAULT_BADACCESS
  arm: mm: accelerate pagefault when VM_FAULT_BADACCESS
  powerpc: mm: accelerate pagefault when badaccess
  riscv: mm: accelerate pagefault when badaccess
  s390: mm: accelerate pagefault when badaccess
  x86: mm: accelerate pagefault when badaccess

 arch/arm/mm/fault.c |  4 +++-
 arch/arm64/mm/fault.c   | 31 ++-
 arch/powerpc/mm/fault.c | 33 -
 arch/riscv/mm/fault.c   |  5 -
 arch/s390/mm/fault.c|  3 ++-
 arch/x86/mm/fault.c | 23 ++-
 6 files changed, 53 insertions(+), 46 deletions(-)

-- 
2.27.0



[PATCH 5/7] riscv: mm: accelerate pagefault when badaccess

2024-04-02 Thread Kefeng Wang
The vm_flags of the vma are already checked under the per-VMA lock; if
it is a bad access, directly handle the error and return, there is no
need to lock_mm_and_find_vma() and check vm_flags again.

Signed-off-by: Kefeng Wang 
---
 arch/riscv/mm/fault.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/riscv/mm/fault.c b/arch/riscv/mm/fault.c
index 3ba1d4dde5dd..b3fcf7d67efb 100644
--- a/arch/riscv/mm/fault.c
+++ b/arch/riscv/mm/fault.c
@@ -292,7 +292,10 @@ void handle_page_fault(struct pt_regs *regs)
 
if (unlikely(access_error(cause, vma))) {
vma_end_read(vma);
-   goto lock_mmap;
+   count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
+   tsk->thread.bad_cause = SEGV_ACCERR;
+   bad_area_nosemaphore(regs, code, addr);
+   return;
}
 
fault = handle_mm_fault(vma, addr, flags | FAULT_FLAG_VMA_LOCK, regs);
-- 
2.27.0



[PATCH 4/7] powerpc: mm: accelerate pagefault when badaccess

2024-04-02 Thread Kefeng Wang
The vm_flags of the vma are already checked under the per-VMA lock; if
it is a bad access, directly handle the error and return, there is no
need to lock_mm_and_find_vma() and check vm_flags again.

Signed-off-by: Kefeng Wang 
---
 arch/powerpc/mm/fault.c | 33 -
 1 file changed, 20 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 53335ae21a40..215690452495 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -71,23 +71,26 @@ static noinline int bad_area_nosemaphore(struct pt_regs 
*regs, unsigned long add
return __bad_area_nosemaphore(regs, address, SEGV_MAPERR);
 }
 
-static int __bad_area(struct pt_regs *regs, unsigned long address, int si_code)
+static int __bad_area(struct pt_regs *regs, unsigned long address, int si_code,
+ struct mm_struct *mm, struct vm_area_struct *vma)
 {
-   struct mm_struct *mm = current->mm;
 
/*
 * Something tried to access memory that isn't in our memory map..
 * Fix it, but check if it's kernel or user first..
 */
-   mmap_read_unlock(mm);
+   if (mm)
+   mmap_read_unlock(mm);
+   else
+   vma_end_read(vma);
 
return __bad_area_nosemaphore(regs, address, si_code);
 }
 
 static noinline int bad_access_pkey(struct pt_regs *regs, unsigned long 
address,
+   struct mm_struct *mm,
struct vm_area_struct *vma)
 {
-   struct mm_struct *mm = current->mm;
int pkey;
 
/*
@@ -109,7 +112,10 @@ static noinline int bad_access_pkey(struct pt_regs *regs, 
unsigned long address,
 */
pkey = vma_pkey(vma);
 
-   mmap_read_unlock(mm);
+   if (mm)
+   mmap_read_unlock(mm);
+   else
+   vma_end_read(vma);
 
/*
 * If we are in kernel mode, bail out with a SEGV, this will
@@ -124,9 +130,10 @@ static noinline int bad_access_pkey(struct pt_regs *regs, 
unsigned long address,
return 0;
 }
 
-static noinline int bad_access(struct pt_regs *regs, unsigned long address)
+static noinline int bad_access(struct pt_regs *regs, unsigned long address,
+  struct mm_struct *mm, struct vm_area_struct *vma)
 {
-   return __bad_area(regs, address, SEGV_ACCERR);
+   return __bad_area(regs, address, SEGV_ACCERR, mm, vma);
 }
 
 static int do_sigbus(struct pt_regs *regs, unsigned long address,
@@ -479,13 +486,13 @@ static int ___do_page_fault(struct pt_regs *regs, 
unsigned long address,
 
if (unlikely(access_pkey_error(is_write, is_exec,
   (error_code & DSISR_KEYFAULT), vma))) {
-   vma_end_read(vma);
-   goto lock_mmap;
+   count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
+   return bad_access_pkey(regs, address, NULL, vma);
}
 
if (unlikely(access_error(is_write, is_exec, vma))) {
-   vma_end_read(vma);
-   goto lock_mmap;
+   count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
+   return bad_access(regs, address, NULL, vma);
}
 
fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, 
regs);
@@ -521,10 +528,10 @@ static int ___do_page_fault(struct pt_regs *regs, 
unsigned long address,
 
if (unlikely(access_pkey_error(is_write, is_exec,
   (error_code & DSISR_KEYFAULT), vma)))
-   return bad_access_pkey(regs, address, vma);
+   return bad_access_pkey(regs, address, mm, vma);
 
if (unlikely(access_error(is_write, is_exec, vma)))
-   return bad_access(regs, address);
+   return bad_access(regs, address, mm, vma);
 
/*
 * If for any reason at all we couldn't handle the fault,
-- 
2.27.0



[PATCH 3/7] arm: mm: accelerate pagefault when VM_FAULT_BADACCESS

2024-04-02 Thread Kefeng Wang
The vm_flags of the vma are already checked under the per-VMA lock; if
it is a bad access, directly set fault to VM_FAULT_BADACCESS and handle
the error, so there is no need to lock_mm_and_find_vma() and check
vm_flags again.

Signed-off-by: Kefeng Wang 
---
 arch/arm/mm/fault.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index 439dc6a26bb9..5c4b417e24f9 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -294,7 +294,9 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct 
pt_regs *regs)
 
if (!(vma->vm_flags & vm_flags)) {
vma_end_read(vma);
-   goto lock_mmap;
+   count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
+   fault = VM_FAULT_BADACCESS;
+   goto bad_area;
}
fault = handle_mm_fault(vma, addr, flags | FAULT_FLAG_VMA_LOCK, regs);
if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
-- 
2.27.0



[PATCH 2/7] arm64: mm: accelerate pagefault when VM_FAULT_BADACCESS

2024-04-02 Thread Kefeng Wang
The vm_flags of the vma are already checked under the per-VMA lock; if
it is a bad access, directly set fault to VM_FAULT_BADACCESS and handle
the error, there is no need to lock_mm_and_find_vma() and check vm_flags
again; the latency is reduced by 34% in lmbench 'lat_sig -P 1 prot lat_sig'.

Signed-off-by: Kefeng Wang 
---
 arch/arm64/mm/fault.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 9bb9f395351a..405f9aa831bd 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -572,7 +572,9 @@ static int __kprobes do_page_fault(unsigned long far, 
unsigned long esr,
 
if (!(vma->vm_flags & vm_flags)) {
vma_end_read(vma);
-   goto lock_mmap;
+   fault = VM_FAULT_BADACCESS;
+   count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
+   goto done;
}
fault = handle_mm_fault(vma, addr, mm_flags | FAULT_FLAG_VMA_LOCK, 
regs);
if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
-- 
2.27.0



Re: [PATCH] asm/io: remove unnecessary xlate_dev_mem_ptr() and unxlate_dev_mem_ptr()

2023-11-19 Thread Kefeng Wang




On 2023/11/20 14:40, Arnd Bergmann wrote:

On Mon, Nov 20, 2023, at 01:39, Kefeng Wang wrote:

On 2023/11/20 3:34, Geert Uytterhoeven wrote:

On Sat, Nov 18, 2023 at 11:09 AM Kefeng Wang  wrote:


-/*
- * Convert a physical pointer to a virtual kernel pointer for /dev/mem
- * access
- */
-#define xlate_dev_mem_ptr(p)   __va(p)
-#define unxlate_dev_mem_ptr(p, v) do { } while (0)
-
   void __ioread64_copy(void *to, const void __iomem *from, size_t count);


Missing #include , according to the build bot report.


Will check the bot report.


I had planned to pick up the series from

https://lore.kernel.org/lkml/20230921110424.215592-3-...@redhat.com/


Good to see it.



for v6.7 but didn't make it in the end. I'll try to do it now
for v6.8 and apply your v1 patch with the Acks on top.


Thanks.



 Arnd


[PATCH v2] asm/io: remove unnecessary xlate_dev_mem_ptr() and unxlate_dev_mem_ptr()

2023-11-19 Thread Kefeng Wang
The asm-generic/io.h header already has a default definition, so remove
the unnecessary per-arch definitions.
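For context, the default in include/asm-generic/io.h that makes the
per-arch copies redundant is the usual ifndef-override pattern, roughly
along these lines (paraphrased, so treat the exact signatures as
approximate):

#ifndef xlate_dev_mem_ptr
#define xlate_dev_mem_ptr xlate_dev_mem_ptr
/* default: /dev/mem accesses go through the kernel linear mapping */
static inline void *xlate_dev_mem_ptr(phys_addr_t addr)
{
	return __va(addr);
}
#endif

#ifndef unxlate_dev_mem_ptr
#define unxlate_dev_mem_ptr unxlate_dev_mem_ptr
static inline void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr)
{
}
#endif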

Cc: Richard Henderson 
Cc: Ivan Kokshaysky 
Cc: Russell King 
Cc: Brian Cain 
Cc: "James E.J. Bottomley" 
Cc: Nicholas Piggin 
Cc: Christophe Leroy 
Cc: Yoshinori Sato 
Cc: Rich Felker 
Cc: "David S. Miller" 
Cc: Stanislav Kinsburskii 
Reviewed-by: Geert Uytterhoeven   [m68k]
Acked-by: Geert Uytterhoeven  [m68k]
Reviewed-by: Geert Uytterhoeven[sh]
Signed-off-by: Kefeng Wang 
---
v2:
- remove the mips change, since it needs extra work to enable
  
 arch/alpha/include/asm/io.h| 6 --
 arch/arm/include/asm/io.h  | 6 --
 arch/hexagon/include/asm/io.h  | 6 --
 arch/m68k/include/asm/io_mm.h  | 6 --
 arch/parisc/include/asm/io.h   | 6 --
 arch/powerpc/include/asm/io.h  | 6 --
 arch/sh/include/asm/io.h   | 7 ---
 arch/sparc/include/asm/io_64.h | 6 --
 8 files changed, 49 deletions(-)

diff --git a/arch/alpha/include/asm/io.h b/arch/alpha/include/asm/io.h
index 7aeaf7c30a6f..5e5d21ebc584 100644
--- a/arch/alpha/include/asm/io.h
+++ b/arch/alpha/include/asm/io.h
@@ -651,12 +651,6 @@ extern void outsl (unsigned long port, const void *src, 
unsigned long count);
 #endif
 #define RTC_ALWAYS_BCD 0
 
-/*
- * Convert a physical pointer to a virtual kernel pointer for /dev/mem
- * access
- */
-#define xlate_dev_mem_ptr(p)   __va(p)
-
 /*
  * These get provided from  since alpha does not
  * select GENERIC_IOMAP.
diff --git a/arch/arm/include/asm/io.h b/arch/arm/include/asm/io.h
index 56b08ed6cc3b..1815748f5d2a 100644
--- a/arch/arm/include/asm/io.h
+++ b/arch/arm/include/asm/io.h
@@ -407,12 +407,6 @@ struct pci_dev;
 #define pci_iounmap pci_iounmap
 extern void pci_iounmap(struct pci_dev *dev, void __iomem *addr);
 
-/*
- * Convert a physical pointer to a virtual kernel pointer for /dev/mem
- * access
- */
-#define xlate_dev_mem_ptr(p)   __va(p)
-
 #include 
 
 #ifdef CONFIG_MMU
diff --git a/arch/hexagon/include/asm/io.h b/arch/hexagon/include/asm/io.h
index e2b308e32a37..97d57751ce3b 100644
--- a/arch/hexagon/include/asm/io.h
+++ b/arch/hexagon/include/asm/io.h
@@ -58,12 +58,6 @@ static inline void *phys_to_virt(unsigned long address)
return __va(address);
 }
 
-/*
- * convert a physical pointer to a virtual kernel pointer for
- * /dev/mem access.
- */
-#define xlate_dev_mem_ptr(p)__va(p)
-
 /*
  * IO port access primitives.  Hexagon doesn't have special IO access
  * instructions; all I/O is memory mapped.
diff --git a/arch/m68k/include/asm/io_mm.h b/arch/m68k/include/asm/io_mm.h
index 47525f2a57e1..090aec54b8fa 100644
--- a/arch/m68k/include/asm/io_mm.h
+++ b/arch/m68k/include/asm/io_mm.h
@@ -389,12 +389,6 @@ static inline void isa_delay(void)
 
 #define __ARCH_HAS_NO_PAGE_ZERO_MAPPED 1
 
-/*
- * Convert a physical pointer to a virtual kernel pointer for /dev/mem
- * access
- */
-#define xlate_dev_mem_ptr(p)   __va(p)
-
 #define readb_relaxed(addr)readb(addr)
 #define readw_relaxed(addr)readw(addr)
 #define readl_relaxed(addr)readl(addr)
diff --git a/arch/parisc/include/asm/io.h b/arch/parisc/include/asm/io.h
index 366537042465..9c06cafb0e70 100644
--- a/arch/parisc/include/asm/io.h
+++ b/arch/parisc/include/asm/io.h
@@ -267,12 +267,6 @@ extern void iowrite64be(u64 val, void __iomem *addr);
 #define iowrite16_rep iowrite16_rep
 #define iowrite32_rep iowrite32_rep
 
-/*
- * Convert a physical pointer to a virtual kernel pointer for /dev/mem
- * access
- */
-#define xlate_dev_mem_ptr(p)   __va(p)
-
 extern int devmem_is_allowed(unsigned long pfn);
 
 #include 
diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h
index 5220274a6277..79421c285066 100644
--- a/arch/powerpc/include/asm/io.h
+++ b/arch/powerpc/include/asm/io.h
@@ -709,12 +709,6 @@ static inline void name at 
\
 #define memcpy_fromio memcpy_fromio
 #define memcpy_toio memcpy_toio
 
-/*
- * Convert a physical pointer to a virtual kernel pointer for /dev/mem
- * access
- */
-#define xlate_dev_mem_ptr(p)   __va(p)
-
 /*
  * We don't do relaxed operations yet, at least not with this semantic
  */
diff --git a/arch/sh/include/asm/io.h b/arch/sh/include/asm/io.h
index ac521f287fa5..be7ac06423a9 100644
--- a/arch/sh/include/asm/io.h
+++ b/arch/sh/include/asm/io.h
@@ -304,13 +304,6 @@ unsigned long long poke_real_address_q(unsigned long long 
addr,
 
 #define ioremap_uc ioremap
 
-/*
- * Convert a physical pointer to a virtual kernel pointer for /dev/mem
- * access
- */
-#define xlate_dev_mem_ptr(p)   __va(p)
-#define unxlate_dev_mem_ptr(p, v) do { } while (0)
-
 #include 
 
 #define ARCH_HAS_VALID_PHYS_ADDR_RANGE
diff --git a/arch/sparc/include/asm/io_64.h b/arch/sparc/include/asm/io_64.h
index 9303270b22f3..75ae9bf3bb7b 100644
--- a/arch/sparc/include/asm/io_64.h
+++ b/arch/sparc/include/asm/io_64.h
@@ -470,12 +470,6 @@ static inline int sbus_can_burst64(void)
 struct device;
 void sbus_set_sbus64(

Re: [PATCH] asm/io: remove unnecessary xlate_dev_mem_ptr() and unxlate_dev_mem_ptr()

2023-11-19 Thread Kefeng Wang




On 2023/11/20 3:34, Geert Uytterhoeven wrote:

On Sat, Nov 18, 2023 at 11:09 AM Kefeng Wang  wrote:

The asm-generic/io.h header already has a default definition, so remove
the unnecessary per-arch definitions.

Cc: Richard Henderson 
Cc: Ivan Kokshaysky 
Cc: Russell King 
Cc: Brian Cain 
Cc: "James E.J. Bottomley" 
Cc: Nicholas Piggin 
Cc: Christophe Leroy 
Cc: Yoshinori Sato 
Cc: Rich Felker 
Cc: "David S. Miller" 
Cc: Stanislav Kinsburskii 
Signed-off-by: Kefeng Wang 



  arch/m68k/include/asm/io_mm.h  | 6 --


Reviewed-by: Geert Uytterhoeven 
Acked-by: Geert Uytterhoeven 


  arch/sh/include/asm/io.h   | 7 ---


Reviewed-by: Geert Uytterhoeven 



Thanks,



--- a/arch/mips/include/asm/io.h
+++ b/arch/mips/include/asm/io.h
@@ -548,13 +548,6 @@ extern void (*_dma_cache_inv)(unsigned long start, 
unsigned long size);
  #define csr_out32(v, a) (*(volatile u32 *)((unsigned long)(a) + 
__CSR_32_ADJUST) = (v))
  #define csr_in32(a)(*(volatile u32 *)((unsigned long)(a) + 
__CSR_32_ADJUST))

-/*
- * Convert a physical pointer to a virtual kernel pointer for /dev/mem
- * access
- */
-#define xlate_dev_mem_ptr(p)   __va(p)
-#define unxlate_dev_mem_ptr(p, v) do { } while (0)
-
  void __ioread64_copy(void *to, const void __iomem *from, size_t count);


Missing #include , according to the build bot report.


Will check the bot report.




  #endif /* _ASM_IO_H */


Gr{oetje,eeting}s,

 Geert



[PATCH] asm/io: remove unnecessary xlate_dev_mem_ptr() and unxlate_dev_mem_ptr()

2023-11-18 Thread Kefeng Wang
The asm-generic/io.h header already has a default definition, so remove
the unnecessary per-arch definitions.

Cc: Richard Henderson 
Cc: Ivan Kokshaysky 
Cc: Russell King 
Cc: Brian Cain 
Cc: "James E.J. Bottomley" 
Cc: Nicholas Piggin 
Cc: Christophe Leroy 
Cc: Yoshinori Sato 
Cc: Rich Felker 
Cc: "David S. Miller" 
Cc: Stanislav Kinsburskii 
Signed-off-by: Kefeng Wang 
---
 arch/alpha/include/asm/io.h| 6 --
 arch/arm/include/asm/io.h  | 6 --
 arch/hexagon/include/asm/io.h  | 6 --
 arch/m68k/include/asm/io_mm.h  | 6 --
 arch/mips/include/asm/io.h | 7 ---
 arch/parisc/include/asm/io.h   | 6 --
 arch/powerpc/include/asm/io.h  | 6 --
 arch/sh/include/asm/io.h   | 7 ---
 arch/sparc/include/asm/io_64.h | 6 --
 9 files changed, 56 deletions(-)

diff --git a/arch/alpha/include/asm/io.h b/arch/alpha/include/asm/io.h
index 7aeaf7c30a6f..5e5d21ebc584 100644
--- a/arch/alpha/include/asm/io.h
+++ b/arch/alpha/include/asm/io.h
@@ -651,12 +651,6 @@ extern void outsl (unsigned long port, const void *src, 
unsigned long count);
 #endif
 #define RTC_ALWAYS_BCD 0
 
-/*
- * Convert a physical pointer to a virtual kernel pointer for /dev/mem
- * access
- */
-#define xlate_dev_mem_ptr(p)   __va(p)
-
 /*
  * These get provided from  since alpha does not
  * select GENERIC_IOMAP.
diff --git a/arch/arm/include/asm/io.h b/arch/arm/include/asm/io.h
index 56b08ed6cc3b..1815748f5d2a 100644
--- a/arch/arm/include/asm/io.h
+++ b/arch/arm/include/asm/io.h
@@ -407,12 +407,6 @@ struct pci_dev;
 #define pci_iounmap pci_iounmap
 extern void pci_iounmap(struct pci_dev *dev, void __iomem *addr);
 
-/*
- * Convert a physical pointer to a virtual kernel pointer for /dev/mem
- * access
- */
-#define xlate_dev_mem_ptr(p)   __va(p)
-
 #include 
 
 #ifdef CONFIG_MMU
diff --git a/arch/hexagon/include/asm/io.h b/arch/hexagon/include/asm/io.h
index e2b308e32a37..97d57751ce3b 100644
--- a/arch/hexagon/include/asm/io.h
+++ b/arch/hexagon/include/asm/io.h
@@ -58,12 +58,6 @@ static inline void *phys_to_virt(unsigned long address)
return __va(address);
 }
 
-/*
- * convert a physical pointer to a virtual kernel pointer for
- * /dev/mem access.
- */
-#define xlate_dev_mem_ptr(p)__va(p)
-
 /*
  * IO port access primitives.  Hexagon doesn't have special IO access
  * instructions; all I/O is memory mapped.
diff --git a/arch/m68k/include/asm/io_mm.h b/arch/m68k/include/asm/io_mm.h
index 47525f2a57e1..090aec54b8fa 100644
--- a/arch/m68k/include/asm/io_mm.h
+++ b/arch/m68k/include/asm/io_mm.h
@@ -389,12 +389,6 @@ static inline void isa_delay(void)
 
 #define __ARCH_HAS_NO_PAGE_ZERO_MAPPED 1
 
-/*
- * Convert a physical pointer to a virtual kernel pointer for /dev/mem
- * access
- */
-#define xlate_dev_mem_ptr(p)   __va(p)
-
 #define readb_relaxed(addr)readb(addr)
 #define readw_relaxed(addr)readw(addr)
 #define readl_relaxed(addr)readl(addr)
diff --git a/arch/mips/include/asm/io.h b/arch/mips/include/asm/io.h
index 062dd4e6b954..2158ff302430 100644
--- a/arch/mips/include/asm/io.h
+++ b/arch/mips/include/asm/io.h
@@ -548,13 +548,6 @@ extern void (*_dma_cache_inv)(unsigned long start, 
unsigned long size);
 #define csr_out32(v, a) (*(volatile u32 *)((unsigned long)(a) + 
__CSR_32_ADJUST) = (v))
 #define csr_in32(a)(*(volatile u32 *)((unsigned long)(a) + 
__CSR_32_ADJUST))
 
-/*
- * Convert a physical pointer to a virtual kernel pointer for /dev/mem
- * access
- */
-#define xlate_dev_mem_ptr(p)   __va(p)
-#define unxlate_dev_mem_ptr(p, v) do { } while (0)
-
 void __ioread64_copy(void *to, const void __iomem *from, size_t count);
 
 #endif /* _ASM_IO_H */
diff --git a/arch/parisc/include/asm/io.h b/arch/parisc/include/asm/io.h
index 366537042465..9c06cafb0e70 100644
--- a/arch/parisc/include/asm/io.h
+++ b/arch/parisc/include/asm/io.h
@@ -267,12 +267,6 @@ extern void iowrite64be(u64 val, void __iomem *addr);
 #define iowrite16_rep iowrite16_rep
 #define iowrite32_rep iowrite32_rep
 
-/*
- * Convert a physical pointer to a virtual kernel pointer for /dev/mem
- * access
- */
-#define xlate_dev_mem_ptr(p)   __va(p)
-
 extern int devmem_is_allowed(unsigned long pfn);
 
 #include 
diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h
index 5220274a6277..79421c285066 100644
--- a/arch/powerpc/include/asm/io.h
+++ b/arch/powerpc/include/asm/io.h
@@ -709,12 +709,6 @@ static inline void name at 
\
 #define memcpy_fromio memcpy_fromio
 #define memcpy_toio memcpy_toio
 
-/*
- * Convert a physical pointer to a virtual kernel pointer for /dev/mem
- * access
- */
-#define xlate_dev_mem_ptr(p)   __va(p)
-
 /*
  * We don't do relaxed operations yet, at least not with this semantic
  */
diff --git a/arch/sh/include/asm/io.h b/arch/sh/include/asm/io.h
index ac521f287fa5..be7ac06423a9 100644
--- a/arch/sh/include/asm/io.h
+++ b/arch/sh/include/asm/io.h
@@ -304,13 +304,6 @@ unsigned long long poke_real_address_q(unsigned long long 
addr,
 

Re: [PATCH rfc v2 04/10] s390: mm: use try_vma_locked_page_fault()

2023-08-25 Thread Kefeng Wang




On 2023/8/24 16:32, Heiko Carstens wrote:

On Thu, Aug 24, 2023 at 10:16:33AM +0200, Alexander Gordeev wrote:

On Mon, Aug 21, 2023 at 08:30:50PM +0800, Kefeng Wang wrote:

Use new try_vma_locked_page_fault() helper to simplify code.
No functional change intended.

Signed-off-by: Kefeng Wang 
---
  arch/s390/mm/fault.c | 66 ++--
  1 file changed, 27 insertions(+), 39 deletions(-)

...

-   fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, 
regs);
-   if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
-   vma_end_read(vma);
-   if (!(fault & VM_FAULT_RETRY)) {
-   count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
-   if (likely(!(fault & VM_FAULT_ERROR)))
-   fault = 0;


This fault fixup is removed in the new version.

...


+   vmf.vm_flags = VM_WRITE;
+   if (vmf.vm_flags == VM_WRITE)
+   vmf.flags |= FAULT_FLAG_WRITE;
+
+   fault = try_vma_locked_page_fault();
+   if (fault == VM_FAULT_NONE)
+   goto lock_mm;


Because VM_FAULT_NONE is defined as 0, it gets confused with
the success code of 0 returned by a fault handler. In the
former case we want to continue, while in the latter we want
to return successfully. I think this applies to all archs.
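That is, with the helper as posted in this RFC (names taken from the RFC,
not from mainline), the ambiguity looks like this:

	fault = try_vma_locked_page_fault(&vmf);
	if (fault == VM_FAULT_NONE)	/* also 0 for a successfully handled fault */
		goto lock_mm;		/* so the fault may be handled a second
					 * time under mmap_lock */

VM_FAULT_NONE is literally 0, and a successfully handled minor fault also
comes back from handle_mm_fault() as 0 once no error or retry bits are
set, so the caller cannot tell "the VMA-lock path was not attempted" apart
from "the fault was already handled".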

...

FWIW, this series ends up with kernel BUG at arch/s390/mm/fault.c:341!




I didn't test it, only built it; this is an RFC to find out whether
the approach of adding three more members into vmf and using vmf in the
arch page fault code is feasible or not.


Without having looked in detail into this patch: all of this is likely
because s390's fault handling is quite odd. Not only because fault is set
to 0, but also because of the private VM_FAULT values like
VM_FAULT_BADCONTEXT. I'm just cleaning up all of this, but it won't make it
for the next merge window.


Sure, if I re-post I will drop the s390 change, but as mentioned above, 
the abstraction of the generic VMA-locked fault path and its changes may 
not be perfect,

so let's wait for more responses.

Thanks all.



Therefore I'd like to ask to drop the s390 conversion of this series, and
if this series is supposed to be merged the s390 conversion needs to be
done later. Let's not waste more time on the current implementation,
please.


Re: [PATCH rfc v2 01/10] mm: add a generic VMA lock-based page fault handler

2023-08-25 Thread Kefeng Wang




On 2023/8/24 15:12, Alexander Gordeev wrote:

On Mon, Aug 21, 2023 at 08:30:47PM +0800, Kefeng Wang wrote:

Hi Kefeng,


ARCH_SUPPORTS_PER_VMA_LOCK is enabled by more and more architectures,
e.g. x86, arm64, powerpc, s390 and riscv. Those implementations are very
similar, which results in some duplicated code, so let's add a generic VMA
lock-based page fault handler, try_vma_locked_page_fault(), to eliminate
the duplication; it also makes it easy to support this on new architectures.

Since different architectures use different ways to check whether a vma is
accessible or not, the struct pt_regs, the page fault error code and the
vma flags are added into struct vm_fault; the architecture's page fault
code can then reuse struct vm_fault to record and check vma accessibility
with its own implementation.

Signed-off-by: Kefeng Wang 
---

...

+
+vm_fault_t try_vma_locked_page_fault(struct vm_fault *vmf)
+{
+   vm_fault_t fault = VM_FAULT_NONE;
+   struct vm_area_struct *vma;
+
+   if (!(vmf->flags & FAULT_FLAG_USER))
+   return fault;
+
+   vma = lock_vma_under_rcu(current->mm, vmf->real_address);
+   if (!vma)
+   return fault;
+
+   if (arch_vma_access_error(vma, vmf)) {
+   vma_end_read(vma);
+   return fault;
+   }
+
+   fault = handle_mm_fault(vma, vmf->real_address,
+   vmf->flags | FAULT_FLAG_VMA_LOCK, vmf->regs);
+
+   if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
+   vma_end_read(vma);


Could you please explain how vma_end_read() call could be conditional?


The check is added for swap and userfault, see

https://lkml.kernel.org/r/20230630211957.1341547-4-sur...@google.com
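The pattern behind that conditional unlock, as a commented sketch (the
authoritative reasoning is in the linked patch; this is only a
restatement):

	fault = handle_mm_fault(vma, vmf->real_address,
				vmf->flags | FAULT_FLAG_VMA_LOCK, vmf->regs);
	if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
		/*
		 * Only drop the per-VMA lock if handle_mm_fault() kept it
		 * held. For VM_FAULT_RETRY and VM_FAULT_COMPLETED (e.g. the
		 * swap-in and userfaultfd paths) the lock was already
		 * released inside the fault handler before waiting, so a
		 * second vma_end_read() here would be wrong.
		 */
		vma_end_read(vma);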



+
+   if (fault & VM_FAULT_RETRY)
+   count_vm_vma_lock_event(VMA_LOCK_RETRY);
+   else
+   count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
+
+   return fault;
+}
+
  #endif /* CONFIG_PER_VMA_LOCK */
  
  #ifndef __PAGETABLE_P4D_FOLDED


Re: [PATCH rfc v2 05/10] powerpc: mm: use try_vma_locked_page_fault()

2023-08-22 Thread Kefeng Wang




On 2023/8/22 17:38, Christophe Leroy wrote:



Le 21/08/2023 à 14:30, Kefeng Wang a écrit :

Use new try_vma_locked_page_fault() helper to simplify code.
No functional change intended.


Does it really simplify the code? It's 32 insertions versus 34 deletions,
so it only removes 2 lines.


Yes, it is unfriendly for powerpc as the arch's vma access check is much
more complex than on other arches,


I don't like the struct vm_fault you are adding because when it was four
independent variables it was handled through local registers. Now that
it is a struct it has to go via the stack, leading to unnecessary memory
reads and writes. And going back and forth between architecture code and
generic code may also hurt performance.


Because different arches use different variables to check vma access, the
easy way was to add them into vmf; I don't find a better way.


Did you do any performance analysis? Page faults are really a hot
path when dealing with minor faults.


No, this is only built; it is an RFC to get feedback on the conversion.

Thanks.



Thanks
Christophe



Signed-off-by: Kefeng Wang 
---
   arch/powerpc/mm/fault.c | 66 -
   1 file changed, 32 insertions(+), 34 deletions(-)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index b1723094d464..52f9546e020e 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -391,6 +391,22 @@ static int page_fault_is_bad(unsigned long err)
   #define page_fault_is_bad(__err) ((__err) & DSISR_BAD_FAULT_32S)
   #endif
   
+#ifdef CONFIG_PER_VMA_LOCK

+bool arch_vma_access_error(struct vm_area_struct *vma, struct vm_fault *vmf)
+{
+   int is_exec = TRAP(vmf->regs) == INTERRUPT_INST_STORAGE;
+   int is_write = page_fault_is_write(vmf->fault_code);
+
+   if (unlikely(access_pkey_error(is_write, is_exec,
+   (vmf->fault_code & DSISR_KEYFAULT), vma)))
+   return true;
+
+   if (unlikely(access_error(is_write, is_exec, vma)))
+   return true;
+   return false;
+}
+#endif
+
   /*
* For 600- and 800-family processors, the error_code parameter is DSISR
* for a data fault, SRR1 for an instruction fault.
@@ -407,12 +423,18 @@ static int ___do_page_fault(struct pt_regs *regs, 
unsigned long address,
   {
struct vm_area_struct * vma;
struct mm_struct *mm = current->mm;
-   unsigned int flags = FAULT_FLAG_DEFAULT;
int is_exec = TRAP(regs) == INTERRUPT_INST_STORAGE;
int is_user = user_mode(regs);
int is_write = page_fault_is_write(error_code);
vm_fault_t fault, major = 0;
bool kprobe_fault = kprobe_page_fault(regs, 11);
+   struct vm_fault vmf = {
+   .real_address = address,
+   .fault_code = error_code,
+   .regs = regs,
+   .flags = FAULT_FLAG_DEFAULT,
+   };
+
   
   	if (unlikely(debugger_fault_handler(regs) || kprobe_fault))

return 0;
@@ -463,45 +485,21 @@ static int ___do_page_fault(struct pt_regs *regs, 
unsigned long address,
 * mmap_lock held
 */
if (is_user)
-   flags |= FAULT_FLAG_USER;
+   vmf.flags |= FAULT_FLAG_USER;
if (is_write)
-   flags |= FAULT_FLAG_WRITE;
+   vmf.flags |= FAULT_FLAG_WRITE;
if (is_exec)
-   flags |= FAULT_FLAG_INSTRUCTION;
+   vmf.flags |= FAULT_FLAG_INSTRUCTION;
   
-	if (!(flags & FAULT_FLAG_USER))

-   goto lock_mmap;
-
-   vma = lock_vma_under_rcu(mm, address);
-   if (!vma)
-   goto lock_mmap;
-
-   if (unlikely(access_pkey_error(is_write, is_exec,
-  (error_code & DSISR_KEYFAULT), vma))) {
-   vma_end_read(vma);
-   goto lock_mmap;
-   }
-
-   if (unlikely(access_error(is_write, is_exec, vma))) {
-   vma_end_read(vma);
-   goto lock_mmap;
-   }
-
-   fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, 
regs);
-   if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
-   vma_end_read(vma);
-
-   if (!(fault & VM_FAULT_RETRY)) {
-   count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
+   fault = try_vma_locked_page_fault();
+   if (fault == VM_FAULT_NONE)
+   goto retry;
+   if (!(fault & VM_FAULT_RETRY))
goto done;
-   }
-   count_vm_vma_lock_event(VMA_LOCK_RETRY);
   
   	if (fault_signal_pending(fault, regs))

return user_mode(regs) ? 0 : SIGBUS;
   
-lock_mmap:

-
/* When running in the kernel we expect faults to occur only to
 * addresses in user space.  All other faults represent errors in the
 * kernel and should generate an OOPS.  Unfortunately, in the case of an
@@ -528,7 +526,7 @@ static int ___do_page_fault(struct pt_regs *regs, unsigned 
long address,

[PATCH rfc v2 02/10] arm64: mm: use try_vma_locked_page_fault()

2023-08-21 Thread Kefeng Wang
Use the new try_vma_locked_page_fault() helper to simplify the code; also
pass a struct vm_fault to __do_page_fault() directly instead of each
independent variable. No functional change intended.

Signed-off-by: Kefeng Wang 
---
 arch/arm64/mm/fault.c | 60 ---
 1 file changed, 22 insertions(+), 38 deletions(-)

diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 2e5d1e238af9..2b7a1e610b3e 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -498,9 +498,8 @@ static void do_bad_area(unsigned long far, unsigned long 
esr,
 #define VM_FAULT_BADACCESS ((__force vm_fault_t)0x02)
 
 static vm_fault_t __do_page_fault(struct mm_struct *mm,
- struct vm_area_struct *vma, unsigned long 
addr,
- unsigned int mm_flags, unsigned long vm_flags,
- struct pt_regs *regs)
+ struct vm_area_struct *vma,
+ struct vm_fault *vmf)
 {
/*
 * Ok, we have a good vm_area for this memory access, so we can handle
@@ -508,9 +507,9 @@ static vm_fault_t __do_page_fault(struct mm_struct *mm,
 * Check that the permissions on the VMA allow for the fault which
 * occurred.
 */
-   if (!(vma->vm_flags & vm_flags))
+   if (!(vma->vm_flags & vmf->vm_flags))
return VM_FAULT_BADACCESS;
-   return handle_mm_fault(vma, addr, mm_flags, regs);
+   return handle_mm_fault(vma, vmf->real_address, vmf->flags, vmf->regs);
 }
 
 static bool is_el0_instruction_abort(unsigned long esr)
@@ -533,10 +532,12 @@ static int __kprobes do_page_fault(unsigned long far, 
unsigned long esr,
const struct fault_info *inf;
struct mm_struct *mm = current->mm;
vm_fault_t fault;
-   unsigned long vm_flags;
-   unsigned int mm_flags = FAULT_FLAG_DEFAULT;
unsigned long addr = untagged_addr(far);
struct vm_area_struct *vma;
+   struct vm_fault vmf = {
+   .real_address = addr,
+   .flags = FAULT_FLAG_DEFAULT,
+   };
 
if (kprobe_page_fault(regs, esr))
return 0;
@@ -549,7 +550,7 @@ static int __kprobes do_page_fault(unsigned long far, 
unsigned long esr,
goto no_context;
 
if (user_mode(regs))
-   mm_flags |= FAULT_FLAG_USER;
+   vmf.flags |= FAULT_FLAG_USER;
 
/*
 * vm_flags tells us what bits we must have in vma->vm_flags
@@ -559,20 +560,20 @@ static int __kprobes do_page_fault(unsigned long far, 
unsigned long esr,
 */
if (is_el0_instruction_abort(esr)) {
/* It was exec fault */
-   vm_flags = VM_EXEC;
-   mm_flags |= FAULT_FLAG_INSTRUCTION;
+   vmf.vm_flags = VM_EXEC;
+   vmf.flags |= FAULT_FLAG_INSTRUCTION;
} else if (is_write_abort(esr)) {
/* It was write fault */
-   vm_flags = VM_WRITE;
-   mm_flags |= FAULT_FLAG_WRITE;
+   vmf.vm_flags = VM_WRITE;
+   vmf.flags |= FAULT_FLAG_WRITE;
} else {
/* It was read fault */
-   vm_flags = VM_READ;
+   vmf.vm_flags = VM_READ;
/* Write implies read */
-   vm_flags |= VM_WRITE;
+   vmf.vm_flags |= VM_WRITE;
/* If EPAN is absent then exec implies read */
if (!cpus_have_const_cap(ARM64_HAS_EPAN))
-   vm_flags |= VM_EXEC;
+   vmf.vm_flags |= VM_EXEC;
}
 
if (is_ttbr0_addr(addr) && is_el1_permission_fault(addr, esr, regs)) {
@@ -587,26 +588,11 @@ static int __kprobes do_page_fault(unsigned long far, 
unsigned long esr,
 
perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, addr);
 
-   if (!(mm_flags & FAULT_FLAG_USER))
-   goto lock_mmap;
-
-   vma = lock_vma_under_rcu(mm, addr);
-   if (!vma)
-   goto lock_mmap;
-
-   if (!(vma->vm_flags & vm_flags)) {
-   vma_end_read(vma);
-   goto lock_mmap;
-   }
-   fault = handle_mm_fault(vma, addr, mm_flags | FAULT_FLAG_VMA_LOCK, 
regs);
-   if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
-   vma_end_read(vma);
-
-   if (!(fault & VM_FAULT_RETRY)) {
-   count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
+   fault = try_vma_locked_page_fault();
+   if (fault == VM_FAULT_NONE)
+   goto retry;
+   if (!(fault & VM_FAULT_RETRY))
goto done;
-   }
-   count_vm_vma_lock_event(VMA_LOCK_RETRY);
 
/* Quick path to respond to signals */
if (fault_signal_pending(fault, regs)) {
@@ -614,8 +600,6 @@ static int __kprobes do_page_fault(unsigned long far, 
unsigned long esr,
goto no_co

[PATCH rfc v2 04/10] s390: mm: use try_vma_locked_page_fault()

2023-08-21 Thread Kefeng Wang
Use new try_vma_locked_page_fault() helper to simplify code.
No functional change intended.

Signed-off-by: Kefeng Wang 
---
 arch/s390/mm/fault.c | 66 ++--
 1 file changed, 27 insertions(+), 39 deletions(-)

diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
index 099c4824dd8a..fbbdebde6ea7 100644
--- a/arch/s390/mm/fault.c
+++ b/arch/s390/mm/fault.c
@@ -357,16 +357,18 @@ static noinline void do_fault_error(struct pt_regs *regs, 
vm_fault_t fault)
 static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
 {
struct gmap *gmap;
-   struct task_struct *tsk;
-   struct mm_struct *mm;
struct vm_area_struct *vma;
enum fault_type type;
-   unsigned long address;
-   unsigned int flags;
+   struct mm_struct *mm = current->mm;
+   unsigned long address = get_fault_address(regs);
vm_fault_t fault;
bool is_write;
+   struct vm_fault vmf = {
+   .real_address = address,
+   .flags = FAULT_FLAG_DEFAULT,
+   .vm_flags = access,
+   };
 
-   tsk = current;
/*
 * The instruction that caused the program check has
 * been nullified. Don't signal single step via SIGTRAP.
@@ -376,8 +378,6 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, 
int access)
if (kprobe_page_fault(regs, 14))
return 0;
 
-   mm = tsk->mm;
-   address = get_fault_address(regs);
is_write = fault_is_write(regs);
 
/*
@@ -398,45 +398,33 @@ static inline vm_fault_t do_exception(struct pt_regs 
*regs, int access)
}
 
perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
-   flags = FAULT_FLAG_DEFAULT;
if (user_mode(regs))
-   flags |= FAULT_FLAG_USER;
+   vmf.flags |= FAULT_FLAG_USER;
if (is_write)
-   access = VM_WRITE;
-   if (access == VM_WRITE)
-   flags |= FAULT_FLAG_WRITE;
-   if (!(flags & FAULT_FLAG_USER))
-   goto lock_mmap;
-   vma = lock_vma_under_rcu(mm, address);
-   if (!vma)
-   goto lock_mmap;
-   if (!(vma->vm_flags & access)) {
-   vma_end_read(vma);
-   goto lock_mmap;
-   }
-   fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, 
regs);
-   if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
-   vma_end_read(vma);
-   if (!(fault & VM_FAULT_RETRY)) {
-   count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
-   if (likely(!(fault & VM_FAULT_ERROR)))
-   fault = 0;
+   vmf.vm_flags = VM_WRITE;
+   if (vmf.vm_flags == VM_WRITE)
+   vmf.flags |= FAULT_FLAG_WRITE;
+
+   fault = try_vma_locked_page_fault();
+   if (fault == VM_FAULT_NONE)
+   goto lock_mm;
+   if (!(fault & VM_FAULT_RETRY))
goto out;
-   }
-   count_vm_vma_lock_event(VMA_LOCK_RETRY);
+
/* Quick path to respond to signals */
if (fault_signal_pending(fault, regs)) {
fault = VM_FAULT_SIGNAL;
goto out;
}
-lock_mmap:
+
+lock_mm:
mmap_read_lock(mm);
 
gmap = NULL;
if (IS_ENABLED(CONFIG_PGSTE) && type == GMAP_FAULT) {
gmap = (struct gmap *) S390_lowcore.gmap;
current->thread.gmap_addr = address;
-   current->thread.gmap_write_flag = !!(flags & FAULT_FLAG_WRITE);
+   current->thread.gmap_write_flag = !!(vmf.flags & 
FAULT_FLAG_WRITE);
current->thread.gmap_int_code = regs->int_code & 0x;
address = __gmap_translate(gmap, address);
if (address == -EFAULT) {
@@ -444,7 +432,7 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, 
int access)
goto out_up;
}
if (gmap->pfault_enabled)
-   flags |= FAULT_FLAG_RETRY_NOWAIT;
+   vmf.flags |= FAULT_FLAG_RETRY_NOWAIT;
}
 
 retry:
@@ -466,7 +454,7 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, 
int access)
 * we can handle it..
 */
fault = VM_FAULT_BADACCESS;
-   if (unlikely(!(vma->vm_flags & access)))
+   if (unlikely(!(vma->vm_flags & vmf.vm_flags)))
goto out_up;
 
/*
@@ -474,10 +462,10 @@ static inline vm_fault_t do_exception(struct pt_regs 
*regs, int access)
 * make sure we exit gracefully rather than endlessly redo
 * the fault.
 */
-   fault = handle_mm_fault(vma, address, flags, regs);
+   fault = handle_mm_fault(vma, address, vmf.flags, regs);
if (fault_signal_pending(fault, regs)) {
fault = VM_FAULT_SIGNAL;
-   if (flags & FAULT_FLAG_RETRY_NOWAIT)
+  

[PATCH rfc v2 03/10] x86: mm: use try_vma_locked_page_fault()

2023-08-21 Thread Kefeng Wang
Use new try_vma_locked_page_fault() helper to simplify code.
No functional change intended.

Signed-off-by: Kefeng Wang 
---
 arch/x86/mm/fault.c | 55 +++--
 1 file changed, 23 insertions(+), 32 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index ab778eac1952..3edc9edc0b28 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1227,6 +1227,13 @@ do_kern_addr_fault(struct pt_regs *regs, unsigned long 
hw_error_code,
 }
 NOKPROBE_SYMBOL(do_kern_addr_fault);
 
+#ifdef CONFIG_PER_VMA_LOCK
+bool arch_vma_access_error(struct vm_area_struct *vma, struct vm_fault *vmf)
+{
+   return access_error(vmf->fault_code, vma);
+}
+#endif
+
 /*
  * Handle faults in the user portion of the address space.  Nothing in here
  * should check X86_PF_USER without a specific justification: for almost
@@ -1241,13 +1248,13 @@ void do_user_addr_fault(struct pt_regs *regs,
unsigned long address)
 {
struct vm_area_struct *vma;
-   struct task_struct *tsk;
-   struct mm_struct *mm;
+   struct mm_struct *mm = current->mm;
vm_fault_t fault;
-   unsigned int flags = FAULT_FLAG_DEFAULT;
-
-   tsk = current;
-   mm = tsk->mm;
+   struct vm_fault vmf = {
+   .real_address = address,
+   .fault_code = error_code,
+   .flags = FAULT_FLAG_DEFAULT
+   };
 
if (unlikely((error_code & (X86_PF_USER | X86_PF_INSTR)) == 
X86_PF_INSTR)) {
/*
@@ -1311,7 +1318,7 @@ void do_user_addr_fault(struct pt_regs *regs,
 */
if (user_mode(regs)) {
local_irq_enable();
-   flags |= FAULT_FLAG_USER;
+   vmf.flags |= FAULT_FLAG_USER;
} else {
if (regs->flags & X86_EFLAGS_IF)
local_irq_enable();
@@ -1326,11 +1333,11 @@ void do_user_addr_fault(struct pt_regs *regs,
 * maybe_mkwrite() can create a proper shadow stack PTE.
 */
if (error_code & X86_PF_SHSTK)
-   flags |= FAULT_FLAG_WRITE;
+   vmf.flags |= FAULT_FLAG_WRITE;
if (error_code & X86_PF_WRITE)
-   flags |= FAULT_FLAG_WRITE;
+   vmf.flags |= FAULT_FLAG_WRITE;
if (error_code & X86_PF_INSTR)
-   flags |= FAULT_FLAG_INSTRUCTION;
+   vmf.flags |= FAULT_FLAG_INSTRUCTION;
 
 #ifdef CONFIG_X86_64
/*
@@ -1350,26 +1357,11 @@ void do_user_addr_fault(struct pt_regs *regs,
}
 #endif
 
-   if (!(flags & FAULT_FLAG_USER))
-   goto lock_mmap;
-
-   vma = lock_vma_under_rcu(mm, address);
-   if (!vma)
-   goto lock_mmap;
-
-   if (unlikely(access_error(error_code, vma))) {
-   vma_end_read(vma);
-   goto lock_mmap;
-   }
-   fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, 
regs);
-   if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
-   vma_end_read(vma);
-
-   if (!(fault & VM_FAULT_RETRY)) {
-   count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
+   fault = try_vma_locked_page_fault();
+   if (fault == VM_FAULT_NONE)
+   goto retry;
+   if (!(fault & VM_FAULT_RETRY))
goto done;
-   }
-   count_vm_vma_lock_event(VMA_LOCK_RETRY);
 
/* Quick path to respond to signals */
if (fault_signal_pending(fault, regs)) {
@@ -1379,7 +1371,6 @@ void do_user_addr_fault(struct pt_regs *regs,
 ARCH_DEFAULT_PKEY);
return;
}
-lock_mmap:
 
 retry:
vma = lock_mm_and_find_vma(mm, address, regs);
@@ -1410,7 +1401,7 @@ void do_user_addr_fault(struct pt_regs *regs,
 * userland). The return to userland is identified whenever
 * FAULT_FLAG_USER|FAULT_FLAG_KILLABLE are both set in flags.
 */
-   fault = handle_mm_fault(vma, address, flags, regs);
+   fault = handle_mm_fault(vma, address, vmf.flags, regs);
 
if (fault_signal_pending(fault, regs)) {
/*
@@ -1434,7 +1425,7 @@ void do_user_addr_fault(struct pt_regs *regs,
 * that we made any progress. Handle this case first.
 */
if (unlikely(fault & VM_FAULT_RETRY)) {
-   flags |= FAULT_FLAG_TRIED;
+   vmf.flags |= FAULT_FLAG_TRIED;
goto retry;
}
 
-- 
2.27.0



[PATCH rfc v2 07/10] ARM: mm: try VMA lock-based page fault handling first

2023-08-21 Thread Kefeng Wang
Attempt VMA lock-based page fault handling first, and fall back
to the existing mmap_lock-based handling if that fails.

Signed-off-by: Kefeng Wang 
---
 arch/arm/Kconfig|  1 +
 arch/arm/mm/fault.c | 35 +--
 2 files changed, 26 insertions(+), 10 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 1a6a6eb48a15..8b6d4507ccee 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -34,6 +34,7 @@ config ARM
select ARCH_OPTIONAL_KERNEL_RWX_DEFAULT if CPU_V7
select ARCH_SUPPORTS_ATOMIC_RMW
select ARCH_SUPPORTS_HUGETLBFS if ARM_LPAE
+   select ARCH_SUPPORTS_PER_VMA_LOCK
select ARCH_USE_BUILTIN_BSWAP
select ARCH_USE_CMPXCHG_LOCKREF
select ARCH_USE_MEMTEST
diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index fef62e4a9edd..d53bb028899a 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -242,8 +242,11 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct 
pt_regs *regs)
struct vm_area_struct *vma;
int sig, code;
vm_fault_t fault;
-   unsigned int flags = FAULT_FLAG_DEFAULT;
-   unsigned long vm_flags = VM_ACCESS_FLAGS;
+   struct vm_fault vmf = {
+   .real_address = addr,
+   .flags = FAULT_FLAG_DEFAULT,
+   .vm_flags = VM_ACCESS_FLAGS,
+   };
 
if (kprobe_page_fault(regs, fsr))
return 0;
@@ -261,15 +264,15 @@ do_page_fault(unsigned long addr, unsigned int fsr, 
struct pt_regs *regs)
goto no_context;
 
if (user_mode(regs))
-   flags |= FAULT_FLAG_USER;
+   vmf.flags |= FAULT_FLAG_USER;
 
if (is_write_fault(fsr)) {
-   flags |= FAULT_FLAG_WRITE;
-   vm_flags = VM_WRITE;
+   vmf.flags |= FAULT_FLAG_WRITE;
+   vmf.vm_flags = VM_WRITE;
}
 
if (fsr & FSR_LNX_PF) {
-   vm_flags = VM_EXEC;
+   vmf.vm_flags = VM_EXEC;
 
if (is_permission_fault(fsr) && !user_mode(regs))
die_kernel_fault("execution of memory",
@@ -278,6 +281,18 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct 
pt_regs *regs)
 
perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, addr);
 
+   fault = try_vma_locked_page_fault();
+   if (fault == VM_FAULT_NONE)
+   goto retry;
+   if (!(fault & VM_FAULT_RETRY))
+   goto done;
+
+   if (fault_signal_pending(fault, regs)) {
+   if (!user_mode(regs))
+   goto no_context;
+   return 0;
+   }
+
 retry:
vma = lock_mm_and_find_vma(mm, addr, regs);
if (unlikely(!vma)) {
@@ -289,10 +304,10 @@ do_page_fault(unsigned long addr, unsigned int fsr, 
struct pt_regs *regs)
 * ok, we have a good vm_area for this memory access, check the
 * permissions on the VMA allow for the fault which occurred.
 */
-   if (!(vma->vm_flags & vm_flags))
+   if (!(vma->vm_flags & vmf.vm_flags))
fault = VM_FAULT_BADACCESS;
else
-   fault = handle_mm_fault(vma, addr & PAGE_MASK, flags, regs);
+   fault = handle_mm_fault(vma, addr & PAGE_MASK, vmf.flags, regs);
 
/* If we need to retry but a fatal signal is pending, handle the
 * signal first. We do not need to release the mmap_lock because
@@ -310,13 +325,13 @@ do_page_fault(unsigned long addr, unsigned int fsr, 
struct pt_regs *regs)
 
if (!(fault & VM_FAULT_ERROR)) {
if (fault & VM_FAULT_RETRY) {
-   flags |= FAULT_FLAG_TRIED;
+   vmf.flags |= FAULT_FLAG_TRIED;
goto retry;
}
}
 
mmap_read_unlock(mm);
-
+done:
/*
 * Handle the "normal" case first - VM_FAULT_MAJOR
 */
-- 
2.27.0



[PATCH rfc v2 08/10] loongarch: mm: cleanup __do_page_fault()

2023-08-21 Thread Kefeng Wang
Clean up __do_page_fault() by reusing the bad_area_nosemaphore and
bad_area labels.

Signed-off-by: Kefeng Wang 
---
 arch/loongarch/mm/fault.c | 48 +--
 1 file changed, 16 insertions(+), 32 deletions(-)

diff --git a/arch/loongarch/mm/fault.c b/arch/loongarch/mm/fault.c
index e6376e3dce86..5d4c742c4bc5 100644
--- a/arch/loongarch/mm/fault.c
+++ b/arch/loongarch/mm/fault.c
@@ -157,18 +157,15 @@ static void __kprobes __do_page_fault(struct pt_regs 
*regs,
if (!user_mode(regs))
no_context(regs, write, address);
else
-   do_sigsegv(regs, write, address, si_code);
-   return;
+   goto bad_area_nosemaphore;
}
 
/*
 * If we're in an interrupt or have no user
 * context, we must not take the fault..
 */
-   if (faulthandler_disabled() || !mm) {
-   do_sigsegv(regs, write, address, si_code);
-   return;
-   }
+   if (faulthandler_disabled() || !mm)
+   goto bad_area_nosemaphore;
 
if (user_mode(regs))
flags |= FAULT_FLAG_USER;
@@ -178,23 +175,7 @@ static void __kprobes __do_page_fault(struct pt_regs *regs,
vma = lock_mm_and_find_vma(mm, address, regs);
if (unlikely(!vma))
goto bad_area_nosemaphore;
-   goto good_area;
-
-/*
- * Something tried to access memory that isn't in our memory map..
- * Fix it, but check if it's kernel or user first..
- */
-bad_area:
-   mmap_read_unlock(mm);
-bad_area_nosemaphore:
-   do_sigsegv(regs, write, address, si_code);
-   return;
 
-/*
- * Ok, we have a good vm_area for this memory access, so
- * we can handle it..
- */
-good_area:
si_code = SEGV_ACCERR;
 
if (write) {
@@ -235,22 +216,25 @@ static void __kprobes __do_page_fault(struct pt_regs 
*regs,
 */
goto retry;
}
+
+   mmap_read_unlock(mm);
+
if (unlikely(fault & VM_FAULT_ERROR)) {
-   mmap_read_unlock(mm);
-   if (fault & VM_FAULT_OOM) {
+   if (fault & VM_FAULT_OOM)
do_out_of_memory(regs, write, address);
-   return;
-   } else if (fault & VM_FAULT_SIGSEGV) {
-   do_sigsegv(regs, write, address, si_code);
-   return;
-   } else if (fault & 
(VM_FAULT_SIGBUS|VM_FAULT_HWPOISON|VM_FAULT_HWPOISON_LARGE)) {
+   else if (fault & VM_FAULT_SIGSEGV)
+   goto bad_area_nosemaphore;
+   else if (fault & 
(VM_FAULT_SIGBUS|VM_FAULT_HWPOISON|VM_FAULT_HWPOISON_LARGE))
do_sigbus(regs, write, address, si_code);
-   return;
-   }
-   BUG();
+   else
+   BUG();
}
 
+   return;
+bad_area:
mmap_read_unlock(mm);
+bad_area_nosemaphore:
+   do_sigsegv(regs, write, address, si_code);
 }
 
 asmlinkage void __kprobes do_page_fault(struct pt_regs *regs,
-- 
2.27.0



[PATCH rfc v2 09/10] loongarch: mm: add access_error() helper

2023-08-21 Thread Kefeng Wang
Add an access_error() helper to check whether the vma is accessible; it
will be used by __do_page_fault() and later by the VMA lock-based page
fault path.

Signed-off-by: Kefeng Wang 
---
 arch/loongarch/mm/fault.c | 30 --
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/arch/loongarch/mm/fault.c b/arch/loongarch/mm/fault.c
index 5d4c742c4bc5..2a45e9f3a485 100644
--- a/arch/loongarch/mm/fault.c
+++ b/arch/loongarch/mm/fault.c
@@ -126,6 +126,22 @@ static void __kprobes do_sigsegv(struct pt_regs *regs,
force_sig_fault(SIGSEGV, si_code, (void __user *)address);
 }
 
+static inline bool access_error(unsigned int flags, struct pt_regs *regs,
+   unsigned long addr, struct vm_area_struct *vma)
+{
+   if (flags & FAULT_FLAG_WRITE) {
+   if (!(vma->vm_flags & VM_WRITE))
+   return true;
+   } else {
+   if (!(vma->vm_flags & VM_READ) && addr != exception_era(regs))
+   return true;
+   if (!(vma->vm_flags & VM_EXEC) && addr == exception_era(regs))
+   return true;
+   }
+
+   return false;
+}
+
 /*
  * This routine handles page faults.  It determines the address,
  * and the problem, and then passes it off to one of the appropriate
@@ -169,6 +185,8 @@ static void __kprobes __do_page_fault(struct pt_regs *regs,
 
if (user_mode(regs))
flags |= FAULT_FLAG_USER;
+   if (write)
+   flags |= FAULT_FLAG_WRITE;
 
perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
 retry:
@@ -178,16 +196,8 @@ static void __kprobes __do_page_fault(struct pt_regs *regs,
 
si_code = SEGV_ACCERR;
 
-   if (write) {
-   flags |= FAULT_FLAG_WRITE;
-   if (!(vma->vm_flags & VM_WRITE))
-   goto bad_area;
-   } else {
-   if (!(vma->vm_flags & VM_READ) && address != 
exception_era(regs))
-   goto bad_area;
-   if (!(vma->vm_flags & VM_EXEC) && address == 
exception_era(regs))
-   goto bad_area;
-   }
+   if (access_error(flags, regs, vma))
+   goto bad_area;
 
/*
 * If for any reason at all we couldn't handle the fault,
-- 
2.27.0



[PATCH rfc -next v2 00/10] mm: convert to generic VMA lock-based page fault

2023-08-21 Thread Kefeng Wang
Add a generic VMA lock-based page fault handler in the mm core and convert
architectures to use it, which eliminates the duplicated code across
architectures.

With it, we can avoid touching every architecture's code when we add a new
feature or a bugfix; at the end of the series, the feature is also enabled
on ARM32 and LoongArch.

This is based on next-20230817 and is only build-tested.

v2:
- convert "int arch_vma_check_access()" to "bool arch_vma_access_error()";
  still use a __weak function for arch_vma_access_error(), which avoids
  declaring access_error() in the architecture (x86/powerpc/riscv/loongarch)
  header files.
- re-use struct vm_fault instead of adding a new struct vm_locked_fault,
  per Matthew Wilcox; add the necessary pt_regs/fault error code/vm flags
  into vm_fault since they can be used in arch_vma_access_error().
- add a special VM_FAULT_NONE value and make try_vma_locked_page_fault()
  return vm_fault_t.
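
As a quick orientation, the arch-side pattern that the per-architecture
patches below converge on looks roughly like the following sketch (the
exact flag setup, labels and which vm_fault fields are filled differ per
architecture):

        struct vm_fault vmf = {
                .real_address = address,
                .regs         = regs,           /* only where the arch needs it */
                .fault_code   = error_code,     /* only where the arch needs it */
                .flags        = FAULT_FLAG_DEFAULT,
        };

        /* set FAULT_FLAG_USER/WRITE/INSTRUCTION and, if used, vmf.vm_flags */

        fault = try_vma_locked_page_fault(&vmf);
        if (fault == VM_FAULT_NONE)
                goto retry;     /* VMA lock not attempted, take the mmap_lock path */
        if (!(fault & VM_FAULT_RETRY))
                goto done;      /* handled (or failed) under the per-VMA lock */
        /* VM_FAULT_RETRY: check fault_signal_pending(), then retry under mmap_lock */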

Kefeng Wang (10):
  mm: add a generic VMA lock-based page fault handler
  arm64: mm: use try_vma_locked_page_fault()
  x86: mm: use try_vma_locked_page_fault()
  s390: mm: use try_vma_locked_page_fault()
  powerpc: mm: use try_vma_locked_page_fault()
  riscv: mm: use try_vma_locked_page_fault()
  ARM: mm: try VMA lock-based page fault handling first
  loongarch: mm: cleanup __do_page_fault()
  loongarch: mm: add access_error() helper
  loongarch: mm: try VMA lock-based page fault handling first

 arch/arm/Kconfig  |   1 +
 arch/arm/mm/fault.c   |  35 
 arch/arm64/mm/fault.c |  60 -
 arch/loongarch/Kconfig|   1 +
 arch/loongarch/mm/fault.c | 111 ++
 arch/powerpc/mm/fault.c   |  66 +++
 arch/riscv/mm/fault.c |  58 +---
 arch/s390/mm/fault.c  |  66 ++-
 arch/x86/mm/fault.c   |  55 ---
 include/linux/mm.h|  17 ++
 include/linux/mm_types.h  |   2 +
 mm/memory.c   |  39 ++
 12 files changed, 278 insertions(+), 233 deletions(-)

-- 
2.27.0



[PATCH rfc v2 05/10] powerpc: mm: use try_vma_locked_page_fault()

2023-08-21 Thread Kefeng Wang
Use new try_vma_locked_page_fault() helper to simplify code.
No functional change intended.

Signed-off-by: Kefeng Wang 
---
 arch/powerpc/mm/fault.c | 66 -
 1 file changed, 32 insertions(+), 34 deletions(-)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index b1723094d464..52f9546e020e 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -391,6 +391,22 @@ static int page_fault_is_bad(unsigned long err)
 #define page_fault_is_bad(__err)   ((__err) & DSISR_BAD_FAULT_32S)
 #endif
 
+#ifdef CONFIG_PER_VMA_LOCK
+bool arch_vma_access_error(struct vm_area_struct *vma, struct vm_fault *vmf)
+{
+   int is_exec = TRAP(vmf->regs) == INTERRUPT_INST_STORAGE;
+   int is_write = page_fault_is_write(vmf->fault_code);
+
+   if (unlikely(access_pkey_error(is_write, is_exec,
+   (vmf->fault_code & DSISR_KEYFAULT), vma)))
+   return true;
+
+   if (unlikely(access_error(is_write, is_exec, vma)))
+   return true;
+   return false;
+}
+#endif
+
 /*
  * For 600- and 800-family processors, the error_code parameter is DSISR
  * for a data fault, SRR1 for an instruction fault.
@@ -407,12 +423,18 @@ static int ___do_page_fault(struct pt_regs *regs, 
unsigned long address,
 {
struct vm_area_struct * vma;
struct mm_struct *mm = current->mm;
-   unsigned int flags = FAULT_FLAG_DEFAULT;
int is_exec = TRAP(regs) == INTERRUPT_INST_STORAGE;
int is_user = user_mode(regs);
int is_write = page_fault_is_write(error_code);
vm_fault_t fault, major = 0;
bool kprobe_fault = kprobe_page_fault(regs, 11);
+   struct vm_fault vmf = {
+   .real_address = address,
+   .fault_code = error_code,
+   .regs = regs,
+   .flags = FAULT_FLAG_DEFAULT,
+   };
+
 
if (unlikely(debugger_fault_handler(regs) || kprobe_fault))
return 0;
@@ -463,45 +485,21 @@ static int ___do_page_fault(struct pt_regs *regs, 
unsigned long address,
 * mmap_lock held
 */
if (is_user)
-   flags |= FAULT_FLAG_USER;
+   vmf.flags |= FAULT_FLAG_USER;
if (is_write)
-   flags |= FAULT_FLAG_WRITE;
+   vmf.flags |= FAULT_FLAG_WRITE;
if (is_exec)
-   flags |= FAULT_FLAG_INSTRUCTION;
+   vmf.flags |= FAULT_FLAG_INSTRUCTION;
 
-   if (!(flags & FAULT_FLAG_USER))
-   goto lock_mmap;
-
-   vma = lock_vma_under_rcu(mm, address);
-   if (!vma)
-   goto lock_mmap;
-
-   if (unlikely(access_pkey_error(is_write, is_exec,
-  (error_code & DSISR_KEYFAULT), vma))) {
-   vma_end_read(vma);
-   goto lock_mmap;
-   }
-
-   if (unlikely(access_error(is_write, is_exec, vma))) {
-   vma_end_read(vma);
-   goto lock_mmap;
-   }
-
-   fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, 
regs);
-   if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
-   vma_end_read(vma);
-
-   if (!(fault & VM_FAULT_RETRY)) {
-   count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
+   fault = try_vma_locked_page_fault();
+   if (fault == VM_FAULT_NONE)
+   goto retry;
+   if (!(fault & VM_FAULT_RETRY))
goto done;
-   }
-   count_vm_vma_lock_event(VMA_LOCK_RETRY);
 
if (fault_signal_pending(fault, regs))
return user_mode(regs) ? 0 : SIGBUS;
 
-lock_mmap:
-
/* When running in the kernel we expect faults to occur only to
 * addresses in user space.  All other faults represent errors in the
 * kernel and should generate an OOPS.  Unfortunately, in the case of an
@@ -528,7 +526,7 @@ static int ___do_page_fault(struct pt_regs *regs, unsigned 
long address,
 * make sure we exit gracefully rather than endlessly redo
 * the fault.
 */
-   fault = handle_mm_fault(vma, address, flags, regs);
+   fault = handle_mm_fault(vma, address, vmf.flags, regs);
 
major |= fault & VM_FAULT_MAJOR;
 
@@ -544,7 +542,7 @@ static int ___do_page_fault(struct pt_regs *regs, unsigned 
long address,
 * case.
 */
if (unlikely(fault & VM_FAULT_RETRY)) {
-   flags |= FAULT_FLAG_TRIED;
+   vmf.flags |= FAULT_FLAG_TRIED;
goto retry;
}
 
-- 
2.27.0



[PATCH rfc v2 06/10] riscv: mm: use try_vma_locked_page_fault()

2023-08-21 Thread Kefeng Wang
Use new try_vma_locked_page_fault() helper to simplify code.
No functional change intended.

Signed-off-by: Kefeng Wang 
---
 arch/riscv/mm/fault.c | 58 ++-
 1 file changed, 24 insertions(+), 34 deletions(-)

diff --git a/arch/riscv/mm/fault.c b/arch/riscv/mm/fault.c
index 6115d7514972..b46129b636f2 100644
--- a/arch/riscv/mm/fault.c
+++ b/arch/riscv/mm/fault.c
@@ -215,6 +215,13 @@ static inline bool access_error(unsigned long cause, 
struct vm_area_struct *vma)
return false;
 }
 
+#ifdef CONFIG_PER_VMA_LOCK
+bool arch_vma_access_error(struct vm_area_struct *vma, struct vm_fault *vmf)
+{
+   return access_error(vmf->fault_code, vma);
+}
+#endif
+
 /*
  * This routine handles page faults.  It determines the address and the
  * problem, and then passes it off to one of the appropriate routines.
@@ -223,17 +230,16 @@ void handle_page_fault(struct pt_regs *regs)
 {
struct task_struct *tsk;
struct vm_area_struct *vma;
-   struct mm_struct *mm;
-   unsigned long addr, cause;
-   unsigned int flags = FAULT_FLAG_DEFAULT;
+   struct mm_struct *mm = current->mm;
+   unsigned long addr = regs->badaddr;
+   unsigned long cause = regs->cause;
int code = SEGV_MAPERR;
vm_fault_t fault;
-
-   cause = regs->cause;
-   addr = regs->badaddr;
-
-   tsk = current;
-   mm = tsk->mm;
+   struct vm_fault vmf = {
+   .real_address = addr,
+   .fault_code = cause,
+   .flags = FAULT_FLAG_DEFAULT,
+   };
 
if (kprobe_page_fault(regs, cause))
return;
@@ -268,7 +274,7 @@ void handle_page_fault(struct pt_regs *regs)
}
 
if (user_mode(regs))
-   flags |= FAULT_FLAG_USER;
+   vmf.flags |= FAULT_FLAG_USER;
 
if (!user_mode(regs) && addr < TASK_SIZE && unlikely(!(regs->status & 
SR_SUM))) {
if (fixup_exception(regs))
@@ -280,37 +286,21 @@ void handle_page_fault(struct pt_regs *regs)
perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, addr);
 
if (cause == EXC_STORE_PAGE_FAULT)
-   flags |= FAULT_FLAG_WRITE;
+   vmf.flags |= FAULT_FLAG_WRITE;
else if (cause == EXC_INST_PAGE_FAULT)
-   flags |= FAULT_FLAG_INSTRUCTION;
-   if (!(flags & FAULT_FLAG_USER))
-   goto lock_mmap;
-
-   vma = lock_vma_under_rcu(mm, addr);
-   if (!vma)
-   goto lock_mmap;
+   vmf.flags |= FAULT_FLAG_INSTRUCTION;
 
-   if (unlikely(access_error(cause, vma))) {
-   vma_end_read(vma);
-   goto lock_mmap;
-   }
-
-   fault = handle_mm_fault(vma, addr, flags | FAULT_FLAG_VMA_LOCK, regs);
-   if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
-   vma_end_read(vma);
-
-   if (!(fault & VM_FAULT_RETRY)) {
-   count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
+   fault = try_vma_locked_page_fault();
+   if (fault == VM_FAULT_NONE)
+   goto retry;
+   if (!(fault & VM_FAULT_RETRY))
goto done;
-   }
-   count_vm_vma_lock_event(VMA_LOCK_RETRY);
 
if (fault_signal_pending(fault, regs)) {
if (!user_mode(regs))
no_context(regs, addr);
return;
}
-lock_mmap:
 
 retry:
vma = lock_mm_and_find_vma(mm, addr, regs);
@@ -337,7 +327,7 @@ void handle_page_fault(struct pt_regs *regs)
 * make sure we exit gracefully rather than endlessly redo
 * the fault.
 */
-   fault = handle_mm_fault(vma, addr, flags, regs);
+   fault = handle_mm_fault(vma, addr, vmf.flags, regs);
 
/*
 * If we need to retry but a fatal signal is pending, handle the
@@ -355,7 +345,7 @@ void handle_page_fault(struct pt_regs *regs)
return;
 
if (unlikely(fault & VM_FAULT_RETRY)) {
-   flags |= FAULT_FLAG_TRIED;
+   vmf.flags |= FAULT_FLAG_TRIED;
 
/*
 * No need to mmap_read_unlock(mm) as we would
-- 
2.27.0



[PATCH rfc v2 01/10] mm: add a generic VMA lock-based page fault handler

2023-08-21 Thread Kefeng Wang
ARCH_SUPPORTS_PER_VMA_LOCK is enabled by more and more architectures, e.g.
x86, arm64, powerpc, s390 and riscv. Those implementations are very similar,
which results in duplicated code, so let's add a generic VMA lock-based page
fault handler, try_vma_locked_page_fault(), to eliminate them; this also
makes it easy to support the feature on new architectures.

Since different architectures use different ways to check whether a vma is
accessible, the struct pt_regs, the page fault error code and the vma flags
are added into struct vm_fault; the architecture's page fault code can then
re-use struct vm_fault to record and check vma accessibility with its own
implementation.
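
As an example of what "each architecture's own implementation" looks like,
the x86 and riscv patches later in this series simply route their existing
access_error() helper through the new vm_fault fields (sketch taken from
those patches):

        #ifdef CONFIG_PER_VMA_LOCK
        bool arch_vma_access_error(struct vm_area_struct *vma, struct vm_fault *vmf)
        {
                /* vmf->fault_code carries the arch's page fault error code */
                return access_error(vmf->fault_code, vma);
        }
        #endif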

Signed-off-by: Kefeng Wang 
---
 include/linux/mm.h   | 17 +
 include/linux/mm_types.h |  2 ++
 mm/memory.c  | 39 +++
 3 files changed, 58 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 3f764e84e567..22a6f4c56ff3 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -512,9 +512,12 @@ struct vm_fault {
pgoff_t pgoff;  /* Logical page offset based on 
vma */
unsigned long address;  /* Faulting virtual address - 
masked */
unsigned long real_address; /* Faulting virtual address - 
unmasked */
+   unsigned long fault_code;   /* Faulting error code during 
page fault */
+   struct pt_regs *regs;   /* The registers stored during 
page fault */
};
enum fault_flag flags;  /* FAULT_FLAG_xxx flags
 * XXX: should really be 'const' */
+   vm_flags_t vm_flags;/* VMA flags to be used for access 
checking */
pmd_t *pmd; /* Pointer to pmd entry matching
 * the 'address' */
pud_t *pud; /* Pointer to pud entry matching
@@ -774,6 +777,9 @@ static inline void assert_fault_locked(struct vm_fault *vmf)
 struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
  unsigned long address);
 
+bool arch_vma_access_error(struct vm_area_struct *vma, struct vm_fault *vmf);
+vm_fault_t try_vma_locked_page_fault(struct vm_fault *vmf);
+
 #else /* CONFIG_PER_VMA_LOCK */
 
 static inline bool vma_start_read(struct vm_area_struct *vma)
@@ -801,6 +807,17 @@ static inline void assert_fault_locked(struct vm_fault 
*vmf)
mmap_assert_locked(vmf->vma->vm_mm);
 }
 
+static inline struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
+   unsigned long address)
+{
+   return NULL;
+}
+
+static inline vm_fault_t try_vma_locked_page_fault(struct vm_fault *vmf)
+{
+   return VM_FAULT_NONE;
+}
+
 #endif /* CONFIG_PER_VMA_LOCK */
 
 extern const struct vm_operations_struct vma_dummy_vm_ops;
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index f5ba5b0bc836..702820cea3f9 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1119,6 +1119,7 @@ typedef __bitwise unsigned int vm_fault_t;
  * fault. Used to decide whether a process gets delivered SIGBUS or
  * just gets major/minor fault counters bumped up.
  *
+ * @VM_FAULT_NONE: Special case, not starting to handle fault
  * @VM_FAULT_OOM:  Out Of Memory
  * @VM_FAULT_SIGBUS:   Bad access
  * @VM_FAULT_MAJOR:Page read from storage
@@ -1139,6 +1140,7 @@ typedef __bitwise unsigned int vm_fault_t;
  *
  */
 enum vm_fault_reason {
+   VM_FAULT_NONE   = (__force vm_fault_t)0x00,
VM_FAULT_OOM= (__force vm_fault_t)0x01,
VM_FAULT_SIGBUS = (__force vm_fault_t)0x02,
VM_FAULT_MAJOR  = (__force vm_fault_t)0x04,
diff --git a/mm/memory.c b/mm/memory.c
index 3b4aaa0d2fff..60fe35db5134 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5510,6 +5510,45 @@ struct vm_area_struct *lock_vma_under_rcu(struct 
mm_struct *mm,
count_vm_vma_lock_event(VMA_LOCK_ABORT);
return NULL;
 }
+
+#ifdef CONFIG_PER_VMA_LOCK
+bool __weak arch_vma_access_error(struct vm_area_struct *vma, struct vm_fault 
*vmf)
+{
+   return (vma->vm_flags & vmf->vm_flags) == 0;
+}
+#endif
+
+vm_fault_t try_vma_locked_page_fault(struct vm_fault *vmf)
+{
+   vm_fault_t fault = VM_FAULT_NONE;
+   struct vm_area_struct *vma;
+
+   if (!(vmf->flags & FAULT_FLAG_USER))
+   return fault;
+
+   vma = lock_vma_under_rcu(current->mm, vmf->real_address);
+   if (!vma)
+   return fault;
+
+   if (arch_vma_access_error(vma, vmf)) {
+   vma_end_read(vma);
+   return fault;
+   }
+
+   fault = handle_mm_fault(vma, vmf->real_address,
+   vmf->flags | FAULT_FLAG_VMA_LOCK, vmf->regs);
+
+   if (!(fault 

[PATCH rfc v2 10/10] loongarch: mm: try VMA lock-based page fault handling first

2023-08-21 Thread Kefeng Wang
Attempt VMA lock-based page fault handling first, and fall back
to the existing mmap_lock-based handling if that fails.

Signed-off-by: Kefeng Wang 
---
 arch/loongarch/Kconfig|  1 +
 arch/loongarch/mm/fault.c | 37 +++--
 2 files changed, 32 insertions(+), 6 deletions(-)

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 2b27b18a63af..6b821f621920 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -56,6 +56,7 @@ config LOONGARCH
select ARCH_SUPPORTS_LTO_CLANG
select ARCH_SUPPORTS_LTO_CLANG_THIN
select ARCH_SUPPORTS_NUMA_BALANCING
+   select ARCH_SUPPORTS_PER_VMA_LOCK
select ARCH_USE_BUILTIN_BSWAP
select ARCH_USE_CMPXCHG_LOCKREF
select ARCH_USE_QUEUED_RWLOCKS
diff --git a/arch/loongarch/mm/fault.c b/arch/loongarch/mm/fault.c
index 2a45e9f3a485..f7ac3a14bb06 100644
--- a/arch/loongarch/mm/fault.c
+++ b/arch/loongarch/mm/fault.c
@@ -142,6 +142,13 @@ static inline bool access_error(unsigned int flags, struct 
pt_regs *regs,
return false;
 }
 
+#ifdef CONFIG_PER_VMA_LOCK
+bool arch_vma_access_error(struct vm_area_struct *vma, struct vm_fault *vmf)
+{
+   return access_error(vmf->flags, vmf->regs, vmf->real_address, vma);
+}
+#endif
+
 /*
  * This routine handles page faults.  It determines the address,
  * and the problem, and then passes it off to one of the appropriate
@@ -151,11 +158,15 @@ static void __kprobes __do_page_fault(struct pt_regs 
*regs,
unsigned long write, unsigned long address)
 {
int si_code = SEGV_MAPERR;
-   unsigned int flags = FAULT_FLAG_DEFAULT;
struct task_struct *tsk = current;
struct mm_struct *mm = tsk->mm;
struct vm_area_struct *vma = NULL;
vm_fault_t fault;
+   struct vm_fault vmf = {
+   .real_address = address,
+   .regs = regs,
+   .flags = FAULT_FLAG_DEFAULT,
+   };
 
if (kprobe_page_fault(regs, current->thread.trap_nr))
return;
@@ -184,11 +195,24 @@ static void __kprobes __do_page_fault(struct pt_regs 
*regs,
goto bad_area_nosemaphore;
 
if (user_mode(regs))
-   flags |= FAULT_FLAG_USER;
+   vmf.flags |= FAULT_FLAG_USER;
if (write)
-   flags |= FAULT_FLAG_WRITE;
+   vmf.flags |= FAULT_FLAG_WRITE;
 
perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
+
+   fault = try_vma_locked_page_fault();
+   if (fault == VM_FAULT_NONE)
+   goto retry;
+   if (!(fault & VM_FAULT_RETRY))
+   goto done;
+
+   if (fault_signal_pending(fault, regs)) {
+   if (!user_mode(regs))
+   no_context(regs, write, address);
+   return;
+   }
+
 retry:
vma = lock_mm_and_find_vma(mm, address, regs);
if (unlikely(!vma))
@@ -196,7 +220,7 @@ static void __kprobes __do_page_fault(struct pt_regs *regs,
 
si_code = SEGV_ACCERR;
 
-   if (access_error(flags, regs, vma))
+   if (access_error(vmf.flags, regs, address, vma))
goto bad_area;
 
/*
@@ -204,7 +228,7 @@ static void __kprobes __do_page_fault(struct pt_regs *regs,
 * make sure we exit gracefully rather than endlessly redo
 * the fault.
 */
-   fault = handle_mm_fault(vma, address, flags, regs);
+   fault = handle_mm_fault(vma, address, vmf.flags, regs);
 
if (fault_signal_pending(fault, regs)) {
if (!user_mode(regs))
@@ -217,7 +241,7 @@ static void __kprobes __do_page_fault(struct pt_regs *regs,
return;
 
if (unlikely(fault & VM_FAULT_RETRY)) {
-   flags |= FAULT_FLAG_TRIED;
+   vmf.flags |= FAULT_FLAG_TRIED;
 
/*
 * No need to mmap_read_unlock(mm) as we would
@@ -229,6 +253,7 @@ static void __kprobes __do_page_fault(struct pt_regs *regs,
 
mmap_read_unlock(mm);
 
+done:
if (unlikely(fault & VM_FAULT_ERROR)) {
if (fault & VM_FAULT_OOM)
do_out_of_memory(regs, write, address);
-- 
2.27.0



Re: [PATCH rfc -next 01/10] mm: add a generic VMA lock-based page fault handler

2023-07-14 Thread Kefeng Wang




On 2023/7/14 9:52, Kefeng Wang wrote:



On 2023/7/14 4:12, Suren Baghdasaryan wrote:
On Thu, Jul 13, 2023 at 9:15 AM Matthew Wilcox  
wrote:


+int try_vma_locked_page_fault(struct vm_locked_fault *vmlf, 
vm_fault_t *ret)

+{
+ struct vm_area_struct *vma;
+ vm_fault_t fault;



On Thu, Jul 13, 2023 at 05:53:29PM +0800, Kefeng Wang wrote:
+#define VM_LOCKED_FAULT_INIT(_name, _mm, _address, _fault_flags, 
_vm_flags, _regs, _fault_code) \

+ _name.mm    = _mm;  \
+ _name.address   = _address; \
+ _name.fault_flags   = _fault_flags; \
+ _name.vm_flags  = _vm_flags;    \
+ _name.regs  = _regs;    \
+ _name.fault_code    = _fault_code


More consolidated code is a good idea; no question.  But I don't think
this is the right way to do it.


I agree it is not good enough, but each arch's vma access check has a
different implementation: some use vm flags, some need the fault code and
regs, and some use both :(




+int __weak arch_vma_check_access(struct vm_area_struct *vma,
+  struct vm_locked_fault *vmlf);


This should be:

#ifndef vma_check_access
bool vma_check_access(struct vm_area_struct *vma, )
{
 return (vma->vm_flags & vm_flags) == 0;
}
#endif

and then arches which want to do something different can just define
vma_check_access.


Ok, I could convert to use this way.
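
For illustration only (this is the #ifndef-override scheme sketched above,
not what the posted v2 ended up doing; v2 keeps a __weak
arch_vma_access_error() instead), an architecture wanting its own check
might provide something like the following, assuming the elided second
parameter is the struct vm_fault and reusing that arch's existing
access_error()-style helper:

        /* in an arch header (hypothetical), seen before the generic default: */
        #define vma_check_access vma_check_access
        static inline bool vma_check_access(struct vm_area_struct *vma,
                                            struct vm_fault *vmf)
        {
                /* returns true when the access is not allowed */
                return access_error(vmf->fault_code, vma);
        }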



+int try_vma_locked_page_fault(struct vm_locked_fault *vmlf, 
vm_fault_t *ret)

+{
+ struct vm_area_struct *vma;
+ vm_fault_t fault;


Declaring the vmf in this function and then copying it back is just 
wrong.

We need to declare vm_fault_t earlier (in the arch fault handler) and
pass it in.


Actually I pass vm_fault_t *ret in (from the arch fault handler); we
could directly use *ret instead of a new local variable, with no copy.


Did you mean to say "we need to declare vmf (struct vm_fault) earlier
(in the arch fault handler) and pass it in." ?


After rechecking the code, I think Matthew's idea is to 'declare vmf (struct
vm_fault) earlier', as Suren said, not vm_fault_t, right? Will try
this, thanks.
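
For comparison, the two shapes being discussed, as they appear in the v1
and v2 postings in this thread (sketch only; argument names follow the ARM
conversion):

        /* v1 (this rfc): result handed back through an out-parameter */
        struct vm_locked_fault vmlf;
        vm_fault_t fault;

        VM_LOCKED_FAULT_INIT(vmlf, mm, addr, flags, vm_flags, regs, fsr);
        if (try_vma_locked_page_fault(&vmlf, &fault))
                goto retry;

        /* v2: the caller declares struct vm_fault and the result is returned */
        struct vm_fault vmf = {
                .real_address = addr,
                .flags = FAULT_FLAG_DEFAULT,
        };

        fault = try_vma_locked_page_fault(&vmf);
        if (fault == VM_FAULT_NONE)
                goto retry;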





  I don't think that creating struct vm_locked_fault is the
right idea either.


As mentioned above for the vma access check, we need many arguments for the
function, so a new struct looks possibly better. Is there a better solution
or any suggestion?

Thanks.



Re: [PATCH rfc -next 01/10] mm: add a generic VMA lock-based page fault handler

2023-07-13 Thread Kefeng Wang




On 2023/7/14 4:12, Suren Baghdasaryan wrote:

On Thu, Jul 13, 2023 at 9:15 AM Matthew Wilcox  wrote:



+int try_vma_locked_page_fault(struct vm_locked_fault *vmlf, vm_fault_t *ret)
+{
+ struct vm_area_struct *vma;
+ vm_fault_t fault;



On Thu, Jul 13, 2023 at 05:53:29PM +0800, Kefeng Wang wrote:

+#define VM_LOCKED_FAULT_INIT(_name, _mm, _address, _fault_flags, _vm_flags, 
_regs, _fault_code) \
+ _name.mm= _mm;  \
+ _name.address   = _address; \
+ _name.fault_flags   = _fault_flags; \
+ _name.vm_flags  = _vm_flags;\
+ _name.regs  = _regs;\
+ _name.fault_code= _fault_code


More consolidated code is a good idea; no question.  But I don't think
this is the right way to do it.


I agree it is not good enough, but each arch's vma access check has a
different implementation: some use vm flags, some need the fault code and
regs, and some use both :(




+int __weak arch_vma_check_access(struct vm_area_struct *vma,
+  struct vm_locked_fault *vmlf);


This should be:

#ifndef vma_check_access
bool vma_check_access(struct vm_area_struct *vma, )
{
 return (vma->vm_flags & vm_flags) == 0;
}
#endif

and then arches which want to do something different can just define
vma_check_access.


Ok, I could convert to use this way.




+int try_vma_locked_page_fault(struct vm_locked_fault *vmlf, vm_fault_t *ret)
+{
+ struct vm_area_struct *vma;
+ vm_fault_t fault;


Declaring the vmf in this function and then copying it back is just wrong.
We need to declare vm_fault_t earlier (in the arch fault handler) and
pass it in.


Actually I pass vm_fault_t *ret in (from the arch fault handler); we
could directly use *ret instead of a new local variable, with no copy.


Did you mean to say "we need to declare vmf (struct vm_fault) earlier
(in the arch fault handler) and pass it in." ?


  I don't think that creating struct vm_locked_fault is the
right idea either.


As mentioned above for the vma access check, we need many arguments for the
function, so a new struct looks possibly better. Is there a better solution
or any suggestion?

Thanks.




+ if (!(vmlf->fault_flags & FAULT_FLAG_USER))
+ return -EINVAL;
+
+ vma = lock_vma_under_rcu(vmlf->mm, vmlf->address);
+ if (!vma)
+ return -EINVAL;
+
+ if (arch_vma_check_access(vma, vmlf)) {
+ vma_end_read(vma);
+ return -EINVAL;
+ }
+
+ fault = handle_mm_fault(vma, vmlf->address,
+ vmlf->fault_flags | FAULT_FLAG_VMA_LOCK,
+ vmlf->regs);
+ *ret = fault;
+
+ if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
+ vma_end_read(vma);
+
+ if ((fault & VM_FAULT_RETRY))
+ count_vm_vma_lock_event(VMA_LOCK_RETRY);
+ else
+ count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
+
+ return 0;
+}
+
  #endif /* CONFIG_PER_VMA_LOCK */

  #ifndef __PAGETABLE_P4D_FOLDED
--
2.27.0




Re: [PATCH rfc -next 00/10] mm: convert to generic VMA lock-based page fault

2023-07-13 Thread Kefeng Wang

Please ignore this one...

On 2023/7/13 17:51, Kefeng Wang wrote:

Add a generic VMA lock-based page fault handler in the mm core and convert
architectures to use it, which eliminates the duplicated code across
architectures.

With it, we can avoid touching every architecture's code when we add a new
feature or a bugfix.

This also fixes the missing riscv change from commit 38b3aec8e8d2 "mm: drop
per-VMA lock when returning VM_FAULT_RETRY or VM_FAULT_COMPLETED", and in
the end, the feature is enabled on ARM32/LoongArch too.

This is based on next-20230713 and is only build-tested (no loongarch
compiler, so loongarch is excluded).

Kefeng Wang (10):
   mm: add a generic VMA lock-based page fault handler
   x86: mm: use try_vma_locked_page_fault()
   arm64: mm: use try_vma_locked_page_fault()
   s390: mm: use try_vma_locked_page_fault()
   powerpc: mm: use try_vma_locked_page_fault()
   riscv: mm: use try_vma_locked_page_fault()
   ARM: mm: try VMA lock-based page fault handling first
   loongarch: mm: cleanup __do_page_fault()
   loongarch: mm: add access_error() helper
   loongarch: mm: try VMA lock-based page fault handling first

  arch/arm/Kconfig  |  1 +
  arch/arm/mm/fault.c   | 15 ++-
  arch/arm64/mm/fault.c | 28 +++-
  arch/loongarch/Kconfig|  1 +
  arch/loongarch/mm/fault.c | 92 ---
  arch/powerpc/mm/fault.c   | 54 ++-
  arch/riscv/mm/fault.c | 38 +++-
  arch/s390/mm/fault.c  | 23 +++---
  arch/x86/mm/fault.c   | 39 +++--
  include/linux/mm.h| 28 
  mm/memory.c   | 42 ++
  11 files changed, 206 insertions(+), 155 deletions(-)



[PATCH rfc -next 07/10] ARM: mm: try VMA lock-based page fault handling first

2023-07-13 Thread Kefeng Wang
Attempt VMA lock-based page fault handling first, and fall back
to the existing mmap_lock-based handling if that fails.

Signed-off-by: Kefeng Wang 
---
 arch/arm/Kconfig|  1 +
 arch/arm/mm/fault.c | 15 ++-
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 1a6a6eb48a15..8b6d4507ccee 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -34,6 +34,7 @@ config ARM
select ARCH_OPTIONAL_KERNEL_RWX_DEFAULT if CPU_V7
select ARCH_SUPPORTS_ATOMIC_RMW
select ARCH_SUPPORTS_HUGETLBFS if ARM_LPAE
+   select ARCH_SUPPORTS_PER_VMA_LOCK
select ARCH_USE_BUILTIN_BSWAP
select ARCH_USE_CMPXCHG_LOCKREF
select ARCH_USE_MEMTEST
diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index fef62e4a9edd..c44b83841e36 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -244,6 +244,7 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct 
pt_regs *regs)
vm_fault_t fault;
unsigned int flags = FAULT_FLAG_DEFAULT;
unsigned long vm_flags = VM_ACCESS_FLAGS;
+   struct vm_locked_fault vmlf;
 
if (kprobe_page_fault(regs, fsr))
return 0;
@@ -278,6 +279,18 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct 
pt_regs *regs)
 
perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, addr);
 
+   VM_LOCKED_FAULT_INIT(vmlf, mm, addr, flags, vm_flags, regs, fsr);
+   if (try_vma_locked_page_fault(, ))
+   goto retry;
+   else if (!(fault | VM_FAULT_RETRY))
+   goto done;
+
+   if (fault_signal_pending(fault, regs)) {
+   if (!user_mode(regs))
+   goto no_context;
+   return 0;
+   }
+
 retry:
vma = lock_mm_and_find_vma(mm, addr, regs);
if (unlikely(!vma)) {
@@ -316,7 +329,7 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct 
pt_regs *regs)
}
 
mmap_read_unlock(mm);
-
+done:
/*
 * Handle the "normal" case first - VM_FAULT_MAJOR
 */
-- 
2.27.0



[PATCH rfc -next 04/10] s390: mm: use try_vma_locked_page_fault()

2023-07-13 Thread Kefeng Wang
Use new try_vma_locked_page_fault() helper to simplify code.
No functional change intended.

Signed-off-by: Kefeng Wang 
---
 arch/s390/mm/fault.c | 23 ++-
 1 file changed, 6 insertions(+), 17 deletions(-)

diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
index 40a71063949b..97e511690352 100644
--- a/arch/s390/mm/fault.c
+++ b/arch/s390/mm/fault.c
@@ -362,6 +362,7 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, 
int access)
struct task_struct *tsk;
struct mm_struct *mm;
struct vm_area_struct *vma;
+   struct vm_locked_fault vmlf;
enum fault_type type;
unsigned long address;
unsigned int flags;
@@ -407,31 +408,19 @@ static inline vm_fault_t do_exception(struct pt_regs 
*regs, int access)
access = VM_WRITE;
if (access == VM_WRITE)
flags |= FAULT_FLAG_WRITE;
-#ifdef CONFIG_PER_VMA_LOCK
-   if (!(flags & FAULT_FLAG_USER))
-   goto lock_mmap;
-   vma = lock_vma_under_rcu(mm, address);
-   if (!vma)
-   goto lock_mmap;
-   if (!(vma->vm_flags & access)) {
-   vma_end_read(vma);
+
+   VM_LOCKED_FAULT_INIT(vmlf, mm, address, flags, access, regs, 0);
+   if (try_vma_locked_page_fault(, ))
goto lock_mmap;
-   }
-   fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, 
regs);
-   if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
-   vma_end_read(vma);
-   if (!(fault & VM_FAULT_RETRY)) {
-   count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
+   else if (!(fault | VM_FAULT_RETRY))
goto out;
-   }
-   count_vm_vma_lock_event(VMA_LOCK_RETRY);
+
/* Quick path to respond to signals */
if (fault_signal_pending(fault, regs)) {
fault = VM_FAULT_SIGNAL;
goto out;
}
 lock_mmap:
-#endif /* CONFIG_PER_VMA_LOCK */
mmap_read_lock(mm);
 
gmap = NULL;
-- 
2.27.0



[PATCH rfc -next 03/10] arm64: mm: use try_vma_locked_page_fault()

2023-07-13 Thread Kefeng Wang
Use new try_vma_locked_page_fault() helper to simplify code.
No functional change intended.

Signed-off-by: Kefeng Wang 
---
 arch/arm64/mm/fault.c | 28 +---
 1 file changed, 5 insertions(+), 23 deletions(-)

diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index b8c80f7b8a5f..614bb53fc1bc 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -537,6 +537,7 @@ static int __kprobes do_page_fault(unsigned long far, 
unsigned long esr,
unsigned int mm_flags = FAULT_FLAG_DEFAULT;
unsigned long addr = untagged_addr(far);
struct vm_area_struct *vma;
+   struct vm_locked_fault vmlf;
 
if (kprobe_page_fault(regs, esr))
return 0;
@@ -587,27 +588,11 @@ static int __kprobes do_page_fault(unsigned long far, 
unsigned long esr,
 
perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, addr);
 
-#ifdef CONFIG_PER_VMA_LOCK
-   if (!(mm_flags & FAULT_FLAG_USER))
-   goto lock_mmap;
-
-   vma = lock_vma_under_rcu(mm, addr);
-   if (!vma)
-   goto lock_mmap;
-
-   if (!(vma->vm_flags & vm_flags)) {
-   vma_end_read(vma);
-   goto lock_mmap;
-   }
-   fault = handle_mm_fault(vma, addr, mm_flags | FAULT_FLAG_VMA_LOCK, 
regs);
-   if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
-   vma_end_read(vma);
-
-   if (!(fault & VM_FAULT_RETRY)) {
-   count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
+   VM_LOCKED_FAULT_INIT(vmlf, mm, addr, mm_flags, vm_flags, regs, esr);
+   if (try_vma_locked_page_fault(, ))
+   goto retry;
+   else if (!(fault | VM_FAULT_RETRY))
goto done;
-   }
-   count_vm_vma_lock_event(VMA_LOCK_RETRY);
 
/* Quick path to respond to signals */
if (fault_signal_pending(fault, regs)) {
@@ -615,9 +600,6 @@ static int __kprobes do_page_fault(unsigned long far, 
unsigned long esr,
goto no_context;
return 0;
}
-lock_mmap:
-#endif /* CONFIG_PER_VMA_LOCK */
-
 retry:
vma = lock_mm_and_find_vma(mm, addr, regs);
if (unlikely(!vma)) {
-- 
2.27.0



[PATCH rfc -next 01/10] mm: add a generic VMA lock-based page fault handler

2023-07-13 Thread Kefeng Wang
More and more architectures enable ARCH_SUPPORTS_PER_VMA_LOCK, e.g. x86,
arm64, powerpc, s390 and riscv. Those implementations are very similar,
which results in duplicated code, so let's add a generic VMA lock-based
page fault handler to eliminate them; this also makes it easy to support
the feature on new architectures.

Signed-off-by: Kefeng Wang 
---
 include/linux/mm.h | 28 
 mm/memory.c| 42 ++
 2 files changed, 70 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index c7886784832b..cba1b7b19c9d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -633,6 +633,15 @@ static inline void vma_numab_state_init(struct 
vm_area_struct *vma) {}
 static inline void vma_numab_state_free(struct vm_area_struct *vma) {}
 #endif /* CONFIG_NUMA_BALANCING */
 
+struct vm_locked_fault {
+   struct mm_struct *mm;
+   unsigned long address;
+   unsigned int fault_flags;
+   unsigned long vm_flags;
+   struct pt_regs *regs;
+   unsigned long fault_code;
+};
+
 #ifdef CONFIG_PER_VMA_LOCK
 /*
  * Try to read-lock a vma. The function is allowed to occasionally yield false
@@ -733,6 +742,19 @@ static inline void assert_fault_locked(struct vm_fault 
*vmf)
 struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
  unsigned long address);
 
+#define VM_LOCKED_FAULT_INIT(_name, _mm, _address, _fault_flags, _vm_flags, 
_regs, _fault_code) \
+   _name.mm= _mm;  \
+   _name.address   = _address; \
+   _name.fault_flags   = _fault_flags; \
+   _name.vm_flags  = _vm_flags;\
+   _name.regs  = _regs;\
+   _name.fault_code= _fault_code
+
+int __weak arch_vma_check_access(struct vm_area_struct *vma,
+struct vm_locked_fault *vmlf);
+
+int try_vma_locked_page_fault(struct vm_locked_fault *vmlf, vm_fault_t *ret);
+
 #else /* CONFIG_PER_VMA_LOCK */
 
 static inline bool vma_start_read(struct vm_area_struct *vma)
@@ -742,6 +764,12 @@ static inline void vma_start_write(struct vm_area_struct 
*vma) {}
 static inline void vma_assert_write_locked(struct vm_area_struct *vma) {}
 static inline void vma_mark_detached(struct vm_area_struct *vma,
 bool detached) {}
+#define VM_LOCKED_FAULT_INIT(_name, _mm, _address, _fault_flags, _vm_flags, 
_regs, _fault_code)
+static inline int try_vma_locked_page_fault(struct vm_locked_fault *vmlf,
+   vm_fault_t *ret)
+{
+   return -EINVAL;
+}
 
 static inline void release_fault_lock(struct vm_fault *vmf)
 {
diff --git a/mm/memory.c b/mm/memory.c
index ad790394963a..d3f5d1270e7a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5449,6 +5449,48 @@ struct vm_area_struct *lock_vma_under_rcu(struct 
mm_struct *mm,
count_vm_vma_lock_event(VMA_LOCK_ABORT);
return NULL;
 }
+
+int __weak arch_vma_check_access(struct vm_area_struct *vma,
+struct vm_locked_fault *vmlf)
+{
+   if (!(vma->vm_flags & vmlf->vm_flags))
+   return -EINVAL;
+   return 0;
+}
+
+int try_vma_locked_page_fault(struct vm_locked_fault *vmlf, vm_fault_t *ret)
+{
+   struct vm_area_struct *vma;
+   vm_fault_t fault;
+
+   if (!(vmlf->fault_flags & FAULT_FLAG_USER))
+   return -EINVAL;
+
+   vma = lock_vma_under_rcu(vmlf->mm, vmlf->address);
+   if (!vma)
+   return -EINVAL;
+
+   if (arch_vma_check_access(vma, vmlf)) {
+   vma_end_read(vma);
+   return -EINVAL;
+   }
+
+   fault = handle_mm_fault(vma, vmlf->address,
+   vmlf->fault_flags | FAULT_FLAG_VMA_LOCK,
+   vmlf->regs);
+   *ret = fault;
+
+   if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
+   vma_end_read(vma);
+
+   if ((fault & VM_FAULT_RETRY))
+   count_vm_vma_lock_event(VMA_LOCK_RETRY);
+   else
+   count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
+
+   return 0;
+}
+
 #endif /* CONFIG_PER_VMA_LOCK */
 
 #ifndef __PAGETABLE_P4D_FOLDED
-- 
2.27.0



[PATCH rfc -next 00/10] mm: convert to generic VMA lock-based page fault

2023-07-13 Thread Kefeng Wang
Add a generic VMA lock-based page fault handler in the mm core and convert
architectures to use it, which eliminates the duplicated code across
architectures.

With it, we can avoid touching every architecture's code when we add a new
feature or a bugfix.

This also fixes the missing riscv change from commit 38b3aec8e8d2 "mm: drop
per-VMA lock when returning VM_FAULT_RETRY or VM_FAULT_COMPLETED", and in
the end, the feature is enabled on ARM32/LoongArch too.

This is based on next-20230713 and is only build-tested (no loongarch
compiler, so loongarch is excluded).

Kefeng Wang (10):
  mm: add a generic VMA lock-based page fault handler
  x86: mm: use try_vma_locked_page_fault()
  arm64: mm: use try_vma_locked_page_fault()
  s390: mm: use try_vma_locked_page_fault()
  powerpc: mm: use try_vma_locked_page_fault()
  riscv: mm: use try_vma_locked_page_fault()
  ARM: mm: try VMA lock-based page fault handling first
  loongarch: mm: cleanup __do_page_fault()
  loongarch: mm: add access_error() helper
  loongarch: mm: try VMA lock-based page fault handling first

 arch/arm/Kconfig  |  1 +
 arch/arm/mm/fault.c   | 15 ++-
 arch/arm64/mm/fault.c | 28 +++-
 arch/loongarch/Kconfig|  1 +
 arch/loongarch/mm/fault.c | 92 ---
 arch/powerpc/mm/fault.c   | 54 ++-
 arch/riscv/mm/fault.c | 38 +++-
 arch/s390/mm/fault.c  | 23 +++---
 arch/x86/mm/fault.c   | 39 +++--
 include/linux/mm.h| 28 
 mm/memory.c   | 42 ++
 11 files changed, 206 insertions(+), 155 deletions(-)

-- 
2.27.0



[PATCH rfc -next 10/10] loongarch: mm: try VMA lock-based page fault handling first

2023-07-13 Thread Kefeng Wang
Attempt VMA lock-based page fault handling first, and fall back
to the existing mmap_lock-based handling if that fails.

Signed-off-by: Kefeng Wang 
---
 arch/loongarch/Kconfig|  1 +
 arch/loongarch/mm/fault.c | 26 ++
 2 files changed, 27 insertions(+)

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 397203e18800..afb0ccabab97 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -53,6 +53,7 @@ config LOONGARCH
select ARCH_SUPPORTS_LTO_CLANG
select ARCH_SUPPORTS_LTO_CLANG_THIN
select ARCH_SUPPORTS_NUMA_BALANCING
+   select ARCH_SUPPORTS_PER_VMA_LOCK
select ARCH_USE_BUILTIN_BSWAP
select ARCH_USE_CMPXCHG_LOCKREF
select ARCH_USE_QUEUED_RWLOCKS
diff --git a/arch/loongarch/mm/fault.c b/arch/loongarch/mm/fault.c
index cde2ea0119fa..7e54bc48813e 100644
--- a/arch/loongarch/mm/fault.c
+++ b/arch/loongarch/mm/fault.c
@@ -136,6 +136,17 @@ static inline bool access_error(unsigned int flags, struct 
pt_regs *regs,
return false;
 }
 
+#ifdef CONFIG_PER_VMA_LOCK
+int arch_vma_check_access(struct vm_area_struct *vma,
+ struct vm_locked_fault *vmlf)
+{
+   if (unlikely(access_error(vmlf->fault_flags, vmlf->regs, vmlf->address,
+vma)))
+   return -EINVAL;
+   return 0;
+}
+#endif
+
 /*
  * This routine handles page faults.  It determines the address,
  * and the problem, and then passes it off to one of the appropriate
@@ -149,6 +160,7 @@ static void __kprobes __do_page_fault(struct pt_regs *regs,
struct task_struct *tsk = current;
struct mm_struct *mm = tsk->mm;
struct vm_area_struct *vma = NULL;
+   struct vm_locked_fault vmlf;
vm_fault_t fault;
 
if (kprobe_page_fault(regs, current->thread.trap_nr))
@@ -183,6 +195,19 @@ static void __kprobes __do_page_fault(struct pt_regs *regs,
flags |= FAULT_FLAG_WRITE;
 
perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
+
+   VM_LOCKED_FAULT_INIT(vmlf, mm, address, flags, 0, regs, 0);
+   if (try_vma_locked_page_fault(&vmlf, &fault))
+   goto retry;
+   else if (!(fault & VM_FAULT_RETRY))
+   goto done;
+
+   if (fault_signal_pending(fault, regs)) {
+   if (!user_mode(regs))
+   no_context(regs, address);
+   return;
+   }
+
 retry:
vma = lock_mm_and_find_vma(mm, address, regs);
if (unlikely(!vma))
@@ -223,6 +248,7 @@ static void __kprobes __do_page_fault(struct pt_regs *regs,
 
mmap_read_unlock(mm);
 
+done:
if (unlikely(fault & VM_FAULT_ERROR)) {
if (fault & VM_FAULT_OOM) {
do_out_of_memory(regs, address);
-- 
2.27.0



[PATCH rfc -next 09/10] loongarch: mm: add access_error() helper

2023-07-13 Thread Kefeng Wang
Add an access_error() helper to check whether the vma is accessible,
which will be used in __do_page_fault() and later in the VMA lock-based
page fault path.

Signed-off-by: Kefeng Wang 
---
 arch/loongarch/mm/fault.c | 30 --
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/arch/loongarch/mm/fault.c b/arch/loongarch/mm/fault.c
index 03d06ee184da..cde2ea0119fa 100644
--- a/arch/loongarch/mm/fault.c
+++ b/arch/loongarch/mm/fault.c
@@ -120,6 +120,22 @@ static void __kprobes do_sigsegv(struct pt_regs *regs,
force_sig_fault(SIGSEGV, si_code, (void __user *)address);
 }
 
+static inline bool access_error(unsigned int flags, struct pt_regs *regs,
+   unsigned long addr, struct vm_area_struct *vma)
+{
+   if (flags & FAULT_FLAG_WRITE) {
+   if (!(vma->vm_flags & VM_WRITE))
+   return true;
+   } else {
+   if (!(vma->vm_flags & VM_READ) && addr != exception_era(regs))
+   return true;
+   if (!(vma->vm_flags & VM_EXEC) && addr == exception_era(regs))
+   return true;
+   }
+
+   return false;
+}
+
 /*
  * This routine handles page faults.  It determines the address,
  * and the problem, and then passes it off to one of the appropriate
@@ -163,6 +179,8 @@ static void __kprobes __do_page_fault(struct pt_regs *regs,
 
if (user_mode(regs))
flags |= FAULT_FLAG_USER;
+   if (write)
+   flags |= FAULT_FLAG_WRITE;
 
perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
 retry:
@@ -172,16 +190,8 @@ static void __kprobes __do_page_fault(struct pt_regs *regs,
 
si_code = SEGV_ACCERR;
 
-   if (write) {
-   flags |= FAULT_FLAG_WRITE;
-   if (!(vma->vm_flags & VM_WRITE))
-   goto bad_area;
-   } else {
-   if (!(vma->vm_flags & VM_READ) && address != 
exception_era(regs))
-   goto bad_area;
-   if (!(vma->vm_flags & VM_EXEC) && address == 
exception_era(regs))
-   goto bad_area;
-   }
+   if (access_error(flags, regs, vma))
+   goto bad_area;
 
/*
 * If for any reason at all we couldn't handle the fault,
-- 
2.27.0



[PATCH rfc -next 08/10] loongarch: mm: cleanup __do_page_fault()

2023-07-13 Thread Kefeng Wang
Clean up __do_page_fault() by reusing the bad_area_nosemaphore and
bad_area labels.

Signed-off-by: Kefeng Wang 
---
 arch/loongarch/mm/fault.c | 36 +++-
 1 file changed, 11 insertions(+), 25 deletions(-)

diff --git a/arch/loongarch/mm/fault.c b/arch/loongarch/mm/fault.c
index da5b6d518cdb..03d06ee184da 100644
--- a/arch/loongarch/mm/fault.c
+++ b/arch/loongarch/mm/fault.c
@@ -151,18 +151,15 @@ static void __kprobes __do_page_fault(struct pt_regs 
*regs,
if (!user_mode(regs))
no_context(regs, address);
else
-   do_sigsegv(regs, write, address, si_code);
-   return;
+   goto bad_area_nosemaphore;
}
 
/*
 * If we're in an interrupt or have no user
 * context, we must not take the fault..
 */
-   if (faulthandler_disabled() || !mm) {
-   do_sigsegv(regs, write, address, si_code);
-   return;
-   }
+   if (faulthandler_disabled() || !mm)
+   goto bad_area_nosemaphore;
 
if (user_mode(regs))
flags |= FAULT_FLAG_USER;
@@ -172,23 +169,7 @@ static void __kprobes __do_page_fault(struct pt_regs *regs,
vma = lock_mm_and_find_vma(mm, address, regs);
if (unlikely(!vma))
goto bad_area_nosemaphore;
-   goto good_area;
-
-/*
- * Something tried to access memory that isn't in our memory map..
- * Fix it, but check if it's kernel or user first..
- */
-bad_area:
-   mmap_read_unlock(mm);
-bad_area_nosemaphore:
-   do_sigsegv(regs, write, address, si_code);
-   return;
 
-/*
- * Ok, we have a good vm_area for this memory access, so
- * we can handle it..
- */
-good_area:
si_code = SEGV_ACCERR;
 
if (write) {
@@ -229,14 +210,15 @@ static void __kprobes __do_page_fault(struct pt_regs 
*regs,
 */
goto retry;
}
+
+   mmap_read_unlock(mm);
+
if (unlikely(fault & VM_FAULT_ERROR)) {
-   mmap_read_unlock(mm);
if (fault & VM_FAULT_OOM) {
do_out_of_memory(regs, address);
return;
} else if (fault & VM_FAULT_SIGSEGV) {
-   do_sigsegv(regs, write, address, si_code);
-   return;
+   goto bad_area_nosemaphore;
} else if (fault & 
(VM_FAULT_SIGBUS|VM_FAULT_HWPOISON|VM_FAULT_HWPOISON_LARGE)) {
do_sigbus(regs, write, address, si_code);
return;
@@ -244,7 +226,11 @@ static void __kprobes __do_page_fault(struct pt_regs *regs,
BUG();
}
 
+   return;
+bad_area:
mmap_read_unlock(mm);
+bad_area_nosemaphore:
+   do_sigsegv(regs, write, address, si_code);
 }
 
 asmlinkage void __kprobes do_page_fault(struct pt_regs *regs,
-- 
2.27.0



[PATCH rfc -next 05/10] powerpc: mm: use try_vma_locked_page_fault()

2023-07-13 Thread Kefeng Wang
Use new try_vma_locked_page_fault() helper to simplify code.
No functional change intended.

Signed-off-by: Kefeng Wang 
---
 arch/powerpc/mm/fault.c | 54 +
 1 file changed, 22 insertions(+), 32 deletions(-)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 82954d0e6906..dd4832a3cf10 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -391,6 +391,23 @@ static int page_fault_is_bad(unsigned long err)
 #define page_fault_is_bad(__err)   ((__err) & DSISR_BAD_FAULT_32S)
 #endif
 
+#ifdef CONFIG_PER_VMA_LOCK
+int arch_vma_check_access(struct vm_area_struct *vma,
+ struct vm_locked_fault *vmlf)
+{
+   int is_exec = TRAP(vmlf->regs) == INTERRUPT_INST_STORAGE;
+   int is_write = page_fault_is_write(vmlf->fault_code);
+
+   if (unlikely(access_pkey_error(is_write, is_exec,
+   (vmlf->fault_code & DSISR_KEYFAULT), vma)))
+   return -EINVAL;
+
+   if (unlikely(access_error(is_write, is_exec, vma)))
+   return -EINVAL;
+   return 0;
+}
+#endif
+
 /*
  * For 600- and 800-family processors, the error_code parameter is DSISR
  * for a data fault, SRR1 for an instruction fault.
@@ -413,6 +430,7 @@ static int ___do_page_fault(struct pt_regs *regs, unsigned 
long address,
int is_write = page_fault_is_write(error_code);
vm_fault_t fault, major = 0;
bool kprobe_fault = kprobe_page_fault(regs, 11);
+   struct vm_locked_fault vmlf;
 
if (unlikely(debugger_fault_handler(regs) || kprobe_fault))
return 0;
@@ -469,41 +487,15 @@ static int ___do_page_fault(struct pt_regs *regs, 
unsigned long address,
if (is_exec)
flags |= FAULT_FLAG_INSTRUCTION;
 
-#ifdef CONFIG_PER_VMA_LOCK
-   if (!(flags & FAULT_FLAG_USER))
-   goto lock_mmap;
-
-   vma = lock_vma_under_rcu(mm, address);
-   if (!vma)
-   goto lock_mmap;
-
-   if (unlikely(access_pkey_error(is_write, is_exec,
-  (error_code & DSISR_KEYFAULT), vma))) {
-   vma_end_read(vma);
-   goto lock_mmap;
-   }
-
-   if (unlikely(access_error(is_write, is_exec, vma))) {
-   vma_end_read(vma);
-   goto lock_mmap;
-   }
-
-   fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, 
regs);
-   if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
-   vma_end_read(vma);
-
-   if (!(fault & VM_FAULT_RETRY)) {
-   count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
+   VM_LOCKED_FAULT_INIT(vmlf, mm, address, flags, 0, regs, error_code);
+   if (try_vma_locked_page_fault(&vmlf, &fault))
+   goto retry;
+   else if (!(fault & VM_FAULT_RETRY))
goto done;
-   }
-   count_vm_vma_lock_event(VMA_LOCK_RETRY);
 
if (fault_signal_pending(fault, regs))
return user_mode(regs) ? 0 : SIGBUS;
 
-lock_mmap:
-#endif /* CONFIG_PER_VMA_LOCK */
-
/* When running in the kernel we expect faults to occur only to
 * addresses in user space.  All other faults represent errors in the
 * kernel and should generate an OOPS.  Unfortunately, in the case of an
@@ -552,9 +544,7 @@ static int ___do_page_fault(struct pt_regs *regs, unsigned 
long address,
 
mmap_read_unlock(current->mm);
 
-#ifdef CONFIG_PER_VMA_LOCK
 done:
-#endif
if (unlikely(fault & VM_FAULT_ERROR))
return mm_fault_error(regs, address, fault);
 
-- 
2.27.0



[PATCH rfc -next 06/10] riscv: mm: use try_vma_locked_page_fault()

2023-07-13 Thread Kefeng Wang
Use new try_vma_locked_page_fault() helper to simplify code.
No functional change intended.

Signed-off-by: Kefeng Wang 
---
 arch/riscv/mm/fault.c | 38 +++---
 1 file changed, 15 insertions(+), 23 deletions(-)

diff --git a/arch/riscv/mm/fault.c b/arch/riscv/mm/fault.c
index 6ea2cce4cc17..13bc60370b5c 100644
--- a/arch/riscv/mm/fault.c
+++ b/arch/riscv/mm/fault.c
@@ -215,6 +215,16 @@ static inline bool access_error(unsigned long cause, 
struct vm_area_struct *vma)
return false;
 }
 
+#ifdef CONFIG_PER_VMA_LOCK
+int arch_vma_check_access(struct vm_area_struct *vma,
+ struct vm_locked_fault *vmlf)
+{
+   if (unlikely(access_error(vmlf->fault_code, vma)))
+   return -EINVAL;
+   return 0;
+}
+#endif
+
 /*
  * This routine handles page faults.  It determines the address and the
  * problem, and then passes it off to one of the appropriate routines.
@@ -228,6 +238,7 @@ void handle_page_fault(struct pt_regs *regs)
unsigned int flags = FAULT_FLAG_DEFAULT;
int code = SEGV_MAPERR;
vm_fault_t fault;
+   struct vm_locked_fault vmlf;
 
cause = regs->cause;
addr = regs->badaddr;
@@ -283,35 +294,18 @@ void handle_page_fault(struct pt_regs *regs)
flags |= FAULT_FLAG_WRITE;
else if (cause == EXC_INST_PAGE_FAULT)
flags |= FAULT_FLAG_INSTRUCTION;
-#ifdef CONFIG_PER_VMA_LOCK
-   if (!(flags & FAULT_FLAG_USER))
-   goto lock_mmap;
 
-   vma = lock_vma_under_rcu(mm, addr);
-   if (!vma)
-   goto lock_mmap;
-
-   if (unlikely(access_error(cause, vma))) {
-   vma_end_read(vma);
-   goto lock_mmap;
-   }
-
-   fault = handle_mm_fault(vma, addr, flags | FAULT_FLAG_VMA_LOCK, regs);
-   vma_end_read(vma);
-
-   if (!(fault & VM_FAULT_RETRY)) {
-   count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
+   VM_LOCKED_FAULT_INIT(vmlf, mm, addr, flags, 0, regs, cause);
+   if (try_vma_locked_page_fault(&vmlf, &fault))
+   goto retry;
+   else if (!(fault & VM_FAULT_RETRY))
goto done;
-   }
-   count_vm_vma_lock_event(VMA_LOCK_RETRY);
 
if (fault_signal_pending(fault, regs)) {
if (!user_mode(regs))
no_context(regs, addr);
return;
}
-lock_mmap:
-#endif /* CONFIG_PER_VMA_LOCK */
 
 retry:
vma = lock_mm_and_find_vma(mm, addr, regs);
@@ -368,9 +362,7 @@ void handle_page_fault(struct pt_regs *regs)
 
mmap_read_unlock(mm);
 
-#ifdef CONFIG_PER_VMA_LOCK
 done:
-#endif
if (unlikely(fault & VM_FAULT_ERROR)) {
tsk->thread.bad_cause = cause;
mm_fault_error(regs, addr, fault);
-- 
2.27.0



[PATCH rfc -next 02/10] x86: mm: use try_vma_locked_page_fault()

2023-07-13 Thread Kefeng Wang
Use new try_vma_locked_page_fault() helper to simplify code.
No functional change intended.

Signed-off-by: Kefeng Wang 
---
 arch/x86/mm/fault.c | 39 +++
 1 file changed, 15 insertions(+), 24 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 56b4f9faf8c4..3f3b8b0a87de 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1213,6 +1213,16 @@ do_kern_addr_fault(struct pt_regs *regs, unsigned long 
hw_error_code,
 }
 NOKPROBE_SYMBOL(do_kern_addr_fault);
 
+#ifdef CONFIG_PER_VMA_LOCK
+int arch_vma_check_access(struct vm_area_struct *vma,
+ struct vm_locked_fault *vmlf)
+{
+   if (unlikely(access_error(vmlf->fault_code, vma)))
+   return -EINVAL;
+   return 0;
+}
+#endif
+
 /*
  * Handle faults in the user portion of the address space.  Nothing in here
  * should check X86_PF_USER without a specific justification: for almost
@@ -1231,6 +1241,7 @@ void do_user_addr_fault(struct pt_regs *regs,
struct mm_struct *mm;
vm_fault_t fault;
unsigned int flags = FAULT_FLAG_DEFAULT;
+   struct vm_locked_fault vmlf;
 
tsk = current;
mm = tsk->mm;
@@ -1328,27 +1339,11 @@ void do_user_addr_fault(struct pt_regs *regs,
}
 #endif
 
-#ifdef CONFIG_PER_VMA_LOCK
-   if (!(flags & FAULT_FLAG_USER))
-   goto lock_mmap;
-
-   vma = lock_vma_under_rcu(mm, address);
-   if (!vma)
-   goto lock_mmap;
-
-   if (unlikely(access_error(error_code, vma))) {
-   vma_end_read(vma);
-   goto lock_mmap;
-   }
-   fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, 
regs);
-   if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
-   vma_end_read(vma);
-
-   if (!(fault & VM_FAULT_RETRY)) {
-   count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
+   VM_LOCKED_FAULT_INIT(vmlf, mm, address, flags, 0, regs, error_code);
+   if (try_vma_locked_page_fault(&vmlf, &fault))
+   goto retry;
+   else if (!(fault & VM_FAULT_RETRY))
goto done;
-   }
-   count_vm_vma_lock_event(VMA_LOCK_RETRY);
 
/* Quick path to respond to signals */
if (fault_signal_pending(fault, regs)) {
@@ -1358,8 +1353,6 @@ void do_user_addr_fault(struct pt_regs *regs,
 ARCH_DEFAULT_PKEY);
return;
}
-lock_mmap:
-#endif /* CONFIG_PER_VMA_LOCK */
 
 retry:
vma = lock_mm_and_find_vma(mm, address, regs);
@@ -1419,9 +1412,7 @@ void do_user_addr_fault(struct pt_regs *regs,
}
 
mmap_read_unlock(mm);
-#ifdef CONFIG_PER_VMA_LOCK
 done:
-#endif
if (likely(!(fault & VM_FAULT_ERROR)))
return;
 
-- 
2.27.0



[PATCH v2 1/2] mm: remove arguments of show_mem()

2023-06-30 Thread Kefeng Wang
All callers of show_mem() pass 0 and NULL, so we can remove the two
arguments by directly calling __show_mem(0, NULL, MAX_NR_ZONES - 1)
in show_mem().

Signed-off-by: Kefeng Wang 
---
v2: update commit log
 arch/powerpc/xmon/xmon.c  | 2 +-
 drivers/tty/sysrq.c   | 2 +-
 drivers/tty/vt/keyboard.c | 2 +-
 include/linux/mm.h| 4 ++--
 init/initramfs.c  | 2 +-
 kernel/panic.c| 2 +-
 6 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index fae747cc57d2..ee17270d35d0 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -1084,7 +1084,7 @@ cmds(struct pt_regs *excp)
memzcan();
break;
case 'i':
-   show_mem(0, NULL);
+   show_mem();
break;
default:
termch = cmd;
diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
index b6e70c5cfa17..e1df63a88aac 100644
--- a/drivers/tty/sysrq.c
+++ b/drivers/tty/sysrq.c
@@ -342,7 +342,7 @@ static const struct sysrq_key_op sysrq_ftrace_dump_op = {
 
 static void sysrq_handle_showmem(int key)
 {
-   show_mem(0, NULL);
+   show_mem();
 }
 static const struct sysrq_key_op sysrq_showmem_op = {
.handler= sysrq_handle_showmem,
diff --git a/drivers/tty/vt/keyboard.c b/drivers/tty/vt/keyboard.c
index be8313cdbac3..358f216c6cd6 100644
--- a/drivers/tty/vt/keyboard.c
+++ b/drivers/tty/vt/keyboard.c
@@ -606,7 +606,7 @@ static void fn_scroll_back(struct vc_data *vc)
 
 static void fn_show_mem(struct vc_data *vc)
 {
-   show_mem(0, NULL);
+   show_mem();
 }
 
 static void fn_show_state(struct vc_data *vc)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index eef34f6a0351..ddb140e14f3a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3047,9 +3047,9 @@ extern void mem_init(void);
 extern void __init mmap_init(void);
 
 extern void __show_mem(unsigned int flags, nodemask_t *nodemask, int 
max_zone_idx);
-static inline void show_mem(unsigned int flags, nodemask_t *nodemask)
+static inline void show_mem(void)
 {
-   __show_mem(flags, nodemask, MAX_NR_ZONES - 1);
+   __show_mem(0, NULL, MAX_NR_ZONES - 1);
 }
 extern long si_mem_available(void);
 extern void si_meminfo(struct sysinfo * val);
diff --git a/init/initramfs.c b/init/initramfs.c
index e7a01c2ccd1b..8d0fd946cdd2 100644
--- a/init/initramfs.c
+++ b/init/initramfs.c
@@ -61,7 +61,7 @@ static void __init error(char *x)
 }
 
 #define panic_show_mem(fmt, ...) \
-   ({ show_mem(0, NULL); panic(fmt, ##__VA_ARGS__); })
+   ({ show_mem(); panic(fmt, ##__VA_ARGS__); })
 
 /* link hash */
 
diff --git a/kernel/panic.c b/kernel/panic.c
index 10effe40a3fa..07239d4ad81e 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -216,7 +216,7 @@ static void panic_print_sys_info(bool console_flush)
show_state();
 
if (panic_print & PANIC_PRINT_MEM_INFO)
-   show_mem(0, NULL);
+   show_mem();
 
if (panic_print & PANIC_PRINT_TIMER_INFO)
sysrq_timer_list_show();
-- 
2.41.0



[PATCH v2 2/2] mm: make show_free_areas() static

2023-06-30 Thread Kefeng Wang
All callers of show_free_areas() pass 0 and NULL, so we can directly
use show_mem() instead of show_free_areas(0, NULL), which allows making
show_free_areas() a static function.

Signed-off-by: Kefeng Wang 
---
v2: update commit log and fix a missing show_free_areas() conversion

 arch/sparc/kernel/setup_32.c |  2 +-
 include/linux/mm.h   | 12 
 mm/internal.h|  6 ++
 mm/nommu.c   |  8 
 mm/show_mem.c|  4 ++--
 5 files changed, 13 insertions(+), 19 deletions(-)

diff --git a/arch/sparc/kernel/setup_32.c b/arch/sparc/kernel/setup_32.c
index 1adf5c1c16b8..34ef7febf0d5 100644
--- a/arch/sparc/kernel/setup_32.c
+++ b/arch/sparc/kernel/setup_32.c
@@ -83,7 +83,7 @@ static void prom_sync_me(void)
 "nop\n\t" : : "r" ());
 
prom_printf("PROM SYNC COMMAND...\n");
-   show_free_areas(0, NULL);
+   show_mem();
if (!is_idle_task(current)) {
local_irq_enable();
ksys_sync();
diff --git a/include/linux/mm.h b/include/linux/mm.h
index ddb140e14f3a..0a1314a3ffae 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2218,18 +2218,6 @@ extern void pagefault_out_of_memory(void);
 #define offset_in_thp(page, p) ((unsigned long)(p) & (thp_size(page) - 1))
 #define offset_in_folio(folio, p) ((unsigned long)(p) & (folio_size(folio) - 
1))
 
-/*
- * Flags passed to show_mem() and show_free_areas() to suppress output in
- * various contexts.
- */
-#define SHOW_MEM_FILTER_NODES  (0x0001u)   /* disallowed nodes */
-
-extern void __show_free_areas(unsigned int flags, nodemask_t *nodemask, int 
max_zone_idx);
-static void __maybe_unused show_free_areas(unsigned int flags, nodemask_t 
*nodemask)
-{
-   __show_free_areas(flags, nodemask, MAX_NR_ZONES - 1);
-}
-
 /*
  * Parameter block passed down to zap_pte_range in exceptional cases.
  */
diff --git a/mm/internal.h b/mm/internal.h
index a7d9e980429a..721ed07d7fd6 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -61,6 +61,12 @@ void page_writeback_init(void);
 #define COMPOUND_MAPPED0x80
 #define FOLIO_PAGES_MAPPED (COMPOUND_MAPPED - 1)
 
+/*
+ * Flags passed to __show_mem() and show_free_areas() to suppress output in
+ * various contexts.
+ */
+#define SHOW_MEM_FILTER_NODES  (0x0001u)   /* disallowed nodes */
+
 /*
  * How many individual pages have an elevated _mapcount.  Excludes
  * the folio's entire_mapcount.
diff --git a/mm/nommu.c b/mm/nommu.c
index f670d9979a26..bff51d8ec66e 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -990,7 +990,7 @@ static int do_mmap_private(struct vm_area_struct *vma,
 enomem:
pr_err("Allocation of length %lu from process %d (%s) failed\n",
   len, current->pid, current->comm);
-   show_free_areas(0, NULL);
+   show_mem();
return -ENOMEM;
 }
 
@@ -1223,20 +1223,20 @@ unsigned long do_mmap(struct file *file,
kmem_cache_free(vm_region_jar, region);
pr_warn("Allocation of vma for %lu byte allocation from process %d 
failed\n",
len, current->pid);
-   show_free_areas(0, NULL);
+   show_mem();
return -ENOMEM;
 
 error_getting_region:
pr_warn("Allocation of vm region for %lu byte allocation from process 
%d failed\n",
len, current->pid);
-   show_free_areas(0, NULL);
+   show_mem();
return -ENOMEM;
 
 error_vma_iter_prealloc:
kmem_cache_free(vm_region_jar, region);
vm_area_free(vma);
pr_warn("Allocation of vma tree for process %d failed\n", current->pid);
-   show_free_areas(0, NULL);
+   show_mem();
return -ENOMEM;
 
 }
diff --git a/mm/show_mem.c b/mm/show_mem.c
index 01f8e9905817..09c7d036d49e 100644
--- a/mm/show_mem.c
+++ b/mm/show_mem.c
@@ -186,7 +186,7 @@ static bool node_has_managed_zones(pg_data_t *pgdat, int 
max_zone_idx)
  * SHOW_MEM_FILTER_NODES: suppress nodes that are not allowed by current's
  *   cpuset.
  */
-void __show_free_areas(unsigned int filter, nodemask_t *nodemask, int 
max_zone_idx)
+static void show_free_areas(unsigned int filter, nodemask_t *nodemask, int 
max_zone_idx)
 {
unsigned long free_pcp = 0;
int cpu, nid;
@@ -406,7 +406,7 @@ void __show_mem(unsigned int filter, nodemask_t *nodemask, 
int max_zone_idx)
struct zone *zone;
 
printk("Mem-Info:\n");
-   __show_free_areas(filter, nodemask, max_zone_idx);
+   show_free_areas(filter, nodemask, max_zone_idx);
 
for_each_populated_zone(zone) {
 
-- 
2.41.0



Re: [PATCH 2/2] mm: make show_free_areas() static

2023-06-29 Thread Kefeng Wang

Thanks,

On 2023/6/29 23:00, kernel test robot wrote:

Hi Kefeng,

kernel test robot noticed the following build errors:

[auto build test ERROR on akpm-mm/mm-everything]

url:
https://github.com/intel-lab-lkp/linux/commits/Kefeng-Wang/mm-make-show_free_areas-static/20230629-182958
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git 
mm-everything
patch link:
https://lore.kernel.org/r/20230629104357.35455-2-wangkefeng.wang%40huawei.com
patch subject: [PATCH 2/2] mm: make show_free_areas() static
config: sh-allmodconfig 
(https://download.01.org/0day-ci/archive/20230629/202306292240.rj0dlhfi-...@intel.com/config)
compiler: sh4-linux-gcc (GCC) 12.3.0
reproduce: 
(https://download.01.org/0day-ci/archive/20230629/202306292240.rj0dlhfi-...@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot 
| Closes: 
https://lore.kernel.org/oe-kbuild-all/202306292240.rj0dlhfi-...@intel.com/

All errors (new ones prefixed by >>):

mm/nommu.c: In function 'do_mmap':

mm/nommu.c:1239:9: error: implicit declaration of function 'show_free_areas' 
[-Werror=implicit-function-declaration]

 1239 | show_free_areas(0, NULL);
  | ^~~
cc1: some warnings being treated as errors




Missing this one in patch-1, will update


vim +/show_free_areas +1239 mm/nommu.c



Re: [PATCH 1/2] mm: remove arguments of show_mem()

2023-06-29 Thread Kefeng Wang




On 2023/6/29 23:17, Matthew Wilcox wrote:

On Thu, Jun 29, 2023 at 06:43:56PM +0800, Kefeng Wang wrote:

Directly call __show_mem(0, NULL, MAX_NR_ZONES - 1) in show_mem()
to remove the arguments of show_mem().


Do you mean, "All callers of show_mem() pass 0 and NULL, so we can
remove the two arguments"?


Yes, will update with above to make it clear, thanks Matthew.





[PATCH 1/2] mm: remove arguments of show_mem()

2023-06-29 Thread Kefeng Wang
Directly call __show_mem(0, NULL, MAX_NR_ZONES - 1) in show_mem()
to remove the arguments of show_mem().

Signed-off-by: Kefeng Wang 
---
 arch/powerpc/xmon/xmon.c  | 2 +-
 drivers/tty/sysrq.c   | 2 +-
 drivers/tty/vt/keyboard.c | 2 +-
 include/linux/mm.h| 4 ++--
 init/initramfs.c  | 2 +-
 kernel/panic.c| 2 +-
 6 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index fae747cc57d2..ee17270d35d0 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -1084,7 +1084,7 @@ cmds(struct pt_regs *excp)
memzcan();
break;
case 'i':
-   show_mem(0, NULL);
+   show_mem();
break;
default:
termch = cmd;
diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
index b6e70c5cfa17..e1df63a88aac 100644
--- a/drivers/tty/sysrq.c
+++ b/drivers/tty/sysrq.c
@@ -342,7 +342,7 @@ static const struct sysrq_key_op sysrq_ftrace_dump_op = {
 
 static void sysrq_handle_showmem(int key)
 {
-   show_mem(0, NULL);
+   show_mem();
 }
 static const struct sysrq_key_op sysrq_showmem_op = {
.handler= sysrq_handle_showmem,
diff --git a/drivers/tty/vt/keyboard.c b/drivers/tty/vt/keyboard.c
index be8313cdbac3..358f216c6cd6 100644
--- a/drivers/tty/vt/keyboard.c
+++ b/drivers/tty/vt/keyboard.c
@@ -606,7 +606,7 @@ static void fn_scroll_back(struct vc_data *vc)
 
 static void fn_show_mem(struct vc_data *vc)
 {
-   show_mem(0, NULL);
+   show_mem();
 }
 
 static void fn_show_state(struct vc_data *vc)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index eef34f6a0351..ddb140e14f3a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3047,9 +3047,9 @@ extern void mem_init(void);
 extern void __init mmap_init(void);
 
 extern void __show_mem(unsigned int flags, nodemask_t *nodemask, int 
max_zone_idx);
-static inline void show_mem(unsigned int flags, nodemask_t *nodemask)
+static inline void show_mem(void)
 {
-   __show_mem(flags, nodemask, MAX_NR_ZONES - 1);
+   __show_mem(0, NULL, MAX_NR_ZONES - 1);
 }
 extern long si_mem_available(void);
 extern void si_meminfo(struct sysinfo * val);
diff --git a/init/initramfs.c b/init/initramfs.c
index e7a01c2ccd1b..8d0fd946cdd2 100644
--- a/init/initramfs.c
+++ b/init/initramfs.c
@@ -61,7 +61,7 @@ static void __init error(char *x)
 }
 
 #define panic_show_mem(fmt, ...) \
-   ({ show_mem(0, NULL); panic(fmt, ##__VA_ARGS__); })
+   ({ show_mem(); panic(fmt, ##__VA_ARGS__); })
 
 /* link hash */
 
diff --git a/kernel/panic.c b/kernel/panic.c
index 10effe40a3fa..07239d4ad81e 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -216,7 +216,7 @@ static void panic_print_sys_info(bool console_flush)
show_state();
 
if (panic_print & PANIC_PRINT_MEM_INFO)
-   show_mem(0, NULL);
+   show_mem();
 
if (panic_print & PANIC_PRINT_TIMER_INFO)
sysrq_timer_list_show();
-- 
2.41.0



[PATCH 2/2] mm: make show_free_areas() static

2023-06-29 Thread Kefeng Wang
Directly use show_mem() instead of show_free_areas(0, NULL), then
make show_free_areas() a static function.

Signed-off-by: Kefeng Wang 
---
 arch/sparc/kernel/setup_32.c |  2 +-
 include/linux/mm.h   | 12 
 mm/internal.h|  6 ++
 mm/nommu.c   |  6 +++---
 mm/show_mem.c|  4 ++--
 5 files changed, 12 insertions(+), 18 deletions(-)

diff --git a/arch/sparc/kernel/setup_32.c b/arch/sparc/kernel/setup_32.c
index 1adf5c1c16b8..34ef7febf0d5 100644
--- a/arch/sparc/kernel/setup_32.c
+++ b/arch/sparc/kernel/setup_32.c
@@ -83,7 +83,7 @@ static void prom_sync_me(void)
 "nop\n\t" : : "r" ());
 
prom_printf("PROM SYNC COMMAND...\n");
-   show_free_areas(0, NULL);
+   show_mem();
if (!is_idle_task(current)) {
local_irq_enable();
ksys_sync();
diff --git a/include/linux/mm.h b/include/linux/mm.h
index ddb140e14f3a..0a1314a3ffae 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2218,18 +2218,6 @@ extern void pagefault_out_of_memory(void);
 #define offset_in_thp(page, p) ((unsigned long)(p) & (thp_size(page) - 1))
 #define offset_in_folio(folio, p) ((unsigned long)(p) & (folio_size(folio) - 
1))
 
-/*
- * Flags passed to show_mem() and show_free_areas() to suppress output in
- * various contexts.
- */
-#define SHOW_MEM_FILTER_NODES  (0x0001u)   /* disallowed nodes */
-
-extern void __show_free_areas(unsigned int flags, nodemask_t *nodemask, int 
max_zone_idx);
-static void __maybe_unused show_free_areas(unsigned int flags, nodemask_t 
*nodemask)
-{
-   __show_free_areas(flags, nodemask, MAX_NR_ZONES - 1);
-}
-
 /*
  * Parameter block passed down to zap_pte_range in exceptional cases.
  */
diff --git a/mm/internal.h b/mm/internal.h
index a7d9e980429a..721ed07d7fd6 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -61,6 +61,12 @@ void page_writeback_init(void);
 #define COMPOUND_MAPPED0x80
 #define FOLIO_PAGES_MAPPED (COMPOUND_MAPPED - 1)
 
+/*
+ * Flags passed to __show_mem() and show_free_areas() to suppress output in
+ * various contexts.
+ */
+#define SHOW_MEM_FILTER_NODES  (0x0001u)   /* disallowed nodes */
+
 /*
  * How many individual pages have an elevated _mapcount.  Excludes
  * the folio's entire_mapcount.
diff --git a/mm/nommu.c b/mm/nommu.c
index f670d9979a26..5b179234ce89 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -990,7 +990,7 @@ static int do_mmap_private(struct vm_area_struct *vma,
 enomem:
pr_err("Allocation of length %lu from process %d (%s) failed\n",
   len, current->pid, current->comm);
-   show_free_areas(0, NULL);
+   show_mem();
return -ENOMEM;
 }
 
@@ -1223,13 +1223,13 @@ unsigned long do_mmap(struct file *file,
kmem_cache_free(vm_region_jar, region);
pr_warn("Allocation of vma for %lu byte allocation from process %d 
failed\n",
len, current->pid);
-   show_free_areas(0, NULL);
+   show_mem();
return -ENOMEM;
 
 error_getting_region:
pr_warn("Allocation of vm region for %lu byte allocation from process 
%d failed\n",
len, current->pid);
-   show_free_areas(0, NULL);
+   show_mem();
return -ENOMEM;
 
 error_vma_iter_prealloc:
diff --git a/mm/show_mem.c b/mm/show_mem.c
index 01f8e9905817..09c7d036d49e 100644
--- a/mm/show_mem.c
+++ b/mm/show_mem.c
@@ -186,7 +186,7 @@ static bool node_has_managed_zones(pg_data_t *pgdat, int 
max_zone_idx)
  * SHOW_MEM_FILTER_NODES: suppress nodes that are not allowed by current's
  *   cpuset.
  */
-void __show_free_areas(unsigned int filter, nodemask_t *nodemask, int 
max_zone_idx)
+static void show_free_areas(unsigned int filter, nodemask_t *nodemask, int 
max_zone_idx)
 {
unsigned long free_pcp = 0;
int cpu, nid;
@@ -406,7 +406,7 @@ void __show_mem(unsigned int filter, nodemask_t *nodemask, 
int max_zone_idx)
struct zone *zone;
 
printk("Mem-Info:\n");
-   __show_free_areas(filter, nodemask, max_zone_idx);
+   show_free_areas(filter, nodemask, max_zone_idx);
 
for_each_populated_zone(zone) {
 
-- 
2.41.0



Re: [PATCH v3 03/14] arm64: reword ARCH_FORCE_MAX_ORDER prompt and help text

2023-03-25 Thread Kefeng Wang




On 2023/3/25 14:08, Mike Rapoport wrote:

From: "Mike Rapoport (IBM)" 

The prompt and help text of ARCH_FORCE_MAX_ORDER are not even close to
describe this configuration option.

Update both to actually describe what this option does.

Acked-by: Kirill A. Shutemov 
Reviewed-by: Zi Yan 
Signed-off-by: Mike Rapoport (IBM) 


Reviewed-by: Kefeng Wang 


---
  arch/arm64/Kconfig | 24 
  1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 7324032af859..cc11cdcf5a00 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1487,24 +1487,24 @@ config XEN
  # 16K |   27  |  14  |   13| 11   
  |
  # 64K |   29  |  16  |   13| 13   
  |
  config ARCH_FORCE_MAX_ORDER
-   int "Maximum zone order" if EXPERT && (ARM64_4K_PAGES || 
ARM64_16K_PAGES)
+   int "Order of maximal physically contiguous allocations" if EXPERT && 
(ARM64_4K_PAGES || ARM64_16K_PAGES)
default "13" if ARM64_64K_PAGES
default "11" if ARM64_16K_PAGES
default "10"
help
- The kernel memory allocator divides physically contiguous memory
- blocks into "zones", where each zone is a power of two number of
- pages.  This option selects the largest power of two that the kernel
- keeps in the memory allocator.  If you need to allocate very large
- blocks of physically contiguous memory, then you may need to
- increase this value.
+ The kernel page allocator limits the size of maximal physically
+ contiguous allocations. The limit is called MAX_ORDER and it
+ defines the maximal power of two of number of pages that can be
+ allocated as a single contiguous block. This option allows
+ overriding the default setting when ability to allocate very
+ large blocks of physically contiguous memory is required.
  
-	  We make sure that we can allocate up to a HugePage size for each configuration.

- Hence we have :
-   MAX_ORDER = PMD_SHIFT - PAGE_SHIFT  => PAGE_SHIFT - 3
+ The maximal size of allocation cannot exceed the size of the
+ section, so the value of MAX_ORDER should satisfy
  
-	  However for 4K, we choose a higher default value, 10 as opposed to 9, giving us

- 4M allocations matching the default size used by generic code.
+   MAX_ORDER + PAGE_SHIFT <= SECTION_SIZE_BITS
+
+ Don't change if unsure.
  
  config UNMAP_KERNEL_AT_EL0

bool "Unmap kernel when running in userspace (aka \"KAISER\")" if EXPERT


Re: [PATCH v3 02/14] arm64: drop ranges in definition of ARCH_FORCE_MAX_ORDER

2023-03-25 Thread Kefeng Wang




On 2023/3/25 14:08, Mike Rapoport wrote:

From: "Mike Rapoport (IBM)" 

It is not a good idea to change fundamental parameters of core memory
management. Having predefined ranges suggests that the values within
those ranges are sensible, but one has to *really* understand
implications of changing MAX_ORDER before actually amending it and
ranges don't help here.

Drop ranges in definition of ARCH_FORCE_MAX_ORDER and make its prompt
visible only if EXPERT=y

Acked-by: Kirill A. Shutemov 
Reviewed-by: Zi Yan 
Signed-off-by: Mike Rapoport (IBM) 


Reviewed-by: Kefeng Wang 


---
  arch/arm64/Kconfig | 4 +---
  1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index e60baf7859d1..7324032af859 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1487,11 +1487,9 @@ config XEN
  # 16K |   27  |  14  |   13| 11   
  |
  # 64K |   29  |  16  |   13| 13   
  |
  config ARCH_FORCE_MAX_ORDER
-   int "Maximum zone order" if ARM64_4K_PAGES || ARM64_16K_PAGES
+   int "Maximum zone order" if EXPERT && (ARM64_4K_PAGES || 
ARM64_16K_PAGES)
default "13" if ARM64_64K_PAGES
-   range 11 13 if ARM64_16K_PAGES
default "11" if ARM64_16K_PAGES
-   range 10 15 if ARM64_4K_PAGES
default "10"
help
  The kernel memory allocator divides physically contiguous memory


Re: [PATCH v3 05/14] ia64: don't allow users to override ARCH_FORCE_MAX_ORDER

2023-03-25 Thread Kefeng Wang




On 2023/3/25 14:08, Mike Rapoport wrote:

From: "Mike Rapoport (IBM)" 

It is enough to keep default values for base and huge pages without
letting users to override ARCH_FORCE_MAX_ORDER.

Drop the prompt to make the option unvisible in *config.

Acked-by: Kirill A. Shutemov 
Reviewed-by: Zi Yan 
Signed-off-by: Mike Rapoport (IBM) 
---
  arch/ia64/Kconfig | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index 0d2f41fa56ee..b61437cae162 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -202,8 +202,7 @@ config IA64_CYCLONE
  If you're unsure, answer N.
  
  config ARCH_FORCE_MAX_ORDER

-   int "MAX_ORDER (10 - 16)"  if !HUGETLB_PAGE
-   range 10 16  if !HUGETLB_PAGE
+   int
default "16" if HUGETLB_PAGE
default "10"
  


It seems that we could drop the following part?

diff --git a/arch/ia64/include/asm/sparsemem.h 
b/arch/ia64/include/asm/sparsemem.h

index a58f8b466d96..18187551b183 100644
--- a/arch/ia64/include/asm/sparsemem.h
+++ b/arch/ia64/include/asm/sparsemem.h
@@ -11,11 +11,6 @@

 #define SECTION_SIZE_BITS  (30)
 #define MAX_PHYSMEM_BITS   (50)
-#ifdef CONFIG_ARCH_FORCE_MAX_ORDER
-#if (CONFIG_ARCH_FORCE_MAX_ORDER + PAGE_SHIFT > SECTION_SIZE_BITS)
-#undef SECTION_SIZE_BITS
-#define SECTION_SIZE_BITS (CONFIG_ARCH_FORCE_MAX_ORDER + PAGE_SHIFT)
-#endif
 #endif



[PATCH] mm: remove kern_addr_valid() completely

2022-10-18 Thread Kefeng Wang
Most architectures (except arm64/x86/sparc) simply return 1 for
kern_addr_valid(), which is only used in read_kcore(). That function
calls copy_from_kernel_nofault(), which already checks whether the
address is a valid kernel address, so kern_addr_valid() is not needed;
remove it completely.
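
For illustration, the pattern the removal relies on, as a minimal sketch with
assumed names (not the exact fs/proc/kcore.c code): copy_from_kernel_nofault()
fails cleanly on an invalid kernel address instead of faulting, so callers
just check its return value and no longer need a separate kern_addr_valid()
check:

static ssize_t read_kernel_bytes(void __user *ubuf, unsigned long src, size_t len)
{
	char buf[256];	/* small bounce buffer; len <= sizeof(buf) assumed */

	if (copy_from_kernel_nofault(buf, (void *)src, len))
		/* invalid or unmapped kernel address: zero-fill the user buffer */
		return clear_user(ubuf, len) ? -EFAULT : len;

	return copy_to_user(ubuf, buf, len) ? -EFAULT : len;
}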

Signed-off-by: Kefeng Wang 
---
 arch/alpha/include/asm/pgtable.h  |  2 -
 arch/arc/include/asm/pgtable-bits-arcv2.h |  2 -
 arch/arm/include/asm/pgtable-nommu.h  |  2 -
 arch/arm/include/asm/pgtable.h|  4 --
 arch/arm64/include/asm/pgtable.h  |  2 -
 arch/arm64/mm/mmu.c   | 47 ---
 arch/arm64/mm/pageattr.c  |  3 +-
 arch/csky/include/asm/pgtable.h   |  3 --
 arch/hexagon/include/asm/page.h   |  7 
 arch/ia64/include/asm/pgtable.h   | 16 
 arch/loongarch/include/asm/pgtable.h  |  2 -
 arch/m68k/include/asm/pgtable_mm.h|  2 -
 arch/m68k/include/asm/pgtable_no.h|  1 -
 arch/microblaze/include/asm/pgtable.h |  3 --
 arch/mips/include/asm/pgtable.h   |  2 -
 arch/nios2/include/asm/pgtable.h  |  2 -
 arch/openrisc/include/asm/pgtable.h   |  2 -
 arch/parisc/include/asm/pgtable.h | 15 
 arch/powerpc/include/asm/pgtable.h|  7 
 arch/riscv/include/asm/pgtable.h  |  2 -
 arch/s390/include/asm/pgtable.h   |  2 -
 arch/sh/include/asm/pgtable.h |  2 -
 arch/sparc/include/asm/pgtable_32.h   |  6 ---
 arch/sparc/mm/init_32.c   |  3 +-
 arch/sparc/mm/init_64.c   |  1 -
 arch/um/include/asm/pgtable.h |  2 -
 arch/x86/include/asm/pgtable_32.h |  9 -
 arch/x86/include/asm/pgtable_64.h |  1 -
 arch/x86/mm/init_64.c | 41 
 arch/xtensa/include/asm/pgtable.h |  2 -
 fs/proc/kcore.c   | 26 +
 31 files changed, 11 insertions(+), 210 deletions(-)

diff --git a/arch/alpha/include/asm/pgtable.h b/arch/alpha/include/asm/pgtable.h
index 3ea9661c09ff..9e45f6735d5d 100644
--- a/arch/alpha/include/asm/pgtable.h
+++ b/arch/alpha/include/asm/pgtable.h
@@ -313,8 +313,6 @@ extern inline pte_t mk_swap_pte(unsigned long type, 
unsigned long offset)
 #define __pte_to_swp_entry(pte)((swp_entry_t) { pte_val(pte) })
 #define __swp_entry_to_pte(x)  ((pte_t) { (x).val })
 
-#define kern_addr_valid(addr)  (1)
-
 #define pte_ERROR(e) \
printk("%s:%d: bad pte %016lx.\n", __FILE__, __LINE__, pte_val(e))
 #define pmd_ERROR(e) \
diff --git a/arch/arc/include/asm/pgtable-bits-arcv2.h 
b/arch/arc/include/asm/pgtable-bits-arcv2.h
index b23be557403e..515e82db519f 100644
--- a/arch/arc/include/asm/pgtable-bits-arcv2.h
+++ b/arch/arc/include/asm/pgtable-bits-arcv2.h
@@ -120,8 +120,6 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned 
long address,
 #define __pte_to_swp_entry(pte)((swp_entry_t) { pte_val(pte) })
 #define __swp_entry_to_pte(x)  ((pte_t) { (x).val })
 
-#define kern_addr_valid(addr)  (1)
-
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 #include 
 #endif
diff --git a/arch/arm/include/asm/pgtable-nommu.h 
b/arch/arm/include/asm/pgtable-nommu.h
index d16aba48fa0a..25d8c7bb07e0 100644
--- a/arch/arm/include/asm/pgtable-nommu.h
+++ b/arch/arm/include/asm/pgtable-nommu.h
@@ -21,8 +21,6 @@
 #define pgd_none(pgd)  (0)
 #define pgd_bad(pgd)   (0)
 #define pgd_clear(pgdp)
-#define kern_addr_valid(addr)  (1)
-/* FIXME */
 /*
  * PMD_SHIFT determines the size of the area a second-level page table can map
  * PGDIR_SHIFT determines what a third-level page table entry can map
diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index 78a532068fec..00954ab1a039 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -298,10 +298,6 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
  */
 #define MAX_SWAPFILES_CHECK() BUILD_BUG_ON(MAX_SWAPFILES_SHIFT > 
__SWP_TYPE_BITS)
 
-/* Needs to be defined here and not in linux/mm.h, as it is arch dependent */
-/* FIXME: this is not correct */
-#define kern_addr_valid(addr)  (1)
-
 /*
  * We provide our own arch_get_unmapped_area to cope with VIPT caches.
  */
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 71a1af42f0e8..4873c1d6e7d0 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -1021,8 +1021,6 @@ static inline pmd_t pmdp_establish(struct vm_area_struct 
*vma,
  */
 #define MAX_SWAPFILES_CHECK() BUILD_BUG_ON(MAX_SWAPFILES_SHIFT > 
__SWP_TYPE_BITS)
 
-extern int kern_addr_valid(unsigned long addr);
-
 #ifdef CONFIG_ARM64_MTE
 
 #define __HAVE_ARCH_PREPARE_TO_SWAP
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 9a7c38965154..556154d821bf 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -814,53 +814,6 @@ void __init 

[PATCH v4 04/11] sections: Move is_kernel_inittext() into sections.h

2021-09-30 Thread Kefeng Wang
is_kernel_inittext() and init_kernel_text() have the same functionality;
keep only is_kernel_inittext(), move it into sections.h, and update all
the callers.

Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Arnd Bergmann 
Cc: x...@kernel.org
Signed-off-by: Kefeng Wang 
---
 arch/x86/kernel/unwind_orc.c   |  2 +-
 include/asm-generic/sections.h | 14 ++
 include/linux/kallsyms.h   |  8 
 include/linux/kernel.h |  1 -
 kernel/extable.c   | 12 ++--
 5 files changed, 17 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
index a1202536fc57..d92ec2ced059 100644
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -175,7 +175,7 @@ static struct orc_entry *orc_find(unsigned long ip)
}
 
/* vmlinux .init slow lookup: */
-   if (init_kernel_text(ip))
+   if (is_kernel_inittext(ip))
return __orc_find(__start_orc_unwind_ip, __start_orc_unwind,
  __stop_orc_unwind_ip - __start_orc_unwind_ip, 
ip);
 
diff --git a/include/asm-generic/sections.h b/include/asm-generic/sections.h
index 24780c0f40b1..811583ca8bd0 100644
--- a/include/asm-generic/sections.h
+++ b/include/asm-generic/sections.h
@@ -172,4 +172,18 @@ static inline bool is_kernel_rodata(unsigned long addr)
   addr < (unsigned long)__end_rodata;
 }
 
+/**
+ * is_kernel_inittext - checks if the pointer address is located in the
+ *  .init.text section
+ *
+ * @addr: address to check
+ *
+ * Returns: true if the address is located in .init.text, false otherwise.
+ */
+static inline bool is_kernel_inittext(unsigned long addr)
+{
+   return addr >= (unsigned long)_sinittext &&
+  addr < (unsigned long)_einittext;
+}
+
 #endif /* _ASM_GENERIC_SECTIONS_H_ */
diff --git a/include/linux/kallsyms.h b/include/linux/kallsyms.h
index b016c62f30a6..8a9d329c927c 100644
--- a/include/linux/kallsyms.h
+++ b/include/linux/kallsyms.h
@@ -24,14 +24,6 @@
 struct cred;
 struct module;
 
-static inline int is_kernel_inittext(unsigned long addr)
-{
-   if (addr >= (unsigned long)_sinittext
-   && addr < (unsigned long)_einittext)
-   return 1;
-   return 0;
-}
-
 static inline int is_kernel_text(unsigned long addr)
 {
if ((addr >= (unsigned long)_stext && addr < (unsigned long)_etext))
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index e5a9af8a4e20..445d0dceefb8 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -229,7 +229,6 @@ extern bool parse_option_str(const char *str, const char 
*option);
 extern char *next_arg(char *args, char **param, char **val);
 
 extern int core_kernel_text(unsigned long addr);
-extern int init_kernel_text(unsigned long addr);
 extern int __kernel_text_address(unsigned long addr);
 extern int kernel_text_address(unsigned long addr);
 extern int func_ptr_is_kernel_text(void *ptr);
diff --git a/kernel/extable.c b/kernel/extable.c
index da26203841d4..98ca627ac5ef 100644
--- a/kernel/extable.c
+++ b/kernel/extable.c
@@ -62,14 +62,6 @@ const struct exception_table_entry 
*search_exception_tables(unsigned long addr)
return e;
 }
 
-int init_kernel_text(unsigned long addr)
-{
-   if (addr >= (unsigned long)_sinittext &&
-   addr < (unsigned long)_einittext)
-   return 1;
-   return 0;
-}
-
 int notrace core_kernel_text(unsigned long addr)
 {
if (addr >= (unsigned long)_stext &&
@@ -77,7 +69,7 @@ int notrace core_kernel_text(unsigned long addr)
return 1;
 
if (system_state < SYSTEM_RUNNING &&
-   init_kernel_text(addr))
+   is_kernel_inittext(addr))
return 1;
return 0;
 }
@@ -94,7 +86,7 @@ int __kernel_text_address(unsigned long addr)
 * Since we are after the module-symbols check, there's
 * no danger of address overlap:
 */
-   if (init_kernel_text(addr))
+   if (is_kernel_inittext(addr))
return 1;
return 0;
 }
-- 
2.26.2



[PATCH v4 10/11] microblaze: Use is_kernel_text() helper

2021-09-30 Thread Kefeng Wang
Use is_kernel_text() helper to simplify code.

Cc: Michal Simek 
Signed-off-by: Kefeng Wang 
---
 arch/microblaze/mm/pgtable.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/microblaze/mm/pgtable.c b/arch/microblaze/mm/pgtable.c
index c1833b159d3b..9f73265aad4e 100644
--- a/arch/microblaze/mm/pgtable.c
+++ b/arch/microblaze/mm/pgtable.c
@@ -34,6 +34,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -171,7 +172,7 @@ void __init mapin_ram(void)
for (s = 0; s < lowmem_size; s += PAGE_SIZE) {
f = _PAGE_PRESENT | _PAGE_ACCESSED |
_PAGE_SHARED | _PAGE_HWEXEC;
-   if ((char *) v < _stext || (char *) v >= _etext)
+   if (!is_kernel_text(v))
f |= _PAGE_WRENABLE;
else
/* On the MicroBlaze, no user access
-- 
2.26.2



[PATCH v4 07/11] mm: kasan: Use is_kernel() helper

2021-09-30 Thread Kefeng Wang
Directly use is_kernel() helper in kernel_or_module_addr().

Cc: Andrey Ryabinin 
Cc: Alexander Potapenko 
Cc: Andrey Konovalov 
Cc: Dmitry Vyukov 
Signed-off-by: Kefeng Wang 
---
 mm/kasan/report.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/kasan/report.c b/mm/kasan/report.c
index 3239fd8f8747..1c955e1c98d5 100644
--- a/mm/kasan/report.c
+++ b/mm/kasan/report.c
@@ -226,7 +226,7 @@ static void describe_object(struct kmem_cache *cache, void 
*object,
 
 static inline bool kernel_or_module_addr(const void *addr)
 {
-   if (addr >= (void *)_stext && addr < (void *)_end)
+   if (is_kernel((unsigned long)addr))
return true;
if (is_module_address((unsigned long)addr))
return true;
-- 
2.26.2



[PATCH v4 01/11] kallsyms: Remove arch specific text and data check

2021-09-30 Thread Kefeng Wang
After commit 4ba66a976072 ("arch: remove blackfin port"), the
arch-specific text/data checks are no longer needed.

Cc: Arnd Bergmann 
Signed-off-by: Kefeng Wang 
---
 include/asm-generic/sections.h | 16 
 include/linux/kallsyms.h   |  3 +--
 kernel/locking/lockdep.c   |  3 ---
 3 files changed, 1 insertion(+), 21 deletions(-)

diff --git a/include/asm-generic/sections.h b/include/asm-generic/sections.h
index d16302d3eb59..817309e289db 100644
--- a/include/asm-generic/sections.h
+++ b/include/asm-generic/sections.h
@@ -64,22 +64,6 @@ extern __visible const void __nosave_begin, __nosave_end;
 #define dereference_kernel_function_descriptor(p) ((void *)(p))
 #endif
 
-/* random extra sections (if any).  Override
- * in asm/sections.h */
-#ifndef arch_is_kernel_text
-static inline int arch_is_kernel_text(unsigned long addr)
-{
-   return 0;
-}
-#endif
-
-#ifndef arch_is_kernel_data
-static inline int arch_is_kernel_data(unsigned long addr)
-{
-   return 0;
-}
-#endif
-
 /*
  * Check if an address is part of freed initmem. This is needed on 
architectures
  * with virt == phys kernel mapping, for code that wants to check if an address
diff --git a/include/linux/kallsyms.h b/include/linux/kallsyms.h
index 6851c2313cad..2a241e3f063f 100644
--- a/include/linux/kallsyms.h
+++ b/include/linux/kallsyms.h
@@ -34,8 +34,7 @@ static inline int is_kernel_inittext(unsigned long addr)
 
 static inline int is_kernel_text(unsigned long addr)
 {
-   if ((addr >= (unsigned long)_stext && addr <= (unsigned long)_etext) ||
-   arch_is_kernel_text(addr))
+   if ((addr >= (unsigned long)_stext && addr <= (unsigned long)_etext))
return 1;
return in_gate_area_no_mm(addr);
 }
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 7096384dc60f..dcdbcee391cd 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -803,9 +803,6 @@ static int static_obj(const void *obj)
if ((addr >= start) && (addr < end))
return 1;
 
-   if (arch_is_kernel_data(addr))
-   return 1;
-
/*
 * in-kernel percpu var?
 */
-- 
2.26.2



[PATCH v4 05/11] x86: mm: Rename __is_kernel_text() to is_x86_32_kernel_text()

2021-09-30 Thread Kefeng Wang
Commit b56cd05c55a1 ("x86/mm: Rename is_kernel_text to __is_kernel_text")
added the '__' prefix to avoid a conflict with the existing is_kernel_text()
in <linux/kallsyms.h>.

We will add a generic __is_kernel_text() for the basic kernel text range
check in the next patch, so use the private is_x86_32_kernel_text() name
for the x86-specific check.

Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: x...@kernel.org
Signed-off-by: Kefeng Wang 
---
 arch/x86/mm/init_32.c | 14 +-
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index bd90b8fe81e4..523743ee9dea 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -238,11 +238,7 @@ page_table_range_init(unsigned long start, unsigned long 
end, pgd_t *pgd_base)
}
 }
 
-/*
- * The <linux/kallsyms.h> already defines is_kernel_text,
- * using '__' prefix not to get in conflict.
- */
-static inline int __is_kernel_text(unsigned long addr)
+static inline int is_x86_32_kernel_text(unsigned long addr)
 {
if (addr >= (unsigned long)_text && addr <= (unsigned long)__init_end)
return 1;
@@ -333,8 +329,8 @@ kernel_physical_mapping_init(unsigned long start,
addr2 = (pfn + PTRS_PER_PTE-1) * PAGE_SIZE +
PAGE_OFFSET + PAGE_SIZE-1;
 
-   if (__is_kernel_text(addr) ||
-   __is_kernel_text(addr2))
+   if (is_x86_32_kernel_text(addr) ||
+   is_x86_32_kernel_text(addr2))
prot = PAGE_KERNEL_LARGE_EXEC;
 
pages_2m++;
@@ -359,7 +355,7 @@ kernel_physical_mapping_init(unsigned long start,
 */
pgprot_t init_prot = __pgprot(PTE_IDENT_ATTR);
 
-   if (__is_kernel_text(addr))
+   if (is_x86_32_kernel_text(addr))
prot = PAGE_KERNEL_EXEC;
 
pages_4k++;
@@ -820,7 +816,7 @@ static void mark_nxdata_nx(void)
 */
unsigned long start = PFN_ALIGN(_etext);
/*
-* This comes from __is_kernel_text upper limit. Also HPAGE where used:
+* This comes from is_x86_32_kernel_text upper limit. Also HPAGE where 
used:
 */
unsigned long size = (((unsigned long)__init_end + HPAGE_SIZE) & 
HPAGE_MASK) - start;
 
-- 
2.26.2



[PATCH v4 02/11] kallsyms: Fix address-checks for kernel related range

2021-09-30 Thread Kefeng Wang
The is_kernel_inittext()/is_kernel_text()/is_kernel() functions should not
include the end addresses (the labels _einittext, _etext and _end) when
checking the address ranges, because those labels mark the first byte
*after* the respective ranges; the issue has existed since Linux v2.6.12.
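
For illustration only (not part of the patch), the old and fixed checks for
the text range:

static int is_kernel_text_old(unsigned long addr)	/* wrongly includes _etext */
{
	return addr >= (unsigned long)_stext && addr <= (unsigned long)_etext;
}

static int is_kernel_text_fixed(unsigned long addr)	/* half-open: [_stext, _etext) */
{
	return addr >= (unsigned long)_stext && addr < (unsigned long)_etext;
}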

Cc: Arnd Bergmann 
Cc: Sergey Senozhatsky 
Cc: Petr Mladek 
Reviewed-by: Petr Mladek 
Reviewed-by: Steven Rostedt (VMware) 
Acked-by: Sergey Senozhatsky 
Signed-off-by: Kefeng Wang 
---
 include/linux/kallsyms.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/linux/kallsyms.h b/include/linux/kallsyms.h
index 2a241e3f063f..b016c62f30a6 100644
--- a/include/linux/kallsyms.h
+++ b/include/linux/kallsyms.h
@@ -27,21 +27,21 @@ struct module;
 static inline int is_kernel_inittext(unsigned long addr)
 {
if (addr >= (unsigned long)_sinittext
-   && addr <= (unsigned long)_einittext)
+   && addr < (unsigned long)_einittext)
return 1;
return 0;
 }
 
 static inline int is_kernel_text(unsigned long addr)
 {
-   if ((addr >= (unsigned long)_stext && addr <= (unsigned long)_etext))
+   if ((addr >= (unsigned long)_stext && addr < (unsigned long)_etext))
return 1;
return in_gate_area_no_mm(addr);
 }
 
 static inline int is_kernel(unsigned long addr)
 {
-   if (addr >= (unsigned long)_stext && addr <= (unsigned long)_end)
+   if (addr >= (unsigned long)_stext && addr < (unsigned long)_end)
return 1;
return in_gate_area_no_mm(addr);
 }
-- 
2.26.2



[PATCH v4 08/11] extable: Use is_kernel_text() helper

2021-09-30 Thread Kefeng Wang
core_kernel_text() should also cover the gate area, as it is part of the
kernel text range, so use is_kernel_text() in core_kernel_text().

Cc: Steven Rostedt 
Signed-off-by: Kefeng Wang 
---
 kernel/extable.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/extable.c b/kernel/extable.c
index 98ca627ac5ef..0ba383d850ff 100644
--- a/kernel/extable.c
+++ b/kernel/extable.c
@@ -64,8 +64,7 @@ const struct exception_table_entry 
*search_exception_tables(unsigned long addr)
 
 int notrace core_kernel_text(unsigned long addr)
 {
-   if (addr >= (unsigned long)_stext &&
-   addr < (unsigned long)_etext)
+   if (is_kernel_text(addr))
return 1;
 
if (system_state < SYSTEM_RUNNING &&
-- 
2.26.2



[PATCH v4 11/11] alpha: Use is_kernel_text() helper

2021-09-30 Thread Kefeng Wang
Use is_kernel_text() helper to simplify code.

Cc: Richard Henderson 
Cc: Ivan Kokshaysky 
Cc: Matt Turner 
Signed-off-by: Kefeng Wang 
---
 arch/alpha/kernel/traps.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/alpha/kernel/traps.c b/arch/alpha/kernel/traps.c
index e805106409f7..2ae34702456c 100644
--- a/arch/alpha/kernel/traps.c
+++ b/arch/alpha/kernel/traps.c
@@ -129,9 +129,7 @@ dik_show_trace(unsigned long *sp, const char *loglvl)
extern char _stext[], _etext[];
unsigned long tmp = *sp;
sp++;
-   if (tmp < (unsigned long) &_stext)
-   continue;
-   if (tmp >= (unsigned long) &_etext)
+   if (!is_kernel_text(tmp))
continue;
printk("%s[<%lx>] %pSR\n", loglvl, tmp, (void *)tmp);
if (i > 40) {
-- 
2.26.2



[PATCH v4 09/11] powerpc/mm: Use core_kernel_text() helper

2021-09-30 Thread Kefeng Wang
Use the core_kernel_text() helper to simplify the code, and drop the
etext, _stext, _sinittext and _einittext declarations, which are already
declared in sections.h.

Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Christophe Leroy 
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Kefeng Wang 
---
 arch/powerpc/mm/pgtable_32.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index dcf5ecca19d9..079abbf45a33 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -33,8 +33,6 @@
 
 #include 
 
-extern char etext[], _stext[], _sinittext[], _einittext[];
-
 static u8 early_fixmap_pagetable[FIXMAP_PTE_SIZE] __page_aligned_data;
 
 notrace void __init early_ioremap_init(void)
@@ -104,14 +102,13 @@ static void __init __mapin_ram_chunk(unsigned long 
offset, unsigned long top)
 {
unsigned long v, s;
phys_addr_t p;
-   int ktext;
+   bool ktext;
 
s = offset;
v = PAGE_OFFSET + s;
p = memstart_addr + s;
for (; s < top; s += PAGE_SIZE) {
-   ktext = ((char *)v >= _stext && (char *)v < etext) ||
-   ((char *)v >= _sinittext && (char *)v < _einittext);
+   ktext = core_kernel_text(v);
map_kernel_page(v, p, ktext ? PAGE_KERNEL_TEXT : PAGE_KERNEL);
v += PAGE_SIZE;
p += PAGE_SIZE;
-- 
2.26.2



[PATCH v4 06/11] sections: Provide internal __is_kernel() and __is_kernel_text() helper

2021-09-30 Thread Kefeng Wang
Add an internal __is_kernel() helper that only checks the core kernel
address range, and an internal __is_kernel_text() helper that only
checks the .text section range.

Signed-off-by: Kefeng Wang 
---
 include/asm-generic/sections.h | 29 +
 include/linux/kallsyms.h   |  4 ++--
 2 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/include/asm-generic/sections.h b/include/asm-generic/sections.h
index 811583ca8bd0..a7abeadddc7a 100644
--- a/include/asm-generic/sections.h
+++ b/include/asm-generic/sections.h
@@ -186,4 +186,33 @@ static inline bool is_kernel_inittext(unsigned long addr)
   addr < (unsigned long)_einittext;
 }
 
+/**
+ * __is_kernel_text - checks if the pointer address is located in the
+ *.text section
+ *
+ * @addr: address to check
+ *
+ * Returns: true if the address is located in .text, false otherwise.
+ * Note: an internal helper, only check the range of _stext to _etext.
+ */
+static inline bool __is_kernel_text(unsigned long addr)
+{
+   return addr >= (unsigned long)_stext &&
+  addr < (unsigned long)_etext;
+}
+
+/**
+ * __is_kernel - checks if the pointer address is located in the kernel range
+ *
+ * @addr: address to check
+ *
+ * Returns: true if the address is located in the kernel range, false 
otherwise.
+ * Note: an internal helper, only check the range of _stext to _end.
+ */
+static inline bool __is_kernel(unsigned long addr)
+{
+   return addr >= (unsigned long)_stext &&
+  addr < (unsigned long)_end;
+}
+
 #endif /* _ASM_GENERIC_SECTIONS_H_ */
diff --git a/include/linux/kallsyms.h b/include/linux/kallsyms.h
index 8a9d329c927c..5fb17dd4b6fa 100644
--- a/include/linux/kallsyms.h
+++ b/include/linux/kallsyms.h
@@ -26,14 +26,14 @@ struct module;
 
 static inline int is_kernel_text(unsigned long addr)
 {
-   if ((addr >= (unsigned long)_stext && addr < (unsigned long)_etext))
+   if (__is_kernel_text(addr))
return 1;
return in_gate_area_no_mm(addr);
 }
 
 static inline int is_kernel(unsigned long addr)
 {
-   if (addr >= (unsigned long)_stext && addr < (unsigned long)_end)
+   if (__is_kernel(addr))
return 1;
return in_gate_area_no_mm(addr);
 }
-- 
2.26.2



[PATCH v4 03/11] sections: Move and rename core_kernel_data() to is_kernel_core_data()

2021-09-30 Thread Kefeng Wang
Move core_kernel_data() into sections.h and rename it to
is_kernel_core_data(); also make it return a bool value, then
update all the callers.

Cc: Arnd Bergmann 
Cc: Steven Rostedt 
Cc: Ingo Molnar 
Cc: "David S. Miller" 
Signed-off-by: Kefeng Wang 
---
 include/asm-generic/sections.h | 16 
 include/linux/kernel.h |  1 -
 kernel/extable.c   | 18 --
 kernel/trace/ftrace.c  |  2 +-
 net/sysctl_net.c   |  2 +-
 5 files changed, 18 insertions(+), 21 deletions(-)

diff --git a/include/asm-generic/sections.h b/include/asm-generic/sections.h
index 817309e289db..24780c0f40b1 100644
--- a/include/asm-generic/sections.h
+++ b/include/asm-generic/sections.h
@@ -142,6 +142,22 @@ static inline bool init_section_intersects(void *virt, 
size_t size)
return memory_intersects(__init_begin, __init_end, virt, size);
 }
 
+/**
+ * is_kernel_core_data - checks if the pointer address is located in the
+ *  .data section
+ *
+ * @addr: address to check
+ *
+ * Returns: true if the address is located in .data, false otherwise.
+ * Note: On some archs it may return true for core RODATA, and false
+ *   for others. But will always be true for core RW data.
+ */
+static inline bool is_kernel_core_data(unsigned long addr)
+{
+   return addr >= (unsigned long)_sdata &&
+  addr < (unsigned long)_edata;
+}
+
 /**
  * is_kernel_rodata - checks if the pointer address is located in the
  *.rodata section
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 2776423a587e..e5a9af8a4e20 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -230,7 +230,6 @@ extern char *next_arg(char *args, char **param, char **val);
 
 extern int core_kernel_text(unsigned long addr);
 extern int init_kernel_text(unsigned long addr);
-extern int core_kernel_data(unsigned long addr);
 extern int __kernel_text_address(unsigned long addr);
 extern int kernel_text_address(unsigned long addr);
 extern int func_ptr_is_kernel_text(void *ptr);
diff --git a/kernel/extable.c b/kernel/extable.c
index b0ea5eb0c3b4..da26203841d4 100644
--- a/kernel/extable.c
+++ b/kernel/extable.c
@@ -82,24 +82,6 @@ int notrace core_kernel_text(unsigned long addr)
return 0;
 }
 
-/**
- * core_kernel_data - tell if addr points to kernel data
- * @addr: address to test
- *
- * Returns true if @addr passed in is from the core kernel data
- * section.
- *
- * Note: On some archs it may return true for core RODATA, and false
- *  for others. But will always be true for core RW data.
- */
-int core_kernel_data(unsigned long addr)
-{
-   if (addr >= (unsigned long)_sdata &&
-   addr < (unsigned long)_edata)
-   return 1;
-   return 0;
-}
-
 int __kernel_text_address(unsigned long addr)
 {
if (kernel_text_address(addr))
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 7efbc8aaf7f6..f15badf31f52 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -323,7 +323,7 @@ int __register_ftrace_function(struct ftrace_ops *ops)
if (!ftrace_enabled && (ops->flags & FTRACE_OPS_FL_PERMANENT))
return -EBUSY;
 
-   if (!core_kernel_data((unsigned long)ops))
+   if (!is_kernel_core_data((unsigned long)ops))
ops->flags |= FTRACE_OPS_FL_DYNAMIC;
 
add_ftrace_ops(_ops_list, ops);
diff --git a/net/sysctl_net.c b/net/sysctl_net.c
index f6cb0d4d114c..4b45ed631eb8 100644
--- a/net/sysctl_net.c
+++ b/net/sysctl_net.c
@@ -144,7 +144,7 @@ static void ensure_safe_net_sysctl(struct net *net, const 
char *path,
addr = (unsigned long)ent->data;
if (is_module_address(addr))
where = "module";
-   else if (core_kernel_data(addr))
+   else if (is_kernel_core_data(addr))
where = "kernel";
else
continue;
-- 
2.26.2



[PATCH v4 00/11] sections: Unify kernel sections range check and use

2021-09-30 Thread Kefeng Wang
There are three header files (kallsyms.h, kernel.h and sections.h) which
include kernel section range checks; let's do some cleanup and unify them.

1. Clean up the arch-specific text/data checks and fix the address boundary
   checks in kallsyms.h
2. Move all the basic/core kernel range check functions into sections.h
3. Update all the callers, and use the helpers in sections.h to simplify
   the code

After this series, we have 5 APIs for kernel section range checks in
sections.h:

 * is_kernel_rodata()   --- already in sections.h
 * is_kernel_core_data()--- come from core_kernel_data() in kernel.h
 * is_kernel_inittext() --- come from kernel.h and kallsyms.h
 * __is_kernel_text()   --- add new internal helper
 * __is_kernel()--- add new internal helper

Note: The last two helpers should not be used directly; consider using the
  corresponding functions in kallsyms.h instead.
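
As a rough usage sketch (a hypothetical caller, not part of the series), the
intended layering after this cleanup looks like:

#include <linux/kallsyms.h>	/* is_kernel_text(), is_kernel() */
#include <asm/sections.h>	/* is_kernel_inittext(), is_kernel_core_data(), is_kernel_rodata() */
#include <linux/printk.h>

/* Hypothetical helper: classify an address with the unified APIs. */
static void classify_addr(unsigned long addr)
{
	if (is_kernel_text(addr))		/* __is_kernel_text() or gate area */
		pr_info("core kernel text\n");
	else if (is_kernel_inittext(addr))
		pr_info(".init.text\n");
	else if (is_kernel_rodata(addr))
		pr_info(".rodata\n");
	else if (is_kernel_core_data(addr))
		pr_info("core RW data\n");
	else if (is_kernel(addr))		/* __is_kernel() or gate area */
		pr_info("elsewhere in the kernel image\n");
}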

v4:
- Use core_kernel_text() in powerpc as suggested by Christophe Leroy, build
  test only
- Use is_kernel_text() in alpha and microblaze, build test only on
  next-20210929

v3:
https://lore.kernel.org/linux-arch/20210926072048.190336-1-wangkefeng.w...@huawei.com/
- Add Steven's RB to patch2
- Introduce two internal helpers, then use is_kernel_text() in core_kernel_text()
  and is_kernel() in kernel_or_module_addr(), as suggested by Steven

v2:
https://lore.kernel.org/linux-arch/20210728081320.20394-1-wangkefeng.w...@huawei.com/
- add ACK/RW to patch2, and drop inappropriate fix tag
- keep 'core' to check kernel data, as suggested by Steven Rostedt,
  rename is_kernel_data() to is_kernel_core_data()
- drop patch8 which is merged
- drop patch9 which is resend independently

v1:
https://lore.kernel.org/linux-arch/20210626073439.150586-1-wangkefeng.w...@huawei.com


Kefeng Wang (11):
  kallsyms: Remove arch specific text and data check
  kallsyms: Fix address-checks for kernel related range
  sections: Move and rename core_kernel_data() to is_kernel_core_data()
  sections: Move is_kernel_inittext() into sections.h
  x86: mm: Rename __is_kernel_text() to is_x86_32_kernel_text()
  sections: Provide internal __is_kernel() and __is_kernel_text() helper
  mm: kasan: Use is_kernel() helper
  extable: Use is_kernel_text() helper
  powerpc/mm: Use core_kernel_text() helper
  microblaze: Use is_kernel_text() helper
  alpha: Use is_kernel_text() helper

 arch/alpha/kernel/traps.c  |  4 +-
 arch/microblaze/mm/pgtable.c   |  3 +-
 arch/powerpc/mm/pgtable_32.c   |  7 +---
 arch/x86/kernel/unwind_orc.c   |  2 +-
 arch/x86/mm/init_32.c  | 14 +++
 include/asm-generic/sections.h | 75 ++
 include/linux/kallsyms.h   | 13 +-
 include/linux/kernel.h |  2 -
 kernel/extable.c   | 33 ++-
 kernel/locking/lockdep.c   |  3 --
 kernel/trace/ftrace.c  |  2 +-
 mm/kasan/report.c  |  2 +-
 net/sysctl_net.c   |  2 +-
 13 files changed, 78 insertions(+), 84 deletions(-)

-- 
2.26.2



Re: [PATCH v3 9/9] powerpc/mm: Use is_kernel_text() and is_kernel_inittext() helper

2021-09-28 Thread Kefeng Wang




On 2021/9/29 1:51, Christophe Leroy wrote:



On 26/09/2021 at 09:20, Kefeng Wang wrote:

Use is_kernel_text() and is_kernel_inittext() helper to simplify code,
also drop etext, _stext, _sinittext, _einittext declaration which
already declared in section.h.

Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Kefeng Wang 
---
  arch/powerpc/mm/pgtable_32.c | 7 ++-
  1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index dcf5ecca19d9..13c798308c2e 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -33,8 +33,6 @@
  #include 
-extern char etext[], _stext[], _sinittext[], _einittext[];
-
  static u8 early_fixmap_pagetable[FIXMAP_PTE_SIZE] __page_aligned_data;
  notrace void __init early_ioremap_init(void)
@@ -104,14 +102,13 @@ static void __init __mapin_ram_chunk(unsigned 
long offset, unsigned long top)

  {
  unsigned long v, s;
  phys_addr_t p;
-    int ktext;
+    bool ktext;
  s = offset;
  v = PAGE_OFFSET + s;
  p = memstart_addr + s;
  for (; s < top; s += PAGE_SIZE) {
-    ktext = ((char *)v >= _stext && (char *)v < etext) ||
-    ((char *)v >= _sinittext && (char *)v < _einittext);
+    ktext = (is_kernel_text(v) || is_kernel_inittext(v));


I think we could use core_kernel_text() instead.

Indeed. Oops, sorry for the build error, will update, thanks.



Build failure on mpc885_ads_defconfig

arch/powerpc/mm/pgtable_32.c: In function '__mapin_ram_chunk':
arch/powerpc/mm/pgtable_32.c:111:26: error: implicit declaration of 
function 'is_kernel_text'; did you mean 'is_kernel_inittext'? 
[-Werror=implicit-function-declaration]
   111 | ktext = (is_kernel_text(v) || 
is_kernel_inittext(v));

   |  ^~
   |  is_kernel_inittext
cc1: all warnings being treated as errors
make[2]: *** [scripts/Makefile.build:277: arch/powerpc/mm/pgtable_32.o] 
Error 1

make[1]: *** [scripts/Makefile.build:540: arch/powerpc/mm] Error 2
make: *** [Makefile:1868: arch/powerpc] Error 2


.


[PATCH v3 4/9] sections: Move is_kernel_inittext() into sections.h

2021-09-26 Thread Kefeng Wang
The is_kernel_inittext() and init_kernel_text() helpers have the same
functionality, so let's just keep is_kernel_inittext() and move
it into sections.h, then update all the callers.

Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Arnd Bergmann 
Cc: x...@kernel.org
Signed-off-by: Kefeng Wang 
---
 arch/x86/kernel/unwind_orc.c   |  2 +-
 include/asm-generic/sections.h | 14 ++
 include/linux/kallsyms.h   |  8 
 include/linux/kernel.h |  1 -
 kernel/extable.c   | 12 ++--
 5 files changed, 17 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
index a1202536fc57..d92ec2ced059 100644
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -175,7 +175,7 @@ static struct orc_entry *orc_find(unsigned long ip)
}
 
/* vmlinux .init slow lookup: */
-   if (init_kernel_text(ip))
+   if (is_kernel_inittext(ip))
return __orc_find(__start_orc_unwind_ip, __start_orc_unwind,
  __stop_orc_unwind_ip - __start_orc_unwind_ip, 
ip);
 
diff --git a/include/asm-generic/sections.h b/include/asm-generic/sections.h
index 24780c0f40b1..811583ca8bd0 100644
--- a/include/asm-generic/sections.h
+++ b/include/asm-generic/sections.h
@@ -172,4 +172,18 @@ static inline bool is_kernel_rodata(unsigned long addr)
   addr < (unsigned long)__end_rodata;
 }
 
+/**
+ * is_kernel_inittext - checks if the pointer address is located in the
+ *  .init.text section
+ *
+ * @addr: address to check
+ *
+ * Returns: true if the address is located in .init.text, false otherwise.
+ */
+static inline bool is_kernel_inittext(unsigned long addr)
+{
+   return addr >= (unsigned long)_sinittext &&
+  addr < (unsigned long)_einittext;
+}
+
 #endif /* _ASM_GENERIC_SECTIONS_H_ */
diff --git a/include/linux/kallsyms.h b/include/linux/kallsyms.h
index b016c62f30a6..8a9d329c927c 100644
--- a/include/linux/kallsyms.h
+++ b/include/linux/kallsyms.h
@@ -24,14 +24,6 @@
 struct cred;
 struct module;
 
-static inline int is_kernel_inittext(unsigned long addr)
-{
-   if (addr >= (unsigned long)_sinittext
-   && addr < (unsigned long)_einittext)
-   return 1;
-   return 0;
-}
-
 static inline int is_kernel_text(unsigned long addr)
 {
if ((addr >= (unsigned long)_stext && addr < (unsigned long)_etext))
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index e5a9af8a4e20..445d0dceefb8 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -229,7 +229,6 @@ extern bool parse_option_str(const char *str, const char 
*option);
 extern char *next_arg(char *args, char **param, char **val);
 
 extern int core_kernel_text(unsigned long addr);
-extern int init_kernel_text(unsigned long addr);
 extern int __kernel_text_address(unsigned long addr);
 extern int kernel_text_address(unsigned long addr);
 extern int func_ptr_is_kernel_text(void *ptr);
diff --git a/kernel/extable.c b/kernel/extable.c
index da26203841d4..98ca627ac5ef 100644
--- a/kernel/extable.c
+++ b/kernel/extable.c
@@ -62,14 +62,6 @@ const struct exception_table_entry 
*search_exception_tables(unsigned long addr)
return e;
 }
 
-int init_kernel_text(unsigned long addr)
-{
-   if (addr >= (unsigned long)_sinittext &&
-   addr < (unsigned long)_einittext)
-   return 1;
-   return 0;
-}
-
 int notrace core_kernel_text(unsigned long addr)
 {
if (addr >= (unsigned long)_stext &&
@@ -77,7 +69,7 @@ int notrace core_kernel_text(unsigned long addr)
return 1;
 
if (system_state < SYSTEM_RUNNING &&
-   init_kernel_text(addr))
+   is_kernel_inittext(addr))
return 1;
return 0;
 }
@@ -94,7 +86,7 @@ int __kernel_text_address(unsigned long addr)
 * Since we are after the module-symbols check, there's
 * no danger of address overlap:
 */
-   if (init_kernel_text(addr))
+   if (is_kernel_inittext(addr))
return 1;
return 0;
 }
-- 
2.26.2



[PATCH v3 3/9] sections: Move and rename core_kernel_data() to is_kernel_core_data()

2021-09-26 Thread Kefeng Wang
Move core_kernel_data() into sections.h and rename it to
is_kernel_core_data(); also make it return a bool value, then
update all the callers.

Cc: Arnd Bergmann 
Cc: Steven Rostedt 
Cc: Ingo Molnar 
Cc: "David S. Miller" 
Signed-off-by: Kefeng Wang 
---
 include/asm-generic/sections.h | 16 
 include/linux/kernel.h |  1 -
 kernel/extable.c   | 18 --
 kernel/trace/ftrace.c  |  2 +-
 net/sysctl_net.c   |  2 +-
 5 files changed, 18 insertions(+), 21 deletions(-)

diff --git a/include/asm-generic/sections.h b/include/asm-generic/sections.h
index 817309e289db..24780c0f40b1 100644
--- a/include/asm-generic/sections.h
+++ b/include/asm-generic/sections.h
@@ -142,6 +142,22 @@ static inline bool init_section_intersects(void *virt, 
size_t size)
return memory_intersects(__init_begin, __init_end, virt, size);
 }
 
+/**
+ * is_kernel_core_data - checks if the pointer address is located in the
+ *  .data section
+ *
+ * @addr: address to check
+ *
+ * Returns: true if the address is located in .data, false otherwise.
+ * Note: On some archs it may return true for core RODATA, and false
+ *   for others. But will always be true for core RW data.
+ */
+static inline bool is_kernel_core_data(unsigned long addr)
+{
+   return addr >= (unsigned long)_sdata &&
+  addr < (unsigned long)_edata;
+}
+
 /**
  * is_kernel_rodata - checks if the pointer address is located in the
  *.rodata section
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 2776423a587e..e5a9af8a4e20 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -230,7 +230,6 @@ extern char *next_arg(char *args, char **param, char **val);
 
 extern int core_kernel_text(unsigned long addr);
 extern int init_kernel_text(unsigned long addr);
-extern int core_kernel_data(unsigned long addr);
 extern int __kernel_text_address(unsigned long addr);
 extern int kernel_text_address(unsigned long addr);
 extern int func_ptr_is_kernel_text(void *ptr);
diff --git a/kernel/extable.c b/kernel/extable.c
index b0ea5eb0c3b4..da26203841d4 100644
--- a/kernel/extable.c
+++ b/kernel/extable.c
@@ -82,24 +82,6 @@ int notrace core_kernel_text(unsigned long addr)
return 0;
 }
 
-/**
- * core_kernel_data - tell if addr points to kernel data
- * @addr: address to test
- *
- * Returns true if @addr passed in is from the core kernel data
- * section.
- *
- * Note: On some archs it may return true for core RODATA, and false
- *  for others. But will always be true for core RW data.
- */
-int core_kernel_data(unsigned long addr)
-{
-   if (addr >= (unsigned long)_sdata &&
-   addr < (unsigned long)_edata)
-   return 1;
-   return 0;
-}
-
 int __kernel_text_address(unsigned long addr)
 {
if (kernel_text_address(addr))
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 7efbc8aaf7f6..f15badf31f52 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -323,7 +323,7 @@ int __register_ftrace_function(struct ftrace_ops *ops)
if (!ftrace_enabled && (ops->flags & FTRACE_OPS_FL_PERMANENT))
return -EBUSY;
 
-   if (!core_kernel_data((unsigned long)ops))
+   if (!is_kernel_core_data((unsigned long)ops))
ops->flags |= FTRACE_OPS_FL_DYNAMIC;
 
add_ftrace_ops(_ops_list, ops);
diff --git a/net/sysctl_net.c b/net/sysctl_net.c
index f6cb0d4d114c..4b45ed631eb8 100644
--- a/net/sysctl_net.c
+++ b/net/sysctl_net.c
@@ -144,7 +144,7 @@ static void ensure_safe_net_sysctl(struct net *net, const 
char *path,
addr = (unsigned long)ent->data;
if (is_module_address(addr))
where = "module";
-   else if (core_kernel_data(addr))
+   else if (is_kernel_core_data(addr))
where = "kernel";
else
continue;
-- 
2.26.2



[PATCH v3 6/9] sections: Provide internal __is_kernel() and __is_kernel_text() helper

2021-09-26 Thread Kefeng Wang
Add an internal __is_kernel() helper which only checks the kernel address range,
and an internal __is_kernel_text() helper which only checks the text section range.

Signed-off-by: Kefeng Wang 
---
 include/asm-generic/sections.h | 29 +
 include/linux/kallsyms.h   |  4 ++--
 2 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/include/asm-generic/sections.h b/include/asm-generic/sections.h
index 811583ca8bd0..a7abeadddc7a 100644
--- a/include/asm-generic/sections.h
+++ b/include/asm-generic/sections.h
@@ -186,4 +186,33 @@ static inline bool is_kernel_inittext(unsigned long addr)
   addr < (unsigned long)_einittext;
 }
 
+/**
+ * __is_kernel_text - checks if the pointer address is located in the
+ *.text section
+ *
+ * @addr: address to check
+ *
+ * Returns: true if the address is located in .text, false otherwise.
+ * Note: an internal helper, only check the range of _stext to _etext.
+ */
+static inline bool __is_kernel_text(unsigned long addr)
+{
+   return addr >= (unsigned long)_stext &&
+  addr < (unsigned long)_etext;
+}
+
+/**
+ * __is_kernel - checks if the pointer address is located in the kernel range
+ *
+ * @addr: address to check
+ *
+ * Returns: true if the address is located in the kernel range, false 
otherwise.
+ * Note: an internal helper, only check the range of _stext to _end.
+ */
+static inline bool __is_kernel(unsigned long addr)
+{
+   return addr >= (unsigned long)_stext &&
+  addr < (unsigned long)_end;
+}
+
 #endif /* _ASM_GENERIC_SECTIONS_H_ */
diff --git a/include/linux/kallsyms.h b/include/linux/kallsyms.h
index 8a9d329c927c..5fb17dd4b6fa 100644
--- a/include/linux/kallsyms.h
+++ b/include/linux/kallsyms.h
@@ -26,14 +26,14 @@ struct module;
 
 static inline int is_kernel_text(unsigned long addr)
 {
-   if ((addr >= (unsigned long)_stext && addr < (unsigned long)_etext))
+   if (__is_kernel_text(addr))
return 1;
return in_gate_area_no_mm(addr);
 }
 
 static inline int is_kernel(unsigned long addr)
 {
-   if (addr >= (unsigned long)_stext && addr < (unsigned long)_end)
+   if (__is_kernel(addr))
return 1;
return in_gate_area_no_mm(addr);
 }
-- 
2.26.2



[PATCH v3 9/9] powerpc/mm: Use is_kernel_text() and is_kernel_inittext() helper

2021-09-26 Thread Kefeng Wang
Use the is_kernel_text() and is_kernel_inittext() helpers to simplify the code,
and drop the etext, _stext, _sinittext and _einittext declarations, which are
already declared in sections.h.

Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Kefeng Wang 
---
 arch/powerpc/mm/pgtable_32.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index dcf5ecca19d9..13c798308c2e 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -33,8 +33,6 @@
 
 #include 
 
-extern char etext[], _stext[], _sinittext[], _einittext[];
-
 static u8 early_fixmap_pagetable[FIXMAP_PTE_SIZE] __page_aligned_data;
 
 notrace void __init early_ioremap_init(void)
@@ -104,14 +102,13 @@ static void __init __mapin_ram_chunk(unsigned long 
offset, unsigned long top)
 {
unsigned long v, s;
phys_addr_t p;
-   int ktext;
+   bool ktext;
 
s = offset;
v = PAGE_OFFSET + s;
p = memstart_addr + s;
for (; s < top; s += PAGE_SIZE) {
-   ktext = ((char *)v >= _stext && (char *)v < etext) ||
-   ((char *)v >= _sinittext && (char *)v < _einittext);
+   ktext = (is_kernel_text(v) || is_kernel_inittext(v));
map_kernel_page(v, p, ktext ? PAGE_KERNEL_TEXT : PAGE_KERNEL);
v += PAGE_SIZE;
p += PAGE_SIZE;
-- 
2.26.2



[PATCH v3 8/9] extable: Use is_kernel_text() helper

2021-09-26 Thread Kefeng Wang
core_kernel_text() should also check the gate area, as it is part
of the kernel text range.

Cc: Steven Rostedt 
Signed-off-by: Kefeng Wang 
---
 kernel/extable.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/extable.c b/kernel/extable.c
index 98ca627ac5ef..0ba383d850ff 100644
--- a/kernel/extable.c
+++ b/kernel/extable.c
@@ -64,8 +64,7 @@ const struct exception_table_entry 
*search_exception_tables(unsigned long addr)
 
 int notrace core_kernel_text(unsigned long addr)
 {
-   if (addr >= (unsigned long)_stext &&
-   addr < (unsigned long)_etext)
+   if (is_kernel_text(addr))
return 1;
 
if (system_state < SYSTEM_RUNNING &&
-- 
2.26.2



[PATCH v3 7/9] mm: kasan: Use is_kernel() helper

2021-09-26 Thread Kefeng Wang
Directly use is_kernel() helper in kernel_or_module_addr().

Cc: Andrey Ryabinin 
Cc: Alexander Potapenko 
Cc: Andrey Konovalov 
Cc: Dmitry Vyukov 
Signed-off-by: Kefeng Wang 
---
 mm/kasan/report.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/kasan/report.c b/mm/kasan/report.c
index 3239fd8f8747..1c955e1c98d5 100644
--- a/mm/kasan/report.c
+++ b/mm/kasan/report.c
@@ -226,7 +226,7 @@ static void describe_object(struct kmem_cache *cache, void 
*object,
 
 static inline bool kernel_or_module_addr(const void *addr)
 {
-   if (addr >= (void *)_stext && addr < (void *)_end)
+   if (is_kernel((unsigned long)addr))
return true;
if (is_module_address((unsigned long)addr))
return true;
-- 
2.26.2



[PATCH v3 0/9] sections: Unify kernel sections range check and use

2021-09-26 Thread Kefeng Wang
There are three header files (kallsyms.h, kernel.h and sections.h) which
include kernel section range checks; let's do some cleanup and unify them.

1. Clean up the arch-specific text/data checks and fix the address boundary
   checks in kallsyms.h
2. Move all the basic/core kernel range check functions into sections.h
3. Update all the callers, and use the helpers in sections.h to simplify
   the code

After this series, we have 5 APIs for kernel section range checks in
sections.h:

 * is_kernel_rodata()   --- already in sections.h
 * is_kernel_core_data()--- come from core_kernel_data() in kernel.h
 * is_kernel_inittext() --- come from kernel.h and kallsyms.h
 * __is_kernel_text()   --- add new internal helper
 * __is_kernel()--- add new internal helper

Note: The last two helpers should not be used directly; consider using the
  corresponding functions in kallsyms.h instead.

v3:
- Add Steven's RB to patch2
- Introduce two internal helpers, then use is_kernel_text() in core_kernel_text()
  and is_kernel() in kernel_or_module_addr(), as suggested by Steven

v2:
https://lore.kernel.org/linux-arch/20210728081320.20394-1-wangkefeng.w...@huawei.com/
- add ACK/RW to patch2, and drop inappropriate fix tag
- keep 'core' to check kernel data, as suggested by Steven Rostedt,
  rename is_kernel_data() to is_kernel_core_data()
- drop patch8 which is merged
- drop patch9 which is resend independently

v1:
https://lore.kernel.org/linux-arch/20210626073439.150586-1-wangkefeng.w...@huawei.com

Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-a...@vger.kernel.org 
Cc: b...@vger.kernel.org 

Kefeng Wang (9):
  kallsyms: Remove arch specific text and data check
  kallsyms: Fix address-checks for kernel related range
  sections: Move and rename core_kernel_data() to is_kernel_core_data()
  sections: Move is_kernel_inittext() into sections.h
  x86: mm: Rename __is_kernel_text() to is_x86_32_kernel_text()
  sections: Provide internal __is_kernel() and __is_kernel_text() helper
  mm: kasan: Use is_kernel() helper
  extable: Use is_kernel_text() helper
  powerpc/mm: Use is_kernel_text() and is_kernel_inittext() helper

 arch/powerpc/mm/pgtable_32.c   |  7 +---
 arch/x86/kernel/unwind_orc.c   |  2 +-
 arch/x86/mm/init_32.c  | 14 +++
 include/asm-generic/sections.h | 75 ++
 include/linux/kallsyms.h   | 13 +-
 include/linux/kernel.h |  2 -
 kernel/extable.c   | 33 ++-
 kernel/locking/lockdep.c   |  3 --
 kernel/trace/ftrace.c  |  2 +-
 mm/kasan/report.c  |  2 +-
 net/sysctl_net.c   |  2 +-
 11 files changed, 75 insertions(+), 80 deletions(-)

-- 
2.26.2



[PATCH v3 2/9] kallsyms: Fix address-checks for kernel related range

2021-09-26 Thread Kefeng Wang
The is_kernel_inittext()/is_kernel_text()/is_kernel() functions should not
include the end address (the labels _einittext, _etext and _end) when
checking the address range; the issue has existed since Linux v2.6.12.
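
A minimal illustration of the boundary problem (hypothetical helpers, not part
of the patch); the section labels mark half-open ranges, so _etext is the first
byte *after* .text:

#include <linux/types.h>
#include <asm/sections.h>

/* Old, inclusive check: wrongly accepts the _etext label address itself. */
static bool old_text_check(unsigned long addr)
{
	return addr >= (unsigned long)_stext && addr <= (unsigned long)_etext;
}

/* Fixed, exclusive check: _etext is no longer treated as kernel text. */
static bool new_text_check(unsigned long addr)
{
	return addr >= (unsigned long)_stext && addr < (unsigned long)_etext;
}

/*
 * old_text_check((unsigned long)_etext) == true   (wrong)
 * new_text_check((unsigned long)_etext) == false  (correct)
 */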

Cc: Arnd Bergmann 
Cc: Sergey Senozhatsky 
Cc: Petr Mladek 
Reviewed-by: Petr Mladek 
Reviewed-by: Steven Rostedt (VMware) 
Acked-by: Sergey Senozhatsky 
Signed-off-by: Kefeng Wang 
---
 include/linux/kallsyms.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/linux/kallsyms.h b/include/linux/kallsyms.h
index 2a241e3f063f..b016c62f30a6 100644
--- a/include/linux/kallsyms.h
+++ b/include/linux/kallsyms.h
@@ -27,21 +27,21 @@ struct module;
 static inline int is_kernel_inittext(unsigned long addr)
 {
if (addr >= (unsigned long)_sinittext
-   && addr <= (unsigned long)_einittext)
+   && addr < (unsigned long)_einittext)
return 1;
return 0;
 }
 
 static inline int is_kernel_text(unsigned long addr)
 {
-   if ((addr >= (unsigned long)_stext && addr <= (unsigned long)_etext))
+   if ((addr >= (unsigned long)_stext && addr < (unsigned long)_etext))
return 1;
return in_gate_area_no_mm(addr);
 }
 
 static inline int is_kernel(unsigned long addr)
 {
-   if (addr >= (unsigned long)_stext && addr <= (unsigned long)_end)
+   if (addr >= (unsigned long)_stext && addr < (unsigned long)_end)
return 1;
return in_gate_area_no_mm(addr);
 }
-- 
2.26.2



[PATCH v3 1/9] kallsyms: Remove arch specific text and data check

2021-09-26 Thread Kefeng Wang
After commit 4ba66a976072 ("arch: remove blackfin port"),
there is no need for the arch-specific text/data checks.

Cc: Arnd Bergmann 
Signed-off-by: Kefeng Wang 
---
 include/asm-generic/sections.h | 16 
 include/linux/kallsyms.h   |  3 +--
 kernel/locking/lockdep.c   |  3 ---
 3 files changed, 1 insertion(+), 21 deletions(-)

diff --git a/include/asm-generic/sections.h b/include/asm-generic/sections.h
index d16302d3eb59..817309e289db 100644
--- a/include/asm-generic/sections.h
+++ b/include/asm-generic/sections.h
@@ -64,22 +64,6 @@ extern __visible const void __nosave_begin, __nosave_end;
 #define dereference_kernel_function_descriptor(p) ((void *)(p))
 #endif
 
-/* random extra sections (if any).  Override
- * in asm/sections.h */
-#ifndef arch_is_kernel_text
-static inline int arch_is_kernel_text(unsigned long addr)
-{
-   return 0;
-}
-#endif
-
-#ifndef arch_is_kernel_data
-static inline int arch_is_kernel_data(unsigned long addr)
-{
-   return 0;
-}
-#endif
-
 /*
  * Check if an address is part of freed initmem. This is needed on 
architectures
  * with virt == phys kernel mapping, for code that wants to check if an address
diff --git a/include/linux/kallsyms.h b/include/linux/kallsyms.h
index 6851c2313cad..2a241e3f063f 100644
--- a/include/linux/kallsyms.h
+++ b/include/linux/kallsyms.h
@@ -34,8 +34,7 @@ static inline int is_kernel_inittext(unsigned long addr)
 
 static inline int is_kernel_text(unsigned long addr)
 {
-   if ((addr >= (unsigned long)_stext && addr <= (unsigned long)_etext) ||
-   arch_is_kernel_text(addr))
+   if ((addr >= (unsigned long)_stext && addr <= (unsigned long)_etext))
return 1;
return in_gate_area_no_mm(addr);
 }
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 7096384dc60f..dcdbcee391cd 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -803,9 +803,6 @@ static int static_obj(const void *obj)
if ((addr >= start) && (addr < end))
return 1;
 
-   if (arch_is_kernel_data(addr))
-   return 1;
-
/*
 * in-kernel percpu var?
 */
-- 
2.26.2



[PATCH v3 5/9] x86: mm: Rename __is_kernel_text() to is_x86_32_kernel_text()

2021-09-26 Thread Kefeng Wang
Commit b56cd05c55a1 ("x86/mm: Rename is_kernel_text to __is_kernel_text")
added the '__' prefix to avoid a conflict with the existing is_kernel_text()
in <linux/kallsyms.h>.

We will add a generic __is_kernel_text() for the basic kernel text range check
in the next patch, so use the private is_x86_32_kernel_text() naming for the
x86-specific check.

Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: x...@kernel.org
Signed-off-by: Kefeng Wang 
---
 arch/x86/mm/init_32.c | 14 +-
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index bd90b8fe81e4..523743ee9dea 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -238,11 +238,7 @@ page_table_range_init(unsigned long start, unsigned long 
end, pgd_t *pgd_base)
}
 }
 
-/*
- * The  already defines is_kernel_text,
- * using '__' prefix not to get in conflict.
- */
-static inline int __is_kernel_text(unsigned long addr)
+static inline int is_x86_32_kernel_text(unsigned long addr)
 {
if (addr >= (unsigned long)_text && addr <= (unsigned long)__init_end)
return 1;
@@ -333,8 +329,8 @@ kernel_physical_mapping_init(unsigned long start,
addr2 = (pfn + PTRS_PER_PTE-1) * PAGE_SIZE +
PAGE_OFFSET + PAGE_SIZE-1;
 
-   if (__is_kernel_text(addr) ||
-   __is_kernel_text(addr2))
+   if (is_x86_32_kernel_text(addr) ||
+   is_x86_32_kernel_text(addr2))
prot = PAGE_KERNEL_LARGE_EXEC;
 
pages_2m++;
@@ -359,7 +355,7 @@ kernel_physical_mapping_init(unsigned long start,
 */
pgprot_t init_prot = __pgprot(PTE_IDENT_ATTR);
 
-   if (__is_kernel_text(addr))
+   if (is_x86_32_kernel_text(addr))
prot = PAGE_KERNEL_EXEC;
 
pages_4k++;
@@ -820,7 +816,7 @@ static void mark_nxdata_nx(void)
 */
unsigned long start = PFN_ALIGN(_etext);
/*
-* This comes from __is_kernel_text upper limit. Also HPAGE where used:
+* This comes from is_x86_32_kernel_text upper limit. Also HPAGE where 
used:
 */
unsigned long size = (((unsigned long)__init_end + HPAGE_SIZE) & 
HPAGE_MASK) - start;
 
-- 
2.26.2



[PATCH -next] trap: Cleanup trap_init()

2021-08-12 Thread Kefeng Wang
There are some empty trap_init() implementations in different architectures;
introduce a new weak trap_init() function to clean them up.
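
A weak default of the kind introduced here is just an empty stub that
architectures can still override (a minimal sketch of the init/main.c side,
not the exact hunk):

#include <linux/init.h>

/* Sketch: generic weak fallback; arch code may still provide its own. */
void __init __weak trap_init(void)
{
}
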

Cc: Vineet Gupta 
Cc: Russell King 
Cc: Yoshinori Sato 
Cc: Ley Foon Tan 
Cc: Jonas Bonn 
Cc: Stefan Kristiansson 
Cc: Stafford Horne 
Cc: James E.J. Bottomley 
Cc: Helge Deller 
Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Paul Walmsley 
Cc: Jeff Dike 
Cc: Richard Weinberger 
Cc: Anton Ivanov 
Cc: Andrew Morton 
Signed-off-by: Kefeng Wang 
---
 arch/arc/kernel/traps.c  | 5 -
 arch/arm/kernel/traps.c  | 5 -
 arch/h8300/kernel/traps.c| 4 
 arch/hexagon/kernel/traps.c  | 4 
 arch/nds32/kernel/traps.c| 5 -
 arch/nios2/kernel/traps.c| 5 -
 arch/openrisc/kernel/traps.c | 5 -
 arch/parisc/kernel/traps.c   | 4 
 arch/powerpc/kernel/traps.c  | 5 -
 arch/riscv/kernel/traps.c| 5 -
 arch/um/kernel/trap.c| 4 
 init/main.c  | 2 ++
 12 files changed, 2 insertions(+), 51 deletions(-)

diff --git a/arch/arc/kernel/traps.c b/arch/arc/kernel/traps.c
index 57235e5c0cea..6b83e3f2b41c 100644
--- a/arch/arc/kernel/traps.c
+++ b/arch/arc/kernel/traps.c
@@ -20,11 +20,6 @@
 #include 
 #include 
 
-void __init trap_init(void)
-{
-   return;
-}
-
 void die(const char *str, struct pt_regs *regs, unsigned long address)
 {
show_kernel_fault_diag(str, regs, address);
diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
index 64308e3a5d0c..e9b4f2b49bd8 100644
--- a/arch/arm/kernel/traps.c
+++ b/arch/arm/kernel/traps.c
@@ -781,11 +781,6 @@ void abort(void)
panic("Oops failed to kill thread");
 }
 
-void __init trap_init(void)
-{
-   return;
-}
-
 #ifdef CONFIG_KUSER_HELPERS
 static void __init kuser_init(void *vectors)
 {
diff --git a/arch/h8300/kernel/traps.c b/arch/h8300/kernel/traps.c
index 5d8b969cd8f3..bdbe988d8dbc 100644
--- a/arch/h8300/kernel/traps.c
+++ b/arch/h8300/kernel/traps.c
@@ -39,10 +39,6 @@ void __init base_trap_init(void)
 {
 }
 
-void __init trap_init(void)
-{
-}
-
 asmlinkage void set_esp0(unsigned long ssp)
 {
current->thread.esp0 = ssp;
diff --git a/arch/hexagon/kernel/traps.c b/arch/hexagon/kernel/traps.c
index 904134b37232..edfc35dafeb1 100644
--- a/arch/hexagon/kernel/traps.c
+++ b/arch/hexagon/kernel/traps.c
@@ -28,10 +28,6 @@
 #define TRAP_SYSCALL   1
 #define TRAP_DEBUG 0xdb
 
-void __init trap_init(void)
-{
-}
-
 #ifdef CONFIG_GENERIC_BUG
 /* Maybe should resemble arch/sh/kernel/traps.c ?? */
 int is_valid_bugaddr(unsigned long addr)
diff --git a/arch/nds32/kernel/traps.c b/arch/nds32/kernel/traps.c
index ee0d9ae192a5..f06421c645af 100644
--- a/arch/nds32/kernel/traps.c
+++ b/arch/nds32/kernel/traps.c
@@ -183,11 +183,6 @@ void __pgd_error(const char *file, int line, unsigned long 
val)
 }
 
 extern char *exception_vector, *exception_vector_end;
-void __init trap_init(void)
-{
-   return;
-}
-
 void __init early_trap_init(void)
 {
unsigned long ivb = 0;
diff --git a/arch/nios2/kernel/traps.c b/arch/nios2/kernel/traps.c
index b172da4eb1a9..596986a74a26 100644
--- a/arch/nios2/kernel/traps.c
+++ b/arch/nios2/kernel/traps.c
@@ -105,11 +105,6 @@ void show_stack(struct task_struct *task, unsigned long 
*stack,
printk("%s\n", loglvl);
 }
 
-void __init trap_init(void)
-{
-   /* Nothing to do here */
-}
-
 /* Breakpoint handler */
 asmlinkage void breakpoint_c(struct pt_regs *fp)
 {
diff --git a/arch/openrisc/kernel/traps.c b/arch/openrisc/kernel/traps.c
index 4d61333c2623..aa1e709405ac 100644
--- a/arch/openrisc/kernel/traps.c
+++ b/arch/openrisc/kernel/traps.c
@@ -231,11 +231,6 @@ void unhandled_exception(struct pt_regs *regs, int ea, int 
vector)
die("Oops", regs, 9);
 }
 
-void __init trap_init(void)
-{
-   /* Nothing needs to be done */
-}
-
 asmlinkage void do_trap(struct pt_regs *regs, unsigned long address)
 {
force_sig_fault(SIGTRAP, TRAP_BRKPT, (void __user *)regs->pc);
diff --git a/arch/parisc/kernel/traps.c b/arch/parisc/kernel/traps.c
index 8d8441d4562a..747c328fb886 100644
--- a/arch/parisc/kernel/traps.c
+++ b/arch/parisc/kernel/traps.c
@@ -859,7 +859,3 @@ void  __init early_trap_init(void)
 
initialize_ivt(_vector_20);
 }
-
-void __init trap_init(void)
-{
-}
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index e103b89234cd..91efb5c6f2f3 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -2209,11 +2209,6 @@ DEFINE_INTERRUPT_HANDLER(kernel_bad_stack)
die("Bad kernel stack pointer", regs, SIGABRT);
 }
 
-void __init trap_init(void)
-{
-}
-
-
 #ifdef CONFIG_PPC_EMULATED_STATS
 
 #define WARN_EMULATED_SETUP(type)  .type = { .name = #type }
diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
index 0a98fd0ddfe9..0daaa3e4630d 100644
--- a/arch/riscv/kernel/traps.c
+++ b/arch/riscv/kernel/traps.c
@@ -199,11 +199,6 @@ int is_valid_bugaddr(unsigned long pc)
 

Re: [PATCH v2 5/7] kallsyms: Rename is_kernel() and is_kernel_text()

2021-07-29 Thread Kefeng Wang



On 2021/7/29 12:05, Steven Rostedt wrote:

On Thu, 29 Jul 2021 10:00:51 +0800
Kefeng Wang  wrote:


On 2021/7/28 23:28, Steven Rostedt wrote:

On Wed, 28 Jul 2021 16:13:18 +0800
Kefeng Wang  wrote:
  

The is_kernel[_text]() function check the address whether or not
in kernel[_text] ranges, also they will check the address whether
or not in gate area, so use better name.

Do you know what a gate area is?

Because I believe gate area is kernel text, so the rename just makes it
redundant and more confusing.

Yes, the gate area (e.g., the vectors part on ARM32, similar on x86/ia64) is
kernel text.

I want to keep the 'basic' section boundary check, which only checks the
start/end of sections, all in sections.h; could we use 'generic', 'basic' or
'core' in the naming?

   * is_kernel_generic_data()   --- come from core_kernel_data() in kernel.h
   * is_kernel_generic_text()

The old helpers could remain unchanged; any suggestions? Thanks.

Because it looks like the check of just being in the range of "_stext"
to "_end" is just an internal helper, why not do what we do all over
the kernel, and just prefix the function with a couple of underscores,
that denote that it's internal?

   __is_kernel_text()


OK, thanks for your advice. There's already a __is_kernel_text() in
arch/x86/mm/init_32.c, so I will change it to is_x32_kernel_text() to avoid
a conflict on x86_32.



Then you have:

  static inline int is_kernel_text(unsigned long addr)
  {
if (__is_kernel_text(addr))
return 1;
return in_gate_area_no_mm(addr);
  }

-- Steve
.



Re: [PATCH v2 2/7] kallsyms: Fix address-checks for kernel related range

2021-07-28 Thread Kefeng Wang



On 2021/7/28 22:46, Steven Rostedt wrote:

On Wed, 28 Jul 2021 16:13:15 +0800
Kefeng Wang  wrote:


The is_kernel_inittext/is_kernel_text/is_kernel function should not
include the end address(the labels _einittext, _etext and _end) when
check the address range, the issue exists since Linux v2.6.12.

Cc: Arnd Bergmann 
Cc: Sergey Senozhatsky 
Cc: Petr Mladek 
Acked-by: Sergey Senozhatsky 
Reviewed-by: Petr Mladek 
Signed-off-by: Kefeng Wang 

Reviewed-by: Steven Rostedt (VMware) 


Thanks.



-- Steve



Re: [PATCH v2 6/7] sections: Add new is_kernel() and is_kernel_text()

2021-07-28 Thread Kefeng Wang



On 2021/7/28 23:32, Steven Rostedt wrote:

On Wed, 28 Jul 2021 16:13:19 +0800
Kefeng Wang  wrote:


@@ -64,8 +64,7 @@ const struct exception_table_entry 
*search_exception_tables(unsigned long addr)
  
  int notrace core_kernel_text(unsigned long addr)

  {
-   if (addr >= (unsigned long)_stext &&
-   addr < (unsigned long)_etext)
+   if (is_kernel_text(addr))

Perhaps this was a bug, and these functions should be checking the gate
area as well, as that is part of kernel text.

OK, I will fix this if patch 5 is reviewed well.


-- Steve



return 1;
  
  	if (system_state < SYSTEM_RUNNING &&

diff --git a/mm/kasan/report.c b/mm/kasan/report.c
index 884a950c7026..88f5b0c058b7 100644
--- a/mm/kasan/report.c
+++ b/mm/kasan/report.c
@@ -235,7 +235,7 @@ static void describe_object(struct kmem_cache *cache, void 
*object,
  
  static inline bool kernel_or_module_addr(const void *addr)

  {
-   if (addr >= (void *)_stext && addr < (void *)_end)
+   if (is_kernel((unsigned long)addr))
return true;
if (is_module_address((unsigned long)addr))
return true;
--

.



Re: [PATCH v2 5/7] kallsyms: Rename is_kernel() and is_kernel_text()

2021-07-28 Thread Kefeng Wang



On 2021/7/28 23:28, Steven Rostedt wrote:

On Wed, 28 Jul 2021 16:13:18 +0800
Kefeng Wang  wrote:


The is_kernel[_text]() function check the address whether or not
in kernel[_text] ranges, also they will check the address whether
or not in gate area, so use better name.

Do you know what a gate area is?

Because I believe gate area is kernel text, so the rename just makes it
redundant and more confusing.


Yes, the gate area (e.g., the vectors part on ARM32, similar on x86/ia64) is
kernel text.

I want to keep the 'basic' section boundary check, which only checks the
start/end of sections, all in sections.h; could we use 'generic', 'basic' or
'core' in the naming?


 * is_kernel_generic_data() --- come from core_kernel_data() in kernel.h
 * is_kernel_generic_text()

The old helpers could remain unchanged; any suggestions? Thanks.



-- Steve
.



[PATCH v2 6/7] sections: Add new is_kernel() and is_kernel_text()

2021-07-28 Thread Kefeng Wang
The new is_kernel() checks the kernel address range, and the
new is_kernel_text() checks the kernel text section range.

Then use them to make some code clearer.

Cc: Arnd Bergmann 
Cc: Andrey Ryabinin 
Signed-off-by: Kefeng Wang 
---
 include/asm-generic/sections.h | 27 +++
 include/linux/kallsyms.h   |  4 ++--
 kernel/extable.c   |  3 +--
 mm/kasan/report.c  |  2 +-
 4 files changed, 31 insertions(+), 5 deletions(-)

diff --git a/include/asm-generic/sections.h b/include/asm-generic/sections.h
index 4f2f32aa2b7a..6b143637ab88 100644
--- a/include/asm-generic/sections.h
+++ b/include/asm-generic/sections.h
@@ -170,6 +170,20 @@ static inline bool is_kernel_rodata(unsigned long addr)
   addr < (unsigned long)__end_rodata;
 }
 
+/**
+ * is_kernel_text - checks if the pointer address is located in the
+ *  .text section
+ *
+ * @addr: address to check
+ *
+ * Returns: true if the address is located in .text, false otherwise.
+ */
+static inline bool is_kernel_text(unsigned long addr)
+{
+   return addr >= (unsigned long)_stext &&
+  addr < (unsigned long)_etext;
+}
+
 /**
  * is_kernel_inittext - checks if the pointer address is located in the
  *.init.text section
@@ -184,4 +198,17 @@ static inline bool is_kernel_inittext(unsigned long addr)
   addr < (unsigned long)_einittext;
 }
 
+/**
+ * is_kernel - checks if the pointer address is located in the kernel range
+ *
+ * @addr: address to check
+ *
+ * Returns: true if the address is located in kernel range, false otherwise.
+ */
+static inline bool is_kernel(unsigned long addr)
+{
+   return addr >= (unsigned long)_stext &&
+  addr < (unsigned long)_end;
+}
+
 #endif /* _ASM_GENERIC_SECTIONS_H_ */
diff --git a/include/linux/kallsyms.h b/include/linux/kallsyms.h
index 4f501ac9c2c2..897d5720884f 100644
--- a/include/linux/kallsyms.h
+++ b/include/linux/kallsyms.h
@@ -26,14 +26,14 @@ struct module;
 
 static inline int is_kernel_text_or_gate_area(unsigned long addr)
 {
-   if ((addr >= (unsigned long)_stext && addr < (unsigned long)_etext))
+   if (is_kernel_text(addr))
return 1;
return in_gate_area_no_mm(addr);
 }
 
 static inline int is_kernel_or_gate_area(unsigned long addr)
 {
-   if (addr >= (unsigned long)_stext && addr < (unsigned long)_end)
+   if (is_kernel(addr))
return 1;
return in_gate_area_no_mm(addr);
 }
diff --git a/kernel/extable.c b/kernel/extable.c
index 98ca627ac5ef..0ba383d850ff 100644
--- a/kernel/extable.c
+++ b/kernel/extable.c
@@ -64,8 +64,7 @@ const struct exception_table_entry 
*search_exception_tables(unsigned long addr)
 
 int notrace core_kernel_text(unsigned long addr)
 {
-   if (addr >= (unsigned long)_stext &&
-   addr < (unsigned long)_etext)
+   if (is_kernel_text(addr))
return 1;
 
if (system_state < SYSTEM_RUNNING &&
diff --git a/mm/kasan/report.c b/mm/kasan/report.c
index 884a950c7026..88f5b0c058b7 100644
--- a/mm/kasan/report.c
+++ b/mm/kasan/report.c
@@ -235,7 +235,7 @@ static void describe_object(struct kmem_cache *cache, void 
*object,
 
 static inline bool kernel_or_module_addr(const void *addr)
 {
-   if (addr >= (void *)_stext && addr < (void *)_end)
+   if (is_kernel((unsigned long)addr))
return true;
if (is_module_address((unsigned long)addr))
return true;
-- 
2.26.2



[PATCH v2 3/7] sections: Move and rename core_kernel_data() to is_kernel_core_data()

2021-07-28 Thread Kefeng Wang
Move core_kernel_data() into sections.h and rename it to
is_kernel_core_data(); also make it return a bool value, then
update all the callers.

Cc: Arnd Bergmann 
Cc: Steven Rostedt 
Cc: Ingo Molnar 
Cc: "David S. Miller" 
Signed-off-by: Kefeng Wang 
---
 include/asm-generic/sections.h | 14 ++
 include/linux/kernel.h |  1 -
 kernel/extable.c   | 18 --
 kernel/trace/ftrace.c  |  2 +-
 net/sysctl_net.c   |  2 +-
 5 files changed, 16 insertions(+), 21 deletions(-)

diff --git a/include/asm-generic/sections.h b/include/asm-generic/sections.h
index 817309e289db..26ed9fc9b4e3 100644
--- a/include/asm-generic/sections.h
+++ b/include/asm-generic/sections.h
@@ -142,6 +142,20 @@ static inline bool init_section_intersects(void *virt, 
size_t size)
return memory_intersects(__init_begin, __init_end, virt, size);
 }
 
+/**
+ * is_kernel_core_data - checks if the pointer address is located in the
+ *  .data section
+ *
+ * @addr: address to check
+ *
+ * Returns: true if the address is located in .data, false otherwise.
+ */
+static inline bool is_kernel_core_data(unsigned long addr)
+{
+   return addr >= (unsigned long)_sdata &&
+  addr < (unsigned long)_edata;
+}
+
 /**
  * is_kernel_rodata - checks if the pointer address is located in the
  *.rodata section
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 1b2f0a7e00d6..0622418bafbc 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -230,7 +230,6 @@ extern char *next_arg(char *args, char **param, char **val);
 
 extern int core_kernel_text(unsigned long addr);
 extern int init_kernel_text(unsigned long addr);
-extern int core_kernel_data(unsigned long addr);
 extern int __kernel_text_address(unsigned long addr);
 extern int kernel_text_address(unsigned long addr);
 extern int func_ptr_is_kernel_text(void *ptr);
diff --git a/kernel/extable.c b/kernel/extable.c
index b0ea5eb0c3b4..da26203841d4 100644
--- a/kernel/extable.c
+++ b/kernel/extable.c
@@ -82,24 +82,6 @@ int notrace core_kernel_text(unsigned long addr)
return 0;
 }
 
-/**
- * core_kernel_data - tell if addr points to kernel data
- * @addr: address to test
- *
- * Returns true if @addr passed in is from the core kernel data
- * section.
- *
- * Note: On some archs it may return true for core RODATA, and false
- *  for others. But will always be true for core RW data.
- */
-int core_kernel_data(unsigned long addr)
-{
-   if (addr >= (unsigned long)_sdata &&
-   addr < (unsigned long)_edata)
-   return 1;
-   return 0;
-}
-
 int __kernel_text_address(unsigned long addr)
 {
if (kernel_text_address(addr))
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index e6fb3e6e1ffc..d01ca1cb2d5f 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -323,7 +323,7 @@ int __register_ftrace_function(struct ftrace_ops *ops)
if (!ftrace_enabled && (ops->flags & FTRACE_OPS_FL_PERMANENT))
return -EBUSY;
 
-   if (!core_kernel_data((unsigned long)ops))
+   if (!is_kernel_core_data((unsigned long)ops))
ops->flags |= FTRACE_OPS_FL_DYNAMIC;
 
add_ftrace_ops(_ops_list, ops);
diff --git a/net/sysctl_net.c b/net/sysctl_net.c
index f6cb0d4d114c..4b45ed631eb8 100644
--- a/net/sysctl_net.c
+++ b/net/sysctl_net.c
@@ -144,7 +144,7 @@ static void ensure_safe_net_sysctl(struct net *net, const 
char *path,
addr = (unsigned long)ent->data;
if (is_module_address(addr))
where = "module";
-   else if (core_kernel_data(addr))
+   else if (is_kernel_core_data(addr))
where = "kernel";
else
continue;
-- 
2.26.2



[PATCH v2 7/7] powerpc/mm: Use is_kernel_text() and is_kernel_inittext() helper

2021-07-28 Thread Kefeng Wang
Use the is_kernel_text() and is_kernel_inittext() helpers to simplify the code,
and drop the etext, _stext, _sinittext and _einittext declarations, which are
already declared in sections.h.

Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Kefeng Wang 
---
 arch/powerpc/mm/pgtable_32.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index dcf5ecca19d9..13c798308c2e 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -33,8 +33,6 @@
 
 #include 
 
-extern char etext[], _stext[], _sinittext[], _einittext[];
-
 static u8 early_fixmap_pagetable[FIXMAP_PTE_SIZE] __page_aligned_data;
 
 notrace void __init early_ioremap_init(void)
@@ -104,14 +102,13 @@ static void __init __mapin_ram_chunk(unsigned long 
offset, unsigned long top)
 {
unsigned long v, s;
phys_addr_t p;
-   int ktext;
+   bool ktext;
 
s = offset;
v = PAGE_OFFSET + s;
p = memstart_addr + s;
for (; s < top; s += PAGE_SIZE) {
-   ktext = ((char *)v >= _stext && (char *)v < etext) ||
-   ((char *)v >= _sinittext && (char *)v < _einittext);
+   ktext = (is_kernel_text(v) || is_kernel_inittext(v));
map_kernel_page(v, p, ktext ? PAGE_KERNEL_TEXT : PAGE_KERNEL);
v += PAGE_SIZE;
p += PAGE_SIZE;
-- 
2.26.2



[PATCH v2 4/7] sections: Move is_kernel_inittext() into sections.h

2021-07-28 Thread Kefeng Wang
The is_kernel_inittext() and init_kernel_text() helpers have the same
functionality, so let's just keep is_kernel_inittext() and move
it into sections.h, then update all the callers.

Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Arnd Bergmann 
Cc: x...@kernel.org
Signed-off-by: Kefeng Wang 
---
 arch/x86/kernel/unwind_orc.c   |  2 +-
 include/asm-generic/sections.h | 14 ++
 include/linux/kallsyms.h   |  8 
 include/linux/kernel.h |  1 -
 kernel/extable.c   | 12 ++--
 5 files changed, 17 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
index a1202536fc57..d92ec2ced059 100644
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -175,7 +175,7 @@ static struct orc_entry *orc_find(unsigned long ip)
}
 
/* vmlinux .init slow lookup: */
-   if (init_kernel_text(ip))
+   if (is_kernel_inittext(ip))
return __orc_find(__start_orc_unwind_ip, __start_orc_unwind,
  __stop_orc_unwind_ip - __start_orc_unwind_ip, 
ip);
 
diff --git a/include/asm-generic/sections.h b/include/asm-generic/sections.h
index 26ed9fc9b4e3..4f2f32aa2b7a 100644
--- a/include/asm-generic/sections.h
+++ b/include/asm-generic/sections.h
@@ -170,4 +170,18 @@ static inline bool is_kernel_rodata(unsigned long addr)
   addr < (unsigned long)__end_rodata;
 }
 
+/**
+ * is_kernel_inittext - checks if the pointer address is located in the
+ *.init.text section
+ *
+ * @addr: address to check
+ *
+ * Returns: true if the address is located in .init.text, false otherwise.
+ */
+static inline bool is_kernel_inittext(unsigned long addr)
+{
+   return addr >= (unsigned long)_sinittext &&
+  addr < (unsigned long)_einittext;
+}
+
 #endif /* _ASM_GENERIC_SECTIONS_H_ */
diff --git a/include/linux/kallsyms.h b/include/linux/kallsyms.h
index b016c62f30a6..8a9d329c927c 100644
--- a/include/linux/kallsyms.h
+++ b/include/linux/kallsyms.h
@@ -24,14 +24,6 @@
 struct cred;
 struct module;
 
-static inline int is_kernel_inittext(unsigned long addr)
-{
-   if (addr >= (unsigned long)_sinittext
-   && addr < (unsigned long)_einittext)
-   return 1;
-   return 0;
-}
-
 static inline int is_kernel_text(unsigned long addr)
 {
if ((addr >= (unsigned long)_stext && addr < (unsigned long)_etext))
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 0622418bafbc..d4ba46cf4737 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -229,7 +229,6 @@ extern bool parse_option_str(const char *str, const char 
*option);
 extern char *next_arg(char *args, char **param, char **val);
 
 extern int core_kernel_text(unsigned long addr);
-extern int init_kernel_text(unsigned long addr);
 extern int __kernel_text_address(unsigned long addr);
 extern int kernel_text_address(unsigned long addr);
 extern int func_ptr_is_kernel_text(void *ptr);
diff --git a/kernel/extable.c b/kernel/extable.c
index da26203841d4..98ca627ac5ef 100644
--- a/kernel/extable.c
+++ b/kernel/extable.c
@@ -62,14 +62,6 @@ const struct exception_table_entry 
*search_exception_tables(unsigned long addr)
return e;
 }
 
-int init_kernel_text(unsigned long addr)
-{
-   if (addr >= (unsigned long)_sinittext &&
-   addr < (unsigned long)_einittext)
-   return 1;
-   return 0;
-}
-
 int notrace core_kernel_text(unsigned long addr)
 {
if (addr >= (unsigned long)_stext &&
@@ -77,7 +69,7 @@ int notrace core_kernel_text(unsigned long addr)
return 1;
 
if (system_state < SYSTEM_RUNNING &&
-   init_kernel_text(addr))
+   is_kernel_inittext(addr))
return 1;
return 0;
 }
@@ -94,7 +86,7 @@ int __kernel_text_address(unsigned long addr)
 * Since we are after the module-symbols check, there's
 * no danger of address overlap:
 */
-   if (init_kernel_text(addr))
+   if (is_kernel_inittext(addr))
return 1;
return 0;
 }
-- 
2.26.2



[PATCH v2 5/7] kallsyms: Rename is_kernel() and is_kernel_text()

2021-07-28 Thread Kefeng Wang
The is_kernel[_text]() functions check whether the address is in the
kernel[_text] ranges; they also check whether the address is in the
gate area, so use a better name.

Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
Cc: Sami Tolvanen 
Cc: Nathan Chancellor 
Cc: Arnd Bergmann 
Cc: b...@vger.kernel.org
Signed-off-by: Kefeng Wang 
---
 arch/x86/net/bpf_jit_comp.c | 2 +-
 include/linux/kallsyms.h| 8 
 kernel/cfi.c| 2 +-
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 333650b9372a..c87d0dd4370d 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -372,7 +372,7 @@ static int __bpf_arch_text_poke(void *ip, enum 
bpf_text_poke_type t,
 int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t,
   void *old_addr, void *new_addr)
 {
-   if (!is_kernel_text((long)ip) &&
+   if (!is_kernel_text_or_gate_area((long)ip) &&
!is_bpf_text_address((long)ip))
/* BPF poking in modules is not supported */
return -EINVAL;
diff --git a/include/linux/kallsyms.h b/include/linux/kallsyms.h
index 8a9d329c927c..4f501ac9c2c2 100644
--- a/include/linux/kallsyms.h
+++ b/include/linux/kallsyms.h
@@ -24,14 +24,14 @@
 struct cred;
 struct module;
 
-static inline int is_kernel_text(unsigned long addr)
+static inline int is_kernel_text_or_gate_area(unsigned long addr)
 {
if ((addr >= (unsigned long)_stext && addr < (unsigned long)_etext))
return 1;
return in_gate_area_no_mm(addr);
 }
 
-static inline int is_kernel(unsigned long addr)
+static inline int is_kernel_or_gate_area(unsigned long addr)
 {
if (addr >= (unsigned long)_stext && addr < (unsigned long)_end)
return 1;
@@ -41,9 +41,9 @@ static inline int is_kernel(unsigned long addr)
 static inline int is_ksym_addr(unsigned long addr)
 {
if (IS_ENABLED(CONFIG_KALLSYMS_ALL))
-   return is_kernel(addr);
+   return is_kernel_or_gate_area(addr);
 
-   return is_kernel_text(addr) || is_kernel_inittext(addr);
+   return is_kernel_text_or_gate_area(addr) || is_kernel_inittext(addr);
 }
 
 static inline void *dereference_symbol_descriptor(void *ptr)
diff --git a/kernel/cfi.c b/kernel/cfi.c
index e17a56639766..e7d90eff4382 100644
--- a/kernel/cfi.c
+++ b/kernel/cfi.c
@@ -282,7 +282,7 @@ static inline cfi_check_fn find_check_fn(unsigned long ptr)
 {
cfi_check_fn fn = NULL;
 
-   if (is_kernel_text(ptr))
+   if (is_kernel_text_or_gate_area(ptr))
return __cfi_check;
 
/*
-- 
2.26.2



[PATCH v2 2/7] kallsyms: Fix address-checks for kernel related range

2021-07-28 Thread Kefeng Wang
The is_kernel_inittext()/is_kernel_text()/is_kernel() functions should not
include the end address (the labels _einittext, _etext and _end) when
checking the address range; the issue has existed since Linux v2.6.12.

Cc: Arnd Bergmann 
Cc: Sergey Senozhatsky 
Cc: Petr Mladek 
Acked-by: Sergey Senozhatsky 
Reviewed-by: Petr Mladek 
Signed-off-by: Kefeng Wang 
---
 include/linux/kallsyms.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/linux/kallsyms.h b/include/linux/kallsyms.h
index 2a241e3f063f..b016c62f30a6 100644
--- a/include/linux/kallsyms.h
+++ b/include/linux/kallsyms.h
@@ -27,21 +27,21 @@ struct module;
 static inline int is_kernel_inittext(unsigned long addr)
 {
if (addr >= (unsigned long)_sinittext
-   && addr <= (unsigned long)_einittext)
+   && addr < (unsigned long)_einittext)
return 1;
return 0;
 }
 
 static inline int is_kernel_text(unsigned long addr)
 {
-   if ((addr >= (unsigned long)_stext && addr <= (unsigned long)_etext))
+   if ((addr >= (unsigned long)_stext && addr < (unsigned long)_etext))
return 1;
return in_gate_area_no_mm(addr);
 }
 
 static inline int is_kernel(unsigned long addr)
 {
-   if (addr >= (unsigned long)_stext && addr <= (unsigned long)_end)
+   if (addr >= (unsigned long)_stext && addr < (unsigned long)_end)
return 1;
return in_gate_area_no_mm(addr);
 }
-- 
2.26.2



[PATCH v2 1/7] kallsyms: Remove arch specific text and data check

2021-07-28 Thread Kefeng Wang
After commit 4ba66a976072 ("arch: remove blackfin port"), the
arch-specific text/data checks are no longer needed.

Cc: Arnd Bergmann 
Signed-off-by: Kefeng Wang 
---
 include/asm-generic/sections.h | 16 
 include/linux/kallsyms.h   |  3 +--
 kernel/locking/lockdep.c   |  3 ---
 3 files changed, 1 insertion(+), 21 deletions(-)

diff --git a/include/asm-generic/sections.h b/include/asm-generic/sections.h
index d16302d3eb59..817309e289db 100644
--- a/include/asm-generic/sections.h
+++ b/include/asm-generic/sections.h
@@ -64,22 +64,6 @@ extern __visible const void __nosave_begin, __nosave_end;
 #define dereference_kernel_function_descriptor(p) ((void *)(p))
 #endif
 
-/* random extra sections (if any).  Override
- * in asm/sections.h */
-#ifndef arch_is_kernel_text
-static inline int arch_is_kernel_text(unsigned long addr)
-{
-   return 0;
-}
-#endif
-
-#ifndef arch_is_kernel_data
-static inline int arch_is_kernel_data(unsigned long addr)
-{
-   return 0;
-}
-#endif
-
 /*
  * Check if an address is part of freed initmem. This is needed on architectures
  * with virt == phys kernel mapping, for code that wants to check if an address
diff --git a/include/linux/kallsyms.h b/include/linux/kallsyms.h
index 6851c2313cad..2a241e3f063f 100644
--- a/include/linux/kallsyms.h
+++ b/include/linux/kallsyms.h
@@ -34,8 +34,7 @@ static inline int is_kernel_inittext(unsigned long addr)
 
 static inline int is_kernel_text(unsigned long addr)
 {
-   if ((addr >= (unsigned long)_stext && addr <= (unsigned long)_etext) ||
-   arch_is_kernel_text(addr))
+   if ((addr >= (unsigned long)_stext && addr <= (unsigned long)_etext))
return 1;
return in_gate_area_no_mm(addr);
 }
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index bf1c00c881e4..64b17e995108 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -803,9 +803,6 @@ static int static_obj(const void *obj)
if ((addr >= start) && (addr < end))
return 1;
 
-   if (arch_is_kernel_data(addr))
-   return 1;
-
/*
 * in-kernel percpu var?
 */
-- 
2.26.2



[PATCH v2 0/7] sections: Unify kernel sections range check and use

2021-07-28 Thread Kefeng Wang
Three header files (kallsyms.h, kernel.h and sections.h) contain kernel
section range checks; clean them up and unify them.

1. clean up the arch-specific text/data checks and fix the address
   boundary check in kallsyms.h
2. move all the basic/core kernel range-check functions into sections.h
3. update all the callers to use the helpers in sections.h and simplify
   the code

After this series, sections.h provides five kernel section range-check
helpers (a usage sketch follows the list):

 * is_kernel_core_data()--- come from core_kernel_data() in kernel.h
 * is_kernel_rodata()   --- already in sections.h
 * is_kernel_text() --- come from kallsyms.h
 * is_kernel_inittext() --- come from kernel.h and kallsyms.h
 * is_kernel()  --- come from kallsyms.h
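
For illustration only, not part of any patch in this series: a minimal
sketch of how a caller might use the helpers once they all live in
sections.h. classify_addr() is a hypothetical name invented for this
example, and it assumes the helpers are reachable via <asm/sections.h>.

#include <asm/sections.h>

/* Sketch: map an address to a rough description of where it lives. */
static const char *classify_addr(unsigned long addr)
{
	if (is_kernel_text(addr))
		return "kernel text";
	if (is_kernel_inittext(addr))
		return "kernel init text (freed after boot)";
	if (is_kernel_rodata(addr))
		return "kernel rodata";
	if (is_kernel_core_data(addr))
		return "core kernel data";
	if (is_kernel(addr))
		return "elsewhere in the kernel image";
	return "outside the core kernel image";
}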


Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s...@vger.kernel.org
Cc: linux-a...@vger.kernel.org 
Cc: io...@lists.linux-foundation.org
Cc: b...@vger.kernel.org 

v2:
- add the Acked-by/Reviewed-by tags to patch 2 and drop the
  inappropriate Fixes tag
- keep 'core' in the kernel data check, as suggested by Steven Rostedt
  , and rename is_kernel_data() to is_kernel_core_data()
- drop patch 8, which has been merged
- drop patch 9, which has been resent independently

v1:
https://lore.kernel.org/linux-arch/20210626073439.150586-1-wangkefeng.w...@huawei.com

Kefeng Wang (7):
  kallsyms: Remove arch specific text and data check
  kallsyms: Fix address-checks for kernel related range
  sections: Move and rename core_kernel_data() to is_kernel_core_data()
  sections: Move is_kernel_inittext() into sections.h
  kallsyms: Rename is_kernel() and is_kernel_text()
  sections: Add new is_kernel() and is_kernel_text()
  powerpc/mm: Use is_kernel_text() and is_kernel_inittext() helper

 arch/powerpc/mm/pgtable_32.c   |  7 +---
 arch/x86/kernel/unwind_orc.c   |  2 +-
 arch/x86/net/bpf_jit_comp.c|  2 +-
 include/asm-generic/sections.h | 71 ++
 include/linux/kallsyms.h   | 21 +++---
 include/linux/kernel.h |  2 -
 kernel/cfi.c   |  2 +-
 kernel/extable.c   | 33 ++--
 kernel/locking/lockdep.c   |  3 --
 kernel/trace/ftrace.c  |  2 +-
 mm/kasan/report.c  |  2 +-
 net/sysctl_net.c   |  2 +-
 12 files changed, 72 insertions(+), 77 deletions(-)

-- 
2.26.2


