Re: [PATCH 3/3] mm, arch: add generic implementation of pfn_valid() for FLATMEM

2023-01-25 Thread Arnd Bergmann
On Wed, Jan 25, 2023, at 20:07, Mike Rapoport wrote:
> From: "Mike Rapoport (IBM)" 
>
> Every architecture that supports FLATMEM memory model defines its own
> version of pfn_valid() that essentially compares a pfn to max_mapnr.
>
> Use mips/powerpc version implemented as static inline as a generic
> implementation of pfn_valid() and drop its per-architecture definitions
>
> Signed-off-by: Mike Rapoport (IBM) 

Acked-by: Arnd Bergmann 

I assume this can best go through the mm tree, let me know if I should
pick it up in the asm-generic tree instead.

 Arnd


Re: [PATCH] powerpc/kasan/book3s_64: warn when running with hash MMU

2023-01-25 Thread Christophe Leroy




Le 11/10/2022 à 12:25, Christophe Leroy a écrit :



Le 11/10/2022 à 12:00, Michael Ellerman a écrit :

Nathan Lynch  writes:

Michael Ellerman  writes:

Christophe Leroy  writes:

+ KASAN list

Le 06/10/2022 à 06:10, Michael Ellerman a écrit :

Nathan Lynch  writes:

kasan is known to crash at boot on book3s_64 with non-radix MMU. As
noted in commit 41b7a347bf14 ("powerpc: Book3S 64-bit outline-only
KASAN support"):

 A kernel with CONFIG_KASAN=y will crash during boot on a machine
 using HPT translation because not all the entry points to the
 generic KASAN code are protected with a call to kasan_arch_is_ready().


I guess I thought there was some plan to fix that.


I was thinking the same.

Do we have a list of the said entry points to the generic code that are
lacking a call to kasan_arch_is_ready() ?

Typically, the BUG dump below shows that kasan_byte_accessible() is
lacking the check. It should be straight forward to add
kasan_arch_is_ready() check to kasan_byte_accessible(), shouldn't it ?


Yes :)

And one other spot, but the patch below boots OK for me. I'll leave it
running for a while just in case there's a path I've missed.


It works for me too, thanks (p8 pseries qemu).


It works but I still see the kasan shadow getting mapped, which we would
ideally avoid.

  From PTDUMP:

---[ kasan shadow mem start ]---
0xc00f-0xc00f0006  0x045e   448K r  
w   pte  valid  presentdirty  accessed
0xc00f3ffe-0xc00f3fff  0x04d8   128K r  
w   pte  valid  presentdirty  accessed

I haven't worked out how those are getting mapped.





Alternative patch proposed at 
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/150768c55722311699fdcf8f5379e8256749f47d.1674716617.git.christophe.le...@csgroup.eu/


Christophe


[PATCH] kasan: Fix Oops due to missing calls to kasan_arch_is_ready()

2023-01-25 Thread Christophe Leroy
On powerpc64, you can build a kernel with KASAN as soon as you build it
with RADIX MMU support. However if the CPU doesn't have RADIX MMU,
KASAN isn't enabled at init and the following Oops is encountered.

  [0.00][T0] KASAN not enabled as it requires radix!

  [4.484295][   T26] BUG: Unable to handle kernel data access at 
0xc00e00804a04
  [4.485270][   T26] Faulting instruction address: 0xc062ec6c
  [4.485748][   T26] Oops: Kernel access of bad area, sig: 11 [#1]
  [4.485920][   T26] BE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
  [4.486259][   T26] Modules linked in:
  [4.486637][   T26] CPU: 0 PID: 26 Comm: kworker/u2:2 Not tainted 
6.2.0-rc3-02590-gf8a023b0a805 #249
  [4.486907][   T26] Hardware name: IBM pSeries (emulated by qemu) POWER9 
(raw) 0x4e1200 0xf05 of:SLOF,HEAD pSeries
  [4.487445][   T26] Workqueue: eval_map_wq .tracer_init_tracefs_work_func
  [4.488744][   T26] NIP:  c062ec6c LR: c062bb84 CTR: 
c02ebcd0
  [4.488867][   T26] REGS: c49175c0 TRAP: 0380   Not tainted  
(6.2.0-rc3-02590-gf8a023b0a805)
  [4.489028][   T26] MSR:  82009032   CR: 
44002808  XER: 
  [4.489584][   T26] CFAR: c062bb80 IRQMASK: 0
  [4.489584][   T26] GPR00: c05624d4 c4917860 
c1cfc000 18804a04
  [4.489584][   T26] GPR04: c03a2650 0cc0 
c000d3d8 c000d3d8
  [4.489584][   T26] GPR08: c49175b0 a80e 
 17d78400
  [4.489584][   T26] GPR12: 44002204 c379 
c435003c c43f1c40
  [4.489584][   T26] GPR16: c43f1c68 c43501a0 
c2106138 c43f1c08
  [4.489584][   T26] GPR20: c43f1c10 c43f1c20 
c4146c40 c2fdb7f8
  [4.489584][   T26] GPR24: c2fdb834 c3685e00 
c4025030 c3522e90
  [4.489584][   T26] GPR28: 0cc0 c03a2650 
c4025020 c4025020
  [4.491201][   T26] NIP [c062ec6c] .kasan_byte_accessible+0xc/0x20
  [4.491430][   T26] LR [c062bb84] .__kasan_check_byte+0x24/0x90
  [4.491767][   T26] Call Trace:
  [4.491941][   T26] [c4917860] [c062ae70] 
.__kasan_kmalloc+0xc0/0x110 (unreliable)
  [4.492270][   T26] [c49178f0] [c05624d4] 
.krealloc+0x54/0x1c0
  [4.492453][   T26] [c4917990] [c03a2650] 
.create_trace_option_files+0x280/0x530
  [4.492613][   T26] [c4917a90] [c2050d90] 
.tracer_init_tracefs_work_func+0x274/0x2c0
  [4.492771][   T26] [c4917b40] [c01f9948] 
.process_one_work+0x578/0x9f0
  [4.492927][   T26] [c4917c30] [c01f9ebc] 
.worker_thread+0xfc/0x950
  [4.493084][   T26] [c4917d60] [c020be84] 
.kthread+0x1a4/0x1b0
  [4.493232][   T26] [c4917e10] [c000d3d8] 
.ret_from_kernel_thread+0x58/0x60
  [4.495642][   T26] Code: 6000 7cc802a6 38a0 4bfffc78 6000 
7cc802a6 38a1 4bfffc68 6000 3d20a80e 7863e8c2 792907c6 <7c6348ae> 
20630007 78630fe0 68630001
  [4.496704][   T26] ---[ end trace  ]---

The Oops is due to kasan_byte_accessible() not checking the readiness
of KASAN. Add missing call to kasan_arch_is_ready() and bail out when
not ready. The same problem is observed with kasan_kfree_large()
so fix it the same.

Also, as KASAN is not available and no shadow area is allocated for
linear memory mapping, there is no point in allocating shadow mem for
vmalloc memory as shown below in /sys/kernel/debug/kernel_page_tables

  ---[ kasan shadow mem start ]---
  0xc00f-0xc00f0006  0x040f   448K 
r  w   pte  valid  presentdirty  accessed
  0xc00f0086-0xc00f0086  0x0ac164K 
r  w   pte  valid  presentdirty  accessed
  0xc00f3ffe-0xc00f3fff  0x04d1   128K 
r  w   pte  valid  presentdirty  accessed
  ---[ kasan shadow mem end ]---

So, also verify KASAN readiness before allocating and poisoning
shadow mem for VMAs.

Reported-by: Nathan Lynch 
Suggested-by: Michael Ellerman 
Signed-off-by: Christophe Leroy 
---
 mm/kasan/common.c  |  3 +++
 mm/kasan/generic.c |  7 ++-
 mm/kasan/shadow.c  | 12 
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/mm/kasan/common.c b/mm/kasan/common.c
index 833bf2cfd2a3..21e66d7f261d 100644
--- a/mm/kasan/common.c
+++ b/mm/kasan/common.c
@@ -246,6 +246,9 @@ bool __kasan_slab_free(struct kmem_cache *cache, void 
*object,
 
 static inline bool kasan_kfree_large(void *ptr, unsigned long ip)
 {
+   if (!kasan_arch_is_ready())
+   return false;
+
if (ptr != page_address(virt_to_head_page(ptr))) {

Re: [PATCH] powerpc/vdso: Filter clang's auto var init zero enabler when linking

2023-01-25 Thread Masahiro Yamada
On Wed, Jan 25, 2023 at 1:20 AM Nathan Chancellor  wrote:
>
> After commit 7bbf02b875b5 ("kbuild: Stop using '-Qunused-arguments' with
> clang"), the PowerPC vDSO shows the following error with clang-13 and
> older when CONFIG_INIT_STACK_ALL_ZERO is enabled:
>
>   clang: error: argument unused during compilation: 
> '-enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang' 
> [-Werror,-Wunused-command-line-argument]
>
> clang-14 added a change to make sure this flag never triggers
> -Wunused-command-line-argument, so it is fixed with newer releases. For
> older releases that the kernel still supports building with, just filter
> out this flag, as has been done for other flags.
>
> Fixes: b174f4c26aa3 ("powerpc/vdso: Improve linker flags")
> Fixes: 7bbf02b875b5 ("kbuild: Stop using '-Qunused-arguments' with clang")
> Link: 
> https://github.com/llvm/llvm-project/commit/ca6d5813d17598cd180995fb3bdfca00f364475f
> Signed-off-by: Nathan Chancellor 
> ---
> This should be the last flag that needs to be filtered (famous last
> words...) but if any more come up, we should really just explore
> switching the PowerPC vDSO to linking with $(LD) like every other part
> of the kernel; for now, I hope this is fine.
>
> Cheers,
> Nathan


Applied to linux-kbuild. Thanks.

Since I rebased the branch, the tags have been updated
accordingly.





powerpc/vdso: Filter clang's auto var init zero enabler when linking

After commit 8d9acfce3332 ("kbuild: Stop using '-Qunused-arguments' with
clang"), the PowerPC vDSO shows the following error with clang-13 and
older when CONFIG_INIT_STACK_ALL_ZERO is enabled:

  clang: error: argument unused during compilation:
'-enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang'
[-Werror,-Wunused-command-line-argument]

clang-14 added a change to make sure this flag never triggers
-Wunused-command-line-argument, so it is fixed with newer releases. For
older releases that the kernel still supports building with, just filter
out this flag, as has been done for other flags.

Fixes: f0a42fbab447 ("powerpc/vdso: Improve linker flags")
Fixes: 8d9acfce3332 ("kbuild: Stop using '-Qunused-arguments' with clang")
Link: 
https://github.com/llvm/llvm-project/commit/ca6d5813d17598cd180995fb3bdfca00f364475f
Signed-off-by: Nathan Chancellor 
Signed-off-by: Masahiro Yamada 



-- 
Best Regards
Masahiro Yamada


Re: [PATCH v2 05/14] powerpc: Remove linker flag from KBUILD_AFLAGS

2023-01-25 Thread Masahiro Yamada
On Thu, Jan 26, 2023 at 11:07 AM Nathan Chancellor  wrote:
>
> On Thu, Jan 26, 2023 at 10:29:54AM +0900, Masahiro Yamada wrote:
> > On Wed, Jan 25, 2023 at 1:11 PM Michael Ellerman  
> > wrote:
> > >
> > > Nathan Chancellor  writes:
> > > > When clang's -Qunused-arguments is dropped from KBUILD_CPPFLAGS, it
> > > > points out that KBUILD_AFLAGS contains a linker flag, which will be
> > > > used:
> > >
> > > Should that say "unused" ?
> >
> >
> >
> > Nathan, shall I fix it up locally?
> > (it will change the commit hash, though.)
>
> Yes please, if you would not mind. Sorry about that and thank you for
> spotting it Michael!
>
> Since you have to rebase to fix it, you can include Michael's acks?
>
> Cheers,
> Nathan



Done.








-- 
Best Regards
Masahiro Yamada


Re: [PATCH v2 05/14] powerpc: Remove linker flag from KBUILD_AFLAGS

2023-01-25 Thread Nathan Chancellor
On Thu, Jan 26, 2023 at 10:29:54AM +0900, Masahiro Yamada wrote:
> On Wed, Jan 25, 2023 at 1:11 PM Michael Ellerman  wrote:
> >
> > Nathan Chancellor  writes:
> > > When clang's -Qunused-arguments is dropped from KBUILD_CPPFLAGS, it
> > > points out that KBUILD_AFLAGS contains a linker flag, which will be
> > > used:
> >
> > Should that say "unused" ?
> 
> 
> 
> Nathan, shall I fix it up locally?
> (it will change the commit hash, though.)

Yes please, if you would not mind. Sorry about that and thank you for
spotting it Michael!

Since you have to rebase to fix it, you can include Michael's acks?

Cheers,
Nathan

> > >   clang: error: -Wl,-a32: 'linker' input unused 
> > > [-Werror,-Wunused-command-line-argument]
> > >
> > > This was likely supposed to be '-Wa,-a$(BITS)'. However, this change is
> > > unnecessary, as all supported versions of clang and gcc will pass '-a64'
> > > or '-a32' to GNU as based on the value of '-m'; the behavior of the
> > > latest stable release of the oldest supported major version of each
> > > compiler is shown below and each compiler's latest release exhibits the
> > > same behavior (GCC 12.2.0 and Clang 15.0.6).
> > >
> > >   $ powerpc64-linux-gcc --version | head -1
> > >   powerpc64-linux-gcc (GCC) 5.5.0
> > >
> > >   $ powerpc64-linux-gcc -m64 -### -x assembler-with-cpp -c -o /dev/null 
> > > /dev/null &| grep 'as '
> > >   .../as -a64 -mppc64 -many -mbig -o /dev/null /tmp/cctwuBzZ.s
> > >
> > >   $ powerpc64-linux-gcc -m32 -### -x assembler-with-cpp -c -o /dev/null 
> > > /dev/null &| grep 'as '
> > >   .../as -a32 -mppc -many -mbig -o /dev/null /tmp/ccaZP4mF.sg
> > >
> > >   $ clang --version | head -1
> > >   Ubuntu clang version 
> > > 11.1.0-++20211011094159+1fdec59bffc1-1~exp1~20211011214622.5
> > >
> > >   $ clang --target=powerpc64-linux-gnu -fno-integrated-as -m64 -### \
> > > -x assembler-with-cpp -c -o /dev/null /dev/null &| grep gnu-as
> > >"/usr/bin/powerpc64-linux-gnu-as" "-a64" "-mppc64" "-many" "-o" 
> > > "/dev/null" "/tmp/null-80267c.s"
> > >
> > >   $ clang --target=powerpc64-linux-gnu -fno-integrated-as -m64 -### \
> > > -x assembler-with-cpp -c -o /dev/null /dev/null &| grep gnu-as
> > >"/usr/bin/powerpc64-linux-gnu-as" "-a32" "-mppc" "-many" "-o" 
> > > "/dev/null" "/tmp/null-ab8f8d.s"
> > >
> > > Remove this flag altogether to avoid future issues.
> > >
> > > Fixes: 1421dc6d4829 ("powerpc/kbuild: Use flags variables rather than 
> > > overriding LD/CC/AS")
> > > Signed-off-by: Nathan Chancellor 
> > > Reviewed-by: Nick Desaulniers 
> > > ---
> > > Cc: m...@ellerman.id.au
> >
> > Acked-by: Michael Ellerman  (powerpc)
> >
> > cheers
> 
> 
> 
> -- 
> Best Regards
> Masahiro Yamada


Re: [PATCH 3/3] mm, arch: add generic implementation of pfn_valid() for FLATMEM

2023-01-25 Thread Andrew Morton
On Wed, 25 Jan 2023 21:07:57 +0200 Mike Rapoport  wrote:

> Every architecture that supports FLATMEM memory model defines its own
> version of pfn_valid() that essentially compares a pfn to max_mapnr.
> 
> Use mips/powerpc version implemented as static inline as a generic
> implementation of pfn_valid() and drop its per-architecture definitions

arm allnoconfig:

./include/asm-generic/memory_model.h:23:19: error: static declaration of 
'pfn_valid' follows non-static declaration
   23 | static inline int pfn_valid(unsigned long pfn)
  |   ^
./arch/arm/include/asm/page.h:160:12: note: previous declaration of 'pfn_valid' 
with type 'int(long unsigned int)'
  160 | extern int pfn_valid(unsigned long);
  |^


I thought of doing

--- 
a/arch/arm/include/asm/page.h~mm-arch-add-generic-implementation-of-pfn_valid-for-flatmem-fix
+++ a/arch/arm/include/asm/page.h
@@ -156,10 +156,6 @@ extern void copy_page(void *to, const vo
 
 typedef struct page *pgtable_t;
 
-#ifdef CONFIG_HAVE_ARCH_PFN_VALID
-extern int pfn_valid(unsigned long);
-#endif
-
 #include 
 
 #endif /* !__ASSEMBLY__ */
_

but I'm seeing a pfn_valid declaration in arch/arc/include/asm/page.h
which might be a problem.

v2, please ;)


Re: [PATCH v3 1/7] kernel/fork: convert vma assignment to a memcpy

2023-01-25 Thread Andrew Morton
On Wed, 25 Jan 2023 16:50:01 -0800 Suren Baghdasaryan  wrote:

> On Wed, Jan 25, 2023 at 4:22 PM Andrew Morton  
> wrote:
> >
> > On Wed, 25 Jan 2023 15:35:48 -0800 Suren Baghdasaryan  
> > wrote:
> >
> > > Convert vma assignment in vm_area_dup() to a memcpy() to prevent compiler
> > > errors when we add a const modifier to vma->vm_flags.
> > >
> > > ...
> > >
> > > --- a/kernel/fork.c
> > > +++ b/kernel/fork.c
> > > @@ -482,7 +482,7 @@ struct vm_area_struct *vm_area_dup(struct 
> > > vm_area_struct *orig)
> > >* orig->shared.rb may be modified concurrently, but the 
> > > clone
> > >* will be reinitialized.
> > >*/
> > > - *new = data_race(*orig);
> > > + memcpy(new, orig, sizeof(*new));
> >
> > The data_race() removal is unchangelogged?
> 
> True. I'll add a note in the changelog about that. Ideally I would
> like to preserve it but I could not find a way to do that.
> 

Perhaps Paul can comment?

I wonder if KCSAN knows how to detect this race, given that it's now in
a memcpy.  I assume so.


Re: [PATCH v2 05/14] powerpc: Remove linker flag from KBUILD_AFLAGS

2023-01-25 Thread Masahiro Yamada
On Wed, Jan 25, 2023 at 1:11 PM Michael Ellerman  wrote:
>
> Nathan Chancellor  writes:
> > When clang's -Qunused-arguments is dropped from KBUILD_CPPFLAGS, it
> > points out that KBUILD_AFLAGS contains a linker flag, which will be
> > used:
>
> Should that say "unused" ?



Nathan, shall I fix it up locally?
(it will change the commit hash, though.)





>
> >   clang: error: -Wl,-a32: 'linker' input unused 
> > [-Werror,-Wunused-command-line-argument]
> >
> > This was likely supposed to be '-Wa,-a$(BITS)'. However, this change is
> > unnecessary, as all supported versions of clang and gcc will pass '-a64'
> > or '-a32' to GNU as based on the value of '-m'; the behavior of the
> > latest stable release of the oldest supported major version of each
> > compiler is shown below and each compiler's latest release exhibits the
> > same behavior (GCC 12.2.0 and Clang 15.0.6).
> >
> >   $ powerpc64-linux-gcc --version | head -1
> >   powerpc64-linux-gcc (GCC) 5.5.0
> >
> >   $ powerpc64-linux-gcc -m64 -### -x assembler-with-cpp -c -o /dev/null 
> > /dev/null &| grep 'as '
> >   .../as -a64 -mppc64 -many -mbig -o /dev/null /tmp/cctwuBzZ.s
> >
> >   $ powerpc64-linux-gcc -m32 -### -x assembler-with-cpp -c -o /dev/null 
> > /dev/null &| grep 'as '
> >   .../as -a32 -mppc -many -mbig -o /dev/null /tmp/ccaZP4mF.sg
> >
> >   $ clang --version | head -1
> >   Ubuntu clang version 
> > 11.1.0-++20211011094159+1fdec59bffc1-1~exp1~20211011214622.5
> >
> >   $ clang --target=powerpc64-linux-gnu -fno-integrated-as -m64 -### \
> > -x assembler-with-cpp -c -o /dev/null /dev/null &| grep gnu-as
> >"/usr/bin/powerpc64-linux-gnu-as" "-a64" "-mppc64" "-many" "-o" 
> > "/dev/null" "/tmp/null-80267c.s"
> >
> >   $ clang --target=powerpc64-linux-gnu -fno-integrated-as -m64 -### \
> > -x assembler-with-cpp -c -o /dev/null /dev/null &| grep gnu-as
> >"/usr/bin/powerpc64-linux-gnu-as" "-a32" "-mppc" "-many" "-o" 
> > "/dev/null" "/tmp/null-ab8f8d.s"
> >
> > Remove this flag altogether to avoid future issues.
> >
> > Fixes: 1421dc6d4829 ("powerpc/kbuild: Use flags variables rather than 
> > overriding LD/CC/AS")
> > Signed-off-by: Nathan Chancellor 
> > Reviewed-by: Nick Desaulniers 
> > ---
> > Cc: m...@ellerman.id.au
>
> Acked-by: Michael Ellerman  (powerpc)
>
> cheers



-- 
Best Regards
Masahiro Yamada


[PATCH v4 12/12] perf pmu-events: Fix testing with JEVENTS_ARCH=all

2023-01-25 Thread Ian Rogers
The #slots literal will return NAN when not on ARM64 which causes a
perf test failure when not on an ARM64 for a JEVENTS_ARCH=all build:
..
 10.4: Parsing of PMU event table metrics with fake PMUs : FAILED!
..
Add an is_test boolean so that the failure can be avoided when running
as a test.

Fixes: acef233b7ca7 ("perf pmu: Add #slots literal support for arm64")
Signed-off-by: Ian Rogers 
---
 tools/perf/tests/pmu-events.c | 1 +
 tools/perf/util/expr.h| 1 +
 tools/perf/util/expr.l| 8 +---
 3 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/tools/perf/tests/pmu-events.c b/tools/perf/tests/pmu-events.c
index 962c3c0d53ba..accf44b3d968 100644
--- a/tools/perf/tests/pmu-events.c
+++ b/tools/perf/tests/pmu-events.c
@@ -950,6 +950,7 @@ static int metric_parse_fake(const char *metric_name, const 
char *str)
pr_debug("expr__ctx_new failed");
return TEST_FAIL;
}
+   ctx->sctx.is_test = true;
if (expr__find_ids(str, NULL, ctx) < 0) {
pr_err("expr__find_ids failed\n");
return -1;
diff --git a/tools/perf/util/expr.h b/tools/perf/util/expr.h
index 029271540fb0..eaa44b24c555 100644
--- a/tools/perf/util/expr.h
+++ b/tools/perf/util/expr.h
@@ -9,6 +9,7 @@ struct expr_scanner_ctx {
char *user_requested_cpu_list;
int runtime;
bool system_wide;
+   bool is_test;
 };
 
 struct expr_parse_ctx {
diff --git a/tools/perf/util/expr.l b/tools/perf/util/expr.l
index 0168a9637330..72ff4f3d6d4b 100644
--- a/tools/perf/util/expr.l
+++ b/tools/perf/util/expr.l
@@ -84,9 +84,11 @@ static int literal(yyscan_t scanner, const struct 
expr_scanner_ctx *sctx)
YYSTYPE *yylval = expr_get_lval(scanner);
 
yylval->num = expr__get_literal(expr_get_text(scanner), sctx);
-   if (isnan(yylval->num))
-   return EXPR_ERROR;
-
+   if (isnan(yylval->num)) {
+   if (!sctx->is_test)
+   return EXPR_ERROR;
+   yylval->num = 1;
+   }
return LITERAL;
 }
 %}
-- 
2.39.1.456.gfc5497dd1b-goog



[PATCH v4 11/12] perf jevents: Add model list option

2023-01-25 Thread Ian Rogers
This allows the set of generated jevents events and metrics be limited
to a subset of the model names. Appropriate if trying to minimize the
binary size where only a set of models are possible.

Signed-off-by: Ian Rogers 
---
 tools/perf/pmu-events/Build  |  3 ++-
 tools/perf/pmu-events/jevents.py | 14 ++
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/tools/perf/pmu-events/Build b/tools/perf/pmu-events/Build
index 15b9e8fdbffa..a14de24ecb69 100644
--- a/tools/perf/pmu-events/Build
+++ b/tools/perf/pmu-events/Build
@@ -10,6 +10,7 @@ JEVENTS_PY=  pmu-events/jevents.py
 ifeq ($(JEVENTS_ARCH),)
 JEVENTS_ARCH=$(SRCARCH)
 endif
+JEVENTS_MODEL ?= all
 
 #
 # Locate/process JSON files in pmu-events/arch/
@@ -23,5 +24,5 @@ $(OUTPUT)pmu-events/pmu-events.c: 
pmu-events/empty-pmu-events.c
 else
 $(OUTPUT)pmu-events/pmu-events.c: $(JSON) $(JSON_TEST) $(JEVENTS_PY) 
pmu-events/metric.py
$(call rule_mkdir)
-   $(Q)$(call echo-cmd,gen)$(PYTHON) $(JEVENTS_PY) $(JEVENTS_ARCH) 
pmu-events/arch $@
+   $(Q)$(call echo-cmd,gen)$(PYTHON) $(JEVENTS_PY) $(JEVENTS_ARCH) 
$(JEVENTS_MODEL) pmu-events/arch $@
 endif
diff --git a/tools/perf/pmu-events/jevents.py b/tools/perf/pmu-events/jevents.py
index 627ee817f57f..2bcd07ce609f 100755
--- a/tools/perf/pmu-events/jevents.py
+++ b/tools/perf/pmu-events/jevents.py
@@ -599,6 +599,8 @@ const struct pmu_events_map pmu_events_map[] = {
 else:
   metric_tblname = 'NULL'
   metric_size = '0'
+if event_size == '0' and metric_size == '0':
+  continue
 cpuid = row[0].replace('\\', '')
 _args.output_file.write(f"""{{
 \t.arch = "{arch}",
@@ -888,12 +890,24 @@ def main() -> None:
   action: Callable[[Sequence[str], os.DirEntry], None]) -> None:
 """Replicate the directory/file walking behavior of C's file tree walk."""
 for item in os.scandir(path):
+  if _args.model != 'all' and item.is_dir():
+# Check if the model matches one in _args.model.
+if len(parents) == _args.model.split(',')[0].count('/'):
+  # We're testing the correct directory.
+  item_path = '/'.join(parents) + ('/' if len(parents) > 0 else '') + 
item.name
+  if 'test' not in item_path and item_path not in 
_args.model.split(','):
+continue
   action(parents, item)
   if item.is_dir():
 ftw(item.path, parents + [item.name], action)
 
   ap = argparse.ArgumentParser()
   ap.add_argument('arch', help='Architecture name like x86')
+  ap.add_argument('model', help='''Select a model such as skylake to
+reduce the code size.  Normally set to "all". For architectures like
+ARM64 with an implementor/model, the model must include the implementor
+such as "arm/cortex-a34".''',
+  default='all')
   ap.add_argument(
   'starting_dir',
   type=dir_path,
-- 
2.39.1.456.gfc5497dd1b-goog



[PATCH v4 10/12] perf jevents: Generate metrics and events as separate tables

2023-01-25 Thread Ian Rogers
Turn a perf json event into an event, metric or both. This reduces the
number of events needed to scan to find an event or metric. As events
no longer need the relatively seldom used metric fields, 4 bytes is
saved per event. This reduces the big C string's size by 335kb (14.8%)
on x86.

Note, for the test PMU architecture pme_test_soc_cpu is renamed
pmu_events__test_soc_cpu for consistency with the event vs metric
naming convention.

Signed-off-by: Ian Rogers 
---
 tools/perf/pmu-events/jevents.py | 244 +++
 tools/perf/tests/pmu-events.c|   3 +-
 2 files changed, 189 insertions(+), 58 deletions(-)

diff --git a/tools/perf/pmu-events/jevents.py b/tools/perf/pmu-events/jevents.py
index d83cc94af51f..627ee817f57f 100755
--- a/tools/perf/pmu-events/jevents.py
+++ b/tools/perf/pmu-events/jevents.py
@@ -13,28 +13,40 @@ import collections
 
 # Global command line arguments.
 _args = None
+# List of regular event tables.
+_event_tables = []
 # List of event tables generated from "/sys" directories.
 _sys_event_tables = []
+# List of regular metric tables.
+_metric_tables = []
+# List of metric tables generated from "/sys" directories.
+_sys_metric_tables = []
+# Mapping between sys event table names and sys metric table names.
+_sys_event_table_to_metric_table_mapping = {}
 # Map from an event name to an architecture standard
 # JsonEvent. Architecture standard events are in json files in the top
 # f'{_args.starting_dir}/{_args.arch}' directory.
 _arch_std_events = {}
 # Events to write out when the table is closed
 _pending_events = []
-# Name of table to be written out
+# Name of events table to be written out
 _pending_events_tblname = None
+# Metrics to write out when the table is closed
+_pending_metrics = []
+# Name of metrics table to be written out
+_pending_metrics_tblname = None
 # Global BigCString shared by all structures.
 _bcs = None
 # Order specific JsonEvent attributes will be visited.
 _json_event_attributes = [
 # cmp_sevent related attributes.
-'name', 'pmu', 'topic', 'desc', 'metric_name', 'metric_group',
+'name', 'pmu', 'topic', 'desc',
 # Seems useful, put it early.
 'event',
 # Short things in alphabetical order.
 'aggr_mode', 'compat', 'deprecated', 'perpkg', 'unit',
 # Longer things (the last won't be iterated over during decompress).
-'metric_constraint', 'metric_expr', 'long_desc'
+'long_desc'
 ]
 
 # Attributes that are in pmu_metric rather than pmu_event.
@@ -52,14 +64,16 @@ def removesuffix(s: str, suffix: str) -> str:
   return s[0:-len(suffix)] if s.endswith(suffix) else s
 
 
-def file_name_to_table_name(parents: Sequence[str], dirname: str) -> str:
+def file_name_to_table_name(prefix: str, parents: Sequence[str],
+dirname: str) -> str:
   """Generate a C table name from directory names."""
-  tblname = 'pme'
+  tblname = prefix
   for p in parents:
 tblname += '_' + p
   tblname += '_' + dirname
   return tblname.replace('-', '_')
 
+
 def c_len(s: str) -> int:
   """Return the length of s a C string
 
@@ -277,7 +291,7 @@ class JsonEvent:
 self.metric_constraint = jd.get('MetricConstraint')
 self.metric_expr = None
 if 'MetricExpr' in jd:
-   self.metric_expr = metric.ParsePerfJson(jd['MetricExpr']).Simplify()
+  self.metric_expr = metric.ParsePerfJson(jd['MetricExpr']).Simplify()
 
 arch_std = jd.get('ArchStdEvent')
 if precise and self.desc and '(Precise Event)' not in self.desc:
@@ -326,23 +340,24 @@ class JsonEvent:
 s += f'\t{attr} = {value},\n'
 return s + '}'
 
-  def build_c_string(self) -> str:
+  def build_c_string(self, metric: bool) -> str:
 s = ''
-for attr in _json_event_attributes:
+for attr in _json_metric_attributes if metric else _json_event_attributes:
   x = getattr(self, attr)
-  if x and attr == 'metric_expr':
+  if metric and x and attr == 'metric_expr':
 # Convert parsed metric expressions into a string. Slashes
 # must be doubled in the file.
 x = x.ToPerfJson().replace('\\', '')
   s += f'{x}\\000' if x else '\\000'
 return s
 
-  def to_c_string(self) -> str:
+  def to_c_string(self, metric: bool) -> str:
 """Representation of the event as a C struct initializer."""
 
-s = self.build_c_string()
+s = self.build_c_string(metric)
 return f'{{ { _bcs.offsets[s] } }}, /* {s} */\n'
 
+
 @lru_cache(maxsize=None)
 def read_json_events(path: str, topic: str) -> Sequence[JsonEvent]:
   """Read json events from the specified file."""
@@ -381,7 +396,10 @@ def preprocess_arch_std_files(archpath: str) -> None:
 def add_events_table_entries(item: os.DirEntry, topic: str) -> None:
   """Add contents of file to _pending_events table."""
   for e in read_json_events(item.path, topic):
-_pending_events.append(e)
+if e.name:
+  _pending_events.append(e)
+if e.metric_name:
+  _pending_metrics.append(e)
 
 
 def print_pending_events() 

[PATCH v4 09/12] perf pmu-events: Introduce pmu_metrics_table

2023-01-25 Thread Ian Rogers
Add a metrics table that is just a cast from pmu_events_table. This
changes the APIs so that event and metric usage of the underlying
table is different. For the no jevents case the tables are already
separate, later changes will separate the tables for the jevents case.

Signed-off-by: Ian Rogers 
---
 tools/perf/arch/arm64/util/pmu.c | 11 -
 tools/perf/pmu-events/empty-pmu-events.c | 21 -
 tools/perf/pmu-events/jevents.py | 21 ++---
 tools/perf/pmu-events/pmu-events.h   | 10 +++--
 tools/perf/tests/expand-cgroup.c |  2 +-
 tools/perf/tests/parse-metric.c  |  2 +-
 tools/perf/tests/pmu-events.c|  5 ++-
 tools/perf/util/metricgroup.c| 54 
 tools/perf/util/metricgroup.h|  2 +-
 tools/perf/util/pmu.c|  5 +++
 tools/perf/util/pmu.h|  1 +
 11 files changed, 78 insertions(+), 56 deletions(-)

diff --git a/tools/perf/arch/arm64/util/pmu.c b/tools/perf/arch/arm64/util/pmu.c
index 801bf52e2ea6..2779840d8896 100644
--- a/tools/perf/arch/arm64/util/pmu.c
+++ b/tools/perf/arch/arm64/util/pmu.c
@@ -22,7 +22,14 @@ static struct perf_pmu *pmu__find_core_pmu(void)
return NULL;
 
return pmu;
-   }
+}
+
+const struct pmu_metrics_table *pmu_metrics_table__find(void)
+{
+   struct perf_pmu *pmu = pmu__find_core_pmu();
+
+   if (pmu)
+   return perf_pmu__find_metrics_table(pmu);
 
return NULL;
 }
@@ -32,7 +39,7 @@ const struct pmu_events_table *pmu_events_table__find(void)
struct perf_pmu *pmu = pmu__find_core_pmu();
 
if (pmu)
-   return perf_pmu__find_table(pmu);
+   return perf_pmu__find_events_table(pmu);
 
return NULL;
 }
diff --git a/tools/perf/pmu-events/empty-pmu-events.c 
b/tools/perf/pmu-events/empty-pmu-events.c
index 10bd4943ebf8..a938b74cf487 100644
--- a/tools/perf/pmu-events/empty-pmu-events.c
+++ b/tools/perf/pmu-events/empty-pmu-events.c
@@ -278,13 +278,11 @@ int pmu_events_table_for_each_event(const struct 
pmu_events_table *table, pmu_ev
return 0;
 }
 
-int pmu_events_table_for_each_metric(const struct pmu_events_table *etable, 
pmu_metric_iter_fn fn,
-void *data)
+int pmu_metrics_table_for_each_metric(const struct pmu_metrics_table *table, 
pmu_metric_iter_fn fn,
+ void *data)
 {
-   struct pmu_metrics_table *table = (struct pmu_metrics_table *)etable;
-
for (const struct pmu_metric *pm = >entries[0]; pm->metric_expr; 
pm++) {
-   int ret = fn(pm, etable, data);
+   int ret = fn(pm, table, data);
 
if (ret)
return ret;
@@ -320,9 +318,9 @@ const struct pmu_events_table 
*perf_pmu__find_events_table(struct perf_pmu *pmu)
return table;
 }
 
-const struct pmu_events_table *perf_pmu__find_metrics_table(struct perf_pmu 
*pmu)
+const struct pmu_metrics_table *perf_pmu__find_metrics_table(struct perf_pmu 
*pmu)
 {
-   const struct pmu_events_table *table = NULL;
+   const struct pmu_metrics_table *table = NULL;
char *cpuid = perf_pmu__getcpuid(pmu);
int i;
 
@@ -340,7 +338,7 @@ const struct pmu_events_table 
*perf_pmu__find_metrics_table(struct perf_pmu *pmu
break;
 
if (!strcmp_cpuid_str(map->cpuid, cpuid)) {
-   table = (const struct pmu_events_table 
*)>metric_table;
+   table = >metric_table;
break;
}
}
@@ -359,13 +357,13 @@ const struct pmu_events_table 
*find_core_events_table(const char *arch, const ch
return NULL;
 }
 
-const struct pmu_events_table *find_core_metrics_table(const char *arch, const 
char *cpuid)
+const struct pmu_metrics_table *find_core_metrics_table(const char *arch, 
const char *cpuid)
 {
for (const struct pmu_events_map *tables = _events_map[0];
 tables->arch;
 tables++) {
if (!strcmp(tables->arch, arch) && 
!strcmp_cpuid_str(tables->cpuid, cpuid))
-   return (const struct pmu_events_table 
*)>metric_table;
+   return >metric_table;
}
return NULL;
 }
@@ -386,8 +384,7 @@ int pmu_for_each_core_metric(pmu_metric_iter_fn fn, void 
*data)
for (const struct pmu_events_map *tables = _events_map[0];
 tables->arch;
 tables++) {
-   int ret = pmu_events_table_for_each_metric(
-   (const struct pmu_events_table *)>metric_table, 
fn, data);
+   int ret = 
pmu_metrics_table_for_each_metric(>metric_table, fn, data);
 
if (ret)
return ret;
diff --git a/tools/perf/pmu-events/jevents.py b/tools/perf/pmu-events/jevents.py
index 5f8d490c7269..d83cc94af51f 100755
--- a/tools/perf/pmu-events/jevents.py
+++ 

[PATCH v4 08/12] perf jevents: Combine table prefix and suffix writing

2023-01-25 Thread Ian Rogers
Combine into a single function to simplify, in a later change, writing
metrics separately.

Signed-off-by: Ian Rogers 
---
 tools/perf/pmu-events/jevents.py | 36 +---
 1 file changed, 14 insertions(+), 22 deletions(-)

diff --git a/tools/perf/pmu-events/jevents.py b/tools/perf/pmu-events/jevents.py
index 4cdbf34b7298..5f8d490c7269 100755
--- a/tools/perf/pmu-events/jevents.py
+++ b/tools/perf/pmu-events/jevents.py
@@ -19,10 +19,10 @@ _sys_event_tables = []
 # JsonEvent. Architecture standard events are in json files in the top
 # f'{_args.starting_dir}/{_args.arch}' directory.
 _arch_std_events = {}
-# Track whether an events table is currently being defined and needs closing.
-_close_table = False
 # Events to write out when the table is closed
 _pending_events = []
+# Name of table to be written out
+_pending_events_tblname = None
 # Global BigCString shared by all structures.
 _bcs = None
 # Order specific JsonEvent attributes will be visited.
@@ -378,24 +378,13 @@ def preprocess_arch_std_files(archpath: str) -> None:
   _arch_std_events[event.metric_name.lower()] = event
 
 
-def print_events_table_prefix(tblname: str) -> None:
-  """Called when a new events table is started."""
-  global _close_table
-  if _close_table:
-raise IOError('Printing table prefix but last table has no suffix')
-  _args.output_file.write(f'static const struct compact_pmu_event {tblname}[] 
= {{\n')
-  _close_table = True
-
-
 def add_events_table_entries(item: os.DirEntry, topic: str) -> None:
   """Add contents of file to _pending_events table."""
-  if not _close_table:
-raise IOError('Table entries missing prefix')
   for e in read_json_events(item.path, topic):
 _pending_events.append(e)
 
 
-def print_events_table_suffix() -> None:
+def print_pending_events() -> None:
   """Optionally close events table."""
 
   def event_cmp_key(j: JsonEvent) -> Tuple[bool, str, str, str, str]:
@@ -407,17 +396,19 @@ def print_events_table_suffix() -> None:
 return (j.desc is not None, fix_none(j.topic), fix_none(j.name), 
fix_none(j.pmu),
 fix_none(j.metric_name))
 
-  global _close_table
-  if not _close_table:
+  global _pending_events
+  if not _pending_events:
 return
 
-  global _pending_events
+  global _pending_events_tblname
+  _args.output_file.write(
+  f'static const struct compact_pmu_event {_pending_events_tblname}[] = 
{{\n')
+
   for event in sorted(_pending_events, key=event_cmp_key):
 _args.output_file.write(event.to_c_string())
-_pending_events = []
+  _pending_events = []
 
   _args.output_file.write('};\n\n')
-  _close_table = False
 
 def get_topic(topic: str) -> str:
   if topic.endswith('metrics.json'):
@@ -455,12 +446,13 @@ def process_one_file(parents: Sequence[str], item: 
os.DirEntry) -> None:
 
   # model directory, reset topic
   if item.is_dir() and is_leaf_dir(item.path):
-print_events_table_suffix()
+print_pending_events()
 
 tblname = file_name_to_table_name(parents, item.name)
 if item.name == 'sys':
   _sys_event_tables.append(tblname)
-print_events_table_prefix(tblname)
+global _pending_events_tblname
+_pending_events_tblname = tblname
 return
 
   # base dir or too deep
@@ -809,7 +801,7 @@ struct compact_pmu_event {
   for arch in archs:
 arch_path = f'{_args.starting_dir}/{arch}'
 ftw(arch_path, [], process_one_file)
-print_events_table_suffix()
+print_pending_events()
 
   print_mapping_table(archs)
   print_system_mapping_table()
-- 
2.39.1.456.gfc5497dd1b-goog



[PATCH v4 07/12] perf stat: Remove evsel metric_name/expr

2023-01-25 Thread Ian Rogers
Metrics are their own unit and these variables held broken metrics
previously and now just hold the value NULL. Remove code that used
these variables.

Reviewed-by: John Garry 
Signed-off-by: Ian Rogers 
---
 tools/perf/builtin-stat.c |   1 -
 tools/perf/util/cgroup.c  |   1 -
 tools/perf/util/evsel.c   |   2 -
 tools/perf/util/evsel.h   |   2 -
 tools/perf/util/python.c  |   7 ---
 tools/perf/util/stat-shadow.c | 112 --
 tools/perf/util/stat.h|   1 -
 7 files changed, 126 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 9f3e4b257516..5d18a5a6f662 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -2524,7 +2524,6 @@ int cmd_stat(int argc, const char **argv)
_config.metric_events);
zfree();
}
-   perf_stat__collect_metric_expr(evsel_list);
perf_stat__init_shadow_stats();
 
if (add_default_attributes())
diff --git a/tools/perf/util/cgroup.c b/tools/perf/util/cgroup.c
index cd978c240e0d..bfb13306d82c 100644
--- a/tools/perf/util/cgroup.c
+++ b/tools/perf/util/cgroup.c
@@ -481,7 +481,6 @@ int evlist__expand_cgroup(struct evlist *evlist, const char 
*str,
nr_cgroups++;
 
if (metric_events) {
-   perf_stat__collect_metric_expr(tmp_list);
if (metricgroup__copy_metric_events(tmp_list, cgrp,
metric_events,

_metric_events) < 0)
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 8550638587e5..a90e998826e0 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -285,8 +285,6 @@ void evsel__init(struct evsel *evsel,
evsel->sample_size = __evsel__sample_size(attr->sample_type);
evsel__calc_id_pos(evsel);
evsel->cmdline_group_boundary = false;
-   evsel->metric_expr   = NULL;
-   evsel->metric_name   = NULL;
evsel->metric_events = NULL;
evsel->per_pkg_mask  = NULL;
evsel->collect_stat  = false;
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index d572be41b960..24cb807ef6ce 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -105,8 +105,6 @@ struct evsel {
 * metric fields are similar, but needs more care as they can have
 * references to other metric (evsel).
 */
-   const char *metric_expr;
-   const char *metric_name;
struct evsel**metric_events;
struct evsel*metric_leader;
 
diff --git a/tools/perf/util/python.c b/tools/perf/util/python.c
index 9e5d881b0987..42e8b813d010 100644
--- a/tools/perf/util/python.c
+++ b/tools/perf/util/python.c
@@ -76,13 +76,6 @@ const char *perf_env__arch(struct perf_env *env 
__maybe_unused)
return NULL;
 }
 
-/*
- * Add this one here not to drag util/stat-shadow.c
- */
-void perf_stat__collect_metric_expr(struct evlist *evsel_list)
-{
-}
-
 /*
  * These ones are needed not to drag the PMU bandwagon, jevents generated
  * pmu_sys_event_tables, etc and evsel__find_pmu() is used so far just for
diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index cadb2df23c87..35ea4813f468 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -346,114 +346,6 @@ static const char *get_ratio_color(enum grc_type type, 
double ratio)
return color;
 }
 
-static struct evsel *perf_stat__find_event(struct evlist *evsel_list,
-   const char *name)
-{
-   struct evsel *c2;
-
-   evlist__for_each_entry (evsel_list, c2) {
-   if (!strcasecmp(c2->name, name) && !c2->collect_stat)
-   return c2;
-   }
-   return NULL;
-}
-
-/* Mark MetricExpr target events and link events using them to them. */
-void perf_stat__collect_metric_expr(struct evlist *evsel_list)
-{
-   struct evsel *counter, *leader, **metric_events, *oc;
-   bool found;
-   struct expr_parse_ctx *ctx;
-   struct hashmap_entry *cur;
-   size_t bkt;
-   int i;
-
-   ctx = expr__ctx_new();
-   if (!ctx) {
-   pr_debug("expr__ctx_new failed");
-   return;
-   }
-   evlist__for_each_entry(evsel_list, counter) {
-   bool invalid = false;
-
-   leader = evsel__leader(counter);
-   if (!counter->metric_expr)
-   continue;
-
-   expr__ctx_clear(ctx);
-   metric_events = counter->metric_events;
-   if (!metric_events) {
-   if (expr__find_ids(counter->metric_expr,
-  counter->name,
-  ctx) < 0)
-   continue;
-
-

[PATCH v4 06/12] perf pmu-events: Remove now unused event and metric variables

2023-01-25 Thread Ian Rogers
Previous changes separated the uses of pmu_event and pmu_metric,
however, both structures contained all the variables of event and
metric. This change removes the event variables from metric and the
metric variables from event.

Note, this change removes the setting of evsel's metric_name/expr as
these fields are no longer part of struct pmu_event. The metric
remains but is no longer implicitly requested when the event is. This
impacts a few Intel uncore events, however, as the ScaleUnit is shared
by the event and the metric this utility is questionable. Also the
MetricNames look broken (contain spaces) in some cases and when trying
to use the functionality with '-e' the metrics fail but regular
metrics with '-M' work. For example, on SkylakeX '-M' works:

```
$ perf stat -M LLC_MISSES.PCIE_WRITE -a sleep 1

 Performance counter stats for 'system wide':

 0  UNC_IIO_DATA_REQ_OF_CPU.MEM_WRITE.PART2 #  57896.0 
Bytes  LLC_MISSES.PCIE_WRITE  (49.84%)
 7,174  UNC_IIO_DATA_REQ_OF_CPU.MEM_WRITE.PART1 
   (49.85%)
 0  UNC_IIO_DATA_REQ_OF_CPU.MEM_WRITE.PART3 
   (50.16%)
63  UNC_IIO_DATA_REQ_OF_CPU.MEM_WRITE.PART0 
   (50.15%)

   1.004576381 seconds time elapsed
```

whilst the event '-e' version is broken even with --group/-g (fwiw, we should 
also remove -g [1]):

```
$ perf stat -g -e LLC_MISSES.PCIE_WRITE -g -a sleep 1
Add UNC_IIO_DATA_REQ_OF_CPU.MEM_WRITE.PART2 event to groups to get metric 
expression for LLC_MISSES.PCIE_WRITE
Add UNC_IIO_DATA_REQ_OF_CPU.MEM_WRITE.PART1 event to groups to get metric 
expression for LLC_MISSES.PCIE_WRITE
Add UNC_IIO_DATA_REQ_OF_CPU.MEM_WRITE.PART3 event to groups to get metric 
expression for LLC_MISSES.PCIE_WRITE
Add UNC_IIO_DATA_REQ_OF_CPU.MEM_WRITE.PART0 event to groups to get metric 
expression for LLC_MISSES.PCIE_WRITE
Add UNC_IIO_DATA_REQ_OF_CPU.MEM_WRITE.PART2 event to groups to get metric 
expression for LLC_MISSES.PCIE_WRITE
Add UNC_IIO_DATA_REQ_OF_CPU.MEM_WRITE.PART1 event to groups to get metric 
expression for LLC_MISSES.PCIE_WRITE
Add UNC_IIO_DATA_REQ_OF_CPU.MEM_WRITE.PART3 event to groups to get metric 
expression for LLC_MISSES.PCIE_WRITE
Add UNC_IIO_DATA_REQ_OF_CPU.MEM_WRITE.PART0 event to groups to get metric 
expression for LLC_MISSES.PCIE_WRITE
Add UNC_IIO_DATA_REQ_OF_CPU.MEM_WRITE.PART2 event to groups to get metric 
expression for LLC_MISSES.PCIE_WRITE
Add UNC_IIO_DATA_REQ_OF_CPU.MEM_WRITE.PART1 event to groups to get metric 
expression for LLC_MISSES.PCIE_WRITE
Add UNC_IIO_DATA_REQ_OF_CPU.MEM_WRITE.PART3 event to groups to get metric 
expression for LLC_MISSES.PCIE_WRITE
Add UNC_IIO_DATA_REQ_OF_CPU.MEM_WRITE.PART0 event to groups to get metric 
expression for LLC_MISSES.PCIE_WRITE
Add UNC_IIO_DATA_REQ_OF_CPU.MEM_WRITE.PART2 event to groups to get metric 
expression for LLC_MISSES.PCIE_WRITE
Add UNC_IIO_DATA_REQ_OF_CPU.MEM_WRITE.PART1 event to groups to get metric 
expression for LLC_MISSES.PCIE_WRITE
Add UNC_IIO_DATA_REQ_OF_CPU.MEM_WRITE.PART3 event to groups to get metric 
expression for LLC_MISSES.PCIE_WRITE
Add UNC_IIO_DATA_REQ_OF_CPU.MEM_WRITE.PART0 event to groups to get metric 
expression for LLC_MISSES.PCIE_WRITE
Add UNC_IIO_DATA_REQ_OF_CPU.MEM_WRITE.PART2 event to groups to get metric 
expression for LLC_MISSES.PCIE_WRITE
Add UNC_IIO_DATA_REQ_OF_CPU.MEM_WRITE.PART1 event to groups to get metric 
expression for LLC_MISSES.PCIE_WRITE
Add UNC_IIO_DATA_REQ_OF_CPU.MEM_WRITE.PART3 event to groups to get metric 
expression for LLC_MISSES.PCIE_WRITE
Add UNC_IIO_DATA_REQ_OF_CPU.MEM_WRITE.PART0 event to groups to get metric 
expression for LLC_MISSES.PCIE_WRITE
Add UNC_IIO_DATA_REQ_OF_CPU.MEM_WRITE.PART2 event to groups to get metric 
expression for LLC_MISSES.PCIE_WRITE
Add UNC_IIO_DATA_REQ_OF_CPU.MEM_WRITE.PART1 event to groups to get metric 
expression for LLC_MISSES.PCIE_WRITE
Add UNC_IIO_DATA_REQ_OF_CPU.MEM_WRITE.PART3 event to groups to get metric 
expression for LLC_MISSES.PCIE_WRITE
Add UNC_IIO_DATA_REQ_OF_CPU.MEM_WRITE.PART0 event to groups to get metric 
expression for LLC_MISSES.PCIE_WRITE

 Performance counter stats for 'system wide':

27,316 Bytes LLC_MISSES.PCIE_WRITE

   1.004505469 seconds time elapsed
```

The code also carries warnings where the user is supposed to select
events for metrics [2] but given the lack of use of such a feature,
let's clean the code and just remove.

[1] https://lore.kernel.org/lkml/20220707195610.303254-1-irog...@google.com/
[2] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/util/stat-shadow.c?id=01b8957b738f42f96a130079bc951b3cc78c5b8a#n425

Reviewed-by: John Garry 
Signed-off-by: Ian Rogers 
---
 tools/perf/builtin-list.c  | 20 ++---
 tools/perf/pmu-events/jevents.py   | 20 +
 tools/perf/pmu-events/pmu-events.h | 22 +--

[PATCH v4 05/12] perf pmu-events: Separate the metrics from events for no jevents

2023-01-25 Thread Ian Rogers
Separate the event and metric table when building without jevents. Add
find_core_metrics_table and perf_pmu__find_metrics_table while
renaming existing utilities to be event specific, so that users can
find the right table for their need.

Reviewed-by: John Garry 
Signed-off-by: Ian Rogers 
---
 tools/perf/pmu-events/empty-pmu-events.c | 88 ++--
 tools/perf/pmu-events/jevents.py |  7 +-
 tools/perf/pmu-events/pmu-events.h   |  4 +-
 tools/perf/tests/expand-cgroup.c |  2 +-
 tools/perf/tests/parse-metric.c  |  2 +-
 tools/perf/util/pmu.c|  4 +-
 6 files changed, 79 insertions(+), 28 deletions(-)

diff --git a/tools/perf/pmu-events/empty-pmu-events.c 
b/tools/perf/pmu-events/empty-pmu-events.c
index 4e39d1a8d6d6..10bd4943ebf8 100644
--- a/tools/perf/pmu-events/empty-pmu-events.c
+++ b/tools/perf/pmu-events/empty-pmu-events.c
@@ -11,7 +11,7 @@
 #include 
 #include 
 
-static const struct pmu_event pme_test_soc_cpu[] = {
+static const struct pmu_event pmu_events__test_soc_cpu[] = {
{
.name = "l3_cache_rd",
.event = "event=0x40",
@@ -105,6 +105,14 @@ static const struct pmu_event pme_test_soc_cpu[] = {
.desc = "L2 BTB Correction",
.topic = "branch",
},
+   {
+   .name = 0,
+   .event = 0,
+   .desc = 0,
+   },
+};
+
+static const struct pmu_metric pmu_metrics__test_soc_cpu[] = {
{
.metric_expr= "1 / IPC",
.metric_name= "CPI",
@@ -170,9 +178,8 @@ static const struct pmu_event pme_test_soc_cpu[] = {
.metric_name= "L1D_Cache_Fill_BW",
},
{
-   .name = 0,
-   .event = 0,
-   .desc = 0,
+   .metric_expr = 0,
+   .metric_name = 0,
},
 };
 
@@ -197,7 +204,8 @@ struct pmu_metrics_table {
 struct pmu_events_map {
const char *arch;
const char *cpuid;
-   const struct pmu_events_table table;
+   const struct pmu_events_table event_table;
+   const struct pmu_metrics_table metric_table;
 };
 
 /*
@@ -208,12 +216,14 @@ static const struct pmu_events_map pmu_events_map[] = {
{
.arch = "testarch",
.cpuid = "testcpu",
-   .table = { pme_test_soc_cpu },
+   .event_table = { pmu_events__test_soc_cpu },
+   .metric_table = { pmu_metrics__test_soc_cpu },
},
{
.arch = 0,
.cpuid = 0,
-   .table = { 0 },
+   .event_table = { 0 },
+   .metric_table = { 0 },
},
 };
 
@@ -259,12 +269,9 @@ static const struct pmu_sys_events pmu_sys_event_tables[] 
= {
 int pmu_events_table_for_each_event(const struct pmu_events_table *table, 
pmu_event_iter_fn fn,
void *data)
 {
-   for (const struct pmu_event *pe = >entries[0]; pe->name || 
pe->metric_expr; pe++) {
-   int ret;
+   for (const struct pmu_event *pe = >entries[0]; pe->name; pe++) {
+   int ret = fn(pe, table, data);
 
-   if (!pe->name)
-   continue;
-   ret = fn(pe, table, data);
if (ret)
return ret;
}
@@ -276,19 +283,44 @@ int pmu_events_table_for_each_metric(const struct 
pmu_events_table *etable, pmu_
 {
struct pmu_metrics_table *table = (struct pmu_metrics_table *)etable;
 
-   for (const struct pmu_metric *pm = >entries[0]; pm->name || 
pm->metric_expr; pm++) {
-   int ret;
+   for (const struct pmu_metric *pm = >entries[0]; pm->metric_expr; 
pm++) {
+   int ret = fn(pm, etable, data);
 
-   if (!pm->metric_expr)
-   continue;
-   ret = fn(pm, etable, data);
if (ret)
return ret;
}
return 0;
 }
 
-const struct pmu_events_table *perf_pmu__find_table(struct perf_pmu *pmu)
+const struct pmu_events_table *perf_pmu__find_events_table(struct perf_pmu 
*pmu)
+{
+   const struct pmu_events_table *table = NULL;
+   char *cpuid = perf_pmu__getcpuid(pmu);
+   int i;
+
+   /* on some platforms which uses cpus map, cpuid can be NULL for
+* PMUs other than CORE PMUs.
+*/
+   if (!cpuid)
+   return NULL;
+
+   i = 0;
+   for (;;) {
+   const struct pmu_events_map *map = _events_map[i++];
+
+   if (!map->cpuid)
+   break;
+
+   if (!strcmp_cpuid_str(map->cpuid, cpuid)) {
+   table = >event_table;
+   break;
+   }
+   }
+   free(cpuid);
+   return table;
+}
+
+const struct pmu_events_table *perf_pmu__find_metrics_table(struct perf_pmu 
*pmu)
 {
const struct pmu_events_table *table = NULL;

[PATCH v4 04/12] perf pmu-events: Add separate metric from pmu_event

2023-01-25 Thread Ian Rogers
Create a new pmu_metric for the metric related variables from
pmu_event but that is initially just a clone of pmu_event. Add
iterators for pmu_metric and use in places that metrics are desired
rather than events. Make the event iterator skip metric only events,
and the metric iterator skip event only events.

Reviewed-by: John Garry 
Signed-off-by: Ian Rogers 
---
 tools/perf/arch/powerpc/util/header.c|   4 +-
 tools/perf/pmu-events/empty-pmu-events.c |  49 ++-
 tools/perf/pmu-events/jevents.py |  62 -
 tools/perf/pmu-events/pmu-events.h   |  26 
 tools/perf/tests/pmu-events.c|  35 +++--
 tools/perf/util/metricgroup.c| 161 +++
 tools/perf/util/metricgroup.h|   2 +-
 7 files changed, 228 insertions(+), 111 deletions(-)

diff --git a/tools/perf/arch/powerpc/util/header.c 
b/tools/perf/arch/powerpc/util/header.c
index e8fe36b10d20..78eef77d8a8d 100644
--- a/tools/perf/arch/powerpc/util/header.c
+++ b/tools/perf/arch/powerpc/util/header.c
@@ -40,11 +40,11 @@ get_cpuid_str(struct perf_pmu *pmu __maybe_unused)
return bufp;
 }
 
-int arch_get_runtimeparam(const struct pmu_event *pe)
+int arch_get_runtimeparam(const struct pmu_metric *pm)
 {
int count;
char path[PATH_MAX] = "/devices/hv_24x7/interface/";
 
-   atoi(pe->aggr_mode) == PerChip ? strcat(path, "sockets") : strcat(path, 
"coresperchip");
+   atoi(pm->aggr_mode) == PerChip ? strcat(path, "sockets") : strcat(path, 
"coresperchip");
return sysfs__read_int(path, ) < 0 ? 1 : count;
 }
diff --git a/tools/perf/pmu-events/empty-pmu-events.c 
b/tools/perf/pmu-events/empty-pmu-events.c
index 480e8f0d30c8..4e39d1a8d6d6 100644
--- a/tools/perf/pmu-events/empty-pmu-events.c
+++ b/tools/perf/pmu-events/empty-pmu-events.c
@@ -181,6 +181,11 @@ struct pmu_events_table {
const struct pmu_event *entries;
 };
 
+/* Struct used to make the PMU metric table implementation opaque to callers. 
*/
+struct pmu_metrics_table {
+   const struct pmu_metric *entries;
+};
+
 /*
  * Map a CPU to its table of PMU events. The CPU is identified by the
  * cpuid field, which is an arch-specific identifier for the CPU.
@@ -254,11 +259,29 @@ static const struct pmu_sys_events pmu_sys_event_tables[] 
= {
 int pmu_events_table_for_each_event(const struct pmu_events_table *table, 
pmu_event_iter_fn fn,
void *data)
 {
-   for (const struct pmu_event *pe = >entries[0];
-pe->name || pe->metric_group || pe->metric_name;
-pe++) {
-   int ret = fn(pe, table, data);
+   for (const struct pmu_event *pe = >entries[0]; pe->name || 
pe->metric_expr; pe++) {
+   int ret;
 
+   if (!pe->name)
+   continue;
+   ret = fn(pe, table, data);
+   if (ret)
+   return ret;
+   }
+   return 0;
+}
+
+int pmu_events_table_for_each_metric(const struct pmu_events_table *etable, 
pmu_metric_iter_fn fn,
+void *data)
+{
+   struct pmu_metrics_table *table = (struct pmu_metrics_table *)etable;
+
+   for (const struct pmu_metric *pm = >entries[0]; pm->name || 
pm->metric_expr; pm++) {
+   int ret;
+
+   if (!pm->metric_expr)
+   continue;
+   ret = fn(pm, etable, data);
if (ret)
return ret;
}
@@ -305,11 +328,22 @@ const struct pmu_events_table 
*find_core_events_table(const char *arch, const ch
 }
 
 int pmu_for_each_core_event(pmu_event_iter_fn fn, void *data)
+{
+   for (const struct pmu_events_map *tables = _events_map[0]; 
tables->arch; tables++) {
+   int ret = pmu_events_table_for_each_event(>table, fn, 
data);
+
+   if (ret)
+   return ret;
+   }
+   return 0;
+}
+
+int pmu_for_each_core_metric(pmu_metric_iter_fn fn, void *data)
 {
for (const struct pmu_events_map *tables = _events_map[0];
 tables->arch;
 tables++) {
-   int ret = pmu_events_table_for_each_event(>table, fn, 
data);
+   int ret = pmu_events_table_for_each_metric(>table, fn, 
data);
 
if (ret)
return ret;
@@ -340,3 +374,8 @@ int pmu_for_each_sys_event(pmu_event_iter_fn fn, void *data)
}
return 0;
 }
+
+int pmu_for_each_sys_metric(pmu_metric_iter_fn fn __maybe_unused, void *data 
__maybe_unused)
+{
+   return 0;
+}
diff --git a/tools/perf/pmu-events/jevents.py b/tools/perf/pmu-events/jevents.py
index 15a1671740cc..858787a12302 100755
--- a/tools/perf/pmu-events/jevents.py
+++ b/tools/perf/pmu-events/jevents.py
@@ -564,7 +564,19 @@ static const struct pmu_sys_events pmu_sys_event_tables[] 
= {
 \t},
 };
 
-static void decompress(int offset, struct pmu_event *pe)
+static void decompress_event(int offset, struct pmu_event *pe)
+{

[PATCH v4 03/12] perf jevents: Rewrite metrics in the same file with each other

2023-01-25 Thread Ian Rogers
Rewrite metrics within the same file in terms of each other. For example, on 
Power8
other_stall_cpi is rewritten from:
"PM_CMPLU_STALL / PM_RUN_INST_CMPL - PM_CMPLU_STALL_BRU_CRU / PM_RUN_INST_CMPL 
- PM_CMPLU_STALL_FXU / PM_RUN_INST_CMPL - PM_CMPLU_STALL_VSU / PM_RUN_INST_CMPL 
- PM_CMPLU_STALL_LSU / PM_RUN_INST_CMPL - PM_CMPLU_STALL_NTCG_FLUSH / 
PM_RUN_INST_CMPL - PM_CMPLU_STALL_NO_NTF / PM_RUN_INST_CMPL"
to:
"stall_cpi - bru_cru_stall_cpi - fxu_stall_cpi - vsu_stall_cpi - lsu_stall_cpi 
- ntcg_flush_cpi - no_ntf_stall_cpi"
Which more closely matches the definition on Power9.

To avoid recomputation decorate the function with a cache.

Signed-off-by: Ian Rogers 
---
 tools/perf/pmu-events/jevents.py | 21 -
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/tools/perf/pmu-events/jevents.py b/tools/perf/pmu-events/jevents.py
index 0416b7442171..15a1671740cc 100755
--- a/tools/perf/pmu-events/jevents.py
+++ b/tools/perf/pmu-events/jevents.py
@@ -3,6 +3,7 @@
 """Convert directories of JSON events to C code."""
 import argparse
 import csv
+from functools import lru_cache
 import json
 import metric
 import os
@@ -337,18 +338,28 @@ class JsonEvent:
 s = self.build_c_string()
 return f'{{ { _bcs.offsets[s] } }}, /* {s} */\n'
 
-
+@lru_cache(maxsize=None)
 def read_json_events(path: str, topic: str) -> Sequence[JsonEvent]:
   """Read json events from the specified file."""
-
   try:
-result = json.load(open(path), object_hook=JsonEvent)
+events = json.load(open(path), object_hook=JsonEvent)
   except BaseException as err:
 print(f"Exception processing {path}")
 raise
-  for event in result:
+  metrics: list[Tuple[str, metric.Expression]] = []
+  for event in events:
 event.topic = topic
-  return result
+if event.metric_name and '-' not in event.metric_name:
+  metrics.append((event.metric_name, event.metric_expr))
+  updates = metric.RewriteMetricsInTermsOfOthers(metrics)
+  if updates:
+for event in events:
+  if event.metric_name in updates:
+# print(f'Updated {event.metric_name} from\n"{event.metric_expr}"\n'
+#   f'to\n"{updates[event.metric_name]}"')
+event.metric_expr = updates[event.metric_name]
+
+  return events
 
 def preprocess_arch_std_files(archpath: str) -> None:
   """Read in all architecture standard events."""
-- 
2.39.1.456.gfc5497dd1b-goog



[PATCH v4 02/12] perf jevents metric: Add ability to rewrite metrics in terms of others

2023-01-25 Thread Ian Rogers
Add RewriteMetricsInTermsOfOthers that iterates over pairs of names
and expressions trying to replace an expression, within the current
expression, with its name.

Signed-off-by: Ian Rogers 
---
 tools/perf/pmu-events/metric.py  | 73 +++-
 tools/perf/pmu-events/metric_test.py | 10 
 2 files changed, 81 insertions(+), 2 deletions(-)

diff --git a/tools/perf/pmu-events/metric.py b/tools/perf/pmu-events/metric.py
index 2f2fd220e843..ed13efac7389 100644
--- a/tools/perf/pmu-events/metric.py
+++ b/tools/perf/pmu-events/metric.py
@@ -4,7 +4,7 @@ import ast
 import decimal
 import json
 import re
-from typing import Dict, List, Optional, Set, Union
+from typing import Dict, List, Optional, Set, Tuple, Union
 
 
 class Expression:
@@ -26,6 +26,9 @@ class Expression:
 """Returns true when two expressions are the same."""
 raise NotImplementedError()
 
+  def Substitute(self, name: str, expression: 'Expression') -> 'Expression':
+raise NotImplementedError()
+
   def __str__(self) -> str:
 return self.ToPerfJson()
 
@@ -186,6 +189,15 @@ class Operator(Expression):
   other.lhs) and self.rhs.Equals(other.rhs)
 return False
 
+  def Substitute(self, name: str, expression: Expression) -> Expression:
+if self.Equals(expression):
+  return Event(name)
+lhs = self.lhs.Substitute(name, expression)
+rhs = None
+if self.rhs:
+  rhs = self.rhs.Substitute(name, expression)
+return Operator(self.operator, lhs, rhs)
+
 
 class Select(Expression):
   """Represents a select ternary in the parse tree."""
@@ -225,6 +237,14 @@ class Select(Expression):
   other.false_val) and self.true_val.Equals(other.true_val)
 return False
 
+  def Substitute(self, name: str, expression: Expression) -> Expression:
+if self.Equals(expression):
+  return Event(name)
+true_val = self.true_val.Substitute(name, expression)
+cond = self.cond.Substitute(name, expression)
+false_val = self.false_val.Substitute(name, expression)
+return Select(true_val, cond, false_val)
+
 
 class Function(Expression):
   """A function in an expression like min, max, d_ratio."""
@@ -267,6 +287,15 @@ class Function(Expression):
   return result
 return False
 
+  def Substitute(self, name: str, expression: Expression) -> Expression:
+if self.Equals(expression):
+  return Event(name)
+lhs = self.lhs.Substitute(name, expression)
+rhs = None
+if self.rhs:
+  rhs = self.rhs.Substitute(name, expression)
+return Function(self.fn, lhs, rhs)
+
 
 def _FixEscapes(s: str) -> str:
   s = re.sub(r'([^\\]),', r'\1\\,', s)
@@ -293,6 +322,9 @@ class Event(Expression):
   def Equals(self, other: Expression) -> bool:
 return isinstance(other, Event) and self.name == other.name
 
+  def Substitute(self, name: str, expression: Expression) -> Expression:
+return self
+
 
 class Constant(Expression):
   """A constant within the expression tree."""
@@ -317,6 +349,9 @@ class Constant(Expression):
   def Equals(self, other: Expression) -> bool:
 return isinstance(other, Constant) and self.value == other.value
 
+  def Substitute(self, name: str, expression: Expression) -> Expression:
+return self
+
 
 class Literal(Expression):
   """A runtime literal within the expression tree."""
@@ -336,6 +371,9 @@ class Literal(Expression):
   def Equals(self, other: Expression) -> bool:
 return isinstance(other, Literal) and self.value == other.value
 
+  def Substitute(self, name: str, expression: Expression) -> Expression:
+return self
+
 
 def min(lhs: Union[int, float, Expression], rhs: Union[int, float,
Expression]) -> 
Function:
@@ -461,6 +499,7 @@ class MetricGroup:
 
 
 class _RewriteIfExpToSelect(ast.NodeTransformer):
+  """Transformer to convert if-else nodes to Select expressions."""
 
   def visit_IfExp(self, node):
 # pylint: disable=invalid-name
@@ -498,7 +537,37 @@ def ParsePerfJson(orig: str) -> Expression:
   for kw in keywords:
 py = re.sub(rf'Event\(r"{kw}"\)', kw, py)
 
-  parsed = ast.parse(py, mode='eval')
+  try:
+parsed = ast.parse(py, mode='eval')
+  except SyntaxError as e:
+raise SyntaxError(f'Parsing expression:\n{orig}') from e
   _RewriteIfExpToSelect().visit(parsed)
   parsed = ast.fix_missing_locations(parsed)
   return _Constify(eval(compile(parsed, orig, 'eval')))
+
+
+def RewriteMetricsInTermsOfOthers(metrics: list[Tuple[str, Expression]]
+  )-> Dict[str, Expression]:
+  """Shorten metrics by rewriting in terms of others.
+
+  Args:
+metrics (list): pairs of metric names and their expressions.
+  Returns:
+Dict: mapping from a metric name to a shortened expression.
+  """
+  updates: Dict[str, Expression] = dict()
+  for outer_name, outer_expression in metrics:
+updated = outer_expression
+while True:
+  for inner_name, inner_expression in metrics:
+if 

[PATCH v4 01/12] perf jevents metric: Correct Function equality

2023-01-25 Thread Ian Rogers
rhs may not be defined, say for source_count, so add a guard.

Reviewed-by: Kajol Jain
Signed-off-by: Ian Rogers 
---
 tools/perf/pmu-events/metric.py | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/tools/perf/pmu-events/metric.py b/tools/perf/pmu-events/metric.py
index 4797ed4fd817..2f2fd220e843 100644
--- a/tools/perf/pmu-events/metric.py
+++ b/tools/perf/pmu-events/metric.py
@@ -261,8 +261,10 @@ class Function(Expression):
 
   def Equals(self, other: Expression) -> bool:
 if isinstance(other, Function):
-  return self.fn == other.fn and self.lhs.Equals(
-  other.lhs) and self.rhs.Equals(other.rhs)
+  result = self.fn == other.fn and self.lhs.Equals(other.lhs)
+  if self.rhs:
+result = result and self.rhs.Equals(other.rhs)
+  return result
 return False
 
 
-- 
2.39.1.456.gfc5497dd1b-goog



[PATCH v4 00/12] jevents/pmu-events improvements

2023-01-25 Thread Ian Rogers
Add an optimization to jevents using the metric code, rewrite metrics
in terms of each other in order to minimize size and improve
readability. For example, on Power8
other_stall_cpi is rewritten from:
"PM_CMPLU_STALL / PM_RUN_INST_CMPL - PM_CMPLU_STALL_BRU_CRU / PM_RUN_INST_CMPL 
- PM_CMPLU_STALL_FXU / PM_RUN_INST_CMPL - PM_CMPLU_STALL_VSU / PM_RUN_INST_CMPL 
- PM_CMPLU_STALL_LSU / PM_RUN_INST_CMPL - PM_CMPLU_STALL_NTCG_FLUSH / 
PM_RUN_INST_CMPL - PM_CMPLU_STALL_NO_NTF / PM_RUN_INST_CMPL"
to:
"stall_cpi - bru_cru_stall_cpi - fxu_stall_cpi - vsu_stall_cpi - lsu_stall_cpi 
- ntcg_flush_cpi - no_ntf_stall_cpi"
Which more closely matches the definition on Power9.

A limitation of the substitutions are that they depend on strict
equality and the shape of the tree. This means that for "a + b + c"
then a substitution of "a + b" will succeed while "b + c" will fail
(the LHS for "+ c" is "a + b" not just "b").

Separate out the events and metrics in the pmu-events tables saving
14.8% in the table size while making it that metrics no longer need to
iterate over all events and vice versa. These changes remove evsel's
direct metric support as the pmu_event no longer has a metric to
populate it. This is a minor issue as the code wasn't working
properly, metrics for this are rare and can still be properly ran
using '-M'.

Add an ability to just build certain models into the jevents generated
pmu-metrics.c code. This functionality is appropriate for operating
systems like ChromeOS, that aim to minimize binary size and know all
the target CPU models.

v4. Better support the implementor/model style --model argument for
jevents.py. Add #slots test fix. On some patches add reviewed-by
John Garry  and Kajol
Jain.
v3. Rebase an incorporate review comments from John Garry
, in particular breaking apart patch 4
into 3 patches. The no jevents breakage and then later fix is
avoided in this series too.
v2. Rebase. Modify the code that skips rewriting a metric with the
same name with itself, to make the name check case insensitive.

Ian Rogers (12):
  perf jevents metric: Correct Function equality
  perf jevents metric: Add ability to rewrite metrics in terms of others
  perf jevents: Rewrite metrics in the same file with each other
  perf pmu-events: Add separate metric from pmu_event
  perf pmu-events: Separate the metrics from events for no jevents
  perf pmu-events: Remove now unused event and metric variables
  perf stat: Remove evsel metric_name/expr
  perf jevents: Combine table prefix and suffix writing
  perf pmu-events: Introduce pmu_metrics_table
  perf jevents: Generate metrics and events as separate tables
  perf jevents: Add model list option
  perf pmu-events: Fix testing with JEVENTS_ARCH=all

 tools/perf/arch/arm64/util/pmu.c |  11 +-
 tools/perf/arch/powerpc/util/header.c|   4 +-
 tools/perf/builtin-list.c|  20 +-
 tools/perf/builtin-stat.c|   1 -
 tools/perf/pmu-events/Build  |   3 +-
 tools/perf/pmu-events/empty-pmu-events.c | 108 ++-
 tools/perf/pmu-events/jevents.py | 357 +++
 tools/perf/pmu-events/metric.py  |  79 -
 tools/perf/pmu-events/metric_test.py |  10 +
 tools/perf/pmu-events/pmu-events.h   |  26 +-
 tools/perf/tests/expand-cgroup.c |   4 +-
 tools/perf/tests/parse-metric.c  |   4 +-
 tools/perf/tests/pmu-events.c|  69 ++---
 tools/perf/util/cgroup.c |   1 -
 tools/perf/util/evsel.c  |   2 -
 tools/perf/util/evsel.h  |   2 -
 tools/perf/util/expr.h   |   1 +
 tools/perf/util/expr.l   |   8 +-
 tools/perf/util/metricgroup.c| 207 +++--
 tools/perf/util/metricgroup.h|   4 +-
 tools/perf/util/parse-events.c   |   2 -
 tools/perf/util/pmu.c|  44 +--
 tools/perf/util/pmu.h|  10 +-
 tools/perf/util/print-events.c   |  32 +-
 tools/perf/util/print-events.h   |   3 +-
 tools/perf/util/python.c |   7 -
 tools/perf/util/stat-shadow.c| 112 ---
 tools/perf/util/stat.h   |   1 -
 28 files changed, 666 insertions(+), 466 deletions(-)

-- 
2.39.1.456.gfc5497dd1b-goog



Re: [PATCH v3 2/7] mm: introduce vma->vm_flags wrapper functions

2023-01-25 Thread Suren Baghdasaryan
On Wed, Jan 25, 2023 at 4:28 PM Andrew Morton  wrote:
>
> On Wed, 25 Jan 2023 15:35:49 -0800 Suren Baghdasaryan  
> wrote:
>
> > --- a/include/linux/mm_types.h
> > +++ b/include/linux/mm_types.h
> > @@ -491,7 +491,15 @@ struct vm_area_struct {
> >* See vmf_insert_mixed_prot() for discussion.
> >*/
> >   pgprot_t vm_page_prot;
> > - unsigned long vm_flags; /* Flags, see mm.h. */
> > +
> > + /*
> > +  * Flags, see mm.h.
> > +  * To modify use {init|reset|set|clear|mod}_vm_flags() functions.
> > +  */
> > + union {
> > + const vm_flags_t vm_flags;
> > + vm_flags_t __private __vm_flags;
> > + };
>
> Typically when making a change like this we'll rename the affected
> field/variable/function/etc.  This will reliably and deliberately break
> unconverted usage sites.
>
> This const trick will get us partway there, by breaking setters.  But
> renaming it will break both setters and getters.

My intent here is to break setters but to allow getters to keep
reading vma->vm_flags directly. We could provide get_vm_flags() and
convert all getters as well but it would introduce a huge additional
churn (800+ hits) with no obvious benefits, I think. Does that clarify
the intent of this trick?

>
> --
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to kernel-team+unsubscr...@android.com.
>


Re: [PATCH v3 2/7] mm: introduce vma->vm_flags wrapper functions

2023-01-25 Thread Suren Baghdasaryan
On Wed, Jan 25, 2023 at 4:24 PM Andrew Morton  wrote:
>
> On Wed, 25 Jan 2023 15:35:49 -0800 Suren Baghdasaryan  
> wrote:
>
> > vm_flags are among VMA attributes which affect decisions like VMA merging
> > and splitting. Therefore all vm_flags modifications are performed after
> > taking exclusive mmap_lock to prevent vm_flags updates racing with such
> > operations. Introduce modifier functions for vm_flags to be used whenever
> > flags are updated. This way we can better check and control correct
> > locking behavior during these updates.
> >
> > ...
> >
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > +static inline void init_vm_flags(struct vm_area_struct *vma,
> > +static inline void reset_vm_flags(struct vm_area_struct *vma,
> > +static inline void set_vm_flags(struct vm_area_struct *vma,
> > +static inline void clear_vm_flags(struct vm_area_struct *vma,
> > +static inline void mod_vm_flags(struct vm_area_struct *vma,
>
> vm_flags_init(), vm_flags_reset(), etc?
>
> This would be more idiomatic and I do think the most-significant-first
> naming style is preferable.

Thanks for the suggestion! I will rename them in the next version.

>
> --
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to kernel-team+unsubscr...@android.com.
>


Re: [PATCH v3 1/7] kernel/fork: convert vma assignment to a memcpy

2023-01-25 Thread Suren Baghdasaryan
On Wed, Jan 25, 2023 at 4:22 PM Andrew Morton  wrote:
>
> On Wed, 25 Jan 2023 15:35:48 -0800 Suren Baghdasaryan  
> wrote:
>
> > Convert vma assignment in vm_area_dup() to a memcpy() to prevent compiler
> > errors when we add a const modifier to vma->vm_flags.
> >
> > ...
> >
> > --- a/kernel/fork.c
> > +++ b/kernel/fork.c
> > @@ -482,7 +482,7 @@ struct vm_area_struct *vm_area_dup(struct 
> > vm_area_struct *orig)
> >* orig->shared.rb may be modified concurrently, but the clone
> >* will be reinitialized.
> >*/
> > - *new = data_race(*orig);
> > + memcpy(new, orig, sizeof(*new));
>
> The data_race() removal is unchangelogged?

True. I'll add a note in the changelog about that. Ideally I would
like to preserve it but I could not find a way to do that.

>
> >   INIT_LIST_HEAD(>anon_vma_chain);
> >   dup_anon_vma_name(orig, new);
> >   }
>


Re: [PATCH v3 2/7] mm: introduce vma->vm_flags wrapper functions

2023-01-25 Thread Andrew Morton
On Wed, 25 Jan 2023 15:35:49 -0800 Suren Baghdasaryan  wrote:

> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -491,7 +491,15 @@ struct vm_area_struct {
>* See vmf_insert_mixed_prot() for discussion.
>*/
>   pgprot_t vm_page_prot;
> - unsigned long vm_flags; /* Flags, see mm.h. */
> +
> + /*
> +  * Flags, see mm.h.
> +  * To modify use {init|reset|set|clear|mod}_vm_flags() functions.
> +  */
> + union {
> + const vm_flags_t vm_flags;
> + vm_flags_t __private __vm_flags;
> + };

Typically when making a change like this we'll rename the affected
field/variable/function/etc.  This will reliably and deliberately break
unconverted usage sites.

This const trick will get us partway there, by breaking setters.  But
renaming it will break both setters and getters.



Re: [PATCH v3 2/7] mm: introduce vma->vm_flags wrapper functions

2023-01-25 Thread Andrew Morton
On Wed, 25 Jan 2023 15:35:49 -0800 Suren Baghdasaryan  wrote:

> vm_flags are among VMA attributes which affect decisions like VMA merging
> and splitting. Therefore all vm_flags modifications are performed after
> taking exclusive mmap_lock to prevent vm_flags updates racing with such
> operations. Introduce modifier functions for vm_flags to be used whenever
> flags are updated. This way we can better check and control correct
> locking behavior during these updates.
> 
> ...
>
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> +static inline void init_vm_flags(struct vm_area_struct *vma,
> +static inline void reset_vm_flags(struct vm_area_struct *vma,
> +static inline void set_vm_flags(struct vm_area_struct *vma,
> +static inline void clear_vm_flags(struct vm_area_struct *vma,
> +static inline void mod_vm_flags(struct vm_area_struct *vma,

vm_flags_init(), vm_flags_reset(), etc?

This would be more idiomatic and I do think the most-significant-first
naming style is preferable.



Re: [PATCH v3 1/7] kernel/fork: convert vma assignment to a memcpy

2023-01-25 Thread Andrew Morton
On Wed, 25 Jan 2023 15:35:48 -0800 Suren Baghdasaryan  wrote:

> Convert vma assignment in vm_area_dup() to a memcpy() to prevent compiler
> errors when we add a const modifier to vma->vm_flags.
> 
> ...
>
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -482,7 +482,7 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct 
> *orig)
>* orig->shared.rb may be modified concurrently, but the clone
>* will be reinitialized.
>*/
> - *new = data_race(*orig);
> + memcpy(new, orig, sizeof(*new));

The data_race() removal is unchangelogged?

>   INIT_LIST_HEAD(>anon_vma_chain);
>   dup_anon_vma_name(orig, new);
>   }



Re: [PATCH v9 08/10] arm64: dts: ls1088a: Add serdes bindings

2023-01-25 Thread Shawn Guo
On Thu, Dec 29, 2022 at 07:01:37PM -0500, Sean Anderson wrote:
> This adds bindings for the SerDes devices. They are disabled by default
> to prevent any breakage on existing boards.
> 
> Signed-off-by: Sean Anderson 
> ---
> 
> (no changes since v4)
> 
> Changes in v4:
> - Convert to new bindings
> 
> Changes in v3:
> - New
> 
>  arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi | 18 ++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi 
> b/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi
> index 260d045dbd9a..ecf9d830e36f 100644
> --- a/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi
> +++ b/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi
> @@ -238,6 +238,24 @@ reset: syscon@1e6 {
>   reg = <0x0 0x1e6 0x0 0x1>;
>   };
>  
> + serdes1: serdes@1ea {
> + #address-cells = <1>;
> + #size-cells = <0>;
> + #clock-cells = <1>;
> + compatible = "fsl,ls1088a-serdes", "fsl,lynx-10g";
> + reg = <0x0 0x1ea 0x0 0x2000>;

Can we start the properties with compatible (and reg) like most of other
device nodes?

Shawn

> + status = "disabled";
> + };
> +
> + serdes2: serdes@1eb {
> + #address-cells = <1>;
> + #size-cells = <0>;
> + #clock-cells = <1>;
> + compatible = "fsl,ls1088a-serdes", "fsl,lynx-10g";
> + reg = <0x0 0x1eb 0x0 0x2000>;
> + status = "disabled";
> + };
> +
>   isc: syscon@1f7 {
>   compatible = "fsl,ls1088a-isc", "syscon";
>   reg = <0x0 0x1f7 0x0 0x1>;
> -- 
> 2.35.1.1320.gc452695387.dirty
> 


Re: [PATCH v9 06/10] arm64: dts: ls1046a: Add serdes bindings

2023-01-25 Thread Shawn Guo
On Thu, Dec 29, 2022 at 07:01:35PM -0500, Sean Anderson wrote:
> This adds bindings for the SerDes devices. They are disabled by default

s/bindings/descriptions?

The term "bindings" generally means the schema/doc in
Documentation/devicetree/bindings/.

Shawn

> to prevent any breakage on existing boards.
> 
> Signed-off-by: Sean Anderson 
> ---
> 
> (no changes since v4)
> 
> Changes in v4:
> - Convert to new bindings
> 
> Changes in v3:
> - Describe modes in device tree
> 
> Changes in v2:
> - Use one phy cell for SerDes1, since no lanes can be grouped
> - Disable SerDes by default to prevent breaking boards inadvertently.
> 
>  arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi | 18 ++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi 
> b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
> index a01e3cfec77f..12adccd5caae 100644
> --- a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
> +++ b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
> @@ -424,6 +424,24 @@ sfp: efuse@1e8 {
>   clock-names = "sfp";
>   };
>  
> + serdes1: serdes@1ea {
> + #address-cells = <1>;
> + #size-cells = <0>;
> + #clock-cells = <1>;
> + compatible = "fsl,ls1046a-serdes", "fsl,lynx-10g";
> + reg = <0x0 0x1ea 0x0 0x2000>;
> + status = "disabled";
> + };
> +
> + serdes2: serdes@1eb {
> + #address-cells = <1>;
> + #size-cells = <0>;
> + #clock-cells = <1>;
> + compatible = "fsl,ls1046a-serdes", "fsl,lynx-10g";
> + reg = <0x0 0x1eb 0x0 0x2000>;
> + status = "disabled";
> + };
> +
>   dcfg: dcfg@1ee {
>   compatible = "fsl,ls1046a-dcfg", "syscon";
>   reg = <0x0 0x1ee 0x0 0x1000>;
> -- 
> 2.35.1.1320.gc452695387.dirty
> 


Re: [PATCH v9 07/10] arm64: dts: ls1046ardb: Add serdes bindings

2023-01-25 Thread Shawn Guo
On Thu, Dec 29, 2022 at 07:01:36PM -0500, Sean Anderson wrote:
> This adds appropriate bindings for the macs which use the SerDes. The
> 156.25MHz fixed clock is a crystal. The 100MHz clocks (there are
> actually 3) come from a Renesas 6V49205B at address 69 on i2c0. There is
> no driver for this device (and as far as I know all you can do with the
> 100MHz clocks is gate them), so I have chosen to model it as a single
> fixed clock.
> 
> Note: the SerDes1 lane numbering for the LS1046A is *reversed*.
> This means that Lane A (what the driver thinks is lane 0) uses pins
> SD1_TX3_P/N.
> 
> Because this will break ethernet if the serdes is not enabled, enable
> the serdes driver by default on Layerscape.
> 
> Signed-off-by: Sean Anderson 
> ---
> This depends on [1].
> 
> [1] 
> https://lore.kernel.org/netdev/20220804194705.459670-4-sean.ander...@seco.com/
> 
> Changes in v9:
> - Fix name of phy mode node
> - phy-type -> fsl,phy
> 
> Changes in v8:
> - Rename serdes phy handles to use _A, _B, etc. instead of _0, _1, etc.
>   This should help remind readers that the numbering corresponds to the
>   physical layout of the registers, and not the lane (pin) number.
> 
> Changes in v6:
> - XGI.9 -> XFI.9
> 
> Changes in v4:
> - Convert to new bindings
> 
>  .../boot/dts/freescale/fsl-ls1046a-rdb.dts| 112 ++
>  drivers/phy/freescale/Kconfig |   1 +

The phy driver Kconfig change shouldn't be part of this patch.

>  2 files changed, 113 insertions(+)
> 
> diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1046a-rdb.dts 
> b/arch/arm64/boot/dts/freescale/fsl-ls1046a-rdb.dts
> index 7025aad8ae89..534f19855b47 100644
> --- a/arch/arm64/boot/dts/freescale/fsl-ls1046a-rdb.dts
> +++ b/arch/arm64/boot/dts/freescale/fsl-ls1046a-rdb.dts
> @@ -10,6 +10,8 @@
>  
>  /dts-v1/;
>  
> +#include 
> +
>  #include "fsl-ls1046a.dtsi"
>  
>  / {
> @@ -26,8 +28,110 @@ aliases {
>   chosen {
>   stdout-path = "serial0:115200n8";
>   };
> +
> + clocks {

Drop this container node.

Shawn

> + clk_100mhz: clock-100mhz {
> + compatible = "fixed-clock";
> + #clock-cells = <0>;
> + clock-frequency = <1>;
> + };
> +
> + clk_156mhz: clock-156mhz {
> + compatible = "fixed-clock";
> + #clock-cells = <0>;
> + clock-frequency = <15625>;
> + };
> + };
>  };
>  
> + {
> + clocks = <_100mhz>, <_156mhz>;
> + clock-names = "ref0", "ref1";
> + status = "okay";
> +
> + /*
> +  * XXX: Lane A uses pins SD1_RX3_P/N! That is, the lane numbers and pin
> +  * numbers are _reversed_. In addition, the PCCR documentation is
> +  * _inconsistent_ in its usage of these terms!
> +  *
> +  * PCCR "Lane 0" refers to...
> +  *  =
> +  *0 Lane A
> +  *2 Lane A
> +  *8 Lane A
> +  *9 Lane A
> +  *B Lane D!
> +  */
> + serdes1_A: phy@0 {
> + #phy-cells = <0>;
> + reg = <0>;
> +
> + /* SGMII.6 */
> + sgmii-0 {
> + fsl,pccr = <0x8>;
> + fsl,index = <0>;
> + fsl,cfg = <0x1>;
> + fsl,type = ;
> + };
> + };
> +
> + serdes1_B: phy@1 {
> + #phy-cells = <0>;
> + reg = <1>;
> +
> + /* SGMII.5 */
> + sgmii-1 {
> + fsl,pccr = <0x8>;
> + fsl,index = <1>;
> + fsl,cfg = <0x1>;
> + fsl,type = ;
> + };
> + };
> +
> + serdes1_C: phy@2 {
> + #phy-cells = <0>;
> + reg = <2>;
> +
> + /* SGMII.10 */
> + sgmii-2 {
> + fsl,pccr = <0x8>;
> + fsl,index = <2>;
> + fsl,cfg = <0x1>;
> + fsl,type = ;
> + };
> +
> + /* XFI.10 */
> + xfi-0 {
> + fsl,pccr = <0xb>;
> + fsl,index = <0>;
> + fsl,cfg = <0x2>;
> + fsl,type = ;
> + };
> + };
> +
> + serdes1_D: phy@3 {
> + #phy-cells = <0>;
> + reg = <3>;
> +
> + /* SGMII.9 */
> + sgmii-3 {
> + fsl,pccr = <0x8>;
> + fsl,index = <3>;
> + fsl,cfg = <0x1>;
> + fsl,type = ;
> + };
> +
> + /* XFI.9 */
> + xfi-1 {
> + fsl,pccr = <0xb>;
> + fsl,index = <1>;
> + fsl,cfg = <0x1>;
> + fsl,type = ;
> + };
> + };
> +};
> +
> +
>   {
>   status = "okay";
>  };
> @@ -140,21 +244,29 @@ ethernet@e6000 {
>   

[PATCH v3 7/7] mm: export dump_mm()

2023-01-25 Thread Suren Baghdasaryan
mmap_assert_write_locked() is used in vm_flags modifiers. Because
mmap_assert_write_locked() uses dump_mm() and vm_flags are sometimes
modified from inside a module, it's necessary to export dump_mm()
function.

Signed-off-by: Suren Baghdasaryan 
Acked-by: Michal Hocko 
---
 mm/debug.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/debug.c b/mm/debug.c
index 9d3d893dc7f4..96d594e16292 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -215,6 +215,7 @@ void dump_mm(const struct mm_struct *mm)
mm->def_flags, >def_flags
);
 }
+EXPORT_SYMBOL(dump_mm);
 
 static bool page_init_poisoning __read_mostly = true;
 
-- 
2.39.1



[PATCH v3 6/7] mm: introduce mod_vm_flags_nolock and use it in untrack_pfn

2023-01-25 Thread Suren Baghdasaryan
In cases when VMA flags are modified after VMA was isolated and mmap_lock
was downgraded, flags modifications would result in an assertion because
mmap write lock is not held.
Introduce mod_vm_flags_nolock to be used in such situation, when VMA is
not part of VMA tree and locking it is not required.
Pass a hint to untrack_pfn to conditionally use mod_vm_flags_nolock for
flags modification and to avoid assertion.

Signed-off-by: Suren Baghdasaryan 
---
 arch/x86/mm/pat/memtype.c | 10 +++---
 include/linux/mm.h| 16 +---
 include/linux/pgtable.h   |  5 +++--
 mm/memory.c   | 13 +++--
 mm/memremap.c |  4 ++--
 mm/mmap.c | 16 ++--
 6 files changed, 42 insertions(+), 22 deletions(-)

diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c
index ae9645c900fa..d8adc0b42cf2 100644
--- a/arch/x86/mm/pat/memtype.c
+++ b/arch/x86/mm/pat/memtype.c
@@ -1046,7 +1046,7 @@ void track_pfn_insert(struct vm_area_struct *vma, 
pgprot_t *prot, pfn_t pfn)
  * can be for the entire vma (in which case pfn, size are zero).
  */
 void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
-unsigned long size)
+unsigned long size, bool mm_wr_locked)
 {
resource_size_t paddr;
unsigned long prot;
@@ -1065,8 +1065,12 @@ void untrack_pfn(struct vm_area_struct *vma, unsigned 
long pfn,
size = vma->vm_end - vma->vm_start;
}
free_pfn_range(paddr, size);
-   if (vma)
-   clear_vm_flags(vma, VM_PAT);
+   if (vma) {
+   if (mm_wr_locked)
+   clear_vm_flags(vma, VM_PAT);
+   else
+   mod_vm_flags_nolock(vma, 0, VM_PAT);
+   }
 }
 
 /*
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1ab5f73360f2..86bf043136f3 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -656,12 +656,22 @@ static inline void clear_vm_flags(struct vm_area_struct 
*vma,
ACCESS_PRIVATE(vma, __vm_flags) &= ~flags;
 }
 
+/*
+ * Use only if VMA has been previously isolated, is not part of the VMA tree
+ * and therefore needs no locking.
+ */
+static inline void mod_vm_flags_nolock(struct vm_area_struct *vma,
+  vm_flags_t set, vm_flags_t clear)
+{
+   ACCESS_PRIVATE(vma, __vm_flags) |= set;
+   ACCESS_PRIVATE(vma, __vm_flags) &= ~clear;
+}
+
 static inline void mod_vm_flags(struct vm_area_struct *vma,
vm_flags_t set, vm_flags_t clear)
 {
mmap_assert_write_locked(vma->vm_mm);
-   ACCESS_PRIVATE(vma, __vm_flags) |= set;
-   ACCESS_PRIVATE(vma, __vm_flags) &= ~clear;
+   mod_vm_flags_nolock(vma, set, clear);
 }
 
 static inline void vma_set_anonymous(struct vm_area_struct *vma)
@@ -2087,7 +2097,7 @@ static inline void zap_vma_pages(struct vm_area_struct 
*vma)
 }
 void unmap_vmas(struct mmu_gather *tlb, struct maple_tree *mt,
struct vm_area_struct *start_vma, unsigned long start,
-   unsigned long end);
+   unsigned long end, bool mm_wr_locked);
 
 struct mmu_notifier_range;
 
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 5fd45454c073..c63cd44777ec 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1185,7 +1185,8 @@ static inline int track_pfn_copy(struct vm_area_struct 
*vma)
  * can be for the entire vma (in which case pfn, size are zero).
  */
 static inline void untrack_pfn(struct vm_area_struct *vma,
-  unsigned long pfn, unsigned long size)
+  unsigned long pfn, unsigned long size,
+  bool mm_wr_locked)
 {
 }
 
@@ -1203,7 +1204,7 @@ extern void track_pfn_insert(struct vm_area_struct *vma, 
pgprot_t *prot,
 pfn_t pfn);
 extern int track_pfn_copy(struct vm_area_struct *vma);
 extern void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
-   unsigned long size);
+   unsigned long size, bool mm_wr_locked);
 extern void untrack_pfn_moved(struct vm_area_struct *vma);
 #endif
 
diff --git a/mm/memory.c b/mm/memory.c
index d6902065e558..5b11b50e2c4a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1613,7 +1613,7 @@ void unmap_page_range(struct mmu_gather *tlb,
 static void unmap_single_vma(struct mmu_gather *tlb,
struct vm_area_struct *vma, unsigned long start_addr,
unsigned long end_addr,
-   struct zap_details *details)
+   struct zap_details *details, bool mm_wr_locked)
 {
unsigned long start = max(vma->vm_start, start_addr);
unsigned long end;
@@ -1628,7 +1628,7 @@ static void unmap_single_vma(struct mmu_gather *tlb,
uprobe_munmap(vma, start, end);
 
if (unlikely(vma->vm_flags & VM_PFNMAP))
-   untrack_pfn(vma, 0, 0);
+   untrack_pfn(vma, 0, 

[PATCH v3 5/7] mm: replace vma->vm_flags indirect modification in ksm_madvise

2023-01-25 Thread Suren Baghdasaryan
Replace indirect modifications to vma->vm_flags with calls to modifier
functions to be able to track flag changes and to keep vma locking
correctness.

Signed-off-by: Suren Baghdasaryan 
Acked-by: Michal Hocko 
---
 arch/powerpc/kvm/book3s_hv_uvmem.c | 5 -
 arch/s390/mm/gmap.c| 5 -
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 1d67baa5557a..325a7a47d348 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -393,6 +393,7 @@ static int kvmppc_memslot_page_merge(struct kvm *kvm,
 {
unsigned long gfn = memslot->base_gfn;
unsigned long end, start = gfn_to_hva(kvm, gfn);
+   unsigned long vm_flags;
int ret = 0;
struct vm_area_struct *vma;
int merge_flag = (merge) ? MADV_MERGEABLE : MADV_UNMERGEABLE;
@@ -409,12 +410,14 @@ static int kvmppc_memslot_page_merge(struct kvm *kvm,
ret = H_STATE;
break;
}
+   vm_flags = vma->vm_flags;
ret = ksm_madvise(vma, vma->vm_start, vma->vm_end,
- merge_flag, >vm_flags);
+ merge_flag, _flags);
if (ret) {
ret = H_STATE;
break;
}
+   reset_vm_flags(vma, vm_flags);
start = vma->vm_end;
} while (end > vma->vm_end);
 
diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index 3a695b8a1e3c..d5eb47dcdacb 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -2587,14 +2587,17 @@ int gmap_mark_unmergeable(void)
 {
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma;
+   unsigned long vm_flags;
int ret;
VMA_ITERATOR(vmi, mm, 0);
 
for_each_vma(vmi, vma) {
+   vm_flags = vma->vm_flags;
ret = ksm_madvise(vma, vma->vm_start, vma->vm_end,
- MADV_UNMERGEABLE, >vm_flags);
+ MADV_UNMERGEABLE, _flags);
if (ret)
return ret;
+   reset_vm_flags(vma, vm_flags);
}
mm->def_flags &= ~VM_MERGEABLE;
return 0;
-- 
2.39.1



[PATCH v3 4/7] mm: replace vma->vm_flags direct modifications with modifier calls

2023-01-25 Thread Suren Baghdasaryan
Replace direct modifications to vma->vm_flags with calls to modifier
functions to be able to track flag changes and to keep vma locking
correctness.

Signed-off-by: Suren Baghdasaryan 
Acked-by: Michal Hocko 
---
 arch/arm/kernel/process.c  |  2 +-
 arch/ia64/mm/init.c|  8 
 arch/loongarch/include/asm/tlb.h   |  2 +-
 arch/powerpc/kvm/book3s_xive_native.c  |  2 +-
 arch/powerpc/mm/book3s64/subpage_prot.c|  2 +-
 arch/powerpc/platforms/book3s/vas-api.c|  2 +-
 arch/powerpc/platforms/cell/spufs/file.c   | 14 +++---
 arch/s390/mm/gmap.c|  3 +--
 arch/x86/entry/vsyscall/vsyscall_64.c  |  2 +-
 arch/x86/kernel/cpu/sgx/driver.c   |  2 +-
 arch/x86/kernel/cpu/sgx/virt.c |  2 +-
 arch/x86/mm/pat/memtype.c  |  6 +++---
 arch/x86/um/mem_32.c   |  2 +-
 drivers/acpi/pfr_telemetry.c   |  2 +-
 drivers/android/binder.c   |  3 +--
 drivers/char/mspec.c   |  2 +-
 drivers/crypto/hisilicon/qm.c  |  2 +-
 drivers/dax/device.c   |  2 +-
 drivers/dma/idxd/cdev.c|  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c|  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c   |  4 ++--
 drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c  |  4 ++--
 drivers/gpu/drm/amd/amdkfd/kfd_events.c|  4 ++--
 drivers/gpu/drm/amd/amdkfd/kfd_process.c   |  4 ++--
 drivers/gpu/drm/drm_gem.c  |  2 +-
 drivers/gpu/drm/drm_gem_dma_helper.c   |  3 +--
 drivers/gpu/drm/drm_gem_shmem_helper.c |  2 +-
 drivers/gpu/drm/drm_vm.c   |  8 
 drivers/gpu/drm/etnaviv/etnaviv_gem.c  |  2 +-
 drivers/gpu/drm/exynos/exynos_drm_gem.c|  4 ++--
 drivers/gpu/drm/gma500/framebuffer.c   |  2 +-
 drivers/gpu/drm/i810/i810_dma.c|  2 +-
 drivers/gpu/drm/i915/gem/i915_gem_mman.c   |  4 ++--
 drivers/gpu/drm/mediatek/mtk_drm_gem.c |  2 +-
 drivers/gpu/drm/msm/msm_gem.c  |  2 +-
 drivers/gpu/drm/omapdrm/omap_gem.c |  3 +--
 drivers/gpu/drm/rockchip/rockchip_drm_gem.c|  3 +--
 drivers/gpu/drm/tegra/gem.c|  5 ++---
 drivers/gpu/drm/ttm/ttm_bo_vm.c|  3 +--
 drivers/gpu/drm/virtio/virtgpu_vram.c  |  2 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c   |  2 +-
 drivers/gpu/drm/xen/xen_drm_front_gem.c|  3 +--
 drivers/hsi/clients/cmt_speech.c   |  2 +-
 drivers/hwtracing/intel_th/msu.c   |  2 +-
 drivers/hwtracing/stm/core.c   |  2 +-
 drivers/infiniband/hw/hfi1/file_ops.c  |  4 ++--
 drivers/infiniband/hw/mlx5/main.c  |  4 ++--
 drivers/infiniband/hw/qib/qib_file_ops.c   | 13 ++---
 drivers/infiniband/hw/usnic/usnic_ib_verbs.c   |  2 +-
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_verbs.c|  2 +-
 .../media/common/videobuf2/videobuf2-dma-contig.c  |  2 +-
 drivers/media/common/videobuf2/videobuf2-vmalloc.c |  2 +-
 drivers/media/v4l2-core/videobuf-dma-contig.c  |  2 +-
 drivers/media/v4l2-core/videobuf-dma-sg.c  |  4 ++--
 drivers/media/v4l2-core/videobuf-vmalloc.c |  2 +-
 drivers/misc/cxl/context.c |  2 +-
 drivers/misc/habanalabs/common/memory.c|  2 +-
 drivers/misc/habanalabs/gaudi/gaudi.c  |  4 ++--
 drivers/misc/habanalabs/gaudi2/gaudi2.c|  8 
 drivers/misc/habanalabs/goya/goya.c|  4 ++--
 drivers/misc/ocxl/context.c|  4 ++--
 drivers/misc/ocxl/sysfs.c  |  2 +-
 drivers/misc/open-dice.c   |  4 ++--
 drivers/misc/sgi-gru/grufile.c |  4 ++--
 drivers/misc/uacce/uacce.c |  2 +-
 drivers/sbus/char/oradax.c |  2 +-
 drivers/scsi/cxlflash/ocxl_hw.c|  2 +-
 drivers/scsi/sg.c  |  2 +-
 drivers/staging/media/atomisp/pci/hmm/hmm_bo.c |  2 +-
 drivers/staging/media/deprecated/meye/meye.c   |  4 ++--
 .../media/deprecated/stkwebcam/stk-webcam.c|  2 +-
 drivers/target/target_core_user.c  |  2 +-
 drivers/uio/uio.c  |  2 +-
 drivers/usb/core/devio.c   |  3 +--
 drivers/usb/mon/mon_bin.c  |  3 +--
 drivers/vdpa/vdpa_user/iova_domain.c   |  2 +-
 drivers/vfio/pci/vfio_pci_core.c   |  2 +-
 drivers/vhost/vdpa.c   |  2 +-
 

[PATCH v3 3/7] mm: replace VM_LOCKED_CLEAR_MASK with VM_LOCKED_MASK

2023-01-25 Thread Suren Baghdasaryan
To simplify the usage of VM_LOCKED_CLEAR_MASK in clear_vm_flags(),
replace it with VM_LOCKED_MASK bitmask and convert all users.

Signed-off-by: Suren Baghdasaryan 
Acked-by: Michal Hocko 
---
 include/linux/mm.h | 4 ++--
 kernel/fork.c  | 2 +-
 mm/hugetlb.c   | 4 ++--
 mm/mlock.c | 6 +++---
 mm/mmap.c  | 6 +++---
 mm/mremap.c| 2 +-
 6 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index bf16ddd544a5..cccbc2811827 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -421,8 +421,8 @@ extern unsigned int kobjsize(const void *objp);
 /* This mask defines which mm->def_flags a process can inherit its parent */
 #define VM_INIT_DEF_MASK   VM_NOHUGEPAGE
 
-/* This mask is used to clear all the VMA flags used by mlock */
-#define VM_LOCKED_CLEAR_MASK   (~(VM_LOCKED | VM_LOCKONFAULT))
+/* This mask represents all the VMA flag bits used by mlock */
+#define VM_LOCKED_MASK (VM_LOCKED | VM_LOCKONFAULT)
 
 /* Arch-specific flags to clear when updating VM flags on protection change */
 #ifndef VM_ARCH_CLEAR
diff --git a/kernel/fork.c b/kernel/fork.c
index a531901859d9..4f097999a570 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -669,7 +669,7 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
tmp->anon_vma = NULL;
} else if (anon_vma_fork(tmp, mpnt))
goto fail_nomem_anon_vma_fork;
-   tmp->vm_flags &= ~(VM_LOCKED | VM_LOCKONFAULT);
+   clear_vm_flags(tmp, VM_LOCKED_MASK);
file = tmp->vm_file;
if (file) {
struct address_space *mapping = file->f_mapping;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index d20c8b09890e..4ecdbad9a451 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6973,8 +6973,8 @@ static unsigned long page_table_shareable(struct 
vm_area_struct *svma,
unsigned long s_end = sbase + PUD_SIZE;
 
/* Allow segments to share if only one is marked locked */
-   unsigned long vm_flags = vma->vm_flags & VM_LOCKED_CLEAR_MASK;
-   unsigned long svm_flags = svma->vm_flags & VM_LOCKED_CLEAR_MASK;
+   unsigned long vm_flags = vma->vm_flags & ~VM_LOCKED_MASK;
+   unsigned long svm_flags = svma->vm_flags & ~VM_LOCKED_MASK;
 
/*
 * match the virtual addresses, permission and the alignment of the
diff --git a/mm/mlock.c b/mm/mlock.c
index 0336f52e03d7..5c4fff93cd6b 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -497,7 +497,7 @@ static int apply_vma_lock_flags(unsigned long start, size_t 
len,
if (vma->vm_start != tmp)
return -ENOMEM;
 
-   newflags = vma->vm_flags & VM_LOCKED_CLEAR_MASK;
+   newflags = vma->vm_flags & ~VM_LOCKED_MASK;
newflags |= flags;
/* Here we know that  vma->vm_start <= nstart < vma->vm_end. */
tmp = vma->vm_end;
@@ -661,7 +661,7 @@ static int apply_mlockall_flags(int flags)
struct vm_area_struct *vma, *prev = NULL;
vm_flags_t to_add = 0;
 
-   current->mm->def_flags &= VM_LOCKED_CLEAR_MASK;
+   current->mm->def_flags &= ~VM_LOCKED_MASK;
if (flags & MCL_FUTURE) {
current->mm->def_flags |= VM_LOCKED;
 
@@ -681,7 +681,7 @@ static int apply_mlockall_flags(int flags)
for_each_vma(vmi, vma) {
vm_flags_t newflags;
 
-   newflags = vma->vm_flags & VM_LOCKED_CLEAR_MASK;
+   newflags = vma->vm_flags & ~VM_LOCKED_MASK;
newflags |= to_add;
 
/* Ignore errors */
diff --git a/mm/mmap.c b/mm/mmap.c
index d4abc6feced1..323bd253b25a 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2671,7 +2671,7 @@ unsigned long mmap_region(struct file *file, unsigned 
long addr,
if ((vm_flags & VM_SPECIAL) || vma_is_dax(vma) ||
is_vm_hugetlb_page(vma) ||
vma == get_gate_vma(current->mm))
-   vma->vm_flags &= VM_LOCKED_CLEAR_MASK;
+   clear_vm_flags(vma, VM_LOCKED_MASK);
else
mm->locked_vm += (len >> PAGE_SHIFT);
}
@@ -3340,8 +3340,8 @@ static struct vm_area_struct *__install_special_mapping(
vma->vm_start = addr;
vma->vm_end = addr + len;
 
-   vma->vm_flags = vm_flags | mm->def_flags | VM_DONTEXPAND | VM_SOFTDIRTY;
-   vma->vm_flags &= VM_LOCKED_CLEAR_MASK;
+   init_vm_flags(vma, (vm_flags | mm->def_flags |
+ VM_DONTEXPAND | VM_SOFTDIRTY) & ~VM_LOCKED_MASK);
vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
 
vma->vm_ops = ops;
diff --git a/mm/mremap.c b/mm/mremap.c
index 1b3ee02bead7..35db9752cb6a 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -687,7 +687,7 @@ static unsigned long move_vma(struct vm_area_struct *vma,
 
if (unlikely(!err && (flags & 

[PATCH v3 2/7] mm: introduce vma->vm_flags wrapper functions

2023-01-25 Thread Suren Baghdasaryan
vm_flags are among VMA attributes which affect decisions like VMA merging
and splitting. Therefore all vm_flags modifications are performed after
taking exclusive mmap_lock to prevent vm_flags updates racing with such
operations. Introduce modifier functions for vm_flags to be used whenever
flags are updated. This way we can better check and control correct
locking behavior during these updates.

Signed-off-by: Suren Baghdasaryan 
---
 include/linux/mm.h   | 37 +
 include/linux/mm_types.h | 10 +-
 2 files changed, 46 insertions(+), 1 deletion(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index c2f62bdce134..bf16ddd544a5 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -627,6 +627,43 @@ static inline void vma_init(struct vm_area_struct *vma, 
struct mm_struct *mm)
INIT_LIST_HEAD(>anon_vma_chain);
 }
 
+/* Use when VMA is not part of the VMA tree and needs no locking */
+static inline void init_vm_flags(struct vm_area_struct *vma,
+vm_flags_t flags)
+{
+   ACCESS_PRIVATE(vma, __vm_flags) = flags;
+}
+
+/* Use when VMA is part of the VMA tree and modifications need coordination */
+static inline void reset_vm_flags(struct vm_area_struct *vma,
+ vm_flags_t flags)
+{
+   mmap_assert_write_locked(vma->vm_mm);
+   init_vm_flags(vma, flags);
+}
+
+static inline void set_vm_flags(struct vm_area_struct *vma,
+   vm_flags_t flags)
+{
+   mmap_assert_write_locked(vma->vm_mm);
+   ACCESS_PRIVATE(vma, __vm_flags) |= flags;
+}
+
+static inline void clear_vm_flags(struct vm_area_struct *vma,
+ vm_flags_t flags)
+{
+   mmap_assert_write_locked(vma->vm_mm);
+   ACCESS_PRIVATE(vma, __vm_flags) &= ~flags;
+}
+
+static inline void mod_vm_flags(struct vm_area_struct *vma,
+   vm_flags_t set, vm_flags_t clear)
+{
+   mmap_assert_write_locked(vma->vm_mm);
+   ACCESS_PRIVATE(vma, __vm_flags) |= set;
+   ACCESS_PRIVATE(vma, __vm_flags) &= ~clear;
+}
+
 static inline void vma_set_anonymous(struct vm_area_struct *vma)
 {
vma->vm_ops = NULL;
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 2d6d790d9bed..bccbd5896850 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -491,7 +491,15 @@ struct vm_area_struct {
 * See vmf_insert_mixed_prot() for discussion.
 */
pgprot_t vm_page_prot;
-   unsigned long vm_flags; /* Flags, see mm.h. */
+
+   /*
+* Flags, see mm.h.
+* To modify use {init|reset|set|clear|mod}_vm_flags() functions.
+*/
+   union {
+   const vm_flags_t vm_flags;
+   vm_flags_t __private __vm_flags;
+   };
 
/*
 * For areas with an address space and backing store,
-- 
2.39.1



[PATCH v3 1/7] kernel/fork: convert vma assignment to a memcpy

2023-01-25 Thread Suren Baghdasaryan
Convert vma assignment in vm_area_dup() to a memcpy() to prevent compiler
errors when we add a const modifier to vma->vm_flags.

Signed-off-by: Suren Baghdasaryan 
---
 kernel/fork.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index 6683c1b0f460..a531901859d9 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -482,7 +482,7 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct 
*orig)
 * orig->shared.rb may be modified concurrently, but the clone
 * will be reinitialized.
 */
-   *new = data_race(*orig);
+   memcpy(new, orig, sizeof(*new));
INIT_LIST_HEAD(>anon_vma_chain);
dup_anon_vma_name(orig, new);
}
-- 
2.39.1



[PATCH v3 0/7] introduce vm_flags modifier functions

2023-01-25 Thread Suren Baghdasaryan
This patchset was originally published as a part of per-VMA locking [1] and
was split after suggestion that it's viable on its own and to facilitate
the review process. It is now a preprequisite for the next version of per-VMA
lock patchset, which reuses vm_flags modifier functions to lock the VMA when
vm_flags are being updated.

VMA vm_flags modifications are usually done under exclusive mmap_lock
protection because this attrubute affects other decisions like VMA merging
or splitting and races should be prevented. Introduce vm_flags modifier
functions to enforce correct locking.

The patchset applies cleanly over mm-unstable branch of mm tree.

My apologies for an extremely large distribution list. The patch touches
lots of files and many are in arch/ and drivers/.

Changes since v2 [2]
- Add an additional patch to convert vma assignment to a memcpy.
- Change vm_flags to a union of __private and const fields,
per Peter Zijlstra and Matthew Wilcox.
- Changed vm_flags type to vm_flags_t, per Matthew Wilcox.
- Removed unnecessary BUG_ON's from ksm_madvise and hugepage_madvise,
per Michal Hocko.
- Documented mod_vm_flags_nolock usage, per Michal Hocko.

[1] https://lore.kernel.org/all/20230109205336.3665937-1-sur...@google.com/
[2] https://lore.kernel.org/lkml/20230125083851.27759-1-sur...@google.com/

Suren Baghdasaryan (7):
  kernel/fork: convert vma assignment to a memcpy
  mm: introduce vma->vm_flags wrapper functions
  mm: replace VM_LOCKED_CLEAR_MASK with VM_LOCKED_MASK
  mm: replace vma->vm_flags direct modifications with modifier calls
  mm: replace vma->vm_flags indirect modification in ksm_madvise
  mm: introduce mod_vm_flags_nolock and use it in untrack_pfn
  mm: export dump_mm()

 arch/arm/kernel/process.c |  2 +-
 arch/ia64/mm/init.c   |  8 +--
 arch/loongarch/include/asm/tlb.h  |  2 +-
 arch/powerpc/kvm/book3s_hv_uvmem.c|  5 +-
 arch/powerpc/kvm/book3s_xive_native.c |  2 +-
 arch/powerpc/mm/book3s64/subpage_prot.c   |  2 +-
 arch/powerpc/platforms/book3s/vas-api.c   |  2 +-
 arch/powerpc/platforms/cell/spufs/file.c  | 14 ++---
 arch/s390/mm/gmap.c   |  8 ++-
 arch/x86/entry/vsyscall/vsyscall_64.c |  2 +-
 arch/x86/kernel/cpu/sgx/driver.c  |  2 +-
 arch/x86/kernel/cpu/sgx/virt.c|  2 +-
 arch/x86/mm/pat/memtype.c | 14 +++--
 arch/x86/um/mem_32.c  |  2 +-
 drivers/acpi/pfr_telemetry.c  |  2 +-
 drivers/android/binder.c  |  3 +-
 drivers/char/mspec.c  |  2 +-
 drivers/crypto/hisilicon/qm.c |  2 +-
 drivers/dax/device.c  |  2 +-
 drivers/dma/idxd/cdev.c   |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c   |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  |  4 +-
 drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c |  4 +-
 drivers/gpu/drm/amd/amdkfd/kfd_events.c   |  4 +-
 drivers/gpu/drm/amd/amdkfd/kfd_process.c  |  4 +-
 drivers/gpu/drm/drm_gem.c |  2 +-
 drivers/gpu/drm/drm_gem_dma_helper.c  |  3 +-
 drivers/gpu/drm/drm_gem_shmem_helper.c|  2 +-
 drivers/gpu/drm/drm_vm.c  |  8 +--
 drivers/gpu/drm/etnaviv/etnaviv_gem.c |  2 +-
 drivers/gpu/drm/exynos/exynos_drm_gem.c   |  4 +-
 drivers/gpu/drm/gma500/framebuffer.c  |  2 +-
 drivers/gpu/drm/i810/i810_dma.c   |  2 +-
 drivers/gpu/drm/i915/gem/i915_gem_mman.c  |  4 +-
 drivers/gpu/drm/mediatek/mtk_drm_gem.c|  2 +-
 drivers/gpu/drm/msm/msm_gem.c |  2 +-
 drivers/gpu/drm/omapdrm/omap_gem.c|  3 +-
 drivers/gpu/drm/rockchip/rockchip_drm_gem.c   |  3 +-
 drivers/gpu/drm/tegra/gem.c   |  5 +-
 drivers/gpu/drm/ttm/ttm_bo_vm.c   |  3 +-
 drivers/gpu/drm/virtio/virtgpu_vram.c |  2 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c  |  2 +-
 drivers/gpu/drm/xen/xen_drm_front_gem.c   |  3 +-
 drivers/hsi/clients/cmt_speech.c  |  2 +-
 drivers/hwtracing/intel_th/msu.c  |  2 +-
 drivers/hwtracing/stm/core.c  |  2 +-
 drivers/infiniband/hw/hfi1/file_ops.c |  4 +-
 drivers/infiniband/hw/mlx5/main.c |  4 +-
 drivers/infiniband/hw/qib/qib_file_ops.c  | 13 ++---
 drivers/infiniband/hw/usnic/usnic_ib_verbs.c  |  2 +-
 .../infiniband/hw/vmw_pvrdma/pvrdma_verbs.c   |  2 +-
 .../common/videobuf2/videobuf2-dma-contig.c   |  2 +-
 .../common/videobuf2/videobuf2-vmalloc.c  |  2 +-
 drivers/media/v4l2-core/videobuf-dma-contig.c |  2 +-
 drivers/media/v4l2-core/videobuf-dma-sg.c |  4 +-
 drivers/media/v4l2-core/videobuf-vmalloc.c|  2 +-
 drivers/misc/cxl/context.c|  2 +-
 drivers/misc/habanalabs/common/memory.c   |  2 +-
 drivers/misc/habanalabs/gaudi/gaudi.c |  4 +-
 

Re: [PATCH v2 1/6] mm: introduce vma->vm_flags modifier functions

2023-01-25 Thread Suren Baghdasaryan
On Wed, Jan 25, 2023 at 10:33 AM Matthew Wilcox  wrote:
>
> On Wed, Jan 25, 2023 at 12:38:46AM -0800, Suren Baghdasaryan wrote:
> > +/* Use when VMA is not part of the VMA tree and needs no locking */
> > +static inline void init_vm_flags(struct vm_area_struct *vma,
> > +  unsigned long flags)
> > +{
> > + vma->vm_flags = flags;
>
> vm_flags are supposed to have type vm_flags_t.  That's not been
> fully realised yet, but perhaps we could avoid making it worse?
>
> >   pgprot_t vm_page_prot;
> > - unsigned long vm_flags; /* Flags, see mm.h. */
> > +
> > + /*
> > +  * Flags, see mm.h.
> > +  * WARNING! Do not modify directly.
> > +  * Use {init|reset|set|clear|mod}_vm_flags() functions instead.
> > +  */
> > + unsigned long vm_flags;
>
> Including changing this line to vm_flags_t

Good point. Will make the change. Thanks!


Re: [PATCH v2 1/6] mm: introduce vma->vm_flags modifier functions

2023-01-25 Thread Suren Baghdasaryan
On Wed, Jan 25, 2023 at 10:37 AM Matthew Wilcox  wrote:
>
> On Wed, Jan 25, 2023 at 08:49:50AM -0800, Suren Baghdasaryan wrote:
> > On Wed, Jan 25, 2023 at 1:10 AM Peter Zijlstra  wrote:
> > > > + /*
> > > > +  * Flags, see mm.h.
> > > > +  * WARNING! Do not modify directly.
> > > > +  * Use {init|reset|set|clear|mod}_vm_flags() functions instead.
> > > > +  */
> > > > + unsigned long vm_flags;
> > >
> > > We have __private and ACCESS_PRIVATE() to help with enforcing this.
> >
> > Thanks for pointing this out, Peter! I guess for that I'll need to
> > convert all read accesses and provide get_vm_flags() too? That will
> > cause some additional churt (a quick search shows 801 hits over 248
> > files) but maybe it's worth it? I think Michal suggested that too in
> > another patch. Should I do that while we are at it?
>
> Here's a trick I saw somewhere in the VFS:
>
> union {
> const vm_flags_t vm_flags;
> vm_flags_t __private __vm_flags;
> };
>
> Now it can be read by anybody but written only by those using
> ACCESS_PRIVATE.

Huh, this is quite nice! I think it does not save us from the cases
when vma->vm_flags is passed by a reference and modified indirectly,
like in ksm_madvise()? Though maybe such usecases are so rare (I found
only 2 cases) that we can ignore this?


Re: [PATCH v2 1/6] mm: introduce vma->vm_flags modifier functions

2023-01-25 Thread Matthew Wilcox
On Wed, Jan 25, 2023 at 08:49:50AM -0800, Suren Baghdasaryan wrote:
> On Wed, Jan 25, 2023 at 1:10 AM Peter Zijlstra  wrote:
> > > + /*
> > > +  * Flags, see mm.h.
> > > +  * WARNING! Do not modify directly.
> > > +  * Use {init|reset|set|clear|mod}_vm_flags() functions instead.
> > > +  */
> > > + unsigned long vm_flags;
> >
> > We have __private and ACCESS_PRIVATE() to help with enforcing this.
> 
> Thanks for pointing this out, Peter! I guess for that I'll need to
> convert all read accesses and provide get_vm_flags() too? That will
> cause some additional churt (a quick search shows 801 hits over 248
> files) but maybe it's worth it? I think Michal suggested that too in
> another patch. Should I do that while we are at it?

Here's a trick I saw somewhere in the VFS:

union {
const vm_flags_t vm_flags;
vm_flags_t __private __vm_flags;
};

Now it can be read by anybody but written only by those using
ACCESS_PRIVATE.


Re: [PATCH v2 1/6] mm: introduce vma->vm_flags modifier functions

2023-01-25 Thread Matthew Wilcox
On Wed, Jan 25, 2023 at 12:38:46AM -0800, Suren Baghdasaryan wrote:
> +/* Use when VMA is not part of the VMA tree and needs no locking */
> +static inline void init_vm_flags(struct vm_area_struct *vma,
> +  unsigned long flags)
> +{
> + vma->vm_flags = flags;

vm_flags are supposed to have type vm_flags_t.  That's not been
fully realised yet, but perhaps we could avoid making it worse?

>   pgprot_t vm_page_prot;
> - unsigned long vm_flags; /* Flags, see mm.h. */
> +
> + /*
> +  * Flags, see mm.h.
> +  * WARNING! Do not modify directly.
> +  * Use {init|reset|set|clear|mod}_vm_flags() functions instead.
> +  */
> + unsigned long vm_flags;

Including changing this line to vm_flags_t


Re: [PATCH v2 4/6] mm: replace vma->vm_flags indirect modification in ksm_madvise

2023-01-25 Thread Suren Baghdasaryan
On Wed, Jan 25, 2023 at 9:08 AM Michal Hocko  wrote:
>
> On Wed 25-01-23 08:57:48, Suren Baghdasaryan wrote:
> > On Wed, Jan 25, 2023 at 1:38 AM 'Michal Hocko' via kernel-team
> >  wrote:
> > >
> > > On Wed 25-01-23 00:38:49, Suren Baghdasaryan wrote:
> > > > Replace indirect modifications to vma->vm_flags with calls to modifier
> > > > functions to be able to track flag changes and to keep vma locking
> > > > correctness. Add a BUG_ON check in ksm_madvise() to catch indirect
> > > > vm_flags modification attempts.
> > >
> > > Those BUG_ONs scream to much IMHO. KSM is an MM internal code so I
> > > gueess we should be willing to trust it.
> >
> > Yes, but I really want to prevent an indirect misuse since it was not
> > easy to find these. If you feel strongly about it I will remove them
> > or if you have a better suggestion I'm all for it.
>
> You can avoid that by making flags inaccesible directly, right?

Ah, you mean Peter's suggestion of using __private? I guess that would
cover it. I'll drop these BUG_ONs in the next version. Thanks!

>
> --
> Michal Hocko
> SUSE Labs


Re: [PATCH v2 4/6] mm: replace vma->vm_flags indirect modification in ksm_madvise

2023-01-25 Thread Michal Hocko
On Wed 25-01-23 08:57:48, Suren Baghdasaryan wrote:
> On Wed, Jan 25, 2023 at 1:38 AM 'Michal Hocko' via kernel-team
>  wrote:
> >
> > On Wed 25-01-23 00:38:49, Suren Baghdasaryan wrote:
> > > Replace indirect modifications to vma->vm_flags with calls to modifier
> > > functions to be able to track flag changes and to keep vma locking
> > > correctness. Add a BUG_ON check in ksm_madvise() to catch indirect
> > > vm_flags modification attempts.
> >
> > Those BUG_ONs scream to much IMHO. KSM is an MM internal code so I
> > gueess we should be willing to trust it.
> 
> Yes, but I really want to prevent an indirect misuse since it was not
> easy to find these. If you feel strongly about it I will remove them
> or if you have a better suggestion I'm all for it.

You can avoid that by making flags inaccesible directly, right?

-- 
Michal Hocko
SUSE Labs


Re: [PATCH v2 5/6] mm: introduce mod_vm_flags_nolock and use it in untrack_pfn

2023-01-25 Thread Suren Baghdasaryan
On Wed, Jan 25, 2023 at 1:42 AM Michal Hocko  wrote:
>
> On Wed 25-01-23 00:38:50, Suren Baghdasaryan wrote:
> > In cases when VMA flags are modified after VMA was isolated and mmap_lock
> > was downgraded, flags modifications would result in an assertion because
> > mmap write lock is not held.
> > Introduce mod_vm_flags_nolock to be used in such situation.
> > Pass a hint to untrack_pfn to conditionally use mod_vm_flags_nolock for
> > flags modification and to avoid assertion.
>
> The changelog nor the documentation of mod_vm_flags_nolock
> really explain when it is safe to use it. This is really important for
> future potential users.

True. I'll add clarification in the comments and in the changelog. Thanks!

>
> > Signed-off-by: Suren Baghdasaryan 
> > ---
> >  arch/x86/mm/pat/memtype.c | 10 +++---
> >  include/linux/mm.h| 12 +---
> >  include/linux/pgtable.h   |  5 +++--
> >  mm/memory.c   | 13 +++--
> >  mm/memremap.c |  4 ++--
> >  mm/mmap.c | 16 ++--
> >  6 files changed, 38 insertions(+), 22 deletions(-)
> >
> > diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c
> > index ae9645c900fa..d8adc0b42cf2 100644
> > --- a/arch/x86/mm/pat/memtype.c
> > +++ b/arch/x86/mm/pat/memtype.c
> > @@ -1046,7 +1046,7 @@ void track_pfn_insert(struct vm_area_struct *vma, 
> > pgprot_t *prot, pfn_t pfn)
> >   * can be for the entire vma (in which case pfn, size are zero).
> >   */
> >  void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
> > -  unsigned long size)
> > +  unsigned long size, bool mm_wr_locked)
> >  {
> >   resource_size_t paddr;
> >   unsigned long prot;
> > @@ -1065,8 +1065,12 @@ void untrack_pfn(struct vm_area_struct *vma, 
> > unsigned long pfn,
> >   size = vma->vm_end - vma->vm_start;
> >   }
> >   free_pfn_range(paddr, size);
> > - if (vma)
> > - clear_vm_flags(vma, VM_PAT);
> > + if (vma) {
> > + if (mm_wr_locked)
> > + clear_vm_flags(vma, VM_PAT);
> > + else
> > + mod_vm_flags_nolock(vma, 0, VM_PAT);
> > + }
> >  }
> >
> >  /*
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 55335edd1373..48d49930c411 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -656,12 +656,18 @@ static inline void clear_vm_flags(struct 
> > vm_area_struct *vma,
> >   vma->vm_flags &= ~flags;
> >  }
> >
> > +static inline void mod_vm_flags_nolock(struct vm_area_struct *vma,
> > +unsigned long set, unsigned long clear)
> > +{
> > + vma->vm_flags |= set;
> > + vma->vm_flags &= ~clear;
> > +}
> > +
> >  static inline void mod_vm_flags(struct vm_area_struct *vma,
> >   unsigned long set, unsigned long clear)
> >  {
> >   mmap_assert_write_locked(vma->vm_mm);
> > - vma->vm_flags |= set;
> > - vma->vm_flags &= ~clear;
> > + mod_vm_flags_nolock(vma, set, clear);
> >  }
> >
> >  static inline void vma_set_anonymous(struct vm_area_struct *vma)
> > @@ -2087,7 +2093,7 @@ static inline void zap_vma_pages(struct 
> > vm_area_struct *vma)
> >  }
> >  void unmap_vmas(struct mmu_gather *tlb, struct maple_tree *mt,
> >   struct vm_area_struct *start_vma, unsigned long start,
> > - unsigned long end);
> > + unsigned long end, bool mm_wr_locked);
> >
> >  struct mmu_notifier_range;
> >
> > diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> > index 5fd45454c073..c63cd44777ec 100644
> > --- a/include/linux/pgtable.h
> > +++ b/include/linux/pgtable.h
> > @@ -1185,7 +1185,8 @@ static inline int track_pfn_copy(struct 
> > vm_area_struct *vma)
> >   * can be for the entire vma (in which case pfn, size are zero).
> >   */
> >  static inline void untrack_pfn(struct vm_area_struct *vma,
> > -unsigned long pfn, unsigned long size)
> > +unsigned long pfn, unsigned long size,
> > +bool mm_wr_locked)
> >  {
> >  }
> >
> > @@ -1203,7 +1204,7 @@ extern void track_pfn_insert(struct vm_area_struct 
> > *vma, pgprot_t *prot,
> >pfn_t pfn);
> >  extern int track_pfn_copy(struct vm_area_struct *vma);
> >  extern void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
> > - unsigned long size);
> > + unsigned long size, bool mm_wr_locked);
> >  extern void untrack_pfn_moved(struct vm_area_struct *vma);
> >  #endif
> >
> > diff --git a/mm/memory.c b/mm/memory.c
> > index d6902065e558..5b11b50e2c4a 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -1613,7 +1613,7 @@ void unmap_page_range(struct mmu_gather *tlb,
> >  static void unmap_single_vma(struct mmu_gather *tlb,
> >   struct vm_area_struct *vma, unsigned long start_addr,
> >   unsigned long 

Re: [PATCH v2 4/6] mm: replace vma->vm_flags indirect modification in ksm_madvise

2023-01-25 Thread Suren Baghdasaryan
On Wed, Jan 25, 2023 at 1:38 AM 'Michal Hocko' via kernel-team
 wrote:
>
> On Wed 25-01-23 00:38:49, Suren Baghdasaryan wrote:
> > Replace indirect modifications to vma->vm_flags with calls to modifier
> > functions to be able to track flag changes and to keep vma locking
> > correctness. Add a BUG_ON check in ksm_madvise() to catch indirect
> > vm_flags modification attempts.
>
> Those BUG_ONs scream to much IMHO. KSM is an MM internal code so I
> gueess we should be willing to trust it.

Yes, but I really want to prevent an indirect misuse since it was not
easy to find these. If you feel strongly about it I will remove them
or if you have a better suggestion I'm all for it.

>
> > Signed-off-by: Suren Baghdasaryan 
>
> Acked-by: Michal Hocko 
> --
> Michal Hocko
> SUSE Labs
>
> --
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to kernel-team+unsubscr...@android.com.
>


Re: [PATCH v2 3/6] mm: replace vma->vm_flags direct modifications with modifier calls

2023-01-25 Thread Suren Baghdasaryan
On Wed, Jan 25, 2023 at 1:30 AM 'Michal Hocko' via kernel-team
 wrote:
>
> On Wed 25-01-23 00:38:48, Suren Baghdasaryan wrote:
> > Replace direct modifications to vma->vm_flags with calls to modifier
> > functions to be able to track flag changes and to keep vma locking
> > correctness.
>
> Is this a manual (git grep) based work or have you used Coccinele for
> the patch generation?

It was a manual "search and replace" and in the process I temporarily
renamed vm_flags to ensure I did not miss any usage.

>
> My potentially incomplete check
> $ git grep ">[[:space:]]*vm_flags[[:space:]]*[&|^]="
>
> shows that nothing should be left after this. There is still quite a lot
> of direct checks of the flags (more than 600). Maybe it would be good to
> make flags accessible only via accessors which would also prevent any
> future direct setting of those flags in uncontrolled way as well.

Yes, I think Peter's suggestion in the first patch would also require
that. Much more churn but probably worth it for the future
maintenance. I'll add a patch which converts all readers as well.

>
> Anyway
> Acked-by: Michal Hocko 

Thanks for all the reviews!

> --
> Michal Hocko
> SUSE Labs
>
> --
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to kernel-team+unsubscr...@android.com.
>


Re: [PATCH v2 1/6] mm: introduce vma->vm_flags modifier functions

2023-01-25 Thread Suren Baghdasaryan
On Wed, Jan 25, 2023 at 1:10 AM Peter Zijlstra  wrote:
>
> On Wed, Jan 25, 2023 at 12:38:46AM -0800, Suren Baghdasaryan wrote:
>
> > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> > index 2d6d790d9bed..6c7c70bf50dd 100644
> > --- a/include/linux/mm_types.h
> > +++ b/include/linux/mm_types.h
> > @@ -491,7 +491,13 @@ struct vm_area_struct {
> >* See vmf_insert_mixed_prot() for discussion.
> >*/
> >   pgprot_t vm_page_prot;
> > - unsigned long vm_flags; /* Flags, see mm.h. */
> > +
> > + /*
> > +  * Flags, see mm.h.
> > +  * WARNING! Do not modify directly.
> > +  * Use {init|reset|set|clear|mod}_vm_flags() functions instead.
> > +  */
> > + unsigned long vm_flags;
>
> We have __private and ACCESS_PRIVATE() to help with enforcing this.

Thanks for pointing this out, Peter! I guess for that I'll need to
convert all read accesses and provide get_vm_flags() too? That will
cause some additional churt (a quick search shows 801 hits over 248
files) but maybe it's worth it? I think Michal suggested that too in
another patch. Should I do that while we are at it?

>


[PATCH 3/3] mm, arch: add generic implementation of pfn_valid() for FLATMEM

2023-01-25 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

Every architecture that supports FLATMEM memory model defines its own
version of pfn_valid() that essentially compares a pfn to max_mapnr.

Use mips/powerpc version implemented as static inline as a generic
implementation of pfn_valid() and drop its per-architecture definitions

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/alpha/include/asm/page.h  |  4 
 arch/arc/include/asm/page.h|  1 -
 arch/csky/include/asm/page.h   |  1 -
 arch/hexagon/include/asm/page.h|  1 -
 arch/ia64/include/asm/page.h   |  4 
 arch/loongarch/include/asm/page.h  | 13 -
 arch/m68k/include/asm/page_no.h|  2 --
 arch/microblaze/include/asm/page.h |  1 -
 arch/mips/include/asm/page.h   | 13 -
 arch/nios2/include/asm/page.h  |  9 -
 arch/openrisc/include/asm/page.h   |  2 --
 arch/parisc/include/asm/page.h |  4 
 arch/powerpc/include/asm/page.h|  9 -
 arch/riscv/include/asm/page.h  |  5 -
 arch/sh/include/asm/page.h |  3 ---
 arch/sparc/include/asm/page_32.h   |  1 -
 arch/um/include/asm/page.h |  1 -
 arch/x86/include/asm/page_32.h |  4 
 arch/x86/include/asm/page_64.h |  4 
 arch/xtensa/include/asm/page.h |  2 --
 include/asm-generic/memory_model.h | 12 
 include/asm-generic/page.h |  2 --
 22 files changed, 12 insertions(+), 86 deletions(-)

diff --git a/arch/alpha/include/asm/page.h b/arch/alpha/include/asm/page.h
index 8f3f5eecba28..227d32b6b75f 100644
--- a/arch/alpha/include/asm/page.h
+++ b/arch/alpha/include/asm/page.h
@@ -87,10 +87,6 @@ typedef struct page *pgtable_t;
 #define virt_to_page(kaddr)pfn_to_page(__pa(kaddr) >> PAGE_SHIFT)
 #define virt_addr_valid(kaddr) pfn_valid((__pa(kaddr) >> PAGE_SHIFT))
 
-#ifdef CONFIG_FLATMEM
-#define pfn_valid(pfn) ((pfn) < max_mapnr)
-#endif /* CONFIG_FLATMEM */
-
 #include 
 #include 
 
diff --git a/arch/arc/include/asm/page.h b/arch/arc/include/asm/page.h
index 9a62e1d87967..e43fe27ec54d 100644
--- a/arch/arc/include/asm/page.h
+++ b/arch/arc/include/asm/page.h
@@ -109,7 +109,6 @@ extern int pfn_valid(unsigned long pfn);
 #else /* CONFIG_HIGHMEM */
 
 #define ARCH_PFN_OFFSETvirt_to_pfn(CONFIG_LINUX_RAM_BASE)
-#define pfn_valid(pfn) (((pfn) - ARCH_PFN_OFFSET) < max_mapnr)
 
 #endif /* CONFIG_HIGHMEM */
 
diff --git a/arch/csky/include/asm/page.h b/arch/csky/include/asm/page.h
index ed7451478b1b..b23e3006a9e0 100644
--- a/arch/csky/include/asm/page.h
+++ b/arch/csky/include/asm/page.h
@@ -39,7 +39,6 @@
 
 #define virt_addr_valid(kaddr)  ((void *)(kaddr) >= (void *)PAGE_OFFSET && \
(void *)(kaddr) < high_memory)
-#define pfn_valid(pfn) ((pfn) >= ARCH_PFN_OFFSET && ((pfn) - 
ARCH_PFN_OFFSET) < max_mapnr)
 
 extern void *memset(void *dest, int c, size_t l);
 extern void *memcpy(void *to, const void *from, size_t l);
diff --git a/arch/hexagon/include/asm/page.h b/arch/hexagon/include/asm/page.h
index d7d4f9fca327..9c03b9965f07 100644
--- a/arch/hexagon/include/asm/page.h
+++ b/arch/hexagon/include/asm/page.h
@@ -95,7 +95,6 @@ struct page;
 /* Default vm area behavior is non-executable.  */
 #define VM_DATA_DEFAULT_FLAGS  VM_DATA_FLAGS_NON_EXEC
 
-#define pfn_valid(pfn) ((pfn) < max_mapnr)
 #define virt_addr_valid(kaddr) pfn_valid(__pa(kaddr) >> PAGE_SHIFT)
 
 /*  Need to not use a define for linesize; may move this to another file.  */
diff --git a/arch/ia64/include/asm/page.h b/arch/ia64/include/asm/page.h
index 1b990466d540..783eceab5df3 100644
--- a/arch/ia64/include/asm/page.h
+++ b/arch/ia64/include/asm/page.h
@@ -97,10 +97,6 @@ do { \
 
 #include 
 
-#ifdef CONFIG_FLATMEM
-# define pfn_valid(pfn)((pfn) < max_mapnr)
-#endif
-
 #define page_to_phys(page) (page_to_pfn(page) << PAGE_SHIFT)
 #define virt_to_page(kaddr)pfn_to_page(__pa(kaddr) >> PAGE_SHIFT)
 #define pfn_to_kaddr(pfn)  __va((pfn) << PAGE_SHIFT)
diff --git a/arch/loongarch/include/asm/page.h 
b/arch/loongarch/include/asm/page.h
index 53f284a96182..fb5338b352e6 100644
--- a/arch/loongarch/include/asm/page.h
+++ b/arch/loongarch/include/asm/page.h
@@ -82,19 +82,6 @@ typedef struct { unsigned long pgprot; } pgprot_t;
 
 #define pfn_to_kaddr(pfn)  __va((pfn) << PAGE_SHIFT)
 
-#ifdef CONFIG_FLATMEM
-
-static inline int pfn_valid(unsigned long pfn)
-{
-   /* avoid  include hell */
-   extern unsigned long max_mapnr;
-   unsigned long pfn_offset = ARCH_PFN_OFFSET;
-
-   return pfn >= pfn_offset && pfn < max_mapnr;
-}
-
-#endif
-
 #define virt_to_pfn(kaddr) PFN_DOWN(PHYSADDR(kaddr))
 #define virt_to_page(kaddr)pfn_to_page(virt_to_pfn(kaddr))
 
diff --git a/arch/m68k/include/asm/page_no.h b/arch/m68k/include/asm/page_no.h
index 0a8ccef777fd..2555ec57149d 100644
--- a/arch/m68k/include/asm/page_no.h
+++ b/arch/m68k/include/asm/page_no.h
@@ -26,8 +26,6 @@ extern unsigned long memory_end;
 

[PATCH 2/3] mips: drop definition of pfn_valid() for DISCONTIGMEM

2023-01-25 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

There is stale definition of pfn_valid() for DISCONTINGMEM memory model
guarded !FLATMEM && !SPARSEMEM && NUMA ifdefery.

Remove everything but definition of pfn_valid() for FLATMEM.

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/mips/include/asm/page.h | 15 ---
 1 file changed, 15 deletions(-)

diff --git a/arch/mips/include/asm/page.h b/arch/mips/include/asm/page.h
index 96bc798c1ec1..9286f11ff6ad 100644
--- a/arch/mips/include/asm/page.h
+++ b/arch/mips/include/asm/page.h
@@ -235,21 +235,6 @@ static inline int pfn_valid(unsigned long pfn)
return pfn >= pfn_offset && pfn < max_mapnr;
 }
 
-#elif defined(CONFIG_SPARSEMEM)
-
-/* pfn_valid is defined in linux/mmzone.h */
-
-#elif defined(CONFIG_NUMA)
-
-#define pfn_valid(pfn) \
-({ \
-   unsigned long __pfn = (pfn);\
-   int __n = pfn_to_nid(__pfn);\
-   ((__n >= 0) ? (__pfn < NODE_DATA(__n)->node_start_pfn + \
-  NODE_DATA(__n)->node_spanned_pages)  \
-   : 0);   \
-})
-
 #endif
 
 #define virt_to_pfn(kaddr) PFN_DOWN(virt_to_phys((void *)(kaddr)))
-- 
2.35.1



[PATCH 1/3] m68k: use asm-generic/memory_model.h for both MMU and !MMU

2023-01-25 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

The MMU variant uses generic definitions of page_to_pfn() and
pfn_to_page(), but !MMU defines them in include/asm/page_no.h for no
good reason.

Include asm-generic/memory_model.h in the common include/asm/page.h and
drop redundant definitions.

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/m68k/include/asm/page.h| 6 +-
 arch/m68k/include/asm/page_mm.h | 1 -
 arch/m68k/include/asm/page_no.h | 2 --
 3 files changed, 1 insertion(+), 8 deletions(-)

diff --git a/arch/m68k/include/asm/page.h b/arch/m68k/include/asm/page.h
index 2f1c54e4725d..a5993ad83ed8 100644
--- a/arch/m68k/include/asm/page.h
+++ b/arch/m68k/include/asm/page.h
@@ -62,11 +62,7 @@ extern unsigned long _ramend;
 #include 
 #endif
 
-#ifndef CONFIG_MMU
-#define __phys_to_pfn(paddr)   ((unsigned long)((paddr) >> PAGE_SHIFT))
-#define __pfn_to_phys(pfn) PFN_PHYS(pfn)
-#endif
-
 #include 
+#include 
 
 #endif /* _M68K_PAGE_H */
diff --git a/arch/m68k/include/asm/page_mm.h b/arch/m68k/include/asm/page_mm.h
index a5b459bcb7d8..3903db2e8da7 100644
--- a/arch/m68k/include/asm/page_mm.h
+++ b/arch/m68k/include/asm/page_mm.h
@@ -134,7 +134,6 @@ extern int m68k_virt_to_node_shift;
 })
 
 #define ARCH_PFN_OFFSET (m68k_memory[0].addr >> PAGE_SHIFT)
-#include 
 
 #define virt_addr_valid(kaddr) ((unsigned long)(kaddr) >= PAGE_OFFSET && 
(unsigned long)(kaddr) < (unsigned long)high_memory)
 #define pfn_valid(pfn) virt_addr_valid(pfn_to_virt(pfn))
diff --git a/arch/m68k/include/asm/page_no.h b/arch/m68k/include/asm/page_no.h
index c9d0d84158a4..0a8ccef777fd 100644
--- a/arch/m68k/include/asm/page_no.h
+++ b/arch/m68k/include/asm/page_no.h
@@ -26,8 +26,6 @@ extern unsigned long memory_end;
 #define virt_to_page(addr) (mem_map + (((unsigned long)(addr)-PAGE_OFFSET) 
>> PAGE_SHIFT))
 #define page_to_virt(page) __va(page) - mem_map) << PAGE_SHIFT) + 
PAGE_OFFSET))
 
-#define pfn_to_page(pfn)   virt_to_page(pfn_to_virt(pfn))
-#define page_to_pfn(page)  virt_to_pfn(page_to_virt(page))
 #define pfn_valid(pfn) ((pfn) < max_mapnr)
 
 #definevirt_addr_valid(kaddr)  (((unsigned long)(kaddr) >= 
PAGE_OFFSET) && \
-- 
2.35.1



[PATCH 0/3] mm, arch: add generic implementation of pfn_valid() for FLATMEM

2023-01-25 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

Hi,

Every architecture that supports FLATMEM memory model defines its own
version of pfn_valid() that essentially compares a pfn to max_mapnr.

Use mips/powerpc version implemented as static inline as a generic
implementation of pfn_valid() and drop its per-architecture definitions

Mike Rapoport (IBM) (3):
  m68k: use asm-generic/memory_model.h for both MMU and !MMU
  mips: drop definition of pfn_valid() for DISCONTIGMEM
  mm, arch: add generic implementation of pfn_valid() for FLATMEM

 arch/alpha/include/asm/page.h  |  4 
 arch/arc/include/asm/page.h|  1 -
 arch/csky/include/asm/page.h   |  1 -
 arch/hexagon/include/asm/page.h|  1 -
 arch/ia64/include/asm/page.h   |  4 
 arch/loongarch/include/asm/page.h  | 13 -
 arch/m68k/include/asm/page.h   |  6 +-
 arch/m68k/include/asm/page_mm.h|  1 -
 arch/m68k/include/asm/page_no.h|  4 
 arch/microblaze/include/asm/page.h |  1 -
 arch/mips/include/asm/page.h   | 28 
 arch/nios2/include/asm/page.h  |  9 -
 arch/openrisc/include/asm/page.h   |  2 --
 arch/parisc/include/asm/page.h |  4 
 arch/powerpc/include/asm/page.h|  9 -
 arch/riscv/include/asm/page.h  |  5 -
 arch/sh/include/asm/page.h |  3 ---
 arch/sparc/include/asm/page_32.h   |  1 -
 arch/um/include/asm/page.h |  1 -
 arch/x86/include/asm/page_32.h |  4 
 arch/x86/include/asm/page_64.h |  4 
 arch/xtensa/include/asm/page.h |  2 --
 include/asm-generic/memory_model.h | 12 
 include/asm-generic/page.h |  2 --
 24 files changed, 13 insertions(+), 109 deletions(-)


base-commit: 2241ab53cbb5cdb08a6b2d4688feb13971058f65
-- 
2.35.1



Re: [PATCH 2/2] powerpc/module_64: Fix "expected nop" error on module re-patching

2023-01-25 Thread Song Liu
On Wed, Jan 25, 2023 at 10:53 AM Josh Poimboeuf  wrote:
>
> On Wed, Jan 25, 2023 at 09:36:02AM -0800, Song Liu wrote:
> > On Wed, Jan 25, 2023 at 8:46 AM Josh Poimboeuf  wrote:
> > >
> > > On Tue, Jan 24, 2023 at 10:09:56PM -0800, Song Liu wrote:
> > > > > @@ -514,9 +515,18 @@ static int restore_r2(const char *name, u32 
> > > > > *instruction, struct module *me)
> > > > > if (!instr_is_relative_link_branch(ppc_inst(*prev_insn)))
> > > > > return 0;
> > > > >
> > > > > -   if (*instruction != PPC_RAW_NOP()) {
> > > > > +   /*
> > > > > +* For livepatch, the restore r2 instruction might have 
> > > > > already been
> > > > > +* written previously, if the referenced symbol is in a 
> > > > > previously
> > > > > +* unloaded module which is now being loaded again.  In that 
> > > > > case, skip
> > > > > +* the warning and the instruction write.
> > > > > +*/
> > > > > +   if (insn_val == PPC_INST_LD_TOC)
> > > > > +   return 0;
> > > >
> > > > Do we need "sym->st_shndx == SHN_LIVEPATCH" here?
> > >
> > > My original patch had that check, but I dropped it for simplicity.
> > >
> > > In the non-livepatch case, the condition should never be true, but it
> > > doesn't hurt to check it anyway.
> >
> > While this is the only place we use PPC_INST_LD_TOC, there is another
> > place we use "PPC_RAW_STD(_R2, _R1, R2_STACK_OFFSET)", which
> > is identical to PPC_INST_LD_TOC. So I am not quite sure whether this
> > happens for non-livepatch.
>
> It's not actually identical.  That's the "store r2 to the stack"
> counterpart to the load in PPC_INST_LD_TOC, which loads r2 from the
> stack.

Ooops.. I misread the code.

>
> For R_PPC_REL24 relocations, when calling a function which lives outside
> the module, 24 bits isn't enough to encode the relative branch target
> address.  So it has to save r2 (TOC pointer) to the stack, and branch to
> a stub, which then branches to the external function.
>
> When the external function returns execution to the instruction after
> the original branch, that instruction needs to restore the TOC pointer
> from the stack to r2.
>
> The compiler knows this, and emits the instruction after the branch as a
> NOP.  The module code replaces that NOP with a "restore r2 from the
> stack".  That's what restore_r2() does.
>
> Long story short, restore_r2() needs to ensure the instruction after the
> branch restores r2 from the stack.  If that instruction is already
> there, it doesn't need to do anything.

Thanks for the explanation!

Acked-by: Song Liu 


Re: [PATCH 2/2] powerpc/module_64: Fix "expected nop" error on module re-patching

2023-01-25 Thread Josh Poimboeuf
On Wed, Jan 25, 2023 at 09:36:02AM -0800, Song Liu wrote:
> On Wed, Jan 25, 2023 at 8:46 AM Josh Poimboeuf  wrote:
> >
> > On Tue, Jan 24, 2023 at 10:09:56PM -0800, Song Liu wrote:
> > > > @@ -514,9 +515,18 @@ static int restore_r2(const char *name, u32 
> > > > *instruction, struct module *me)
> > > > if (!instr_is_relative_link_branch(ppc_inst(*prev_insn)))
> > > > return 0;
> > > >
> > > > -   if (*instruction != PPC_RAW_NOP()) {
> > > > +   /*
> > > > +* For livepatch, the restore r2 instruction might have already 
> > > > been
> > > > +* written previously, if the referenced symbol is in a 
> > > > previously
> > > > +* unloaded module which is now being loaded again.  In that 
> > > > case, skip
> > > > +* the warning and the instruction write.
> > > > +*/
> > > > +   if (insn_val == PPC_INST_LD_TOC)
> > > > +   return 0;
> > >
> > > Do we need "sym->st_shndx == SHN_LIVEPATCH" here?
> >
> > My original patch had that check, but I dropped it for simplicity.
> >
> > In the non-livepatch case, the condition should never be true, but it
> > doesn't hurt to check it anyway.
> 
> While this is the only place we use PPC_INST_LD_TOC, there is another
> place we use "PPC_RAW_STD(_R2, _R1, R2_STACK_OFFSET)", which
> is identical to PPC_INST_LD_TOC. So I am not quite sure whether this
> happens for non-livepatch.

It's not actually identical.  That's the "store r2 to the stack"
counterpart to the load in PPC_INST_LD_TOC, which loads r2 from the
stack.

For R_PPC_REL24 relocations, when calling a function which lives outside
the module, 24 bits isn't enough to encode the relative branch target
address.  So it has to save r2 (TOC pointer) to the stack, and branch to
a stub, which then branches to the external function.

When the external function returns execution to the instruction after
the original branch, that instruction needs to restore the TOC pointer
from the stack to r2.

The compiler knows this, and emits the instruction after the branch as a
NOP.  The module code replaces that NOP with a "restore r2 from the
stack".  That's what restore_r2() does.

Long story short, restore_r2() needs to ensure the instruction after the
branch restores r2 from the stack.  If that instruction is already
there, it doesn't need to do anything.

-- 
Josh


Re: [PATCH 2/2] powerpc/module_64: Fix "expected nop" error on module re-patching

2023-01-25 Thread Song Liu
On Wed, Jan 25, 2023 at 8:46 AM Josh Poimboeuf  wrote:
>
> On Tue, Jan 24, 2023 at 10:09:56PM -0800, Song Liu wrote:
> > > @@ -514,9 +515,18 @@ static int restore_r2(const char *name, u32 
> > > *instruction, struct module *me)
> > > if (!instr_is_relative_link_branch(ppc_inst(*prev_insn)))
> > > return 0;
> > >
> > > -   if (*instruction != PPC_RAW_NOP()) {
> > > +   /*
> > > +* For livepatch, the restore r2 instruction might have already 
> > > been
> > > +* written previously, if the referenced symbol is in a previously
> > > +* unloaded module which is now being loaded again.  In that 
> > > case, skip
> > > +* the warning and the instruction write.
> > > +*/
> > > +   if (insn_val == PPC_INST_LD_TOC)
> > > +   return 0;
> >
> > Do we need "sym->st_shndx == SHN_LIVEPATCH" here?
>
> My original patch had that check, but I dropped it for simplicity.
>
> In the non-livepatch case, the condition should never be true, but it
> doesn't hurt to check it anyway.

While this is the only place we use PPC_INST_LD_TOC, there is another
place we use "PPC_RAW_STD(_R2, _R1, R2_STACK_OFFSET)", which
is identical to PPC_INST_LD_TOC. So I am not quite sure whether this
happens for non-livepatch.

Thanks,
Song


Re: [PATCH v3 11/11] perf jevents: Add model list option

2023-01-25 Thread John Garry

On 24/01/2023 06:33, Ian Rogers wrote:

This allows the set of generated jevents events and metrics be limited
to a subset of the model names. Appropriate if trying to minimize the
binary size where only a set of models are possible. On ARM64 the
--model selects the implementor rather than model.

Signed-off-by: Ian Rogers 


would it be really difficult/painful to support model names as specified 
in the mapfile.csv, like "skylake" (for x86) or "hisilicon/hip08" (for 
arm64)?


Thanks,
John




Re: [PATCH v3 06/11] perf pmu-events: Remove now unused event and metric variables

2023-01-25 Thread John Garry

On 24/01/2023 06:33, Ian Rogers wrote:

Previous changes separated the uses of pmu_event and pmu_metric,
however, both structures contained all the variables of event and
metric. This change removes the event variables from metric and the
metric variables from event.

Note, this change removes the setting of evsel's metric_name/expr as
these fields are no longer part of struct pmu_event. The metric
remains but is no longer implicitly requested when the event is. This
impacts a few Intel uncore events, however, as the ScaleUnit is shared
by the event and the metric this utility is questionable. Also the
MetricNames look broken (contain spaces) in some cases and when trying
to use the functionality with '-e' the metrics fail but regular
metrics with '-M' work. For example, on SkylakeX '-M' works:



Reviewed-by: John Garry 


Re: [PATCH 2/2] powerpc/module_64: Fix "expected nop" error on module re-patching

2023-01-25 Thread Josh Poimboeuf
On Tue, Jan 24, 2023 at 10:09:56PM -0800, Song Liu wrote:
> > @@ -514,9 +515,18 @@ static int restore_r2(const char *name, u32 
> > *instruction, struct module *me)
> > if (!instr_is_relative_link_branch(ppc_inst(*prev_insn)))
> > return 0;
> >
> > -   if (*instruction != PPC_RAW_NOP()) {
> > +   /*
> > +* For livepatch, the restore r2 instruction might have already been
> > +* written previously, if the referenced symbol is in a previously
> > +* unloaded module which is now being loaded again.  In that case, 
> > skip
> > +* the warning and the instruction write.
> > +*/
> > +   if (insn_val == PPC_INST_LD_TOC)
> > +   return 0;
> 
> Do we need "sym->st_shndx == SHN_LIVEPATCH" here?

My original patch had that check, but I dropped it for simplicity.

In the non-livepatch case, the condition should never be true, but it
doesn't hurt to check it anyway.

-- 
Josh


Re: [PATCH] tools/perf: Disable perf probe when libtraceevent is missing

2023-01-25 Thread Athira Rajeev



> On 20-Jan-2023, at 7:58 PM, Arnaldo Carvalho de Melo  wrote:
> 
> Em Fri, Jan 20, 2023 at 05:32:56PM +0530, Athira Rajeev escreveu:
>> While parsing the tracepoint events in parse_events_add_tracepoint()
>> function, code checks for HAVE_LIBTRACEEVENT support. This is needed
>> since libtraceevent is necessary for tracepoint. But while adding
>> probe points, check for LIBTRACEEVENT is not done in case of perf probe.
>> Hence, in environment with missing libtraceevent-devel, it is
>> observed that adding a probe point works even though its not
>> supported.
> 
>> Example:
>> Adding probe point:
>>  ./perf probe 'vfs_getname=getname_flags:72 pathname=result->name:string'
>>  Added new event:
>>probe:vfs_getname(on getname_flags:72 with 
>> pathname=result->name:string)
> 
>>  You can now use it in all perf tools, such as:
> 
>>  perf record -e probe:vfs_getname -aR sleep 1
> 
>> But trying perf record:
>>  ./perf  record -e probe:vfs_getname -aR sleep 1
>>  event syntax error: 'probe:vfs_getname'
>>  \___ unsupported tracepoint
>>  libtraceevent is necessary for tracepoint support
>>  Run 'perf list' for a list of valid events
>> 
>> Fix this by wrapping "builtin-probe" compilation and
>> "perf probe" usage under "CONFIG_LIBTRACEEVENT" check.
> 
> Humm, but 'perf probe' continues to work, as uou demoed above, the
> problem is with the suggestion to use other perf tools that need to
> parse tracefs and without libtraceevent, currently can't do it:
> 
> [root@quaco ~]# perf probe 'vfs_getname=getname_flags:72 
> pathname=result->name:string'
> Added new event:
>  probe:vfs_getname(on getname_flags:72 with pathname=result->name:string)
> 
> You can now use it in all perf tools, such as:
> 
>   perf record -e probe:vfs_getname -aR sleep 1
> 
> [root@quaco ~]# perf probe -l
>  probe:vfs_getname(on getname_flags:72@fs/namei.c with pathname)
> [root@quaco ~]# perf trace -e probe:vfs_getname
> perf: 'trace' is not a perf-command. See 'perf --help'.
> [root@quaco ~]# cd /sys/kernel/tracing/events/probe/vfs_getname/
> [root@quaco vfs_getname]# ls
> enable  filter  format  hist  id  trigger
> [root@quaco vfs_getname]# ls -la
> total 0
> drwxr-x---. 2 root root 0 Jan 20 11:18 .
> drwxr-x---. 3 root root 0 Jan 20 11:18 ..
> -rw-r-. 1 root root 0 Jan 20 11:18 enable
> -rw-r-. 1 root root 0 Jan 20 11:18 filter
> -r--r-. 1 root root 0 Jan 20 11:18 format
> -r--r-. 1 root root 0 Jan 20 11:18 hist
> -r--r-. 1 root root 0 Jan 20 11:18 id
> -rw-r-. 1 root root 0 Jan 20 11:18 trigger
> [root@quaco vfs_getname]#
> 
> But we can go on from there:
> 
> [root@quaco tracing]# pwd
> /sys/kernel/tracing
> [root@quaco tracing]# echo 1 > 
> /sys/kernel/tracing/events/probe/vfs_getname/enable
> [root@quaco tracing]# echo 1 > tracing_on
> [root@quaco tracing]# head trace_pipe
>systemd-oomd-979 [003] . 96369.978971: vfs_getname: 
> (getname_flags.part.0+0x6b/0x1c0) 
> pathname="/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service"
>systemd-oomd-979 [003] . 96369.979022: vfs_getname: 
> (getname_flags.part.0+0x6b/0x1c0) 
> pathname="/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/memory.current"
>systemd-oomd-979 [003] . 96369.979084: vfs_getname: 
> (getname_flags.part.0+0x6b/0x1c0) pathname=""
>systemd-oomd-979 [003] . 96369.979162: vfs_getname: 
> (getname_flags.part.0+0x6b/0x1c0) 
> pathname="/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/memory.min"
>systemd-oomd-979 [003] . 96369.979197: vfs_getname: 
> (getname_flags.part.0+0x6b/0x1c0) pathname=""
>systemd-oomd-979 [003] . 96369.979267: vfs_getname: 
> (getname_flags.part.0+0x6b/0x1c0) 
> pathname="/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/memory.low"
>systemd-oomd-979 [003] . 96369.979303: vfs_getname: 
> (getname_flags.part.0+0x6b/0x1c0) pathname=""
>systemd-oomd-979 [003] . 96369.979372: vfs_getname: 
> (getname_flags.part.0+0x6b/0x1c0) 
> pathname="/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/memory.swap.current"
>systemd-oomd-979 [003] . 96369.979406: vfs_getname: 
> (getname_flags.part.0+0x6b/0x1c0) pathname=""
>systemd-oomd-979 [003] . 96369.979475: vfs_getname: 
> (getname_flags.part.0+0x6b/0x1c0) 
> pathname="/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/memory.stat"
> [root@quaco tracing]#
> 
> So you could instead replace the suggestion from:
> 
> "
>   You can now use it in all perf tools, such as:
> 
>   perf record -e probe:vfs_getname -aR sleep 1
> "
> 
> To:
> 
> "
>   perf is not linked with libtraceevent, to use the new probe you
>   can use tracefs:
> 
>   cd /sys/kernel/tracing/
>   echo 1 > events/probe/vfs_getname/enable
>   echo 1 > tracing_on
>   cat trace_pipe
> "
> 
> 

Re: [RFC 0/3] Asynchronous EEH recovery

2023-01-25 Thread Christophe Leroy
Hi,

Le 15/09/2022 à 12:15, Ganesh a écrit :
> On 9/2/22 05:49, Jason Gunthorpe wrote:
> 
>> On Tue, Aug 16, 2022 at 08:57:13AM +0530, Ganesh Goudar wrote:
>>> Hi,
>>>
>>> EEH reocvery is currently serialized and these patches shorten
>>> the time taken for EEH recovery by making the recovery to run
>>> in parallel. The original author of these patches is Sam Bobroff,
>>> I have rebased and tested these patches.
>> How did you test this?
> 
> This is tested on SRIOV VFs.
> 
>> I understand that VFIO on 6.0 does not work at all on power?
>>
>> I am waiting for power maintainers to pick up this series to fix it:
>>
>> https://lore.kernel.org/kvm/20220714081822.3717693-1-...@ozlabs.ru/
>>


This series doesn't apply anymore, for instance 
drivers/vfio/vfio_spapr_eeh.c doesn't exist anymore.

If you series is still relevant, please resubmit.

Christophe


Re: [PATCH 2/2] powerpc/module_64: Fix "expected nop" error on module re-patching

2023-01-25 Thread Petr Mladek
On Tue 2023-01-24 19:38:05, Josh Poimboeuf wrote:
> When a module with a livepatched function is unloaded and then reloaded,
> klp attempts to dynamically re-patch it.  On ppc64, that fails with the
> following error:
> 
>   module_64: livepatch_nfsd: Expected nop after call, got e8410018 at 
> e_show+0x60/0x548 [livepatch_nfsd]
>   livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' 
> (-8)
>   livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to 
> load module 'nfsd'
> 
> The error happens because the restore r2 instruction had already
> previously been written into the klp module's replacement function when
> the original function was patched the first time.  So the instruction
> wasn't a nop as expected.
> 
> When the restore r2 instruction has already been patched in, detect that
> and skip the warning and the instruction write.
> 
> Signed-off-by: Josh Poimboeuf 

It seems that the function does what it says. And it seems to be the
only location where an instruction is checked before it is modified.
I am fine with this approach.

Reviewed-by: Petr Mladek 

Best Regards,
Petr


Re: [PATCH 1/2] powerpc/module_64: Improve restore_r2() return semantics

2023-01-25 Thread Petr Mladek
On Tue 2023-01-24 19:38:04, Josh Poimboeuf wrote:
> restore_r2() returns 1 on success, which is surprising for a non-boolean
> function.  Change it to return 0 on success and -errno on error to match
> kernel coding convention.
> 
> Signed-off-by: Josh Poimboeuf 

Looks good:

Reviewed-by: Petr Mladek 

It is in the right direction. Just note that there are more functions
with the boolean semantic passed via int return value. But there
are also other functions already using the 0/-E* return values so
this is rather positive change.

Best Regards,
Petr


Re: [PATCH v4 02/24] powerpc/pseries: Fix alignment of PLPKS structures and buffers

2023-01-25 Thread Michael Ellerman
Andrew Donnellan  writes:
> A number of structures and buffers passed to PKS hcalls have alignment
> requirements, which could on occasion cause problems:
>
> - Authorisation structures must be 16-byte aligned and must not cross a
>   page boundary
>
> - Label structures must not cross page boundaries
>
> - Password output buffers must not cross page boundaries
>
> Round up the allocations of these structures/buffers to the next power of
> 2 to make sure this happens.

It's not the *next* power of 2, it's the *nearest* power of 2, including
the initial value if it's already a power of 2.

That in conjunction with slab's guarantee that power of 2 sized objects
are naturally aligned, and that the relevant structs are smaller than a
page, is what makes this actually work.

So I think the patch is fine, but the change log and comments probably
need to be a bit clearer.

cheers

> Reported-by: Benjamin Gray 
> Fixes: 2454a7af0f2a ("powerpc/pseries: define driver for Platform KeyStore")
> Signed-off-by: Andrew Donnellan 
> Reviewed-by: Russell Currey 
> Signed-off-by: Russell Currey 
>
> ---
>
> v3: Merge plpks fixes and signed update series with secvar series
>
> v4: Fix typo in commit message
>
> Move up in series (npiggin)
> ---
>  arch/powerpc/platforms/pseries/plpks.c | 10 +++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/plpks.c 
> b/arch/powerpc/platforms/pseries/plpks.c
> index 9e85b6d85b0b..a01cf2ff140a 100644
> --- a/arch/powerpc/platforms/pseries/plpks.c
> +++ b/arch/powerpc/platforms/pseries/plpks.c
> @@ -126,7 +126,8 @@ static int plpks_gen_password(void)
>   u8 *password, consumer = PKS_OS_OWNER;
>   int rc;
>  
> - password = kzalloc(maxpwsize, GFP_KERNEL);
> + // The password must not cross a page boundary, so we align to the next 
> power of 2
> + password = kzalloc(roundup_pow_of_two(maxpwsize), GFP_KERNEL);
>   if (!password)
>   return -ENOMEM;
>  
> @@ -162,7 +163,9 @@ static struct plpks_auth *construct_auth(u8 consumer)
>   if (consumer > PKS_OS_OWNER)
>   return ERR_PTR(-EINVAL);
>  
> - auth = kzalloc(struct_size(auth, password, maxpwsize), GFP_KERNEL);
> + // The auth structure must not cross a page boundary and must be
> + // 16 byte aligned. We align to the next largest power of 2
> + auth = kzalloc(roundup_pow_of_two(struct_size(auth, password, 
> maxpwsize)), GFP_KERNEL);
>   if (!auth)
>   return ERR_PTR(-ENOMEM);
>  
> @@ -196,7 +199,8 @@ static struct label *construct_label(char *component, u8 
> varos, u8 *name,
>   if (component && slen > sizeof(label->attr.prefix))
>   return ERR_PTR(-EINVAL);
>  
> - label = kzalloc(sizeof(*label), GFP_KERNEL);
> + // The label structure must not cross a page boundary, so we align to 
> the next power of 2
> + label = kzalloc(roundup_pow_of_two(sizeof(*label)), GFP_KERNEL);
>   if (!label)
>   return ERR_PTR(-ENOMEM);
>  
> -- 
> 2.39.0


Re: [PATCH v2 6/6] mm: export dump_mm()

2023-01-25 Thread Michal Hocko
On Wed 25-01-23 00:38:51, Suren Baghdasaryan wrote:
> mmap_assert_write_locked() is used in vm_flags modifiers. Because
> mmap_assert_write_locked() uses dump_mm() and vm_flags are sometimes
> modified from from inside a module, it's necessary to export
> dump_mm() function.
> 
> Signed-off-by: Suren Baghdasaryan 

Acked-by: Michal Hocko 

> ---
>  mm/debug.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/mm/debug.c b/mm/debug.c
> index 9d3d893dc7f4..96d594e16292 100644
> --- a/mm/debug.c
> +++ b/mm/debug.c
> @@ -215,6 +215,7 @@ void dump_mm(const struct mm_struct *mm)
>   mm->def_flags, >def_flags
>   );
>  }
> +EXPORT_SYMBOL(dump_mm);
>  
>  static bool page_init_poisoning __read_mostly = true;
>  
> -- 
> 2.39.1

-- 
Michal Hocko
SUSE Labs


Re: [PATCH v2 5/6] mm: introduce mod_vm_flags_nolock and use it in untrack_pfn

2023-01-25 Thread Michal Hocko
On Wed 25-01-23 00:38:50, Suren Baghdasaryan wrote:
> In cases when VMA flags are modified after VMA was isolated and mmap_lock
> was downgraded, flags modifications would result in an assertion because
> mmap write lock is not held.
> Introduce mod_vm_flags_nolock to be used in such situation.
> Pass a hint to untrack_pfn to conditionally use mod_vm_flags_nolock for
> flags modification and to avoid assertion.

The changelog nor the documentation of mod_vm_flags_nolock 
really explain when it is safe to use it. This is really important for
future potential users.

> Signed-off-by: Suren Baghdasaryan 
> ---
>  arch/x86/mm/pat/memtype.c | 10 +++---
>  include/linux/mm.h| 12 +---
>  include/linux/pgtable.h   |  5 +++--
>  mm/memory.c   | 13 +++--
>  mm/memremap.c |  4 ++--
>  mm/mmap.c | 16 ++--
>  6 files changed, 38 insertions(+), 22 deletions(-)
> 
> diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c
> index ae9645c900fa..d8adc0b42cf2 100644
> --- a/arch/x86/mm/pat/memtype.c
> +++ b/arch/x86/mm/pat/memtype.c
> @@ -1046,7 +1046,7 @@ void track_pfn_insert(struct vm_area_struct *vma, 
> pgprot_t *prot, pfn_t pfn)
>   * can be for the entire vma (in which case pfn, size are zero).
>   */
>  void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
> -  unsigned long size)
> +  unsigned long size, bool mm_wr_locked)
>  {
>   resource_size_t paddr;
>   unsigned long prot;
> @@ -1065,8 +1065,12 @@ void untrack_pfn(struct vm_area_struct *vma, unsigned 
> long pfn,
>   size = vma->vm_end - vma->vm_start;
>   }
>   free_pfn_range(paddr, size);
> - if (vma)
> - clear_vm_flags(vma, VM_PAT);
> + if (vma) {
> + if (mm_wr_locked)
> + clear_vm_flags(vma, VM_PAT);
> + else
> + mod_vm_flags_nolock(vma, 0, VM_PAT);
> + }
>  }
>  
>  /*
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 55335edd1373..48d49930c411 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -656,12 +656,18 @@ static inline void clear_vm_flags(struct vm_area_struct 
> *vma,
>   vma->vm_flags &= ~flags;
>  }
>  
> +static inline void mod_vm_flags_nolock(struct vm_area_struct *vma,
> +unsigned long set, unsigned long clear)
> +{
> + vma->vm_flags |= set;
> + vma->vm_flags &= ~clear;
> +}
> +
>  static inline void mod_vm_flags(struct vm_area_struct *vma,
>   unsigned long set, unsigned long clear)
>  {
>   mmap_assert_write_locked(vma->vm_mm);
> - vma->vm_flags |= set;
> - vma->vm_flags &= ~clear;
> + mod_vm_flags_nolock(vma, set, clear);
>  }
>  
>  static inline void vma_set_anonymous(struct vm_area_struct *vma)
> @@ -2087,7 +2093,7 @@ static inline void zap_vma_pages(struct vm_area_struct 
> *vma)
>  }
>  void unmap_vmas(struct mmu_gather *tlb, struct maple_tree *mt,
>   struct vm_area_struct *start_vma, unsigned long start,
> - unsigned long end);
> + unsigned long end, bool mm_wr_locked);
>  
>  struct mmu_notifier_range;
>  
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index 5fd45454c073..c63cd44777ec 100644
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -1185,7 +1185,8 @@ static inline int track_pfn_copy(struct vm_area_struct 
> *vma)
>   * can be for the entire vma (in which case pfn, size are zero).
>   */
>  static inline void untrack_pfn(struct vm_area_struct *vma,
> -unsigned long pfn, unsigned long size)
> +unsigned long pfn, unsigned long size,
> +bool mm_wr_locked)
>  {
>  }
>  
> @@ -1203,7 +1204,7 @@ extern void track_pfn_insert(struct vm_area_struct 
> *vma, pgprot_t *prot,
>pfn_t pfn);
>  extern int track_pfn_copy(struct vm_area_struct *vma);
>  extern void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
> - unsigned long size);
> + unsigned long size, bool mm_wr_locked);
>  extern void untrack_pfn_moved(struct vm_area_struct *vma);
>  #endif
>  
> diff --git a/mm/memory.c b/mm/memory.c
> index d6902065e558..5b11b50e2c4a 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1613,7 +1613,7 @@ void unmap_page_range(struct mmu_gather *tlb,
>  static void unmap_single_vma(struct mmu_gather *tlb,
>   struct vm_area_struct *vma, unsigned long start_addr,
>   unsigned long end_addr,
> - struct zap_details *details)
> + struct zap_details *details, bool mm_wr_locked)
>  {
>   unsigned long start = max(vma->vm_start, start_addr);
>   unsigned long end;
> @@ -1628,7 +1628,7 @@ static void unmap_single_vma(struct mmu_gather *tlb,
>   uprobe_munmap(vma, start, end);
>  
>   if 

Re: [PATCH v2 4/6] mm: replace vma->vm_flags indirect modification in ksm_madvise

2023-01-25 Thread Michal Hocko
On Wed 25-01-23 00:38:49, Suren Baghdasaryan wrote:
> Replace indirect modifications to vma->vm_flags with calls to modifier
> functions to be able to track flag changes and to keep vma locking
> correctness. Add a BUG_ON check in ksm_madvise() to catch indirect
> vm_flags modification attempts.

Those BUG_ONs scream to much IMHO. KSM is an MM internal code so I
gueess we should be willing to trust it.

> Signed-off-by: Suren Baghdasaryan 

Acked-by: Michal Hocko 
-- 
Michal Hocko
SUSE Labs


Re: [PATCH v2 1/6] mm: introduce vma->vm_flags modifier functions

2023-01-25 Thread Peter Zijlstra
On Wed, Jan 25, 2023 at 12:38:46AM -0800, Suren Baghdasaryan wrote:

> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 2d6d790d9bed..6c7c70bf50dd 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -491,7 +491,13 @@ struct vm_area_struct {
>* See vmf_insert_mixed_prot() for discussion.
>*/
>   pgprot_t vm_page_prot;
> - unsigned long vm_flags; /* Flags, see mm.h. */
> +
> + /*
> +  * Flags, see mm.h.
> +  * WARNING! Do not modify directly.
> +  * Use {init|reset|set|clear|mod}_vm_flags() functions instead.
> +  */
> + unsigned long vm_flags;

We have __private and ACCESS_PRIVATE() to help with enforcing this.


Re: [PATCH v2 2/6] mm: replace VM_LOCKED_CLEAR_MASK with VM_LOCKED_MASK

2023-01-25 Thread Michal Hocko
On Wed 25-01-23 00:38:47, Suren Baghdasaryan wrote:
> To simplify the usage of VM_LOCKED_CLEAR_MASK in clear_vm_flags(),
> replace it with VM_LOCKED_MASK bitmask and convert all users.
>
> Signed-off-by: Suren Baghdasaryan 

Acked-by: Michal Hocko 

> ---
>  include/linux/mm.h | 4 ++--
>  kernel/fork.c  | 2 +-
>  mm/hugetlb.c   | 4 ++--
>  mm/mlock.c | 6 +++---
>  mm/mmap.c  | 6 +++---
>  mm/mremap.c| 2 +-
>  6 files changed, 12 insertions(+), 12 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index b71f2809caac..da62bdd627bf 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -421,8 +421,8 @@ extern unsigned int kobjsize(const void *objp);
>  /* This mask defines which mm->def_flags a process can inherit its parent */
>  #define VM_INIT_DEF_MASK VM_NOHUGEPAGE
>  
> -/* This mask is used to clear all the VMA flags used by mlock */
> -#define VM_LOCKED_CLEAR_MASK (~(VM_LOCKED | VM_LOCKONFAULT))
> +/* This mask represents all the VMA flag bits used by mlock */
> +#define VM_LOCKED_MASK   (VM_LOCKED | VM_LOCKONFAULT)
>  
>  /* Arch-specific flags to clear when updating VM flags on protection change 
> */
>  #ifndef VM_ARCH_CLEAR
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 6683c1b0f460..03d472051236 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -669,7 +669,7 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
>   tmp->anon_vma = NULL;
>   } else if (anon_vma_fork(tmp, mpnt))
>   goto fail_nomem_anon_vma_fork;
> - tmp->vm_flags &= ~(VM_LOCKED | VM_LOCKONFAULT);
> + clear_vm_flags(tmp, VM_LOCKED_MASK);
>   file = tmp->vm_file;
>   if (file) {
>   struct address_space *mapping = file->f_mapping;
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index d20c8b09890e..4ecdbad9a451 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -6973,8 +6973,8 @@ static unsigned long page_table_shareable(struct 
> vm_area_struct *svma,
>   unsigned long s_end = sbase + PUD_SIZE;
>  
>   /* Allow segments to share if only one is marked locked */
> - unsigned long vm_flags = vma->vm_flags & VM_LOCKED_CLEAR_MASK;
> - unsigned long svm_flags = svma->vm_flags & VM_LOCKED_CLEAR_MASK;
> + unsigned long vm_flags = vma->vm_flags & ~VM_LOCKED_MASK;
> + unsigned long svm_flags = svma->vm_flags & ~VM_LOCKED_MASK;
>  
>   /*
>* match the virtual addresses, permission and the alignment of the
> diff --git a/mm/mlock.c b/mm/mlock.c
> index 0336f52e03d7..5c4fff93cd6b 100644
> --- a/mm/mlock.c
> +++ b/mm/mlock.c
> @@ -497,7 +497,7 @@ static int apply_vma_lock_flags(unsigned long start, 
> size_t len,
>   if (vma->vm_start != tmp)
>   return -ENOMEM;
>  
> - newflags = vma->vm_flags & VM_LOCKED_CLEAR_MASK;
> + newflags = vma->vm_flags & ~VM_LOCKED_MASK;
>   newflags |= flags;
>   /* Here we know that  vma->vm_start <= nstart < vma->vm_end. */
>   tmp = vma->vm_end;
> @@ -661,7 +661,7 @@ static int apply_mlockall_flags(int flags)
>   struct vm_area_struct *vma, *prev = NULL;
>   vm_flags_t to_add = 0;
>  
> - current->mm->def_flags &= VM_LOCKED_CLEAR_MASK;
> + current->mm->def_flags &= ~VM_LOCKED_MASK;
>   if (flags & MCL_FUTURE) {
>   current->mm->def_flags |= VM_LOCKED;
>  
> @@ -681,7 +681,7 @@ static int apply_mlockall_flags(int flags)
>   for_each_vma(vmi, vma) {
>   vm_flags_t newflags;
>  
> - newflags = vma->vm_flags & VM_LOCKED_CLEAR_MASK;
> + newflags = vma->vm_flags & ~VM_LOCKED_MASK;
>   newflags |= to_add;
>  
>   /* Ignore errors */
> diff --git a/mm/mmap.c b/mm/mmap.c
> index d4abc6feced1..323bd253b25a 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -2671,7 +2671,7 @@ unsigned long mmap_region(struct file *file, unsigned 
> long addr,
>   if ((vm_flags & VM_SPECIAL) || vma_is_dax(vma) ||
>   is_vm_hugetlb_page(vma) ||
>   vma == get_gate_vma(current->mm))
> - vma->vm_flags &= VM_LOCKED_CLEAR_MASK;
> + clear_vm_flags(vma, VM_LOCKED_MASK);
>   else
>   mm->locked_vm += (len >> PAGE_SHIFT);
>   }
> @@ -3340,8 +3340,8 @@ static struct vm_area_struct *__install_special_mapping(
>   vma->vm_start = addr;
>   vma->vm_end = addr + len;
>  
> - vma->vm_flags = vm_flags | mm->def_flags | VM_DONTEXPAND | VM_SOFTDIRTY;
> - vma->vm_flags &= VM_LOCKED_CLEAR_MASK;
> + init_vm_flags(vma, (vm_flags | mm->def_flags |
> +   VM_DONTEXPAND | VM_SOFTDIRTY) & ~VM_LOCKED_MASK);
>   vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
>  
>   vma->vm_ops = ops;
> diff --git a/mm/mremap.c b/mm/mremap.c
> index 

Re: [PATCH v2 1/6] mm: introduce vma->vm_flags modifier functions

2023-01-25 Thread Michal Hocko
On Wed 25-01-23 00:38:46, Suren Baghdasaryan wrote:
> vm_flags are among VMA attributes which affect decisions like VMA merging
> and splitting. Therefore all vm_flags modifications are performed after
> taking exclusive mmap_lock to prevent vm_flags updates racing with such
> operations. Introduce modifier functions for vm_flags to be used whenever
> flags are updated. This way we can better check and control correct
> locking behavior during these updates.
> 
> Signed-off-by: Suren Baghdasaryan 

Acked-by: Michal Hocko 

> ---
>  include/linux/mm.h   | 37 +
>  include/linux/mm_types.h |  8 +++-
>  2 files changed, 44 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index c2f62bdce134..b71f2809caac 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -627,6 +627,43 @@ static inline void vma_init(struct vm_area_struct *vma, 
> struct mm_struct *mm)
>   INIT_LIST_HEAD(>anon_vma_chain);
>  }
>  
> +/* Use when VMA is not part of the VMA tree and needs no locking */
> +static inline void init_vm_flags(struct vm_area_struct *vma,
> +  unsigned long flags)
> +{
> + vma->vm_flags = flags;
> +}
> +
> +/* Use when VMA is part of the VMA tree and modifications need coordination 
> */
> +static inline void reset_vm_flags(struct vm_area_struct *vma,
> +   unsigned long flags)
> +{
> + mmap_assert_write_locked(vma->vm_mm);
> + init_vm_flags(vma, flags);
> +}
> +
> +static inline void set_vm_flags(struct vm_area_struct *vma,
> + unsigned long flags)
> +{
> + mmap_assert_write_locked(vma->vm_mm);
> + vma->vm_flags |= flags;
> +}
> +
> +static inline void clear_vm_flags(struct vm_area_struct *vma,
> +   unsigned long flags)
> +{
> + mmap_assert_write_locked(vma->vm_mm);
> + vma->vm_flags &= ~flags;
> +}
> +
> +static inline void mod_vm_flags(struct vm_area_struct *vma,
> + unsigned long set, unsigned long clear)
> +{
> + mmap_assert_write_locked(vma->vm_mm);
> + vma->vm_flags |= set;
> + vma->vm_flags &= ~clear;
> +}
> +
>  static inline void vma_set_anonymous(struct vm_area_struct *vma)
>  {
>   vma->vm_ops = NULL;
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 2d6d790d9bed..6c7c70bf50dd 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -491,7 +491,13 @@ struct vm_area_struct {
>* See vmf_insert_mixed_prot() for discussion.
>*/
>   pgprot_t vm_page_prot;
> - unsigned long vm_flags; /* Flags, see mm.h. */
> +
> + /*
> +  * Flags, see mm.h.
> +  * WARNING! Do not modify directly.
> +  * Use {init|reset|set|clear|mod}_vm_flags() functions instead.
> +  */
> + unsigned long vm_flags;
>  
>   /*
>* For areas with an address space and backing store,
> -- 
> 2.39.1

-- 
Michal Hocko
SUSE Labs


[PATCH v2 6/6] mm: export dump_mm()

2023-01-25 Thread Suren Baghdasaryan
mmap_assert_write_locked() is used in vm_flags modifiers. Because
mmap_assert_write_locked() uses dump_mm() and vm_flags are sometimes
modified from from inside a module, it's necessary to export
dump_mm() function.

Signed-off-by: Suren Baghdasaryan 
---
 mm/debug.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/debug.c b/mm/debug.c
index 9d3d893dc7f4..96d594e16292 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -215,6 +215,7 @@ void dump_mm(const struct mm_struct *mm)
mm->def_flags, >def_flags
);
 }
+EXPORT_SYMBOL(dump_mm);
 
 static bool page_init_poisoning __read_mostly = true;
 
-- 
2.39.1



[PATCH v2 5/6] mm: introduce mod_vm_flags_nolock and use it in untrack_pfn

2023-01-25 Thread Suren Baghdasaryan
In cases when VMA flags are modified after VMA was isolated and mmap_lock
was downgraded, flags modifications would result in an assertion because
mmap write lock is not held.
Introduce mod_vm_flags_nolock to be used in such situation.
Pass a hint to untrack_pfn to conditionally use mod_vm_flags_nolock for
flags modification and to avoid assertion.

Signed-off-by: Suren Baghdasaryan 
---
 arch/x86/mm/pat/memtype.c | 10 +++---
 include/linux/mm.h| 12 +---
 include/linux/pgtable.h   |  5 +++--
 mm/memory.c   | 13 +++--
 mm/memremap.c |  4 ++--
 mm/mmap.c | 16 ++--
 6 files changed, 38 insertions(+), 22 deletions(-)

diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c
index ae9645c900fa..d8adc0b42cf2 100644
--- a/arch/x86/mm/pat/memtype.c
+++ b/arch/x86/mm/pat/memtype.c
@@ -1046,7 +1046,7 @@ void track_pfn_insert(struct vm_area_struct *vma, 
pgprot_t *prot, pfn_t pfn)
  * can be for the entire vma (in which case pfn, size are zero).
  */
 void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
-unsigned long size)
+unsigned long size, bool mm_wr_locked)
 {
resource_size_t paddr;
unsigned long prot;
@@ -1065,8 +1065,12 @@ void untrack_pfn(struct vm_area_struct *vma, unsigned 
long pfn,
size = vma->vm_end - vma->vm_start;
}
free_pfn_range(paddr, size);
-   if (vma)
-   clear_vm_flags(vma, VM_PAT);
+   if (vma) {
+   if (mm_wr_locked)
+   clear_vm_flags(vma, VM_PAT);
+   else
+   mod_vm_flags_nolock(vma, 0, VM_PAT);
+   }
 }
 
 /*
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 55335edd1373..48d49930c411 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -656,12 +656,18 @@ static inline void clear_vm_flags(struct vm_area_struct 
*vma,
vma->vm_flags &= ~flags;
 }
 
+static inline void mod_vm_flags_nolock(struct vm_area_struct *vma,
+  unsigned long set, unsigned long clear)
+{
+   vma->vm_flags |= set;
+   vma->vm_flags &= ~clear;
+}
+
 static inline void mod_vm_flags(struct vm_area_struct *vma,
unsigned long set, unsigned long clear)
 {
mmap_assert_write_locked(vma->vm_mm);
-   vma->vm_flags |= set;
-   vma->vm_flags &= ~clear;
+   mod_vm_flags_nolock(vma, set, clear);
 }
 
 static inline void vma_set_anonymous(struct vm_area_struct *vma)
@@ -2087,7 +2093,7 @@ static inline void zap_vma_pages(struct vm_area_struct 
*vma)
 }
 void unmap_vmas(struct mmu_gather *tlb, struct maple_tree *mt,
struct vm_area_struct *start_vma, unsigned long start,
-   unsigned long end);
+   unsigned long end, bool mm_wr_locked);
 
 struct mmu_notifier_range;
 
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 5fd45454c073..c63cd44777ec 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1185,7 +1185,8 @@ static inline int track_pfn_copy(struct vm_area_struct 
*vma)
  * can be for the entire vma (in which case pfn, size are zero).
  */
 static inline void untrack_pfn(struct vm_area_struct *vma,
-  unsigned long pfn, unsigned long size)
+  unsigned long pfn, unsigned long size,
+  bool mm_wr_locked)
 {
 }
 
@@ -1203,7 +1204,7 @@ extern void track_pfn_insert(struct vm_area_struct *vma, 
pgprot_t *prot,
 pfn_t pfn);
 extern int track_pfn_copy(struct vm_area_struct *vma);
 extern void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
-   unsigned long size);
+   unsigned long size, bool mm_wr_locked);
 extern void untrack_pfn_moved(struct vm_area_struct *vma);
 #endif
 
diff --git a/mm/memory.c b/mm/memory.c
index d6902065e558..5b11b50e2c4a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1613,7 +1613,7 @@ void unmap_page_range(struct mmu_gather *tlb,
 static void unmap_single_vma(struct mmu_gather *tlb,
struct vm_area_struct *vma, unsigned long start_addr,
unsigned long end_addr,
-   struct zap_details *details)
+   struct zap_details *details, bool mm_wr_locked)
 {
unsigned long start = max(vma->vm_start, start_addr);
unsigned long end;
@@ -1628,7 +1628,7 @@ static void unmap_single_vma(struct mmu_gather *tlb,
uprobe_munmap(vma, start, end);
 
if (unlikely(vma->vm_flags & VM_PFNMAP))
-   untrack_pfn(vma, 0, 0);
+   untrack_pfn(vma, 0, 0, mm_wr_locked);
 
if (start != end) {
if (unlikely(is_vm_hugetlb_page(vma))) {
@@ -1675,7 +1675,7 @@ static void unmap_single_vma(struct mmu_gather *tlb,
  */
 void unmap_vmas(struct mmu_gather *tlb, struct maple_tree *mt,

[PATCH v2 4/6] mm: replace vma->vm_flags indirect modification in ksm_madvise

2023-01-25 Thread Suren Baghdasaryan
Replace indirect modifications to vma->vm_flags with calls to modifier
functions to be able to track flag changes and to keep vma locking
correctness. Add a BUG_ON check in ksm_madvise() to catch indirect
vm_flags modification attempts.

Signed-off-by: Suren Baghdasaryan 
---
 arch/powerpc/kvm/book3s_hv_uvmem.c | 5 -
 arch/s390/mm/gmap.c| 5 -
 mm/khugepaged.c| 2 ++
 mm/ksm.c   | 2 ++
 4 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 1d67baa5557a..325a7a47d348 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -393,6 +393,7 @@ static int kvmppc_memslot_page_merge(struct kvm *kvm,
 {
unsigned long gfn = memslot->base_gfn;
unsigned long end, start = gfn_to_hva(kvm, gfn);
+   unsigned long vm_flags;
int ret = 0;
struct vm_area_struct *vma;
int merge_flag = (merge) ? MADV_MERGEABLE : MADV_UNMERGEABLE;
@@ -409,12 +410,14 @@ static int kvmppc_memslot_page_merge(struct kvm *kvm,
ret = H_STATE;
break;
}
+   vm_flags = vma->vm_flags;
ret = ksm_madvise(vma, vma->vm_start, vma->vm_end,
- merge_flag, >vm_flags);
+ merge_flag, _flags);
if (ret) {
ret = H_STATE;
break;
}
+   reset_vm_flags(vma, vm_flags);
start = vma->vm_end;
} while (end > vma->vm_end);
 
diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index 3a695b8a1e3c..d5eb47dcdacb 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -2587,14 +2587,17 @@ int gmap_mark_unmergeable(void)
 {
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma;
+   unsigned long vm_flags;
int ret;
VMA_ITERATOR(vmi, mm, 0);
 
for_each_vma(vmi, vma) {
+   vm_flags = vma->vm_flags;
ret = ksm_madvise(vma, vma->vm_start, vma->vm_end,
- MADV_UNMERGEABLE, >vm_flags);
+ MADV_UNMERGEABLE, _flags);
if (ret)
return ret;
+   reset_vm_flags(vma, vm_flags);
}
mm->def_flags &= ~VM_MERGEABLE;
return 0;
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 8abc59345bf2..76b24cd0c179 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -354,6 +354,8 @@ struct attribute_group khugepaged_attr_group = {
 int hugepage_madvise(struct vm_area_struct *vma,
 unsigned long *vm_flags, int advice)
 {
+   /* vma->vm_flags can be changed only using modifier functions */
+   BUG_ON(vm_flags == >vm_flags);
switch (advice) {
case MADV_HUGEPAGE:
 #ifdef CONFIG_S390
diff --git a/mm/ksm.c b/mm/ksm.c
index 04f1c8c2df11..992b2be9f5e6 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -2573,6 +2573,8 @@ int ksm_madvise(struct vm_area_struct *vma, unsigned long 
start,
struct mm_struct *mm = vma->vm_mm;
int err;
 
+   /* vma->vm_flags can be changed only using modifier functions */
+   BUG_ON(vm_flags == >vm_flags);
switch (advice) {
case MADV_MERGEABLE:
/*
-- 
2.39.1



[PATCH v2 3/6] mm: replace vma->vm_flags direct modifications with modifier calls

2023-01-25 Thread Suren Baghdasaryan
Replace direct modifications to vma->vm_flags with calls to modifier
functions to be able to track flag changes and to keep vma locking
correctness.

Signed-off-by: Suren Baghdasaryan 
---
 arch/arm/kernel/process.c  |  2 +-
 arch/ia64/mm/init.c|  8 
 arch/loongarch/include/asm/tlb.h   |  2 +-
 arch/powerpc/kvm/book3s_xive_native.c  |  2 +-
 arch/powerpc/mm/book3s64/subpage_prot.c|  2 +-
 arch/powerpc/platforms/book3s/vas-api.c|  2 +-
 arch/powerpc/platforms/cell/spufs/file.c   | 14 +++---
 arch/s390/mm/gmap.c|  3 +--
 arch/x86/entry/vsyscall/vsyscall_64.c  |  2 +-
 arch/x86/kernel/cpu/sgx/driver.c   |  2 +-
 arch/x86/kernel/cpu/sgx/virt.c |  2 +-
 arch/x86/mm/pat/memtype.c  |  6 +++---
 arch/x86/um/mem_32.c   |  2 +-
 drivers/acpi/pfr_telemetry.c   |  2 +-
 drivers/android/binder.c   |  3 +--
 drivers/char/mspec.c   |  2 +-
 drivers/crypto/hisilicon/qm.c  |  2 +-
 drivers/dax/device.c   |  2 +-
 drivers/dma/idxd/cdev.c|  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c|  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c   |  4 ++--
 drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c  |  4 ++--
 drivers/gpu/drm/amd/amdkfd/kfd_events.c|  4 ++--
 drivers/gpu/drm/amd/amdkfd/kfd_process.c   |  4 ++--
 drivers/gpu/drm/drm_gem.c  |  2 +-
 drivers/gpu/drm/drm_gem_dma_helper.c   |  3 +--
 drivers/gpu/drm/drm_gem_shmem_helper.c |  2 +-
 drivers/gpu/drm/drm_vm.c   |  8 
 drivers/gpu/drm/etnaviv/etnaviv_gem.c  |  2 +-
 drivers/gpu/drm/exynos/exynos_drm_gem.c|  4 ++--
 drivers/gpu/drm/gma500/framebuffer.c   |  2 +-
 drivers/gpu/drm/i810/i810_dma.c|  2 +-
 drivers/gpu/drm/i915/gem/i915_gem_mman.c   |  4 ++--
 drivers/gpu/drm/mediatek/mtk_drm_gem.c |  2 +-
 drivers/gpu/drm/msm/msm_gem.c  |  2 +-
 drivers/gpu/drm/omapdrm/omap_gem.c |  3 +--
 drivers/gpu/drm/rockchip/rockchip_drm_gem.c|  3 +--
 drivers/gpu/drm/tegra/gem.c|  5 ++---
 drivers/gpu/drm/ttm/ttm_bo_vm.c|  3 +--
 drivers/gpu/drm/virtio/virtgpu_vram.c  |  2 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c   |  2 +-
 drivers/gpu/drm/xen/xen_drm_front_gem.c|  3 +--
 drivers/hsi/clients/cmt_speech.c   |  2 +-
 drivers/hwtracing/intel_th/msu.c   |  2 +-
 drivers/hwtracing/stm/core.c   |  2 +-
 drivers/infiniband/hw/hfi1/file_ops.c  |  4 ++--
 drivers/infiniband/hw/mlx5/main.c  |  4 ++--
 drivers/infiniband/hw/qib/qib_file_ops.c   | 13 ++---
 drivers/infiniband/hw/usnic/usnic_ib_verbs.c   |  2 +-
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_verbs.c|  2 +-
 .../media/common/videobuf2/videobuf2-dma-contig.c  |  2 +-
 drivers/media/common/videobuf2/videobuf2-vmalloc.c |  2 +-
 drivers/media/v4l2-core/videobuf-dma-contig.c  |  2 +-
 drivers/media/v4l2-core/videobuf-dma-sg.c  |  4 ++--
 drivers/media/v4l2-core/videobuf-vmalloc.c |  2 +-
 drivers/misc/cxl/context.c |  2 +-
 drivers/misc/habanalabs/common/memory.c|  2 +-
 drivers/misc/habanalabs/gaudi/gaudi.c  |  4 ++--
 drivers/misc/habanalabs/gaudi2/gaudi2.c|  8 
 drivers/misc/habanalabs/goya/goya.c|  4 ++--
 drivers/misc/ocxl/context.c|  4 ++--
 drivers/misc/ocxl/sysfs.c  |  2 +-
 drivers/misc/open-dice.c   |  4 ++--
 drivers/misc/sgi-gru/grufile.c |  4 ++--
 drivers/misc/uacce/uacce.c |  2 +-
 drivers/sbus/char/oradax.c |  2 +-
 drivers/scsi/cxlflash/ocxl_hw.c|  2 +-
 drivers/scsi/sg.c  |  2 +-
 drivers/staging/media/atomisp/pci/hmm/hmm_bo.c |  2 +-
 drivers/staging/media/deprecated/meye/meye.c   |  4 ++--
 .../media/deprecated/stkwebcam/stk-webcam.c|  2 +-
 drivers/target/target_core_user.c  |  2 +-
 drivers/uio/uio.c  |  2 +-
 drivers/usb/core/devio.c   |  3 +--
 drivers/usb/mon/mon_bin.c  |  3 +--
 drivers/vdpa/vdpa_user/iova_domain.c   |  2 +-
 drivers/vfio/pci/vfio_pci_core.c   |  2 +-
 drivers/vhost/vdpa.c   |  2 +-
 drivers/video/fbdev/68328fb.c 

[PATCH v2 2/6] mm: replace VM_LOCKED_CLEAR_MASK with VM_LOCKED_MASK

2023-01-25 Thread Suren Baghdasaryan
To simplify the usage of VM_LOCKED_CLEAR_MASK in clear_vm_flags(),
replace it with VM_LOCKED_MASK bitmask and convert all users.

Signed-off-by: Suren Baghdasaryan 
---
 include/linux/mm.h | 4 ++--
 kernel/fork.c  | 2 +-
 mm/hugetlb.c   | 4 ++--
 mm/mlock.c | 6 +++---
 mm/mmap.c  | 6 +++---
 mm/mremap.c| 2 +-
 6 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index b71f2809caac..da62bdd627bf 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -421,8 +421,8 @@ extern unsigned int kobjsize(const void *objp);
 /* This mask defines which mm->def_flags a process can inherit its parent */
 #define VM_INIT_DEF_MASK   VM_NOHUGEPAGE
 
-/* This mask is used to clear all the VMA flags used by mlock */
-#define VM_LOCKED_CLEAR_MASK   (~(VM_LOCKED | VM_LOCKONFAULT))
+/* This mask represents all the VMA flag bits used by mlock */
+#define VM_LOCKED_MASK (VM_LOCKED | VM_LOCKONFAULT)
 
 /* Arch-specific flags to clear when updating VM flags on protection change */
 #ifndef VM_ARCH_CLEAR
diff --git a/kernel/fork.c b/kernel/fork.c
index 6683c1b0f460..03d472051236 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -669,7 +669,7 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
tmp->anon_vma = NULL;
} else if (anon_vma_fork(tmp, mpnt))
goto fail_nomem_anon_vma_fork;
-   tmp->vm_flags &= ~(VM_LOCKED | VM_LOCKONFAULT);
+   clear_vm_flags(tmp, VM_LOCKED_MASK);
file = tmp->vm_file;
if (file) {
struct address_space *mapping = file->f_mapping;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index d20c8b09890e..4ecdbad9a451 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6973,8 +6973,8 @@ static unsigned long page_table_shareable(struct 
vm_area_struct *svma,
unsigned long s_end = sbase + PUD_SIZE;
 
/* Allow segments to share if only one is marked locked */
-   unsigned long vm_flags = vma->vm_flags & VM_LOCKED_CLEAR_MASK;
-   unsigned long svm_flags = svma->vm_flags & VM_LOCKED_CLEAR_MASK;
+   unsigned long vm_flags = vma->vm_flags & ~VM_LOCKED_MASK;
+   unsigned long svm_flags = svma->vm_flags & ~VM_LOCKED_MASK;
 
/*
 * match the virtual addresses, permission and the alignment of the
diff --git a/mm/mlock.c b/mm/mlock.c
index 0336f52e03d7..5c4fff93cd6b 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -497,7 +497,7 @@ static int apply_vma_lock_flags(unsigned long start, size_t 
len,
if (vma->vm_start != tmp)
return -ENOMEM;
 
-   newflags = vma->vm_flags & VM_LOCKED_CLEAR_MASK;
+   newflags = vma->vm_flags & ~VM_LOCKED_MASK;
newflags |= flags;
/* Here we know that  vma->vm_start <= nstart < vma->vm_end. */
tmp = vma->vm_end;
@@ -661,7 +661,7 @@ static int apply_mlockall_flags(int flags)
struct vm_area_struct *vma, *prev = NULL;
vm_flags_t to_add = 0;
 
-   current->mm->def_flags &= VM_LOCKED_CLEAR_MASK;
+   current->mm->def_flags &= ~VM_LOCKED_MASK;
if (flags & MCL_FUTURE) {
current->mm->def_flags |= VM_LOCKED;
 
@@ -681,7 +681,7 @@ static int apply_mlockall_flags(int flags)
for_each_vma(vmi, vma) {
vm_flags_t newflags;
 
-   newflags = vma->vm_flags & VM_LOCKED_CLEAR_MASK;
+   newflags = vma->vm_flags & ~VM_LOCKED_MASK;
newflags |= to_add;
 
/* Ignore errors */
diff --git a/mm/mmap.c b/mm/mmap.c
index d4abc6feced1..323bd253b25a 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2671,7 +2671,7 @@ unsigned long mmap_region(struct file *file, unsigned 
long addr,
if ((vm_flags & VM_SPECIAL) || vma_is_dax(vma) ||
is_vm_hugetlb_page(vma) ||
vma == get_gate_vma(current->mm))
-   vma->vm_flags &= VM_LOCKED_CLEAR_MASK;
+   clear_vm_flags(vma, VM_LOCKED_MASK);
else
mm->locked_vm += (len >> PAGE_SHIFT);
}
@@ -3340,8 +3340,8 @@ static struct vm_area_struct *__install_special_mapping(
vma->vm_start = addr;
vma->vm_end = addr + len;
 
-   vma->vm_flags = vm_flags | mm->def_flags | VM_DONTEXPAND | VM_SOFTDIRTY;
-   vma->vm_flags &= VM_LOCKED_CLEAR_MASK;
+   init_vm_flags(vma, (vm_flags | mm->def_flags |
+ VM_DONTEXPAND | VM_SOFTDIRTY) & ~VM_LOCKED_MASK);
vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
 
vma->vm_ops = ops;
diff --git a/mm/mremap.c b/mm/mremap.c
index 1b3ee02bead7..35db9752cb6a 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -687,7 +687,7 @@ static unsigned long move_vma(struct vm_area_struct *vma,
 
if (unlikely(!err && (flags & MREMAP_DONTUNMAP))) {
  

[PATCH v2 0/6] introduce vm_flags modifier functions

2023-01-25 Thread Suren Baghdasaryan
This patchset was originally published as a part of per-VMA locking [1] and
was split after suggestion that it's viable on its own and to facilitate
the review process. It is now a preprequisite for the next version of per-VMA
lock patchset, which reuses vm_flags modifier functions to lock the VMA when
vm_flags are being updated.

VMA vm_flags modifications are usually done under exclusive mmap_lock
protection because this attrubute affects other decisions like VMA merging
or splitting and races should be prevented. Introduce vm_flags modifier
functions to enforce correct locking.

[1] https://lore.kernel.org/all/20230109205336.3665937-1-sur...@google.com/

The patchset applies cleanly over mm-unstable branch of mm tree.

My apologies for an extremely large distribution list. The patch touches
lots of files and many are in arch/ and drivers/.

Suren Baghdasaryan (6):
  mm: introduce vma->vm_flags modifier functions
  mm: replace VM_LOCKED_CLEAR_MASK with VM_LOCKED_MASK
  mm: replace vma->vm_flags direct modifications with modifier calls
  mm: replace vma->vm_flags indirect modification in ksm_madvise
  mm: introduce mod_vm_flags_nolock and use it in untrack_pfn
  mm: export dump_mm()

 arch/arm/kernel/process.c |  2 +-
 arch/ia64/mm/init.c   |  8 +--
 arch/loongarch/include/asm/tlb.h  |  2 +-
 arch/powerpc/kvm/book3s_hv_uvmem.c|  5 +-
 arch/powerpc/kvm/book3s_xive_native.c |  2 +-
 arch/powerpc/mm/book3s64/subpage_prot.c   |  2 +-
 arch/powerpc/platforms/book3s/vas-api.c   |  2 +-
 arch/powerpc/platforms/cell/spufs/file.c  | 14 ++---
 arch/s390/mm/gmap.c   |  8 +--
 arch/x86/entry/vsyscall/vsyscall_64.c |  2 +-
 arch/x86/kernel/cpu/sgx/driver.c  |  2 +-
 arch/x86/kernel/cpu/sgx/virt.c|  2 +-
 arch/x86/mm/pat/memtype.c | 14 +++--
 arch/x86/um/mem_32.c  |  2 +-
 drivers/acpi/pfr_telemetry.c  |  2 +-
 drivers/android/binder.c  |  3 +-
 drivers/char/mspec.c  |  2 +-
 drivers/crypto/hisilicon/qm.c |  2 +-
 drivers/dax/device.c  |  2 +-
 drivers/dma/idxd/cdev.c   |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c   |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  |  4 +-
 drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c |  4 +-
 drivers/gpu/drm/amd/amdkfd/kfd_events.c   |  4 +-
 drivers/gpu/drm/amd/amdkfd/kfd_process.c  |  4 +-
 drivers/gpu/drm/drm_gem.c |  2 +-
 drivers/gpu/drm/drm_gem_dma_helper.c  |  3 +-
 drivers/gpu/drm/drm_gem_shmem_helper.c|  2 +-
 drivers/gpu/drm/drm_vm.c  |  8 +--
 drivers/gpu/drm/etnaviv/etnaviv_gem.c |  2 +-
 drivers/gpu/drm/exynos/exynos_drm_gem.c   |  4 +-
 drivers/gpu/drm/gma500/framebuffer.c  |  2 +-
 drivers/gpu/drm/i810/i810_dma.c   |  2 +-
 drivers/gpu/drm/i915/gem/i915_gem_mman.c  |  4 +-
 drivers/gpu/drm/mediatek/mtk_drm_gem.c|  2 +-
 drivers/gpu/drm/msm/msm_gem.c |  2 +-
 drivers/gpu/drm/omapdrm/omap_gem.c|  3 +-
 drivers/gpu/drm/rockchip/rockchip_drm_gem.c   |  3 +-
 drivers/gpu/drm/tegra/gem.c   |  5 +-
 drivers/gpu/drm/ttm/ttm_bo_vm.c   |  3 +-
 drivers/gpu/drm/virtio/virtgpu_vram.c |  2 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c  |  2 +-
 drivers/gpu/drm/xen/xen_drm_front_gem.c   |  3 +-
 drivers/hsi/clients/cmt_speech.c  |  2 +-
 drivers/hwtracing/intel_th/msu.c  |  2 +-
 drivers/hwtracing/stm/core.c  |  2 +-
 drivers/infiniband/hw/hfi1/file_ops.c |  4 +-
 drivers/infiniband/hw/mlx5/main.c |  4 +-
 drivers/infiniband/hw/qib/qib_file_ops.c  | 13 +++--
 drivers/infiniband/hw/usnic/usnic_ib_verbs.c  |  2 +-
 .../infiniband/hw/vmw_pvrdma/pvrdma_verbs.c   |  2 +-
 .../common/videobuf2/videobuf2-dma-contig.c   |  2 +-
 .../common/videobuf2/videobuf2-vmalloc.c  |  2 +-
 drivers/media/v4l2-core/videobuf-dma-contig.c |  2 +-
 drivers/media/v4l2-core/videobuf-dma-sg.c |  4 +-
 drivers/media/v4l2-core/videobuf-vmalloc.c|  2 +-
 drivers/misc/cxl/context.c|  2 +-
 drivers/misc/habanalabs/common/memory.c   |  2 +-
 drivers/misc/habanalabs/gaudi/gaudi.c |  4 +-
 drivers/misc/habanalabs/gaudi2/gaudi2.c   |  8 +--
 drivers/misc/habanalabs/goya/goya.c   |  4 +-
 drivers/misc/ocxl/context.c   |  4 +-
 drivers/misc/ocxl/sysfs.c |  2 +-
 drivers/misc/open-dice.c  |  4 +-
 drivers/misc/sgi-gru/grufile.c|  4 +-
 drivers/misc/uacce/uacce.c|  2 +-
 drivers/sbus/char/oradax.c|  2 +-
 drivers/scsi/cxlflash/ocxl_hw.c   |  2 +-
 drivers/scsi/sg.c

[PATCH v2 1/6] mm: introduce vma->vm_flags modifier functions

2023-01-25 Thread Suren Baghdasaryan
vm_flags are among VMA attributes which affect decisions like VMA merging
and splitting. Therefore all vm_flags modifications are performed after
taking exclusive mmap_lock to prevent vm_flags updates racing with such
operations. Introduce modifier functions for vm_flags to be used whenever
flags are updated. This way we can better check and control correct
locking behavior during these updates.

Signed-off-by: Suren Baghdasaryan 
---
 include/linux/mm.h   | 37 +
 include/linux/mm_types.h |  8 +++-
 2 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index c2f62bdce134..b71f2809caac 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -627,6 +627,43 @@ static inline void vma_init(struct vm_area_struct *vma, 
struct mm_struct *mm)
INIT_LIST_HEAD(>anon_vma_chain);
 }
 
+/* Use when VMA is not part of the VMA tree and needs no locking */
+static inline void init_vm_flags(struct vm_area_struct *vma,
+unsigned long flags)
+{
+   vma->vm_flags = flags;
+}
+
+/* Use when VMA is part of the VMA tree and modifications need coordination */
+static inline void reset_vm_flags(struct vm_area_struct *vma,
+ unsigned long flags)
+{
+   mmap_assert_write_locked(vma->vm_mm);
+   init_vm_flags(vma, flags);
+}
+
+static inline void set_vm_flags(struct vm_area_struct *vma,
+   unsigned long flags)
+{
+   mmap_assert_write_locked(vma->vm_mm);
+   vma->vm_flags |= flags;
+}
+
+static inline void clear_vm_flags(struct vm_area_struct *vma,
+ unsigned long flags)
+{
+   mmap_assert_write_locked(vma->vm_mm);
+   vma->vm_flags &= ~flags;
+}
+
+static inline void mod_vm_flags(struct vm_area_struct *vma,
+   unsigned long set, unsigned long clear)
+{
+   mmap_assert_write_locked(vma->vm_mm);
+   vma->vm_flags |= set;
+   vma->vm_flags &= ~clear;
+}
+
 static inline void vma_set_anonymous(struct vm_area_struct *vma)
 {
vma->vm_ops = NULL;
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 2d6d790d9bed..6c7c70bf50dd 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -491,7 +491,13 @@ struct vm_area_struct {
 * See vmf_insert_mixed_prot() for discussion.
 */
pgprot_t vm_page_prot;
-   unsigned long vm_flags; /* Flags, see mm.h. */
+
+   /*
+* Flags, see mm.h.
+* WARNING! Do not modify directly.
+* Use {init|reset|set|clear|mod}_vm_flags() functions instead.
+*/
+   unsigned long vm_flags;
 
/*
 * For areas with an address space and backing store,
-- 
2.39.1



Re: [PATCH] powerpc/tlb: Remove BUILD_BUG for book3s/32/tlbflush.h local_flush_tlb_page_psize

2023-01-25 Thread Michael Ellerman
Benjamin Gray  writes:
> Converts the BUILD_BUG to a WARN to allow building with a low/unoptimised
> compiler.
>
> The original expectation was that a compiler would see that the only
> usage of this function was in a function that is only called behind
> radix-only guards. And it worked this way on GCC. It seems Clang does
> not optimise away this call however, so thinks the function may be
> invoked and triggers the build bug as reported by the kernel test robot.
>
> https://lore.kernel.org/llvm/202301212348.edkowvff-...@intel.com
>
> This fix converts the build bug to a warning to allow builds without
> relying on particular compiler optimisation behaviours. The warning is
> not rate limited because this implementation should still never be called
> as-is, and anyone who might invoke it might appreciate it being very
> obvious that it's not behaving as expected.
>
> Fixes: 274d842fa1ef ("powerpc/tlb: Add local flush for page given mm_struct 
> and psize")
> Reported-by: kernel test robot 
> Signed-off-by: Benjamin Gray 
> ---
>  arch/powerpc/include/asm/book3s/32/tlbflush.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/book3s/32/tlbflush.h 
> b/arch/powerpc/include/asm/book3s/32/tlbflush.h
> index 4be572908124..675196884640 100644
> --- a/arch/powerpc/include/asm/book3s/32/tlbflush.h
> +++ b/arch/powerpc/include/asm/book3s/32/tlbflush.h
> @@ -2,7 +2,7 @@
>  #ifndef _ASM_POWERPC_BOOK3S_32_TLBFLUSH_H
>  #define _ASM_POWERPC_BOOK3S_32_TLBFLUSH_H
>  
> -#include 
> +#include 
>  
>  #define MMU_NO_CONTEXT  (0)
>  /*
> @@ -80,7 +80,7 @@ static inline void local_flush_tlb_page(struct 
> vm_area_struct *vma,
>  static inline void local_flush_tlb_page_psize(struct mm_struct *mm,
> unsigned long vmaddr, int psize)
>  {
> - BUILD_BUG();
> + WARN(1, "Unimplemented local TLB flush with psize");

Can't we just fall back to flush_tlb_page(vma, vmaddr)?

I'd guess those CPUs can't flush based on page size anyway.

cheers


Re: [PATCH] powerpc/tlb: Remove BUILD_BUG for book3s/32/tlbflush.h local_flush_tlb_page_psize

2023-01-25 Thread Christophe Leroy


Le 24/01/2023 à 22:54, Benjamin Gray a écrit :
> Converts the BUILD_BUG to a WARN to allow building with a low/unoptimised
> compiler.

No no no no. Please don't do that.

That approach is used everywhere in the kernel, why should it be a 
problem only for local_flush_tlb_page_psize() and not everywhere else ?

Really if it doesn't work it means there is a deep problem in your 
compiler. We really don't want the overhead of WARN over BUILD_BUG in 
the normal case.

By the way, are you should the problem is really BUILD_BUG() ? Looking 
at your patch I would think that the problem is because it is "static 
inline". Have you tried 'static __always_inline' instead ?

Christophe

> 
> The original expectation was that a compiler would see that the only
> usage of this function was in a function that is only called behind
> radix-only guards. And it worked this way on GCC. It seems Clang does
> not optimise away this call however, so thinks the function may be
> invoked and triggers the build bug as reported by the kernel test robot.
> 
> https://lore.kernel.org/llvm/202301212348.edkowvff-...@intel.com
> 
> This fix converts the build bug to a warning to allow builds without
> relying on particular compiler optimisation behaviours. The warning is
> not rate limited because this implementation should still never be called
> as-is, and anyone who might invoke it might appreciate it being very
> obvious that it's not behaving as expected.
> 
> Fixes: 274d842fa1ef ("powerpc/tlb: Add local flush for page given mm_struct 
> and psize")
> Reported-by: kernel test robot 
> Signed-off-by: Benjamin Gray 
> ---
>   arch/powerpc/include/asm/book3s/32/tlbflush.h | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/book3s/32/tlbflush.h 
> b/arch/powerpc/include/asm/book3s/32/tlbflush.h
> index 4be572908124..675196884640 100644
> --- a/arch/powerpc/include/asm/book3s/32/tlbflush.h
> +++ b/arch/powerpc/include/asm/book3s/32/tlbflush.h
> @@ -2,7 +2,7 @@
>   #ifndef _ASM_POWERPC_BOOK3S_32_TLBFLUSH_H
>   #define _ASM_POWERPC_BOOK3S_32_TLBFLUSH_H
>   
> -#include 
> +#include 
>   
>   #define MMU_NO_CONTEXT  (0)
>   /*
> @@ -80,7 +80,7 @@ static inline void local_flush_tlb_page(struct 
> vm_area_struct *vma,
>   static inline void local_flush_tlb_page_psize(struct mm_struct *mm,
> unsigned long vmaddr, int psize)
>   {
> - BUILD_BUG();
> + WARN(1, "Unimplemented local TLB flush with psize");
>   }
>   
>   static inline void local_flush_tlb_mm(struct mm_struct *mm)
> 
> base-commit: 53ab112a95086d10dc353ea4f979abb01644bbb6


Re: [PATCH v2] ALSA: aoa: make remove callback of soundbus driver void returned

2023-01-25 Thread Takashi Iwai
On Fri, 06 Jan 2023 16:17:46 +0100,
Dawei Li wrote:
> 
> Since commit fc7a6209d571 ("bus: Make remove callback return void")
> forces bus_type::remove be void-returned, it doesn't make much sense
> for any bus based driver implementing remove callbalk to return
> non-void to its caller.
> 
> As such, change the remove function for soundbus based drivers to
> return void.
> 
> Signed-off-by: Dawei Li 

Applied now, thanks.


Takashi


Re: arch/powerpc/kernel/head_85xx.o: warning: objtool: .head.text+0x1a6c: unannotated intra-function call

2023-01-25 Thread Michael Ellerman
"Naveen N. Rao"  writes:
> Sathvika Vasireddy wrote:
>> 
> arch/powerpc/kvm/booke.o: warning: objtool: kvmppc_fill_pt_regs+0x30: 
> unannotated intra-function call
>> 
>> As an attempt to fix it, I tried expanding ANNOTATE_INTRA_FUNCTION_CALL 
>> macro to indicate that the branch target is valid. It then threw another 
>> warning (arch/powerpc/kvm/booke.o: warning: objtool: 
>> kvmppc_fill_pt_regs+0x38: intra_function_call not a direct call). The 
>> below diff just removes the warnings for me, but I'm not very sure if 
>> this is the best way to fix the objtool warnings seen with this 
>> particular file. Please let me know if there are any better ways to fix it.
>> 
>> diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
>> index 0dce93ccaadf..b6a413824b98 100644
>> --- a/arch/powerpc/kvm/booke.c
>> +++ b/arch/powerpc/kvm/booke.c
>> @@ -917,7 +917,9 @@ static void kvmppc_fill_pt_regs(struct pt_regs *regs)
>>      asm("mr %0, 1" : "=r"(r1));
>>      asm("mflr %0" : "=r"(lr));
>>      asm("mfmsr %0" : "=r"(msr));
>> +   asm(".pushsection .discard.intra_function_calls; .long 999f; 
>> .popsection; 999:");
>>      asm("bl 1f; 1: mflr %0" : "=r"(ip));
>
> I don't think you can assume that there won't be anything in between two 
> asm statements.

Yeah, compiler could interleave something theoretically.

> Even if that works, I don't think it is good to expand the macro here.  
> That asm statement looks to be trying to grab the current nip. I don't 
> know enough about that code, and someone who knows more about KVM may be 
> able to help, but it looks like we should be able to simply set 'ip' to 
> the address of kvmppc_fill_pt_regs()?

There is _THIS_IP_ which should be sufficient.

cheers


Re: [PATCH v3] powerpc/boot: Don't always pass -mcpu=powerpc when building 32-bit uImage

2023-01-25 Thread Christophe Leroy


Le 22/01/2023 à 12:19, Pali Rohár a écrit :
> On Saturday 24 December 2022 18:44:52 Pali Rohár wrote:
>> On Thursday 08 December 2022 19:57:39 Christophe Leroy wrote:
>>> Le 08/12/2022 à 20:16, Pali Rohár a écrit :
 On Sunday 28 August 2022 17:43:53 Christophe Leroy wrote:
> Le 28/08/2022 à 19:41, Pali Rohár a écrit :
>> On Sunday 28 August 2022 17:39:25 Christophe Leroy wrote:
>>> Le 28/08/2022 à 19:33, Christophe Leroy a écrit :


 Le 28/08/2022 à 11:56, Pali Rohár a écrit :
> When CONFIG_TARGET_CPU is specified then pass its value to the 
> compiler
> -mcpu option. This fixes following build error when building kernel 
> with
> powerpc e500 SPE capable cross compilers:
>
> BOOTAS  arch/powerpc/boot/crt0.o
>   powerpc-linux-gnuspe-gcc: error: unrecognized argument in option
> ‘-mcpu=powerpc’
>   powerpc-linux-gnuspe-gcc: note: valid arguments to ‘-mcpu=’ are:
> 8540 8548 native
>   make[1]: *** [arch/powerpc/boot/Makefile:231:
> arch/powerpc/boot/crt0.o] Error 1

 corenet64_smp_defconfig :

   BOOTAS  arch/powerpc/boot/crt0.o
 powerpc64-linux-gcc: error: missing argument to '-mcpu='
 make[1]: *** [arch/powerpc/boot/Makefile:237 : 
 arch/powerpc/boot/crt0.o]
 Erreur 1
 make: *** [arch/powerpc/Makefile:253 : uImage] Erreur 2


>>>
>>> Seems like in fact, E5500_CPU and E6500_CPU are not taken into account
>>> in CONFIG_TARGET_CPU, and get special treatment directly in
>>> arch/powerpc/Makefile.
>>>
>>> This goes unnoticed because of CFLAGS-$(CONFIG_TARGET_CPU_BOOL) +=
>>> $(call cc-option,-mcpu=$(CONFIG_TARGET_CPU))
>>>
>>> I think we need to fix that prior to your patch.
>>
>> It looks like that CONFIG_TARGET_CPU is broken.
>>
>>  $ make ARCH=powerpc corenet64_smp_defconfig 
>> CROSS_COMPILE=powerpc64-linux-gnu-
>>  ...
>>  # configuration written to .config
>>
>>  $ grep CONFIG_TARGET_CPU .config
>>  CONFIG_TARGET_CPU_BOOL=y
>>
>> CONFIG_TARGET_CPU_BOOL is set but CONFIG_TARGET_CPU not!
>
> Yes, because there is no default value for E5500_CPU and E6500_CPU. We
> need to add one for each.

 With "[PATCH v1] powerpc/64: Set default CPU in Kconfig" patch from
 https://lore.kernel.org/linuxppc-dev/3fd60c2d8a28668a42b766b18362a526ef47e757.1670420281.git.christophe.le...@csgroup.eu/
 this change does not throw above compile error anymore.
>>>
>>>
>>> That patch should land in powerpc/next soon. When it has landed, could
>>> you resent this patch so that snowpatch checks the build again ?
>>
>> Yes. But I'm still waiting because patch is not in powerpc/next yet.
> 
> Seems that it still has not landed. Any suggestions to move forward?

Hi

I just reposted to see if it passed the CI tests this time.

Christophe

> 
>>> Because at the time being it is flagged as "failed", see
>>> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20220828095659.4061-1-p...@kernel.org/
>>>
>>> Christophe