Re: [Skiboot] [PATCH 1/3] core/device: Add function to return child node using name and length
> On 09-Jan-2023, at 8:59 PM, Dan Horák wrote:
>
> Hi Athira,
>
> On Thu, 5 Jan 2023 12:41:33 +0530 Athira Rajeev wrote:
>
>>> On 05-Jan-2023, at 12:35 PM, Madhavan Srinivasan wrote:
>>>
>>> On Mon, 2 Jan 2023 08:45:22 +0530 Athira Rajeev wrote:
>>>
>>>> Add a function dt_find_by_name_len() that returns the child node if
>>>> it matches the first "n" characters of a given name, otherwise NULL.
>>>> This is helpful for cases with node names like "name@addr". In
>>>> scenarios where nodes are added in the "name@addr" format and the
>>>> value of "addr" is not known, that node can't be matched by node
>>>> name or addr, so matching on a substring of the node name returns
>>>> the expected result. The patch adds the dt_find_by_name_len()
>>>> function and a testcase for it in core/test/run-device.c.
>>>
>>> wouldn't it be better to automatically compare the name up to the "@"
>>> character in the node name when searching for the match instead of
>>> having to hard-code the lengths? I think it should be good enough for
>>> the use case described above.
>>>
>>> something like
>>> ...
>>> pos = strchr(child->name, '@');
>>> if (!strncmp(child->name, name, pos - child->name))
>>> ...
>>>
>>> Dan
>>
>> Hi Dan,
>>
>> Thanks for checking the patch.
>>
>> Comparing up to "@" while searching for the match would restrict the
>> string search to patterns containing "@". Having dt_find_by_name_len()
>> take a length is useful for generic substring searches across
>> different patterns, so we preferred a length to hard-coding a
>> separator character.
>>
>> Please let us know your thoughts.
>
> I understand the presented solution is a pretty generic one, but I
> think the question is whether the added complexity brings benefits
> compared to the simpler "separator char" solution.
>
> And thinking even more about the generic "length" approach, it might
> bring some false positive hits. Imagine nodes abc@1 and abcd@2 and you
> are looking for "abc". A search for ("abc", 3) will also match the
> "abcd" one. And if the search string will always contain the "@"
> character, then specifying the length is not required. In fact the
> length parameter might be totally redundant, because it can be derived
> from the search string, and the new function would be like
> "dt_find_by_name_substr()".
>
> With regards,
>
> Dan

Hi Dan,

Thanks for the response. Makes sense to have the new function as
"dt_find_by_name_substr()", comparing up to "@". I will rework the
changes and post a V2 for this.

Thanks,
Athira

>> Thanks
>> Athira
>>
>>> Signed-off-by: Athira Rajeev
>>> ---
>>>  core/device.c          | 20 ++++++++++++++++++++
>>>  core/test/run-device.c | 11 +++++++++++
>>>  include/device.h       |  4 ++++
>>>  3 files changed, 35 insertions(+)
>>>
>>> diff --git a/core/device.c b/core/device.c
>>> index 2de37c74..72c54e85 100644
>>> --- a/core/device.c
>>> +++ b/core/device.c
>>> @@ -395,6 +395,26 @@ struct dt_node *dt_find_by_name(struct dt_node *root, const char *name)
>>>  }
>>>
>>> +struct dt_node *dt_find_by_name_len(struct dt_node *root,
>>> +				    const char *name, int len)
>>> +{
>>> +	struct dt_node *child, *match;
>>> +
>>> +	if (len <= 0)
>>> +		return NULL;
>>> +
>>> +	list_for_each(&root->children, child, list) {
>>> +		if (!strncmp(child->name, name, len))
>>> +			return child;
>>> +
>>> +		match = dt_find_by_name_len(child, name, len);
>>> +		if (match)
>>> +			return match;
>>> +	}
>>> +
>>> +	return NULL;
>>> +}
>>> +
>>>  struct dt_node *dt_new_check(struct dt_node *parent, const char *name)
>>>  {
>>>  	struct dt_node *node = dt_find_by_name(parent, name);
>>>
>>> diff --git a/core/test/run-device.c b/core/test/run-device.c
>>> index 4a12382b..8c552103 100644
>>> --- a/core/test/run-device.c
>>> +++ b/core/test/run-device.c
>>> @@ -466,6 +466,17 @@ int main(void)
>>>  	new_prop_ph = dt_prop_get_u32(ut2, "something");
>>>  	assert(!(new_prop_ph == ev1_ph));
>>>  	dt_free(subtree);
>>> +
>>> +	/* Test dt_find_by_name_len */
>>> +	root = dt_new_root("");
>>> +	addr1 = dt_new_addr(root, "node", 0x1);
>>> +	addr2 = dt_new_addr(root, "node0_1", 0x2);
>>> +	assert(dt_find_by_name(root, "node@1") == addr1);
>>> +	assert(dt_find_by_name(root, "node0_1@2") == addr2);
>>> +	assert(dt_find_by_name_len(root, "node@", 5) == addr1);
>>> +	assert(dt_find_by_name_len(root, "node0_1@", 8) == addr2);
>>> +	dt_free(root);
>>> +
>>>  	return 0;
>>>  }
>>>
>>> diff --git a/include/device.h b/include/device.h
>>> index 93fb90ff..f5e0fb79 100644
>>> --- a/include/device.h
>>> +++ b/include/device.h
>>> @@ -184,6 +184,10 @@ struct dt_node *dt_find_by_path(struct dt_node *root, const char *path);
>>>  /* Find a child node by name */
>>>  struct dt_node *dt_find_by_name(struct dt_node
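The matching rule the thread settles on — compare only up to the '@' unit-address separator and derive the length from the search string itself — can be sketched in standalone C. This is an illustration of the idea, not skiboot's actual dt_find_by_name_substr() implementation; the helper name below is made up for the example.

```c
#include <assert.h>  /* only needed for quick checks */
#include <stddef.h>
#include <string.h>

/*
 * Sketch: does a search string match a device-tree node name when both
 * are compared only up to the '@' unit-address separator?  Deriving
 * the lengths from the strings removes the explicit "len" parameter
 * that Dan objected to.
 */
static int node_name_matches(const char *node_name, const char *search)
{
	const char *at = strchr(node_name, '@');
	size_t node_len = at ? (size_t)(at - node_name) : strlen(node_name);
	size_t search_len;

	/* The caller may pass either "name" or "name@addr". */
	at = strchr(search, '@');
	search_len = at ? (size_t)(at - search) : strlen(search);

	/* Requiring equal lengths avoids the false positive Dan points
	 * out: with a bare strncmp(name, search, 3), "abc" would also
	 * match "abcd@2". */
	if (node_len != search_len)
		return 0;

	return memcmp(node_name, search, node_len) == 0;
}
```

With this shape, `node_name_matches("abc@1", "abc")` succeeds while `node_name_matches("abcd@2", "abc")` does not, which is exactly the behaviour the length-based dt_find_by_name_len() could not guarantee.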
[PATCH] powerpc/rtas: upgrade internal arch spinlocks
At the time commit f97bb36f705d ("powerpc/rtas: Turn rtas lock into a raw spinlock") was written, the spinlock lockup detection code called __delay(), which will not make progress if the timebase is not advancing. Since the interprocessor timebase synchronization sequence for chrp, cell, and some now-unsupported Power models can temporarily freeze the timebase through an RTAS function (freeze-time-base), the lock that serializes most RTAS calls was converted to arch_spinlock_t to prevent kernel hangs in the lockup detection code.

However, commit bc88c10d7e69 ("locking/spinlock/debug: Remove spinlock lockup detection code") removed that inconvenient property from the lock debug code several years ago. So now it should be safe to reintroduce generic locks into the RTAS support code, primarily to increase lockdep coverage.

Making rtas.lock a spinlock_t would violate lock type nesting rules because it can be acquired while holding raw locks, e.g. pci_lock and irq_desc->lock. So convert it to raw_spinlock_t. There's no apparent reason not to upgrade timebase_lock as well.
Signed-off-by: Nathan Lynch
---
 arch/powerpc/include/asm/rtas-types.h |  2 +-
 arch/powerpc/kernel/rtas.c            | 52 ++++++---------------------
 2 files changed, 15 insertions(+), 39 deletions(-)

diff --git a/arch/powerpc/include/asm/rtas-types.h b/arch/powerpc/include/asm/rtas-types.h
index 8df6235d64d1..a58f96eb2d19 100644
--- a/arch/powerpc/include/asm/rtas-types.h
+++ b/arch/powerpc/include/asm/rtas-types.h
@@ -18,7 +18,7 @@ struct rtas_t {
 	unsigned long entry;	/* physical address pointer */
 	unsigned long base;	/* physical address pointer */
 	unsigned long size;
-	arch_spinlock_t lock;
+	raw_spinlock_t lock;
 	struct rtas_args args;
 	struct device_node *dev;	/* virtual address pointer */
 };

diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index deded51a7978..a834726f18e3 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -61,7 +61,7 @@ static inline void do_enter_rtas(unsigned long args)
 }
 
 struct rtas_t rtas = {
-	.lock = __ARCH_SPIN_LOCK_UNLOCKED
+	.lock = __RAW_SPIN_LOCK_UNLOCKED(rtas.lock),
 };
 EXPORT_SYMBOL(rtas);
 
@@ -80,28 +80,6 @@ unsigned long rtas_rmo_buf;
 void (*rtas_flash_term_hook)(int);
 EXPORT_SYMBOL(rtas_flash_term_hook);
 
-/* RTAS use home made raw locking instead of spin_lock_irqsave
- * because those can be called from within really nasty contexts
- * such as having the timebase stopped which would lockup with
- * normal locks and spinlock debugging enabled
- */
-static unsigned long lock_rtas(void)
-{
-	unsigned long flags;
-
-	local_irq_save(flags);
-	preempt_disable();
-	arch_spin_lock(&rtas.lock);
-	return flags;
-}
-
-static void unlock_rtas(unsigned long flags)
-{
-	arch_spin_unlock(&rtas.lock);
-	local_irq_restore(flags);
-	preempt_enable();
-}
-
 /*
  * call_rtas_display_status and call_rtas_display_status_delay
  * are designed only for very early low-level debugging, which
@@ -109,14 +87,14 @@ static void unlock_rtas(unsigned long flags)
  */
 static void call_rtas_display_status(unsigned char c)
 {
-	unsigned long s;
+	unsigned long flags;
 
 	if (!rtas.base)
 		return;
 
-	s = lock_rtas();
+	raw_spin_lock_irqsave(&rtas.lock, flags);
 	rtas_call_unlocked(&rtas.args, 10, 1, 1, NULL, c);
-	unlock_rtas(s);
+	raw_spin_unlock_irqrestore(&rtas.lock, flags);
 }
 
 static void call_rtas_display_status_delay(char c)
@@ -534,7 +512,7 @@ int rtas_call(int token, int nargs, int nret, int *outputs, ...)
 {
 	va_list list;
 	int i;
-	unsigned long s;
+	unsigned long flags;
 	struct rtas_args *rtas_args;
 	char *buff_copy = NULL;
 	int ret;
@@ -557,8 +535,7 @@ int rtas_call(int token, int nargs, int nret, int *outputs, ...)
 		return -1;
 	}
 
-	s = lock_rtas();
-
+	raw_spin_lock_irqsave(&rtas.lock, flags);
 	/* We use the global rtas args buffer */
 	rtas_args = &rtas.args;
 
@@ -576,7 +553,7 @@ int rtas_call(int token, int nargs, int nret, int *outputs, ...)
 		outputs[i] = be32_to_cpu(rtas_args->rets[i+1]);
 	ret = (nret > 0) ? be32_to_cpu(rtas_args->rets[0]) : 0;
 
-	unlock_rtas(s);
+	raw_spin_unlock_irqrestore(&rtas.lock, flags);
 
 	if (buff_copy) {
 		log_error(buff_copy, ERR_TYPE_RTAS_LOG, 0);
@@ -1268,7 +1245,7 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
 
 	buff_copy = get_errorlog_buffer();
 
-	flags = lock_rtas();
+	raw_spin_lock_irqsave(&rtas.lock, flags);
 
 	rtas.args = args;
 	do_enter_rtas(__pa(&rtas.args));
 
@@ -1279,7 +1256,7 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
 	if (be32_to_cpu(args.rets[0]) == -1)
 		errbuf = __fetch_rtas_last_error(buff_copy);
 
-	unlock_rtas(flags);
+	raw_spin_unlock_irqrestore(&rtas.lock,
[PATCH] powerpc/pseries: drop RTAS-based timebase synchronization
The pseries platform has been LPAR-only for several generations, and the PAPR spec:

* Guarantees that timebase synchronization is performed by the platform
  ("The timebase registers are synchronized by the platform before CPUs
  are given to the OS" - 7.3.8 SMP Support).

* Completely omits the freeze-time-base and thaw-time-base RTAS
  functions, which are CHRP artifacts.

This code is effectively unused on currently supported models, so drop it.

Signed-off-by: Nathan Lynch
---
 arch/powerpc/platforms/pseries/smp.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/smp.c b/arch/powerpc/platforms/pseries/smp.c
index fd2174edfa1d..2bcfee86ff87 100644
--- a/arch/powerpc/platforms/pseries/smp.c
+++ b/arch/powerpc/platforms/pseries/smp.c
@@ -278,11 +278,5 @@ void __init smp_init_pseries(void)
 		cpumask_clear_cpu(boot_cpuid, of_spin_mask);
 	}
 
-	/* Non-lpar has additional take/give timebase */
-	if (rtas_token("freeze-time-base") != RTAS_UNKNOWN_SERVICE) {
-		smp_ops->give_timebase = rtas_give_timebase;
-		smp_ops->take_timebase = rtas_take_timebase;
-	}
-
 	pr_debug(" <- smp_init_pSeries()\n");
 }
-- 
2.37.1
Re: [PATCH v2 7/7] powerpc/pseries: Implement secvars for dynamic secure boot
On Fri, 2023-01-06 at 21:49 +1100, Michael Ellerman wrote:
> > +static int plpks_get_variable(const char *key, uint64_t key_len,
> > +			      u8 *data, uint64_t *data_size)
> > +{
> > +	struct plpks_var var = {0};
> > +	u16 ucs2_namelen;
> > +	u8 *ucs2_name;
> > +	int rc = 0;
> > +
> > +	ucs2_namelen = get_ucs2name(key, &ucs2_name);
> > +	if (!ucs2_namelen)
> > +		return -ENOMEM;
> > +
> > +	var.name = ucs2_name;
> > +	var.namelen = ucs2_namelen;
> > +	var.os = PLPKS_VAR_LINUX;
> > +	rc = plpks_read_os_var(&var);
> > +
> > +	if (rc)
> > +		goto err;
> > +
> > +	*data_size = var.datalen + sizeof(var.policy);
> > +
> > +	// We can be called with data = NULL to just get the object size.
> > +	if (data) {
> > +		memcpy(data, &var.policy, sizeof(var.policy));
> > +		memcpy(data + sizeof(var.policy), var.data, var.datalen);
> > +	}
>
> There's a lot of allocation and copying going on. The secvar-sysfs.c
> data_read() has kzalloc'ed data, but only after already doing the hcall
> to get the size. Then plpks_read_os_var() does an allocation for the
> hcall and then another allocation of the exact data size. Then
> data_read() does another copy into the sysfs supplied buffer.
>
> So to read a single variable we do the hcall twice, and allocate/copy
> the content of the variable 4 times?
>
> - Hypervisor into "output" in plpks_read_var().
> - "output" into "var->data" in plpks_read_var().
> - "var.data" into "data" in plpks_get_variable().
> - "data" into "buf" in data_read().
>
> As long as maxobjsize is < PAGE_SIZE I think we could do the hcall
> directly into "buf". Maybe we want to avoid writing into "buf" directly
> in case the hcall fails or something, but the other 3 copies seem
> unnecessary.

In the general case, I don't like passing buffer pointers straight from
parameters into hcalls, since the address has to be in the linear map,
and that's a detail I'd rather hide from callers. But otherwise, yes, I
think we can probably shift to having the caller allocate the buffers.
-- 
Andrew Donnellan            OzLabs, ADL Canberra
a...@linux.ibm.com           IBM Australia Limited
Re: [PATCH v2 7/7] powerpc/pseries: Implement secvars for dynamic secure boot
On Mon, 2023-01-09 at 16:20 +1100, Andrew Donnellan wrote:
> On Mon, 2023-01-09 at 14:34 +1100, Russell Currey wrote:
> > > > +static int plpks_secvar_init(void)
> > > > +{
> > > > +	if (!plpks_is_available())
> > > > +		return -ENODEV;
> > > > +
> > > > +	set_secvar_ops(&plpks_secvar_ops);
> > > > +	set_secvar_config_attrs(config_attrs);
> > > > +	return 0;
> > > > +}
> > > > +device_initcall(plpks_secvar_init);
> > >
> > > That must be a machine_device_initcall(pseries, ...), otherwise we
> > > will blow up doing a hcall on powernv in plpks_is_available().
> >
> > OK, can do. I don't understand your case of how powernv could hit
> > this, but I think I have to move plpks_is_available() into include/,
> > so it's going to be even more possible anyway.
>
> Kernels can be compiled with both pseries and powernv support, in
> which case plpks_secvar_init() will be called unconditionally even
> when booting on a powernv machine.
>
> I can confirm that as it is, booting this on powernv qemu causes a
> panic.

Of course, I'm not sure why I thought an initcall in a platform that
wasn't active would magically not run on other platforms.
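The pitfall discussed above — a plain device_initcall() runs on every platform compiled into a multi-platform kernel, so the init function must itself check which platform actually booted — can be modelled in a few lines of userspace C. The names and the platform-string check below are made up for illustration; this is not the kernel's real initcall machinery, where machine_device_initcall() performs the gating for you.

```c
#include <assert.h>
#include <string.h>

/* Platform the "kernel" actually booted on (powernv in the bug report). */
static const char *active_platform = "powernv";

/* Counts pseries-only hypercalls issued; on real powernv hardware an
 * hcall would crash, which is the panic Andrew observed in qemu. */
static int hcalls_made;

/* Unguarded init, like the plain device_initcall() in the patch:
 * it runs (and "hcalls") regardless of the booted platform. */
static int plpks_secvar_init_unguarded(void)
{
	hcalls_made++;
	return 0;
}

/* Guarded init: bail out first when the platform is not pseries,
 * which is the effect machine_device_initcall(pseries, ...) has. */
static int plpks_secvar_init_guarded(void)
{
	if (strcmp(active_platform, "pseries") != 0)
		return -19;	/* -ENODEV */
	hcalls_made++;
	return 0;
}
```

Booting the model on "powernv", the guarded variant returns -ENODEV without touching the hypervisor, while the unguarded one issues the fatal hcall.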
Re: [PATCH 06/14] powerpc/vdso: Remove unused '-s' flag from ASFLAGS
On Tue, Jan 10, 2023 at 01:22:38AM +0100, Andreas Schwab wrote:
> On Jan 09 2023, Segher Boessenkool wrote:
>
> > It is required by POSIX (for the c99 command, anyway). It *also* is
> > required to be supported when producing object files (so when no
> > linking is done).
> >
> > It is a GCC flag, and documented just fine:
> > https://gcc.gnu.org/onlinedocs/gcc/Link-Options.html#index-s
>
> Most assembler flags are unrelated to the flags passed to the compiler
> driver, and -s is no exception. POSIX has nothing to say about the
> sub-commands of the compiler anyway.

But this is not an assembler flag!

  quiet_cmd_vdso32as = VDSO32A $@
        cmd_vdso32as = $(VDSOCC) $(a_flags) $(CC32FLAGS) $(AS32FLAGS) -c -o $@ $<

where

  ifdef CROSS32_COMPILE
  VDSOCC := $(CROSS32_COMPILE)gcc
  else
  VDSOCC := $(CC)
  endif

The name of the make variable AS32FLAGS is a bit misleading. This
variable does not hold arguments to as, it holds arguments to the
compiler driver, "gcc".


Segher
Re: [PATCH 06/14] powerpc/vdso: Remove unused '-s' flag from ASFLAGS
On Mon, Jan 09, 2023 at 03:14:33PM -0800, Nick Desaulniers wrote: > On Mon, Jan 9, 2023 at 2:29 PM Segher Boessenkool > wrote: > > > > Hi! Happy new year all. > > HNY Segher! :) > > > > > On Mon, Jan 09, 2023 at 01:58:32PM -0800, Nick Desaulniers wrote: > > > On Wed, Jan 4, 2023 at 11:55 AM Nathan Chancellor > > > wrote: > > > > > > > > When clang's -Qunused-arguments is dropped from KBUILD_CPPFLAGS, it > > > > warns that ASFLAGS contains '-s', which is a linking phase option, so it > > > > is unused. > > > > > > > > clang-16: error: argument unused during compilation: '-s' > > > > [-Werror,-Wunused-command-line-argument] > > > > > > > > Looking at the GAS sources, '-s' is only useful when targeting Solaris > > > > and it is ignored for the powerpc target so just drop the flag > > > > altogether, as it is not needed. > > > > > > Do you have any more info where you found this? I don't see -s > > > documented as an assembler flag. > > > https://sourceware.org/binutils/docs/as/PowerPC_002dOpts.html > > > https://sourceware.org/binutils/docs/as/Invoking.html > > > > It is required by POSIX (for the c99 command, anyway). It *also* is > > required to be supported when producing object files (so when no linking > > is done). > > > > It is a GCC flag, and documented just fine: > > https://gcc.gnu.org/onlinedocs/gcc/Link-Options.html#index-s > > > > (Yes, that says it is for linking; but the option is allowed without > > error of any kind always). > > > > (ASFLAGS sounds like it is for assembler commands, but it really is > > for compiler commands that just happen to get .S input files). > > > > > The patch seems fine to me, but what was this ever supposed to be? > > > FWICT it predates git history (looking at > > > arch/powerpc/kernel/vdso32/Makefile at fc15351d9d63) > > > > Yeah, good question. This compiler flag does the moral equivalent of > > strip -s (aka --strip-all). 
> > Maybe this was needed at some point, or the symbol or debug info was
> > just annoying (during bringup or similar)?
>
> Ah right! Ok then, I think we might keep the patch's diff, but update
> the commit message to mention this is a linker flag that's unused
> since the compiler is being invoked but not the linker (the compiler
> is being used as the driver to assemble a single assembler source
> without linking it; linking is then driven by the linker in a separate
> make rule).

Yes, sorry, I thought that was clear with the "which is a linking phase
option" comment in the commit message but clearly not :)

> Then we might want to revisit that s390 patch, too?
> https://lore.kernel.org/llvm/20221228-drop-qunused-arguments-v1-9-658cbc8fc...@kernel.org/

So for this patch, I have:

    When clang's -Qunused-arguments is dropped from KBUILD_CPPFLAGS, it
    warns:

      clang-16: error: argument unused during compilation: '-s'
      [-Werror,-Wunused-command-line-argument]

    The compiler's '-s' flag is a linking option (it is passed along to
    the linker directly), which means it does nothing when the linker
    is not invoked by the compiler. The kernel builds all .o files with
    either '-c' or '-S', which do not run the linker, so '-s' can be
    safely dropped from ASFLAGS.

as a new commit message. Is that sufficient for everyone? If so, I'll
adjust the s390 commit to match, as it is the same exact problem.

Alternatively, if '-s' should actually remain around, we could move it
to ldflags-y, which is added in patch 7. However, I assume that nobody
has noticed that it has not been doing its job for a while, so it
should be safe to remove.

Cheers,
Nathan
Re: [PATCH 06/14] powerpc/vdso: Remove unused '-s' flag from ASFLAGS
On Jan 09 2023, Segher Boessenkool wrote: > It is required by POSIX (for the c99 command, anyway). It *also* is > required to be supported when producing object files (so when no linking > is done). > > It is a GCC flag, and documented just fine: > https://gcc.gnu.org/onlinedocs/gcc/Link-Options.html#index-s Most assembler flags are unrelated to the flags passed to the compiler driver, and -s is no exception. POSIX has nothing to say about the sub-commands of the compiler anyway. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1 "And now for something completely different."
Re: [PATCH 08/14] powerpc/vdso: Remove an unsupported flag from vgettimeofday-32.o with clang
On Mon, Jan 9, 2023 at 2:38 PM Nathan Chancellor wrote: > > On Mon, Jan 09, 2023 at 02:12:55PM -0800, Nick Desaulniers wrote: > > On Wed, Jan 4, 2023 at 11:55 AM Nathan Chancellor wrote: > > > > > > When clang's -Qunused-arguments is dropped from KBUILD_CPPFLAGS, it > > > warns: > > > > > > clang-16: error: argument unused during compilation: > > > '-fno-stack-clash-protection' [-Werror,-Wunused-command-line-argument] > > > > > > This flag is supported for 64-bit powerpc but not 32-bit, hence the > > > warning. > > > Just remove the flag from vgettimeofday-32.o's CFLAGS when using clang, > > > as has > > > been done for other flags previously. > > > > > > Signed-off-by: Nathan Chancellor > > > > Hmm...so this was added by the top level Makefile doing a cc-option > > test. How did the test pass if this was unsupported? That worries me > > that perhaps other cc-option tests are passing erroneously for certain > > ppc -m32/-m64 configs? > > Sure, that is a reasonable concern. I should have expanded upon this a > little bit more in the commit message. Is this any better? > > When clang's -Qunused-arguments is dropped from KBUILD_CPPFLAGS, it > warns: > > clang-16: error: argument unused during compilation: > '-fno-stack-clash-protection' [-Werror,-Wunused-command-line-argument] > > This warning happens because vgettimeofday-32.c gets its base CFLAGS > from the main kernel, which may contain flags that are only supported > on a 64-bit target but not a 32-bit one, which is the case here. > -fstack-clash-protection and its negation are only suppported by the > 64-bit powerpc target but that flag is included in an invocation for a > 32-bit powerpc target, so clang points out that while the flag is one > that it recognizes, it is not actually used by this compiler job. > > To eliminate the warning, remove -fno-stack-clash-protection from > vgettimeofday-32.c's CFLAGS when using clang, as has been done for > other flags previously. Ah, ok that makes more sense. 
Sorry for my confusion, but if you wouldn't mind adding the additional
info in v3, it might help others (or just me!). You may add my RB to
such a v3 w/ updated commit message.

-- 
Thanks,
~Nick Desaulniers
Re: [PATCH 06/14] powerpc/vdso: Remove unused '-s' flag from ASFLAGS
On Mon, Jan 9, 2023 at 2:29 PM Segher Boessenkool wrote: > > Hi! Happy new year all. HNY Segher! :) > > On Mon, Jan 09, 2023 at 01:58:32PM -0800, Nick Desaulniers wrote: > > On Wed, Jan 4, 2023 at 11:55 AM Nathan Chancellor wrote: > > > > > > When clang's -Qunused-arguments is dropped from KBUILD_CPPFLAGS, it > > > warns that ASFLAGS contains '-s', which is a linking phase option, so it > > > is unused. > > > > > > clang-16: error: argument unused during compilation: '-s' > > > [-Werror,-Wunused-command-line-argument] > > > > > > Looking at the GAS sources, '-s' is only useful when targeting Solaris > > > and it is ignored for the powerpc target so just drop the flag > > > altogether, as it is not needed. > > > > Do you have any more info where you found this? I don't see -s > > documented as an assembler flag. > > https://sourceware.org/binutils/docs/as/PowerPC_002dOpts.html > > https://sourceware.org/binutils/docs/as/Invoking.html > > It is required by POSIX (for the c99 command, anyway). It *also* is > required to be supported when producing object files (so when no linking > is done). > > It is a GCC flag, and documented just fine: > https://gcc.gnu.org/onlinedocs/gcc/Link-Options.html#index-s > > (Yes, that says it is for linking; but the option is allowed without > error of any kind always). > > (ASFLAGS sounds like it is for assembler commands, but it really is > for compiler commands that just happen to get .S input files). > > > The patch seems fine to me, but what was this ever supposed to be? > > FWICT it predates git history (looking at > > arch/powerpc/kernel/vdso32/Makefile at fc15351d9d63) > > Yeah, good question. This compiler flag does the moral equivalent of > strip -s (aka --strip-all). Maybe this was needed at some point, or > the symbol or debug info was just annoying (during bringup or similar)? Ah right! 
Ok then, I think we might keep the patch's diff, but update the commit message to mention this is a linker flag that's unused since the compiler is being invoked but not the linker (the compiler is being used as the driver to assemble a single assembler source without linking it; linking is then driven by the linker in a separate make rule). Then we might want to revisit that s390 patch, too? https://lore.kernel.org/llvm/20221228-drop-qunused-arguments-v1-9-658cbc8fc...@kernel.org/ > > > Reviewed-by: Nick Desaulniers > Reviewed-by: Segher Boessenkool > > > Segher -- Thanks, ~Nick Desaulniers
Re: [PATCH 06/14] powerpc/vdso: Remove unused '-s' flag from ASFLAGS
On Mon, Jan 09, 2023 at 03:37:53PM -0700, Nathan Chancellor wrote: > On Mon, Jan 09, 2023 at 04:23:37PM -0600, Segher Boessenkool wrote: > > Hi! Happy new year all. > > > > On Mon, Jan 09, 2023 at 01:58:32PM -0800, Nick Desaulniers wrote: > > > On Wed, Jan 4, 2023 at 11:55 AM Nathan Chancellor > > > wrote: > > > > > > > > When clang's -Qunused-arguments is dropped from KBUILD_CPPFLAGS, it > > > > warns that ASFLAGS contains '-s', which is a linking phase option, so it > > > > is unused. > > > > > > > > clang-16: error: argument unused during compilation: '-s' > > > > [-Werror,-Wunused-command-line-argument] > > > > > > > > Looking at the GAS sources, '-s' is only useful when targeting Solaris > > > > and it is ignored for the powerpc target so just drop the flag > > > > altogether, as it is not needed. > > > > > > Do you have any more info where you found this? I don't see -s > > > documented as an assembler flag. > > > https://sourceware.org/binutils/docs/as/PowerPC_002dOpts.html > > > https://sourceware.org/binutils/docs/as/Invoking.html > > > > It is required by POSIX (for the c99 command, anyway). It *also* is > > required to be supported when producing object files (so when no linking > > is done). > > > > It is a GCC flag, and documented just fine: > > https://gcc.gnu.org/onlinedocs/gcc/Link-Options.html#index-s > > > > (Yes, that says it is for linking; but the option is allowed without > > error of any kind always). > > > > (ASFLAGS sounds like it is for assembler commands, but it really is > > for compiler commands that just happen to get .S input files). > > > > > The patch seems fine to me, but what was this ever supposed to be? > > > FWICT it predates git history (looking at > > > arch/powerpc/kernel/vdso32/Makefile at fc15351d9d63) > > > > Yeah, good question. This compiler flag does the moral equivalent of > > strip -s (aka --strip-all). Maybe this was needed at some point, or > > the symbol or debug info was just annoying (during bringup or similar)? 
> > > > > Reviewed-by: Nick Desaulniers > > Reviewed-by: Segher Boessenkool > > Thank you for the review and extra explanation! I had kind of expanded > on this in the s390 version of this patch [1], I will go ahead and do > the same thing for the powerpc version too. > > [1]: > https://lore.kernel.org/llvm/20221228-drop-qunused-arguments-v1-9-658cbc8fc...@kernel.org/ The underwhelming GCC source code for this is https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/gcc.cc;h=d629ca5e424ad3120be13e82b9f38b7bf8f1cdf2;hb=HEAD#l1148 which says that the -s flag is passed through unmodified to the linker when invoking the linker. See #l603 for the %{s} specs syntax. This is not a flag to the assembler at all. It is a flag to the compiler, which passes it on to the linker :-) > > (ASFLAGS sounds like it is for assembler commands, but it really is > > for compiler commands that just happen to get .S input files). Segher
Re: [PATCH 08/14] powerpc/vdso: Remove an unsupported flag from vgettimeofday-32.o with clang
On Mon, Jan 09, 2023 at 02:12:55PM -0800, Nick Desaulniers wrote:
> On Wed, Jan 4, 2023 at 11:55 AM Nathan Chancellor wrote:
> >
> > When clang's -Qunused-arguments is dropped from KBUILD_CPPFLAGS, it
> > warns:
> >
> >   clang-16: error: argument unused during compilation:
> >   '-fno-stack-clash-protection' [-Werror,-Wunused-command-line-argument]
> >
> > This flag is supported for 64-bit powerpc but not 32-bit, hence the
> > warning. Just remove the flag from vgettimeofday-32.o's CFLAGS when
> > using clang, as has been done for other flags previously.
> >
> > Signed-off-by: Nathan Chancellor
>
> Hmm...so this was added by the top level Makefile doing a cc-option
> test. How did the test pass if this was unsupported? That worries me
> that perhaps other cc-option tests are passing erroneously for certain
> ppc -m32/-m64 configs?

Sure, that is a reasonable concern. I should have expanded upon this a
little bit more in the commit message. Is this any better?

    When clang's -Qunused-arguments is dropped from KBUILD_CPPFLAGS, it
    warns:

      clang-16: error: argument unused during compilation:
      '-fno-stack-clash-protection' [-Werror,-Wunused-command-line-argument]

    This warning happens because vgettimeofday-32.c gets its base
    CFLAGS from the main kernel, which may contain flags that are only
    supported on a 64-bit target but not a 32-bit one, which is the
    case here. -fstack-clash-protection and its negation are only
    supported by the 64-bit powerpc target, but that flag is included
    in an invocation for a 32-bit powerpc target, so clang points out
    that while the flag is one that it recognizes, it is not actually
    used by this compiler job.

    To eliminate the warning, remove -fno-stack-clash-protection from
    vgettimeofday-32.c's CFLAGS when using clang, as has been done for
    other flags previously.

Cheers,
Nathan

> > ---
> > Cc: m...@ellerman.id.au
> > Cc: npig...@gmail.com
> > Cc: christophe.le...@csgroup.eu
> > Cc: linuxppc-dev@lists.ozlabs.org
> > ---
> >  arch/powerpc/kernel/vdso/Makefile | 5 +++++
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/arch/powerpc/kernel/vdso/Makefile b/arch/powerpc/kernel/vdso/Makefile
> > index 769b62832b38..4ee7d36ce752 100644
> > --- a/arch/powerpc/kernel/vdso/Makefile
> > +++ b/arch/powerpc/kernel/vdso/Makefile
> > @@ -16,6 +16,11 @@ ifneq ($(c-gettimeofday-y),)
> >    CFLAGS_vgettimeofday-32.o += -ffreestanding -fasynchronous-unwind-tables
> >    CFLAGS_REMOVE_vgettimeofday-32.o = $(CC_FLAGS_FTRACE)
> >    CFLAGS_REMOVE_vgettimeofday-32.o += -mcmodel=medium -mabi=elfv1 -mabi=elfv2 -mcall-aixdesc
> > +  # This flag is supported by clang for 64-bit but not 32-bit so it will cause
> > +  # an unused command line flag warning for this file.
> > +  ifdef CONFIG_CC_IS_CLANG
> > +  CFLAGS_REMOVE_vgettimeofday-32.o += -fno-stack-clash-protection
> > +  endif
> >    CFLAGS_vgettimeofday-64.o += -include $(c-gettimeofday-y)
> >    CFLAGS_vgettimeofday-64.o += $(DISABLE_LATENT_ENTROPY_PLUGIN)
> >    CFLAGS_vgettimeofday-64.o += $(call cc-option, -fno-stack-protector)
> >
> > --
> > 2.39.0
>
> --
> Thanks,
> ~Nick Desaulniers
Re: [PATCH 06/14] powerpc/vdso: Remove unused '-s' flag from ASFLAGS
On Mon, Jan 09, 2023 at 04:23:37PM -0600, Segher Boessenkool wrote: > Hi! Happy new year all. > > On Mon, Jan 09, 2023 at 01:58:32PM -0800, Nick Desaulniers wrote: > > On Wed, Jan 4, 2023 at 11:55 AM Nathan Chancellor wrote: > > > > > > When clang's -Qunused-arguments is dropped from KBUILD_CPPFLAGS, it > > > warns that ASFLAGS contains '-s', which is a linking phase option, so it > > > is unused. > > > > > > clang-16: error: argument unused during compilation: '-s' > > > [-Werror,-Wunused-command-line-argument] > > > > > > Looking at the GAS sources, '-s' is only useful when targeting Solaris > > > and it is ignored for the powerpc target so just drop the flag > > > altogether, as it is not needed. > > > > Do you have any more info where you found this? I don't see -s > > documented as an assembler flag. > > https://sourceware.org/binutils/docs/as/PowerPC_002dOpts.html > > https://sourceware.org/binutils/docs/as/Invoking.html > > It is required by POSIX (for the c99 command, anyway). It *also* is > required to be supported when producing object files (so when no linking > is done). > > It is a GCC flag, and documented just fine: > https://gcc.gnu.org/onlinedocs/gcc/Link-Options.html#index-s > > (Yes, that says it is for linking; but the option is allowed without > error of any kind always). > > (ASFLAGS sounds like it is for assembler commands, but it really is > for compiler commands that just happen to get .S input files). > > > The patch seems fine to me, but what was this ever supposed to be? > > FWICT it predates git history (looking at > > arch/powerpc/kernel/vdso32/Makefile at fc15351d9d63) > > Yeah, good question. This compiler flag does the moral equivalent of > strip -s (aka --strip-all). Maybe this was needed at some point, or > the symbol or debug info was just annoying (during bringup or similar)? > > > Reviewed-by: Nick Desaulniers > Reviewed-by: Segher Boessenkool Thank you for the review and extra explanation! 
I had kind of expanded on this in the s390 version of this patch [1], I will go ahead and do the same thing for the powerpc version too. [1]: https://lore.kernel.org/llvm/20221228-drop-qunused-arguments-v1-9-658cbc8fc...@kernel.org/ Cheers, Nathan
Re: [PATCH 06/14] powerpc/vdso: Remove unused '-s' flag from ASFLAGS
Hi! Happy new year all. On Mon, Jan 09, 2023 at 01:58:32PM -0800, Nick Desaulniers wrote: > On Wed, Jan 4, 2023 at 11:55 AM Nathan Chancellor wrote: > > > > When clang's -Qunused-arguments is dropped from KBUILD_CPPFLAGS, it > > warns that ASFLAGS contains '-s', which is a linking phase option, so it > > is unused. > > > > clang-16: error: argument unused during compilation: '-s' > > [-Werror,-Wunused-command-line-argument] > > > > Looking at the GAS sources, '-s' is only useful when targeting Solaris > > and it is ignored for the powerpc target so just drop the flag > > altogether, as it is not needed. > > Do you have any more info where you found this? I don't see -s > documented as an assembler flag. > https://sourceware.org/binutils/docs/as/PowerPC_002dOpts.html > https://sourceware.org/binutils/docs/as/Invoking.html It is required by POSIX (for the c99 command, anyway). It *also* is required to be supported when producing object files (so when no linking is done). It is a GCC flag, and documented just fine: https://gcc.gnu.org/onlinedocs/gcc/Link-Options.html#index-s (Yes, that says it is for linking; but the option is allowed without error of any kind always). (ASFLAGS sounds like it is for assembler commands, but it really is for compiler commands that just happen to get .S input files). > The patch seems fine to me, but what was this ever supposed to be? > FWICT it predates git history (looking at > arch/powerpc/kernel/vdso32/Makefile at fc15351d9d63) Yeah, good question. This compiler flag does the moral equivalent of strip -s (aka --strip-all). Maybe this was needed at some point, or the symbol or debug info was just annoying (during bringup or similar)? > Reviewed-by: Nick Desaulniers Reviewed-by: Segher Boessenkool Segher
Re: [PATCH 06/14] powerpc/vdso: Remove unused '-s' flag from ASFLAGS
On Mon, Jan 9, 2023 at 2:15 PM Nathan Chancellor wrote: > > On Mon, Jan 09, 2023 at 01:58:32PM -0800, Nick Desaulniers wrote: > > On Wed, Jan 4, 2023 at 11:55 AM Nathan Chancellor wrote: > > > > > > When clang's -Qunused-arguments is dropped from KBUILD_CPPFLAGS, it > > > warns that ASFLAGS contains '-s', which is a linking phase option, so it > > > is unused. > > > > > > clang-16: error: argument unused during compilation: '-s' > > > [-Werror,-Wunused-command-line-argument] > > > > > > Looking at the GAS sources, '-s' is only useful when targeting Solaris > > > and it is ignored for the powerpc target so just drop the flag > > > altogether, as it is not needed. > > > > Do you have any more info where you found this? I don't see -s > > documented as an assembler flag. > > https://sourceware.org/binutils/docs/as/PowerPC_002dOpts.html > > https://sourceware.org/binutils/docs/as/Invoking.html > > Sure thing, sorry I did not include it initially. See the section from > line 1284 to 1291 or search for "case 's':": > > https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gas/config/tc-ppc.c;h=9450fa74de1b61542c9a18babf8c8f621ef904fb;hb=HEAD > > > The patch seems fine to me, but what was this ever supposed to be? > > FWICT it predates git history (looking at > > arch/powerpc/kernel/vdso32/Makefile at fc15351d9d63) > > Right, I could not figure it out either, it has been there since the > vDSO was introduced back in 2005 (I was three days away from 10!) and > there is no comment about it so *shrug*: > > https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/?id=054eb7153aeb84cc92da84210cf93b0e2a34811b > > If someone else's archeological skills are better and can provide more > information, I am happy to include that. > > > Reviewed-by: Nick Desaulniers > > Thanks as always for the review! I'll include this and a note about > where in binutils I found that information for v2. 
Yeah, I think the comment from binutils sources would be good to add to a v2 commit message. Thanks! -- Thanks, ~Nick Desaulniers
Re: [PATCH 07/14] powerpc/vdso: Improve linker flags
On Mon, Jan 09, 2023 at 02:08:41PM -0800, Nick Desaulniers wrote: > On Wed, Jan 4, 2023 at 11:55 AM Nathan Chancellor wrote: > > > > When clang's -Qunused-arguments is dropped from KBUILD_CPPFLAGS, there > > are several warnings in the PowerPC vDSO: > > > > clang-16: error: -Wl,-soname=linux-vdso32.so.1: 'linker' input unused > > [-Werror,-Wunused-command-line-argument] > > clang-16: error: -Wl,--hash-style=both: 'linker' input unused > > [-Werror,-Wunused-command-line-argument] > > clang-16: error: argument unused during compilation: '-shared' > > [-Werror,-Wunused-command-line-argument] > > > > clang-16: error: argument unused during compilation: '-nostdinc' > > [-Werror,-Wunused-command-line-argument] > > clang-16: error: argument unused during compilation: '-Wa,-maltivec' > > [-Werror,-Wunused-command-line-argument] > > > > The first group of warnings point out that linker flags were being added > > to all invocations of $(CC), even though they will only be used during > > the final vDSO link. Move those flags to ldflags-y. > > > > The second group of warnings are compiler or assembler flags that will > > be unused during linking. Filter them out from KBUILD_CFLAGS so that > > they are not used during linking. 
> > > > Signed-off-by: Nathan Chancellor > > --- > > Cc: m...@ellerman.id.au > > Cc: npig...@gmail.com > > Cc: christophe.le...@csgroup.eu > > Cc: linuxppc-dev@lists.ozlabs.org > > --- > > arch/powerpc/kernel/vdso/Makefile | 18 +++--- > > 1 file changed, 11 insertions(+), 7 deletions(-) > > > > diff --git a/arch/powerpc/kernel/vdso/Makefile > > b/arch/powerpc/kernel/vdso/Makefile > > index 45c0cc5d34b6..769b62832b38 100644 > > --- a/arch/powerpc/kernel/vdso/Makefile > > +++ b/arch/powerpc/kernel/vdso/Makefile > > @@ -47,13 +47,17 @@ KCOV_INSTRUMENT := n > > UBSAN_SANITIZE := n > > KASAN_SANITIZE := n > > > > -ccflags-y := -shared -fno-common -fno-builtin -nostdlib > > -Wl,--hash-style=both > > -ccflags-$(CONFIG_LD_IS_LLD) += $(call > > cc-option,--ld-path=$(LD),-fuse-ld=lld) > > - > > -CC32FLAGS := -Wl,-soname=linux-vdso32.so.1 -m32 > > +ccflags-y := -fno-common -fno-builtin > > +ldflags-y := -Wl,--hash-style=both -nostdlib -shared > > +ldflags-$(CONFIG_LD_IS_LLD) += $(call > > cc-option,--ld-path=$(LD),-fuse-ld=lld) > > +# Filter flags that clang will warn are unused for linking > > +ldflags-y += $(filter-out $(CC_FLAGS_FTRACE) -Wa$(comma)%, > > $(KBUILD_CFLAGS)) > > + > > +CC32FLAGS := -m32 > > +LD32FLAGS := -Wl,-soname=linux-vdso32.so.1 > > AS32FLAGS := -D__VDSO32__ > > > > -CC64FLAGS := -Wl,-soname=linux-vdso64.so.1 > > +LD64FLAGS := -Wl,-soname=linux-vdso64.so.1 > > AS64FLAGS := -D__VDSO64__ > > > > targets += vdso32.lds > > @@ -92,14 +96,14 @@ include/generated/vdso64-offsets.h: > > $(obj)/vdso64.so.dbg FORCE > > > > # actual build commands > > quiet_cmd_vdso32ld_and_check = VDSO32L $@ > > - cmd_vdso32ld_and_check = $(VDSOCC) $(c_flags) $(CC32FLAGS) -o $@ > > -Wl,-T$(filter %.lds,$^) $(filter %.o,$^) -z noexecstack ; $(cmd_vdso_check) > > + cmd_vdso32ld_and_check = $(VDSOCC) $(ldflags-y) $(CC32FLAGS) > > $(LD32FLAGS) -o $@ -Wl,-T$(filter %.lds,$^) $(filter %.o,$^) -z noexecstack > > ; $(cmd_vdso_check) > > quiet_cmd_vdso32as = VDSO32A $@ > >cmd_vdso32as = 
$(VDSOCC) $(a_flags) $(CC32FLAGS) $(AS32FLAGS) -c -o > > $@ $< > > quiet_cmd_vdso32cc = VDSO32C $@ > >cmd_vdso32cc = $(VDSOCC) $(c_flags) $(CC32FLAGS) -c -o $@ $< > > > > quiet_cmd_vdso64ld_and_check = VDSO64L $@ > > - cmd_vdso64ld_and_check = $(VDSOCC) $(c_flags) $(CC64FLAGS) -o $@ > > -Wl,-T$(filter %.lds,$^) $(filter %.o,$^) -z noexecstack ; $(cmd_vdso_check) > > + cmd_vdso64ld_and_check = $(VDSOCC) $(ldflags-y) $(CC64FLAGS) > > $(LD64FLAGS) -o $@ -Wl,-T$(filter %.lds,$^) $(filter %.o,$^) -z noexecstack > > ; $(cmd_vdso_check) > > Let's move `-z noexecstack` up into ldflags-y? (you may add my RB with > that modification) Ack, done locally, will be included in v2. > > quiet_cmd_vdso64as = VDSO64A $@ > >cmd_vdso64as = $(VDSOCC) $(a_flags) $(CC64FLAGS) $(AS64FLAGS) -c -o > > $@ $< > > > > > > -- > > 2.39.0 > > > > -- > Thanks, > ~Nick Desaulniers
Re: [PATCH 06/14] powerpc/vdso: Remove unused '-s' flag from ASFLAGS
On Mon, Jan 09, 2023 at 01:58:32PM -0800, Nick Desaulniers wrote: > On Wed, Jan 4, 2023 at 11:55 AM Nathan Chancellor wrote: > > > > When clang's -Qunused-arguments is dropped from KBUILD_CPPFLAGS, it > > warns that ASFLAGS contains '-s', which is a linking phase option, so it > > is unused. > > > > clang-16: error: argument unused during compilation: '-s' > > [-Werror,-Wunused-command-line-argument] > > > > Looking at the GAS sources, '-s' is only useful when targeting Solaris > > and it is ignored for the powerpc target so just drop the flag > > altogether, as it is not needed. > > Do you have any more info where you found this? I don't see -s > documented as an assembler flag. > https://sourceware.org/binutils/docs/as/PowerPC_002dOpts.html > https://sourceware.org/binutils/docs/as/Invoking.html Sure thing, sorry I did not include it initially. See the section from line 1284 to 1291 or search for "case 's':": https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gas/config/tc-ppc.c;h=9450fa74de1b61542c9a18babf8c8f621ef904fb;hb=HEAD > The patch seems fine to me, but what was this ever supposed to be? > FWICT it predates git history (looking at > arch/powerpc/kernel/vdso32/Makefile at fc15351d9d63) Right, I could not figure it out either, it has been there since the vDSO was introduced back in 2005 (I was three days away from 10!) and there is no comment about it so *shrug*: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/?id=054eb7153aeb84cc92da84210cf93b0e2a34811b If someone else's archeological skills are better and can provide more information, I am happy to include that. > Reviewed-by: Nick Desaulniers Thanks as always for the review! I'll include this and a note about where in binutils I found that information for v2. 
> > > > Signed-off-by: Nathan Chancellor > > --- > > Cc: m...@ellerman.id.au > > Cc: npig...@gmail.com > > Cc: christophe.le...@csgroup.eu > > Cc: linuxppc-dev@lists.ozlabs.org > > --- > > arch/powerpc/kernel/vdso/Makefile | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/arch/powerpc/kernel/vdso/Makefile > > b/arch/powerpc/kernel/vdso/Makefile > > index 6a977b0d8ffc..45c0cc5d34b6 100644 > > --- a/arch/powerpc/kernel/vdso/Makefile > > +++ b/arch/powerpc/kernel/vdso/Makefile > > @@ -51,10 +51,10 @@ ccflags-y := -shared -fno-common -fno-builtin -nostdlib > > -Wl,--hash-style=both > > ccflags-$(CONFIG_LD_IS_LLD) += $(call > > cc-option,--ld-path=$(LD),-fuse-ld=lld) > > > > CC32FLAGS := -Wl,-soname=linux-vdso32.so.1 -m32 > > -AS32FLAGS := -D__VDSO32__ -s > > +AS32FLAGS := -D__VDSO32__ > > > > CC64FLAGS := -Wl,-soname=linux-vdso64.so.1 > > -AS64FLAGS := -D__VDSO64__ -s > > +AS64FLAGS := -D__VDSO64__ > > > > targets += vdso32.lds > > CPPFLAGS_vdso32.lds += -P -C -Upowerpc > > > > -- > > 2.39.0 > > > > -- > Thanks, > ~Nick Desaulniers
Re: [PATCH 08/14] powerpc/vdso: Remove an unsupported flag from vgettimeofday-32.o with clang
On Wed, Jan 4, 2023 at 11:55 AM Nathan Chancellor wrote: > > When clang's -Qunused-arguments is dropped from KBUILD_CPPFLAGS, it > warns: > > clang-16: error: argument unused during compilation: > '-fno-stack-clash-protection' [-Werror,-Wunused-command-line-argument] > > This flag is supported for 64-bit powerpc but not 32-bit, hence the warning. > Just remove the flag from vgettimeofday-32.o's CFLAGS when using clang, as has > been done for other flags previously. > > Signed-off-by: Nathan Chancellor Hmm...so this was added by the top level Makefile doing a cc-option test. How did the test pass if this was unsupported? That worries me that perhaps other cc-option tests are passing erroneously for certain ppc -m32/-m64 configs? > --- > Cc: m...@ellerman.id.au > Cc: npig...@gmail.com > Cc: christophe.le...@csgroup.eu > Cc: linuxppc-dev@lists.ozlabs.org > --- > arch/powerpc/kernel/vdso/Makefile | 5 + > 1 file changed, 5 insertions(+) > > diff --git a/arch/powerpc/kernel/vdso/Makefile > b/arch/powerpc/kernel/vdso/Makefile > index 769b62832b38..4ee7d36ce752 100644 > --- a/arch/powerpc/kernel/vdso/Makefile > +++ b/arch/powerpc/kernel/vdso/Makefile > @@ -16,6 +16,11 @@ ifneq ($(c-gettimeofday-y),) >CFLAGS_vgettimeofday-32.o += -ffreestanding -fasynchronous-unwind-tables >CFLAGS_REMOVE_vgettimeofday-32.o = $(CC_FLAGS_FTRACE) >CFLAGS_REMOVE_vgettimeofday-32.o += -mcmodel=medium -mabi=elfv1 > -mabi=elfv2 -mcall-aixdesc > + # This flag is supported by clang for 64-bit but not 32-bit so it will > cause > + # an unused command line flag warning for this file. > + ifdef CONFIG_CC_IS_CLANG > + CFLAGS_REMOVE_vgettimeofday-32.o += -fno-stack-clash-protection > + endif >CFLAGS_vgettimeofday-64.o += -include $(c-gettimeofday-y) >CFLAGS_vgettimeofday-64.o += $(DISABLE_LATENT_ENTROPY_PLUGIN) >CFLAGS_vgettimeofday-64.o += $(call cc-option, -fno-stack-protector) > > -- > 2.39.0 -- Thanks, ~Nick Desaulniers
Re: [PATCH 07/14] powerpc/vdso: Improve linker flags
On Wed, Jan 4, 2023 at 11:55 AM Nathan Chancellor wrote: > > When clang's -Qunused-arguments is dropped from KBUILD_CPPFLAGS, there > are several warnings in the PowerPC vDSO: > > clang-16: error: -Wl,-soname=linux-vdso32.so.1: 'linker' input unused > [-Werror,-Wunused-command-line-argument] > clang-16: error: -Wl,--hash-style=both: 'linker' input unused > [-Werror,-Wunused-command-line-argument] > clang-16: error: argument unused during compilation: '-shared' > [-Werror,-Wunused-command-line-argument] > > clang-16: error: argument unused during compilation: '-nostdinc' > [-Werror,-Wunused-command-line-argument] > clang-16: error: argument unused during compilation: '-Wa,-maltivec' > [-Werror,-Wunused-command-line-argument] > > The first group of warnings point out that linker flags were being added > to all invocations of $(CC), even though they will only be used during > the final vDSO link. Move those flags to ldflags-y. > > The second group of warnings are compiler or assembler flags that will > be unused during linking. Filter them out from KBUILD_CFLAGS so that > they are not used during linking. 
> > Signed-off-by: Nathan Chancellor > --- > Cc: m...@ellerman.id.au > Cc: npig...@gmail.com > Cc: christophe.le...@csgroup.eu > Cc: linuxppc-dev@lists.ozlabs.org > --- > arch/powerpc/kernel/vdso/Makefile | 18 +++--- > 1 file changed, 11 insertions(+), 7 deletions(-) > > diff --git a/arch/powerpc/kernel/vdso/Makefile > b/arch/powerpc/kernel/vdso/Makefile > index 45c0cc5d34b6..769b62832b38 100644 > --- a/arch/powerpc/kernel/vdso/Makefile > +++ b/arch/powerpc/kernel/vdso/Makefile > @@ -47,13 +47,17 @@ KCOV_INSTRUMENT := n > UBSAN_SANITIZE := n > KASAN_SANITIZE := n > > -ccflags-y := -shared -fno-common -fno-builtin -nostdlib -Wl,--hash-style=both > -ccflags-$(CONFIG_LD_IS_LLD) += $(call cc-option,--ld-path=$(LD),-fuse-ld=lld) > - > -CC32FLAGS := -Wl,-soname=linux-vdso32.so.1 -m32 > +ccflags-y := -fno-common -fno-builtin > +ldflags-y := -Wl,--hash-style=both -nostdlib -shared > +ldflags-$(CONFIG_LD_IS_LLD) += $(call cc-option,--ld-path=$(LD),-fuse-ld=lld) > +# Filter flags that clang will warn are unused for linking > +ldflags-y += $(filter-out $(CC_FLAGS_FTRACE) -Wa$(comma)%, $(KBUILD_CFLAGS)) > + > +CC32FLAGS := -m32 > +LD32FLAGS := -Wl,-soname=linux-vdso32.so.1 > AS32FLAGS := -D__VDSO32__ > > -CC64FLAGS := -Wl,-soname=linux-vdso64.so.1 > +LD64FLAGS := -Wl,-soname=linux-vdso64.so.1 > AS64FLAGS := -D__VDSO64__ > > targets += vdso32.lds > @@ -92,14 +96,14 @@ include/generated/vdso64-offsets.h: $(obj)/vdso64.so.dbg > FORCE > > # actual build commands > quiet_cmd_vdso32ld_and_check = VDSO32L $@ > - cmd_vdso32ld_and_check = $(VDSOCC) $(c_flags) $(CC32FLAGS) -o $@ > -Wl,-T$(filter %.lds,$^) $(filter %.o,$^) -z noexecstack ; $(cmd_vdso_check) > + cmd_vdso32ld_and_check = $(VDSOCC) $(ldflags-y) $(CC32FLAGS) > $(LD32FLAGS) -o $@ -Wl,-T$(filter %.lds,$^) $(filter %.o,$^) -z noexecstack ; > $(cmd_vdso_check) > quiet_cmd_vdso32as = VDSO32A $@ >cmd_vdso32as = $(VDSOCC) $(a_flags) $(CC32FLAGS) $(AS32FLAGS) -c -o $@ > $< > quiet_cmd_vdso32cc = VDSO32C $@ >cmd_vdso32cc = $(VDSOCC) 
$(c_flags) $(CC32FLAGS) -c -o $@ $< > > quiet_cmd_vdso64ld_and_check = VDSO64L $@ > - cmd_vdso64ld_and_check = $(VDSOCC) $(c_flags) $(CC64FLAGS) -o $@ > -Wl,-T$(filter %.lds,$^) $(filter %.o,$^) -z noexecstack ; $(cmd_vdso_check) > + cmd_vdso64ld_and_check = $(VDSOCC) $(ldflags-y) $(CC64FLAGS) > $(LD64FLAGS) -o $@ -Wl,-T$(filter %.lds,$^) $(filter %.o,$^) -z noexecstack ; > $(cmd_vdso_check) Let's move `-z noexecstack` up into ldflags-y? (you may add my RB with that modification) > quiet_cmd_vdso64as = VDSO64A $@ >cmd_vdso64as = $(VDSOCC) $(a_flags) $(CC64FLAGS) $(AS64FLAGS) -c -o $@ > $< > > > -- > 2.39.0 -- Thanks, ~Nick Desaulniers
Re: [PATCH 06/14] powerpc/vdso: Remove unused '-s' flag from ASFLAGS
On Wed, Jan 4, 2023 at 11:55 AM Nathan Chancellor wrote: > > When clang's -Qunused-arguments is dropped from KBUILD_CPPFLAGS, it > warns that ASFLAGS contains '-s', which is a linking phase option, so it > is unused. > > clang-16: error: argument unused during compilation: '-s' > [-Werror,-Wunused-command-line-argument] > > Looking at the GAS sources, '-s' is only useful when targeting Solaris > and it is ignored for the powerpc target so just drop the flag > altogether, as it is not needed. Do you have any more info where you found this? I don't see -s documented as an assembler flag. https://sourceware.org/binutils/docs/as/PowerPC_002dOpts.html https://sourceware.org/binutils/docs/as/Invoking.html The patch seems fine to me, but what was this ever supposed to be? FWICT it predates git history (looking at arch/powerpc/kernel/vdso32/Makefile at fc15351d9d63) Reviewed-by: Nick Desaulniers > > Signed-off-by: Nathan Chancellor > --- > Cc: m...@ellerman.id.au > Cc: npig...@gmail.com > Cc: christophe.le...@csgroup.eu > Cc: linuxppc-dev@lists.ozlabs.org > --- > arch/powerpc/kernel/vdso/Makefile | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/arch/powerpc/kernel/vdso/Makefile > b/arch/powerpc/kernel/vdso/Makefile > index 6a977b0d8ffc..45c0cc5d34b6 100644 > --- a/arch/powerpc/kernel/vdso/Makefile > +++ b/arch/powerpc/kernel/vdso/Makefile > @@ -51,10 +51,10 @@ ccflags-y := -shared -fno-common -fno-builtin -nostdlib > -Wl,--hash-style=both > ccflags-$(CONFIG_LD_IS_LLD) += $(call cc-option,--ld-path=$(LD),-fuse-ld=lld) > > CC32FLAGS := -Wl,-soname=linux-vdso32.so.1 -m32 > -AS32FLAGS := -D__VDSO32__ -s > +AS32FLAGS := -D__VDSO32__ > > CC64FLAGS := -Wl,-soname=linux-vdso64.so.1 > -AS64FLAGS := -D__VDSO64__ -s > +AS64FLAGS := -D__VDSO64__ > > targets += vdso32.lds > CPPFLAGS_vdso32.lds += -P -C -Upowerpc > > -- > 2.39.0 -- Thanks, ~Nick Desaulniers
Re: [PATCH 12/15] auxdisplay: ht16k33: Introduce backlight_get_brightness()
Hi Sam, On 2023-01-08 10:29, Sam Ravnborg wrote: Hi Robin. On Sat, Jan 07, 2023 at 10:02:38PM +0100, Miguel Ojeda wrote: On Sat, Jan 7, 2023 at 7:26 PM Sam Ravnborg via B4 Submission Endpoint wrote: > > Introduce backlight_get_brightness() to simplify logic > and avoid direct access to backlight properties. Note: Stephen sent this one too a while ago (with some more details in the commit message, which is always nice); and then he sent yesterday v2 [1] (to mention the functional change with `BL_CORE_SUSPENDED` [2]). Thanks for the pointers. I will try to move forward with Stephen's patches. Anyway, if it goes via drm-misc, feel free to have my: Acked-by: Miguel Ojeda Though it would be nice to have Robin test the change. Robin - can I get your ack to apply Stephen's original v2 patch to drm-misc? done! see: https://lore.kernel.org/lkml/0b16391f997e6ed005a326e4e48f2...@protonic.nl/ - Robin
Re: [PATCH v7 2/2] arm64: support batched/deferred tlb shootdown during page reclamation
On Tue, Jan 10, 2023 at 1:19 AM Catalin Marinas wrote: > > On Sun, Jan 08, 2023 at 06:48:41PM +0800, Barry Song wrote: > > On Fri, Jan 6, 2023 at 2:15 AM Catalin Marinas > > wrote: > > > On Thu, Nov 17, 2022 at 04:26:48PM +0800, Yicong Yang wrote: > > > > It is tested on 4,8,128 CPU platforms and shows to be beneficial on > > > > large systems but may not have improvement on small systems like on > > > > a 4 CPU platform. So make ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH depends > > > > on CONFIG_EXPERT for this stage and make this disabled on systems > > > > with less than 8 CPUs. User can modify this threshold according to > > > > their own platforms by CONFIG_NR_CPUS_FOR_BATCHED_TLB. > > > > > > What's the overhead of such batching on systems with 4 or fewer CPUs? If > > > it isn't noticeable, I'd rather have it always on than some number > > > chosen on whichever SoC you tested. > > > > On the one hand, tlb flush is cheap on a small system. so batching tlb flush > > helps very minorly. > > Yes, it probably won't help on small systems but I don't like config > options choosing the threshold, which may be different from system to > system even if they have the same number of CPUs. A run-time tunable > would be a better option. > > > On the other hand, since we have batched the tlb flush, new PTEs might be > > invisible to others before the final broadcast is done and Ack-ed. > > The new PTEs could indeed be invisible at the TLB level but not at the > memory (page table) level since this is done under the PTL IIUC. > > > thus, there > > is a risk someone else might do mprotect or similar things on those > > deferred > > pages which will ask for read-modify-write on those deferred PTEs. > > And this should be fine, we have things like the PTL in place for the > actual memory access to the page table. > > > in this > > case, mm will do an explicit flush by flush_tlb_batched_pending which is > > not required if tlb flush is not deferred. 
> > I don't fully understand why it's needed, or at least why it would be > needed on arm64. At the end of an mprotect(), we have the final PTEs in > place and we just need to issue a TLBI for that range. > change_pte_range() for example has a tlb_flush_pte_range() if the PTE > was present and that won't be done lazily. If there are other TLBIs > pending for the same range, they'll be done later though likely > unnecessarily but still cheaper than issuing a flush_tlb_mm(). Thanks! I'd like to ask for some comments from Nadav and Mel from the x86 side. Revisiting the code of flush_tlb_batched_pending shows we still have races even under PTL. /* * Reclaim unmaps pages under the PTL but do not flush the TLB prior to * releasing the PTL if TLB flushes are batched. It's possible for a parallel * operation such as mprotect or munmap to race between reclaim unmapping * the page and flushing the page. If this race occurs, it potentially allows * access to data via a stale TLB entry. Tracking all mm's that have TLB * batching in flight would be expensive during reclaim so instead track * whether TLB batching occurred in the past and if so then do a flush here * if required. This will cost one additional flush per reclaim cycle paid * by the first operation at risk such as mprotect and mumap. * * This must be called under the PTL so that an access to tlb_flush_batched * that is potentially a "reclaim vs mprotect/munmap/etc" race will synchronise * via the PTL. */ void flush_tlb_batched_pending(struct mm_struct *mm) { } According to Catalin's comment, it seems over-cautious since we can make sure people see updated TLB after mprotect and munmap are done as they have tlb flush. We can also make sure mprotect see updated "memory" of PTEs from reclamation though pte is not visible in TLB level. Hi Mel, Nadav, would you please help clarify the exact sequence of how this race is going to happen? 
> >
> > void flush_tlb_batched_pending(struct mm_struct *mm)
> > {
> > 	int batch = atomic_read(&mm->tlb_flush_batched);
> > 	int pending = batch & TLB_FLUSH_BATCH_PENDING_MASK;
> > 	int flushed = batch >> TLB_FLUSH_BATCH_FLUSHED_SHIFT;
> >
> > 	if (pending != flushed) {
> > 		flush_tlb_mm(mm);
> > 		/*
> > 		 * If the new TLB flushing is pending during flushing, leave
> > 		 * mm->tlb_flush_batched as is, to avoid losing flushing.
> > 		 */
> > 		atomic_cmpxchg(&mm->tlb_flush_batched, batch,
> > 			       pending | (pending << TLB_FLUSH_BATCH_FLUSHED_SHIFT));
> > 	}
> > }
>
> I guess this works on x86 better as it avoids the IPIs if this flush
> already happened. But on arm64 we already issued the TLBI, we just
> didn't wait for it to complete via a DSB.
>
> > I believe Anshuman has contributed many points on this in those previous
> > discussions.
>
> Yeah, I should re-read the old threads.
>
> --
> Catalin

Thanks
Barry
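To make the pending/flushed bookkeeping in the quoted `flush_tlb_batched_pending()` easier to follow, here is a minimal userspace model of the same state machine. This is a sketch, not kernel code: the mask, shift, and `model_flush_pending` names are invented for illustration, `flushes_issued` stands in for `flush_tlb_mm()`, and the single-threaded model ignores the concurrency the real cmpxchg exists to handle.

```c
#include <stdatomic.h>

/*
 * Hypothetical model: one atomic int packs a "pending" batch counter in
 * the low bits and a "flushed" counter in the high bits. A flush is
 * issued only when the two counters disagree.
 */
#define BATCH_PENDING_MASK  0xffff
#define BATCH_FLUSHED_SHIFT 16

static int flushes_issued; /* stand-in for flush_tlb_mm() */

static void model_flush_pending(atomic_int *tlb_flush_batched)
{
	int batch = atomic_load(tlb_flush_batched);
	int pending = batch & BATCH_PENDING_MASK;
	int flushed = batch >> BATCH_FLUSHED_SHIFT;

	if (pending != flushed) {
		flushes_issued++; /* the actual TLB flush happens here */
		/*
		 * Record that everything pending so far is now flushed.
		 * If a new batch was queued concurrently, batch no longer
		 * matches, the cmpxchg fails, and the pending work is kept.
		 */
		atomic_compare_exchange_strong(tlb_flush_batched, &batch,
			pending | (pending << BATCH_FLUSHED_SHIFT));
	}
}
```

With this encoding, a second call while the counters agree is a no-op, which is the "one additional flush per reclaim cycle" cost described in the comment above the real function.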
Re: [PATCH 05/14] powerpc: Remove linker flag from KBUILD_AFLAGS
On Wed, Jan 4, 2023 at 11:54 AM Nathan Chancellor wrote: > > When clang's -Qunused-arguments is dropped from KBUILD_CPPFLAGS, it > points out that KBUILD_AFLAGS contains a linker flag, which will be > used: > > clang: error: -Wl,-a32: 'linker' input unused > [-Werror,-Wunused-command-line-argument] > > This was likely supposed to be '-Wa,-a$(BITS)'. However, this change is > unnecessary, as all supported versions of clang and gcc will pass '-a64' > or '-a32' to GNU as based on the value of '-m'; the behavior of the > latest stable release of the oldest supported major version of each > compiler is shown below and each compiler's latest release exhibits the > same behavior (GCC 12.2.0 and Clang 15.0.6). > > $ powerpc64-linux-gcc --version | head -1 > powerpc64-linux-gcc (GCC) 5.5.0 > > $ powerpc64-linux-gcc -m64 -### -x assembler-with-cpp -c -o /dev/null > /dev/null &| grep 'as ' > .../as -a64 -mppc64 -many -mbig -o /dev/null /tmp/cctwuBzZ.s > > $ powerpc64-linux-gcc -m32 -### -x assembler-with-cpp -c -o /dev/null > /dev/null &| grep 'as ' > .../as -a32 -mppc -many -mbig -o /dev/null /tmp/ccaZP4mF.sg > > $ clang --version | head -1 > Ubuntu clang version > 11.1.0-++20211011094159+1fdec59bffc1-1~exp1~20211011214622.5 > > $ clang --target=powerpc64-linux-gnu -fno-integrated-as -m64 -### \ > -x assembler-with-cpp -c -o /dev/null /dev/null &| grep gnu-as >"/usr/bin/powerpc64-linux-gnu-as" "-a64" "-mppc64" "-many" "-o" > "/dev/null" "/tmp/null-80267c.s" > > $ clang --target=powerpc64-linux-gnu -fno-integrated-as -m64 -### \ > -x assembler-with-cpp -c -o /dev/null /dev/null &| grep gnu-as >"/usr/bin/powerpc64-linux-gnu-as" "-a32" "-mppc" "-many" "-o" "/dev/null" > "/tmp/null-ab8f8d.s" > > Remove this flag altogether to avoid future issues. > > Fixes: 1421dc6d4829 ("powerpc/kbuild: Use flags variables rather than > overriding LD/CC/AS") > Signed-off-by: Nathan Chancellor Thanks for the patch! 
Reviewed-by: Nick Desaulniers > --- > Cc: m...@ellerman.id.au > Cc: npig...@gmail.com > Cc: christophe.le...@csgroup.eu > Cc: linuxppc-dev@lists.ozlabs.org > --- > arch/powerpc/Makefile | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile > index dc4cbf0a5ca9..4fd630efe39d 100644 > --- a/arch/powerpc/Makefile > +++ b/arch/powerpc/Makefile > @@ -90,7 +90,7 @@ aflags-$(CONFIG_CPU_LITTLE_ENDIAN)+= -mlittle-endian > > ifeq ($(HAS_BIARCH),y) > KBUILD_CFLAGS += -m$(BITS) > -KBUILD_AFLAGS += -m$(BITS) -Wl,-a$(BITS) > +KBUILD_AFLAGS += -m$(BITS) > KBUILD_LDFLAGS += -m elf$(BITS)$(LDEMULATION) > endif > > > -- > 2.39.0 -- Thanks, ~Nick Desaulniers
Re: [PATCH 01/15] video: fbdev: atmel_lcdfb: Rework backlight handling
Hi Sam, On Sun, 8 Jan 2023 21:24:20 +0100, Sam Ravnborg wrote: > > Here are my pending patches from last June on lore: > > > > All patches are handled I think except this: > * https://lore.kernel.org/lkml/20220608205623.2106113-1-st...@sk2.org/ > > Can I ask you to drop the assignment that is not needed, and resend with > the collected acks/r-b. > > With this, then all fbdev patches are handled. Ah yes, it was revised as https://lore.kernel.org/lkml/20220609180440.3138625-1-st...@sk2.org/ which only got one ack (from you, https://lore.kernel.org/lkml/yqjckqmqeuvsb...@ravnborg.org/). Should I resend that or is that usable as-is? Or would it be better if I sent all the fbdev patches again (collecting all the acks and reviews)? Regards, Stephen
[PATCH 41/41] mm: replace rw_semaphore with atomic_t in vma_lock
rw_semaphore is a sizable structure of 40 bytes and consumes considerable space for each vm_area_struct. However vma_lock has two important specifics which can be used to replace rw_semaphore with a simpler structure:

1. Readers never wait. They try to take the vma_lock and fall back to mmap_lock if that fails.
2. Only one writer at a time will ever try to write-lock a vma_lock because writers first take mmap_lock in write mode.

Because of these requirements, full rw_semaphore functionality is not needed and we can replace rw_semaphore with an atomic variable. When a reader takes the read lock, it increments the atomic unless the value is negative. If that fails, read-locking is aborted and mmap_lock is used instead. When a writer takes the write lock, it resets the atomic value to -1 if the current value is 0 (no readers). Since all writers take mmap_lock in write mode first, there can be only one writer at a time. If there are readers, the writer will place itself into a wait queue using the new mm_struct.vma_writer_wait waitqueue head. The last reader to release the vma_lock will signal the writer to wake up.

vm_lock_seq is also moved into vma_lock and along with atomic_t they are nicely packed and consume 8 bytes, bringing the overhead from vma_lock from 44 to 16 bytes:

slabinfo before the changes: ...: ... vm_area_struct...152 532 : ...
slabinfo with vma_lock: ...: ... rw_semaphore ... 8 5121 : ... vm_area_struct...160 512 : ...

Assuming 4 vm_area_structs, memory consumption would be:
baseline: 6040kB
vma_lock (vm_area_structs+vma_lock): 6280kB+316kB=6596kB
Total increase: 556kB

atomic_t might overflow if there are many competing readers, therefore vma_read_trylock() implements an overflow check and if that occurs it restores the previous value and exits with a failure to lock.
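The locking protocol described in the commit message can be sketched as a small userspace C11 model. This is only an illustration of the state machine (readers increment unless negative, the single writer flips 0 to -1); the type and function names are invented, and the kernel's waitqueue, memory barriers, and overflow handling are deliberately omitted.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the count field in the proposed vma_lock. */
typedef struct { atomic_int count; } vma_lock_model;

/* Reader: analogue of atomic_inc_unless_negative(). */
static bool read_trylock(vma_lock_model *l)
{
	int v = atomic_load(&l->count);
	do {
		if (v < 0)
			return false; /* writer holds it: caller falls back to mmap_lock */
	} while (!atomic_compare_exchange_weak(&l->count, &v, v + 1));
	return true;
}

static void read_unlock(vma_lock_model *l)
{
	/* In the kernel, the last reader would wake the waiting writer here. */
	atomic_fetch_sub(&l->count, 1);
}

/* Writer: succeeds only when there are no readers (count == 0). */
static bool write_trylock(vma_lock_model *l)
{
	int expected = 0;
	return atomic_compare_exchange_strong(&l->count, &expected, -1);
}

static void write_unlock(vma_lock_model *l)
{
	atomic_store(&l->count, 0);
}
```

In the real patch the writer does not fail on contention; it sleeps on `mm->vma_writer_wait` and retries the cmpxchg, which is safe precisely because mmap_lock guarantees at most one writer.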
Signed-off-by: Suren Baghdasaryan
---
 include/linux/mm.h       | 37 +
 include/linux/mm_types.h | 10 --
 kernel/fork.c            |  6 +++---
 mm/init-mm.c             |  2 ++
 4 files changed, 38 insertions(+), 17 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index d40bf8a5e19e..294dd44b2198 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -627,12 +627,16 @@ static inline void vma_write_lock(struct vm_area_struct *vma)
 	 * mm->mm_lock_seq can't be concurrently modified.
 	 */
 	mm_lock_seq = READ_ONCE(vma->vm_mm->mm_lock_seq);
-	if (vma->vm_lock_seq == mm_lock_seq)
+	if (vma->vm_lock->lock_seq == mm_lock_seq)
 		return;
 
-	down_write(&vma->vm_lock->lock);
-	vma->vm_lock_seq = mm_lock_seq;
-	up_write(&vma->vm_lock->lock);
+	if (atomic_cmpxchg(&vma->vm_lock->count, 0, -1))
+		wait_event(vma->vm_mm->vma_writer_wait,
+			   atomic_cmpxchg(&vma->vm_lock->count, 0, -1) == 0);
+	vma->vm_lock->lock_seq = mm_lock_seq;
+	/* Write barrier to ensure lock_seq change is visible before count */
+	smp_wmb();
+	atomic_set(&vma->vm_lock->count, 0);
 }
 
 /*
@@ -643,20 +647,28 @@ static inline void vma_write_lock(struct vm_area_struct *vma)
 static inline bool vma_read_trylock(struct vm_area_struct *vma)
 {
 	/* Check before locking. A race might cause false locked result. */
-	if (vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))
+	if (vma->vm_lock->lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))
 		return false;
 
-	if (unlikely(down_read_trylock(&vma->vm_lock->lock) == 0))
+	if (unlikely(!atomic_inc_unless_negative(&vma->vm_lock->count)))
 		return false;
 
+	/* If atomic_t overflows, restore and fail to lock. */
+	if (unlikely(atomic_read(&vma->vm_lock->count) < 0)) {
+		if (atomic_dec_and_test(&vma->vm_lock->count))
+			wake_up(&vma->vm_mm->vma_writer_wait);
+		return false;
+	}
+
 	/*
 	 * Overflow might produce false locked result.
 	 * False unlocked result is impossible because we modify and check
 	 * vma->vm_lock_seq under vma->vm_lock protection and mm->mm_lock_seq
 	 * modification invalidates all existing locks.
 	 */
-	if (unlikely(vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))) {
-		up_read(&vma->vm_lock->lock);
+	if (unlikely(vma->vm_lock->lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))) {
+		if (atomic_dec_and_test(&vma->vm_lock->count))
+			wake_up(&vma->vm_mm->vma_writer_wait);
 		return false;
 	}
 	return true;
@@ -664,7 +676,8 @@ static inline bool vma_read_trylock(struct vm_area_struct *vma)
 
 static inline void vma_read_unlock(struct vm_area_struct *vma)
 {
-	up_read(&vma->vm_lock->lock);
+	if (atomic_dec_and_test(&vma->vm_lock->count))
+		wake_up(&vma->vm_mm->vma_writer_wait);
 }
 
 static inline void
[PATCH 40/41] mm: separate vma->lock from vm_area_struct
vma->lock being part of the vm_area_struct causes performance regression during page faults because during contention its count and owner fields are constantly updated, and having other parts of vm_area_struct used during page fault handling next to them causes constant cache line bouncing. Fix that by moving the lock outside of the vm_area_struct.

All attempts to keep vma->lock inside vm_area_struct in a separate cache line still produce performance regression, especially on NUMA machines. The smallest regression was achieved when the lock was placed in the fourth cache line, but that bloats vm_area_struct to 256 bytes. Considering performance and memory impact, a separate lock looks like the best option. It increases the memory footprint of each VMA, but that will be addressed in the next patch.

Note that after this change vma_init() does not allocate or initialize vma->lock anymore. A number of drivers allocate a pseudo VMA on the stack, but they never use the VMA's lock, therefore it does not need to be allocated. Future drivers which might need the VMA lock should use vm_area_alloc()/vm_area_free() to allocate it.
Signed-off-by: Suren Baghdasaryan
---
 include/linux/mm.h       | 25 ++--
 include/linux/mm_types.h |  6 ++-
 kernel/fork.c            | 82
 3 files changed, 74 insertions(+), 39 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 50c7a6dd9c7a..d40bf8a5e19e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -615,11 +615,6 @@ struct vm_operations_struct {
 };
 
 #ifdef CONFIG_PER_VMA_LOCK
-static inline void vma_init_lock(struct vm_area_struct *vma)
-{
-	init_rwsem(&vma->lock);
-	vma->vm_lock_seq = -1;
-}
 
 static inline void vma_write_lock(struct vm_area_struct *vma)
 {
@@ -635,9 +630,9 @@ static inline void vma_write_lock(struct vm_area_struct *vma)
 	if (vma->vm_lock_seq == mm_lock_seq)
 		return;
 
-	down_write(&vma->lock);
+	down_write(&vma->vm_lock->lock);
 	vma->vm_lock_seq = mm_lock_seq;
-	up_write(&vma->lock);
+	up_write(&vma->vm_lock->lock);
 }
 
 /*
@@ -651,17 +646,17 @@ static inline bool vma_read_trylock(struct vm_area_struct *vma)
 	if (vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))
 		return false;
 
-	if (unlikely(down_read_trylock(&vma->lock) == 0))
+	if (unlikely(down_read_trylock(&vma->vm_lock->lock) == 0))
 		return false;
 
 	/*
 	 * Overflow might produce false locked result.
 	 * False unlocked result is impossible because we modify and check
-	 * vma->vm_lock_seq under vma->lock protection and mm->mm_lock_seq
+	 * vma->vm_lock_seq under vma->vm_lock protection and mm->mm_lock_seq
 	 * modification invalidates all existing locks.
 	 */
 	if (unlikely(vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))) {
-		up_read(&vma->lock);
+		up_read(&vma->vm_lock->lock);
 		return false;
 	}
 	return true;
@@ -669,7 +664,7 @@ static inline bool vma_read_trylock(struct vm_area_struct *vma)
 
 static inline void vma_read_unlock(struct vm_area_struct *vma)
 {
-	up_read(&vma->lock);
+	up_read(&vma->vm_lock->lock);
 }
 
 static inline void vma_assert_write_locked(struct vm_area_struct *vma)
@@ -684,7 +679,7 @@ static inline void vma_assert_write_locked(struct vm_area_struct *vma)
 
 static inline void vma_assert_no_reader(struct vm_area_struct *vma)
 {
-	VM_BUG_ON_VMA(rwsem_is_locked(&vma->lock) &&
+	VM_BUG_ON_VMA(rwsem_is_locked(&vma->vm_lock->lock) &&
 		      vma->vm_lock_seq != READ_ONCE(vma->vm_mm->mm_lock_seq),
 		      vma);
 }
@@ -694,7 +689,6 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
 
 #else /* CONFIG_PER_VMA_LOCK */
 
-static inline void vma_init_lock(struct vm_area_struct *vma) {}
 static inline void vma_write_lock(struct vm_area_struct *vma) {}
 static inline bool vma_read_trylock(struct vm_area_struct *vma)
 		{ return false; }
@@ -704,6 +698,10 @@ static inline void vma_assert_no_reader(struct vm_area_struct *vma) {}
 
 #endif /* CONFIG_PER_VMA_LOCK */
 
+/*
+ * WARNING: vma_init does not initialize vma->vm_lock.
+ * Use vm_area_alloc()/vm_area_free() if vma needs locking.
+ */
 static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
 {
 	static const struct vm_operations_struct dummy_vm_ops = {};
@@ -712,7 +710,6 @@ static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
 	vma->vm_mm = mm;
 	vma->vm_ops = &dummy_vm_ops;
 	INIT_LIST_HEAD(&vma->anon_vma_chain);
-	vma_init_lock(vma);
 }
 
 /* Use when VMA is not part of the VMA tree and needs no locking */
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index c0e6c8e4700b..faa61b400f9b 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -526,6 +526,10 @@ struct anon_vma_name {
 	char name[];
 };
 
+struct
[PATCH 39/41] kernel/fork: throttle call_rcu() calls in vm_area_free
call_rcu() can take a long time when callback offloading is enabled. Its use in vm_area_free() can cause regressions in the exit path when multiple VMAs are being freed. To minimize that impact, place VMAs into a list and free them in groups using one call_rcu() call per group.

Signed-off-by: Suren Baghdasaryan
---
 include/linux/mm.h       |  1 +
 include/linux/mm_types.h | 19 +--
 kernel/fork.c            | 68 +++-
 mm/init-mm.c             |  3 ++
 mm/mmap.c                |  1 +
 5 files changed, 82 insertions(+), 10 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 3158f33e268c..50c7a6dd9c7a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -250,6 +250,7 @@ void setup_initial_init_mm(void *start_code, void *end_code,
 struct vm_area_struct *vm_area_alloc(struct mm_struct *);
 struct vm_area_struct *vm_area_dup(struct vm_area_struct *);
 void vm_area_free(struct vm_area_struct *);
+void drain_free_vmas(struct mm_struct *mm);
 
 #ifndef CONFIG_MMU
 extern struct rb_root nommu_region_tree;
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index fce9113d979c..c0e6c8e4700b 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -592,8 +592,18 @@ struct vm_area_struct {
 	/* Information about our backing store: */
 	unsigned long vm_pgoff;	/* Offset (within vm_file) in PAGE_SIZE units */
-	struct file * vm_file;		/* File we map to (can be NULL). */
-	void * vm_private_data;		/* was vm_pte (shared mem) */
+	union {
+		struct {
+			/* File we map to (can be NULL). */
+			struct file *vm_file;
+
+			/* was vm_pte (shared mem) */
+			void *vm_private_data;
+		};
+#ifdef CONFIG_PER_VMA_LOCK
+		struct list_head vm_free_list;
+#endif
+	};
 
 #ifdef CONFIG_ANON_VMA_NAME
 	/*
@@ -693,6 +703,11 @@ struct mm_struct {
 		 */
 #ifdef CONFIG_PER_VMA_LOCK
 		int mm_lock_seq;
+		struct {
+			struct list_head head;
+			spinlock_t lock;
+			int size;
+		} vma_free_list;
 #endif
diff --git a/kernel/fork.c b/kernel/fork.c
index 6d9f14e55ecf..97f2b751f88d 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -481,26 +481,75 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
 }
 
 #ifdef CONFIG_PER_VMA_LOCK
-static void __vm_area_free(struct rcu_head *head)
+static inline void __vm_area_free(struct vm_area_struct *vma)
 {
-	struct vm_area_struct *vma = container_of(head, struct vm_area_struct,
-						  vm_rcu);
 	/* The vma should either have no lock holders or be write-locked. */
 	vma_assert_no_reader(vma);
 	kmem_cache_free(vm_area_cachep, vma);
 }
-#endif
+
+static void vma_free_rcu_callback(struct rcu_head *head)
+{
+	struct vm_area_struct *first_vma;
+	struct vm_area_struct *vma, *vma2;
+
+	first_vma = container_of(head, struct vm_area_struct, vm_rcu);
+	list_for_each_entry_safe(vma, vma2, &first_vma->vm_free_list, vm_free_list)
+		__vm_area_free(vma);
+	__vm_area_free(first_vma);
+}
+
+void drain_free_vmas(struct mm_struct *mm)
+{
+	struct vm_area_struct *first_vma;
+	LIST_HEAD(to_destroy);
+
+	spin_lock(&mm->vma_free_list.lock);
+	list_splice_init(&mm->vma_free_list.head, &to_destroy);
+	mm->vma_free_list.size = 0;
+	spin_unlock(&mm->vma_free_list.lock);
+
+	if (list_empty(&to_destroy))
+		return;
+
+	first_vma = list_first_entry(&to_destroy, struct vm_area_struct, vm_free_list);
+	/* Remove the head which is allocated on the stack */
+	list_del(&to_destroy);
+
+	call_rcu(&first_vma->vm_rcu, vma_free_rcu_callback);
+}
+
+#define VM_AREA_FREE_LIST_MAX	32
+
+void vm_area_free(struct vm_area_struct *vma)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	bool drain;
+
+	free_anon_vma_name(vma);
+
+	spin_lock(&mm->vma_free_list.lock);
+	list_add(&vma->vm_free_list, &mm->vma_free_list.head);
+	mm->vma_free_list.size++;
+	drain = mm->vma_free_list.size > VM_AREA_FREE_LIST_MAX;
+	spin_unlock(&mm->vma_free_list.lock);
+
+	if (drain)
+		drain_free_vmas(mm);
+}
+
+#else /* CONFIG_PER_VMA_LOCK */
+
+void drain_free_vmas(struct mm_struct *mm) {}
 
 void vm_area_free(struct vm_area_struct *vma)
 {
 	free_anon_vma_name(vma);
-#ifdef CONFIG_PER_VMA_LOCK
-	call_rcu(&vma->vm_rcu, __vm_area_free);
-#else
 	kmem_cache_free(vm_area_cachep, vma);
-#endif
 }
 
+#endif /* CONFIG_PER_VMA_LOCK */
+
 static void account_kernel_stack(struct task_struct *tsk, int account)
 {
 	if (IS_ENABLED(CONFIG_VMAP_STACK)) {
@@ -1150,6 +1199,9 @@ static struct
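The batching idea in the patch above — defer individual frees onto a per-mm list and reclaim the whole group at once when it grows past a threshold — can be modeled in plain userspace C. This is a hypothetical sketch (names invented); `batch_drain()` stands in for the single `call_rcu()` that releases the whole group, and there is no locking since the kernel's spinlock protects concurrent adders:

```c
#include <stdlib.h>
#include <stddef.h>
#include <assert.h>

#define FREE_LIST_MAX 32	/* mirrors VM_AREA_FREE_LIST_MAX */

/* Illustrative deferred-free object and batcher. */
struct deferred_obj {
	struct deferred_obj *next;
};

struct free_batcher {
	struct deferred_obj *head;
	int size;
};

/* One "expensive" reclaim pass frees the whole accumulated group,
 * the way one call_rcu() callback frees a whole list of VMAs. */
void batch_drain(struct free_batcher *b)
{
	struct deferred_obj *obj = b->head;

	while (obj) {
		struct deferred_obj *next = obj->next;

		free(obj);
		obj = next;
	}
	b->head = NULL;
	b->size = 0;
}

/* Cheap per-object path: just link the object in; drain only when
 * the batch exceeds the threshold. */
void batch_free(struct free_batcher *b, struct deferred_obj *obj)
{
	obj->next = b->head;
	b->head = obj;
	if (++b->size > FREE_LIST_MAX)
		batch_drain(b);
}
```

The design trade-off is the same as in the patch: per-object cost drops to a list insertion, at the price of objects lingering until the batch fills or an explicit drain runs (the kernel drains on `exit_mmap`).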
[PATCH 38/41] mm: avoid assertion in untrack_pfn
untrack_pfn() can be called after the VMA was isolated and mmap_lock downgraded. An attempt to lock the affected VMA would cause an assertion; therefore use mod_vm_flags_nolock in such situations.

Signed-off-by: Suren Baghdasaryan
---
 arch/x86/mm/pat/memtype.c | 10 +++---
 include/linux/mm.h        |  2 +-
 include/linux/pgtable.h   |  5 +++--
 mm/memory.c               | 15 ---
 mm/memremap.c             |  4 ++--
 mm/mmap.c                 |  4 ++--
 6 files changed, 23 insertions(+), 17 deletions(-)

diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c
index 9e490a372896..f71c8381430b 100644
--- a/arch/x86/mm/pat/memtype.c
+++ b/arch/x86/mm/pat/memtype.c
@@ -1045,7 +1045,7 @@ void track_pfn_insert(struct vm_area_struct *vma, pgprot_t *prot, pfn_t pfn)
  * can be for the entire vma (in which case pfn, size are zero).
  */
 void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
-		 unsigned long size)
+		 unsigned long size, bool lock_vma)
 {
 	resource_size_t paddr;
 	unsigned long prot;
@@ -1064,8 +1064,12 @@ void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
 		size = vma->vm_end - vma->vm_start;
 	}
 	free_pfn_range(paddr, size);
-	if (vma)
-		clear_vm_flags(vma, VM_PAT);
+	if (vma) {
+		if (lock_vma)
+			clear_vm_flags(vma, VM_PAT);
+		else
+			mod_vm_flags_nolock(vma, 0, VM_PAT);
+	}
 }
 
 /*
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 7d436a5027cc..3158f33e268c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2135,7 +2135,7 @@ void zap_page_range_single(struct vm_area_struct *vma, unsigned long address,
 			   unsigned long size, struct zap_details *details);
 void unmap_vmas(struct mmu_gather *tlb, struct maple_tree *mt,
 		struct vm_area_struct *start_vma, unsigned long start,
-		unsigned long end);
+		unsigned long end, bool lock_vma);
 
 struct mmu_notifier_range;
 
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 1159b25b0542..eaa831bd675d 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1214,7 +1214,8 @@ static inline int track_pfn_copy(struct vm_area_struct *vma)
  * can be for the entire vma (in which case pfn, size are zero).
  */
 static inline void untrack_pfn(struct vm_area_struct *vma,
-			       unsigned long pfn, unsigned long size)
+			       unsigned long pfn, unsigned long size,
+			       bool lock_vma)
 {
 }
 
@@ -1232,7 +1233,7 @@ extern void track_pfn_insert(struct vm_area_struct *vma, pgprot_t *prot,
 			     pfn_t pfn);
 extern int track_pfn_copy(struct vm_area_struct *vma);
 extern void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
-			unsigned long size);
+			unsigned long size, bool lock_vma);
 extern void untrack_pfn_moved(struct vm_area_struct *vma);
 #endif
 
diff --git a/mm/memory.c b/mm/memory.c
index 12508f4d845a..5c7d5eaa60d8 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1610,7 +1610,7 @@ void unmap_page_range(struct mmu_gather *tlb,
 static void unmap_single_vma(struct mmu_gather *tlb,
 		struct vm_area_struct *vma, unsigned long start_addr,
 		unsigned long end_addr,
-		struct zap_details *details)
+		struct zap_details *details, bool lock_vma)
 {
 	unsigned long start = max(vma->vm_start, start_addr);
 	unsigned long end;
@@ -1625,7 +1625,7 @@ static void unmap_single_vma(struct mmu_gather *tlb,
 		uprobe_munmap(vma, start, end);
 
 	if (unlikely(vma->vm_flags & VM_PFNMAP))
-		untrack_pfn(vma, 0, 0);
+		untrack_pfn(vma, 0, 0, lock_vma);
 
 	if (start != end) {
 		if (unlikely(is_vm_hugetlb_page(vma))) {
@@ -1672,7 +1672,7 @@ static void unmap_single_vma(struct mmu_gather *tlb,
  */
 void unmap_vmas(struct mmu_gather *tlb, struct maple_tree *mt,
 		struct vm_area_struct *vma, unsigned long start_addr,
-		unsigned long end_addr)
+		unsigned long end_addr, bool lock_vma)
 {
 	struct mmu_notifier_range range;
 	struct zap_details details = {
@@ -1686,7 +1686,8 @@ void unmap_vmas(struct mmu_gather *tlb, struct maple_tree *mt,
 				start_addr, end_addr);
 	mmu_notifier_invalidate_range_start(&range);
 	do {
-		unmap_single_vma(tlb, vma, start_addr, end_addr, &details);
+		unmap_single_vma(tlb, vma, start_addr, end_addr, &details,
+				 lock_vma);
 	} while ((vma = mas_find(&mas, end_addr - 1)) != NULL);
 	mmu_notifier_invalidate_range_end(&range);
 }
@@ -1715,7 +1716,7 @@ void zap_page_range(struct vm_area_struct *vma, unsigned long start,
[PATCH 37/41] mm: introduce mod_vm_flags_nolock
In cases when VMA flags are modified after the VMA was isolated and mmap_lock was downgraded, flags modifications do not require per-VMA locking, and an attempt to lock the VMA would result in an assertion because the mmap write lock is not held. Introduce mod_vm_flags_nolock to be used in such situations.

Signed-off-by: Suren Baghdasaryan
---
 include/linux/mm.h | 8
 1 file changed, 8 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2e3be1d45371..7d436a5027cc 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -743,6 +743,14 @@ void clear_vm_flags(struct vm_area_struct *vma, unsigned long flags)
 	vma->vm_flags &= ~flags;
 }
 
+static inline
+void mod_vm_flags_nolock(struct vm_area_struct *vma,
+			 unsigned long set, unsigned long clear)
+{
+	vma->vm_flags |= set;
+	vma->vm_flags &= ~clear;
+}
+
 static inline
 void mod_vm_flags(struct vm_area_struct *vma,
 		  unsigned long set, unsigned long clear)
-- 
2.39.0
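The set-then-clear semantics of the helper above are easy to pin down with a pure function. This sketch is illustrative (the name is invented, and it operates on a plain value instead of a VMA, since the kernel helper's only job beyond this is skipping the locking assertions):

```c
#include <assert.h>

/* Hypothetical value-level model of mod_vm_flags_nolock():
 * OR in the bits to set, then AND out the bits to clear.
 * Note the order: a bit named in both 'set' and 'clear' ends up cleared. */
unsigned long mod_flags_sketch(unsigned long flags,
			       unsigned long set, unsigned long clear)
{
	flags |= set;
	flags &= ~clear;
	return flags;
}
```

Keeping the operation a plain read-modify-write is safe here only because the caller guarantees no concurrent modifiers once the VMA is isolated; that guarantee is the whole reason the `_nolock` variant can exist.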
[PATCH 36/41] powerpc/mm: try VMA lock-based page fault handling first
From: Laurent Dufour

Attempt VMA lock-based page fault handling first, and fall back to the existing mmap_lock-based handling if that fails.

Copied from "x86/mm: try VMA lock-based page fault handling first"

Signed-off-by: Laurent Dufour
Signed-off-by: Suren Baghdasaryan
---
 arch/powerpc/mm/fault.c                | 41 ++
 arch/powerpc/platforms/powernv/Kconfig |  1 +
 arch/powerpc/platforms/pseries/Kconfig |  1 +
 3 files changed, 43 insertions(+)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 2bef19cc1b98..f92f8956d5f2 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -469,6 +469,44 @@ static int ___do_page_fault(struct pt_regs *regs, unsigned long address,
 	if (is_exec)
 		flags |= FAULT_FLAG_INSTRUCTION;
 
+#ifdef CONFIG_PER_VMA_LOCK
+	if (!(flags & FAULT_FLAG_USER) || atomic_read(&mm->mm_users) == 1)
+		goto lock_mmap;
+
+	vma = lock_vma_under_rcu(mm, address);
+	if (!vma)
+		goto lock_mmap;
+
+	if (unlikely(access_pkey_error(is_write, is_exec,
+				       (error_code & DSISR_KEYFAULT), vma))) {
+		int rc = bad_access_pkey(regs, address, vma);
+
+		vma_read_unlock(vma);
+		return rc;
+	}
+
+	if (unlikely(access_error(is_write, is_exec, vma))) {
+		int rc = bad_access(regs, address);
+
+		vma_read_unlock(vma);
+		return rc;
+	}
+
+	fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, regs);
+	vma_read_unlock(vma);
+
+	if (!(fault & VM_FAULT_RETRY)) {
+		count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
+		goto done;
+	}
+	count_vm_vma_lock_event(VMA_LOCK_RETRY);
+
+	if (fault_signal_pending(fault, regs))
+		return user_mode(regs) ? 0 : SIGBUS;
+
+lock_mmap:
+#endif /* CONFIG_PER_VMA_LOCK */
+
 	/* When running in the kernel we expect faults to occur only to
 	 * addresses in user space. All other faults represent errors in the
 	 * kernel and should generate an OOPS. Unfortunately, in the case of an
@@ -545,6 +583,9 @@ static int ___do_page_fault(struct pt_regs *regs, unsigned long address,
 
 	mmap_read_unlock(current->mm);
 
+#ifdef CONFIG_PER_VMA_LOCK
+done:
+#endif
 	if (unlikely(fault & VM_FAULT_ERROR))
 		return mm_fault_error(regs, address, fault);
 
diff --git a/arch/powerpc/platforms/powernv/Kconfig b/arch/powerpc/platforms/powernv/Kconfig
index ae248a161b43..70a46acc70d6 100644
--- a/arch/powerpc/platforms/powernv/Kconfig
+++ b/arch/powerpc/platforms/powernv/Kconfig
@@ -16,6 +16,7 @@ config PPC_POWERNV
 	select PPC_DOORBELL
 	select MMU_NOTIFIER
 	select FORCE_SMP
+	select ARCH_SUPPORTS_PER_VMA_LOCK
 	default y
 
 config OPAL_PRD
diff --git a/arch/powerpc/platforms/pseries/Kconfig b/arch/powerpc/platforms/pseries/Kconfig
index a3b4d99567cb..e036a04ff1ca 100644
--- a/arch/powerpc/platforms/pseries/Kconfig
+++ b/arch/powerpc/platforms/pseries/Kconfig
@@ -21,6 +21,7 @@ config PPC_PSERIES
 	select HOTPLUG_CPU
 	select FORCE_SMP
 	select SWIOTLB
+	select ARCH_SUPPORTS_PER_VMA_LOCK
 	default y
 
 config PARAVIRT
-- 
2.39.0
[PATCH 35/41] arm64/mm: try VMA lock-based page fault handling first
Attempt VMA lock-based page fault handling first, and fall back to the existing mmap_lock-based handling if that fails.

Signed-off-by: Suren Baghdasaryan
---
 arch/arm64/Kconfig    |  1 +
 arch/arm64/mm/fault.c | 36
 2 files changed, 37 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 03934808b2ed..829fa6d14a36 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -95,6 +95,7 @@ config ARM64
 	select ARCH_SUPPORTS_INT128 if CC_HAS_INT128
 	select ARCH_SUPPORTS_NUMA_BALANCING
 	select ARCH_SUPPORTS_PAGE_TABLE_CHECK
+	select ARCH_SUPPORTS_PER_VMA_LOCK
 	select ARCH_WANT_COMPAT_IPC_PARSE_VERSION if COMPAT
 	select ARCH_WANT_DEFAULT_BPF_JIT
 	select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 596f46dabe4e..833fa8bab291 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -535,6 +535,9 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
 	unsigned long vm_flags;
 	unsigned int mm_flags = FAULT_FLAG_DEFAULT;
 	unsigned long addr = untagged_addr(far);
+#ifdef CONFIG_PER_VMA_LOCK
+	struct vm_area_struct *vma;
+#endif
 
 	if (kprobe_page_fault(regs, esr))
 		return 0;
@@ -585,6 +588,36 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
 
 	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, addr);
 
+#ifdef CONFIG_PER_VMA_LOCK
+	if (!(mm_flags & FAULT_FLAG_USER) || atomic_read(&mm->mm_users) == 1)
+		goto lock_mmap;
+
+	vma = lock_vma_under_rcu(mm, addr);
+	if (!vma)
+		goto lock_mmap;
+
+	if (!(vma->vm_flags & vm_flags)) {
+		vma_read_unlock(vma);
+		goto lock_mmap;
+	}
+	fault = handle_mm_fault(vma, addr & PAGE_MASK,
+				mm_flags | FAULT_FLAG_VMA_LOCK, regs);
+	vma_read_unlock(vma);
+
+	if (!(fault & VM_FAULT_RETRY)) {
+		count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
+		goto done;
+	}
+	count_vm_vma_lock_event(VMA_LOCK_RETRY);
+
+	/* Quick path to respond to signals */
+	if (fault_signal_pending(fault, regs)) {
+		if (!user_mode(regs))
+			goto no_context;
+		return 0;
+	}
+lock_mmap:
+#endif /* CONFIG_PER_VMA_LOCK */
+
 	/*
 	 * As per x86, we may deadlock here. However, since the kernel only
 	 * validly references user space from well defined areas of the code,
@@ -628,6 +661,9 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
 	}
 	mmap_read_unlock(mm);
 
+#ifdef CONFIG_PER_VMA_LOCK
+done:
+#endif
 	/*
 	 * Handle the "normal" (no error) case first.
 	 */
-- 
2.39.0
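The per-arch patches above all follow the same control flow: try the per-VMA read lock, complete the fault if possible, and otherwise fall through to the heavyweight mmap_lock path. That flow can be sketched in isolation; everything here is a hypothetical stand-in for the kernel helpers (`try_lock_vma` models `lock_vma_under_rcu()`, `handle_fault` models `handle_mm_fault()`):

```c
#include <stdbool.h>
#include <stddef.h>
#include <assert.h>

enum fault_result { FAULT_DONE, FAULT_RETRY };

/* Toy VMA: whether the per-VMA lock can be taken, and whether the
 * fault handler would need to retry (e.g. must drop to mmap_lock). */
struct fake_vma {
	bool lockable;
	bool needs_retry;
};

/* Stand-in for lock_vma_under_rcu(): NULL means "not found/locked". */
struct fake_vma *try_lock_vma(struct fake_vma *v)
{
	return (v && v->lockable) ? v : NULL;
}

/* Stand-in for handle_mm_fault() with FAULT_FLAG_VMA_LOCK set. */
enum fault_result handle_fault(struct fake_vma *v)
{
	return v->needs_retry ? FAULT_RETRY : FAULT_DONE;
}

/* Returns true when the fast (per-VMA) path completed the fault;
 * false means "fall back to the mmap_lock path" (the lock_mmap:
 * label in the real patches). */
bool do_fault_sketch(struct fake_vma *v)
{
	struct fake_vma *locked = try_lock_vma(v);

	if (locked && handle_fault(locked) != FAULT_RETRY)
		return true;	/* VMA_LOCK_SUCCESS */
	return false;		/* retry under mmap_lock */
}
```

The key property the real code preserves is that every failure mode of the fast path degrades to the pre-existing slow path, so correctness never depends on the per-VMA lock succeeding.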
[PATCH 34/41] x86/mm: try VMA lock-based page fault handling first
Attempt VMA lock-based page fault handling first, and fall back to the existing mmap_lock-based handling if that fails.

Signed-off-by: Suren Baghdasaryan
---
 arch/x86/Kconfig    |  1 +
 arch/x86/mm/fault.c | 36
 2 files changed, 37 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 3604074a878b..3647f7bdb110 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -27,6 +27,7 @@ config X86_64
 	# Options that are inherently 64-bit kernel only:
 	select ARCH_HAS_GIGANTIC_PAGE
 	select ARCH_SUPPORTS_INT128 if CC_HAS_INT128
+	select ARCH_SUPPORTS_PER_VMA_LOCK
 	select ARCH_USE_CMPXCHG_LOCKREF
 	select HAVE_ARCH_SOFT_DIRTY
 	select MODULES_USE_ELF_RELA
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 7b0d4ab894c8..983266e7c49b 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -19,6 +19,7 @@
 #include <linux/uaccess.h>		/* faulthandler_disabled()	*/
 #include <linux/efi.h>			/* efi_crash_gracefully_on_page_fault()*/
 #include <linux/mm_types.h>
+#include <linux/mm.h>			/* find_and_lock_vma() */
 
 #include <asm/cpufeature.h>		/* boot_cpu_has, ...		*/
 #include <asm/traps.h>			/* dotraplinkage, ...		*/
@@ -1354,6 +1355,38 @@ void do_user_addr_fault(struct pt_regs *regs,
 	}
 #endif
 
+#ifdef CONFIG_PER_VMA_LOCK
+	if (!(flags & FAULT_FLAG_USER) || atomic_read(&mm->mm_users) == 1)
+		goto lock_mmap;
+
+	vma = lock_vma_under_rcu(mm, address);
+	if (!vma)
+		goto lock_mmap;
+
+	if (unlikely(access_error(error_code, vma))) {
+		vma_read_unlock(vma);
+		goto lock_mmap;
+	}
+	fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, regs);
+	vma_read_unlock(vma);
+
+	if (!(fault & VM_FAULT_RETRY)) {
+		count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
+		goto done;
+	}
+	count_vm_vma_lock_event(VMA_LOCK_RETRY);
+
+	/* Quick path to respond to signals */
+	if (fault_signal_pending(fault, regs)) {
+		if (!user_mode(regs))
+			kernelmode_fixup_or_oops(regs, error_code, address,
+						 SIGBUS, BUS_ADRERR,
+						 ARCH_DEFAULT_PKEY);
+		return;
+	}
+lock_mmap:
+#endif /* CONFIG_PER_VMA_LOCK */
+
 	/*
 	 * Kernel-mode access to the user address space should only occur
 	 * on well-defined single instructions listed in the exception
@@ -1454,6 +1487,9 @@ void do_user_addr_fault(struct pt_regs *regs,
 	}
 
 	mmap_read_unlock(mm);
+#ifdef CONFIG_PER_VMA_LOCK
+done:
+#endif
 	if (likely(!(fault & VM_FAULT_ERROR)))
 		return;
-- 
2.39.0
[PATCH 33/41] mm: introduce per-VMA lock statistics
Add a new CONFIG_PER_VMA_LOCK_STATS config option to dump extra statistics about handling page faults under VMA lock.

Signed-off-by: Suren Baghdasaryan
---
 include/linux/vm_event_item.h | 6 ++
 include/linux/vmstat.h        | 6 ++
 mm/Kconfig.debug              | 8
 mm/vmstat.c                   | 6 ++
 4 files changed, 26 insertions(+)

diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 7f5d1caf5890..8abfa1240040 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -149,6 +149,12 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
 #ifdef CONFIG_X86
 		DIRECT_MAP_LEVEL2_SPLIT,
 		DIRECT_MAP_LEVEL3_SPLIT,
+#endif
+#ifdef CONFIG_PER_VMA_LOCK_STATS
+		VMA_LOCK_SUCCESS,
+		VMA_LOCK_ABORT,
+		VMA_LOCK_RETRY,
+		VMA_LOCK_MISS,
 #endif
 		NR_VM_EVENT_ITEMS
 };
diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index 19cf5b6892ce..fed855bae6d8 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -125,6 +125,12 @@ static inline void vm_events_fold_cpu(int cpu)
 #define count_vm_tlb_events(x, y) do { (void)(y); } while (0)
 #endif
 
+#ifdef CONFIG_PER_VMA_LOCK_STATS
+#define count_vm_vma_lock_event(x) count_vm_event(x)
+#else
+#define count_vm_vma_lock_event(x) do {} while (0)
+#endif
+
 #define __count_zid_vm_events(item, zid, delta) \
 	__count_vm_events(item##_NORMAL - ZONE_NORMAL + zid, delta)
diff --git a/mm/Kconfig.debug b/mm/Kconfig.debug
index fca699ad1fb0..32a93b064590 100644
--- a/mm/Kconfig.debug
+++ b/mm/Kconfig.debug
@@ -207,3 +207,11 @@ config PTDUMP_DEBUGFS
 	  kernel.
 
 	  If in doubt, say N.
+
+
+config PER_VMA_LOCK_STATS
+	bool "Statistics for per-vma locks"
+	depends on PER_VMA_LOCK
+	default y
+	help
+	  Statistics for per-vma locks.
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 1ea6a5ce1c41..4f1089a1860e 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1399,6 +1399,12 @@ const char * const vmstat_text[] = {
 	"direct_map_level2_splits",
 	"direct_map_level3_splits",
 #endif
+#ifdef CONFIG_PER_VMA_LOCK_STATS
+	"vma_lock_success",
+	"vma_lock_abort",
+	"vma_lock_retry",
+	"vma_lock_miss",
+#endif
 #endif /* CONFIG_VM_EVENT_COUNTERS || CONFIG_MEMCG */
 };
 #endif /* CONFIG_PROC_FS || CONFIG_SYSFS || CONFIG_NUMA || CONFIG_MEMCG */
-- 
2.39.0
[PATCH 32/41] mm: prevent userfaults to be handled under per-vma lock
Due to the possibility of handle_userfault dropping mmap_lock, avoid fault handling under VMA lock and retry holding mmap_lock. This can be handled more gracefully in the future.

Signed-off-by: Suren Baghdasaryan
Suggested-by: Peter Xu
---
 mm/memory.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/mm/memory.c b/mm/memory.c
index 20806bc8b4eb..12508f4d845a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5273,6 +5273,13 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
 	if (!vma->anon_vma)
 		goto inval;
 
+	/*
+	 * Due to the possibility of userfault handler dropping mmap_lock, avoid
+	 * it for now and fall back to page fault handling under mmap_lock.
+	 */
+	if (userfaultfd_armed(vma))
+		goto inval;
+
 	if (!vma_read_trylock(vma))
 		goto inval;
 
-- 
2.39.0
[PATCH 31/41] mm: prevent do_swap_page from handling page faults under VMA lock
Due to the possibility of do_swap_page dropping mmap_lock, abort fault handling under VMA lock and retry holding mmap_lock. This can be handled more gracefully in the future.

Signed-off-by: Suren Baghdasaryan
Reviewed-by: Laurent Dufour
---
 mm/memory.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/mm/memory.c b/mm/memory.c
index 2560524ad7f4..20806bc8b4eb 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3707,6 +3707,11 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	if (!pte_unmap_same(vmf))
 		goto out;
 
+	if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
+		ret = VM_FAULT_RETRY;
+		goto out;
+	}
+
 	entry = pte_to_swp_entry(vmf->orig_pte);
 	if (unlikely(non_swap_entry(entry))) {
 		if (is_migration_entry(entry)) {
-- 
2.39.0
[PATCH 30/41] mm: add FAULT_FLAG_VMA_LOCK flag
Add a new flag to distinguish page faults handled under protection of per-vma lock.

Signed-off-by: Suren Baghdasaryan
Reviewed-by: Laurent Dufour
---
 include/linux/mm.h       | 3 ++-
 include/linux/mm_types.h | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index d0fddf6a1de9..2e3be1d45371 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -467,7 +467,8 @@ static inline bool fault_flag_allow_retry_first(enum fault_flag flags)
 	{ FAULT_FLAG_USER,		"USER" }, \
 	{ FAULT_FLAG_REMOTE,		"REMOTE" }, \
 	{ FAULT_FLAG_INSTRUCTION,	"INSTRUCTION" }, \
-	{ FAULT_FLAG_INTERRUPTIBLE,	"INTERRUPTIBLE" }
+	{ FAULT_FLAG_INTERRUPTIBLE,	"INTERRUPTIBLE" }, \
+	{ FAULT_FLAG_VMA_LOCK,		"VMA_LOCK" }
 
 /*
  * vm_fault is filled by the pagefault handler and passed to the vma's
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 0d27edd3e63a..fce9113d979c 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1103,6 +1103,7 @@ enum fault_flag {
 	FAULT_FLAG_INTERRUPTIBLE =	1 << 9,
 	FAULT_FLAG_UNSHARE =		1 << 10,
 	FAULT_FLAG_ORIG_PTE_VALID =	1 << 11,
+	FAULT_FLAG_VMA_LOCK =		1 << 12,
 };
 
 typedef unsigned int __bitwise zap_flags_t;
-- 
2.39.0
[PATCH 29/41] mm: fall back to mmap_lock if vma->anon_vma is not yet set
When vma->anon_vma is not set, the page fault handler will set it by either reusing the anon_vma of an adjacent VMA if the VMAs are compatible or by allocating a new one. find_mergeable_anon_vma() walks the VMA tree to find a compatible adjacent VMA, and that requires not only the faulting VMA to be stable but also the tree structure and other VMAs inside that tree. Therefore locking just the faulting VMA is not enough for this search. Fall back to taking mmap_lock when vma->anon_vma is not set. This situation happens only on the first page fault and should not affect overall performance.

Signed-off-by: Suren Baghdasaryan
---
 mm/memory.c | 4
 1 file changed, 4 insertions(+)

diff --git a/mm/memory.c b/mm/memory.c
index a658e26d965d..2560524ad7f4 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5264,6 +5264,10 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
 	if (!vma_is_anonymous(vma))
 		goto inval;
 
+	/* find_mergeable_anon_vma uses adjacent vmas which are not locked */
+	if (!vma->anon_vma)
+		goto inval;
+
 	if (!vma_read_trylock(vma))
 		goto inval;
 
-- 
2.39.0
[PATCH 28/41] mm: introduce lock_vma_under_rcu to be used from arch-specific code
Introduce the lock_vma_under_rcu function to look up and lock a VMA during page fault handling. When the VMA is not found, can't be locked, or changes after being locked, the function returns NULL. The lookup is performed under RCU protection to prevent the found VMA from being destroyed before the VMA lock is acquired. VMA lock statistics are updated according to the results. For now only anonymous VMAs can be searched this way; in other cases the function returns NULL.

Signed-off-by: Suren Baghdasaryan
---
 include/linux/mm.h |  3 +++
 mm/memory.c        | 51 ++
 2 files changed, 54 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index c464fc8a514c..d0fddf6a1de9 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -687,6 +687,9 @@ static inline void vma_assert_no_reader(struct vm_area_struct *vma)
 		      vma);
 }
 
+struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
+					  unsigned long address);
+
 #else /* CONFIG_PER_VMA_LOCK */
 
 static inline void vma_init_lock(struct vm_area_struct *vma) {}
diff --git a/mm/memory.c b/mm/memory.c
index 9ece18548db1..a658e26d965d 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5242,6 +5242,57 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
 }
 EXPORT_SYMBOL_GPL(handle_mm_fault);
 
+#ifdef CONFIG_PER_VMA_LOCK
+/*
+ * Lookup and lock a VMA under RCU protection. Returned VMA is guaranteed to be
+ * stable and not isolated. If the VMA is not found or is being modified the
+ * function returns NULL.
+ */
+struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
+					  unsigned long address)
+{
+	MA_STATE(mas, &mm->mm_mt, address, address);
+	struct vm_area_struct *vma, *validate;
+
+	rcu_read_lock();
+	vma = mas_walk(&mas);
+retry:
+	if (!vma)
+		goto inval;
+
+	/* Only anonymous vmas are supported for now */
+	if (!vma_is_anonymous(vma))
+		goto inval;
+
+	if (!vma_read_trylock(vma))
+		goto inval;
+
+	/* Check since vm_start/vm_end might change before we lock the VMA */
+	if (unlikely(address < vma->vm_start || address >= vma->vm_end)) {
+		vma_read_unlock(vma);
+		goto inval;
+	}
+
+	/* Check if the VMA got isolated after we found it */
+	mas.index = address;
+	validate = mas_walk(&mas);
+	if (validate != vma) {
+		vma_read_unlock(vma);
+		count_vm_vma_lock_event(VMA_LOCK_MISS);
+		/* The area was replaced with another one. */
+		vma = validate;
+		goto retry;
+	}
+
+	rcu_read_unlock();
+	return vma;
+inval:
+	rcu_read_unlock();
+	count_vm_vma_lock_event(VMA_LOCK_ABORT);
+	return NULL;
+}
+#endif /* CONFIG_PER_VMA_LOCK */
+
 #ifndef __PAGETABLE_P4D_FOLDED
 /*
  * Allocate p4d page table.
-- 
2.39.0
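The lookup-lock-revalidate pattern used by lock_vma_under_rcu() — find the object, take its lock, then re-walk to confirm the same object still covers the address, retrying on a mismatch — can be modeled without RCU or a maple tree. This sketch is purely illustrative (a flat array stands in for the tree, and all names are invented); single-threaded, the revalidation always succeeds, but the structure of the loop is the point:

```c
#include <stddef.h>
#include <stdbool.h>
#include <assert.h>

/* Toy range object standing in for a VMA. */
struct slot {
	int start, end;		/* covers [start, end) */
	bool locked;
};

/* Stand-in for mas_walk(): return the slot covering addr, if any. */
struct slot *lookup(struct slot *table, int n, int addr)
{
	for (int i = 0; i < n; i++)
		if (addr >= table[i].start && addr < table[i].end)
			return &table[i];
	return NULL;
}

struct slot *lock_slot_under_lookup(struct slot *table, int n, int addr)
{
	struct slot *s = lookup(table, n, addr);

	while (s) {
		s->locked = true;		/* vma_read_trylock() */

		/* The range may have changed between lookup and lock. */
		if (addr < s->start || addr >= s->end) {
			s->locked = false;
			return NULL;		/* VMA_LOCK_ABORT */
		}

		/* Re-walk to confirm the slot was not replaced. */
		struct slot *again = lookup(table, n, addr);
		if (again == s)
			return s;		/* validated, stays locked */

		s->locked = false;
		s = again;			/* VMA_LOCK_MISS: retry */
	}
	return NULL;
}
```

The retry branch is what makes the pattern safe under concurrent replacement: the lock taken on a stale object is dropped, and the loop restarts against whatever now occupies the range.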
[PATCH 27/41] mm/mmap: prevent pagefault handler from racing with mmu_notifier registration
Page fault handlers might need to fire MMU notifications while a new notifier is being registered. Modify mm_take_all_locks to write-lock all VMAs and prevent this race with fault handlers that would hold VMA locks. VMAs are locked before i_mmap_rwsem and anon_vma to keep the same locking order as in page fault handlers.

Signed-off-by: Suren Baghdasaryan
---
 mm/mmap.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/mmap.c b/mm/mmap.c
index 30c7d1c5206e..a256deca0bc0 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3566,6 +3566,7 @@ static void vm_lock_mapping(struct mm_struct *mm, struct address_space *mapping)
  * of mm/rmap.c:
  *   - all hugetlbfs_i_mmap_rwsem_key locks (aka mapping->i_mmap_rwsem for
  *     hugetlb mapping);
+ *   - all vmas marked locked
  *   - all i_mmap_rwsem locks;
  *   - all anon_vma->rwsem
 *
@@ -3591,6 +3592,7 @@ int mm_take_all_locks(struct mm_struct *mm)
 	mas_for_each(&mas, vma, ULONG_MAX) {
 		if (signal_pending(current))
 			goto out_unlock;
+		vma_write_lock(vma);
 		if (vma->vm_file && vma->vm_file->f_mapping &&
 				is_vm_hugetlb_page(vma))
 			vm_lock_mapping(mm, vma->vm_file->f_mapping);
@@ -3677,6 +3679,7 @@ void mm_drop_all_locks(struct mm_struct *mm)
 		if (vma->vm_file && vma->vm_file->f_mapping)
 			vm_unlock_mapping(vma->vm_file->f_mapping);
 	}
+	vma_write_unlock_mm(mm);
 
 	mutex_unlock(&mm_all_locks_mutex);
 }
-- 
2.39.0
[PATCH 26/41] kernel/fork: assert no VMA readers during its destruction
Assert there are no holders of VMA lock for reading when it is about to be destroyed.

Signed-off-by: Suren Baghdasaryan
---
 include/linux/mm.h | 8 ++++++++
 kernel/fork.c      | 2 ++
 2 files changed, 10 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 594e835bad9c..c464fc8a514c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -680,6 +680,13 @@ static inline void vma_assert_write_locked(struct vm_area_struct *vma)
         VM_BUG_ON_VMA(vma->vm_lock_seq != READ_ONCE(vma->vm_mm->mm_lock_seq), vma);
 }
 
+static inline void vma_assert_no_reader(struct vm_area_struct *vma)
+{
+        VM_BUG_ON_VMA(rwsem_is_locked(&vma->lock) &&
+                      vma->vm_lock_seq != READ_ONCE(vma->vm_mm->mm_lock_seq),
+                      vma);
+}
+
 #else /* CONFIG_PER_VMA_LOCK */
 
 static inline void vma_init_lock(struct vm_area_struct *vma) {}
@@ -688,6 +695,7 @@ static inline bool vma_read_trylock(struct vm_area_struct *vma)
         { return false; }
 static inline void vma_read_unlock(struct vm_area_struct *vma) {}
 static inline void vma_assert_write_locked(struct vm_area_struct *vma) {}
+static inline void vma_assert_no_reader(struct vm_area_struct *vma) {}
 
 #endif /* CONFIG_PER_VMA_LOCK */
 
diff --git a/kernel/fork.c b/kernel/fork.c
index 1591dd8a0745..6d9f14e55ecf 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -485,6 +485,8 @@ static void __vm_area_free(struct rcu_head *head)
 {
         struct vm_area_struct *vma = container_of(head, struct vm_area_struct,
                                                   vm_rcu);
+        /* The vma should either have no lock holders or be write-locked. */
+        vma_assert_no_reader(vma);
         kmem_cache_free(vm_area_cachep, vma);
 }
 #endif
-- 
2.39.0
[PATCH 25/41] mm/mmap: write-lock adjacent VMAs if they can grow into unmapped area
While unmapping VMAs, adjacent VMAs might be able to grow into the area being unmapped. In such cases write-lock adjacent VMAs to prevent this growth.

Signed-off-by: Suren Baghdasaryan
---
 mm/mmap.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index 0d767ce043af..30c7d1c5206e 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2461,11 +2461,13 @@ do_mas_align_munmap(struct ma_state *mas, struct vm_area_struct *vma,
          * down_read(mmap_lock) and collide with the VMA we are about to unmap.
          */
         if (downgrade) {
-                if (next && (next->vm_flags & VM_GROWSDOWN))
+                if (next && (next->vm_flags & VM_GROWSDOWN)) {
+                        vma_write_lock(next);
                         downgrade = false;
-                else if (prev && (prev->vm_flags & VM_GROWSUP))
+                } else if (prev && (prev->vm_flags & VM_GROWSUP)) {
+                        vma_write_lock(prev);
                         downgrade = false;
-                else
+                } else
                         mmap_write_downgrade(mm);
         }
 
-- 
2.39.0
[PATCH 24/41] mm: conditionally write-lock VMA in free_pgtables
Normally free_pgtables needs to lock affected VMAs except for the case when VMAs were isolated under VMA write-lock. munmap() does just that, isolating while holding appropriate locks and then downgrading mmap_lock and dropping per-VMA locks before freeing page tables. Add a parameter to free_pgtables and unmap_region for such scenario.

Signed-off-by: Suren Baghdasaryan
---
 mm/internal.h |  2 +-
 mm/memory.c   |  6 +++++-
 mm/mmap.c     | 18 ++++++++++++------
 3 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index bcf75a8b032d..5ea4ff1a70e7 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -87,7 +87,7 @@ void folio_activate(struct folio *folio);
 
 void free_pgtables(struct mmu_gather *tlb, struct maple_tree *mt,
                    struct vm_area_struct *start_vma, unsigned long floor,
-                   unsigned long ceiling);
+                   unsigned long ceiling, bool lock_vma);
 void pmd_install(struct mm_struct *mm, pmd_t *pmd, pgtable_t *pte);
 
 struct zap_details;
diff --git a/mm/memory.c b/mm/memory.c
index 2fabf89b2be9..9ece18548db1 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -348,7 +348,7 @@ void free_pgd_range(struct mmu_gather *tlb,
 
 void free_pgtables(struct mmu_gather *tlb, struct maple_tree *mt,
                    struct vm_area_struct *vma, unsigned long floor,
-                   unsigned long ceiling)
+                   unsigned long ceiling, bool lock_vma)
 {
         MA_STATE(mas, mt, vma->vm_end, vma->vm_end);
 
@@ -366,6 +366,8 @@ void free_pgtables(struct mmu_gather *tlb, struct maple_tree *mt,
                  * Hide vma from rmap and truncate_pagecache before freeing
                  * pgtables
                  */
+                if (lock_vma)
+                        vma_write_lock(vma);
                 unlink_anon_vmas(vma);
                 unlink_file_vma(vma);
 
@@ -380,6 +382,8 @@ void free_pgtables(struct mmu_gather *tlb, struct maple_tree *mt,
                                && !is_vm_hugetlb_page(next)) {
                                 vma = next;
                                 next = mas_find(&mas, ceiling - 1);
+                                if (lock_vma)
+                                        vma_write_lock(vma);
                                 unlink_anon_vmas(vma);
                                 unlink_file_vma(vma);
                         }
diff --git a/mm/mmap.c b/mm/mmap.c
index be289e0b693b..0d767ce043af 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -78,7 +78,7 @@ core_param(ignore_rlimit_data, ignore_rlimit_data, bool, 0644);
 static void unmap_region(struct mm_struct *mm, struct maple_tree *mt,
                 struct vm_area_struct *vma, struct vm_area_struct *prev,
                 struct vm_area_struct *next, unsigned long start,
-                unsigned long end);
+                unsigned long end, bool lock_vma);
 
 static pgprot_t vm_pgprot_modify(pgprot_t oldprot, unsigned long vm_flags)
 {
@@ -2202,7 +2202,7 @@ static inline void remove_mt(struct mm_struct *mm, struct ma_state *mas)
 static void unmap_region(struct mm_struct *mm, struct maple_tree *mt,
                 struct vm_area_struct *vma, struct vm_area_struct *prev,
                 struct vm_area_struct *next,
-                unsigned long start, unsigned long end)
+                unsigned long start, unsigned long end, bool lock_vma)
 {
         struct mmu_gather tlb;
 
@@ -2211,7 +2211,8 @@ static void unmap_region(struct mm_struct *mm, struct maple_tree *mt,
         update_hiwater_rss(mm);
         unmap_vmas(&tlb, mt, vma, start, end);
         free_pgtables(&tlb, mt, vma, prev ? prev->vm_end : FIRST_USER_ADDRESS,
-                      next ? next->vm_start : USER_PGTABLES_CEILING);
+                      next ? next->vm_start : USER_PGTABLES_CEILING,
+                      lock_vma);
         tlb_finish_mmu(&tlb);
 }
 
@@ -2468,7 +2469,11 @@ do_mas_align_munmap(struct ma_state *mas, struct vm_area_struct *vma,
                 mmap_write_downgrade(mm);
         }
 
-        unmap_region(mm, &mt_detach, vma, prev, next, start, end);
+        /*
+         * We can free page tables without locking the vmas because they were
+         * isolated before we downgraded mmap_lock and dropped per-vma locks.
+         */
+        unmap_region(mm, &mt_detach, vma, prev, next, start, end, !downgrade);
         /* Statistics and freeing VMAs */
         mas_set(&mas_detach, start);
         remove_mt(mm, &mas_detach);
@@ -2785,7 +2790,8 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
                 vma->vm_file = NULL;
 
                 /* Undo any partial mapping done by a device driver. */
-                unmap_region(mm, mas.tree, vma, prev, next, vma->vm_start, vma->vm_end);
+                unmap_region(mm, mas.tree, vma, prev, next, vma->vm_start,
+                             vma->vm_end, true);
         if (file && (vm_flags & VM_SHARED))
                 mapping_unmap_writable(file->f_mapping);
 free_vma:
@@ -3130,7 +3136,7 @@ void exit_mmap(struct mm_struct *mm)
[PATCH 23/41] mm: write-lock VMAs before removing them from VMA tree
Write-locking VMAs before isolating them ensures that page fault handlers don't operate on isolated VMAs.

Signed-off-by: Suren Baghdasaryan
---
 mm/mmap.c  | 2 ++
 mm/nommu.c | 5 +++++
 2 files changed, 7 insertions(+)

diff --git a/mm/mmap.c b/mm/mmap.c
index da1908730828..be289e0b693b 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -448,6 +448,7 @@ void vma_mas_store(struct vm_area_struct *vma, struct ma_state *mas)
  */
 void vma_mas_remove(struct vm_area_struct *vma, struct ma_state *mas)
 {
+        vma_write_lock(vma);
         trace_vma_mas_szero(mas->tree, vma->vm_start, vma->vm_end - 1);
         mas->index = vma->vm_start;
         mas->last = vma->vm_end - 1;
@@ -2300,6 +2301,7 @@ int split_vma(struct mm_struct *mm, struct vm_area_struct *vma,
 static inline int munmap_sidetree(struct vm_area_struct *vma,
                                   struct ma_state *mas_detach)
 {
+        vma_write_lock(vma);
         mas_set_range(mas_detach, vma->vm_start, vma->vm_end - 1);
         if (mas_store_gfp(mas_detach, vma, GFP_KERNEL))
                 return -ENOMEM;
diff --git a/mm/nommu.c b/mm/nommu.c
index b3154357ced5..7ae91337ef14 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -552,6 +552,7 @@ void vma_mas_store(struct vm_area_struct *vma, struct ma_state *mas)
 
 void vma_mas_remove(struct vm_area_struct *vma, struct ma_state *mas)
 {
+        vma_write_lock(vma);
         mas->index = vma->vm_start;
         mas->last = vma->vm_end - 1;
         mas_store_prealloc(mas, NULL);
@@ -1551,6 +1552,10 @@ void exit_mmap(struct mm_struct *mm)
         mmap_write_lock(mm);
         for_each_vma(vmi, vma) {
                 cleanup_vma_from_mm(vma);
+                /*
+                 * No need to lock VMA because this is the only mm user and no
+                 * page fault handler can race with it.
+                 */
                 delete_vma(mm, vma);
                 cond_resched();
         }
-- 
2.39.0
[PATCH 22/41] mm/mremap: write-lock VMA while remapping it to a new address range
Write-lock a VMA before copying it, and write-lock the new VMA that copy_vma produces.

Signed-off-by: Suren Baghdasaryan
Reviewed-by: Laurent Dufour
---
 mm/mmap.c   | 1 +
 mm/mremap.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/mm/mmap.c b/mm/mmap.c
index ff02cb51e7e7..da1908730828 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3261,6 +3261,7 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
                         get_file(new_vma->vm_file);
                 if (new_vma->vm_ops && new_vma->vm_ops->open)
                         new_vma->vm_ops->open(new_vma);
+                vma_write_lock(new_vma);
                 if (vma_link(mm, new_vma))
                         goto out_vma_link;
                 *need_rmap_locks = false;
diff --git a/mm/mremap.c b/mm/mremap.c
index 2ccdd1561f5b..d24a79bcb1a1 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -622,6 +622,7 @@ static unsigned long move_vma(struct vm_area_struct *vma,
                         return -ENOMEM;
         }
 
+        vma_write_lock(vma);
         new_pgoff = vma->vm_pgoff + ((old_addr - vma->vm_start) >> PAGE_SHIFT);
         new_vma = copy_vma(&vma, new_addr, new_len, new_pgoff,
                            &need_rmap_locks);
-- 
2.39.0
[PATCH 21/41] mm/mmap: write-lock VMAs affected by VMA expansion
vma_expand changes VMA boundaries and might result in freeing an adjacent VMA. Write-lock affected VMAs to prevent concurrent page faults.

Signed-off-by: Suren Baghdasaryan
---
 mm/mmap.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/mmap.c b/mm/mmap.c
index 1e2154137631..ff02cb51e7e7 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -544,6 +544,7 @@ inline int vma_expand(struct ma_state *mas, struct vm_area_struct *vma,
         if (mas_preallocate(mas, vma, GFP_KERNEL))
                 goto nomem;
 
+        vma_write_lock(vma);
         vma_adjust_trans_huge(vma, start, end, 0);
 
         if (file) {
@@ -590,6 +591,7 @@ inline int vma_expand(struct ma_state *mas, struct vm_area_struct *vma,
         }
 
         if (remove_next) {
+                vma_write_lock(next);
                 if (file) {
                         uprobe_munmap(next, next->vm_start, next->vm_end);
                         fput(file);
-- 
2.39.0
[PATCH 20/41] mm/mmap: write-lock VMAs in vma_adjust
vma_adjust modifies a VMA and possibly its neighbors. Write-lock them before making the modifications.

Signed-off-by: Suren Baghdasaryan
---
 mm/mmap.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index f6ca4a87f9e2..1e2154137631 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -614,6 +614,12 @@ inline int vma_expand(struct ma_state *mas, struct vm_area_struct *vma,
  * The following helper function should be used when such adjustments
  * are necessary. The "insert" vma (if any) is to be inserted
  * before we drop the necessary locks.
+ * 'expand' vma is always locked before it's passed to __vma_adjust()
+ * from vma_merge() because vma should not change from the moment
+ * can_vma_merge_{before|after} decision is made.
+ * 'insert' vma is used only by __split_vma() and it's always a brand
+ * new vma which is not yet added into mm's vma tree, therefore no need
+ * to lock it.
  */
 int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
         unsigned long end, pgoff_t pgoff, struct vm_area_struct *insert,
@@ -633,6 +639,10 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
         MA_STATE(mas, &mm->mm_mt, 0, 0);
         struct vm_area_struct *exporter = NULL, *importer = NULL;
 
+        vma_write_lock(vma);
+        if (next)
+                vma_write_lock(next);
+
         if (next && !insert) {
                 if (end >= next->vm_end) {
                         /*
@@ -676,8 +686,11 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
                          * If next doesn't have anon_vma, import from vma after
                          * next, if the vma overlaps with it.
                          */
-                        if (remove_next == 2 && !next->anon_vma)
+                        if (remove_next == 2 && !next->anon_vma) {
                                 exporter = next_next;
+                                if (exporter)
+                                        vma_write_lock(exporter);
+                        }
 
                 } else if (end > next->vm_start) {
                         /*
@@ -850,6 +863,8 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
                 if (remove_next == 2) {
                         remove_next = 1;
                         next = next_next;
+                        if (next)
+                                vma_write_lock(next);
                         goto again;
                 }
         }
-- 
2.39.0
[PATCH 19/41] mm/mmap: write-lock VMAs before merging, splitting or expanding them
Decisions about whether VMAs can be merged, split or expanded must be made while VMAs are protected from the changes which can affect that decision. For example, merge_vma uses vma->anon_vma in its decision whether the VMA can be merged. Meanwhile, the page fault handler changes vma->anon_vma during a COW operation. Write-lock all VMAs which might be affected by a merge or split operation before deciding how such operations should be performed. It is not clear whether expansion really needs this; the locking is added out of caution, since otherwise mmap_region and vm_brk_flags might proceed without locking.

Signed-off-by: Suren Baghdasaryan
---
 mm/mmap.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index 53d885e70a54..f6ca4a87f9e2 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -254,8 +254,11 @@ SYSCALL_DEFINE1(brk, unsigned long, brk)
          */
         mas_set(&mas, oldbrk);
         next = mas_find(&mas, newbrk - 1 + PAGE_SIZE + stack_guard_gap);
-        if (next && newbrk + PAGE_SIZE > vm_start_gap(next))
-                goto out;
+        if (next) {
+                vma_write_lock(next);
+                if (newbrk + PAGE_SIZE > vm_start_gap(next))
+                        goto out;
+        }
 
         brkvma = mas_prev(&mas, mm->start_brk);
         /* Ok, looks good - let it rip. */
@@ -1017,10 +1020,17 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
         if (vm_flags & VM_SPECIAL)
                 return NULL;
 
+        if (prev)
+                vma_write_lock(prev);
         next = find_vma(mm, prev ? prev->vm_end : 0);
         mid = next;
-        if (next && next->vm_end == end)                /* cases 6, 7, 8 */
+        if (next)
+                vma_write_lock(next);
+        if (next && next->vm_end == end) {              /* cases 6, 7, 8 */
                 next = find_vma(mm, next->vm_end);
+                if (next)
+                        vma_write_lock(next);
+        }
 
         /* verify some invariant that must be enforced by the caller */
         VM_WARN_ON(prev && addr <= prev->vm_start);
@@ -2198,6 +2208,7 @@ int __split_vma(struct mm_struct *mm, struct vm_area_struct *vma,
         int err;
         validate_mm_mt(mm);
 
+        vma_write_lock(vma);
         if (vma->vm_ops && vma->vm_ops->may_split) {
                 err = vma->vm_ops->may_split(vma, addr);
                 if (err)
@@ -2564,6 +2575,8 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 
         /* Attempt to expand an old mapping */
         /* Check next */
+        if (next)
+                vma_write_lock(next);
         if (next && next->vm_start == end && !vma_policy(next) &&
             can_vma_merge_before(next, vm_flags, NULL, file, pgoff+pglen,
                                  NULL_VM_UFFD_CTX, NULL)) {
@@ -2573,6 +2586,8 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
         }
 
         /* Check prev */
+        if (prev)
+                vma_write_lock(prev);
         if (prev && prev->vm_end == addr && !vma_policy(prev) &&
             (vma ? can_vma_merge_after(prev, vm_flags, vma->anon_vma, file,
                                        pgoff, vma->vm_userfaultfd_ctx, NULL) :
@@ -2942,6 +2957,8 @@ static int do_brk_flags(struct ma_state *mas, struct vm_area_struct *vma,
         if (security_vm_enough_memory_mm(mm, len >> PAGE_SHIFT))
                 return -ENOMEM;
 
+        if (vma)
+                vma_write_lock(vma);
         /*
          * Expand the existing vma if possible; Note that singular lists do not
          * occur after forking, so the expand will only happen on new VMAs.
-- 
2.39.0
[PATCH 18/41] mm/khugepaged: write-lock VMA while collapsing a huge page
Protect VMA from concurrent page fault handler while collapsing a huge page. Page fault handler needs a stable PMD to use PTL and relies on per-VMA lock to prevent concurrent PMD changes. pmdp_collapse_flush(), set_huge_pmd() and collapse_and_free_pmd() can modify a PMD, which will not be detected by a page fault handler without proper locking.

Signed-off-by: Suren Baghdasaryan
---
 mm/khugepaged.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 5376246a3052..d8d0647f0c2c 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1032,6 +1032,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
         if (result != SCAN_SUCCEED)
                 goto out_up_write;
 
+        vma_write_lock(vma);
         anon_vma_lock_write(vma->anon_vma);
 
         mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, NULL, mm,
@@ -1503,6 +1504,9 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
                 goto drop_hpage;
         }
 
+        /* Lock the vma before taking i_mmap and page table locks */
+        vma_write_lock(vma);
+
         /*
          * We need to lock the mapping so that from here on, only GUP-fast and
          * hardware page walks can access the parts of the page tables that
@@ -1690,6 +1694,7 @@ static int retract_page_tables(struct address_space *mapping, pgoff_t pgoff,
                         result = SCAN_PTE_UFFD_WP;
                         goto unlock_next;
                 }
+                vma_write_lock(vma);
                 collapse_and_free_pmd(mm, vma, addr, pmd);
                 if (!cc->is_khugepaged && is_target)
                         result = set_huge_pmd(vma, addr, pmd, hpage);
-- 
2.39.0
[PATCH 17/41] mm/mmap: move VMA locking before anon_vma_lock_write call
Move VMA flag modification (which now implies VMA locking) before anon_vma_lock_write to match the locking order of page fault handler.

Signed-off-by: Suren Baghdasaryan
---
 mm/mmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index fa994ae903d9..53d885e70a54 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2953,13 +2953,13 @@ static int do_brk_flags(struct ma_state *mas, struct vm_area_struct *vma,
         if (mas_preallocate(mas, vma, GFP_KERNEL))
                 goto unacct_fail;
 
+        set_vm_flags(vma, VM_SOFTDIRTY);
         vma_adjust_trans_huge(vma, vma->vm_start, addr + len, 0);
         if (vma->anon_vma) {
                 anon_vma_lock_write(vma->anon_vma);
                 anon_vma_interval_tree_pre_update_vma(vma);
         }
         vma->vm_end = addr + len;
-        set_vm_flags(vma, VM_SOFTDIRTY);
         mas_store_prealloc(mas, vma);
 
         if (vma->anon_vma) {
-- 
2.39.0
[PATCH 16/41] mm: replace vma->vm_flags indirect modification in ksm_madvise
Replace indirect modifications to vma->vm_flags with calls to modifier functions to be able to track flag changes and to keep vma locking correctness. Add a BUG_ON check in ksm_madvise() to catch indirect vm_flags modification attempts.

Signed-off-by: Suren Baghdasaryan
---
 arch/powerpc/kvm/book3s_hv_uvmem.c | 5 ++++-
 arch/s390/mm/gmap.c                | 5 ++++-
 mm/khugepaged.c                    | 2 ++
 mm/ksm.c                           | 2 ++
 4 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 1d67baa5557a..325a7a47d348 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -393,6 +393,7 @@ static int kvmppc_memslot_page_merge(struct kvm *kvm,
 {
         unsigned long gfn = memslot->base_gfn;
         unsigned long end, start = gfn_to_hva(kvm, gfn);
+        unsigned long vm_flags;
         int ret = 0;
         struct vm_area_struct *vma;
         int merge_flag = (merge) ? MADV_MERGEABLE : MADV_UNMERGEABLE;
@@ -409,12 +410,14 @@ static int kvmppc_memslot_page_merge(struct kvm *kvm,
                         ret = H_STATE;
                         break;
                 }
+                vm_flags = vma->vm_flags;
                 ret = ksm_madvise(vma, vma->vm_start, vma->vm_end,
-                          merge_flag, &vma->vm_flags);
+                          merge_flag, &vm_flags);
                 if (ret) {
                         ret = H_STATE;
                         break;
                 }
+                reset_vm_flags(vma, vm_flags);
                 start = vma->vm_end;
         } while (end > vma->vm_end);
diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index 3811d6c86d09..e47387f8be6d 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -2587,14 +2587,17 @@ int gmap_mark_unmergeable(void)
 {
         struct mm_struct *mm = current->mm;
         struct vm_area_struct *vma;
+        unsigned long vm_flags;
         int ret;
         VMA_ITERATOR(vmi, mm, 0);
 
         for_each_vma(vmi, vma) {
+                vm_flags = vma->vm_flags;
                 ret = ksm_madvise(vma, vma->vm_start, vma->vm_end,
-                                  MADV_UNMERGEABLE, &vma->vm_flags);
+                                  MADV_UNMERGEABLE, &vm_flags);
                 if (ret)
                         return ret;
+                reset_vm_flags(vma, vm_flags);
         }
         mm->def_flags &= ~VM_MERGEABLE;
         return 0;
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 5cb401aa2b9d..5376246a3052 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -352,6 +352,8 @@ struct attribute_group khugepaged_attr_group = {
 int hugepage_madvise(struct vm_area_struct *vma,
                      unsigned long *vm_flags, int advice)
 {
+        /* vma->vm_flags can be changed only using modifier functions */
+        BUG_ON(vm_flags == &vma->vm_flags);
         switch (advice) {
         case MADV_HUGEPAGE:
 #ifdef CONFIG_S390
diff --git a/mm/ksm.c b/mm/ksm.c
index dd02780c387f..d05c41b289db 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -2471,6 +2471,8 @@ int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
         struct mm_struct *mm = vma->vm_mm;
         int err;
 
+        /* vma->vm_flags can be changed only using modifier functions */
+        BUG_ON(vm_flags == &vma->vm_flags);
         switch (advice) {
         case MADV_MERGEABLE:
                 /*
-- 
2.39.0
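The caller-side pattern this patch enforces — copy vm_flags to a local, let the helper update the copy, then write it back through a tracked modifier — can be sketched in plain C. Everything below is a hypothetical model (names and the flag value are invented for illustration); `assert` stands in for the kernel's BUG_ON:

```c
#include <assert.h>

/* Hypothetical stand-in for a VMA carrying a flags word. */
struct model_vma { unsigned long vm_flags; };

#define MODEL_VM_MERGEABLE 0x1UL

/* Like ksm_madvise() after the patch: refuses a pointer into the vma. */
static int model_madvise(struct model_vma *vma, unsigned long *vm_flags)
{
        assert(vm_flags != &vma->vm_flags);     /* no direct modification */
        *vm_flags |= MODEL_VM_MERGEABLE;
        return 0;
}

/* Stands in for reset_vm_flags(): the one place flags are written back. */
static void model_reset_vm_flags(struct model_vma *vma, unsigned long flags)
{
        vma->vm_flags = flags;
}

/* Mirrors the converted callers in book3s_hv_uvmem.c and gmap.c. */
static int model_caller(struct model_vma *vma)
{
        unsigned long vm_flags = vma->vm_flags; /* local copy */
        int ret = model_madvise(vma, &vm_flags);

        if (!ret)
                model_reset_vm_flags(vma, vm_flags);
        return ret;
}
```

The design point: by funneling every write-back through one function, flag changes become a natural place to take the per-VMA write lock.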
[PATCH 15/41] mm: replace vma->vm_flags direct modifications with modifier calls
Replace direct modifications to vma->vm_flags with calls to modifier functions to be able to track flag changes and to keep vma locking correctness. Signed-off-by: Suren Baghdasaryan --- arch/arm/kernel/process.c | 2 +- arch/ia64/mm/init.c| 8 arch/loongarch/include/asm/tlb.h | 2 +- arch/powerpc/kvm/book3s_xive_native.c | 2 +- arch/powerpc/mm/book3s64/subpage_prot.c| 2 +- arch/powerpc/platforms/book3s/vas-api.c| 2 +- arch/powerpc/platforms/cell/spufs/file.c | 14 +++--- arch/s390/mm/gmap.c| 3 +-- arch/x86/entry/vsyscall/vsyscall_64.c | 2 +- arch/x86/kernel/cpu/sgx/driver.c | 2 +- arch/x86/kernel/cpu/sgx/virt.c | 2 +- arch/x86/mm/pat/memtype.c | 6 +++--- arch/x86/um/mem_32.c | 2 +- drivers/acpi/pfr_telemetry.c | 2 +- drivers/android/binder.c | 3 +-- drivers/char/mspec.c | 2 +- drivers/crypto/hisilicon/qm.c | 2 +- drivers/dax/device.c | 2 +- drivers/dma/idxd/cdev.c| 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c| 2 +- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 4 ++-- drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c | 4 ++-- drivers/gpu/drm/amd/amdkfd/kfd_events.c| 4 ++-- drivers/gpu/drm/amd/amdkfd/kfd_process.c | 4 ++-- drivers/gpu/drm/drm_gem.c | 2 +- drivers/gpu/drm/drm_gem_dma_helper.c | 3 +-- drivers/gpu/drm/drm_gem_shmem_helper.c | 2 +- drivers/gpu/drm/drm_vm.c | 8 drivers/gpu/drm/etnaviv/etnaviv_gem.c | 2 +- drivers/gpu/drm/exynos/exynos_drm_gem.c| 4 ++-- drivers/gpu/drm/gma500/framebuffer.c | 2 +- drivers/gpu/drm/i810/i810_dma.c| 2 +- drivers/gpu/drm/i915/gem/i915_gem_mman.c | 4 ++-- drivers/gpu/drm/mediatek/mtk_drm_gem.c | 2 +- drivers/gpu/drm/msm/msm_gem.c | 2 +- drivers/gpu/drm/omapdrm/omap_gem.c | 3 +-- drivers/gpu/drm/rockchip/rockchip_drm_gem.c| 3 +-- drivers/gpu/drm/tegra/gem.c| 5 ++--- drivers/gpu/drm/ttm/ttm_bo_vm.c| 3 +-- drivers/gpu/drm/virtio/virtgpu_vram.c | 2 +- drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c | 2 +- drivers/gpu/drm/xen/xen_drm_front_gem.c| 3 +-- drivers/hsi/clients/cmt_speech.c | 2 +- drivers/hwtracing/intel_th/msu.c | 2 +- 
drivers/hwtracing/stm/core.c | 2 +- drivers/infiniband/hw/hfi1/file_ops.c | 4 ++-- drivers/infiniband/hw/mlx5/main.c | 4 ++-- drivers/infiniband/hw/qib/qib_file_ops.c | 13 ++--- drivers/infiniband/hw/usnic/usnic_ib_verbs.c | 2 +- drivers/infiniband/hw/vmw_pvrdma/pvrdma_verbs.c| 2 +- .../media/common/videobuf2/videobuf2-dma-contig.c | 2 +- drivers/media/common/videobuf2/videobuf2-vmalloc.c | 2 +- drivers/media/v4l2-core/videobuf-dma-contig.c | 2 +- drivers/media/v4l2-core/videobuf-dma-sg.c | 4 ++-- drivers/media/v4l2-core/videobuf-vmalloc.c | 2 +- drivers/misc/cxl/context.c | 2 +- drivers/misc/habanalabs/common/memory.c| 2 +- drivers/misc/habanalabs/gaudi/gaudi.c | 4 ++-- drivers/misc/habanalabs/gaudi2/gaudi2.c| 8 drivers/misc/habanalabs/goya/goya.c| 4 ++-- drivers/misc/ocxl/context.c| 4 ++-- drivers/misc/ocxl/sysfs.c | 2 +- drivers/misc/open-dice.c | 6 +++--- drivers/misc/sgi-gru/grufile.c | 4 ++-- drivers/misc/uacce/uacce.c | 2 +- drivers/sbus/char/oradax.c | 2 +- drivers/scsi/cxlflash/ocxl_hw.c| 2 +- drivers/scsi/sg.c | 2 +- drivers/staging/media/atomisp/pci/hmm/hmm_bo.c | 2 +- drivers/staging/media/deprecated/meye/meye.c | 4 ++-- .../media/deprecated/stkwebcam/stk-webcam.c| 2 +- drivers/target/target_core_user.c | 2 +- drivers/uio/uio.c | 2 +- drivers/usb/core/devio.c | 3 +-- drivers/usb/mon/mon_bin.c | 3 +-- drivers/vdpa/vdpa_user/iova_domain.c | 2 +- drivers/vfio/pci/vfio_pci_core.c | 2 +- drivers/vhost/vdpa.c | 2 +- drivers/video/fbdev/68328fb.c
[PATCH 14/41] mm: replace VM_LOCKED_CLEAR_MASK with VM_LOCKED_MASK
To simplify the usage of VM_LOCKED_CLEAR_MASK in clear_vm_flags(), replace it with VM_LOCKED_MASK bitmask and convert all users.

Signed-off-by: Suren Baghdasaryan
---
 include/linux/mm.h | 4 ++--
 kernel/fork.c      | 2 +-
 mm/hugetlb.c       | 4 ++--
 mm/mlock.c         | 6 +++---
 mm/mmap.c          | 6 +++---
 mm/mremap.c        | 2 +-
 6 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 35cf0a6cbcc2..2b16d45b75a6 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -416,8 +416,8 @@ extern unsigned int kobjsize(const void *objp);
 /* This mask defines which mm->def_flags a process can inherit its parent */
 #define VM_INIT_DEF_MASK        VM_NOHUGEPAGE
 
-/* This mask is used to clear all the VMA flags used by mlock */
-#define VM_LOCKED_CLEAR_MASK    (~(VM_LOCKED | VM_LOCKONFAULT))
+/* This mask represents all the VMA flag bits used by mlock */
+#define VM_LOCKED_MASK          (VM_LOCKED | VM_LOCKONFAULT)
 
 /* Arch-specific flags to clear when updating VM flags on protection change */
 #ifndef VM_ARCH_CLEAR
diff --git a/kernel/fork.c b/kernel/fork.c
index c026d75108b3..1591dd8a0745 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -674,7 +674,7 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
                         tmp->anon_vma = NULL;
                 } else if (anon_vma_fork(tmp, mpnt))
                         goto fail_nomem_anon_vma_fork;
-                tmp->vm_flags &= ~(VM_LOCKED | VM_LOCKONFAULT);
+                clear_vm_flags(tmp, VM_LOCKED_MASK);
                 file = tmp->vm_file;
                 if (file) {
                         struct address_space *mapping = file->f_mapping;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index db895230ee7e..24861cbfa2b1 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6950,8 +6950,8 @@ static unsigned long page_table_shareable(struct vm_area_struct *svma,
         unsigned long s_end = sbase + PUD_SIZE;
 
         /* Allow segments to share if only one is marked locked */
-        unsigned long vm_flags = vma->vm_flags & VM_LOCKED_CLEAR_MASK;
-        unsigned long svm_flags = svma->vm_flags & VM_LOCKED_CLEAR_MASK;
+        unsigned long vm_flags = vma->vm_flags & ~VM_LOCKED_MASK;
+        unsigned long svm_flags = svma->vm_flags & ~VM_LOCKED_MASK;
 
         /*
          * match the virtual addresses, permission and the alignment of the
diff --git a/mm/mlock.c b/mm/mlock.c
index 7032f6dd0ce1..06aa9e204fac 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -490,7 +490,7 @@ static int apply_vma_lock_flags(unsigned long start, size_t len,
         prev = mas_prev(&mas, 0);
         for (nstart = start ; ; ) {
-                vm_flags_t newflags = vma->vm_flags & VM_LOCKED_CLEAR_MASK;
+                vm_flags_t newflags = vma->vm_flags & ~VM_LOCKED_MASK;
 
                 newflags |= flags;
@@ -662,7 +662,7 @@ static int apply_mlockall_flags(int flags)
         struct vm_area_struct *vma, *prev = NULL;
         vm_flags_t to_add = 0;
 
-        current->mm->def_flags &= VM_LOCKED_CLEAR_MASK;
+        current->mm->def_flags &= ~VM_LOCKED_MASK;
         if (flags & MCL_FUTURE) {
                 current->mm->def_flags |= VM_LOCKED;
@@ -682,7 +682,7 @@ static int apply_mlockall_flags(int flags)
         mas_for_each(&mas, vma, ULONG_MAX) {
                 vm_flags_t newflags;
 
-                newflags = vma->vm_flags & VM_LOCKED_CLEAR_MASK;
+                newflags = vma->vm_flags & ~VM_LOCKED_MASK;
                 newflags |= to_add;
 
                 /* Ignore errors */
diff --git a/mm/mmap.c b/mm/mmap.c
index 9db37adfc00a..5c4b608edde9 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2721,7 +2721,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
                 if ((vm_flags & VM_SPECIAL) || vma_is_dax(vma) ||
                                         is_vm_hugetlb_page(vma) ||
                                         vma == get_gate_vma(current->mm))
-                        vma->vm_flags &= VM_LOCKED_CLEAR_MASK;
+                        clear_vm_flags(vma, VM_LOCKED_MASK);
                 else
                         mm->locked_vm += (len >> PAGE_SHIFT);
         }
@@ -3392,8 +3392,8 @@ static struct vm_area_struct *__install_special_mapping(
         vma->vm_start = addr;
         vma->vm_end = addr + len;
 
-        vma->vm_flags = vm_flags | mm->def_flags | VM_DONTEXPAND | VM_SOFTDIRTY;
-        vma->vm_flags &= VM_LOCKED_CLEAR_MASK;
+        init_vm_flags(vma, (vm_flags | mm->def_flags |
+                      VM_DONTEXPAND | VM_SOFTDIRTY) & ~VM_LOCKED_MASK);
         vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
 
         vma->vm_ops = ops;
diff --git a/mm/mremap.c b/mm/mremap.c
index fe587c5d6591..5f6f9931bff1 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -686,7 +686,7 @@ static unsigned long move_vma(struct vm_area_struct *vma,
         if (unlikely(!err && (flags & MREMAP_DONTUNMAP))) {
                 /* We always clear VM_LOCKED[ONFAULT] on the old vma */
-
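The conversion in this patch is purely mechanical: the old mask was defined as the complement of the mlock bits, so every former `flags & VM_LOCKED_CLEAR_MASK` becomes `flags & ~VM_LOCKED_MASK`. The identity is easy to check in isolation (the flag values below follow the usual kernel definitions but are only needed here for illustration):

```c
#include <assert.h>

#define VM_LOCKED       0x00002000UL
#define VM_LOCKONFAULT  0x00080000UL

#define VM_LOCKED_CLEAR_MASK    (~(VM_LOCKED | VM_LOCKONFAULT)) /* old name */
#define VM_LOCKED_MASK          (VM_LOCKED | VM_LOCKONFAULT)    /* new name */

/* The new spelling of "strip the mlock bits from a flags word". */
static unsigned long strip_mlock_bits(unsigned long flags)
{
        return flags & ~VM_LOCKED_MASK;
}
```

Expressing the mask positively also lets clear_vm_flags() take the bits to clear directly, instead of every caller pre-inverting them.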
[PATCH 13/41] mm: introduce vma->vm_flags modifier functions
To keep vma locking correctness when vm_flags are modified, add modifier functions to be used whenever flags are updated.

Signed-off-by: Suren Baghdasaryan
---
 include/linux/mm.h       | 38 ++++++++++++++++++++++++++++++++++++++
 include/linux/mm_types.h |  8 +++++++-
 2 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index ec2c4c227d51..35cf0a6cbcc2 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -702,6 +702,44 @@ static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
         vma_init_lock(vma);
 }
 
+/* Use when VMA is not part of the VMA tree and needs no locking */
+static inline
+void init_vm_flags(struct vm_area_struct *vma, unsigned long flags)
+{
+        WRITE_ONCE(vma->vm_flags, flags);
+}
+
+/* Use when VMA is part of the VMA tree and needs appropriate locking */
+static inline
+void reset_vm_flags(struct vm_area_struct *vma, unsigned long flags)
+{
+        vma_write_lock(vma);
+        init_vm_flags(vma, flags);
+}
+
+static inline
+void set_vm_flags(struct vm_area_struct *vma, unsigned long flags)
+{
+        vma_write_lock(vma);
+        vma->vm_flags |= flags;
+}
+
+static inline
+void clear_vm_flags(struct vm_area_struct *vma, unsigned long flags)
+{
+        vma_write_lock(vma);
+        vma->vm_flags &= ~flags;
+}
+
+static inline
+void mod_vm_flags(struct vm_area_struct *vma,
+                  unsigned long set, unsigned long clear)
+{
+        vma_write_lock(vma);
+        vma->vm_flags |= set;
+        vma->vm_flags &= ~clear;
+}
+
 static inline void vma_set_anonymous(struct vm_area_struct *vma)
 {
         vma->vm_ops = NULL;
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 5f7c5ca89931..0d27edd3e63a 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -553,7 +553,13 @@ struct vm_area_struct {
          * See vmf_insert_mixed_prot() for discussion.
          */
         pgprot_t vm_page_prot;
-        unsigned long vm_flags;         /* Flags, see mm.h. */
+
+        /*
+         * Flags, see mm.h.
+         * WARNING! Do not modify directly to keep correct VMA locking.
+         * Use {init|reset|set|clear|mod}_vm_flags() functions instead.
+         */
+        unsigned long vm_flags;
 
 #ifdef CONFIG_PER_VMA_LOCK
         int vm_lock_seq;
-- 
2.39.0
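The split between init_vm_flags() (no locking, VMA not yet in the tree) and the locked variants can be illustrated with a userspace model. This is a hypothetical sketch: the lock is replaced by a counter so a test can observe that every tracked update goes through the "write lock" first, which is the whole point of the API.

```c
#include <assert.h>

/* Hypothetical stand-in for a VMA; write_locks_taken counts
 * model_vma_write_lock() calls so tests can observe the locking. */
struct model_vma {
        unsigned long vm_flags;
        int write_locks_taken;
};

static void model_vma_write_lock(struct model_vma *vma)
{
        vma->write_locks_taken++;
}

/* Use when the VMA is not yet in the tree: no locking needed. */
static void init_vm_flags(struct model_vma *vma, unsigned long flags)
{
        vma->vm_flags = flags;
}

static void set_vm_flags(struct model_vma *vma, unsigned long flags)
{
        model_vma_write_lock(vma);
        vma->vm_flags |= flags;
}

static void clear_vm_flags(struct model_vma *vma, unsigned long flags)
{
        model_vma_write_lock(vma);
        vma->vm_flags &= ~flags;
}

static void mod_vm_flags(struct model_vma *vma, unsigned long set,
                         unsigned long clear)
{
        model_vma_write_lock(vma);
        vma->vm_flags |= set;
        vma->vm_flags &= ~clear;
}
```

mod_vm_flags() exists so a combined set-and-clear costs one lock round trip instead of two.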
[PATCH 12/41] mm: add per-VMA lock and helper functions to control it
Introduce a per-VMA rw_semaphore to be used during page fault handling instead of mmap_lock. Because there are cases when multiple VMAs need to be exclusively locked during VMA tree modifications, instead of the usual lock/unlock pattern we mark a VMA as locked by taking the per-VMA lock exclusively and setting vma->vm_lock_seq to the current mm->mm_lock_seq. When the mmap_write_lock holder is done with all modifications and drops mmap_lock, it will increment mm->mm_lock_seq, effectively unlocking all VMAs marked as locked.

The VMA lock is placed on the cache line boundary so that its 'count' field falls into the first cache line while the rest of the fields fall into the second cache line. This lets the 'count' field be cached with other frequently accessed fields and used quickly in the uncontended case, while 'owner' and other fields used in the contended case will not invalidate the first cache line while waiting on the lock.

Signed-off-by: Suren Baghdasaryan
---
 include/linux/mm.h        | 80 +++++++++++++++++++++++++++++++++++++++
 include/linux/mm_types.h  |  8 ++++
 include/linux/mmap_lock.h | 13 +++++++
 kernel/fork.c             |  4 ++
 mm/init-mm.c              |  3 ++
 5 files changed, 108 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index f3f196e4d66d..ec2c4c227d51 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -612,6 +612,85 @@ struct vm_operations_struct {
 					unsigned long addr);
 };
 
+#ifdef CONFIG_PER_VMA_LOCK
+static inline void vma_init_lock(struct vm_area_struct *vma)
+{
+	init_rwsem(&vma->lock);
+	vma->vm_lock_seq = -1;
+}
+
+static inline void vma_write_lock(struct vm_area_struct *vma)
+{
+	int mm_lock_seq;
+
+	mmap_assert_write_locked(vma->vm_mm);
+
+	/*
+	 * current task is holding mmap_write_lock, both vma->vm_lock_seq and
+	 * mm->mm_lock_seq can't be concurrently modified.
+	 */
+	mm_lock_seq = READ_ONCE(vma->vm_mm->mm_lock_seq);
+	if (vma->vm_lock_seq == mm_lock_seq)
+		return;
+
+	down_write(&vma->lock);
+	vma->vm_lock_seq = mm_lock_seq;
+	up_write(&vma->lock);
+}
+
+/*
+ * Try to read-lock a vma. The function is allowed to occasionally yield false
+ * locked result to avoid performance overhead, in which case we fall back to
+ * using mmap_lock. The function should never yield false unlocked result.
+ */
+static inline bool vma_read_trylock(struct vm_area_struct *vma)
+{
+	/* Check before locking. A race might cause false locked result. */
+	if (vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))
+		return false;
+
+	if (unlikely(down_read_trylock(&vma->lock) == 0))
+		return false;
+
+	/*
+	 * Overflow might produce false locked result.
+	 * False unlocked result is impossible because we modify and check
+	 * vma->vm_lock_seq under vma->lock protection and mm->mm_lock_seq
+	 * modification invalidates all existing locks.
+	 */
+	if (unlikely(vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))) {
+		up_read(&vma->lock);
+		return false;
+	}
+	return true;
+}
+
+static inline void vma_read_unlock(struct vm_area_struct *vma)
+{
+	up_read(&vma->lock);
+}
+
+static inline void vma_assert_write_locked(struct vm_area_struct *vma)
+{
+	mmap_assert_write_locked(vma->vm_mm);
+	/*
+	 * current task is holding mmap_write_lock, both vma->vm_lock_seq and
+	 * mm->mm_lock_seq can't be concurrently modified.
+	 */
+	VM_BUG_ON_VMA(vma->vm_lock_seq != READ_ONCE(vma->vm_mm->mm_lock_seq), vma);
+}
+
+#else /* CONFIG_PER_VMA_LOCK */
+
+static inline void vma_init_lock(struct vm_area_struct *vma) {}
+static inline void vma_write_lock(struct vm_area_struct *vma) {}
+static inline bool vma_read_trylock(struct vm_area_struct *vma)
+		{ return false; }
+static inline void vma_read_unlock(struct vm_area_struct *vma) {}
+static inline void vma_assert_write_locked(struct vm_area_struct *vma) {}
+
+#endif /* CONFIG_PER_VMA_LOCK */
+
 static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
 {
 	static const struct vm_operations_struct dummy_vm_ops = {};
@@ -620,6 +699,7 @@ static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
 	vma->vm_mm = mm;
 	vma->vm_ops = &dummy_vm_ops;
 	INIT_LIST_HEAD(&vma->anon_vma_chain);
+	vma_init_lock(vma);
 }
 
 static inline void vma_set_anonymous(struct vm_area_struct *vma)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index d5cdec1314fe..5f7c5ca89931 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -555,6 +555,11 @@ struct vm_area_struct {
 	pgprot_t vm_page_prot;
 	unsigned long vm_flags;		/* Flags, see mm.h. */
 
+#ifdef CONFIG_PER_VMA_LOCK
+	int vm_lock_seq;
+	struct rw_semaphore lock;
+#endif
+
 	/*
 	 * For areas with an
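The mark-and-bulk-unlock scheme above can be sketched in plain userspace C. This is a simplified, single-threaded model and not the kernel code: mm_t, vma_t and the toy rwlock_t are invented stand-ins for mm_struct, vm_area_struct and the kernel rw_semaphore, and the memory-ordering details (READ_ONCE, barriers) are deliberately omitted.

```c
#include <assert.h>

/* Toy rwlock: state > 0 means readers, -1 means writer, 0 means free. */
typedef struct { int state; } rwlock_t;

static int rw_tryrdlock(rwlock_t *l) { if (l->state < 0) return -1; l->state++; return 0; }
static void rw_rdunlock(rwlock_t *l) { l->state--; }
static void rw_wrlock(rwlock_t *l)   { l->state = -1; /* real code would block */ }
static void rw_wrunlock(rwlock_t *l) { l->state = 0; }

typedef struct { int mm_lock_seq; } mm_t;       /* bumped at mmap_write_unlock */

typedef struct {
	mm_t *mm;
	rwlock_t lock;
	int vm_lock_seq;        /* == mm->mm_lock_seq while write-locked */
} vma_t;

static void vma_init_lock(vma_t *vma, mm_t *mm)
{
	vma->mm = mm;
	vma->lock.state = 0;
	vma->vm_lock_seq = -1;
}

/* Mark a VMA write-locked; the caller is assumed to hold mmap_lock. */
static void vma_write_lock(vma_t *vma)
{
	int seq = vma->mm->mm_lock_seq;

	if (vma->vm_lock_seq == seq)
		return;                 /* already marked in this write cycle */
	rw_wrlock(&vma->lock);          /* waits out any current readers */
	vma->vm_lock_seq = seq;
	rw_wrunlock(&vma->lock);
}

/* Fault path: take the read lock only if no writer has marked this VMA. */
static int vma_read_trylock(vma_t *vma)
{
	if (vma->vm_lock_seq == vma->mm->mm_lock_seq)
		return 0;               /* write-locked: fall back to mmap_lock */
	if (rw_tryrdlock(&vma->lock) != 0)
		return 0;
	if (vma->vm_lock_seq == vma->mm->mm_lock_seq) {
		rw_rdunlock(&vma->lock);
		return 0;               /* a writer marked it after our check */
	}
	return 1;
}

static void vma_read_unlock(vma_t *vma) { rw_rdunlock(&vma->lock); }

/* mmap_write_unlock(): a single increment unlocks all marked VMAs in bulk. */
static void mm_unlock_all_vmas(mm_t *mm) { mm->mm_lock_seq++; }
```

The point the sketch makes is that the writer never issues a per-VMA "unlock": marked VMAs simply become stale once the mm-wide sequence number moves on.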
[PATCH 11/41] mm: export dump_mm()
mmap_assert_write_locked() will be used in the next patch to ensure vma write lock is taken only under mmap_lock exclusive lock. Because mmap_assert_write_locked() uses dump_mm() and there are cases when vma write lock is taken from inside a module, it's necessary to export the dump_mm() function.

Signed-off-by: Suren Baghdasaryan
---
 mm/debug.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/debug.c b/mm/debug.c
index 7f8e5f744e42..b6e9e53469d1 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -215,6 +215,7 @@ void dump_mm(const struct mm_struct *mm)
 		mm->def_flags, &mm->def_flags
 	);
 }
+EXPORT_SYMBOL(dump_mm);
 
 static bool page_init_poisoning __read_mostly = true;
-- 
2.39.0
[PATCH 10/41] mm: move mmap_lock assert function definitions
Move mmap_lock assert function definitions up so that they can be used by other mmap_lock routines.

Signed-off-by: Suren Baghdasaryan
---
 include/linux/mmap_lock.h | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
index 96e113e23d04..e49ba91bb1f0 100644
--- a/include/linux/mmap_lock.h
+++ b/include/linux/mmap_lock.h
@@ -60,6 +60,18 @@ static inline void __mmap_lock_trace_released(struct mm_struct *mm, bool write)
 
 #endif /* CONFIG_TRACING */
 
+static inline void mmap_assert_locked(struct mm_struct *mm)
+{
+	lockdep_assert_held(&mm->mmap_lock);
+	VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm);
+}
+
+static inline void mmap_assert_write_locked(struct mm_struct *mm)
+{
+	lockdep_assert_held_write(&mm->mmap_lock);
+	VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm);
+}
+
 static inline void mmap_init_lock(struct mm_struct *mm)
 {
 	init_rwsem(&mm->mmap_lock);
@@ -150,18 +162,6 @@ static inline void mmap_read_unlock_non_owner(struct mm_struct *mm)
 	up_read_non_owner(&mm->mmap_lock);
 }
 
-static inline void mmap_assert_locked(struct mm_struct *mm)
-{
-	lockdep_assert_held(&mm->mmap_lock);
-	VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm);
-}
-
-static inline void mmap_assert_write_locked(struct mm_struct *mm)
-{
-	lockdep_assert_held_write(&mm->mmap_lock);
-	VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm);
-}
-
 static inline int mmap_lock_is_contended(struct mm_struct *mm)
 {
 	return rwsem_is_contended(&mm->mmap_lock);
-- 
2.39.0
[PATCH 09/41] mm: rcu safe VMA freeing
From: Michel Lespinasse

This prepares for page faults handling under VMA lock, looking up VMAs under protection of an rcu read lock, instead of the usual mmap read lock.

Signed-off-by: Michel Lespinasse
Signed-off-by: Suren Baghdasaryan
---
 include/linux/mm_types.h | 13 ++++++++++---
 kernel/fork.c            | 13 +++++++++++++
 2 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 4b6bce73fbb4..d5cdec1314fe 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -535,9 +535,16 @@ struct anon_vma_name {
 struct vm_area_struct {
 	/* The first cache line has the info for VMA tree walking. */
 
-	unsigned long vm_start;		/* Our start address within vm_mm. */
-	unsigned long vm_end;		/* The first byte after our end address
-					   within vm_mm. */
+	union {
+		struct {
+			/* VMA covers [vm_start; vm_end) addresses within mm */
+			unsigned long vm_start;
+			unsigned long vm_end;
+		};
+#ifdef CONFIG_PER_VMA_LOCK
+		struct rcu_head vm_rcu;	/* Used for deferred freeing. */
+#endif
+	};
 
 	struct mm_struct *vm_mm;	/* The address space we belong to. */
 
diff --git a/kernel/fork.c b/kernel/fork.c
index 58aab6c889a4..5986817f393c 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -479,10 +479,23 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
 	return new;
 }
 
+#ifdef CONFIG_PER_VMA_LOCK
+static void __vm_area_free(struct rcu_head *head)
+{
+	struct vm_area_struct *vma = container_of(head, struct vm_area_struct,
+						  vm_rcu);
+	kmem_cache_free(vm_area_cachep, vma);
+}
+#endif
+
 void vm_area_free(struct vm_area_struct *vma)
 {
 	free_anon_vma_name(vma);
+#ifdef CONFIG_PER_VMA_LOCK
+	call_rcu(&vma->vm_rcu, __vm_area_free);
+#else
 	kmem_cache_free(vm_area_cachep, vma);
+#endif
}
 
 static void account_kernel_stack(struct task_struct *tsk, int account)
-- 
2.39.0
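The deferred-freeing pattern used by vm_area_free() above can be modeled without kernel RCU. In this sketch the rcu_head, call_rcu and container_of names mirror the kernel API, but the "grace period" is just an explicit, single-threaded drain of a callback queue; that simplification is an assumption made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal rcu_head: a callback plus queue linkage. */
struct rcu_head {
	struct rcu_head *next;
	void (*func)(struct rcu_head *head);
};

static struct rcu_head *pending;	/* callbacks awaiting a grace period */

/* Queue a callback instead of freeing immediately. */
static void call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *))
{
	head->func = func;
	head->next = pending;
	pending = head;
}

/* Stand-in for the end of a grace period: run all queued callbacks. */
static int rcu_run_pending(void)
{
	int ran = 0;

	while (pending) {
		struct rcu_head *head = pending;

		pending = head->next;
		head->func(head);
		ran++;
	}
	return ran;
}

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* A vm_area_struct-like object embedding the rcu_head. */
struct vma {
	unsigned long vm_start, vm_end;
	struct rcu_head vm_rcu;
};

static int freed_count;

static void __vma_free(struct rcu_head *head)
{
	struct vma *v = container_of(head, struct vma, vm_rcu);

	free(v);
	freed_count++;
}

/* Like vm_area_free() in the patch: defer rather than free in place. */
static void vma_free(struct vma *v)
{
	call_rcu(&v->vm_rcu, __vma_free);
}
```

The embedded rcu_head plus container_of() is the same trick the patch uses to recover the vm_area_struct from the callback argument.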
[PATCH 08/41] mm: introduce CONFIG_PER_VMA_LOCK
This configuration variable will be used to build the support for VMA locking during page fault handling. This is enabled by default on supported architectures with SMP and MMU set. The architecture support is needed since the page fault handler is called from the architecture's page faulting code which needs modifications to handle faults under VMA lock. Signed-off-by: Suren Baghdasaryan --- mm/Kconfig | 13 + 1 file changed, 13 insertions(+) diff --git a/mm/Kconfig b/mm/Kconfig index ff7b209dec05..0aeca3794972 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -1183,6 +1183,19 @@ config LRU_GEN_STATS This option has a per-memcg and per-node memory overhead. # } +config ARCH_SUPPORTS_PER_VMA_LOCK + def_bool n + +config PER_VMA_LOCK + bool "Per-vma locking support" + default y + depends on ARCH_SUPPORTS_PER_VMA_LOCK && MMU && SMP + help + Allow per-vma locking during page fault handling. + + This feature allows locking each virtual memory area separately when + handling page faults instead of taking mmap_lock. + source "mm/damon/Kconfig" endmenu -- 2.39.0
[PATCH 07/41] mm: Enable maple tree RCU mode by default.
From: "Liam R. Howlett"

Use the maple tree in RCU mode for VMA tracking. This is necessary for the use of per-VMA locking. RCU mode is enabled by default but disabled when exiting an mm and for the new tree during a fork.

Also enable RCU for the tree used in munmap operations to ensure the nodes remain valid for readers.

Signed-off-by: Liam R. Howlett
Signed-off-by: Suren Baghdasaryan
---
 include/linux/mm_types.h | 3 ++-
 kernel/fork.c            | 3 +++
 mm/mmap.c                | 4 +++-
 3 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 3b8475007734..4b6bce73fbb4 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -810,7 +810,8 @@ struct mm_struct {
 	unsigned long cpu_bitmap[];
 };
 
-#define MM_MT_FLAGS	(MT_FLAGS_ALLOC_RANGE | MT_FLAGS_LOCK_EXTERN)
+#define MM_MT_FLAGS	(MT_FLAGS_ALLOC_RANGE | MT_FLAGS_LOCK_EXTERN | \
+			 MT_FLAGS_USE_RCU)
 extern struct mm_struct init_mm;
 
 /* Pointer magic because the dynamic array size confuses some compilers. */
diff --git a/kernel/fork.c b/kernel/fork.c
index 9f7fe3541897..58aab6c889a4 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -617,6 +617,7 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 	if (retval)
 		goto out;
 
+	mt_clear_in_rcu(mas.tree);
 	mas_for_each(&old_mas, mpnt, ULONG_MAX) {
 		struct file *file;
 
@@ -703,6 +704,8 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 	retval = arch_dup_mmap(oldmm, mm);
 loop_out:
 	mas_destroy(&mas);
+	if (!retval)
+		mt_set_in_rcu(mas.tree);
 out:
 	mmap_write_unlock(mm);
 	flush_tlb_mm(oldmm);
diff --git a/mm/mmap.c b/mm/mmap.c
index 87d929316d57..9db37adfc00a 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2304,7 +2304,8 @@ do_mas_align_munmap(struct ma_state *mas, struct vm_area_struct *vma,
 	int count = 0;
 	int error = -ENOMEM;
 	MA_STATE(mas_detach, &mt_detach, 0, 0);
-	mt_init_flags(&mt_detach, MT_FLAGS_LOCK_EXTERN);
+	mt_init_flags(&mt_detach, mas->tree->ma_flags &
+		      (MT_FLAGS_LOCK_MASK | MT_FLAGS_USE_RCU));
 	mt_set_external_lock(&mt_detach, &mm->mmap_lock);
 
 	if (mas_preallocate(mas, vma, GFP_KERNEL))
@@ -3091,6 +3092,7 @@ void exit_mmap(struct mm_struct *mm)
 	 */
 	set_bit(MMF_OOM_SKIP, &mm->flags);
 	mmap_write_lock(mm);
+	mt_clear_in_rcu(&mm->mm_mt);
 	free_pgtables(&tlb, &mm->mm_mt, vma, FIRST_USER_ADDRESS,
 		      USER_PGTABLES_CEILING);
 	tlb_finish_mmu(&tlb);
-- 
2.39.0
[PATCH 06/41] maple_tree: Add smp_rmb() to dead node detection
From: "Liam R. Howlett"

Add an smp_rmb() before reading the parent pointer to ensure that anything read from the node prior to the parent pointer hasn't been reordered ahead of this check. This is necessary for RCU mode.

Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett
Signed-off-by: Suren Baghdasaryan
---
 lib/maple_tree.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 8066fb1e8ec9..80ca28b656d3 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -535,9 +535,11 @@ static inline struct maple_node *mte_parent(const struct maple_enode *enode)
  */
 static inline bool ma_dead_node(const struct maple_node *node)
 {
-	struct maple_node *parent = (void *)((unsigned long)
-					     node->parent & ~MAPLE_NODE_MASK);
+	struct maple_node *parent;
 
+	/* Do not reorder reads from the node prior to the parent check */
+	smp_rmb();
+	parent = (void *)((unsigned long) node->parent & ~MAPLE_NODE_MASK);
 	return (parent == node);
 }
 
@@ -552,6 +554,8 @@ static inline bool mte_dead_node(const struct maple_enode *enode)
 	struct maple_node *parent, *node;
 
 	node = mte_to_node(enode);
+	/* Do not reorder reads from the node prior to the parent check */
+	smp_rmb();
 	parent = mte_parent(enode);
 	return (parent == node);
 }
-- 
2.39.0
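The convention this check relies on (a dead maple node points its parent pointer at itself) and the barrier pairing can be illustrated with C11 atomics. This is a toy model, not the kernel code: the release store in mark_dead() stands in for the writer-side smp_wmb(), and the acquire fence in node_is_dead() stands in for the smp_rmb() this patch adds.

```c
#include <assert.h>
#include <stdatomic.h>

/* A node is marked dead by pointing its parent at itself. */
struct node {
	_Atomic(struct node *) parent;
	int data;
};

static void mark_dead(struct node *n)
{
	/*
	 * Release ordering: any writer-side updates to the node become
	 * visible before the "dead" marking does (the kernel's smp_wmb()).
	 */
	atomic_store_explicit(&n->parent, n, memory_order_release);
}

static int node_is_dead(struct node *n)
{
	/*
	 * Acquire fence (the patch's smp_rmb()): reads issued from the
	 * node before this point cannot be reordered past the check.
	 */
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&n->parent, memory_order_relaxed) == n;
}
```

Single-threaded code cannot demonstrate the reordering itself, but the self-parent convention is directly testable.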
[PATCH 05/41] maple_tree: Fix write memory barrier of nodes once dead for RCU mode
From: "Liam R. Howlett"

During the development of the maple tree, the strategy of freeing multiple nodes changed and, in the process, the pivots were reused to store pointers to dead nodes. To ensure the readers see accurate pivots, the writers need to mark the nodes as dead and call smp_wmb() to ensure any readers can identify the node as dead before using the pivot values.

There were two places where the old method of marking the node as dead without smp_wmb() was being used, which resulted in RCU readers seeing the wrong pivot value before seeing the node was dead. Fix this race condition by using mte_set_node_dead(), which has the smp_wmb() call to ensure the race is closed.

Add a WARN_ON() to the ma_free_rcu() call to ensure all nodes being freed are marked as dead, to ensure there are no other call paths besides the two updated paths.

This is necessary for the RCU mode of the maple tree.

Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett
Signed-off-by: Suren Baghdasaryan
---
 lib/maple_tree.c                 |  7 +++++--
 tools/testing/radix-tree/maple.c | 16 ++++++++++++++++
 2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index d85291b19f86..8066fb1e8ec9 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -179,7 +179,7 @@ static void mt_free_rcu(struct rcu_head *head)
  */
 static void ma_free_rcu(struct maple_node *node)
 {
-	node->parent = ma_parent_ptr(node);
+	WARN_ON(node->parent != ma_parent_ptr(node));
 	call_rcu(&node->rcu, mt_free_rcu);
 }
 
@@ -1775,8 +1775,10 @@ static inline void mas_replace(struct ma_state *mas, bool advanced)
 		rcu_assign_pointer(slots[offset], mas->node);
 	}
 
-	if (!advanced)
+	if (!advanced) {
+		mte_set_node_dead(old_enode);
 		mas_free(mas, old_enode);
+	}
 }
 
 /*
@@ -4217,6 +4219,7 @@ static inline bool mas_wr_node_store(struct ma_wr_state *wr_mas)
 done:
 	mas_leaf_set_meta(mas, newnode, dst_pivots, maple_leaf_64, new_end);
 	if (in_rcu) {
+		mte_set_node_dead(mas->node);
 		mas->node = mt_mk_node(newnode, wr_mas->type);
 		mas_replace(mas, false);
 	} else {
diff --git a/tools/testing/radix-tree/maple.c b/tools/testing/radix-tree/maple.c
index 81fa7ec2e66a..2539ad6c4777 100644
--- a/tools/testing/radix-tree/maple.c
+++ b/tools/testing/radix-tree/maple.c
@@ -108,6 +108,7 @@ static noinline void check_new_node(struct maple_tree *mt)
 	MT_BUG_ON(mt, mn->slot[1] != NULL);
 	MT_BUG_ON(mt, mas_allocated(&mas) != 0);
 
+	mn->parent = ma_parent_ptr(mn);
 	ma_free_rcu(mn);
 	mas.node = MAS_START;
 	mas_nomem(&mas, GFP_KERNEL);
@@ -160,6 +161,7 @@ static noinline void check_new_node(struct maple_tree *mt)
 		MT_BUG_ON(mt, mas_allocated(&mas) != i);
 		MT_BUG_ON(mt, !mn);
 		MT_BUG_ON(mt, not_empty(mn));
+		mn->parent = ma_parent_ptr(mn);
 		ma_free_rcu(mn);
 	}
 
@@ -192,6 +194,7 @@ static noinline void check_new_node(struct maple_tree *mt)
 		MT_BUG_ON(mt, not_empty(mn));
 		MT_BUG_ON(mt, mas_allocated(&mas) != i - 1);
 		MT_BUG_ON(mt, !mn);
+		mn->parent = ma_parent_ptr(mn);
 		ma_free_rcu(mn);
 	}
 
@@ -210,6 +213,7 @@ static noinline void check_new_node(struct maple_tree *mt)
 			mn = mas_pop_node(&mas);
 			MT_BUG_ON(mt, not_empty(mn));
 			MT_BUG_ON(mt, mas_allocated(&mas) != j - 1);
+			mn->parent = ma_parent_ptr(mn);
 			ma_free_rcu(mn);
 		}
 		MT_BUG_ON(mt, mas_allocated(&mas) != 0);
@@ -233,6 +237,7 @@ static noinline void check_new_node(struct maple_tree *mt)
 			MT_BUG_ON(mt, mas_allocated(&mas) != i - j);
 			mn = mas_pop_node(&mas);
 			MT_BUG_ON(mt, not_empty(mn));
+			mn->parent = ma_parent_ptr(mn);
 			ma_free_rcu(mn);
 			MT_BUG_ON(mt, mas_allocated(&mas) != i - j - 1);
 		}
@@ -269,6 +274,7 @@ static noinline void check_new_node(struct maple_tree *mt)
 		mn = mas_pop_node(&mas); /* get the next node. */
 		MT_BUG_ON(mt, mn == NULL);
 		MT_BUG_ON(mt, not_empty(mn));
+		mn->parent = ma_parent_ptr(mn);
 		ma_free_rcu(mn);
 	}
 	MT_BUG_ON(mt, mas_allocated(&mas) != 0);
@@ -294,6 +300,7 @@ static noinline void check_new_node(struct maple_tree *mt)
 		mn = mas_pop_node(&mas); /* get the next node. */
 		MT_BUG_ON(mt, mn == NULL);
 		MT_BUG_ON(mt, not_empty(mn));
+		mn->parent = ma_parent_ptr(mn);
[PATCH 04/41] maple_tree: remove extra smp_wmb() from mas_dead_leaves()
From: Liam Howlett The call to mte_set_dead_node() before the smp_wmb() already calls smp_wmb() so this is not needed. This is an optimization for the RCU mode of the maple tree. Fixes: 54a611b60590 ("Maple Tree: add new data structure") Signed-off-by: Liam Howlett Signed-off-by: Suren Baghdasaryan --- lib/maple_tree.c | 1 - 1 file changed, 1 deletion(-) diff --git a/lib/maple_tree.c b/lib/maple_tree.c index a11eea943f8d..d85291b19f86 100644 --- a/lib/maple_tree.c +++ b/lib/maple_tree.c @@ -5510,7 +5510,6 @@ unsigned char mas_dead_leaves(struct ma_state *mas, void __rcu **slots, break; mte_set_node_dead(entry); - smp_wmb(); /* Needed for RCU */ node->type = type; rcu_assign_pointer(slots[offset], node); } -- 2.39.0
[PATCH 03/41] maple_tree: Fix freeing of nodes in rcu mode
From: Liam Howlett

The walk to destroy the nodes was not always setting the node type and would result in a destroy method potentially using the values as nodes. Avoid this by setting the correct node types. This is necessary for the RCU mode of the maple tree.

Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam Howlett
Signed-off-by: Suren Baghdasaryan
---
 lib/maple_tree.c | 73 ++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 62 insertions(+), 11 deletions(-)

diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index a748938ad2e9..a11eea943f8d 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -897,6 +897,44 @@ static inline void ma_set_meta(struct maple_node *mn, enum maple_type mt,
 	meta->end = end;
 }
 
+/*
+ * mas_clear_meta() - clear the metadata information of a node, if it exists
+ * @mas: The maple state
+ * @mn: The maple node
+ * @mt: The maple node type
+ * @offset: The offset of the highest sub-gap in this node.
+ * @end: The end of the data in this node.
+ */
+static inline void mas_clear_meta(struct ma_state *mas, struct maple_node *mn,
+				  enum maple_type mt)
+{
+	struct maple_metadata *meta;
+	unsigned long *pivots;
+	void __rcu **slots;
+	void *next;
+
+	switch (mt) {
+	case maple_range_64:
+		pivots = mn->mr64.pivot;
+		if (unlikely(pivots[MAPLE_RANGE64_SLOTS - 2])) {
+			slots = mn->mr64.slot;
+			next = mas_slot_locked(mas, slots,
+					       MAPLE_RANGE64_SLOTS - 1);
+			if (unlikely((mte_to_node(next) && mte_node_type(next))))
+				return; /* The last slot is a node, no metadata */
+		}
+		fallthrough;
+	case maple_arange_64:
+		meta = ma_meta(mn, mt);
+		break;
+	default:
+		return;
+	}
+
+	meta->gap = 0;
+	meta->end = 0;
+}
+
 /*
  * ma_meta_end() - Get the data end of a node from the metadata
  * @mn: The maple node
@@ -5448,20 +5486,22 @@ static inline int mas_rev_alloc(struct ma_state *mas, unsigned long min,
 /*
  * mas_dead_leaves() - Mark all leaves of a node as dead.
  * @mas: The maple state
  * @slots: Pointer to the slot array
+ * @type: The maple node type
  *
  * Must hold the write lock.
  *
 * Return: The number of leaves marked as dead.
 */
 static inline
-unsigned char mas_dead_leaves(struct ma_state *mas, void __rcu **slots)
+unsigned char mas_dead_leaves(struct ma_state *mas, void __rcu **slots,
+			      enum maple_type mt)
 {
 	struct maple_node *node;
 	enum maple_type type;
 	void *entry;
 	int offset;
 
-	for (offset = 0; offset < mt_slot_count(mas->node); offset++) {
+	for (offset = 0; offset < mt_slots[mt]; offset++) {
 		entry = mas_slot_locked(mas, slots, offset);
 		type = mte_node_type(entry);
 		node = mte_to_node(entry);
@@ -5480,14 +5520,13 @@ unsigned char mas_dead_leaves(struct ma_state *mas, void __rcu **slots)
 
 static void __rcu **mas_dead_walk(struct ma_state *mas, unsigned char offset)
 {
-	struct maple_node *node, *next;
+	struct maple_node *next;
 	void __rcu **slots = NULL;
 
 	next = mas_mn(mas);
 	do {
-		mas->node = ma_enode_ptr(next);
-		node = mas_mn(mas);
-		slots = ma_slots(node, node->type);
+		mas->node = mt_mk_node(next, next->type);
+		slots = ma_slots(next, next->type);
 		next = mas_slot_locked(mas, slots, offset);
 		offset = 0;
 	} while (!ma_is_leaf(next->type));
@@ -5551,11 +5590,14 @@ static inline void __rcu **mas_destroy_descend(struct ma_state *mas,
 		node = mas_mn(mas);
 		slots = ma_slots(node, mte_node_type(mas->node));
 		next = mas_slot_locked(mas, slots, 0);
-		if ((mte_dead_node(next)))
+		if ((mte_dead_node(next))) {
+			mte_to_node(next)->type = mte_node_type(next);
 			next = mas_slot_locked(mas, slots, 1);
+		}
 
 		mte_set_node_dead(mas->node);
 		node->type = mte_node_type(mas->node);
+		mas_clear_meta(mas, node, node->type);
 		node->piv_parent = prev;
 		node->parent_slot = offset;
 		offset = 0;
@@ -5575,13 +5617,18 @@ static void mt_destroy_walk(struct maple_enode *enode, unsigned char ma_flags,
 
 	MA_STATE(mas, &mt, 0, 0);
 
-	if (mte_is_leaf(enode))
+	mas.node = enode;
+	if (mte_is_leaf(enode)) {
+		node->type = mte_node_type(enode);
 		goto free_leaf;
+	}
 
 	ma_flags &= ~MT_FLAGS_LOCK_MASK;
 	mt_init_flags(&mt, ma_flags);
 	mas_lock(&mas);
-
[PATCH 02/41] maple_tree: Detect dead nodes in mas_start()
From: Liam Howlett When initially starting a search, the root node may already be in the process of being replaced in RCU mode. Detect and restart the walk if this is the case. This is necessary for RCU mode of the maple tree. Fixes: 54a611b60590 ("Maple Tree: add new data structure") Signed-off-by: Liam Howlett Signed-off-by: Suren Baghdasaryan --- lib/maple_tree.c | 4 1 file changed, 4 insertions(+) diff --git a/lib/maple_tree.c b/lib/maple_tree.c index ff9f04e0150d..a748938ad2e9 100644 --- a/lib/maple_tree.c +++ b/lib/maple_tree.c @@ -1359,11 +1359,15 @@ static inline struct maple_enode *mas_start(struct ma_state *mas) mas->depth = 0; mas->offset = 0; +retry: root = mas_root(mas); /* Tree with nodes */ if (likely(xa_is_node(root))) { mas->depth = 1; mas->node = mte_safe_root(root); + if (mte_dead_node(mas->node)) + goto retry; + return NULL; } -- 2.39.0
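The retry logic added to mas_start() can be sketched as a small loop. read_root below is a hypothetical stand-in for mas_root(); in the kernel, RCU guarantees that a writer which killed the old root has already published its replacement, so the retry terminates.

```c
#include <assert.h>
#include <stddef.h>

/* A dead node points its parent at itself, as in the maple tree. */
struct tnode { struct tnode *parent; };

static int node_dead(const struct tnode *n) { return n->parent == n; }

/* mas_start()-style entry: re-read the root if the one we saw is dead. */
static struct tnode *start_walk(struct tnode *(*read_root)(void))
{
	struct tnode *n;

	do {
		n = read_root();        /* stands in for mas_root(mas) */
	} while (node_dead(n));         /* a writer replaced it: retry */
	return n;
}

/* Demo reader: the first read observes a dead root, the second read
 * observes the replacement, mimicking a racing root swap. */
static struct tnode dead_root, live_root = { NULL };
static int reads;

static struct tnode *demo_read_root(void)
{
	dead_root.parent = &dead_root;  /* marked dead */
	return reads++ == 0 ? &dead_root : &live_root;
}
```

The demo helper is purely illustrative; the kernel reader sees the stale root via a plain RCU-protected pointer load, not a function pointer.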
[PATCH 00/41] Per-VMA locks
This is v1 of the per-VMA locks patchset, originally posted as an RFC at [1] and described in an LWN article at [2]. The per-VMA locks idea was discussed during the SPF [3] discussion at LSF/MM this year [4], which concluded with the suggestion that "a reader/writer semaphore could be put into the VMA itself; that would have the effect of using the VMA as a sort of range lock. There would still be contention at the VMA level, but it would be an improvement." This patchset implements the suggested approach. When handling page faults we look up the VMA that contains the faulting page under RCU protection and try to acquire its lock. If that fails we fall back to using mmap_lock, similar to how SPF handled this situation. One notable way the implementation deviates from the proposal is the way VMAs are read-locked. During some mm updates, multiple VMAs need to be locked until the end of the update (e.g. vma_merge, split_vma, etc). Tracking all the locked VMAs, avoiding recursive locks and figuring out when it's safe to unlock previously locked VMAs would make the code more complex. So, instead of the usual lock/unlock pattern, the proposed solution marks a VMA as locked and provides an efficient way to: 1. Identify locked VMAs. 2. Unlock all locked VMAs in bulk. We also postpone unlocking the locked VMAs until the end of the update, when we do mmap_write_unlock. Potentially this keeps a VMA locked for longer than is absolutely necessary, but it results in a big reduction of code complexity. Read-locking a VMA is done using two sequence numbers - one in the vm_area_struct and one in the mm_struct. A VMA is considered read-locked when these sequence numbers are equal. To read-lock a VMA we set the sequence number in vm_area_struct to be equal to the sequence number in mm_struct. To unlock all VMAs we increment mm_struct's seq number. This allows for an efficient way to track locked VMAs and to drop the locks on all VMAs at the end of the update.
The patchset implements per-VMA locking only for anonymous pages which are not in swap and avoids userfaultfd as their implementation is more complex. Additional support for file-backed page faults, swapped and user pages can be added incrementally. Performance benchmarks show similar, although slightly smaller, benefits as with the SPF patchset (~75% of SPF benefits). Still, with lower complexity this approach might be more desirable. Since the RFC was posted in September 2022, two separate Google teams outside of Android evaluated the patchset and confirmed positive results. Here are the known use cases when per-VMA locks show benefits: Android: Apps with a high number of threads (~100) launch times improve by up to 20%. Each thread mmaps several areas upon startup (Stack and Thread-local storage (TLS), thread signal stack, indirect ref table), which requires taking mmap_lock in write mode. Page faults take mmap_lock in read mode. During app launch, both thread creation and page faults establishing the active working set are happening in parallel and that causes lock contention between mm writers and readers even if updates and page faults are happening in different VMAs. Per-VMA locks prevent this contention by providing a more granular lock. Google Fibers: We have several dynamically sized thread pools that spawn new threads under increased load and reduce their number when idling. For example, Google's in-process scheduling/threading framework, UMCG/Fibers, is backed by such a thread pool. When idling, only a small number of idle worker threads are available; when a spike of incoming requests arrive, each request is handled in its own "fiber", which is a work item posted onto a UMCG worker thread; quite often these spikes lead to a number of new threads spawning.
Each new thread needs to allocate and register an RSEQ section on its TLS, then register itself with the kernel as a UMCG worker thread, and only after that can it be considered by the in-process UMCG/Fiber scheduler as available to do useful work. In short, during an incoming workload spike new threads have to be spawned, and they perform several syscalls (RSEQ registration, UMCG worker registration, memory allocations) before they can actually start doing useful work. Removing any bottlenecks on this thread startup path will greatly improve our services' latencies when faced with request/workload spikes. At high scale, mmap_lock contention during thread creation and stack page faults leads to user-visible multi-second serving latencies in a similar pattern to Android app startup. The per-VMA locking patchset has been run successfully in limited experiments with user-facing production workloads. In these experiments, we observed that the peak thread creation rate was high enough that thread creation is no longer a bottleneck. TCP zerocopy receive: From the point of view of TCP zerocopy receive, the per-VMA lock patch is massively beneficial. In today's implementation, a process with N threads where N - 1 are performing zerocopy
[PATCH 01/41] maple_tree: Be more cautious about dead nodes
From: Liam Howlett ma_pivots() and ma_data_end() may be called with a dead node. Ensure that the node isn't dead before using the returned values. This is necessary for RCU mode of the maple tree. Fixes: 54a611b60590 ("Maple Tree: add new data structure") Signed-off-by: Liam Howlett Signed-off-by: Suren Baghdasaryan --- lib/maple_tree.c | 53 +++- 1 file changed, 43 insertions(+), 10 deletions(-) diff --git a/lib/maple_tree.c b/lib/maple_tree.c index 26e2045d3cda..ff9f04e0150d 100644 --- a/lib/maple_tree.c +++ b/lib/maple_tree.c @@ -540,6 +540,7 @@ static inline bool ma_dead_node(const struct maple_node *node) return (parent == node); } + /* * mte_dead_node() - check if the @enode is dead. * @enode: The encoded maple node @@ -621,6 +622,8 @@ static inline unsigned int mas_alloc_req(const struct ma_state *mas) * @node - the maple node * @type - the node type * + * In the event of a dead node, this array may be %NULL + * * Return: A pointer to the maple node pivots */ static inline unsigned long *ma_pivots(struct maple_node *node, @@ -1091,8 +1094,11 @@ static int mas_ascend(struct ma_state *mas) a_type = mas_parent_enum(mas, p_enode); a_node = mte_parent(p_enode); a_slot = mte_parent_slot(p_enode); - pivots = ma_pivots(a_node, a_type); a_enode = mt_mk_node(a_node, a_type); + pivots = ma_pivots(a_node, a_type); + + if (unlikely(ma_dead_node(a_node))) + return 1; if (!set_min && a_slot) { set_min = true; @@ -1398,6 +1404,9 @@ static inline unsigned char ma_data_end(struct maple_node *node, { unsigned char offset; + if (!pivots) + return 0; + if (type == maple_arange_64) return ma_meta_end(node, type); @@ -1433,6 +1442,9 @@ static inline unsigned char mas_data_end(struct ma_state *mas) return ma_meta_end(node, type); pivots = ma_pivots(node, type); + if (unlikely(ma_dead_node(node))) + return 0; + offset = mt_pivots[type] - 1; if (likely(!pivots[offset])) return ma_meta_end(node, type); @@ -4504,6 +4516,9 @@ static inline int mas_prev_node(struct ma_state *mas,
unsigned long min) node = mas_mn(mas); slots = ma_slots(node, mt); pivots = ma_pivots(node, mt); + if (unlikely(ma_dead_node(node))) + return 1; + mas->max = pivots[offset]; if (offset) mas->min = pivots[offset - 1] + 1; @@ -4525,6 +4540,9 @@ static inline int mas_prev_node(struct ma_state *mas, unsigned long min) slots = ma_slots(node, mt); pivots = ma_pivots(node, mt); offset = ma_data_end(node, mt, pivots, mas->max); + if (unlikely(ma_dead_node(node))) + return 1; + if (offset) mas->min = pivots[offset - 1] + 1; @@ -4573,6 +4591,7 @@ static inline int mas_next_node(struct ma_state *mas, struct maple_node *node, struct maple_enode *enode; int level = 0; unsigned char offset; + unsigned char node_end; enum maple_type mt; void __rcu **slots; @@ -4596,7 +4615,11 @@ static inline int mas_next_node(struct ma_state *mas, struct maple_node *node, node = mas_mn(mas); mt = mte_node_type(mas->node); pivots = ma_pivots(node, mt); - } while (unlikely(offset == ma_data_end(node, mt, pivots, mas->max))); + node_end = ma_data_end(node, mt, pivots, mas->max); + if (unlikely(ma_dead_node(node))) + return 1; + + } while (unlikely(offset == node_end)); slots = ma_slots(node, mt); pivot = mas_safe_pivot(mas, pivots, ++offset, mt); @@ -4612,6 +4635,9 @@ static inline int mas_next_node(struct ma_state *mas, struct maple_node *node, mt = mte_node_type(mas->node); slots = ma_slots(node, mt); pivots = ma_pivots(node, mt); + if (unlikely(ma_dead_node(node))) + return 1; + offset = 0; pivot = pivots[0]; } @@ -4658,16 +4684,18 @@ static inline void *mas_next_nentry(struct ma_state *mas, return NULL; } - pivots = ma_pivots(node, type); slots = ma_slots(node, type); - mas->index = mas_safe_min(mas, pivots, mas->offset); - if (ma_dead_node(node)) + pivots = ma_pivots(node, type); + count = ma_data_end(node, type, pivots, mas->max); + if (unlikely(ma_dead_node(node))) return NULL; + mas->index = mas_safe_min(mas, pivots, mas->offset); + if (unlikely(ma_dead_node(node))) + return NULL;
Re: [PATCH 01/15] video: fbdev: atmel_lcdfb: Rework backlight handling
On 1/9/23 21:18, Stephen Kitt wrote: On Sun, 8 Jan 2023 18:26:12 +0100, Helge Deller wrote: On 1/7/23 21:53, Sam Ravnborg wrote: Hi Stephen. On Sat, Jan 07, 2023 at 09:36:47PM +0100, Stephen Kitt wrote: On 7 January 2023 19:26:15 CET, Sam Ravnborg via B4 Submission Endpoint wrote: From: Sam Ravnborg The atmel_lcdfb had code to save/restore power state. This is not needed so drop it. Introduce backlight_is_brightness() to make logic simpler. Signed-off-by: Sam Ravnborg Cc: Nicolas Ferre Cc: Alexandre Belloni Cc: Ludovic Desroches Cc: linux-fb...@vger.kernel.org Cc: linux-arm-ker...@lists.infradead.org --- drivers/video/fbdev/atmel_lcdfb.c | 24 +++- 1 file changed, 3 insertions(+), 21 deletions(-) ... Hi Sam, I’d submitted quite a few more of these previously (and you’d reviewed them), see e.g. the thread starting at https://lkml.org/lkml/2022/6/7/4365, and yesterday, https://lkml.org/lkml/2023/1/6/520, https://lkml.org/lkml/2023/1/6/656, https://lkml.org/lkml/2023/1/6/970, https://lkml.org/lkml/2023/1/6/643, and https://lkml.org/lkml/2023/1/6/680. There are a few more, I can find them if it’s any use. The patches from yesterday was what triggered me to resurrect an old branch of mine where I had done something similar. I had lost all memory of reviewing similar patches from you. Helge - could you pick the reviewed patches from: https://lore.kernel.org/all/20220607192335.1137249-1-st...@sk2.org/ [This is the same mail as Stephen refer to above - looked up via lore]. I just pulled those 7 patches into fbdev/for-next. If you need more, please let me know, Please pull https://lore.kernel.org/lkml/20230109200239.1850611-1-st...@sk2.org/ too, it completes the fbdev set. (It’s a re-send of https://lore.kernel.org/lkml/20220609180440.3138625-1-st...@sk2.org/). Done. Thanks! Helge
Re: [PATCH 01/15] video: fbdev: atmel_lcdfb: Rework backlight handling
On Sun, 8 Jan 2023 18:26:12 +0100, Helge Deller wrote: > On 1/7/23 21:53, Sam Ravnborg wrote: > > Hi Stephen. > > > > On Sat, Jan 07, 2023 at 09:36:47PM +0100, Stephen Kitt wrote: > >> On 7 January 2023 19:26:15 CET, Sam Ravnborg via B4 Submission Endpoint > >> wrote: > >>> From: Sam Ravnborg > >>> > >>> The atmel_lcdfb had code to save/restore power state. > >>> This is not needed so drop it. > >>> > >>> Introduce backlight_is_brightness() to make logic simpler. > >>> > >>> Signed-off-by: Sam Ravnborg > >>> Cc: Nicolas Ferre > >>> Cc: Alexandre Belloni > >>> Cc: Ludovic Desroches > >>> Cc: linux-fb...@vger.kernel.org > >>> Cc: linux-arm-ker...@lists.infradead.org > >>> --- > >>> drivers/video/fbdev/atmel_lcdfb.c | 24 +++- > >>> 1 file changed, 3 insertions(+), 21 deletions(-) > > ... > >> > >> Hi Sam, > >> > >> I’d submitted quite a few more of these previously (and you’d reviewed > >> them), see e.g. the thread starting at > >> https://lkml.org/lkml/2022/6/7/4365, and yesterday, > >> https://lkml.org/lkml/2023/1/6/520, https://lkml.org/lkml/2023/1/6/656, > >> https://lkml.org/lkml/2023/1/6/970, https://lkml.org/lkml/2023/1/6/643, > >> and https://lkml.org/lkml/2023/1/6/680. There are a few more, I can find > >> them if it’s any use. > > > > The patches from yesterday was what triggered me to resurrect an old > > branch of mine where I had done something similar. I had lost all > > memory of reviewing similar patches from you. > > > > > > Helge - could you pick the reviewed patches from: > > https://lore.kernel.org/all/20220607192335.1137249-1-st...@sk2.org/ > > [This is the same mail as Stephen refer to above - looked up via lore]. > > I just pulled those 7 patches into fbdev/for-next. > If you need more, please let me know, Please pull https://lore.kernel.org/lkml/20230109200239.1850611-1-st...@sk2.org/ too, it completes the fbdev set. (It’s a re-send of https://lore.kernel.org/lkml/20220609180440.3138625-1-st...@sk2.org/). 
Thanks, Stephen pgpZRyHS3C8SI.pgp Description: OpenPGP digital signature
Re: [PATCH 01/15] video: fbdev: atmel_lcdfb: Rework backlight handling
On Mon, 9 Jan 2023 20:53:44 +0100, Stephen Kitt wrote: > On Sun, 8 Jan 2023 21:24:20 +0100, Sam Ravnborg wrote: > > > Here are my pending patches from last June on lore: > > > > > > > All patches are handled I think except this: > > > * https://lore.kernel.org/lkml/20220608205623.2106113-1-st...@sk2.org/ > > > > > > > Can I ask you to drop the assignment that is not needed, and resend with > > the collected acks/r-b. > > > > With this, then all fbdev patches are handled. > > Ah yes, it was revised as > https://lore.kernel.org/lkml/20220609180440.3138625-1-st...@sk2.org/ which > only got one ack (from you, > https://lore.kernel.org/lkml/yqjckqmqeuvsb...@ravnborg.org/). Should I > resend that or is that usable as-is? Or would it be better if I sent all the > fbdev patches again (collecting all the acks and reviews)? Since the others are already in fbdev/for-next, I’ve just resent v2 of this patch with your ack. Regards, Stephen pgpqRo2Vm6zc7.pgp Description: OpenPGP digital signature
Re: [PATCH v3 3/8] sched, smp: Trace IPIs sent via send_call_function_single_ipi()
On 07/01/23 12:04, Ingo Molnar wrote: > * Valentin Schneider wrote: > >> send_call_function_single_ipi() is the thing that sends IPIs at the bottom >> of smp_call_function*() via either generic_exec_single() or >> smp_call_function_many_cond(). Give it an IPI-related tracepoint. >> >> Note that this ends up tracing any IPI sent via __smp_call_single_queue(), >> which covers __ttwu_queue_wakelist() and irq_work_queue_on() "for free". >> >> Signed-off-by: Valentin Schneider >> Reviewed-by: Steven Rostedt (Google) > > Acked-by: Ingo Molnar > > Patch series logistics: > > - No objections from the scheduler side, this feature looks pretty useful. > Thanks! > - Certain patches are incomplete, others are noted as being merged >separately, so I presume you'll send an updated/completed series >eventually? > The first patch from Steve is now in, so can drop it. The other patches are complete, though I need to rebase them and regenerate the treewide patch to catch any changes that came with 6.2. I'll do that this week. The "incompleteness" pointed out in the cover letter is about the types of IPIs that can be traced. This series covers the ones that end up invoking some core code (coincidentally those are the most common ones), others such as e.g. tick_broadcast() for arm, arm64, riscv and hexagon remain unaffected. I'm not that much interested in these (other than maybe the tick broadcast one they are all fairly unfrequent), but I'm happy to have a shot at them for the sake of completeness - either in that series or in a followup, up to you. > - We can merge this via the scheduler tree I suspect, as most callbacks >affected relate to tip:sched/core and tmp:smp/core - but if you have >some other preferred tree that's fine too. > Either sound good to me. > Thanks, > > Ingo
Re: [PATCH v3 6/8] treewide: Trace IPIs sent via smp_send_reschedule()
On 08/01/23 20:17, Huacai Chen wrote: > Hi, Valentin, > > On Fri, Dec 2, 2022 at 11:59 PM Valentin Schneider > wrote: >> @@ -83,7 +83,7 @@ extern void show_ipi_list(struct seq_file *p, int prec); >> * it goes straight through and wastes no time serializing >> * anything. Worst case is that we lose a reschedule ... >> */ >> -static inline void smp_send_reschedule(int cpu) >> +static inline void arch_smp_send_reschedule(int cpu) >> { >> loongson_send_ipi_single(cpu, SMP_RESCHEDULE); >> } > This function has been moved to arch/loongarch/kernel/smp.c since 6.2. > Thanks! I'll make sure to rerun the coccinelle script for the next version.
Re: [PATCH 02/15] video: fbdev: atyfb: Introduce backlight_get_brightness()
Hi Christophe, On Mon, Jan 09, 2023 at 05:44:46PM +, Christophe Leroy wrote: > > > Le 07/01/2023 à 19:26, Sam Ravnborg via B4 Submission Endpoint a écrit : > > From: Sam Ravnborg > > > > Introduce backlight_get_brightness() to simplify logic > > and avoid direct access to backlight properties. > > When I read 'introduce' I understand that you are adding a new function. > > In fact backlight_get_brightness() already exists, so maybe replace > 'introduce' by 'use' Thanks for your feedback. A similar patch is already applied to the fbdev tree, so this patch can be ignored. Sam
Re: [PATCH 02/15] video: fbdev: atyfb: Introduce backlight_get_brightness()
On 07/01/2023 at 19:26, Sam Ravnborg via B4 Submission Endpoint wrote:
> From: Sam Ravnborg
>
> Introduce backlight_get_brightness() to simplify logic
> and avoid direct access to backlight properties.

When I read 'introduce' I understand that you are adding a new function.

In fact backlight_get_brightness() already exists, so maybe replace 'introduce' by 'use'

>
> Signed-off-by: Sam Ravnborg
> Cc: Bartlomiej Zolnierkiewicz
> Cc: Sam Ravnborg
> Cc: Daniel Vetter
> Cc: Souptick Joarder
> Cc: Maarten Lankhorst
> Cc: Jason Yan
> Cc: Jani Nikula
> Cc: Arnd Bergmann
> ---
> drivers/video/fbdev/aty/atyfb_base.c | 8 +---
> 1 file changed, 1 insertion(+), 7 deletions(-)
>
> diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
> index 0ccf5d401ecb..ca361e215904 100644
> --- a/drivers/video/fbdev/aty/atyfb_base.c
> +++ b/drivers/video/fbdev/aty/atyfb_base.c
> @@ -2219,13 +2219,7 @@ static int aty_bl_update_status(struct backlight_device *bd)
> {
> 	struct atyfb_par *par = bl_get_data(bd);
> 	unsigned int reg = aty_ld_lcd(LCD_MISC_CNTL, par);
> -	int level;
> -
> -	if (bd->props.power != FB_BLANK_UNBLANK ||
> -	    bd->props.fb_blank != FB_BLANK_UNBLANK)
> -		level = 0;
> -	else
> -		level = bd->props.brightness;
> +	int level = backlight_get_brightness(bd);
>
> 	reg |= (BLMOD_EN | BIASMOD_EN);
> 	if (level > 0) {
>
> --
> 2.34.1
Re: [PATCH v2 1/2] powerpc/ps3: Change updateboltedpp panic to info
On 03/01/2023 at 18:51, Geoff Levand wrote:
> Commit fdacae8a84024474afff234bdd1dbe19ad597a10 (powerpc: Activate
> CONFIG_STRICT_KERNEL_RWX by default) causes ps3_hpte_updateboltedpp()
> to be called. Change the panic statement in ps3_hpte_updateboltedpp()
> to a pr_info statement so that bootup can continue.

But if I understand correctly, it means that CONFIG_STRICT_KERNEL_RWX won't work then. So, shouldn't we keep the panic and forbid CONFIG_STRICT_KERNEL_RWX on PS3 ?

>
> Signed-off-by: Geoff Levand
> ---
> arch/powerpc/platforms/ps3/htab.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/platforms/ps3/htab.c b/arch/powerpc/platforms/ps3/htab.c
> index c27e6cf85272..9de62bd52650 100644
> --- a/arch/powerpc/platforms/ps3/htab.c
> +++ b/arch/powerpc/platforms/ps3/htab.c
> @@ -146,7 +146,7 @@ static long ps3_hpte_updatepp(unsigned long slot, unsigned long newpp,
> static void ps3_hpte_updateboltedpp(unsigned long newpp, unsigned long ea,
> 	int psize, int ssize)
> {
> -	panic("ps3_hpte_updateboltedpp() not implemented");
> +	pr_info("ps3_hpte_updateboltedpp() not implemented");
> }
>
> static void ps3_hpte_invalidate(unsigned long slot, unsigned long vpn,
Re: [PATCH] objtool: continue if find_insn() fails in decode_instructions()
On 09/01/23 22:23, Ingo Molnar wrote: * Sathvika Vasireddy wrote: Hi Ingo, Happy New Year! Happy New Year to you too! :-) On 07/01/23 15:51, Ingo Molnar wrote: * Sathvika Vasireddy wrote: Currently, decode_instructions() is failing if it is not able to find instruction, and this is happening since commit dbcdbdfdf137b4 ("objtool: Rework instruction -> symbol mapping") because it is expecting instruction for STT_NOTYPE symbols. Due to this, the following objtool warnings are seen: [1] arch/powerpc/kernel/optprobes_head.o: warning: objtool: optprobe_template_end(): can't find starting instruction [2] arch/powerpc/kernel/kvm_emul.o: warning: objtool: kvm_template_end(): can't find starting instruction [3] arch/powerpc/kernel/head_64.o: warning: objtool: end_first_256B(): can't find starting instruction The warnings are thrown because find_insn() is failing for symbols that are at the end of the file, or at the end of the section. Given how STT_NOTYPE symbols are currently handled in decode_instructions(), continue if the instruction is not found, instead of throwing warning and returning. Signed-off-by: Naveen N. Rao Signed-off-by: Sathvika Vasireddy The SOB chain doesn't look valid: is Naveen N. Rao, the first SOB line, the author of the patch? If yes then a matching From: line is needed. Or if two people developed the patch, then Co-developed-by should be used: Co-developed-by: First Co-Author Signed-off-by: First Co-Author Co-developed-by: Second Co-Author Signed-off-by: Second Co-Author [ In this SOB sequence "Second Co-Author" is the one who submits the patch. ] [ Please only use Co-developed-by if actual lines of code were written by the co-author that created copyrightable material - it's not a courtesy tag. Reviewed-by/Acked-by/Tested-by can be used to credit non-code contributions. ] Thank you for the clarification, and for bringing these points to my attention. I'll keep these things in mind. In this case, since both Naveen N. 
Rao and I developed the patch, the below tags are applicable. Co-developed-by: First Co-Author Signed-off-by: First Co-Author Co-developed-by: Second Co-Author Signed-off-by: Second Co-Author ... while filling in your real names & email addresses I suppose. ;-) Indeed :-) However, I would be dropping this particular patch, since I think Nick's patch [1] is better to fix the objtool issue. [1] - https://lore.kernel.org/linuxppc-dev/20221220101323.3119939-1-npig...@gmail.com/ Ok, I'll pick up Nick's fix, with these tags added for the PowerPC regression aspect and your review: Reported-by: Naveen N. Rao Reported-by: Sathvika Vasireddy Acked-by: Sathvika Vasireddy To document & credit the efforts of your patch. Sure, thank you! - Sathvika
Re: [PATCH v7 2/2] arm64: support batched/deferred tlb shootdown during page reclamation
On Sun, Jan 08, 2023 at 06:48:41PM +0800, Barry Song wrote: > On Fri, Jan 6, 2023 at 2:15 AM Catalin Marinas > wrote: > > On Thu, Nov 17, 2022 at 04:26:48PM +0800, Yicong Yang wrote: > > > It is tested on 4,8,128 CPU platforms and shows to be beneficial on > > > large systems but may not have improvement on small systems like on > > > a 4 CPU platform. So make ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH depends > > > on CONFIG_EXPERT for this stage and make this disabled on systems > > > with less than 8 CPUs. User can modify this threshold according to > > > their own platforms by CONFIG_NR_CPUS_FOR_BATCHED_TLB. > > > > What's the overhead of such batching on systems with 4 or fewer CPUs? If > > it isn't noticeable, I'd rather have it always on than some number > > chosen on whichever SoC you tested. > > On the one hand, tlb flush is cheap on a small system. so batching tlb flush > helps very minorly. Yes, it probably won't help on small systems but I don't like config options choosing the threshold, which may be different from system to system even if they have the same number of CPUs. A run-time tunable would be a better option. > On the other hand, since we have batched the tlb flush, new PTEs might be > invisible to others before the final broadcast is done and Ack-ed. The new PTEs could indeed be invisible at the TLB level but not at the memory (page table) level since this is done under the PTL IIUC. > thus, there > is a risk someone else might do mprotect or similar things on those deferred > pages which will ask for read-modify-write on those deferred PTEs. And this should be fine, we have things like the PTL in place for the actual memory access to the page table. > in this > case, mm will do an explicit flush by flush_tlb_batched_pending which is > not required if tlb flush is not deferred. I don't fully understand why it's needed, or at least why it would be needed on arm64. 
At the end of an mprotect(), we have the final PTEs in place and we just need to issue a TLBI for that range. change_pte_range() for example has a tlb_flush_pte_range() if the PTE was present and that won't be done lazily. If there are other TLBIs pending for the same range, they'll be done later though likely unnecessarily but still cheaper than issuing a flush_tlb_mm().

> void flush_tlb_batched_pending(struct mm_struct *mm)
> {
> 	int batch = atomic_read(&mm->tlb_flush_batched);
> 	int pending = batch & TLB_FLUSH_BATCH_PENDING_MASK;
> 	int flushed = batch >> TLB_FLUSH_BATCH_FLUSHED_SHIFT;
>
> 	if (pending != flushed) {
> 		flush_tlb_mm(mm);
> 		/*
> 		 * If the new TLB flushing is pending during flushing, leave
> 		 * mm->tlb_flush_batched as is, to avoid losing flushing.
> 		 */
> 		atomic_cmpxchg(&mm->tlb_flush_batched, batch,
> 			       pending | (pending << TLB_FLUSH_BATCH_FLUSHED_SHIFT));
> 	}
> }

I guess this works on x86 better as it avoids the IPIs if this flush already happened. But on arm64 we already issued the TLBI, we just didn't wait for it to complete via a DSB.

> I believe Anshuman has contributed many points on this in those previous
> discussions.

Yeah, I should re-read the old threads.

--
Catalin
Re: [PATCH] objtool: continue if find_insn() fails in decode_instructions()
* Sathvika Vasireddy wrote: > Hi Ingo, Happy New Year! Happy New Year to you too! :-) > On 07/01/23 15:51, Ingo Molnar wrote: > > * Sathvika Vasireddy wrote: > > > > > Currently, decode_instructions() is failing if it is not able to find > > > instruction, and this is happening since commit dbcdbdfdf137b4 > > > ("objtool: Rework instruction -> symbol mapping") because it is > > > expecting instruction for STT_NOTYPE symbols. > > > > > > Due to this, the following objtool warnings are seen: > > > [1] arch/powerpc/kernel/optprobes_head.o: warning: objtool: > > > optprobe_template_end(): can't find starting instruction > > > [2] arch/powerpc/kernel/kvm_emul.o: warning: objtool: > > > kvm_template_end(): can't find starting instruction > > > [3] arch/powerpc/kernel/head_64.o: warning: objtool: end_first_256B(): > > > can't find starting instruction > > > > > > The warnings are thrown because find_insn() is failing for symbols that > > > are at the end of the file, or at the end of the section. Given how > > > STT_NOTYPE symbols are currently handled in decode_instructions(), > > > continue if the instruction is not found, instead of throwing warning > > > and returning. > > > > > > Signed-off-by: Naveen N. Rao > > > Signed-off-by: Sathvika Vasireddy > > The SOB chain doesn't look valid: is Naveen N. Rao, the first SOB line, the > > author of the patch? If yes then a matching From: line is needed. > > > > Or if two people developed the patch, then Co-developed-by should be used: > > > > Co-developed-by: First Co-Author > > Signed-off-by: First Co-Author > > Co-developed-by: Second Co-Author > > Signed-off-by: Second Co-Author > > > > [ In this SOB sequence "Second Co-Author" is the one who submits the patch. > > ] > > > > [ Please only use Co-developed-by if actual lines of code were written by > >the co-author that created copyrightable material - it's not a courtesy > >tag. Reviewed-by/Acked-by/Tested-by can be used to credit non-code > >contributions. 
] > Thank you for the clarification, and for bringing these points to my > attention. I'll keep these things in mind. In this case, since both Naveen > N. Rao and I developed the patch, the below tags > are applicable. > > Co-developed-by: First Co-Author > Signed-off-by: First Co-Author > Co-developed-by: Second Co-Author > Signed-off-by: Second Co-Author ... while filling in your real names & email addresses I suppose. ;-) > > However, I would be dropping this particular patch, since I think Nick's > patch [1] is better to fix the objtool issue. > > [1] - > https://lore.kernel.org/linuxppc-dev/20221220101323.3119939-1-npig...@gmail.com/ Ok, I'll pick up Nick's fix, with these tags added for the PowerPC regression aspect and your review: Reported-by: Naveen N. Rao Reported-by: Sathvika Vasireddy Acked-by: Sathvika Vasireddy To document & credit the efforts of your patch. Thanks, Ingo
Re: [PATCH 15/15] backlight: backlight: Drop the deprecated fb_blank property
Hi Daniel. On Mon, Jan 09, 2023 at 11:06:35AM +, Daniel Thompson wrote: > On Sat, Jan 07, 2023 at 07:26:29PM +0100, Sam Ravnborg via B4 Submission > Endpoint wrote: > > From: Sam Ravnborg > > > > With all users gone remove the deprecated fb_blank member in > > backlight_properties. > > > > Signed-off-by: Sam Ravnborg > > Cc: Lee Jones > > Cc: Daniel Thompson > > Cc: Jingoo Han > > > Reviewed-by: Daniel Thompson Thanks for the follow-up on all the backlight related patches. > > > PS Please don't treat this like a maintainer Acked-by: and merge it >(Lee's not on holiday so work with Lee to figure out the merge >strategy ;-) ). Nope, I am aware that the usual pattern here and wait for Lee to show up. For this patch there is a bug as I need to update a comment. I will fix this when I resend after all the patches in flight has landed. So likely after the next merge window, Sam
Re: [PATCH] net-wan: Add check for NULL for utdm in ucc_hdlc_probe
On 23/12/2022 at 15:32, Ekaterina Esina wrote:
> If uhdlc_priv_tsa != 1 then utdm is not initialized.
> And if ret != NULL then goto undo_uhdlc_init, where utdm is dereferenced.
> Same if dev == NULL.
>
> Found by Linux Verification Center (linuxtesting.org) with SVACE.
>
> Signed-off-by: Ekaterina Esina
> ---
> drivers/net/wan/fsl_ucc_hdlc.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/wan/fsl_ucc_hdlc.c b/drivers/net/wan/fsl_ucc_hdlc.c
> index 22edea6ca4b8..2ddb0f71e648 100644
> --- a/drivers/net/wan/fsl_ucc_hdlc.c
> +++ b/drivers/net/wan/fsl_ucc_hdlc.c
> @@ -1243,7 +1243,9 @@ static int ucc_hdlc_probe(struct platform_device *pdev)
> free_dev:
> 	free_netdev(dev);
> undo_uhdlc_init:
> -	iounmap(utdm->siram);
> +	if (utdm != NULL) {
> +		iounmap(utdm->siram);
> +	}

If utdm being NULL is a problem here, isn't it also a problem in the iounmap below ?

> unmap_si_regs:
> 	iounmap(utdm->si_regs);
> free_utdm:
> --
> 2.30.2
>
RE: [PATCH 1/1] PCI: layerscape: Add EP mode support for ls1028a
> > From: Xiaowei Bao
> >
> > Add PCIe EP mode support for ls1028a.
> >
> > Signed-off-by: Xiaowei Bao
> > Signed-off-by: Hou Zhiqiang
> > ---
>
> All other patches were already accepted by maintainer in
> https://lore.kernel.org/lkml/2022223457.10599-1-leoyang...@nxp.com/
>
> But missed this one.
>
> Re-post.
>
Ping.
Re: [Skiboot] [PATCH 1/3] core/device: Add function to return child node using name and length
Hi Athira,

On Thu, 5 Jan 2023 12:41:33 +0530
Athira Rajeev wrote:

> >
> > On 05-Jan-2023, at 12:35 PM, Madhavan Srinivasan
> > wrote:
> >
> >
> > On Mon, 2 Jan 2023 08:45:22 +0530
> > Athira Rajeev wrote:
> >
> >> Add a function dt_find_by_name_len() that returns the child node if
> >> it matches the first "n" characters of a given name, otherwise NULL.
> >> This is helpful for cases with node name like: "name@addr". In
> >> scenarios where nodes are added with "name@addr" format and if the
> >> value of "addr" is not known, that node can't be matched with node
> >> name or addr. Hence matching with substring as node name will return
> >> the expected result. Patch adds dt_find_by_name_len() function
> >> and testcase for the same in core/test/run-device.c
> >
> > wouldn't it be better to automatically compare the name up to the "@"
> > character in the node name when searching for the match instead of
> > having to hard-code the lengths? I think it should be good enough for
> > the use case described above.
> >
> > something like
> > ...
> > pos = strchr(child->name, '@')
> > if (!strncmp(child->name, name, pos - child->name))
> > ...
> >
> >
> > Dan
>
> Hi Dan,
>
> Thanks for checking the patch.
>
> Comparing up to "@" while searching for the match will restrict the string
> search only for patterns with "@".
> By having dt_find_by_name_len which uses length, will be useful for generic
> substring search for different patterns.
> So preferred to use length instead of hardcoding character.
>
> Please let us know your thoughts.

I understand the presented solution is a pretty generic one, but I think
the question is whether the added complexity brings the benefits
compared to the simpler "separator char" solution.

And thinking even more about the generic "length" approach, it might
bring some false positive hits. Imagine nodes abc@1, abcd@2 and you are
looking for "abc". A search for (abc,3) will match also the "abcd" one.
And if the search string will always contain the "@" character, then
specifying the length is not required. And I believe the length parameter
might be totally redundant, because it can be derived from the search
string and the new function would be like "dt_find_by_name_substr()".

With regards,

Dan

> Thanks
> Athira
>
> >
> >> Signed-off-by: Athira Rajeev
> >> ---
> >> core/device.c | 20 
> >> core/test/run-device.c | 11 +++
> >> include/device.h | 4 
> >> 3 files changed, 35 insertions(+)
> >>
> >> diff --git a/core/device.c b/core/device.c
> >> index 2de37c74..72c54e85 100644
> >> --- a/core/device.c
> >> +++ b/core/device.c
> >> @@ -395,6 +395,26 @@ struct dt_node *dt_find_by_name(struct dt_node *root, const char *name)
> >> }
> >>
> >> +struct dt_node *dt_find_by_name_len(struct dt_node *root,
> >> +				    const char *name, int len)
> >> +{
> >> +	struct dt_node *child, *match;
> >> +
> >> +	if (len <= 0)
> >> +		return NULL;
> >> +
> >> +	list_for_each(&root->children, child, list) {
> >> +		if (!strncmp(child->name, name, len))
> >> +			return child;
> >> +
> >> +		match = dt_find_by_name_len(child, name, len);
> >> +		if (match)
> >> +			return match;
> >> +	}
> >> +
> >> +	return NULL;
> >> +}
> >> +
> >> struct dt_node *dt_new_check(struct dt_node *parent, const char *name)
> >> {
> >>	struct dt_node *node = dt_find_by_name(parent, name);
> >>
> >> diff --git a/core/test/run-device.c b/core/test/run-device.c
> >> index 4a12382b..8c552103 100644
> >> --- a/core/test/run-device.c
> >> +++ b/core/test/run-device.c
> >> @@ -466,6 +466,17 @@ int main(void)
> >>	new_prop_ph = dt_prop_get_u32(ut2, "something");
> >>	assert(!(new_prop_ph == ev1_ph));
> >>	dt_free(subtree);
> >> +
> >> +	/* Test dt_find_by_name_len */
> >> +	root = dt_new_root("");
> >> +	addr1 = dt_new_addr(root, "node", 0x1);
> >> +	addr2 = dt_new_addr(root, "node0_1", 0x2);
> >> +	assert(dt_find_by_name(root, "node@1") == addr1);
> >> +	assert(dt_find_by_name(root, "node0_1@2") == addr2);
> >> +	assert(dt_find_by_name_len(root, "node@", 5) == addr1);
> >> +	assert(dt_find_by_name_len(root, "node0_1@", 8) == addr2);
> >> +	dt_free(root);
> >> +
> >>	return 0;
> >> }
> >>
> >> diff --git a/include/device.h b/include/device.h
> >> index 93fb90ff..f5e0fb79 100644
> >> --- a/include/device.h
> >> +++ b/include/device.h
> >> @@ -184,6 +184,10 @@ struct dt_node *dt_find_by_path(struct dt_node *root, const char *path);
> >>
> >> /* Find a child node by name */
> >> struct dt_node *dt_find_by_name(struct dt_node *root, const char *name);
> >>
> >> +/* Find a child node by name and len */
> >> +struct dt_node *dt_find_by_name_len(struct dt_node *root,
> >> +				    const char *name, int len);
> >> +
> >> /* Find a node by phandle */
> >> struct dt_node *dt_find_by_phandle(struct dt_node
Re: [PATCH 2/2] perf test bpf: Skip test if kernel-debuginfo is not present
> On 05-Jan-2023, at 6:24 PM, Arnaldo Carvalho de Melo wrote: > > Em Thu, Jan 05, 2023 at 05:47:42PM +0530, Athira Rajeev escreveu: >> Perf BPF filter test fails in environment where "kernel-debuginfo" >> is not installed. > > I'll apply this to perf/core, for the next merge window, as its more an > improvement than a fix, i.e. we know why it fails, we're just improving > the user reporting to make that clear at first sight. > > - Arnaldo Hi Arnaldo, Sure, Thanks for checking Athira > >> Test failure logs: >> <<>> >> 42: BPF filter: >> 42.1: Basic BPF filtering : Ok >> 42.2: BPF pinning : Ok >> 42.3: BPF prologue generation : FAILED! >> <<>> >> >> Enabling verbose option provided debug logs, which says debuginfo >> needs to be installed. Snippet of verbose logs: >> >> <<>> >> 42.3: BPF prologue generation : >> --- start --- >> test child forked, pid 28218 >> <<>> >> Rebuild with CONFIG_DEBUG_INFO=y, or install an appropriate debuginfo >> package. >> bpf_probe: failed to convert perf probe events >> Failed to add events selected by BPF >> test child finished with -1 >> end >> BPF filter subtest 3: FAILED! >> <<>> >> >> Here subtest, "BPF prologue generation" failed and >> logs shows debuginfo is needed. After installing >> kernel-debuginfo package, testcase passes. >> >> Subtest "BPF prologue generation" failed because, the "do_test" >> function returns "TEST_FAIL" without checking the error type >> returned by "parse_events_load_bpf_obj" function. >> Function parse_events_load_bpf_obj can also return error of type >> "-ENODATA" incase kernel-debuginfo package is not installed. Fix this >> by adding check for -ENODATA error. 
>>
>> Test result after the patch changes:
>>
>> Test failure logs:
>> <<>>
>> 42: BPF filter :
>> 42.1: Basic BPF filtering : Ok
>> 42.2: BPF pinning : Ok
>> 42.3: BPF prologue generation : Skip (clang/debuginfo isn't
>> installed or environment missing BPF support)
>>
>> Fixes: ba1fae431e74 ("perf test: Add 'perf test BPF'")
>> Signed-off-by: Athira Rajeev
>> ---
>> Note: This is dependent on patch 1:
>> tools/perf: Update the exit error codes in function
>> try_to_find_probe_trace_event
>>
>> tools/perf/tests/bpf.c | 6 +-
>> 1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/tools/perf/tests/bpf.c b/tools/perf/tests/bpf.c
>> index 17c023823713..6a4235a9cf57 100644
>> --- a/tools/perf/tests/bpf.c
>> +++ b/tools/perf/tests/bpf.c
>> @@ -126,6 +126,10 @@ static int do_test(struct bpf_object *obj, int (*func)(void),
>>
>> 	err = parse_events_load_bpf_obj(&parse_state, &parse_state.list, obj, NULL);
>> 	parse_events_error__exit(&parse_error);
>> +	if (err == -ENODATA) {
>> +		pr_debug("Failed to add events selected by BPF, debuginfo package not installed\n");
>> +		return TEST_SKIP;
>> +	}
>> 	if (err || list_empty(&parse_state.list)) {
>> 		pr_debug("Failed to add events selected by BPF\n");
>> 		return TEST_FAIL;
>> @@ -368,7 +372,7 @@ static struct test_case bpf_tests[] = {
>> 		"clang isn't installed or environment missing BPF support"),
>> #ifdef HAVE_BPF_PROLOGUE
>> 	TEST_CASE_REASON("BPF prologue generation", bpf_prologue_test,
>> -		"clang isn't installed or environment missing BPF support"),
>> +		"clang/debuginfo isn't installed or environment missing BPF support"),
>> #else
>> 	TEST_CASE_REASON("BPF prologue generation", bpf_prologue_test, "not compiled in"),
>> #endif
>> --
>> 2.31.1
>
> --
>
> - Arnaldo
Re: [PATCH] tools/perf: Fix bpf-script-test-prologue test compile issue with clang
> On 05-Jan-2023, at 6:23 PM, Arnaldo Carvalho de Melo wrote:
>
> Em Thu, Jan 05, 2023 at 05:34:36PM +0530, Athira Rajeev escreveu:
>> While running perf test for bpf, observed that "BPF prologue
>> generation" test case fails to compile with clang. Logs below
>> from powerpc:
>>
>> <stdin>:33:2: error: use of undeclared identifier 'fmode_t'
>> 	fmode_t f_mode = (fmode_t)_f_mode;
>> 	^
>> <stdin>:37:6: error: use of undeclared identifier 'f_mode'; did you mean '_f_mode'?
>> 	if (f_mode & FMODE_WRITE)
>> 	^~
>> 	_f_mode
>> <stdin>:30:60: note: '_f_mode' declared here
>> int bpf_func__null_lseek(void *ctx, int err, unsigned long _f_mode,
>> 	^
>> 2 errors generated.
>
> Thanks for fixing this, I noticed the problem but didn't got around to
> investigate it.
>
> Tested and applied.

Thanks Arnaldo for checking the patch.

Athira

>
> - Arnaldo
>
>> The test code tests/bpf-script-test-prologue.c uses fmode_t.
>> And the error above is for "fmode_t" which is defined in
>> include/linux/types.h as part of kernel build directory:
>> "/lib/modules//build" that comes from
>> kernel devel [ soft link to /usr/src/ ].
>> Clang picks this header file from "-working-directory" build
>> option that specifies this build folder.
>>
>> But the 'commit 14e4b9f4289a ("perf trace: Raw augmented
>> syscalls fix libbpf 1.0+ compatibility")', changed the
>> include directory to use: "/usr/include". Post this change,
>> types.h from /usr/include/ is getting picked up which doesn't
>> contain definition of "fmode_t" and hence fails to compile.
>>
>> Compilation command before this commit:
>> /usr/bin/clang -D__KERNEL__ -D__NR_CPUS__=72 -DLINUX_VERSION_CODE=0x50e00
>> -xc -I/root/lib/perf/include/bpf -nostdinc -I./arch/powerpc/include
>> -I./arch/powerpc/include/generated -I./include
>> -I./arch/powerpc/include/uapi -I./arch/powerpc/include/generated/uapi
>> -I./include/uapi -I./include/generated/uapi -include
>> ./include/linux/compiler-version.h -include ./include/linux/kconfig.h
>> -Wno-unused-value -Wno-pointer-sign -working-directory
>> /lib/modules//build -c - -target bpf -g -O2 -o -
>>
>> Compilation command after this commit:
>> /usr/bin/clang -D__KERNEL__ -D__NR_CPUS__=72 -DLINUX_VERSION_CODE=0x50e00
>> -xc -I/usr/include/ -nostdinc -I./arch/powerpc/include
>> -I./arch/powerpc/include/generated -I./include
>> -I./arch/powerpc/include/uapi -I./arch/powerpc/include/generated/uapi
>> -I./include/uapi -I./include/generated/uapi -include
>> ./include/linux/compiler-version.h -include ./include/linux/kconfig.h
>> -Wno-unused-value -Wno-pointer-sign -working-directory
>> /lib/modules//build -c - -target bpf -g -O2 -o -
>>
>> The difference is addition of -I/usr/include/ in the first line
>> which is causing the error. Fix this by adding typedef for "fmode_t"
>> in the testcase to solve the compile issue.
>>
>> Fixes: 14e4b9f4289a ("perf trace: Raw augmented syscalls fix libbpf 1.0+ compatibility")
>> Signed-off-by: Athira Rajeev
>> ---
>> tools/perf/tests/bpf-script-test-prologue.c | 2 ++
>> 1 file changed, 2 insertions(+)
>>
>> diff --git a/tools/perf/tests/bpf-script-test-prologue.c b/tools/perf/tests/bpf-script-test-prologue.c
>> index bd83d364cf30..91778b5c6125 100644
>> --- a/tools/perf/tests/bpf-script-test-prologue.c
>> +++ b/tools/perf/tests/bpf-script-test-prologue.c
>> @@ -20,6 +20,8 @@
>> # undef if
>> #endif
>>
>> +typedef unsigned int __bitwise fmode_t;
>> +
>> #define FMODE_READ 0x1
>> #define FMODE_WRITE 0x2
>>
>> --
>> 2.31.1
>
> --
>
> - Arnaldo
Re: [PATCH] objtool: continue if find_insn() fails in decode_instructions()
Hi Ingo, Happy New Year! On 07/01/23 15:51, Ingo Molnar wrote: * Sathvika Vasireddy wrote: Currently, decode_instructions() is failing if it is not able to find an instruction, and this is happening since commit dbcdbdfdf137b4 ("objtool: Rework instruction -> symbol mapping") because it is expecting an instruction for STT_NOTYPE symbols. Due to this, the following objtool warnings are seen: [1] arch/powerpc/kernel/optprobes_head.o: warning: objtool: optprobe_template_end(): can't find starting instruction [2] arch/powerpc/kernel/kvm_emul.o: warning: objtool: kvm_template_end(): can't find starting instruction [3] arch/powerpc/kernel/head_64.o: warning: objtool: end_first_256B(): can't find starting instruction The warnings are thrown because find_insn() is failing for symbols that are at the end of the file, or at the end of the section. Given how STT_NOTYPE symbols are currently handled in decode_instructions(), continue if the instruction is not found, instead of throwing a warning and returning. Signed-off-by: Naveen N. Rao Signed-off-by: Sathvika Vasireddy The SOB chain doesn't look valid: is Naveen N. Rao, the first SOB line, the author of the patch? If yes then a matching From: line is needed. Or if two people developed the patch, then Co-developed-by should be used: Co-developed-by: First Co-Author Signed-off-by: First Co-Author Co-developed-by: Second Co-Author Signed-off-by: Second Co-Author [ In this SOB sequence "Second Co-Author" is the one who submits the patch. ] [ Please only use Co-developed-by if actual lines of code were written by the co-author that created copyrightable material - it's not a courtesy tag. Reviewed-by/Acked-by/Tested-by can be used to credit non-code contributions. ] Thank you for the clarification, and for bringing these points to my attention. I'll keep these things in mind. In this case, since both Naveen N. Rao and I developed the patch, the below tags are applicable. 
Co-developed-by: First Co-Author Signed-off-by: First Co-Author Co-developed-by: Second Co-Author Signed-off-by: Second Co-Author However, I would be dropping this particular patch, since I think Nick's patch [1] is better to fix the objtool issue. [1] - https://lore.kernel.org/linuxppc-dev/20221220101323.3119939-1-npig...@gmail.com/ Thanks for reviewing! - Sathvika
Re: [PATCH net-next] Remove DECnet support from kernel
On Mon, Jan 9, 2023, at 09:34, Jiri Slaby wrote: > On 09. 01. 23, 9:14, Jan Engelhardt wrote: >> On Monday 2023-01-09 08:04, Jiri Slaby wrote: > > Right, we used to also keep providing defines and structs in uapi > headers of removed functionality. So that the above socket would > compile, but fail at runtime. > > I am not biased toward any solution. In fact, I found out trinity was fixed > already. So either path networking takes, it's fine by me. I'm not sure > about the chromium users, though (and I don't care). Chromium and some of the others look like automatically generated lists of files and the rest seem to have compile-time checks. From a brief look at all the packages in the debian codesearch link you provided, I don't see any that are likely to cause problems aside from trinity. Arnd
Re: [PATCH net-next] Remove DECnet support from kernel
On 09. 01. 23, 9:14, Jan Engelhardt wrote: On Monday 2023-01-09 08:04, Jiri Slaby wrote: On 18. 08. 22, 2:43, Stephen Hemminger wrote: DECnet is an obsolete network protocol this breaks userspace. Some projects include linux/dn.h: https://codesearch.debian.net/search?q=include.*linux%2Fdn.h=0 I found Trinity fails to build: net/proto-decnet.c:5:10: fatal error: linux/dn.h: No such file or directory 5 | #include Should we provide the above as empty files? Not a good idea. There may be configure tests / code that merely checks for dn.h existence without checking for specific contents/defines. If you provide empty files, this would fail to build: #include "config.h" #ifdef HAVE_LINUX_DN_H # include #endif int main() { #ifdef HAVE_LINUX_DN_H socket(AF_DECNET, 0, DNPROTO_NSP); // or whatever #else ... #endif } So, with my distro hat on, outright removing header files feels like the slightly lesser of two evils. Given the task to port $arbitrary software between operating systems, absent header files is something more or less "regularly" encountered, so one could argue we are "trained" to deal with it. But missing individual defines is a much deeper dive into the APIs and software to patch it out. Right, we used to also keep providing defines and structs in uapi headers of removed functionality. So that the above socket would compile, but fail at runtime. I am not biased toward any solution. In fact, I found out trinity was fixed already. So either path networking takes, it's fine by me. I'm not sure about the chromium users, though (and I don't care). thanks, -- js suse labs
Re: [PATCH net-next] Remove DECnet support from kernel
On Monday 2023-01-09 08:04, Jiri Slaby wrote: > On 18. 08. 22, 2:43, Stephen Hemminger wrote: >> DECnet is an obsolete network protocol > > this breaks userspace. Some projects include linux/dn.h: > > https://codesearch.debian.net/search?q=include.*linux%2Fdn.h=0 > > I found Trinity fails to build: > net/proto-decnet.c:5:10: fatal error: linux/dn.h: No such file or directory > 5 | #include > > Should we provide the above as empty files? Not a good idea. There may be configure tests / code that merely checks for dn.h existence without checking for specific contents/defines. If you provide empty files, this would fail to build: #include "config.h" #ifdef HAVE_LINUX_DN_H # include #endif int main() { #ifdef HAVE_LINUX_DN_H socket(AF_DECNET, 0, DNPROTO_NSP); // or whatever #else ... #endif } So, with my distro hat on, outright removing header files feels like the slightly lesser of two evils. Given the task to port $arbitrary software between operating systems, absent header files is something more or less "regularly" encountered, so one could argue we are "trained" to deal with it. But missing individual defines is a much deeper dive into the APIs and software to patch it out.
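Jan's point about existence-only probes can be demonstrated with a small script. This is an illustrative sketch (the temp-dir layout and probe style are assumptions, modeled on the behavior of autoconf's AC_CHECK_HEADERS): an empty stub linux/dn.h satisfies an existence check, so HAVE_LINUX_DN_H would get defined, while any check for the removed defines still fails later.

```shell
#!/bin/sh
# Sketch of the two failure modes discussed above, using a throwaway
# include directory (paths here are illustrative).
tmp=$(mktemp -d)
mkdir -p "$tmp/linux"
: > "$tmp/linux/dn.h"        # empty stub header: "exists" but defines nothing

# Existence-style probe (what a naive configure check amounts to):
if [ -f "$tmp/linux/dn.h" ]; then
	echo "HAVE_LINUX_DN_H=1"
fi

# Content-style probe: the stub carries none of the old defines:
if grep -q DNPROTO_NSP "$tmp/linux/dn.h"; then
	echo "dn.h provides DNPROTO_NSP"
else
	echo "dn.h exists but defines nothing"
fi

rm -rf "$tmp"
```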
Re: [PATCH 09/15] staging: fbtft: fb_ssd1351.c: Introduce backlight_is_blank()
On Sun, Jan 08, 2023 at 09:29:43PM +0100, Sam Ravnborg wrote: > Hi Stephen, > > On Sun, Jan 08, 2023 at 08:28:17PM +0100, Stephen Kitt wrote: > > On Sat, 07 Jan 2023 19:26:23 +0100, Sam Ravnborg via B4 Submission Endpoint > > wrote: > > > > > From: Sam Ravnborg > > > > > > Avoiding direct access to backlight_properties.props. > > > > > > Access to the deprecated props.fb_blank replaced by backlight_is_blank(). > > > Access to props.power is dropped - it was only used for debug. > > > > > > Signed-off-by: Sam Ravnborg > > > Cc: Stephen Kitt > > > Cc: Greg Kroah-Hartman > > > Cc: Daniel Thompson > > > Cc: Andy Shevchenko > > > Cc: linux-fb...@vger.kernel.org > > > --- > > > drivers/staging/fbtft/fb_ssd1351.c | 9 +++-- > > > 1 file changed, 3 insertions(+), 6 deletions(-) > > > > > > diff --git a/drivers/staging/fbtft/fb_ssd1351.c > > > b/drivers/staging/fbtft/fb_ssd1351.c index b8d55aa8c5c7..995fbd2f3dc6 > > > 100644 > > > --- a/drivers/staging/fbtft/fb_ssd1351.c > > > +++ b/drivers/staging/fbtft/fb_ssd1351.c > > > @@ -190,15 +190,12 @@ static struct fbtft_display display = { > > > static int update_onboard_backlight(struct backlight_device *bd) > > > { > > > struct fbtft_par *par = bl_get_data(bd); > > > - bool on; > > > + bool blank = backlight_is_blank(bd); > > > > > > - fbtft_par_dbg(DEBUG_BACKLIGHT, par, > > > - "%s: power=%d, fb_blank=%d\n", > > > - __func__, bd->props.power, bd->props.fb_blank); > > > + fbtft_par_dbg(DEBUG_BACKLIGHT, par, "%s: blank=%d\n", __func__, > > > blank); > > > - on = !backlight_is_blank(bd); > > > /* Onboard backlight connected to GPIO0 on SSD1351, GPIO1 unused */ > > > - write_reg(par, 0xB5, on ? 0x03 : 0x02); > > > + write_reg(par, 0xB5, !blank ? 0x03 : 0x02); > > > > > > return 0; > > > } > > > > > > -- > > > 2.34.1 > > > > For debugging purposes here, would there be any point in logging > > props.state? 
> > As in > > > > fbtft_par_dbg(DEBUG_BACKLIGHT, par, > > - "%s: power=%d, fb_blank=%d\n", > > - __func__, bd->props.power, bd->props.fb_blank); > > + "%s: power=%d, state=%u\n", > > + __func__, bd->props.power, bd->props.state); > > Thanks for the suggestion - and the reviews! > > I was tempted to just remove the debugging. > If we require debugging, then this could be added in the backlight core, > thus everyone would benefit from it. I had the same instinct to remove the debug print here (esp. ones with __func__ in them) but I think that's probably a much more widely scoped clean up for fbtft ;-). On that basis: Reviewed-by: Daniel Thompson Daniel.
Re: [PATCH 10/15] staging: fbtft: core: Introduce backlight_is_blank()
On Sat, Jan 07, 2023 at 07:26:24PM +0100, Sam Ravnborg via B4 Submission Endpoint wrote: > From: Sam Ravnborg > > Avoiding direct access to backlight_properties.props. > > Access to the deprecated props.fb_blank replaced by backlight_is_blank(). > Access to props.power is dropped - it was only used for debug. ... > + if (blank) > gpiod_set_value(par->gpio.led[0], !polarity); > + else > + gpiod_set_value(par->gpio.led[0], polarity); if (blank) polarity = !polarity; gpiod_set_value(par->gpio.led[0], polarity); ? -- With Best Regards, Andy Shevchenko
Re: [PATCH 09/15] staging: fbtft: fb_ssd1351.c: Introduce backlight_is_blank()
On Sat, Jan 07, 2023 at 07:26:23PM +0100, Sam Ravnborg via B4 Submission Endpoint wrote: > From: Sam Ravnborg > > Avoiding direct access to backlight_properties.props. > > Access to the deprecated props.fb_blank replaced by backlight_is_blank(). > Access to props.power is dropped - it was only used for debug. > Signed-off-by: Sam Ravnborg > Cc: Stephen Kitt > Cc: Greg Kroah-Hartman > Cc: Daniel Thompson > Cc: Andy Shevchenko > Cc: linux-fb...@vger.kernel.org Not sure why you have this (at least) explicitly mentioned as get_maintainer.pl can generate it and git send-email can utilize the script output... ... > - write_reg(par, 0xB5, on ? 0x03 : 0x02); > + write_reg(par, 0xB5, !blank ? 0x03 : 0x02); Why not positive conditional? write_reg(par, 0xB5, blank ? 0x02 : 0x03); -- With Best Regards, Andy Shevchenko
Re: [PATCH 15/15] backlight: backlight: Drop the deprecated fb_blank property
On Sat, Jan 07, 2023 at 07:26:29PM +0100, Sam Ravnborg via B4 Submission Endpoint wrote: > From: Sam Ravnborg > > With all users gone remove the deprecated fb_blank member in > backlight_properties. > > Signed-off-by: Sam Ravnborg > Cc: Lee Jones > Cc: Daniel Thompson > Cc: Jingoo Han Reviewed-by: Daniel Thompson PS Please don't treat this like a maintainer Acked-by: and merge it (Lee's not on holiday so work with Lee to figure out the merge strategy ;-) ).
Re: [PATCH 14/15] backlight: tosa: Use backlight helper
On Sat, Jan 07, 2023 at 07:26:28PM +0100, Sam Ravnborg via B4 Submission Endpoint wrote: > From: Stephen Kitt > > Instead of retrieving the backlight brightness in struct > backlight_properties manually, and then checking whether the backlight > should be on at all, use backlight_get_brightness() which does all > this and insulates this from future changes. > > Signed-off-by: Stephen Kitt > Signed-off-by: Sam Ravnborg Just for completeness... Reviewed-by: Daniel Thompson Daniel.
Re: [PATCH 13/15] backlight: omap1: Use backlight helpers
On Sat, Jan 07, 2023 at 07:26:27PM +0100, Sam Ravnborg via B4 Submission Endpoint wrote: > From: Sam Ravnborg > > Rework backlight handling to avoid access to the deprecated > backlight_properties.fb_blank member. > > The rework includes removal of the get_brightness() operation, > because there was no read back from HW so no use for it. > > Signed-off-by: Sam Ravnborg > Cc: Lee Jones > Cc: Daniel Thompson > Cc: Jingoo Han This rework will result in additional calls to omapbl_send_enable() during updates so, if anyone who cares about omap1 is filtering LKML, then a Tested-by: would be nice. However, I doubt the additional calls will do any harm so even with that absent: Reviewed-by: Daniel Thompson Daniel.
[Bug 216095] sysfs: cannot create duplicate filename '/devices/platform/of-display'
https://bugzilla.kernel.org/show_bug.cgi?id=216095 --- Comment #12 from Michal Suchánek (msucha...@suse.de) --- Thanks for the DT dumps. So you really do have two outputs but the problem is likely different: The hardware-specific driver gets initialized earlier, and then the offb/ofdrm driver cannot get the resources for the framebuffer because the native driver is already using them. That should be fine but the message it prints is not exactly clear. -- You may reply to this email to add a comment. You are receiving this mail because: You are watching the assignee of the bug.
Re: [PATCH 01/15] video: fbdev: atmel_lcdfb: Rework backlight handling
On Sat, 07 Jan 2023, Sam Ravnborg via B4 Submission Endpoint wrote: > From: Sam Ravnborg > > The atmel_lcdfb had code to save/restore power state. > This is not needed so drop it. > > Introduce backlight_is_brightness() to make logic simpler. backlight_is_brightness? BR, Jani. > > Signed-off-by: Sam Ravnborg > Cc: Nicolas Ferre > Cc: Alexandre Belloni > Cc: Ludovic Desroches > Cc: linux-fb...@vger.kernel.org > Cc: linux-arm-ker...@lists.infradead.org > --- > drivers/video/fbdev/atmel_lcdfb.c | 24 +++- > 1 file changed, 3 insertions(+), 21 deletions(-) > > diff --git a/drivers/video/fbdev/atmel_lcdfb.c > b/drivers/video/fbdev/atmel_lcdfb.c > index 1fc8de4ecbeb..d297b3892637 100644 > --- a/drivers/video/fbdev/atmel_lcdfb.c > +++ b/drivers/video/fbdev/atmel_lcdfb.c > @@ -49,7 +49,6 @@ struct atmel_lcdfb_info { > struct clk *lcdc_clk; > > struct backlight_device *backlight; > - u8 bl_power; > u8 saved_lcdcon; > > u32 pseudo_palette[16]; > @@ -109,32 +108,18 @@ static u32 contrast_ctr = ATMEL_LCDC_PS_DIV8 > static int atmel_bl_update_status(struct backlight_device *bl) > { > struct atmel_lcdfb_info *sinfo = bl_get_data(bl); > - int power = sinfo->bl_power; > - int brightness = bl->props.brightness; > + int brightness; > > - /* REVISIT there may be a meaningful difference between > - * fb_blank and power ... there seem to be some cases > - * this doesn't handle correctly. > - */ > - if (bl->props.fb_blank != sinfo->bl_power) > - power = bl->props.fb_blank; > - else if (bl->props.power != sinfo->bl_power) > - power = bl->props.power; > - > - if (brightness < 0 && power == FB_BLANK_UNBLANK) > - brightness = lcdc_readl(sinfo, ATMEL_LCDC_CONTRAST_VAL); > - else if (power != FB_BLANK_UNBLANK) > - brightness = 0; > + brightness = backlight_get_brightness(bl); > > lcdc_writel(sinfo, ATMEL_LCDC_CONTRAST_VAL, brightness); > + > if (contrast_ctr & ATMEL_LCDC_POL_POSITIVE) > lcdc_writel(sinfo, ATMEL_LCDC_CONTRAST_CTR, > brightness ? 
contrast_ctr : 0); > else > lcdc_writel(sinfo, ATMEL_LCDC_CONTRAST_CTR, contrast_ctr); > > - bl->props.fb_blank = bl->props.power = sinfo->bl_power = power; > - > return 0; > } > > @@ -155,8 +140,6 @@ static void init_backlight(struct atmel_lcdfb_info *sinfo) > struct backlight_properties props; > struct backlight_device *bl; > > - sinfo->bl_power = FB_BLANK_UNBLANK; > - > if (sinfo->backlight) > return; > > @@ -173,7 +156,6 @@ static void init_backlight(struct atmel_lcdfb_info *sinfo) > sinfo->backlight = bl; > > bl->props.power = FB_BLANK_UNBLANK; > - bl->props.fb_blank = FB_BLANK_UNBLANK; > bl->props.brightness = atmel_bl_get_brightness(bl); > } -- Jani Nikula, Intel Open Source Graphics Center