Re: [PATCH] init: move THIS_MODULE from <linux/export.h> to <linux/init.h>

2023-12-06 Thread Paul Gortmaker
[Re: [PATCH] init: move THIS_MODULE from <linux/export.h> to <linux/init.h>] On
03/12/2023 (Sun 19:06) Masahiro Yamada wrote:

> On Sun, Nov 26, 2023 at 4:19 PM Masahiro Yamada wrote:
> >
> > Commit f50169324df4 ("module.h: split out the EXPORT_SYMBOL into
> > export.h") appropriately separated EXPORT_SYMBOL into <linux/export.h>
> > because modules and EXPORT_SYMBOL are orthogonal; modules are symbol
> > consumers, while EXPORT_SYMBOL are used by symbol providers, which
> > may not be necessarily a module.
> >
> > However, that commit also relocated THIS_MODULE. As explained in the
> > commit description, the intention was to define THIS_MODULE in a
> > lightweight header, but I do not believe <linux/export.h> was the
> > suitable location because EXPORT_SYMBOL and THIS_MODULE are unrelated.
> >
> > Move it to another lightweight header, <linux/init.h>. The reason for
> > choosing <linux/init.h> is to make  self-contained
> > without relying on  incorrectly including
> > .
> >
> > With this adjustment, the role of <linux/export.h> becomes clearer as
> > it only defines EXPORT_SYMBOL.
> >
> > Signed-off-by: Masahiro Yamada 
> > ---
> 
> 
> Applied to kbuild.
> 
> I did not get any report from the 0day bot so far,
> but I hope it will get a little more compile tests
> before getting into linux-next.

Haven't touched that kind of header shuffle for over 10 years?

But yeah, it is near impossible to not trip over some implicit header
inclusion somewhere in some driver or a less common arch and hence break
the build at least once when doing this kind of stuff.

Paul.
--

> 
> 
> 
> >
> >  include/linux/export.h | 18 --
> >  include/linux/init.h   |  7 +++
> >  2 files changed, 7 insertions(+), 18 deletions(-)
> >
> > diff --git a/include/linux/export.h b/include/linux/export.h
> > index 9911508a9604..0bbd02fd351d 100644
> > --- a/include/linux/export.h
> > +++ b/include/linux/export.h
> > @@ -6,15 +6,6 @@
> >  #include 
> >  #include 
> >
> > -/*
> > - * Export symbols from the kernel to modules.  Forked from module.h
> > - * to reduce the amount of pointless cruft we feed to gcc when only
> > - * exporting a simple symbol or two.
> > - *
> > - * Try not to add #includes here.  It slows compilation and makes kernel
> > - * hackers place grumpy comments in header files.
> > - */
> > -
> >  /*
> >   * This comment block is used by fixdep. Please do not remove.
> >   *
> > @@ -23,15 +14,6 @@
> >   * side effect of the *.o build rule.
> >   */
> >
> > -#ifndef __ASSEMBLY__
> > -#ifdef MODULE
> > -extern struct module __this_module;
> > -#define THIS_MODULE (&__this_module)
> > -#else
> > -#define THIS_MODULE ((struct module *)0)
> > -#endif
> > -#endif /* __ASSEMBLY__ */
> > -
> >  #ifdef CONFIG_64BIT
> >  #define __EXPORT_SYMBOL_REF(sym)   \
> > .balign 8   ASM_NL  \
> > diff --git a/include/linux/init.h b/include/linux/init.h
> > index 01b52c9c7526..3fa3f6241350 100644
> > --- a/include/linux/init.h
> > +++ b/include/linux/init.h
> > @@ -179,6 +179,13 @@ extern void (*late_time_init)(void);
> >
> >  extern bool initcall_debug;
> >
> > +#ifdef MODULE
> > +extern struct module __this_module;
> > +#define THIS_MODULE (&__this_module)
> > +#else
> > +#define THIS_MODULE ((struct module *)0)
> > +#endif
> > +
> >  #endif
> >
> >  #ifndef MODULE
> > --
> > 2.40.1
> >
> 
> 
> -- 
> Best Regards
> Masahiro Yamada
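
For readers who want to see what the relocated block evaluates to, here is a
minimal userspace sketch, not kernel code.  struct module is cut down to a
hypothetical one-field stand-in and owner_is_builtin() is a helper invented
for illustration; only the #ifdef MODULE block mirrors the hunk quoted above.

```c
#include <stddef.h>

/* Hypothetical stand-in; the real struct module is far larger. */
struct module {
	const char *name;
};

/* Same shape as the block the patch adds to init.h. */
#ifdef MODULE
extern struct module __this_module;
#define THIS_MODULE (&__this_module)
#else
#define THIS_MODULE ((struct module *)0)
#endif

/* Illustrative helper: a NULL owner means the code is built in. */
static int owner_is_builtin(const struct module *owner)
{
	return owner == NULL;
}
```

Built without -DMODULE, THIS_MODULE is a null pointer, so anything that
records an owner sees "built in".  Nothing in the macro needs EXPORT_SYMBOL,
which is the point of keeping the two in separate headers.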



[PATCH] sched/isolation: reconcile rcu_nocbs= and nohz_full=

2021-04-18 Thread Paul Gortmaker
We have a mismatch between RCU and isolation -- in relation to what is
considered the maximum valid CPU number.

This matters because nohz_full= and rcu_nocbs= are joined at the hip; in
fact the former will enforce the latter.  So we don't want a CPU mask to
be valid for one and denied for the other.

The difference first appeared in v4.15; further details are below.

As it is confusing to anyone who isn't looking at the code regularly, a
reminder is in order; three values exist here:

CONFIG_NR_CPUS  - compiled in maximum cap on number of CPUs supported.
nr_cpu_ids      - possible # of CPUs (typically reflects what ACPI says)
cpus_present    - actual number of present/detected/installed CPUs.

For this example, I'll refer to NR_CPUS=64 from "make defconfig" and
nr_cpu_ids=6 for ACPI reporting on a board that could run a six core,
and present=4 for a quad that is physically in the socket.  From dmesg:

 smpboot: Allowing 6 CPUs, 2 hotplug CPUs
 setup_percpu: NR_CPUS:64 nr_cpumask_bits:64 nr_cpu_ids:6 nr_node_ids:1
 rcu:   RCU restricting CPUs from NR_CPUS=64 to nr_cpu_ids=6.
 smp: Brought up 1 node, 4 CPUs

And from userspace, see:

   paul@trash:/sys/devices/system/cpu$ cat present
   0-3
   paul@trash:/sys/devices/system/cpu$ cat possible
   0-5
   paul@trash:/sys/devices/system/cpu$ cat kernel_max
   63

Everything is fine if we boot 5x5 for rcu/nohz:

  Command line: BOOT_IMAGE=/boot/bzImage nohz_full=2-5 rcu_nocbs=2-5 root=/dev/sda1 ro
  NO_HZ: Full dynticks CPUs: 2-5.
  rcu:  Offload RCU callbacks from CPUs: 2-5.

..even though there is no CPU 4 or 5.  Both RCU and nohz_full are OK.
Now we push that past 6 but keep it below NR_CPUS, and with 15x15 we get:

  Command line: BOOT_IMAGE=/boot/bzImage rcu_nocbs=2-15 nohz_full=2-15 root=/dev/sda1 ro
  rcu:  Note: kernel parameter 'rcu_nocbs=', 'nohz_full', or 'isolcpus=' contains nonexistent CPUs.
  rcu:  Offload RCU callbacks from CPUs: 2-5.

These are both functionally equivalent, as we are only changing flags on
phantom CPUs that don't exist, but note the kernel interpretation changes.
And worse, it only changes for one of the two - which is the problem.

RCU doesn't care if you want to restrict the flags on phantom CPUs but
clearly nohz_full does after this change from v4.15 (edb9382175c3):

-   if (cpulist_parse(str, non_housekeeping_mask) < 0) {
-   pr_warn("Housekeeping: Incorrect nohz_full cpumask\n");
+   err = cpulist_parse(str, non_housekeeping_mask);
+   if (err < 0 || cpumask_last(non_housekeeping_mask) >= nr_cpu_ids) {
+   pr_warn("Housekeeping: nohz_full= or isolcpus= incorrect CPU range\n");

To be clear, the sanity check on "possible" (nr_cpu_ids) is new here.

The goal was reasonable: not wanting housekeeping to land on a
not-possible CPU.  But note two things:

1) this is an exclusion list, not an inclusion list; we are tracking
non_housekeeping CPUs; not ones who are explicitly assigned housekeeping

2) we went one further in 9219565aa890 - ensuring that housekeeping was
sanity checking against present and not just possible CPUs.

To be clear, this means the check added in v4.15 is doubly redundant.
And more importantly, overly strict/restrictive.

We care now, because the bitmap boot arg parsing now knows that a value
of "N" is NR_CPUS; the size of the bitmap, but the bitmap code doesn't
know anything about the subtleties of our max/possible/present CPU
specifics as outlined above.

So drop the check added in v4.15 (edb9382175c3) and make RCU and
nohz_full both in alignment again on NR_CPUS so "N" works for both,
and then they can fall back to nr_cpu_ids internally just as before.

  Command line: BOOT_IMAGE=/boot/bzImage nohz_full=2-N rcu_nocbs=2-N root=/dev/sda1 ro
  NO_HZ: Full dynticks CPUs: 2-5.
  rcu:  Offload RCU callbacks from CPUs: 2-5.

As shown above, with this change, RCU and nohz_full are in sync, even
with the use of the "N" placeholder.  Same result is achieved with "15".
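
The difference between the two behaviours can be sketched in userspace.
This is only an illustration under stated assumptions: a 64-bit integer
stands in for a cpumask, and range_mask(), mask_last(), strict_check_ok()
and trim_to_possible() are made-up names, not kernel functions.  The trim
step stands in for whatever clamping to nr_cpu_ids the real code does
later.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 64	/* compiled-in cap, as in the defconfig example */

/* Build a mask with bits first..last set; assumes first <= last < NR_CPUS. */
static uint64_t range_mask(unsigned int first, unsigned int last)
{
	uint64_t mask = 0;

	for (unsigned int cpu = first; cpu <= last; cpu++)
		mask |= (uint64_t)1 << cpu;
	return mask;
}

/* Highest set bit, or NR_CPUS when the mask is empty. */
static unsigned int mask_last(uint64_t mask)
{
	for (unsigned int cpu = NR_CPUS; cpu-- > 0; )
		if (mask & ((uint64_t)1 << cpu))
			return cpu;
	return NR_CPUS;
}

/* The v4.15-style test: reject anything naming a CPU >= nr_cpu_ids. */
static bool strict_check_ok(uint64_t mask, unsigned int nr_cpu_ids)
{
	return mask_last(mask) < nr_cpu_ids;
}

/* The lenient alternative: accept the list, then drop not-possible CPUs. */
static uint64_t trim_to_possible(uint64_t mask, unsigned int nr_cpu_ids)
{
	uint64_t possible = nr_cpu_ids >= NR_CPUS ?
			    ~(uint64_t)0 : (((uint64_t)1 << nr_cpu_ids) - 1);

	return mask & possible;
}
```

With nr_cpu_ids=6, the strict check accepts 2-5 and rejects 2-15, while
trimming 2-15 (or 2-63, which is what "2-N" means for NR_CPUS=64) gives
back 2-5 in both cases.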

Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Paul E. McKenney 
Cc: Frederic Weisbecker 
Signed-off-by: Paul Gortmaker 

diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 5a6ea03f9882..7f06eaf12818 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -81,11 +81,9 @@ static int __init housekeeping_setup(char *str, enum hk_flags flags)
 {
 	cpumask_var_t non_housekeeping_mask;
 	cpumask_var_t tmp;
-	int err;
 
 	alloc_bootmem_cpumask_var(&non_housekeeping_mask);
-	err = cpulist_parse(str, non_housekeeping_mask);
-	if (err < 0 || cpumask_last(non_housekeeping_mask) >= nr_cpu_ids) {
+	if (cpulist_parse(str, non_housekeeping_mask) < 0) {
 		pr_warn("Housekeeping: nohz_full= or isolcpus= incorrect CPU range\n");
 		free_bootmem_cpumask_var(non_housekeeping_mask);
 		return 0;
-- 
2.25.1



[PATCH] sched/isolation: don't do unbounded chomp on bootarg string

2021-04-18 Thread Paul Gortmaker
After commit 3662daf023500dc084fa3b96f68a6f46179ddc73
("sched/isolation: Allow "isolcpus=" to skip unknown sub-parameters")
the isolcpus= string is walked to skip over what might be future flag
comma separated additions.

However, there is a logic error, and so as can clearly be seen below, it
will ignore its own arg len and search to the end of the bootarg string.

 $ dmesg|grep isol
 Command line: BOOT_IMAGE=/boot/bzImage isolcpus=xyz pleasedontparseme=1 root=/dev/sda1 ro
 Kernel command line: BOOT_IMAGE=/boot/bzImage isolcpus=xyz pleasedontparseme=1 root=/dev/sda1 ro
 isolcpus: Skipped unknown flag xyz
 isolcpus: Invalid flag pleasedontparseme=1 root=/dev/sda1 ro

This happens because the flag "skip" code does an unconditional
increment, which skips over the '\0' check the loop body looks for. If
the isolcpus= happens to be the last bootarg, then you'd never notice?

So we only increment if the skipped flag is followed by a comma, as per
what the existing "continue" flag matching code does.
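
A userspace sketch of the skip loop makes the overrun easy to reproduce.
It is an illustration, not the kernel function: skip_flags() and its
unconditional_skip switch are invented here, and only two known flags are
modelled.  The fixed branch tests the character the scan stopped on
(*str == ','); that is this sketch's reading of "followed by a comma", not
a copy of the hunk below.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Walk leading isolcpus= flags and return how many bytes were consumed,
 * or -1 if a flag contains a character other than a letter or '_'.
 * With unconditional_skip set, the pre-patch "str++" is reproduced.
 */
static long skip_flags(const char *buf, bool unconditional_skip)
{
	const char *str = buf;

	while (isalpha((unsigned char)*str)) {
		bool illegal = false;

		if (!strncmp(str, "nohz,", 5)) {
			str += 5;
			continue;
		}
		if (!strncmp(str, "domain,", 7)) {
			str += 7;
			continue;
		}

		/* Unknown flag: scan to the next ',' or the terminating NUL. */
		for (; *str && *str != ','; str++)
			if (!isalpha((unsigned char)*str) && *str != '_')
				illegal = true;
		if (illegal)
			return -1;

		/* Stepping over a NUL here walks into the next boot argument. */
		if (unconditional_skip || *str == ',')
			str++;
	}
	return (long)(str - buf);
}
```

Feeding it a buffer laid out as "xyz", a NUL, then "pleasedontparseme=1"
shows both outcomes: the unconditional variant steps over the NUL and
rejects the neighbouring argument, while the guarded variant stops after
"xyz".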

Note that isolcpus= was declared deprecated as of v4.15 (b0d40d2b22fe),
so we might want to revisit that if we are trying to future-proof it
as recently as a year ago for as yet unseen new flags.

Cc: Thomas Gleixner 
Cc: Frederic Weisbecker 
Cc: Peter Zijlstra 
Cc: Ingo Molnar 
Cc: Peter Xu 
Fixes: 3662daf02350 ("sched/isolation: Allow "isolcpus=" to skip unknown sub-parameters")
Signed-off-by: Paul Gortmaker 

diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 5a6ea03f9882..9652dba7e938 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -188,7 +188,8 @@ static int __init housekeeping_isolcpus_setup(char *str)
 		}
 
 		pr_info("isolcpus: Skipped unknown flag %.*s\n", len, par);
-		str++;
+		if (str[1] == ',')	/* above continue; match on "flag," */
+			str++;
 	}
 
 	/* Default behaviour for isolcpus without flags */
-- 
2.25.1



[PATCH] sched/isolation: don't do unbounded chomp on bootarg string

2021-04-18 Thread Paul Gortmaker
After commit 3662daf02350 ("sched/isolation: Allow "isolcpus=" to skip
unknown sub-parameters") the isolcpus= string is walked to skip over what
might be any future flag comma separated additions.

However, there is a logic error, and so as can clearly be seen below, it
will ignore its own arg len and search to the end of the bootarg string.

 $ dmesg|grep isol
 Command line: BOOT_IMAGE=/boot/bzImage isolcpus=xyz pleasedontparseme=1 root=/dev/sda1 ro
 isolcpus: Skipped unknown flag xyz
 isolcpus: Invalid flag pleasedontparseme=1 root=/dev/sda1 ro

This happens because the flag "skip" code does an unconditional
increment, which skips over the '\0' check the loop body looks for. If
the isolcpus= happens to be the last bootarg, then you'd never notice?

So we only increment if the skipped flag is followed by a comma, as per
what the existing "continue" flag matching code does.

Note that isolcpus= was declared deprecated as of v4.15 (b0d40d2b22fe),
so we might want to revisit that if we are trying to future-proof it
as recently as a year ago for as yet unseen new flags.

Cc: Peter Xu 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Frederic Weisbecker 
Fixes: 3662daf02350 ("sched/isolation: Allow "isolcpus=" to skip unknown sub-parameters")
Signed-off-by: Paul Gortmaker 

diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 5a6ea03f9882..9652dba7e938 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -188,7 +188,8 @@ static int __init housekeeping_isolcpus_setup(char *str)
 		}
 
 		pr_info("isolcpus: Skipped unknown flag %.*s\n", len, par);
-		str++;
+		if (str[1] == ',')	/* above continue; match on "flag," */
+			str++;
 	}
 
 	/* Default behaviour for isolcpus without flags */
-- 
2.25.1



[tip: core/rcu] rcu: deprecate "all" option to rcu_nocbs=

2021-04-11 Thread tip-bot2 for Paul Gortmaker
The following commit has been merged into the core/rcu branch of tip:

Commit-ID:     3e70df91f961b9df7ab3c0ae1934bdf15454c536
Gitweb:        https://git.kernel.org/tip/3e70df91f961b9df7ab3c0ae1934bdf15454c536
Author:        Paul Gortmaker
AuthorDate:    Sun, 21 Feb 2021 03:08:27 -05:00
Committer:     Paul E. McKenney
CommitterDate: Mon, 08 Mar 2021 14:16:58 -08:00

rcu: deprecate "all" option to rcu_nocbs=

With the core bitmap support now accepting "N" as a placeholder for
the end of the bitmap, "all" can be represented as "0-N" and has the
advantage of not being specific to RCU (or any other subsystem).

So deprecate the use of "all" by removing documentation references
to it.  The support itself needs to remain for now, since we don't
know how many people out there are using it currently, but since it
is in an __init area anyway, it isn't worth losing sleep over.

Cc: Yury Norov 
Cc: Peter Zijlstra 
Cc: "Paul E. McKenney" 
Cc: Josh Triplett 
Acked-by: Yury Norov 
Signed-off-by: Paul Gortmaker 
Signed-off-by: Paul E. McKenney 
---
 Documentation/admin-guide/kernel-parameters.txt | 4 +---
 kernel/rcu/tree_plugin.h| 6 ++
 2 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 0454572..83e2ef1 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4068,9 +4068,7 @@
see CONFIG_RAS_CEC help text.
 
rcu_nocbs=  [KNL]
-   The argument is a cpu list, as described above,
-   except that the string "all" can be used to
-   specify every CPU on the system.
+   The argument is a cpu list, as described above.
 
In kernels built with CONFIG_RCU_NOCB_CPU=y, set
the specified list of CPUs to be no-callback CPUs.
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 2d60377..0b95562 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1464,14 +1464,12 @@ static void rcu_cleanup_after_idle(void)
 
 /*
  * Parse the boot-time rcu_nocb_mask CPU list from the kernel parameters.
- * The string after the "rcu_nocbs=" is either "all" for all CPUs, or a
- * comma-separated list of CPUs and/or CPU ranges.  If an invalid list is
- * given, a warning is emitted and all CPUs are offloaded.
+ * If the list is invalid, a warning is emitted and all CPUs are offloaded.
  */
 static int __init rcu_nocb_setup(char *str)
 {
 	alloc_bootmem_cpumask_var(&rcu_nocb_mask);
-	if (!strcasecmp(str, "all"))
+	if (!strcasecmp(str, "all"))		/* legacy: use "0-N" instead */
 		cpumask_setall(rcu_nocb_mask);
 	else
 		if (cpulist_parse(str, rcu_nocb_mask)) {


[tip: core/rcu] lib: test_bitmap: clearly separate ERANGE from EINVAL tests.

2021-04-11 Thread tip-bot2 for Paul Gortmaker
The following commit has been merged into the core/rcu branch of tip:

Commit-ID:     494215fbf298787e4ead16e4c68634d241336b02
Gitweb:        https://git.kernel.org/tip/494215fbf298787e4ead16e4c68634d241336b02
Author:        Paul Gortmaker
AuthorDate:    Sun, 21 Feb 2021 03:08:20 -05:00
Committer:     Paul E. McKenney
CommitterDate: Mon, 08 Mar 2021 14:16:58 -08:00

lib: test_bitmap: clearly separate ERANGE from EINVAL tests.

This block of tests was meant to find/flag incorrect use of the ":"
and "/" separators (syntax errors) and invalid (zero) group len.

However they were specified with an 8 bit width and 32 bit operations,
so they really contained two errors (EINVAL and ERANGE).

Promote them to 32 bit so it is clear what they are meant to target.
Then we can add tests specific for ERANGE (no syntax errors, just
doing 32bit op on 8 bit width, plus a typical 9-on-8 fencepost error).

Cc: Yury Norov 
Cc: Rasmus Villemoes 
Cc: Andy Shevchenko 
Acked-by: Yury Norov 
Reviewed-by: Andy Shevchenko 
Signed-off-by: Paul Gortmaker 
Signed-off-by: Paul E. McKenney 
---
 lib/test_bitmap.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/lib/test_bitmap.c b/lib/test_bitmap.c
index 0ea0e82..853a3a6 100644
--- a/lib/test_bitmap.c
+++ b/lib/test_bitmap.c
@@ -337,12 +337,12 @@ static const struct test_bitmap_parselist parselist_tests[] __initconst = {
{-EINVAL, "-1", NULL, 8, 0},
{-EINVAL, "-0", NULL, 8, 0},
{-EINVAL, "10-1", NULL, 8, 0},
-   {-EINVAL, "0-31:", NULL, 8, 0},
-   {-EINVAL, "0-31:0", NULL, 8, 0},
-   {-EINVAL, "0-31:0/", NULL, 8, 0},
-   {-EINVAL, "0-31:0/0", NULL, 8, 0},
-   {-EINVAL, "0-31:1/0", NULL, 8, 0},
-   {-EINVAL, "0-31:10/1", NULL, 8, 0},
+   {-EINVAL, "0-31:", NULL, 32, 0},
+   {-EINVAL, "0-31:0", NULL, 32, 0},
+   {-EINVAL, "0-31:0/", NULL, 32, 0},
+   {-EINVAL, "0-31:0/0", NULL, 32, 0},
+   {-EINVAL, "0-31:1/0", NULL, 32, 0},
+   {-EINVAL, "0-31:10/1", NULL, 32, 0},
{-EOVERFLOW, "0-98765432123456789:10/1", NULL, 8, 0},
 
{-EINVAL, "a-31", NULL, 8, 0},


[tip: core/rcu] lib: test_bitmap: add tests to trigger ERANGE case.

2021-04-11 Thread tip-bot2 for Paul Gortmaker
The following commit has been merged into the core/rcu branch of tip:

Commit-ID:     6fef5905fbd691aeb91093056b27d5ee7b106097
Gitweb:        https://git.kernel.org/tip/6fef5905fbd691aeb91093056b27d5ee7b106097
Author:        Paul Gortmaker
AuthorDate:    Sun, 21 Feb 2021 03:08:21 -05:00
Committer:     Paul E. McKenney
CommitterDate: Mon, 08 Mar 2021 14:16:58 -08:00

lib: test_bitmap: add tests to trigger ERANGE case.

Add tests that specify a valid range, but one that is outside the
width of the bitmap for which it is to be applied to.  These should
trigger an -ERANGE response from the code.

Cc: Yury Norov 
Cc: Rasmus Villemoes 
Cc: Andy Shevchenko 
Acked-by: Yury Norov 
Reviewed-by: Andy Shevchenko 
Signed-off-by: Paul Gortmaker 
Signed-off-by: Paul E. McKenney 
---
 lib/test_bitmap.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/test_bitmap.c b/lib/test_bitmap.c
index 853a3a6..0f2e91d 100644
--- a/lib/test_bitmap.c
+++ b/lib/test_bitmap.c
@@ -337,6 +337,8 @@ static const struct test_bitmap_parselist parselist_tests[] __initconst = {
{-EINVAL, "-1", NULL, 8, 0},
{-EINVAL, "-0", NULL, 8, 0},
{-EINVAL, "10-1", NULL, 8, 0},
+   {-ERANGE, "8-8", NULL, 8, 0},
+   {-ERANGE, "0-31", NULL, 8, 0},
{-EINVAL, "0-31:", NULL, 32, 0},
{-EINVAL, "0-31:0", NULL, 32, 0},
{-EINVAL, "0-31:0/", NULL, 32, 0},


[tip: core/rcu] lib: test_bitmap: add more start-end:offset/len tests

2021-04-11 Thread tip-bot2 for Paul Gortmaker
The following commit has been merged into the core/rcu branch of tip:

Commit-ID:     97330db3af9a41302d1ccb0f495fcb5b5da2cc44
Gitweb:        https://git.kernel.org/tip/97330db3af9a41302d1ccb0f495fcb5b5da2cc44
Author:        Paul Gortmaker
AuthorDate:    Sun, 21 Feb 2021 03:08:22 -05:00
Committer:     Paul E. McKenney
CommitterDate: Mon, 08 Mar 2021 14:16:58 -08:00

lib: test_bitmap: add more start-end:offset/len tests

There are inputs to bitmap_parselist() that would probably never
be entered manually by a person, but might result from some kind of
automated input generator.  Things like ranges of length 1, or group
lengths longer than nbits, overlaps, or offsets of zero.

Adding these tests serve two purposes:

1) document what might seem odd but nonetheless valid input.

2) don't regress from what we currently accept as valid.
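
To make the start-end:offset/len notation concrete, here is a small
userspace sketch that expands one region into a 32-bit mask.  It follows
the same loop shape as the region-setting code in lib/bitmap.c, but
region_to_mask() is a name invented for this note, a uint32_t replaces the
unsigned long array, and validation is assumed to have happened already.

```c
#include <stdint.h>

/* start-end:off/group_len, as in the region struct used by the parser. */
struct region {
	unsigned int start;
	unsigned int off;
	unsigned int group_len;
	unsigned int end;
};

/*
 * Expand one region into a 32-bit mask.  Assumes the region was already
 * validated: start <= end < 32 and group_len != 0.
 */
static uint32_t region_to_mask(const struct region *r)
{
	uint32_t mask = 0;

	for (unsigned int start = r->start; start <= r->end;
	     start += r->group_len) {
		unsigned int left = r->end - start + 1;
		unsigned int used = left < r->off ? left : r->off;

		/* Take the first "used" bits of this group. */
		for (unsigned int bit = start; bit < start + used; bit++)
			mask |= (uint32_t)1 << bit;
	}
	return mask;
}
```

Under these assumptions "0-15:16/31" expands to 0x0000ffff, "15-15:1/2" to
0x00008000, and "0-0:0/1" to an empty mask, since an offset of zero selects
no bits from the group.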

Cc: Yury Norov 
Cc: Rasmus Villemoes 
Cc: Andy Shevchenko 
Acked-by: Yury Norov 
Reviewed-by: Andy Shevchenko 
Signed-off-by: Paul Gortmaker 
Signed-off-by: Paul E. McKenney 
---
 lib/test_bitmap.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/lib/test_bitmap.c b/lib/test_bitmap.c
index 0f2e91d..3c1c46d 100644
--- a/lib/test_bitmap.c
+++ b/lib/test_bitmap.c
@@ -34,6 +34,8 @@ static const unsigned long exp1[] __initconst = {
BITMAP_FROM_U64(0xULL),
BITMAP_FROM_U64(0xULL),
BITMAP_FROM_U64(0),
+   BITMAP_FROM_U64(0x8000),
+   BITMAP_FROM_U64(0x8000),
 };
 
 static const unsigned long exp2[] __initconst = {
@@ -334,6 +336,26 @@ static const struct test_bitmap_parselist parselist_tests[] __initconst = {
{0, " ,  ,,  , ,   ",   &exp1[12 * step], 8, 0},
{0, " ,  ,,  , ,   \n", &exp1[12 * step], 8, 0},
 
+   {0, "0-0", &exp1[0], 32, 0},
+   {0, "1-1", &exp1[1 * step], 32, 0},
+   {0, "15-15", &exp1[13 * step], 32, 0},
+   {0, "31-31", &exp1[14 * step], 32, 0},
+
+   {0, "0-0:0/1", &exp1[12 * step], 32, 0},
+   {0, "0-0:1/1", &exp1[0], 32, 0},
+   {0, "0-0:1/31", &exp1[0], 32, 0},
+   {0, "0-0:31/31", &exp1[0], 32, 0},
+   {0, "1-1:1/1", &exp1[1 * step], 32, 0},
+   {0, "0-15:16/31", &exp1[2 * step], 32, 0},
+   {0, "15-15:1/2", &exp1[13 * step], 32, 0},
+   {0, "15-15:31/31", &exp1[13 * step], 32, 0},
+   {0, "15-31:1/31", &exp1[13 * step], 32, 0},
+   {0, "16-31:16/31", &exp1[3 * step], 32, 0},
+   {0, "31-31:31/31", &exp1[14 * step], 32, 0},
+
+   {0, "0-31:1/3,1-31:1/3,2-31:1/3", &exp1[8 * step], 32, 0},
+   {0, "1-10:8/12,8-31:24/29,0-31:0/3", &exp1[9 * step], 32, 0},
+
{-EINVAL, "-1", NULL, 8, 0},
{-EINVAL, "-0", NULL, 8, 0},
{-EINVAL, "10-1", NULL, 8, 0},


[tip: core/rcu] lib: bitmap: fold nbits into region struct

2021-04-11 Thread tip-bot2 for Paul Gortmaker
The following commit has been merged into the core/rcu branch of tip:

Commit-ID:     9d7a3366b7028ae8dd16a0d7585cbf11b03b42a0
Gitweb:        https://git.kernel.org/tip/9d7a3366b7028ae8dd16a0d7585cbf11b03b42a0
Author:        Paul Gortmaker
AuthorDate:    Sun, 21 Feb 2021 03:08:23 -05:00
Committer:     Paul E. McKenney
CommitterDate: Mon, 08 Mar 2021 14:16:58 -08:00

lib: bitmap: fold nbits into region struct

This will reduce parameter passing and enable using nbits as part
of future dynamic region parameter parsing.

Cc: Yury Norov 
Cc: Rasmus Villemoes 
Cc: Andy Shevchenko 
Suggested-by: Yury Norov 
Acked-by: Yury Norov 
Reviewed-by: Andy Shevchenko 
Signed-off-by: Paul Gortmaker 
Signed-off-by: Paul E. McKenney 
---
 lib/bitmap.c | 19 ++-
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/lib/bitmap.c b/lib/bitmap.c
index 75006c4..162e285 100644
--- a/lib/bitmap.c
+++ b/lib/bitmap.c
@@ -487,24 +487,24 @@ EXPORT_SYMBOL(bitmap_print_to_pagebuf);
 
 /*
  * Region 9-38:4/10 describes the following bitmap structure:
- * 0  9  1218  38
- * .......
- * ^  ^ ^   ^
- *  start  off   group_lenend
+ * 0  9  1218  38   N
+ * .......
+ * ^  ^ ^   ^   ^
+ *  start  off   group_lenend   nbits
  */
 struct region {
unsigned int start;
unsigned int off;
unsigned int group_len;
unsigned int end;
+   unsigned int nbits;
 };
 
-static int bitmap_set_region(const struct region *r,
-   unsigned long *bitmap, int nbits)
+static int bitmap_set_region(const struct region *r, unsigned long *bitmap)
 {
unsigned int start;
 
-   if (r->end >= nbits)
+   if (r->end >= r->nbits)
return -ERANGE;
 
for (start = r->start; start <= r->end; start += r->group_len)
@@ -640,7 +640,8 @@ int bitmap_parselist(const char *buf, unsigned long *maskp, int nmaskbits)
struct region r;
long ret;
 
-   bitmap_zero(maskp, nmaskbits);
+   r.nbits = nmaskbits;
+   bitmap_zero(maskp, r.nbits);
 
while (buf) {
buf = bitmap_find_region(buf);
@@ -655,7 +656,7 @@ int bitmap_parselist(const char *buf, unsigned long *maskp, int nmaskbits)
if (ret)
return ret;
 
-   ret = bitmap_set_region(&r, maskp, nmaskbits);
+   ret = bitmap_set_region(&r, maskp);
if (ret)
return ret;
}


[tip: core/rcu] lib: bitmap: move ERANGE check from set_region to check_region

2021-04-11 Thread tip-bot2 for Paul Gortmaker
The following commit has been merged into the core/rcu branch of tip:

Commit-ID:     f3c869caef648c541a7445f2a6ba2196d343f542
Gitweb:        https://git.kernel.org/tip/f3c869caef648c541a7445f2a6ba2196d343f542
Author:        Paul Gortmaker
AuthorDate:    Sun, 21 Feb 2021 03:08:24 -05:00
Committer:     Paul E. McKenney
CommitterDate: Mon, 08 Mar 2021 14:16:58 -08:00

lib: bitmap: move ERANGE check from set_region to check_region

It makes sense to do all the checks in check_region() and not 1/2
in check_region and 1/2 in set_region.

Since set_region is called immediately after check_region, the net
effect on runtime is zero, but it gets rid of an if (...) return...
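
A userspace sketch of the consolidated check shows how the two error codes
relate once both live in one function.  check_region() copies the
conditions from the hunk below; plain_range() is a helper invented for
this note, and treating a pattern-less range as one group with
off = group_len = end + 1 is an assumption about how the parser fills in
the struct.

```c
#include <errno.h>

struct region {
	unsigned int start;
	unsigned int off;
	unsigned int group_len;
	unsigned int end;
	unsigned int nbits;
};

/* Syntax-level problems first (-EINVAL), then the width check (-ERANGE). */
static int check_region(const struct region *r)
{
	if (r->start > r->end || r->group_len == 0 || r->off > r->group_len)
		return -EINVAL;

	if (r->end >= r->nbits)
		return -ERANGE;

	return 0;
}

/* Illustrative helper: build the region for a plain "start-end" range. */
static struct region plain_range(unsigned int start, unsigned int end,
				 unsigned int nbits)
{
	struct region r = {
		.start = start,
		.off = end + 1,
		.group_len = end + 1,
		.end = end,
		.nbits = nbits,
	};

	return r;
}
```

On an 8-bit map, "8-8" and "0-31" come back as -ERANGE, while "10-1" is
-EINVAL.  A region such as "0-31:10/1" is -EINVAL on both 8 and 32 bits
because the syntax test runs first, which is why the earlier test cleanup
had to widen those cases to 32 bits to isolate what they were testing.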

Cc: Yury Norov 
Cc: Rasmus Villemoes 
Cc: Andy Shevchenko 
Acked-by: Yury Norov 
Reviewed-by: Andy Shevchenko 
Signed-off-by: Paul Gortmaker 
Signed-off-by: Paul E. McKenney 
---
 lib/bitmap.c | 14 +-
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/lib/bitmap.c b/lib/bitmap.c
index 162e285..833f152 100644
--- a/lib/bitmap.c
+++ b/lib/bitmap.c
@@ -500,17 +500,12 @@ struct region {
unsigned int nbits;
 };
 
-static int bitmap_set_region(const struct region *r, unsigned long *bitmap)
+static void bitmap_set_region(const struct region *r, unsigned long *bitmap)
 {
unsigned int start;
 
-   if (r->end >= r->nbits)
-   return -ERANGE;
-
for (start = r->start; start <= r->end; start += r->group_len)
bitmap_set(bitmap, start, min(r->end - start + 1, r->off));
-
-   return 0;
 }
 
 static int bitmap_check_region(const struct region *r)
@@ -518,6 +513,9 @@ static int bitmap_check_region(const struct region *r)
if (r->start > r->end || r->group_len == 0 || r->off > r->group_len)
return -EINVAL;
 
+   if (r->end >= r->nbits)
+   return -ERANGE;
+
return 0;
 }
 
@@ -656,9 +654,7 @@ int bitmap_parselist(const char *buf, unsigned long *maskp, int nmaskbits)
if (ret)
return ret;
 
-   ret = bitmap_set_region(&r, maskp);
-   if (ret)
-   return ret;
+   bitmap_set_region(&r, maskp);
}
 
return 0;


[tip: core/rcu] lib: bitmap: support "N" as an alias for size of bitmap

2021-04-11 Thread tip-bot2 for Paul Gortmaker
The following commit has been merged into the core/rcu branch of tip:

Commit-ID:     2c4885d24e64941702a8f81c8e83289823ba35d0
Gitweb:        https://git.kernel.org/tip/2c4885d24e64941702a8f81c8e83289823ba35d0
Author:        Paul Gortmaker
AuthorDate:    Sun, 21 Feb 2021 03:08:25 -05:00
Committer:     Paul E. McKenney
CommitterDate: Mon, 08 Mar 2021 14:16:58 -08:00

lib: bitmap: support "N" as an alias for size of bitmap

While this is done for all bitmaps, the original use case in mind was
for CPU masks and cpulist_parse() as described below.

It seems that a common configuration is to use the 1st couple cores for
housekeeping tasks.  This tends to leave the remaining ones to form a
pool of similarly configured cores to take on the real workload of
interest to the user.

So on machine A - with 32 cores, it could be 0-3 for "system" and then
4-31 being used in boot args like nohz_full=, or rcu_nocbs= as part of
setting up the worker pool of CPUs.

But then newer machine B is added, and it has 48 cores, and so while
the 0-3 part remains unchanged, the pool setup cpu list becomes 4-47.

Multiple deployment becomes easier when we can just simply replace 31
and 47 with "N" and let the system substitute in the actual number at
boot; a number that it knows better than we do.
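
The substitution itself is small enough to sketch in userspace.  The code
here is an illustration under stated assumptions: getnum() and
parse_range() are simplified stand-ins written for this note, they handle
only a plain "start-end" range, signal errors with NULL or -1 instead of
ERR_PTR(), skip overflow handling, and assume nbits is at least 1.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Read one number, or the token "N" meaning the last valid bit.
 * Returns the position after the token, or NULL on a syntax error.
 */
static const char *getnum(const char *str, unsigned int *num,
			  unsigned int lastbit)
{
	unsigned int n = 0;

	if (str[0] == 'N') {
		*num = lastbit;
		return str + 1;
	}
	if (!isdigit((unsigned char)*str))
		return NULL;
	while (isdigit((unsigned char)*str))
		n = n * 10 + (unsigned int)(*str++ - '0');
	*num = n;
	return str;
}

/* Parse "start-end" against a bitmap of nbits; 0 on success, -1 otherwise. */
static int parse_range(const char *str, unsigned int nbits,
		       unsigned int *start, unsigned int *end)
{
	unsigned int lastbit = nbits - 1;

	str = getnum(str, start, lastbit);
	if (!str || *str != '-')
		return -1;
	str = getnum(str + 1, end, lastbit);
	if (!str || *str != '\0')
		return -1;
	if (*start > *end || *end > lastbit)
		return -1;
	return 0;
}
```

The same string "4-N" then resolves to 4-31 on a 32-bit map and 4-47 on a
48-bit one, while "16-N" on a 4-bit map is rejected because it turns into
16-3.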

Cc: Yury Norov 
Cc: Peter Zijlstra 
Cc: "Paul E. McKenney" 
Cc: Rasmus Villemoes 
Cc: Andy Shevchenko 
Suggested-by: Yury Norov  # move it from CPU code
Acked-by: Yury Norov 
Signed-off-by: Paul Gortmaker 
Signed-off-by: Paul E. McKenney 
---
 Documentation/admin-guide/kernel-parameters.rst |  7 +-
 lib/bitmap.c| 22 
 2 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.rst b/Documentation/admin-guide/kernel-parameters.rst
index 1132796..d6e3f67 100644
--- a/Documentation/admin-guide/kernel-parameters.rst
+++ b/Documentation/admin-guide/kernel-parameters.rst
@@ -68,6 +68,13 @@ For example one can add to the command line following parameter:
 
 where the final item represents CPUs 100,101,125,126,150,151,...
 
+The value "N" can be used to represent the numerically last CPU on the system,
+i.e "foo_cpus=16-N" would be equivalent to "16-31" on a 32 core system.
+
+Keep in mind that "N" is dynamic, so if system changes cause the bitmap width
+to change, such as less cores in the CPU list, then N and any ranges using N
+will also change.  Use the same on a small 4 core system, and "16-N" becomes
+"16-3" and now the same boot input will be flagged as invalid (start > end).
 
 
 This document may not be entirely up to date and comprehensive. The command
diff --git a/lib/bitmap.c b/lib/bitmap.c
index 833f152..9f4626a 100644
--- a/lib/bitmap.c
+++ b/lib/bitmap.c
@@ -519,11 +519,17 @@ static int bitmap_check_region(const struct region *r)
return 0;
 }
 
-static const char *bitmap_getnum(const char *str, unsigned int *num)
+static const char *bitmap_getnum(const char *str, unsigned int *num,
+unsigned int lastbit)
 {
unsigned long long n;
unsigned int len;
 
+   if (str[0] == 'N') {
+   *num = lastbit;
+   return str + 1;
+   }
+
len = _parse_integer(str, 10, &n);
if (!len)
return ERR_PTR(-EINVAL);
@@ -571,7 +577,9 @@ static const char *bitmap_find_region_reverse(const char *start, const char *end
 
 static const char *bitmap_parse_region(const char *str, struct region *r)
 {
-   str = bitmap_getnum(str, &r->start);
+   unsigned int lastbit = r->nbits - 1;
+
+   str = bitmap_getnum(str, &r->start, lastbit);
if (IS_ERR(str))
return str;
 
@@ -581,7 +589,7 @@ static const char *bitmap_parse_region(const char *str, struct region *r)
if (*str != '-')
return ERR_PTR(-EINVAL);
 
-   str = bitmap_getnum(str + 1, &r->end);
+   str = bitmap_getnum(str + 1, &r->end, lastbit);
if (IS_ERR(str))
return str;
 
@@ -591,14 +599,14 @@ static const char *bitmap_parse_region(const char *str, struct region *r)
if (*str != ':')
return ERR_PTR(-EINVAL);
 
-   str = bitmap_getnum(str + 1, &r->off);
+   str = bitmap_getnum(str + 1, &r->off, lastbit);
if (IS_ERR(str))
return str;
 
if (*str != '/')
return ERR_PTR(-EINVAL);
 
-   return bitmap_getnum(str + 1, &r->group_len);
+   return bitmap_getnum(str + 1, &r->group_len, lastbit);
 
 no_end:
r->end = r->start;
@@ -625,6 +633,10 @@ no_pattern:
  * From each group will be used only defined amount of bits.
  * Syntax: range:used_size/group_size
  * Example: 0-1023:2/256 ==> 0,1,256,257,512,513,768,769
+ * The value 'N' can be used as a dynamically substituted token for the
+ * maximum allowe

[tip: core/rcu] lib: test_bitmap: add tests for "N" alias

2021-04-11 Thread tip-bot2 for Paul Gortmaker
The following commit has been merged into the core/rcu branch of tip:

Commit-ID:     99c58d1adbca25fb3ee2469bf0904e1e3e021f7e
Gitweb:        https://git.kernel.org/tip/99c58d1adbca25fb3ee2469bf0904e1e3e021f7e
Author:        Paul Gortmaker
AuthorDate:    Sun, 21 Feb 2021 03:08:26 -05:00
Committer:     Paul E. McKenney
CommitterDate: Mon, 08 Mar 2021 14:16:58 -08:00

lib: test_bitmap: add tests for "N" alias

These are copies of existing tests, with just 31 --> N.  This ensures
the recently added "N" alias transparently works in any normally
numeric fields of a region specification.

Cc: Yury Norov 
Cc: Rasmus Villemoes 
Cc: Andy Shevchenko 
Acked-by: Yury Norov 
Signed-off-by: Paul Gortmaker 
Signed-off-by: Paul E. McKenney 
---
 lib/test_bitmap.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/lib/test_bitmap.c b/lib/test_bitmap.c
index 3c1c46d..9cd5755 100644
--- a/lib/test_bitmap.c
+++ b/lib/test_bitmap.c
@@ -353,6 +353,16 @@ static const struct test_bitmap_parselist parselist_tests[] __initconst = {
{0, "16-31:16/31", &exp1[3 * step], 32, 0},
{0, "31-31:31/31", &exp1[14 * step], 32, 0},
 
+   {0, "N-N", &exp1[14 * step], 32, 0},
+   {0, "0-0:1/N", &exp1[0], 32, 0},
+   {0, "0-0:N/N", &exp1[0], 32, 0},
+   {0, "0-15:16/N", &exp1[2 * step], 32, 0},
+   {0, "15-15:N/N", &exp1[13 * step], 32, 0},
+   {0, "15-N:1/N", &exp1[13 * step], 32, 0},
+   {0, "16-N:16/N", &exp1[3 * step], 32, 0},
+   {0, "N-N:N/N", &exp1[14 * step], 32, 0},
+
+   {0, "0-N:1/3,1-N:1/3,2-N:1/3", &exp1[8 * step], 32, 0},
{0, "0-31:1/3,1-31:1/3,2-31:1/3", &exp1[8 * step], 32, 0},
{0, "1-10:8/12,8-31:24/29,0-31:0/3", &exp1[9 * step], 32, 0},
 


[PATCH 8/8] rcu: deprecate "all" option to rcu_nocbs=

2021-02-21 Thread Paul Gortmaker
With the core bitmap support now accepting "N" as a placeholder for
the end of the bitmap, "all" can be represented as "0-N" and has the
advantage of not being specific to RCU (or any other subsystem).

So deprecate the use of "all" by removing documentation references
to it.  The support itself needs to remain for now, since we don't
know how many people out there are using it currently, but since it
is in an __init area anyway, it isn't worth losing sleep over.

Cc: Yury Norov 
Cc: Peter Zijlstra 
Cc: "Paul E. McKenney" 
Cc: Josh Triplett 
Signed-off-by: Paul Gortmaker 
---
 Documentation/admin-guide/kernel-parameters.txt | 4 +---
 kernel/rcu/tree_plugin.h| 6 ++
 2 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index a10b545c2070..a116c0ff0a91 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4037,9 +4037,7 @@
see CONFIG_RAS_CEC help text.
 
rcu_nocbs=  [KNL]
-   The argument is a cpu list, as described above,
-   except that the string "all" can be used to
-   specify every CPU on the system.
+   The argument is a cpu list, as described above.
 
In kernels built with CONFIG_RCU_NOCB_CPU=y, set
the specified list of CPUs to be no-callback CPUs.
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 7e291ce0a1d6..56788dfde922 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1463,14 +1463,12 @@ static void rcu_cleanup_after_idle(void)
 
 /*
  * Parse the boot-time rcu_nocb_mask CPU list from the kernel parameters.
- * The string after the "rcu_nocbs=" is either "all" for all CPUs, or a
- * comma-separated list of CPUs and/or CPU ranges.  If an invalid list is
- * given, a warning is emitted and all CPUs are offloaded.
+ * If the list is invalid, a warning is emitted and all CPUs are offloaded.
  */
 static int __init rcu_nocb_setup(char *str)
 {
alloc_bootmem_cpumask_var(&rcu_nocb_mask);
-   if (!strcasecmp(str, "all"))
+   if (!strcasecmp(str, "all"))/* legacy: use "0-N" instead */
cpumask_setall(rcu_nocb_mask);
else
if (cpulist_parse(str, rcu_nocb_mask)) {
-- 
2.30.0



[PATCH 6/8] lib: bitmap: support "N" as an alias for size of bitmap

2021-02-21 Thread Paul Gortmaker
While this is done for all bitmaps, the original use case in mind was
for CPU masks and cpulist_parse() as described below.

It seems that a common configuration is to use the 1st couple cores for
housekeeping tasks.  This tends to leave the remaining ones to form a
pool of similarly configured cores to take on the real workload of
interest to the user.

So on machine A - with 32 cores, it could be 0-3 for "system" and then
4-31 being used in boot args like nohz_full=, or rcu_nocbs= as part of
setting up the worker pool of CPUs.

But then newer machine B is added, and it has 48 cores, and so while
the 0-3 part remains unchanged, the pool setup cpu list becomes 4-47.

Multiple deployment becomes easier when we can just simply replace 31
and 47 with "N" and let the system substitute in the actual number at
boot; a number that it knows better than we do.

Cc: Yury Norov 
Cc: Peter Zijlstra 
Cc: "Paul E. McKenney" 
Cc: Rasmus Villemoes 
Cc: Andy Shevchenko 
Suggested-by: Yury Norov  # move it from CPU code
Signed-off-by: Paul Gortmaker 
---
 .../admin-guide/kernel-parameters.rst |  7 ++
 lib/bitmap.c  | 22 ++-
 2 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.rst 
b/Documentation/admin-guide/kernel-parameters.rst
index 682ab28b5c94..7733a773f5f8 100644
--- a/Documentation/admin-guide/kernel-parameters.rst
+++ b/Documentation/admin-guide/kernel-parameters.rst
@@ -68,6 +68,13 @@ For example one can add to the command line following 
parameter:
 
 where the final item represents CPUs 100,101,125,126,150,151,...
 
+The value "N" can be used to represent the numerically last CPU on the system,
+i.e "foo_cpus=16-N" would be equivalent to "16-31" on a 32 core system.
+
+Keep in mind that "N" is dynamic, so if system changes cause the bitmap width
+to change, such as less cores in the CPU list, then N and any ranges using N
+will also change.  Use the same on a small 4 core system, and "16-N" becomes
+"16-3" and now the same boot input will be flagged as invalid (start > end).
 
 
 This document may not be entirely up to date and comprehensive. The command
diff --git a/lib/bitmap.c b/lib/bitmap.c
index 833f152a2c43..9f4626a4c95f 100644
--- a/lib/bitmap.c
+++ b/lib/bitmap.c
@@ -519,11 +519,17 @@ static int bitmap_check_region(const struct region *r)
return 0;
 }
 
-static const char *bitmap_getnum(const char *str, unsigned int *num)
+static const char *bitmap_getnum(const char *str, unsigned int *num,
+unsigned int lastbit)
 {
unsigned long long n;
unsigned int len;
 
+   if (str[0] == 'N') {
+   *num = lastbit;
+   return str + 1;
+   }
+
len = _parse_integer(str, 10, &n);
if (!len)
return ERR_PTR(-EINVAL);
@@ -571,7 +577,9 @@ static const char *bitmap_find_region_reverse(const char *start, const char *end
 
 static const char *bitmap_parse_region(const char *str, struct region *r)
 {
-   str = bitmap_getnum(str, &r->start);
+   unsigned int lastbit = r->nbits - 1;
+
+   str = bitmap_getnum(str, &r->start, lastbit);
if (IS_ERR(str))
return str;
 
@@ -581,7 +589,7 @@ static const char *bitmap_parse_region(const char *str, struct region *r)
if (*str != '-')
return ERR_PTR(-EINVAL);
 
-   str = bitmap_getnum(str + 1, &r->end);
+   str = bitmap_getnum(str + 1, &r->end, lastbit);
if (IS_ERR(str))
return str;
 
@@ -591,14 +599,14 @@ static const char *bitmap_parse_region(const char *str, struct region *r)
if (*str != ':')
return ERR_PTR(-EINVAL);
 
-   str = bitmap_getnum(str + 1, &r->off);
+   str = bitmap_getnum(str + 1, &r->off, lastbit);
if (IS_ERR(str))
return str;
 
if (*str != '/')
return ERR_PTR(-EINVAL);
 
-   return bitmap_getnum(str + 1, &r->group_len);
+   return bitmap_getnum(str + 1, &r->group_len, lastbit);
 
 no_end:
r->end = r->start;
@@ -625,6 +633,10 @@ static const char *bitmap_parse_region(const char *str, 
struct region *r)
  * From each group will be used only defined amount of bits.
  * Syntax: range:used_size/group_size
  * Example: 0-1023:2/256 ==> 0,1,256,257,512,513,768,769
+ * The value 'N' can be used as a dynamically substituted token for the
+ * maximum allowed value; i.e (nmaskbits - 1).  Keep in mind that it is
+ * dynamic, so if system changes cause the bitmap width to change, such
+ * as more cores in a CPU list, then any ranges using N will also change.
  *
  * Returns: 0 on success, -errno on invalid input strings. Error values:
  *
-- 
2.30.0
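
The "N" substitution added to bitmap_getnum() in the patch above can be tried out in user space. What follows is only a minimal sketch under stated assumptions: sketch_getnum() is a hypothetical name, it returns NULL where the kernel code returns ERR_PTR() values, and it uses a plain digit loop in place of _parse_integer(). It is not the kernel implementation.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical user-space analogue of bitmap_getnum(): parse one
 * decimal field, or the token 'N', from str into *num.
 * Returns a pointer just past the consumed text, or NULL on error.
 */
static const char *sketch_getnum(const char *str, unsigned int *num,
				 unsigned int lastbit)
{
	unsigned long long n = 0;
	const char *p = str;

	assert(str != NULL && num != NULL);

	if (*p == 'N') {		/* 'N' stands for the last valid bit */
		*num = lastbit;
		return p + 1;
	}

	if (!isdigit((unsigned char)*p))
		return NULL;		/* no digits at all: syntax error */

	while (isdigit((unsigned char)*p)) {
		n = n * 10 + (unsigned int)(*p - '0');
		if (n > 0xffffffffULL)
			return NULL;	/* value does not fit in 32 bits */
		p++;
	}

	*num = (unsigned int)n;
	return p;
}
```

Called with lastbit = nbits - 1, the same input string resolves differently per machine: "N" yields 31 for a 32 bit map and 47 for a 48 bit map, while purely numeric fields are returned unchanged.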



[PATCH 4/8] lib: bitmap: fold nbits into region struct

2021-02-21 Thread Paul Gortmaker
This will reduce parameter passing and enable using nbits as part
of future dynamic region parameter parsing.

Cc: Yury Norov 
Cc: Rasmus Villemoes 
Cc: Andy Shevchenko 
Suggested-by: Yury Norov 
Acked-by: Yury Norov 
Reviewed-by: Andy Shevchenko 
Signed-off-by: Paul Gortmaker 
---
 lib/bitmap.c | 19 ++-
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/lib/bitmap.c b/lib/bitmap.c
index 75006c4036e9..162e2850c622 100644
--- a/lib/bitmap.c
+++ b/lib/bitmap.c
@@ -487,24 +487,24 @@ EXPORT_SYMBOL(bitmap_print_to_pagebuf);
 
 /*
  * Region 9-38:4/10 describes the following bitmap structure:
- * 0        9  12    18                  38
- * .........****......****......****......
- *          ^  ^     ^                   ^
- *      start  off   group_len         end
+ * 0        9  12    18                  38          N
+ * .........****......****......****..................
+ *          ^  ^     ^                   ^           ^
+ *      start  off   group_len         end       nbits
  */
 struct region {
unsigned int start;
unsigned int off;
unsigned int group_len;
unsigned int end;
+   unsigned int nbits;
 };
 
-static int bitmap_set_region(const struct region *r,
-   unsigned long *bitmap, int nbits)
+static int bitmap_set_region(const struct region *r, unsigned long *bitmap)
 {
unsigned int start;
 
-   if (r->end >= nbits)
+   if (r->end >= r->nbits)
return -ERANGE;
 
for (start = r->start; start <= r->end; start += r->group_len)
@@ -640,7 +640,8 @@ int bitmap_parselist(const char *buf, unsigned long *maskp, 
int nmaskbits)
struct region r;
long ret;
 
-   bitmap_zero(maskp, nmaskbits);
+   r.nbits = nmaskbits;
+   bitmap_zero(maskp, r.nbits);
 
while (buf) {
buf = bitmap_find_region(buf);
@@ -655,7 +656,7 @@ int bitmap_parselist(const char *buf, unsigned long *maskp, int nmaskbits)
if (ret)
return ret;
 
-   ret = bitmap_set_region(&r, maskp, nmaskbits);
+   ret = bitmap_set_region(&r, maskp);
if (ret)
return ret;
}
-- 
2.30.0



[PATCH 7/8] lib: test_bitmap: add tests for "N" alias

2021-02-21 Thread Paul Gortmaker
These are copies of existing tests, with just 31 --> N.  This ensures
the recently added "N" alias transparently works in any normally
numeric fields of a region specification.

Cc: Yury Norov 
Cc: Rasmus Villemoes 
Cc: Andy Shevchenko 
Signed-off-by: Paul Gortmaker 
---
 lib/test_bitmap.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/lib/test_bitmap.c b/lib/test_bitmap.c
index 9c6a88c480c1..a6048278d027 100644
--- a/lib/test_bitmap.c
+++ b/lib/test_bitmap.c
@@ -354,6 +354,16 @@ static const struct test_bitmap_parselist parselist_tests[] __initconst = {
{0, "16-31:16/31",  &exp1[3 * step], 32, 0},
{0, "31-31:31/31",  &exp1[14 * step], 32, 0},
 
+   {0, "N-N",  &exp1[14 * step], 32, 0},
+   {0, "0-0:1/N",  &exp1[0], 32, 0},
+   {0, "0-0:N/N",  &exp1[0], 32, 0},
+   {0, "0-15:16/N", &exp1[2 * step], 32, 0},
+   {0, "15-15:N/N", &exp1[13 * step], 32, 0},
+   {0, "15-N:1/N", &exp1[13 * step], 32, 0},
+   {0, "16-N:16/N", &exp1[3 * step], 32, 0},
+   {0, "N-N:N/N",  &exp1[14 * step], 32, 0},
+
+   {0, "0-N:1/3,1-N:1/3,2-N:1/3",  &exp1[8 * step], 32, 0},
{0, "0-31:1/3,1-31:1/3,2-31:1/3",   &exp1[8 * step], 32, 0},
{0, "1-10:8/12,8-31:24/29,0-31:0/3", &exp1[9 * step], 32, 0},
 
-- 
2.30.0



[PATCH 5/8] lib: bitmap: move ERANGE check from set_region to check_region

2021-02-21 Thread Paul Gortmaker
It makes sense to do all the checks in check_region() and not 1/2
in check_region and 1/2 in set_region.

Since set_region is called immediately after check_region, the net
effect on runtime is zero, but it gets rid of an if (...) return...

Cc: Yury Norov 
Cc: Rasmus Villemoes 
Cc: Andy Shevchenko 
Acked-by: Yury Norov 
Reviewed-by: Andy Shevchenko 
Signed-off-by: Paul Gortmaker 
---
 lib/bitmap.c | 14 +-
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/lib/bitmap.c b/lib/bitmap.c
index 162e2850c622..833f152a2c43 100644
--- a/lib/bitmap.c
+++ b/lib/bitmap.c
@@ -500,17 +500,12 @@ struct region {
unsigned int nbits;
 };
 
-static int bitmap_set_region(const struct region *r, unsigned long *bitmap)
+static void bitmap_set_region(const struct region *r, unsigned long *bitmap)
 {
unsigned int start;
 
-   if (r->end >= r->nbits)
-   return -ERANGE;
-
for (start = r->start; start <= r->end; start += r->group_len)
bitmap_set(bitmap, start, min(r->end - start + 1, r->off));
-
-   return 0;
 }
 
 static int bitmap_check_region(const struct region *r)
@@ -518,6 +513,9 @@ static int bitmap_check_region(const struct region *r)
if (r->start > r->end || r->group_len == 0 || r->off > r->group_len)
return -EINVAL;
 
+   if (r->end >= r->nbits)
+   return -ERANGE;
+
return 0;
 }
 
@@ -656,9 +654,7 @@ int bitmap_parselist(const char *buf, unsigned long *maskp, int nmaskbits)
if (ret)
return ret;
 
-   ret = bitmap_set_region(&r, maskp);
-   if (ret)
-   return ret;
+   bitmap_set_region(&r, maskp);
}
 
return 0;
-- 
2.30.0
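
After the patch above, both error classes are decided in one place. The split between -EINVAL and -ERANGE can be illustrated with a small user-space sketch. The name sketch_region and the helper below are hypothetical stand-ins that mirror the struct region fields shown in the diff; this is an illustration, not the kernel code itself.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical mirror of the kernel's struct region. */
struct sketch_region {
	unsigned int start;
	unsigned int off;
	unsigned int group_len;
	unsigned int end;
	unsigned int nbits;
};

/*
 * Same ordering as the patched bitmap_check_region(): malformed regions
 * are rejected with -EINVAL first, and only a well-formed region that
 * reaches past the bitmap width gets -ERANGE.
 */
static int sketch_check_region(const struct sketch_region *r)
{
	assert(r != NULL);

	if (r->start > r->end || r->group_len == 0 || r->off > r->group_len)
		return -EINVAL;

	if (r->end >= r->nbits)
		return -ERANGE;

	return 0;
}
```

With these checks, "8-8" on an 8 bit map is well formed but out of range (-ERANGE), while "16-N" on a 4 bit map resolves to start 16 and end 3 and is rejected as -EINVAL because start > end. The off and group_len values a caller would fill in for a plain "a-b" range are an assumption here.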



[PATCH 2/8] lib: test_bitmap: add tests to trigger ERANGE case.

2021-02-21 Thread Paul Gortmaker
Add tests that specify a valid range, but one that is outside the
width of the bitmap to which it is to be applied.  These should
trigger an -ERANGE response from the code.

Cc: Yury Norov 
Cc: Rasmus Villemoes 
Cc: Andy Shevchenko 
Reviewed-by: Andy Shevchenko 
Signed-off-by: Paul Gortmaker 
---
 lib/test_bitmap.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/test_bitmap.c b/lib/test_bitmap.c
index 589f2a34ceba..172ffbfa83c4 100644
--- a/lib/test_bitmap.c
+++ b/lib/test_bitmap.c
@@ -338,6 +338,8 @@ static const struct test_bitmap_parselist parselist_tests[] 
__initconst = {
{-EINVAL, "-1", NULL, 8, 0},
{-EINVAL, "-0", NULL, 8, 0},
{-EINVAL, "10-1", NULL, 8, 0},
+   {-ERANGE, "8-8", NULL, 8, 0},
+   {-ERANGE, "0-31", NULL, 8, 0},
{-EINVAL, "0-31:", NULL, 32, 0},
{-EINVAL, "0-31:0", NULL, 32, 0},
{-EINVAL, "0-31:0/", NULL, 32, 0},
-- 
2.30.0



[PATCH 3/8] lib: test_bitmap: add more start-end:offset/len tests

2021-02-21 Thread Paul Gortmaker
There are inputs to bitmap_parselist() that would probably never
be entered manually by a person, but might result from some kind of
automated input generator.  Things like ranges of length 1, or group
lengths longer than nbits, overlaps, or offsets of zero.

Adding these tests serves two purposes:

1) document what might seem odd but nonetheless valid input.

2) don't regress from what we currently accept as valid.

Cc: Yury Norov 
Cc: Rasmus Villemoes 
Cc: Andy Shevchenko 
Acked-by: Yury Norov 
Reviewed-by: Andy Shevchenko 
Signed-off-by: Paul Gortmaker 
---
 lib/test_bitmap.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/lib/test_bitmap.c b/lib/test_bitmap.c
index 172ffbfa83c4..9c6a88c480c1 100644
--- a/lib/test_bitmap.c
+++ b/lib/test_bitmap.c
@@ -35,6 +35,8 @@ static const unsigned long exp1[] __initconst = {
BITMAP_FROM_U64(0xULL),
BITMAP_FROM_U64(0xULL),
BITMAP_FROM_U64(0),
+   BITMAP_FROM_U64(0x00008000),
+   BITMAP_FROM_U64(0x80000000),
 };
 
 static const unsigned long exp2[] __initconst = {
@@ -335,6 +337,26 @@ static const struct test_bitmap_parselist parselist_tests[] __initconst = {
{0, " ,  ,,  , ,   ",   &exp1[12 * step], 8, 0},
{0, " ,  ,,  , ,   \n", &exp1[12 * step], 8, 0},
 
+   {0, "0-0",  &exp1[0], 32, 0},
+   {0, "1-1",  &exp1[1 * step], 32, 0},
+   {0, "15-15", &exp1[13 * step], 32, 0},
+   {0, "31-31", &exp1[14 * step], 32, 0},
+
+   {0, "0-0:0/1",  &exp1[12 * step], 32, 0},
+   {0, "0-0:1/1",  &exp1[0], 32, 0},
+   {0, "0-0:1/31", &exp1[0], 32, 0},
+   {0, "0-0:31/31", &exp1[0], 32, 0},
+   {0, "1-1:1/1",  &exp1[1 * step], 32, 0},
+   {0, "0-15:16/31",   &exp1[2 * step], 32, 0},
+   {0, "15-15:1/2", &exp1[13 * step], 32, 0},
+   {0, "15-15:31/31",  &exp1[13 * step], 32, 0},
+   {0, "15-31:1/31",   &exp1[13 * step], 32, 0},
+   {0, "16-31:16/31",  &exp1[3 * step], 32, 0},
+   {0, "31-31:31/31",  &exp1[14 * step], 32, 0},
+
+   {0, "0-31:1/3,1-31:1/3,2-31:1/3",   &exp1[8 * step], 32, 0},
+   {0, "1-10:8/12,8-31:24/29,0-31:0/3", &exp1[9 * step], 32, 0},
+
+
{-EINVAL, "-1", NULL, 8, 0},
{-EINVAL, "-0", NULL, 8, 0},
{-EINVAL, "10-1", NULL, 8, 0},
-- 
2.30.0



[PATCH v5 0/8] support for bitmap (and hence CPU) list "N" abbreviation

2021-02-21 Thread Paul Gortmaker
This is the 5th and final version of this series.  We got some good
improvements, like adding self-tests, using "N" as "just another number"
that could be used anywhere, and making things not CPU specific.

But now it is time to close this review out since it is down to just
hand-wringing over hypothetical use cases, bikeshedding on upper/lower
case, and a wild goose chase on trying to avoid adding a function arg.

So, once again - thanks to all who provided input; it was all considered
even if not all of it was used.  And in that vein, just to be clear:

1) There will be no adaptive modifying or guessing what the user meant if
a range turns out to be invalid.  The caller will be responsible for
handling the -EINVAL just as things are currently today.

2) There will be no use of "L" or lower case "n" because there is simply
no need for it.  Yes, it would be simple enough to add, but it complicates
things and would also be impossible to remove later, once it went mainline.


The original text from v4 follows:

The basic objective here was to add support for "nohz_full=8-N" and/or
"rcu_nocbs=4-N" -- essentially introduce "N" as a portable reference
to the last core, evaluated at boot for anything using a CPU list.

The thinking behind this is that people carve off a few early CPUs to
support housekeeping tasks, and perhaps dedicate one to a busy I/O
peripheral, and then the remaining pool of CPUs out to the end are a
part of a commonly configured pool used for the real work the user
cares about.

Extend that logic out to a fleet of machines - some new, and some
nearing EOL, and you've probably got a wide range of core counts to
contend with - even though the early number of cores dedicated to the
system overhead probably doesn't vary.

This change would enable sysadmins to have a common bootarg across all
such systems, and would also avoid any off-by-one fencepost errors that
happen for users who might briefly forget that core counts start at zero.

Originally I did this at the CPU subsys level, but Yury suggested it
be moved down further to bitmap level itself, which made the core 
implementation smaller and less complex, but the series longer.

New self tests are added to better exercise what bitmap range/region
currently supports, and new tests are added for the new "N" support.

Also tested boot arg and the post-boot cgroup use case as per below:

   root@hackbox:~# cat /proc/cmdline 
   BOOT_IMAGE=/boot/bzImage root=/dev/sda1 rcu_nocbs=2,3,8-N:1/2
   root@hackbox:~# dmesg|grep Offl
   rcu: Offload RCU callbacks from CPUs: 2-3,8,10,12,14.

   root@hackbox:/sys/fs/cgroup/cpuset/foo# cat cpuset.cpus
   
   root@hackbox:/sys/fs/cgroup/cpuset/foo# /bin/echo 10-N > cpuset.cpus
   root@hackbox:/sys/fs/cgroup/cpuset/foo# cat cpuset.cpus
   10-15
   root@hackbox:/sys/fs/cgroup/cpuset/foo# /bin/echo N-N:N/N > cpuset.cpus
   root@hackbox:/sys/fs/cgroup/cpuset/foo# cat cpuset.cpus
   15

This was on a 16 core machine with CONFIG_NR_CPUS=16 in .config file.
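
The CPU lists reported above can be cross-checked by hand with a small user-space sketch of the region expansion. The helper below is hypothetical and limited to 64 bits; it assumes that a plain "a-b" range is treated as one group covering the whole range, and that "N" has already been replaced by the last bit number (15 on this 16 core box).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: set the bits described by start-end:off/group_len
 * in a 64 bit mask, using the first "off" bits of every group.
 */
static uint64_t sketch_set_region(uint64_t mask, unsigned int start,
				  unsigned int end, unsigned int off,
				  unsigned int group_len)
{
	unsigned int s, b, n;

	assert(start <= end && end < 64);
	assert(group_len > 0 && off <= group_len);

	for (s = start; ; s += group_len) {
		n = end - s + 1;		/* bits left in the range */
		if (n > off)
			n = off;		/* use at most "off" per group */
		for (b = s; b < s + n; b++)
			mask |= (uint64_t)1 << b;
		if (group_len > end - s)	/* next group starts past end */
			break;
	}
	return mask;
}
```

Feeding it "2", "3" and "8-15:1/2" in turn sets bits 2, 3, 8, 10, 12 and 14, which is the list in the dmesg line, and "15-15:15/15" (the resolved form of "N-N:N/N") sets only bit 15.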

Note that "N" is a dynamic quantity, and can change scope if the bitmap
is changed in size.  So at the risk of stating the obvious, don't use it
for "burn_eFuse=128-N" or "secure_erase_firmware=32-N" type stuff.

Paul.
---

[v5: go back to v3 location of "nbits" in region.  Add acks/reviewed.]

[v4: pair nbits with region, instead of inside it.  Split EINVAL and
 ERANGE tests.  Don't handle start/end/offset within a macro to
 abstract away nbits usage.  Added some Reviewed-by/Ack tags.]
 
https://lore.kernel.org/lkml/20210209225907.78405-1-paul.gortma...@windriver.com/

[v3: Allow "N" to be used anywhere in the region spec, i.e. "N-N:N/N" vs.
 just being allowed at end of range like "0-N".  Add new self-tests.  Drop
 "all" and "none" aliases as redundant and not worth the extra complication. ]
 
https://lore.kernel.org/lkml/20210126171141.122639-1-paul.gortma...@windriver.com/

[v2: push code down from cpu subsys to core bitmap code as per
 Yury's comments.  Change "last" to simply be "N" as per PeterZ.]
 
https://lore.kernel.org/lkml/20210121223355.59780-1-paul.gortma...@windriver.com/

[v1: https://lore.kernel.org/lkml/20210106004850.GA11682@paulmck-ThinkPad-P72/]

Cc: Li Zefan 
Cc: Ingo Molnar 
Cc: Yury Norov 
Cc: Thomas Gleixner 
Cc: Josh Triplett 
Cc: Peter Zijlstra 
Cc: "Paul E. McKenney" 
Cc: Frederic Weisbecker 
Cc: Rasmus Villemoes 
Cc: Andy Shevchenko 



Paul Gortmaker (8):
  lib: test_bitmap: clearly separate ERANGE from EINVAL tests.
  lib: test_bitmap: add tests to trigger ERANGE case.
  lib: test_bitmap: add more start-end:offset/len tests
  lib: bitmap: fold nbits into region struct
  lib: bitmap: move ERANGE check from set_region to check_region
  lib: bitmap: support "N" as an alias for size of bitmap
  li

[PATCH 1/8] lib: test_bitmap: clearly separate ERANGE from EINVAL tests.

2021-02-21 Thread Paul Gortmaker
This block of tests was meant to find/flag incorrect use of the ":"
and "/" separators (syntax errors) and invalid (zero) group len.

However they were specified with an 8 bit width and 32 bit operations,
so they really contained two errors (EINVAL and ERANGE).

Promote them to 32 bit so it is clear what they are meant to target.
Then we can add tests specific for ERANGE (no syntax errors, just
doing 32bit op on 8 bit width, plus a typical 9-on-8 fencepost error).

Cc: Yury Norov 
Cc: Rasmus Villemoes 
Cc: Andy Shevchenko 
Reviewed-by: Andy Shevchenko 
Signed-off-by: Paul Gortmaker 
---
 lib/test_bitmap.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/lib/test_bitmap.c b/lib/test_bitmap.c
index 4425a1dd4ef1..589f2a34ceba 100644
--- a/lib/test_bitmap.c
+++ b/lib/test_bitmap.c
@@ -338,12 +338,12 @@ static const struct test_bitmap_parselist 
parselist_tests[] __initconst = {
{-EINVAL, "-1", NULL, 8, 0},
{-EINVAL, "-0", NULL, 8, 0},
{-EINVAL, "10-1", NULL, 8, 0},
-   {-EINVAL, "0-31:", NULL, 8, 0},
-   {-EINVAL, "0-31:0", NULL, 8, 0},
-   {-EINVAL, "0-31:0/", NULL, 8, 0},
-   {-EINVAL, "0-31:0/0", NULL, 8, 0},
-   {-EINVAL, "0-31:1/0", NULL, 8, 0},
-   {-EINVAL, "0-31:10/1", NULL, 8, 0},
+   {-EINVAL, "0-31:", NULL, 32, 0},
+   {-EINVAL, "0-31:0", NULL, 32, 0},
+   {-EINVAL, "0-31:0/", NULL, 32, 0},
+   {-EINVAL, "0-31:0/0", NULL, 32, 0},
+   {-EINVAL, "0-31:1/0", NULL, 32, 0},
+   {-EINVAL, "0-31:10/1", NULL, 32, 0},
{-EOVERFLOW, "0-98765432123456789:10/1", NULL, 8, 0},
 
{-EINVAL, "a-31", NULL, 8, 0},
-- 
2.30.0



Re: [PATCH 6/8] lib: bitmap: support "N" as an alias for size of bitmap

2021-02-21 Thread Paul Gortmaker
[Re: [PATCH 6/8] lib: bitmap: support "N" as an alias for size of bitmap] On 
11/02/2021 (Thu 17:24) Yury Norov wrote:

> On Wed, Feb 10, 2021 at 06:49:30PM +0200, Andy Shevchenko wrote:
> > On Wed, Feb 10, 2021 at 10:58:25AM -0500, Paul Gortmaker wrote:
> > > [Re: [PATCH 6/8] lib: bitmap: support "N" as an alias for size of bitmap] 
> > > On 09/02/2021 (Tue 15:16) Yury Norov wrote:
> > > 
> > > > On Tue, Feb 9, 2021 at 3:01 PM Paul Gortmaker
> > > >  wrote:
> > > 
> > > [...]
> > > 
> > > > > -static const char *bitmap_getnum(const char *str, unsigned int *num)
> > > > > +static const char *bitmap_getnum(const char *str, unsigned int *num,
> > > > > +unsigned int lastbit)
> > > > 
> > > > The idea of struct bitmap_region is to avoid passing the lastbit to
> > > > the functions. But here you do pass it. Can you please be consistent?
> > > > Or if I misunderstand the idea of struct bitmap_region, can you
> > > > please clarify it?
> > > > 
> > > > Also, I don't think that in this specific case it's worth it to create
> > > > a hierarchy of structures. Just adding lastbits to struct region will
> > > > be simpler and more transparent.
> > > 
> > > I'm getting mixed messages from different people as to what is wanted
> > > here.
> > > 
> > > Here is what the code looks like now; only relevant lines shown:
> > > 
> > >  ---
> > > int bitmap_parselist(const char *buf, unsigned long *maskp, int nmaskbits)
> > > {
> > > 
> > > struct region r;
> > > 
> > > bitmap_parse_region(buf, &r);   <---
> > > bitmap_check_region(&r);
> > > bitmap_set_region(&r, maskp, nmaskbits);
> > > }
> > > 
> > > static const char *bitmap_parse_region(const char *str, struct region *r)
> > > {
> > > bitmap_getnum(str, &r->start);
> > > bitmap_getnum(str + 1, &r->end);
> > > bitmap_getnum(str + 1, &r->off);
> > > bitmap_getnum(str + 1, &r->group_len);
> > > }
> > > 
> > > static const char *bitmap_getnum(const char *str, unsigned int *num)
> > > {
> > >   /* PG: We need nmaskbits here for N processing. */
> > > }
> > >  ---
> > > 
> > > 
> > > Note the final function - the one where you asked to locate the N
> > > processing into -- does not take a region.  So even if we bundle nbits
> > > into the region struct, it doesn't get the data to where we need it.

Yury - you asked why there was an arg passed -- "lastbit"  -- and from
your reply, I don't think you fully read my answer - or at least missed
the three key sentences above as to why "lastbit" was passed.

> > > 
> > > Choices:
> > > 
> > > 1) pass in nbits just like bitmap_set_region() does currently.
> > > 
> > > 2) add nbits to region and pass full region instead of start/end/off.
> > > 
> > > 2a) add nbits to region and pass full region and also start/end/off.
> > > 
> > > 3) use *num as a bi-directional data path and initialize with nbits.
> > > 
> > > 
> > > Yury doesn't want us to add any function args -- i.e. not to do #1.
> > > 
> > > Andy didn't like #2 because it "hides" that we are writing to r.
> > > 
> > > I ruled out sending 2a -- bitmap_getnum(str, r, &r->end)  because
> > > it adds an arg, AND seems rather redundant to pass r and r->field.
> > > 
> > > The #3 is the smallest change - but seems like we are trying to be
> > > too clever just to save a line of code or a couple bytes. (see below)
> > > 
> > > Yury - in your reply to patch 5, you indicate you wrote the region
> > > code and want me to go back to putting nbits into region directly.
> > > 
> > > Can you guys please clarify who is maintainer and hence exactly how
> > > you want this relatively minor detail handled?  I'll gladly do it
> > > in whatever way the maintainer wants just to get this finally done.
> > 
> > Funny that there is no maintainer of the code.
> > That said, I consider #1 or #3 is good enough. Rationale for
> > - #1: it doesn't touch purity of getnum(), I think it's good enough not to 
> > kn

Re: [PATCH v4 0/8] support for bitmap (and hence CPU) list "N" abbreviation

2021-02-21 Thread Paul Gortmaker
[Re: [PATCH v4 0/8] support for bitmap (and hence CPU) list "N" abbreviation] 
On 10/02/2021 (Wed 15:50) Yury Norov wrote:

> On Wed, Feb 10, 2021 at 9:57 AM Paul E. McKenney  wrote:
> >
> > On Wed, Feb 10, 2021 at 06:26:54PM +0200, Andy Shevchenko wrote:
> > > On Tue, Feb 09, 2021 at 05:58:59PM -0500, Paul Gortmaker wrote:
> > > > The basic objective here was to add support for "nohz_full=8-N" and/or
> > > > "rcu_nocbs=4-N" -- essentially introduce "N" as a portable reference
> > > > to the last core, evaluated at boot for anything using a CPU list.
> > >
> > > I thought we kinda agreed that N is confusing and L is better.
> > > N to me is equal to 32 on 32 core system as *number of cores / CPUs*.
> > > While L sounds better as *last available CPU number*.
> >
> > The advantage of "N" is that people will automatically recognize it as
> > "last thing" or "number of things" because "N" has long been used in
> > both senses.  In contrast, someone seeing "0-L" for the first time is
> > likely to go "What???".
> >
> > Besides, why would someone interpret "N" as "number of CPUs" when doing
> > that almost always gets you an invalid CPU number?
> >
> > Thanx, Paul
> 
> I have no strong opinion about a letter, but I like Andy's idea to make it
> case-insensitive.

It is trivial to add later if someone can prove a genuine need for it,
but it is impossible to remove later if we add it now for no reason.

> 
> There is another comment from the previous iteration not addressed so far.

Actually, no - it was addressed in detail already:

https://lore.kernel.org/lkml/20210127091238.gh23...@windriver.com/

> This idea of the N notation is to make the bitmap list interface more robust
> when we share the configs between different machines. What we have now
> is definitely a good thing, but not completely portable except for cases
> 'N', '0-N' and 'N-N'.
> 
> For example, if one user adds rcu_nocbs= '4-N', and it works perfectly fine
> for him, another user with NR_CPUS == 2 will fail to boot with such a config.

Firstly there is no "fail to boot" from "rcu_nocbs=" -- that
just doesn't happen.   In any case, as you can see, I added in v4 the
documentation (as you requested) for this case - in several places.

And I explained in the thread above why any attempt to do some kind of
mapping policy was doomed to just add confusion and end up doing the
wrong thing.  And the discussion ended with that.

So I'm not clear why it was brought up again here as if I just ignored
your "broken config" concerns and never addressed them.

In any case as others have indicated, it serves no immediate purpose to
over-think this and start adding corner case reactions to use cases that
simply don't exist and probably never will.

Thanks,
Paul.
--

> 
> This is not a problem of course in case of absolute values because nobody
> guaranteed robustness. But this N feature would be barely useful in practice,
> except for 'N', '0-N' and 'N-N' as I mentioned before, because there's always
> a chance to end up with a broken config.
> 
> We can improve on robustness a lot if we take care of this case. For me,
> the more reliable interface would look like this:
> 1. chunks without N work as before.
> 2. if 'a-N' is passed where a>=N, we drop chunk and print warning message
> 3. if 'a-N' is passed where a>=N together with a control key, we set last bit
> and print warning.
> 
> For example, on 2-core CPU:
> "4-2" --> error
> "4-4" --> error
> "4-N" --> drop and warn
> "X, 4-N" --> set last bit and warn
> 
> Any comments?


[tip: core/rcu] docs: Fix typos and drop/fix dead links in RCU documentation

2021-02-15 Thread tip-bot2 for Paul Gortmaker
The following commit has been merged into the core/rcu branch of tip:

Commit-ID:     9d3a04853fe640e0eba2c0799c880b7dcf190219
Gitweb:        https://git.kernel.org/tip/9d3a04853fe640e0eba2c0799c880b7dcf190219
Author:        Paul Gortmaker
AuthorDate:    Sat, 28 Nov 2020 15:32:59 -05:00
Committer:     Paul E. McKenney
CommitterDate: Mon, 04 Jan 2021 13:35:14 -08:00

docs: Fix typos and drop/fix dead links in RCU documentation

It appears the Compaq link moved to a machine at HP for a while
after the merger of the two, but that doesn't work either.  A search
of HP for "wiz_2637" (w and w/o html suffix) comes up empty.

Since the references aren't critical to the documents we remove them.

Also, the lkml.kernel.org/g links have been broken for ages, so replace
them with lore.kernel.org/r links - standardize on lore for all links too.

Note that we put off fixing these 4y ago - presumably thinking that a
treewide fixup was pending.  Probably safe to go fix the RCU ones now.

https://lore.kernel.org/r/20160915144926.gd10...@linux.vnet.ibm.com/

Cc: Michael Opdenacker 
Cc: Steven Rostedt 
Cc: "Paul E. McKenney" 
Signed-off-by: Paul Gortmaker 
Signed-off-by: Paul E. McKenney 
---
 Documentation/RCU/Design/Requirements/Requirements.rst | 23 -
 Documentation/RCU/checklist.rst|  8 +--
 2 files changed, 14 insertions(+), 17 deletions(-)

diff --git a/Documentation/RCU/Design/Requirements/Requirements.rst 
b/Documentation/RCU/Design/Requirements/Requirements.rst
index 1e3df77..f32f8fa 100644
--- a/Documentation/RCU/Design/Requirements/Requirements.rst
+++ b/Documentation/RCU/Design/Requirements/Requirements.rst
@@ -321,11 +321,10 @@ do_something_gp_buggy() below:
   12 }
 
 However, this temptation must be resisted because there are a
-surprisingly large number of ways that the compiler (to say nothing of
-`DEC Alpha CPUs <https://h71000.www7.hp.com/wizard/wiz_2637.html>`__)
-can trip this code up. For but one example, if the compiler were short
-of registers, it might choose to refetch from ``gp`` rather than keeping
-a separate copy in ``p`` as follows:
+surprisingly large number of ways that the compiler (or weak ordering
+CPUs like the DEC Alpha) can trip this code up. For but one example, if
+the compiler were short of registers, it might choose to refetch from
+``gp`` rather than keeping a separate copy in ``p`` as follows:
 
::
 
@@ -1183,7 +1182,7 @@ costs have plummeted. However, as I learned from Matt 
Mackall's
 `bloatwatch <http://elinux.org/Linux_Tiny-FAQ>`__ efforts, memory
 footprint is critically important on single-CPU systems with
 non-preemptible (``CONFIG_PREEMPT=n``) kernels, and thus `tiny
-RCU <https://lkml.kernel.org/g/20090113221724.ga15...@linux.vnet.ibm.com>`__
+RCU <https://lore.kernel.org/r/20090113221724.ga15...@linux.vnet.ibm.com>`__
 was born. Josh Triplett has since taken over the small-memory banner
 with his `Linux kernel tinification <https://tiny.wiki.kernel.org/>`__
 project, which resulted in `SRCU <#Sleepable%20RCU>`__ becoming optional
@@ -1624,7 +1623,7 @@ against mishaps and misuse:
init_rcu_head() and cleaned up with destroy_rcu_head().
Mathieu Desnoyers made me aware of this requirement, and also
supplied the needed
-   `patch <https://lkml.kernel.org/g/20100319013024.GA28456@Krystal>`__.
+   `patch <https://lore.kernel.org/r/20100319013024.GA28456@Krystal>`__.
 #. An infinite loop in an RCU read-side critical section will eventually
trigger an RCU CPU stall warning splat, with the duration of
“eventually” being controlled by the ``RCU_CPU_STALL_TIMEOUT``
@@ -1716,7 +1715,7 @@ requires almost all of them be hidden behind a 
``CONFIG_RCU_EXPERT``
 
 This all should be quite obvious, but the fact remains that Linus
 Torvalds recently had to
-`remind 
<https://lkml.kernel.org/g/ca+55afy4wccwal4okts8wxhgz5h-ibecy_meg9c4mnqrunw...@mail.gmail.com>`__
+`remind 
<https://lore.kernel.org/r/ca+55afy4wccwal4okts8wxhgz5h-ibecy_meg9c4mnqrunw...@mail.gmail.com>`__
 me of this requirement.
 
 Firmware Interface
@@ -1837,9 +1836,9 @@ NMI handlers.
 
 The name notwithstanding, some Linux-kernel architectures can have
 nested NMIs, which RCU must handle correctly. Andy Lutomirski `surprised
-me 
<https://lkml.kernel.org/r/CALCETrXLq1y7e_dKFPgou-FKHB6Pu-r8+t-6Ds+8=va7anb...@mail.gmail.com>`__
+me 
<https://lore.kernel.org/r/CALCETrXLq1y7e_dKFPgou-FKHB6Pu-r8+t-6Ds+8=va7anb...@mail.gmail.com>`__
 with this requirement; he also kindly surprised me with `an
-algorithm 
<https://lkml.kernel.org/r/CALCETrXSY9JpW3uE6H8WYk81sg56qasA2aqmjMPsq5dOtzso=g...@mail.gmail.com>`__
+algorithm 
<https://lore.kernel.org/r/CALCETrXSY9JpW3uE6H8WYk81sg56qasA2aqmjMPsq5dOtzso=g...@mail.gmail.com>`__
 that meets this requirement.
 
 Furthermore, NMI handlers can be interrupted by what appear to RCU to be
@@ -2264,7 +2263,7 @@ more extreme measur

Re: [PATCH 6/8] lib: bitmap: support "N" as an alias for size of bitmap

2021-02-10 Thread Paul Gortmaker
[Re: [PATCH 6/8] lib: bitmap: support "N" as an alias for size of bitmap] On 
09/02/2021 (Tue 15:16) Yury Norov wrote:

> On Tue, Feb 9, 2021 at 3:01 PM Paul Gortmaker
>  wrote:

[...]

> >
> > -static const char *bitmap_getnum(const char *str, unsigned int *num)
> > +static const char *bitmap_getnum(const char *str, unsigned int *num,
> > +unsigned int lastbit)
> 
> The idea of struct bitmap_region is avoid passing the lastbit to the 
> functions.
> But here you do pass. Can you please be consistent? Or if I misunderstand
> the idea of struct bitmap_region, can you please clarify it?
> 
> Also, I don't think that in this specific case it's worth it to create
> a hierarchy of
> structures. Just adding lastbits to struct region will be simpler and more
> transparent.

I'm getting mixed messages from different people as to what is wanted here.

Here is what the code looks like now; only relevant lines shown:

 ---
int bitmap_parselist(const char *buf, unsigned long *maskp, int nmaskbits)
{

	struct region r;

	bitmap_parse_region(buf, &r);   <---
	bitmap_check_region(&r);
	bitmap_set_region(&r, maskp, nmaskbits);
}

static const char *bitmap_parse_region(const char *str, struct region *r)
{
	bitmap_getnum(str, &r->start);
	bitmap_getnum(str + 1, &r->end);
	bitmap_getnum(str + 1, &r->off);
	bitmap_getnum(str + 1, &r->group_len);
}

static const char *bitmap_getnum(const char *str, unsigned int *num)
{
	/* PG: We need nmaskbits here for N processing. */
}
 ---


Note the final function -- the one where you asked me to put the N
processing -- does not take a region.  So even if we bundle nbits
into the region struct, it doesn't get the data to where we need it.

Choices:

1) pass in nbits just like bitmap_set_region() does currently.

2) add nbits to region and pass full region instead of start/end/off.

2a) add nbits to region and pass full region and also start/end/off.

3) use *num as a bi-directional data path and initialize with nbits.


Yury doesn't want us to add any function args -- i.e. not to do #1.

Andy didn't like #2 because it "hides" that we are writing to r.

I ruled out sending 2a -- bitmap_getnum(str, r, &r->end)  because
it adds an arg, AND seems rather redundant to pass r and r->field.

The #3 is the smallest change - but seems like we are trying to be
too clever just to save a line of code or a couple bytes. (see below)

Yury - in your reply to patch 5, you indicate you wrote the region
code and want me to go back to putting nbits into region directly.

Can you guys please clarify who is maintainer and hence exactly how
you want this relatively minor detail handled?  I'll gladly do it
in whatever way the maintainer wants just to get this finally done.

I'd rather not keep going in circles and guessing and annoying everyone
else on the Cc: list by filling their inbox any more than I already have.

That would help a lot in getting this finished.

Thanks,
Paul.
--

Example #3 -- not sent..

+#define DECLARE_REGION(rname, initval) \
+struct region rname = {\
+   .start = initval,   \
+   .off = initval, \
+   .group_len = initval,   \
+   .end = initval, \
+}

[...]

-   struct region r;
+   DECLARE_REGION(r, nmaskbits - 1);   /* "N-N:N/N" */

[...]

+/*
+ * Seeing 'N' tells us to leave the value of "num" unchanged (which will
+ * be the max value for the width of the bitmap, set via DECLARE_REGION).
+ */
 static const char *bitmap_getnum(const char *str, unsigned int *num)
 {
unsigned long long n;
unsigned int len;
 
+   if (str[0] == 'N')  /* nothing to do, just advance str */
+   return str + 1;



[PATCH 6/8] lib: bitmap: support "N" as an alias for size of bitmap

2021-02-09 Thread Paul Gortmaker
While this is done for all bitmaps, the original use case in mind was
for CPU masks and cpulist_parse() as described below.

It seems that a common configuration is to use the 1st couple cores for
housekeeping tasks.  This tends to leave the remaining ones to form a
pool of similarly configured cores to take on the real workload of
interest to the user.

So on machine A - with 32 cores, it could be 0-3 for "system" and then
4-31 being used in boot args like nohz_full=, or rcu_nocbs= as part of
setting up the worker pool of CPUs.

But then newer machine B is added, and it has 48 cores, and so while
the 0-3 part remains unchanged, the pool setup cpu list becomes 4-47.

Multiple deployment becomes easier when we can just simply replace 31
and 47 with "N" and let the system substitute in the actual number at
boot; a number that it knows better than we do.

Cc: Yury Norov 
Cc: Peter Zijlstra 
Cc: "Paul E. McKenney" 
Cc: Rasmus Villemoes 
Cc: Andy Shevchenko 
Suggested-by: Yury Norov  # move it from CPU code
Signed-off-by: Paul Gortmaker 
---
 .../admin-guide/kernel-parameters.rst |  7 +
 lib/bitmap.c  | 27 ++-
 2 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.rst 
b/Documentation/admin-guide/kernel-parameters.rst
index 682ab28b5c94..7733a773f5f8 100644
--- a/Documentation/admin-guide/kernel-parameters.rst
+++ b/Documentation/admin-guide/kernel-parameters.rst
@@ -68,6 +68,13 @@ For example one can add to the command line following 
parameter:
 
 where the final item represents CPUs 100,101,125,126,150,151,...
 
+The value "N" can be used to represent the numerically last CPU on the system,
+i.e "foo_cpus=16-N" would be equivalent to "16-31" on a 32 core system.
+
+Keep in mind that "N" is dynamic, so if system changes cause the bitmap width
+to change, such as less cores in the CPU list, then N and any ranges using N
+will also change.  Use the same on a small 4 core system, and "16-N" becomes
+"16-3" and now the same boot input will be flagged as invalid (start > end).
 
 
 This document may not be entirely up to date and comprehensive. The command
diff --git a/lib/bitmap.c b/lib/bitmap.c
index 6b568f98af3d..cc7cb1fca1ac 100644
--- a/lib/bitmap.c
+++ b/lib/bitmap.c
@@ -530,11 +530,17 @@ static int bitmap_check_region(const struct bitmap_region 
*br)
return 0;
 }
 
-static const char *bitmap_getnum(const char *str, unsigned int *num)
+static const char *bitmap_getnum(const char *str, unsigned int *num,
+unsigned int lastbit)
 {
unsigned long long n;
unsigned int len;
 
+   if (str[0] == 'N') {
+   *num = lastbit;
+   return str + 1;
+   }
+
len = _parse_integer(str, 10, &n);
if (!len)
return ERR_PTR(-EINVAL);
@@ -580,9 +586,12 @@ static const char *bitmap_find_region_reverse(const char 
*start, const char *end
return end;
 }
 
-static const char *bitmap_parse_region(const char *str, struct region *r)
+static const char *bitmap_parse_region(const char *str, struct bitmap_region *br)
 {
-   str = bitmap_getnum(str, &r->start);
+   struct region *r = br->r;
+   unsigned int lastbit = br->nbits - 1;
+
+   str = bitmap_getnum(str, &r->start, lastbit);
if (IS_ERR(str))
return str;
 
@@ -592,7 +601,7 @@ static const char *bitmap_parse_region(const char *str, 
struct region *r)
if (*str != '-')
return ERR_PTR(-EINVAL);
 
-   str = bitmap_getnum(str + 1, &r->end);
+   str = bitmap_getnum(str + 1, &r->end, lastbit);
if (IS_ERR(str))
return str;
 
@@ -602,14 +611,14 @@ static const char *bitmap_parse_region(const char *str, 
struct region *r)
if (*str != ':')
return ERR_PTR(-EINVAL);
 
-   str = bitmap_getnum(str + 1, &r->off);
+   str = bitmap_getnum(str + 1, &r->off, lastbit);
if (IS_ERR(str))
return str;
 
if (*str != '/')
return ERR_PTR(-EINVAL);
 
-   return bitmap_getnum(str + 1, &r->group_len);
+   return bitmap_getnum(str + 1, &r->group_len, lastbit);
 
 no_end:
r->end = r->start;
@@ -636,6 +645,10 @@ static const char *bitmap_parse_region(const char *str, 
struct region *r)
  * From each group will be used only defined amount of bits.
  * Syntax: range:used_size/group_size
  * Example: 0-1023:2/256 ==> 0,1,256,257,512,513,768,769
+ * The value 'N' can be used as a dynamically substituted token for the
+ * maximum allowed value; i.e (nmaskbits - 1).  Keep in mind that it is
+ * dynamic, so if system changes cause the bitmap width to change, such
+ * as more cores in a CPU list, then any ranges using N will also change.
  *
  * Returns: 0 on success, -errno on inv

[PATCH v4 0/8] support for bitmap (and hence CPU) list "N" abbreviation

2021-02-09 Thread Paul Gortmaker
The basic objective here was to add support for "nohz_full=8-N" and/or
"rcu_nocbs=4-N" -- essentially introduce "N" as a portable reference
to the last core, evaluated at boot for anything using a CPU list.

The thinking behind this is that people carve off a few early CPUs to
support housekeeping tasks, and perhaps dedicate one to a busy I/O
peripheral, and then the remaining pool of CPUs out to the end are a
part of a commonly configured pool used for the real work the user
cares about.

Extend that logic out to a fleet of machines - some new, and some
nearing EOL, and you've probably got a wide range of core counts to
contend with - even though the early number of cores dedicated to the
system overhead probably doesn't vary.

This change would enable sysadmins to have a common bootarg across all
such systems, and would also avoid any off-by-one fencepost errors that
happen for users who might briefly forget that core counts start at zero.

Originally I did this at the CPU subsys level, but Yury suggested it
be moved down further to bitmap level itself, which made the core 
implementation smaller and less complex, but the series longer.

New self tests are added to better exercise what bitmap range/region
currently supports, and new tests are added for the new "N" support.

Also tested boot arg and the post-boot cgroup use case as per below:

   root@hackbox:~# cat /proc/cmdline 
   BOOT_IMAGE=/boot/bzImage root=/dev/sda1 rcu_nocbs=2,3,8-N:1/2
   root@hackbox:~# dmesg|grep Offl
   rcu: Offload RCU callbacks from CPUs: 2-3,8,10,12,14.

   root@hackbox:/sys/fs/cgroup/cpuset/foo# cat cpuset.cpus
   
   root@hackbox:/sys/fs/cgroup/cpuset/foo# /bin/echo 10-N > cpuset.cpus
   root@hackbox:/sys/fs/cgroup/cpuset/foo# cat cpuset.cpus
   10-15
   root@hackbox:/sys/fs/cgroup/cpuset/foo# /bin/echo N-N:N/N > cpuset.cpus
   root@hackbox:/sys/fs/cgroup/cpuset/foo# cat cpuset.cpus
   15

This was on a 16 core machine with CONFIG_NR_CPUS=16 in .config file.

Note that "N" is a dynamic quantity, and can change scope if the bitmap
is changed in size.  So at the risk of stating the obvious, don't use it
for "burn_eFuse=128-N" or "secure_erase_firmware=32-N" type stuff.
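
For readers who want to poke at the "N" semantics outside the kernel, here
is a minimal userspace sketch of a single-region parser.  It is not the
lib/bitmap.c code: the names sketch_getnum(), sketch_parse_region() and
struct region_sketch are made up for illustration, the mask is limited to
64 bits, comma-separated lists and whitespace are not handled, and an
oversized number is reported as -EINVAL rather than -EOVERFLOW.  It only
assumes what the series describes: "N" resolves to (nbits - 1) and is then
treated like any other number.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical userspace model of one region (same field names as the series). */
struct region_sketch {
	unsigned int start;
	unsigned int off;
	unsigned int group_len;
	unsigned int end;
};

/* Parse a decimal number, or 'N' meaning lastbit.  Returns NULL on bad input. */
static const char *sketch_getnum(const char *str, unsigned int *num,
				 unsigned int lastbit)
{
	unsigned long n;
	char *endp;

	if (str[0] == 'N') {
		*num = lastbit;
		return str + 1;
	}
	if (!isdigit((unsigned char)str[0]))
		return NULL;

	n = strtoul(str, &endp, 10);
	if (n >= (1UL << 31))	/* simplification: reject oversized values */
		return NULL;

	*num = (unsigned int)n;
	return endp;
}

/* Parse "start[-end[:off/group_len]]" into a mask that is nbits (1..64) wide. */
static int sketch_parse_region(const char *str, unsigned int nbits,
			       unsigned long long *mask)
{
	struct region_sketch r;
	unsigned int start, i;

	if (nbits == 0 || nbits > 64)
		return -EINVAL;

	*mask = 0;

	str = sketch_getnum(str, &r.start, nbits - 1);
	if (!str)
		return -EINVAL;

	r.end = r.start;		/* a bare number is a one-bit range */
	if (*str == '-') {
		str = sketch_getnum(str + 1, &r.end, nbits - 1);
		if (!str)
			return -EINVAL;
	}

	r.off = r.end + 1;		/* no pattern: use the whole range */
	r.group_len = r.end + 1;
	if (*str == ':') {
		str = sketch_getnum(str + 1, &r.off, nbits - 1);
		if (!str || *str != '/')
			return -EINVAL;
		str = sketch_getnum(str + 1, &r.group_len, nbits - 1);
		if (!str)
			return -EINVAL;
	}
	if (*str != '\0')
		return -EINVAL;

	/* Same order as the series: syntax/logic errors first, then range. */
	if (r.start > r.end || r.group_len == 0 || r.off > r.group_len)
		return -EINVAL;
	if (r.end >= nbits)
		return -ERANGE;

	/* Set the first "off" bits of every "group_len" sized group. */
	for (start = r.start; start <= r.end; start += r.group_len)
		for (i = 0; i < r.off && start + i <= r.end; i++)
			*mask |= 1ULL << (start + i);

	return 0;
}
```

With nbits = 16, feeding "8-N:1/2" to this sketch selects bits 8,10,12,14,
which lines up with the rcu_nocbs output shown above, and "16-N" with
nbits = 4 comes back as -EINVAL because start > end after substitution.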

Paul.
---

I've intentionally not gone down the rabbit hole of whether N or Z or
L is the better letter to mark the end of a mathematical set in the
hope that we can stay focused, and get this closed out here in v4.

Aside from that, I believe all other feedback has been responded to
in one way or another.  Note that I didn't add Reviewed/Ack tags to
anything that changed significantly from what was reviewed in v3.

[v4: pair nbits with region, instead of inside it.  Split EINVAL and
 ERANGE tests.  Don't handle start/end/offset within a macro to
 abstract away nbits usage.  Added some Reviewed-by/Ack tags.]

[v3: Allow "N" to be used anywhere in the region spec, i.e. "N-N:N/N" vs.
 just being allowed at end of range like "0-N".  Add new self-tests.  Drop
 "all" and "none" aliases as redundant and not worth the extra complication. ]
 
https://lore.kernel.org/lkml/20210126171141.122639-1-paul.gortma...@windriver.com

[v2: push code down from cpu subsys to core bitmap code as per
 Yury's comments.  Change "last" to simply be "N" as per PeterZ.]
 
https://lore.kernel.org/lkml/20210121223355.59780-1-paul.gortma...@windriver.com/

[v1: https://lore.kernel.org/lkml/20210106004850.GA11682@paulmck-ThinkPad-P72/]

Cc: Li Zefan 
Cc: Ingo Molnar 
Cc: Yury Norov 
Cc: Thomas Gleixner 
Cc: Josh Triplett 
Cc: Peter Zijlstra 
Cc: "Paul E. McKenney" 
Cc: Frederic Weisbecker 
Cc: Rasmus Villemoes 
Cc: Andy Shevchenko 


Paul Gortmaker (8):
  lib: test_bitmap: clearly separate ERANGE from EINVAL tests.
  lib: test_bitmap: add tests to trigger ERANGE case.
  lib: test_bitmap: add more start-end:offset/len tests
  lib: bitmap: move ERANGE check from set_region to check_region
  lib: bitmap: pair nbits value with region struct
  lib: bitmap: support "N" as an alias for size of bitmap
  lib: test_bitmap: add tests for "N" alias
  rcu: deprecate "all" option to rcu_nocbs=

 .../admin-guide/kernel-parameters.rst |  7 +++
 .../admin-guide/kernel-parameters.txt |  4 +-
 kernel/rcu/tree_plugin.h  |  6 +-
 lib/bitmap.c  | 62 +--
 lib/test_bitmap.c | 46 --
 5 files changed, 93 insertions(+), 32 deletions(-)

-- 
2.17.1



[PATCH 4/8] lib: bitmap: move ERANGE check from set_region to check_region

2021-02-09 Thread Paul Gortmaker
It makes sense to do all the checks in check_region() and not 1/2
in check_region and 1/2 in set_region.

Since set_region is called immediately after check_region, the net
effect on runtime is zero, but it gets rid of an if (...) return...

Cc: Yury Norov 
Cc: Rasmus Villemoes 
Cc: Andy Shevchenko 
Acked-by: Yury Norov 
Signed-off-by: Paul Gortmaker 
---
 lib/bitmap.c | 19 +++
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/lib/bitmap.c b/lib/bitmap.c
index 75006c4036e9..9596ba53c36b 100644
--- a/lib/bitmap.c
+++ b/lib/bitmap.c
@@ -499,25 +499,22 @@ struct region {
unsigned int end;
 };
 
-static int bitmap_set_region(const struct region *r,
-   unsigned long *bitmap, int nbits)
+static void bitmap_set_region(const struct region *r, unsigned long *bitmap)
 {
unsigned int start;
 
-   if (r->end >= nbits)
-   return -ERANGE;
-
for (start = r->start; start <= r->end; start += r->group_len)
bitmap_set(bitmap, start, min(r->end - start + 1, r->off));
-
-   return 0;
 }
 
-static int bitmap_check_region(const struct region *r)
+static int bitmap_check_region(const struct region *r, int nbits)
 {
if (r->start > r->end || r->group_len == 0 || r->off > r->group_len)
return -EINVAL;
 
+   if (r->end >= nbits)
+   return -ERANGE;
+
return 0;
 }
 
@@ -651,13 +648,11 @@ int bitmap_parselist(const char *buf, unsigned long 
*maskp, int nmaskbits)
if (IS_ERR(buf))
return PTR_ERR(buf);
 
-   ret = bitmap_check_region(&r);
+   ret = bitmap_check_region(&r, nmaskbits);
if (ret)
return ret;
 
-   ret = bitmap_set_region(&r, maskp, nmaskbits);
-   if (ret)
-   return ret;
+   bitmap_set_region(&r, maskp);
}
 
return 0;
-- 
2.17.1



[PATCH 7/8] lib: test_bitmap: add tests for "N" alias

2021-02-09 Thread Paul Gortmaker
These are copies of existing tests, with just 31 --> N.  This ensures
the recently added "N" alias transparently works in any normally
numeric fields of a region specification.

Cc: Yury Norov 
Cc: Rasmus Villemoes 
Cc: Andy Shevchenko 
Signed-off-by: Paul Gortmaker 
---
 lib/test_bitmap.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/lib/test_bitmap.c b/lib/test_bitmap.c
index 9c6a88c480c1..a6048278d027 100644
--- a/lib/test_bitmap.c
+++ b/lib/test_bitmap.c
@@ -354,6 +354,16 @@ static const struct test_bitmap_parselist 
parselist_tests[] __initconst = {
{0, "16-31:16/31",  &exp1[3 * step], 32, 0},
{0, "31-31:31/31",  &exp1[14 * step], 32, 0},
 
+   {0, "N-N",  &exp1[14 * step], 32, 0},
+   {0, "0-0:1/N",  &exp1[0], 32, 0},
+   {0, "0-0:N/N",  &exp1[0], 32, 0},
+   {0, "0-15:16/N",&exp1[2 * step], 32, 0},
+   {0, "15-15:N/N",&exp1[13 * step], 32, 0},
+   {0, "15-N:1/N", &exp1[13 * step], 32, 0},
+   {0, "16-N:16/N",&exp1[3 * step], 32, 0},
+   {0, "N-N:N/N",  &exp1[14 * step], 32, 0},
+
+   {0, "0-N:1/3,1-N:1/3,2-N:1/3",  &exp1[8 * step], 32, 0},
{0, "0-31:1/3,1-31:1/3,2-31:1/3",   &exp1[8 * step], 32, 0},
{0, "1-10:8/12,8-31:24/29,0-31:0/3",&exp1[9 * step], 32, 0},
 
-- 
2.17.1



[PATCH 5/8] lib: bitmap: pair nbits value with region struct

2021-02-09 Thread Paul Gortmaker
A region is a standalone entity to some degree, but it needs to
be paired with a bitmap width in order to set context and determine
if the region even fits into the width of the bitmap.

This will reduce parameter passing and enable using nbits as part
of future dynamic region parameter parsing.

Cc: Yury Norov 
Cc: Rasmus Villemoes 
Cc: Andy Shevchenko 
Suggested-by: Yury Norov 
Suggested-by: Andy Shevchenko 
Signed-off-by: Paul Gortmaker 
---
 lib/bitmap.c | 22 +++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/lib/bitmap.c b/lib/bitmap.c
index 9596ba53c36b..6b568f98af3d 100644
--- a/lib/bitmap.c
+++ b/lib/bitmap.c
@@ -499,6 +499,16 @@ struct region {
unsigned int end;
 };
 
+/*
+ * The region "0-3" is a complete specification, i.e. "the 1st four cores"
+ * for a CPU map, but it needs to be paired to a width in order to have a
+ * meaningful and valid context. (i.e. 4 core region on 4+ core machine...)
+ */
+struct bitmap_region {
+   struct region *r;
+   unsigned int nbits;
+};
+
 static void bitmap_set_region(const struct region *r, unsigned long *bitmap)
 {
unsigned int start;
@@ -507,12 +517,14 @@ static void bitmap_set_region(const struct region *r, 
unsigned long *bitmap)
bitmap_set(bitmap, start, min(r->end - start + 1, r->off));
 }
 
-static int bitmap_check_region(const struct region *r, int nbits)
+static int bitmap_check_region(const struct bitmap_region *br)
 {
+   struct region *r = br->r;
+
if (r->start > r->end || r->group_len == 0 || r->off > r->group_len)
return -EINVAL;
 
-   if (r->end >= nbits)
+   if (r->end >= br->nbits)
return -ERANGE;
 
return 0;
@@ -635,8 +647,12 @@ static const char *bitmap_parse_region(const char *str, 
struct region *r)
 int bitmap_parselist(const char *buf, unsigned long *maskp, int nmaskbits)
 {
struct region r;
+   struct bitmap_region br;
long ret;
 
+   br.r = &r;
+   br.nbits = nmaskbits;
+
bitmap_zero(maskp, nmaskbits);
 
while (buf) {
@@ -648,7 +664,7 @@ int bitmap_parselist(const char *buf, unsigned long *maskp, 
int nmaskbits)
if (IS_ERR(buf))
return PTR_ERR(buf);
 
-   ret = bitmap_check_region(&r, nmaskbits);
+   ret = bitmap_check_region(&br);
if (ret)
return ret;
 
-- 
2.17.1



[PATCH 8/8] rcu: deprecate "all" option to rcu_nocbs=

2021-02-09 Thread Paul Gortmaker
With the core bitmap support now accepting "N" as a placeholder for
the end of the bitmap, "all" can be represented as "0-N" and has the
advantage of not being specific to RCU (or any other subsystem).

So deprecate the use of "all" by removing documentation references
to it.  The support itself needs to remain for now, since we don't
know how many people out there are using it currently, but since it
is in an __init area anyway, it isn't worth losing sleep over.

Cc: Yury Norov 
Cc: Peter Zijlstra 
Cc: "Paul E. McKenney" 
Cc: Josh Triplett 
Signed-off-by: Paul Gortmaker 
---
 Documentation/admin-guide/kernel-parameters.txt | 4 +---
 kernel/rcu/tree_plugin.h| 6 ++
 2 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index a10b545c2070..a116c0ff0a91 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4037,9 +4037,7 @@
see CONFIG_RAS_CEC help text.
 
rcu_nocbs=  [KNL]
-   The argument is a cpu list, as described above,
-   except that the string "all" can be used to
-   specify every CPU on the system.
+   The argument is a cpu list, as described above.
 
In kernels built with CONFIG_RCU_NOCB_CPU=y, set
the specified list of CPUs to be no-callback CPUs.
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 7e291ce0a1d6..56788dfde922 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1463,14 +1463,12 @@ static void rcu_cleanup_after_idle(void)
 
 /*
  * Parse the boot-time rcu_nocb_mask CPU list from the kernel parameters.
- * The string after the "rcu_nocbs=" is either "all" for all CPUs, or a
- * comma-separated list of CPUs and/or CPU ranges.  If an invalid list is
- * given, a warning is emitted and all CPUs are offloaded.
+ * If the list is invalid, a warning is emitted and all CPUs are offloaded.
  */
 static int __init rcu_nocb_setup(char *str)
 {
alloc_bootmem_cpumask_var(&rcu_nocb_mask);
-   if (!strcasecmp(str, "all"))
+   if (!strcasecmp(str, "all"))/* legacy: use "0-N" instead */
cpumask_setall(rcu_nocb_mask);
else
if (cpulist_parse(str, rcu_nocb_mask)) {
-- 
2.17.1



[PATCH 1/8] lib: test_bitmap: clearly separate ERANGE from EINVAL tests.

2021-02-09 Thread Paul Gortmaker
This block of tests was meant to find/flag incorrect use of the ":"
and "/" separators (syntax errors) and invalid (zero) group len.

However they were specified with an 8 bit width and 32 bit operations,
so they really contained two errors (EINVAL and ERANGE).

Promote them to 32 bit so it is clear what they are meant to target.
Then we can add tests specific for ERANGE (no syntax errors, just
doing 32bit op on 8 bit width, plus a typical 9-on-8 fencepost error).
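
To make the "two errors in one test" point concrete, here is a small
standalone model of the region check.  The names model_check_region() and
struct region_model are invented for this illustration and the code is a
userspace sketch, not the kernel function; it only assumes the check order
used in lib/bitmap.c (EINVAL conditions before the ERANGE condition).

```c
#include <errno.h>

/* Hypothetical stand-in for the parsed region (same field names as lib/bitmap.c). */
struct region_model {
	unsigned int start;
	unsigned int off;
	unsigned int group_len;
	unsigned int end;
};

/*
 * Syntax/logic problems are reported first, so a region that is also too
 * wide for the bitmap never gets as far as the -ERANGE test.
 */
static int model_check_region(const struct region_model *r, unsigned int nbits)
{
	if (r->start > r->end || r->group_len == 0 || r->off > r->group_len)
		return -EINVAL;

	if (r->end >= nbits)
		return -ERANGE;

	return 0;
}
```

A region equivalent to "0-31:0/0" returns -EINVAL whether nbits is 8 or 32,
so with an 8 bit width the out-of-range end is simply masked.  Only a
syntactically valid region such as "0-31" or "8-8" on an 8 bit width reaches
the -ERANGE return.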

Cc: Yury Norov 
Cc: Rasmus Villemoes 
Cc: Andy Shevchenko 
Reviewed-by: Andy Shevchenko 
Signed-off-by: Paul Gortmaker 
---
 lib/test_bitmap.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/lib/test_bitmap.c b/lib/test_bitmap.c
index 4425a1dd4ef1..589f2a34ceba 100644
--- a/lib/test_bitmap.c
+++ b/lib/test_bitmap.c
@@ -338,12 +338,12 @@ static const struct test_bitmap_parselist 
parselist_tests[] __initconst = {
{-EINVAL, "-1", NULL, 8, 0},
{-EINVAL, "-0", NULL, 8, 0},
{-EINVAL, "10-1", NULL, 8, 0},
-   {-EINVAL, "0-31:", NULL, 8, 0},
-   {-EINVAL, "0-31:0", NULL, 8, 0},
-   {-EINVAL, "0-31:0/", NULL, 8, 0},
-   {-EINVAL, "0-31:0/0", NULL, 8, 0},
-   {-EINVAL, "0-31:1/0", NULL, 8, 0},
-   {-EINVAL, "0-31:10/1", NULL, 8, 0},
+   {-EINVAL, "0-31:", NULL, 32, 0},
+   {-EINVAL, "0-31:0", NULL, 32, 0},
+   {-EINVAL, "0-31:0/", NULL, 32, 0},
+   {-EINVAL, "0-31:0/0", NULL, 32, 0},
+   {-EINVAL, "0-31:1/0", NULL, 32, 0},
+   {-EINVAL, "0-31:10/1", NULL, 32, 0},
{-EOVERFLOW, "0-98765432123456789:10/1", NULL, 8, 0},
 
{-EINVAL, "a-31", NULL, 8, 0},
-- 
2.17.1



[PATCH 3/8] lib: test_bitmap: add more start-end:offset/len tests

2021-02-09 Thread Paul Gortmaker
There are inputs to bitmap_parselist() that would probably never
be entered manually by a person, but might result from some kind of
automated input generator.  Things like ranges of length 1, or group
lengths longer than nbits, overlaps, or offsets of zero.

Adding these tests serves two purposes:

1) document what might seem odd but nonetheless valid input.

2) don't regress from what we currently accept as valid.

Cc: Yury Norov 
Cc: Rasmus Villemoes 
Cc: Andy Shevchenko 
Acked-by: Yury Norov 
Reviewed-by: Andy Shevchenko 
Signed-off-by: Paul Gortmaker 
---
 lib/test_bitmap.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/lib/test_bitmap.c b/lib/test_bitmap.c
index 172ffbfa83c4..9c6a88c480c1 100644
--- a/lib/test_bitmap.c
+++ b/lib/test_bitmap.c
@@ -35,6 +35,8 @@ static const unsigned long exp1[] __initconst = {
BITMAP_FROM_U64(0xULL),
BITMAP_FROM_U64(0xULL),
BITMAP_FROM_U64(0),
+   BITMAP_FROM_U64(0x8000),
+   BITMAP_FROM_U64(0x8000),
 };
 
 static const unsigned long exp2[] __initconst = {
@@ -335,6 +337,26 @@ static const struct test_bitmap_parselist 
parselist_tests[] __initconst = {
{0, " ,  ,,  , ,   ",   &exp1[12 * step], 8, 0},
{0, " ,  ,,  , ,   \n", &exp1[12 * step], 8, 0},
 
+   {0, "0-0",  &exp1[0], 32, 0},
+   {0, "1-1",  &exp1[1 * step], 32, 0},
+   {0, "15-15",&exp1[13 * step], 32, 0},
+   {0, "31-31",&exp1[14 * step], 32, 0},
+
+   {0, "0-0:0/1",  &exp1[12 * step], 32, 0},
+   {0, "0-0:1/1",  &exp1[0], 32, 0},
+   {0, "0-0:1/31", &exp1[0], 32, 0},
+   {0, "0-0:31/31",&exp1[0], 32, 0},
+   {0, "1-1:1/1",  &exp1[1 * step], 32, 0},
+   {0, "0-15:16/31",   &exp1[2 * step], 32, 0},
+   {0, "15-15:1/2",&exp1[13 * step], 32, 0},
+   {0, "15-15:31/31",  &exp1[13 * step], 32, 0},
+   {0, "15-31:1/31",   &exp1[13 * step], 32, 0},
+   {0, "16-31:16/31",  &exp1[3 * step], 32, 0},
+   {0, "31-31:31/31",  &exp1[14 * step], 32, 0},
+
+   {0, "0-31:1/3,1-31:1/3,2-31:1/3",   &exp1[8 * step], 32, 0},
+   {0, "1-10:8/12,8-31:24/29,0-31:0/3",&exp1[9 * step], 32, 0},
+
{-EINVAL, "-1", NULL, 8, 0},
{-EINVAL, "-0", NULL, 8, 0},
{-EINVAL, "10-1", NULL, 8, 0},
-- 
2.17.1



[PATCH 2/8] lib: test_bitmap: add tests to trigger ERANGE case.

2021-02-09 Thread Paul Gortmaker
Add tests that specify a valid range, but one that is outside the
width of the bitmap to which it is to be applied.  These should
trigger an -ERANGE response from the code.

Cc: Yury Norov 
Cc: Rasmus Villemoes 
Cc: Andy Shevchenko 
Signed-off-by: Paul Gortmaker 
---
 lib/test_bitmap.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/test_bitmap.c b/lib/test_bitmap.c
index 589f2a34ceba..172ffbfa83c4 100644
--- a/lib/test_bitmap.c
+++ b/lib/test_bitmap.c
@@ -338,6 +338,8 @@ static const struct test_bitmap_parselist parselist_tests[] 
__initconst = {
{-EINVAL, "-1", NULL, 8, 0},
{-EINVAL, "-0", NULL, 8, 0},
{-EINVAL, "10-1", NULL, 8, 0},
+   {-ERANGE, "8-8", NULL, 8, 0},
+   {-ERANGE, "0-31", NULL, 8, 0},
{-EINVAL, "0-31:", NULL, 32, 0},
{-EINVAL, "0-31:0", NULL, 32, 0},
{-EINVAL, "0-31:0/", NULL, 32, 0},
-- 
2.17.1



Re: [PATCH v3 0/8] support for bitmap (and hence CPU) list "N" abbreviation

2021-01-27 Thread Paul Gortmaker
[Re: [PATCH v3 0/8] support for bitmap (and hence CPU) list "N" abbreviation] 
On 26/01/2021 (Tue 14:27) Yury Norov wrote:

> On Tue, Jan 26, 2021 at 9:12 AM Paul Gortmaker
>  wrote:
> >
> > This was on a 16 core machine with CONFIG_NR_CPUS=16 in .config file.
> >
> > Note that "N" is a dynamic quantity, and can change scope if the bitmap
> > is changed in size.  So at the risk of stating the obvious, don't use it
> > for "burn_eFuse=128-N" or "secure_erase_firmware=32-N" type stuff.
> 
> I think it's worth moving this sentence to the Documentation. Another

Dynamic nature comment added to Documentation

> caveat with
> N is that users' config may surprisingly become invalid, like if user
> says 32-N, and
> on some machine with a smaller bitmap this config fails to boot.

Updated example to indicate that "16-N" becomes invalid if moved from 32
core system to quad core.  I'm not currently able to think of an example
where boot will fail -- vs. a subsystem getting -EINVAL from bitmap code
and printing a subsystem error instead.

> It doesn't mean of course that I'm against 'N'. I think it's very
> useful especially in
> such common cases like "N", "0-N", "1-N".
> 
> Would it make sense to treat the mask "32-N" when N < 32 as N-N,
> and bark something in dmesg?

I don't think so.  For the same reasons you used to convince me -- that N
should be treated as just another number and not have special rules.

If I boot now with "important_cpu=32-3" on a quad core then I get what
I get for being stupid.   We don't special case that and substitute in a
"3-3" (which would then be "3") -- and nor should we!

Sticking to the CPU example, we have no idea what the caller's use case
is -- we don't know if NUMA stuff might be present and whether having
the single CPU #3 in that set is better or worse than EINVAL and no CPUs
in the set.   Expand that to bitmaps in general and we have no idea what
the "right" reaction to garbage input is.

The context of the caller could be simply test_bitmap.c itself -- which
would be expecting the EINVAL, and not some kind of "hot patching" of
the region in order to make it valid.

The only sane option is for the bitmap code to return EINVAL and let the
calling subsystem (with the appropriate context/info) make the decision
as to what to do next.  Which is what the series does now.
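
A rough userspace illustration of that split of responsibility follows.
Everything in it is hypothetical: model_set_range() stands in for the
bitmap code (it only handles a plain start-end range on a mask of up to
64 bits), and the two caller_*() functions are invented examples of
subsystem policy.  The first is loosely modelled on what rcu_nocb_setup()
does with a bad list (warn and offload all CPUs); the second is an equally
plausible caller that prefers an empty set.

```c
#include <errno.h>

/* Hypothetical library side: validate and report, never "fix up" the input. */
static int model_set_range(unsigned int start, unsigned int end,
			   unsigned int nbits, unsigned long long *mask)
{
	unsigned int bit;

	*mask = 0;

	if (nbits == 0 || nbits > 64 || start > end)
		return -EINVAL;
	if (end >= nbits)
		return -ERANGE;

	for (bit = start; bit <= end; bit++)
		*mask |= 1ULL << bit;

	return 0;
}

/* Caller policy A: on any error, fall back to selecting every bit. */
static unsigned long long caller_select_all_on_error(unsigned int start,
						     unsigned int end,
						     unsigned int nbits)
{
	unsigned long long mask;

	if (model_set_range(start, end, nbits, &mask))
		mask = nbits >= 64 ? ~0ULL : (1ULL << nbits) - 1;

	return mask;
}

/* Caller policy B: on any error, select nothing at all. */
static unsigned long long caller_select_none_on_error(unsigned int start,
						      unsigned int end,
						      unsigned int nbits)
{
	unsigned long long mask;

	if (model_set_range(start, end, nbits, &mask))
		mask = 0;

	return mask;
}
```

For "32-N" on a quad core (start = 32, end = 3, nbits = 4) the library side
returns -EINVAL in both cases, yet caller A ends up with all four bits set
and caller B with none.  Quietly rewriting the region to "3-3" inside the
library would take that choice away from both of them.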

Paul.
--

> 
> > Paul.
> > ---
> >
> > [v1: 
> > https://lore.kernel.org/lkml/20210106004850.GA11682@paulmck-ThinkPad-P72/
> >
> > [v2: push code down from cpu subsys to core bitmap code as per
> >  Yury's comments.  Change "last" to simply be "N" as per PeterZ.]
> >  
> > https://lore.kernel.org/lkml/20210121223355.59780-1-paul.gortma...@windriver.com/
> >
> > [v3: Allow "N" to be used anywhere in the region spec, i.e. "N-N:N/N" vs.
> >  just being allowed at end of range like "0-N".  Add new self-tests.  Drop
> >  "all" and "none" aliases as redundant and not worth the extra 
> > complication. ]
> >
> > Cc: Li Zefan 
> > Cc: Ingo Molnar 
> > Cc: Yury Norov 
> > Cc: Thomas Gleixner 
> > Cc: Josh Triplett 
> > Cc: Peter Zijlstra 
> > Cc: "Paul E. McKenney" 
> > Cc: Frederic Weisbecker 
> > Cc: Rasmus Villemoes 
> > Cc: Andy Shevchenko 
> >
> > ---
> >
> > Paul Gortmaker (8):
> >   lib: test_bitmap: clearly separate ERANGE from EINVAL tests.
> >   lib: test_bitmap: add more start-end:offset/len tests
> >   lib: bitmap: fold nbits into region struct
> >   lib: bitmap: move ERANGE check from set_region to check_region
> >   lib: bitmap_getnum: separate arg into region and field
> >   lib: bitmap: support "N" as an alias for size of bitmap
> >   lib: test_bitmap: add tests for "N" alias
> >   rcu: deprecate "all" option to rcu_nocbs=
> >
> >  .../admin-guide/kernel-parameters.rst |  2 +
> >  .../admin-guide/kernel-parameters.txt |  4 +-
> >  kernel/rcu/tree_plugin.h  |  6 +--
> >  lib/bitmap.c  | 46 ++
> >  lib/test_bitmap.c | 48 ---
> >  5 files changed, 72 insertions(+), 34 deletions(-)
> >
> > --
> > 2.17.1
> >


Re: [PATCH 5/8] lib: bitmap_getnum: separate arg into region and field

2021-01-27 Thread Paul Gortmaker
[Re: [PATCH 5/8] lib: bitmap_getnum: separate arg into region and field] On 
26/01/2021 (Tue 18:58) Yury Norov wrote:

> On Tue, Jan 26, 2021 at 1:22 PM Andy Shevchenko
>  wrote:
> >
> > On Tue, Jan 26, 2021 at 12:11:38PM -0500, Paul Gortmaker wrote:
> > > The bitmap_getnum is only used on a region's start/end/off/group_len
> > > field.  Trivially decouple the region from the field so that the region
> > > pointer is available for a pending change.
> >
> > Honestly, I don't like this macro trick. It's bad in couple of ways:
> >  - it hides what actually is done with the fields of r structure
> >(after you get that they are fields!)
> >  - it breaks possibility to compile time (type) checks
> >
> > I will listen what others say, but I'm in favour not to proceed like this.
> 
> Agree. Would be better to drop the patch. Paul, what kind of pending
> change do you mean here? All the following patches are not related to
> parsing machinery.

It was directly related, because...

> 
> > > Cc: Yury Norov 
> > > Cc: Rasmus Villemoes 
> > > Cc: Andy Shevchenko 
> > > Signed-off-by: Paul Gortmaker 
> > > ---
> > >  lib/bitmap.c | 9 +
> > >  1 file changed, 5 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/lib/bitmap.c b/lib/bitmap.c
> > > index 833f152a2c43..f65be2f148fd 100644
> > > --- a/lib/bitmap.c
> > > +++ b/lib/bitmap.c
> > > @@ -533,6 +533,7 @@ static const char *bitmap_getnum(const char *str, 
> > > unsigned int *num)
> > >   *num = n;
> > >   return str + len;
> > >  }
> > > +#define bitmap_getrnum(s, r, pos) bitmap_getnum(s, &(r->pos))

...this one line above opened the door to then do [in 6/8]:

   -#define bitmap_getrnum(s, r, pos) bitmap_getnum(s, &(r->pos))
   +#define bitmap_getrnum(s, r, pos) __bitmap_getnum(s, r->nbits, &(r->pos))

which gets nbits down into bitmap_getnum so we can handle N in there as
the placement you'd specifically requested for treating N as just a number.

In any case, I've decided against putting nbits into the region struct
and have got the nbits value down into getnum() another way for v4,
without using this commit or similar macros.

Paul.
--

> > >
> > >  static inline bool end_of_str(char c)
> > >  {
> > > @@ -571,7 +572,7 @@ static const char *bitmap_find_region_reverse(const 
> > > char *start, const char *end
> > >
> > >  static const char *bitmap_parse_region(const char *str, struct region *r)
> > >  {
> > > > - str = bitmap_getnum(str, &r->start);
> > > + str = bitmap_getrnum(str, r, start);
> > >   if (IS_ERR(str))
> > >   return str;
> > >
> > > @@ -581,7 +582,7 @@ static const char *bitmap_parse_region(const char 
> > > *str, struct region *r)
> > >   if (*str != '-')
> > >   return ERR_PTR(-EINVAL);
> > >
> > > > - str = bitmap_getnum(str + 1, &r->end);
> > > + str = bitmap_getrnum(str + 1, r, end);
> > >   if (IS_ERR(str))
> > >   return str;
> > >
> > > @@ -591,14 +592,14 @@ static const char *bitmap_parse_region(const char 
> > > *str, struct region *r)
> > >   if (*str != ':')
> > >   return ERR_PTR(-EINVAL);
> > >
> > > > - str = bitmap_getnum(str + 1, &r->off);
> > > + str = bitmap_getrnum(str + 1, r, off);
> > >   if (IS_ERR(str))
> > >   return str;
> > >
> > >   if (*str != '/')
> > >   return ERR_PTR(-EINVAL);
> > >
> > > > - return bitmap_getnum(str + 1, &r->group_len);
> > > + return bitmap_getrnum(str + 1, r, group_len);
> > >
> > >  no_end:
> > >   r->end = r->start;
> > > --
> > > 2.17.1
> > >
> >
> > --
> > With Best Regards,
> > Andy Shevchenko
> >
> >


Re: [PATCH 6/8] lib: bitmap: support "N" as an alias for size of bitmap

2021-01-27 Thread Paul Gortmaker
[Re: [PATCH 6/8] lib: bitmap: support "N" as an alias for size of bitmap] On 
26/01/2021 (Tue 23:37) Andy Shevchenko wrote:

> On Tue, Jan 26, 2021 at 12:11:39PM -0500, Paul Gortmaker wrote:
> > While this is done for all bitmaps, the original use case in mind was
> > for CPU masks and cpulist_parse() as described below.
> > 
> > It seems that a common configuration is to use the 1st couple cores for
> > housekeeping tasks.  This tends to leave the remaining ones to form a
> > pool of similarly configured cores to take on the real workload of
> > interest to the user.
> > 
> > So on machine A - with 32 cores, it could be 0-3 for "system" and then
> > 4-31 being used in boot args like nohz_full=, or rcu_nocbs= as part of
> > setting up the worker pool of CPUs.
> > 
> > But then newer machine B is added, and it has 48 cores, and so while
> > the 0-3 part remains unchanged, the pool setup cpu list becomes 4-47.
> > 
> > Multiple deployment becomes easier when we can just simply replace 31
> > and 47 with "N" and let the system substitute in the actual number at
> > boot; a number that it knows better than we do.
> 
> I would accept lower 'n' as well.
> 
> ...
> 
> > -static const char *bitmap_getnum(const char *str, unsigned int *num)
> > +static const char *__bitmap_getnum(const char *str, unsigned int nbits,
> > +   unsigned int *num)
> >  {
> > unsigned long long n;
> > unsigned int len;
> >  
> > +   if (str[0] == 'N') {
> > +   *num = nbits - 1;
> > +   return str + 1;
> > +   }
> 
> But locating it here makes possible to enter a priori invalid input, like N 
> for
> start of the region.

Actually, no.  N can be valid input for start of the region - or for any
field in the region.  I was originally thinking like you -- that N
was only valid as the end of the region, but Yury made a compelling
argument that N should be treated exactly as any other number is.

Skip down to where Yury says:
 So, when I do echo N-N > cpuset.cpus, I want it to work as
 if I do echo 15-15 > cpuset.cpus.

  https://lore.kernel.org/lkml/20210126171811.gc23...@windriver.com/

You weren't Cc'd at that point as I'd not added any self-test changes
to the series yet - so you probably didn't see that discussion.

This is why you'll see "N-N:N/N" added as a self-test that works.  It
doesn't make any more sense than using 15-15:15/15 does (vs. "15") but
both are equally valid inputs, in that they don't trigger an error.
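
For illustration only, here is a minimal userspace sketch of that "N is
just another number" behaviour.  It is not the kernel code: getnum() and
parse_region() are stand-in names, errors are reported as NULL/-1 rather
than ERR_PTR(), and only a single region string is handled.

```c
#include <ctype.h>
#include <limits.h>
#include <stddef.h>

struct region {
	unsigned int start;
	unsigned int off;
	unsigned int group_len;
	unsigned int end;
};

/*
 * Parse one decimal number, or the alias 'N' (meaning nbits - 1), at str.
 * Returns a pointer just past what was consumed, or NULL on a syntax
 * error or overflow.
 */
static const char *getnum(const char *str, unsigned int nbits,
			  unsigned int *num)
{
	unsigned long long n = 0;
	const char *p = str;

	if (*p == 'N') {
		*num = nbits - 1;
		return p + 1;
	}

	while (isdigit((unsigned char)*p)) {
		n = n * 10 + (unsigned int)(*p - '0');
		if (n > UINT_MAX)
			return NULL;
		p++;
	}
	if (p == str)
		return NULL;	/* no digits at all */

	*num = (unsigned int)n;
	return p;
}

/* Parse "start", "start-end" or "start-end:off/group_len". */
static int parse_region(const char *str, unsigned int nbits,
			struct region *r)
{
	str = getnum(str, nbits, &r->start);
	if (!str)
		return -1;
	if (*str == '\0') {
		r->end = r->start;
		goto no_pattern;
	}
	if (*str != '-')
		return -1;

	str = getnum(str + 1, nbits, &r->end);
	if (!str)
		return -1;
	if (*str == '\0')
		goto no_pattern;
	if (*str != ':')
		return -1;

	str = getnum(str + 1, nbits, &r->off);
	if (!str || *str != '/')
		return -1;

	str = getnum(str + 1, nbits, &r->group_len);
	if (!str || *str != '\0')
		return -1;
	return 0;

no_pattern:
	/* No ":off/group_len" part: one group covering the whole range. */
	r->off = r->end + 1;
	r->group_len = r->end + 1;
	return 0;
}
```

Because the alias is resolved inside getnum(), every field accepts it
without any special casing in parse_region(); with nbits = 16 the strings
"N-N:N/N" and "15-15:15/15" fill in identical regions.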

Thanks,
Paul.
--

> 
> I think this should be separate helper which is called in places where it 
> makes
> sense.
> 
> > > len = _parse_integer(str, 10, &n);
> > if (!len)
> > return ERR_PTR(-EINVAL);
> 
> 
> -- 
> With Best Regards,
> Andy Shevchenko
> 
> 


Re: [PATCH 3/8] lib: bitmap: fold nbits into region struct

2021-01-27 Thread Paul Gortmaker
[Re: [PATCH 3/8] lib: bitmap: fold nbits into region struct] On 26/01/2021 (Tue 
23:16) Andy Shevchenko wrote:

> On Tue, Jan 26, 2021 at 12:11:36PM -0500, Paul Gortmaker wrote:
> > This will reduce parameter passing and enable using nbits as part
> > of future dynamic region parameter parsing.
> 
> One nit below, nevertheless
> Reviewed-by: Andy Shevchenko 
> 
> > Cc: Yury Norov 
> > Cc: Rasmus Villemoes 
> > Cc: Andy Shevchenko 
> > Suggested-by: Yury Norov 
> > Signed-off-by: Paul Gortmaker 
> > ---
> >  lib/bitmap.c | 19 ++-
> >  1 file changed, 10 insertions(+), 9 deletions(-)
> > 
> > diff --git a/lib/bitmap.c b/lib/bitmap.c
> > index 75006c4036e9..162e2850c622 100644
> > --- a/lib/bitmap.c
> > +++ b/lib/bitmap.c
> > @@ -487,24 +487,24 @@ EXPORT_SYMBOL(bitmap_print_to_pagebuf);
> >  
> >  /*
> >   * Region 9-38:4/10 describes the following bitmap structure:
> > - * 0  9  1218  38
> > - * .......
> > - * ^  ^ ^   ^
> > - *  start  off   group_lenend
> > + * 0  9  1218  38   N
> > + * .......
> > + * ^  ^ ^   ^   ^
> > + *  start  off   group_lenend   nbits
> >   */
> >  struct region {
> > unsigned int start;
> > unsigned int off;
> > unsigned int group_len;
> > unsigned int end;
> > +   unsigned int nbits;
> >  };
> >  
> > -static int bitmap_set_region(const struct region *r,
> > -   unsigned long *bitmap, int nbits)
> > +static int bitmap_set_region(const struct region *r, unsigned long *bitmap)
> >  {
> > unsigned int start;
> >  
> > -   if (r->end >= nbits)
> > +   if (r->end >= r->nbits)
> > return -ERANGE;
> >  
> > for (start = r->start; start <= r->end; start += r->group_len)
> > @@ -640,7 +640,8 @@ int bitmap_parselist(const char *buf, unsigned long 
> > *maskp, int nmaskbits)
> > struct region r;
> > long ret;
> >  
> > -   bitmap_zero(maskp, nmaskbits);
> > +   r.nbits = nmaskbits;
> 
> > +   bitmap_zero(maskp, r.nbits);
> 
> This sounds not right from style perspective.
> You have completely uninitialized r on stack, then you assign only one value
> for immediate use here and...

So, this change was added because Yury suggested that I "..store
nmaskbits in the struct region, and avoid passing nmaskbits as a
parameter."

To which I originally noted "I considered that and went with the param
so as to not open the door to someone possibly using an uninitialized
struct value later."

https://lore.kernel.org/lkml/20210122044357.gs16...@windriver.com/

Looking back, I had a similar thought to yours, it seems...

I am also thinking more and more that nbits doesn't belong in the
region anyway - yes, a region gets validated against a specific nbits
eventually, but it doesn't need an nbits field to be a complete
specification.  The region "0-3" is a complete specification for "the
1st four cores" and is as valid on a 4 core machine as it is on a 64 core
machine -- a validation we do when we deploy the region on that machine.
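
As a rough sketch of that separation (userspace C, with check_region() as
an illustrative name and nbits passed as a plain argument; this mirrors
the checks quoted in this series but is not the eventual v4 code):

```c
#include <errno.h>

/* A region is fully specified by these four fields; no nbits here. */
struct region {
	unsigned int start;
	unsigned int off;
	unsigned int group_len;
	unsigned int end;
};

/*
 * Validate a region against one particular bitmap width.
 * Malformed regions are rejected with -EINVAL first; a well-formed
 * region that does not fit in nbits is rejected with -ERANGE.
 */
static int check_region(const struct region *r, unsigned int nbits)
{
	if (r->start > r->end || r->group_len == 0 || r->off > r->group_len)
		return -EINVAL;

	if (r->end >= nbits)
		return -ERANGE;

	return 0;
}
```

The same "0-3" region passes for nbits = 4 and for nbits = 64, and only
fails with -ERANGE once the target bitmap is too small for it.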

I will set this change aside and get the nbits value to getnum() another
way, and leave the region struct as it was -- without a nbits field.

This will also resolve having the macro handling of region that you were
not really liking.

Paul.
--

> > while (buf) {
> > buf = bitmap_find_region(buf);
> > @@ -655,7 +656,7 @@ int bitmap_parselist(const char *buf, unsigned long 
> > *maskp, int nmaskbits)
> > if (ret)
> > return ret;
> >  
> > -   ret = bitmap_set_region(&r, maskp, nmaskbits);
> > +   ret = bitmap_set_region(&r, maskp);
> 
> ...hiding this fact here. Which I would expect that  may be rewritten here.
> 
> I would leave these unchanged and simple assign the value in
> bitmap_set_region().
> 
> > if (ret)
> > return ret;
> > }
> > -- 
> > 2.17.1
> > 
> 
> -- 
> With Best Regards,
> Andy Shevchenko
> 
> 


Re: [PATCH 1/8] lib: test_bitmap: clearly separate ERANGE from EINVAL tests.

2021-01-26 Thread Paul Gortmaker
[Re: [PATCH 1/8] lib: test_bitmap: clearly separate ERANGE from EINVAL tests.] 
On 26/01/2021 (Tue 23:04) Andy Shevchenko wrote:

> On Tue, Jan 26, 2021 at 12:11:34PM -0500, Paul Gortmaker wrote:
> > This block of tests was meant to find/flag incorrect use of the ":"
> > and "/" separators (syntax errors) and invalid (zero) group len.
> > 
> > However they were specified with an 8 bit width and 32 bit operations,
> > so they really contained two errors (EINVAL and ERANGE).
> > 
> > Promote them to 32 bit so it is clear what they are meant to target,
> > and then add tests specific for ERANGE (no syntax errors, just doing
> > 32bit op on 8 bit width, plus a typical 9-on-8 fencepost error).
> > 
> > Note that the remaining "10-1" on 8 is left as-is, since that captures
> > the fact that we check for (r->start > r->end) ---> EINVAL before we
> > check for (r->end >= nbits) ---> ERANGE.  If the code is ever re-ordered
> > then this test will pick up the mismatched errno value.
> 
> I didn't get the last statement. You meant code in the bitmap library itself,
> and not in the test cases? Please, clarify this somehow.

It probably wasn't worth a mention at all, as that test in question was
left unchanged;  but yes - it was a reference to the ordering of the
sanity checks in the bitmap code itself and not the test order.   I'll
simply delete the confusing "10-1" paragraph/comment. 

> I don't really much care, since it's not a tricky commit, but it might be 
> split
> to two or three separated ones. Anyway, feel free to add
> Reviewed-by: Andy Shevchenko 

Since you mentioned it, I assume you would prefer it.  So I will make
the 8 --> 32 change in one commit, and add the two new ERANGE tests in
another subsequent commit.

Thanks,
Paul.
--

> 
> > Cc: Yury Norov 
> > Cc: Rasmus Villemoes 
> > Cc: Andy Shevchenko 
> > Signed-off-by: Paul Gortmaker 
> > ---
> >  lib/test_bitmap.c | 16 +---
> >  1 file changed, 9 insertions(+), 7 deletions(-)
> > 
> > diff --git a/lib/test_bitmap.c b/lib/test_bitmap.c
> > index 4425a1dd4ef1..3d2cd3b1de84 100644
> > --- a/lib/test_bitmap.c
> > +++ b/lib/test_bitmap.c
> > @@ -337,13 +337,15 @@ static const struct test_bitmap_parselist 
> > parselist_tests[] __initconst = {
> >  
> > {-EINVAL, "-1", NULL, 8, 0},
> > {-EINVAL, "-0", NULL, 8, 0},
> > -   {-EINVAL, "10-1", NULL, 8, 0},
> > -   {-EINVAL, "0-31:", NULL, 8, 0},
> > -   {-EINVAL, "0-31:0", NULL, 8, 0},
> > -   {-EINVAL, "0-31:0/", NULL, 8, 0},
> > -   {-EINVAL, "0-31:0/0", NULL, 8, 0},
> > -   {-EINVAL, "0-31:1/0", NULL, 8, 0},
> > -   {-EINVAL, "0-31:10/1", NULL, 8, 0},
> > +   {-EINVAL, "10-1", NULL, 8, 0},  /* (start > end) ; also ERANGE */
> > +   {-ERANGE, "8-8", NULL, 8, 0},
> > +   {-ERANGE, "0-31", NULL, 8, 0},
> > +   {-EINVAL, "0-31:", NULL, 32, 0},
> > +   {-EINVAL, "0-31:0", NULL, 32, 0},
> > +   {-EINVAL, "0-31:0/", NULL, 32, 0},
> > +   {-EINVAL, "0-31:0/0", NULL, 32, 0},
> > +   {-EINVAL, "0-31:1/0", NULL, 32, 0},
> > +   {-EINVAL, "0-31:10/1", NULL, 32, 0},
> > {-EOVERFLOW, "0-98765432123456789:10/1", NULL, 8, 0},
> >  
> > {-EINVAL, "a-31", NULL, 8, 0},
> > -- 
> > 2.17.1
> > 
> 
> -- 
> With Best Regards,
> Andy Shevchenko
> 
> 


Re: [PATCH 3/3] lib: support N as end of range in bitmap_parselist()

2021-01-26 Thread Paul Gortmaker
[Re: [PATCH 3/3] lib: support N as end of range in bitmap_parselist()] On 
22/01/2021 (Fri 15:08) Yury Norov wrote:

> On Thu, Jan 21, 2021 at 8:44 PM Paul Gortmaker
>  wrote:
> >
> > [Re: [PATCH 3/3] lib: support N as end of range in bitmap_parselist()] On 
> > 21/01/2021 (Thu 16:29) Yury Norov wrote:
> >
> > > On Thu, Jan 21, 2021 at 2:34 PM Paul Gortmaker
> > >  wrote:
> > > >
> > > > While this is done for all bitmaps, the original use case in mind was
> > > > for CPU masks and cpulist_parse().  Credit to Yury who suggested to
> > > > push it down from CPU subsys to bitmap - it simplified things a lot.
> > >
> > > Can you convert your credit to Suggested-by or Reviewed-by? :)
> >
> > Sure, of course.

Now done for v3.

> >
> > [...]
> >
> > > > diff --git a/lib/bitmap.c b/lib/bitmap.c
> > > > index a1010646fbe5..d498ea9d526b 100644
> > > > --- a/lib/bitmap.c
> > > > +++ b/lib/bitmap.c
> > > > @@ -571,7 +571,7 @@ static const char *bitmap_find_region_reverse(const 
> > > > char *start, const char *end
> > > > return end;
> > > >  }
> > > >
> > > > -static const char *bitmap_parse_region(const char *str, struct region 
> > > > *r)
> > > > +static const char *bitmap_parse_region(const char *str, struct region 
> > > > *r, int nmaskbits)
> > > >  {
> > >
> > > in bitmap_parselist() you can store nmaskbits in the struct region, and 
> > > avoid
> > > passing nmaskbits as a parameter.
> >
> > OK.   FWIW, I considered that and went with the param so as to not open
> > the door to someone possibly using an uninitialized struct value later.

Also now done - reduces parameter passing and enables moving a sanity
check from set_region into check_region where it IMHO belongs.

> >
> > > > str = bitmap_getnum(str, &r->start);
> > > > if (IS_ERR(str))
> > > > @@ -583,9 +583,15 @@ static const char *bitmap_parse_region(const char 
> > > > *str, struct region *r)
> > > > if (*str != '-')
> > > > return ERR_PTR(-EINVAL);
> > > >
> > > > -   str = bitmap_getnum(str + 1, &r->end);
> > > > -   if (IS_ERR(str))
> > > > -   return str;
> > > > +   str++;
> > > > +   if (*str == 'N') {
> > > > +   r->end = nmaskbits - 1;
> > > > +   str++;
> > > > +   } else {
> > > > +   str = bitmap_getnum(str, &r->end);
> > > > +   if (IS_ERR(str))
> > > > +   return str;
> > > > +   }
> > >
> > > Indeed it's much simpler. But I don't like that you increase the nesting 
> > > level.
> > > Can you keep bitmap_parse_region() a single-tab style function?

No increase in nesting level in v3.

> >
> > Rather a strict coding style, but we can replace with:
> >
> >if (*str == 'N') {
> >r->end = nmaskbits - 1;
> >str++;
> >} else {
> >str = bitmap_getnum(str, &r->end);
> >}
> >
> >if (IS_ERR(str))
> >return str;
> >
> > Is that what you were after?
> >
> > > What about group size? Are you going to support N there, like "0-N:5/N"?
> >
> > No.  I would think that the group size has to be less than 1/2 of
> > the nmaskbits or you get the rather pointless case of just one group.
> > Plus conflating "end of range" with "group size" just adds confusion.
> > So it is currently not legal:
> >
> > root@hackbox:/sys/fs/cgroup/cpuset/foo# /bin/echo 4-N:2/4 > cpuset.cpus
> > root@hackbox:/sys/fs/cgroup/cpuset/foo# cat cpuset.cpus
> > 4-5,8-9,12-13
> > root@hackbox:/sys/fs/cgroup/cpuset/foo# /bin/echo 4-N:2/N > cpuset.cpus
> > /bin/echo: write error: Invalid argument
> > root@hackbox:/sys/fs/cgroup/cpuset/foo#
> >
> > > What about "N-N"? Is it legal? Maybe hide new logic in bitmap_getnum()?
> >
> > The "N-N" is also not supported/legal.  The allowed use is listed as
> > being for the end of a range only.  The code enforces this by ensuring
> > the char previous is a '-'  ; hence a leading N is invalid:
> >
> > root@hackbox:/sys/fs/cgroup/cpuset/foo# /bin/echo N-N > cpuset.cpus
> > /bin/echo: write erro

[PATCH 7/8] lib: test_bitmap: add tests for "N" alias

2021-01-26 Thread Paul Gortmaker
These are copies of existing tests, with just 31 --> N.  This ensures
the recently added "N" alias transparently works in any normally
numeric fields of a region specification.

Cc: Yury Norov 
Cc: Rasmus Villemoes 
Cc: Andy Shevchenko 
Signed-off-by: Paul Gortmaker 
---
 lib/test_bitmap.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/lib/test_bitmap.c b/lib/test_bitmap.c
index 807d1e8dd59c..2bcea2517c03 100644
--- a/lib/test_bitmap.c
+++ b/lib/test_bitmap.c
@@ -354,6 +354,16 @@ static const struct test_bitmap_parselist 
parselist_tests[] __initconst = {
{0, "16-31:16/31",  &exp1[3 * step], 32, 0},
{0, "31-31:31/31",  &exp1[14 * step], 32, 0},
 
+   {0, "N-N",  &exp1[14 * step], 32, 0},
+   {0, "0-0:1/N",  &exp1[0], 32, 0},
+   {0, "0-0:N/N",  &exp1[0], 32, 0},
+   {0, "0-15:16/N", &exp1[2 * step], 32, 0},
+   {0, "15-15:N/N", &exp1[13 * step], 32, 0},
+   {0, "15-N:1/N", &exp1[13 * step], 32, 0},
+   {0, "16-N:16/N", &exp1[3 * step], 32, 0},
+   {0, "N-N:N/N",  &exp1[14 * step], 32, 0},
+
+   {0, "0-N:1/3,1-N:1/3,2-N:1/3",  &exp1[8 * step], 32, 0},
{0, "0-31:1/3,1-31:1/3,2-31:1/3",   &exp1[8 * step], 32, 0},
{0, "1-10:8/12,8-31:24/29,0-31:0/3", &exp1[9 * step], 32, 0},
 
-- 
2.17.1



[PATCH 6/8] lib: bitmap: support "N" as an alias for size of bitmap

2021-01-26 Thread Paul Gortmaker
While this is done for all bitmaps, the original use case in mind was
for CPU masks and cpulist_parse() as described below.

It seems that a common configuration is to use the 1st couple cores for
housekeeping tasks.  This tends to leave the remaining ones to form a
pool of similarly configured cores to take on the real workload of
interest to the user.

So on machine A - with 32 cores, it could be 0-3 for "system" and then
4-31 being used in boot args like nohz_full=, or rcu_nocbs= as part of
setting up the worker pool of CPUs.

But then newer machine B is added, and it has 48 cores, and so while
the 0-3 part remains unchanged, the pool setup cpu list becomes 4-47.

Multiple deployment becomes easier when we can just simply replace 31
and 47 with "N" and let the system substitute in the actual number at
boot; a number that it knows better than we do.

Cc: Yury Norov 
Cc: Peter Zijlstra 
Cc: "Paul E. McKenney" 
Cc: Rasmus Villemoes 
Cc: Andy Shevchenko 
Suggested-by: Yury Norov 
Signed-off-by: Paul Gortmaker 
---
 Documentation/admin-guide/kernel-parameters.rst |  2 ++
 lib/bitmap.c| 12 ++--
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.rst 
b/Documentation/admin-guide/kernel-parameters.rst
index 682ab28b5c94..850917f19476 100644
--- a/Documentation/admin-guide/kernel-parameters.rst
+++ b/Documentation/admin-guide/kernel-parameters.rst
@@ -68,6 +68,8 @@ For example one can add to the command line following 
parameter:
 
 where the final item represents CPUs 100,101,125,126,150,151,...
 
+The value "N" can be used to represent the numerically last CPU on the system,
+i.e "foo_cpus=16-N" would be equivalent to "16-31" on a 32 core system.
 
 
 This document may not be entirely up to date and comprehensive. The command
diff --git a/lib/bitmap.c b/lib/bitmap.c
index f65be2f148fd..2fdd00b312c3 100644
--- a/lib/bitmap.c
+++ b/lib/bitmap.c
@@ -519,11 +519,17 @@ static int bitmap_check_region(const struct region *r)
return 0;
 }
 
-static const char *bitmap_getnum(const char *str, unsigned int *num)
+static const char *__bitmap_getnum(const char *str, unsigned int nbits,
+   unsigned int *num)
 {
unsigned long long n;
unsigned int len;
 
+   if (str[0] == 'N') {
+   *num = nbits - 1;
+   return str + 1;
+   }
+
len = _parse_integer(str, 10, &n);
if (!len)
return ERR_PTR(-EINVAL);
@@ -533,7 +539,7 @@ static const char *bitmap_getnum(const char *str, unsigned 
int *num)
*num = n;
return str + len;
 }
-#define bitmap_getrnum(s, r, pos) bitmap_getnum(s, &(r->pos))
+#define bitmap_getrnum(s, r, pos) __bitmap_getnum(s, r->nbits, &(r->pos))
 
 static inline bool end_of_str(char c)
 {
@@ -626,6 +632,8 @@ static const char *bitmap_parse_region(const char *str, 
struct region *r)
  * From each group will be used only defined amount of bits.
  * Syntax: range:used_size/group_size
  * Example: 0-1023:2/256 ==> 0,1,256,257,512,513,768,769
+ * The value 'N' can be used as a dynamically substituted token for the
+ * maximum allowed value; i.e (nmaskbits - 1).
  *
  * Returns: 0 on success, -errno on invalid input strings. Error values:
  *
-- 
2.17.1



[PATCH 8/8] rcu: deprecate "all" option to rcu_nocbs=

2021-01-26 Thread Paul Gortmaker
With the core bitmap support now accepting "N" as a placeholder for
the end of the bitmap, "all" can be represented as "0-N" and has the
advantage of not being specific to RCU (or any other subsystem).

So deprecate the use of "all" by removing documentation references
to it.  The support itself needs to remain for now, since we don't
know how many people out there are using it currently, but since it
is in an __init area anyway, it isn't worth losing sleep over.

Cc: Yury Norov 
Cc: Peter Zijlstra 
Cc: "Paul E. McKenney" 
Cc: Josh Triplett 
Signed-off-by: Paul Gortmaker 
---
 Documentation/admin-guide/kernel-parameters.txt | 4 +---
 kernel/rcu/tree_plugin.h| 6 ++
 2 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index a10b545c2070..a116c0ff0a91 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4037,9 +4037,7 @@
see CONFIG_RAS_CEC help text.
 
rcu_nocbs=  [KNL]
-   The argument is a cpu list, as described above,
-   except that the string "all" can be used to
-   specify every CPU on the system.
+   The argument is a cpu list, as described above.
 
In kernels built with CONFIG_RCU_NOCB_CPU=y, set
the specified list of CPUs to be no-callback CPUs.
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 7e291ce0a1d6..56788dfde922 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1463,14 +1463,12 @@ static void rcu_cleanup_after_idle(void)
 
 /*
  * Parse the boot-time rcu_nocb_mask CPU list from the kernel parameters.
- * The string after the "rcu_nocbs=" is either "all" for all CPUs, or a
- * comma-separated list of CPUs and/or CPU ranges.  If an invalid list is
- * given, a warning is emitted and all CPUs are offloaded.
+ * If the list is invalid, a warning is emitted and all CPUs are offloaded.
  */
 static int __init rcu_nocb_setup(char *str)
 {
alloc_bootmem_cpumask_var(&rcu_nocb_mask);
-   if (!strcasecmp(str, "all"))
+   if (!strcasecmp(str, "all"))/* legacy: use "0-N" instead */
cpumask_setall(rcu_nocb_mask);
else
if (cpulist_parse(str, rcu_nocb_mask)) {
-- 
2.17.1



[PATCH 5/8] lib: bitmap_getnum: separate arg into region and field

2021-01-26 Thread Paul Gortmaker
The bitmap_getnum is only used on a region's start/end/off/group_len
field.  Trivially decouple the region from the field so that the region
pointer is available for a pending change.

Cc: Yury Norov 
Cc: Rasmus Villemoes 
Cc: Andy Shevchenko 
Signed-off-by: Paul Gortmaker 
---
 lib/bitmap.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/lib/bitmap.c b/lib/bitmap.c
index 833f152a2c43..f65be2f148fd 100644
--- a/lib/bitmap.c
+++ b/lib/bitmap.c
@@ -533,6 +533,7 @@ static const char *bitmap_getnum(const char *str, unsigned 
int *num)
*num = n;
return str + len;
 }
+#define bitmap_getrnum(s, r, pos) bitmap_getnum(s, &(r->pos))
 
 static inline bool end_of_str(char c)
 {
@@ -571,7 +572,7 @@ static const char *bitmap_find_region_reverse(const char 
*start, const char *end
 
 static const char *bitmap_parse_region(const char *str, struct region *r)
 {
-   str = bitmap_getnum(str, &r->start);
+   str = bitmap_getrnum(str, r, start);
if (IS_ERR(str))
return str;
 
@@ -581,7 +582,7 @@ static const char *bitmap_parse_region(const char *str, 
struct region *r)
if (*str != '-')
return ERR_PTR(-EINVAL);
 
-   str = bitmap_getnum(str + 1, &r->end);
+   str = bitmap_getrnum(str + 1, r, end);
if (IS_ERR(str))
return str;
 
@@ -591,14 +592,14 @@ static const char *bitmap_parse_region(const char *str, 
struct region *r)
if (*str != ':')
return ERR_PTR(-EINVAL);
 
-   str = bitmap_getnum(str + 1, &r->off);
+   str = bitmap_getrnum(str + 1, r, off);
if (IS_ERR(str))
return str;
 
if (*str != '/')
return ERR_PTR(-EINVAL);
 
-   return bitmap_getnum(str + 1, &r->group_len);
+   return bitmap_getrnum(str + 1, r, group_len);
 
 no_end:
r->end = r->start;
-- 
2.17.1



[PATCH 2/8] lib: test_bitmap: add more start-end:offset/len tests

2021-01-26 Thread Paul Gortmaker
There are inputs to bitmap_parselist() that would probably never
be entered manually by a person, but might result from some kind of
automated input generator.  Things like ranges of length 1, or group
lengths longer than nbits, overlaps, or offsets of zero.

Adding these tests serves two purposes:

1) document what might seem odd but nonetheless valid input.

2) don't regress from what we currently accept as valid.

Cc: Yury Norov 
Cc: Rasmus Villemoes 
Cc: Andy Shevchenko 
Signed-off-by: Paul Gortmaker 
---
 lib/test_bitmap.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/lib/test_bitmap.c b/lib/test_bitmap.c
index 3d2cd3b1de84..807d1e8dd59c 100644
--- a/lib/test_bitmap.c
+++ b/lib/test_bitmap.c
@@ -35,6 +35,8 @@ static const unsigned long exp1[] __initconst = {
BITMAP_FROM_U64(0xULL),
BITMAP_FROM_U64(0xULL),
BITMAP_FROM_U64(0),
+   BITMAP_FROM_U64(0x8000),
+   BITMAP_FROM_U64(0x8000),
 };
 
 static const unsigned long exp2[] __initconst = {
@@ -335,6 +337,26 @@ static const struct test_bitmap_parselist 
parselist_tests[] __initconst = {
{0, " ,  ,,  , ,   ",   &exp1[12 * step], 8, 0},
{0, " ,  ,,  , ,   \n", &exp1[12 * step], 8, 0},
 
+   {0, "0-0",  &exp1[0], 32, 0},
+   {0, "1-1",  &exp1[1 * step], 32, 0},
+   {0, "15-15", &exp1[13 * step], 32, 0},
+   {0, "31-31", &exp1[14 * step], 32, 0},
+
+   {0, "0-0:0/1",  &exp1[12 * step], 32, 0},
+   {0, "0-0:1/1",  &exp1[0], 32, 0},
+   {0, "0-0:1/31", &exp1[0], 32, 0},
+   {0, "0-0:31/31", &exp1[0], 32, 0},
+   {0, "1-1:1/1",  &exp1[1 * step], 32, 0},
+   {0, "0-15:16/31",   &exp1[2 * step], 32, 0},
+   {0, "15-15:1/2", &exp1[13 * step], 32, 0},
+   {0, "15-15:31/31",  &exp1[13 * step], 32, 0},
+   {0, "15-31:1/31",   &exp1[13 * step], 32, 0},
+   {0, "16-31:16/31",  &exp1[3 * step], 32, 0},
+   {0, "31-31:31/31",  &exp1[14 * step], 32, 0},
+
+   {0, "0-31:1/3,1-31:1/3,2-31:1/3",   &exp1[8 * step], 32, 0},
+   {0, "1-10:8/12,8-31:24/29,0-31:0/3", &exp1[9 * step], 32, 0},
+
{-EINVAL, "-1", NULL, 8, 0},
{-EINVAL, "-0", NULL, 8, 0},
{-EINVAL, "10-1", NULL, 8, 0},  /* (start > end) ; also ERANGE */
-- 
2.17.1



[PATCH v3 0/8] support for bitmap (and hence CPU) list "N" abbreviation

2021-01-26 Thread Paul Gortmaker
The basic objective here was to add support for "nohz_full=8-N" and/or
"rcu_nocbs=4-N" -- essentially introduce "N" as a portable reference
to the last core, evaluated at boot for anything using a CPU list.

The thinking behind this is that people carve off a few early CPUs to
support housekeeping tasks, and perhaps dedicate one to a busy I/O
peripheral, and then the remaining pool of CPUs out to the end are a
part of a commonly configured pool used for the real work the user
cares about.

Extend that logic out to a fleet of machines - some new, and some
nearing EOL, and you've probably got a wide range of core counts to
contend with - even though the early number of cores dedicated to the
system overhead probably doesn't vary.

This change would enable sysadmins to have a common bootarg across all
such systems, and would also avoid any off-by-one fencepost errors that
happen for users who might briefly forget that core counts start at zero.

Originally I did this at the CPU subsys level, but Yury suggested it
be moved down further to bitmap level itself, which made the core 
implementation [6/8] smaller and less complex, but the series longer.

New self tests are added to better exercise what bitmap range/region
currently supports, and new tests are added for the new "N" support.

Also tested boot arg and the post-boot cgroup use case as per below:

   root@hackbox:~# cat /proc/cmdline 
   BOOT_IMAGE=/boot/bzImage root=/dev/sda1 rcu_nocbs=2,3,8-N:1/2
   root@hackbox:~# dmesg|grep Offl
   rcu: Offload RCU callbacks from CPUs: 2-3,8,10,12,14.

   root@hackbox:/sys/fs/cgroup/cpuset/foo# cat cpuset.cpus
   
   root@hackbox:/sys/fs/cgroup/cpuset/foo# /bin/echo 10-N > cpuset.cpus
   root@hackbox:/sys/fs/cgroup/cpuset/foo# cat cpuset.cpus
   10-15
   root@hackbox:/sys/fs/cgroup/cpuset/foo# /bin/echo N-N:N/N > cpuset.cpus
   root@hackbox:/sys/fs/cgroup/cpuset/foo# cat cpuset.cpus
   15

This was on a 16 core machine with CONFIG_NR_CPUS=16 in .config file.
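
To make the "start-end:off/group_len" arithmetic in the output above
concrete, here is a small userspace sketch of the region expansion.  It
follows the same loop shape as bitmap_set_region() but is not the kernel
code: region_to_mask() is a hypothetical helper, the bitmap is a single
64-bit word, and it assumes an already validated region (end < 64 and
group_len > 0).

```c
#include <stdint.h>

struct region {
	unsigned int start;
	unsigned int off;
	unsigned int group_len;
	unsigned int end;
};

/*
 * Expand a validated region into a 64-bit mask: starting at r->start,
 * set the first r->off bits of every r->group_len sized group, never
 * going past r->end.
 */
static uint64_t region_to_mask(const struct region *r)
{
	uint64_t mask = 0;
	unsigned int start, left, n, i;

	for (start = r->start; start <= r->end; start += r->group_len) {
		left = r->end - start + 1;	/* bits remaining in the range */
		n = left < r->off ? left : r->off;
		for (i = 0; i < n; i++)
			mask |= (uint64_t)1 << (start + i);
	}
	return mask;
}
```

On 16 CPUs, "8-N:1/2" resolves to start=8, end=15, off=1, group_len=2,
which expands to CPUs 8,10,12,14 (mask 0x5500), and "N-N:N/N" resolves to
all four fields being 15, which expands to CPU 15 alone (mask 0x8000).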

Note that "N" is a dynamic quantity, and can change scope if the bitmap
is changed in size.  So at the risk of stating the obvious, don't use it
for "burn_eFuse=128-N" or "secure_erase_firmware=32-N" type stuff.

Paul.
---

[v1: https://lore.kernel.org/lkml/20210106004850.GA11682@paulmck-ThinkPad-P72/ ]

[v2: push code down from cpu subsys to core bitmap code as per
 Yury's comments.  Change "last" to simply be "N" as per PeterZ.]
 
https://lore.kernel.org/lkml/20210121223355.59780-1-paul.gortma...@windriver.com/

[v3: Allow "N" to be used anywhere in the region spec, i.e. "N-N:N/N" vs.
 just being allowed at end of range like "0-N".  Add new self-tests.  Drop
 "all" and "none" aliases as redundant and not worth the extra complication. ]

Cc: Li Zefan 
Cc: Ingo Molnar 
Cc: Yury Norov 
Cc: Thomas Gleixner 
Cc: Josh Triplett 
Cc: Peter Zijlstra 
Cc: "Paul E. McKenney" 
Cc: Frederic Weisbecker 
Cc: Rasmus Villemoes 
Cc: Andy Shevchenko 

---

Paul Gortmaker (8):
  lib: test_bitmap: clearly separate ERANGE from EINVAL tests.
  lib: test_bitmap: add more start-end:offset/len tests
  lib: bitmap: fold nbits into region struct
  lib: bitmap: move ERANGE check from set_region to check_region
  lib: bitmap_getnum: separate arg into region and field
  lib: bitmap: support "N" as an alias for size of bitmap
  lib: test_bitmap: add tests for "N" alias
  rcu: deprecate "all" option to rcu_nocbs=

 .../admin-guide/kernel-parameters.rst |  2 +
 .../admin-guide/kernel-parameters.txt |  4 +-
 kernel/rcu/tree_plugin.h  |  6 +--
 lib/bitmap.c  | 46 ++
 lib/test_bitmap.c | 48 ---
 5 files changed, 72 insertions(+), 34 deletions(-)

-- 
2.17.1



[PATCH 4/8] lib: bitmap: move ERANGE check from set_region to check_region

2021-01-26 Thread Paul Gortmaker
It makes sense to do all the checks in check_region() and not 1/2
in check_region and 1/2 in set_region.

Since set_region is called immediately after check_region, the net
effect on runtime is zero, but it gets rid of an if (...) return...

Cc: Yury Norov 
Cc: Rasmus Villemoes 
Cc: Andy Shevchenko 
Signed-off-by: Paul Gortmaker 
---
 lib/bitmap.c | 14 +-
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/lib/bitmap.c b/lib/bitmap.c
index 162e2850c622..833f152a2c43 100644
--- a/lib/bitmap.c
+++ b/lib/bitmap.c
@@ -500,17 +500,12 @@ struct region {
unsigned int nbits;
 };
 
-static int bitmap_set_region(const struct region *r, unsigned long *bitmap)
+static void bitmap_set_region(const struct region *r, unsigned long *bitmap)
 {
unsigned int start;
 
-   if (r->end >= r->nbits)
-   return -ERANGE;
-
for (start = r->start; start <= r->end; start += r->group_len)
bitmap_set(bitmap, start, min(r->end - start + 1, r->off));
-
-   return 0;
 }
 
 static int bitmap_check_region(const struct region *r)
@@ -518,6 +513,9 @@ static int bitmap_check_region(const struct region *r)
if (r->start > r->end || r->group_len == 0 || r->off > r->group_len)
return -EINVAL;
 
+   if (r->end >= r->nbits)
+   return -ERANGE;
+
return 0;
 }
 
@@ -656,9 +654,7 @@ int bitmap_parselist(const char *buf, unsigned long *maskp, 
int nmaskbits)
if (ret)
return ret;
 
-   ret = bitmap_set_region(&r, maskp);
-   if (ret)
-   return ret;
+   bitmap_set_region(&r, maskp);
}
 
return 0;
-- 
2.17.1



[PATCH 1/8] lib: test_bitmap: clearly separate ERANGE from EINVAL tests.

2021-01-26 Thread Paul Gortmaker
This block of tests was meant to find/flag incorrect use of the ":"
and "/" separators (syntax errors) and invalid (zero) group len.

However they were specified with an 8 bit width and 32 bit operations,
so they really contained two errors (EINVAL and ERANGE).

Promote them to 32 bit so it is clear what they are meant to target,
and then add tests specific for ERANGE (no syntax errors, just doing
32bit op on 8 bit width, plus a typical 9-on-8 fencepost error).

Note that the remaining "10-1" on 8 is left as-is, since that captures
the fact that we check for (r->start > r->end) ---> EINVAL before we
check for (r->end >= nbits) ---> ERANGE.  If the code is ever re-ordered
then this test will pick up the mismatched errno value.

Cc: Yury Norov 
Cc: Rasmus Villemoes 
Cc: Andy Shevchenko 
Signed-off-by: Paul Gortmaker 
---
 lib/test_bitmap.c | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/lib/test_bitmap.c b/lib/test_bitmap.c
index 4425a1dd4ef1..3d2cd3b1de84 100644
--- a/lib/test_bitmap.c
+++ b/lib/test_bitmap.c
@@ -337,13 +337,15 @@ static const struct test_bitmap_parselist 
parselist_tests[] __initconst = {
 
{-EINVAL, "-1", NULL, 8, 0},
{-EINVAL, "-0", NULL, 8, 0},
-   {-EINVAL, "10-1", NULL, 8, 0},
-   {-EINVAL, "0-31:", NULL, 8, 0},
-   {-EINVAL, "0-31:0", NULL, 8, 0},
-   {-EINVAL, "0-31:0/", NULL, 8, 0},
-   {-EINVAL, "0-31:0/0", NULL, 8, 0},
-   {-EINVAL, "0-31:1/0", NULL, 8, 0},
-   {-EINVAL, "0-31:10/1", NULL, 8, 0},
+   {-EINVAL, "10-1", NULL, 8, 0},  /* (start > end) ; also ERANGE */
+   {-ERANGE, "8-8", NULL, 8, 0},
+   {-ERANGE, "0-31", NULL, 8, 0},
+   {-EINVAL, "0-31:", NULL, 32, 0},
+   {-EINVAL, "0-31:0", NULL, 32, 0},
+   {-EINVAL, "0-31:0/", NULL, 32, 0},
+   {-EINVAL, "0-31:0/0", NULL, 32, 0},
+   {-EINVAL, "0-31:1/0", NULL, 32, 0},
+   {-EINVAL, "0-31:10/1", NULL, 32, 0},
{-EOVERFLOW, "0-98765432123456789:10/1", NULL, 8, 0},
 
{-EINVAL, "a-31", NULL, 8, 0},
-- 
2.17.1
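
The ordering that the "10-1" test pins down can be sketched in userspace as
a small table-driven check. This is only a sketch under stated assumptions:
check_range() is a hypothetical stand-in that understands nothing but
"start-end" (anything trailing counts as a syntax error), and the table
borrows a few of the inputs from the patch rather than reproducing
test_bitmap.c.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the parser: handles "start-end" only. */
static int check_range(const char *s, unsigned int nbits)
{
	unsigned int start, end;
	char trailing;

	/* Exactly two numbers and nothing after them, else a syntax error. */
	if (sscanf(s, "%u-%u%c", &start, &end, &trailing) != 2)
		return -EINVAL;

	/* start > end is reported before the width is even looked at ... */
	if (start > end)
		return -EINVAL;

	/* ... so ERANGE is only seen for otherwise well-formed input. */
	if (end >= nbits)
		return -ERANGE;

	return 0;
}

struct range_test {
	int expected;
	const char *in;
	unsigned int nbits;
};

static const struct range_test range_tests[] = {
	{-EINVAL, "10-1", 8},	/* start > end; would also be out of range */
	{-ERANGE, "8-8", 8},	/* 9-on-8 fencepost */
	{-ERANGE, "0-31", 8},	/* 32 bit op on 8 bit width */
	{0, "0-31", 32},
	{-EINVAL, "0-31:", 32},	/* syntax error only, width is fine */
};

/* Returns the number of table entries whose errno did not match. */
static int run_range_tests(void)
{
	size_t i;
	int failures = 0;

	for (i = 0; i < sizeof(range_tests) / sizeof(range_tests[0]); i++) {
		const struct range_test *t = &range_tests[i];
		int got = check_range(t->in, t->nbits);

		if (got != t->expected) {
			printf("\"%s\" on %u bits: got %d, expected %d\n",
			       t->in, t->nbits, got, t->expected);
			failures++;
		}
	}

	return failures;
}
```

If the two checks in check_range() were swapped, the "10-1" entry would
start reporting -ERANGE and run_range_tests() would flag the mismatch.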



[PATCH 3/8] lib: bitmap: fold nbits into region struct

2021-01-26 Thread Paul Gortmaker
This will reduce parameter passing and enable using nbits as part
of future dynamic region parameter parsing.

Cc: Yury Norov 
Cc: Rasmus Villemoes 
Cc: Andy Shevchenko 
Suggested-by: Yury Norov 
Signed-off-by: Paul Gortmaker 
---
 lib/bitmap.c | 19 ++-
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/lib/bitmap.c b/lib/bitmap.c
index 75006c4036e9..162e2850c622 100644
--- a/lib/bitmap.c
+++ b/lib/bitmap.c
@@ -487,24 +487,24 @@ EXPORT_SYMBOL(bitmap_print_to_pagebuf);
 
 /*
  * Region 9-38:4/10 describes the following bitmap structure:
- * 0        9 12    18                  38
- * .........****......****......****......
- *          ^  ^     ^                   ^
- *      start off group_len            end
+ * 0        9 12    18                  38        N
+ * .........****......****......****................
+ *          ^  ^     ^                   ^         ^
+ *      start off group_len            end     nbits
  */
 struct region {
unsigned int start;
unsigned int off;
unsigned int group_len;
unsigned int end;
+   unsigned int nbits;
 };
 
-static int bitmap_set_region(const struct region *r,
-   unsigned long *bitmap, int nbits)
+static int bitmap_set_region(const struct region *r, unsigned long *bitmap)
 {
unsigned int start;
 
-   if (r->end >= nbits)
+   if (r->end >= r->nbits)
return -ERANGE;
 
for (start = r->start; start <= r->end; start += r->group_len)
@@ -640,7 +640,8 @@ int bitmap_parselist(const char *buf, unsigned long *maskp, 
int nmaskbits)
struct region r;
long ret;
 
-   bitmap_zero(maskp, nmaskbits);
+   r.nbits = nmaskbits;
+   bitmap_zero(maskp, r.nbits);
 
while (buf) {
buf = bitmap_find_region(buf);
@@ -655,7 +656,7 @@ int bitmap_parselist(const char *buf, unsigned long *maskp, 
int nmaskbits)
if (ret)
return ret;
 
-   ret = bitmap_set_region(&r, maskp, nmaskbits);
+   ret = bitmap_set_region(&r, maskp);
if (ret)
return ret;
}
-- 
2.17.1



Re: [PATCH 3/3] lib: support N as end of range in bitmap_parselist()

2021-01-21 Thread Paul Gortmaker
[Re: [PATCH 3/3] lib: support N as end of range in bitmap_parselist()] On 
21/01/2021 (Thu 16:29) Yury Norov wrote:

> On Thu, Jan 21, 2021 at 2:34 PM Paul Gortmaker wrote:
> >
> > While this is done for all bitmaps, the original use case in mind was
> > for CPU masks and cpulist_parse().  Credit to Yury who suggested to
> > push it down from CPU subsys to bitmap - it simplified things a lot.
> 
> Can you convert your credit to Suggested-by or Reviewed-by? :)

Sure, of course.

[...]

> > diff --git a/lib/bitmap.c b/lib/bitmap.c
> > index a1010646fbe5..d498ea9d526b 100644
> > --- a/lib/bitmap.c
> > +++ b/lib/bitmap.c
> > @@ -571,7 +571,7 @@ static const char *bitmap_find_region_reverse(const 
> > char *start, const char *end
> > return end;
> >  }
> >
> > -static const char *bitmap_parse_region(const char *str, struct region *r)
> > +static const char *bitmap_parse_region(const char *str, struct region *r, 
> > int nmaskbits)
> >  {
> 
> in bitmap_parselist() you can store nmaskbits in the struct region, and avoid
> passing nmaskbits as a parameter.

OK.   FWIW, I considered that and went with the param so as to not open
the door to someone possibly using an uninitialized struct value later.

> > str = bitmap_getnum(str, &r->start);
> > if (IS_ERR(str))
> > @@ -583,9 +583,15 @@ static const char *bitmap_parse_region(const char 
> > *str, struct region *r)
> > if (*str != '-')
> > return ERR_PTR(-EINVAL);
> >
> > -   str = bitmap_getnum(str + 1, &r->end);
> > -   if (IS_ERR(str))
> > -   return str;
> > +   str++;
> > +   if (*str == 'N') {
> > +   r->end = nmaskbits - 1;
> > +   str++;
> > +   } else {
> > +   str = bitmap_getnum(str, &r->end);
> > +   if (IS_ERR(str))
> > +   return str;
> > +   }
> 
> Indeed it's much simpler. But I don't like that you increase the
> nesting level.
> Can you keep bitmap_parse_region() a single-tab style function?

Rather a strict coding style, but we can replace with:

   if (*str == 'N') {
   r->end = nmaskbits - 1;
   str++;
   } else {
   str = bitmap_getnum(str, &r->end);
   }

   if (IS_ERR(str))
   return str;

Is that what you were after?
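
To make the flattened flow concrete, here is a userspace sketch of it. It
is an approximation, not the kernel code: the kernel signals errors with
ERR_PTR()/IS_ERR(), while this sketch uses a separate err argument, and
get_num() and parse_end() are hypothetical names.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Read a decimal number; on failure set *err and leave *num untouched. */
static const char *get_num(const char *str, unsigned int *num, int *err)
{
	char *endp;
	unsigned long val;

	if (!isdigit((unsigned char)*str)) {
		*err = -EINVAL;
		return str;
	}

	errno = 0;
	val = strtoul(str, &endp, 10);
	if (errno == ERANGE || val > UINT_MAX) {
		*err = -EOVERFLOW;
		return str;
	}

	*num = (unsigned int)val;
	return endp;
}

/*
 * Parse what follows the '-' of a range: a number or a literal 'N'.
 * Returns a pointer just past the token, or NULL with *err set.
 */
static const char *parse_end(const char *str, unsigned int *end,
			     unsigned int nbits, int *err)
{
	*err = 0;

	if (*str == 'N') {
		*end = nbits - 1;
		str++;
	} else {
		str = get_num(str, end, err);
	}

	/* One shared check: the 'N' branch never sets *err, so it passes. */
	if (*err)
		return NULL;

	return str;
}
```

The point of the shape is that the error check sits once, after the
if/else, at a single indentation level; the 'N' branch cannot produce an
error, so the shared check is harmless for it.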

> What about group size? Are you going to support N there, like "0-N:5/N"?

No.  I would think that the group size has to be less than 1/2 of
the nmaskbits or you get the rather pointless case of just one group.
Plus conflating "end of range" with "group size" just adds confusion.
So it is currently not legal:

root@hackbox:/sys/fs/cgroup/cpuset/foo# /bin/echo 4-N:2/4 > cpuset.cpus
root@hackbox:/sys/fs/cgroup/cpuset/foo# cat cpuset.cpus
4-5,8-9,12-13
root@hackbox:/sys/fs/cgroup/cpuset/foo# /bin/echo 4-N:2/N > cpuset.cpus
/bin/echo: write error: Invalid argument
root@hackbox:/sys/fs/cgroup/cpuset/foo#

> What about "N-N"? Is it legal? Maybe hide new logic in bitmap_getnum()?

The "N-N" is also not supported/legal.  The allowed use is listed as
being for the end of a range only.  The code enforces this by ensuring
the char previous is a '-'  ; hence a leading N is invalid:

root@hackbox:/sys/fs/cgroup/cpuset/foo# /bin/echo N-N > cpuset.cpus
/bin/echo: write error: Invalid argument
root@hackbox:/sys/fs/cgroup/cpuset/foo# /bin/echo 0-N > cpuset.cpus
root@hackbox:/sys/fs/cgroup/cpuset/foo# cat cpuset.cpus
0-15
root@hackbox:/sys/fs/cgroup/cpuset/foo#

I think "use for end of range only" makes sense in the mathematical
sense most of us have seen during school:  {0, 1, 2, ...  N-1, N} as
used in the end point of a range of numbers.  I could make the "only"
part more explicit and concrete in the comments/docs if desired.

I'm not sure I see the value in complicating things in order to add
or extend support to non-intuitive use cases beyond that - to me that
seems to just make things more confusing for end users.  But again
if you've something in mind that I'm simply missing, then by all
means please elaborate.

> I would also like to see tests covering new functionality. As a user of "N",
> I want to be 100% sure that this "N" is a full equivalent of NR_CPUS, 
> including
> error codes that the parser returns. Otherwise it will be hard to maintain the
> transition.

That is a reasonable request.  I will look into adding "N" based type
tests to the existing bitmap test cases in a separate commit.

Thanks,
Paul.
--

> 
> > if (end_of_region(*str))
> > goto no_pattern;
> > @@ -628,6 +634,8 @@ s

Re: [PATCH 1/3] lib: add "all" and "none" as valid ranges to bitmap_parselist()

2021-01-21 Thread Paul Gortmaker
[Re: [PATCH 1/3] lib: add "all" and "none" as valid ranges to 
bitmap_parselist()] On 21/01/2021 (Thu 16:07) Yury Norov wrote:

> On Thu, Jan 21, 2021 at 2:34 PM Paul Gortmaker wrote:
> >
> > The use of "all" was originally RCU specific - I'd pushed it down to
> > being used for any CPU lists -- then Yury suggested pushing it down
> > further to be used by any bitmap, which is done here.
> >
> > As a trivial one line extension, we also accept the inverse "none"
> > as a valid alias.
> >
> > Cc: Yury Norov 
> > Cc: Peter Zijlstra 
> > Cc: "Paul E. McKenney" 
> > Signed-off-by: Paul Gortmaker 
> > ---
> >  Documentation/admin-guide/kernel-parameters.rst | 11 +++
> >  lib/bitmap.c|  9 +
> >  2 files changed, 20 insertions(+)
> >
> > diff --git a/Documentation/admin-guide/kernel-parameters.rst 
> > b/Documentation/admin-guide/kernel-parameters.rst
> > index 682ab28b5c94..5e080080b058 100644
> > --- a/Documentation/admin-guide/kernel-parameters.rst
> > +++ b/Documentation/admin-guide/kernel-parameters.rst
> > @@ -68,7 +68,18 @@ For example one can add to the command line following 
> > parameter:
> >
> >  where the final item represents CPUs 100,101,125,126,150,151,...
> >
> > +The following convenience aliases are also accepted and used:
> >
> > +foo_cpus=all
> > +
> > +will provide a full/all-set cpu mask for the associated boot argument.
> > +
> > +foo_cpus=none
> > +
> > +will provide an empty/cleared cpu mask for the associated boot argument.
> > +
> > +Note that "all" and "none" are not necessarily valid/sensible input values
> > +for each available boot parameter expecting a CPU list.
> 
> My question from v1 is still there: what about the line like
> "none,all", ok ",all,"

Apologies - I must have overlooked that somehow.  Let me address it now.

> or similar? If it's not legal, it should be mentioned in the comment,

OK, it is not legal.  So if desired, I can do this in the code...

 - * Optionally the self-descriptive "all" or "none" can be used.
 + * Optionally the self-descriptive stand alone "all" or "none" can be used.

...and a similar "stand alone" addition in kernel-parameters.rst above?

> if it is legal,
> the corresponding code should go to bitmap_parse_region(), just like for "N".

Non-standalone is not legal.  The strcmp ensures the "all" or "none" are
stand-alone.  And as can be seen in the testing below, any attempt to
combine them with commas or ranges or repeated instances is -EINVAL.
(And I'll look at adding such tests to bitmap_test.c as requested.)

> My personal preference is the latter option.

I'm a bit confused as to the value in adding code for supporting things
like ",all,none,all,,none" and then having to define some policy, like
"last processed takes precedence" or similar.   A strict stand-alone
"all" or "none" and everything else as -EINVAL as per below seems
logical.   Maybe I'm missing something and you can elaborate?

Thanks
Paul.
--

root@hackbox:/sys/fs/cgroup/cpuset/foo# /bin/echo all,none,all > cpuset.cpus
/bin/echo: write error: Invalid argument
root@hackbox:/sys/fs/cgroup/cpuset/foo# /bin/echo none,all > cpuset.cpus
/bin/echo: write error: Invalid argument
root@hackbox:/sys/fs/cgroup/cpuset/foo# /bin/echo all,all > cpuset.cpus
/bin/echo: write error: Invalid argument
root@hackbox:/sys/fs/cgroup/cpuset/foo# /bin/echo all, > cpuset.cpus
/bin/echo: write error: Invalid argument
root@hackbox:/sys/fs/cgroup/cpuset/foo# /bin/echo all > cpuset.cpus
root@hackbox:/sys/fs/cgroup/cpuset/foo# /bin/echo ,none > cpuset.cpus
/bin/echo: write error: Invalid argument
root@hackbox:/sys/fs/cgroup/cpuset/foo# /bin/echo none > cpuset.cpus
root@hackbox:/sys/fs/cgroup/cpuset/foo# /bin/echo 1,3,5,7,9,11,13,15 > cpuset.cpus
root@hackbox:/sys/fs/cgroup/cpuset/foo# cat cpuset.cpus
1,3,5,7,9,11,13,15
root@hackbox:/sys/fs/cgroup/cpuset/foo# /bin/echo 1,3,5,7,9,11,13,all > cpuset.cpus
/bin/echo: write error: Invalid argument
root@hackbox:/sys/fs/cgroup/cpuset/foo# /bin/echo none,3,5,7,9,11,13,15 > cpuset.cpus
/bin/echo: write error: Invalid argument
root@hackbox:/sys/fs/cgroup/cpuset/foo# 


Re: [PATCH RFC cpumask] Allow "all", "none", and "last" in cpumask strings

2021-01-21 Thread Paul Gortmaker
[Re: [PATCH RFC cpumask] Allow "all", "none", and "last" in cpumask strings] On 
20/01/2021 (Wed 23:11) Yury Norov wrote:

> Hi Paul,
> 
> Today I found this series in linux-next despite downsides discovered during
> the review. This series introduces absolutely unneeded cap on the number of
> cpus in the system (), and also adds unsafe and non-optimal code.
> 
> In addition to that, I observe this warning on powerpc:
>   CC  lib/cpumask.o
> lib/cpumask.c: In function ‘cpulist_parse’:
> lib/cpumask.c:222:17: warning: cast from pointer to integer of
> different size [-Wpointer-to-int-cast]
>   222 |   memblock_free((phys_addr_t)cpulist, len);
>   | ^
> 
> Can you please revert this series unless all the problems will be fixed?

That was my fault - I should have explicitly asked PaulM to yank it once
I didn't get to creating v2 immediately.  Sorry.

Your suggested changes made things much more simple and smaller - thanks!

I believe v2 does address all the problems - please have a look when you
have some time.  It should be easier to review, given the smaller size.

https://lore.kernel.org/lkml/20210121223355.59780-1-paul.gortma...@windriver.com/

Thanks again,
Paul.
--

> 
> Thanks,
> Yury


[PATCH 2/3] rcu: dont special case "all" handling; let bitmask deal with it

2021-01-21 Thread Paul Gortmaker
Now that the core bitmap parse code respects the "all" parameter, there
is no need for RCU to have its own special check for it.

Cc: Yury Norov 
Cc: Peter Zijlstra 
Cc: "Paul E. McKenney" 
Signed-off-by: Paul Gortmaker 
---
 Documentation/admin-guide/kernel-parameters.txt |  4 +---
 kernel/rcu/tree_plugin.h| 13 -
 2 files changed, 5 insertions(+), 12 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index a10b545c2070..a116c0ff0a91 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4037,9 +4037,7 @@
see CONFIG_RAS_CEC help text.
 
rcu_nocbs=  [KNL]
-   The argument is a cpu list, as described above,
-   except that the string "all" can be used to
-   specify every CPU on the system.
+   The argument is a cpu list, as described above.
 
In kernels built with CONFIG_RCU_NOCB_CPU=y, set
the specified list of CPUs to be no-callback CPUs.
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 7e291ce0a1d6..642ebd6569c7 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1463,20 +1463,15 @@ static void rcu_cleanup_after_idle(void)
 
 /*
  * Parse the boot-time rcu_nocb_mask CPU list from the kernel parameters.
- * The string after the "rcu_nocbs=" is either "all" for all CPUs, or a
- * comma-separated list of CPUs and/or CPU ranges.  If an invalid list is
- * given, a warning is emitted and all CPUs are offloaded.
+ * If the list is invalid, a warning is emitted and all CPUs are offloaded.
  */
 static int __init rcu_nocb_setup(char *str)
 {
alloc_bootmem_cpumask_var(&rcu_nocb_mask);
-   if (!strcasecmp(str, "all"))
+   if (cpulist_parse(str, rcu_nocb_mask)) {
+   pr_warn("rcu_nocbs= bad CPU range, all CPUs set\n");
cpumask_setall(rcu_nocb_mask);
-   else
-   if (cpulist_parse(str, rcu_nocb_mask)) {
-   pr_warn("rcu_nocbs= bad CPU range, all CPUs set\n");
-   cpumask_setall(rcu_nocb_mask);
-   }
+   }
return 1;
 }
 __setup("rcu_nocbs=", rcu_nocb_setup);
-- 
2.17.1



[PATCH 1/3] lib: add "all" and "none" as valid ranges to bitmap_parselist()

2021-01-21 Thread Paul Gortmaker
The use of "all" was originally RCU specific - I'd pushed it down to
being used for any CPU lists -- then Yury suggested pushing it down
further to be used by any bitmap, which is done here.

As a trivial one line extension, we also accept the inverse "none"
as a valid alias.

Cc: Yury Norov 
Cc: Peter Zijlstra 
Cc: "Paul E. McKenney" 
Signed-off-by: Paul Gortmaker 
---
 Documentation/admin-guide/kernel-parameters.rst | 11 +++
 lib/bitmap.c|  9 +
 2 files changed, 20 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.rst 
b/Documentation/admin-guide/kernel-parameters.rst
index 682ab28b5c94..5e080080b058 100644
--- a/Documentation/admin-guide/kernel-parameters.rst
+++ b/Documentation/admin-guide/kernel-parameters.rst
@@ -68,7 +68,18 @@ For example one can add to the command line following 
parameter:
 
 where the final item represents CPUs 100,101,125,126,150,151,...
 
+The following convenience aliases are also accepted and used:
 
+foo_cpus=all
+
+will provide a full/all-set cpu mask for the associated boot argument.
+
+foo_cpus=none
+
+will provide an empty/cleared cpu mask for the associated boot argument.
+
+Note that "all" and "none" are not necessarily valid/sensible input values
+for each available boot parameter expecting a CPU list.
 
 This document may not be entirely up to date and comprehensive. The command
 "modinfo -p ${modulename}" shows a current list of all parameters of a loadable
diff --git a/lib/bitmap.c b/lib/bitmap.c
index 75006c4036e9..a1010646fbe5 100644
--- a/lib/bitmap.c
+++ b/lib/bitmap.c
@@ -627,6 +627,7 @@ static const char *bitmap_parse_region(const char *str, 
struct region *r)
  * From each group will be used only defined amount of bits.
  * Syntax: range:used_size/group_size
  * Example: 0-1023:2/256 ==> 0,1,256,257,512,513,768,769
+ * Optionally the self-descriptive "all" or "none" can be used.
  *
  * Returns: 0 on success, -errno on invalid input strings. Error values:
  *
@@ -640,8 +641,16 @@ int bitmap_parselist(const char *buf, unsigned long 
*maskp, int nmaskbits)
struct region r;
long ret;
 
+   if (!strcmp(buf, "all")) {
+   bitmap_fill(maskp, nmaskbits);
+   return 0;
+   }
+
bitmap_zero(maskp, nmaskbits);
 
+   if (!strcmp(buf, "none"))
+   return 0;
+
while (buf) {
buf = bitmap_find_region(buf);
if (buf == NULL)
-- 
2.17.1
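
The whole-string comparison is what makes the aliases stand-alone. A
minimal userspace sketch of that idea follows, assuming a mask of at most
64 bits; parse_alias() is a hypothetical helper, not part of lib/bitmap.c.

```c
#include <stdint.h>
#include <string.h>

/*
 * Returns 1 if buf is exactly "all" or "none" and fills in *mask,
 * or 0 if buf is not an alias (in which case *mask is left untouched).
 */
static int parse_alias(const char *buf, uint64_t *mask, unsigned int nbits)
{
	if (!strcmp(buf, "all")) {
		/* Avoid an undefined 64 bit shift when nbits is 64. */
		*mask = nbits >= 64 ? ~(uint64_t)0
				    : (((uint64_t)1 << nbits) - 1);
		return 1;
	}

	if (!strcmp(buf, "none")) {
		*mask = 0;
		return 1;
	}

	return 0;
}
```

Because strcmp() compares the entire string, inputs such as "all," or
"none,all" are not treated as aliases here; in this sketch they would be
left for whatever list parser runs next to accept or reject.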



[PATCH 3/3] lib: support N as end of range in bitmap_parselist()

2021-01-21 Thread Paul Gortmaker
While this is done for all bitmaps, the original use case in mind was
for CPU masks and cpulist_parse().  Credit to Yury who suggested to
push it down from CPU subsys to bitmap - it simplified things a lot.

It seems that a common configuration is to use the first couple of cores
for housekeeping tasks, and/or for driving a busy peripheral that
generates a lot of interrupts, or something similar.

This tends to leave the remaining ones to form a pool of similarly
configured cores to take on the real workload of interest to the user.

So on machine A - with 32 cores, it could be 0-3 for "system" and then
4-31 being used in boot args like nohz_full=, or rcu_nocbs= as part of
setting up the worker pool of CPUs.

But then newer machine B is added, and it has 48 cores, and so while
the 0-3 part remains unchanged, the pool setup cpu list becomes 4-47.

Deployment would be easier if we could just simply replace 31 and 47
with "N" and let the system substitute in the actual number at boot;
a number that it knows better than we do.

No need to have custom boot args per node, no need to do a trial boot
in order to snoop /proc/cpuinfo and/or /sys/devices/system/cpu - no
more fencepost errors of using 32 and 48 instead of 31 and 47.

Cc: Yury Norov 
Cc: Peter Zijlstra 
Cc: "Paul E. McKenney" 
Signed-off-by: Paul Gortmaker 
---
 .../admin-guide/kernel-parameters.rst  |  4 
 lib/bitmap.c   | 18 +-
 2 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.rst 
b/Documentation/admin-guide/kernel-parameters.rst
index 5e080080b058..668f0b69fb4f 100644
--- a/Documentation/admin-guide/kernel-parameters.rst
+++ b/Documentation/admin-guide/kernel-parameters.rst
@@ -68,6 +68,10 @@ For example one can add to the command line following 
parameter:
 
 where the final item represents CPUs 100,101,125,126,150,151,...
 
+The value "N" can be used as the end of a range, to represent the numerically
+last CPU on the system, i.e "foo_cpus=16-N" would be equivalent to "16-31" on
+a 32 core system.
+
 The following convenience aliases are also accepted and used:
 
 foo_cpus=all
diff --git a/lib/bitmap.c b/lib/bitmap.c
index a1010646fbe5..d498ea9d526b 100644
--- a/lib/bitmap.c
+++ b/lib/bitmap.c
@@ -571,7 +571,7 @@ static const char *bitmap_find_region_reverse(const char 
*start, const char *end
return end;
 }
 
-static const char *bitmap_parse_region(const char *str, struct region *r)
+static const char *bitmap_parse_region(const char *str, struct region *r, int 
nmaskbits)
 {
str = bitmap_getnum(str, &r->start);
if (IS_ERR(str))
@@ -583,9 +583,15 @@ static const char *bitmap_parse_region(const char *str, 
struct region *r)
if (*str != '-')
return ERR_PTR(-EINVAL);
 
-   str = bitmap_getnum(str + 1, &r->end);
-   if (IS_ERR(str))
-   return str;
+   str++;
+   if (*str == 'N') {
+   r->end = nmaskbits - 1;
+   str++;
+   } else {
+   str = bitmap_getnum(str, &r->end);
+   if (IS_ERR(str))
+   return str;
+   }
 
if (end_of_region(*str))
goto no_pattern;
@@ -628,6 +634,8 @@ static const char *bitmap_parse_region(const char *str, 
struct region *r)
  * Syntax: range:used_size/group_size
  * Example: 0-1023:2/256 ==> 0,1,256,257,512,513,768,769
  * Optionally the self-descriptive "all" or "none" can be used.
+ * The value 'N' can be used as the end of a range to indicate the maximum
+ * allowed value; i.e (nmaskbits - 1).
  *
  * Returns: 0 on success, -errno on invalid input strings. Error values:
  *
@@ -656,7 +664,7 @@ int bitmap_parselist(const char *buf, unsigned long *maskp, 
int nmaskbits)
if (buf == NULL)
return 0;
 
-   buf = bitmap_parse_region(buf, &r);
+   buf = bitmap_parse_region(buf, &r, nmaskbits);
if (IS_ERR(buf))
return PTR_ERR(buf);
 
-- 
2.17.1
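
To show how one string can serve machines with different core counts, here
is a simplified userspace sketch of a list parser that accepts "N" as the
end of a range. It is not bitmap_parselist(): it assumes a mask of at most
64 bits, skips whitespace handling and the "all"/"none" aliases, and
read_num() and parse_list() are hypothetical names.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/* Read a decimal number; returns NULL if none is present or it is too big. */
static const char *read_num(const char *s, unsigned int *out)
{
	char *endp;
	unsigned long val;

	if (!isdigit((unsigned char)*s))
		return NULL;

	val = strtoul(s, &endp, 10);
	if (val > UINT_MAX)
		return NULL;

	*out = (unsigned int)val;
	return endp;
}

/* Parse "a", "a-b" or "a-N", each optionally with ":used/group". */
static int parse_list(const char *s, uint64_t *mask, unsigned int nbits)
{
	unsigned int start, end, used, group, pos, len, bit;

	if (nbits == 0 || nbits > 64)
		return -EINVAL;

	*mask = 0;
	while (*s) {
		s = read_num(s, &start);
		if (!s)
			return -EINVAL;	/* a leading 'N' ends up here */

		end = start;
		if (*s == '-') {
			s++;
			if (*s == 'N') {
				end = nbits - 1;	/* last valid bit */
				s++;
			} else {
				s = read_num(s, &end);
				if (!s)
					return -EINVAL;
			}
		}

		used = group = end + 1;	/* default: one group, all used */
		if (*s == ':') {
			s = read_num(s + 1, &used);
			if (!s || *s != '/')
				return -EINVAL;
			s = read_num(s + 1, &group);
			if (!s)
				return -EINVAL;	/* "/N" is not accepted */
		}

		if (*s == ',')
			s++;
		else if (*s)
			return -EINVAL;

		if (start > end || group == 0 || used > group)
			return -EINVAL;
		if (end >= nbits)
			return -ERANGE;

		for (pos = start; ; pos += group) {
			len = end - pos + 1;
			if (len > used)
				len = used;
			for (bit = pos; bit < pos + len; bit++)
				*mask |= (uint64_t)1 << bit;
			if (group > end - pos)	/* next group is past end */
				break;
		}
	}

	return 0;
}
```

In this sketch "4-N" yields bits 4-31 when nbits is 32 and bits 4-47 when
nbits is 48, so the same argument covers both machine A and machine B from
the commit message, while "N-N" and "4-N:2/N" are rejected with -EINVAL.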



[PATCH v2 0/3] :support for bitmap (and hence CPU) list abbreviations

2021-01-21 Thread Paul Gortmaker
The basic objective here was to add support for "nohz_full=8-N" and/or
"rcu_nocbs=4-N" -- essentially introduce "N" as a portable reference
to the last core, evaluated at boot for anything using a CPU list.

The thinking behind this, is that people carve off a few early CPUs to
support housekeeping tasks, and perhaps dedicate one to a busy I/O
peripheral, and then the remaining pool of CPUs out to the end are a
part of a commonly configured pool used for the real work the user
cares about.

Extend that logic out to a fleet of machines - some new, and some
nearing EOL, and you've probably got a wide range of core counts to
contend with - even though the early number of cores dedicated to the
system overhead probably doesn't vary.

This change would enable sysadmins to have a common bootarg across all
such systems, and would also avoid any off-by-one fencepost errors that
happen for users who might briefly forget that core counts start at
zero.

Looking around before starting, I noticed RCU already had a short-form
abbreviation "all" -- but if we want to treat CPU lists in a uniform
manner, then tokens shouldn't be implemented at a subsystem level and
hence be subsystem specific; each with their own variations.

So I moved "all" to global use - for boot args, and for cgroups.  Then
I added the inverse "none" and finally, the one I wanted -- "N".

Originally I did this at the CPU subsys level, but Yury suggested it
be moved down further to bitmap level itself, and that was good advice.
Things got smaller and less complex.

The use of "N" isn't a standalone word like "all" or "none".  It will
be a part of a complete range specification, possibly with other
comma-separated ranges before and after, like "nohz_full=5,6,8-N" or
"nohz_full=2-N:3/4".

Also tested the post-boot cgroup use case as per below:

   root@hackbox:/sys/fs/cgroup/cpuset# mkdir foo
   root@hackbox:/sys/fs/cgroup/cpuset# cd foo
   root@hackbox:/sys/fs/cgroup/cpuset/foo# cat cpuset.cpus
   
   root@hackbox:/sys/fs/cgroup/cpuset/foo# /bin/echo 10-N > cpuset.cpus
   root@hackbox:/sys/fs/cgroup/cpuset/foo# cat cpuset.cpus
   10-15
   root@hackbox:/sys/fs/cgroup/cpuset/foo# /bin/echo all > cpuset.cpus
   root@hackbox:/sys/fs/cgroup/cpuset/foo# cat cpuset.cpus
   0-15
   root@hackbox:/sys/fs/cgroup/cpuset/foo# /bin/echo none > cpuset.cpus
   root@hackbox:/sys/fs/cgroup/cpuset/foo# cat cpuset.cpus
   
   root@hackbox:/sys/fs/cgroup/cpuset/foo#

This was on a 16 core machine with CONFIG_NR_CPUS=16 in .config file.

Note that "N" is a dynamic quantity, and can change scope if the bitmap
is changed in size.  So at the risk of stating the obvious, don't use it
for "burn_eFuse=128-N" or "secure_erase_firmware=32-N" type stuff.

Paul.
---

[v1: https://lore.kernel.org/lkml/20210106004850.GA11682@paulmck-ThinkPad-P72/ ]

[v2: push code down from cpu subsys to core bitmap code as per
 Yury's comments.  Change "last" to simply be "N" as per PeterZ.

 Combination of the two got rid of needing strword() and greatly
 reduced complexity and footprint of the change -- thanks! ]

Cc: Yury Norov 
Cc: Peter Zijlstra 
Cc: "Paul E. McKenney" 
Cc: Frederic Weisbecker 
Cc: Josh Triplett 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Li Zefan 


Paul Gortmaker (3):
  lib: add "all" and "none" as valid ranges to bitmap_parselist()
  rcu: dont special case "all" handling; let bitmask deal with it
  lib: support N as end of range in bitmap_parselist()

 .../admin-guide/kernel-parameters.rst | 15 +++
 .../admin-guide/kernel-parameters.txt |  4 +--
 kernel/rcu/tree_plugin.h  | 13 +++--
 lib/bitmap.c  | 27 +++
 4 files changed, 42 insertions(+), 17 deletions(-)

-- 
2.17.1



[PATCH 1/3] powerpc: retire sbc8548 board support

2021-01-11 Thread Paul Gortmaker
The support for this was mainlined 13 years ago, in v2.6.25
[0e0fffe88767] just around the ppc --> powerpc migration.

I believe the board was introduced a year or two before that, so it
is roughly a 15 year old platform - with the CPU speed and memory size
that was typical for that era.

I haven't had one of these boards for several years, and availability
was discontinued several years before that.

Given that, there is no point in adding a burden to testing coverage
that builds all possible defconfigs, so it makes sense to remove it.

Of course it will remain in the git history forever, for anyone who
happens to find a functional board and wants to tinker with it.

Cc: Scott Wood 
Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Signed-off-by: Paul Gortmaker 
---
 arch/powerpc/boot/Makefile  |   1 -
 arch/powerpc/boot/dts/sbc8548-altflash.dts  | 111 
 arch/powerpc/boot/dts/sbc8548-post.dtsi | 289 
 arch/powerpc/boot/dts/sbc8548-pre.dtsi  |  48 
 arch/powerpc/boot/dts/sbc8548.dts   | 106 ---
 arch/powerpc/boot/wrapper   |   2 +-
 arch/powerpc/configs/85xx/sbc8548_defconfig |  50 
 arch/powerpc/configs/mpc85xx_base.config|   1 -
 arch/powerpc/platforms/85xx/Kconfig |   6 -
 arch/powerpc/platforms/85xx/Makefile|   1 -
 arch/powerpc/platforms/85xx/sbc8548.c   | 134 -
 11 files changed, 1 insertion(+), 748 deletions(-)
 delete mode 100644 arch/powerpc/boot/dts/sbc8548-altflash.dts
 delete mode 100644 arch/powerpc/boot/dts/sbc8548-post.dtsi
 delete mode 100644 arch/powerpc/boot/dts/sbc8548-pre.dtsi
 delete mode 100644 arch/powerpc/boot/dts/sbc8548.dts
 delete mode 100644 arch/powerpc/configs/85xx/sbc8548_defconfig
 delete mode 100644 arch/powerpc/platforms/85xx/sbc8548.c

diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
index 2b8da923ceca..8edb85b9ae11 100644
--- a/arch/powerpc/boot/Makefile
+++ b/arch/powerpc/boot/Makefile
@@ -339,7 +339,6 @@ image-$(CONFIG_TQM8541) += 
cuImage.tqm8541
 image-$(CONFIG_TQM8548)+= cuImage.tqm8548
 image-$(CONFIG_TQM8555)+= cuImage.tqm8555
 image-$(CONFIG_TQM8560)+= cuImage.tqm8560
-image-$(CONFIG_SBC8548)+= cuImage.sbc8548
 image-$(CONFIG_KSI8560)+= cuImage.ksi8560
 
 # Board ports in arch/powerpc/platform/86xx/Kconfig
diff --git a/arch/powerpc/boot/dts/sbc8548-altflash.dts 
b/arch/powerpc/boot/dts/sbc8548-altflash.dts
deleted file mode 100644
index bb7a1e712bb7..
--- a/arch/powerpc/boot/dts/sbc8548-altflash.dts
+++ /dev/null
@@ -1,111 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * SBC8548 Device Tree Source
- *
- * Configured for booting off the alternate (64MB SODIMM) flash.
- * Requires switching JP12 jumpers and changing SW2.8 setting.
- *
- * Copyright 2013 Wind River Systems Inc.
- *
- * Paul Gortmaker (see MAINTAINERS for contact information)
- */
-
-
-/dts-v1/;
-
-/include/ "sbc8548-pre.dtsi"
-
-/{
-   localbus@e000 {
-   #address-cells = <2>;
-   #size-cells = <1>;
-   compatible = "simple-bus";
-   reg = <0xe000 0x5000>;
-   interrupt-parent = <>;
-
-   ranges = <0x0 0x0 0xfc00 0x0400 /*64MB Flash*/
- 0x3 0x0 0xf000 0x0400 /*64MB SDRAM*/
- 0x4 0x0 0xf400 0x0400 /*64MB SDRAM*/
- 0x5 0x0 0xf800 0x00b1 /* EPLD */
- 0x6 0x0 0xef80 0x0080>;   /*8MB Flash*/
-
-   flash@0,0 {
-   #address-cells = <1>;
-   #size-cells = <1>;
-   reg = <0x0 0x0 0x0400>;
-   compatible = "intel,JS28F128", "cfi-flash";
-   bank-width = <4>;
-   device-width = <1>;
-   partition@0 {
-   label = "space";
-   /* FC00 -> FFEF */
-   reg = <0x 0x03f0>;
-   };
-   partition@3f0 {
-   label = "bootloader";
-   /* FFF0 ->  */
-   reg = <0x03f0 0x0010>;
-   read-only;
-   };
-};
-
-
-   epld@5,0 {
-   compatible = "wrs,epld-localbus";
-   #address-cells = <2>;
-   #size-cells = <1>;
-   reg = <0x5 0x0 0x00b100

[PATCH 2/3] powerpc: retire sbc8641d board support

2021-01-11 Thread Paul Gortmaker
The support for this was added to mainline over 12 years ago, in
v2.6.26 [4e8aae89a35d] just around the ppc --> powerpc migration.

I believe the board was introduced shortly after the sbc8548 board,
making it roughly a 14 year old platform - with the CPU speed and
memory size typical for that era.

I haven't had one of these boards for several years, and availability
was discontinued several years before that.

Given that, there is no point in adding a burden to testing coverage
that builds all possible defconfigs, so it makes sense to remove it.

Of course it will remain in the git history forever, for anyone who
happens to find a functional board and wants to tinker with it.

Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Signed-off-by: Paul Gortmaker 
---
 arch/powerpc/boot/dts/fsl/sbc8641d.dts   | 176 ---
 arch/powerpc/configs/mpc86xx_base.config |   1 -
 arch/powerpc/configs/ppc6xx_defconfig|   1 -
 arch/powerpc/platforms/86xx/Kconfig  |   8 +-
 arch/powerpc/platforms/86xx/Makefile |   1 -
 arch/powerpc/platforms/86xx/sbc8641d.c   |  87 ---
 6 files changed, 1 insertion(+), 273 deletions(-)
 delete mode 100644 arch/powerpc/boot/dts/fsl/sbc8641d.dts
 delete mode 100644 arch/powerpc/platforms/86xx/sbc8641d.c

diff --git a/arch/powerpc/boot/dts/fsl/sbc8641d.dts 
b/arch/powerpc/boot/dts/fsl/sbc8641d.dts
deleted file mode 100644
index 3dca10acc161..
--- a/arch/powerpc/boot/dts/fsl/sbc8641d.dts
+++ /dev/null
@@ -1,176 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * SBC8641D Device Tree Source
- *
- * Copyright 2008 Wind River Systems Inc.
- *
- * Paul Gortmaker (see MAINTAINERS for contact information)
- *
- * Based largely on the mpc8641_hpcn.dts by Freescale Semiconductor Inc.
- */
-
-/include/ "mpc8641si-pre.dtsi"
-
-/ {
-   model = "SBC8641D";
-   compatible = "wind,sbc8641";
-
-   memory {
-   device_type = "memory";
-   reg = <0x 0x2000>;  // 512M at 0x0
-   };
-
-   lbc: localbus@f8005000 {
-   reg = <0xf8005000 0x1000>;
-
-   ranges = <0 0 0xff00 0x0100 // 16MB Boot flash
- 1 0 0xf000 0x0001 // 64KB EEPROM
- 2 0 0xf100 0x0010 // EPLD (1MB)
- 3 0 0xe000 0x0400 // 64MB LB SDRAM (CS3)
- 4 0 0xe400 0x0400 // 64MB LB SDRAM (CS4)
- 6 0 0xf400 0x0010 // LCD display (1MB)
- 7 0 0xe800 0x0400>;   // 64MB OneNAND
-
-   flash@0,0 {
-   compatible = "cfi-flash";
-   reg = <0 0 0x0100>;
-   bank-width = <2>;
-   device-width = <2>;
-   #address-cells = <1>;
-   #size-cells = <1>;
-   partition@0 {
-   label = "dtb";
-   reg = <0x 0x0010>;
-   read-only;
-   };
-   partition@30 {
-   label = "kernel";
-   reg = <0x0010 0x0040>;
-   read-only;
-   };
-   partition@40 {
-   label = "fs";
-   reg = <0x0050 0x00a0>;
-   };
-   partition@70 {
-   label = "firmware";
-   reg = <0x00f0 0x0010>;
-   read-only;
-   };
-   };
-
-   epld@2,0 {
-   compatible = "wrs,epld-localbus";
-   #address-cells = <2>;
-   #size-cells = <1>;
-   reg = <2 0 0x10>;
-   ranges = <0 0 5 0 1 // User switches
- 1 0 5 1 1 // Board ID/Rev
- 3 0 5 3 1>;   // LEDs
-   };
-   };
-
-   soc: soc@f800 {
-   ranges = <0x 0xf800 0x0010>;
-
-   enet0: ethernet@24000 {
-   tbi-handle = <>;
-   phy-handle = <>;
-   phy-connection-type = "rgmii-id";
-   };
-
-   mdio@24520 {
-   phy0: ethernet-phy@1f {
-   reg = <0x1f>;
-   };
-   phy1: ethernet-phy@0 

[PATCH 3/3] MAINTAINERS: update for Paul Gortmaker

2021-01-11 Thread Paul Gortmaker
Signed-off-by: Paul Gortmaker 
---
 MAINTAINERS | 1 -
 1 file changed, 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index cc1e6a5ee6e6..c5f5cdb24674 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6529,7 +6529,6 @@ F:Documentation/admin-guide/media/em28xx*
 F: drivers/media/usb/em28xx/
 
 EMBEDDED LINUX
-M: Paul Gortmaker 
 M: Matt Mackall 
 M: David Woodhouse 
 L: linux-embed...@vger.kernel.org
-- 
2.17.1



[PATCH 0/3] Retire remaining WindRiver embedded SBC BSPs

2021-01-11 Thread Paul Gortmaker
In v2.6.27 (2008, 917f0af9e5a9) the sbc8260 support was implicitly
retired by not being carried forward through the ppc --> powerpc
device tree transition.

Then, in v3.6 (2012, b048b4e17cbb) we retired the support for the
sbc8560 boards.

Next, in v4.18 (2017, 3bc6cf5a86e5) we retired the support for the
2006 vintage sbc834x boards.

The sbc8548 and sbc8641d boards were maybe 1-2 years newer than the
sbc834x boards, but it is also 3+ years later, so it makes sense to
now retire them as well - which is what is done here.

These two remaining WR boards were based on the Freescale MPC8548-CDS
and the MPC8641D-HPCN reference board implementations.  Having had the
chance to use these and many other Fsl ref boards, I know this:  The
Freescale reference boards were typically produced in limited quantity
and primarily available to BSP developers and hardware designers, and
not likely to have found a 2nd life with hobbyists and/or collectors.

It was good to have that BSP code subjected to mainline review and
hence also widely available back in the day. But given the above, we
should probably also be giving serious consideration to retiring
additional similar age/type reference board platforms as well.

I've always felt it is important for us to be proactive in retiring
old code, since it has a genuine non-zero carrying cost, as described
in the 930d52c012b8 merge log.  But for the here and now, we just
clean up the remaining BSP code that I had added for SBC platforms.

Paul.
-- 

Cc: Benjamin Herrenschmidt 
Cc: Michael Ellerman 
Cc: Paul Mackerras 
Cc: Scott Wood 

The following changes since commit 7c53f6b671f4aba70ff15e1b05148b10d58c2837:

  Linux 5.11-rc3 (2021-01-10 14:34:50 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux.git wr_sbc-delete

for you to fetch changes up to 1dfb28199572e3f6517cada41f6a150551749da1:

  MAINTAINERS: update for Paul Gortmaker (2021-01-11 00:06:01 -0500)

----
Paul Gortmaker (3):
  powerpc: retire sbc8548 board support
  powerpc: retire sbc8641d board support
  MAINTAINERS: update for Paul Gortmaker

 MAINTAINERS |   1 -
 arch/powerpc/boot/Makefile  |   1 -
 arch/powerpc/boot/dts/fsl/sbc8641d.dts  | 176 -
 arch/powerpc/boot/dts/sbc8548-altflash.dts  | 111 ---
 arch/powerpc/boot/dts/sbc8548-post.dtsi | 289 
 arch/powerpc/boot/dts/sbc8548-pre.dtsi  |  48 -
 arch/powerpc/boot/dts/sbc8548.dts   | 106 --
 arch/powerpc/boot/wrapper   |   2 +-
 arch/powerpc/configs/85xx/sbc8548_defconfig |  50 -
 arch/powerpc/configs/mpc85xx_base.config|   1 -
 arch/powerpc/configs/mpc86xx_base.config|   1 -
 arch/powerpc/configs/ppc6xx_defconfig   |   1 -
 arch/powerpc/platforms/85xx/Kconfig |   6 -
 arch/powerpc/platforms/85xx/Makefile|   1 -
 arch/powerpc/platforms/85xx/sbc8548.c   | 134 -
 arch/powerpc/platforms/86xx/Kconfig |   8 +-
 arch/powerpc/platforms/86xx/Makefile|   1 -
 arch/powerpc/platforms/86xx/sbc8641d.c  |  87 -
 18 files changed, 2 insertions(+), 1022 deletions(-)
 delete mode 100644 arch/powerpc/boot/dts/fsl/sbc8641d.dts
 delete mode 100644 arch/powerpc/boot/dts/sbc8548-altflash.dts
 delete mode 100644 arch/powerpc/boot/dts/sbc8548-post.dtsi
 delete mode 100644 arch/powerpc/boot/dts/sbc8548-pre.dtsi
 delete mode 100644 arch/powerpc/boot/dts/sbc8548.dts
 delete mode 100644 arch/powerpc/configs/85xx/sbc8548_defconfig
 delete mode 100644 arch/powerpc/platforms/85xx/sbc8548.c
 delete mode 100644 arch/powerpc/platforms/86xx/sbc8641d.c


[PATCH] kbuild: partial revert of "remove cc-option test of -Werror=date-time"

2021-01-10 Thread Paul Gortmaker
In commit 87de84c9140e1ccb221c68bb7e4939e880b3f2bb ("kbuild: remove
cc-option test of -Werror=date-time") the check for support of the
date-time option was removed.

However, by removing it from the top level Makefile, it breaks all
the normal compiler version checks, because GCC fails at the command
line parsing, and never gets to the CPP #error check in the headers.

So for gcc-4.8 (now unsupported) you get the confusing:

   cc1: error: -Werror=date-time: no option -Wdate-time

instead of the previous and expected error message of:

   # error Sorry, your version of GCC is too old - please use 4.9 or newer.

Restore the check in the top level Makefile so the longstanding GCC
arch independent version check works again for v4.8 and older.

Fixes: 87de84c9140e ("kbuild: remove cc-option test of -Werror=date-time")
Cc: Masahiro Yamada 
Cc: Nathan Chancellor 
Cc: Will Deacon 
Signed-off-by: Paul Gortmaker 
---
 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index e30cf02da8b8..e2f9e6582a10 100644
--- a/Makefile
+++ b/Makefile
@@ -937,7 +937,7 @@ KBUILD_CFLAGS  += -fno-stack-check
 KBUILD_CFLAGS   += $(call cc-option,-fconserve-stack)
 
 # Prohibit date/time macros, which would make the build non-deterministic
-KBUILD_CFLAGS   += -Werror=date-time
+KBUILD_CFLAGS   += $(call cc-option,-Werror=date-time)
 
 # enforce correct pointer usage
 KBUILD_CFLAGS   += $(call cc-option,-Werror=incompatible-pointer-types)
-- 
2.17.1
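As an illustration only, the version gate that the build is meant to reach can be sketched in plain C. The encoding follows the kernel's GCC_VERSION convention (major * 10000 + minor * 100 + patchlevel). The helper names are hypothetical, and the 4.9 floor is taken from the error message quoted above rather than copied from any header.

```c
#include <assert.h>
#include <stdbool.h>

/* Encode a GCC release as major * 10000 + minor * 100 + patchlevel. */
static int gcc_version_code(int major, int minor, int patchlevel)
{
	assert(major >= 0 && minor >= 0 && patchlevel >= 0);
	return major * 10000 + minor * 100 + patchlevel;
}

/* Hypothetical helper: true if the release is older than the 4.9 floor. */
static bool gcc_too_old(int major, int minor, int patchlevel)
{
	return gcc_version_code(major, minor, patchlevel) < 40900;
}
```

With this encoding gcc-4.8.5 maps to 40805 and is rejected, which is the case where the "#error" text should be what the user sees.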



Re: [PATCH RFC cpumask 4/5] cpumask: Add "last" alias for cpu list specifications

2021-01-06 Thread Paul Gortmaker
[Re: [PATCH RFC cpumask 4/5] cpumask: Add "last" alias for cpu list 
specifications] On 06/01/2021 (Wed 10:49) Peter Zijlstra wrote:

> On Tue, Jan 05, 2021 at 04:49:55PM -0800, paul...@kernel.org wrote:
> > From: Paul Gortmaker 
> > 
> > It seems that a common configuration is to use the 1st couple cores
> > for housekeeping tasks, and/or driving a busy peripheral that generates
> > a lot of interrupts, or something similar.

[...]

> > A generic token replacement is used to substitute "last" with the
> > number of CPUs present before handing off to bitmap processing.  But
> > it could just as easily be used to replace any placeholder token with
> > any other token or value only known at/after boot.
> 
> Aside from the comments Yury made, on how all this is better in
> bitmap_parselist(), how about doing s/last/N/ here? For me something
> like: "4-N" reads much saner than "4-last".

OK, I can see N used as per university math classes... to indicate the
end point of a fixed set of numbers, but I confess to having had to
think about it for a bit (university was a long time ago).  I don't have
any strong opinion one way or another -- "last" vs. "N"...

> Also, it might make sense to teach all this about core/node topology,
> but that's going to be messy. Imagine something like "Core1-CoreN" or
> "Node1-NodeN" to mean the mask all/{Core,Node}0.
> 
> And that is another feature that seems to be missing from parselist,
> all/except.

Seems reasonable, but I'm going to look at fixing up what I've got as
per Yury's comments before volunteering to muck around with more string
parsing code to add more features...

Thanks,
Paul.
--
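The token substitution discussed above ("last" or "N" swapped for a number before bitmap parsing) could look roughly like the user-space sketch below. This is not the code from the patch: replace_token() is a hypothetical name, the value is supplied by the caller, and matching is a naive substring compare, so a token that also occurs inside a longer word would be replaced as well.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Copy src into dst, replacing each occurrence of token with the decimal
 * form of value.  Returns 0 on success, -1 if dst (dstlen bytes) is too
 * small to hold the result and its terminating '\0'.
 */
static int replace_token(const char *src, const char *token,
			 unsigned int value, char *dst, size_t dstlen)
{
	size_t toklen = strlen(token);
	size_t used = 0;

	assert(src != NULL && dst != NULL && toklen > 0);
	if (dstlen == 0)
		return -1;

	while (*src) {
		int n;

		if (strncmp(src, token, toklen) == 0) {
			n = snprintf(dst + used, dstlen - used, "%u", value);
			src += toklen;
		} else {
			n = snprintf(dst + used, dstlen - used, "%c", *src);
			src++;
		}
		if (n < 0 || (size_t)n >= dstlen - used)
			return -1;	/* output would be truncated */
		used += (size_t)n;
	}
	dst[used] = '\0';
	return 0;
}
```

The caller decides what the token stands for, so the same helper serves "4-last" and "4-N" alike.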


Re: [PATCH 0/3] clear_warn_once: add timed interval resetting

2020-12-09 Thread Paul Gortmaker
[Re: [PATCH 0/3] clear_warn_once: add timed interval resetting] On 09/12/2020 
(Wed 17:37) Petr Mladek wrote:

> On Thu 2020-11-26 01:30:26, Paul Gortmaker wrote:
> > The existing clear_warn_once functionality is currently a manually
> > issued state reset via the file /sys/kernel/debug/clear_warn_once when
> > debugfs is mounted.  The idea being that a developer would be running
> > some tests, like LTP or similar, and want to check reproducibility
> > without having to reboot.
> > 
> > But you currently can't make use of clear_warn_once unless you've got
> > debugfs enabled and mounted - which may not be desired by some people
> > in some deployment situations.
> > 
> > The functionality added here allows for periodic resets in addition to
> > the one-shot reset it already had.  Then we allow for a boot-time setting
> > of the periodic resets so it can be used even when debugfs isn't mounted.
> > 
> > By having a periodic reset, we also open the door for having the various
> > "once" functions act as long period ratelimited messages, where a sysadmin
> > can pick an hour or a day reset if they are facing an issue and are
> > wondering "did this just happen once, or am I only being informed once?"
> 
> OK, I thought more about it and I NACK this patchset.

Not a problem.  Thanks again for your time and explaining your thoughts.

At least it is out there if anyone wants to use it and they can follow
the discussion here when considering the pros/cons of doing so.

Paul.
--

> 
> My reason:
> 
> 1. The primary purpose was to provide a way to reset warn_once() without
>    debugfs. From this POV, the solution is rather complicated: timers
>    and another kernel parameter.
> 
> 2. I am not aware of any convincing argument why debugfs could not be
>    mounted on the debugged system.
> 
> 3. Debugfs provides many more debugging facilities. It is designed for
>    this purpose. It does not look like a good strategy to provide
>    alternative interfaces just to avoid it.
> 
> 4. There were mentioned several other use cases for this feature,
>    like RT systems. But it was not clear that it was really needed
>    or that people would really use it.
> 
> 5. Some code might even rely on the fact that it is called only once,
>    see commit dfbf2897d00499f94cd ("bug: set warn variable before calling
>    WARN()") or the recent
>    https://lore.kernel.org/r/20201029142406.3c468...@gandalf.local.home
> 
>    It should better stay as debugging feature that should be used with
>    care.
> 
> 
> 6. It creates system wide ratelimited printk().
> 
>    We have printk_ratelimited() for this. And it is quite problematic.
>    It is supposed to prevent flood of printk() messages. But it does
>    not work well because the limits depend on more factors, like:
>    system size, conditions, console speed.
> 
>    Yes, the proposed feature is supposed to solve another problem
>    (lack of messages). But it is a global action that would
>    re-enable >1000 messages that were limited to be printed
>    only once because they could be too frequent. As a result:
> 
>       + it might cause flood of printk() messages
> 
>       + it is hard to define a good system wide time limit;
>         it was even unclear what should be the lower limit.
> 
>       + it will restart the messages at some "random" point,
>         so that the relation of the reported events would
>         be unclear.
> 
>    From the API point of view:
> 
>       + printk_ratelimited() is used when we want to see that a
>         problem is still there. It is per-message setting.
> 
>       + printk_once() is used when even printk_ratelimited() would
>         be too much. It is per-message setting.
> 
>       + The new printk_repeated_once() is a strange mix of these two
>         with the global setting. It does not fit much.
> 
> 
> Best Regards,
> Petr
> 
> PS: I did not answer your last mail because it looked like an endless
>     fight over words or point of views. I decided to make a summary
>     of my view instead. These are the reasons why I nacked it.
> 
> I know that there might be different views but so far no arguments
> changed mine. And I do not know how to explain it better.


Re: [PATCH 0/3] clear_warn_once: add timed interval resetting

2020-12-01 Thread Paul Gortmaker
[Re: [PATCH 0/3] clear_warn_once: add timed interval resetting] On 01/12/2020 
(Tue 13:49) Petr Mladek wrote:

[...]

> Is this feature requested by RT people?
> Or is it just a possible use-case?
> 
> I am not sure that RT is a really good example. The cron job is only
> part of the problem. The message would create a noise on its own.
> It would be shown on console or read/stored by a userspace log
> daemon. I am not sure that RT people would really want to use this.

To be clear, no RT person requested this, and it is just one possible
use case.  Enabling the sysadmin to be able to collect more data on
recurrence equally applies to WARN_ONCE as it does printk_once.

> That said, I still do not have strong opinion about the feature.
> It might make sense on its own. But I still see it as a workaround
> for another problem.

I'm not sure how it could be a workaround for anything, really.  It
doesn't hide anything -- it would instead possibly cause more output.
It enables a sysadmin to collect more data on recurrence when asked to
by a developer like one of us -- without having to ask the sysadmin to
be rebuilding the kernel or altering the rootfs.  "Please boot with
this boot-arg, and run for 3 days and report what you see."

If you get a WARN_ONCE, and choose to ignore it - you have already
decided you are OK with running with something clearly broken (not
good).  Being able to easily check if it happens again over time seems
like a good step towards resolving the issue vs. ignoring it.

> Non-trivial periodic tasks sometimes cause problems. And we do not
> know how big avalanche of messages it might restart.

Without specifics, I can't really address what problems you speak of.
But with a 2m minimum, if we add that - we can definitely say the risk
of "big avalanche of messages" is zero and not an issue.  We could even
use 5 or 10m minimum w/o really changing what I'm trying to achieve here.

> Also the once is sometimes used on purpose. It prevents repeated delays
> on fast paths. I wonder if it can sometimes even prevent recursion.

Again, I can't really address an open speculation like that, other than
to say if we do have an example of such recursion blocking, we should
code it explicitly, so it doesn't hide as a trap and blow up if someone
removes the "_once" at a later date as a part of a mainline change.

> I know that everything is possible already now. But this patchset
> makes it more visible and easier to use.

So, I have one last idea that may address your concern of people abusing
the reset variable like it is something to be used everyday, blindly.

What if we unconditionally set TAINT_USER once it is used?  That also
assists with the fact that such abuse is possible now even without
any of these changes applied, as you have acknowledged.

We'd be making it 100% clear that a person shouldn't be hammering away
on the reset simply because it happens to be there.  The taint would
make it clear it isn't a "feature" but instead a debugging/information
gathering aid to only be used on occasion with a specific goal in mind.

I could do a v2 with a TAINT_USER addition, and a conversion to minutes,
with a 5m minimum.  But I won't spam people with that unless it resolves
the concerns that you (and anyone else) might have with misuse.

If people don't see the value in it easing data collection once an issue
is spotted, I'm fine with that and will shelve the patch set, and thank
people for their valuable time and feedback.

Paul.
--

> 
> Best Regards,
> Petr


Re: [PATCH 0/3] clear_warn_once: add timed interval resetting

2020-11-30 Thread Paul Gortmaker
[Re: [PATCH 0/3] clear_warn_once: add timed interval resetting] On 29/11/2020 
(Sun 19:08) Andi Kleen wrote:

> On Thu, Nov 26, 2020 at 01:30:26AM -0500, Paul Gortmaker wrote:
> > But you currently can't make use of clear_warn_once unless you've got
> > debugfs enabled and mounted - which may not be desired by some people
> > in some deployment situations.
> 
> Seems awfully special purpose. Is the problem with debugfs security,
> or is it the lack of a convenient process that could do cron-like functionality?

My understanding is that it is a bit of both.  As users of rt tasks,
they won't be running anything like cron that could add to OS jitter on
the (presumably minimal) rootfs - so they were looking for a clean
engineered solution with near zero overhead, that they could easily
deploy on all nodes after the rt tuning was 99% completed and node
images had been bundled.  Just to be sure everything was operating as
they'd aimed to achieve.

I thought a boot arg (and the internal timer) seemed like a good fit to
that requirement.  No kernel or rootfs rebuilding required.  And I
figured others might be in the same boat and could use it too.

Paul.
--

> 
> If it's the first, perhaps what they really need is a way to get
> a partial debugfs? 
> 
> -Andi


Re: [PATCH 2/3] clear_warn_once: bind a timer to written reset value

2020-11-30 Thread Paul Gortmaker
[Re: [PATCH 2/3] clear_warn_once: bind a timer to written reset value] On 
30/11/2020 (Mon 11:20) Steven Rostedt wrote:

> On Thu, 26 Nov 2020 01:30:28 -0500
> Paul Gortmaker  wrote:
> 
> > +++ b/Documentation/admin-guide/clearing-warn-once.rst
> > @@ -7,3 +7,12 @@ echo 1 > /sys/kernel/debug/clear_warn_once
> >  
> >  clears the state and allows the warnings to print once again.
> >  This can be useful after test suite runs to reproduce problems.
> > +
> > +Values greater than one set a timer for a periodic state reset; e.g.
> > +
> > +echo 3600 > /sys/kernel/debug/clear_warn_once
> 
> I wonder if the value should be in minutes and not seconds, otherwise, a
> wrong value could possibly DoS the machine, if you were to write 2 into it
> and there were a lot of warnings in high frequency events.
> 
> Or is dumping out a bunch of warnings every 2 seconds not a problem?

It doesn't seem to be a problem - at least in that running a defconfig
build on an otherwise out of the box common distro doesn't seem to trip
any WARN or printk_once events in my testing.  Of course there may be a
use case out there that is doing lots of them, however.

> Anyway, would there ever be a need to have it cleared in less than 1 minute
> intervals?

I don't think so - as I said in another follow up from last week:

https://lore.kernel.org/lkml/20201127174316.ga11...@windriver.com/

I'd also indicated in the above that I'd be fine with adding a minimum
of 1m if people feel better about that.  Also maybe moving the units to
minutes instead of seconds helps implicitly convey the intended use
better -- i.e. "don't be smashing on this every second" -- maybe that
was your point as well - and I'd agree with that.

Paul.
--

> 
> -- Steve
> 
> 
> > +
> > +will establish an hourly state reset, effectively turning WARN_ONCE
> > +into a long period rate-limited warning.


Re: [PATCH 0/3] clear_warn_once: add timed interval resetting

2020-11-27 Thread Paul Gortmaker
[Re: [PATCH 0/3] clear_warn_once: add timed interval resetting] On 27/11/2020 
(Fri 17:13) Petr Mladek wrote:

> On Thu 2020-11-26 01:30:26, Paul Gortmaker wrote:
> > The existing clear_warn_once functionality is currently a manually
> > issued state reset via the file /sys/kernel/debug/clear_warn_once when
> > debugfs is mounted.  The idea being that a developer would be running
> > some tests, like LTP or similar, and want to check reproducibility
> > without having to reboot.
> > 
> > But you currently can't make use of clear_warn_once unless you've got
> > debugfs enabled and mounted - which may not be desired by some people
> > in some deployment situations.
> > 
> > The functionality added here allows for periodic resets in addition to
> > the one-shot reset it already had.  Then we allow for a boot-time setting
> > of the periodic resets so it can be used even when debugfs isn't mounted.
> > 
> > By having a periodic reset, we also open the door for having the various
> > "once" functions act as long period ratelimited messages, where a sysadmin
> > can pick an hour or a day reset if they are facing an issue and are
> > wondering "did this just happen once, or am I only being informed once?"
> 
> What is the primary problem that you wanted to solve, please?

You've captured it exactly below.

> 
> Do you have an example what particular printk_once() you were
> interested in?

Well, the one I encounter (directly/indirectly) most is the one I
mentioned in mainline 3ec25826ae3 - the throttling one.

> I guess that the main problem is that
> /sys/kernel/debug/clear_warn_once is available only when debugfs is
> mounted. And the periodic reset is just one possible solution
> that looks like a nice to have. Do I get it correctly, please?

That is exactly it.  I wanted the functionality of the clear but w/o the
debugfs requirement, and thinking backwards from there - came up with
the timer based solution.  Other uses and/or users of the periodic reset
seemed like an added bonus.  Enabling sysadmins to be able to gather
more data upon seeing an issue seems like a good thing.

> I am not completely against the idea. But I have some concerns.
> 
> 1. It allows to convert printk_once() into printk_ratelimited()
>    with some special semantic and interface. It opens possibilities
>    for creativity. It might be good and it also might create
>    problems that are hard to foresee now.

Actually that problem, if it is one, existed as soon as clear_warn_once
feature was added to the kernel years ago in v4.x kernel version:

  (while [ 1 ] ; do echo 1 > clear_warn_once ; sleep 1 ; done) &

The printk_once is now converted to printk_ratelimited for one second.

I thought about it a bunch, and of course we have the fact that this
extension is an opt-in thing, and hence the default is unchanged and
most people won't even know it exists, unless they actively go looking
for it in order to collect more information.

>    printk_ratelimited() is problematic, definitely, see below.

I can't argue that.

> 
> 2. printk_ratelimited() is typically used when a message might get
>    printed too often. It prevents overloading consoles, log daemons.
>    Also it helps to see other messages that might get lost otherwise.
> 
>    I have seen many discussions about what is the right ratelimit
>    for a particular message. I have to admit that it was mainly
>    related to console speed. The messages were lost with slow
>    consoles. People want to see more on fast consoles.

Yeah, I've seen those too, which is typically concerned with 10-1000
printk per second - but this isn't that discussion, and I don't want
it to be that discussion.

>    The periodic warn once should not have this problem because the
>    period would typically be long. And it would produce only
>    one message on each location.

Correct.  I even entertained setting a minimum, like 1m or 5m, but then
considered the old unix rule about the kernel not setting policy.
That said, if it made people more at ease, I'd be OK with setting a 1m
minimum on the reset - I can't think of a use case where faster than
that would ever make sense.

>    The problem is that it is a global setting. It would reset
>    all printk_once() callers. And I see two problems here:
> 
>    + Periodic reset might cause printing related problems
>      in the wrong order. Messages from victims first. Messages
>      about the root of the problem later (from next cycle).
>      It might create confusion.

The out-of-order problem exists already just like the ratelimited
"conversion" exists already as shown above - using the same script.

That aside, the out of order problem assumes 1) you have a linked pair
printk_once(&quo

[PATCH 3/3] clear_warn_once: add a warn_once_reset= boot parameter

2020-11-25 Thread Paul Gortmaker
In the event debugfs is not mounted, or if built with the .config
setting DEBUG_FS_ALLOW_NONE chosen, this gives the sysadmin access
to reset the WARN_ONCE() state on a periodic basis.

Cc: Andi Kleen 
Cc: Petr Mladek 
Cc: Sergey Senozhatsky 
Cc: Steven Rostedt 
Cc: John Ogness 
Signed-off-by: Paul Gortmaker 
---
 .../admin-guide/kernel-parameters.txt |  8 +++
 kernel/panic.c| 21 +++
 2 files changed, 29 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 44fde25bb221..89f5fee7 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5863,6 +5863,14 @@
vt.underline=   [VT] Default color for underlined text; 0-15.
Default: 3 = cyan.
 
+   warn_once_reset=
+   [KNL]
+   Set the WARN_ONCE reset period in seconds.  Normally
+   a WARN_ONCE() will only ever emit a message once per
+   boot, but for example, setting this to 3600 would
+   effectively rate-limit WARN_ONCE to once per hour.
+   Default: 0 = never.
+
watchdog timers [HW,WDT] For information on watchdog timers,
see Documentation/watchdog/watchdog-parameters.rst
or other driver-specific files in the
diff --git a/kernel/panic.c b/kernel/panic.c
index a23eb239fb17..f813ca3a5cd5 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -716,10 +716,31 @@ static __init int register_warn_debugfs(void)
/* Don't care about failure */
debugfs_create_file_unsafe("clear_warn_once", 0600, NULL,
   &warn_once_reset, &clear_warn_once_fops);
+
+   /* if a bootarg was used, set the initial timer */
+   if (warn_once_reset)
+   warn_once_set(NULL, warn_once_reset);
+
return 0;
 }
 
 device_initcall(register_warn_debugfs);
+
+static int __init warn_once_setup(char *s)
+{
+   int r;
+
+   if (!s)
+   return -EINVAL;
+
+   r = kstrtoull(s, 0, &warn_once_reset);
+   if (r)
+   return r;
+
+   return 1;
+}
+__setup("warn_once_reset=", warn_once_setup);
+
 #endif
 
 #ifdef CONFIG_STACKPROTECTOR
-- 
2.25.1
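For readers who want to try the parsing logic outside the kernel, here is a rough user-space analogue of the warn_once_setup() handler above. It is only a sketch: kstrtoull() is not available in user space, so strtoull() stands in for it with extra checks, and parse_reset_period() is a hypothetical name.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Parse a reset period given in seconds.  Base 0 lets strtoull() accept
 * decimal, octal (leading 0) and hex (leading 0x) input.
 * Returns 0 and stores the value on success, -EINVAL for malformed
 * input, -ERANGE if the number does not fit in unsigned long long.
 */
static int parse_reset_period(const char *s, unsigned long long *out)
{
	char *end;
	unsigned long long v;

	assert(out != NULL);

	/* Require a leading digit: no sign, no leading whitespace. */
	if (!s || !isdigit((unsigned char)*s))
		return -EINVAL;

	errno = 0;
	v = strtoull(s, &end, 0);
	if (errno == ERANGE)
		return -ERANGE;
	if (end == s || *end != '\0')
		return -EINVAL;	/* trailing garbage after the number */

	*out = v;
	return 0;
}
```

A value such as "3600" would correspond to booting with warn_once_reset=3600 in the patch above.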



[PATCH 2/3] clear_warn_once: bind a timer to written reset value

2020-11-25 Thread Paul Gortmaker
Existing documentation has a write of "1" to clear/reset all the
WARN_ONCE and similar to the as-booted state, so they can possibly
be re-triggered again during debugging/testing.

But having them auto-reset once a day, or once a week, might shed
valuable information to a sysadmin on what the system is doing.

Here we extend the existing debugfs variable to bind a timer to the
written value N, so that it will reset every N seconds, for N>1.
Writing a zero will clear any previously set timer value.

The pre-existing behaviour of writing N=1 will do a one-shot clear.

Cc: Andi Kleen 
Cc: Petr Mladek 
Cc: Sergey Senozhatsky 
Cc: Steven Rostedt 
Cc: John Ogness 
Signed-off-by: Paul Gortmaker 
---
 .../admin-guide/clearing-warn-once.rst|  9 ++
 kernel/panic.c| 32 +++
 2 files changed, 41 insertions(+)

diff --git a/Documentation/admin-guide/clearing-warn-once.rst 
b/Documentation/admin-guide/clearing-warn-once.rst
index 211fd926cf00..93cf3ba0b57d 100644
--- a/Documentation/admin-guide/clearing-warn-once.rst
+++ b/Documentation/admin-guide/clearing-warn-once.rst
@@ -7,3 +7,12 @@ echo 1 > /sys/kernel/debug/clear_warn_once
 
 clears the state and allows the warnings to print once again.
 This can be useful after test suite runs to reproduce problems.
+
+Values greater than one set a timer for a periodic state reset; e.g.
+
+echo 3600 > /sys/kernel/debug/clear_warn_once
+
+will establish an hourly state reset, effectively turning WARN_ONCE
+into a long period rate-limited warning.
+
+Writing a value of zero (or one) will remove any previously set timer.
diff --git a/kernel/panic.c b/kernel/panic.c
index 1d425970a50c..a23eb239fb17 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -655,6 +655,7 @@ EXPORT_SYMBOL(__warn_printk);
 /* Support resetting WARN*_ONCE state */
 
 static u64 warn_once_reset;
+static bool warn_timer_active;
 
 static void do_clear_warn_once(void)
 {
@@ -662,6 +663,14 @@ static void do_clear_warn_once(void)
memset(__start_once, 0, __end_once - __start_once);
 }
 
+static void timer_warn_once(struct timer_list *timer)
+{
+   do_clear_warn_once();
+   timer->expires = jiffies + warn_once_reset * HZ;
+   add_timer(timer);
+}
+static DEFINE_TIMER(warn_reset_timer, timer_warn_once);
+
 static int warn_once_get(void *data, u64 *val)
 {
*val = warn_once_reset;
@@ -672,6 +681,29 @@ static int warn_once_set(void *data, u64 val)
 {
warn_once_reset = val;
 
+   if (val > 1) {  /* set/reset new timer */
+   unsigned long expires = jiffies + val * HZ;
+
+   if (warn_timer_active) {
+   mod_timer(&warn_reset_timer, expires);
+   } else {
+   warn_timer_active = 1;
+   warn_reset_timer.expires = expires;
+   add_timer(&warn_reset_timer);
+   }
+   return 0;
+   }
+
+   if (warn_timer_active) {
+   del_timer_sync(&warn_reset_timer);
+   warn_timer_active = 0;
+   }
+   warn_once_reset = 0;
+
+   if (val == 0)   /* cleared timer, we are done */
+   return 0;
+
+   /* Getting here means val == 1  --->  so clear existing data */
do_clear_warn_once();
return 0;
 }
-- 
2.25.1



[PATCH 1/3] clear_warn_once: expand debugfs to include read support

2020-11-25 Thread Paul Gortmaker
The existing clear_warn_once variable is write-only; used as per the
documentation to reset the warn_once to as-booted state with:

echo 1 > /sys/kernel/debug/clear_warn_once

The objective is to expand on that functionality, which requires the
debugfs variable to be read/write and not just write-only.

Here, we deal with the debugfs boilerplate associated with converting
it from write-only to read-write, in order to factor that out for easier
review, and for what may be a possible future useful bisect point.

Existing functionality is unchanged - the only difference is that we
have tied in a variable that lets you now read the variable and see
the last value written.

Cc: Andi Kleen 
Cc: Petr Mladek 
Cc: Sergey Senozhatsky 
Cc: Steven Rostedt 
Cc: John Ogness 
Signed-off-by: Paul Gortmaker 
---
 kernel/panic.c | 25 -
 1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/kernel/panic.c b/kernel/panic.c
index 332736a72a58..1d425970a50c 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -654,21 +654,36 @@ EXPORT_SYMBOL(__warn_printk);
 
 /* Support resetting WARN*_ONCE state */
 
-static int clear_warn_once_set(void *data, u64 val)
+static u64 warn_once_reset;
+
+static void do_clear_warn_once(void)
 {
generic_bug_clear_once();
memset(__start_once, 0, __end_once - __start_once);
+}
+
+static int warn_once_get(void *data, u64 *val)
+{
+   *val = warn_once_reset;
+   return 0;
+}
+
+static int warn_once_set(void *data, u64 val)
+{
+   warn_once_reset = val;
+
+   do_clear_warn_once();
return 0;
 }
 
-DEFINE_DEBUGFS_ATTRIBUTE(clear_warn_once_fops, NULL, clear_warn_once_set,
-"%lld\n");
+DEFINE_DEBUGFS_ATTRIBUTE(clear_warn_once_fops, warn_once_get, warn_once_set,
+"%llu\n");
 
 static __init int register_warn_debugfs(void)
 {
/* Don't care about failure */
-   debugfs_create_file_unsafe("clear_warn_once", 0200, NULL, NULL,
-  &clear_warn_once_fops);
+   debugfs_create_file_unsafe("clear_warn_once", 0600, NULL,
+  &warn_once_reset, &clear_warn_once_fops);
return 0;
 }
 
-- 
2.25.1
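The memset() in do_clear_warn_once() works because each "once" call site keeps a flag in one contiguous region, so zeroing the region re-arms every site at once. A small user-space model of that idea is sketched below. The fixed-size array is an assumption standing in for the __start_once..__end_once range, and fire_once() and clear_once_flags() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NR_SITES 3

/* One flag per call site; stands in for the kernel's "once" region. */
static bool once_flags[NR_SITES];

/* Returns true only the first time a site fires since the last clear. */
static bool fire_once(unsigned int site)
{
	assert(site < NR_SITES);

	if (once_flags[site])
		return false;
	once_flags[site] = true;
	return true;
}

/* Re-arm every site in one go, as the memset() in the patch does. */
static void clear_once_flags(void)
{
	memset(once_flags, 0, sizeof(once_flags));
}
```

This also shows why the reset is global: there is no per-site bookkeeping, so one clear re-enables every message.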



[PATCH 0/3] clear_warn_once: add timed interval resetting

2020-11-25 Thread Paul Gortmaker
The existing clear_warn_once functionality is currently a manually
issued state reset via the file /sys/kernel/debug/clear_warn_once when
debugfs is mounted.  The idea being that a developer would be running
some tests, like LTP or similar, and want to check reproducibility
without having to reboot.

But you currently can't make use of clear_warn_once unless you've got
debugfs enabled and mounted - which may not be desired by some people
in some deployment situations.

The functionality added here allows for periodic resets in addition to
the one-shot reset it already had.  Then we allow for a boot-time setting
of the periodic resets so it can be used even when debugfs isn't mounted.

By having a periodic reset, we also open the door for having the various
"once" functions act as long period ratelimited messages, where a sysadmin
can pick an hour or a day reset if they are facing an issue and are
wondering "did this just happen once, or am I only being informed once?"
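As a rough user-space illustration of that last point, a "once" gate with
an optional reset interval could look like the sketch below.  This is not
the kernel implementation: going by the patch titles, the series binds a
timer to the written value, whereas this sketch checks the interval lazily
at call time.  The names once_gate and once_gate_should_warn are
hypothetical.

```c
#include <stdbool.h>

/* Hypothetical user-space model of a "warn once" gate with periodic reset. */
struct once_gate {
	bool fired;		/* has the message been emitted since the last reset? */
	long last_reset;	/* time of the last reset, in seconds */
	long interval;		/* reset period in seconds; 0 means never reset */
};

/* Return true if the caller should emit its message at time `now`. */
static bool once_gate_should_warn(struct once_gate *g, long now)
{
	/* Lazy stand-in for a timer: clear the flag once the interval has elapsed. */
	if (g->interval > 0 && now - g->last_reset >= g->interval) {
		g->fired = false;
		g->last_reset = now;
	}

	if (g->fired)
		return false;

	g->fired = true;
	return true;
}
```

With an interval of 0 the gate behaves like a plain one-shot; with a
non-zero interval it acts as a very long period rate limit.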

Tested with DEBUG_FS_ALLOW_ALL and DEBUG_FS_ALLOW_NONE on an otherwise
defconfig.

---

Cc: Andi Kleen 
Cc: Petr Mladek 
Cc: Sergey Senozhatsky 
Cc: Steven Rostedt 
Cc: John Ogness 

Paul Gortmaker (3):
  clear_warn_once: expand debugfs to include read support
  clear_warn_once: bind a timer to written reset value
  clear_warn_once: add a warn_once_reset= boot parameter

 .../admin-guide/clearing-warn-once.rst|  9 +++
 .../admin-guide/kernel-parameters.txt |  8 ++
 kernel/panic.c| 78 +--
 3 files changed, 90 insertions(+), 5 deletions(-)

-- 
2.25.1



Re: [PATCH][next] cpumask: allocate enough space for string and trailing '\0' char

2020-11-09 Thread Paul Gortmaker




On 2020-11-09 8:07 p.m., Qian Cai wrote:

On Mon, 2020-11-09 at 13:04 +, Colin King wrote:

From: Colin Ian King 

Currently the allocation of cpulist is based on the length of buf but does
not include the additional end-of-string '\0' terminator. Static analysis is
reporting this as a potential out-of-bounds access on cpulist. Fix this by
allocating enough space for the additional '\0' terminator.
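For readers less familiar with this class of bug, a minimal user-space
sketch of the corrected sizing follows.  It assumes malloc() as a stand-in
for whatever allocator the kernel code uses, and copy_cpulist is a
hypothetical helper name, not a function from the patch.

```c
#include <stdlib.h>
#include <string.h>

/* Duplicate buf into a fresh heap buffer; the caller frees the result. */
static char *copy_cpulist(const char *buf)
{
	size_t len = strlen(buf) + 1;	/* +1 so the trailing '\0' fits */
	char *copy = malloc(len);

	if (copy)
		memcpy(copy, buf, len);	/* copies the terminator as well */
	return copy;
}
```

Sizing the buffer with strlen(buf) alone would leave no room for the
terminator, which is the out-of-bounds access being reported.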

Addresses-Coverity: ("Out-of-bounds access")
Fixes: 65987e67f7ff ("cpumask: add "last" alias for cpu list specifications")


Yeah, this bad commit also introduced KASAN errors everywhere and then will
disable lockdep that makes our linux-next CI miserable. Confirmed that this
patch will fix it.


I appreciate the reports reminding me why I hate touching string handling.

But let us not lose sight of why linux-next exists.  We want to
encourage code to appear there as a sounding board before it goes
mainline, so we can fix things and not pollute mainline git history
with those trivialities.

If you've decided to internalize linux-next as part of your CI, then
great, but do note that does not elevate linux-next to some pristine
status for the world at large.  That only means you have to watch more
closely what is going on.

If you want to declare linux-next unbreakable -- well that would scare
others away from getting the multi-arch or multi-config coverage that they
may not be able to do themselves.  We are not going to do that.

I have (hopefully) fixed the "bad commit" in v2 -- as part of the
implicit linux-next rule "you broke it, you better fix it ASAP".

But "bad" and "miserable" can be things that might scare people off of
making use of linux-next for what it is meant to be for.  And I am not
OK with that.

Thanks,
Paul.
--




Signed-off-by: Colin Ian King 
---
  lib/cpumask.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/cpumask.c b/lib/cpumask.c
index 34ecb3005941..cb8a3ef0e73e 100644
--- a/lib/cpumask.c
+++ b/lib/cpumask.c
@@ -185,7 +185,7 @@ int __ref cpulist_parse(const char *buf, struct cpumask
*dstp)
  {
int r;
char *cpulist, last_cpu[5]; /* NR_CPUS <=  */
-   size_t len = strlen(buf);
+   size_t len = strlen(buf) + 1;
bool early = !slab_is_available();
  
  	if (!strcmp(buf, "all")) {




[PATCH 4/4] cpumask: add "last" alias for cpu list specifications

2020-11-09 Thread Paul Gortmaker
It seems that a common configuration is to use the 1st couple cores
for housekeeping tasks, and/or driving a busy peripheral that generates
a lot of interrupts, or something similar.

This tends to leave the remaining ones to form a pool of similarly
configured cores to take on the real workload of interest to the user.

So on machine A - with 32 cores, it could be 0-3 for "system" and then
4-31 being used in boot args like nohz_full=, or rcu_nocbs= as part of
setting up the worker pool of CPUs.

But then newer machine B is added, and it has 48 cores, and so while
the 0-3 part remains unchanged, the pool setup cpu list becomes 4-47.

Deployment would be easier if we could just simply replace 31 and 47
with "last" and let the system substitute in the actual number at boot;
a number that it knows better than we do.

No need to have custom boot args per node, no need to do a trial boot
in order to snoop /proc/cpuinfo and/or /sys/devices/system/cpu - no
more fencepost errors of using 32 and 48 instead of 31 and 47.

A generic token replacement is used to substitute "last" with the
number of CPUs present before handing off to bitmap processing.  But
it could just as easily be used to replace any placeholder token with
any other token or value only known at/after boot.
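To make the fencepost point concrete, here is a small user-space sketch of
producing the replacement string.  It assumes "last" means the highest CPU
index, i.e. the CPU count minus one, as the documentation hunk in this
patch describes; format_last_cpu is a hypothetical name and not part of
the patch.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Write the index of the last CPU (count - 1) into out as a decimal string.
 * Returns 0 on success, -1 if there are no CPUs or out is too small.
 */
static int format_last_cpu(char *out, size_t size, unsigned int nr_cpus)
{
	int n;

	if (nr_cpus == 0)
		return -1;

	n = snprintf(out, size, "%u", nr_cpus - 1);
	if (n < 0 || (size_t)n >= size)
		return -1;	/* output would have been truncated */

	return 0;
}
```

A 32 core machine yields "31" and a 48 core machine yields "47", so the
same boot argument works on both.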

Signed-off-by: Paul Gortmaker 
---
 .../admin-guide/kernel-parameters.rst |   7 ++
 lib/cpumask.c | 112 +-
 2 files changed, 117 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.rst 
b/Documentation/admin-guide/kernel-parameters.rst
index 9e1c4522e1f0..362dea55034e 100644
--- a/Documentation/admin-guide/kernel-parameters.rst
+++ b/Documentation/admin-guide/kernel-parameters.rst
@@ -83,6 +83,13 @@ will provide an empty/cleared cpu mask for the associated 
boot argument.
 Note that "all" and "none" are not necessarily valid/sensible input values
 for each available parameter expecting a CPU list.
 
+foo_cpus=1,3,5,16-last
+
+will at runtime, replace "last" with the number of the last (highest number)
+present CPU on the system.  Thus a common deployment can be used on multiple
+systems with different total number of cores present, without needing to
+evaluate the total core count in advance on each system.
+
 This document may not be entirely up to date and comprehensive. The command
 "modinfo -p ${modulename}" shows a current list of all parameters of a loadable
 module. Loadable modules, after being loaded into the running kernel, also
diff --git a/lib/cpumask.c b/lib/cpumask.c
index eb8b1c92501e..fa56d622c1d8 100644
--- a/lib/cpumask.c
+++ b/lib/cpumask.c
@@ -3,6 +3,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -96,15 +97,97 @@ int cpumask_next_wrap(int n, const struct cpumask *mask, 
int start, bool wrap)
 }
 EXPORT_SYMBOL(cpumask_next_wrap);
 
+/*
+ * Basically strstr() but given "foo", ignore "foobar", "myfoo", "foofoo"
+ * and "foo2bar" -- i.e. any case where the token is a word fragment.
+ */
+static char *cpumask_find_token(const char *str, const char *token)
+{
+   char *here = strstr(str, token);
+   size_t tlen = strlen(token);
+
+   if (!here)
+   return NULL;
+
+   while (here) {
+   size_t offset = here - str;
+   char prev, next = str[offset + tlen];
+
+   if (offset)
+   prev = str[offset - 1];
+   else
+   prev = '\0';
+
+   if (!(isalnum(prev) || isalnum(next)))
+   break;
+
+   here = strstr(here + tlen, token);
+   }
+
+   return here;
+}
+
+/*
+ * replace old token with new token: Given a convenience or placeholder
+ * token "last" and an associated value not known until boot, of say 1234,
+ * replace instances of "last" with "1234".
+ *
+ * For example src = "1,3,last,7-last,9,lastly,last-2047\0"  results in a
+ *dest = "1,3,1234,7-1234,9,lastly,1234-2047\0"
+ *
+ * The destination string may be shorter than, equal to, or longer than
+ * the source string -- based on whether the new token strlen is shorter
+ * than, equal to, or longer than the old token strlen.
+ * The caller must allocate dest space accordingly with that in mind.
+ */
+
+static void cpulist_replace_token(char *dest, const char *src,
+  const char *old_token, const char *new_token)
+{
+   const char *src_start = src;
+   char *dest_start = dest, *here;
+   const size_t olen = strlen(old_token);
+   const size_t nlen = strlen(new_token);
+
+   here = cpumask_find_token(src_start, old_token);
+   if (!here) {
+   strcpy(dest, src);
+   return;
+   }
+
+   while (here) {
+  

[PATCH 2/4] cpumask: make "all" alias global and not just RCU

2020-11-09 Thread Paul Gortmaker
It is probably better that we don't have subsystem specific
abbreviations or aliases for generic CPU list specifications.

Hence we move the "all" from RCU out to lib/ so that it can be
used in any instance where CPU lists are being parsed.

Signed-off-by: Paul Gortmaker 
---
 Documentation/admin-guide/kernel-parameters.rst |  7 +++
 Documentation/admin-guide/kernel-parameters.txt |  4 +---
 kernel/rcu/tree_plugin.h| 13 -
 lib/cpumask.c   |  6 ++
 4 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.rst 
b/Documentation/admin-guide/kernel-parameters.rst
index 6d421694d98e..ef98ca700946 100644
--- a/Documentation/admin-guide/kernel-parameters.rst
+++ b/Documentation/admin-guide/kernel-parameters.rst
@@ -68,6 +68,13 @@ For example one can add to the command line following 
parameter:
 
 where the final item represents CPUs 100,101,125,126,150,151,...
 
+The following convenience aliases are also accepted and used:
+
+foo_cpus=all
+
+is equivalent to "foo_cpus=0-N" -- where "N" is the numerically last CPU on
+the system, thus avoiding looking up the value in "/sys/devices/system/cpu"
+in advance on each deployed system.
 
 
 This document may not be entirely up to date and comprehensive. The command
diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 526d65d8573a..96eed72f02a2 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4023,9 +4023,7 @@
see CONFIG_RAS_CEC help text.
 
rcu_nocbs=  [KNL]
-   The argument is a cpu list, as described above,
-   except that the string "all" can be used to
-   specify every CPU on the system.
+   The argument is a cpu list, as described above.
 
In kernels built with CONFIG_RCU_NOCB_CPU=y, set
the specified list of CPUs to be no-callback CPUs.
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index fd8a52e9a887..b18f89f94fd3 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1463,20 +1463,15 @@ static void rcu_cleanup_after_idle(void)
 
 /*
  * Parse the boot-time rcu_nocb_mask CPU list from the kernel parameters.
- * The string after the "rcu_nocbs=" is either "all" for all CPUs, or a
- * comma-separated list of CPUs and/or CPU ranges.  If an invalid list is
- * given, a warning is emitted and all CPUs are offloaded.
+ * If the list is invalid, a warning is emitted and all CPUs are offloaded.
  */
 static int __init rcu_nocb_setup(char *str)
 {
alloc_bootmem_cpumask_var(&rcu_nocb_mask);
-   if (!strcasecmp(str, "all"))
+   if (cpulist_parse(str, rcu_nocb_mask)) {
+   pr_warn("rcu_nocbs= bad CPU range, all CPUs set\n");
cpumask_setall(rcu_nocb_mask);
-   else
-   if (cpulist_parse(str, rcu_nocb_mask)) {
-   pr_warn("rcu_nocbs= bad CPU range, all CPUs set\n");
-   cpumask_setall(rcu_nocb_mask);
-   }
+   }
return 1;
 }
 __setup("rcu_nocbs=", rcu_nocb_setup);
diff --git a/lib/cpumask.c b/lib/cpumask.c
index 5eb002237404..15599cdf5db6 100644
--- a/lib/cpumask.c
+++ b/lib/cpumask.c
@@ -2,6 +2,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -104,6 +105,11 @@ EXPORT_SYMBOL(cpumask_next_wrap);
  */
 int cpulist_parse(const char *buf, struct cpumask *dstp)
 {
+   if (!strcmp(buf, "all")) {
+   cpumask_setall(dstp);
+   return 0;
+   }
+
return bitmap_parselist(buf, cpumask_bits(dstp), nr_cpumask_bits);
 }
 EXPORT_SYMBOL(cpulist_parse);
-- 
2.25.1



[PATCH v2 0/4] support for global CPU list abbreviations

2020-11-09 Thread Paul Gortmaker
RFC/v1 ---> v2:

commit #1:
   leave one line stub behind for !SMP solving build failures.
   Reported by Randy Dunlap and various build bots.

commit #4
   manage to remember '\0' char in strlen from one line to the next.
   Reported by Colin King.

Original description from v1/RFC below remains unchanged...

 ---

The basic objective here was to add support for "nohz_full=8-last" and/or
"rcu_nocbs=4-last" -- essentially introduce "last" as a portable
reference evaluated at boot/runtime for anything using a CPU list.

The thinking behind this is that people carve off a few early CPUs to
support housekeeping tasks, and perhaps dedicate one to a busy I/O
peripheral, and then the remaining pool of CPUs out to the end are a
part of a commonly configured pool used for the real work the user
cares about.

Extend that logic out to a fleet of machines - some new, and some
nearing EOL, and you've probably got a wide range of core counts to
contend with - even though the early number of cores dedicated to the
system overhead probably doesn't vary.

This change would enable sysadmins to have a common bootarg across all
such systems, and would also avoid any off-by-one fencepost errors that
happen for users who might briefly forget that core counts start at
zero.

Looking around before starting, I noticed RCU already had a short-form
abbreviation "all" -- but if we want to treat CPU lists in a uniform
manner, then tokens shouldn't be implemented at a subsystem level and
hence be subsystem specific; each with their own variations.

So I moved "all" to global use - for boot args, and for cgroups.  Then
I added the inverse "none" and finally, the one I wanted -- "last".

The use of "last" isn't a standalone word like "all" or "none".  It will
be a part of a complete range specification, possibly with comma-separated
ranges, and possibly specified multiple times.  So I had to be a bit
more careful with string matching - and hence un-inlined the parse
function as commit #1 in this series.

But it really is generic support for "replace token ABC with known at
boot value XYZ" - for example, it would be trivial to extend support to
add "half" as a dynamic token to be replaced with 1/2 the core count,
even though I wouldn't suggest that has a use case like "last" does.
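The whole-word token replacement idea can be sketched in user space
roughly as follows.  This is an illustration under stated assumptions and
not the code from patch 4/4: the function names are hypothetical, and
unlike the in-kernel helper this version takes an explicit destination
size and reports failure instead of relying on the caller to size the
buffer.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Find token at or after `from`, but only where it stands alone as a word,
 * i.e. it is not glued to an alphanumeric character on either side.
 * `str` is the start of the whole string, so the previous character can be
 * inspected even when `from` points into the middle of it.
 */
static const char *find_word_token(const char *str, const char *from,
				   const char *token)
{
	size_t tlen = strlen(token);
	const char *here;

	if (tlen == 0)
		return NULL;

	for (here = strstr(from, token); here; here = strstr(here + 1, token)) {
		unsigned char prev = (here > str) ? (unsigned char)here[-1] : 0;
		unsigned char next = (unsigned char)here[tlen];

		if (!isalnum(prev) && !isalnum(next))
			return here;
	}
	return NULL;
}

/*
 * Copy src to dest, replacing each stand-alone old_token with new_token.
 * Returns 0 on success, -1 if dest (dest_size bytes) is too small.
 */
static int replace_word_token(char *dest, size_t dest_size, const char *src,
			      const char *old_token, const char *new_token)
{
	size_t olen = strlen(old_token);
	size_t nlen = strlen(new_token);
	size_t used = 0, tail;
	const char *pos = src;
	const char *here;

	while ((here = find_word_token(src, pos, old_token)) != NULL) {
		size_t chunk = (size_t)(here - pos);

		/* Keep at least one byte free for the final '\0'. */
		if (used + chunk + nlen >= dest_size)
			return -1;
		memcpy(dest + used, pos, chunk);
		used += chunk;
		memcpy(dest + used, new_token, nlen);
		used += nlen;
		pos = here + olen;
	}

	tail = strlen(pos);
	if (used + tail >= dest_size)
		return -1;
	memcpy(dest + used, pos, tail + 1);	/* includes the terminator */
	return 0;
}
```

Calling replace_word_token() with "last" and "1234" on an input such as
"1,3,last,7-last,9,lastly,last-2047" replaces the three stand-alone
occurrences and leaves "lastly" untouched.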

I tested the string matching with a bunch of intentionally badly crafted
strings in a user-space harness, and tested bootarg use with nohz_full
and rcu_nocbs, and also the post-boot cgroup use case as per below:

   root@hackbox:/sys/fs/cgroup/cpuset# mkdir foo
   root@hackbox:/sys/fs/cgroup/cpuset# cd foo
   root@hackbox:/sys/fs/cgroup/cpuset/foo# cat cpuset.cpus
   
   root@hackbox:/sys/fs/cgroup/cpuset/foo# /bin/echo 10-last > cpuset.cpus
   root@hackbox:/sys/fs/cgroup/cpuset/foo# cat cpuset.cpus
   10-15
   root@hackbox:/sys/fs/cgroup/cpuset/foo# /bin/echo all > cpuset.cpus
   root@hackbox:/sys/fs/cgroup/cpuset/foo# cat cpuset.cpus
   0-15
   root@hackbox:/sys/fs/cgroup/cpuset/foo# /bin/echo none > cpuset.cpus
   root@hackbox:/sys/fs/cgroup/cpuset/foo# cat cpuset.cpus
   
   root@hackbox:/sys/fs/cgroup/cpuset/foo#

This was on a 16 core machine with CONFIG_NR_CPUS=16 in .config file.

Note that the two use cases (boot and runtime) are why you see "early"
parameter in the code - I entertained just sticking the string copy on
the stack vs. the early alloc dance, but this felt more correct/robust.
The cgroup and modular code using cpulist_parse() are runtime cases.

---

Cc: Frederic Weisbecker 
Cc: "Paul E. McKenney" 
Cc: Josh Triplett 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Li Zefan 

Paul Gortmaker (4):
  cpumask: un-inline cpulist_parse for SMP; prepare for ascii helpers
  cpumask: make "all" alias global and not just RCU
  cpumask: add a "none" alias to complement "all"
  cpumask: add "last" alias for cpu list specifications

 .../admin-guide/kernel-parameters.rst |  20 +++
 .../admin-guide/kernel-parameters.txt |   4 +-
 include/linux/cpumask.h   |   8 ++
 kernel/rcu/tree_plugin.h  |  13 +-
 lib/cpumask.c | 132 ++
 5 files changed, 165 insertions(+), 12 deletions(-)

-- 
2.25.1



[PATCH 1/4] cpumask: un-inline cpulist_parse for SMP; prepare for ascii helpers

2020-11-09 Thread Paul Gortmaker
In order to support convenience tokens like "all", and "none" and
"last" in CPU lists, we'll have to use string operations and expand
on what is currently a simple wrapper around the underlying bitmap
function call.

Rather than add header dependencies to cpumask.h and code more complex
operations not really appropriate for a header file, we prepare by
simply un-inlining it here and move it to the lib dir alongside the
other more complex cpumask functions.

Since lib/cpumask.c is built conditionally on CONFIG_SMP, and there
are non-SMP callers, we leave the one-line stub behind for that case.
If they want to check "0-0" is a valid range, they can still do it.
In the meantime, we can add the ascii helpers for CONFIG_SMP users.
The use of NR_CPUS vs. CONFIG_SMP is consistent with the existing file.

Aside from an additional exported symbol in the SMP case, no functional
changes are anticipated with this move.

Signed-off-by: Paul Gortmaker 
---
 include/linux/cpumask.h |  8 
 lib/cpumask.c   | 13 +
 2 files changed, 21 insertions(+)

diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index f0d895d6ac39..d2e370c5ce99 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -679,11 +679,19 @@ static inline int cpumask_parse(const char *buf, struct 
cpumask *dstp)
  * @dstp: the cpumask to set.
  *
  * Returns -errno, or 0 for success.
+ *
+ * There are instances of non-SMP callers of this, and the easiest way
+ * to remain 100% runtime compatible is to let them continue to have the
+ * one-line stub, while the SMP version in lib/cpumask.c gets improved.
  */
+#if NR_CPUS == 1
 static inline int cpulist_parse(const char *buf, struct cpumask *dstp)
 {
return bitmap_parselist(buf, cpumask_bits(dstp), nr_cpumask_bits);
 }
+#else
+int cpulist_parse(const char *buf, struct cpumask *dstp);
+#endif
 
 /**
  * cpumask_size - size to allocate for a 'struct cpumask' in bytes
diff --git a/lib/cpumask.c b/lib/cpumask.c
index 85da6ab4fbb5..5eb002237404 100644
--- a/lib/cpumask.c
+++ b/lib/cpumask.c
@@ -95,6 +95,19 @@ int cpumask_next_wrap(int n, const struct cpumask *mask, int 
start, bool wrap)
 }
 EXPORT_SYMBOL(cpumask_next_wrap);
 
+/**
+ * cpulist_parse - extract a cpumask from a user string of ranges
+ * @buf: the buffer to extract from
+ * @dstp: the cpumask to set.
+ *
+ * Returns -errno, or 0 for success.
+ */
+int cpulist_parse(const char *buf, struct cpumask *dstp)
+{
+   return bitmap_parselist(buf, cpumask_bits(dstp), nr_cpumask_bits);
+}
+EXPORT_SYMBOL(cpulist_parse);
+
 /* These are not inline because of header tangles. */
 #ifdef CONFIG_CPUMASK_OFFSTACK
 /**
-- 
2.25.1



[PATCH 3/4] cpumask: add a "none" alias to complement "all"

2020-11-09 Thread Paul Gortmaker
With global support for a CPU list alias of "all", it seems to just make
sense to also trivially extend support for an opposite "none" specifier.
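A user-space sketch of the alias check, under the assumption of a 16 CPU
system modelled as a 64-bit mask, might look like the following.  The
names SKETCH_NR_CPUS and parse_cpu_alias are hypothetical; the kernel code
in the diff below operates on struct cpumask instead.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_NR_CPUS 16	/* assumed CPU count for this sketch; must be < 64 */

/*
 * Handle the whole-string aliases before any range parsing.
 * Returns 0 if buf was an alias and *mask was set, 1 if buf is not an
 * alias (the caller would then fall through to the list parser).
 */
static int parse_cpu_alias(const char *buf, uint64_t *mask)
{
	if (!strcmp(buf, "all")) {
		*mask = (UINT64_C(1) << SKETCH_NR_CPUS) - 1;
		return 0;
	}

	if (!strcmp(buf, "none")) {
		*mask = 0;
		return 0;
	}

	return 1;
}
```

Because the comparison is on the entire string, "all" and "none" cannot be
combined with ranges the way "last" can.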

Signed-off-by: Paul Gortmaker 
---
 Documentation/admin-guide/kernel-parameters.rst | 6 ++
 lib/cpumask.c   | 5 +
 2 files changed, 11 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.rst 
b/Documentation/admin-guide/kernel-parameters.rst
index ef98ca700946..9e1c4522e1f0 100644
--- a/Documentation/admin-guide/kernel-parameters.rst
+++ b/Documentation/admin-guide/kernel-parameters.rst
@@ -76,6 +76,12 @@ is equivalent to "foo_cpus=0-N" -- where "N" is the 
numerically last CPU on
 the system, thus avoiding looking up the value in "/sys/devices/system/cpu"
 in advance on each deployed system.
 
+foo_cpus=none
+
+will provide an empty/cleared cpu mask for the associated boot argument.
+
+Note that "all" and "none" are not necessarily valid/sensible input values
+for each available parameter expecting a CPU list.
 
 This document may not be entirely up to date and comprehensive. The command
 "modinfo -p ${modulename}" shows a current list of all parameters of a loadable
diff --git a/lib/cpumask.c b/lib/cpumask.c
index 15599cdf5db6..eb8b1c92501e 100644
--- a/lib/cpumask.c
+++ b/lib/cpumask.c
@@ -110,6 +110,11 @@ int cpulist_parse(const char *buf, struct cpumask *dstp)
return 0;
}
 
+   if (!strcmp(buf, "none")) {
+   cpumask_clear(dstp);
+   return 0;
+   }
+
return bitmap_parselist(buf, cpumask_bits(dstp), nr_cpumask_bits);
 }
 EXPORT_SYMBOL(cpulist_parse);
-- 
2.25.1



Re: [PATCH -next] cpumask: use false and true for bool variables

2020-11-09 Thread Paul Gortmaker




On 2020-11-09 6:59 a.m., Zou Wei wrote:

Fix coccicheck warnings:

./lib/cpumask.c:342:6-13: WARNING: Comparison of 0/1 to bool variable
./lib/cpumask.c:351:33-40: WARNING: Comparison of 0/1 to bool variable
./lib/cpumask.c:406:3-11: WARNING: Assignment of 0/1 to bool variable

Reported-by: Hulk Robot 


It seems "Hulk Robot" needs an AI "tune-up".  I didn't touch any of 
these lines of code - and I'm guessing they haven't changed in years.


Paul.
--


Signed-off-by: Zou Wei 
---
  lib/cpumask.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/cpumask.c b/lib/cpumask.c
index 34ecb30..74d0cf1 100644
--- a/lib/cpumask.c
+++ b/lib/cpumask.c
@@ -339,7 +339,7 @@ static int find_nearest_node(int *node_dist, bool *used)
  
  	/* Choose the first unused node to compare */

for (i = 0; i < nr_node_ids; i++) {
-   if (used[i] == 0) {
+   if (used[i] == false) {
min_dist = node_dist[i];
node_id = i;
break;
@@ -348,7 +348,7 @@ static int find_nearest_node(int *node_dist, bool *used)
  
  	/* Compare and return the nearest node */

for (i = 0; i < nr_node_ids; i++) {
-   if (node_dist[i] < min_dist && used[i] == 0) {
+   if (node_dist[i] < min_dist && used[i] == false) {
min_dist = node_dist[i];
node_id = i;
}
@@ -403,7 +403,7 @@ unsigned int cpumask_local_spread(unsigned int i, int node)
   flags);
return cpu;
}
-   used[id] = 1;
+   used[id] = true;
}
spin_unlock_irqrestore(_lock, flags);
  



Re: [PATCH 0/4] RFC: support for global CPU list abbreviations

2020-11-08 Thread Paul Gortmaker

On 2020-11-08 1:02 p.m., Paul E. McKenney wrote:

> Or I can carry them if you wish.  My expected changes in response to
> this series are shown below, and are also what I used to test it.

Thanks Paul - that would get linux-next exposure w/o me pestering sfr.
If nobody else has objections, having them in rcu-next would be great.

Paul.
--


[PATCH 4/4] cpumask: add "last" alias for cpu list specifications

2020-11-08 Thread Paul Gortmaker
It seems that a common configuration is to use the 1st couple cores
for housekeeping tasks, and/or driving a busy peripheral that generates
a lot of interrupts, or something similar.

This tends to leave the remaining ones to form a pool of similarly
configured cores to take on the real workload of interest to the user.

So on machine A - with 32 cores, it could be 0-3 for "system" and then
4-31 being used in boot args like nohz_full=, or rcu_nocbs= as part of
setting up the worker pool of CPUs.

But then newer machine B is added, and it has 48 cores, and so while
the 0-3 part remains unchanged, the pool setup cpu list becomes 4-47.

Deployment would be easier if we could just simply replace 31 and 47
with "last" and let the system substitute in the actual number at boot;
a number that it knows better than we do.

No need to have custom boot args per node, no need to do a trial boot
in order to snoop /proc/cpuinfo and/or /sys/devices/system/cpu - no
more fencepost errors of using 32 and 48 instead of 31 and 47.

A generic token replacement is used to substitute "last" with the
number of CPUs present before handing off to bitmap processing.  But
it could just as easily be used to replace any placeholder token with
any other token or value only known at/after boot.

Signed-off-by: Paul Gortmaker 
---
 .../admin-guide/kernel-parameters.rst |   7 ++
 lib/cpumask.c | 112 +-
 2 files changed, 117 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.rst 
b/Documentation/admin-guide/kernel-parameters.rst
index 9e1c4522e1f0..362dea55034e 100644
--- a/Documentation/admin-guide/kernel-parameters.rst
+++ b/Documentation/admin-guide/kernel-parameters.rst
@@ -83,6 +83,13 @@ will provide an empty/cleared cpu mask for the associated 
boot argument.
 Note that "all" and "none" are not necessarily valid/sensible input values
 for each available parameter expecting a CPU list.
 
+foo_cpus=1,3,5,16-last
+
+will at runtime, replace "last" with the number of the last (highest number)
+present CPU on the system.  Thus a common deployment can be used on multiple
+systems with different total number of cores present, without needing to
+evaluate the total core count in advance on each system.
+
 This document may not be entirely up to date and comprehensive. The command
 "modinfo -p ${modulename}" shows a current list of all parameters of a loadable
 module. Loadable modules, after being loaded into the running kernel, also
diff --git a/lib/cpumask.c b/lib/cpumask.c
index eb8b1c92501e..6c66f94b701d 100644
--- a/lib/cpumask.c
+++ b/lib/cpumask.c
@@ -3,6 +3,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -96,15 +97,97 @@ int cpumask_next_wrap(int n, const struct cpumask *mask, 
int start, bool wrap)
 }
 EXPORT_SYMBOL(cpumask_next_wrap);
 
+/*
+ * Basically strstr() but given "foo", ignore "foobar", "myfoo", "foofoo"
+ * and "foo2bar" -- i.e. any case where the token is a word fragment.
+ */
+static char *cpumask_find_token(const char *str, const char *token)
+{
+   char *here = strstr(str, token);
+   size_t tlen = strlen(token);
+
+   if (!here)
+   return NULL;
+
+   while (here) {
+   size_t offset = here - str;
+   char prev, next = str[offset + tlen];
+
+   if (offset)
+   prev = str[offset - 1];
+   else
+   prev = '\0';
+
+   if (!(isalnum(prev) || isalnum(next)))
+   break;
+
+   here = strstr(here + tlen, token);
+   }
+
+   return here;
+}
+
+/*
+ * replace old token with new token: Given a convenience or placeholder
+ * token "last" and an associated value not known until boot, of say 1234,
+ * replace instances of "last" with "1234".
+ *
+ * For example src = "1,3,last,7-last,9,lastly,last-2047\0"  results in a
+ *dest = "1,3,1234,7-1234,9,lastly,1234-2047\0"
+ *
+ * The destination string may be shorter than, equal to, or longer than
+ * the source string -- based on whether the new token strlen is shorter
+ * than, equal to, or longer than the old token strlen.
+ * The caller must allocate dest space accordingly with that in mind.
+ */
+
+static void cpulist_replace_token(char *dest, const char *src,
+  const char *old_token, const char *new_token)
+{
+   const char *src_start = src;
+   char *dest_start = dest, *here;
+   const size_t olen = strlen(old_token);
+   const size_t nlen = strlen(new_token);
+
+   here = cpumask_find_token(src_start, old_token);
+   if (!here) {
+   strcpy(dest, src);
+   return;
+   }
+
+   while (here) {
+  

[PATCH 3/4] cpumask: add a "none" alias to complement "all"

2020-11-08 Thread Paul Gortmaker
With global support for a CPU list alias of "all", it seems to just make
sense to also trivially extend support for an opposite "none" specifier.

Signed-off-by: Paul Gortmaker 
---
 Documentation/admin-guide/kernel-parameters.rst | 6 ++
 lib/cpumask.c   | 5 +
 2 files changed, 11 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.rst 
b/Documentation/admin-guide/kernel-parameters.rst
index ef98ca700946..9e1c4522e1f0 100644
--- a/Documentation/admin-guide/kernel-parameters.rst
+++ b/Documentation/admin-guide/kernel-parameters.rst
@@ -76,6 +76,12 @@ is equivalent to "foo_cpus=0-N" -- where "N" is the 
numerically last CPU on
 the system, thus avoiding looking up the value in "/sys/devices/system/cpu"
 in advance on each deployed system.
 
+foo_cpus=none
+
+will provide an empty/cleared cpu mask for the associated boot argument.
+
+Note that "all" and "none" are not necessarily valid/sensible input values
+for each available parameter expecting a CPU list.
 
 This document may not be entirely up to date and comprehensive. The command
 "modinfo -p ${modulename}" shows a current list of all parameters of a loadable
diff --git a/lib/cpumask.c b/lib/cpumask.c
index 15599cdf5db6..eb8b1c92501e 100644
--- a/lib/cpumask.c
+++ b/lib/cpumask.c
@@ -110,6 +110,11 @@ int cpulist_parse(const char *buf, struct cpumask *dstp)
return 0;
}
 
+   if (!strcmp(buf, "none")) {
+   cpumask_clear(dstp);
+   return 0;
+   }
+
return bitmap_parselist(buf, cpumask_bits(dstp), nr_cpumask_bits);
 }
 EXPORT_SYMBOL(cpulist_parse);
-- 
2.25.1



[PATCH 1/4] cpumask: un-inline cpulist_parse; prepare for ascii helpers

2020-11-08 Thread Paul Gortmaker
In order to support convenience tokens like "all", and "none" and
"last" in CPU lists, we'll have to use string operations and expand on
what is currently a simple wrapper around the underlying bitmap
function call.

Rather than add header dependencies to cpumask.h and code more complex
operations not really appropriate for a header file, we prepare by
simply un-inlining it here and move it to the lib dir alongside the
other more complex cpumask functions.

Aside from an additional exported symbol, no functional changes are
anticipated with this move.

Signed-off-by: Paul Gortmaker 
---
 include/linux/cpumask.h | 12 +---
 lib/cpumask.c   | 13 +
 2 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index f0d895d6ac39..6656019cce0f 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -673,17 +673,7 @@ static inline int cpumask_parse(const char *buf, struct 
cpumask *dstp)
return bitmap_parse(buf, UINT_MAX, cpumask_bits(dstp), nr_cpumask_bits);
 }
 
-/**
- * cpulist_parse - extract a cpumask from a user string of ranges
- * @buf: the buffer to extract from
- * @dstp: the cpumask to set.
- *
- * Returns -errno, or 0 for success.
- */
-static inline int cpulist_parse(const char *buf, struct cpumask *dstp)
-{
-   return bitmap_parselist(buf, cpumask_bits(dstp), nr_cpumask_bits);
-}
+int cpulist_parse(const char *buf, struct cpumask *dstp);
 
 /**
  * cpumask_size - size to allocate for a 'struct cpumask' in bytes
diff --git a/lib/cpumask.c b/lib/cpumask.c
index 85da6ab4fbb5..5eb002237404 100644
--- a/lib/cpumask.c
+++ b/lib/cpumask.c
@@ -95,6 +95,19 @@ int cpumask_next_wrap(int n, const struct cpumask *mask, int 
start, bool wrap)
 }
 EXPORT_SYMBOL(cpumask_next_wrap);
 
+/**
+ * cpulist_parse - extract a cpumask from a user string of ranges
+ * @buf: the buffer to extract from
+ * @dstp: the cpumask to set.
+ *
+ * Returns -errno, or 0 for success.
+ */
+int cpulist_parse(const char *buf, struct cpumask *dstp)
+{
+   return bitmap_parselist(buf, cpumask_bits(dstp), nr_cpumask_bits);
+}
+EXPORT_SYMBOL(cpulist_parse);
+
 /* These are not inline because of header tangles. */
 #ifdef CONFIG_CPUMASK_OFFSTACK
 /**
-- 
2.25.1



[PATCH 2/4] cpumask: make "all" alias global and not just RCU

2020-11-08 Thread Paul Gortmaker
It is probably better that we don't have subsystem specific
abbreviations or aliases for generic CPU list specifications.

Hence we move the "all" from RCU out to lib/ so that it can be
used in any instance where CPU lists are being parsed.

Signed-off-by: Paul Gortmaker 
---
 Documentation/admin-guide/kernel-parameters.rst |  7 +++
 Documentation/admin-guide/kernel-parameters.txt |  4 +---
 kernel/rcu/tree_plugin.h| 13 -
 lib/cpumask.c   |  6 ++
 4 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.rst 
b/Documentation/admin-guide/kernel-parameters.rst
index 6d421694d98e..ef98ca700946 100644
--- a/Documentation/admin-guide/kernel-parameters.rst
+++ b/Documentation/admin-guide/kernel-parameters.rst
@@ -68,6 +68,13 @@ For example one can add to the command line following 
parameter:
 
 where the final item represents CPUs 100,101,125,126,150,151,...
 
+The following convenience aliases are also accepted and used:
+
+foo_cpus=all
+
+is equivalent to "foo_cpus=0-N" -- where "N" is the numerically last CPU on
+the system, thus avoiding looking up the value in "/sys/devices/system/cpu"
+in advance on each deployed system.
 
 
 This document may not be entirely up to date and comprehensive. The command
diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 526d65d8573a..96eed72f02a2 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4023,9 +4023,7 @@
see CONFIG_RAS_CEC help text.
 
rcu_nocbs=  [KNL]
-   The argument is a cpu list, as described above,
-   except that the string "all" can be used to
-   specify every CPU on the system.
+   The argument is a cpu list, as described above.
 
In kernels built with CONFIG_RCU_NOCB_CPU=y, set
the specified list of CPUs to be no-callback CPUs.
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index fd8a52e9a887..b18f89f94fd3 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1463,20 +1463,15 @@ static void rcu_cleanup_after_idle(void)
 
 /*
  * Parse the boot-time rcu_nocb_mask CPU list from the kernel parameters.
- * The string after the "rcu_nocbs=" is either "all" for all CPUs, or a
- * comma-separated list of CPUs and/or CPU ranges.  If an invalid list is
- * given, a warning is emitted and all CPUs are offloaded.
+ * If the list is invalid, a warning is emitted and all CPUs are offloaded.
  */
 static int __init rcu_nocb_setup(char *str)
 {
alloc_bootmem_cpumask_var(&rcu_nocb_mask);
-   if (!strcasecmp(str, "all"))
+   if (cpulist_parse(str, rcu_nocb_mask)) {
+   pr_warn("rcu_nocbs= bad CPU range, all CPUs set\n");
cpumask_setall(rcu_nocb_mask);
-   else
-   if (cpulist_parse(str, rcu_nocb_mask)) {
-   pr_warn("rcu_nocbs= bad CPU range, all CPUs set\n");
-   cpumask_setall(rcu_nocb_mask);
-   }
+   }
return 1;
 }
 __setup("rcu_nocbs=", rcu_nocb_setup);
diff --git a/lib/cpumask.c b/lib/cpumask.c
index 5eb002237404..15599cdf5db6 100644
--- a/lib/cpumask.c
+++ b/lib/cpumask.c
@@ -2,6 +2,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -104,6 +105,11 @@ EXPORT_SYMBOL(cpumask_next_wrap);
  */
 int cpulist_parse(const char *buf, struct cpumask *dstp)
 {
+   if (!strcmp(buf, "all")) {
+   cpumask_setall(dstp);
+   return 0;
+   }
+
return bitmap_parselist(buf, cpumask_bits(dstp), nr_cpumask_bits);
 }
 EXPORT_SYMBOL(cpulist_parse);
-- 
2.25.1



[PATCH 0/4] RFC: support for global CPU list abbreviations

2020-11-08 Thread Paul Gortmaker
The basic objective here was to add support for "nohz_full=8-last" and/or
"rcu_nocbs=4-last" -- essentially introduce "last" as a portable
reference evaluated at boot/runtime for anything using a CPU list.

The thinking behind this, is that people carve off a few early CPUs to
support housekeeping tasks, and perhaps dedicate one to a busy I/O
peripheral, and then the remaining pool of CPUs out to the end are a
part of a commonly configured pool used for the real work the user
cares about.

Extend that logic out to a fleet of machines - some new, and some
nearing EOL, and you've probably got a wide range of core counts to
contend with - even though the early number of cores dedicated to the
system overhead probably doesn't vary.

This change would enable sysadmins to have a common bootarg across all
such systems, and would also avoid any off-by-one fencepost errors that
happen for users who might briefly forget that core counts start at
zero.

Looking around before starting, I noticed RCU already had a short-form
abbreviation "all" -- but if we want to treat CPU lists in a uniform
manner, then tokens shouldn't be implemented at a subsystem level and
hence be subsystem specific; each with their own variations.

So I moved "all" to global use - for boot args, and for cgroups.  Then
I added the inverse "none" and finally, the one I wanted -- "last".

The use of "last" isn't a standalone word like "all" or "none".  It will
be a part of a complete range specification, possibly with comma-separated
ranges, and possibly specified multiple times.  So I had to be a bit
more careful with string matching - and hence un-inlined the parse
function as commit #1 in this series.

But it really is a generic support for "replace token ABC with known at
boot value XYZ" - for example, it would be trivial to extend support to
add "half" as a dynamic token to be replaced with 1/2 the core count,
even though I wouldn't suggest that has a use case like "last" does.

I tested the string matching with a bunch of intentionally badly crafted
strings in a user-space harness, and tested bootarg use with nohz_full
and rcu_nocbs, and also the post-boot cgroup use case as per below:

   root@hackbox:/sys/fs/cgroup/cpuset# mkdir foo
   root@hackbox:/sys/fs/cgroup/cpuset# cd foo
   root@hackbox:/sys/fs/cgroup/cpuset/foo# cat cpuset.cpus
   
   root@hackbox:/sys/fs/cgroup/cpuset/foo# /bin/echo 10-last > cpuset.cpus
   root@hackbox:/sys/fs/cgroup/cpuset/foo# cat cpuset.cpus
   10-15
   root@hackbox:/sys/fs/cgroup/cpuset/foo# /bin/echo all > cpuset.cpus
   root@hackbox:/sys/fs/cgroup/cpuset/foo# cat cpuset.cpus
   0-15
   root@hackbox:/sys/fs/cgroup/cpuset/foo# /bin/echo none > cpuset.cpus
   root@hackbox:/sys/fs/cgroup/cpuset/foo# cat cpuset.cpus
   
   root@hackbox:/sys/fs/cgroup/cpuset/foo#

This was on a 16 core machine with CONFIG_NR_CPUS=16 in .config file.

Note that the two use cases (boot and runtime) are why you see "early"
parameter in the code - I entertained just sticking the string copy on
the stack vs. the early alloc dance, but this felt more correct/robust.
The cgroup and modular code using cpulist_parse() are runtime cases.
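
For illustration only, here is a userspace sketch of the token replacement
idea described above.  It is not the code from this series: the function
expand_cpulist() and its parameters are hypothetical names, and it assumes
the caller already knows the CPU count.  It rewrites "all", "none" and
"last" into plain numbers so that an ordinary range parser could take over.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Expand the aliases "all", "none" and "last" in a CPU list string.
 * Hypothetical userspace helper, not the kernel implementation.
 * Returns 0 on success, -1 if the arguments are bad or out is too small.
 */
int expand_cpulist(const char *in, unsigned int ncpus, char *out, size_t outlen)
{
	unsigned int last;
	size_t used = 0;
	const char *p = in;

	if (ncpus == 0 || outlen == 0)
		return -1;
	last = ncpus - 1;

	/* "all" and "none" are standalone words: match the whole string. */
	if (!strcmp(in, "all")) {
		int n = snprintf(out, outlen, "0-%u", last);

		return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
	}
	if (!strcmp(in, "none")) {
		out[0] = '\0';
		return 0;
	}

	while (*p) {
		int n;
		/* "last" only counts as a whole token, e.g. not inside "blast". */
		int is_last = !strncmp(p, "last", 4) &&
			      (p == in || !isalnum((unsigned char)p[-1])) &&
			      !isalnum((unsigned char)p[4]);

		if (is_last) {
			n = snprintf(out + used, outlen - used, "%u", last);
			p += 4;
		} else {
			n = snprintf(out + used, outlen - used, "%c", *p);
			p++;
		}
		if (n < 0 || (size_t)n >= outlen - used)
			return -1;
		used += (size_t)n;
	}
	out[used] = '\0';
	return 0;
}
```

With ncpus set to 16, "10-last" expands to "10-15" and "all" to "0-15",
in line with the cgroup session shown above.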

---

Cc: Frederic Weisbecker 
Cc: "Paul E. McKenney" 
Cc: Josh Triplett 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Li Zefan 

Paul Gortmaker (4):
  cpumask: un-inline cpulist_parse; prepare for ascii helpers
  cpumask: make "all" alias global and not just RCU
  cpumask: add a "none" alias to complement "all"
  cpumask: add "last" alias for cpu list specifications

 .../admin-guide/kernel-parameters.rst |  20 +++
 .../admin-guide/kernel-parameters.txt |   4 +-
 include/linux/cpumask.h   |  12 +-
 kernel/rcu/tree_plugin.h  |  13 +-
 lib/cpumask.c | 132 ++
 5 files changed, 158 insertions(+), 23 deletions(-)

-- 
2.25.1



[tip: core/rcu] torture: document --allcpus argument added to the kvm.sh script

2020-10-09 Thread tip-bot2 for Paul Gortmaker
The following commit has been merged into the core/rcu branch of tip:

Commit-ID: fbb9f8531a0d6693189783d295114db4c30624ca
Gitweb:
https://git.kernel.org/tip/fbb9f8531a0d6693189783d295114db4c30624ca
Author:Paul Gortmaker 
AuthorDate:Thu, 02 Jul 2020 15:59:05 -07:00
Committer: Paul E. McKenney 
CommitterDate: Mon, 24 Aug 2020 18:45:31 -07:00

torture: document --allcpus argument added to the kvm.sh script

Signed-off-by: Paul Gortmaker 
Signed-off-by: Paul E. McKenney 
---
 tools/testing/selftests/rcutorture/bin/kvm.sh | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/testing/selftests/rcutorture/bin/kvm.sh 
b/tools/testing/selftests/rcutorture/bin/kvm.sh
index e655983..0a08463 100755
--- a/tools/testing/selftests/rcutorture/bin/kvm.sh
+++ b/tools/testing/selftests/rcutorture/bin/kvm.sh
@@ -46,6 +46,7 @@ jitter="-1"
 
 usage () {
echo "Usage: $scriptname optional arguments:"
+   echo "   --allcpus"
echo "   --bootargs kernel-boot-arguments"
echo "   --bootimage relative-path-to-kernel-boot-image"
echo "   --buildonly"


[tip: sched/core] sched: nohz: stop passing around unused "ticks" parameter.

2020-07-22 Thread tip-bot2 for Paul Gortmaker
The following commit has been merged into the sched/core branch of tip:

Commit-ID: 46132e3ac58cb2ee48051ed80bffc0070ad59b2e
Gitweb:
https://git.kernel.org/tip/46132e3ac58cb2ee48051ed80bffc0070ad59b2e
Author:Paul Gortmaker 
AuthorDate:Wed, 01 Jul 2020 14:34:18 -04:00
Committer: Peter Zijlstra 
CommitterDate: Wed, 22 Jul 2020 10:22:04 +02:00

sched: nohz: stop passing around unused "ticks" parameter.

The "ticks" parameter was added in commit 0f004f5a696a ("sched: Cure more
NO_HZ load average woes") since calc_global_nohz() was called and needed
the "ticks" argument.

But in commit c308b56b5398 ("sched: Fix nohz load accounting -- again!")
it became unused as the function calc_global_nohz() dropped using "ticks".

Fixes: c308b56b5398 ("sched: Fix nohz load accounting -- again!")
Signed-off-by: Paul Gortmaker 
Signed-off-by: Peter Zijlstra (Intel) 
Link: 
https://lkml.kernel.org/r/1593628458-32290-1-git-send-email-paul.gortma...@windriver.com
---
 include/linux/sched/loadavg.h | 2 +-
 kernel/sched/loadavg.c| 2 +-
 kernel/time/timekeeping.c | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/linux/sched/loadavg.h b/include/linux/sched/loadavg.h
index 4859bea..83ec54b 100644
--- a/include/linux/sched/loadavg.h
+++ b/include/linux/sched/loadavg.h
@@ -43,6 +43,6 @@ extern unsigned long calc_load_n(unsigned long load, unsigned 
long exp,
 #define LOAD_INT(x) ((x) >> FSHIFT)
 #define LOAD_FRAC(x) LOAD_INT(((x) & (FIXED_1-1)) * 100)
 
-extern void calc_global_load(unsigned long ticks);
+extern void calc_global_load(void);
 
 #endif /* _LINUX_SCHED_LOADAVG_H */
diff --git a/kernel/sched/loadavg.c b/kernel/sched/loadavg.c
index de22da6..d2a6556 100644
--- a/kernel/sched/loadavg.c
+++ b/kernel/sched/loadavg.c
@@ -347,7 +347,7 @@ static inline void calc_global_nohz(void) { }
  *
  * Called from the global timer code.
  */
-void calc_global_load(unsigned long ticks)
+void calc_global_load(void)
 {
unsigned long sample_window;
long active, delta;
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index d20d489..63a632f 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -2193,7 +2193,7 @@ EXPORT_SYMBOL(ktime_get_coarse_ts64);
 void do_timer(unsigned long ticks)
 {
jiffies_64 += ticks;
-   calc_global_load(ticks);
+   calc_global_load();
 }
 
 /**


Re: 5.8-rc*: kernel BUG at kernel/signal.c:1917

2020-07-20 Thread Paul Gortmaker
[Re: 5.8-rc*: kernel BUG at kernel/signal.c:1917] On 20/07/2020 (Mon 16:21) 
Peter Zijlstra wrote:

> On Mon, Jul 20, 2020 at 04:02:24PM +0200, Oleg Nesterov wrote:
> > I have to admit, I do not understand the usage of prev_state in schedule(),
> > it looks really, really subtle...
> 
> Right, so commit dbfb089d360 solved a problem where schedule() re-read
> prev->state vs prev->on_rq = 0. That is, schedule()'s dequeue and
> ttwu()'s enqueue disagreed over sched_contributes_to_load. and as a
> result load-accounting went wobbly.
> 
> Now, looking at that commit again, I might've solved the problem twice
> :-P

[...]

> That said, in a crossed email, I just proposed we could simplify all
> this like so.. but now I need to go ask people to re-validate that
> loadavg muck again :-/

After a two hour "quick" sanity test I then gave it a full 7h run (which
always seemed to break before dbfb089d360) and I didn't see any stuck
load average with master from today + this change.

Paul.

root@t5610:/home/paul/git/linux-head#
[1]+  Done    nohup 
tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 24 --duration 420 
--configs TREE03 --trust-make > /tmp/kvm.sh.out 2>&1
root@t5610:/home/paul/git/linux-head# cat /proc/version
Linux version 5.8.0-rc6-1-g5714ee50bb43-dirty (paul@t5610) (gcc (Ubuntu 
9.3.0-10ubuntu2) 9.3.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #3 SMP Mon Jul 
20 12:30:33 EDT 2020
root@t5610:/home/paul/git/linux-head# uptime
 00:49:18 up  7:41,  2 users,  load average: 0.01, 0.00, 0.63
root@t5610:/home/paul/git/linux-head# 

--

> 
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index a2a244af9a53..437fc3b241f2 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4193,9 +4193,6 @@ static void __sched notrace __schedule(bool preempt)
>   local_irq_disable();
>   rcu_note_context_switch(preempt);
>  
> - /* See deactivate_task() below. */
> - prev_state = prev->state;
> -
>   /*
>* Make sure that signal_pending_state()->signal_pending() below
>* can't be reordered with __set_current_state(TASK_INTERRUPTIBLE)
> @@ -4223,7 +4220,8 @@ static void __sched notrace __schedule(bool preempt)
>* We must re-load prev->state in case ttwu_remote() changed it
>* before we acquired rq->lock.
>*/
> - if (!preempt && prev_state && prev_state == prev->state) {
> + prev_state = prev->state;
> + if (!preempt && prev_state) {
>   if (signal_pending_state(prev_state, prev)) {
>   prev->state = TASK_RUNNING;
>   } else {
> @@ -4237,10 +4235,12 @@ static void __sched notrace __schedule(bool preempt)
>  
>   /*
>* __schedule() ttwu()
> -  *   prev_state = prev->state;if 
> (READ_ONCE(p->on_rq) && ...)
> -  *   LOCK rq->lock  goto out;
> -  *   smp_mb__after_spinlock();
> smp_acquire__after_ctrl_dep();
> -  *   p->on_rq = 0;p->state = 
> TASK_WAKING;
> +  *   if (prev_state)  if (p->on_rq && ...)
> +  * p->on_rq = 0;goto out;
> +  *
> smp_acquire__after_ctrl_dep();
> +  *p->state = TASK_WAKING
> +  *
> +  * Where __schedule() and ttwu() have matching control 
> dependencies.
>*
>* After this, schedule() must not care about p->state 
> any more.
>*/


Re: weird loadavg on idle machine post 5.7

2020-07-03 Thread Paul Gortmaker
[Re: weird loadavg on idle machine post 5.7] On 02/07/2020 (Thu 17:15) Paul 
Gortmaker wrote:

> [weird loadavg on idle machine post 5.7] On 02/07/2020 (Thu 13:15) Dave Jones 
> wrote:

[...]

> > both implicated this commit:
> > 
> > commit c6e7bd7afaeb3af55ffac122828035f1c01d1d7b (refs/bisect/bad)
> > Author: Peter Zijlstra 
> > Date:   Sun May 24 21:29:55 2020 +0100
> > 
> > sched/core: Optimize ttwu() spinning on p->on_cpu
> 
> I was down to 10 commits roughly above and below this guy before hearing
> you were working the same problem.
> 
> I just got this guy to reveal a false load after a 2h test as well.
> I want to let the one underneath soak overnight just to also confirm it
> is "good" - so that is pending.

As per above, I ran a 12h test overnight on 505b8af5891 and it seems
fine.  Every other "bad" bisect point failed in 7h or less.  So my
testing seems to give the same result as Dave.

Paul.
--

root@t5610:/home/paul/git/linux-head# 
[1]-  Done    nohup
tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 24 --duration 720 
--configs TREE03 --trust-make > /tmp/kvm.sh.out 2>&1
root@t5610:/home/paul/git/linux-head# uptime
 09:10:56 up 13:12,  2 users,  load average: 0.00, 0.00, 0.14
root@t5610:/home/paul/git/linux-head# cat /proc/version 
Linux version 5.7.0-rc6-00029-gd505b8af5891 (paul@t5610) (gcc version 9.3.0 
(Ubuntu 9.3.0-10ubuntu2), GNU ld (GNU Binutils for Ubuntu) 2.34) #2 SMP Thu Jul 
2 18:55:40 EDT 2020


Re: weird loadavg on idle machine post 5.7

2020-07-02 Thread Paul Gortmaker
[weird loadavg on idle machine post 5.7] On 02/07/2020 (Thu 13:15) Dave Jones 
wrote:

> When I upgraded my firewall to 5.7-rc2 I noticed that on a mostly
> idle machine (that usually sees loadavg hover in the 0.xx range)
> that it was consistently above 1.00 even when there was nothing running.
> All that perf showed was the kernel was spending time in the idle loop
> (and running perf).
> 
> For the first hour or so after boot, everything seems fine, but over
> time loadavg creeps up, and once it's established a new baseline, it
> never seems to ever drop below that again.
> 
> One morning I woke up to find loadavg at '7.xx', after almost as many
> hours of uptime, which makes me wonder if perhaps this is triggered
> by something in cron.  I have a bunch of scripts that fire off
> every hour that involve thousands of shortlived runs of iptables/ipset,
> but running them manually didn't seem to automatically trigger the bug.
> 
> Given it took a few hours of runtime to confirm good/bad, bisecting this
> took the last two weeks. I did it four different times, the first

I've seen pretty much the same thing - I was helping paulmck test
rcu-dev for something hopefully unrelated, when I 1st saw it, and
assumed it came in with the sched-core merge and was using one under
that as "good" to attempt bisect.

> producing bogus results from over-eager 'good', but the last two runs

Yeah - it sucks.  I was using Paul's TREE03 rcu-torture for loading and
even after a two hour test I'd still get "false good" results.  Only
after 7h was I quite confident that good was really good.

> both implicated this commit:
> 
> commit c6e7bd7afaeb3af55ffac122828035f1c01d1d7b (refs/bisect/bad)
> Author: Peter Zijlstra 
> Date:   Sun May 24 21:29:55 2020 +0100
> 
> sched/core: Optimize ttwu() spinning on p->on_cpu

I was down to 10 commits roughly above and below this guy before hearing
you were working the same problem.

I just got this guy to reveal a false load after a 2h test as well.
I want to let the one underneath soak overnight just to also confirm it
is "good" - so that is pending.

What I can add, is that it is like we are "leaking" an instance into
calc_load_tasks -- which isn't anything new -- see when tglx fixed it
before in d60585c5766.  Unfortunately we don't have some low overhead leak
checks on that... ?

Anyway, if I "fix" the leak, then everything seems back to normal:

   (gdb) p calc_load_tasks
   $2 = {counter = 1}
   (gdb) set variable calc_load_tasks = { 0 }
   (gdb) p calc_load_tasks
   $4 = {counter = 0}
   (gdb) continue
   Continuing.
   
   [ ... watching decay on resumed target ]
   
10:13:14 up  9:54,  4 users,  load average: 0.92, 0.98, 1.15
10:13:54 up  9:55,  4 users,  load average: 0.47, 0.86, 1.10
10:15:17 up  9:56,  4 users,  load average: 0.12, 0.65, 1.00
10:19:20 up 10:00,  4 users,  load average: 0.00, 0.28, 0.76
10:26:07 up 10:07,  4 users,  load average: 0.00, 0.06, 0.48
10:32:48 up 10:14,  4 users,  load average: 0.00, 0.00, 0.29

Obviously that isn't a fix, but it shows it is an accounting thing.
I've also used gdb to snoop all the cfs->avg fields and they look as
expected for a completely idle machine.  Nothing hiding in avg_rt or
avg_dl either.
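
To make the accounting point concrete, here is a small userspace sketch of
the fixed-point average.  It is an illustration, not a copy of
kernel/sched/loadavg.c: the constants are assumed to match
include/linux/sched/loadavg.h, and simulate() is a made-up helper that
feeds a constant task count for a number of sample windows.

```c
#include <stdio.h>

#define FSHIFT   11                 /* bits of fixed-point precision */
#define FIXED_1  (1UL << FSHIFT)    /* 1.0 in fixed point */
#define EXP_1    1884UL             /* 1 minute decay factor, fixed point */

#define LOAD_INT(x)  ((x) >> FSHIFT)
#define LOAD_FRAC(x) LOAD_INT(((x) & (FIXED_1 - 1)) * 100)

/* One sample of the decaying average; active is already in fixed point. */
unsigned long calc_load(unsigned long load, unsigned long decay,
			unsigned long active)
{
	unsigned long newload = load * decay + active * (FIXED_1 - decay);

	if (active >= load)
		newload += FIXED_1 - 1;	/* round up while the average rises */
	return newload / FIXED_1;
}

/* Hypothetical helper: apply the same task count for "samples" windows. */
unsigned long simulate(unsigned long load, unsigned long tasks, int samples)
{
	while (samples-- > 0)
		load = calc_load(load, EXP_1, tasks * FIXED_1);
	return load;
}
```

A single stuck count is enough to pin the 1 minute figure: simulate(0, 1, n)
climbs to FIXED_1 (shown as 1.00) and stays there for large n, while
simulate(FIXED_1, 0, n) decays back toward zero, as in the uptime samples
above once the counter was cleared.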

> 
> Both Rik and Mel reported seeing ttwu() spend significant time on:
> 
>   smp_cond_load_acquire(&p->on_cpu, !VAL);
> 
> Attempt to avoid this by queueing the wakeup on the CPU that owns the
> p->on_cpu value. This will then allow the ttwu() to complete without
> further waiting.
> 
> Since we run schedule() with interrupts disabled, the IPI is
> guaranteed to happen after p->on_cpu is cleared, this is what makes it
> safe to queue early.
> 
> Signed-off-by: Peter Zijlstra (Intel) 
> Signed-off-by: Mel Gorman 
> Signed-off-by: Ingo Molnar 
> Cc: Jirka Hladky 
> Cc: Vincent Guittot 
> Cc: valentin.schnei...@arm.com
> Cc: Hillf Danton 
> Cc: Rik van Riel 
> Link: 
> https://lore.kernel.org/r/20200524202956.27665-2-mgor...@techsingularity.net
> 
> Unfortunately it doesn't revert cleanly on top of rc3 so I haven't
> confirmed 100% that it's the cause yet, but the two separate bisects
> seem promising.

I've not tried the revert (yet) - but Kyle saw me boring people on
#kernel with the details of bisecting this and gave me the heads-up you
were looking at it too (thanks Kyle!).   So I figured I'd better add
what I'd seen so far.

I'm testing with what is largely a defconfig, plus KVM_INTEL (needed for
paulmck TREE03 rcu-torture), plus I enabled KGDB and DEBUG_INFO after a
while so I could poke and prod - but was reproducing it before that.

For completeness, the test was:

  tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 24 --duration 120 \
--configs TREE03 --trust-make

...on a 24 core 2013 vintage xeon-v2 COTS box.   As above, the 120m
seemed to give between 60-75% confidence on not getting a false good.

Anyway - so 

[PATCH] sched: nohz: drop calc_load_nohz_remote with mismatched ifdefs

2020-07-01 Thread Paul Gortmaker
In commit ebc0f83c78a2 ("timers/nohz: Update NOHZ load in remote tick")
we got calc_load_nohz_remote(rq) as a one-line wrapper function around
calc_load_nohz_fold(rq) and with sched_tick_remote() as the only user.

However, we build the sched_tick_remote only for NO_HZ_FULL but the
wrapper inside a block for NO_HZ_COMMON.  So users that parallel the
defconfig with COMMON=y and FULL=n get an unused stub built in.

  (gdb) p calc_load_nohz_remote
  $227 = {void (struct rq *)} 0x8110a5c0 
  (gdb) p sched_tick_remote
  No symbol "sched_tick_remote" in current context.

Rather than fix the ifdeffery, we note that calc_load_nohz_remote was
most likely introduced to be self-documenting of the "remote" aspect.

And yet, inside sched_tick_remote, it is fairly clear that all the code
there is acting on behalf of the remote cpu and the corresponding
remote rq without them each adding specific remote named wrappers.

Hence it seems to make sense to just dispense with the stub entirely
and simplify the code in the process.

Ensure at the same time, that nohz.h is aware of what a rq is - since
that implicit header dependency was introduced by the same commit.

Fixes: ebc0f83c78a2 ("timers/nohz: Update NOHZ load in remote tick")
Cc: Scott Wood 
Cc: Peter Zijlstra 
Cc: Frederic Weisbecker 
Cc: Ingo Molnar 
Signed-off-by: Paul Gortmaker 
---
 include/linux/sched/nohz.h |  5 +++--
 kernel/sched/core.c|  2 +-
 kernel/sched/loadavg.c | 11 +--
 3 files changed, 5 insertions(+), 13 deletions(-)

diff --git a/include/linux/sched/nohz.h b/include/linux/sched/nohz.h
index 6d67e9a5af6b..58459fdf6758 100644
--- a/include/linux/sched/nohz.h
+++ b/include/linux/sched/nohz.h
@@ -6,6 +6,8 @@
  * This is the interface between the scheduler and nohz/dynticks:
  */
 
+struct rq;
+
 #if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON)
 extern void nohz_balance_enter_idle(int cpu);
 extern int get_nohz_timer_target(void);
@@ -15,11 +17,10 @@ static inline void nohz_balance_enter_idle(int cpu) { }
 
 #ifdef CONFIG_NO_HZ_COMMON
 void calc_load_nohz_start(void);
-void calc_load_nohz_remote(struct rq *rq);
 void calc_load_nohz_stop(void);
+void calc_load_nohz_fold(struct rq *rq);
 #else
 static inline void calc_load_nohz_start(void) { }
-static inline void calc_load_nohz_remote(struct rq *rq) { }
 static inline void calc_load_nohz_stop(void) { }
 #endif /* CONFIG_NO_HZ_COMMON */
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f518af52d0fb..23d3b2282c09 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3763,7 +3763,7 @@ static void sched_tick_remote(struct work_struct *work)
}
curr->sched_class->task_tick(rq, curr, 0);
 
-   calc_load_nohz_remote(rq);
+   calc_load_nohz_fold(rq);
 out_unlock:
rq_unlock_irq(rq, );
 out_requeue:
diff --git a/kernel/sched/loadavg.c b/kernel/sched/loadavg.c
index d2a655643a02..5bab95e1bde4 100644
--- a/kernel/sched/loadavg.c
+++ b/kernel/sched/loadavg.c
@@ -231,7 +231,7 @@ static inline int calc_load_read_idx(void)
return calc_load_idx & 1;
 }
 
-static void calc_load_nohz_fold(struct rq *rq)
+void calc_load_nohz_fold(struct rq *rq)
 {
long delta;
 
@@ -252,15 +252,6 @@ void calc_load_nohz_start(void)
calc_load_nohz_fold(this_rq());
 }
 
-/*
- * Keep track of the load for NOHZ_FULL, must be called between
- * calc_load_nohz_{start,stop}().
- */
-void calc_load_nohz_remote(struct rq *rq)
-{
-   calc_load_nohz_fold(rq);
-}
-
 void calc_load_nohz_stop(void)
 {
struct rq *this_rq = this_rq();
-- 
1.9.1



[PATCH] sched: nohz: stop passing around unused "ticks" parameter.

2020-07-01 Thread Paul Gortmaker
The "ticks" parameter was added in commit 0f004f5a696a ("sched: Cure more
NO_HZ load average woes") since calc_global_nohz() was called and needed
the "ticks" argument.

But in commit c308b56b5398 ("sched: Fix nohz load accounting -- again!")
it became unused as the function calc_global_nohz() dropped using "ticks".

Fixes: c308b56b5398 ("sched: Fix nohz load accounting -- again!")
Cc: Peter Zijlstra 
Cc: Frederic Weisbecker 
Signed-off-by: Paul Gortmaker 
---
 include/linux/sched/loadavg.h | 2 +-
 kernel/sched/loadavg.c| 2 +-
 kernel/time/timekeeping.c | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/linux/sched/loadavg.h b/include/linux/sched/loadavg.h
index 4859bea47a7b..83ec54b65e79 100644
--- a/include/linux/sched/loadavg.h
+++ b/include/linux/sched/loadavg.h
@@ -43,6 +43,6 @@ extern unsigned long calc_load_n(unsigned long load, unsigned 
long exp,
 #define LOAD_INT(x) ((x) >> FSHIFT)
 #define LOAD_FRAC(x) LOAD_INT(((x) & (FIXED_1-1)) * 100)
 
-extern void calc_global_load(unsigned long ticks);
+extern void calc_global_load(void);
 
 #endif /* _LINUX_SCHED_LOADAVG_H */
diff --git a/kernel/sched/loadavg.c b/kernel/sched/loadavg.c
index de22da666ac7..d2a655643a02 100644
--- a/kernel/sched/loadavg.c
+++ b/kernel/sched/loadavg.c
@@ -347,7 +347,7 @@ static inline void calc_global_nohz(void) { }
  *
  * Called from the global timer code.
  */
-void calc_global_load(unsigned long ticks)
+void calc_global_load(void)
 {
unsigned long sample_window;
long active, delta;
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index d20d489841c8..63a632f9896c 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -2193,7 +2193,7 @@ void ktime_get_coarse_ts64(struct timespec64 *ts)
 void do_timer(unsigned long ticks)
 {
jiffies_64 += ticks;
-   calc_global_load(ticks);
+   calc_global_load();
 }
 
 /**
-- 
3.9.1



[PATCH] timers/nohz: fix implicit dependency on "struct rq"

2020-05-14 Thread Paul Gortmaker
Backports to older v5.x kernels revealed a recently introduced
implicit dependency on struct rq that makes the nohz.h header
no longer stand alone.  This is most easily demonstrated as:

   $ echo '#include <linux/sched/nohz.h>' > init/main.c
   $ echo 'void foo(void) {}' >> init/main.c
   $ make init/main.o
 CC  init/main.o
   In file included from init/main.c:1:0:
   ./include/linux/sched/nohz.h:18:35: warning: ‘struct rq’ declared inside 
parameter list [enabled by default]
void calc_load_nohz_remote(struct rq *rq);
  ^
   ./include/linux/sched/nohz.h:18:35: warning: its scope is only this 
definition or declaration, which is probably not what you want [enabled by 
default]
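
As a standalone illustration of why a forward declaration is enough, here
is a minimal sketch.  The names rq_nr_running() and the one-field struct
layout are hypothetical and chosen only for this example; the real struct rq
lives in kernel/sched/sched.h and looks nothing like this.

```c
/* Forward declaration: makes "struct rq" a file-scope (incomplete) type. */
struct rq;

/*
 * Prototype as a header would carry it.  Only a pointer is used, so the
 * incomplete type is sufficient.  Without the line above, "struct rq"
 * would be declared inside the parameter list and be visible only there.
 */
int rq_nr_running(const struct rq *rq);

/* The full definition can come later (made-up layout for this sketch). */
struct rq {
	int nr_running;
};

int rq_nr_running(const struct rq *rq)
{
	return rq->nr_running;
}
```

Dropping the forward declaration from this sketch makes the prototype and
the later definition refer to two different types, which is the situation
the warning above is pointing at.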

Fixes: ebc0f83c78a2 ("timers/nohz: Update NOHZ load in remote tick")
Cc: Peter Zijlstra 
Cc: Frederic Weisbecker 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Signed-off-by: Paul Gortmaker 
---
 include/linux/sched/nohz.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/sched/nohz.h b/include/linux/sched/nohz.h
index 6d67e9a5af6b..67a105b3dd82 100644
--- a/include/linux/sched/nohz.h
+++ b/include/linux/sched/nohz.h
@@ -6,6 +6,8 @@
  * This is the interface between the scheduler and nohz/dynticks:
  */
 
+struct rq;
+
 #if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON)
 extern void nohz_balance_enter_idle(int cpu);
 extern int get_nohz_timer_target(void);
-- 
1.9.1



[tip: perf/core] perf/x86/intel/pt: Drop pointless NULL assignment.

2020-05-01 Thread tip-bot2 for Paul Gortmaker
The following commit has been merged into the perf/core branch of tip:

Commit-ID: 4bd30106ddb26d2304adc5bb7bd269825300440d
Gitweb:
https://git.kernel.org/tip/4bd30106ddb26d2304adc5bb7bd269825300440d
Author:Paul Gortmaker 
AuthorDate:Wed, 08 Apr 2020 19:52:16 -04:00
Committer: Peter Zijlstra 
CommitterDate: Thu, 30 Apr 2020 20:14:36 +02:00

perf/x86/intel/pt: Drop pointless NULL assignment.

Only a few lines below this removed line is this:

  attrs = kzalloc(size, GFP_KERNEL);

and since there is no code path where this could be avoided, the
NULL assignment is a pointless relic of history and can be removed.

Signed-off-by: Paul Gortmaker 
Signed-off-by: Peter Zijlstra (Intel) 
Link: 
https://lkml.kernel.org/r/20200408235216.108980-1-paul.gortma...@windriver.com
---
 arch/x86/events/intel/pt.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
index 1db7a51..e94af4a 100644
--- a/arch/x86/events/intel/pt.c
+++ b/arch/x86/events/intel/pt.c
@@ -226,8 +226,6 @@ static int __init pt_pmu_hw_init(void)
pt_pmu.vmx = true;
}
 
-   attrs = NULL;
-
for (i = 0; i < PT_CPUID_LEAVES; i++) {
cpuid_count(20, i,
&pt_pmu.caps[CPUID_EAX + i*PT_CPUID_REGS_NUM],


Re: Fwd: linux-next: build failure after merge of the kbuild tree

2019-05-06 Thread Paul Gortmaker
[Re: Fwd: linux-next: build failure after merge of the kbuild tree] On 
06/05/2019 (Mon 21:07) Masahiro Yamada wrote:

> Hi Paul,
> 
> 
> On Mon, May 6, 2019 at 12:34 PM Paul Gortmaker
>  wrote:
> >
> > [Fwd: linux-next: build failure after merge of the kbuild tree] On 
> > 06/05/2019 (Mon 11:19) Masahiro Yamada wrote:
> >
> > > Hi Paul,
> > >
> > > In today's linux-next build testing,
> > > more "make ... explicitly non-modular"
> > > candidates showed up.
> > >
> >
> > Hi Masahiro,
> >
> > I am not 100% clear on what you are asking me.  There are lots and lots
> > of these in the kernel many fixed, and many remain unfortunately.
> >
> > > arch/arm/plat-omap/dma.c
> > > drivers/clocksource/timer-ti-dm.c
> > > drivers/mfd/omap-usb-host.c
> > > drivers/mfd/omap-usb-tll.c
> >
> > None of these are "new".  I just checked, and I have had patches for all
> > these for a long time, in my personal queue, found by my audits.
> 
> 
> OK, I saw many patches from you
> addressing this issue,
> so I just thought you might be motivated to
> fix them.
> 
> Anyway, I have a reason to fix them
> because a patch in my tree is causing build errors.

I understand now.  I missed the connection between these drivers and the
Kbuild change when I read this last night.  Sorry about that.

I can send the changes to those four files, but since I can't guarantee
they will be merged quickly (or at all!) - that will leave the commit in
the Kbuild tree causing build regressions for days or likely even weeks.

> So, I will do something for them
> if you do not have a plan to send patches soon.

I will be happy to send them, but we just opened the two week merge
window, and a lot of maintainers don't like getting sent new patches
until the two week merge window has closed - so we should avoid that.

I'm not sure how you would like to proceed - one way would be that we
get the drivers above changed in 5.2 and you delay your kbuild change
until we start v5.3 - to that end I'd be happy to add the Kbuild change
to my internal build testing in the meantime, if you would like.

Now that I understand the problem, let me know what you would like to
do, and I'll do what I can to help out.

Thanks,
Paul.


Re: Fwd: linux-next: build failure after merge of the kbuild tree

2019-05-05 Thread Paul Gortmaker
[Fwd: linux-next: build failure after merge of the kbuild tree] On 06/05/2019 
(Mon 11:19) Masahiro Yamada wrote:

> Hi Paul,
> 
> In today's linux-next build testing,
> more "make ... explicitly non-modular"
> candidates showed up.
> 

Hi Masahiro,

I am not 100% clear on what you are asking me.  There are lots and lots
of these in the kernel; many are fixed, and many remain, unfortunately.

> arch/arm/plat-omap/dma.c
> drivers/clocksource/timer-ti-dm.c
> drivers/mfd/omap-usb-host.c
> drivers/mfd/omap-usb-tll.c

None of these are "new".  I just checked, and I have had patches for all
these for a long time, in my personal queue, found by my audits.

> Would you send patches?

It isn't that simple.  I wish it was.  Some subsystem maintainers are
glad to take the patches, and some think they are a waste of time and
reject them immediately.  Some I've sent just simply get "crickets".

What that means is, that I need to look at each maintainer's
requirements, and ensure the patch and commit log are matching
expectations - I will not just spam out hundreds of patches across all
subsystems.  Anyone who has spent considerable time in linux development
knows that is a recipe for failure.

So I need to work across each subsystem - one at a time, with their
individual maintainer requirements in mind, and if you look at git
history, you will see that has been what I've tried to do when I had
free time to work on fixing these across the whole linux tree.

But fortunately, none of these represent a CVE/security issue, so I've
never had a reason to try and pretend there was any reason for an
immediate fix/merge - they just represent a better attention to detail
in the code we merge and support that I'd like to see happen tree-wide.

I appreciate that you are also interested in seeing all these fixed, and
I also wish they could all be solved in one version, but unfortunately I
don't think that is pragmatic.  So in the meantime, I will continue to
chip away at things when we are early in the dev cycle and not starting
the two week merge window, as we are just now.

Thanks,
Paul.
--

> 
> I think EXPORT_SYMBOL_GPL() in omap-usb-tll.c
> are also unnecessary.
> 
> Thanks.
> 
> 
> 
> -- Forwarded message -
> From: Stephen Rothwell 
> Date: Mon, May 6, 2019 at 8:51 AM
> Subject: linux-next: build failure after merge of the kbuild tree
> To: Masahiro Yamada 
> Cc: Linux Next Mailing List , Linux Kernel
> Mailing List , Alexey Gladkov
> , Keshava Munegowda ,
> Samuel Ortiz 
> 
> 
> Hi Masahiro,
> 
> After merging the kbuild tree, today's linux-next build (arm
> multi_v7_defconfig) failed like this:
> 
> In file included from include/linux/module.h:18,
>  from drivers/mfd/omap-usb-tll.c:21:
> drivers/mfd/omap-usb-tll.c:462:26: error: expected ',' or ';' before
> 'USBHS_DRIVER_NAME'
>  MODULE_ALIAS("platform:" USBHS_DRIVER_NAME);
>   ^
> include/linux/moduleparam.h:26:47: note: in definition of macro 
> '__MODULE_INFO'
>= __MODULE_INFO_PREFIX __stringify(tag) "=" info
>^~~~
> include/linux/module.h:164:30: note: in expansion of macro 'MODULE_INFO'
>  #define MODULE_ALIAS(_alias) MODULE_INFO(alias, _alias)
>   ^~~
> drivers/mfd/omap-usb-tll.c:462:1: note: in expansion of macro 'MODULE_ALIAS'
>  MODULE_ALIAS("platform:" USBHS_DRIVER_NAME);
>  ^~~~
> 
> Caused by commit
> 
>   6a26793a7891 ("moduleparam: Save information about built-in modules
> in separate file")
> 
> USBHS_DRIVER_NAME is not defined and this kbuild tree change has
> exposed it. It has been this way since commit
> 
>   16fa3dc75c22 ("mfd: omap-usb-tll: HOST TLL platform driver")
> 
> From v3.7-rc1 in 2012.
> 
> I have applied the following patch for today.
> 
> From: Stephen Rothwell 
> Date: Mon, 6 May 2019 09:39:14 +1000
> Subject: [PATCH] mfd: omap: remove unused MODULE_ALIAS from omap-usb-tll.c
> 
> USBHS_DRIVER_NAME has never been defined, so this cannot have ever
> been used.
> 
> Signed-off-by: Stephen Rothwell 
> ---
>  drivers/mfd/omap-usb-tll.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/mfd/omap-usb-tll.c b/drivers/mfd/omap-usb-tll.c
> index 446713dbee27..1cc8937e8bec 100644
> --- a/drivers/mfd/omap-usb-tll.c
> +++ b/drivers/mfd/omap-usb-tll.c
> @@ -459,7 +459,7 @@ EXPORT_SYMBOL_GPL(omap_tll_disable);
> 
>  MODULE_AUTHOR("Keshava Munegowda ");
>  MODULE_AUTHOR("Roger Quadros ");
> -MODULE_ALIAS("platform:" USBHS_DRIVER_NAME);
> +// MODULE_ALIAS("platform:" USBHS_DRIVER_NAME);
>  MODULE_LICENSE("GPL v2");
>  MODULE_DESCRIPTION("usb tll driver for TI OMAP EHCI and OHCI controllers");
> 
> --
> 2.20.1
> 
> --
> Cheers,
> Stephen Rothwell
> 
> 
> -- 
> Best Regards
> Masahiro Yamada
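
One plausible reading of how the undefined USBHS_DRIVER_NAME could sit
unnoticed since 2012: as long as a macro throws its argument away, the
compiler proper never sees the tokens inside it, so an undefined name
there is harmless until some change starts using the argument.  The
userspace sketch below only illustrates that effect.  DEMO_ALIAS_DISCARDED,
DEMO_ALIAS_KEPT, DEMO_DRIVER_NAME and DEMO_UNDEFINED_NAME are hypothetical
names made up for the demo; they are not the kernel's macros.

```c
#include <string.h>

#define DEMO_DRIVER_NAME "usbhs_demo"   /* hypothetical driver name */

/*
 * Variant that discards its argument: it expands to a plain forward
 * declaration, so the argument tokens never reach the compiler proper.
 */
#define DEMO_ALIAS_DISCARDED(_alias) struct demo_alias_unused

/* Variant that keeps its argument: the tokens must now form valid C. */
#define DEMO_ALIAS_KEPT(_alias) \
        static const char demo_alias[] = "alias=" _alias

/* Builds even though DEMO_UNDEFINED_NAME is not defined anywhere. */
DEMO_ALIAS_DISCARDED("platform:" DEMO_UNDEFINED_NAME);

/* Would fail to build with an undefined name; fine with a defined one. */
DEMO_ALIAS_KEPT("platform:" DEMO_DRIVER_NAME);

/* Length of the stored alias, showing the kept variant emits real data. */
static size_t demo_alias_len(void)
{
        return strlen(demo_alias);
}
```

Swapping DEMO_DRIVER_NAME for DEMO_UNDEFINED_NAME in the DEMO_ALIAS_KEPT
line should produce the same kind of "expected ',' or ';'" error as in
the build log above.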




[PATCH] sound: soc-acpi: fix implicit header use of module.h/export.h

2019-04-13 Thread Paul Gortmaker
This file is implicitly relying on an instance of including
module.h from <linux/acpi.h>.

Ideally, header files under include/linux shouldn't be adding
includes of other headers, in anticipation of their consumers,
but just the headers needed for the header itself to pass
parsing with CPP.

The module.h is particularly bad in this sense, as it itself does
include a whole bunch of other headers, due to the complexity of
module support.

Here, we make the include explicit, in order to allow the future
removal of module.h from linux/acpi.h without causing build breakage.

Cc: Liam Girdwood 
Cc: Mark Brown 
Cc: Jaroslav Kysela 
Cc: Takashi Iwai 
Signed-off-by: Paul Gortmaker 

diff --git a/sound/soc/soc-acpi.c b/sound/soc/soc-acpi.c
index 4fb29f0e561e..444ce0602f76 100644
--- a/sound/soc/soc-acpi.c
+++ b/sound/soc/soc-acpi.c
@@ -4,6 +4,8 @@
 //
 // Copyright (c) 2013-15, Intel Corporation.
 
+#include 
+#include 
 #include 
 
 struct snd_soc_acpi_mach *
-- 
2.11.0



[PATCH] soundwire: intel: fix implicit header use of module.h/export.h

2019-04-13 Thread Paul Gortmaker
These two files are implicitly relying on an instance of including
module.h from <linux/acpi.h>.

Ideally, header files under include/linux shouldn't be adding
includes of other headers, in anticipation of their consumers,
but just the headers needed for the header itself to pass
parsing with CPP.

The module.h is particularly bad in this sense, as it itself does
include a whole bunch of other headers, due to the complexity of
module support.

Here, we make those includes explicit, in order to allow a future
removal of module.h from linux/acpi.h without causing build breakage.

Cc: Vinod Koul 
Cc: Sanyog Kale 
Cc: Pierre-Louis Bossart 
Signed-off-by: Paul Gortmaker 

diff --git a/drivers/soundwire/intel.c b/drivers/soundwire/intel.c
index fd8d034cfec1..4a4a883f29f6 100644
--- a/drivers/soundwire/intel.c
+++ b/drivers/soundwire/intel.c
@@ -7,6 +7,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
diff --git a/drivers/soundwire/intel_init.c b/drivers/soundwire/intel_init.c
index 5c8a20d99878..e0f2903101c7 100644
--- a/drivers/soundwire/intel_init.c
+++ b/drivers/soundwire/intel_init.c
@@ -8,6 +8,8 @@
  */
 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include "intel.h"
-- 
2.11.0



[PATCH] printk: tie printk_once / printk_deferred_once into .data.once for reset

2019-04-12 Thread Paul Gortmaker
In commit b1fca27d384e ("kernel debug: support resetting WARN*_ONCE")
we got the opportunity to reset state on the one shot messages,
without having to reboot.

However, printk_once and printk_deferred_once live in a different file
and didn't get the same kind of update/conversion, so they remain
unconditionally one shot, until the system is rebooted.

For example, we currently have:

  sched/rt.c: printk_deferred_once("sched: RT throttling activated\n");

...which could reasonably be tripped as someone is testing and tuning
a new system/workload and their task placements.  For consistency, and
to avoid reboots in the same vein as the original commit, we make these
two _once variants resettable in the same way as the WARN*_ONCE ones.

Cc: Andi Kleen 
Cc: Petr Mladek 
Cc: Sergey Senozhatsky 
Cc: Steven Rostedt 
Cc: Andrew Morton 
Signed-off-by: Paul Gortmaker 
---
 Documentation/clearing-warn-once.txt | 2 +-
 include/linux/printk.h   | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/Documentation/clearing-warn-once.txt 
b/Documentation/clearing-warn-once.txt
index 5b1f5d547be1..c68598b31428 100644
--- a/Documentation/clearing-warn-once.txt
+++ b/Documentation/clearing-warn-once.txt
@@ -1,5 +1,5 @@
 
-WARN_ONCE / WARN_ON_ONCE only print a warning once.
+WARN_ONCE / WARN_ON_ONCE / printk_once only emit a message once.
 
 echo 1 > /sys/kernel/debug/clear_warn_once
 
diff --git a/include/linux/printk.h b/include/linux/printk.h
index d7c77ed1a4cb..84ea4d094af3 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -347,7 +347,7 @@ extern int kptr_restrict;
 #ifdef CONFIG_PRINTK
 #define printk_once(fmt, ...)  \
 ({ \
-   static bool __print_once __read_mostly; \
+   static bool __section(.data.once) __print_once; \
bool __ret_print_once = !__print_once;  \
\
if (!__print_once) {\
@@ -358,7 +358,7 @@ extern int kptr_restrict;
 })
 #define printk_deferred_once(fmt, ...) \
 ({ \
-   static bool __print_once __read_mostly; \
+   static bool __section(.data.once) __print_once; \
bool __ret_print_once = !__print_once;  \
\
if (!__print_once) {\
-- 
2.7.4
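
The idea in the patch above, putting every "once" flag into one section
so they can all be cleared together, can be sketched in userspace.  This
is a rough illustration under stated assumptions, not the kernel code:
the section name "demo_once", the demo_* identifiers and the volatile
qualifier are all invented for the example, and it assumes GCC or clang
with GNU ld on an ELF target, where the linker provides __start_/__stop_
symbols for sections whose names are valid C identifiers.

```c
#include <stdbool.h>
#include <stdio.h>

/* Bounds of the "demo_once" section; provided by the linker (assumed ELF). */
extern char __start_demo_once[];
extern char __stop_demo_once[];

/*
 * Print msg only the first time this call site is reached.
 * The flag lives in "demo_once" so all such flags can be cleared at once.
 * volatile keeps this single-file demo from caching the flag over a reset.
 */
#define demo_print_once(msg)                                            \
({                                                                      \
        static volatile bool __attribute__((section("demo_once")))     \
                demo_done;                                              \
        bool demo_ret = !demo_done;                                     \
                                                                        \
        if (!demo_done) {                                               \
                demo_done = true;                                       \
                puts(msg);                                              \
        }                                                               \
        demo_ret;                                                       \
})

/* Zero every flag in the section, re-arming all call sites. */
static void demo_clear_once(void)
{
        for (volatile char *p = __start_demo_once; p < __stop_demo_once; p++)
                *p = 0;
}

/* A single call site; returns true when the message was really printed. */
static bool demo_report_throttle(void)
{
        return demo_print_once("sched: RT throttling activated");
}
```

Calling demo_report_throttle() twice prints once; after demo_clear_once()
the next call prints again, which is the behaviour the patch is after for
printk_once and printk_deferred_once.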



Re: [PATCH v5 00/18] mfd: demodularization of non-modular drivers

2019-03-06 Thread Paul Gortmaker
[Re: [PATCH v5 00/18] mfd: demodularization of non-modular drivers] On 
07/03/2019 (Thu 00:10) Pavel Machek wrote:

> On Wed 2019-01-16 13:24:31, Lee Jones wrote:
> > [...]
> > 
> > > Paul Gortmaker (18):
> > >   mfd: aat2870-core: Make it explicitly non-modular
> > >   mfd: adp5520: Make it explicitly non-modular
> > >   mfd: as3711: Make it explicitly non-modular
> > >   mfd: db8500-prcmu: drop unused MODULE_ tags from non-modular code
> > >   mfd: htc-i2cpld: Make it explicitly non-modular
> > >   mfd: max8925-core: drop unused MODULE_ tags from non-modular code
> > >   mfd: rc5t583: Make it explicitly non-modular
> > >   mfd: sta2x11: drop unused MODULE_ tags from non-modular code
> > >   mfd: syscon: Make it explicitly non-modular
> > >   mfd: tps65090: Make it explicitly non-modular
> > >   mfd: tps65910: Make it explicitly non-modular
> > >   mfd: tps80031: Make it explicitly non-modular
> > >   mfd: wm831x-spi: Make it explicitly non-modular
> > >   mfd: wm831x-i2c: Make it explicitly non-modular
> > >   mfd: wm831x-core: drop unused module infrastructure from non-modular 
> > > code
> > >   mfd: wm8350-i2c: Make it explicitly non-modular
> > >   mfd: wm8350-core: drop unused module infrastructure from non-modular 
> > > code
> > >   mfd: wm8400-core: Make it explicitly non-modular
> > > 
> > >  drivers/mfd/aat2870-core.c  | 40 
> > > +++-
> > >  drivers/mfd/adp5520.c   | 30 +++---
> > >  drivers/mfd/as3711.c| 14 --
> > >  drivers/mfd/db8500-prcmu.c  | 10 --
> > >  drivers/mfd/htc-i2cpld.c| 18 +-
> > >  20 files changed, 41 insertions(+), 332 deletions(-)
> > 
> > All applied!
> 
> Is it good idea?

Pavel, I think yes it is good, and I hope you will allow me the chance
to convince you of the same.  It removes dead code, and removes the
chance that people mistakenly believe any of these drivers were
currently possible as modules, when they were really NOT at all modular.

> We want distro kernels on ARM, too, which means people will eventually
> want these as a modules, no?

And at the risk of repeating something I've said a lot already, this
is fine, and I 100% support people converting drivers to being modular,
in the case where there is demand, and where people with the hardware
are willing to test that the modular use-case actually works.

If people want it to be modular, then this work actually helps.  You
don't have drivers "hiding in the shadows" that pretend to be modules.
Such drivers do not at all help with the "distro kernels" use case.

If a driver author responds and says they intended to make their driver
a module, I 100% support them, and will drop the code removal patch; I
have also supported authors in making their work tristate.  If the choice
to convert to tristate happens a year or more from now, it is trivial to
reclaim the unused-but-deleted code from git.

But, again as I have said many times -- I can't know every detail of
each driver to know if module/tristate makes any sense as a use-case, or
is even technically possible (such as in DMA/IOMMU or similar low level
core systems).   So the right option is to remove the dead code and not
impact the existing driver behaviour, and make it clear in the process
to the authors and users, that the driver was never modular to begin
with --  and in doing so, give them all a chance to comment and react.

Pavel, I hope this more extended explanation makes sense to you, and
that you simply have not seen me write these same details in the past.

Thanks,
Paul.
--

>   Pavel
> -- 
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) 
> http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html




[PATCH] mfd: tps68470: drop unused MODULE_DEVICE_TABLE

2019-02-06 Thread Paul Gortmaker
The Kconfig currently controlling compilation of this code is:

drivers/mfd/Kconfig:config MFD_TPS68470
drivers/mfd/Kconfig:bool "TI TPS68470 Power Management / LED chips"

...meaning that it currently is not being built as a module by anyone.

Hence we remove the MODULE_DEVICE_TABLE since it is a no-op for
non-modular code.

There is no removal of <linux/module.h> here because there isn't one.
Instead, it is relying on including that implicitly from an ACPI header.

When cleaning up the ACPI instance of module.h (which also isn't strictly
needed), this mfd driver breaks because MODULE_DEVICE_TABLE becomes
undefined here.

The easiest dependency solution is to simply defer the ACPI cleanup
until this change is present in mainline.

Cc: Lee Jones 
Signed-off-by: Paul Gortmaker 

diff --git a/drivers/mfd/tps68470.c b/drivers/mfd/tps68470.c
index a5981a79b29a..4a4df4ffd18c 100644
--- a/drivers/mfd/tps68470.c
+++ b/drivers/mfd/tps68470.c
@@ -86,7 +86,6 @@ static const struct acpi_device_id tps68470_acpi_ids[] = {
{"INT3472"},
{},
 };
-MODULE_DEVICE_TABLE(acpi, tps68470_acpi_ids);
 
 static struct i2c_driver tps68470_driver = {
.driver = {
-- 
2.7.4
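
Why dropping the table macro is safe for built-in code can be sketched
as follows.  This is a simplified stand-in and not the kernel's
definition: DEMO_MODULE, DEMO_DEVICE_TABLE, struct demo_acpi_id and
demo_count_ids() are hypothetical names, and the "modular" branch is just
an extra reference to the table.  The point is only that in the built-in
case the macro contributes nothing, while the table itself stays usable.

```c
#include <stddef.h>

struct demo_acpi_id {
        const char *id;         /* NULL marks the end of the table */
};

#ifdef DEMO_MODULE
/* Modular build: keep an extra reference that tooling could look up. */
#define DEMO_DEVICE_TABLE(type, name) \
        static const void *const demo_##type##_table \
                __attribute__((used)) = (name)
#else
/* Built-in build: nothing useful to emit, so expand to a no-op. */
#define DEMO_DEVICE_TABLE(type, name) struct demo_device_table_unused
#endif

static const struct demo_acpi_id demo_acpi_ids[] = {
        { "INT3472" },
        { NULL },
};
DEMO_DEVICE_TABLE(acpi, demo_acpi_ids);

/* The driver itself only ever walks the table directly. */
static size_t demo_count_ids(const struct demo_acpi_id *ids)
{
        size_t n = 0;

        while (ids[n].id)
                n++;
        return n;
}
```

Without DEMO_MODULE defined, deleting the DEMO_DEVICE_TABLE line changes
nothing about how demo_count_ids() sees the table.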


