Re: [PATCH v4 05/10] sched/topology: Define and assign sched_domain flag metadata

2020-08-07 Thread Valentin Schneider


On 06/08/20 17:18, Valentin Schneider wrote:
> In the grand scheme of things I'd actually like to have this file output
> the names of the flags rather than their values (since I now save them when
> SCHED_DEBUG=y), but I didn't find a simple way to hack the existing SD ctl
> table (sd_alloc_ctl_domain_table() and co) into doing this.
>

I "just" had to spend some more time grokking how the whole ctl
proc_handler thing is supposed to work; I now have a mostly working
solution, i.e. on my Juno:

  $ cat /proc/sys/kernel/sched_domain/cpu0/domain*/flags
  SD_BALANCE_NEWIDLE SD_BALANCE_EXEC SD_BALANCE_FORK SD_WAKE_AFFINE 
SD_SHARE_PKG_RESOURCES
  SD_BALANCE_NEWIDLE SD_BALANCE_EXEC SD_BALANCE_FORK SD_WAKE_AFFINE 
SD_ASYM_CPUCAPACITY SD_PREFER_SIBLING

I'll clean that up and go for the automagic ordering I previously
described.

>
> Now as to making this fully automagic, I *think* I could do something like
> having a first enum to set up an ordering:
>
>   #define SD_FLAG(name, ...) __##name,
>   enum {
> #include 
>   };
>
> A second one to have powers of 2:
>
>   #define SD_FLAG(name, ...) name = 1 << __##name,
>   enum {
> #include 
>   };
>
> And finally the metadata array assignment might be doable with:
>
>   #define SD_FLAG(_name, mflags) [__##_name] = { .meta_flags = mflags, .name 
> = #_name },
>
> Or, if there is a way somehow to directly get powers of 2 out of an enum:
>
>   #define SD_FLAG(_name, mflags) [_ffs(_name)] = { .meta_flags = mflags, 
> .name = #_name },
>
>>> +#ifdef CONFIG_SCHED_DEBUG
>>> +#define SD_FLAG(_name, idx, mflags) [idx] = {.meta_flags = mflags, .name = 
>>> #_name},
>>
>> s/{./{ .
>> s/e}/e }
>>
>> Thanks,
>>
>>   Ingo


Re: [PATCH v4 05/10] sched/topology: Define and assign sched_domain flag metadata

2020-08-06 Thread Ingo Molnar


* Valentin Schneider  wrote:

> +#ifndef SD_FLAG
> +#define SD_FLAG(x, y, z)
> +#endif

AFAICS there's not a single use of sd_flags.h that doesn't come with 
its own SD_FLAG definition, so I suppose this should be:

#ifndef SD_FLAG
# error "Should not happen."
#endif

?

Also, some nits:

> +/*
> + * Expected flag uses
> + *
> + * SHARED_CHILD: These flags are meant to be set from the base domain 
> upwards.
> + * If a domain has this flag set, all of its children should have it set. 
> This
> + * is usually because the flag describes some shared resource (all CPUs in 
> that
> + * domain share the same foobar), or because they are tied to a scheduling
> + * behaviour that we want to disable at some point in the hierarchy for
> + * scalability reasons.

s/foobar/resource

?

> +/*
> + * cross-node balancing
> + *
> + * SHARED_PARENT: Set for all NUMA levels above NODE.
> + */
> +SD_FLAG(SD_NUMA,12, SDF_SHARED_PARENT)

s/cross-node/Cross-node

BTW., is there any particular reason why these need to be defines with 
a manual enumeration of flag values - couldn't we generate 
auto-enumerated C enums instead or so?

> +#ifdef CONFIG_SCHED_DEBUG
> +#define SD_FLAG(_name, idx, mflags) [idx] = {.meta_flags = mflags, .name = 
> #_name},

s/{./{ .
s/e}/e }

Thanks,

Ingo


Re: [PATCH v4 05/10] sched/topology: Define and assign sched_domain flag metadata

2020-08-06 Thread Valentin Schneider


On 06/08/20 15:07, Ingo Molnar wrote:
> * Valentin Schneider  wrote:
>
>> +#ifndef SD_FLAG
>> +#define SD_FLAG(x, y, z)
>> +#endif
>
> AFAICS there's not a single use of sd_flags.h that doesn't come with
> its own SD_FLAG definition, so I suppose this should be:
>
> #ifndef SD_FLAG
> # error "Should not happen."
> #endif
>
> ?
>

I can give this a try; for the context, I copied uapi/asm-generic/unistd.h
without thinking too much, and that does:

  #ifndef __SYSCALL
  #define __SYSCALL(x, y)
  #endif

> Also, some nits:
>
>> +/*
>> + * Expected flag uses
>> + *
>> + * SHARED_CHILD: These flags are meant to be set from the base domain 
>> upwards.
>> + * If a domain has this flag set, all of its children should have it set. 
>> This
>> + * is usually because the flag describes some shared resource (all CPUs in 
>> that
>> + * domain share the same foobar), or because they are tied to a scheduling
>> + * behaviour that we want to disable at some point in the hierarchy for
>> + * scalability reasons.
>
> s/foobar/resource
>
> ?
>

That's better indeed, I think that's a remnant of when I was listing a lot
of things that could be shared.

>> +/*
>> + * cross-node balancing
>> + *
>> + * SHARED_PARENT: Set for all NUMA levels above NODE.
>> + */
>> +SD_FLAG(SD_NUMA,12, SDF_SHARED_PARENT)
>
> s/cross-node/Cross-node
>
> BTW., is there any particular reason why these need to be defines with
> a manual enumeration of flag values - couldn't we generate
> auto-enumerated C enums instead or so?
>

I remember exploring a few different options there, but it's been a while
already. I think one of the reasons I kept some form of explicit assignment
is to avoid making reading

  /proc/sys/kernel/sched_domain/cpu*/domain*/flags

more convoluted than it is now (i.e. you have to go fetch the
bit -> name translation in the source code).

In the grand scheme of things I'd actually like to have this file output
the names of the flags rather than their values (since I now save them when
SCHED_DEBUG=y), but I didn't find a simple way to hack the existing SD ctl
table (sd_alloc_ctl_domain_table() and co) into doing this.


Now as to making this fully automagic, I *think* I could do something like
having a first enum to set up an ordering:

  #define SD_FLAG(name, ...) __##name,
  enum {
#include 
  };

A second one to have powers of 2:

  #define SD_FLAG(name, ...) name = 1 << __##name,
  enum {
#include 
  };

And finally the metadata array assignment might be doable with:

  #define SD_FLAG(_name, mflags) [__##_name] = { .meta_flags = mflags, .name = 
#_name },

Or, if there is a way somehow to directly get powers of 2 out of an enum:

  #define SD_FLAG(_name, mflags) [_ffs(_name)] = { .meta_flags = mflags, .name 
= #_name },

>> +#ifdef CONFIG_SCHED_DEBUG
>> +#define SD_FLAG(_name, idx, mflags) [idx] = {.meta_flags = mflags, .name = 
>> #_name},
>
> s/{./{ .
> s/e}/e }
>
> Thanks,
>
>   Ingo


Re: [PATCH v4 05/10] sched/topology: Define and assign sched_domain flag metadata

2020-08-04 Thread peterz
On Fri, Jul 31, 2020 at 12:54:57PM +0100, Valentin Schneider wrote:
> +/*
> + * Domain members share CPU capacity (i.e. SMT)
> + *
> + * SHARED_CHILD: Set from the base domain up until spanned CPUs no longer 
> share
> + * CPU capacity.
> + */
> +SD_FLAG(SD_SHARE_CPUCAPACITY,   6, SDF_SHARED_CHILD)
> +
> +/*
> + * Domain members share CPU package resources (i.e. caches)
> + *
> + * SHARED_CHILD: Set from the base domain up until spanned CPUs no longer 
> share
> + * the same cache(s).
> + */
> +SD_FLAG(SD_SHARE_PKG_RESOURCES, 7, SDF_SHARED_CHILD)

We should probably rename these to SD_SHARE_CORE / SD_SHARE_CACHE or
something, but let me first go through this series (and hopefully apply)
before we go make more changes.


.. onwards!


Re: [PATCH v4 05/10] sched/topology: Define and assign sched_domain flag metadata

2020-08-04 Thread Valentin Schneider


On 04/08/20 12:08, pet...@infradead.org wrote:
> On Fri, Jul 31, 2020 at 12:54:57PM +0100, Valentin Schneider wrote:
>> +/*
>> + * Domain members share CPU capacity (i.e. SMT)
>> + *
>> + * SHARED_CHILD: Set from the base domain up until spanned CPUs no longer 
>> share
>> + * CPU capacity.
>> + */
>> +SD_FLAG(SD_SHARE_CPUCAPACITY,   6, SDF_SHARED_CHILD)
>> +
>> +/*
>> + * Domain members share CPU package resources (i.e. caches)
>> + *
>> + * SHARED_CHILD: Set from the base domain up until spanned CPUs no longer 
>> share
>> + * the same cache(s).
>> + */
>> +SD_FLAG(SD_SHARE_PKG_RESOURCES, 7, SDF_SHARED_CHILD)
>
> We should probably rename these to SD_SHARE_CORE / SD_SHARE_CACHE or
> something,

SD_SHARE_CACHE sounds good, given how it's used (LLC stuff).

> but let me first go through this series (and hopefully apply)
> before we go make more changes.
>
>
> .. onwards!

Thanks!


[PATCH v4 05/10] sched/topology: Define and assign sched_domain flag metadata

2020-07-31 Thread Valentin Schneider
There are some expectations regarding how sched domain flags should be laid
out, but none of them are checked or asserted in
sched_domain_debug_one(). After staring at said flags for a while, I've
come to realize they all (except *one*) fall in either of two categories:

- Shared with children: those flags are set from the base CPU domain
  upwards. Any domain that has it set will have it set in its children. It
  hints at "some property holds true / some behaviour is enabled until this
  level".

- Shared with parents: those flags are set from the topmost domain
  downwards. Any domain that has it set will have it set in its parents. It
  hints at "some property isn't visible / some behaviour is disabled until
  this level".

The odd one out is SD_PREFER_SIBLING, which is cleared below levels with
SD_ASYM_CPUCAPACITY. The change was introduced by commit

  9c63e84db29b ("sched/core: Disable SD_PREFER_SIBLING on asymmetric CPU 
capacity domains")

as it could break misfit migration on some systems. In light of this, we
might want to change it back to make it fit one of the two categories and
fix the issue another way.

Tweak the sched_domain flag declaration to assign each flag an expected
layout, and include the rationale for each flag "meta type" assignment as a
comment. Consolidate the flag metadata into an array; the index of a flag's
metadata can easily be found with log2(flag), IOW __ffs(flag).

Signed-off-by: Valentin Schneider 
---
 include/linux/sched/sd_flags.h | 154 +++--
 include/linux/sched/topology.h |  13 ++-
 2 files changed, 140 insertions(+), 27 deletions(-)

diff --git a/include/linux/sched/sd_flags.h b/include/linux/sched/sd_flags.h
index c313291fe8bd..9d0fb8924181 100644
--- a/include/linux/sched/sd_flags.h
+++ b/include/linux/sched/sd_flags.h
@@ -3,29 +3,131 @@
  * sched-domains (multiprocessor balancing) flag declarations.
  */
 
-/* Balance when about to become idle */
-SD_FLAG(SD_BALANCE_NEWIDLE, 0)
-/* Balance on exec */
-SD_FLAG(SD_BALANCE_EXEC,1)
-/* Balance on fork, clone */
-SD_FLAG(SD_BALANCE_FORK,2)
-/* Balance on wakeup */
-SD_FLAG(SD_BALANCE_WAKE,3)
-/* Wake task to waking CPU */
-SD_FLAG(SD_WAKE_AFFINE, 4)
-/* Domain members have different CPU capacities */
-SD_FLAG(SD_ASYM_CPUCAPACITY,5)
-/* Domain members share CPU capacity */
-SD_FLAG(SD_SHARE_CPUCAPACITY,   6)
-/* Domain members share CPU pkg resources */
-SD_FLAG(SD_SHARE_PKG_RESOURCES, 7)
-/* Only a single load balancing instance */
-SD_FLAG(SD_SERIALIZE,   8)
-/* Place busy groups earlier in the domain */
-SD_FLAG(SD_ASYM_PACKING,9)
-/* Prefer to place tasks in a sibling domain */
-SD_FLAG(SD_PREFER_SIBLING,  10)
-/* sched_domains of this level overlap */
-SD_FLAG(SD_OVERLAP, 11)
-/* cross-node balancing */
-SD_FLAG(SD_NUMA,12)
+#ifndef SD_FLAG
+#define SD_FLAG(x, y, z)
+#endif
+
+/*
+ * Expected flag uses
+ *
+ * SHARED_CHILD: These flags are meant to be set from the base domain upwards.
+ * If a domain has this flag set, all of its children should have it set. This
+ * is usually because the flag describes some shared resource (all CPUs in that
+ * domain share the same foobar), or because they are tied to a scheduling
+ * behaviour that we want to disable at some point in the hierarchy for
+ * scalability reasons.
+ *
+ * In those cases it doesn't make sense to have the flag set for a domain but
+ * not have it in (some of) its children: sched domains ALWAYS span their child
+ * domains, so operations done with parent domains will cover CPUs in the lower
+ * child domains.
+ *
+ *
+ * SHARED_PARENT: These flags are meant to be set from the highest domain
+ * downwards. If a domain has this flag set, all of its parents should have it
+ * set. This is usually for topology properties that start to appear above a
+ * certain level (e.g. domain starts spanning CPUs outside of the base CPU's
+ * socket).
+ */
+#define SDF_SHARED_CHILD 0x1
+#define SDF_SHARED_PARENT 0x2
+
+/*
+ * Balance when about to become idle
+ *
+ * SHARED_CHILD: Set from the base domain up to 
cpuset.sched_relax_domain_level.
+ */
+SD_FLAG(SD_BALANCE_NEWIDLE, 0, SDF_SHARED_CHILD)
+
+/*
+ * Balance on exec
+ *
+ * SHARED_CHILD: Set from the base domain up to the NUMA reclaim level.
+ */
+SD_FLAG(SD_BALANCE_EXEC,1, SDF_SHARED_CHILD)
+
+/*
+ * Balance on fork, clone
+ *
+ * SHARED_CHILD: Set from the base domain up to the NUMA reclaim level.
+ */
+SD_FLAG(SD_BALANCE_FORK,2, SDF_SHARED_CHILD)
+
+/*
+ * Balance on wakeup
+ *
+ * SHARED_CHILD: Set from the base domain up to 
cpuset.sched_relax_domain_level.
+ */
+SD_FLAG(SD_BALANCE_WAKE,3, SDF_SHARED_CHILD)
+
+/*
+ * Consider waking task on waking CPU.
+ *
+ * SHARED_CHILD: Set from the base domain up to the NUMA reclaim level.
+ */
+SD_FLAG(SD_WAKE_AFFINE, 4, SDF_SHARED_CHILD)
+
+/*
+ * Domain members have different CPU capacities
+ *
+ * SHARED_PARENT