Re: [PATCH v6 01/15] xen/common: add cache coloring common code

2024-02-05 Thread Jan Beulich
On 03.02.2024 11:57, Carlo Nonato wrote:
> On Thu, Feb 1, 2024 at 1:59 PM Jan Beulich wrote:
>> On 29.01.2024 18:17, Carlo Nonato wrote:
>>> --- a/xen/arch/Kconfig
>>> +++ b/xen/arch/Kconfig
>>> @@ -31,3 +31,20 @@ config NR_NUMA_NODES
>>> associated with multiple-nodes management. It is the upper bound of
>>> the number of NUMA nodes that the scheduler, memory allocation and
>>> other NUMA-aware components can handle.
>>> +
>>> +config LLC_COLORING
>>> + bool "Last Level Cache (LLC) coloring" if EXPERT
>>> + depends on HAS_LLC_COLORING
>>> +
>>> +config NR_LLC_COLORS
>>> + int "Maximum number of LLC colors"
>>> + range 2 1024
>>
>> What's the reasoning behind this upper bound? IOW - can something to this
>> effect be said in the description, please?
> 
> The only reason is that this is the number of colors that fit in a 4 KiB page.
> I don't have any other good way of picking a number here. 1024 is already big
> and probably nobody would use such a configuration. But 512 or 256 would be
> equally arbitrary.

And because of this I'm asking that you say in the description how you
arrived at this value. As to fitting in a 4 KiB page: that makes two
assumptions (both true for all ports right now, but liable to be missed if
either changed down the road): PAGE_SIZE == 0x1000 && sizeof(int) == 4.
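
To make the two assumptions concrete, here is a minimal sketch of a build-time check. It is not part of the patch: MAX_COLORS_PER_PAGE is a hypothetical name, and PAGE_SIZE is defined locally only so the fragment stands alone. In Xen itself something like BUILD_BUG_ON() would presumably be used instead of _Static_assert.

```c
#include <stddef.h>

/* Assumed value: holds on all current ports, as noted above. */
#define PAGE_SIZE 0x1000u

/* Hypothetical helper: how many unsigned int colors fit in one page. */
#define MAX_COLORS_PER_PAGE (PAGE_SIZE / sizeof(unsigned int))

/* Fail the build if either assumption behind "range 2 1024" stops holding. */
_Static_assert(sizeof(unsigned int) == 4,
               "a color is assumed to take 4 bytes");
_Static_assert(MAX_COLORS_PER_PAGE == 1024,
               "Kconfig upper bound assumes 1024 colors per page");
```

With such a check in place, a port with a different page size or int width would fail to build rather than silently invalidate the bound.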

>>> --- /dev/null
>>> +++ b/xen/common/llc-coloring.c
>>> @@ -0,0 +1,87 @@
>>> +/* SPDX-License-Identifier: GPL-2.0-only */
>>> +/*
>>> + * Last Level Cache (LLC) coloring common code
>>> + *
>>> + * Copyright (C) 2022 Xilinx Inc.
>>> + */
>>> +#include 
>>> +#include 
>>> +#include 
>>> +
>>> +bool __ro_after_init llc_coloring_enabled;
>>> +boolean_param("llc-coloring", llc_coloring_enabled);
>>
>> The variable has no use right now afaics, so it's unclear whether (a) it
>> is legitimately non-static and (b) placed in an appropriate section.
> 
> My bad here. The variable should be tested for in llc_coloring_init() and in
> domain_dump_llc_colors() (in domain_llc_coloring_free() as well, in later
> patches). That change was lost in the rebase of the series.
> 
> Anyway per this patch, the global is only accessed from this file while it's
> going to be accessed from outside in later patches. In this case what should
> I do? Declare it static and then make it non-static afterwards?

That would be preferred, considering that there may be an extended time
period between the 1st and 2nd patches going in. Explaining why a
variable is non-static despite not needing to be just yet would be an
alternative, but then you'd also need to justify why transiently
violating the respective Misra guideline is acceptable.

Jan



Re: [PATCH v6 01/15] xen/common: add cache coloring common code

2024-02-05 Thread Jan Beulich
On 03.02.2024 11:57, Carlo Nonato wrote:
> On Wed, Jan 31, 2024 at 4:57 PM Jan Beulich wrote:
>> On 29.01.2024 18:17, Carlo Nonato wrote:
>>> +Background
>>> +**********
>>> +
>>> +Cache hierarchy of a modern multi-core CPU typically has first levels 
>>> dedicated
>>> +to each core (hence using multiple cache units), while the last level is 
>>> shared
>>> +among all of them. Such configuration implies that memory operations on one
>>> +core (e.g. running a DomU) are able to generate interference on another 
>>> core
>>> +(e.g .hosting another DomU). Cache coloring allows eliminating this
>>> +mutual interference, and thus guaranteeing higher and more predictable
>>> +performances for memory accesses.
>>> +The key concept underlying cache coloring is a fragmentation of the memory
>>> +space into a set of sub-spaces called colors that are mapped to disjoint 
>>> cache
>>> +partitions. Technically, the whole memory space is first divided into a 
>>> number
>>> +of subsequent regions. Then each region is in turn divided into a number of
>>> +subsequent sub-colors. The generic i-th color is then obtained by all the
>>> +i-th sub-colors in each region.
>>> +
>>> +::
>>> +
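>>> +                Region j            Region j+1
>>> +            .....................   ............
>>> +            .                     . .
>>> +            .                       .
>>> +        _ _ _______________ _ _____________________ _ _
>>> +            |     |     |     |     |     |     |
>>> +            | c_0 | c_1 |     | c_n | c_0 | c_1 |
>>> +       _ _ _|_____|_____|_ _ _|_____|_____|_____|_ _ _
>>> +                :                       :
>>> +                :                       :...         ... .
>>> +                :                            color 0
>>> +                :...........................         ... .
>>> +                                            :
>>> +      . . ..................................: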
>>> +
>>> +There are two pragmatic lesson to be learnt.
>>> +
>>> +1. If one wants to avoid cache interference between two domains, different
>>> +   colors needs to be used for their memory.
>>> +
>>> +2. Color assignment must privilege contiguity in the partitioning. E.g.,
>>> +   assigning colors (0,1) to domain I  and (2,3) to domain  J is better 
>>> than
>>> +   assigning colors (0,2) to I and (1,3) to J.
>>
>> I can't connect this 2nd point with any of what was said above.
> 
> If colors are contiguous then greater spatial locality is achievable. Do you
> mean we should explain this better?

Yes, but not just that. See how your use of "must" in the text contradicts
your now suggesting this is merely an optimization.

>>> +How to compute the number of colors
>>> +***********************************
>>> +
>>> +To compute the number of available colors for a specific platform, the 
>>> size of
>>> +an LLC way and the page size used by Xen must be known. The first 
>>> parameter can
>>> +be found in the processor manual or can be also computed dividing the total
>>> +cache size by the number of its ways. The second parameter is the minimum
>>> +amount of memory that can be mapped by the hypervisor,
>>
>> I find "amount of memory that can be mapped" quite confusing here. Don't you
>> really mean the granularity at which memory can be mapped?
> 
> Yes that's what I wanted to describe. I'll change it.
> 
>>> thus dividing the way
>>> +size by the page size, the number of total cache partitions is found. So 
>>> for
>>> +example, an Arm Cortex-A53 with a 16-ways associative 1 MiB LLC, can 
>>> isolate up
>>> +to 16 colors when pages are 4 KiB in size.
>>
>> I guess it's a matter of what one's used to, but to me talking of "way size"
>> and how the calculation is described is, well, unusual. What I would start
>> from is the smallest entity, i.e. a cache line. Then it would be relevant
>> to describe how, after removing the low so many bits to cover for cache line
>> size, the remaining address bits are used to map to a particular set. It
>> looks to me as if you're assuming that this mapping is linear, using the
>> next so many bits from the address. Afaik this isn't true on various modern
>> CPUs; instead hash functions are used. Without knowing at least certain
>> properties of such a hash function, I'm afraid your mapping from address to
>> color isn't necessarily guaranteeing the promised isolation. The guarantee
>> may hold for processors you specifically target, but then I think in this
>> description it would help if you would fully spell out any assumptions you
>> make on how hardware maps addresses to elements of the cache.
> 
> You're right, we are assuming a linear mapping. We are going to review and
> extend the documentation in order to fully specify when coloring can be
> applied.
> 
> About the "way size" it's a way of summarizing all the parameters into one.
> We could ask for different cache parameters as you said, but in 

Re: [PATCH v6 01/15] xen/common: add cache coloring common code

2024-02-03 Thread Carlo Nonato
Hi Jan,

On Thu, Feb 1, 2024 at 1:18 PM Jan Beulich wrote:
>
> On 29.01.2024 18:17, Carlo Nonato wrote:
> > --- /dev/null
> > +++ b/docs/misc/cache-coloring.rst
> > @@ -0,0 +1,87 @@
> > +Xen cache coloring user guide
> > +=============================
> > +
> > +The cache coloring support in Xen allows to reserve Last Level Cache (LLC)
> > +partitions for Dom0, DomUs and Xen itself. Currently only ARM64 is 
> > supported.
> > +
> > +To compile LLC coloring support set ``CONFIG_LLC_COLORING=y``.
> > +
> > +If needed, change the maximum number of colors with
> > +``CONFIG_NR_LLC_COLORS=``.
> > +
> > +Compile Xen and the toolstack and then configure it via
> > +`Command line parameters`_.
> > +
> > +Background
> > +**********
> > +
> > +Cache hierarchy of a modern multi-core CPU typically has first levels 
> > dedicated
> > +to each core (hence using multiple cache units), while the last level is 
> > shared
> > +among all of them. Such configuration implies that memory operations on one
> > +core (e.g. running a DomU) are able to generate interference on another 
> > core
> > +(e.g .hosting another DomU). Cache coloring allows eliminating this
> > +mutual interference, and thus guaranteeing higher and more predictable
> > +performances for memory accesses.
>
> Since you say "eliminating" - what about shared mid-level caches? What about
> shared TLBs?

Cache coloring can help in reducing the interference, but you're right and
there are other factors to be considered. We will update the documentation to
better specify the applicability range and relax the terminology concerning
"eliminating" etc.

Thanks

> Jan



Re: [PATCH v6 01/15] xen/common: add cache coloring common code

2024-02-03 Thread Carlo Nonato
Hi Jan,

On Thu, Feb 1, 2024 at 1:59 PM Jan Beulich wrote:
>
> On 29.01.2024 18:17, Carlo Nonato wrote:
> > --- a/xen/arch/Kconfig
> > +++ b/xen/arch/Kconfig
> > @@ -31,3 +31,20 @@ config NR_NUMA_NODES
> > associated with multiple-nodes management. It is the upper bound of
> > the number of NUMA nodes that the scheduler, memory allocation and
> > other NUMA-aware components can handle.
> > +
> > +config LLC_COLORING
> > + bool "Last Level Cache (LLC) coloring" if EXPERT
> > + depends on HAS_LLC_COLORING
> > +
> > +config NR_LLC_COLORS
> > + int "Maximum number of LLC colors"
> > + range 2 1024
>
> What's the reasoning behind this upper bound? IOW - can something to this
> effect be said in the description, please?

The only reason is that this is the number of colors that fit in a 4 KiB page.
I don't have any other good way of picking a number here. 1024 is already big
and probably nobody would use such a configuration. But 512 or 256 would be
equally arbitrary.

> > + default 128
> > + depends on LLC_COLORING
> > + help
> > +   Controls the build-time size of various arrays associated with LLC
> > +   coloring. Refer to cache coloring documentation for how to compute 
> > the
> > +   number of colors supported by the platform. This is only an upper
> > +   bound. The runtime value is autocomputed or manually set via 
> > cmdline.
> > +   The default value corresponds to an 8 MiB 16-ways LLC, which should 
> > be
> > +   more than what needed in the general case.
>
> Aiui while not outright wrong, non-power-of-2 values are meaningless to
> specify. Perhaps that is worth mentioning (if not making this a value
> that's used as exponent of 2 in the first place)?

Yes, I prefer a better help message.

> As to the default and its description: As said for the documentation,
> doesn't what this corresponds to also depend on cache line size? Even
> if this was still Arm-specific rather than common code, I'd question
> whether now and forever Arm chips may only use one pre-determined cache
> line size.

I hope I answered in the previous mail why the line size (in the specific case
we are applying coloring to) can be ignored as a parameter in favor of cache
size and number of ways.
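
As an illustration of the arithmetic only (a sketch, not code from the series): under the linear-mapping assumption discussed in this thread, a way covers nr_sets * line_size bytes, so once total cache size and number of ways are given the line size no longer appears. The helper names nr_llc_colors() and addr_to_color() are hypothetical, and 4 KiB pages are assumed.

```c
#include <stdint.h>

#define PAGE_SHIFT 12u                  /* assumed 4 KiB pages */
#define PAGE_SIZE  (1u << PAGE_SHIFT)

/* Hypothetical helper: number of colors from total LLC size and ways. */
static unsigned int nr_llc_colors(unsigned int cache_size, unsigned int nr_ways)
{
    /* way_size == nr_sets * line_size, so the line size cancels out here. */
    unsigned int way_size = cache_size / nr_ways;

    return way_size / PAGE_SIZE;
}

/*
 * Hypothetical helper: color of a physical address, assuming the set index
 * is taken linearly from the address bits. nr_colors must be a power of 2.
 */
static unsigned int addr_to_color(uint64_t paddr, unsigned int nr_colors)
{
    return (unsigned int)(paddr >> PAGE_SHIFT) & (nr_colors - 1);
}
```

For the two configurations mentioned in the patch, a 1 MiB 16-way LLC gives 16 colors and an 8 MiB 16-way LLC gives 128, the Kconfig default. None of this holds if the hardware hashes addresses into sets.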

> > --- /dev/null
> > +++ b/xen/common/llc-coloring.c
> > @@ -0,0 +1,87 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/*
> > + * Last Level Cache (LLC) coloring common code
> > + *
> > + * Copyright (C) 2022 Xilinx Inc.
> > + */
> > +#include 
> > +#include 
> > +#include 
> > +
> > +bool __ro_after_init llc_coloring_enabled;
> > +boolean_param("llc-coloring", llc_coloring_enabled);
>
> The variable has no use right now afaics, so it's unclear whether (a) it
> is legitimately non-static and (b) placed in an appropriate section.

My bad here. The variable should be tested for in llc_coloring_init() and in
domain_dump_llc_colors() (in domain_llc_coloring_free() as well, in later
patches). That change was lost in the rebase of the series.

Anyway per this patch, the global is only accessed from this file while it's
going to be accessed from outside in later patches. In this case what should
I do? Declare it static and then make it non-static afterwards?

> > +/* Size of an LLC way */
> > +static unsigned int __ro_after_init llc_way_size;
> > +size_param("llc-way-size", llc_way_size);
> > +/* Number of colors available in the LLC */
> > +static unsigned int __ro_after_init max_nr_colors = CONFIG_NR_LLC_COLORS;
> > +
> > +static void print_colors(const unsigned int *colors, unsigned int 
> > num_colors)
> > +{
> > +unsigned int i;
> > +
> > +printk("{ ");
> > +for ( i = 0; i < num_colors; i++ ) {
>
> Nit (style): Brace placement.
>
> > +unsigned int start = colors[i], end = colors[i];
> > +
> > +printk("%u", start);
> > +
> > +for ( ;
> > +  i < num_colors - 1 && colors[i] + 1 == colors[i + 1];
>
> To reduce the number of array accesses, may I suggest to use "end + 1"
> here instead of "colors[i] + 1"? (The initializer of "end" could also
> be "start", but I guess the compiler will recognize this anyway.) This
> would then (imo) also better justify the desire for having "end" in
> the first place.
>
> > +  i++, end++ );
>
> Imo for clarity the semicolon wants to live on its own line.
>
> > +static void dump_coloring_info(unsigned char key)
>
> This being common code now, I think it would be good practice to have
> cf_check here right away, even if for now (for whatever reason) the
> feature is meant to be limited to Arm. (Albeit see below for whether
> this is to remain that way.)
>
> > +void __init llc_coloring_init(void)
> > +{
> > +if ( !llc_way_size && !(llc_way_size = get_llc_way_size()) )
> > +panic("Probed LLC coloring way size is 0 and no custom value 
> > found\n");
> > +
> > +/*
> > + * The maximum number of colors must be a power of 2 in order to 
> > correctly
> > + 

Re: [PATCH v6 01/15] xen/common: add cache coloring common code

2024-02-03 Thread Carlo Nonato
Hi Jan,

On Wed, Jan 31, 2024 at 4:57 PM Jan Beulich wrote:
>
> On 29.01.2024 18:17, Carlo Nonato wrote:
> > Last Level Cache (LLC) coloring allows to partition the cache in smaller
> > chunks called cache colors. Since not all architectures can actually
> > implement it, add a HAS_LLC_COLORING Kconfig and put other options under
> > xen/arch.
> >
> > LLC colors are a property of the domain, so the domain struct has to be
> > extended.
> >
> > Based on original work from: Luca Miccio 
> >
> > Signed-off-by: Carlo Nonato 
> > Signed-off-by: Marco Solieri 
> > ---
> > v6:
> > - moved almost all code in common
> > - moved documentation in this patch
> > - reintroduced range for CONFIG_NR_LLC_COLORS
> > - reintroduced some stub functions to reduce the number of checks on
> >   llc_coloring_enabled
> > - moved domain_llc_coloring_free() in same patch where allocation happens
> > - turned "d->llc_colors" to pointer-to-const
> > - llc_coloring_init() now returns void and panics if errors are found
> > v5:
> > - used - instead of _ for filenames
> > - removed domain_create_llc_colored()
> > - removed stub functions
> > - coloring domain fields are now #ifdef protected
> > v4:
> > - Kconfig options moved to xen/arch
> > - removed range for CONFIG_NR_LLC_COLORS
> > - added "llc_coloring_enabled" global to later implement the boot-time
> >   switch
> > - added domain_create_llc_colored() to be able to pass colors
> > - added is_domain_llc_colored() macro
> > ---
> >  docs/misc/cache-coloring.rst  | 87 +++
> >  docs/misc/xen-command-line.pandoc | 27 ++
> >  xen/arch/Kconfig  | 17 ++
> >  xen/common/Kconfig|  3 ++
> >  xen/common/Makefile   |  1 +
> >  xen/common/keyhandler.c   |  3 ++
> >  xen/common/llc-coloring.c | 87 +++
> >  xen/include/xen/llc-coloring.h| 38 ++
> >  xen/include/xen/sched.h   |  5 ++
> >  9 files changed, 268 insertions(+)
> >  create mode 100644 docs/misc/cache-coloring.rst
> >  create mode 100644 xen/common/llc-coloring.c
> >  create mode 100644 xen/include/xen/llc-coloring.h
> >
> > diff --git a/docs/misc/cache-coloring.rst b/docs/misc/cache-coloring.rst
> > new file mode 100644
> > index 0000000000..9fe01e99e1
> > --- /dev/null
> > +++ b/docs/misc/cache-coloring.rst
> > @@ -0,0 +1,87 @@
> > +Xen cache coloring user guide
> > +=============================
> > +
> > +The cache coloring support in Xen allows to reserve Last Level Cache (LLC)
> > +partitions for Dom0, DomUs and Xen itself. Currently only ARM64 is 
> > supported.
> > +
> > +To compile LLC coloring support set ``CONFIG_LLC_COLORING=y``.
> > +
> > +If needed, change the maximum number of colors with
> > +``CONFIG_NR_LLC_COLORS=``.
> > +
> > +Compile Xen and the toolstack and then configure it via
> > +`Command line parameters`_.
> > +
> > +Background
> > +**********
> > +
> > +Cache hierarchy of a modern multi-core CPU typically has first levels 
> > dedicated
> > +to each core (hence using multiple cache units), while the last level is 
> > shared
> > +among all of them. Such configuration implies that memory operations on one
> > +core (e.g. running a DomU) are able to generate interference on another 
> > core
> > +(e.g .hosting another DomU). Cache coloring allows eliminating this
> > +mutual interference, and thus guaranteeing higher and more predictable
> > +performances for memory accesses.
> > +The key concept underlying cache coloring is a fragmentation of the memory
> > +space into a set of sub-spaces called colors that are mapped to disjoint 
> > cache
> > +partitions. Technically, the whole memory space is first divided into a 
> > number
> > +of subsequent regions. Then each region is in turn divided into a number of
> > +subsequent sub-colors. The generic i-th color is then obtained by all the
> > +i-th sub-colors in each region.
> > +
> > +::
> > +
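> > +                Region j            Region j+1
> > +            .....................   ............
> > +            .                     . .
> > +            .                       .
> > +        _ _ _______________ _ _____________________ _ _
> > +            |     |     |     |     |     |     |
> > +            | c_0 | c_1 |     | c_n | c_0 | c_1 |
> > +       _ _ _|_____|_____|_ _ _|_____|_____|_____|_ _ _
> > +                :                       :
> > +                :                       :...         ... .
> > +                :                            color 0
> > +                :...........................         ... .
> > +                                            :
> > +      . . ..................................: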
> > +
> > +There are two pragmatic lesson to be learnt.
> > +
> > +1. If one wants to avoid cache interference between two domains, different
> > +   colors needs to be used for their memory.
> > +
> > +2. Color 

Re: [PATCH v6 01/15] xen/common: add cache coloring common code

2024-02-01 Thread Jan Beulich
On 29.01.2024 18:17, Carlo Nonato wrote:
> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -626,6 +626,11 @@ struct domain
>  
>  /* Holding CDF_* constant. Internal flags for domain creation. */
>  unsigned int cdf;
> +
> +#ifdef CONFIG_LLC_COLORING
> +unsigned const int *llc_colors;
> +unsigned int num_llc_colors;
> +#endif
>  };

Btw, at this point flipping the order of the two fields will be more
efficient for 64-bit architectures (consuming a padding hole rather
than adding yet another one).
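
A small stand-alone sketch of the effect, modelling only the last fields of the struct. The struct names are made up for illustration, and whether a hole is really available depends on what precedes "cdf" in the real struct domain.

```c
#include <stddef.h>

/* Tail of the struct as posted: int, pointer, int. */
struct tail_as_posted {
    unsigned int cdf;
    const unsigned int *llc_colors;
    unsigned int num_llc_colors;
};

/* Same fields with the two new ones flipped: int, int, pointer. */
struct tail_flipped {
    unsigned int cdf;
    unsigned int num_llc_colors;
    const unsigned int *llc_colors;
};
```

On a typical LP64 target the first layout needs padding after both integers so that the pointer and the struct end are 8-byte aligned, while in the second the two integers share one 8-byte slot. On 32-bit targets both layouts come out the same.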

Jan



Re: [PATCH v6 01/15] xen/common: add cache coloring common code

2024-02-01 Thread Jan Beulich
On 29.01.2024 18:17, Carlo Nonato wrote:
> --- a/xen/arch/Kconfig
> +++ b/xen/arch/Kconfig
> @@ -31,3 +31,20 @@ config NR_NUMA_NODES
> associated with multiple-nodes management. It is the upper bound of
> the number of NUMA nodes that the scheduler, memory allocation and
> other NUMA-aware components can handle.
> +
> +config LLC_COLORING
> + bool "Last Level Cache (LLC) coloring" if EXPERT
> + depends on HAS_LLC_COLORING
> +
> +config NR_LLC_COLORS
> + int "Maximum number of LLC colors"
> + range 2 1024

What's the reasoning behind this upper bound? IOW - can something to this
effect be said in the description, please?

> + default 128
> + depends on LLC_COLORING
> + help
> +   Controls the build-time size of various arrays associated with LLC
> +   coloring. Refer to cache coloring documentation for how to compute the
> +   number of colors supported by the platform. This is only an upper
> +   bound. The runtime value is autocomputed or manually set via cmdline.
> +   The default value corresponds to an 8 MiB 16-ways LLC, which should be
> +   more than what needed in the general case.

Aiui while not outright wrong, non-power-of-2 values are meaningless to
specify. Perhaps that is worth mentioning (if not making this a value
that's used as exponent of 2 in the first place)?
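
To sketch both options (hypothetical names throughout, nothing here is from the series): either keep the option as a count and reject non-power-of-2 values, or configure the exponent so that only powers of 2 can be expressed.

```c
#include <stdbool.h>

/* Option 1: keep a count, and check that it is a power of 2. */
static bool is_pow2(unsigned int x)
{
    return x && !(x & (x - 1));
}

/*
 * Option 2 (hypothetical Kconfig symbol): configure the exponent instead,
 * so a non-power-of-2 count cannot be specified in the first place.
 */
#define CONFIG_LLC_COLORS_ORDER 7
#define NR_LLC_COLORS (1u << CONFIG_LLC_COLORS_ORDER)
```

An order of 7 corresponds to the current default of 128 colors.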

As to the default and its description: As said for the documentation,
doesn't what this corresponds to also depend on cache line size? Even
if this was still Arm-specific rather than common code, I'd question
whether now and forever Arm chips may only use one pre-determined cache
line size.

> --- /dev/null
> +++ b/xen/common/llc-coloring.c
> @@ -0,0 +1,87 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Last Level Cache (LLC) coloring common code
> + *
> + * Copyright (C) 2022 Xilinx Inc.
> + */
> +#include 
> +#include 
> +#include 
> +
> +bool __ro_after_init llc_coloring_enabled;
> +boolean_param("llc-coloring", llc_coloring_enabled);

The variable has no use right now afaics, so it's unclear whether (a) it
is legitimately non-static and (b) placed in an appropriate section.

> +/* Size of an LLC way */
> +static unsigned int __ro_after_init llc_way_size;
> +size_param("llc-way-size", llc_way_size);
> +/* Number of colors available in the LLC */
> +static unsigned int __ro_after_init max_nr_colors = CONFIG_NR_LLC_COLORS;
> +
> +static void print_colors(const unsigned int *colors, unsigned int num_colors)
> +{
> +unsigned int i;
> +
> +printk("{ ");
> +for ( i = 0; i < num_colors; i++ ) {

Nit (style): Brace placement.

> +unsigned int start = colors[i], end = colors[i];
> +
> +printk("%u", start);
> +
> +for ( ;
> +  i < num_colors - 1 && colors[i] + 1 == colors[i + 1];

To reduce the number of array accesses, may I suggest to use "end + 1"
here instead of "colors[i] + 1"? (The initializer of "end" could also
be "start", but I guess the compiler will recognize this anyway.) This
would then (imo) also better justify the desire for having "end" in
the first place.

> +  i++, end++ );

Imo for clarity the semicolon wants to live on its own line.
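
Putting the three remarks on this loop together, a stand-alone sketch could look as follows. It is not the patch's function: format_colors() is a hypothetical stand-in that writes into a buffer with snprintf() instead of calling printk(), and the range and separator output is an assumption, since that part of print_colors() is not quoted here.

```c
#include <stddef.h>
#include <stdio.h>

/* Append formatted text to buf, bailing out with -1 if it does not fit. */
#define APPEND(...) do {                                        \
        int n_ = snprintf(buf + len, size - len, __VA_ARGS__);  \
        if ( n_ < 0 || (size_t)n_ >= size - len )               \
            return -1;                                          \
        len += (size_t)n_;                                      \
    } while ( 0 )

/* Formats sorted colors as ranges, e.g. "{ 0-2, 5 }". Returns 0 or -1. */
static int format_colors(char *buf, size_t size,
                         const unsigned int *colors, unsigned int num_colors)
{
    size_t len = 0;
    unsigned int i;

    APPEND("{ ");
    for ( i = 0; i < num_colors; i++ )
    {
        unsigned int start = colors[i], end = start;

        APPEND("%u", start);

        /* Extend the run while the next color is exactly end + 1. */
        for ( ;
              i < num_colors - 1 && end + 1 == colors[i + 1];
              i++, end++ )
            ;

        if ( start != end )
            APPEND("-%u", end);

        if ( i < num_colors - 1 )
            APPEND(", ");
    }
    APPEND(" }");

    return 0;
}

#undef APPEND
```

For the colors 0, 1, 2, 5, 7, 8 this produces "{ 0-2, 5, 7-8 }". Comparing against "end + 1" saves one array read per iteration and gives "end" a visible purpose.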

> +static void dump_coloring_info(unsigned char key)

This being common code now, I think it would be good practice to have
cf_check here right away, even if for now (for whatever reason) the
feature is meant to be limited to Arm. (Albeit see below for whether
this is to remain that way.)

> +void __init llc_coloring_init(void)
> +{
> +if ( !llc_way_size && !(llc_way_size = get_llc_way_size()) )
> +panic("Probed LLC coloring way size is 0 and no custom value 
> found\n");
> +
> +/*
> + * The maximum number of colors must be a power of 2 in order to 
> correctly
> + * map them to bits of an address, so also the LLC way size must be so.
> + */
> +if ( llc_way_size & (llc_way_size - 1) )
> +panic("LLC coloring way size (%u) isn't a power of 2\n", 
> llc_way_size);
> +
> +max_nr_colors = llc_way_size >> PAGE_SHIFT;

With this unconditionally initialized here, what's the purpose of the
variable's initializer?

> +if ( max_nr_colors < 2 || max_nr_colors > CONFIG_NR_LLC_COLORS )
> +panic("Number of LLC colors (%u) not in range [2, %u]\n",
> +  max_nr_colors, CONFIG_NR_LLC_COLORS);

I'm not convinced of panic()ing here (including the earlier two
instances). You could warn, taint, disable, and continue. If you want
to stick to panic(), please justify doing so in the description.

Plus, if you panic(), shouldn't that be limited to llc_coloring_enabled
being true? Or - not visible here, due to the lack of a caller of the
function - is that meant to be taken care of by the caller (to not call
here when the flag is off)? I think it would be cleaner if the check
lived here; quite possibly that would then further permit the flag
variable to become static.
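
As a rough sketch of the "warn, disable, and continue" alternative, with the enable check living inside the function: everything below is hypothetical. The way size is passed in rather than probed, fprintf() stands in for printk(), tainting is left out, and PAGE_SHIFT and CONFIG_NR_LLC_COLORS are defined locally with assumed values.

```c
#include <stdbool.h>
#include <stdio.h>

#define PAGE_SHIFT 12            /* assumed 4 KiB pages */
#define CONFIG_NR_LLC_COLORS 128 /* Kconfig default from the patch */

static bool llc_coloring_enabled = true; /* can be static in this shape */
static unsigned int max_nr_colors;

/* Returns whether coloring is still enabled after validation. */
static bool llc_coloring_init_sketch(unsigned int llc_way_size)
{
    if ( !llc_coloring_enabled )
        return false;

    /* The way size must be a non-zero power of 2. */
    if ( !llc_way_size || (llc_way_size & (llc_way_size - 1)) )
    {
        fprintf(stderr, "LLC way size (%u) isn't a power of 2\n",
                llc_way_size);
        goto disable;
    }

    max_nr_colors = llc_way_size >> PAGE_SHIFT;
    if ( max_nr_colors < 2 || max_nr_colors > CONFIG_NR_LLC_COLORS )
    {
        fprintf(stderr, "Number of LLC colors (%u) not in range [2, %u]\n",
                max_nr_colors, CONFIG_NR_LLC_COLORS);
        goto disable;
    }

    return true;

 disable:
    /* Continue booting without coloring rather than panicking. */
    llc_coloring_enabled = false;
    max_nr_colors = 0;
    return false;
}
```

A 64 KiB way size yields 16 colors here. Any invalid value turns the feature off for the rest of the run, and max_nr_colors needs no initializer since it is always assigned before use.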

> +register_keyhandler('K', 

Re: [PATCH v6 01/15] xen/common: add cache coloring common code

2024-02-01 Thread Jan Beulich
On 29.01.2024 18:17, Carlo Nonato wrote:
> --- /dev/null
> +++ b/docs/misc/cache-coloring.rst
> @@ -0,0 +1,87 @@
> +Xen cache coloring user guide
> +=============================
> +
> +The cache coloring support in Xen allows to reserve Last Level Cache (LLC)
> +partitions for Dom0, DomUs and Xen itself. Currently only ARM64 is supported.
> +
> +To compile LLC coloring support set ``CONFIG_LLC_COLORING=y``.
> +
> +If needed, change the maximum number of colors with
> +``CONFIG_NR_LLC_COLORS=``.
> +
> +Compile Xen and the toolstack and then configure it via
> +`Command line parameters`_.
> +
> +Background
> +**********
> +
> +Cache hierarchy of a modern multi-core CPU typically has first levels 
> dedicated
> +to each core (hence using multiple cache units), while the last level is 
> shared
> +among all of them. Such configuration implies that memory operations on one
> +core (e.g. running a DomU) are able to generate interference on another core
> +(e.g .hosting another DomU). Cache coloring allows eliminating this
> +mutual interference, and thus guaranteeing higher and more predictable
> +performances for memory accesses.

Since you say "eliminating" - what about shared mid-level caches? What about
shared TLBs?

Jan



Re: [PATCH v6 01/15] xen/common: add cache coloring common code

2024-02-01 Thread Jan Beulich
On 31.01.2024 16:57, Jan Beulich wrote:
> On 29.01.2024 18:17, Carlo Nonato wrote:
>> +Command line parameters
>> +***********************
>> +
>> +More specific documentation is available at 
>> `docs/misc/xen-command-line.pandoc`.
>> +
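>> ++----------------------+-------------------------------+
>> +| **Parameter**        | **Description**               |
>> ++----------------------+-------------------------------+
>> +| ``llc-coloring``     | enable coloring at runtime    |
>> ++----------------------+-------------------------------+
>> +| ``llc-way-size``     | set the LLC way size          |
>> ++----------------------+-------------------------------+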
> 
> As a result of the above, I also find it confusing to specify "way size"
> as a command line option. Cache size, number of ways, and cache line size
> would seem more natural to me.

Or, alternatively, have the number of colors be specifiable directly.

Jan



Re: [PATCH v6 01/15] xen/common: add cache coloring common code

2024-01-31 Thread Jan Beulich
On 29.01.2024 18:17, Carlo Nonato wrote:
> Last Level Cache (LLC) coloring allows to partition the cache in smaller
> chunks called cache colors. Since not all architectures can actually
> implement it, add a HAS_LLC_COLORING Kconfig and put other options under
> xen/arch.
> 
> LLC colors are a property of the domain, so the domain struct has to be
> extended.
> 
> Based on original work from: Luca Miccio 
> 
> Signed-off-by: Carlo Nonato 
> Signed-off-by: Marco Solieri 
> ---
> v6:
> - moved almost all code in common
> - moved documentation in this patch
> - reintroduced range for CONFIG_NR_LLC_COLORS
> - reintroduced some stub functions to reduce the number of checks on
>   llc_coloring_enabled
> - moved domain_llc_coloring_free() in same patch where allocation happens
> - turned "d->llc_colors" to pointer-to-const
> - llc_coloring_init() now returns void and panics if errors are found
> v5:
> - used - instead of _ for filenames
> - removed domain_create_llc_colored()
> - removed stub functions
> - coloring domain fields are now #ifdef protected
> v4:
> - Kconfig options moved to xen/arch
> - removed range for CONFIG_NR_LLC_COLORS
> - added "llc_coloring_enabled" global to later implement the boot-time
>   switch
> - added domain_create_llc_colored() to be able to pass colors
> - added is_domain_llc_colored() macro
> ---
>  docs/misc/cache-coloring.rst  | 87 +++
>  docs/misc/xen-command-line.pandoc | 27 ++
>  xen/arch/Kconfig  | 17 ++
>  xen/common/Kconfig|  3 ++
>  xen/common/Makefile   |  1 +
>  xen/common/keyhandler.c   |  3 ++
>  xen/common/llc-coloring.c | 87 +++
>  xen/include/xen/llc-coloring.h| 38 ++
>  xen/include/xen/sched.h   |  5 ++
>  9 files changed, 268 insertions(+)
>  create mode 100644 docs/misc/cache-coloring.rst
>  create mode 100644 xen/common/llc-coloring.c
>  create mode 100644 xen/include/xen/llc-coloring.h
> 
> diff --git a/docs/misc/cache-coloring.rst b/docs/misc/cache-coloring.rst
> new file mode 100644
> index 0000000000..9fe01e99e1
> --- /dev/null
> +++ b/docs/misc/cache-coloring.rst
> @@ -0,0 +1,87 @@
> +Xen cache coloring user guide
> +=============================
> +
> +The cache coloring support in Xen allows to reserve Last Level Cache (LLC)
> +partitions for Dom0, DomUs and Xen itself. Currently only ARM64 is supported.
> +
> +To compile LLC coloring support set ``CONFIG_LLC_COLORING=y``.
> +
> +If needed, change the maximum number of colors with
> +``CONFIG_NR_LLC_COLORS=``.
> +
> +Compile Xen and the toolstack and then configure it via
> +`Command line parameters`_.
> +
> +Background
> +**********
> +
> +Cache hierarchy of a modern multi-core CPU typically has first levels 
> dedicated
> +to each core (hence using multiple cache units), while the last level is 
> shared
> +among all of them. Such configuration implies that memory operations on one
> +core (e.g. running a DomU) are able to generate interference on another core
> +(e.g .hosting another DomU). Cache coloring allows eliminating this
> +mutual interference, and thus guaranteeing higher and more predictable
> +performances for memory accesses.
> +The key concept underlying cache coloring is a fragmentation of the memory
> +space into a set of sub-spaces called colors that are mapped to disjoint 
> cache
> +partitions. Technically, the whole memory space is first divided into a 
> number
> +of subsequent regions. Then each region is in turn divided into a number of
> +subsequent sub-colors. The generic i-th color is then obtained by all the
> +i-th sub-colors in each region.
> +
> +::
> +
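> +                Region j            Region j+1
> +            .....................   ............
> +            .                     . .
> +            .                       .
> +        _ _ _______________ _ _____________________ _ _
> +            |     |     |     |     |     |     |
> +            | c_0 | c_1 |     | c_n | c_0 | c_1 |
> +       _ _ _|_____|_____|_ _ _|_____|_____|_____|_ _ _
> +                :                       :
> +                :                       :...         ... .
> +                :                            color 0
> +                :...........................         ... .
> +                                            :
> +      . . ..................................: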
> +
> +There are two pragmatic lesson to be learnt.
> +
> +1. If one wants to avoid cache interference between two domains, different
> +   colors needs to be used for their memory.
> +
> +2. Color assignment must privilege contiguity in the partitioning. E.g.,
> +   assigning colors (0,1) to domain I  and (2,3) to domain  J is better than
> +   assigning colors (0,2) to I and (1,3) to J.

I can't connect this 2nd point with any of what was said above.

> +How to compute the number of 

[PATCH v6 01/15] xen/common: add cache coloring common code

2024-01-29 Thread Carlo Nonato
Last Level Cache (LLC) coloring allows to partition the cache in smaller
chunks called cache colors. Since not all architectures can actually
implement it, add a HAS_LLC_COLORING Kconfig and put other options under
xen/arch.

LLC colors are a property of the domain, so the domain struct has to be
extended.

Based on original work from: Luca Miccio 

Signed-off-by: Carlo Nonato 
Signed-off-by: Marco Solieri 
---
v6:
- moved almost all code in common
- moved documentation in this patch
- reintroduced range for CONFIG_NR_LLC_COLORS
- reintroduced some stub functions to reduce the number of checks on
  llc_coloring_enabled
- moved domain_llc_coloring_free() in same patch where allocation happens
- turned "d->llc_colors" to pointer-to-const
- llc_coloring_init() now returns void and panics if errors are found
v5:
- used - instead of _ for filenames
- removed domain_create_llc_colored()
- removed stub functions
- coloring domain fields are now #ifdef protected
v4:
- Kconfig options moved to xen/arch
- removed range for CONFIG_NR_LLC_COLORS
- added "llc_coloring_enabled" global to later implement the boot-time
  switch
- added domain_create_llc_colored() to be able to pass colors
- added is_domain_llc_colored() macro
---
 docs/misc/cache-coloring.rst  | 87 +++
 docs/misc/xen-command-line.pandoc | 27 ++
 xen/arch/Kconfig  | 17 ++
 xen/common/Kconfig|  3 ++
 xen/common/Makefile   |  1 +
 xen/common/keyhandler.c   |  3 ++
 xen/common/llc-coloring.c | 87 +++
 xen/include/xen/llc-coloring.h| 38 ++
 xen/include/xen/sched.h   |  5 ++
 9 files changed, 268 insertions(+)
 create mode 100644 docs/misc/cache-coloring.rst
 create mode 100644 xen/common/llc-coloring.c
 create mode 100644 xen/include/xen/llc-coloring.h

diff --git a/docs/misc/cache-coloring.rst b/docs/misc/cache-coloring.rst
new file mode 100644
index 0000000000..9fe01e99e1
--- /dev/null
+++ b/docs/misc/cache-coloring.rst
@@ -0,0 +1,87 @@
+Xen cache coloring user guide
+=============================
+
+The cache coloring support in Xen allows to reserve Last Level Cache (LLC)
+partitions for Dom0, DomUs and Xen itself. Currently only ARM64 is supported.
+
+To compile LLC coloring support set ``CONFIG_LLC_COLORING=y``.
+
+If needed, change the maximum number of colors with
+``CONFIG_NR_LLC_COLORS=``.
+
+Compile Xen and the toolstack and then configure it via
+`Command line parameters`_.
+
+Background
+**********
+
+Cache hierarchy of a modern multi-core CPU typically has first levels dedicated
+to each core (hence using multiple cache units), while the last level is shared
+among all of them. Such configuration implies that memory operations on one
+core (e.g. running a DomU) are able to generate interference on another core
+(e.g. hosting another DomU). Cache coloring allows eliminating this
+mutual interference, and thus guaranteeing higher and more predictable
+performances for memory accesses.
+The key concept underlying cache coloring is a fragmentation of the memory
+space into a set of sub-spaces called colors that are mapped to disjoint cache
+partitions. Technically, the whole memory space is first divided into a number
+of subsequent regions. Then each region is in turn divided into a number of
+subsequent sub-colors. The generic i-th color is then obtained by all the
+i-th sub-colors in each region.
+
+::
+
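+                Region j            Region j+1
+            .....................   ............
+            .                     . .
+            .                       .
+        _ _ _______________ _ _____________________ _ _
+            |     |     |     |     |     |     |
+            | c_0 | c_1 |     | c_n | c_0 | c_1 |
+       _ _ _|_____|_____|_ _ _|_____|_____|_____|_ _ _
+                :                       :
+                :                       :...         ... .
+                :                            color 0
+                :...........................         ... .
+                                            :
+      . . ..................................: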
+
+There are two pragmatic lessons to be learnt.
+
+1. If one wants to avoid cache interference between two domains, different
+   colors need to be used for their memory.
+
+2. Color assignment must privilege contiguity in the partitioning. E.g.,
+   assigning colors (0,1) to domain I  and (2,3) to domain  J is better than
+   assigning colors (0,2) to I and (1,3) to J.
+
+How to compute the number of colors
+***********************************
+
+To compute the number of available colors for a specific platform, the size of
+an LLC way and the page size used by Xen must be known. The first parameter can
+be found in the processor manual or can be also computed dividing the total
+cache size by the number of its ways. The second parameter is