Re: [Xen-devel] [RFC Design Doc] Intel L2 Cache Allocation Technology (L2 CAT) Feature enabling

2016-05-16 Thread Dario Faggioli
On Mon, 2016-05-16 at 16:23 +0800, He Chen wrote:
> As Andrew said, CLOS is currently managed per-domain in Xen and it
> works well so far. So in the initial design, I am inclined to carry
> this behavior (per-socket) over to L2 CAT, to keep consistency between
> L2 and L3 CAT. Any thoughts?
> 
FWIW, I think this is fine. If at some point we want something
different, we can always extend.

Perhaps we should keep this (the fact that we may want to make things
more fine-grained) in mind when designing the interface.

Regards,
Dario

-- 
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC Design Doc] Intel L2 Cache Allocation Technology (L2 CAT) Feature enabling

2016-05-16 Thread He Chen
On Fri, May 13, 2016 at 06:17:53PM +0200, Dario Faggioli wrote:
> On Fri, 2016-05-13 at 10:23 +0100, Andrew Cooper wrote:
> > On 13/05/16 09:55, Jan Beulich wrote:
> > > 
> > > But anyway, L2 or L3 - I can't see how this context switching would
> > > DTRT when there are vCPU-s of different domains on the same
> > > socket (or core, if L2s and MSRs were per-core): The one getting
> > > scheduled later onto a socket (core) would simply overwrite what
> > > got written for the one which had been scheduled earlier.
> > PSR_ASSOC is a per-thread MSR which selects the CLOS to use.  CLOS is
> > currently managed per-domain in Xen, and context switched with vcpu.
> > 
> Yep, exactly. I did look a bit into this for CMT (so, not L3 CAT, but
> it's not that different).
> 
> Doing things on a per-vcpu basis could be interesting, and that's even
> more the case if we get to do L2 stuff, but there are too few RMIDs
> available for such a configuration to be really useful.
> 
> > Xen programs different capacity bitmaps into IA32_L2_QOS_MASK_0 ...
> > IA32_L2_QOS_MASK_n, and the CLOS selects which bitmap is enforced.
> > 
> So, basically, just to figure out if I understood (i.e., this is for He
> Chen).
> 
> If we have 2 sockets, with socket 0 containing cores 0,1,2,3 and
> socket 1 containing cores 4,5,6,7, it will be possible to specify two
> different "L2 reservation values" (in the form of CBMs, of course) for
> a domain:
>  - one would be how much L2 cache the domain will be able to use (say
>    X) when running on socket 0, which means on cores 0,1,2 or 3
>  - another would be how much L2 cache the domain will be able to use
>    (say, Y) when running on socket 1, which means on cores 4,5,6, or 7
> 
> Which in turn means that, in case L2 is per-core, the domain will get
> X of core 0's L2, X of core 1's L2, X of core 2's L2 and X of core
> 3's L2. On socket 1, it will get Y of core 4's L2, Y of core 5's L2,
> etc.
> 
> And so, in summary what we would not be able to specify is a different
> value for the L2 reservations of, for instance, core 1 and core 3
> (i.e., of cores that are part of the same socket).
> 
> Does this summary make sense?

Yes, great example, and that is exactly how L3 CAT works now.
Let's look at the source to make it clear:
```
void psr_ctxt_switch_to(struct domain *d)
{
    ...
    if ( psra->cos_mask )
        psr_assoc_cos(&reg, d->arch.psr_cos_ids ?
                      d->arch.psr_cos_ids[cpu_to_socket(smp_processor_id())] :
                      0, psra->cos_mask);
    ...
}
```
`psr_cos_ids` is indexed by socket id, which leads to the per-socket
cache enforcement.

As Andrew said, CLOS is currently managed per-domain in Xen and it works
well so far. So in the initial design, I am inclined to carry this
behavior (per-socket) over to L2 CAT, to keep consistency between L2 and
L3 CAT. Any thoughts?

Thanks,
-He




Re: [Xen-devel] [RFC Design Doc] Intel L2 Cache Allocation Technology (L2 CAT) Feature enabling

2016-05-13 Thread Dario Faggioli
On Fri, 2016-05-13 at 10:23 +0100, Andrew Cooper wrote:
> On 13/05/16 09:55, Jan Beulich wrote:
> > 
> > But anyway, L2 or L3 - I can't see how this context switching would
> > DTRT when there are vCPU-s of different domains on the same
> > socket (or core, if L2s and MSRs were per-core): The one getting
> > scheduled later onto a socket (core) would simply overwrite what
> > got written for the one which had been scheduled earlier.
> PSR_ASSOC is a per-thread MSR which selects the CLOS to use.  CLOS is
> currently managed per-domain in Xen, and context switched with vcpu.
> 
Yep, exactly. I did look a bit into this for CMT (so, not L3 CAT, but
it's not that different).

Doing things on a per-vcpu basis could be interesting, and that's even
more the case if we get to do L2 stuff, but there are too few RMIDs
available for such a configuration to be really useful.

> Xen programs different capacity bitmaps into IA32_L2_QOS_MASK_0 ...
> IA32_L2_QOS_MASK_n, and the CLOS selects which bitmap is enforced.
> 
So, basically, just to figure out if I understood (i.e., this is for He
Chen).

If we have 2 sockets, with socket 0 containing cores 0,1,2,3 and
socket 1 containing cores 4,5,6,7, it will be possible to specify two
different "L2 reservation values" (in the form of CBMs, of course) for
a domain:
 - one would be how much L2 cache the domain will be able to use (say
   X) when running on socket 0, which means on cores 0,1,2 or 3
 - another would be how much L2 cache the domain will be able to use
   (say, Y) when running on socket 1, which means on cores 4,5,6, or 7

Which in turn means that, in case L2 is per-core, the domain will get X
of core 0's L2, X of core 1's L2, X of core 2's L2 and X of core 3's
L2. On socket 1, it will get Y of core 4's L2, Y of core 5's L2, etc.

And so, in summary what we would not be able to specify is a different
value for the L2 reservations of, for instance, core 1 and core 3
(i.e., of cores that are part of the same socket).

Does this summary make sense?

Thanks and Regards,
Dario
-- 
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)





Re: [Xen-devel] [RFC Design Doc] Intel L2 Cache Allocation Technology (L2 CAT) Feature enabling

2016-05-13 Thread Andrew Cooper
On 13/05/16 09:55, Jan Beulich wrote:
 On 13.05.16 at 09:43,  wrote:
>> On 13/05/2016 07:48, Jan Beulich wrote:
>> On 13.05.16 at 08:26,  wrote:
 On Thu, May 12, 2016 at 04:05:36AM -0600, Jan Beulich wrote:
 On 12.05.16 at 11:40,  wrote:
>> We plan to bring new PQoS feature called Intel L2 Cache Allocation
>> Technology (L2 CAT) to Xen.
>>
>> L2 CAT is supported on Atom codename Goldmont and beyond. “Big-core”
>> Xeon does not support L2 CAT in current generations.
> Looks mostly like a direct (and hence reasonable) extension of what
> we have for L3 right now. One immediate question I have is whether
> tying this to per-socket information is a good idea. As soon as Xeon-s
> would also gain such functionality, things would (aiui) need to become
> per-core (as L2 is per core there iirc).
>
 The L2 cache capability is the same across all cores in a socket, so we
 make it per-socket to balance code complexity and accessibility.

 I am not an expert in the scheduler; do you mean that in some cases a
 domain would apply a different L2 cache access pattern when it is
 scheduled on different cores, even though the cores are in the same
 socket?
>>> No, I mean different domains may be running on different cores,
>>> and hence different policies may be needed to accommodate them
>>> all.
>> From the description, it sounds like L2 behaves almost exactly like L3. 
>> There is one set of capacity bitmaps which apply to all L2 caches in the
>> socket, and the specific capacity bitmap in effect is specified by
>> PSR_ASSOC CLOS, which is context switched with the vcpu.
> Well, I suppose the description is implying per-socket L2s. For per-
> core L2s I'd expect the MSRs to also become per-core.
>
> But anyway, L2 or L3 - I can't see how this context switching would
> DTRT when there are vCPU-s of different domains on the same
> socket (or core, if L2s and MSRs were per-core): The one getting
> scheduled later onto a socket (core) would simply overwrite what
> got written for the one which had been scheduled earlier.

PSR_ASSOC is a per-thread MSR which selects the CLOS to use.  CLOS is
currently managed per-domain in Xen, and context switched with vcpu.

Xen programs different capacity bitmaps into IA32_L2_QOS_MASK_0 ...
IA32_L2_QOS_MASK_n, and the CLOS selects which bitmap is enforced.

~Andrew



Re: [Xen-devel] [RFC Design Doc] Intel L2 Cache Allocation Technology (L2 CAT) Feature enabling

2016-05-13 Thread Jan Beulich
>>> On 13.05.16 at 09:43,  wrote:
> On 13/05/2016 07:48, Jan Beulich wrote:
> On 13.05.16 at 08:26,  wrote:
>>> On Thu, May 12, 2016 at 04:05:36AM -0600, Jan Beulich wrote:
>>> On 12.05.16 at 11:40,  wrote:
> We plan to bring new PQoS feature called Intel L2 Cache Allocation
> Technology (L2 CAT) to Xen.
>
> L2 CAT is supported on Atom codename Goldmont and beyond. “Big-core”
> Xeon does not support L2 CAT in current generations.
 Looks mostly like a direct (and hence reasonable) extension of what
 we have for L3 right now. One immediate question I have is whether
 tying this to per-socket information is a good idea. As soon as Xeon-s
 would also gain such functionality, things would (aiui) need to become
 per-core (as L2 is per core there iirc).

>>> The L2 cache capability is the same across all cores in a socket,
>>> so we make it per-socket to balance code complexity and
>>> accessibility.
>>>
>>> I am not an expert in the scheduler; do you mean that in some cases
>>> a domain would apply a different L2 cache access pattern when it is
>>> scheduled on different cores, even though the cores are in the same
>>> socket?
>> No, I mean different domains may be running on different cores,
>> and hence different policies may be needed to accommodate them
>> all.
> 
> From the description, it sounds like L2 behaves almost exactly like L3. 
> There is one set of capacity bitmaps which apply to all L2 caches in the
> socket, and the specific capacity bitmap in effect is specified by
> PSR_ASSOC CLOS, which is context switched with the vcpu.

Well, I suppose the description is implying per-socket L2s. For per-
core L2s I'd expect the MSRs to also become per-core.

But anyway, L2 or L3 - I can't see how this context switching would
DTRT when there are vCPU-s of different domains on the same
socket (or core, if L2s and MSRs were per-core): The one getting
scheduled later onto a socket (core) would simply overwrite what
got written for the one which had been scheduled earlier.

Jan



Re: [Xen-devel] [RFC Design Doc] Intel L2 Cache Allocation Technology (L2 CAT) Feature enabling

2016-05-13 Thread Andrew Cooper
On 13/05/2016 07:48, Jan Beulich wrote:
 On 13.05.16 at 08:26,  wrote:
>> On Thu, May 12, 2016 at 04:05:36AM -0600, Jan Beulich wrote:
>> On 12.05.16 at 11:40,  wrote:
 We plan to bring new PQoS feature called Intel L2 Cache Allocation
 Technology (L2 CAT) to Xen.

 L2 CAT is supported on Atom codename Goldmont and beyond. “Big-core”
 Xeon does not support L2 CAT in current generations.
>>> Looks mostly like a direct (and hence reasonable) extension of what
>>> we have for L3 right now. One immediate question I have is whether
>>> tying this to per-socket information is a good idea. As soon as Xeon-s
>>> would also gain such functionality, things would (aiui) need to become
>>> per-core (as L2 is per core there iirc).
>>>
>> The L2 cache capability is the same across all cores in a socket, so
>> we make it per-socket to balance code complexity and accessibility.
>>
>> I am not an expert in the scheduler; do you mean that in some cases a
>> domain would apply a different L2 cache access pattern when it is
>> scheduled on different cores, even though the cores are in the same
>> socket?
> No, I mean different domains may be running on different cores,
> and hence different policies may be needed to accommodate them
> all.

From the description, it sounds like L2 behaves almost exactly like L3. 
There is one set of capacity bitmaps which apply to all L2 caches in the
socket, and the specific capacity bitmap in effect is specified by
PSR_ASSOC CLOS, which is context switched with the vcpu.

~Andrew



Re: [Xen-devel] [RFC Design Doc] Intel L2 Cache Allocation Technology (L2 CAT) Feature enabling

2016-05-13 Thread Jan Beulich
>>> On 13.05.16 at 08:26,  wrote:
> On Thu, May 12, 2016 at 04:05:36AM -0600, Jan Beulich wrote:
>> >>> On 12.05.16 at 11:40,  wrote:
>> > We plan to bring new PQoS feature called Intel L2 Cache Allocation
>> > Technology (L2 CAT) to Xen.
>> > 
>> > L2 CAT is supported on Atom codename Goldmont and beyond. “Big-core”
>> > Xeon does not support L2 CAT in current generations.
>> 
>> Looks mostly like a direct (and hence reasonable) extension of what
>> we have for L3 right now. One immediate question I have is whether
>> tying this to per-socket information is a good idea. As soon as Xeon-s
>> would also gain such functionality, things would (aiui) need to become
>> per-core (as L2 is per core there iirc).
>> 
> 
> The L2 cache capability is the same across all cores in a socket, so
> we make it per-socket to balance code complexity and accessibility.
>
> I am not an expert in the scheduler; do you mean that in some cases a
> domain would apply a different L2 cache access pattern when it is
> scheduled on different cores, even though the cores are in the same
> socket?

No, I mean different domains may be running on different cores,
and hence different policies may be needed to accommodate them
all.

Jan



Re: [Xen-devel] [RFC Design Doc] Intel L2 Cache Allocation Technology (L2 CAT) Feature enabling

2016-05-13 Thread He Chen
On Thu, May 12, 2016 at 04:05:36AM -0600, Jan Beulich wrote:
> >>> On 12.05.16 at 11:40,  wrote:
> > % Intel L2 Cache Allocation Technology (L2 CAT) Feature
> > % Revision 1.0
> > 
> > \clearpage
> > 
> > Hi all,
> > 
> > We plan to bring new PQoS feature called Intel L2 Cache Allocation
> > Technology (L2 CAT) to Xen.
> > 
> > L2 CAT is supported on Atom codename Goldmont and beyond. “Big-core”
> > Xeon does not support L2 CAT in current generations.
> 
> Looks mostly like a direct (and hence reasonable) extension of what
> we have for L3 right now. One immediate question I have is whether
> tying this to per-socket information is a good idea. As soon as Xeon-s
> would also gain such functionality, things would (aiui) need to become
> per-core (as L2 is per core there iirc).
> 

The L2 cache capability is the same across all cores in a socket, so we
make it per-socket to balance code complexity and accessibility.

I am not an expert in the scheduler; do you mean that in some cases a
domain would apply a different L2 cache access pattern when it is
scheduled on different cores, even though the cores are in the same
socket?

> The other question is whether with Xen we care enough about Atoms
> to add code that's of use only there.
> 

L2 CAT is a platform-independent feature; although it first shows up in
Atoms, I believe it will appear on other platforms soon.

Thanks,
-He



Re: [Xen-devel] [RFC Design Doc] Intel L2 Cache Allocation Technology (L2 CAT) Feature enabling

2016-05-12 Thread Andrew Cooper
On 12/05/16 10:40, He Chen wrote:
> % Intel L2 Cache Allocation Technology (L2 CAT) Feature
> % Revision 1.0
>
> \clearpage
>
> Hi all,
>
> We plan to bring new PQoS feature called Intel L2 Cache Allocation
> Technology (L2 CAT) to Xen.
>
> L2 CAT is supported on Atom codename Goldmont and beyond. “Big-core”
> Xeon does not support L2 CAT in current generations.
>
> This is the initial design of L2 CAT. It might be a little long and
> detailed, hope it doesn't matter.
>
> Comments and suggestions are welcome :-)

First of all, thank you very much for choosing to do the doc like this.
It is nice to see this format starting to get used.

> ## The relationship between L2 CAT and L3 CAT/CDP
>
> L2 CAT is independent of L3 CAT/CDP, which means L2 CAT would be enabled
> while L3 CAT/CDP is disabled, or L2 CAT and L3 CAT/CDP are all enabled.

The wording here is a little odd, given that no hardware currently
supports both L2 and L3.

It might be easier to say:

L2 CAT is independent of L3 CAT/CDP, and both may be enabled at the same
time.


Otherwise, everything else looks great.

~Andrew



Re: [Xen-devel] [RFC Design Doc] Intel L2 Cache Allocation Technology (L2 CAT) Feature enabling

2016-05-12 Thread Jan Beulich
>>> On 12.05.16 at 11:40,  wrote:
> % Intel L2 Cache Allocation Technology (L2 CAT) Feature
> % Revision 1.0
> 
> \clearpage
> 
> Hi all,
> 
> We plan to bring new PQoS feature called Intel L2 Cache Allocation
> Technology (L2 CAT) to Xen.
> 
> L2 CAT is supported on Atom codename Goldmont and beyond. “Big-core”
> Xeon does not support L2 CAT in current generations.

Looks mostly like a direct (and hence reasonable) extension of what
we have for L3 right now. One immediate question I have is whether
tying this to per-socket information is a good idea. As soon as Xeon-s
would also gain such functionality, things would (aiui) need to become
per-core (as L2 is per core there iirc).

The other question is whether with Xen we care enough about Atoms
to add code that's of use only there.

Jan



[Xen-devel] [RFC Design Doc] Intel L2 Cache Allocation Technology (L2 CAT) Feature enabling

2016-05-12 Thread He Chen
% Intel L2 Cache Allocation Technology (L2 CAT) Feature
% Revision 1.0

\clearpage

Hi all,

We plan to bring new PQoS feature called Intel L2 Cache Allocation
Technology (L2 CAT) to Xen.

L2 CAT is supported on Atom codename Goldmont and beyond. “Big-core”
Xeon does not support L2 CAT in current generations.

This is the initial design of L2 CAT. It might be a little long and
detailed; I hope that doesn't matter.

Comments and suggestions are welcome :-)

# Basics

 
         Status: **Tech Preview**

Architecture(s): Intel x86

   Component(s): Hypervisor, toolstack

       Hardware: Atom codename Goldmont and beyond
 

# Overview

L2 CAT allows an OS or hypervisor/VMM to control allocation of a CPU's
shared L2 cache based on application priority or Class of Service
(COS). Each COS is configured using capacity bitmasks (CBMs) which
represent cache capacity and indicate the degree of overlap and
isolation between classes. Once L2 CAT is configured, the processor
allows access to portions of the L2 cache according to the established
COS.

# Technical information

L2 CAT is a member of the Intel PQoS feature family and part of CAT; it
shares some base PSR infrastructure in Xen.

## Hardware perspective

L2 CAT defines a new range of MSRs to assign different L2 cache access
patterns, known as CBMs (Capacity BitMasks); each CBM is associated
with a COS.

```

    IA32_PQR_ASSOC      +--------------------+----------------+
  +----+-----+----+     |  MSR (per socket)  |    Address     |
  |    | COS |    |     +--------------------+----------------+
  +----+--+--+----+     | IA32_L2_QOS_MASK_0 | 0xD10          |
          |             +--------------------+----------------+
          +-----------> | ...                | ...            |
                        +--------------------+----------------+
                        | IA32_L2_QOS_MASK_n | 0xD10+n (n<64) |
                        +--------------------+----------------+
```

When a context switch happens, the COS of the vCPU is written to the
per-thread MSR `IA32_PQR_ASSOC`, and hardware then enforces L2 cache
allocation according to the corresponding CBM.

## The relationship between L2 CAT and L3 CAT/CDP

L2 CAT is independent of L3 CAT/CDP: L2 CAT may be enabled while L3
CAT/CDP is disabled, or both may be enabled at the same time.

L2 CAT uses a new range of CBM MSRs, from 0xD10 to 0xD10+n (n<64),
following the L3 CAT/CDP CBM MSRs, and supports setting L2 cache access
patterns different from those of the L3 cache.

N.B. L2 CAT and L3 CAT/CDP share the same COS field in the same
association register `IA32_PQR_ASSOC`, which means one COS corresponds
to a pair of L2 CBM and L3 CBM.

In the initial implementation, L2 CAT first shows up on Atom codename
Goldmont, and no platform supports both L2 and L3 CAT so far.

## Design Overview

* Core COS/CBM association

  When enforcing L2 CAT, all cores of all domains start with the same
  default COS (COS0), which is associated with the fully open CBM (an
  all-ones bitmask) and thus allows access to the entire L2 cache. The
  default COS is used only in the hypervisor and is transparent to the
  toolstack and user.

  The system administrator can change the PQoS allocation policy at
  runtime via the toolstack. Since L2 CAT shares COS with L3 CAT/CDP,
  a COS corresponds to a 2-tuple [L2 CBM, L3 CBM] when only CAT is
  enabled; when CDP is enabled, one COS corresponds to a 3-tuple
  [L2 CBM, L3 Code_CBM, L3 Data_CBM]. If neither L3 CAT nor L3 CDP is
  enabled, things are simpler: one COS corresponds to one L2 CBM.

* VCPU schedule

  This part reuses L3 CAT COS infrastructure.

* Multi-sockets

  Different sockets may have different L2 CAT capabilities (e.g. max
  COS), although capability is consistent within a socket. The L2 CAT
  capability is therefore reported per socket.

## Implementation Description

* Hypervisor interfaces:

  1. Ext: Boot line parameter "psr=cat" now enables both L2 CAT and
  L3 CAT if the hardware supports them.

  2. New: SYSCTL:
  - XEN_SYSCTL_PSR_CAT_get_l2_info: Get L2 CAT information.

  3. New: DOMCTL:
  - XEN_DOMCTL_PSR_CAT_OP_GET_L2_CBM: Get L2 CBM for a domain.
  - XEN_DOMCTL_PSR_CAT_OP_SET_L2_CBM: Set L2 CBM for a domain.

* xl interfaces:

  1. Ext: psr-cat-show: Show system/domain L2 CAT information.
  => XEN_SYSCTL_PSR_CAT_get_l2_info /
 XEN_DOMCTL_PSR_CAT_OP_GET_L2_CBM

  2. Ext: psr-mba-set -l2 domain-id cbm
  Set L2 cbm for a domain.
  => XEN_DOMCTL_PSR_CAT_OP_SET_L2_CBM

* Key data structure:

  1. Combined PSR bitmasks structure

 ```
 struct psr_mask {
 struct l3_cat {
 union {
 uint64_t cbm;
 struct {
 uint64_t code;
 uint64_t data;
 };
 
