Re: [Xen-devel] [PATCH v2 07/15] x86: implement set value flow for MBA

2017-08-31 Thread Roger Pau Monn
On Thu, Aug 31, 2017 at 06:10:07PM +0800, Yi Sun wrote:
> On 17-08-31 10:30:37, Roger Pau Monn� wrote:
> > On Thu, Aug 31, 2017 at 05:13:08PM +0800, Yi Sun wrote:
> > > On 17-08-31 09:30:47, Roger Pau Monn� wrote:
> > > > On Thu, Aug 31, 2017 at 10:20:10AM +0800, Yi Sun wrote:
> > > > > On 17-08-30 09:31:04, Roger Pau Monn� wrote:
> > > > > > On Thu, Aug 24, 2017 at 09:14:41AM +0800, Yi Sun wrote:
> > > > > Even user inputs a value that does not fulfill HW requirement, HW can 
> > > > > handle it.
> > > > > So, we do not need return error to user. Otherwise, user needs to 
> > > > > know details
> > > > > of MBA.
> > > > > 
> > > > > But the issue here is how we get the actual value and show it to 
> > > > > user. There are
> > > > > two ways to do that:
> > > > > 1. When setting value, check and change it to valid one and save it 
> > > > > to our cache.
> > > > 
> > > > Instead of performing those checks manually, why not simply write the
> > > > value and read it back to see the one the hardware has actually
> > > > chosen (and cache it)?
> > > > 
> > > So, I guess you prefer optioin 2, right? If so, do we need check if input 
> > > value
> > > is valid? I see above comment that you prefer to return an error and 
> > > leave the
> > > decision to user.
> > 
> > Yes, I think so. Is reading the MSR in any performance critical path?
> > Or it's just used by the user-space tools?
> > 
> Just used by user-space tools.

Then doing a MSR read each time doesn't seem that bad IMHO (it's not a
critical path anyway).

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 10/15] tools: implement the new libxl get hw info interface

2017-08-31 Thread Roger Pau Monn
On Thu, Aug 31, 2017 at 05:19:56PM +0800, Yi Sun wrote:
> On 17-08-31 09:40:37, Roger Pau Monn� wrote:
> > On Thu, Aug 31, 2017 at 11:16:34AM +0800, Yi Sun wrote:
> > > On 17-08-30 10:15:22, Roger Pau Monn� wrote:
> > > > On Thu, Aug 24, 2017 at 09:14:44AM +0800, Yi Sun wrote:
> > Has libxl_psr_cat_get_info been present in any Xen release? (I don't
> > think so) If it hasn't then you can change the interface without
> > issues.
> > 
> libxl_psr_cat_get_info is just introduced and not released yet. But 
> libxl_psr_cat_get_l3_info has been released for long time. It declares
> 'int *nr' and it calls 'libxl_psr_cat_get_info' to do the work. So, there
> must be a casting anyway.
> 
> What do you suggest? Shall I submit a patch to fix 'libxl_psr_cat_get_info'?
> Or, a patch to fix both 'libxl_psr_cat_get_info' and 
> 'libxl_psr_cat_get_l3_info'?

Just fixing libxl_psr_cat_get_info would be fine IMHO, let's try to
limit the int usage as much as possible.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 07/15] x86: implement set value flow for MBA

2017-08-31 Thread Roger Pau Monn
On Thu, Aug 31, 2017 at 05:13:08PM +0800, Yi Sun wrote:
> On 17-08-31 09:30:47, Roger Pau Monn� wrote:
> > On Thu, Aug 31, 2017 at 10:20:10AM +0800, Yi Sun wrote:
> > > On 17-08-30 09:31:04, Roger Pau Monn� wrote:
> > > > On Thu, Aug 24, 2017 at 09:14:41AM +0800, Yi Sun wrote:
> > > Even user inputs a value that does not fulfill HW requirement, HW can 
> > > handle it.
> > > So, we do not need return error to user. Otherwise, user needs to know 
> > > details
> > > of MBA.
> > > 
> > > But the issue here is how we get the actual value and show it to user. 
> > > There are
> > > two ways to do that:
> > > 1. When setting value, check and change it to valid one and save it to 
> > > our cache.
> > 
> > Instead of performing those checks manually, why not simply write the
> > value and read it back to see the one the hardware has actually
> > chosen (and cache it)?
> > 
> So, I guess you prefer optioin 2, right? If so, do we need check if input 
> value
> is valid? I see above comment that you prefer to return an error and leave the
> decision to user.

Yes, I think so. Is reading the MSR in any performance critical path?
Or it's just used by the user-space tools?

> > Is there any risk in writing an invalid value?
> 
> If input value is no more than the thrtl_max got through CPUID, it is fine 
> that
> HW can handle it.

OK, so I would just check for that then.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 11/15] tools: implement the new xl get hw info interface

2017-08-31 Thread Roger Pau Monn
On Thu, Aug 31, 2017 at 01:57:09PM +0800, Yi Sun wrote:
> On 17-08-30 10:23:18, Roger Pau Monn� wrote:
> > On Thu, Aug 24, 2017 at 09:14:45AM +0800, Yi Sun wrote:
> > > +static int psr_mba_hwinfo(void)
> > > +{
> > > +int rc;
> > > +unsigned int i, nr;
> > > +libxl_psr_hw_info *info;
> > > +
> > > +rc = libxl_psr_get_hw_info(ctx, , ,
> > > +   LIBXL_PSR_FEAT_TYPE_MBA, 0);
> > > +if (rc)
> > > +return rc;
> > > +
> > > +printf("Memory Bandwidth Allocation (MBA):\n");
> > > +
> > > +for (i = 0; i < nr; i++) {
> > > +printf("%-16s: %u\n", "Socket ID", info[i].id);
> > > +printf("%-16s: %s\n", "Linear Mode",
> > > +   info[i].u.mba.linear ? "Enabled" : "Disabled");
> > > +printf("%-16s: %u\n", "Maximum COS", info[i].u.mba.cos_max);
> > > +printf("%-16s: %u\n", "Maximum Throttling Value",
> > > +   info[i].u.mba.thrtl_max);
> > > +printf("%-16s: %u\n", "Default Throttling Value", 0);
> > 
> > If you really want to left-justify, shouldn't you choose a value that
> > aligns everything nicely (strlen("Default Throttling Valu") is
> > already greater than 16).
> > 
> Sorry for missing this.
> 
> > In fact you can do the alignment manually in the format string, and
> > avoid passing the name as the first parameter.
> > 
> DYM a sentence like below?
>   printf("%-*s: %u\n", 23, "Default Throttling Value", 0);

I was thinking more like:

printf(" Maximum COS: %u\n", ...
printf("Maximum Throttling Value: %u\n", ...
printf("Default Throttling Value: %u\n", ...

Or however you wish to align them. Although I don't have a strong
opinion, as long as it's easy to read.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 10/15] tools: implement the new libxl get hw info interface

2017-08-31 Thread Roger Pau Monn
On Thu, Aug 31, 2017 at 11:16:34AM +0800, Yi Sun wrote:
> On 17-08-30 10:15:22, Roger Pau Monn� wrote:
> > On Thu, Aug 24, 2017 at 09:14:44AM +0800, Yi Sun wrote:
> > > diff --git a/tools/libxl/libxl_psr.c b/tools/libxl/libxl_psr.c
> > > index b183305..d7da7d7 100644
> > > --- a/tools/libxl/libxl_psr.c
> > > +++ b/tools/libxl/libxl_psr.c
> > > @@ -382,56 +382,51 @@ static inline xc_psr_feat_type 
> > > libxl__psr_feat_type_to_libxc_psr_feat_type(
> > >  return xc_type;
> > >  }
> > >  
> > > +static inline int libxl__psr_hw_info_to_libxl_psr_cat_info(
> > 
> > No inline. Maybe you could try to shorter the name?
> > 
> Got it. Will remove the last '_psr'.
> 
> [...]
> 
> > >  int libxl_psr_cat_get_info(libxl_ctx *ctx, libxl_psr_cat_info **info,
> > > int *nr, unsigned int lvl)
> > >  {
> > >  GC_INIT(ctx);
> > >  int rc;
> > > -int i = 0, socketid, nr_sockets;
> > > -libxl_bitmap socketmap;
> > > +unsigned int i;
> > > +libxl_psr_hw_info *hw_info;
> > >  libxl_psr_cat_info *ptr;
> > > -xc_psr_hw_info hw_info;
> > > -xc_psr_feat_type xc_type;
> > > -
> > > -libxl_bitmap_init();
> > > -
> > > -rc = libxl__count_physical_sockets(gc, _sockets);
> > > -if (rc) {
> > > -LOGE(ERROR, "failed to get system socket count");
> > > -goto out;
> > > -}
> > >  
> > > -libxl_socket_bitmap_alloc(ctx, , nr_sockets);
> > > -rc = libxl_get_online_socketmap(ctx, );
> > > -if (rc < 0) {
> > > -LOGE(ERROR, "failed to get available sockets");
> > > +rc = libxl_psr_get_hw_info(ctx, _info, (unsigned int *)nr,
> > 
> > Is there any reason nr is int instead of unsigned int?
> > 
> > I would rather avoid casting things. Since this interface has not been
> > present in a release yet, could you please send a separate patch to
> > fix this if nr has no reason to be signed?
> > 
> This is a historical issue.
> 
> The first version of PSR introduced 'libxl_psr_cat_get_l3_info'. The input
> parameter is 'int *nr'.
> 
> I think we cannot change the interface which has been merged and used by
> others. Right?

Has libxl_psr_cat_get_info been present in any Xen release? (I don't
think so) If it hasn't then you can change the interface without
issues.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 08/15] tools: create general interfaces to support psr allocation features

2017-08-31 Thread Roger Pau Monn
On Thu, Aug 31, 2017 at 10:38:46AM +0800, Yi Sun wrote:
> On 17-08-30 09:42:56, Roger Pau Monn� wrote:
> > On Thu, Aug 24, 2017 at 09:14:42AM +0800, Yi Sun wrote:
> > > This patch creates general interfaces in libxl to support all psr
> > > allocation features.
> > > 
> > > Add 'LIBXL_HAVE_PSR_MBA' to indicate interface change.
> > 
> > I'm not sure this is enough to cover the changes you are doing here:
> > you are introducing some MBA stuff, plus a kind of generic interface
> > for PSR.
> > 
> > I think this should be split into two patches, the first one adding
> > the generic interface, and the second one adding the MBA stuff.
> > 
> This patch only introduces the generic interfaces without any MBA stuff.
> 
> The 'LIBXL_HAVE_PSR_MBA' is used to indicate the interfaces change here.
> Per my understand, we should add a macro to indicate libxl interfaces
> change, right?

So if there's no MBA specific interfaces introduced, why name the
define PSR_MBA? Maybe PSR_GENERIC or whatever you find more suitable.

> > > diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> > > index 6e80d36..ab847f8 100644
> > > --- a/tools/libxl/libxl_types.idl
> > > +++ b/tools/libxl/libxl_types.idl
> > > @@ -977,6 +977,7 @@ libxl_psr_cbm_type = Enumeration("psr_cbm_type", [
> > >  (2, "L3_CBM_CODE"),
> > >  (3, "L3_CBM_DATA"),
> > >  (4, "L2_CBM"),
> > > +(5, "MBA_THRTL"),
> > 
> > Is this really a CBM type?
> > 
> This is not CBM type. The 'libxl_psr_cbm_type' name is not good enough. But I
> have to introduce a new generic interface here if we want to make the name be
> generic. I think it is not so valuable. So, I reuse the 'libxl_psr_cbm_type'
> to cover MBA. How do you think?

Maybe you could introduce a new typedef, so that old code call still
work, ie:

typedef enum libxl_psr_cbm_type libxl_psr_type;

(Or whatever name you find suitable).

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 07/15] x86: implement set value flow for MBA

2017-08-31 Thread Roger Pau Monn
On Thu, Aug 31, 2017 at 10:20:10AM +0800, Yi Sun wrote:
> On 17-08-30 09:31:04, Roger Pau Monn� wrote:
> > On Thu, Aug 24, 2017 at 09:14:41AM +0800, Yi Sun wrote:
> > > diff --git a/xen/arch/x86/psr.c b/xen/arch/x86/psr.c
> > > index 4a0c982..ce82975 100644
> > > --- a/xen/arch/x86/psr.c
> > > +++ b/xen/arch/x86/psr.c
> > > @@ -138,6 +138,12 @@ static const struct feat_props {
> > >  
> > >  /* write_msr is used to write out feature MSR register. */
> > >  void (*write_msr)(unsigned int cos, uint32_t val, enum psr_val_type 
> > > type);
> > > +
> > > +/*
> > > + * check_val is used to check if input val fulfills SDM requirement.
> > > + * Change it to valid value if SDM allows.
> > 
> > I'm not really sure it's a good idea to change the value to a valid
> > one, IMHO you should just check and print an error if the value is
> > invalid (and return false of course).
> > 
> Per SDM, the HW has ability to automatically change the input value to what it
> wants. E.g:
>   Linear mode: HW wants the input value be 10/20/30/.../90. But user inputs 
> 15.
>Then, HW can automatically change it to 10.

This seems like a very, very bad idea IMHO. What if I input 15 (which
is invalid), but I would rather prefer 20 instead of 10?

I would very much prefer an error, so I can get to chose a valid value
myself instead of the hardware deciding on my back.

> Even user inputs a value that does not fulfill HW requirement, HW can handle 
> it.
> So, we do not need return error to user. Otherwise, user needs to know details
> of MBA.
> 
> But the issue here is how we get the actual value and show it to user. There 
> are
> two ways to do that:
> 1. When setting value, check and change it to valid one and save it to our 
> cache.

Instead of performing those checks manually, why not simply write the
value and read it back to see the one the hardware has actually
chosen (and cache it)?

Is there any risk in writing an invalid value?

> 2. When getting value, call rdmsr to read the actual value back from HW.
> 
> I think option 1 has better performance and the code looks better.
> 
> > > + */
> > > +bool (*check_val)(const struct feat_node *feat, unsigned long *val);
> > >  } *feat_props[FEAT_TYPE_NUM];
> > >  
> [...]
> 
> > >  /* L3 CAT props */
> > >  static void l3_cat_write_msr(unsigned int cos, uint32_t val,
> > >   enum psr_val_type type)
> > > @@ -446,6 +453,7 @@ static const struct feat_props l3_cat_props = {
> > >  .alt_type = PSR_VAL_TYPE_UNKNOWN,
> > >  .get_feat_info = cat_get_feat_info,
> > >  .write_msr = l3_cat_write_msr,
> > > +.check_val = cat_check_cbm,
> > >  };
> > 
> > Maybe the introduction of check_val should be a separate patch? It's
> > mostly code movement and some fixup.
> > 
> Ok, may consider it.
> 
> [...]
> 
> > > +static bool mba_check_thrtl(const struct feat_node *feat, unsigned long 
> > > *thrtl)
> > > +{
> > > +if ( *thrtl > feat->mba_info.thrtl_max )
> > > +return false;
> > > +
> > > +/*
> > > + * Per SDM (chapter "Memory Bandwidth Allocation Configuration"):
> > > + * 1. Linear mode: In the linear mode the input precision is defined
> > > + *as 100-(MBA_MAX). For instance, if the MBA_MAX value is 90, the
> > > + *input precision is 10%. Values not an even multiple of the
> > > + *precision (e.g., 12%) will be rounded down (e.g., to 10% delay
> > > + *applied).
> > > + * 2. Non-linear mode: Input delay values are powers-of-two from zero
> > > + *to the MBA_MAX value from CPUID. In this case any values not a
> > > + *power of two will be rounded down the next nearest power of 
> > > two.
> > > + */
> > > +if ( feat->mba_info.linear )
> > > +{
> > > +unsigned int mod;
> > > +
> > > +mod = *thrtl % (100 - feat->mba_info.thrtl_max);
> > > +*thrtl -= mod;
> > > +}
> > > +else
> > > +{
> > > +/* Not power of 2. */
> > > +if ( *thrtl && (*thrtl & (*thrtl - 1)) )
> > 
> > This can be joined with the else to avoid another indentation level:
> > 
> > else if ( *thrtl && (*thrtl & (*thrtl - 1)) )
> > ...
> > 
> Thanks!
> 
> > > +*thrtl = *

Re: [Xen-devel] [PATCH v2 04/15] x86: implement data structure and CPU init flow for MBA

2017-08-30 Thread Roger Pau Monn
On Wed, Aug 30, 2017 at 01:31:51PM +0800, Yi Sun wrote:
> On 17-08-29 14:44:32, Roger Pau Monn� wrote:
> > On Thu, Aug 24, 2017 at 09:14:38AM +0800, Yi Sun wrote:
> > > +#define MBA_LINEAR (1u << 2)
> > 
> > Why is this shifted by 2?
> > 
> This is used to mask ECX register to know if MBA linear mode is set or not.

Shouldn't this be in asm-x86/psr.h since it's part of the hardware
interface?

> > > +/* We reserve cos=0 as default thrtl (0) which means no delay. */
> > > +feat->cos_reg_val[0] = 0;
> > > +wrmsrl(MSR_IA32_PSR_MBA_MASK(0), 0);
> > > +
> > > +/* Add this feature into array. */
> > > +info->features[type] = feat;
> > > +
> > > +if ( !opt_cpu_info )
> > > +return 0;
> > > +
> > > +printk(XENLOG_INFO "MBA: enabled on socket %u, cos_max:%u,"
> > > +   "thrtl_max:%u, linear:%u.\n",
> > 
> > Please try to avoid splitting messages, it makes it hard to grep for
> > them afterward.
> > 
> Ok, but that would exceed 80 characters. Is that acceptable?

That's fine for log messages. Please add a newline after XENLOG_INFO
and align the start of the string to the start of XENLOG_INFO.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 03/15] x86: rename 'cbm_type' to 'psr_val_type' to make it general

2017-08-30 Thread Roger Pau Monn
On Wed, Aug 30, 2017 at 01:47:38PM +0800, Yi Sun wrote:
> On 17-08-29 13:15:24, Roger Pau Monn� wrote:
> > On Thu, Aug 24, 2017 at 09:14:37AM +0800, Yi Sun wrote:
> > > This patch renames 'cbm_type' to 'psr_val_type' to make it be general.
> > 
> > I'm unsure whether the '_val_' is really needed here. AFAICT it
> > doesn't add any meaningful information to the define itself, and only
> > makes it longer. Apart from that it looks fine.
> > 
> The 'cbm_type' means the CAT value (CBM) type. As MBA value type is throttle,
> I want to rename 'cbm_type' to a generic value type. Because there are many
> 'psr' starting names, to make it clear that this structure is for value type,
> I name it 'psr_val_type'.

But the enum also has the '_val' thing, and it's used by psr_get_info.
IMHO it would be better to just drop the '_val'.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 02/15] Rename PSR sysctl/domctl interfaces and xsm policy to make them be general

2017-08-30 Thread Roger Pau Monn
On Wed, Aug 30, 2017 at 01:23:08PM +0800, Yi Sun wrote:
> On 17-08-29 13:00:19, Roger Pau Monn� wrote:
> > On Thu, Aug 24, 2017 at 09:14:36AM +0800, Yi Sun wrote:
> > > --- a/xen/arch/x86/domctl.c
> > > +++ b/xen/arch/x86/domctl.c
> > > @@ -1439,60 +1439,60 @@ long arch_do_domctl(
> > >  }
> > >  break;
> > >  
> > > -case XEN_DOMCTL_psr_cat_op:
> > > -switch ( domctl->u.psr_cat_op.cmd )
> > > +case XEN_DOMCTL_psr_alloc_op:
> > > +switch ( domctl->u.psr_alloc_op.cmd )
> > >  {
> > >  uint32_t val32;
> > >  
> > >  case XEN_DOMCTL_PSR_CAT_OP_SET_L3_CBM:
> > 
> > In order to match the name of the structure used, shouldn't those
> > defines be renamed to XEN_DOMCTL_PSR_ALLOC_* (instead of CAT)?
> > 
> These names are feature specific. E.g:
> XEN_DOMCTL_PSR_CAT_OP_SET_L3_CBM - L3 CAT
> XEN_DOMCTL_PSR_MBA_OP_SET_THRTL  - MBA
> 
> Could you accept them? If not, I think I may write a separate patch to rename
> them.

Maybe I'm just lost, but in this patch you are renaming psr_cat_op to
psr_alloc_op, hence I think those defines should also be renamed in
this same patch.

I would also prefer that you don't introduce the '_OP_|_op_'
everywhere, I don't think it's useful/descriptive. Hypercalls are
operations by definition IMHO, hence the 'op' thing doesn't add
anything useful.

Ie: the struct should be psr_alloc, and the name
XEN_DOMCTL_psr_alloc.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 01/15] docs: create Memory Bandwidth Allocation (MBA) feature document

2017-08-30 Thread Roger Pau Monn
On Wed, Aug 30, 2017 at 01:20:14PM +0800, Yi Sun wrote:
> Thanks a lot for the review comments!
> 
> On 17-08-29 12:46:49, Roger Pau Monn� wrote:
> > On Thu, Aug 24, 2017 at 09:14:35AM +0800, Yi Sun wrote:
> > Although I have to admit I don't really understand this interface,
> > wouldn't it be easier to specify the memory bandwidth allowed
> > per-domain, rather the amount of bandwidth removed?
> > 
> > Using your approach the user has to first get the total bandwidth, and
> > then subtract the removed bandwidth in order to know the remaining
> > bandwidth for a domain.
> > 
> The HW only provides throttling set method to control the bandwidth. So, I
> think it is straightforward to set throttling in tools layer. The 
> 'psr-mba-set'
> is designed as a simple command to set what HW needs.
> 
> Also, mentioned by SDM, "The throttling values exposed by MBA are approximate,
> and are calibrated to specific traffic patterns.". So, it is hard to provide
> exact bandwidth control in 'psr-mba-set'.

OK, I think I will wait until I see the example explained in order to
express my opinion on the proposed toolstack interface.

> > Also, IMHO you should provide a command to print the max throttling,
> 
> The 'psr-hwinfo' can show the max throttling. Because it is part of MBA HW 
> info.
> 
> > remember that from Xen's PoV Dom0 is just another domain, and the
> > CPUID values reported to Dom0 don't need to be the same as found on
> > bare metal.
> > 
> But the CPUID values got through 'psr' commands should be ones found on bare
> metal, right? Because these commands directly get the values from hypervisor
> through domctl/sysctl.

Yes, if they are provided by the hypervisor (ie: cpuid instruction is
executed in Xen, not Dom0).

> > > + if the MBA_MAX value is 90, the input precision is 10%. Values not 
> > > an even
> > > + multiple of the precision (e.g., 12%) will be rounded down (e.g., 
> > > to 10%
> > > + delay applied) by HW automatically.
> > > +
> > > + Non-linear mode: input delay values are powers-of-two from zero to 
> > > the
> > > + MBA_MAX value from CPUID. In this case any values not a power of 
> > > two will
> > > + be rounded down the next nearest power of two by HW automatically.
> > > +
> > > +# Technical details
> > > +
> > > +MBA is a member of Intel PSR features, it shares the base PSR 
> > > infrastructure
> > > +in Xen.
> > > +
> > > +## Hardware perspective
> > > +
> > > +  MBA defines a range of MSRs to support specifying a delay value 
> > > (Thrtl) per
> > > +  COS, with details below.
> > > +
> > > +  ```
> > > +   +++
> > > +   | MSR (per socket)   |Address |
> > > +   +++
> > > +   | IA32_L2_QOS_Ext_BW_Thrtl_0 | 0xD50  |
> > > +   +++
> > > +   | ...|  ...   |
> > > +   +++
> > > +   | IA32_L2_QOS_Ext_BW_Thrtl_n | 0xD50+n (n<64) |
> > > +   +++
> > > +  ```
> > 
> > Are you sure you want to hardcode this n<64? Isn't there a chance this
> > is going to be bumped in newer hardware?
> > 
> This is just a HW limitation declared in SDM. In fact, there is no such hard
> code limitation. Hypervisor side checks the 'cos_max' got through CPUID.

Then I would remove the n<64, or else this will get out-of-sync
without anyone noticing.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [early RFC] ARM PCI Passthrough design document

2017-03-15 Thread Roger Pau Monn?
On Wed, Mar 15, 2017 at 11:54:07AM -0500, Venu Busireddy wrote:
> On Wed, Mar 15, 2017 at 04:38:39PM +0000, Roger Pau Monn? wrote:
> > On Wed, Mar 15, 2017 at 10:11:35AM -0500, Venu Busireddy wrote:
> > > On Wed, Mar 15, 2017 at 12:56:50PM +0000, Roger Pau Monn? wrote:
> > > > On Wed, Mar 15, 2017 at 08:42:04AM -0400, Konrad Rzeszutek Wilk wrote:
> > > > > On Wed, Mar 15, 2017 at 12:07:28PM +, Roger Pau Monn? wrote:
> > > > > > On Fri, Mar 10, 2017 at 10:28:43AM -0500, Konrad Rzeszutek Wilk 
> > > > > > wrote:
> > > > > > > On Fri, Mar 10, 2017 at 12:23:18PM +0900, Roger Pau Monn? wrote:
> > > > > > > > On Thu, Mar 09, 2017 at 07:29:34PM -0500, Konrad Rzeszutek Wilk 
> > > > > > > > wrote:
> > > > > > > > > On Thu, Mar 09, 2017 at 01:26:45PM +, Julien Grall wrote:
> > > > > > > > > > Hi Konrad,
> > > > > > > > > > 
> > > > > > > > > > On 09/03/17 11:17, Konrad Rzeszutek Wilk wrote:
> > > > > > > > > > > On Thu, Mar 09, 2017 at 11:59:51AM +0900, Roger Pau Monn? 
> > > > > > > > > > > wrote:
> > > > > > > > > > > > On Wed, Mar 08, 2017 at 02:12:09PM -0500, Konrad 
> > > > > > > > > > > > Rzeszutek Wilk wrote:
> > > > > > > > > > > > > On Wed, Mar 08, 2017 at 07:06:23PM +, Julien 
> > > > > > > > > > > > > Grall wrote:
> > > > > > > > > > > > > .. this as for SR-IOV devices you need the drivers to 
> > > > > > > > > > > > > kick the hardware
> > > > > > > > > > > > > to generate the new bus addresses. And those (along 
> > > > > > > > > > > > > with the BAR regions) are
> > > > > > > > > > > > > not visible in ACPI (they are constructued 
> > > > > > > > > > > > > dynamically).
> > > > > > > > > > > > 
> > > > > > > > > > > > There's already code in Xen [0] to find out the size of 
> > > > > > > > > > > > the BARs of SR-IOV
> > > > > > > > > > > > devices, but I'm not sure what's the intended usage of 
> > > > > > > > > > > > that, does it need to
> > > > > > > > > > > > happen _after_ the driver in Dom0 has done whatever 
> > > > > > > > > > > > magic for this to work?
> > > > > > > > > > > 
> > > > > > > > > > > Yes. This is called via the PHYSDEVOP_pci_device_add 
> > > > > > > > > > > hypercall when
> > > > > > > > > > > the device driver in dom0 has finished "creating" the VF. 
> > > > > > > > > > > See drivers/xen/pci.c
> > > > > > > > > > 
> > > > > > > > > > We are thinking to not use PHYSDEVOP_pci_device_add 
> > > > > > > > > > hypercall for ARM and do
> > > > > > > > > > the PCI scanning in Xen.
> > > > > > > > > > 
> > > > > > > > > > If I understand correctly what you said, only the PCI 
> > > > > > > > > > driver will be able to
> > > > > > > > > > kick SR-IOV device and Xen would not be able to detect the 
> > > > > > > > > > device until it
> > > > > > > > > > has been fully configured. So it would mean that we have to 
> > > > > > > > > > keep
> > > > > > > > > > PHYSDEVOP_pci_device_add around to know when Xen can use 
> > > > > > > > > > the device.
> > > > > > > > > > 
> > > > > > > > > > Am I correct?
> > > > > > > > > 
> > > > > > > > > Yes. Unless the PCI drivers come up with some other way to 
> > > > > > > > > tell the
> > > > > > > > > OS that oh, hey, there is this new PCI device with this BDF.
> > > > > > > > > 
> > > > > > > > > O

Re: [Xen-devel] [early RFC] ARM PCI Passthrough design document

2017-03-15 Thread Roger Pau Monn?
On Wed, Mar 15, 2017 at 10:11:35AM -0500, Venu Busireddy wrote:
> On Wed, Mar 15, 2017 at 12:56:50PM +0000, Roger Pau Monn? wrote:
> > On Wed, Mar 15, 2017 at 08:42:04AM -0400, Konrad Rzeszutek Wilk wrote:
> > > On Wed, Mar 15, 2017 at 12:07:28PM +0000, Roger Pau Monn? wrote:
> > > > On Fri, Mar 10, 2017 at 10:28:43AM -0500, Konrad Rzeszutek Wilk wrote:
> > > > > On Fri, Mar 10, 2017 at 12:23:18PM +0900, Roger Pau Monn? wrote:
> > > > > > On Thu, Mar 09, 2017 at 07:29:34PM -0500, Konrad Rzeszutek Wilk 
> > > > > > wrote:
> > > > > > > On Thu, Mar 09, 2017 at 01:26:45PM +, Julien Grall wrote:
> > > > > > > > Hi Konrad,
> > > > > > > > 
> > > > > > > > On 09/03/17 11:17, Konrad Rzeszutek Wilk wrote:
> > > > > > > > > On Thu, Mar 09, 2017 at 11:59:51AM +0900, Roger Pau Monn? 
> > > > > > > > > wrote:
> > > > > > > > > > On Wed, Mar 08, 2017 at 02:12:09PM -0500, Konrad Rzeszutek 
> > > > > > > > > > Wilk wrote:
> > > > > > > > > > > On Wed, Mar 08, 2017 at 07:06:23PM +, Julien Grall 
> > > > > > > > > > > wrote:
> > > > > > > > > > > .. this as for SR-IOV devices you need the drivers to 
> > > > > > > > > > > kick the hardware
> > > > > > > > > > > to generate the new bus addresses. And those (along with 
> > > > > > > > > > > the BAR regions) are
> > > > > > > > > > > not visible in ACPI (they are constructued dynamically).
> > > > > > > > > > 
> > > > > > > > > > There's already code in Xen [0] to find out the size of the 
> > > > > > > > > > BARs of SR-IOV
> > > > > > > > > > devices, but I'm not sure what's the intended usage of 
> > > > > > > > > > that, does it need to
> > > > > > > > > > happen _after_ the driver in Dom0 has done whatever magic 
> > > > > > > > > > for this to work?
> > > > > > > > > 
> > > > > > > > > Yes. This is called via the PHYSDEVOP_pci_device_add 
> > > > > > > > > hypercall when
> > > > > > > > > the device driver in dom0 has finished "creating" the VF. See 
> > > > > > > > > drivers/xen/pci.c
> > > > > > > > 
> > > > > > > > We are thinking to not use PHYSDEVOP_pci_device_add hypercall 
> > > > > > > > for ARM and do
> > > > > > > > the PCI scanning in Xen.
> > > > > > > > 
> > > > > > > > If I understand correctly what you said, only the PCI driver 
> > > > > > > > will be able to
> > > > > > > > kick SR-IOV device and Xen would not be able to detect the 
> > > > > > > > device until it
> > > > > > > > has been fully configured. So it would mean that we have to keep
> > > > > > > > PHYSDEVOP_pci_device_add around to know when Xen can use the 
> > > > > > > > device.
> > > > > > > > 
> > > > > > > > Am I correct?
> > > > > > > 
> > > > > > > Yes. Unless the PCI drivers come up with some other way to tell 
> > > > > > > the
> > > > > > > OS that oh, hey, there is this new PCI device with this BDF.
> > > > > > > 
> > > > > > > Or the underlaying bus on ARM can send some 'new device' 
> > > > > > > information?
> > > > > > 
> > > > > > Hm, is this something standard between all the SR-IOV 
> > > > > > implementations, or each
> > > > > > vendors have their own sauce?
> > > > > 
> > > > > Gosh, all of them have their own sauce. The only thing that is the 
> > > > > same
> > > > > is that suddenly behind the PF device there are PCI devies that are 
> > > > > responding
> > > > > to 0xcfc requests. MAgic!
> > > > 
> > > > I'm reading the PCI SR-IOV 1.1 spec, and I think we don't need to wait 
> > > > for the
> > > > device driver in Dom

Re: [Xen-devel] [PATCH v8 03/24] x86: refactor psr: implement main data structures.

2017-03-01 Thread Roger Pau Monn
On Wed, Mar 01, 2017 at 01:54:00AM -0700, Jan Beulich wrote:
> >>> On 01.03.17 at 09:49,  wrote:
> > What I was proposing is to make feat_node an array of pointers, so if a 
> > feature
> > is not supported that specific pointer would be NULL (ie: if feat_mask &
> > PSR_SOCKET_L3_CDP == 0, features[PSR_SOCKET_L3_CDP] == NULL). And then you 
> > can
> > avoid all the list traversing code.
> 
> In which case the feature flags would look to become redundant too.

I guess so, I'm just not sure if you can have a feature available (so
feat_mask bit set) but not enabled (so features[...] == NULL). Not sure if that
makes sense or is desirable at all.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v8 03/24] x86: refactor psr: implement main data structures.

2017-03-01 Thread Roger Pau Monn
On Wed, Mar 01, 2017 at 01:10:02PM +0800, Yi Sun wrote:
> On 17-02-28 11:58:59, Roger Pau Monn� wrote:
> > On Wed, Feb 15, 2017 at 04:49:18PM +0800, Yi Sun wrote:
> > > To construct an extendible framework, we need analyze PSR features
> > > and abstract the common things and feature specific things. Then,
> > > encapsulate them into different data structures.
> > > 
> > > By analyzing PSR features, we can get below map.
> > > +--+--+--+
> > >   ->| Dom0 | Dom1 | ...  |
> > >   | +--+--+--+
> > >   ||
> > >   |Dom ID  | cos_id of domain
> > >   |V
> > >   |
> > > +-+
> > > User ->| PSR  
> > >|
> > >  Socket ID |  +--+---+---+
> > >|
> > >|  | Socket0 Info | Socket 1 Info |...|
> > >|
> > >|  +--+---+---+
> > >|
> > >||   cos_id=0   cos_id=1   
> > >... |
> > >||  
> > > +---+---+---+ |
> > >||->Ref   : | ref 0 | ref 1
> > >  | ...   | |
> > >||  
> > > +---+---+---+ |
> > >||  
> > > +---+---+---+ |
> > >||->L3 CAT: | cos 0 | cos 1
> > >  | ...   | |
> > >||  
> > > +---+---+---+ |
> > >||  
> > > +---+---+---+ |
> > >||->L2 CAT: | cos 0 | cos 1
> > >  | ...   | |
> > >||  
> > > +---+---+---+ |
> > >||  
> > > +---+---+---+---+---+ |
> > >||->CDP   : | cos0 code | cos0 data | cos1 code | cos1 
> > > data | ...   | |
> > >|   
> > > +---+---+---+---+---+ |
> > >
> > > +-+
> > > 
> > > So, we need define a socket info data structure, 'struct
> > > psr_socket_info' to manage information per socket. It contains a
> > > reference count array according to COS ID and a feature list to
> > > manage all features enabled. Every entry of the reference count
> > > array is used to record how many domains are using the COS registers
> > > according to the COS ID. For example, L3 CAT and L2 CAT are enabled,
> > > Dom1 uses COS_ID=1 registers of both features to save CBM values, like
> > > below.
> > > +---+---+---+-+
> > > | COS 0 | COS 1 | COS 2 | ... |
> > > +---+---+---+-+
> > > L3 CAT  | 0x7ff | 0x1ff | ...   | ... |
> > > +---+---+---+-+
> > > L2 CAT  | 0xff  | 0xff  | ...   | ... |
> > > +---+---+---+-+
> > > 
> > > If Dom2 has same CBM values, it can reuse these registers which COS_ID=1.
> > > That means, both Dom1 and Dom2 use same COS registers(ID=1) to save same
> > > L3/L2 values. So, the value ref[1] is 2 which means 2 domains are using
> > > COS_ID 1.
> > > 
> > > To manage a feature, we need define a feature node data structure,
> > > 'struct feat_node', to manage feature's specific HW info, its callback
> > > functions (all feature's specific behaviors are encapsulated into these
> > > callback functions), and an array of all COS registers values of this
> > > feature.
> > > 
> > > CDP is a special feature which uses two entries of the array
> > > for one COS ID. So, the number of CDP C

Re: [Xen-devel] [PATCH v8 00/24] Enable L2 Cache Allocation Technology & Refactor psr.c

2017-03-01 Thread Roger Pau Monn
On Wed, Mar 01, 2017 at 12:54:54PM +0800, Yi Sun wrote:
> On 17-02-28 11:02:42, Roger Pau Monn� wrote:
> > On Wed, Feb 15, 2017 at 04:49:15PM +0800, Yi Sun wrote:
> > > Hi all,
> > > 
> > > We plan to bring a new PSR (Platform Shared Resource) feature called
> > > Intel L2 Cache Allocation Technology (L2 CAT) to Xen.
> > > 
> > > Besides the L2 CAT implementaion, we refactor the psr.c to make it more
> > > flexible to add new features and fulfill the principle, open for extension
> > > but closed for modification. We abstract the general operations of all
> > 
> > Hello,
> > 
> > I see that you use the term "open for extension but closed for modification"
> > here and in order patches. Could you please clarify what this means? "closed
> > for modification" doesn't seem to make much sense for an open source 
> > project,
> > where all the code is always open for modification.
> > 
> > Thanks, Roger.
> 
> Thanks for your review! "closed for modification" means we do not need modify
> the most codes, especially the main flows, when we add a new feature into
> codes.

IMHO, just saying that the current code is easily to extend in order to add new
features sounds better.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel