Re: [Xen-devel] [PATCH v6 04/24] x86: refactor psr: implement CPU init and free flow.

2017-02-09 Thread Yi Sun
On 17-02-08 14:03:15, Konrad Rzeszutek Wilk wrote:
> > > +static void cpu_fini_work(unsigned int cpu)
> > > +{
> > > +unsigned int socket = cpu_to_socket(cpu);
> > > +
> > > +if ( !socket_cpumask[socket] || 
> > > cpumask_empty(socket_cpumask[socket]) )
> > 
> > I fear I don't understand this.
> > 
> > Looking at 65a399ac it says:
> > 
> > +.notifier_call = cpu_callback,
> > +/*
> > + * Ensure socket_cpumask is still valid in CPU_DEAD notification
> > + * (E.g. our CPU_DEAD notification should be called ahead of
> > + * cpu_smpboot_free).
> > 
> > Which means that socket_cpumask[socket] should have an value.
> > 
> > In fact cpumask_any(socket_cpumask[socket]) will return true at this
> > point.
> > 
> > Which means that code above gets called if this psr_cpu_fini is called
> > _after_ cpu_smpboot_free (b/c socket_cpumask[socket] will be NULL), see
> > cpu_smpboot_free.
> > 
> > But with .priority being -1 that won't happen.
> > 
> > But lets ignore that, lets say it does get called _after_
> > cpu_smpboot_free. In which case the first part of the 'if' condition
> > is true and we end up doing: cpumask_empty(NULL) which will result
> > in an page fault.
> > 
> > I think this is a bug.
> 
> I missed the fact that we call this function on CPU_UP_CANCELED
> _and_ CPU_DEAD and that remove_siblinginfo is called before CPU_DEAD.
> 
> So say the socket is being onlined and we are the first CPU on
> that (CPU=2).
> 
> 
> 1) CPU_UP_PREPARE notifier calls psr_cpu_prepare which allocates
>'feat_l3_cat'.
> 
> 2). Some other notifies dies and we end up calling CPU_UP_CANCELED.
>psr_cpu_fini -> cpu_fini_work:
> 
>Since __cpu_up has not been called that means
>socket_cpumask[socket] == NULL
> 
>and we enter free_feature.
> 
>It does 'if !(socket_info + socket)' effectively. And that is false
>as init_psr was the one initializing that array which means the
>pointer (info aka socket_info + socket) certainly is not NULL.
> 
>Anyhow this means free_feature tries to walk the list - but 'info'
>is full of zeros.
> 
>Which means list_for_each_entry_safe is going to blow when trying
>to set 'n' (as pos is zero aka NULL).
> 
>Perhaps you want an INIT_LIST_HEAD in 'init_psr' so that free_feature
>can be idempotent?
> 
>Or perhaps you need '&&' in the if conditional (and naturally
>remove the !)?
> 
>I wonder what other CPU notifiers are guilty of this..
> 
You are right. I made a mistake here. Thanks a lot for finding out the issue!

In previous version, I called INIT_LIST_HEAD in init_psr. But I moved it into
cpu_init_work to place it together with spin_lock_init per review comments.
This causes this mis-match issue. I think your suggestion is good to change
check criteria as below.

if ( socket_cpumask[socket] && cpumask_empty(socket_cpumask[socket]) )

Then, free_feature will only be executed when CPU_STARTING has been done.

> 
> Lets try another example:
> 
> 1) CPU_UP_PREPARE notifier calls psr_cpu_prepare which allocates
>'feat_l3_cat'.
> 
> 2) All notifiers were OK, the __cpu_up has been called. The
>socket_cpumask[socket] = 
> 
> 3) All notifiers are called with CPU_ONLINE.
>All good.
> 
> 
> ..some time later ..
> 
> 4). CPU is brought down. Notifiers for CPU_DOWN_PREPARE are called
>[psr is not part of it]
> 
> 5) CPU notifiers CPU_DEAD are called. They all have to return
> NOTIFY_DONE otherwise we hit 'BUG_ON'
> 
>We call psr_cpu_fini -> cpu_fini_work
> 
>socket_cpumask[socket] has some data, so the first conditional
>is false:
> 
> ( !socket_cpumask[socket] ||
> 
>the second:
> 
> cpumask_empty(socket_cpumask[socket]) )
> 
>is true as take_cpu_down -> __cpu_disable -> remove_siblinginfo
> 
>was called before us.(and this is the first CPU in the socket).
> 
>And we call free_feature.
>which is correct and we free our data.
> 
>And the 'cpumask_empty' guards us so that we only call this
>on the very last CPU.
> 
> 
>   And we would still call this if the conditional was:
> 
>   if ( socket_cpumask[socket] && cpumask_empty(socket_cpumask[socket]))
> 
>   I think?
> 
> Feel free to poke holes at my analysis - I am sure I missed some edge
> case.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v6 04/24] x86: refactor psr: implement CPU init and free flow.

2017-02-09 Thread Yi Sun
On 17-02-08 11:34:16, Konrad Rzeszutek Wilk wrote:
> > +/* No valid value so do not enable feature. */
> > +if ( !regs.a || !regs.b )
> 
> You use regs.d below. Would it make sense to check for that value as
> well?
> 
> Or is a value of 0 for cox_max OK? I would think so, but not exactly
> sure.
> 
Sorry, this is a typo.

> > +{
> > +free_feature(socket_info + socket);
> > +}
> > +}
> > +
> > +static void __init init_psr(void)
> 
> Why don't we make this return an value? And then the init
> code(psr_presmp_init) can just return an error?

Because we need init CMT related things besides CAT in psr_presmp_init. So no
matter init_psr return value, we need call psr_cpu_init to call psr_assoc_init.
We will check if socket_info in following functions called to avoid error.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v6 04/24] x86: refactor psr: implement CPU init and free flow.

2017-02-08 Thread Konrad Rzeszutek Wilk
> > +static void cpu_fini_work(unsigned int cpu)
> > +{
> > +unsigned int socket = cpu_to_socket(cpu);
> > +
> > +if ( !socket_cpumask[socket] || cpumask_empty(socket_cpumask[socket]) )
> 
> I fear I don't understand this.
> 
> Looking at 65a399ac it says:
> 
> +.notifier_call = cpu_callback,
> +/*
> + * Ensure socket_cpumask is still valid in CPU_DEAD notification
> + * (E.g. our CPU_DEAD notification should be called ahead of
> + * cpu_smpboot_free).
> 
> Which means that socket_cpumask[socket] should have an value.
> 
> In fact cpumask_any(socket_cpumask[socket]) will return true at this
> point.
> 
> Which means that code above gets called if this psr_cpu_fini is called
> _after_ cpu_smpboot_free (b/c socket_cpumask[socket] will be NULL), see
> cpu_smpboot_free.
> 
> But with .priority being -1 that won't happen.
> 
> But lets ignore that, lets say it does get called _after_
> cpu_smpboot_free. In which case the first part of the 'if' condition
> is true and we end up doing: cpumask_empty(NULL) which will result
> in an page fault.
> 
> I think this is a bug.

I missed the fact that we call this function on CPU_UP_CANCELED
_and_ CPU_DEAD and that remove_siblinginfo is called before CPU_DEAD.

So say the socket is being onlined and we are the first CPU on
that (CPU=2).


1) CPU_UP_PREPARE notifier calls psr_cpu_prepare which allocates
   'feat_l3_cat'.

2). Some other notifies dies and we end up calling CPU_UP_CANCELED.
   psr_cpu_fini -> cpu_fini_work:

   Since __cpu_up has not been called that means
   socket_cpumask[socket] == NULL

   and we enter free_feature.

   It does 'if !(socket_info + socket)' effectively. And that is false
   as init_psr was the one initializing that array which means the
   pointer (info aka socket_info + socket) certainly is not NULL.

   Anyhow this means free_feature tries to walk the list - but 'info'
   is full of zeros.

   Which means list_for_each_entry_safe is going to blow when trying
   to set 'n' (as pos is zero aka NULL).

   Perhaps you want an INIT_LIST_HEAD in 'init_psr' so that free_feature
   can be idempotent?

   Or perhaps you need '&&' in the if conditional (and naturally
   remove the !)?

   I wonder what other CPU notifiers are guilty of this..


Lets try another example:

1) CPU_UP_PREPARE notifier calls psr_cpu_prepare which allocates
   'feat_l3_cat'.

2) All notifiers were OK, the __cpu_up has been called. The
   socket_cpumask[socket] = 

3) All notifiers are called with CPU_ONLINE.
   All good.


..some time later ..

4). CPU is brought down. Notifiers for CPU_DOWN_PREPARE are called
   [psr is not part of it]

5) CPU notifiers CPU_DEAD are called. They all have to return
NOTIFY_DONE otherwise we hit 'BUG_ON'

   We call psr_cpu_fini -> cpu_fini_work

   socket_cpumask[socket] has some data, so the first conditional
   is false:

( !socket_cpumask[socket] ||

   the second:

cpumask_empty(socket_cpumask[socket]) )

   is true as take_cpu_down -> __cpu_disable -> remove_siblinginfo

   was called before us.(and this is the first CPU in the socket).

   And we call free_feature.
   which is correct and we free our data.

   And the 'cpumask_empty' guards us so that we only call this
   on the very last CPU.


  And we would still call this if the conditional was:

  if ( socket_cpumask[socket] && cpumask_empty(socket_cpumask[socket]))

  I think?

Feel free to poke holes at my analysis - I am sure I missed some edge
case.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v6 04/24] x86: refactor psr: implement CPU init and free flow.

2017-02-08 Thread Konrad Rzeszutek Wilk
.snip..
> +static void free_feature(struct psr_socket_info *info)
> +{
> +struct feat_node *feat, *next;
> +
> +if ( !info )
> +return;
> +
> +/*
> + * Free resources of features. But we do not free global feature list
> + * entry, like feat_l3_cat. Although it may cause a few memory leak,
> + * it is OK simplify things.

it is OK as it simplifies things.
> + */
> +list_for_each_entry_safe(feat, next, >feat_list, list)
> +{
> +__clear_bit(feat->feature, >feat_mask);
> +list_del(>list);
> +xfree(feat);
> +}
> +}
> +
> +/* L3 CAT functions implementation. */
> +static void l3_cat_init_feature(struct cpuid_leaf regs,
> +struct feat_node *feat,
> +struct psr_socket_info *info)
> +{
> +struct psr_cat_hw_info l3_cat;

Having experienced a few times myself forgetting to set
_all_ the entries a structure allocated on the stack and then
debugging for hours - perhaps you could do:

l3_cat = { };

Which forces the compiler to initialize _all_ the values to zero.
> +unsigned int socket;
> +
> +/* No valid value so do not enable feature. */
> +if ( !regs.a || !regs.b )

You use regs.d below. Would it make sense to check for that value as
well?

Or is a value of 0 for cox_max OK? I would think so, but not exactly
sure.

> +return;
> +
> +l3_cat.cbm_len = (regs.a & CAT_CBM_LEN_MASK) + 1;
> +l3_cat.cos_max = min(opt_cos_max, regs.d & CAT_COS_MAX_MASK);
> +
> +/* cos=0 is reserved as default cbm(all bits within cbm_len are 1). */
> +feat->cos_reg_val[0] = (1ull << l3_cat.cbm_len) - 1;
> +
> +feat->feature = PSR_SOCKET_L3_CAT;
> +ASSERT(!test_bit(PSR_SOCKET_L3_CAT, >feat_mask));
> +__set_bit(PSR_SOCKET_L3_CAT, >feat_mask);
> +
> +feat->info.l3_cat_info = l3_cat;
> +
> +info->nr_feat++;
> +
> +/* Add this feature into list. */
> +list_add_tail(>list, >feat_list);
> +
> +socket = cpu_to_socket(smp_processor_id());
> +if ( !opt_cpu_info )
> +return;
> +
> +printk(XENLOG_INFO "L3 CAT: enabled on socket %u, cos_max:%u, 
> cbm_len:%u\n",
> +   socket, feat->info.l3_cat_info.cos_max,
> +   feat->info.l3_cat_info.cbm_len);
> +}
> +
> +static const struct feat_ops l3_cat_ops = {
> +};
> +
>  static void __init parse_psr_bool(char *s, char *value, char *feature,
>unsigned int mask)
>  {
> @@ -180,6 +257,9 @@ static void __init parse_psr_param(char *s)
>  if ( val_str && !strcmp(s, "rmid_max") )
>  opt_rmid_max = simple_strtoul(val_str, NULL, 0);
>  
> +if ( val_str && !strcmp(s, "cos_max") )
> +opt_cos_max = simple_strtoul(val_str, NULL, 0);
> +
>  s = ss + 1;
>  } while ( ss );
>  }
> @@ -335,18 +415,107 @@ void psr_domain_free(struct domain *d)
>  psr_free_rmid(d);
>  }
>  
> +static void cpu_init_work(void)
> +{
> +struct psr_socket_info *info;
> +unsigned int socket;
> +unsigned int cpu = smp_processor_id();
> +struct feat_node *feat;
> +struct cpuid_leaf regs = {.a = 0, .b = 0, .c = 0, .d = 0};
> +
> +if ( !cpu_has(_cpu_data, X86_FEATURE_PQE) )
> +return;
> +else if ( current_cpu_data.cpuid_level < PSR_CPUID_LEVEL_CAT )
> +{
> +__clear_bit(X86_FEATURE_PQE, current_cpu_data.x86_capability);
> +return;
> +}
> +
> +socket = cpu_to_socket(cpu);
> +info = socket_info + socket;
> +if ( info->feat_mask )
> +return;
> +
> +INIT_LIST_HEAD(>feat_list);
> +spin_lock_init(>ref_lock);
> +
> +cpuid_count_leaf(PSR_CPUID_LEVEL_CAT, 0, );
> +if ( regs.b & PSR_RESOURCE_TYPE_L3 )
> +{
> +cpuid_count_leaf(PSR_CPUID_LEVEL_CAT, 1, );
> +
> +feat = feat_l3_cat;
> +/* psr_cpu_prepare will allocate it on subsequent CPU onlining. */
> +feat_l3_cat = NULL;
> +feat->ops = l3_cat_ops;
> +
> +l3_cat_init_feature(regs, feat, info);
> +}
> +}
> +
> +static void cpu_fini_work(unsigned int cpu)
> +{
> +unsigned int socket = cpu_to_socket(cpu);
> +
> +if ( !socket_cpumask[socket] || cpumask_empty(socket_cpumask[socket]) )

I fear I don't understand this.

Looking at 65a399ac it says:

+.notifier_call = cpu_callback,
+/*
+ * Ensure socket_cpumask is still valid in CPU_DEAD notification
+ * (E.g. our CPU_DEAD notification should be called ahead of
+ * cpu_smpboot_free).

Which means that socket_cpumask[socket] should have an value.

In fact cpumask_any(socket_cpumask[socket]) will return true at this
point.

Which means that code above gets called if this psr_cpu_fini is called
_after_ cpu_smpboot_free (b/c socket_cpumask[socket] will be NULL), see
cpu_smpboot_free.

But with .priority being -1 that won't happen.

But lets ignore that, lets say it does get called _after_
cpu_smpboot_free. In which case the first part of the 'if' condition
is true and we end up 

[Xen-devel] [PATCH v6 04/24] x86: refactor psr: implement CPU init and free flow.

2017-02-08 Thread Yi Sun
This patch implements the CPU init and free flow including L3 CAT
initialization and feature list free.

Signed-off-by: Yi Sun 
---
v6:
- use 'struct cpuid_leaf' in x86_emulate.h.
- move cpuid_{,count}_leaf() helpers from cpuid.c to processor.h for
  external user and pass the compilation.
- fix comments to make them clearer.
- use __clear_bit().
- call ASSERT() before __set_bit().
- adjust printk() position.
---
 xen/arch/x86/cpuid.c|   6 --
 xen/arch/x86/psr.c  | 177 +++-
 xen/include/asm-x86/processor.h |   7 ++
 3 files changed, 182 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c
index e0a387e..e3e92dd 100644
--- a/xen/arch/x86/cpuid.c
+++ b/xen/arch/x86/cpuid.c
@@ -34,12 +34,6 @@ static void cpuid_leaf(uint32_t leaf, struct cpuid_leaf 
*data)
 cpuid(leaf, >a, >b, >c, >d);
 }
 
-static void cpuid_count_leaf(uint32_t leaf, uint32_t subleaf,
- struct cpuid_leaf *data)
-{
-cpuid_count(leaf, subleaf, >a, >b, >c, >d);
-}
-
 static void sanitise_featureset(uint32_t *fs)
 {
 /* for_each_set_bit() uses unsigned longs.  Extend with zeroes. */
diff --git a/xen/arch/x86/psr.c b/xen/arch/x86/psr.c
index 4656936..9496c97 100644
--- a/xen/arch/x86/psr.c
+++ b/xen/arch/x86/psr.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * Terminology:
@@ -35,6 +36,9 @@
 #define PSR_CAT(1<<1)
 #define PSR_CDP(1<<2)
 
+#define CAT_CBM_LEN_MASK 0x1f
+#define CAT_COS_MAX_MASK 0x
+
 /*
  * Per SDM chapter 'Cache Allocation Technology: Cache Mask Configuration',
  * the MSRs ranging from 0C90H through 0D0FH (inclusive), enables support for
@@ -136,11 +140,84 @@ struct psr_assoc {
 
 struct psr_cmt *__read_mostly psr_cmt;
 
+static struct psr_socket_info *__read_mostly socket_info;
+
 static unsigned int opt_psr;
 static unsigned int __initdata opt_rmid_max = 255;
+static unsigned int __read_mostly opt_cos_max = MAX_COS_REG_CNT;
 static uint64_t rmid_mask;
 static DEFINE_PER_CPU(struct psr_assoc, psr_assoc);
 
+/*
+ * Declare global feature list entry for every feature to facilitate the
+ * feature list creation. It will be allocated in psr_cpu_prepare() and
+ * inserted into feature list in cpu_init_work(). It is protected by
+ * cpu_add_remove_lock spinlock.
+ */
+static struct feat_node *feat_l3_cat;
+
+/* Common functions. */
+static void free_feature(struct psr_socket_info *info)
+{
+struct feat_node *feat, *next;
+
+if ( !info )
+return;
+
+/*
+ * Free resources of features. But we do not free global feature list
+ * entry, like feat_l3_cat. Although it may cause a few memory leak,
+ * it is OK simplify things.
+ */
+list_for_each_entry_safe(feat, next, >feat_list, list)
+{
+__clear_bit(feat->feature, >feat_mask);
+list_del(>list);
+xfree(feat);
+}
+}
+
+/* L3 CAT functions implementation. */
+static void l3_cat_init_feature(struct cpuid_leaf regs,
+struct feat_node *feat,
+struct psr_socket_info *info)
+{
+struct psr_cat_hw_info l3_cat;
+unsigned int socket;
+
+/* No valid value so do not enable feature. */
+if ( !regs.a || !regs.b )
+return;
+
+l3_cat.cbm_len = (regs.a & CAT_CBM_LEN_MASK) + 1;
+l3_cat.cos_max = min(opt_cos_max, regs.d & CAT_COS_MAX_MASK);
+
+/* cos=0 is reserved as default cbm(all bits within cbm_len are 1). */
+feat->cos_reg_val[0] = (1ull << l3_cat.cbm_len) - 1;
+
+feat->feature = PSR_SOCKET_L3_CAT;
+ASSERT(!test_bit(PSR_SOCKET_L3_CAT, >feat_mask));
+__set_bit(PSR_SOCKET_L3_CAT, >feat_mask);
+
+feat->info.l3_cat_info = l3_cat;
+
+info->nr_feat++;
+
+/* Add this feature into list. */
+list_add_tail(>list, >feat_list);
+
+socket = cpu_to_socket(smp_processor_id());
+if ( !opt_cpu_info )
+return;
+
+printk(XENLOG_INFO "L3 CAT: enabled on socket %u, cos_max:%u, 
cbm_len:%u\n",
+   socket, feat->info.l3_cat_info.cos_max,
+   feat->info.l3_cat_info.cbm_len);
+}
+
+static const struct feat_ops l3_cat_ops = {
+};
+
 static void __init parse_psr_bool(char *s, char *value, char *feature,
   unsigned int mask)
 {
@@ -180,6 +257,9 @@ static void __init parse_psr_param(char *s)
 if ( val_str && !strcmp(s, "rmid_max") )
 opt_rmid_max = simple_strtoul(val_str, NULL, 0);
 
+if ( val_str && !strcmp(s, "cos_max") )
+opt_cos_max = simple_strtoul(val_str, NULL, 0);
+
 s = ss + 1;
 } while ( ss );
 }
@@ -335,18 +415,107 @@ void psr_domain_free(struct domain *d)
 psr_free_rmid(d);
 }
 
+static void cpu_init_work(void)
+{
+struct psr_socket_info *info;
+unsigned int socket;
+unsigned int cpu = smp_processor_id();
+struct feat_node *feat;
+struct cpuid_leaf