Re: [Xen-devel] [PATCH v6 04/24] x86: refactor psr: implement CPU init and free flow.
On 17-02-08 14:03:15, Konrad Rzeszutek Wilk wrote: > > > +static void cpu_fini_work(unsigned int cpu) > > > +{ > > > +unsigned int socket = cpu_to_socket(cpu); > > > + > > > +if ( !socket_cpumask[socket] || > > > cpumask_empty(socket_cpumask[socket]) ) > > > > I fear I don't understand this. > > > > Looking at 65a399ac it says: > > > > +.notifier_call = cpu_callback, > > +/* > > + * Ensure socket_cpumask is still valid in CPU_DEAD notification > > + * (E.g. our CPU_DEAD notification should be called ahead of > > + * cpu_smpboot_free). > > > > Which means that socket_cpumask[socket] should have an value. > > > > In fact cpumask_any(socket_cpumask[socket]) will return true at this > > point. > > > > Which means that code above gets called if this psr_cpu_fini is called > > _after_ cpu_smpboot_free (b/c socket_cpumask[socket] will be NULL), see > > cpu_smpboot_free. > > > > But with .priority being -1 that won't happen. > > > > But lets ignore that, lets say it does get called _after_ > > cpu_smpboot_free. In which case the first part of the 'if' condition > > is true and we end up doing: cpumask_empty(NULL) which will result > > in an page fault. > > > > I think this is a bug. > > I missed the fact that we call this function on CPU_UP_CANCELED > _and_ CPU_DEAD and that remove_siblinginfo is called before CPU_DEAD. > > So say the socket is being onlined and we are the first CPU on > that (CPU=2). > > > 1) CPU_UP_PREPARE notifier calls psr_cpu_prepare which allocates >'feat_l3_cat'. > > 2). Some other notifies dies and we end up calling CPU_UP_CANCELED. >psr_cpu_fini -> cpu_fini_work: > >Since __cpu_up has not been called that means >socket_cpumask[socket] == NULL > >and we enter free_feature. > >It does 'if !(socket_info + socket)' effectively. And that is false >as init_psr was the one initializing that array which means the >pointer (info aka socket_info + socket) certainly is not NULL. > >Anyhow this means free_feature tries to walk the list - but 'info' >is full of zeros. > >Which means list_for_each_entry_safe is going to blow when trying >to set 'n' (as pos is zero aka NULL). > >Perhaps you want an INIT_LIST_HEAD in 'init_psr' so that free_feature >can be idempotent? > >Or perhaps you need '&&' in the if conditional (and naturally >remove the !)? > >I wonder what other CPU notifiers are guilty of this.. > You are right. I made a mistake here. Thanks a lot for finding out the issue! In previous version, I called INIT_LIST_HEAD in init_psr. But I moved it into cpu_init_work to place it together with spin_lock_init per review comments. This causes this mis-match issue. I think your suggestion is good to change check criteria as below. if ( socket_cpumask[socket] && cpumask_empty(socket_cpumask[socket]) ) Then, free_feature will only be executed when CPU_STARTING has been done. > > Lets try another example: > > 1) CPU_UP_PREPARE notifier calls psr_cpu_prepare which allocates >'feat_l3_cat'. > > 2) All notifiers were OK, the __cpu_up has been called. The >socket_cpumask[socket] = > > 3) All notifiers are called with CPU_ONLINE. >All good. > > > ..some time later .. > > 4). CPU is brought down. Notifiers for CPU_DOWN_PREPARE are called >[psr is not part of it] > > 5) CPU notifiers CPU_DEAD are called. They all have to return > NOTIFY_DONE otherwise we hit 'BUG_ON' > >We call psr_cpu_fini -> cpu_fini_work > >socket_cpumask[socket] has some data, so the first conditional >is false: > > ( !socket_cpumask[socket] || > >the second: > > cpumask_empty(socket_cpumask[socket]) ) > >is true as take_cpu_down -> __cpu_disable -> remove_siblinginfo > >was called before us.(and this is the first CPU in the socket). > >And we call free_feature. >which is correct and we free our data. > >And the 'cpumask_empty' guards us so that we only call this >on the very last CPU. > > > And we would still call this if the conditional was: > > if ( socket_cpumask[socket] && cpumask_empty(socket_cpumask[socket])) > > I think? > > Feel free to poke holes at my analysis - I am sure I missed some edge > case. ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v6 04/24] x86: refactor psr: implement CPU init and free flow.
On 17-02-08 11:34:16, Konrad Rzeszutek Wilk wrote: > > +/* No valid value so do not enable feature. */ > > +if ( !regs.a || !regs.b ) > > You use regs.d below. Would it make sense to check for that value as > well? > > Or is a value of 0 for cox_max OK? I would think so, but not exactly > sure. > Sorry, this is a typo. > > +{ > > +free_feature(socket_info + socket); > > +} > > +} > > + > > +static void __init init_psr(void) > > Why don't we make this return an value? And then the init > code(psr_presmp_init) can just return an error? Because we need init CMT related things besides CAT in psr_presmp_init. So no matter init_psr return value, we need call psr_cpu_init to call psr_assoc_init. We will check if socket_info in following functions called to avoid error. ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v6 04/24] x86: refactor psr: implement CPU init and free flow.
> > +static void cpu_fini_work(unsigned int cpu) > > +{ > > +unsigned int socket = cpu_to_socket(cpu); > > + > > +if ( !socket_cpumask[socket] || cpumask_empty(socket_cpumask[socket]) ) > > I fear I don't understand this. > > Looking at 65a399ac it says: > > +.notifier_call = cpu_callback, > +/* > + * Ensure socket_cpumask is still valid in CPU_DEAD notification > + * (E.g. our CPU_DEAD notification should be called ahead of > + * cpu_smpboot_free). > > Which means that socket_cpumask[socket] should have an value. > > In fact cpumask_any(socket_cpumask[socket]) will return true at this > point. > > Which means that code above gets called if this psr_cpu_fini is called > _after_ cpu_smpboot_free (b/c socket_cpumask[socket] will be NULL), see > cpu_smpboot_free. > > But with .priority being -1 that won't happen. > > But lets ignore that, lets say it does get called _after_ > cpu_smpboot_free. In which case the first part of the 'if' condition > is true and we end up doing: cpumask_empty(NULL) which will result > in an page fault. > > I think this is a bug. I missed the fact that we call this function on CPU_UP_CANCELED _and_ CPU_DEAD and that remove_siblinginfo is called before CPU_DEAD. So say the socket is being onlined and we are the first CPU on that (CPU=2). 1) CPU_UP_PREPARE notifier calls psr_cpu_prepare which allocates 'feat_l3_cat'. 2). Some other notifies dies and we end up calling CPU_UP_CANCELED. psr_cpu_fini -> cpu_fini_work: Since __cpu_up has not been called that means socket_cpumask[socket] == NULL and we enter free_feature. It does 'if !(socket_info + socket)' effectively. And that is false as init_psr was the one initializing that array which means the pointer (info aka socket_info + socket) certainly is not NULL. Anyhow this means free_feature tries to walk the list - but 'info' is full of zeros. Which means list_for_each_entry_safe is going to blow when trying to set 'n' (as pos is zero aka NULL). Perhaps you want an INIT_LIST_HEAD in 'init_psr' so that free_feature can be idempotent? Or perhaps you need '&&' in the if conditional (and naturally remove the !)? I wonder what other CPU notifiers are guilty of this.. Lets try another example: 1) CPU_UP_PREPARE notifier calls psr_cpu_prepare which allocates 'feat_l3_cat'. 2) All notifiers were OK, the __cpu_up has been called. The socket_cpumask[socket] = 3) All notifiers are called with CPU_ONLINE. All good. ..some time later .. 4). CPU is brought down. Notifiers for CPU_DOWN_PREPARE are called [psr is not part of it] 5) CPU notifiers CPU_DEAD are called. They all have to return NOTIFY_DONE otherwise we hit 'BUG_ON' We call psr_cpu_fini -> cpu_fini_work socket_cpumask[socket] has some data, so the first conditional is false: ( !socket_cpumask[socket] || the second: cpumask_empty(socket_cpumask[socket]) ) is true as take_cpu_down -> __cpu_disable -> remove_siblinginfo was called before us.(and this is the first CPU in the socket). And we call free_feature. which is correct and we free our data. And the 'cpumask_empty' guards us so that we only call this on the very last CPU. And we would still call this if the conditional was: if ( socket_cpumask[socket] && cpumask_empty(socket_cpumask[socket])) I think? Feel free to poke holes at my analysis - I am sure I missed some edge case. ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v6 04/24] x86: refactor psr: implement CPU init and free flow.
.snip.. > +static void free_feature(struct psr_socket_info *info) > +{ > +struct feat_node *feat, *next; > + > +if ( !info ) > +return; > + > +/* > + * Free resources of features. But we do not free global feature list > + * entry, like feat_l3_cat. Although it may cause a few memory leak, > + * it is OK simplify things. it is OK as it simplifies things. > + */ > +list_for_each_entry_safe(feat, next, >feat_list, list) > +{ > +__clear_bit(feat->feature, >feat_mask); > +list_del(>list); > +xfree(feat); > +} > +} > + > +/* L3 CAT functions implementation. */ > +static void l3_cat_init_feature(struct cpuid_leaf regs, > +struct feat_node *feat, > +struct psr_socket_info *info) > +{ > +struct psr_cat_hw_info l3_cat; Having experienced a few times myself forgetting to set _all_ the entries a structure allocated on the stack and then debugging for hours - perhaps you could do: l3_cat = { }; Which forces the compiler to initialize _all_ the values to zero. > +unsigned int socket; > + > +/* No valid value so do not enable feature. */ > +if ( !regs.a || !regs.b ) You use regs.d below. Would it make sense to check for that value as well? Or is a value of 0 for cox_max OK? I would think so, but not exactly sure. > +return; > + > +l3_cat.cbm_len = (regs.a & CAT_CBM_LEN_MASK) + 1; > +l3_cat.cos_max = min(opt_cos_max, regs.d & CAT_COS_MAX_MASK); > + > +/* cos=0 is reserved as default cbm(all bits within cbm_len are 1). */ > +feat->cos_reg_val[0] = (1ull << l3_cat.cbm_len) - 1; > + > +feat->feature = PSR_SOCKET_L3_CAT; > +ASSERT(!test_bit(PSR_SOCKET_L3_CAT, >feat_mask)); > +__set_bit(PSR_SOCKET_L3_CAT, >feat_mask); > + > +feat->info.l3_cat_info = l3_cat; > + > +info->nr_feat++; > + > +/* Add this feature into list. */ > +list_add_tail(>list, >feat_list); > + > +socket = cpu_to_socket(smp_processor_id()); > +if ( !opt_cpu_info ) > +return; > + > +printk(XENLOG_INFO "L3 CAT: enabled on socket %u, cos_max:%u, > cbm_len:%u\n", > + socket, feat->info.l3_cat_info.cos_max, > + feat->info.l3_cat_info.cbm_len); > +} > + > +static const struct feat_ops l3_cat_ops = { > +}; > + > static void __init parse_psr_bool(char *s, char *value, char *feature, >unsigned int mask) > { > @@ -180,6 +257,9 @@ static void __init parse_psr_param(char *s) > if ( val_str && !strcmp(s, "rmid_max") ) > opt_rmid_max = simple_strtoul(val_str, NULL, 0); > > +if ( val_str && !strcmp(s, "cos_max") ) > +opt_cos_max = simple_strtoul(val_str, NULL, 0); > + > s = ss + 1; > } while ( ss ); > } > @@ -335,18 +415,107 @@ void psr_domain_free(struct domain *d) > psr_free_rmid(d); > } > > +static void cpu_init_work(void) > +{ > +struct psr_socket_info *info; > +unsigned int socket; > +unsigned int cpu = smp_processor_id(); > +struct feat_node *feat; > +struct cpuid_leaf regs = {.a = 0, .b = 0, .c = 0, .d = 0}; > + > +if ( !cpu_has(_cpu_data, X86_FEATURE_PQE) ) > +return; > +else if ( current_cpu_data.cpuid_level < PSR_CPUID_LEVEL_CAT ) > +{ > +__clear_bit(X86_FEATURE_PQE, current_cpu_data.x86_capability); > +return; > +} > + > +socket = cpu_to_socket(cpu); > +info = socket_info + socket; > +if ( info->feat_mask ) > +return; > + > +INIT_LIST_HEAD(>feat_list); > +spin_lock_init(>ref_lock); > + > +cpuid_count_leaf(PSR_CPUID_LEVEL_CAT, 0, ); > +if ( regs.b & PSR_RESOURCE_TYPE_L3 ) > +{ > +cpuid_count_leaf(PSR_CPUID_LEVEL_CAT, 1, ); > + > +feat = feat_l3_cat; > +/* psr_cpu_prepare will allocate it on subsequent CPU onlining. */ > +feat_l3_cat = NULL; > +feat->ops = l3_cat_ops; > + > +l3_cat_init_feature(regs, feat, info); > +} > +} > + > +static void cpu_fini_work(unsigned int cpu) > +{ > +unsigned int socket = cpu_to_socket(cpu); > + > +if ( !socket_cpumask[socket] || cpumask_empty(socket_cpumask[socket]) ) I fear I don't understand this. Looking at 65a399ac it says: +.notifier_call = cpu_callback, +/* + * Ensure socket_cpumask is still valid in CPU_DEAD notification + * (E.g. our CPU_DEAD notification should be called ahead of + * cpu_smpboot_free). Which means that socket_cpumask[socket] should have an value. In fact cpumask_any(socket_cpumask[socket]) will return true at this point. Which means that code above gets called if this psr_cpu_fini is called _after_ cpu_smpboot_free (b/c socket_cpumask[socket] will be NULL), see cpu_smpboot_free. But with .priority being -1 that won't happen. But lets ignore that, lets say it does get called _after_ cpu_smpboot_free. In which case the first part of the 'if' condition is true and we end up
[Xen-devel] [PATCH v6 04/24] x86: refactor psr: implement CPU init and free flow.
This patch implements the CPU init and free flow including L3 CAT initialization and feature list free. Signed-off-by: Yi Sun--- v6: - use 'struct cpuid_leaf' in x86_emulate.h. - move cpuid_{,count}_leaf() helpers from cpuid.c to processor.h for external user and pass the compilation. - fix comments to make them clearer. - use __clear_bit(). - call ASSERT() before __set_bit(). - adjust printk() position. --- xen/arch/x86/cpuid.c| 6 -- xen/arch/x86/psr.c | 177 +++- xen/include/asm-x86/processor.h | 7 ++ 3 files changed, 182 insertions(+), 8 deletions(-) diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c index e0a387e..e3e92dd 100644 --- a/xen/arch/x86/cpuid.c +++ b/xen/arch/x86/cpuid.c @@ -34,12 +34,6 @@ static void cpuid_leaf(uint32_t leaf, struct cpuid_leaf *data) cpuid(leaf, >a, >b, >c, >d); } -static void cpuid_count_leaf(uint32_t leaf, uint32_t subleaf, - struct cpuid_leaf *data) -{ -cpuid_count(leaf, subleaf, >a, >b, >c, >d); -} - static void sanitise_featureset(uint32_t *fs) { /* for_each_set_bit() uses unsigned longs. Extend with zeroes. */ diff --git a/xen/arch/x86/psr.c b/xen/arch/x86/psr.c index 4656936..9496c97 100644 --- a/xen/arch/x86/psr.c +++ b/xen/arch/x86/psr.c @@ -19,6 +19,7 @@ #include #include #include +#include /* * Terminology: @@ -35,6 +36,9 @@ #define PSR_CAT(1<<1) #define PSR_CDP(1<<2) +#define CAT_CBM_LEN_MASK 0x1f +#define CAT_COS_MAX_MASK 0x + /* * Per SDM chapter 'Cache Allocation Technology: Cache Mask Configuration', * the MSRs ranging from 0C90H through 0D0FH (inclusive), enables support for @@ -136,11 +140,84 @@ struct psr_assoc { struct psr_cmt *__read_mostly psr_cmt; +static struct psr_socket_info *__read_mostly socket_info; + static unsigned int opt_psr; static unsigned int __initdata opt_rmid_max = 255; +static unsigned int __read_mostly opt_cos_max = MAX_COS_REG_CNT; static uint64_t rmid_mask; static DEFINE_PER_CPU(struct psr_assoc, psr_assoc); +/* + * Declare global feature list entry for every feature to facilitate the + * feature list creation. It will be allocated in psr_cpu_prepare() and + * inserted into feature list in cpu_init_work(). It is protected by + * cpu_add_remove_lock spinlock. + */ +static struct feat_node *feat_l3_cat; + +/* Common functions. */ +static void free_feature(struct psr_socket_info *info) +{ +struct feat_node *feat, *next; + +if ( !info ) +return; + +/* + * Free resources of features. But we do not free global feature list + * entry, like feat_l3_cat. Although it may cause a few memory leak, + * it is OK simplify things. + */ +list_for_each_entry_safe(feat, next, >feat_list, list) +{ +__clear_bit(feat->feature, >feat_mask); +list_del(>list); +xfree(feat); +} +} + +/* L3 CAT functions implementation. */ +static void l3_cat_init_feature(struct cpuid_leaf regs, +struct feat_node *feat, +struct psr_socket_info *info) +{ +struct psr_cat_hw_info l3_cat; +unsigned int socket; + +/* No valid value so do not enable feature. */ +if ( !regs.a || !regs.b ) +return; + +l3_cat.cbm_len = (regs.a & CAT_CBM_LEN_MASK) + 1; +l3_cat.cos_max = min(opt_cos_max, regs.d & CAT_COS_MAX_MASK); + +/* cos=0 is reserved as default cbm(all bits within cbm_len are 1). */ +feat->cos_reg_val[0] = (1ull << l3_cat.cbm_len) - 1; + +feat->feature = PSR_SOCKET_L3_CAT; +ASSERT(!test_bit(PSR_SOCKET_L3_CAT, >feat_mask)); +__set_bit(PSR_SOCKET_L3_CAT, >feat_mask); + +feat->info.l3_cat_info = l3_cat; + +info->nr_feat++; + +/* Add this feature into list. */ +list_add_tail(>list, >feat_list); + +socket = cpu_to_socket(smp_processor_id()); +if ( !opt_cpu_info ) +return; + +printk(XENLOG_INFO "L3 CAT: enabled on socket %u, cos_max:%u, cbm_len:%u\n", + socket, feat->info.l3_cat_info.cos_max, + feat->info.l3_cat_info.cbm_len); +} + +static const struct feat_ops l3_cat_ops = { +}; + static void __init parse_psr_bool(char *s, char *value, char *feature, unsigned int mask) { @@ -180,6 +257,9 @@ static void __init parse_psr_param(char *s) if ( val_str && !strcmp(s, "rmid_max") ) opt_rmid_max = simple_strtoul(val_str, NULL, 0); +if ( val_str && !strcmp(s, "cos_max") ) +opt_cos_max = simple_strtoul(val_str, NULL, 0); + s = ss + 1; } while ( ss ); } @@ -335,18 +415,107 @@ void psr_domain_free(struct domain *d) psr_free_rmid(d); } +static void cpu_init_work(void) +{ +struct psr_socket_info *info; +unsigned int socket; +unsigned int cpu = smp_processor_id(); +struct feat_node *feat; +struct cpuid_leaf