Re: [Xen-devel] [PATCH v2] x86/microcode: Synchronize late microcode loading
>>> On 01.05.18 at 10:15, wrote:
> On Mon, Apr 30, 2018 at 09:25:26AM -0600, Jan Beulich wrote:
>> On 25.04.18 at 13:46, wrote:
>>> +static int do_microcode_update(void *_info)
>>> +{
>>> +    struct microcode_info *info = _info;
>>> +    unsigned int cpu = smp_processor_id();
>>> +    int ret;
>>> +
>>> +    ret = wait_for_cpus(&info->cpu_in, MICROCODE_DEFAULT_TIMEOUT);
>>> +    if ( ret )
>>> +        return ret;
>>> +
>>> +    /*
>>> +     * Logical threads which set the first bit in cpu_sibling_mask can do
>>> +     * the update. Other sibling threads just await the completion of
>>> +     * microcode update.
>>> +     */
>>> +    if ( cpumask_test_and_set_cpu(
>>> +             cpumask_first(per_cpu(cpu_sibling_mask, cpu)), &info->cpus) )
>>> +        ret = microcode_update_cpu(info->buffer, info->buffer_size);
>>
>> Isn't the condition inverted (i.e. missing a ! )?
>
> Yes.
>
>> Also I take it that you've confirmed that loading ucode in parallel on
>> multiple cores of the same socket is not a problem? The comment in the
>> last hunk suggests otherwise.
>
> No. In microcode_update_cpu(), microcode_mutex makes the update
> sequential.

Oh, right, of course.

>>> +
>>> +    return ret;
>>> }
>>
>> You're losing this return value (once for every CPU making it into this
>> function).
>
> I don't understand this remark. This function is called by
> stop_machine_run(). stop_machine_run() could return an error if any
> CPU failed during the update. We don't care about the specific CPU or
> how many CPUs failed to do the update.

Then please check your stop_machine_run() invocation again, in
particular what happens with that function's return value.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2] x86/microcode: Synchronize late microcode loading
On Mon, Apr 30, 2018 at 09:25:26AM -0600, Jan Beulich wrote:
>> On 25.04.18 at 13:46, wrote:
>> @@ -281,24 +288,56 @@ static int microcode_update_cpu(const void *buf,
>>                                 size_t size)
>>      return err;
>>  }
>>
>> -static long do_microcode_update(void *_info)
>> +/* Wait for all CPUs to rendezvous with a timeout (us) */
>> +static int wait_for_cpus(atomic_t *cnt, int timeout)
>
> unsigned int
>
>> +static int do_microcode_update(void *_info)
>> +{
>> +    struct microcode_info *info = _info;
>> +    unsigned int cpu = smp_processor_id();
>> +    int ret;
>> +
>> +    ret = wait_for_cpus(&info->cpu_in, MICROCODE_DEFAULT_TIMEOUT);
>> +    if ( ret )
>> +        return ret;
>> +
>> +    /*
>> +     * Logical threads which set the first bit in cpu_sibling_mask can do
>> +     * the update. Other sibling threads just await the completion of
>> +     * microcode update.
>> +     */
>> +    if ( cpumask_test_and_set_cpu(
>> +             cpumask_first(per_cpu(cpu_sibling_mask, cpu)), &info->cpus) )
>> +        ret = microcode_update_cpu(info->buffer, info->buffer_size);
>
> Isn't the condition inverted (i.e. missing a ! )?

Yes.

> Also I take it that you've confirmed that loading ucode in parallel on
> multiple cores of the same socket is not a problem? The comment in the
> last hunk suggests otherwise.

No. In microcode_update_cpu(), microcode_mutex makes the update
sequential.

>> +    /*
>> +     * Increase the wait timeout to a safe value here since we're
>> +     * serializing the microcode update and that could take a while on a
>> +     * large number of CPUs. And that is fine as the *actual* timeout will
>> +     * be determined by the last CPU finished updating and thus cut short.
>> +     */
>> +    if ( wait_for_cpus(&info->cpu_out, MICROCODE_DEFAULT_TIMEOUT *
>> +                                       num_online_cpus()) )
>> +        panic("Timeout when finishing updating microcode");
>
> A 3s timeout (as an example for a system with 100 CPU threads) is still
> absurdly high to me, but considering you panic() anyway if you hit the
> timeout the question mainly is whether there's a slim chance for this to
> complete a brief moment before the timeout expires. If all goes well,
> you won't come close to even 1s, but as said before - there may be
> guests running, and they may become utterly confused if they don't
> get any time within a second or more.
>
> With you no longer doing things sequentially I don't, however, see why

No. It is still sequential. And only one thread in a core will try to
acquire the lock -- microcode_mutex.

> you need to scale the timeout by CPU count.

Maybe by the number of cores. But I didn't find an existing variable
that counts cores.

>> +
>> +    return ret;
>>  }
>
> You're losing this return value (once for every CPU making it into this
> function).

I don't understand this remark. This function is called by
stop_machine_run(). stop_machine_run() could return an error if any
CPU failed during the update. We don't care about the specific CPU or
how many CPUs failed to do the update.

Thanks
Chao

>> @@ -318,26 +357,52 @@ int
>> microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len)
>>
>>      ret = copy_from_guest(info->buffer, buf, len);
>>      if ( ret != 0 )
>> -    {
>> -        xfree(info);
>> -        return ret;
>> -    }
>> +        goto free;
>>
>>      info->buffer_size = len;
>> -    info->error = 0;
>> -    info->cpu = cpumask_first(&cpu_online_map);
>> +
>> +    /* cpu_online_map must not change during update */
>> +    if ( !get_cpu_maps() )
>> +    {
>> +        ret = -EBUSY;
>> +        goto free;
>> +    }
>>
>>      if ( microcode_ops->start_update )
>>      {
>>          ret = microcode_ops->start_update();
>>          if ( ret != 0 )
>> -        {
>> -            xfree(info);
>> -            return ret;
>> -        }
>> +            goto put;
>>      }
>>
>> -    return continue_hypercall_on_cpu(info->cpu, do_microcode_update, info);
>> +    cpumask_empty(&info->cpus);
>
> DYM cpumask_clear()?
>
>> +    atomic_set(&info->cpu_in, 0);
>> +    atomic_set(&info->cpu_out, 0);
>> +
>> +    /*
>> +     * We intend to disable interrupts for a long time, which may lead to
>> +     * a watchdog timeout.
>> +     */
>> +    watchdog_disable();
>> +    /*
>> +     * Late loading dance. Why the heavy-handed stop_machine effort?
>> +     *
>> +     * - HT siblings must be idle and not execute other code while the
>> +     *   other sibling is loading microcode in order to avoid any negative
>> +     *   interactions caused by the loading.
>> +     *
>> +     * - In addition, microcode update on the cores must be serialized
>> +     *   until this requirement can be relaxed in the future. Right now,
>> +     *   this is conservative and good.
>
> This is the comment I've referred to above.
>
> Jan
Re: [Xen-devel] [PATCH v2] x86/microcode: Synchronize late microcode loading
>>> On 25.04.18 at 13:46, wrote:
> @@ -281,24 +288,56 @@ static int microcode_update_cpu(const void *buf,
>                                 size_t size)
>      return err;
>  }
>
> -static long do_microcode_update(void *_info)
> +/* Wait for all CPUs to rendezvous with a timeout (us) */
> +static int wait_for_cpus(atomic_t *cnt, int timeout)

unsigned int

> +static int do_microcode_update(void *_info)
> +{
> +    struct microcode_info *info = _info;
> +    unsigned int cpu = smp_processor_id();
> +    int ret;
> +
> +    ret = wait_for_cpus(&info->cpu_in, MICROCODE_DEFAULT_TIMEOUT);
> +    if ( ret )
> +        return ret;
> +
> +    /*
> +     * Logical threads which set the first bit in cpu_sibling_mask can do
> +     * the update. Other sibling threads just await the completion of
> +     * microcode update.
> +     */
> +    if ( cpumask_test_and_set_cpu(
> +             cpumask_first(per_cpu(cpu_sibling_mask, cpu)), &info->cpus) )
> +        ret = microcode_update_cpu(info->buffer, info->buffer_size);

Isn't the condition inverted (i.e. missing a ! )?

Also I take it that you've confirmed that loading ucode in parallel on
multiple cores of the same socket is not a problem? The comment in the
last hunk suggests otherwise.

> +    /*
> +     * Increase the wait timeout to a safe value here since we're
> +     * serializing the microcode update and that could take a while on a
> +     * large number of CPUs. And that is fine as the *actual* timeout will
> +     * be determined by the last CPU finished updating and thus cut short.
> +     */
> +    if ( wait_for_cpus(&info->cpu_out, MICROCODE_DEFAULT_TIMEOUT *
> +                                       num_online_cpus()) )
> +        panic("Timeout when finishing updating microcode");

A 3s timeout (as an example for a system with 100 CPU threads) is still
absurdly high to me, but considering you panic() anyway if you hit the
timeout the question mainly is whether there's a slim chance for this to
complete a brief moment before the timeout expires. If all goes well,
you won't come close to even 1s, but as said before - there may be
guests running, and they may become utterly confused if they don't get
any time within a second or more.

With you no longer doing things sequentially I don't, however, see why
you need to scale the timeout by CPU count.

> +
> +    return ret;
>  }

You're losing this return value (once for every CPU making it into this
function).

> @@ -318,26 +357,52 @@ int
> microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len)
>
>      ret = copy_from_guest(info->buffer, buf, len);
>      if ( ret != 0 )
> -    {
> -        xfree(info);
> -        return ret;
> -    }
> +        goto free;
>
>      info->buffer_size = len;
> -    info->error = 0;
> -    info->cpu = cpumask_first(&cpu_online_map);
> +
> +    /* cpu_online_map must not change during update */
> +    if ( !get_cpu_maps() )
> +    {
> +        ret = -EBUSY;
> +        goto free;
> +    }
>
>      if ( microcode_ops->start_update )
>      {
>          ret = microcode_ops->start_update();
>          if ( ret != 0 )
> -        {
> -            xfree(info);
> -            return ret;
> -        }
> +            goto put;
>      }
>
> -    return continue_hypercall_on_cpu(info->cpu, do_microcode_update, info);
> +    cpumask_empty(&info->cpus);

DYM cpumask_clear()?

> +    atomic_set(&info->cpu_in, 0);
> +    atomic_set(&info->cpu_out, 0);
> +
> +    /*
> +     * We intend to disable interrupts for a long time, which may lead to
> +     * a watchdog timeout.
> +     */
> +    watchdog_disable();
> +    /*
> +     * Late loading dance. Why the heavy-handed stop_machine effort?
> +     *
> +     * - HT siblings must be idle and not execute other code while the
> +     *   other sibling is loading microcode in order to avoid any negative
> +     *   interactions caused by the loading.
> +     *
> +     * - In addition, microcode update on the cores must be serialized
> +     *   until this requirement can be relaxed in the future. Right now,
> +     *   this is conservative and good.

This is the comment I've referred to above.

Jan