Re: [PATCH 7/7] x86/microcode: Synchronize late microcode loading

2018-03-05 Thread Tom Lendacky
On 2/28/2018 4:28 AM, Borislav Petkov wrote:
> From: Ashok Raj 
> 
> Original idea by Ashok, completely rewritten by Borislav.
> 
> Before you read any further: the early loading method is still the
> preferred one and you should always do that. The following patch is
> improving the late loading mechanism for long running jobs and cloud use
> cases.
> 
> Gather all cores and serialize the microcode update on them by doing it
> one-by-one to make the late update process as reliable as possible and
> avoid potential issues caused by the microcode update.
> 
> Signed-off-by: Ashok Raj 
> [Rewrite completely. ]
> Co-developed-by: Borislav Petkov 
> Signed-off-by: Borislav Petkov 
> ---
>  arch/x86/kernel/cpu/microcode/core.c | 118 +++
>  1 file changed, 92 insertions(+), 26 deletions(-)
> 

Reviewed-by: Tom Lendacky 



Re: [PATCH 7/7] x86/microcode: Synchronize late microcode loading

2018-02-28 Thread Henrique de Moraes Holschuh
On Wed, 28 Feb 2018, Borislav Petkov wrote:
> On Wed, Feb 28, 2018 at 10:59:31AM -0300, Henrique de Moraes Holschuh wrote:
> > Eek! If I read that right, this effectively halts the entire box until
> > every core is updated, with one core entering deep-coma at a time (the
> > rest are left either spinning or cpu_relax()ing
> 
> I think *you* should relax. :)

Well, I don't expect any general-use distro to unleash late loading on
the users, certainly :-)  Least of all, Debian...  It is, nowadays, "use
it only if you know what you're doing" land.

But it is not yet sufficiently documented as such, I fear.

> Late microcode loading on a long running box is not something you do
> more than 2-3 times a year. And if the box needs to restart, it'll get
> the early microcode.

Sure, but the thing is so damn expensive (and the time it takes is
directly proportional to the number of cores, thus likely to hurt worse
exactly those who would want to use it), that I was left wondering if it
should not be optimized further to do the work in parallel (if that can
be made safe enough).

Besides, we likely don't want to have early microcode updates end up
being the reason AP bringup has to be serialized during boot either (and
it *is* likely to dominate the time taken for AP bringup, too!), so it
would be nice to have a way to make parallel microcode updates possible
in general...  but I don't think we're there, yet.

No matter. I am not opposing the patch in the first place.  And any
parallel microcode update work would be best done in an incremental
fashion, on top of working serial updates, anyway.

> And yes, this is addressing *late* loading, if you haven't noticed yet.

I did get that message, yes :)

> So keep doing the early method and you'll be fine.

We need that in the documentation :-P  Microcode updates have always
been somewhat slow, but now they are potentially going to be *much* more
painful and noticeable in the late-update case...

-- 
  Henrique Holschuh


Re: [PATCH 7/7] x86/microcode: Synchronize late microcode loading

2018-02-28 Thread Borislav Petkov
On Wed, Feb 28, 2018 at 10:59:31AM -0300, Henrique de Moraes Holschuh wrote:
> Eek! If I read that right, this effectively halts the entire box until
> every core is updated, with one core entering deep-coma at a time (the
> rest are left either spinning or cpu_relax()ing

I think *you* should relax. :)

Late microcode loading on a long running box is not something you do
more than 2-3 times a year. And if the box needs to restart, it'll get
the early microcode.

And yes, this is addressing *late* loading, if you haven't noticed yet.
I added a big fat note *twice*:

"Before you read any further: the early loading method is still the
preferred one and you should always do that. This patchset is improving
the late loading mechanism for long running jobs and cloud use cases -
i.e., use cases where early loading is, hm, a bit problematic."

So keep doing the early method and you'll be fine.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH 7/7] x86/microcode: Synchronize late microcode loading

2018-02-28 Thread Henrique de Moraes Holschuh
On Wed, 28 Feb 2018, Borislav Petkov wrote:
> + * Late loading dance. Why the heavy-handed stomp_machine effort?
> + *
> + * - HT siblings must be idle and not execute other code while the other sibling
> + *   is loading microcode in order to avoid any negative interactions caused by
> + *   the loading.
> + *
> + * - In addition, microcode update on the cores must be serialized until this
> + *   requirement can be relaxed in the future. Right now, this is conservative
> + *   and good.

Eek! If I read that right, this effectively halts the entire box until
every core is updated, with one core entering deep-coma at a time (the
rest are left either spinning or cpu_relax()ing depending on whether
they have already updated or not)?
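
For readers who want to picture the pattern under discussion, here is a
minimal userspace sketch. It is not the patch's code: it assumes POSIX
threads as stand-ins for CPUs, and every name in it (NCPUS,
fake_apply_microcode, cpu_thread, run_late_load_sketch) is hypothetical.
All "CPUs" first gather at a rendezvous, then take turns doing the update
under a lock, then spin until everyone has finished.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

#define NCPUS 4   /* hypothetical CPU count for this sketch */

atomic_int arrived;    /* CPUs that have reached the rendezvous */
atomic_int finished;   /* CPUs that have completed their update */
pthread_mutex_t update_lock = PTHREAD_MUTEX_INITIALIZER;
int in_flight;         /* updates running right now (under update_lock) */
int max_in_flight;     /* highest value in_flight ever reached */
int updated;           /* number of updates performed (under update_lock) */

/* Stand-in for the real per-CPU microcode load; only does bookkeeping. */
void fake_apply_microcode(void)
{
    in_flight++;
    if (in_flight > max_in_flight)
        max_in_flight = in_flight;
    updated++;
    in_flight--;
}

void *cpu_thread(void *unused)
{
    (void)unused;

    /* Rendezvous: nobody proceeds until every CPU has checked in. */
    atomic_fetch_add(&arrived, 1);
    while (atomic_load(&arrived) < NCPUS)
        sched_yield();

    /* Serialize: only one CPU at a time performs the update. */
    pthread_mutex_lock(&update_lock);
    fake_apply_microcode();
    pthread_mutex_unlock(&update_lock);

    /* Already-updated CPUs keep waiting until the last one is done. */
    atomic_fetch_add(&finished, 1);
    while (atomic_load(&finished) < NCPUS)
        sched_yield();
    return NULL;
}

/* Runs the whole dance once and returns the number of updated CPUs. */
int run_late_load_sketch(void)
{
    pthread_t threads[NCPUS];
    int i;

    atomic_store(&arrived, 0);
    atomic_store(&finished, 0);
    in_flight = max_in_flight = updated = 0;

    for (i = 0; i < NCPUS; i++) {
        int rc = pthread_create(&threads[i], NULL, cpu_thread, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (i = 0; i < NCPUS; i++)
        pthread_join(threads[i], NULL);
    return updated;
}
```

Calling run_late_load_sketch() should return NCPUS and leave max_in_flight
at 1, i.e. no two updates ever overlap. The sketch yields while waiting,
whereas a real implementation would have to keep the waiting CPUs from
running anything else at all.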

If this is correct, I shudder at what it would do on a server with
dozens, or hundreds of cores...  According to Ben Hawkes' paper, Intel's
on-die microcode update loader takes linear time relative to the update
size to do the crypto dance.

On my single-xeon X5550 workstation, which should be relatively fast
since its microcode update is small, the whole thing would take about
3.2 million cycles (circa 800k cycles per core, 4 cores, skipping
hyperthreads) to do a sync late update.  I don't believe this has
changed much, but I *did not* test, e.g., a Skylake Xeon, or anything
newer than that Xeon X5550.
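
The arithmetic behind that estimate can be written down as a small
sketch. The function names are hypothetical, and the model assumes the
per-core cost is constant and that nothing overlaps, which is exactly the
fully serialized case.

```c
#include <assert.h>
#include <stdint.h>

/* Total cycles when cores are updated strictly one after another. */
uint64_t serialized_update_cycles(uint64_t cycles_per_core, unsigned int cores)
{
    return cycles_per_core * cores;
}

/* Convert a cycle count to milliseconds for a given clock in GHz. */
double cycles_to_ms(uint64_t cycles, double clock_ghz)
{
    assert(clock_ghz > 0.0);
    /* 1 GHz is 1e6 cycles per millisecond. */
    return (double)cycles / (clock_ghz * 1e6);
}
```

With the numbers above, serialized_update_cycles(800000, 4) gives
3,200,000 cycles. Under the same (untested) per-core cost, 128 cores
would need 102,400,000 cycles, which is where the linear growth starts
to matter.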

Anyway, maybe there is a safe way to do it in a more parallel fashion
based on cpu topology?

AFAIK, it is not like there is any way to make OS microcode updates
(early or late) safe against SMIs and NMIs hitting the sibling
hyperthread while updating the other, so we don't have to care about
*that* nasty corner case simply because we can't avoid it in the first
place.

Hopefully AMD has none of those pitfalls, and could just trigger an
update on half the cores at a time, easily bounding it to approximately
twice the time it takes to update a single core :-(
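
A rough sketch of that bound, under the unverified assumption that all
cores in one batch update fully in parallel and cost the same as a single
core. batched_update_cycles and its parameters are hypothetical names
used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Cycles needed when cores are updated in batches ("waves") of
 * cores_per_wave, assuming each wave costs one per-core update.
 */
uint64_t batched_update_cycles(uint64_t cycles_per_core,
                               unsigned int cores,
                               unsigned int cores_per_wave)
{
    unsigned int waves;

    assert(cores_per_wave > 0);
    /* Round up so a partial last wave still counts as a full wave. */
    waves = (cores + cores_per_wave - 1) / cores_per_wave;
    return cycles_per_core * waves;
}
```

Updating half of 64 cores at a time, batched_update_cycles(800000, 64, 32),
comes to two waves, or 1,600,000 cycles. Setting cores_per_wave to 1
reproduces the fully serialized cost.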

-- 
  Henrique Holschuh

