Re: [Qemu-devel] [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

2019-09-05 Thread Igor Mammedov
On Thu, 5 Sep 2019 15:08:31 +0200
Laszlo Ersek  wrote:

> On 09/04/19 11:52, Igor Mammedov wrote:
> 
> >  it could be stolen RAM + black hole like TSEG, assuming fw can live 
> > without RAM(0x30000+128K) range
> >   (in this case fwcfg interface would only work for locking down the range)
> > 
> >  or
> > 
> >  we can actually have a dedicated SMRAM (like in my earlier RFC),
> >  in this case FW can use RAM(0x30000+128K) when SMRAM isn't mapped into RAM 
> > address space
> >  (in this case fwcfg would be used to temporarily map SMRAM into normal RAM 
> > and unmap/lock
> >   after SMI relocation handler was initialized).
> > 
> > If possible I'd prefer a simpler TSEG like variant.  
> 
> I think TSEG-like behavior is between these two. That is, I believe we
> should have explicit open/close/lock operations. And, when the range is
> closed (meaning, closed+unlocked, or closed+locked), then the black hole
> should take effect for code that's not running in SMM.
> 
> Put differently, it's like the second choice, except the range never
> appears as normal RAM. "When SMRAM isn't mapped into RAM address space",
> then the address range shows "nothing" (black hole).
I guess we're at the point where a patch is better than words; I'll send one as a reply
here shortly.
I've just implemented a subset of the above (opened, closed+locked).
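
Roughly, the idea in terms of QEMU's memory API looks like the sketch below
(not the patch itself; region names and the way the guest triggers the lock
are placeholders):

/* Sketch only: 128K of SMRAM at the default SMBASE, always visible in the
 * SMM address space, visible to non-SMM code only while "open".
 */
#include "qemu/osdep.h"
#include "qemu/units.h"
#include "qapi/error.h"
#include "exec/memory.h"

#define DEFAULT_SMBASE      0x30000
#define DEFAULT_SMRAM_SIZE  (128 * KiB)

typedef struct DefaultSmram {
    MemoryRegion smram;       /* backing RAM for the 128K window         */
    MemoryRegion open_alias;  /* alias into the normal address space     */
    bool locked;              /* sticky until hardware reset             */
} DefaultSmram;

static void default_smram_realize(DefaultSmram *d, MemoryRegion *sysmem,
                                  MemoryRegion *smm_root)
{
    memory_region_init_ram(&d->smram, NULL, "default-smbase-smram",
                           DEFAULT_SMRAM_SIZE, &error_fatal);
    /* SMM code always sees the SMRAM at 0x30000. */
    memory_region_add_subregion(smm_root, DEFAULT_SMBASE, &d->smram);

    /* Non-SMM code sees it only while the alias is enabled ("open").
     * With the 0x30000+128K range carved out of the normal RAM region,
     * a disabled alias leaves a black hole there.
     */
    memory_region_init_alias(&d->open_alias, NULL, "default-smbase-open",
                             &d->smram, 0, DEFAULT_SMRAM_SIZE);
    memory_region_add_subregion_overlap(sysmem, DEFAULT_SMBASE,
                                        &d->open_alias, 1);
}

/* Guest-requested transition to closed+locked (sticky until reset). */
static void default_smram_lock(DefaultSmram *d)
{
    if (!d->locked) {
        memory_region_set_enabled(&d->open_alias, false);
        d->locked = true;
    }
}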


> Regarding "fw can live without RAM(0x3+128K) range" -- do you mean
> whether the firmware could use another RAM area for fw_cfg DMA?
> 
> If that's the question, then I wouldn't worry about it. I'd remove the
> 0x30000+128K range from the memory map, so the fw_cfg stuff (or anything
> else) would never allocate memory from the range. It's much more
> concerning to me however how the SMM infrastructure would deal with a
> hole in the memory map right there.
I didn't mean fwcfg in this context; what I meant was whether firmware is able
to avoid using the RAM(0x30000+128K) range (since it becomes unusable after
locking).
Looks like you just answered that here.




Re: [Qemu-devel] [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

2019-09-05 Thread Laszlo Ersek
On 09/04/19 11:52, Igor Mammedov wrote:

>  it could be stolen RAM + black hole like TSEG, assuming fw can live without 
> RAM(0x30000+128K) range
>   (in this case fwcfg interface would only work for locking down the range)
> 
>  or
> 
>  we can actually have a dedicated SMRAM (like in my earlier RFC),
>  in this case FW can use RAM(0x30000+128K) when SMRAM isn't mapped into RAM 
> address space
>  (in this case fwcfg would be used to temporarily map SMRAM into normal RAM 
> and unmap/lock
>   after SMI relocation handler was initialized).
> 
> If possible I'd prefer a simpler TSEG like variant.

I think TSEG-like behavior is between these two. That is, I believe we
should have explicit open/close/lock operations. And, when the range is
closed (meaning, closed+unlocked, or closed+locked), then the black hole
should take effect for code that's not running in SMM.

Put differently, it's like the second choice, except the range never
appears as normal RAM. "When SMRAM isn't mapped into RAM address space",
then the address range shows "nothing" (black hole).


Regarding "fw can live without RAM(0x3+128K) range" -- do you mean
whether the firmware could use another RAM area for fw_cfg DMA?

If that's the question, then I wouldn't worry about it. I'd remove the
0x30000+128K range from the memory map, so the fw_cfg stuff (or anything
else) would never allocate memory from the range. It's much more
concerning to me however how the SMM infrastructure would deal with a
hole in the memory map right there.
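
Just to illustrate the first half: carving the range out of the usable memory
map is nearly a one-liner with the standard HOB APIs, assuming that reporting
it as a reserved allocation in PEI is enough to keep the rest of the firmware
away from it (a sketch, not tested):

#include <PiPei.h>
#include <Library/HobLib.h>

#define DEFAULT_SMBASE       0x30000
#define DEFAULT_SMBASE_SIZE  SIZE_128KB

//
// Sketch: report 0x30000+128K as a reserved allocation so that neither PEI
// nor DXE ever hands it out as usable system memory.  Whether the SMM
// infrastructure copes with the resulting hole is the open question above.
//
STATIC
VOID
ReserveDefaultSmbaseRange (
  VOID
  )
{
  BuildMemoryAllocationHob (
    (EFI_PHYSICAL_ADDRESS)DEFAULT_SMBASE,
    (UINT64)DEFAULT_SMBASE_SIZE,
    EfiReservedMemoryType
    );
}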

Thanks
Laszlo



Re: [Qemu-devel] [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

2019-09-04 Thread Igor Mammedov
On Tue, 3 Sep 2019 19:20:25 +0200
Laszlo Ersek  wrote:

> On 09/03/19 16:53, Igor Mammedov wrote:
> > On Mon, 2 Sep 2019 21:09:58 +0200
> > Laszlo Ersek  wrote:
> >   
> >> On 09/02/19 10:45, Igor Mammedov wrote:  
> >>> On Fri, 30 Aug 2019 20:46:14 +0200
> >>> Laszlo Ersek  wrote:
> >>> 
>  On 08/30/19 16:48, Igor Mammedov wrote:
> 
> > (01) On boot firmware maps and initializes SMI handler at default 
> > SMBASE (0x30000)
> >  (using dedicated SMRAM at 0x30000 would allow us to avoid 
> > save/restore
> >   steps and make SMM handler pointer not vulnerable to DMA attacks)
> >
> > (02) QEMU hotplugs a new CPU in reset-ed state and sends SCI
> >
> > (03) on receiving SCI, host CPU calls GPE cpu hotplug handler
> >   which writes to IO port 0xB2 (broadcast SMI)
> >
> > (04) firmware waits for all existing CPUs rendezvous in SMM mode,
> >  new CPU(s) have SMI pending but does nothing yet
> >
> > (05) host CPU wakes up one new CPU (INIT-INIT-SIPI)
> >  SIPI vector points to RO flash HLT loop.
> >  (how host CPU will know which new CPUs to relocate?
> >   possibly reuse QEMU CPU hotplug MMIO interface???)
> >
> > (06) new CPU does relocation.
> >  (in case of attacker sends SIPI to several new CPUs,
> >   open question how to detect collision of several CPUs at the same 
> > default SMBASE)
> >
> > (07) once new CPU relocated host CPU completes initialization, returns
> >  from IO port write and executes the rest of GPE handler, telling OS
> >  to online new CPU.  
> 
>  In step (03), it is the OS that handles the SCI; it transfers control to
>  ACPI. The AML can write to IO port 0xB2 only because the OS allows it.
> 
>  If the OS decides to omit that step, and sends an INIT-SIPI-SIPI
>  directly to the new CPU, can it steal the CPU?
> >>> It sure can but this way it won't get access to privileged SMRAM
> >>> so OS can't subvert firmware.
> >>> The next time SMI broadcast is sent the CPU will use SMI handler at
> >>> default 0x30000 SMBASE. It's up to us to define behavior here (for example
> >>> relocation handler can put such CPU in shutdown state).
> >>>
> >>> It's in the best interest of OS to cooperate and execute AML
> >>> provided by firmware, if it does not follow proper cpu hotplug flow
> >>> we can't guarantee that stolen CPU will work.
> >>
> >> This sounds convincing enough, for the hotplugged CPU; thanks.
> >>
> >> So now my concern is with step (01). While preparing for the initial
> >> relocation (of cold-plugged CPUs), the code assumes the memory at the
> >> default SMBASE (0x30000) is normal RAM.
> >>
> >> Is it not a problem that the area is written initially while running in
> >> normal 32-bit or 64-bit mode, but then executed (in response to the
> >> first, synchronous, SMI) as SMRAM?  
> > 
> > currently there is no SMRAM at 0x30000, so all access falls through
> > into RAM address space and we are about to change that.
> > 
> > but firmware doesn't have to use it as RAM, it can check if QEMU
> > supports SMRAM at 0x30000 and if supported map it to configure
> > and then lock it down.  
> 
> I'm sure you are *technically* right, but you seem to be assuming that I
> can modify or rearrange anything I want in edk2. :)
Yep, I'm looking at it from a theoretical perspective so far,
but what could be done in reality might be limited.
 
> If we can solve the above in OVMF platform code, that's great. If not
> (e.g. UefiCpuPkg code needs to be updated), then things will get tricky.
> If we can introduce another platform hook for this, that would help. I
> can't say before I try.
> 
>
> > 
> >
> >> Basically I'm confused by the alias.
> >>
> >> TSEG (and presumably, A/B seg) work like this:
> >> - when open, looks like RAM to normal mode and SMM
> >> - when closed, looks like black-hole to normal mode, and like RAM to SMM
> >>
> >> The generic edk2 code knows this, and manages the SMRAM areas accordingly.
> >>
> >> The area at 0x30000 is different:
> >> - looks like RAM to both normal mode and SMM
> >>
> >> If we set up the alias at 0x30000 into A/B seg,
> >> - will that *permanently* hide the normal RAM at 0x30000?
> >> - will 0x30000 start behaving like A/B seg?
> >>
> >> Basically my concern is that the universal code in edk2 might or might
> >> not keep A/B seg open while initially populating the area at the default
> >> SMBASE. Specifically, I can imagine two issues:
> >>
> >> - if the alias into A/B seg is inactive during the initial population,
> >> then the initial writes go to RAM, but the execution (the first SMBASE
> >> relocation) will occur from A/B seg through the alias
> >>
> >> - alternatively, if the alias is always active, but A/B seg is closed
> >> during initial population (which happens in normal mode), then the
> >> initial writes go to the black hole, and execution will occur 

Re: [Qemu-devel] [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

2019-09-03 Thread Laszlo Ersek
On 09/03/19 16:53, Igor Mammedov wrote:
> On Mon, 2 Sep 2019 21:09:58 +0200
> Laszlo Ersek  wrote:
> 
>> On 09/02/19 10:45, Igor Mammedov wrote:
>>> On Fri, 30 Aug 2019 20:46:14 +0200
>>> Laszlo Ersek  wrote:
>>>   
 On 08/30/19 16:48, Igor Mammedov wrote:
  
> (01) On boot firmware maps and initializes SMI handler at default SMBASE 
> (0x30000)
>  (using dedicated SMRAM at 0x30000 would allow us to avoid save/restore
>   steps and make SMM handler pointer not vulnerable to DMA attacks)
>
> (02) QEMU hotplugs a new CPU in reset-ed state and sends SCI
>
> (03) on receiving SCI, host CPU calls GPE cpu hotplug handler
>   which writes to IO port 0xB2 (broadcast SMI)
>
> (04) firmware waits for all existing CPUs rendezvous in SMM mode,
>  new CPU(s) have SMI pending but does nothing yet
>
> (05) host CPU wakes up one new CPU (INIT-INIT-SIPI)
>  SIPI vector points to RO flash HLT loop.
>  (how host CPU will know which new CPUs to relocate?
>   possibly reuse QEMU CPU hotplug MMIO interface???)
>
> (06) new CPU does relocation.
>  (in case of attacker sends SIPI to several new CPUs,
>   open question how to detect collision of several CPUs at the same 
> default SMBASE)
>
> (07) once new CPU relocated host CPU completes initialization, returns
>  from IO port write and executes the rest of GPE handler, telling OS
>  to online new CPU.

 In step (03), it is the OS that handles the SCI; it transfers control to
 ACPI. The AML can write to IO port 0xB2 only because the OS allows it.

 If the OS decides to omit that step, and sends an INIT-SIPI-SIPI
 directly to the new CPU, can it steal the CPU?  
>>> It sure can but this way it won't get access to privileged SMRAM
>>> so OS can't subvert firmware.
>>> The next time SMI broadcast is sent the CPU will use SMI handler at
>>> default 0x30000 SMBASE. It's up to us to define behavior here (for example
>>> relocation handler can put such CPU in shutdown state).
>>>
>>> It's in the best interest of OS to cooperate and execute AML
>>> provided by firmware, if it does not follow proper cpu hotplug flow
>>> we can't guarantee that stolen CPU will work.  
>>
>> This sounds convincing enough, for the hotplugged CPU; thanks.
>>
>> So now my concern is with step (01). While preparing for the initial
>> relocation (of cold-plugged CPUs), the code assumes the memory at the
>> default SMBASE (0x30000) is normal RAM.
>>
>> Is it not a problem that the area is written initially while running in
>> normal 32-bit or 64-bit mode, but then executed (in response to the
>> first, synchronous, SMI) as SMRAM?
> 
> currently there is no SMRAM at 0x30000, so all access falls through
> into RAM address space and we are about to change that.
> 
> but firmware doesn't have to use it as RAM, it can check if QEMU
> supports SMRAM at 0x30000 and if supported map it to configure
> and then lock it down.

I'm sure you are *technically* right, but you seem to be assuming that I
can modify or rearrange anything I want in edk2. :)

If we can solve the above in OVMF platform code, that's great. If not
(e.g. UefiCpuPkg code needs to be updated), then things will get tricky.
If we can introduce another platform hook for this, that would help. I
can't say before I try.


> 
>  
>> Basically I'm confused by the alias.
>>
>> TSEG (and presumably, A/B seg) work like this:
>> - when open, looks like RAM to normal mode and SMM
>> - when closed, looks like black-hole to normal mode, and like RAM to SMM
>>
>> The generic edk2 code knows this, and manages the SMRAM areas accordingly.
>>
>> The area at 0x30000 is different:
>> - looks like RAM to both normal mode and SMM
>>
>> If we set up the alias at 0x30000 into A/B seg,
>> - will that *permanently* hide the normal RAM at 0x30000?
>> - will 0x30000 start behaving like A/B seg?
>>
>> Basically my concern is that the universal code in edk2 might or might
>> not keep A/B seg open while initially populating the area at the default
>> SMBASE. Specifically, I can imagine two issues:
>>
>> - if the alias into A/B seg is inactive during the initial population,
>> then the initial writes go to RAM, but the execution (the first SMBASE
>> relocation) will occur from A/B seg through the alias
>>
>> - alternatively, if the alias is always active, but A/B seg is closed
>> during initial population (which happens in normal mode), then the
>> initial writes go to the black hole, and execution will occur from a
>> "blank" A/B seg.
>>
>> Am I seeing things? (Sorry, I keep feeling dumber and dumber in this
>> thread.)
> 
> I don't really know how firmware uses A/B segments and I'm afraid that
> cannibalizing one for configuring 0x30000 might break something.
> 
> Since we are inventing something outside of the q35 spec anyway, how about
> leaving A/B/TSEG as they are and using fwcfg to configure when/where
> 

Re: [Qemu-devel] [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

2019-09-03 Thread Igor Mammedov
On Mon, 2 Sep 2019 21:09:58 +0200
Laszlo Ersek  wrote:

> On 09/02/19 10:45, Igor Mammedov wrote:
> > On Fri, 30 Aug 2019 20:46:14 +0200
> > Laszlo Ersek  wrote:
> >   
> >> On 08/30/19 16:48, Igor Mammedov wrote:
> >>  
> >>> (01) On boot firmware maps and initializes SMI handler at default SMBASE 
> >>> (0x30000)
> >>>  (using dedicated SMRAM at 0x30000 would allow us to avoid save/restore
> >>>   steps and make SMM handler pointer not vulnerable to DMA attacks)
> >>>
> >>> (02) QEMU hotplugs a new CPU in reset-ed state and sends SCI
> >>>
> >>> (03) on receiving SCI, host CPU calls GPE cpu hotplug handler
> >>>   which writes to IO port 0xB2 (broadcast SMI)
> >>>
> >>> (04) firmware waits for all existing CPUs rendezvous in SMM mode,
> >>>  new CPU(s) have SMI pending but does nothing yet
> >>>
> >>> (05) host CPU wakes up one new CPU (INIT-INIT-SIPI)
> >>>  SIPI vector points to RO flash HLT loop.
> >>>  (how host CPU will know which new CPUs to relocate?
> >>>   possibly reuse QEMU CPU hotplug MMIO interface???)
> >>>
> >>> (06) new CPU does relocation.
> >>>  (in case of attacker sends SIPI to several new CPUs,
> >>>   open question how to detect collision of several CPUs at the same 
> >>> default SMBASE)
> >>>
> >>> (07) once new CPU relocated host CPU completes initialization, returns
> >>>  from IO port write and executes the rest of GPE handler, telling OS
> >>>  to online new CPU.
> >>
> >> In step (03), it is the OS that handles the SCI; it transfers control to
> >> ACPI. The AML can write to IO port 0xB2 only because the OS allows it.
> >>
> >> If the OS decides to omit that step, and sends an INIT-SIPI-SIPI
> >> directly to the new CPU, can it steal the CPU?  
> > It sure can but this way it won't get access to privileged SMRAM
> > so OS can't subvert firmware.
> > The next time SMI broadcast is sent the CPU will use SMI handler at
> > default 0x30000 SMBASE. It's up to us to define behavior here (for example
> > relocation handler can put such CPU in shutdown state).
> > 
> > It's in the best interest of OS to cooperate and execute AML
> > provided by firmware, if it does not follow proper cpu hotplug flow
> > we can't guarantee that stolen CPU will work.  
> 
> This sounds convincing enough, for the hotplugged CPU; thanks.
> 
> So now my concern is with step (01). While preparing for the initial
> relocation (of cold-plugged CPUs), the code assumes the memory at the
> default SMBASE (0x30000) is normal RAM.
> 
> Is it not a problem that the area is written initially while running in
> normal 32-bit or 64-bit mode, but then executed (in response to the
> first, synchronous, SMI) as SMRAM?

currently there is no SMRAM at 0x30000, so all access falls through
into RAM address space and we are about to change that.

but firmware doesn't have to use it as RAM, it can check if QEMU
supports SMRAM at 0x30000 and if supported map it to configure
and then lock it down.

 
> Basically I'm confused by the alias.
> 
> TSEG (and presumably, A/B seg) work like this:
> - when open, looks like RAM to normal mode and SMM
> - when closed, looks like black-hole to normal mode, and like RAM to SMM
> 
> The generic edk2 code knows this, and manages the SMRAM areas accordingly.
> 
> The area at 0x30000 is different:
> - looks like RAM to both normal mode and SMM
> 
> If we set up the alias at 0x30000 into A/B seg,
> - will that *permanently* hide the normal RAM at 0x30000?
> - will 0x30000 start behaving like A/B seg?
> 
> Basically my concern is that the universal code in edk2 might or might
> not keep A/B seg open while initially populating the area at the default
> SMBASE. Specifically, I can imagine two issues:
> 
> - if the alias into A/B seg is inactive during the initial population,
> then the initial writes go to RAM, but the execution (the first SMBASE
> relocation) will occur from A/B seg through the alias
> 
> - alternatively, if the alias is always active, but A/B seg is closed
> during initial population (which happens in normal mode), then the
> initial writes go to the black hole, and execution will occur from a
> "blank" A/B seg.
> 
> Am I seeing things? (Sorry, I keep feeling dumber and dumber in this
> thread.)

I don't really know how firmware uses A/B segments and I'm afraid that
cannibalizing one for configuring 0x30000 might break something.

Since we are inventing something outside of the q35 spec anyway, how about
leaving A/B/TSEG as they are and using fwcfg to configure when/where
SMRAM(0x30000+128K) should be mapped into RAM address space.

I see a couple of options:
  1: use identity mapping, where SMRAM(0x30000+128K) maps into the same
 range in RAM address space when firmware writes into the fwcfg
 file, and unmaps/locks on the second write (until HW reset)
  2: let firmware choose where to map SMRAM(0x30000+128K) in RAM address
 space; the logic is essentially the same as above, only firmware
 picks and writes into fwcfg an address where 
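
To make option 1 a bit more concrete, the firmware side could be as small as
the sketch below; the fw_cfg file name and the one-byte command protocol
(first write maps, second write unmaps and locks) are made up for illustration:

#include <Library/QemuFwCfgLib.h>

#define SMRAM_AT_SMBASE_MAP   1   // hypothetical: map SMRAM(0x30000+128K) at 0x30000
#define SMRAM_AT_SMBASE_LOCK  2   // hypothetical: unmap and lock until HW reset

STATIC
RETURN_STATUS
SetDefaultSmbaseSmramState (
  IN UINT8  Command
  )
{
  FIRMWARE_CONFIG_ITEM  Item;
  UINTN                 Size;

  //
  // File name is a placeholder; absence of the file would mean the feature
  // is not offered by this QEMU.  Writes need the fw_cfg DMA interface.
  //
  if (RETURN_ERROR (QemuFwCfgFindFile ("etc/smram-at-default-smbase",
                      &Item, &Size)) ||
      Size != sizeof (UINT8)) {
    return RETURN_UNSUPPORTED;
  }
  QemuFwCfgSelectItem (Item);
  QemuFwCfgWriteBytes (sizeof (Command), &Command);
  return RETURN_SUCCESS;
}

Firmware would call it once before populating the relocation handler and once
more right after the initial SMBASE relocation.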

Re: [Qemu-devel] [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

2019-09-02 Thread Laszlo Ersek
On 09/02/19 10:45, Igor Mammedov wrote:
> On Fri, 30 Aug 2019 20:46:14 +0200
> Laszlo Ersek  wrote:
> 
>> On 08/30/19 16:48, Igor Mammedov wrote:
>>
>>> (01) On boot firmware maps and initializes SMI handler at default SMBASE 
>>> (0x30000)
>>>  (using dedicated SMRAM at 0x30000 would allow us to avoid save/restore
>>>   steps and make SMM handler pointer not vulnerable to DMA attacks)
>>>
>>> (02) QEMU hotplugs a new CPU in reset-ed state and sends SCI
>>>
>>> (03) on receiving SCI, host CPU calls GPE cpu hotplug handler
>>>   which writes to IO port 0xB2 (broadcast SMI)
>>>
>>> (04) firmware waits for all existing CPUs rendezvous in SMM mode,
>>>  new CPU(s) have SMI pending but does nothing yet
>>>
>>> (05) host CPU wakes up one new CPU (INIT-INIT-SIPI)
>>>  SIPI vector points to RO flash HLT loop.
>>>  (how host CPU will know which new CPUs to relocate?
>>>   possibly reuse QEMU CPU hotplug MMIO interface???)
>>>
>>> (06) new CPU does relocation.
>>>  (in case of attacker sends SIPI to several new CPUs,
>>>   open question how to detect collision of several CPUs at the same 
>>> default SMBASE)
>>>
>>> (07) once new CPU relocated host CPU completes initialization, returns
>>>  from IO port write and executes the rest of GPE handler, telling OS
>>>  to online new CPU.  
>>
>> In step (03), it is the OS that handles the SCI; it transfers control to
>> ACPI. The AML can write to IO port 0xB2 only because the OS allows it.
>>
>> If the OS decides to omit that step, and sends an INIT-SIPI-SIPI
>> directly to the new CPU, can it steal the CPU?
> It sure can but this way it won't get access to privileged SMRAM
> so OS can't subvert firmware.
> The next time SMI broadcast is sent the CPU will use SMI handler at
> default 0x30000 SMBASE. It's up to us to define behavior here (for example
> relocation handler can put such CPU in shutdown state).
> 
> It's in the best interest of OS to cooperate and execute AML
> provided by firmware, if it does not follow proper cpu hotplug flow
> we can't guarantee that stolen CPU will work.

This sounds convincing enough, for the hotplugged CPU; thanks.

So now my concern is with step (01). While preparing for the initial
relocation (of cold-plugged CPUs), the code assumes the memory at the
default SMBASE (0x30000) is normal RAM.

Is it not a problem that the area is written initially while running in
normal 32-bit or 64-bit mode, but then executed (in response to the
first, synchronous, SMI) as SMRAM?

Basically I'm confused by the alias.

TSEG (and presumably, A/B seg) work like this:
- when open, looks like RAM to normal mode and SMM
- when closed, looks like black-hole to normal mode, and like RAM to SMM

The generic edk2 code knows this, and manages the SMRAM areas accordingly.

The area at 0x30000 is different:
- looks like RAM to both normal mode and SMM

If we set up the alias at 0x30000 into A/B seg,
- will that *permanently* hide the normal RAM at 0x30000?
- will 0x30000 start behaving like A/B seg?

Basically my concern is that the universal code in edk2 might or might
not keep A/B seg open while initially populating the area at the default
SMBASE. Specifically, I can imagine two issues:

- if the alias into A/B seg is inactive during the initial population,
then the initial writes go to RAM, but the execution (the first SMBASE
relocation) will occur from A/B seg through the alias

- alternatively, if the alias is always active, but A/B seg is closed
during initial population (which happens in normal mode), then the
initial writes go to the black hole, and execution will occur from a
"blank" A/B seg.

Am I seeing things? (Sorry, I keep feeling dumber and dumber in this
thread.)

Anyway, I guess we could try and see if OVMF still boots with the alias...

Thanks
Laszlo



Re: [Qemu-devel] [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

2019-09-02 Thread Igor Mammedov
On Fri, 30 Aug 2019 20:46:14 +0200
Laszlo Ersek  wrote:

> On 08/30/19 16:48, Igor Mammedov wrote:
> 
> > (01) On boot firmware maps and initializes SMI handler at default SMBASE 
> > (0x30000)
> >  (using dedicated SMRAM at 0x30000 would allow us to avoid save/restore
> >   steps and make SMM handler pointer not vulnerable to DMA attacks)
> > 
> > (02) QEMU hotplugs a new CPU in reset-ed state and sends SCI
> > 
> > (03) on receiving SCI, host CPU calls GPE cpu hotplug handler
> >   which writes to IO port 0xB2 (broadcast SMI)
> > 
> > (04) firmware waits for all existing CPUs rendezvous in SMM mode,
> >  new CPU(s) have SMI pending but does nothing yet
> > 
> > (05) host CPU wakes up one new CPU (INIT-INIT-SIPI)
> >  SIPI vector points to RO flash HLT loop.
> >  (how host CPU will know which new CPUs to relocate?
> >   possibly reuse QEMU CPU hotplug MMIO interface???)
> > 
> > (06) new CPU does relocation.
> >  (in case of attacker sends SIPI to several new CPUs,
> >   open question how to detect collision of several CPUs at the same 
> > default SMBASE)
> > 
> > (07) once new CPU relocated host CPU completes initialization, returns
> >  from IO port write and executes the rest of GPE handler, telling OS
> >  to online new CPU.  
> 
> In step (03), it is the OS that handles the SCI; it transfers control to
> ACPI. The AML can write to IO port 0xB2 only because the OS allows it.
> 
> If the OS decides to omit that step, and sends an INIT-SIPI-SIPI
> directly to the new CPU, can it steal the CPU?
It sure can but this way it won't get access to privileged SMRAM
so OS can't subvert firmware.
The next time SMI broadcast is sent the CPU will use SMI handler at
default 0x30000 SMBASE. It's up to us to define behavior here (for example
relocation handler can put such CPU in shutdown state).

It's in the best interest of OS to cooperate and execute AML
provided by firmware, if it does not follow proper cpu hotplug flow
we can't guarantee that stolen CPU will work.

> Thanks!
> Laszlo




Re: [Qemu-devel] [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

2019-08-30 Thread Laszlo Ersek
On 08/30/19 16:48, Igor Mammedov wrote:

> (01) On boot firmware maps and initializes SMI handler at default SMBASE 
> (0x30000)
>  (using dedicated SMRAM at 0x30000 would allow us to avoid save/restore
>   steps and make SMM handler pointer not vulnerable to DMA attacks)
> 
> (02) QEMU hotplugs a new CPU in reset-ed state and sends SCI
> 
> (03) on receiving SCI, host CPU calls GPE cpu hotplug handler
>   which writes to IO port 0xB2 (broadcast SMI)
> 
> (04) firmware waits for all existing CPUs rendezvous in SMM mode,
>  new CPU(s) have SMI pending but does nothing yet
> 
> (05) host CPU wakes up one new CPU (INIT-INIT-SIPI)
>  SIPI vector points to RO flash HLT loop.
>  (how host CPU will know which new CPUs to relocate?
>   possibly reuse QEMU CPU hotplug MMIO interface???)
> 
> (06) new CPU does relocation.
>  (in case of attacker sends SIPI to several new CPUs,
>   open question how to detect collision of several CPUs at the same 
> default SMBASE)
> 
> (07) once new CPU relocated host CPU completes initialization, returns
>  from IO port write and executes the rest of GPE handler, telling OS
>  to online new CPU.

In step (03), it is the OS that handles the SCI; it transfers control to
ACPI. The AML can write to IO port 0xB2 only because the OS allows it.

If the OS decides to omit that step, and sends an INIT-SIPI-SIPI
directly to the new CPU, can it steal the CPU?

Thanks!
Laszlo



Re: [Qemu-devel] [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

2019-08-30 Thread Igor Mammedov
On Thu, 29 Aug 2019 19:01:35 +0200
Laszlo Ersek  wrote:

> On 08/27/19 20:31, Igor Mammedov wrote:
> > On Sat, 24 Aug 2019 01:48:09 +
> > "Yao, Jiewen"  wrote:  
> 
> >> (05) Host CPU: (OS) Port 0xB2 write, all CPUs enter SMM (NOTE: New CPU
> >>  will not enter SMM because SMI is disabled)  
> > I think only CPU that does the write will enter SMM  
> 
> That used to be the case (and it is still the default QEMU behavior, if
> broadcast SMI is not negotiated). However, OVMF does negotiate broadcast
> SMI whenever QEMU offers the feature. Broadcast SMI is important for the
> stability of the edk2 SMM infrastructure on QEMU/KVM, we've found.
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1412313
> https://bugzilla.redhat.com/show_bug.cgi?id=1412327
> 
> > and we might not need to pull in all already initialized CPUs into SMM.  
> 
> That, on the other hand, could be a valid idea. But then the CPU should
> use a different method for raising a synchronous SMI for itself (not a
> write to IO port 0xB2). Is a "directed SMI for self" possible?

Theoretically, depending on the argument in 0xB3, it should be possible to
raise a directed SMI even if broadcast ones are negotiated.
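
Guest-side it would be just the usual two APM ports; a sketch, under a purely
hypothetical convention where a non-zero value latched in 0xB3 asks QEMU not
to broadcast:

#include <Library/IoLib.h>

#define APM_STS_PORT  0xB3   // scratch byte, readable by the SMI handler
#define APM_CNT_PORT  0xB2   // the write that actually raises the SMI

#define SMI_ARG_SELF_ONLY  0x01   // hypothetical "don't broadcast" argument

STATIC
VOID
RaiseSmiForSelf (
  IN UINT8  Command
  )
{
  IoWrite8 (APM_STS_PORT, SMI_ARG_SELF_ONLY);
  IoWrite8 (APM_CNT_PORT, Command);   // synchronous: SMI is taken before return
}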

> > [...]  
> 
> I've tried to read through the procedure with your suggested changes,
> but I'm failing at composing a coherent mental image, in this email
> response format.
> 
> If you have the time, can you write up the suggested list of steps in a
> "flat" format? (I believe you are suggesting to eliminate some steps
> completely.)
if I'd sum it up:

(01) On boot firmware maps and initializes SMI handler at default SMBASE (0x30000)
 (using dedicated SMRAM at 0x30000 would allow us to avoid save/restore
  steps and make SMM handler pointer not vulnerable to DMA attacks)

(02) QEMU hotplugs a new CPU in reset-ed state and sends SCI

(03) on receiving SCI, host CPU calls GPE cpu hotplug handler
  which writes to IO port 0xB2 (broadcast SMI)

(04) firmware waits for all existing CPUs rendezvous in SMM mode,
 new CPU(s) have SMI pending but does nothing yet

(05) host CPU wakes up one new CPU (INIT-INIT-SIPI)
 SIPI vector points to RO flash HLT loop.
 (how host CPU will know which new CPUs to relocate?
  possibly reuse QEMU CPU hotplug MMIO interface???)

(06) new CPU does relocation.
 (in case of attacker sends SIPI to several new CPUs,
  open question how to detect collision of several CPUs at the same default 
SMBASE)

(07) once new CPU relocated host CPU completes initialization, returns
 from IO port write and executes the rest of GPE handler, telling OS
 to online new CPU.


> ... jumping to another point:
> 
> >> 2) Let trusted software (SMM and init code) guarantee SMREBASE one by one 
> >> (include any code runs before SMREBASE)  
> > that would mean pulling all present CPUs into SMM mode so no attack
> > code could be executing before doing hotplug. With a lot of present CPUs
> > it could be quite expensive and unlike physical hardware, guest's CPUs
> > could be preempted arbitrarily long causing long delays.  
> 
> I agree with your analysis, but I slightly disagree about the impact:
> 
> - CPU hotplug is not a frequent administrative action, so the CPU load
> should be temporary (it should be a spike). I don't worry that it would
> trip up OS kernel code. (SMI handling is known to take long on physical
> platforms too.) In practice, all "normal" SMIs are broadcast already (for
> example when calling the runtime UEFI variable services from the OS kernel).
> 
> - The fact that QEMU/KVM introduces some jitter into the execution of
> multi-core code (including SMM code) has proved useful in the past, for
> catching edk2 regressions.
> 
> Again, this is not a strong disagreement from my side. I'm open to
> better ways for synching CPUs during multi-CPU hotplug.
> 
> (Digression:
> 
> I expect someone could be curious why (a) I find it acceptable (even
> beneficial) that "some jitter" injected by the QEMU/KVM scheduling
> exposes multi-core regressions in edk2, but at the same time (b) I found
> it really important to add broadcast SMI to QEMU and OVMF. After all,
> both "jitter" and "unicast SMIs" are QEMU/KVM platform specifics, so why
> the different treatment?
> 
> The reason is that the "jitter" does not interfere with normal
> operation, and it has been good for catching *regressions*. IOW, there
> is a working edk2 state, someone posts a patch, works on physical
> hardware, but breaks on QEMU/KVM --> then we can still reject or rework
> or revert the patch. And we're back to a working state again (in the
> best case, with a fixed feature patch).
> 
> With the unicast SMIs however, it was impossible to enable the SMM stack
> reliably in the first place. There was no functional state to return to.
I don't really get the last statement, but then I know nothing about OVMF.
I don't insist on unicast SMI being used; it's just some ideas about what
we could do. It could be done later, broadcast 

Re: [Qemu-devel] [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

2019-08-30 Thread Igor Mammedov
On Thu, 29 Aug 2019 18:25:17 +0200
Laszlo Ersek  wrote:

> On 08/28/19 14:01, Igor Mammedov wrote:
> > On Tue, 27 Aug 2019 22:11:15 +0200
> > Laszlo Ersek  wrote:
> >   
> >> On 08/27/19 18:23, Igor Mammedov wrote:  
> >>> On Mon, 26 Aug 2019 17:30:43 +0200
> >>> Laszlo Ersek  wrote:
> >>>  
>  On 08/23/19 17:25, Kinney, Michael D wrote:  
> > Hi Jiewen,
> >
> > If a hot add CPU needs to run any code before the
> > first SMI, I would recommend it only executes code
> > from a write protected FLASH range without a stack
> > and then wait for the first SMI.
> 
>  "without a stack" looks very risky to me. Even if we manage to implement
>  the guest code initially, we'll be trapped without a stack, should we
>  ever need to add more complex stuff there.  
> >>>
> >>> Do we need anything complex in relocation handler, though?
> >>> From what I'd imagine, minimum handler should
> >>>   1: get address of TSEG, possibly read it from chipset  
> >>
> >> The TSEG base calculation is not trivial in this environment. The 32-bit
> >> RAM size needs to be read from the CMOS (IO port accesses). Then the
> >> extended TSEG size (if any) needs to be detected from PCI config space
> >> (IO port accesses). Both CMOS and PCI config space requires IO port
> >> writes too (not just reads). Even if there are enough registers for the
> >> calculations, can we rely on these unprotected IO ports?
> >>
> >> Also, can we switch to 32-bit mode without a stack? I assume it would be
> >> necessary to switch to 32-bit mode for 32-bit arithmetic.  
> > from SDM vol 3:
> > "
> > 34.5.1 Initial SMM Execution Environment
> > After saving the current context of the processor, the processor 
> > initializes its core registers to the values shown in Table 34-4. Upon 
> > entering SMM, the PE and PG flags in control register CR0 are cleared, 
> > which places the processor in an environment similar to real-address mode. 
> > The differences between the SMM execution environment and the real-address 
> > mode execution environment are as follows:
> > • The addressable address space ranges from 0 to FFFFFFFFH (4 GBytes).
> > • The normal 64-KByte segment limit for real-address mode is increased to 4 
> > GBytes.
> > • The default operand and address sizes are set to 16 bits, which restricts 
> > the addressable SMRAM address space to the 1-MByte real-address mode limit 
> > for native real-address-mode code. However, operand-size and address-size 
> > override prefixes can be used to access the address space beyond
> >  
> > the 1-MByte.
> > "  
> 
> That helps. Thanks for the quote!
> 
> >> Getting the initial APIC ID needs some CPUID instructions IIUC, which
> >> clobber EAX through EDX, if I understand correctly. Given the register
> >> pressure, CPUID might have to be one of the first instructions to call.  
> > 
> > we could map at 0x30000 not just the 64K required for the save area but 128K and use the
> > 2nd half as secure RAM for stack and intermediate data.
> > 
> > Firmware could put there pre-calculated pointer to TSEG after it's 
> > configured and locked down,
> > this way relocation handler won't have to figure out TSEG address on its 
> > own.  
> 
> Sounds like a great idea.
> 
> >>>   2: calculate its new SMBASE offset based on its APIC ID
> >>>   3: save new SMBASE
> >>>  
> > For this OVMF use case, is any CPU init required
> > before the first SMI?
> 
>  I expressed a preference for that too: "I wish we could simply wake the
>  new CPU [...] with an SMI".
> 
>  http://mid.mail-archive.com/398b3327-0820-95af-a34d-1a4a1d50cf35@redhat.com
> 
>   
> > From Paolo's list of steps are steps (8a) and (8b) 
> > really required?
> >>>
> >>> 07b - implies 08b  
> >>
> >> I agree about that implication, yes. *If* we send an INIT/SIPI/SIPI to
> >> the new CPU, then the new CPU needs a HLT loop, I think.  
> > It also could execute INIT reset, which leaves initialized SMM untouched
> > but otherwise CPU would be inactive.
> >
> >>  
> >>>8b could be trivial hlt loop and we most likely could skip 08a and 
> >>> signaling host CPU steps
> >>>but we need INIT/SIPI/SIPI sequence to wake up AP so it could handle 
> >>> pending SMI
> >>>before handling SIPI (so behavior would follow SDM).
> >>>
> >>>  
>  See again my message linked above -- just after the quoted sentence, I
>  wrote, "IOW, if we could excise steps 07b, 08a, 08b".
> 
>  But, I obviously defer to Paolo and Igor on that.
> 
>  (I do believe we have a dilemma here. In QEMU, we probably prefer to
>  emulate physical hardware as faithfully as possible. However, we do not
>  have Cache-As-RAM (nor do we intend to, IIUC). Does that justify other
>  divergences from physical hardware too, such as waking just by virtue of
>  an SMI?)  
> >>> So far we should be able to implement it per spec (at least SDM 

Re: [Qemu-devel] [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

2019-08-29 Thread Laszlo Ersek
On 08/27/19 20:31, Igor Mammedov wrote:
> On Sat, 24 Aug 2019 01:48:09 +
> "Yao, Jiewen"  wrote:

>> (05) Host CPU: (OS) Port 0xB2 write, all CPUs enter SMM (NOTE: New CPU
>>  will not enter SMM because SMI is disabled)
> I think only CPU that does the write will enter SMM

That used to be the case (and it is still the default QEMU behavior, if
broadcast SMI is not negotiated). However, OVMF does negotiate broadcast
SMI whenever QEMU offers the feature. Broadcast SMI is important for the
stability of the edk2 SMM infrastructure on QEMU/KVM, we've found.

https://bugzilla.redhat.com/show_bug.cgi?id=1412313
https://bugzilla.redhat.com/show_bug.cgi?id=1412327
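
(For reference, the negotiation itself is a small fw_cfg handshake over three
files; a condensed sketch, with error handling trimmed and the broadcast bit
position written down from memory, so double-check it:)

#include <Library/QemuFwCfgLib.h>

#define SMI_FEATURE_BROADCAST  BIT0   // ICH9_LPC_SMI_F_BROADCAST, assumed bit 0

STATIC
BOOLEAN
NegotiateBroadcastSmi (
  VOID
  )
{
  FIRMWARE_CONFIG_ITEM  Supported, Requested, Ok;
  UINTN                 Size;
  UINT64                Features;

  if (RETURN_ERROR (QemuFwCfgFindFile ("etc/smi/supported-features",
                      &Supported, &Size)) ||
      RETURN_ERROR (QemuFwCfgFindFile ("etc/smi/requested-features",
                      &Requested, &Size)) ||
      RETURN_ERROR (QemuFwCfgFindFile ("etc/smi/features-ok", &Ok, &Size))) {
    return FALSE;                      // old QEMU: unicast SMIs only
  }

  QemuFwCfgSelectItem (Supported);
  Features = QemuFwCfgRead64 ();
  Features &= SMI_FEATURE_BROADCAST;   // request only what we understand

  QemuFwCfgSelectItem (Requested);
  QemuFwCfgWriteBytes (sizeof Features, &Features);

  //
  // Reading "features-ok" makes QEMU validate and commit the request;
  // a value of 1 means it was accepted.
  //
  QemuFwCfgSelectItem (Ok);
  return (BOOLEAN)(QemuFwCfgRead8 () == 1 &&
                   (Features & SMI_FEATURE_BROADCAST) != 0);
}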

> and we might not need to pull in all already initialized CPUs into SMM.

That, on the other hand, could be a valid idea. But then the CPU should
use a different method for raising a synchronous SMI for itself (not a
write to IO port 0xB2). Is a "directed SMI for self" possible?

> [...]

I've tried to read through the procedure with your suggested changes,
but I'm failing at composing a coherent mental image, in this email
response format.

If you have the time, can you write up the suggested list of steps in a
"flat" format? (I believe you are suggesting to eliminate some steps
completely.)

... jumping to another point:

>> 2) Let trusted software (SMM and init code) guarantee SMREBASE one by one 
>> (include any code runs before SMREBASE)
> that would mean pulling all present CPUs into SMM mode so no attack
> code could be executing before doing hotplug. With a lot of present CPUs
> it could be quite expensive and unlike physical hardware, guest's CPUs
> could be preempted arbitrarily long causing long delays.

I agree with your analysis, but I slightly disagree about the impact:

- CPU hotplug is not a frequent administrative action, so the CPU load
should be temporary (it should be a spike). I don't worry that it would
trip up OS kernel code. (SMI handling is known to take long on physical
platforms too.) In practice, all "normal" SMIs are broadcast already (for
example when calling the runtime UEFI variable services from the OS kernel).

- The fact that QEMU/KVM introduces some jitter into the execution of
multi-core code (including SMM code) has proved useful in the past, for
catching edk2 regressions.

Again, this is not a strong disagreement from my side. I'm open to
better ways for synching CPUs during multi-CPU hotplug.

(Digression:

I expect someone could be curious why (a) I find it acceptable (even
beneficial) that "some jitter" injected by the QEMU/KVM scheduling
exposes multi-core regressions in edk2, but at the same time (b) I found
it really important to add broadcast SMI to QEMU and OVMF. After all,
both "jitter" and "unicast SMIs" are QEMU/KVM platform specifics, so why
the different treatment?

The reason is that the "jitter" does not interfere with normal
operation, and it has been good for catching *regressions*. IOW, there
is a working edk2 state, someone posts a patch, works on physical
hardware, but breaks on QEMU/KVM --> then we can still reject or rework
or revert the patch. And we're back to a working state again (in the
best case, with a fixed feature patch).

With the unicast SMIs however, it was impossible to enable the SMM stack
reliably in the first place. There was no functional state to return to.

Digression ends.)

> let's first see if we can ignore the race

Makes me uncomfortable, but if this is the consensus, I'll go along.

> and if it's not then
> we probably end up with implementing some form of #1

OK.

Thanks!
Laszlo



Re: [Qemu-devel] [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

2019-08-29 Thread Laszlo Ersek
On 08/28/19 14:01, Igor Mammedov wrote:
> On Tue, 27 Aug 2019 22:11:15 +0200
> Laszlo Ersek  wrote:
> 
>> On 08/27/19 18:23, Igor Mammedov wrote:
>>> On Mon, 26 Aug 2019 17:30:43 +0200
>>> Laszlo Ersek  wrote:
>>>
 On 08/23/19 17:25, Kinney, Michael D wrote:
> Hi Jiewen,
>
> If a hot add CPU needs to run any code before the
> first SMI, I would recommend it only executes code
> from a write protected FLASH range without a stack
> and then wait for the first SMI.  

 "without a stack" looks very risky to me. Even if we manage to implement
 the guest code initially, we'll be trapped without a stack, should we
 ever need to add more complex stuff there.
>>>
>>> Do we need anything complex in relocation handler, though?
>>> From what I'd imagine, minimum handler should
>>>   1: get address of TSEG, possibly read it from chipset
>>
>> The TSEG base calculation is not trivial in this environment. The 32-bit
>> RAM size needs to be read from the CMOS (IO port accesses). Then the
>> extended TSEG size (if any) needs to be detected from PCI config space
>> (IO port accesses). Both CMOS and PCI config space requires IO port
>> writes too (not just reads). Even if there are enough registers for the
>> calculations, can we rely on these unprotected IO ports?
>>
>> Also, can we switch to 32-bit mode without a stack? I assume it would be
>> necessary to switch to 32-bit mode for 32-bit arithmetic.
> from SDM vol 3:
> "
> 34.5.1 Initial SMM Execution Environment
> After saving the current context of the processor, the processor initializes 
> its core registers to the values shown in Table 34-4. Upon entering SMM, the 
> PE and PG flags in control register CR0 are cleared, which places the 
> processor in an environment similar to real-address mode. The differences 
> between the SMM execution environment and the real-address mode execution 
> environment are as follows:
> • The addressable address space ranges from 0 to FFFFFFFFH (4 GBytes).
> • The normal 64-KByte segment limit for real-address mode is increased to 4 
> GBytes.
> • The default operand and address sizes are set to 16 bits, which restricts 
> the addressable SMRAM address space to the 1-MByte real-address mode limit 
> for native real-address-mode code. However, operand-size and address-size 
> override prefixes can be used to access the address space beyond
>  
> the 1-MByte.
> "

That helps. Thanks for the quote!

>> Getting the initial APIC ID needs some CPUID instructions IIUC, which
>> clobber EAX through EDX, if I understand correctly. Given the register
>> pressure, CPUID might have to be one of the first instructions to call.
> 
> we could map at 0x30000 not just the 64K required for the save area but 128K and use the
> 2nd half as secure RAM for stack and intermediate data.
> 
> Firmware could put there pre-calculated pointer to TSEG after it's configured 
> and locked down,
> this way relocation handler won't have to figure out TSEG address on its own.

Sounds like a great idea.

>>>   2: calculate its new SMBASE offset based on its APIC ID
>>>   3: save new SMBASE
>>>
> For this OVMF use case, is any CPU init required
> before the first SMI?  

 I expressed a preference for that too: "I wish we could simply wake the
 new CPU [...] with an SMI".

 http://mid.mail-archive.com/398b3327-0820-95af-a34d-1a4a1d50cf35@redhat.com


> From Paolo's list of steps are steps (8a) and (8b) 
> really required?  
>>>
>>> 07b - implies 08b
>>
>> I agree about that implication, yes. *If* we send an INIT/SIPI/SIPI to
>> the new CPU, then the new CPU needs a HLT loop, I think.
> It also could execute INIT reset, which leaves initialized SMM untouched
> but otherwise CPU would be inactive.
>  
>>
>>>8b could be trivial hlt loop and we most likely could skip 08a and 
>>> signaling host CPU steps
>>>but we need INIT/SIPI/SIPI sequence to wake up AP so it could handle 
>>> pending SMI
>>>before handling SIPI (so behavior would follow SDM).
>>>
>>>
 See again my message linked above -- just after the quoted sentence, I
 wrote, "IOW, if we could excise steps 07b, 08a, 08b".

 But, I obviously defer to Paolo and Igor on that.

 (I do believe we have a dilemma here. In QEMU, we probably prefer to
 emulate physical hardware as faithfully as possible. However, we do not
 have Cache-As-RAM (nor do we intend to, IIUC). Does that justify other
 divergences from physical hardware too, such as waking just by virtue of
 an SMI?)
>>> So far we should be able to implement it per spec (at least SDM one),
>>> but we would still need to invent chipset hardware
>>> i.e. like adding to Q35 non exiting SMRAM and means to map/unmap it
>>> to non-SMM address space.
>>> (and I hope we could avoid adding "parked CPU" thingy)
>>
>> I think we'll need a separate QEMU tree for this. I'm quite in the dark
>> -- I can't tell if 

Re: [Qemu-devel] [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

2019-08-28 Thread Igor Mammedov
On Tue, 27 Aug 2019 22:11:15 +0200
Laszlo Ersek  wrote:

> On 08/27/19 18:23, Igor Mammedov wrote:
> > On Mon, 26 Aug 2019 17:30:43 +0200
> > Laszlo Ersek  wrote:
> > 
> >> On 08/23/19 17:25, Kinney, Michael D wrote:
> >>> Hi Jiewen,
> >>>
> >>> If a hot add CPU needs to run any code before the
> >>> first SMI, I would recommend it only executes code
> >>> from a write protected FLASH range without a stack
> >>> and then wait for the first SMI.  
> >>
> >> "without a stack" looks very risky to me. Even if we manage to implement
> >> the guest code initially, we'll be trapped without a stack, should we
> >> ever need to add more complex stuff there.
> > 
> > Do we need anything complex in relocation handler, though?
> > From what I'd imagine, minimum handler should
> >   1: get address of TSEG, possibly read it from chipset
> 
> The TSEG base calculation is not trivial in this environment. The 32-bit
> RAM size needs to be read from the CMOS (IO port accesses). Then the
> extended TSEG size (if any) needs to be detected from PCI config space
> (IO port accesses). Both CMOS and PCI config space requires IO port
> writes too (not just reads). Even if there are enough registers for the
> calculations, can we rely on these unprotected IO ports?
> 
> Also, can we switch to 32-bit mode without a stack? I assume it would be
> necessary to switch to 32-bit mode for 32-bit arithmetic.
from SDM vol 3:
"
34.5.1 Initial SMM Execution Environment
After saving the current context of the processor, the processor initializes 
its core registers to the values shown in Table 34-4. Upon entering SMM, the PE 
and PG flags in control register CR0 are cleared, which places the processor in 
an environment similar to real-address mode. The differences between the SMM 
execution environment and the real-address mode execution environment are as 
follows:
• The addressable address space ranges from 0 to FFFFFFFFH (4 GBytes).
• The normal 64-KByte segment limit for real-address mode is increased to 4 
GBytes.
• The default operand and address sizes are set to 16 bits, which restricts the 
addressable SMRAM address space to the 1-MByte real-address mode limit for 
native real-address-mode code. However, operand-size and address-size override 
prefixes can be used to access the address space beyond
 
the 1-MByte.
"

> 
> Getting the initial APIC ID needs some CPUID instructions IIUC, which
> clobber EAX through EDX, if I understand correctly. Given the register
> pressure, CPUID might have to be one of the first instructions to call.

we could map at 0x30000 not just the 64K required for the save area but 128K and use the
2nd half as secure RAM for stack and intermediate data.

Firmware could put there pre-calculated pointer to TSEG after it's configured 
and locked down,
this way relocation handler won't have to figure out TSEG address on its own.
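
A possible (entirely made-up) layout for that second 64K half, written by
firmware before locking and only read by the relocation handler:

#include <stdint.h>

#define DEFAULT_SMBASE        0x30000u
#define RELOCATION_DATA_BASE  (DEFAULT_SMBASE + 0x10000u)  /* upper 64K half */

/* Hypothetical layout: filled in by firmware after TSEG is configured and
 * locked down, consumed by the relocation handler, so the handler never has
 * to probe CMOS or PCI config space itself.
 */
typedef struct {
    uint32_t signature;      /* validity check, e.g. "SMBR"          */
    uint32_t tseg_base;      /* physical base of TSEG                */
    uint32_t tseg_size;      /* in bytes                             */
    uint32_t smbase_stride;  /* per-CPU SMBASE spacing inside TSEG   */
    /* rest of the 64K: stack + intermediate data for the handler */
} relocation_data_t;

static inline volatile relocation_data_t *relocation_data(void)
{
    return (volatile relocation_data_t *)(uintptr_t)RELOCATION_DATA_BASE;
}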

> >   2: calculate its new SMBASE offset based on its APIC ID
> >   3: save new SMBASE
> > 
> >>> For this OVMF use case, is any CPU init required
> >>> before the first SMI?  
> >>
> >> I expressed a preference for that too: "I wish we could simply wake the
> >> new CPU [...] with an SMI".
> >>
> >> http://mid.mail-archive.com/398b3327-0820-95af-a34d-1a4a1d50cf35@redhat.com
> >>
> >>
> >>> From Paolo's list of steps are steps (8a) and (8b) 
> >>> really required?  
> > 
> > 07b - implies 08b
> 
> I agree about that implication, yes. *If* we send an INIT/SIPI/SIPI to
> the new CPU, then the new CPU needs a HLT loop, I think.
It also could execute INIT reset, which leaves initialized SMM untouched
but otherwise CPU would be inactive.
 
> 
> >8b could be trivial hlt loop and we most likely could skip 08a and 
> > signaling host CPU steps
> >but we need INIT/SIPI/SIPI sequence to wake up AP so it could handle 
> > pending SMI
> >before handling SIPI (so behavior would follow SDM).
> > 
> > 
> >> See again my message linked above -- just after the quoted sentence, I
> >> wrote, "IOW, if we could excise steps 07b, 08a, 08b".
> >>
> >> But, I obviously defer to Paolo and Igor on that.
> >>
> >> (I do believe we have a dilemma here. In QEMU, we probably prefer to
> >> emulate physical hardware as faithfully as possible. However, we do not
> >> have Cache-As-RAM (nor do we intend to, IIUC). Does that justify other
> >> divergences from physical hardware too, such as waking just by virtue of
> >> an SMI?)
> > So far we should be able to implement it per spec (at least SDM one),
> > but we would still need to invent chipset hardware
> > i.e. like adding to Q35 non exiting SMRAM and means to map/unmap it
> > to non-SMM address space.
> > (and I hope we could avoid adding "parked CPU" thingy)
> 
> I think we'll need a separate QEMU tree for this. I'm quite in the dark
> -- I can't tell if I'll be able to do something in OVMF without actually
> trying it. And for that, we'll need some proposed QEMU code that is
> 

Re: [Qemu-devel] [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

2019-08-27 Thread Laszlo Ersek
On 08/27/19 18:23, Igor Mammedov wrote:
> On Mon, 26 Aug 2019 17:30:43 +0200
> Laszlo Ersek  wrote:
> 
>> On 08/23/19 17:25, Kinney, Michael D wrote:
>>> Hi Jiewen,
>>>
>>> If a hot add CPU needs to run any code before the
>>> first SMI, I would recommend it only executes code
>>> from a write protected FLASH range without a stack
>>> and then wait for the first SMI.  
>>
>> "without a stack" looks very risky to me. Even if we manage to implement
>> the guest code initially, we'll be trapped without a stack, should we
>> ever need to add more complex stuff there.
> 
> Do we need anything complex in relocation handler, though?
> From what I'd imagine, minimum handler should
>   1: get address of TSEG, possibly read it from chipset

The TSEG base calculation is not trivial in this environment. The 32-bit
RAM size needs to be read from the CMOS (IO port accesses). Then the
extended TSEG size (if any) needs to be detected from PCI config space
(IO port accesses). Both CMOS and PCI config space requires IO port
writes too (not just reads). Even if there are enough registers for the
calculations, can we rely on these unprotected IO ports?
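
(To illustrate the amount of IO port traffic involved, here is a sketch of
that calculation as it would look with full library support; the CMOS offsets
0x34/0x35 (RAM above 16 MB in 64 KiB units) and the extended-TSEG-size
register at MCH config offset 0x50 are from memory, and the legacy 1/2/8 MB
TSEG encodings are ignored, so treat it as a sketch of the problem, not a
stackless solution:)

#include <Library/IoLib.h>
#include <Library/PciLib.h>

STATIC
UINT32
CalculateTsegBase (
  VOID
  )
{
  UINT32  TopOfLowRam;
  UINT16  ExtTsegMbytes;

  IoWrite8 (0x70, 0x34);                       // CMOS: low byte
  TopOfLowRam  = IoRead8 (0x71);
  IoWrite8 (0x70, 0x35);                       // CMOS: high byte
  TopOfLowRam |= (UINT32)IoRead8 (0x71) << 8;
  TopOfLowRam  = (TopOfLowRam << 16) + SIZE_16MB;   // 64 KiB units above 16 MB

  ExtTsegMbytes = PciRead16 (PCI_LIB_ADDRESS (0, 0, 0, 0x50));

  return TopOfLowRam - (UINT32)ExtTsegMbytes * SIZE_1MB;
}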

Also, can we switch to 32-bit mode without a stack? I assume it would be
necessary to switch to 32-bit mode for 32-bit arithmetic.

Getting the initial APIC ID needs some CPUID instructions IIUC, which
clobber EAX through EDX, if I understand correctly. Given the register
pressure, CPUID might have to be one of the first instructions to call.

>   2: calculate its new SMBASE offset based on its APIC ID
>   3: save new SMBASE
> 
>>> For this OVMF use case, is any CPU init required
>>> before the first SMI?  
>>
>> I expressed a preference for that too: "I wish we could simply wake the
>> new CPU [...] with an SMI".
>>
>> http://mid.mail-archive.com/398b3327-0820-95af-a34d-1a4a1d50cf35@redhat.com
>>
>>
>>> From Paolo's list of steps are steps (8a) and (8b) 
>>> really required?  
> 
> 07b - implies 08b

I agree about that implication, yes. *If* we send an INIT/SIPI/SIPI to
the new CPU, then the new CPU needs a HLT loop, I think.

>8b could be trivial hlt loop and we most likely could skip 08a and 
> signaling host CPU steps
>but we need INIT/SIPI/SIPI sequence to wake up AP so it could handle 
> pending SMI
>before handling SIPI (so behavior would follow SDM).
> 
> 
>> See again my message linked above -- just after the quoted sentence, I
>> wrote, "IOW, if we could excise steps 07b, 08a, 08b".
>>
>> But, I obviously defer to Paolo and Igor on that.
>>
>> (I do believe we have a dilemma here. In QEMU, we probably prefer to
>> emulate physical hardware as faithfully as possible. However, we do not
>> have Cache-As-RAM (nor do we intend to, IIUC). Does that justify other
>> divergences from physical hardware too, such as waking just by virtue of
>> an SMI?)
> So far we should be able to implement it per spec (at least SDM one),
> but we would still need to invent chipset hardware
> i.e. like adding to Q35 non exiting SMRAM and means to map/unmap it
> to non-SMM address space.
> (and I hope we could avoid adding "parked CPU" thingy)

I think we'll need a separate QEMU tree for this. I'm quite in the dark
-- I can't tell if I'll be able to do something in OVMF without actually
trying it. And for that, we'll need some proposed QEMU code that is
testable, but not upstream yet. (As I might realize that I'm unable to
make it work in OVMF.)

>>> Can the SMI monarch use the Local
>>> APIC to send a directed SMI to the hot added CPU?
>>> The SMI monarch needs to know the APIC ID of the
>>> hot added CPU.  Do we also need to handle the case
>>> where multiple CPUs are added at once?  I think we
>>> would need to serialize the use of 3000:8000 for the
>>> SMM rebase operation on each hot added CPU.  
>>
>> I agree this would be a huge help.
> 
> We can serialize it (for normal hotplug flow) from ACPI handler
> in the guest (i.e. non enforced serialization).
> The only reason for serialization I see is not to allow
> a bunch of new CPU trample over default SMBASE save area
> at the same time.

If the default SMBASE area is corrupted due to concurrent access, could
that lead to invalid relocated SMBASE values? Possibly pointing into
normal RAM?
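
One conceivable guard, purely as a sketch (nothing like this has been
proposed concretely yet): have each CPU atomically claim the default SMBASE
area before touching it, and park any CPU that loses the race.

#include <Library/BaseLib.h>
#include <Library/SynchronizationLib.h>

//
// Sketch: a claim flag inside the protected 0x30000+128K region itself
// (the exact address is arbitrary).  0 = free, 1 = owned.
//
#define DEFAULT_SMBASE_CLAIM  ((UINT32 *)(UINTN)0x4FFF0)

STATIC
BOOLEAN
ClaimDefaultSmbaseArea (
  VOID
  )
{
  return (BOOLEAN)(InterlockedCompareExchange32 (DEFAULT_SMBASE_CLAIM, 0, 1) == 0);
}

//
// Relocation handler outline:
//
//   if (!ClaimDefaultSmbaseArea ()) {
//     CpuDeadLoop ();   // or force this CPU into shutdown state
//   }
//   ... perform the SMBASE relocation ...
//   InterlockedCompareExchange32 (DEFAULT_SMBASE_CLAIM, 1, 0);  // release
//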

Thanks
Laszlo



Re: [Qemu-devel] [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

2019-08-27 Thread Igor Mammedov
On Sat, 24 Aug 2019 01:48:09 +
"Yao, Jiewen"  wrote:

> I give my thought.
> Paolo may add more.
Here are some ideas I have on the topic.

> 
> > -Original Message-
> > From: Kinney, Michael D
> > Sent: Friday, August 23, 2019 11:25 PM
> > To: Yao, Jiewen ; Paolo Bonzini
> > ; Laszlo Ersek ;
> > r...@edk2.groups.io; Kinney, Michael D 
> > Cc: Alex Williamson ; de...@edk2.groups.io;
> > qemu devel list ; Igor Mammedov
> > ; Chen, Yingwen ;
> > Nakajima, Jun ; Boris Ostrovsky
> > ; Joao Marcal Lemos Martins
> > ; Phillip Goerl 
> > Subject: RE: [edk2-rfc] [edk2-devel] CPU hotplug using SMM with
> > QEMU+OVMF
> > 
> > Hi Jiewen,
> > 
> > If a hot add CPU needs to run any code before the
> > first SMI, I would recommend it only executes code
> > from a write protected FLASH range without a stack
> > and then wait for the first SMI.  
> [Jiewen] Right.
> 
> Another option from Paolo, the new CPU will not run until 0x7b.
> To mitigate DMA threat, someone needs to guarantee the low memory SIPI vector is 
> DMA protected.
> 
> NOTE: The LOW memory *could* be mapped to write protected FLASH AREA via PAM 
> register. The Host CPU may setup that in SMM.
> If that is the case, we don’t need to worry about DMA.
> 
> I copied the detail step here, because I found it is hard to dig them out 
> again.

*) In light of using dedicated SMRAM at 0x30000 with pre-configured
relocation vector for initial relocation which is not reachable from
non-SMM mode:

> 
> (01a) QEMU: create new CPU.  The CPU already exists, but it does not
>  start running code until unparked by the CPU hotplug controller.
we might not need parked CPU (if we ignore attacker's attempt to send
SMI to several new CPUs, see below for issue it causes)

> (01b) QEMU: trigger SCI
> 
> (02-03) no equivalent
> 
> (04) Host CPU: (OS) execute GPE handler from DSDT
> 
> (05) Host CPU: (OS) Port 0xB2 write, all CPUs enter SMM (NOTE: New CPU
>  will not enter SMM because SMI is disabled)
I think only CPU that does the write will enter SMM
and we might not need to pull in all already initialized CPUs into SMM.

At this step we could also send a directed SMI to a new CPU from host
CPU that entered SMM on write.
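
Mechanically, that directed SMI is just an SMI IPI through the Local APIC;
edk2's LocalApicLib already wraps it. A sketch, leaving open how the monarch
learns the new CPU's APIC ID (QEMU's CPU hotplug MMIO interface, fwcfg, ...):

#include <Library/LocalApicLib.h>

//
// Sketch: from the host ("monarch") CPU, already in SMM after the 0xB2
// write, send a directed SMI to the hot-added CPU.
//
STATIC
VOID
SmiMonarchKickNewCpu (
  IN UINT32  NewCpuApicId
  )
{
  SendSmiIpi (NewCpuApicId);
}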

> (06) Host CPU: (SMM) Save 38000, Update 38000 -- fill simple SMM
>  rebase code.
could skip this step as well (*)


> (07a) Host CPU: (SMM) Write to CPU hotplug controller to enable
>  new CPU
ditto
 
> (07b) Host CPU: (SMM) Send INIT/SIPI/SIPI to new CPU.
we need to wake up new CPU somehow so it would process (09) pending (05) SMI
before jumping to SIPI vector

> (08a) New CPU: (Low RAM) Enter protected mode.
> 
> (08b) New CPU: (Flash) Signals host CPU to proceed and enter cli;hlt loop.

Both of these steps could be changed to just a cli;hlt loop or an INIT reset.
if SMI relocation handler and/or host CPU will pull in the new CPU into OVMF,
we actually don't care about SIPI vector as all firmware initialization
for the new CPU is done in SMM mode (07b triggers 10).
Thus eliminating one attack vector to protect from.

> (09) Host CPU: (SMM) Send SMI to the new CPU only.
could be done at (05)
 
> (10) New CPU: (SMM) Run SMM code at 38000, and rebase SMBASE to
>  TSEG.
it could also pull itself into the other OVMF structures
(assuming it can use TSEG as a stack; that's rather complex) or
just do the relocation and let the host CPU fill in the OVMF structures for the new CPU
(12).

> (11) Host CPU: (SMM) Restore 38000.
could skip this step as well (*)

> (12) Host CPU: (SMM) Update located data structure to add the new CPU
>  information. (This step will involve CPU_SERVICE protocol)
> 
> (13) New CPU: (Flash) do whatever other initialization is needed
do we actually need it?

> (14) New CPU: (Flash) Deadloop, and wait for INIT-SIPI-SIPI.
> 
> (15) Host CPU: (OS) Send INIT-SIPI-SIPI to pull new CPU in..
> 
> 
> > 
> > For this OVMF use case, is any CPU init required
> > before the first SMI?  
> [Jiewen] I am not sure what the detail action in 08b is.
> And I am not sure what your "init" means here?
> Personally, I don’t think we need too much init work, such as Microcode or 
> MTRR.
> But we need detail info.
Wouldn't it be preferable to do it in SMM mode?

> > From Paolo's list of steps are steps (8a) and (8b)
> > really required?  Can the SMI monarch use the Local
> > APIC to send a directed SMI to the hot added CPU?
> > The SMI monarch needs to know the APIC ID of the
> > hot added CPU.
> [Jiewen] I think it depend upon virtual hardware design.
> Leave question to Paolo.

it's not really needed as described in (8x), it could be just
cli;hlt loop so that our SIPI co

Re: [Qemu-devel] [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

2019-08-27 Thread Igor Mammedov
On Mon, 26 Aug 2019 17:30:43 +0200
Laszlo Ersek  wrote:

> On 08/23/19 17:25, Kinney, Michael D wrote:
> > Hi Jiewen,
> > 
> > If a hot add CPU needs to run any code before the
> > first SMI, I would recommend it only executes code
> > from a write protected FLASH range without a stack
> > and then wait for the first SMI.  
> 
> "without a stack" looks very risky to me. Even if we manage to implement
> the guest code initially, we'll be trapped without a stack, should we
> ever need to add more complex stuff there.

Do we need anything complex in the relocation handler, though?
From what I'd imagine, the minimum handler should (see the sketch below):
  1: get the address of TSEG, possibly reading it from the chipset
  2: calculate its new SMBASE offset based on its APIC ID
  3: save the new SMBASE
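
As a rough sketch of the arithmetic behind steps 1-3 above (the TSEG base and
the 0x2000 per-CPU stride are just assumed example values, not anything agreed
in this thread; 0x30000 and 7EF8h are the SDM defaults quoted later in the
thread, and the real handler would of course be a few lines of real-mode
assembly running in SMM):

#include <stdint.h>
#include <stdio.h>

#define DEFAULT_SMBASE      0x30000u   /* architectural default (SDM) */
#define SMBASE_FIELD_OFFSET 0x7EF8u    /* SMBASE field in the save state map (SDM) */
#define SMBASE_TILE_SIZE    0x2000u    /* assumed per-CPU SMRAM stride */

static uint32_t new_smbase(uint32_t tseg_base, uint32_t apic_id)
{
    /* step 2: derive this CPU's SMBASE from its APIC ID */
    return tseg_base + apic_id * SMBASE_TILE_SIZE;
}

int main(void)
{
    uint32_t tseg_base = 0x7f000000u;  /* step 1: would really be read from the chipset */

    for (uint32_t apic_id = 0; apic_id < 4; apic_id++) {
        /* step 3: the real handler stores this value at
           DEFAULT_SMBASE + SMBASE_FIELD_OFFSET and executes RSM;
           the next SMI then arrives at the new SMBASE + 0x8000 */
        printf("APIC ID %u -> SMBASE 0x%08x\n",
               (unsigned)apic_id, (unsigned)new_smbase(tseg_base, apic_id));
    }
    return 0;
}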

> > For this OVMF use case, is any CPU init required
> > before the first SMI?  
> 
> I expressed a preference for that too: "I wish we could simply wake the
> new CPU [...] with an SMI".
> 
> http://mid.mail-archive.com/398b3327-0820-95af-a34d-1a4a1d50cf35@redhat.com
> 
> 
> > From Paolo's list of steps are steps (8a) and (8b) 
> > really required?  

07b - implies 08b
   8b could be a trivial hlt loop, and we most likely could skip 08a and the
   signaling-host-CPU step, but we need the INIT/SIPI/SIPI sequence to wake up
   the AP so it could handle the pending SMI before handling SIPI (so behavior
   would follow the SDM).


> See again my message linked above -- just after the quoted sentence, I
> wrote, "IOW, if we could excise steps 07b, 08a, 08b".
> 
> But, I obviously defer to Paolo and Igor on that.
> 
> (I do believe we have a dilemma here. In QEMU, we probably prefer to
> emulate physical hardware as faithfully as possible. However, we do not
> have Cache-As-RAM (nor do we intend to, IIUC). Does that justify other
> divergences from physical hardware too, such as waking just by virtue of
> an SMI?)
So far we should be able to implement it per spec (at least the SDM one),
but we would still need to invent chipset hardware,
i.e. add to Q35 a non-exiting SMRAM and a means to map/unmap it
to the non-SMM address space.
(and I hope we could avoid adding "parked CPU" thingy)
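
As a very rough sketch of what such map/unmap plumbing could look like on the
QEMU side, using the generic memory API (the region name, the 128K size, the
0x30000 address and both helpers are illustrative assumptions, and this only
compiles inside the QEMU tree; it is not the patch mentioned above):

#include "qemu/osdep.h"
#include "qemu/units.h"
#include "exec/memory.h"

static MemoryRegion smbase_window;

/* Expose an alias of the dedicated SMRAM at 0x30000 in the normal (non-SMM)
 * address space; "smram" is assumed to be the RAM region backing the SMRAM
 * and "system_memory" the machine's system memory region. */
static void smbase_window_init(Object *owner, MemoryRegion *system_memory,
                               MemoryRegion *smram)
{
    memory_region_init_alias(&smbase_window, owner, "smbase-window",
                             smram, 0, 128 * KiB);
    memory_region_add_subregion_overlap(system_memory, 0x30000,
                                        &smbase_window, 1);
    memory_region_set_enabled(&smbase_window, false);   /* closed by default */
}

/* A chipset/fw_cfg register handler would call this; a lock bit would simply
 * refuse any further calls once set. */
static void smbase_window_set_open(bool open)
{
    memory_region_set_enabled(&smbase_window, open);
}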
 
> > Can the SMI monarch use the Local
> > APIC to send a directed SMI to the hot added CPU?
> > The SMI monarch needs to know the APIC ID of the
> > hot added CPU.  Do we also need to handle the case
> > where multiple CPUs are added at once?  I think we
> > would need to serialize the use of 3000:8000 for the
> > SMM rebase operation on each hot added CPU.  
> 
> I agree this would be a huge help.

We can serialize it (for normal hotplug flow) from ACPI handler
in the guest (i.e. non enforced serialization).
The only reason for serialization I see is not to allow
a bunch of new CPUs to trample over the default SMBASE save area
at the same time.

There is a consideration though: an OS-level attacker
could send broadcast SMI and INIT-SIPI-SIPI sequences
to trigger the race, but I don't see it as a threat, since
the attacker shouldn't be able to exploit anything and in the
worst case the guest OS would crash (taking into account that
SMIs are privileged, an OS attacker has plenty of other
means to kill itself).

> > It would be simpler if we can guarantee that only
> > one CPU can be added or removed at a time and the 
> > complete flow of adding a CPU to SMM and the OS
> > needs to be completed before another add/remove
> > event needs to be processed.  
> 
> I don't know if the QEMU monitor command in question can guarantee this
> serialization. I think such a request/response pattern is generally
> implementable between QEMU and guest code.
> 
> But, AIUI, the "device-add" monitor command is quite generic, and used
> for hot-plugging a number of other (non-CPU) device models. I'm unsure
> if the pattern in question can be squeezed into "device-add". (It's not
> a dedicated command for CPU hotplug.)
> 
> ... Apologies that I didn't add much information to the thread, just
> now. I'd like to keep the discussion going.
> 
> Thanks
> Laszlo




Re: [Qemu-devel] [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

2019-08-26 Thread Laszlo Ersek
On 08/23/19 17:25, Kinney, Michael D wrote:
> Hi Jiewen,
> 
> If a hot add CPU needs to run any code before the
> first SMI, I would recommend it only executes code
> from a write protected FLASH range without a stack
> and then wait for the first SMI.

"without a stack" looks very risky to me. Even if we manage to implement
the guest code initially, we'll be trapped without a stack, should we
ever need to add more complex stuff there.


> For this OVMF use case, is any CPU init required
> before the first SMI?

I expressed a preference for that too: "I wish we could simply wake the
new CPU [...] with an SMI".

http://mid.mail-archive.com/398b3327-0820-95af-a34d-1a4a1d50cf35@redhat.com


> From Paolo's list of steps are steps (8a) and (8b) 
> really required?

See again my message linked above -- just after the quoted sentence, I
wrote, "IOW, if we could excise steps 07b, 08a, 08b".

But, I obviously defer to Paolo and Igor on that.

(I do believe we have a dilemma here. In QEMU, we probably prefer to
emulate physical hardware as faithfully as possible. However, we do not
have Cache-As-RAM (nor do we intend to, IIUC). Does that justify other
divergences from physical hardware too, such as waking just by virtue of
an SMI?)


> Can the SMI monarch use the Local
> APIC to send a directed SMI to the hot added CPU?
> The SMI monarch needs to know the APIC ID of the
> hot added CPU.  Do we also need to handle the case
> where multiple CPUs are added at once?  I think we
> would need to serialize the use of 3000:8000 for the
> SMM rebase operation on each hot added CPU.

I agree this would be a huge help.


> It would be simpler if we can guarantee that only
> one CPU can be added or removed at a time and the 
> complete flow of adding a CPU to SMM and the OS
> needs to be completed before another add/remove
> event needs to be processed.

I don't know if the QEMU monitor command in question can guarantee this
serialization. I think such a request/response pattern is generally
implementable between QEMU and guest code.

But, AIUI, the "device-add" monitor command is quite generic, and used
for hot-plugging a number of other (non-CPU) device models. I'm unsure
if the pattern in question can be squeezed into "device-add". (It's not
a dedicated command for CPU hotplug.)

... Apologies that I didn't add much information to the thread, just
now. I'd like to keep the discussion going.

Thanks
Laszlo



Re: [Qemu-devel] [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

2019-08-23 Thread Yao, Jiewen
I give my thought.
Paolo may add more.

> -Original Message-
> From: Kinney, Michael D
> Sent: Friday, August 23, 2019 11:25 PM
> To: Yao, Jiewen ; Paolo Bonzini
> ; Laszlo Ersek ;
> r...@edk2.groups.io; Kinney, Michael D 
> Cc: Alex Williamson ; de...@edk2.groups.io;
> qemu devel list ; Igor Mammedov
> ; Chen, Yingwen ;
> Nakajima, Jun ; Boris Ostrovsky
> ; Joao Marcal Lemos Martins
> ; Phillip Goerl 
> Subject: RE: [edk2-rfc] [edk2-devel] CPU hotplug using SMM with
> QEMU+OVMF
> 
> Hi Jiewen,
> 
> If a hot add CPU needs to run any code before the
> first SMI, I would recommend it only executes code
> from a write protected FLASH range without a stack
> and then wait for the first SMI.
[Jiewen] Right.

Another option from Paolo, the new CPU will not run until 0x7b.
To mitigate DMA threat, someone need guarantee the low memory SIPI vector is 
DMA protected.

NOTE: The LOW memory *could* be mapped to write protected FLASH AREA via PAM 
register. The Host CPU may setup that in SMM.
If that is the case, we don’t need worry DMA.

I copied the detail step here, because I found it is hard to dig them out again.

(01a) QEMU: create new CPU.  The CPU already exists, but it does not
 start running code until unparked by the CPU hotplug controller.

(01b) QEMU: trigger SCI

(02-03) no equivalent

(04) Host CPU: (OS) execute GPE handler from DSDT

(05) Host CPU: (OS) Port 0xB2 write, all CPUs enter SMM (NOTE: New CPU
 will not enter SMM because SMI is disabled)

(06) Host CPU: (SMM) Save 38000, Update 38000 -- fill simple SMM
 rebase code.

(07a) Host CPU: (SMM) Write to CPU hotplug controller to enable
 new CPU

(07b) Host CPU: (SMM) Send INIT/SIPI/SIPI to new CPU.

(08a) New CPU: (Low RAM) Enter protected mode.

(08b) New CPU: (Flash) Signals host CPU to proceed and enter cli;hlt loop.

(09) Host CPU: (SMM) Send SMI to the new CPU only.

(10) New CPU: (SMM) Run SMM code at 38000, and rebase SMBASE to
 TSEG.

(11) Host CPU: (SMM) Restore 38000.

(12) Host CPU: (SMM) Update located data structure to add the new CPU
 information. (This step will involve CPU_SERVICE protocol)

(13) New CPU: (Flash) do whatever other initialization is needed

(14) New CPU: (Flash) Deadloop, and wait for INIT-SIPI-SIPI.

(15) Host CPU: (OS) Send INIT-SIPI-SIPI to pull new CPU in..


> 
> For this OVMF use case, is any CPU init required
> before the first SMI?
[Jiewen] I am not sure what the detail action in 08b is.
And I am not sure what your "init" means here?
Personally, I don’t think we need too much init work, such as Microcode or MTRR.
But we need detail info.



> From Paolo's list of steps are steps (8a) and (8b)
> really required?  Can the SMI monarch use the Local
> APIC to send a directed SMI to the hot added CPU?
> The SMI monarch needs to know the APIC ID of the
> hot added CPU.  
[Jiewen] I think it depend upon virtual hardware design.
Leave question to Paolo.



> Do we also need to handle the case
> where multiple CPUs are added at once?  I think we
> would need to serialize the use of 3000:8000 for the
> SMM rebase operation on each hot added CPU.
> It would be simpler if we can guarantee that only
> one CPU can be added or removed at a time and the
> complete flow of adding a CPU to SMM and the OS
> needs to be completed before another add/remove
> event needs to be processed.
[Jiewen] Right.
I treat the multiple CPU hot-add at same time as a potential threat.
We don’t want to trust end user.
The solution could be:
1) Let trusted hardware guarantee hot-add one by one.
2) Let trusted software (SMM and init code) guarantee SMREBASE one by one 
(include any code runs before SMREBASE)
3) Let trusted software (SMM and init code) support SMREBASE simultaneously 
(include any code runs before SMREBASE).
Solution #1 or #2 are simple solution.


> Mike
> 
> > -Original Message-
> > From: Yao, Jiewen
> > Sent: Thursday, August 22, 2019 10:00 PM
> > To: Kinney, Michael D ;
> > Paolo Bonzini ; Laszlo Ersek
> > ; r...@edk2.groups.io
> > Cc: Alex Williamson ;
> > de...@edk2.groups.io; qemu devel list  > de...@nongnu.org>; Igor Mammedov ;
> > Chen, Yingwen ; Nakajima, Jun
> > ; Boris Ostrovsky
> > ; Joao Marcal Lemos Martins
> > ; Phillip Goerl
> > 
> > Subject: RE: [edk2-rfc] [edk2-devel] CPU hotplug using
> > SMM with QEMU+OVMF
> >
> > Thank you Mike!
> >
> > That is good reference on the real hardware behavior.
> > (Glad it is public.)
> >
> > For threat model, the unique part in virtual environment
> > is temp RAM.
> > The temp RAM in real platform is per CPU cache, while
> > the temp RAM in virtual platform is global memory.
> > That brings one more potential attack surfa

Re: [Qemu-devel] [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

2019-08-23 Thread Kinney, Michael D
Hi Jiewen,

If a hot add CPU needs to run any code before the
first SMI, I would recommend it only executes code
from a write protected FLASH range without a stack
and then wait for the first SMI.

For this OVMF use case, is any CPU init required
before the first SMI?

From Paolo's list of steps are steps (8a) and (8b) 
really required?  Can the SMI monarch use the Local
APIC to send a directed SMI to the hot added CPU?
The SMI monarch needs to know the APIC ID of the
hot added CPU.  Do we also need to handle the case
where multiple CPUs are added at once?  I think we
would need to serialize the use of 3000:8000 for the
SMM rebase operation on each hot added CPU.

It would be simpler if we can guarantee that only
one CPU can be added or removed at a time and the 
complete flow of adding a CPU to SMM and the OS
needs to be completed before another add/remove
event needs to be processed.

Mike

> -Original Message-
> From: Yao, Jiewen
> Sent: Thursday, August 22, 2019 10:00 PM
> To: Kinney, Michael D ;
> Paolo Bonzini ; Laszlo Ersek
> ; r...@edk2.groups.io
> Cc: Alex Williamson ;
> de...@edk2.groups.io; qemu devel list  de...@nongnu.org>; Igor Mammedov ;
> Chen, Yingwen ; Nakajima, Jun
> ; Boris Ostrovsky
> ; Joao Marcal Lemos Martins
> ; Phillip Goerl
> 
> Subject: RE: [edk2-rfc] [edk2-devel] CPU hotplug using
> SMM with QEMU+OVMF
> 
> Thank you Mike!
> 
> That is good reference on the real hardware behavior.
> (Glad it is public.)
> 
> For threat model, the unique part in virtual environment
> is temp RAM.
> The temp RAM in real platform is per CPU cache, while
> the temp RAM in virtual platform is global memory.
> That brings one more potential attack surface in virtual
> environment, if hot-added CPU need run code with stack
> or heap before SMI rebase.
> 
> Other threats, such as SMRAM or DMA, are same.
> 
> Thank you
> Yao Jiewen
> 
> 
> > -Original Message-
> > From: Kinney, Michael D
> > Sent: Friday, August 23, 2019 9:03 AM
> > To: Paolo Bonzini ; Laszlo Ersek
> > ; r...@edk2.groups.io; Yao, Jiewen
> > ; Kinney, Michael D
> 
> > Cc: Alex Williamson ;
> > de...@edk2.groups.io; qemu devel list  de...@nongnu.org>; Igor
> > Mammedov ; Chen, Yingwen
> > ; Nakajima, Jun
> ;
> > Boris Ostrovsky ; Joao
> Marcal Lemos
> > Martins ; Phillip Goerl
> > 
> > Subject: RE: [edk2-rfc] [edk2-devel] CPU hotplug using
> SMM with
> > QEMU+OVMF
> >
> > Paolo,
> >
> > I find the following links related to the discussions
> here along with
> > one example feature called GENPROTRANGE.
> >
> > https://csrc.nist.gov/CSRC/media/Presentations/The-
> Whole-is-Greater/im
> > a ges-media/day1_trusted-computing_200-250.pdf
> > https://cansecwest.com/slides/2017/CSW2017_Cuauhtemoc-
> Rene_CPU_Ho
> > t-Add_flow.pdf
> > https://www.mouser.com/ds/2/612/5520-5500-chipset-ioh-
> datasheet-1131
> > 292.pdf
> >
> > Best regards,
> >
> > Mike
> >
> > > -Original Message-
> > > From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> > > Sent: Thursday, August 22, 2019 4:12 PM
> > > To: Kinney, Michael D ;
> Laszlo Ersek
> > > ; r...@edk2.groups.io; Yao, Jiewen
> > > 
> > > Cc: Alex Williamson ;
> > > de...@edk2.groups.io; qemu devel list  de...@nongnu.org>; Igor
> > > Mammedov ; Chen, Yingwen
> > > ; Nakajima, Jun
> ;
> > > Boris Ostrovsky ; Joao
> Marcal Lemos
> > > Martins ; Phillip Goerl
> > > 
> > > Subject: Re: [edk2-rfc] [edk2-devel] CPU hotplug
> using SMM with
> > > QEMU+OVMF
> > >
> > > On 23/08/19 00:32, Kinney, Michael D wrote:
> > > > Paolo,
> > > >
> > > > It is my understanding that real HW hot plug uses
> the
> > > SDM defined
> > > > methods.  Meaning the initial SMI is to 3000:8000
> and
> > > they rebase to
> > > > TSEG in the first SMI.  They must have chipset
> specific
> > > methods to
> > > > protect 3000:8000 from DMA.
> > >
> > > It would be great if you could check.
> > >
> > > > Can we add a chipset feature to prevent DMA to
> 64KB
> > > range from
> > > 0x30000-0x3FFFF and the UEFI Memory Map and ACPI
> > > content can be
> > > > updated so the Guest OS knows to not use that
> range for
> > > DMA?
> > >
> > > If real hardware does it at the chipset level, we
> will probably use
> > > Igor's suggestion of aliasing A-seg to 3000:0000.
> Before starting
> > > the new CPU, the SMI handler can prepare the SMBASE
> relocation
> > > trampoline at
> > > A000:8000 and the hot-plugged CPU will find it at
> > > 3000:8000 when it receives the initial SMI.  Because
> this is backed
> > > by RAM at 0xA0000-0xAFFFF, DMA cannot access it and
> would still go
> > > through to RAM at 0x30000.
> > >
> > > Paolo


Re: [Qemu-devel] [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

2019-08-23 Thread Laszlo Ersek
On 08/22/19 20:51, Paolo Bonzini wrote:
> On 22/08/19 20:29, Laszlo Ersek wrote:
>> On 08/22/19 08:18, Paolo Bonzini wrote:
>>> On 21/08/19 22:17, Kinney, Michael D wrote:
 DMA protection of memory ranges is a chipset feature. For the current
 QEMU implementation, what ranges of memory are guaranteed to be
 protected from DMA?  Is it only A/B seg and TSEG?
>>>
>>> Yes.
>>
>> This thread (esp. Jiewen's and Mike's messages) are the first time that
>> I've heard about the *existence* of such RAM ranges / the chipset
>> feature. :)
>>
>> Out of interest (independently of virtualization), how is a general
>> purpose OS informed by the firmware, "never try to set up DMA to this
>> RAM area"? Is this communicated through ACPI _CRS perhaps?
>>
>> ... Ah, almost: ACPI 6.2 specifies _DMA, in "6.2.4 _DMA (Direct Memory
>> Access)". It writes,
>>
>> For example, if a platform implements a PCI bus that cannot access
>> all of physical memory, it has a _DMA object under that PCI bus that
>> describes the ranges of physical memory that can be accessed by
>> devices on that bus.
>>
>> Sorry about the digression, and also about being late to this thread,
>> continually -- I'm primarily following and learning.
> 
> It's much simpler: these ranges are not in e820, for example
> 
> kernel: BIOS-e820: [mem 0x00059000-0x0008bfff] usable
> kernel: BIOS-e820: [mem 0x0008c000-0x000f] reserved

(1) Sorry, my _DMA quote was a detour from QEMU -- I wondered how a
physical machine with actual RAM at 0x30000, and also chipset level
protection against DMA to/from that RAM range, would expose the fact to
the OS (so that the OS not innocently try to set up DMA there).

(2) In case of QEMU+OVMF, "e820" is not defined at the firmware level.

While
- QEMU exports an "e820 map" (and OVMF does utilize that),
- and Linux parses the UEFI memmap into an "e820 map" (so that dependent
logic only need to deal with e820),

in edk2 the concepts are "GCD memory space map" and "UEFI memmap".

So what OVMF does is, it reserves the TSEG area in the UEFI memmap:

  https://github.com/tianocore/edk2/commit/b09c1c6f2569a

(This was later de-constified for the extended TSEG size, in commit
23bfb5c0aab6, "OvmfPkg/PlatformPei: prepare for PcdQ35TsegMbytes
becoming dynamic", 2017-07-05).

This is just to say that with OVMF, TSEG is not absent from the UEFI
memmap, it is reserved instead. (Apologies if I misunderstood and you
didn't actually claim otherwise.)


> The ranges are not special-cased in any way by QEMU.  Simply, AB-segs
> and TSEG RAM are not part of the address space except when in SMM.

(or when TSEG is not locked, and open; but:) yes, this matches my
understanding.

> Therefore, DMA to those ranges ends up respectively to low VGA RAM[1]
> and to the bit bucket.  When AB-segs are open, for example, DMA to that
> area becomes possible.

Which seems to imply that, if we alias 0x3 to the AB-segs, and rely
on the AB-segs for CPU hotplug, OVMF should close and lock down the
AB-segs at first boot. Correct? (Because OVMF doesn't do anything about
AB at the moment.)

Thanks
Laszlo

> 
> Paolo
> 
> [1] old timers may remember DEF SEG=: BLOAD "foo.img",0.  It still
> works with some disk device models.
> 




Re: [Qemu-devel] [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

2019-08-22 Thread Yao, Jiewen
Thank you Mike!

That is good reference on the real hardware behavior. (Glad it is public.)

For threat model, the unique part in virtual environment is temp RAM.
The temp RAM in real platform is per CPU cache, while the temp RAM in virtual 
platform is global memory.
That brings one more potential attack surface in virtual environment, if 
hot-added CPU need run code with stack or heap before SMI rebase.

Other threats, such as SMRAM or DMA, are same.

Thank you
Yao Jiewen


> -Original Message-
> From: Kinney, Michael D
> Sent: Friday, August 23, 2019 9:03 AM
> To: Paolo Bonzini ; Laszlo Ersek
> ; r...@edk2.groups.io; Yao, Jiewen
> ; Kinney, Michael D 
> Cc: Alex Williamson ; de...@edk2.groups.io;
> qemu devel list ; Igor Mammedov
> ; Chen, Yingwen ;
> Nakajima, Jun ; Boris Ostrovsky
> ; Joao Marcal Lemos Martins
> ; Phillip Goerl 
> Subject: RE: [edk2-rfc] [edk2-devel] CPU hotplug using SMM with
> QEMU+OVMF
> 
> Paolo,
> 
> I find the following links related to the discussions here
> along with one example feature called GENPROTRANGE.
> 
> https://csrc.nist.gov/CSRC/media/Presentations/The-Whole-is-Greater/ima
> ges-media/day1_trusted-computing_200-250.pdf
> https://cansecwest.com/slides/2017/CSW2017_Cuauhtemoc-Rene_CPU_Ho
> t-Add_flow.pdf
> https://www.mouser.com/ds/2/612/5520-5500-chipset-ioh-datasheet-1131
> 292.pdf
> 
> Best regards,
> 
> Mike
> 
> > -Original Message-
> > From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> > Sent: Thursday, August 22, 2019 4:12 PM
> > To: Kinney, Michael D ;
> > Laszlo Ersek ; r...@edk2.groups.io;
> > Yao, Jiewen 
> > Cc: Alex Williamson ;
> > de...@edk2.groups.io; qemu devel list  > de...@nongnu.org>; Igor Mammedov ;
> > Chen, Yingwen ; Nakajima, Jun
> > ; Boris Ostrovsky
> > ; Joao Marcal Lemos Martins
> > ; Phillip Goerl
> > 
> > Subject: Re: [edk2-rfc] [edk2-devel] CPU hotplug using
> > SMM with QEMU+OVMF
> >
> > On 23/08/19 00:32, Kinney, Michael D wrote:
> > > Paolo,
> > >
> > > It is my understanding that real HW hot plug uses the
> > SDM defined
> > > methods.  Meaning the initial SMI is to 3000:8000 and
> > they rebase to
> > > TSEG in the first SMI.  They must have chipset specific
> > methods to
> > > protect 3000:8000 from DMA.
> >
> > It would be great if you could check.
> >
> > > Can we add a chipset feature to prevent DMA to 64KB
> > range from
> > > 0x30000-0x3FFFF and the UEFI Memory Map and ACPI
> > content can be
> > > updated so the Guest OS knows to not use that range for
> > DMA?
> >
> > If real hardware does it at the chipset level, we will
> > probably use Igor's suggestion of aliasing A-seg to
> > 3000:0000.  Before starting the new CPU, the SMI handler
> > can prepare the SMBASE relocation trampoline at
> > A000:8000 and the hot-plugged CPU will find it at
> > 3000:8000 when it receives the initial SMI.  Because this
> > is backed by RAM at 0xA0000-0xAFFFF, DMA cannot access it
> > and would still go through to RAM at 0x30000.
> >
> > Paolo


Re: [Qemu-devel] [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

2019-08-22 Thread Kinney, Michael D
Paolo,

I find the following links related to the discussions here
along with one example feature called GENPROTRANGE.

https://csrc.nist.gov/CSRC/media/Presentations/The-Whole-is-Greater/images-media/day1_trusted-computing_200-250.pdf
https://cansecwest.com/slides/2017/CSW2017_Cuauhtemoc-Rene_CPU_Hot-Add_flow.pdf
https://www.mouser.com/ds/2/612/5520-5500-chipset-ioh-datasheet-1131292.pdf

Best regards,

Mike

> -Original Message-
> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> Sent: Thursday, August 22, 2019 4:12 PM
> To: Kinney, Michael D ;
> Laszlo Ersek ; r...@edk2.groups.io;
> Yao, Jiewen 
> Cc: Alex Williamson ;
> de...@edk2.groups.io; qemu devel list  de...@nongnu.org>; Igor Mammedov ;
> Chen, Yingwen ; Nakajima, Jun
> ; Boris Ostrovsky
> ; Joao Marcal Lemos Martins
> ; Phillip Goerl
> 
> Subject: Re: [edk2-rfc] [edk2-devel] CPU hotplug using
> SMM with QEMU+OVMF
> 
> On 23/08/19 00:32, Kinney, Michael D wrote:
> > Paolo,
> >
> > It is my understanding that real HW hot plug uses the
> SDM defined
> > methods.  Meaning the initial SMI is to 3000:8000 and
> they rebase to
> > TSEG in the first SMI.  They must have chipset specific
> methods to
> > protect 3000:8000 from DMA.
> 
> It would be great if you could check.
> 
> > Can we add a chipset feature to prevent DMA to 64KB
> range from
> > 0x30000-0x3FFFF and the UEFI Memory Map and ACPI
> content can be
> > updated so the Guest OS knows to not use that range for
> DMA?
> 
> If real hardware does it at the chipset level, we will
> probably use Igor's suggestion of aliasing A-seg to
> 3000:0000.  Before starting the new CPU, the SMI handler
> can prepare the SMBASE relocation trampoline at
> A000:8000 and the hot-plugged CPU will find it at
> 3000:8000 when it receives the initial SMI.  Because this
> is backed by RAM at 0xA0000-0xAFFFF, DMA cannot access it
> and would still go through to RAM at 0x30000.
> 
> Paolo


Re: [Qemu-devel] [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

2019-08-22 Thread Paolo Bonzini
On 23/08/19 00:32, Kinney, Michael D wrote:
> Paolo,
> 
> It is my understanding that real HW hot plug uses the SDM defined
> methods.  Meaning the initial SMI is to 3000:8000 and they rebase
> to TSEG in the first SMI.  They must have chipset specific methods
> to protect 3000:8000 from DMA.

It would be great if you could check.

> Can we add a chipset feature to prevent DMA to 64KB range from
> 0x30000-0x3FFFF and the UEFI Memory Map and ACPI content can be
> updated so the Guest OS knows to not use that range for DMA?

If real hardware does it at the chipset level, we will probably use
Igor's suggestion of aliasing A-seg to 3000:0000.  Before starting the
new CPU, the SMI handler can prepare the SMBASE relocation trampoline at
A000:8000 and the hot-plugged CPU will find it at 3000:8000 when it
receives the initial SMI.  Because this is backed by RAM at
0xA0000-0xAFFFF, DMA cannot access it and would still go through to RAM
at 0x30000.
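
A minimal sketch of the copy the SMI handler would do under that scheme
(SMM-context C rather than the real firmware code; the trampoline contents
themselves are out of scope here):

#include <stddef.h>
#include <stdint.h>

/* Stage the SMBASE relocation trampoline in A-seg RAM at A000:8000. Because
 * the A-seg is aliased over 0x30000, the hot-added CPU's initial SMI entry
 * point 3000:8000 executes this copy, while DMA aimed at 0x38000 goes to the
 * underlying RAM and never reaches it. */
static void stage_relocation_trampoline(const uint8_t *trampoline, size_t len)
{
    volatile uint8_t *aseg = (volatile uint8_t *)(uintptr_t)0xA8000;

    for (size_t i = 0; i < len; i++) {
        aseg[i] = trampoline[i];
    }
}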

Paolo



Re: [Qemu-devel] [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

2019-08-22 Thread Kinney, Michael D
Paolo,

It is my understanding that real HW hot plug uses the SDM defined
methods.  Meaning the initial SMI is to 3000:8000 and they rebase
to TSEG in the first SMI.  They must have chipset specific methods
to protect 3000:8000 from DMA.

Can we add a chipset feature to prevent DMA to 64KB range from
0x30000-0x3FFFF and the UEFI Memory Map and ACPI content can be
updated so the Guest OS knows to not use that range for DMA?

Thanks,

Mike

> -Original Message-
> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> Sent: Thursday, August 22, 2019 3:18 PM
> To: Kinney, Michael D ;
> Laszlo Ersek ; r...@edk2.groups.io;
> Yao, Jiewen 
> Cc: Alex Williamson ;
> de...@edk2.groups.io; qemu devel list  de...@nongnu.org>; Igor Mammedov ;
> Chen, Yingwen ; Nakajima, Jun
> ; Boris Ostrovsky
> ; Joao Marcal Lemos Martins
> ; Phillip Goerl
> 
> Subject: Re: [edk2-rfc] [edk2-devel] CPU hotplug using
> SMM with QEMU+OVMF
> 
> On 22/08/19 22:06, Kinney, Michael D wrote:
> > The SMBASE register is internal and cannot be directly
> accessed by any
> > CPU.  There is an SMBASE field that is member of the
> SMM Save State
> > area and can only be modified from SMM and requires the
> execution of
> > an RSM instruction from SMM for the SMBASE register to
> be updated from
> > the current SMBASE field value.  The new SMBASE
> register value is only
> > used on the next SMI.
> 
> Actually there is also an SMBASE MSR, even though in
> current silicon it's read-only and its use is
> theoretically limited to SMM-transfer monitors.  If that
> MSR could be made accessible somehow outside SMM, that
> would be great.
> 
> > Once all the CPUs have been initialized for SMM, the
> CPUs that are not
> > needed can be hot removed.  As noted above, the SMBASE
> value does not
> > change on an INIT.  So as long as the hot add operation
> does not do a
> > RESET, the SMBASE value must be preserved.
> 
> > IIRC, hot-remove + hot-add will unplug/plug a
> completely different CPU.
> 
> > Another idea is to emulate this behavior.  If the hot
> plug controller
> > provide registers (only accessible from SMM) to assign
> the SMBASE
> > address for every CPU.  When a CPU is hot added, QEMU
> can set the
> > internal SMBASE register value from the hot plug
> controller register
> > value.  If the SMM Monarch sends an INIT or an SMI from
> the Local APIC
> > to the hot added CPU, then the SMBASE register should
> not be modified
> > and the CPU starts execution within TSEG the first time
> it receives an SMI.
> 
> Yes, this would work.  But again---if the issue is real
> on current hardware too, I'd rather have a matching
> solution for virtual platforms.
> 
> If the current hardware for example remembers INIT-
> preserved across hot-remove/hot-add, we could emulate
> that.
> 
> I guess the fundamental question is: how do bare metal
> platforms avoid this issue, or plan to avoid this issue?
> Once we know that, we can use that information to find a
> way to implement it in KVM.  Only if it is impossible
> we'll have a different strategy that is specific to our
> platform.
> 
> Paolo
> 
> > Jiewen and I can collect specific questions on this
> topic and continue
> > the discussion here.  For example, I do not think there
> is any method
> > other than what I referenced above to program the
> SMBASE register, but
> > I can ask if there are any other methods.



Re: [Qemu-devel] [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

2019-08-22 Thread Paolo Bonzini
On 22/08/19 22:06, Kinney, Michael D wrote:
> The SMBASE register is internal and cannot be directly accessed 
> by any CPU.  There is an SMBASE field that is member of the SMM Save
> State area and can only be modified from SMM and requires the
> execution of an RSM instruction from SMM for the SMBASE register to
> be updated from the current SMBASE field value.  The new SMBASE
> register value is only used on the next SMI.

Actually there is also an SMBASE MSR, even though in current silicon
it's read-only and its use is theoretically limited to SMM-transfer
monitors.  If that MSR could be made accessible somehow outside SMM,
that would be great.

> Once all the CPUs have been initialized for SMM, the CPUs that are not needed
> can be hot removed.  As noted above, the SMBASE value does not change on
> an INIT.  So as long as the hot add operation does not do a RESET, the
> SMBASE value must be preserved.

IIRC, hot-remove + hot-add will unplug/plug a completely different CPU.

> Another idea is to emulate this behavior.  If the hot plug controller
> provide registers (only accessible from SMM) to assign the SMBASE address
> for every CPU.  When a CPU is hot added, QEMU can set the internal SMBASE
> register value from the hot plug controller register value.  If the SMM
> Monarch sends an INIT or an SMI from the Local APIC to the hot added CPU,
> then the SMBASE register should not be modified and the CPU starts execution
> within TSEG the first time it receives an SMI.

Yes, this would work.  But again---if the issue is real on current
hardware too, I'd rather have a matching solution for virtual platforms.

If the current hardware for example remembers INIT-preserved across
hot-remove/hot-add, we could emulate that.

I guess the fundamental question is: how do bare metal platforms avoid
this issue, or plan to avoid this issue?  Once we know that, we can use
that information to find a way to implement it in KVM.  Only if it is
impossible we'll have a different strategy that is specific to our platform.

Paolo

> Jiewen and I can collect specific questions on this topic and continue
> the discussion here.  For example, I do not think there is any method
> other than what I referenced above to program the SMBASE register, but
> I can ask if there are any other methods.




Re: [Qemu-devel] [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

2019-08-22 Thread Kinney, Michael D
Laszlo,

I believe all the code for the AP startup vector
is already in edk2.

It is a combination of the reset vector code in
UefiCpuPkg/ResetVector/Vtf0 and an IA32/X64 specific
feature in the GenFv tool.  It sets up a 4KB aligned
location near 4GB which can be used to start an AP
using INIT-SIPI-SIPI.

DI is set to 'AP' if the processor is not the BSP.
This can be used to choose to put the APs into a
wait loop executing from the protected FLASH region.

The SMM Monarch on a hot add event can use the Local
APIC to send an INIT-SIPI-SIPI to wake the AP at the 4KB 
startup vector in FLASH.  Later the SMM Monarch
can then use the Local APIC to send an SMI to pull the
hot added CPU into SMM.  It is not clear if we have to
do both SIPI followed by the SMI or if we can just do
the SMI.
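
For reference, a sketch of how the SMM Monarch could use the Local APIC
(xAPIC MMIO mode assumed) for the INIT-SIPI-SIPI and the directed SMI
described above; whether the SIPIs are needed at all is exactly the open
question here, and a real implementation would also poll the ICR
delivery-status bit between writes:

#include <stdint.h>

#define LAPIC_ICR_LO  0xFEE00300u
#define LAPIC_ICR_HI  0xFEE00310u

#define ICR_SMI       (2u << 8)    /* delivery mode: SMI */
#define ICR_INIT      (5u << 8)    /* delivery mode: INIT */
#define ICR_STARTUP   (6u << 8)    /* delivery mode: Start-Up (SIPI) */
#define ICR_ASSERT    (1u << 14)

static inline void lapic_write(uint32_t reg, uint32_t val)
{
    *(volatile uint32_t *)(uintptr_t)reg = val;
}

static void lapic_send(uint32_t apic_id, uint32_t mode_and_vector)
{
    lapic_write(LAPIC_ICR_HI, apic_id << 24);          /* destination APIC ID */
    lapic_write(LAPIC_ICR_LO, ICR_ASSERT | mode_and_vector);
}

/* Wake the hot-added CPU at the 4KB-aligned startup vector in FLASH, then
 * pull it into SMM with a directed SMI. */
static void wake_and_smi(uint32_t apic_id, uint32_t sipi_vector)
{
    lapic_send(apic_id, ICR_INIT);
    lapic_send(apic_id, ICR_STARTUP | (sipi_vector >> 12));
    lapic_send(apic_id, ICR_STARTUP | (sipi_vector >> 12));
    lapic_send(apic_id, ICR_SMI);   /* or, perhaps, only this */
}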

Best regards,

Mike

> -Original Message-
> From: de...@edk2.groups.io
> [mailto:de...@edk2.groups.io] On Behalf Of Laszlo Ersek
> Sent: Thursday, August 22, 2019 11:29 AM
> To: Paolo Bonzini ; Kinney,
> Michael D ;
> r...@edk2.groups.io; Yao, Jiewen 
> Cc: Alex Williamson ;
> de...@edk2.groups.io; qemu devel list  de...@nongnu.org>; Igor Mammedov ;
> Chen, Yingwen ; Nakajima, Jun
> ; Boris Ostrovsky
> ; Joao Marcal Lemos Martins
> ; Phillip Goerl
> 
> Subject: Re: [edk2-rfc] [edk2-devel] CPU hotplug using
> SMM with QEMU+OVMF
> 
> On 08/22/19 08:18, Paolo Bonzini wrote:
> > On 21/08/19 22:17, Kinney, Michael D wrote:
> >> Paolo,
> >>
> >> It makes sense to match real HW.
> >
> > Note that it'd also be fine to match some kind of
> official Intel
> > specification even if no processor (currently?)
> supports it.
> 
> I agree, because...
> 
> >> That puts us back to the reset vector and handling
> the initial SMI at
> >> 3000:8000.  That is all workable from a FW
> implementation
> >> perspective.
> 
> that would suggest that matching reset vector code
> already exists, and it would "only" need to be
> upstreamed to edk2. :)
> 
> >> It looks like the only issue left is DMA.
> >>
> >> DMA protection of memory ranges is a chipset
> feature. For the current
> >> QEMU implementation, what ranges of memory are
> guaranteed to be
> >> protected from DMA?  Is it only A/B seg and TSEG?
> >
> > Yes.
> 
> (
> 
> This thread (esp. Jiewen's and Mike's messages) are the
> first time that I've heard about the *existence* of
> such RAM ranges / the chipset feature. :)
> 
> Out of interest (independently of virtualization), how
> is a general purpose OS informed by the firmware,
> "never try to set up DMA to this RAM area"? Is this
> communicated through ACPI _CRS perhaps?
> 
> ... Ah, almost: ACPI 6.2 specifies _DMA, in "6.2.4 _DMA
> (Direct Memory Access)". It writes,
> 
> For example, if a platform implements a PCI bus
> that cannot access
> all of physical memory, it has a _DMA object under
> that PCI bus that
> describes the ranges of physical memory that can be
> accessed by
> devices on that bus.
> 
> Sorry about the digression, and also about being late
> to this thread, continually -- I'm primarily following
> and learning.
> 
> )
> 
> Thanks!
> Laszlo
> 



Re: [Qemu-devel] [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

2019-08-22 Thread Kinney, Michael D
Paolo,

The SMBASE register is internal and cannot be directly accessed 
by any CPU.  There is an SMBASE field that is member of the SMM Save
State area and can only be modified from SMM and requires the
execution of an RSM instruction from SMM for the SMBASE register to
be updated from the current SMBASE field value.  The new SMBASE
register value is only used on the next SMI.

https://software.intel.com/sites/default/files/managed/39/c5/325462-sdm-vol-1-2abcd-3abcd.pdf

Vol 3C - Section 34.11

  The default base address for the SMRAM is 30000H. This value is contained in
  an internal processor register called the SMBASE register. The operating
  system or executive can relocate the SMRAM by setting the SMBASE field in the
  saved state map (at offset 7EF8H) to a new value (see Figure 34-4). The RSM
  instruction reloads the internal SMBASE register with the value in the SMBASE
  field each time it exits SMM. All subsequent SMI requests will use the new
  SMBASE value to find the starting address for the SMI handler (at SMBASE +
  8000H) and the SMRAM state save area (from SMBASE + FE00H to SMBASE + FFFFH).
  (The processor resets the value in its internal SMBASE register to 30000H on
  a RESET, but does not change it on an INIT.)
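
To make the quoted layout concrete, a tiny standalone sketch that just prints
where the SMI handler entry, the state save area and the SMBASE field sit for
a given SMBASE value (the relocated value below is an arbitrary example, not
something agreed here):

#include <stdint.h>
#include <stdio.h>

static void show(const char *what, uint32_t smbase)
{
    printf("%-10s SMBASE=0x%08x entry=0x%08x save=0x%08x..0x%08x SMBASE field=0x%08x\n",
           what, smbase,
           smbase + 0x8000u,                    /* SMI handler entry point    */
           smbase + 0xFE00u, smbase + 0xFFFFu,  /* SMRAM state save area      */
           smbase + 0x7EF8u);                   /* SMBASE field in save state */
}

int main(void)
{
    show("default", 0x30000u);      /* initial SMI lands at 3000:8000 */
    show("relocated", 0x7F000000u); /* arbitrary TSEG-based example   */
    return 0;
}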

One idea to work around these issues is to start OVMF with the maximum number
of CPUs.  All the CPUs will be assigned an SMBASE address, at a safe time,
using the initial 3000:8000 SMI vector, because there is a guarantee of no DMA
at that point in the FW init.

Once all the CPUs have been initialized for SMM, the CPUs that are not needed
can be hot removed.  As noted above, the SMBASE value does not change on
an INIT.  So as long as the hot add operation does not do a RESET, the
SMBASE value must be preserved.

Of course, this is not a good idea from a boot performance perspective, 
especially if the max CPUs is a large value.

Another idea is to emulate this behavior.  If the hot plug controller
provide registers (only accessible from SMM) to assign the SMBASE address
for every CPU.  When a CPU is hot added, QEMU can set the internal SMBASE
register value from the hot plug controller register value.  If the SMM
Monarch sends an INIT or an SMI from the Local APIC to the hot added CPU,
then the SMBASE register should not be modified and the CPU starts execution
within TSEG the first time it receives an SMI.
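
A sketch of that emulated behaviour, with the register layout, the helper
names and the SMM-only gating all made up purely to pin the idea down:

#include <stdbool.h>
#include <stdint.h>

#define MAX_CPUS         288
#define DEFAULT_SMBASE   0x30000u

/* Hypothetical hot plug controller state: one SMBASE slot per CPU,
 * writable only while the writing CPU is in SMM. */
static uint32_t hpc_smbase[MAX_CPUS];

static bool hpc_write_smbase(unsigned apic_id, uint32_t val, bool in_smm)
{
    if (!in_smm || apic_id >= MAX_CPUS) {
        return false;                  /* ignore writes from non-SMM code */
    }
    hpc_smbase[apic_id] = val;
    return true;
}

/* On hot add, QEMU would seed the CPU's internal SMBASE register from the
 * controller, so its very first SMI already runs from TSEG. */
static uint32_t initial_smbase_for(unsigned apic_id)
{
    uint32_t v = apic_id < MAX_CPUS ? hpc_smbase[apic_id] : 0;
    return v ? v : DEFAULT_SMBASE;     /* fall back to the architectural default */
}

int main(void)
{
    hpc_write_smbase(4, 0x7F020000u, true);          /* SMM Monarch programs CPU 4 */
    return initial_smbase_for(4) == 0x7F020000u ? 0 : 1;
}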

Jiewen and I can collect specific questions on this topic and continue
the discussion here.  For example, I do not think there is any method
other than what I referenced above to program the SMBASE register, but
I can ask if there are any other methods.

Thanks,

Mike

> -Original Message-
> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> Sent: Thursday, August 22, 2019 11:43 AM
> To: Laszlo Ersek ; Kinney, Michael D
> ; r...@edk2.groups.io; Yao,
> Jiewen 
> Cc: Alex Williamson ;
> de...@edk2.groups.io; qemu devel list  de...@nongnu.org>; Igor Mammedov ;
> Chen, Yingwen ; Nakajima, Jun
> ; Boris Ostrovsky
> ; Joao Marcal Lemos Martins
> ; Phillip Goerl
> 
> Subject: Re: [edk2-rfc] [edk2-devel] CPU hotplug using
> SMM with QEMU+OVMF
> 
> On 22/08/19 19:59, Laszlo Ersek wrote:
> > The firmware and QEMU could agree on a formula, which
> would compute
> > the CPU-specific SMBASE from a value pre-programmed by
> the firmware,
> > and the initial APIC ID of the hot-added CPU.
> >
> > Yes, it would duplicate code -- the calculation --
> between QEMU and
> > edk2. While that's not optimal, it wouldn't be a first.
> 
> No, that would be unmaintainable.  The best solution to
> me seems to be to make SMBASE programmable from non-SMM
> code if some special conditions hold.  Michael, would it
> be possible to get in contact with the Intel architects?
> 
> Paolo



Re: [Qemu-devel] [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

2019-08-22 Thread Paolo Bonzini
On 22/08/19 20:29, Laszlo Ersek wrote:
> On 08/22/19 08:18, Paolo Bonzini wrote:
>> On 21/08/19 22:17, Kinney, Michael D wrote:
>>> DMA protection of memory ranges is a chipset feature. For the current
>>> QEMU implementation, what ranges of memory are guaranteed to be
>>> protected from DMA?  Is it only A/B seg and TSEG?
>>
>> Yes.
> 
> This thread (esp. Jiewen's and Mike's messages) are the first time that
> I've heard about the *existence* of such RAM ranges / the chipset
> feature. :)
> 
> Out of interest (independently of virtualization), how is a general
> purpose OS informed by the firmware, "never try to set up DMA to this
> RAM area"? Is this communicated through ACPI _CRS perhaps?
> 
> ... Ah, almost: ACPI 6.2 specifies _DMA, in "6.2.4 _DMA (Direct Memory
> Access)". It writes,
> 
> For example, if a platform implements a PCI bus that cannot access
> all of physical memory, it has a _DMA object under that PCI bus that
> describes the ranges of physical memory that can be accessed by
> devices on that bus.
> 
> Sorry about the digression, and also about being late to this thread,
> continually -- I'm primarily following and learning.

It's much simpler: these ranges are not in e820, for example

kernel: BIOS-e820: [mem 0x00059000-0x0008bfff] usable
kernel: BIOS-e820: [mem 0x0008c000-0x000f] reserved

The ranges are not special-cased in any way by QEMU.  Simply, AB-segs
and TSEG RAM are not part of the address space except when in SMM.
Therefore, DMA to those ranges ends up respectively to low VGA RAM[1]
and to the bit bucket.  When AB-segs are open, for example, DMA to that
area becomes possible.

Paolo

[1] old timers may remember DEF SEG=: BLOAD "foo.img",0.  It still
works with some disk device models.
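
Coming back to the e820 point above: an OS-side check boils down to something
like the sketch below, where the map is a made-up excerpt in the spirit of the
dump above and the AB-segs/TSEG ranges simply never appear as usable entries:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct e820_entry { uint64_t base, len; int type; };   /* type 1 = usable RAM */

static const struct e820_entry map[] = {
    { 0x00000000, 0x0009c000, 1 },   /* made-up low-RAM entry (ends before A-seg)   */
    { 0x000f0000, 0x00010000, 2 },   /* made-up reserved entry                      */
    { 0x00100000, 0x7ef00000, 1 },   /* made-up high-RAM entry; TSEG not listed     */
};

static bool dma_ok(uint64_t addr)
{
    for (unsigned i = 0; i < sizeof map / sizeof map[0]; i++) {
        if (map[i].type == 1 &&
            addr >= map[i].base && addr < map[i].base + map[i].len) {
            return true;
        }
    }
    return false;
}

int main(void)
{
    printf("0x000b0000 (B-seg)  usable for DMA: %s\n", dma_ok(0x000b0000) ? "yes" : "no");
    printf("0x7f000000 (TSEG)   usable for DMA: %s\n", dma_ok(0x7f000000) ? "yes" : "no");
    printf("0x00100000 (low MB) usable for DMA: %s\n", dma_ok(0x00100000) ? "yes" : "no");
    return 0;
}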



Re: [Qemu-devel] [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

2019-08-22 Thread Paolo Bonzini
On 22/08/19 19:59, Laszlo Ersek wrote:
> The firmware and QEMU could agree on a formula, which would compute the
> CPU-specific SMBASE from a value pre-programmed by the firmware, and the
> initial APIC ID of the hot-added CPU.
> 
> Yes, it would duplicate code -- the calculation -- between QEMU and
> edk2. While that's not optimal, it wouldn't be a first.

No, that would be unmaintainable.  The best solution to me seems to be
to make SMBASE programmable from non-SMM code if some special conditions
hold.  Michael, would it be possible to get in contact with the Intel
architects?

Paolo



Re: [Qemu-devel] [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

2019-08-22 Thread Laszlo Ersek
On 08/22/19 08:18, Paolo Bonzini wrote:
> On 21/08/19 22:17, Kinney, Michael D wrote:
>> Paolo,
>>
>> It makes sense to match real HW.
>
> Note that it'd also be fine to match some kind of official Intel
> specification even if no processor (currently?) supports it.

I agree, because...

>> That puts us back to the reset vector and handling the initial SMI at
>> 3000:8000.  That is all workable from a FW implementation
>> perspective.

that would suggest that matching reset vector code already exists, and
it would "only" need to be upstreamed to edk2. :)

>> It looks like the only issue left is DMA.
>>
>> DMA protection of memory ranges is a chipset feature. For the current
>> QEMU implementation, what ranges of memory are guaranteed to be
>> protected from DMA?  Is it only A/B seg and TSEG?
>
> Yes.

(

This thread (esp. Jiewen's and Mike's messages) are the first time that
I've heard about the *existence* of such RAM ranges / the chipset
feature. :)

Out of interest (independently of virtualization), how is a general
purpose OS informed by the firmware, "never try to set up DMA to this
RAM area"? Is this communicated through ACPI _CRS perhaps?

... Ah, almost: ACPI 6.2 specifies _DMA, in "6.2.4 _DMA (Direct Memory
Access)". It writes,

For example, if a platform implements a PCI bus that cannot access
all of physical memory, it has a _DMA object under that PCI bus that
describes the ranges of physical memory that can be accessed by
devices on that bus.

Sorry about the digression, and also about being late to this thread,
continually -- I'm primarily following and learning.

)

Thanks!
Laszlo



Re: [Qemu-devel] [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

2019-08-22 Thread Laszlo Ersek
On 08/21/19 19:05, Paolo Bonzini wrote:
> On 21/08/19 17:48, Kinney, Michael D wrote:
>> Perhaps there is a way to avoid the 3000:8000 startup
>> vector.
>>
>> If a CPU is added after a cold reset, it is already in a
>> different state because one of the active CPUs needs to
>> release it by interacting with the hot plug controller.
>>
>> Can the SMRR for CPUs in that state be pre-programmed to
>> match the SMRR in the rest of the active CPUs?
>>
>> For OVMF we expect all the active CPUs to use the same
>> SMRR value, so a check can be made to verify that all 
>> the active CPUs have the same SMRR value.  If they do,
>> then any CPU released through the hot plug controller 
>> can have its SMRR pre-programmed and the initial SMI
>> will start within TSEG.
>>
>> We just need to decide what to do in the unexpected 
>> case where all the active CPUs do not have the same
>> SMRR value.
>>
>> This should also reduce the total number of steps.
> 
> The problem is not the SMRR but the SMBASE.  If the SMBASE area is
> outside TSEG, it is vulnerable to DMA attacks independent of the SMRR.
> SMBASE is also different for all CPUs, so it cannot be preprogrammed.

The firmware and QEMU could agree on a formula, which would compute the
CPU-specific SMBASE from a value pre-programmed by the firmware, and the
initial APIC ID of the hot-added CPU.

Yes, it would duplicate code -- the calculation -- between QEMU and
edk2. While that's not optimal, it wouldn't be a first.

Thanks
Laszlo



Re: [Qemu-devel] [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

2019-08-22 Thread Laszlo Ersek
On 08/21/19 17:48, Kinney, Michael D wrote:
> Perhaps there is a way to avoid the 3000:8000 startup
> vector.
>
> If a CPU is added after a cold reset, it is already in a
> different state because one of the active CPUs needs to
> release it by interacting with the hot plug controller.
>
> Can the SMRR for CPUs in that state be pre-programmed to
> match the SMRR in the rest of the active CPUs?
>
> For OVMF we expect all the active CPUs to use the same
> SMRR value, so a check can be made to verify that all
> the active CPUs have the same SMRR value.  If they do,
> then any CPU released through the hot plug controller
> can have its SMRR pre-programmed and the initial SMI
> will start within TSEG.

Yes, that is what I proposed here:

* http://mid.mail-archive.com/effa5e32-be1e-4703-4419-8866b7754e2d@redhat.com
* https://edk2.groups.io/g/devel/message/45570

Namely:

> When the SMM setup quiesces during normal firmware boot, OVMF could
> use existent (finalized) SMBASE infomation to *pre-program* some
> virtual QEMU hardware, with such state that would be expected, as
> "final" state, of any new hotplugged CPU. Afterwards, if / when the
> hotplug actually happens, QEMU could blanket-apply this state to the
> new CPU, and broadcast a hardware SMI to all CPUs except the new one.

(I know that Paolo didn't like it; I'm just confirming that I had the
same, or at least a very similar, idea.)

Thanks!
Laszlo



Re: [Qemu-devel] [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

2019-08-22 Thread Paolo Bonzini
On 21/08/19 22:17, Kinney, Michael D wrote:
> Paolo,
> 
> It makes sense to match real HW.

Note that it'd also be fine to match some kind of official Intel
specification even if no processor (currently?) supports it.

> That puts us back to
> the reset vector and handling the initial SMI at
> 3000:8000.  That is all workable from a FW implementation
> perspective.  It looks like the only issue left is DMA.
> 
> DMA protection of memory ranges is a chipset feature.
> For the current QEMU implementation, what ranges of
> memory are guaranteed to be protected from DMA?  Is
> it only A/B seg and TSEG?

Yes.

Paolo

>> Yes, all of these would work.  Again, I'm interested in
>> having something that has a hope of being implemented in
>> real hardware.
>>
>> Another, far easier to implement possibility could be a
>> lockable MSR (could be the existing
>> MSR_SMM_FEATURE_CONTROL) that allows programming the
>> SMBASE outside SMM.  It would be nice if such a bit
>> could be defined by Intel.
>>
>> Paolo




Re: [Qemu-devel] [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

2019-08-21 Thread Kinney, Michael D
Paolo,

It makes sense to match real HW.  That puts us back to
the reset vector and handling the initial SMI at
3000:8000.  That is all workable from a FW implementation
perspective.  It looks like the only issue left is DMA.

DMA protection of memory ranges is a chipset feature.
For the current QEMU implementation, what ranges of
memory are guaranteed to be protected from DMA?  Is
it only A/B seg and TSEG?

Thanks,

Mike

> -Original Message-
> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> Sent: Wednesday, August 21, 2019 10:40 AM
> To: Kinney, Michael D ;
> r...@edk2.groups.io; Yao, Jiewen 
> Cc: Alex Williamson ; Laszlo
> Ersek ; de...@edk2.groups.io; qemu
> devel list ; Igor Mammedov
> ; Chen, Yingwen
> ; Nakajima, Jun
> ; Boris Ostrovsky
> ; Joao Marcal Lemos Martins
> ; Phillip Goerl
> 
> Subject: Re: [edk2-rfc] [edk2-devel] CPU hotplug using
> SMM with QEMU+OVMF
> 
> On 21/08/19 19:25, Kinney, Michael D wrote:
> > Could we have an initial SMBASE that is within TSEG.
> >
> > If we bring in hot plug CPUs one at a time, then
> initial SMBASE in
> > TSEG can reprogram the SMBASE to the correct value for
> that CPU.
> >
> > Can we add a register to the hot plug controller that
> allows the BSP
> > to set the initial SMBASE value for a hot added CPU?
> The default can
> > be 3000:8000 for compatibility.
> >
> > Another idea is when the SMI handler runs for a hot
> add CPU event, the
> > SMM monarch programs the hot plug controller register
> with the SMBASE
> > to use for the CPU that is being added.  As each CPU
> is added, a
> > different SMBASE value can be programmed by the SMM
> Monarch.
> 
> Yes, all of these would work.  Again, I'm interested in
> having something that has a hope of being implemented in
> real hardware.
> 
> Another, far easier to implement possibility could be a
> lockable MSR (could be the existing
> MSR_SMM_FEATURE_CONTROL) that allows programming the
> SMBASE outside SMM.  It would be nice if such a bit
> could be defined by Intel.
> 
> Paolo



Re: [Qemu-devel] [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

2019-08-21 Thread Paolo Bonzini
On 21/08/19 19:25, Kinney, Michael D wrote:
> Could we have an initial SMBASE that is within TSEG.
> 
> If we bring in hot plug CPUs one at a time, then initial
> SMBASE in TSEG can reprogram the SMBASE to the correct 
> value for that CPU.
> 
> Can we add a register to the hot plug controller that
> allows the BSP to set the initial SMBASE value for 
> a hot added CPU?  The default can be 3000:8000 for
> compatibility.
> 
> Another idea is when the SMI handler runs for a hot add
> CPU event, the SMM monarch programs the hot plug controller
> register with the SMBASE to use for the CPU that is being
> added.  As each CPU is added, a different SMBASE value can
> be programmed by the SMM Monarch.

Yes, all of these would work.  Again, I'm interested in having something
that has a hope of being implemented in real hardware.

Another, far easier to implement possibility could be a lockable MSR
(could be the existing MSR_SMM_FEATURE_CONTROL) that allows programming
the SMBASE outside SMM.  It would be nice if such a bit could be defined
by Intel.

Paolo



Re: [Qemu-devel] [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

2019-08-21 Thread Paolo Bonzini
On 21/08/19 17:48, Kinney, Michael D wrote:
> Perhaps there is a way to avoid the 3000:8000 startup
> vector.
> 
> If a CPU is added after a cold reset, it is already in a
> different state because one of the active CPUs needs to
> release it by interacting with the hot plug controller.
> 
> Can the SMRR for CPUs in that state be pre-programmed to
> match the SMRR in the rest of the active CPUs?
> 
> For OVMF we expect all the active CPUs to use the same
> SMRR value, so a check can be made to verify that all 
> the active CPUs have the same SMRR value.  If they do,
> then any CPU released through the hot plug controller 
> can have its SMRR pre-programmed and the initial SMI
> will start within TSEG.
> 
> We just need to decide what to do in the unexpected 
> case where all the active CPUs do not have the same
> SMRR value.
> 
> This should also reduce the total number of steps.

The problem is not the SMRR but the SMBASE.  If the SMBASE area is
outside TSEG, it is vulnerable to DMA attacks independent of the SMRR.
SMBASE is also different for all CPUs, so it cannot be preprogrammed.

(As an aside, virt platforms are also immune to cache poisoning so they
don't have SMRR yet - we could use them for SMM_CODE_CHK_EN and block
execution outside SMRR but we never got round to it).

An even simpler alternative would be to make A0000h the initial SMBASE.
However, I would like to understand what hardware platforms plan to do,
if anything.

Paolo

> Mike
> 
>> -Original Message-
>> From: r...@edk2.groups.io [mailto:r...@edk2.groups.io] On
>> Behalf Of Yao, Jiewen
>> Sent: Sunday, August 18, 2019 4:01 PM
>> To: Paolo Bonzini 
>> Cc: Alex Williamson ; Laszlo
>> Ersek ; de...@edk2.groups.io; edk2-
>> rfc-groups-io ; qemu devel list
>> ; Igor Mammedov
>> ; Chen, Yingwen
>> ; Nakajima, Jun
>> ; Boris Ostrovsky
>> ; Joao Marcal Lemos Martins
>> ; Phillip Goerl
>> 
>> Subject: Re: [edk2-rfc] [edk2-devel] CPU hotplug using
>> SMM with QEMU+OVMF
>>
>> in real world, we deprecate AB-seg usage because they
>> are vulnerable to smm cache poison attack.
>> I assume cache poison is out of scope in the virtual
>> world, or there is a way to prevent ABseg cache poison.
>>
>> thank you!
>> Yao, Jiewen
>>
>>
>>> On 19 Aug 2019, at 3:50 AM, Paolo Bonzini wrote:
>>>
>>>> On 17/08/19 02:20, Yao, Jiewen wrote:
>>>> [Jiewen] That is OK. Then we MUST add the third
>> adversary.
>>>> -- Adversary: Simple hardware attacker, who can use
>> device to perform DMA attack in the virtual world.
>>>> NOTE: The DMA attack in the real world is out of
>> scope. That is be handled by IOMMU in the real world,
>> such as VTd. -- Please do clarify if this is TRUE.
>>>>
>>>> In the real world:
>>>> #1: the SMM MUST be non-DMA capable region.
>>>> #2: the MMIO MUST be non-DMA capable region.
>>>> #3: the stolen memory MIGHT be DMA capable region or
>> non-DMA capable
>>>> region. It depends upon the silicon design.
>>>> #4: the normal OS accessible memory - including ACPI
>> reclaim, ACPI
>>>> NVS, and reserved memory not included by #3 - MUST be
>> DMA capable region.
>>>> As such, IOMMU protection is NOT required for #1 and
>> #2. IOMMU
>>>> protection MIGHT be required for #3 and MUST be
>> required for #4.
>>>> I assume the virtual environment is designed in the
>> same way. Please
>>>> correct me if I am wrong.
>>>>
>>>
>>> Correct.  The 0x30000...0x3FFFF area is the only
>> problematic one;
>>> Igor's idea (or a variant, for example optionally
>> remapping
>> 0xa0000..0xaffff SMRAM to 0x30000) is becoming more
>> and more attractive.
>>>
>>> Paolo
>>
>