On 30/07/2024 11:55 am, Roger Pau Monné wrote:
> On Mon, Jul 29, 2024 at 06:51:31PM +0100, Andrew Cooper wrote:
>> On 29/07/2024 5:18 pm, Roger Pau Monné wrote:
>>> On Mon, Jul 29, 2024 at 04:52:22PM +0100, Andrew Cooper wrote:
>>>> On 29/07/2024 12:53 pm, Jan Beulich wrote:
>>>>> On 26.07.2024 17:21, Roger Pau Monne wrote:
>>>>>> The PVH dom0 builder doesn't switch page tables and has no need to run 
>>>>>> with
>>>>>> SMAP disabled.
>>>>>>
>>>>>> Put the SMAP disabling close to the code region where it's necessary, as 
>>>>>> it
>>>>>> then becomes obvious why switch_cr3_cr4() is required instead of
>>>>>> write_ptbase().
>>>>>>
>>>>>> Note removing SMAP from cr4_pv32_mask is not required, as we never jump 
>>>>>> into
>>>>>> guest context, and hence updating the value of cr4_pv32_mask is not 
>>>>>> relevant.
>>>>> I'm okay-ish with that being dropped, but iirc the goal was to keep the
>>>>> variable in sync with CPU state.
>>>> Removing SMAP from cr4_pv32_mask is necessary.
>>>>
>>>> Otherwise IST vectors will reactive SMAP behind the back of the dombuilder.
>>>>
>>>> This will probably only manifest in practice in a CONFIG_PV32=y build,
>>> Sorry, I'm possibly missing some context here.  When running the dom0
>>> builder we switch to the guest page-tables, but not to the guest vCPU,
>>> (iow: current == idle) and hence the context is always the Xen
>>> context.
>> Correct.
>>
>>> Why would the return path of the IST use cr4_pv32_mask when the
>>> context in which the IST happened was the Xen one, and the current
>>> vCPU is the idle one (a 64bit PV guest from Xen's PoV).
>>>
>>> My understanding is that cr4_pv32_mask should only be used when the
>>> current context is running a 32bit PV vCPU.
>> This logic is evil to follow, because you need to look at both
>> cr4_pv32_mask and XEN_CR4_PV32_BITS to see both halves of it.
>>
>> Notice how cr4_pv32_restore() only ever OR's cr4_pv32_mask into %cr4?
>>
>> CR4_PV32_RESTORE is called from every entry path which *might* have come
>> from a 32bit PV guest, and it always results in Xen having SMEP/SMAP
>> active (as applicable).  This includes NMI.
>>
>> The change is only undone in compat_restore_all_guest(), where
>> XEN_CR4_PV32_BITS is cleared from %cr4 iff returning to Ring1/2.  This
>> is logic cunningly disguised in the use of the Parity flag.
>>
>>
>> Because the NMI handler does reactive SMEP/SMAP (based on the value in
>> cr4_pv32_mask), and returning to Xen does not pass through
>> compat_restore_all_guest(), taking an NMI in the middle of of the
>> dombuilder will reactive SMAP behind your back.
> After further conversations with Andrew we believe the current
> disabling of X86_CR4_SMAP in %cr4 during dom0 build is not safe.
>
> Regardless of whether cr4_pv32_mask is properly adjusted return to Xen
> context from interrupt would be done with SMAP enabled if
> X86_FEATURE_XEN_SMAP is set.

Sorry - that's not what I intended to convey.

The logic prior to this patch is safe.  SMAP is cleared from
cr4_pv32_mask before clearing CR4.SMAP, and reinstated in the opposite
order.  Therefore, an NMI hitting the region won't reactivate SMAP
because it's not (instantaniously) set in cr4_pv32_mask.

Arguably it wants some barrier()'s for clarity, and an explanation of
why this works.

The problem your patch has is that by not clearing SMAP from
cr4_pv32_mask, it becomes unsafe iff an NMI/#MC/#DB hits the region.

> I will send a new patch that uses stac/clac in order to disable SMAP
> (if required) around the dom0 builder code that switches to the guest
> page-tables.

Either way - this is a much cleaner solution.

~Andrew

Reply via email to