On 27.03.2023 17:39, Tim Deegan wrote:
> At 10:33 +0100 on 22 Mar (1679481226), Jan Beulich wrote:
>> There's no need for an indirect call here, as the mode is invariant
>> throughout the entire paging-locked region. All it takes to avoid it is
>> to have a forward declaration of sh_update_cr3() in place.
>>
>> Signed-off-by: Jan Beulich <jbeul...@suse.com>
>> ---
>> I find this and the respective Win7 related comment suspicious: If we
>> really need to "fix up" L3 entries "on demand", wouldn't we better retry
>> the shadow_get_and_create_l1e() rather than exit? The spurious page
>> fault that the guest observes can, after all, not be known to be non-
>> fatal inside the guest. That's purely an OS policy.
> 
> I think it has to be non-fatal because it can happen on real hardware,
> even if the hardware *does* fill the TLB here (which it is not
> required to).

But if hardware filled the TLB, it won't raise #PF. If it didn't fill
the TLB (e.g. because of not even doing a walk when a PDPTE is non-
present), a #PF would be legitimate, and the handler would then need
to reload CR3. But such a #PF is what, according to the comment, Win7
chokes on. So it can only be the former case, yet what we do here is
fill the (virtual) TLB _and_ raise #PF. Win7 apparently ignores this
as spurious, but that's not required behavior for an OS afaik.

> Filling just one sl3e sounds plausible, though we don't want to go
> back to the idea of having L3 shadows on PAE!

Of course.

>> Furthermore the sh_update_cr3() will also invalidate L3 entries which
>> were loaded successfully before, but invalidated by the guest
>> afterwards. I strongly suspect that the described hardware behavior is
>> _only_ to load previously not-present entries from the PDPT, but not
>> purge ones already marked present.
> 
> Very likely, but we *are* allowed to forget old entries whenever we
> want to, so this is at worst a performance problem.

That depends on the model, I think: In the original Intel model PDPTEs
cannot be "forgotten". In some AMD variants, where L3 is walked normally,
they of course can be.

>> IOW I think sh_update_cr3() would
>> need calling in an "incremental" mode here.
> 
> This would be the best way of updating just the one entry - but as far
> as I can tell the existing code is correct so I wouldn't add any more
> complexity unless we know that this path is causing perf problems.

If it was/is just performance - sure.

>> In any event emitting a TRC_SHADOW_DOMF_DYING trace record in this case
>> looks wrong.
> 
> Yep.

Will add another patch to the series then.

>> Beyond the "on demand" L3 entry creation I also can't see what guest
>> actions could lead to the ASSERT() being inapplicable in the PAE case.
>> The 3-level code in shadow_get_and_create_l2e() doesn't consult guest
>> PDPTEs, and all other logic is similar to that for other modes.
> 
> The assert's not true here because the guest can push us down this
> path by doing exactly what Win 7 does here - loading CR3 with a
> missing L3E, then filling the L3E and causing a page fault that uses
> the now-filled L3E.  (Or maybe that's not true any more; my mental
> model of the pagetable walker code might be out of date)

Well, I specifically started the paragraph with 'Beyond the "on demand"
L3 entry creation'. IOW the assertion would look applicable to me if
we, as suggested higher up, retried shadow_get_and_create_l1e() and it
failed again. As the comment there says, "we know the guest entries are
OK", so the missing L3 entry must have appeared.

Jan

Reply via email to