On Fri, Oct 1, 2021 at 11:12 AM Borislav Petkov <[email protected]> wrote:
>
> On Fri, Oct 01, 2021 at 09:52:21AM -0700, Dan Williams wrote:
> > I think that puts us back in the situation that Tony fixed in:
> >
> > 17fae1294ad9 x86/{mce,mm}: Unmap the entire page if the whole page is
> > affected and poisoned
> >
> > ...where the clflush in _set_memory_uc() causes more unwanted virtual
> > #MC injection.
>
> Hmm, lemme read that commit message again: so the guest kernel sees a
> *second* MCE while doing set_memory_uc().
>
> So what prevents the guest kernel from seeing a second MCE when it does
> set_memory_np() instead?
>
> "Further investigation showed that the VMM had passed in another machine
> check because is appeared that the guest was accessing the bad page."
>
> but then the beginning of the commit message says:
>
> "The VMM unmapped the bad page from guest physical space and passed the
> machine check to the guest."
>
> so I'm really confused here what actually happens. Did the VMM manage to
> unmap it or not really? Because if the VMM had unmapped it, then how was
> the guest still accessing the bad page? ... Maybe I'm reading that wrong.

My read is that the guest gets virtual #MC on an access to that page.
When the guest tries to do set_memory_uc() and instructs cpa_flush()
to do clean caches that results in taking another fault / exception
perhaps because the VMM unmapped the page from the guest? If the guest
had flipped the page to NP then cpa_flush() says "oh, no caching
change, skip the clflush() loop".

>
> > I think that means that we have no choice but to mark the page NP
> > unconditionally and do the work to ensure that the driver has updated
> > its poisoned cacheline tracking for data recovery requests.
>
> So a couple of emails earlier I had this:
>
> |Well, the driver has special knowledge so *actually* it could go and use
> |the NP marking as "this page has been poisoned" and mark it NC only for
> |itself, so that it gets the job done. Dunno if that would end up being
> |too ugly to live and turn into a layering violation or so.
>
> So if we have marked the page NP, then nothing would be able to access
> it anymore and it will be marked as hwpoison additionally, which will
> prevent that access too.
>
> Then, the PMEM driver would go and map the page however it wants to, it
> could even remove it from the direct map so that nothing else accesses
> it, even in the kernel, and then do all kinds of recovery.
>
> Hmm?

Yeah, I thought UC would make the PMEM driver's life easier, but if it
has to contend with an NP case at all, might as well make it handle
that case all the time.

Safe to say this patch of mine is woefully insufficient and I need to
go look at how to make the guarantees needed by the PMEM driver so it
can handle NP and set up alias maps.

This was a useful discussion. It proves that my commit:

284ce4011ba6 x86/memory_failure: Introduce {set, clear}_mce_nospec()

...was broken from the outset.

Reply via email to