On Fri, Oct 01, 2021 at 09:52:21AM -0700, Dan Williams wrote:
> I think that puts us back in the situation that Tony fixed in:
>
> 17fae1294ad9 x86/{mce,mm}: Unmap the entire page if the whole page is
> affected and poisoned
>
> ...where the clflush in _set_memory_uc() causes more unwanted virtual
> #MC injection.
Hmm, lemme read that commit message again: so the guest kernel sees a
*second* MCE while doing set_memory_uc().
So what prevents the guest kernel from seeing a second MCE when it does
set_memory_np() instead?
"Further investigation showed that the VMM had passed in another machine
check because is appeared that the guest was accessing the bad page."
but then the beginning of the commit message says:
"The VMM unmapped the bad page from guest physical space and passed the
machine check to the guest."
so I'm really confused here what actually happens. Did the VMM manage to
unmap it or not really? Because if the VMM had unmapped it, then how was
the guest still accessing the bad page? ... Maybe I'm reading that wrong.
> I think that means that we have no choice but to mark the page NP
> unconditionally and do the work to ensure that the driver has updated
> its poisoned cacheline tracking for data recovery requests.
So a couple of emails earlier I had this:
|Well, the driver has special knowledge so *actually* it could go and use
|the NP marking as "this page has been poisoned" and mark it NC only for
|itself, so that it gets the job done. Dunno if that would end up being
|too ugly to live and turn into a layering violation or so.
So if we have marked the page NP, then nothing would be able to access
it anymore and it will be marked as hwpoison additionally, which will
prevent that access too.
Then, the PMEM driver would go and map the page however it wants to, it
could even remove it from the direct map so that nothing else accesses
it, even in the kernel, and then do all kinds of recovery.
Hmm?
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette