On 9/30/2021 3:35 PM, Borislav Petkov wrote:
On Thu, Sep 30, 2021 at 02:41:52PM -0700, Dan Williams wrote:
I fail to see the point of that extra plumbing when MSi_MISC
indicating "whole_page", or not is sufficient. What am I missing?

I think you're looking at it from the wrong side... (or it is too late
here, but we'll see). Forget how a memory type can be mapped but think
about how the recovery action looks like.

- DRAM: when a DRAM page is poisoned, it is only poisoned as a whole
page by memory_failure(). whole_page is always true here, no matter what
the hardware says because we don't and cannot do any sub-page recovery
actions. So it doesn't matter how we map it, UC, NP... I suggested NP
because the page is practically not present if you want to access it
because mm won't allow it...

- PMEM: reportedly, we can do sub-page recovery here so PMEM should be
mapped in the way it is better for the recovery action to work.

In both cases, the recovery action should control how the memory type is
mapped.

Now, you say we cannot know the memory type when the error gets
reported.

And I say: for simplicity's sake, we simply go and work with whole
pages. Always. That is the case anyway for DRAM.

Sorry, please correct me if I misunderstand. The DRAM poison handling
at page frame granularity is a helpless compromise due to lack of
guarantee to decipher the precise error blast radius given all
types of DRAM and architectures, right?  But that's not true for
the PMEM case. So why should PMEM poison handling follow the lead
of DRAM?

Regards,
-jane


For PMEM, AFAIU, it doesn't matter whether it is a whole page or not -
the PMEM driver knows how to do those sub-pages accesses.

IOW, set_mce_nospec() should simply do:

        rc = set_memory_np(decoy_addr, 1);

and that's it.


Reply via email to