On Thu, Sep 30, 2021 at 5:43 PM Jane Chu <[email protected]> wrote: > > > On 9/30/2021 3:35 PM, Borislav Petkov wrote: > > On Thu, Sep 30, 2021 at 02:41:52PM -0700, Dan Williams wrote: > >> I fail to see the point of that extra plumbing when MSi_MISC > >> indicating "whole_page", or not is sufficient. What am I missing? > > > > I think you're looking at it from the wrong side... (or it is too late > > here, but we'll see). Forget how a memory type can be mapped but think > > about how the recovery action looks like. > > > > - DRAM: when a DRAM page is poisoned, it is only poisoned as a whole > > page by memory_failure(). whole_page is always true here, no matter what > > the hardware says because we don't and cannot do any sub-page recovery > > actions. So it doesn't matter how we map it, UC, NP... I suggested NP > > because the page is practically not present if you want to access it > > because mm won't allow it... > > > > - PMEM: reportedly, we can do sub-page recovery here so PMEM should be > > mapped in the way it is better for the recovery action to work. > > > > In both cases, the recovery action should control how the memory type is > > mapped. > > > > Now, you say we cannot know the memory type when the error gets > > reported. > > > > And I say: for simplicity's sake, we simply go and work with whole > > pages. Always. That is the case anyway for DRAM. > > Sorry, please correct me if I misunderstand. The DRAM poison handling > at page frame granularity is a helpless compromise due to lack of > guarantee to decipher the precise error blast radius given all > types of DRAM and architectures, right? But that's not true for > the PMEM case. So why should PMEM poison handling follow the lead > of DRAM?
If I understand the proposal correctly Boris is basically saying "figure out how to do your special PMEM stuff in the driver directly and make it so MCE code has no knowledge of the PMEM case". The flow is: memory_failure(pfn, flags) nfit_handle_mce(...) <--- right now this on mce notifier chain set_mce_nospec(pfn) <--- drop the "whole page" concept This poses a problem because not all memory_failure() paths trigger set_mce_nospec() or the mce notifier chain. If that disconnect happens, attempts to read PMEM pages that have been signalled to memory_failure() will now crash in the driver without workarounds for NP pages. So memory_failure() needs to ensure that it communicates with the driver before any possible NP page attribute changes. I.e. the driver needs to know that regardless of how many cachelines are poisoned the entire page is always unmapped in the direct map. Then, when the driver is called with the new RWF_RECOVER_DATA flag, it can set up a new UC alias mapping for the pfn and access the good data in the page while being careful to read around the poisoned cache lines. In my mind this moves the RWF_RECOVER_DATA flag proposal from "nice to have" to "critical for properly coordinating with memory_failure() and mce expectations"
