On Mon, Sep 13, 2021 at 3:29 AM Borislav Petkov <[email protected]> wrote:
>
> On Tue, Jul 06, 2021 at 06:01:05PM -0700, Dan Williams wrote:
> > When poison is discovered and triggers memory_failure() the physical
> > page is unmapped from all process address space. However, it is not
> > unmapped from kernel address space. Unlike a typical memory page that
> > can be retired from use in the page allocator and marked 'not present',
> > pmem needs to remain accessible given it can not be physically remapped
> > or retired.
>
> I'm surely missing something obvious but why does it need to remain
> accessible? Spell it out please.

Sure. I should probably include the following note in all patches touching the DAX-memory_failure() path, because it is a frequently asked question. The tl;dr is:

Typical memory_failure() does not assume the physical page can be recovered and put back into circulation; PMEM memory_failure() allows for recovery of the page.

The longer description is:

Typical memory_failure() for anonymous or page-cache pages has the flexibility to invalidate bad pages and trigger any users to request a new page from the page allocator to replace the quarantined one. DAX removes that flexibility. The page is a handle for a fixed storage location, i.e. there is no mechanism to remap a physical page to a different logical address. Software expects to be able to repair an error in PMEM by reading around the poisoned cache lines and writing zeros (e.g. with fallocate(...FALLOC_FL_PUNCH_HOLE...)) to overwrite the poison. The page needs to remain accessible to enable that recovery.

>
> > set_memory_uc() tries to maintain consistent nominal memtype
> > mappings for a given pfn, but memory_failure() is an exceptional
> > condition.
>
> That's not clear to me too. So looking at the failure:
>
> [10683.426147] x86/PAT: fsdax_poison_v1:5018 conflicting memory types
> 1850600000-1850601000 uncached-minus<->write-back
>
> set_memory_uc() marked it UC- but something? wants it to be WB. Why?

PMEM is mapped WB at the beginning of time for nominal operation. track_pfn_remap() records that driver setting and forwards it to any track_pfn_insert() of the same pfn, i.e. this is how DAX mappings inherit the WB cache mode. memory_failure() wants to arrange to avoid speculative consumption of the poison, and set_memory_uc() checks the request against the track_pfn_remap() setting, but we know this is an exceptional condition and it is OK to force the page to UC against the typical memtype expectation.

> I guess I need some more info on the whole memory offlining for pmem and
> why that should be done differently than with normal memory.

Short answer: PMEM never goes "offline" because it was never "online" in the first place, where "online" in this context specifically refers to pfns that are under the watchful eye of the core-mm page allocator.
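
For illustration only, the userspace repair path mentioned in the reply (overwrite the poisoned cache line, e.g. via fallocate() with FALLOC_FL_PUNCH_HOLE) might look roughly like the sketch below. It assumes 64-byte cache lines, and `repair_poisoned_line()` is a hypothetical helper, not an existing tool or library call. It only shows the offset arithmetic and the fallocate() call (PUNCH_HOLE must always be ORed with FALLOC_FL_KEEP_SIZE); salvaging the neighbouring good lines first is left out, and whether a sub-block punch actually clears media poison depends on the filesystem and driver.

```c
#define _GNU_SOURCE
#include <assert.h>     /* static_assert */
#include <fcntl.h>      /* fallocate(), FALLOC_FL_PUNCH_HOLE, FALLOC_FL_KEEP_SIZE */
#include <stdint.h>

/* Assumption: poison is tracked at 64-byte cache-line granularity. */
#define CACHELINE_SIZE 64ULL
static_assert((CACHELINE_SIZE & (CACHELINE_SIZE - 1)) == 0,
	      "cache line size must be a power of two");

struct byte_range {
	uint64_t start;
	uint64_t len;
};

/* Round a poisoned file offset down to the cache line that contains it. */
struct byte_range poisoned_line(uint64_t off)
{
	struct byte_range r = {
		.start = off & ~(CACHELINE_SIZE - 1),
		.len = CACHELINE_SIZE,
	};
	return r;
}

/*
 * Hypothetical helper: discard the poisoned line so later reads of that
 * range return zeros. KEEP_SIZE is mandatory with PUNCH_HOLE.
 * Returns 0 on success, -1 with errno set on failure.
 */
int repair_poisoned_line(int fd, uint64_t off)
{
	struct byte_range r = poisoned_line(off);

	return fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
			 (off_t)r.start, (off_t)r.len);
}
```

A byte error reported at file offset 100, for instance, maps to the line covering offsets 64..127, and only that line is zeroed.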

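
The memtype bookkeeping described in the reply can also be pictured with a toy model. Every name below (`toy_track_remap()`, `toy_track_insert()`, `toy_set_uc()`) is hypothetical and merely mirrors the roles the reply assigns to track_pfn_remap(), track_pfn_insert() and set_memory_uc(). It is not the kernel's x86/PAT code and ignores locking, multiple ranges and the actual page-table update; the `force` flag just models the "exceptional condition" override.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: hypothetical names, not the kernel's x86/PAT code. */
enum toy_memtype { TOY_WB, TOY_UC_MINUS };

struct toy_memtype_entry {
	uint64_t start_pfn;	/* first pfn covered */
	uint64_t end_pfn;	/* one past the last pfn */
	enum toy_memtype type;
};

/* A single tracked range is enough to show the inheritance rule. */
static struct toy_memtype_entry tracked;
static bool tracked_valid;

/* Driver setup (cf. track_pfn_remap()): record the nominal cache mode. */
void toy_track_remap(uint64_t start_pfn, uint64_t end_pfn, enum toy_memtype t)
{
	assert(start_pfn < end_pfn);
	tracked = (struct toy_memtype_entry){ start_pfn, end_pfn, t };
	tracked_valid = true;
}

static bool toy_is_tracked(uint64_t pfn)
{
	return tracked_valid && pfn >= tracked.start_pfn && pfn < tracked.end_pfn;
}

/* DAX fault (cf. track_pfn_insert()): new mappings inherit the recorded type. */
enum toy_memtype toy_track_insert(uint64_t pfn, enum toy_memtype requested)
{
	return toy_is_tracked(pfn) ? tracked.type : requested;
}

/*
 * Poison handling (cf. set_memory_uc()): asking for UC- on a pfn recorded
 * as WB is a conflict unless the caller forces it. The record itself is
 * left alone. Returns 0 on success, -1 on a memtype conflict.
 */
int toy_set_uc(uint64_t pfn, bool force)
{
	if (toy_is_tracked(pfn) && tracked.type != TOY_UC_MINUS && !force)
		return -1;	/* cf. "conflicting memory types" */
	return 0;
}
```

With `toy_track_remap(0x1850600, 0x1850601, TOY_WB)` (the single page at 0x1850600000 from the log), a DAX insert of pfn 0x1850600 inherits TOY_WB, an unforced `toy_set_uc()` on it returns -1 like the reported conflict, and a forced one returns 0.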