On Tue, Sep 14, 2021 at 11:08:00AM -0700, Dan Williams wrote:
> Sure, I should probably include this following note in all patches
> touching the DAX-memory_failure() path, because it is a frequently
> asked question. The tl;dr is:
> 
> Typical memory_failure() does not assume the physical page can be
> recovered and put back into circulation, PMEM memory_failure() allows
> for recovery of the page.

Hmm, I think by "PMEM memory_failure()" you mean what pmem_do_write()
does with that poison clearing or?

But then I have no clue what the "DAX-memory_failure()" path is.

> The longer description is:
> Typical memory_failure() for anonymous, or page-cache pages, has the
> flexibility to invalidate bad pages and trigger any users to request a
> new page from the page allocator to replace the quarantined one. DAX
> removes that flexibility. The page is a handle for a fixed storage
> location, i.e. no mechanism to remap a physical page to a different
> logical address. Software expects to be able to repair an error in
> PMEM by reading around the poisoned cache lines and writing zeros,
> fallocate(...FALLOC_FL_PUNCH_HOLE...), to overwrite poison. The page
> needs to remain accessible to enable recovery.

Aha, so memory failure there is not offlining he 4K page but simply
zeroing the poison. What happens if the same physical location triggers
more read errors, i.e., the underlying hw is going bad? Don't you want
to exclude that location from ever being written again?

Or maybe such recovery action doesn't make sense for pmem?

> Short answer, PMEM never goes "offline" because it was never "online"
> in the first place. Where "online" in this context is specifically
> referring to pfns that are under the watchful eye of the core-mm page
> allocator.

Ok, so pmem wants to be handled differently during memory failure.
Looking at the patch again, you change the !unmap case to do
_set_memory_uc().

That bool unmap thing gets *not* set in whole_page():

        return MCI_MISC_ADDR_LSB(m->misc) >= PAGE_SHIFT;

so I'm guessing that "Recoverable Address LSB (bits 5:0): The lowest
valid recoverable address bit." thing is < 12 for pmem.

But are you really saying that the hardware would never report a lower
than 12 value for normal memory?

If it does, then that is wrong here.

In any case, I'd prefer if the code would do an explicit check somewhere
saying "is this pmem" in order to do that special action only for when
it really is pmem and not rely on it implicitly.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Reply via email to