On Mon, Sep 20, 2021 at 07:04:50PM -0700, Dan Williams wrote:
> Yes, although I believe that DRAM patrol scrubbing is being done from
> the host memory controller, these PMEM DIMMs have firmware and DMA
> engines *in the DIMM* to do this scrub work.

Oh great. Lemme guess, they even have small OSes inside ... <eyeroll>

> Perhaps, but I don't know how you do that if memory_failure() has
> "offlined" the DRAM page, in the case of PMEM you can issue a
> byte-aligned direct-I/O access to the exact storage locations around
> the poisoned cachelines.

Well, looking at the call sites of set_mce_nospec() (there are exactly
two), both do:

        if (!memory_failure(..))
                set_mce_nospec(pfn, whole_page...);

so IINM, if that thing returns 0, then we have hwpoisoned the page. And
if that is the case, then why are we even diddling with reads around the
poisoned cacheline when the whole page has been poisoned and we can't
access it anymore?

Or am I missing something?

Because the comment over set_mce_nospec() says

"or marking it uncacheable (if we want to try to retrieve data from
non-poisoned lines in the page)."

Question is, can we even access a hwpoisoned page to retrieve that data
or are we completely off in the weeds here? Tony?
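
To make the question concrete, here is a userspace toy model of that call
pattern. It is only a sketch: the mock_* functions, struct page_state and
the pages[] array are made up for illustration and are not the kernel
implementation (the real memory_failure() also takes a flags argument,
for one). It assumes that a 0 return means the page got marked hwpoison,
and that set_mce_nospec() either unmaps the page (NP) or leaves it mapped
uncacheable (UC).

```c
#include <assert.h>
#include <stdbool.h>

#define NR_PAGES 8UL

/* Hypothetical stand-in for how a page ends up mapped. */
enum mapping { MAP_WB, MAP_UC, MAP_NP };

/* Hypothetical per-page state; the kernel tracks this very differently. */
struct page_state {
	bool hwpoison;
	enum mapping map;
};

static struct page_state pages[NR_PAGES];

/* Mock: returns 0 when the page was handled and marked hwpoison. */
static int mock_memory_failure(unsigned long pfn)
{
	if (pfn >= NR_PAGES)
		return -1;

	pages[pfn].hwpoison = true;
	return 0;
}

/* Mock: unmap the whole page, or keep it mapped but uncacheable. */
static void mock_set_mce_nospec(unsigned long pfn, bool whole_page)
{
	assert(pfn < NR_PAGES);
	pages[pfn].map = whole_page ? MAP_NP : MAP_UC;
}

/* The call pattern from the two call sites quoted above. */
static void mock_handle_error(unsigned long pfn, bool whole_page)
{
	if (!mock_memory_failure(pfn))
		mock_set_mce_nospec(pfn, whole_page);
}
```

In this model, mock_handle_error(pfn, false) leaves the page in a state
which is hwpoisoned *and* still mapped UC at the same time, which is
exactly the combination I'm asking about.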

> PMEM can still go NP if the entire page is failed, so no need to
> exclude PMEM from the treatise because the driver's badblocks
> implementation will cover the NP page, and the driver can use
> clear_mce_nospec() to recover the WB mapping / access after the poison
> has been cleared.

Now I'm all confused again. ;-(

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
