> So let me cut to the chase:
>
>        if (!memory_failure(..))
>                set_mce_nospec(pfn, whole_page...);
>
> when memory_failure() returns 0, is a whole page marked as hwpoison or
> not?

Depends on what the whole_page() helper said.

static bool whole_page(struct mce *m)
{
        if (!mca_cfg.ser || !(m->status & MCI_STATUS_MISCV))
                return true;

        return MCI_MISC_ADDR_LSB(m->misc) >= PAGE_SHIFT;
}
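
To make that concrete, here is a standalone userspace sketch of the same
decision (not the kernel code itself). The macro values are restated from
arch/x86/include/asm/mce.h, 4 KiB pages are assumed, and struct fake_mce
and whole_page_sketch() are hypothetical names used only for illustration,
with mca_cfg.ser passed in as a plain bool:

```c
#include <stdbool.h>
#include <stdint.h>

/* Restated from arch/x86/include/asm/mce.h for this sketch. */
#define MCI_STATUS_MISCV      (1ULL << 59)   /* MISC register contents are valid */
#define MCI_MISC_ADDR_LSB(m)  ((m) & 0x3f)   /* least significant valid address bit */
#define PAGE_SHIFT            12             /* assumption: 4 KiB pages */

/* Hypothetical stand-in for the two struct mce fields used here. */
struct fake_mce {
        uint64_t status;
        uint64_t misc;
};

/*
 * Same logic as whole_page(): with no software error recovery support,
 * or no valid MISC register, there is no scope information, so assume
 * the whole page. Otherwise the reported LSB decides.
 */
static bool whole_page_sketch(bool ser, const struct fake_mce *m)
{
        if (!ser || !(m->status & MCI_STATUS_MISCV))
                return true;

        return MCI_MISC_ADDR_LSB(m->misc) >= PAGE_SHIFT;
}
```

With a valid MISC register, an LSB of 6 (a 64-byte cache line) gives false,
and an LSB of 12 or more gives true. Without MISCV the answer is always
true, whatever the LSB field holds.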

> Because I see there close to the top of the function:
>
>       if (TestSetPageHWPoison(p)) {
>               ...
>
> after this, that whole page is hwpoison I'd say. Not a cacheline but the
> whole thing.

That may now be a confusing name for the page flag bit. Until the
pmem/storage use case came along we simply lost the whole page (back
then set_mce_nospec() didn't take the whole_page argument; it just marked
the whole page as "not present" in the kernel 1:1 map).

So the meaning of HWPoison has subtly changed from "this whole
page is poisoned" to "there is some poison in some/all[1] of this page".

-Tony

[1] Even in the "all" case the poison is likely just in one cache line, but
the MCI_MISC_ADDR_LSB indication in the machine check bank said
the scope was the whole page. This happened on some older systems
for page scrub errors where the memory controller wasn't smart enough
to translate the channel address back to a cache-line-granular system
address.
