On Thu, Sep 30, 2021 at 12:30 PM Borislav Petkov <[email protected]> wrote:
>
> On Thu, Sep 30, 2021 at 05:28:12PM +0000, Luck, Tony wrote:
> > > Question is, can we even access a hwpoisoned page to retrieve that data
> > > or are we completely in the wrong weeds here? Tony?
> >
> > Hardware scope for poison is a cache line (64 bytes for DDR, may be larger
> > for the internals of 3D-Xpoint memory).
>
> I don't mean from the hw aspect but from the OS one: my simple thinking
> is, *if* a page is marked as HW poison, any further mapping or accessing
> of the page frame is prevented by the mm code.
>
> So you can't access *any* bits there so why do we even bother with whole
> or not whole page? Page is gone...

I think the disconnect is that in the typical memory-from-the-page-allocator case it's ok to throw away the whole page and get another one, because pages are interchangeable. In the PMEM case they are not: they sit at fixed physical offsets known to things like filesystem metadata.

So PageHWPoison in this latter case is just a flag to indicate that poison mitigations are in effect (the page is marked UC or NP), and the owner of that page, the PMEM driver, knows how to navigate around that poison to maximize data recovery. The owner of the page in the typical case, the page allocator, says "bah, I'll just throw the whole thing away because there's no chance at repair and consumers can just as well use a different pfn for the same role."
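
To make the two ownership policies concrete, here is a rough user-space sketch. It is not kernel code and not how the PMEM driver is implemented: struct sketch_page, the per-cache-line bitmap, and every sketch_* name are made up for illustration, and the 64-byte granule is just the DDR figure Tony quoted above. The point is only that the same "poisoned" flag leads to "discard everything" for one owner and "copy around the bad lines" for the other.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* All names below are hypothetical; this only models the idea in user space. */
#define SKETCH_PAGE_SIZE      4096u
#define SKETCH_POISON_GRANULE 64u   /* cache line size quoted for DDR */
#define SKETCH_GRANULES       (SKETCH_PAGE_SIZE / SKETCH_POISON_GRANULE)

struct sketch_page {
	bool hwpoison;          /* stands in for the PageHWPoison flag */
	uint64_t bad_granules;  /* assumed bookkeeping: one bit per 64-byte line */
	unsigned char data[SKETCH_PAGE_SIZE];
};

/* Record that the cache line containing 'offset' is poisoned. */
static void sketch_mark_poison(struct sketch_page *p, size_t offset)
{
	assert(offset < SKETCH_PAGE_SIZE);
	p->hwpoison = true;
	p->bad_granules |= UINT64_C(1) << (offset / SKETCH_POISON_GRANULE);
}

/*
 * Page-allocator style owner: pages are interchangeable, so a poisoned
 * page is abandoned wholesale. Returns the number of bytes recovered.
 */
static size_t sketch_recover_allocator(const struct sketch_page *p,
				       unsigned char *dst)
{
	if (p->hwpoison)
		return 0;
	memcpy(dst, p->data, SKETCH_PAGE_SIZE);
	return SKETCH_PAGE_SIZE;
}

/*
 * PMEM style owner: the page sits at a fixed offset, so copy every line
 * that is not known to be bad and zero-fill the poisoned ones in 'dst'.
 * Returns the number of bytes recovered.
 */
static size_t sketch_recover_pmem(const struct sketch_page *p,
				  unsigned char *dst)
{
	size_t recovered = 0;

	for (size_t g = 0; g < SKETCH_GRANULES; g++) {
		size_t off = g * SKETCH_POISON_GRANULE;

		if (p->bad_granules & (UINT64_C(1) << g)) {
			/* Never touch the poisoned line in the source. */
			memset(dst + off, 0, SKETCH_POISON_GRANULE);
			continue;
		}
		memcpy(dst + off, p->data + off, SKETCH_POISON_GRANULE);
		recovered += SKETCH_POISON_GRANULE;
	}
	return recovered;
}
```

With a single poisoned line, the allocator-style path recovers nothing while the PMEM-style path recovers every byte of the page except that one 64-byte line, which is the whole-page versus sub-page distinction being argued about here.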
