On 11/12/2021 2:35 PM, Jane Chu wrote: > On 11/12/2021 11:24 AM, Dan Williams wrote: >> On Fri, Nov 12, 2021 at 9:58 AM Jane Chu <[email protected]> wrote: >>> >>> On 11/11/2021 4:51 PM, Dan Williams wrote: >>>> On Thu, Nov 11, 2021 at 4:30 PM Jane Chu <[email protected]> wrote: >>>>> >>>>> Just a quick update - >>>>> >>>>> I managed to test the 'NP' and 'UC' effect on a pmem dax file. >>>>> The result is, as expected, both setting 'NP' and 'UC' works >>>>> well in preventing the prefetcher from accessing the poisoned >>>>> pmem page. >>>>> >>>>> I injected back-to-back poisons to the 3rd block(512B) of >>>>> the 3rd page in my dax file. With 'NP', the 'mc_safe read' >>>>> stops after reading the 1st and 2nd pages, with 'UC', >>>>> the 'mc_safe read' was able to read [2 pages + 2 blocks] on >>>>> my test machine. >>>> >>>> My expectation is that dax_direct_access() / dax_recovery_read() has >>>> installed a temporary UC alias for the pfn, or has temporarily flipped >>>> NP to UC. Outside of dax_recovery_read() the page will always be NP. >>>> >>> >>> Okay. Could we only flip the memtype within dax_recovery_read, and >>> not within dax_direct_access? dax_direct_access does not need to >>> access the page. >> >> True, dax_direct_access() does not need to do the page permission >> change, it just needs to indicate if dax_recovery_{read,write}() may >> be attempted. I was thinking that the DAX pages only float between NP >> and WB depending on whether poison is present in the page. If >> dax_recovery_read() wants to do UC reads around the poison it can use >> ioremap() or vmap() to create a temporary UC alias. The temporary UC >> alias is only possible if there might be non-clobbered data remaining >> in the page. I.e. the current "whole_page()" determination in >> uc_decode_notifier() needs to be plumbed into the PMEM driver so that >> it can cooperate with a virtualized environment that injects virtual >> #MC at page granularity. I.e. nfit_handle_mce() is broken in that it >> only assumes a single cacheline around the failure address is >> poisoned, it needs that same whole_page() logic. >> > > I'll have to take some time to digest what you proposed, but alas, why > couldn't we let the correct decision (about NP vs UC) being made when > the 'whole_page' test has access to the MSi_MISC register, instead of > having to risk mistakenly change NP->UC in dax_recovery_read() and > risk to repeat the bug that Tony has fixed? I mean a PMEM page might > be legitimately not-accessible due to it might have been unmapped by > the host, but dax_recovery_read() has no way to know, right? > > The whole UC <> NP arguments to me seems to be a > "UC being harmless/workable solution to DRAM and PMEM" versus > "NP being simpler regardless what risk it brings to PMEM".
That wasn't clear, let me try again - "reject the if-whole_page-then-set-NP-otherwise-set-UC solution on the ground of DRAM can't benefit from UC since to DRAM the UC effect is practically the same as the NP effect, but harmless anyway" versus "accept set-NP-always regardless whatever risk it brings to PMEM" > To us, PMEM is not just another driver, it is treated as memory by our > customer, so why? > > thanks! > -jane > > > thanks, -jane
