On 11/12/2021 2:35 PM, Jane Chu wrote:
> On 11/12/2021 11:24 AM, Dan Williams wrote:
>> On Fri, Nov 12, 2021 at 9:58 AM Jane Chu <[email protected]> wrote:
>>>
>>> On 11/11/2021 4:51 PM, Dan Williams wrote:
>>>> On Thu, Nov 11, 2021 at 4:30 PM Jane Chu <[email protected]> wrote:
>>>>>
>>>>> Just a quick update -
>>>>>
>>>>> I managed to test the 'NP' and 'UC' effect on a pmem dax file.
>>>>> The result is, as expected, both setting 'NP' and 'UC' works
>>>>> well in preventing the prefetcher from accessing the poisoned
>>>>> pmem page.
>>>>>
>>>>> I injected back-to-back poisons to the 3rd block(512B) of
>>>>> the 3rd page in my dax file.  With 'NP', the 'mc_safe read'
>>>>> stops  after reading the 1st and 2nd pages, with 'UC',
>>>>> the 'mc_safe read' was able to read [2 pages + 2 blocks] on
>>>>> my test machine.
>>>>
>>>> My expectation is that dax_direct_access() / dax_recovery_read() has
>>>> installed a temporary UC alias for the pfn, or has temporarily flipped
>>>> NP to UC. Outside of dax_recovery_read() the page will always be NP.
>>>>
>>>
>>> Okay.  Could we only flip the memtype within dax_recovery_read, and
>>> not within dax_direct_access?  dax_direct_access does not need to
>>> access the page.
>>
>> True, dax_direct_access() does not need to do the page permission
>> change, it just needs to indicate if dax_recovery_{read,write}() may
>> be attempted. I was thinking that the DAX pages only float between NP
>> and WB depending on whether poison is present in the page. If
>> dax_recovery_read() wants to do UC reads around the poison it can use
>> ioremap() or vmap() to create a temporary UC alias. The temporary UC
>> alias is only possible if there might be non-clobbered data remaining
>> in the page. I.e. the current "whole_page()" determination in
>> uc_decode_notifier() needs to be plumbed into the PMEM driver so that
>> it can cooperate with a virtualized environment that injects virtual
>> #MC at page granularity. I.e. nfit_handle_mce() is broken in that it
>> only assumes a single cacheline around the failure address is
>> poisoned, it needs that same whole_page() logic.
>>
> 
> I'll have to take some time to digest what you proposed, but alas, why
> couldn't we let the correct decision (about NP vs UC) being made when
> the 'whole_page' test has access to the MSi_MISC register, instead of
> having to risk mistakenly change NP->UC in dax_recovery_read() and
> risk to repeat the bug that Tony has fixed?  I mean a PMEM page might
> be legitimately not-accessible due to it might have been unmapped by
> the host, but dax_recovery_read() has no way to know, right?
> 
> The whole UC <> NP arguments to me seems to be a
>    "UC being harmless/workable solution to DRAM and PMEM"  versus
>    "NP being simpler regardless what risk it brings to PMEM".

That wasn't clear, let me try again -

"reject the if-whole_page-then-set-NP-otherwise-set-UC solution
  on the ground of DRAM can't benefit from UC since to DRAM the UC effect
  is practically the same as the NP effect, but harmless anyway"

versus

"accept set-NP-always regardless whatever risk it brings to PMEM"

> To us, PMEM is not just another driver, it is treated as memory by our
> customer, so why?
> 
> thanks!
> -jane
> 
> 
> 
thanks,
-jane

Reply via email to