[ add Tony ] Hi Jane,
I was out for a few days... On Tue, Jun 22, 2021 at 10:22 AM Jane Chu <[email protected]> wrote: > > > On 6/21/2021 5:15 PM, Luis Chamberlain wrote: > > On Tue, Jun 15, 2021 at 11:55:19AM -0700, Jane Chu wrote: > >> Hi, Dan, > >> > >> It appears the fact pmem region is of WB memtype renders set_memory_uc() > >> > >> to fail due to inconsistency between the requested memtype(UC-) and the > >> cached > >> > >> memtype(WB). > >> > >> # cat /proc/iomem |grep -A 8 -i persist > >> 1840000000-d53fffffff : Persistent Memory > >> 1840000000-1c3fffffff : namespace0.0 > >> 5740000000-76bfffffff : namespace2.0 > >> > >> # cat /sys/kernel/debug/x86/pat_memtype_list > >> PAT memtype list: > >> PAT: [mem 0x0000001840000000-0x0000001c40000000] write-back > >> PAT: [mem 0x0000005740000000-0x00000076c0000000] write-back > >> > >> [10683.418072] Memory failure: 0x1850600: recovery action for dax page: > >> Recovered > >> [10683.426147] x86/PAT: fsdax_poison_v1:5018 conflicting memory types > >> 1850600000-1850601000 uncached-minus<->write-back > >> > >> cscope search shows that unlike pmem, set_memory_uc() is primarily used by > >> drivers dealing with ioremap(), > > > > Yes, when a driver *knows* the type must follow certain rules, it > > requests it. > > > >> perhaps the pmem case needs another way to suppress prefetch? > >> > >> Your thoughts? > > > > The way to think about this problem is, who did the ioremap call for the > > ioremap'd area? That's the driver that needs changing. > > I'm not sure if the pmem driver is doing something wrong, but rather the > pmem memory failure handler. This is a fixup that got lost in the shuffle. The issue is that the memtype tracking needs to be updated before the set_memory_uc(), or the memtype tracking needs to be bypassed. You'll notice that this commit: 284ce4011ba6 x86/memory_failure: Introduce {set, clear}_mce_nospec() ...effectively replaced set_memory_np() with set_memory_uc(). However, set_memory_np() does not check memtype and set_memory_uc() does. I believe it should be sufficient to do the following, i.e. skip the memtype sanity checking because the memory-failure code should have already shot down all the conflicting WB mappings: diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_memory.h index 43fa081a1adb..0bf2274c5186 100644 --- a/arch/x86/include/asm/set_memory.h +++ b/arch/x86/include/asm/set_memory.h @@ -114,8 +114,13 @@ static inline int set_mce_nospec(unsigned long pfn, bool unmap) if (unmap) rc = set_memory_np(decoy_addr, 1); - else - rc = set_memory_uc(decoy_addr, 1); + else { + /* + * Bypass memtype checks since memory-failure has shot + * down mappings. + */ + rc = _set_memory_uc(decoy_addr, 1); + } if (rc) pr_warn("Could not invalidate pfn=0x%lx from 1:1 map\n", pfn); return rc;
