Hi, Dan,

On 8/26/2021 12:08 PM, Dan Williams wrote:
On Tue, Jul 6, 2021 at 6:01 PM Dan Williams <[email protected]> wrote:
When poison is discovered and triggers memory_failure(), the physical
page is unmapped from all process address spaces. However, it is not
unmapped from the kernel address space. Unlike a typical memory page that
can be retired from use in the page allocator and marked 'not present',
pmem needs to remain accessible given that it cannot be physically
remapped or retired. set_memory_uc() tries to maintain consistent nominal
memtype mappings for a given pfn, but memory_failure() is an exceptional
condition.

Just as set_memory_np() bypasses memtype checks because they do not
apply in the memory failure case, memtype validation is not applicable
when marking the pmem pfn uncacheable. Use _set_memory_uc().
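
As a rough illustration of the distinction described above, here is a
small userspace model. It is not kernel code: every toy_* name is
invented for this sketch, and the real memtype tracking is more
involved. It only shows why a setter that validates against an existing
write-back reservation fails, while one that skips validation succeeds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: none of these names are kernel APIs. */
enum toy_memtype { TOY_WB, TOY_UC_MINUS };

struct toy_reservation {
	uint64_t start, end;		/* half-open range [start, end) */
	enum toy_memtype type;
	bool in_use;
};

#define TOY_MAX_RESERVATIONS 8
static struct toy_reservation toy_table[TOY_MAX_RESERVATIONS];

/* Attribute of the single kernel mapping this model tracks. */
static enum toy_memtype toy_kernel_mapping = TOY_WB;

/* Record a reservation; fail if an overlapping one has a different type. */
static int toy_reserve(uint64_t start, uint64_t end, enum toy_memtype type)
{
	struct toy_reservation *free_slot = NULL;

	assert(start < end);
	for (size_t i = 0; i < TOY_MAX_RESERVATIONS; i++) {
		struct toy_reservation *r = &toy_table[i];

		if (!r->in_use) {
			if (!free_slot)
				free_slot = r;
			continue;
		}
		/* Overlap with a different type models a memtype conflict. */
		if (start < r->end && r->start < end && r->type != type)
			return -1;
	}
	if (!free_slot)
		return -1;	/* table full */
	*free_slot = (struct toy_reservation){ start, end, type, true };
	return 0;
}

/* Checked variant: validate against reservations, then change the mapping. */
static int toy_set_uc_checked(uint64_t start, uint64_t end)
{
	if (toy_reserve(start, end, TOY_UC_MINUS))
		return -1;
	toy_kernel_mapping = TOY_UC_MINUS;
	return 0;
}

/* Unchecked variant: change the mapping without consulting the table. */
static int toy_set_uc_unchecked(uint64_t start, uint64_t end)
{
	(void)start;
	(void)end;
	toy_kernel_mapping = TOY_UC_MINUS;
	return 0;
}
```

In this model, once a write-back reservation exists for a range, the
checked call on that range returns an error and leaves the mapping
untouched, whereas the unchecked call changes the mapping regardless.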

Reported-by: Jane Chu <[email protected]>
Fixes: 284ce4011ba6 ("x86/memory_failure: Introduce {set,clear}_mce_nospec()")
Cc: Luis Chamberlain <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Tony Luck <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
---
Jane, can you give this a try and see if it cleans up the error you are
seeing?

Thanks for the help.
Jane, does this resolve the failure you reported [1]?

[1]:https://lore.kernel.org/r/[email protected]


Sorry for taking so long. With the patch applied, dmesg now shows

[ 2111.282759] Memory failure: 0x1850600: recovery action for dax page: Recovered
[ 2112.415412] x86/PAT: fsdax_poison_v1:3214 freeing invalid memtype [mem 0x1850600000-0x1850600fff]

instead of the problematic

[10683.426147] x86/PAT: fsdax_poison_v1:5018 conflicting memory types 1850600000-1850601000 uncached-minus<->write-back

Please feel free to add Tested-by: Jane Chu <[email protected]>

Thanks for the fix!

-jane





