On 2024/2/10 14:46, Dan Williams wrote:
Shiyang Ruan wrote:
If poison is detected (reported from cxl memdev), the OS should be notified to
handle it.  Introduce this function:
   1. translate DPA to HPA;
   2. construct a MCE instance; (TODO: more details need to be filled)
   3. log it into MCE event queue;

After that, MCE mechanism can walk over its notifier chain to execute
specific handlers.

Signed-off-by: Shiyang Ruan <ruansy.f...@fujitsu.com>
---
  arch/x86/kernel/cpu/mce/core.c |  1 +
  drivers/cxl/core/mbox.c        | 33 +++++++++++++++++++++++++++++++++
  2 files changed, 34 insertions(+)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index bc39252bc54f..a64c0aceb7e0 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -131,6 +131,7 @@ void mce_setup(struct mce *m)
        m->ppin = cpu_data(m->extcpu).ppin;
        m->microcode = boot_cpu_data.microcode;
  }
+EXPORT_SYMBOL_GPL(mce_setup);

No, mce_setup() is x86 specific and the CXL subsystem is CPU
architecture independent. My expectation is that CXL should translate
errors for edac similar to how the ACPI GHES code does it. See usage of
edac_raw_mc_handle_error() and memory_failure_queue().

Otherwise an MCE is a CPU consumption of poison event, and CXL is
reporting device-side discovery of poison.

Yes, I misunderstood this. I meant to use MCE as a way to eventually call memory_failure(). I think memory_failure_queue() is what I need.

  void memory_failure_queue(unsigned long pfn, int flags)

However, it can only queue one PFN at a time, so we may need to extend it to support queuing a range of PFNs.
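
Until such an interface exists, one possible approach is to loop over the pages covered by a poisoned range and queue them one by one. The following is only a rough userspace sketch of that idea, not kernel code: memory_failure_queue() here is a stand-in that just records its arguments, queue_poison_range() is a hypothetical helper name, and the 4 KiB page size is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12		/* assumption: 4 KiB pages */
#define MAX_QUEUED 64		/* capacity of the mock queue below */

/* Mock state: records which PFNs were queued, for illustration only. */
unsigned long queued_pfns[MAX_QUEUED];
size_t nr_queued;

/* Userspace stand-in with the same signature as the kernel function. */
void memory_failure_queue(unsigned long pfn, int flags)
{
	(void)flags;
	if (nr_queued < MAX_QUEUED)
		queued_pfns[nr_queued++] = pfn;
}

/*
 * Hypothetical helper: queue every page touched by the host physical
 * address range [hpa, hpa + len). The range does not need to be page
 * aligned; partially covered pages at either end are included.
 * Address overflow of hpa + len is not handled in this sketch.
 */
void queue_poison_range(uint64_t hpa, uint64_t len, int flags)
{
	unsigned long pfn, first, last;

	if (len == 0)
		return;

	first = (unsigned long)(hpa >> PAGE_SHIFT);
	last = (unsigned long)((hpa + len - 1) >> PAGE_SHIFT);

	for (pfn = first; pfn <= last; pfn++)
		memory_failure_queue(pfn, flags);
}
```

For example, a 128-byte poisoned range starting at HPA 0x1fc0 crosses a page boundary, so queue_poison_range(0x1fc0, 0x80, 0) would queue PFN 1 and PFN 2. A real range-aware interface could avoid the per-page calls, but that design is still open.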


--
Thanks,
Ruan.
