On Tue, Nov 18, 2025 at 10:47:55AM +0000, Jonathan Cameron wrote:
> On Thu, 13 Nov 2025 03:25:27 +1000
> Gavin Shan <[email protected]> wrote:
> 
> > In the combination of 64KiB host and 4KiB guest, a problematic host
> > page affects 16x guest pages. Those 16x guest pages are most likely
> > owned by separate threads and accessed by the threads in parallel.
> > It means 16x memory errors can be raised at once. However, we're
> > unable to handle this situation because the only error source has
> > one read acknowledgement register in the current design. QEMU has to
> > crash in the following path because the previously delivered error
> > hasn't been acknowledged by the guest when another delivery is attempted.
> > 
> >   kvm_vcpu_thread_fn
> >     kvm_cpu_exec
> >       kvm_arch_on_sigbus_vcpu
> >         kvm_cpu_synchronize_state
> >         acpi_ghes_memory_errors
> >         abort
> > 
> > This series fixes the issue by sending 16x consecutive CPER errors
> > which are contained in a single GHES error block.
> > 
> > PATCH[1-4] Increases GHES raw data maximal length from 1KiB to 4KiB
> > PATCH[5]   Supports multiple error records in a single error block
> > PATCH[6-7] Improves the error handling in the error delivery path
> > PATCH[8]   Sends 16x consecutive CPERs in a single block if needed
> > 
> 
> Hi Gavin,
> 
> Just a quick heads-up to say we've had some internal discussions around the
> kernel handling of broader address masks in CPER and think it is probably
> broken. Rectifying that may at least simplify what is needed on the QEMU side
> of things and maybe even handle much larger blocks (2M and larger).
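
For reference, the 16x figure in the cover letter follows directly from
the page-size ratio. Below is a minimal sketch of that arithmetic, not
QEMU code: the function name and the example address are hypothetical,
and it assumes the faulting address sits in a region whose base is
aligned to the host page size.

```python
HOST_PAGE_SIZE = 64 * 1024   # 64KiB host page, as in the cover letter
GUEST_PAGE_SIZE = 4 * 1024   # 4KiB guest page, as in the cover letter


def guest_pages_in_host_page(fault_addr,
                             host_page_size=HOST_PAGE_SIZE,
                             guest_page_size=GUEST_PAGE_SIZE):
    """Return the base address of every guest page that shares the
    host page containing fault_addr (hypothetical helper)."""
    if host_page_size % guest_page_size != 0:
        raise ValueError("host page size must be a multiple of guest page size")
    # Round the faulting address down to the start of its host page.
    host_base = fault_addr - (fault_addr % host_page_size)
    count = host_page_size // guest_page_size
    return [host_base + i * guest_page_size for i in range(count)]


if __name__ == "__main__":
    # Illustrative address only.
    for base in guest_pages_in_host_page(0x40001234):
        print(hex(base))
```

Each returned address corresponds to one guest page that can raise its
own memory error when the underlying host page goes bad.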

Btw, I just added logic to rasdaemon to catch SIGBUS errors:
https://github.com/mchehab/rasdaemon/pull/199

But so far, I haven't found a proper way to test that code.

Jonathan/Gavin,

Do you know a good way for us to check how the mm SEA notification
is handled with QEMU?

Regards,
Mauro
