Hi Jonathan and Igor,

On 11/4/25 10:21 PM, Jonathan Cameron wrote:
On Mon, 3 Nov 2025 10:52:16 +0100
Igor Mammedov <[email protected]> wrote:


[...]

My idea of using per-CPU sources is just speculation, based on the spec,
on how to work around the problem.
I don't really know if the guest OS will be able to handle it (i.e., it
needs to be tested to see if it's viable). That was probably also the
reason, in the previous review, why this series should've waited for
multiple-sources support to be merged first.

Per vCPU should work fine but I do like the approach here of reporting
all the related errors in one go as they represent the underlying nature
of the error granularity tracking. If anyone ever poisons at the 1GiB level
on the host they are on their own - so I think that it will only ever be
the finest granularity supported (so worst case 64KiB).


Well, I don't have strong opinions, but I tend to agree with Jonathan
to report all 16x errors at once. One reason is the one Jonathan mentioned.
Another reason is that a per-vCPU error source is a bit heavy for the
improvement.

So I'm going to improve the (v2) series to address all received comments
and post a (v3) series.

I already had a prototype of a per-vCPU error source, which works fine for
a 64KB-page host with a 4KB-page guest. However, it doesn't work for huge
pages. For example, a problematic 512MB huge page can cause a heavy memory
error storm in QEMU, which we absolutely can't handle.
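
To put numbers on the two cases above, here is a small Python sketch of how
many guest pages a single poisoned host page covers. The helper names are
made up for illustration, and it assumes the host page backs a guest-physical
range aligned to the host page size, which may not hold for every layout.

```python
KIB = 1024
MIB = 1024 * KIB


def guest_pages_per_host_page(host_page_size, guest_page_size):
    """Number of guest pages covered by one poisoned host page."""
    if host_page_size % guest_page_size:
        raise ValueError("host page size must be a multiple of the guest page size")
    return host_page_size // guest_page_size


def affected_guest_pages(fault_gpa, host_page_size, guest_page_size):
    """Guest page addresses inside the host page that contains fault_gpa.

    Assumption (not from the thread): the host page backs a guest-physical
    range that is aligned to host_page_size.
    """
    base = fault_gpa & ~(host_page_size - 1)
    count = guest_pages_per_host_page(host_page_size, guest_page_size)
    return [base + i * guest_page_size for i in range(count)]


print(guest_pages_per_host_page(64 * KIB, 4 * KIB))    # 16
print(guest_pages_per_host_page(512 * MIB, 4 * KIB))   # 131072
print(hex(affected_guest_pages(0x11a7da000, 64 * KIB, 4 * KIB)[0]))  # 0x11a7d0000
```

So the 64KB case is the "16x" mentioned above, while one 512MB huge page
spans 131072 guest pages, which gives a sense of why the huge page case
behaves so differently.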

1. Start the VM with hugetlb pages

/home/gavin/sandbox/qemu.main/build/qemu-system-aarch64                \
-accel kvm -machine virt,gic-version=host,nvdimm=on,ras=on             \
-cpu host -smp maxcpus=8,cpus=8,sockets=2,clusters=2,cores=2,threads=1 \
-m 4096M,slots=16,maxmem=128G                                          \
-object memory-backend-file,id=mem0,prealloc=on,mem-path=/dev/hugepages-524288kB,size=4096M \
-numa node,nodeid=0,cpus=0-7,memdev=mem0                               \

2. Run 'victim -d' on guest

guest$ ./victim -d
physical address of (0xffff889d6000) = 0x11a7da000
Hit any key to trigger error:

3. Inject error from host

host$ errinjct 0x11a7da000

4. QEMU crashes with the error message "Bus error (core dumped)", which is
triggered via the following path.

sigbus_handler
  kvm_on_sigbus_vcpu           // have_sigbus_pending = 1
  sigbus_reraise
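
The path above can be pictured with a toy Python model. This is only a
sketch of the behaviour the annotation suggests, not QEMU's actual code: it
assumes there is a single pending-SIGBUS slot and that kvm_on_sigbus_vcpu
reports failure when that slot is already occupied. The class, the
deliver_pending method, and the return strings are hypothetical.

```python
class SigbusModel:
    """Toy model: one pending-SIGBUS slot (assumption, see lead-in)."""

    def __init__(self):
        self.have_sigbus_pending = False
        self.pending_addr = None

    def kvm_on_sigbus_vcpu(self, addr):
        """Return 0 if the error was queued, 1 if one is already pending."""
        if self.have_sigbus_pending:
            return 1
        self.have_sigbus_pending = True
        self.pending_addr = addr
        return 0

    def deliver_pending(self):
        """Hypothetical step: the vCPU loop consumes the queued error."""
        addr = self.pending_addr
        self.have_sigbus_pending = False
        self.pending_addr = None
        return addr


def sigbus_handler(model, addr):
    """Mirror of the call path: queue the error, or re-raise on failure."""
    if model.kvm_on_sigbus_vcpu(addr):
        return "reraise"  # stands in for sigbus_reraise(), i.e. the crash
    return "queued"


model = SigbusModel()
print(sigbus_handler(model, 0x11a7da000))  # queued
print(sigbus_handler(model, 0x11a7db000))  # reraise
```

Under these assumptions, a second error that arrives before the first one
has been delivered ends up in sigbus_reraise, which would match the
"Bus error (core dumped)" seen during an error storm.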

Thanks,
Gavin

