On Wed, 5 Nov 2025 10:40:10 +1000
Gavin Shan <[email protected]> wrote:

> Hi Jonathan and Igor,
> 
> On 11/4/25 10:21 PM, Jonathan Cameron wrote:
> > On Mon, 3 Nov 2025 10:52:16 +0100
> > Igor Mammedov <[email protected]> wrote:
> >   
> 
> [...]
> 
> >> My idea of using a per-CPU source is just speculation, based on the spec,
> >> about how to work around the problem.
> >> I don't really know if the guest OS will be able to handle it (i.e. it
> >> needs to be tested to see if it's viable). That was also probably the
> >> reason, in the previous review, why this series should've waited for
> >> multiple-source support to be merged first.
> > 
> > Per vCPU should work fine but I do like the approach here of reporting
> > all the related errors in one go as they represent the underlying nature
> > of the error granularity tracking. If anyone ever poisons at the 1GiB level
> > on the host they are on their own - so I think that it will only ever be
> > the finest granularity supported (so worst case 64KiB).
> >   
> 
> Well, I don't have strong opinions, but I tend to agree with Jonathan
> to report all 16x errors at once. One reason is the one Jonathan mentioned.
> Another is that a per-vCPU error source is a bit heavyweight for the improvement.
> 
> So I'm going to improve (v2) series to address all received comments and
> post a (v3) series.
> 
> I already have a prototype of a per-vCPU error source, which works fine for
> 64KB-host-4KB-guest. However, it doesn't work for huge pages. For example,
> a problematic 512MB huge page can cause a heavy memory error storm in QEMU,
> which we absolutely can't handle.
> 
> 1. Start the VM with hugetlb pages
> 
> /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \
> -accel kvm -machine virt,gic-version=host,nvdimm=on,ras=on \
> -cpu host -smp maxcpus=8,cpus=8,sockets=2,clusters=2,cores=2,threads=1 \
> -m 4096M,slots=16,maxmem=128G \
> -object memory-backend-file,id=mem0,prealloc=on,mem-path=/dev/hugepages-524288kB,size=4096M \
> -numa node,nodeid=0,cpus=0-7,memdev=mem0 \
> 
> 2. Run 'victim -d' on guest
> 
> guest$ ./victim -d
> physical address of (0xffff889d6000) = 0x11a7da000
> Hit any key to trigger error:
> 
> 3. Inject error from host
> 
> host$ errinjct 0x11a7da000
> 
> 4. QEMU crashes with error message "Bus error (core dumped)", which is
> triggered by the following path.
> 
> sigbus_handler
>    kvm_on_sigbus_vcpu           // have_sigbus_pending = 1
>    sigbus_reraise

To me this sounds like something that should not be happening on the host unless
a real memory error is detected that blows away the whole of / most of a huge
page.
I'm not sure we care about surviving that case if it isn't mapped using
hugetlb/DAX or similar in the guest (so contiguous in both with contained impact
in both).

I assume the issue is backing with hugetlbfs, which doesn't have sub-huge-page
granularity for poison tracking.  I vaguely recall an effort to solve that;
https://lore.kernel.org/linux-mm/[email protected]/
was the first thing Google threw at me. Looks like it got to v2:
https://lore.kernel.org/linux-mm/[email protected]/
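
For a rough sense of scale, here is a minimal sketch of the granularity
arithmetic discussed in this thread. The function name and constants are
purely illustrative (nothing here comes from QEMU or the kernel), and it
assumes every guest page inside the poisoned host unit would get its own
error record:

```python
KIB = 1024
MIB = 1024 * KIB


def guest_pages_per_poison_unit(host_unit_bytes, guest_page_bytes):
    """Number of guest pages covered by one poisoned host tracking unit.

    Hypothetical helper for illustration only: assumes the host unit is a
    whole multiple of the guest page size.
    """
    if host_unit_bytes % guest_page_bytes != 0:
        raise ValueError("host unit must be a multiple of the guest page size")
    return host_unit_bytes // guest_page_bytes


# 64KiB host page backing a 4KiB-page guest: the "16x errors" case.
print(guest_pages_per_poison_unit(64 * KIB, 4 * KIB))

# 512MiB hugetlb page backing a 4KiB-page guest: the error storm case.
print(guest_pages_per_poison_unit(512 * MIB, 4 * KIB))
```

With a 64KiB unit that is 16 records, which is fine to report in one go; with
a 512MiB hugetlb page it is 131072, which is why tracking poison at sub huge
page granularity matters here.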

+CC James.

> 
> Thanks,
> Gavin
> 
> 

