On Fri, Feb 06, 2026 at 12:50:38AM -0600, JAEHOON KIM wrote:
> On 2/3/2026 3:12 PM, Stefan Hajnoczi wrote:
> > On Fri, Jan 23, 2026 at 01:15:04PM -0600, JAEHOON KIM wrote:
> > > On 1/19/2026 12:16 PM, Stefan Hajnoczi wrote:
> > > > On Tue, Jan 13, 2026 at 11:48:21AM -0600, Jaehoon Kim wrote:
> > > > > We evaluated the patches on an s390x host with a single guest using 16
> > > > > virtio block devices backed by FCP multipath devices in a separate-disk
> > > > > setup, with the I/O scheduler set to 'none' in both host and guest.
> > > > >
> > > > > The fio workload included sequential and random read/write with varying
> > > > > numbers of jobs (1, 4, 8, 16) and an io_depth of 8. The tests were
> > > > > conducted with single and dual iothreads, using the newly introduced
> > > > > poll-weight parameter to measure its impact on CPU cost and throughput.
> > > > >
> > > > > Compared to the baseline, across four FIO workload patterns (sequential
> > > > > R/W, random R/W), and averaged over FIO job counts of 1, 4, 8, and 16,
> > > > > throughput decreased slightly (-3% to -8% for one iothread, -2% to -5%
> > > > > for two iothreads), while CPU usage on the s390x host dropped
> > > > > significantly (-10% to -25% and -7% to -12%, respectively).
> > > >
> > > > Hi Jaehoon,
> > > > I would like to run the same fio benchmarks on a local NVMe drive (<10us
> > > > request latency) to see how that type of hardware configuration is
> > > > affected. Are the scripts and fio job files available somewhere?
> > > >
> > > > Thanks,
> > > > Stefan
> > >
> > > Thank you for your reply.
> > > The fio scripts are not available in a location you can access, but there
> > > is nothing particularly special about the settings.
> > > I’m sharing below the methodology and test setup used by our performance
> > > team.
> > >
> > > Guest Setup
> > > -----------
> > > - 12 vCPUs, 4 GiB memory
> > > - 16 virtio disks backed by the FCP multipath devices in the host
> > >
> > > FIO test parameters
> > > -------------------
> > > - FIO version: fio-3.33
> > > - Filesize: 2G
> > > - Blocksize: 8K / 128K
> > > - Direct I/O: 1
> > > - FIO I/O engine: libaio
> > > - NUMJOBS list: 1, 4, 8, 16
> > > - IODEPTH: 8
> > > - Runtime (s): 150
> > >
> > > Two FIO samples for random read
> > > -------------------------------
> > > fio --direct=1 --name=test --numjobs=16 \
> > >     --filename=base.0.0:base.1.0:base.2.0:base.3.0:base.4.0:base.5.0:base.6.0:base.7.0:base.8.0:base.9.0:base.10.0:base.11.0:base.12.0:base.13.0:base.14.0:base.15.0 \
> > >     --size=32G --time_based --runtime=4m --readwrite=randread \
> > >     --ioengine=libaio --iodepth=8 --bs=8k
> > >
> > > fio --direct=1 --name=test --numjobs=4 \
> > >     --filename=subw1/base.0.0:subw4/base.3.0:subw8/base.7.0:subw12/base.11.0:subw16/base.15.0 \
> > >     --size=8G --time_based --runtime=4m --readwrite=randread \
> > >     --ioengine=libaio --iodepth=8 --bs=8k
> > >
> > > Additional notes
> > > ----------------
> > > - Each file is placed on a separate disk device mounted under subw<n>, as
> > >   specified in --filename=...
> > > - We execute one warmup run, then two measurement runs, and calculate the
> > >   average.
> >
> > Hi Jaehoon,
> > I ran fio benchmarks on an Intel Optane SSD DC P4800X Series drive (<10
> > microsecond latency). This is with just one drive.
> >
> > The 8 KiB block size results show something similar to what you
> > reported: there are IOPS (or throughput) regressions alongside CPU
> > utilization improvements.
> >
> > Although the CPU improvements are welcome, I think the default behavior
> > should only be changed if the IOPS regressions can be brought below 5%.
> >
> > The regressions seem to happen regardless of whether 1 or 2 IOThreads
> > are configured.
> > CPU utilization is different (98% vs 78%) depending on
> > the number of IOThreads, so the regressions happen across a range of CPU
> > utilizations.
> >
> > The 128 KiB block size results are not interesting because the drive
> > already saturates at numjobs=1. This is expected since the drive cannot
> > go much above ~2 GiB/s throughput.
> >
> > You can find the Ansible playbook, libvirt domain XML, fio
> > command-lines, and the fio/sar data here:
> >
> > https://gitlab.com/stefanha/virt-playbooks/-/tree/aio-polling-efficiency
> >
> > Please let me know if you'd like me to rerun the benchmark with new
> > patches or a configuration change.
> >
> > Do you want to have a video call to discuss your work and how to get the
> > patches merged?
> >
> > Host
> > ----
> > CPU: Intel Xeon Silver 4214 CPU @ 2.20GHz
> > RAM: 32 GiB
> >
> > Guest
> > -----
> > vCPUs: 8
> > RAM: 4 GiB
> > Disk: 1 virtio-blk aio=native cache=none
> >
> > IOPS
> > ----
> > rw        bs    numjobs iothreads iops    diff
> > randread  8k    1       1         163417  -7.8%
> > randread  8k    1       2         165041  -2.4%
> > randread  8k    4       1         221508  -0.64%
> > randread  8k    4       2         251298  0.008%
> > randread  8k    8       1         222128  -0.51%
> > randread  8k    8       2         249489  -2.6%
> > randread  8k    16      1         230535  -0.18%
> > randread  8k    16      2         246732  -0.22%
> > randread  128k  1       1         17616   -0.11%
> > randread  128k  1       2         17678   0.027%
> > randread  128k  4       1         17536   -0.27%
> > randread  128k  4       2         17610   -0.031%
> > randread  128k  8       1         17369   -0.42%
> > randread  128k  8       2         17433   -0.071%
> > randread  128k  16      1         17215   -0.61%
> > randread  128k  16      2         17269   -0.22%
> > randwrite 8k    1       1         156597  -3.1%
> > randwrite 8k    1       2         157720  -3.8%
> > randwrite 8k    4       1         218448  -0.5%
> > randwrite 8k    4       2         247075  -5.1%
> > randwrite 8k    8       1         220866  -0.75%
> > randwrite 8k    8       2         260935  -0.011%
> > randwrite 8k    16      1         230913  0.23%
> > randwrite 8k    16      2         261125  -0.01%
> > randwrite 128k  1       1         16009   0.094%
> > randwrite 128k  1       2         16070   0.035%
> > randwrite 128k  4       1         16073   -0.62%
> > randwrite 128k  4       2         16131   0.059%
> > randwrite 128k  8       1         16106   0.092%
> > randwrite 128k  8       2         16153   0.048%
> > randwrite 128k  16      1         16102   -0.0091%
> > randwrite 128k  16      2         16160   0.048%
> >
> > IOThread CPU usage
> > ------------------
> > iothreads before after
> > 1         98.7   95.81
> > 2         78.43  66.13
> >
> > Stefan
>
> Hello Stefan,
>
> Thank you very much for your effort in running these benchmarks.
> The results show a pattern very similar to what our performance team
> observed.
>
> I fully agree with the 5% threshold for the default behavior.
> However, we need an approach that balances the current performance-
> oriented polling scheme with CPU efficiency.
>
> I found that relying on the grow/shrink parameters alone was too limited
> to achieve these results. This is why I adjusted the process to use a
> weight-based grow/shrink approach that keeps the polling window robust
> against jitter. Specifically, it avoids abrupt resets to zero by
> shrinking the window gradually, rather than resetting it immediately,
> even when device latency exceeds the threshold.
>
> As seen in both your results and our team's measurements, this may lead
> to a small performance trade-off, but it provides a reasonable balance
> for CPU-sensitive environments.
>
> Thank you for suggesting a video call; I am looking forward to hearing
> your thoughts. I'm on US Central Time. Except for Tuesday, I can adjust
> my schedule to a time that works for you.
>
> Please let me know your preferred time.
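If I understand the weight-based grow/shrink idea correctly, the gradual
shrink could be sketched roughly like this. All names, types, and the
percentage weights below are invented for illustration; they are not the
identifiers used in the actual patches:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical state for an adaptive busy-poll window. */
typedef struct {
    int64_t poll_ns;       /* current busy-poll window in nanoseconds */
    int64_t max_ns;        /* upper bound on the window */
    int grow_weight;       /* percent to grow the window on a poll hit */
    int shrink_weight;     /* percent to shrink the window on a poll miss */
} PollState;

/* An event arrived while polling: widen the window, capped at max_ns. */
static void poll_hit(PollState *s)
{
    s->poll_ns += s->poll_ns * s->grow_weight / 100;
    if (s->poll_ns > s->max_ns) {
        s->poll_ns = s->max_ns;
    }
}

/* Device latency exceeded the window: shrink by a fraction instead of
 * resetting to zero, so a single jittery request only dents the window
 * rather than destroying it. */
static void poll_miss(PollState *s)
{
    s->poll_ns -= s->poll_ns * s->shrink_weight / 100;
}
```

The point of interest is poll_miss(): with a shrink weight of, say, 25%,
a miss reduces the window to three quarters of its size, so several
consecutive misses are needed before polling effectively stops.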
Is Monday, February 16th at 10:00am CST good for you? If not, please feel
free to pick any time on Monday.

Meeting link: https://meet.jit.si/AioPollingOptimization

Anyone else interested in this topic is welcome to join.

Thanks,
Stefan