On Fri, Feb 06, 2026 at 12:50:38AM -0600, JAEHOON KIM wrote:
> On 2/3/2026 3:12 PM, Stefan Hajnoczi wrote:
> > On Fri, Jan 23, 2026 at 01:15:04PM -0600, JAEHOON KIM wrote:
> > > On 1/19/2026 12:16 PM, Stefan Hajnoczi wrote:
> > > > On Tue, Jan 13, 2026 at 11:48:21AM -0600, Jaehoon Kim wrote:
> > > > > We evaluated the patches on an s390x host with a single guest using 16
> > > > > virtio block devices backed by FCP multipath devices in a separate-disk
> > > > > setup, with the I/O scheduler set to 'none' in both host and guest.
> > > > >
> > > > > The fio workload included sequential and random read/write with varying
> > > > > numbers of jobs (1, 4, 8, 16) and an io_depth of 8. The tests were
> > > > > conducted with single and dual iothreads, using the newly introduced
> > > > > poll-weight parameter to measure its impact on CPU cost and throughput.
> > > > >
> > > > > Compared to the baseline, across four FIO workload patterns (sequential
> > > > > R/W, random R/W), and averaged over FIO job counts of 1, 4, 8, and 16,
> > > > > throughput decreased slightly (-3% to -8% for one iothread, -2% to -5%
> > > > > for two iothreads), while CPU usage on the s390x host dropped
> > > > > significantly (-10% to -25% and -7% to -12%, respectively).
> > > >
> > > > Hi Jaehoon,
> > > > I would like to run the same fio benchmarks on a local NVMe drive (<10us
> > > > request latency) to see how that type of hardware configuration is
> > > > affected. Are the scripts and fio job files available somewhere?
> > > >
> > > > Thanks,
> > > > Stefan
> > >
> > > Thank you for your reply.
> > > The fio scripts are not available in a location you can access, but there
> > > is nothing particularly special about the settings.
> > > I’m sharing below the methodology and test setup used by our performance
> > > team.
> > >
> > > Guest Setup
> > > -----------
> > > - 12 vCPUs, 4 GiB memory
> > > - 16 virtio disks backed by the FCP multipath devices in the host
> > >
> > > FIO test parameters
> > > -------------------
> > > - FIO version: fio-3.33
> > > - Filesize: 2G
> > > - Blocksize: 8K / 128K
> > > - Direct I/O: 1
> > > - FIO I/O engine: libaio
> > > - NUMJOBS list: 1, 4, 8, 16
> > > - IODEPTH: 8
> > > - Runtime (s): 150
> > >
> > > Two FIO samples for random read
> > > -------------------------------
> > > fio --direct=1 --name=test --numjobs=16 \
> > >     --filename=base.0.0:base.1.0:base.2.0:base.3.0:base.4.0:base.5.0:base.6.0:base.7.0:base.8.0:base.9.0:base.10.0:base.11.0:base.12.0:base.13.0:base.14.0:base.15.0 \
> > >     --size=32G --time_based --runtime=4m --readwrite=randread \
> > >     --ioengine=libaio --iodepth=8 --bs=8k
> > >
> > > fio --direct=1 --name=test --numjobs=4 \
> > >     --filename=subw1/base.0.0:subw4/base.3.0:subw8/base.7.0:subw12/base.11.0:subw16/base.15.0 \
> > >     --size=8G --time_based --runtime=4m --readwrite=randread \
> > >     --ioengine=libaio --iodepth=8 --bs=8k
> > >
> > > Additional notes
> > > ----------------
> > > - Each file is placed on a separate disk device mounted under subw<n>, as
> > >   specified in --filename=...
> > > - We execute one warmup run, then two measurement runs, and calculate the
> > >   average.
> >
> > Hi Jaehoon,
> > I ran fio benchmarks on an Intel Optane SSD DC P4800X Series drive (<10
> > microsecond latency). This is with just one drive.
> >
> > The 8 KiB block size results show something similar to what you
> > reported: there are IOPS (or throughput) regressions alongside CPU
> > utilization improvements.
> >
> > Although the CPU improvements are welcome, I think the default behavior
> > should only be changed if the IOPS regressions can be brought below 5%.
> >
> > The regressions seem to happen regardless of whether 1 or 2 IOThreads
> > are configured.
> > CPU utilization is different (98% vs 78%) depending on
> > the number of IOThreads, so the regressions happen across a range of CPU
> > utilizations.
> >
> > The 128 KiB block size results are not interesting because the drive
> > already saturates at numjobs=1. This is expected since the drive cannot
> > go much above ~2 GiB/s throughput.
> >
> > You can find the Ansible playbook, libvirt domain XML, fio
> > command-lines, and the fio/sar data here:
> >
> > https://gitlab.com/stefanha/virt-playbooks/-/tree/aio-polling-efficiency
> >
> > Please let me know if you'd like me to rerun the benchmark with new
> > patches or a configuration change.
> >
> > Do you want to have a video call to discuss your work and how to get the
> > patches merged?
> >
> > Host
> > ----
> > CPU: Intel Xeon Silver 4214 CPU @ 2.20GHz
> > RAM: 32 GiB
> >
> > Guest
> > -----
> > vCPUs: 8
> > RAM: 4 GiB
> > Disk: 1 virtio-blk aio=native cache=none
> >
> > IOPS
> > ----
> > rw        bs    numjobs iothreads iops    diff
> > randread  8k    1       1         163417  -7.8%
> > randread  8k    1       2         165041  -2.4%
> > randread  8k    4       1         221508  -0.64%
> > randread  8k    4       2         251298  0.008%
> > randread  8k    8       1         222128  -0.51%
> > randread  8k    8       2         249489  -2.6%
> > randread  8k    16      1         230535  -0.18%
> > randread  8k    16      2         246732  -0.22%
> > randread  128k  1       1         17616   -0.11%
> > randread  128k  1       2         17678   0.027%
> > randread  128k  4       1         17536   -0.27%
> > randread  128k  4       2         17610   -0.031%
> > randread  128k  8       1         17369   -0.42%
> > randread  128k  8       2         17433   -0.071%
> > randread  128k  16      1         17215   -0.61%
> > randread  128k  16      2         17269   -0.22%
> > randwrite 8k    1       1         156597  -3.1%
> > randwrite 8k    1       2         157720  -3.8%
> > randwrite 8k    4       1         218448  -0.5%
> > randwrite 8k    4       2         247075  -5.1%
> > randwrite 8k    8       1         220866  -0.75%
> > randwrite 8k    8       2         260935  -0.011%
> > randwrite 8k    16      1         230913  0.23%
> > randwrite 8k    16      2         261125  -0.01%
> > randwrite 128k  1       1         16009   0.094%
> > randwrite 128k  1       2         16070   0.035%
> > randwrite 128k  4       1         16073   -0.62%
> > randwrite 128k  4       2         16131   0.059%
> > randwrite 128k  8       1         16106   0.092%
> > randwrite 128k  8       2         16153   0.048%
> > randwrite 128k  16      1         16102   -0.0091%
> > randwrite 128k  16      2         16160   0.048%
> >
> > IOThread CPU usage
> > ------------------
> > iothreads before after
> > 1         98.7   95.81
> > 2         78.43  66.13
> >
> > Stefan
>
> Hello Stefan,
>
> Thank you very much for your effort in running these benchmarks.
> The results show a pattern very similar to what our performance team
> observed.
>
> I fully agree with the 5% threshold for the default behavior.
> However, we need an approach that balances the current performance-
> oriented polling scheme with CPU efficiency.
>
> I found that relying on the grow/shrink parameters alone was too limited
> to achieve these results. This is why I adjusted the process to use a
> weight-based grow/shrink approach that keeps the polling window robust
> against jitter. Specifically, it avoids abrupt resets to zero by
> shrinking the window gradually, rather than resetting it immediately,
> even when device latency exceeds the threshold.
>
> As seen in both your results and our team's measurements, this may lead
> to a small performance trade-off, but it provides a reasonable balance
> for CPU-sensitive environments.
>
> Thank you for suggesting a video call; I am looking forward to hearing
> your thoughts. I'm on US Central Time. Except for Tuesday, I can adjust
> my schedule to a time that works for you.
>
> Please let me know your preferred time.
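If I understand the weight-based grow/shrink idea correctly, the gradual
shrink could be sketched roughly like this. All names, types, and the
percentage weights below are invented for illustration; they are not the
identifiers used in the actual patches:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical state for an adaptive busy-poll window. */
typedef struct {
    int64_t poll_ns;       /* current busy-poll window in nanoseconds */
    int64_t max_ns;        /* upper bound on the window */
    int grow_weight;       /* percent to grow the window on a poll hit */
    int shrink_weight;     /* percent to shrink the window on a poll miss */
} PollState;

/* An event arrived while polling: widen the window, capped at max_ns. */
static void poll_hit(PollState *s)
{
    s->poll_ns += s->poll_ns * s->grow_weight / 100;
    if (s->poll_ns > s->max_ns) {
        s->poll_ns = s->max_ns;
    }
}

/* Device latency exceeded the window: shrink by a fraction instead of
 * resetting to zero, so a single jittery request only dents the window
 * rather than destroying it. */
static void poll_miss(PollState *s)
{
    s->poll_ns -= s->poll_ns * s->shrink_weight / 100;
}
```

The point of interest is poll_miss(): with a shrink weight of, say, 25%,
a miss reduces the window to three quarters of its size, so several
consecutive misses are needed before polling effectively stops.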
Is Monday, February 16th at 10:00am CST good for you? If not, please feel
free to pick any time on Monday.

Meeting link: https://meet.jit.si/AioPollingOptimization

Anyone else interested in this topic is welcome to join.

Thanks,
Stefan