On Thu, Jun 05, 2025 at 09:34:01AM +0100, Daniel P. Berrangé wrote:
> On Wed, Jun 04, 2025 at 03:18:43PM -0400, Stefan Hajnoczi wrote:
> > Since commit 7ff9ff039380 ("meson: mitigate against use of uninitialize
> > stack for exploits") the -ftrivial-auto-var-init=zero compiler option is
> > used to zero local variables. While this reduces security risks
> > associated with uninitialized stack data, it introduced a measurable
> > bottleneck in the virtqueue_split_pop() and virtqueue_packed_pop()
> > functions.
> >
> > These virtqueue functions are in the hot path. They are called for each
> > element (request) that is popped from a VIRTIO device's virtqueue. Using
> > __attribute__((uninitialized)) on large stack variables in these
> > functions improves fio randread bs=4k iodepth=64 performance from 304k
> > to 332k IOPS (+9%).
>
> IIUC, the 'hwaddr addr' variable is 8k in size, and the 'struct iovec iov'
> array is 16k in size, so we have 24k on the stack that we're clearing and
> then later writing the real value. Makes sense that this would have a
> perf impact in a hotpath.
>
> > This issue was found using perf-top(1). virtqueue_split_pop() was one of
> > the top CPU consumers and the "annotate" feature showed that the memory
> > zeroing instructions at the beginning of the functions were hot.
>
> When you say you found it with 'perf-top' was that just discovered by
> accident, or was this usage of perf-top in response to users reporting
> a performance degradation vs earlier QEMU ?

By accident. I was looking for ways to optimize my work-in-progress QEMU io_uring patches and virtqueue_split_pop() stood out in the CPU time profile.
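
For illustration, here is a minimal sketch of the pattern described in the patch. It assumes a build with -ftrivial-auto-var-init=zero and a compiler that knows the "uninitialized" variable attribute (GCC 12+ or Clang). MAX_DESCS, hwaddr_t and fill_descs() are hypothetical stand-ins for VIRTQUEUE_MAX_SIZE, hwaddr and the descriptor walk in virtqueue_split_pop(), not QEMU code. With 1024 entries, the 8-byte address array is 8 KiB and the 16-byte struct iovec array is 16 KiB on LP64, which matches the 24k figure above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Assumption: mirrors VIRTQUEUE_MAX_SIZE (1024) and hwaddr (uint64_t). */
#define MAX_DESCS 1024
typedef uint64_t hwaddr_t;

/*
 * Hypothetical stand-in for the descriptor walk in virtqueue_split_pop().
 * Under -ftrivial-auto-var-init=zero the compiler would zero both arrays
 * (24 KiB on LP64) on every call. The "uninitialized" attribute opts just
 * these two variables out; that is safe here because only the first n
 * entries are ever written, and only those entries are read back.
 */
size_t fill_descs(size_t n, const hwaddr_t *src_addr, const size_t *src_len)
{
    hwaddr_t addr[MAX_DESCS] __attribute__((uninitialized));
    struct iovec iov[MAX_DESCS] __attribute__((uninitialized));
    size_t total = 0;

    assert(n <= MAX_DESCS);

    /* Record guest addresses first, as the real code collects them. */
    for (size_t i = 0; i < n; i++) {
        addr[i] = src_addr[i];
    }

    /* Build the iovec from the written entries only (hypothetical mapping). */
    for (size_t i = 0; i < n; i++) {
        iov[i].iov_base = (void *)(uintptr_t)addr[i];
        iov[i].iov_len = src_len[i];
        total += iov[i].iov_len;
    }
    return total;
}
```

The attribute is applied per variable rather than dropping the global option, so the rest of the function (and the rest of QEMU) keeps the zero-initialization hardening.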