On Thu, Jun 05, 2025 at 01:28:49PM +0200, Philippe Mathieu-Daudé wrote: > On 5/6/25 10:34, Daniel P. Berrangé wrote: > > On Wed, Jun 04, 2025 at 03:18:43PM -0400, Stefan Hajnoczi wrote: > > > Since commit 7ff9ff039380 ("meson: mitigate against use of uninitialize > > > stack for exploits") the -ftrivial-auto-var-init=zero compiler option is > > > used to zero local variables. While this reduces security risks > > > associated with uninitialized stack data, it introduced a measurable > > > bottleneck in the virtqueue_split_pop() and virtqueue_packed_pop() > > > functions. > > > > > > These virtqueue functions are in the hot path. They are called for each > > > element (request) that is popped from a VIRTIO device's virtqueue. Using > > > __attribute__((uninitialized)) on large stack variables in these > > > functions improves fio randread bs=4k iodepth=64 performance from 304k > > > to 332k IOPS (+9%). > > > > IIUC, the 'hwaddr addr' variable is 8k in size, and the 'struct iovec iov' > > array is 16k in size, so we have 24k on the stack that we're clearing and > > then later writing the real value. Makes sense that this would have a > > perf impact in a hotpath. > > > > > This issue was found using perf-top(1). virtqueue_split_pop() was one of > > > the top CPU consumers and the "annotate" feature showed that the memory > > > zeroing instructions at the beginning of the functions were hot. > > > > When you say you found it with 'perf-top' was that just discovered by > > accident, or was this usage of perf-top in response to users reporting > > a performance degradation vs earlier QEMU ? > > Would it make sense to move these to VirtQueue (since the structure > definition is local anyway)? > > -- >8 -- > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c > index 85110bce374..b96c6ec603c 100644 > --- a/hw/virtio/virtio.c > +++ b/hw/virtio/virtio.c > @@ -153,6 +153,12 @@ struct VirtQueue > EventNotifier host_notifier; > bool host_notifier_enabled; > QLIST_ENTRY(VirtQueue) node; > + > + /* Only used by virtqueue_pop() */ > + struct { > + hwaddr addr[VIRTQUEUE_MAX_SIZE]; > + struct iovec iov[VIRTQUEUE_MAX_SIZE]; > + } pop;
This is an alternative. Using g_alloca() is another alternative. I chose __attribute__((uninitialized)) because it clearly documents the reason why these variables need special treatment. In your patch the "Only used by virtqueue_pop()" comment isn't enough to explain why these variables should be located here. Someone might accidentally move them back into virtqueue_pop() functions in the future if they are unaware of the reason. I'm happy to change approaches based on the pros/cons. Why do you prefer moving the local variables into VirtQueue? > }; > > const char *virtio_device_names[] = { > @@ -1680,8 +1686,8 @@ static void *virtqueue_split_pop(VirtQueue *vq, size_t > sz) > VirtIODevice *vdev = vq->vdev; > VirtQueueElement *elem = NULL; > unsigned out_num, in_num, elem_entries; > - hwaddr addr[VIRTQUEUE_MAX_SIZE]; > - struct iovec iov[VIRTQUEUE_MAX_SIZE]; > + hwaddr *addr = vq->pop.addr; > + struct iovec *iov = vq->pop.iov; > VRingDesc desc; > int rc; > > @@ -1826,8 +1832,8 @@ static void *virtqueue_packed_pop(VirtQueue *vq, > size_t sz) > VirtIODevice *vdev = vq->vdev; > VirtQueueElement *elem = NULL; > unsigned out_num, in_num, elem_entries; > - hwaddr addr[VIRTQUEUE_MAX_SIZE]; > - struct iovec iov[VIRTQUEUE_MAX_SIZE]; > + hwaddr *addr = vq->pop.addr; > + struct iovec *iov = vq->pop.iov; > VRingPackedDesc desc; > uint16_t id; > int rc; > --- > > > > > > > > > Fixes: 7ff9ff039380 ("meson: mitigate against use of uninitialize stack > > > for exploits") > > > Cc: Daniel P. Berrangé <berra...@redhat.com> > > > Signed-off-by: Stefan Hajnoczi <stefa...@redhat.com> > > > --- > > > include/qemu/compiler.h | 12 ++++++++++++ > > > hw/virtio/virtio.c | 8 ++++---- > > > 2 files changed, 16 insertions(+), 4 deletions(-) >
signature.asc
Description: PGP signature