Thanks Stefan. Couldn't get to this earlier. I did another run and took a diff of /proc/interrupts inside the guest before and after the run. It shows all of the interrupts for 'virtio7-req.0' going to CPU1, which I guess explains the "CPU 1/KVM" vCPU utilization on the host.
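(For reference, the before/after snapshot was taken with something along these lines inside the guest; IRQ 34 is the one shown in the diff excerpt just below, and the smp_affinity read is simply the standard procfs way to see which vCPUs are allowed to service that interrupt:)

  cat /proc/interrupts > /tmp/irq.before
  # ... run the fio workload (command line further down) ...
  cat /proc/interrupts > /tmp/irq.after
  diff /tmp/irq.before /tmp/irq.after

  # CPU mask of vCPUs that may service the virtio7-req.0 interrupt (IRQ 34)
  cat /proc/irq/34/smp_affinity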
 34:        147     666085          0          0   PCI-MSI-edge   virtio7-req.0

The only remaining question is the high CPU utilization of the vCPU threads for this workload. Even when I run a light fio workload (queue depth of 1, which gives 8K IOPS), the vCPU threads are close to 100% utilization. Why is it so high, and does it have an impact on guest code that could be executing on the same CPU?

fio command line:

fio --time_based --ioengine=libaio --randrepeat=1 --direct=1 --invalidate=1 --verify=0 --offset=0 --verify_fatal=0 --group_reporting --numjobs=1 --name=randread --rw=randread --blocksize=8K --iodepth=1 --runtime=60 --filename=/dev/vdb

qemu command line:

qemu-system-x86_64 -L /usr/share/seabios/ -enable-kvm -name node1,debug-threads=on -name node1 -S -machine pc-i440fx-2.8,accel=kvm,usb=off -cpu SandyBridge -m 7680 -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -object iothread,id=iothread1 -object iothread,id=iothread2 -object iothread,id=iothread3 -object iothread,id=iothread4 -uuid XX -nographic -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/node1fs.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device lsi,id=scsi0,bus=pci.0,addr=0x6 -device virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x7 -device virtio-scsi-pci,id=scsi2,bus=pci.0,addr=0x8 -drive file=rhel7.qcow2,if=none,id=drive-virtio-disk0,format=qcow2 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/dev/sdc,if=none,id=drive-virtio-disk1,format=raw,cache=none,aio=native -device virtio-blk-pci,iothread=iothread1,scsi=off,bus=pci.0,addr=0x17,drive=drive-virtio-disk1,id=virtio-disk1 -drive file=/dev/sdc,if=none,id=drive-scsi1-0-0-0,format=raw,cache=none,aio=native -device scsi-hd,bus=scsi1.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi1-0-0-0,id=scsi1-0-0-0 -netdev tap,fd=24,id=hostnet0,vhost=on,vhostfd=25 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=XXX,bus=pci.0,addr=0x2 -netdev tap,fd=26,id=hostnet1,vhost=on,vhostfd=27 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=YYY,bus=pci.0,multifunction=on,addr=0x15 -netdev tap,fd=28,id=hostnet2,vhost=on,vhostfd=29 -device virtio-net-pci,netdev=hostnet2,id=net2,mac=ZZZ,bus=pci.0,multifunction=on,addr=0x16 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 -msg timestamp=on

# qemu-system-x86_64 --version
QEMU emulator version 2.8.0(Debian 1:2.8+dfsg-3~bpo8+1)
Copyright (c) 2003-2016 Fabrice Bellard and the QEMU Project developers

Note that I had the same host block device (/dev/sdc in this case) exposed to the guest VM over both virtio-scsi and virtio-blk for perf comparisons. I see poor performance for 8K random reads inside the guest over both virtio-scsi and virtio-blk, compared to the host performance. Let me open another thread for that problem, but let me know if something obvious pops up based on the qemu command line.

~Padhu.
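P.S. On the vCPU utilization question: in case it helps the discussion, here is a rough sketch of host-side commands that should break down where those ~100% threads spend their time (the QEMU pid is a placeholder; the per-thread names come from debug-threads=on):

  # per-thread CPU usage of the QEMU process, sampled every second
  pidstat -t -p <qemu-pid> 1

  # count KVM exit reasons over ~10 seconds (e.g. HLT vs PIO/MSR exits)
  perf kvm stat record -p <qemu-pid> -- sleep 10
  perf kvm stat report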
-----Original Message-----
From: Stefan Hajnoczi [mailto:stefa...@gmail.com]
Sent: Tuesday, July 11, 2017 5:19 AM
To: Nagarajan, Padhu (HPE Storage) <pa...@hpe.com>
Cc: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] Disparity between host and guest CPU utilization during disk IO benchmark

On Mon, Jul 10, 2017 at 05:27:15PM +0000, Nagarajan, Padhu (HPE Storage) wrote:
> Posted this in qemu-discuss and did not get a response. Hoping that someone
> here might be able to offer insights.
>
> I was running an 8K random-read fio benchmark inside the guest with
> iodepth=32. The device used inside the guest for the test was a virtio-blk
> device with iothread enabled, mapped on to a raw block device on the host.
> While this workload was running, I took a snapshot of the CPU utilization
> reported by the host and the guest. The guest had 4 cores. top inside guest
> shows 3 idle cores and one core being 74% utilized by fio (active on core 3).
> The host had 12 cores and three cores were completely consumed by three qemu
> threads. top inside host shows three qemu threads, each utilizing the CPU
> core to a near 100%. These threads are "CPU 1/KVM", "CPU 3/KVM" and "IO
> iothread1". The CPU utilization story on the host side is the same, even if I
> run a light fio workload inside the guest (for ex. iodepth=1).
>
> Why do I see two "CPU/KVM" threads occupying 100% CPU, even though only one
> core inside the guest is being utilized ? Note that I had 'accel=kvm' turned
> on for the guest.

fio might be submitting I/O requests on one vcpu and the completion interrupts are processed on another vcpu.

To discuss further, please post:

1. Full QEMU command-line
2. Full fio command-line and job file (if applicable)
3. Output of cat /proc/interrupts inside the guest after running the benchmark