On Mon, Dec 17, 2012 at 11:05 AM, Stefan Hajnoczi <stefa...@redhat.com> wrote:
> Note: v8 is a small change, if you have reviewed v7 then the code is almost
> totally unchanged.
>
> This series adds the -device virtio-blk-pci,x-data-plane=on property that
> enables a high performance I/O codepath.  A dedicated thread is used to
> process virtio-blk requests outside the global mutex and without going
> through the QEMU block layer.
>
> Khoa Huynh <k...@us.ibm.com> reported an increase from 140,000 IOPS to
> 600,000 IOPS for a single VM using virtio-blk-data-plane in July:
>
>   http://comments.gmane.org/gmane.comp.emulators.kvm.devel/94580
>
> The virtio-blk-data-plane approach was originally presented at Linux Plumbers
> Conference 2010.  The following slides contain a brief overview:
>
>   http://linuxplumbersconf.org/2010/ocw/system/presentations/651/original/Optimizing_the_QEMU_Storage_Stack.pdf
>
> The basic approach is:
> 1. Each virtio-blk device has a thread dedicated to handling ioeventfd
>    signalling when the guest kicks the virtqueue.
> 2. Requests are processed without going through the QEMU block layer using
>    Linux AIO directly.
> 3. Completion interrupts are injected via irqfd from the dedicated thread.
>
> To try it out:
>
>   qemu -drive if=none,id=drive0,cache=none,aio=native,format=raw,file=...
>        -device virtio-blk-pci,drive=drive0,scsi=off,config-wce=off,x-data-plane=on
>
> Limitations:
>  * Only format=raw is supported
>  * Live migration is not supported
>  * Block jobs, hot unplug, and other operations fail with -EBUSY
>  * I/O throttling limits are ignored
>  * Only Linux hosts are supported due to Linux AIO usage
>
> The code has reached a stage where I feel it is ready to merge.  Users have
> been playing with it for some time and want the significant performance boost.
>
> We are refactoring QEMU to get rid of the global mutex.  I believe that
> virtio-blk-data-plane can eventually become the default mode of operation.
>
> Instead of waiting for global mutex removal efforts to finish, I want to use
> virtio-blk-data-plane as an example device for AioContext and threaded hw
> dispatch refactoring.  This means:
>
> 1. When the block layer can bind to an AioContext and execute I/O outside the
>    global mutex, virtio-blk-data-plane can use this (and gain image format
>    support).
>
> 2. When hw dispatch no longer needs the global mutex we can use hw/virtio.c
>    again and perhaps run a pool of iothreads instead of dedicated data plane
>    threads.
>
> But in the meantime, I have cleaned up the virtio-blk-data-plane code so that
> it can be merged as an experimental feature.
>
> v8:
>  * Fix VIRTIO_BLK_T_GET_ID support - use "in" bufs, not "out" bufs in
>    hw/dataplane/virtio-blk.c
>  * Hostmem -> HostMem rename in hw/dataplane/hostmem.[ch] and
>    hw/dataplane/vring.h [Blue]
>
> v7:
>  * VIRTIO_BLK_T_GET_ID support
>  * Replace lock/condvar with drain operation that stops data plane thread
>    [Michael, Paolo, Laszlo]
>  * Add vring_pop() TODO about crossing memory region boundaries [Michael]
>  * Move #ifdef CONFIG_VIRTIO_BLK_DATA_PLANE to hw/virtio-blk.c [Michael]
>  * Typo s/there is/there is no/ in hostmem.h [Paolo]
>  * Avoid potential integer overflow in hostmem.c [Laszlo]
>  * Retry epoll_wait() on EINTR so gdb works
>
> v6:
>  * Move hw/Makefile.objs dataplane/ inclusion from Patch 4 to Patch 3 [Kevin]
>  * Split discard() with front/back and switch ssize_t to size_t [Michael]
>  * Disable WCE config feature [Michael]
>  * Assert on ioq underflow/overflow, it can never happen [Kevin]
>  * Propagate fdatasync() errors [Kevin]
>  * Remember to init/destroy hostmem mutex
>  * Declare VirtIOBlkConf->data_plane in the right patch so building works
>
> v5:
>  * Omit memory regions with dirty logging enabled from hostmem [Michael]
>  * Add doc comment about quiescing requests across memory hot unplug [Michael]
>  * Clarify which Linux vhost version the vring code originates from [Michael]
>  * Break up indirect vring buffer into 1 hostmem_lookup() per descriptor
>    [Michael]
>  * Barriers in hw/dataplane/vring.c to force fields to be loaded [Michael]
>  * split vring_set_notification() into enable/disable [Paolo]
>  * barriers in vring.c instead of virtio-blk.c [Michael]
>  * move setup code from hw/virtio-blk.c into hw/dataplane/virtio-blk.c
>    [Michael]
>
>  * Note I did not get rid of the mutex+condvar approach to draining requests.
>    I've had good feedback on the performance of the patch series so I'm not
>    worried about eliminating the lock (it's very rarely contended).  Hope
>    Michael and Paolo are okay with this approach.
>
> v4:
>  * Add qemu_iovec_concat_iov() [Paolo]
>  * Use QEMUIOVector to copy out virtio_blk_inhdr [Michael, Paolo]
>
> v3:
>  * Don't assume iovec layout [Michael]
>  * Better naming for hostmem.c MemoryListener callbacks [Don]
>  * More vring quarantining if commands are bogus instead of exiting [Blue]
>
> v2:
>  * Use MemoryListener for thread-safe memory mapping [Paolo, Anthony, and
>    everyone else pointed this out ;-)]
>  * Quarantine invalid vring instead of exiting [Blue]
>  * Replace __u16 kernel types with uint16_t [Blue]
>
> Changes from the RFC v9:
>  * Add x-data-plane=on|off option and coexist with regular virtio-blk code
>  * Create thread from BH so it inherits iothread cpusets
>  * Drain requests on vm_stop() so stopped guest does not access image file
>  * Add migration blocker
>  * Add bdrv_in_use() to prevent block jobs and other operations that can
>    interfere
>  * Drop IOQueue request merging for simplicity
>  * Drop ioctl interrupt injection and always use irqfd for simplicity
>  * Major cleanup to split up source files
>  * Rebase from qemu-kvm.git onto qemu.git
>  * Address Michael Tsirkin's review comments
>
> Stefan Hajnoczi (12):
>   raw-posix: add raw_get_aio_fd() for virtio-blk-data-plane
>   configure: add CONFIG_VIRTIO_BLK_DATA_PLANE
>   dataplane: add host memory mapping code
>   dataplane: add virtqueue vring code
>   dataplane: add event loop
>   dataplane: add Linux AIO request queue
>   iov: add iov_discard_front/back() to remove data
>   test-iov: add iov_discard_front/back() testcases
>   iov: add qemu_iovec_concat_iov()
>   virtio-blk: restore VirtIOBlkConf->config_wce flag
>   dataplane: add virtio-blk data plane code
>   virtio-blk: add x-data-plane=on|off performance feature
>
>  block.h                    |   9 +
>  block/raw-posix.c          |  34 ++++
>  configure                  |  21 ++
>  hw/Makefile.objs           |   2 +-
>  hw/dataplane/Makefile.objs |   3 +
>  hw/dataplane/event-poll.c  | 100 ++++++++++
>  hw/dataplane/event-poll.h  |  40 ++++
>  hw/dataplane/hostmem.c     | 176 +++++++++++++++++
>  hw/dataplane/hostmem.h     |  57 ++++++
>  hw/dataplane/ioq.c         | 117 ++++++++++++
>  hw/dataplane/ioq.h         |  57 ++++++
>  hw/dataplane/virtio-blk.c  | 465 +++++++++++++++++++++++++++++++++++++++++++++
>  hw/dataplane/virtio-blk.h  |  29 +++
>  hw/dataplane/vring.c       | 362 +++++++++++++++++++++++++++++++++++
>  hw/dataplane/vring.h       |  63 ++++++
>  hw/virtio-blk.c            |  47 ++++-
>  hw/virtio-blk.h            |   5 +-
>  hw/virtio-pci.c            |   4 +
>  iov.c                      |  90 +++++++--
>  iov.h                      |  13 ++
>  qemu-common.h              |   3 +
>  tests/test-iov.c           | 150 +++++++++++++++
>  trace-events               |   9 +
>  23 files changed, 1840 insertions(+), 16 deletions(-)
>  create mode 100644 hw/dataplane/Makefile.objs
>  create mode 100644 hw/dataplane/event-poll.c
>  create mode 100644 hw/dataplane/event-poll.h
>  create mode 100644 hw/dataplane/hostmem.c
>  create mode 100644 hw/dataplane/hostmem.h
>  create mode 100644 hw/dataplane/ioq.c
>  create mode 100644 hw/dataplane/ioq.h
>  create mode 100644 hw/dataplane/virtio-blk.c
>  create mode 100644 hw/dataplane/virtio-blk.h
>  create mode 100644 hw/dataplane/vring.c
>  create mode 100644 hw/dataplane/vring.h
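
For readers who have not looked at the iov patches, the idea behind
iov_discard_front() in the patch list can be sketched roughly as follows.
This is not the code from the series: the name sketch_iov_discard_front and
its exact semantics (advance the array past whole elements, trim the first
remaining one, return the bytes actually dropped) are assumptions made for
illustration only.

```c
#include <stddef.h>
#include <sys/uio.h>

/*
 * Hypothetical helper, NOT the function from the patch series.
 * Drops up to `bytes` bytes from the front of an iovec array by
 * advancing *iov and decrementing *iov_cnt.  Returns the number of
 * bytes actually discarded, which is smaller than `bytes` when the
 * array holds less data than requested.
 */
static size_t sketch_iov_discard_front(struct iovec **iov,
                                       unsigned int *iov_cnt,
                                       size_t bytes)
{
    struct iovec *cur = *iov;
    size_t total = 0;

    while (*iov_cnt > 0) {
        if (cur->iov_len > bytes) {
            /* Partial element: trim its head and stop. */
            cur->iov_base = (char *)cur->iov_base + bytes;
            cur->iov_len -= bytes;
            total += bytes;
            break;
        }
        /* Whole element consumed: skip over it. */
        bytes -= cur->iov_len;
        total += cur->iov_len;
        cur++;
        (*iov_cnt)--;
    }

    *iov = cur;
    return total;
}
```

A caller could use something like this to strip a fixed-size request header
from the front of a guest-supplied buffer list without assuming the header
sits in its own iovec element, which is the concern behind the v3 "Don't
assume iovec layout" item.
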
Merged into my block tree:
https://github.com/stefanha/qemu/commits/block

Stefan