Hi,

On Sat, Jun 4, 2016 at 12:26 AM, Stefan Hajnoczi <stefa...@redhat.com> wrote:
> On Thu, Jun 02, 2016 at 05:19:41PM -0700, Stefan Hajnoczi wrote:
>> On Mon, May 30, 2016 at 06:25:58PM -0700, Stefan Hajnoczi wrote:
>> > v2:
>> >  * Simplify s->rq live migration [Paolo]
>> >  * Use more efficient bitmap ops for batch notification [Paolo]
>> >  * Fix perf regression due to batch notify BH in wrong AioContext
>> >    [Christian]
>> >
>> > The virtio_blk guest driver has supported multiple virtqueues since
>> > Linux 3.17.  This patch series adds multiple virtqueues to QEMU's
>> > virtio-blk emulated device.
>> >
>> > Ming Lei sent patches previously but these were not merged.  This
>> > series implements virtio-blk multiqueue for QEMU from scratch since
>> > the codebase has changed.  Live migration support for s->rq was
>> > also missing from the previous series and has been added.
>> >
>> > It's important to note that QEMU's block layer does not support
>> > multiqueue yet.  Therefore the virtio-blk device processes all
>> > virtqueues in the same AioContext (IOThread).  Further work is
>> > necessary to take advantage of multiqueue support in QEMU's block
>> > layer once it becomes available.
>> >
>> > I will post performance results once they are ready.
>> >
>> > Stefan Hajnoczi (8):
>> >   virtio-blk: use batch notify in non-dataplane case
>> >   virtio-blk: tell dataplane which vq to notify
>> >   virtio-blk: associate request with a virtqueue
>> >   virtio-blk: add VirtIOBlockConf->num_queues
>> >   virtio-blk: multiqueue batch notify
>> >   virtio-blk: live migration of s->rq with multiqueue
>> >   virtio-blk: dataplane multiqueue support
>> >   virtio-blk: add num-queues device property
>> >
>> >  hw/block/dataplane/virtio-blk.c |  68 +++++++++++----------
>> >  hw/block/dataplane/virtio-blk.h |   2 +-
>> >  hw/block/virtio-blk.c           | 129 +++++++++++++++++++++++++++++++++++-----
>> >  include/hw/virtio/virtio-blk.h  |   8 ++-
>> >  4 files changed, 159 insertions(+), 48 deletions(-)
>>
>> There is a significant performance regression due to batch notify:
>>
>> $ ./analyze.py runs/
>> Name                       IOPS         Error
>> unpatched-d6550e9ed2       19269820.2   ± 1.36%
>> unpatched-d6550e9ed2-2     19567358.4   ± 2.42%
>> v2-batch-only-f27ed9a4d9   16252227.2   ± 6.09%
>> v2-no-dataplane            14560225.4   ± 5.16%
>> v2-no-dataplane-2          14622535.6   ± 10.08%
>> v2-no-dataplane-3          13960670.8   ± 7.11%
>>
>> unpatched-d6550e9ed2 is without this patch series.
>> v2-batch-only-f27ed9a4d9 is with Patch 1 only.  v2-no-dataplane is
>> with the full patch series (dataplane is not enabled in any of these
>> tests).
>>
>> Next I will compare unpatched dataplane against patched dataplane.
>> I want to make sure Patch 1 faithfully moved batch notify from
>> dataplane code to generic virtio-blk code without affecting
>> performance.
>>
>> If there is no difference then it means batch notify decreases
>> performance for some workloads (obviously not the same workload that
>> Ming Lei was running).
>
> It turns out that Patch 1 slows down dataplane even though the code
> looks equivalent.  After a lot of poking it turned out to be a subtle
> issue:
>
> The order of BHs in the AioContext->first_bh list affects performance.
> Linux AIO (block/linux-aio.c) invokes completion callbacks from a BH.
> Performance is much better if virtio-blk.c's batch BH is after the
> completion BH.
>
> The "fast" ordering notifies the guest in ~300 nanoseconds after the
> last request completion.
>
> The "slow" ordering sometimes takes 100 microseconds after the last
> request completion before the guest is notified.  It probably depends
> on whether the event loop is kicked by another source.
>
> I'm thinking of scrapping the batch BH and instead using a notify
> plug/unplug callback to suppress notification until the last request
> has been processed.
>
> I also checked that batch notification does indeed improve performance
> compared to no batching.  It offers a nice boost so we do want to port
> the feature from dataplane to non-dataplane.
>
> For the time being: consider this patch series broken due to the
> performance regression.
>
> Stefan
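The BH ordering subtlety is interesting.  To check my understanding,
here is a toy model (not QEMU's actual aio code, all names made up):
BHs run in list order once per event-loop iteration, so if the notify
BH sits ahead of the completion BH in the list, a notification
scheduled by a completion only runs on the *next* iteration -- which
may be far away if nothing else kicks the loop:

  #include <stdbool.h>
  #include <stdio.h>

  struct bh {
      void (*cb)(void);
      bool scheduled;
      struct bh *next;
  };

  static struct bh *first_bh;           /* models AioContext->first_bh */
  static struct bh notify_bh, completion_bh;

  static void run_bhs(void)             /* one event-loop iteration */
  {
      for (struct bh *bh = first_bh; bh; bh = bh->next) {
          if (bh->scheduled) {
              bh->scheduled = false;
              bh->cb();
          }
      }
  }

  static void notify_cb(void)
  {
      puts("notify guest");
  }

  static void completion_cb(void)
  {
      puts("complete request");
      notify_bh.scheduled = true;       /* request a guest notification */
  }

  int main(void)
  {
      /* "Slow" order: notify BH ahead of the completion BH */
      notify_bh.cb = notify_cb;
      completion_bh.cb = completion_cb;
      notify_bh.next = &completion_bh;
      first_bh = &notify_bh;

      completion_bh.scheduled = true;
      run_bhs();  /* loop already passed notify_bh: no notification */
      run_bhs();  /* guest only hears about it one iteration later */
      return 0;
  }

With the opposite ("fast") order both callbacks run in the same
iteration, which would explain the ~300 ns vs ~100 us gap.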
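And if I understand the plug/unplug idea correctly, it would look
roughly like this (again only a sketch, not real QEMU code, names are
hypothetical): completions between plug and unplug only mark their
virtqueue in a bitmap, and unplug issues one notification per marked
queue instead of one per request:

  #include <stdbool.h>
  #include <stdint.h>
  #include <stdio.h>

  struct notifier {
      uint64_t pending;   /* bit i set: queue i has completions to signal */
      bool plugged;       /* batching in progress? */
  };

  /* Stand-in for virtio_notify(): signal the guest for one virtqueue */
  static void guest_notify(unsigned q)
  {
      printf("notify queue %u\n", q);
  }

  static void notify_plug(struct notifier *n)
  {
      n->plugged = true;
  }

  /* Called once per completed request */
  static void request_done(struct notifier *n, unsigned q)
  {
      if (n->plugged) {
          n->pending |= UINT64_C(1) << q;  /* defer, coalesce per queue */
      } else {
          guest_notify(q);                 /* no batch in progress */
      }
  }

  /* Called after the last completion in the batch has been processed */
  static void notify_unplug(struct notifier *n)
  {
      uint64_t pending = n->pending;

      n->pending = 0;
      n->plugged = false;
      while (pending) {
          unsigned q = __builtin_ctzll(pending);
          pending &= pending - 1;
          guest_notify(q);
      }
  }

  int main(void)
  {
      struct notifier n = {0};

      notify_plug(&n);
      request_done(&n, 0);
      request_done(&n, 0);  /* coalesced with the previous one */
      request_done(&n, 2);
      notify_unplug(&n);    /* two notifications, not three */
      return 0;
  }

That would avoid the BH ordering problem entirely since no separate
notify BH is needed.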
Stefan, could you please share your workloads?  I tried with my fio
scripts and did not notice any significant difference.  It would be
interesting to understand the root cause.

--
Roman