On Mon, Feb 25, 2013 at 10:48:47AM +0200, Abel Gordon wrote:
> Stefan Hajnoczi <stefa...@gmail.com> wrote on 21/02/2013 10:11:12 AM:
>
> > From: Stefan Hajnoczi <stefa...@gmail.com>
> > To: Loic Dachary <l...@dachary.org>,
> > Cc: qemu-devel <qemu-devel@nongnu.org>
> > Date: 21/02/2013 10:11 AM
> > Subject: Re: [Qemu-devel] Block I/O optimizations
> > Sent by: qemu-devel-bounces+abelg=il.ibm....@nongnu.org
> >
> > On Mon, Feb 18, 2013 at 7:19 PM, Loic Dachary <l...@dachary.org> wrote:
> > > I recently tried to figure out the best and easiest ways to
> > > increase block I/O performance with qemu. Not being a qemu expert,
> > > I expected to find a few optimization tricks. Much to my surprise,
> > > it appears that there are many significant improvements being worked
> > > on. This is excellent news :-)
> > >
> > > However, I'm not sure I understand how they all fit together. It's
> > > probably quite obvious from the developer point of view, but I would
> > > very much appreciate an overview of how dataplane, vhost-blk, ELVIS,
> > > etc. should be used or developed to maximize I/O performance. Are
> > > there documents I should read? If not, would someone be willing to
> > > share bits of wisdom?
> >
> > Hi Loic,
> > There will be more information on dataplane shortly. I'll write up a
> > blog post and share the link with you.
>
> Hi Stefan,
>
> I assume dataplane could provide a significant performance boost
> and approximate vhost-blk performance. If I understand properly,
> that's because dataplane finally removes the dependency on
> the global mutex and uses eventfd to process notifications.
Right, it's the same approach - ioeventfd for kicks and irqfd for
notifies. The difference is a kernel thread vs a userspace thread.

> However, I am concerned dataplane may not solve the scalability
> problem because QEMU will still be running 1 thread per VCPU and
> 1 per virtual device to handle I/O for each VM. Assuming we run
> N VMs with 1 VCPU and 1 virtual I/O device, we will have 2N threads
> competing for CPU cycles. In a cloud-like environment running I/O
> intensive VMs that could be a problem because the I/O threads and
> VCPU threads may starve each other. Furthermore, the Linux kernel
> can't make good scheduling decisions (from an I/O perspective)
> because it has no information about the content of the I/O queues.

The kernel knows when the dataplane thread is schedulable - when the
ioeventfd is signalled. In the worst case the scheduler could allow the
vcpu thread to complete an entire time slice before letting the
dataplane thread run.

So are you saying that the Linux scheduler wouldn't allow the dataplane
thread to run on a loaded box?

My first thought would be to raise the priority of the dataplane thread
so that it preempts the vcpu thread upon becoming schedulable.

> We did some experiments with a modified vhost-blk back-end that uses
> a single thread (or a few threads) to process I/O for many VMs, as
> opposed to 1 thread per VM (I/O device). These threads decide for
> how long and when to process the requests of each VM based on the
> I/O activity of each queue. We noticed that this model (part of what
> we call ELVIS) significantly improves the scalability of the system
> when you run many I/O intensive guests.

When you say "this model (part of what we call ELVIS) significantly
improves the scalability of the system when you run many I/O intensive
guests", do you mean exit-less vs exit-based, or shared thread vs 1
thread per device (without polling)? I'm not sure whether you're
advocating exit-less (polling) or a shared thread without polling.

> I was wondering if you have considered this type of threading model
> for dataplane as well. With vhost-blk (or -net) it's relatively easy
> to use a kernel thread to process I/O for many VMs (user-space
> processes). However, with a QEMU back-end (like dataplane/virtio-blk)
> the shared thread model may be challenging because it requires a
> shared user-space process (for the I/O threads) to handle I/O for
> many QEMU processes.
>
> Any thoughts/opinions on the shared-thread direction?

For low latency, polling makes sense, and a shared thread is an
efficient way to implement polling. But it throws away resource control
and isolation - you can no longer use cgroups and other standard
resource control mechanisms to manage guests. You also create a
privileged thread that has access to all guests on the host - a
security bug there compromises all guests. That can be fine for private
deployments where guests are trusted; for untrusted guests and public
clouds it seems risky.

Maybe a hybrid approach is possible where notifications are exit-less
but I/O emulation still happens in per-guest userspace threads. I'm not
sure how much performance can be retained by doing that - e.g. a kernel
driver that allows processes to bind an eventfd to a memory
notification area. The kernel driver does the polling in a single
thread and signals the eventfds; userspace threads do the actual I/O
emulation.

Stefan
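To make the thread-priority idea discussed above concrete, here is a
minimal standalone sketch (not QEMU code; dataplane_thread_fn() and its
contents are placeholders) of creating an I/O thread with a SCHED_FIFO
real-time priority so it preempts SCHED_OTHER vcpu threads as soon as it
becomes runnable. Running it requires CAP_SYS_NICE or a non-zero
RLIMIT_RTPRIO, and the same effect can be had externally with chrt on
the thread's TID.

    /* Minimal sketch, not QEMU code: give the I/O thread a real-time
     * priority so it preempts SCHED_OTHER vcpu threads when its
     * ioeventfd becomes readable.  Build with -pthread. */
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>
    #include <string.h>

    static void *dataplane_thread_fn(void *opaque)
    {
        /* ...block on the ioeventfd and process virtqueue requests... */
        return NULL;
    }

    int main(void)
    {
        pthread_t tid;
        pthread_attr_t attr;
        struct sched_param sp;
        int err;

        pthread_attr_init(&attr);
        /* Do not inherit the creator's SCHED_OTHER policy */
        pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
        pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
        memset(&sp, 0, sizeof(sp));
        sp.sched_priority = 1;  /* any RT priority beats SCHED_OTHER threads */
        pthread_attr_setschedparam(&attr, &sp);

        err = pthread_create(&tid, &attr, dataplane_thread_fn, NULL);
        if (err) {
            /* Typically EPERM without CAP_SYS_NICE or RLIMIT_RTPRIO > 0 */
            fprintf(stderr, "pthread_create: %s\n", strerror(err));
            return 1;
        }
        pthread_join(tid, NULL);
        pthread_attr_destroy(&attr);
        return 0;
    }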
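And a rough sketch of the per-guest userspace side of the hybrid idea:
each guest keeps its own emulation thread (so cgroups and isolation
still apply), blocking on an eventfd that the hypothetical kernel
polling driver would signal. Only the eventfd/read loop is existing
API here; the kernel driver and process_virtqueue() are assumptions.

    /* Sketch of the per-guest userspace side of the hybrid approach.
     * The kernel polling driver is hypothetical; process_virtqueue()
     * stands in for the real virtio-blk emulation. */
    #include <pthread.h>
    #include <stdint.h>
    #include <sys/eventfd.h>
    #include <unistd.h>

    static void process_virtqueue(void)
    {
        /* ...I/O emulation for this guest only, inside its own cgroup... */
    }

    static void *guest_io_thread(void *opaque)
    {
        int efd = *(int *)opaque;
        uint64_t n;

        for (;;) {
            /* Blocks until the (hypothetical) kernel polling thread,
             * which watches this guest's notification area, signals
             * the eventfd. */
            if (read(efd, &n, sizeof(n)) == sizeof(n)) {
                process_virtqueue();
            }
        }
        return NULL;
    }

    int main(void)
    {
        int efd = eventfd(0, 0);
        pthread_t tid;

        /* In the hybrid model, efd would be registered with the kernel
         * driver together with the guest's notification memory area. */
        pthread_create(&tid, NULL, guest_io_thread, &efd);
        pthread_join(tid, NULL);
        return 0;
    }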