On Thu, May 8, 2014 at 1:33 PM, Fam Zheng <f...@redhat.com> wrote:
> On Thu, 05/08 12:16, Stefan Hajnoczi wrote:
>> Here is background on the latest dataplane work in my "[PATCH v2 00/25]
>> dataplane: use QEMU block layer" series.  It's necessary background for
>> anyone who wants to build on top of it.  Please leave feedback or
>> questions and I'll submit a docs/ patch with the final version of this
>> document.
>>
>> This document explains the IOThread feature and how to write code that
>> runs outside the QEMU global mutex.
>>
>> The main loop and IOThreads
>> ---------------------------
>> QEMU is an event-driven program that can do several things at once
>> using an event loop.  The VNC server and the QMP monitor are both
>> processed from the same event loop, which monitors their file
>> descriptors and invokes a callback when they become readable.
>>
>> The default event loop is called the main loop (see main-loop.c).  It
>> is possible to create additional event loop threads using -object
>> iothread,id=my-iothread.
>
> Is dataplane the only user for this now?

Yes, and neither dataplane (x-data-plane=on) nor IOThread (-object
iothread,id=<name>) is finalized.

There was a discussion about -object and QOM on the mailing list a while
back.  We reached the conclusion that -object shouldn't be a supported
command-line interface; it should only be used for testing, development,
etc.  So an -iothread option still needs to be added.
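For reference, here is roughly what it looks like today.  This assumes
your build has the query-iothreads QMP command; the thread-id value
below is made up for illustration:

  $ qemu-system-x86_64 ... -object iothread,id=iothread0

  -> { "execute": "query-iothreads" }
  <- { "return": [ { "id": "iothread0", "thread-id": 1234 } ] }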
>>
>> Side note: The main loop and IOThread are both event loops but their
>> code is not shared completely.  Sometimes it is useful to remember
>> that although they are conceptually similar they are currently not
>> interchangeable.
>>
>> Why IOThreads are useful
>> ------------------------
>> IOThreads allow the user to control the placement of work.  The main
>> loop is a scalability bottleneck on hosts with many CPUs.  Work can be
>> spread across several IOThreads instead of just one main loop.  When
>> set up correctly this can improve I/O latency and reduce jitter seen
>> by the guest.
>>
>> The main loop is also deeply associated with the QEMU global mutex,
>> which is a scalability bottleneck in itself.  vCPU threads and the
>> main loop use the QEMU global mutex to serialize execution of QEMU
>> code.  This mutex is necessary because a lot of QEMU's code
>> historically was not thread-safe.
>>
>> The fact that all I/O processing is done in a single main loop and
>> that the QEMU global mutex is contended by all vCPU threads and the
>> main loop explains why it is desirable to place work into IOThreads.
>>
>> The experimental virtio-blk data-plane implementation has been
>> benchmarked and shows these effects:
>> ftp://public.dhe.ibm.com/linux/pdfs/KVM_Virtualized_IO_Performance_Paper.pdf
>>
>> How to program for IOThreads
>> ----------------------------
>> The main difference between legacy code and new code that can run in
>> an IOThread is dealing explicitly with the event loop object,
>> AioContext (see include/block/aio.h).  Code that only works in the
>> main loop implicitly uses the main loop's AioContext.  Code that
>> supports running in IOThreads must be aware of its AioContext.
>>
>> AioContext supports the following services:
>> * File descriptor monitoring (read/write/error)
>> * Event notifiers (inter-thread signalling)
>> * Timers
>> * Bottom Halves (BHs) - deferred callbacks
>>
>> There are several old APIs that use the main loop AioContext:
>> * LEGACY qemu_aio_set_fd_handler() - monitor a file descriptor
>> * LEGACY qemu_aio_set_event_notifier() - monitor an event notifier
>> * LEGACY timer_new_ms() - create a timer
>> * LEGACY qemu_bh_new() - create a BH
>> * LEGACY qemu_aio_wait() - run an event loop iteration
>>
>> Since they implicitly work on the main loop they cannot be used in
>> code that runs in an IOThread.  They might cause a crash or deadlock
>> if called from an IOThread since the QEMU global mutex is not held.
>>
>> Instead, use the AioContext functions directly (see
>> include/block/aio.h):
>> * aio_set_fd_handler() - monitor a file descriptor
>> * aio_set_event_notifier() - monitor an event notifier
>> * aio_timer_new() - create a timer
>> * aio_bh_new() - create a BH
>> * aio_poll() - run an event loop iteration
>>
>> The AioContext can be obtained from the IOThread using
>> iothread_get_aio_context() or for the main loop using
>> qemu_get_aio_context().  This way your code works in both IOThreads
>> and the main loop.
>
> I think such code knows about its IOThread, so
> iothread_get_aio_context() is enough.  Why do we need to mention
> qemu_get_aio_context() here?

I want to encourage people to write code that works in *any* AioContext,
not just IOThreads and not just the main loop.

When someone has written code that works in IOThreads it will probably
look like this:

  void do_my_thing(MyObject *obj, AioContext *aio_context);

If you want to call do_my_thing() from the main loop you need to know
about qemu_get_aio_context() so you can call it:

  do_my_thing(obj, qemu_get_aio_context());
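To make that concrete, here is a minimal sketch of AioContext-agnostic
code.  MyObject's layout, my_timer_cb(), and the 100 ms interval are
made up for illustration; the aio_*() and timer calls are the ones from
include/block/aio.h and include/qemu/timer.h:

  #include "block/aio.h"
  #include "qemu/timer.h"

  typedef struct MyObject {
      QEMUTimer *timer;
  } MyObject;

  /* Invoked by whichever thread runs the AioContext's event loop */
  static void my_timer_cb(void *opaque)
  {
      MyObject *obj = opaque;

      /* Re-arm so the timer fires again 100 ms from now */
      timer_mod(obj->timer,
                qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + 100 * SCALE_MS);
  }

  /* Works in any AioContext because it uses aio_timer_new() instead of
   * the main-loop-only timer_new_ms()
   */
  void do_my_thing(MyObject *obj, AioContext *aio_context)
  {
      obj->timer = aio_timer_new(aio_context, QEMU_CLOCK_REALTIME,
                                 SCALE_NS, my_timer_cb, obj);
      timer_mod(obj->timer,
                qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + 100 * SCALE_MS);
  }

The caller decides where the work happens by passing either
iothread_get_aio_context(iothread) or qemu_get_aio_context().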
>>
>> How to synchronize with an IOThread
>> -----------------------------------
>> AioContext is not thread-safe so some rules must be followed when
>> using file descriptors, event notifiers, timers, or BHs across
>> threads:
>>
>> 1. AioContext functions can be called safely from file descriptor,
>> event notifier, timer, or BH callbacks invoked by the AioContext.  No
>> locking is necessary.
>>
>> 2. Other threads wishing to access the AioContext must use
>> aio_context_acquire()/aio_context_release() for mutual exclusion.
>> Once the context is acquired no other thread can access it or run
>> event loop iterations in this AioContext.
>>
>> aio_context_acquire()/aio_context_release() calls may be nested.  This
>> means you can call them if you're not sure whether #1 applies.
>>
>> Side note: the best way to schedule a function call across threads is
>> to create a BH in the target AioContext beforehand and then call
>> qemu_bh_schedule().  No acquire/release or locking is needed for the
>> qemu_bh_schedule() call.  But be sure to acquire the AioContext for
>> aio_bh_new() if necessary.
>>
>> The relationship between AioContext and the block layer
>> -------------------------------------------------------
>> The AioContext originates from the QEMU block layer because it
>> provides a scoped way of running event loop iterations until all work
>> is done.  This feature is used to complete all in-flight block I/O
>> requests (see bdrv_drain_all()).  Nowadays AioContext is a generic
>> event loop that can be used by any QEMU subsystem.
>
> There was a concern about lock ordering.  Currently we only acquire
> contexts from the main loop and vCPU threads, so we're safe.  Do we
> enforce this rule?  If we use this reentrant lock in other parts of
> QEMU, what are the rules then?

AioContext acquire/release is a reentrant lock (RFifoLock).  This is
useful since it makes it easier to write composable code that doesn't
deadlock if called inside a context that already has the AioContext
acquired.

Regarding lock ordering, there is currently no reason for IOThread code
(virtio-blk data-plane) to acquire another AioContext.  It only needs
its own BlockDriverState's AioContext.  That's why we don't need to
worry about lock ordering problems - only the main loop will acquire
another AioContext.

I will add a note about lock ordering.

>> The block layer has support for AioContext integrated.  Each
>> BlockDriverState is associated with an AioContext using
>> bdrv_set_aio_context() and bdrv_get_aio_context().  This allows block
>> layer code to process I/O inside the right AioContext.  Other
>> subsystems may wish to follow a similar approach.
>>
>> If main loop code such as a QMP function wishes to access a
>> BlockDriverState it must first call
>> aio_context_acquire(bdrv_get_aio_context(bs)) to ensure the IOThread
>> does not run in parallel.
>
> Does it imply that adding aio_context_acquire and aio_context_release
> protection inside a bdrv_* function makes it (at least in the sense of
> IOThreads) thread-safe?

Yes.  The AioContext lock is what protects the BlockDriverState.
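To show both points in one place, here is a sketch of what such a main
loop function might look like.  my_qmp_handler() and my_bh_cb() are
invented names, and bdrv_flush() simply stands in for whatever bdrv_*()
calls you actually need:

  #include "block/aio.h"
  #include "block/block.h"

  /* Runs in the BlockDriverState's AioContext; by rule #1 above no
   * locking is needed inside this callback
   */
  static void my_bh_cb(void *opaque)
  {
      BlockDriverState *bs = opaque;

      bdrv_flush(bs); /* example work in bs's AioContext */
  }

  void my_qmp_handler(BlockDriverState *bs)
  {
      AioContext *aio_context = bdrv_get_aio_context(bs);
      QEMUBH *bh;

      /* Keep the IOThread from running this context in parallel */
      aio_context_acquire(aio_context);

      bdrv_flush(bs);                             /* bdrv_*() is now safe */
      bh = aio_bh_new(aio_context, my_bh_cb, bs); /* creation needs the lock */

      aio_context_release(aio_context);

      /* ...but scheduling the BH does not, per the side note above */
      qemu_bh_schedule(bh);
  }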
>>
>> Long-running jobs (usually in the form of coroutines) are best
>> scheduled in the BlockDriverState's AioContext to avoid the need to
>> acquire/release around each bdrv_*() call.  Be aware that there is
>> currently no mechanism to get notified when bdrv_set_aio_context()
>> moves this BlockDriverState to a different AioContext (see
>> bdrv_detach_aio_context()/bdrv_attach_aio_context()), so you may need
>> to add this if you want to support long-running jobs.
>
> Are block jobs a case of this?  It looks like a subtask of adding
> block job support to dataplane.

Yes, block jobs are currently not available when x-data-plane=on is used
because it sets bdrv_in_use.

Stefan