On Thu, May 8, 2014 at 1:33 PM, Fam Zheng <f...@redhat.com> wrote:
> On Thu, 05/08 12:16, Stefan Hajnoczi wrote:
>> Here is background on the latest dataplane work in my "[PATCH v2 00/25]
>> dataplane: use QEMU block layer" series.  It's necessary background for
>> anyone who wants to build on top of it.  Please leave feedback or
>> questions and I'll submit a docs/ patch with the final version of this
>> document.
>>
>> This document explains the IOThread feature and how to write code that
>> runs outside the QEMU global mutex.
>>
>> The main loop and IOThreads
>> ---------------------------
>> QEMU is an event-driven program that can do several things at once
>> using an event loop.  The VNC server and the QMP monitor are both
>> processed from the same event loop, which monitors their file
>> descriptors and invokes a callback when they become readable.
>>
>> The default event loop is called the main loop (see main-loop.c).  It
>> is possible to create additional event loop threads using -object
>> iothread,id=my-iothread.
>
> Is dataplane the only user for this now?

Yes, and neither dataplane (x-data-plane=on) nor IOThread (-object
iothread,id=<name>) is finalized.

There was a discussion about -object and QOM on the mailing list a while
back.  We reached the conclusion that -object shouldn't be a supported
command-line interface; it should only be used for testing, development,
etc.  So an -iothread option still needs to be added.
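For reference, here is roughly what it looks like today.  This assumes
your build has the query-iothreads QMP command; the thread-id value
below is made up for illustration:

  $ qemu-system-x86_64 ... -object iothread,id=iothread0

  -> { "execute": "query-iothreads" }
  <- { "return": [ { "id": "iothread0", "thread-id": 1234 } ] }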
>>
>> Side note: The main loop and IOThread are both event loops but their
>> code is not shared completely.  Sometimes it is useful to remember
>> that although they are conceptually similar they are currently not
>> interchangeable.
>>
>> Why IOThreads are useful
>> ------------------------
>> IOThreads allow the user to control the placement of work.  The main
>> loop is a scalability bottleneck on hosts with many CPUs.  Work can be
>> spread across several IOThreads instead of just one main loop.  When
>> set up correctly this can improve I/O latency and reduce jitter seen
>> by the guest.
>>
>> The main loop is also deeply associated with the QEMU global mutex,
>> which is a scalability bottleneck in itself.  vCPU threads and the
>> main loop use the QEMU global mutex to serialize execution of QEMU
>> code.  This mutex is necessary because a lot of QEMU's code
>> historically was not thread-safe.
>>
>> The fact that all I/O processing is done in a single main loop and
>> that the QEMU global mutex is contended by all vCPU threads and the
>> main loop explains why it is desirable to place work into IOThreads.
>>
>> The experimental virtio-blk data-plane implementation has been
>> benchmarked and shows these effects:
>> ftp://public.dhe.ibm.com/linux/pdfs/KVM_Virtualized_IO_Performance_Paper.pdf
>>
>> How to program for IOThreads
>> ----------------------------
>> The main difference between legacy code and new code that can run in
>> an IOThread is dealing explicitly with the event loop object,
>> AioContext (see include/block/aio.h).  Code that only works in the
>> main loop implicitly uses the main loop's AioContext.  Code that
>> supports running in IOThreads must be aware of its AioContext.
>>
>> AioContext supports the following services:
>> * File descriptor monitoring (read/write/error)
>> * Event notifiers (inter-thread signalling)
>> * Timers
>> * Bottom Halves (BHs) - deferred callbacks
>>
>> There are several old APIs that use the main loop AioContext:
>> * LEGACY qemu_aio_set_fd_handler() - monitor a file descriptor
>> * LEGACY qemu_aio_set_event_notifier() - monitor an event notifier
>> * LEGACY timer_new_ms() - create a timer
>> * LEGACY qemu_bh_new() - create a BH
>> * LEGACY qemu_aio_wait() - run an event loop iteration
>>
>> Since they implicitly work on the main loop they cannot be used in
>> code that runs in an IOThread.  They might cause a crash or deadlock
>> if called from an IOThread since the QEMU global mutex is not held.
>>
>> Instead, use the AioContext functions directly (see
>> include/block/aio.h):
>> * aio_set_fd_handler() - monitor a file descriptor
>> * aio_set_event_notifier() - monitor an event notifier
>> * aio_timer_new() - create a timer
>> * aio_bh_new() - create a BH
>> * aio_poll() - run an event loop iteration
>>
>> The AioContext can be obtained from the IOThread using
>> iothread_get_aio_context() or for the main loop using
>> qemu_get_aio_context().  This way your code works in both IOThreads
>> and the main loop.
>
> I think such code knows about its IOThread, so
> iothread_get_aio_context() is enough.  Why do we need to mention
> qemu_get_aio_context() here?

I want to encourage people to write code that works in *any* AioContext,
not just IOThreads and not just the main loop.

When someone has written code that works in IOThreads it will probably
look like this:

  void do_my_thing(MyObject *obj, AioContext *aio_context);

If you want to call do_my_thing() from the main loop you need to know
about qemu_get_aio_context() so you can call it:

  do_my_thing(obj, qemu_get_aio_context());
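To make that concrete, here is a minimal sketch of AioContext-agnostic
code.  MyObject's layout, my_timer_cb(), and the 100 ms interval are
made up for illustration; the aio_*() and timer calls are the ones from
include/block/aio.h and include/qemu/timer.h:

  #include "block/aio.h"
  #include "qemu/timer.h"

  typedef struct MyObject {
      QEMUTimer *timer;
  } MyObject;

  /* Invoked by whichever thread runs the AioContext's event loop */
  static void my_timer_cb(void *opaque)
  {
      MyObject *obj = opaque;

      /* Re-arm so the timer fires again 100 ms from now */
      timer_mod(obj->timer,
                qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + 100 * SCALE_MS);
  }

  /* Works in any AioContext because it uses aio_timer_new() instead of
   * the main-loop-only timer_new_ms()
   */
  void do_my_thing(MyObject *obj, AioContext *aio_context)
  {
      obj->timer = aio_timer_new(aio_context, QEMU_CLOCK_REALTIME,
                                 SCALE_NS, my_timer_cb, obj);
      timer_mod(obj->timer,
                qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + 100 * SCALE_MS);
  }

The caller decides where the work happens by passing either
iothread_get_aio_context(iothread) or qemu_get_aio_context().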
>>
>> How to synchronize with an IOThread
>> -----------------------------------
>> AioContext is not thread-safe so some rules must be followed when
>> using file descriptors, event notifiers, timers, or BHs across
>> threads:
>>
>> 1. AioContext functions can be called safely from file descriptor,
>> event notifier, timer, or BH callbacks invoked by the AioContext.  No
>> locking is necessary.
>>
>> 2. Other threads wishing to access the AioContext must use
>> aio_context_acquire()/aio_context_release() for mutual exclusion.
>> Once the context is acquired no other thread can access it or run
>> event loop iterations in this AioContext.
>>
>> aio_context_acquire()/aio_context_release() calls may be nested.  This
>> means you can call them if you're not sure whether #1 applies.
>>
>> Side note: the best way to schedule a function call across threads is
>> to create a BH in the target AioContext beforehand and then call
>> qemu_bh_schedule().  No acquire/release or locking is needed for the
>> qemu_bh_schedule() call.  But be sure to acquire the AioContext for
>> aio_bh_new() if necessary.
>>
>> The relationship between AioContext and the block layer
>> -------------------------------------------------------
>> The AioContext originates from the QEMU block layer because it
>> provides a scoped way of running event loop iterations until all work
>> is done.  This feature is used to complete all in-flight block I/O
>> requests (see bdrv_drain_all()).  Nowadays AioContext is a generic
>> event loop that can be used by any QEMU subsystem.
>
> There was a concern about lock ordering.  Currently we only acquire
> contexts from the main loop and vCPU threads, so we're safe.  Do we
> enforce this rule?  If we use this reentrant lock in other parts of
> QEMU, what are the rules then?

AioContext acquire/release is a reentrant lock (RFifoLock).  This is
useful since it makes it easier to write composable code that doesn't
deadlock if called inside a context that already has the AioContext
acquired.

Regarding lock ordering, there is currently no reason for IOThread code
(virtio-blk data-plane) to acquire another AioContext.  It only needs
its own BlockDriverState's AioContext.  That's why we don't need to
worry about lock ordering problems - only the main loop will acquire
another AioContext.

I will add a note about lock ordering.

>> The block layer has support for AioContext integrated.  Each
>> BlockDriverState is associated with an AioContext using
>> bdrv_set_aio_context() and bdrv_get_aio_context().  This allows block
>> layer code to process I/O inside the right AioContext.  Other
>> subsystems may wish to follow a similar approach.
>>
>> If main loop code such as a QMP function wishes to access a
>> BlockDriverState it must first call
>> aio_context_acquire(bdrv_get_aio_context(bs)) to ensure the IOThread
>> does not run in parallel.
>
> Does it imply that adding aio_context_acquire and aio_context_release
> protection inside a bdrv_* function makes it (at least in the sense of
> IOThreads) thread-safe?

Yes.  The AioContext lock is what protects the BlockDriverState.
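To show both points in one place, here is a sketch of what such a main
loop function might look like.  my_qmp_handler() and my_bh_cb() are
invented names, and bdrv_flush() simply stands in for whatever bdrv_*()
calls you actually need:

  #include "block/aio.h"
  #include "block/block.h"

  /* Runs in the BlockDriverState's AioContext; by rule #1 above no
   * locking is needed inside this callback
   */
  static void my_bh_cb(void *opaque)
  {
      BlockDriverState *bs = opaque;

      bdrv_flush(bs); /* example work in bs's AioContext */
  }

  void my_qmp_handler(BlockDriverState *bs)
  {
      AioContext *aio_context = bdrv_get_aio_context(bs);
      QEMUBH *bh;

      /* Keep the IOThread from running this context in parallel */
      aio_context_acquire(aio_context);

      bdrv_flush(bs);                             /* bdrv_*() is now safe */
      bh = aio_bh_new(aio_context, my_bh_cb, bs); /* creation needs the lock */

      aio_context_release(aio_context);

      /* ...but scheduling the BH does not, per the side note above */
      qemu_bh_schedule(bh);
  }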
>>
>> Long-running jobs (usually in the form of coroutines) are best
>> scheduled in the BlockDriverState's AioContext to avoid the need to
>> acquire/release around each bdrv_*() call.  Be aware that there is
>> currently no mechanism to get notified when bdrv_set_aio_context()
>> moves this BlockDriverState to a different AioContext (see
>> bdrv_detach_aio_context()/bdrv_attach_aio_context()), so you may need
>> to add this if you want to support long-running jobs.
>
> Are block jobs a case of this?  It looks like a subtask of adding
> block job support to dataplane.

Yes, block jobs are currently not available when x-data-plane=on is used
because it sets bdrv_in_use.

Stefan