Re: [Qemu-devel] Using aio_poll for timer carrier threads
On 14/08/2013 14:32, Jan Kiszka wrote:

I still need to check more corner cases as timer dequeuing can now race with the handler execution, i.e. a dequeued timer can still see one more handler run after timer_del returned. That's a property one can easily take into account when writing device models, but it has to be kept in mind that it's different from current behavior. Updated queue is at git://git.kiszka.org/qemu.git queues/rt.new3 again.

I took a look here; there are several patches that I guess are basically ready. Two random things I noticed:

(1) rcu_init() must be called in rcutorture too.

(2) there are several load/store functions in exec.c that do not go through address_space_rw. These would not call qemu_flush_coalesced_mmio_buffer() after your patches provide address_space_rw_unlocked. I think that in most cases the solution should be to make these functions go through address_space_rw.

Paolo
Re: [Qemu-devel] Using aio_poll for timer carrier threads
On 13/08/2013 16:54, Jan Kiszka wrote:

Using an AioContext lock for timers is somewhat complicated for lock ordering, because context A could try to modify a timer from context B at the same time as context B is modifying a timer from context A. This would cause a deadlock.

That's like MMIO access on device A triggering MMIO access on B and vice versa - why would we need this, and why should we support it? I think the typical case is that timers (and their lists) and the data structures they access have a fairly close relation, and thus can reuse the same lock.

Yes, that's true. Still, it would have to be documented, and using too-coarse locks risks having many BQLs, which multiplies the complexity (fine-grained locking at least keeps critical sections small and limits the amount of nested locking). I like Stefan's patches to make the timer list thread-safe, especially if we can optimize it (with RCU?) to make the read side lockless.

Paolo
Re: [Qemu-devel] Using aio_poll for timer carrier threads
On 2013-08-19 15:21, Paolo Bonzini wrote:

On 13/08/2013 16:54, Jan Kiszka wrote:

Using an AioContext lock for timers is somewhat complicated for lock ordering, because context A could try to modify a timer from context B at the same time as context B is modifying a timer from context A. This would cause a deadlock.

That's like MMIO access on device A triggering MMIO access on B and vice versa - why would we need this, and why should we support it? I think the typical case is that timers (and their lists) and the data structures they access have a fairly close relation, and thus can reuse the same lock.

Yes, that's true. Still, it would have to be documented, and using too-coarse locks risks having many BQLs, which multiplies the complexity (fine-grained locking at least keeps critical sections small and limits the amount of nested locking).

As this lock does not require taking other locks while holding it, it should actually be fine.

I like Stefan's patches to make the timer list thread-safe, especially if we can optimize it (with RCU?) to make the read side lockless.

What is a pure read side in that context? Checking whether some timer has expired? Given that RCU write sides are heavier than plain mutexes and many typical accesses (mod, del, expire) involve writing, such an optimization may also be counterproductive.

Jan

--
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux
Re: [Qemu-devel] Using aio_poll for timer carrier threads
On 19 Aug 2013, at 14:21, Paolo Bonzini wrote:

I like Stefan's patches to make the timer list thread-safe, especially if we can optimize it (with RCU?) to make the read side lockless.

We will have to be careful on the write side. I'm trying to work out just how slow this would be with Paolo's RCU patches. If call_rcu is used (or synchronize_rcu after the entire list has been traversed, all timers updated, etc.), how bad is it? There is a particular concern for repeating timers, where two writes are done (first to remove the timer from the list, then inside the timer routine to add it back to the list).

As timers are only called by a single thread, it might be worth having some form of timer API which specifies that a timer should be called every X nano/micro/milli/seconds, or just once in X nano/micro/milli/seconds' time. That would save the inevitable call to read the clock value first, and would speed things up if write locking were slow.

--
Alex Bligh
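A minimal sketch of the kind of interface Alex suggests; every name here is hypothetical, not an existing QEMU API. The timer stores its own absolute deadline and re-arms by adding the interval to the old deadline, so the handler never has to read the clock and a periodic timer accumulates no drift.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical periodic-timer interface (illustrative names only). */
typedef struct PeriodicTimer {
    int64_t next_deadline_ns;   /* absolute expiry of the next tick */
    int64_t interval_ns;        /* 0 means one-shot */
    void (*cb)(void *opaque);
    void *opaque;
} PeriodicTimer;

/* The clock is read once, at arming time only. */
static void periodic_timer_start(PeriodicTimer *t, int64_t now_ns,
                                 int64_t first_ns, int64_t interval_ns)
{
    t->next_deadline_ns = now_ns + first_ns;
    t->interval_ns = interval_ns;
}

/* Called by the timer carrier thread once 'now' reaches the deadline.
 * Re-arming adds the interval to the old deadline, not to 'now'. */
static void periodic_timer_expire(PeriodicTimer *t, int64_t now_ns)
{
    while (t->next_deadline_ns <= now_ns) {
        t->cb(t->opaque);
        if (t->interval_ns == 0) {
            t->next_deadline_ns = INT64_MAX;  /* one-shot: disarm */
            break;
        }
        t->next_deadline_ns += t->interval_ns;
    }
}

/* Demo callback used in the example below. */
static int ticks;
static void tick_cb(void *opaque) { (void)opaque; ticks++; }
```

With a 100 ns period armed at t=1000 with a 500 ns initial delay, an expiry check at t=1700 fires the deadlines 1500, 1600 and 1700 in one batch and leaves the next deadline at 1800.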
Re: [Qemu-devel] Using aio_poll for timer carrier threads
On 19/08/2013 15:40, Jan Kiszka wrote:

On 2013-08-19 15:21, Paolo Bonzini wrote:

On 13/08/2013 16:54, Jan Kiszka wrote:

Using an AioContext lock for timers is somewhat complicated for lock ordering, because context A could try to modify a timer from context B at the same time as context B is modifying a timer from context A. This would cause a deadlock.

That's like MMIO access on device A triggering MMIO access on B and vice versa - why would we need this, and why should we support it? I think the typical case is that timers (and their lists) and the data structures they access have a fairly close relation, and thus can reuse the same lock.

Yes, that's true. Still, it would have to be documented, and using too-coarse locks risks having many BQLs, which multiplies the complexity (fine-grained locking at least keeps critical sections small and limits the amount of nested locking).

As this lock does not require taking other locks while holding it, it should actually be fine.

Using an AioContext lock would be more dangerous, though. The risk is mostly that someone is holding _another_ AioContext lock while calling a timer function. For MMIO, the rule would be that all address_space_rw/map/unmap must be done with no lock taken.

I like Stefan's patches to make the timer list thread-safe, especially if we can optimize it (with RCU?) to make the read side lockless.

What is a pure read side in that context? Checking whether some timer has expired?

Yes. It is actually quite common, as Stefan pointed out elsewhere to Ping Fan, as it happens at least once in each iteration of the loop. The read side doesn't really need to look beyond the head of the list.

Given that RCU write sides are heavier than plain mutexes and many typical accesses (mod, del, expire) involve writing, such an optimization may also be counterproductive.

This would have to be a creative use of RCU, with no copy in it... basically you can use RCU only as a substitute for reference counting.
call_rcu/synchronize_rcu is not done on every write operation; all we need to do is ensure that devices free timers only after an RCU critical section. We should do the same for memory regions, see my patches to split exitfn and instance_finalize; I'll go into more depth in my KVM Forum 2013 talk.

Lockless algorithms (including RCU) are somewhat appealing because you do not have to worry about locking, but of course all warnings about increased complexity and premature optimization apply.

Paolo
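The deferred-free pattern Paolo describes can be sketched with a toy stand-in for call_rcu. Real RCU (Linux's, or the QEMU RCU series under discussion) runs the callback only after every pre-existing reader has left its critical section; here a plain pending list plus an explicit drain models that grace period. All names are illustrative, not QEMU API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy stand-in for the call_rcu machinery (illustrative only). */
struct rcu_head {
    struct rcu_head *next;
    void (*func)(struct rcu_head *head);
};

static struct rcu_head *pending;

static void call_rcu_stub(struct rcu_head *head,
                          void (*func)(struct rcu_head *head))
{
    head->func = func;
    head->next = pending;
    pending = head;
}

/* Models the end of the grace period, e.g. after the polling thread
 * has finished its walk of the timer list. */
static void rcu_drain_stub(void)
{
    while (pending) {
        struct rcu_head *h = pending;
        pending = h->next;
        h->func(h);
    }
}

/* A device timer freed via the deferred path rather than free()d
 * directly in the device's exitfn/instance_finalize. */
typedef struct DeviceTimer {
    struct rcu_head rcu;      /* first member, so the cast below works */
    int64_t expire_ns;
} DeviceTimer;

static int freed_timers;

static void device_timer_reclaim(struct rcu_head *head)
{
    free((DeviceTimer *)head);
    freed_timers++;
}

static void device_timer_unref(DeviceTimer *t)
{
    call_rcu_stub(&t->rcu, device_timer_reclaim);
}
```

The point of the design is visible in the ordering: the memory stays valid across the window where a concurrent list walker could still hold a pointer to it, and is reclaimed only once the grace period ends.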
Re: [Qemu-devel] Using aio_poll for timer carrier threads
On Tue, Aug 13, 2013 at 04:13:03PM +0200, Jan Kiszka wrote:

On 2013-08-13 15:45, Stefan Hajnoczi wrote:

The details depend on your device; do you have a git repo I can look at to understand your device model?

Pushed my hacks here: git://git.kiszka.org/qemu.git queues/rt.new3

Excellent, thanks! Are you calling qemu_raise_irq() outside the global mutex, and how is it protected?

Stefan
Re: [Qemu-devel] Using aio_poll for timer carrier threads
On 2013-08-14 10:52, Stefan Hajnoczi wrote:

On Tue, Aug 13, 2013 at 04:13:03PM +0200, Jan Kiszka wrote:

On 2013-08-13 15:45, Stefan Hajnoczi wrote:

The details depend on your device; do you have a git repo I can look at to understand your device model?

Pushed my hacks here: git://git.kiszka.org/qemu.git queues/rt.new3

Excellent, thanks! Are you calling qemu_raise_irq() outside the global mutex, and how is it protected?

By luck and via many exceptions: specifically by disabling HPET support (to avoid it being involved in IRQ routing - or even used in legacy mode) and by relying on direct delivery to the kernel in KVM mode. Yes, IRQ delivery is still a huge construction site for BQL-free device models.

Jan

--
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux
Re: [Qemu-devel] Using aio_poll for timer carrier threads
On Wed, Aug 14, 2013 at 11:05:29AM +0200, Jan Kiszka wrote:

On 2013-08-14 10:52, Stefan Hajnoczi wrote:

Excellent, thanks! Are you calling qemu_raise_irq() outside the global mutex, and how is it protected?

By luck and via many exceptions: specifically by disabling HPET support (to avoid it being involved in IRQ routing - or even used in legacy mode) and by relying on direct delivery to the kernel in KVM mode. Yes, IRQ delivery is still a huge construction site for BQL-free device models.

Okay. In dataplane I use the guest notifier for virtio devices (irqfd for KVM mode with MSI-X, or bounced through virtio_queue_guest_notifier_read()).

Stefan
Re: [Qemu-devel] Using aio_poll for timer carrier threads
On 2013-08-13 16:13, Jan Kiszka wrote:

On 2013-08-13 15:45, Stefan Hajnoczi wrote:

This should make timers usable in another thread for clock device emulation if only your iothread uses the AioContext and its timers (besides the thread-safe mod/del interfaces).

As argued in the other thread, I don't think we need (and want) locking in the timer subsystem; rather, we should push this to its users. But I'll look again at your patches to see whether they are also usable.

I've checked and applied your two patches adding the active_timers lock. This model apparently works as well for the RTC use case, and it avoids having to patch aio_poll; the RTC device lock is now taken inside the timer handlers.

I still need to check more corner cases, as timer dequeuing can now race with the handler execution, i.e. a dequeued timer can still see one more handler run after timer_del returned. That's a property one can easily take into account when writing device models, but it has to be kept in mind that it's different from current behavior.

Updated queue is at git://git.kiszka.org/qemu.git queues/rt.new3 again.

Jan

--
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux
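One way a device model can tolerate the race Jan describes (one stale handler run after timer_del) is to have the handler take the device lock and check an armed flag that the deletion path clears under the same lock, so the late run becomes a no-op. A minimal sketch; the names are illustrative, not an existing QEMU device.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical RTC device state (illustrative names only). */
typedef struct RTCState {
    pthread_mutex_t lock;
    int timer_armed;          /* cleared together with timer_del() */
    int irq_count;            /* side effect that must not fire late */
} RTCState;

static void rtc_timer_cb(void *opaque)
{
    RTCState *s = opaque;

    pthread_mutex_lock(&s->lock);
    if (s->timer_armed) {     /* a stale run after disarm does nothing */
        s->irq_count++;
    }
    pthread_mutex_unlock(&s->lock);
}

static void rtc_disarm(RTCState *s)
{
    pthread_mutex_lock(&s->lock);
    s->timer_armed = 0;       /* real code would also call timer_del() */
    pthread_mutex_unlock(&s->lock);
}
```

Because flag and side effect live under one lock, it does not matter whether the stale handler run happens before or after rtc_disarm() returns; at most one extra, harmless invocation occurs.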
Re: [Qemu-devel] Using aio_poll for timer carrier threads
On Tue, Aug 13, 2013 at 09:56:17AM +0200, Jan Kiszka wrote:

in the attempt to use Alex' ppoll-based timer rework for decoupled, real-time capable timer device models, I'm now scratching my head over the aio_poll interface. I'm looking at dataplane/virtio-blk.c, just finding

    static void *data_plane_thread(void *opaque)
    {
        VirtIOBlockDataPlane *s = opaque;

        do {
            aio_poll(s->ctx, true);
        } while (!s->stopping || s->num_reqs > 0);

        return NULL;
    }

wondering where the locking is. Or doesn't this use need any at all? Are all data structures that this thread accesses exclusively used by it, or are they all accessed in a lock-less way?

Most of the data structures in dataplane upstream are not shared. Virtio, virtio-blk, and Linux AIO raw file I/O are duplicated for dataplane and do not rely on QEMU infrastructure. I've been working on undoing this duplication over the past months, but upstream QEMU still mostly does not share data structures and therefore does not need much synchronization. For the crude synchronization that we do need, we simply start/stop the dataplane thread.

Our iothread mainloop more or less open-codes aio_poll and is thus able to drop its lock before falling asleep while still holding it during event dispatching. Obviously, I need the same when processing timer lists of an AioContext, protecting them against concurrent modifications from VCPUs or other threads. So I'm thinking of adding a block notification callback to aio_poll, to be called before/after qemu_poll_ns, so that any locks can be dropped and reacquired as needed. Or am I missing some magic interface or pattern?

Upstream dataplane does not use timers, so the code there cannot serve as an example. If you combine Alex Bligh's, Ping Fan's, and my latest timer series, you get support for QEMUTimer in AioContexts where qemu_timer_mod_ns() and qemu_timer_del() are thread-safe. vm_clock (without icount) and rt_clock are thread-safe clock sources.
This should make timers usable in another thread for clock device emulation if only your iothread uses the AioContext and its timers (besides the thread-safe mod/del interfaces). The details depend on your device; do you have a git repo I can look at to understand your device model?

Stefan
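The block-notification hook Jan proposes in the message above could look roughly like this: aio_poll would invoke a 'before' callback just ahead of blocking in qemu_poll_ns() and an 'after' callback on wakeup, so a caller can sleep without its lock yet dispatch with it held. All names here are hypothetical, not an existing QEMU API, and the poll call is a stub.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hook structure (illustrative names only). */
typedef struct AioBlockNotifier {
    void (*before)(void *opaque);    /* e.g. unlock the iothread mutex */
    void (*after)(void *opaque);     /* e.g. take it again */
    void *opaque;
} AioBlockNotifier;

/* Demo "lock" plus callbacks for the example below. */
static int lock_held = 1;
static void drop_lock(void *opaque)      { (void)opaque; lock_held = 0; }
static void reacquire_lock(void *opaque) { (void)opaque; lock_held = 1; }

/* Stand-in for qemu_poll_ns(); succeeds only if the lock was dropped,
 * i.e. other threads are not locked out while we sleep. */
static int fake_poll_ns(void)
{
    return lock_held ? -1 : 0;
}

static int poll_with_notifier(AioBlockNotifier *n)
{
    int ret;

    if (n && n->before) {
        n->before(n->opaque);
    }
    ret = fake_poll_ns();            /* thread may block here, unlocked */
    if (n && n->after) {
        n->after(n->opaque);
    }
    return ret;                      /* dispatch then runs under the lock */
}
```

The stub poll deliberately fails if the lock is still held, which makes the intended invariant (asleep without the lock, dispatching with it) checkable.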
Re: [Qemu-devel] Using aio_poll for timer carrier threads
On 2013-08-13 15:45, Stefan Hajnoczi wrote:

On Tue, Aug 13, 2013 at 09:56:17AM +0200, Jan Kiszka wrote:

in the attempt to use Alex' ppoll-based timer rework for decoupled, real-time capable timer device models, I'm now scratching my head over the aio_poll interface. I'm looking at dataplane/virtio-blk.c, just finding

    static void *data_plane_thread(void *opaque)
    {
        VirtIOBlockDataPlane *s = opaque;

        do {
            aio_poll(s->ctx, true);
        } while (!s->stopping || s->num_reqs > 0);

        return NULL;
    }

wondering where the locking is. Or doesn't this use need any at all? Are all data structures that this thread accesses exclusively used by it, or are they all accessed in a lock-less way?

Most of the data structures in dataplane upstream are not shared. Virtio, virtio-blk, and Linux AIO raw file I/O are duplicated for dataplane and do not rely on QEMU infrastructure. I've been working on undoing this duplication over the past months, but upstream QEMU still mostly does not share data structures and therefore does not need much synchronization. For the crude synchronization that we do need, we simply start/stop the dataplane thread.

Our iothread mainloop more or less open-codes aio_poll and is thus able to drop its lock before falling asleep while still holding it during event dispatching. Obviously, I need the same when processing timer lists of an AioContext, protecting them against concurrent modifications from VCPUs or other threads. So I'm thinking of adding a block notification callback to aio_poll, to be called before/after qemu_poll_ns, so that any locks can be dropped and reacquired as needed. Or am I missing some magic interface or pattern?

Upstream dataplane does not use timers, so the code there cannot serve as an example. If you combine Alex Bligh's, Ping Fan's, and my latest timer series, you get support for QEMUTimer in AioContexts where qemu_timer_mod_ns() and qemu_timer_del() are thread-safe. vm_clock (without icount) and rt_clock are thread-safe clock sources.
To which series of yours and Ping Fan's are you referring? [1] and [2]?

This should make timers usable in another thread for clock device emulation if only your iothread uses the AioContext and its timers (besides the thread-safe mod/del interfaces).

As argued in the other thread, I don't think we need (and want) locking in the timer subsystem; rather, we should push this to its users. But I'll look again at your patches to see whether they are also usable.

The details depend on your device; do you have a git repo I can look at to understand your device model?

Pushed my hacks here: git://git.kiszka.org/qemu.git queues/rt.new3

Jan

[1] http://thread.gmane.org/gmane.comp.emulators.qemu/227590
[2] http://thread.gmane.org/gmane.comp.emulators.qemu/226369

--
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux
Re: [Qemu-devel] Using aio_poll for timer carrier threads
On 13/08/2013 09:56, Jan Kiszka wrote:

Hi Stefan,

in the attempt to use Alex' ppoll-based timer rework for decoupled, real-time capable timer device models, I'm now scratching my head over the aio_poll interface. I'm looking at dataplane/virtio-blk.c, just finding

    static void *data_plane_thread(void *opaque)
    {
        VirtIOBlockDataPlane *s = opaque;

        do {
            aio_poll(s->ctx, true);
        } while (!s->stopping || s->num_reqs > 0);

        return NULL;
    }

wondering where the locking is. Or doesn't this use need any at all? Are all data structures that this thread accesses exclusively used by it, or are they all accessed in a lock-less way?

There is some locking in aio_bh_poll. It is pretty lightweight because adding elements and deleting them is done under a lock, while the list is walked without taking it (because deletion is only done by the thread that walks it). We could do something similar for file descriptors; timers are more complicated because insertions happen on every mod_timer.

Using an AioContext lock for timers is somewhat complicated for lock ordering, because context A could try to modify a timer from context B at the same time as context B is modifying a timer from context A. This would cause a deadlock. So I agree with Stefan's usage of a finer-grain lock for timer lists.

Paolo
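The aio_bh_poll scheme Paolo describes can be modeled in a few lines: producers prepend under the lock, the polling thread walks the list without it, and only the polling thread unlinks (taking the lock just for the unlink). This is a simplified sketch, not QEMU's actual source; real code additionally needs memory barriers on the lockless walk.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Simplified bottom-half entry (illustrative, not QEMU's struct). */
typedef struct BH {
    struct BH *next;
    int deleted;              /* anyone may request deletion... */
    void (*cb)(void *opaque);
    void *opaque;
} BH;

typedef struct BHList {
    pthread_mutex_t lock;
    BH *head;
} BHList;

static void bh_insert(BHList *l, BH *bh)
{
    pthread_mutex_lock(&l->lock);
    bh->next = l->head;       /* prepend: a walker sees old or new head */
    l->head = bh;
    pthread_mutex_unlock(&l->lock);
}

/* Runs pending callbacks; the walk itself takes no lock. */
static int bh_poll(BHList *l)
{
    BH **prev = &l->head;
    BH *bh;
    int ran = 0;

    while ((bh = *prev) != NULL) {
        if (bh->deleted) {
            pthread_mutex_lock(&l->lock);
            *prev = bh->next; /* ...but only the walker unlinks */
            pthread_mutex_unlock(&l->lock);
            free(bh);
            continue;
        }
        bh->cb(bh->opaque);
        ran++;
        prev = &bh->next;
    }
    return ran;
}

/* Demo callback for the example below. */
static int bh_runs;
static void bh_count_cb(void *opaque) { (void)opaque; bh_runs++; }
```

Because unlinking is confined to the single walking thread, the lock-free traversal can never chase a pointer that another walker just freed; concurrent inserts only ever change the head, which the walker reads once per step.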
Re: [Qemu-devel] Using aio_poll for timer carrier threads
On 2013-08-13 16:45, Paolo Bonzini wrote:

On 13/08/2013 09:56, Jan Kiszka wrote:

Hi Stefan,

in the attempt to use Alex' ppoll-based timer rework for decoupled, real-time capable timer device models, I'm now scratching my head over the aio_poll interface. I'm looking at dataplane/virtio-blk.c, just finding

    static void *data_plane_thread(void *opaque)
    {
        VirtIOBlockDataPlane *s = opaque;

        do {
            aio_poll(s->ctx, true);
        } while (!s->stopping || s->num_reqs > 0);

        return NULL;
    }

wondering where the locking is. Or doesn't this use need any at all? Are all data structures that this thread accesses exclusively used by it, or are they all accessed in a lock-less way?

There is some locking in aio_bh_poll. It is pretty lightweight because adding elements and deleting them is done under a lock, while the list is walked without taking it (because deletion is only done by the thread that walks it). We could do something similar for file descriptors; timers are more complicated because insertions happen on every mod_timer.

Using an AioContext lock for timers is somewhat complicated for lock ordering, because context A could try to modify a timer from context B at the same time as context B is modifying a timer from context A. This would cause a deadlock.

That's like MMIO access on device A triggering MMIO access on B and vice versa - why would we need this, and why should we support it? I think the typical case is that timers (and their lists) and the data structures they access have a fairly close relation, and thus can reuse the same lock.

Jan

--
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux
Re: [Qemu-devel] Using aio_poll for timer carrier threads
On Tue, Aug 13, 2013 at 10:13 PM, Jan Kiszka jan.kis...@siemens.com wrote:

On 2013-08-13 15:45, Stefan Hajnoczi wrote:

On Tue, Aug 13, 2013 at 09:56:17AM +0200, Jan Kiszka wrote:

in the attempt to use Alex' ppoll-based timer rework for decoupled, real-time capable timer device models, I'm now scratching my head over the aio_poll interface. I'm looking at dataplane/virtio-blk.c, just finding

    static void *data_plane_thread(void *opaque)
    {
        VirtIOBlockDataPlane *s = opaque;

        do {
            aio_poll(s->ctx, true);
        } while (!s->stopping || s->num_reqs > 0);

        return NULL;
    }

wondering where the locking is. Or doesn't this use need any at all? Are all data structures that this thread accesses exclusively used by it, or are they all accessed in a lock-less way?

Most of the data structures in dataplane upstream are not shared. Virtio, virtio-blk, and Linux AIO raw file I/O are duplicated for dataplane and do not rely on QEMU infrastructure. I've been working on undoing this duplication over the past months, but upstream QEMU still mostly does not share data structures and therefore does not need much synchronization. For the crude synchronization that we do need, we simply start/stop the dataplane thread.

Our iothread mainloop more or less open-codes aio_poll and is thus able to drop its lock before falling asleep while still holding it during event dispatching. Obviously, I need the same when processing timer lists of an AioContext, protecting them against concurrent modifications from VCPUs or other threads. So I'm thinking of adding a block notification callback to aio_poll, to be called before/after qemu_poll_ns, so that any locks can be dropped and reacquired as needed. Or am I missing some magic interface or pattern?

Upstream dataplane does not use timers, so the code there cannot serve as an example. If you combine Alex Bligh's, Ping Fan's, and my latest timer series, you get support for QEMUTimer in AioContexts where qemu_timer_mod_ns() and qemu_timer_del() are thread-safe.
vm_clock (without icount) and rt_clock are thread-safe clock sources.

To which series of yours and Ping Fan's are you referring? [1] and [2]?

Stefan's [1] has been rebased onto Alex's v10. My part [2'] http://thread.gmane.org/gmane.comp.emulators.qemu/227751 has been rebased onto v10 too.

This should make timers usable in another thread for clock device emulation if only your iothread uses the AioContext and its timers (besides the thread-safe mod/del interfaces).

As argued in the other thread, I don't think we need (and want) locking in the timer subsystem; rather, we should push this to its users. But I'll look again at your patches to see whether they are also usable.

The details depend on your device; do you have a git repo I can look at to understand your device model?

Pushed my hacks here: git://git.kiszka.org/qemu.git queues/rt.new3

Jan

[1] http://thread.gmane.org/gmane.comp.emulators.qemu/227590
[2] http://thread.gmane.org/gmane.comp.emulators.qemu/226369

--
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux