Re: [Qemu-devel] Using aio_poll for timer carrier threads

2013-08-19 Thread Paolo Bonzini
On 14/08/2013 14:32, Jan Kiszka wrote:
 I still need to check more corner cases as timer dequeuing can now race
 with the handler execution, i.e. a dequeued timer can still see one more
 handler run after timer_del returned. That's a property one can easily
 take into account when writing device models, but it has to be kept in
 mind that it's different from current behavior.
 
 Updated queue is at git://git.kiszka.org/qemu.git queues/rt.new3 again.

I took a look here; there are several patches that I guess are basically
ready. Two random things I noticed:

(1) rcu_init() must be called in rcutorture too

(2) there are several load/store functions in exec.c that do not go
through address_space_rw.  These would not call
qemu_flush_coalesced_mmio_buffer() after your patch that provides
address_space_rw_unlocked.  I think that in most cases the solution
should be to make these functions go through address_space_rw.
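
For illustration, such a conversion could look roughly like this (just
a sketch; ldl_phys_via_rw is a made-up name and the address_space_rw
signature is assumed from the current tree):

static uint32_t ldl_phys_via_rw(AddressSpace *as, hwaddr addr)
{
    uint32_t val = 0;

    /* is_write = false: read 4 bytes into val.  Routing the access
     * through address_space_rw means the coalesced MMIO buffer gets
     * flushed on the MMIO path; endianness handling is elided. */
    address_space_rw(as, addr, (uint8_t *)&val, sizeof(val), false);
    return val;
}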

Paolo



Re: [Qemu-devel] Using aio_poll for timer carrier threads

2013-08-19 Thread Paolo Bonzini
On 13/08/2013 16:54, Jan Kiszka wrote:
  Using an AioContext lock for timers is somewhat complicated for lock
  ordering, because context A could try to modify a timer from context B,
  at the same time when context B is modifying a timer from context A.
  This would cause a deadlock.
 That's like an MMIO access on device A triggering an MMIO access on B
 and vice versa - why would we need this, and why should we support it?
 I think the typical case is that timers (and their lists) and the data
 structures they access are closely related, and can thus reuse the same
 lock.

Yes, that's true.  Still it would have to be documented, and using
too-coarse locks risks having many BQLs, which multiplies the complexity
(fine-grained locking at least keeps critical sections small and limits
the amount of nested locking).

I like Stefan's patches to make the timer list thread-safe, especially
if we can optimize it (with RCU?) to make the read side lockless.

Paolo



Re: [Qemu-devel] Using aio_poll for timer carrier threads

2013-08-19 Thread Jan Kiszka
On 2013-08-19 15:21, Paolo Bonzini wrote:
 On 13/08/2013 16:54, Jan Kiszka wrote:
 Using an AioContext lock for timers is somewhat complicated for lock
 ordering, because context A could try to modify a timer from context B,
 at the same time when context B is modifying a timer from context A.
 This would cause a deadlock.
 That's like an MMIO access on device A triggering an MMIO access on B
 and vice versa - why would we need this, and why should we support it?
 I think the typical case is that timers (and their lists) and the data
 structures they access are closely related, and can thus reuse the same
 lock.
 
 Yes, that's true.  Still it would have to be documented, and using
 too-coarse locks risks having many BQLs, which multiplies the complexity
 (fine-grained locking at least keeps critical sections small and limits
 the amount of nested locking).

As this lock does not require taking other locks while holding it, it
should actually be fine.

 
 I like Stefan's patches to make the timer list thread-safe, especially
 if we can optimize it (with RCU?) to make the read side lockless.

What is a pure read side in that context? Checking whether some timer
has expired? Given that RCU write sides are heavier than plain mutexes
and many typical accesses (mod, del, expire) involve writing, such an
optimization may also be counterproductive.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] Using aio_poll for timer carrier threads

2013-08-19 Thread Alex Bligh

On 19 Aug 2013, at 14:21, Paolo Bonzini wrote:

 I like Stefan's patches to make the timer list thread-safe, especially
 if we can optimize it (with RCU?) to make the read side lockless.

We will have to be careful on the write side. I'm trying to
work out just how slow this would be with Paolo's RCU patches.
If call_rcu is used (or synchronize_rcu after the entire list
has been traversed, all timers updated, etc.), how bad is it?

There is a particular concern for repeating timers, where two writes
are done (first to remove the timer from the list, then inside the
timer routine to add it back). As timers are only called by a single
thread, it might be worth having some form of timer API that specifies
that a timer should be called every X nano/micro/milliseconds, or just
once, X nano/micro/milliseconds from now. That would save the
inevitable call to read the clock value first, and would speed things
up if write locking were slow.
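
To sketch the kind of interface I mean (none of these names exist
today; they are purely illustrative):

/* The timer subsystem re-arms the timer itself, so the callback never
 * has to read the clock and call a mod function again -- one list
 * write per expiry instead of two. */
QEMUTimer *timer_new_periodic_ns(QEMUClockType type, int64_t interval_ns,
                                 QEMUTimerCB *cb, void *opaque);

/* One-shot relative variant: fire once, interval_ns from now. */
void timer_mod_rel_ns(QEMUTimer *ts, int64_t interval_ns);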

-- 
Alex Bligh



Re: [Qemu-devel] Using aio_poll for timer carrier threads

2013-08-19 Thread Paolo Bonzini
On 19/08/2013 15:40, Jan Kiszka wrote:
 On 2013-08-19 15:21, Paolo Bonzini wrote:
 On 13/08/2013 16:54, Jan Kiszka wrote:
 Using an AioContext lock for timers is somewhat complicated for lock
 ordering, because context A could try to modify a timer from context B,
 at the same time when context B is modifying a timer from context A.
 This would cause a deadlock.
 That's like an MMIO access on device A triggering an MMIO access on B
 and vice versa - why would we need this, and why should we support it?
 I think the typical case is that timers (and their lists) and the data
 structures they access are closely related, and can thus reuse the same
 lock.

 Yes, that's true.  Still it would have to be documented, and using
 too-coarse locks risks having many BQLs, which multiplies the complexity
 (fine-grained locking at least keeps critical sections small and limits
 the amount of nested locking).
 
 As this lock does not require taking other locks while holding it, it
 should actually be fine.

Using an AioContext lock would be more dangerous, though.  The risk is
mostly that someone is holding _another_ AioContext lock while calling a
timer function.

For MMIO, the rule would be that all address_space_rw/map/unmap must be
done with no lock taken.

 I like Stefan's patches to make the timer list thread-safe, especially
 if we can optimize it (with RCU?) to make the read side lockless.
 
 What is a pure read side in that context? Checking whether some timer
 has expired?

Yes.  It is actually quite common, as Stefan pointed out elsewhere to
Ping Fan; it happens at least once in each iteration of the event loop.

The read side doesn't really need to look beyond the head of the list.
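
Roughly like this (a sketch; expire_time_head is hypothetical, and I'm
assuming it is published with the atomic accessors from qemu/atomic.h):

static bool timerlist_expired_lockless(QEMUTimerList *tl, int64_t now)
{
    /* Only the earliest deadline matters for the per-iteration check;
     * -1 means "no timer armed". */
    int64_t head_expiry = atomic_read(&tl->expire_time_head);

    return head_expiry != -1 && now >= head_expiry;
}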

 Given that RCU write sides are heavier than plain mutexes and
 many typical accesses (mod, del, expire) involve writing, such an
 optimization may also be counterproductive.

This would have to be a creative use of RCU, with no copy in it...
basically you can use RCU only as a substitute for reference counting.
call_rcu/synchronize_rcu is not done on every write operation; all we
need to do is ensure that devices free timers only after an RCU
critical section.  We should do the same for memory regions; see my
patches to split exitfn and instance_finalize. I'll go into more depth
in my KVM Forum 2013 talk.
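
Concretely, something like this (a sketch on top of the proposed RCU
patches; call_rcu1 and struct rcu_head are from that series, and the
device bits are made up):

typedef struct MyDevice {
    QEMUTimer timer;
    struct rcu_head rcu;
} MyDevice;

static void mydevice_free(struct rcu_head *rcu)
{
    MyDevice *s = container_of(rcu, MyDevice, rcu);
    g_free(s);
}

static void mydevice_finalize(MyDevice *s)
{
    timer_del(&s->timer);
    /* A reader inside an RCU critical section may still dereference
     * the timer; defer the actual free until all such readers are
     * done.  No copy, and no call_rcu on the timer hot path. */
    call_rcu1(&s->rcu, mydevice_free);
}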

Lockless algorithms (including RCU) are somewhat appealing because you
do not have to worry about locking, but of course all warnings about
increased complexity and premature optimization apply.

Paolo



Re: [Qemu-devel] Using aio_poll for timer carrier threads

2013-08-14 Thread Stefan Hajnoczi
On Tue, Aug 13, 2013 at 04:13:03PM +0200, Jan Kiszka wrote:
 On 2013-08-13 15:45, Stefan Hajnoczi wrote:
  On Tue, Aug 13, 2013 at 09:56:17AM +0200, Jan Kiszka wrote:
  The details depend on your device, do you have a git repo I can look at
  to understand your device model?
 
 Pushed my hacks here:
 
 git://git.kiszka.org/qemu.git queues/rt.new3

Excellent, thanks!  Are you calling qemu_raise_irq() outside the global
mutex and how is it protected?

Stefan



Re: [Qemu-devel] Using aio_poll for timer carrier threads

2013-08-14 Thread Jan Kiszka
On 2013-08-14 10:52, Stefan Hajnoczi wrote:
 On Tue, Aug 13, 2013 at 04:13:03PM +0200, Jan Kiszka wrote:
 On 2013-08-13 15:45, Stefan Hajnoczi wrote:
 On Tue, Aug 13, 2013 at 09:56:17AM +0200, Jan Kiszka wrote:
 The details depend on your device, do you have a git repo I can look at
 to understand your device model?

 Pushed my hacks here:

 git://git.kiszka.org/qemu.git queues/rt.new3
 
 Excellent, thanks!  Are you calling qemu_raise_irq() outside the global
 mutex and how is it protected?

By luck and via many exceptions: specifically, by disabling HPET
support (to avoid it being involved in IRQ routing - or even used in
legacy mode) and by relying on direct delivery to the kernel in KVM
mode. Yes, IRQ delivery is still a huge construction site for BQL-free
device models.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] Using aio_poll for timer carrier threads

2013-08-14 Thread Stefan Hajnoczi
On Wed, Aug 14, 2013 at 11:05:29AM +0200, Jan Kiszka wrote:
 On 2013-08-14 10:52, Stefan Hajnoczi wrote:
  On Tue, Aug 13, 2013 at 04:13:03PM +0200, Jan Kiszka wrote:
  On 2013-08-13 15:45, Stefan Hajnoczi wrote:
  On Tue, Aug 13, 2013 at 09:56:17AM +0200, Jan Kiszka wrote:
  The details depend on your device, do you have a git repo I can look at
  to understand your device model?
 
  Pushed my hacks here:
 
  git://git.kiszka.org/qemu.git queues/rt.new3
  
  Excellent, thanks!  Are you calling qemu_raise_irq() outside the global
  mutex and how is it protected?
 
 By luck and via many exceptions: specifically, by disabling HPET
 support (to avoid it being involved in IRQ routing - or even used in
 legacy mode) and by relying on direct delivery to the kernel in KVM
 mode. Yes, IRQ delivery is still a huge construction site for BQL-free
 device models.

Okay.  In dataplane I use the guest notifier for virtio devices (irqfd
for KVM mode with MSI-X or bounced through
virtio_queue_guest_notifier_read()).

Stefan



Re: [Qemu-devel] Using aio_poll for timer carrier threads

2013-08-14 Thread Jan Kiszka
On 2013-08-13 16:13, Jan Kiszka wrote:
 On 2013-08-13 15:45, Stefan Hajnoczi wrote:
 This should make timers usable in another thread for clock device
 emulation if only your iothread uses the AioContext and its timers
 (besides the thread-safe mod/del interfaces).
 
 As argued in the other thread, I don't think we need (and want) locking
 in the timer subsystem, rather push this to its users. But I'll look
 again at your patches, if they are also usable.

I've checked and applied your two patches adding the active_timers lock.
This model apparently works for the RTC use case as well. And it avoids
having to patch aio_poll; the RTC device lock is now taken inside the
timer handlers.

I still need to check more corner cases as timer dequeuing can now race
with the handler execution, i.e. a dequeued timer can still see one more
handler run after timer_del returned. That's a property one can easily
take into account when writing device models, but it has to be kept in
mind that it's different from current behavior.
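
In a device model the defensive pattern would look something like this
(a sketch; the armed flag and the lock are illustrative):

typedef struct {
    QemuMutex lock;
    bool armed;
    /* ... device state ... */
} MyDevState;

static void mydev_timer_cb(void *opaque)
{
    MyDevState *s = opaque;

    qemu_mutex_lock(&s->lock);
    if (!s->armed) {
        /* timer_del() won the race; this is the one stale callback
         * that can still run after timer_del() returned. */
        qemu_mutex_unlock(&s->lock);
        return;
    }
    /* ... normal expiry handling under s->lock ... */
    qemu_mutex_unlock(&s->lock);
}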

Updated queue is at git://git.kiszka.org/qemu.git queues/rt.new3 again.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] Using aio_poll for timer carrier threads

2013-08-13 Thread Stefan Hajnoczi
On Tue, Aug 13, 2013 at 09:56:17AM +0200, Jan Kiszka wrote:
 in the attempt to use Alex' ppoll-based timer rework for decoupled,
 real-time capable timer device models I'm now scratching my head over
 the aio_poll interface. I'm looking at dataplane/virtio-blk.c, just finding
 
 static void *data_plane_thread(void *opaque)
 {
     VirtIOBlockDataPlane *s = opaque;

     do {
         aio_poll(s->ctx, true);
     } while (!s->stopping || s->num_reqs > 0);
     return NULL;
 }
 
 wondering where the locking is. Or doesn't this use need any at all? Are
 all data structures that this thread accesses exclusively used by it, or
 are they all accessed in a lock-less way?

Most of the data structures in dataplane upstream are not shared.
Virtio, virtio-blk, and Linux AIO raw file I/O are duplicated for
dataplane and do not rely on QEMU infrastructure.

I've been working on undoing this duplication over the past months but
upstream QEMU still mostly does not share data structures and therefore
does not need much synchronization.  For the crude synchronization that
we do need we simply start/stop the dataplane thread.

 Our iothread mainloop more or less open-codes aio_poll and is, thus,
 able to drop its lock before falling asleep while still holding it
 during event dispatching. Obviously, I need the same when processing
 timer lists of an AioContext, protecting them against concurrent
 modifications over VCPUs or other threads. So I'm thinking of adding a
 block notification callback to aio_poll, to be called before/after
 qemu_poll_ns so that any locks can be dropped / reacquired as needed. Or
 am I missing some magic interface / pattern?

Upstream dataplane does not use timers, so the code there cannot serve
as an example.

If you combine Alex Bligh's, Ping Fan's, and my latest timer series, you get
support for QEMUTimer in AioContexts where qemu_timer_mod_ns() and
qemu_timer_del() are thread-safe.  vm_clock (without icount) and
rt_clock are thread-safe clock sources.

This should make timers usable in another thread for clock device
emulation if only your iothread uses the AioContext and its timers
(besides the thread-safe mod/del interfaces).
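
Roughly like this (a sketch; aio_timer_new()/timer_mod_ns() follow
Alex's v10 naming, so treat the exact signatures as assumptions):

typedef struct {
    AioContext *ctx;
    bool stopping;
} TimerThread;

static void *timer_thread_fn(void *opaque)
{
    TimerThread *tt = opaque;

    while (!tt->stopping) {
        /* Sleeps in ppoll until the earliest timer deadline. */
        aio_poll(tt->ctx, true);
    }
    return NULL;
}

static void arm_timer(TimerThread *tt, QEMUTimerCB *cb, void *opaque)
{
    /* timer_mod_ns() is the thread-safe variant under discussion, so
     * this can be called from any thread. */
    QEMUTimer *t = aio_timer_new(tt->ctx, QEMU_CLOCK_VIRTUAL, SCALE_NS,
                                 cb, opaque);
    timer_mod_ns(t, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + 1000000);
}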

The details depend on your device, do you have a git repo I can look at
to understand your device model?

Stefan



Re: [Qemu-devel] Using aio_poll for timer carrier threads

2013-08-13 Thread Jan Kiszka
On 2013-08-13 15:45, Stefan Hajnoczi wrote:
 On Tue, Aug 13, 2013 at 09:56:17AM +0200, Jan Kiszka wrote:
 in the attempt to use Alex' ppoll-based timer rework for decoupled,
 real-time capable timer device models I'm now scratching my head over
 the aio_poll interface. I'm looking at dataplane/virtio-blk.c, just finding

 static void *data_plane_thread(void *opaque)
 {
     VirtIOBlockDataPlane *s = opaque;

     do {
         aio_poll(s->ctx, true);
     } while (!s->stopping || s->num_reqs > 0);
     return NULL;
 }

 wondering where the locking is. Or doesn't this use need any at all? Are
 all data structures that this thread accesses exclusively used by it, or
 are they all accessed in a lock-less way?
 
 Most of the data structures in dataplane upstream are not shared.
 Virtio, virtio-blk, and Linux AIO raw file I/O are duplicated for
 dataplane and do not rely on QEMU infrastructure.
 
 I've been working on undoing this duplication over the past months but
 upstream QEMU still mostly does not share data structures and therefore
 does not need much synchronization.  For the crude synchronization that
 we do need we simply start/stop the dataplane thread.
 
 Our iothread mainloop more or less open-codes aio_poll and is, thus,
 able to drop its lock before falling asleep while still holding it
 during event dispatching. Obviously, I need the same when processing
 timer lists of an AioContext, protecting them against concurrent
 modifications over VCPUs or other threads. So I'm thinking of adding a
 block notification callback to aio_poll, to be called before/after
 qemu_poll_ns so that any locks can be dropped / reacquired as needed. Or
 am I missing some magic interface / pattern?
 
 Upstream dataplane does not use timers, so the code there cannot serve
 as an example.
 
 If you combine Alex Bligh's, Ping Fan's, and my latest timer series, you get
 support for QEMUTimer in AioContexts where qemu_timer_mod_ns() and
 qemu_timer_del() are thread-safe.  vm_clock (without icount) and
 rt_clock are thread-safe clock sources.

To which series of yours and Ping Fan's are you referring? [1] and [2]?

 
 This should make timers usable in another thread for clock device
 emulation if only your iothread uses the AioContext and its timers
 (besides the thread-safe mod/del interfaces).

As argued in the other thread, I don't think we need (and want) locking
in the timer subsystem, rather push this to its users. But I'll look
again at your patches, if they are also usable.

 
 The details depend on your device, do you have a git repo I can look at
 to understand your device model?

Pushed my hacks here:

git://git.kiszka.org/qemu.git queues/rt.new3

Jan

[1] http://thread.gmane.org/gmane.comp.emulators.qemu/227590
[2] http://thread.gmane.org/gmane.comp.emulators.qemu/226369

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] Using aio_poll for timer carrier threads

2013-08-13 Thread Paolo Bonzini
On 13/08/2013 09:56, Jan Kiszka wrote:
 Hi Stefan,
 
 in the attempt to use Alex' ppoll-based timer rework for decoupled,
 real-time capable timer device models I'm now scratching my head over
 the aio_poll interface. I'm looking at dataplane/virtio-blk.c, just finding
 
 static void *data_plane_thread(void *opaque)
 {
     VirtIOBlockDataPlane *s = opaque;

     do {
         aio_poll(s->ctx, true);
     } while (!s->stopping || s->num_reqs > 0);
     return NULL;
 }
 
 wondering where the locking is. Or doesn't this use need any at all? Are
 all data structures that this thread accesses exclusively used by it, or
 are they all accessed in a lock-less way?

There is some locking in aio_bh_poll.  It is pretty lightweight because
adding elements and deleting them is done under a lock, while the list
is walked without taking it (because deletion is only done by the thread
that walks).  We could do something similar for file descriptors; timers
are more complicated because insertions happen for every mod_timer.
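
Condensed, the aio_bh_poll scheme looks like this (types simplified;
the real code is in async.c):

typedef struct BH BH;
struct BH {
    void (*cb)(void *opaque);
    void *opaque;
    int scheduled;
    BH *next;
};

typedef struct {
    QemuMutex bh_lock;
    BH *first_bh;
} Ctx;

/* Producers: insert at the head, under bh_lock. */
static void bh_insert(Ctx *ctx, BH *bh)
{
    qemu_mutex_lock(&ctx->bh_lock);
    bh->next = ctx->first_bh;
    smp_wmb();               /* publish bh->next before bh itself */
    ctx->first_bh = bh;
    qemu_mutex_unlock(&ctx->bh_lock);
}

/* The walker takes no lock; only this thread ever deletes. */
static void bh_walk(Ctx *ctx)
{
    BH *bh, *next;

    for (bh = ctx->first_bh; bh; bh = next) {
        next = bh->next;     /* fetch before cb can delete bh */
        if (bh->scheduled) {
            bh->scheduled = 0;
            bh->cb(bh->opaque);
        }
    }
}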

Using an AioContext lock for timers is somewhat complicated for lock
ordering, because context A could try to modify a timer from context B,
at the same time when context B is modifying a timer from context A.
This would cause a deadlock.  So I agree with Stefan's usage of a
finer-grain lock for timer lists.
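
Schematically (aio_context_acquire is from the locking RFC; all the
names here are just for illustration):

static AioContext *ctx_a, *ctx_b;
static QEMUTimer *timer_in_b;

static void thread_dispatching_a(void)
{
    aio_context_acquire(ctx_a);    /* its own dispatch lock */
    aio_context_acquire(ctx_b);    /* needed to touch B's timer */
    timer_mod(timer_in_b, 0);
    aio_context_release(ctx_b);
    aio_context_release(ctx_a);
}

/* ...while the thread dispatching B does the mirror image: acquire
 * ctx_b, then ctx_a.  Run concurrently, each thread holds its own
 * lock and blocks on the other's acquire - the classic ABBA
 * deadlock. */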

Paolo



Re: [Qemu-devel] Using aio_poll for timer carrier threads

2013-08-13 Thread Jan Kiszka
On 2013-08-13 16:45, Paolo Bonzini wrote:
 On 13/08/2013 09:56, Jan Kiszka wrote:
 Hi Stefan,

 in the attempt to use Alex' ppoll-based timer rework for decoupled,
 real-time capable timer device models I'm now scratching my head over
 the aio_poll interface. I'm looking at dataplane/virtio-blk.c, just finding

 static void *data_plane_thread(void *opaque)
 {
     VirtIOBlockDataPlane *s = opaque;

     do {
         aio_poll(s->ctx, true);
     } while (!s->stopping || s->num_reqs > 0);
     return NULL;
 }

 wondering where the locking is. Or doesn't this use need any at all? Are
 all data structures that this thread accesses exclusively used by it, or
 are they all accessed in a lock-less way?
 
 There is some locking in aio_bh_poll.  It is pretty lightweight because
 adding elements and deleting them is done under a lock, while the list
 is walked without taking it (because deletion is only done by the thread
 that walks).  We could do something similar for file descriptors; timers
 are more complicated because insertions happen for every mod_timer.
 
 Using an AioContext lock for timers is somewhat complicated for lock
 ordering, because context A could try to modify a timer from context B,
 at the same time when context B is modifying a timer from context A.
 This would cause a deadlock.

That's like an MMIO access on device A triggering an MMIO access on B
and vice versa - why would we need this, and why should we support it?
I think the typical case is that timers (and their lists) and the data
structures they access are closely related, and can thus reuse the same
lock.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] Using aio_poll for timer carrier threads

2013-08-13 Thread liu ping fan
On Tue, Aug 13, 2013 at 10:13 PM, Jan Kiszka jan.kis...@siemens.com wrote:
 On 2013-08-13 15:45, Stefan Hajnoczi wrote:
 On Tue, Aug 13, 2013 at 09:56:17AM +0200, Jan Kiszka wrote:
 in the attempt to use Alex' ppoll-based timer rework for decoupled,
 real-time capable timer device models I'm now scratching my head over
 the aio_poll interface. I'm looking at dataplane/virtio-blk.c, just finding

 static void *data_plane_thread(void *opaque)
 {
     VirtIOBlockDataPlane *s = opaque;

     do {
         aio_poll(s->ctx, true);
     } while (!s->stopping || s->num_reqs > 0);
     return NULL;
 }

 wondering where the locking is. Or doesn't this use need any at all? Are
 all data structures that this thread accesses exclusively used by it, or
 are they all accessed in a lock-less way?

 Most of the data structures in dataplane upstream are not shared.
 Virtio, virtio-blk, and Linux AIO raw file I/O are duplicated for
 dataplane and do not rely on QEMU infrastructure.

 I've been working on undoing this duplication over the past months but
 upstream QEMU still mostly does not share data structures and therefore
 does not need much synchronization.  For the crude synchronization that
 we do need we simply start/stop the dataplane thread.

 Our iothread mainloop more or less open-codes aio_poll and is, thus,
 able to drop its lock before falling asleep while still holding it
 during event dispatching. Obviously, I need the same when processing
 timer lists of an AioContext, protecting them against concurrent
 modifications over VCPUs or other threads. So I'm thinking of adding a
 block notification callback to aio_poll, to be called before/after
 qemu_poll_ns so that any locks can be dropped / reacquired as needed. Or
 am I missing some magic interface / pattern?

 Upstream dataplane does not use timers, so the code there cannot serve
 as an example.

 If you combine Alex Bligh's, Ping Fan's, and my latest timer series, you get
 support for QEMUTimer in AioContexts where qemu_timer_mod_ns() and
 qemu_timer_del() are thread-safe.  vm_clock (without icount) and
 rt_clock are thread-safe clock sources.

 To which series of yours and Ping Fan are you referring? [1] and [2]?

Stefan's [1] has been rebased onto Alex's v10.
My part is [2'] http://thread.gmane.org/gmane.comp.emulators.qemu/227751,
rebased onto v10 too.


 This should make timers usable in another thread for clock device
 emulation if only your iothread uses the AioContext and its timers
 (besides the thread-safe mod/del interfaces).

 As argued in the other thread, I don't think we need (and want) locking
 in the timer subsystem, rather push this to its users. But I'll look
 again at your patches, if they are also usable.


 The details depend on your device, do you have a git repo I can look at
 to understand your device model?

 Pushed my hacks here:

 git://git.kiszka.org/qemu.git queues/rt.new3

 Jan

 [1] http://thread.gmane.org/gmane.comp.emulators.qemu/227590
 [2] http://thread.gmane.org/gmane.comp.emulators.qemu/226369

 --
 Siemens AG, Corporate Technology, CT RTC ITP SES-DE
 Corporate Competence Center Embedded Linux