Re: [Qemu-devel] [PATCH 0/9] introduce virtio net dataplane

2013-02-28 Thread Michael S. Tsirkin
On Thu, Feb 21, 2013 at 11:02:31AM -0600, Anthony Liguori wrote:
 Michael S. Tsirkin m...@redhat.com writes:
 
  On Thu, Feb 21, 2013 at 10:15:58AM -0600, Anthony Liguori wrote:
  Michael S. Tsirkin m...@redhat.com writes:
  
   On Thu, Feb 21, 2013 at 08:54:44PM +0800, Liu Ping Fan wrote:
This is modelled on virtio-blk dataplane, pushing the data
handling out of the big lock.  It is an attempt to implement this processing
in userspace, whereas vhost-net does it in the kernel.
  
   What's the motivation for doing this?
  
  I haven't looked carefully at the patches, but I don't think we should
  cheat when it comes to virtio-net.  I think it's useful to have a
  baseline proof of concept but I think we should focus on making the
  network layer re-entrant.

By the way I agree on this one. We allowed the
cheat for block because there's no vhost-blk upstream yet.

-- 
MST



Re: [Qemu-devel] [PATCH 0/9] introduce virtio net dataplane

2013-02-27 Thread liu ping fan
On Tue, Feb 26, 2013 at 1:53 AM, Paolo Bonzini pbonz...@redhat.com wrote:
 On 25/02/2013 18:35, mdroth wrote:
  Moving more of the os_host_main_loop_wait to AioContext would be
  possible (timers are on the todo list, in fact), but we should only do
  it as need arises.
 Were you planning on hanging another GSource off of AioContext to handle
 timers?

 No, I'm planning to merge qemu-timer.c into it.  I don't want to turn
 AioContext into a bad copy of the glib main loop.  AioContext should
 keep the concepts of the QEMU block layer.

 We could consolidate qemu_set_fd_handler()/qemu_aio_set_fd_handler() on
 POSIX by teaching the current GSource about fd_read_poll functions, and on
 Windows qemu_set_fd_handler() would tie into a winsock-specific GSource
 that we register with an AioContext. Might be able to do similar with
 GSources for slirp and the qemu_add_wait_object() stuff.

 Consolidating qemu_set_fd_handler and slirp to GSources is a good thing
 to do, indeed.  The assumption here is that qemu_set_fd_handler and
 slirp do not need special locking or priority constructs, while
 everything else can use AioContext.

 AioContext could also grow other implementations that use
 epoll/kqueue/whatever.

 Yup, don't mean to get ahead of things, my main interest is just in how
 we might deal with the interaction between NetClients and virtio-net
 dataplane threads without introducing ad-hoc, dataplane-specific
 mechanisms. If there was a general way for a NIC to tell its NetClient
 peer "hey, I have my own thread/main loop, here's my {Aio,*}Context, register
 your handlers there instead", I think this series might look a lot more
 realistic as a default, or at least make merging it less risky.

 Yes, I see the point.

 The main blocker to this series seems to be hubs, because they interact
 with multiple NetClients and thus could span multiple AioContexts.
 Adding proper locking there is going to be interesting. :)

I think we can attach hub ports to different AioContexts and dispatch
packets on the corresponding one.  What about something like:

net_hub_receive()
{
    for_each_hub_port(dstport) {
        if (dstport->aio == srcport->aio) {
            qemu_send_packet(...);
        } else {
            qemu_net_queue_append(dstport, ...);
            /* signal dstport's eventfd, whose handler will
             * call nc->receive() on dstport's side */
        }
    }
}
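
A rough sketch of the receiving side, to make the hand-off explicit (the
port->notifier field is hypothetical, not something from this series, and
the helper names are from the tree as I remember them, so treat them as
approximate):

    /* Runs in the destination port's own AioContext when its EventNotifier
     * is signalled by net_hub_receive() above. */
    static void hub_port_notify(EventNotifier *e)
    {
        NetHubPort *port = container_of(e, NetHubPort, notifier);

        event_notifier_test_and_clear(e);
        /* Drain packets queued from other contexts; this ends up calling
         * nc->info->receive() on this thread. */
        qemu_net_queue_flush(port->nc.incoming_queue);
    }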

Regards,
Pingfan
 But otherwise, I don't think we would have many hacks, if any.  Unlike
 the block layer, networking is quite self-contained and there isn't much
 magic to interact with it from the monitor; and for the block layer we
 already have an idea of how to deal with concurrency.

 Paolo

 But the
 right way to do that seems to tie into the discussion around making
 other aio sources more GMainContext/AioContext-ish.





Re: [Qemu-devel] [PATCH 0/9] introduce virtio net dataplane

2013-02-27 Thread liu ping fan
On Fri, Feb 22, 2013 at 12:51 AM, Michael S. Tsirkin m...@redhat.com wrote:
 On Thu, Feb 21, 2013 at 06:45:54PM +0200, Michael S. Tsirkin wrote:
 On Thu, Feb 21, 2013 at 10:15:58AM -0600, Anthony Liguori wrote:
  Michael S. Tsirkin m...@redhat.com writes:
 
   On Thu, Feb 21, 2013 at 08:54:44PM +0800, Liu Ping Fan wrote:
This is modelled on virtio-blk dataplane, pushing the data
handling out of the big lock.  It is an attempt to implement this processing
in userspace, whereas vhost-net does it in the kernel.
  
   What's the motivation for doing this?
 
  I haven't looked carefully at the patches, but I don't think we should
  cheat when it comes to virtio-net.  I think it's useful to have a
  baseline proof of concept but I think we should focus on making the
  network layer re-entrant.
 
  In terms of why even bother, if we can make virtio-net a viable
  alternative to vhost-net, it's a huge win.  Being in userspace is a huge
  advantage.
 
  I understand the challenges with zero-copy but we really need a proper
  userspace implementation to determine how much it really matters.
  Zero-copy tap is also not entirely outside the realm of possibility.

 This is exactly what we have, vhost-net is the interface we use
 for asynchronous communication. If you want to add yet another
 asynchronous interface in kernel, it's certainly doable but then what's
 the point?

The iperf results show it improves performance over the baseline,
but it is still slower than vhost-net (maybe the rx path needs optimization).

Todo:
implement fine-grained locking for the net subsystem
  
   vhost-net is currently the only way to get zero copy transmit
   on linux (and zero copy receive in the future)
   so at least in theory it'll always be faster.
 
  It might always be faster.  But that doesn't mean we should limit the
  performance of virtio-net in userspace.  Some people may be willing to
  take a small performance hit in order to obtain the security offered by
  being in userspace.
 
  Regards,
 
  Anthony Liguori

 By 'being in userspace' I presume you mean the ring processing code.
 Ring processing can be done in userspace if you are ready
 to use the synchronous read/write operations, skipping
 this bunch of code might improve security. But it requires a data
 copy almost by definition.


 BTW I think we do have a problem at the moment,
 and that is that virtio net can process the same ring both in userspace
 (for level interrupts) and in kernel (for edge interrupts).

When using virtio-net, /proc/interrupts shows IO-APIC-edge, just as with
vhost-net, so does virtio-net actually use level interrupts?
 And which one is selected is under guest control.
 This is twice the number of potential security issues.
 A solution, in my opinion, is to improve handling of level
 interrupts to the point where we can make vhost-net
 the default for level as well.
 The new support for level IRQFD in kvm should make
 this possible.

It seems qemu lacks something like an eoi-eventfd for level interrupts, right?

Thanks and regards,
Pingfan

 Unfortunately, this prototype moves in exactly the reverse
 direction.

  
   Liu Ping Fan (9):
 vring: split the modification and publish of used ring
 vring: introduce vring_restore() to restore from img
 event poll: make epoll work for normal fd
 event poll: pass event type to event callback
 event poll: enable event poll handle more than one event each time
 virtio net: introduce dataplane for virtio net
 tap: make tap peer work on dedicated data-plane thread
 virtio net: enable dataplane for virtio net
 configure: make virtio net dataplane configurable
  
configure  |   12 ++
hw/dataplane/Makefile.objs |4 +-
hw/dataplane/event-poll.c  |   69 +--
hw/dataplane/event-poll.h  |   14 ++-
hw/dataplane/virtio-blk.c  |8 +-
hw/dataplane/virtio-net.c  |  444 
   
hw/dataplane/virtio-net.h  |   32 
hw/dataplane/vring.c   |   37 
hw/dataplane/vring.h   |3 +
hw/virtio-net.c|   94 +-
hw/virtio-net.h|   64 +++
hw/virtio-pci.c|2 +-
include/net/tap.h  |6 +
net/tap.c  |   92 +-
14 files changed, 806 insertions(+), 75 deletions(-)
create mode 100644 hw/dataplane/virtio-net.c
create mode 100644 hw/dataplane/virtio-net.h
  
   --
   1.7.4.4
  



Re: [Qemu-devel] [PATCH 0/9] introduce virtio net dataplane

2013-02-27 Thread Michael S. Tsirkin
On Wed, Feb 27, 2013 at 05:41:40PM +0800, liu ping fan wrote:
 On Fri, Feb 22, 2013 at 12:51 AM, Michael S. Tsirkin m...@redhat.com wrote:
  On Thu, Feb 21, 2013 at 06:45:54PM +0200, Michael S. Tsirkin wrote:
  On Thu, Feb 21, 2013 at 10:15:58AM -0600, Anthony Liguori wrote:
   Michael S. Tsirkin m...@redhat.com writes:
  
On Thu, Feb 21, 2013 at 08:54:44PM +0800, Liu Ping Fan wrote:
This is modelled on virtio-blk dataplane, pushing the data
handling out of the big lock.  It is an attempt to implement this processing
in userspace, whereas vhost-net does it in the kernel.
   
What's the motivation for doing this?
  
   I haven't looked carefully at the patches, but I don't think we should
   cheat when it comes to virtio-net.  I think it's useful to have a
   baseline proof of concept but I think we should focus on making the
   network layer re-entrant.
  
   In terms of why even bother, if we can make virtio-net a viable
   alternative to vhost-net, it's a huge win.  Being in userspace is a huge
   advantage.
  
   I understand the challenges with zero-copy but we really need a proper
   userspace implementation to determine how much it really matters.
   Zero-copy tap is also not entirely outside the realm of possibility.
 
  This is exactly what we have, vhost-net is the interface we use
  for asynchronous communication. If you want to add yet another
  asynchronous interface in kernel, it's certainly doable but then what's
  the point?
 
The iperf results show it improves performance over the baseline,
but it is still slower than vhost-net (maybe the rx path needs optimization).

Todo:
implement fine-grained locking for the net subsystem
   
vhost-net is currently the only way to get zero copy transmit
on linux (and zero copy receive in the future)
so at least in theory it'll always be faster.
  
   It might always be faster.  But that doesn't mean we should limit the
   performance of virtio-net in userspace.  Some people may be willing to
   take a small performance hit in order to obtain the security offered by
   being in userspace.
  
   Regards,
  
   Anthony Liguori
 
  By 'being in userspace' I presume you mean the ring processing code.
  Ring processing can be done in userspace if you are ready
  to use the synchronous read/write operations, skipping
  this bunch of code might improve security. But it requires a data
  copy almost by definition.
 
 
  BTW I think we do have a problem at the moment,
  and that is that virtio net can process the same ring both in userspace
  (for level interrupts) and in kernel (for edge interrupts).
 
 When using virtio-net, /proc/interrupts shows IO-APIC-edge, just as with
 vhost-net, so does virtio-net actually use level interrupts?

virtio *can* use level interrupts, and it's under guest
control, e.g. simply boot with pci=nomsi.
Which means we have two implementations, and the guest controls which one is
used.  If we always use the kernel one, we reduce the attack surface.
This can already be done with vhostforce=on, but it hurts the speed a bit
for very old guests (e.g. Windows XP) because interrupts
are bounced through qemu.
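
For reference, forcing the in-kernel path looks something like this on the
command line (option spellings as in the current tree, so treat as
approximate):

    -netdev tap,id=net0,vhost=on,vhostforce=on -device virtio-net-pci,netdev=net0

and a guest can be pushed onto the level-interrupt path simply by booting
its kernel with pci=nomsi.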

  And which one is selected is under guest control.
  This is twice the number of potential security issues.
  A solution, in my opinion, is to improve handling of level
  interrupts to the point where we can make vhost-net
  the default for level as well.
  The new support for level IRQFD in kvm should make
  this possible.
 
 It seems qemu lacks something like an eoi-eventfd for level interrupts, right?
 
 Thanks and regards,
 Pingfan

No. See KVM_IRQFD_FLAG_RESAMPLE.
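
To make that concrete, a minimal sketch of the resample irqfd from userspace
(error handling omitted; struct layout as I remember the uapi, so double-check
against linux/kvm.h):

    #include <sys/eventfd.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    /* irqfd raises the (level-triggered) GSI; resamplefd is signalled by
     * KVM on guest EOI so the device model can decide whether to
     * re-assert the line. */
    static int assign_level_irqfd(int vmfd, int gsi)
    {
        struct kvm_irqfd irqfd = {
            .fd         = eventfd(0, 0),
            .resamplefd = eventfd(0, 0),
            .gsi        = gsi,
            .flags      = KVM_IRQFD_FLAG_RESAMPLE,
        };

        return ioctl(vmfd, KVM_IRQFD, &irqfd);
    }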

  Unfortunately, this prototype moves in exactly the reverse
  direction.
 
   
Liu Ping Fan (9):
  vring: split the modification and publish of used ring
  vring: introduce vring_restore() to restore from img
  event poll: make epoll work for normal fd
  event poll: pass event type to event callback
  event poll: enable event poll handle more than one event each time
  virtio net: introduce dataplane for virtio net
  tap: make tap peer work on dedicated data-plane thread
  virtio net: enable dataplane for virtio net
  configure: make virtio net dataplane configurable
   
 configure  |   12 ++
 hw/dataplane/Makefile.objs |4 +-
 hw/dataplane/event-poll.c  |   69 +--
 hw/dataplane/event-poll.h  |   14 ++-
 hw/dataplane/virtio-blk.c  |8 +-
 hw/dataplane/virtio-net.c  |  444 

 hw/dataplane/virtio-net.h  |   32 
 hw/dataplane/vring.c   |   37 
 hw/dataplane/vring.h   |3 +
 hw/virtio-net.c|   94 +-
 hw/virtio-net.h|   64 +++
 hw/virtio-pci.c|2 +-
 include/net/tap.h  |6 +
 net/tap.c  |   92 +-
 14 files changed, 806 insertions(+), 75 deletions(-)
 create mode 100644 hw/dataplane/virtio-net.c
 create mode 

Re: [Qemu-devel] [PATCH 0/9] introduce virtio net dataplane

2013-02-27 Thread Markus Armbruster
Stefan Hajnoczi stefa...@gmail.com writes:

 On Mon, Feb 25, 2013 at 06:53:04PM +0100, Paolo Bonzini wrote:
 On 25/02/2013 18:35, mdroth wrote:
  Yup, don't mean to get ahead of things, my main interest is just in how
  we might deal with the interaction between NetClients and virtio-net
  dataplane threads without introducing ad-hoc, dataplane-specific
  mechanisms. If there was a general way for a NIC to tell its NetClient
  peer "hey, I have my own thread/main loop, here's my {Aio,*}Context,
  register your handlers there instead", I think this series might look a
  lot more realistic as a default, or at least make merging it less risky.
 
 Yes, I see the point.
 
 The main blocker to this series seems to be hubs, because they interact
 with multiple NetClients and thus could span multiple AioContexts.
 Adding proper locking there is going to be interesting. :)

 I think folks are pretty fed up with QEMU vlans (hubs).  Someone
 braver and with more time on their hands than me might be able to kill
 hubs without breaking their few use cases (-net dump, using the hub with
 -net socket).

Figuring out the 'interesting' locking leaves us with 'interesting'
locking code, and still stuck with vlans.

Killing vlans would avoid the locking problems, and get rid of a
fertile source of 'interesting' problems.

Guess how I'd prefer to see the locking problem solved :)



Re: [Qemu-devel] [PATCH 0/9] introduce virtio net dataplane

2013-02-27 Thread Paolo Bonzini
On 27/02/2013 13:08, Markus Armbruster wrote:
  The main blocker to this series seems to be hubs, because they interact
  with multiple NetClients and thus could span multiple AioContexts.
  Adding proper locking there is going to be interesting. :)
 
  I think folks are pretty fed up with QEMU vlans (hubs).  Someone
  braver and with more time on their hands than me might be able to kill
  hubs without breaking their few use cases (-net dump, using the hub with
  -net socket).
 Figuring out the 'interesting' locking leaves us with 'interesting'
 locking code, and still stuck with vlans.
 
 Killing vlans would avoid the locking problems, and get rid of a
 fertile source of 'interesting' problems.
 
 Guess how I'd prefer to see the locking problem solved :)

True, though I'd love to be proved wrong.  After all, adding proper
locking to block migration was all but difficult (once I tried).

Paolo



Re: [Qemu-devel] [PATCH 0/9] introduce virtio net dataplane

2013-02-26 Thread Stefan Hajnoczi
On Mon, Feb 25, 2013 at 06:53:04PM +0100, Paolo Bonzini wrote:
 On 25/02/2013 18:35, mdroth wrote:
  Yup, don't mean to get ahead of things, my main interest is just in how
  we might deal with the interaction between NetClients and virtio-net
  dataplane threads without introducing ad-hoc, dataplane-specific
  mechanisms. If there was a general way for a NIC to tell its NetClient
  peer "hey, I have my own thread/main loop, here's my {Aio,*}Context,
  register your handlers there instead", I think this series might look a
  lot more realistic as a default, or at least make merging it less risky.
 
 Yes, I see the point.
 
 The main blocker to this series seems to be hubs, because they interact
 with multiple NetClients and thus could span multiple AioContexts.
 Adding proper locking there is going to be interesting. :)

I think folks are pretty fed up with QEMU vlans (hubs).  Someone
braver and with more time on their hands than me might be able to kill
hubs without breaking their few use cases (-net dump, using the hub with
-net socket).

Stefan



Re: [Qemu-devel] [PATCH 0/9] introduce virtio net dataplane

2013-02-25 Thread mdroth
On Fri, Feb 22, 2013 at 09:53:11AM +0100, Paolo Bonzini wrote:
 On 22/02/2013 00:38, mdroth wrote:
  On Thu, Feb 21, 2013 at 11:12:01PM +0100, Paolo Bonzini wrote:
  On 21/02/2013 22:07, mdroth wrote:
 
  100% agree.  In particular hw/dataplane/event-poll.c should be the first
  to go away, but AioContext provides the functionality that Ping Fan
  needs.  But hw/dataplane/vring.c will probably be here for a longer
  Has there been any discussion around introducing something similar to
  AioContexts for fd handlers? This would avoid the dataplane-specific hooks
  needed for NetClients in this series.
 
  AioContext can include file descriptors on POSIX systems (used for NBD
  and other network backends), see aio_set_fd_handler.
  
  Sorry, was using fd handlers too generally. I mean specifically for
  the qemu_set_fd_handler interfaces, where we currently rely on a single list
  of IOHandlerRecords for registration and a single loop to drive them. Would
  be nice if we could drive subsets of those via mini main loops, similar to 
  the
  way dataplane threads would with a particular AioContext via aio_poll (or 
  perhaps
  the exact same way)
 
 Yes, that's what I meant actually.  You can already do it for POSIX,
 unfortunately Windows poses extra complication because sockets are not
 handles.
 
 Moving more of the os_host_main_loop_wait to AioContext would be
 possible (timers are on the todo list, in fact), but we should only do
 it as need arises.

Were you planning on hanging another GSource off of AioContext to handle
timers?

I wonder if adding an interface to AioContext that allows you to just
register an arbitrary GSource might be the right approach.
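
For what it's worth, the opposite direction already works: an AioContext
exposes itself as a GSource, so it can be attached to any glib main loop.
A minimal sketch (the exact aio_context_new() signature may differ across
versions, error handling omitted):

    AioContext *ctx = aio_context_new();
    GSource *src = aio_get_g_source(ctx);

    g_source_attach(src, g_main_context_default());
    g_source_unref(src);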

We could consolidate qemu_set_fd_handler()/qemu_aio_set_fd_handler() on
POSIX by teaching the current GSource about fd_read_poll functions, and on
Windows qemu_set_fd_handler() would tie into a winsock-specific GSource
that we register with an AioContext. Might be able to do similar with
GSources for slirp and the qemu_add_wait_object() stuff.

Or maybe that's the plan already... glib-style main loops, but not
intrinsically tethered to GMainContext so we can implement our own
locking/priority/loop constructs if need be... seems to be what's been
discussed elsewhere, just trying to get an idea of what that might look
like.

 possible (timers are on the todo list, in fact), but we should only do
 it as need arises.

Yup, don't mean to get ahead of things, my main interest is just in how
we might deal with the interaction between NetClients and virtio-net
dataplane threads without introducing ad-hoc, dataplane-specific
mechanisms. If there was a general way for a NIC to tell its NetClient
peer "hey, I have my own thread/main loop, here's my {Aio,*}Context, register
your handlers there instead", I think this series might look a lot more
realistic as a default, or at least make merging it less risky. But the
right way to do that seems to tie into the discussion around making
other aio sources more GMainContext/AioContext-ish.
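
Purely to illustrate the shape of such an interface (the function below is
hypothetical, nothing like it exists in the series or the tree, and the
aio_set_fd_handler() signature is the early-2013 one, so treat it all as
approximate):

    /* A NIC with its own dataplane thread hands its AioContext to the
     * backend; the backend re-registers its fds there instead of on the
     * global IOHandler list. */
    void qemu_net_attach_aio_context(NetClientState *nc, AioContext *ctx);

    /* The tap backend's implementation might boil down to: */
    static void tap_attach_aio_context(NetClientState *nc, AioContext *ctx)
    {
        TAPState *s = DO_UPCAST(TAPState, nc, nc);

        /* stop driving the fd from the global main loop ... */
        qemu_set_fd_handler2(s->fd, NULL, NULL, NULL, NULL);
        /* ... and drive it from the dataplane's event loop instead */
        aio_set_fd_handler(ctx, s->fd, tap_send, NULL, NULL, s);
    }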

 
 Paolo
 
  Currently, Ping Fan's patches basically do this already by accessing a
  global reference to the vnet worker thread and attaching events/handlers to
  its event loop via a new set of registration functions (PATCH 7).
  
  I think generalizing this by basing qemu_set_fd_handler() around
  AioContext, or something similar, would help to extend support to other
  NetClient implementations without requiring dataplane-specific hooks
  throughout.
 



Re: [Qemu-devel] [PATCH 0/9] introduce virtio net dataplane

2013-02-25 Thread Paolo Bonzini
On 25/02/2013 18:35, mdroth wrote:
  Moving more of the os_host_main_loop_wait to AioContext would be
  possible (timers are on the todo list, in fact), but we should only do
  it as need arises.
 Were you planning on hanging another GSource off of AioContext to handle
 timers?

No, I'm planning to merge qemu-timer.c into it.  I don't want to turn
AioContext into a bad copy of the glib main loop.  AioContext should
keep the concepts of the QEMU block layer.

 We could consolidate qemu_set_fd_handler()/qemu_aio_set_fd_handler() on
 POSIX by teaching the current GSource about fd_read_poll functions, and on
 Windows qemu_set_fd_handler() would tie into a winsock-specific GSource
 that we register with an AioContext. Might be able to do similar with
 GSources for slirp and the qemu_add_wait_object() stuff.

Consolidating qemu_set_fd_handler and slirp to GSources is a good thing
to do, indeed.  The assumption here is that qemu_set_fd_handler and
slirp do not need special locking or priority constructs, while
everything else can use AioContext.

AioContext could also grow other implementations that use
epoll/kqueue/whatever.

 Yup, don't mean to get ahead of things, my main interest is just in how
 we might deal with the interaction between NetClients and virtio-net
 dataplane threads without introducing ad-hoc, dataplane-specific
 mechanisms. If there was a general way for a NIC to tell its NetClient
 peer "hey, I have my own thread/main loop, here's my {Aio,*}Context, register
 your handlers there instead", I think this series might look a lot more
 realistic as a default, or at least make merging it less risky.

Yes, I see the point.

The main blocker to this series seems to be hubs, because they interact
with multiple NetClients and thus could span multiple AioContexts.
Adding proper locking there is going to be interesting. :)

But otherwise, I don't think we would have many hacks, if any.  Unlike
the block layer, networking is quite self-contained and there isn't much
magic to interact with it from the monitor; and for the block layer we
already have an idea of how to deal with concurrency.

Paolo

 But the
 right way to do that seems to tie into the discussion around making
 other aio sources more GMainContext/AioContext-ish.





Re: [Qemu-devel] [PATCH 0/9] introduce virtio net dataplane

2013-02-22 Thread Paolo Bonzini
On 22/02/2013 00:38, mdroth wrote:
 On Thu, Feb 21, 2013 at 11:12:01PM +0100, Paolo Bonzini wrote:
 On 21/02/2013 22:07, mdroth wrote:

 100% agree.  In particular hw/dataplane/event-poll.c should be the first
 to go away, but AioContext provides the functionality that Ping Fan
 needs.  But hw/dataplane/vring.c will probably be here for a longer
 Has there been any discussion around introducing something similar to
 AioContexts for fd handlers? This would avoid the dataplane-specific hooks
 needed for NetClients in this series.

 AioContext can include file descriptors on POSIX systems (used for NBD
 and other network backends), see aio_set_fd_handler.
 
 Sorry, was using fd handlers too generally. I mean specifically for
 the qemu_set_fd_handler interfaces, where we currently rely on a single list
 of IOHandlerRecords for registration and a single loop to drive them. Would
 be nice if we could drive subsets of those via mini main loops, similar to the
 way dataplane threads would with a particular AioContext via aio_poll (or 
 perhaps
 the exact same way)

Yes, that's what I meant actually.  You can already do it for POSIX,
unfortunately Windows poses extra complication because sockets are not
handles.

Moving more of the os_host_main_loop_wait to AioContext would be
possible (timers are on the todo list, in fact), but we should only do
it as need arises.

Paolo

 Currently, Ping Fan's patches basically do this already by accessing a
 global reference to the vnet worker thread and attaching events/handlers to
 its event loop via a new set of registration functions (PATCH 7).
 
 I think generalizing this by basing qemu_set_fd_handler() around
 AioContext, or something similar, would help to extend support to other
 NetClient implementations without requiring dataplane-specific hooks
 throughout.



Re: [Qemu-devel] [PATCH 0/9] introduce virtio net dataplane

2013-02-22 Thread Stefan Hajnoczi
On Thu, Feb 21, 2013 at 06:05:44PM +0100, Paolo Bonzini wrote:
 On 21/02/2013 17:33, Stefan Hajnoczi wrote:
  This especially means things like adding live
  migration support or solving other limitations.  The limitations are
  there to motivate core QEMU refactoring, so that core QEMU code can be
  used directly instead of reimplementing it.
  
  Here's my TODO list:
   * Thread pool - AioContext
  
 The thread pool uses an EventNotifier to invoke callbacks.  In a
 multiple event loop world we need to invoke callbacks in the event
 loop that they are associated with.
  
 Worker threads are also started using a BH so that the thread
 inherits the iothread CPU affinity and other characteristics.  I
 think this can stay.
 
 It can and should stay. :)  BHs are tied to an AioContext.  Running
 thread-creation BHs from the AioContext's main thread ensures that the
 worker threads inherit the right attributes for e.g. real-time.

When I say this can stay I mean that worker threads will be spawned
from the QEMU iothread, not from the caller's thread.  Why?

Because if we decide to change this then we can no longer have a global
set of worker threads and a single work queue.  It would make worker
threads scoped to just one AioContext.

Do we really want to go there?  It kind of defeats the purpose of a
thread-pool to have one thread-pool per AioContext (i.e. per virtio-blk
device).

   * BlockDriverState - AioContext
  
 It probably makes sense to bind a BlockDriverState to an AioContext
 in which its file descriptors and aio activity happens.
 
 ... and have the full chain of BlockDriverStates (bs->file,
 bs->backing_file and any others as in the vmdk driver) bound to a single
 AioContext.  We will use two BlockDriverStates if the same backing file
 happens to be shared by two disks (BTW, anybody ever checked what kind
 of havoc happens if you commit to such a BDS? :)).

AFAIK we don't share backing file BlockDriverStates today.

   * Thread-friendly interrupt API
 
 What is this?

Raising interrupts, I think we're already okay using irqfd today but
maybe it's nicer to make the regular QEMU irq APIs support this instead
of manually using irqfd.

 What's still missing here is TCG support.  Perhaps you can emulate
 ioeventfd at the hw/virtio.c level and always go through a host
 notifier.  In retrospect, perhaps it was a mistake to make ioeventfd a
 non-experimental option.

That's true.  We need to emulate ioeventfd.  For virtio-pci we can cheat
by simply notifying the EventNotifier from the virtqueue kick handler.
But really we should implement an ioeventfd equivalent at the pio/mmio
emulation level in QEMU.
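
Concretely, the cheat for virtio-pci could be as small as something like
this (a sketch only, using hw/virtio.c helpers that already exist):

    /* Called from the virtqueue notify path when a real KVM ioeventfd is
     * not available (e.g. TCG): kick the host notifier by hand so the
     * dataplane thread wakes up exactly as it would with ioeventfd. */
    static void virtio_queue_notify_emulated(VirtIODevice *vdev, int n)
    {
        VirtQueue *vq = virtio_get_queue(vdev, n);

        event_notifier_set(virtio_queue_get_host_notifier(vq));
    }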

Stefan



Re: [Qemu-devel] [PATCH 0/9] introduce virtio net dataplane

2013-02-22 Thread Stefan Hajnoczi
On Thu, Feb 21, 2013 at 05:38:04PM -0600, mdroth wrote:
 On Thu, Feb 21, 2013 at 11:12:01PM +0100, Paolo Bonzini wrote:
  On 21/02/2013 22:07, mdroth wrote:

100% agree.  In particular hw/dataplane/event-poll.c should be the 
first
to go away, but AioContext provides the functionality that Ping Fan
needs.  But hw/dataplane/vring.c will probably be here for a longer
   Has there been any discussion around introducing something similar to
   AioContexts for fd handlers? This would avoid the dataplane-specific hooks
   needed for NetClients in this series.
  
  AioContext can include file descriptors on POSIX systems (used for NBD
  and other network backends), see aio_set_fd_handler.
 
 Sorry, was using fd handlers too generally. I mean specifically for
 the qemu_set_fd_handler interfaces, where we currently rely on a single list
 of IOHandlerRecords for registration and a single loop to drive them. Would
 be nice if we could drive subsets of those via mini main loops, similar to the
 way dataplane threads would with a particular AioContext via aio_poll (or 
 perhaps
 the exact same way)
 
 Currently, Ping Fan's patches basically do this already by accessing a
 global reference to the vnet worker thread and attaching events/handlers to
 its event loop via a new set of registration functions (PATCH 7).
 
 I think generalizing this by basing qemu_set_fd_handler() around
 AioContext, or something similar, would help to extend support to other
 NetClient implementations without requiring dataplane-specific hooks
 throughout.

We used to have a nesting feature that we got rid of.  I don't remember
the details but basically a nesting level counter:

  aio_set_fd_handler(...); /* fd at level N */

  aio_nesting_inc();
  aio_set_fd_handler(...); /* fd at level N+1 */
  aio_poll(); /* just this fd */
  aio_nesting_dec();

  aio_poll(); /* both fds */

Then aio_poll() only considers fds which are greater than or equal to
the current nesting level.

I remember disliking this feature but maybe it was just being used badly
rather than the mechanism itself being bad.

Stefan



Re: [Qemu-devel] [PATCH 0/9] introduce virtio net dataplane

2013-02-22 Thread Kevin Wolf
On 21.02.2013 at 18:05, Paolo Bonzini wrote:
 On 21/02/2013 17:33, Stefan Hajnoczi wrote:
   * BlockDriverState - AioContext
  
 It probably makes sense to bind a BlockDriverState to an AioContext
 in which its file descriptors and aio activity happens.
 
 ... and have the full chain of BlockDriverStates (bs->file,
 bs->backing_file and any others as in the vmdk driver) bound to a single
 AioContext.  We will use two BlockDriverStates if the same backing file
 happens to be shared by two disks (BTW, anybody ever checked what kind
 of havoc happens if you commit to such a BDS? :)).

Backing files aren't that interesting indeed, but I think there could be
cases where two BDSes refer to the same parent BDS. For example think of
the read-only qcow2 snapshot BDS that we've been talking about before.

So it would have to be more like one AioContext per connected component
in the BDS graph, if one per BDS doesn't work (why doesn't it?)

Kevin



Re: [Qemu-devel] [PATCH 0/9] introduce virtio net dataplane

2013-02-22 Thread Paolo Bonzini
On 22/02/2013 11:43, Stefan Hajnoczi wrote:
  
  It can and should stay. :)  BHs are tied to an AioContext.  Running
  thread-creation BHs from the AioContext's main thread ensures that the
  worker threads inherit the right attributes for e.g. real-time.
 When I say this can stay I mean that worker threads will be spawned
 from the QEMU iothread, not from the caller's thread.

Ah :)

 Why?
 
 Because if we decide to change this then we can no longer have a global
 set of worker threads and a single work queue.  It would make worker
 threads scoped to just one AioContext.
 
 Do we really want to go there?  It kind of defeats the purpose of a
 thread-pool to have one thread-pool per AioContext (i.e. per virtio-blk
 device).

I think scoped worker threads are simpler to implement and have little
or no lock contention (I have patches for lockless thread-pool, but I'm
not 100% sure they're correct---and they're a bit ugly).  Also, with
many virtio-blk threads you could hit the 64-thread limit that
thread-pool has, so it may require some tuning.

More importantly, scoped worker thread do not require an external
iothread to exist at all.

We can switch from one solution to the other easily (benchmarking takes
more time than coding, in all likelihood), so it's not urgent to choose
one over the other now.

 We will use two BlockDriverStates if the same backing file
 happens to be shared by two disks (BTW, anybody ever checked what kind
 of havoc happens if you commit to such a BDS? :)).
 
 AFAIK we don't share backing file BlockDriverStates today.

Indeed, we don't share BDSs but commit operations could cause the same
file to be backed by two BDSs, one read-only and one read-write.

 What's still missing here is TCG support.  Perhaps you can emulate
  ioeventfd at the hw/virtio.c level and always go through a host
 notifier.  In retrospect, perhaps it was a mistake to make ioeventfd a
 non-experimental option.
 
 That's true.  We need to emulate ioeventfd.  For virtio-pci we can cheat
 by simply notifying the EventNotifier from the virtqueue kick handler.
 But really we should implement an ioeventfd equivalent at the pio/mmio
 emulation level in QEMU.

Not a big deal anyway (like the irqfd).

Paolo



Re: [Qemu-devel] [PATCH 0/9] introduce virtio net dataplane

2013-02-22 Thread Paolo Bonzini
On 22/02/2013 12:06, Kevin Wolf wrote:
 For example think of
 the read-only qcow2 snapshot BDS that we've been talking about before.
 
 So it would have to be more like one AioContext per connected component
 in the BDS graph

Yes.  For now it means one AioContext per top-level BDS, but that can
change in the future.

 , if one per BDS doesn't work (why doesn't it?)

One AioContext per BDS means one event loop and hence one thread per
BDS, with extra costs in terms of locking, context switching, etc.

Paolo



Re: [Qemu-devel] [PATCH 0/9] introduce virtio net dataplane

2013-02-21 Thread Michael S. Tsirkin
On Thu, Feb 21, 2013 at 08:54:44PM +0800, Liu Ping Fan wrote:
 This is modelled on virtio-blk dataplane, pushing the data
 handling out of the big lock.  It is an attempt to implement this processing
 in userspace, whereas vhost-net does it in the kernel.

What's the motivation for doing this? Speed up the slirp backend
which vhost-net does not support? If yes, I would start with
implementing offloads (segmentation,checksum) in slirp.
This is likely to give you a nice speed boost and is
generally a desirable feature.

 The iperf results show it improves performance over the baseline,
 but it is still slower than vhost-net (maybe the rx path needs optimization).

 Todo:
 implement fine-grained locking for the net subsystem

vhost-net is currently the only way to get zero copy transmit
on linux (and zero copy receive in the future)
so at least in theory it'll always be faster.

 Liu Ping Fan (9):
   vring: split the modification and publish of used ring
   vring: introduce vring_restore() to restore from img
   event poll: make epoll work for normal fd
   event poll: pass event type to event callback
   event poll: enable event poll handle more than one event each time
   virtio net: introduce dataplane for virtio net
   tap: make tap peer work on dedicated data-plane thread
   virtio net: enable dataplane for virtio net
   configure: make virtio net dataplane configurable
 
  configure  |   12 ++
  hw/dataplane/Makefile.objs |4 +-
  hw/dataplane/event-poll.c  |   69 +--
  hw/dataplane/event-poll.h  |   14 ++-
  hw/dataplane/virtio-blk.c  |8 +-
  hw/dataplane/virtio-net.c  |  444 
 
  hw/dataplane/virtio-net.h  |   32 
  hw/dataplane/vring.c   |   37 
  hw/dataplane/vring.h   |3 +
  hw/virtio-net.c|   94 +-
  hw/virtio-net.h|   64 +++
  hw/virtio-pci.c|2 +-
  include/net/tap.h  |6 +
  net/tap.c  |   92 +-
  14 files changed, 806 insertions(+), 75 deletions(-)
  create mode 100644 hw/dataplane/virtio-net.c
  create mode 100644 hw/dataplane/virtio-net.h
 
 -- 
 1.7.4.4
 



Re: [Qemu-devel] [PATCH 0/9] introduce virtio net dataplane

2013-02-21 Thread Anthony Liguori
Michael S. Tsirkin m...@redhat.com writes:

 On Thu, Feb 21, 2013 at 08:54:44PM +0800, Liu Ping Fan wrote:
 This is modelled on virtio-blk dataplane, pushing the data
 handling out of the big lock.  It is an attempt to implement this processing
 in userspace, whereas vhost-net does it in the kernel.

 What's the motivation for doing this?

I haven't looked carefully at the patches, but I don't think we should
cheat when it comes to virtio-net.  I think it's useful to have a
baseline proof of concept but I think we should focus on making the
network layer re-entrant.

In terms of why even bother, if we can make virtio-net a viable
alternative to vhost-net, it's a huge win.  Being in userspace is a huge
advantage.

I understand the challenges with zero-copy but we really need a proper
userspace implementation to determine how much it really matters.
Zero-copy tap is also not entirely outside the realm of possibility.

 The iperf results show it improves performance over the baseline,
 but it is still slower than vhost-net (maybe the rx path needs optimization).

 Todo:
 implement fine-grained locking for the net subsystem

 vhost-net is currently the only way to get zero copy transmit
 on linux (and zero copy receive in the future)
 so at least in theory it'll always be faster.

It might always be faster.  But that doesn't mean we should limit the
performance of virtio-net in userspace.  Some people may be willing to
take a small performance hit in order to obtain the security offered by
being in userspace.

Regards,

Anthony Liguori


 Liu Ping Fan (9):
   vring: split the modification and publish of used ring
   vring: introduce vring_restore() to restore from img
   event poll: make epoll work for normal fd
   event poll: pass event type to event callback
   event poll: enable event poll handle more than one event each time
   virtio net: introduce dataplane for virtio net
   tap: make tap peer work on dedicated data-plane thread
   virtio net: enable dataplane for virtio net
   configure: make virtio net dataplane configurable
 
  configure  |   12 ++
  hw/dataplane/Makefile.objs |4 +-
  hw/dataplane/event-poll.c  |   69 +--
  hw/dataplane/event-poll.h  |   14 ++-
  hw/dataplane/virtio-blk.c  |8 +-
  hw/dataplane/virtio-net.c  |  444 
 
  hw/dataplane/virtio-net.h  |   32 
  hw/dataplane/vring.c   |   37 
  hw/dataplane/vring.h   |3 +
  hw/virtio-net.c|   94 +-
  hw/virtio-net.h|   64 +++
  hw/virtio-pci.c|2 +-
  include/net/tap.h  |6 +
  net/tap.c  |   92 +-
  14 files changed, 806 insertions(+), 75 deletions(-)
  create mode 100644 hw/dataplane/virtio-net.c
  create mode 100644 hw/dataplane/virtio-net.h
 
 -- 
 1.7.4.4
 




Re: [Qemu-devel] [PATCH 0/9] introduce virtio net dataplane

2013-02-21 Thread Stefan Hajnoczi
On Thu, Feb 21, 2013 at 08:54:44PM +0800, Liu Ping Fan wrote:
 This is modelled on virtio-blk dataplane, pushing the data
 handling out of the big lock.  It is an attempt to implement this processing
 in userspace, whereas vhost-net does it in the kernel.
 
 The iperf results show it improves performance over the baseline,
 but it is still slower than vhost-net (maybe the rx path needs optimization).

 Todo:
 implement fine-grained locking for the net subsystem
 
 Liu Ping Fan (9):
   vring: split the modification and publish of used ring
   vring: introduce vring_restore() to restore from img
   event poll: make epoll work for normal fd
   event poll: pass event type to event callback
   event poll: enable event poll handle more than one event each time
   virtio net: introduce dataplane for virtio net
   tap: make tap peer work on dedicated data-plane thread
   virtio net: enable dataplane for virtio net
   configure: make virtio net dataplane configurable
 
  configure  |   12 ++
  hw/dataplane/Makefile.objs |4 +-
  hw/dataplane/event-poll.c  |   69 +--
  hw/dataplane/event-poll.h  |   14 ++-
  hw/dataplane/virtio-blk.c  |8 +-
  hw/dataplane/virtio-net.c  |  444 
 
  hw/dataplane/virtio-net.h  |   32 
  hw/dataplane/vring.c   |   37 
  hw/dataplane/vring.h   |3 +
  hw/virtio-net.c|   94 +-
  hw/virtio-net.h|   64 +++
  hw/virtio-pci.c|2 +-
  include/net/tap.h  |6 +
  net/tap.c  |   92 +-
  14 files changed, 806 insertions(+), 75 deletions(-)
  create mode 100644 hw/dataplane/virtio-net.c
  create mode 100644 hw/dataplane/virtio-net.h

Interesting patch series.  I want to share my thoughts on the status of
dataplane and what I'm currently working on.  There will probably be
some overlap that we can coordinate.

First, I want to eventually delete hw/dataplane/ :).  That is because
dataplane duplicates an event loop, thread-friendly guest RAM access,
and virtio.  QEMU already has all this functionality except it's not
thread-friendly.

Because hw/dataplane/ will eventually go away we should avoid adding new
code where possible.  This especially means things like adding live
migration support or solving other limitations.  The limitations are
there to motivate core QEMU refactoring, so that core QEMU code can be
used directly instead of reimplementing it.

Here's my TODO list:
 * Thread pool - AioContext

   The thread pool uses an EventNotifier to invoke callbacks.  In a
   multiple event loop world we need to invoke callbacks in the event
   loop that they are associated with.

   Worker threads are also started using a BH so that the thread
   inherits the iothread CPU affinity and other characteristics.  I
   think this can stay.

 * CoQueue - AioContext (for the scheduling bottom half)

   CoQueue wakes up coroutines from a BH so that the coroutine that
   asked to wake them up can run to completion or yield first.  We need
   to wake coroutines in the event loop that they belong to.

 * BlockDriverState - AioContext

   It probably makes sense to bind a BlockDriverState to an AioContext
   in which its file descriptors and aio activity happens.

 * bdrv_drain_all()/bdrv_flush_all() safety against concurrent requests

   The monitor, live migration, and other components use
   bdrv_drain_all() to complete all requests.  It also prevents new
   requests from starting since the global mutex is held and no device
   emulation can run.

   We need to support equivalent semantics in a world where
   BlockDriverState may be accessed from another thread and have
   in-flight aio in that event loop.

Once these tasks are complete it should be possible to use
BlockDriverState in virtio-blk dataplane.  This will mean:

 * Image format support
 * Block job support (stream, mirror, commit)
 * I/O throttling support

Then the next steps are:

 * Thread-friendly guest RAM access API
 * Convert hw/virtio.c (if necessary) to use thread-friendly RAM access
 * Assigning ioeventfd to a dedicated thread with its own AioContext
   event loop
 * Thread-friendly interrupt API

This should finally allow hw/dataplane/ to be dropped.  At this point we
get live migration and hotplug support back too.

Then dataplane mode can become the default and the experimental
x-data-plane=on|off option can be dropped.

Stefan



Re: [Qemu-devel] [PATCH 0/9] introduce virtio net dataplane

2013-02-21 Thread Michael S. Tsirkin
On Thu, Feb 21, 2013 at 10:15:58AM -0600, Anthony Liguori wrote:
 Michael S. Tsirkin m...@redhat.com writes:
 
  On Thu, Feb 21, 2013 at 08:54:44PM +0800, Liu Ping Fan wrote:
  This is modelled on virtio-blk dataplane, pushing the data
  handling out of the big lock.  It is an attempt to implement this processing
  in userspace, whereas vhost-net does it in the kernel.
 
  What's the motivation for doing this?
 
 I haven't looked carefully at the patches, but I don't think we should
 cheat when it comes to virtio-net.  I think it's useful to have a
 baseline proof of concept but I think we should focus on making the
 network layer re-entrant.
 
 In terms of why even bother, if we can make virtio-net a viable
 alternative to vhost-net, it's a huge win.  Being in userspace is a huge
 advantage.
 
 I understand the challenges with zero-copy but we really need a proper
 userspace implementation to determine how much it really matters.
 Zero-copy tap is also not entirely outside the realm of possibility.

This is exactly what we have, vhost-net is the interface we use
for asynchronous communication. If you want to add yet another
asynchronous interface in kernel, it's certainly doable but then what's
the point?

  The iperf results show it improves performance over the baseline,
  but it is still slower than vhost-net (maybe the rx path needs optimization).

  Todo:
  implement fine-grained locking for the net subsystem
 
  vhost-net is currently the only way to get zero copy transmit
  on linux (and zero copy receive in the future)
  so at least in theory it'll always be faster.
 
 It might always be faster.  But that doesn't mean we should limit the
 performance of virtio-net in userspace.  Some people may be willing to
 take a small performance hit in order to obtain the security offered by
 being in userspace.
 
 Regards,
 
 Anthony Liguori

By 'being in userspace' I presume you mean the ring processing code.
Ring processing can be done in userspace if you are ready
to use the synchronous read/write operations, skipping
this bunch of code might improve security. But it requires a data
copy almost by definition.


 
  Liu Ping Fan (9):
vring: split the modification and publish of used ring
vring: introduce vring_restore() to restore from img
event poll: make epoll work for normal fd
event poll: pass event type to event callback
event poll: enable event poll handle more than one event each time
virtio net: introduce dataplane for virtio net
tap: make tap peer work on dedicated data-plane thread
virtio net: enable dataplane for virtio net
configure: make virtio net dataplane configurable
  
   configure  |   12 ++
   hw/dataplane/Makefile.objs |4 +-
   hw/dataplane/event-poll.c  |   69 +--
   hw/dataplane/event-poll.h  |   14 ++-
   hw/dataplane/virtio-blk.c  |8 +-
   hw/dataplane/virtio-net.c  |  444 
  
   hw/dataplane/virtio-net.h  |   32 
   hw/dataplane/vring.c   |   37 
   hw/dataplane/vring.h   |3 +
   hw/virtio-net.c|   94 +-
   hw/virtio-net.h|   64 +++
   hw/virtio-pci.c|2 +-
   include/net/tap.h  |6 +
   net/tap.c  |   92 +-
   14 files changed, 806 insertions(+), 75 deletions(-)
   create mode 100644 hw/dataplane/virtio-net.c
   create mode 100644 hw/dataplane/virtio-net.h
  
  -- 
  1.7.4.4
  



Re: [Qemu-devel] [PATCH 0/9] introduce virtio net dataplane

2013-02-21 Thread Anthony Liguori
Michael S. Tsirkin m...@redhat.com writes:

 On Thu, Feb 21, 2013 at 10:15:58AM -0600, Anthony Liguori wrote:
 Michael S. Tsirkin m...@redhat.com writes:
 
  On Thu, Feb 21, 2013 at 08:54:44PM +0800, Liu Ping Fan wrote:
  This is modelled on virtio-blk dataplane, pushing the data
  handling out of the big lock.  It is an attempt to implement this processing
  in userspace, whereas vhost-net does it in the kernel.
 
  What's the motivation for doing this?
 
 I haven't looked carefully at the patches, but I don't think we should
 cheat when it comes to virtio-net.  I think it's useful to have a
 baseline proof of concept but I think we should focus on making the
 network layer re-entrant.
 
 In terms of why even bother, if we can make virtio-net a viable
 alternative to vhost-net, it's a huge win.  Being in userspace is a huge
 advantage.
 
 I understand the challenges with zero-copy but we really need a proper
 userspace implementation to determine how much it really matters.
 Zero-copy tap is also not entirely outside the realm of possibility.

 This is exactly what we have, vhost-net is the interface we use
 for asynchronous communication. If you want to add yet another
 asynchronous interface in kernel, it's certainly doable but then what's
 the point?

  The iperf results show it improves performance over the baseline,
  but it is still slower than vhost-net (maybe the rx path needs optimization).

  Todo:
  implement fine-grained locking for the net subsystem
 
  vhost-net is currently the only way to get zero copy transmit
  on linux (and zero copy receive in the future)
  so at least in theory it'll always be faster.
 
 It might always be faster.  But that doesn't mean we should limit the
 performance of virtio-net in userspace.  Some people may be willing to
 take a small performance hit in order to obtain the security offered by
 being in userspace.
 
 Regards,
 
 Anthony Liguori

 By 'being in userspace' I presume you mean the ring processing code.

I mean that:

1) the same code path is used regardless of network backend

2) QEMU has the ability to manipulate the traffic if it needs/wants to

3) the state of the device model is userspace at all times

4) QEMU is always the front-line interface communicating with the guest

Yes, this could be a model where -netdev vhost-net was the backend used
by QEMU.

 Ring processing can be done in userspace if you are ready
 to use the synchronous read/write operations, skipping
 this bunch of code might improve security. But it requires a data
 copy almost by definition.

Not sure what you mean here.  If you are talking about read/write on a
tap file descriptor, I'm not even suggesting that.

I'm perfectly fine with vhost-net being a -netdev backend.  But that's a
very different model than how we use it today.

Regards,

Anthony Liguori



 
  Liu Ping Fan (9):
vring: split the modification and publish of used ring
vring: introduce vring_restore() to restore from img
event poll: make epoll work for normal fd
event poll: pass event type to event callback
event poll: enable event poll handle more than one event each time
virtio net: introduce dataplane for virtio net
tap: make tap peer work on dedicated data-plane thread
virtio net: enable dataplane for virtio net
configure: make virtio net dataplane configurable
  
   configure  |   12 ++
   hw/dataplane/Makefile.objs |4 +-
   hw/dataplane/event-poll.c  |   69 +--
   hw/dataplane/event-poll.h  |   14 ++-
   hw/dataplane/virtio-blk.c  |8 +-
   hw/dataplane/virtio-net.c  |  444 
  
   hw/dataplane/virtio-net.h  |   32 
   hw/dataplane/vring.c   |   37 
   hw/dataplane/vring.h   |3 +
   hw/virtio-net.c|   94 +-
   hw/virtio-net.h|   64 +++
   hw/virtio-pci.c|2 +-
   include/net/tap.h  |6 +
   net/tap.c  |   92 +-
   14 files changed, 806 insertions(+), 75 deletions(-)
   create mode 100644 hw/dataplane/virtio-net.c
   create mode 100644 hw/dataplane/virtio-net.h
  
  -- 
  1.7.4.4
  




Re: [Qemu-devel] [PATCH 0/9] introduce virtio net dataplane

2013-02-21 Thread Paolo Bonzini
On 21/02/2013 17:33, Stefan Hajnoczi wrote:
 Interesting patch series.  I want to share my thoughts on the status of
 dataplane and what I'm currently working on.  There will probably be
 some overlap that we can coordinate.
 
 First, I want to eventually delete hw/dataplane/ :).  That is because
 dataplane duplicates an event loop, thread-friendly guest RAM access,
 and virtio.  QEMU already has all this functionality except it's not
 thread-friendly.
 
 Because hw/dataplane/ will eventually go away we should avoid adding new
 code where possible.

100% agree.  In particular hw/dataplane/event-poll.c should be the first
to go away, but AioContext provides the functionality that Ping Fan
needs.  But hw/dataplane/vring.c will probably be here for a longer
time, so I don't see a problem in touching that.

 This especially means things like adding live
 migration support or solving other limitations.  The limitations are
 there to motivate core QEMU refactoring, so that core QEMU code can be
 used directly instead of reimplementing it.
 
 Here's my TODO list:
  * Thread pool - AioContext
 
The thread pool uses an EventNotifier to invoke callbacks.  In a
multiple event loop world we need to invoke callbacks in the event
loop that they are associated with.
 
Worker threads are also started using a BH so that the thread
inherits the iothread CPU affinity and other characteristics.  I
think this can stay.

It can and should stay. :)  BHs are tied to an AioContext.  Running
thread-creation BHs from the AioContext's main thread ensures that the
worker threads inherit the right attributes for e.g. real-time.

  * BlockDriverState - AioContext
 
It probably makes sense to bind a BlockDriverState to an AioContext
in which its file descriptors and aio activity happens.

... and have the full chain of BlockDriverStates (bs->file,
bs->backing_file and any others as in the vmdk driver) bound to a single
AioContext.  We will use two BlockDriverStates if the same backing file
happens to be shared by two disks (BTW, anybody ever checked what kind
of havoc happens if you commit to such a BDS? :)).

  * bdrv_drain_all()/bdrv_flush_all() safety against concurrent requests
 
The monitor, live migration, and other components use
bdrv_drain_all() to complete all requests.  It also prevents new
requests from starting since the global mutex is held and no device
emulation can run.
 
We need to support equivalent semantics in a world where
BlockDriverState may be accessed from another thread and have
in-flight aio in that event loop.

I think this has two parts.  The first is to make bdrv_drain_all() more
fine-grained.  Many calls can be changed to bdrv_drain() that operates
on a chain of BlockDriverStates (defined as above).  Initially I was
thinking of adding a locking API to AioContext, like:

aio_lock(bdrv_get_aio_context(bs));
bdrv_drain(bs);
...
... do work
...
aio_unlock(bdrv_get_aio_context(bs));

However, this is not enough, because there are some cases in which
bdrv_drain_all() is genuinely needed; for example atomic snapshots.  And
in general we need to stop ioeventfd processing if monitor code runs
while the VCPU is not paused.

So I think we need to register handlers from the AioContext that disable
and enable the ioeventfd:

aio_grab(bdrv_get_aio_context(bs));
bdrv_drain(bs);
...
... do work
...
aio_release(bdrv_get_aio_context(bs));

The actual functions called by aio_grab/aio_release would be defined by
the dataplane thread.

I'm not sure this would be enough to make locking entirely private to
the AioContext, but I hope so.

Anyhow, at this point hw/dataplane/ioq.c goes.  This is IMO doable for 1.5.

  * Thread-friendly guest RAM access API

... and hw/dataplane/hostmem.c goes...

  * Convert hw/virtio.c (if necessary) to use thread-friendly RAM access

... and hw/dataplane/vring.c goes.

  * Assigning ioeventfd to a dedicated thread with its own AioContext
event loop

I think here it starts not to be a problem.  I think you can already
have live migration and hotplug at this point, but I may be missing
something.

  * Thread-friendly interrupt API

What is this?

What's still missing here is TCG support.  Perhaps you can emulate
ioeventfd at the hw/virtio.c level and always go through a host
notifier.  In retrospect, perhaps it was a mistake to make ioeventfd a
non-experimental option.

Paolo



Re: [Qemu-devel] [PATCH 0/9] introduce virtio net dataplane

2013-02-21 Thread Michael S. Tsirkin
On Thu, Feb 21, 2013 at 11:02:31AM -0600, Anthony Liguori wrote:
 Michael S. Tsirkin m...@redhat.com writes:
 
  On Thu, Feb 21, 2013 at 10:15:58AM -0600, Anthony Liguori wrote:
  Michael S. Tsirkin m...@redhat.com writes:
  
   On Thu, Feb 21, 2013 at 08:54:44PM +0800, Liu Ping Fan wrote:
   This is modelled on virtio-blk dataplane, pushing the data
   handling out of the big lock.  It is an attempt to implement this processing
   in userspace, whereas vhost-net does it in the kernel.
  
   What's the motivation for doing this?
  
  I haven't looked carefully at the patches, but I don't think we should
  cheat when it comes to virtio-net.  I think it's useful to have a
  baseline proof of concept but I think we should focus on making the
  network layer re-entrant.
  
  In terms of why even bother, if we can make virtio-net a viable
  alternative to vhost-net, it's a huge win.  Being in userspace is a huge
  advantage.
  
  I understand the challenges with zero-copy but we really need a proper
  userspace implementation to determine how much it really matters.
  Zero-copy tap is also not entirely outside the realm of possibility.
 
  This is exactly what we have, vhost-net is the interface we use
  for asynchronous communication. If you want to add yet another
  asynchronous interface in kernel, it's certainly doable but then what's
  the point?
 
   The iperf results show it improves performance over the baseline,
   but it is still slower than vhost-net (maybe the rx path needs optimization).

   Todo:
   implement fine-grained locking for the net subsystem
  
   vhost-net is currently the only way to get zero copy transmit
   on linux (and zero copy receive in the future)
   so at least in theory it'll always be faster.
  
  It might always be faster.  But that doesn't mean we should limit the
  performance of virtio-net in userspace.  Some people may be willing to
  take a small performance hit in order to obtain the security offered by
  being in userspace.
  
  Regards,
  
  Anthony Liguori
 
  By 'being in userspace' I presume you mean the ring processing code.
 
 I mean that:
 
 1) the same code path is used regardless of network backend
 
 2) QEMU has the ability to manipulate the traffic if it needs/wants to
 
 3) the state of the device model is userspace at all times
 
 4) QEMU is always the front-line interface communicating with the guest
 
 Yes, this could be a model where -netdev vhost-net was the backend used
 by QEMU.
 
  Ring processing can be done in userspace if you are ready to use the
  synchronous read/write operations; skipping this bunch of code might
  improve security.  But it requires a data copy almost by definition.
 
 Not sure what you mean here.  If you are talking about read/write on a
 tap file descriptor, I'm not even suggesting that.
 
 I'm perfectly fine with vhost-net being a -netdev backend.  But that's a
 very different model than how we use it today.
 
 Regards,
 
 Anthony Liguori

It's not hard to make the first step there.  Basically, make -netdev
vhost-net an alias for -netdev tap, with vhost=on as the default.
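
For illustration (the -netdev vhost-net spelling is the hypothetical
alias; the tap form works today):

    qemu-system-x86_64 ... -netdev tap,id=net0,vhost=on \
                           -device virtio-net-pci,netdev=net0

    # proposed alias, equivalent to the above:
    qemu-system-x86_64 ... -netdev vhost-net,id=net0 \
                           -device virtio-net-pci,netdev=net0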

For non-virtio frontends, this needs a userspace driver for vhost-net.
I started work on this at some point, since it would let me write unit
tests for virtio, but never finished.
Any takers?


 
 
  
   Liu Ping Fan (9):
 vring: split the modification and publish of used ring
 vring: introduce vring_restore() to restore from img
 event poll: make epoll work for normal fd
 event poll: pass event type to event callback
 event poll: enable event poll handle more than one event each time
 virtio net: introduce dataplane for virtio net
 tap: make tap peer work on dedicated data-plane thread
 virtio net: enable dataplane for virtio net
 configure: make virtio net dataplane configurable
   
configure  |   12 ++
hw/dataplane/Makefile.objs |4 +-
hw/dataplane/event-poll.c  |   69 +--
hw/dataplane/event-poll.h  |   14 ++-
hw/dataplane/virtio-blk.c  |8 +-
hw/dataplane/virtio-net.c  |  444 
   
hw/dataplane/virtio-net.h  |   32 
hw/dataplane/vring.c   |   37 
hw/dataplane/vring.h   |3 +
hw/virtio-net.c|   94 +-
hw/virtio-net.h|   64 +++
hw/virtio-pci.c|2 +-
include/net/tap.h  |6 +
net/tap.c  |   92 +-
14 files changed, 806 insertions(+), 75 deletions(-)
create mode 100644 hw/dataplane/virtio-net.c
create mode 100644 hw/dataplane/virtio-net.h
   
   -- 
   1.7.4.4
   



Re: [Qemu-devel] [PATCH 0/9] introduce virtio net dataplane

2013-02-21 Thread Michael S. Tsirkin
On Thu, Feb 21, 2013 at 06:45:54PM +0200, Michael S. Tsirkin wrote:
 On Thu, Feb 21, 2013 at 10:15:58AM -0600, Anthony Liguori wrote:
  Michael S. Tsirkin m...@redhat.com writes:
  
   On Thu, Feb 21, 2013 at 08:54:44PM +0800, Liu Ping Fan wrote:
    This is an emulation of virtio-blk dataplane, which pushes the data
    handling out of the big lock.  It is an attempt to implement this
    process in userspace, whereas vhost-net does it in the kernel.
  
   What's the motivation for doing this?
  
  I haven't looked carefully at the patches, but I don't think we should
  cheat when it comes to virtio-net.  I think it's useful to have a
  baseline proof of concept but I think we should focus on making the
  network layer re-entrant.
  
  In terms of why even bother, if we can make virtio-net a viable
  alternative to vhost-net, it's a huge win.  Being in userspace is a huge
  advantage.
  
  I understand the challenges with zero-copy but we really need a proper
  userspace implementation to determine how much it really matters.
  Zero-copy tap is also not entirely outside the realm of possibility.
 
 This is exactly what we have, vhost-net is the interface we use
 for asynchronous communication. If you want to add yet another
 asynchronous interface in kernel, it's certainly doable but then what's
 the point?
 
    The iperf results show it improves on the baseline performance, but
    it is still slower than vhost-net (maybe the rx path needs to be
    optimized).
   
    Todo:
    implement fine-grained locking for the net subsystem
  
   vhost-net is currently the only way to get zero copy transmit
   on linux (and zero copy receive in the future)
   so at least in theory it'll always be faster.
  
  It might always be faster.  But that doesn't mean we should limit the
  performance of virtio-net in userspace.  Some people may be willing to
  take a small performance hit in order to obtain the security offered by
  being in userspace.
  
  Regards,
  
  Anthony Liguori
 
 By 'being in userspace' I presume you mean the ring processing code.
 Ring processing can be done in userspace if you are ready to use the
 synchronous read/write operations; skipping this bunch of code might
 improve security.  But it requires a data copy almost by definition.
 
 

BTW I think we do have a problem at the moment: virtio-net can process
the same ring both in userspace (for level-triggered interrupts) and in
the kernel (for edge-triggered ones), and which path is used is under
guest control.  That doubles the number of potential security issues.
A solution, in my opinion, is to improve the handling of level
interrupts to the point where we can make vhost-net the default for
level-triggered interrupts as well.  The new support for level IRQFD in
KVM should make this possible.
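
For reference, a rough sketch of the level irqfd mechanism mentioned
above, assuming the KVM_IRQFD_FLAG_RESAMPLE ABI; error handling and the
surrounding device model are omitted:

#include <linux/kvm.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>

/* Sketch: bind a level-triggered irqfd plus a resample eventfd to a GSI.
 * vhost-net (or a dataplane thread) writes irq_fd to assert the line;
 * KVM signals resample_fd on guest EOI so the device can decide whether
 * the line should stay asserted. */
static int assign_level_irqfd(int vmfd, unsigned gsi)
{
    int irq_fd = eventfd(0, EFD_CLOEXEC);
    int resample_fd = eventfd(0, EFD_CLOEXEC);
    struct kvm_irqfd irqfd = {
        .fd = irq_fd,
        .gsi = gsi,
        .flags = KVM_IRQFD_FLAG_RESAMPLE,
        .resamplefd = resample_fd,
    };

    if (irq_fd < 0 || resample_fd < 0) {
        return -1;
    }
    return ioctl(vmfd, KVM_IRQFD, &irqfd);
}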

Unfortunately, this prototype moves in exactly the reverse
direction.

  
   Liu Ping Fan (9):
 vring: split the modification and publish of used ring
 vring: introduce vring_restore() to restore from img
 event poll: make epoll work for normal fd
 event poll: pass event type to event callback
 event poll: enable event poll handle more than one event each time
 virtio net: introduce dataplane for virtio net
 tap: make tap peer work on dedicated data-plane thread
 virtio net: enable dataplane for virtio net
 configure: make virtio net dataplane configurable
   
configure  |   12 ++
hw/dataplane/Makefile.objs |4 +-
hw/dataplane/event-poll.c  |   69 +--
hw/dataplane/event-poll.h  |   14 ++-
hw/dataplane/virtio-blk.c  |8 +-
hw/dataplane/virtio-net.c  |  444 
   
hw/dataplane/virtio-net.h  |   32 
hw/dataplane/vring.c   |   37 
hw/dataplane/vring.h   |3 +
hw/virtio-net.c|   94 +-
hw/virtio-net.h|   64 +++
hw/virtio-pci.c|2 +-
include/net/tap.h  |6 +
net/tap.c  |   92 +-
14 files changed, 806 insertions(+), 75 deletions(-)
create mode 100644 hw/dataplane/virtio-net.c
create mode 100644 hw/dataplane/virtio-net.h
   
   -- 
   1.7.4.4
   



Re: [Qemu-devel] [PATCH 0/9] introduce virtio net dataplane

2013-02-21 Thread mdroth
On Thu, Feb 21, 2013 at 06:05:44PM +0100, Paolo Bonzini wrote:
 Il 21/02/2013 17:33, Stefan Hajnoczi ha scritto:
  Interesting patch series.  I want to share my thoughts on the status of
  dataplane and what I'm currently working on.  There will probably be
  some overlap that we can coordinate.
  
  First, I want to eventually delete hw/dataplane/ :).  That is because
  dataplane duplicates an event loop, thread-friendly guest RAM access,
  and virtio.  QEMU already has all this functionality except it's not
  thread-friendly.
  
  Because hw/dataplane/ will eventually go away we should avoid adding new
  code where possible.
 
 100% agree.  In particular hw/dataplane/event-poll.c should be the first
 to go away, but AioContext provides the functionality that Ping Fan
 needs.  But hw/dataplane/vring.c will probably be here for a longer

Has there been any discussion around introducing something similar to
AioContexts for fd handlers? This would avoid the dataplane-specific hooks
needed for NetClients in this series.



Re: [Qemu-devel] [PATCH 0/9] introduce virtio net dataplane

2013-02-21 Thread Paolo Bonzini
Il 21/02/2013 22:07, mdroth ha scritto:
  
  100% agree.  In particular hw/dataplane/event-poll.c should be the first
  to go away, but AioContext provides the functionality that Ping Fan
  needs.  But hw/dataplane/vring.c will probably be here for a longer
 Has there been any discussion around introducing something similar to
 AioContexts for fd handlers? This would avoid the dataplane-specific hooks
 needed for NetClients in this series.

AioContext can include file descriptors on POSIX systems (used for NBD
and other network backends), see aio_set_fd_handler.
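
For example, moving one backend fd off the global IOHandler list and
onto an AioContext could look roughly like this.  NetBackend is a
made-up placeholder; the prototypes assumed are the current
qemu_set_fd_handler(fd, read, write, opaque) and
aio_set_fd_handler(ctx, fd, read, write, io_flush, opaque), which may
change.

typedef struct NetBackend {
    int fd;
    void (*read_packets)(struct NetBackend *nb);
} NetBackend;

static void backend_read(void *opaque)
{
    NetBackend *nb = opaque;

    nb->read_packets(nb);   /* same rx path, now driven by aio_poll() */
}

static void backend_attach_to_ctx(NetBackend *nb, AioContext *ctx)
{
    /* detach from the global main loop... */
    qemu_set_fd_handler(nb->fd, NULL, NULL, NULL);
    /* ...and register with the dataplane thread's AioContext instead */
    aio_set_fd_handler(ctx, nb->fd, backend_read, NULL, NULL, nb);
}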

Paolo



Re: [Qemu-devel] [PATCH 0/9] introduce virtio net dataplane

2013-02-21 Thread mdroth
On Thu, Feb 21, 2013 at 11:12:01PM +0100, Paolo Bonzini wrote:
 Il 21/02/2013 22:07, mdroth ha scritto:
   
   100% agree.  In particular hw/dataplane/event-poll.c should be the first
   to go away, but AioContext provides the functionality that Ping Fan
   needs.  But hw/dataplane/vring.c will probably be here for a longer
  Has there been any discussion around introducing something similar to
  AioContexts for fd handlers? This would avoid the dataplane-specific hooks
  needed for NetClients in this series.
 
 AioContext can include file descriptors on POSIX systems (used for NBD
 and other network backends), see aio_set_fd_handler.

Sorry, I was using 'fd handlers' too generally.  I mean specifically the
qemu_set_fd_handler() interfaces, where we currently rely on a single
list of IOHandlerRecords for registration and a single loop to drive
them.  It would be nice if we could drive subsets of those via mini main
loops, similar to the way dataplane threads would with a particular
AioContext via aio_poll() (or perhaps in exactly the same way).

Currently, Ping Fan's patches basically do this already by accessing a
global reference to the vnet worker thread and attaching events/handlers
to its event loop via a new set of registration functions (PATCH 7).

I think generalizing this by basing qemu_set_fd_handler() around
AioContext, or something similar, would help to extend support to other
NetClient implementations without requiring dataplane-specific hooks
throughout.
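
A rough sketch of that generalization, purely as an assumption about how
it could look (iohandler_ctx is a made-up name, and the
aio_set_fd_handler() prototype with the io_flush argument is assumed):

static AioContext *iohandler_ctx;

/* Hypothetical: back qemu_set_fd_handler() with a process-global
 * AioContext instead of the IOHandlerRecord list, so the same
 * registration call can later be pointed at a per-thread context. */
int qemu_set_fd_handler(int fd, IOHandler *fd_read, IOHandler *fd_write,
                        void *opaque)
{
    if (!iohandler_ctx) {
        iohandler_ctx = aio_context_new();
        /* the main loop would have to pump this context's GSource */
    }
    aio_set_fd_handler(iohandler_ctx, fd, fd_read, fd_write, NULL, opaque);
    return 0;
}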

 
 Paolo