Re: [Qemu-devel] [PATCH 0/9] introduce virtio net dataplane
On Thu, Feb 21, 2013 at 11:02:31AM -0600, Anthony Liguori wrote:
> Michael S. Tsirkin m...@redhat.com writes:
>> On Thu, Feb 21, 2013 at 10:15:58AM -0600, Anthony Liguori wrote:
>>> Michael S. Tsirkin m...@redhat.com writes:
>>>> On Thu, Feb 21, 2013 at 08:54:44PM +0800, Liu Ping Fan wrote:
>>>>> This is an emulation of virtio-blk dataplane, which pushes the data
>>>>> handling out of the big lock. It is an attempt to implement this
>>>>> processing in userspace, where vhost-net does it in the kernel.
>>>> What's the motivation for doing this?
>>> I haven't looked carefully at the patches, but I don't think we should
>>> cheat when it comes to virtio-net. I think it's useful to have a
>>> baseline proof of concept, but I think we should focus on making the
>>> network layer re-entrant.

By the way, I agree on this one. We allowed the cheat for block because
there's no vhost-blk upstream yet.

--
MST
Re: [Qemu-devel] [PATCH 0/9] introduce virtio net dataplane
On Tue, Feb 26, 2013 at 1:53 AM, Paolo Bonzini pbonz...@redhat.com wrote:
> On 25/02/2013 18:35, mdroth wrote:
>>> Moving more of os_host_main_loop_wait to AioContext would be possible
>>> (timers are on the todo list, in fact), but we should only do it as
>>> need arises.
>> Were you planning on hanging another GSource off of AioContext to
>> handle timers?
> No, I'm planning to merge qemu-timer.c into it. I don't want to turn
> AioContext into a bad copy of the glib main loop. AioContext should
> keep the concepts of the QEMU block layer.
>> We could consolidate qemu_set_fd_handler()/qemu_aio_set_fd_handler()
>> on POSIX by teaching the current GSource about fd_read_poll functions,
>> and on Windows qemu_set_fd_handler() would tie into a winsock-specific
>> GSource that we register with an AioContext. Might be able to do
>> similar with GSources for slirp and the qemu_add_wait_object() stuff.
> Consolidating qemu_set_fd_handler and slirp to GSources is a good thing
> to do, indeed. The assumption here is that qemu_set_fd_handler and
> slirp do not need special locking or priority constructs, while
> everything else can use AioContext. AioContext could also grow other
> implementations that use epoll/kqueue/whatever.
>> Yup, don't mean to get ahead of things; my main interest is just in
>> how we might deal with the interaction between NetClients and
>> virtio-net dataplane threads without introducing ad-hoc,
>> dataplane-specific mechanisms. If there was a general way for a Nic to
>> tell its NetClient peer "hey, I have my own thread/main loop, here's
>> my {Aio,*}Context, register your handlers there instead", I think this
>> series might look a lot more realistic as a default, or at least make
>> merging it less risky.
> Yes, I see the point. The main blocker to this series seems to be hubs,
> because they interact with multiple NetClients and thus could span
> multiple AioContexts. Adding proper locking there is going to be
> interesting. :)

I think we can attach hub ports to different AioContexts and dispatch
each on its corresponding one. What about:

    net_hub_receive()
    {
        for_each_hub_port {
            if (dstport->aio == srcport->aio) {
                qemu_send_packet();
            } else {
                qemu_net_queue_append(dstport);
                /* signal dstport's eventfd, whose handler will call
                 * nc->receive() */
            }
        }
    }

Regards,
Pingfan

> But otherwise, I don't think we would have many hacks, if any. Unlike
> the block layer, networking is quite self-contained and there isn't
> much magic to interact with it from the monitor; and for the block
> layer we already have an idea of how to deal with concurrency.
>
> Paolo
>> But the right way to do that seems to tie into the discussion around
>> making other aio sources more GMainContext/AioContext-ish.
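For concreteness, here is a self-contained toy model of the dispatch
scheme Pingfan sketches above. It is a sketch under stated assumptions:
HubPort, port_deliver() and the context-id field are illustrative
stand-ins, not QEMU's NetClientState/hub types; the real patches would
queue NetQueue packets and drain them from the destination AioContext's
eventfd handler.

    /* Toy model: same-context ports get the packet directly; other
     * ports get it queued and their event loop is kicked via eventfd. */
    #include <stdint.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/eventfd.h>

    #define QUEUE_LEN 16

    typedef struct HubPort {
        int aio_ctx_id;                /* which event-loop thread owns this port */
        int notify_fd;                 /* eventfd watched by that loop */
        const char *queued[QUEUE_LEN]; /* packets deferred to the owning loop */
        int nqueued;
    } HubPort;

    static void port_deliver(HubPort *p, const char *pkt)
    {
        printf("port on ctx %d: deliver %s\n", p->aio_ctx_id, pkt);
    }

    static void hub_receive(HubPort *ports, int nports, HubPort *src,
                            const char *pkt)
    {
        for (int i = 0; i < nports; i++) {
            HubPort *dst = &ports[i];
            uint64_t one = 1;

            if (dst == src) {
                continue;
            }
            if (dst->aio_ctx_id == src->aio_ctx_id) {
                port_deliver(dst, pkt);            /* same loop: call directly */
            } else if (dst->nqueued < QUEUE_LEN) {
                dst->queued[dst->nqueued++] = pkt; /* defer to the owning loop */
                (void)write(dst->notify_fd, &one, sizeof(one)); /* kick it */
            }
        }
    }

    int main(void)
    {
        HubPort ports[2] = {
            { .aio_ctx_id = 0, .notify_fd = eventfd(0, 0) },
            { .aio_ctx_id = 1, .notify_fd = eventfd(0, 0) },
        };

        hub_receive(ports, 2, &ports[0], "hello");
        /* ports[1]'s loop would now wake on notify_fd and drain queued[] */
        printf("port 1 has %d queued packet(s)\n", ports[1].nqueued);
        return 0;
    }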
Re: [Qemu-devel] [PATCH 0/9] introduce virtio net dataplane
On Fri, Feb 22, 2013 at 12:51 AM, Michael S. Tsirkin m...@redhat.com wrote:
> On Thu, Feb 21, 2013 at 06:45:54PM +0200, Michael S. Tsirkin wrote:
>> On Thu, Feb 21, 2013 at 10:15:58AM -0600, Anthony Liguori wrote:
>>> Michael S. Tsirkin m...@redhat.com writes:
>>>> On Thu, Feb 21, 2013 at 08:54:44PM +0800, Liu Ping Fan wrote:
>>>>> This is an emulation of virtio-blk dataplane, which pushes the data
>>>>> handling out of the big lock. It is an attempt to implement this
>>>>> processing in userspace, where vhost-net does it in the kernel.
>>>> What's the motivation for doing this?
>>> I haven't looked carefully at the patches, but I don't think we should
>>> cheat when it comes to virtio-net. I think it's useful to have a
>>> baseline proof of concept, but I think we should focus on making the
>>> network layer re-entrant.
>>>
>>> In terms of why even bother: if we can make virtio-net a viable
>>> alternative to vhost-net, it's a huge win. Being in userspace is a
>>> huge advantage. I understand the challenges with zero-copy, but we
>>> really need a proper userspace implementation to determine how much
>>> it really matters. Zero-copy tap is also not entirely outside the
>>> realm of possibility.
>> This is exactly what we have; vhost-net is the interface we use for
>> asynchronous communication. If you want to add yet another
>> asynchronous interface in the kernel, it's certainly doable, but then
>> what's the point?
>>>>> The iperf results show it improving on the baseline performance,
>>>>> but it is still slower than vhost-net (maybe the rx path needs
>>>>> optimization).
>>>>>
>>>>> Todo: implement fine-grained locking for the net subsystem
>>>> vhost-net is currently the only way to get zero-copy transmit on
>>>> Linux (and zero-copy receive in the future), so at least in theory
>>>> it'll always be faster.
>>> It might always be faster. But that doesn't mean we should limit the
>>> performance of virtio-net in userspace. Some people may be willing to
>>> take a small performance hit in order to obtain the security offered
>>> by being in userspace.
>>>
>>> Regards,
>>>
>>> Anthony Liguori
>> By 'being in userspace' I presume you mean the ring processing code.
>> Ring processing can be done in userspace if you are ready to use the
>> synchronous read/write operations; skipping this bunch of code might
>> improve security. But it requires a data copy almost by definition.
> BTW I think we do have a problem at the moment, and that is that virtio
> net can process the same ring both in userspace (for level interrupts)
> and in kernel (for edge interrupts).

When using virtio-net, /proc/interrupts shows IO-APIC-edge, the same as
with vhost-net, so does virtio-net use level interrupts at all?

> And which one is selected is under guest control. This is twice the
> number of potential security issues. A solution, in my opinion, is to
> improve handling of level interrupts to the point where we can make
> vhost-net the default for level as well. The new support for level
> IRQFD in kvm should make this possible.

It seems QEMU lacks something like an eoi-eventfd for level interrupts,
right?

Thanks and regards,
Pingfan

> Unfortunately, this prototype moves in exactly the reverse direction.
>>>>> [patch list and diffstat snipped]
Re: [Qemu-devel] [PATCH 0/9] introduce virtio net dataplane
On Wed, Feb 27, 2013 at 05:41:40PM +0800, liu ping fan wrote:
> On Fri, Feb 22, 2013 at 12:51 AM, Michael S. Tsirkin m...@redhat.com wrote:
> [...]
>> BTW I think we do have a problem at the moment, and that is that
>> virtio net can process the same ring both in userspace (for level
>> interrupts) and in kernel (for edge interrupts).
> When using virtio-net, /proc/interrupts shows IO-APIC-edge, the same as
> with vhost-net, so does virtio-net use level interrupts at all?

virtio *can* use level interrupts, and it's under guest control; e.g.
simply boot with pci=nomsi. Which means we have two implementations, and
the guest controls which one is used. If we always use the kernel one,
we reduce the attack surface. This can already be done with
vhostforce=on, but it hurts the speed (a bit) for very old guests (e.g.
Windows XP), as interrupts are bounced through qemu.

>> And which one is selected is under guest control. This is twice the
>> number of potential security issues. A solution, in my opinion, is to
>> improve handling of level interrupts to the point where we can make
>> vhost-net the default for level as well. The new support for level
>> IRQFD in kvm should make this possible.
> It seems QEMU lacks something like an eoi-eventfd for level interrupts,
> right?
>
> Thanks and regards,
> Pingfan

No. See KVM_IRQFD_FLAG_RESAMPLE.

>> Unfortunately, this prototype moves in exactly the reverse direction.
>> [patch list and diffstat snipped]
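For reference, the level-irqfd mechanism MST points to looks roughly
like this. This is a minimal sketch against the real KVM_IRQFD ioctl; it
assumes a kernel with KVM_CAP_IRQFD_RESAMPLE and an already-created VM
file descriptor, and trims error handling.

    #include <stdio.h>
    #include <sys/eventfd.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    /* Register a resampling irqfd for a level-triggered GSI. */
    int register_level_irqfd(int vm_fd, unsigned gsi, int *resample_out)
    {
        int irq_fd = eventfd(0, 0);      /* device/vhost signals this to
                                            assert the interrupt */
        int resample_fd = eventfd(0, 0); /* KVM signals this on guest EOI */

        struct kvm_irqfd req = {
            .fd         = irq_fd,
            .gsi        = gsi,
            .flags      = KVM_IRQFD_FLAG_RESAMPLE,
            .resamplefd = resample_fd,
        };

        if (ioctl(vm_fd, KVM_IRQFD, &req) < 0) {
            perror("KVM_IRQFD");
            return -1;
        }

        /* On EOI, KVM de-asserts the line and signals resample_fd; the
         * device re-signals irq_fd if its interrupt condition still
         * holds. This is effectively the eoi-eventfd Pingfan asks about. */
        *resample_out = resample_fd;
        return irq_fd;
    }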
Re: [Qemu-devel] [PATCH 0/9] introduce virtio net dataplane
Stefan Hajnoczi stefa...@gmail.com writes:
> On Mon, Feb 25, 2013 at 06:53:04PM +0100, Paolo Bonzini wrote:
>> On 25/02/2013 18:35, mdroth wrote:
>>> Yup, don't mean to get ahead of things; my main interest is just in
>>> how we might deal with the interaction between NetClients and
>>> virtio-net dataplane threads without introducing ad-hoc,
>>> dataplane-specific mechanisms. If there was a general way for a Nic
>>> to tell its NetClient peer "hey, I have my own thread/main loop,
>>> here's my {Aio,*}Context, register your handlers there instead", I
>>> think this series might look a lot more realistic as a default, or at
>>> least make merging it less risky.
>> Yes, I see the point. The main blocker to this series seems to be
>> hubs, because they interact with multiple NetClients and thus could
>> span multiple AioContexts. Adding proper locking there is going to be
>> interesting. :)
> I think folks are pretty fed up with QEMU vlans (hubs). Someone braver
> and with more time on their hands than me might be able to kill hubs
> without breaking their few use cases (-net dump, using the hub with
> -net socket).

Figuring out the 'interesting' locking leaves us with 'interesting'
locking code, and still stuck with vlans. Killing vlans would avoid the
locking problems, and get rid of a fertile source of 'interesting'
problems.

Guess how I'd prefer to see the locking problem solved :)
Re: [Qemu-devel] [PATCH 0/9] introduce virtio net dataplane
On 27/02/2013 13:08, Markus Armbruster wrote:
>>> Yes, I see the point. The main blocker to this series seems to be
>>> hubs, because they interact with multiple NetClients and thus could
>>> span multiple AioContexts. Adding proper locking there is going to be
>>> interesting. :)
>> I think folks are pretty fed up with QEMU vlans (hubs). Someone braver
>> and with more time on their hands than me might be able to kill hubs
>> without breaking their few use cases (-net dump, using the hub with
>> -net socket).
> Figuring out the 'interesting' locking leaves us with 'interesting'
> locking code, and still stuck with vlans. Killing vlans would avoid the
> locking problems, and get rid of a fertile source of 'interesting'
> problems.
>
> Guess how I'd prefer to see the locking problem solved :)

True, though I'd love to be proved wrong. After all, adding proper
locking to block migration was anything but difficult (once I tried).

Paolo
Re: [Qemu-devel] [PATCH 0/9] introduce virtio net dataplane
On Mon, Feb 25, 2013 at 06:53:04PM +0100, Paolo Bonzini wrote:
> On 25/02/2013 18:35, mdroth wrote:
>> Yup, don't mean to get ahead of things; my main interest is just in
>> how we might deal with the interaction between NetClients and
>> virtio-net dataplane threads without introducing ad-hoc,
>> dataplane-specific mechanisms. If there was a general way for a Nic to
>> tell its NetClient peer "hey, I have my own thread/main loop, here's
>> my {Aio,*}Context, register your handlers there instead", I think this
>> series might look a lot more realistic as a default, or at least make
>> merging it less risky.
> Yes, I see the point. The main blocker to this series seems to be hubs,
> because they interact with multiple NetClients and thus could span
> multiple AioContexts. Adding proper locking there is going to be
> interesting. :)

I think folks are pretty fed up with QEMU vlans (hubs). Someone braver
and with more time on their hands than me might be able to kill hubs
without breaking their few use cases (-net dump, using the hub with
-net socket).

Stefan
Re: [Qemu-devel] [PATCH 0/9] introduce virtio net dataplane
On Fri, Feb 22, 2013 at 09:53:11AM +0100, Paolo Bonzini wrote:
> On 22/02/2013 00:38, mdroth wrote:
>> On Thu, Feb 21, 2013 at 11:12:01PM +0100, Paolo Bonzini wrote:
>>> On 21/02/2013 22:07, mdroth wrote:
>>>>> 100% agree. In particular hw/dataplane/event-poll.c should be the
>>>>> first to go away, but AioContext provides the functionality that
>>>>> Ping Fan needs. But hw/dataplane/vring.c will probably be here for
>>>>> a longer time.
>>>> Has there been any discussion around introducing something similar
>>>> to AioContexts for fd handlers? This would avoid the
>>>> dataplane-specific hooks needed for NetClients in this series.
>>> AioContext can include file descriptors on POSIX systems (used for
>>> NBD and other network backends), see aio_set_fd_handler.
>> Sorry, I was using "fd handlers" too generally. I mean specifically
>> the qemu_set_fd_handler interfaces, where we currently rely on a
>> single list of IOHandlerRecords for registration and a single loop to
>> drive them. It would be nice if we could drive subsets of those via
>> mini main loops, similar to the way dataplane threads would with a
>> particular AioContext via aio_poll (or perhaps the exact same way).
> Yes, that's what I meant actually. You can already do it for POSIX;
> unfortunately, Windows poses extra complication because sockets are
> not handles.
>
> Moving more of os_host_main_loop_wait to AioContext would be possible
> (timers are on the todo list, in fact), but we should only do it as
> need arises.

Were you planning on hanging another GSource off of AioContext to
handle timers? I wonder if adding an interface to AioContext that allows
you to just register an arbitrary GSource might be the right approach.

We could consolidate qemu_set_fd_handler()/qemu_aio_set_fd_handler() on
POSIX by teaching the current GSource about fd_read_poll functions, and
on Windows qemu_set_fd_handler() would tie into a winsock-specific
GSource that we register with an AioContext. Might be able to do similar
with GSources for slirp and the qemu_add_wait_object() stuff.

Or maybe that's the plan already... glib-style main loops, but not
intrinsically tethered to GMainContext so we can implement our own
locking/priority/loop constructs if need be... seems to be what's been
discussed elsewhere; just trying to get an idea of what that might look
like.

> possible (timers are on the todo list, in fact), but we should only do
> it as need arises.

Yup, don't mean to get ahead of things; my main interest is just in how
we might deal with the interaction between NetClients and virtio-net
dataplane threads without introducing ad-hoc, dataplane-specific
mechanisms. If there was a general way for a Nic to tell its NetClient
peer "hey, I have my own thread/main loop, here's my {Aio,*}Context,
register your handlers there instead", I think this series might look a
lot more realistic as a default, or at least make merging it less risky.

But the right way to do that seems to tie into the discussion around
making other aio sources more GMainContext/AioContext-ish.

> Paolo
>> Currently, Ping Fan's patches basically do this already by accessing
>> a global reference to the vnet worker thread and attaching
>> events/handlers to its event loop via a new set of registration
>> functions (PATCH 7). I think generalizing this by basing
>> qemu_set_fd_handler() around AioContext, or something similar, would
>> help to extend support to other NetClient implementations without
>> requiring dataplane-specific hooks throughout.
Re: [Qemu-devel] [PATCH 0/9] introduce virtio net dataplane
On 25/02/2013 18:35, mdroth wrote:
>>> Moving more of os_host_main_loop_wait to AioContext would be possible
>>> (timers are on the todo list, in fact), but we should only do it as
>>> need arises.
>> Were you planning on hanging another GSource off of AioContext to
>> handle timers?

No, I'm planning to merge qemu-timer.c into it. I don't want to turn
AioContext into a bad copy of the glib main loop. AioContext should keep
the concepts of the QEMU block layer.

>> We could consolidate qemu_set_fd_handler()/qemu_aio_set_fd_handler()
>> on POSIX by teaching the current GSource about fd_read_poll functions,
>> and on Windows qemu_set_fd_handler() would tie into a winsock-specific
>> GSource that we register with an AioContext. Might be able to do
>> similar with GSources for slirp and the qemu_add_wait_object() stuff.

Consolidating qemu_set_fd_handler and slirp to GSources is a good thing
to do, indeed. The assumption here is that qemu_set_fd_handler and slirp
do not need special locking or priority constructs, while everything
else can use AioContext. AioContext could also grow other
implementations that use epoll/kqueue/whatever.

>> Yup, don't mean to get ahead of things; my main interest is just in
>> how we might deal with the interaction between NetClients and
>> virtio-net dataplane threads without introducing ad-hoc,
>> dataplane-specific mechanisms. If there was a general way for a Nic to
>> tell its NetClient peer "hey, I have my own thread/main loop, here's
>> my {Aio,*}Context, register your handlers there instead", I think this
>> series might look a lot more realistic as a default, or at least make
>> merging it less risky.

Yes, I see the point. The main blocker to this series seems to be hubs,
because they interact with multiple NetClients and thus could span
multiple AioContexts. Adding proper locking there is going to be
interesting. :)

But otherwise, I don't think we would have many hacks, if any. Unlike
the block layer, networking is quite self-contained and there isn't much
magic to interact with it from the monitor; and for the block layer we
already have an idea of how to deal with concurrency.

Paolo

>> But the right way to do that seems to tie into the discussion around
>> making other aio sources more GMainContext/AioContext-ish.
Re: [Qemu-devel] [PATCH 0/9] introduce virtio net dataplane
On 22/02/2013 00:38, mdroth wrote:
> On Thu, Feb 21, 2013 at 11:12:01PM +0100, Paolo Bonzini wrote:
>> On 21/02/2013 22:07, mdroth wrote:
>>>> 100% agree. In particular hw/dataplane/event-poll.c should be the
>>>> first to go away, but AioContext provides the functionality that
>>>> Ping Fan needs. But hw/dataplane/vring.c will probably be here for
>>>> a longer time.
>>> Has there been any discussion around introducing something similar
>>> to AioContexts for fd handlers? This would avoid the
>>> dataplane-specific hooks needed for NetClients in this series.
>> AioContext can include file descriptors on POSIX systems (used for
>> NBD and other network backends), see aio_set_fd_handler.
> Sorry, I was using "fd handlers" too generally. I mean specifically the
> qemu_set_fd_handler interfaces, where we currently rely on a single
> list of IOHandlerRecords for registration and a single loop to drive
> them. It would be nice if we could drive subsets of those via mini main
> loops, similar to the way dataplane threads would with a particular
> AioContext via aio_poll (or perhaps the exact same way).

Yes, that's what I meant actually. You can already do it for POSIX;
unfortunately, Windows poses extra complication because sockets are not
handles.

Moving more of os_host_main_loop_wait to AioContext would be possible
(timers are on the todo list, in fact), but we should only do it as need
arises.

Paolo

> Currently, Ping Fan's patches basically do this already by accessing a
> global reference to the vnet worker thread and attaching
> events/handlers to its event loop via a new set of registration
> functions (PATCH 7). I think generalizing this by basing
> qemu_set_fd_handler() around AioContext, or something similar, would
> help to extend support to other NetClient implementations without
> requiring dataplane-specific hooks throughout.
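A rough sketch of what "drive subsets via mini main loops" could look
like against the AioContext API of this era. This is QEMU-internal code,
not a standalone program; the circa-2013 aio_set_fd_handler() also took
an io_flush callback, passed as NULL here, and tap_read_handler and
register_with_dataplane are illustrative names.

    #include "block/aio.h"   /* QEMU-internal header */

    /* Illustrative handler: read frames from the tap fd, feed the device. */
    static void tap_read_handler(void *opaque)
    {
        /* ... */
    }

    /* A NIC with its own thread hands its AioContext to the NetClient
     * peer, which registers the tap fd here instead of in the global
     * qemu_set_fd_handler() list. */
    static void register_with_dataplane(AioContext *ctx, int tap_fd, void *dev)
    {
        aio_set_fd_handler(ctx, tap_fd, tap_read_handler, NULL, NULL, dev);
    }

    /* The dataplane thread then runs its own mini main loop: */
    static void *dataplane_thread_fn(void *opaque)
    {
        AioContext *ctx = opaque;    /* from aio_context_new() */

        for (;;) {
            aio_poll(ctx, true);     /* block and dispatch only ctx's fds */
        }
        return NULL;
    }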
Re: [Qemu-devel] [PATCH 0/9] introduce virtio net dataplane
On Thu, Feb 21, 2013 at 06:05:44PM +0100, Paolo Bonzini wrote:
> On 21/02/2013 17:33, Stefan Hajnoczi wrote:
>> This especially means things like adding live migration support or
>> solving other limitations. The limitations are there to motivate core
>> QEMU refactoring, so that core QEMU code can be used directly instead
>> of reimplementing it.
>>
>> Here's my TODO list:
>>
>> * Thread pool -> AioContext
>>
>>   The thread pool uses an EventNotifier to invoke callbacks. In a
>>   multiple event loop world we need to invoke callbacks in the event
>>   loop that they are associated with. Worker threads are also started
>>   using a BH so that the thread inherits the iothread CPU affinity
>>   and other characteristics. I think this can stay.
> It can and should stay. :) BHs are tied to an AioContext. Running
> thread-creation BHs from the AioContext's main thread ensures that the
> worker threads inherit the right attributes for e.g. real-time.

When I say "this can stay" I mean that worker threads will be spawned
from the QEMU iothread, not from the caller's thread.

Why? Because if we decide to change this then we can no longer have a
global set of worker threads and a single work queue. It would make
worker threads scoped to just one AioContext. Do we really want to go
there? It kind of defeats the purpose of a thread pool to have one
thread pool per AioContext (i.e. per virtio-blk device).

>> * BlockDriverState -> AioContext
>>
>>   It probably makes sense to bind a BlockDriverState to an AioContext
>>   in which its file descriptors and aio activity happens.
> ... and have the full chain of BlockDriverStates (bs->file,
> bs->backing_file and any others, as in the vmdk driver) bound to a
> single AioContext. We will use two BlockDriverStates if the same
> backing file happens to be shared by two disks (BTW, anybody ever
> checked what kind of havoc happens if you commit to such a BDS? :)).

AFAIK we don't share backing file BlockDriverStates today.

>> * Thread-friendly interrupt API
> What is this?

Raising interrupts. I think we're already okay using irqfd today, but
maybe it's nicer to make the regular QEMU irq APIs support this instead
of manually using irqfd.

> What's still missing here is TCG support. Perhaps you can emulate
> ioeventfd at the hw/virtio.c level, always going through a host
> notifier. In retrospect, perhaps it was a mistake to make ioeventfd a
> non-experimental option.

That's true. We need to emulate ioeventfd. For virtio-pci we can cheat
by simply notifying the EventNotifier from the virtqueue kick handler.
But really we should implement an ioeventfd equivalent at the pio/mmio
emulation level in QEMU.

Stefan
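The "cheat" Stefan mentions could look roughly like this: when TCG's
pio/mmio emulation sees a virtqueue kick, it sets the queue's host
notifier so the dataplane thread wakes exactly as it would from a KVM
ioeventfd. A sketch against the QEMU-internal virtio API of this era;
virtio_get_queue(), virtio_queue_get_host_notifier() and
event_notifier_set() are real calls, while the wrapper itself is
illustrative.

    static void virtio_queue_notify_via_host_notifier(VirtIODevice *vdev,
                                                      int n)
    {
        VirtQueue *vq = virtio_get_queue(vdev, n);

        /* Same wakeup path as ioeventfd: the dataplane thread polls this
         * EventNotifier's fd in its AioContext loop. */
        event_notifier_set(virtio_queue_get_host_notifier(vq));
    }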
Re: [Qemu-devel] [PATCH 0/9] introduce virtio net dataplane
On Thu, Feb 21, 2013 at 05:38:04PM -0600, mdroth wrote:
> On Thu, Feb 21, 2013 at 11:12:01PM +0100, Paolo Bonzini wrote:
>> On 21/02/2013 22:07, mdroth wrote:
>>>> 100% agree. In particular hw/dataplane/event-poll.c should be the
>>>> first to go away, but AioContext provides the functionality that
>>>> Ping Fan needs. But hw/dataplane/vring.c will probably be here for
>>>> a longer time.
>>> Has there been any discussion around introducing something similar
>>> to AioContexts for fd handlers? This would avoid the
>>> dataplane-specific hooks needed for NetClients in this series.
>> AioContext can include file descriptors on POSIX systems (used for
>> NBD and other network backends), see aio_set_fd_handler.
> Sorry, I was using "fd handlers" too generally. I mean specifically the
> qemu_set_fd_handler interfaces, where we currently rely on a single
> list of IOHandlerRecords for registration and a single loop to drive
> them. It would be nice if we could drive subsets of those via mini main
> loops, similar to the way dataplane threads would with a particular
> AioContext via aio_poll (or perhaps the exact same way).
>
> Currently, Ping Fan's patches basically do this already by accessing a
> global reference to the vnet worker thread and attaching
> events/handlers to its event loop via a new set of registration
> functions (PATCH 7). I think generalizing this by basing
> qemu_set_fd_handler() around AioContext, or something similar, would
> help to extend support to other NetClient implementations without
> requiring dataplane-specific hooks throughout.

We used to have a nesting feature that we got rid of. I don't remember
the details, but it was basically a nesting level counter:

    aio_set_fd_handler(...);   /* fd at level N */
    aio_nesting_inc();
    aio_set_fd_handler(...);   /* fd at level N+1 */
    aio_poll();                /* just this fd */
    aio_nesting_dec();
    aio_poll();                /* both fds */

Then aio_poll() only considers fds which are greater than or equal to
the current nesting level. I remember disliking this feature, but maybe
it was just being used badly rather than the mechanism itself being bad.

Stefan
Re: [Qemu-devel] [PATCH 0/9] introduce virtio net dataplane
On 21.02.2013 at 18:05, Paolo Bonzini wrote:
> On 21/02/2013 17:33, Stefan Hajnoczi wrote:
>> * BlockDriverState -> AioContext
>>
>>   It probably makes sense to bind a BlockDriverState to an AioContext
>>   in which its file descriptors and aio activity happens.
> ... and have the full chain of BlockDriverStates (bs->file,
> bs->backing_file and any others, as in the vmdk driver) bound to a
> single AioContext. We will use two BlockDriverStates if the same
> backing file happens to be shared by two disks (BTW, anybody ever
> checked what kind of havoc happens if you commit to such a BDS? :)).

Backing files aren't that interesting indeed, but I think there could be
cases where two BDSes refer to the same parent BDS. For example, think
of the read-only qcow2 snapshot BDS that we've been talking about
before.

So it would have to be more like one AioContext per connected component
in the BDS graph, if one per BDS doesn't work (why doesn't it?).

Kevin
Re: [Qemu-devel] [PATCH 0/9] introduce virtio net dataplane
On 22/02/2013 11:43, Stefan Hajnoczi wrote:
>> It can and should stay. :) BHs are tied to an AioContext. Running
>> thread-creation BHs from the AioContext's main thread ensures that
>> the worker threads inherit the right attributes for e.g. real-time.
> When I say "this can stay" I mean that worker threads will be spawned
> from the QEMU iothread, not from the caller's thread.

Ah :)

> Why? Because if we decide to change this then we can no longer have a
> global set of worker threads and a single work queue. It would make
> worker threads scoped to just one AioContext. Do we really want to go
> there? It kind of defeats the purpose of a thread pool to have one
> thread pool per AioContext (i.e. per virtio-blk device).

I think scoped worker threads are simpler to implement and have little
or no lock contention (I have patches for a lockless thread pool, but
I'm not 100% sure they're correct, and they're a bit ugly). Also, with
many virtio-blk threads you could hit the 64-thread limit that the
thread pool has, so it may require some tuning. More importantly, scoped
worker threads do not require an external iothread to exist at all.

We can switch from one solution to the other easily (benchmarking takes
more time than coding, in all likelihood), so it's not urgent to choose
one over the other now.

>> We will use two BlockDriverStates if the same backing file happens to
>> be shared by two disks (BTW, anybody ever checked what kind of havoc
>> happens if you commit to such a BDS? :)).
> AFAIK we don't share backing file BlockDriverStates today.

Indeed, we don't share BDSs, but commit operations could cause the same
file to be backed by two BDSs, one read-only and one read-write.

>> What's still missing here is TCG support. Perhaps you can emulate
>> ioeventfd at the hw/virtio.c level, always going through a host
>> notifier. In retrospect, perhaps it was a mistake to make ioeventfd a
>> non-experimental option.
> That's true. We need to emulate ioeventfd. For virtio-pci we can cheat
> by simply notifying the EventNotifier from the virtqueue kick handler.
> But really we should implement an ioeventfd equivalent at the pio/mmio
> emulation level in QEMU.

Not a big deal anyway (like the irqfd).

Paolo
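For illustration, the BH-based spawning that Stefan and Paolo refer to
has roughly this shape. This is a sketch with illustrative names, not
the actual thread-pool.c code; aio_bh_new()/qemu_bh_schedule() are the
real BH API, and the era's qemu_thread_create() took no thread-name
argument.

    typedef struct ThreadPoolState ThreadPoolState;   /* hypothetical type */

    static void *worker_thread_fn(void *opaque);      /* runs pool requests */

    /* BHs run in their AioContext's thread, so a worker created here
     * inherits that thread's CPU affinity and scheduling attributes. */
    static void spawn_worker_bh(void *opaque)
    {
        ThreadPoolState *pool = opaque;
        QemuThread t;

        qemu_thread_create(&t, worker_thread_fn, pool, QEMU_THREAD_DETACHED);
    }

    static void request_new_worker(AioContext *ctx, ThreadPoolState *pool)
    {
        QEMUBH *bh = aio_bh_new(ctx, spawn_worker_bh, pool);

        qemu_bh_schedule(bh);   /* defer thread creation into ctx's thread */
    }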
Re: [Qemu-devel] [PATCH 0/9] introduce virtio net dataplane
On 22/02/2013 12:06, Kevin Wolf wrote:
> For example, think of the read-only qcow2 snapshot BDS that we've been
> talking about before. So it would have to be more like one AioContext
> per connected component in the BDS graph

Yes. For now it means one AioContext per top-level BDS, but that can
change in the future.

> , if one per BDS doesn't work (why doesn't it?)

One AioContext per BDS means one event loop and hence one thread per
BDS, with extra costs in terms of locking, context switching, etc.

Paolo
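A hypothetical sketch of the per-component binding being discussed.
bdrv_set_aio_context() did not exist at the time of this thread; the
aio_context field and the function below are invented for illustration,
while bs->file and bs->backing_hd are real BlockDriverState members of
this era.

    static void bdrv_bind_chain_to_aio_context(BlockDriverState *bs,
                                               AioContext *ctx)
    {
        if (!bs) {
            return;
        }
        bs->aio_context = ctx;   /* hypothetical field */

        /* Recurse over the whole chain so bs, its protocol layer, and
         * its backing file all run their fds and aio in the same event
         * loop. */
        bdrv_bind_chain_to_aio_context(bs->file, ctx);
        bdrv_bind_chain_to_aio_context(bs->backing_hd, ctx);
    }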
Re: [Qemu-devel] [PATCH 0/9] introduce virtio net dataplane
On Thu, Feb 21, 2013 at 08:54:44PM +0800, Liu Ping Fan wrote:
> This is an emulation of virtio-blk dataplane, which pushes the data
> handling out of the big lock. It is an attempt to implement this
> processing in userspace, where vhost-net does it in the kernel.

What's the motivation for doing this? Speed up the slirp backend, which
vhost-net does not support? If yes, I would start with implementing
offloads (segmentation, checksum) in slirp. This is likely to give you a
nice speed boost and is generally a desirable feature.

> The iperf results show it improving on the baseline performance, but
> it is still slower than vhost-net (maybe the rx path needs
> optimization).
>
> Todo: implement fine-grained locking for the net subsystem

vhost-net is currently the only way to get zero-copy transmit on Linux
(and zero-copy receive in the future), so at least in theory it'll
always be faster.

> [patch list and diffstat snipped]
Re: [Qemu-devel] [PATCH 0/9] introduce virtio net dataplane
Michael S. Tsirkin m...@redhat.com writes:
> On Thu, Feb 21, 2013 at 08:54:44PM +0800, Liu Ping Fan wrote:
>> This is an emulation of virtio-blk dataplane, which pushes the data
>> handling out of the big lock. It is an attempt to implement this
>> processing in userspace, where vhost-net does it in the kernel.
> What's the motivation for doing this?

I haven't looked carefully at the patches, but I don't think we should
cheat when it comes to virtio-net. I think it's useful to have a
baseline proof of concept, but I think we should focus on making the
network layer re-entrant.

In terms of why even bother: if we can make virtio-net a viable
alternative to vhost-net, it's a huge win. Being in userspace is a huge
advantage. I understand the challenges with zero-copy, but we really
need a proper userspace implementation to determine how much it really
matters. Zero-copy tap is also not entirely outside the realm of
possibility.

>> The iperf results show it improving on the baseline performance, but
>> it is still slower than vhost-net (maybe the rx path needs
>> optimization).
>>
>> Todo: implement fine-grained locking for the net subsystem
> vhost-net is currently the only way to get zero-copy transmit on Linux
> (and zero-copy receive in the future), so at least in theory it'll
> always be faster.

It might always be faster. But that doesn't mean we should limit the
performance of virtio-net in userspace. Some people may be willing to
take a small performance hit in order to obtain the security offered by
being in userspace.

Regards,

Anthony Liguori

>> [patch list and diffstat snipped]
Re: [Qemu-devel] [PATCH 0/9] introduce virtio net dataplane
On Thu, Feb 21, 2013 at 08:54:44PM +0800, Liu Ping Fan wrote:
> This is an emulation of virtio-blk dataplane, which pushes the data
> handling out of the big lock. It is an attempt to implement this
> processing in userspace, where vhost-net does it in the kernel.
>
> The iperf results show it improving on the baseline performance, but
> it is still slower than vhost-net (maybe the rx path needs
> optimization).
>
> Todo: implement fine-grained locking for the net subsystem
>
> Liu Ping Fan (9):
>   vring: split the modification and publish of used ring
>   vring: introduce vring_restore() to restore from img
>   event poll: make epoll work for normal fd
>   event poll: pass event type to event callback
>   event poll: enable event poll handle more than one event each time
>   virtio net: introduce dataplane for virtio net
>   tap: make tap peer work on dedicated data-plane thread
>   virtio net: enable dataplane for virtio net
>   configure: make virtio net dataplane configurable
>
>  configure                  |  12
>  hw/dataplane/Makefile.objs |   4
>  hw/dataplane/event-poll.c  |  69
>  hw/dataplane/event-poll.h  |  14
>  hw/dataplane/virtio-blk.c  |   8
>  hw/dataplane/virtio-net.c  | 444
>  hw/dataplane/virtio-net.h  |  32
>  hw/dataplane/vring.c       |  37
>  hw/dataplane/vring.h       |   3
>  hw/virtio-net.c            |  94
>  hw/virtio-net.h            |  64
>  hw/virtio-pci.c            |   2
>  include/net/tap.h          |   6
>  net/tap.c                  |  92
>  14 files changed, 806 insertions(+), 75 deletions(-)
>  create mode 100644 hw/dataplane/virtio-net.c
>  create mode 100644 hw/dataplane/virtio-net.h
>
> --
> 1.7.4.4

Interesting patch series. I want to share my thoughts on the status of
dataplane and what I'm currently working on. There will probably be some
overlap that we can coordinate.

First, I want to eventually delete hw/dataplane/ :). That is because
dataplane duplicates an event loop, thread-friendly guest RAM access,
and virtio. QEMU already has all this functionality, except it's not
thread-friendly. Because hw/dataplane/ will eventually go away, we
should avoid adding new code where possible.

This especially means things like adding live migration support or
solving other limitations. The limitations are there to motivate core
QEMU refactoring, so that core QEMU code can be used directly instead of
reimplementing it.

Here's my TODO list:

* Thread pool -> AioContext

  The thread pool uses an EventNotifier to invoke callbacks. In a
  multiple event loop world we need to invoke callbacks in the event
  loop that they are associated with. Worker threads are also started
  using a BH so that the thread inherits the iothread CPU affinity and
  other characteristics. I think this can stay.

* CoQueue -> AioContext (for the scheduling bottom half)

  CoQueue wakes up coroutines from a BH so that the coroutine that asked
  to wake them up can run to completion or yield first. We need to wake
  coroutines in the event loop that they belong to.

* BlockDriverState -> AioContext

  It probably makes sense to bind a BlockDriverState to an AioContext in
  which its file descriptors and aio activity happens.

* bdrv_drain_all()/bdrv_flush_all() safety against concurrent requests

  The monitor, live migration, and other components use bdrv_drain_all()
  to complete all requests. It also prevents new requests from starting,
  since the global mutex is held and no device emulation can run. We
  need to support equivalent semantics in a world where BlockDriverState
  may be accessed from another thread and have in-flight aio in that
  event loop.

Once these tasks are complete it should be possible to use
BlockDriverState in virtio-blk dataplane.
This will mean:

* Image format support
* Block job support (stream, mirror, commit)
* I/O throttling support

Then the next steps are:

* Thread-friendly guest RAM access API
* Convert hw/virtio.c (if necessary) to use thread-friendly RAM access
* Assigning ioeventfd to a dedicated thread with its own AioContext
  event loop
* Thread-friendly interrupt API

This should finally allow hw/dataplane/ to be dropped. At this point we
get live migration and hotplug support back too. Then dataplane mode can
become the default and the experimental x-data-plane=on|off option can
be dropped.

Stefan
Re: [Qemu-devel] [PATCH 0/9] introduce virtio net dataplane
On Thu, Feb 21, 2013 at 10:15:58AM -0600, Anthony Liguori wrote:
> Michael S. Tsirkin m...@redhat.com writes:
>> On Thu, Feb 21, 2013 at 08:54:44PM +0800, Liu Ping Fan wrote:
>>> This is an emulation of virtio-blk dataplane, which pushes the data
>>> handling out of the big lock. It is an attempt to implement this
>>> processing in userspace, where vhost-net does it in the kernel.
>> What's the motivation for doing this?
> I haven't looked carefully at the patches, but I don't think we should
> cheat when it comes to virtio-net. I think it's useful to have a
> baseline proof of concept, but I think we should focus on making the
> network layer re-entrant.
>
> In terms of why even bother: if we can make virtio-net a viable
> alternative to vhost-net, it's a huge win. Being in userspace is a
> huge advantage. I understand the challenges with zero-copy, but we
> really need a proper userspace implementation to determine how much it
> really matters. Zero-copy tap is also not entirely outside the realm
> of possibility.

This is exactly what we have; vhost-net is the interface we use for
asynchronous communication. If you want to add yet another asynchronous
interface in the kernel, it's certainly doable, but then what's the
point?

>>> The iperf results show it improving on the baseline performance, but
>>> it is still slower than vhost-net (maybe the rx path needs
>>> optimization).
>>>
>>> Todo: implement fine-grained locking for the net subsystem
>> vhost-net is currently the only way to get zero-copy transmit on
>> Linux (and zero-copy receive in the future), so at least in theory
>> it'll always be faster.
> It might always be faster. But that doesn't mean we should limit the
> performance of virtio-net in userspace. Some people may be willing to
> take a small performance hit in order to obtain the security offered
> by being in userspace.
>
> Regards,
>
> Anthony Liguori

By 'being in userspace' I presume you mean the ring processing code.
Ring processing can be done in userspace if you are ready to use the
synchronous read/write operations; skipping this bunch of code might
improve security. But it requires a data copy almost by definition.

>>> [patch list and diffstat snipped]
Re: [Qemu-devel] [PATCH 0/9] introduce virtio net dataplane
Michael S. Tsirkin m...@redhat.com writes:
> On Thu, Feb 21, 2013 at 10:15:58AM -0600, Anthony Liguori wrote:
>> Michael S. Tsirkin m...@redhat.com writes:
>>> [earlier quote snipped]
>> It might always be faster. But that doesn't mean we should limit the
>> performance of virtio-net in userspace. Some people may be willing to
>> take a small performance hit in order to obtain the security offered
>> by being in userspace.
> By 'being in userspace' I presume you mean the ring processing code.

I mean that:

1) the same code path is used regardless of network backend;
2) QEMU has the ability to manipulate the traffic if it needs/wants to;
3) the state of the device model is in userspace at all times;
4) QEMU is always the front-line interface communicating with the guest.

Yes, this could be a model where -netdev vhost-net was the backend used
by QEMU.

> Ring processing can be done in userspace if you are ready to use the
> synchronous read/write operations; skipping this bunch of code might
> improve security. But it requires a data copy almost by definition.

Not sure what you mean here. If you are talking about read/write on a
tap file descriptor, I'm not even suggesting that.

I'm perfectly fine with vhost-net being a -netdev backend. But that's a
very different model than how we use it today.
Regards,

Anthony Liguori

> [patch list and diffstat snipped]
Re: [Qemu-devel] [PATCH 0/9] introduce virtio net dataplane
On 21/02/2013 17:33, Stefan Hajnoczi wrote:
> Interesting patch series. I want to share my thoughts on the status of
> dataplane and what I'm currently working on. There will probably be
> some overlap that we can coordinate.
>
> First, I want to eventually delete hw/dataplane/ :). That is because
> dataplane duplicates an event loop, thread-friendly guest RAM access,
> and virtio. QEMU already has all this functionality, except it's not
> thread-friendly. Because hw/dataplane/ will eventually go away, we
> should avoid adding new code where possible.

100% agree. In particular hw/dataplane/event-poll.c should be the first
to go away, but AioContext provides the functionality that Ping Fan
needs. But hw/dataplane/vring.c will probably be here for a longer time,
so I don't see a problem in touching that.

> This especially means things like adding live migration support or
> solving other limitations. The limitations are there to motivate core
> QEMU refactoring, so that core QEMU code can be used directly instead
> of reimplementing it.
>
> Here's my TODO list:
>
> * Thread pool -> AioContext
>
>   The thread pool uses an EventNotifier to invoke callbacks. In a
>   multiple event loop world we need to invoke callbacks in the event
>   loop that they are associated with. Worker threads are also started
>   using a BH so that the thread inherits the iothread CPU affinity and
>   other characteristics. I think this can stay.

It can and should stay. :) BHs are tied to an AioContext. Running
thread-creation BHs from the AioContext's main thread ensures that the
worker threads inherit the right attributes for e.g. real-time.

> * BlockDriverState -> AioContext
>
>   It probably makes sense to bind a BlockDriverState to an AioContext
>   in which its file descriptors and aio activity happens.

... and have the full chain of BlockDriverStates (bs->file,
bs->backing_file and any others, as in the vmdk driver) bound to a
single AioContext. We will use two BlockDriverStates if the same backing
file happens to be shared by two disks (BTW, anybody ever checked what
kind of havoc happens if you commit to such a BDS? :)).

> * bdrv_drain_all()/bdrv_flush_all() safety against concurrent requests
>
>   The monitor, live migration, and other components use
>   bdrv_drain_all() to complete all requests. It also prevents new
>   requests from starting, since the global mutex is held and no device
>   emulation can run. We need to support equivalent semantics in a
>   world where BlockDriverState may be accessed from another thread and
>   have in-flight aio in that event loop.

I think this has two parts. The first is to make bdrv_drain_all() more
fine-grained. Many calls can be changed to bdrv_drain(), which operates
on a chain of BlockDriverStates (defined as above). Initially I was
thinking of adding a locking API to AioContext, like:

    aio_lock(bdrv_get_aio_context(bs));
    bdrv_drain(bs);
    ...
    ... do work ...
    aio_unlock(bdrv_get_aio_context(bs));

However, this is not enough, because there are some cases in which
bdrv_drain_all() is genuinely needed, for example atomic snapshots. And
in general we need to stop ioeventfd processing if monitor code runs
while the VCPU is not paused. So I think we need to register handlers
from the AioContext that disable and enable the ioeventfd:

    aio_grab(bdrv_get_aio_context(bs));
    bdrv_drain(bs);
    ...
    ... do work ...
    aio_release(bdrv_get_aio_context(bs));

The actual functions called by aio_grab/aio_release would be defined by
the dataplane thread. I'm not sure this would be enough to make locking
entirely private to the AioContext, but I hope so.
Anyhow, at this point hw/dataplane/ioq.c goes. This is IMO doable for
1.5.

> * Thread-friendly guest RAM access API

... and hw/dataplane/hostmem.c goes...

> * Convert hw/virtio.c (if necessary) to use thread-friendly RAM access

... and hw/dataplane/vring.c goes.

> * Assigning ioeventfd to a dedicated thread with its own AioContext
>   event loop

I think here it starts not to be a problem. I think you can already have
live migration and hotplug at this point, but I may be missing
something.

> * Thread-friendly interrupt API

What is this?

What's still missing here is TCG support. Perhaps you can emulate
ioeventfd at the hw/virtio.c level, always going through a host
notifier. In retrospect, perhaps it was a mistake to make ioeventfd a
non-experimental option.

Paolo
Re: [Qemu-devel] [PATCH 0/9] introduce virtio net dataplane
On Thu, Feb 21, 2013 at 11:02:31AM -0600, Anthony Liguori wrote:
> Michael S. Tsirkin m...@redhat.com writes:
>> [earlier quote snipped]
>> By 'being in userspace' I presume you mean the ring processing code.
> I mean that:
>
> 1) the same code path is used regardless of network backend;
> 2) QEMU has the ability to manipulate the traffic if it needs/wants
>    to;
> 3) the state of the device model is in userspace at all times;
> 4) QEMU is always the front-line interface communicating with the
>    guest.
>
> Yes, this could be a model where -netdev vhost-net was the backend
> used by QEMU.
>> Ring processing can be done in userspace if you are ready to use the
>> synchronous read/write operations; skipping this bunch of code might
>> improve security. But it requires a data copy almost by definition.
> Not sure what you mean here. If you are talking about read/write on a
> tap file descriptor, I'm not even suggesting that.
>
> I'm perfectly fine with vhost-net being a -netdev backend. But that's
> a very different model than how we use it today.
>
> Regards,
>
> Anthony Liguori

It's not hard to make the first step there: basically, just make -netdev
vhost-net an alias for -netdev tap, making vhost=on the default. For
non-virtio frontends, it needs a driver for vhost-net in userspace. I
started work on this at some point, as it would let me write unit tests
for virtio, but never finished. Any takers?
> [patch list and diffstat snipped]
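For context, the tap backend already exposes the knobs discussed here:
vhost=on enables the in-kernel datapath when available, and
vhostforce=on (mentioned earlier in the thread) keeps vhost even for
guests using level-triggered, non-MSI interrupts. A typical invocation,
with illustrative id and device names:

    qemu-system-x86_64 ... \
        -netdev tap,id=hostnet0,vhost=on,vhostforce=on \
        -device virtio-net-pci,netdev=hostnet0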
Re: [Qemu-devel] [PATCH 0/9] introduce virtio net dataplane
On Thu, Feb 21, 2013 at 06:45:54PM +0200, Michael S. Tsirkin wrote:
> On Thu, Feb 21, 2013 at 10:15:58AM -0600, Anthony Liguori wrote:
>> Michael S. Tsirkin m...@redhat.com writes:
>>> [earlier quote snipped]
>> It might always be faster. But that doesn't mean we should limit the
>> performance of virtio-net in userspace. Some people may be willing to
>> take a small performance hit in order to obtain the security offered
>> by being in userspace.
>>
>> Regards,
>>
>> Anthony Liguori
> By 'being in userspace' I presume you mean the ring processing code.
> Ring processing can be done in userspace if you are ready to use the
> synchronous read/write operations; skipping this bunch of code might
> improve security. But it requires a data copy almost by definition.

BTW I think we do have a problem at the moment, and that is that virtio
net can process the same ring both in userspace (for level interrupts)
and in kernel (for edge interrupts). And which one is selected is under
guest control. This is twice the number of potential security issues.

A solution, in my opinion, is to improve handling of level interrupts to
the point where we can make vhost-net the default for level as well. The
new support for level IRQFD in kvm should make this possible.

Unfortunately, this prototype moves in exactly the reverse direction.
>>>> [patch list and diffstat snipped]
Re: [Qemu-devel] [PATCH 0/9] introduce virtio net dataplane
On Thu, Feb 21, 2013 at 06:05:44PM +0100, Paolo Bonzini wrote:
> On 21/02/2013 17:33, Stefan Hajnoczi wrote:
>> Interesting patch series. I want to share my thoughts on the status
>> of dataplane and what I'm currently working on. There will probably
>> be some overlap that we can coordinate.
>>
>> First, I want to eventually delete hw/dataplane/ :). That is because
>> dataplane duplicates an event loop, thread-friendly guest RAM access,
>> and virtio. QEMU already has all this functionality, except it's not
>> thread-friendly. Because hw/dataplane/ will eventually go away, we
>> should avoid adding new code where possible.
> 100% agree. In particular hw/dataplane/event-poll.c should be the
> first to go away, but AioContext provides the functionality that Ping
> Fan needs. But hw/dataplane/vring.c will probably be here for a longer
> time.

Has there been any discussion around introducing something similar to
AioContexts for fd handlers? This would avoid the dataplane-specific
hooks needed for NetClients in this series.
Re: [Qemu-devel] [PATCH 0/9] introduce virtio net dataplane
On 21/02/2013 22:07, mdroth wrote:
>> 100% agree. In particular hw/dataplane/event-poll.c should be the
>> first to go away, but AioContext provides the functionality that Ping
>> Fan needs. But hw/dataplane/vring.c will probably be here for a
>> longer time.
> Has there been any discussion around introducing something similar to
> AioContexts for fd handlers? This would avoid the dataplane-specific
> hooks needed for NetClients in this series.

AioContext can include file descriptors on POSIX systems (used for NBD
and other network backends), see aio_set_fd_handler.

Paolo
Re: [Qemu-devel] [PATCH 0/9] introduce virtio net dataplane
On Thu, Feb 21, 2013 at 11:12:01PM +0100, Paolo Bonzini wrote:
> On 21/02/2013 22:07, mdroth wrote:
>>> 100% agree. In particular hw/dataplane/event-poll.c should be the
>>> first to go away, but AioContext provides the functionality that
>>> Ping Fan needs. But hw/dataplane/vring.c will probably be here for a
>>> longer time.
>> Has there been any discussion around introducing something similar to
>> AioContexts for fd handlers? This would avoid the dataplane-specific
>> hooks needed for NetClients in this series.
> AioContext can include file descriptors on POSIX systems (used for NBD
> and other network backends), see aio_set_fd_handler.

Sorry, I was using "fd handlers" too generally. I mean specifically the
qemu_set_fd_handler interfaces, where we currently rely on a single list
of IOHandlerRecords for registration and a single loop to drive them. It
would be nice if we could drive subsets of those via mini main loops,
similar to the way dataplane threads would with a particular AioContext
via aio_poll (or perhaps the exact same way).

Currently, Ping Fan's patches basically do this already by accessing a
global reference to the vnet worker thread and attaching events/handlers
to its event loop via a new set of registration functions (PATCH 7). I
think generalizing this by basing qemu_set_fd_handler() around
AioContext, or something similar, would help to extend support to other
NetClient implementations without requiring dataplane-specific hooks
throughout.

> Paolo