[Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
This feature bit can be used by the hypervisor to indicate that a virtio_net device should act as a standby for another device with the same MAC address.

I tested this with a small change to the patch to mark the STANDBY feature 'true' by default, as I am using libvirt to start the VMs. Is there a way to pass the newly added feature bit 'standby' to qemu via the libvirt XML file?

Signed-off-by: Sridhar Samudrala <sridhar.samudr...@intel.com>
---
 hw/net/virtio-net.c                         | 2 ++
 include/standard-headers/linux/virtio_net.h | 3 +++
 2 files changed, 5 insertions(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 90502fca7c..38b3140670 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = {
                      true),
     DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN),
     DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
+    DEFINE_PROP_BIT64("standby", VirtIONet, host_features, VIRTIO_NET_F_STANDBY,
+                      false),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/include/standard-headers/linux/virtio_net.h b/include/standard-headers/linux/virtio_net.h
index e9f255ea3f..01ec09684c 100644
--- a/include/standard-headers/linux/virtio_net.h
+++ b/include/standard-headers/linux/virtio_net.h
@@ -57,6 +57,9 @@
                                          * Steering */
 #define VIRTIO_NET_F_CTRL_MAC_ADDR 23   /* Set MAC address */
 
+#define VIRTIO_NET_F_STANDBY      62    /* Act as standby for another device
+                                         * with the same MAC.
+                                         */
 #define VIRTIO_NET_F_SPEED_DUPLEX 63    /* Device set linkspeed and duplex */
 
 #ifndef VIRTIO_NET_NO_LEGACY
-- 
2.14.3
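
For readers unfamiliar with how such a bit is tested: the patch places the new feature at bit 62, which does not fit in a 32-bit mask, so it has to live in a 64-bit feature word (hence DEFINE_PROP_BIT64). The following is only a minimal sketch of setting and testing that bit; `feature_is_set` and `feature_assign` are hypothetical helper names invented for illustration and are not QEMU functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit number taken from the patch above. */
#define VIRTIO_NET_F_STANDBY 62

/* Hypothetical helper: true if bit `fbit` is set in a 64-bit feature mask. */
static bool feature_is_set(uint64_t features, unsigned int fbit)
{
    return (features & (UINT64_C(1) << fbit)) != 0;
}

/* Hypothetical helper: return `features` with bit `fbit` set or cleared. */
static uint64_t feature_assign(uint64_t features, unsigned int fbit, bool on)
{
    uint64_t mask = UINT64_C(1) << fbit;

    return on ? (features | mask) : (features & ~mask);
}
```

A caller would start from the device's host feature word, call `feature_assign(features, VIRTIO_NET_F_STANDBY, true)` when the property is enabled, and use `feature_is_set()` wherever the standby behaviour needs to be checked.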
Re: [Qemu-devel] [net-next RFC PATCH 0/7] multiqueue support for tun/tap
On Fri, 2011-08-12 at 09:54 +0800, Jason Wang wrote:
> As multi-queue NICs are commonly used in high-end servers, the current single-queue tap cannot scale guest network performance as the number of vcpus increases. So the following series implements multiple queue support in tun/tap.
>
> In order to take advantage of this, a multi-queue capable driver and qemu are also needed. I just rebased the latest version of Krishna's multi-queue virtio-net driver into this series to simplify testing. For multiqueue-capable qemu, you can refer to the patches I posted at http://www.spinics.net/lists/kvm/msg52808.html. Vhost is also a must to achieve high performance, and its code can be used for multi-queue without modification. Alternatively, this series can also be used for Krishna's M:N implementation of multiqueue, but I didn't test it.
>
> The idea is simple: each socket is abstracted as a queue for tun/tap, and userspace may open as many files as required and then attach them to the device. In order to keep ABI compatibility, device creation is still done in TUNSETIFF, and two new ioctls, TUNATTACHQUEUE and TUNDETACHQUEUE, are added for the user to manipulate the number of queues for the tun/tap.

Is it possible to have tap create these queues automatically when TUNSETIFF is called, instead of having userspace issue the new ioctls? I am just wondering if it is possible to get multi-queue enabled without any changes to qemu. I guess the number of queues could be based on the number of vhost threads/guest virtio-net queues.

Also, is it possible to enable multi-queue on the host alone without any guest virtio-net changes?

Have you done any multiple TCP_RR/UDP_RR testing with small packet sizes? 256-byte request/response with 50-100 instances?

> I've done some basic performance testing of multi queue tap. For tun, I just tested it through vpnc.
>
> Notes:
> - Tests show improvement when receiving packets from a local/external host to the guest, and when sending big packets from the guest to a local/external host.
> - The current multiqueue based virtio-net/tap introduces a regression when sending small packets (512 byte) from the guest to a local/external host. I suspect it's an issue of queue selection in both the guest driver and tap. Will continue to investigate.
> - I will post the performance numbers as a reply to this mail.
>
> TODO:
> - solve the issue with transmission of small packets.
> - address the comments on the virtio-net driver
> - performance tuning
>
> Please review and comment, Thanks.
>
> ---
>
> Jason Wang (5):
>       tuntap: move socket/sock related structures to tun_file
>       tuntap: categorize ioctl
>       tuntap: introduce multiqueue related flags
>       tuntap: multiqueue support
>       tuntap: add ioctls to attach or detach a file form tap device
>
> Krishna Kumar (2):
>       Change virtqueue structure
>       virtio-net changes
>
>  drivers/net/tun.c           |  738 ++-
>  drivers/net/virtio_net.c    |  578 --
>  drivers/virtio/virtio_pci.c |   10 -
>  include/linux/if_tun.h      |    5
>  include/linux/virtio.h      |    1
>  include/linux/virtio_net.h  |    3
>  6 files changed, 867 insertions(+), 468 deletions(-)
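
To make the attach/detach idea in the cover letter concrete, here is a small userspace model, not kernel code and not taken from the series. It only illustrates "one attached file equals one queue". Everything named `toy_*` is hypothetical, the queue limit is arbitrary, and the modulo-on-hash queue selection is an assumption made for the sketch rather than the policy the patches actually use.

```c
#include <errno.h>
#include <stddef.h>

#define TOY_MAX_QUEUES 8            /* arbitrary limit chosen for this sketch */

/* Toy stand-in for a tun/tap device: one attached fd per queue. */
struct toy_tun {
    int fds[TOY_MAX_QUEUES];
    size_t numqueues;
};

static void toy_tun_init(struct toy_tun *t)
{
    t->numqueues = 0;
}

/* Models the attach step: one more open file becomes one more queue. */
static int toy_attach_queue(struct toy_tun *t, int fd)
{
    for (size_t i = 0; i < t->numqueues; i++) {
        if (t->fds[i] == fd) {
            return -EBUSY;          /* this file is already a queue */
        }
    }
    if (t->numqueues == TOY_MAX_QUEUES) {
        return -E2BIG;
    }
    t->fds[t->numqueues++] = fd;
    return 0;
}

/* Models the detach step: drop the queue backed by fd, keep the array dense. */
static int toy_detach_queue(struct toy_tun *t, int fd)
{
    for (size_t i = 0; i < t->numqueues; i++) {
        if (t->fds[i] == fd) {
            t->fds[i] = t->fds[--t->numqueues];
            return 0;
        }
    }
    return -EINVAL;
}

/* Assumed policy: pick a queue index from a flow hash by simple modulo. */
static int toy_select_queue(const struct toy_tun *t, unsigned int flow_hash)
{
    if (t->numqueues == 0) {
        return -ENODEV;
    }
    return (int)(flow_hash % t->numqueues);
}
```

In this model the number of queues changes only when userspace attaches or detaches a file, which is the behaviour the question above (creating queues automatically at TUNSETIFF time) would replace.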
[Qemu-devel] Re: [PATCH] vhost: force vhost off for non-MSI guests
On Thu, 2011-01-20 at 17:35 +0200, Michael S. Tsirkin wrote:
> When MSI is off, each interrupt needs to be bounced through the io thread when it's set/cleared, so vhost-net causes more context switches and higher CPU utilization than userspace virtio which handles networking in the same thread.
>
> We'll need to fix this by adding level irq support in kvm irqfd, for now disable vhost-net in these configurations.
>
> Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
> ---
>
> I need to report some error from virtio-pci that would be handled specially (disable but don't report an error) so I wanted one that's never likely to be used by a userspace ioctl. I selected ERANGE but it'd be easy to switch to something else. Comments?

Should this error be EVHOST_DISABLED rather than EVIRTIO_DISABLED?

-Sridhar

>  hw/vhost.c      |    4 +++-
>  hw/virtio-net.c |    6 ++++--
>  hw/virtio-pci.c |    3 +++
>  hw/virtio.h     |    2 ++
>  4 files changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/hw/vhost.c b/hw/vhost.c
> index 1d09ed0..c79765a 100644
> --- a/hw/vhost.c
> +++ b/hw/vhost.c
> @@ -649,7 +649,9 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
>
>      r = vdev->binding->set_guest_notifiers(vdev->binding_opaque, true);
>      if (r < 0) {
> -        fprintf(stderr, "Error binding guest notifier: %d\n", -r);
> +        if (r != -EVIRTIO_DISABLED) {
> +            fprintf(stderr, "Error binding guest notifier: %d\n", -r);
> +        }
>          goto fail_notifiers;
>      }
>
> diff --git a/hw/virtio-net.c b/hw/virtio-net.c
> index ccb3e63..5de3fee 100644
> --- a/hw/virtio-net.c
> +++ b/hw/virtio-net.c
> @@ -121,8 +121,10 @@ static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
>      if (!n->vhost_started) {
>          int r = vhost_net_start(tap_get_vhost_net(n->nic->nc.peer), &n->vdev);
>          if (r < 0) {
> -            error_report("unable to start vhost net: %d: "
> -                         "falling back on userspace virtio", -r);
> +            if (r != -EVIRTIO_DISABLED) {
> +                error_report("unable to start vhost net: %d: "
> +                             "falling back on userspace virtio", -r);
> +            }
>          } else {
>              n->vhost_started = 1;
>          }
> diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
> index dd8887a..dbf4be0 100644
> --- a/hw/virtio-pci.c
> +++ b/hw/virtio-pci.c
> @@ -628,6 +628,9 @@ static int virtio_pci_set_guest_notifier(void *opaque, int n, bool assign)
>      EventNotifier *notifier = virtio_queue_get_guest_notifier(vq);
>
>      if (assign) {
> +        if (!msix_enabled(&proxy->pci_dev)) {
> +            return -EVIRTIO_DISABLED;
> +        }
>          int r = event_notifier_init(notifier, 0);
>          if (r < 0) {
>              return r;
> diff --git a/hw/virtio.h b/hw/virtio.h
> index d8546d5..53bbdba 100644
> --- a/hw/virtio.h
> +++ b/hw/virtio.h
> @@ -98,6 +98,8 @@ typedef struct {
>      void (*vmstate_change)(void * opaque, bool running);
>  } VirtIOBindings;
>
> +#define EVIRTIO_DISABLED ERANGE
> +
>  #define VIRTIO_PCI_QUEUE_MAX 64
>
>  #define VIRTIO_NO_VECTOR 0x
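
The pattern under discussion (a reserved errno value meaning "disabled on purpose, fall back quietly") can be shown in isolation. This is a stand-alone sketch, not QEMU code: `toy_set_guest_notifier` and `toy_vhost_start` are hypothetical stubs that only mirror the shape of the patched functions, and the `reported` out-parameter exists purely so the behaviour can be observed.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Same aliasing as the patch: reuse an errno unlikely to come from an ioctl. */
#define EVIRTIO_DISABLED ERANGE

/* Stub for the transport side: refuse to set up notifiers when MSI-X is off. */
static int toy_set_guest_notifier(bool msix_on)
{
    if (!msix_on) {
        return -EVIRTIO_DISABLED;
    }
    return 0;
}

/*
 * Stub for the vhost start path. Any failure makes the caller fall back to
 * userspace virtio, but only unexpected failures are reported to the user.
 * *reported is set to true when a message was printed.
 */
static int toy_vhost_start(int notifier_result, bool *reported)
{
    *reported = false;
    if (notifier_result < 0) {
        if (notifier_result != -EVIRTIO_DISABLED) {
            fprintf(stderr, "Error binding guest notifier: %d\n",
                    -notifier_result);
            *reported = true;
        }
        return notifier_result;
    }
    return 0;
}
```

Because the sentinel is just an alias for ERANGE, the choice of name (EVIRTIO_DISABLED, EVHOST_DISABLED or EVIRTIO_MSIX_DISABLED) changes readability only, not behaviour.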
[Qemu-devel] Re: [PATCH] vhost: force vhost off for non-MSI guests
On Thu, 2011-01-20 at 19:47 +0200, Michael S. Tsirkin wrote:
> On Thu, Jan 20, 2011 at 08:31:53AM -0800, Sridhar Samudrala wrote:
> > On Thu, 2011-01-20 at 17:35 +0200, Michael S. Tsirkin wrote:
> > > When MSI is off, each interrupt needs to be bounced through the io thread when it's set/cleared, so vhost-net causes more context switches and higher CPU utilization than userspace virtio which handles networking in the same thread.
> > >
> > > We'll need to fix this by adding level irq support in kvm irqfd, for now disable vhost-net in these configurations.
> > >
> > > Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
> > > ---
> > >
> > > I need to report some error from virtio-pci that would be handled specially (disable but don't report an error) so I wanted one that's never likely to be used by a userspace ioctl. I selected ERANGE but it'd be easy to switch to something else. Comments?
> >
> > Should this error be EVHOST_DISABLED rather than EVIRTIO_DISABLED?
> >
> > -Sridhar
>
> The error is reported by virtio-pci which does not know about vhost. I started with EVIRTIO_MSIX_DISABLED and made it shorter. Would EVIRTIO_MSIX_DISABLED be better?

I think so. This makes it more clear.
-Sridhar
[Qemu-devel] Re: kvm networking todo wiki
On Tue, 2010-09-21 at 18:11 +0200, Michael S. Tsirkin wrote:
> I've put up a wiki page with a kvm networking todo list, mainly to avoid effort duplication, but also in the hope to draw attention to what I think we should try addressing in KVM:
>
> http://www.linux-kvm.org/page/NetworkingTodo
>
> This page could cover all networking related activity in KVM, currently most info is related to virtio-net.
>
> Note: if there's no developer listed for an item, this just means I don't know of anyone actively working on an issue at the moment, not that no one intends to.
>
> I would appreciate it if others working on one of the items on this list would add their names so we can communicate better. If others like this wiki page, please go ahead and add stuff you are working on if any.
>
> It would be especially nice to add autotest projects: there is just a short test matrix and a catch-all 'Cover test matrix with autotest', currently.
>
> Currently there are some links to Red Hat bugzilla entries, feel free to add links to other bugzillas.

Thanks for capturing these items. It is really useful.

Another item that is missing is
- support assigning SR-IOV VF to a guest via tap/macvtap

Currently, this requires
- VF to be put in promiscuous mode when using a bridge/tap
- add a new mac address to VF when using macvtap.

I don't think any of the VF drivers provide these capabilities at this time.

-Sridhar
[Qemu-devel] Re: [PATCH qemu-kvm] Add raw(af_packet) network backend to qemu
On Wed, 2010-01-27 at 14:56 -0800, Sridhar Samudrala wrote:
> On Wed, 2010-01-27 at 22:39 +0100, Arnd Bergmann wrote:
> > On Wednesday 27 January 2010, Anthony Liguori wrote:
> > > > > I think -net socket,fd should just be (trivially) extended to work with raw sockets out of the box, with no support for opening it. Then you can have libvirt or some wrapper open a raw socket and a private namespace and just pass it down.
> > > >
> > > > That'd work. Anthony?
> > >
> > > The fundamental problem that I have with all of this is that we should not be introducing new network backends that are based around something only a developer is going to understand.
> > >
> > > If I'm a user and I want to use an external switch in VEPA mode, how in the world am I going to know that I'm supposed to use the -net raw backend or the -net socket backend? It might as well be the -net butterflies backend as far as a user is concerned.
> >
> > My point is that we already have -net socket,fd and any user that passes an fd into that already knows what he wants to do with it. Making it work with raw sockets is just a natural extension to this, which works on all kernels and (with separate namespaces) is reasonably secure.
>
> Didn't realize that -net socket is already there and supports TCP and UDP sockets. I will look into extending -net socket to support AF_PACKET SOCK_RAW type sockets.

OK. Here is a patch that adds AF_PACKET-SOCK_RAW support to the -netdev socket backend. It allows specifying an already opened raw fd or an ifname to which a raw socket can be bound.

  -netdev socket,fd=X,id=str
  -netdev socket,ifname=ethX/macvlanX,id=str

However, I found that struct NetSocketState doesn't include all the state info that is required to support AF_PACKET raw sockets. So I had to add NetSocketRawState and also couldn't re-use much of the code.

I think the -net socket backend is more geared towards AF_INET sockets. Adding support for a new family of socket doesn't fit nicely with the existing code.
But if this approach is more acceptable than a new -net raw,fd backend, I am fine with it.

Thanks
Sridhar

diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index eba578a..7d62dd9 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -15,6 +15,7 @@
 #include "net.h"
 #include "net/checksum.h"
 #include "net/tap.h"
+#include "net/socket.h"
 #include "qemu-timer.h"
 #include "virtio-net.h"
 
@@ -133,6 +134,9 @@ static int peer_has_vnet_hdr(VirtIONet *n)
     case NET_CLIENT_TYPE_TAP:
         n->has_vnet_hdr = tap_has_vnet_hdr(n->nic->nc.peer);
         break;
+    case NET_CLIENT_TYPE_SOCKET_RAW:
+        n->has_vnet_hdr = sock_raw_has_vnet_hdr(n->nic->nc.peer);
+        break;
     default:
         return 0;
     }
@@ -149,6 +153,9 @@ static int peer_has_ufo(VirtIONet *n)
     case NET_CLIENT_TYPE_TAP:
         n->has_ufo = tap_has_ufo(n->nic->nc.peer);
         break;
+    case NET_CLIENT_TYPE_SOCKET_RAW:
+        n->has_ufo = sock_raw_has_ufo(n->nic->nc.peer);
+        break;
     default:
         return 0;
     }
@@ -165,6 +172,9 @@ static void peer_using_vnet_hdr(VirtIONet *n, int using_vnet_hdr)
     case NET_CLIENT_TYPE_TAP:
         tap_using_vnet_hdr(n->nic->nc.peer, using_vnet_hdr);
         break;
+    case NET_CLIENT_TYPE_SOCKET_RAW:
+        sock_raw_using_vnet_hdr(n->nic->nc.peer, using_vnet_hdr);
+        break;
     default:
         break;
     }
@@ -180,6 +190,9 @@ static void peer_set_offload(VirtIONet *n, int csum, int tso4, int tso6,
     case NET_CLIENT_TYPE_TAP:
         tap_set_offload(n->nic->nc.peer, csum, tso4, tso6, ecn, ufo);
         break;
+    case NET_CLIENT_TYPE_SOCKET_RAW:
+        sock_raw_set_offload(n->nic->nc.peer, csum, tso4, tso6, ecn, ufo);
+        break;
     default:
         break;
     }
diff --git a/net.c b/net.c
index 6ef93e6..3d25d64 100644
--- a/net.c
+++ b/net.c
@@ -1002,6 +1002,11 @@ static struct {
                 .type = QEMU_OPT_STRING,
                 .help = "UDP multicast address and port number",
             },
+            {
+                .name = "ifname",
+                .type = QEMU_OPT_STRING,
+                .help = "interface name",
+            },
             { /* end of list */ }
         },
 #ifdef CONFIG_VDE
diff --git a/net.h b/net.h
index 116bb80..74b3e69 100644
--- a/net.h
+++ b/net.h
@@ -34,7 +34,8 @@ typedef enum {
     NET_CLIENT_TYPE_TAP,
     NET_CLIENT_TYPE_SOCKET,
     NET_CLIENT_TYPE_VDE,
-    NET_CLIENT_TYPE_DUMP
+    NET_CLIENT_TYPE_DUMP,
+    NET_CLIENT_TYPE_SOCKET_RAW,
 } net_client_type;
 
 typedef void (NetPoll)(VLANClientState *, bool enable);
diff --git a/net/socket.c b/net/socket.c
index 5533737..56f5bad 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -32,6 +32,327 @@
 #include "qemu_socket.h"
 #include "sysemu.h"
 
+#include <netpacket/packet.h>
+#include <net/ethernet.h>
+#include <net/if.h>
+#include <sys/ioctl.h>
+
+/* Maximum GSO packet size (64k) plus plenty of room for
+ * the ethernet and virtio_net headers
+ */
+#define
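
Since the net/socket.c part of the patch is cut off in this archive, the following stand-alone sketch shows what "a raw socket bound to an ifname" generally looks like with the standard Linux AF_PACKET API. It is not the code from the patch: `fill_sockaddr_ll` and `open_raw_socket` are hypothetical names, and the sketch leaves out the vnet_hdr and offload handling the patch adds. Creating the socket requires CAP_NET_RAW.

```c
#include <arpa/inet.h>          /* htons */
#include <errno.h>
#include <net/ethernet.h>       /* ETH_P_ALL */
#include <net/if.h>             /* if_nametoindex */
#include <netpacket/packet.h>   /* struct sockaddr_ll */
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Fill a link-layer address selecting all protocols on one interface. */
static void fill_sockaddr_ll(struct sockaddr_ll *sll, unsigned int ifindex)
{
    memset(sll, 0, sizeof(*sll));
    sll->sll_family = AF_PACKET;
    sll->sll_protocol = htons(ETH_P_ALL);
    sll->sll_ifindex = (int)ifindex;
}

/*
 * Open an AF_PACKET/SOCK_RAW socket bound to `ifname`.
 * Returns the fd on success, -1 on failure with errno set.
 */
static int open_raw_socket(const char *ifname)
{
    struct sockaddr_ll sll;
    unsigned int ifindex;
    int fd, saved;

    ifindex = if_nametoindex(ifname);
    if (ifindex == 0) {
        return -1;                  /* errno set by if_nametoindex() */
    }

    fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    if (fd < 0) {
        return -1;
    }

    fill_sockaddr_ll(&sll, ifindex);
    if (bind(fd, (struct sockaddr *)&sll, sizeof(sll)) < 0) {
        saved = errno;              /* keep the bind() error across close() */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

With the fd= form of the option, a management tool would perform these steps itself and hand the resulting descriptor to qemu; with the ifname= form, qemu would need the privilege to do so on its own.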
[Qemu-devel] Re: [PATCH qemu-kvm] Add raw(af_packet) network backend to qemu
On Tue, 2010-01-26 at 14:47 -0600, Anthony Liguori wrote:
> On 01/26/2010 02:40 PM, Sridhar Samudrala wrote:
> > This patch adds raw socket backend to qemu and is based on Or Gerlitz's patch re-factored and ported to the latest qemu-kvm git tree. It also includes support for vnet_hdr option that enables gso/checksum offload with raw backend. You can find the linux kernel patch to support this feature here.
> > http://thread.gmane.org/gmane.linux.network/150308
> >
> > Signed-off-by: Sridhar Samudrala <s...@us.ibm.com>
>
> See the previous discussion about the raw backend from Or's original patch. There's no obvious reason why we should have this in addition to a tun/tap backend.
>
> The only use-case I know of is macvlan but macvtap addresses this functionality while not introducing the rather nasty security problems associated with a raw backend.

The raw backend can be attached to a physical device, macvlan or SR-IOV VF. I don't think the AF_PACKET socket itself introduces any security problems. The raw socket can be created only by a user with the CAP_NET_RAW capability. The only issue is if we need to assume that qemu itself is an untrusted process and a raw fd cannot be passed to it. But I think it is a useful backend to support in qemu that provides guest to remote host connectivity without the need for a bridge/tap.

macvtap could be an alternative if it supports binding to SR-IOV VFs too.

Thanks
Sridhar