Re: [PATCHv7] add mergeable buffers support to vhost_net
On Wed, Apr 28, 2010 at 01:57:12PM -0700, David L Stevens wrote: This patch adds mergeable receive buffer support to vhost_net. Signed-off-by: David L Stevens dlstev...@us.ibm.com I've been doing some more testing before sending out a pull request, and I see a drastic performance degradation in guest to host traffic when this is applied but mergeable buffers are not in used by userspace (existing qemu-kvm userspace). This is both with and without my patch on top. Without patch: [...@tuck ~]$ sh runtest 21 | tee ser-meregeable-disabled-kernel-only-tun-only.log Starting netserver at port 12865 set_up_server could not establish a listen endpoint for port 12865 with family AF_UNSPEC TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.0.0.4 (11.0.0.4) port 0 AF_INET : demo Recv SendSend Utilization Service Demand Socket Socket Message Elapsed Send Recv SendRecv Size SizeSize Time Throughput localremote local remote bytes bytes bytessecs.10^6bits/s % S % S us/KB us/KB 87380 16384 1638410.00 9107.26 89.2033.850.802 2.436 With patch: TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.0.0.4 (11.0.0.4) port 0 AF_INET : demo Recv SendSend Utilization Service Demand Socket Socket Message Elapsed Send Recv SendRecv Size SizeSize Time Throughput localremote local remote bytes bytes bytessecs.10^6bits/s % S % S us/KB us/KB 87380 16384 1638410.0035.00 2.21 0.62 5.181 11.575 For ease of testing, I put this on my tree git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost-broken Please take a look. Thanks! -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCHv7] add mergeable buffers support to vhost_net
Michael S. Tsirkin m...@redhat.com wrote on 05/03/2010 03:34:11 AM: On Wed, Apr 28, 2010 at 01:57:12PM -0700, David L Stevens wrote: This patch adds mergeable receive buffer support to vhost_net. Signed-off-by: David L Stevens dlstev...@us.ibm.com I've been doing some more testing before sending out a pull request, and I see a drastic performance degradation in guest to host traffic when this is applied but mergeable buffers are not in used by userspace (existing qemu-kvm userspace). Actually, I wouldn't expect it to work at all; the qemu-kvm patch (particularly the feature bit setting bug fix) is required. Without it, I think the existing code will tell the guest to use mergeable buffers while turning it off in vhost. That was completely non-functional for me -- what version of qemu-kvm are you using? What I did to test w/o mergeable buffers is turn off the bit in VHOST_FEATURES. I'll recheck these, but qemu-kvm definitely must be updated; the original doesn't correctly handle feature bits. +-DLS ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCHv7] add mergeable buffers support to vhost_net
On Mon, May 03, 2010 at 08:39:08AM -0700, David Stevens wrote: Michael S. Tsirkin m...@redhat.com wrote on 05/03/2010 03:34:11 AM: On Wed, Apr 28, 2010 at 01:57:12PM -0700, David L Stevens wrote: This patch adds mergeable receive buffer support to vhost_net. Signed-off-by: David L Stevens dlstev...@us.ibm.com I've been doing some more testing before sending out a pull request, and I see a drastic performance degradation in guest to host traffic when this is applied but mergeable buffers are not in used by userspace (existing qemu-kvm userspace). Actually, I wouldn't expect it to work at all; the qemu-kvm patch (particularly the feature bit setting bug fix) is required. Which bugfix is that? Without it, I think the existing code will tell the guest to use mergeable buffers while turning it off in vhost. It should not do that, specifically vhost_net_get_features does: features = ~(1 VIRTIO_NET_F_MRG_RXBUF); unconditionally. This was added with: 5751995a20e77cd9d61d00f7390401895fa172a6 I forced mergeable buffers off with -global virtio-net-pci.mrg_rxbuf=off and it did not seem to help either. That was completely non-functional for me -- what version of qemu-kvm are you using? 992cc816c42f2e93db033919a9ddbfcd1da4 or later should work well AFAIK. What I did to test w/o mergeable buffers is turn off the bit in VHOST_FEATURES. It should be enough to force mergeable buffers to off by qemu command line: -global virtio-net-pci.mrg_rxbuf=off I'll recheck these, but qemu-kvm definitely must be updated; the original doesn't correctly handle feature bits. +-DLS Hmm, I don't see the bug. -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCHv7] add mergeable buffers support to vhost_net
Michael S. Tsirkin m...@redhat.com wrote on 05/03/2010 08:56:14 AM: On Mon, May 03, 2010 at 08:39:08AM -0700, David Stevens wrote: Michael S. Tsirkin m...@redhat.com wrote on 05/03/2010 03:34:11 AM: On Wed, Apr 28, 2010 at 01:57:12PM -0700, David L Stevens wrote: This patch adds mergeable receive buffer support to vhost_net. Signed-off-by: David L Stevens dlstev...@us.ibm.com I've been doing some more testing before sending out a pull request, and I see a drastic performance degradation in guest to host traffic when this is applied but mergeable buffers are not in used by userspace (existing qemu-kvm userspace). Actually, I wouldn't expect it to work at all; the qemu-kvm patch (particularly the feature bit setting bug fix) is required. Which bugfix is that? Actually, I see you put that upstream already-- commit dc14a397812b91dd0d48b03d1b8f66a251542369 in Avi's tree is the one I was talking about. I'll look further. +-DLS ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
[GIT PULL] first round of vhost-net enhancements for net-next
David, The following tree includes a couple of enhancements that help vhost-net. Please pull them for net-next. Another set of patches is under debugging/testing and I hope to get them ready in time for 2.6.35, so there may be another pull request later. Thanks! The following changes since commit 7ef527377b88ff05fb122a47619ea506c631c914: Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 (2010-05-02 22:02:06 -0700) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost Michael S. Tsirkin (2): tun: add ioctl to modify vnet header size macvtap: add ioctl to modify vnet header size drivers/net/macvtap.c | 31 +++ drivers/net/tun.c | 32 include/linux/if_tun.h |2 ++ 3 files changed, 57 insertions(+), 8 deletions(-) -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [GIT PULL] first round of vhost-net enhancements for net-next
From: David Miller da...@davemloft.net Date: Mon, 03 May 2010 15:07:29 -0700 (PDT) From: Michael S. Tsirkin m...@redhat.com Date: Tue, 4 May 2010 00:32:45 +0300 The following tree includes a couple of enhancements that help vhost-net. Please pull them for net-next. Another set of patches is under debugging/testing and I hope to get them ready in time for 2.6.35, so there may be another pull request later. Pulled, thanks. Nevermind, reverted. Do you even compile test what you send to people? drivers/net/macvtap.c: In function ‘macvtap_ioctl’: drivers/net/macvtap.c:713: warning: control reaches end of non-void function You're really batting 1000 today Michael... ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [GIT PULL] first round of vhost-net enhancements for net-next
From: Michael S. Tsirkin m...@redhat.com Date: Tue, 4 May 2010 00:32:45 +0300 The following tree includes a couple of enhancements that help vhost-net. Please pull them for net-next. Another set of patches is under debugging/testing and I hope to get them ready in time for 2.6.35, so there may be another pull request later. Pulled, thanks. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [GIT PULL] first round of vhost-net enhancements for net-next
On Mon, May 03, 2010 at 03:08:29PM -0700, David Miller wrote: From: David Miller da...@davemloft.net Date: Mon, 03 May 2010 15:07:29 -0700 (PDT) From: Michael S. Tsirkin m...@redhat.com Date: Tue, 4 May 2010 00:32:45 +0300 The following tree includes a couple of enhancements that help vhost-net. Please pull them for net-next. Another set of patches is under debugging/testing and I hope to get them ready in time for 2.6.35, so there may be another pull request later. Pulled, thanks. Nevermind, reverted. Do you even compile test what you send to people? drivers/net/macvtap.c: In function ‘macvtap_ioctl’: drivers/net/macvtap.c:713: warning: control reaches end of non-void function You're really batting 1000 today Michael... Ouch. Should teach me not to send out stuff after midnight. Sorry about the noise. -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCHv7] add mergeable buffers support to vhost_net
On Wed, Apr 28, 2010 at 01:57:12PM -0700, David L Stevens wrote: This patch adds mergeable receive buffer support to vhost_net. Signed-off-by: David L Stevens dlstev...@us.ibm.com diff -ruNp net-next-v0/drivers/vhost/net.c net-next-v7/drivers/vhost/net.c --- net-next-v0/drivers/vhost/net.c 2010-04-24 21:36:54.0 -0700 +++ net-next-v7/drivers/vhost/net.c 2010-04-28 12:26:18.0 -0700 @@ -74,6 +74,23 @@ static int move_iovec_hdr(struct iovec * } return seg; } +/* Copy iovec entries for len bytes from iovec. Return segments used. */ +static int copy_iovec_hdr(const struct iovec *from, struct iovec *to, + size_t len, int iovcount) +{ + int seg = 0; + size_t size; + while (len seg iovcount) { + size = min(from-iov_len, len); + to-iov_base = from-iov_base; + to-iov_len = size; + len -= size; + ++from; + ++to; + ++seg; + } + return seg; +} /* Caller must have TX VQ lock */ static void tx_poll_stop(struct vhost_net *net) @@ -109,7 +126,7 @@ static void handle_tx(struct vhost_net * }; size_t len, total_len = 0; int err, wmem; - size_t hdr_size; + size_t vhost_hlen; struct socket *sock = rcu_dereference(vq-private_data); if (!sock) return; @@ -128,13 +145,13 @@ static void handle_tx(struct vhost_net * if (wmem sock-sk-sk_sndbuf / 2) tx_poll_stop(net); - hdr_size = vq-hdr_size; + vhost_hlen = vq-vhost_hlen; for (;;) { - head = vhost_get_vq_desc(net-dev, vq, vq-iov, - ARRAY_SIZE(vq-iov), - out, in, - NULL, NULL); + head = vhost_get_desc(net-dev, vq, vq-iov, + ARRAY_SIZE(vq-iov), + out, in, + NULL, NULL); /* Nothing new? Wait for eventfd to tell us they refilled. */ if (head == vq-num) { wmem = atomic_read(sock-sk-sk_wmem_alloc); @@ -155,20 +172,20 @@ static void handle_tx(struct vhost_net * break; } /* Skip header. TODO: support TSO. */ - s = move_iovec_hdr(vq-iov, vq-hdr, hdr_size, out); + s = move_iovec_hdr(vq-iov, vq-hdr, vhost_hlen, out); msg.msg_iovlen = out; len = iov_length(vq-iov, out); /* Sanity check */ if (!len) { vq_err(vq, Unexpected header len for TX: %zd expected %zd\n, -iov_length(vq-hdr, s), hdr_size); +iov_length(vq-hdr, s), vhost_hlen); break; } /* TODO: Check specific error and bomb out unless ENOBUFS? */ err = sock-ops-sendmsg(NULL, sock, msg, len); if (unlikely(err 0)) { - vhost_discard_vq_desc(vq); + vhost_discard_desc(vq, 1); tx_poll_start(net, sock); break; } @@ -187,12 +204,25 @@ static void handle_tx(struct vhost_net * unuse_mm(net-dev.mm); } +static int vhost_head_len(struct vhost_virtqueue *vq, struct sock *sk) +{ + struct sk_buff *head; + int len = 0; + + lock_sock(sk); + head = skb_peek(sk-sk_receive_queue); + if (head) + len = head-len + vq-sock_hlen; + release_sock(sk); + return len; +} + /* Expects to be always run from workqueue - which acts as * read-size critical section for our kind of RCU. */ static void handle_rx(struct vhost_net *net) { struct vhost_virtqueue *vq = net-dev.vqs[VHOST_NET_VQ_RX]; - unsigned head, out, in, log, s; + unsigned in, log, s; struct vhost_log *vq_log; struct msghdr msg = { .msg_name = NULL, @@ -203,14 +233,14 @@ static void handle_rx(struct vhost_net * .msg_flags = MSG_DONTWAIT, }; - struct virtio_net_hdr hdr = { - .flags = 0, - .gso_type = VIRTIO_NET_HDR_GSO_NONE + struct virtio_net_hdr_mrg_rxbuf hdr = { + .hdr.flags = 0, + .hdr.gso_type = VIRTIO_NET_HDR_GSO_NONE }; size_t len, total_len = 0; - int err; - size_t hdr_size; + int err, headcount, datalen; + size_t vhost_hlen; struct socket *sock = rcu_dereference(vq-private_data); if (!sock || skb_queue_empty(sock-sk-sk_receive_queue)) return; @@ -218,18 +248,19 @@ static void handle_rx(struct vhost_net * use_mm(net-dev.mm); mutex_lock(vq-mutex); vhost_disable_notify(vq); - hdr_size =
Re: [PATCH] virtio-spec: document block CMD and FLUSH
On Fri, 19 Feb 2010 08:52:20 am Michael S. Tsirkin wrote: I took a stub at documenting CMD and FLUSH request types in virtio block. Christoph, could you look over this please? I note that the interface seems full of warts to me, this might be a first step to cleaning them. ISTR Christoph had withdrawn some patches in this area, and was waiting for him to resubmit? I've given up on figuring out the block device. What seem to me to be sane semantics along the lines of memory barriers are foreign to disk people: they want (and depend on) flushing everywhere. For example, tdb transactions do not require a flush, they only require what I would call a barrier: that prior data be written out before any future data. Surely that would be more efficient in general than a flush! In fact, TDB wants only writes to *that file* (and metadata) written out first; it has no ordering issues with other I/O on the same device. A generic I/O interface would allow you to specify this request depends on these outstanding requests and leave it at that. It might have some sync flush command for dumb applications and OSes. The userspace API might be not be as precise and only allow such a barrier against all prior writes on this fd. ISTR someone mentioning a desire for such an API years ago, so CC'ing the usual I/O suspects... Cheers, Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization