Re: [PATCHv7] add mergeable buffers support to vhost_net

2010-05-03 Thread Michael S. Tsirkin
On Wed, Apr 28, 2010 at 01:57:12PM -0700, David L Stevens wrote:
 This patch adds mergeable receive buffer support to vhost_net.
 
 Signed-off-by: David L Stevens dlstev...@us.ibm.com

I've been doing some more testing before sending out a pull
request, and I see a drastic performance degradation in guest to host
traffic when this is applied but mergeable buffers are not in used
by userspace (existing qemu-kvm userspace).

This is both with and without my patch on top.

Without patch:
[...@tuck ~]$ sh runtest  21 | tee 
ser-meregeable-disabled-kernel-only-tun-only.log
Starting netserver at port 12865
set_up_server could not establish a listen endpoint for  port 12865 with family 
AF_UNSPEC
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.0.0.4 (11.0.0.4) 
port 0 AF_INET : demo
Recv   SendSend  Utilization   Service Demand
Socket Socket  Message  Elapsed  Send Recv SendRecv
Size   SizeSize Time Throughput  localremote   local   remote
bytes  bytes   bytessecs.10^6bits/s  % S  % S  us/KB   us/KB

 87380  16384  1638410.00  9107.26   89.2033.850.802   2.436  

With patch:
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.0.0.4 (11.0.0.4) 
port 0 AF_INET : demo
Recv   SendSend  Utilization   Service Demand
Socket Socket  Message  Elapsed  Send Recv SendRecv
Size   SizeSize Time Throughput  localremote   local   remote
bytes  bytes   bytessecs.10^6bits/s  % S  % S  us/KB   us/KB

 87380  16384  1638410.0035.00   2.21 0.62 5.181   11.575 


For ease of testing, I put this on my tree
git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost-broken

Please take a look.
Thanks!

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCHv7] add mergeable buffers support to vhost_net

2010-05-03 Thread David Stevens
Michael S. Tsirkin m...@redhat.com wrote on 05/03/2010 03:34:11 AM:

 On Wed, Apr 28, 2010 at 01:57:12PM -0700, David L Stevens wrote:
  This patch adds mergeable receive buffer support to vhost_net.
  
  Signed-off-by: David L Stevens dlstev...@us.ibm.com
 
 I've been doing some more testing before sending out a pull
 request, and I see a drastic performance degradation in guest to host
 traffic when this is applied but mergeable buffers are not in used
 by userspace (existing qemu-kvm userspace).

Actually, I wouldn't expect it to work at all; the qemu-kvm
patch (particularly the feature bit setting bug fix) is required.
Without it, I think the existing code will tell the guest to use
mergeable buffers while turning it off in vhost. That was completely
non-functional for me -- what version of qemu-kvm are you using?
What I did to test w/o mergeable buffers is turn off the
bit in VHOST_FEATURES. I'll recheck these, but qemu-kvm definitely
must be updated; the original doesn't correctly handle feature bits.

+-DLS

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCHv7] add mergeable buffers support to vhost_net

2010-05-03 Thread Michael S. Tsirkin
On Mon, May 03, 2010 at 08:39:08AM -0700, David Stevens wrote:
 Michael S. Tsirkin m...@redhat.com wrote on 05/03/2010 03:34:11 AM:
 
  On Wed, Apr 28, 2010 at 01:57:12PM -0700, David L Stevens wrote:
   This patch adds mergeable receive buffer support to vhost_net.
   
   Signed-off-by: David L Stevens dlstev...@us.ibm.com
  
  I've been doing some more testing before sending out a pull
  request, and I see a drastic performance degradation in guest to host
  traffic when this is applied but mergeable buffers are not in used
  by userspace (existing qemu-kvm userspace).
 
 Actually, I wouldn't expect it to work at all;
 the qemu-kvm
 patch (particularly the feature bit setting bug fix) is required.

Which bugfix is that?

 Without it, I think the existing code will tell the guest to use
 mergeable buffers while turning it off in vhost.

It should not do that, specifically
vhost_net_get_features does:
features = ~(1  VIRTIO_NET_F_MRG_RXBUF);
unconditionally. This was added with:
5751995a20e77cd9d61d00f7390401895fa172a6

I forced mergeable buffers off with -global virtio-net-pci.mrg_rxbuf=off
and it did not seem to help either.

 That was completely
 non-functional for me -- what version of qemu-kvm are you using?

992cc816c42f2e93db033919a9ddbfcd1da4 or later should work well
AFAIK.

 What I did to test w/o mergeable buffers is turn off the
 bit in VHOST_FEATURES.

It should be enough to force mergeable buffers to off by qemu command
line: -global virtio-net-pci.mrg_rxbuf=off 

 I'll recheck these, but qemu-kvm definitely
 must be updated; the original doesn't correctly handle feature bits.
 
 +-DLS

Hmm, I don't see the bug.

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCHv7] add mergeable buffers support to vhost_net

2010-05-03 Thread David Stevens
Michael S. Tsirkin m...@redhat.com wrote on 05/03/2010 08:56:14 AM:

 On Mon, May 03, 2010 at 08:39:08AM -0700, David Stevens wrote:
  Michael S. Tsirkin m...@redhat.com wrote on 05/03/2010 03:34:11 AM:
  
   On Wed, Apr 28, 2010 at 01:57:12PM -0700, David L Stevens wrote:
This patch adds mergeable receive buffer support to vhost_net.

Signed-off-by: David L Stevens dlstev...@us.ibm.com
   
   I've been doing some more testing before sending out a pull
   request, and I see a drastic performance degradation in guest to 
host
   traffic when this is applied but mergeable buffers are not in used
   by userspace (existing qemu-kvm userspace).
  
  Actually, I wouldn't expect it to work at all;
  the qemu-kvm
  patch (particularly the feature bit setting bug fix) is required.
 
 Which bugfix is that?

Actually, I see you put that upstream already--
commit dc14a397812b91dd0d48b03d1b8f66a251542369  in
Avi's tree is the one I was talking about.
I'll look further.

+-DLS

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[GIT PULL] first round of vhost-net enhancements for net-next

2010-05-03 Thread Michael S. Tsirkin
David,
The following tree includes a couple of enhancements that help vhost-net.
Please pull them for net-next. Another set of patches is under
debugging/testing and I hope to get them ready in time for 2.6.35,
so there may be another pull request later.

Thanks!

The following changes since commit 7ef527377b88ff05fb122a47619ea506c631c914:

  Merge branch 'master' of 
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 (2010-05-02 22:02:06 
-0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost

Michael S. Tsirkin (2):
  tun: add ioctl to modify vnet header size
  macvtap: add ioctl to modify vnet header size

 drivers/net/macvtap.c  |   31 +++
 drivers/net/tun.c  |   32 
 include/linux/if_tun.h |2 ++
 3 files changed, 57 insertions(+), 8 deletions(-)

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [GIT PULL] first round of vhost-net enhancements for net-next

2010-05-03 Thread David Miller
From: David Miller da...@davemloft.net
Date: Mon, 03 May 2010 15:07:29 -0700 (PDT)

 From: Michael S. Tsirkin m...@redhat.com
 Date: Tue, 4 May 2010 00:32:45 +0300
 
 The following tree includes a couple of enhancements that help vhost-net.
 Please pull them for net-next. Another set of patches is under
 debugging/testing and I hope to get them ready in time for 2.6.35,
 so there may be another pull request later.
 
 Pulled, thanks.

Nevermind, reverted.

Do you even compile test what you send to people?

drivers/net/macvtap.c: In function ‘macvtap_ioctl’:
drivers/net/macvtap.c:713: warning: control reaches end of non-void function

You're really batting 1000 today Michael...
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization

Re: [GIT PULL] first round of vhost-net enhancements for net-next

2010-05-03 Thread David Miller
From: Michael S. Tsirkin m...@redhat.com
Date: Tue, 4 May 2010 00:32:45 +0300

 The following tree includes a couple of enhancements that help vhost-net.
 Please pull them for net-next. Another set of patches is under
 debugging/testing and I hope to get them ready in time for 2.6.35,
 so there may be another pull request later.

Pulled, thanks.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [GIT PULL] first round of vhost-net enhancements for net-next

2010-05-03 Thread Michael S. Tsirkin
On Mon, May 03, 2010 at 03:08:29PM -0700, David Miller wrote:
 From: David Miller da...@davemloft.net
 Date: Mon, 03 May 2010 15:07:29 -0700 (PDT)
 
  From: Michael S. Tsirkin m...@redhat.com
  Date: Tue, 4 May 2010 00:32:45 +0300
  
  The following tree includes a couple of enhancements that help vhost-net.
  Please pull them for net-next. Another set of patches is under
  debugging/testing and I hope to get them ready in time for 2.6.35,
  so there may be another pull request later.
  
  Pulled, thanks.
 
 Nevermind, reverted.
 
 Do you even compile test what you send to people?
 
 drivers/net/macvtap.c: In function ‘macvtap_ioctl’:
 drivers/net/macvtap.c:713: warning: control reaches end of non-void function
 
 You're really batting 1000 today Michael...

Ouch. Should teach me not to send out stuff after midnight. Sorry about
the noise.

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization

Re: [PATCHv7] add mergeable buffers support to vhost_net

2010-05-03 Thread Michael S. Tsirkin
On Wed, Apr 28, 2010 at 01:57:12PM -0700, David L Stevens wrote:
 This patch adds mergeable receive buffer support to vhost_net.
 
 Signed-off-by: David L Stevens dlstev...@us.ibm.com
 
 diff -ruNp net-next-v0/drivers/vhost/net.c net-next-v7/drivers/vhost/net.c
 --- net-next-v0/drivers/vhost/net.c   2010-04-24 21:36:54.0 -0700
 +++ net-next-v7/drivers/vhost/net.c   2010-04-28 12:26:18.0 -0700
 @@ -74,6 +74,23 @@ static int move_iovec_hdr(struct iovec *
   }
   return seg;
  }
 +/* Copy iovec entries for len bytes from iovec. Return segments used. */
 +static int copy_iovec_hdr(const struct iovec *from, struct iovec *to,
 +   size_t len, int iovcount)
 +{
 + int seg = 0;
 + size_t size;
 + while (len  seg  iovcount) {
 + size = min(from-iov_len, len);
 + to-iov_base = from-iov_base;
 + to-iov_len = size;
 + len -= size;
 + ++from;
 + ++to;
 + ++seg;
 + }
 + return seg;
 +}
  
  /* Caller must have TX VQ lock */
  static void tx_poll_stop(struct vhost_net *net)
 @@ -109,7 +126,7 @@ static void handle_tx(struct vhost_net *
   };
   size_t len, total_len = 0;
   int err, wmem;
 - size_t hdr_size;
 + size_t vhost_hlen;
   struct socket *sock = rcu_dereference(vq-private_data);
   if (!sock)
   return;
 @@ -128,13 +145,13 @@ static void handle_tx(struct vhost_net *
  
   if (wmem  sock-sk-sk_sndbuf / 2)
   tx_poll_stop(net);
 - hdr_size = vq-hdr_size;
 + vhost_hlen = vq-vhost_hlen;
  
   for (;;) {
 - head = vhost_get_vq_desc(net-dev, vq, vq-iov,
 -  ARRAY_SIZE(vq-iov),
 -  out, in,
 -  NULL, NULL);
 + head = vhost_get_desc(net-dev, vq, vq-iov,
 +   ARRAY_SIZE(vq-iov),
 +   out, in,
 +   NULL, NULL);
   /* Nothing new?  Wait for eventfd to tell us they refilled. */
   if (head == vq-num) {
   wmem = atomic_read(sock-sk-sk_wmem_alloc);
 @@ -155,20 +172,20 @@ static void handle_tx(struct vhost_net *
   break;
   }
   /* Skip header. TODO: support TSO. */
 - s = move_iovec_hdr(vq-iov, vq-hdr, hdr_size, out);
 + s = move_iovec_hdr(vq-iov, vq-hdr, vhost_hlen, out);
   msg.msg_iovlen = out;
   len = iov_length(vq-iov, out);
   /* Sanity check */
   if (!len) {
   vq_err(vq, Unexpected header len for TX: 
  %zd expected %zd\n,
 -iov_length(vq-hdr, s), hdr_size);
 +iov_length(vq-hdr, s), vhost_hlen);
   break;
   }
   /* TODO: Check specific error and bomb out unless ENOBUFS? */
   err = sock-ops-sendmsg(NULL, sock, msg, len);
   if (unlikely(err  0)) {
 - vhost_discard_vq_desc(vq);
 + vhost_discard_desc(vq, 1);
   tx_poll_start(net, sock);
   break;
   }
 @@ -187,12 +204,25 @@ static void handle_tx(struct vhost_net *
   unuse_mm(net-dev.mm);
  }
  
 +static int vhost_head_len(struct vhost_virtqueue *vq, struct sock *sk)
 +{
 + struct sk_buff *head;
 + int len = 0;
 +
 + lock_sock(sk);
 + head = skb_peek(sk-sk_receive_queue);
 + if (head)
 + len = head-len + vq-sock_hlen;
 + release_sock(sk);
 + return len;
 +}
 +
  /* Expects to be always run from workqueue - which acts as
   * read-size critical section for our kind of RCU. */
  static void handle_rx(struct vhost_net *net)
  {
   struct vhost_virtqueue *vq = net-dev.vqs[VHOST_NET_VQ_RX];
 - unsigned head, out, in, log, s;
 + unsigned in, log, s;
   struct vhost_log *vq_log;
   struct msghdr msg = {
   .msg_name = NULL,
 @@ -203,14 +233,14 @@ static void handle_rx(struct vhost_net *
   .msg_flags = MSG_DONTWAIT,
   };
  
 - struct virtio_net_hdr hdr = {
 - .flags = 0,
 - .gso_type = VIRTIO_NET_HDR_GSO_NONE
 + struct virtio_net_hdr_mrg_rxbuf hdr = {
 + .hdr.flags = 0,
 + .hdr.gso_type = VIRTIO_NET_HDR_GSO_NONE
   };
  
   size_t len, total_len = 0;
 - int err;
 - size_t hdr_size;
 + int err, headcount, datalen;
 + size_t vhost_hlen;
   struct socket *sock = rcu_dereference(vq-private_data);
   if (!sock || skb_queue_empty(sock-sk-sk_receive_queue))
   return;
 @@ -218,18 +248,19 @@ static void handle_rx(struct vhost_net *
   use_mm(net-dev.mm);
   mutex_lock(vq-mutex);
   vhost_disable_notify(vq);
 - hdr_size = 

Re: [PATCH] virtio-spec: document block CMD and FLUSH

2010-05-03 Thread Rusty Russell
On Fri, 19 Feb 2010 08:52:20 am Michael S. Tsirkin wrote:
 I took a stub at documenting CMD and FLUSH request types in virtio
 block.  Christoph, could you look over this please?
 
 I note that the interface seems full of warts to me,
 this might be a first step to cleaning them.

ISTR Christoph had withdrawn some patches in this area, and was waiting
for him to resubmit?

I've given up on figuring out the block device.  What seem to me to be sane
semantics along the lines of memory barriers are foreign to disk people: they
want (and depend on) flushing everywhere.

For example, tdb transactions do not require a flush, they only require what
I would call a barrier: that prior data be written out before any future data.
Surely that would be more efficient in general than a flush!  In fact, TDB
wants only writes to *that file* (and metadata) written out first; it has no
ordering issues with other I/O on the same device.

A generic I/O interface would allow you to specify this request depends on 
these
outstanding requests and leave it at that.  It might have some sync flush
command for dumb applications and OSes.  The userspace API might be not be as
precise and only allow such a barrier against all prior writes on this fd.

ISTR someone mentioning a desire for such an API years ago, so CC'ing the
usual I/O suspects...

Cheers,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization