Re: [RFC][ PATCH 0/3] vhost-net: Add mergeable RX buffer support to vhost-net

2010-03-03 Thread Michael S. Tsirkin
On Tue, Mar 02, 2010 at 04:20:03PM -0800, David Stevens wrote:
> These patches add support for mergeable receive buffers to
> vhost-net, allowing it to use multiple virtio buffer heads for a single
> receive packet.
> +-DLS
> 
> 
> Signed-off-by: David L Stevens dlstev...@us.ibm.com

Do you have performance numbers (both with and without mergeable buffers
in guest)?

-- 
MST


Re: [RFC][ PATCH 0/3] vhost-net: Add mergeable RX buffer support to vhost-net

2010-03-03 Thread Michael S. Tsirkin
On Wed, Mar 03, 2010 at 12:54:25AM -0800, David Stevens wrote:
> Michael S. Tsirkin m...@redhat.com wrote on 03/02/2010 11:54:32 PM:
> 
> > On Tue, Mar 02, 2010 at 04:20:03PM -0800, David Stevens wrote:
> > > These patches add support for mergeable receive buffers to
> > > vhost-net, allowing it to use multiple virtio buffer heads for a single
> > > receive packet.
> > > +-DLS
> > > 
> > > 
> > > Signed-off-by: David L Stevens dlstev...@us.ibm.com
> > 
> > Do you have performance numbers (both with and without mergeable buffers
> > in guest)?
> 
> Michael,
> Nothing formal. I did some TCP single-stream throughput tests
> and was seeing 20-25% improvement on a laptop (ie, low-end hardware).
> That actually surprised me; I'd think it'd be about the same, except
> maybe in a test that has mixed packet sizes. Comparisons with the
> net-next kernel these patches are for showed only ~10% improvement.
> But I also see a lot of variability both among different
> configurations and with the same configuration on different runs.
> So, I don't feel like those numbers are very solid, and I haven't
> yet done any tests on bigger hardware.

Interesting. Since the feature in question is billed first of all as a
performance optimization, I think we might need some performance numbers
as a motivation.

Since the patches affect code paths when mergeable RX buffers are
disabled as well, I guess the most important point would be to verify
whether there's increase in latency and/or CPU utilization, or bandwidth
cost when the feature bit is *disabled*.

> 2 notes: I have a modified version of qemu to get the VHOST_FEATURES
> flags, including the mergeable RX bufs flag, passed to the guest; I'll
> be working with your current qemu git trees next, if any changes are
> needed to support it there.

This feature also seems to conflict with the zero-copy rx patches from Xin
Xiaohui (subject: "Provide a zero-copy method on KVM virtio-net"); these
are not in a mergeable shape yet, so this is not a blocker, but I wonder
what your thoughts on the subject are: how will we do feature
negotiation if some backends don't support some features?

> Second, I've found a missing initialization in the patches I
> sent on the list, so I'll send an updated patch 2 with the fix,

If you do, any chance you could use git send-email for this?

> and qemu patches when they are ready (plus any code-review comments
> incorporated).
 

Pls take a look here as well
http://www.openfabrics.org/~mst/boring.txt



[RFC][ PATCH 3/3] vhost-net: Add mergeable RX buffer support to vhost-net

2010-03-03 Thread David Stevens
This patch glues them all together and makes sure we
notify whenever we don't have enough buffers to receive
a max-sized packet, and adds the feature bit.

Signed-off-by: David L Stevens dlstev...@us.ibm.com
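
For context, the header the guest sees when VIRTIO_NET_F_MRG_RXBUF is
negotiated is the standard virtio-net header plus a buffer count; this is
struct virtio_net_hdr_mrg_rxbuf from include/linux/virtio_net.h, reproduced
here only as a reminder of what the num_buffers assignment below is writing:

/* Header used per packet when mergeable RX buffers are negotiated;
 * num_buffers is how many descriptor chains the packet was spread across
 * (handle_rx() below stores 'headcount' here). */
struct virtio_net_hdr_mrg_rxbuf {
        struct virtio_net_hdr hdr;      /* flags, gso_type, hdr_len, ... */
        __u16 num_buffers;              /* number of merged buffers */
};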

diff -ruN net-next-p2/drivers/vhost/net.c net-next-p3/drivers/vhost/net.c
--- net-next-p2/drivers/vhost/net.c 2010-03-02 13:01:34.0 -0800
+++ net-next-p3/drivers/vhost/net.c 2010-03-02 15:25:15.0 -0800
@@ -54,26 +54,6 @@
enum vhost_net_poll_state tx_poll_state;
 };
 
-/* Pop first len bytes from iovec. Return number of segments used. */
-static int move_iovec_hdr(struct iovec *from, struct iovec *to,
- size_t len, int iov_count)
-{
-   int seg = 0;
-   size_t size;
-   while (len && seg < iov_count) {
-   size = min(from->iov_len, len);
-   to->iov_base = from->iov_base;
-   to->iov_len = size;
-   from->iov_len -= size;
-   from->iov_base += size;
-   len -= size;
-   ++from;
-   ++to;
-   ++seg;
-   }
-   return seg;
-}
-
 /* Caller must have TX VQ lock */
 static void tx_poll_stop(struct vhost_net *net)
 {
@@ -97,7 +77,7 @@
 static void handle_tx(struct vhost_net *net)
 {
struct vhost_virtqueue *vq = &net->dev.vqs[VHOST_NET_VQ_TX];
-   unsigned out, in, s;
+   unsigned out, in;
struct iovec head;
struct msghdr msg = {
.msg_name = NULL,
@@ -110,6 +90,7 @@
size_t len, total_len = 0;
int err, wmem;
struct socket *sock = rcu_dereference(vq->private_data);
+
if (!sock)
return;
 
@@ -166,11 +147,11 @@
/* Skip header. TODO: support TSO. */
msg.msg_iovlen = out;
head.iov_len = len = iov_length(vq->iov, out);
+
/* Sanity check */
if (!len) {
vq_err(vq, "Unexpected header len for TX: "
-  "%zd expected %zd\n",
-  len, vq->guest_hlen);
+  "%zd expected %zd\n", len, vq->guest_hlen);
break;
}
/* TODO: Check specific error and bomb out unless ENOBUFS? */
@@ -214,7 +195,7 @@
 static void handle_rx(struct vhost_net *net)
 {
struct vhost_virtqueue *vq = &net->dev.vqs[VHOST_NET_VQ_RX];
-   unsigned in, log, s;
+   unsigned in, log;
struct vhost_log *vq_log;
struct msghdr msg = {
.msg_name = NULL,
@@ -245,30 +226,36 @@
if (!headcount) {
vhost_enable_notify(vq);
break;
-   }
+   } else if (vq->maxheadcount < headcount)
+   vq->maxheadcount = headcount;
/* Skip header. TODO: support TSO/mergeable rx buffers. */
msg.msg_iovlen = in;
len = iov_length(vq->iov, in);
-
/* Sanity check */
if (!len) {
vq_err(vq, "Unexpected header len for RX: "
-  "%zd expected %zd\n",
-  len, vq->guest_hlen);
+  "%zd expected %zd\n", len, vq->guest_hlen);
break;
}
err = sock->ops->recvmsg(NULL, sock, &msg,
 len, MSG_DONTWAIT | MSG_TRUNC);
-   /* TODO: Check specific error and bomb out unless EAGAIN? */
if (err < 0) {
-   vhost_discard(vq, 1);
+   vhost_discard(vq, headcount);
break;
}
/* TODO: Should check and handle checksum. */
+   if (vhost_has_feature(&net->dev, VIRTIO_NET_F_MRG_RXBUF)) {
+   struct virtio_net_hdr_mrg_rxbuf *vhdr =
+   (struct virtio_net_hdr_mrg_rxbuf *)
+   vq->iov[0].iov_base;
+   /* add num_bufs */
+   vq->iov[0].iov_len = vq->guest_hlen;
+   vhdr->num_buffers = headcount;
+   }
if (err > len) {
pr_err("Discarded truncated rx packet: "
   " len %d > %zd\n", err, len);
-   vhost_discard(vq, 1);
+   vhost_discard(vq, headcount);
continue;
}
len = err;
@@ -573,8 +560,6 @@
 
 static int vhost_net_set_features(struct vhost_net *n, u64 features)
 {
-   size_t hdr_size = features & (1 << VHOST_NET_F_VIRTIO_NET_HDR) ?
-   sizeof(struct virtio_net_hdr) : 0;
int i;
mutex_lock(&n->dev.mutex);
if ((features & (1 << VHOST_F_LOG_ALL)) &&
diff -ruN net-next-p2/drivers/vhost/vhost.c net-next-p3/drivers/vhost/vhost.c
--- net-next-p2/drivers/vhost/vhost.c   

[RFC][ PATCH 2/3] vhost-net: handle vnet_hdr processing for MRG_RX_BUF

2010-03-03 Thread David Stevens
This patch adds vnet_hdr processing for mergeable buffer
support to vhost-net.

Signed-off-by: David L Stevens dlstev...@us.ibm.com

diff -ruN net-next-p1/drivers/vhost/net.c net-next-p2/drivers/vhost/net.c
--- net-next-p1/drivers/vhost/net.c 2010-03-01 11:44:22.0 -0800
+++ net-next-p2/drivers/vhost/net.c 2010-03-02 13:01:34.0 -0800
@@ -109,7 +109,6 @@
};
size_t len, total_len = 0;
int err, wmem;
-   size_t hdr_size;
struct socket *sock = rcu_dereference(vq->private_data);
if (!sock)
return;
@@ -124,7 +123,6 @@
 
if (wmem < sock->sk->sk_sndbuf * 2)
tx_poll_stop(net);
-   hdr_size = vq->hdr_size;
 
for (;;) {
head.iov_base = (void *)vhost_get_vq_desc(&net->dev, vq,
@@ -148,25 +146,45 @@
   "out %d, int %d\n", out, in);
break;
}
+   if (vq->guest_hlen > vq->sock_hlen) {
+   if (msg.msg_iov[0].iov_len == vq->guest_hlen)
+   msg.msg_iov[0].iov_len = vq->sock_hlen;
+   else if (out == ARRAY_SIZE(vq->iov))
+   vq_err(vq, "handle_tx iov overflow!");
+   else {
+   int i;
+
+   /* give header its own iov */
+   for (i=out; i>0; ++i)
+   msg.msg_iov[i+1] = msg.msg_iov[i];
+   msg.msg_iov[0].iov_len = vq->sock_hlen;
+   msg.msg_iov[1].iov_base += vq->guest_hlen;
+   msg.msg_iov[1].iov_len -= vq->guest_hlen;
+   out++;
+   }
+   }
/* Skip header. TODO: support TSO. */
-   s = move_iovec_hdr(vq->iov, vq->hdr, hdr_size, out);
msg.msg_iovlen = out;
head.iov_len = len = iov_length(vq->iov, out);
/* Sanity check */
if (!len) {
vq_err(vq, "Unexpected header len for TX: "
   "%zd expected %zd\n",
-  iov_length(vq->hdr, s), hdr_size);
+  len, vq->guest_hlen);
break;
}
/* TODO: Check specific error and bomb out unless ENOBUFS? */
err = sock->ops->sendmsg(NULL, sock, &msg, len);
if (unlikely(err < 0)) {
-   vhost_discard(vq, 1);
-   tx_poll_start(net, sock);
+   if (err == -EAGAIN) {
+   tx_poll_start(net, sock);
+   } else {
+   vq_err(vq, "sendmsg: errno %d\n", -err);
+   /* drop packet; do not discard/resend */
+ vhost_add_used_and_signal(&net->dev,vq,head,1);
+   }
break;
-   }
-   if (err != len)
+   } else if (err != len)
pr_err("Truncated TX packet: "
" len %d != %zd\n", err, len);
vhost_add_used_and_signal(&net->dev, vq, head, 1);
@@ -207,14 +225,8 @@
.msg_flags = MSG_DONTWAIT,
};
 
-   struct virtio_net_hdr hdr = {
-   .flags = 0,
-   .gso_type = VIRTIO_NET_HDR_GSO_NONE
-   };
-
size_t len, total_len = 0;
int err, headcount, datalen;
-   size_t hdr_size;
struct socket *sock = rcu_dereference(vq->private_data);
 
if (!sock || !skb_head_len(&sock->sk->sk_receive_queue))
@@ -223,7 +235,6 @@
use_mm(net->dev.mm);
mutex_lock(&vq->mutex);
vhost_disable_notify(vq);
-   hdr_size = vq->hdr_size;
 
vq_log = unlikely(vhost_has_feature(&net->dev, VHOST_F_LOG_ALL)) ?
vq->log : NULL;
@@ -232,25 +243,18 @@
headcount = vhost_get_heads(vq, datalen, &in, vq_log, &log);
/* OK, now we need to know about added descriptors. */
if (!headcount) {
-   if (unlikely(vhost_enable_notify(vq))) {
-   /* They have slipped one in as we were
-* doing that: check again. */
-   vhost_disable_notify(vq);
-   continue;
-   }
-   /* Nothing new?  Wait for eventfd to tell us
-* they refilled. */
+   vhost_enable_notify(vq);
break;
}
/* Skip header. TODO: support TSO/mergeable rx buffers. */
-   s = move_iovec_hdr(vq->iov, vq->hdr, hdr_size, in);
msg.msg_iovlen = in;
len = iov_length(vq->iov, in);
+
/* Sanity check */
 

[RFC][ PATCH 0/3] vhost-net: Add mergeable RX buffer support to vhost-net

2010-03-03 Thread David Stevens
These patches add support for mergeable receive buffers to
vhost-net, allowing it to use multiple virtio buffer heads for a single
receive packet.
+-DLS


Signed-off-by: David L Stevens dlstev...@us.ibm.com


[RFC][ PATCH 1/3] vhost-net: support multiple buffer heads in receiver

2010-03-03 Thread David Stevens
This patch generalizes buffer handling functions to
support multiple buffer heads.

In-line for viewing, attached for applying.

Signed-off-by: David L Stevens dlstev...@us.ibm.com

diff -ruN net-next-p0/drivers/vhost/net.c net-next-p1/drivers/vhost/net.c
--- net-next-p0/drivers/vhost/net.c 2010-02-24 12:59:24.0 -0800
+++ net-next-p1/drivers/vhost/net.c 2010-03-01 11:44:22.0 -0800
@@ -97,7 +97,8 @@
 static void handle_tx(struct vhost_net *net)
 {
struct vhost_virtqueue *vq = &net->dev.vqs[VHOST_NET_VQ_TX];
-   unsigned head, out, in, s;
+   unsigned out, in, s;
+   struct iovec head;
struct msghdr msg = {
.msg_name = NULL,
.msg_namelen = 0,
@@ -126,12 +127,10 @@
hdr_size = vq->hdr_size;
 
for (;;) {
-   head = vhost_get_vq_desc(&net->dev, vq, vq->iov,
-ARRAY_SIZE(vq->iov),
-&out, &in,
-NULL, NULL);
+   head.iov_base = (void *)vhost_get_vq_desc(&net->dev, vq,
+   vq->iov, ARRAY_SIZE(vq->iov), &out, &in, NULL, NULL);
/* Nothing new?  Wait for eventfd to tell us they refilled. */
-   if (head == vq->num) {
+   if (head.iov_base == (void *)vq->num) {
wmem = atomic_read(&sock->sk->sk_wmem_alloc);
if (wmem >= sock->sk->sk_sndbuf * 3 / 4) {
tx_poll_start(net, sock);
@@ -152,7 +151,7 @@
/* Skip header. TODO: support TSO. */
s = move_iovec_hdr(vq->iov, vq->hdr, hdr_size, out);
msg.msg_iovlen = out;
-   len = iov_length(vq->iov, out);
+   head.iov_len = len = iov_length(vq->iov, out);
/* Sanity check */
if (!len) {
vq_err(vq, "Unexpected header len for TX: "
@@ -163,14 +162,14 @@
/* TODO: Check specific error and bomb out unless ENOBUFS? */
err = sock->ops->sendmsg(NULL, sock, &msg, len);
if (unlikely(err < 0)) {
-   vhost_discard_vq_desc(vq);
+   vhost_discard(vq, 1);
tx_poll_start(net, sock);
break;
}
if (err != len)
pr_err("Truncated TX packet: "
" len %d != %zd\n", err, len);
-   vhost_add_used_and_signal(&net->dev, vq, head, 0);
+   vhost_add_used_and_signal(&net->dev, vq, head, 1);
total_len += len;
if (unlikely(total_len >= VHOST_NET_WEIGHT)) {
vhost_poll_queue(&vq->poll);
@@ -182,12 +181,22 @@
unuse_mm(net->dev.mm);
 }
 
+static int skb_head_len(struct sk_buff_head *skq)
+{
+   struct sk_buff *head;
+
+   head = skb_peek(skq);
+   if (head)
+   return head->len;
+   return 0;
+}
+
 /* Expects to be always run from workqueue - which acts as
  * read-size critical section for our kind of RCU. */
 static void handle_rx(struct vhost_net *net)
 {
struct vhost_virtqueue *vq = &net->dev.vqs[VHOST_NET_VQ_RX];
-   unsigned head, out, in, log, s;
+   unsigned in, log, s;
struct vhost_log *vq_log;
struct msghdr msg = {
.msg_name = NULL,
@@ -204,10 +213,11 @@
};
 
size_t len, total_len = 0;
-   int err;
+   int err, headcount, datalen;
size_t hdr_size;
struct socket *sock = rcu_dereference(vq->private_data);
-   if (!sock || skb_queue_empty(&sock->sk->sk_receive_queue))
+
+   if (!sock || !skb_head_len(&sock->sk->sk_receive_queue))
return;
 
use_mm(net->dev.mm);
@@ -218,13 +228,10 @@
vq_log = unlikely(vhost_has_feature(&net->dev, VHOST_F_LOG_ALL)) ?
vq->log : NULL;
 
-   for (;;) {
-   head = vhost_get_vq_desc(&net->dev, vq, vq->iov,
-ARRAY_SIZE(vq->iov),
-&out, &in,
-vq_log, &log);
+   while ((datalen = skb_head_len(&sock->sk->sk_receive_queue))) {
+   headcount = vhost_get_heads(vq, datalen, &in, vq_log, &log);
/* OK, now we need to know about added descriptors. */
-   if (head == vq->num) {
+   if (!headcount) {
if (unlikely(vhost_enable_notify(vq))) {
/* They have slipped one in as we were
 * doing that: check again. */
@@ -235,13 +242,6 @@
 * they refilled. */
break;
}
-   /* We don't need to be notified again. */
-   if (out) {
-   vq_err(vq, "Unexpected descriptor format for RX: "
-  out %d, int 

Re: [RFC][ PATCH 0/3] vhost-net: Add mergeable RX buffer support to vhost-net

2010-03-03 Thread David Stevens
Michael S. Tsirkin m...@redhat.com wrote on 03/02/2010 11:54:32 PM:

> On Tue, Mar 02, 2010 at 04:20:03PM -0800, David Stevens wrote:
> > These patches add support for mergeable receive buffers to
> > vhost-net, allowing it to use multiple virtio buffer heads for a single
> > receive packet.
> > +-DLS
> > 
> > 
> > Signed-off-by: David L Stevens dlstev...@us.ibm.com
> 
> Do you have performance numbers (both with and without mergeable buffers
> in guest)?

Michael,
Nothing formal. I did some TCP single-stream throughput tests
and was seeing 20-25% improvement on a laptop (ie, low-end hardware).
That actually surprised me; I'd think it'd be about the same, except
maybe in a test that has mixed packet sizes. Comparisons with the
net-next kernel these patches are for showed only ~10% improvement.
But I also see a lot of variability both among different
configurations and with the same configuration on different runs.
So, I don't feel like those numbers are very solid, and I haven't
yet done any tests on bigger hardware.

2 notes: I have a modified version of qemu to get the VHOST_FEATURES
flags, including the mergeable RX bufs flag, passed to the guest; I'll
be working with your current qemu git trees next, if any changes are
needed to support it there.
Second, I've found a missing initialization in the patches I
sent on the list, so I'll send an updated patch 2 with the fix, and
qemu patches when they are ready (plus any code-review comments
incorporated).

+-DLS



VTDC2010 Deadline Extended to March 11

2010-03-03 Thread Ming Zhao

(our apologies if you receive this announcement multiple times)

Deadline extension (March 11th)!

  Call for Papers
  ---

Workshop on Virtualization Technologies in Distributed Computing (VTDC 2010)

in conjunction with the 19th International Symposium on High Performance
Distributed Computing (HPDC-19)


Chicago, Illinois, USA, June 22, 2010
http://www.grid-appliance.org/wiki/index.php/VTDC10



WORKSHOP SCOPE

Virtualization has proven to be a powerful enabler in the field of 
distributed computing and has led to the emergence of the cloud 
computing paradigm and the provisioning of Infrastructure-as-a-Service 
(IaaS). This new paradigm raises challenges ranging from performance 
evaluation of IaaS platforms, through new methods of resource management 
including providing Service Level Agreements (SLAs) and energy- and 
cost-efficient schedules, to the emergence of supporting technologies 
such as virtual appliance management.


For the last three years, the VTDC workshop has served as a forum for 
the exchange of ideas and experiences studying the challenges and 
opportunities created by IaaS/cloud computing and virtualization 
technologies. VTDC brings together researchers in academia and industry 
who are involved in research and development on resource virtualization 
technologies and on techniques applied to the management of virtualized 
environments in distributed systems.


VTDC 2010 topics of interest include, but are not limited to:

 * Infrastructure as a service (IaaS)
 * Virtualization in data centers
 * Virtualization for resource management and QoS assurance
 * Security aspects of using virtualization in a distributed environment
 * Virtual networks
 * Virtual data, storage as a service
 * Fault tolerance in virtualized environments
 * Virtualization in P2P systems
 * Virtualization-based adaptive/autonomic systems
 * The creation and management of environments/appliances
 * Virtualization technologies
 * Performance modeling (applications and systems)
 * Virtualization techniques for energy/thermal management
 * Case studies of applications on IaaS platforms
 * Deployment studies of virtualization technologies
 * Tools relevant to virtualization


SUBMISSION GUIDELINES

Submitted papers should be limited to 8 pages (including tables, images, 
and references) and should be formatted according to the ACM SIGS Style. 
Please use the official HPDC conference submission site to submit your 
paper; only pdf format is accepted. All papers will receive at least 
three reviews.
Submission implies the willingness of at least one of the authors to register
for the workshop and present the paper. The authors of the best paper in
the workshop will receive a best-paper award.



PROCEEDINGS

The proceedings of the workshop will be published by the ACM.


IMPORTANT DATES

Submission deadline: March 11, 2010 (11:59 PM EST)
Author notification: March 26, 2010
Final papers due:    April 14, 2010
Workshop:            June 22, 2010


SUBMISSION SITE

Official HPDC conference submission site,
https://ssl.linklings.net/conferences/hpdc/



WORKSHOP WEBSITE

http://www.grid-appliance.org/wiki/index.php/VTDC10



WORKSHOP CHAIRS

General Chair: Renato Figueiredo, University of Florida

Program Chair: Frederic Desprez, INRIA

Steering Committee: Jose A. B. Fortes, University of Florida, Kate Keahey,
University of Chicago, Argonne National Laboratory



PROGRAM COMMITTEE

-  James Broberg, The University of Melbourne, Australia
-  Franck Cappello, INRIA and University of Illinois at Urbana-Champaign, USA
-  Dilma M Da Silva, IBM Research, USA
-  Peter Dinda, Northwestern University, USA
-  Ian Foster, Argonne National Laboratory  The University of Chicago, USA
-  Sebastien Goasguen, Clemson University, USA
-  Kartik Gopalan, Computer Science, State University of New York at Binghamton, USA
-  Sverre Jarp, CERN, Switzerland
-  Thilo Kielmann, Vrije Universiteit, Amsterdam, The Netherlands
-  Jack Lange, Northwestern University, USA
-  Laurent Lefèvre, INRIA, University of Lyon, France
-  Ignacio Lorente, DSA-Research, Universidad Complutense de Madrid, Spain
-  Norbert Meyer, Poznan Supercomputing and Networking Center, Poland
-  Christine MORIN, INRIA Rennes - Bretagne Atlantique, France
-  D. K. Panda, The Ohio State University, USA
-  Matei Ripeanu, University of British Columbia, Canada
-  Paul Ruth, University of Mississippi, USA
-  Kyung D Ryu, IBM T.J. Watson Research Center, USA
-  Chris Samuel, The Victorian Partnership for Advanced Computing, Australia
-  Frank Siebenlist, Argonne National Laboratory, USA
-  Frederic Suter, CC IN2P3 / CNRS, France
-  Dongyan Xu, Purdue University, USA
-  Mike Wray, HP Labs, Bristol, UK
-  Mazin Yousif, IBM Corporation, 

Re: virtio over PCI

2010-03-03 Thread Ira W. Snyder
On Wed, Mar 03, 2010 at 05:09:48PM +1100, Michael Ellerman wrote:
> Hi guys,
> 
> I was looking around at virtio over PCI stuff and noticed you had
> started some work on a driver. The last I can find via google is v2 from
> mid last year, is that as far as it got?
> 
> http://lkml.org/lkml/2009/2/23/353
> 

Yep, that is pretty much as far as I got. It was more-or-less rejected
because I hooked two instances of virtio-net together, rather than
having a proper backend and using virtio-net as the frontend.

I got started on writing a backend, which was never posted to LKML
because I never finished it. Feel free to take the code and use it to
start your own project. Note that vhost-net exists now, and is an
in-kernel backend for virtio-net. It *may* be possible to use this,
rather than writing a userspace backend as I started to do.
http://www.mmarray.org/~iws/virtio-phys/

I also got started with the alacrityvm project, developing a driver for
their virtualization framework. That project is nowhere near finished.
The virtualization folks basically told GHaskins (alacrityvm author)
that alacrityvm wouldn't ever make it to mainline Linux.
http://www.mmarray.org/~iws/vbus/

Unfortunately, I've been pulled onto other projects for the time being.
However, I'd really like to be able to use a virtio-over-PCI style
driver, rather than relying on my own custom (slow, unoptimized) network
driver (PCINet).

If you get something mostly working (and mostly agreed upon by the
virtualization guys), I will make the time to test it and get it cleaned
up. I've had 10+ people email me privately about this kind of driver
now. It is an area where Linux is sorely lacking.

I'm happy to provide any help I can, including testing on an
MPC8349EA-based system. I would suggest talking to the virtualization
mailing list before you get too deep in the project. They sometimes have
good advice. I've added them to the CC list, so maybe they can comment.
https://lists.linux-foundation.org/mailman/listinfo/virtualization

Good luck, and let me know if I can help.
Ira


Re: virtio over PCI

2010-03-03 Thread Arnd Bergmann
On Thursday 04 March 2010, Ira W. Snyder wrote:

> I'm happy to provide any help I can, including testing on an
> MPC8349EA-based system. I would suggest talking to the virtualization
> mailing list before you get too deep in the project. They sometimes have
> good advice. I've added them to the CC list, so maybe they can comment.
> https://lists.linux-foundation.org/mailman/listinfo/virtualization

You may also want to get together with Mark Purcell (if you are not
already working with him). He may be working on the same hardware that
you are interested in, just guessing ;-).

Arnd


Ringbuffer usage in Linux Hyper-V drivers

2010-03-03 Thread Hank Janssen


All,

I have been looking at one of the TODO items in the Linux Hyper-V drivers. 

Specifically the one that says:

- remove RingBuffer.c to use in-kernel ringbuffer functions instead.

I spent some time figuring out the ring buffer capability inside the Linux
kernel to see if we could change the Hyper-V ring buffer out for the in-kernel
ring buffer capability.

The ring buffer in the Hyper-V Linux drivers is used to communicate with the 
parent partition running Server 2008 Hyper-V. The ring buffer functionality on 
the Hyper-V Linux drivers is written to be functionally compatible with the 
ring buffer functionality on the Hyper-V Server. Consequently, it is not 
possible to make any changes that might break the compatibility with server 
side ring buffer implementation.  

There is a pretty good chance that the ring buffer on Hyper-V will change to
support additional functionality. I did further investigations to check on
other virtualization technologies. And the same thing seems to be true for
XEN; they also implemented their own ring buffer implementation on the guest
side because of their host side implementation.

So my question is to the community at large, am I missing something that would 
enable me to use an existing ring buffer functionality somehow in the kernel?  
If not, I want to remove the line from the TODO file that is requesting to use
the in-kernel ring buffer functionality.

Finally, while checking this out, I looked at a bunch of non-virtualization
device drivers currently in the kernel. And all the ones I looked at have
implemented their own ring buffer. Is there a reason why this might be the case?
 
As usual, any help is appreciated.

Thanks,

Hank Janssen.


Re: [RFC][ PATCH 0/3] vhost-net: Add mergeable RX buffer support to vhost-net

2010-03-03 Thread David Stevens
> Interesting. Since the feature in question is billed first of all as a
> performance optimization...

By whom? Although I see some improved performance, I think its real
benefit is improving memory utilization on the guest. Instead of using
75K for an ARP packet, mergeable RX buffers use only 4K. :-)
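
For a rough sense of the scale involved (assuming 4KB pages and a ~64KB
maximum GSO packet, so the page counts are approximate):

    without MRG_RXBUF:  every posted buffer must fit a worst-case packet,
                        ~64KB of data plus header rounded up to whole pages,
                        so roughly the 75K above is tied up even for an ARP
    with MRG_RXBUF:     buffers are page-sized; a ~60-byte ARP consumes a
                        single 4KB buffer, and big packets just take more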

> Since the patches affect code paths when mergeable RX buffers are
> disabled as well, I guess the most important point would be to verify
> whether there's increase in latency and/or CPU utilization, or bandwidth
> cost when the feature bit is *disabled*.

Actually, when the feature bit is disabled, it'll only get a single
head, doesn't use the special vnet_hdr, and the codepath reduces
essentially to the original. But the answer is no; I saw no regressions
when using it without the feature bit. The only substantive difference in
that case is that the new code avoids copying the vnet header as the
original does, so it should actually be faster, but I don't think that's
measurable above the variability I already see.

 
> > 2 notes: I have a modified version of qemu to get the VHOST_FEATURES
> > flags, including the mergeable RX bufs flag, passed to the guest; I'll
> > be working with your current qemu git trees next, if any changes are
> > needed to support it there.
> 
> This feature also seems to conflict with the zero-copy rx patches from Xin
> Xiaohui (subject: "Provide a zero-copy method on KVM virtio-net"); these
> are not in a mergeable shape yet, so this is not a blocker, but I wonder
> what your thoughts on the subject are: how will we do feature
> negotiation if some backends don't support some features?

The qemu code I have basically sends the set features and get
features all the way to vhost (ie, it's the guest negotiating with
vhost), except, of course, for the magic qemu-only bits. I think that's
the right model. I'll definitely take a look at the patch you mention
and maybe comment further.
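
A minimal sketch of that pass-through model (illustrative only: the helper
names and the QEMU_ONLY_FEATURES mask are made up here, though
VHOST_GET_FEATURES/VHOST_SET_FEATURES are the real vhost ioctls):

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>

/* Placeholder for the "magic qemu-only bits" that userspace handles itself
 * and never hands down to the in-kernel backend. */
#define QEMU_ONLY_FEATURES 0ULL

/* What to offer the guest: the backend's features intersected with what
 * this qemu build supports. */
static uint64_t features_to_offer(int vhost_fd, uint64_t qemu_features)
{
        uint64_t backend;

        if (ioctl(vhost_fd, VHOST_GET_FEATURES, &backend) < 0)
                return 0;
        return backend & qemu_features;
}

/* When the guest acks its features: strip the qemu-only bits and pass the
 * rest straight down to vhost. */
static int features_acked(int vhost_fd, uint64_t guest_acked)
{
        uint64_t vhost_bits = guest_acked & ~QEMU_ONLY_FEATURES;

        return ioctl(vhost_fd, VHOST_SET_FEATURES, &vhost_bits);
}

With that model a backend that lacks a feature simply never offers it, so the
guest never acks it, which is one answer to the negotiation question above.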

+-DLS



Re: Ringbuffer usage in Linux Hyper-V drivers

2010-03-03 Thread Jeremy Fitzhardinge
On 03/03/2010 08:42 AM, Hank Janssen wrote:
> There is a pretty good chance that the ring buffer on Hyper-V will change to
> support additional functionality. I did further investigations to check on
> other virtualization technologies. And the same thing seems to be true for
> XEN; they also implemented their own ring buffer implementation on the guest
> side because of their host side implementation.


Yes.  The cross-domain producer-consumer ringbuffer is a pretty specific
protocol.  Not only is the data format an ABI, but so is the exact protocol
for what pointers get updated when, etc.  It's not at all obvious how we
could reuse the kernel ringbuffer implementation, since it assumes it's
implementing both the producer and consumer ends.
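
For illustration only (the generic shape of such a ring, not the actual
Hyper-V or Xen layout): both ends map the same memory, and the field layout
plus the rule that data is written before the producer index is advanced are
exactly the parts that become ABI:

#include <stdint.h>

#define RING_SIZE 4096                  /* data area size, power of two */

/* Generic single-producer/single-consumer shared ring. */
struct shared_ring {
        volatile uint32_t write_index;  /* advanced only by the producer */
        volatile uint32_t read_index;   /* advanced only by the consumer */
        uint8_t data[RING_SIZE];        /* indices wrap modulo RING_SIZE */
};

/* Producer side: copy the payload in first, then publish it by moving
 * write_index; the consumer does the mirror image with read_index. */
static int ring_put(struct shared_ring *r, const void *buf, uint32_t len)
{
        uint32_t wr = r->write_index, rd = r->read_index;
        uint32_t used = (wr - rd) & (RING_SIZE - 1);
        uint32_t i;

        if (len > RING_SIZE - 1 - used)
                return -1;              /* no room; caller retries later */
        for (i = 0; i < len; i++)
                r->data[(wr + i) & (RING_SIZE - 1)] = ((const uint8_t *)buf)[i];
        __sync_synchronize();           /* data must be visible before index */
        r->write_index = (wr + len) & (RING_SIZE - 1);
        return 0;
}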

> So my question is to the community at large, am I missing something that would
> enable me to use an existing ring buffer functionality somehow in the kernel?
> If not, I want to remove the line from the TODO file that is requesting to
> use the in-kernel ring buffer functionality.
> 
> Finally, while checking this out, I looked at a bunch of non-virtualization
> device drivers currently in the kernel. And all the ones I looked at have
> implemented their own ring buffer. Is there a reason why this might be the
> case?


linux/ring_buffer.h is relatively new, and probably post-dates most of 
the driver ringbuffers.  If the ringbuffer is entirely within the kernel 
(say, between an ISR and the rest of the kernel) then I guess it might 
be possible to use the standard functions.  But if half the ringbuffer 
is being managed by the device itself, then that will define the protocol.

 J


Re: Ringbuffer usage in Linux Hyper-V drivers

2010-03-03 Thread Greg KH
On Wed, Mar 03, 2010 at 04:42:27PM +0000, Hank Janssen wrote:
> The ring buffer in the Hyper-V Linux drivers is used to communicate with the
> parent partition running Server 2008 Hyper-V. The ring buffer functionality
> on the Hyper-V Linux drivers is written to be functionally compatible with the
> ring buffer functionality on the Hyper-V Server. Consequently, it is not
> possible to make any changes that might break the compatibility with server
> side ring buffer implementation.

Ok, that makes sense, feel free to remove that TODO item.

thanks for looking into this.

greg k-h