Re: [RFC][PATCH 0/3] vhost-net: Add mergeable RX buffer support to vhost-net
On Tue, Mar 02, 2010 at 04:20:03PM -0800, David Stevens wrote:
> These patches add support for mergeable receive buffers to vhost-net,
> allowing it to use multiple virtio buffer heads for a single receive
> packet.
>
> +-DLS
>
> Signed-off-by: David L Stevens <dlstev...@us.ibm.com>

Do you have performance numbers (both with and without mergeable buffers
in guest)?

-- 
MST
Re: [RFC][PATCH 0/3] vhost-net: Add mergeable RX buffer support to vhost-net
On Wed, Mar 03, 2010 at 12:54:25AM -0800, David Stevens wrote:
> "Michael S. Tsirkin" <m...@redhat.com> wrote on 03/02/2010 11:54:32 PM:
> > On Tue, Mar 02, 2010 at 04:20:03PM -0800, David Stevens wrote:
> > > These patches add support for mergeable receive buffers to
> > > vhost-net, allowing it to use multiple virtio buffer heads for a
> > > single receive packet.
> > >
> > > +-DLS
> > >
> > > Signed-off-by: David L Stevens <dlstev...@us.ibm.com>
> >
> > Do you have performance numbers (both with and without mergeable
> > buffers in guest)?
>
> 	Michael,
> 	Nothing formal. I did some TCP single-stream throughput tests and
> was seeing a 20-25% improvement on a laptop (i.e., low-end hardware).
> That actually surprised me; I'd have thought it would be about the same,
> except maybe in a test that has mixed packet sizes. Comparisons with the
> net-next kernel these patches are for showed only ~10% improvement. But
> I also see a lot of variability, both among different configurations and
> with the same configuration on different runs. So I don't feel those
> numbers are very solid, and I haven't yet done any tests on bigger
> hardware.

Interesting. Since the feature in question is billed first of all as a
performance optimization, I think we might need some performance numbers
as a motivation.

Since the patches affect code paths when mergeable RX buffers are
disabled as well, I guess the most important point would be to verify
whether there's an increase in latency and/or CPU utilization, or a
bandwidth cost, when the feature bit is *disabled*.

> 	Two notes: I have a modified version of qemu to get the
> VHOST_FEATURES flags, including the mergeable RX bufs flag, passed to
> the guest; I'll be working with your current qemu git trees next, if
> any changes are needed to support it there.

This feature also seems to conflict with the zero-copy RX patches from
Xin Xiaohui (subject: Provide a zero-copy method on KVM virtio-net).
These are not in mergeable shape yet, so this is not a blocker, but I
wonder what your thoughts on the subject are: how will we do feature
negotiation if some backends don't support some features?

> 	Second, I've found a missing initialization in the patches I sent
> on the list, so I'll send an updated patch 2 with the fix,

If you do, any chance you could use git send-email for this?

> and qemu patches when they are ready (plus any code-review comments
> incorporated).

Pls take a look here as well: http://www.openfabrics.org/~mst/boring.txt
[RFC][PATCH 3/3] vhost-net: Add mergeable RX buffer support to vhost-net
This patch glues them all together and makes sure we notify whenever we
don't have enough buffers to receive a max-sized packet, and adds the
feature bit.

Signed-off-by: David L Stevens <dlstev...@us.ibm.com>

diff -ruN net-next-p2/drivers/vhost/net.c net-next-p3/drivers/vhost/net.c
--- net-next-p2/drivers/vhost/net.c	2010-03-02 13:01:34.000000000 -0800
+++ net-next-p3/drivers/vhost/net.c	2010-03-02 15:25:15.000000000 -0800
@@ -54,26 +54,6 @@
 	enum vhost_net_poll_state tx_poll_state;
 };
 
-/* Pop first len bytes from iovec. Return number of segments used. */
-static int move_iovec_hdr(struct iovec *from, struct iovec *to,
-			  size_t len, int iov_count)
-{
-	int seg = 0;
-	size_t size;
-	while (len && seg < iov_count) {
-		size = min(from->iov_len, len);
-		to->iov_base = from->iov_base;
-		to->iov_len = size;
-		from->iov_len -= size;
-		from->iov_base += size;
-		len -= size;
-		++from;
-		++to;
-		++seg;
-	}
-	return seg;
-}
-
 /* Caller must have TX VQ lock */
 static void tx_poll_stop(struct vhost_net *net)
 {
@@ -97,7 +77,7 @@
 static void handle_tx(struct vhost_net *net)
 {
 	struct vhost_virtqueue *vq = &net->dev.vqs[VHOST_NET_VQ_TX];
-	unsigned out, in, s;
+	unsigned out, in;
 	struct iovec head;
 	struct msghdr msg = {
 		.msg_name = NULL,
@@ -110,6 +90,7 @@
 	size_t len, total_len = 0;
 	int err, wmem;
 	struct socket *sock = rcu_dereference(vq->private_data);
+
 	if (!sock)
 		return;
@@ -166,11 +147,11 @@
 		/* Skip header. TODO: support TSO. */
 		msg.msg_iovlen = out;
 		head.iov_len = len = iov_length(vq->iov, out);
+
 		/* Sanity check */
 		if (!len) {
 			vq_err(vq, "Unexpected header len for TX: "
-			       "%zd expected %zd\n",
-			       len, vq->guest_hlen);
+			       "%zd expected %zd\n", len, vq->guest_hlen);
 			break;
 		}
 		/* TODO: Check specific error and bomb out unless ENOBUFS? */
@@ -214,7 +195,7 @@
 static void handle_rx(struct vhost_net *net)
 {
 	struct vhost_virtqueue *vq = &net->dev.vqs[VHOST_NET_VQ_RX];
-	unsigned in, log, s;
+	unsigned in, log;
 	struct vhost_log *vq_log;
 	struct msghdr msg = {
 		.msg_name = NULL,
@@ -245,30 +226,36 @@
 		if (!headcount) {
 			vhost_enable_notify(vq);
 			break;
-		}
+		} else if (vq->maxheadcount < headcount)
+			vq->maxheadcount = headcount;
 		/* Skip header. TODO: support TSO/mergeable rx buffers. */
 		msg.msg_iovlen = in;
 		len = iov_length(vq->iov, in);
-		/* Sanity check */
 		if (!len) {
 			vq_err(vq, "Unexpected header len for RX: "
-			       "%zd expected %zd\n",
-			       len, vq->guest_hlen);
+			       "%zd expected %zd\n", len, vq->guest_hlen);
 			break;
 		}
 		err = sock->ops->recvmsg(NULL, sock, &msg,
					 len, MSG_DONTWAIT | MSG_TRUNC);
-		/* TODO: Check specific error and bomb out unless EAGAIN? */
 		if (err < 0) {
-			vhost_discard(vq, 1);
+			vhost_discard(vq, headcount);
 			break;
 		}
 		/* TODO: Should check and handle checksum. */
+		if (vhost_has_feature(&net->dev, VIRTIO_NET_F_MRG_RXBUF)) {
+			struct virtio_net_hdr_mrg_rxbuf *vhdr =
+				(struct virtio_net_hdr_mrg_rxbuf *)
+				vq->iov[0].iov_base;
+			/* add num_bufs */
+			vq->iov[0].iov_len = vq->guest_hlen;
+			vhdr->num_buffers = headcount;
+		}
 		if (err > len) {
 			pr_err("Discarded truncated rx packet: "
			       " len %d > %zd\n", err, len);
-			vhost_discard(vq, 1);
+			vhost_discard(vq, headcount);
 			continue;
 		}
 		len = err;
@@ -573,8 +560,6 @@
 static int vhost_net_set_features(struct vhost_net *n, u64 features)
 {
-	size_t hdr_size = features & (1 << VHOST_NET_F_VIRTIO_NET_HDR) ?
-		sizeof(struct virtio_net_hdr) : 0;
 	int i;
 	mutex_lock(&n->dev.mutex);
 	if ((features & (1 << VHOST_F_LOG_ALL)) &&
diff -ruN net-next-p2/drivers/vhost/vhost.c net-next-p3/drivers/vhost/vhost.c
--- net-next-p2/drivers/vhost/vhost.c
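For reference, the header that the RX hunk above casts vq->iov[0].iov_base
to is the standard mergeable-buffer layout from <linux/virtio_net.h>;
num_buffers is the field the patch fills in with headcount:

struct virtio_net_hdr_mrg_rxbuf {
	struct virtio_net_hdr hdr;
	__u16 num_buffers;	/* number of merged rx buffers */
};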
[RFC][PATCH 2/3] vhost-net: handle vnet_hdr processing for MRG_RX_BUF
This patch adds vnet_hdr processing for mergeable buffer support to
vhost-net.

Signed-off-by: David L Stevens <dlstev...@us.ibm.com>

diff -ruN net-next-p1/drivers/vhost/net.c net-next-p2/drivers/vhost/net.c
--- net-next-p1/drivers/vhost/net.c	2010-03-01 11:44:22.000000000 -0800
+++ net-next-p2/drivers/vhost/net.c	2010-03-02 13:01:34.000000000 -0800
@@ -109,7 +109,6 @@
 	};
 	size_t len, total_len = 0;
 	int err, wmem;
-	size_t hdr_size;
 	struct socket *sock = rcu_dereference(vq->private_data);
 	if (!sock)
 		return;
@@ -124,7 +123,6 @@
 	if (wmem < sock->sk->sk_sndbuf * 2)
 		tx_poll_stop(net);
-	hdr_size = vq->hdr_size;
 
 	for (;;) {
 		head.iov_base = (void *)vhost_get_vq_desc(&net->dev, vq,
@@ -148,25 +146,45 @@
 			       "out %d, int %d\n", out, in);
 			break;
 		}
+		if (vq->guest_hlen > vq->sock_hlen) {
+			if (msg.msg_iov[0].iov_len == vq->guest_hlen)
+				msg.msg_iov[0].iov_len = vq->sock_hlen;
+			else if (out == ARRAY_SIZE(vq->iov))
+				vq_err(vq, "handle_tx iov overflow!");
+			else {
+				int i;
+
+				/* give header its own iov */
+				for (i = out; i > 0; ++i)
+					msg.msg_iov[i+1] = msg.msg_iov[i];
+				msg.msg_iov[0].iov_len = vq->sock_hlen;
+				msg.msg_iov[1].iov_base += vq->guest_hlen;
+				msg.msg_iov[1].iov_len -= vq->guest_hlen;
+				out++;
+			}
+		}
 		/* Skip header. TODO: support TSO. */
-		s = move_iovec_hdr(vq->iov, vq->hdr, hdr_size, out);
 		msg.msg_iovlen = out;
 		head.iov_len = len = iov_length(vq->iov, out);
 		/* Sanity check */
 		if (!len) {
 			vq_err(vq, "Unexpected header len for TX: "
 			       "%zd expected %zd\n",
-			       iov_length(vq->hdr, s), hdr_size);
+			       len, vq->guest_hlen);
 			break;
 		}
 		/* TODO: Check specific error and bomb out unless ENOBUFS? */
 		err = sock->ops->sendmsg(NULL, sock, &msg, len);
 		if (unlikely(err < 0)) {
-			vhost_discard(vq, 1);
-			tx_poll_start(net, sock);
+			if (err == -EAGAIN) {
+				tx_poll_start(net, sock);
+			} else {
+				vq_err(vq, "sendmsg: errno %d\n", -err);
+				/* drop packet; do not discard/resend */
+				vhost_add_used_and_signal(&net->dev, vq,
							  &head, 1);
+			}
 			break;
-		}
-		if (err != len)
+		} else if (err != len)
 			pr_err("Truncated TX packet: "
 			       " len %d != %zd\n", err, len);
 		vhost_add_used_and_signal(&net->dev, vq, &head, 1);
@@ -207,14 +225,8 @@
 		.msg_flags = MSG_DONTWAIT,
 	};
-	struct virtio_net_hdr hdr = {
-		.flags = 0,
-		.gso_type = VIRTIO_NET_HDR_GSO_NONE
-	};
-
 	size_t len, total_len = 0;
 	int err, headcount, datalen;
-	size_t hdr_size;
 	struct socket *sock = rcu_dereference(vq->private_data);
 
 	if (!sock || !skb_head_len(&sock->sk->sk_receive_queue))
 		return;
@@ -223,7 +235,6 @@
 	use_mm(net->dev.mm);
 	mutex_lock(&vq->mutex);
 	vhost_disable_notify(vq);
-	hdr_size = vq->hdr_size;
 
 	vq_log = unlikely(vhost_has_feature(&net->dev, VHOST_F_LOG_ALL)) ?
 		vq->log : NULL;
@@ -232,25 +243,18 @@
 		headcount = vhost_get_heads(vq, datalen, &in, vq_log, &log);
 		/* OK, now we need to know about added descriptors. */
 		if (!headcount) {
-			if (unlikely(vhost_enable_notify(vq))) {
-				/* They have slipped one in as we were
-				 * doing that: check again. */
-				vhost_disable_notify(vq);
-				continue;
-			}
-			/* Nothing new?  Wait for eventfd to tell us
-			 * they refilled. */
+			vhost_enable_notify(vq);
 			break;
 		}
 		/* Skip header. TODO: support TSO/mergeable rx buffers. */
-		s = move_iovec_hdr(vq->iov, vq->hdr, hdr_size, in);
 		msg.msg_iovlen = in;
 		len = iov_length(vq->iov, in);
+		/* Sanity check */
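For context on the guest_hlen/sock_hlen reconciliation in handle_tx above:
the mergeable header is the plain one plus a 16-bit num_buffers field, so
a guest that negotiated VIRTIO_NET_F_MRG_RXBUF supplies a 12-byte header
while a tap socket expecting struct virtio_net_hdr consumes only 10. A
minimal userspace check of the two sizes (illustrative only, not part of
the patch):

#include <stdio.h>
#include <linux/virtio_net.h>

int main(void)
{
	/* The two header lengths handle_tx must reconcile when the guest
	 * uses mergeable RX buffers but the socket does not. */
	printf("virtio_net_hdr:           %zu\n",
	       sizeof(struct virtio_net_hdr));		/* 10 bytes */
	printf("virtio_net_hdr_mrg_rxbuf: %zu\n",
	       sizeof(struct virtio_net_hdr_mrg_rxbuf));	/* 12 bytes */
	return 0;
}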
[RFC][PATCH 0/3] vhost-net: Add mergeable RX buffer support to vhost-net
These patches add support for mergeable receive buffers to vhost-net,
allowing it to use multiple virtio buffer heads for a single receive
packet.

						+-DLS

Signed-off-by: David L Stevens <dlstev...@us.ibm.com>
[RFC][PATCH 1/3] vhost-net: support multiple buffer heads in receiver
This patch generalizes buffer handling functions to support multiple
buffer heads.

In-line for viewing, attached for applying.

Signed-off-by: David L Stevens <dlstev...@us.ibm.com>

diff -ruN net-next-p0/drivers/vhost/net.c net-next-p1/drivers/vhost/net.c
--- net-next-p0/drivers/vhost/net.c	2010-02-24 12:59:24.000000000 -0800
+++ net-next-p1/drivers/vhost/net.c	2010-03-01 11:44:22.000000000 -0800
@@ -97,7 +97,8 @@
 static void handle_tx(struct vhost_net *net)
 {
 	struct vhost_virtqueue *vq = &net->dev.vqs[VHOST_NET_VQ_TX];
-	unsigned head, out, in, s;
+	unsigned out, in, s;
+	struct iovec head;
 	struct msghdr msg = {
 		.msg_name = NULL,
 		.msg_namelen = 0,
@@ -126,12 +127,10 @@
 	hdr_size = vq->hdr_size;
 
 	for (;;) {
-		head = vhost_get_vq_desc(&net->dev, vq, vq->iov,
-					 ARRAY_SIZE(vq->iov),
-					 &out, &in,
-					 NULL, NULL);
+		head.iov_base = (void *)vhost_get_vq_desc(&net->dev, vq,
+			vq->iov, ARRAY_SIZE(vq->iov), &out, &in, NULL, NULL);
 		/* Nothing new?  Wait for eventfd to tell us they refilled. */
-		if (head == vq->num) {
+		if (head.iov_base == (void *)vq->num) {
 			wmem = atomic_read(&sock->sk->sk_wmem_alloc);
 			if (wmem >= sock->sk->sk_sndbuf * 3 / 4) {
 				tx_poll_start(net, sock);
@@ -152,7 +151,7 @@
 		/* Skip header. TODO: support TSO. */
 		s = move_iovec_hdr(vq->iov, vq->hdr, hdr_size, out);
 		msg.msg_iovlen = out;
-		len = iov_length(vq->iov, out);
+		head.iov_len = len = iov_length(vq->iov, out);
 		/* Sanity check */
 		if (!len) {
 			vq_err(vq, "Unexpected header len for TX: "
@@ -163,14 +162,14 @@
 		/* TODO: Check specific error and bomb out unless ENOBUFS? */
 		err = sock->ops->sendmsg(NULL, sock, &msg, len);
 		if (unlikely(err < 0)) {
-			vhost_discard_vq_desc(vq);
+			vhost_discard(vq, 1);
 			tx_poll_start(net, sock);
 			break;
 		}
 		if (err != len)
 			pr_err("Truncated TX packet: "
 			       " len %d != %zd\n", err, len);
-		vhost_add_used_and_signal(&net->dev, vq, head, 0);
+		vhost_add_used_and_signal(&net->dev, vq, &head, 1);
 		total_len += len;
 		if (unlikely(total_len >= VHOST_NET_WEIGHT)) {
 			vhost_poll_queue(&vq->poll);
@@ -182,12 +181,22 @@
 	unuse_mm(net->dev.mm);
 }
 
+static int skb_head_len(struct sk_buff_head *skq)
+{
+	struct sk_buff *head;
+
+	head = skb_peek(skq);
+	if (head)
+		return head->len;
+	return 0;
+}
+
 /* Expects to be always run from workqueue - which acts as
  * read-size critical section for our kind of RCU. */
 static void handle_rx(struct vhost_net *net)
 {
 	struct vhost_virtqueue *vq = &net->dev.vqs[VHOST_NET_VQ_RX];
-	unsigned head, out, in, log, s;
+	unsigned in, log, s;
 	struct vhost_log *vq_log;
 	struct msghdr msg = {
 		.msg_name = NULL,
@@ -204,10 +213,11 @@
 	};
 
 	size_t len, total_len = 0;
-	int err;
+	int err, headcount, datalen;
 	size_t hdr_size;
 	struct socket *sock = rcu_dereference(vq->private_data);
-	if (!sock || skb_queue_empty(&sock->sk->sk_receive_queue))
+
+	if (!sock || !skb_head_len(&sock->sk->sk_receive_queue))
 		return;
 
 	use_mm(net->dev.mm);
@@ -218,13 +228,10 @@
 	vq_log = unlikely(vhost_has_feature(&net->dev, VHOST_F_LOG_ALL)) ?
 		vq->log : NULL;
 
-	for (;;) {
-		head = vhost_get_vq_desc(&net->dev, vq, vq->iov,
-					 ARRAY_SIZE(vq->iov),
-					 &out, &in,
-					 vq_log, &log);
+	while ((datalen = skb_head_len(&sock->sk->sk_receive_queue))) {
+		headcount = vhost_get_heads(vq, datalen, &in, vq_log, &log);
 		/* OK, now we need to know about added descriptors. */
-		if (head == vq->num) {
+		if (!headcount) {
 			if (unlikely(vhost_enable_notify(vq))) {
 				/* They have slipped one in as we were
 				 * doing that: check again. */
@@ -235,13 +242,6 @@
 			 * they refilled. */
 			break;
 		}
-		/* We don't need to be notified again. */
-		if (out) {
-			vq_err(vq, "Unexpected descriptor format for RX: "
-			       "out %d, int
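The effect of headcount above is that one received packet may now complete
several guest buffers at once, so vhost_discard() and
vhost_add_used_and_signal() must credit all of them rather than a single
head. As a toy illustration of the arithmetic (hypothetical helper, not
patch code), with ~4K guest buffers a packet needs roughly
ceil(len / 4096) heads:

#include <stdio.h>

/* Toy illustration (not patch code): how many page-sized guest buffers
 * a single received packet consumes under mergeable RX buffers. */
static int heads_needed(int packet_len, int buf_len)
{
	return (packet_len + buf_len - 1) / buf_len;	/* ceiling division */
}

int main(void)
{
	/* A 60-byte ARP fits in one 4K buffer; a 64K GSO packet needs 16. */
	printf("ARP (60B):  %d head(s)\n", heads_needed(60, 4096));
	printf("GSO (64KB): %d head(s)\n", heads_needed(65536, 4096));
	return 0;
}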
Re: [RFC][PATCH 0/3] vhost-net: Add mergeable RX buffer support to vhost-net
"Michael S. Tsirkin" <m...@redhat.com> wrote on 03/02/2010 11:54:32 PM:

> On Tue, Mar 02, 2010 at 04:20:03PM -0800, David Stevens wrote:
> > These patches add support for mergeable receive buffers to vhost-net,
> > allowing it to use multiple virtio buffer heads for a single receive
> > packet.
> >
> > +-DLS
> >
> > Signed-off-by: David L Stevens <dlstev...@us.ibm.com>
>
> Do you have performance numbers (both with and without mergeable
> buffers in guest)?

	Michael,
	Nothing formal. I did some TCP single-stream throughput tests and
was seeing a 20-25% improvement on a laptop (i.e., low-end hardware).
That actually surprised me; I'd have thought it would be about the same,
except maybe in a test that has mixed packet sizes. Comparisons with the
net-next kernel these patches are for showed only ~10% improvement. But I
also see a lot of variability, both among different configurations and
with the same configuration on different runs. So I don't feel those
numbers are very solid, and I haven't yet done any tests on bigger
hardware.

	Two notes: I have a modified version of qemu to get the
VHOST_FEATURES flags, including the mergeable RX bufs flag, passed to the
guest; I'll be working with your current qemu git trees next, if any
changes are needed to support it there.

	Second, I've found a missing initialization in the patches I sent
on the list, so I'll send an updated patch 2 with the fix, and qemu
patches when they are ready (plus any code-review comments incorporated).

						+-DLS
VTDC2010 Deadline Extended to March 11
(our apologies if you receive this announcement multiple times)

Deadline extension (March 11th)!

                          Call for Papers
                          ---------------
   Workshop on Virtualization Technologies in Distributed Computing
                             (VTDC 2010)

in conjunction with the 19th International Symposium on High Performance
Distributed Computing (HPDC-19)

Chicago, Illinois, USA, June 22, 2010
http://www.grid-appliance.org/wiki/index.php/VTDC10

WORKSHOP SCOPE

Virtualization has proven to be a powerful enabler in the field of
distributed computing and has led to the emergence of the cloud computing
paradigm and the provisioning of Infrastructure-as-a-Service (IaaS). This
new paradigm raises challenges ranging from performance evaluation of
IaaS platforms, through new methods of resource management including
providing Service Level Agreements (SLAs) and energy- and cost-efficient
schedules, to the emergence of supporting technologies such as virtual
appliance management.

For the last three years, the VTDC workshop has served as a forum for the
exchange of ideas and experiences studying the challenges and
opportunities created by IaaS/cloud computing and virtualization
technologies. VTDC brings together researchers in academia and industry
who are involved in research and development on resource virtualization
technologies and on techniques applied to the management of virtualized
environments in distributed systems.

VTDC 2010 topics of interest include, but are not limited to:

* Infrastructure as a service (IaaS)
* Virtualization in data centers
* Virtualization for resource management and QoS assurance
* Security aspects of using virtualization in a distributed environment
* Virtual networks
* Virtual data, storage as a service
* Fault tolerance in virtualized environments
* Virtualization in P2P systems
* Virtualization-based adaptive/autonomic systems
* The creation and management of environments/appliances
* Virtualization technologies
* Performance modeling (applications and systems)
* Virtualization techniques for energy/thermal management
* Case studies of applications on IaaS platforms
* Deployment studies of virtualization technologies
* Tools relevant to virtualization

SUBMISSION GUIDELINES

Submitted papers should be limited to 8 pages (including tables, images,
and references) and should be formatted according to the ACM SIGS style.
Please use the official HPDC conference submission site to submit your
paper; only PDF format is accepted. All papers will receive at least
three reviews. Submission implies the willingness of at least one of the
authors to register for the workshop and present the paper. The authors
of the best paper in the workshop will receive a best-paper award.

PROCEEDINGS

The proceedings of the workshop will be published by the ACM.

IMPORTANT DATES

Submission deadline: March 11, 2010 (11:59 PM EST)
Author notification: March 26, 2010
Final papers due:    April 14, 2010
Workshop:            June 22, 2010

SUBMISSION SITE

Official HPDC conference submission site:
https://ssl.linklings.net/conferences/hpdc/

WORKSHOP WEBSITE

http://www.grid-appliance.org/wiki/index.php/VTDC10

WORKSHOP CHAIRS

General Chair: Renato Figueiredo, University of Florida
Program Chair: Frederic Desprez, INRIA
Steering Committee: Jose A. B. Fortes, University of Florida;
Kate Keahey, University of Chicago / Argonne National Laboratory

PROGRAM COMMITTEE

- James Broberg, The University of Melbourne, Australia
- Franck Cappello, INRIA and University of Illinois at Urbana-Champaign, USA
- Dilma M. Da Silva, IBM Research, USA
- Peter Dinda, Northwestern University, USA
- Ian Foster, Argonne National Laboratory / The University of Chicago, USA
- Sebastien Goasguen, Clemson University, USA
- Kartik Gopalan, Computer Science, State University of New York at Binghamton, USA
- Sverre Jarp, CERN, Switzerland
- Thilo Kielmann, Vrije Universiteit, Amsterdam, Netherlands
- Jack Lange, Northwestern University, USA
- Laurent Lefèvre, INRIA, University of Lyon, France
- Ignacio Llorente, DSA-Research, Universidad Complutense de Madrid, Spain
- Norbert Meyer, Poznan Supercomputing and Networking Center, Poland
- Christine Morin, INRIA Rennes - Bretagne Atlantique, France
- D. K. Panda, The Ohio State University, USA
- Matei Ripeanu, University of British Columbia, Canada
- Paul Ruth, University of Mississippi, USA
- Kyung D. Ryu, IBM T.J. Watson Research Center, USA
- Chris Samuel, The Victorian Partnership for Advanced Computing, Australia
- Frank Siebenlist, Argonne National Laboratory, USA
- Frederic Suter, CC IN2P3 / CNRS, France
- Dongyan Xu, Purdue University, USA
- Mike Wray, HP Labs, Bristol, UK
- Mazin Yousif, IBM Corporation,
Re: virtio over PCI
On Wed, Mar 03, 2010 at 05:09:48PM +1100, Michael Ellerman wrote:
> Hi guys,
>
> I was looking around at virtio-over-PCI stuff and noticed you had
> started some work on a driver. The last I can find via google is v2
> from mid last year, is that as far as it got?
>
> http://lkml.org/lkml/2009/2/23/353

Yep, that is pretty much as far as I got. It was more-or-less rejected
because I hooked two instances of virtio-net together, rather than having
a proper backend and using virtio-net as the frontend.

I got started on writing a backend, which was never posted to LKML
because I never finished it. Feel free to take the code and use it to
start your own project. Note that vhost-net exists now, and is an
in-kernel backend for virtio-net. It *may* be possible to use this,
rather than writing a userspace backend as I started to do.

http://www.mmarray.org/~iws/virtio-phys/

I also got started with the alacrityvm project, developing a driver for
their virtualization framework. That project is nowhere near finished.
The virtualization folks basically told GHaskins (the alacrityvm author)
that alacrityvm wouldn't ever make it to mainline Linux.

http://www.mmarray.org/~iws/vbus/

Unfortunately, I've been pulled onto other projects for the time being.
However, I'd really like to be able to use a virtio-over-PCI-style
driver, rather than relying on my own custom (slow, unoptimized) network
driver (PCINet). If you get something mostly working (and mostly agreed
upon by the virtualization guys), I will make the time to test it and get
it cleaned up. I've had 10+ people email me privately about this kind of
driver now. It is an area where Linux is sorely lacking.

I'm happy to provide any help I can, including testing on an
MPC8349EA-based system. I would suggest talking to the virtualization
mailing list before you get too deep in the project. They sometimes have
good advice. I've added them to the CC list, so maybe they can comment.

https://lists.linux-foundation.org/mailman/listinfo/virtualization

Good luck, and let me know if I can help.
Ira
Re: virtio over PCI
On Thursday 04 March 2010, Ira W. Snyder wrote:
> I'm happy to provide any help I can, including testing on an
> MPC8349EA-based system. I would suggest talking to the virtualization
> mailing list before you get too deep in the project. They sometimes
> have good advice. I've added them to the CC list, so maybe they can
> comment.
>
> https://lists.linux-foundation.org/mailman/listinfo/virtualization

You may also want to get together with Mark Purcell (if you are not
already working with him). He may be working on the same hardware that
you are interested in, just guessing ;-).

	Arnd
Ringbuffer usage in Linux Hyper-V drivers
All,

I have been looking at one of the TODO items in the Linux Hyper-V
drivers; specifically, the one that says:

- remove RingBuffer.c to use in-kernel ringbuffer functions instead.

I spent some time figuring out the ring buffer capability inside the
Linux kernel to see if we could swap the Hyper-V ring buffer out for the
in-kernel ring buffer capability.

The ring buffer in the Hyper-V Linux drivers is used to communicate with
the parent partition running Server 2008 Hyper-V. The ring buffer
functionality in the Hyper-V Linux drivers is written to be functionally
compatible with the ring buffer functionality on the Hyper-V server.
Consequently, it is not possible to make any changes that might break
compatibility with the server-side ring buffer implementation. There is a
pretty good chance that the ring buffer on Hyper-V will change to support
additional functionality.

I investigated further to check on other virtualization technologies, and
the same thing seems to be true for Xen: they also implemented their own
ring buffer on the guest side because of their host-side implementation.

So my question to the community at large is: am I missing something that
would enable me to use existing ring buffer functionality somehow in the
kernel? If not, I want to remove the line from the TODO file that
requests using the in-kernel ring buffer functionality.

Finally, while checking this out, I looked at a bunch of
non-virtualization device drivers currently in the kernel, and all the
ones I looked at have implemented their own ring buffers. Is there a
reason why this might be the case?

As usual, any help is appreciated.

Thanks,

Hank Janssen.
Re: [RFC][PATCH 0/3] vhost-net: Add mergeable RX buffer support to vhost-net
> Interesting. Since the feature in question is billed first of all as a
> performance optimization...

	By whom? Although I see some improved performance, I think its
real benefit is improving memory utilization on the guest. Instead of
using 75K for an ARP packet, mergeable RX buffers use only 4K. :-)

> Since the patches affect code paths when mergeable RX buffers are
> disabled as well, I guess the most important point would be to verify
> whether there's an increase in latency and/or CPU utilization, or a
> bandwidth cost, when the feature bit is *disabled*.

	Actually, when the feature bit is disabled, it'll only get a
single head, doesn't use the special vnet_hdr, and the code path reduces
essentially to the original. But the answer is no; I saw no regressions
when using it without the feature bit. The only substantive difference in
that case is that the new code avoids copying the vnet header as the
original does, so it should actually be faster, but I don't think that's
measurable above the variability I already see.

> > 	Two notes: I have a modified version of qemu to get the
> > VHOST_FEATURES flags, including the mergeable RX bufs flag, passed to
> > the guest; I'll be working with your current qemu git trees next, if
> > any changes are needed to support it there.
>
> This feature also seems to conflict with the zero-copy RX patches from
> Xin Xiaohui (subject: Provide a zero-copy method on KVM virtio-net).
> These are not in mergeable shape yet, so this is not a blocker, but I
> wonder what your thoughts on the subject are: how will we do feature
> negotiation if some backends don't support some features?

	The qemu code I have basically sends the set-features and
get-features calls all the way through to vhost (i.e., it's the guest
negotiating with vhost), except, of course, for the magic qemu-only bits.
I think that's the right model. I'll definitely take a look at the patch
you mention and maybe comment further.

						+-DLS
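A rough sketch of that pass-through model, using the existing
VHOST_GET_FEATURES/VHOST_SET_FEATURES ioctls (illustrative only:
QEMU_ONLY_BITS and guest_acked are hypothetical placeholders, and the
real qemu glue is more involved):

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>

#define QEMU_ONLY_BITS	0	/* hypothetical: the "magic qemu-only bits" */

uint64_t negotiate(int vhost_fd, uint64_t guest_acked)
{
	uint64_t backend_feats, acked;

	/* What the vhost backend can do... */
	ioctl(vhost_fd, VHOST_GET_FEATURES, &backend_feats);
	/* ...intersected with what the guest accepted, minus any bits
	 * qemu must handle itself and never pass to the backend. */
	acked = backend_feats & guest_acked & ~QEMU_ONLY_BITS;
	ioctl(vhost_fd, VHOST_SET_FEATURES, &acked);
	return acked;
}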
Re: Ringbuffer usage in Linux Hyper-V drivers
On 03/03/2010 08:42 AM, Hank Janssen wrote:
> There is a pretty good chance that the ring buffer on Hyper-V will
> change to support additional functionality.
>
> I investigated further to check on other virtualization technologies,
> and the same thing seems to be true for Xen: they also implemented
> their own ring buffer on the guest side because of their host-side
> implementation.

Yes. The cross-domain producer-consumer ring buffer is a pretty specific
protocol. Not only is the data format an ABI, but so is the exact
protocol for what pointers get updated when, etc. It's not at all obvious
how we could reuse the kernel ring buffer implementation, since it
assumes it's implementing both the producer and consumer ends.

> So my question to the community at large is: am I missing something
> that would enable me to use existing ring buffer functionality somehow
> in the kernel? If not, I want to remove the line from the TODO file
> that requests using the in-kernel ring buffer functionality.
>
> Finally, while checking this out, I looked at a bunch of
> non-virtualization device drivers currently in the kernel, and all the
> ones I looked at have implemented their own ring buffers. Is there a
> reason why this might be the case?

linux/ring_buffer.h is relatively new and probably post-dates most of the
driver ring buffers. If the ring buffer is entirely within the kernel
(say, between an ISR and the rest of the kernel), then I guess it might
be possible to use the standard functions. But if half the ring buffer is
being managed by the device itself, then that will define the protocol.

	J
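To make that concrete, here is a minimal sketch of a cross-domain
single-producer/single-consumer ring (illustrative only; it matches
neither Hyper-V's nor Xen's actual layout). Everything about it (field
offsets, index width, wrap rule, who writes what) is fixed by the peer
and is therefore an ABI:

#include <stdint.h>

#define RING_SIZE 4096	/* must be a power of two */

struct shared_ring {
	volatile uint32_t prod;	/* written by producer only */
	volatile uint32_t cons;	/* written by consumer only */
	uint8_t data[RING_SIZE];
};

/* Producer side: returns 1 on success, 0 if the ring is full. */
static int ring_put(struct shared_ring *r, uint8_t byte)
{
	if (r->prod - r->cons == RING_SIZE)
		return 0;
	r->data[r->prod % RING_SIZE] = byte;
	/* A real implementation needs a write barrier here so the peer
	 * never sees prod advance before the data is visible. */
	r->prod++;
	return 1;
}

/* Consumer side: returns 1 on success, 0 if the ring is empty. */
static int ring_get(struct shared_ring *r, uint8_t *byte)
{
	if (r->cons == r->prod)
		return 0;
	*byte = r->data[r->cons % RING_SIZE];
	r->cons++;
	return 1;
}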
Re: Ringbuffer usage in Linux Hyper-V drivers
On Wed, Mar 03, 2010 at 04:42:27PM +0000, Hank Janssen wrote:
> The ring buffer in the Hyper-V Linux drivers is used to communicate
> with the parent partition running Server 2008 Hyper-V. The ring buffer
> functionality in the Hyper-V Linux drivers is written to be
> functionally compatible with the ring buffer functionality on the
> Hyper-V server. Consequently, it is not possible to make any changes
> that might break compatibility with the server-side ring buffer
> implementation.

Ok, that makes sense; feel free to remove that TODO item. Thanks for
looking into this.

greg k-h