Re: [PATCH V3 0/8] macvtap/vhost TX zero copy support
On Thu, 2011-04-21 at 10:29 -0400, Jon Mason wrote:
> Why are only these 3 drivers getting support? As far as I can tell, the
> only requirement is HIGHDMA. If this is the case, is there really a need
> for an additional flag to support this? If you can key off of HIGHDMA,
> all devices that support this would get the benefit.

Agreed. So far I have only verified the three 10Gb NICs we have in our
lab. We can enable all devices that support HIGHDMA.

Thanks
Shirley
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V3 0/8] macvtap/vhost TX zero copy support
On Wed, 2011-04-20 at 12:36 -0700, Shirley Ma wrote:
> I am collecting more test results against the 2.6.39-rc3 kernel and
> will provide the test matrix later.

Single TCP_STREAM 120-second test results over an ixgbe 10Gb NIC:

Message size  BW (Mb/s)  qemu-kvm (NumCPU)  vhost-net (NumCPU)  PerfTop irq/s
4K            7408.57    92.1%              22.6%               1229
4K (Orig)     4913.17    118.1%             84.1%               2086
8K            9129.90    89.3%              23.3%               1141
8K (Orig)     7094.55    115.9%             84.7%               2157
16K           9178.81    89.1%              23.3%               1139
16K (Orig)    8927.10    118.7%             83.4%               2262
64K           9171.43    88.4%              24.9%               1253
64K (Orig)    9085.85    115.9%             82.4%               2229

For message sizes less than or equal to 2K, there is a known KVM guest TX
overrun issue. With this zero-copy patch the issue becomes more severe:
guest io_exits have tripled, so performance is not good. Once the TX
overrun problem has been addressed, I will retest the small-message-size
performance.

Thanks
Shirley
Re: [PATCH V3 0/8] macvtap/vhost TX zero copy support
On Wed, Apr 20, 2011 at 3:36 PM, Shirley Ma <mashi...@us.ibm.com> wrote:
> This patchset adds support for TX zero-copy between guest and host
> kernel through vhost. It significantly reduces CPU utilization on the
> local host on which the guest is located (a 30-50% reduction in CPU
> usage for the vhost thread in single-stream tests). The patchset is
> based on a previous submission and on comments from the community about
> when/how guest kernel buffers should be released. This is the simplest
> approach I could come up with after comparing several other solutions.
>
> This patchset includes:
> 1/8: Add a new sock zero-copy flag, SOCK_ZEROCOPY;
> 2/8: Add a new device flag, NETIF_F_ZEROCOPY, for lower-level devices
>      that support zero-copy;
> 3/8: Add a new struct skb_ubuf_info to skb_shared_info, providing a
>      userspace-buffer release callback for when the lower device's DMA
>      is done with that skb;
> 4/8: Add a vhost zero-copy callback, invoked when the skb's last refcnt
>      is gone; add vhost_zerocopy_add_used_and_signal to notify the
>      guest to release TX skb buffers;
> 5/8: Add macvtap zero-copy in the lower device when the packet being
>      sent is greater than 128 bytes;
> 6/8: Add the Chelsio 10Gb NIC zero-copy feature flag;
> 7/8: Add the Intel 10Gb NIC zero-copy feature flag;
> 8/8: Add the Emulex 10Gb NIC zero-copy feature flag.

Why are only these 3 drivers getting support? As far as I can tell, the
only requirement is HIGHDMA. If this is the case, is there really a need
for an additional flag to support this? If you can key off of HIGHDMA,
all devices that support this would get the benefit.

> The patchset is built against the most recent linux 2.6.git. It has
> passed netperf/netserver multiple-stream stress tests on the above
> NICs. Single-stream test results from the 2.6.37 kernel on Chelsio with
> 64K message size: copy_from_user dropped from 40% to 5%; vhost thread
> CPU utilization dropped from 76% to 28%. I am collecting more test
> results against the 2.6.39-rc3 kernel and will provide the test matrix
> later.
Thanks
Shirley
[PATCH V3 0/8] macvtap/vhost TX zero copy support
This patchset adds support for TX zero-copy between guest and host kernel
through vhost. It significantly reduces CPU utilization on the local host
on which the guest is located (a 30-50% reduction in CPU usage for the
vhost thread in single-stream tests). The patchset is based on a previous
submission and on comments from the community about when/how guest kernel
buffers should be released. This is the simplest approach I could come up
with after comparing several other solutions.

This patchset includes:
1/8: Add a new sock zero-copy flag, SOCK_ZEROCOPY;
2/8: Add a new device flag, NETIF_F_ZEROCOPY, for lower-level devices
     that support zero-copy;
3/8: Add a new struct skb_ubuf_info to skb_shared_info, providing a
     userspace-buffer release callback for when the lower device's DMA is
     done with that skb;
4/8: Add a vhost zero-copy callback, invoked when the skb's last refcnt
     is gone; add vhost_zerocopy_add_used_and_signal to notify the guest
     to release TX skb buffers;
5/8: Add macvtap zero-copy in the lower device when the packet being sent
     is greater than 128 bytes;
6/8: Add the Chelsio 10Gb NIC zero-copy feature flag;
7/8: Add the Intel 10Gb NIC zero-copy feature flag;
8/8: Add the Emulex 10Gb NIC zero-copy feature flag.

The patchset is built against the most recent linux 2.6.git. It has
passed netperf/netserver multiple-stream stress tests on the above NICs.
Single-stream test results from the 2.6.37 kernel on Chelsio with 64K
message size: copy_from_user dropped from 40% to 5%; vhost thread CPU
utilization dropped from 76% to 28%. I am collecting more test results
against the 2.6.39-rc3 kernel and will provide the test matrix later.

Thanks
Shirley
Re: [PATCH V3 0/8] macvtap/vhost TX zero copy support
Macvtap enables zero-copy only when the buffer size is greater than
GOODCOPY_LEN (128 bytes).

Signed-off-by: Shirley Ma <x...@us.ibm.com>
---
 drivers/net/macvtap.c |  124 ++++++++++++++++++++++++++++++++++++++++----
 1 files changed, 112 insertions(+), 12 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 6696e56..b4e6656 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -60,6 +60,7 @@ static struct proto macvtap_proto = {
  */
 static dev_t macvtap_major;
 #define MACVTAP_NUM_DEVS 65536
+#define GOODCOPY_LEN (L1_CACHE_BYTES < 128 ? 128 : L1_CACHE_BYTES)
 static struct class *macvtap_class;
 static struct cdev macvtap_cdev;
 
@@ -340,6 +341,7 @@ static int macvtap_open(struct inode *inode, struct file *file)
 {
 	struct net *net = current->nsproxy->net_ns;
 	struct net_device *dev = dev_get_by_index(net, iminor(inode));
+	struct macvlan_dev *vlan = netdev_priv(dev);
 	struct macvtap_queue *q;
 	int err;
 
@@ -369,6 +371,16 @@ static int macvtap_open(struct inode *inode, struct file *file)
 	q->flags = IFF_VNET_HDR | IFF_NO_PI | IFF_TAP;
 	q->vnet_hdr_sz = sizeof(struct virtio_net_hdr);
 
+	/*
+	 * so far only VM uses macvtap, enable zero copy between guest
+	 * kernel and host kernel when lower device supports high memory
+	 * DMA
+	 */
+	if (vlan) {
+		if (vlan->lowerdev->features & NETIF_F_ZEROCOPY)
+			sock_set_flag(&q->sk, SOCK_ZEROCOPY);
+	}
+
 	err = macvtap_set_queue(dev, file, q);
 	if (err)
 		sock_put(&q->sk);
@@ -433,6 +445,80 @@ static inline struct sk_buff *macvtap_alloc_skb(struct sock *sk, size_t prepad,
 	return skb;
 }
 
+/* set skb frags from iovec, this can move to core network code for reuse */
+static int zerocopy_sg_from_iovec(struct sk_buff *skb, const struct iovec *from,
+				  int offset, size_t count)
+{
+	int len = iov_length(from, count) - offset;
+	int copy = skb_headlen(skb);
+	int size, offset1 = 0;
+	int i = 0;
+	skb_frag_t *f;
+
+	/* Skip over from offset */
+	while (offset >= from->iov_len) {
+		offset -= from->iov_len;
+		++from;
+		--count;
+	}
+
+	/* copy up to skb headlen */
+	while (copy > 0) {
+		size = min_t(unsigned int, copy, from->iov_len - offset);
+		if (copy_from_user(skb->data + offset1, from->iov_base + offset,
+				   size))
+			return -EFAULT;
+		if (copy > size) {
+			++from;
+			--count;
+		}
+		copy -= size;
+		offset1 += size;
+		offset = 0;
+	}
+
+	if (len == offset1)
+		return 0;
+
+	while (count--) {
+		struct page *page[MAX_SKB_FRAGS];
+		int num_pages;
+		unsigned long base;
+
+		len = from->iov_len - offset1;
+		if (!len) {
+			offset1 = 0;
+			++from;
+			continue;
+		}
+		base = (unsigned long)from->iov_base + offset1;
+		size = ((base & ~PAGE_MASK) + len + ~PAGE_MASK) >> PAGE_SHIFT;
+		num_pages = get_user_pages_fast(base, size, 0, &page[i]);
+		if ((num_pages != size) ||
+		    (num_pages > MAX_SKB_FRAGS - skb_shinfo(skb)->nr_frags))
+			/* put_page is in skb free */
+			return -EFAULT;
+		skb->data_len += len;
+		skb->len += len;
+		skb->truesize += len;
+		while (len) {
+			f = &skb_shinfo(skb)->frags[i];
+			f->page = page[i];
+			f->page_offset = base & ~PAGE_MASK;
+			f->size = min_t(int, len, PAGE_SIZE - f->page_offset);
+			skb_shinfo(skb)->nr_frags++;
+			/* increase sk_wmem_alloc */
+			atomic_add(f->size, &skb->sk->sk_wmem_alloc);
+			base += f->size;
+			len -= f->size;
+			i++;
+		}
+		offset1 = 0;
+		++from;
+	}
+	return 0;
+}
+
 /*
  * macvtap_skb_from_vnet_hdr and macvtap_skb_to_vnet_hdr should
  * be shared with the tun/tap driver.
@@ -515,17 +601,19 @@ static int macvtap_skb_to_vnet_hdr(const struct sk_buff *skb,
 
 /* Get packet from user space buffer */
-static ssize_t macvtap_get_user(struct macvtap_queue *q,
-				const struct iovec *iv, size_t count,
-				int noblock)
+static ssize_t macvtap_get_user(struct macvtap_queue *q, struct msghdr *m,
+				const struct iovec *iv, unsigned long total_len,
+