Re: [Qemu-devel] [ RFC Patch v6 1/3] virtio-net rsc: support coalescing ipv4 tcp traffic

2016-06-02 Thread Wei Xu

On 2016年05月30日 12:20, Jason Wang wrote:



On 2016年05月29日 00:37, w...@redhat.com wrote:

From: Wei Xu 

All the data packets in a tcp connection will be cached to a big buffer
in every receive interval, and will be sent out via a timer, the
'virtio_net_rsc_timeout' controls the interval, the value will
influent the
performance and response of tcp connection extremely, 5(50us) is a
experience value to gain a performance improvement, since the whql test
sends packets every 100us, so '30(300us)' can pass the test case,
this is also the default value, it's tunable via the command line
parameter 'rsc_interval' with 'virtio-net-pci' device, for example, below
parameter is to launch a guest with interval set as '50'.


Does the value make sense if it was smaller than 1us? If not, why not
just make the unit to be 1us?
It's an experience value, 500us is a good candidate for netperf 
throughput test, too short interval less than 50 us will gain an obvious 
penalty AFAIK, this is caused only few packets can be coalesced and the 
cycles wasted for rsc itself.


'virtio-net-pci,netdev=hostnet1,bus=pci.0,id=net1,mac=00,rsc_interval=50'

will

The timer will only be triggered if the packets pool is not empty,
and it'll drain off all the cached packets.

'NetRscChain' is used to save the segments of different protocols in a
VirtIONet device.

The main handler of TCP includes TCP window update, duplicated ACK check
and the real data coalescing if the new segment passed sanity check
and is identified as an 'wanted' one.

An 'wanted' segment means:
1. Segment is within current window and the sequence is the expected one.
2. 'ACK' of the segment is in the valid window.

Sanity check includes:
1. Incorrect version in IP header
2. IP options & IP fragment
3. Not a TCP packets
4. Sanity size check to prevent buffer overflow attack.

There maybe more cases should be considered such as ip identification
other
flags, while it broke the test because windows set it to the same even
it's
not a fragment.

Normally it includes 2 typical ways to handle a TCP control flag,
'bypass'
and 'finalize', 'bypass' means should be sent out directly, and
'finalize'
means the packets should also be bypassed, and this should be done
after searching for the same connection packets in the pool and sending
all of them out, this is to avoid out of data.

All the 'SYN' packets will be bypassed since this always begin a new'
connection, other flags such 'FIN/RST' will trigger a finalization,
because
this normally happens upon a connection is going to be closed, an
'URG' packet
also finalize current coalescing unit.

Statistics can be used to monitor the basic coalescing status, the
'out of order'
and 'out of window' means how many retransmitting packets, thus
describe the
performance intuitively.


We usually don't add device specific monitor command. Maybe a new
control vq command ethtool -S in guest in the future. I was thinking of
removing those counters since it was never used in this series.
Yes, that's a good idea, actually i'm doubting whether this feature 
should be a guest feature or a host feature, the spec says it should be 
more like a guest feature, but it's provided as a host built-in feature, 
so how and where to examine it is a problem. I'm using gdb to debugging 
it currently, normally i would check this counter directly via debug 
command, and the statistics is quite useful for troubleshooting, it'll 
be optional when this feature is geting more and more robust, can we 
keep it and leave if should keep it or where to display it along with 
qa's testing?




Signed-off-by: Wei Xu 
---
  hw/net/virtio-net.c | 498
+++-
  include/hw/virtio/virtio-net.h  |   2 +
  include/hw/virtio/virtio.h  |  75 +
  include/standard-headers/linux/virtio_net.h |   1 +


For RFC, it's ok. But for formal patch, this is not the correct way to
modify Linux headers. There's a script in
scripts/update-linux-headers.sh which was used to sync it from Linux
source. This means, it must be merged in Linux or at least in
maintainer's tree first.


  4 files changed, 575 insertions(+), 1 deletion(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 5798f87..b3bb63b 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -15,10 +15,12 @@
  #include "qemu/iov.h"
  #include "hw/virtio/virtio.h"
  #include "net/net.h"
+#include "net/eth.h"
  #include "net/checksum.h"
  #include "net/tap.h"
  #include "qemu/error-report.h"
  #include "qemu/timer.h"
+#include "qemu/sockets.h"
  #include "hw/virtio/virtio-net.h"
  #include "net/vhost_net.h"
  #include "hw/virtio/virtio-bus.h"
@@ -38,6 +40,25 @@
  #define endof(container, field) \
  (offsetof(container, field) + sizeof(((container *)0)->field))
+#define VIRTIO_NET_IP4_ADDR_SIZE   8/* ipv4 saddr + daddr */
+#define VIRTIO_NET_TCP_PORT_SIZE   4/* sport + 

Re: [Qemu-devel] [ RFC Patch v6 1/3] virtio-net rsc: support coalescing ipv4 tcp traffic

2016-05-29 Thread Jason Wang



On 2016年05月29日 00:37, w...@redhat.com wrote:

From: Wei Xu 

All the data packets in a tcp connection will be cached to a big buffer
in every receive interval, and will be sent out via a timer, the
'virtio_net_rsc_timeout' controls the interval, the value will influent the
performance and response of tcp connection extremely, 5(50us) is a
experience value to gain a performance improvement, since the whql test
sends packets every 100us, so '30(300us)' can pass the test case,
this is also the default value, it's tunable via the command line
parameter 'rsc_interval' with 'virtio-net-pci' device, for example, below
parameter is to launch a guest with interval set as '50'.


Does the value make sense if it was smaller than 1us? If not, why not 
just make the unit to be 1us?




'virtio-net-pci,netdev=hostnet1,bus=pci.0,id=net1,mac=00,rsc_interval=50'
will

The timer will only be triggered if the packets pool is not empty,
and it'll drain off all the cached packets.

'NetRscChain' is used to save the segments of different protocols in a
VirtIONet device.

The main handler of TCP includes TCP window update, duplicated ACK check
and the real data coalescing if the new segment passed sanity check
and is identified as an 'wanted' one.

An 'wanted' segment means:
1. Segment is within current window and the sequence is the expected one.
2. 'ACK' of the segment is in the valid window.

Sanity check includes:
1. Incorrect version in IP header
2. IP options & IP fragment
3. Not a TCP packets
4. Sanity size check to prevent buffer overflow attack.

There maybe more cases should be considered such as ip identification other
flags, while it broke the test because windows set it to the same even it's
not a fragment.

Normally it includes 2 typical ways to handle a TCP control flag, 'bypass'
and 'finalize', 'bypass' means should be sent out directly, and 'finalize'
means the packets should also be bypassed, and this should be done
after searching for the same connection packets in the pool and sending
all of them out, this is to avoid out of data.

All the 'SYN' packets will be bypassed since this always begin a new'
connection, other flags such 'FIN/RST' will trigger a finalization, because
this normally happens upon a connection is going to be closed, an 'URG' packet
also finalize current coalescing unit.

Statistics can be used to monitor the basic coalescing status, the 'out of 
order'
and 'out of window' means how many retransmitting packets, thus describe the
performance intuitively.


We usually don't add device specific monitor command. Maybe a new 
control vq command ethtool -S in guest in the future. I was thinking of 
removing those counters since it was never used in this series.




Signed-off-by: Wei Xu 
---
  hw/net/virtio-net.c | 498 +++-
  include/hw/virtio/virtio-net.h  |   2 +
  include/hw/virtio/virtio.h  |  75 +
  include/standard-headers/linux/virtio_net.h |   1 +


For RFC, it's ok. But for formal patch, this is not the correct way to 
modify Linux headers. There's a script in 
scripts/update-linux-headers.sh which was used to sync it from Linux 
source. This means, it must be merged in Linux or at least in 
maintainer's tree first.



  4 files changed, 575 insertions(+), 1 deletion(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 5798f87..b3bb63b 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -15,10 +15,12 @@
  #include "qemu/iov.h"
  #include "hw/virtio/virtio.h"
  #include "net/net.h"
+#include "net/eth.h"
  #include "net/checksum.h"
  #include "net/tap.h"
  #include "qemu/error-report.h"
  #include "qemu/timer.h"
+#include "qemu/sockets.h"
  #include "hw/virtio/virtio-net.h"
  #include "net/vhost_net.h"
  #include "hw/virtio/virtio-bus.h"
@@ -38,6 +40,25 @@
  #define endof(container, field) \
  (offsetof(container, field) + sizeof(((container *)0)->field))
  
+#define VIRTIO_NET_IP4_ADDR_SIZE   8/* ipv4 saddr + daddr */

+#define VIRTIO_NET_TCP_PORT_SIZE   4/* sport + dport */
+
+#define VIRTIO_NET_TCP_FLAG 0x3F
+#define VIRTIO_NET_TCP_HDR_LENGTH   0xF000
+
+/* IPv4 max payload, 16 bits in the header */
+#define VIRTIO_NET_MAX_IP4_PAYLOAD (65535 - sizeof(struct ip_header))
+#define VIRTIO_NET_MAX_TCP_PAYLOAD 65535


Is this still true if window scaling is enabled? And need to make sure 
your ack/seq processing can work for window scaling.



+
+/* header length value in ip header without option */
+#define VIRTIO_NET_IP4_HEADER_LENGTH 5
+
+/* Purge coalesced packets timer interval, This value affects the performance
+   a lot, and should be tuned carefully, '30'(300us) is the recommended
+   value to pass the WHQL test, '5' can gain 2x netperf throughput with
+   tso/gso/gro 'off'. */
+#define VIRTIO_NET_RSC_INTERVAL  30


Do we need a new property for this value?


+
  typedef struct 

[Qemu-devel] [ RFC Patch v6 1/3] virtio-net rsc: support coalescing ipv4 tcp traffic

2016-05-28 Thread wexu
From: Wei Xu 

All the data packets in a tcp connection will be cached to a big buffer
in every receive interval, and will be sent out via a timer, the
'virtio_net_rsc_timeout' controls the interval, the value will influent the
performance and response of tcp connection extremely, 5(50us) is a
experience value to gain a performance improvement, since the whql test
sends packets every 100us, so '30(300us)' can pass the test case,
this is also the default value, it's tunable via the command line
parameter 'rsc_interval' with 'virtio-net-pci' device, for example, below
parameter is to launch a guest with interval set as '50'.

'virtio-net-pci,netdev=hostnet1,bus=pci.0,id=net1,mac=00,rsc_interval=50'
will

The timer will only be triggered if the packets pool is not empty,
and it'll drain off all the cached packets.

'NetRscChain' is used to save the segments of different protocols in a
VirtIONet device.

The main handler of TCP includes TCP window update, duplicated ACK check
and the real data coalescing if the new segment passed sanity check
and is identified as an 'wanted' one.

An 'wanted' segment means:
1. Segment is within current window and the sequence is the expected one.
2. 'ACK' of the segment is in the valid window.

Sanity check includes:
1. Incorrect version in IP header
2. IP options & IP fragment
3. Not a TCP packets
4. Sanity size check to prevent buffer overflow attack.

There maybe more cases should be considered such as ip identification other
flags, while it broke the test because windows set it to the same even it's
not a fragment.

Normally it includes 2 typical ways to handle a TCP control flag, 'bypass'
and 'finalize', 'bypass' means should be sent out directly, and 'finalize'
means the packets should also be bypassed, and this should be done
after searching for the same connection packets in the pool and sending
all of them out, this is to avoid out of data.

All the 'SYN' packets will be bypassed since this always begin a new'
connection, other flags such 'FIN/RST' will trigger a finalization, because
this normally happens upon a connection is going to be closed, an 'URG' packet
also finalize current coalescing unit.

Statistics can be used to monitor the basic coalescing status, the 'out of 
order'
and 'out of window' means how many retransmitting packets, thus describe the
performance intuitively.

Signed-off-by: Wei Xu 
---
 hw/net/virtio-net.c | 498 +++-
 include/hw/virtio/virtio-net.h  |   2 +
 include/hw/virtio/virtio.h  |  75 +
 include/standard-headers/linux/virtio_net.h |   1 +
 4 files changed, 575 insertions(+), 1 deletion(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 5798f87..b3bb63b 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -15,10 +15,12 @@
 #include "qemu/iov.h"
 #include "hw/virtio/virtio.h"
 #include "net/net.h"
+#include "net/eth.h"
 #include "net/checksum.h"
 #include "net/tap.h"
 #include "qemu/error-report.h"
 #include "qemu/timer.h"
+#include "qemu/sockets.h"
 #include "hw/virtio/virtio-net.h"
 #include "net/vhost_net.h"
 #include "hw/virtio/virtio-bus.h"
@@ -38,6 +40,25 @@
 #define endof(container, field) \
 (offsetof(container, field) + sizeof(((container *)0)->field))
 
+#define VIRTIO_NET_IP4_ADDR_SIZE   8/* ipv4 saddr + daddr */
+#define VIRTIO_NET_TCP_PORT_SIZE   4/* sport + dport */
+
+#define VIRTIO_NET_TCP_FLAG 0x3F
+#define VIRTIO_NET_TCP_HDR_LENGTH   0xF000
+
+/* IPv4 max payload, 16 bits in the header */
+#define VIRTIO_NET_MAX_IP4_PAYLOAD (65535 - sizeof(struct ip_header))
+#define VIRTIO_NET_MAX_TCP_PAYLOAD 65535
+
+/* header length value in ip header without option */
+#define VIRTIO_NET_IP4_HEADER_LENGTH 5
+
+/* Purge coalesced packets timer interval, This value affects the performance
+   a lot, and should be tuned carefully, '30'(300us) is the recommended
+   value to pass the WHQL test, '5' can gain 2x netperf throughput with
+   tso/gso/gro 'off'. */
+#define VIRTIO_NET_RSC_INTERVAL  30
+
 typedef struct VirtIOFeature {
 uint32_t flags;
 size_t end;
@@ -1089,7 +1110,8 @@ static int receive_filter(VirtIONet *n, const uint8_t 
*buf, int size)
 return 0;
 }
 
-static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, 
size_t size)
+static ssize_t virtio_net_do_receive(NetClientState *nc,
+ const uint8_t *buf, size_t size)
 {
 VirtIONet *n = qemu_get_nic_opaque(nc);
 VirtIONetQueue *q = virtio_net_get_subqueue(nc);
@@ -1685,6 +1707,474 @@ static int virtio_net_load_device(VirtIODevice *vdev, 
QEMUFile *f,
 return 0;
 }
 
+static void virtio_net_rsc_extract_unit4(NetRscChain *chain,
+ const uint8_t *buf, NetRscUnit* unit)
+{
+uint16_t hdr_len;
+uint16_t ip_hdrlen;
+struct ip_header *ip;
+
+hdr_len =