Re: [Qemu-devel] [PATCH 1/2] virtio-net rsc: support coalescing ipv4 tcp traffic

2016-11-30 Thread Jason Wang



On 2016年11月30日 16:55, Wei Xu wrote:

On 2016年11月24日 12:17, Jason Wang wrote:



On 2016年11月01日 01:41, w...@redhat.com wrote:

From: Wei Xu 

All the data packets in a tcp connection are cached
to a single buffer in every receive interval, and will
be sent out via a timer, the 'virtio_net_rsc_timeout'
controls the interval, this value may impact the
performance and response time of tcp connection,
5(50us) is an experience value to gain a performance
improvement, since the whql test sends packets every 100us,
so '30(300us)' passes the test case, it is the default
value as well, tune it via the command line parameter
'rsc_interval' within 'virtio-net-pci' device, for example,
to launch a guest with interval set as '50':

'virtio-net-pci,netdev=hostnet1,bus=pci.0,id=net1,mac=00,rsc_interval=50' 




The timer will only be triggered if the packets pool is not empty,
and it'll drain off all the cached packets.

'NetRscChain' is used to save the segments of IPv4/6 in a
VirtIONet device.

A new segment becomes a 'Candidate' as well as it passed sanity check,
the main handler of TCP includes TCP window update, duplicated
ACK check and the real data coalescing.

An 'Candidate' segment means:
1. Segment is within current window and the sequence is the expected 
one.

2. 'ACK' of the segment is in the valid window.

Sanity check includes:
1. Incorrect version in IP header
2. An IP options or IP fragment
3. Not a TCP packet
4. Sanity size check to prevent buffer overflow attack.
5. An ECN packet

Even though, there might more cases should be considered such as
ip identification other flags, while it breaks the test because
windows set it to the same even it's not a fragment.

Normally it includes 2 typical ways to handle a TCP control flag,
'bypass' and 'finalize', 'bypass' means should be sent out directly,
while 'finalize' means the packets should also be bypassed, but this
should be done after search for the same connection packets in the
pool and drain all of them out, this is to avoid out of order fragment.

All the 'SYN' packets will be bypassed since this always begin a new'
connection, other flags such 'URG/FIN/RST/CWR/ECE' will trigger a
finalization, because this normally happens upon a connection is going
to be closed, an 'URG' packet also finalize current coalescing unit.

Statistics can be used to monitor the basic coalescing status, the
'out of order' and 'out of window' means how many retransmitting 
packets,

thus describe the performance intuitively.

Signed-off-by: Wei Xu 
---
  hw/net/virtio-net.c | 602
++--
  include/hw/virtio/virtio-net.h  |   5 +-
  include/hw/virtio/virtio.h  |  76 
  include/net/eth.h   |   2 +
  include/standard-headers/linux/virtio_net.h |  14 +
  net/tap.c   |   3 +-
  6 files changed, 670 insertions(+), 32 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 06bfe4b..d1824d9 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -15,10 +15,12 @@
  #include "qemu/iov.h"
  #include "hw/virtio/virtio.h"
  #include "net/net.h"
+#include "net/eth.h"
  #include "net/checksum.h"
  #include "net/tap.h"
  #include "qemu/error-report.h"
  #include "qemu/timer.h"
+#include "qemu/sockets.h"
  #include "hw/virtio/virtio-net.h"
  #include "net/vhost_net.h"
  #include "hw/virtio/virtio-bus.h"
@@ -43,6 +45,24 @@
  #define endof(container, field) \
  (offsetof(container, field) + sizeof(((container *)0)->field))
+#define VIRTIO_NET_IP4_ADDR_SIZE   8/* ipv4 saddr + daddr */


Only used once in the code, I don't see much value of this macro.


Just to keep it a bit readable.


Then you may want to replace this with sizeof(struct ...).






+
+#define VIRTIO_NET_TCP_FLAG 0x3F
+#define VIRTIO_NET_TCP_HDR_LENGTH   0xF000
+
+/* IPv4 max payload, 16 bits in the header */
+#define VIRTIO_NET_MAX_IP4_PAYLOAD (65535 - sizeof(struct ip_header))
+#define VIRTIO_NET_MAX_TCP_PAYLOAD 65535
+
+/* header length value in ip header without option */
+#define VIRTIO_NET_IP4_HEADER_LENGTH 5
+
+/* Purge coalesced packets timer interval, This value affects the
performance
+   a lot, and should be tuned carefully, '30'(300us) is the
recommended
+   value to pass the WHQL test, '5' can gain 2x netperf
throughput with
+   tso/gso/gro 'off'. */
+#define VIRTIO_NET_RSC_INTERVAL  30


This should be a property for virito-net and the above comment can be
the description of the property.


This is a value for a property, actually I hadn't found a place to put
it.


There's a description filed of PropertyInfo, but for virtio properties 
may need more work. So we can leave this as is now.







+
  typedef struct VirtIOFeature {
  uint32_t flags;
  size_t end;
@@ -589,7 +609,12 @@ static uint64_t
virtio_net_guest_offloads_by_features(uint32_t features)
  (1ULL 

Re: [Qemu-devel] [PATCH 1/2] virtio-net rsc: support coalescing ipv4 tcp traffic

2016-11-30 Thread Wei Xu

On 2016年11月24日 12:17, Jason Wang wrote:



On 2016年11月01日 01:41, w...@redhat.com wrote:

From: Wei Xu 

All the data packets in a tcp connection are cached
to a single buffer in every receive interval, and will
be sent out via a timer, the 'virtio_net_rsc_timeout'
controls the interval, this value may impact the
performance and response time of tcp connection,
5(50us) is an experience value to gain a performance
improvement, since the whql test sends packets every 100us,
so '30(300us)' passes the test case, it is the default
value as well, tune it via the command line parameter
'rsc_interval' within 'virtio-net-pci' device, for example,
to launch a guest with interval set as '50':

'virtio-net-pci,netdev=hostnet1,bus=pci.0,id=net1,mac=00,rsc_interval=50'


The timer will only be triggered if the packets pool is not empty,
and it'll drain off all the cached packets.

'NetRscChain' is used to save the segments of IPv4/6 in a
VirtIONet device.

A new segment becomes a 'Candidate' as well as it passed sanity check,
the main handler of TCP includes TCP window update, duplicated
ACK check and the real data coalescing.

An 'Candidate' segment means:
1. Segment is within current window and the sequence is the expected one.
2. 'ACK' of the segment is in the valid window.

Sanity check includes:
1. Incorrect version in IP header
2. An IP options or IP fragment
3. Not a TCP packet
4. Sanity size check to prevent buffer overflow attack.
5. An ECN packet

Even though, there might more cases should be considered such as
ip identification other flags, while it breaks the test because
windows set it to the same even it's not a fragment.

Normally it includes 2 typical ways to handle a TCP control flag,
'bypass' and 'finalize', 'bypass' means should be sent out directly,
while 'finalize' means the packets should also be bypassed, but this
should be done after search for the same connection packets in the
pool and drain all of them out, this is to avoid out of order fragment.

All the 'SYN' packets will be bypassed since this always begin a new'
connection, other flags such 'URG/FIN/RST/CWR/ECE' will trigger a
finalization, because this normally happens upon a connection is going
to be closed, an 'URG' packet also finalize current coalescing unit.

Statistics can be used to monitor the basic coalescing status, the
'out of order' and 'out of window' means how many retransmitting packets,
thus describe the performance intuitively.

Signed-off-by: Wei Xu 
---
  hw/net/virtio-net.c | 602
++--
  include/hw/virtio/virtio-net.h  |   5 +-
  include/hw/virtio/virtio.h  |  76 
  include/net/eth.h   |   2 +
  include/standard-headers/linux/virtio_net.h |  14 +
  net/tap.c   |   3 +-
  6 files changed, 670 insertions(+), 32 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 06bfe4b..d1824d9 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -15,10 +15,12 @@
  #include "qemu/iov.h"
  #include "hw/virtio/virtio.h"
  #include "net/net.h"
+#include "net/eth.h"
  #include "net/checksum.h"
  #include "net/tap.h"
  #include "qemu/error-report.h"
  #include "qemu/timer.h"
+#include "qemu/sockets.h"
  #include "hw/virtio/virtio-net.h"
  #include "net/vhost_net.h"
  #include "hw/virtio/virtio-bus.h"
@@ -43,6 +45,24 @@
  #define endof(container, field) \
  (offsetof(container, field) + sizeof(((container *)0)->field))
+#define VIRTIO_NET_IP4_ADDR_SIZE   8/* ipv4 saddr + daddr */


Only used once in the code, I don't see much value of this macro.


Just to keep it a bit readable.




+
+#define VIRTIO_NET_TCP_FLAG 0x3F
+#define VIRTIO_NET_TCP_HDR_LENGTH   0xF000
+
+/* IPv4 max payload, 16 bits in the header */
+#define VIRTIO_NET_MAX_IP4_PAYLOAD (65535 - sizeof(struct ip_header))
+#define VIRTIO_NET_MAX_TCP_PAYLOAD 65535
+
+/* header length value in ip header without option */
+#define VIRTIO_NET_IP4_HEADER_LENGTH 5
+
+/* Purge coalesced packets timer interval, This value affects the
performance
+   a lot, and should be tuned carefully, '30'(300us) is the
recommended
+   value to pass the WHQL test, '5' can gain 2x netperf
throughput with
+   tso/gso/gro 'off'. */
+#define VIRTIO_NET_RSC_INTERVAL  30


This should be a property for virito-net and the above comment can be
the description of the property.


This is a value for a property, actually I hadn't found a place to put
it.




+
  typedef struct VirtIOFeature {
  uint32_t flags;
  size_t end;
@@ -589,7 +609,12 @@ static uint64_t
virtio_net_guest_offloads_by_features(uint32_t features)
  (1ULL << VIRTIO_NET_F_GUEST_ECN)  |
  (1ULL << VIRTIO_NET_F_GUEST_UFO);
-return guest_offloads_mask & features;
+if (features & VIRTIO_NET_F_CTRL_GUEST_OFFLOADS) {
+return (guest_offloads_mask & features) |
+

Re: [Qemu-devel] [PATCH 1/2] virtio-net rsc: support coalescing ipv4 tcp traffic

2016-11-23 Thread Michael S. Tsirkin
On Thu, Nov 24, 2016 at 12:31:18PM +0800, Jason Wang wrote:
> 
> 
> On 2016年11月24日 12:26, Michael S. Tsirkin wrote:
> > On Thu, Nov 24, 2016 at 12:17:21PM +0800, Jason Wang wrote:
> > > > diff --git a/include/standard-headers/linux/virtio_net.h 
> > > > b/include/standard-headers/linux/virtio_net.h
> > > > index 30ff249..e67b36e 100644
> > > > --- a/include/standard-headers/linux/virtio_net.h
> > > > +++ b/include/standard-headers/linux/virtio_net.h
> > > > @@ -57,6 +57,9 @@
> > > >  * Steering */
> > > >#define VIRTIO_NET_F_CTRL_MAC_ADDR 23/* Set MAC address */
> > > > +/* Guest can handle coalesced ipv4-tcp packets */
> > > > +#define VIRTIO_NET_F_GUEST_RSC441
> > > Why not use 24?
> > I think we should use features >31 (virtio 1 only) for
> > nice-to-have features like RSC. Feature bits <31 are
> > easy to backport, so it makes more sense to use
> > them for fundamental things like the MTU
> > (which for some setups help fix broken networking).
> 
> Ok, I believe we need clarify this in the spec or somewhere else.

There is a design considerations chapter, it can go there.

-- 
MST



Re: [Qemu-devel] [PATCH 1/2] virtio-net rsc: support coalescing ipv4 tcp traffic

2016-11-23 Thread Jason Wang



On 2016年11月24日 12:26, Michael S. Tsirkin wrote:

On Thu, Nov 24, 2016 at 12:17:21PM +0800, Jason Wang wrote:

diff --git a/include/standard-headers/linux/virtio_net.h 
b/include/standard-headers/linux/virtio_net.h
index 30ff249..e67b36e 100644
--- a/include/standard-headers/linux/virtio_net.h
+++ b/include/standard-headers/linux/virtio_net.h
@@ -57,6 +57,9 @@
 * Steering */
   #define VIRTIO_NET_F_CTRL_MAC_ADDR 23/* Set MAC address */
+/* Guest can handle coalesced ipv4-tcp packets */
+#define VIRTIO_NET_F_GUEST_RSC441

Why not use 24?

I think we should use features >31 (virtio 1 only) for
nice-to-have features like RSC. Feature bits <31 are
easy to backport, so it makes more sense to use
them for fundamental things like the MTU
(which for some setups help fix broken networking).


Ok, I believe we need clarify this in the spec or somewhere else.



Re: [Qemu-devel] [PATCH 1/2] virtio-net rsc: support coalescing ipv4 tcp traffic

2016-11-23 Thread Michael S. Tsirkin
On Thu, Nov 24, 2016 at 12:17:21PM +0800, Jason Wang wrote:
> > diff --git a/include/standard-headers/linux/virtio_net.h 
> > b/include/standard-headers/linux/virtio_net.h
> > index 30ff249..e67b36e 100644
> > --- a/include/standard-headers/linux/virtio_net.h
> > +++ b/include/standard-headers/linux/virtio_net.h
> > @@ -57,6 +57,9 @@
> >  * Steering */
> >   #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */
> > +/* Guest can handle coalesced ipv4-tcp packets */
> > +#define VIRTIO_NET_F_GUEST_RSC441
> 
> Why not use 24?

I think we should use features >31 (virtio 1 only) for
nice-to-have features like RSC. Feature bits <31 are
easy to backport, so it makes more sense to use
them for fundamental things like the MTU
(which for some setups help fix broken networking).



Re: [Qemu-devel] [PATCH 1/2] virtio-net rsc: support coalescing ipv4 tcp traffic

2016-11-23 Thread Jason Wang



On 2016年11月01日 01:41, w...@redhat.com wrote:

From: Wei Xu 

All the data packets in a tcp connection are cached
to a single buffer in every receive interval, and will
be sent out via a timer, the 'virtio_net_rsc_timeout'
controls the interval, this value may impact the
performance and response time of tcp connection,
5(50us) is an experience value to gain a performance
improvement, since the whql test sends packets every 100us,
so '30(300us)' passes the test case, it is the default
value as well, tune it via the command line parameter
'rsc_interval' within 'virtio-net-pci' device, for example,
to launch a guest with interval set as '50':

'virtio-net-pci,netdev=hostnet1,bus=pci.0,id=net1,mac=00,rsc_interval=50'

The timer will only be triggered if the packets pool is not empty,
and it'll drain off all the cached packets.

'NetRscChain' is used to save the segments of IPv4/6 in a
VirtIONet device.

A new segment becomes a 'Candidate' as well as it passed sanity check,
the main handler of TCP includes TCP window update, duplicated
ACK check and the real data coalescing.

An 'Candidate' segment means:
1. Segment is within current window and the sequence is the expected one.
2. 'ACK' of the segment is in the valid window.

Sanity check includes:
1. Incorrect version in IP header
2. An IP options or IP fragment
3. Not a TCP packet
4. Sanity size check to prevent buffer overflow attack.
5. An ECN packet

Even though, there might more cases should be considered such as
ip identification other flags, while it breaks the test because
windows set it to the same even it's not a fragment.

Normally it includes 2 typical ways to handle a TCP control flag,
'bypass' and 'finalize', 'bypass' means should be sent out directly,
while 'finalize' means the packets should also be bypassed, but this
should be done after search for the same connection packets in the
pool and drain all of them out, this is to avoid out of order fragment.

All the 'SYN' packets will be bypassed since this always begin a new'
connection, other flags such 'URG/FIN/RST/CWR/ECE' will trigger a
finalization, because this normally happens upon a connection is going
to be closed, an 'URG' packet also finalize current coalescing unit.

Statistics can be used to monitor the basic coalescing status, the
'out of order' and 'out of window' means how many retransmitting packets,
thus describe the performance intuitively.

Signed-off-by: Wei Xu 
---
  hw/net/virtio-net.c | 602 ++--
  include/hw/virtio/virtio-net.h  |   5 +-
  include/hw/virtio/virtio.h  |  76 
  include/net/eth.h   |   2 +
  include/standard-headers/linux/virtio_net.h |  14 +
  net/tap.c   |   3 +-
  6 files changed, 670 insertions(+), 32 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 06bfe4b..d1824d9 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -15,10 +15,12 @@
  #include "qemu/iov.h"
  #include "hw/virtio/virtio.h"
  #include "net/net.h"
+#include "net/eth.h"
  #include "net/checksum.h"
  #include "net/tap.h"
  #include "qemu/error-report.h"
  #include "qemu/timer.h"
+#include "qemu/sockets.h"
  #include "hw/virtio/virtio-net.h"
  #include "net/vhost_net.h"
  #include "hw/virtio/virtio-bus.h"
@@ -43,6 +45,24 @@
  #define endof(container, field) \
  (offsetof(container, field) + sizeof(((container *)0)->field))
  
+#define VIRTIO_NET_IP4_ADDR_SIZE   8/* ipv4 saddr + daddr */


Only used once in the code, I don't see much value of this macro.


+
+#define VIRTIO_NET_TCP_FLAG 0x3F
+#define VIRTIO_NET_TCP_HDR_LENGTH   0xF000
+
+/* IPv4 max payload, 16 bits in the header */
+#define VIRTIO_NET_MAX_IP4_PAYLOAD (65535 - sizeof(struct ip_header))
+#define VIRTIO_NET_MAX_TCP_PAYLOAD 65535
+
+/* header length value in ip header without option */
+#define VIRTIO_NET_IP4_HEADER_LENGTH 5
+
+/* Purge coalesced packets timer interval, This value affects the performance
+   a lot, and should be tuned carefully, '30'(300us) is the recommended
+   value to pass the WHQL test, '5' can gain 2x netperf throughput with
+   tso/gso/gro 'off'. */
+#define VIRTIO_NET_RSC_INTERVAL  30


This should be a property for virito-net and the above comment can be 
the description of the property.



+
  typedef struct VirtIOFeature {
  uint32_t flags;
  size_t end;
@@ -589,7 +609,12 @@ static uint64_t 
virtio_net_guest_offloads_by_features(uint32_t features)
  (1ULL << VIRTIO_NET_F_GUEST_ECN)  |
  (1ULL << VIRTIO_NET_F_GUEST_UFO);
  
-return guest_offloads_mask & features;

+if (features & VIRTIO_NET_F_CTRL_GUEST_OFFLOADS) {
+return (guest_offloads_mask & features) |
+   (1ULL << VIRTIO_NET_F_GUEST_RSC4);


Why need to care this, I believe RSC has nothing to do with peer's 
offload setting?



+} else {
+

Re: [Qemu-devel] [ Patch 1/2] virtio-net rsc: support coalescing ipv4 tcp traffic

2016-03-20 Thread Jason Wang


On 03/18/2016 12:17 PM, Wei Xu wrote:
>
> +static ssize_t virtio_net_receive(NetClientState *nc,
> +  const uint8_t *buf, size_t size)
> +{
> +if (virtio_net_rsc_bypass) {
> +return virtio_net_do_receive(nc, buf, size);
 You need a feature bit for this and compat it for older machine types.
 And also need some work on virtio spec I think.
>>> yes, not sure which way is good to support this, hmp/qmp/ethtool, this
>>> is gonna to support win guest,
>>> so need a well-compatible interface, any comments?
>> I think this should be implemented through feature bits/negotiation
>> instead of something like ethtool.
> Looks this feature should be turn on/off dynamically due to the spec,
> so maybe this should be managed from the guest, is there any reference
> code for this? 

Then you may want to look at implementation of
VIRTIO_NET_F_CTRL_GUEST_OFFLOADS.



Re: [Qemu-devel] [ Patch 1/2] virtio-net rsc: support coalescing ipv4 tcp traffic

2016-03-19 Thread Jason Wang


On 03/18/2016 02:38 PM, Wei Xu wrote:
>
>
> On 2016年03月18日 13:20, Jason Wang wrote:
>>
>> On 03/18/2016 12:17 PM, Wei Xu wrote:
>>> +static ssize_t virtio_net_receive(NetClientState *nc,
>>> +  const uint8_t *buf, size_t size)
>>> +{
>>> +if (virtio_net_rsc_bypass) {
>>> +return virtio_net_do_receive(nc, buf, size);
>> You need a feature bit for this and compat it for older machine
>> types.
>> And also need some work on virtio spec I think.
> yes, not sure which way is good to support this, hmp/qmp/ethtool,
> this
> is gonna to support win guest,
> so need a well-compatible interface, any comments?
 I think this should be implemented through feature bits/negotiation
 instead of something like ethtool.
>>> Looks this feature should be turn on/off dynamically due to the spec,
>>> so maybe this should be managed from the guest, is there any reference
>>> code for this?
>> Then you may want to look at implementation of
>> VIRTIO_NET_F_CTRL_GUEST_OFFLOADS.
> Have a short look at it, do you know how to control the feature bit? 
> both when lauching vm and changing it during runtime? 

Virtio spec and maybe windows driver source code can give you the answer.




Re: [Qemu-devel] [ Patch 1/2] virtio-net rsc: support coalescing ipv4 tcp traffic

2016-03-19 Thread Wei Xu

On 2016年03月17日 16:42, Jason Wang wrote:



On 03/15/2016 05:17 PM, w...@redhat.com wrote:

From: Wei Xu 

All the data packets in a tcp connection will be cached to a big buffer
in every receive interval, and will be sent out via a timer, the
'virtio_net_rsc_timeout' controls the interval, the value will influent the
performance and response of tcp connection extremely, 5(50us) is a
experience value to gain a performance improvement, since the whql test
sends packets every 100us, so '30(300us)' can pass the test case,
this is also the default value, it's gonna to be tunable.

The timer will only be triggered if the packets pool is not empty,
and it'll drain off all the cached packets

'NetRscChain' is used to save the segments of different protocols in a
VirtIONet device.

The main handler of TCP includes TCP window update, duplicated ACK check
and the real data coalescing if the new segment passed sanity check
and is identified as an 'wanted' one.

An 'wanted' segment means:
1. Segment is within current window and the sequence is the expected one.
2. ACK of the segment is in the valid window.
3. If the ACK in the segment is a duplicated one, then it must less than 2,
this is to notify upper layer TCP starting retransmission due to the spec.

Sanity check includes:
1. Incorrect version in IP header
2. IP options & IP fragment
3. Not a TCP packets
4. Sanity size check to prevent buffer overflow attack.

There maybe more cases should be considered such as ip identification other
flags, while it broke the test because windows set it to the same even it's
not a fragment.

Normally it includes 2 typical ways to handle a TCP control flag, 'bypass'
and 'finalize', 'bypass' means should be sent out directly, and 'finalize'
means the packets should also be bypassed, and this should be done
after searching for the same connection packets in the pool and sending
all of them out, this is to avoid out of data.

All the 'SYN' packets will be bypassed since this always begin a new'
connection, other flags such 'FIN/RST' will trigger a finalization, because
this normally happens upon a connection is going to be closed, an 'URG' packet
also finalize current coalescing unit while there maybe protocol difference to
different OS.

But URG packet should be sent as quickly as possible regardless of
ordering, no?


Yes, you right, URG will terminate the current 'SCU', i'll amend the commit log.




Statistics can be used to monitor the basic coalescing status, the 'out of 
order'
and 'out of window' means how many retransmitting packets, thus describe the
performance intuitively.

Signed-off-by: Wei Xu 
---
  hw/net/virtio-net.c| 486 -
  include/hw/virtio/virtio-net.h |   1 +
  include/hw/virtio/virtio.h |  72 ++
  3 files changed, 558 insertions(+), 1 deletion(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 5798f87..c23b45f 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -15,10 +15,12 @@
  #include "qemu/iov.h"
  #include "hw/virtio/virtio.h"
  #include "net/net.h"
+#include "net/eth.h"
  #include "net/checksum.h"
  #include "net/tap.h"
  #include "qemu/error-report.h"
  #include "qemu/timer.h"
+#include "qemu/sockets.h"
  #include "hw/virtio/virtio-net.h"
  #include "net/vhost_net.h"
  #include "hw/virtio/virtio-bus.h"
@@ -38,6 +40,35 @@
  #define endof(container, field) \
  (offsetof(container, field) + sizeof(((container *)0)->field))
  
+#define ETH_HDR_SZ (sizeof(struct eth_header))

+#define IP4_HDR_SZ (sizeof(struct ip_header))
+#define TCP_HDR_SZ (sizeof(struct tcp_header))
+#define ETH_IP4_HDR_SZ (ETH_HDR_SZ + IP4_HDR_SZ)
+
+#define IP4_ADDR_SIZE   8   /* ipv4 saddr + daddr */
+#define TCP_PORT_SIZE   4   /* sport + dport */
+
+/* IPv4 max payload, 16 bits in the header */
+#define MAX_IP4_PAYLOAD (65535 - IP4_HDR_SZ)
+#define MAX_TCP_PAYLOAD 65535
+
+/* max payload with virtio header */
+#define MAX_VIRTIO_PAYLOAD  (sizeof(struct virtio_net_hdr_mrg_rxbuf) \
++ ETH_HDR_SZ + MAX_TCP_PAYLOAD)

Should we use guest_hdr_len instead of sizeof() here? Consider the
vnet_hdr will be extended in the future.


Sure.




+
+#define IP4_HEADER_LEN 5 /* header lenght value in ip header without option */

type, should be 'length'


ok.




+
+/* Purge coalesced packets timer interval */
+#define RSC_TIMER_INTERVAL  30
+
+/* Switcher to enable/disable rsc */
+static bool virtio_net_rsc_bypass = 1;
+
+/* This value affects the performance a lot, and should be tuned carefully,
+   '30'(300us) is the recommended value to pass the WHQL test, '5' can
+   gain 2x netperf throughput with tso/gso/gro 'off'. */
+static uint32_t virtio_net_rsc_timeout = RSC_TIMER_INTERVAL;
+
  typedef struct VirtIOFeature {
  uint32_t flags;
  size_t end;
@@ -1089,7 +1120,8 @@ static int receive_filter(VirtIONet *n, const uint8_t 
*buf, 

Re: [Qemu-devel] [ Patch 1/2] virtio-net rsc: support coalescing ipv4 tcp traffic

2016-03-19 Thread Jason Wang


On 03/18/2016 12:45 AM, Wei Xu wrote:
> On 2016年03月17日 16:42, Jason Wang wrote:
>
>>
>> On 03/15/2016 05:17 PM, w...@redhat.com wrote:
>>> From: Wei Xu 
>>>
>>> All the data packets in a tcp connection will be cached to a big buffer
>>> in every receive interval, and will be sent out via a timer, the
>>> 'virtio_net_rsc_timeout' controls the interval, the value will
>>> influent the
>>> performance and response of tcp connection extremely, 5(50us) is a
>>> experience value to gain a performance improvement, since the whql test
>>> sends packets every 100us, so '30(300us)' can pass the test case,
>>> this is also the default value, it's gonna to be tunable.
>>>
>>> The timer will only be triggered if the packets pool is not empty,
>>> and it'll drain off all the cached packets
>>>
>>> 'NetRscChain' is used to save the segments of different protocols in a
>>> VirtIONet device.
>>>
>>> The main handler of TCP includes TCP window update, duplicated ACK
>>> check
>>> and the real data coalescing if the new segment passed sanity check
>>> and is identified as an 'wanted' one.
>>>
>>> An 'wanted' segment means:
>>> 1. Segment is within current window and the sequence is the expected
>>> one.
>>> 2. ACK of the segment is in the valid window.
>>> 3. If the ACK in the segment is a duplicated one, then it must less
>>> than 2,
>>> this is to notify upper layer TCP starting retransmission due to
>>> the spec.
>>>
>>> Sanity check includes:
>>> 1. Incorrect version in IP header
>>> 2. IP options & IP fragment
>>> 3. Not a TCP packets
>>> 4. Sanity size check to prevent buffer overflow attack.
>>>
>>> There maybe more cases should be considered such as ip
>>> identification other
>>> flags, while it broke the test because windows set it to the same
>>> even it's
>>> not a fragment.
>>>
>>> Normally it includes 2 typical ways to handle a TCP control flag,
>>> 'bypass'
>>> and 'finalize', 'bypass' means should be sent out directly, and
>>> 'finalize'
>>> means the packets should also be bypassed, and this should be done
>>> after searching for the same connection packets in the pool and sending
>>> all of them out, this is to avoid out of data.
>>>
>>> All the 'SYN' packets will be bypassed since this always begin a new'
>>> connection, other flags such 'FIN/RST' will trigger a finalization,
>>> because
>>> this normally happens upon a connection is going to be closed, an
>>> 'URG' packet
>>> also finalize current coalescing unit while there maybe protocol
>>> difference to
>>> different OS.
>> But URG packet should be sent as quickly as possible regardless of
>> ordering, no?
>
> Yes, you right, URG will terminate the current 'SCU', i'll amend the
> commit log.
>
>>
>>> Statistics can be used to monitor the basic coalescing status, the
>>> 'out of order'
>>> and 'out of window' means how many retransmitting packets, thus
>>> describe the
>>> performance intuitively.
>>>
>>> Signed-off-by: Wei Xu 
>>> ---
>>>   hw/net/virtio-net.c| 486
>>> -
>>>   include/hw/virtio/virtio-net.h |   1 +
>>>   include/hw/virtio/virtio.h |  72 ++
>>>   3 files changed, 558 insertions(+), 1 deletion(-)

[...]

>>> +} else {
>>> +/* Coalesce window update */
>>> +o_tcp->th_win = n_tcp->th_win;
>>> +chain->stat.win_update++;
>>> +return RSC_COALESCE;
>>> +}
>>> +} else {
>>> +/* pure ack, update ack */
>>> +o_tcp->th_ack = n_tcp->th_ack;
>>> +chain->stat.pure_ack++;
>>> +return RSC_COALESCE;
>> Looks like there're something I missed. The spec said:
>>
>> "In other words, any pure ACK that is not a duplicate ACK or a window
>> update triggers an exception and must not be coalesced. All such pure
>> ACKs must be indicated as individual segments."
>>
>> Does it mean we *should not* coalesce windows update and pure ack?
>> (Since it can wakeup transmission)?
>
> It's also a little bit inexplicit and flexible due to the spec, please
> see the flowchart I on the same page.
>
> Comments about the  flowchart:
> 
> The first of the following two flowcharts describes the rules for
> coalescing segments and updating the TCP headers.
> This flowchart refers to mechanisms for distinguishing valid duplicate
> ACKs and window updates. The second flowchart describes these mechanisms.
> 
> As show in the flowchart, only status 'C' will break current scu and
> get finalized, both 'A' and 'B' can be coalesced afaik.
>

Interesting, looks like you're right.

>>
>>> +}
>>> +}
>>> +
>>> +static int32_t virtio_net_rsc_coalesce_data(NetRscChain *chain,
>>> NetRscSeg *seg,
>>> +const uint8_t *buf, NetRscUnit
>>> *n_unit)
>>> +{
>>> +void *data;
>>> +uint16_t o_ip_len;
>>> +   

Re: [Qemu-devel] [ Patch 1/2] virtio-net rsc: support coalescing ipv4 tcp traffic

2016-03-19 Thread Wei Xu



On 2016年03月18日 10:03, Jason Wang wrote:


On 03/18/2016 12:45 AM, Wei Xu wrote:

On 2016年03月17日 16:42, Jason Wang wrote:


On 03/15/2016 05:17 PM, w...@redhat.com wrote:

From: Wei Xu 

All the data packets in a tcp connection will be cached to a big buffer
in every receive interval, and will be sent out via a timer, the
'virtio_net_rsc_timeout' controls the interval, the value will
influent the
performance and response of tcp connection extremely, 5(50us) is a
experience value to gain a performance improvement, since the whql test
sends packets every 100us, so '30(300us)' can pass the test case,
this is also the default value, it's gonna to be tunable.

The timer will only be triggered if the packets pool is not empty,
and it'll drain off all the cached packets

'NetRscChain' is used to save the segments of different protocols in a
VirtIONet device.

The main handler of TCP includes TCP window update, duplicated ACK
check
and the real data coalescing if the new segment passed sanity check
and is identified as an 'wanted' one.

An 'wanted' segment means:
1. Segment is within current window and the sequence is the expected
one.
2. ACK of the segment is in the valid window.
3. If the ACK in the segment is a duplicated one, then it must less
than 2,
 this is to notify upper layer TCP starting retransmission due to
the spec.

Sanity check includes:
1. Incorrect version in IP header
2. IP options & IP fragment
3. Not a TCP packets
4. Sanity size check to prevent buffer overflow attack.

There maybe more cases should be considered such as ip
identification other
flags, while it broke the test because windows set it to the same
even it's
not a fragment.

Normally it includes 2 typical ways to handle a TCP control flag,
'bypass'
and 'finalize', 'bypass' means should be sent out directly, and
'finalize'
means the packets should also be bypassed, and this should be done
after searching for the same connection packets in the pool and sending
all of them out, this is to avoid out of data.

All the 'SYN' packets will be bypassed since this always begin a new'
connection, other flags such 'FIN/RST' will trigger a finalization,
because
this normally happens upon a connection is going to be closed, an
'URG' packet
also finalize current coalescing unit while there maybe protocol
difference to
different OS.

But URG packet should be sent as quickly as possible regardless of
ordering, no?

Yes, you right, URG will terminate the current 'SCU', i'll amend the
commit log.


Statistics can be used to monitor the basic coalescing status, the
'out of order'
and 'out of window' means how many retransmitting packets, thus
describe the
performance intuitively.

Signed-off-by: Wei Xu 
---
   hw/net/virtio-net.c| 486
-
   include/hw/virtio/virtio-net.h |   1 +
   include/hw/virtio/virtio.h |  72 ++
   3 files changed, 558 insertions(+), 1 deletion(-)

[...]


+} else {
+/* Coalesce window update */
+o_tcp->th_win = n_tcp->th_win;
+chain->stat.win_update++;
+return RSC_COALESCE;
+}
+} else {
+/* pure ack, update ack */
+o_tcp->th_ack = n_tcp->th_ack;
+chain->stat.pure_ack++;
+return RSC_COALESCE;

Looks like there're something I missed. The spec said:

"In other words, any pure ACK that is not a duplicate ACK or a window
update triggers an exception and must not be coalesced. All such pure
ACKs must be indicated as individual segments."

Does it mean we *should not* coalesce windows update and pure ack?
(Since it can wakeup transmission)?

It's also a little bit inexplicit and flexible due to the spec, please
see the flowchart I on the same page.

Comments about the  flowchart:

The first of the following two flowcharts describes the rules for
coalescing segments and updating the TCP headers.
This flowchart refers to mechanisms for distinguishing valid duplicate
ACKs and window updates. The second flowchart describes these mechanisms.

As show in the flowchart, only status 'C' will break current scu and
get finalized, both 'A' and 'B' can be coalesced afaik.


Interesting, looks like you're right.


+}
+}
+
+static int32_t virtio_net_rsc_coalesce_data(NetRscChain *chain,
NetRscSeg *seg,
+const uint8_t *buf, NetRscUnit
*n_unit)
+{
+void *data;
+uint16_t o_ip_len;
+uint32_t nseq, oseq;
+NetRscUnit *o_unit;
+
+o_unit = >unit;
+o_ip_len = htons(*o_unit->ip_plen);
+nseq = htonl(n_unit->tcp->th_seq);
+oseq = htonl(o_unit->tcp->th_seq);
+
+if (n_unit->tcp_hdrlen > TCP_HDR_SZ) {
+/* Log this only for debugging observation */
+chain->stat.tcp_option++;
+}
+
+/* Ignore packet with more/larger tcp 

Re: [Qemu-devel] [ Patch 1/2] virtio-net rsc: support coalescing ipv4 tcp traffic

2016-03-19 Thread Wei Xu



On 2016年03月18日 14:56, Jason Wang wrote:


On 03/18/2016 02:38 PM, Wei Xu wrote:


On 2016年03月18日 13:20, Jason Wang wrote:

On 03/18/2016 12:17 PM, Wei Xu wrote:

+static ssize_t virtio_net_receive(NetClientState *nc,
+  const uint8_t *buf, size_t size)
+{
+if (virtio_net_rsc_bypass) {
+return virtio_net_do_receive(nc, buf, size);

You need a feature bit for this and compat it for older machine
types.
And also need some work on virtio spec I think.

yes, not sure which way is good to support this, hmp/qmp/ethtool,
this
is gonna to support win guest,
so need a well-compatible interface, any comments?

I think this should be implemented through feature bits/negotiation
instead of something like ethtool.

Looks this feature should be turn on/off dynamically due to the spec,
so maybe this should be managed from the guest, is there any reference
code for this?

Then you may want to look at implementation of
VIRTIO_NET_F_CTRL_GUEST_OFFLOADS.

Have a short look at it, do you know how to control the feature bit?
both when lauching vm and changing it during runtime?

Virtio spec and maybe windows driver source code can give you the answer.

OK, will check it out.




Re: [Qemu-devel] [ Patch 1/2] virtio-net rsc: support coalescing ipv4 tcp traffic

2016-03-19 Thread Wei Xu



On 2016年03月18日 13:20, Jason Wang wrote:


On 03/18/2016 12:17 PM, Wei Xu wrote:

+static ssize_t virtio_net_receive(NetClientState *nc,
+  const uint8_t *buf, size_t size)
+{
+if (virtio_net_rsc_bypass) {
+return virtio_net_do_receive(nc, buf, size);

You need a feature bit for this and compat it for older machine types.
And also need some work on virtio spec I think.

yes, not sure which way is good to support this, hmp/qmp/ethtool, this
is gonna to support win guest,
so need a well-compatible interface, any comments?

I think this should be implemented through feature bits/negotiation
instead of something like ethtool.

Looks this feature should be turn on/off dynamically due to the spec,
so maybe this should be managed from the guest, is there any reference
code for this?

Then you may want to look at implementation of
VIRTIO_NET_F_CTRL_GUEST_OFFLOADS.
Have a short look at it, do you know how to control the feature bit?  
both when lauching vm and changing it during runtime?




Re: [Qemu-devel] [ Patch 1/2] virtio-net rsc: support coalescing ipv4 tcp traffic

2016-03-19 Thread Jason Wang


On 03/15/2016 05:17 PM, w...@redhat.com wrote:
> From: Wei Xu 
>
> All the data packets in a tcp connection will be cached to a big buffer
> in every receive interval, and will be sent out via a timer, the
> 'virtio_net_rsc_timeout' controls the interval, the value will influent the
> performance and response of tcp connection extremely, 5(50us) is a
> experience value to gain a performance improvement, since the whql test
> sends packets every 100us, so '30(300us)' can pass the test case,
> this is also the default value, it's gonna to be tunable.
>
> The timer will only be triggered if the packets pool is not empty,
> and it'll drain off all the cached packets
>
> 'NetRscChain' is used to save the segments of different protocols in a
> VirtIONet device.
>
> The main handler of TCP includes TCP window update, duplicated ACK check
> and the real data coalescing if the new segment passed sanity check
> and is identified as an 'wanted' one.
>
> An 'wanted' segment means:
> 1. Segment is within current window and the sequence is the expected one.
> 2. ACK of the segment is in the valid window.
> 3. If the ACK in the segment is a duplicated one, then it must less than 2,
>this is to notify upper layer TCP starting retransmission due to the spec.
>
> Sanity check includes:
> 1. Incorrect version in IP header
> 2. IP options & IP fragment
> 3. Not a TCP packets
> 4. Sanity size check to prevent buffer overflow attack.
>
> There maybe more cases should be considered such as ip identification other
> flags, while it broke the test because windows set it to the same even it's
> not a fragment.
>
> Normally it includes 2 typical ways to handle a TCP control flag, 'bypass'
> and 'finalize', 'bypass' means should be sent out directly, and 'finalize'
> means the packets should also be bypassed, and this should be done
> after searching for the same connection packets in the pool and sending
> all of them out, this is to avoid out of data.
>
> All the 'SYN' packets will be bypassed since this always begin a new'
> connection, other flags such 'FIN/RST' will trigger a finalization, because
> this normally happens upon a connection is going to be closed, an 'URG' packet
> also finalize current coalescing unit while there maybe protocol difference to
> different OS.

But URG packet should be sent as quickly as possible regardless of
ordering, no?

>
> Statistics can be used to monitor the basic coalescing status, the 'out of 
> order'
> and 'out of window' means how many retransmitting packets, thus describe the
> performance intuitively.
>
> Signed-off-by: Wei Xu 
> ---
>  hw/net/virtio-net.c| 486 
> -
>  include/hw/virtio/virtio-net.h |   1 +
>  include/hw/virtio/virtio.h |  72 ++
>  3 files changed, 558 insertions(+), 1 deletion(-)
>
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 5798f87..c23b45f 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -15,10 +15,12 @@
>  #include "qemu/iov.h"
>  #include "hw/virtio/virtio.h"
>  #include "net/net.h"
> +#include "net/eth.h"
>  #include "net/checksum.h"
>  #include "net/tap.h"
>  #include "qemu/error-report.h"
>  #include "qemu/timer.h"
> +#include "qemu/sockets.h"
>  #include "hw/virtio/virtio-net.h"
>  #include "net/vhost_net.h"
>  #include "hw/virtio/virtio-bus.h"
> @@ -38,6 +40,35 @@
>  #define endof(container, field) \
>  (offsetof(container, field) + sizeof(((container *)0)->field))
>  
> +#define ETH_HDR_SZ (sizeof(struct eth_header))
> +#define IP4_HDR_SZ (sizeof(struct ip_header))
> +#define TCP_HDR_SZ (sizeof(struct tcp_header))
> +#define ETH_IP4_HDR_SZ (ETH_HDR_SZ + IP4_HDR_SZ)
> +
> +#define IP4_ADDR_SIZE   8   /* ipv4 saddr + daddr */
> +#define TCP_PORT_SIZE   4   /* sport + dport */
> +
> +/* IPv4 max payload, 16 bits in the header */
> +#define MAX_IP4_PAYLOAD (65535 - IP4_HDR_SZ)
> +#define MAX_TCP_PAYLOAD 65535
> +
> +/* max payload with virtio header */
> +#define MAX_VIRTIO_PAYLOAD  (sizeof(struct virtio_net_hdr_mrg_rxbuf) \
> ++ ETH_HDR_SZ + MAX_TCP_PAYLOAD)

Should we use guest_hdr_len instead of sizeof() here? Consider the
vnet_hdr will be extended in the future.

> +
> +#define IP4_HEADER_LEN 5 /* header lenght value in ip header without option 
> */

type, should be 'length'

> +
> +/* Purge coalesced packets timer interval */
> +#define RSC_TIMER_INTERVAL  30
> +
> +/* Switcher to enable/disable rsc */
> +static bool virtio_net_rsc_bypass = 1;
> +
> +/* This value affects the performance a lot, and should be tuned carefully,
> +   '30'(300us) is the recommended value to pass the WHQL test, '5' 
> can
> +   gain 2x netperf throughput with tso/gso/gro 'off'. */
> +static uint32_t virtio_net_rsc_timeout = RSC_TIMER_INTERVAL;
> +
>  typedef struct VirtIOFeature {
>  uint32_t flags;
>  size_t end;
> @@ -1089,7 +1120,8 @@ static 

Re: [Qemu-devel] [ Patch 1/2] virtio-net rsc: support coalescing ipv4 tcp traffic

2016-03-15 Thread Wei Xu


- Original Message -
From: "Michael S. Tsirkin" <m...@redhat.com>
To: w...@redhat.com
Cc: vict...@redhat.com, jasow...@redhat.com, yvuge...@redhat.com, 
qemu-devel@nongnu.org, mar...@redhat.com, dfley...@redhat.com
Sent: Tuesday, March 15, 2016 6:00:03 PM
Subject: Re: [Qemu-devel] [ Patch 1/2] virtio-net rsc: support coalescing ipv4 
tcp traffic

On Tue, Mar 15, 2016 at 05:17:03PM +0800, w...@redhat.com wrote:
> From: Wei Xu <w...@redhat.com>
> 
> All the data packets in a tcp connection will be cached to a big buffer
> in every receive interval, and will be sent out via a timer, the
> 'virtio_net_rsc_timeout' controls the interval, the value will influent the
> performance and response of tcp connection extremely, 5(50us) is a
> experience value to gain a performance improvement, since the whql test
> sends packets every 100us, so '30(300us)' can pass the test case,
> this is also the default value, it's gonna to be tunable.
> The timer will only be triggered if the packets pool is not empty,
> and it'll drain off all the cached packets
> 
> 'NetRscChain' is used to save the segments of different protocols in a
> VirtIONet device.
> 
> The main handler of TCP includes TCP window update, duplicated ACK check
> and the real data coalescing if the new segment passed sanity check
> and is identified as an 'wanted' one.
> 
> An 'wanted' segment means:
> 1. Segment is within current window and the sequence is the expected one.
> 2. ACK of the segment is in the valid window.
> 3. If the ACK in the segment is a duplicated one, then it must less than 2,
>this is to notify upper layer TCP starting retransmission due to the spec.
> 
> Sanity check includes:
> 1. Incorrect version in IP header
> 2. IP options & IP fragment
> 3. Not a TCP packets
> 4. Sanity size check to prevent buffer overflow attack.
> 
> There maybe more cases should be considered such as ip identification other
> flags, while it broke the test because windows set it to the same even it's
> not a fragment.
> 
> Normally it includes 2 typical ways to handle a TCP control flag, 'bypass'
> and 'finalize', 'bypass' means should be sent out directly, and 'finalize'
> means the packets should also be bypassed, and this should be done
> after searching for the same connection packets in the pool and sending
> all of them out, this is to avoid out of data.
> 
> All the 'SYN' packets will be bypassed since this always begin a new'
> connection, other flags such 'FIN/RST' will trigger a finalization, because
> this normally happens upon a connection is going to be closed, an 'URG' packet
> also finalize current coalescing unit while there maybe protocol difference to
> different OS.
> 
> Statistics can be used to monitor the basic coalescing status, the 'out of 
> order'
> and 'out of window' means how many retransmitting packets, thus describe the
> performance intuitively.
> 
> Signed-off-by: Wei Xu <w...@redhat.com>
> ---
>  hw/net/virtio-net.c| 486 
> -
>  include/hw/virtio/virtio-net.h |   1 +
>  include/hw/virtio/virtio.h |  72 ++
>  3 files changed, 558 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 5798f87..c23b45f 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -15,10 +15,12 @@
>  #include "qemu/iov.h"
>  #include "hw/virtio/virtio.h"
>  #include "net/net.h"
> +#include "net/eth.h"
>  #include "net/checksum.h"
>  #include "net/tap.h"
>  #include "qemu/error-report.h"
>  #include "qemu/timer.h"
> +#include "qemu/sockets.h"
>  #include "hw/virtio/virtio-net.h"
>  #include "net/vhost_net.h"
>  #include "hw/virtio/virtio-bus.h"
> @@ -38,6 +40,35 @@
>  #define endof(container, field) \
>  (offsetof(container, field) + sizeof(((container *)0)->field))
>  
> +#define ETH_HDR_SZ (sizeof(struct eth_header))
> +#define IP4_HDR_SZ (sizeof(struct ip_header))
> +#define TCP_HDR_SZ (sizeof(struct tcp_header))
> +#define ETH_IP4_HDR_SZ (ETH_HDR_SZ + IP4_HDR_SZ)

It's better to open-code these imho.

okay.

> +
> +#define IP4_ADDR_SIZE   8   /* ipv4 saddr + daddr */
> +#define TCP_PORT_SIZE   4   /* sport + dport */
> +
> +/* IPv4 max payload, 16 bits in the header */
> +#define MAX_IP4_PAYLOAD (65535 - IP4_HDR_SZ)
> +#define MAX_TCP_PAYLOAD 65535
> +
> +/* max payload with virtio header */
> +#define MAX_VIRTIO_PAYLOAD  (sizeof(struct virtio_net_hdr_mrg_rxbuf) \
> ++ ETH_HDR_SZ + MAX_TCP_PAYLOAD)
> +
>

Re: [Qemu-devel] [ Patch 1/2] virtio-net rsc: support coalescing ipv4 tcp traffic

2016-03-15 Thread Michael S. Tsirkin
On Tue, Mar 15, 2016 at 05:17:03PM +0800, w...@redhat.com wrote:
> From: Wei Xu 
> 
> All the data packets in a tcp connection will be cached to a big buffer
> in every receive interval, and will be sent out via a timer, the
> 'virtio_net_rsc_timeout' controls the interval, the value will influent the
> performance and response of tcp connection extremely, 5(50us) is a
> experience value to gain a performance improvement, since the whql test
> sends packets every 100us, so '30(300us)' can pass the test case,
> this is also the default value, it's gonna to be tunable.
> The timer will only be triggered if the packets pool is not empty,
> and it'll drain off all the cached packets
> 
> 'NetRscChain' is used to save the segments of different protocols in a
> VirtIONet device.
> 
> The main handler of TCP includes TCP window update, duplicated ACK check
> and the real data coalescing if the new segment passed sanity check
> and is identified as an 'wanted' one.
> 
> An 'wanted' segment means:
> 1. Segment is within current window and the sequence is the expected one.
> 2. ACK of the segment is in the valid window.
> 3. If the ACK in the segment is a duplicated one, then it must less than 2,
>this is to notify upper layer TCP starting retransmission due to the spec.
> 
> Sanity check includes:
> 1. Incorrect version in IP header
> 2. IP options & IP fragment
> 3. Not a TCP packets
> 4. Sanity size check to prevent buffer overflow attack.
> 
> There maybe more cases should be considered such as ip identification other
> flags, while it broke the test because windows set it to the same even it's
> not a fragment.
> 
> Normally it includes 2 typical ways to handle a TCP control flag, 'bypass'
> and 'finalize', 'bypass' means should be sent out directly, and 'finalize'
> means the packets should also be bypassed, and this should be done
> after searching for the same connection packets in the pool and sending
> all of them out, this is to avoid out of data.
> 
> All the 'SYN' packets will be bypassed since this always begin a new'
> connection, other flags such 'FIN/RST' will trigger a finalization, because
> this normally happens upon a connection is going to be closed, an 'URG' packet
> also finalize current coalescing unit while there maybe protocol difference to
> different OS.
> 
> Statistics can be used to monitor the basic coalescing status, the 'out of 
> order'
> and 'out of window' means how many retransmitting packets, thus describe the
> performance intuitively.
> 
> Signed-off-by: Wei Xu 
> ---
>  hw/net/virtio-net.c| 486 
> -
>  include/hw/virtio/virtio-net.h |   1 +
>  include/hw/virtio/virtio.h |  72 ++
>  3 files changed, 558 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 5798f87..c23b45f 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -15,10 +15,12 @@
>  #include "qemu/iov.h"
>  #include "hw/virtio/virtio.h"
>  #include "net/net.h"
> +#include "net/eth.h"
>  #include "net/checksum.h"
>  #include "net/tap.h"
>  #include "qemu/error-report.h"
>  #include "qemu/timer.h"
> +#include "qemu/sockets.h"
>  #include "hw/virtio/virtio-net.h"
>  #include "net/vhost_net.h"
>  #include "hw/virtio/virtio-bus.h"
> @@ -38,6 +40,35 @@
>  #define endof(container, field) \
>  (offsetof(container, field) + sizeof(((container *)0)->field))
>  
> +#define ETH_HDR_SZ (sizeof(struct eth_header))
> +#define IP4_HDR_SZ (sizeof(struct ip_header))
> +#define TCP_HDR_SZ (sizeof(struct tcp_header))
> +#define ETH_IP4_HDR_SZ (ETH_HDR_SZ + IP4_HDR_SZ)

It's better to open-code these imho.

> +
> +#define IP4_ADDR_SIZE   8   /* ipv4 saddr + daddr */
> +#define TCP_PORT_SIZE   4   /* sport + dport */
> +
> +/* IPv4 max payload, 16 bits in the header */
> +#define MAX_IP4_PAYLOAD (65535 - IP4_HDR_SZ)
> +#define MAX_TCP_PAYLOAD 65535
> +
> +/* max payload with virtio header */
> +#define MAX_VIRTIO_PAYLOAD  (sizeof(struct virtio_net_hdr_mrg_rxbuf) \
> ++ ETH_HDR_SZ + MAX_TCP_PAYLOAD)
> +
> +#define IP4_HEADER_LEN 5 /* header lenght value in ip header without option 
> */
> +
> +/* Purge coalesced packets timer interval */
> +#define RSC_TIMER_INTERVAL  30

Pls prefix local macros with VIRTIO_NET_


> +
> +/* Switcher to enable/disable rsc */
> +static bool virtio_net_rsc_bypass = 1;
> +
> +/* This value affects the performance a lot, and should be tuned carefully,
> +   '30'(300us) is the recommended value to pass the WHQL test, '5' 
> can
> +   gain 2x netperf throughput with tso/gso/gro 'off'. */

So either tests pass or we get good performance, but not both?
Hmm.

> +static uint32_t virtio_net_rsc_timeout = RSC_TIMER_INTERVAL;


This would beed to be tunable.

> +
>  typedef struct VirtIOFeature {
>  uint32_t flags;
>  size_t end;
> @@ -1089,7 +1120,8 @@ static int