Re: [ovs-dev] [PATCH v1] netdev-dpdk: Implement TCP/UDP TX cksum in ovs-dpdk side

2017-08-24 Thread Gao Zhenyu
Thanks for the comments!

Yes, the best approach is to calculate the cksum only if the destination needs it.
But in the current ovs-dpdk processing path it is hard to tell, at the point
where a batch is delivered to its destination, whether the whole batch needs
cksums or not.
Checking packets one by one (testing PKT_TX_L4_MASK and whether an L4 header is
present) would introduce a regression in some use cases. (In a previous email,
Ciara tested my first patch and saw about a 4% regression in a pure
packet-forwarding test.)
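
Just to illustrate the per-packet check I am referring to, here is a rough
sketch against the DPDK 17.x mbuf API (IPv4 only for brevity; this is not the
actual patch and the helper name is made up):

    #include <rte_mbuf.h>
    #include <rte_ip.h>
    #include <rte_tcp.h>
    #include <rte_udp.h>

    /* Walk a TX batch and finish the L4 cksum in software for every packet
     * that still carries a cksum-offload request in its ol_flags. */
    static void
    sw_cksum_batch(struct rte_mbuf **pkts, int cnt)
    {
        for (int i = 0; i < cnt; i++) {
            struct rte_mbuf *m = pkts[i];
            uint64_t l4_flag = m->ol_flags & PKT_TX_L4_MASK;

            if (!l4_flag) {
                continue;   /* No cksum requested, nothing to do. */
            }

            struct ipv4_hdr *ip =
                rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *, m->l2_len);
            void *l4 = (char *) ip + m->l3_len;

            if (l4_flag == PKT_TX_TCP_CKSUM) {
                struct tcp_hdr *tcp = l4;
                tcp->cksum = 0;
                tcp->cksum = rte_ipv4_udptcp_cksum(ip, tcp);
            } else if (l4_flag == PKT_TX_UDP_CKSUM) {
                struct udp_hdr *udp = l4;
                udp->dgram_cksum = 0;
                udp->dgram_cksum = rte_ipv4_udptcp_cksum(ip, udp);
            }
            m->ol_flags &= ~PKT_TX_L4_MASK;
        }
    }

Every iteration has to read the packet's ol_flags and headers, and that
per-packet work is exactly what showed up as the regression above.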

About offloading to the physical NIC: I did some testing on it, and it does not
show a significant improvement, but it does disable DPDK TX vectorization
(which may not be good for small packets). I prefer to implement the software
cksum first and consider hardware offloading later.
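
When we do look at hardware offloading later, the capability check itself is
cheap; something along these lines, using the standard DPDK ethdev API (a
sketch, not code from the patch):

    #include <stdbool.h>
    #include <rte_ethdev.h>

    /* Return true if the port can finish TCP and UDP cksums in hardware. */
    static bool
    port_supports_l4_cksum(uint16_t port_id)
    {
        struct rte_eth_dev_info info;

        rte_eth_dev_info_get(port_id, &info);
        return (info.tx_offload_capa & DEV_TX_OFFLOAD_TCP_CKSUM)
               && (info.tx_offload_capa & DEV_TX_OFFLOAD_UDP_CKSUM);
    }

The costly part is not the check but the fact that requesting the offload
pushes the PMD off its vectorized TX path.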

The VM I use for testing is CentOS 7, kernel version
3.10.0-514.16.1.el7.x86_64. Supporting cksum offload has an additional benefit:
the vhost-net driver can enable NETIF_F_SG (the scatter-gather feature).

2017-08-24 17:07 GMT+08:00 O Mahony, Billy <billy.o.mah...@intel.com>:

> Hi Gao,
>
> Thanks for working on this. Lack of checksum offload is a big difference
> between ovs and ovs-dpdk when using the linux stack in the guest.
>
> The thing that struck me was that rather than immediately calculating the
> L4 checksum in the host on vhost rx, the calculation should be delayed
> until it's known that it absolutely has to be done on the host. If the
> packet is for another VM, a checksum is not required as the bits are not
> going over a physical medium. And if the packet is destined for a NIC then
> the checksum can be offloaded if the NIC supports it.
>
> I'm not sure why doing the L4 sum in the guest should give a performance
> gain. The processing still has to be done. Maybe the guest code was
> compiled for an older architecture and is not using as efficient a set of
> instructions?
>
> In any case the best advantage of having the dpdk virtio device support
> offload is if it can further offload to a NIC or avoid the cksum entirely
> if the packet is destined for a local VM.
>
> Thanks,
> Billy.
>
>
> > -----Original Message-----
> > From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> > boun...@openvswitch.org] On Behalf Of Gao Zhenyu
> > Sent: Wednesday, August 23, 2017 4:12 PM
> > To: Loftus, Ciara <ciara.lof...@intel.com>
> > Cc: d...@openvswitch.org; us...@dpdk.org
> > Subject: Re: [ovs-dev] [PATCH v1] netdev-dpdk: Implement TCP/UDP TX
> > cksum in ovs-dpdk side
> >
> > Yes, maintaining only one implementation is reasonable.
> > However, making ovs-dpdk support vhost tx-cksum first is doable as well.
> > We can have it in ovs and replace it with the new DPDK API once ovs updates
> > to a dpdk version that contains the tx-cksum implementation.
> >
> >
> > Thanks
> > Zhenyu Gao
> >
> > 2017-08-23 21:59 GMT+08:00 Loftus, Ciara <ciara.lof...@intel.com>:
> >
> > > >
> > > > Hi Ciara
> > > >
> > > > You had a general concern below; can we conclude on that before
> > > > going further ?
> > > >
> > > > Thanks Darrell
> > > >
> > > > “
> > > > > On another note I have a general concern. I understand similar
> > > functionality
> > > > > is present in the DPDK vhost sample app. I wonder if it would be
> > > feasible
> > > > for
> > > > > this to be implemented in the DPDK vhost library and leveraged
> > > > > here,
> > > > rather
> > > > > than having two implementations in two separate code bases.
> > >
> > > This is something I'd like to see, although I wouldn't block on this
> > > patch waiting for it.
> > > Maybe we can have the initial implementation as it is (if it proves
> > > beneficial), then move to a common DPDK API if/when it becomes
> > available.
> > >
> > > I've cc'ed DPDK users list hoping for some input. To summarise:
> > > From my understanding, the DPDK vhost sample application calculates TX
> > > checksum for packets received from vHost ports with invalid/0
> checksums:
> > > http://dpdk.org/browse/dpdk/tree/examples/vhost/main.c#n910
> > > The patch being discussed in this thread (also here:
> > > https://patchwork.ozlabs.org/patch/802070/) seems to do something
> > > very similar.
> > > Wondering about the feasibility of putting this functionality in a
> > > rte_vhost library call so that we don't have two separate
> > > implementations?
> > >
> > > Thanks,
> > > Ciara
> > >
> > > > >
> > > > > I have some other comments inline.

Re: [ovs-dev] [PATCH v1] netdev-dpdk: Implement TCP/UDP TX cksum in ovs-dpdk side

2017-08-24 Thread O Mahony, Billy
Hi Gao,

Thanks for working on this. Lack of checksum offload is a big difference between
ovs and ovs-dpdk when using the linux stack in the guest.

The thing that struck me was that rather than immediately calculating the L4
checksum in the host on vhost rx, the calculation should be delayed until it's
known that it absolutely has to be done on the host. If the packet is for
another VM, a checksum is not required as the bits are not going over a
physical medium. And if the packet is destined for a NIC then the checksum can
be offloaded if the NIC supports it.
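
In other words, the decision could be taken per output port, along the lines
of this sketch (hypothetical helper names, not OVS or DPDK code):

    #include <stdbool.h>
    #include <rte_mbuf.h>

    enum tx_cksum_action { CKSUM_SKIP, CKSUM_OFFLOAD_TO_NIC, CKSUM_IN_SOFTWARE };

    /* Defer the L4 cksum decision until the output port is known. */
    static enum tx_cksum_action
    tx_cksum_policy(const struct rte_mbuf *m, bool dest_is_vhost,
                    bool nic_can_offload)
    {
        if (!(m->ol_flags & PKT_TX_L4_MASK)) {
            return CKSUM_SKIP;            /* Guest did not request offload.    */
        }
        if (dest_is_vhost) {
            return CKSUM_SKIP;            /* VM-to-VM: bits never cross a wire. */
        }
        if (nic_can_offload) {
            return CKSUM_OFFLOAD_TO_NIC;  /* Leave PKT_TX_*_CKSUM for the PMD.  */
        }
        return CKSUM_IN_SOFTWARE;         /* Last resort: compute on the host.  */
    }

Only the last branch should ever cost the host CPU anything.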

I'm not sure why doing the L4 sum in the guest should give a performance gain. 
The processing still has to be done. Maybe the guest code was compiled for an 
older architecture and is not using as efficient a set of instructions?

In any case the best advantage of having the dpdk virtio device support offload
is if it can further offload to a NIC or avoid the cksum entirely if the packet
is destined for a local VM.

Thanks,
Billy. 


> -----Original Message-----
> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> boun...@openvswitch.org] On Behalf Of Gao Zhenyu
> Sent: Wednesday, August 23, 2017 4:12 PM
> To: Loftus, Ciara <ciara.lof...@intel.com>
> Cc: d...@openvswitch.org; us...@dpdk.org
> Subject: Re: [ovs-dev] [PATCH v1] netdev-dpdk: Implement TCP/UDP TX
> cksum in ovs-dpdk side
> 
> Yes, maintaining only one implementation is reasonable.
> However, making ovs-dpdk support vhost tx-cksum first is doable as well.
> We can have it in ovs and replace it with the new DPDK API once ovs updates
> to a dpdk version that contains the tx-cksum implementation.
> 
> 
> Thanks
> Zhenyu Gao
> 
> 2017-08-23 21:59 GMT+08:00 Loftus, Ciara <ciara.lof...@intel.com>:
> 
> > >
> > > Hi Ciara
> > >
> > > You had a general concern below; can we conclude on that before
> > > going further ?
> > >
> > > Thanks Darrell
> > >
> > > “
> > > > On another note I have a general concern. I understand similar
> > functionality
> > > > is present in the DPDK vhost sample app. I wonder if it would be
> > feasible
> > > for
> > > > this to be implemented in the DPDK vhost library and leveraged
> > > > here,
> > > rather
> > > > than having two implementations in two separate code bases.
> >
> > This is something I'd like to see, although I wouldn't block on this
> > patch waiting for it.
> > Maybe we can have the initial implementation as it is (if it proves
> > beneficial), then move to a common DPDK API if/when it becomes
> available.
> >
> > I've cc'ed DPDK users list hoping for some input. To summarise:
> > From my understanding, the DPDK vhost sample application calculates TX
> > checksum for packets received from vHost ports with invalid/0 checksums:
> > http://dpdk.org/browse/dpdk/tree/examples/vhost/main.c#n910
> > The patch being discussed in this thread (also here:
> > https://patchwork.ozlabs.org/patch/802070/) seems to do something
> > very similar.
> > Wondering about the feasibility of putting this functionality in a
> > rte_vhost library call so that we don't have two separate
> > implementations?
> >
> > Thanks,
> > Ciara
> >
> > > >
> > > > I have some other comments inline.
> > > >
> > > > Thanks,
> > > > Ciara
> > > “
> > >
> > >
> > >
> > > From: Gao Zhenyu <sysugaozhe...@gmail.com>
> > > Date: Wednesday, August 16, 2017 at 6:38 AM
> > > To: "Loftus, Ciara" <ciara.lof...@intel.com>
> > > Cc: "b...@ovn.org" <b...@ovn.org>, "Chandran, Sugesh"
> > > <sugesh.chand...@intel.com>, "ktray...@redhat.com"
> > > <ktray...@redhat.com>, Darrell Ball <db...@vmware.com>,
> > > "d...@openvswitch.org" <d...@openvswitch.org>
> > > Subject: Re: [ovs-dev] [PATCH v1] netdev-dpdk: Implement TCP/UDP TX
> > > cksum in ovs-dpdk side
> > >
> > > Hi Loftus,
> > >I had submitted a new version, please see
> > > https://patchwork.ozlabs.org/patch/802070/
> > >It moves the cksum to the vhost receive side.
> > > Thanks
> > > Zhenyu Gao
> > >
> > > 2017-08-10 12:35 GMT+08:00 Gao Zhenyu <sysugaozhe...@gmail.com>:
> > > I see; for flows in a phy-phy setup, we should not calculate the cksum.
> > > I will revise my patch to do the cksum for vhost port only. I will
> > > send
> > a new
> > > patch next week.
> > >

Re: [ovs-dev] [PATCH v1] netdev-dpdk: Implement TCP/UDP TX cksum in ovs-dpdk side

2017-08-23 Thread Gao Zhenyu
Yes, maintaining only one implementation is reasonable.
However, making ovs-dpdk support vhost tx-cksum first is doable as well.
We can have it in ovs and replace it with the new DPDK API once ovs updates to
a dpdk version that contains the tx-cksum implementation.


Thanks
Zhenyu Gao

2017-08-23 21:59 GMT+08:00 Loftus, Ciara <ciara.lof...@intel.com>:

> >
> > Hi Ciara
> >
> > You had a general concern below; can we conclude on that before going
> > further ?
> >
> > Thanks Darrell
> >
> > “
> > > On another note I have a general concern. I understand similar
> functionality
> > > is present in the DPDK vhost sample app. I wonder if it would be
> feasible
> > for
> > > this to be implemented in the DPDK vhost library and leveraged here,
> > rather
> > > than having two implementations in two separate code bases.
>
> This is something I'd like to see, although I wouldn't block on this patch
> waiting for it.
> Maybe we can have the initial implementation as it is (if it proves
> beneficial), then move to a common DPDK API if/when it becomes available.
>
> I've cc'ed DPDK users list hoping for some input. To summarise:
> From my understanding, the DPDK vhost sample application calculates TX
> checksum for packets received from vHost ports with invalid/0 checksums:
> http://dpdk.org/browse/dpdk/tree/examples/vhost/main.c#n910
> The patch being discussed in this thread (also here:
> https://patchwork.ozlabs.org/patch/802070/) seems to do something very
> similar.
> Wondering about the feasibility of putting this functionality in a rte_vhost
> library call so that we don't have two separate implementations?
>
> Thanks,
> Ciara
>
> > >
> > > I have some other comments inline.
> > >
> > > Thanks,
> > > Ciara
> > “
> >
> >
> >
> > From: Gao Zhenyu <sysugaozhe...@gmail.com>
> > Date: Wednesday, August 16, 2017 at 6:38 AM
> > To: "Loftus, Ciara" <ciara.lof...@intel.com>
> > Cc: "b...@ovn.org" <b...@ovn.org>, "Chandran, Sugesh"
> > <sugesh.chand...@intel.com>, "ktray...@redhat.com"
> > <ktray...@redhat.com>, Darrell Ball <db...@vmware.com>,
> > "d...@openvswitch.org" <d...@openvswitch.org>
> > Subject: Re: [ovs-dev] [PATCH v1] netdev-dpdk: Implement TCP/UDP TX
> > cksum in ovs-dpdk side
> >
> > Hi Loftus,
> >I had submitted a new version, please see
> > https://patchwork.ozlabs.org/patch/802070/
> >It moves the cksum to the vhost receive side.
> > Thanks
> > Zhenyu Gao
> >
> > 2017-08-10 12:35 GMT+08:00 Gao Zhenyu <sysugaozhe...@gmail.com>:
> > I see; for flows in a phy-phy setup, we should not calculate the cksum.
> > I will revise my patch to do the cksum for vhost port only. I will send
> a new
> > patch next week.
> >
> > Thanks
> > Zhenyu Gao
> >
> > 2017-08-08 17:53 GMT+08:00 Loftus, Ciara <ciara.lof...@intel.com>:
> > >
> > > Hi Loftus,
> > >
> > > Thanks for testing and the comments!
> > > Can you show more details about your phy-vm-phy,phy-phy setup and
> > > testing steps? Then I can reproduce it to see if I can solve this pps
> problem.
> >
> > You're welcome. I forgot to mention my tests were with 64B packets.
> >
> > For phy-phy the setup is a single host with 2 dpdk physical ports and 1
> flow
> > rule port1 -> port2.
> > See figure 3 here: https://tools.ietf.org/html/draft-ietf-bmwg-vswitch-
> > opnfv-04#section-4
> >
> > For the phy-vm-phy the setup is a single host with 2 dpdk physical ports
> and 2
> > vhostuser ports with flow rules:
> > Dpdk1 -> vhost 1 & vhost2 -> dpdk2
> > IP rules are set up in the VM to route packets from vhost1 to vhost 2.
> > See figure 4 in the link above.
> >
> > >
> > > BTW, how about throughput, did you see an improvement?
> >
> > By throughput if you mean 0% packet loss, I did not test this.
> >
> > Thanks,
> > Ciara
> >
> > >
> > > I would like to implement vhost->vhost part.
> > >
> > > Thanks
> > > Zhenyu Gao
> > >
> > > 2017-08-04 22:52 GMT+08:00 Loftus, Ciara <ciara.lof...@intel.com>:
> > > >
> > > > Currently, the dpdk-vhost side in ovs doesn't support tcp/udp tx cksum.
> > > > So L4 packets' cksums were calculated on the VM side, but performance
> > > > is not good.
> > > > Implementing t

Re: [ovs-dev] [PATCH v1] netdev-dpdk: Implement TCP/UDP TX cksum in ovs-dpdk side

2017-08-23 Thread Loftus, Ciara
> 
> Hi Ciara
> 
> You had a general concern below; can we conclude on that before going
> further ?
> 
> Thanks Darrell
> 
> “
> > On another note I have a general concern. I understand similar functionality
> > is present in the DPDK vhost sample app. I wonder if it would be feasible
> for
> > this to be implemented in the DPDK vhost library and leveraged here,
> rather
> > than having two implementations in two separate code bases.

This is something I'd like to see, although I wouldn't block on this patch 
waiting for it.
Maybe we can have the initial implementation as it is (if it proves 
beneficial), then move to a common DPDK API if/when it becomes available.

I've cc'ed DPDK users list hoping for some input. To summarise:
From my understanding, the DPDK vhost sample application calculates TX checksum 
for packets received from vHost ports with invalid/0 checksums:
http://dpdk.org/browse/dpdk/tree/examples/vhost/main.c#n910
The patch being discussed in this thread (also here:
https://patchwork.ozlabs.org/patch/802070/) seems to do something very
similar.
Wondering about the feasibility of putting this functionality in a rte_vhost
library call so that we don't have two separate implementations?
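
Purely to illustrate the shape such a hook could take (this API does not exist
in DPDK today; the name and signature below are invented for discussion):

    #include <stdint.h>
    #include <rte_mbuf.h>

    /* HYPOTHETICAL -- not a real rte_vhost API.  Fix up the L4 cksum of any
     * packet in 'pkts' that still carries a cksum-offload request from the
     * guest, and return the number of packets that were touched. */
    uint16_t rte_vhost_tx_cksum_prepare(struct rte_mbuf **pkts, uint16_t count);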

Thanks,
Ciara

> >
> > I have some other comments inline.
> >
> > Thanks,
> > Ciara
> “
> 
> 
> 
> From: Gao Zhenyu <sysugaozhe...@gmail.com>
> Date: Wednesday, August 16, 2017 at 6:38 AM
> To: "Loftus, Ciara" <ciara.lof...@intel.com>
> Cc: "b...@ovn.org" <b...@ovn.org>, "Chandran, Sugesh"
> <sugesh.chand...@intel.com>, "ktray...@redhat.com"
> <ktray...@redhat.com>, Darrell Ball <db...@vmware.com>,
> "d...@openvswitch.org" <d...@openvswitch.org>
> Subject: Re: [ovs-dev] [PATCH v1] netdev-dpdk: Implement TCP/UDP TX
> cksum in ovs-dpdk side
> 
> Hi Loftus,
>    I had submitted a new version, please see
> https://patchwork.ozlabs.org/patch/802070/
>    It moves the cksum to the vhost receive side.
> Thanks
> Zhenyu Gao
> 
> 2017-08-10 12:35 GMT+08:00 Gao Zhenyu <sysugaozhe...@gmail.com>:
> I see; for flows in a phy-phy setup, we should not calculate the cksum.
> I will revise my patch to do the cksum for vhost port only. I will send a new
> patch next week.
> 
> Thanks
> Zhenyu Gao
> 
> 2017-08-08 17:53 GMT+08:00 Loftus, Ciara <ciara.lof...@intel.com>:
> >
> > Hi Loftus,
> >
> > Thanks for testing and the comments!
> > Can you show more details about your phy-vm-phy,phy-phy setup and
> > testing steps? Then I can reproduce it to see if I can solve this pps 
> > problem.
> 
> You're welcome. I forgot to mention my tests were with 64B packets.
> 
> For phy-phy the setup is a single host with 2 dpdk physical ports and 1 flow
> rule port1 -> port2.
> See figure 3 here: https://tools.ietf.org/html/draft-ietf-bmwg-vswitch-
> opnfv-04#section-4
> 
> For the phy-vm-phy the setup is a single host with 2 dpdk physical ports and 2
> vhostuser ports with flow rules:
> Dpdk1 -> vhost 1 & vhost2 -> dpdk2
> IP rules are set up in the VM to route packets from vhost1 to vhost 2.
> See figure 4 in the link above.
> 
> >
> > BTW, how about throughput, did you see an improvement?
> 
> By throughput if you mean 0% packet loss, I did not test this.
> 
> Thanks,
> Ciara
> 
> >
> > I would like to implement vhost->vhost part.
> >
> > Thanks
> > Zhenyu Gao
> >
> > 2017-08-04 22:52 GMT+08:00 Loftus, Ciara <ciara.lof...@intel.com>:
> > >
> > > Currently, the dpdk-vhost side in ovs doesn't support tcp/udp tx cksum.
> > > So L4 packets' cksums were calculated on the VM side, but performance is
> > > not good.
> > > Implementing tcp/udp tx cksum on the ovs-dpdk side improves throughput
> > > and makes the virtio-net frontend driver support NETIF_F_SG as well.
> > >
> > > Signed-off-by: Zhenyu Gao <sysugaozhe...@gmail.com>
> > > ---
> > >
> > > Here is some performance number:
> > >
> > > Setup:
> > >
> > >  qperf client
> > > +--------------+
> > > |      VM      |
> > > +--------------+
> > >        |
> > >        |                      qperf server
> > > +--------------+           +------------+
> > > | vswitch+dpdk |           | bare-metal |
> > > +--------------+           +------------+
> > >        |                         |
> > >        |                         |
> > >       pNic --------- PhysicalSwitch
> > >
> > > do cksum in ovs-dpdk: Ap

Re: [ovs-dev] [PATCH v1] netdev-dpdk: Implement TCP/UDP TX cksum in ovs-dpdk side

2017-08-16 Thread Darrell Ball
Hi Ciara

You had a general concern below; can we conclude on that before going further ?

Thanks Darrell

“
> On another note I have a general concern. I understand similar functionality
> is present in the DPDK vhost sample app. I wonder if it would be feasible for
> this to be implemented in the DPDK vhost library and leveraged here, rather
> than having two implementations in two separate code bases.
>
> I have some other comments inline.
>
> Thanks,
> Ciara
“



From: Gao Zhenyu <sysugaozhe...@gmail.com>
Date: Wednesday, August 16, 2017 at 6:38 AM
To: "Loftus, Ciara" <ciara.lof...@intel.com>
Cc: "b...@ovn.org" <b...@ovn.org>, "Chandran, Sugesh" 
<sugesh.chand...@intel.com>, "ktray...@redhat.com" <ktray...@redhat.com>, 
Darrell Ball <db...@vmware.com>, "d...@openvswitch.org" <d...@openvswitch.org>
Subject: Re: [ovs-dev] [PATCH v1] netdev-dpdk: Implement TCP/UDP TX cksum in 
ovs-dpdk side

Hi Loftus,
   I had submitted a new version, please see 
https://patchwork.ozlabs.org/patch/802070/
   It moves the cksum to the vhost receive side.
Thanks
Zhenyu Gao

2017-08-10 12:35 GMT+08:00 Gao Zhenyu <sysugaozhe...@gmail.com>:
I see; for flows in a phy-phy setup, we should not calculate the cksum.
I will revise my patch to do the cksum for vhost port only. I will send a new 
patch next week.

Thanks
Zhenyu Gao

2017-08-08 17:53 GMT+08:00 Loftus, Ciara <ciara.lof...@intel.com>:
>
> Hi Loftus,
>
> Thanks for testing and the comments!
> Can you show more details about your phy-vm-phy,phy-phy setup and
> testing steps? Then I can reproduce it to see if I can solve this pps problem.

You're welcome. I forgot to mention my tests were with 64B packets.

For phy-phy the setup is a single host with 2 dpdk physical ports and 1 flow 
rule port1 -> port2.
See figure 3 here: 
https://tools.ietf.org/html/draft-ietf-bmwg-vswitch-opnfv-04#section-4

For the phy-vm-phy the setup is a single host with 2 dpdk physical ports and 2 
vhostuser ports with flow rules:
Dpdk1 -> vhost 1 & vhost2 -> dpdk2
IP rules are set up in the VM to route packets from vhost1 to vhost 2.
See figure 4 in the link above.

>
> > BTW, how about throughput, did you see an improvement?

By throughput if you mean 0% packet loss, I did not test this.

Thanks,
Ciara

>
> I would like to implement vhost->vhost part.
>
> Thanks
> Zhenyu Gao
>
> 2017-08-04 22:52 GMT+08:00 Loftus, Ciara <ciara.lof...@intel.com>:
> >
> > Currently, the dpdk-vhost side in ovs doesn't support tcp/udp tx cksum.
> > So L4 packets' cksums were calculated on the VM side, but performance is
> > not good.
> > Implementing tcp/udp tx cksum on the ovs-dpdk side improves throughput and
> > makes the virtio-net frontend driver support NETIF_F_SG as well.
> >
> > Signed-off-by: Zhenyu Gao <sysugaozhe...@gmail.com>
> > ---
> >
> > Here is some performance number:
> >
> > Setup:
> >
>> > >  qperf client
>> > > +--------------+
>> > > |      VM      |
>> > > +--------------+
>> > >        |
>> > >        |                      qperf server
>> > > +--------------+           +------------+
>> > > | vswitch+dpdk |           | bare-metal |
>> > > +--------------+           +------------+
>> > >        |                         |
>> > >        |                         |
>> > >       pNic --------- PhysicalSwitch
> >
> > do cksum in ovs-dpdk: Applied this patch and executed 'ethtool -K eth0 tx
> > on' on the VM side.
> >   This offloads the cksum job to the ovs-dpdk side.
> >
> > do cksum in VM: Applied this patch and executed 'ethtool -K eth0 tx off' on
> > the VM side.
> >   The VM calculates cksums for tcp/udp packets.
> >
> > We can see a huge improvement in TCP throughput if we leverage the ovs-dpdk
> > cksum.
> Hi Zhenyu,
>
> Thanks for the patch. I tested some alternative use cases and unfortunately I
> see a degradation for phy-phy and phy-vm-phy topologies.
> Here are my results:
>
> phy-vm-phy:
> without patch: 0.871Mpps

Re: [ovs-dev] [PATCH v1] netdev-dpdk: Implement TCP/UDP TX cksum in ovs-dpdk side

2017-08-16 Thread Gao Zhenyu
Hi Loftus,

   I had submitted a new version, please see
https://patchwork.ozlabs.org/patch/802070/
   It moves the cksum to the vhost receive side.

Thanks
Zhenyu Gao

2017-08-10 12:35 GMT+08:00 Gao Zhenyu :

> I see; for flows in a phy-phy setup, we should not calculate the cksum.
> I will revise my patch to do the cksum for vhost port only. I will send a
> new patch next week.
>
> Thanks
> Zhenyu Gao
>
> 2017-08-08 17:53 GMT+08:00 Loftus, Ciara :
>
>> >
>> > Hi Loftus,
>> >
>> > Thanks for testing and the comments!
>> > Can you show more details about your phy-vm-phy,phy-phy setup and
>> > testing steps? Then I can reproduce it to see if I can solve this pps
>> problem.
>>
>> You're welcome. I forgot to mention my tests were with 64B packets.
>>
>> For phy-phy the setup is a single host with 2 dpdk physical ports and 1
>> flow rule port1 -> port2.
>> See figure 3 here:
>> https://tools.ietf.org/html/draft-ietf-bmwg-vswitch-opnfv-04#section-4
>>
>> For the phy-vm-phy the setup is a single host with 2 dpdk physical ports
>> and 2 vhostuser ports with flow rules:
>> Dpdk1 -> vhost 1 & vhost2 -> dpdk2
>> IP rules are set up in the VM to route packets from vhost1 to vhost 2.
>> See figure 4 in the link above.
>>
>> >
>> > BTW, how about throughput, did you see an improvement?
>>
>> By throughput if you mean 0% packet loss, I did not test this.
>>
>> Thanks,
>> Ciara
>>
>> >
>> > I would like to implement vhost->vhost part.
>> >
>> > Thanks
>> > Zhenyu Gao
>> >
>> > 2017-08-04 22:52 GMT+08:00 Loftus, Ciara :
>> > >
>> > > Currently, the dpdk-vhost side in ovs doesn't support tcp/udp tx cksum.
>> > > So L4 packets' cksums were calculated on the VM side, but performance
>> > > is not good.
>> > > Implementing tcp/udp tx cksum on the ovs-dpdk side improves throughput
>> > > and makes the virtio-net frontend driver support NETIF_F_SG as well.
>> > >
>> > > Signed-off-by: Zhenyu Gao 
>> > > ---
>> > >
>> > > Here is some performance number:
>> > >
>> > > Setup:
>> > >
>> > >  qperf client
>> > > +--------------+
>> > > |      VM      |
>> > > +--------------+
>> > >        |
>> > >        |                      qperf server
>> > > +--------------+           +------------+
>> > > | vswitch+dpdk |           | bare-metal |
>> > > +--------------+           +------------+
>> > >        |                         |
>> > >        |                         |
>> > >       pNic --------- PhysicalSwitch
>> > >
>> > > do cksum in ovs-dpdk: Applied this patch and executed 'ethtool -K eth0
>> > > tx on' on the VM side.
>> > >   This offloads the cksum job to the ovs-dpdk side.
>> > >
>> > > do cksum in VM: Applied this patch and executed 'ethtool -K eth0 tx off'
>> > > on the VM side.
>> > >   The VM calculates cksums for tcp/udp packets.
>> > >
>> > > We can see a huge improvement in TCP throughput if we leverage the
>> > > ovs-dpdk cksum.
>> > Hi Zhenyu,
>> >
>> > Thanks for the patch. I tested some alternative use cases and
>> unfortunately I
>> > see a degradation for phy-phy and phy-vm-phy topologies.
>> > Here are my results:
>> >
>> > phy-vm-phy:
>> > without patch: 0.871Mpps
>> > with patch (offload=on): 0.877Mpps
>> > with patch (offload=off): 0.891Mpps
>> >
>> > phy-phy:
>> > without patch: 13.581Mpps
>> > with patch: 13.055Mpps
>> >
>> > The half a million pps drop for the second test case is concerning to
>> me but
>> > not surprising since we're adding extra complexity to netdev_dpdk_send()
>> > Could this be avoided? Would it make sense to put this functionality
>> > somewhere else eg. vhost receive?
>> >
>> > On another note I have a general concern. I understand similar
>> functionality
>> > is present in the DPDK vhost sample app. I wonder if it would be
>> feasible for
>> > this to be implemented in the DPDK vhost library and leveraged here,
>> rather
>> > than having two implementations in two separate code bases.
>> >
>> > I have some other comments inline.
>> >
>> > Thanks,
>> > Ciara
>> >
>> > >
>> > > [root@localhost ~]# qperf -t 10 -oo msg_size:1:64K:*2  host-qperf-server01
>> > > tcp_bw tcp_lat udp_bw udp_lat
>> > >   do cksum in ovs-dpdk          do cksum in VM            without this patch
>> > > tcp_bw:
>> > >     bw  =  2.05 MB/sec        bw  =  1.92 MB/sec        bw  =  1.95 MB/sec
>> > > tcp_bw:
>> > >     bw  =  3.9 MB/sec         bw  =  3.99 MB/sec        bw  =  3.98 MB/sec
>> > > tcp_bw:
>> > >     bw  =  8.09 MB/sec        bw  =  7.82 MB/sec        bw  =  8.19 MB/sec
>> > > tcp_bw:
>> > >     bw  =  14.9 MB/sec        bw  =  14.8 MB/sec        bw  =  15.7 MB/sec
>> > > tcp_bw:
>> > >     bw  =  27.7 MB/sec        bw  =  28 MB/sec          bw  =  29.7 MB/sec
>> > > tcp_bw:
>> > >     bw  =  51.2 MB/sec        bw  =  50.9 MB/sec        bw  =  54.9 MB/sec
>> > > tcp_bw:
>> > >     bw  =  86.7 MB/sec        bw  =  86.8 MB/sec        bw  =  95.1 MB/sec
>> > > tcp_bw:
>> > >

Re: [ovs-dev] [PATCH v1] netdev-dpdk: Implement TCP/UDP TX cksum in ovs-dpdk side

2017-08-09 Thread Gao Zhenyu
I see; for flows in a phy-phy setup, we should not calculate the cksum.
I will revise my patch to do the cksum for vhost port only. I will send a
new patch next week.

Thanks
Zhenyu Gao

2017-08-08 17:53 GMT+08:00 Loftus, Ciara :

> >
> > Hi Loftus,
> >
> > Thanks for testing and the comments!
> > Can you show more details about your phy-vm-phy,phy-phy setup and
> > testing steps? Then I can reproduce it to see if I can solve this pps
> problem.
>
> You're welcome. I forgot to mention my tests were with 64B packets.
>
> For phy-phy the setup is a single host with 2 dpdk physical ports and 1
> flow rule port1 -> port2.
> See figure 3 here: https://tools.ietf.org/html/
> draft-ietf-bmwg-vswitch-opnfv-04#section-4
>
> For the phy-vm-phy the setup is a single host with 2 dpdk physical ports
> and 2 vhostuser ports with flow rules:
> Dpdk1 -> vhost 1 & vhost2 -> dpdk2
> IP rules are set up in the VM to route packets from vhost1 to vhost 2.
> See figure 4 in the link above.
>
> >
> > BTW, how about throughput, did you see an improvement?
>
> By throughput if you mean 0% packet loss, I did not test this.
>
> Thanks,
> Ciara
>
> >
> > I would like to implement vhost->vhost part.
> >
> > Thanks
> > Zhenyu Gao
> >
> > 2017-08-04 22:52 GMT+08:00 Loftus, Ciara :
> > >
> > > Currently, the dpdk-vhost side in ovs doesn't support tcp/udp tx cksum.
> > > So L4 packets' cksums were calculated on the VM side, but performance is
> > > not good.
> > > Implementing tcp/udp tx cksum on the ovs-dpdk side improves throughput
> > > and makes the virtio-net frontend driver support NETIF_F_SG as well.
> > >
> > > Signed-off-by: Zhenyu Gao 
> > > ---
> > >
> > > Here is some performance number:
> > >
> > > Setup:
> > >
> > >  qperf client
> > > +--------------+
> > > |      VM      |
> > > +--------------+
> > >        |
> > >        |                      qperf server
> > > +--------------+           +------------+
> > > | vswitch+dpdk |           | bare-metal |
> > > +--------------+           +------------+
> > >        |                         |
> > >        |                         |
> > >       pNic --------- PhysicalSwitch
> > >
> > > do cksum in ovs-dpdk: Applied this patch and executed 'ethtool -K eth0 tx
> > > on' on the VM side.
> > >   This offloads the cksum job to the ovs-dpdk side.
> > >
> > > do cksum in VM: Applied this patch and executed 'ethtool -K eth0 tx off'
> > > on the VM side.
> > >   The VM calculates cksums for tcp/udp packets.
> > >
> > > We can see a huge improvement in TCP throughput if we leverage the
> > > ovs-dpdk cksum.
> > Hi Zhenyu,
> >
> > Thanks for the patch. I tested some alternative use cases and
> unfortunately I
> > see a degradation for phy-phy and phy-vm-phy topologies.
> > Here are my results:
> >
> > phy-vm-phy:
> > without patch: 0.871Mpps
> > with patch (offload=on): 0.877Mpps
> > with patch (offload=off): 0.891Mpps
> >
> > phy-phy:
> > without patch: 13.581Mpps
> > with patch: 13.055Mpps
> >
> > The half a million pps drop for the second test case is concerning to me
> but
> > not surprising since we're adding extra complexity to netdev_dpdk_send()
> > Could this be avoided? Would it make sense to put this functionality
> > somewhere else eg. vhost receive?
> >
> > On another note I have a general concern. I understand similar
> functionality
> > is present in the DPDK vhost sample app. I wonder if it would be
> feasible for
> > this to be implemented in the DPDK vhost library and leveraged here,
> rather
> > than having two implementations in two separate code bases.
> >
> > I have some other comments inline.
> >
> > Thanks,
> > Ciara
> >
> > >
> > > [root@localhost ~]# qperf -t 10 -oo msg_size:1:64K:*2  host-qperf-server01
> > > tcp_bw tcp_lat udp_bw udp_lat
> > >   do cksum in ovs-dpdk          do cksum in VM            without this patch
> > > tcp_bw:
> > >     bw  =  2.05 MB/sec        bw  =  1.92 MB/sec        bw  =  1.95 MB/sec
> > > tcp_bw:
> > >     bw  =  3.9 MB/sec         bw  =  3.99 MB/sec        bw  =  3.98 MB/sec
> > > tcp_bw:
> > >     bw  =  8.09 MB/sec        bw  =  7.82 MB/sec        bw  =  8.19 MB/sec
> > > tcp_bw:
> > >     bw  =  14.9 MB/sec        bw  =  14.8 MB/sec        bw  =  15.7 MB/sec
> > > tcp_bw:
> > >     bw  =  27.7 MB/sec        bw  =  28 MB/sec          bw  =  29.7 MB/sec
> > > tcp_bw:
> > >     bw  =  51.2 MB/sec        bw  =  50.9 MB/sec        bw  =  54.9 MB/sec
> > > tcp_bw:
> > >     bw  =  86.7 MB/sec        bw  =  86.8 MB/sec        bw  =  95.1 MB/sec
> > > tcp_bw:
> > >     bw  =  149 MB/sec         bw  =  160 MB/sec         bw  =  149 MB/sec
> > > tcp_bw:
> > >     bw  =  211 MB/sec         bw  =  205 MB/sec         bw  =  216 MB/sec
> > > tcp_bw:
> > >     bw  =  271 MB/sec         bw  =  254 MB/sec         bw  =  275 MB/sec
> > > tcp_bw:
> > >     bw  =  326 MB/sec         bw  =  303 MB/sec         bw  =  321 MB/sec
> > > tcp_bw:
> > >     bw

Re: [ovs-dev] [PATCH v1] netdev-dpdk: Implement TCP/UDP TX cksum in ovs-dpdk side

2017-08-08 Thread Loftus, Ciara
> 
> Hi Loftus,
> 
> Thanks for testing and the comments!
> Can you show more details about your phy-vm-phy,phy-phy setup and
> testing steps? Then I can reproduce it to see if I can solve this pps problem.

You're welcome. I forgot to mention my tests were with 64B packets.

For phy-phy the setup is a single host with 2 dpdk physical ports and 1 flow 
rule port1 -> port2.
See figure 3 here: 
https://tools.ietf.org/html/draft-ietf-bmwg-vswitch-opnfv-04#section-4

For the phy-vm-phy the setup is a single host with 2 dpdk physical ports and 2 
vhostuser ports with flow rules:
Dpdk1 -> vhost 1 & vhost2 -> dpdk2
IP rules are set up in the VM to route packets from vhost1 to vhost 2.
See figure 4 in the link above.

> 
> BTW, how about throughput, did you see an improvement?

By throughput if you mean 0% packet loss, I did not test this.

Thanks,
Ciara

> 
> I would like to implement vhost->vhost part.
> 
> Thanks
> Zhenyu Gao
> 
> 2017-08-04 22:52 GMT+08:00 Loftus, Ciara :
> >
> > Currently, the dpdk-vhost side in ovs doesn't support tcp/udp tx cksum.
> > So L4 packets' cksums were calculated on the VM side, but performance is
> > not good.
> > Implementing tcp/udp tx cksum on the ovs-dpdk side improves throughput and
> > makes the virtio-net frontend driver support NETIF_F_SG as well.
> >
> > Signed-off-by: Zhenyu Gao 
> > ---
> >
> > Here is some performance number:
> >
> > Setup:
> >
> >  qperf client
> > +--------------+
> > |      VM      |
> > +--------------+
> >        |
> >        |                      qperf server
> > +--------------+           +------------+
> > | vswitch+dpdk |           | bare-metal |
> > +--------------+           +------------+
> >        |                         |
> >        |                         |
> >       pNic --------- PhysicalSwitch
> >
> > do cksum in ovs-dpdk: Applied this patch and executed 'ethtool -K eth0 tx
> > on' on the VM side.
> >   This offloads the cksum job to the ovs-dpdk side.
> >
> > do cksum in VM: Applied this patch and executed 'ethtool -K eth0 tx off' on
> > the VM side.
> >   The VM calculates cksums for tcp/udp packets.
> >
> > We can see a huge improvement in TCP throughput if we leverage the ovs-dpdk
> > cksum.
> Hi Zhenyu,
> 
> Thanks for the patch. I tested some alternative use cases and unfortunately I
> see a degradation for phy-phy and phy-vm-phy topologies.
> Here are my results:
> 
> phy-vm-phy:
> without patch: 0.871Mpps
> with patch (offload=on): 0.877Mpps
> with patch (offload=off): 0.891Mpps
> 
> phy-phy:
> without patch: 13.581Mpps
> with patch: 13.055Mpps
> 
> The half a million pps drop for the second test case is concerning to me but
> not surprising since we're adding extra complexity to netdev_dpdk_send()
> Could this be avoided? Would it make sense to put this functionality
> somewhere else eg. vhost receive?
> 
> On another note I have a general concern. I understand similar functionality
> is present in the DPDK vhost sample app. I wonder if it would be feasible for
> this to be implemented in the DPDK vhost library and leveraged here, rather
> than having two implementations in two separate code bases.
> 
> I have some other comments inline.
> 
> Thanks,
> Ciara
> 
> >
> > [root@localhost ~]# qperf -t 10 -oo msg_size:1:64K:*2  host-qperf-server01
> > tcp_bw tcp_lat udp_bw udp_lat
> >   do cksum in ovs-dpdk          do cksum in VM             without this patch
> > tcp_bw:
> >     bw  =  2.05 MB/sec        bw  =  1.92 MB/sec        bw  =  1.95 MB/sec
> > tcp_bw:
> >     bw  =  3.9 MB/sec         bw  =  3.99 MB/sec        bw  =  3.98 MB/sec
> > tcp_bw:
> >     bw  =  8.09 MB/sec        bw  =  7.82 MB/sec        bw  =  8.19 MB/sec
> > tcp_bw:
> >     bw  =  14.9 MB/sec        bw  =  14.8 MB/sec        bw  =  15.7 MB/sec
> > tcp_bw:
> >     bw  =  27.7 MB/sec        bw  =  28 MB/sec          bw  =  29.7 MB/sec
> > tcp_bw:
> >     bw  =  51.2 MB/sec        bw  =  50.9 MB/sec        bw  =  54.9 MB/sec
> > tcp_bw:
> >     bw  =  86.7 MB/sec        bw  =  86.8 MB/sec        bw  =  95.1 MB/sec
> > tcp_bw:
> >     bw  =  149 MB/sec         bw  =  160 MB/sec         bw  =  149 MB/sec
> > tcp_bw:
> >     bw  =  211 MB/sec         bw  =  205 MB/sec         bw  =  216 MB/sec
> > tcp_bw:
> >     bw  =  271 MB/sec         bw  =  254 MB/sec         bw  =  275 MB/sec
> > tcp_bw:
> >     bw  =  326 MB/sec         bw  =  303 MB/sec         bw  =  321 MB/sec
> > tcp_bw:
> >     bw  =  407 MB/sec         bw  =  359 MB/sec         bw  =  361 MB/sec
> > tcp_bw:
> >     bw  =  816 MB/sec         bw  =  512 MB/sec         bw  =  419 MB/sec
> > tcp_bw:
> >     bw  =  840 MB/sec         bw  =  756 MB/sec         bw  =  457 MB/sec
> > tcp_bw:
> >     bw  =  1.07 GB/sec        bw  =  880 MB/sec         bw  =  480 MB/sec
> > tcp_bw:
> >     bw  =  1.17 GB/sec        bw  =  1.01 GB/sec        bw  =  488 MB/sec
> > tcp_bw:
> >     bw  =  1.17 GB/sec        bw  =  1.11 GB/sec        bw  =  483 MB/sec
> > tcp_lat:
> 

Re: [ovs-dev] [PATCH v1] netdev-dpdk: Implement TCP/UDP TX cksum in ovs-dpdk side

2017-08-04 Thread Loftus, Ciara
> 
> Currently, the dpdk-vhost side in ovs doesn't support tcp/udp tx cksum.
> So L4 packets' cksums were calculated on the VM side, but performance is
> not good.
> Implementing tcp/udp tx cksum on the ovs-dpdk side improves throughput and
> makes the virtio-net frontend driver support NETIF_F_SG as well.
> 
> Signed-off-by: Zhenyu Gao 
> ---
> 
> Here is some performance number:
> 
> Setup:
> 
>  qperf client
> +--------------+
> |      VM      |
> +--------------+
>        |
>        |                      qperf server
> +--------------+           +------------+
> | vswitch+dpdk |           | bare-metal |
> +--------------+           +------------+
>        |                         |
>        |                         |
>       pNic --------- PhysicalSwitch
> 
> do cksum in ovs-dpdk: Applied this patch and executed 'ethtool -K eth0 tx on'
> on the VM side.
>   This offloads the cksum job to the ovs-dpdk side.
> 
> do cksum in VM: Applied this patch and executed 'ethtool -K eth0 tx off' on
> the VM side.
>   The VM calculates cksums for tcp/udp packets.
> 
> We can see a huge improvement in TCP throughput if we leverage the ovs-dpdk
> cksum.

Hi Zhenyu,

Thanks for the patch. I tested some alternative use cases and unfortunately I 
see a degradation for phy-phy and phy-vm-phy topologies.
Here are my results:

phy-vm-phy:
without patch: 0.871Mpps
with patch (offload=on): 0.877Mpps
with patch (offload=off): 0.891Mpps

phy-phy:
without patch: 13.581Mpps
with patch: 13.055Mpps

The half a million pps drop for the second test case is concerning to me, but
not surprising since we're adding extra complexity to netdev_dpdk_send().
Could this be avoided? Would it make sense to put this functionality somewhere
else, e.g. vhost receive?

On another note I have a general concern. I understand similar functionality is 
present in the DPDK vhost sample app. I wonder if it would be feasible for this 
to be implemented in the DPDK vhost library and leveraged here, rather than 
having two implementations in two separate code bases.

I have some other comments inline.

Thanks,
Ciara

> 
> [root@localhost ~]# qperf -t 10 -oo msg_size:1:64K:*2  host-qperf-server01
> tcp_bw tcp_lat udp_bw udp_lat
>   do cksum in ovs-dpdk          do cksum in VM            without this patch
> tcp_bw:
>     bw  =  2.05 MB/sec        bw  =  1.92 MB/sec        bw  =  1.95 MB/sec
> tcp_bw:
>     bw  =  3.9 MB/sec         bw  =  3.99 MB/sec        bw  =  3.98 MB/sec
> tcp_bw:
>     bw  =  8.09 MB/sec        bw  =  7.82 MB/sec        bw  =  8.19 MB/sec
> tcp_bw:
>     bw  =  14.9 MB/sec        bw  =  14.8 MB/sec        bw  =  15.7 MB/sec
> tcp_bw:
>     bw  =  27.7 MB/sec        bw  =  28 MB/sec          bw  =  29.7 MB/sec
> tcp_bw:
>     bw  =  51.2 MB/sec        bw  =  50.9 MB/sec        bw  =  54.9 MB/sec
> tcp_bw:
>     bw  =  86.7 MB/sec        bw  =  86.8 MB/sec        bw  =  95.1 MB/sec
> tcp_bw:
>     bw  =  149 MB/sec         bw  =  160 MB/sec         bw  =  149 MB/sec
> tcp_bw:
>     bw  =  211 MB/sec         bw  =  205 MB/sec         bw  =  216 MB/sec
> tcp_bw:
>     bw  =  271 MB/sec         bw  =  254 MB/sec         bw  =  275 MB/sec
> tcp_bw:
>     bw  =  326 MB/sec         bw  =  303 MB/sec         bw  =  321 MB/sec
> tcp_bw:
>     bw  =  407 MB/sec         bw  =  359 MB/sec         bw  =  361 MB/sec
> tcp_bw:
>     bw  =  816 MB/sec         bw  =  512 MB/sec         bw  =  419 MB/sec
> tcp_bw:
>     bw  =  840 MB/sec         bw  =  756 MB/sec         bw  =  457 MB/sec
> tcp_bw:
>     bw  =  1.07 GB/sec        bw  =  880 MB/sec         bw  =  480 MB/sec
> tcp_bw:
>     bw  =  1.17 GB/sec        bw  =  1.01 GB/sec        bw  =  488 MB/sec
> tcp_bw:
>     bw  =  1.17 GB/sec        bw  =  1.11 GB/sec        bw  =  483 MB/sec
> tcp_lat:
> latency  =  29 us latency  =  29.2 us   latency  =  29.6 us
> tcp_lat:
> latency  =  28.9 us   latency  =  29.3 us   latency  =  29.5 us
> tcp_lat:
> latency  =  29 us latency  =  29.3 us   latency  =  29.6 us
> tcp_lat:
> latency  =  29 us latency  =  29.4 us   latency  =  29.5 us
> tcp_lat:
> latency  =  29 us latency  =  29.2 us   latency  =  29.6 us
> tcp_lat:
> latency  =  29.1 us   latency  =  29.3 us   latency  =  29.7 us
> tcp_lat:
> latency  =  29.4 us   latency  =  29.6 us   latency  =  30 us
> tcp_lat:
> latency  =  29.8 us   latency  =  30.1 us   latency  =  30.2 us
> tcp_lat:
> latency  =  30.9 us   latency  =  30.9 us   latency  =  31 us
> tcp_lat:
> latency  =  46.9 us   latency  =  46.2 us   latency  =  32.2 us
> tcp_lat:
> latency  =  51.5 us   latency  =  52.6 us   latency  =  34.5 us
> tcp_lat:
> latency  =  43.9 us   latency  =  43.8 us   latency  =  43.6 us
> tcp_lat:
>  latency  =  47.6 us  latency  =  48 us latency  =  48.1 us
> tcp_lat:
> latency  =  77.7 us   latency  =  78.8 us  

[ovs-dev] [PATCH v1] netdev-dpdk: Implement TCP/UDP TX cksum in ovs-dpdk side

2017-08-01 Thread Zhenyu Gao
Currently, the dpdk-vhost side in ovs doesn't support tcp/udp tx cksum.
So L4 packets' cksums were calculated on the VM side, but performance is not
good.
Implementing tcp/udp tx cksum on the ovs-dpdk side improves throughput and
makes the virtio-net frontend driver support NETIF_F_SG as well.

Signed-off-by: Zhenyu Gao 
---
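
As background for reviewers: whichever side finishes the cksum (OVS in
software, or a NIC via DEV_TX_OFFLOAD_*), the sender has to describe the
packet with the usual mbuf TX-offload metadata, roughly as follows (a sketch
against the DPDK 17.x API, IPv4 case only; not code from this patch):

    #include <stdbool.h>
    #include <rte_mbuf.h>

    /* Record header layout and the requested cksum type on an mbuf. */
    static void
    set_tx_cksum_metadata(struct rte_mbuf *m, uint8_t l2_len, uint8_t l3_len,
                          bool is_tcp)
    {
        m->l2_len = l2_len;    /* e.g. 14 for plain Ethernet       */
        m->l3_len = l3_len;    /* e.g. 20 for IPv4 without options */
        m->ol_flags |= PKT_TX_IPV4;
        m->ol_flags |= is_tcp ? PKT_TX_TCP_CKSUM : PKT_TX_UDP_CKSUM;
    }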

Here are some performance numbers:

Setup:

 qperf client
+--------------+
|      VM      |
+--------------+
       |
       |                      qperf server
+--------------+           +------------+
| vswitch+dpdk |           | bare-metal |
+--------------+           +------------+
       |                         |
       |                         |
      pNic --------- PhysicalSwitch

do cksum in ovs-dpdk: Applied this patch and executed 'ethtool -K eth0 tx on' on
the VM side.
  This offloads the cksum job to the ovs-dpdk side.

do cksum in VM: Applied this patch and executed 'ethtool -K eth0 tx off' on the
VM side.
  The VM calculates cksums for tcp/udp packets.

We can see a huge improvement in TCP throughput if we leverage the ovs-dpdk cksum.

[root@localhost ~]# qperf -t 10 -oo msg_size:1:64K:*2  host-qperf-server01 
tcp_bw tcp_lat udp_bw udp_lat
  do cksum in ovs-dpdk          do cksum in VM            without this patch
tcp_bw:
    bw  =  2.05 MB/sec        bw  =  1.92 MB/sec        bw  =  1.95 MB/sec
tcp_bw:
    bw  =  3.9 MB/sec         bw  =  3.99 MB/sec        bw  =  3.98 MB/sec
tcp_bw:
    bw  =  8.09 MB/sec        bw  =  7.82 MB/sec        bw  =  8.19 MB/sec
tcp_bw:
    bw  =  14.9 MB/sec        bw  =  14.8 MB/sec        bw  =  15.7 MB/sec
tcp_bw:
    bw  =  27.7 MB/sec        bw  =  28 MB/sec          bw  =  29.7 MB/sec
tcp_bw:
    bw  =  51.2 MB/sec        bw  =  50.9 MB/sec        bw  =  54.9 MB/sec
tcp_bw:
    bw  =  86.7 MB/sec        bw  =  86.8 MB/sec        bw  =  95.1 MB/sec
tcp_bw:
    bw  =  149 MB/sec         bw  =  160 MB/sec         bw  =  149 MB/sec
tcp_bw:
    bw  =  211 MB/sec         bw  =  205 MB/sec         bw  =  216 MB/sec
tcp_bw:
    bw  =  271 MB/sec         bw  =  254 MB/sec         bw  =  275 MB/sec
tcp_bw:
    bw  =  326 MB/sec         bw  =  303 MB/sec         bw  =  321 MB/sec
tcp_bw:
    bw  =  407 MB/sec         bw  =  359 MB/sec         bw  =  361 MB/sec
tcp_bw:
    bw  =  816 MB/sec         bw  =  512 MB/sec         bw  =  419 MB/sec
tcp_bw:
    bw  =  840 MB/sec         bw  =  756 MB/sec         bw  =  457 MB/sec
tcp_bw:
    bw  =  1.07 GB/sec        bw  =  880 MB/sec         bw  =  480 MB/sec
tcp_bw:
    bw  =  1.17 GB/sec        bw  =  1.01 GB/sec        bw  =  488 MB/sec
tcp_bw:
    bw  =  1.17 GB/sec        bw  =  1.11 GB/sec        bw  =  483 MB/sec
tcp_lat:
latency  =  29 us latency  =  29.2 us   latency  =  29.6 us
tcp_lat:
latency  =  28.9 us   latency  =  29.3 us   latency  =  29.5 us
tcp_lat:
latency  =  29 us latency  =  29.3 us   latency  =  29.6 us
tcp_lat:
latency  =  29 us latency  =  29.4 us   latency  =  29.5 us
tcp_lat:
latency  =  29 us latency  =  29.2 us   latency  =  29.6 us
tcp_lat:
latency  =  29.1 us   latency  =  29.3 us   latency  =  29.7 us
tcp_lat:
latency  =  29.4 us   latency  =  29.6 us   latency  =  30 us
tcp_lat:
latency  =  29.8 us   latency  =  30.1 us   latency  =  30.2 us
tcp_lat:
latency  =  30.9 us   latency  =  30.9 us   latency  =  31 us
tcp_lat:
latency  =  46.9 us   latency  =  46.2 us   latency  =  32.2 us
tcp_lat:
latency  =  51.5 us   latency  =  52.6 us   latency  =  34.5 us
tcp_lat:
latency  =  43.9 us   latency  =  43.8 us   latency  =  43.6 us
tcp_lat:
 latency  =  47.6 us  latency  =  48 us latency  =  48.1 us
tcp_lat:
latency  =  77.7 us   latency  =  78.8 us   latency  =  78.8 us
tcp_lat:
latency  =  82.8 us   latency  =  82.3 us   latency  =  116 us
tcp_lat:
latency  =  94.8 us   latency  =  94.2 us   latency  =  134 us
tcp_lat:
latency  =  167 uslatency  =  197 uslatency  =  172 us
udp_bw:
send_bw  =  418 KB/sec    send_bw  =  413 KB/sec    send_bw  =  403 KB/sec
recv_bw  =  410 KB/sec    recv_bw  =  412 KB/sec    recv_bw  =  400 KB/sec
udp_bw:
send_bw  =  831 KB/sec    send_bw  =  825 KB/sec    send_bw  =  810 KB/sec
recv_bw  =  828 KB/sec    recv_bw  =  816 KB/sec    recv_bw  =  807 KB/sec
udp_bw:
send_bw  =  1.67 MB/sec   send_bw  =  1.65 MB/sec   send_bw  =  1.63 MB/sec
recv_bw  =  1.64 MB/sec   recv_bw  =  1.62 MB/sec   recv_bw  =  1.63 MB/sec
udp_bw:
send_bw  =  3.36 MB/sec   send_bw  =  3.29 MB/sec   send_bw  =  3.26 MB/sec
recv_bw  =  3.29 MB/sec   recv_bw  =  3.25 MB/sec   recv_bw  =  2.82 MB/sec
udp_bw:
send_bw  =  6.72 MB/sec   send_bw  =  6.61 MB/sec   send_bw  =  6.45 MB/sec
recv_bw  =  6.54 MB/sec   recv_bw  =  6.59 MB/sec   recv_bw  =  6.45 MB/sec
udp_bw:
send_bw  =  13.4