We have now repeated our earlier iperf3 tests for this patch series.
https://mail.openvswitch.org/pipermail/ovs-dev/2017-September/338247.html

We use an iperf3 server as representative of a typical IO-intensive kernel 
application. The iperf3 server executes in a VM with 2 vCPUs, where both the 
virtio interrupts and the iperf3 process are pinned to the same vCPU for best 
performance. We run two iperf3 clients in parallel on a different server to 
prevent the client from becoming the bottleneck when tx batching is enabled.

OVS       tx-flush-      iperf3    Avg. PMD    PMD       iperf3      ping -f
version   interval (us)  Gbps      cycles/pkt  util      CPU load    avg rtt
----------------------------------------------------------------------------
master        -          7.24      1778        46.5%      99.7%      23 us
Patch v6      0          7.18      1873        47.7%     100.0%      29 us
Patch v6     50          8.99      1108        36.3%      99.7%      38 us
Patch v6    100          ----      ----        ----      -----       88 us

In all cases the bottleneck is the capacity of the server VM's vCPU handling 
the virtio interrupts and the iperf3 server thread. The TCP throughput is 
throttled by packets being dropped on Tx to the vhostuser port of the server 
VM; the Linux kernel cannot handle the interrupts and poll the incoming 
packets fast enough.

As expected, the tx batching patch alone with tx-flush-interval=0 does not 
provide any benefit, as it does not reduce the virtio interrupt rate.

Setting tx-flush-interval to 50 microseconds immediately improves the 
throughput: the PMD utilization drops from 47% to 36% due to the reduced rate 
of write calls to the virtio kick fd. (I believe the more pronounced drop in 
processing cycles/pkt is an artifact of the patch: the cycles spent on delayed 
tx to vhostuser are no longer counted as packet processing cost. To be checked 
in the individual patch review.)
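For reference, the flush interval in these runs maps onto the other_config 
knob that this series introduces (per its vswitch.xml addition); a minimal 
sketch using the 50 us value from the table above:

```shell
# Enable time-based tx batching with a 50 microsecond flush interval;
# tx-flush-interval=0 (the default) keeps immediate per-rx-burst output.
ovs-vsctl set Open_vSwitch . other_config:tx-flush-interval=50
```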

More importantly, the iperf3 server VM can now receive 8.99 instead of 7.24 
Gbit/s, an increase of 24%. I am sure that 10G line rate could be reached with 
vhost multi-queue in the server VM.

Compared to the v4 version of the patches, the latency impact is now much 
smaller. Packets with an inter-arrival time larger than the configured 
tx-flush-interval are not delayed at all. For a 50 us tx-flush-interval this 
means packet flows with a packet rate of up to 20 Kpps!
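The 20 Kpps figure follows directly from the flush interval; a quick sanity 
check of the arithmetic:

```python
# A flow whose packets arrive further apart than tx-flush-interval is
# always flushed before the timer can add delay, so it is unaffected.
flush_interval_us = 50
max_unaffected_pps = 1_000_000 // flush_interval_us  # packets per second
print(max_unaffected_pps)  # 20000, i.e. 20 Kpps
```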

Hence the average RTT reported by "ping -f" increases only slightly, from 
23 us on master to 38 us with tx-flush-interval=50. Only when the 
tx-flush-interval is increased well beyond the intrinsic average inter-arrival 
time does it translate directly into increased latency.

Conclusion: Time-based tx batching fulfills the expectations for 
interrupt-driven kernel workloads, while avoiding a latency impact even on 
moderately loaded ports.

BR, Jan


From: Jan Scheurich
Sent: Tuesday, 05 December, 2017 00:21
To: Ilya Maximets <[email protected]>; [email protected]; 
Bhanuprakash Bodireddy <[email protected]>
Cc: Heetae Ahn <[email protected]>; Antonio Fischetti 
<[email protected]>; Eelco Chaudron <[email protected]>; Ciara 
Loftus <[email protected]>; Kevin Traynor <[email protected]>; Ian 
Stokes <[email protected]>
Subject: RE: [PATCH v6 0/7] Output packet batching.


Hi Ilya,

I have retested your "Output patches batching" v6 in our standard PVP 
L3-VPN/VXLAN benchmark setup [1]. The configuration is a single PMD serving a 
physical 10G port and a VM running DPDK testpmd as IP reflector with 4 equally 
loaded vhostuser ports. The tests are run with 64 byte packets. Below are Mpps 
values averaged over four 10 second runs:

        master    tx-flush-interval=0    tx-flush-interval=50
Flows   Mpps      Mpps    vs master      Mpps    vs master
-------------------------------------------------------------
8       4.419     4.342   -1.7%          4.749   +7.5%
100     4.026     3.956   -1.7%          4.281   +6.3%
1000    3.630     3.632   +0.1%          3.760   +3.6%
2000    3.394     3.390   -0.1%          3.490   +2.8%
5000    2.989     2.938   -1.7%          2.994   +0.2%
10000   2.756     2.711   -1.6%          2.746   -0.4%
20000   2.641     2.598   -1.6%          2.622   -0.7%
50000   2.604     2.558   -1.8%          2.579   -1.0%
100000  2.598     2.552   -1.8%          2.572   -1.0%
500000  2.598     2.550   -1.8%          2.571   -1.0%
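The percentage columns are relative to master; recomputing a couple of rows 
from the raw Mpps numbers makes that explicit (values copied from the table):

```python
# Relative throughput change vs. master for the 8-flow and 100-flow rows.
rows = {
    8:   (4.419, 4.342, 4.749),   # master, interval=0, interval=50 (Mpps)
    100: (4.026, 3.956, 4.281),
}
for flows, (master, i0, i50) in rows.items():
    d0 = (i0 - master) / master * 100
    d50 = (i50 - master) / master * 100
    print(f"{flows} flows: {d0:+.1f}% / {d50:+.1f}%")
# 8 flows: -1.7% / +7.5%
# 100 flows: -1.7% / +6.3%
```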



As expected, output batching within rx bursts (tx-flush-interval=0) provides 
little or no benefit in this scenario. The test results reflect roughly a 1.7% 
performance penalty due to the tx batching overhead. This overhead is 
measurable, but should in my eyes not be a blocker for merging this patch 
series.



Interestingly, tests with time-based tx batching and a minimum flush interval 
of 50 microseconds show a consistent and significant performance increase for 
small numbers of flows (in the regime where the EMC is effective) and a reduced 
penalty of 1% for many flows. I don't have a good explanation for this 
phenomenon yet. I would be interested to see whether other benchmark results 
confirm the generally positive impact of time-based tx batching on throughput 
also for synthetic DPDK applications in the VM. The average ping RTT increases 
by 20-30 us, as expected.



We will also retest the performance improvement of time-based tx batching on 
interrupt-driven Linux kernel applications (such as iperf3).



BR, Jan



> -----Original Message-----
> From: Ilya Maximets [mailto:[email protected]]
> Sent: Friday, 01 December, 2017 16:44
> To: [email protected]; Bhanuprakash Bodireddy
> <[email protected]>
> Cc: Heetae Ahn <[email protected]>; Antonio Fischetti
> <[email protected]>; Eelco Chaudron <[email protected]>;
> Ciara Loftus <[email protected]>; Kevin Traynor <[email protected]>;
> Jan Scheurich <[email protected]>; Ian Stokes
> <[email protected]>; Ilya Maximets <[email protected]>
> Subject: [PATCH v6 0/7] Output packet batching.
>
> This patch-set inspired by [1] from Bhanuprakash Bodireddy.
> Implementation of [1] looks very complex and introduces many pitfalls [2]
> for later code modifications like possible packet stucks.
>
> This version targeted to make simple and flexible output packet batching on
> higher level without introducing and even simplifying netdev layer.
>
> Basic testing of 'PVP with OVS bonding on phy ports' scenario shows
> significant performance improvement.
>
> Test results for time-based batching for v3:
> https://mail.openvswitch.org/pipermail/ovs-dev/2017-September/338247.html
>
> Test results for v4:
> https://mail.openvswitch.org/pipermail/ovs-dev/2017-October/339624.html
>
> [1] [PATCH v4 0/5] netdev-dpdk: Use intermediate queue during packet
>     transmission.
>     https://mail.openvswitch.org/pipermail/ovs-dev/2017-August/337019.html
>
> [2] For example:
>     https://mail.openvswitch.org/pipermail/ovs-dev/2017-August/337133.html
>
> Version 6:
>           * Rebased on current master:
>             - Added new patch to refactor dp_netdev_pmd_thread structure
>               according to following suggestion:
>               https://mail.openvswitch.org/pipermail/ovs-dev/2017-November/341230.html
>
>             NOTE: I still prefer reverting of the padding related patch.
>                   Rebase done to not block acepting of this series.
>                   Revert patch and discussion here:
>                   https://mail.openvswitch.org/pipermail/ovs-dev/2017-November/341153.html
>
>           * Added comment about pmd_thread_ctx_time_update() usage.
>
> Version 5:
>           * pmd_thread_ctx_time_update() calls moved to different places to
>             call them only from dp_netdev_process_rxq_port() and main
>             polling functions:
>                 pmd_thread_main, dpif_netdev_run and dpif_netdev_execute.
>             All other functions should use cached time from pmd->ctx.now.
>             It's guaranteed to be updated at least once per polling cycle.
>           * 'may_steal' patch returned to version from v3 because
>             'may_steal' in qos is a completely different variable. This
>             patch only removes 'may_steal' from netdev API.
>           * 2 more usec functions added to timeval to have complete public
>             API.
>           * Checking of 'output_cnt' turned to assertion.
>
> Version 4:
>           * Rebased on current master.
>           * Rebased on top of "Keep latest measured time for PMD thread."
>             (Jan Scheurich)
>           * Microsecond resolution related patches integrated.
>           * Time-based batching without RFC tag.
>           * 'output_time' renamed to 'flush_time'. (Jan Scheurich)
>           * 'flush_time' update moved to 'dp_netdev_pmd_flush_output_on_port'.
>             (Jan Scheurich)
>           * 'output-max-latency' renamed to 'tx-flush-interval'.
>           * Added patch for output batching statistics.
>
> Version 3:
>           * Rebased on current master.
>           * Time based RFC: fixed assert on n_output_batches <= 0.
>
> Version 2:
>           * Rebased on current master.
>           * Added time based batching RFC patch.
>           * Fixed mixing packets with different sources in same batch.
>
> Ilya Maximets (7):
>   dpif-netdev: Refactor PMD thread structure for further extension.
>   dpif-netdev: Keep latest measured time for PMD thread.
>   dpif-netdev: Output packet batching.
>   netdev: Remove unused may_steal.
>   netdev: Remove useless cutlen.
>   dpif-netdev: Time based output batching.
>   dpif-netdev: Count sent packets and batches.
>
>  lib/dpif-netdev.c     | 412 +++++++++++++++++++++++++++++++++++++-------------
>  lib/netdev-bsd.c      |   6 +-
>  lib/netdev-dpdk.c     |  30 ++--
>  lib/netdev-dummy.c    |   6 +-
>  lib/netdev-linux.c    |   8 +-
>  lib/netdev-provider.h |   7 +-
>  lib/netdev.c          |  12 +-
>  lib/netdev.h          |   2 +-
>  vswitchd/vswitch.xml  |  16 ++
>  9 files changed, 349 insertions(+), 150 deletions(-)
>
> --
> 2.7.4


_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
