Thanks for the update, John. I'll pass this along to our test team. Not sure when we can schedule a retest, but when we do, I'll provide our results.
Thanks again,
Billy

On Tue, Sep 5, 2017 at 10:10 AM, John Lo (loj) <l...@cisco.com> wrote:

> Hi Billy,
>
> I submitted fixes for VPP-963, now merged in both 17.07 and master/17.10,
> that I believe should address the NDR/PDR performance issue with the 10K
> and 1M flow cases. The regression was caused by a bug fix in the L2
> learning path to update stale time stamps and sequence numbers of MAC
> entries in the L2FIB. Because the time stamp is in units of minutes,
> whenever the clock hits the minute mark there can be a prolonged burst of
> MAC updates, affecting forwarding performance when a large number of MACs
> in the L2 FIB need updates. My fix smooths out the update burst to reduce
> the impact. I believe you should now find the 17.07 or 17.10 performance
> for 10K and 1M flows slightly lower but fairly close to the level of
> 17.04, instead of somewhere between 1/3 and 1/2 of that of 17.04 as you
> measured before.
>
> I also doubled the memory size of the L2FIB table to fit 4M MACs and set
> the learn limit to 4M entries. During my testing, I found the L2FIB would
> run out of memory at around 2.8M MACs with the previous memory size.
>
> Regards,
> John
>
> *From:* vpp-dev-boun...@lists.fd.io [mailto:vpp-dev-boun...@lists.fd.io]
> *On Behalf Of* Billy McFall
> *Sent:* Monday, August 28, 2017 12:47 PM
> *To:* Maciek Konstantynowicz (mkonstan) <mkons...@cisco.com>
> *Cc:* csit-...@lists.fd.io; vpp-dev <vpp-dev@lists.fd.io>
> *Subject:* Re: [vpp-dev] VPP Performance drop from 17.04 to 17.07
>
> On Mon, Aug 28, 2017 at 8:53 AM, Maciek Konstantynowicz (mkonstan)
> <mkons...@cisco.com> wrote:
>
> > + csit-dev
> >
> > Billy,
> >
> > Per last week's CSIT project call, from the CSIT perspective we
> > classified your reported issue as a test coverage escape.
> >
> > Summary
> > =======
> > CSIT test coverage got fixed, see more detail below.
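John's description of the minute-granularity time stamp burst (earlier in the quoted thread) can be illustrated with a small simulation. This is a hypothetical sketch of the general idea only, not VPP's actual l2fib code; the function names and the per-interval budget are made up:

```python
def burst_updates(n_entries, last_stamp_min, now_min):
    """With minute-granularity time stamps, all learned entries share the
    same stamp, so the first traffic after a minute rollover finds every
    entry stale at once - the update burst John describes."""
    return n_entries if now_min != last_stamp_min else 0

def smoothed_updates(n_entries, budget_per_interval):
    """The smoothing idea: update at most `budget_per_interval` stale
    entries per processing interval, bounding the worst-case interval cost.
    Returns the number of updates performed in each interval."""
    schedule = []
    remaining = n_entries
    while remaining > 0:
        step = min(budget_per_interval, remaining)
        schedule.append(step)
        remaining -= step
    return schedule

# With 1M learned MACs, a minute rollover means 1M stale-entry updates in
# one burst; a (hypothetical) budget of 4096 per interval bounds each
# interval while the same total work completes over ~245 intervals.
burst = burst_updates(1_000_000, 41, 42)
sched = smoothed_updates(1_000_000, 4096)
print(burst, max(sched), len(sched))
```

The same total number of updates happens either way; the difference is whether they all land on the packets that arrive just after the minute boundary.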
> > The CSIT tests uncovered a regression for L2BD with MAC learning with a
> > higher total number of MACs in the L2FIB, >>10k MACs, for multi-threaded
> > configurations. Single-threaded configurations seem not to be impacted.
> >
> > Billy, Karl, can you confirm this aligns with your findings?
>
> When you say "multi-threaded configuration", I assume you mean multiple
> worker threads? Karl's tests had 4 workers, one for each NIC (physical
> and vhost-user). He only tested multi-threaded, so we cannot confirm that
> single-threaded configurations are not impacted.
>
> Our numbers are a little different from yours, but we are both seeing
> drops between releases. We had a bigger drop-off with 10k flows, but the
> results seem to be similar with the million-flow tests.
>
> I was a little disappointed the MAC limit change by John Lo on 8/23
> didn't improve the master numbers some.
>
> Thanks for all the hard work and adding these additional test cases.
>
> Billy
>
> > More detail
> > ===========
> > MAC scale tests have now been added to the L2BD and L2BD+vhost CSIT
> > suites, as a simple extension to the existing L2 testing suites. Some
> > known issues with the TG prevented CSIT from adding those tests in the
> > past, but now that the TG issues have been addressed, the tests could
> > be added swiftly. The complete list of added tests is in [1] - thanks
> > to Peter Mikus for great work there!
> >
> > Results from running those tests multiple times within the FD.io CSIT
> > lab infra can be glanced over by checking the dedicated test trigger
> > commits [2][3][4] and the summary graphs in the linked xls [5]. The
> > results confirm there is a regression in the VPP l2fib code affecting
> > all scaled-up MAC tests in multi-thread configuration. Single-thread
> > configurations seem not to be impacted.
> > The tests in commit [1] are not merged yet, as they're waiting for the
> > TG/TRex team to fix a TRex issue with mis-calculating the Ethernet FCS
> > with a large number of L2 MAC flows (>10k MAC flows). The issue is
> > tracked by [6]; TRex v2.29 with the fix has an ETA of w/e 1-Sep, i.e.
> > this week. The reported CSIT test results use Ethernet frames with UDP
> > headers, which masks the TRex issue.
> >
> > We have also vpp git bisected the problem between v17.04 (good) and
> > v17.07 (bad) in a separate IXIA-based lab in SJC, and found the culprit
> > vpp patch [7]. Awaiting a fix from vpp-dev; jira ticket raised [8].
> >
> > Many thanks for reporting this regression and working with CSIT to
> > plug this hole in testing.
> >
> > -Maciek
> >
> > [1] CSIT-786 L2FIB scale testing [https://gerrit.fd.io/r/#/c/8145/
> > ge8145] [https://jira.fd.io/browse/CSIT-786 CSIT-786];
> > L2FIB scale testing for 10k, 100k, 1M FIB entries
> > ./l2:
> > 10ge2p1x520-eth-l2bdscale10kmaclrn-ndrpdrdisc.robot
> > 10ge2p1x520-eth-l2bdscale100kmaclrn-ndrpdrdisc.robot
> > 10ge2p1x520-eth-l2bdscale1mmaclrn-ndrpdrdisc.robot
> > 10ge2p1x520-eth-l2bdscale10kmaclrn-eth-2vhostvr1024-1vm-cfsrr1-ndrpdrdisc
> > 10ge2p1x520-eth-l2bdscale100kmaclrn-eth-2vhostvr1024-1vm-cfsrr1-ndrpdrdisc
> > 10ge2p1x520-eth-l2bdscale1mmaclrn-eth-2vhostvr1024-1vm-cfsrr1-ndrpdrdisc
> > [2] VPP master branch [https://gerrit.fd.io/r/#/c/8173/ ge8173];
> > [3] VPP stable/1707 [https://gerrit.fd.io/r/#/c/8167/ ge8167];
> > [4] VPP stable/1704 [https://gerrit.fd.io/r/#/c/8172/ ge8172];
> > [5] CSIT-794 VPP v17.07 L2BD yields lower NDR and PDR performance vs.
> > v17.04, 20170825_l2fib_regression_10k_100k_1M.xlsx,
> > [https://jira.fd.io/browse/CSIT-794 CSIT-794];
> > [6] TRex v2.28 Ethernet FCS mis-calculation issue
> > [https://jira.fd.io/browse/CSIT-793 CSIT-793];
> > [7] commit 25ff2ea3a31e422094f6d91eab46222a29a77c4b;
> > [8] VPP v17.07 L2BD NDR and PDR multi-thread performance broken
> > [https://jira.fd.io/browse/VPP-963 VPP-963];
> >
> > On 14 Aug 2017, at 23:40, Billy McFall <bmcf...@redhat.com> wrote:
> >
> > In the last VPP call, I reported that some internal Red Hat performance
> > testing was showing a significant drop in performance between releases
> > 17.04 and 17.07. This was with l2-bridge testing - PVP - 0.002% drop
> > rate:
> >
> > VPP-17.04: 256 Flow 7.8 MP/s   10k Flow 7.3 MP/s   1m Flow 5.2 MP/s
> > VPP-17.07: 256 Flow 7.7 MP/s   10k Flow 2.7 MP/s   1m Flow 1.8 MP/s
> >
> > The performance team re-ran some of the tests for me with some
> > additional data collected. It looks like the size of the L2 FIB table
> > was reduced in 17.07. Below are the numbers of entries in the MAC table
> > after the tests are run:
> >
> > 17.04:
> > show l2fib
> > 4000008 l2fib entries
> >
> > 17.07:
> > show l2fib
> > 1067053 l2fib entries with 1048576 learned (or non-static) entries
> >
> > This caused more packets to be flooded (see output of 'show node
> > counters' below). I looked but couldn't find anything. Is the size of
> > the L2 FIB table configurable?
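On the configurability question: VPP's startup configuration gained an `l2fib` section for sizing the L2 FIB hash table. The stanza below is a sketch only; the parameter names are from later releases' startup.conf documentation, the values are illustrative, and support may vary by release, so verify against your version's docs:

```
l2fib {
  # Heap size for the L2 FIB hash table (illustrative value)
  table-size 512M
  # Number of hash buckets; more buckets reduce collisions at MAC scale
  num-buckets 1048576
}
```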
> > Thanks,
> > Billy McFall
> >
> > 17.04:
> >
> > show node counters
> >      Count          Node            Reason
> >        :
> >  313035313        l2-input        L2 input packets
> >     555726        l2-flood        L2 flood packets
> >        :
> >  310115490        l2-input        L2 input packets
> >     824859        l2-flood        L2 flood packets
> >        :
> >  313508376        l2-input        L2 input packets
> >    1041961        l2-flood        L2 flood packets
> >        :
> >  313691024        l2-input        L2 input packets
> >     698968        l2-flood        L2 flood packets
> >
> > 17.07:
> >
> > show node counters
> >      Count          Node            Reason
> >        :
> >   97810569        l2-input        L2 input packets
> >   72557612        l2-flood        L2 flood packets
> >        :
> >   97830674        l2-input        L2 input packets
> >   72478802        l2-flood        L2 flood packets
> >        :
> >   97714888        l2-input        L2 input packets
> >   71655987        l2-flood        L2 flood packets
> >        :
> >   97710374        l2-input        L2 input packets
> >   70058006        l2-flood        L2 flood packets
> >
> > --
> > *Billy McFall*
> > SDN Group
> > Office of Technology
> > *Red Hat*

--
*Billy McFall*
SDN Group
Office of Technology
*Red Hat*
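The counter dumps above quantify the regression directly: the flood share of input traffic jumps from a fraction of a percent on 17.04 to roughly three quarters on 17.07, consistent with learned MACs no longer fitting in the L2FIB. A quick check of the quoted figures:

```python
# Per-worker (l2-input, l2-flood) packet counts taken from the
# 'show node counters' output quoted above.
counters = {
    "17.04": [(313035313, 555726), (310115490, 824859),
              (313508376, 1041961), (313691024, 698968)],
    "17.07": [(97810569, 72557612), (97830674, 72478802),
              (97714888, 71655987), (97710374, 70058006)],
}

def flood_ratio(pairs):
    """Fraction of L2 input packets that were flooded rather than
    forwarded to a learned MAC."""
    total_in = sum(i for i, _ in pairs)
    total_flood = sum(f for _, f in pairs)
    return total_flood / total_in

for release, pairs in counters.items():
    print(release, f"{flood_ratio(pairs):.1%}")
# Roughly 0.2% of input flooded on 17.04 vs ~73% on 17.07.
```

The ~73% flood ratio on 17.07 lines up with the `show l2fib` output: only 1048576 (2^20) of the 4M offered MACs were learned, so most destinations were unknown and flooded.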
_______________________________________________ vpp-dev mailing list vpp-dev@lists.fd.io https://lists.fd.io/mailman/listinfo/vpp-dev