Re: [vpp-dev] CSIT-2001 update: Xeon Skylake Performance and Progressions/Regressions RCAs

2020-05-12 Thread Maciek Konstantynowicz (mkonstan) via lists.fd.io
Slides used on today’s VPP call: 
https://wiki.fd.io/view/File:200512-csit-vpp-readout.pptx

> On 12 May 2020, at 15:18, Maciek Konstantynowicz (mkonstan) wrote:
> 
> Dear All,
> 
> We have finally pushed out an update to the CSIT-2001 report with VPP
> performance data for testbeds with Intel Xeon Skylake processors (2n-skx
> and 3n-skx testbeds), with SUT and TG servers impacted by firmware and
> OS upgrades (BIOS, ucode, kernel updates with mitigations against the
> newly discovered Spectre-Meltdown security vulnerabilities).
> 
> The updated CSIT-2001 report should be available for browsing just
> before 15:00 UTC today, subject to Jenkins job execution (it will carry
> an updated version timestamp):
> 
> https://docs.fd.io/csit/rls2001/report/
> 
> https://docs.fd.io/csit/rls2001/report/vpp_performance_tests/csit_release_notes.html
> 
> In addition to 2n-skx and 3n-skx performance data available at the usual
> locations in the report (see links [r1] to [r4] referenced below), we
> have expanded the way we do VPP release-to-release comparisons and root
> cause analysis (RCA) for any identified performance progressions and
> regressions:
> 
> - CSIT test environment is now versioned, with ver. 1 associated
>   with CSIT rls1908 git branch as of 2019-08-21, and ver. 2
>   associated with CSIT master and rls2001 git branches as of
>   2020-03-27.
> 
> - To identify SUT performance change(s) due to CSIT test environment
>   change(s) from ver. 1 to ver. 2, VPP v19.08.1 has been re-tested
>   in ver. 2 and results compared against the past data obtained with
>   ver. 1. RCA1 analysis has been applied to this part. See [r5].
> 
> - To identify SUT performance change(s) due to VPP code change(s)
>   from v19.08.1 to v20.01.0, both VPP versions have been tested in
>   CSIT environment ver. 2 and results compared. Separate RCA2
>   analysis has been applied to this part. See [r5].
> 
> - At this stage RCA1 and RCA2 analyses are focusing on progressions > +5%
>   and regressions < -5%.
> 
> The complete list of RCAs identified as part of this exercise, [1] to
> [12], is pasted below.
> 
> Hope it makes sense. For any questions and comments please contact
> csit-...@lists.fd.io.
> 
> Regards,
> Maciek
> (on behalf of FD.io CSIT team)
> 
> 
> Specific links within the report:
> 
> [r1] VPP throughput graphs,
> 
> https://docs.fd.io/csit/rls2001/report/vpp_performance_tests/packet_throughput_graphs/index.html
> 
> [r2] VPP throughput speedup multi-core,
> 
> https://docs.fd.io/csit/rls2001/report/vpp_performance_tests/throughput_speedup_multi_core/ip4-2n-skx-xxv710.html
> 
> [r3] VPP packet latency,
> 
> https://docs.fd.io/csit/rls2001/report/vpp_performance_tests/packet_latency/index.html
> 
> [r4] VPP soak tests,
> 
> https://docs.fd.io/csit/rls2001/report/vpp_performance_tests/soak_tests/index.html
> 
> [r5] 2n-skx PDR comparison with RCA,
> 
> https://docs.fd.io/csit/rls2001/report/vpp_performance_tests/comparisons/current_vs_previous_release.html#n-skx
> 
> [r6] 3n-skx PDR comparison with RCA,
> 
> https://docs.fd.io/csit/rls2001/report/vpp_performance_tests/comparisons/current_vs_previous_release.html#id1
> 
> RCA1:
> 
> [1] DONE, Impact of upgrades: i) Skx ucode from 0x243 to 0x265,
> ii) Linux kernel from 4.15.0-60 to 4.15.0-72 and iii) SuperMicro
> motherboard BIOS from 3.0c to 3.2.
> 
> [2] DONE, Applied fix of FVL NIC firmware 6.0.1 for increasing TRex pps
> rate from 27 Mpps to 37 Mpps, [CSIT-1503], [TRex-519].
> 
> [3] DONE, Applied VPP PAPI fix to enable memif zero-copy, [CSIT-1592],
> [VPP-1764].
> 
> [4] OPEN, Higher than before StDev of PDR throughput for VPP vhost-user
> with VPP-inside-VM, under investigation, [CSIT-1699], [CSIT-1704].
> 
> RCA2:
> 
> [5] OPEN, dot1q-l2xcbase progression, retro-inspection of weekly ndrpdr
> tests points to ge-22805, automated bisect script does not work
> due to frequent API changes, [CSIT-1699], [CSIT-1705].
> 
> [6] DONE, ip4base-nat44 regression, ge-23963
> (https://gerrit.fd.io/r/c/vpp/+/23963#message-044278e6_752c3327).
> 
> [7] WIP, avf-ip4scale regression, CANDIDATE(S) before ge-22699,
> [CSIT-1699], [CSIT-1706].
> 
> [8] OPEN, VPP vhost-user with VPP-inside-VM higher than before stdev
> of PDR throughput, under investigation, [CSIT-1699], [CSIT-1704].
> 
> [9] WIP, vhost-user with testpmd-in-VM progression, CANDIDATE(S)
> before 22277, [CSIT-1699], [CSIT-1707].
> 
> [10] WIP, avf-ip4base regression, CANDIDATE(S) range
> ge-18361..ge-24505, [CSIT-1699], [CSIT-1708].
> 
> [11] DONE, memif regression, CANDIDATE(S) confirmed ge-23801.
> 
> [12] WIP, ipsec tnl sw scale regression, CANDIDATE(S) before ge-23557,
> [CSIT-1699], [CSIT-1712].


View/Reply Online (#16349): https://lists.fd.io/g/vpp-dev/message/16349

[vpp-dev] CSIT-2001 update: Xeon Skylake Performance and Progressions/Regressions RCAs

2020-05-12 Thread Maciek Konstantynowicz (mkonstan) via lists.fd.io
Dear All,

We have finally pushed out an update to the CSIT-2001 report with VPP
performance data for testbeds with Intel Xeon Skylake processors (2n-skx
and 3n-skx testbeds), with SUT and TG servers impacted by firmware and
OS upgrades (BIOS, ucode, kernel updates with mitigations against the
newly discovered Spectre-Meltdown security vulnerabilities).

The updated CSIT-2001 report should be available for browsing just
before 15:00 UTC today, subject to Jenkins job execution (it will carry
an updated version timestamp):

https://docs.fd.io/csit/rls2001/report/

https://docs.fd.io/csit/rls2001/report/vpp_performance_tests/csit_release_notes.html

In addition to 2n-skx and 3n-skx performance data available at the usual
locations in the report (see links [r1] to [r4] referenced below), we
have expanded the way we do VPP release-to-release comparisons and root
cause analysis (RCA) for any identified performance progressions and
regressions:

- CSIT test environment is now versioned, with ver. 1 associated
  with CSIT rls1908 git branch as of 2019-08-21, and ver. 2
  associated with CSIT master and rls2001 git branches as of
  2020-03-27.

- To identify SUT performance change(s) due to CSIT test environment
  change(s) from ver. 1 to ver. 2, VPP v19.08.1 has been re-tested
  in ver. 2 and results compared against the past data obtained with
  ver. 1. RCA1 analysis has been applied to this part. See [r5].

- To identify SUT performance change(s) due to VPP code change(s)
  from v19.08.1 to v20.01.0, both VPP versions have been tested in
  CSIT environment ver. 2 and results compared. Separate RCA2
  analysis has been applied to this part. See [r5].

- At this stage RCA1 and RCA2 analyses are focusing on progressions > +5%
  and regressions < -5%.
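
For readers who want to apply the same +/-5% filter to their own numbers,
here is a minimal Python sketch. It is an illustration only, not the CSIT
implementation: the function names, the relative-change formula and the
sample throughput values are all assumptions made for this example.

```python
# Hypothetical helpers, not part of CSIT: names and formula are assumptions.

def relative_change_pct(reference: float, current: float) -> float:
    """Change of `current` relative to `reference`, in percent."""
    if reference <= 0:
        raise ValueError("reference throughput must be positive")
    return (current - reference) * 100.0 / reference


def classify(reference: float, current: float,
             threshold_pct: float = 5.0) -> str:
    """Label a pair of results using the > +5% / < -5% rule."""
    change = relative_change_pct(reference, current)
    if change > threshold_pct:
        return "progression"
    if change < -threshold_pct:
        return "regression"
    return "within threshold"


if __name__ == "__main__":
    # Made-up PDR values in Mpps, for illustration only.
    v1908_env1 = 10.0   # VPP v19.08.1 in test environment ver. 1
    v1908_env2 = 9.2    # VPP v19.08.1 in test environment ver. 2
    v2001_env2 = 9.9    # VPP v20.01.0 in test environment ver. 2

    # RCA1: same VPP version, environment ver. 1 vs ver. 2.
    print("RCA1:", classify(v1908_env1, v1908_env2))
    # RCA2: same environment ver. 2, VPP v19.08.1 vs v20.01.0.
    print("RCA2:", classify(v1908_env2, v2001_env2))
```

Keeping the two comparisons separate mirrors the split described above:
the first pair isolates the effect of the environment change, the second
pair isolates the effect of the VPP code change.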

The complete list of RCAs identified as part of this exercise, [1] to
[12], is pasted below.

Hope it makes sense. For any questions and comments please contact
csit-...@lists.fd.io.

Regards,
Maciek
(on behalf of FD.io CSIT team)


Specific links within the report:

[r1] VPP throughput graphs,
 
https://docs.fd.io/csit/rls2001/report/vpp_performance_tests/packet_throughput_graphs/index.html

[r2] VPP throughput speedup multi-core,
 
https://docs.fd.io/csit/rls2001/report/vpp_performance_tests/throughput_speedup_multi_core/ip4-2n-skx-xxv710.html

[r3] VPP packet latency,
 
https://docs.fd.io/csit/rls2001/report/vpp_performance_tests/packet_latency/index.html

[r4] VPP soak tests,
 
https://docs.fd.io/csit/rls2001/report/vpp_performance_tests/soak_tests/index.html

[r5] 2n-skx PDR comparison with RCA,
 
https://docs.fd.io/csit/rls2001/report/vpp_performance_tests/comparisons/current_vs_previous_release.html#n-skx

[r6] 3n-skx PDR comparison with RCA,
 
https://docs.fd.io/csit/rls2001/report/vpp_performance_tests/comparisons/current_vs_previous_release.html#id1

RCA1:

[1] DONE, Impact of upgrades: i) Skx ucode from 0x243 to 0x265,
ii) Linux kernel from 4.15.0-60 to 4.15.0-72 and iii) SuperMicro
motherboard BIOS from 3.0c to 3.2.

[2] DONE, Applied fix of FVL NIC firmware 6.0.1 for increasing TRex pps
rate from 27 Mpps to 37 Mpps, [CSIT-1503], [TRex-519].

[3] DONE, Applied VPP PAPI fix to enable memif zero-copy, [CSIT-1592],
[VPP-1764].

[4] OPEN, Higher than before StDev of PDR throughput for VPP vhost-user
with VPP-inside-VM, under investigation, [CSIT-1699], [CSIT-1704].

RCA2:

[5] OPEN, dot1q-l2xcbase progression, retro-inspection of weekly ndrpdr
tests points to ge-22805, automated bisect script does not work
due to frequent API changes, [CSIT-1699], [CSIT-1705].

[6] DONE, ip4base-nat44 regression, ge-23963
(https://gerrit.fd.io/r/c/vpp/+/23963#message-044278e6_752c3327).

[7] WIP, avf-ip4scale regression, CANDIDATE(S) before ge-22699,
[CSIT-1699], [CSIT-1706].

[8] OPEN, VPP vhost-user with VPP-inside-VM higher than before stdev
of PDR throughput, under investigation, [CSIT-1699], [CSIT-1704].

[9] WIP, vhost-user with testpmd-in-VM progression, CANDIDATE(S)
before 22277, [CSIT-1699], [CSIT-1707].

[10] WIP, avf-ip4base regression, CANDIDATE(S) range
 ge-18361..ge-24505, [CSIT-1699], [CSIT-1708].

[11] DONE, memif regression, CANDIDATE(S) confirmed ge-23801.

[12] WIP, ipsec tnl sw scale regression, CANDIDATE(S) before ge-23557,
[CSIT-1699], [CSIT-1712].

View/Reply Online (#16343): https://lists.fd.io/g/vpp-dev/message/16343