> -----Original Message-----
> From: Timothy Redaelli <[email protected]>
> Sent: Monday, May 10, 2021 6:43 PM
> To: Amber, Kumar <[email protected]>; [email protected]
> Cc: [email protected]; [email protected]; [email protected]; Van Haaren, Harry
> <[email protected]>
> Subject: Re: [ovs-dev] [v2 v2 0/6] MFEX Infrastructure + Optimizations
<snip patchset details for brevity>
> >
>
> Hi,
> we (as Red Hat) did some tests with a "special" build created on top of
> master (a019868a6268 at that time) with the 2 series ("DPIF
> Framework + Optimizations" and "MFEX Infrastructure + Optimizations")
> cherry-picked.
> The spec file was also modified in order to add "-msse4.2 -mpopcnt"
> to OVS CFLAGS.
Hi Timothy,
Thanks for testing and reporting back your findings! Most of the configuration
is clear to me, but I have a few open questions inline below for context.
The performance numbers reported in the email below show no benefit when
enabling AVX512, which contradicts our recent whitepaper on benchmarking
an Optimized Deployment of OVS, which includes the AVX512 patches you've
benchmarked here.
Specifically, Table 8 (DPIF/MFEX patches) and Table 9 (overall
optimizations at a platform level) are relevant:
https://networkbuilders.intel.com/solutionslibrary/open-vswitch-optimized-deployment-benchmark-technology-guide
Based on the differences between these performance reports, there must be some
discrepancy in our testing/measurements.
I hope that the questions below help us understand any differences so we can
all measure the benefits from these optimizations.
Regards, -Harry
> RPM=openvswitch2.15-2.15.0-37.avx512.1.el8fdp (the "special" build with
> the patches backported)
>
> * Master --- 15.2 Mpps
> * Plus "avx512_gather 3" Only --- 15.2 Mpps
> * Plus "dpif-set dpif_avx512" Only --- 10.1 Mpps
> * Plus "miniflow-parser-set study" --- Failed to converge
> * Plus all three --- 13.5 Mpps
Open questions:
1) Is CPU turbo frequency enabled in any scenario, or is the CPU always
pinned to the 2.6 GHz base frequency?
- A "perf top -C x,y" (where x,y are the datapath hyperthread IDs) would be
interesting to compare with 3) below.
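For reference, one way to check this is sketched below (core IDs 2,30 and
the standard cpufreq sysfs layout are assumptions on my side):

  # current frequency of the two PMD hyperthreads (2,30 are placeholders)
  cat /sys/devices/system/cpu/cpu{2,30}/cpufreq/scaling_cur_freq
  # or sample the actual busy/turbo frequencies over 5 seconds
  turbostat --interval 5 --num_iterations 1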
2) "plus Avx512 gather 3" (aka, DPCLS in AVX512), we see same performance. Is
DPCLS in use, or is EMC doing all the work?
- The output of " ovs-appctl dpif-netdev/pmd-perf-show" would be interesting
to understand where packets are classified.
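A sketch of the checks I have in mind (assuming the lookup prio-get command
from recent OVS master is available in your build):

  # confirm which DPCLS lookup implementation has highest priority
  ovs-appctl dpif-netdev/subtable-lookup-prio-get
  # per-PMD hit breakdown: EMC vs SMC vs megaflow (DPCLS) vs upcalls
  ovs-appctl dpif-netdev/pmd-perf-show
  # optionally, disable EMC insertion to force packets through DPCLS
  ovs-vsctl set Open_vSwitch . other_config:emc-insert-inv-prob=0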
3) "dpif-set dpif_avx512" only. The performance here is very strange, with ~30%
reduction, while our testing shows performance improvement.
- A "perf top" here (compared vs step 1) would be helpful to see what is
going on
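To make the comparison concrete, a short profiling recipe (again, the PMD
core IDs are placeholders):

  # capture ~10s of samples on the PMD cores with the scalar DPIF (step 1)
  perf record -C 2,30 -- sleep 10
  perf report --stdio | head -40
  # then repeat with dpif_avx512 enabled and compare the top symbols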
4) "miniflow parser set study", I don't understand what is meant by "Failed to
converge"?
- Is the traffic running in your benchmark Ether()/IP()/UDP() ?
- Note that the only traffic pattern accelerated today is Ether()/IP()/UDP()
(see patch
https://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/
for details). The next revision of the patchset will include other traffic
patterns, for example Ether()/Dot1Q()/IP()/UDP() and Ether()/IP()/TCP(). A
sketch of the currently accelerated pattern is shown below.
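In the Scapy notation used above, the accelerated pattern is a plain
untagged IPv4/UDP frame (addresses and ports here are arbitrary examples):

  pkt = Ether()/IP(src="16.0.0.1", dst="48.0.0.1")/UDP(sport=1024, dport=1024)

As I understand the current patches, anything else (VLAN tags, TCP, IPv6)
falls back to the scalar miniflow extract path, so "study" cannot converge
on an optimized implementation for it.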
> RPM=openvswitch2.15-2.15.0-15.el8fdp (w/o "-msse4.2 -mpopcnt")
> * 15.2 Mpps
5) Which "-march=" CPU ISA and "-O" optimization CFLAGS are used to build
the package?
- "-msse4.2 -mpopcnt" is likely already implied if, for example,
-march=corei7 or -march=nehalem is used; see the check below.
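A quick way to verify what a given -march value implies (corei7 here is
just an example value):

  gcc -march=corei7 -dM -E - </dev/null | grep -E '__SSE4_2__|__POPCNT__'

If both macros are printed, the explicit "-msse4.2 -mpopcnt" additions are
redundant for that -march.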
> P2P benchmark
> * ovs-dpdk/25 Gb i40e <-> trex/i40e
> * single queue two pmd's --- two HT's out of a CPU core.
>
> Host CPU
> Model name: Intel(R) Xeon(R) Gold 6132 CPU @ 2.60GHz
Thanks for detailing the configuration; looking forward to understanding
the configuration and performance differences better.