On Wed, 20 May 2020 at 15:32, William Tu <[email protected]> wrote:

> On Wed, May 20, 2020 at 3:35 AM Federico Iezzi <[email protected]> wrote:
> >
> > On Wed, 20 May 2020 at 12:20, Van Haaren, Harry <[email protected]> wrote:
> >>
> >> > -----Original Message-----
> >> > From: William Tu <[email protected]>
> >> > Sent: Wednesday, May 20, 2020 1:12 AM
> >> > To: Van Haaren, Harry <[email protected]>
> >> > Cc: [email protected]; [email protected]
> >> > Subject: Re: [ovs-dev] [PATCH v2 5/5] dpif-lookup: add avx512 gather implementation
> >> >
> >> > On Mon, May 18, 2020 at 9:12 AM Van Haaren, Harry <[email protected]> wrote:
> >> > >
> >> > > > -----Original Message-----
> >> > > > From: William Tu <[email protected]>
> >> > > > Sent: Monday, May 18, 2020 3:58 PM
> >> > > > To: Van Haaren, Harry <[email protected]>
> >> > > > Cc: [email protected]; [email protected]
> >> > > > Subject: Re: [ovs-dev] [PATCH v2 5/5] dpif-lookup: add avx512 gather implementation
> >> > > >
> >> > > > On Wed, May 06, 2020 at 02:06:09PM +0100, Harry van Haaren wrote:
> >> > > > > This commit adds an AVX-512 dpcls lookup implementation.
> >> > > > > It uses the AVX-512 SIMD ISA to perform multiple miniflow
> >> > > > > operations in parallel.
> >>
> >> <snip lots of code/patch contents for readability>
> >>
> >> > Hi Harry,
> >> >
> >> > I managed to find a machine with avx512 in google cloud and did some
> >> > performance testing. I saw lower performance when enabling avx512,
> >
> > AVX512 instruction path lowers the clock speed well below the base frequency [1].
> > Aren't you killing the PMD performance while improving the lookup ones?
> >
> > [1] https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/2nd-gen-xeon-scalable-spec-update.pdf (see page 20)
> >
>
> Hi Federico,
>
> Thanks for sharing the link.
> Does that mean if OVS PMD uses avx512 on one core, then all the other
> cores' frequency will be lower?
>
The clock is reduced only on the cores where avx512 instructions are
executed, to cope with the thermals.
I'm not sure whether there is a situation where avx512 code is executed only
on specific PMDs; if that happens it is bad, as some PMDs may be faster or
slower than others (see below).
It is kind of like when dynamic turbo boost is enabled and some PMDs go faster
because of the higher clock.

> There is some discussion here:
> https://lemire.me/blog/2018/09/07/avx-512-when-and-how-to-use-these-new-instructions/

Wow, quite interesting. Thanks!

> My take is that overall down clocking will happen, but application
> will get better performance.

Indeed, the part of the code written for avx512 goes much faster; the rest
stays on the normal path and will run slower due to the reduced clock.

Those are different use cases and programs, but see the AnandTech Cannon Lake
review regarding what AVX512 can deliver:

###
When we crank on the AVX2 and AVX512, there is no stopping the Cannon Lake
chip here. At a score of 4519, it beats a full 18-core Core i9-7980XE
processor running in non-AVX.
https://www.anandtech.com/show/13405/intel-10nm-cannon-lake-and-core-i3-8121u-deep-dive-review/9
###

Indeed, you have to expect much-improved performance from it; the question is
how much the non-avx512 code will slow down.

See also this one -> https://travisdowns.github.io/blog/2020/01/17/avxfreq1.html

HTH,
Federico

> William

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
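
The trade-off discussed in this message (vectorised code gets faster, everything
else on the same core runs at a lower clock) can be put into rough numbers.
The sketch below is only a back-of-the-envelope model, not something taken from
the patch or from OVS: the function names, the parameters and the example
figures are all hypothetical, and it pessimistically assumes the core stays at
the reduced clock for the whole packet-processing loop. Real frequency
transitions are time-based and more subtle, as the linked articles describe.

```python
def net_speedup(avx_fraction, avx_speedup, clock_ratio):
    """Throughput relative to the non-AVX-512 baseline (1.0 means no change).

    All three parameters are assumptions of this simple model:
      avx_fraction -- share of baseline per-packet cycles spent in the code
                      that gets an AVX-512 implementation (0.0 to 1.0)
      avx_speedup  -- factor by which AVX-512 cuts the cycles of that code
      clock_ratio  -- reduced clock divided by normal clock (0 < ratio <= 1)
    """
    if not 0.0 <= avx_fraction <= 1.0:
        raise ValueError("avx_fraction must be between 0 and 1")
    if avx_speedup <= 0.0:
        raise ValueError("avx_speedup must be positive")
    if not 0.0 < clock_ratio <= 1.0:
        raise ValueError("clock_ratio must be in (0, 1]")

    # Cycles needed per packet, relative to the baseline: the scalar part
    # is unchanged, the vectorised part shrinks by avx_speedup.
    cycles = (1.0 - avx_fraction) + avx_fraction / avx_speedup

    # Time = cycles / clock, so throughput scales as clock / cycles.
    return clock_ratio / cycles


def break_even_fraction(avx_speedup, clock_ratio):
    """Smallest avx_fraction at which net_speedup() reaches 1.0.

    Derived by solving clock_ratio == (1 - f) + f / avx_speedup for f.
    Only meaningful when avx_speedup > 1.
    """
    if avx_speedup <= 1.0:
        raise ValueError("avx_speedup must be greater than 1")
    return (1.0 - clock_ratio) / (1.0 - 1.0 / avx_speedup)


if __name__ == "__main__":
    # Hypothetical figures: lookup is twice as fast, clock drops to 85%.
    for fraction in (0.1, 0.3, 0.6):
        print(fraction, net_speedup(fraction, 2.0, 0.85))
    print("break-even fraction:", break_even_fraction(2.0, 0.85))
```

With these made-up figures the model only breaks even once roughly 30% of the
per-packet cycles are in the vectorised lookup; below that, the slower clock on
the non-avx512 code dominates, which is the concern raised above. Measured
numbers from the actual PMD would be needed to say which side a given workload
falls on.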
