Re: [ovs-dev] [PATCH v2 5/5] dpif-lookup: add avx512 gather implementation

Van Haaren, Harry Thu, 21 May 2020 10:11:08 -0700

Hey All,

[OT: Apologies for a missing indent, some HTML mixup occurred somewhere, now 
plain-text email again.]

>From: Federico Iezzi <[email protected]>
>Sent: Wednesday, May 20, 2020 5:13 PM
>To: William Tu <[email protected]>
>Cc: Van Haaren, Harry <[email protected]>; [email protected]; 
>[email protected]
>Subject: Re: [ovs-dev] [PATCH v2 5/5] dpif-lookup: add avx512 gather 
>implementation
>
>On Wed, 20 May 2020 at 15:32, William Tu <[email protected]> wrote:
>On Wed, May 20, 2020 at 3:35 AM Federico Iezzi <[email protected]> wrote:
>> On Wed, 20 May 2020 at 12:20, Van Haaren, Harry <[email protected]> 
>> wrote:
>>>
>>> > -----Original Message-----
>>> > From: William Tu <[email protected]>
>>> > Sent: Wednesday, May 20, 2020 1:12 AM
>>> > To: Van Haaren, Harry <[email protected]>
>>> > Cc: [email protected]; [email protected]
>>> > Subject: Re: [ovs-dev] [PATCH v2 5/5] dpif-lookup: add avx512 gather
>>> > implementation
>>> >
>>> > On Mon, May 18, 2020 at 9:12 AM Van Haaren, Harry
>>> > <[email protected]> wrote:
>>> > >
>>> > > > -----Original Message-----
>>> > > > From: William Tu <[email protected]>
>>> > > > Sent: Monday, May 18, 2020 3:58 PM
>>> > > > To: Van Haaren, Harry <[email protected]>
>>> > > > Cc: [email protected]; [email protected]
>>> > > > Subject: Re: [ovs-dev] [PATCH v2 5/5] dpif-lookup: add avx512 gather
>>> > > > implementation
>>> > > >
>>> > > > On Wed, May 06, 2020 at 02:06:09PM +0100, Harry van Haaren wrote:
>>> > > > > This commit adds an AVX-512 dpcls lookup implementation.
>>> > > > > It uses the AVX-512 SIMD ISA to perform multiple miniflow
>>> > > > > operations in parallel.
>>>
>>> <snip lots of code/patch contents for readability>
>>>
>>> > Hi Harry,
>>> >
>>> > I managed to find a machine with avx512 in google cloud and did some
>>> > performance testing. I saw lower performance when enabling avx512,
>>
>>
>> AVX512 instruction path lowers the clock speed well below the base frequency 
>> [1].
>> Aren't you killing the PMD performance while improving the lookup ones?
>>
>> [1] 
>> https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/2nd-gen-xeon-scalable-spec-update.pdf
>>  (see page 20)

Thanks for raising your question; others likely have similar ones.
It will be good to discuss it here and to present the logic and design
taken in these OVS patches for enabling AVX512.

From a frequency perspective, there is a misconception that AVX512 will
always cause the worst-case degradation. For example, the frequency
differs based on which instructions are executing. This does make it more
complicated, but there are rules here, and those rules provide us SW
developers with best practices. I've added my colleague Edwin on CC, who
is much more familiar with the AVX512 frequency topic and can provide
more detail.

From an OVS software developer perspective, these were the design
decisions that made AVX512 enabling work:
AVX512 provides a very powerful compute ISA, so to optimize with it we
must use that compute efficiently. This patchset "flattens" the packet's
miniflow data-structure, based on the miniflow of the subtable to match
on. In short, it implements in SIMD the tuple-space-search required for
DPCLS wildcarded lookup. The instruction count reduction is large, and
that's what ultimately leads to the performance improvements.
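
To make "flattening" concrete, here is a minimal scalar sketch in C of the
idea described above. It is not the code from the patch and it uses no
AVX512. The struct mf_sketch type and the function names are hypothetical,
and the miniflow layout is simplified to a single 64-bit bitmap followed by
packed 64-bit blocks. For every block the subtable matches on, it picks the
packet's value for that block (or 0 if the packet does not carry it) and
applies the subtable mask.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified miniflow: bit i of 'map' set means 64-bit
 * block i of the full flow is present. Present blocks are packed in
 * 'values' in ascending bit order. */
struct mf_sketch {
    uint64_t map;
    const uint64_t *values;
};

/* Portable population count: number of set bits in x. */
static unsigned popcount64(uint64_t x)
{
    unsigned n = 0;
    while (x) {
        x &= x - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* For each block set in the subtable mask's map, fetch the packet's
 * value for that block (0 if the packet lacks it), AND it with the
 * subtable's mask value, and store it in out[]. Returns the number of
 * blocks written, which equals the popcount of mask->map. */
static size_t flatten_and_mask(const struct mf_sketch *pkt,
                               const struct mf_sketch *mask,
                               uint64_t *out)
{
    assert(pkt && mask && out);
    size_t n = 0;
    uint64_t bits = mask->map;

    while (bits) {
        uint64_t lowest = bits & (~bits + 1);   /* isolate lowest set bit */
        uint64_t v = 0;

        if (pkt->map & lowest) {
            /* Index into the packed values = number of present blocks
             * below this one. */
            v = pkt->values[popcount64(pkt->map & (lowest - 1))];
        }
        out[n] = v & mask->values[n];
        n++;
        bits &= bits - 1;
    }
    return n;
}

/* A rule matches when every flattened, masked block equals the rule's
 * stored block. Returns 1 on match, 0 otherwise. */
static int rule_matches(const uint64_t *flat, const uint64_t *rule, size_t n)
{
    uint64_t diff = 0;
    for (size_t i = 0; i < n; i++) {
        diff |= flat[i] ^ rule[i];
    }
    return diff == 0;
}
```

The loop body (bit test, popcount, indexed load, AND) is the kind of
per-block work that a SIMD gather can do for several blocks at once, which
is where an instruction count reduction would come from.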

Given a DPCLS implementation with AVX512, we must consider the other work
done on that thread. You correctly point out that other work (e.g. DPDK
PMDs) also executes on that core. My experience has been that performance
goes up, including DPDK PMD rx and tx: the overall rate of work done
increases. Given that OVS can spend significant amounts of time in DPCLS
itself, the net result is very likely still a performance improvement,
even with any potential slowdown of the PMD code.
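
The trade-off above can be put into a rough back-of-envelope model. The
C sketch below is only an illustration under simple assumptions: the whole
core runs at the reduced frequency, and only the DPCLS cycle count changes.
The function name and all input values are hypothetical, not measurements
from this patchset.

```c
#include <assert.h>

/* Relative packet rate (new rate / old rate) after enabling a faster
 * lookup that may also lower the core frequency.
 *
 *   dpcls_share   - fraction of per-packet cycles spent in DPCLS before
 *                   the change, in [0, 1]
 *   dpcls_speedup - factor by which DPCLS cycles shrink (> 0; 2.0 means
 *                   half the cycles)
 *   freq_ratio    - new core frequency divided by old frequency (> 0;
 *                   below 1.0 if the core is downclocked)
 *
 * All three inputs are assumptions to be measured per deployment. */
static double relative_rate(double dpcls_share, double dpcls_speedup,
                            double freq_ratio)
{
    assert(dpcls_share >= 0.0 && dpcls_share <= 1.0);
    assert(dpcls_speedup > 0.0 && freq_ratio > 0.0);

    /* Per-packet cycles after the change, normalized to 1.0 before. */
    double new_cycles = (1.0 - dpcls_share) + dpcls_share / dpcls_speedup;

    /* Rate is proportional to frequency / cycles per packet. */
    return freq_ratio / new_cycles;
}
```

With these made-up inputs, a thread spending 40% of its cycles in DPCLS,
a 2x DPCLS cycle reduction and a 10% frequency drop would still come out
ahead (relative_rate(0.4, 2.0, 0.9) is above 1.0), while a thread that
spends no time in DPCLS would simply lose the 10%. Real numbers need to
come from testing on the deployment in question.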

Finally, the design itself is very flexible: each deployment of OVS can
test whether, and by how much, the AVX512 code-path improves real-world
performance, and enable it based on that.

>Thanks for sharing the link.
>Does that mean if OVS PMD uses avx512 on one core, then all the other cores'
>frequency will be lower?
>
>Only where avx512 instructions are executed the clock is reduced to cope with 
>the thermals
>I'm not sure if there is a situation where avx512 code is executed only on 
>specific PMDs; if that happens it is bad, as some PMDs may be faster/slower 
>(see below)
>Kinda like when dynamic turbo boost is enabled and some pmd go faster because 
>of the higher clock
>
>
>There is some discussion here:
>https://lemire.me/blog/2018/09/07/avx-512-when-and-how-to-use-these-new-instructions/
>
>Wow, quite interesting. Thanks!
>
>
>My take is that overall down clocking will happen, but application
>will get better performance.
>
>Indeed the part of the code written for avx512 goes much faster; the rest stays 
>on the normal path and will go slower due to the reduced clock.
>Those are different use-cases and programs but see Cannon Lake Anandtech 
>review regarding what AVX512 can deliver
>
>###
>When we crank on the AVX2 and AVX512, there is no stopping the Cannon Lake 
>chip here. At a score of 4519, it beats a full 18-core Core i9-7980XE 
>processor running in non-AVX.
>https://www.anandtech.com/show/13405/intel-10nm-cannon-lake-and-core-i3-8121u-deep-dive-review/9
>###
>
>Indeed you have to expect much-improved performance from it, the question is 
>how much non-avx512 code will slow down
>See also this one -> 
>https://travisdowns.github.io/blog/2020/01/17/avxfreq1.html

There's a lot of information out there (some of it very detailed), and
it's useful to read it. Ultimately, it is very unlikely that somebody has
tested your exact configuration or deployment, particularly since this OVS
patchset has only been on the mailing list for the past few weeks. I
welcome "perf top" output like in William's email, showing the CPU %'s
spent in DPCLS; the more real-world data the better for showing the value
of AVX512 in DPCLS.

Regards, -Harry
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
