> -----Original Message-----
> From: Eelco Chaudron <[email protected]>
> Sent: Thursday, July 14, 2022 2:25 PM
> To: Van Haaren, Harry <[email protected]>
> Cc: [email protected]; [email protected]; Amber, Kumar
> <[email protected]>; Pai G, Sunil <[email protected]>; Finn, Emma
> <[email protected]>; Stokes, Ian <[email protected]>
> Subject: Re: [PATCH v10 10/10] odp-execute: Add ISA implementation of
> set_masked
> IPv4 action
>
> > From: Emma Finn <[email protected]>
> >
> > This commit adds support for the AVX512 implementation of the
> > ipv4_set_addrs action as well as an AVX512 implementation of
> > updating the checksums.
<snip>
> > + /* Update the IP checksum based on updated IP values. */
> > + uint16_t delta = avx512_ipv4_update_csum(v_res, v_packet);
> > + uint32_t new_csum = old_csum + delta;
> > + delta = csum_finish(new_csum);
> > +
> > + /* Insert new checksum. */
> > + v_res = _mm256_insert_epi16(v_res, delta, 5);
> > +
> > + /* If ip_src or ip_dst has been modified, L4 checksum needs to
> > + * be updated too. */
> > + if (mask->ipv4_src || mask->ipv4_dst) {
> > +
> > + uint16_t delta_checksum = avx512_l4_update_csum(v_packet,
> > v_res);
> > +
>
> Wondering if all this AVX code being executed really is faster than
> recalc_csum32(uh-
> >udp_csum, old_addr, new_addr)?
Ultimately, measuring is worth more than talking about it. In our measurements
here,
yes absolutely it is, our measurements are available in the cover letter of the
patchset.
Note that the code here is compute-bound, its juggling values between
registers, and
with XMM/YMM registers, SIMD IPC of 3 can be achieved. That means that in
theory,
the SIMD code executes ~3 intrinsics *per cycle*, but in practice the IPC is
often *more*
due to interleaved scalar code, and Out-of-Order execution capabilities of the
CPU.
Although the code is verbose (lots of typing) the resulting instruction stream
is generally
optimized very well by the compiler, and reduced to very small, dense and hot
loops.
I recommend using "perf top" to investigate the hotspots, for those unaware of
tools
and methods, a DPDK Userspace presentation covers exactly this using OVS DPCLS
as
the examples code! https://youtu.be/ZmwOKR5JyPk
Regards, -Harry
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev