> -----Original Message----- > From: Eelco Chaudron <[email protected]> > Sent: Wednesday, June 30, 2021 2:12 PM > To: Amber, Kumar <[email protected]>; Van Haaren, Harry > <[email protected]> > Cc: [email protected]; [email protected]; Flavio Leitner > <[email protected]>; > Stokes, Ian <[email protected]> > Subject: Re: [ovs-dev] [v4 10/12] dpif-netdev/mfex: Add AVX512 based optimized > miniflow extract > > This patch was an interesting patch to review and being reminded about > endianness, > and this site, > https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm512_maskz > _permutexvar_epi8&expand=4315, got me through it ;)
Hah, yes the Intrinsics Guide is very useful for reading/investigating what/how instructions can do. Its... almost always open in a browser in some tab here! :) > Some comments below... > > //Eelco Thanks for review, I'll snip away large chunks of code to reduce verbosity. Regards, -Harry > On 17 Jun 2021, at 18:27, Kumar Amber wrote: > > > From: Harry van Haaren <[email protected]> <snip> > > +/* AVX512-BW level permutex2var_epi8 emulation. */ > > +static inline __m512i > > +__attribute__((target("avx512bw"))) > > Are these targets universal enough for all supported compilers, if not we > might need > to move them to individual macros in compile.h. Yes, these are the standard gcc/clang etc compiler -m <isa level> switches. Search for "-mavx512bw" on e.g. this GCC page, lists them all; https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html If a compiler does not understand them, we will have to #ifdef that compiler out, as it just doesn't support the ISA. > > +/* Static const instances of profiles. These are compile-time constants, > > + * and are specialized into individual miniflow-extract functions. > > + */ > > +static const struct mfex_profile mfex_profiles[PROFILE_COUNT] = > > +{ > > + [PROFILE_ETH_IPV4_UDP] = { > > + .probe_mask.u8_data = { PATTERN_ETHERTYPE_MASK PATTERN_IPV4_MASK > }, > > + .probe_data.u8_data = { PATTERN_ETHERTYPE_IPV4 PATTERN_IPV4_UDP}, > > + > > + .store_shuf.u8_data = { PATTERN_IPV4_UDP_SHUFFLE }, > > + .store_kmsk = PATTERN_IPV4_UDP_KMASK, > > + > > + .mf_bits = { 0x18a0000000000000, 0x0000000000040401}, > > I did some manual translation from these bits, to parts of the flow structure > they > represent, but it was not something fun to do. Maybe you still have your > notes and > could add some to the code? It might help debugging? Agree that these bits are "arbitrary" to some degree, they're offsets into the miniflow datastructure, with each bit representing 8-bytes of data. These are derived from the output of the autovalidator, which prints "good" and "test" values. <snip> > As we are explicitly manual defining the mf_bits I think we also need to > update the > comment in the “struct flow” definition to reflect that if the order change > these > specific functions need updating also. There's an "ABI Macro" in that struct, we can throw one of those build-time asserts into here too to be "extra sure", but this would be caught by running MFEX autovalidation unit tests. <snip> > > +/* Generic loop to process any mfex profile. This code is specialized into > > + * multiple actual MFEX implementation functions. Its marked ALWAYS_INLINE > > + * to ensure the compiler specializes each instance. The code is marked > > "hot" > > + * to inform the compiler this is a hotspot in the program, encouraging > > + * inlining of callee functions such as the permute calls. > > + */ > > +static inline uint32_t ALWAYS_INLINE > > +__attribute__ ((hot)) > > Do we need to move this to a macro in compiler.h as OVS_HOT to make sure it’s > not > causing issues on other compilers like windows, etc? I'm not sure, we could I suppose, I'm not strongly for or against. Today this patchset doesn't modify compiler.h at all, perhaps cleaner to update in a later patch, and consider other functions for tagging with OVS_HOT too in that patchset? <snip> > > + /* Copy known dp packet offsets to the dp_packet instance. */ > > + memcpy(&packet->l2_pad_size, &profile->dp_pkt_offs, > > + sizeof(uint16_t) * 4); > > + > > Here we copy four fields to the packet structure (l2_pad_size, l2_5_ofs, > l3_ofs, > l4_ofs). I think we should add some static_assert to make sure the order of > these > fields do not change. Yes, I think Flavio had a similar comment in one of the reviews. Good point, has been addressed with BUILD_ASSERT_DELC() and offsets into struct by Amber. <snip to end> _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
