Replies inline prefixed with [HVH] (Topic seems to have went formatted email, can't remove without losing context.)
From: Eelco Chaudron <[email protected]> Sent: Friday, July 9, 2021 10:55 AM To: Van Haaren, Harry <[email protected]> Cc: Ferriter, Cian <[email protected]>; [email protected]; [email protected]; [email protected]; Amber, Kumar <[email protected]>; Stokes, Ian <[email protected]> Subject: Re: [v6 10/11] dpif-netdev/mfex: Add AVX512 based optimized miniflow extract On 8 Jul 2021, at 17:08, Van Haaren, Harry wrote: -----Original Message----- From: Eelco Chaudron <[email protected]<mailto:[email protected]>> Sent: Thursday, July 8, 2021 2:42 PM To: Ferriter, Cian <[email protected]<mailto:[email protected]>> Cc: [email protected]<mailto:[email protected]>; [email protected]<mailto:[email protected]>; [email protected]<mailto:[email protected]>; Van Haaren, Harry <[email protected]<mailto:[email protected]>>; Amber, Kumar <[email protected]<mailto:[email protected]>>; Stokes, Ian <[email protected]<mailto:[email protected]>> Subject: Re: [v6 10/11] dpif-netdev/mfex: Add AVX512 based optimized miniflow extract On 6 Jul 2021, at 15:11, Cian Ferriter wrote: From: Harry van Haaren <[email protected]<mailto:[email protected]>> This commit adds AVX512 implementations of miniflow extract. By using the 64 bytes available in an AVX512 register, it is possible to convert a packet to a miniflow data-structure in a small quantity instructions. The implementation here probes for Ether()/IP()/UDP() traffic, and builds the appropriate miniflow data-structure for packets that match the probe. The implementation here is auto-validated by the miniflow extract autovalidator, hence its correctness can be easily tested and verified. Note that this commit is designed to easily allow addition of new traffic profiles in a scalable way, without code duplication for each traffic profile. Signed-off-by: Harry van Haaren <[email protected]<mailto:[email protected]>> --- v5: <SNIP> + +BUILD_ASSERT_DECL((OFFSETOFEND(struct dp_packet, l4_ofs) + - offsetof(struct dp_packet, l2_pad_size)) == + MEMBER_SIZEOF(struct mfex_profile, dp_pkt_offs)); + +/* Any change in flow abi should be inluded here otherwise build should included + * fail. + */ +BUILD_ASSERT_DECL(FLOW_WC_SEQ == 42); I was discussing this assert offline, and some issues were raised as a lot of people build on top of OVS, and add new protocols, hence changing the FLOW_WC_SEQ. It would be good to add more details on what to change/check before you increase the value here. For example, which fields in the in mfex_profile structure, or what changes to the flow structure require changes in the AVX code. On top of this, a solution could also be to allow for compile-time disabling of MFEX for people that do their own hacking of OVS? Thoughts? I think of the FLOW_WC_SEQ as an "ABI for miniflow" counter. If we break the miniflow ABI for a given packet, then all mappings from packet data to miniflow need to be updated, or there is inconsistency. All mappings includes scalar miniflow_extract(), and potentially optimized SIMD implementations, it depends on the change/breakage. Simply put, if the MFEX autovalidator fails after changes, then the MFEX code needs to be updated. The problem is that not everybody has an AVX512 machine, so it might be impossible to run. Even the OVS GitHub hooks will not run the AVX512 part :) [HVH] I believe there's some ongoing discussions around CI and testing anyway, lets leave this topic for there. If a change is made to the miniflow ABI, the miniflow bits are likely to change position (if items are added in the middle of "struct flow"). Hence, the bit that previously represented IPv4 might now represent the newly hacked-in protocol. That's a break, and requires a counter bump. This is why upstream collaboration is favored across the industry, there is overhead/breakage in keeping features/code out-of-tree. While this is a nice goal, we all know this is not true in real life :( There are a lot of people out there that use OVS in the proprietary environment, and their changes are not sharable (or the community is not willing to include them). I feel that we the OVS community should not take steps towards making it easier to disable parts of code-bases internal to a project, just so others maintaining out-of-tree code might have an easier time hacking in features. Instead I encourage all to collaborate and contribute changes to OVS, and avoid the overhead of maintaining out-of-tree work. A similar stance is also taken by Linux kernel and DPDK on out-of-tree work. They want to, but it’s not how it’s in practice :( [HVH] Agreed in general, but I still hold that OVS community should not take on any burden due to 3rd parties have out-of-tree work. The fact that its maintained out-of-tree places any burden due to that fact on the 3rd party. I expect a neutral-as-possible position, where OVS is not hostile to out-of-tree changes, but also not accepting burden due to out-of-tree changes. MFEX opts are not enabled by default, so nothing will *break* due to the inclusion of this code. If somebody first hacks in out-of-tree changes, breaks things, and then blindly enables a specific MFEX impl without testing, then things can break yes. But note that running MFEX autovalidation will flag breakage: so even in this corner-case nothing will break silently. I think we are wasting time discussing stuff here, where we should try to be inclusive, as not many OVS users/developers might have access to AVZ512 hardware. I would suggest adding some documentation around this. Maybe something like this: /* If the above build assert happens, this means that you might need to make * some modifications to the AVX512 miniflow extractor code. In general, the * AVX512 flow extractor code uses hardcoded miniflow->map->bits which are * defined in the mfex_profile structure as mf_bits. In addition to the * hardcoded bits, it also has hardcoded offsets/masks that tell the AVX512 * code how to translate packet data in the required miniflow values. These * are stored in the mfex_profile structure as store_shuf and store_kmsk. * See the respective documentation on their usage. * * If you have made changes to the flow structure, but only additions, no * re-arranging of the actual members, you might be good to go. To be 100% * sure, if possible, run the AVX512 MFEX autovalidator tests on an AVX512 * enabled machine. * * If you did make changes to the order, you have to run the autovalidator * tests on an AVX512 machine, and the errors will help you determine what to * change. * * In case your change increased the maximum size of the map, i.e., * UFLOWMAP_UNITS, you need to study the code as it will need some rewriting. * * If you are not using the AVX512 MFEX implementation at all, i.e. keeping it * to the default scalar implementation, see "ovs-appctl * dpif-netdev/miniflow-parser-get", you could ignore this assert, and just * increase the number. */ In addition I think the following assert should also be added to the code: BUILD_ASSERT_DECL(UFLOWMAP_UNITS == 2); And the following should use the variable, so it’s also clear where it’s used: /* Constant data to set in mf.bits and dp_packet data on hit. */ uint64_t mf_bits[UFLOWMAP_UNITS]; [HVH] Both doc and code changes sound good, thanks for review & suggestion. //Eelco _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
