Replies inline prefixed with [HVH] (Topic seems to have went formatted email, 
can't remove without losing context.)

From: Eelco Chaudron <[email protected]>
Sent: Friday, July 9, 2021 10:55 AM
To: Van Haaren, Harry <[email protected]>
Cc: Ferriter, Cian <[email protected]>; [email protected]; 
[email protected]; [email protected]; Amber, Kumar <[email protected]>; 
Stokes, Ian <[email protected]>
Subject: Re: [v6 10/11] dpif-netdev/mfex: Add AVX512 based optimized miniflow 
extract


On 8 Jul 2021, at 17:08, Van Haaren, Harry wrote:

-----Original Message-----
From: Eelco Chaudron <[email protected]<mailto:[email protected]>>
Sent: Thursday, July 8, 2021 2:42 PM
To: Ferriter, Cian <[email protected]<mailto:[email protected]>>
Cc: [email protected]<mailto:[email protected]>; 
[email protected]<mailto:[email protected]>; 
[email protected]<mailto:[email protected]>; Van
Haaren, Harry <[email protected]<mailto:[email protected]>>; 
Amber, Kumar
<[email protected]<mailto:[email protected]>>; Stokes, Ian 
<[email protected]<mailto:[email protected]>>
Subject: Re: [v6 10/11] dpif-netdev/mfex: Add AVX512 based optimized miniflow
extract

On 6 Jul 2021, at 15:11, Cian Ferriter wrote:

From: Harry van Haaren 
<[email protected]<mailto:[email protected]>>

This commit adds AVX512 implementations of miniflow extract.
By using the 64 bytes available in an AVX512 register, it is
possible to convert a packet to a miniflow data-structure in
a small quantity instructions.

The implementation here probes for Ether()/IP()/UDP() traffic,
and builds the appropriate miniflow data-structure for packets
that match the probe.

The implementation here is auto-validated by the miniflow
extract autovalidator, hence its correctness can be easily
tested and verified.

Note that this commit is designed to easily allow addition of new
traffic profiles in a scalable way, without code duplication for
each traffic profile.

Signed-off-by: Harry van Haaren 
<[email protected]<mailto:[email protected]>>

---

v5:

<SNIP>

+
+BUILD_ASSERT_DECL((OFFSETOFEND(struct dp_packet, l4_ofs)
+ - offsetof(struct dp_packet, l2_pad_size)) ==
+ MEMBER_SIZEOF(struct mfex_profile, dp_pkt_offs));
+
+/* Any change in flow abi should be inluded here otherwise build should

included

+ * fail.
+ */
+BUILD_ASSERT_DECL(FLOW_WC_SEQ == 42);

I was discussing this assert offline, and some issues were raised as a lot of 
people
build on top of OVS, and add new protocols, hence changing the FLOW_WC_SEQ.

It would be good to add more details on what to change/check before you
increase the value here. For example, which fields in the in mfex_profile
structure, or what changes to the flow structure require changes in the AVX
code.

On top of this, a solution could also be to allow for compile-time disabling of
MFEX for people that do their own hacking of OVS?

Thoughts?

I think of the FLOW_WC_SEQ as an "ABI for miniflow" counter.
If we break the miniflow ABI for a given packet, then all mappings from
packet data to miniflow need to be updated, or there is inconsistency.
All mappings includes scalar miniflow_extract(), and potentially optimized
SIMD implementations, it depends on the change/breakage.

Simply put, if the MFEX autovalidator fails after changes, then the MFEX
code needs to be updated.

The problem is that not everybody has an AVX512 machine, so it might be 
impossible to run. Even the OVS GitHub hooks will not run the AVX512 part :)

[HVH] I believe there's some ongoing discussions around CI and testing anyway, 
lets leave this topic for there.

If a change is made to the miniflow ABI, the miniflow bits are likely to
change position (if items are added in the middle of "struct flow").
Hence, the bit that previously represented IPv4 might now represent
the newly hacked-in protocol. That's a break, and requires a counter bump.

This is why upstream collaboration is favored across the industry,
there is overhead/breakage in keeping features/code out-of-tree.

While this is a nice goal, we all know this is not true in real life :( There 
are a lot of people out there that use OVS in the proprietary environment, and 
their changes are not sharable (or the community is not willing to include 
them).

I feel that we the OVS community should not take steps towards making
it easier to disable parts of code-bases internal to a project, just so others 
maintaining
out-of-tree code might have an easier time hacking in features. Instead I 
encourage
all to collaborate and contribute changes to OVS, and avoid the overhead of 
maintaining
out-of-tree work. A similar stance is also taken by Linux kernel and DPDK on 
out-of-tree work.

They want to, but it’s not how it’s in practice :(

[HVH] Agreed in general, but I still hold that OVS community should not take on 
any burden due to 3rd parties have out-of-tree work. The fact that its 
maintained out-of-tree places any burden due to that fact on the 3rd party. I 
expect a neutral-as-possible position, where OVS is not hostile to out-of-tree 
changes, but also not accepting burden due to out-of-tree changes.

MFEX opts are not enabled by default, so nothing will *break* due to the 
inclusion of
this code. If somebody first hacks in out-of-tree changes, breaks things, and 
then
blindly enables a specific MFEX impl without testing, then things can break yes.
But note that running MFEX autovalidation will flag breakage: so even in this 
corner-case
nothing will break silently.

I think we are wasting time discussing stuff here, where we should try to be 
inclusive, as not many OVS users/developers might have access to AVZ512 
hardware.
I would suggest adding some documentation around this. Maybe something like 
this:

/* If the above build assert happens, this means that you might need to make

  *   some modifications to the AVX512 miniflow extractor code. In general, the
  *   AVX512 flow extractor code uses hardcoded miniflow->map->bits which are
  *   defined in the mfex_profile structure as mf_bits. In addition to the
  *   hardcoded bits, it also has hardcoded offsets/masks that tell the AVX512
  *   code how to translate packet data in the required miniflow values. These
  *   are stored in the mfex_profile structure as store_shuf and store_kmsk.
  *   See the respective documentation on their usage.
  *
  *   If you have made changes to the flow structure, but only additions, no
  *   re-arranging of the actual members, you might be good to go. To be 100%
  *   sure, if possible, run the AVX512 MFEX autovalidator tests on an AVX512
  *   enabled machine.
  *
  *   If you did make changes to the order, you have to run the autovalidator
  *   tests on an AVX512 machine, and the errors will help you determine what to
  *   change.
  *
  *   In case your change increased the maximum size of the map, i.e.,
  *   UFLOWMAP_UNITS, you need to study the code as it will need some rewriting.
  *
  *   If you are not using the AVX512 MFEX implementation at all, i.e. keeping 
it
  *   to the default scalar implementation, see "ovs-appctl
  *   dpif-netdev/miniflow-parser-get", you could ignore this assert, and just
  *   increase the number.

*/

In addition I think the following assert should also be added to the code:

BUILD_ASSERT_DECL(UFLOWMAP_UNITS == 2);

And the following should use the variable, so it’s also clear where it’s used:

/* Constant data to set in mf.bits and dp_packet data on hit. */
uint64_t mf_bits[UFLOWMAP_UNITS];

[HVH] Both doc and code changes sound good, thanks for review & suggestion.

//Eelco
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to