Re: [ovs-dev] [PATCH v9 4/5] dpif-netdev: refactor generic implementation

2019-05-28 Thread Ilya Maximets
Not a full review. See inline.

Best regards, Ilya Maximets.

On 08.05.2019 18:13, Harry van Haaren wrote:
> This commit refactors the generic implementation. The
> goal of this refactor is to simply the code to enable
> "specialization" of the functions at compile time.
> 
> Given compile-time optimizations, the compiler is able
> to unroll loops, and create optimized code sequences due
> to compile time knowledge of loop-trip counts.
> 
> In order to enable these compiler optimizations, we must
> refactor the code to pass the loop-trip counts to functions
> as compile time constants.
> 
> This patch allows the number of miniflow-bits set per "unit"
> in the miniflow to be passed around as a function argument.
> 
> Note that this patch does NOT yet take advantage of doing so,
> this is only a refactor to enable it in the next patches.
> 
> Signed-off-by: Harry van Haaren 
> 
> ---
> 
> v9:
> - Use count_1bits in favour of __builtin_popcount (Ilya)
> - Use ALWAYS_INLINE instead of __attribute__ synatx (Ilya)
> 
> v8:
> - Rework block_cache and mf_masks to avoid variable-lenght array
>   due to compiler issues. Provisioning for worst case is not a
>   good solution due to magnitue of over-provisioning required.
> - Rework netdev_flatten function removing unused parameter
> ---
>  lib/dpif-netdev-lookup-generic.c | 239 ---
>  lib/dpif-netdev.c|  79 +-
>  lib/dpif-netdev.h|  20 ++-
>  3 files changed, 283 insertions(+), 55 deletions(-)
> 
> diff --git a/lib/dpif-netdev-lookup-generic.c 
> b/lib/dpif-netdev-lookup-generic.c
> index 2e4003408..901d28ff0 100644
> --- a/lib/dpif-netdev-lookup-generic.c
> +++ b/lib/dpif-netdev-lookup-generic.c
> @@ -28,67 +28,204 @@
>  #include "packets.h"
>  #include "pvector.h"
>  
> -/* Returns a hash value for the bits of 'key' where there are 1-bits in
> - * 'mask'. */
> -static inline uint32_t
> -netdev_flow_key_hash_in_mask(const struct netdev_flow_key *key,
> - const struct netdev_flow_key *mask)
> +VLOG_DEFINE_THIS_MODULE(dpif_lookup_generic);
> +
> +/* netdev_flow_key_flatten_unit:
> + * Given a packet, table and mf_masks, this function iterates over each bit
> + * set in the subtable, and calculates the appropriate metadata to store in 
> the
> + * blocks_scratch[].
> + *
> + * The results of the blocks_scratch[] can be used for hashing, and later for
> + * verification of if a rule matches the given packet.
> + */
> +static inline void
> +netdev_flow_key_flatten_unit(const uint64_t * restrict pkt_blocks,
> + const uint64_t * restrict tbl_blocks,
> + const uint64_t * restrict mf_masks,
> + uint64_t * restrict blocks_scratch,
> + const uint64_t pkt_mf_bits,
> + const uint32_t count)
>  {
> -const uint64_t *p = miniflow_get_values(>mf);
> -uint32_t hash = 0;
> -uint64_t value;
> +uint32_t i;
> +for (i = 0; i < count; i++) {
> +uint64_t mf_mask = mf_masks[i];
> +/* Calculate the block index for the packet metadata */
> +uint64_t idx_bits = mf_mask & pkt_mf_bits;
> +const uint32_t pkt_idx = count_1bits(idx_bits);
>  
> -NETDEV_FLOW_KEY_FOR_EACH_IN_FLOWMAP (value, key, mask->mf.map) {
> -hash = hash_add64(hash, value & *p);
> -p++;
> +/* check if the packet has the subtable miniflow bit set. If yes, the
> + * block at the above pkt_idx will be stored, otherwise it is masked
> + * out to be zero.
> + */
> +uint64_t pkt_has_mf_bit = (mf_mask + 1) & pkt_mf_bits;
> +uint64_t no_bit = ((!pkt_has_mf_bit) > 0) - 1;
> +
> +/* mask packet block by table block, and mask to zero if packet
> + * doesn't actually contain this block of metadata
> + */
> +blocks_scratch[i] = pkt_blocks[pkt_idx] & tbl_blocks[i] & no_bit;
>  }
> +}
> +
> +/* netdev_flow_key_flatten:
> + * This function takes a packet, and subtable and writes an array of uint64_t
> + * blocks. The blocks contain the metadata that the subtable matches on, in
> + * the same order as the subtable, allowing linear iteration over the blocks.
> + *
> + * To calculate the blocks contents, the netdev_flow_key_flatten_unit 
> function
> + * is called twice, once for each "unit" of the miniflow. This call can be
> + * inlined by the compiler for performance.
> + *
> + * Note that the u0_count and u1_count variables can be compile-time 
> constants,
> + * allowing the loop in the inlined flatten_unit() function to be 
> compile-time
> + * unrolled, or possibly removed totally by unrolling by the loop iterations.
> + * The compile time optimizations enabled by this design improves 
> performance.
> + */
> +static inline void
> +netdev_flow_key_flatten(const struct netdev_flow_key * restrict key,
> +const struct netdev_flow_key * 

[ovs-dev] [PATCH v9 4/5] dpif-netdev: refactor generic implementation

2019-05-08 Thread Harry van Haaren
This commit refactors the generic implementation. The
goal of this refactor is to simply the code to enable
"specialization" of the functions at compile time.

Given compile-time optimizations, the compiler is able
to unroll loops, and create optimized code sequences due
to compile time knowledge of loop-trip counts.

In order to enable these compiler optimizations, we must
refactor the code to pass the loop-trip counts to functions
as compile time constants.

This patch allows the number of miniflow-bits set per "unit"
in the miniflow to be passed around as a function argument.

Note that this patch does NOT yet take advantage of doing so,
this is only a refactor to enable it in the next patches.

Signed-off-by: Harry van Haaren 

---

v9:
- Use count_1bits in favour of __builtin_popcount (Ilya)
- Use ALWAYS_INLINE instead of __attribute__ synatx (Ilya)

v8:
- Rework block_cache and mf_masks to avoid variable-lenght array
  due to compiler issues. Provisioning for worst case is not a
  good solution due to magnitue of over-provisioning required.
- Rework netdev_flatten function removing unused parameter
---
 lib/dpif-netdev-lookup-generic.c | 239 ---
 lib/dpif-netdev.c|  79 +-
 lib/dpif-netdev.h|  20 ++-
 3 files changed, 283 insertions(+), 55 deletions(-)

diff --git a/lib/dpif-netdev-lookup-generic.c b/lib/dpif-netdev-lookup-generic.c
index 2e4003408..901d28ff0 100644
--- a/lib/dpif-netdev-lookup-generic.c
+++ b/lib/dpif-netdev-lookup-generic.c
@@ -28,67 +28,204 @@
 #include "packets.h"
 #include "pvector.h"
 
-/* Returns a hash value for the bits of 'key' where there are 1-bits in
- * 'mask'. */
-static inline uint32_t
-netdev_flow_key_hash_in_mask(const struct netdev_flow_key *key,
- const struct netdev_flow_key *mask)
+VLOG_DEFINE_THIS_MODULE(dpif_lookup_generic);
+
+/* netdev_flow_key_flatten_unit:
+ * Given a packet, table and mf_masks, this function iterates over each bit
+ * set in the subtable, and calculates the appropriate metadata to store in the
+ * blocks_scratch[].
+ *
+ * The results of the blocks_scratch[] can be used for hashing, and later for
+ * verification of if a rule matches the given packet.
+ */
+static inline void
+netdev_flow_key_flatten_unit(const uint64_t * restrict pkt_blocks,
+ const uint64_t * restrict tbl_blocks,
+ const uint64_t * restrict mf_masks,
+ uint64_t * restrict blocks_scratch,
+ const uint64_t pkt_mf_bits,
+ const uint32_t count)
 {
-const uint64_t *p = miniflow_get_values(>mf);
-uint32_t hash = 0;
-uint64_t value;
+uint32_t i;
+for (i = 0; i < count; i++) {
+uint64_t mf_mask = mf_masks[i];
+/* Calculate the block index for the packet metadata */
+uint64_t idx_bits = mf_mask & pkt_mf_bits;
+const uint32_t pkt_idx = count_1bits(idx_bits);
 
-NETDEV_FLOW_KEY_FOR_EACH_IN_FLOWMAP (value, key, mask->mf.map) {
-hash = hash_add64(hash, value & *p);
-p++;
+/* check if the packet has the subtable miniflow bit set. If yes, the
+ * block at the above pkt_idx will be stored, otherwise it is masked
+ * out to be zero.
+ */
+uint64_t pkt_has_mf_bit = (mf_mask + 1) & pkt_mf_bits;
+uint64_t no_bit = ((!pkt_has_mf_bit) > 0) - 1;
+
+/* mask packet block by table block, and mask to zero if packet
+ * doesn't actually contain this block of metadata
+ */
+blocks_scratch[i] = pkt_blocks[pkt_idx] & tbl_blocks[i] & no_bit;
 }
+}
+
+/* netdev_flow_key_flatten:
+ * This function takes a packet, and subtable and writes an array of uint64_t
+ * blocks. The blocks contain the metadata that the subtable matches on, in
+ * the same order as the subtable, allowing linear iteration over the blocks.
+ *
+ * To calculate the blocks contents, the netdev_flow_key_flatten_unit function
+ * is called twice, once for each "unit" of the miniflow. This call can be
+ * inlined by the compiler for performance.
+ *
+ * Note that the u0_count and u1_count variables can be compile-time constants,
+ * allowing the loop in the inlined flatten_unit() function to be compile-time
+ * unrolled, or possibly removed totally by unrolling by the loop iterations.
+ * The compile time optimizations enabled by this design improves performance.
+ */
+static inline void
+netdev_flow_key_flatten(const struct netdev_flow_key * restrict key,
+const struct netdev_flow_key * restrict mask,
+const uint64_t * restrict mf_masks,
+uint64_t * restrict blocks_scratch,
+const uint32_t u0_count,
+const uint32_t u1_count)
+{
+/* load mask from subtable, mask with packet mf, popcount to get idx */
+const uint64_t *pkt_blocks