> -----Original Message-----
> From: Finn, Emma <[email protected]>
> Sent: Wednesday, January 5, 2022 4:54 PM
> To: [email protected]; Van Haaren, Harry <[email protected]>;
> Amber, Kumar <[email protected]>
> Cc: Finn, Emma <[email protected]>
> Subject: [PATCH v4 9/9] odp-execute: Add ISA implementation of push_vlan
> action.
>
> This commit adds the AVX512 implementation of the push_vlan action.
> The implementation here is auto-validated by the miniflow
> extract autovalidator, hence its correctness can be easily
> tested and verified.
>
> Signed-off-by: Emma Finn <[email protected]>
Some comments below;
> +enum OPERATION {
> + OP_ADD,
> + OP_SUB,
> +};
> +
> static inline void ALWAYS_INLINE
> -avx512_dp_packet_resize_l2(struct dp_packet *b, int increment)
> +avx512_dp_packet_resize_l2(struct dp_packet *b, int increment,
> + enum OPERATION op)
I like the more generic nature of the function here (with OP introduction)
as per internal reviews, however there might be more improvements possible.
A few suggestions & explanations below.
> {
> /* update packet size/data pointers */
> + if (increment >= 0) {
> + dp_packet_prealloc_headroom(b, increment);
> + } else {
> + ovs_assert(dp_packet_size(b) - dp_packet_l2_pad_size(b) >=
> -increment);
> + }
> +
> dp_packet_set_data(b, (char *) dp_packet_data(b) - increment);
> dp_packet_set_size(b, dp_packet_size(b) + increment);
>
> @@ -50,9 +62,9 @@ avx512_dp_packet_resize_l2(struct dp_packet *b, int
> increment)
> const __m128i v_zeros = _mm_setzero_si128();
> const __m128i v_u16_max = _mm_cmpeq_epi16(v_zeros, v_zeros);
>
> - /* Only these lanes can be incremented for push-VLAN action. */
> + /* Only these lanes can be incremented/decremented for L2. */
> const uint8_t k_lanes = 0b1110;
> - __m128i v_offset = _mm_set1_epi16(VLAN_HEADER_LEN);
> + __m128i v_offset = _mm_set1_epi16(abs(increment));
>
> /* Load packet and compare with UINT16_MAX */
> void *adjust_ptr = &b->l2_pad_size;
> @@ -60,9 +72,19 @@ avx512_dp_packet_resize_l2(struct dp_packet *b, int
> increment)
> __mmask8 k_cmp = _mm_mask_cmpneq_epu16_mask(k_lanes,
> v_adjust_src,
> v_u16_max);
>
> - /* Add VLAN_HEADER_LEN using compare mask, store results. */
> - __m128i v_adjust_wip = _mm_mask_sub_epi16(v_adjust_src, k_cmp,
> - v_adjust_src, v_offset);
> + /* Update VLAN_HEADER_LEN using compare mask, store results. */
> + __m128i v_adjust_wip;
> + switch (op) {
> + case OP_ADD:
> + v_adjust_wip = _mm_mask_add_epi16(v_adjust_src, k_cmp,
> + v_adjust_src, v_offset);
> + break;
> + case OP_SUB:
> + v_adjust_wip = _mm_mask_sub_epi16(v_adjust_src, k_cmp,
> + v_adjust_src, v_offset);
> + break;
> + }
> +
> _mm_storeu_si128(adjust_ptr, v_adjust_wip);
>
> }
> @@ -79,8 +101,7 @@ avx512_eth_pop_vlan(struct dp_packet *packet)
> __m128i v_realign = _mm_alignr_epi8(v_ether, _mm_setzero_si128(),
> 16 - VLAN_HEADER_LEN);
> _mm_storeu_si128((void *) veh, v_realign);
> - avx512_dp_packet_resize_l2(packet, -VLAN_HEADER_LEN);
> -
> + avx512_dp_packet_resize_l2(packet, -VLAN_HEADER_LEN, OP_SUB);
> }
> }
>
> @@ -96,6 +117,42 @@ action_avx512_pop_vlan(void *dp OVS_UNUSED, struct
> dp_packet_batch *batch,
> }
> }
>
> +static inline void ALWAYS_INLINE
> +avx512_eth_push_vlan(struct dp_packet *packet, ovs_be16 tpid, ovs_be16 tci)
> +{
> + avx512_dp_packet_resize_l2(packet, VLAN_HEADER_LEN, OP_ADD);
> +
> + /* Build up the VLAN TCI/TPID, and merge with the moving of Ether. */
> + char *new_pkt_data = (char *) dp_packet_data(packet);
Improve variable name to just "pkt_data".
> +
> + const uint16_t tci_proc = tci & htons(~VLAN_CFI);
> + const uint32_t tpid_tci = (tci_proc << 16) | tpid;
> + __m128i v_ether = _mm_loadu_si128((void *) new_pkt_data);
Move the load of the v_ether down to below where the rest of the SIMD code is,
to keep it all close together. Prefer to have static initialization of const
data above
the SIMD code to keep it "out of the way" when reviewing the SIMD operations.
> + static const uint8_t vlan_push_shuffle_mask[16] = {
> + 4, 5, 6, 7, 8, 9, 10, 11,
> + 12, 13, 14, 15, 0xFF, 0xFF, 0xFF, 0xFF
> + };
For "SIMD interested folks", this shuffle moves the data in the register
"across"
by 4 bytes (as the indexes for 0 = 4, 1=5, etc to 11=15). The last 4 bytes in
the
register are explicitly zeroed (MSB set, but 0xFF is the normal way of
indicating that).
> +
> + __m128i v_index = _mm_loadu_si128((void *) vlan_push_shuffle_mask);
> + __m128i v_shift = _mm_shuffle_epi8(v_ether, v_index);
> + __m128i v_vlan_hdr = _mm_insert_epi32(v_shift, tpid_tci, 3);
The "VLAN insertion" happens here where the v_shift gets the GPR tpid_tci
inserted.
> + _mm_storeu_si128((void *) new_pkt_data, v_vlan_hdr);
Then the updated packet data is stored back.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev