On 25 Nov 2022, at 16:20, Finn, Emma wrote:
>> -----Original Message-----
>> From: Eelco Chaudron <[email protected]>
>> Sent: Friday 25 November 2022 12:26
>> To: Finn, Emma <[email protected]>
>> Cc: [email protected]; Van Haaren, Harry
>> <[email protected]>
>> Subject: Re: [v4] odp-execute: Add ISA implementation of set_masked IPv6
>> action
>>
>>
>>
>> On 24 Nov 2022, at 10:30, Emma Finn wrote:
>>
>>> This commit adds support for the AVX512 implementation of the
>>> ipv6_set_addrs action as well as an AVX512 implementation of updating
>>> the L4 checksums.
>>>
>>> Signed-off-by: Emma Finn <[email protected]>
>>
>> Thanks Emma for the v4, I have one question and a couple of style issues. To
>> speed things up I just provide the diff for the style issues.
>>
>> I was not able to do any actual testing, as my system did not have the
>> avx512vbmi extension :(
>>
>> Cheers,
>>
>> Eelco
>>
>>> ---
>>
>> Style issues diff:
>>
>> diff --git a/lib/odp-execute-avx512.c b/lib/odp-execute-avx512.c index
>> 82ff7e647..f798d6708 100644
>> --- a/lib/odp-execute-avx512.c
>> +++ b/lib/odp-execute-avx512.c
>> @@ -20,9 +20,9 @@
>>
>> #include <config.h>
>> #include <errno.h>
>> -#include <sys/types.h>
>> #include <netinet/in.h>
>> #include <netinet/ip6.h>
>> +#include <sys/types.h>
>>
> For FreeBSD, network headers must be included in a certain order.
> See this commit for details:
> b2befd5bb2db ("sparse: Add guards to prevent FreeBSD-incompatible
> #include order.")
> So these need to stay in this order.
Oops, yes I did not test with FreeBSD (or windows), I need to figure out how to
do that in the future ;)
>> #include "csum.h"
>> #include "dp-packet.h"
>> @@ -547,8 +547,8 @@ avx512_ipv6_sum_header(__m512i ip6_header)
>> * horizontal add. */
>> __m256i v_swap32a = _mm256_setr_epi32(0x0, 0x4, 0xF, 0xF,
>> 0xF, 0xF, 0xF, 0xF);
>> - v_delta = _mm256_permutexvar_epi32(v_swap32a, v_delta);
>>
>> + v_delta = _mm256_permutexvar_epi32(v_swap32a, v_delta);
>> v_delta = _mm256_hadd_epi32(v_delta, v_zeros);
>> v_delta = _mm256_hadd_epi16(v_delta, v_zeros);
>>
>> @@ -562,7 +562,7 @@ avx512_ipv6_addr_csum_delta(__m512i old_header,
>> __m512i new_header) {
>> uint16_t old_delta = avx512_ipv6_sum_header(old_header);
>> uint16_t new_delta = avx512_ipv6_sum_header(new_header);
>> - uint32_t csum_delta = (uint16_t)~old_delta + new_delta;
>> + uint32_t csum_delta = ((uint16_t) ~old_delta) + new_delta;
>>
>> return ~csum_finish(csum_delta);
>> }
>> @@ -606,15 +606,15 @@ action_avx512_ipv6_set_addrs(struct
>> dp_packet_batch *batch,
>> __m512i v_key_shuf = _mm512_permutexvar_epi8(v_shuffle, v_key);
>> __m512i v_mask_shuf = _mm512_permutexvar_epi8(v_shuffle, v_mask);
>>
>> -
>> /* Set the v_zero register to all zero's. */
>> const __m128i v_zeros = _mm_setzero_si128();
>> +
>> /* Set the v_all_ones register to all one's. */
>> const __m128i v_all_ones = _mm_cmpeq_epi16(v_zeros, v_zeros);
>>
>> - /* Load ip6 src and dst respectively into 128-bit wide registers. */
>> + /* Load ip6 src and dst masks respectively into 128-bit wide
>> + registers. */
>> __m128i v_src = _mm_loadu_si128((void *) mask);
>> - __m128i v_dst = _mm_maskz_loadu_epi64(0xC,(void *) mask);
>> + __m128i v_dst = _mm_maskz_loadu_epi64(0xC, (void *) mask);
>>
>> /* Perform a bitwise OR between src and dst registers. */
>> __m128i v_or = _mm_or_si128(v_src, v_dst);
>>
>>> v4:
>>> - Reworked and moved check for checksum outside loop.
>>> - Code cleanup based on review from Eelco.
>>> v3:
>>> - Added a runtime check for AVX512 vbmi.
>>> v2:
>>> - Added check for availbility of s6_addr32 field of struct in6_addr.
>>> - Fixed network headers for freebsd builds.
>>>
>>
>> <SNIP>
>>
>>> + /* Load ip6 src and dst respectively into 128-bit wide registers. */
>>> + __m128i v_src = _mm_loadu_si128((void *) mask);
>>> + __m128i v_dst = _mm_maskz_loadu_epi64(0xC,(void *) mask);
>>
>> Guess it might be me, but I do not understand how
>> _mm_maskz_loadu_epi64() will load the dst from the mask.
>> Looking at the intrinsics guide it will only read the first two 64-bit
>> values, but
>> mask points to src?
>>
>> Should we not just do the following here?
>>
>> + __m128i v_src = _mm_loadu_si128((void *) &mask->ipv6_src);
>> + __m128i v_dst = _mm_loadu_si128((void *) &mask->ipv6_dst);
>>
>
> Yes, good catch. The _maskz_load_ isn't correctly pointing to the dst values
> from the mask.
>
> I will make all the changes above (except the include header comment) and
> send v5 shortly.
Thanks, take your time and enjoy the weekend!
> <SNIP>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev