On 25 Nov 2022, at 16:20, Finn, Emma wrote:

>> -----Original Message-----
>> From: Eelco Chaudron <[email protected]>
>> Sent: Friday 25 November 2022 12:26
>> To: Finn, Emma <[email protected]>
>> Cc: [email protected]; Van Haaren, Harry
>> <[email protected]>
>> Subject: Re: [v4] odp-execute: Add ISA implementation of set_masked IPv6
>> action
>>
>>
>>
>> On 24 Nov 2022, at 10:30, Emma Finn wrote:
>>
>>> This commit adds support for the AVX512 implementation of the
>>> ipv6_set_addrs action as well as an AVX512 implementation of updating
>>> the L4 checksums.
>>>
>>> Signed-off-by: Emma Finn <[email protected]>
>>
>> Thanks Emma for the v4, I have one question and a couple of style issues. To
>> speed things up I just provide the diff for the style issues.
>>
>> I was not able to do any actual testing, as my system did not have the
>> avx512vbmi extension :(
>>
>> Cheers,
>>
>> Eelco
>>
>>> ---
>>
>> Style issues diff:
>>
>> diff --git a/lib/odp-execute-avx512.c b/lib/odp-execute-avx512.c index
>> 82ff7e647..f798d6708 100644
>> --- a/lib/odp-execute-avx512.c
>> +++ b/lib/odp-execute-avx512.c
>> @@ -20,9 +20,9 @@
>>
>>  #include <config.h>
>>  #include <errno.h>
>> -#include <sys/types.h>
>>  #include <netinet/in.h>
>>  #include <netinet/ip6.h>
>> +#include <sys/types.h>
>>
> For FreeBSD, network headers must be included in a certain order.
> See this commit for details:
>     b2befd5bb2db ("sparse: Add guards to prevent FreeBSD-incompatible 
> #include order.")
> So these need to stay in this order.

Oops, yes I did not test with FreeBSD (or windows), I need to figure out how to 
do that in the future ;)

>>  #include "csum.h"
>>  #include "dp-packet.h"
>> @@ -547,8 +547,8 @@ avx512_ipv6_sum_header(__m512i ip6_header)
>>       * horizontal add. */
>>      __m256i v_swap32a = _mm256_setr_epi32(0x0, 0x4, 0xF, 0xF,
>>                                            0xF, 0xF, 0xF, 0xF);
>> -    v_delta = _mm256_permutexvar_epi32(v_swap32a, v_delta);
>>
>> +    v_delta = _mm256_permutexvar_epi32(v_swap32a, v_delta);
>>      v_delta = _mm256_hadd_epi32(v_delta, v_zeros);
>>      v_delta = _mm256_hadd_epi16(v_delta, v_zeros);
>>
>> @@ -562,7 +562,7 @@ avx512_ipv6_addr_csum_delta(__m512i old_header,
>> __m512i new_header)  {
>>      uint16_t old_delta = avx512_ipv6_sum_header(old_header);
>>      uint16_t new_delta = avx512_ipv6_sum_header(new_header);
>> -    uint32_t csum_delta = (uint16_t)~old_delta + new_delta;
>> +    uint32_t csum_delta = ((uint16_t) ~old_delta) + new_delta;
>>
>>      return  ~csum_finish(csum_delta);
>>  }
>> @@ -606,15 +606,15 @@ action_avx512_ipv6_set_addrs(struct
>> dp_packet_batch *batch,
>>      __m512i v_key_shuf = _mm512_permutexvar_epi8(v_shuffle, v_key);
>>      __m512i v_mask_shuf = _mm512_permutexvar_epi8(v_shuffle, v_mask);
>>
>> -
>>      /* Set the v_zero register to all zero's. */
>>      const __m128i v_zeros = _mm_setzero_si128();
>> +
>>      /* Set the v_all_ones register to all one's. */
>>      const __m128i v_all_ones = _mm_cmpeq_epi16(v_zeros, v_zeros);
>>
>> -    /* Load ip6 src and dst respectively into 128-bit wide registers. */
>> +    /* Load ip6 src and dst masks respectively into 128-bit wide
>> + registers. */
>>      __m128i v_src = _mm_loadu_si128((void *) mask);
>> -    __m128i v_dst = _mm_maskz_loadu_epi64(0xC,(void *) mask);
>> +    __m128i v_dst = _mm_maskz_loadu_epi64(0xC, (void *) mask);
>>
>>      /* Perform a bitwise OR between src and dst registers. */
>>      __m128i v_or = _mm_or_si128(v_src, v_dst);
>>
>>> v4:
>>>   - Reworked and moved check for checksum outside loop.
>>>   - Code cleanup based on review from Eelco.
>>> v3:
>>>   - Added a runtime check for AVX512 vbmi.
>>> v2:
>>>   - Added check for availbility of s6_addr32 field of struct in6_addr.
>>>   - Fixed network headers for freebsd builds.
>>>
>>
>> <SNIP>
>>
>>> +    /* Load ip6 src and dst respectively into 128-bit wide registers. */
>>> +    __m128i v_src = _mm_loadu_si128((void *) mask);
>>> +    __m128i v_dst = _mm_maskz_loadu_epi64(0xC,(void *) mask);
>>
>> Guess it might be me, but I do not understand how
>> _mm_maskz_loadu_epi64() will load the dst from the mask.
>> Looking at the intrinsics guide it will only read the first two 64-bit 
>> values, but
>> mask points to src?
>>
>> Should we not just do the following here?
>>
>> +    __m128i v_src = _mm_loadu_si128((void *) &mask->ipv6_src);
>> +    __m128i v_dst = _mm_loadu_si128((void *) &mask->ipv6_dst);
>>
>
> Yes, good catch. The _maskz_load_ isn't correctly pointing to the dst values 
> from the mask.
>
> I will make all the changes above (except the include header comment) and 
> send v5 shortly.

Thanks, take your time and enjoy the weekend!

> <SNIP>

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to