Re: [ovs-dev] [PATCH v2 4/5] acinclude: Add seperate check for AVX512VBMI ISA.

Van Haaren, Harry Wed, 02 Mar 2022 06:36:09 -0800

> -----Original Message-----
> From: Ferriter, Cian <[email protected]>
> Sent: Wednesday, March 2, 2022 12:18 PM
> To: [email protected]
> Cc: Ferriter, Cian <[email protected]>; Van Haaren, Harry
> <[email protected]>
> Subject: [PATCH v2 4/5] acinclude: Add seperate check for AVX512VBMI ISA.
> 
> Only use the "avx512vbmi" compiler target when it is actually supported
> by the compiler.
> 
> The order of mfex_impls and the 'dpif_miniflow_extract_impl_idx' enum
> have to be changed to keep the start index and size of the impl list
> correct in both VBMI and non VBMI cases.
> 
> Signed-off-by: Cian Ferriter <[email protected]>


Thanks for working on this.

<snip patch start>

>      return _mm512_maskz_permutexvar_epi8(kmask, idx, a);
> @@ -481,7 +483,7 @@ mfex_avx512_process(struct dp_packet_batch *packets,
>                      odp_port_t in_port,
>                      void *pmd_handle OVS_UNUSED,
>                      const enum MFEX_PROFILES profile_id,
> -                    const uint32_t use_vbmi)
> +                    const uint32_t use_vbmi OVS_UNUSED)
>  {
>      uint32_t hitmask = 0;
>      struct dp_packet *packet;
> @@ -544,7 +546,11 @@ mfex_avx512_process(struct dp_packet_batch *packets,
>           */
>          __m512i v512_zeros = _mm512_setzero_si512();
>          __m512i v_blk0;
> +#if HAVE_AVX512VBMI
>          if (__builtin_constant_p(use_vbmi) && use_vbmi) {
> +#else
> +        if (0) {
> +#endif
>              v_blk0 = _mm512_maskz_permutexvar_epi8_wrap(k_shuf, v_shuf,
>                                                          v_pkt0);
>          } else {

Cian previously wrote (on reply to this patch, 
https://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/#2850616)
> The above solution works when the has not support for VBMI but I'm not happy 
> with the solution. It makes the code look messier IMO. I'm looking for 
> suggestions on this.
> I'm thinking to hide all this complexity in a wrapper function which would 
> always have the non VBMI permutex2var option and would have the VBMI 
> permutexvar option where possible. The run time selecting between the two 
> permute impls would remain unchanged. Does this sound good?

There's an equivalent syntax version which might be better or worse. All in 
all, I think this does exactly what we want - and I cannot see drastically 
simpler version?

if (
#if HAVE_AVX512VBMI
     __builtin_constant_p(use_vbmi) && use_vbmi
#else
     0
#endif
     )

<snip patch>


> diff --git a/lib/dpif-netdev-private-extract.c 
> b/lib/dpif-netdev-private-extract.c
> index 43b8b824e..ea2b03e5c 100644
> --- a/lib/dpif-netdev-private-extract.c
> +++ b/lib/dpif-netdev-private-extract.c
> @@ -56,45 +56,46 @@ static struct dpif_miniflow_extract_impl mfex_impls[] = {
>  /* Compile in implementations only if the compiler ISA checks pass. */
>  #if (__x86_64__ && HAVE_AVX512F && HAVE_LD_AVX512_GOOD &&
> HAVE_AVX512BW_DQ \
>       && __SSE4_2__)
> -    [MFEX_IMPL_VBMI_IPv4_UDP] = {
> -        .probe = mfex_avx512_vbmi_probe,
> -        .extract_func = mfex_avx512_vbmi_ip_udp,
> -        .name = "avx512_vbmi_ipv4_udp", },
> -
>      [MFEX_IMPL_IPv4_UDP] = {
>          .probe = mfex_avx512_probe,
>          .extract_func = mfex_avx512_ip_udp,
>          .name = "avx512_ipv4_udp", },
> 
> -    [MFEX_IMPL_VBMI_IPv4_TCP] = {
> -        .probe = mfex_avx512_vbmi_probe,
> -        .extract_func = mfex_avx512_vbmi_ip_tcp,
> -        .name = "avx512_vbmi_ipv4_tcp", },
> -
>      [MFEX_IMPL_IPv4_TCP] = {
>          .probe = mfex_avx512_probe,
>          .extract_func = mfex_avx512_ip_tcp,
>          .name = "avx512_ipv4_tcp", },
> 
> -    [MFEX_IMPL_VBMI_DOT1Q_IPv4_UDP] = {
> -        .probe = mfex_avx512_vbmi_probe,
> -        .extract_func = mfex_avx512_vbmi_dot1q_ip_udp,
> -        .name = "avx512_vbmi_dot1q_ipv4_udp", },
> -
>      [MFEX_IMPL_DOT1Q_IPv4_UDP] = {
>          .probe = mfex_avx512_probe,
>          .extract_func = mfex_avx512_dot1q_ip_udp,
>          .name = "avx512_dot1q_ipv4_udp", },
> 
> -    [MFEX_IMPL_VBMI_DOT1Q_IPv4_TCP] = {
> -        .probe = mfex_avx512_vbmi_probe,
> -        .extract_func = mfex_avx512_vbmi_dot1q_ip_tcp,
> -        .name = "avx512_vbmi_dot1q_ipv4_tcp", },
> -
>      [MFEX_IMPL_DOT1Q_IPv4_TCP] = {
>          .probe = mfex_avx512_probe,
>          .extract_func = mfex_avx512_dot1q_ip_tcp,
>          .name = "avx512_dot1q_ipv4_tcp", },
> +#if HAVE_AVX512VBMI
> +    [MFEX_IMPL_VBMI_IPv4_UDP] = {
> +        .probe = mfex_avx512_vbmi_probe,
> +        .extract_func = mfex_avx512_vbmi_ip_udp,
> +        .name = "avx512_vbmi_ipv4_udp", },
> +
> +    [MFEX_IMPL_VBMI_IPv4_TCP] = {
> +        .probe = mfex_avx512_vbmi_probe,
> +        .extract_func = mfex_avx512_vbmi_ip_tcp,
> +        .name = "avx512_vbmi_ipv4_tcp", },
> +
> +    [MFEX_IMPL_VBMI_DOT1Q_IPv4_UDP] = {
> +        .probe = mfex_avx512_vbmi_probe,
> +        .extract_func = mfex_avx512_vbmi_dot1q_ip_udp,
> +        .name = "avx512_vbmi_dot1q_ipv4_udp", },
> +
> +    [MFEX_IMPL_VBMI_DOT1Q_IPv4_TCP] = {
> +        .probe = mfex_avx512_vbmi_probe,
> +        .extract_func = mfex_avx512_vbmi_dot1q_ip_tcp,
> +        .name = "avx512_vbmi_dot1q_ipv4_tcp", },
> +#endif
>  #endif
>  };

Currently the MFEX study selector logic will select the first impl with >50% 
hits.

Putting the ISA optimized versions first means they will be automatically 
prioritized
over the "base" ISA versions. (E.g. VBMI > AVX512F > scalar mfex).

This is simpler than the DPCLS "priority" based selection method, but has a 
requirement
that the impls are listed in order of preferred ISA. The logic is quite clear:
use VBMI if available, else fallback to AVX512-F.

<snip> rest of patch.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v2 4/5] acinclude: Add seperate check for AVX512VBMI ISA.

Reply via email to