> -----Original Message-----
> From: Ferriter, Cian <[email protected]>
> Sent: Wednesday, March 2, 2022 12:18 PM
> To: [email protected]
> Cc: Ferriter, Cian <[email protected]>; Van Haaren, Harry
> <[email protected]>
> Subject: [PATCH v2 4/5] acinclude: Add seperate check for AVX512VBMI ISA.
>
> Only use the "avx512vbmi" compiler target when it is actually supported
> by the compiler.
>
> The order of mfex_impls and the 'dpif_miniflow_extract_impl_idx' enum
> have to be changed to keep the start index and size of the impl list
> correct in both VBMI and non VBMI cases.
>
> Signed-off-by: Cian Ferriter <[email protected]>
Thanks for working on this.
<snip patch start>
> return _mm512_maskz_permutexvar_epi8(kmask, idx, a);
> @@ -481,7 +483,7 @@ mfex_avx512_process(struct dp_packet_batch *packets,
> odp_port_t in_port,
> void *pmd_handle OVS_UNUSED,
> const enum MFEX_PROFILES profile_id,
> - const uint32_t use_vbmi)
> + const uint32_t use_vbmi OVS_UNUSED)
> {
> uint32_t hitmask = 0;
> struct dp_packet *packet;
> @@ -544,7 +546,11 @@ mfex_avx512_process(struct dp_packet_batch *packets,
> */
> __m512i v512_zeros = _mm512_setzero_si512();
> __m512i v_blk0;
> +#if HAVE_AVX512VBMI
> if (__builtin_constant_p(use_vbmi) && use_vbmi) {
> +#else
> + if (0) {
> +#endif
> v_blk0 = _mm512_maskz_permutexvar_epi8_wrap(k_shuf, v_shuf,
> v_pkt0);
> } else {
Cian previously wrote (on reply to this patch,
https://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/#2850616)
> The above solution works when the has not support for VBMI but I'm not happy
> with the solution. It makes the code look messier IMO. I'm looking for
> suggestions on this.
> I'm thinking to hide all this complexity in a wrapper function which would
> always have the non VBMI permutex2var option and would have the VBMI
> permutexvar option where possible. The run time selecting between the two
> permute impls would remain unchanged. Does this sound good?
There's an equivalent syntax version which might be better or worse. All in
all, I think this does exactly what we want - and I cannot see drastically
simpler version?
if (
#if HAVE_AVX512VBMI
__builtin_constant_p(use_vbmi) && use_vbmi
#else
0
#endif
)
<snip patch>
> diff --git a/lib/dpif-netdev-private-extract.c
> b/lib/dpif-netdev-private-extract.c
> index 43b8b824e..ea2b03e5c 100644
> --- a/lib/dpif-netdev-private-extract.c
> +++ b/lib/dpif-netdev-private-extract.c
> @@ -56,45 +56,46 @@ static struct dpif_miniflow_extract_impl mfex_impls[] = {
> /* Compile in implementations only if the compiler ISA checks pass. */
> #if (__x86_64__ && HAVE_AVX512F && HAVE_LD_AVX512_GOOD &&
> HAVE_AVX512BW_DQ \
> && __SSE4_2__)
> - [MFEX_IMPL_VBMI_IPv4_UDP] = {
> - .probe = mfex_avx512_vbmi_probe,
> - .extract_func = mfex_avx512_vbmi_ip_udp,
> - .name = "avx512_vbmi_ipv4_udp", },
> -
> [MFEX_IMPL_IPv4_UDP] = {
> .probe = mfex_avx512_probe,
> .extract_func = mfex_avx512_ip_udp,
> .name = "avx512_ipv4_udp", },
>
> - [MFEX_IMPL_VBMI_IPv4_TCP] = {
> - .probe = mfex_avx512_vbmi_probe,
> - .extract_func = mfex_avx512_vbmi_ip_tcp,
> - .name = "avx512_vbmi_ipv4_tcp", },
> -
> [MFEX_IMPL_IPv4_TCP] = {
> .probe = mfex_avx512_probe,
> .extract_func = mfex_avx512_ip_tcp,
> .name = "avx512_ipv4_tcp", },
>
> - [MFEX_IMPL_VBMI_DOT1Q_IPv4_UDP] = {
> - .probe = mfex_avx512_vbmi_probe,
> - .extract_func = mfex_avx512_vbmi_dot1q_ip_udp,
> - .name = "avx512_vbmi_dot1q_ipv4_udp", },
> -
> [MFEX_IMPL_DOT1Q_IPv4_UDP] = {
> .probe = mfex_avx512_probe,
> .extract_func = mfex_avx512_dot1q_ip_udp,
> .name = "avx512_dot1q_ipv4_udp", },
>
> - [MFEX_IMPL_VBMI_DOT1Q_IPv4_TCP] = {
> - .probe = mfex_avx512_vbmi_probe,
> - .extract_func = mfex_avx512_vbmi_dot1q_ip_tcp,
> - .name = "avx512_vbmi_dot1q_ipv4_tcp", },
> -
> [MFEX_IMPL_DOT1Q_IPv4_TCP] = {
> .probe = mfex_avx512_probe,
> .extract_func = mfex_avx512_dot1q_ip_tcp,
> .name = "avx512_dot1q_ipv4_tcp", },
> +#if HAVE_AVX512VBMI
> + [MFEX_IMPL_VBMI_IPv4_UDP] = {
> + .probe = mfex_avx512_vbmi_probe,
> + .extract_func = mfex_avx512_vbmi_ip_udp,
> + .name = "avx512_vbmi_ipv4_udp", },
> +
> + [MFEX_IMPL_VBMI_IPv4_TCP] = {
> + .probe = mfex_avx512_vbmi_probe,
> + .extract_func = mfex_avx512_vbmi_ip_tcp,
> + .name = "avx512_vbmi_ipv4_tcp", },
> +
> + [MFEX_IMPL_VBMI_DOT1Q_IPv4_UDP] = {
> + .probe = mfex_avx512_vbmi_probe,
> + .extract_func = mfex_avx512_vbmi_dot1q_ip_udp,
> + .name = "avx512_vbmi_dot1q_ipv4_udp", },
> +
> + [MFEX_IMPL_VBMI_DOT1Q_IPv4_TCP] = {
> + .probe = mfex_avx512_vbmi_probe,
> + .extract_func = mfex_avx512_vbmi_dot1q_ip_tcp,
> + .name = "avx512_vbmi_dot1q_ipv4_tcp", },
> +#endif
> #endif
> };
Currently the MFEX study selector logic will select the first impl with >50%
hits.
Putting the ISA optimized versions first means they will be automatically
prioritized
over the "base" ISA versions. (E.g. VBMI > AVX512F > scalar mfex).
This is simpler than the DPCLS "priority" based selection method, but has a
requirement
that the impls are listed in order of preferred ISA. The logic is quite clear:
use VBMI if available, else fallback to AVX512-F.
<snip> rest of patch.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev