> -----Original Message-----
> From: Van Haaren, Harry <[email protected]>
> Sent: Wednesday 2 March 2022 14:36
> To: Ferriter, Cian <[email protected]>; [email protected]
> Subject: RE: [PATCH v2 4/5] acinclude: Add seperate check for AVX512VBMI ISA.
>
> > -----Original Message-----
> > From: Ferriter, Cian <[email protected]>
> > Sent: Wednesday, March 2, 2022 12:18 PM
> > To: [email protected]
> > Cc: Ferriter, Cian <[email protected]>; Van Haaren, Harry
> > <[email protected]>
> > Subject: [PATCH v2 4/5] acinclude: Add seperate check for AVX512VBMI ISA.
> >
> > Only use the "avx512vbmi" compiler target when it is actually supported
> > by the compiler.
> >
> > The order of mfex_impls and the 'dpif_miniflow_extract_impl_idx' enum
> > have to be changed to keep the start index and size of the impl list
> > correct in both VBMI and non VBMI cases.
> >
> > Signed-off-by: Cian Ferriter <[email protected]>
>
> Thanks for working on this.
>
> <snip patch start>
>
Thanks for the review Harry.
> > return _mm512_maskz_permutexvar_epi8(kmask, idx, a);
> > @@ -481,7 +483,7 @@ mfex_avx512_process(struct dp_packet_batch *packets,
> > odp_port_t in_port,
> > void *pmd_handle OVS_UNUSED,
> > const enum MFEX_PROFILES profile_id,
> > - const uint32_t use_vbmi)
> > + const uint32_t use_vbmi OVS_UNUSED)
> > {
> > uint32_t hitmask = 0;
> > struct dp_packet *packet;
> > @@ -544,7 +546,11 @@ mfex_avx512_process(struct dp_packet_batch *packets,
> > */
> > __m512i v512_zeros = _mm512_setzero_si512();
> > __m512i v_blk0;
> > +#if HAVE_AVX512VBMI
> > if (__builtin_constant_p(use_vbmi) && use_vbmi) {
> > +#else
> > + if (0) {
> > +#endif
> > v_blk0 = _mm512_maskz_permutexvar_epi8_wrap(k_shuf, v_shuf,
> > v_pkt0);
> > } else {
>
> Cian previously wrote (on reply to this patch,
> https://patchwork.ozlabs.org/project/openvswitch/patch/20220302121819.1261928-5-
> [email protected]/#2850616)
> > The above solution works when the has not support for VBMI but I'm not
> > happy with the solution. It
> makes the code look messier IMO. I'm looking for suggestions on this.
> > I'm thinking to hide all this complexity in a wrapper function which would
> > always have the non VBMI
> permutex2var option and would have the VBMI permutexvar option where
> possible. The run time selecting
> between the two permute impls would remain unchanged. Does this sound good?
>
> There's an equivalent syntax version which might be better or worse. All in
> all, I think this does
> exactly what we want - and I cannot see drastically simpler version?
>
> if (
> #if HAVE_AVX512VBMI
> __builtin_constant_p(use_vbmi) && use_vbmi
> #else
> 0
> #endif
> )
>
> <snip patch>
>
>
Agreed. I'll keep what I have in that case. Thanks for giving your opinion here.
> > diff --git a/lib/dpif-netdev-private-extract.c
> > b/lib/dpif-netdev-private-extract.c
> > index 43b8b824e..ea2b03e5c 100644
> > --- a/lib/dpif-netdev-private-extract.c
> > +++ b/lib/dpif-netdev-private-extract.c
> > @@ -56,45 +56,46 @@ static struct dpif_miniflow_extract_impl mfex_impls[] =
> > {
> > /* Compile in implementations only if the compiler ISA checks pass. */
> > #if (__x86_64__ && HAVE_AVX512F && HAVE_LD_AVX512_GOOD &&
> > HAVE_AVX512BW_DQ \
> > && __SSE4_2__)
> > - [MFEX_IMPL_VBMI_IPv4_UDP] = {
> > - .probe = mfex_avx512_vbmi_probe,
> > - .extract_func = mfex_avx512_vbmi_ip_udp,
> > - .name = "avx512_vbmi_ipv4_udp", },
> > -
> > [MFEX_IMPL_IPv4_UDP] = {
> > .probe = mfex_avx512_probe,
> > .extract_func = mfex_avx512_ip_udp,
> > .name = "avx512_ipv4_udp", },
> >
> > - [MFEX_IMPL_VBMI_IPv4_TCP] = {
> > - .probe = mfex_avx512_vbmi_probe,
> > - .extract_func = mfex_avx512_vbmi_ip_tcp,
> > - .name = "avx512_vbmi_ipv4_tcp", },
> > -
> > [MFEX_IMPL_IPv4_TCP] = {
> > .probe = mfex_avx512_probe,
> > .extract_func = mfex_avx512_ip_tcp,
> > .name = "avx512_ipv4_tcp", },
> >
> > - [MFEX_IMPL_VBMI_DOT1Q_IPv4_UDP] = {
> > - .probe = mfex_avx512_vbmi_probe,
> > - .extract_func = mfex_avx512_vbmi_dot1q_ip_udp,
> > - .name = "avx512_vbmi_dot1q_ipv4_udp", },
> > -
> > [MFEX_IMPL_DOT1Q_IPv4_UDP] = {
> > .probe = mfex_avx512_probe,
> > .extract_func = mfex_avx512_dot1q_ip_udp,
> > .name = "avx512_dot1q_ipv4_udp", },
> >
> > - [MFEX_IMPL_VBMI_DOT1Q_IPv4_TCP] = {
> > - .probe = mfex_avx512_vbmi_probe,
> > - .extract_func = mfex_avx512_vbmi_dot1q_ip_tcp,
> > - .name = "avx512_vbmi_dot1q_ipv4_tcp", },
> > -
> > [MFEX_IMPL_DOT1Q_IPv4_TCP] = {
> > .probe = mfex_avx512_probe,
> > .extract_func = mfex_avx512_dot1q_ip_tcp,
> > .name = "avx512_dot1q_ipv4_tcp", },
> > +#if HAVE_AVX512VBMI
> > + [MFEX_IMPL_VBMI_IPv4_UDP] = {
> > + .probe = mfex_avx512_vbmi_probe,
> > + .extract_func = mfex_avx512_vbmi_ip_udp,
> > + .name = "avx512_vbmi_ipv4_udp", },
> > +
> > + [MFEX_IMPL_VBMI_IPv4_TCP] = {
> > + .probe = mfex_avx512_vbmi_probe,
> > + .extract_func = mfex_avx512_vbmi_ip_tcp,
> > + .name = "avx512_vbmi_ipv4_tcp", },
> > +
> > + [MFEX_IMPL_VBMI_DOT1Q_IPv4_UDP] = {
> > + .probe = mfex_avx512_vbmi_probe,
> > + .extract_func = mfex_avx512_vbmi_dot1q_ip_udp,
> > + .name = "avx512_vbmi_dot1q_ipv4_udp", },
> > +
> > + [MFEX_IMPL_VBMI_DOT1Q_IPv4_TCP] = {
> > + .probe = mfex_avx512_vbmi_probe,
> > + .extract_func = mfex_avx512_vbmi_dot1q_ip_tcp,
> > + .name = "avx512_vbmi_dot1q_ipv4_tcp", },
> > +#endif
> > #endif
> > };
>
> Currently the MFEX study selector logic will select the first impl with >50%
> hits.
>
> Putting the ISA optimized versions first means they will be automatically
> prioritized
> over the "base" ISA versions. (E.g. VBMI > AVX512F > scalar mfex).
>
> This is simpler than the DPCLS "priority" based selection method, but has a
> requirement
> that the impls are listed in order of preferred ISA. The logic is quite clear:
> use VBMI if available, else fallback to AVX512-F.
>
> <snip> rest of patch.
Good point, this is a mistake on my part. By re-ordering the implementation
list here, I'm changing the priority by which the MFEX implementations will be
selected when the study implementation is used.
My goal was to minimize the amount of "#if HAVE_AVX512VBMI" lines by grouping
all VBMI implementations together. The reason I put them at the end was to make
the "MFEX_IMPL_START_IDX" consistent, since we know that MFEX_IMPL_IPv4_UDP
will always be available when compiling the AVX512 MFEX code.
Let's preserve the order of the implementations list and add more '#if
HAVE_AVX512VBMI' lines to keep the same behaviour for newer compilers, but only
compile and generate the VBMI MFEX implementations when the compiler has VBMI
ISA.
I'll fix this in the v3.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev