> -----Original Message-----
> From: Eelco Chaudron <[email protected]>
> Sent: Wednesday 23 November 2022 14:14
> To: Van Haaren, Harry <[email protected]>
> Cc: Finn, Emma <[email protected]>; [email protected];
> [email protected]; [email protected]
> Subject: Re: [ovs-dev] [v3] odp-execute: Add ISA implementation of
> set_masked IPv6 action
>
>
>
> On 23 Nov 2022, at 15:05, Van Haaren, Harry wrote:
>
> >> -----Original Message-----
> >> From: dev <[email protected]> On Behalf Of Eelco
> >> Chaudron
> >> Sent: Wednesday, November 23, 2022 1:55 PM
> >> To: Finn, Emma <[email protected]>
> >> Cc: [email protected]; [email protected];
> >> [email protected]
> >> Subject: Re: [ovs-dev] [v3] odp-execute: Add ISA implementation of
> >> set_masked IPv6 action
> >
> > <snip>
> >
> >>> Something like this
> >>> v_dst = _mm_loadu_si128(dst)
> >>> v_src = _mm_loadu_si128(src)
> >>> v_or = _mm_or_si128(v_dst, v_src)
> >>>
> >>> /* generate all ones register from cmpeq of v_zeros vs itself? */
> >>> v_zeros = _mm_setzero_si128()
> >>> v_all_ones = _mm_cmpeq_epi8(v_zeros, v_zeros);
> >>> int do_checksum = _mm_test_all_zeros(v_or, v_all_ones);
> >>>
> >>> Does this approach make sense to you?
> >>
> >> Yes perfectly, I was not aware of the _mm_test_all_zeros() which
> >> saves the popcount ;)
> >>
> >> One comment here is that do_checksum should be a bool type,
> >> something like
> >>
> >> bool do_checksum = !!_mm_test_all_zeros(v_or, v_all_ones);
> >
> > In the interest of micro-optimization discussions, we'd need to check if the
> > resulting ASM is the same...
> > Branching on a value is usually a "test" with a register/register, or
> > register/constant, and that sets the "flags" register.
> >
> > Note that the test_all_zeros() *already* sets the flags register!
> > https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html?wapkw=intrinsics%20guide#text=mm_test_all_zero&ig_expand=7187
> >
> > By taking the result, applying the logical !! ops, and branching on the
> > result, it might force the compiler into emitting a bunch of
> > noisy-not-useful instructions.
> >
> > The test_all_zeros() isn't just a bypass of the popcnt instruction, it also
> > avoids the "test" with a register to set flags register.
> > By having set the ZF (zero-flag) we can JumpZero (JZ instruction) or JNZ
> > (JumpNotZero) on the result of it, no GPR register usage.
> >
> > Given this code is x86 specific anyway, I don't see value add from the bool
> > type and !! trick to canonicalize the "any value" to 0 or 1.
> > If the ASM generated is the same, I'm OK with either approach, just noting
> > the micro-optimization around test/flags-register.
>
> Let's see the asm; if we do keep int we should add a comment. But as this
> code will move outside the loop, I assume the flag register will be cleared
> out before it hits this in the loop.
>
> //Eelco
Let's change this to a bool type.
I will send v4 of this patch shortly with all these changes.
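
For reference, here is a minimal sketch of the all-zero check discussed
above with a bool result. It is an illustration under my own assumptions,
not the v4 code: the helper name both_all_zero() and its pointer
parameters are hypothetical, and the target("sse4.1") attribute and the
scalar fallback are only there so the snippet builds standalone with
GCC or Clang.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* Hypothetical helper (not from the patch): returns true when all 16 bytes
 * at 'a' and all 16 bytes at 'b' are zero. */
__attribute__((target("sse4.1")))
static bool
both_all_zero(const void *a, const void *b)
{
    __m128i v_a = _mm_loadu_si128((const __m128i *) a);
    __m128i v_b = _mm_loadu_si128((const __m128i *) b);
    __m128i v_or = _mm_or_si128(v_a, v_b);

    /* All-ones register, generated by comparing a zero register to itself. */
    __m128i v_zeros = _mm_setzero_si128();
    __m128i v_all_ones = _mm_cmpeq_epi8(v_zeros, v_zeros);

    /* _mm_test_all_zeros(m, v) returns 1 when (m & v) == 0, i.e. here when
     * no bit is set in v_or. */
    return _mm_test_all_zeros(v_or, v_all_ones) != 0;
}
#else
/* Scalar fallback so the sketch also builds on non-x86 targets. */
static bool
both_all_zero(const void *a, const void *b)
{
    const uint8_t *pa = a;
    const uint8_t *pb = b;
    uint8_t acc = 0;

    for (int i = 0; i < 16; i++) {
        acc |= pa[i] | pb[i];
    }
    return acc == 0;
}
#endif
```

Whether the bool conversion costs any extra instructions compared to
branching on the int directly still needs to be confirmed from the
generated asm, as Harry noted.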
Thanks,
Emma