> -----Original Message-----
> From: dev <[email protected]> On Behalf Of Eelco Chaudron
> Sent: Wednesday, November 23, 2022 1:55 PM
> To: Finn, Emma <[email protected]>
> Cc: [email protected]; [email protected]; [email protected]
> Subject: Re: [ovs-dev] [v3] odp-execute: Add ISA implementation of set_masked IPv6 action

<snip>

> > Something like this
> >     v_dst = Loadu_si128(dst)
> >     v_src = Loadu_si128(src)
> >     v_or = _or_si128(v_dst, v_src)
> >
> >     /* generate all ones register from cmpeq of v_zeros vs itself? */
> >     v_zeros = _setzero_si128()
> >     v_all_ones = _cmpeq_epi(v_zeros, v_zeros);
> >     int do_checksum = _mm_test_all_zeros(v_or, v_all_ones);
> >
> > Does this approach make sense to you?
> 
> Yes perfectly, I was not aware of the _mm_test_all_zeros() which saves the
> popcount ;)
> 
> One comment here is that do_checksum should be a bool type, something like
> 
> bool do_checksum = !!_mm_test_all_zeros(v_or, v_all_ones);

In the interest of micro-optimization discussions, we'd need to check whether
the resulting ASM is the same.
Branching on a value is usually a "test" of register/register or
register/constant, which sets the "flags" register.

Note that _mm_test_all_zeros() *already* sets the flags register!
https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html?wapkw=intrinsics%20guide#text=mm_test_all_zero&ig_expand=7187

By taking the result, applying the logical !! ops, and branching on the
result, it might force the compiler into emitting a bunch of noisy,
not-useful instructions.

_mm_test_all_zeros() isn't just a bypass of the popcnt instruction; it also
avoids the "test" with a register to set the flags register.
With the ZF (zero flag) already set, we can JZ (JumpZero) or JNZ
(JumpNotZero) on the result directly, with no GPR usage.

Given this code is x86-specific anyway, I don't see the value added by the
bool type and the !! trick to canonicalize "any value" to 0 or 1.
If the generated ASM is the same, I'm OK with either approach; just noting
the micro-optimization around test/flags register.
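
To compare the two forms, a minimal sketch along these lines could be compiled
and its assembly inspected (for example with "gcc -O2 -S"). The function names
and the 16-byte array parameters are hypothetical, not taken from the patch,
and the target("sse4.1") attribute assumes GCC or Clang in place of building
with -msse4.1. Per the Intel guide, _mm_test_all_zeros() returns 1 when the
AND of its two arguments is all zero and 0 otherwise, so the !! should not
change the value.

```c
#include <stdbool.h>
#include <stdint.h>
#include <immintrin.h>

/* Hypothetical helper: OR two unaligned 16-byte buffers together. */
static inline __m128i
load_or(const uint8_t dst[16], const uint8_t src[16])
{
    __m128i v_dst = _mm_loadu_si128((const __m128i *) dst);
    __m128i v_src = _mm_loadu_si128((const __m128i *) src);

    return _mm_or_si128(v_dst, v_src);
}

/* Variant 1: use the int result of the intrinsic as-is. */
__attribute__((target("sse4.1")))
static int
or_is_zero_int(const uint8_t dst[16], const uint8_t src[16])
{
    __m128i v_or = load_or(dst, src);

    /* All-ones register from comparing a zeroed register with itself. */
    __m128i v_zeros = _mm_setzero_si128();
    __m128i v_all_ones = _mm_cmpeq_epi8(v_zeros, v_zeros);

    /* PTEST: 1 if (v_or & v_all_ones) is all zero, otherwise 0. */
    return _mm_test_all_zeros(v_or, v_all_ones);
}

/* Variant 2: same test, canonicalized to bool with the !! trick. */
__attribute__((target("sse4.1")))
static bool
or_is_zero_bool(const uint8_t dst[16], const uint8_t src[16])
{
    __m128i v_or = load_or(dst, src);
    __m128i v_zeros = _mm_setzero_si128();
    __m128i v_all_ones = _mm_cmpeq_epi8(v_zeros, v_zeros);

    return !!_mm_test_all_zeros(v_or, v_all_ones);
}
```

If a caller branches on either function and the compiler emits the same
ptest followed by jz/jnz for both, the choice between int and bool comes down
to coding style only.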

Regards, -Harry