On 23 Nov 2022, at 15:05, Van Haaren, Harry wrote:

>> -----Original Message-----
>> From: dev <ovs-dev-boun...@openvswitch.org> On Behalf Of Eelco Chaudron
>> Sent: Wednesday, November 23, 2022 1:55 PM
>> To: Finn, Emma <emma.f...@intel.com>
>> Cc: d...@openvswitch.org; david.march...@redhat.com; i.maxim...@ovn.org
>> Subject: Re: [ovs-dev] [v3] odp-execute: Add ISA implementation of set_masked IPv6 action
>
> <snip>
>
>>> Something like this
>>>     v_dst = _mm_loadu_si128(dst)
>>>     v_src = _mm_loadu_si128(src)
>>>     v_or = _mm_or_si128(v_dst, v_src)
>>>
>>>     /* generate all ones register from cmpeq of v_zeros vs itself? */
>>>     v_zeros = _mm_setzero_si128()
>>>     v_all_ones = _mm_cmpeq_epi8(v_zeros, v_zeros);
>>>     int do_checksum = _mm_test_all_zeros(v_or, v_all_ones);
>>>
>>> Does this approach make sense to you?
>>
>> Yes perfectly, I was not aware of the _mm_test_all_zeros() which saves the
>> popcount ;)
>>
>> One comment here is that do_checksum should be a bool type, something like
>>
>> bool do_checksum = !!_mm_test_all_zeros(v_or, v_all_ones);
>
> In the interest of micro-optimization discussions, we'd need to check if the 
> resulting ASM is the same...
> Branching on a value is usually a "test" with a register/register, or 
> register/constant, and that sets the "flags" register.
>
> Note that the test_all_zeros() *already* sets the flags register!
> https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html?wapkw=intrinsics%20guide#text=mm_test_all_zero&ig_expand=7187
>
> By taking the result, applying the logical !! ops, and branching on the
> result, it might force the compiler into emitting a bunch of noisy,
> not-useful instructions.
>
> The test_all_zeros() isn't just a bypass of the popcnt instruction, it also 
> avoids the "test" with a register to set flags register.
> With the ZF (zero flag) already set, we can JZ (JumpZero) or JNZ
> (JumpNotZero) on the result directly, with no GPR usage.
>
> Given this code is x86 specific anyway, I don't see value add from the bool 
> type and !! trick to canonicalize the "any value" to 0 or 1.
> If the ASM generated is the same, I'm OK with either approach, just noting 
> the micro-optimization around test/flags-register.

Let's see the asm; if we do keep int, we should add a comment. But as this
code will move outside the loop, I assume the flags register will have been
clobbered by the time the result is used inside the loop.
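
For comparing the two variants, a minimal sketch along these lines could be
built at -O2 and disassembled. It is not the patch code: masks_all_zero(),
branch_on_int(), branch_on_bool() and the SKETCH_SSE41 macro are hypothetical
names made up for illustration, the target attribute assumes GCC or Clang,
and the scalar fallback is only there so it builds on non-x86 hosts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
/* Assumption: GCC/Clang, so SSE4.1 can be enabled per function. */
#define SKETCH_SSE41 __attribute__((target("sse4.1")))
#else
#define SKETCH_SSE41
#endif

/* Hypothetical helper, not OVS code: returns 1 when both 16-byte
 * masks are entirely zero, 0 otherwise. */
SKETCH_SSE41 static inline int
masks_all_zero(const uint8_t *dst, const uint8_t *src)
{
    assert(dst && src);
#if defined(__x86_64__) || defined(__i386__)
    __m128i v_dst = _mm_loadu_si128((const __m128i *) dst);
    __m128i v_src = _mm_loadu_si128((const __m128i *) src);
    __m128i v_or = _mm_or_si128(v_dst, v_src);
    __m128i v_zeros = _mm_setzero_si128();
    __m128i v_all_ones = _mm_cmpeq_epi8(v_zeros, v_zeros);

    /* PTEST: yields 1 when (v_or & v_all_ones) == 0, else 0. */
    return _mm_test_all_zeros(v_or, v_all_ones);
#else
    /* Scalar fallback so the sketch still builds on non-x86 hosts. */
    uint8_t acc = 0;
    for (int i = 0; i < 16; i++) {
        acc |= dst[i] | src[i];
    }
    return acc == 0;
#endif
}

/* Variant 1: branch directly on the intrinsic's int result. */
SKETCH_SSE41 static int
branch_on_int(const uint8_t *dst, const uint8_t *src)
{
    int all_zero = masks_all_zero(dst, src);
    return all_zero ? 1 : 2;
}

/* Variant 2: canonicalize to bool with !! before branching. */
SKETCH_SSE41 static int
branch_on_bool(const uint8_t *dst, const uint8_t *src)
{
    bool all_zero = !!masks_all_zero(dst, src);
    return all_zero ? 1 : 2;
}
```

Both variants return the same values for the same inputs, so any difference
would only show up in the instructions the compiler emits around the ptest.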

//Eelco
