On 14 Jul 2022, at 16:11, Van Haaren, Harry wrote:

>> -----Original Message-----
>> From: Eelco Chaudron <[email protected]>
>> Sent: Thursday, July 14, 2022 2:24 PM
>> To: Van Haaren, Harry <[email protected]>
>> Cc: [email protected]; [email protected]; Amber, Kumar
>> <[email protected]>; Pai G, Sunil <[email protected]>; Finn, Emma
>> <[email protected]>; Stokes, Ian <[email protected]>
>> Subject: Re: [PATCH v10 09/10] odp-execute: Add ISA implementation of
>> set_masked ETH
>
> <snip patch>
>
>>> +    /* Read the content of the key(src) and mask in the respective registers.
>>> +     * We only load the src and dest addresses, which is only 96-bits and not
>>> +     * 128-bits. */
>>> +    __m128i v_src = _mm_maskz_loadu_epi32(0x7,(void *) key);
>>> +    __m128i v_mask = _mm_maskz_loadu_epi32(0x7, (void *) mask);
>>
>> One question I have asked throughout the various revisions, but which never
>> got answered:
>>
>> "The second load loads 128 bits of data, but there are only 12 bytes to load.
>> What happens if the remaining 4 bytes are not mapped in memory (i.e. a page
>> does not exist/can't be loaded)? Will we crash!?"
>
> AVX512 has a very nice feature for handling scenarios where "not full" SIMD
> is required. It is known as "k-masks" and, in short, allows "turning off"
> part of a SIMD instruction so that the disabled lanes have no effect.
>
> In this case, the "maskz" part of the intrinsic means that the k-mask becomes
> active. An extra parameter is added to any k-mask intrinsic (_mm_maskz_*),
> which indicates what lanes to enable/disable. Note that the *size* of each
> lane is determined by the end of the intrinsic name, so _epi32() indicates
> 32-bit lanes. A worked example is below:
>
> _mm_maskz_loadu_epi32(0x7, (void *) mask);
>
> kmask is 0x7, or "111" in binary, so the lowest 3 lanes (visualize them on
> the right) are active. As the instruction targets 32-bit ints, each lane is
> 4 bytes, so 3 * 4 = 12 bytes are "active". As a result, only 12 bytes are
> loaded from memory here. Even if the next byte was on a new page, and not
> mapped into our virtual address range, there would be no crash here, because
> the k-mask suppresses the load (and any fault) for the disabled lane.
>
> <snip more patch>

Thanks, that really answers my question! I guess I should read the pseudo code
in the intrinsics guide more carefully :)

//Eelco
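
For anyone who wants to try the k-mask behaviour described above in isolation,
here is a minimal sketch. It is not part of the patch. The helper names
(kmask_active_bytes, maskz_loadu_epi32_model, load_eth_addrs,
load_12_bytes_at_page_end) and the HAVE_X86_INTRINSICS macro are made up for
illustration, and it assumes GCC or Clang on a POSIX system with mmap() and
mprotect(). It puts 12 bytes flush against an unreadable page and loads them
with kmask 0x7. On CPUs without AVX512VL it falls back to a scalar model that
only mimics the lane semantics.

```c
#define _DEFAULT_SOURCE          /* For MAP_ANONYMOUS under strict -std=c11. */
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <immintrin.h>
#define HAVE_X86_INTRINSICS 1    /* Hypothetical macro, for this sketch only. */
#endif

/* Number of bytes read by a k-masked load: one lane per set bit in 'k'. */
unsigned
kmask_active_bytes(uint8_t k, unsigned n_lanes, unsigned lane_bytes)
{
    unsigned active = 0;

    assert(n_lanes <= 8);
    for (unsigned i = 0; i < n_lanes; i++) {
        active += (k >> i) & 1u;
    }
    return active * lane_bytes;
}

/* Scalar model of _mm_maskz_loadu_epi32(): 32-bit lane i is read from 'src'
 * only if bit i of 'k' is set; disabled lanes are zeroed and never read. */
static void
maskz_loadu_epi32_model(uint8_t k, const void *src, uint32_t out[4])
{
    for (int i = 0; i < 4; i++) {
        out[i] = 0;
        if (k & (1u << i)) {
            memcpy(&out[i], (const uint8_t *) src + 4 * i, sizeof out[i]);
        }
    }
}

#ifdef HAVE_X86_INTRINSICS
/* Built for AVX512VL regardless of the global -m flags; only call this
 * after a runtime CPU check. */
static __attribute__((target("avx512f,avx512vl"))) void
maskz_loadu_epi32_avx512(const void *src, uint32_t out[4])
{
    __m128i v = _mm_maskz_loadu_epi32(0x7, src);

    _mm_storeu_si128((__m128i *) out, v);
}
#endif

/* Loads the 12 bytes of Ethernet src + dst addresses into 'out'. */
static void
load_eth_addrs(const void *src, uint32_t out[4])
{
#ifdef HAVE_X86_INTRINSICS
    if (__builtin_cpu_supports("avx512f")
        && __builtin_cpu_supports("avx512vl")) {
        maskz_loadu_epi32_avx512(src, out);
        return;
    }
#endif
    maskz_loadu_epi32_model(0x7, src, out);
}

/* Places 12 bytes flush against an unreadable page and loads them.
 * Returns 0 on success, -1 if the mapping could not be set up. */
int
load_12_bytes_at_page_end(const uint8_t bytes[12], uint32_t out[4])
{
    long page = sysconf(_SC_PAGESIZE);
    uint8_t *base;

    if (page <= 0) {
        return -1;
    }
    base = mmap(NULL, 2 * (size_t) page, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) {
        return -1;
    }
    if (mprotect(base + page, (size_t) page, PROT_NONE)) {
        munmap(base, 2 * (size_t) page);
        return -1;
    }
    memcpy(base + page - 12, bytes, 12);
    /* An unmasked 16-byte load from this address would touch the next page. */
    load_eth_addrs(base + page - 12, out);
    munmap(base, 2 * (size_t) page);
    return 0;
}
```

Calling load_12_bytes_at_page_end() with 12 input bytes and a 4-element output
array copies those 12 bytes into the first three lanes. The fourth lane comes
back as zero, since the "maskz" variant zeroes disabled lanes rather than
reading them.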

