On Fri, 19 Jan 2024 07:43:18 GMT, Emanuel Peter wrote:
>> For long/double, each permute row is 32 bytes in size, so we shift by 5 to
>> compute the row address.
>
> Ah right. Maybe we could say `32byte = 4 long = 4 * 64bit`.
> Because "64bit row" sounds like the whole row is only 64 bit long. It is
>
On Thu, 18 Jan 2024 17:06:55 GMT, Jatin Bhateja wrote:
>> @jatin-bhateja so why do you shift by 5? I thought 4 longs are 32 bit?
>
> For long/double, each permute row is 32 bytes in size, so we shift by 5 to
> compute the row address.

Ah right. Maybe we could say `32byte = 4 long = 4 * 64bit`.
Because "64bit row" sounds like the whole row is only 64 bit long.

On Tue, 16 Jan 2024 07:08:57 GMT, Emanuel Peter wrote:
>> Each long/double permute lane holds 64 bit value.
>
> @jatin-bhateja so why do you shift by 5? I thought 4 longs are 32 bit?

For long/double, each permute row is 32 bytes in size, so we shift by 5 to compute
the row address.
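
To make the arithmetic concrete, here is a minimal sketch (not the HotSpot code). It assumes, as stated above, that each permute row is 32 bytes, and that `vmovmskpd` on a 256-bit vector leaves a 4-bit lane mask (0..15) in the register. Shifting that mask left by 5 multiplies it by 32, which gives the byte offset of the selected row. The names `kRowSizeInBytes`, `kRowShift` and `permute_row_offset` are hypothetical and exist only for this illustration.

```cpp
#include <cstdint>

// Hypothetical constants for illustration; not taken from the patch.
constexpr std::uint64_t kRowSizeInBytes = 32;  // row size stated in the thread
constexpr unsigned      kRowShift       = 5;   // log2(32)

static_assert((std::uint64_t{1} << kRowShift) == kRowSizeInBytes,
              "shift amount must match the row size");

// Byte offset of the row selected by a 4-bit lane mask (0..15),
// i.e. the value left in the register after a left shift by 5.
constexpr std::uint64_t permute_row_offset(std::uint64_t lane_mask) {
    return lane_mask << kRowShift;
}

// Compile-time checks of the arithmetic.
static_assert(permute_row_offset(0x0) == 0,   "mask 0 selects the first row");
static_assert(permute_row_offset(0x1) == 32,  "mask 1 selects the second row");
static_assert(permute_row_offset(0xF) == 480, "mask 15 selects the last row");
```

Under these assumptions a table with 16 rows would span 16 * 32 = 512 bytes, and a shift by 3 (64 bits = 8 bytes) would index a single lane rather than a whole row, which is the ambiguity the comment wording is meant to avoid.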
On Tue, 16 Jan 2024 06:13:43 GMT, Jatin Bhateja wrote:
>> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 5309:
>>
>>> 5307: assert(bt == T_LONG || bt == T_DOUBLE, "");
>>> 5308: vmovmskpd(rtmp, mask, vec_enc);
>>> 5309: shlq(rtmp, 5); // for 64 bit rows (4 longs)
>>
>>
On Mon, 15 Jan 2024 09:10:38 GMT, Emanuel Peter wrote:
>> Jatin Bhateja has updated the pull request incrementally with one additional
>> commit since the last revision:
>>
>> Using emulated variable blend E-Core optimized instruction.
>
> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line
On Tue, 9 Jan 2024 16:48:56 GMT, Jatin Bhateja wrote:
>> Hi,
>>
>> This patch optimizes the non-subword vector compress and expand APIs for
>> x86 AVX2-only targets.
>> Upcoming E-core Xeons (Sierra Forest) and hybrid CPUs support only the AVX2
>> instruction set.
>> These are very frequently used APIs in
> Hi,
>
> This patch optimizes the non-subword vector compress and expand APIs for x86
> AVX2-only targets.
> Upcoming E-core Xeons (Sierra Forest) and hybrid CPUs support only the AVX2
> instruction set.
> These APIs are used very frequently in columnar database filter operations.
>
> Implementation uses a