Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v5]

2024-01-19 Thread Jatin Bhateja
On Fri, 19 Jan 2024 07:43:18 GMT, Emanuel Peter wrote:

>> For long/double each permute row is 32 byte in size, so a shift by 5 to
>> compute row address.
>
> Ah right. Maybe we could say `32byte = 4 long = 4 * 64bit`.
> Because "64bit row" sounds like the whole row is only 64 bit long. It is

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v5]

2024-01-18 Thread Emanuel Peter
On Thu, 18 Jan 2024 17:06:55 GMT, Jatin Bhateja wrote:

>> @jatin-bhateja so why do you shift by 5? I thought 4 longs are 32 bit?
>
> For long/double each permute row is 32 byte in size, so a shift by 5 to
> compute row address.

Ah right. Maybe we could say `32byte = 4 long = 4 * 64bit`.

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v5]

2024-01-18 Thread Jatin Bhateja
On Tue, 16 Jan 2024 07:08:57 GMT, Emanuel Peter wrote:

>> Each long/double permute lane holds 64 bit value.
>
> @jatin-bhateja so why do you shift by 5? I thought 4 longs are 32 bit?

For long/double each permute row is 32 byte in size, so a shift by 5 to compute row address.
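To make the arithmetic in this reply concrete, here is a minimal scalar C++ sketch; it is not code from the patch. It assumes a 256-bit vector with 4 long/double lanes, so one permute row is 4 * 8 = 32 bytes, and a 4-bit lane mask (such as the one `vmovmskpd` produces) shifted left by 5 gives the byte offset of that mask's row. The names `row_offset` and `compress_row`, the row layout, and the -1 filler for unused slots are hypothetical and chosen only for illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Assumption: 4 long/double lanes per 256-bit vector, 8 bytes per lane.
constexpr std::size_t kLanes = 4;
constexpr std::size_t kRowBytes = kLanes * sizeof(std::int64_t);  // 4 * 8 = 32

// Byte offset of the permute row selected by a 4-bit lane mask.
// Shifting left by 5 multiplies by 32, the size of one row in bytes.
constexpr std::size_t row_offset(unsigned mask) {
    return static_cast<std::size_t>(mask) << 5;
}

static_assert(kRowBytes == 32, "one row holds 4 lanes of 64 bits");
static_assert(row_offset(1) == kRowBytes, "consecutive rows are 32 bytes apart");

// Hypothetical contents of one compress row: the indices of the set mask
// lanes packed to the front, with -1 marking the unused trailing slots.
std::array<std::int64_t, kLanes> compress_row(unsigned mask) {
    std::array<std::int64_t, kLanes> row{};
    std::size_t out = 0;
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
        if (mask & (1u << lane)) {
            row[out++] = static_cast<std::int64_t>(lane);
        }
    }
    for (; out < kLanes; ++out) {
        row[out] = -1;  // filler value is an assumption of this sketch
    }
    return row;
}
```

With 16 possible masks, such a table would span 16 * 32 = 512 bytes, and `row_offset(mask)` would index straight into it without a multiply.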

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v5]

2024-01-15 Thread Emanuel Peter
On Tue, 16 Jan 2024 06:13:43 GMT, Jatin Bhateja wrote:

>> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 5309:
>>
>>> 5307: assert(bt == T_LONG || bt == T_DOUBLE, "");
>>> 5308: vmovmskpd(rtmp, mask, vec_enc);
>>> 5309: shlq(rtmp, 5); // for 64 bit rows (4 longs)

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v5]

2024-01-15 Thread Jatin Bhateja
On Mon, 15 Jan 2024 09:10:38 GMT, Emanuel Peter wrote:

>> Jatin Bhateja has updated the pull request incrementally with one additional
>> commit since the last revision:
>>
>> Using emulated variable blend E-Core optimized instruction.
>
> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v5]

2024-01-15 Thread Emanuel Peter
On Tue, 9 Jan 2024 16:48:56 GMT, Jatin Bhateja wrote:

>> Hi,
>>
>> Patch optimizes non-subword vector compress and expand APIs for x86 AVX2
>> only targets.
>> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2
>> instruction set.
>> These are very frequently used APIs in

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v5]

2024-01-14 Thread Andrey Turbanov
On Tue, 9 Jan 2024 16:48:56 GMT, Jatin Bhateja wrote:

>> Hi,
>>
>> Patch optimizes non-subword vector compress and expand APIs for x86 AVX2
>> only targets.
>> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2
>> instruction set.
>> These are very frequently used APIs in

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v5]

2024-01-10 Thread Jatin Bhateja
On Tue, 9 Jan 2024 16:48:56 GMT, Jatin Bhateja wrote:

>> Hi,
>>
>> Patch optimizes non-subword vector compress and expand APIs for x86 AVX2
>> only targets.
>> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2
>> instruction set.
>> These are very frequently used APIs in

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v5]

2024-01-09 Thread Jatin Bhateja
> Hi,
>
> Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 only
> targets.
> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2
> instruction set.
> These are very frequently used APIs in columnar database filter operation.
>
> Implementation uses a