On Tue, 7 May 2024 13:23:48 GMT, Emanuel Peter wrote:
>> Hamlin Li has updated the pull request incrementally with one additional
>> commit since the last revision:
>>
>> fix issues; modify vm options to make sure the expected behaviors are tested.
>
> test/hotspo
On Mon, 29 Apr 2024 11:38:27 GMT, Hamlin Li wrote:
>> Hi,
>> Can you have a look at this patch adding some tests for Math.round
>> intrinsics?
>> Thanks!
>>
>> ### FYI:
>> During the development of RoundVF/RoundF, we faced issues which were
>> only spotted by running tests exhaustively
On Sat, 2 Mar 2024 16:22:22 GMT, Jatin Bhateja wrote:
>> Hi All,
>>
>> This patch optimizes sub-word gather operation for x86 targets with AVX2 and
>> AVX512 features.
>>
>> Following is the summary of changes:-
>>
>> 1) Intrinsify sub-word gather using hybrid algorithm which initially
>>
On Tue, 27 Feb 2024 02:47:13 GMT, Jatin Bhateja wrote:
>> Hi All,
>>
>> This patch optimizes sub-word gather operation for x86 targets with AVX2 and
>> AVX512 features.
>>
>> Following is the summary of changes:-
>>
>> 1) Intrinsify sub-word gather using hybrid algorithm which initially
>>
On Tue, 27 Feb 2024 10:25:19 GMT, Emanuel Peter wrote:
>> Jatin Bhateja has updated the pull request incrementally with one additional
>> commit since the last revision:
>>
>> Review comment resolutions.
>
> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp
On Mon, 26 Feb 2024 15:05:03 GMT, Emanuel Peter wrote:
>> Hi @eme64 , I was referring to the allocation of the label array. To be
>> concise and avoid hand unrolling of the loop, I chose an array of labels.
>
> I was referring to the various arrays as well above. I think it would
On Mon, 26 Feb 2024 14:58:53 GMT, Jatin Bhateja wrote:
>> I could not find any other case with the same pattern, of initializing a
>> list of Labels.
>>
>> On the other hand, I can find cases where we already do what I am saying:
>> `C2_MacroAssembler::rtm_counters_update`
>
> Hi @eme64 , I
On Mon, 26 Feb 2024 13:14:24 GMT, Jatin Bhateja wrote:
>> Hi All,
>>
>> This patch optimizes sub-word gather operation for x86 targets with AVX2 and
>> AVX512 features.
>>
>> Following is the summary of changes:-
>>
>> 1) Intrinsify sub-word gather using hybrid algorithm which initially
>>
On Mon, 26 Feb 2024 13:24:05 GMT, Emanuel Peter wrote:
>> To avoid invariant initializations happening within the loop, the compiler
>> will unroll this small loop and forward the initializations; if it does
>> not, then we can save the redundant allocation within the loop.
>
>
On Mon, 26 Feb 2024 13:09:22 GMT, Jatin Bhateja wrote:
>> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1584:
>>
>>> 1582: if (elem_bt == T_SHORT) {
>>> 1583: Label case0, case1, case2, case3;
>>> 1584:         Label* larr[] = {&case0, &case1, &case2, &case3};
>>
>> Not sure if I asked this already: why define
On Mon, 26 Feb 2024 13:06:19 GMT, Jatin Bhateja wrote:
>> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template
>> line 4840:
>>
>>> 4838:
>>> 4839: // Check indices are within array bounds.
>>> 4840: // FIXME: Check index under mask controlling.
On Sun, 25 Feb 2024 06:27:10 GMT, Jatin Bhateja wrote:
>> Hi All,
>>
>> This patch optimizes sub-word gather operation for x86 targets with AVX2 and
>> AVX512 features.
>>
>> Following is the summary of changes:-
>>
>> 1) Intrinsify sub-word gather using hybrid algorithm which initially
>>
On Tue, 20 Feb 2024 08:29:44 GMT, Emanuel Peter wrote:
>> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1716:
>>
>>> 1714: XMMRegister xtmp3, Register
>>> rtmp,
>>> 1715:
On Sun, 25 Feb 2024 06:23:50 GMT, Jatin Bhateja wrote:
>> src/hotspot/cpu/x86/x86.ad line 4120:
>>
>>> 4118: BasicType elem_bt = Matcher::vector_element_basic_type(this);
>>> 4119: __ lea($tmp$$Register, $mem$$Address);
>>> 4120: __ vgather8b(elem_bt, $dst$$XMMRegister,
On Sun, 17 Dec 2023 17:51:37 GMT, Jatin Bhateja wrote:
>> Jatin Bhateja has updated the pull request incrementally with one additional
>> commit since the last revision:
>>
>> Fix incorrect comment
>
> Refined implementation using integral gather operation for AVX512 targets. As
> per Intel
On Wed, 7 Feb 2024 18:38:29 GMT, Jatin Bhateja wrote:
>> Hi All,
>>
>> This patch optimizes sub-word gather operation for x86 targets with AVX2 and
>> AVX512 features.
>>
>> Following is the summary of changes:-
>>
>> 1) Intrinsify sub-word gather using hybrid algorithm which initially
>>
On Tue, 20 Feb 2024 07:35:28 GMT, Emanuel Peter wrote:
>> Jatin Bhateja has updated the pull request incrementally with one additional
>> commit since the last revision:
>>
>> Review comments resolutions.
>
> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp li
On Tue, 23 Jan 2024 11:56:58 GMT, Jatin Bhateja wrote:
>> Hi,
>>
>> Patch optimizes non-subword vector compress and expand APIs for x86 AVX2
>> only targets.
>> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2
>> instruction set.
>> These are very frequently used APIs
On Tue, 23 Jan 2024 11:56:58 GMT, Jatin Bhateja wrote:
>> Hi,
>>
>> Patch optimizes non-subword vector compress and expand APIs for x86 AVX2
>> only targets.
>> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2
>> instruction set.
>> These are very frequently used APIs
On Sat, 20 Jan 2024 09:55:45 GMT, Jatin Bhateja wrote:
>> Hi,
>>
>> Patch optimizes non-subword vector compress and expand APIs for x86 AVX2
>> only targets.
>> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2
>> instruction set.
>> These are very frequently used APIs
On Thu, 18 Jan 2024 17:06:55 GMT, Jatin Bhateja wrote:
>> @jatin-bhateja so why do you shift by 5? I thought 4 longs are 32 bit?
>
> For long/double each permute row is 32 bytes in size, so we shift by 5 to
> compute the row address.
Ah right. Maybe we could say `32byte = 4 long = 4 * 64bit`.
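The row-addressing arithmetic above (one 32-byte row per mask value) can be sketched as follows; `RowOffset.rowByteOffset` is a hypothetical helper for illustration, not actual JDK code:

```java
class RowOffset {
    // Each permute-table row for long/double holds 4 longs = 4 * 8 bytes
    // = 32 bytes, so the byte offset of row `mask` is mask * 32 == mask << 5
    // (this is what the shift-by-5 in the stub computes).
    static long rowByteOffset(int mask) {
        return (long) mask << 5;
    }
}
```

An 8-int row is likewise 8 * 4 = 32 bytes, which is why the int and long tables can share the same shift amount.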
On Tue, 16 Jan 2024 06:08:28 GMT, Jatin Bhateja wrote:
>> Or would that require too many registers?
>
>> Can the `offset` not be added to `idx_base` before the loop?
>
> Offset needs to be added to each index element, please refer to API
> specification for details.
>
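A scalar model of the point being made (hypothetical names; the real implementation is vectorized in the macro assembler): the `offset` is applied to each index element during the gather, so it cannot simply be folded into `idx_base` once before the loop:

```java
class GatherOffsets {
    // Scalar sketch: `offset` is added to every index element, matching the
    // per-element semantics of the gather API.
    static short[] gather(short[] base, int[] indices, int offset) {
        short[] out = new short[indices.length];
        for (int i = 0; i < indices.length; i++) {
            out[i] = base[indices[i] + offset];  // per-element addition
        }
        return out;
    }
}
```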
On Tue, 16 Jan 2024 06:08:35 GMT, Jatin Bhateja wrote:
>> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1634:
>>
>>> 1632: Register offset,
>>> XMMRegister offset_vec, XMMRegister idx_vec,
>>> 1633:
On Tue, 16 Jan 2024 06:08:40 GMT, Jatin Bhateja wrote:
>> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1627:
>>
>>> 1625: vpsrlvd(dst, dst, xtmp, vlen_enc);
>>> 1626: // Pack double word vector into byte vector.
>>> 1627: vpackI2X(T_BYTE, dst, ones, xtmp, vlen_enc);
>>
>> I
On Tue, 16 Jan 2024 06:08:31 GMT, Jatin Bhateja wrote:
>> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1757:
>>
>>> 1755: for (int i = 0; i < 4; i++) {
>>> 1756: movl(rtmp, Address(idx_base, i * 4));
>>> 1757: pinsrw(dst, Address(base, rtmp, Address::times_2), i);
>>
>>
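The quoted loop loads a 32-bit index per lane and `pinsrw`-inserts the 16-bit element it addresses. A scalar Java model of those four iterations (hypothetical, for illustration only):

```java
class SubwordGather {
    // Mirrors the assembly: for each of the 4 lanes, read index idx[i]
    // (the assembly loads it from idx_base + i * 4) and fetch the 16-bit
    // element base[idx[i]] (scaled by 2 bytes via Address::times_2),
    // inserting it into lane i of the destination.
    static short[] gather4(short[] base, int[] idx) {
        short[] dst = new short[4];
        for (int i = 0; i < 4; i++) {
            dst[i] = base[idx[i]];
        }
        return dst;
    }
}
```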
On Tue, 16 Jan 2024 06:17:43 GMT, Jatin Bhateja wrote:
>> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1900:
>>
>>> 1898: vgather8b(elem_ty, xtmp3, base, idx_base, rtmp, vlen_enc);
>>> 1899: } else {
>>> 1900: LP64_ONLY(vgather8b_masked(elem_ty, xtmp3, base, idx_base,
>>>
On Tue, 16 Jan 2024 06:13:43 GMT, Jatin Bhateja wrote:
>> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 5309:
>>
>>> 5307: assert(bt == T_LONG || bt == T_DOUBLE, "");
>>> 5308: vmovmskpd(rtmp, mask, vec_enc);
>>> 5309: shlq(rtmp, 5); // for 64 bit rows (4 longs)
>>
>>
On Mon, 15 Jan 2024 14:25:28 GMT, Emanuel Peter wrote:
>> Jatin Bhateja has updated the pull request with a new target base due to a
>> merge or a rebase. The pull request now contains 12 commits:
>>
>> - Accelerating masked sub-word gathers for AVX2 targets, this giv
On Mon, 1 Jan 2024 14:36:06 GMT, Jatin Bhateja wrote:
>> Hi All,
>>
>> This patch optimizes sub-word gather operation for x86 targets with AVX2 and
>> AVX512 features.
>>
>> Following is the summary of changes:-
>>
>> 1) Intrinsify sub-word gather using hybrid algorithm which initially
>>
On Tue, 9 Jan 2024 16:48:56 GMT, Jatin Bhateja wrote:
>> Hi,
>>
>> Patch optimizes non-subword vector compress and expand APIs for x86 AVX2
>> only targets.
>> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2
>> instruction set.
>> These are very frequently used APIs in
On Mon, 8 Jan 2024 20:48:39 GMT, Scott Gibbons wrote:
>> Re-write the IndexOf code without the use of the pcmpestri instruction, only
>> using AVX2 instructions. This change accelerates String.IndexOf on average
>> 1.3x for AVX2. The benchmark numbers:
>>
>>
>> Benchmark
On Tue, 9 Jan 2024 06:13:44 GMT, Jatin Bhateja wrote:
>> Yes, IF it is vectorized, then there is no difference between high and low
>> density. My concern was more if vectorization is preferable over the scalar
>> alternative in the low-density case, where branch prediction is more stable.
>
On Mon, 8 Jan 2024 06:23:46 GMT, Jatin Bhateja wrote:
>> Hi,
>>
>> Patch optimizes non-subword vector compress and expand APIs for x86 AVX2
>> only targets.
>> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2
>> instruction set.
>> These are very frequently used APIs in
On Fri, 5 Jan 2024 09:35:34 GMT, Emanuel Peter wrote:
>> Thanks for the comment addition!
>
> Improvement suggestion:
> For a vector with 8 ints, we get `2^8 = 256` bit patterns for the mask.
> The table has a row for each `mask` value, consisting of 8 ints, which
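One way such a per-mask permute table could be generated (a sketch under the assumption that row `mask` lists the lane numbers of set mask bits in order; the filler value for the remaining slots in the actual JDK stub may differ):

```java
class CompressTable {
    // Builds a 256 x 8 permute table for an 8-int vector: row `mask` holds,
    // left to right, the lane indices whose mask bit is set; unused trailing
    // slots stay 0.
    static int[][] buildTable() {
        int[][] table = new int[256][8];
        for (int mask = 0; mask < 256; mask++) {
            int k = 0;
            for (int lane = 0; lane < 8; lane++) {
                if ((mask & (1 << lane)) != 0) {
                    table[mask][k++] = lane;
                }
            }
        }
        return table;
    }
}
```

For example, row `0b101` begins with lanes `0, 2`, so permuting by that row moves the selected lanes to the front, which is exactly a compress.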
On Mon, 8 Jan 2024 06:23:46 GMT, Jatin Bhateja wrote:
>> Hi,
>>
>> Patch optimizes non-subword vector compress and expand APIs for x86 AVX2
>> only targets.
>> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2
>> instruction set.
>> These are very frequently used APIs in
On Mon, 8 Jan 2024 06:06:20 GMT, Jatin Bhateja wrote:
>> You are using `VectorMask pred = VectorMask.fromLong(ispecies,
>> maskctr++);`.
>> That basically systematically iterates over all masks, which is nice for a
>> correctness test.
>> But that would use different density inside one test
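For contrast, a density-controlled mask generator (hypothetical sketch; the quoted test instead enumerates masks via `VectorMask.fromLong` over a running counter) would set each lane with a fixed probability `p`, keeping the density stable within one run:

```java
import java.util.Random;

class MaskDensity {
    // Returns a lane mask as a long bit pattern; each of the low `lanes`
    // bits is set independently with probability p.
    static long randomMask(int lanes, double p, Random rnd) {
        long mask = 0;
        for (int lane = 0; lane < lanes; lane++) {
            if (rnd.nextDouble() < p) {
                mask |= 1L << lane;
            }
        }
        return mask;
    }
}
```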
On Fri, 5 Jan 2024 07:08:35 GMT, Jatin Bhateja wrote:
>> Hi,
>>
>> Patch optimizes non-subword vector compress and expand APIs for x86 AVX2
>> only targets.
>> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2
>> instruction set.
>> These are very frequently used APIs in
On Fri, 5 Jan 2024 07:05:51 GMT, Jatin Bhateja wrote:
>> We do have extensive functional tests for compress/expand APIs in
>> [test/jdk/jdk/incubator/vector](https://github.com/openjdk/jdk/tree/master/test/jdk/jdk/incubator/vector)
>
>> Could there be equivalent `expand` tests?
>
> Here are
On Fri, 5 Jan 2024 09:37:55 GMT, Emanuel Peter wrote:
>> This computes the byte offset from the start of the table; both the integer
>> and long permute tables have the same row size: 8 int elements vs 4 long
>> elements.
>
> Ah, I understand now. Maybe leave a comment for that?
I would
On Thu, 4 Jan 2024 13:40:19 GMT, Emanuel Peter wrote:
>> Jatin Bhateja has updated the pull request incrementally with one additional
>> commit since the last revision:
>>
>> Updating copyright year of modified files.
>
> src/hotspot/cpu/x86/stubGenerator_
On Fri, 5 Jan 2024 09:31:50 GMT, Emanuel Peter wrote:
>> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 957:
>>
>>> 955: __ align(CodeEntryAlignment);
>>> 956: StubCodeMark mark(this, "StubRoutines", stub_name);
>>> 957: address
On Fri, 5 Jan 2024 07:03:34 GMT, Jatin Bhateja wrote:
>> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 5307:
>>
>>> 5305: assert(bt == T_LONG || bt == T_DOUBLE, "");
>>> 5306: vmovmskpd(rtmp, mask, vec_enc);
>>> 5307: shlq(rtmp, 5);
>>
>> Might this need to be 6? If I
On Thu, 4 Jan 2024 05:39:01 GMT, Jatin Bhateja wrote:
>> Hi,
>>
>> Patch optimizes non-subword vector compress and expand APIs for x86 AVX2
>> only targets.
>> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2
>> instruction set.
>> These are very frequently used APIs in
On Thu, 4 Jan 2024 13:09:30 GMT, Emanuel Peter wrote:
>> Jatin Bhateja has updated the pull request incrementally with one additional
>> commit since the last revision:
>>
>> Updating copyright year of modified files.
>
> test/micro/org/open