On Tue, 7 Jun 2022 17:14:18 GMT, Quan Anh Mai wrote:
> Hi,
>
> This patch implements intrinsics for `Integer/Long::compareUnsigned` using
> the same approach as the JVM does for long and floating-point comparisons.
> This allows efficient and reliable usage of unsigned comparison in Java,
> which is a basic operation and is important for range checks such as
> discussed in #8620.
>
> Thank you very much.
On Tue, 7 Jun 2022 17:41:13 GMT, Vladimir Kozlov wrote:
>> Quan Anh Mai has updated the pull request incrementally with two additional
>> commits since the last revision:
>>
>> - remove comments
>> - review comments
>
> src/hotspot/share/
Quan Anh Mai has updated the pull request incrementally with two additional
commits since the last revision:
- remove comments
- review comments
-
Changes:
- all: https://git.openjdk.j
Hi,
This patch implements intrinsics for `Integer/Long::compareUnsigned` using the
same approach as the JVM does for long and floating-point comparisons. This
allows efficient and reliable usage of unsigned comparison in Java, which is a
basic operation and is important for range checks such as discussed in #8620.
Thank you very much.
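To illustrate the range-check use case (a sketch, not code from the patch; the class and method names are made up):

```java
public class UnsignedCompareDemo {
    // The usual signed two-sided bounds check.
    static boolean inRangeSigned(int i, int bound) {
        return i >= 0 && i < bound;
    }

    // Same predicate as a single unsigned comparison: a negative i wraps to a
    // huge unsigned value, so "i u< bound" covers both sides at once.
    static boolean inRangeUnsigned(int i, int bound) {
        return Integer.compareUnsigned(i, bound) < 0;
    }

    public static void main(String[] args) {
        for (int i : new int[] {-1, 0, 5, 9, 10, Integer.MIN_VALUE}) {
            if (inRangeSigned(i, 10) != inRangeUnsigned(i, 10)) {
                throw new AssertionError("mismatch at " + i);
            }
        }
        System.out.println("ok");
    }
}
```

With the intrinsic, the compiler can reduce `compareUnsigned(x, y) < 0` to a single unsigned branch instead of materialising the -1/0/1 result.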
On Wed, 4 May 2022 01:59:17 GMT, Quan Anh Mai wrote:
> Hi,
>
> This patch optimises the matching rules for floating-point comparison with
> respect to eq/ne on x86-64.
>
> 1. When the inputs of a comparison are the same (i.e. `isNaN` patterns), `ZF`
> is always set, so we don't need `cmpOpUCF2` for the eq/ne cases
On Sat, 21 May 2022 10:31:25 GMT, Quan Anh Mai wrote:
>> Hi,
>>
>> This patch optimises the matching rules for floating-point comparison with
>> respect to eq/ne on x86-64.
>>
>> 1. When the inputs of a comparison are the same (i.e. `isNaN` patterns), `ZF`
On Wed, 4 May 2022 23:27:45 GMT, Vladimir Kozlov wrote:
>> The changes to `Float` and `Double` look good. I don't think we need
>> additional tests, see test/jdk/java/lang/Math/IeeeRecommendedTests.java.
>>
>> At first i thought we no longer need PR #8459 but it seems both PRs are
>>
> FPComparison.isInfiniteDouble  avgt  5  1232.800 ±  31.677  621.185 ± 11.935  ns/op  1.98
> FPComparison.isInfiniteFloat   avgt  5  1234.708 ±  70.239  623.566 ± 15.206  ns/op  1.98
> FPComparison.isNanDouble       avgt  5  2255.847 ±   7.238  400.124 ±  0.762  ns/op
On Wed, 18 May 2022 15:44:10 GMT, Quan Anh Mai wrote:
> This patch backs out the changes made by
> [JDK-8285390](https://bugs.openjdk.java.net/browse/JDK-8285390) and
> [JDK-8284742](https://bugs.openjdk.java.net/browse/JDK-8284742) since there
> are failures due to div nodes floating above their validity checks.
On Thu, 19 May 2022 15:29:29 GMT, Quan Anh Mai wrote:
>> This patch backs out the changes made by
>> [JDK-8285390](https://bugs.openjdk.java.net/browse/JDK-8285390) and
>> [JDK-8284742](https://bugs.openjdk.java.net/browse/JDK-8284742) since there
>> are failures due to div nodes floating above their validity checks.
> This patch backs out the changes made by
> [JDK-8285390](https://bugs.openjdk.java.net/browse/JDK-8285390) and
> [JDK-8284742](https://bugs.openjdk.java.net/browse/JDK-8284742) since there
> are failures due to div nodes floating above their validity checks.
>
> Thanks.
On Wed, 4 May 2022 23:16:41 GMT, Vladimir Kozlov wrote:
>> src/hotspot/cpu/x86/x86_64.ad line 6998:
>>
>>> 6996: ins_encode %{
>>> 6997: __ cmovl(Assembler::parity, $dst$$Register, $src$$Register);
>>> 6998: __ cmovl(Assembler::notEqual, $dst$$Register, $src$$Register);
>>
>> Should
This patch backs out the changes made by
[JDK-8285390](https://bugs.openjdk.java.net/browse/JDK-8285390) and
[JDK-8284742](https://bugs.openjdk.java.net/browse/JDK-8284742) since there are
failures due to div nodes floating above their validity checks.
Thanks.
-
Commit messages:
-
On Wed, 4 May 2022 01:59:17 GMT, Quan Anh Mai wrote:
> Hi,
>
> This patch optimises the matching rules for floating-point comparison with
> respect to eq/ne on x86-64.
>
> 1. When the inputs of a comparison are the same (i.e. `isNaN` patterns), `ZF`
> is always set, so we don't need `cmpOpUCF2` for the eq/ne cases
> FPComparison.isFiniteDouble    avgt  5  518.309 ± 107.352  ns/op
> FPComparison.isFiniteFloat     avgt  5  515.576 ±  14.669  ns/op
> FPComparison.isInfiniteDouble  avgt  5  621.185 ±  11.935  ns/op
> FPComparison.isInfiniteFloat   avgt  5  623.566 ±  15.206  ns/op
On Fri, 13 May 2022 01:35:40 GMT, Xiaohong Gong wrote:
>> Checking whether the indexes of masked lanes are inside of the valid memory
>> boundary is necessary for masked vector memory access. However, this could
>> be saved if the given offset is inside of the vector range that could make
>>
On Fri, 13 May 2022 01:27:18 GMT, Xiaohong Gong wrote:
>> Maybe we could use `a.length - vsp.length() > 0 && offset u< a.length -
>> vsp.length()` which would hoist the first check outside of the loop.
>> Thanks.
>
>> Maybe we could use `a.length - vsp.length() > 0 && offset u< a.length -
>>
On Tue, 10 May 2022 01:23:55 GMT, Xiaohong Gong wrote:
> Checking whether the indexes of masked lanes are inside of the valid memory
> boundary is necessary for masked vector memory access. However, this could be
> saved if the given offset is inside of the vector range that could make sure
>
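The suggested condition can be sketched in scalar Java (names hypothetical; the real code lives in the Vector API implementation):

```java
public class MaskedBoundsDemo {
    // Sketch of the suggested guard: "a.length - vsp.length() > 0 &&
    // offset u< a.length - vsp.length()". The first clause is loop-invariant
    // and can be hoisted; the unsigned compare also rejects negative offsets.
    static boolean inVectorRange(int offset, int aLength, int vlen) {
        int limit = aLength - vlen;
        return limit > 0 && Integer.compareUnsigned(offset, limit) < 0;
    }

    public static void main(String[] args) {
        if (!inVectorRange(0, 16, 4)) throw new AssertionError();
        if (inVectorRange(-1, 16, 4)) throw new AssertionError(); // negative offset rejected
        if (inVectorRange(8, 4, 8)) throw new AssertionError();   // array shorter than vector
        System.out.println("ok");
    }
}
```

When the guard holds, the in-range masked access can skip the per-lane boundary check entirely.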
Hi,
This patch optimises the matching rules for floating-point comparison with
respect to eq/ne on x86-64.
1. When the inputs of a comparison are the same (i.e. `isNaN` patterns), `ZF` is
always set, so we don't need `cmpOpUCF2` for the eq/ne cases, which improves
the sequence of `If (CmpF x x)
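For background (my summary, not text from the patch): `ucomiss`/`ucomisd` with identical operands sets `ZF` both for equality and for the unordered (NaN) result, so only `PF` distinguishes NaN. At the Java level, the pattern being optimised is the self-comparison idiom:

```java
public class NaNCompareDemo {
    // The `isNaN` pattern: only NaN compares unequal to itself, so this
    // compiles down to a single floating-point self-comparison.
    static boolean isNaNViaCompare(float x) {
        return x != x;
    }

    public static void main(String[] args) {
        if (!isNaNViaCompare(Float.NaN)) throw new AssertionError();
        if (isNaNViaCompare(1.0f)) throw new AssertionError();
        if (isNaNViaCompare(Float.POSITIVE_INFINITY)) throw new AssertionError();
        System.out.println("ok");
    }
}
```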
On Sun, 17 Apr 2022 14:35:14 GMT, Jie Fu wrote:
> Hi all,
>
> According to the Vector API doc, the `LSHR` operator computes
> `a>>>(n&(ESIZE*8-1))`.
> However, the current implementation is incorrect for negative bytes/shorts.
>
> The background is that one of our customers tried to vectorize
On Mon, 18 Apr 2022 08:29:52 GMT, Jie Fu wrote:
>>> @DamonFool
>>>
>>> I think the issue is that these two cases of yours are not equal
>>> semantically.
>>
>> Why?
>> According to the vector api doc, they should compute the same value when the
>> shift_cnt is 3, right?
>>
On Mon, 18 Apr 2022 04:14:39 GMT, Jie Fu wrote:
> However, just imagine that someone would like to optimize some code segments of
> bytes/shorts `>>>`
Then that person can just use signed shift (`VectorOperators.ASHR`), right?
Shifting on masked shift counts means that the shift count cannot be
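The semantic difference under discussion can be reproduced in scalar Java (helper names are hypothetical):

```java
public class ByteLshrDemo {
    // Vector API LSHR semantics on a byte lane: shift within the 8-bit lane,
    // count masked by ESIZE*8-1 = 7, with the value treated as unsigned.
    static byte lshrLane(byte a, int n) {
        return (byte) ((a & 0xFF) >>> (n & 7));
    }

    // Naive scalar >>> first promotes the byte to int with sign extension,
    // so negative bytes give a different result.
    static byte naive(byte a, int n) {
        return (byte) (a >>> (n & 7));
    }

    public static void main(String[] args) {
        byte a = -1;                                         // 0xFF
        if (lshrLane(a, 3) != (byte) 0x1F) throw new AssertionError();
        if (naive(a, 3) == lshrLane(a, 3)) throw new AssertionError();
        System.out.println("ok");
    }
}
```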
Hi,
This patch moves the handling of integral division overflow on x86 from code
emission time to parsing time. This allows the compiler to perform more
efficient transformations and also aids in achieving better code layout.
I also removed the handling for division by 10 in the ad file since
On Fri, 8 Apr 2022 16:39:31 GMT, Vladimir Kozlov wrote:
>> Hi Vladimir (@vnkozlov),
>>
>> Incorporated all the suggestions you made in the previous review and pushed
>> a new commit.
>> Please let me know if anything else is needed.
>>
>> Thanks,
>> Vamsi
>
> @vamsi-parasa I got failures in
On Fri, 8 Apr 2022 01:05:33 GMT, Srinivas Vamsi Parasa wrote:
>> Optimizes the divideUnsigned() and remainderUnsigned() methods in
>> java.lang.Integer and java.lang.Long classes using x86 intrinsics. This
>> change shows 3x improvement for Integer methods and up to 25% improvement for
>>
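For reference, the methods being intrinsified interpret their operands as unsigned (a small sanity-check sketch, not code from the patch):

```java
public class UnsignedDivDemo {
    public static void main(String[] args) {
        int x = -2;  // bit pattern 0xFFFFFFFE, i.e. 4294967294 unsigned
        if (Integer.divideUnsigned(x, 3) != 1431655764) throw new AssertionError();
        if (Integer.remainderUnsigned(x, 3) != 2) throw new AssertionError();
        // -1L is 2^64 - 1 unsigned; halving it gives 2^63 - 1 = Long.MAX_VALUE
        if (Long.divideUnsigned(-1L, 2L) != Long.MAX_VALUE) throw new AssertionError();
        System.out.println("ok");
    }
}
```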
On Wed, 30 Mar 2022 10:31:59 GMT, Xiaohong Gong wrote:
> Currently the vector load with mask when the given index happens to be out of
> the array boundary is implemented with pure Java scalar code to avoid the
> IOOBE (IndexOutOfBoundsException). This is necessary for architectures that do
> not
On Tue, 29 Mar 2022 21:56:18 GMT, Vamsi Parasa wrote:
>> This is both complicated and inefficient, I would suggest building the
>> intrinsic in the IR graph so that the compiler can simplify
>> `Integer.compareUnsigned(x, y) < 0` into `x u< y`. Thanks.
>
>> This is both complicated and
On Sun, 27 Mar 2022 06:15:34 GMT, Vamsi Parasa wrote:
> Implements x86 intrinsics for compare() method in java.lang.Integer and
> java.lang.Long.
This is both complicated and inefficient, I would suggest building the
intrinsic in the IR graph so that the compiler can simplify
`Integer.compareUnsigned(x, y) < 0` into `x u< y`. Thanks.
On Tue, 22 Mar 2022 02:52:07 GMT, Jatin Bhateja wrote:
>>> A read from constant table will incur minimum of L1I access penalty to
>>> access code blob or at worst even more if data is not present in first
>>> level cache
>>
>> But your approach comes at a cost of frontend bandwidth and port
On Mon, 21 Mar 2022 18:25:36 GMT, Jatin Bhateja wrote:
> A read from the constant table will incur at minimum an L1I access penalty to
> access the code blob, or worse if the data is not present in the first-level cache.
But your approach comes at a cost of frontend bandwidth and port contention,
On Sun, 13 Mar 2022 04:27:44 GMT, Jatin Bhateja wrote:
>> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4178:
>>
>>> 4176: movl(scratch, 1056964608);
>>> 4177: movq(xtmp1, scratch);
>>> 4178: vbroadcastss(xtmp1, xtmp1, vec_enc);
>>
>> You could put the constant in the constant table
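The magic number in the quoted snippet is the bit pattern of `0.5f`, which the rounding sequence broadcasts to every lane (a quick check of that reading, not code from the patch):

```java
public class HalfConstantDemo {
    public static void main(String[] args) {
        // 1056964608 == 0x3F000000, the IEEE-754 single-precision bit
        // pattern of 0.5f; vbroadcastss replicates it across the vector.
        if (Float.intBitsToFloat(1056964608) != 0.5f) throw new AssertionError();
        if (Float.floatToIntBits(0.5f) != 0x3F000000) throw new AssertionError();
        System.out.println("ok");
    }
}
```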
On Sat, 12 Mar 2022 09:48:14 GMT, xpbob wrote:
> * Constructs an empty list with an initial capacity of ten
>
> =>
>
> * Constructs an empty list with default sized empty instances.
>
>
> private static final Object[] DEFAULTCAPACITY_EMPTY_ELEMENTDATA = {};
>
>
On Sat, 12 Mar 2022 23:22:16 GMT, Quan Anh Mai wrote:
>> Jatin Bhateja has updated the pull request incrementally with one additional
>> commit since the last revision:
>>
>> 8279508: Creating separate test for round double under feature check.
>
> src/hotspot
On Sat, 12 Mar 2022 19:58:37 GMT, Jatin Bhateja wrote:
>> Summary of changes:
>> - Intrinsify Math.round(float) and Math.round(double) APIs.
>> - Extend auto-vectorizer to infer vector operations on encountering scalar
>> IR nodes for above intrinsics.
>> - Test creation using new IR testing
On Fri, 4 Mar 2022 17:44:44 GMT, Ludovic Henry wrote:
>> Despite the hash value being cached for Strings, computing the hash still
>> represents a significant CPU usage for applications handling lots of text.
>>
>> Even though it would be generally better to do it through an enhancement to
>>
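The hot loop in question is the classic 31-based polynomial hash; a scalar sketch that mirrors the Latin-1 case (my reconstruction, not code from the PR):

```java
import java.nio.charset.StandardCharsets;

public class StringHashDemo {
    // The 31-based polynomial hash over Latin-1 bytes, as in
    // StringLatin1.hashCode; this is the loop being vectorized.
    static int hash(byte[] latin1) {
        int h = 0;
        for (byte b : latin1) {
            h = 31 * h + (b & 0xFF);
        }
        return h;
    }

    public static void main(String[] args) {
        String s = "hello";
        if (hash(s.getBytes(StandardCharsets.ISO_8859_1)) != s.hashCode())
            throw new AssertionError();
        System.out.println("ok");
    }
}
```

The loop can be reassociated into independent lane accumulators because `h` after n steps is a polynomial in 31 with the byte values as coefficients, which is what makes vectorization possible.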
On Sat, 19 Feb 2022 05:51:52 GMT, Quan Anh Mai wrote:
> Hi,
>
> `Objects.requireNonNull` may fail to be inlined. The call is expensive and
> may lead to objects escaping to the heap while the null check is cheap and is
> often elided. I have observed this when using the vector
On Tue, 1 Mar 2022 02:22:49 GMT, Quan Anh Mai wrote:
>> Hi,
>>
>> `Objects.requireNonNull` may fail to be inlined. The call is expensive and
>> may lead to objects escaping to the heap while the null check is cheap and
>> is often elided. I have observed this when using the vector API when a call
>> to `Objects.requireNonNull` leads to vectors being materialised in a hot loop.
>
> Should the other `requireNonNull` be `ForceInline` as well?
>
> Thank you very much.
Quan Anh Mai has updated the pull request incrementally with one additional
commit since the last revision:
the other
On Sat, 26 Feb 2022 03:02:51 GMT, Sandhya Viswanathan wrote:
>> src/hotspot/cpu/x86/x86.ad line 7263:
>>
>>> 7261: __ vector_round_float_avx($dst$$XMMRegister, $src$$XMMRegister,
>>> $xtmp1$$XMMRegister,
>>> 7262: $xtmp2$$XMMRegister,
>>>
On Sat, 26 Feb 2022 03:37:32 GMT, Quan Anh Mai wrote:
>> Clarification, the number in my comments above is (2^w - 1). This is from
>> Intel SDM
>> (https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html).
>> Also you will need to
Hi,
`Objects.requireNonNull` may fail to be inlined. The call is expensive and may
lead to objects escaping to the heap while the null check is cheap and is often
elided. I have observed this when using the vector API when a call to
`Objects.requireNonNull` leads to vectors being materialised in a hot loop.
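A minimal sketch of the pattern (a hypothetical example, not from the vector API code):

```java
import java.util.Objects;

public class RequireNonNullDemo {
    static int lengthOf(int[] a) {
        // When inlined, this collapses to a cheap (often implicit) null
        // check; when the call fails to inline, it stays a real call site
        // and its argument may be forced to escape.
        return Objects.requireNonNull(a, "a").length;
    }

    public static void main(String[] args) {
        if (lengthOf(new int[3]) != 3) throw new AssertionError();
        boolean threw = false;
        try {
            lengthOf(null);
        } catch (NullPointerException e) {
            threw = true;
        }
        if (!threw) throw new AssertionError();
        System.out.println("ok");
    }
}
```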
On Sat, 5 Feb 2022 15:34:08 GMT, Quan Anh Mai wrote:
> Hi,
>
> This patch implements the unsigned upcast intrinsics in x86, which are used
> in vector lane-wise reinterpreting operations.
>
> Thank you very much.
This pull request has now been integrated.
Changeset: 0af356
On Sun, 13 Feb 2022 05:18:34 GMT, Quan Anh Mai wrote:
>> Hi,
>>
>> This patch implements the unsigned upcast intrinsics in x86, which are used
>> in vector lane-wise reinterpreting operations.
>>
>> Thank you very much.
>
> Quan Anh Mai has updat
On Sun, 13 Feb 2022 03:09:43 GMT, Jatin Bhateja wrote:
>> Summary of changes:
>> - Intrinsify Math.round(float) and Math.round(double) APIs.
>> - Extend auto-vectorizer to infer vector operations on encountering scalar
>> IR nodes for above intrinsics.
>> - Test creation using new IR testing
On Thu, 10 Feb 2022 18:55:29 GMT, Paul Sandoz wrote:
>> Quan Anh Mai has updated the pull request incrementally with two additional
>> commits since the last revision:
>>
>> - minor rename
>> - address reviews
>
> Obse
> Hi,
>
> This patch implements the unsigned upcast intrinsics in x86, which are used
> in vector lane-wise reinterpreting operations.
>
> Thank you very much.
Quan Anh Mai has updated the pull request incrementally with one additional
commit since the last revision:
m
On Thu, 10 Feb 2022 05:05:05 GMT, Jatin Bhateja wrote:
>> Quan Anh Mai has updated the pull request incrementally with two additional
>> commits since the last revision:
>>
>> - minor rename
>> - address reviews
>
> src/hotspot/cpu/x86/x86.ad line 7288
On Wed, 9 Feb 2022 22:52:47 GMT, Sandhya Viswanathan wrote:
>> Quan Anh Mai has updated the pull request incrementally with two additional
>> commits since the last revision:
>>
>> - minor rename
>> - address reviews
>
> src/hotspot/cpu/x86/ass
> Hi,
>
> This patch implements the unsigned upcast intrinsics in x86, which are used
> in vector lane-wise reinterpreting operations.
>
> Thank you very much.
Quan Anh Mai has updated the pull request incrementally with two additional
commits since the last revision
Hi,
This patch implements the unsigned upcast intrinsics in x86, which are used in
vector lane-wise reinterpreting operations.
Thank you very much.
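A scalar analogue of the unsigned (zero-extending) upcast, for intuition (not code from the patch):

```java
public class UnsignedUpcastDemo {
    public static void main(String[] args) {
        byte b = -1;
        // Unsigned upcast zero-extends: 0xFF -> 255
        if ((b & 0xFF) != 255) throw new AssertionError();
        // The plain (signed) upcast sign-extends instead: 0xFF -> -1
        if ((int) b != -1) throw new AssertionError();
        // Same idea for short -> int
        short s = -2;
        if ((s & 0xFFFF) != 65534) throw new AssertionError();
        System.out.println("ok");
    }
}
```

Lane-wise reinterpreting casts to a wider unsigned element type need exactly this zero extension on every lane.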
-
Commit messages:
- unsigned cast intrinsics
Changes: https://git.openjdk.java.net/jdk/pull/7358/files
Webrev:
On Sat, 15 Jan 2022 02:21:38 GMT, Jatin Bhateja wrote:
> Summary of changes:
> - Intrinsify Math.round(float) and Math.round(double) APIs.
> - Extend auto-vectorizer to infer vector operations on encountering scalar IR
> nodes for above intrinsics.
> - Test creation using new IR testing
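For reference, the scalar semantics the auto-vectorizer must preserve: `Math.round` rounds half up (effectively floor(x + 0.5) with special cases), not half to even (a sanity check, not code from the patch):

```java
public class RoundDemo {
    public static void main(String[] args) {
        if (Math.round(0.5f) != 1) throw new AssertionError();
        if (Math.round(-0.5f) != 0) throw new AssertionError();  // ties round up
        if (Math.round(2.5) != 3L) throw new AssertionError();   // not banker's rounding
        if (Math.round(Float.NaN) != 0) throw new AssertionError();
        System.out.println("ok");
    }
}
```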