Re: RFR: 8279508: Auto-vectorize Math.round API [v2]

2022-03-11 Thread Sandhya Viswanathan
On Thu, 3 Mar 2022 05:42:23 GMT, Jatin Bhateja  wrote:

>> The testing for this PR doesn't look adequate to me. I don't see any testing 
>> for the values where the behavior of round has been redefined at points in 
>> the last decade. See JDK-8010430 and JDK-6430675, both of which have 
>> regression tests in the core libs area. Thanks.
>
> Hi @jddarcy , can you kindly validate your feedback, it has been incorporated.

@jatin-bhateja There is a failure reported in the Pre-submit tests on Windows 
x64 for compiler/vectorization/TestRoundVect.java. Could you please take a look?

-

PR: https://git.openjdk.java.net/jdk/pull/7094


Re: RFR: 8279508: Auto-vectorize Math.round API [v2]

2022-03-02 Thread Jatin Bhateja
On Wed, 19 Jan 2022 22:09:26 GMT, Joe Darcy  wrote:

>> Jatin Bhateja has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   8279508: Adding a test for scalar intrinsification.
>
> The testing for this PR doesn't look adequate to me. I don't see any testing 
> for the values where the behavior of round has been redefined at points in 
> the last decade. See JDK-8010430 and JDK-6430675, both of which have 
> regression tests in the core libs area. Thanks.

Hi @jddarcy , can you kindly validate your feedback, it has been incorporated.

-

PR: https://git.openjdk.java.net/jdk/pull/7094


Re: RFR: 8279508: Auto-vectorize Math.round API [v2]

2022-02-16 Thread Joseph D. Darcy



On 2/12/2022 6:55 PM, Jatin Bhateja wrote:

On Fri, 21 Jan 2022 00:49:04 GMT, Sandhya Viswanathan 
 wrote:


The JVM currently initializes the x86 mxcsr to round to nearest even, see below 
in stubGenerator_x86_64.cpp: // Round to nearest (even), 64-bit mode, 
exceptions masked StubRoutines::x86::_mxcsr_std = 0x1F80; The above works for 
Math.rint which is specified to be round to nearest even. Please see: 
https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html
 : section 4.8.4

The rounding mode needed for Math.round is round to positive infinity which 
needs a different x86 mxcsr initialization(0x5F80).

Hi @sviswa7 ,
As per JLS 17 section 15.4 Java follows round to nearest rounding policy for 
all floating point operations except conversion to integer and remainder where 
it uses round toward zero.



That is a true background condition, but I will note that the Math.round 
method does independently define the semantics of its operation and 
rounding behavior, which has changed (slightly) over the lifetime of the 
platform.


-Joe




Re: RFR: 8279508: Auto-vectorize Math.round API [v2]

2022-02-12 Thread Jatin Bhateja
On Fri, 21 Jan 2022 00:49:04 GMT, Sandhya Viswanathan 
 wrote:

> The JVM currently initializes the x86 mxcsr to round to nearest even, see 
> below in stubGenerator_x86_64.cpp: // Round to nearest (even), 64-bit mode, 
> exceptions masked StubRoutines::x86::_mxcsr_std = 0x1F80; The above works for 
> Math.rint which is specified to be round to nearest even. Please see: 
> https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html
>  : section 4.8.4
> 
> The rounding mode needed for Math.round is round to positive infinity which 
> needs a different x86 mxcsr initialization(0x5F80).

Hi @sviswa7 ,
As per JLS 17 section 15.4 Java follows round to nearest rounding policy for 
all floating point operations except conversion to integer and remainder where 
it uses round toward zero.  

So it may not be feasible to modify global MXCSR.RC setting,  also modifying 
MXCSR setting just before rounding and re-setting back to its original value 
after operation will also not work as OOO processor is free to re-order LMXCSR 
instruction if used without any barriers and thus it may also influence other 
floating point operation. 
I am pushing an incremental patch which is vectorizes existing rounding APIs 
and is showing significant gain over existing implementation.

Best Regards,
Jatin

-

PR: https://git.openjdk.java.net/jdk/pull/7094


Re: RFR: 8279508: Auto-vectorize Math.round API [v2]

2022-01-20 Thread Sandhya Viswanathan
On Wed, 19 Jan 2022 17:38:25 GMT, Jatin Bhateja  wrote:

>> Summary of changes:
>> - Intrinsify Math.round(float) and Math.round(double) APIs.
>> - Extend auto-vectorizer to infer vector operations on encountering scalar 
>> IR nodes for above intrinsics.
>> - Test creation using new IR testing framework.
>> 
>> Following are the performance number of a JMH micro included with the patch 
>> 
>> Test System: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (Icelake Server)
>> 
>>   |   | BASELINE AVX2 | WithOpt AVX2 | Gain (opt/baseline) | Baseline AVX3 | 
>> Withopt AVX3 | Gain (opt/baseline)
>> -- | -- | -- | -- | -- | -- | -- | --
>> Benchmark | ARRAYLEN | Score (ops/ms) | Score (ops/ms) |   | Score (ops/ms) 
>> | Score (ops/ms) |  
>> FpRoundingBenchmark.test_round_double | 1024 | 518.532 | 1364.066 | 
>> 2.630630318 | 512.908 | 4292.11 | 8.368186887
>> FpRoundingBenchmark.test_round_double | 2048 | 270.137 | 830.986 | 
>> 3.076165057 | 273.159 | 2459.116 | 9.002507697
>> FpRoundingBenchmark.test_round_float | 1024 | 752.436 | 7780.905 | 
>> 10.34095259 | 752.49 | 9506.694 | 12.63364829
>> FpRoundingBenchmark.test_round_float | 2048 | 389.499 | 4113.046 | 
>> 10.55983712 | 389.63 | 4863.673 | 12.48279907
>> 
>> Kindly review and share your feedback.
>> 
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   8279508: Adding a test for scalar intrinsification.

The JVM currently initializes the x86 mxcsr to round to nearest even, see below 
in stubGenerator_x86_64.cpp:
// Round to nearest (even), 64-bit mode, exceptions masked
StubRoutines::x86::_mxcsr_std = 0x1F80;
The above works for Math.rint which is specified to be round to nearest even.
Please see:
https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html
 : section 4.8.4

The rounding mode needed for Math.round is round to positive infinity which 
needs a different x86 mxcsr initialization(0x5F80).

-

PR: https://git.openjdk.java.net/jdk/pull/7094


Re: RFR: 8279508: Auto-vectorize Math.round API [v2]

2022-01-19 Thread Joe Darcy
On Wed, 19 Jan 2022 17:38:25 GMT, Jatin Bhateja  wrote:

>> Summary of changes:
>> - Intrinsify Math.round(float) and Math.round(double) APIs.
>> - Extend auto-vectorizer to infer vector operations on encountering scalar 
>> IR nodes for above intrinsics.
>> - Test creation using new IR testing framework.
>> 
>> Following are the performance number of a JMH micro included with the patch 
>> 
>> Test System: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (Icelake Server)
>> 
>>   |   | BASELINE AVX2 | WithOpt AVX2 | Gain (opt/baseline) | Baseline AVX3 | 
>> Withopt AVX3 | Gain (opt/baseline)
>> -- | -- | -- | -- | -- | -- | -- | --
>> Benchmark | ARRAYLEN | Score (ops/ms) | Score (ops/ms) |   | Score (ops/ms) 
>> | Score (ops/ms) |  
>> FpRoundingBenchmark.test_round_double | 1024 | 518.532 | 1364.066 | 
>> 2.630630318 | 512.908 | 4292.11 | 8.368186887
>> FpRoundingBenchmark.test_round_double | 2048 | 270.137 | 830.986 | 
>> 3.076165057 | 273.159 | 2459.116 | 9.002507697
>> FpRoundingBenchmark.test_round_float | 1024 | 752.436 | 7780.905 | 
>> 10.34095259 | 752.49 | 9506.694 | 12.63364829
>> FpRoundingBenchmark.test_round_float | 2048 | 389.499 | 4113.046 | 
>> 10.55983712 | 389.63 | 4863.673 | 12.48279907
>> 
>> Kindly review and share your feedback.
>> 
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   8279508: Adding a test for scalar intrinsification.

The testing for this PR doesn't look adequate to me. I don't see any testing 
for the values where the behavior of round has been redefined at points in the 
last decade. See JDK-8010430 and JDK-6430675, both of which have regression 
tests in the core libs area. Thanks.

-

PR: https://git.openjdk.java.net/jdk/pull/7094


Re: RFR: 8279508: Auto-vectorize Math.round API [v2]

2022-01-19 Thread Vladimir Ivanov
On Wed, 19 Jan 2022 17:38:25 GMT, Jatin Bhateja  wrote:

>> Summary of changes:
>> - Intrinsify Math.round(float) and Math.round(double) APIs.
>> - Extend auto-vectorizer to infer vector operations on encountering scalar 
>> IR nodes for above intrinsics.
>> - Test creation using new IR testing framework.
>> 
>> Following are the performance number of a JMH micro included with the patch 
>> 
>> Test System: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (Icelake Server)
>> 
>>   |   | BASELINE AVX2 | WithOpt AVX2 | Gain (opt/baseline) | Baseline AVX3 | 
>> Withopt AVX3 | Gain (opt/baseline)
>> -- | -- | -- | -- | -- | -- | -- | --
>> Benchmark | ARRAYLEN | Score (ops/ms) | Score (ops/ms) |   | Score (ops/ms) 
>> | Score (ops/ms) |  
>> FpRoundingBenchmark.test_round_double | 1024 | 518.532 | 1364.066 | 
>> 2.630630318 | 512.908 | 4292.11 | 8.368186887
>> FpRoundingBenchmark.test_round_double | 2048 | 270.137 | 830.986 | 
>> 3.076165057 | 273.159 | 2459.116 | 9.002507697
>> FpRoundingBenchmark.test_round_float | 1024 | 752.436 | 7780.905 | 
>> 10.34095259 | 752.49 | 9506.694 | 12.63364829
>> FpRoundingBenchmark.test_round_float | 2048 | 389.499 | 4113.046 | 
>> 10.55983712 | 389.63 | 4863.673 | 12.48279907
>> 
>> Kindly review and share your feedback.
>> 
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   8279508: Adding a test for scalar intrinsification.

There are already `RoundFloat`, `RoundDouble`, and `RoundDoubleMode` nodes 
defined.

Though `RoundFloat` and `RoundDouble` are legacy nodes used only on x86-32, 
`RoundDoubleMode` supports multiple rounding modes and is amenable to 
auto-vectorization.

What do you think about the following alternative? 

Reuse `RoundDoubleMode` (with a new rounding mode) and introduce 
`RoundFloatMode`.

Special rounding rules is not the only peculiarity of `Math.round()`. It also 
converts the result to an integral type. It can be represented as `ConvF2I 
(RoundFloatMode f #rmode)` / `ConvD2L (RoundDoubleMode d #rmode)`. In scalar 
case, it can be matched as a single AD instruction.

Auto-vectorizer can then convert it to `VectorCastF2X (RoundFloatModeV vf 
#rmode)` / `VectorCastD2X (RoundDoubleModeV vd #rmode)` and match it in a 
similar manner.

test/hotspot/jtreg/compiler/c2/cr6340864/TestFloatVect.java line 33:

> 31:  * @run main/othervm -Xbatch -XX:CompileCommand=exclude,*::test() 
> -Xmx128m -XX:MaxVectorSize=16 compiler.c2.cr6340864.TestFloatVect
> 32:  * @run main/othervm -Xbatch -XX:CompileCommand=exclude,*::test() 
> -Xmx128m -XX:MaxVectorSize=32 compiler.c2.cr6340864.TestFloatVect
> 33:  * @run main/othervm -Xbatch -XX:CompileCommand=exclude,*::test() 
> -XX:TieredStopAtLevel=2 -Xmx128m -XX:MaxVectorSize=32 
> compiler.c2.cr6340864.TestFloatVect

What's the purpose of `-XX:TieredStopAtLevel=2` from testing perspective?

-

PR: https://git.openjdk.java.net/jdk/pull/7094


Re: RFR: 8279508: Auto-vectorize Math.round API [v2]

2022-01-19 Thread Jatin Bhateja
> Summary of changes:
> - Intrinsify Math.round(float) and Math.round(double) APIs.
> - Extend auto-vectorizer to infer vector operations on encountering scalar IR 
> nodes for above intrinsics.
> - Test creation using new IR testing framework.
> 
> Following are the performance number of a JMH micro included with the patch 
> 
> Test System: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (Icelake Server)
> 
>   |   | BASELINE AVX2 | WithOpt AVX2 | Gain (opt/baseline) | Baseline AVX3 | 
> Withopt AVX3 | Gain (opt/baseline)
> -- | -- | -- | -- | -- | -- | -- | --
> Benchmark | ARRAYLEN | Score (ops/ms) | Score (ops/ms) |   | Score (ops/ms) | 
> Score (ops/ms) |  
> FpRoundingBenchmark.test_round_double | 1024 | 518.532 | 1364.066 | 
> 2.630630318 | 512.908 | 4292.11 | 8.368186887
> FpRoundingBenchmark.test_round_double | 2048 | 270.137 | 830.986 | 
> 3.076165057 | 273.159 | 2459.116 | 9.002507697
> FpRoundingBenchmark.test_round_float | 1024 | 752.436 | 7780.905 | 
> 10.34095259 | 752.49 | 9506.694 | 12.63364829
> FpRoundingBenchmark.test_round_float | 2048 | 389.499 | 4113.046 | 
> 10.55983712 | 389.63 | 4863.673 | 12.48279907
> 
> Kindly review and share your feedback.
> 
> Best Regards,
> Jatin

Jatin Bhateja has updated the pull request incrementally with one additional 
commit since the last revision:

  8279508: Adding a test for scalar intrinsification.

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/7094/files
  - new: https://git.openjdk.java.net/jdk/pull/7094/files/0fe01504..575d2935

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk=7094=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk=7094=00-01

  Stats: 2 lines in 2 files changed: 2 ins; 0 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/7094.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/7094/head:pull/7094

PR: https://git.openjdk.java.net/jdk/pull/7094