Re: RFR: 8279508: Auto-vectorize Math.round API [v2]
On Thu, 3 Mar 2022 05:42:23 GMT, Jatin Bhateja wrote: >> The testing for this PR doesn't look adequate to me. I don't see any testing >> for the values where the behavior of round has been redefined at points in >> the last decade. See JDK-8010430 and JDK-6430675, both of which have >> regression tests in the core libs area. Thanks. > > Hi @jddarcy , can you kindly validate your feedback, it has been incorporated. @jatin-bhateja There is a failure reported in the Pre-submit tests on Windows x64 for compiler/vectorization/TestRoundVect.java. Could you please take a look? - PR: https://git.openjdk.java.net/jdk/pull/7094
Re: RFR: 8279508: Auto-vectorize Math.round API [v2]
On Wed, 19 Jan 2022 22:09:26 GMT, Joe Darcy wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional >> commit since the last revision: >> >> 8279508: Adding a test for scalar intrinsification. > > The testing for this PR doesn't look adequate to me. I don't see any testing > for the values where the behavior of round has been redefined at points in > the last decade. See JDK-8010430 and JDK-6430675, both of which have > regression tests in the core libs area. Thanks. Hi @jddarcy , can you kindly validate your feedback, it has been incorporated. - PR: https://git.openjdk.java.net/jdk/pull/7094
Re: RFR: 8279508: Auto-vectorize Math.round API [v2]
On 2/12/2022 6:55 PM, Jatin Bhateja wrote: On Fri, 21 Jan 2022 00:49:04 GMT, Sandhya Viswanathan wrote: The JVM currently initializes the x86 mxcsr to round to nearest even, see below in stubGenerator_x86_64.cpp: // Round to nearest (even), 64-bit mode, exceptions masked StubRoutines::x86::_mxcsr_std = 0x1F80; The above works for Math.rint which is specified to be round to nearest even. Please see: https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html : section 4.8.4 The rounding mode needed for Math.round is round to positive infinity which needs a different x86 mxcsr initialization(0x5F80). Hi @sviswa7 , As per JLS 17 section 15.4 Java follows round to nearest rounding policy for all floating point operations except conversion to integer and remainder where it uses round toward zero. That is a true background condition, but I will note that the Math.round method does independently define the semantics of its operation and rounding behavior, which has changed (slightly) over the lifetime of the platform. -Joe
Re: RFR: 8279508: Auto-vectorize Math.round API [v2]
On Fri, 21 Jan 2022 00:49:04 GMT, Sandhya Viswanathan wrote: > The JVM currently initializes the x86 mxcsr to round to nearest even, see > below in stubGenerator_x86_64.cpp: // Round to nearest (even), 64-bit mode, > exceptions masked StubRoutines::x86::_mxcsr_std = 0x1F80; The above works for > Math.rint which is specified to be round to nearest even. Please see: > https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html > : section 4.8.4 > > The rounding mode needed for Math.round is round to positive infinity which > needs a different x86 mxcsr initialization(0x5F80). Hi @sviswa7 , As per JLS 17 section 15.4 Java follows round to nearest rounding policy for all floating point operations except conversion to integer and remainder where it uses round toward zero. So it may not be feasible to modify global MXCSR.RC setting, also modifying MXCSR setting just before rounding and re-setting back to its original value after operation will also not work as OOO processor is free to re-order LMXCSR instruction if used without any barriers and thus it may also influence other floating point operation. I am pushing an incremental patch which is vectorizes existing rounding APIs and is showing significant gain over existing implementation. Best Regards, Jatin - PR: https://git.openjdk.java.net/jdk/pull/7094
Re: RFR: 8279508: Auto-vectorize Math.round API [v2]
On Wed, 19 Jan 2022 17:38:25 GMT, Jatin Bhateja wrote: >> Summary of changes: >> - Intrinsify Math.round(float) and Math.round(double) APIs. >> - Extend auto-vectorizer to infer vector operations on encountering scalar >> IR nodes for above intrinsics. >> - Test creation using new IR testing framework. >> >> Following are the performance number of a JMH micro included with the patch >> >> Test System: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (Icelake Server) >> >> | | BASELINE AVX2 | WithOpt AVX2 | Gain (opt/baseline) | Baseline AVX3 | >> Withopt AVX3 | Gain (opt/baseline) >> -- | -- | -- | -- | -- | -- | -- | -- >> Benchmark | ARRAYLEN | Score (ops/ms) | Score (ops/ms) | | Score (ops/ms) >> | Score (ops/ms) | >> FpRoundingBenchmark.test_round_double | 1024 | 518.532 | 1364.066 | >> 2.630630318 | 512.908 | 4292.11 | 8.368186887 >> FpRoundingBenchmark.test_round_double | 2048 | 270.137 | 830.986 | >> 3.076165057 | 273.159 | 2459.116 | 9.002507697 >> FpRoundingBenchmark.test_round_float | 1024 | 752.436 | 7780.905 | >> 10.34095259 | 752.49 | 9506.694 | 12.63364829 >> FpRoundingBenchmark.test_round_float | 2048 | 389.499 | 4113.046 | >> 10.55983712 | 389.63 | 4863.673 | 12.48279907 >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional > commit since the last revision: > > 8279508: Adding a test for scalar intrinsification. The JVM currently initializes the x86 mxcsr to round to nearest even, see below in stubGenerator_x86_64.cpp: // Round to nearest (even), 64-bit mode, exceptions masked StubRoutines::x86::_mxcsr_std = 0x1F80; The above works for Math.rint which is specified to be round to nearest even. Please see: https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html : section 4.8.4 The rounding mode needed for Math.round is round to positive infinity which needs a different x86 mxcsr initialization(0x5F80). - PR: https://git.openjdk.java.net/jdk/pull/7094
Re: RFR: 8279508: Auto-vectorize Math.round API [v2]
On Wed, 19 Jan 2022 17:38:25 GMT, Jatin Bhateja wrote: >> Summary of changes: >> - Intrinsify Math.round(float) and Math.round(double) APIs. >> - Extend auto-vectorizer to infer vector operations on encountering scalar >> IR nodes for above intrinsics. >> - Test creation using new IR testing framework. >> >> Following are the performance number of a JMH micro included with the patch >> >> Test System: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (Icelake Server) >> >> | | BASELINE AVX2 | WithOpt AVX2 | Gain (opt/baseline) | Baseline AVX3 | >> Withopt AVX3 | Gain (opt/baseline) >> -- | -- | -- | -- | -- | -- | -- | -- >> Benchmark | ARRAYLEN | Score (ops/ms) | Score (ops/ms) | | Score (ops/ms) >> | Score (ops/ms) | >> FpRoundingBenchmark.test_round_double | 1024 | 518.532 | 1364.066 | >> 2.630630318 | 512.908 | 4292.11 | 8.368186887 >> FpRoundingBenchmark.test_round_double | 2048 | 270.137 | 830.986 | >> 3.076165057 | 273.159 | 2459.116 | 9.002507697 >> FpRoundingBenchmark.test_round_float | 1024 | 752.436 | 7780.905 | >> 10.34095259 | 752.49 | 9506.694 | 12.63364829 >> FpRoundingBenchmark.test_round_float | 2048 | 389.499 | 4113.046 | >> 10.55983712 | 389.63 | 4863.673 | 12.48279907 >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional > commit since the last revision: > > 8279508: Adding a test for scalar intrinsification. The testing for this PR doesn't look adequate to me. I don't see any testing for the values where the behavior of round has been redefined at points in the last decade. See JDK-8010430 and JDK-6430675, both of which have regression tests in the core libs area. Thanks. - PR: https://git.openjdk.java.net/jdk/pull/7094
Re: RFR: 8279508: Auto-vectorize Math.round API [v2]
On Wed, 19 Jan 2022 17:38:25 GMT, Jatin Bhateja wrote: >> Summary of changes: >> - Intrinsify Math.round(float) and Math.round(double) APIs. >> - Extend auto-vectorizer to infer vector operations on encountering scalar >> IR nodes for above intrinsics. >> - Test creation using new IR testing framework. >> >> Following are the performance number of a JMH micro included with the patch >> >> Test System: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (Icelake Server) >> >> | | BASELINE AVX2 | WithOpt AVX2 | Gain (opt/baseline) | Baseline AVX3 | >> Withopt AVX3 | Gain (opt/baseline) >> -- | -- | -- | -- | -- | -- | -- | -- >> Benchmark | ARRAYLEN | Score (ops/ms) | Score (ops/ms) | | Score (ops/ms) >> | Score (ops/ms) | >> FpRoundingBenchmark.test_round_double | 1024 | 518.532 | 1364.066 | >> 2.630630318 | 512.908 | 4292.11 | 8.368186887 >> FpRoundingBenchmark.test_round_double | 2048 | 270.137 | 830.986 | >> 3.076165057 | 273.159 | 2459.116 | 9.002507697 >> FpRoundingBenchmark.test_round_float | 1024 | 752.436 | 7780.905 | >> 10.34095259 | 752.49 | 9506.694 | 12.63364829 >> FpRoundingBenchmark.test_round_float | 2048 | 389.499 | 4113.046 | >> 10.55983712 | 389.63 | 4863.673 | 12.48279907 >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional > commit since the last revision: > > 8279508: Adding a test for scalar intrinsification. There are already `RoundFloat`, `RoundDouble`, and `RoundDoubleMode` nodes defined. Though `RoundFloat` and `RoundDouble` are legacy nodes used only on x86-32, `RoundDoubleMode` supports multiple rounding modes and is amenable to auto-vectorization. What do you think about the following alternative? Reuse `RoundDoubleMode` (with a new rounding mode) and introduce `RoundFloatMode`. Special rounding rules is not the only peculiarity of `Math.round()`. It also converts the result to an integral type. It can be represented as `ConvF2I (RoundFloatMode f #rmode)` / `ConvD2L (RoundDoubleMode d #rmode)`. In scalar case, it can be matched as a single AD instruction. Auto-vectorizer can then convert it to `VectorCastF2X (RoundFloatModeV vf #rmode)` / `VectorCastD2X (RoundDoubleModeV vd #rmode)` and match it in a similar manner. test/hotspot/jtreg/compiler/c2/cr6340864/TestFloatVect.java line 33: > 31: * @run main/othervm -Xbatch -XX:CompileCommand=exclude,*::test() > -Xmx128m -XX:MaxVectorSize=16 compiler.c2.cr6340864.TestFloatVect > 32: * @run main/othervm -Xbatch -XX:CompileCommand=exclude,*::test() > -Xmx128m -XX:MaxVectorSize=32 compiler.c2.cr6340864.TestFloatVect > 33: * @run main/othervm -Xbatch -XX:CompileCommand=exclude,*::test() > -XX:TieredStopAtLevel=2 -Xmx128m -XX:MaxVectorSize=32 > compiler.c2.cr6340864.TestFloatVect What's the purpose of `-XX:TieredStopAtLevel=2` from testing perspective? - PR: https://git.openjdk.java.net/jdk/pull/7094
Re: RFR: 8279508: Auto-vectorize Math.round API [v2]
> Summary of changes: > - Intrinsify Math.round(float) and Math.round(double) APIs. > - Extend auto-vectorizer to infer vector operations on encountering scalar IR > nodes for above intrinsics. > - Test creation using new IR testing framework. > > Following are the performance number of a JMH micro included with the patch > > Test System: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (Icelake Server) > > | | BASELINE AVX2 | WithOpt AVX2 | Gain (opt/baseline) | Baseline AVX3 | > Withopt AVX3 | Gain (opt/baseline) > -- | -- | -- | -- | -- | -- | -- | -- > Benchmark | ARRAYLEN | Score (ops/ms) | Score (ops/ms) | | Score (ops/ms) | > Score (ops/ms) | > FpRoundingBenchmark.test_round_double | 1024 | 518.532 | 1364.066 | > 2.630630318 | 512.908 | 4292.11 | 8.368186887 > FpRoundingBenchmark.test_round_double | 2048 | 270.137 | 830.986 | > 3.076165057 | 273.159 | 2459.116 | 9.002507697 > FpRoundingBenchmark.test_round_float | 1024 | 752.436 | 7780.905 | > 10.34095259 | 752.49 | 9506.694 | 12.63364829 > FpRoundingBenchmark.test_round_float | 2048 | 389.499 | 4113.046 | > 10.55983712 | 389.63 | 4863.673 | 12.48279907 > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: 8279508: Adding a test for scalar intrinsification. - Changes: - all: https://git.openjdk.java.net/jdk/pull/7094/files - new: https://git.openjdk.java.net/jdk/pull/7094/files/0fe01504..575d2935 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk=7094=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk=7094=00-01 Stats: 2 lines in 2 files changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/7094.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7094/head:pull/7094 PR: https://git.openjdk.java.net/jdk/pull/7094