Re: RFR: 8279508: Auto-vectorize Math.round API [v11]
On Wed, 2 Mar 2022 02:44:41 GMT, Jatin Bhateja wrote: >> Summary of changes: >> - Intrinsify Math.round(float) and Math.round(double) APIs. >> - Extend auto-vectorizer to infer vector operations on encountering scalar >> IR nodes for above intrinsics. >> - Test creation using new IR testing framework. >> >> Following are the performance number of a JMH micro included with the patch >> >> Test System: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (Icelake Server) >> >> >> Benchmark | TESTSIZE | Baseline AVX3 (ops/ms) | Withopt AVX3 (ops/ms) | Gain >> ratio | Baseline AVX2 (ops/ms) | Withopt AVX2 (ops/ms) | Gain ratio >> -- | -- | -- | -- | -- | -- | -- | -- >> FpRoundingBenchmark.test_round_double | 1024.00 | 504.15 | 2209.54 | 4.38 | >> 510.36 | 548.39 | 1.07 >> FpRoundingBenchmark.test_round_double | 2048.00 | 293.64 | 1271.98 | 4.33 | >> 293.48 | 274.01 | 0.93 >> FpRoundingBenchmark.test_round_float | 1024.00 | 825.99 | 4754.66 | 5.76 | >> 751.83 | 2274.13 | 3.02 >> FpRoundingBenchmark.test_round_float | 2048.00 | 412.22 | 2490.09 | 6.04 | >> 388.52 | 1334.18 | 3.43 >> >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional > commit since the last revision: > > 8279508: Removing +LogCompilation flag. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4157: > 4155: ExternalAddress mxcsr_std(StubRoutines::x86::addr_mxcsr_std()); > 4156: ldmxcsr(new_mxcsr); > 4157: movl(scratch, 1056964608); Suggestion: movl(scratch, 1056964608); // Raw bits corresponding to floating point value 0.5f. - PR: https://git.openjdk.java.net/jdk/pull/7094
Re: RFR: 8279508: Auto-vectorize Math.round API [v11]
On Sun, 6 Mar 2022 09:31:27 GMT, Andrew Haley wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional >> commit since the last revision: >> >> 8279508: Removing +LogCompilation flag. > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4157: > >> 4155: ExternalAddress mxcsr_std(StubRoutines::x86::addr_mxcsr_std()); >> 4156: ldmxcsr(new_mxcsr); >> 4157: movl(scratch, 1056964608); > > What is 1056964608 ? Raw bits corresponding to floating point value 0.5f. - PR: https://git.openjdk.java.net/jdk/pull/7094
Re: RFR: 8279508: Auto-vectorize Math.round API [v11]
On Wed, 2 Mar 2022 02:44:41 GMT, Jatin Bhateja wrote: >> Summary of changes: >> - Intrinsify Math.round(float) and Math.round(double) APIs. >> - Extend auto-vectorizer to infer vector operations on encountering scalar >> IR nodes for above intrinsics. >> - Test creation using new IR testing framework. >> >> Following are the performance number of a JMH micro included with the patch >> >> Test System: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (Icelake Server) >> >> >> Benchmark | TESTSIZE | Baseline AVX3 (ops/ms) | Withopt AVX3 (ops/ms) | Gain >> ratio | Baseline AVX2 (ops/ms) | Withopt AVX2 (ops/ms) | Gain ratio >> -- | -- | -- | -- | -- | -- | -- | -- >> FpRoundingBenchmark.test_round_double | 1024.00 | 504.15 | 2209.54 | 4.38 | >> 510.36 | 548.39 | 1.07 >> FpRoundingBenchmark.test_round_double | 2048.00 | 293.64 | 1271.98 | 4.33 | >> 293.48 | 274.01 | 0.93 >> FpRoundingBenchmark.test_round_float | 1024.00 | 825.99 | 4754.66 | 5.76 | >> 751.83 | 2274.13 | 3.02 >> FpRoundingBenchmark.test_round_float | 2048.00 | 412.22 | 2490.09 | 6.04 | >> 388.52 | 1334.18 | 3.43 >> >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional > commit since the last revision: > > 8279508: Removing +LogCompilation flag. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4157: > 4155: ExternalAddress mxcsr_std(StubRoutines::x86::addr_mxcsr_std()); > 4156: ldmxcsr(new_mxcsr); > 4157: movl(scratch, 1056964608); What is 1056964608 ? - PR: https://git.openjdk.java.net/jdk/pull/7094
Re: RFR: 8279508: Auto-vectorize Math.round API [v11]
On Wed, 2 Mar 2022 02:44:41 GMT, Jatin Bhateja wrote: >> Summary of changes: >> - Intrinsify Math.round(float) and Math.round(double) APIs. >> - Extend auto-vectorizer to infer vector operations on encountering scalar >> IR nodes for above intrinsics. >> - Test creation using new IR testing framework. >> >> Following are the performance number of a JMH micro included with the patch >> >> Test System: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (Icelake Server) >> >> >> Benchmark | TESTSIZE | Baseline AVX3 (ops/ms) | Withopt AVX3 (ops/ms) | Gain >> ratio | Baseline AVX2 (ops/ms) | Withopt AVX2 (ops/ms) | Gain ratio >> -- | -- | -- | -- | -- | -- | -- | -- >> FpRoundingBenchmark.test_round_double | 1024.00 | 504.15 | 2209.54 | 4.38 | >> 510.36 | 548.39 | 1.07 >> FpRoundingBenchmark.test_round_double | 2048.00 | 293.64 | 1271.98 | 4.33 | >> 293.48 | 274.01 | 0.93 >> FpRoundingBenchmark.test_round_float | 1024.00 | 825.99 | 4754.66 | 5.76 | >> 751.83 | 2274.13 | 3.02 >> FpRoundingBenchmark.test_round_float | 2048.00 | 412.22 | 2490.09 | 6.04 | >> 388.52 | 1334.18 | 3.43 >> >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional > commit since the last revision: > > 8279508: Removing +LogCompilation flag. Marked as reviewed by sviswanathan (Reviewer). - PR: https://git.openjdk.java.net/jdk/pull/7094
Re: RFR: 8279508: Auto-vectorize Math.round API [v11]
> Summary of changes: > - Intrinsify Math.round(float) and Math.round(double) APIs. > - Extend auto-vectorizer to infer vector operations on encountering scalar IR > nodes for above intrinsics. > - Test creation using new IR testing framework. > > Following are the performance number of a JMH micro included with the patch > > Test System: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (Icelake Server) > > > Benchmark | TESTSIZE | Baseline AVX3 (ops/ms) | Withopt AVX3 (ops/ms) | Gain > ratio | Baseline AVX2 (ops/ms) | Withopt AVX2 (ops/ms) | Gain ratio > -- | -- | -- | -- | -- | -- | -- | -- > FpRoundingBenchmark.test_round_double | 1024.00 | 504.15 | 2209.54 | 4.38 | > 510.36 | 548.39 | 1.07 > FpRoundingBenchmark.test_round_double | 2048.00 | 293.64 | 1271.98 | 4.33 | > 293.48 | 274.01 | 0.93 > FpRoundingBenchmark.test_round_float | 1024.00 | 825.99 | 4754.66 | 5.76 | > 751.83 | 2274.13 | 3.02 > FpRoundingBenchmark.test_round_float | 2048.00 | 412.22 | 2490.09 | 6.04 | > 388.52 | 1334.18 | 3.43 > > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: 8279508: Removing +LogCompilation flag. - Changes: - all: https://git.openjdk.java.net/jdk/pull/7094/files - new: https://git.openjdk.java.net/jdk/pull/7094/files/3b90ae53..57b1b13a Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk=7094=10 - incr: https://webrevs.openjdk.java.net/?repo=jdk=7094=09-10 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/7094.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7094/head:pull/7094 PR: https://git.openjdk.java.net/jdk/pull/7094