Re: RFR: 8279508: Auto-vectorize Math.round API [v18]
On Wed, 23 Mar 2022 06:55:50 GMT, Tobias Hartmann wrote:

>> Jatin Bhateja has updated the pull request with a new target base due to a
>> merge or a rebase. The pull request now contains 22 commits:
>>
>>  - 8279508: Using an explicit scratch register since rscratch1 is bound to r10 and its usage is transparent to compiler.
>>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8279508
>>  - 8279508: Windows build failure fix.
>>  - 8279508: Styling comments resolved.
>>  - 8279508: Creating separate test for round double under feature check.
>>  - 8279508: Reducing the invocation count and compile thresholds for RoundTests.java.
>>  - 8279508: Review comments resolution.
>>  - 8279508: Preventing domain switch-over penalty for Math.round(float) and constraining unrolling to prevent code bloating.
>>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8279508
>>  - 8279508: Removing +LogCompilation flag.
>>  - ... and 12 more: https://git.openjdk.java.net/jdk/compare/ff0b0927...c17440cf
>
> All tests passed.

Hi @TobiHartmann, thanks for confirming.

Hi @jddarcy, @theRealAph, kindly let me know if it's good to integrate this.

-------------

PR: https://git.openjdk.java.net/jdk/pull/7094
Re: RFR: 8279508: Auto-vectorize Math.round API [v18]
On Fri, 18 Mar 2022 20:19:08 GMT, Jatin Bhateja wrote:

>> Summary of changes:
>> - Intrinsify Math.round(float) and Math.round(double) APIs.
>> - Extend auto-vectorizer to infer vector operations on encountering scalar IR nodes for the above intrinsics.
>> - Test creation using new IR testing framework.
>>
>> Following are the performance numbers of a JMH micro included with the patch:
>>
>> Test System: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (Icelake Server)
>>
>> Benchmark | TESTSIZE | Baseline AVX3 (ops/ms) | Withopt AVX3 (ops/ms) | Gain ratio | Baseline AVX2 (ops/ms) | Withopt AVX2 (ops/ms) | Gain ratio
>> -- | -- | -- | -- | -- | -- | -- | --
>> FpRoundingBenchmark.test_round_double | 1024.00 | 504.15 | 2209.54 | 4.38 | 510.36 | 548.39 | 1.07
>> FpRoundingBenchmark.test_round_double | 2048.00 | 293.64 | 1271.98 | 4.33 | 293.48 | 274.01 | 0.93
>> FpRoundingBenchmark.test_round_float | 1024.00 | 825.99 | 4754.66 | 5.76 | 751.83 | 2274.13 | 3.02
>> FpRoundingBenchmark.test_round_float | 2048.00 | 412.22 | 2490.09 | 6.04 | 388.52 | 1334.18 | 3.43
>>
>> Kindly review and share your feedback.
>>
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request with a new target base due to a
> merge or a rebase. The pull request now contains 22 commits:
>
>  - 8279508: Using an explicit scratch register since rscratch1 is bound to r10 and its usage is transparent to compiler.
>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8279508
>  - 8279508: Windows build failure fix.
>  - 8279508: Styling comments resolved.
>  - 8279508: Creating separate test for round double under feature check.
>  - 8279508: Reducing the invocation count and compile thresholds for RoundTests.java.
>  - 8279508: Review comments resolution.
>  - 8279508: Preventing domain switch-over penalty for Math.round(float) and constraining unrolling to prevent code bloating.
>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8279508
>  - 8279508: Removing +LogCompilation flag.
>  - ... and 12 more: https://git.openjdk.java.net/jdk/compare/ff0b0927...c17440cf

All tests passed.

-------------

PR: https://git.openjdk.java.net/jdk/pull/7094
Re: RFR: 8279508: Auto-vectorize Math.round API [v18]
On Fri, 18 Mar 2022 20:19:08 GMT, Jatin Bhateja wrote:

>> Summary of changes:
>> - Intrinsify Math.round(float) and Math.round(double) APIs.
>> - Extend auto-vectorizer to infer vector operations on encountering scalar IR nodes for the above intrinsics.
>> - Test creation using new IR testing framework.
>>
>> Following are the performance numbers of a JMH micro included with the patch:
>>
>> Test System: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (Icelake Server)
>>
>> Benchmark | TESTSIZE | Baseline AVX3 (ops/ms) | Withopt AVX3 (ops/ms) | Gain ratio | Baseline AVX2 (ops/ms) | Withopt AVX2 (ops/ms) | Gain ratio
>> -- | -- | -- | -- | -- | -- | -- | --
>> FpRoundingBenchmark.test_round_double | 1024.00 | 504.15 | 2209.54 | 4.38 | 510.36 | 548.39 | 1.07
>> FpRoundingBenchmark.test_round_double | 2048.00 | 293.64 | 1271.98 | 4.33 | 293.48 | 274.01 | 0.93
>> FpRoundingBenchmark.test_round_float | 1024.00 | 825.99 | 4754.66 | 5.76 | 751.83 | 2274.13 | 3.02
>> FpRoundingBenchmark.test_round_float | 2048.00 | 412.22 | 2490.09 | 6.04 | 388.52 | 1334.18 | 3.43
>>
>> Kindly review and share your feedback.
>>
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request with a new target base due to a
> merge or a rebase. The pull request now contains 22 commits:
>
>  - 8279508: Using an explicit scratch register since rscratch1 is bound to r10 and its usage is transparent to compiler.
>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8279508
>  - 8279508: Windows build failure fix.
>  - 8279508: Styling comments resolved.
>  - 8279508: Creating separate test for round double under feature check.
>  - 8279508: Reducing the invocation count and compile thresholds for RoundTests.java.
>  - 8279508: Review comments resolution.
>  - 8279508: Preventing domain switch-over penalty for Math.round(float) and constraining unrolling to prevent code bloating.
>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8279508
>  - 8279508: Removing +LogCompilation flag.
>  - ... and 12 more: https://git.openjdk.java.net/jdk/compare/ff0b0927...c17440cf

Sure, I'll re-run testing and report back.

-------------

PR: https://git.openjdk.java.net/jdk/pull/7094
Re: RFR: 8279508: Auto-vectorize Math.round API [v18]
> Summary of changes:
> - Intrinsify Math.round(float) and Math.round(double) APIs.
> - Extend auto-vectorizer to infer vector operations on encountering scalar IR nodes for the above intrinsics.
> - Test creation using new IR testing framework.
>
> Following are the performance numbers of a JMH micro included with the patch:
>
> Test System: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (Icelake Server)
>
> Benchmark | TESTSIZE | Baseline AVX3 (ops/ms) | Withopt AVX3 (ops/ms) | Gain ratio | Baseline AVX2 (ops/ms) | Withopt AVX2 (ops/ms) | Gain ratio
> -- | -- | -- | -- | -- | -- | -- | --
> FpRoundingBenchmark.test_round_double | 1024.00 | 504.15 | 2209.54 | 4.38 | 510.36 | 548.39 | 1.07
> FpRoundingBenchmark.test_round_double | 2048.00 | 293.64 | 1271.98 | 4.33 | 293.48 | 274.01 | 0.93
> FpRoundingBenchmark.test_round_float | 1024.00 | 825.99 | 4754.66 | 5.76 | 751.83 | 2274.13 | 3.02
> FpRoundingBenchmark.test_round_float | 2048.00 | 412.22 | 2490.09 | 6.04 | 388.52 | 1334.18 | 3.43
>
> Kindly review and share your feedback.
>
> Best Regards,
> Jatin

Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 22 commits:

 - 8279508: Using an explicit scratch register since rscratch1 is bound to r10 and its usage is transparent to compiler.
 - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8279508
 - 8279508: Windows build failure fix.
 - 8279508: Styling comments resolved.
 - 8279508: Creating separate test for round double under feature check.
 - 8279508: Reducing the invocation count and compile thresholds for RoundTests.java.
 - 8279508: Review comments resolution.
 - 8279508: Preventing domain switch-over penalty for Math.round(float) and constraining unrolling to prevent code bloating.
 - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8279508
 - 8279508: Removing +LogCompilation flag.
 - ... and 12 more: https://git.openjdk.java.net/jdk/compare/ff0b0927...c17440cf

-------------

Changes: https://git.openjdk.java.net/jdk/pull/7094/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=7094&range=17
  Stats: 800 lines in 25 files changed: 707 ins; 30 del; 63 mod
  Patch: https://git.openjdk.java.net/jdk/pull/7094.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/7094/head:pull/7094

PR: https://git.openjdk.java.net/jdk/pull/7094
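
For readers unfamiliar with the change, the summary above refers to simple counted loops that call Math.round on each array element. The following is a minimal sketch of that loop shape, not code from the patch: the class name RoundLoopSketch and the roundAll helpers are hypothetical. Whether a given JVM actually emits vector instructions for such a loop depends on the CPU features and JIT settings in use. The main method only illustrates the documented Math.round edge cases (NaN, infinities, ties) that any vectorized form would have to preserve.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of the JDK or of the patch.
class RoundLoopSketch {

    // Element-wise Math.round(float) -> int over an array.
    // A simple counted loop like this is the shape an auto-vectorizer looks for.
    static int[] roundAll(float[] in) {
        int[] out = new int[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = Math.round(in[i]);
        }
        return out;
    }

    // Element-wise Math.round(double) -> long over an array.
    static long[] roundAll(double[] in) {
        long[] out = new long[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = Math.round(in[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        // Ties round toward positive infinity, NaN maps to 0,
        // and infinities saturate to the integer range.
        float[] f = {2.5f, -2.5f, Float.NaN,
                     Float.POSITIVE_INFINITY, Float.NEGATIVE_INFINITY};
        System.out.println(Arrays.toString(roundAll(f)));
        // prints [3, -2, 0, 2147483647, -2147483648]
    }
}
```

The results are the same whether the loop runs in the interpreter, as scalar compiled code, or as vectorized code, so a plain comparison against per-element Math.round results is one way to sanity-check a build.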