Re: RFR: 8279508: Auto-vectorize Math.round API
On Sun, 16 Jan 2022 02:23:15 GMT, Quan Anh Mai wrote:

> Hi, did we have tests for the scalar intrinsification already? Thanks.

Verification is done against the scalar rounding operation:
https://github.com/openjdk/jdk/pull/7094/files#diff-88b1bad16d68808e6c1224fff7773104924bfdabcb23958c2a3e4e6b06844701R369

Thanks

-

PR: https://git.openjdk.java.net/jdk/pull/7094
Re: RFR: 8279508: Auto-vectorize Math.round API
On Sat, 15 Jan 2022 02:21:38 GMT, Jatin Bhateja wrote:

> Summary of changes:
> - Intrinsify Math.round(float) and Math.round(double) APIs.
> - Extend the auto-vectorizer to infer vector operations on encountering scalar IR nodes for the above intrinsics.
> - Test creation using the new IR testing framework.
>
> Following are the performance numbers of a JMH micro included with the patch.
>
> Test System: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (Icelake Server)
>
> | Benchmark | ARRAYLEN | Baseline AVX2 (ops/ms) | WithOpt AVX2 (ops/ms) | Gain (opt/baseline) | Baseline AVX3 (ops/ms) | WithOpt AVX3 (ops/ms) | Gain (opt/baseline) |
> | -- | -- | -- | -- | -- | -- | -- | -- |
> | FpRoundingBenchmark.test_round_double | 1024 | 518.532 | 1364.066 | 2.630630318 | 512.908 | 4292.110 | 8.368186887 |
> | FpRoundingBenchmark.test_round_double | 2048 | 270.137 | 830.986 | 3.076165057 | 273.159 | 2459.116 | 9.002507697 |
> | FpRoundingBenchmark.test_round_float | 1024 | 752.436 | 7780.905 | 10.34095259 | 752.490 | 9506.694 | 12.63364829 |
> | FpRoundingBenchmark.test_round_float | 2048 | 389.499 | 4113.046 | 10.55983712 | 389.630 | 4863.673 | 12.48279907 |
>
> Kindly review and share your feedback.
>
> Best Regards,
> Jatin

Hi, did we have tests for the scalar intrinsification already? Thanks.

-

PR: https://git.openjdk.java.net/jdk/pull/7094
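For readers following the thread, here is a minimal sketch (not from the patch; class and method names are illustrative) of the kind of scalar loop that the auto-vectorizer can turn into vector rounding once Math.round(float)/Math.round(double) are intrinsified:

```java
import java.util.Arrays;

// Illustrative example of a loop the optimization targets: each scalar
// Math.round call in a counted loop can be replaced by a vector rounding
// node once the intrinsic exists. Names are made up for the demo.
public class RoundLoop {
    static int[] roundAll(float[] src) {
        int[] dst = new int[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = Math.round(src[i]); // scalar IR node, vectorizable after the patch
        }
        return dst;
    }

    public static void main(String[] args) {
        // Math.round rounds half up: 0.5f -> 1 and -1.5f -> -1
        int[] r = roundAll(new float[] { 0.4f, 0.5f, -1.5f, 2.6f });
        System.out.println(Arrays.toString(r)); // prints [0, 1, -1, 3]
    }
}
```

The semantics of the vectorized loop must match the scalar `Math.round` exactly (round half up, not round-to-nearest-even), which is what the verification against the scalar operation mentioned above checks.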
Re: RFR: JDK-8277175: Add a parallel multiply method to BigInteger [v7]
On Thu, 16 Dec 2021 06:07:29 GMT, kabutz wrote:

>> BigInteger currently uses three different algorithms for multiply: the simple quadratic algorithm, then the slightly better Karatsuba once we exceed a bit count, and then Toom-Cook 3 once we go into the several thousands of bits. Since Toom-Cook 3 is a recursive algorithm, it is trivial to parallelize. I have demonstrated this several times in conference talks. In order to be consistent with other classes such as Arrays and Collection, I have added a parallelMultiply() method. Internally we have added a parameter to the private multiply method to indicate whether the calculation should be done in parallel.
>>
>> The performance improvements are as should be expected. Fibonacci of 100 million (using a single-threaded Dijkstra's sum of squares version) completes in 9.2 seconds with parallelMultiply() vs 25.3 seconds with the sequential multiply() method. This is on my 1-8-2 laptop. The final multiplications are with very large numbers, which then benefit from the parallelization of Toom-Cook 3. Fibonacci 100 million is a 347084-bit number.
>>
>> We have also parallelized the private square() method. Internally, the square() method defaults to being sequential.
>>
>> Some benchmark results, run on my 1-6-2 server:
>>
>>     Benchmark                                          (n)  Mode  Cnt      Score      Error  Units
>>     BigIntegerParallelMultiply.multiply            1000000    ss    4     51.707 ±   11.194  ms/op
>>     BigIntegerParallelMultiply.multiply           10000000    ss    4    988.302 ±  235.977  ms/op
>>     BigIntegerParallelMultiply.multiply          100000000    ss    4  24662.063 ± 1123.329  ms/op
>>     BigIntegerParallelMultiply.parallelMultiply    1000000    ss    4     49.337 ±   26.611  ms/op
>>     BigIntegerParallelMultiply.parallelMultiply   10000000    ss    4    527.560 ±  268.903  ms/op
>>     BigIntegerParallelMultiply.parallelMultiply  100000000    ss    4   9076.551 ± 1899.444  ms/op
>>
>> We can see that for larger calculations (fib 100m), the execution is 2.7x faster in parallel. For medium size (fib 10m) it is 1.873x faster. And for small (fib 1m) it is roughly the same. Considering that the Fibonacci algorithm we used was in itself sequential, and that the last 3 calculations would dominate, 2.7x faster should probably be considered quite good on a 1-6-2 machine.
>
> kabutz has updated the pull request incrementally with one additional commit since the last revision:
>
>   Changed depth type to byte to save 8 bytes on each RecursiveSquare instance

test/jdk/java/math/BigInteger/BigIntegerParallelMultiplyTest.java line 64:

> 62:         BigInteger fib = fibonacci(n, BigInteger::multiply);
> 63:         System.out.printf("fibonacci(%d) = %d%n", n, fib);
> 64:     }

I think we can remove this and the loop block at #70-80, since we have the performance test. After that we are good.

-

PR: https://git.openjdk.java.net/jdk/pull/6409
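As background for the fibonacci(n, BigInteger::multiply) call sites discussed above, here is a sketch of a fast-doubling ("sum of squares") Fibonacci with the multiply operation pluggable. The class and helper names are illustrative, not from the test; on a JDK that includes this PR, the same call site can pass BigInteger::parallelMultiply instead:

```java
import java.math.BigInteger;
import java.util.function.BinaryOperator;

// Fast-doubling Fibonacci: all heavy work happens in the supplied
// multiply operator, so swapping in a parallel multiply speeds up
// exactly the large final multiplications the benchmark measures.
public class FibSketch {
    // Returns {F(n), F(n+1)} using the identities
    // F(2k)   = F(k) * (2*F(k+1) - F(k))
    // F(2k+1) = F(k)^2 + F(k+1)^2
    static BigInteger[] fibPair(int n, BinaryOperator<BigInteger> mul) {
        if (n == 0) return new BigInteger[] { BigInteger.ZERO, BigInteger.ONE };
        BigInteger[] p = fibPair(n / 2, mul);
        BigInteger a = p[0], b = p[1];
        BigInteger c = mul.apply(a, b.shiftLeft(1).subtract(a)); // F(2k)
        BigInteger d = mul.apply(a, a).add(mul.apply(b, b));     // F(2k+1)
        return (n % 2 == 0) ? new BigInteger[] { c, d }
                            : new BigInteger[] { d, c.add(d) };
    }

    static BigInteger fibonacci(int n, BinaryOperator<BigInteger> mul) {
        return fibPair(n, mul)[0];
    }
}
```

Because the doubling step is itself sequential, only the individual multiplications parallelize, which is consistent with the 2.7x (rather than 6x) speedup reported on the 1-6-2 machine.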
Re: RFR: 8279842: HTTPS Channel Binding support for Java GSS/Kerberos
On Sat, 15 Jan 2022 00:23:31 GMT, Weijun Wang wrote:

>> Yes. I would like the security team to validate this.
>
> I suggest moving the `TlsChannelBinding` class into `java.base/sun.security.util` since it's not only used by LDAP anymore. It's not even restricted to GSS-API. According to https://www.rfc-editor.org/rfc/rfc5056, "Although inspired by and derived from the GSS-API, the notion of channel binding described herein is not at all limited to use by GSS-API applications".
>
> If so, you might need to modify the types of exceptions thrown in the class, and move the 2 final strings to some other class inside `java.security.sasl`.

Seems like `com.sun.jndi.ldap.sasl.TlsChannelBinding` is now misplaced.

-

PR: https://git.openjdk.java.net/jdk/pull/7065
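For context, the `tls-server-end-point` channel binding type (RFC 5929) used here is just a hash of the server certificate's DER encoding, with the hash derived from the certificate's signature algorithm and MD5/SHA-1 upgraded to SHA-256. A hedged sketch of that computation; the class, method name, and simplified hash selection are illustrative, not the JDK's code:

```java
import java.security.MessageDigest;

// Illustrative computation of RFC 5929 "tls-server-end-point" channel
// binding data: hash the server certificate's DER bytes using the hash
// from its signature algorithm, upgrading MD5/SHA-1 to SHA-256.
// Simplified sketch, not the JDK implementation.
public class ChannelBindingSketch {
    static byte[] tlsServerEndPoint(byte[] certDer, String sigAlgName) throws Exception {
        String alg = sigAlgName.toUpperCase();
        String hash = alg.contains("SHA384") ? "SHA-384"
                    : alg.contains("SHA512") ? "SHA-512"
                    : "SHA-256"; // covers SHA-256 plus the MD5/SHA-1 upgrade
        return MessageDigest.getInstance(hash).digest(certDer);
    }
}
```

Nothing here is GSS-specific, which supports the argument that the class need not live in an LDAP/SASL package.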
Re: RFR: 8265891: (ch) InputStream returned by Channels.newInputStream should override transferTo [v13]
On Sun, 1 Aug 2021 22:01:33 GMT, Markus KARG wrote:

>> This PR *draft* is **work in progress** and an invitation to discuss a possible solution for issue [JDK-8265891](https://bugs.openjdk.java.net/browse/JDK-8265891). It is *not yet* intended for a final review.
>>
>> As proposed in JDK-8265891, this PR provides an implementation of `Channels.newInputStream().transferTo()` with superior performance compared to the current implementation. The changes are:
>> * Prevent transfers through the JVM heap as much as possible by offloading to deeper levels via NIO, hence allowing the operating system to optimize the transfer.
>> * Use more JRE heap in the fallback case when no NIO is possible (still only KiBs, hence mostly negligible even on SBCs) to perform better on modern hardware / fast I/O devices.
>>
>> Using JMH I have benchmarked both the original implementation and this implementation, and (depending on the hardware and use case) performance approximately doubled. So this PoC proves that it makes sense to finalize this work and turn it into an actual OpenJDK contribution.
>>
>> I encourage everybody to discuss this draft:
>> * Are there valid arguments for *not* doing this change?
>> * Is there a *better* way to improve the performance of `Channels.newInputStream().transferTo()`?
>> * How to go on from here: what is missing to get this ready for an actual review?
>
> Markus KARG has updated the pull request incrementally with two additional commits since the last revision:
>
> - Draft: Eliminated duplicate code using lambda expressions
> - Draft: Use blocking mode also for target channel

Please keep this PR open as I am working on several sub-issues currently.

-

PR: https://git.openjdk.java.net/jdk/pull/4263
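For readers unfamiliar with the call path being optimized, here is a small self-contained demo of the `Channels.newInputStream(...).transferTo(...)` pattern the PR targets. The file name and helper are made up for the demo:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

// Demonstrates the exact call path the PR specializes: an InputStream
// adapter over a channel transferring its contents to an OutputStream.
public class TransferDemo {
    static byte[] roundTrip(byte[] data) throws IOException {
        Path src = Files.createTempFile("transfer-demo", ".bin");
        try {
            Files.write(src, data);
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            try (FileChannel ch = FileChannel.open(src)) {
                // This transferTo() is what the PR overrides, so the copy
                // can be pushed down to NIO instead of looping byte arrays
                // through the JVM heap.
                Channels.newInputStream(ch).transferTo(sink);
            }
            return sink.toByteArray();
        } finally {
            Files.delete(src);
        }
    }
}
```

The API contract is unchanged; only the implementation behind `transferTo` is specialized, which is why the change is observable purely as a performance difference.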
Re: RFR: 8279283 - BufferedInputStream should override transferTo [v5]
On Mon, 27 Dec 2021 13:43:12 GMT, Markus KARG wrote:

>> Implementation of JDK-8279283
>
> Markus KARG has updated the pull request incrementally with one additional commit since the last revision:
>
>   fixed missing BufferedInputStream

Good catches, I will look into your comments!

-

PR: https://git.openjdk.java.net/jdk/pull/6935
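A hedged sketch of the idea behind a `BufferedInputStream.transferTo` override: first drain any bytes already sitting in the internal buffer, then delegate the remainder to the underlying stream so its own (possibly optimized) `transferTo` can take over. This toy class is illustrative only, not the JDK's implementation:

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Toy buffered stream showing the transferTo override strategy:
// flush the buffered-but-unread bytes, then hand off to the wrapped
// stream instead of copying everything through the buffer.
class ToyBufferedIn extends FilterInputStream {
    private final byte[] buf = new byte[8192];
    private int pos, count;

    ToyBufferedIn(InputStream in) { super(in); }

    @Override public int read() throws IOException {
        if (pos >= count) {              // buffer empty: refill it
            count = in.read(buf, 0, buf.length);
            pos = 0;
            if (count <= 0) return -1;
        }
        return buf[pos++] & 0xff;
    }

    @Override public long transferTo(OutputStream out) throws IOException {
        int avail = count - pos;         // buffered but not yet read
        if (avail > 0) {
            out.write(buf, pos, avail);  // drain the buffer first
            pos = count;
        }
        return avail + in.transferTo(out); // underlying stream does the rest
    }
}
```

The payoff is the same as in the Channels PR above: once the buffer is drained, the wrapped stream can transfer the bulk of the data without an extra pass through this class's buffer.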