Re: RFR: 8279508: Auto-vectorize Math.round API

2022-01-15 Thread Jatin Bhateja
On Sun, 16 Jan 2022 02:23:15 GMT, Quan Anh Mai  wrote:

> Hi, did we have tests for the scalar intrinsification already? Thanks.

Verification is done against the scalar rounding operation:
https://github.com/openjdk/jdk/pull/7094/files#diff-88b1bad16d68808e6c1224fff7773104924bfdabcb23958c2a3e4e6b06844701R369
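For reference, the scalar behaviour that any vectorized rounding has to reproduce is fixed by the `Math.round` Javadoc. Ties round toward positive infinity, NaN maps to 0, and out-of-range inputs saturate to the minimum or maximum of the result type. Below is a minimal sketch of those edge cases as a table. The class name `RoundEdgeCases` is hypothetical. This is not the test from the PR, only an illustration of the values a vector kernel must match:

```java
class RoundEdgeCases {
    // Inputs and the results Math.round(double) is specified to return (Java 7+ semantics).
    static final double[] DOUBLE_IN = {
        2.5, -2.5, -0.5, 0.49999999999999994,
        Double.NaN, Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY, 1e20
    };
    static final long[] DOUBLE_EXPECTED = {
        3L, -2L, 0L, 0L,
        0L, Long.MAX_VALUE, Long.MIN_VALUE, Long.MAX_VALUE
    };

    // Inputs and the results Math.round(float) is specified to return.
    static final float[] FLOAT_IN = {1.5f, -1.5f, Float.NaN, Float.MAX_VALUE, -Float.MAX_VALUE};
    static final int[] FLOAT_EXPECTED = {2, -1, 0, Integer.MAX_VALUE, Integer.MIN_VALUE};

    // Returns the index of the first double case that disagrees, or -1 if all agree.
    static int firstDoubleMismatch() {
        for (int i = 0; i < DOUBLE_IN.length; i++) {
            if (Math.round(DOUBLE_IN[i]) != DOUBLE_EXPECTED[i]) return i;
        }
        return -1;
    }

    // Returns the index of the first float case that disagrees, or -1 if all agree.
    static int firstFloatMismatch() {
        for (int i = 0; i < FLOAT_IN.length; i++) {
            if (Math.round(FLOAT_IN[i]) != FLOAT_EXPECTED[i]) return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("double mismatch at: " + firstDoubleMismatch());
        System.out.println("float mismatch at: " + firstFloatMismatch());
    }
}
```

On a conforming JDK (7 or later), both lines should print -1. A vectorized path that disagrees on any of these inputs has diverged from the scalar specification.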

Thanks

-

PR: https://git.openjdk.java.net/jdk/pull/7094


Re: RFR: 8279508: Auto-vectorize Math.round API

2022-01-15 Thread Quan Anh Mai
On Sat, 15 Jan 2022 02:21:38 GMT, Jatin Bhateja  wrote:

> Summary of changes:
> - Intrinsify Math.round(float) and Math.round(double) APIs.
> - Extend the auto-vectorizer to infer vector operations on encountering scalar
> IR nodes for the above intrinsics.
> - Create tests using the new IR testing framework.
> 
> The following are the performance numbers of a JMH micro-benchmark included
> with the patch:
> 
> Test System: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (Icelake Server)
> 
> | Benchmark | ARRAYLEN | Baseline AVX2 (ops/ms) | WithOpt AVX2 (ops/ms) | Gain (opt/baseline) | Baseline AVX3 (ops/ms) | WithOpt AVX3 (ops/ms) | Gain (opt/baseline) |
> | -- | -- | -- | -- | -- | -- | -- | -- |
> | FpRoundingBenchmark.test_round_double | 1024 | 518.532 | 1364.066 | 2.630630318 | 512.908 | 4292.11 | 8.368186887 |
> | FpRoundingBenchmark.test_round_double | 2048 | 270.137 | 830.986 | 3.076165057 | 273.159 | 2459.116 | 9.002507697 |
> | FpRoundingBenchmark.test_round_float | 1024 | 752.436 | 7780.905 | 10.34095259 | 752.49 | 9506.694 | 12.63364829 |
> | FpRoundingBenchmark.test_round_float | 2048 | 389.499 | 4113.046 | 10.55983712 | 389.63 | 4863.673 | 12.48279907 |
> 
> Kindly review and share your feedback.
> 
> Best Regards,
> Jatin

Hi, did we have tests for the scalar intrinsification already?
Thanks.
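The auto-vectorization part of the change targets loops that call `Math.round` element by element over arrays, like the `ARRAYLEN`-sized arrays in the JMH numbers quoted above. Below is a rough sketch of that loop shape, plus a differential check against direct scalar calls. The class and method names are hypothetical. Whether C2's SuperWord pass actually vectorizes the loop depends on the JDK build and on the CPU features (the AVX2/AVX3 columns in the table):

```java
import java.util.Random;

class RoundLoopSketch {
    // Counted loop over arrays calling Math.round(double): a candidate for auto-vectorization.
    static void roundAll(double[] in, long[] out) {
        for (int i = 0; i < in.length; i++) {
            out[i] = Math.round(in[i]);
        }
    }

    // Same shape for Math.round(float), which returns int.
    static void roundAll(float[] in, int[] out) {
        for (int i = 0; i < in.length; i++) {
            out[i] = Math.round(in[i]);
        }
    }

    // Fills inputs (every fourth one an exact .5 tie), runs the loops many times so the
    // JIT has a chance to compile them, then compares each element with a direct scalar call.
    static boolean agreesWithScalar(int len, int iterations, long seed) {
        Random r = new Random(seed);
        double[] dIn = new double[len];
        float[] fIn = new float[len];
        for (int i = 0; i < len; i++) {
            double x = (r.nextDouble() - 0.5) * 1e6;
            dIn[i] = (i % 4 == 0) ? Math.floor(x) + 0.5 : x;
            fIn[i] = (float) dIn[i];
        }
        long[] dOut = new long[len];
        int[] fOut = new int[len];
        for (int k = 0; k < iterations; k++) {
            roundAll(dIn, dOut);
            roundAll(fIn, fOut);
        }
        for (int i = 0; i < len; i++) {
            if (dOut[i] != Math.round(dIn[i]) || fOut[i] != Math.round(fIn[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(agreesWithScalar(1024, 20_000, 42L));
    }
}
```

The inputs mix in exact ties on purpose. Those inputs are where a vector rounding sequence is most likely to differ from the scalar "ties toward positive infinity" rule.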

-

PR: https://git.openjdk.java.net/jdk/pull/7094


Re: RFR: JDK-8277175 : Add a parallel multiply method to BigInteger [v7]

2022-01-15 Thread Paul Sandoz
On Thu, 16 Dec 2021 06:07:29 GMT, kabutz  wrote:

>> BigInteger currently uses three different algorithms for multiply: the
>> simple quadratic algorithm, then the slightly better Karatsuba once the
>> operands exceed a threshold bit count, and then Toom-Cook 3 once they reach
>> several thousand bits. Since Toom-Cook 3 is a recursive algorithm, it is
>> trivial to parallelize. I have demonstrated this several times in conference
>> talks. To be consistent with other classes such as Arrays and Collection,
>> I have added a parallelMultiply() method. Internally, we have added a
>> parameter to the private multiply method to indicate whether the calculation
>> should be done in parallel.
>> 
>> The performance improvements are as expected. Computing Fibonacci of 100
>> million (using a single-threaded version based on Dijkstra's sum of squares)
>> completes in 9.2 seconds with parallelMultiply() vs 25.3 seconds with the
>> sequential multiply() method. This is on my 1-8-2 laptop. The final
>> multiplications are with very large numbers, which then benefit from the
>> parallelization of Toom-Cook 3. Fibonacci 100 million is a 347084 bit number.
>> 
>> We have also parallelized the private square() method. Internally, the
>> square() method defaults to sequential execution.
>> 
>> Some benchmark results, run on my 1-6-2 server:
>> 
>> 
>> Benchmark                                         (n)  Mode  Cnt      Score      Error  Units
>> BigIntegerParallelMultiply.multiply           1000000    ss    4     51.707 ±   11.194  ms/op
>> BigIntegerParallelMultiply.multiply          10000000    ss    4    988.302 ±  235.977  ms/op
>> BigIntegerParallelMultiply.multiply         100000000    ss    4  24662.063 ± 1123.329  ms/op
>> BigIntegerParallelMultiply.parallelMultiply   1000000    ss    4     49.337 ±   26.611  ms/op
>> BigIntegerParallelMultiply.parallelMultiply  10000000    ss    4    527.560 ±  268.903  ms/op
>> BigIntegerParallelMultiply.parallelMultiply 100000000    ss    4   9076.551 ± 1899.444  ms/op
>> 
>> 
>> We can see that for larger calculations (fib 100m), the execution is 2.7x
>> faster in parallel. For medium-sized ones (fib 10m) it is 1.873x faster, and
>> for small ones (fib 1m) it is roughly the same. Considering that the
>> Fibonacci algorithm we used is itself sequential, and that the last 3
>> calculations would dominate, 2.7x faster should probably be considered quite
>> good on a 1-6-2 machine.
>
> kabutz has updated the pull request incrementally with one additional commit 
> since the last revision:
> 
>   Changed depth type to byte to save 8 bytes on each RecursiveSquare instance
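The description notes that Toom-Cook 3 is easy to parallelize because it is recursive. The sub-products at each level are independent, so they can be forked as separate tasks. Below is a minimal sketch of that idea with `ForkJoinPool`. To keep it short, it uses Karatsuba (three sub-products per level) instead of Toom-Cook 3. The class name, `BASE_BITS`, and `MAX_PARALLEL_DEPTH` are hypothetical, and this is not the code in the PR:

```java
import java.math.BigInteger;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class ParallelKaratsubaSketch {
    // Hypothetical tuning knobs, not the JDK's thresholds.
    static final int BASE_BITS = 4096;       // at or below this, use BigInteger.multiply directly
    static final int MAX_PARALLEL_DEPTH = 3; // only fork in the top recursion levels

    static BigInteger multiply(BigInteger x, BigInteger y, boolean parallel) {
        // A sequential run starts at the depth cap, so no task is ever forked.
        Mul task = new Mul(x.abs(), y.abs(), parallel ? 0 : MAX_PARALLEL_DEPTH);
        BigInteger product = parallel ? ForkJoinPool.commonPool().invoke(task) : task.compute();
        return x.signum() * y.signum() < 0 ? product.negate() : product;
    }

    // Multiplies two non-negative values, doing one Karatsuba split per task.
    private static final class Mul extends RecursiveTask<BigInteger> {
        private final BigInteger x, y;
        private final int depth;

        Mul(BigInteger x, BigInteger y, int depth) {
            this.x = x;
            this.y = y;
            this.depth = depth;
        }

        @Override
        protected BigInteger compute() {
            int n = Math.max(x.bitLength(), y.bitLength());
            if (n <= BASE_BITS) {
                return x.multiply(y);
            }
            int h = n / 2;
            // Split x = x1 * 2^h + x0 and y = y1 * 2^h + y0.
            BigInteger x1 = x.shiftRight(h), x0 = x.subtract(x1.shiftLeft(h));
            BigInteger y1 = y.shiftRight(h), y0 = y.subtract(y1.shiftLeft(h));

            // The three sub-products are independent, which is what makes forking easy.
            Mul high = new Mul(x1, y1, depth + 1);
            Mul low = new Mul(x0, y0, depth + 1);
            Mul mid = new Mul(x1.add(x0), y1.add(y0), depth + 1);

            BigInteger z2, z0, m;
            if (depth < MAX_PARALLEL_DEPTH) {
                high.fork();
                low.fork();
                m = mid.compute();  // work on one sub-product in this thread
                z0 = low.join();    // join in reverse fork order
                z2 = high.join();
            } else {
                z2 = high.compute();
                z0 = low.compute();
                m = mid.compute();
            }
            // x*y = z2 * 2^(2h) + (m - z2 - z0) * 2^h + z0
            BigInteger z1 = m.subtract(z2).subtract(z0);
            return z2.shiftLeft(2 * h).add(z1.shiftLeft(h)).add(z0);
        }
    }
}
```

Capping the depth at which tasks are forked keeps the number of tasks bounded while the leaves still use the sequential `BigInteger.multiply`. This is similar in spirit to the small `depth` field the last commit keeps on each `RecursiveSquare` instance.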

test/jdk/java/math/BigInteger/BigIntegerParallelMultiplyTest.java line 64:

> 62: BigInteger fib = fibonacci(n, BigInteger::multiply);
> 63: System.out.printf("fibonacci(%d) = %d%n", n, fib);
> 64: }

I think we can remove this and the loop block at #70-80, since we have the 
performance test. After that we are good.
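The test's `fibonacci(n, BigInteger::multiply)` passes the multiplication in as a function, so the same recursion can be run with either method. Below is a sketch of what such a helper might look like, assuming a fast-doubling form of Dijkstra's identities. The class name is hypothetical, and the PR's actual helper may differ:

```java
import java.math.BigInteger;
import java.util.function.BinaryOperator;

class FibonacciSketch {
    // Returns F(n) using fast doubling:
    //   F(2k)   = F(k) * (2*F(k+1) - F(k))
    //   F(2k+1) = F(k)^2 + F(k+1)^2
    // `multiply` is pluggable so the same recursion can use different multiply methods.
    static BigInteger fibonacci(int n, BinaryOperator<BigInteger> multiply) {
        if (n < 0) throw new IllegalArgumentException("n must be >= 0");
        return pair(n, multiply)[0];
    }

    // Returns {F(n), F(n+1)}.
    private static BigInteger[] pair(int n, BinaryOperator<BigInteger> multiply) {
        if (n == 0) return new BigInteger[] {BigInteger.ZERO, BigInteger.ONE};
        BigInteger[] half = pair(n >>> 1, multiply); // {F(k), F(k+1)} with k = n / 2
        BigInteger a = half[0], b = half[1];
        BigInteger c = multiply.apply(a, b.shiftLeft(1).subtract(a));   // F(2k)
        BigInteger d = multiply.apply(a, a).add(multiply.apply(b, b)); // F(2k+1)
        return (n & 1) == 0
                ? new BigInteger[] {c, d}
                : new BigInteger[] {d, c.add(d)};
    }
}
```

On JDK 19 or later, where `BigInteger.parallelMultiply` is available, the parallel variant would be `fibonacci(n, BigInteger::parallelMultiply)`. Only the last few multiplications involve huge operands, which matches the observation that they dominate the run time.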

-

PR: https://git.openjdk.java.net/jdk/pull/6409


Re: RFR: 8279842: HTTPS Channel Binding support for Java GSS/Kerberos

2022-01-15 Thread Michael Osipov
On Sat, 15 Jan 2022 00:23:31 GMT, Weijun Wang  wrote:

>> Yes. I would like the security team to validate this.
>
> I suggest moving the `TlsChannelBinding` class into 
> `java.base/sun.security.util` since it's no longer used only by LDAP. It's
> not even restricted to GSS-API. According to
> https://www.rfc-editor.org/rfc/rfc5056, "Although inspired by and derived 
> from the GSS-API, the notion of channel binding described herein is not at 
> all limited to use by GSS-API applications".
> 
> If so, you might need to modify the types of exceptions thrown in the class, 
> and move the 2 final strings to some other class inside `java.security.sasl`.

Seems like `com.sun.jndi.ldap.sasl.TlsChannelBinding` is not misplaced
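For context, RFC 5929 defines the `tls-server-end-point` binding as a hash of the server's DER-encoded certificate. The hash function is taken from the certificate's signature algorithm, except that MD5 and SHA-1 are upgraded to SHA-256. Below is a minimal sketch of that rule, with hypothetical class and method names. The `tls-server-end-point:` prefix on the output is an assumption about how the binding data is packaged. Check it against the existing `TlsChannelBinding` before relying on it:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

class EndPointBindingSketch {
    static final String TYPE = "tls-server-end-point";

    // RFC 5929 rule: use the certificate's signature hash, but MD5 and SHA-1 become SHA-256.
    // Algorithms without a single "<hash>with<key>" hash (e.g. Ed25519) are rejected.
    static String digestAlgorithmFor(String sigAlgName) {
        String upper = sigAlgName.toUpperCase(Locale.ROOT);
        int with = upper.indexOf("WITH");
        if (with <= 0) {
            throw new IllegalArgumentException("Unsupported signature algorithm: " + sigAlgName);
        }
        switch (upper.substring(0, with)) {
            case "MD5":
            case "SHA1":
            case "SHA256": return "SHA-256";
            case "SHA224": return "SHA-224";
            case "SHA384": return "SHA-384";
            case "SHA512": return "SHA-512";
            default:
                throw new IllegalArgumentException("Unsupported signature algorithm: " + sigAlgName);
        }
    }

    // Returns "tls-server-end-point:" followed by the certificate hash (the prefix is an assumption).
    static byte[] bindingData(byte[] derEncodedCert, String sigAlgName) throws NoSuchAlgorithmException {
        byte[] hash = MessageDigest.getInstance(digestAlgorithmFor(sigAlgName)).digest(derEncodedCert);
        byte[] prefix = (TYPE + ":").getBytes(StandardCharsets.US_ASCII);
        byte[] out = new byte[prefix.length + hash.length];
        System.arraycopy(prefix, 0, out, 0, prefix.length);
        System.arraycopy(hash, 0, out, prefix.length, hash.length);
        return out;
    }
}
```

A caller would pass `cert.getEncoded()` and `cert.getSigAlgName()` from the server's `X509Certificate`. Nothing in the sketch depends on LDAP or JNDI.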

-

PR: https://git.openjdk.java.net/jdk/pull/7065


Re: RFR: 8265891: (ch) InputStream returned by Channels.newInputStream should override transferTo [v13]

2022-01-15 Thread Markus KARG
On Sun, 1 Aug 2021 22:01:33 GMT, Markus KARG  wrote:

>> This PR-*draft* is **work in progress** and an invitation to discuss a 
>> possible solution for issue 
>> [JDK-8265891](https://bugs.openjdk.java.net/browse/JDK-8265891). It is *not 
>> yet* intended for a final review.
>> 
>> As proposed in JDK-8265891, this PR provides an implementation for
>> `Channels.newInputStream().transferTo()` which provides superior performance
>> compared to the current implementation. The changes are:
>> * Prevents transfers through the JVM heap as much as possible by offloading
>> to deeper levels via NIO, hence allowing the operating system to optimize
>> the transfer.
>> * Uses more JRE heap in the fallback case when no NIO is possible (still
>> only KiBs, hence mostly negligible even on SBCs) to perform better on modern
>> hardware / fast I/O devices.
>> 
>> Using JMH, I have benchmarked both the original implementation and this
>> implementation, and (depending on the hardware and use case) performance
>> roughly doubled. So this PoC proves that it makes sense to finalize this work
>> and turn it into an actual OpenJDK contribution.
>> 
>> I encourage everybody to discuss this draft:
>> * Are there valid arguments for *not* doing this change?
>> * Is there a *better* way to improve performance of 
>> `Channels.newInputStream().transferTo()`?
>> * How to go on from here: What is missing to get this ready for an actual 
>> review?
>
> Markus KARG has updated the pull request incrementally with two additional 
> commits since the last revision:
> 
>  - Draft: Eliminated duplicate code using lambda expressions
>  - Draft: Use blocking mode also for target channel

Please keep this PR open as I am working on several sub-issues currently.
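The two bullets in the description can be sketched together: offload to NIO when possible, otherwise fall back to a modestly larger heap buffer. This is a hypothetical helper, not the PR's code. It assumes blocking channels and uses `FileChannel.transferTo` as the NIO fast path:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.WritableByteChannel;

class ChannelTransferSketch {
    // Hypothetical fallback buffer size ("still only KiBs").
    static final int BUFFER_SIZE = 16 * 1024;

    // Copies everything remaining in src to dst and returns the number of bytes copied.
    static long transfer(ReadableByteChannel src, WritableByteChannel dst) throws IOException {
        if (src instanceof FileChannel) {
            // Fast path: let the OS move the bytes without going through the Java heap where it can.
            FileChannel fc = (FileChannel) src;
            long pos = fc.position();
            long size = fc.size();
            long total = 0;
            while (pos < size) {
                long n = fc.transferTo(pos, size - pos, dst);
                if (n <= 0) break; // nothing more could be transferred (e.g. the file shrank)
                pos += n;
                total += n;
            }
            fc.position(pos); // transferTo does not move the position; a consumed stream should
            return total;
        }
        // Fallback: plain read/write loop through a heap buffer.
        ByteBuffer buf = ByteBuffer.allocate(BUFFER_SIZE);
        long total = 0;
        while (src.read(buf) >= 0) {
            buf.flip();
            while (buf.hasRemaining()) {
                total += dst.write(buf);
            }
            buf.clear();
        }
        return total;
    }
}
```

Restoring the source position after the loop matters because `FileChannel.transferTo` leaves the channel's position unchanged, while an input stream that has handed over its bytes is expected to be at the end.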

-

PR: https://git.openjdk.java.net/jdk/pull/4263


Re: RFR: 8279283 - BufferedInputStream should override transferTo [v5]

2022-01-15 Thread Markus KARG
On Mon, 27 Dec 2021 13:43:12 GMT, Markus KARG  wrote:

>> Implementation of JDK-8279283
>
> Markus KARG has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   fixed missing BufferedInputStream

Good catches, I will look into your comments!
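One plausible shape for the override that the JDK-8279283 title asks for works in two steps. First, write out whatever is already buffered. Then hand the rest to the wrapped stream's own `transferTo` instead of copying everything through the buffer. Below is a rough sketch of this as a subclass with a hypothetical name. It falls back to the default path when a mark is set, because the buffer then has to keep retaining data for `reset()`:

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class TransferringBufferedInputStream extends BufferedInputStream {
    TransferringBufferedInputStream(InputStream in) {
        super(in);
    }

    TransferringBufferedInputStream(InputStream in, int size) {
        super(in, size);
    }

    @Override
    public synchronized long transferTo(OutputStream out) throws IOException {
        // With an active mark, reads must keep going through the buffer; use the default path.
        if (markpos >= 0) {
            return super.transferTo(out);
        }
        byte[] b = buf;
        InputStream src = in;
        if (b == null || src == null) {
            throw new IOException("Stream closed");
        }
        // First drain the bytes that are already buffered...
        int buffered = count - pos;
        if (buffered > 0) {
            out.write(b, pos, buffered);
            pos = count;
        }
        // ...then let the underlying stream transfer the rest with its own (possibly faster) method.
        return buffered + src.transferTo(out);
    }
}
```

The gain comes from the delegation. If the wrapped stream has an efficient `transferTo` (for example one backed by a channel), the bulk of the data never passes through the `BufferedInputStream` buffer.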

-

PR: https://git.openjdk.java.net/jdk/pull/6935