Re: RFR: 8265891: (ch) InputStream returned by Channels.newInputStream should override transferTo [v13]

2022-02-12 Thread Markus KARG
On Sun, 1 Aug 2021 22:01:33 GMT, Markus KARG  wrote:

>> This PR-*draft* is **work in progress** and an invitation to discuss a 
>> possible solution for issue 
>> [JDK-8265891](https://bugs.openjdk.java.net/browse/JDK-8265891). It is *not 
>> yet* intended for a final review.
>> 
>> As proposed in JDK-8265891, this PR provides an implementation for 
>> `Channels.newInputStream().transferTo()` which provides superior performance 
>> compared to the current implementation. The changes are:
>> * Prevents transfers through the JVM heap as much as possible by offloading 
>> to deeper levels via NIO, hence allowing the operating system to optimize 
>> the transfer.
>> * Uses more JRE heap in the fallback case when no NIO is possible (still 
>> only KiBs, hence mostly negligible even on SBCs) to perform better on modern 
>> hardware / fast I/O devices.
>> 
>> Using JMH I have benchmarked both the original implementation and this 
>> implementation, and (depending on the hardware and use case) performance 
>> approximately doubled. So this PoC proves that it makes sense to finalize 
>> this work and turn it into an actual OpenJDK contribution. 
>> 
>> I encourage everybody to discuss this draft:
>> * Are there valid arguments for *not* doing this change?
>> * Is there a *better* way to improve performance of 
>> `Channels.newInputStream().transferTo()`?
>> * How to go on from here: What is missing to get this ready for an actual 
>> review?
>
> Markus KARG has updated the pull request incrementally with two additional 
> commits since the last revision:
> 
>  - Draft: Eliminated duplicate code using lambda expressions
>  - Draft: Use blocking mode also for target channel

Please keep this PR open. I am still working on it.
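
To make the two bullet points in the quoted draft concrete, here is a minimal sketch of the general idea: offload to `FileChannel.transferTo` when the source is a file channel, otherwise fall back to a copy loop with a buffer of a few KiB. The class `ChannelTransferSketch`, its method names, and the 16 KiB buffer size are assumptions made for illustration; this is not the code from the pull request.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; not the implementation proposed in the PR.
final class ChannelTransferSketch {

    // Copies all remaining bytes from src to dst and returns the byte count.
    // Assumes both channels are in blocking mode.
    static long transfer(ReadableByteChannel src, WritableByteChannel dst) throws IOException {
        if (src instanceof FileChannel) {
            // Fast path: let the channel implementation move the bytes,
            // which may avoid copying through the Java heap.
            FileChannel fc = (FileChannel) src;
            long pos = fc.position();
            long size = fc.size();
            long total = 0;
            while (pos < size) {
                long n = fc.transferTo(pos, size - pos, dst);
                if (n <= 0) {
                    break; // no progress; this sketch simply stops here
                }
                pos += n;
                total += n;
            }
            fc.position(pos);
            return total;
        }
        // Fallback: plain copy loop through a heap buffer (16 KiB is an assumed size).
        ByteBuffer buf = ByteBuffer.allocate(16 * 1024);
        long total = 0;
        while (src.read(buf) >= 0) {
            buf.flip();
            while (buf.hasRemaining()) {
                total += dst.write(buf);
            }
            buf.clear();
        }
        return total;
    }

    // Exercises the fallback path with stream-backed channels.
    static byte[] copyViaFallback(byte[] data) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (ReadableByteChannel in = Channels.newChannel(new ByteArrayInputStream(data));
             WritableByteChannel out = Channels.newChannel(sink)) {
            transfer(in, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    // Exercises the FileChannel path using two temporary files.
    static byte[] copyViaFileChannel(byte[] data) {
        try {
            Path from = Files.createTempFile("transfer-src", ".bin");
            Path to = Files.createTempFile("transfer-dst", ".bin");
            try {
                Files.write(from, data);
                try (FileChannel in = FileChannel.open(from, StandardOpenOption.READ);
                     FileChannel out = FileChannel.open(to, StandardOpenOption.WRITE)) {
                    transfer(in, out);
                }
                return Files.readAllBytes(to);
            } finally {
                Files.deleteIfExists(from);
                Files.deleteIfExists(to);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Whether the fast path actually avoids the heap depends on the platform and on the kind of target channel, so any speedup would have to be measured, for example with JMH as the draft describes.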

-

PR: https://git.openjdk.java.net/jdk/pull/4263


Re: RFR: 8278173: [vectorapi] Add x64 intrinsics for unsigned (zero extended) casts [v2]

2022-02-12 Thread Quan Anh Mai
On Thu, 10 Feb 2022 18:55:29 GMT, Paul Sandoz  wrote:

>> Quan Anh Mai has updated the pull request incrementally with two additional 
>> commits since the last revision:
>> 
>>  - minor rename
>>  - address reviews
>
> Observing the following failures on CPUs with 
> "Intel_R__Xeon_R__Gold_6354_CPU___3.00GHz" with HotSpot flags:
> 
> -XX:+CreateCoredumpOnCrash -ea -esa -XX:CompileThreshold=100 
> -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation
> 
> 
> TestVectorCastAVX512.java:
> 
> Failed IR Rules (1)
> --
> - Method "public static void 
> compiler.vectorapi.reshape.tests.TestVectorCast.testUI256toL512(int[],long[])":
>   * @IR rule 1: "@compiler.lib.ir_framework.IR(failOn={}, applyIf={}, 
> applyIfAnd={}, applyIfOr={}, 
> counts={"(d+(s){2}(VectorUCastI2X.*)+(s){2}===.*)", "1"}, 
> applyIfNot={})"
> - counts: Graph contains wrong number of nodes:
> Regex 1: (\\d+(\\s){2}(VectorUCastI2X.*)+(\\s){2}===.*)
> Expected 1 but found 0 nodes.
> 
> 
> TestVectorCastAVX1.java:
> 
> - Method "public static void 
> compiler.vectorapi.reshape.tests.TestVectorCast.testUB64toS64(byte[],short[])":
>   * @IR rule 1: "@compiler.lib.ir_framework.IR(failOn={}, applyIf={}, 
> applyIfAnd={}, applyIfOr={}, 
> counts={"(d+(s){2}(VectorUCastB2X.*)+(s){2}===.*)", "1"}, 
> applyIfNot={})"
> - counts: Graph contains wrong number of nodes:
> Regex 1: (\\d+(\\s){2}(VectorUCastB2X.*)+(\\s){2}===.*)
> Expected 1 but found 0 nodes.
> 
> - Method "public static void 
> compiler.vectorapi.reshape.tests.TestVectorCast.testUB64toI128(byte[],int[])":
>   * @IR rule 1: "@compiler.lib.ir_framework.IR(failOn={}, applyIf={}, 
> applyIfAnd={}, applyIfOr={}, 
> counts={"(d+(s){2}(VectorUCastB2X.*)+(s){2}===.*)", "1"}, 
> applyIfNot={})"
> - counts: Graph contains wrong number of nodes:
> Regex 1: (\\d+(\\s){2}(VectorUCastB2X.*)+(\\s){2}===.*)
> Expected 1 but found 0 nodes.

@PaulSandoz Thanks a lot for your testing. The cause seems to be that 
`LaneType::asIntegral` was missing the `ForceInline` annotation. I have run the 
reshape test 10 times without any failure, whereas with the previous patch 
there were often 1 or 2.
Thanks.

-

PR: https://git.openjdk.java.net/jdk/pull/7358


Re: RFR: 8278173: [vectorapi] Add x64 intrinsics for unsigned (zero extended) casts [v3]

2022-02-12 Thread Quan Anh Mai
> Hi,
> 
> This patch implements the unsigned upcast intrinsics in x86, which are used 
> in vector lane-wise reinterpreting operations.
> 
> Thank you very much.
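
For readers unfamiliar with the term, an unsigned (zero-extended) upcast widens each lane by filling the new high bits with zeros instead of copying the sign bit. The scalar sketch below models that lane-wise behaviour with ordinary arrays. The class `UnsignedCastSketch` and its method names are hypothetical, and this is neither the Vector API nor the intrinsic code from the patch.

```java
// Hypothetical scalar model of lane-wise unsigned upcasts; not the patch's code.
final class UnsignedCastSketch {

    // byte -> short, zero-extended: 0xFF becomes 255, not -1.
    static short[] zeroExtendBytesToShorts(byte[] src) {
        short[] dst = new short[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = (short) (src[i] & 0xFF);
        }
        return dst;
    }

    // int -> long, zero-extended: 0xFFFFFFFF becomes 4294967295L, not -1L.
    static long[] zeroExtendIntsToLongs(int[] src) {
        long[] dst = new long[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = Integer.toUnsignedLong(src[i]);
        }
        return dst;
    }

    // int -> long, sign-extended (the ordinary Java widening), shown for contrast.
    static long[] signExtendIntsToLongs(int[] src) {
        long[] dst = new long[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = src[i];
        }
        return dst;
    }
}
```

For an input lane of `-1`, the zero-extended int-to-long result is `0xFFFFFFFFL`, while the sign-extended result stays `-1L`.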

Quan Anh Mai has updated the pull request incrementally with one additional 
commit since the last revision:

  missing ForceInline

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/7358/files
  - new: https://git.openjdk.java.net/jdk/pull/7358/files/8028be52..cf78527b

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7358&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7358&range=01-02

  Stats: 10 lines in 2 files changed: 6 ins; 1 del; 3 mod
  Patch: https://git.openjdk.java.net/jdk/pull/7358.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/7358/head:pull/7358

PR: https://git.openjdk.java.net/jdk/pull/7358


Re: RFR: 8279508: Auto-vectorize Math.round API [v3]

2022-02-12 Thread Quan Anh Mai
On Sun, 13 Feb 2022 03:09:43 GMT, Jatin Bhateja  wrote:

>> Summary of changes:
>> - Intrinsify Math.round(float) and Math.round(double) APIs.
>> - Extend auto-vectorizer to infer vector operations on encountering scalar 
>> IR nodes for above intrinsics.
>> - Test creation using new IR testing framework.
>> 
>> Following are the performance numbers of a JMH micro included with the patch:
>> 
>> Test System: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (Icelake Server)
>> 
>> 
>> Benchmark | TESTSIZE | Baseline AVX3 (ops/ms) | Withopt AVX3 (ops/ms) | Gain ratio | Baseline AVX2 (ops/ms) | Withopt AVX2 (ops/ms) | Gain ratio
>> -- | -- | -- | -- | -- | -- | -- | --
>> FpRoundingBenchmark.test_round_double | 1024.00 | 584.99 | 1870.70 | 3.20 | 510.35 | 548.60 | 1.07
>> FpRoundingBenchmark.test_round_double | 2048.00 | 257.17 | 965.33 | 3.75 | 293.60 | 273.15 | 0.93
>> FpRoundingBenchmark.test_round_float | 1024.00 | 825.69 | 3592.54 | 4.35 | 825.32 | 1836.42 | 2.23
>> FpRoundingBenchmark.test_round_float | 2048.00 | 388.55 | 1895.77 | 4.88 | 412.31 | 945.82 | 2.29
>> 
>> 
>> Kindly review and share your feedback.
>> 
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request with a new target base due to a 
> merge or a rebase. The incremental webrev excludes the unrelated changes 
> brought in by the merge/rebase. The pull request contains four additional 
> commits since the last revision:
> 
>  - 8279508: Adding vectorized algorithms to match the semantics of rounding 
> operations.
>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8279508
>  - 8279508: Adding a test for scalar intrinsification.
>  - 8279508: Auto-vectorize Math.round API

Hi, IIRC with EVEX encoding you can embed the rounding control (RC) bits 
directly in the EVEX prefix, removing the need to rely on the global MXCSR 
register. Thanks.

-

PR: https://git.openjdk.java.net/jdk/pull/7094


Re: RFR: 8279508: Auto-vectorize Math.round API [v3]

2022-02-12 Thread Jatin Bhateja
> Summary of changes:
> - Intrinsify Math.round(float) and Math.round(double) APIs.
> - Extend auto-vectorizer to infer vector operations on encountering scalar IR 
> nodes for above intrinsics.
> - Test creation using new IR testing framework.
> 
> Following are the performance numbers of a JMH micro included with the patch:
> 
> Test System: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (Icelake Server)
> 
> 
> Benchmark | TESTSIZE | Baseline AVX3 (ops/ms) | Withopt AVX3 (ops/ms) | Gain ratio | Baseline AVX2 (ops/ms) | Withopt AVX2 (ops/ms) | Gain ratio
> -- | -- | -- | -- | -- | -- | -- | --
> FpRoundingBenchmark.test_round_double | 1024.00 | 584.99 | 1870.70 | 3.20 | 510.35 | 548.60 | 1.07
> FpRoundingBenchmark.test_round_double | 2048.00 | 257.17 | 965.33 | 3.75 | 293.60 | 273.15 | 0.93
> FpRoundingBenchmark.test_round_float | 1024.00 | 825.69 | 3592.54 | 4.35 | 825.32 | 1836.42 | 2.23
> FpRoundingBenchmark.test_round_float | 2048.00 | 388.55 | 1895.77 | 4.88 | 412.31 | 945.82 | 2.29
> 
> 
> Kindly review and share your feedback.
> 
> Best Regards,
> Jatin

Jatin Bhateja has updated the pull request with a new target base due to a 
merge or a rebase. The incremental webrev excludes the unrelated changes 
brought in by the merge/rebase. The pull request contains four additional 
commits since the last revision:

 - 8279508: Adding vectorized algorithms to match the semantics of rounding 
operations.
 - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8279508
 - 8279508: Adding a test for scalar intrinsification.
 - 8279508: Auto-vectorize Math.round API
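
As an illustration of the kind of code the summary above targets, the sketch below shows plain scalar loops that call `Math.round` on every array element. The class `RoundLoopSketch` and its methods are hypothetical and are not taken from `FpRoundingBenchmark`. Whether the JIT actually vectorizes such loops depends on the JDK build, the CPU, and the VM flags.

```java
// Hypothetical loop shapes for illustration; not the benchmark from the patch.
final class RoundLoopSketch {

    // Rounds each float to the nearest int, ties toward positive infinity.
    static void roundFloats(float[] in, int[] out) {
        for (int i = 0; i < in.length; i++) {
            out[i] = Math.round(in[i]);
        }
    }

    // Rounds each double to the nearest long, ties toward positive infinity.
    static void roundDoubles(double[] in, long[] out) {
        for (int i = 0; i < in.length; i++) {
            out[i] = Math.round(in[i]);
        }
    }
}
```

The loops have no cross-iteration dependencies, which is the simple shape an auto-vectorizer can usually reason about.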

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/7094/files
  - new: https://git.openjdk.java.net/jdk/pull/7094/files/575d2935..2dc364fa

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7094&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7094&range=01-02

  Stats: 33695 lines in 1192 files changed: 23243 ins; 5703 del; 4749 mod
  Patch: https://git.openjdk.java.net/jdk/pull/7094.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/7094/head:pull/7094

PR: https://git.openjdk.java.net/jdk/pull/7094


Re: RFR: 8279508: Auto-vectorize Math.round API [v2]

2022-02-12 Thread Jatin Bhateja
On Fri, 21 Jan 2022 00:49:04 GMT, Sandhya Viswanathan 
 wrote:

> The JVM currently initializes the x86 mxcsr to round to nearest even, see 
> below in stubGenerator_x86_64.cpp: // Round to nearest (even), 64-bit mode, 
> exceptions masked StubRoutines::x86::_mxcsr_std = 0x1F80; The above works for 
> Math.rint which is specified to be round to nearest even. Please see: 
> https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html
>  : section 4.8.4
> 
> The rounding mode needed for Math.round is round to positive infinity which 
> needs a different x86 mxcsr initialization(0x5F80).

Hi @sviswa7 ,
As per JLS 17 section 15.4, Java follows the round-to-nearest rounding policy 
for all floating-point operations except conversion to integer and remainder, 
where it uses round toward zero.  

So it may not be feasible to modify the global MXCSR.RC setting. Modifying the 
MXCSR setting just before rounding and resetting it to its original value 
after the operation will also not work, as an out-of-order (OOO) processor is 
free to reorder the LDMXCSR instruction if it is used without any barriers, 
and thus it may also influence other floating-point operations. 
I am pushing an incremental patch which vectorizes the existing rounding APIs 
and shows a significant gain over the existing implementation.
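
The different rounding behaviours mentioned in this exchange can be observed from plain Java. The sketch below contrasts `Math.rint` (ties to even), `Math.round` (ties toward positive infinity), and the cast to `int` (toward zero). The class `RoundingModesSketch` and the helper `naiveRound` are hypothetical names used only for illustration.

```java
// Hypothetical demonstration class; uses only java.lang.Math.
final class RoundingModesSketch {

    // A naive floor(x + 0.5); it differs from Math.round for some inputs
    // because the addition itself is rounded.
    static long naiveRound(double x) {
        return (long) Math.floor(x + 0.5);
    }

    public static void main(String[] args) {
        double[] samples = {2.5, 3.5, -2.5, 2.7, -2.7, 0.49999999999999994};
        for (double x : samples) {
            System.out.println(x
                    + " rint=" + Math.rint(x)
                    + " round=" + Math.round(x)
                    + " cast=" + (int) x
                    + " naive=" + naiveRound(x));
        }
    }
}
```

For `0.49999999999999994`, `Math.round` returns 0 while the naive formula returns 1, which is one reason a vectorized version has to be written carefully to match the specified semantics.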

Best Regards,
Jatin

-

PR: https://git.openjdk.java.net/jdk/pull/7094


Re: RFR: JDK-8281000 ClassLoader::registerAsParallelCapable throws NPE if caller is null

2022-02-12 Thread Alan Bateman
On Fri, 11 Feb 2022 23:25:44 GMT, Brent Christian  wrote:

> On second thought, since this API expects to be called by a class loader, I 
> think throwing `IllegalCallerException` is appropriate to indicate that this 
> method is called by an illegal caller. This will need a CSR due to the spec 
> change.

I think this would work for both the "no caller" case and also the case where 
there is reflection hackery calling this method from somewhere other than a 
ClassLoader. So it would be a small change in behavior from CCE to ICE.
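
A minimal sketch of the check being discussed, reading CCE and ICE as `ClassCastException` and `IllegalCallerException`: one test covers both the missing caller and a caller that is not a `ClassLoader` subclass. The class `CallerCheckSketch` and its methods are hypothetical, and the caller is passed in explicitly here, whereas the JDK determines it internally.

```java
// Hypothetical illustration; not the JDK's ClassLoader implementation.
final class CallerCheckSketch {

    // Rejects a null caller and any caller that is not a ClassLoader subclass.
    static void checkCaller(Class<?> caller) {
        if (caller == null || !ClassLoader.class.isAssignableFrom(caller)) {
            throw new IllegalCallerException(caller + " is not a subclass of ClassLoader");
        }
    }

    // Convenience wrapper that reports whether the caller would be rejected.
    static boolean rejects(Class<?> caller) {
        try {
            checkCaller(caller);
            return false;
        } catch (IllegalCallerException e) {
            return true;
        }
    }
}
```

With this shape, `rejects(null)` and `rejects(String.class)` are both true, while a class such as `java.net.URLClassLoader` passes the check.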

-

PR: https://git.openjdk.java.net/jdk/pull/7448


Re: RFR: 8279283 - BufferedInputStream should override transferTo [v5]

2022-02-12 Thread Markus KARG
On Mon, 27 Dec 2021 13:43:12 GMT, Markus KARG  wrote:

>> Implementation of JDK-8279283
>
> Markus KARG has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   fixed missing BufferedInputStream

Please keep open, still working on it.

-

PR: https://git.openjdk.java.net/jdk/pull/6935