Re: RFR: 8265891: (ch) InputStream returned by Channels.newInputStream should override transferTo [v13]
On Sun, 1 Aug 2021 22:01:33 GMT, Markus KARG wrote:

>> This PR-*draft* is **work in progress** and an invitation to discuss a possible solution for issue [JDK-8265891](https://bugs.openjdk.java.net/browse/JDK-8265891). It is *not yet* intended for a final review.
>>
>> As proposed in JDK-8265891, this PR provides an implementation for `Channels.newInputStream().transferTo()` which provides superior performance compared to the current implementation. The changes are:
>> * Prevents transfers through the JVM heap as much as possible by offloading to deeper levels via NIO, hence allowing the operating system to optimize the transfer.
>> * Uses more Java heap in the fallback case when NIO is not possible (still only KiBs, hence mostly negligible even on SBCs) to perform better on modern hardware and fast I/O devices.
>>
>> Using JMH I have benchmarked both the original implementation and this implementation, and (depending on the hardware and use case) performance roughly doubled. So this PoC proves that it makes sense to finalize this work and turn it into an actual OpenJDK contribution.
>>
>> I encourage everybody to discuss this draft:
>> * Are there valid arguments for *not* doing this change?
>> * Is there a *better* way to improve performance of `Channels.newInputStream().transferTo()`?
>> * How to go on from here: What is missing to get this ready for an actual review?
>
> Markus KARG has updated the pull request incrementally with two additional commits since the last revision:
>
>  - Draft: Eliminated duplicate code using lambda expressions
>  - Draft: Use blocking mode also for target channel

Please keep this PR open. I am still working on it.

PR: https://git.openjdk.java.net/jdk/pull/4263
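The two bullet points above (offload to NIO when possible, otherwise copy through a larger heap buffer) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in the PR: `ChannelBackedInputStream` is a hypothetical class, the wrapped channel is assumed to be in blocking mode, and the 16 KiB fallback buffer size is an arbitrary choice for the sketch.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.WritableByteChannel;

// Hypothetical stream over a channel; NOT the JDK's internal implementation.
// Assumes the wrapped channel is in blocking mode.
class ChannelBackedInputStream extends InputStream {
    private final ReadableByteChannel ch;

    ChannelBackedInputStream(ReadableByteChannel ch) {
        this.ch = ch;
    }

    @Override
    public int read() throws IOException {
        byte[] one = new byte[1];
        int n;
        do {
            n = read(one, 0, 1);
        } while (n == 0); // a blocking channel should not return 0; guard anyway
        return n < 0 ? -1 : one[0] & 0xFF;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        return ch.read(ByteBuffer.wrap(b, off, len));
    }

    @Override
    public long transferTo(OutputStream out) throws IOException {
        if (ch instanceof FileChannel) {
            FileChannel src = (FileChannel) ch;
            // Use the target's own FileChannel when there is one, so the OS may copy
            // without going through the Java heap; otherwise wrap the stream.
            WritableByteChannel target = (out instanceof FileOutputStream)
                    ? ((FileOutputStream) out).getChannel()
                    : Channels.newChannel(out);
            long start = src.position();
            long size = src.size();
            long total = 0;
            while (start + total < size) {
                long n = src.transferTo(start + total, size - start - total, target);
                if (n <= 0) {
                    break;
                }
                total += n;
            }
            // FileChannel.transferTo(position, ...) does not move the channel's position.
            src.position(start + total);
            return total;
        }
        // Fallback: copy through a heap buffer (size chosen arbitrarily for this sketch).
        byte[] buf = new byte[16 * 1024];
        long total = 0;
        int n;
        while ((n = read(buf, 0, buf.length)) >= 0) {
            out.write(buf, 0, n);
            total += n;
        }
        return total;
    }
}
```

Whether the operating system actually avoids user-space copies depends on the platform and on the concrete source and target channel types; the sketch only shows where such an offload could hook in.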
Re: RFR: 8278173: [vectorapi] Add x64 intrinsics for unsigned (zero extended) casts [v2]
On Thu, 10 Feb 2022 18:55:29 GMT, Paul Sandoz wrote:

>> Quan Anh Mai has updated the pull request incrementally with two additional commits since the last revision:
>>
>>  - minor rename
>>  - address reviews
>
> Observing the following failures on CPUs with "Intel_R__Xeon_R__Gold_6354_CPU___3.00GHz" with HotSpot flags:
>
>     -XX:+CreateCoredumpOnCrash -ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation
>
> TestVectorCastAVX512.java:
>
>     Failed IR Rules (1)
>     --
>     - Method "public static void compiler.vectorapi.reshape.tests.TestVectorCast.testUI256toL512(int[],long[])":
>       * @IR rule 1: "@compiler.lib.ir_framework.IR(failOn={}, applyIf={}, applyIfAnd={}, applyIfOr={}, counts={"(\\d+(\\s){2}(VectorUCastI2X.*)+(\\s){2}===.*)", "1"}, applyIfNot={})"
>         - counts: Graph contains wrong number of nodes:
>           Regex 1: (\\d+(\\s){2}(VectorUCastI2X.*)+(\\s){2}===.*)
>           Expected 1 but found 0 nodes.
>
> TestVectorCastAVX1.java:
>
>     - Method "public static void compiler.vectorapi.reshape.tests.TestVectorCast.testUB64toS64(byte[],short[])":
>       * @IR rule 1: "@compiler.lib.ir_framework.IR(failOn={}, applyIf={}, applyIfAnd={}, applyIfOr={}, counts={"(\\d+(\\s){2}(VectorUCastB2X.*)+(\\s){2}===.*)", "1"}, applyIfNot={})"
>         - counts: Graph contains wrong number of nodes:
>           Regex 1: (\\d+(\\s){2}(VectorUCastB2X.*)+(\\s){2}===.*)
>           Expected 1 but found 0 nodes.
>
>     - Method "public static void compiler.vectorapi.reshape.tests.TestVectorCast.testUB64toI128(byte[],int[])":
>       * @IR rule 1: "@compiler.lib.ir_framework.IR(failOn={}, applyIf={}, applyIfAnd={}, applyIfOr={}, counts={"(\\d+(\\s){2}(VectorUCastB2X.*)+(\\s){2}===.*)", "1"}, applyIfNot={})"
>         - counts: Graph contains wrong number of nodes:
>           Regex 1: (\\d+(\\s){2}(VectorUCastB2X.*)+(\\s){2}===.*)
>           Expected 1 but found 0 nodes.

@PaulSandoz Thanks a lot for testing. The cause appears to be that `LaneType::asIntegral` is missing the `ForceInline` annotation. I have run the reshape tests 10 times without any failure, whereas with the previous patch there were often one or two failures.

Thanks.

PR: https://git.openjdk.java.net/jdk/pull/7358
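The "Expected 1 but found 0 nodes" lines mean that no IR node matched the rule's regex. As a rough illustration only, the sketch below counts dump lines matching the same regex; the dump lines are invented for this example, and the real IR framework's matching logic may differ from this simplified one-match-per-line model. `IrCountSketch` is a hypothetical name.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified model of a "counts" IR rule: count dump lines matching a node regex.
// Hypothetical helper; not the compiler.lib.ir_framework implementation.
class IrCountSketch {
    // Same regex as in the failure report, written as a Java string literal.
    static final Pattern UCAST_I2X =
            Pattern.compile("(\\d+(\\s){2}(VectorUCastI2X.*)+(\\s){2}===.*)");

    static int countMatches(Pattern p, List<String> dumpLines) {
        int count = 0;
        for (String line : dumpLines) {
            Matcher m = p.matcher(line);
            if (m.find()) { // node id, two spaces, node name, two spaces, "==="
                count++;
            }
        }
        return count;
    }
}
```

With a made-up line such as `" 42  VectorUCastI2X  === _ 17  [[ 50 ]]"` the count is 1; if the cast is not intrinsified (for example, because a helper on the path was not inlined), no such node appears and the count is 0, matching the reported failure.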
Re: RFR: 8278173: [vectorapi] Add x64 intrinsics for unsigned (zero extended) casts [v3]
> Hi,
>
> This patch implements the unsigned upcast intrinsics in x86, which are used in vector lane-wise reinterpreting operations.
>
> Thank you very much.

Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision:

  missing ForceInline

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/7358/files
  - new: https://git.openjdk.java.net/jdk/pull/7358/files/8028be52..cf78527b

Webrevs:
  - full: https://webrevs.openjdk.java.net/?repo=jdk=7358=02
  - incr: https://webrevs.openjdk.java.net/?repo=jdk=7358=01-02

Stats: 10 lines in 2 files changed: 6 ins; 1 del; 3 mod
Patch: https://git.openjdk.java.net/jdk/pull/7358.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/7358/head:pull/7358

PR: https://git.openjdk.java.net/jdk/pull/7358
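To make "unsigned (zero extended) upcast" concrete, the scalar sketch below shows what a byte-to-int lane cast computes per lane, compared with the ordinary signed upcast. It is an illustration only, not the Vector API or the x86 intrinsic; on x86 such casts presumably map to zero-extending moves of the `pmovzx` family, but that is an assumption here. `ZeroExtendSketch` is a hypothetical name.

```java
// Scalar reference for signed vs. unsigned (zero-extending) byte -> int lane casts.
// Hypothetical illustration; not the Vector API or the intrinsic.
class ZeroExtendSketch {
    // Signed upcast: the sign bit is replicated, so (byte) 0xF0 becomes -16.
    static int[] signExtend(byte[] src) {
        int[] dst = new int[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = src[i];
        }
        return dst;
    }

    // Unsigned upcast: the high bits are filled with zeros, so (byte) 0xF0 becomes 240.
    static int[] zeroExtend(byte[] src) {
        int[] dst = new int[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = Byte.toUnsignedInt(src[i]); // equivalent to src[i] & 0xFF
        }
        return dst;
    }
}
```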
Re: RFR: 8279508: Auto-vectorize Math.round API [v3]
On Sun, 13 Feb 2022 03:09:43 GMT, Jatin Bhateja wrote:

>> Summary of changes:
>> - Intrinsify Math.round(float) and Math.round(double) APIs.
>> - Extend the auto-vectorizer to infer vector operations on encountering scalar IR nodes for the above intrinsics.
>> - Tests created using the new IR testing framework.
>>
>> The following are the performance numbers of a JMH micro-benchmark included with the patch:
>>
>> Test System: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (Icelake Server)
>>
>> Benchmark | TESTSIZE | Baseline AVX3 (ops/ms) | With opt AVX3 (ops/ms) | Gain ratio | Baseline AVX2 (ops/ms) | With opt AVX2 (ops/ms) | Gain ratio
>> -- | -- | -- | -- | -- | -- | -- | --
>> FpRoundingBenchmark.test_round_double | 1024 | 584.99 | 1870.70 | 3.20 | 510.35 | 548.60 | 1.07
>> FpRoundingBenchmark.test_round_double | 2048 | 257.17 | 965.33 | 3.75 | 293.60 | 273.15 | 0.93
>> FpRoundingBenchmark.test_round_float | 1024 | 825.69 | 3592.54 | 4.35 | 825.32 | 1836.42 | 2.23
>> FpRoundingBenchmark.test_round_float | 2048 | 388.55 | 1895.77 | 4.88 | 412.31 | 945.82 | 2.29
>>
>> Kindly review and share your feedback.
>>
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:
>
>  - 8279508: Adding vectorized algorithms to match the semantics of rounding operations.
>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8279508
>  - 8279508: Adding a test for scalar intrinsification.
>  - 8279508: Auto-vectorize Math.round API

Hi,

IIRC, with EVEX encoding you can embed the rounding-control (RC) bits directly in the EVEX prefix, removing the need to rely on the global MXCSR register.

Thanks.

PR: https://git.openjdk.java.net/jdk/pull/7094
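The auto-vectorization work targets element-wise loops over arrays. The sketch below shows a loop shape of that kind for both overloads; it is not the `FpRoundingBenchmark` source, `RoundLoopSketch` is a hypothetical name, and whether C2 actually vectorizes it depends on the JDK build and the CPU features available.

```java
// Element-wise Math.round loops of the kind an auto-vectorizer could turn into vector code.
// Hypothetical sketch; not the benchmark included with the patch.
class RoundLoopSketch {
    static void roundAll(double[] src, long[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = Math.round(src[i]); // Math.round(double) returns long
        }
    }

    static void roundAll(float[] src, int[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = Math.round(src[i]); // Math.round(float) returns int
        }
    }
}
```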
Re: RFR: 8279508: Auto-vectorize Math.round API [v3]
> Summary of changes:
> - Intrinsify Math.round(float) and Math.round(double) APIs.
> - Extend the auto-vectorizer to infer vector operations on encountering scalar IR nodes for the above intrinsics.
> - Tests created using the new IR testing framework.
>
> The following are the performance numbers of a JMH micro-benchmark included with the patch:
>
> Test System: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (Icelake Server)
>
> Benchmark | TESTSIZE | Baseline AVX3 (ops/ms) | With opt AVX3 (ops/ms) | Gain ratio | Baseline AVX2 (ops/ms) | With opt AVX2 (ops/ms) | Gain ratio
> -- | -- | -- | -- | -- | -- | -- | --
> FpRoundingBenchmark.test_round_double | 1024 | 584.99 | 1870.70 | 3.20 | 510.35 | 548.60 | 1.07
> FpRoundingBenchmark.test_round_double | 2048 | 257.17 | 965.33 | 3.75 | 293.60 | 273.15 | 0.93
> FpRoundingBenchmark.test_round_float | 1024 | 825.69 | 3592.54 | 4.35 | 825.32 | 1836.42 | 2.23
> FpRoundingBenchmark.test_round_float | 2048 | 388.55 | 1895.77 | 4.88 | 412.31 | 945.82 | 2.29
>
> Kindly review and share your feedback.
>
> Best Regards,
> Jatin

Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:

 - 8279508: Adding vectorized algorithms to match the semantics of rounding operations.
 - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8279508
 - 8279508: Adding a test for scalar intrinsification.
 - 8279508: Auto-vectorize Math.round API

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/7094/files
  - new: https://git.openjdk.java.net/jdk/pull/7094/files/575d2935..2dc364fa

Webrevs:
  - full: https://webrevs.openjdk.java.net/?repo=jdk=7094=02
  - incr: https://webrevs.openjdk.java.net/?repo=jdk=7094=01-02

Stats: 33695 lines in 1192 files changed: 23243 ins; 5703 del; 4749 mod
Patch: https://git.openjdk.java.net/jdk/pull/7094.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/7094/head:pull/7094

PR: https://git.openjdk.java.net/jdk/pull/7094
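The commit "Adding vectorized algorithms to match the semantics of rounding operations" implies that any vector code must reproduce `Math.round` exactly. The sketch below contrasts `Math.round` (closest integer, ties toward positive infinity, NaN to 0, saturating at the `long` range) with `Math.rint` (ties to even) and with a naive `floor(x + 0.5)`, which differs for `0.49999999999999994` because the addition itself rounds up to `1.0`. `RoundingSemanticsSketch` and `naiveRound` are hypothetical names for this illustration.

```java
// Contrasts Math.round, Math.rint and a naive floor(x + 0.5) on a few edge cases.
// Hypothetical sketch for illustration.
class RoundingSemanticsSketch {
    // Naive formula: NOT exactly Math.round, kept only for comparison.
    static long naiveRound(double x) {
        return (long) Math.floor(x + 0.5);
    }

    public static void main(String[] args) {
        double[] xs = {2.5, -2.5, 0.49999999999999994, Double.NaN, 1e20};
        for (double x : xs) {
            System.out.printf("x=%s round=%d rint=%s naive=%d%n",
                    x, Math.round(x), Math.rint(x), naiveRound(x));
        }
    }
}
```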
Re: RFR: 8279508: Auto-vectorize Math.round API [v2]
On Fri, 21 Jan 2022 00:49:04 GMT, Sandhya Viswanathan wrote:

> The JVM currently initializes the x86 mxcsr to round to nearest even, see below in stubGenerator_x86_64.cpp:
>
>     // Round to nearest (even), 64-bit mode, exceptions masked
>     StubRoutines::x86::_mxcsr_std = 0x1F80;
>
> The above works for Math.rint, which is specified to round to nearest even. Please see: https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html : section 4.8.4
>
> The rounding mode needed for Math.round is round to positive infinity, which needs a different x86 mxcsr initialization (0x5F80).

Hi @sviswa7,

As per JLS 17 section 15.4, Java follows the round-to-nearest rounding policy for all floating-point operations except conversion to integer and remainder, where it uses round toward zero. So it may not be feasible to modify the global MXCSR.RC setting. Modifying the MXCSR setting just before rounding and restoring its original value afterwards will not work either, since an out-of-order processor is free to reorder the LDMXCSR instruction if it is used without any barriers, and thus it may also influence other floating-point operations.

I am pushing an incremental patch which vectorizes the existing rounding APIs and shows a significant gain over the existing implementation.

Best Regards,
Jatin

PR: https://git.openjdk.java.net/jdk/pull/7094
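The two MXCSR values quoted above differ only in the rounding-control field. Per the Intel SDM layout, bits 7 to 12 of MXCSR are the exception masks and bits 13 to 14 are the RC field (00 nearest, 01 down, 10 up, 11 toward zero). The hypothetical helper below decodes them: `0x1F80` yields RC = 0 (round to nearest even) and `0x5F80` yields RC = 2 (round up toward +inf), with all exceptions masked in both.

```java
// Decodes the rounding-control (RC) field and exception masks of an x86 MXCSR value.
// Hypothetical helper for illustration; bit positions follow the Intel SDM.
class MxcsrSketch {
    static final String[] RC_NAMES = {
        "round to nearest (even)",
        "round down (toward -inf)",
        "round up (toward +inf)",
        "round toward zero"
    };

    // RC is the two-bit field at bits 13-14.
    static int roundingControl(int mxcsr) {
        return (mxcsr >>> 13) & 0b11;
    }

    // Exception mask bits are bits 7-12 (six masks).
    static boolean allExceptionsMasked(int mxcsr) {
        return ((mxcsr >>> 7) & 0x3F) == 0x3F;
    }
}
```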
Re: RFR: JDK-8281000 ClassLoader::registerAsParallelCapable throws NPE if caller is null
On Fri, 11 Feb 2022 23:25:44 GMT, Brent Christian wrote:

> Having second thoughts: since this API expects to be called by a class loader, I think it should throw `IllegalCallerException` to indicate that this method is called by an illegal caller. This will need a CSR due to the spec change.

I think this would work both for the "no caller" case and for the case where reflection hackery calls this method from somewhere other than a ClassLoader. So it would be a small change in behavior, from CCE (`ClassCastException`) to ICE (`IllegalCallerException`).

PR: https://git.openjdk.java.net/jdk/pull/7448
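For context, the intended call site of `registerAsParallelCapable()` is a `ClassLoader` subclass, typically in its static initializer. The sketch below shows that usage and a hypothetical `checkCaller` helper modelling the proposal discussed here: one `IllegalCallerException` for both a missing caller and a non-`ClassLoader` caller. Both class names are made up, and `checkCaller` is not the JDK's implementation; it only contrasts with `Class.asSubclass`, which throws `ClassCastException` for a non-subclass.

```java
// Hypothetical loader showing the intended call site of registerAsParallelCapable().
class MyParallelLoader extends ClassLoader {
    static {
        // Called from a ClassLoader subclass; registration succeeds because
        // ClassLoader itself is registered as parallel capable.
        registerAsParallelCapable();
    }

    MyParallelLoader(ClassLoader parent) {
        super(parent);
    }
}

// Hypothetical model of the proposed caller check; not the JDK source.
class CallerCheckSketch {
    static Class<? extends ClassLoader> checkCaller(Class<?> caller) {
        // Reject both "no caller" and "caller is not a ClassLoader" the same way,
        // instead of letting asSubclass throw ClassCastException.
        if (caller == null || !ClassLoader.class.isAssignableFrom(caller)) {
            throw new IllegalCallerException(caller + " is not a ClassLoader subclass");
        }
        return caller.asSubclass(ClassLoader.class);
    }
}
```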
Re: RFR: 8279283 - BufferedInputStream should override transferTo [v5]
On Mon, 27 Dec 2021 13:43:12 GMT, Markus KARG wrote:

>> Implementation of JDK-8279283
>
> Markus KARG has updated the pull request incrementally with one additional commit since the last revision:
>
>   fixed missing BufferedInputStream

Please keep open, still working on it.

PR: https://git.openjdk.java.net/jdk/pull/6935
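One plausible shape for a `BufferedInputStream.transferTo` override is: first write out whatever is already sitting in the internal buffer, then delegate the rest to the underlying stream, which may have its own fast path. The sketch below expresses that idea as a hypothetical subclass (`DrainingBufferedInputStream`), not the change in this PR. As an assumption of the sketch, it falls back to the inherited behavior when a mark is set, so that `reset()` keeps working.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Objects;

// Hypothetical subclass illustrating "drain buffer, then delegate"; not the PR's code.
class DrainingBufferedInputStream extends BufferedInputStream {
    DrainingBufferedInputStream(InputStream in) {
        super(in);
    }

    @Override
    public synchronized long transferTo(OutputStream out) throws IOException {
        Objects.requireNonNull(out, "out");
        if (buf == null || in == null) {
            throw new IOException("Stream closed");
        }
        if (markpos >= 0) {
            // A mark is set: keep the inherited behavior so reset() still works.
            return super.transferTo(out);
        }
        // 1. Write the bytes already buffered but not yet consumed.
        int buffered = count - pos;
        if (buffered > 0) {
            out.write(buf, pos, buffered);
            pos = count;
        }
        // 2. Let the underlying stream transfer the remainder directly.
        return buffered + in.transferTo(out);
    }
}
```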