On Thu, 27 Oct 2022 05:10:59 GMT, Jatin Bhateja <jbhat...@openjdk.org> wrote:
>> vpaprotsk has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   extra whitespace character
>
> src/java.base/share/classes/com/sun/crypto/provider/Poly1305.java line 175:
>
>> 173:             // Choice of 1024 is arbitrary, need enough data blocks to amortize conversion overhead
>> 174:             // and not affect platforms without intrinsic support
>> 175:             int blockMultipleLength = (len/BLOCK_LENGTH) * BLOCK_LENGTH;
>
> Since Poly processes 16 byte chunks, a strength reduced version of above expression could be len & (~(BLOCK_LEN-1))

I have no issue with either version. I was mostly thinking about code clarity. I think your version is 'more reliable', so I'm just going to switch to it, thanks.

> test/micro/org/openjdk/bench/javax/crypto/full/Poly1305DigestBench.java line 94:
>
>> 92:             throw new RuntimeException(ex);
>> 93:         }
>> 94:     }
>
> On CLX patch shows performance regression of about 10% for block size 1024-2048+.
>
> CLX (Non-IFMA target)
>
> Baseline (JDK-20):-
>
> Benchmark                   (dataSize)  (provider)   Mode  Cnt        Score  Error  Units
> Poly1305DigestBench.digest          64              thrpt    2  3128928.978         ops/s
> Poly1305DigestBench.digest         256              thrpt    2  1526452.083         ops/s
> Poly1305DigestBench.digest        1024              thrpt    2   509267.401         ops/s
> Poly1305DigestBench.digest        2048              thrpt    2   305784.922         ops/s
> Poly1305DigestBench.digest        4096              thrpt    2   142175.885         ops/s
> Poly1305DigestBench.digest        8192              thrpt    2    72142.906         ops/s
> Poly1305DigestBench.digest       16384              thrpt    2    36357.000         ops/s
> Poly1305DigestBench.digest     1048576              thrpt    2      676.142         ops/s
>
> Withopt:
>
> Benchmark                   (dataSize)  (provider)   Mode  Cnt        Score  Error  Units
> Poly1305DigestBench.digest          64              thrpt    2  3136204.416         ops/s
> Poly1305DigestBench.digest         256              thrpt    2  1683221.124         ops/s
> Poly1305DigestBench.digest        1024              thrpt    2   457432.172         ops/s
> Poly1305DigestBench.digest        2048              thrpt    2   277563.817         ops/s
> Poly1305DigestBench.digest        4096              thrpt    2   149393.357         ops/s
> Poly1305DigestBench.digest        8192              thrpt    2    79463.734         ops/s
> Poly1305DigestBench.digest       16384              thrpt    2    41083.730         ops/s
> Poly1305DigestBench.digest     1048576              thrpt    2      705.419         ops/s

Odd, I measured it on `11th Gen Intel(R) Core(TM) i7-11700 @ 2.50GHz`. I will run it again.

-------------

PR: https://git.openjdk.org/jdk/pull/10582
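
For reference, here is a minimal sketch comparing the two ways of rounding `len` down to a whole number of blocks that were discussed for Poly1305.java line 175. It is not code from the patch: the class and method names are hypothetical, and `BLOCK_LENGTH = 16` is assumed from the reviewer's remark that Poly processes 16 byte chunks. The mask form is only equivalent when the block length is a power of two and `len` is non-negative.

```java
public class BlockRounding {
    // Assumption: 16-byte blocks, per the review comment; must be a power of two
    // for the mask variant to be valid.
    static final int BLOCK_LENGTH = 16;

    // Form currently in the patch: integer division, then multiplication.
    static int roundDownByDivision(int len) {
        return (len / BLOCK_LENGTH) * BLOCK_LENGTH;
    }

    // Strength-reduced form suggested in review: clear the low log2(BLOCK_LENGTH) bits.
    static int roundDownByMask(int len) {
        return len & ~(BLOCK_LENGTH - 1);
    }

    public static void main(String[] args) {
        // Check that both forms agree for a range of non-negative lengths.
        for (int len = 0; len <= 1 << 16; len++) {
            if (roundDownByDivision(len) != roundDownByMask(len)) {
                throw new AssertionError("mismatch at len=" + len);
            }
        }
        System.out.println("both forms agree for 0.." + (1 << 16));
    }
}
```

For negative `len` the two forms differ (division truncates toward zero, the mask rounds toward negative infinity), so the equivalence relies on the caller passing a non-negative length.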