On Thu, 7 Jan 2021 14:45:03 GMT, Claes Redestad <redes...@openjdk.org> wrote:
>> I've identified a number of optimizations to the plumbing behind
>> `MessageDigest.getDigest(..)` over in #1933 that removes 80-90% of the
>> throughput overhead and all the allocation overhead compared to the
>> `clone()` approach prototyped here. The remaining 20ns/op overhead might not
>> be enough of a concern to do a point fix in `UUID::nameUUIDFromBytes`.
>
> Removing the UUID clone cache and running the microbenchmark along with the
> changes in #1933:
>
> Benchmark                                      (size)   Mode  Cnt    Score     Error   Units
> UUIDBench.fromType3Bytes                        20000  thrpt   12    2.182 ±   0.090  ops/us
> UUIDBench.fromType3Bytes:·gc.alloc.rate         20000  thrpt   12  439.020 ±  18.241  MB/sec
> UUIDBench.fromType3Bytes:·gc.alloc.rate.norm    20000  thrpt   12  264.022 ±   0.003    B/op
>
> The goal now is to simplify the digest code and compare alternatives.

I've run various tests and concluded that the `VarHandle`-ized code matches or improves upon the `Unsafe`-riddled code in `ByteArrayAccess`. I then went ahead and consolidated `ByteArrayAccess` to use a similar code pattern for consistency, which amounts to a good cleanup (a sketch of the access pattern is included below).

With MD5 intrinsics disabled, I get this baseline:

Benchmark                                      (size)   Mode  Cnt    Score    Error   Units
UUIDBench.fromType3Bytes                        20000  thrpt   12    1.245 ±  0.077  ops/us
UUIDBench.fromType3Bytes:·gc.alloc.rate.norm    20000  thrpt   12  488.042 ±  0.004    B/op

With the current patch here (not including #1933):

Benchmark                                      (size)   Mode  Cnt    Score    Error   Units
UUIDBench.fromType3Bytes                        20000  thrpt   12    1.431 ±  0.106  ops/us
UUIDBench.fromType3Bytes:·gc.alloc.rate.norm    20000  thrpt   12  408.035 ±  0.006    B/op

If I isolate the `ByteArrayAccess` changes, the numbers are neutral or slightly better than the baseline for these tests:

Benchmark                                      (size)   Mode  Cnt    Score    Error   Units
UUIDBench.fromType3Bytes                        20000  thrpt   12    1.317 ±  0.092  ops/us
UUIDBench.fromType3Bytes:·gc.alloc.rate.norm    20000  thrpt   12  488.042 ±  0.004    B/op

-------------

PR: https://git.openjdk.java.net/jdk/pull/1855
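For reference, here is a minimal sketch of the kind of `VarHandle`-based byte-array access being compared against `Unsafe` above. It is not the actual `ByteArrayAccess` code; the class and method names (`ByteArrayAccessSketch`, `getIntLE`, `putIntLE`) are illustrative assumptions. Only `MethodHandles.byteArrayViewVarHandle` and the `VarHandle` get/set calls are real JDK API:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Illustrative sketch only -- not the JDK's ByteArrayAccess.
final class ByteArrayAccessSketch {

    // A VarHandle viewing a byte[] as little-endian ints; the coordinate
    // types are (byte[], int byteOffset).
    private static final VarHandle INT_LE =
            MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

    /** Reads a little-endian int from buf at the given byte offset. */
    static int getIntLE(byte[] buf, int offset) {
        return (int) INT_LE.get(buf, offset);
    }

    /** Writes a little-endian int to buf at the given byte offset. */
    static void putIntLE(byte[] buf, int offset, int value) {
        INT_LE.set(buf, offset, value);
    }
}
```

The appeal of this pattern is that it expresses endian-explicit multi-byte access without resorting to `Unsafe`, while the JIT is generally able to reduce each such access to a single load or store, which is consistent with the roughly performance-neutral numbers reported above.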