- The MD5 intrinsic added by
[JDK-8250902](https://bugs.openjdk.java.net/browse/JDK-8250902) shows that the
`int[] x` array isn't actually needed. This also applies to the SHA intrinsics,
from which the MD5 intrinsic takes inspiration.
- Using VarHandles we can simplify the code in `ByteArrayAccess` enough to make
it acceptable to use inline and replace the array in MD5 wholesale. This
improves performance both in the presence and the absence of the intrinsic
optimization.
- Doing the exact same thing in the SHA impls would be unwieldy (64+ element
arrays), but allocating the array lazily gets most of the speed-up in the
presence of an intrinsic while being neutral in its absence.
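
The two ideas in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the class `DigestSketch`, its methods, and the schedule length used in `main` are hypothetical names and values chosen for the example. It assumes that a byte-array view `VarHandle` is used to read 32-bit words straight out of the input `byte[]`, and that the scratch array needed by the pure-Java path is created only on first use.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Hypothetical sketch; names are illustrative and not taken from the JDK sources.
final class DigestSketch {
    // Views a byte[] as little-endian ints (MD5 consumes its input as little-endian words).
    private static final VarHandle INT_LE =
            MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

    // Reads one 32-bit word directly from the input buffer, so no int[] copy is needed.
    // The VarHandle performs the bounds check itself.
    static int readIntLE(byte[] buf, int offset) {
        return (int) INT_LE.get(buf, offset);
    }

    // Stand-in for a SHA-style scratch array: stays null until the
    // pure-Java path asks for it, so an intrinsic-only run never allocates it.
    private int[] schedule;

    int[] schedule(int length) {
        int[] w = schedule;
        if (w == null) {
            w = schedule = new int[length];
        }
        return w;
    }

    boolean scheduleAllocated() {
        return schedule != null;
    }

    public static void main(String[] args) {
        byte[] block = {1, 2, 3, 4, 5, 6, 7, 8};
        System.out.println(Integer.toHexString(readIntLE(block, 0)));
        System.out.println(Integer.toHexString(readIntLE(block, 4)));

        DigestSketch sketch = new DigestSketch();
        System.out.println(sketch.scheduleAllocated()); // nothing allocated yet
        sketch.schedule(64);                             // illustrative length only
        System.out.println(sketch.scheduleAllocated());
    }
}
```

The lazy accessor costs one null check per call once the array exists, which fits the observation that the change is neutral when no intrinsic is active.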
Baseline:

Benchmark                    (digesterName)  (length)  Cnt     Score     Error  Units
MessageDigests.digest                   MD5        16   15  2714.307 ±  21.133  ops/ms
MessageDigests.digest                   MD5      1024   15   318.087 ±   0.637  ops/ms
MessageDigests.digest                 SHA-1        16   15  1387.266 ±  40.932  ops/ms
MessageDigests.digest                 SHA-1      1024   15   109.273 ±   0.149  ops/ms
MessageDigests.digest               SHA-256        16   15   995.566 ±  21.186  ops/ms
MessageDigests.digest               SHA-256      1024   15    89.104 ±   0.079  ops/ms
MessageDigests.digest               SHA-512        16   15   803.030 ±  15.722  ops/ms
MessageDigests.digest               SHA-512      1024   15   115.611 ±   0.234  ops/ms
MessageDigests.getAndDigest             MD5        16   15  2190.367 ±  97.037  ops/ms
MessageDigests.getAndDigest             MD5      1024   15   302.903 ±   1.809  ops/ms
MessageDigests.getAndDigest           SHA-1        16   15  1262.656 ±  43.751  ops/ms
MessageDigests.getAndDigest           SHA-1      1024   15   104.889 ±   3.554  ops/ms
MessageDigests.getAndDigest         SHA-256        16   15   914.541 ±  55.621  ops/ms
MessageDigests.getAndDigest         SHA-256      1024   15    85.708 ±   1.394  ops/ms
MessageDigests.getAndDigest         SHA-512        16   15   737.719 ±  53.671  ops/ms
MessageDigests.getAndDigest         SHA-512      1024   15   112.307 ±   1.950  ops/ms

GC:

MessageDigests.getAndDigest:·gc.alloc.rate.norm             MD5        16   15   312.011 ± 0.005  B/op
MessageDigests.getAndDigest:·gc.alloc.rate.norm           SHA-1        16   15   584.020 ± 0.006  B/op
MessageDigests.getAndDigest:·gc.alloc.rate.norm         SHA-256        16   15   544.019 ± 0.016  B/op
MessageDigests.getAndDigest:·gc.alloc.rate.norm         SHA-512        16   15  1056.037 ± 0.003  B/op

Target:

Benchmark                    (digesterName)  (length)  Cnt     Score     Error  Units
MessageDigests.digest                   MD5        16   15  3134.462 ±  43.685  ops/ms
MessageDigests.digest                   MD5      1024   15   323.667 ±   0.633  ops/ms
MessageDigests.digest                 SHA-1        16   15  1418.742 ±  38.223  ops/ms
MessageDigests.digest                 SHA-1      1024   15   110.178 ±   0.788  ops/ms
MessageDigests.digest               SHA-256        16   15  1037.949 ±  21.214  ops/ms
MessageDigests.digest               SHA-256      1024   15    89.671 ±   0.228  ops/ms
MessageDigests.digest               SHA-512        16   15   812.028 ±  39.489  ops/ms
MessageDigests.digest               SHA-512      1024   15   116.738 ±   0.249  ops/ms
MessageDigests.getAndDigest             MD5        16   15  2314.379 ± 229.294  ops/ms
MessageDigests.getAndDigest             MD5      1024   15   307.835 ±   5.730  ops/ms
MessageDigests.getAndDigest           SHA-1        16   15  1326.887 ±  63.263  ops/ms
MessageDigests.getAndDigest           SHA-1      1024   15   106.611 ±   2.292  ops/ms
MessageDigests.getAndDigest         SHA-256        16   15   961.589 ±  82.052  ops/ms
MessageDigests.getAndDigest         SHA-256      1024   15    88.646 ±   0.194  ops/ms
MessageDigests.getAndDigest         SHA-512        16   15   775.417 ±  56.775  ops/ms
MessageDigests.getAndDigest         SHA-512      1024   15   112.904 ±   2.014  ops/ms

GC:

MessageDigests.getAndDigest:·gc.alloc.rate.norm             MD5        16   15   232.009 ± 0.006  B/op
MessageDigests.getAndDigest:·gc.alloc.rate.norm           SHA-1        16   15   584.021 ± 0.001  B/op
MessageDigests.getAndDigest:·gc.alloc.rate.norm         SHA-256        16   15   272.012 ± 0.015  B/op
MessageDigests.getAndDigest:·gc.alloc.rate.norm         SHA-512        16   15   400.017 ± 0.019  B/op

For the `digest` micro, digesting small inputs is faster with all algorithms,
ranging from ~1% for SHA-512 up to ~15% for MD5. The gain stems from no longer
allocating a temporary buffer and reading into it outside of the intrinsic.
SHA-1 does not see a statistically significant gain because the intrinsic is
disabled by default on my HW.
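
The quoted percentages can be reproduced from the 16-byte `digest` scores in the two tables. The snippet below is only a worked calculation; the class and method names are made up for the example, and it ignores the reported error margins.

```java
// Hypothetical helper for reproducing the relative gains quoted above.
final class RelativeGain {
    // Percentage change from a baseline throughput to a target throughput.
    static double percentGain(double baseline, double target) {
        return (target / baseline - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        // Scores in ops/ms, copied from the Baseline and Target tables (length 16).
        System.out.println("MD5:     " + percentGain(2714.307, 3134.462));
        System.out.println("SHA-256: " + percentGain(995.566, 1037.949));
        System.out.println("SHA-512: " + percentGain(803.030, 812.028));
    }
}
```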
For the `getAndDigest` micro, which tests
`MessageDigest.getInstance(..).digest(..)`, there are similar gains with this
patch. The interesting aspect here is verifying the reduction in allocations
per operation when there's an active intrinsic (again, not for SHA-1).
JDK-8259065 (#1933) had already reduced allocations for each of these by
144 B/op, which means allocation pressure for SHA-512 is down two thirds, from
1200 B/op (the 1056 B/op baseline above plus 144 B/op) to 400 B/op, in this
contrived test.
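
For readers unfamiliar with the two micros, the difference in shape is roughly the following. This is a plain-Java sketch rather than the JMH benchmark itself, and it assumes the `digest` micro reuses a single `MessageDigest` instance; `DigestShapes` and its methods are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical sketch of the two call shapes; not the actual benchmark source.
final class DigestShapes {
    // Looks up a provider implementation, rethrowing the checked exception unchecked.
    static MessageDigest newDigest(String algorithm) {
        try {
            return MessageDigest.getInstance(algorithm);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // `digest` shape: the instance is created once and reused for every operation.
    // MessageDigest.digest(byte[]) resets the instance after computing the hash.
    static byte[] digestReused(MessageDigest md, byte[] input) {
        return md.digest(input);
    }

    // `getAndDigest` shape: each operation pays for the lookup and for every
    // per-instance allocation, which is why gc.alloc.rate.norm is of interest here.
    static byte[] getAndDigest(String algorithm, byte[] input) {
        return newDigest(algorithm).digest(input);
    }

    public static void main(String[] args) {
        byte[] input = "0123456789abcdef".getBytes(StandardCharsets.US_ASCII); // 16 bytes
        MessageDigest reused = newDigest("SHA-512");
        byte[] a = digestReused(reused, input);
        byte[] b = getAndDigest("SHA-512", input);
        System.out.println(a.length + " bytes, equal: " + Arrays.equals(a, b));
    }
}
```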
I've verified there are no regressions in the absence of the intrinsic - which
the SHA-1 numbers here help show.
-------------
Commit messages:
- Remove unused Unsafe import
- Harmonize MD4 impl, remove now-redundant checks from ByteArrayAccess (VHs do
bounds checks, most of which will be optimized away)
- Merge branch 'master' into improve_md5
- Apply allocation avoiding optimizations to all SHA versions sharing
structural similarities with MD5
- Remove unused reverseBytes imports
- Copyrights
- Fix copy-paste error
- Various fixes (IDE stopped IDEing..)
- Add imports
- mismatched parens
- ... and 8 more: https://git.openjdk.java.net/jdk/compare/090bd3af...e1c943c5
Changes: https://git.openjdk.java.net/jdk/pull/1855/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1855&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8259498
Stats: 649 lines in 8 files changed: 83 ins; 344 del; 222 mod
Patch: https://git.openjdk.java.net/jdk/pull/1855.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/1855/head:pull/1855
PR: https://git.openjdk.java.net/jdk/pull/1855