On 11/21/18 5:37 PM, Andrew Haley wrote:
> On 11/15/18 10:42 AM, Gidon Gershinsky wrote:
>> Having the decryption optimized in the HotSpot engine would be ideal.
>
> I agree with you. I did a few experiments, and it can take a very
> long time for C2 compilation to kick in, especially because GCM is
> glacially slow until the intrinsics are used.
>
> I think this would be a generally useful enhancement to HotSpot,
> and I'm kicking around an experimental patch which adds the
> intrinsics to c1 and the interpreter.

There's a proof-of-concept patch at http://cr.openjdk.java.net/~aph/gctr/

It's all rather hacky, but it works.

The patch is rather more complicated than I would have liked. We
could simplify it somewhat by getting rid of the C1 intrinsic and
instead making C1 call the interpreter implementation.

There's also a JMH benchmark in that directory. Test results for 1Mb
look like this:

Interp:

Benchmark              Mode  Cnt     Score    Error  Units
AESGCMUpdateAAD2.test  avgt    5  1426.275 ±  8.778  us/op

C1:

Benchmark              Mode  Cnt     Score    Error  Units
AESGCMUpdateAAD2.test  avgt    5  1359.367 ±  8.196  us/op

C2:

Benchmark              Mode  Cnt     Score    Error  Units
AESGCMUpdateAAD2.test  avgt    5  1333.863 ± 18.385  us/op

-- 
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
EAC8 43EB D3EF DB98 CC77  2FAD A5CD 6035 332F A671
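
As a rough illustration of the kind of measurement discussed above, here is a minimal sketch that times AES/GCM on a 1 MiB buffer using only the standard javax.crypto API. It is not the JMH benchmark from the patch directory, whose source is not reproduced here. The class name GcmTimingSketch, the 16-byte AAD, the 128-bit key, and the iteration count are all assumptions chosen for illustration, and hand-rolled System.nanoTime() timing is far less reliable than JMH.

```java
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical timing sketch; not the AESGCMUpdateAAD2 benchmark from the patch.
public class GcmTimingSketch {
    static final int TAG_BITS = 128; // GCM authentication tag length in bits

    // Encrypts plain under AES/GCM, authenticating aad as well.
    static byte[] encrypt(SecretKey key, byte[] iv, byte[] aad, byte[] plain) throws Exception {
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        c.updateAAD(aad);
        return c.doFinal(plain); // ciphertext followed by the tag
    }

    // Decrypts and verifies the tag; throws AEADBadTagException on mismatch.
    static byte[] decrypt(SecretKey key, byte[] iv, byte[] aad, byte[] cipherText) throws Exception {
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        c.updateAAD(aad);
        return c.doFinal(cipherText);
    }

    public static void main(String[] args) throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128); // assumed key size
        SecretKey key = kg.generateKey();

        SecureRandom rnd = new SecureRandom();
        byte[] aad = new byte[16];        // assumed AAD size
        byte[] plain = new byte[1 << 20]; // 1 MiB payload
        rnd.nextBytes(aad);
        rnd.nextBytes(plain);

        byte[] iv = new byte[12];
        for (int i = 0; i < 20; i++) {
            rnd.nextBytes(iv); // fresh IV for every encryption under the same key
            long t0 = System.nanoTime();
            byte[] ct = encrypt(key, iv, aad, plain);
            long t1 = System.nanoTime();
            byte[] back = decrypt(key, iv, aad, ct);
            long t2 = System.nanoTime();
            if (!Arrays.equals(plain, back)) {
                throw new AssertionError("round trip failed");
            }
            System.out.printf("iter %2d: encrypt %d us, decrypt %d us%n",
                    i, (t1 - t0) / 1000, (t2 - t1) / 1000);
        }
    }
}
```

Running it with -Xint (interpreter only), -XX:TieredStopAtLevel=1 (C1 only), or -XX:-TieredCompilation (C2 only) is one way to compare execution modes on a given JVM build. Whether the numbers above were gathered with those flags is not stated, so treat that mapping as an assumption.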