I've run my decryption benchmarks on Java with this patch, with excellent results. The benchmarks reach top speed right away; there is no long warm-up anymore. There is also no need to split the operation into multiple update calls, a single doFinal works just fine.
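For reference, a minimal sketch of the single-doFinal pattern described above. The class name, key size, and 1 MB buffer size are illustrative only (the 1 MB size matches the benchmark buffers discussed in this thread); the crypto calls are the standard `javax.crypto` AES/GCM API:

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.security.SecureRandom;
import java.util.Arrays;

public class GcmDoFinalDemo {
    public static void main(String[] args) throws Exception {
        // 1 MB of random plaintext, mirroring the benchmark payload size.
        byte[] plaintext = new byte[1 << 20];
        SecureRandom rng = new SecureRandom();
        rng.nextBytes(plaintext);

        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        SecretKey key = kg.generateKey();

        byte[] iv = new byte[12];                 // 96-bit IV, the usual choice for GCM
        rng.nextBytes(iv);
        GCMParameterSpec spec = new GCMParameterSpec(128, iv); // 128-bit auth tag

        Cipher enc = Cipher.getInstance("AES/GCM/NoPadding");
        enc.init(Cipher.ENCRYPT_MODE, key, spec);
        byte[] ciphertext = enc.doFinal(plaintext);

        // Decrypt with a single doFinal call -- no manual chunking into update() calls.
        Cipher dec = Cipher.getInstance("AES/GCM/NoPadding");
        dec.init(Cipher.DECRYPT_MODE, key, spec);
        byte[] recovered = dec.doFinal(ciphertext);

        System.out.println(Arrays.equals(plaintext, recovered)); // prints "true"
    }
}
```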
Moreover, the maximal decryption throughput is actually higher than in Java 11 after warm-up. On one thread, I get 930 MB/s instead of 850 MB/s. On 8 threads, 350x8 instead of 230x8. This capability will be important in the Spark/Parquet workloads.

Cheers, Gidon

On Fri, Nov 30, 2018 at 6:10 PM Andrew Haley <a...@redhat.com> wrote:
> On 11/21/18 5:37 PM, Andrew Haley wrote:
> > On 11/15/18 10:42 AM, Gidon Gershinsky wrote:
> >> Having the decryption optimized in the HotSpot engine would be ideal.
> >
> > I agree with you. I did a few experiments, and it can take a very
> > long time for C2 compilation to kick in, especially because GCM is
> > glacially slow until the intrinsics are used.
> >
> > I think this would be a generally useful enhancement to HotSpot,
> > and I'm kicking around an experimental patch which adds the
> > intrinsics to c1 and the interpreter.
>
> There's a proof-of-concept patch at http://cr.openjdk.java.net/~aph/gctr/
> It's all rather hacky but it works.
>
> The patch is rather more complicated than I would have liked. We
> could simplify it somewhat by getting rid of the C1 intrinsic, and
> instead making C1 call the interpreter implementation.
>
> There is also a jmh benchmark in that directory. Test results for 1 MB
> look like this:
>
> Interp:
>
> Benchmark              Mode  Cnt     Score    Error  Units
> AESGCMUpdateAAD2.test  avgt    5  1426.275 ±  8.778  us/op
>
> C1:
>
> Benchmark              Mode  Cnt     Score    Error  Units
> AESGCMUpdateAAD2.test  avgt    5  1359.367 ±  8.196  us/op
>
> C2:
>
> Benchmark              Mode  Cnt     Score    Error  Units
> AESGCMUpdateAAD2.test  avgt    5  1333.863 ± 18.385  us/op
>
> --
> Andrew Haley
> Java Platform Lead Engineer
> Red Hat UK Ltd. <https://www.redhat.com>
> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671