On 19/08/2014, at 12:32 am, Florian Weimer <fwei...@redhat.com> wrote:
> This change addresses a severe performance regression, first introduced in
> JDK 8, triggered by the negotiation of a GCM cipher suite in the TLS
> implementation. This regression is a result of the poor performance of the
> implementation of the GHASH function.
>
> I first tried to eliminate just the allocations in blockMult while still
> retaining the byte arrays. This did not substantially increase performance
> in my micro-benchmark. I then replaced the 16-byte arrays with longs,
> replaced the inner loops with direct bit fiddling on the longs, eliminated
> data-dependent conditionals (which are generally frowned upon in
> cryptographic algorithms due to the risk of timing attacks), and split the
> main loop in two, one for each half of the hash state. This is the result:
>
> <https://fweimer.fedorapeople.org/openjdk/ghash-performance/>
>
> Performance is roughly ten times faster. My test download over HTTPS is no
> longer CPU-bound, and GHASH hardly shows up in profiles anymore. (That's why
> I didn't consider further changes, lookup tables in particular.)
> Micro-benchmarking shows roughly a ten-fold increase in throughput, but this
> is probably underestimating it because of the high allocation rate of the old
> code.

Hi Florian

It looks like your GHASH implementation as posted isn’t passing the tests in
TestGHASH.java. The existing JDK implementation does, and the Bouncy Castle
GHASH produces the same results. Can you reproduce that?

cheers
tim

> The performance improvement on 32-bit architectures is probably a bit less,
> but I suspect that using four ints instead of two longs would penalize 64-bit
> architectures.
>
> --
> Florian Weimer / Red Hat Product Security
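
For reference while comparing outputs, here is a minimal sketch of the approach described in the quoted message: hash state held in two longs, masks instead of data-dependent branches, and the main loop split in two, one pass per half. This is not the posted patch. The class name GhashSketch and its methods are hypothetical, and the bit ordering and reduction constant are assumed to follow the GHASH definition in NIST SP 800-38D (bit 0 is the most significant bit of the first byte).

```java
// Hypothetical illustration only; not the code from the webrev.
final class GhashSketch {
    // High 64 bits of R = 11100001 || 0^120 (assumed from NIST SP 800-38D).
    private static final long R_HIGH = 0xe100000000000000L;

    private final long h0, h1; // hash subkey H, high and low halves
    private long s0, s1;       // running hash state, high and low halves

    GhashSketch(long h0, long h1) {
        this.h0 = h0;
        this.h1 = h1;
    }

    // Absorb one 16-byte block given as two big-endian longs:
    // state = (state XOR block) * H in GF(2^128).
    void update(long b0, long b1) {
        long x0 = s0 ^ b0, x1 = s1 ^ b1;
        long z0 = 0, z1 = 0;
        long v0 = h0, v1 = h1;

        // First half: scan the 64 bits of x0, most significant first.
        for (int i = 63; i >= 0; i--) {
            long mask = -((x0 >>> i) & 1L);      // all ones if the bit is set
            z0 ^= v0 & mask;
            z1 ^= v1 & mask;
            long carry = -(v1 & 1L);             // all ones if V's last bit is set
            v1 = (v1 >>> 1) | (v0 << 63);        // V = V >> 1 across both halves
            v0 = (v0 >>> 1) ^ (carry & R_HIGH);  // conditional reduction via mask
        }

        // Second half: same steps for the 64 bits of x1.
        for (int i = 63; i >= 0; i--) {
            long mask = -((x1 >>> i) & 1L);
            z0 ^= v0 & mask;
            z1 ^= v1 & mask;
            long carry = -(v1 & 1L);
            v1 = (v1 >>> 1) | (v0 << 63);
            v0 = (v0 >>> 1) ^ (carry & R_HIGH);
        }

        s0 = z0;
        s1 = z1;
    }

    long high() { return s0; }
    long low()  { return s1; }
}
```

A quick sanity check for a sketch like this is to use the field's identity element (high half 0x8000000000000000L, low half 0) as the subkey, in which case a single update on a fresh instance should leave the state equal to the block. Known-answer vectors such as those in TestGHASH.java are the stronger check.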