On 06/15/2015 05:20 PM, John Rose wrote:
> Thanks for taking this on.

> It looks good, except for one thing. The intrinsic does not need to be
> an instance method, and doing so creates an undesirable coupling between
> the JVM and JDK. Specifically, the JVM should not need to know about the
> subkeyH and state fields. The Java code should pass those as plain
> long[2] array arguments to the intrinsic method processBlocks, which
> should be adjusted to be static. The domain check routine should be
> adjusted to be static also.
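A minimal sketch of the static, field-free shape described above, in a hypothetical class `GhashSketch` (not the JDK's actual GHASH code). The caller owns the state and hash subkey as plain `long[2]` values (high word = bytes 0-7, low word = bytes 8-15, big-endian), so an intrinsic replacing `processBlocks` would depend only on its signature, not on any object's field layout. The argument order, the helper names `getLong`, `blockMult`, `checkBlocks`, `update` and `hexToBytes`, and the bit-at-a-time multiply (Algorithm 1 of NIST SP 800-38D, chosen for clarity rather than speed) are all assumptions for illustration.

```java
// Hypothetical sketch, not the JDK's GHASH class: state and subkey are plain
// long[2] values {hi, lo}, where hi = bytes 0-7 and lo = bytes 8-15 (big-endian).
final class GhashSketch {
    private static final int BLOCK = 16;

    private GhashSketch() {}

    // Read 8 bytes starting at ofs as a big-endian long.
    static long getLong(byte[] b, int ofs) {
        long v = 0;
        for (int i = 0; i < 8; i++) {
            v = (v << 8) | (b[ofs + i] & 0xFFL);
        }
        return v;
    }

    // st = st * subH in GF(2^128), GCM bit order (NIST SP 800-38D, Algorithm 1).
    static void blockMult(long[] st, long[] subH) {
        long z0 = 0, z1 = 0;
        long v0 = subH[0], v1 = subH[1];
        for (int i = 0; i < 128; i++) {
            long x = (i < 64) ? st[0] : st[1];
            // Bit i of X, counting from the most significant bit of st[0].
            if (((x >>> (63 - (i & 63))) & 1) != 0) {
                z0 ^= v0;
                z1 ^= v1;
            }
            // V = V >> 1 over 128 bits; if a 1 bit fell off, XOR in R = 0xE1 || 0^120.
            boolean lsb = (v1 & 1) != 0;
            v1 = (v1 >>> 1) | (v0 << 63);
            v0 >>>= 1;
            if (lsb) {
                v0 ^= 0xE100000000000000L;
            }
        }
        st[0] = z0;
        st[1] = z1;
    }

    // Static range check, kept outside the loop that an intrinsic could replace.
    static void checkBlocks(byte[] data, int inOfs, int blocks, long[] st, long[] subH) {
        if (st.length != 2 || subH.length != 2) {
            throw new IllegalArgumentException("state and subkey must be long[2]");
        }
        if (inOfs < 0 || blocks < 0 || (long) inOfs + (long) blocks * BLOCK > data.length) {
            throw new ArrayIndexOutOfBoundsException("blocks out of range");
        }
    }

    // Static, field-free loop: the only method an intrinsic would need to replace.
    static void processBlocks(byte[] data, int inOfs, int blocks, long[] st, long[] subH) {
        for (int i = 0; i < blocks; i++, inOfs += BLOCK) {
            st[0] ^= getLong(data, inOfs);
            st[1] ^= getLong(data, inOfs + 8);
            blockMult(st, subH);
        }
    }

    // Caller-side entry point: check in Java, then run the (intrinsifiable) loop.
    static void update(byte[] data, int inOfs, int blocks, long[] st, long[] subH) {
        checkBlocks(data, inOfs, blocks, st, subH);
        processBlocks(data, inOfs, blocks, st, subH);
    }

    // Example helper only: "0a1b..." -> bytes.
    static byte[] hexToBytes(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }
}
```

Run over the ciphertext block and length block of test case 2 from the original GCM specification (all-zero key and IV, H = 66e94bd4ef8a2c3b884cfa59ca342b2e), the state should end up as f38cbb1ad69223dcc3457ae5b6b0f885. Keeping `checkBlocks` separate from `processBlocks` follows the suggestion that the check routine be static too: the check stays in Java, and only the loop is a candidate for replacement.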

> On my wish list for the future (but not now) is even less coupling
> with the JVM. The loop code for processBlocks should be written in Java,
> with various intrinsics (xmulx*) for dealing with single operations on
> 128-bit values (stored in long[2] boxes and 64-bit registers).

I forget the exact numbers, but having the loop in assembly instead of Java gave roughly a 10-15% performance improvement. The tighter loop was definitely beneficial.
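To make the wish-list idea concrete, here is a hedged sketch of one such single operation written in plain Java: a 64x64 to 128-bit carry-less (XOR) multiply, which, as far as I know, is what SPARC's XMULX/XMULXHI pair computes in hardware. The class and method names are hypothetical, the code uses ordinary polynomial bit order (bit i is the coefficient of x^i), and the bit reflection and modular reduction that GHASH needs on top of it are not shown.

```java
// Hypothetical stand-in for a 64x64 -> 128-bit carry-less ("XOR") multiply,
// the kind of single operation an xmulx-style intrinsic could replace.
final class ClmulSketch {
    private ClmulSketch() {}

    // Returns {hi, lo}: the 128-bit carry-less product of a and b,
    // where bit i of each word is the coefficient of x^i (x^(64+i) for hi).
    static long[] clmul64(long a, long b) {
        long hi = 0, lo = 0;
        for (int i = 0; i < 64; i++) {
            if (((b >>> i) & 1) != 0) {
                lo ^= a << i;
                if (i != 0) {
                    // Bits of a shifted past bit 63 land in the high word.
                    hi ^= a >>> (64 - i);
                }
            }
        }
        return new long[] {hi, lo};
    }
}
```

With a primitive like this as the intrinsic boundary, the GHASH block loop itself could stay in Java, and the JVM would only need to know how to map the one method onto a hardware multiply.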

> The Unsafe misaligned access routines could help simplify this also, if the
> coding were done in Java. This is not too hard to express in Java and
> compile to excellent code. There will be a little extra awkwardness
> working with 64x2-vectors in a way that will compile naturally to a
> range of ALUs (both 64- and 128-bit).

I would have to look back at that again. At first I was going to use Unsafe, but it seemed more complicated to code than the assembly approach already used for AES and SHA.
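The Unsafe misaligned-access routines are JDK-internal, so this sketch swaps in the public `java.nio.ByteBuffer` absolute `getLong(int)`, which likewise loads 8 big-endian bytes from a `byte[]` at any offset, aligned or not. It shows only the 64x2 lane read the GHASH loop needs; the class name `LaneReadSketch` is hypothetical, and whether this compiles to code as good as the Unsafe routines or the hand-written assembly is not something the sketch establishes.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch: read one 128-bit block at an arbitrary (possibly
// unaligned) offset as two big-endian longs, using ByteBuffer instead of Unsafe.
final class LaneReadSketch {
    private LaneReadSketch() {}

    // XORs the 16-byte block at data[ofs..ofs+15] into st, as {hi, lo}.
    static void xorBlock(byte[] data, int ofs, long[] st) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.BIG_ENDIAN);
        st[0] ^= buf.getLong(ofs);      // bytes ofs .. ofs+7
        st[1] ^= buf.getLong(ofs + 8);  // bytes ofs+8 .. ofs+15
    }
}
```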

> If we get it right we can reduce
> the amount of assembly code in the JVM and get even more timely access
> to new data-processing instructions. Would you please file a followup
> bug (low pri. for now) to track this, at least for GHASH and other
> crypto loops?

— John

Sure, I can file them.

Tony