On Jun 17, 2015, at 10:40 AM, Anthony Scarpino <anthony.scarp...@oracle.com> wrote:
>
> On 06/15/2015 05:20 PM, John Rose wrote:
>> Thanks for taking this on.
>>
>> It looks good, except for one thing. The intrinsic does not need to be
>> an instance method, and doing so creates an undesirable coupling between
>> the JVM and JDK. Specifically, the JDK should not need to know about
>> subkeyH and state fields. The Java code should pass those as plain
>> (array long[2]) arguments to the intrinsic method processBlocks, which
>> should be adjusted to be static. The domain check routine should be
>> adjusted to be static also.
>>
>> On my wish list for the future (but not now) is even less coupling
>> with the JVM. The loop code for processBlocks should be written in Java,
>> with various intrinsics (xmulx*) for dealing with single operations on
>> 128-bit values (stored in long[2] boxes and 64-bit registers).
>
> I forgot the exact numbers, but having the loop in assembly instead of java
> resulted in about 10-15% performance improvement. The tighter loop was
> definitely beneficial.
That's good information; please put it in the RFE. IMO the best (overall) way to get that 10-15% back is to get the JIT to tighten the loop. If that works, it will of course benefit all Java loops.

>> The Unsafe misaligned access routines could help simplify this also, if the
>> coding were done in Java. This is not too hard to express in Java and
>> compile to excellent code. There will be a little extra awkwardness
>> working with 64x2-vectors in a way that will compile naturally to a
>> range of ALUs (both 64- and 128-bit).
>
> I would have to look back at that again.. At first I was going to use Unsafe,
> but it seemed more complicated coding-wise compared to the assembly I saw
> that was in AES and SHA already.

Yes; that makes perfect sense. I want to get to the place, eventually, when the next coder looks at how "stuff gets done", they will see less assembly and more Java, and take the Java route.

>> If we get it right we can reduce
>> the amount of assembly code in the JVM and get even more timely access
>> to new data-processing instructions. Would you please file a followup
>> bug (low pri. for now) to track this, at least for GHASH and other
>> crypto loops?
>>
>> — John
>
> Sure, I can file them.

Good; thanks.

— John
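
For anyone reading this thread later, here is a rough sketch of the shape being asked for above: a static processBlocks that takes state and subkeyH as plain long[2] arguments, a static domain check, and the block loop written in Java. It is not the code under review. The class name GhashSketch, the helper names checkDomain and blockMult, and the parameter order are all assumptions made for illustration, and the bit-serial multiply is only a stand-in for whatever single-operation intrinsics (xmulx*) would eventually do that work.

```java
import java.nio.ByteBuffer;

// Illustrative sketch only; class, helper names and parameter order are hypothetical.
final class GhashSketch {

    // Reduction constant R = 11100001 || 0^120 from the GCM specification.
    private static final long R = 0xE100000000000000L;

    // Static domain check: validates the arguments before the loop touches them.
    static void checkDomain(byte[] in, int inOfs, int blocks,
                            long[] state, long[] subkeyH) {
        if (state.length != 2 || subkeyH.length != 2) {
            throw new IllegalArgumentException("state and subkeyH must be long[2]");
        }
        if (inOfs < 0 || blocks < 0 || inOfs > in.length
                || blocks > (in.length - inOfs) / 16) {
            throw new ArrayIndexOutOfBoundsException("input range out of bounds");
        }
    }

    // state = state * subkeyH in GF(2^128), using the GCM bit order
    // (bit 0 is the most significant bit of state[0]).
    // A bit-serial placeholder for a carry-less multiply intrinsic.
    static void blockMult(long[] state, long[] subkeyH) {
        long z0 = 0, z1 = 0;
        long v0 = subkeyH[0], v1 = subkeyH[1];
        for (int i = 0; i < 128; i++) {
            long x = (i < 64) ? state[0] : state[1];
            // All ones if bit i of the state is set, otherwise zero.
            long mask = -((x >>> (63 - (i & 63))) & 1L);
            z0 ^= v0 & mask;
            z1 ^= v1 & mask;
            // Shift V right by one bit; fold in R if a bit fell off the end.
            long reduce = -(v1 & 1L);
            v1 = (v1 >>> 1) | (v0 << 63);
            v0 = (v0 >>> 1) ^ (R & reduce);
        }
        state[0] = z0;
        state[1] = z1;
    }

    // Static entry point: no receiver object, so nothing here depends on
    // field names or object layout. state is updated in place.
    static void processBlocks(byte[] in, int inOfs, int blocks,
                              long[] state, long[] subkeyH) {
        checkDomain(in, inOfs, blocks, state, subkeyH);
        ByteBuffer buf = ByteBuffer.wrap(in); // big-endian by default
        for (int b = 0; b < blocks; b++, inOfs += 16) {
            state[0] ^= buf.getLong(inOfs);
            state[1] ^= buf.getLong(inOfs + 8);
            blockMult(state, subkeyH);
        }
    }
}
```

The point of the shape is that a caller holding its own subkeyH and state arrays just passes them in, e.g. GhashSketch.processBlocks(data, 0, data.length / 16, state, subkeyH). Whether a JIT compiles this loop as tightly as the assembly version is exactly the open question about the 10-15% above.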