On 06/19/2015 09:34 AM, Andrew Haley wrote:
> On 18/06/15 20:28, Vladimir Kozlov wrote:
>
>> Yes, it is a lot of handwriting but we need it to work on all OSs.
>
> Sure, I get that. I knew there would be a few goes around with this,
> but it's worth the pain for the performance improvement.
I made some changes, as requested. Everything is now private static final. The libcall now only calls the runtime code: all allocation is done in Java code. I tested on Solaris using Solaris Studio 12.3 tools, and it's fine.

There's one thing I'm not sure about. I no longer allocate scratch memory on the heap. That was only needed for extremely large integers, larger than anyone needs for crypto. Now, if the size of an integer exceeds 16384 bits I do not use the intrinsic, which allows the intrinsic to use stack-allocated memory for its scratch space.

The main thing I was worried about is the time spent in Montgomery multiplication. The runtime of the algorithm is O(N^2); if you don't limit the size, the time is unbounded, with no safepoint check along the way. This would mean that anyone who passed an absurdly large integer to BigInteger.modPow() would see the virtual machine apparently lock up, and garbage collection would not run. I note that the multiplyToLen() intrinsic has the same problem.

http://cr.openjdk.java.net/~aph/8046943-hs-3/
http://cr.openjdk.java.net/~aph/8046943-jdk-3/

Andrew.
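
For illustration only, here is a minimal sketch of the size cutoff described above. It is not the code in the webrev: the class name, the constant name and the choosePath() helper are all hypothetical, and the two code paths are represented by strings rather than real multiplication routines. The only detail taken from the message is the 16384-bit limit, expressed here in 32-bit words.

```java
import java.math.BigInteger;

public class MontgomeryCutoffSketch {
    // 16384 bits expressed in 32-bit words. The name is hypothetical.
    private static final int MAX_INTRINSIC_WORDS = 16384 / 32;

    // True when the operand is small enough for the bounded,
    // stack-scratch intrinsic path.
    static boolean useIntrinsic(int lenInWords) {
        return lenInWords <= MAX_INTRINSIC_WORDS;
    }

    // Reports which path a modulus of this size would take.
    // "java" stands for the plain Java fallback.
    static String choosePath(BigInteger modulus) {
        int words = (modulus.bitLength() + 31) >>> 5; // round up to whole words
        return useIntrinsic(words) ? "intrinsic" : "java";
    }

    public static void main(String[] args) {
        // A 2048-bit modulus, typical for crypto, stays on the intrinsic path.
        System.out.println(choosePath(BigInteger.ONE.shiftLeft(2047)));
        // A modulus just over 20000 bits falls back to Java code.
        System.out.println(choosePath(BigInteger.ONE.shiftLeft(20000)));
    }
}
```

With a fixed upper bound on the operand size, both the scratch space and the time spent without a safepoint are bounded, which is the point of the cutoff.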