Re: RFR: 8365581: Optimize Java implementation of P256 arithmetic [v4]

2025-09-17 Thread Jamil Nimeh
On Tue, 9 Sep 2025 16:46:10 GMT, Ben Perez wrote: >> There are several places where MontgomeryIntegerPolynomialP256.mult() can be >> optimized. In particular, since modulus[2] = 0 several multiplications can >> be removed. Other multiplications can be replaced by shifts, which also >> saves ti

Re: RFR: 8365581: Optimize Java implementation of P256 arithmetic [v4]

2025-09-09 Thread Ben Perez
> There are several places where MontgomeryIntegerPolynomialP256.mult() can be > optimized. In particular, since modulus[2] = 0 several multiplications can be > removed. Other multiplications can be replaced by shifts, which also saves > time. Preliminary tests indicate an improvement between 5-

Re: RFR: 8365581: Optimize Java implementation of P256 arithmetic [v3]

2025-08-30 Thread Jamil Nimeh
On Thu, 21 Aug 2025 23:30:08 GMT, Ben Perez wrote: >> There are several places where MontgomeryIntegerPolynomialP256.mult() can be >> optimized. In particular, since modulus[2] = 0 several multiplications can >> be removed. Other multiplications can be replaced by shifts, which also >> saves t

Re: RFR: 8365581: Optimize Java implementation of P256 arithmetic [v3]

2025-08-22 Thread Ben Perez
On Thu, 21 Aug 2025 23:30:08 GMT, Ben Perez wrote: >> There are several places where MontgomeryIntegerPolynomialP256.mult() can be >> optimized. In particular, since modulus[2] = 0 several multiplications can >> be removed. Other multiplications can be replaced by shifts, which also >> saves t

Re: RFR: 8365581: Optimize Java implementation of P256 arithmetic

2025-08-21 Thread Chen Liang
On Fri, 15 Aug 2025 01:01:01 GMT, Ben Perez wrote: > There are several places where MontgomeryIntegerPolynomialP256.mult() can be > optimized. In particular, since modulus[2] = 0 several multiplications can be > removed. Other multiplications can be replaced by shifts, which also saves > time.

Re: RFR: 8365581: Optimize Java implementation of P256 arithmetic [v3]

2025-08-21 Thread Ben Perez
> There are several places where MontgomeryIntegerPolynomialP256.mult() can be > optimized. In particular, since modulus[2] = 0 several multiplications can be > removed. Other multiplications can be replaced by shifts, which also saves > time. Preliminary tests indicate an improvement between 5-

Re: RFR: 8365581: Optimize Java implementation of P256 arithmetic [v2]

2025-08-21 Thread Ben Perez
> There are several places where MontgomeryIntegerPolynomialP256.mult() can be > optimized. In particular, since modulus[2] = 0 several multiplications can be > removed. Other multiplications can be replaced by shifts, which also saves > time. Preliminary tests indicate an improvement between 5-

Re: RFR: 8365581: Optimize Java implementation of P256 arithmetic

2025-08-21 Thread Ben Perez
On Wed, 20 Aug 2025 15:16:47 GMT, Chen Liang wrote: > > What do you mean by "works"? And why doesn't it work for the zeroes? > > As described by [the documentation of > `@Stable`](https://cr.openjdk.org/~jrose/jvm/Stable.html), a 0 value may be > interpreted as "uncomputed" for lazy values, an

Re: RFR: 8365581: Optimize Java implementation of P256 arithmetic

2025-08-21 Thread altrisi
On Fri, 15 Aug 2025 01:01:01 GMT, Ben Perez wrote: > There are several places where MontgomeryIntegerPolynomialP256.mult() can be > optimized. In particular, since modulus[2] = 0 several multiplications can be > removed. Other multiplications can be replaced by shifts, which also saves > time.

Re: RFR: 8365581: Optimize Java implementation of P256 arithmetic

2025-08-21 Thread Chen Liang
On Wed, 20 Aug 2025 14:47:58 GMT, Ferenc Rakoczi wrote: > What do you mean by "works"? And why doesn't it work for the zeroes? As described by [the documentation of `@Stable`](https://cr.openjdk.org/~jrose/jvm/Stable.html), a 0 value may be interpreted as "uncomputed" for lazy values, and the

Re: RFR: 8365581: Optimize Java implementation of P256 arithmetic

2025-08-21 Thread Ferenc Rakoczi
On Fri, 15 Aug 2025 14:50:23 GMT, ExE Boss wrote: > > I see you are inlining some modulus values manually. You can mark the > > arrays as `@Stable` and check what performance gain you can have as a > > result, because then C2 can treat these values as constants and generate > > more optimal co

Re: RFR: 8365581: Optimize Java implementation of P256 arithmetic

2025-08-21 Thread ExE Boss
On Fri, 15 Aug 2025 13:47:04 GMT, Chen Liang wrote: > I see you are inlining some modulus values manually. You can mark the arrays > as `@Stable` and check what performance gain you can have as a result, > because then C2 can treat these values as constants and generate more optimal > computat

Re: RFR: 8365581: Optimize Java implementation of P256 arithmetic

2025-08-21 Thread Ben Perez
On Fri, 15 Aug 2025 03:47:01 GMT, Chen Liang wrote: > This particular method is already `@IntrinsicCandidate`: what special > treatment does it get from the JVM? Intrinsics currently exist only for x86, so improvements to the Java implementation will still benefit quite a few users.
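
For readers skimming the thread, the idea in the RFR summary (dropping multiplications by a zero modulus limb and replacing some multiplications with shifts) can be illustrated with a small sketch. This is not the code from the pull request: the class `ZeroLimbSketch`, its method names, and the limb values in `MODULUS` are all hypothetical and chosen only so that one limb is zero and two others have a shift-friendly shape.

```java
public final class ZeroLimbSketch {
    // Hypothetical limb values for illustration only; these are NOT the
    // constants used by MontgomeryIntegerPolynomialP256.
    private static final long[] MODULUS = {
        5L, (1L << 12) - 1, 0L, 1L << 8, 3L
    };

    /** Generic form: one multiplication per limb, including the zero limb. */
    static long[] addMultipleGeneric(long[] acc, long n) {
        long[] out = acc.clone();
        for (int i = 0; i < MODULUS.length; i++) {
            out[i] += n * MODULUS[i];
        }
        return out;
    }

    /** Specialized form: same result, with the constants folded in by hand. */
    static long[] addMultipleSpecialized(long[] acc, long n) {
        long[] out = acc.clone();
        out[0] += n * 5L;          // kept as a plain multiplication
        out[1] += (n << 12) - n;   // n * (2^12 - 1) as shift and subtract
        // out[2] is left untouched because MODULUS[2] == 0
        out[3] += n << 8;          // n * 2^8 as a single shift
        out[4] += (n << 1) + n;    // n * 3 as shift and add
        return out;
    }

    public static void main(String[] args) {
        long[] acc = {1L, 2L, 3L, 4L, 5L};
        long n = 123_456_789L;
        // Prints whether the two forms agree for this input.
        System.out.println(java.util.Arrays.equals(
            addMultipleGeneric(acc, n), addMultipleSpecialized(acc, n)));
    }
}
```

Both forms use wrapping `long` arithmetic, so they agree for every input. Whether the hand-specialized form is actually faster depends on what the JIT already does with the constants, which is the question the `@Stable` discussion in the thread raises.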