On Fri, 15 Aug 2025 01:01:01 GMT, Ben Perez <bpe...@openjdk.org> wrote:
> There are several places where MontgomeryIntegerPolynomialP256.mult() can be
> optimized. In particular, since modulus[2] = 0 several multiplications can be
> removed. Other multiplications can be replaced by shifts, which also saves
> time. Preliminary tests indicate an improvement between 5-10%.

This particular method is already `@IntrinsicCandidate`: what special treatment does it get from the JVM?

I see you are inlining some modulus values manually. You could instead mark the arrays as `@Stable` and check what performance gain you get as a result: C2 can then treat these values as constants and generate better code for the computations.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26792#issuecomment-3190524045
PR Comment: https://git.openjdk.org/jdk/pull/26792#issuecomment-3191543014
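The quoted optimization (dropping multiplications by the zero limb and replacing others with shifts) can be illustrated in isolation. This is a hypothetical sketch, not the code from the PR. It assumes 52-bit limbs and derives the limbs of the P-256 prime p = 2^256 - 2^224 + 2^192 + 2^96 - 1 by hand, which gives {2^52 - 1, 2^44 - 1, 0, 2^36, 2^48 - 2^16}. The class and method names (`P256LimbShiftSketch`, `mulLimb0` and so on) are made up for illustration, and the real `mult()` may split products differently. Each specialized method returns the low 52 bits and the high part of `k * MODULUS[i]`, and `main` checks them against a generic `Math.multiplyHigh`-based product.

```java
import java.util.Arrays;
import java.util.SplittableRandom;

/**
 * Hypothetical sketch (not the JDK implementation): assumes 52-bit limbs and
 * the P-256 prime p = 2^256 - 2^224 + 2^192 + 2^96 - 1.
 */
public class P256LimbShiftSketch {
    static final long MASK52 = (1L << 52) - 1;

    // Assumed radix-2^52 limbs of p, derived by hand from the prime above.
    static final long[] MODULUS = {
        MASK52,                  // 2^52 - 1
        (1L << 44) - 1,          // 2^44 - 1
        0L,                      // limb 2 is zero
        1L << 36,                // 2^36
        (1L << 48) - (1L << 16)  // 2^48 - 2^16
    };

    /** Generic path: splits the (up to 104-bit) product a*b into {low 52 bits, high bits}. */
    static long[] mulGeneric(long a, long b) {
        long lo = a * b;                    // low 64 bits of the product (wraps mod 2^64)
        long hi = Math.multiplyHigh(a, b);  // high 64 bits; operands are non-negative
        return new long[] { lo & MASK52, (hi << 12) | (lo >>> 52) };
    }

    /** Limb 0 = 2^52 - 1: k*2^52 - k, computed without a multiply. */
    static long[] mulLimb0(long k) {
        long r = -k;                        // remainder term, in (-2^52, 0]
        return new long[] { r & MASK52, k + (r >> 52) };  // r >> 52 is -1 if r < 0, else 0
    }

    /** Limb 1 = 2^44 - 1: k*2^44 - k = (k >>> 8)*2^52 + ((k & 0xff) << 44) - k. */
    static long[] mulLimb1(long k) {
        long r = ((k & 0xffL) << 44) - k;   // remainder term, in (-2^52, 2^52)
        return new long[] { r & MASK52, (k >>> 8) + (r >> 52) };  // borrow 1 if r < 0
    }

    /** Limb 2 = 0: the product is always zero, so the multiply can be dropped. */
    static long[] mulLimb2(long k) {
        return new long[] { 0L, 0L };
    }

    /** Limb 3 = 2^36: a pure shift. */
    static long[] mulLimb3(long k) {
        return new long[] { (k << 36) & MASK52, k >>> 16 };
    }

    /** Throws if the shift-based result differs from the generic one. */
    static void check(long[] got, long[] want) {
        if (got[0] != want[0] || got[1] != want[1]) {
            throw new AssertionError(Arrays.toString(got) + " != " + Arrays.toString(want));
        }
    }

    public static void main(String[] args) {
        SplittableRandom rnd = new SplittableRandom(42);
        for (int i = 0; i < 1_000_000; i++) {
            long k = rnd.nextLong() & MASK52;  // random 52-bit limb
            check(mulLimb0(k), mulGeneric(k, MODULUS[0]));
            check(mulLimb1(k), mulGeneric(k, MODULUS[1]));
            check(mulLimb2(k), mulGeneric(k, MODULUS[2]));
            check(mulLimb3(k), mulGeneric(k, MODULUS[3]));
        }
        System.out.println("all shift-based products match");
    }
}
```

It can be run directly with `java P256LimbShiftSketch.java` (Java 11+ source-file mode). One caveat on the `@Stable` suggestion: per the javadoc of `jdk.internal.vm.annotation.Stable`, only non-default (non-zero) array components are promoted to constants, so a zero limb such as `modulus[2]` would likely still need to be special-cased by hand even with the annotation.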