On Thu, 13 Oct 2022 18:15:30 GMT, Xue-Lei Andrew Fan <xue...@openjdk.org> wrote:
>> Hi, >> >> May I have this update reviewed? With this update, the result will be >> reduced if required in EC limbs operations in the JDK implementation. >> >> In the current implementation, the EC limbs addition and subtraction result >> is not reduced before the value is returned. This behavior is different >> from multiplication and square. To get it right, a maximum addition limit >> is introduced for EC limbs addition and subtraction. If the limit is >> reached, an exception will be thrown so as to indicate an EC implementation >> problem. The exception is not recoverable from the viewpoint of an >> application. >> >> The design is error prone as the EC implementation code must be carefully >> checked so that the limit cannot reach. If a reach is noticed, a reduce >> operation would have to be hard-coded additionally. In the FieldGen.java >> implementation, the limit can only be 1 or 2 as the implementation code is >> only able to handle 2. But the FieldGen.java does not really check if 1 or >> 2 really works for the specific filed generation parameters. >> >> The design impact the performance as well. Because of this limit, the >> maximum limb size is 28 bits for 2 max adding limit. Otherwise there are >> integer (long) overflow issues. For example for 256 bits curves, 10 limbs >> are required for 28-bit limb size; and 9 limbs for 29-bit size. By reducing >> 1 limbs from 10 to 9, the Signature performance could get improved by 20%. >> >> In the IntegerPolynomial class description, it is said "All >> IntegerPolynomial implementations allow at most one addition before >> multiplication. Additions after that will result in an ArithmeticException." >> It's too strict to follow without exam the code very carefully. Indeed, the >> implementation does not really follow the spec, and 2 addition may be >> allowed. >> >> It would be nice if there is no addition limit, and then we are free from >> these issues. It is doable by reducing the limbs if required for EC limbs >> operations. >> >> The benchmark for ECDSA Signature is checked in order to see how much the >> performance could be impacted. Here are the benchmark numbers before the >> patch applied: >> >> Benchmark (curveName) (messageLength) Mode Cnt Score Error >> Units >> Signatures.sign secp256r1 64 thrpt 15 1417.507 ± 5.809 >> ops/s >> Signatures.sign secp256r1 512 thrpt 15 1412.620 ± 7.040 >> ops/s >> Signatures.sign secp256r1 2048 thrpt 15 1417.364 ± 6.643 >> ops/s >> Signatures.sign secp256r1 16384 thrpt 15 1364.720 ± 44.354 >> ops/s >> Signatures.sign secp384r1 64 thrpt 15 609.924 ± 9.858 >> ops/s >> Signatures.sign secp384r1 512 thrpt 15 610.873 ± 5.519 >> ops/s >> Signatures.sign secp384r1 2048 thrpt 15 608.573 ± 13.324 >> ops/s >> Signatures.sign secp384r1 16384 thrpt 15 597.802 ± 7.074 >> ops/s >> Signatures.sign secp521r1 64 thrpt 15 301.954 ± 6.449 >> ops/s >> Signatures.sign secp521r1 512 thrpt 15 300.774 ± 5.849 >> ops/s >> Signatures.sign secp521r1 2048 thrpt 15 295.831 ± 8.423 >> ops/s >> Signatures.sign secp521r1 16384 thrpt 15 276.258 ± 43.146 >> ops/s >> Signatures.sign Ed25519 64 thrpt 15 1171.878 ± 4.187 >> ops/s >> Signatures.sign Ed25519 512 thrpt 15 1175.175 ± 4.611 >> ops/s >> Signatures.sign Ed25519 2048 thrpt 15 1168.459 ± 5.750 >> ops/s >> Signatures.sign Ed25519 16384 thrpt 15 1086.906 ± 6.514 >> ops/s >> Signatures.sign Ed448 64 thrpt 15 326.475 ± 3.274 >> ops/s >> Signatures.sign Ed448 512 thrpt 15 328.709 ± 3.867 >> ops/s >> Signatures.sign Ed448 2048 thrpt 15 315.480 ± 15.377 >> ops/s >> Signatures.sign Ed448 16384 thrpt 15 312.388 ± 1.067 >> ops/s >> >> >> and here are the numbers with this patch applied: >> >> Benchmark (curveName) (messageLength) Mode Cnt Score Error >> Units >> Signatures.sign secp256r1 64 thrpt 15 1377.447 ± 38.889 >> ops/s >> Signatures.sign secp256r1 512 thrpt 15 1403.149 ± 5.151 >> ops/s >> Signatures.sign secp256r1 2048 thrpt 15 1401.101 ± 6.937 >> ops/s >> Signatures.sign secp256r1 16384 thrpt 15 1381.855 ± 10.606 >> ops/s >> Signatures.sign secp384r1 64 thrpt 15 595.979 ± 1.967 >> ops/s >> Signatures.sign secp384r1 512 thrpt 15 592.419 ± 6.087 >> ops/s >> Signatures.sign secp384r1 2048 thrpt 15 581.800 ± 18.815 >> ops/s >> Signatures.sign secp384r1 16384 thrpt 15 583.224 ± 9.856 >> ops/s >> Signatures.sign secp521r1 64 thrpt 15 264.772 ± 36.883 >> ops/s >> Signatures.sign secp521r1 512 thrpt 15 302.084 ± 1.074 >> ops/s >> Signatures.sign secp521r1 2048 thrpt 15 296.619 ± 3.272 >> ops/s >> Signatures.sign secp521r1 16384 thrpt 15 290.112 ± 5.085 >> ops/s >> Signatures.sign Ed25519 64 thrpt 15 1163.293 ± 4.266 >> ops/s >> Signatures.sign Ed25519 512 thrpt 15 1157.093 ± 8.145 >> ops/s >> Signatures.sign Ed25519 2048 thrpt 15 1140.900 ± 8.660 >> ops/s >> Signatures.sign Ed25519 16384 thrpt 15 1053.233 ± 40.591 >> ops/s >> Signatures.sign Ed448 64 thrpt 15 317.509 ± 6.870 >> ops/s >> Signatures.sign Ed448 512 thrpt 15 320.975 ± 1.534 >> ops/s >> Signatures.sign Ed448 2048 thrpt 15 322.157 ± 1.914 >> ops/s >> Signatures.sign Ed448 16384 thrpt 15 303.532 ± 6.419 >> ops/s >> >> >> From the numbers, it looks like the performance impact is limited. Based on >> this update, the performance could get rewarded by using less limbs. For >> examples if using less limbs for secp256r1/secp384r1/secp521r1/curve25519, >> [see PR for PR for JDK-8294248](https://github.com/openjdk/jdk/pull/10398), >> and run the benchmark together with this patch, I could see the performance >> improvement as follow (note: No update for Ed448 as using less limbs does >> not improve the performance for Ed448): >> >> Benchmark (curveName) (messageLength) Mode Cnt Score Error >> Units >> Signatures.sign secp256r1 64 thrpt 15 1769.262 ± 4.251 >> ops/s >> Signatures.sign secp256r1 512 thrpt 15 1764.754 ± 12.185 >> ops/s >> Signatures.sign secp256r1 2048 thrpt 15 1737.499 ± 43.937 >> ops/s >> Signatures.sign secp256r1 16384 thrpt 15 1733.932 ± 7.938 >> ops/s >> Signatures.sign secp384r1 64 thrpt 15 636.614 ± 6.752 >> ops/s >> Signatures.sign secp384r1 512 thrpt 15 629.449 ± 8.000 >> ops/s >> Signatures.sign secp384r1 2048 thrpt 15 645.898 ± 5.655 >> ops/s >> Signatures.sign secp384r1 16384 thrpt 15 634.530 ± 11.846 >> ops/s >> Signatures.sign secp521r1 64 thrpt 15 334.413 ± 6.710 >> ops/s >> Signatures.sign secp521r1 512 thrpt 15 335.088 ± 6.630 >> ops/s >> Signatures.sign secp521r1 2048 thrpt 15 339.729 ± 1.450 >> ops/s >> Signatures.sign secp521r1 16384 thrpt 15 327.597 ± 2.341 >> ops/s >> Signatures.sign Ed25519 64 thrpt 15 1361.028 ± 2.338 >> ops/s >> Signatures.sign Ed25519 512 thrpt 15 1359.072 ± 3.748 >> ops/s >> Signatures.sign Ed25519 2048 thrpt 15 1347.409 ± 9.172 >> ops/s >> Signatures.sign Ed25519 16384 thrpt 15 1250.794 ± 11.750 >> ops/s >> Signatures.sign Ed448 64 thrpt 15 321.312 ± 5.826 >> ops/s >> Signatures.sign Ed448 512 thrpt 15 326.592 ± 4.213 >> ops/s >> Signatures.sign Ed448 2048 thrpt 15 322.183 ± 7.005 >> ops/s >> Signatures.sign Ed448 16384 thrpt 15 299.869 ± 16.594 >> ops/s >> ``` >> >> If considering this PR together with [PR for >> JDK-8294248](https://github.com/openjdk/jdk/pull/10398), performance >> improvement could be expected for EC crypto primitives. >> >> Thanks, >> Xuelei > > Xue-Lei Andrew Fan has updated the pull request with a new target base due to > a merge or a rebase. The pull request now contains six commits: > > - Merge master > - missed reduce > - reduce if needed > - add the key pair generation bench code > - remove tabs > - 8295010: EC limbs addition and subtraction limit The way I see it is this: as limbs are 64-bit wide, the only place where they can possibly overflow (during the computations they are used for) is the multiplication (including multiply by int and squaring). So I would first try to change the mult() and square() methods only in IntegerPolynomialP256.java (well, in the generator that creates it). (It would also be nice to add comments to the various carry/reduce methods that explain what exactly they want to achieve -- although this is definitely something that doesn't have to be in this change set.) I also think (agree with you) that the setReduced() method can be eliminated if you reduce the multiplicands conditionally (if numAdds > maxAdds) before the multiplication/squaring and unconditionally after it (this part is done in the generated functions already). But that assumes you change all subclasses of IntegerPolynomial that way (most conveniently in the set[Product,Square]() methods). ------------- PR: https://git.openjdk.org/jdk/pull/10624