On Thu, 13 Oct 2022 18:15:30 GMT, Xue-Lei Andrew Fan <xue...@openjdk.org> wrote:

>> Hi,
>> 
>> May I have this update reviewed?  With this update, the result will be 
>> reduced if required in EC limbs operations in the JDK implementation.
>> 
>> In the current implementation, the EC limbs addition and subtraction result 
>> is not reduced before the value is returned.  This behavior is different 
>> from multiplication and square.  To get it right, a maximum addition limit 
>> is introduced for EC limbs addition and subtraction.  If the limit is 
>> reached, an exception will be thrown so as to indicate an EC implementation 
>> problem.  The exception is not recoverable from the viewpoint of an 
>> application.
>> 
>> The design is error prone as the EC implementation code must be carefully 
>> checked so that the limit cannot reach. If a reach is noticed, a reduce 
>> operation would have to be hard-coded additionally. In the FieldGen.java 
>> implementation, the limit can only be 1 or 2 as the implementation code is 
>> only able to handle 2.  But the FieldGen.java does not really check if 1 or 
>> 2 really works for the specific filed generation parameters.
>> 
>> The design impact the performance as well. Because of this limit, the 
>> maximum limb size is 28 bits for 2 max adding limit. Otherwise there are 
>> integer (long) overflow issues. For example for 256 bits curves, 10 limbs 
>> are required for 28-bit limb size; and 9 limbs for 29-bit size. By reducing 
>> 1 limbs from 10 to 9, the Signature performance could get improved by 20%.
>> 
>> In the IntegerPolynomial class description, it is said "All 
>> IntegerPolynomial implementations allow at most one addition before 
>> multiplication. Additions after that will result in an ArithmeticException." 
>> It's too strict to follow without exam the code very carefully. Indeed, the 
>> implementation does not really follow the spec, and 2 addition may be 
>> allowed.
>> 
>> It would be nice if there is no addition limit, and then we are free from 
>> these issues. It is doable by reducing the limbs if required for EC limbs 
>> operations.
>> 
>> The benchmark for ECDSA Signature is checked in order to see how much the 
>> performance could be impacted. Here are the benchmark numbers before the 
>> patch applied:
>> 
>> Benchmark        (curveName)  (messageLength)   Mode  Cnt     Score    Error 
>>  Units
>> Signatures.sign    secp256r1               64  thrpt   15  1417.507 ±  5.809 
>>  ops/s
>> Signatures.sign    secp256r1              512  thrpt   15  1412.620 ±  7.040 
>>  ops/s
>> Signatures.sign    secp256r1             2048  thrpt   15  1417.364 ±  6.643 
>>  ops/s
>> Signatures.sign    secp256r1            16384  thrpt   15  1364.720 ± 44.354 
>>  ops/s
>> Signatures.sign    secp384r1               64  thrpt   15   609.924 ±  9.858 
>>  ops/s
>> Signatures.sign    secp384r1              512  thrpt   15   610.873 ±  5.519 
>>  ops/s
>> Signatures.sign    secp384r1             2048  thrpt   15   608.573 ± 13.324 
>>  ops/s
>> Signatures.sign    secp384r1            16384  thrpt   15   597.802 ±  7.074 
>>  ops/s
>> Signatures.sign    secp521r1               64  thrpt   15   301.954 ±  6.449 
>>  ops/s
>> Signatures.sign    secp521r1              512  thrpt   15   300.774 ±  5.849 
>>  ops/s
>> Signatures.sign    secp521r1             2048  thrpt   15   295.831 ±  8.423 
>>  ops/s
>> Signatures.sign    secp521r1            16384  thrpt   15   276.258 ± 43.146 
>>  ops/s
>> Signatures.sign      Ed25519               64  thrpt   15  1171.878 ±  4.187 
>>  ops/s
>> Signatures.sign      Ed25519              512  thrpt   15  1175.175 ±  4.611 
>>  ops/s
>> Signatures.sign      Ed25519             2048  thrpt   15  1168.459 ±  5.750 
>>  ops/s
>> Signatures.sign      Ed25519            16384  thrpt   15  1086.906 ±  6.514 
>>  ops/s
>> Signatures.sign        Ed448               64  thrpt   15   326.475 ±  3.274 
>>  ops/s
>> Signatures.sign        Ed448              512  thrpt   15   328.709 ±  3.867 
>>  ops/s
>> Signatures.sign        Ed448             2048  thrpt   15   315.480 ± 15.377 
>>  ops/s
>> Signatures.sign        Ed448            16384  thrpt   15   312.388 ±  1.067 
>>  ops/s
>> 
>> 
>> and here are the numbers with this patch applied:
>> 
>> Benchmark        (curveName)  (messageLength)   Mode  Cnt     Score    Error 
>>  Units
>> Signatures.sign    secp256r1               64  thrpt   15  1377.447 ± 38.889 
>>  ops/s
>> Signatures.sign    secp256r1              512  thrpt   15  1403.149 ±  5.151 
>>  ops/s
>> Signatures.sign    secp256r1             2048  thrpt   15  1401.101 ±  6.937 
>>  ops/s
>> Signatures.sign    secp256r1            16384  thrpt   15  1381.855 ± 10.606 
>>  ops/s
>> Signatures.sign    secp384r1               64  thrpt   15   595.979 ±  1.967 
>>  ops/s
>> Signatures.sign    secp384r1              512  thrpt   15   592.419 ±  6.087 
>>  ops/s
>> Signatures.sign    secp384r1             2048  thrpt   15   581.800 ± 18.815 
>>  ops/s
>> Signatures.sign    secp384r1            16384  thrpt   15   583.224 ±  9.856 
>>  ops/s
>> Signatures.sign    secp521r1               64  thrpt   15   264.772 ± 36.883 
>>  ops/s
>> Signatures.sign    secp521r1              512  thrpt   15   302.084 ±  1.074 
>>  ops/s
>> Signatures.sign    secp521r1             2048  thrpt   15   296.619 ±  3.272 
>>  ops/s
>> Signatures.sign    secp521r1            16384  thrpt   15   290.112 ±  5.085 
>>  ops/s
>> Signatures.sign      Ed25519               64  thrpt   15  1163.293 ±  4.266 
>>  ops/s
>> Signatures.sign      Ed25519              512  thrpt   15  1157.093 ±  8.145 
>>  ops/s
>> Signatures.sign      Ed25519             2048  thrpt   15  1140.900 ±  8.660 
>>  ops/s
>> Signatures.sign      Ed25519            16384  thrpt   15  1053.233 ± 40.591 
>>  ops/s
>> Signatures.sign        Ed448               64  thrpt   15   317.509 ±  6.870 
>>  ops/s
>> Signatures.sign        Ed448              512  thrpt   15   320.975 ±  1.534 
>>  ops/s
>> Signatures.sign        Ed448             2048  thrpt   15   322.157 ±  1.914 
>>  ops/s
>> Signatures.sign        Ed448            16384  thrpt   15   303.532 ±  6.419 
>>  ops/s
>> 
>> 
>> From the numbers, it looks like the performance impact is limited.  Based on 
>> this update, the performance could get rewarded by using less limbs.  For 
>> examples if using less limbs for secp256r1/secp384r1/secp521r1/curve25519, 
>> [see PR for PR for JDK-8294248](https://github.com/openjdk/jdk/pull/10398), 
>> and run the benchmark together with this patch, I could see the performance 
>> improvement as follow (note: No update for Ed448 as using less limbs does 
>> not improve the performance for Ed448):
>> 
>> Benchmark        (curveName)  (messageLength)   Mode  Cnt     Score    Error 
>>  Units
>> Signatures.sign    secp256r1               64  thrpt   15  1769.262 ±  4.251 
>>  ops/s
>> Signatures.sign    secp256r1              512  thrpt   15  1764.754 ± 12.185 
>>  ops/s
>> Signatures.sign    secp256r1             2048  thrpt   15  1737.499 ± 43.937 
>>  ops/s
>> Signatures.sign    secp256r1            16384  thrpt   15  1733.932 ±  7.938 
>>  ops/s
>> Signatures.sign    secp384r1               64  thrpt   15   636.614 ±  6.752 
>>  ops/s
>> Signatures.sign    secp384r1              512  thrpt   15   629.449 ±  8.000 
>>  ops/s
>> Signatures.sign    secp384r1             2048  thrpt   15   645.898 ±  5.655 
>>  ops/s
>> Signatures.sign    secp384r1            16384  thrpt   15   634.530 ± 11.846 
>>  ops/s
>> Signatures.sign    secp521r1               64  thrpt   15   334.413 ±  6.710 
>>  ops/s
>> Signatures.sign    secp521r1              512  thrpt   15   335.088 ±  6.630 
>>  ops/s
>> Signatures.sign    secp521r1             2048  thrpt   15   339.729 ±  1.450 
>>  ops/s
>> Signatures.sign    secp521r1            16384  thrpt   15   327.597 ±  2.341 
>>  ops/s
>> Signatures.sign      Ed25519               64  thrpt   15  1361.028 ±  2.338 
>>  ops/s
>> Signatures.sign      Ed25519              512  thrpt   15  1359.072 ±  3.748 
>>  ops/s
>> Signatures.sign      Ed25519             2048  thrpt   15  1347.409 ±  9.172 
>>  ops/s
>> Signatures.sign      Ed25519            16384  thrpt   15  1250.794 ± 11.750 
>>  ops/s
>> Signatures.sign        Ed448               64  thrpt   15   321.312 ±  5.826 
>>  ops/s
>> Signatures.sign        Ed448              512  thrpt   15   326.592 ±  4.213 
>>  ops/s
>> Signatures.sign        Ed448             2048  thrpt   15   322.183 ±  7.005 
>>  ops/s
>> Signatures.sign        Ed448            16384  thrpt   15   299.869 ± 16.594 
>>  ops/s
>> ``` 
>> 
>> If considering this PR together with [PR for 
>> JDK-8294248](https://github.com/openjdk/jdk/pull/10398), performance 
>> improvement could be expected for EC crypto primitives.
>> 
>> Thanks,
>> Xuelei
>
> Xue-Lei Andrew Fan has updated the pull request with a new target base due to 
> a merge or a rebase. The pull request now contains six commits:
> 
>  - Merge master
>  - missed reduce
>  - reduce if needed
>  - add the key pair generation bench code
>  - remove tabs
>  - 8295010: EC limbs addition and subtraction limit

The way I see it is this: as limbs are 64-bit wide, the only place where they 
can possibly overflow (during the computations they are used for) is the 
multiplication (including multiply by int and squaring). So I would first try 
to change the mult() and square() methods only in IntegerPolynomialP256.java 
(well, in the generator that creates it). 
 (It would also be nice to add comments to the various carry/reduce methods 
that explain what exactly they want to achieve -- although this is definitely 
something that doesn't have to be in this change set.)
I also think (agree with you) that the setReduced() method can be eliminated if 
you reduce the multiplicands conditionally (if numAdds > maxAdds) before the 
multiplication/squaring and unconditionally after it (this part is done in the 
generated functions already). But that assumes you change all subclasses of 
IntegerPolynomial that way (most conveniently in the set[Product,Square]() 
methods).

-------------

PR: https://git.openjdk.org/jdk/pull/10624

Reply via email to