On Tue, 1 Nov 2022 19:10:13 GMT, Xue-Lei Andrew Fan <[email protected]> wrote:
>> Hi,
>>
>> May I have this update reviewed?
>>
>> The EC point multiplication for secp256r1 could be improved for better
>> performance by using a more efficient algorithm and pre-computation.
>> Improvements for other curves are similar, but will be addressed in separate
>> PRs.
>>
>> The basic idea is to use pre-computed tables and a safe table select in
>> order to speed up the point multiplication and keep it safe. Before this
>> patch is applied, a secp256r1 point multiplication operation needs 256
>> double operations and 78 addition operations. With this patch, that is
>> reduced to 16 double operations and 64 addition operations. **Assuming the
>> performance of double and addition operations is about the same (the double
>> operation is actually a little bit faster), the new point multiplication
>> implementation is about 4 times ((256+78)/(16+64)) as fast as the current
>> implementation.**
>>
>> ## SSLHandshake.java benchmark
>> ### Use secp256r1 as the named group
>> The following are TLS benchmarking
>> (test/micro/org/openjdk/bench/java/security/SSLHandshake.java) results
>> using secp256r1 (set the System property "jdk.tls.namedGroups" to
>> "secp256r1") as the key exchange algorithm in TLS connections:
>>
>> Benchmark with this patch:
>>
>> Benchmark                 (resume)  (tlsVersion)   Mode  Cnt     Score     Error  Units
>> SSLHandshake.doHandshake      true       TLSv1.2  thrpt   15  7976.334 ±  96.877  ops/s
>> SSLHandshake.doHandshake      true           TLS  thrpt   15   315.783 ±   1.208  ops/s
>> SSLHandshake.doHandshake     false       TLSv1.2  thrpt   15   235.646 ±   1.356  ops/s
>> SSLHandshake.doHandshake     false           TLS  thrpt   15   230.759 ±   1.789  ops/s
>>
>>
>> Benchmark before this patch is applied:
>>
>> Benchmark                 (resume)  (tlsVersion)   Mode  Cnt     Score     Error  Units
>> SSLHandshake.doHandshake      true       TLSv1.2  thrpt   15  7830.289 ±  58.584  ops/s
>> SSLHandshake.doHandshake      true           TLS  thrpt   15   253.827 ±   0.690  ops/s
>> SSLHandshake.doHandshake     false       TLSv1.2  thrpt   15   171.944 ±   0.667  ops/s
>> SSLHandshake.doHandshake     false           TLS  thrpt   15   169.383 ±   0.593  ops/s
>>
>>
>> Per the result, the session resumption performance for TLS 1.2 is about the
>> same. This is expected, as there is no EC point multiplication involved in
>> TLS 1.2 session resumption. TLS 1.3 is different, as EC key generation is
>> involved in both the initial handshake and session resumption.
>>
>> **When EC key generation is involved (TLS 1.3 connections and TLS 1.2
>> initial handshake), the performance improvement is about 35% for named group
>> secp256r1 based TLS connections.**
>>
>> ### Use default TLS named groups
>> The following are TLS benchmarking
>> (test/micro/org/openjdk/bench/java/security/SSLHandshake.java) results
>> using the default key exchange algorithms in TLS connections. In the current
>> JDK implementation, the EC keys are generated for both secp256r1 and x25519
>> curves, and x25519 is the preferred curve.
>>
>> Benchmark with this patch:
>>
>> Benchmark                 (resume)  (tlsVersion)   Mode  Cnt     Score     Error  Units
>> SSLHandshake.doHandshake      true       TLSv1.2  thrpt   15  7620.615 ±  62.459  ops/s
>> SSLHandshake.doHandshake      true           TLS  thrpt   15   746.924 ±   6.549  ops/s
>> SSLHandshake.doHandshake     false       TLSv1.2  thrpt   15   456.066 ±   1.440  ops/s
>> SSLHandshake.doHandshake     false           TLS  thrpt   15   396.189 ±   2.275  ops/s
>>
>>
>> Benchmark before this patch is applied:
>>
>> Benchmark                 (resume)  (tlsVersion)   Mode  Cnt     Score     Error  Units
>> SSLHandshake.doHandshake      true       TLSv1.2  thrpt   15  7605.177 ±  55.961  ops/s
>> SSLHandshake.doHandshake      true           TLS  thrpt   15   544.333 ±  23.431  ops/s
>> SSLHandshake.doHandshake     false       TLSv1.2  thrpt   15   335.259 ±   1.926  ops/s
>> SSLHandshake.doHandshake     false           TLS  thrpt   15   269.422 ±   1.531  ops/s
>>
>>
>> Per the result, the session resumption performance for TLS 1.2 is about the
>> same. This is expected, as there is no EC point multiplication involved in
>> TLS 1.2 session resumption. TLS 1.3 is different, as EC key generation is
>> involved in both the initial handshake and session resumption.
>>
>> **When EC key generation is involved (TLS 1.3 connections and TLS 1.2
>> initial handshake), the performance improvement is about 36%~47% for
>> default configuration based TLS connections.**
>>
>> ## KeyPairGenerators.java benchmark
>> The following are EC key pair generation benchmark
>> (test/micro/org/openjdk/bench/java/security/KeyPairGenerators.java) results.
>>
>> Benchmark with this patch:
>>
>> Benchmark                     (curveName)   Mode  Cnt     Score     Error  Units
>> KeyPairGenerators.keyPairGen    secp256r1  thrpt   15  4638.623 ±  93.320  ops/s
>> KeyPairGenerators.keyPairGen    secp384r1  thrpt   15   710.643 ±   6.404  ops/s
>> KeyPairGenerators.keyPairGen    secp521r1  thrpt   15   371.417 ±   1.302  ops/s
>> KeyPairGenerators.keyPairGen      Ed25519  thrpt   15  2355.491 ±  69.584  ops/s
>> KeyPairGenerators.keyPairGen        Ed448  thrpt   15   682.144 ±   6.671  ops/s
>>
>>
>> Benchmark before this patch is applied:
>>
>> Benchmark                     (curveName)   Mode  Cnt     Score     Error  Units
>> KeyPairGenerators.keyPairGen    secp256r1  thrpt   15  1642.312 ±  52.155  ops/s
>> KeyPairGenerators.keyPairGen    secp384r1  thrpt   15   687.669 ±  30.576  ops/s
>> KeyPairGenerators.keyPairGen    secp521r1  thrpt   15   371.854 ±   1.736  ops/s
>> KeyPairGenerators.keyPairGen      Ed25519  thrpt   15  2448.139 ±   7.788  ops/s
>> KeyPairGenerators.keyPairGen        Ed448  thrpt   15   685.195 ±   4.994  ops/s
>>
>>
>> **Per the result, key pair generation for secp256r1 is about 180%
>> faster.** Other curves should not be impacted, as the point multiplication
>> implementation for them is not updated yet.
>>
>> ## Signatures.java benchmark
>> The following are EC signature benchmark
>> (test/micro/org/openjdk/bench/java/security/Signatures.java) results.
>>
>> Benchmark with this patch:
>>
>> Benchmark        (curveName)  (messageLength)   Mode  Cnt     Score     Error  Units
>> Signatures.sign    secp256r1               64  thrpt   15  3646.764 ±  28.633  ops/s
>> Signatures.sign    secp256r1              512  thrpt   15  3595.674 ±  69.761  ops/s
>> Signatures.sign    secp256r1             2048  thrpt   15  3633.184 ±  22.236  ops/s
>> Signatures.sign    secp256r1            16384  thrpt   15  3481.104 ±  21.043  ops/s
>> Signatures.sign    secp384r1               64  thrpt   15   631.036 ±   6.154  ops/s
>> Signatures.sign    secp384r1              512  thrpt   15   633.485 ±  18.700  ops/s
>> Signatures.sign    secp384r1             2048  thrpt   15   615.955 ±   4.598  ops/s
>> Signatures.sign    secp384r1            16384  thrpt   15   627.193 ±   6.551  ops/s
>> Signatures.sign    secp521r1               64  thrpt   15   303.849 ±  19.569  ops/s
>> Signatures.sign    secp521r1              512  thrpt   15   308.676 ±   7.002  ops/s
>> Signatures.sign    secp521r1             2048  thrpt   15   317.306 ±   0.327  ops/s
>> Signatures.sign    secp521r1            16384  thrpt   15   312.579 ±   1.753  ops/s
>> Signatures.sign      Ed25519               64  thrpt   15  1192.428 ±  10.424  ops/s
>> Signatures.sign      Ed25519              512  thrpt   15  1185.397 ±   1.993  ops/s
>> Signatures.sign      Ed25519             2048  thrpt   15  1181.980 ±   2.963  ops/s
>> Signatures.sign      Ed25519            16384  thrpt   15  1105.737 ±   4.339  ops/s
>> Signatures.sign        Ed448               64  thrpt   15   332.501 ±   1.471  ops/s
>> Signatures.sign        Ed448              512  thrpt   15   324.770 ±   9.631  ops/s
>> Signatures.sign        Ed448             2048  thrpt   15   325.833 ±   1.602  ops/s
>> Signatures.sign        Ed448            16384  thrpt   15   313.231 ±   1.440  ops/s
>>
>>
>>
>> Benchmark before this patch is applied:
>>
>> Benchmark        (curveName)  (messageLength)   Mode  Cnt     Score     Error  Units
>> Signatures.sign    secp256r1               64  thrpt   15  1515.924 ±   8.489  ops/s
>> Signatures.sign    secp256r1              512  thrpt   15  1521.586 ±   7.726  ops/s
>> Signatures.sign    secp256r1             2048  thrpt   15  1499.704 ±   9.704  ops/s
>> Signatures.sign    secp256r1            16384  thrpt   15  1499.392 ±   8.832  ops/s
>> Signatures.sign    secp384r1               64  thrpt   15   634.406 ±   8.328  ops/s
>> Signatures.sign    secp384r1              512  thrpt   15   633.766 ±  11.965  ops/s
>> Signatures.sign    secp384r1             2048  thrpt   15   634.608 ±   5.526  ops/s
>> Signatures.sign    secp384r1            16384  thrpt   15   628.815 ±   3.756  ops/s
>> Signatures.sign    secp521r1               64  thrpt   15   313.390 ±   9.728  ops/s
>> Signatures.sign    secp521r1              512  thrpt   15   316.420 ±   2.817  ops/s
>> Signatures.sign    secp521r1             2048  thrpt   15   307.386 ±   3.966  ops/s
>> Signatures.sign    secp521r1            16384  thrpt   15   315.384 ±   2.243  ops/s
>> Signatures.sign      Ed25519               64  thrpt   15  1187.227 ±   5.758  ops/s
>> Signatures.sign      Ed25519              512  thrpt   15  1189.044 ±   5.370  ops/s
>> Signatures.sign      Ed25519             2048  thrpt   15  1182.833 ±  13.186  ops/s
>> Signatures.sign      Ed25519            16384  thrpt   15  1099.599 ±   3.932  ops/s
>> Signatures.sign        Ed448               64  thrpt   15   331.810 ±   3.786  ops/s
>> Signatures.sign        Ed448              512  thrpt   15   332.885 ±   4.926  ops/s
>> Signatures.sign        Ed448             2048  thrpt   15   332.941 ±   4.292  ops/s
>> Signatures.sign        Ed448            16384  thrpt   15   318.226 ±   4.141  ops/s
>>
>>
>> **Per the result, signing with secp256r1 is about 140% faster.** Other
>> curves should not be impacted, as the point multiplication implementation
>> for them is not updated yet.
>
> Xue-Lei Andrew Fan has updated the pull request incrementally with one
> additional commit since the last revision:
>
> typo correction
Thanks!
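The percentages quoted above can be reproduced from the raw scores. The following is a minimal sketch of that arithmetic only; the class and method names are made up for illustration, and the numbers are copied from the quoted tables (with-patch and before-patch scores in ops/s).

```java
final class ImprovementCheck {
    // Relative throughput gain in percent: (after / before - 1) * 100.
    static double improvementPercent(double before, double after) {
        return (after / before - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        // Scores (ops/s) taken from the quoted benchmark tables.
        System.out.printf("keyPairGen secp256r1:            %.1f%%%n",
                improvementPercent(1642.312, 4638.623));
        System.out.printf("sign secp256r1, 64 bytes:        %.1f%%%n",
                improvementPercent(1515.924, 3646.764));
        System.out.printf("TLSv1.2 full handshake (P-256):  %.1f%%%n",
                improvementPercent(171.944, 235.646));
    }
}
```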
src/jdk.crypto.ec/share/classes/sun/security/ec/ECOperations.java line 425:
> 423: // point. The multiplier point is specified by the implementation of
> 424: // this interface, which could be a general EC point or EC generator
> 425: // point.
Suggestion for the comment:
// Multiply the ECPoint (that is specified in the implementation) by a scalar and return the result as a
// ProjectivePoint.Mutable point.
// The point to be multiplied can be a general EC point or the generator of a named EC group.
// The scalar multiplier is an integer in little endian byte array representation.
src/jdk.crypto.ec/share/classes/sun/security/ec/ECOperations.java line 539:
> 537:
> 538: final class Secp256R1GeneratorMultiplier implements PointMultiplier {
> 539: private static final ECPoint point =
I would rename "point" to "generator"
src/jdk.crypto.ec/share/classes/sun/security/ec/ECOperations.java line 579:
> 577: }
> 578:
> 579: private static int posit(byte[] k, int i) {
I would rename "posit" to "bit"
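To illustrate why "bit" reads more naturally, here is a minimal sketch of such a helper. It assumes the method returns bit `i` of the scalar `k`, with `k` stored as a little-endian byte array (as the comment suggestion above states for the scalar); the body is inferred from the signature and is not quoted from the patch. The class name is hypothetical.

```java
final class ScalarBits {
    // Returns bit i (0 = least significant) of a scalar held in a
    // little-endian byte array, i.e. byte 0 carries bits 0..7.
    // Assumed behaviour of the helper under review, not the patch's code.
    static int bit(byte[] k, int i) {
        return (k[i >> 3] >> (i & 0x07)) & 0x01;
    }

    public static void main(String[] args) {
        byte[] k = {0x05, (byte) 0x80};   // the value 0x8005 in little endian
        for (int i = 15; i >= 0; i--) {
            System.out.print(bit(k, i));  // prints the bits, most significant first
        }
        System.out.println();
    }
}
```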
src/jdk.crypto.ec/share/classes/sun/security/ec/ECOperations.java line 590:
> 588: // 16 elements. For the 1st dimension, each element in it is
> 589: // a pre-computed generator point multiplication value.
> 590: //
I suggest rewording this:
// Pre-computed table to speed up the point multiplication.
//
// This is a 4x16 array of ProjectivePoint.Immutable elements.
// The first row contains the following multiples of the generator.
//
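The PR description pairs this table with a "safe table select". As a rough sketch of what a lookup that does not branch on, or index by, the secret value could look like: every entry of a row is visited and a mask keeps only the wanted one. Each "point" is represented here as a `long[]` of limbs, which is purely a stand-in; the names are hypothetical, and whether the JIT preserves constant-time behaviour is outside what this sketch can show.

```java
final class ConstantTimeSelectSketch {
    // Returns a copy of row[index] while touching every entry of the row,
    // so the memory access pattern does not depend on index.
    // Assumes 0 <= index < row.length and row.length <= 2^31 - 1.
    static long[] select(long[][] row, int index) {
        long[] out = new long[row[0].length];
        for (int m = 0; m < row.length; m++) {
            int x = m ^ index;                    // 0 only when m == index
            long mask = (long) ((x - 1) >> 31);   // all ones if x == 0, else 0
            for (int w = 0; w < out.length; w++) {
                out[w] |= row[m][w] & mask;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        long[][] row = new long[16][4];
        for (int m = 0; m < 16; m++) {
            for (int w = 0; w < 4; w++) {
                row[m][w] = 1000L * m + w;        // dummy limb values
            }
        }
        System.out.println(java.util.Arrays.toString(select(row, 5)));
    }
}
```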
src/jdk.crypto.ec/share/classes/sun/security/ec/ECOperations.java line 612:
> 610: // For the following dimensions, each element is multiplied
> 611: // by 2^16 of the corresponding element value in the previous
> 612: // dimension.
Suggestion for the last part of the comment:
// For the other 3 rows: points[i][j] = (points[i-1][j] scalar multiplied by 2^16)
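To make the table layout concrete, here is a small self-contained sketch of how a 4x16 table with the 2^16 row relation can drive a 256-bit scalar multiplication in 16 doublings and 64 additions. It is not the patch's code: EC points are replaced by integers modulo an arbitrary modulus `N` under addition, so the result can be checked against plain `k * G mod N`. The contents of the first row (sums of `G`, `2^64 G`, `2^128 G`, `2^192 G`) and all names are assumptions made for the illustration.

```java
import java.math.BigInteger;

final class CombMultiplySketch {
    // Stand-in group: integers mod N under addition, with "generator" G.
    static final BigInteger N =
            BigInteger.ONE.shiftLeft(255).subtract(BigInteger.valueOf(19));
    static final BigInteger G = BigInteger.valueOf(7);
    static int doubles = 0;
    static int additions = 0;

    // Bit i of a little-endian scalar.
    static int bit(byte[] k, int i) {
        return (k[i >> 3] >> (i & 0x07)) & 0x01;
    }

    // t[0][m] = sum of 2^(64*b) * G over the set bits b of m (assumed layout);
    // t[i][j] = t[i-1][j] * 2^16 for the other three rows.
    static BigInteger[][] buildTable() {
        BigInteger[][] t = new BigInteger[4][16];
        for (int m = 0; m < 16; m++) {
            BigInteger v = BigInteger.ZERO;
            for (int b = 0; b < 4; b++) {
                if (((m >> b) & 1) == 1) {
                    v = v.add(G.shiftLeft(64 * b));
                }
            }
            t[0][m] = v.mod(N);
        }
        for (int i = 1; i < 4; i++) {
            for (int j = 0; j < 16; j++) {
                t[i][j] = t[i - 1][j].shiftLeft(16).mod(N);
            }
        }
        return t;
    }

    // k is a 32-byte little-endian scalar.
    static BigInteger multiply(byte[] k, BigInteger[][] t) {
        BigInteger acc = BigInteger.ZERO;            // neutral element
        for (int i = 15; i >= 0; i--) {
            acc = acc.shiftLeft(1).mod(N);           // "double" (first one acts on zero)
            doubles++;
            for (int j = 0; j < 4; j++) {
                int pos = i + 16 * j;
                int index = (bit(k, pos + 192) << 3) | (bit(k, pos + 128) << 2)
                          | (bit(k, pos + 64) << 1) | bit(k, pos);
                acc = acc.add(t[j][index]).mod(N);   // "add" a table entry
                additions++;
            }
        }
        return acc;
    }

    // Reference value: k * G mod N computed directly.
    static BigInteger reference(byte[] k) {
        byte[] be = new byte[k.length];
        for (int i = 0; i < k.length; i++) {
            be[i] = k[k.length - 1 - i];
        }
        return new BigInteger(1, be).multiply(G).mod(N);
    }

    public static void main(String[] args) {
        byte[] k = new byte[32];
        new java.util.Random(1).nextBytes(k);
        BigInteger r = multiply(k, buildTable());
        System.out.println("matches reference: " + r.equals(reference(k)));
        System.out.println("doubles=" + doubles + " additions=" + additions);
    }
}
```

In a real implementation the table lookup would go through a constant-time select rather than direct indexing; direct indexing is used here only to keep the arithmetic visible.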
-------------
PR: https://git.openjdk.org/jdk/pull/10893