Re: [PATCH v10 3/9] crypto: Add math to support fast NIST P384
On 3/6/21 7:03 PM, Vitaly Chikunov wrote:
> Stefan,
>
> On Sat, Mar 06, 2021 at 06:29:18PM -0500, Stefan Berger wrote:
> > On 3/6/21 2:25 PM, Vitaly Chikunov wrote:
> > > On Thu, Mar 04, 2021 at 07:51:57PM -0500, Stefan Berger wrote:
> > > > From: Saulo Alessandre
> > > >
> > > > * crypto/ecc.c
> > > >  - add vli_mmod_fast_384
> > > >  - change some routines to pass ecc_curve forward until vli_mmod_fast
> > > >
> > > > * crypto/ecc.h
> > > >  - add ECC_CURVE_NIST_P384_DIGITS
> > > >  - change ECC_MAX_DIGITS to P384 size
> > > >
> > > > Signed-off-by: Saulo Alessandre
> > > > Tested-by: Stefan Berger
> > > > ---
> > > >  crypto/ecc.c | 266 +--
> > > >  crypto/ecc.h |   3 +-
> > > >  2 files changed, 194 insertions(+), 75 deletions(-)
> > > >
> > > > diff --git a/crypto/ecc.c b/crypto/ecc.c
> > > > index f6cef5a7942d..c125576cda6b 100644
> > > > --- a/crypto/ecc.c
> > > > +++ b/crypto/ecc.c
> > > > @@ -778,18 +778,133 @@ static void vli_mmod_fast_256(u64 *result, const u64 *product,
> > > > ...
> > > >  /* Computes result = product % curve_prime for different curve_primes.
> > > >   *
> > > >   * Note that curve_primes are distinguished just by heuristic check and
> > > >   * not by complete conformance check.
> > > >   */
> > > >  static bool vli_mmod_fast(u64 *result, u64 *product,
> > > > -			  const u64 *curve_prime, unsigned int ndigits)
> > > > +			  const struct ecc_curve *curve)
> > > >  {
> > > >  	u64 tmp[2 * ECC_MAX_DIGITS];
> > > > +	const u64 *curve_prime = curve->p;
> > > > +	const unsigned int ndigits = curve->g.ndigits;
> > > >
> > > > -	/* Currently, both NIST primes have -1 in lowest qword. */
> > > > -	if (curve_prime[0] != -1ull) {
> > > > +	/* Currently, all NIST have name nist_.* */
> > > > +	if (strncmp(curve->name, "nist_", 5) != 0) {
> > > I am not sure, but maybe this strncmp should not be optimized somehow,
> > > since vli_mmod_fast could be called quite frequently. Perhaps by integer
> > > algo id or even callback?
> > Should be optimized or should not be? You seem to say both.
> Excuse me for the typo. I meant "should be optimized". I think, maybe
> it's time to add algo selector id (for the case statement, for example
> instead of `switch (ndigits)') or just callback for a low level mmod
> function.
>
> If you think this would not impact performance then nevermind.
I think it would only be a few cycles. Of course we could introduce a
flag to indicate nist functions (rather than using strncmp on the name)
or work with the callbacks (only for the nist functions?) as you
mentioned, but maybe that's something we could do after? Either way we
would have to pass the ecc_curve pointer all the way into vli_mmod_fast.
So this change here is preparing for this as well.
Stefan
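[Editorial aside, not part of the thread: the flag idea Stefan mentions could look like the sketch below. The struct, field, and enum names are hypothetical, not the kernel's actual struct ecc_curve; the point is only that a one-time integer family id set at curve-definition time replaces a per-call strncmp while giving the same answer.]

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sketch: store an integer family id in the curve descriptor
 * once, instead of running strncmp(curve->name, "nist_", 5) on every
 * vli_mmod_fast() call.  None of these names exist in the kernel. */
enum curve_family { CURVE_FAMILY_NIST, CURVE_FAMILY_OTHER };

struct demo_curve {
	const char *name;
	enum curve_family family;	/* set once when the curve is defined */
};

/* Current approach in the patch: name prefix comparison. */
static int is_nist_by_name(const struct demo_curve *c)
{
	return strncmp(c->name, "nist_", 5) == 0;
}

/* Proposed cheaper check: a single integer compare. */
static int is_nist_by_id(const struct demo_curve *c)
{
	return c->family == CURVE_FAMILY_NIST;
}
```

For every curve the two predicates must agree; the only cost difference is the per-call string walk versus an integer compare.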
Re: [PATCH v10 3/9] crypto: Add math to support fast NIST P384
Stefan,
On Sat, Mar 06, 2021 at 06:29:18PM -0500, Stefan Berger wrote:
> On 3/6/21 2:25 PM, Vitaly Chikunov wrote:
> >
> > On Thu, Mar 04, 2021 at 07:51:57PM -0500, Stefan Berger wrote:
> > > From: Saulo Alessandre
> > >
> > > * crypto/ecc.c
> > >  - add vli_mmod_fast_384
> > >  - change some routines to pass ecc_curve forward until vli_mmod_fast
> > >
> > > * crypto/ecc.h
> > >  - add ECC_CURVE_NIST_P384_DIGITS
> > >  - change ECC_MAX_DIGITS to P384 size
> > >
> > > Signed-off-by: Saulo Alessandre
> > > Tested-by: Stefan Berger
> > > ---
> > > crypto/ecc.c | 266 +--
> > > crypto/ecc.h | 3 +-
> > > 2 files changed, 194 insertions(+), 75 deletions(-)
> > >
> > > diff --git a/crypto/ecc.c b/crypto/ecc.c
> > > index f6cef5a7942d..c125576cda6b 100644
> > > --- a/crypto/ecc.c
> > > +++ b/crypto/ecc.c
> > > @@ -778,18 +778,133 @@ static void vli_mmod_fast_256(u64 *result, const u64 *product,
> > > ...
> > > /* Computes result = product % curve_prime for different curve_primes.
> > >  *
> > >  * Note that curve_primes are distinguished just by heuristic check and
> > >  * not by complete conformance check.
> > >  */
> > > static bool vli_mmod_fast(u64 *result, u64 *product,
> > > - const u64 *curve_prime, unsigned int ndigits)
> > > + const struct ecc_curve *curve)
> > > {
> > > u64 tmp[2 * ECC_MAX_DIGITS];
> > > + const u64 *curve_prime = curve->p;
> > > + const unsigned int ndigits = curve->g.ndigits;
> > > - /* Currently, both NIST primes have -1 in lowest qword. */
> > > - if (curve_prime[0] != -1ull) {
> > > + /* Currently, all NIST have name nist_.* */
> > > + if (strncmp(curve->name, "nist_", 5) != 0) {
> > I am not sure, but maybe this strncmp should not be optimized somehow,
> > since vli_mmod_fast could be called quite frequently. Perhaps by integer
> > algo id or even callback?
>
> Should be optimized or should not be? You seem to say both.
Excuse me for the typo. I meant "should be optimized". I think, maybe
it's time to add algo selector id (for the case statement, for example
instead of `switch (ndigits)') or just callback for a low level mmod
function.
If you think this would not impact performance then nevermind.
Thanks,
>
> The code here is shared with ecrdsa. The comparison won't go beyond a
> single letter considering the naming of the curves defined here:
>
> "cp256a":
> https://elixir.bootlin.com/linux/v5.11.3/source/crypto/ecrdsa_defs.h#L49
>
> "cp256b":
> https://elixir.bootlin.com/linux/v5.11.3/source/crypto/ecrdsa_defs.h#L82
>
> "cp256c":
> https://elixir.bootlin.com/linux/v5.11.3/source/crypto/ecrdsa_defs.h#L119
>
> "tc512a":
> https://elixir.bootlin.com/linux/v5.11.3/source/crypto/ecrdsa_defs.h#L168
>
> and here:
>
> "nist_192":
> https://elixir.bootlin.com/linux/v5.11.3/source/crypto/ecc_curve_defs.h#L18
>
> "nist_256":
> https://elixir.bootlin.com/linux/v5.11.3/source/crypto/ecc_curve_defs.h#L45
>
>
> All the ecrdsa curves were previously evaluating 'curve_prime[0] != -1ull',
> so it doesn't change anything.
>
> Stefan
>
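[Editorial aside, not part of the thread: Stefan's argument is easy to check directly. None of the ecrdsa curve names listed above shares even a first byte with "nist_", so strncmp() returns after comparing one character. A stand-alone illustration in plain C, outside the kernel:]

```c
#include <assert.h>
#include <string.h>

/* Curve names from ecrdsa_defs.h and ecc_curve_defs.h as listed above. */
static const char *ecrdsa_names[] = { "cp256a", "cp256b", "cp256c", "tc512a" };

/* strncmp() stops at the first differing byte; for every ecrdsa name that
 * is already byte 0 ('c' or 't' versus 'n'). */
static int first_byte_differs_from_nist(const char *name)
{
	return name[0] != 'n';
}
```

So on the ecrdsa path the prefix check costs roughly one byte comparison, which supports the "only a few cycles" estimate.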
Re: [PATCH v10 3/9] crypto: Add math to support fast NIST P384
On 3/6/21 2:25 PM, Vitaly Chikunov wrote:
> Stefan,
>
> On Thu, Mar 04, 2021 at 07:51:57PM -0500, Stefan Berger wrote:
> > From: Saulo Alessandre
> >
> > * crypto/ecc.c
> >  - add vli_mmod_fast_384
> >  - change some routines to pass ecc_curve forward until vli_mmod_fast
> >
> > * crypto/ecc.h
> >  - add ECC_CURVE_NIST_P384_DIGITS
> >  - change ECC_MAX_DIGITS to P384 size
> >
> > Signed-off-by: Saulo Alessandre
> > Tested-by: Stefan Berger
> > ---
> >  crypto/ecc.c | 266 +--
> >  crypto/ecc.h |   3 +-
> >  2 files changed, 194 insertions(+), 75 deletions(-)
> >
> > diff --git a/crypto/ecc.c b/crypto/ecc.c
> > index f6cef5a7942d..c125576cda6b 100644
> > --- a/crypto/ecc.c
> > +++ b/crypto/ecc.c
> > @@ -778,18 +778,133 @@ static void vli_mmod_fast_256(u64 *result, const u64 *product,
> > ...
> >  /* Computes result = product % curve_prime for different curve_primes.
> >   *
> >   * Note that curve_primes are distinguished just by heuristic check and
> >   * not by complete conformance check.
> >   */
> >  static bool vli_mmod_fast(u64 *result, u64 *product,
> > -			  const u64 *curve_prime, unsigned int ndigits)
> > +			  const struct ecc_curve *curve)
> >  {
> >  	u64 tmp[2 * ECC_MAX_DIGITS];
> > +	const u64 *curve_prime = curve->p;
> > +	const unsigned int ndigits = curve->g.ndigits;
> >
> > -	/* Currently, both NIST primes have -1 in lowest qword. */
> > -	if (curve_prime[0] != -1ull) {
> > +	/* Currently, all NIST have name nist_.* */
> > +	if (strncmp(curve->name, "nist_", 5) != 0) {
> I am not sure, but maybe this strncmp should not be optimized somehow,
> since vli_mmod_fast could be called quite frequently. Perhaps by integer
> algo id or even callback?
Should be optimized or should not be? You seem to say both.
The code here is shared with ecrdsa. The comparison won't go beyond
a single letter considering the naming of the curves defined here:
"cp256a":
https://elixir.bootlin.com/linux/v5.11.3/source/crypto/ecrdsa_defs.h#L49
"cp256b":
https://elixir.bootlin.com/linux/v5.11.3/source/crypto/ecrdsa_defs.h#L82
"cp256c":
https://elixir.bootlin.com/linux/v5.11.3/source/crypto/ecrdsa_defs.h#L119
"tc512a":
https://elixir.bootlin.com/linux/v5.11.3/source/crypto/ecrdsa_defs.h#L168
and here:
"nist_192":
https://elixir.bootlin.com/linux/v5.11.3/source/crypto/ecc_curve_defs.h#L18
"nist_256":
https://elixir.bootlin.com/linux/v5.11.3/source/crypto/ecc_curve_defs.h#L45
All the ecrdsa curves were previously evaluating 'curve_prime[0] !=
-1ull', so it doesn't change anything.
Stefan
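[Editorial aside, not part of the thread, but verifiable from the curve definitions: the old `curve_prime[0] != -1ull` heuristic genuinely had to go, because the lowest little-endian 64-bit digit of the P-384 prime (p = 2^384 - 2^128 - 2^96 + 2^32 - 1) is 0x00000000ffffffff, not all-ones like P-192's and P-256's. A stand-alone check:]

```c
#include <assert.h>
#include <stdint.h>

/* Lowest 64-bit digit of each NIST prime, little-endian, as in the kernel's
 * curve definitions.  For P-192 (2^192 - 2^64 - 1) and P-256, all powers of
 * two in the prime vanish mod 2^64, leaving -1; for P-384 the 2^32 term
 * survives, leaving 2^32 - 1. */
static const uint64_t p192_low = 0xFFFFFFFFFFFFFFFFull;
static const uint64_t p256_low = 0xFFFFFFFFFFFFFFFFull;
static const uint64_t p384_low = 0x00000000FFFFFFFFull;

/* The removed heuristic, inverted: "this is a NIST prime". */
static int old_heuristic_says_nist(uint64_t prime_low)
{
	return prime_low == (uint64_t)-1;
}
```

The heuristic classifies P-192 and P-256 correctly but would send P-384 down the non-NIST path, which is why the replacement check keys on something other than the lowest digit.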
Re: [PATCH v10 3/9] crypto: Add math to support fast NIST P384
Stefan,
On Thu, Mar 04, 2021 at 07:51:57PM -0500, Stefan Berger wrote:
> From: Saulo Alessandre
>
> * crypto/ecc.c
> - add vli_mmod_fast_384
> - change some routines to pass ecc_curve forward until vli_mmod_fast
>
> * crypto/ecc.h
> - add ECC_CURVE_NIST_P384_DIGITS
> - change ECC_MAX_DIGITS to P384 size
>
> Signed-off-by: Saulo Alessandre
> Tested-by: Stefan Berger
> ---
> crypto/ecc.c | 266 +--
> crypto/ecc.h | 3 +-
> 2 files changed, 194 insertions(+), 75 deletions(-)
>
> diff --git a/crypto/ecc.c b/crypto/ecc.c
> index f6cef5a7942d..c125576cda6b 100644
> --- a/crypto/ecc.c
> +++ b/crypto/ecc.c
> @@ -778,18 +778,133 @@ static void vli_mmod_fast_256(u64 *result, const u64 *product,
> ...
> /* Computes result = product % curve_prime for different curve_primes.
> *
> * Note that curve_primes are distinguished just by heuristic check and
> * not by complete conformance check.
> */
> static bool vli_mmod_fast(u64 *result, u64 *product,
> - const u64 *curve_prime, unsigned int ndigits)
> + const struct ecc_curve *curve)
> {
> u64 tmp[2 * ECC_MAX_DIGITS];
> + const u64 *curve_prime = curve->p;
> + const unsigned int ndigits = curve->g.ndigits;
>
> - /* Currently, both NIST primes have -1 in lowest qword. */
> - if (curve_prime[0] != -1ull) {
> + /* Currently, all NIST have name nist_.* */
> + if (strncmp(curve->name, "nist_", 5) != 0) {
I am not sure, but maybe this strncmp should not be optimized somehow,
since vli_mmod_fast could be called quite frequently. Perhaps by integer
algo id or even callback?
Thanks,
> /* Try to handle Pseudo-Marsenne primes. */
> if (curve_prime[ndigits - 1] == -1ull) {
> vli_mmod_special(result, product, curve_prime,
> @@ -812,6 +927,9 @@ static bool vli_mmod_fast(u64 *result, u64 *product,
> case 4:
> vli_mmod_fast_256(result, product, curve_prime, tmp);
> break;
> + case 6:
> + vli_mmod_fast_384(result, product, curve_prime, tmp);
> + break;
> default:
> pr_err_ratelimited("ecc: unsupported digits size!\n");
> return false;
> @@ -835,22 +953,22 @@ EXPORT_SYMBOL(vli_mod_mult_slow);
>
> /* Computes result = (left * right) % curve_prime. */
> static void vli_mod_mult_fast(u64 *result, const u64 *left, const u64 *right,
> - const u64 *curve_prime, unsigned int ndigits)
> + const struct ecc_curve *curve)
> {
> u64 product[2 * ECC_MAX_DIGITS];
>
> - vli_mult(product, left, right, ndigits);
> - vli_mmod_fast(result, product, curve_prime, ndigits);
> + vli_mult(product, left, right, curve->g.ndigits);
> + vli_mmod_fast(result, product, curve);
> }
>
> /* Computes result = left^2 % curve_prime. */
> static void vli_mod_square_fast(u64 *result, const u64 *left,
> - const u64 *curve_prime, unsigned int ndigits)
> + const struct ecc_curve *curve)
> {
> u64 product[2 * ECC_MAX_DIGITS];
>
> - vli_square(product, left, ndigits);
> - vli_mmod_fast(result, product, curve_prime, ndigits);
> + vli_square(product, left, curve->g.ndigits);
> + vli_mmod_fast(result, product, curve);
> }
>
> #define EVEN(vli) (!(vli[0] & 1))
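[Editorial aside, not part of the thread: both helpers in the hunk above follow the same shape, form the double-width product first, then reduce it modulo the curve prime. The sketch below shows that shape on a single 64-bit "digit", with `unsigned __int128` (a GCC/Clang extension) standing in for the kernel's multi-digit vli_mult()/vli_mmod_fast(); it is an analogy, not the kernel code.]

```c
#include <assert.h>
#include <stdint.h>

/* Analogue of vli_mod_mult_fast(): compute the full-width product, then
 * reduce modulo the prime.  The kernel does this on u64 arrays of
 * curve->g.ndigits digits; here one digit suffices to show the structure. */
static uint64_t mod_mult(uint64_t left, uint64_t right, uint64_t prime)
{
	/* "vli_mult": double-width product, no information lost */
	unsigned __int128 product = (unsigned __int128)left * right;

	/* "vli_mmod_fast": reduction back to single width */
	return (uint64_t)(product % prime);
}
```

Squaring (vli_mod_square_fast) is the same pattern with left == right; the refactor in the patch only changes how the reduction step learns which prime and digit count to use.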
Re: [PATCH v10 3/9] crypto: Add math to support fast NIST P384
On Thu, Mar 04, 2021 at 07:51:57PM -0500, Stefan Berger wrote:
> From: Saulo Alessandre
>
> * crypto/ecc.c
> - add vli_mmod_fast_384
> - change some routines to pass ecc_curve forward until vli_mmod_fast
>
> * crypto/ecc.h
> - add ECC_CURVE_NIST_P384_DIGITS
> - change ECC_MAX_DIGITS to P384 size
>
> Signed-off-by: Saulo Alessandre
> Tested-by: Stefan Berger
Another "diffstat".
/Jarkko
> ---
> crypto/ecc.c | 266 +--
> crypto/ecc.h | 3 +-
> 2 files changed, 194 insertions(+), 75 deletions(-)
>
> diff --git a/crypto/ecc.c b/crypto/ecc.c
> index f6cef5a7942d..c125576cda6b 100644
> --- a/crypto/ecc.c
> +++ b/crypto/ecc.c
> @@ -778,18 +778,133 @@ static void vli_mmod_fast_256(u64 *result, const u64 *product,
> }
> }
>
> +#define SL32OR32(x32, y32) (((u64)x32 << 32) | y32)
> +#define AND64H(x64) (x64 & 0xffFFffFF00000000ull)
> +#define AND64L(x64) (x64 & 0x00000000ffFFffFFull)
> +
> +/* Computes result = product % curve_prime
> + * from "Mathematical routines for the NIST prime elliptic curves"
> + */
> +static void vli_mmod_fast_384(u64 *result, const u64 *product,
> + const u64 *curve_prime, u64 *tmp)
> +{
> + int carry;
> + const unsigned int ndigits = 6;
> +
> + /* t */
> + vli_set(result, product, ndigits);
> +
> + /* s1 */
> + tmp[0] = 0; // 0 || 0
> + tmp[1] = 0; // 0 || 0
> + tmp[2] = SL32OR32(product[11], (product[10]>>32)); //a22||a21
> + tmp[3] = product[11]>>32; // 0 ||a23
> + tmp[4] = 0; // 0 || 0
> + tmp[5] = 0; // 0 || 0
> + carry = vli_lshift(tmp, tmp, 1, ndigits);
> + carry += vli_add(result, result, tmp, ndigits);
> +
> + /* s2 */
> + tmp[0] = product[6];//a13||a12
> + tmp[1] = product[7];//a15||a14
> + tmp[2] = product[8];//a17||a16
> + tmp[3] = product[9];//a19||a18
> + tmp[4] = product[10]; //a21||a20
> + tmp[5] = product[11]; //a23||a22
> + carry += vli_add(result, result, tmp, ndigits);
> +
> + /* s3 */
> + tmp[0] = SL32OR32(product[11], (product[10]>>32)); //a22||a21
> + tmp[1] = SL32OR32(product[6], (product[11]>>32)); //a12||a23
> + tmp[2] = SL32OR32(product[7], (product[6])>>32);//a14||a13
> + tmp[3] = SL32OR32(product[8], (product[7]>>32));//a16||a15
> + tmp[4] = SL32OR32(product[9], (product[8]>>32));//a18||a17
> + tmp[5] = SL32OR32(product[10], (product[9]>>32)); //a20||a19
> + carry += vli_add(result, result, tmp, ndigits);
> +
> + /* s4 */
> + tmp[0] = AND64H(product[11]); //a23|| 0
> + tmp[1] = (product[10]<<32); //a20|| 0
> + tmp[2] = product[6];//a13||a12
> + tmp[3] = product[7];//a15||a14
> + tmp[4] = product[8];//a17||a16
> + tmp[5] = product[9];//a19||a18
> + carry += vli_add(result, result, tmp, ndigits);
> +
> + /* s5 */
> + tmp[0] = 0; // 0|| 0
> + tmp[1] = 0; // 0|| 0
> + tmp[2] = product[10]; //a21||a20
> + tmp[3] = product[11]; //a23||a22
> + tmp[4] = 0; // 0|| 0
> + tmp[5] = 0; // 0|| 0
> + carry += vli_add(result, result, tmp, ndigits);
> +
> + /* s6 */
> + tmp[0] = AND64L(product[10]); // 0 ||a20
> + tmp[1] = AND64H(product[10]); //a21|| 0
> + tmp[2] = product[11]; //a23||a22
> + tmp[3] = 0; // 0 || 0
> + tmp[4] = 0; // 0 || 0
> + tmp[5] = 0; // 0 || 0
> + carry += vli_add(result, result, tmp, ndigits);
> +
> + /* d1 */
> + tmp[0] = SL32OR32(product[6], (product[11]>>32)); //a12||a23
> + tmp[1] = SL32OR32(product[7], (product[6]>>32));//a14||a13
> + tmp[2] = SL32OR32(product[8], (product[7]>>32));//a16||a15
> + tmp[3] = SL32OR32(product[9], (product[8]>>32));//a18||a17
> + tmp[4] = SL32OR32(product[10], (product[9]>>32)); //a20||a19
> + tmp[5] = SL32OR32(product[11], (product[10]>>32)); //a22||a21
> + carry -= vli_sub(result, result, tmp, ndigits);
> +
> + /* d2 */
> + tmp[0] = (product[10]<<32); //a20|| 0
> + tmp[1] = SL32OR32(product[11], (product[10]>>32)); //a22||a21
> + tmp[2] = (product[11]>>32); // 0 ||a23
> + tmp[3] = 0; // 0 || 0
> + tmp[4] = 0; // 0 || 0
> + tmp[5] = 0; // 0 || 0
> + carry -= vli_sub(result, result, tmp, ndigits);
> +
> + /* d3 */
> + tmp[0] = 0; // 0 || 0
> + tmp[1] = AND64H(product[11]); //a23|| 0
> + tmp[2] = product[11]>>32; // 0 ||a23
> + tmp[3] = 0; // 0 || 0
> + tmp[4] = 0; // 0 || 0
> + tmp[5] = 0; // 0 || 0
> + carry -= vli_sub(result, result, tmp, ndigits);
> +
> + if (carry < 0) {
> + do {
> + carry +=

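[Editorial aside, not part of the thread: the 32-bit repacking macros used throughout the hunk above can be sanity-checked in isolation. The masks below assume AND64H keeps the high 32 bits and AND64L the low 32, which is what the usage comments (`//a23|| 0`, `// 0 ||a20`) imply.]

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t u64;

/* SL32OR32 packs two 32-bit halves (a_{2i+1} || a_{2i}) into one 64-bit
 * digit; AND64H keeps only the high half, AND64L only the low half. */
#define SL32OR32(x32, y32) (((u64)x32 << 32) | y32)
#define AND64H(x64) (x64 & 0xffFFffFF00000000ull)
#define AND64L(x64) (x64 & 0x00000000ffFFffFFull)
```

With these, each s/d term in vli_mmod_fast_384() is just a reshuffling of the 32-bit words a0..a23 of the 768-bit product, per the s1..s6, d1..d3 decomposition in "Mathematical routines for the NIST prime elliptic curves".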