ni...@lysator.liu.se (Niels Möller) writes:
> I've now implemented inverse too.
And now I've tried a different variant, using redc for the cofactor
updates. Cofactors are now canonically reduced (which seems unexpectedly
cheap). In the case that m is not normalized, so that 2m fits in n
limbs,
Marco Bodrato writes:
We should start writing mpn_sec_binvert :-)
I think mpn_binvert is almost sec_ naturally.
The exception is when sbpi1_bdiv_q or dbpi1_bdiv_q are invoked. The
former has some < comparisons on data (for carry computations), and the
latter has an mpn_incr_u, which is very leaky.
On 2022-09-01 17:04, Torbjörn Granlund wrote:
/* FIXME: Using mpz_invert is cheating. Instead, first compute m' =
   m^-1 (mod 2^k) via Newton/Hensel. We can then get the inverse via
   2^{-k} (mod m) = ((2^k - m') * m + 1)/2^k. */
mpz_invert (t, t, m);
mpn_copyi (info->ip, mpz_limbs_read (t), mpz_size (t));
Torbjörn Granlund writes:
> Why do you use sec_invert when inverting mod the group order when that
> is of prime order? (Yes, this question will become moot, I suppose, with
> this new algorithm.)
No good reason, it's just that I implemented inverse-by-powering (with a
hand-tuned addition chain)
It's much less clear to me how close it might be to the typical or
average number of iterations needed. That's perhaps not very interesting,
though, as early exit is not an option here. (Unless this algorithm would
beat plain, "leaky" inverse.)
Currently uses exponentiation for the field inverse,
Torbjörn Granlund writes:
> That count is a proven "upper bound" of the number of iterations. Does
> the paper discuss how tight it is, i.e., is it close to a "lower bound"
> of the required number of iterations?
As an upper bound, I think it's rather tight at least for smallish
sizes, since
> count = (49 * bits + 57) / 17;
>
> Odd.
For sure. This isn't based on local progress of the algorithm (there
ain't no guaranteed progress for a short sequence of reduction steps),
but based on rather complex analysis of the number of steps needed for
the complete 2-adic
(For production code, one will need to cap the intermediates there,
at least for 32-bit machines. Perhaps something like this:
count = (51 * bits - 2 * bits + 57) / 17
      = 3 * bits - (2 * bits - 57) / 17
ni...@lysator.liu.se (Niels Möller) writes:
> If we require the largest matrix element, 2^k, to fit in a 64-bit
> signed, we can have at most k = 62.
I've implemented a first version doing GMP_LIMB_BITS-2 bits at a time.
For a start, gcd only, not cofactors/inverse. Code appended below.
It's
ni...@lysator.liu.se (Niels Möller) writes:
> Then 96 bits seems to be the maximum number of low-end bits that can be
> eliminated, under the constraint that elements of the corresponding
> transformation matrix should fit in 64-bit (signed) limbs.
I had another look into this (that is, the
Torbjörn Granlund writes:
> ni...@lysator.liu.se (Niels Möller) writes:
>
> Extract least significant 96 bits of each number.
>
> Is that 3 32-bit limbs or 1.5 64-bit limbs?
I was thinking about 64-bit archs.
Then 96 bits seems to be the maximum number of low-end bits that can be
eliminated,
--
Torbjörn
Please encrypt, key id 0xC8601622
___
gmp-devel mailing list
gmp-devel@gmplib.org
Torbjörn Granlund writes:
> Any algorithm with these properties would be a huge improvement compared
> to what we have today:
>
> 0. No side-channel leakage of anything but bit counts of input operands.
>    (I suppose there are usages where the mod argument is not sensitive,
>    such as for the elliptic curve usages). This is
Albin Ahlbäck writes:
> Have you looked at https://eprint.iacr.org/2020/972.pdf, where the
> author seems to suggest an even faster algorithm? Or at least it was
> faster under heavy optimization, under the assumption of what inputs
> the algorithm received.
No, I wasn't aware of that. I've now
On 24/05/22 11:59, Niels Möller wrote:
> Hi, I've had a first look at the paper by djb and Bo-Yin Yang,
> https://eprint.iacr.org/2019/266.pdf. Mainly focusing on the integer
> case.
Have you looked at https://eprint.iacr.org/2020/972.pdf, where the
author seems to suggest an even faster
ni...@lysator.liu.se (Niels Möller) writes:
> The new algorithm in the above paper uses a small auxiliary state
> variable d (or \delta in the paper) to decide the update in the case
> both a, b are odd. If I write the iteration step using the same notation
> as above, it's
>
> if a even:
>