Re: Side-channel silent division

2012-11-14 Thread Niels Möller
Torbjorn Granlund t...@gmplib.org writes:

 ni...@lysator.liu.se (Niels Möller) writes:

   If we can arrange for a loop which does a full quotent limb, and applies
   it using mpn_submul_1 followed by an mpn_add_cnd_n per quotient3B, would
   that be faster or otherwise preferable to your loop with two submul_1
   per quotient limb?
   
 Quotient3B?  Is 3B a typo for  limb?  :-)

Yup! I blame the terminal and its escape sequences.

 A loop around submul_1 + mpn_addcnd_n would get a smaller n^2 term on
 almost all machines, I think.

I was been thinking of the following algorithm (I think I wrote about
the mod variant a while ago).

Say we want to divide by an n-limb number D, and for simplicity, assume
that D is normalised.

First precompute the inverse v = floor(B^{n+1} / D) (one limb and a
bit), and the corresponding n-limb remainder, r, so that

  B^{n+1} = v * D + R

Now, if U is an n+2 limb number, we can reduce it as

  u_{n+1}^B{n+1} + U' = u_{n+1} v D + u_{n+1} R + U' 

Here, the product u_{n+1} * v is two limbs + a bit to be added to the
quotient, and U' + u_{n+1} r is n+1 limbs and a carry out. We can
iterate this to reduce u from n+m limbs to n + 1 (we then need something
different for the final quotient limb).

For side-channel silent division, we'd need to store away the q limbs
and add them up in a single loop after the main loop. The carry from
addmul_1 would have to be handled inside the loop, not sure how to best do
that.

For general schoolbook division, it's nice that we use addmul_1 rather
than submul_1, and when addmul_2 is available it might make sense to
extend to two quotient limbs at a time.

Ah, and one more thing: For the applications which are going to need
side-channel silent division, I think the common case is a mod operation
with invariant divisor. In this case, I think the above idea (but
without all the quotient book-keeping) is a pretty good alternative to
schoolbook.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel


Re: Side-channel silent division

2012-11-14 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes:

  I was been thinking of the following algorithm (I think I wrote about
  the mod variant a while ago).
  
  Say we want to divide by an n-limb number D, and for simplicity, assume
  that D is normalised.
  
  First precompute the inverse v = floor(B^{n+1} / D) (one limb and a
  bit), and the corresponding n-limb remainder, r, so that
  
I assume v is sometimes a limb and two bits, but only for one D.  :-)

Do you mean to compute precisely the above value, not some very close
limb+bit approximation (looking at, say the two most significant limbs
of D)?

B^{n+1} = v * D + R
  
So r and R are the same?

  Now, if U is an n+2 limb number, we can reduce it as
  
u_{n+1}^B{n+1} + U' = u_{n+1} v D + u_{n+1} R + U' 
  
Something is strange there.  It seems that

u_{n+1}*B^{n+1} + U' = u_{n+1} v D + u_{n+1} R + U' 
  
makes more sense.

  Here, the product u_{n+1} * v is two limbs + a bit to be added to the
  quotient, and U' + u_{n+1} r is n+1 limbs and a carry out.

  We can iterate this to reduce u from n+m limbs to n + 1 (we then need
  something different for the final quotient limb).
  
U is n+2 limbs (per 6 lines above) and u is a different number of n+m
limbs?

OK, I think I see what you're doing.  You're allowing some greater
partial remainders, and will then (typically) generate a limb+bit
quotient.

This is highly interesting.

  For side-channel silent division, we'd need to store away the q limbs
  and add them up in a single loop after the main loop. The carry from
  addmul_1 would have to be handled inside the loop, not sure how to best do
  that.
  
That is similar to what I do in my proposed half-limb-at-a-time code.

  For general schoolbook division, it's nice that we use addmul_1 rather
  than submul_1, and when addmul_2 is available it might make sense to
  extend to two quotient limbs at a time.
  
Absolutely.  We should avoid submul_1 if we can.

  Ah, and one more thing: For the applications which are going to need
  side-channel silent division, I think the common case is a mod operation
  with invariant divisor. In this case, I think the above idea (but
  without all the quotient book-keeping) is a pretty good alternative to
  schoolbook.
  
I miss your point here.

Are you thinking of replacing REDC with the above trick in, say, powm?

I am mainly aiming at the a * B^n mod m operation for converting to
Montgomery residues, and of CRT.

-- 
Torbjörn
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel