Greets,

I'm new here, so be nice :) .

The hope of this is that "many relatively small modulo
calculations" can be implemented significantly more efficiently
than a few large, multiprecision divide calculations.

I don't know if this is new to this group, but I would appreciate
any feedback (on the mailing list) that anyone would care to give.

This algorithm takes advantage of easily calculated modulo
congruencies in a (CPU word-size)-wise fashion.  The well-known
modular identity exploited is this:
  ( sum(a_i, 0<=i<=k) ) % factor
    is congruent to
  sum(a_i % factor, 0<=i<=k)

Here's a trivial example using hypothetical computer with
maxint = radix = 10:

  calculate 1234567 % 7

  note that
    1234567 =
      1 * 1000000
      2 * 100000
      3 * 10000
      4 * 1000
      5 * 100
      6 * 10
      7 * 1
    these are our vector blocks

  (the digraph ~= means 'is congruent to' here)

  note that
    1       % 7 ~= 1

    to calculate 10 % 7, we calculate a trivial modulo
    this is also our 'original modulo'

    10      % 7 ~= 3

    to calculate 100 % 7, we can multiply our previous
    result by our 'original modulo' and do a trivial modulo

    100     % 7
      ~= ((10%7) * (10%7)) % 7
      ~= (3*3) % 7
      ~= 9 % 7
      ~= 2
    to calculate 1000 % 7, we can multiply our previous
    result by 'original modulo' and do a trivial modulo

    1000    % 7
      ~= ((100%7) * (10%7)) % 7
      ~= (2*3) % 7
      ~= 6

    etc.

    10000   % 7
      ~= ((1000%7) * (10%7)) % 7
      ~= (6*3) % 7
      ~= 4

    100000  % 7
      ~= ((10000%7) * (10%7)) % 7
      ~= (4*3) % 7
      ~= 5

    1000000 % 7
      ~= ((100000%7) * (10%7)) % 7
      ~= (5*3) % 7
      ~= 1

  Now we take that list of radix-block modulos and multiply them
  each by their place multiplier, and do a trivial modulo on the
  results

    1 * 1000000 % 7 ~= 1 * 1 ~=  1 % 7 = 1
    2 * 100000  % 7 ~= 2 * 5 ~= 10 % 7 = 3
    3 * 10000   % 7 ~= 3 * 4 ~= 12 % 7 = -2
    4 * 1000    % 7 ~= 4 * 6 ~= 24 % 7 = 3
    5 * 100     % 7 ~= 5 * 2 ~= 10 % 7 = 3
    6 * 10      % 7 ~= 6 * 3 ~= 18 % 7 = -3
    7 * 1       % 7 ~= 7 * 1 ~=  7 % 7 = 0

    summing the residues 1,3,-2,3,3,-3, and 0 gives 5

    5 is not congruent to zero mod 7, therefore
      7 is not a factor of 1234567

    checking the result via hand calculator:
      1234567 / 7 = 176366.714285(...repeats decimal portion)
        nope, it doesn't divide equally
      1234567 % 7 = 5
        the residue is, in fact, 5

  Note that these block-wise residues come in repeating cycles
  which always end in 1.  (short exercise for the reader why the
  next residue after 1 is always equal to the original residue)

  These vectors have a theoretical maximum length which depends on
  the length of the block and the number of factors of
  (block length-1). (fermat's little theorem for max length,
  exercise for the reader for lesser lengths)

  This property can be exploited thusly:
    First, accumulate a vector of previous residues
    Second, if a block residue is 1, then
      stop accumulating and cycle through previous residues

  The exploitation would only be worthwhile if the cost of
  accumulation and subsequent lookups (and the frequency of being
  able to exploit very short cycles is high--which it usually is
  with powers-of-two block lengths) is less than the cost of the
  multiply-and-take modulo operation.

Here's python-ish pseudo-code w/out the vector optimization

  blocklen is <= system word length
  maxint is max integer of blocklen

  N = big number to factor

  Nvec = vector of blocks of bits each of blocklen

  for each f (factor candidate)
    # f is a factor candidate ... drawn from table or sieve of
    # primes

    accum = 0
    # incremental modulo accumulator

    omod = maxint % f
    # original modulo
    # compute maxint modulo candidate

    mod = 1
    # stores block-wise modulos

    for (i=Nvec.length-1;0;i--)
      # grab a block from Nvec

      mod = (mod*omod) % f
      # raise mod to next power of omod and store its residue

      sub_mod = Nvec[i] * mod
      # calculate residue of Nvec[i], defer taking its residue
      # until accum step

      accum = (accum + sub_mod) % f
      # accumulate sub modulos and reduce it if necessary

    if accum == 0
      print "Factor Found!",f
      collect_prize_money or move on to next candidate prime

Tracking variables through the loops on the above example:

  Setup
    mod      <- 1
    sub_mod  <- undef
    accum    <- 0

  Nvec = 7
    mod      <- 1  (mod*omod % 7)      = (1 % 7)
    sub_mod  <- 1  (mod * Nvec)        = (1*1)
    accum    <- 1  (accum+sub_mod % 7) = (0+1)

  Nvec = 6
    mod      <- 3  (mod*omod % 7)      = (1*3 % 7)
    sub_mod  <- 4  (mod * Nvec)        = (6*3)
    accum    <- 5  (accum+sub_mod % 7) = (1+4 % 7)

  Nvec = 5
    mod      <- 2  (mod*omod % 7)      = (3*3 % 7)
    sub_mod  <- 3  (mod * Nvec)        = (5*2)
    accum    <- 1  (accum+sub_mod % 7) = (5+10 % 7)

  Nvec = 4
    mod      <- 6  (mod*omod % 7)      = (2*3 % 7)
    sub_mod  <- 3  (mod * Nvec)        = (4*6)
    accum    <- 4  (accum+sub_mod % 7) = (1+24 % 7)

  Nvec = 3
    mod      <- 4  (mod*omod % 7)      = (6*3 % 7)
    sub_mod  <- 5  (mod * Nvec)        = (3*4)
    accum    <- 2  (accum+sub_mod % 7) = (4+12 % 7)

  Nvec = 2
    mod      <- 5  (mod*omod % 7)      = (4*3 % 7)
    sub_mod  <- 3  (mod * Nvec)        = (2*5)
    accum    <- 5  (accum+sub_mod % 7) = (2+10 % 7)

  Nvec = 1
    mod      <- 1  (mod*omod % 7)      = (5*3 % 7)
    sub_mod  <- 0  (mod * Nvec)        = (0*1)
    accum    <- 5  (accum+sub_mod % 7) = (5+0 % 7)
/End of worked example


This is all fine and dandy for factor candidates that fit in a CPU
word.  For factor candidates that don't fit in a CPU word, we have
to switch to multi-precision math and we probably lose most
chances to use the cycle-length vector optimization.  When factor
candidates don't fit into a CPU word, then the block base modulos
are harder to calculate, as they're
  (maxint**(num_blocks-block_position)) % f
which is still accumulatable, but probably not in a CPU word.

Anyone see any benefit in this in terms of comparable cost of
implementation vs. pure multiprecision division?  I'm more
interested in the number-theoretics of the problem than in the
down-n-dirty details of bit pushing.

Thanks,

--jim

_______________________________________________
Prime mailing list
[email protected]
http://hogranch.com/mailman/listinfo/prime

Reply via email to