Greets,
I'm new here, so be nice :) .
The hope of this is that "many relatively small modulo
calculations" can be implemented significantly more efficiently
than a few large, multiprecision divide calculations.
I don't know if this is new to this group, but I would appreciate
any feedback (on the mailing list) that anyone would care to give.
This algorithm takes advantage of easily calculated modulo
congruencies in a (CPU word-size)-wise fashion. The well-known
modular identity exploited is this:
( sum(a_i, 0<=i<=k) ) % factor
is congruent to
sum(a_i % factor, 0<=i<=k)
Here's a trivial example using hypothetical computer with
maxint = radix = 10:
calculate 1234567 % 7
note that
1234567 =
1 * 1000000
2 * 100000
3 * 10000
4 * 1000
5 * 100
6 * 10
7 * 1
these are our vector blocks
(the digraph ~= means 'is congruent to' here)
note that
1 % 7 ~= 1
to calculate 10 % 7, we calculate a trivial modulo
this is also our 'original modulo'
10 % 7 ~= 3
to calculate 100 % 7, we can multiply our previous
result by our 'original modulo' and do a trivial modulo
100 % 7
~= ((10%7) * (10%7)) % 7
~= (3*3) % 7
~= 9 % 7
~= 2
to calculate 1000 % 7, we can multiply our previous
result by 'original modulo' and do a trivial modulo
1000 % 7
~= ((100%7) * (10%7)) % 7
~= (2*3) % 7
~= 6
etc.
10000 % 7
~= ((1000%7) * (10%7)) % 7
~= (6*3) % 7
~= 4
100000 % 7
~= ((10000%7) * (10%7)) % 7
~= (4*3) % 7
~= 5
1000000 % 7
~= ((100000%7) * (10%7)) % 7
~= (5*3) % 7
~= 1
Now we take that list of radix-block modulos and multiply them
each by their place multiplier, and do a trivial modulo on the
results
1 * 1000000 % 7 ~= 1 * 1 ~= 1 % 7 = 1
2 * 100000 % 7 ~= 2 * 5 ~= 10 % 7 = 3
3 * 10000 % 7 ~= 3 * 4 ~= 12 % 7 = -2
4 * 1000 % 7 ~= 4 * 6 ~= 24 % 7 = 3
5 * 100 % 7 ~= 5 * 2 ~= 10 % 7 = 3
6 * 10 % 7 ~= 6 * 3 ~= 18 % 7 = -3
7 * 1 % 7 ~= 7 * 1 ~= 7 % 7 = 0
summing the residues 1,3,-2,3,3,-3, and 0 gives 5
5 is not congruent to zero mod 7, therefore
7 is not a factor of 1234567
checking the result via hand calculator:
1234567 / 7 = 176366.714285(...repeats decimal portion)
nope, it doesn't divide equally
1234567 % 7 = 5
the residue is, in fact, 5
Note that these block-wise residues come in repeating cycles
which always end in 1. (short exercise for the reader why the
next residue after 1 is always equal to the original residue)
These vectors have a theoretical maximum length which depends on
the length of the block and the number of factors of
(block length-1). (fermat's little theorem for max length,
exercise for the reader for lesser lengths)
This property can be exploited thusly:
First, accumulate a vector of previous residues
Second, if a block residue is 1, then
stop accumulating and cycle through previous residues
The exploitation would only be worthwhile if the cost of
accumulation and subsequent lookups (and the frequency of being
able to exploit very short cycles is high--which it usually is
with powers-of-two block lengths) is less than the cost of the
multiply-and-take modulo operation.
Here's python-ish pseudo-code w/out the vector optimization
blocklen is <= system word length
maxint is max integer of blocklen
N = big number to factor
Nvec = vector of blocks of bits each of blocklen
for each f (factor candidate)
# f is a factor candidate ... drawn from table or sieve of
# primes
accum = 0
# incremental modulo accumulator
omod = maxint % f
# original modulo
# compute maxint modulo candidate
mod = 1
# stores block-wise modulos
for (i=Nvec.length-1;0;i--)
# grab a block from Nvec
mod = (mod*omod) % f
# raise mod to next power of omod and store its residue
sub_mod = Nvec[i] * mod
# calculate residue of Nvec[i], defer taking its residue
# until accum step
accum = (accum + sub_mod) % f
# accumulate sub modulos and reduce it if necessary
if accum == 0
print "Factor Found!",f
collect_prize_money or move on to next candidate prime
Tracking variables through the loops on the above example:
Setup
mod <- 1
sub_mod <- undef
accum <- 0
Nvec = 7
mod <- 1 (mod*omod % 7) = (1 % 7)
sub_mod <- 1 (mod * Nvec) = (1*1)
accum <- 1 (accum+sub_mod % 7) = (0+1)
Nvec = 6
mod <- 3 (mod*omod % 7) = (1*3 % 7)
sub_mod <- 4 (mod * Nvec) = (6*3)
accum <- 5 (accum+sub_mod % 7) = (1+4 % 7)
Nvec = 5
mod <- 2 (mod*omod % 7) = (3*3 % 7)
sub_mod <- 3 (mod * Nvec) = (5*2)
accum <- 1 (accum+sub_mod % 7) = (5+10 % 7)
Nvec = 4
mod <- 6 (mod*omod % 7) = (2*3 % 7)
sub_mod <- 3 (mod * Nvec) = (4*6)
accum <- 4 (accum+sub_mod % 7) = (1+24 % 7)
Nvec = 3
mod <- 4 (mod*omod % 7) = (6*3 % 7)
sub_mod <- 5 (mod * Nvec) = (3*4)
accum <- 2 (accum+sub_mod % 7) = (4+12 % 7)
Nvec = 2
mod <- 5 (mod*omod % 7) = (4*3 % 7)
sub_mod <- 3 (mod * Nvec) = (2*5)
accum <- 5 (accum+sub_mod % 7) = (2+10 % 7)
Nvec = 1
mod <- 1 (mod*omod % 7) = (5*3 % 7)
sub_mod <- 0 (mod * Nvec) = (0*1)
accum <- 5 (accum+sub_mod % 7) = (5+0 % 7)
/End of worked example
This is all fine and dandy for factor candidates that fit in a CPU
word. For factor candidates that don't fit in a CPU word, we have
to switch to multi-precision math and we probably lose most
chances to use the cycle-length vector optimization. When factor
candidates don't fit into a CPU word, then the block base modulos
are harder to calculate, as they're
(maxint**(num_blocks-block_position)) % f
which is still accumulatable, but probably not in a CPU word.
Anyone see any benefit in this in terms of comparable cost of
implementation vs. pure multiprecision division? I'm more
interested in the number-theoretics of the problem than in the
down-n-dirty details of bit pushing.
Thanks,
--jim
_______________________________________________
Prime mailing list
[email protected]
http://hogranch.com/mailman/listinfo/prime