#11475: improve prime_pi (speedup + small fixes)
-----------------------------+----------------------------------------------
   Reporter:  rohana         |          Owner:  was                             
       Type:  enhancement    |         Status:  needs_review                    
   Priority:  major          |      Milestone:  sage-4.7.1                      
  Component:  number theory  |       Keywords:  primes, prime counting, prime_pi
Work_issues:                 |       Upstream:  N/A                             
   Reviewer:                 |         Author:  R. Andrew Ohana                 
     Merged:                 |   Dependencies:                                  
-----------------------------+----------------------------------------------
Changes (by rohana):

  * status:  needs_work => needs_review


Old description:

> I have rewritten the `prime_pi` method from scratch, it is much cleaner
> and less hacky this time (as I have learned more about coding), however
> it is still based on the same algorithm as before (no LMO yet, although I
> intend to take a stab at it later this year). This was developed in
> parallel with a method to count primes in residue classes (ticket
> forthcoming).
>
> The new version deals with a variety of input in a better fashion,
> without a too significant effect on timings: (all timings were done on
> mod.math)
>
> New:
> {{{
> sage: timeit('prime_pi(1)')
> 625 loops, best of 3: 282 ns per loop
> sage: timeit('prime_pi(0.5)')
> 625 loops, best of 3: 4.19 µs per loop
> sage: timeit('prime_pi(sqrt(2))')
> 625 loops, best of 3: 384 µs per loop
> sage: timeit('prime_pi(mod(1, 2))')
> 625 loops, best of 3: 8.94 µs per loop
> }}}
> Old:
> {{{
> sage: timeit('prime_pi(1)')
> 625 loops, best of 3: 280 ns per loop
> sage: timeit('prime_pi(0.5)')
> 625 loops, best of 3: 3.92 µs per loop
> sage: timeit('prime_pi(sqrt(2))')
> 625 loops, best of 3: 383 µs per loop
> sage: timeit('prime_pi(mod(1, 2))')
> 625 loops, best of 3: 7.38 µs per loop
> }}}
> The overhead for small input `>= 2` is rather large in the current
> `prime_pi`, this has been dramatically improved:
>
> New:
> {{{
> sage: timeit('prime_pi(100)')
> 625 loops, best of 3: 331 ns per loop
> sage: timeit('prime_pi(10000)')
> 625 loops, best of 3: 971 ns per loop
> }}}
> Old:
> {{{
> sage: timeit('prime_pi(100)')
> 625 loops, best of 3: 1.71 µs per loop
> sage: timeit('prime_pi(10000)')
> 625 loops, best of 3: 5.56 µs per loop
> }}}
>
> There is also a ~3x speedup for larger input:
>
> New:
> {{{
> sage: timeit('prime_pi(10**8)')
> 125 loops, best of 3: 759 µs per loop
> sage: timeit('prime_pi(10**10)')
> 5 loops, best of 3: 47.4 ms per loop
> sage: time prime_pi(10**12)
> 37607912018
> Time: CPU 2.86 s, Wall: 2.86 s
> }}}
> Old:
> {{{
> sage: timeit('prime_pi(10**8)')
> 125 loops, best of 3: 3.66 ms per loop
> sage: timeit('prime_pi(10**10)')
> 5 loops, best of 3: 161 ms per loop
> sage: time prime_pi(10**12)
> 37607912018
> Time: CPU 8.65 s, Wall: 8.64 s
> }}}
>
> Primes are now cached, which can give a significant speedup when making
> smaller calls after larger calls: (run after previous commands)
>
> {{{
> sage: timeit('prime_pi(10**10)')
> 5 loops, best of 3: 27.4 ms per loop
> sage: timeit('prime_pi(10**8)')
> 625 loops, best of 3: 325 µs per loop
> sage: timeit('prime_pi(10000)')
> 625 loops, best of 3: 343 ns per loop
> }}}
>
> This doesn't cost very much memory, since everything is stored as 32-bit
> ints, but `prime_range` is currently built on top of pari, which poses a
> number of problems, but the primary concern is that it is a memory hog
> when sieving to a large upper bound (important for plotting purposes).
>
> I know that this implementation starts to fail to give correct output on
> 64-bit systems somewhere between `9.5*10**15` and `9.75*10**15`, for
> issues that I have been unable to determine. I have limited input to be
> `< 2**49`, which I am fairly confident is safe, as I have done a fair bit
> of testing in this range trying to debug this issue (despite it taking
> hours per call). I would appreciate testing on 32-bit platforms.
>
> A lot of the speed of this algorithm relies on the `__cached_count`
> method, which is essentially a binary search (with a simple adjustment in
> the beginning that takes advantage of the density of the primes). This is
> the fastest type of binary search I know of, but if anyone knows of a way
> to speed it up, it could have a significant effect. (Thanks to Yann for
> his suggestion)
>
> Finally, I have introduced the `legendres_formula` method, which takes
> two arguments, `x` and `a`, and returns the number of positive integers
> not larger than `x` that are not divisible by -- i.e. co-prime to -- each
> of the first `a` primes. This function is core to all combinatorial
> methods for computing the prime counting function, so I figured that I
> would make this publicly accessible now that I have clean code.
>
> ----
>
> Apply only [attachment:trac_11475v5.patch].

New description:

 I have rewritten the `prime_pi` method from scratch, it is much cleaner
 and less hacky this time (as I have learned more about coding), however it
 is still based on the same algorithm as before (no LMO yet, although I
 intend to take a stab at it later this year). This was developed in
 parallel with a method to count primes in residue classes (ticket
 forthcoming).

 The new version deals with a variety of input in a better fashion, without
 a too significant effect on timings: (all timings were done on mod.math)

 New:
 {{{
 sage: timeit('prime_pi(1)')
 625 loops, best of 3: 282 ns per loop
 sage: timeit('prime_pi(0.5)')
 625 loops, best of 3: 4.19 µs per loop
 sage: timeit('prime_pi(sqrt(2))')
 625 loops, best of 3: 384 µs per loop
 sage: timeit('prime_pi(mod(1, 2))')
 625 loops, best of 3: 8.94 µs per loop
 }}}
 Old:
 {{{
 sage: timeit('prime_pi(1)')
 625 loops, best of 3: 280 ns per loop
 sage: timeit('prime_pi(0.5)')
 625 loops, best of 3: 3.92 µs per loop
 sage: timeit('prime_pi(sqrt(2))')
 625 loops, best of 3: 383 µs per loop
 sage: timeit('prime_pi(mod(1, 2))')
 625 loops, best of 3: 7.38 µs per loop
 }}}
 The overhead for small input `>= 2` is rather large in the current
 `prime_pi`, this has been dramatically improved:

 New:
 {{{
 sage: timeit('prime_pi(100)')
 625 loops, best of 3: 331 ns per loop
 sage: timeit('prime_pi(10000)')
 625 loops, best of 3: 971 ns per loop
 }}}
 Old:
 {{{
 sage: timeit('prime_pi(100)')
 625 loops, best of 3: 1.71 µs per loop
 sage: timeit('prime_pi(10000)')
 625 loops, best of 3: 5.56 µs per loop
 }}}

 There is also a ~3x speedup for larger input:

 New:
 {{{
 sage: timeit('prime_pi(10**8)')
 125 loops, best of 3: 759 µs per loop
 sage: timeit('prime_pi(10**10)')
 5 loops, best of 3: 47.4 ms per loop
 sage: time prime_pi(10**12)
 37607912018
 Time: CPU 2.86 s, Wall: 2.86 s
 }}}
 Old:
 {{{
 sage: timeit('prime_pi(10**8)')
 125 loops, best of 3: 3.66 ms per loop
 sage: timeit('prime_pi(10**10)')
 5 loops, best of 3: 161 ms per loop
 sage: time prime_pi(10**12)
 37607912018
 Time: CPU 8.65 s, Wall: 8.64 s
 }}}

 Primes are now cached, which can give a significant speedup when making
 smaller calls after larger calls: (run after previous commands)

 {{{
 sage: timeit('prime_pi(10**10)')
 5 loops, best of 3: 27.4 ms per loop
 sage: timeit('prime_pi(10**8)')
 625 loops, best of 3: 325 µs per loop
 sage: timeit('prime_pi(10000)')
 625 loops, best of 3: 343 ns per loop
 }}}

 This doesn't cost very much memory, since everything is stored as 32-bit
 ints, but `prime_range` is currently built on top of pari, which poses a
 number of problems, but the primary concern is that it is a memory hog
 when sieving to a large upper bound (important for plotting purposes).

 I know that this implementation starts to fail to give correct output on
 64-bit systems somewhere between `9.5*10**15` and `9.75*10**15`, for
 issues that I have been unable to determine. I have limited input to be `<
 2**49`, which I am fairly confident is safe, as I have done a fair bit of
 testing in this range trying to debug this issue (despite it taking hours
 per call). I would appreciate testing on 32-bit platforms.

 A lot of the speed of this algorithm relies on the `__cached_count`
 method, which is essentially a binary search (with a simple adjustment in
 the beginning that takes advantage of the density of the primes). This is
 the fastest type of binary search I know of, but if anyone knows of a way
 to speed it up, it could have a significant effect. (Thanks to Yann for
 his suggestion)

 Finally, I have introduced the `legendres_formula` method, which takes two
 arguments, `x` and `a`, and returns the number of positive integers not
 larger than `x` that are not divisible by -- i.e. co-prime to -- each of
 the first `a` primes. This function is core to all combinatorial methods
 for computing the prime counting function, so I figured that I would make
 this publicly accessible now that I have clean code.

 ----

 Apply only [attachment:trac_11475v6.patch].

--

-- 
Ticket URL: <http://trac.sagemath.org/sage_trac/ticket/11475#comment:29>
Sage <http://www.sagemath.org>
Sage: Creating a Viable Open Source Alternative to Magma, Maple, Mathematica, 
and MATLAB

-- 
You received this message because you are subscribed to the Google Groups 
"sage-trac" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/sage-trac?hl=en.

Reply via email to