#7539: primes.p0.spkg with "prime_sieve.c" functionality
-----------------------------+----------------------------------------------
   Reporter:  GeorgSWeber    |       Owner:  rohana             
       Type:  enhancement    |      Status:  new                
   Priority:  major          |   Milestone:  sage-4.3           
  Component:  number theory  |    Keywords:                     
Work_issues:                 |      Author:  rohana, GeorgSWeber
   Upstream:  N/A            |    Reviewer:                     
     Merged:                 |  
-----------------------------+----------------------------------------------

Comment(by kevin.stueve):

 Leif Leonhardy sent some very useful contributions over the past days.
 Following is our correspondence.
 ----
 *Friday December 4, 2009:
 Hi Kevin,

 I've made some changes to [your version of] TOS's prime_sieve.c, making
 it more portable and removing the 4-byte pointer constraint (i.e. fixing
 your "-m64" problem).

 The original version also contains an overflow bug on large intervals
 (>=2^36).
 (Another overflow still occurs [very] near to 2^64, because the sum of
 [the muted] main_base and numbers_per_segment might [or will] exceed 64
 bits.)
 For use with Sage, the modulo-64* limitation on the intervals' bounds
 should be removed, too (easy, but it didn't disturb me yet ;-) ).
 I also fixed your parameter handling and output of the result.

 If you intend to build a library rather than a stand-alone program the
 memory management should be rewritten (and globals removed). Running
 many sieving threads on multi-core CPUs doesn't make much sense because
 memory (consumption and traffic) is the bottleneck; the sieving primes
 lists ("main_lists") can't be shared among tasks (TOS's optimization
 is a "sequential" one in that sense).

 Note that TOS's implementation, as he stated, was created mainly for
 illustration purposes - "simple but serious" - and wasn't intended (and
 doesn't claim) to be "fully optimized" or optimal in all aspects.

 The default values for sieve (segment) and bucket size were appropriate
 in 2002; for contemporary hardware, both should be increased. I got best
 results with the (currently) maximal bucket size of 64KB, and sieve
 segment sizes of 256KB and 1MB on an Intel Pentium 4 Prescott (1MB L2
 cache) and Intel Core2 with 6MB L2 cache per core-pair (e.g. E8400,
 Q9x50), respectively. Also, native code ("-m64"**) runs faster on x86_64
 platforms, even with u32 as the sieve basetype; using 64- rather than
 32-bit words for the sieve has probably greater effect on platforms like
 SPARC64, I guess. (Compile with "-DSIEVE_BASETYPE_U64" to get that.)

 I've attached your version of prime_sieve.c ("Silva.c.orig", 09/04/09),
 too, since I'm not sure I fetched the latest. The first diff contains a
 (working) subset of the changes I made, the second contains all changes.
 A ChangeLog (and adaption of TOS's "#if 0" test code) is in the
 pipeline... :/

 Have fun,
 -Leif

 ----

 *My reply, December 4th 2009
 Sounds great
 [...]

 On a Macbook pro, I have found that dualthreaded (simply calling TOS
 twice for two adjacent intervals) is about 1.71 times faster.  I hope
 that it might be possible to improve that.
 [...]
 Kevin Stueve
 ----
 *From Leif, December 4th 2009
 Hi again,

 I've just found another little 64bit-pointer bug in get_memory(),
 the attached diff contains only that patch.

 -Leif
 ----
 *From Leif, December 5th, 2009
 Hi Kevin,

 I've fixed the modulo-64 (or modulo-128 for 64-bit sieve words)
 restriction on interval bounds (by enlarging the sieved interval and
 subtracting the "extra" primes found if necessary);
 upper_bound==lower_bound is allowed now, too.
 [Adapting the Python code is up to you ;-)]

 The majority of "#ifdef SIEVE_BASETYPE_U64"s has also been removed in
 favour of conditional macros and constants defined at the beginning.

 -Leif

 P.S.: By "many sieving threads" (currently: processes) I meant more than
 2 or 4; but even with only few instances the processor's cache(s) might
 begin thrashing and we might run out of physical RAM; the behaviour can
 depend on not only the specific hardware but compile-time parameters
 (and does of course depend on the interval's lower bound), too.
 [Even small intervals above 4e18 require ~0.8GB (per process), intervals
 near 2^64 approx. 1.6GB. In worst case the machine starts swapping and
 multiple processes run slower, taking more time than a single would have
 taken. If I could get rid of the megs of sieving primes, I'd use the
 hundreds of cores on modern GPUs. :-) ]

-- 
Ticket URL: <http://trac.sagemath.org/sage_trac/ticket/7539#comment:2>
Sage <http://www.sagemath.org>
Sage: Creating a Viable Open Source Alternative to Magma, Maple, Mathematica, 
and MATLAB

--

You received this message because you are subscribed to the Google Groups 
"sage-trac" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/sage-trac?hl=en.


Reply via email to