#7539: primes.p0.spkg with "prime_sieve.c" functionality
-----------------------------+----------------------------------------------
Reporter: GeorgSWeber | Owner: rohana
Type: enhancement | Status: new
Priority: major | Milestone: sage-4.3
Component: number theory | Keywords:
Work_issues: | Author: rohana, GeorgSWeber
Upstream: N/A | Reviewer:
Merged: |
-----------------------------+----------------------------------------------
Comment(by kevin.stueve):
Leif Leonhardy sent some very useful contributions over the past days.
Following is our correspondence.
----
*Friday December 4, 2009:
Hi Kevin,
I've made some changes to [your version of] TOS's prime_sieve.c, making
it more portable and removing the 4-byte pointer constraint (i.e. fixing
your "-m64" problem).
The original version also contains an overflow bug on large intervals
(>=2^36).
(Another overflow still occurs [very] near to 2^64, because the sum of
[the muted] main_base and numbers_per_segment might [or will] exceed 64
bits.)
For use with Sage, the modulo-64* limitation on the intervals' bounds
should be removed, too (easy, but it didn't disturb me yet ;-) ).
I also fixed your parameter handling and output of the result.
If you intend to build a library rather than a stand-alone program the
memory management should be rewritten (and globals removed). Running
many sieving threads on multi-core CPUs doesn't make much sense because
memory (consumption and traffic) is the bottleneck; the sieving primes
lists ("main_lists") can't be shared among tasks (TOS's optimization
is a "sequential" one in that sense).
Note that TOS's implementation, as he stated, was created mainly for
illustration purposes - "simple but serious" - and wasn't intended (and
doesn't claim) to be "fully optimized" or optimal in all aspects.
The default values for sieve (segment) and bucket size were appropriate
in 2002; for contemporary hardware, both should be increased. I got best
results with the (currently) maximal bucket size of 64KB, and sieve
segment sizes of 256KB and 1MB on an Intel Pentium 4 Prescott (1MB L2
cache) and Intel Core2 with 6MB L2 cache per core-pair (e.g. E8400,
Q9x50), respectively. Also, native code ("-m64"**) runs faster on x86_64
platforms, even with u32 as the sieve basetype; using 64- rather than
32-bit words for the sieve has probably greater effect on platforms like
SPARC64, I guess. (Compile with "-DSIEVE_BASETYPE_U64" to get that.)
I've attached your version of prime_sieve.c ("Silva.c.orig", 09/04/09),
too, since I'm not sure I fetched the latest. The first diff contains a
(working) subset of the changes I made, the second contains all changes.
A ChangeLog (and adaption of TOS's "#if 0" test code) is in the
pipeline... :/
Have fun,
-Leif
----
*My reply, December 4th 2009
Sounds great
[...]
On a Macbook pro, I have found that dualthreaded (simply calling TOS
twice for two adjacent intervals) is about 1.71 times faster. I hope
that it might be possible to improve that.
[...]
Kevin Stueve
----
*From Leif, December 4th 2009
Hi again,
I've just found another little 64bit-pointer bug in get_memory(),
the attached diff contains only that patch.
-Leif
----
*From Leif, December 5th, 2009
Hi Kevin,
I've fixed the modulo-64 (or modulo-128 for 64-bit sieve words)
restriction on interval bounds (by enlarging the sieved interval and
subtracting the "extra" primes found if necessary);
upper_bound==lower_bound is allowed now, too.
[Adapting the Python code is up to you ;-)]
The majority of "#ifdef SIEVE_BASETYPE_U64"s has also been removed in
favour of conditional macros and constants defined at the beginning.
-Leif
P.S.: By "many sieving threads" (currently: processes) I meant more than
2 or 4; but even with only few instances the processor's cache(s) might
begin thrashing and we might run out of physical RAM; the behaviour can
depend on not only the specific hardware but compile-time parameters
(and does of course depend on the interval's lower bound), too.
[Even small intervals above 4e18 require ~0.8GB (per process), intervals
near 2^64 approx. 1.6GB. In worst case the machine starts swapping and
multiple processes run slower, taking more time than a single would have
taken. If I could get rid of the megs of sieving primes, I'd use the
hundreds of cores on modern GPUs. :-) ]
--
Ticket URL: <http://trac.sagemath.org/sage_trac/ticket/7539#comment:2>
Sage <http://www.sagemath.org>
Sage: Creating a Viable Open Source Alternative to Magma, Maple, Mathematica,
and MATLAB
--
You received this message because you are subscribed to the Google Groups
"sage-trac" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to
[email protected].
For more options, visit this group at
http://groups.google.com/group/sage-trac?hl=en.