Gonzalo Tornaria wrote:
On Wed, Jan 13, 2010 at 5:52 PM, William Stein <wst...@gmail.com> wrote:
What matters for this benchmark is the number of cores that the computer has.
Though t2 can manage 128 hardware threads, it only has 16 actual *cores*.
Not quite; the following is in a box with 8 cores -- 16 threads:
sage: time b = bernoulli(10^5, algorithm='bernmm', num_threads=1)
CPU times: user 4.84 s, sys: 0.00 s, total: 4.84 s
Wall time: 4.84 s
sage: timeit("bernoulli(10^5, algorithm='bernmm', num_threads=8)")
5 loops, best of 3: 1.06 s per loop
sage: timeit("bernoulli(10^5, algorithm='bernmm', num_threads=16)")
5 loops, best of 3: 914 ms per loop
That's a 15% speedup for the extra threads... Not bad, given that going
4 --> 8 threads on the same machine or going 8 --> 16 threads on
sage.math gives about 30% speedup for the same computation.
I'd guess for a really cpu-bound task, the number of cores should be
it, but for memory-bound tasks, having more threads may be an
advantage because of memory latencies, as long as the threads don't
have to compete for cache. In the end, whatever optimizes the usage of
cache size / memory bandwidth between threads is probably best.
Also, HT could optimize the use of arithmetic units between threads in
the same core, so unless the inner loops have perfect scheduling,
there's always something to gain from this approach.
Best, Gonzalo
There definitely does seem to be something gained from having more threads in
your software than there are hardware threads on the machine.
I was surprised by the fact that increasing the number of threads dramatically did
not have a negative impact on performance. I'm sure at some point it must do,
but on that little program I attached, running on my own machine (4 cores, 8
threads), using 1000 threads was actually a bit quicker than using 16, though I
have not done sufficient analysis to say whether that is statistically
significant or not.
1 thread - 0m58.633s
2 threads - 0m44.079s
4 threads - 0m28.383s (number of cores on my machine)
8 threads - 0m18.993s (number of hardware threads)
16 threads - 0m15.799s
32 threads - 0m15.385s
1000 threads - 0m15.067s
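
To make the trend in the timings above easier to read, they can be converted to speedups relative to the single-thread run. This is only a small sketch: the dictionary copies the wall-clock figures listed above, and the names `timings` and `speedups` are illustrative.

```python
# Wall-clock seconds per thread count, copied from the list above.
timings = {
    1: 58.633,
    2: 44.079,
    4: 28.383,     # number of cores
    8: 18.993,     # number of hardware threads
    16: 15.799,
    32: 15.385,
    1000: 15.067,
}

def speedups(timings):
    """Return speedup of each run relative to the 1-thread run."""
    base = timings[1]
    return {n: base / t for n, t in timings.items()}

for n, s in speedups(timings).items():
    print(f"{n:>5} threads: {timings[n]:7.3f} s   speedup {s:.2f}x")
```

On these numbers the speedup keeps growing past 8 threads but levels off a little under 4x, which fits a machine with 4 physical cores.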
Certainly, on my machine at least, having too few threads in the software has a
major impact on performance. So far having lots of them does not seem to be a
problem.
In fact, I just tried using 2000 threads, and there was no obvious impact on performance:
drkir...@hawk:~$ time ./parallel 1000000 2000
Using 2000 threads
N primes up to 1000000 is: 78499
last is: 999983
real 0m15.066s
user 1m59.857s
sys 0m0.099s
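
As a rough cross-check, the ratio of CPU time (user + sys) to wall-clock time in the output above estimates how many hardware threads were busy on average. A minimal sketch, assuming the usual `XmY.YYYs` format printed by the shell's `time`; the helper `parse_time` is made up for illustration.

```python
import re

def parse_time(field):
    """Convert a shell `time` field such as '1m59.857s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", field)
    if match is None:
        raise ValueError(f"unrecognised time field: {field!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)

# Values copied from the `time ./parallel 1000000 2000` output above.
real = parse_time("0m15.066s")
user = parse_time("1m59.857s")
sys_time = parse_time("0m0.099s")

# CPU seconds consumed per wall-clock second.
busy = (user + sys_time) / real
print(f"average busy hardware threads: {busy:.2f}")
```

The ratio comes out just under 8, which is consistent with all 8 hardware threads on the 4-core machine being kept busy for nearly the whole run.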
I tried higher numbers, but there is a limit on the number of threads I can
create. 2000 is OK, 5000 is not. I was hoping to see what 1,000,000 threads
would be like, but it doesn't work.
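
The cause of the ceiling between 2000 and 5000 threads is not diagnosed here. One place to look, purely as an assumption, is the per-process resource limits, since each thread needs its own stack. The sketch below uses Python's standard `resource` module (Unix-like systems only) to print a few limits that commonly matter. Which of them, if any, applies to this machine is unknown, and `describe_limit` and `fmt` are illustrative helpers.

```python
import resource

def describe_limit(name):
    """Return (soft, hard) for a named RLIMIT_* constant,
    or None if this platform does not define it."""
    const = getattr(resource, name, None)
    if const is None:
        return None
    return resource.getrlimit(const)

def fmt(value):
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)

# Limits that can plausibly cap thread creation (an assumption, not a diagnosis).
for name in ("RLIMIT_NPROC", "RLIMIT_STACK", "RLIMIT_AS"):
    limits = describe_limit(name)
    if limits is None:
        print(f"{name}: not available on this platform")
    else:
        soft, hard = limits
        print(f"{name}: soft={fmt(soft)} hard={fmt(hard)}")
```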
I'm sure this is one of those things for which there are no hard and fast rules,
and that to get the ultimate performance, one needs to run individual tests
for the particular software/hardware combination in use.
Dave