On 9/12/2011 5:00 AM, Dima Kogan wrote:
On Sun, 11 Sep 2011 10:22:30 -0400
chm<[email protected]> wrote:
Has anyone seen performance benefit from the
new auto pthread capability?
When I run the t/pthread_auto.t test on an
AMD Athlon(tm) X2 Dual Core machine I see no
win from pthreads. It would seem that the
performance gain might depend on the complexity
of the calculation being threaded and on the
number of cores.
Data points anyone?
--Chris
Hi.
First off, the test was broken, but it seems you already fixed it (unthreaded
control case was actually set to 10-way threaded). I just ran some experiments
to see just how beneficial extra threads are, and it is clear that the
benchmarking reported by the test is misleading. It reports the wall-clock
timing with a resolution of 1 second (way too coarse to be useful) and a user
timing with a resolution of 0.01 seconds. The user timing counts CPU time, so
it's USELESS here. If 5 cores each spend 1 second doing something, the user
timing would be 5 seconds even though only about 1 second of wall-clock time
passed. The whole point of the automatic threading is to reduce wall-clock time,
even if the total user time goes up.
Thanks for investigating and clearing things up.
I increased the resolution of the wall-clock timing by replacing the 'use
Benchmark' line in the test header with
use Benchmark ':hireswallclock';
use Time::HiRes;
If it's acceptable to require that Time::HiRes is available, we should make this
change permanent I think.
According to the Benchmark docs, the new 'use Benchmark' line is
enough on its own. You don't need to use Time::HiRes at all:
Benchmark will quietly fall back to standard timing if Time::HiRes
is not available, so no extra dependency is required.
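For reference, a minimal sketch of timing a block of code with the high-resolution wall clock and reading the wall-clock and CPU figures separately. The busy_work() routine is a made-up stand-in workload, not anything from the test; only Benchmark itself is assumed.

```perl
use strict;
use warnings;

# ':hireswallclock' asks Benchmark for sub-second wall-clock timing;
# it falls back to 1-second resolution if Time::HiRes is missing.
use Benchmark qw(:hireswallclock timediff timestr);

# Hypothetical stand-in workload, for illustration only.
sub busy_work {
    my $sum = 0;
    $sum += sqrt($_) for 1 .. 200_000;
    return $sum;
}

my $t0 = Benchmark->new;    # snapshot before the work
busy_work();
my $t1 = Benchmark->new;    # snapshot after the work

my $td = timediff($t1, $t0);

# real() is elapsed wall-clock time; cpu_p() is user+system CPU time
# of this process. With threads, the CPU figure can exceed the wall clock.
printf "wall-clock:    %.4f s\n", $td->real;
printf "CPU (usr+sys): %.2f s\n", $td->cpu_p;
print  "summary:       ", timestr($td), "\n";
```

The wall-clock number is the one to watch when judging whether threading helped.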
This gives us useful wall-clock numbers, so I ran some tests to see how adding
threads affects the computation time. I did this with the stock computation in
the test ( $a += 1 ) and a more complicated computation to try to reduce the
overhead costs ( $a = random(2000000); $a **= 1.3 ). The timings were done on a
recent 8-core Intel machine running a recent Debian/unstable install. Wall-clock
timings, in seconds:
| set_autopthread_targ | += 1 (500 times) | **= 1.3 (10 times) |
|----------------------+------------------+--------------------|
| 0 | 1.90 | 2.15 |
| 1 | 1.90 | 2.15 |
| 2 | 1.17 | 1.10 |
| 3 | 1.15 | 1.10 |
| 4 | 0.91 | 0.56 |
| 5 | 0.89 | 0.45 |
| 6 | 0.90 | 0.45 |
| 7 | 0.90 | 0.46 |
| 8 | 0.80 | 0.29 |
| 9 | 0.80 | 0.29 |
| 10 | 0.93 | 0.39 |
We can clearly see that extra threads make things go quicker, and that the
heavier computation benefits more from them (lower relative overhead costs to
maintain the threads). There's an interesting discrete nature to the
improvement: adding a 4th thread makes a huge difference, while adding a 3rd
makes almost none. This may be due to the way the auto-threading is
implemented. We can also see that once we have more threads than cores (10
threads on this 8-core machine), the extra threads are a burden, not an
improvement.
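A sweep like the one in the table could be sketched roughly as follows. This is not the actual test script: it assumes a PDL build with pthread support, uses Time::HiRes directly for the wall clock, and guesses that the random data is generated once per setting, outside the timed loop.

```perl
use strict;
use warnings;
use PDL;
use Time::HiRes qw(time);

# Minimum piddle size (in units of 2**20 elements) before auto-threading
# is attempted; 0 is assumed here to mean "no lower bound".
set_autopthread_size(0);

for my $targ (0, 1, 2, 4, 8) {
    set_autopthread_targ($targ);    # requested number of pthreads

    my $x = random(2_000_000);      # assumption: data built outside the timed part

    my $start = time;
    $x **= 1.3 for 1 .. 10;         # the heavier computation, 10 times
    my $elapsed = time - $start;

    # get_autopthread_actual() reports how many pthreads the last
    # operation really used (0 if PDL was built without pthreads).
    printf "targ=%2d actual=%2d wall=%.3f s\n",
        $targ, get_autopthread_actual(), $elapsed;
}
```

Printing the actual thread count next to the target may help explain the step-like pattern, since the number of threads really used can differ from the number requested.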
Mystery solved! I guess getting pthreads working for win32 will
be worth it after all.
Thanks,
Chris