Just pushed the ':hireswallclock' version of t/pthread.t and t/pthread_auto.t to PDL git. Thanks again, Dima.
--Chris

On Mon, Sep 12, 2011 at 7:50 AM, chm <[email protected]> wrote:
> On 9/12/2011 5:00 AM, Dima Kogan wrote:
>>>
>>> On Sun, 11 Sep 2011 10:22:30 -0400
>>> chm <[email protected]> wrote:
>>>
>>> Has anyone seen performance benefit from the
>>> new auto pthread capability?
>>>
>>> When I run the t/pthread_auto.t test on an
>>> AMD Athlon(tm) X2 Dual Core machine I see no
>>> win from pthreads. It would seem that the
>>> performance gain might depend on the complexity
>>> of the calculation being threaded and on the
>>> number of cores.
>>>
>>> Data points anyone?
>>>
>>> --Chris
>>
>> Hi.
>>
>> First off, the test was broken, but it seems you already fixed it
>> (the unthreaded control case was actually set to 10-way threaded). I
>> just ran some experiments to see just how beneficial extra threads
>> are, and it is clear that the benchmarking reported by the test is
>> misleading. It reports the wall-clock timing with a resolution of
>> 1 second (way too coarse to be useful) and a user timing with a
>> resolution of 0.01 seconds. The user timing counts CPU time, so it's
>> USELESS here. If 5 cores each spend 1 second doing something, the
>> user timing would be 5 seconds, even though the whole point of the
>> automatic threading was to reduce wall-clock timing by increasing
>> user timing.
>
> Thanks for investigating and clearing things up.
>
>> I increased the resolution of the wall-clock timing by changing the
>> 'use Benchmark' line in the test header to
>>
>>     use Benchmark ':hireswallclock';
>>     use Time::HiRes;
>>
>> If it's acceptable to require that Time::HiRes is available, we
>> should make this change permanent I think.
>
> According to the docs, just use the new Benchmark line.
> You don't need to use Time::HiRes at all. Benchmark
> will quietly fall back to standard timing if Time::HiRes
> is not available. No dependencies required.
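The following is a minimal sketch of the timing change discussed above. It uses only the core Benchmark module. With the ':hireswallclock' import, `timeit` takes its wall-clock reading from Time::HiRes when that module is available. The returned object reports wall-clock time (`real`) and total CPU time (`cpu_a`) separately, which is the distinction Dima draws. The helper name `time_both` and the busy-loop workload are hypothetical stand-ins for the PDL operation in t/pthread_auto.t, not code from the test itself.

```perl
use strict;
use warnings;
use Benchmark qw(:hireswallclock timeit timestr);

# Hypothetical helper: run $code $n times and return
# (wall-clock seconds, CPU seconds, Benchmark's summary string).
# With ':hireswallclock', the wall-clock figure uses Time::HiRes if it is
# installed; otherwise Benchmark silently falls back to whole-second time().
sub time_both {
    my ($n, $code) = @_;
    my $t = timeit($n, $code);
    # real  = elapsed wall-clock seconds (what threading should reduce)
    # cpu_a = user + system CPU seconds, including waited-for children;
    #         on Linux this is summed over all threads of the process,
    #         so it can exceed real when several cores work at once
    return ($t->real, $t->cpu_a, timestr($t));
}

# Stand-in workload (the PDL test would do something like $a += 1 here).
my $work = sub { my $s = 0; $s += $_ for 1 .. 200_000; return $s };

my ($wall, $cpu, $report) = time_both(5, $work);
printf "wall-clock: %.3f s, CPU: %.3f s\n", $wall, $cpu;
print "Benchmark says: $report\n";
```

For a threaded run, you would compare `real` across thread counts. `cpu_a` is expected to stay flat or grow as threads are added.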
>
>> This gives us useful wall-clock numbers, so I ran some tests to see
>> how adding threads affects the computation time. I did this with the
>> stock computation in the test ( $a += 1 ) and a more complicated
>> computation to try to reduce the overhead costs
>> ( $a = random(2000000); $a **= 1.3 ). The timings were done on a
>> recent 8-core Intel machine running a recent Debian/unstable install.
>> Wall-clock timings (seconds):
>>
>> | set_autopthread_targ | += 1 (500 times) | **= 1.3 (10 times) |
>> |----------------------+------------------+--------------------|
>> |                    0 |             1.90 |               2.15 |
>> |                    1 |             1.90 |               2.15 |
>> |                    2 |             1.17 |               1.10 |
>> |                    3 |             1.15 |               1.10 |
>> |                    4 |             0.91 |               0.56 |
>> |                    5 |             0.89 |               0.45 |
>> |                    6 |             0.90 |               0.45 |
>> |                    7 |             0.90 |               0.46 |
>> |                    8 |             0.80 |               0.29 |
>> |                    9 |             0.80 |               0.29 |
>> |                   10 |             0.93 |               0.39 |
>>
>> We can clearly see that extra threads make things go quicker. We can
>> clearly see that the heavier computation benefits more from extra
>> threads (lower relative overhead costs to maintain the threads).
>> There's an interesting discrete nature to the improvement: adding a
>> 4th thread makes a huge difference, while adding a 3rd doesn't at
>> all. This may be due to the way the auto-threading is implemented. We
>> can also see that when we have more threads than cores, the extra
>> threads are a burden, not an improvement.
>
> Mystery solved! I guess getting pthreads working for win32 will
> be worth it after all.
>
> Thanks,
> Chris

_______________________________________________
Perldl mailing list
[email protected]
http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
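The relative benefit in the timings above is easier to compare as a speedup, S(n) = T(0) / T(n). Here T(n) is the wall-clock time with set_autopthread_targ set to n. The small Perl sketch below computes S(n) from the posted numbers only. It makes no new measurements; the arrays are copied from the table. At n = 8 it gives roughly 2.4x for `+= 1` and roughly 7.4x for `**= 1.3`, which puts a number on the "heavier computation benefits more" observation.

```perl
use strict;
use warnings;

# Wall-clock seconds copied from the table in this thread
# (array index = set_autopthread_targ value, 0..10).
my @add_one = (1.90, 1.90, 1.17, 1.15, 0.91, 0.89, 0.90, 0.90, 0.80, 0.80, 0.93);
my @pow13   = (2.15, 2.15, 1.10, 1.10, 0.56, 0.45, 0.45, 0.46, 0.29, 0.29, 0.39);

# Speedup of each run relative to the first (unthreaded) one:
# S(n) = T(0) / T(n).
sub speedups {
    my @t = @_;
    return map { $t[0] / $_ } @t;
}

my @s_add = speedups(@add_one);
my @s_pow = speedups(@pow13);

printf "%2s  %8s  %8s\n", 'n', '+= 1', '**= 1.3';
printf "%2d  %8.2f  %8.2f\n", $_, $s_add[$_], $s_pow[$_] for 0 .. $#add_one;
```

On an 8-core machine, the ideal ceiling for S(n) is about 8. The drop at n = 10 in both columns matches the remark that threads beyond the core count add overhead.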
