I just built asperl 5.12.3 on win32, and on 4 CPUs, with the number of threads set to 4 and the calculation set to $a **= 1.3, I got almost a 4X speedup. Fantastic!

--Chris

On Mon, Sep 12, 2011 at 10:32 AM, Chris Marshall <[email protected]> wrote:
> Just pushed the ':hireswallclock' version of t/pthread.t
> and t/pthread_auto.t to PDL git. Thanks again, Dima.
>
> --Chris
>
> On Mon, Sep 12, 2011 at 7:50 AM, chm <[email protected]> wrote:
>> On 9/12/2011 5:00 AM, Dima Kogan wrote:
>>>>
>>>> On Sun, 11 Sep 2011 10:22:30 -0400
>>>> chm<[email protected]> wrote:
>>>>
>>>> Has anyone seen performance benefit from the
>>>> new auto pthread capability?
>>>>
>>>> When I run the t/pthread_auto.t test on an
>>>> AMD Athlon(tm) X2 Dual Core machine I see no
>>>> win from pthreads. It would seem that the
>>>> performance gain might depend on the complexity
>>>> of the calculation being threaded and on the
>>>> number of cores.
>>>>
>>>> Data points anyone?
>>>>
>>>> --Chris
>>>
>>> Hi.
>>>
>>> First off, the test was broken, but it seems you already fixed it
>>> (unthreaded control case was actually set to 10-way threaded). I just
>>> ran some experiments to see just how beneficial extra threads are, and
>>> it is clear that the benchmarking reported by the test is misleading.
>>> It reports the wall-clock timing with a resolution of 1 second (way too
>>> coarse to be useful) and a user timing with a resolution of 0.01
>>> seconds. The user timing counts CPU time, so it's USELESS here. If 5
>>> cores each spend 1 second doing something, the user timing would be 5
>>> seconds, even though the whole point of the automatic threading was to
>>> reduce wall-clock timing by increasing user timing.
>>
>> Thanks for investigating and clearing things up.
>>
>>> I increased the resolution of the wall-clock timing by replacing the
>>> 'use Benchmark' in the test header to
>>>
>>>   use Benchmark ':hireswallclock';
>>>   use Time::HiRes;
>>>
>>> If it's acceptable to require that Time::HiRes is available, we should
>>> make this change permanent I think.
>>
>> According to the docs, just use the new Benchmark line.
>> You don't need to use Time::HiRes at all. Benchmark
>> will quietly fall back to standard timing if Time::HiRes
>> is not available. No dependencies required.
>>
>>> This gives us useful wall-clock numbers, so I ran some tests to see how
>>> adding threads affects the computation time. I did this with the stock
>>> computation in the test ( $a += 1 ) and a more complicated computation
>>> to try to reduce the overhead costs ( $a = random(2000000); $a **= 1.3 ).
>>> The timings were done on a recent 8-core Intel machine running a recent
>>> Debian/unstable install. Wall-clock timings:
>>>
>>>
>>> | set_autopthread_targ | += 1 (500 times) | **= 1.3 (10 times) |
>>> |----------------------+------------------+--------------------|
>>> |                    0 |             1.90 |               2.15 |
>>> |                    1 |             1.90 |               2.15 |
>>> |                    2 |             1.17 |               1.10 |
>>> |                    3 |             1.15 |               1.10 |
>>> |                    4 |             0.91 |               0.56 |
>>> |                    5 |             0.89 |               0.45 |
>>> |                    6 |             0.90 |               0.45 |
>>> |                    7 |             0.90 |               0.46 |
>>> |                    8 |             0.80 |               0.29 |
>>> |                    9 |             0.80 |               0.29 |
>>> |                   10 |             0.93 |               0.39 |
>>>
>>> We can clearly see that extra threads make things go quicker. We can
>>> clearly see that the heavier computation benefits more from extra
>>> threads (lower relative overhead costs to maintain the threads). There's
>>> an interesting discrete nature to the improvement: adding a 4th thread
>>> makes a huge difference, while adding a 3rd doesn't at all. This may be
>>> due to the way the auto-threading is implemented. We can also see that
>>> when we have more threads than cores, the extra threads are a burden,
>>> not an improvement.
>>
>> Mystery solved! I guess getting pthreads working for win32 will
>> be worth it after all.
>>
>> Thanks,
>> Chris
>>
>
_______________________________________________
Perldl mailing list
[email protected]
http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
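
The wall-clock timings in Dima's table can be turned into speedup factors relative to the unthreaded run, which makes the comparison between the two computations easier to read. Here is a minimal sketch in plain Perl (no PDL needed). The sub name speedup and the array names are made up for illustration; the numbers are copied from the table and are taken to be seconds.

```perl
use strict;
use warnings;

# Wall-clock timings copied from the table in the thread, indexed by
# the set_autopthread_targ value (0 .. 10). Assumed to be seconds.
my @plus_one = (1.90, 1.90, 1.17, 1.15, 0.91, 0.89, 0.90, 0.90, 0.80, 0.80, 0.93);
my @power    = (2.15, 2.15, 1.10, 1.10, 0.56, 0.45, 0.45, 0.46, 0.29, 0.29, 0.39);

# speedup(\@times): hypothetical helper that returns, for each entry,
# the ratio of the unthreaded time (index 0) to that entry's time.
sub speedup {
    my ($times) = @_;
    my $base = $times->[0];
    return map { $base / $_ } @$times;
}

my @s_plus_one = speedup(\@plus_one);
my @s_power    = speedup(\@power);

# Print one row per thread target with both speedup factors.
printf "%-7s %10s %10s\n", 'threads', '+= 1', '**= 1.3';
for my $n (0 .. $#plus_one) {
    printf "%-7d %9.2fx %9.2fx\n", $n, $s_plus_one[$n], $s_power[$n];
}
```

On these figures the **= 1.3 case reaches roughly 7.4x at 8 threads, while += 1 tops out near 2.4x, which is consistent with the observation that the heavier computation benefits more from extra threads.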
