There is apparently a problem with your computer, your compilation options, or something else. If you run an ordinary simulation with -j4 and many particles, do you see 4 cores in use?
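One thing that could be checked quickly is whether the process is even allowed to run on more than one core. The sketch below is only a guess at a possible cause (a restricted CPU affinity mask), not a diagnosis; `allowed_cpus` is a hypothetical helper name, and `os.sched_getaffinity` is only available on Linux, so a fallback to `os.cpu_count()` is assumed elsewhere.

```python
import os


def allowed_cpus():
    """Return the set of logical CPU indices this process may be scheduled on."""
    if hasattr(os, "sched_getaffinity"):
        # Linux: the affinity mask of the current process (pid 0 means "self").
        return set(os.sched_getaffinity(0))
    # Assumption: without affinity support, treat every logical CPU as usable.
    return set(range(os.cpu_count() or 1))


if __name__ == "__main__":
    cpus = allowed_cpus()
    print("logical CPUs reported by the OS:", os.cpu_count())
    print("CPUs this process may run on:   ", sorted(cpus))
    if len(cpus) < 4:
        print("fewer than 4 CPUs are available, so 4 threads would have to share cores")
```

Run from the same shell (or from the Python prompt of the same session) that launches the benchmark, it shows whether the 4 threads have 4 cores to spread over.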
Bruno

On 25/02/14 16:26, Christian Jakob wrote:
> Hi Bruno,
>
> I did some tests with your new collider:
>
> My "old" machine (2 CPU sockets with 4 cores each, Intel(R) Xeon(R)
> CPU X5460 @ 3.16GHz) says:
>
> yade-trunk -j4 --performance
>
> Welcome to Yade 2014-02-18.git-af75797
> .....
> number of bodies 200813
>
> Elapsed 74.6882498264 sec
> Performance 2.67779738399 iter/sec
> Extrapolation on 1e5 iters 10.3733680314 hours
> =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
> Name                      Count          Time  Rel. time
> --------------------------------------------------------
> ForceResetter               200     2625848us      3.52%
> InsertionSortCollider         7    21494603us     28.79%
> InteractionLoop             200    32631323us     43.70%
> NewtonIntegrator            200    17913859us     23.99%
> TOTAL                              74665635us    100.00%
>
> Common time 3845.09048295 s
>
> Calculation velocity is unstable, try to close all programs and start
> performance tests again
> 5037 spheres, velocity= 44.7832284176 +- 60.1189421161 %
> 25103 spheres, velocity= 17.4121076601 +- 0.99355345037 %
> 50250 spheres, velocity= 10.0714940216 +- 1.53896666769 %
> 100467 spheres, velocity= 5.05891811219 +- 0.434738330959 %
> 200813 spheres, velocity= 2.65826879857 +- 0.933088603948 %
>
> SCORE: 3479
> Number of threads 4
>
> ....
>
> ###########################################################
>
> yade-parallel -j4 --performance (your pc branch)
>
> Welcome to Yade 2014-02-24.git-b60d388
> .....
> number of bodies 200813
>
> Elapsed 75.6688189507 sec
> Performance 2.64309662518 iter/sec
> Extrapolation on 1e5 iters 10.5095581876 hours
> =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
> Name                      Count          Time  Rel. time
> --------------------------------------------------------
> ForceResetter               200     2600100us      3.44%
> InsertionSortCollider         7    20746020us     27.43%
> InteractionLoop             200    34455725us     45.55%
> NewtonIntegrator            200    17838205us     23.58%
> TOTAL                              75640051us    100.00%
>
> Common time 4093.34840894 s
>
> Calculation velocity is unstable, try to close all programs and start
> performance tests again
> 5037 spheres, velocity= 44.3999135517 +- 61.0812025756 %
> 25103 spheres, velocity= 16.8531534243 +- 1.32470154863 %
> 50250 spheres, velocity= 9.61504490252 +- 0.670186229301 %
> 100467 spheres, velocity= 4.86679881913 +- 0.487840014886 %
> 200813 spheres, velocity= 2.64490152313 +- 0.285084118261 %
>
> SCORE: 3402
> Number of threads 4
>
> ######################################################
>
> On my computer there seems to be nearly no speedup ...
>
> Looking at htop tells me that -j4 --performance is using 4 threads,
> but just on 1 core ...
>
> Regards,
>
> Christian
>
> Quoting Bruno Chareyre <bruno.chare...@hmg.inpg.fr>:
>
>> Hi there,
>> I implemented a parallel version of the InsertionSortCollider. It is
>> almost ready but not yet pushed to the main trunk, as I have a few
>> things to check before that.
>> It would be helpful if some of you could 1/ test that your scripts work
>> correctly and 2/ benchmark this for N>100k and j>4.
>> If you run benchmarks, please remember to always activate timing and
>> report the result of timing.stats(). It gives much more interesting data
>> than the wall clock time.
>>
>> Preliminary benchmark results are below (from my laptop...), showing a
>> speedup by a factor of 2 on the total computation time for j4/200k
>> particles (compared to the sequential collider).
>> The speedup on the collider alone is in fact of the order of x3.68 for 4
>> threads. Nearly linear, at least for such a small number of threads.
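As a rough cross-check of the speedup figures, the arithmetic can be redone from the timing tables quoted in this thread. This is only a sketch: the four times are copied from the trunk and "pc" branch tables of the laptop benchmark, while `speedup` and `parallel_efficiency` are hypothetical helper names introduced for illustration. A single run per configuration is assumed, so the result should only be expected to land in the same range as the figures quoted above.

```python
def speedup(t_before_us, t_after_us):
    """Ratio of the old time to the new time (values above 1 mean faster)."""
    return t_before_us / t_after_us


def parallel_efficiency(s, n_threads):
    """Speedup divided by thread count (1.0 would be perfectly linear scaling)."""
    return s / n_threads


# Times in microseconds, copied from the timing tables of the laptop benchmark.
COLLIDER_TRUNK_US = 18816625  # InsertionSortCollider, current trunk
COLLIDER_PC_US = 5145114      # InsertionSortCollider, "pc" branch
TOTAL_TRUNK_US = 29391910     # TOTAL, current trunk
TOTAL_PC_US = 15393110        # TOTAL, "pc" branch
THREADS = 4

if __name__ == "__main__":
    s_collider = speedup(COLLIDER_TRUNK_US, COLLIDER_PC_US)
    s_total = speedup(TOTAL_TRUNK_US, TOTAL_PC_US)
    print(f"collider speedup:    x{s_collider:.2f}")
    print(f"total speedup:       x{s_total:.2f}")
    print(f"collider efficiency: {parallel_efficiency(s_collider, THREADS):.0%}")
```

The same two functions can be applied to any pair of timing.stats() reports, which makes results from different machines easier to compare than raw wall clock times.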
>>
>> My expectation is that it should change almost nothing for a small number
>> of particles (say, N<10k), where colliding is an inexpensive step.
>> For 1 million particles, OTOH, there could be a significant speedup,
>> since the collider takes most of the time.
>>
>> You can get the "pc" branch at my github repo:
>> git clone -b pc https://github.com/bchareyre/trunk.git
>>
>> Results of yade -j4 --performance are below (I7 quad-core with
>> hyperthreading enabled, lightly loaded by background tasks - j>4 not
>> reported as hyperthreading is probably doing no good).
>>
>> Happy benchmarking. :)
>>
>> Bruno
>>
>> ====================
>> ./yade-trunk -j4 --performance (the current trunk)
>> .......
>> number of bodies 200813
>>
>> Elapsed 29.4102840424 sec
>> Performance 6.80034234664 iter/sec
>> Extrapolation on 1e5 iters 4.08476167255 hours
>> =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
>> Name                      Count          Time  Rel. time
>> --------------------------------------------------------
>> ForceResetter               200      700881us      2.38%
>> InsertionSortCollider         7    18816625us     64.02%
>> InteractionLoop             200     6581283us     22.39%
>> NewtonIntegrator            200     3293119us     11.20%
>> TOTAL                              29391910us    100.00%
>>
>> Common time 597.731503963 s
>>
>> 5037 spheres, velocity= 327.689688709 +- 5.13604387635 %
>> 25103 spheres, velocity= 81.2726909754 +- 1.0105334405 %
>> 50250 spheres, velocity= 45.4114521341 +- 3.02333274436 %
>> 100467 spheres, velocity= 19.0287424005 +- 2.26073439157 %
>> 200813 spheres, velocity= 6.51664351023 +- 4.03351515402 %
>>
>> SCORE: 13777
>> Number of threads 4
>>
>> ========================
>> ./yade-parallel -j4 --performance (my "pc" branch)
>> ....
>>
>> number of bodies 200813
>>
>> Elapsed 15.4320101738 sec
>> Performance 12.9600744004 iter/sec
>> Extrapolation on 1e5 iters 2.14333474636 hours
>> =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
>> Name                      Count          Time  Rel. time
>> --------------------------------------------------------
>> ForceResetter               200      671157us      4.36%
>> InsertionSortCollider         7     5145114us     33.42%
>>   boundDispatcher             7       93186us      1.81%
>>   bound                       7          12us      0.00%
>>   copy                        7      160891us      3.13%
>>   erase                       7       66932us      1.30%
>>   sort&collide                7     4824071us     93.76%
>>   TOTAL                      35     5145095us    100.00%
>> InteractionLoop             200     6545848us     42.52%
>> NewtonIntegrator            200     3030989us     19.69%
>> TOTAL                              15393110us    100.00%
>>
>> Common time 460.37680912 s
>>
>> 5037 spheres, velocity= 365.599773471 +- 8.02397068512 %
>> 25103 spheres, velocity= 92.0077536966 +- 3.81069496509 %
>> 50250 spheres, velocity= 54.1683980588 +- 0.528288534811 %
>> 100467 spheres, velocity= 25.7134767981 +- 1.0796373464 %
>> 200813 spheres, velocity= 12.6488486429 +- 4.66276699319 %
>>
>> SCORE: 18800
>> Number of threads 4
>>
>> _______________________________________________
>> Mailing list: https://launchpad.net/~yade-dev
>> Post to     : yade-dev@lists.launchpad.net
>> Unsubscribe : https://launchpad.net/~yade-dev
>> More help   : https://help.launchpad.net/ListHelp

--
_______________
Bruno Chareyre
Associate Professor
ENSE³ - Grenoble INP
Lab. 3SR
BP 53
38041 Grenoble cedex 9
Tél : +33 4 56 52 86 21
Fax : +33 4 76 82 70 43
________________