I forgot to mention two things:

1. I tried the benchmark used by Christian for the comparisons with PFC [1], but this test seems to be a very special case. I get large differences between two runs. Basically, the simulation seems to depend only on truncation errors: vertical columns of spheres remain stable until a small amount of horizontal noise makes them fall down one by one. If you look at the simulation in the GUI, it looks strange. I did not pursue this one; I think it could be improved by replacing the lattice with disordered packings.

2. The benchmark done by Alexander some time ago (on the same problem but with -j>1) is not visible anywhere, if I'm not mistaken. I have a copy of the PDF; is it OK to upload it to the wiki? It is an interesting starting point for evaluating the parallel collider.

Bruno

[1] https://www.yade-dem.org/wiki/Comparisons_with_PFC3D

On 24/02/14 16:36, Bruno Chareyre wrote:
> Hi there,
> I implemented a parallel version of the InsertionSortCollider. It is
> almost ready but not yet pushed to the main trunk, as I have a few
> things to check before that.
> It would be helpful if some of you could 1/ test that your scripts work
> correctly and 2/ benchmark this for N>100k and j>4.
> If you run benchmarks, please remember to always activate timing and
> report the result of timing.stats(). It gives much more interesting data
> than the wall clock time.
>
> Preliminary benchmark results are below (from my laptop...), showing a
> speedup by a factor 2 on the total computation time for j4/200k
> particles (compared to the sequential collider).
> The speedup on collider alone is in fact of the order of x3.68 for 4
> threads. Nearly linear at least for such small number of threads.
>
> My expectation is that it should change almost nothing for small number
> of particles (say, N<10k), where colliding is an inexpensive step.
> For 1million of particles OTOH, there could be significant speedup,
> since the collider takes most of the time.
>
> You can get the "pc" branch at my github repo:
> git clone -b pc https://github.com/bchareyre/trunk.git
>
> Results of yade -j4 --performance are below (I7 quad-core with
> hyperthreading enabled, lightly loaded by background tasks - j>4 not
> reported as hyperthreading is probably doing no good).
>
> Happy benchmarking. :)
>
> Bruno
>
>
> ====================
> ./yade-trunk -j4 --performance (the current trunk)
> .......
> number of bodies 200813
>
> Elapsed 29.4102840424 sec
> Performance 6.80034234664 iter/sec
> Extrapolation on 1e5 iters 4.08476167255 hours
> =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
> Name                       Count          Time   Rel. time
> ----------------------------------------------------------
> ForceResetter                200      700881us       2.38%
> InsertionSortCollider          7    18816625us      64.02%
> InteractionLoop              200     6581283us      22.39%
> NewtonIntegrator             200     3293119us      11.20%
> TOTAL                               29391910us     100.00%
>
> Common time 597.731503963 s
>
>
> 5037 spheres, velocity= 327.689688709 +- 5.13604387635 %
> 25103 spheres, velocity= 81.2726909754 +- 1.0105334405 %
> 50250 spheres, velocity= 45.4114521341 +- 3.02333274436 %
> 100467 spheres, velocity= 19.0287424005 +- 2.26073439157 %
> 200813 spheres, velocity= 6.51664351023 +- 4.03351515402 %
>
>
> SCORE: 13777
> Number of threads 4
>
>
> ========================
> ./yade-parallel -j4 --performance (my "pc" branch)
> ....
>
> number of bodies 200813
>
> Elapsed 15.4320101738 sec
> Performance 12.9600744004 iter/sec
> Extrapolation on 1e5 iters 2.14333474636 hours
> =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
> Name                       Count          Time   Rel. time
> ----------------------------------------------------------
> ForceResetter                200      671157us       4.36%
> InsertionSortCollider          7     5145114us      33.42%
>   boundDispatcher              7       93186us       1.81%
>   bound                        7          12us       0.00%
>   copy                         7      160891us       3.13%
>   erase                        7       66932us       1.30%
>   sort&collide                 7     4824071us      93.76%
>   TOTAL                       35     5145095us     100.00%
> InteractionLoop              200     6545848us      42.52%
> NewtonIntegrator             200     3030989us      19.69%
> TOTAL                               15393110us     100.00%
>
> Common time 460.37680912 s
>
>
> 5037 spheres, velocity= 365.599773471 +- 8.02397068512 %
> 25103 spheres, velocity= 92.0077536966 +- 3.81069496509 %
> 50250 spheres, velocity= 54.1683980588 +- 0.528288534811 %
> 100467 spheres, velocity= 25.7134767981 +- 1.0796373464 %
> 200813 spheres, velocity= 12.6488486429 +- 4.66276699319 %
>
>
> SCORE: 18800
> Number of threads 4
>
>
> _______________________________________________
> Mailing list: https://launchpad.net/~yade-dev
> Post to     : yade-dev@lists.launchpad.net
> Unsubscribe : https://launchpad.net/~yade-dev
> More help   : https://help.launchpad.net/ListHelp
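
For anyone comparing their own runs against the figures quoted above, here is a minimal sketch of how the speedups can be recomputed from the two timing tables. It is plain Python and does not call Yade; the timings are copied by hand from the tables, and the names (TRUNK, PC_BRANCH, speedup) are only illustrative and not part of any Yade API.

```python
# Timings in microseconds, copied by hand from the two timing tables
# in the quoted message (200813 bodies, -j4).
TRUNK = {"InsertionSortCollider": 18816625, "TOTAL": 29391910}
PC_BRANCH = {"InsertionSortCollider": 5145114, "TOTAL": 15393110}
THREADS = 4


def speedup(before_us, after_us):
    """Ratio of the reference time to the new time (>1 means faster)."""
    return before_us / after_us


# Speedup of the collider step alone, and of the whole run.
collider_speedup = speedup(TRUNK["InsertionSortCollider"],
                           PC_BRANCH["InsertionSortCollider"])
total_speedup = speedup(TRUNK["TOTAL"], PC_BRANCH["TOTAL"])

# Parallel efficiency of the collider: 1.0 would be perfectly linear scaling.
collider_efficiency = collider_speedup / THREADS

print("collider speedup: x%.2f" % collider_speedup)
print("total speedup:    x%.2f" % total_speedup)
print("collider efficiency on %d threads: %.0f%%"
      % (THREADS, 100 * collider_efficiency))
```

With these numbers the collider comes out at roughly x3.66 and the total at roughly x1.91, close to the figures given in the message. Replacing the two dictionaries with values from another timing.stats() report gives the same comparison for other N and j.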