Re: [Yade-dev] parallel collider - testing needed

2014-04-16 Thread Bruno Chareyre
Thanks Matthias,
Actually I don't understand your benchmark results. You are the first
one to find no speedup on the colliding part.
It seems the results below were not using the parallel collider, since
the time it takes is exactly the same for all number of threads.
What version is that (displayed at yade startup)?
Bruno
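
A quick way to check this against the report is to compare the InsertionSortCollider time across thread counts. The sketch below only copies the collider times from the timing tables in Matthias' message; the helper name is illustrative and not part of Yade.

```python
# InsertionSortCollider wall time in microseconds, keyed by thread count,
# copied from the timing tables in Matthias' report.
collider_us = {1: 15686671, 4: 15675024, 8: 15682352, 12: 15713441}


def speedup_vs_single(times_us):
    """Return {threads: T(1) / T(threads)} for one engine."""
    base = times_us[1]
    return {n: base / t for n, t in sorted(times_us.items())}


collider_speedup = speedup_vs_single(collider_us)
for threads, s in collider_speedup.items():
    print(f"-j{threads}: collider speedup x{s:.3f}")
```

All four ratios stay within about one percent of 1, which is what would be expected if the sequential collider had been running in every case.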

On 16/04/14 17:14, Matthias Frank wrote:
> hi bruno,
>
> i have used your first version of the parallel collider for quite a while
> during model development and also calibration. i saw no differences
> between yade-1.07 and your version.
>
> i did some benchmarks with 4 to 16 sandy bridge cores at our bull
> cluster. getting more than 16 cores for openmp applications is quite
> difficult.
> done on an exclusively used 16 core node
>
> === 1 threads =
> number of bodies 200813
>
> Elapsed  47.6222550869  sec
> Performance  4.19971712039  iter/sec
> Extrapolation on 1e5 iters  6.6142020954  hours
> =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
> Name                     Count          Time  Rel. time
> ---
> ForceResetter              200      594120us      1.25%
> InsertionSortCollider        7    15686671us     32.95%
> InteractionLoop            200    21787610us     45.76%
> NewtonIntegrator           200     9541243us     20.04%
> TOTAL                             47609645us    100.00%
>
> Common time  1383.60180092 s
>
>
> 5037  spheres, velocity= 103.875852973 +- 6.56561134015 %
> 25103  spheres, velocity= 31.681069095 +- 3.69992939292 %
> 50250  spheres, velocity= 15.6112167455 +- 0.651579666153 %
> 100467  spheres, velocity= 7.65955209926 +- 0.740064173207 %
> Calculation velocity is unstable, try to close all programs and start
> performance tests again
> 200813  spheres, velocity= 4.52368811131 +- 12.3907756519 %
>
>
> SCORE: 6055
> Number of threads  1
> === 4 threads =
> number of bodies 200813
>
> Elapsed  29.6409780979  sec
> Performance  6.7474156669  iter/sec
> Extrapolation on 1e5 iters  4.1168025136  hours
> =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
> Name                     Count          Time  Rel. time
> ---
> ForceResetter              200     2919976us      9.85%
> InsertionSortCollider        7    15675024us     52.89%
> InteractionLoop            200     5309648us     17.92%
> NewtonIntegrator           200     5730646us     19.34%
> TOTAL                             29635295us    100.00%
>
> Common time  641.693111897 s
>
>
> Calculation velocity is unstable, try to close all programs and start
> performance tests again
> 5037  spheres, velocity= 232.725838879 +- 14.3014472878 %
> Calculation velocity is unstable, try to close all programs and start
> performance tests again
> 25103  spheres, velocity= 72.3475644141 +- 12.8106054968 %
> 50250  spheres, velocity= 50.2926096116 +- 3.01250915287 %
> 100467  spheres, velocity= 18.9664279425 +- 1.40241049531 %
> 200813  spheres, velocity= 6.95879166249 +- 2.72955035307 %
>
>
> SCORE: 13080
> Number of threads  4
> === 8 threads =
> number of bodies 200813
>
> Elapsed  28.8497908115  sec
> Performance  6.9324592787  iter/sec
> Extrapolation on 1e5 iters  4.00691539049  hours
> =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
> Name                     Count          Time  Rel. time
> ---
> ForceResetter              200     4760739us     16.51%
> InsertionSortCollider        7    15682352us     54.38%
> InteractionLoop            200     3398981us     11.79%
> NewtonIntegrator           200     4997676us     17.33%
> TOTAL                             28839750us    100.00%
>
> Common time  629.34264183 s
>
>
> Calculation velocity is unstable, try to close all programs and start
> performance tests again
> 5037  spheres, velocity= 242.232297207 +- 18.7054194438 %
> 25103  spheres, velocity= 78.2112705997 +- 4.19360243937 %
> 50250  spheres, velocity= 46.6877664726 +- 2.81481812835 %
> 100467  spheres, velocity= 19.9932164704 +- 3.06039659404 %
> 200813  spheres, velocity= 6.92396036557 +- 0.361116951928 %
>
>
> SCORE: 13272
> Number of threads  8
> === 12 threads =
> number of bodies 200813
>
> Elapsed  29.2484679222  sec
> Performance  6.83796500151  iter/sec
> Extrapolation on 1e5 iters  4.06228721142  hou

Re: [Yade-dev] parallel collider - testing needed

2014-04-16 Thread Matthias Frank

hi bruno,

i have used your first version of the parallel collider for quite a while
during model development and also calibration. i saw no differences
between yade-1.07 and your version.

i did some benchmarks with 4 to 16 sandy bridge cores at our bull
cluster. getting more than 16 cores for openmp applications is quite
difficult.

done on an exclusively used 16 core node

=== 1 threads =
number of bodies 200813

Elapsed  47.6222550869  sec
Performance  4.19971712039  iter/sec
Extrapolation on 1e5 iters  6.6142020954  hours
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
Name                     Count          Time  Rel. time
---
ForceResetter              200      594120us      1.25%
InsertionSortCollider        7    15686671us     32.95%
InteractionLoop            200    21787610us     45.76%
NewtonIntegrator           200     9541243us     20.04%
TOTAL                             47609645us    100.00%

Common time  1383.60180092 s


5037  spheres, velocity= 103.875852973 +- 6.56561134015 %
25103  spheres, velocity= 31.681069095 +- 3.69992939292 %
50250  spheres, velocity= 15.6112167455 +- 0.651579666153 %
100467  spheres, velocity= 7.65955209926 +- 0.740064173207 %
Calculation velocity is unstable, try to close all programs and start 
performance tests again

200813  spheres, velocity= 4.52368811131 +- 12.3907756519 %


SCORE: 6055
Number of threads  1
=== 4 threads =
number of bodies 200813

Elapsed  29.6409780979  sec
Performance  6.7474156669  iter/sec
Extrapolation on 1e5 iters  4.1168025136  hours
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
Name                     Count          Time  Rel. time
---
ForceResetter              200     2919976us      9.85%
InsertionSortCollider        7    15675024us     52.89%
InteractionLoop            200     5309648us     17.92%
NewtonIntegrator           200     5730646us     19.34%
TOTAL                             29635295us    100.00%

Common time  641.693111897 s


Calculation velocity is unstable, try to close all programs and start 
performance tests again

5037  spheres, velocity= 232.725838879 +- 14.3014472878 %
Calculation velocity is unstable, try to close all programs and start 
performance tests again

25103  spheres, velocity= 72.3475644141 +- 12.8106054968 %
50250  spheres, velocity= 50.2926096116 +- 3.01250915287 %
100467  spheres, velocity= 18.9664279425 +- 1.40241049531 %
200813  spheres, velocity= 6.95879166249 +- 2.72955035307 %


SCORE: 13080
Number of threads  4
=== 8 threads =
number of bodies 200813

Elapsed  28.8497908115  sec
Performance  6.9324592787  iter/sec
Extrapolation on 1e5 iters  4.00691539049  hours
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
Name                     Count          Time  Rel. time
---
ForceResetter              200     4760739us     16.51%
InsertionSortCollider        7    15682352us     54.38%
InteractionLoop            200     3398981us     11.79%
NewtonIntegrator           200     4997676us     17.33%
TOTAL                             28839750us    100.00%

Common time  629.34264183 s


Calculation velocity is unstable, try to close all programs and start 
performance tests again

5037  spheres, velocity= 242.232297207 +- 18.7054194438 %
25103  spheres, velocity= 78.2112705997 +- 4.19360243937 %
50250  spheres, velocity= 46.6877664726 +- 2.81481812835 %
100467  spheres, velocity= 19.9932164704 +- 3.06039659404 %
200813  spheres, velocity= 6.92396036557 +- 0.361116951928 %


SCORE: 13272
Number of threads  8
=== 12 threads =
number of bodies 200813

Elapsed  29.2484679222  sec
Performance  6.83796500151  iter/sec
Extrapolation on 1e5 iters  4.06228721142  hours
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
Name                     Count          Time  Rel. time
---
ForceResetter              200     7943958us     27.17%
InsertionSortCollider        7    15713441us     53.75%
InteractionLoop            200     2522508us      8.63%
NewtonIntegrator           200     3055652u

Re: [Yade-dev] parallel collider - testing needed

2014-04-10 Thread Bruno Chareyre
On 10/04/14 02:01, Klaus Thoeni wrote:
> just to clarify, Test 2 is done by increasing the number of iterations (1x, 
> 3x 
> and 12x the number of iterations specified in checkPerf.py). This means the 
> number of interactions should increase as well and, hence, particle 
> velocities 
> should decrease because of more interactions.
That is what I was thinking. And more interactions means less (relative)
time spent in collider.

> I added a table with the collider scaling factor for 1 million particles and 
> iter x 12.
Thanks! So there is still an optimum near 12-14. It may be possible to
improve (choosing appropriate chunk sizes internally), but it needs
serious testing.

> Note your T(j8)=T(j1)/5.8 is actually T(j8)=T(j1)/4.8. Where did you get the
> number from? You must look into the uploaded files in order to get these
> numbers
I used the x1 line since I was not expecting any influence of the number
of steps on the collider's performance:
187/20=5.8
Now I see it is different with other lines. Weird.

Bruno
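
To put the two factors discussed here on a common footing, a speedup can be turned into a parallel efficiency (speedup divided by thread count). This is a minimal sketch; the function name is illustrative, and the two factors are the T(j1)/T(j8) values mentioned in this exchange.

```python
def parallel_efficiency(speedup, threads):
    """Fraction of ideal linear scaling achieved: speedup / threads."""
    return speedup / threads


# Collider T(j1)/T(j8) as read from the x1 line and as reported by Klaus.
for label, factor in (("x1 line", 5.8), ("Klaus' value", 4.8)):
    eff = parallel_efficiency(factor, 8)
    print(f"{label}: T(j1)/T(j8) = {factor} -> efficiency {eff:.3f}")
```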


___
Mailing list: https://launchpad.net/~yade-dev
Post to : yade-dev@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yade-dev
More help   : https://help.launchpad.net/ListHelp


Re: [Yade-dev] parallel collider - testing needed

2014-04-09 Thread Klaus Thoeni
Hi Bruno,

just to clarify, Test 2 is done by increasing the number of iterations (1x, 3x 
and 12x the number of iterations specified in checkPerf.py). This means the 
number of interactions should increase as well and, hence, particle velocities 
should decrease because of more interactions.

I added a table with the collider scaling factor for 1 million particles and 
iter x 12.

Note your T(j8)=T(j1)/5.8 is actually T(j8)=T(j1)/4.8. Where did you get the
number from? You must look into the uploaded files in order to get these numbers
;-)

Cheers
Klaus

On Wednesday 09 April 2014 14:58:19 Bruno Chareyre wrote:
> Thanks!
> If I understand correctly, particle velocities are decreasing with
> iterations. So, more iterations means less weight for the collider
> overall (hence less effect of parallelizing it).
> From your results with 1 million, I see for the collider T(j8)=T(j1)/5.8.
> Could you tell if the collider time alone still decreases with j>8 for
> 1 million particles?
> 
> Bruno
> 
> On 09/04/14 14:32, Klaus Thoeni wrote:
> > Hi guys,
> > 
> > just to let you know. I updated the results on the wiki [1]. Still
> > performance test but with more iterations and up to 1 million particles.
> > 
> > Cheers,
> > Klaus
> > 
> > [1] https://yade-dem.org/wiki/Performance_Test#Test_2
> 
> --
> ___
> Bruno Chareyre
> Associate Professor
> ENSE³ - Grenoble INP
> Lab. 3SR
> BP 53
> 38041 Grenoble cedex 9
> Tél : +33 4 56 52 86 21
> Fax : +33 4 76 82 70 43
> 
> 
> 


Re: [Yade-dev] parallel collider - testing needed

2014-04-09 Thread Klaus Thoeni
Hi guys,

just to let you know. I updated the results on the wiki [1]. Still performance 
test but with more iterations and up to 1 million particles. 

Cheers,
Klaus

[1] https://yade-dem.org/wiki/Performance_Test#Test_2



Re: [Yade-dev] parallel collider - testing needed

2014-04-09 Thread Bruno Chareyre
Thanks!
If I understand correctly, particle velocities are decreasing with
iterations. So, more iterations means less weight for the collider
overall (hence less effect of parallelizing it).
From your results with 1 million, I see for the collider T(j8)=T(j1)/5.8.
Could you tell if the collider time alone still decreases with j>8 for
1 million particles?

Bruno
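
The "less weight, hence less effect" argument can be sketched with an Amdahl-style estimate. The collider shares below are hypothetical values chosen for illustration, and the formula assumes the time spent in the other engines is unchanged by the new collider; only the factor 5.8 comes from the message above.

```python
def overall_speedup(collider_fraction, collider_speedup):
    """Amdahl-style estimate: only the collider share of the run gets faster."""
    rest = 1.0 - collider_fraction
    return 1.0 / (rest + collider_fraction / collider_speedup)


# Hypothetical collider shares of the total run time; 5.8 is the
# T(j1)/T(j8) factor quoted above for the collider alone.
for fraction in (0.6, 0.3, 0.1):
    gain = overall_speedup(fraction, 5.8)
    print(f"collider share {fraction:.0%}: overall x{gain:.2f}")
```

As the collider share shrinks, the overall gain approaches 1 regardless of how well the collider itself scales.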


On 09/04/14 14:32, Klaus Thoeni wrote:
> Hi guys,
>
> just to let you know. I updated the results on the wiki [1]. Still 
> performance 
> test but with more iterations and up to 1 million particles. 
>
> Cheers,
> Klaus
>
> [1] https://yade-dem.org/wiki/Performance_Test#Test_2
>
>
>


-- 
___
Bruno Chareyre
Associate Professor
ENSE³ - Grenoble INP
Lab. 3SR
BP 53
38041 Grenoble cedex 9
Tél : +33 4 56 52 86 21
Fax : +33 4 76 82 70 43





Re: [Yade-dev] parallel collider - testing needed

2014-04-02 Thread Anton Gladky
2014-03-31 10:29 GMT+02:00 Bruno Chareyre :
>> I think, we can include this code into the master branch in git.
>> Let`s check the code more precisely and merge it.
> For me the code is in its final version and ready to merge if nobody
> find bugs (at least you could run your QS simulation without crash - the
> good part!).
> But if someone wants to review it is never bad.

I have done some more tests (sorry, again without timings) and I do not see
any problems with the code. So I think we can safely merge it.

Thanks for your work on this

Anton



Re: [Yade-dev] parallel collider - testing needed

2014-03-31 Thread Klaus Thoeni
Hi guys,

I ran some dynamic tests with my mesh too (some time ago, but I forgot to
check). The implementation is fine and the speedup is only about 6-8%. However,
the simulation has just about 30 particles.

I even have more results for the performance check (with 1 million particles)
which I will put on the wiki at some stage (if I find time to analyse the
results :-)). But one thing I can tell you: the maximum scaling I get for
1 million particles is about 3-4.

Cheers,
Klaus


On Monday 31 March 2014 10:29:03 Bruno Chareyre wrote:
> > I have tested this version of collider and have got a speedup for
> > about 5..10% with number of cores 2..6. But it was quasi-static
> > simulations, so the contact list is updating not so often.
> 
> Thanks Anton for feedback. Testing in quasistatic cases is indeed not
> very interesting.
> Or, in that case, it needs to report the collider's timing, not the wall
> clock time of yade as a whole.
> 
> > I think, we can include this code into the master branch in git.
> > Let`s check the code more precisely and merge it.
> 
> For me the code is in its final version and ready to merge if nobody
> find bugs (at least you could run your QS simulation without crash - the
> good part!).
> But if someone wants to review it is never bad.
> 
> Bruno
> 
> 


Re: [Yade-dev] parallel collider - testing needed

2014-03-31 Thread Bruno Chareyre

> I have tested this version of collider and have got a speedup for
> about 5..10% with number of cores 2..6. But it was quasi-static
> simulations, so the contact list is updating not so often.
Thanks Anton for the feedback. Testing in quasistatic cases is indeed not
very interesting.
Or, in that case, one needs to report the collider's timing, not the wall
clock time of yade as a whole.


> I think, we can include this code into the master branch in git.
> Let`s check the code more precisely and merge it.
For me the code is in its final version and ready to merge if nobody
finds bugs (at least you could run your QS simulation without a crash - the
good part!).
But if someone wants to review it, that is never bad.

Bruno
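
To report the collider's timing rather than wall clock time, the per-engine table pasted in these logs can be parsed directly. This is a rough sketch that assumes one row per engine in the "Name Count Time Rel. time" layout seen in this thread, with sub-step rows indented; the sample rows are copied from the "pc" branch log, and the function name is illustrative.

```python
import re

# Matches rows such as "InsertionSortCollider  7  5145114us  33.42%";
# the count column is optional because the TOTAL row may omit it.
ROW = re.compile(r"^\s*(\S+)\s+(?:(\d+)\s+)?(\d+)us\s+([\d.]+)%\s*$")


def parse_timing_table(text):
    """Return {engine name: time in seconds} for the top-level rows."""
    times = {}
    for line in text.splitlines():
        if line.startswith(" "):  # indented rows are sub-steps of an engine
            continue
        match = ROW.match(line)
        if match:
            name, _count, micros, _percent = match.groups()
            times[name] = int(micros) / 1e6
    return times


sample = """\
ForceResetter                200     671157us       4.36%
InsertionSortCollider          7    5145114us      33.42%
  sort&collide                 7    4824071us      93.76%
InteractionLoop              200    6545848us      42.52%
NewtonIntegrator             200    3030989us      19.69%
TOTAL                              15393110us     100.00%
"""
timings = parse_timing_table(sample)
print(f"collider: {timings['InsertionSortCollider']:.2f} s "
      f"of {timings['TOTAL']:.2f} s total")
```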




Re: [Yade-dev] parallel collider - testing needed

2014-03-31 Thread Anton Gladky
Hi Bruno,

I have tested this version of the collider and got a speedup of
about 5..10% with 2..6 cores. But these were quasi-static
simulations, so the contact list is not updated very often.

I think we can include this code in the master branch in git.
Let's check the code more closely and merge it.

Thank you!

Anton


2014-02-24 16:36 GMT+01:00 Bruno Chareyre :
> Hi there,
> I implemented a parallel version of the InsertionSortCollider. It is
> almost ready but not yet pushed to the main trunk, as I have a few
> things to check before that.
> It would be helpful if some of you could 1/ test that your scripts work
> correctly and 2/ benchmark this for N>100k and j>4.
> If you run benchmarks, please remember to always activate timing and
> report the result of timing.stats(). It gives much more interesting data
> than the wall clock time.
>
> Preliminary benchmark results are below (from my laptop...), showing a
> speedup by a factor 2 on the total computation time for j4/200k
> particles (compared to the sequential collider).
> The speedup on collider alone is in fact of the order of x3.68 for 4
> threads. Nearly linear at least for such small number of threads.
>
> My expectation is that it should change almost nothing for small number
> of particles (say, N<10k), where colliding is an inexpensive step.
> For 1million of particles OTOH, there could be significant speedup,
> since the collider takes most of the time.
>
> You can get the "pc" branch at my github repo:
> git clone -b pc https://github.com/bchareyre/trunk.git
>
> Results of yade -j4 --performance are below (I7 quad-core with
> hyperthreading enabled, lightly loaded by background tasks -  j>4 not
> reported as hyperthreading is probably doing no good).
>
> Happy benchmarking. :)
>
> Bruno
>
>
> 
> ./yade-trunk -j4 --performance  (the current trunk)
> ...
> number of bodies 200813
>
> Elapsed  29.4102840424  sec
> Performance  6.80034234664  iter/sec
> Extrapolation on 1e5 iters  4.08476167255  hours
> =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
> Name                     Count          Time  Rel. time
> ---
> ForceResetter              200      700881us      2.38%
> InsertionSortCollider        7    18816625us     64.02%
> InteractionLoop            200     6581283us     22.39%
> NewtonIntegrator           200     3293119us     11.20%
> TOTAL                             29391910us    100.00%
>
> Common time  597.731503963 s
>
>
> 5037  spheres, velocity= 327.689688709 +- 5.13604387635 %
> 25103  spheres, velocity= 81.2726909754 +- 1.0105334405 %
> 50250  spheres, velocity= 45.4114521341 +- 3.02333274436 %
> 100467  spheres, velocity= 19.0287424005 +- 2.26073439157 %
> 200813  spheres, velocity= 6.51664351023 +- 4.03351515402 %
>
>
> SCORE: 13777
> Number of threads  4
>
>
> 
> ./yade-parallel -j4 --performance  (my "pc" branch)
> 
>
> number of bodies 200813
>
> Elapsed  15.4320101738  sec
> Performance  12.9600744004  iter/sec
> Extrapolation on 1e5 iters  2.14333474636  hours
> =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
> Name                     Count          Time  Rel. time
> ---
> ForceResetter              200      671157us      4.36%
> InsertionSortCollider        7     5145114us     33.42%
>   boundDispatcher            7       93186us      1.81%
>   bound                      7          12us      0.00%
>   copy                       7      160891us      3.13%
>   erase                      7       66932us      1.30%
>   sort&collide               7     4824071us     93.76%
>   TOTAL                     35     5145095us    100.00%
> InteractionLoop            200     6545848us     42.52%
> NewtonIntegrator           200     3030989us     19.69%
> TOTAL                             15393110us    100.00%
>
> Common time  460.37680912 s
>
>
> 5037  spheres, velocity= 365.599773471 +- 8.02397068512 %
> 25103  spheres, velocity= 92.0077536966 +- 3.81069496509 %
> 50250  spheres, velocity= 54.1683980588 +- 0.528288534811 %
> 100467  spheres, velocity= 25.7134767981 +- 1.0796373464 %
> 200813  spheres, velocity= 12.6488486429 +- 4.66276699319 %
>
>
> SCORE: 18800
> Number of threads  4
>
>
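
The speedups mentioned in the announcement can be recomputed from the two -j4 logs it contains. The sketch below copies the InsertionSortCollider and TOTAL times from those logs; from these particular numbers the collider gain comes out at roughly x3.7 and the whole-run gain at roughly x1.9, the same order as the figures quoted in the text.

```python
# Timings in microseconds from the two -j4 logs (200813 bodies).
trunk = {"InsertionSortCollider": 18816625, "TOTAL": 29391910}
pc_branch = {"InsertionSortCollider": 5145114, "TOTAL": 15393110}

collider_gain = trunk["InsertionSortCollider"] / pc_branch["InsertionSortCollider"]
total_gain = trunk["TOTAL"] / pc_branch["TOTAL"]
print(f"collider alone: x{collider_gain:.2f}, whole run: x{total_gain:.2f}")
```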

Re: [Yade-dev] parallel collider - testing needed

2014-02-28 Thread Klaus Thoeni
> > https://yade-dem.org/wiki/Performance_Test
> 
> Wow! Speed x6 for 500k particles?!
> It was definitely worth trying with larger numbers, it changes the
> picture completely when the last points are included.
> 
> Very nice page.
> Could you also give some absolute timings for completeness? A convenient
> value could be the Cundall's number: Np*Nt/Tcpu
> With Np number of bodies, Nt number of iterations, Tcpu computation time.

I just updated the page:

https://yade-dem.org/wiki/Performance_Test




Re: [Yade-dev] parallel collider - testing needed

2014-02-28 Thread Bruno Chareyre

> https://yade-dem.org/wiki/Performance_Test

Wow! Speed x6 for 500k particles?!
It was definitely worth trying with larger numbers, it changes the
picture completely when the last points are included.

Very nice page.
Could you also give some absolute timings for completeness? A convenient
value could be the Cundall's number: Np*Nt/Tcpu
With Np number of bodies, Nt number of iterations, Tcpu computation time.

Bruno
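
For reference, the Cundall number as defined above could be computed like this. The function name is illustrative, and the example inputs are assumptions: 200813 bodies and 200 iterations (the engine call count seen in the --performance logs), with elapsed wall time standing in for Tcpu.

```python
def cundall_number(n_bodies, n_iterations, cpu_seconds):
    """Np * Nt / Tcpu: body-iterations processed per second of computation."""
    if cpu_seconds <= 0:
        raise ValueError("cpu_seconds must be positive")
    return n_bodies * n_iterations / cpu_seconds


# Illustrative input only: 200813 bodies, 200 iterations, 15.43 s elapsed.
example = cundall_number(200813, 200, 15.43)
print(f"Cundall number: {example:.3e} body-iterations/s")
```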




Re: [Yade-dev] parallel collider - testing needed

2014-02-28 Thread Bruno Chareyre
(forwarding to yade-dev)

On 28/02/14 10:13, Klaus Thoeni wrote:
> Hi guys.,
>
> have a look at this:
>
> https://yade-dem.org/wiki/Performance_Test
>
> Feel free to add your own tests. If you want I can provide the scripts for 
> the 
> graphs.
>
> Cheers
> Klaus
>
>
>
>





Re: [Yade-dev] parallel collider - testing needed

2014-02-26 Thread Christian Jakob

Quoting Klaus Thoeni:

> Hi Bruno,
>
>> 2/ Hyperthreading is completely useless for heavy computing tasks,
>> actually even bad, as your results suggest.
>
> I did some tests by enabling and disabling hyperthreading some time ago.
> Conclusions: always disable hyperthreading, as you say it makes no sense for
> the kind of things we are doing. Maybe we should mention it somewhere on our
> web page. Any suggestions where?


Good idea. Maybe this page would be a nice place:

https://yade-dem.org/wiki/Multicore_Performance

There is also one thing I am missing on the wiki. What about  
comparison of different cpus/hardware-combinations? We could put one  
or two benchmark scripts in the examples folder. Users can follow  
instructions and provide benchmark results of the hardware they are  
using. Results will be published on the wiki. What do you think about  
that?






Re: [Yade-dev] parallel collider - testing needed

2014-02-26 Thread Klaus Thoeni
Hi Bruno,

> 2/ Hyperthreading is completely useless for heavy computing tasks,
> actually even bad, as your results suggest.

I did some tests by enabling and disabling hyperthreading some time ago.
Conclusions: always disable hyperthreading, as you say it makes no sense for
the kind of things we are doing. Maybe we should mention it somewhere on our
web page. Any suggestions where?

> Benchmarking 8 threads via this technique is irrelevant for this reason.
> What I would really like to see is how the collider scales with 8
> non-virtual cores and more.
> I think they can do that in Freiberg and Newcastle (in Grenoble as well,
> in fact, I just didn't find the time).

I did some testing with --performance on our grid on 3 different nodes and with 
various numbers of cores. I am rerunning the test with 50 particles at the 
moment and will try to post a summary of all the results here or on the wiki 
later.

In the mean time some results of our slow AMD Opteron Processor 6282 SE:


yade -j4
5037  spheres, velocity= 94.8073682494 +- 3.55139591623 %
25103  spheres, velocity= 27.7389795715 +- 8.63375047506 %
50250  spheres, velocity= 16.0519684282 +- 5.60688183622 %
100467  spheres, velocity= 6.67235752786 +- 8.84758076674 %
200813  spheres, velocity= 2.66158958354 +- 7.70653861779 %

yade-pc -j4
5037  spheres, velocity= 78.264605326 +- 4.06741633055 %
25103  spheres, velocity= 26.0879865929 +- 2.61754448363 %
50250  spheres, velocity= 15.7245773611 +- 2.24679654566 %
100467  spheres, velocity= 7.64762330727 +- 2.59000324319 %
200813  spheres, velocity= 3.64194000319 +- 1.80798282427 %


yade -j8
5037  spheres, velocity= 138.024763661 +- 14.7299332104 %
25103  spheres, velocity= 35.7526851013 +- 4.24184671794 %
50250  spheres, velocity= 22.0071042904 +- 8.36195041437 %
100467  spheres, velocity= 11.1704832541 +- 11.725537817 %
200813  spheres, velocity= 3.54394003786 +- 5.48119712335 %

yade-pc -j8
5037  spheres, velocity= 133.311680084 +- 1.88168292497 %
25103  spheres, velocity= 34.3688804144 +- 7.43189318211 %
50250  spheres, velocity= 21.3620031259 +- 3.8532356508 %
100467  spheres, velocity= 11.3218727607 +- 3.77428592406 %
200813  spheres, velocity= 6.16209240352 +- 6.24680400297 %


yade -j16
5037  spheres, velocity= 71.8232644642 +- 41.7059425388 %
25103  spheres, velocity= 24.6342039841 +- 3.98148164778 %
50250  spheres, velocity= 16.1247061321 +- 4.73981941981 %
100467  spheres, velocity= 9.23509237236 +- 2.14822969955 %
200813  spheres, velocity= 2.91721702399 +- 3.88145803663 %

yade-pc -j16
5037  spheres, velocity= 129.908588625 +- 15.6874714595 %
25103  spheres, velocity= 33.526601121 +- 13.7594343427 %
50250  spheres, velocity= 17.7898704143 +- 7.7469432427 %
100467  spheres, velocity= 11.3877154372 +- 1.74832633634 %
200813  spheres, velocity= 6.95545612967 +- 2.35988760251 %


yade -j32
5037  spheres, velocity= 59.0283160736 +- 51.2569740982 %
25103  spheres, velocity= 18.7622567759 +- 6.54660223453 %
50250  spheres, velocity= 12.3588048445 +- 8.49295845839 %
100467  spheres, velocity= 7.6569548227 +- 6.71719242602 %
200813  spheres, velocity= 2.47982732752 +- 10.4129796959 %

yade-pc -j32
5037  spheres, velocity= 88.990043 +- 15.7295668423 %
25103  spheres, velocity= 18.1857423869 +- 1.17387945175 %
50250  spheres, velocity= 12.6321967406 +- 5.31792620843 %
100467  spheres, velocity= 8.98513348696 +- 4.48699885744 %
200813  spheres, velocity= 6.12495571697 +- 1.48933071382 %

Summary for the 200813-sphere case (yade-pc velocity divided by yade velocity):
-> -j4: scale =1.37
-> -j8: scale =1.74
-> -j16: scale =2.38
-> -j32: scale =2.47
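
The scale values follow from the 200813-sphere rows of the runs listed above, taking the yade-pc velocity divided by the yade velocity at the same thread count. A short sketch that reproduces them (variable names are illustrative):

```python
# Velocity for the 200813-sphere case, copied from the runs above:
# (sequential collider "yade", parallel collider "yade-pc") per thread count.
velocity_200k = {
    4: (2.66158958354, 3.64194000319),
    8: (3.54394003786, 6.16209240352),
    16: (2.91721702399, 6.95545612967),
    32: (2.47982732752, 6.12495571697),
}

scale = {j: pc / seq for j, (seq, pc) in velocity_200k.items()}
for j, s in scale.items():
    print(f"-j{j}: scale = {s:.2f}")
```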

These numbers might look differently on our Intel nodes, I still have to check.

> What I need also before pushing to trunk is more testing with real
> scripts, not just --performance.
> I only covered a narrow range of situations with my own scripts, I would
> like to be sure that it will not break in other cases.

Maybe ask mister Fu, he really seems to be keen on increasing his computing 
scale ;-) 

Cheers
Klaus




Re: [Yade-dev] parallel collider - testing needed

2014-02-26 Thread Bruno Chareyre
Thanks! Comments below.

On 26/02/14 13:33, Matthias Frank wrote:
>
>
> i have also some benchmark results:
>
> for 1 thread
> ---
>
> InsertionSortCollider        7    21314382us     51.34%
> InteractionLoop            200    14890015us     35.87%
> NewtonIntegrator           200     5084295us     12.25%
> TOTAL                             41513619us    100.00%

> for 4 threads
> ---
>
> InsertionSortCollider        7     8374089us     44.57%
> InteractionLoop            200     6866564us     36.55%
> NewtonIntegrator           200     2915176us     15.52%
> TOTAL                             18787178us    100.00%

>
> 
>
> for 8 threads
> ---
>
> InsertionSortCollider        7     7577257us     39.74%
> InteractionLoop            200     6923126us     36.31%
> NewtonIntegrator           200     3186823us     16.71%
> TOTAL                             19067561us    100.00%
>

You are confirming my timings.
1/ ISC scales much better than interaction loop and newton.
2/ Hyperthreading is completely useless for heavy computing tasks,
actually even bad, as your results suggest.
Benchmarking 8 threads via this technique is irrelevant for this reason.
What I would really like to see is how the collider scales with 8
non-virtual cores and more.
I think they can do that in Freiberg and Newcastle (in Grenoble as well,
in fact, I just didn't find the time).

What I need also before pushing to trunk is more testing with real
scripts, not just --performance.
I only covered a narrow range of situations with my own scripts, I would
like to be sure that it will not break in other cases.

Cheers.

Bruno
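
Point 1/ can be read off the quoted tables by dividing each engine's 1-thread time by its 4-thread time. This is a small sketch using only the numbers quoted above; the dictionary names are illustrative.

```python
# Engine timings in microseconds from the 1-thread and 4-thread tables above.
one_thread = {
    "InsertionSortCollider": 21314382,
    "InteractionLoop": 14890015,
    "NewtonIntegrator": 5084295,
}
four_threads = {
    "InsertionSortCollider": 8374089,
    "InteractionLoop": 6866564,
    "NewtonIntegrator": 2915176,
}

engine_speedup = {name: one_thread[name] / four_threads[name] for name in one_thread}
for name, s in sorted(engine_speedup.items(), key=lambda item: -item[1]):
    print(f"{name:<22} x{s:.2f} on 4 threads")
```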




Re: [Yade-dev] parallel collider - testing needed

2014-02-26 Thread Matthias Frank

hi guys,

i have also some benchmark results:

for 1 thread

200801
number of bodies 200813

Elapsed  41.6678731441  sec
Performance  4.79986101782  iter/sec
Extrapolation on 1e5 iters  5.78720460335  hours
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
Name                     Count          Time  Rel. time
---
ForceResetter              200      224925us      0.54%
InsertionSortCollider        7    21314382us     51.34%
InteractionLoop            200    14890015us     35.87%
NewtonIntegrator           200     5084295us     12.25%
TOTAL                             41513619us    100.00%

Common time  1013.57112694 s


5037  spheres, velocity= 140.463364272 +- 1.28620387158 %
25103  spheres, velocity= 41.138472944 +- 2.34750742651 %
50250  spheres, velocity= 24.1614197693 +- 0.709212706826 %
100467  spheres, velocity= 11.7041352478 +- 0.681390348657 %
200813  spheres, velocity= 5.20881044621 +- 5.57298683259 %


SCORE: 7993
Number of threads  1



for 4 threads
200801
number of bodies 200813

Elapsed  18.8133409023  sec
Performance  10.6307540505  iter/sec
Extrapolation on 1e5 iters  2.61296401421  hours
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
Name                     Count          Time  Rel. time
---
ForceResetter              200      631347us      3.36%
InsertionSortCollider        7     8374089us     44.57%
InteractionLoop            200     6866564us     36.55%
NewtonIntegrator           200     2915176us     15.52%
TOTAL                             18787178us    100.00%

Common time  443.513967991 s


5037  spheres, velocity= 404.919400864 +- 0.912571165941 %
25103  spheres, velocity= 105.118936499 +- 2.36368208547 %
50250  spheres, velocity= 61.4143580936 +- 1.40115209383 %
100467  spheres, velocity= 25.7654736657 +- 2.93262637568 %
200813  spheres, velocity= 12.2452664182 +- 9.39816092272 %


SCORE: 19832
Number of threads  4



for 8 threads
200801
number of bodies 200813

Elapsed  19.0994348526  sec
Performance  10.4715140287  iter/sec
Extrapolation on 1e5 iters  2.65269928508  hours
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
Name                     Count          Time  Rel. time
---
ForceResetter              200     1380352us      7.24%
InsertionSortCollider        7     7577257us     39.74%
InteractionLoop            200     6923126us     36.31%
NewtonIntegrator           200     3186823us     16.71%
TOTAL                             19067561us    100.00%

Common time  479.59920001 s


5037  spheres, velocity= 355.829004066 +- 2.37547928463 %
25103  spheres, velocity= 87.4558634849 +- 2.63148596504 %
50250  spheres, velocity= 56.1805332982 +- 2.18028212667 %
100467  spheres, velocity= 26.26403263 +- 9.82416513972 %
200813  spheres, velocity= 11.736613584 +- 8.6342992153 %


SCORE: 18265
Number of threads  8




4 threads without virtualization
200801
number of bodies 200813

Elapsed  23.8045229912  sec
Performance  8.40176465935  iter/sec
Extrapolation on 1e5 iters  3.30618374878  hours
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
Name                     Count          Time  Rel. time
---
ForceResetter              200      523769us      2.20%
InsertionSortCollider        7    15676590us     65.87%
InteractionLoop            200     5077634us     21.33%
NewtonIntegrator           200     2522054us     10.60%
TOTAL                             23800048us    100.00%

Common time  437.141875982 s


5037  spheres, velocity= 611.163145541 +- 0.257590873987 %
25103  spheres, velocity= 1

Re: [Yade-dev] parallel collider - testing needed

2014-02-26 Thread Bruno Chareyre

> after running  "make install" in my build folder I start yade using "python 
> yadeparallel -j4 --performance" 

Why "python" in the first place?! I would not be surprised if the number
of cores allocated to python was 1, which may cause "yade -j4" to run in
a single thread context.
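
One way to check this hypothesis is to ask the operating system which cores the current process is allowed to run on. The sketch below assumes a Linux machine, where `os.sched_getaffinity` is available, and falls back to `os.cpu_count()` elsewhere; the function name `allowed_cpus` is made up for illustration. If it reports a single core when run from the interpreter that launches yade, all threads would indeed share that core.

```python
import os

def allowed_cpus():
    """Return the number of CPU cores the current process may be scheduled on."""
    if hasattr(os, "sched_getaffinity"):
        # Linux: the set of core ids in this process's affinity mask
        return len(os.sched_getaffinity(0))
    # Other platforms: fall back to the total number of logical cores
    return os.cpu_count() or 1

print("cores available to this process:", allowed_cpus())
```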

B


___
Mailing list: https://launchpad.net/~yade-dev
Post to : yade-dev@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yade-dev
More help   : https://help.launchpad.net/ListHelp


Re: [Yade-dev] parallel collider - testing needed

2014-02-26 Thread Bruno Chareyre

>>> It is a good benchmark overall, the problem is that it is hardly
>>> reproducible. Each run can give a really different total time (more than
>>> a factor 2 between two measured times, didn't you see that too?
>> when i run the script with num_balls1D = 10 i get:
>>
> Mmmmh... I should try again then (I didn't save the logs).
> Thanks.

I confirm your results. The timings are stable in another try.
I guess it was an effect of other tasks (internet browsing and so on).

Bruno





Re: [Yade-dev] parallel collider - testing needed

2014-02-26 Thread Bruno Chareyre

>
> yes, it is faster at -j1:
>

So this is an independent problem.
For me -j4 is always faster and effectively uses 4 cores, be it with the
old or the new collider.
I have no idea what can be wrong with your processor.

B





Re: [Yade-dev] parallel collider - testing needed

2014-02-26 Thread Eulitz, Alexander
I can confirm this behavior for the performance test on my machine.
I'm not sure whether it is caused by the way I start the compiled yade:
after running "make install" in my build folder I start yade using "python
yadeparallel -j4 --performance" in the ./install/bin folder. "parallel" is the
DSUFFIX I added in cmake.
I will now try to run some other script (martin-niehoff's dynamic simulation)
using the same command.


-----Original Message-----
From: Yade-dev
[mailto:yade-dev-bounces+alexander.eulitz=iwf.tu-berlin...@lists.launchpad.net]
On behalf of Christian Jakob
Sent: Wednesday, 26 February 2014 08:54
To: yade-dev@lists.launchpad.net
Subject: Re: [Yade-dev] parallel collider - testing needed

>> There is apparently a problem with your computer/compilation option/other?
>> If you run an ordinary simulation with -j4 and many particles do you 
>> see
>> 4 cores used?


yes, for normal scripts it is running 4 threads at 4 cores, but --performance 
assigns all threads to one core it seems...


> Is there any difference at all on this machine, between -j1 and -j4?


yes, it is faster at -j1:

number of bodies 200813

Elapsed  69.9356219769  sec
Performance  2.85977295042  iter/sec
Extrapolation on 1e5 iters  9.71328083012  hours
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
Name                     Count         Time   Rel. time
-------------------------------------------------------
ForceResetter              200    1069504us       1.53%
InsertionSortCollider        7   21301263us      30.46%
InteractionLoop            200   29700514us      42.47%
NewtonIntegrator           200   17853603us      25.53%
TOTAL                            69924885us     100.00%

Common time  2067.0501442 s


5037  spheres, velocity= 71.3011948258 +- 0.426132271892 %
25103  spheres, velocity= 18.804595478 +- 1.73479566756 %
50250  spheres, velocity= 10.9461326398 +- 0.367180852894 %
100467  spheres, velocity= 5.45291715221 +- 0.431878602357 %
200813  spheres, velocity= 2.85102513277 +- 0.221541088185 %


SCORE: 3959
Number of threads  1






Re: [Yade-dev] parallel collider - testing needed

2014-02-26 Thread Christian Jakob

> There is apparently a problem with your computer/compilation option/other?
> If you run an ordinary simulation with -j4 and many particles do you see
> 4 cores used?

Yes, for normal scripts it is running 4 threads on 4 cores, but
--performance assigns all threads to one core, it seems...

> Is there any difference at all on this machine, between -j1 and -j4?

Yes, it is faster at -j1:

number of bodies 200813

Elapsed  69.9356219769  sec
Performance  2.85977295042  iter/sec
Extrapolation on 1e5 iters  9.71328083012  hours
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
Name                     Count         Time   Rel. time
-------------------------------------------------------
ForceResetter              200    1069504us       1.53%
InsertionSortCollider        7   21301263us      30.46%
InteractionLoop            200   29700514us      42.47%
NewtonIntegrator           200   17853603us      25.53%
TOTAL                            69924885us     100.00%


Common time  2067.0501442 s


5037  spheres, velocity= 71.3011948258 +- 0.426132271892 %
25103  spheres, velocity= 18.804595478 +- 1.73479566756 %
50250  spheres, velocity= 10.9461326398 +- 0.367180852894 %
100467  spheres, velocity= 5.45291715221 +- 0.431878602357 %
200813  spheres, velocity= 2.85102513277 +- 0.221541088185 %


SCORE: 3959
Number of threads  1






Re: [Yade-dev] parallel collider - testing needed

2014-02-25 Thread Bruno Chareyre
Is there any difference at all on this machine, between -j1 and -j4?

B

On 25/02/14 18:56, Bruno Chareyre wrote:
> There is apparently a problem with your computer/compilation option/other?
> If you run an ordinary simulation with -j4 and many particles do you see
> 4 cores used?
>
> Bruno
>
>
>
> On 25/02/14 16:26, Christian Jakob wrote:
>> Hi Bruno,
>>
>> I did some tests with your new collider:
>>
>> My "old" machine (2 cpu sockets with 4 cores each, Intel(R) Xeon(R)
>> CPU X5460  @ 3.16GHz) says:
>>
>>
>> yade-trunk -j4 --performance
>>
>> Welcome to Yade 2014-02-18.git-af75797
>> .
>> number of bodies 200813
>>
>> Elapsed  74.6882498264  sec
>> Performance  2.67779738399  iter/sec
>> Extrapolation on 1e5 iters  10.3733680314  hours
>> =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
>> Name                     Count         Time   Rel. time
>> -------------------------------------------------------
>> ForceResetter              200    2625848us       3.52%
>> InsertionSortCollider        7   21494603us      28.79%
>> InteractionLoop            200   32631323us      43.70%
>> NewtonIntegrator           200   17913859us      23.99%
>> TOTAL                            74665635us     100.00%
>>
>> Common time  3845.09048295 s
>>
>>
>> Calculation velocity is unstable, try to close all programs and start
>> performance tests again
>> 5037  spheres, velocity= 44.7832284176 +- 60.1189421161 %
>> 25103  spheres, velocity= 17.4121076601 +- 0.99355345037 %
>> 50250  spheres, velocity= 10.0714940216 +- 1.5389769 %
>> 100467  spheres, velocity= 5.05891811219 +- 0.434738330959 %
>> 200813  spheres, velocity= 2.65826879857 +- 0.933088603948 %
>>
>>
>> SCORE: 3479
>> Number of threads  4
>>
>> 
>>
>> ###
>>
>> yade-parallel -j4 --performance (your pc branch)
>>
>> Welcome to Yade 2014-02-24.git-b60d388
>> .
>> number of bodies 200813
>>
>> Elapsed  75.6688189507  sec
>> Performance  2.64309662518  iter/sec
>> Extrapolation on 1e5 iters  10.5095581876  hours
>> =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
>> Name                     Count         Time   Rel. time
>> -------------------------------------------------------
>> ForceResetter              200    2600100us       3.44%
>> InsertionSortCollider        7   20746020us      27.43%
>> InteractionLoop            200   34455725us      45.55%
>> NewtonIntegrator           200   17838205us      23.58%
>> TOTAL                            75640051us     100.00%
>>
>> Common time  4093.34840894 s
>>
>>
>> Calculation velocity is unstable, try to close all programs and start
>> performance tests again
>> 5037  spheres, velocity= 44.3999135517 +- 61.0812025756 %
>> 25103  spheres, velocity= 16.8531534243 +- 1.32470154863 %
>> 50250  spheres, velocity= 9.61504490252 +- 0.670186229301 %
>> 100467  spheres, velocity= 4.86679881913 +- 0.487840014886 %
>> 200813  spheres, velocity= 2.64490152313 +- 0.285084118261 %
>>
>>
>> SCORE: 3402
>> Number of threads  4
>>
>> ##
>>
>>
>> For my computer it seems to have nearly no speed up ...
>>
>> Looking at htop tells me that -j4 --performance is using 4 threads,
>> but just on 1 core ...
>>
>> Regards,
>>
>> Christian
>>
>>
>>
>> Quoting Bruno Chareyre:
>>
>>> Hi there,
>>> I implemented a parallel version of the InsertionSortCollider. It is
>>> almost ready but not yet pushed to the main trunk, as I have a few
>>> things to check before that.
>>> It would be helpful if some of you could 1/ test that your scripts work
>>> correctly and 2/ benchmark this for N>100k and j>4.
>>> If you run benchmarks, please remember to always activate timing and
>>> report the result of timing.stats(). It gives much more interesting data
>>> than the wall clock time.
>>>
>>> Preliminary benchmark results are below (from my laptop...), showing a
>>> speedup by a factor 2 on the total computation time for j4/200k
>>> particles (compared to the sequential collider).
>>> The speedup on collider alone is in fact of the order of x3.68 for 4
>>> threads. Nearly linear at least for such small number of threads.
>>>
>>> My expectation is that it should change almost nothing for small number
>>> of particles (say, N<10k), where colliding is an inexpensive st

Re: [Yade-dev] parallel collider - testing needed

2014-02-25 Thread Bruno Chareyre
There is apparently a problem with your computer/compilation option/other?
If you run an ordinary simulation with -j4 and many particles do you see
4 cores used?

Bruno



On 25/02/14 16:26, Christian Jakob wrote:
> Hi Bruno,
>
> I did some tests with your new collider:
>
> My "old" machine (2 cpu sockets with 4 cores each, Intel(R) Xeon(R)
> CPU X5460  @ 3.16GHz) says:
>
>
> yade-trunk -j4 --performance
>
> Welcome to Yade 2014-02-18.git-af75797
> .
> number of bodies 200813
>
> Elapsed  74.6882498264  sec
> Performance  2.67779738399  iter/sec
> Extrapolation on 1e5 iters  10.3733680314  hours
> =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
> Name                     Count         Time   Rel. time
> -------------------------------------------------------
> ForceResetter              200    2625848us       3.52%
> InsertionSortCollider        7   21494603us      28.79%
> InteractionLoop            200   32631323us      43.70%
> NewtonIntegrator           200   17913859us      23.99%
> TOTAL                            74665635us     100.00%
>
> Common time  3845.09048295 s
>
>
> Calculation velocity is unstable, try to close all programs and start
> performance tests again
> 5037  spheres, velocity= 44.7832284176 +- 60.1189421161 %
> 25103  spheres, velocity= 17.4121076601 +- 0.99355345037 %
> 50250  spheres, velocity= 10.0714940216 +- 1.5389769 %
> 100467  spheres, velocity= 5.05891811219 +- 0.434738330959 %
> 200813  spheres, velocity= 2.65826879857 +- 0.933088603948 %
>
>
> SCORE: 3479
> Number of threads  4
>
> 
>
> ###
>
> yade-parallel -j4 --performance (your pc branch)
>
> Welcome to Yade 2014-02-24.git-b60d388
> .
> number of bodies 200813
>
> Elapsed  75.6688189507  sec
> Performance  2.64309662518  iter/sec
> Extrapolation on 1e5 iters  10.5095581876  hours
> =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
> Name                     Count         Time   Rel. time
> -------------------------------------------------------
> ForceResetter              200    2600100us       3.44%
> InsertionSortCollider        7   20746020us      27.43%
> InteractionLoop            200   34455725us      45.55%
> NewtonIntegrator           200   17838205us      23.58%
> TOTAL                            75640051us     100.00%
>
> Common time  4093.34840894 s
>
>
> Calculation velocity is unstable, try to close all programs and start
> performance tests again
> 5037  spheres, velocity= 44.3999135517 +- 61.0812025756 %
> 25103  spheres, velocity= 16.8531534243 +- 1.32470154863 %
> 50250  spheres, velocity= 9.61504490252 +- 0.670186229301 %
> 100467  spheres, velocity= 4.86679881913 +- 0.487840014886 %
> 200813  spheres, velocity= 2.64490152313 +- 0.285084118261 %
>
>
> SCORE: 3402
> Number of threads  4
>
> ##
>
>
> For my computer it seems to have nearly no speed up ...
>
> Looking at htop tells me that -j4 --performance is using 4 threads,
> but just on 1 core ...
>
> Regards,
>
> Christian
>
>
>
> Quoting Bruno Chareyre:
>
>> Hi there,
>> I implemented a parallel version of the InsertionSortCollider. It is
>> almost ready but not yet pushed to the main trunk, as I have a few
>> things to check before that.
>> It would be helpful if some of you could 1/ test that your scripts work
>> correctly and 2/ benchmark this for N>100k and j>4.
>> If you run benchmarks, please remember to always activate timing and
>> report the result of timing.stats(). It gives much more interesting data
>> than the wall clock time.
>>
>> Preliminary benchmark results are below (from my laptop...), showing a
>> speedup by a factor 2 on the total computation time for j4/200k
>> particles (compared to the sequential collider).
>> The speedup on collider alone is in fact of the order of x3.68 for 4
>> threads. Nearly linear at least for such small number of threads.
>>
>> My expectation is that it should change almost nothing for small number
>> of particles (say, N<10k), where colliding is an inexpensive step.
>> For 1million of particles OTOH, there could be significant speedup,
>> since the collider takes most of the time.
>>
>> You can get the "pc" branch at my github repo:
>> git clone -b pc https://github.com/bchareyre/trunk.git
>>
>> Results of yade 

Re: [Yade-dev] parallel collider - testing needed

2014-02-25 Thread Christian Jakob

Hi Bruno,

I did some tests with your new collider:

My "old" machine (2 cpu sockets with 4 cores each, Intel(R) Xeon(R)  
CPU X5460  @ 3.16GHz) says:



yade-trunk -j4 --performance

Welcome to Yade 2014-02-18.git-af75797
.
number of bodies 200813

Elapsed  74.6882498264  sec
Performance  2.67779738399  iter/sec
Extrapolation on 1e5 iters  10.3733680314  hours
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
Name                     Count         Time   Rel. time
-------------------------------------------------------
ForceResetter              200    2625848us       3.52%
InsertionSortCollider        7   21494603us      28.79%
InteractionLoop            200   32631323us      43.70%
NewtonIntegrator           200   17913859us      23.99%
TOTAL                            74665635us     100.00%


Common time  3845.09048295 s


Calculation velocity is unstable, try to close all programs and start  
performance tests again

5037  spheres, velocity= 44.7832284176 +- 60.1189421161 %
25103  spheres, velocity= 17.4121076601 +- 0.99355345037 %
50250  spheres, velocity= 10.0714940216 +- 1.5389769 %
100467  spheres, velocity= 5.05891811219 +- 0.434738330959 %
200813  spheres, velocity= 2.65826879857 +- 0.933088603948 %


SCORE: 3479
Number of threads  4



###

yade-parallel -j4 --performance (your pc branch)

Welcome to Yade 2014-02-24.git-b60d388
.
number of bodies 200813

Elapsed  75.6688189507  sec
Performance  2.64309662518  iter/sec
Extrapolation on 1e5 iters  10.5095581876  hours
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
Name                     Count         Time   Rel. time
-------------------------------------------------------
ForceResetter              200    2600100us       3.44%
InsertionSortCollider        7   20746020us      27.43%
InteractionLoop            200   34455725us      45.55%
NewtonIntegrator           200   17838205us      23.58%
TOTAL                            75640051us     100.00%


Common time  4093.34840894 s


Calculation velocity is unstable, try to close all programs and start  
performance tests again

5037  spheres, velocity= 44.3999135517 +- 61.0812025756 %
25103  spheres, velocity= 16.8531534243 +- 1.32470154863 %
50250  spheres, velocity= 9.61504490252 +- 0.670186229301 %
100467  spheres, velocity= 4.86679881913 +- 0.487840014886 %
200813  spheres, velocity= 2.64490152313 +- 0.285084118261 %


SCORE: 3402
Number of threads  4

##


For my computer it seems to have nearly no speed up ...

Looking at htop tells me that -j4 --performance is using 4 threads,
but just on 1 core ...
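
The "nearly no speed up" can be read off engine by engine from the two timing tables above. The following is a minimal sketch that only divides the reported times; the dictionary and function names are made up for illustration, the numbers are copied from the two -j4 reports in this message.

```python
# Per-engine times in microseconds, copied from the two -j4 reports above
trunk = {
    "ForceResetter": 2625848,
    "InsertionSortCollider": 21494603,
    "InteractionLoop": 32631323,
    "NewtonIntegrator": 17913859,
}
pc_branch = {
    "ForceResetter": 2600100,
    "InsertionSortCollider": 20746020,
    "InteractionLoop": 34455725,
    "NewtonIntegrator": 17838205,
}

def speedups(before, after):
    """Return before/after time ratios per engine (values > 1 mean 'after' is faster)."""
    return {name: before[name] / after[name] for name in before}

ratios = speedups(trunk, pc_branch)
for name, ratio in ratios.items():
    print(f"{name:22s} x{ratio:.3f}")
print(f"{'TOTAL':22s} x{sum(trunk.values()) / sum(pc_branch.values()):.3f}")
```

With these numbers the collider ratio comes out at about 1.04, and the InteractionLoop ratio is slightly below 1, so the total is essentially unchanged.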


Regards,

Christian



Quoting Bruno Chareyre:


Hi there,
I implemented a parallel version of the InsertionSortCollider. It is
almost ready but not yet pushed to the main trunk, as I have a few
things to check before that.
It would be helpful if some of you could 1/ test that your scripts work
correctly and 2/ benchmark this for N>100k and j>4.
If you run benchmarks, please remember to always activate timing and
report the result of timing.stats(). It gives much more interesting data
than the wall clock time.

Preliminary benchmark results are below (from my laptop...), showing a
speedup by a factor 2 on the total computation time for j4/200k
particles (compared to the sequential collider).
The speedup on collider alone is in fact of the order of x3.68 for 4
threads. Nearly linear at least for such small number of threads.

My expectation is that it should change almost nothing for small number
of particles (say, N<10k), where colliding is an inexpensive step.
For 1million of particles OTOH, there could be significant speedup,
since the collider takes most of the time.
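
This expectation can be made concrete with Amdahl's law: if the collider accounts for a fraction f of the run time and becomes s times faster while everything else is unchanged, the total speedup is 1 / ((1 - f) + f / s). The sketch below is only an illustration. The function name and the 5% and 90% shares are assumptions; 64% is roughly the collider share reported for the trunk run on 200k spheres in this thread, and 3.68 is the collider speedup quoted above.

```python
def total_speedup(collider_fraction, collider_speedup):
    """Amdahl's law: overall speedup when only the collider gets faster."""
    serial_part = 1.0 - collider_fraction
    return 1.0 / (serial_part + collider_fraction / collider_speedup)

# Collider shares of total run time; 0.05 and 0.90 are made-up examples
for fraction in (0.05, 0.64, 0.90):
    print(f"collider share {fraction:.0%}: total speedup x{total_speedup(fraction, 3.68):.2f}")
```

With a 64% share the law gives a bit under x1.9, in line with the factor of about 2 mentioned above, while a 5% share leaves the total almost unchanged.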

You can get the "pc" branch at my github repo:
git clone -b pc https://github.com/bchareyre/trunk.git

Results of yade -j4 --performance are below (I7 quad-core with
hyperthreading enabled, lightly loaded by background tasks -  j>4 not
reported as hyperthreading is probably doing no good).

Happy benchmarking. :)

Bruno



./yade-trunk -j4 --performance  (the current trunk)
...
number of bodies 200813

Elapsed  29.4102840424  sec
Performance  6.80034234664  iter/sec
Extrapolation on 1e5 iters  4.08476167255  hours
=*=

Re: [Yade-dev] parallel collider - testing needed

2014-02-25 Thread Bruno Chareyre

> Sorry for the late reply. Feel free to share the pdf. Originally it was
> supposed to be transferred to the wiki anyway.
> I'm now thinking about a good way to measure performance for highly dynamic
> simulations. Maybe the script that martin-niehoff posted [1] would be
> useful. It is basically a regular cubic pack of spheres placed in a
> vibrating tub. The simulation runs for a single second (simulation time) and
> the excitation of the tub causes the pack to disperse. I think it is of great
> interest to see whether such simulations benefit from the new collider, too.

The more it is dynamic the more you should see the effect of parallel
collider, since higher velocities will trigger collision detection more
often.
The benchmark you suggest should be ok.
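
A rough way to see this: the collider works with bounding boxes enlarged by some margin, and it only has to run again once a body has travelled further than that margin. The toy model below is an assumption made for illustration (it is not Yade's actual criterion, and `margin`, `max_speed` and `dt` are hypothetical parameters), but it shows why faster particles lead to more collider runs over the same number of time steps.

```python
import math

def collider_runs(n_steps, dt, max_speed, margin):
    """Estimate how often collision detection is triggered in n_steps.

    Toy model: the fastest body moves max_speed*dt per step, and the
    collider reruns each time the accumulated displacement exceeds margin.
    """
    if max_speed <= 0:
        return 1  # nothing moves: only the initial detection is needed
    steps_between_runs = max(1, math.floor(margin / (max_speed * dt)))
    return math.ceil(n_steps / steps_between_runs)

# Same time step and margin, tenfold difference in the fastest particle speed
slow = collider_runs(n_steps=200, dt=1e-4, max_speed=1.0, margin=3.05e-3)
fast = collider_runs(n_steps=200, dt=1e-4, max_speed=10.0, margin=3.05e-3)
print("collider runs (slow, fast):", slow, fast)
```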

B




Re: [Yade-dev] parallel collider - testing needed

2014-02-25 Thread Eulitz, Alexander
Sorry for the late reply. Feel free to share the pdf. Originally it was supposed to
be transferred to the wiki anyway.
I'm now thinking about a good way to measure performance for highly dynamic
simulations. Maybe the script that martin-niehoff posted [1] would be
useful. It is basically a regular cubic pack of spheres placed in a
vibrating tub. The simulation runs for a single second (simulation time) and
the excitation of the tub causes the pack to disperse. I think it is of great
interest to see whether such simulations benefit from the new collider, too.

Alex
[1] https://answers.launchpad.net/yade/+question/242644 answer #10

-----Original Message-----
From: Yade-dev
[mailto:yade-dev-bounces+alexander.eulitz=iwf.tu-berlin...@lists.launchpad.net]
On behalf of Bruno Chareyre
Sent: Monday, 24 February 2014 16:57
To: yade-dev@lists.launchpad.net
Subject: Re: [Yade-dev] parallel collider - testing needed

I forgot to mention two things:
1- I tried the benchmark used by Christian for comparisons with PFC [1], 
however it seems that this test is very special. I get large differences 
between two runs. Basically, it seems the simulation only depends on truncation 
errors: vertical columns of spheres remain stable until a small bit of 
horizontal noise makes them fall down one by one. If you look at the simulation 
in the GUI it looks strange. I did not insist with this one, I think it could 
be improved by replacing the lattice by disordered packings.
2- The benchmark done by Alexander some time ago (on the same problem but with 
-j>1) is not visible anywhere if I'm not wrong. I have a copy of the pdf, is it 
ok to upload it on the wiki? It is an interesting starting point for evaluating 
the parallel collider.

Bruno

[1] https://www.yade-dem.org/wiki/Comparisons_with_PFC3D


On 24/02/14 16:36, Bruno Chareyre wrote:
> Hi there,
> I implemented a parallel version of the InsertionSortCollider. It is 
> almost ready but not yet pushed to the main trunk, as I have a few 
> things to check before that.
> It would be helpful if some of you could 1/ test that your scripts 
> work correctly and 2/ benchmark this for N>100k and j>4.
> If you run benchmarks, please remember to always activate timing and 
> report the result of timing.stats(). It gives much more interesting 
> data than the wall clock time.
>
> Preliminary benchmark results are below (from my laptop...), showing a 
> speedup by a factor 2 on the total computation time for j4/200k 
> particles (compared to the sequential collider).
> The speedup on collider alone is in fact of the order of x3.68 for 4 
> threads. Nearly linear at least for such small number of threads.
>
> My expectation is that it should change almost nothing for small 
> number of particles (say, N<10k), where colliding is an inexpensive step.
> For 1million of particles OTOH, there could be significant speedup, 
> since the collider takes most of the time.
>
> You can get the "pc" branch at my github repo:
> git clone -b pc https://github.com/bchareyre/trunk.git
>
> Results of yade -j4 --performance are below (I7 quad-core with 
> hyperthreading enabled, lightly loaded by background tasks -  j>4 not 
> reported as hyperthreading is probably doing no good).
>
> Happy benchmarking. :)
>
> Bruno
>
>
> 
> ./yade-trunk -j4 --performance  (the current trunk) ...
> number of bodies 200813
>
> Elapsed  29.4102840424  sec
> Performance  6.80034234664  iter/sec
> Extrapolation on 1e5 iters  4.08476167255  hours
> =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
> Name                     Count         Time   Rel. time
> -------------------------------------------------------
> ForceResetter              200     700881us       2.38%
> InsertionSortCollider        7   18816625us      64.02%
> InteractionLoop            200    6581283us      22.39%
> NewtonIntegrator           200    3293119us      11.20%
> TOTAL                            29391910us     100.00%
>
> Common time  597.731503963 s
>
>
> 5037  spheres, velocity= 327.689688709 +- 5.13604387635 %
> 25103  spheres, velocity= 81.2726909754 +- 1.0105334405 %
> 50250  spheres, velocity= 45.4114521341 +- 3.02333274436 %
> 100467  spheres, velocity= 19.0287424005 +- 2.26073439157 %
> 200813  spheres, velocity= 6.51664351023 +- 4.03351515402 %
>
>
> SCORE: 13777
> Number of threads  4
>

Re: [Yade-dev] parallel collider - testing needed

2014-02-25 Thread Bruno Chareyre



On 25/02/14 10:17, Christian Jakob wrote:
>
>> It is a good benchmark overall, the problem is that it is hardly
>> reproducible. Each run can give a really different total time (more than
>> a factor 2 between two measured times, didn't you see that too?
>
> when i run the script with num_balls1D = 10 i get:
>
Mmmmh... I should try again then (I didn't save the logs).
Thanks.

B




Re: [Yade-dev] parallel collider - testing needed

2014-02-25 Thread Christian Jakob



> It is a good benchmark overall, the problem is that it is hardly
> reproducible. Each run can give a really different total time (more than
> a factor 2 between two measured times, didn't you see that too?

When I run the script with num_balls1D = 10 I get:

Welcome to Yade 2014-02-18.git-af75797
TCP python prompt on localhost:9001, auth cookie `cydaeu'
XMLRPC info provider on http://localhost:21001
Running script calc-time-YADE2014.py
--- run 1 of 20 ---
1.041
--- run 2 of 20 ---
0.961
--- run 3 of 20 ---
0.961
--- run 4 of 20 ---
0.961
--- run 5 of 20 ---
0.961
--- run 6 of 20 ---
0.961
--- run 7 of 20 ---
1.001
--- run 8 of 20 ---
1.041
--- run 9 of 20 ---
1.041
--- run 10 of 20 ---
1.001
--- run 11 of 20 ---
0.961
--- run 12 of 20 ---
1.001
--- run 13 of 20 ---
1.001
--- run 14 of 20 ---
0.961
--- run 15 of 20 ---
0.961
--- run 16 of 20 ---
1.001
--- run 17 of 20 ---
1.041
--- run 18 of 20 ---
1.041
--- run 19 of 20 ---
1.001
--- run 20 of 20 ---
1.041

i do not see a factor 2, do you?
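
The spread of the 20 values above can be quantified in a few lines. This is a small sketch using only the standard library; the variable names are chosen for illustration and the numbers are copied from the log above.

```python
import statistics

# Values printed for the 20 runs listed above
times = [1.041, 0.961, 0.961, 0.961, 0.961, 0.961, 1.001, 1.041, 1.041, 1.001,
         0.961, 1.001, 1.001, 0.961, 0.961, 1.001, 1.041, 1.041, 1.001, 1.041]

mean = statistics.mean(times)
spread = statistics.stdev(times)        # sample standard deviation
worst_ratio = max(times) / min(times)   # slowest run relative to fastest

print(f"mean {mean:.3f}, stdev {spread:.3f}, max/min {worst_ratio:.3f}")
```

The ratio between the slowest and the fastest run stays below 1.1 here, far from a factor 2.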

can you run the original script and post your results?

c




Re: [Yade-dev] parallel collider - testing needed

2014-02-25 Thread Bruno Chareyre

> I rotated the wall below a little bit to make it slightly aslope. This
> is the reason why columns can collapse (not because truncation error):
>
I see.

>
> As you mentioned in a previous post we should define two benchmarking
> scripts. One for quasi-static simulations and one for dynamic ones.
> The one I used for comparison to PFC is quasi-static at the beginning
> and turns into a dynamic one.
> It seems not to be the best choice for a benchmark.
It is a good benchmark overall, the problem is that it is hardly
reproducible. Each run can give a really different total time (more than
a factor 2 between two measured times, didn't you see that too? Or is it my
computer that failed for some reason?), so it needs a very large number
of runs to get a relevant average. It also means that subtle differences
in the code could lead to systematic bias.
I would change the benchmark a little, with some randomness in the
initial positions.
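
Such a perturbation could look like the sketch below: sphere centres on a regular cubic lattice, each shifted by a small random offset drawn from a seeded generator so that the packing is still the same in every run. The function name and all parameters are hypothetical; this is plain Python and does not use any Yade function.

```python
import random

def jittered_lattice(n_per_side, spacing, jitter, seed=0):
    """Return sphere centres on a cubic lattice, each shifted by a random offset.

    jitter is the maximum offset per axis; keep it well below
    (spacing - diameter) / 2 so that neighbouring spheres cannot overlap.
    """
    rng = random.Random(seed)  # fixed seed: the perturbed packing is reproducible
    centres = []
    for i in range(n_per_side):
        for j in range(n_per_side):
            for k in range(n_per_side):
                centres.append(tuple(
                    index * spacing + rng.uniform(-jitter, jitter)
                    for index in (i, j, k)
                ))
    return centres

centres = jittered_lattice(n_per_side=10, spacing=1.0, jitter=0.05)
print(len(centres), centres[0])
```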

I could prepare another benchmark of the triaxial type. It can combine
dynamics in the initial steps and static situations after enough steps.

Bruno





Re: [Yade-dev] parallel collider - testing needed

2014-02-25 Thread Christian Jakob

Quoting Bruno Chareyre:

> I forgot to mention two things:
> 1- I tried the benchmark used by Christian for comparisons with PFC [1],
> however it seems that this test is very special. I get large differences
> between two runs. Basically, it seems the simulation only depends on
> truncation errors: vertical columns of spheres remain stable until a
> small bit of horizontal noise makes them fall down one by one. If you

I rotated the wall below a little bit to make it slightly sloped. This
is the reason why columns can collapse (not because of truncation errors):


#rotation quaternion:
orientationWall = Quaternion(Vector3(.01,.01,1),math.pi)
#create box:
id_box=O.bodies.append(utils.box((origin_wall,origin_wall,-.5),(200,200,.5),orientationWall,fixed=True,material=WallMat))
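
To get a feeling for how small this tilt is, one can apply the rotation to the vertical axis. The sketch below uses only the standard library. It assumes that the constructor takes (axis, angle) as the call above suggests and that the axis is normalised before use; neither is confirmed here.

```python
import math

axis = (0.01, 0.01, 1.0)   # rotation axis from the snippet above
angle = math.pi            # rotation angle from the snippet above

# Normalise the axis (assumed to happen before the rotation is applied)
norm = math.sqrt(sum(c * c for c in axis))
n = tuple(c / norm for c in axis)

def rotate(v, n, angle):
    """Rotate vector v about unit axis n by angle (Rodrigues' formula)."""
    dot = sum(a * b for a, b in zip(n, v))
    cross = (n[1] * v[2] - n[2] * v[1],
             n[2] * v[0] - n[0] * v[2],
             n[0] * v[1] - n[1] * v[0])
    c, s = math.cos(angle), math.sin(angle)
    return tuple(v[i] * c + cross[i] * s + n[i] * dot * (1 - c) for i in range(3))

# Direction of the wall's original z axis after the rotation
normal = rotate((0.0, 0.0, 1.0), n, angle)
tilt_deg = math.degrees(math.acos(normal[2]))
print(f"wall normal {normal}, tilt {tilt_deg:.2f} degrees")
```

Under these assumptions the wall normal ends up roughly 1.6 degrees away from the vertical.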

As you mentioned in a previous post we should define two benchmarking  
scripts. One for quasi-static simulations and one for dynamic ones.  
The one I used for comparison to PFC is quasi-static at the beginning  
and turns into a dynamic one.

It seems not to be the best choice for a benchmark.


> look at the simulation in the GUI it looks strange. I did not insist
> with this one, I think it could be improved by replacing the lattice by
> disordered packings.
> 2- The benchmark done by Alexander some time ago (on the same problem
> but with -j>1) is not visible anywhere if I'm not wrong. I have a copy
> of the pdf, is it ok to upload it on the wiki? It is an interesting
> starting point for evaluating the parallel collider.
>
> Bruno
>
> [1] https://www.yade-dem.org/wiki/Comparisons_with_PFC3D


On 24/02/14 16:36, Bruno Chareyre wrote:

Hi there,
I implemented a parallel version of the InsertionSortCollider. It is
almost ready but not yet pushed to the main trunk, as I have a few
things to check before that.
It would be helpful if some of you could 1/ test that your scripts work
correctly and 2/ benchmark this for N>100k and j>4.
If you run benchmarks, please remember to always activate timing and
report the result of timing.stats(). It gives much more interesting data
than the wall clock time.

Preliminary benchmark results are below (from my laptop...), showing a
speedup by a factor 2 on the total computation time for j4/200k
particles (compared to the sequential collider).
The speedup on collider alone is in fact of the order of x3.68 for 4
threads. Nearly linear at least for such small number of threads.
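
These factors can be recomputed from the elapsed time and the InsertionSortCollider time of the two -j4 reports in this message. The following is a minimal sketch; the dictionary keys are made up for illustration, the numbers are copied from the reports.

```python
# Figures from the two "-j4 --performance" reports in this message
trunk = {"elapsed_s": 29.4102840424, "collider_us": 18816625}
pc_branch = {"elapsed_s": 15.4320101738, "collider_us": 5145114}

# Ratios > 1 mean the "pc" branch is faster than the trunk
total_speedup = trunk["elapsed_s"] / pc_branch["elapsed_s"]
collider_speedup = trunk["collider_us"] / pc_branch["collider_us"]

print(f"total x{total_speedup:.2f}, collider x{collider_speedup:.2f}")
```

With these particular numbers the result is roughly x1.9 overall and about x3.7 for the collider alone.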

My expectation is that it should change almost nothing for small number
of particles (say, N<10k), where colliding is an inexpensive step.
For 1million of particles OTOH, there could be significant speedup,
since the collider takes most of the time.

You can get the "pc" branch at my github repo:
git clone -b pc https://github.com/bchareyre/trunk.git

Results of yade -j4 --performance are below (I7 quad-core with
hyperthreading enabled, lightly loaded by background tasks -  j>4 not
reported as hyperthreading is probably doing no good).

Happy benchmarking. :)

Bruno



./yade-trunk -j4 --performance  (the current trunk)
...
number of bodies 200813

Elapsed  29.4102840424  sec
Performance  6.80034234664  iter/sec
Extrapolation on 1e5 iters  4.08476167255  hours
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
Name                     Count         Time   Rel. time
-------------------------------------------------------
ForceResetter              200     700881us       2.38%
InsertionSortCollider        7   18816625us      64.02%
InteractionLoop            200    6581283us      22.39%
NewtonIntegrator           200    3293119us      11.20%
TOTAL                            29391910us     100.00%

Common time  597.731503963 s


5037  spheres, velocity= 327.689688709 +- 5.13604387635 %
25103  spheres, velocity= 81.2726909754 +- 1.0105334405 %
50250  spheres, velocity= 45.4114521341 +- 3.02333274436 %
100467  spheres, velocity= 19.0287424005 +- 2.26073439157 %
200813  spheres, velocity= 6.51664351023 +- 4.03351515402 %


SCORE: 13777
Number of threads  4



./yade-parallel -j4 --performance  (my "pc" branch)


number of bodies 200813

Elapsed  15.4320101738  sec
Performance  12.9600744004  iter/sec
Extrapolation on 1e5 iters  2.14333474636  hours
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
Name                     Count         Time   Rel. time
-------------------------------------------------------
ForceResetter              200     671157us       4.36%
InsertionSortCollider        7    5145114us      33.42%
  boundDispatcher            7      93186us       1.81%
  bound                      7         12us       0.00%
  copy                       7     160891us

Re: [Yade-dev] parallel collider - testing needed

2014-02-24 Thread Bruno Chareyre
I forgot to mention two things:
1- I tried the benchmark used by Christian for comparisons with PFC [1],
however it seems that this test is very special. I get large differences
between two runs. Basically, it seems the simulation only depends on
truncation errors: vertical columns of spheres remain stable until a
small bit of horizontal noise makes them fall down one by one. If you
look at the simulation in the GUI it looks strange. I did not insist
with this one, I think it could be improved by replacing the lattice by
disordered packings.
2- The benchmark done by Alexander some time ago (on the same problem
but with -j>1) is not visible anywhere if I'm not wrong. I have a copy
of the pdf, is it ok to upload it on the wiki? It is an interesting
starting point for evaluating the parallel collider.

Bruno

[1] https://www.yade-dem.org/wiki/Comparisons_with_PFC3D



[Yade-dev] parallel collider - testing needed

2014-02-24 Thread Bruno Chareyre
Hi there,
I implemented a parallel version of the InsertionSortCollider. It is
almost ready but not yet pushed to the main trunk, as I have a few
things to check before that.
It would be helpful if some of you could 1/ test that your scripts work
correctly and 2/ benchmark this for N>100k and j>4.
If you run benchmarks, please remember to always activate timing and
report the result of timing.stats(). It gives much more interesting data
than the wall clock time.
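
For those who have not used the timers before, here is a minimal sketch of
what I mean. It assumes the usual Yade names (O.timingEnabled, O.run, and
the yade.timing module with reset() and stats()); the helper
run_with_timing and the iteration count are only illustrative and not part
of Yade.

```python
def run_with_timing(omega, timing_module, n_iter=200):
    """Run n_iter steps with per-engine timing enabled, then print the table.

    omega         -- the simulation object (O in a Yade session)
    timing_module -- the yade.timing module
    Both are passed in explicitly so that this hypothetical helper has no
    import-time dependency on Yade.
    """
    omega.timingEnabled = True   # switch on the per-engine timers
    timing_module.reset()        # drop counters left over from earlier runs
    omega.run(n_iter, True)      # second argument: block until the run ends
    timing_module.stats()        # print the per-engine timing table
```

In a Yade session this would be called as
"from yade import timing; run_with_timing(O, timing)".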

Preliminary benchmark results are below (from my laptop...), showing a
speedup by a factor of about 2 on the total computation time for -j4 with
200k particles (compared to the sequential collider).
The speedup on the collider alone is in fact of the order of x3.68 for 4
threads, i.e. nearly linear, at least for such a small number of threads.

My expectation is that it should change almost nothing for a small number
of particles (say, N<10k), where colliding is an inexpensive step.
For 1 million particles, on the other hand, there could be a significant
speedup, since the collider takes most of the time.
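
To make this expectation a bit more concrete, here is a rough
Amdahl's-law sketch. It assumes that only the collider gets faster and
that the other engines are unchanged. The function name is made up for
illustration. The fraction 0.64 is the collider share from the trunk run
below; 0.05 and 0.90 are assumed values standing in for "small N" and
"large N", not measurements.

```python
def overall_speedup(collider_fraction, collider_speedup):
    """Amdahl's law: total speedup when only the collider is accelerated.

    collider_fraction -- share of run time spent in the collider (0..1)
    collider_speedup  -- speedup factor of the collider alone
    """
    untouched = 1.0 - collider_fraction  # time in all other engines
    return 1.0 / (untouched + collider_fraction / collider_speedup)


# 0.64 is taken from the trunk table; 0.05 and 0.90 are assumptions.
for fraction in (0.05, 0.64, 0.90):
    print(fraction, round(overall_speedup(fraction, 3.68), 2))
```

The larger the collider share, the closer the total speedup gets to the
collider speedup itself, which is why the gain should grow with N.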

You can get the "pc" branch at my github repo:
git clone -b pc https://github.com/bchareyre/trunk.git

Results of yade -j4 --performance are below (i7 quad-core with
hyperthreading enabled, lightly loaded by background tasks; j>4 is not
reported, as hyperthreading is probably doing no good).

Happy benchmarking. :)

Bruno



./yade-trunk -j4 --performance  (the current trunk)
...
number of bodies 200813

Elapsed  29.4102840424  sec
Performance  6.80034234664  iter/sec
Extrapolation on 1e5 iters  4.08476167255  hours
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
Name                      Count          Time      Rel. time
------------------------------------------------------------
ForceResetter               200      700881us          2.38%
InsertionSortCollider         7    18816625us         64.02%
InteractionLoop             200     6581283us         22.39%
NewtonIntegrator            200     3293119us         11.20%
TOTAL                              29391910us        100.00%

Common time  597.731503963 s


5037  spheres, velocity= 327.689688709 +- 5.13604387635 %
25103  spheres, velocity= 81.2726909754 +- 1.0105334405 %
50250  spheres, velocity= 45.4114521341 +- 3.02333274436 %
100467  spheres, velocity= 19.0287424005 +- 2.26073439157 %
200813  spheres, velocity= 6.51664351023 +- 4.03351515402 %


SCORE: 13777
Number of threads  4



./yade-parallel -j4 --performance  (my "pc" branch)


number of bodies 200813

Elapsed  15.4320101738  sec
Performance  12.9600744004  iter/sec
Extrapolation on 1e5 iters  2.14333474636  hours
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
Name                      Count          Time      Rel. time
------------------------------------------------------------
ForceResetter               200      671157us          4.36%
InsertionSortCollider         7     5145114us         33.42%
  boundDispatcher             7       93186us          1.81%
  bound                       7          12us          0.00%
  copy                        7      160891us          3.13%
  erase                       7       66932us          1.30%
  sort&collide                7     4824071us         93.76%
  TOTAL                      35     5145095us        100.00%
InteractionLoop             200     6545848us         42.52%
NewtonIntegrator            200     3030989us         19.69%
TOTAL                              15393110us        100.00%

Common time  460.37680912 s


5037  spheres, velocity= 365.599773471 +- 8.02397068512 %
25103  spheres, velocity= 92.0077536966 +- 3.81069496509 %
50250  spheres, velocity= 54.1683980588 +- 0.528288534811 %
100467  spheres, velocity= 25.7134767981 +- 1.0796373464 %
200813  spheres, velocity= 12.6488486429 +- 4.66276699319 %


SCORE: 18800
Number of threads  4
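
The per-engine speedups can be rechecked from the two timing tables above
with a few lines of Python. This is only a sketch: the dictionaries are
typed in by hand from the tables (times in microseconds), and the names
trunk_us, pc_us and speedups are made up for the example. From these
numbers the collider ratio comes out at about 3.66 and the total at about
1.91.

```python
# Per-engine times in microseconds, copied from the two tables above.
trunk_us = {
    "ForceResetter": 700881,
    "InsertionSortCollider": 18816625,
    "InteractionLoop": 6581283,
    "NewtonIntegrator": 3293119,
}
pc_us = {
    "ForceResetter": 671157,
    "InsertionSortCollider": 5145114,
    "InteractionLoop": 6545848,
    "NewtonIntegrator": 3030989,
}


def speedups(before, after):
    """Return the ratio before/after for every engine present in 'before'."""
    return {name: before[name] / after[name] for name in before}


per_engine = speedups(trunk_us, pc_us)
total_speedup = sum(trunk_us.values()) / sum(pc_us.values())

for name, ratio in per_engine.items():
    print(f"{name:24s} x{ratio:.2f}")
print(f"{'TOTAL':24s} x{total_speedup:.2f}")
```

InteractionLoop and NewtonIntegrator barely change between the two runs,
so nearly all of the gain comes from the collider.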


___
Mailing list: https://launchpad.net/~yade-dev
Post to : yade-dev@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yade-dev
More help   : https://help.launchpad.net/ListHelp