Re: gemacl: Scientific computing application written in Clojure

2014-12-22 Thread Henrik Eneroth
Interesting read Jose, thanks!

It might be interesting to try a transducer on 

(defn dot-prod 
  "Returns the dot product of two vectors"
  [v1 v2]
  (reduce + (map * v1 v2)))

if you can get your hands on the 1.7 alpha and have the time and 
inclination to do it. Transducers have been shown to be faster than 
chaining sequence functions, although I don't know how likely they are 
to beat native arrays; probably not very likely.
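
Something like the sketch below might work (just a sketch: map's 
transducer arity takes a single input collection, so the vectors are 
paired first, and since that pairing step is itself a lazy seq the gain 
may be modest):

(defn dot-prod-xf
  "Returns the dot product of two vectors, using transduce."
  [v1 v2]
  ;; pair up the elements, then multiply-and-sum in a single pass
  (transduce (map (fn [[x y]] (* x y)))
             +
             (map vector v1 v2)))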





Re: gemacl: Scientific computing application written in Clojure

2014-12-22 Thread Mikera
For most array operations (e.g. dot products on vectors), I strongly 
recommend trying out the recent core.matrix implementations. We've put a 
lot of effort into fast implementations and a nice clean Clojure API so I'd 
love to see them used where it makes sense!

For example vectorz-clj can be over 100x faster than a naive map / reduce 
implementation:

(let [a (vec (range 1))
      b (vec (range 1))]
  (time (dotimes [i 100] (reduce + (map * a b)))))
;; Elapsed time: 364.590211 msecs

;; assumes (require '[clojure.core.matrix :refer [array dot]])
;; with the vectorz-clj implementation on the classpath
(let [a (array :vectorz (range 1))
      b (array :vectorz (range 1))]
  (time (dotimes [i 100] (dot a b))))
;; Elapsed time: 3.358484 msecs
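
(To try it: both artifacts are on Clojars, under net.mikera/core.matrix 
and net.mikera/vectorz-clj. A minimal usage sketch:)

(require '[clojure.core.matrix :as m])
(m/set-current-implementation :vectorz)

;; dot works the same on any core.matrix implementation
(m/dot (m/array [1 2 3]) (m/array [4 5 6]))
;; => 32.0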






Re: gemacl: Scientific computing application written in Clojure

2014-12-22 Thread Christopher Small
I'll second the use of core.matrix. It's a wonderful, idiomatic, fast 
library, and I hope to see folks continue to rally around it.







Re: gemacl: Scientific computing application written in Clojure

2014-12-22 Thread Jose M. Perez Sanchez

Thank you very much for your replies. I will definitely take a look at 
core.matrix. I really hate the fact that I had to use Java arrays to make 
it fast. I'll take a look at transducers as well.

Kind regards,

Jose.






Re: gemacl: Scientific computing application written in Clojure

2014-12-21 Thread Jose M. Perez Sanchez

Hi everyone:

Sorry that it has taken so long. I've just released the software on GitHub 
under the EPL. It can be found at:

https://github.com/iosephus/gema


Kind regards,

Jose.



Re: gemacl: Scientific computing application written in Clojure

2014-12-21 Thread Jose M. Perez Sanchez

Regarding the speed optimizations: execution time for a given model was 
reduced from 2735 seconds to 70 seconds over several versions, through a 
series of optimizations.

The same calculation implemented in C# takes 12 seconds using the same 
computer and OS. Maybe the Clojure code can still be improved, but for the 
time being I'm happy with the Clojure version being six times slower, since 
the new software has many advantages.

For these tests the model was the circle with radius 1 using the diffmr1 
tracker; the simulation was run using 1 particles and 1 total 
random walk steps.

These modifications in the critical parts of the code accounted for most of 
the improvement:

- Avoid reflection by using type hints.
- Use Java arrays.
- In some cases call Java arithmetic functions directly instead of Clojure 
ones (see the sketch after this list).
- Avoid using partial functions in the critical parts of the code.
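
As an illustration of the first and third points, the changes have 
roughly this shape (a simplified sketch, not actual gema code):

;; ^doubles lets aget/areduce compile without reflection, and
;; java.lang.Math is called directly instead of a generic Clojure
;; arithmetic function.
(defn norm ^double [^doubles v]
  (Math/sqrt (areduce v i ret 0.0
                      (+ ret (* (aget v i) (aget v i))))))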

Avoiding laziness did not help much. Regarding the use of Java arrays, 
there are many small functions performing typical vector operations on 
arrays, such as the following example:

Using Clojure types:

(defn dot-prod 
  "Returns the dot product of two vectors"
  [v1 v2]
  (reduce + (map * v1 v2)))

Using Java arrays:

(defn dot-prod-j
  "Returns the dot product of two arrays of doubles"
  [^doubles v1 ^doubles v2]
  (areduce v1 i ret 0.0
           (+ ret (* (aget v1 i)
                     (aget v2 i)))))
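
(A practical aside: the compiler flag below reports every reflective 
call site, which makes these hot spots easy to find.)

;; Warn whenever a call is resolved by reflection at runtime.
(set! *warn-on-reflection* true)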


This gives a general idea of which optimizations helped the most. These 
changes are not in the public repository, since previous commits were 
omitted because the code was not ready for publication (different 
license disclaimer, contained email addresses, etc.). If anyone is 
interested in the diffs and the execution times over several optimizations, 
please contact me.

Kind regards,

Jose.







Re: gemacl: Scientific computing application written in Clojure

2014-06-04 Thread Mars0i


On Tuesday, June 3, 2014 12:46:55 PM UTC-5, Mars0i wrote:

 (def ones (doall (repeat 1000 1)))
 (bench (def _ (doall (map rand ones))))   ; 189 microseconds average time
 (bench (def _ (doall (pmap rand ones))))  ; 948 microseconds average time


For the record, I worried later that rand was too inexpensive, and that 
those results were being driven only by the cost of setting up threads in 
pmap.  This seems like a better test:

(bench (def _ (doall (map #(nth (iterate rand %) 1) (repeat 256 1)))))
; 185 milliseconds average time

(bench (def _ (doall (pmap #(nth (iterate rand %) 1) (repeat 256 1)))))
; 793 milliseconds average time

I have been having success getting a speedup simply by changing certain map 
calls to pmap in my main project.  I'm sure that many of us will be 
interested in a report whenever you get to it, but I can easily imagine 
that finding the time to summarize what you've learned is difficult. 



Re: gemacl: Scientific computing application written in Clojure

2014-06-03 Thread Mars0i
Jose,

This is an old thread, and whatever problems you might be dealing with now, 
they're probably not the same ones as when the thread was active.  However, 
I think that if parallel code uses the built-in Clojure random number 
functions, there is probably a bottleneck in access to the RNG.  With 
Criterium's bench function on an 8-core machine:

(def ones (doall (repeat 1000 1)))
(bench (def _ (doall (map rand ones))))   ; 189 microseconds average time
(bench (def _ (doall (pmap rand ones))))  ; 948 microseconds average time

One solution that doesn't involve generating the numbers in advance is to 
create separate RNGs, as discussed in this thread 
https://groups.google.com/forum/#!searchin/clojure/random/clojure/cRVS19PB06E/8FsmtsYx6SkJ.
  
This is a strategy that I am starting to explore.
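
Schematically, the separate-RNGs approach looks like this (my own 
sketch using java.util.Random, not code from that thread):

(import 'java.util.Random)

;; One generator per worker, seeded differently, so the threads
;; never contend on the shared RNG behind clojure.core/rand.
(defn run-workers [n-workers n-samples]
  (let [workers (doall
                 (for [seed (range n-workers)]
                   (future
                     (let [^Random rng (Random. (long seed))]
                       (vec (repeatedly n-samples #(.nextDouble rng)))))))]
    (mapv deref workers)))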

Related notes for anyone interested:

As of Incanter 1.5.5, at least some functions such as sample are based on 
Clojure's built-in rand, so they would have this problem as well.

clojure.data.generators allows rebinding the RNG, and provides 
reservoir-sample and a replacement for the default rand-nth.

The bigml/sampling 
https://github.com/bigmlcom/sampling/tree/master/src/bigml/sampling 
library provides sampling and random number functions with optional 
generation of a new RNG.



Re: gemacl: Scientific computing application written in Clojure

2014-06-03 Thread Jose M. Perez Sanchez

Thank you very much. I'm using the Colt random number generator directly. 
I've managed to reduce computing time by orders of magnitude using type 
hints and Java arrays in some critical parts. I haven't had the time to 
write a report on this for the list, since I have been busy with other 
projects, but it will come, as will the release of the source code.

Thanks again,

Jose.



Re: gemacl: Scientific computing application written in Clojure

2013-11-23 Thread Jose M. Perez Sanchez
Yes, the step extract function encodes the total number of steps and any 
intermediate steps whose values are to be saved.

I did the following changes to the code:

1 - Store results locally in the threads and return them when the thread 
function exits, instead of using a global vector. This does not impact 
performance directly (tested), but allows using a transient vector to 
store the results locally, which is faster.
2 - Use loop/recur to loop over the particles, the steps and the valid 
displacement generation (instead of lazy sequences with an extract 
function), and also in a few other small loops that are executed many times.
3 - Use transients in any vector to which a lot of data is going to be 
conjoined during the calculation (see the sketch after this list).
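
For illustration, the transient pattern in point 3 looks roughly like 
this (a generic sketch, not the actual code):

;; Accumulate with conj! on a transient vector and persist once at
;; the end, instead of allocating a persistent vector node per step.
(defn collect-steps [n step-fn init]
  (loop [i 0, x init, acc (transient [])]
    (if (< i n)
      (let [x' (step-fn x)]
        (recur (inc i) x' (conj! acc x')))
      (persistent! acc))))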

These changes brought the following results. There is some improvement, 
both in computing time and scaling; see the graphs attached. The master 
branch is the old code I posted already, and the perftest branch contains 
the changes. I'm sure there is still room for improvement, and I'll focus 
on that as soon as some important missing features get implemented and I 
can finish some calculations that we need urgently.

kovasb: Could you elaborate on the last part of "I think you should try 
making the core iteration purely functional, meaning no agents, atoms, *or 
side effecting functions like the random generator*"? I did remove the atom 
and agent (I keep a global integer ref though, since I need to track the 
progress of the calculation). Regarding the random displacements, if it 
means generating them first and then consuming them in a side-effect-free 
fashion, it would take a lot of RAM to store all those numbers...

Thanks a lot for the help, I'll keep you posted about any other tests that 
might be interesting and will let you know when the code gets released.

Best,

Jose.



benchmark.pdf
Description: Adobe PDF document


scaling.pdf
Description: Adobe PDF document


Re: gemacl: Scientific computing application written in Clojure

2013-11-18 Thread Jose M. Perez Sanchez
Hi Andy, cej38, kovas:

Thanks for the replies. I plan to release the whole code soon (waiting for 
institutional authorization).

I do use laziness both within the move function, to select the allowed 
random displacements, and when iterating the move function to generate the 
trajectory. Lazy structures are only consumed within the thread in which 
they are created.

Here is the core code where the computations happens:

(defn step-particle
  "Returns a new value for particle after moving particle once to a new
   position from the current one"
  [pdf-get-fn step-condition-fn calc-value-fns particle]
  (let [pos (particle :pos)
        disp (first (filter (fn [x] (step-condition-fn (particle :pos) x))
                            (repeatedly (fn [] (pdf-get-fn)))))
        new-pos (mapv + pos disp)
        new-base-particle {:pos new-pos :steps (inc (particle :steps))}
        new-trackers-results (if (seq calc-value-fns)
                               (zipmap
                                (keys calc-value-fns)
                                ((apply juxt (vals calc-value-fns))
                                 particle new-base-particle))
                               {})]
    (merge new-trackers-results new-base-particle)))

(defn walk-particles
  "While there is work to do, create new particles, move them n-steps, then
   send them to particle container (agent)"
  [todo particles simul-info init-get-fn init-condition step-get-fn
   step-condition trackers-maps step-extract-fn]
  (let [init-value-fns (zipmap
                        (keys trackers-maps)
                        (map :create-fn (vals trackers-maps)))
        calc-value-fns (zipmap
                        (keys trackers-maps)
                        (map :compute-fn (vals trackers-maps)))
        move (partial step-particle step-get-fn step-condition
                      calc-value-fns)]
    (while (> @todo 0)
      (swap! todo dec)
      (let [p (last (create-particle init-get-fn init-condition
                                     init-value-fns))
            lazy-steps (iterate move p)
            result (step-extract-fn lazy-steps)]
        (send-off particles (fn [x] (conj x result)))))))


Each worker is created by launching a future that executes walk-particles; 
each worker has a separate Mersenne Twister random number generator 
embedded into the pdf-get-fn (using partial on a common pdf-get function 
and different MT generators). In real calculations both the number of 
particles and the number of steps are at least 1e4. In the benchmarks I'm 
posting, particles are 1000 and steps are 5000.
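
Schematically, that wiring looks like this (a simplified sketch, not 
the actual code; Colt's MersenneTwister takes an int seed):

(import 'cern.jet.random.engine.MersenneTwister)

;; Close a distinct generator into each worker's sampling function,
;; so pdf-get-fn needs no coordination between threads.
(defn make-pdf-get-fn [pdf-get seed]
  (partial pdf-get (MersenneTwister. (int seed))))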

As expected, conjoining to a single global vector poses no problem: I 
tested both conjoining to a single global vector and to separate global 
vectors (one per worker), and the computing time is the same.

I could test in another system with 16 cores. See the results attached for 
the 8 and 16 core systems.

Best,

Jose.



benchmark.pdf
Description: Adobe PDF document


Re: gemacl: Scientific computing application written in Clojure

2013-11-18 Thread kovas boguta
Hi Jose,

I think you should try making the core iteration purely functional,
meaning no agents, atoms, or side effecting functions like the random
generator.

I assume the number of steps you evolve the particle is encoded in
step-extract-fn?

What you probably want is something like

(loop [i 0, pos initial-position]
  (if (< i num-of-steps)
    (recur (+ i 1) (move pos)) ;; iterate
    pos))                      ;; if done, return final position

This will make it easier to benchmark the iteration step, which is an
important number to know. I'm sure you can make it much faster; if
perf is the ultimate goal it's worth tuning a little.

In terms of distributing the work, I would not use atoms or agents.
They are not meant for parallelism or for work queues. With agents and
futures you need to be aware of the various thread pools involved
under the hood and make sure you are not saturating them. And combined
with laziness, it takes care to ensure work is getting done where you
are expecting it.

It would be easier to reason about what is going on by using threads
and queues directly. Enqueue a bunch of work on a queue, and directly
set up a bunch of threads that read batches of work from the queue
until it's empty.

If the initial condition / other parameters are the same across
workers, you could even skip the work queue, and just set up a bunch
of threads that just do the iterations and then dump their result
somewhere.
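
A rough sketch of the queue-and-threads version (all names mine, using
java.util.concurrent directly):

(import '(java.util.concurrent LinkedBlockingQueue))

;; Enqueue all work up front; each plain Thread drains the queue until
;; poll returns nil (assumes work items are non-nil), then publishes
;; its whole batch of results with a single swap!.
(defn run-queued [n-threads work-items do-work]
  (let [q       (LinkedBlockingQueue. ^java.util.Collection (vec work-items))
        results (atom [])
        worker  (fn []
                  (loop [acc []]
                    (if-let [item (.poll q)]
                      (recur (conj acc (do-work item)))
                      (swap! results into acc))))
        threads (doall (repeatedly n-threads #(Thread. ^Runnable worker)))]
    (doseq [^Thread t threads] (.start t))
    (doseq [^Thread t threads] (.join t))
    @results))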

I also definitely recommend making friends with loop/recur:

(time
 (loop [i 0]
   (if (< i 100)
     (recur (+ 1 i))
     true)))
=> Elapsed time: 2.441 msecs

(def i (atom 0))

(time
 (while (< @i 100)
   (swap! i + 1)))
=> Elapsed time: 52.767 msecs

loop/recur is both simpler and faster, and the best way to rapidly iterate.






Re: gemacl: Scientific computing application written in Clojure

2013-11-12 Thread cej38
It is hard to say where the root of your problem lies without looking at 
the code more.  I would look closely at laziness.  I find that lazy 
evaluation really kills parallelization.






Re: gemacl: Scientific computing application written in Clojure

2013-11-12 Thread kovas boguta
Sounds like some form of overhead is dominating the computation. How
are the infinite sequences being consumed? Is it 1 thread per
sequence? How compute-intensive is (move particle)? What kind of
numbers are we talking about in terms of steps and particles?

If move is fast, you probably need to batch up your computation. If
move is a simple arithmetic operation or otherwise something without
an inner loop, I'd make it perform at least 100 iterations per
invocation.
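
Schematically (hypothetical helper):

;; Wrap a cheap step function so one invocation advances n steps,
;; amortizing the per-call and lazy-seq overhead across the batch.
(defn batch-move [move n]
  (fn [particle]
    (loop [i 0, p particle]
      (if (< i n)
        (recur (inc i) (move p))
        p))))

;; usage: (iterate (batch-move move 100) particle)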

If you have many particles, I'd pay attention to how the threads are
switching between them, and eliminate any switching if possible.

I'd definitely recommend removing the global recording to reduce
complexity for now.








Re: gemacl: Scientific computing application written in Clojure

2013-11-09 Thread Jose M. Perez Sanchez

Hi Andy:

Thanks a lot for your reply. I'll do more careful testing in the very near 
future, and there is surely a lot to optimize in my code. I must say I did 
expect a computing speed reduction, coming from an already optimized 
codebase with the performance-critical parts written in C; there is an 
intentional trade-off in my porting effort to get something more 
maintainable, extensible and scalable. My future plans are to run in a 
cluster on something like EC2, because I've done the numbers and buying 
hardware isn't cost effective for us anymore (we paid around EUR 10K for 
our last big computer, and we can do a lot of computing in the cloud for 
that money). Since the software is used for research, we tend to add 
features and change it so that it simulates the different scenarios coming 
out of our scientific discussions: this means we spend almost as much time 
coding as simulating, and having a higher-level language like Clojure helps 
us enormously.

I'll keep you posted about my future performance tests and the Open Source 
release of the software.

Best,

Jose.



Re: gemacl: Scientific computing application written in Clojure

2013-11-09 Thread Andy Fingerhut
Jose:

On re-reading your original post, I noticed one statement you made that may
be of interest: "The resulting vector for each particle is then added
(conj) to a global vector for later storage."

Do you mean that there is a single global vector that is conj'd onto by all
N threads?  Is this vector in a ref or atom, perhaps, and you use swap! or
something similar to update it from all threads?

If so, and if you do that frequently from each thread, then that part of
your code is definitely not embarrassingly parallel, even if the rest of it
is.

Andy




Re: gemacl: Scientific computing application written in Clojure

2013-11-09 Thread Jose M. Perez Sanchez

Hi Andy:

Yes, this breaks embarrassing parallelism indeed. When the calculations are 
done for real this isn't a problem though, because these conj operations to 
the global list happen sporadically (on average once every couple of 
seconds or so), so the probability of a thread waiting for a significant 
amount of time is very low. In the short benchmarks I posted this happens 
every few milliseconds on average, and there it could be a problem.

Honestly, even in the one-conj-every-few-ms case I don't expect a problem 
there. I don't know how computationally expensive the conj is, but for 
every conj to the global list, at least a few tens of thousands of random 
numbers are generated with the Mersenne Twister, and a similar number of 
other arithmetical operations are done. Several local conj operations 
inside the thread are also performed, and in each of the few thousand 
steps maps are created and merged. The only way to know for sure is 
testing though; I'll post the results as soon as I can run a test.

Thanks a lot.

Best,

Jose.



gemacl: Scientific computing application written in Clojure

2013-11-08 Thread Jose M. Perez Sanchez
Hello everyone:

This is my first post here. I'm a researcher writing numerical simulation 
software in Clojure. Actually, I'm porting an app a coworker and I wrote in 
C/Python (called GEMA) to Clojure. The app has been in use for a while at 
our group, but became very difficult to maintain after outgrowing its 
initial design and being very monolithic; at the same time I wanted to 
learn Functional Programming, so I've been working on the port for a few 
weeks.

The simulations are embarrassingly parallel Random Walk calculations used 
to study gas diffusion and Helium-3 Magnetic Resonance diffusion 
measurements in the lungs. At the core of the simulations we do there is a 
3D geometrical model of the pulmonary acinus. The new application is 
designed in a modular fashion; I'm including part of the current README 
file with a description.

I've approached my institution's Technology Transfer Office to request 
authorization to release the software under an Open Source license, and if 
everything goes well the code will be published soon. I'm very happy in my 
Clojure trip so far and all the things I'm learning in the process.

One of the things I've observed is poor scaling with the number of threads 
for more than 4 threads in an 8-core Intel i7 CPU, as follows:

NT    Time    cpu%x8
 1    101.9    108
 2     54.9    220
 4     36.0    430
 6     33.9    570
 8     32.5    700
10     32.5    720

Computing times reported are just the time spent in the computation of the 
NT futures (not total program execution time). CPU x8 percent is measured 
with top in Linux and the % values are approximate, just to give an idea. 
I'm running on Debian Wheezy with the following Java platform:

JRE: OpenJDK Runtime Environment 1.6.0_27-b27 on Linux 3.2.0-4-amd64 (amd64)
JVM: OpenJDK 64-Bit Server VM (build 20.0-b12 mixed mode)

I'll try on a 16-core machine (4-way Opteron) soon and see what happens 
there. The computing happens over an infinite lazy sequence of random walk 
steps generated with (iterate move particle), where an extraction function 
gets values from zero to the highest number of random walk steps and adds 
(conj) the values to be kept to a vector. The resulting vector for each 
particle is then added (conj) to a global vector for later storage.
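
Schematically, the extraction works like this (a simplified sketch, not 
the actual code):

;; Walk lazily and realize only the first (inc n) positions; the real
;; extraction function also selects which intermediate steps to keep.
(defn walk-and-extract [move particle n]
  (vec (take (inc n) (iterate move particle))))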

I've read the previous post about concurrent performance in AMD processors: 
https://groups.google.com/forum/#!topic/clojure/48W2eff3caU%5B1-25-false%5D. 
I have to go over it again with more time though, to check whether any of 
the explanations presented there applies to my application. 

Best regards,

Jose Manuel.



README-brief.md
Description: Binary data


Re: gemacl: Scientific computing application written in Clojure

2013-11-08 Thread Andy Fingerhut
Jose:

I am not aware of any conclusive explanation for the issue, and would love
to know one if anyone finds out.

At least in the case of that program mentioned in the other discussion
thread, much better speedup was achieved running N different JVM processes,
each single-threaded, on a machine with N CPU cores.  If you are willing to
try an experiment like that and see whether you get similar results, that
would indicate that the issue is due to multiple threads within a single
JVM, as opposed to some OS or hardware performance limitation.

Below is a list of possible explanations that seem most likely to me, but
again, no conclusive evidence for any of them yet:

1. JVM object allocation and/or garbage collector using locks or other
multi-threading performance killers
2. CPU core cache thrashing when the thread scheduler causes threads to
frequently be scheduled on different CPU cores (I haven't aired that guess
before, but it is related to the guess I made near the end of the
conversation you link to).
3. CPU core cache thrashing because single-threaded versions have working
sets that fit in caches close to CPU cores, but this working set is
multiplied by N when running N threads.
4. Some subtle area of Clojure implementation that you are using that is
limiting parallelism

Andy


