Re: Question about pmap

2009-08-10 Thread cliffc
I'll volunteer to run your code on an Azul box. - Azul gear has a great profiling tool. I should be able to rapidly tell hot-locks/lock-contention from other resource bottlenecks. - Azul gear has far more bandwidth than X86 gear, so if your X86 is bandwidth bound - this won't show up on us. - Az

Re: Question about pmap

2009-08-09 Thread Berk Özbozkurt
... >parallel (6) : "Elapsed time: 38357.797175 msecs" >parallel (7) : "Elapsed time: 37756.190205 msecs" >From 4 to 7 there is no speedup at all. >This looks awfully like you are using a Core i7 with 8 threads but only 4 physical cores. What is your hardware? Sorry, I found you have alre

Re: Question about pmap

2009-08-09 Thread Bradbev
On Aug 9, 6:08 am, Nicolas Oury wrote: > > If I do my pmaptest with a very large Integer (inc 20) instead > > of (inc 0), it is as slow as the double version. My question is > > whether Clojure might have special handling for small integers? Like > > using primitives for small ints and do

Re: Question about pmap

2009-08-09 Thread Nicolas Oury
> If I do my pmaptest with a very large Integer (inc 20) instead > of (inc 0), it is as slow as the double version. My question is > whether Clojure might have special handling for small integers? Like > using primitives for small ints and doing a new Integer for larger > ones? > It seem

Re: Question about pmap

2009-08-09 Thread Johann Kraus
> Johann, if you are still following this thread, could you try running > this Clojure program on your 8 core machine? > > http://github.com/jafingerhut/clojure-benchmarks/blob/3e45bd8f6c3eba4... > > This first set of parameters below will do 8 jobs sequentially, each > doing 10^10 (inc c)'s, whe

Re: Question about pmap

2009-08-08 Thread Chad Harrington
Andy, I just thought I'd mention that for 80 cents you can rent an hour on an 8-core EC2 machine with 7GB of RAM. We use EC2 a lot for such things at work. It may be an easy way for you to accomplish your goals. http://aws.amazon.com/ec2/instance-types/ Chad Harrington chad.harring...@gmail.com

Re: Question about pmap

2009-08-08 Thread Nicolas Oury
Hi Brad, I think that there is no global lock for heap allocation, at least for small objects. In support of this claim: http://www.ibm.com/developerworks/java/library/j-jtp09275.html (see more specifically: "Thread-local allocation", but the article is really interesting as a whole.) I am

Re: Question about pmap

2009-08-08 Thread Bradbev
> I'm not sure how to determine why calling 'new Double' each time > through NewDoubleTest's inner loop causes 2 threads to perform not > much better than 1.  The best possible explanation I've heard is from > Nicolas Oury -- perhaps we are measuring the bandwidth from cache to > main memory, not

Re: Question about pmap

2009-08-08 Thread Andy Fingerhut
Johann, if you are still following this thread, could you try running this Clojure program on your 8 core machine? http://github.com/jafingerhut/clojure-benchmarks/blob/3e45bd8f6c3eba47f982a0f6083493a9f076d0e9/misc/pmap-testing.clj This first set of parameters below will do 8 jobs sequentially,

Re: Question about pmap

2009-08-06 Thread Andy Fingerhut
On Aug 6, 11:51 am, John Harrop wrote: > Cache misses are a possibility; try the integer version with long, so the > size of the data is the same as with double. > The other possibility I'd consider likely is that the JDK you were using > implements caching in Double.valueOf(double). This could b

Re: Question about pmap

2009-08-06 Thread John Harrop
Cache misses are a possibility; try the integer version with long, so the size of the data is the same as with double. The other possibility I'd consider likely is that the JDK you were using implements caching in Double.valueOf(double). This could be dealt with if Clojure boxing directly called ne

Re: Question about pmap

2009-08-06 Thread Andy Fingerhut
On Aug 6, 10:00 am, Bradbev wrote: > On Aug 6, 3:07 am, Andy Fingerhut > wrote: > > > > > On Aug 5, 6:09 am, Rich Hickey wrote: > > > > On Wed, Aug 5, 2009 at 8:29 AM, Johann Kraus > > > wrote: > > > > >> Could it be that your CPU has a single floating-point unit shared by 4 > > > >> cores o

Re: Question about pmap

2009-08-06 Thread Sean Devlin
FYI, IEEE doubles are typically 64 bit; IEEE floats are typically 32 bit. The Wikipedia article is good: http://en.wikipedia.org/wiki/IEEE_754-2008 The IEEE standard (requires login): http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=4610935 I'm not sure how the JVM implements them prec

Re: Question about pmap

2009-08-06 Thread Bradbev
On Aug 6, 3:07 am, Andy Fingerhut wrote: > On Aug 5, 6:09 am, Rich Hickey wrote: > > > > > On Wed, Aug 5, 2009 at 8:29 AM, Johann Kraus wrote: > > > >> Could it be that your CPU has a single floating-point unit shared by 4 > > >> cores on a single die, and thus only 2 floating-point units total

Re: Question about pmap

2009-08-06 Thread Nicolas Oury
Hello again, Another interesting test: replace the double operation by something longer, that won't allocate anything. (a long chain of math functions with primitive types...), and see if the parallelism is better. Best, Nicolas. On Thu, Aug 6, 2009 at 2:32 PM, Nicolas Oury wrote: > Hello,
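A minimal sketch of the test suggested above, assuming a Clojure version (1.3 or later) in which literal `loop` locals stay primitive. The names `math-chain` and `pmap-math-test` are hypothetical and do not come from the thread; the particular chain of `Math` calls is only an illustration of work that should not allocate boxed Doubles in the inner loop.

```clojure
;; Hypothetical non-allocating workload: a chain of primitive double math.
;; With a ^long hint on n and literal initial values, i and acc are
;; primitive loop locals, so no Double objects are created per iteration.
(defn math-chain
  "Applies sqrt(1 + |sin acc|) n times, starting from 1.0, and returns the result."
  [^long n]
  (loop [i 0, acc 1.0]
    (if (< i n)
      (recur (inc i) (Math/sqrt (+ 1.0 (Math/abs (Math/sin acc)))))
      acc)))

;; Runs `jobs` copies of the workload in parallel, as in the pmap tests.
(defn pmap-math-test
  [jobs n]
  (doall (pmap (fn [_] (math-chain n)) (range jobs))))

;; Example usage at the REPL (iteration count chosen arbitrarily):
;; (time (pmap-math-test 8 100000000))
```

If this version scales with the number of cores while the boxed-double version does not, that would be consistent with the allocation and memory-bandwidth explanation, though it would not prove it.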

Re: Question about pmap

2009-08-06 Thread Nicolas Oury
Hello, I will try to have a guess. If 98% of the time is spent allocating Doubles, the program is loading new lines of memory into cache every n Doubles. At some point down the different levels of cache, you have a common cache/main memory for both cores and the bus to this memory has to be shared in so

Re: Question about pmap

2009-08-06 Thread Andy Fingerhut
On Aug 5, 6:09 am, Rich Hickey wrote: > On Wed, Aug 5, 2009 at 8:29 AM, Johann Kraus wrote: > > >> Could it be that your CPU has a single floating-point unit shared by 4 > >> cores on a single die, and thus only 2 floating-point units total for > >> all 8 of your cores?  If so, then that fact, pl

Re: Question about pmap

2009-08-05 Thread Rich Hickey
On Wed, Aug 5, 2009 at 8:29 AM, Johann Kraus wrote: > >> Could it be that your CPU has a single floating-point unit shared by 4 >> cores on a single die, and thus only 2 floating-point units total for >> all 8 of your cores?  If so, then that fact, plus the fact that each >> core has its own separ

Re: Question about pmap

2009-08-05 Thread Johann Kraus
> Could it be that your CPU has a single floating-point unit shared by 4 > cores on a single die, and thus only 2 floating-point units total for > all 8 of your cores? If so, then that fact, plus the fact that each > core has its own separate ALU for integer operations, would seem to > explain th

Re: Question about pmap

2009-08-04 Thread Andy Fingerhut
Johann: Could it be that your CPU has a single floating-point unit shared by 4 cores on a single die, and thus only 2 floating-point units total for all 8 of your cores? If so, then that fact, plus the fact that each core has its own separate ALU for integer operations, would seem to explain the

Re: Question about pmap

2009-08-04 Thread Johann Kraus
> My guess would be you're seeing the overhead for pmap since the > (inc 0.1) computation is so cheap.  From the docs for pmap: >   "Only useful for computationally intensive functions where the time of >   f dominates the coordination overhead." I don't think so, as the cheap computation (inc 0.

Re: Question about pmap

2009-08-03 Thread Sudish Joseph
Johann Kraus writes: > Doing this with doubles: > leads to: > (time (maptest 8)) : 68044.060324 msecs > (time (pmaptest 8)) : 35051.174503 msecs > i.e. a speedup of ~2. > > However, the CPU usage indicated by "top" is ~690%. What does the CPU > do? My guess would be you're seeing the overhead

Re: Question about pmap

2009-08-03 Thread tmountain
> However, the CPU usage indicated by "top" is ~690%. What does the CPU do? 100% per core. So with dual quad-core processors, it'd mean roughly 7 cores were being pegged.

Re: Question about pmap

2009-08-03 Thread Johann Kraus
Sorry about the copy&paste error. I partially changed len to cores. The code must look like: (defn maptest [cores] (doall (map (fn [x] (dotimes [_ 10] (inc 0))) (range cores)))) (defn pmaptest [cores] (doall (pmap (fn [x] (dotimes [_ 10] (inc 0))) (range cores)))) and (defn mapt
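For readers who want to try the comparison, here is a laid-out sketch of the two functions. The `iterations` parameter and the `busy-work` helper are assumptions added for illustration; the loop count used in the original post is not reproduced here.

```clojure
;; Hypothetical helper: runs (inc 0) `iterations` times and discards the results.
(defn busy-work
  [iterations]
  (dotimes [_ iterations] (inc 0)))

;; Runs `cores` copies of the workload one after another.
(defn maptest
  [cores iterations]
  (doall (map (fn [_] (busy-work iterations)) (range cores))))

;; Runs `cores` copies of the workload in parallel via pmap.
(defn pmaptest
  [cores iterations]
  (doall (pmap (fn [_] (busy-work iterations)) (range cores))))

;; Example usage at the REPL (iteration count chosen arbitrarily):
;; (time (maptest 8 1000000))
;; (time (pmaptest 8 1000000))
```

`doall` matters in both versions: `map` and `pmap` are lazy, so without it `time` would measure little more than building the sequence.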

Question about pmap

2009-08-03 Thread Johann Kraus
Hi all, recently I did some micro-benchmarks of parallel code on my 8-core computer. But I don't understand this behaviour of pmap. Can anyone explain this to me? The code is running on a dual quad-core Intel machine (Xeon X5482, 3.20 GHz). (defn maptest [cores] (doall (map (fn [x] (dot