http://mechanical-sympathy.blogspot.co.uk/
http://lmax-exchange.github.io/disruptor/
*Neale Swinnerton*
{t: @sw1nn <https://twitter.com/#!/sw1nn>, w: sw1nn.com }
On 27 September 2013 12:29, Wm. Josiah Erikson wrote:
> Interesting! If that is true of Java (I don't know Java
Interesting! If that is true of Java (I don't know Java at all), then your
argument seems plausible. Cache-to-main-memory writes still take many more
CPU cycles (an order of magnitude more, last I knew) than
processor-to-cache. I don't think it's so much a bandwidth issue as
latency, AFAIK. Thanks
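(For rough calibration, not from the thread: typical ballpark latencies on modern x86 are on the order of ~4 cycles for an L1 hit, ~12 for L2, ~40 for L3, and ~100-300 cycles for a trip to DRAM, which is consistent with latency rather than bandwidth being the bottleneck for pointer-chasing workloads.)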
Am I reading this right that this is actually a Java problem, and not
Clojure-specific? Wouldn't the rest of the Java community have noticed
this? Or maybe massive parallelism in this particular way isn't something
commonly done with Java in the industry?
Thanks for the patches though - it's nice
On Dec 19, 2012, at 1:00 PM, Lee Spector wrote:
>
> On Dec 19, 2012, at 11:57 AM, Wm. Josiah Erikson wrote:
> > I think this is a succinct, deterministic benchmark that clearly
> > demonstrates the problem and also doesn't use conj or reverse.
>
> Clarification: it's not just a tight
Whoops, sorry about the link. It should be able to be found here:
http://gibson.hampshire.edu/~josiah/clojush/
On Wed, Dec 19, 2012 at 11:57 AM, Wm. Josiah Erikson wrote:
> So here's what we came up with that clearly demonstrates the problem. Lee
> provided the code and I tweaked
So here's what we came up with that clearly demonstrates the problem. Lee
provided the code and I tweaked it until I believe it shows the problem
clearly and succinctly.
I have put together a .tar.gz file that has everything needed to run it,
except lein. Grab it here: clojush_bowling_benchmark.tar.gz
www.azulsystems.com/products/zing/whatisit
>
> Andy
>
> On Dec 13, 2012, at 10:41 AM, Wm. Josiah Erikson wrote:
>
> OK, I did something a little bit different, but I think it proves the same
> thing we were shooting for.
>
> On a 48-way 4 x Opteron 6168 with 32GB of RAM. This
Ah. We'll look into running several clojures in one JVM too. Thanks.
On Thu, Dec 13, 2012 at 1:41 PM, Wm. Josiah Erikson wrote:
> OK, I did something a little bit different, but I think it proves the same
> thing we were shooting for.
>
> On a 48-way 4 x Opteron 6168 with 32G
OK, I did something a little bit different, but I think it proves the same
thing we were shooting for.
On a 48-way 4 x Opteron 6168 with 32GB of RAM, this is Tom's "Bowling"
benchmark:
1. multithreaded. Average of 10 runs: 14:00.9
2. singlethreaded. Average of 10 runs: 23:35.3
3. singlethreaded,
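For context, a minimal timing harness of the sort used for averages like these might look like the sketch below. This is illustrative only; bowling-benchmark is a placeholder name, not the actual entry point of Tom's benchmark.

(defn avg-runtime-ms
  "Run f n times and return the mean wall-clock time in milliseconds."
  [f n]
  (let [times (for [_ (range n)]
                (let [start (System/nanoTime)]
                  (f)
                  (/ (- (System/nanoTime) start) 1e6)))]
    (/ (reduce + times) n)))

;; e.g. (avg-runtime-ms bowling-benchmark 10)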
Hm. Interesting. For the record, the exact code I'm running right now that
I'm seeing great parallelism with is this:
(defn reverse-recursively [coll]
  ;; Walk the seq, consing each element onto the accumulator,
  ;; which builds the result in reverse order.
  (loop [[r & more :as all] (seq coll)
         acc '()]
    (if all
      (recur more (cons r acc))
      acc)))
(defn burn
  ([] (loop [i 0
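The rest of burn is cut off in the archive. As a stand-in only (an assumption about its general shape, not Lee's actual code), a CPU- and allocation-heavy function in the same spirit might look like:

(defn burn
  ;; Hypothetical reconstruction; the real definition is truncated above.
  ;; Builds a 10000-element list, then reverses it with the function above.
  ([] (loop [i 0
             value '()]
        (if (= i 10000)
          (count (reverse-recursively value))
          (recur (inc i) (cons i value)))))
  ;; one-arg arity so it can be mapped/pmapped over a collection
  ([_] (burn)))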
may have just saved us outrageous quantities of time, though I don't think
Lee is convinced that we know exactly what's going on with clojush yet. We
haven't looked at it yet, though I'm sure we will soon enough!
On Tue, Dec 11, 2012 at 2:57 PM, Wm. Josiah Erikson wrote:
And, interestingly enough, suddenly the AMD FX-8350 beats the Intel Core i7
3770K, when before it was very very much not so. So for some reason, this
bug was tickled more dramatically on AMD multicore processors than on Intel
ones.
On Tue, Dec 11, 2012 at 2:54 PM, Wm. Josiah Erikson wrote:
>
OK WOW. You hit the nail on the head. It's "reverse" being called in a pmap
that does it. When I redefine my own version of reverse (I totally cheated
and just stole this) like this:
(defn reverse-recursively [coll]
  (loop [[r & more :as all] (seq coll)
         acc '()]
    (if all
      (recur more (cons r acc))
      acc)))
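A minimal A/B comparison of the two under pmap might look like the following sketch. The bench helper and the collection sizes are illustrative, not the thread's exact test.

;; Illustrative comparison; sizes are arbitrary.
(defn bench [f]
  (time (dorun (pmap (fn [_] (count (f (range 100000))))
                     (range 32)))))

(bench reverse)             ; built-in reverse
(bench reverse-recursively) ; the hand-rolled replacement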
  90528K, 0% used [...]
 compacting perm gen  total 21248K, used 11049K [...]
   the space 21248K, 52% used [...]
wc -l
158
[josiah@compute-1-17 benchmark]$
On Mon, Dec 10, 2012 at 12:55 PM, Wm. Josiah Erikson wrote:
> Aha. Not only do I get a lot of "made not entrant", I get a lot of "made
> zombie". However, I get this for both runs with map and with pmap (and with
> p
Aha. Not only do I get a lot of "made not entrant", I get a lot of "made
zombie". However, I get this for both runs with map and with pmap (and with
pmapall as well)
For instance, from a pmapall run:
33752 159 clojure.lang.Cons::next (10 bytes) made zombie
33752 164
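For reference, lines of this shape ("made not entrant", "made zombie") are HotSpot's -XX:+PrintCompilation output; they mark compiled methods that have been deoptimized and discarded, which is normal during warm-up. With Leiningen the flag can be enabled via :jvm-opts, e.g.:

;; project.clj sketch; the project name and version are placeholders.
(defproject benchmark "0.1.0"
  :jvm-opts ["-XX:+PrintCompilation"])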
> when allocating memory? Should JVM memory allocations be completely
> parallel with no synchronization when running multiple threads, or do
> memory allocations sometimes lock a shared data structure?
>
> Andy
>
>
> On Dec 8, 2012, at 11:10 AM, Wm. Josiah Erikson wrote:
Andy: The short answer is yes, and we saw huge speedups. My latest post, as
well as Lee's, has details.
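For what it's worth on Andy's question: HotSpot normally allocates from thread-local allocation buffers (TLABs), so most small allocations are lock-free bump-pointer operations within a per-thread buffer; synchronization only comes into play when a thread needs a fresh buffer from the shared eden space. TLAB behavior can be observed with standard HotSpot flags, e.g. (Leiningen-style sketch):

;; project.clj sketch; -XX:+UseTLAB is on by default, and -XX:+PrintTLAB
;; prints per-thread buffer statistics at each GC.
(defproject benchmark "0.1.0"
  :jvm-opts ["-XX:+UseTLAB" "-XX:+PrintTLAB"])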
On Friday, December 7, 2012 9:42:03 PM UTC-5, Andy Fingerhut wrote:
>
>
> On Dec 7, 2012, at 5:25 PM, Lee Spector wrote:
>
> >
> > Another strange observation is that we can run multiple inst
Hi guys - I'm the colleague Lee speaks of. Because Jim mentioned running
things on a 4-core Phenom II, I did some benchmarking on a Phenom II X4
945, and found some very strange results, which I shall post here, after I
explain a little function that Lee wrote that is designed to get improved
r