One could quantify this with category theory: to map from one domain to
another reversibly, the target domain must have at least as many elements as
the source domain. An abstraction which simplifies by reducing the number of
elements in play is therefore guaranteed to be leaky.
This is a slightly different result, as this time I measured elapsed time
(see the appendix, and excuse the not-so-nice code) as opposed to wall-clock
time. Results are similar (unless you have more processes than cores). I am
planning to release the code to GitHub soon.
Memory access patterns make a huge difference to memory throughput. I've
explored this in some detail in the following blog.
http://mechanical-sympathy.blogspot.co.uk/2012/08/memory-access-patterns-are-important.html
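The linear-vs-random gap is easy to reproduce on the JVM. A minimal sketch (array size and shuffle are my own choices, not from the post) that sums the same array in index order and then in a shuffled order:

```java
import java.util.Random;

public class AccessPatterns {
    public static void main(String[] args) {
        int n = 1 << 23;                 // 8M ints (~32 MB), well past L3
        int[] data = new int[n];
        int[] order = new int[n];
        for (int i = 0; i < n; i++) order[i] = i;
        // Fisher-Yates shuffle of the visiting order.
        Random r = new Random(42);
        for (int i = n - 1; i > 0; i--) {
            int j = r.nextInt(i + 1);
            int t = order[i]; order[i] = order[j]; order[j] = t;
        }
        long sum = 0;
        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) sum += data[i];          // linear walk
        long t1 = System.nanoTime();
        for (int i = 0; i < n; i++) sum += data[order[i]];   // random walk
        long t2 = System.nanoTime();
        System.out.printf("linear %d ms, random %d ms (sum=%d)%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, sum);
    }
}
```

On most hardware the random walk is several times slower, because the prefetcher and cache lines only pay off for the sequential pass.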
Thanks for sharing. From the Clojure perspective and using reducers, it
On 30/03/14 07:40, Andy C wrote:
Here are the results, where the numbers are normalized gains.
+----------------+--------+--------+
| # of processes | random | linear |
+----------------+--------+--------+
| 1              | 1.00   | 1.00   |
+----------------+--------+--------+
On Sunday, 30 March 2014 06:40:24 UTC+1, Andy C wrote:
Hi,
So this is a follow-up. I claimed that 1 CPU core can saturate the memory
bus, but it turns out I was wrong, at least to some extent. Driven by
curiosity, I decided to do some measurements and test my somewhat older MBP
2.2GHz Intel Core i7. While it obviously all depends on the hardware, I thought
On 20/03/14 04:03, Andy C wrote:
So, the following test puzzles me. Not because it takes virtually the
same time (I know that Fork/Join is not cheap and memory is probably
the biggest bottleneck here). But because I do not get why map (as
opposed to r/map) uses all 8 cores on my MacBook Pro.
into uses reduce under the hood, you probably need fold to have the
computation to run on FJ:
(def a (time (r/foldcat (r/map #(Math/sin (* % %)) l))))
this runs more than 2x as fast on my MacBook Pro
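Outside of reducers, the same reduce-vs-fold split can be sketched in plain Java streams (a rough analogue I wrote, not code from the thread): `sum()` on a sequential stream walks the range on one thread, while the parallel version splits and combines work on the common ForkJoinPool, which is the role `r/fold` plays:

```java
import java.util.stream.LongStream;

public class FoldVsReduce {
    public static void main(String[] args) {
        long n = 1_000_000;
        // Sequential reduce: one thread walks the range (like into/reduce).
        double seq = LongStream.range(0, n)
                .mapToDouble(i -> Math.sin((double) i * i)).sum();
        // Parallel fold: the range is split and combined on the common
        // ForkJoinPool (what r/fold does for reducers).
        double par = LongStream.range(0, n).parallel()
                .mapToDouble(i -> Math.sin((double) i * i)).sum();
        // Same result up to floating-point reassociation.
        System.out.println(Math.abs(seq - par) < 1e-4);
    }
}
```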
Las
On Thu, Mar 20, 2014 at 10:34 AM, François Rey fmj...@gmail.com wrote:
On 20/03/14 12:26, László Török wrote:
into uses reduce under the hood, you probably need fold to have
the computation to run on FJ:
Yes, now I see all my CPUs being maxed-out.
francois@laptop:~$ perf stat java -cp
~/.m2/repository/org/clojure/clojure/1.5.1/clojure-1.5.1.jar
clojure.main -e
I like FSMs, but they do not compose well.
some have argued for generative grammars that generate the fsm,
because it is generally easier to compose grammars, and then generate
the final fsm. iiuc.
--
You received this message because you are subscribed to the Google
Groups Clojure group.
To
Due to the path-copy semantics, the contention gets driven to the root of the
tree.
Out of curiosity, would reference counting (rather than or in addition to
normal GC) help with this? Or is reference counting problematic in a highly
concurrent environment? It seems like reference cycles
So, the following test puzzles me. Not because it takes virtually the same
time (I know that Fork/Join is not cheap and memory is probably the biggest
bottleneck here). But because I do not get why map (as opposed to r/map)
uses all 8 cores on my MacBookPro. All of them seem to be running
On Wed, Mar 19, 2014 at 11:14 AM, Raoul Duke rao...@gmail.com wrote:
I like FSMs, but they do not compose well.
some have argued for generative grammars that generate the fsm,
because it is generally easier to compose grammars, and then generate
the final fsm. iiuc.
I thought about it too
As a co-author of the reactive manifesto I'd like to point out that
reactive can be considered a superset of async. Good reactive
applications are event driven and non-blocking. They are also responsive,
resilient, and scalable which async can help with but does not prescribe.
What are the bad
In my personal experience I cannot get within 10X the throughput, or
latency, of mutable data models when using persistent data models.
Hi Martin,
Thanks for finding this thread :-). Let me ask a reversed question. Given
you come from a persistent data model where code remains
Martin,
You recommend message-passing approaches like Erlang's as generally
superior, but I'm curious if there's any more specific thoughts to the
relative tradeoffs from shared-memory by default vs message-passing, ie,
where you might rely on hardware-level copies (cache coherence) for
I've never heard of imperative model. I'm aware of imperative
programming. Can you expand on what you mean?
I meant mutable data model. Sorry for mixing up terms.
http://blog.codinghorror.com/separating-programming-sheep-from-non-programming-goats/
Hope this helps clarify.
It does.
If it's any consolation, queues or delays are used in hardware to overcome
limitations in hardware, too :-).
Specifically, I'm thinking of anti-jitter stuff in audio circuits.
On Tue, Mar 18, 2014 at 12:45 PM, Andy C andy.coolw...@gmail.com wrote:
I've never heard of imperative model. I'm
2014-03-18 17:45 GMT+01:00 Andy C andy.coolw...@gmail.com:
I've never heard of imperative model. I'm aware of imperative
programming. Can you expand on what you mean?
I meant mutable data model. Sorry for mixing up terms.
some sort of FSM. Perhaps concurrency could be modeled using FSMs, but I do
not believe it is always a simple transition.
http://www.cis.upenn.edu/~stevez/papers/LZ06b.pdf
:-)
On 18/03/14 18:03, Martin Thompson wrote:
Our use of language in the technology industry could, for sure, be
better. Take simple examples like RAM where random should be
arbitrary, or don't get me started on people who misuse the term
agnostic ;-)
I would even say our use of abstractions in
The thing is that our industry is built on layers upon layers of
abstractions, whether at the physical level (integrated circuits,
interfaces, etc.) or at the software level: binary (1GL) abstracted into
assembly (2GL), then the C language (3GL), etc. Virtual machines are now another
you maybe
On Tue, Mar 18, 2014 at 11:06 AM, Raoul Duke rao...@gmail.com wrote:
some sort of FSM. Perhaps concurrency could be modeled using FSMs, but I do
not believe it is always a simple transition.
http://www.cis.upenn.edu/~stevez/papers/LZ06b.pdf
I like FSMs, but they do not compose well.
A.
On 16/03/14 18:24, Softaddicts wrote:
I think that significant optimizations have to be decided at a higher level.
I doubt that any of that can be implemented at the hardware level alone,
letting it decide on the fly. That sounds like magic, too good to be true.
I am also quite convinced that
I would say that guidelines can help, but curiosity is mandatory.
I used to sleep with books on my bedside table about hardware design,
operating system internals, I/O subsystems, networking, ...
I did this for 15 years after I started to work with computers.
Aside from work related stuff of
From what I understand, a single core can easily saturate the memory bus. At
the same time, L2 and L3 caches are tiny compared to gigabytes of main
memory, yet growing them does not necessarily help either, due to larger
latencies. It all limits the number of practical applications which could
In the old days, the principle of locality was prevalent.
Most data allocation was static or, if not, contiguous (the stack).
Code was also contiguous, being most of the time statically compiled,
on several pages. Memory was costly; its size remained reasonable.
Today, data is scattered everywhere in a huge memory,
its location changes frequently,
code can be dynamically compiled,
get loaded by small chunks,
It describes my case actually - I am loading about 2-8GB of stuff into
memory and do tons of mapping, grouping by and filtering. That
I agree with most of his arguments but not all of them.
Memory subsystems have always been a major concern.
For the past 35 years, many designs have gone through simulation before
burning anything onto a chip, especially SMP designs with shared memory,
given the cost of prototyping.
Simulations used typical
On 15/03/14 01:59, Andy C wrote:
Maybe one day this idea http://en.wikipedia.org/wiki/Lisp_machine will
come back, I mean in a new form ..
That reminds me of this Urbit project http://www.urbit.org/, which is
nowhere near usefulness at present, but deserves to be mentioned as a
(radical)
Martin's point about immutable and persistent data structures is further
developed in his interview on infoq
http://www.infoq.com/interviews/reactive-system-design-martin-thompson, you
can skim to point #9 if you're in a hurry.
Overall what he says is that in terms of scalability of the
cough cough erlang cough ahem
He talks about simple things actually.
When you have any sort of immutable data structure and you want to change
it from multiple threads,
you must have a mutable reference which points to the current version
of that data structure.
Now, updates to that mutable reference are fundamentally
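That single mutable reference can be sketched in JVM terms (a toy example under my own assumptions, not Martin's code): every writer copies the structure, then compare-and-sets the one root reference, so all update contention funnels through that single word:

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.List;

public class RootSwap {
    // The current "version" of an immutable structure lives behind one
    // mutable reference; every writer races to swap in its successor.
    static final AtomicReference<List<Integer>> root =
            new AtomicReference<>(List.of());

    static void add(int x) {
        while (true) {
            List<Integer> cur = root.get();
            List<Integer> next = new java.util.ArrayList<>(cur);
            next.add(x);
            // All threads contend on this single compare-and-set:
            // only one swap succeeds per version, the rest retry.
            if (root.compareAndSet(cur, List.copyOf(next))) return;
        }
    }

    public static void main(String[] args) throws Exception {
        Thread[] ts = new Thread[4];
        for (int t = 0; t < 4; t++) {
            final int base = t * 100;
            ts[t] = new Thread(() -> {
                for (int i = 0; i < 100; i++) add(base + i);
            });
            ts[t].start();
        }
        for (Thread t : ts) t.join();
        System.out.println(root.get().size()); // 400: no lost updates
    }
}
```

The CAS retry loop never loses an update, but every successful write invalidates the cache line holding the root, which is exactly the serialization being described.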
On Fri, Mar 14, 2014 at 12:58 PM, Raoul Duke rao...@gmail.com wrote:
cough cough erlang cough ahem
Care to elaborate? :-)
Now his point is that GC acts as a super-GIL which effectively kills all
the hard work done at the language and application design level.
erlang's approach to gc / sharing data / multi process / smp is i
think an interesting sweet spot. there
Shared memory is kind of a degenerate case of message-passing when you
think about cache-coherency protocols and such. I'd argue erlang's
message-passing isn't inherently superior, but kind of obfuscates the issue.
What's the sequential fraction of an arbitrary erlang program, can you even
know
i am probably out of my depth here, i do not have extensive real-world
experience with the various ways to approach parallelism and
concurrency (to be distinguished of course), more run of the mill
stuff. so if i sound like i'm missing your point or am clueless i ask
for your patience :-)
What's
On Fri, Mar 14, 2014 at 1:35 PM, Raoul Duke rao...@gmail.com wrote:
i am probably out of my depth here, i do not have extensive real-world
experience with the various ways to approach parallelism and
concurrency (to be distinguished of course), more run of the mill
stuff. so if i sound like
Everything has tradeoffs. One more example was when Martin explained how
he was able to get a 10x performance boost by pinning a JVM to each socket
in a server, then pinning that JVM's memory to the RAM sockets closest to
that CPU socket. Then he carefully set up shared-memory message passing via
that closely match or can be massaged to match or 'have sympathy' for the
hardware realities. I think this can get lost when we stray too far.
i wish this were somehow more modeled, composed, and controllable up
in our ides and source code, rather than being esoteric tweaky options
in the
Yes, I agree - that was my next thought. I wish we could turn these knobs
within a single JVM.
On Fri, Mar 14, 2014 at 2:20 PM, Raoul Duke rao...@gmail.com wrote:
that closely match or can be massaged to match or 'have sympathy' for the
hardware realities. I think this can get lost when we stray
On Fri 14 Mar 2014 at 02:15:07PM -0400, Gary Trakhman wrote:
Everything has tradeoffs. One more example was when Martin explained how
he was able to get a 10x performance boost by pinning a JVM to each socket
in a server, then pinning that JVM's memory to the RAM sockets closest to
that CPU
Maybe one day this idea http://en.wikipedia.org/wiki/Lisp_machine will come
back, I mean in a new form ..
In any case, I think that it would be great to see some performance
benchmarks for STM
A.
I talked to Martin after a CodeMesh, and had a wonderful discussion with
him about performance from his side of the issue. From his side I mean
super high performance. You have to get a bit of background on some of his
work (at least the work he talks about), much of what he does is high
On Thu, Mar 13, 2014 at 11:24 AM, Timothy Baldridge tbaldri...@gmail.comwrote:
I talked to Martin after a CodeMesh, and had a wonderful discussion with
him about performance from his side of the issue. From his side I mean
super high performance.
[...]
Hi Tim,
Thanks for explaining the