Re: STM and persistent data structures performance on multi-core archs
One could quantify this with category theory. In order to map from one domain to another reversibly, the target domain must have at least as many elements as the source domain. An abstraction which simplifies by reducing the number of elements in play is guaranteed to be leaky. Of course there could be abstractions that work not by reducing elements but by organizing them so they map more efficiently to human thinking. Information theory also has tools that help us quantify when some relation or structure is lost in a translation from a domain to its abstraction.

On Tuesday, March 18, 2014 2:23:46 PM UTC-7, François Rey wrote:

On 18/03/14 18:03, Martin Thompson wrote: Our use of language in the technology industry could, for sure, be better. Take simple examples like RAM where random should be arbitrary, or don't get me started on people who misuse the term agnostic ;-)

I would even say our use of abstractions in the tech industry could be better, most notably because of the law of leaky abstractions (http://en.wikipedia.org/wiki/Leaky_abstraction): abstractions such as RAM, GC, persistent data structures, etc. always have a limit at which the things they're supposed to abstract come back right in your face, and you end up needing to be aware of the abstracted things. I think mechanical sympathy is all about dealing with leaky abstractions. The art of knowing and playing with the limits of our abstractions is what Rich Hickey would call knowing the trade-offs. I don't think the law of leaky abstractions has any formal underpinning, but using the concept of trade-off it's about what we gain from using an abstraction, which we often know very well because that's the whole point of using it, and what we lose, which is often the forgotten and darker side. In those terms an abstraction without leakiness is one that does not lose anything, whether in terms of information or capability. Know of any?
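The cardinality argument above can be written down as a one-line pigeonhole fact (my restatement, not part of the original post):

```latex
% A mapping f from a domain A into an abstraction B is reversible
% (lossless) only if it is injective, which forces the abstraction
% to have at least as many elements as the domain it abstracts:
\[
\exists\, g : B \to A,\; g \circ f = \mathrm{id}_A
\;\Longrightarrow\;
f \text{ is injective}
\;\Longrightarrow\;
|B| \ge |A|
\]
```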
Isn't the whole point of an abstraction to hide certain underlying aspects? Losing something therefore seems to be part of the deal, meaning abstractions are leaky by nature. I think they are, but don't ask me for a proof. The thing is that our industry is based on layers upon layers of abstractions, whether at the physical level (integrated circuits, interfaces, etc.) or at the software level: binary (1GL) abstracted into assembly (2GL), then the C language (3GL), etc. Virtual machines are now another level of abstraction which is probably here to stay, at least in terms of the flexibility offered. So perhaps a key question is how many levels of abstraction we can afford before the whole thing becomes too difficult to manage. I think we even have situations where leakiness at one level makes another level even more leaky, at which point one needs to reconsider the whole stack of abstractions, perhaps from hardware up to programming language and runtime, check the soundness of the trade-offs made at each level, and then provide more than one choice. So in the end it's all about knowing the trade-offs we make at each level of the stack we use. I don't think many of us do, and this is becoming more and more obvious as the stack becomes higher and higher. We don't have to be a mechanic to drive a car. But it helps to know what we can expect from it, so we can better choose other modes of transportation when, for example, we need to travel 1000 km or miles in 4 hours, or travel with 10+ people, or travel where there's no road. End-users of computer programs don't have to know the mechanics. But programmers do if they want to excel at their trade. I don't think we can avoid this: space and time are inescapable dimensions of our material reality and there will always be trade-offs around them. -- You received this message because you are subscribed to the Google Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en --- You received this message because you are subscribed to the Google Groups Clojure group. To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: STM and persistent data structures performance on multi-core archs
This is a slightly different result, as this time I measure elapsed time (see appendix, and excuse the not-so-nice code) as opposed to clock time. Results are similar (unless you have more processes than cores). I am planning to release the code to github soon.

+-----------+------+------+
| # of      | seq  | rdm  |
| processes |      |      |
+-----------+------+------+
|  1        | 1.00 | 1.00 |
|  2        | 1.96 | 1.75 |
|  4        | 3.20 | 1.83 |
|  8        | 3.78 | 1.83 |
| 16        | 3.61 | 1.81 |
| 32        | 3.56 | 1.81 |
+-----------+------+------+

This is great stuff. Let me make sure I read it correctly. Having 2 processes makes a value 1.97 times higher than with 1 core in the random case, and 1.76 times higher in the linear case, but what is that value being measured? Some form of throughput I suppose, and not time, right?

I think you could call that a normalized throughput. Here are more details. The first column is the number of separate O/S test processes running in the background concurrently, started by the shell script at virtually the same time. Then I collected the output, which simply logs how long it takes to iterate through 40MB of memory in a sequential or random manner. The second and third columns are number_of_processes/elapsed_time*elapsed_time_from_first_row_process for sequential and random access respectively. In ideal conditions, elapsed_time should be constant as we use more and more cores/CPUs.

Indeed. It also means single-threaded linear access isn't going to be very much faster if you add more threads. BTW, are you sure the threads were running in parallel on separate cores and not just concurrently on a smaller number of cores? As you said, this should be dependent on hardware, and running this on an actual server machine would be just as interesting.

I wanted to see the worst case, separate processes and memory, which was simplest to implement. Yes, I am sure that cores were utilized by the OS as processes were added. I watched the MBP Activity Monitor and CPU History, which was hitting 100%.
Also I did not optimize the output (should not matter). One interesting thing: I heard somewhere that O/Ss bounce long-lasting CPU-intensive threads between cores in order to equalize the heat generated from the silicon. I did not observe that, but the longest-running test took about 15 sec using a single core. Thx, Andy

Appendix 1:

{
    int n;
    int j;
    struct timeval begin, end;

    gettimeofday(&begin, NULL);
    for (n = 0; n < 100; n++)
        for (j = 0; j < len; j++)
            i_array[rand() % len] = j;
    gettimeofday(&end, NULL);
    printf("[:rdm %s %d %d %f]\n", des, len, m, tdiff(begin, end));
}

{
    int n;
    int j;
    struct timeval begin, end;

    gettimeofday(&begin, NULL);
    for (n = 0; n < 100; n++)
        for (j = 0; j < len; j++)
            i_array[j] = j;
    gettimeofday(&end, NULL);
    printf("[:seq %s %d %d %f]\n", des, len, m, tdiff(begin, end));
}

(des, len, m, i_array and tdiff come from the surrounding test harness, which is not shown.)
Re: STM and persistent data structures performance on multi-core archs
Memory access patterns make a huge difference to memory throughput. I've explored this in some detail in the following blog. http://mechanical-sympathy.blogspot.co.uk/2012/08/memory-access-patterns-are-important.html

Thanks for sharing. From the Clojure perspective and using reducers, it seems we are hitting the worst case. Not sure if it's worth the effort of using it (at least in my applications). Thx, Andy
Re: STM and persistent data structures performance on multi-core archs
On 30/03/14 07:40, Andy C wrote:

Here are results where the numbers are normalized gains.

+----------------+--------+--------+
| # of processes | random | linear |
+----------------+--------+--------+
| 1              | 1.00   | 1.00   |
| 2              | 1.97   | 1.76   |
| 4              | 3.51   | 1.83   |
| 8              | 4.24   | 1.86   |
+----------------+--------+--------+

This is great stuff. Let me make sure I read it correctly. Having 2 processes makes a value 1.97 times higher than with 1 core in the random case, and 1.76 times higher in the linear case, but what is that value being measured? Some form of throughput I suppose, and not time, right?

The conclusion is that in practice two cores can easily saturate memory buses. Accessing memory in certain patterns helps to some extent, although 8 cores is pretty much all that makes sense unless you do tons of in-cache stuff.

Indeed. It also means single-threaded linear access isn't going to be very much faster if you add more threads. BTW, are you sure the threads were running in parallel on separate cores and not just concurrently on a smaller number of cores? As you said, this should be dependent on hardware, and running this on an actual server machine would be just as interesting.
Re: STM and persistent data structures performance on multi-core archs
Memory access patterns make a huge difference to memory throughput. I've explored this in some detail in the following blog. http://mechanical-sympathy.blogspot.co.uk/2012/08/memory-access-patterns-are-important.html

On Sunday, 30 March 2014 06:40:24 UTC+1, Andy C wrote:

Hi, So this is a follow-up. I claimed that 1 CPU core can saturate the memory, but it turns out I was wrong, at least to some extent. Driven by curiosity I decided to do some measurements and test my somewhat older MBP 2.2GHz Intel Core i7. While it obviously all depends on the hardware, I thought it could still be a good test. In order to rule the GC and JVM out of the equation I went back to good old C and wrote a simple program which accesses a 40MB chunk of memory in both linear and random manner. All tests were run a few times to ensure proper warm-up and allocations within the OS; even so, I saw a great deal of consistency. It is not scientific by any means, but it gives a rough idea of what we are dealing with. Here are results where the numbers are normalized gains.

+----------------+--------+--------+
| # of processes | random | linear |
+----------------+--------+--------+
| 1              | 1.00   | 1.00   |
| 2              | 1.97   | 1.76   |
| 4              | 3.51   | 1.83   |
| 8              | 4.24   | 1.86   |
+----------------+--------+--------+

The conclusion is that in practice two cores can easily saturate memory buses. Accessing memory in certain patterns helps to some extent, although 8 cores is pretty much all that makes sense unless you do tons of in-cache stuff. Best, Andy
Re: STM and persistent data structures performance on multi-core archs
Hi, So this is a follow-up. I claimed that 1 CPU core can saturate the memory, but it turns out I was wrong, at least to some extent. Driven by curiosity I decided to do some measurements and test my somewhat older MBP 2.2GHz Intel Core i7. While it obviously all depends on the hardware, I thought it could still be a good test. In order to rule the GC and JVM out of the equation I went back to good old C and wrote a simple program which accesses a 40MB chunk of memory in both linear and random manner. All tests were run a few times to ensure proper warm-up and allocations within the OS; even so, I saw a great deal of consistency. It is not scientific by any means, but it gives a rough idea of what we are dealing with. Here are results where the numbers are normalized gains.

+----------------+--------+--------+
| # of processes | random | linear |
+----------------+--------+--------+
| 1              | 1.00   | 1.00   |
| 2              | 1.97   | 1.76   |
| 4              | 3.51   | 1.83   |
| 8              | 4.24   | 1.86   |
+----------------+--------+--------+

The conclusion is that in practice two cores can easily saturate memory buses. Accessing memory in certain patterns helps to some extent, although 8 cores is pretty much all that makes sense unless you do tons of in-cache stuff. Best, Andy
Re: STM and persistent data structures performance on multi-core archs
On 20/03/14 04:03, Andy C wrote:

So, the following test puzzles me. Not because it takes virtually the same time (I know that Fork/Join is not cheap and memory is probably the biggest bottleneck here), but because I do not get why map (as opposed to r/map) uses all 8 cores on my MacBookPro. All of them seem to be running, according to Activity Monitor, at more or less the same level.

user=> (def l (into [] (range 6000)))
#'user/l
user=> (time (def a (doall (map #(Math/sin (* % %)) l))))
Elapsed time: 19986.18 msecs
user=> (time (def a (doall (into [] (r/map #(Math/sin (* % %)) l)))))
Elapsed time: 18980.583 msecs

I would also expect this code to run on a single CPU at any one time; however, process/thread scheduling can make this thread run on different cores at different times. Depending on the sampling method, the activity monitor may display a different picture from what you would expect. On my linux machine, the cpu history graph shows it's using mostly one cpu at any one time, but it's not always the same cpu. I think the JVM uses all cores by default, and there's no standard way to specify thread affinity (http://stackoverflow.com/questions/2238272/java-thread-affinity).
Re: STM and persistent data structures performance on multi-core archs
into uses reduce under the hood; you probably need fold to have the computation run on FJ:

(def a (time (r/foldcat (r/map #(Math/sin (* % %)) l))))

This runs more than 2x as fast on my macbook pro. Las

On Thu, Mar 20, 2014 at 10:34 AM, François Rey fmj...@gmail.com wrote:

On 20/03/14 04:03, Andy C wrote:

So, the following test puzzles me. Not because it takes virtually the same time (I know that Fork/Join is not cheap and memory is probably the biggest bottleneck here), but because I do not get why map (as opposed to r/map) uses all 8 cores on my MacBookPro. All of them seem to be running, according to Activity Monitor, at more or less the same level.

user=> (def l (into [] (range 6000)))
#'user/l
user=> (time (def a (doall (map #(Math/sin (* % %)) l))))
Elapsed time: 19986.18 msecs
user=> (time (def a (doall (into [] (r/map #(Math/sin (* % %)) l)))))
Elapsed time: 18980.583 msecs

I would also expect this code to run on a single CPU at any one time; however, process/thread scheduling can make this thread run on different cores at different times. Depending on the sampling method, the activity monitor may display a different picture from what you would expect. On my linux machine, the cpu history graph shows it's using mostly one cpu at any one time, but it's not always the same cpu. I think the JVM uses all cores by default, and there's no standard way to specify thread affinity (http://stackoverflow.com/questions/2238272/java-thread-affinity).
-- László Török
Re: STM and persistent data structures performance on multi-core archs
On 20/03/14 12:26, László Török wrote: into uses reduce under the hood; you probably need fold to have the computation run on FJ:

Yes, now I see all my CPUs being maxed out.

francois@laptop:~$ perf stat java -cp ~/.m2/repository/org/clojure/clojure/1.5.1/clojure-1.5.1.jar clojure.main -e "(require '[clojure.core.reducers :as r]) (let [l (into [] (range 1000))] (time (r/fold + (r/map #(Math/sin (* % %)) l))))"
Elapsed time: 1594.89737 msecs
678.1151045769922

Performance counter stats:

    23460.971496    task-clock                #   4.728  CPUs utilized
           7,663    context-switches          #   0.000  M/sec
             570    CPU-migrations            #   0.000  M/sec
         174,586    page-faults               #   0.007  M/sec
  68,361,248,389    cycles                    #   2.914  GHz                      [83.33%]
  44,379,949,020    stalled-cycles-frontend   #  64.92%  frontend cycles idle     [83.21%]
  22,520,594,887    stalled-cycles-backend    #  32.94%  backend cycles idle      [66.75%]
  63,715,541,342    instructions              #    0.93  insns per cycle
                                              #    0.70  stalled cycles per insn  [83.39%]
  11,104,063,057    branches                  # 473.299  M/sec                    [83.25%]
     155,172,776    branch-misses             #   1.40%  of all branches          [83.46%]

     4.962656410 seconds time elapsed

francois@laptop:~$ perf stat java -cp ~/.m2/repository/org/clojure/clojure/1.5.1/clojure-1.5.1.jar clojure.main -e "(require '[clojure.core.reducers :as r]) (let [l (into [] (range 1000))] (time (reduce + (map #(Math/sin (* % %)) l))))"
Elapsed time: 5694.224824 msecs
678.1151045768991

Performance counter stats:

    21365.047020    task-clock                #   2.324  CPUs utilized
           9,443    context-switches          #   0.000  M/sec
             375    CPU-migrations            #   0.000  M/sec
         164,960    page-faults               #   0.008  M/sec
  64,028,578,416    cycles                    #   2.997  GHz                      [83.21%]
  39,016,283,425    stalled-cycles-frontend   #  60.94%  frontend cycles idle     [83.26%]
  21,038,585,232    stalled-cycles-backend    #  32.86%  backend cycles idle      [66.77%]
  63,857,091,972    instructions              #    1.00  insns per cycle
                                              #    0.61  stalled cycles per insn  [83.29%]
  11,161,960,110    branches                  # 522.440  M/sec                    [83.48%]
     138,372,839    branch-misses             #   1.24%  of all branches          [83.28%]

     9.195138860 seconds time elapsed

francois@laptop:~$
Re: STM and persistent data structures performance on multi-core archs
I like FSMs, but they do not compose well. Some have argued for generative grammars that generate the FSM, because it is generally easier to compose grammars and then generate the final FSM, IIUC.
Re: STM and persistent data structures performance on multi-core archs
Due to the path-copy semantics, the contention gets driven to the root of the tree.

Out of curiosity, would reference counting (rather than, or in addition to, normal GC) help with this? Or is reference counting problematic in a highly concurrent environment? It seems like reference cycles would be less of an issue with immutable data. - John
Re: STM and persistent data structures performance on multi-core archs
Due to the path-copy semantics, the contention gets driven to the root of the tree.

Out of curiosity, would reference counting (rather than, or in addition to, normal GC) help with this? Or is reference counting problematic in a highly concurrent environment? It seems like reference cycles would be less of an issue with immutable data.

Reference counting tends to be less efficient than a tracing collector in a concurrent environment. Reference counts must employ atomic updates, which can be relatively expensive, e.g. similar to how AtomicLong.incrementAndGet() works. I've blogged on how atomic increments work. http://mechanical-sympathy.blogspot.co.uk/2011/09/adventures-with-atomiclong.html
Re: STM and persistent data structures performance on multi-core archs
So, the following test puzzles me. Not because it takes virtually the same time (I know that Fork/Join is not cheap and memory is probably the biggest bottleneck here), but because I do not get why map (as opposed to r/map) uses all 8 cores on my MacBookPro. All of them seem to be running, according to Activity Monitor, at more or less the same level.

user=> (def l (into [] (range 6000)))
#'user/l
user=> (time (def a (doall (map #(Math/sin (* % %)) l))))
Elapsed time: 19986.18 msecs
user=> (time (def a (doall (into [] (r/map #(Math/sin (* % %)) l)))))
Elapsed time: 18980.583 msecs
Re: STM and persistent data structures performance on multi-core archs
On Wed, Mar 19, 2014 at 11:14 AM, Raoul Duke rao...@gmail.com wrote: I like FSMs, but they do not compose well. Some have argued for generative grammars that generate the FSM, because it is generally easier to compose grammars and then generate the final FSM, IIUC.

I thought about it too, but composing FSMs out of some patterns really transcends my mind, unfortunately. In other words, I have a hard time seeing patterns in FSMs - they tend to be quite complex in the first place. Sometimes a single transition can completely change the perception and the properties of everything. There is a very good book on the subject: http://www.amazon.com/Practical-Statecharts-Quantum-Programming-Embedded-ebook/dp/B0017UASZO A.
Re: STM and persistent data structures performance on multi-core archs
As a co-author of the reactive manifesto I'd like to point out that reactive can be considered a superset of async. Good reactive applications are event-driven and non-blocking. They are also responsive, resilient, and scalable, which async can help with but does not prescribe. What are the bad connotations? I'm curious to understand other perspectives on this.

On Monday, 17 March 2014 20:22:31 UTC, Andy C wrote:

From what I understand, a single core can easily saturate the memory bus. At the same time, L2 and L3 caches are tiny compared to GBs of main memory, yet growing them does not necessarily help either, due to larger latencies. This all limits the number of practical applications which could really take full advantage of multi-core architectures. Having said that, I just wonder if a typical async Clojure application can even be efficiently executed on a multi-core architecture. In any case, immutability helps in writing reliable multi-execution-context software: there is no need for explicit synchronization in most cases. BTW, I absolutely love the async name itself chosen by the Clojure authors, as opposed to reactive as in http://www.reactivemanifesto.org. Reactive comes with bad connotations and really does not reflect the intentions well. Best, Andy
Re: STM and persistent data structures performance on multi-core archs
In my personal experience I cannot get within 10X the throughput, or latency, of mutable data models when using persistent data models. Hi Martin, Thanks for finding this thread :-). Let me ask a reversed question. Given you come from a persistent data model where code remains reasonably simple. How much effort it really takes to make an imperative model working well with relatively low number of defects? How deep you go to make sure that data structures fit CPU architecture in terms of topology as well as of size of caches? And how it scales in terms of writing the code itself (I mean are code alternations are easy straightforward or you have to write it from scratch)? I've never heard of imperative model. I'm aware of imperative programming. Can you expand on what you mean? I think in the context of this thread 2 points are being conflated here. We started discussing the point about how immutable data (actually persistent data structures) solve the parallel problem. Parallel in the context of increasing thread count as a result of core counts increasing in CPUs, but actually concurrent in access and mutation of data structures in the micro and domain models in the macro. This is being conflated with the performance of data structures in general. I believe that concurrent access to data structures should not be the default design approach. The greatest complexity in any system often comes from concurrent access to state, be the persistent or not. It is better to have private data structures within processing contexts (threads, processes, services, etc.), which communicate via messages. Within these private processing contexts concurrent access is not an issue and thus non-concurrent data structures can be employed. 
With non-concurrent access it is easy to employ rich data structures like basic arrays, open-addressing hash maps, B+ and B* trees, bitsets, bloom filters, etc., without which many applications would be unusable due to performance and memory constraints. When working in the context of single-threaded access to data structures, I see using a blend of functional, set-theory, OO, and imperative programming techniques as the best way to go. Right tool for the job. I see the effort levels as very similar when choosing the appropriate technique that fits a problem. I have many times seen a code mess, and pain, result from inappropriate techniques applied blindly, like religion, to problems they just do not fit.

So to answer your question more directly: if I need concurrent access to a shared data structure, I prefer it to be persistent from a usability perspective, provided the performance constraints of my application are satisfied. If I need greater performance, I find non-persistent data structures can give better performance for a marginal increase in complexity. I've never quantified it, but it feels like percentage points rather than factors. I frame this in the context that the large step in complexity here is choosing to have concurrent access to the data in the first place. It is the elephant in the room. Those who are not good at concurrent programming should just not be in this space, because if they are it will get ugly either way. I'd recommend reading the findings of the "double hump" paper from Brunel University on people's natural aptitude for programming. http://blog.codinghorror.com/separating-programming-sheep-from-non-programming-goats/

Hope this helps clarify.

Martin...
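Martin's single-writer suggestion — private data structures per processing context, coordinated by messages — can be sketched with core.async. This is my own illustration, not code from the thread: a single go-loop owns a plain map, and other threads send it update functions or query requests over a channel, so the map itself never needs synchronization.

```clojure
(require '[clojure.core.async :refer [chan go-loop <! >! >!! <!!]])

;; One logical process owns the state; nobody else ever sees it
;; directly, so a plain (non-concurrent) map suffices.
(defn start-owner []
  (let [in (chan 64)]
    (go-loop [state {}]
      (when-let [msg (<! in)]
        (case (:op msg)
          ;; apply an update function serially, no locks needed
          :update (recur ((:f msg) state))
          ;; answer a query on a private reply channel
          :query  (do (>! (:reply msg) state)
                      (recur state)))))
    in))

(def owner (start-owner))

;; Any thread may enqueue an update or a query:
(>!! owner {:op :update :f #(assoc % :hits 1)})
(let [reply (chan)]
  (>!! owner {:op :query :reply reply})
  (<!! reply))   ;=> {:hits 1}
```

Because both messages travel through the same channel, the query is guaranteed to observe the update: the owner processes them in arrival order.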
Re: STM and persistent data structures performance on multi-core archs
As a co-author of the reactive manifesto I'd like to point out that reactive can be considered a superset of async. Good reactive applications are event driven and non-blocking. [...] What are the bad connotations? I'm curious to understand other perspectives on this.

I would say the opposite. Asynchronous, i.e. not "happening, moving, or existing at the same time" (http://www.learnersdictionary.com/definition/synchronous), does not necessarily imply non-blocking. It simply emphasizes that things might coexist without an explicit dependency (in the time dimension) on each other. At the same time (pun not intended), reactive (http://www.learnersdictionary.com/definition/reactive) limits itself to responding to events, which implies control by some sort of FSM. Perhaps concurrency could be modeled using FSMs, but I do not believe it is always a simple transition.

Andy
Re: STM and persistent data structures performance on multi-core archs
Martin,

You recommend message-passing approaches like Erlang's as generally superior, but I'm curious whether you have more specific thoughts on the relative trade-offs of shared-memory-by-default vs. message-passing, i.e., where in the first you might rely on hardware-level copies (cache coherence) for coordination, while in the second you rely on software-level data copies. Is it just a more conservative approach to allocating resources, by having to go through more manual work to coordinate processes, or is there more to it? I could see that having development cost (time and effort) proportional to the amount of coordination might result in systems that are likely to avoid these bottlenecks where possible. In the case of shared memory there is an inverse relationship, i.e. it takes more work to decouple systems and remove extra synchronization than it does to build them badly.

On Tue, Mar 18, 2014 at 7:03 AM, Martin Thompson mjpt...@gmail.com wrote: In my personal experience I cannot get within 10X of the throughput, or latency, of mutable data models when using persistent data models. [...]
Re: STM and persistent data structures performance on multi-core archs
I've never heard of an "imperative model". I'm aware of imperative programming. Can you expand on what you mean?

I meant a mutable data model. Sorry for mixing up terms.

http://blog.codinghorror.com/separating-programming-sheep-from-non-programming-goats/ Hope this helps clarify.

It does. I absolutely agree, and get the point that the persistent-data-structures-plus-multiple-cores/threads approach does not solve much here, and in fact probably makes things worse. And indeed it is better to queue updates from multiple threads and have them handled by a single mutator thread. But at the same time I hate to have my imagination limited by hardware designs. That really hurts :-).

Best regards,
Andy
Re: STM and persistent data structures performance on multi-core archs
If it's any consolation, queues and delays are used in hardware to overcome limitations in hardware, too :-). Specifically, I'm thinking of the anti-jitter circuitry in audio hardware.

On Tue, Mar 18, 2014 at 12:45 PM, Andy C andy.coolw...@gmail.com wrote: But at the same time I hate to have my imagination limited by hardware designs. That really hurts :-). [...]
Re: STM and persistent data structures performance on multi-core archs
I would say the opposite. Asynchronous, i.e. not "happening, moving, or existing at the same time", does not necessarily imply non-blocking. [...] Perhaps concurrency could be modeled using FSMs, but I do not believe it is always a simple transition.

Our use of language in the technology industry could, for sure, be better. Take simple examples like RAM, where "random" should really be "arbitrary"; or don't get me started on people who misuse the term "agnostic" ;-) Synchronous is now understood to mean making a call in a blocking fashion, with an optional return value. What is actually missing is the implied word that follows, i.e. "communication". We normally mean synchronous communication or asynchronous communication, but the "communication" part gets dropped.

Good traits/qualities for critical systems are to be responsive, resilient, and scalable, which is at the heart of the reactive manifesto. Many techniques can be employed to achieve this. One such technique that has yet to be bettered, in my experience, is to be event driven and non-blocking. This is very much how the real world functions. We all have our own private models of the world that we share via message passing, or events, depending on your chosen nomenclature.
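The async-versus-non-blocking distinction being debated here is easy to see in core.async itself (my own sketch, not from the thread): the same put operation is asynchronous with respect to the consumer either way, but `>!!` parks a real OS thread while `>!` inside a go block parks only a lightweight process.

```clojure
(require '[clojure.core.async :refer [chan go >! <!!]])

;; An unbuffered channel: every put must rendezvous with a take.
(def events (chan))

;; Asynchronous AND non-blocking: the go block parks until a consumer
;; arrives, but the calling OS thread is free to continue immediately.
(go (>! events :tick))

;; Had we used (>!! events :tick) above instead, the put would be just
;; as asynchronous from the consumer's point of view, yet it would
;; block this OS thread until a take happened on some other thread.
(<!! events)   ;=> :tick
```

So "async" describes the decoupling in time between producer and consumer, while "non-blocking" describes what the producer's thread is doing in the meantime — exactly the gap both sides of this exchange are pointing at.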
As an aside I believe FSMs are a very useful tool that more developers should have in their toolbox. If they are observable in implementation then even better. They have been totally invaluable whenever I've done any work on highly-available systems.
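As a sketch of the "observable FSM" idea (my illustration; the states and events are made up): keeping the transition table as plain data means the machine's current state and legal moves can be inspected, logged, or property-tested like any other value.

```clojure
;; The whole FSM is a map from [state event] to the next state —
;; nothing hidden, trivially observable.
(def transitions
  {[:idle    :start] :running
   [:running :pause] :paused
   [:paused  :start] :running
   [:running :stop]  :idle})

(defn step
  "Advance the machine by one event; illegal moves fail loudly."
  [state event]
  (or (transitions [state event])
      (throw (ex-info "illegal transition" {:state state :event event}))))

;; Driving the machine is just a reduce over the event stream:
(reduce step :idle [:start :pause :start :stop])   ;=> :idle
```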
Re: STM and persistent data structures performance on multi-core archs
2014-03-18 17:45 GMT+01:00 Andy C andy.coolw...@gmail.com: And indeed it is better to queue updates from multiple threads and have them handled by a single mutator thread. But at the same time I hate to have my imagination limited by hardware designs. That really hurts :-). [...]

Reminds me of the (abandoned?) work by Rich on Pods...
Re: STM and persistent data structures performance on multi-core archs
some sort of FSM. Perhaps concurrency could be modeled using FSMs, but I do not believe it is always a simple transition.

http://www.cis.upenn.edu/~stevez/papers/LZ06b.pdf :-)
Re: STM and persistent data structures performance on multi-core archs
On 18/03/14 18:03, Martin Thompson wrote: Our use of language in the technology industry could, for sure, be better. Take simple examples like RAM, where "random" should really be "arbitrary"; or don't get me started on people who misuse the term "agnostic" ;-)

I would even say our use of abstractions in the tech industry could be better, most notably because of the law of leaky abstractions (http://en.wikipedia.org/wiki/Leaky_abstraction): abstractions such as RAM, GC, persistent data structures, etc. always have a limit at which the things they are supposed to abstract away come right back in your face, and you end up needing to be aware of the abstracted things. I think mechanical sympathy is all about dealing with leaky abstractions. The art of knowing and playing with the limits of our abstractions is what Rich Hickey would call knowing the trade-offs. I don't think the law of leaky abstractions has any formal underpinning, but in terms of trade-offs it's about what we gain from using an abstraction, which we often know very well because that's the whole point of using it, and what we lose, which is often the forgotten and darker side. In those terms, an abstraction without leakiness is one that does not lose anything, whether in terms of information or capability. Know of any? Isn't the whole point of an abstraction to hide certain underlying aspects? Losing something therefore seems to be part of the deal, meaning abstractions are leaky by nature. I think they are, but don't ask me for a proof.

The thing is that our industry is built on layers upon layers of abstractions, whether at the physical level (integrated circuits, interfaces, etc.) or at the software level: binary (1GL) abstracted into assembly (2GL), then the C language (3GL), etc. Virtual machines are now another level of abstraction, one which is probably here to stay, at least for the flexibility it offers.

So perhaps a key question is how many levels of abstraction we can afford before the whole thing becomes too difficult to manage. I think we even have situations where leakiness at one level makes another level even more leaky, at which point one needs to reconsider the whole stack of abstractions, perhaps from the hardware up to the programming language and runtime, check the soundness of the trade-offs made at each level, and then provide more than one choice. So in the end it's all about knowing the trade-offs we make at each level of the stack we use. I don't think many of us do, and this is becoming more and more obvious as the stack grows higher and higher.

We don't have to be mechanics to drive a car. But it helps to know what we can expect from one, so we can better choose another mode of transportation when, for example, we need to travel 1000 km in 4 hours, or travel with 10+ people, or travel where there's no road. End users of computer programs don't have to know the mechanics. But programmers do if they want to excel at their trade. I don't think we can avoid this: space and time are inescapable dimensions of our material reality, and there will always be trade-offs around them.
Re: STM and persistent data structures performance on multi-core archs
The thing is that our industry is built on layers upon layers of abstractions, whether at the physical level (integrated circuits, interfaces, etc.) or at the software level: binary (1GL) abstracted into assembly (2GL), then the C language (3GL), etc. Virtual machines are now another [...]

Maybe you forgot microcode, too? ;-) ;-) Yes, there are a lot of abstractions! I day-dream that it would be super keen if there were a way for us to have explicit models of them, including their leaks! And if those models were composable. And then if our tools could use those models to show us warnings and errors; to generate stress tests in a guided-randomized QuickCheck style; etc. etc. etc.
Re: STM and persistent data structures performance on multi-core archs
On Tue, Mar 18, 2014 at 11:06 AM, Raoul Duke rao...@gmail.com wrote: http://www.cis.upenn.edu/~stevez/papers/LZ06b.pdf

I like FSMs, but they do not compose well.

A.
Re: STM and persistent data structures performance on multi-core archs
On 16/03/14 18:24, Softaddicts wrote:

I think that significant optimizations have to be decided at a higher level. I doubt that any of that can be implemented at the hardware level alone, letting it decide on the fly. That sounds like magic, too good to be true. I am also quite convinced that optimizing hardware for single-threaded and for multi-threaded processing are antagonistic goals, given current hardware designs. Martin hits a number of significant nails: we need to be aware of hardware limitations, and we need to measure the impacts of our choices and change them within some reachable goals. However, achieving 10% or less CPU idle time on a specific server architecture is not, to me, a goal in itself. I'm interested in the constraints I have to meet (business and physical ones), and in playing within a large playground to meet them, whether that involves more powerful or specially designed hardware, or better software designs.

I hear you well. Whether further progress comes from hardware and/or software doesn't matter; I just hope progress is being made somewhere, and that we won't have to fiddle with hundreds of parameters as is the case for the JVM right now. Until then, tedious experimentation will be the only way to ensure scaling does happen. Sure, Martin's case at LMAX was extreme; not many people will have such concerns. But we're in an age where the need for scalability can arrive very suddenly (e.g. a popular phone app). With servers nowadays that can have many CPUs and cores, and many GB of RAM, one can easily hope that vertical scaling will be sufficient, but it's often not as simple as that. Vertical can even be more dangerous than horizontal, since in the horizontal case you probably architect for scaling right from the start. So what works and what doesn't is valuable information, and initiatives such as the Reactive Manifesto (http://www.reactivemanifesto.org/) are helpful because they provide guidelines.
Re: STM and persistent data structures performance on multi-core archs
I would say that guidelines can help, but curiosity is mandatory. I used to keep books about hardware design, operating system internals, I/O subsystems, networking, ... on my bedside table. I did this for 15 years after I started working with computers. Aside from work-related stuff, of course; this was to increase my general knowledge and to be more relaxed on the job :) Knowing your enemy is essential...

Luc P.
Re: STM and persistent data structures performance on multi-core archs
From what I understand, a single core can easily saturate the memory bus. At the same time, L2 and L3 caches are small compared to the gigabytes of memory in modern systems, yet growing them does not necessarily help either, due to larger latencies. All of this limits the number of practical applications which could really take full advantage of multi-core architectures. Having said that, I just wonder if a typical async Clojure application can even run efficiently on a multi-core architecture. In any case, immutability helps with writing reliable multi-execution-context software: there is no need for explicit synchronization in most cases.

BTW, I absolutely love the async name itself chosen by the Clojure authors, as opposed to reactive as in http://www.reactivemanifesto.org. Reactive comes with bad connotations and really does not reflect the intentions well.

Best,
Andy
Re: STM and persistent data structures performance on multi-core archs
In the old days, the principle of locality was prevalent. Most data allocation was static or, if not, contiguous (the stack). Code was also contiguous, being most of the time statically compiled onto a few pages. Memory was costly; its size remained reasonable. Today, data is scattered everywhere in a huge memory, its location changes frequently, code can be dynamically compiled and loaded in small chunks... Hardware-wise, designers do what they can to deal with this, but I think it's a losing battle. We need a quantum leap at the hardware level to get faster random access to memory. This will probably shake the software stack a lot, which brings other issues... Meanwhile we are stuck with finding the least painful path to get our stuff to run asap. Luc P.
Re: STM and persistent data structures performance on multi-core archs
Today, data is scattered everywhere in a huge memory, its location changes frequently, code can be dynamically compiled, get loaded by small chunks...

That describes my case, actually: I am loading about 2-8 GB of stuff into memory and doing tons of mapping, grouping-by, and filtering. That works reasonably well now but will not scale up for long. A.
Re: STM and persistent data structures performance on multi-core archs
In my personal experience I cannot get within 10X the throughput, or latency, of mutable data models when using persistent data models.

Hi Martin, thanks for finding this thread :-). Let me ask the reverse question. Given that you come from a persistent data model where code remains reasonably simple: how much effort does it really take to make an imperative model work well, with a relatively low number of defects? How deep do you go to make sure that data structures fit the CPU architecture, in terms of topology as well as cache sizes? And how does it scale in terms of writing the code itself (I mean, are code alterations easy and straightforward, or do you have to rewrite from scratch)? I do not mean to troll, just sincere curiosity ... Andy
Re: STM and persistent data structures performance on multi-core archs
I agree with most of his arguments, but not all of them. Memory subsystems have always been a major concern. For 35 years now, many designs have gone through simulation before anything was burned on a chip, especially SMP designs with shared memory, given the cost of prototyping. Simulations used typical workloads to identify which strategies should be implemented in hardware. Shared memory subsystems proved to be the main bottleneck every time. Any choice of strategies could result in bottlenecks if the payload differed somewhat from the average simulated ones. Threads appeared much later, and this alone makes you wonder about the pressure increase on shared memory subsystems. The numbers Martin claims as speed increases (myth #x, CPUs are not getting faster) are simply better/worse hardware strategy choices which decreased idle time. Manufacturers are doing the same job as decades ago: simulating and finding how to optimize typical workloads as much as possible. We cannot break the physical barriers anymore; we need a technology upgrade (atom level, quantum level, whatever). But such a change is no more elegant than a brute-force approach: it will reshape performance and create a new barrier farther away.

Having tuned generated machine code when domestic-appliance-sized computers were the norm, I would not like to go back to that age of analyzing the hardware to twist my software accordingly. Relying on hardware architectures to craft software is not, to me, a path with a bright future. Server hardware changes often enough that such an approach would mess up designs and be unsustainable in the long run. We may end up someday with tunable hardware that adapts to workloads; by tunable I mean giving the hardware instructions on how to behave. This would be better than the above. We had such options with traditional compilers. I think that significant optimizations have to be decided at a higher level.
I doubt that any of that can be implemented at the hardware level alone, letting it decide on the fly. This sounds like magic, too good to be true. I am also quite convinced that optimizing hardware for single-threaded and multi-threaded processing are antagonistic goals, given current hardware designs. Martin hits a number of significant nails: we need to be aware of hardware limitations, and we need to measure the impacts of our choices and change them within some reachable goals. However, achieving 10% or less CPU idle time on a specific server architecture is not, to me, a goal. I'm interested in the constraints I have to meet (business and physical ones) and in playing within a large playground to meet them, whether that involves using more powerful or specially designed hardware, or better software designs. Luc P.
Re: STM and persistent data structures performance on multi-core archs
On 15/03/14 01:59, Andy C wrote: Maybe one day this idea http://en.wikipedia.org/wiki/Lisp_machine will come back, I mean in a new form ...

That reminds me of the Urbit project http://www.urbit.org/, which is nowhere near usefulness at present, but deserves to be mentioned as a (radical) experiment striving to go back to the roots (ideally hardware) in order to break free from the inconsistencies of having grown the mechanical and semantic models separately. FoNC http://www.vpri.org/fonc_wiki/index.php/Idst is another example.
Re: STM and persistent data structures performance on multi-core archs
Martin's point about immutable and persistent data structures is further developed in his interview on InfoQ http://www.infoq.com/interviews/reactive-system-design-martin-thompson; you can skip to point #9 if you're in a hurry. Overall, what he says is that in terms of scalability of the development activity, immutability and persistence are great ideas, since we don't have to deal with non-deterministic behaviour any more. But when one needs to scale the running system, meaning increasing the rate at which the persistent data structure is updated, these can lead to performance issues in various ways:
- longer GC pauses, because persistence increases the number of objects that are neither very short-lived nor long-lived,
- contention, because the root of the tree of the persistent data structure becomes the focal point of concurrency,
- increased CPU cache misses, since persistent data structures are trees that increasingly span larger non-contiguous and non-sequential parts of memory.
Of these, the last point is probably the most painful, since there's no way to deal with it unless one reconsiders the whole persistent data structure. In other words, increasing the number of threads and cores may eventually lower throughput, because the time spent dealing with these issues (GC pauses, locking, cache misses) grows larger than the time spent on useful computation. I can't back any of this up with actual data and experience. However, I think this old thread about poor performance on multicore https://groups.google.com/forum/#%21topic/clojure/48W2eff3caU does provide a clear picture of the problem, which becomes even clearer with actual stats showing CPUs were 83% idle https://groups.google.com/d/msg/clojure/48W2eff3caU/FBFQp2vrWFgJ, i.e. waiting for memory.
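The cache-miss point in the list above can be demonstrated in isolation. The following is my own sketch (not from the thread; class and field names are arbitrary): it sums the same values from a flat array and from heap-allocated nodes linked in shuffled order. The pointer chase roughly approximates how a tree's nodes end up scattered across the heap after many updates, and it typically runs many times slower than the flat scan.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class Main {
    // One boxed value reached only through a pointer, like a tree leaf.
    static final class Node { int v; Node next; Node(int v) { this.v = v; } }

    public static void main(String[] args) {
        int n = 1 << 20;
        int[] flat = new int[n];
        List<Node> nodes = new ArrayList<>(n);
        for (int i = 0; i < n; i++) { flat[i] = 1; nodes.add(new Node(1)); }

        // Shuffle the traversal order, then link the nodes in that order:
        // each hop in the walk now jumps to an unrelated heap address.
        Collections.shuffle(nodes);
        for (int i = 0; i < n - 1; i++) nodes.get(i).next = nodes.get(i + 1);

        long t0 = System.nanoTime();
        long a = 0;
        for (int i = 0; i < n; i++) a += flat[i];
        long tFlat = System.nanoTime() - t0;

        t0 = System.nanoTime();
        long b = 0;
        for (Node p = nodes.get(0); p != null; p = p.next) b += p.v;
        long tChase = System.nanoTime() - t0;

        System.out.println(a == b);          // identical result: true
        System.out.println("flat ns: " + tFlat + "  chase ns: " + tChase);
    }
}
```

Real persistent structures amortize this with wide branching (32-way tries) and structural sharing, but the underlying cost of non-contiguous memory is exactly this.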
Also, one should view Martin's other videos on InfoQ http://www.infoq.com/author/Martin-Thompson to get a better understanding of his arguments. He's actually quite positive about Clojure in general. It's just that, depending on the scalability and performance requirements, persistent data structures may not provide a satisfactory answer and could even lower throughput.
Re: STM and persistent data structures performance on multi-core archs
cough cough erlang cough ahem
Re: STM and persistent data structures performance on multi-core archs
He talks about simple things, actually. When you have any sort of immutable data structure and you want to change it from multiple threads, you must have a mutable reference which points to the current version of that data structure. Now, updates to that mutable reference are fundamentally serial. Whatever synchronization strategy you choose, be it optimistic updates (atom), queuing (agent), or locks, you will inevitably have contention with a large number of threads. And when you run into that, you will have a hundred ways to solve the problem. There is nothing magical about persistent data structures on multi-core machines :)

On Thursday, March 13, 2014 at 20:58:54 UTC+4, Andy C wrote: Hi, So the other day I came across this presentation: http://www.infoq.com/presentations/top-10-performance-myths The guy seems to be smart and to know what he talks about, however when at 0:22:35 he touches on the performance (or lack thereof) of persistent data structures on multi-core machines, I feel puzzled. He seems to have a point but really does not back it up with any details. There is also a claim that STM does not cooperate well with GC. Is it true? Thanks, Andy
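The serial-update point can be made concrete with a small sketch. This is my own illustration, not from the thread, with Java standing in for Clojure's atom: a mutable AtomicReference points at an immutable value, and every update is a compare-and-set retry loop, so all writers ultimately serialize on that one reference no matter how many cores you have.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

public class Main {
    // The one mutable reference to the "current version" of the data.
    static final AtomicReference<List<Integer>> root =
        new AtomicReference<>(List.of());

    // Atom-style swap!: read, build a new immutable value, CAS, retry on loss.
    static void swapConj(int x) {
        while (true) {
            List<Integer> cur = root.get();
            // Full copy for simplicity; real persistent structures share
            // most of their nodes instead of copying everything.
            List<Integer> next = new ArrayList<>(cur);
            next.add(x);
            if (root.compareAndSet(cur, List.copyOf(next))) return;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] ts = new Thread[4];
        for (int t = 0; t < 4; t++) {
            ts[t] = new Thread(() -> { for (int i = 0; i < 1000; i++) swapConj(i); });
            ts[t].start();
        }
        for (Thread t : ts) t.join();
        System.out.println(root.get().size());   // no lost updates: 4000
    }
}
```

Every failed compareAndSet is wasted work plus a cache-coherency round trip on the line holding the reference, which is exactly the contention being described.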
Re: STM and persistent data structures performance on multi-core archs
On Fri, Mar 14, 2014 at 12:58 PM, Raoul Duke rao...@gmail.com wrote: cough cough erlang cough ahem

Care to elaborate? :-)
Re: STM and persistent data structures performance on multi-core archs
cough cough erlang cough ahem

Care to elaborate? :-)

Now, his point is that GC acts as a super-GIL which effectively kills all the hard work done at the language and application design level.

erlang's approach to gc / sharing data / multi process / smp is i think an interesting sweet spot. there sure as heck is not a GIL effect from the GC.
Re: STM and persistent data structures performance on multi-core archs
Shared memory is kind of a degenerate case of message passing when you think about cache-coherency protocols and such. I'd argue erlang's message passing isn't inherently superior, but kind of obfuscates the issue. What's the sequential fraction of an arbitrary erlang program, can you even know? (I don't know erlang, so I'm honestly asking.) Shared memory is pretty darn convenient, and we don't have hundred-core+ boxes yet.
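The question about the sequential fraction matters because of Amdahl's law: whatever fraction of a program is serial bounds the speedup any number of cores can deliver. A quick worked calculation (my own addition, not from the thread):

```java
public class Main {
    // Amdahl's law: with sequential fraction s, the best possible speedup
    // on n cores is 1 / (s + (1 - s) / n).
    static double speedup(double s, int n) {
        return 1.0 / (s + (1.0 - s) / n);
    }

    public static void main(String[] args) {
        // Even a 10% sequential fraction caps 16 cores at 6.4x ...
        System.out.println(speedup(0.10, 16));
        // ... and caps the limit, as n grows without bound, at 10x.
        System.out.println(Math.round(speedup(0.10, 1_000_000)));
    }
}
```

So whether the serial part hides in a GC pause, a contended root reference, or an erlang process mailbox, knowing its size is what tells you whether more cores can help at all.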
Re: STM and persistent data structures performance on multi-core archs
i am probably out of my depth here, i do not have extensive real-world experience with the various ways to approach parallelism and concurrency (to be distinguished of course), more run of the mill stuff. so if i sound like i'm missing your point or am clueless i ask for your patience :-) What's the sequential fraction of an arbitrary erlang program, can you even know (I don't know erlang, so I'm honestly asking)? who cares? or rather, each person has to only care about their own program situation. maybe their stuff fits erlang. maybe it fits better with something else e.g. LMAX. it. all. depends. :-) everything depends on context. Martin's talk even included the part where he bemoaned that people don't just stay single-threaded and fix their crappy code first. running to concurrency and parallelism is often a cop-out the way i hear him. that could be seen as arguing 'against' erlang. so there are places where your program is mostly sequential and things like does the GC act like a GIL do not matter as much as the situation where you are trying to be more concurrent + parallel but not distributed. in those sequential situations, maybe erlang becomes a square peg for the round hole. (although i personally, through suffering as a maintenance programmer, am a *huge* lover of the recursive single assignment turn based approach to things, and i love clojure's idea of having a consistent view of the world; most OO people shoot me in the foot a year after they've left the company, with their crappy macaroni code.) Shared memory pretty darn convenient, and we don't have hundred-core+ boxes yet. i'm confused in that i thought you wrote shared memory ~= message passing. so why talk about shared memory when that is a lie? just like Martin said, RAM is a lie. why not realize everything is really message passing in the long run, model it as such, understand the issues as such? i do not have a chip on my shoulder about this, i'm just sounding it out / exploring the thought. 
sincerely.
Re: STM and persistent data structures performance on multi-core archs
On Fri, Mar 14, 2014 at 1:35 PM, Raoul Duke rao...@gmail.com wrote: why not realize everything is really message passing in the long run, model it as such, understand the issues as such?

Well, yes, it's a lie, it's an abstraction; we love those as programmers :-). The question is: is it worth the trouble, and under what conditions? IMO, we don't understand message passing well enough to be great at programming or characterizing distributed concurrent systems. I wonder how this changes when you're writing erlang. I would guess that certain classes of programs are really easy to write in erlang compared to other languages, and I wonder at what cost? Clojure appealed to me because I was studying computer architecture in grad school, and I think it has a great answer for shared-memory concurrency below a hundred or so cores, but at some point I acknowledge that might not be good enough. I'm very interested in Netkernel's approach, actually: http://www.1060research.com/products/ It sort of marries the notion of immutable data with distribution and message passing. URLs as pointers.
Re: STM and persistent data structures performance on multi-core archs
Everything has tradeoffs. One more example was when Martin explained how he was able to get a 10x performance boost by pinning a JVM to each socket in a server, then pinning that JVM's memory to the RAM sockets closest to that CPU socket. Then he carefully setup shared memory message passing via burning one core on each CPU to monitor the queues, and moving messages only in the directions directly supported by the Hyper Transport bus. Once again, that's amazing...once again I've never needed that to ship a product. This is absolutely fascinating to me. The ability to do this sort of thing is a tradeoff from using abstractions (threads, OS threads, Java threads) that closely match or can be massaged to match or 'have sympathy' for the hardware realities. I think this can get lost when we stray too far. On Fri, Mar 14, 2014 at 1:46 PM, Gary Trakhman gary.trakh...@gmail.comwrote: On Fri, Mar 14, 2014 at 1:35 PM, Raoul Duke rao...@gmail.com wrote: i am probably out of my depth here, i do not have extensive real-world experience with the various ways to approach parallelism and concurrency (to be distinguished of course), more run of the mill stuff. so if i sound like i'm missing your point or am clueless i ask for your patience :-) What's the sequential fraction of an arbitrary erlang program, can you even know (I don't know erlang, so I'm honestly asking)? who cares? or rather, each person has to only care about their own program situation. maybe their stuff fits erlang. maybe it fits better with something else e.g. LMAX. it. all. depends. :-) everything depends on context. Martin's talk even included the part where he bemoaned that people don't just stay single-threaded and fix their crappy code first. running to concurrency and parallelism is often a cop-out the way i hear him. that could be seen as arguing 'against' erlang. 
so there are places where your program is mostly sequential, and things like "does the GC act like a GIL" do not matter as much as in the situation where you are trying to be more concurrent + parallel but not distributed. in those sequential situations, maybe erlang becomes a square peg for the round hole. (although i personally, through suffering as a maintenance programmer, am a *huge* lover of the recursive single-assignment turn-based approach to things, and i love clojure's idea of having a consistent view of the world; most OO people shoot me in the foot a year after they've left the company, with their crappy macaroni code.)

Shared memory is pretty darn convenient, and we don't have hundred-core+ boxes yet.

i'm confused, in that i thought you wrote shared memory ~= message passing. so why talk about shared memory when that is a lie? just like Martin said, RAM is a lie. why not realize everything is really message passing in the long run, model it as such, understand the issues as such? i do not have a chip on my shoulder about this, i'm just sounding it out / exploring the thought.

Well, yes, it's a lie, it's an abstraction; we love those as programmers :-). The question is, is it worth the trouble, and under what conditions? IMO, we don't understand message passing well enough to be great at programming or characterizing distributed concurrent systems. I wonder how this changes when you're writing erlang. I would guess that certain classes of programs are really easy to write in erlang compared to other languages, and I wonder at what cost? Clojure appealed to me because I was studying computer architecture in grad school, and I think it has a great answer for shared-memory concurrency below 100 cores, but at some point I acknowledge that might not be good enough.

I'm very interested in Netkernel's approach, actually: http://www.1060research.com/products/ It sort of marries the notion of immutable data with distribution and message passing. URLs as pointers.

sincerely.
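Raoul's question about the "sequential fraction" of a program has a classical quantitative handle: Amdahl's law, speedup = 1 / (s + (1 - s)/n) for sequential fraction s and n cores. A quick sketch (the fractions and core counts below are illustrative, not taken from anyone's measurements):

```java
// Amdahl's law: even a small sequential fraction caps parallel speedup.
// The numbers used here are illustrative, not measured from any real system.
public class Amdahl {
    static double speedup(double sequentialFraction, int cores) {
        return 1.0 / (sequentialFraction + (1.0 - sequentialFraction) / cores);
    }

    public static void main(String[] args) {
        // With 5% sequential work, 64 cores give well under 20x speedup...
        System.out.printf("s=0.05, n=64  -> %.2fx%n", speedup(0.05, 64));
        // ...because the limit as cores grow without bound is 1/s = 20x.
        System.out.printf("s=0.05, limit -> %.2fx%n", 1.0 / 0.05);
    }
}
```

The point being: whether a GC pause "acts like a GIL" matters exactly in proportion to how much of your program's time it turns into sequential fraction.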
Re: STM and persistent data structures performance on multi-core archs
that closely match, can be massaged to match, or 'have sympathy' for the hardware realities. I think this can get lost when we stray too far.

i wish this were somehow more modeled, composed, and controllable up in our IDEs and source code, rather than being esoteric tweaky options in the various inscrutable layers. i want a futuristic JIT that does this kind of stuff for me :-) :-)
Re: STM and persistent data structures performance on multi-core archs
Yes, I agree; that was my next thought. I wish we could tune these knobs in a single JVM.

On Fri, Mar 14, 2014 at 2:20 PM, Raoul Duke rao...@gmail.com wrote:

that closely match, can be massaged to match, or 'have sympathy' for the hardware realities. I think this can get lost when we stray too far.

i wish this were somehow more modeled, composed, and controllable up in our IDEs and source code, rather than being esoteric tweaky options in the various inscrutable layers. i want a futuristic JIT that does this kind of stuff for me :-) :-)
Re: STM and persistent data structures performance on multi-core archs
On Fri 14 Mar 2014 at 02:15:07PM -0400, Gary Trakhman wrote:

Everything has tradeoffs. One more example was when Martin explained how he was able to get a 10x performance boost by pinning a JVM to each socket in a server, then pinning that JVM's memory to the RAM closest to that CPU socket. Then he carefully set up shared-memory message passing by burning one core on each CPU to monitor the queues, moving messages only in the directions directly supported by the HyperTransport bus. Once again, that's amazing... and once again, I've never needed that to ship a product. This is absolutely fascinating to me. The ability to do this sort of thing is a tradeoff from using abstractions (threads, OS threads, Java threads) that closely match, can be massaged to match, or 'have sympathy' for the hardware realities. I think this can get lost when we stray too far.

cf. the C10M project: http://c10m.robertgraham.com/p/blog-page.html

tl;dr The kernel gets in the way of extremely high throughput, so if that's what you want, don't lean on the kernel to do your packet routing, thread scheduling, etc.

guns
Re: STM and persistent data structures performance on multi-core archs
Maybe one day this idea http://en.wikipedia.org/wiki/Lisp_machine will come back, I mean in a new form... In any case, I think that it would be great to see some performance benchmarks for STM.

A.
Re: STM and persistent data structures performance on multi-core archs
I talked to Martin after CodeMesh and had a wonderful discussion with him about performance from his side of the issue. By his side, I mean super high performance. You have to have a bit of background on some of his work (at least the work he talks about): much of what he does is high-throughput trading systems. In these systems you will often see small messages (50 bytes or so), and the goal is to process them as quickly as possible. This means that almost everything is secondary to throughput.

So let's take that generalization and apply it to the more common work I, or people like me, do... here we care mostly about programmer productivity and making simpler systems. Here immutability and STM-like concepts (I'm lumping atoms in with STM here) work pretty well. At one point in our conversation Martin mentioned that he was able to get about 100,000 messages through his system a second. That's awesome! How often do you have 100k messages in your system?

Everything has tradeoffs. One more example was when Martin explained how he was able to get a 10x performance boost by pinning a JVM to each socket in a server, then pinning that JVM's memory to the RAM closest to that CPU socket. Then he carefully set up shared-memory message passing by burning one core on each CPU to monitor the queues, moving messages only in the directions directly supported by the HyperTransport bus. Once again, that's amazing... and once again, I've never needed that to ship a product.

Sadly, other parts of that talk are very true. Many programmers I've talked to don't understand the difference between a hash map and a list, let alone when they would use each. So I'll fight that battle, and Martin can fight his, and I'll apply his work to my problems as the solution fits.

So that's the way I view it. I don't claim to know much at all about Martin's work; I just know enough to know that I don't deal with those problems.
Most servers I work on are idle at least 50% of the time. And when they do get overloaded, I can just start up more machines.

Timothy

On Thu, Mar 13, 2014 at 10:58 AM, Andy C andy.coolw...@gmail.com wrote:

Hi,

So the other day I came across this presentation: http://www.infoq.com/presentations/top-10-performance-myths The guy seems to be smart and to know what he talks about, however when at 0:22:35 he touches on the performance (or lack thereof) of persistent data structures on multi-core machines, I feel puzzled. He seems to have a point but really does not back it up with any details. There is also a claim that STM does not cooperate well with GC. Is it true?

Thanks,
Andy

--
One of the main causes of the fall of the Roman Empire was that, lacking zero, they had no way to indicate successful termination of their C programs. (Robert Firth)
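The "burn one core on each CPU to monitor the queues" technique Martin describes comes down to busy-spinning instead of blocking: the consumer trades a whole core for latency. A minimal single-producer/single-consumer sketch of that idea in plain Java (my illustration, not Martin's actual code; a production version such as the LMAX Disruptor also pads its counters against false sharing):

```java
import java.util.concurrent.atomic.AtomicLong;

// A minimal busy-spin SPSC ring buffer: neither offer() nor take() ever
// blocks in the kernel; a full or empty queue is handled by spinning.
// Capacity must be a power of two so index wrapping is a cheap mask.
public class SpinQueue {
    private final long[] buffer;
    private final int mask;
    private final AtomicLong head = new AtomicLong(); // next slot to read
    private final AtomicLong tail = new AtomicLong(); // next slot to write

    public SpinQueue(int capacityPow2) {
        buffer = new long[capacityPow2];
        mask = capacityPow2 - 1;
    }

    public void offer(long msg) {               // single producer only
        long t = tail.get();
        while (t - head.get() >= buffer.length) Thread.onSpinWait(); // full
        buffer[(int) (t & mask)] = msg;
        tail.set(t + 1);                        // volatile store publishes the slot
    }

    public long take() {                        // single consumer only
        long h = head.get();
        while (tail.get() == h) Thread.onSpinWait(); // empty: burn the core
        long msg = buffer[(int) (h & mask)];
        head.set(h + 1);
        return msg;
    }

    public static void main(String[] args) {
        SpinQueue q = new SpinQueue(8);
        for (long i = 1; i <= 3; i++) q.offer(i);
        for (int i = 0; i < 3; i++) System.out.println(q.take());
    }
}
```

`Thread.onSpinWait()` (Java 9+) hints to the CPU that the loop is a spin-wait; the point of the pattern is that a core dedicated to polling never pays a scheduler wakeup on the message path.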
Re: STM and persistent data structures performance on multi-core archs
On Thu, Mar 13, 2014 at 11:24 AM, Timothy Baldridge tbaldri...@gmail.com wrote:

I talked to Martin after CodeMesh and had a wonderful discussion with him about performance from his side of the issue. By his side, I mean super high performance. [...]

Hi Tim,

Thanks for explaining the context of Martin's work - I did not know that it was so advanced. I did some research back in the day on the http://en.wikipedia.org/wiki/Connection_Machine (via a 56 kb/s telnet connection from Eastern Europe to the US - that was super cool back then) and can only attest to what he says in the presentation. You can achieve amazing levels of performance provided that the patterns of data flow and control in the software exactly match the parallel nature of the given architecture.

However, his point is somewhat more serious. He directly attacks the premise that using immutable data is inherently multi-core friendly, a premise which comes from the deduction that writing reactive software reduces the use of locks and guards and hence enables more flow and less waiting. His point is that the GC acts as a super-GIL which effectively kills all the hard work done at the language and application design level.

Now, I wish Martin ate his own dog food and, as he pointed out numerous times in the presentation one should, backed up his points with real experiments and data. At least it was not apparent whether his conclusions were purely theoretical or grounded in some experience.

I am in the process of transitioning from Scala to the Clojure ecosystem (just finished the SICP videos and have some hard 4clojure problems behind me, but a lot left to learn), so I'm not yet fluent in all aspects of the language, but I think at some point I will try to write some simulations of STM performance following some of Martin's intuitions. I think that it will be very cool.

Best,
Andy
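A crude starting point for the kind of simulation Andy describes might look like the following: N threads update one shared value through a compare-and-set retry loop (the primitive behind Clojure's atoms rather than full STM), counting retries as a rough proxy for contention. The thread and iteration counts are arbitrary; this is a sketch, not a real benchmark harness.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

// N threads hammer a single shared immutable value via CAS, the way
// Clojure's swap! does. Each failed CAS is a retry: wasted work that
// grows with contention, loosely analogous to STM transaction retries.
public class ContentionSim {
    public static long run(int threads, int iterations) {
        AtomicReference<Long> state = new AtomicReference<>(0L);
        AtomicLong retries = new AtomicLong();
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int n = 0; n < iterations; n++) {
                    while (true) {                    // CAS retry loop
                        Long cur = state.get();
                        if (state.compareAndSet(cur, cur + 1)) break;
                        retries.incrementAndGet();    // lost the race; retry
                    }
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            try { t.join(); } catch (InterruptedException e) { throw new RuntimeException(e); }
        }
        // Invariant: every increment lands exactly once despite the races.
        if (state.get() != (long) threads * iterations)
            throw new AssertionError("lost update");
        return retries.get();
    }

    public static void main(String[] args) {
        System.out.println("retries, 4 threads x 100k: " + run(4, 100_000));
    }
}
```

With one thread the retry count is always zero; adding threads makes it climb, which is the shape of the curve any serious STM benchmark would need to characterize (alongside the GC pressure from the discarded intermediate values, which is Martin's actual complaint).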