Re: multi-core software
Matthias Blume wrote: Jeff M. mass...@gmail.com writes: But, assuming that your program works and does what it's supposed to, I agree with Jon that performance needs to be right near the top of the list of concerns. Why? Performance isn't about looking good as a programmer, or having fun making a function run in 15 cycles instead of 24, or coming up with some neat bit packing scheme so that your app now only uses 20K instead of 200K. Performance is - pure and simple - about one thing only: money. Programmer time is vastly more expensive than CPU time, so the money argument often leads to slow (low performance) solutions as long as they are good enough because developing a faster solution would mean spending more valuable programmer time at a cost that cannot be recovered over the life cycle of the product in question. In the context of commercial software, the money to fund developers to improve performance comes from the huge marketing budget because performance is usually more about marketing than anything else. -- Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?u -- http://mail.python.org/mailman/listinfo/python-list
Re: multi-core software
Arved Sandstrom wrote: Jon Harrop wrote: Arved Sandstrom wrote: Jon Harrop wrote: No. Concurrent programming is about interleaving computations in order to reduce latency. Nothing to do with parallelism. Jon, I do concurrent programming all the time, as do most of my peers. Way down on the list of why we do it is the reduction of latency. What is higher on the list? Correctness. I'm not being facetious. I write applications that run on application servers, and from time to time I have had to write various special purpose servers. This kind of programming is all about managing concurrent execution of computations. The overarching concern is reliability and correct function. For many corporate situations, even with hundreds of users, the actual load at any instant is low enough that the various servers involved are nowhere close to being stressed out - performance is a secondary issue. In other words, without concurrency the latency would be so high that you would consider the program to be wrong. However you cut it, the real reason is latency. -- Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?u -- http://mail.python.org/mailman/listinfo/python-list
Re: multi-core software
On Jun 9, 9:08 pm, Arved Sandstrom dces...@hotmail.com wrote: Jon Harrop wrote: Arved Sandstrom wrote: Jon, I do concurrent programming all the time, as do most of my peers. Way down on the list of why we do it is the reduction of latency. What is higher on the list? Correctness. IMO, that response is a bit of a cop-out. Correctness is _always_ most important, no matter what application you are creating; without it, you don't have a job and the company you work for goes out of business. But, assuming that your program works and does what it's supposed to, I agree with Jon that performance needs to be right near the top of the list of concerns. Why? Performance isn't about looking good as a programmer, or having fun making a function run in 15 cycles instead of 24, or coming up with some neat bit packing scheme so that your app now only uses 20K instead of 200K. Performance is - pure and simple - about one thing only: money. Programs that use more memory require more money for the hardware of every user. Programs that run slower eat more time per day. If you have 100,000 users, all doing an operation once per day that takes 20 seconds, being able to shave 5 seconds off that saves 5.78 man-days of work. Hell, for some applications, that 20 seconds is just startup time spent at a splash screen. Just imagine if every Google search took even 5 seconds to resolve, how much time would be wasted every day around the world - ignoring the fact that Google wouldn't exist if that were the case ;-). Obviously Google engineers work incredibly hard every day to ensure correct results, but performance better be right up there at the top of the list as well. Jeff M. -- http://mail.python.org/mailman/listinfo/python-list
Re: multi-core software
Jeff M. mass...@gmail.com writes: On Jun 9, 9:08 pm, Arved Sandstrom dces...@hotmail.com wrote: Jon Harrop wrote: Arved Sandstrom wrote: Jon, I do concurrent programming all the time, as do most of my peers. Way down on the list of why we do it is the reduction of latency. What is higher on the list? Correctness. IMO, that response is a bit of a cop-out. Correctness is _always_ most important, no matter what application you are creating; without it, you don't have a job and the company you work for goes out of business. But, assuming that your program works and does what it's supposed to, I agree with Jon that performance needs to be right near the top of the list of concerns. Why? Performance isn't about looking good as a programmer, or having fun making a function run in 15 cycles instead of 24, or coming up with some neat bit packing scheme so that your app now only uses 20K instead of 200K. Performance is - pure and simple - about one thing only: money. Programmer time is vastly more expensive than CPU time, so the money argument often leads to slow (low performance) solutions as long as they are good enough because developing a faster solution would mean spending more valuable programmer time at a cost that cannot be recovered over the life cycle of the product in question. That being said, there are plenty of situations where performance obviously does matter a great deal -- as you correctly pointed out. (It all depends on the above mentioned product in question and the nature of its life cycle.) Matthias -- http://mail.python.org/mailman/listinfo/python-list
Re: multi-core software
Jon Harrop j...@ffconsultancy.com writes: I'm not being facetious. I write applications that run on application servers, and from time to time I have had to write various special purpose servers. This kind of programming is all about managing concurrent execution of computations. The overarching concern is reliability and correct function. For many corporate situations, even with hundreds of users, the actual load at any instant is low enough that the various servers involved are nowhere close to being stressed out - performance is a secondary issue. In other words, without concurrency the latency would be so high that you would consider the program to be wrong. However you cut it, the real reason is latency. I don't think that follows, if there is two-way communication and dependency between the servers, combined with lack of control over when any particular server decides to initiate an outgoing request. Stuff may have to happen concurrently to avoid complete deadlock. -- http://mail.python.org/mailman/listinfo/python-list
Re: multi-core software
On Jun 10, 12:49 pm, Seamus MacRae smacrae...@live.ca.invalid wrote: Jeff M. wrote: On Jun 9, 9:08 pm, Arved Sandstrom dces...@hotmail.com wrote: Jon Harrop wrote: Arved Sandstrom wrote: Jon, I do concurrent programming all the time, as do most of my peers. Way down on the list of why we do it is the reduction of latency. What is higher on the list? Correctness. IMO, that response is a bit of a cop-out. Correctness is _always_ most important, no matter what application you are creating; without it, you don't have a job and the company you work for goes out of business. And when, exactly, did Microsoft go out of business? I hadn't heard it mentioned in the news. :) Touche. :) Jeff M. -- http://mail.python.org/mailman/listinfo/python-list
Re: multi-core software
Jeff M. wrote: On Jun 9, 9:08 pm, Arved Sandstrom dces...@hotmail.com wrote: Jon Harrop wrote: Arved Sandstrom wrote: Jon, I do concurrent programming all the time, as do most of my peers. Way down on the list of why we do it is the reduction of latency. What is higher on the list? Correctness. IMO, that response is a bit of a cop-out. Correctness is _always_ most important, no matter what application you are creating; without it, you don't have a job and the company you work for goes out of business. And when, exactly, did Microsoft go out of business? I hadn't heard it mentioned in the news. :) -- http://mail.python.org/mailman/listinfo/python-list
Re: multi-core software
Jeff M. wrote: On Jun 9, 9:08 pm, Arved Sandstrom dces...@hotmail.com wrote: Jon Harrop wrote: Arved Sandstrom wrote: Jon, I do concurrent programming all the time, as do most of my peers. Way down on the list of why we do it is the reduction of latency. What is higher on the list? Correctness. IMO, that response is a bit of a cop-out. Correctness is _always_ most important, no matter what application you are creating; without it, you don't have a job and the company you work for goes out of business. PC / Video Games definitely fall out of the correctness. As long as the game does not crash your XBOX/PS3/Whatever for certain amount of time, and behaves well then, it's fine. Bugs are already part of the genre. In reality you can't ship on time, there are always BUGS :) Most important thing in games is (at least for large percent of them) speed of graphics - fluid 60fps, or stable 30fps. But, assuming that your program works and does what it's supposed to, I agree with Jon that performance needs to be right near the top of the list of concerns. Why? Performance isn't about looking good as a programmer, or having fun making a function run in 15 cycles instead of 24, or coming up with some neat bit packing scheme so that your app now only uses 20K instead of 200K. Performance is - pure and simple - about one thing only: money. Programs that use more memory require more money for the hardware of every user. Programs that run slower eat more time per day. If you have 100,000 users, all doing an operation once per day that takes 20 seconds, being able to shave 5 seconds off that saves 5.78 man-days of work. Hell, for some applications, that 20 seconds is just startup time spent at a splash screen. Just imagine if every Google search took even 5 seconds to resolve, how much time would be wasted every day around the world - ignoring the fact that Google wouldn't exist if that were the case ;-). Obviously Google engineers work incredibly hard every day to ensure correct results, but performance better be right up there at the top of the list as well. Jeff M. -- http://mail.python.org/mailman/listinfo/python-list
Re: multi-core software
Jon Harrop wrote: Arved Sandstrom wrote: Jon Harrop wrote: Arved Sandstrom wrote: Jon Harrop wrote: No. Concurrent programming is about interleaving computations in order to reduce latency. Nothing to do with parallelism. Jon, I do concurrent programming all the time, as do most of my peers. Way down on the list of why we do it is the reduction of latency. What is higher on the list? Correctness. I'm not being facetious. I write applications that run on application servers, and from time to time I have had to write various special purpose servers. This kind of programming is all about managing concurrent execution of computations. The overarching concern is reliability and correct function. For many corporate situations, even with hundreds of users, the actual load at any instant is low enough that the various servers involved are nowhere close to being stressed out - performance is a secondary issue. In other words, without concurrency the latency would be so high that you would consider the program to be wrong. However you cut it, the real reason is latency. For a certain group of applications and user loads I would concede that point, yes. For quite a few other situations, you could queue up user requests and execute them in order, finishing each before proceeding to the next, and users wouldn't even notice. I wrote a J2SE server a few months ago, to solve a very specific problem associated with an application running on a J2EE server, that could handle dozens of users per second using this strategy. It didn't write it that way because doing so is perverse, but I could have. AHS -- http://mail.python.org/mailman/listinfo/python-list
Re: multi-core software
Jeff M. wrote: On Jun 9, 9:08 pm, Arved Sandstrom dces...@hotmail.com wrote: Jon Harrop wrote: Arved Sandstrom wrote: Jon, I do concurrent programming all the time, as do most of my peers. Way down on the list of why we do it is the reduction of latency. What is higher on the list? Correctness. IMO, that response is a bit of a cop-out. Correctness is _always_ most important, no matter what application you are creating; without it, you don't have a job and the company you work for goes out of business. But, assuming that your program works and does what it's supposed to, I agree with Jon that performance needs to be right near the top of the list of concerns. Why? Performance isn't about looking good as a programmer, or having fun making a function run in 15 cycles instead of 24, or coming up with some neat bit packing scheme so that your app now only uses 20K instead of 200K. Performance is - pure and simple - about one thing only: money. Programs that use more memory require more money for the hardware of every user. Programs that run slower eat more time per day. If you have 100,000 users, all doing an operation once per day that takes 20 seconds, being able to shave 5 seconds off that saves 5.78 man-days of work. Hell, for some applications, that 20 seconds is just startup time spent at a splash screen. Just imagine if every Google search took even 5 seconds to resolve, how much time would be wasted every day around the world - ignoring the fact that Google wouldn't exist if that were the case ;-). Obviously Google engineers work incredibly hard every day to ensure correct results, but performance better be right up there at the top of the list as well. Jeff M. Point taken, but I primarily work on internal government and corporate applications. Might be hundreds or thousands of users at any given time, but not typically tens or hundreds of thousands. Hundreds or thousands of users translates to at least an order of magnitude less simultaneous users. Ops people that I talk to who monitor apps I've worked on, or similar apps, rarely report any such where the application itself is presenting a sluggish aspect to users because it's stressing out processors, or consuming gargantuan memory. Typically when latency problems come up, it's because the app has to talk to other servers - databases, LDAP, print servers etc. In other words, the network is coming into play. In fact when we do work on performance, it's typically about minimizing calls to the database or other services. AHS -- http://mail.python.org/mailman/listinfo/python-list
Re: multi-core software
Seamus MacRae smacrae...@live.ca.invalid (SM) wrote: SM Piet van Oostrum wrote: By the way, there is a series of articles about concurrency on ACM Queue which may be interesting for those participating in or just following this discussion: http://queue.acm.org/listing.cfm?item_topic=Concurrencyqc_type=theme_listfilter=Concurrencypage_title=Concurrency Here is one introductory paragraph from one of the articles: Parallel programming poses many new challenges to the developer, one of which is synchronizing concurrent access to shared memory by multiple threads. Programmers have traditionally used locks for synchronization, but lock-based synchronization has well-known pitfalls. Simplistic coarse-grained locking does not scale well, while more sophisticated fine-grained locking risks introducing deadlocks and data races. Furthermore, scalable libraries written using fine-grained locks cannot be easily composed in a way that retains scalability and avoids deadlock and data races. SM Is that the one about transactional memory? Yes -- Piet van Oostrum p...@cs.uu.nl URL: http://pietvanoostrum.com [PGP 8DAE142BE17999C4] Private email: p...@vanoostrum.org -- http://mail.python.org/mailman/listinfo/python-list
Re: multi-core software
On Jun 7, 2:41 pm, Jon Harrop j...@ffconsultancy.com wrote: Arved Sandstrom wrote: Jon Harrop wrote: I see no problem with mutable shared state. In which case, Jon, you're in a small minority. No. Most programmers still care about performance Frequently when they shouldn't. and performance means mutable state. Hm, not sure Erlangers would wholly agree. -- Dr Jon D Harrop, Flying Frog Consultancy Ltd.http://www.ffconsultancy.com/?u -- http://mail.python.org/mailman/listinfo/python-list
Re: multi-core software
toby wrote: On Jun 7, 2:41 pm, Jon Harrop j...@ffconsultancy.com wrote: Arved Sandstrom wrote: Jon Harrop wrote: I see no problem with mutable shared state. In which case, Jon, you're in a small minority. No. Most programmers still care about performance Frequently when they shouldn't. I disagree. A lot of software is still far too slow because the programmers failed to pay attention to performance. Blogspot in Firefox being one example: it can barely keep up with my typing! and performance means mutable state. Hm, not sure Erlangers would wholly agree. Erlang is about concurrency. This thread is about parallelism. -- Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?u -- http://mail.python.org/mailman/listinfo/python-list
Re: multi-core software
On Tue, 9 Jun 2009 10:47:11 -0700 (PDT), toby t...@telegraphics.com.au wrote: On Jun 7, 2:41 pm, Jon Harrop j...@ffconsultancy.com wrote: Arved Sandstrom wrote: Jon Harrop wrote: performance means mutable state. Hm, not sure Erlangers would wholly agree. Erlang uses quite a bit of mutable state behind the scenes ... the programmers just don't see it. George -- http://mail.python.org/mailman/listinfo/python-list
Re: multi-core software
Erlang uses quite a bit of mutable state behind the scenes ... the programmers just don't see it. George Heh... The CPUs use quite a bit of mutable state behind the scenes ... the programmers just don't see it. Actually with CPU they see it more, than... say Erlang (that's why you need to use fences/barriers/locked access here and there). -- http://mail.python.org/mailman/listinfo/python-list
Re: multi-core software
On 6/9/2009 11:59 AM Jon Harrop said... toby wrote: On Jun 7, 2:41 pm, Jon Harrop j...@ffconsultancy.com wrote: snip No. Most programmers still care about performance Frequently when they shouldn't. I disagree. A lot of software is still far too slow because the programmers failed to pay attention to performance. For a properly written spec, performance is spec'd and paid for. Blogspot in Firefox being one example: it can barely keep up with my typing! There are so many ways that's not the program. Performance,accuracy,cost -- pick two sacrifice one. Emile -- http://mail.python.org/mailman/listinfo/python-list
Re: multi-core software
Jon Harrop wrote: Arved Sandstrom wrote: Jon Harrop wrote: Arved Sandstrom wrote: Lew wrote: Interesting distinction. Would it be fair to compare concurrent programming to the bricks used to build the parallel program's edifice? Way too much of a fine distinction. While they are in fact different, the point of concurrent programming is to structure programs as a group of computations, which can be executed in parallel (however that might actually be done depending on how many processors there are). No. Concurrent programming is about interleaving computations in order to reduce latency. Nothing to do with parallelism. Jon, I do concurrent programming all the time, as do most of my peers. Way down on the list of why we do it is the reduction of latency. What is higher on the list? Correctness. I'm not being facetious. I write applications that run on application servers, and from time to time I have had to write various special purpose servers. This kind of programming is all about managing concurrent execution of computations. The overarching concern is reliability and correct function. For many corporate situations, even with hundreds of users, the actual load at any instant is low enough that the various servers involved are nowhere close to being stressed out - performance is a secondary issue. AHS -- http://mail.python.org/mailman/listinfo/python-list
Re: multi-core software
By the way, there is a series of articles about concurrency on ACM Queue which may be interesting for those participating in or just following this discussion: http://queue.acm.org/listing.cfm?item_topic=Concurrencyqc_type=theme_listfilter=Concurrencypage_title=Concurrency Here is one introductory paragraph from one of the articles: Parallel programming poses many new challenges to the developer, one of which is synchronizing concurrent access to shared memory by multiple threads. Programmers have traditionally used locks for synchronization, but lock-based synchronization has well-known pitfalls. Simplistic coarse-grained locking does not scale well, while more sophisticated fine-grained locking risks introducing deadlocks and data races. Furthermore, scalable libraries written using fine-grained locks cannot be easily composed in a way that retains scalability and avoids deadlock and data races. -- Piet van Oostrum p...@cs.uu.nl URL: http://pietvanoostrum.com [PGP 8DAE142BE17999C4] Private email: p...@vanoostrum.org -- http://mail.python.org/mailman/listinfo/python-list
Re: multi-core software
rossb...@mpi-sws.org wrote: On Jun 8, 6:28 am, Ken T. nowh...@home.com wrote: Let's not forget Elite for the 6502 exploiting predictable performance in order to switch graphics modes partway down the vsync! That actually didn't require predictable timing. You could tell the video chip to send you an interrupt when it got to a given scan line. I used this myself. I don't know what Elite did, but I know for sure that it was a common trick on the Atari ST to switch color palettes or graphics mode at a fixed point *in each single scan line* to get more colors, or display graphics on the screen borders. That required synchronous programming, i.e. counting clock cycles of machine instructions such that for every point in the program you knew exactly where the electron ray would be. The Atari ST had an M68000 with exactly 8 MHz, which made this possible. There were no caches in those times, and clock cycles were entirely predictable. The usual trick for these machines was an exact multiple of the NTSC color clock, which was approx 3.58 MHz. The 8-bit atari video games and home computers all used this technique, as did the C-64/128. 68000-based machines (such as the ST and the Amiga) could not only exploit that synchrony, they could also (this was the days before memory wall) exploit the fact that a 680x0 typically accessed memory only once every 4 clock cycles to do DMA from the same memory when the CPU wasn't using it. High display resolutions would lock the processor out of RAM except during blanking intervals. (Talk about contention and hot spots.) Figuring out how to reuse resources most effectively was pretty much the same as the register-allocation problem for compilers, and was sometimes solved using the same kinds of graph-coloring algorithms... -- http://mail.python.org/mailman/listinfo/python-list
Re: multi-core software
Piet van Oostrum wrote: By the way, there is a series of articles about concurrency on ACM Queue which may be interesting for those participating in or just following this discussion: http://queue.acm.org/listing.cfm?item_topic=Concurrencyqc_type=theme_listfilter=Concurrencypage_title=Concurrency Here is one introductory paragraph from one of the articles: Parallel programming poses many new challenges to the developer, one of which is synchronizing concurrent access to shared memory by multiple threads. Programmers have traditionally used locks for synchronization, but lock-based synchronization has well-known pitfalls. Simplistic coarse-grained locking does not scale well, while more sophisticated fine-grained locking risks introducing deadlocks and data races. Furthermore, scalable libraries written using fine-grained locks cannot be easily composed in a way that retains scalability and avoids deadlock and data races. Is that the one about transactional memory? -- http://mail.python.org/mailman/listinfo/python-list
Re: multi-core software
On Jun 8, 6:28 am, Ken T. nowh...@home.com wrote: Let's not forget Elite for the 6502 exploiting predictable performance in order to switch graphics modes partway down the vsync! That actually didn't require predictable timing. You could tell the video chip to send you an interrupt when it got to a given scan line. I used this myself. I don't know what Elite did, but I know for sure that it was a common trick on the Atari ST to switch color palettes or graphics mode at a fixed point *in each single scan line* to get more colors, or display graphics on the screen borders. That required synchronous programming, i.e. counting clock cycles of machine instructions such that for every point in the program you knew exactly where the electron ray would be. The Atari ST had an M68000 with exactly 8 MHz, which made this possible. There were no caches in those times, and clock cycles were entirely predictable. -- http://mail.python.org/mailman/listinfo/python-list
Re: multi-core software
On Jun 7, 1:56 am, Paul Rubin http://phr...@nospam.invalid wrote: Jeff M. mass...@gmail.com writes: Even the lightest weight user space (green) threads need a few hundred instructions, minimum, to amortize the cost of context switching There's always a context switch. It's just whether or not you are switching in/out a virtual stack and registers for the context or the hardware stack/registers. I don't see the hundreds of instructions in that case. http://shootout.alioth.debian.org/u32q/benchmark.php?test=threadring;... shows GHC doing 50 million lightweight thread switches in 8.47 seconds, passing a token around a thread ring. Almost all of that is probably spent acquiring and releasing the token's lock as the token is passed from one thread to another. That simply doesn't leave time for hundreds of instructions per switch. Who said there has to be? Sample code below (just to get the point across): struct context { vir_reg pc, sp, bp, ... ; object* stack; // ... context* next; }; struct vm { context* active_context; }; void switch_context(vm* v) { // maybe GC v-active_context before switching v-active_context = v-active_context-next; } Also, there isn't hundreds of instructions with multiplexing, either. It's all done in hardware. Take a look at the disassembly for any application: one that uses native threads on a platform that supports preemption. You won't see any instructions anywhere in the program that perform a context switch. If you did that would be absolutely horrible. Imagine if the compiler did something like this: while(1) { // whatever } do_context_switch_here(); That would suck. ;-) That's not to imply that there isn't a cost; there's always a cost. The example above just goes to show that for green threads, the cost [of the switch] can be reduced down to a single pointer assignment. Jeff M. -- http://mail.python.org/mailman/listinfo/python-list
Re: multi-core software
Roedy Green wrote: On Fri, 5 Jun 2009 18:15:00 + (UTC), Kaz Kylheku kkylh...@gmail.com wrote, quoted or indirectly quoted someone who said : Even for problems where it appears trivial, there can be hidden issues, like false cache coherency communication where no actual sharing is taking place. Or locks that appear to have low contention and negligible performance impact on ``only'' 8 processors suddenly turn into bottlenecks. Then there is NUMA. A given address in memory may be RAM attached to the processor accessing it, or to another processor, with very different access costs. Could what you are saying be summed up by saying, The more threads you have the more important it is to keep your threads independent, sharing as little data as possible. I see no problem with mutable shared state. -- Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?u -- http://mail.python.org/mailman/listinfo/python-list
Re: multi-core software
George Neuner wrote: On Fri, 05 Jun 2009 16:26:37 -0700, Roedy Green see_webs...@mindprod.com.invalid wrote: On Fri, 5 Jun 2009 18:15:00 + (UTC), Kaz Kylheku kkylh...@gmail.com wrote, quoted or indirectly quoted someone who said : Even for problems where it appears trivial, there can be hidden issues, like false cache coherency communication where no actual sharing is taking place. Or locks that appear to have low contention and negligible performance impact on ``only'' 8 processors suddenly turn into bottlenecks. Then there is NUMA. A given address in memory may be RAM attached to the processor accessing it, or to another processor, with very different access costs. Could what you are saying be summed up by saying, The more threads you have the more important it is to keep your threads independent, sharing as little data as possible. And therein lies the problem of leveraging many cores. There is a lot of potential parallelism in programs (even in Java :) that is lost because it is too fine a grain for threads. That will always be true so it conveys no useful information to the practitioner. Even the lightest weight user space (green) threads need a few hundred instructions, minimum, to amortize the cost of context switching. Work items in Cilk are much faster than that. Add to that the fact that programmers have shown themselves, on average, to be remarkably bad at figuring out what _should_ be done in parallel - as opposed to what _can_ be done - and you've got a clear indicator that threads, as we know them, are not scalable except under a limited set of conditions. Parallelism is inherently not scalable. I see no merit in speculating about the ramifications of average programmers alleged inabilities. -- Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?u -- http://mail.python.org/mailman/listinfo/python-list
Re: multi-core software
Scott David Daniels wrote: the nub of the problem is not on the benchmarks. There is something to be said for the good old daays when you looked up the instruction timings that you used in a little document for your machine, and could know the cost of any loop. We are faster now, but part of the cost of that speed is that timing is a black art. Those good old days never existed. Those manuals never accounted for things that affected timing even then, like memory latency or refresh time. SRAM cache made things worse, since the published timings never mentioned cache-miss delays. Though memory cache might seem a recent innovation, it's been around a while. It would be challenging to find any published timing since the commercialization of computers that would actually tell the cost of any loop. Things got worse when chips like the '86 family acquired multiple instructions for doing loops, still worse when pre-fetch pipelines became deeper and wider, absolutely Dark Art due to multi-level memory caches becoming universal, and throw-your-hands-up-and-leave-for-the-corner-bar with multiprocessor NUMA systems. OSes and high-level languages complicate the matter - you never know how much time slice you'll get or how your source got compiled or optimized by run-time. So the good old days are a matter of degree and self-deception - it was easier to fool ourselves then that we could at least guess timings proportionately if not absolutely, but things definitely get more unpredictable over evolution. -- Lew -- http://mail.python.org/mailman/listinfo/python-list
Re: multi-core software
Jon Harrop wrote: Roedy Green wrote: On Fri, 5 Jun 2009 18:15:00 + (UTC), Kaz Kylheku kkylh...@gmail.com wrote, quoted or indirectly quoted someone who said : Even for problems where it appears trivial, there can be hidden issues, like false cache coherency communication where no actual sharing is taking place. Or locks that appear to have low contention and negligible performance impact on ``only'' 8 processors suddenly turn into bottlenecks. Then there is NUMA. A given address in memory may be RAM attached to the processor accessing it, or to another processor, with very different access costs. Could what you are saying be summed up by saying, The more threads you have the more important it is to keep your threads independent, sharing as little data as possible. I see no problem with mutable shared state. In which case, Jon, you're in a small minority. AHS -- http://mail.python.org/mailman/listinfo/python-list
Re: multi-core software
Arved Sandstrom wrote: Jon Harrop wrote: I see no problem with mutable shared state. In which case, Jon, you're in a small minority. No. Most programmers still care about performance and performance means mutable state. -- Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?u -- http://mail.python.org/mailman/listinfo/python-list
Re: multi-core software
Jon Harrop wrote: No. Most programmers still care about performance and performance means mutable state. [ Citation needed ]. Most programmers I've met could care less about performance. -- Beware of bugs in the above code; I have only proved it correct, not tried it. -- Donald E. Knuth -- http://mail.python.org/mailman/listinfo/python-list
Re: multi-core software
Jon Harrop wrote: Arved Sandstrom wrote: Jon Harrop wrote: I see no problem with mutable shared state. In which case, Jon, you're in a small minority. No. Most programmers still care about performance and performance means mutable state. I don't see why that would affect whether one thinks there are problems. In my opinion, shared mutable state has a lot of problems. It is also sometimes the best design for performance reasons. Patricia -- http://mail.python.org/mailman/listinfo/python-list
Re: multi-core software
Arved Sandstrom wrote: Jon Harrop wrote: Arved Sandstrom wrote: Jon Harrop wrote: I see no problem with mutable shared state. In which case, Jon, you're in a small minority. No. Most programmers still care about performance and performance means mutable state. Quite apart from performance and mutable state, I believe we were talking about mutable _shared_ state. And this is something that gets a _lot_ of people into trouble. Nonsense. Scientists have been writing parallel programs for decades using shared state extensively without whining about it. Databases are mutable shared state but millions of database programmers solve real problems every day without whining about it. Use your common sense and you can write efficient parallel programs today with little difficulty. I do. -- Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?u -- http://mail.python.org/mailman/listinfo/python-list
Re: multi-core software
Joshua Cranmer wrote: Jon Harrop wrote: No. Most programmers still care about performance and performance means mutable state. [ Citation needed ]. Most programmers I've met could care less about performance. Then they have no need for parallelism in the first place. -- Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?u -- http://mail.python.org/mailman/listinfo/python-list
Re: multi-core software
On Jun 7, 3:19 pm, Arved Sandstrom dces...@hotmail.com wrote: Jon Harrop wrote: Arved Sandstrom wrote: Jon Harrop wrote: I see no problem with mutable shared state. In which case, Jon, you're in a small minority. No. Most programmers still care about performance and performance means mutable state. Quite apart from performance and mutable state, I believe we were talking about mutable _shared_ state. And this is something that gets a _lot_ of people into trouble. Mutable shared state gets _bad_ (err.. perhaps inexperienced would be a better adjective) programmers - who don't know what they are doing - in trouble. There are many problem domains that either benefit greatly from mutable shared states or can't [easily] be done without them. Unified memory management being an obvious example... there are many more. Unshared state has its place. Immutable state has its place. Shared immutable state has its place. Shared mutable place has its place. Jeff M. -- http://mail.python.org/mailman/listinfo/python-list
Re: multi-core software
Jon Harrop wrote: I see no problem with mutable shared state. In which case, Jon, you're in a small minority. Patricia Shanahan wrote: In my opinion, shared mutable state has a lot of problems. It is also sometimes the best design for performance reasons. As Dr. Jon pointed out upthread, one can write decent code with mutable shared state. It is also true that mutable state presents a lot of problems - potential problems, ones that can be solved, but not ones that can be solved thoughtlessly. On the flip side, one can write a tremendous amount of effective multi-threaded code involving shared mutable state with attention to a few rules of thumb, like always synchronize access and don't use different monitors to do so. Unlike some environments (e.g., database management systems), Java's tools to manage concurrency are explicit and low level. The programmer's job is to make sure those tools are used correctly to avoid problems. As long as they do that, then there is no special problem with shared mutable state. There is, however, a cost. Certain things must happen slower when you share mutable state, than when you share immutable state or don't share state. Certain things must happen when you share mutable state, regardless of speed, because without them your code doesn't work. For some reason, concurrent programming is an area often not well understood by a significant percentage of workaday programmers. When problems do arise, they tend to be probabilistic in nature and vary widely with system characteristics like attempted load. So the meeting ground is, yes, concurrent mutable state can present problems if not properly managed. Properly managing such is not necessarily a huge burden, but it must be borne. When done properly, shared mutable state will not present problems in production. -- Lew -- http://mail.python.org/mailman/listinfo/python-list
Re: multi-core software
Jeff M. wrote: On Jun 7, 3:19 pm, Arved Sandstrom dces...@hotmail.com wrote: Jon Harrop wrote: Arved Sandstrom wrote: Jon Harrop wrote: I see no problem with mutable shared state. In which case, Jon, you're in a small minority. No. Most programmers still care about performance and performance means mutable state. Quite apart from performance and mutable state, I believe we were talking about mutable _shared_ state. And this is something that gets a _lot_ of people into trouble. Mutable shared state gets _bad_ (err.. perhaps inexperienced would be a better adjective) programmers - who don't know what they are doing - in trouble. There are many problem domains that either benefit greatly from mutable shared states or can't [easily] be done without them. Unified memory management being an obvious example... there are many more. Unshared state has its place. Immutable state has its place. Shared immutable state has its place. Shared mutable place has its place. Exactly. I don't believe that shared mutable state is any harder to do correctly than the next solution and it is almost always the most efficient solution and the sole purpose of writing parallel programs to leverage multicores is performance. A bad developer can screw up anything. I see no reason to think that shared mutable state is any more fragile than the next thing a bad developer can screw up. -- Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?u -- http://mail.python.org/mailman/listinfo/python-list
Re: multi-core software
Lew wrote: As Dr. Jon pointed out upthread, one can write decent code with mutable shared state. It is also true that mutable state presents a lot of problems - potential problems, ones that can be solved, but not ones that can be solved thoughtlessly. On the flip side, one can write a tremendous amount of effective multi-threaded code involving shared mutable state with attention to a few rules of thumb, like always synchronize access and don't use different monitors to do so. Unlike some environments (e.g., database management systems), Java's tools to manage concurrency are explicit and low level. The programmer's job is to make sure those tools are used correctly to avoid problems. As long as they do that, then there is no special problem with shared mutable state. There is, however, a cost. Certain things must happen slower when you share mutable state, than when you share immutable state or don't share state. Certain things must happen when you share mutable state, regardless of speed, because without them your code doesn't work. For some reason, concurrent programming is an area often not well understood by a significant percentage of workaday programmers. When problems do arise, they tend to be probabilistic in nature and vary widely with system characteristics like attempted load. So the meeting ground is, yes, concurrent mutable state can present problems if not properly managed. Properly managing such is not necessarily a huge burden, but it must be borne. When done properly, shared mutable state will not present problems in production. I agree entirely but my statements were about parallelism and not concurrency. Parallel and concurrent programming have wildly different characteristics and solutions. I don't believe shared mutable state is overly problematic in the context of parallelism. Indeed, I think it is usually the best solution in that context. -- Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?u -- http://mail.python.org/mailman/listinfo/python-list
Re: multi-core software
Jon Harrop wrote: I agree entirely but my statements were about parallelism and not concurrency. Parallel and concurrent programming have wildly different characteristics and solutions. I don't believe shared mutable state is overly problematic in the context of parallelism. Indeed, I think it is usually the best solution in that context. Interesting distinction. Would it be fair to compare concurrent programming to the bricks used to build the parallel program's edifice? -- Lew -- http://mail.python.org/mailman/listinfo/python-list
Re: multi-core software
On Sun, 07 Jun 2009 11:16:46 -0400, Lew wrote: So the good old days are a matter of degree and self-deception - it was easier to fool ourselves then that we could at least guess timings proportionately if not absolutely, but things definitely get more unpredictable over evolution. As I recall I could get exact timings on my 6502 based Commodore 64. The issues you speak of simply weren't issues. -- Ken T. http://www.electricsenator.net Never underestimate the power of stupid people in large groups. -- http://mail.python.org/mailman/listinfo/python-list
Re: multi-core software
Lew wrote: Jon Harrop wrote: I agree entirely but my statements were about parallelism and not concurrency. Parallel and concurrent programming have wildly different characteristics and solutions. I don't believe shared mutable state is overly problematic in the context of parallelism. Indeed, I think it is usually the best solution in that context. Interesting distinction. Would it be fair to compare concurrent programming to the bricks used to build the parallel program's edifice? Way too much of a fine distinction. While they are in fact different, the point of concurrent programming is to structure programs as a group of computations, which can be executed in parallel (however that might actually be done depending on how many processors there are). Parallel computing means to carry out many computations simultaneously. These are interleaved definitions. And they are *not* wildly different. If you talk about shared mutable state, it is not as easy to use as Dr Harrop seems to think it is. Maybe in his experience it has been, but in general it's no trivial thing to manage. Lew, you probably summarized it best a few posts upstream. AHS -- http://mail.python.org/mailman/listinfo/python-list
Re: multi-core software
Arved Sandstrom wrote: Lew wrote: Interesting distinction. Would it be fair to compare concurrent programming to the bricks used to build the parallel program's edifice? Way too much of a fine distinction. While they are in fact different, the point of concurrent programming is to structure programs as a group of computations, which can be executed in parallel (however that might actually be done depending on how many processors there are). No. Concurrent programming is about interleaving computations in order to reduce latency. Nothing to do with parallelism. -- Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?u -- http://mail.python.org/mailman/listinfo/python-list
Re: multi-core software
Lew wrote: Jon Harrop wrote: I agree entirely but my statements were about parallelism and not concurrency. Parallel and concurrent programming have wildly different characteristics and solutions. I don't believe shared mutable state is overly problematic in the context of parallelism. Indeed, I think it is usually the best solution in that context. Interesting distinction. Would it be fair to compare concurrent programming to the bricks used to build the parallel program's edifice? Concurrent programming certainly underpins the foundations of almost all parallel programs. Not least at the level of the OS scheduling the threads than run the parallel programs. However, that knowledge is probably more confusing than helpful here. In essence, concurrent programming is concerned with reducing latency (e.g. evading blocking) by interleaving computations whereas parallel programming is concerned with maximizing throughput by performing computations at the same time. Historically, concurrency has been of general interest on single core machines in the context of operating systems and IO and has become more important recently due to the ubiquity of web programming. Parallelism was once only important to computational scientists programming shared-memory supercomputers and enterprise developers programming distributed-memory clusters but the advent of multicore machines on the desktop and in the games console has pushed parallelism into the lime light for ordinary developers when performance is important. Solutions for concurrent and parallel programming are also wildly different. Concurrent programming typically schedules work items that are expected to block on a shared queue for a pool of dozens or even hundreds of threads. Parallel programming typically schedules work items that are expected to not block on wait-free work-stealing queues for a pool of n threads where n is the number of cores. Solutions for concurrent programming (such as the .NET thread pool and asynchronous workflows in F#) can be used as a poor man's solution for parallel programming but the overheads are huge because they were not designed for this purpose so performance is much worse than necessary. Solutions for parallel programming (e.g. Cilk, the Task Parallel Library) are virtually useless for concurrent programming because you quickly end up with all n threads blocked and the whole program stalls. -- Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?u -- http://mail.python.org/mailman/listinfo/python-list
Re: multi-core software
Jon Harrop wrote: ... Historically, concurrency has been of general interest on single core machines in the context of operating systems and IO and has become more important recently due to the ubiquity of web programming. Parallelism was once only important to computational scientists programming shared-memory supercomputers and enterprise developers programming distributed-memory clusters but the advent of multicore machines on the desktop and in the games console has pushed parallelism into the lime light for ordinary developers when performance is important. ... Parallelism has also been important, for a long time, to multiprocessor operating system developers. I got my first exposure to parallel programming, in the 1980's, working on NCR's VRX operating system. Patricia -- http://mail.python.org/mailman/listinfo/python-list
Re: multi-core software
Lew wrote: div class=moz-text-flowed style=font-family: -moz-fixedScott David Daniels wrote: the nub of the problem is not on the benchmarks. There is something to be said for the good old daays when you looked up the instruction timings that you used in a little document for your machine, and could know the cost of any loop. We are faster now, but part of the cost of that speed is that timing is a black art. Those good old days never existed. Those manuals never accounted for things that affected timing even then, like memory latency or refresh time. SRAM cache made things worse, since the published timings never mentioned cache-miss delays. Though memory cache might seem a recent innovation, it's been around a while. It would be challenging to find any published timing since the commercialization of computers that would actually tell the cost of any loop. Things got worse when chips like the '86 family acquired multiple instructions for doing loops, still worse when pre-fetch pipelines became deeper and wider, absolutely Dark Art due to multi-level memory caches becoming universal, and throw-your-hands-up-and-leave-for-the-corner-bar with multiprocessor NUMA systems. OSes and high-level languages complicate the matter - you never know how much time slice you'll get or how your source got compiled or optimized by run-time. So the good old days are a matter of degree and self-deception - it was easier to fool ourselves then that we could at least guess timings proportionately if not absolutely, but things definitely get more unpredictable over evolution. Nonsense. The 6502 with static memory was precisely predictable, and many programmers (working in machine language, naturally) counted on it. Similarly the Novix 4000, when programmed in its native Forth. And previous to that, I worked on several machines (in fact, I wrote the assembler and debugger for two of them) where the only variable was the delay every two milliseconds for dynamic memory refresh. Separate control memory and data memory, and every instruction precisely clocked. No instruction prefetch, no cache memory. What you see is what you get. Would I want to go back there? No. Sub-megaherz clocks with much less happening on each clock means we were operating at way under .01% of present day. -- http://mail.python.org/mailman/listinfo/python-list
Re: multi-core software
Ken T. wrote: On Sun, 07 Jun 2009 11:16:46 -0400, Lew wrote: So the good old days are a matter of degree and self-deception - it was easier to fool ourselves then that we could at least guess timings proportionately if not absolutely, but things definitely get more unpredictable over evolution. As I recall I could get exact timings on my 6502 based Commodore 64. The issues you speak of simply weren't issues. Let's not forget Elite for the 6502 exploiting predictable performance in order to switch graphics modes partway down the vsync! -- Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?u -- http://mail.python.org/mailman/listinfo/python-list
Re: multi-core software
Jon Harrop wrote: Arved Sandstrom wrote: Lew wrote: Interesting distinction. Would it be fair to compare concurrent programming to the bricks used to build the parallel program's edifice? Way too much of a fine distinction. While they are in fact different, the point of concurrent programming is to structure programs as a group of computations, which can be executed in parallel (however that might actually be done depending on how many processors there are). No. Concurrent programming is about interleaving computations in order to reduce latency. Nothing to do with parallelism. Jon, I do concurrent programming all the time, as do most of my peers. Way down on the list of why we do it is the reduction of latency. AHS -- http://mail.python.org/mailman/listinfo/python-list
Re: multi-core software
Arved Sandstrom wrote: Jon Harrop wrote: Arved Sandstrom wrote: Lew wrote: Interesting distinction. Would it be fair to compare concurrent programming to the bricks used to build the parallel program's edifice? Way too much of a fine distinction. While they are in fact different, the point of concurrent programming is to structure programs as a group of computations, which can be executed in parallel (however that might actually be done depending on how many processors there are). No. Concurrent programming is about interleaving computations in order to reduce latency. Nothing to do with parallelism. Jon, I do concurrent programming all the time, as do most of my peers. Way down on the list of why we do it is the reduction of latency. What is higher on the list? -- Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?u -- http://mail.python.org/mailman/listinfo/python-list
Re: multi-core software
On Mon, 08 Jun 2009 02:39:40 +0100, Jon Harrop wrote: Ken T. wrote: On Sun, 07 Jun 2009 11:16:46 -0400, Lew wrote: So the good old days are a matter of degree and self-deception - it was easier to fool ourselves then that we could at least guess timings proportionately if not absolutely, but things definitely get more unpredictable over evolution. As I recall I could get exact timings on my 6502 based Commodore 64. The issues you speak of simply weren't issues. Let's not forget Elite for the 6502 exploiting predictable performance in order to switch graphics modes partway down the vsync! That actually didn't require predictable timing. You could tell the video chip to send you an interrupt when it got to a given scan line. I used this myself. Elite was cool though. I wasted many hours playing that game. -- Ken T. http://www.electricsenator.net Duct tape is like the force. It has a light side, and a dark side, and it holds the universe together ... -- Carl Zwanzig -- http://mail.python.org/mailman/listinfo/python-list
Re: multi-core software
George Neuner gneun...@comcast.net writes: Even the lightest weight user space (green) threads need a few hundred instructions, minimum, to amortize the cost of context switching. I thought the definition of green threads was that multiplexing them doesn't require context switches. -- http://mail.python.org/mailman/listinfo/python-list
Re: multi-core software
Jeff M. mass...@gmail.com writes: Even the lightest weight user space (green) threads need a few hundred instructions, minimum, to amortize the cost of context switching There's always a context switch. It's just whether or not you are switching in/out a virtual stack and registers for the context or the hardware stack/registers. I don't see the hundreds of instructions in that case. http://shootout.alioth.debian.org/u32q/benchmark.php?test=threadringlang=ghcid=3 shows GHC doing 50 million lightweight thread switches in 8.47 seconds, passing a token around a thread ring. Almost all of that is probably spent acquiring and releasing the token's lock as the token is passed from one thread to another. That simply doesn't leave time for hundreds of instructions per switch. -- http://mail.python.org/mailman/listinfo/python-list
Re: multi-core software
Paul Rubin wrote: Jeff M. mass...@gmail.com writes: Even the lightest weight user space (green) threads need a few hundred instructions, minimum, to amortize the cost of context switching There's always a context switch. It's just whether or not you are switching in/out a virtual stack and registers for the context or the hardware stack/registers. I don't see the hundreds of instructions in that case. http://shootout.alioth.debian.org/u32q/benchmark.php?test=threadringlang=ghcid=3 shows GHC doing 50 million lightweight thread switches in 8.47 seconds, passing a token around a thread ring. Almost all of that is probably spent acquiring and releasing the token's lock as the token is passed from one thread to another. That simply doesn't leave time for hundreds of instructions per switch. And Haskell is not exactly fast... -- Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?u -- http://mail.python.org/mailman/listinfo/python-list
Re: multi-core software
Scott David Daniels wrote: Lew wrote: Scott David Daniels wrote: the nub of the problem is not on the benchmarks. There is something to be said for the good old daays when you looked up the instruction timings that you used in a little document for your machine, and could know the cost of any loop. We are faster now, but part of the cost of that speed is that timing is a black art. Those good old days never existed. Those manuals never accounted for things that affected timing even then, like memory latency or refresh time. Well, as Gilbert and Sullivan wrote: - What, never? - No, never! - What, Never? - Well, hardly ever. Look up the LGP-30. It was quite predictable. It has been a while. Same for early ARMs. -- Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?u -- http://mail.python.org/mailman/listinfo/python-list
Re: multi-core software
Lew wrote: Scott David Daniels wrote: the nub of the problem is not on the benchmarks. There is something to be said for the good old daays when you looked up the instruction timings that you used in a little document for your machine, and could know the cost of any loop. We are faster now, but part of the cost of that speed is that timing is a black art. Those good old days never existed. Those manuals never accounted for things that affected timing even then, like memory latency or refresh time. Well, as Gilbert and Sullivan wrote: - What, never? - No, never! - What, Never? - Well, hardly ever. Look up the LGP-30. It was quite predictable. It has been a while. --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: multi-core software
On Fri, 5 Jun 2009 18:15:00 + (UTC), Kaz Kylheku kkylh...@gmail.com wrote, quoted or indirectly quoted someone who said : Even for problems where it appears trivial, there can be hidden issues, like false cache coherency communication where no actual sharing is taking place. Or locks that appear to have low contention and negligible performance impact on ``only'' 8 processors suddenly turn into bottlenecks. Then there is NUMA. A given address in memory may be RAM attached to the processor accessing it, or to another processor, with very different access costs. Could what you are saying be summed up by saying, The more threads you have the more important it is to keep your threads independent, sharing as little data as possible. -- Roedy Green Canadian Mind Products http://mindprod.com Never discourage anyone... who continually makes progress, no matter how slow. ~ Plato 428 BC died: 348 BC at age: 80 -- http://mail.python.org/mailman/listinfo/python-list
Re: multi-core software
Roedy Green see_webs...@mindprod.com.invalid writes: On Fri, 5 Jun 2009 18:15:00 + (UTC), Kaz Kylheku kkylh...@gmail.com wrote, quoted or indirectly quoted someone who said : Even for problems where it appears trivial, there can be hidden issues, like false cache coherency communication where no actual sharing is taking place. Or locks that appear to have low contention and negligible performance impact on ``only'' 8 processors suddenly turn into bottlenecks. Then there is NUMA. A given address in memory may be RAM attached to the processor accessing it, or to another processor, with very different access costs. Could what you are saying be summed up by saying, The more threads you have the more important it is to keep your threads independent, sharing as little data as possible. Absolutely... not a new observation, either, as it follows directly from Amdahl's law. -- http://mail.python.org/mailman/listinfo/python-list
Re: multi-core software
On Fri, 05 Jun 2009 16:26:37 -0700, Roedy Green see_webs...@mindprod.com.invalid wrote: On Fri, 5 Jun 2009 18:15:00 + (UTC), Kaz Kylheku kkylh...@gmail.com wrote, quoted or indirectly quoted someone who said : Even for problems where it appears trivial, there can be hidden issues, like false cache coherency communication where no actual sharing is taking place. Or locks that appear to have low contention and negligible performance impact on ``only'' 8 processors suddenly turn into bottlenecks. Then there is NUMA. A given address in memory may be RAM attached to the processor accessing it, or to another processor, with very different access costs. Could what you are saying be summed up by saying, The more threads you have the more important it is to keep your threads independent, sharing as little data as possible. And therein lies the problem of leveraging many cores. There is a lot of potential parallelism in programs (even in Java :) that is lost because it is too fine a grain for threads. Even the lightest weight user space (green) threads need a few hundred instructions, minimum, to amortize the cost of context switching. Add to that the fact that programmers have shown themselves, on average, to be remarkably bad at figuring out what _should_ be done in parallel - as opposed to what _can_ be done - and you've got a clear indicator that threads, as we know them, are not scalable except under a limited set of conditions. George -- http://mail.python.org/mailman/listinfo/python-list
Re: multi-core software
På Sat, 06 Jun 2009 21:46:51 +0200, skrev George Neuner gneun...@comcast.net: On Fri, 05 Jun 2009 16:26:37 -0700, Roedy Green Add to that the fact that programmers have shown themselves, on average, to be remarkably bad at figuring out what _should_ be done in parallel - as opposed to what _can_ be done - and you've got a clear indicator that threads, as we know them, are not scalable except under a limited set of conditions. George I find the dataflow model of concurrency on Oz to be interesting and to address many of the issues you just mentioned. See in particular: 'Dataflow variables and declarative concurrency' and onward. http://en.wikipedia.org/wiki/Oz_(programming_language) - John Thingstad -- http://mail.python.org/mailman/listinfo/python-list
Re: multi-core software
On Jun 6, 9:58 pm, Paul Rubin http://phr...@nospam.invalid wrote: George Neuner gneun...@comcast.net writes: Even the lightest weight user space (green) threads need a few hundred instructions, minimum, to amortize the cost of context switching. I thought the definition of green threads was that multiplexing them doesn't require context switches. There's always a context switch. It's just whether or not you are switching in/out a virtual stack and registers for the context or the hardware stack/registers. Jeff M. -- http://mail.python.org/mailman/listinfo/python-list
Re: multi-core software
On Thu, 4 Jun 2009 09:46:44 -0700 (PDT), Xah Lee xah...@gmail.com wrote, quoted or indirectly quoted someone who said : Why Must Software Be Rewritten For Multi-Core Processors? Threads have been part of Java since Day 1. Using threads complicates your code, but even with a single core processor, they can improve performance, particularly if you are doing something like combing multiple websites. The nice thing about Java is whether you are on a single core processor or a 256 CPU machine (We got to run our Athena Integer Java spreadsheet engine on such a beast), does not concern your code. You just have to make sure your threads don't interfere with each other, and Java/the OS, handle exploiting all the CPUs available. -- Roedy Green Canadian Mind Products http://mindprod.com Never discourage anyone... who continually makes progress, no matter how slow. ~ Plato 428 BC died: 348 BC at age: 80 -- http://mail.python.org/mailman/listinfo/python-list
Re: multi-core software
On 2009-06-05, Vend ven...@virgilio.it wrote: On Jun 4, 8:35 pm, Roedy Green see_webs...@mindprod.com.invalid wrote: On Thu, 4 Jun 2009 09:46:44 -0700 (PDT), Xah Lee xah...@gmail.com wrote, quoted or indirectly quoted someone who said : • Why Must Software Be Rewritten For Multi-Core Processors? Threads have been part of Java since Day 1. Using threads complicates your code, but even with a single core processor, they can improve performance, particularly if you are doing something like combing multiple websites. The nice thing about Java is whether you are on a single core processor or a 256 CPU machine (We got to run our Athena Integer Java spreadsheet engine on such a beast), does not concern your code. You just have to make sure your threads don't interfere with each other, and Java/the OS, handle exploiting all the CPUs available. You need to decompose your problem in 256 independent tasks. It can be trivial for some problems and difficult or perhaps impossible for some others. Even for problems where it appears trivial, there can be hidden issues, like false cache coherency communication where no actual sharing is taking place. Or locks that appear to have low contention and negligible performance impact on ``only'' 8 processors suddenly turn into bottlenecks. Then there is NUMA. A given address in memory may be RAM attached to the processor accessing it, or to another processor, with very different access costs. -- http://mail.python.org/mailman/listinfo/python-list
Re: multi-core software
On Jun 4, 9:46 am, Xah Lee xah...@gmail.com wrote: Of interest: • Why Must Software Be Rewritten For Multi-Core Processors? http://xahlee.org/UnixResource_dir/writ/multi-core_software.html plain text version follows. -- Why Must Software Be Rewritten For Multi-Core Processors? Xah Lee, 2009-06-04 I had a revelation today, namely, that it is necessary to rewrite software to use multi-processor in order to benefit from it. This may sound stupid, but is a revelation to me. For the past decade, the question has been on my mind, about why should software needs to be rewritten to take advantage of multi-processors. Because, in my mind, i thought that software are at some fundamental level just algorithms, and algorithms, have nothing to do with hardware implementation aspects such as number of processors. I always felt, that those talks about the need or difficulty of rewriting software for multi-processor (or multi-core these days) must be the product of idiocy of industrial imperative coding monkies. In particular, some languages such as java, the way they deal with it, seems to me extremely stupid. e.g. the concept of threads. In my mind, there should be a layer between the software and the hardware, such as the operating system, or the compiler, that should be able to automatically handle or compile the software so that it FULLY use the multi-processors when present. In short, i thought that a algorithm really has nothing to do with hardware details. I never really thought hard about this issue, but today, since i got a quad-core PC, so i looked into the issue, and thought about it, and i realized the answer. The gist is that, algorithm, fundamentally means manipulating some hardware, in fact, algorithm is a step by step instruction about some form of hardware, even the hardware may be abstract or virtual. For example, let's say the simplest case of 1+1. It is a algorithm, but where is the hardware? You may say it's abstract concept, or it being a mathematical model. What you call 1+1 depends on the context, but in our context, those numbers are the hardware. To see this, lets say more complex example of listing primes by sieve. Again, it comes down to “what is a number”? Here, numbers can be stones, or arrangement of beads on abacus, it's hardware! As another example, say sorting. To begin with, you have to have some something to sort, that's hardware. Another way to see this is that, for a given computing problem, there are infinite number of algorithms to achieve the same thing. Some, will be better ones, requiring less steps, or requiring less storage. All these are concrete manipulation issues, and the thing being manipulated, ultimately we have to call it hardware. So, when hardware changes, say from one cpu to multi-cpu, there's no way for the algorithm to magically change and adopt the changed hardware. If you need a algorithm that is catered to the new hardware, you need a new algorithm. One might think that there might be algorithm Omega, such that it takes input of old hardware O, new hardware N, and a algorithm A, and output a algorithm B, such that B has the same behavior as A, but N+B performs better than O+A. This is like asking for Strong AI. One thing we often forgot is that algorithms ultimately translates to manipulating hardware. In a modern digital computer, that means software algorithms ultimately becomes machine instructions in CPU, which manipulate the 1s and 0s in register, or electricity voltage in transisters. In a more mundane point of view, a automatic system for software to work on multi-processors is a problem of breaking a problem into discrete units (so that they can be computed in parallel). The problem of finding a algorithm is entirely different from the problem of breaking a problem into distinct units. The problem of breaking a problem into distinct units is a entire new branch of mathematics. For example, let's say factoring. Factoring is a well defined mathematical problem. There are millions algorithms to do it, each class has different properties such as number of steps or storage units. However, if we look at these algorithms from the point of view of distinct units, it's a new perspective on classification of algorithms. Software are in general too ill-defined and fuzzy and complex, the software we use on personal computers such as email, browsers, games, don't even have mathematical models. They don't even have mathematical models of their inputs and outputs. To talk about automatic system of unitizing software, would be more like a AI fantasy. Roughly, a term that describes this aspect of research is Parallel computing. In the Wikipedia article, it talks about types of parallelism: Bit- level parallelism, Instruction-level parallelism, Data parallelism, Task parallelism. Then it also discusses hardware aspects classes:
Re: multi-core software
On Jun 6, 1:26 am, Roedy Green see_webs...@mindprod.com.invalid wrote: On Fri, 5 Jun 2009 18:15:00 + (UTC), Kaz Kylheku kkylh...@gmail.com wrote, quoted or indirectly quoted someone who said : Even for problems where it appears trivial, there can be hidden issues, like false cache coherency communication where no actual sharing is taking place. Or locks that appear to have low contention and negligible performance impact on ``only'' 8 processors suddenly turn into bottlenecks. Then there is NUMA. A given address in memory may be RAM attached to the processor accessing it, or to another processor, with very different access costs. Could what you are saying be summed up by saying, The more threads you have the more important it is to keep your threads independent, sharing as little data as possible. Besides technical issues such as cache conflicts and synchronization latencies, there are more theoretical issues of task decomposability. It seems it is not always feasible to decompose an algorithm into subprograms that can be executed in parallel. -- http://mail.python.org/mailman/listinfo/python-list
Re: multi-core software
[Followup-To: header set to comp.lang.lisp.] On 2009-06-04, Roedy Green see_webs...@mindprod.com.invalid wrote: On Thu, 4 Jun 2009 09:46:44 -0700 (PDT), Xah Lee xah...@gmail.com wrote, quoted or indirectly quoted someone who said : Why Must Software Be Rewritten For Multi-Core Processors? Threads have been part of Java since Day 1. Unfortunately, not sane threads designed by people who actually understand multithreading. The nice thing about Java is whether you are on a single core processor or a 256 CPU machine (We got to run our Athena Integer Java spreadsheet engine on such a beast), does not concern your code. You are dreaming if you think that there are any circumstances (other than circumstances in which performance doesn't matter) in which you don't have to concern yourself about the difference between a uniprocessor and a 256 CPU machine. -- http://mail.python.org/mailman/listinfo/python-list
Re: multi-core software
Kaz Kylheku wrote: [Followup-To: header set to comp.lang.lisp.] On 2009-06-04, Roedy Green see_webs...@mindprod.com.invalid wrote: On Thu, 4 Jun 2009 09:46:44 -0700 (PDT), Xah Lee xah...@gmail.com wrote, quoted or indirectly quoted someone who said : Why Must Software Be Rewritten For Multi-Core Processors? Threads have been part of Java since Day 1. Unfortunately, not sane threads designed by people who actually understand multithreading. The nice thing about Java is whether you are on a single core processor or a 256 CPU machine (We got to run our Athena Integer Java spreadsheet engine on such a beast), does not concern your code. You are dreaming if you think that there are any circumstances (other than circumstances in which performance doesn't matter) in which you don't have to concern yourself about the difference between a uniprocessor and a 256 CPU machine. If you're interested in parallel programming, have a look at Flow-Based Programming: http://www.jpaulmorrison.com/fbp/ -- http://mail.python.org/mailman/listinfo/python-list