nigelsande...@btconnect.com wrote:

From that statement, you do not appear to understand the subject matter of this thread: the Perl 6 concurrency model.

If I misunderstood then I apologize: I had thought that the subject was the underlying abstractions of parallelism and concurrency that the Perl 6 language will define in order to enable specific threading modules to be provided by implementations. If the subject is specifically contemporary [multicore] CPU threading implementations, then my comments may not be relevant.


For CPU-bound processes [...].

Sure, there is exotic hardware that has thousands of cores, but it hardly seems likely that, having spent $millions upon such hardware to run your massively parallel algorithms to solve problems in realistic time frames, you're going to use a dynamic (non-compiled) language to write your solutions in.

Any midrange GPU will support millions of thread launches per second. One frame of 720p video has nearly a million pixels (1280 x 720 = 921,600), and these days it is not uncommon to launch multiple threads per pixel.
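
To put a shape to that: the per-pixel pattern can already be written down at the language level. A sketch only, written against Rakudo's .race as it exists today (which runs on CPU threads, not a GPU; the batch size is an arbitrary choice), but the shape, one independent computation per pixel, is exactly what a GPU wants:

    # One independent computation per pixel; .race partitions the work.
    my int32 @pixels = 0 xx (1280 * 720);                  # one 720p frame
    sub brighten(int32 $p --> int32) { ($p + 16) min 255 }
    my @out = @pixels.race(batch => 4096).map(&brighten);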

True, using a dynamic programming language for the guts of these threads is probably not a good tradeoff today: I'd probably use an Inline::OpenCL module initially. But I see no reason that a strongly statically typed subset of Perl 6 could not be compiled to efficient device code.
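
To be concrete about what I mean by Inline::OpenCL: no such module exists as far as I know, so the module name, the opencl-kernel helper, and its arguments below are all invented for illustration. The intended division of labour is OpenCL C for the kernel body and Perl 6 for the orchestration:

    # Hypothetical throughout: Inline::OpenCL and opencl-kernel are invented.
    use Inline::OpenCL;

    my $src = q:to/CL/;
        __kernel void brighten(__global uchar *px) {
            size_t i = get_global_id(0);
            px[i] = min(px[i] + 16, 255);
        }
        CL

    my &brighten = opencl-kernel($src, :name<brighten>);   # hypothetical helper
    brighten(@frame, :global-size(@frame.elems));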

To use millions of threads you don't focus on what the algorithm is doing: you focus on where the data is going. If you move data unnecessarily (or fail to move it when it is necessary) then you'll burn power and lose performance.

Sorry, but I've got to call you on this.

Parallelisation (threading) is all about improving performance. And the first three rules of performance are: algorithm; algorithm; algorithm. Choose the wrong algorithm and you are wasting cycles. Parallelise that wrong algorithm, and you're just multiplying the number of cycles you're wasting.

I always thought that the first rule of performance optimization is "measure" (i.e. run a profiler). But ignoring that quibble, the reason that bad algorithms are bad is (usually) bad data management (either unnecessary movement, or unnecessary locking). If you want to understand why an algorithm is inefficient then you need to study the data accesses, not (just) the processing. A special case of bad data movement, that applies even to sequential code, is cache-thrashing.
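
Cache-thrashing is easy to demonstrate: the two loops below do identical work and differ only in which index moves fastest. A sketch assuming Rakudo's shaped native arrays (stored flat, row-major); the exact penalty varies by machine:

    my $n = 1024;
    my int32 @m[$n;$n];

    # Row-major traversal: the inner index walks contiguous memory.
    for ^$n -> $i { for ^$n -> $j { @m[$i;$j]++ } }

    # Column-major traversal: each step strides $n elements apart,
    # defeating the cache. Same work, same result, far more misses.
    for ^$n -> $j { for ^$n -> $i { @m[$i;$j]++ } }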

This is somewhat analogous to ASIC design: at around the 0.13 um process node, wire-load delays started to dominate gate delays, initially just on the longer routes, but these days you can't ignore them anywhere. This doesn't mean that you can completely ignore the logic; it just means that logic optimization is taken as a given, and that the real work is in placement and routing.

Similarly, while a bad algorithm is obviously bad, even a good algorithm will perform badly (i.e. will waste memory bandwidth and power) if you don't have a way to define how the data will move in the implementation of that algorithm. If you're not careful, you'll burn more power moving data from/to memory than processing it.

Slightly stale figures: moving a 32-bit value to local DRAM may use 1 nJ (one nanojoule), while moving that same value a millimeter on-chip may burn only 10 pJ, similar to the energy of a single-precision floating-point operation; but that op needs two source operands and must write a destination, each of which requires a data movement. You can, of course, ignore these issues if you're just managing a handful of IO-bound CPU threads.
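
Spelling that arithmetic out, with the same order-of-magnitude figures (assumptions, not measurements):

    # Back-of-envelope: how data movement dwarfs the arithmetic it feeds.
    my $e-dram = 1e-9;     # ~1 nJ per 32-bit word moved to/from local DRAM
    my $e-flop = 10e-12;   # ~10 pJ per single-precision op (~ per mm on-chip)
    my $moves  = 3;        # two source reads plus one destination write

    say "DRAM traffic / arithmetic = { $moves * $e-dram / $e-flop }x";
    # prints 300x: if the operands round-trip through DRAM, the FLOP is noise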

Feel free to tell me that Perl 6 will never be used in scenarios where such considerations are important. I'll probably disagree. But I could probably be persuaded that the current type system has sufficient mechanisms (via traits) to define data placement without any new features, and that the issue can therefore be ignored until someone actually attempts to implement OpenCL bindings.
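
For instance (a runnable toy, not a proposal: "is placed" is a name I have just made up, and a real binding would allocate device memory rather than print):

    # A custom variable trait that records intended data placement.
    # The note fires when the declaration below is compiled; an OpenCL
    # binding could use such a hook to choose the backing store instead.
    multi sub trait_mod:<is>(Variable:D $v, :$placed!) {
        note "placement requested: $placed";
    }

    my int32 @frame is placed<device>;   # notes: placement requested: device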


Dave.
