Fwd: Re: Parallelism and Concurrency was Re: Ideas for an Object-Belongs-to-Thread threading model

2010-05-18 Thread nigelsandever



--- Forwarded message ---
From: nigelsande...@btconnect.com
To: Dave Whipp - d...@whipp.name  
+nntp+browseruk+e66dbbe0cf.dave#whipp.n...@spamgourmet.com

Cc:
Subject: Re: Parallelism and Concurrency was Re: Ideas for an Object-Belongs-to-Thread threading model

Date: Mon, 17 May 2010 22:31:45 +0100

On Mon, 17 May 2010 20:33:24 +0100, Dave Whipp - dave_wh...@yahoo.com
+nntp+browseruk+2dcf7cf254.dave_whipp#yahoo@spamgourmet.com wrote:


From that statement, you do not appear to understand the subject matter
of this thread: the Perl 6 concurrency model.

Actually, the reason for my post was that I fear that I did understand
the subject matter of the thread: it seems to me that any reasonable
discussion of Perl 6 concurrency should not be too focused on
pthreads-style threading.


Okay. Now we're at cross-purposes about the heavily overloaded term
threading. Whilst GPUs overload the term threading for their internal
operations, those threads are for the most part invisible to the
applications programmer, and quite different from the 100,000-thread
demos in the Go and Erlang documentation to which I referred. The latter
are MIMD algorithms, for which applications are significantly harder to
find than for SIMD algorithms, which are commonplace and well understood.

My use of the terms threading and threads is limited specifically to
MIMD threading of two forms:

Kernel threading: pthreads, Win32/64 threads etc.
User-space threading: green threads; coroutines; goroutines; Actors; etc.

See below for why I've been limiting myself to these two definitions.

OpenCL/CUDA are not exotic $M hardware: they are available (and
performant) on any PC (or Mac) that is mainstream or above. Millions of
threads is not a huge number: it's one thread per pixel on a 720p video
frame (and I see no reason, other than performance, not to use Perl 6 for
image processing).


If the discussion is strictly limited to abstracting remote procedure calls,
then I'll back away. But the exclusion of modules that map
hyper-operators (and feeds, etc.) to OpenCL from the generic concept of
Perl 6 concurrency seems rather blinkered.





FWIW, I absolutely agree with you that the mapping between Perl 6
hyper-operators and (GPU-based or otherwise) SIMD instructions is a
natural fit. But, in your post above you said:

Pure SIMD (vectorization) is insufficient for many of these workloads:
programmers really do need to think in terms of threads (most likely
mapped to OpenCL or Cuda under the hood).

By which I took you to mean that in-box SIMD (be it x86/x64 CPU or GPU
SIMD instruction sets) was insufficient for many of the[se] workloads
you were considering, and therefore took you to be suggesting that
Perl 6 should also be catering for the heterogeneous aspects of OpenCL in
core.

I now realise that you were distinguishing between CPU SIMD instructions
and GPU SIMD instructions. But the real point here is that Perl 6 doesn't
need a threading model to use, and benefit from, GPU SIMD.
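To make that concrete, here is a minimal sketch (in current Perl 6 syntax; the dummy frame and any parallel backend are illustrative assumptions, not a design) of why the hyper-operators are the natural hook: the code states a data-parallel intent and says nothing about how it is executed.

    # A hyper-operator is declaratively element-wise: a runtime is free
    # to evaluate it with a plain loop, CPU SIMD, or a GPU without any
    # change to this code.
    my @frame    = 128 xx (1280 * 720);   # dummy grey 720p "frame": 921,600 pixels
    my @brighter = @frame >>+>> 32;       # per-pixel brightness adjustment
    say @brighter.elems;                  # 921600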

Any bog-standard single-threaded process can benefit from using CUDA or
the homogeneous aspect of OpenCL, where available, for SIMD algorithms.
Their use can be entirely transparent to the language semantics for
built-in operations like the hyper-operators. Ideally, the Perl 6 runtime
would implement roles for OpenCL or CUDA for hyper-operations; fall back
to CPU SIMD instructions; and fall back again to old-fashioned loops if
neither were available. This would all be entirely transparent to the
Perl 6 programmer, just as utilising discrete FPUs was transparent to the
C programmer back in the day. In an ideal world, Perl 6.0.0.0.0 would ship
with just the looping hyper-operator implementation, and it would be down
to users loading an appropriately named role matching the hardware's
capabilities, which would then get transparently picked up and used by
the hyper-operations to give them CPU-SIMD or GPU-SIMD as available. Or
perhaps these would become Perl 6 build-time configuration options.
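A hedged sketch of that role-based fallback (HyperBackend and LoopBackend are invented names for illustration, not a committed design): the role fixes the interface, the default class is the old-fashioned loop, and a CPU-SIMD or OpenCL/CUDA class could be dropped in without the caller changing.

    role HyperBackend {
        # Every backend must provide element-wise addition; the '...'
        # stub forces composing classes to implement it.
        method hyper-add(@a, @b) { ... }
    }

    class LoopBackend does HyperBackend {
        # The plain-loop fallback that would always ship.
        method hyper-add(@a, @b) {
            my @r;
            for @a.keys -> $i {
                @r.push(@a[$i] + @b[$i]);
            }
            return @r;
        }
    }

    # An OpenCL/CUDA or CPU-SIMD backend would compose the same role and
    # hand the work to the hardware; which class gets loaded could be a
    # run-time choice or a build-time configuration option.
    my HyperBackend $backend = LoopBackend.new;
    say $backend.hyper-add([1, 2, 3], [10, 20, 30]);   # [11 22 33]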

The discussion (which originally started outside of this list) was about
MIMD threading--the two categories above--in order to utilise the multiple
*C*PU cores that are now ubiquitous. For this, Perl 6 does need to sort out
a threading model.
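For contrast with the SIMD case, a minimal sketch of the MIMD shape being argued about, assuming a start/await style of task API (one possible shape for such a model, not a settled one): each task is independent work that a scheduler is free to place on any core.

    # Each 'start' is an independent MIMD task; 'await' collects the
    # results once they are all done. Whether these tasks may actually
    # occupy several CPU cores is exactly the question at issue.
    my @promises = (1..1000).map(-> $n { start { $n * $n } });
    my @results  = await @promises;
    say @results.sum;   # 333833500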

The guts of the discussion has been whether kernel threading (and mutable
shared state) is necessary. The perception is that by using user-threading
(on a single core at a time), you avoid the need for, and complexities of,
locking and synchronisation. And one of the (I believe spurious) arguments
for the use of user-space (MIMD) threading is that such threads are
lightweight, which allows you to run thousands of concurrent threads.

And it does. I've done it with Erlang right here on my dirt-cheap Intel
Core2 Quad Q6600 processor. But, no matter how hard you try, you can never
push the CPU utilisation above 25%, because those 100,000 user-threads all
run in a single kernel thread, on a single core of the four.

Re: Parallelism and Concurrency was Re: Ideas for an Object-Belongs-to-Thread threading model

2010-05-18 Thread Daniel Ruoso
On Tue, 2010-05-18 at 12:58 -0700, Alex Elsayed wrote:
 You are imposing a false dichotomy here. Neither 'green' threads nor kernel
 threads precludes the other. In fact, it can be convincingly argued that they
 work _best_ when combined. Please look at the GSoC proposal for hybrid
 threading on the Parrot list.

While I agree that there isn't a dichotomy, the point here is more along
the lines of:

 1) Green threads are usually tied to the requirement of serialized
access to data, so that you can share all data within the thread without
resorting to locks for every value.

 2) If that requirement is dropped, so that only data explicitly marked
as shared can be seen by both threads, the point of green threads is
moot, since OS threads are always going to perform better than a
manually implemented scheduler (see the sketch after this list).
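A minimal sketch of point 2 (the Lock-around-a-counter shape is only illustrative, and "explicitly shared" here is simply the one variable both tasks reach for): only the shared value needs synchronisation; everything else in each task stays private to it.

    my $lock    = Lock.new;
    my $counter = 0;                  # the one value both tasks share

    my @workers = (1..2).map({
        start {
            for ^10_000 {
                # Only access to the shared counter is serialised.
                $lock.protect({ $counter++ });
            }
        }
    });
    await @workers;
    say $counter;                     # 20000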

My original idea was pointing towards creating a shared memory space that
would be seen by every green thread in the same OS thread, with lines
drawn to allow OS threading with different memory spaces - message
passing would be used to communicate between two different memory spaces.
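And a minimal sketch of the message-passing side, assuming a Channel-style queue between the two memory spaces (the names are illustrative, not a design): the producer and consumer only ever exchange messages, never reach into each other's data.

    my $channel  = Channel.new;

    my $consumer = start {
        # .list yields messages until the channel is closed.
        for $channel.list -> $msg {
            say "received: $msg";
        }
    };

    $channel.send($_) for 1..5;   # producer side
    $channel.close;
    await $consumer;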

But what we might be getting at here is the point where we don't need
green threads at all... I'm still not sure one way or the other,
though.

daniel