Fwd: Re: Parallelism and Concurrency was Re: Ideas for aObject-Belongs-to-Thread (nntp: message 4 of 20) threading model (nntp: message 20 of 20 -lastone!-) (nntp: message 13 of 20)

2010-05-18 Thread nigelsandever



--- Forwarded message ---
From: nigelsande...@btconnect.com
To: Dave Whipp - d...@whipp.name  
+nntp+browseruk+e66dbbe0cf.dave#whipp.n...@spamgourmet.com

Cc:
Subject: Re: Parallelism and Concurrency was Re: Ideas for  
aObject-Belongs-to-Thread (nntp: message 4 of 20) threading model (nntp:  
message 20 of 20 -lastone!-) (nntp: message 13 of 20)

Date: Mon, 17 May 2010 22:31:45 +0100

On Mon, 17 May 2010 20:33:24 +0100, Dave Whipp - dave_wh...@yahoo.com
+nntp+browseruk+2dcf7cf254.dave_whipp#yahoo@spamgourmet.com wrote:


From that statement, you do not appear to understand the subject matter
of this thread: Perl 6 concurrency model.

Actually, the reason for my post was that I fear that I did understand
the subject matter of the thread: it seems to me that any reasonable
discussion of Perl 6 concurrency should not be too focused on
pthreads-style threading.


Okay. Now we're at cross-purposes about the heavily overloaded term
threading. Whilst GPUs overload the term threading for their internal
operations, those operations are for the most part invisible to the
applications programmer, and quite different from the 100,000-thread
demos in the Go and Erlang documentation to which I referred. The latter
are MIMD algorithms, for which applications are significantly harder to
find than for SIMD algorithms, which are commonplace and well understood.

My uses of the terms threading and threads are limited specifically to
MIMD threading of two forms:

Kernel threading: pthreads, Win32/64 threads etc.
User-space threading: green threads; coroutines; goroutines; Actors; etc.
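
To make the distinction concrete, here is a small sketch using the
concurrency constructs that Rakudo Perl 6 later grew (Thread for kernel
threads; start/await for lightweight, scheduler-managed tasks). Nothing
here is specified behaviour at the time of this discussion; it is purely
illustrative:

  # Kernel threading: each Thread maps onto one OS thread.
  my $t = Thread.start({ say "kernel thread, OS thread id = ", $*THREAD.id });
  $t.finish;                        # join the OS thread

  # User-space-style tasks: many lightweight units of work, multiplexed
  # by a scheduler over a small pool of OS threads.
  my @promises = (1..1000).map: -> $n { start { $n * $n } };
  my @squares  = await @promises;   # block until all 1000 have completed
  say [+] @squares;                 # 1000 cheap tasks, far fewer OS threads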

See below for why I've been limiting myself to these two definitions.

OpenCL/CUDA are not exotic $M hardware: they are available (and
performant) on any PC (or Mac) that is mainstream or above. Millions of
threads is not a huge number: it's one thread per pixel on a 720p video
frame (and I see no reason, other than performance, not to use Perl 6 for
image processing).
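
(For scale: a 720p frame is 1280 × 720 = 921,600 pixels, so one thread per
pixel is just under a million threads per frame.)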


If the discussion is strictly limited to abstracting remote procedure
calls, then I'll back away. But the exclusion of modules that map
hyper-operators (and feeds, etc.) to OpenCL from the generic concept of
Perl 6 concurrency seems rather blinkered.





FWIW, I absolutely agree with you that the mapping between Perl 6
hyper-operators and (GPU-based or otherwise) SIMD instructions is a
natural fit. But, in your post above you said:

Pure SIMD (vectorization) is insufficient for many of these workloads:
programmers really do need to think in terms of threads (most likely
mapped to OpenCL or Cuda under the hood).

By which I took you to mean that in-box SIMD (be it x86/x64 CPU or GPU
SIMD instruction sets) was insufficient for many of the[se] workloads
you were considering, and therefore took you to be suggesting that
Perl 6 should also be catering for the heterogeneous aspects of OpenCL in
core.

I now realise that you were distinguishing between CPU SIMD instructions
and GPU SIMD instructions. But the real point here is that Perl 6 doesn't
need a threading model to use, and benefit from, GPU SIMD.

Any bog-standard single-threaded process can benefit from using CUDA, or
the homogeneous aspect of OpenCL where available, for SIMD algorithms.
Their use can be entirely transparent to the language semantics for
built-in operations like the hyper-operators. Ideally, the Perl 6 runtime
would implement roles for OpenCL or CUDA for hyper-operations; fall back
to CPU SIMD instructions; and fall back again to old-fashioned loops if
neither were available. This would all be entirely transparent to the
Perl 6 programmer, just as utilising discrete FPUs was transparent to the
C programmer back in the day. In an ideal world, Perl 6.0.0.0.0 would ship
with just the looping hyper-operator implementation, and it would be down
to users loading an appropriately named Role matching the hardware's
capabilities, which would then get transparently picked up and used by the
hyper-operations to give them CPU SIMD or GPU SIMD as available. Or
perhaps these would become Perl 6 build-time configuration options.
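
To pin down the semantics being talked about, here is a minimal sketch in
present-day Perl 6 syntax. The SIMD/GPU dispatch above is aspirational, so
all the sketch shows is the user-visible hyper-operation and the
old-fashioned loop that any cleverer backend would merely have to be
equivalent to:

  my @xs = 1..8;
  my @ys = @xs.reverse;

  # User-visible semantics: element-wise addition via the hyper-operator.
  my @sum = @xs >>+<< @ys;            # (9 9 9 9 9 9 9 9)

  # The looping fallback; a CPU-SIMD or GPU-SIMD implementation only has
  # to preserve exactly these semantics.
  my @fallback;
  for ^@xs.elems -> $i {
      @fallback[$i] = @xs[$i] + @ys[$i];
  }
  say @sum eqv @fallback;             # True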

The discussion (which originally started outside of this list) was about
MIMD threading--the two categories above--in order to utilise the multiple
*C*PU cores that are now ubiquitous. For this, Perl 6 does need to sort out
a threading model.

The guts of the discussion has been whether kernel threading (and mutable
shared state) is necessary. The perception is that by using user-threading
(on a single core at a time), you avoid the need for, and complexities of,
locking and synchronisation. And one of the (I believe spurious) arguments
for the use of user-space (MIMD) threading is that such threads are
lightweight, which allows you to run thousands of concurrent threads.

And it does. I've done it with Erlang right here on my dirt-cheap Intel
Core2 Quad Q6600 processor. But, no matter how hard you try, you can never
push the CPU utilisation above 25%, because those 100,000 user-threads all
run in a single OS thread, on a single core.

Re: Re: Parallelism and Concurrency was Re: Ideas for aObject-Belongs-to-Thread (nntp: message 4 of 20) threading model (nntp: message 20 of 20 -lastone!-) (nntp: message 13 of 20)

2010-05-18 Thread Daniel Ruoso
On Tue, 2010-05-18 at 12:58 -0700, Alex Elsayed wrote:
 You are imposing a false dichotomy here. Neither 'green' threads nor kernel
 threads preclude each other. In fact, it can  be convincingly argued that they
 work _best_ when combined. Please look at the GSoC proposal for hybrid
 threading on the Parrot list.

While I agree that there isn't a dichotomy, the point here is more along
the lines of:

 1) Green threads are usually tied to the requirement of serialized
access to data, so you can share all data within the thread without
resorting to locks for every value.

 2) If that requirement is dropped, so that only data explicitly marked
as shared can be seen by both threads, the point of green threads is
moot, since OS threads are always going to perform better than a
manually implemented scheduler (see the sketch below).
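
As a concrete, purely illustrative sketch of point 2, using the thread and
lock primitives Rakudo later provided rather than anything specified at the
time of this discussion: only the one value that is deliberately shared
needs protecting, and the work runs on plain OS threads:

  my $lock    = Lock.new;
  my $counter = 0;                    # the only deliberately shared value
  my @threads = (1..4).map: {
      Thread.start({
          for ^1000 { $lock.protect({ $counter++ }) }
      })
  };
  .finish for @threads;               # join the OS threads
  say $counter;                       # 4000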

My original idea was pointing towards creating a shared memory space that
would be seen by every green thread in the same OS thread, with some
lines drawn to allow OS threading with different memory spaces; message
passing would be used to communicate between two different memory
spaces.
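
A minimal sketch of that message-passing boundary, again using the Channel
and Thread types Rakudo later grew, purely for illustration; the two ends
share nothing except the channel itself:

  my $ch = Channel.new;
  my $worker = Thread.start({
      # receives values until the channel is closed
      for $ch.list -> $msg { say "worker got: $msg" }
  });
  $ch.send($_) for 1..5;
  $ch.close;
  $worker.finish;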

But what we might be getting to here is the point where we don't need
green threads at all... I'm still not sure one way or the other,
though...

daniel



Re: Parallelism and Concurrency was Re: Ideas for aObject-Belongs-to-Thread (nntp: message 4 of 20) threading model (nntp: message 20 of 20 -lastone!-)

2010-05-17 Thread Aaron Sherman
[Note: removed one CCer because the email address was long and complex and
looked like my mail client had hacked up a hairball full of experimental
Perl 6 obfuscation. My apologies if that wasn't actually a mail failure]

On Mon, May 17, 2010 at 3:13 PM, nigelsande...@btconnect.com wrote:


  The important thing is not the number of algorithms: it's the number of
 programs and workloads.


 From that statement, you do not appear to understand the subject matter of
 this thread: Perl 6 concurrency model.


That seems a tad more confrontational than was required. It's also arguably
incorrect. Surveying existing software implementations and code bases is not
precluded simply because we're talking about a new(ish) language.

For CPU-bound processes, there is no benefit in trying to utilise more than
 one thread per core--or hardware thread if your cores have hyper-threading.
 Context switches are expensive, and running hundreds (let alone thousands or
 millions) of threads on 2/4/8/12 core commodity hardware, means that you'll
 spend more time context switching than doing actual work. With the net
 result of less rather than more throughput.


I know that you know what I'm about to say, but I'm going to say it anyway
just so we're standing on the same ground.

When I was in college, I had access to a loosely coupled 20-processor
system. That was considered radical, cutting-edge technology for the fact
that you could treat it like a standard Unix workhorse, and not as some sort
of black hole of computing power with which you could commune via a
front-end (à la Cray). I then worked for a company that was producing an
order-1k-processor system to do the same thing. These were relatively
short-spaced advances in technology.

When a single die shipped, containing 2 cores, I was agape. I'd never
considered that it would happen as soon as it did.

Today we're putting on the order of 10 cores on a die.

I'm really not all that old, and yet the shockingly high-end supercomputing
platforms of my youth are, more or less, being put on a chip.

Perl pre-6 hit its stride about 5-10 years into its lifespan (mid to late
90s). Perl 6 hasn't even shipped yet, and yet your statements appear to be
selecting modern hardware as its target platform, design-wise. I'm not sure
that's entirely (un)wise. Then again, it does simplify the world
tremendously.

I just wanted to get that all out there for thought.

-- 
Aaron Sherman
Email or GTalk: a...@ajs.com
http://www.ajs.com/~ajs