Re: parallelFuture

2009-10-23 Thread dsimcha
== Quote from Andrei Alexandrescu (seewebsiteforem...@erdani.org)'s article
 dsimcha wrote:
  Again, code:
 
  http://dsource.org/projects/scrapple/browser/trunk/parallelFuture/parallelFuture.d
 
  Docs:
 
  http://cis.jhu.edu/~dsimcha/parallelFuture.html
 What license is the library under?
 Andrei

Boost.  I borrowed a few lines of assembler for atomic ops from Tango, though.
These snippets are standard atomic ops code and are probably not copyrightable.
Nevertheless, this section of code is clearly marked, and D2/Phobos should
eventually get a full-fledged atomics lib anyhow.  At that point, these ad-hoc
functions can be replaced with more generic functions and the issue will be
completely resolved.
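
Once a generic atomics module lands (druntime's core.atomic is the obvious
candidate), the replacement could be as thin as this.  Just a sketch of the
idea, not the current parallelFuture code; the helper names are illustrative:

import core.atomic;

// Sketch only: what the ad-hoc helpers could become once a generic
// atomics module is available.
void atomicIncrement(ref shared uint x)
{
    atomicOp!"+="(x, 1);        // generic fetch-and-add
}

bool tryClaim(ref shared uint flag)
{
    return cas(&flag, 0u, 1u);  // compare-and-swap: claim flag if it's 0
}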


Re: parallelFuture

2009-10-23 Thread Christopher Wright

Andrei Alexandrescu wrote:

dsimcha wrote:

Again, code:

http://dsource.org/projects/scrapple/browser/trunk/parallelFuture/parallelFuture.d 



Docs:

http://cis.jhu.edu/~dsimcha/parallelFuture.html


What license is the library under?

Andrei


Boost. I suppose you didn't want to look at the source in case the 
license wasn't sufficiently liberal, but it's easy enough to scan 
through the module's doc comments for a license block, without reading 
the source code.
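
The usual Phobos-style module header makes it easy to spot.  Something along
these lines (not the actual parallelFuture header, just the convention):

/**
 * parallelFuture: high-level parallelization library for D2.
 *
 * Authors:  David Simcha
 * License:  Boost License 1.0
 */
module parallelfuture;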


Re: parallelFuture

2009-10-23 Thread dsimcha
== Quote from Andrei Alexandrescu (seewebsiteforem...@erdani.org)'s article
 Christopher Wright wrote:
  Andrei Alexandrescu wrote:
  dsimcha wrote:
  Again, code:
 
 
http://dsource.org/projects/scrapple/browser/trunk/parallelFuture/parallelFuture.d
 
 
  Docs:
 
  http://cis.jhu.edu/~dsimcha/parallelFuture.html
 
  What license is the library under?
 
  Andrei
 
  Boost. I suppose you didn't want to look at the source in case the
  license wasn't sufficiently liberal, but it's easy enough to scan
  through the module's doc comments for a license block, without reading
  the source code.
 I actually did take a look but couldn't visually find the license block
 at the beginning or end of the code, sorry. Now I see it's a bit below
 the beginning.
 Great - it would be cool if we could adapt this for Phobos. I'm hoping
 Sean (who is fleshing out his messaging API) and David could find ways
 to integrate their work.
 Andrei

If that's the case, someone please point me to some use cases for message
passing so I can understand it better.  parallelFuture was designed mostly with
pure throughput-oriented multithreading of embarrassingly parallel tasks in
mind.  I hadn't given any thought to message passing because I don't encounter
many use cases for it in the type of work I do.
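
To make sure I understand the model at all, is it basically this kind of
thing?  (spawn/send/receive are just my guess at the shape of such an API,
not Sean's actual names, and the module name is assumed.)

import std.concurrency;  // assumed module name; Sean's API may differ
import std.conv, std.stdio;

struct Shutdown {}   // sentinel message telling the worker to stop

// A background "logger" thread: the owner sends it strings to print,
// then a Shutdown message when it's finished.
void loggerLoop()
{
    bool done = false;
    while(!done)
    {
        receive(
            (string msg) { writeln(msg); },
            (Shutdown s) { done = true; }
        );
    }
}

void main()
{
    auto logger = spawn(&loggerLoop);
    foreach(i; 0 .. 10)
    {
        send(logger, "processed item " ~ to!string(i));
    }
    send(logger, Shutdown());
}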


Re: parallelFuture

2009-10-22 Thread zsxxsz
== Quote from dsimcha (dsim...@yahoo.com)'s article
 I've created an alpha release of parallelFuture, a high-level parallelization
 library for D2.  Right now, it has a task pool, futures, parallel foreach, and
 parallel map.
 Here's the (IMHO) coolest example:
 auto pool = new ThreadPool();
 // Assuming we have a function isPrime(), print all
 // prime numbers from 0 to uint.max, testing for primeness
 // in parallel.
 auto myRange = iota(0, uint.max);
 foreach(num; pool.parallel(myRange)) {
     if(isPrime(num)) {
         synchronized writeln(num);
     }
 }
 The interface is there and it seems to work, although it has not been
 extensively stress tested yet.  Some of the implementation details could
 admittedly use some cleaning up, and I would appreciate help from some
 threading gurus on improving my queue (right now it's a naive synchronized
 singly-linked list) and getting condition mutexes to work properly.  (Right
 now, I'm using atomic polling followed by sleeping for 1 millisecond in a lot
 of places.  It's a kludge, but it seems to work reasonably well in practice.)
 The code is at:
 http://dsource.org/projects/scrapple/browser/trunk/parallelFuture/parallelFuture.d
 The docs are at:
 http://cis.jhu.edu/~dsimcha/parallelFuture.html

Very good!
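
One thought on the 1-millisecond polling kludge: druntime's core.sync already
has mutexes and condition variables, so the queue wait could look roughly like
this.  A sketch only, not adapted to your actual linked-list queue:

import core.sync.mutex, core.sync.condition;

alias void delegate() Task;

// Sketch of a condition-variable-based queue wait.  The real queue is a
// singly-linked list; this just shows the wait/notify pattern.
class TaskQueue
{
    private Mutex mtx;
    private Condition cond;
    private Task[] tasks;

    this()
    {
        mtx  = new Mutex;
        cond = new Condition(mtx);
    }

    void push(Task t)
    {
        synchronized(mtx)
        {
            tasks ~= t;
            cond.notify();    // wake one sleeping worker
        }
    }

    Task pop()
    {
        synchronized(mtx)
        {
            while(tasks.length == 0)
            {
                cond.wait();  // releases mtx while blocked, reacquires on wake
            }
            auto t = tasks[0];
            tasks = tasks[1 .. $];
            return t;
        }
    }
}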


Re: parallelFuture

2009-10-22 Thread Lars T. Kyllingstad

dsimcha wrote:

I've created an alpha release of parallelFuture, a high-level parallelization
library for D2.  Right now, it has a task pool, futures, parallel foreach, and
parallel map.

Here's the (IMHO) coolest example:

auto pool = new ThreadPool();

// Assuming we have a function isPrime(), print all
// prime numbers from 0 to uint.max, testing for primeness
// in parallel.
auto myRange = iota(0, uint.max);
foreach(num; pool.parallel(myRange)) {
    if(isPrime(num)) {
        synchronized writeln(num);
    }
}

The interface is there and it seems to work, although it has not been
extensively stress tested yet.  Some of the implementation details could
admittedly use some cleaning up, and I would appreciate help from some
threading gurus on improving my queue (right now it's a naive synchronized
singly-linked list) and getting condition mutexes to work properly.  (Right
now, I'm using atomic polling followed by sleeping for 1 millisecond in a lot
of places.  It's a kludge, but it seems to work reasonably well in practice.)

The code is at:

http://dsource.org/projects/scrapple/browser/trunk/parallelFuture/parallelFuture.d

The docs are at:

http://cis.jhu.edu/~dsimcha/parallelFuture.html



Very nice! Parallelisation for us ordinary folks. :) I tried your 
isPrime example, which was very cool. I can't wait to try the library in 
a real application.
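
For completeness, here is the trial-division isPrime I plugged in.  It's just
my own stand-in; the original post doesn't include one:

// Naive trial-division primality test.  Nothing clever, but it keeps
// every core busy, which is the point of the example.
bool isPrime(uint n)
{
    if(n < 2) return false;
    for(ulong i = 2; i * i <= n; i++)
    {
        if(n % i == 0) return false;
    }
    return true;
}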


I often have a situation where I have a set of grid points, and I do 
(mostly) separate calculations on each grid point. Your library should 
allow me to do calculations on several grid points in parallel, with 
minimal changes to my code.
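
In other words, I'm hoping for something roughly like this, where GridPoint,
loadGrid() and compute() stand in for my own types and per-point calculation:

auto pool = new ThreadPool();

// GridPoint, loadGrid() and compute() are placeholders for my own code.
GridPoint[] grid = loadGrid();
auto results = new double[grid.length];

// Each grid point is independent, so each iteration writes only to its
// own slot of results; no synchronization is needed in the loop body.
foreach(i; pool.parallel(iota(0, grid.length)))
{
    results[i] = compute(grid[i]);
}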


Is there some particular reason why you have capitalised the F in the 
file name, but not in the module name?


-Lars


Re: parallelFuture

2009-10-22 Thread dsimcha
== Quote from Lars T. Kyllingstad (pub...@kyllingen.nospamnet)'s article
 Is there some particular reason why you have capitalised the F in the
 file name, but not in the module name?
 -Lars

Call it the effect of being in hack mode late at night.  I guess the
convention is all lowercase.


Re: parallelFuture

2009-10-22 Thread dsimcha
== Quote from Tim Matthews (tim.matthe...@gmail.com)'s article
 dsimcha wrote:
  I've created an alpha release of parallelFuture, a high-level
  parallelization library for D2.  Right now, it has a task pool, futures,
  parallel foreach, and parallel map.
 
  Here's the (IMHO) coolest example:
 
  auto pool = new ThreadPool();
 
  // Assuming we have a function isPrime(), print all
  // prime numbers from 0 to uint.max, testing for primeness
  // in parallel.
  auto myRange = iota(0, uint.max);
  foreach(num; pool.parallel(myRange)) {
      if(isPrime(num)) {
          synchronized writeln(num);
      }
  }
 
  The interface is there and it seems to work, although it has not been
  extensively stress tested yet.  Some of the implementation details could
  admittedly use some cleaning up, and I would appreciate help from some
  threading gurus on improving my queue (right now it's a naive synchronized
  singly-linked list) and getting condition mutexes to work properly.  (Right
  now, I'm using atomic polling followed by sleeping for 1 millisecond in a
  lot of places.  It's a kludge, but it seems to work reasonably well in
  practice.)
 
  The code is at:
 
  http://dsource.org/projects/scrapple/browser/trunk/parallelFuture/parallelFuture.d
 
  The docs are at:
 
  http://cis.jhu.edu/~dsimcha/parallelFuture.html
 Nice. About the tasks:
 In .NET a worker thread is created for each CPU core. Each worker thread
 has its own local queue of tasks, which it initially retrieves from the
 global queue, but if a task creates new tasks it adds them to its local
 queue directly for less contention. When it finishes a task it takes the
 next one from the back of its local queue to take advantage of the cache
 (like a stack). Tasks at the front of the queue can be stolen by another
 worker thread whose local queue and the global queue are both empty, to
 maximize CPU usage.
 Also, waiting on a task is only considered complete once all of the tasks
 it created are complete too (kind of like recursion).
 What do the implementation or plans look like for this?

For now, parallelFuture was designed with a single producer, multiple worker
model.  Absolutely no attempt was made to allow for tasks running in the task
pool to themselves submit jobs to the same task pool, because it would have
made things more complicated and I couldn't think of any use cases.  I
designed the lib with the types of use cases I encounter in my work in mind
(mathy, pure throughput-oriented computing on large, embarrassingly parallel
problems).  If someone comes up with a compelling use case, though, I'd
certainly consider adding such abilities provided they don't interfere with
performance or API simplicity for the more common cases.

To make this discussion simple, let's define F1 as a future/task submitted by
the main producer thread, and F2 as a task/future submitted by F1.  The queue
is (for now) strictly FIFO, except that if you have a pointer to the
Task/Future object you can steal a job.  When F1 submits F2 to the queue, F2
goes to the back of the queue like anything else.  This means when F1 waits on
F2, it is possible to have a cyclical dependency (F1 waiting on F2, F2 waiting
for a worker thread populated by F1).  This is mitigated by work stealing (F1
may just steal F2 and do it in its own thread).

In parallel map and foreach, I should probably document this, but for now it's
undefined behavior for the mapping function or parallel foreach loop body to
submit jobs to the task pool and wait on them, and in practice will likely
result in deadlocks.
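
To make the mitigation concrete, the "steal it yourself" wait looks
conceptually like this.  This is a sketch, not the actual parallelFuture code,
and waitOrSteal is a made-up name:

// Conceptual sketch of "wait on a task, but steal it if it hasn't
// started yet", which is what keeps F1-waits-on-F2 from deadlocking.
enum TaskStatus { notStarted, running, done }

class SketchTask
{
    TaskStatus status = TaskStatus.notStarted;
    void delegate() work;

    void waitOrSteal()   // hypothetical name, for illustration only
    {
        bool doItMyself = false;
        synchronized(this)
        {
            if(status == TaskStatus.notStarted)
            {
                status = TaskStatus.running;  // claim the job: steal it
                doItMyself = true;
            }
        }

        if(doItMyself)
        {
            work();
            synchronized(this) { status = TaskStatus.done; }
            return;
        }

        // Some worker already has it, so wait for it to finish.  (In the
        // real code this is the atomic polling/sleeping kludge mentioned
        // in the original post, not a raw spin on a plain field.)
        while(status != TaskStatus.done) {}
    }
}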


Re: parallelFuture

2009-10-22 Thread dsimcha
== Quote from Charles Hixson (charleshi...@earthlink.net)'s article
 dsimcha wrote:
  I've created an alpha release of parallelFuture, a high-level
  parallelization library for D2.  Right now, it has a task pool, futures,
  parallel foreach, and parallel map.
 
  Here's the (IMHO) coolest example:
 
  auto pool = new ThreadPool();
 
  // Assuming we have a function isPrime(), print all
  // prime numbers from 0 to uint.max, testing for primeness
  // in parallel.
  auto myRange = iota(0, uint.max);
  foreach(num; pool.parallel(myRange)) {
      if(isPrime(num)) {
          synchronized writeln(num);
      }
  }
 
  The interface is there and it seems to work, although it has not been
  extensively stress tested yet.  Some of the implementation details could
  admittedly use some cleaning up, and I would appreciate help from some
  threading gurus on improving my queue (right now it's a naive synchronized
  singly-linked list) and getting condition mutexes to work properly.  (Right
  now, I'm using atomic polling followed by sleeping for 1 millisecond in a
  lot of places.  It's a kludge, but it seems to work reasonably well in
  practice.)
 
  The code is at:
 
  http://dsource.org/projects/scrapple/browser/trunk/parallelFuture/parallelFuture.d
 
  The docs are at:
 
  http://cis.jhu.edu/~dsimcha/parallelFuture.html
 If you can easily do it, it would be nice to be able to have the threads
 able to communicate with each other.  Something along the lines of both
 broadcast messages and 1:1 exchanges.  A very rough sketch of a possible use:

 Task[] tasks = Task.who_is_waiting;
 foreach (task; tasks)
 {
     if (task.process(something))
     {
         task.markDone(true);
     }
 }

 I see "something" as being an Object that would need to implement an
 interface to carry some identifying information, so when a task received it,
 it could determine what to do with it (with refusing to process it being an
 option).  I think the rest is pretty clear.
 OTOH, this comment was just to demonstrate the kind of communication between
 tasks that I'm talking about: static methods for broadcast communication and
 instance methods for 1:1 communication, with safeties, so that when a task
 completes before it gets your message, you aren't left talking to a null
 pointer.  Cleanup would need to be managed by the original thread that
 created the tasks.  It would need to be able to broadcast a "now closing
 task xxx" signal to all threads.  Threads maintaining a list of tasks would
 need to scan through them and remove any references to that task.
 Maybe this is getting too complicated, but it would certainly be useful.  In
 the past, interthread communication has been the main problem that's kept me
 from using threads.

I'll think about this and see if I can work something like this in, but I need
real-world use cases.  I do bioinformatics work, which basically means
large-scale data mining, embarrassingly parallel problems and very CPU-bound
work.  The use cases I had in mind were mostly of the "use every core I have
to do something embarrassingly parallel" kind.  The goal was to make these use
cases as dead simple as possible.

Whatever use cases you have in mind, I'm apparently not familiar with them.  If
I'm to improve this lib to handle use cases other than the pure
throughput-oriented parallelization of embarrassingly parallel tasks that I had
in mind, I need to understand use cases from other fields.


Re: parallelFuture

2009-10-22 Thread Tim Matthews

dsimcha wrote:


For now, parallelFuture was designed with a single producer, multiple worker
model.  Absolutely no attempt was made to allow for tasks running in the task
pool to themselves submit jobs to the same task pool, because it would have
made things more complicated and I couldn't think of any use cases.


Like recursion, where a function needs to call another function: a thread
needs to be able to spawn threads, and a task should be able to create new
tasks. Making newly spawned tasks stay on the same thread is a good
optimization. This shouldn't need a specific use case.



I designed the lib with the types of use cases I encounter in my work in mind
(mathy, pure throughput-oriented computing on large, embarrassingly parallel
problems).  If someone comes up with a compelling use case, though, I'd
certainly consider adding such abilities provided they don't interfere with
performance or API simplicity for the more common cases.

To make this discussion simple, let's define F1 as a future/task submitted by
the main producer thread, and F2 as a task/future submitted by F1.  The queue
is (for now) strictly FIFO, except that if you have a pointer to the
Task/Future object you can steal a job.  When F1 submits F2 to the queue, F2
goes to the back of the queue like anything else.  This means when F1 waits on
F2, it is possible to have a cyclical dependency (F1 waiting on F2, F2 waiting
for a worker thread populated by F1).  This is mitigated by work stealing (F1
may just steal F2 and do it in its own thread).


I don't like that ^ way of keeping the discussion simple, with F1 and F2 all
over the place. I hope this video can help visualize some ideas:
http://channel9.msdn.com/pdc2008/TL26/




In parallel map and foreach, I should probably document this, but for now it's
undefined behavior for the mapping function or parallel foreach loop body to
submit jobs to the task pool and wait on them, and in practice will likely
result in deadlocks.


You want to document that as undefined behavior? It can be made to work.


Re: parallelFuture

2009-10-22 Thread dsimcha
== Quote from Tim Matthews (tim.matthe...@gmail.com)'s article
 dsimcha wrote:
  For now, parallelFuture was designed with a single producer, multiple worker
  model.  Absolutely no attempt was made to allow for tasks running in the
  task pool to themselves submit jobs to the same task pool, because it would
  have made things more complicated and I couldn't think of any use cases.
 Like recursion, where a function needs to call another function: a thread
 needs to be able to spawn threads, and a task should be able to create new
 tasks. Making newly spawned tasks stay on the same thread is a good
 optimization. This shouldn't need a specific use case.
  I designed the lib with the types of use cases I encounter in my work in
  mind (mathy, pure throughput-oriented computing on large, embarrassingly
  parallel problems).  If someone comes up with a compelling use case, though,
  I'd certainly consider adding such abilities provided they don't interfere
  with performance or API simplicity for the more common cases.
 
  To make this discussion simple, let's define F1 as a future/task submitted
  by the main producer thread, and F2 as a task/future submitted by F1.  The
  queue is (for now) strictly FIFO, except that if you have a pointer to the
  Task/Future object you can steal a job.  When F1 submits F2 to the queue, F2
  goes to the back of the queue like anything else.  This means when F1 waits
  on F2, it is possible to have a cyclical dependency (F1 waiting on F2, F2
  waiting for a worker thread populated by F1).  This is mitigated by work
  stealing (F1 may just steal F2 and do it in its own thread).
 I don't like that ^ way of keeping the discussion simple, with F1 and F2 all
 over the place. I hope this video can help visualize some ideas:
 http://channel9.msdn.com/pdc2008/TL26/
 
  In parallel map and foreach, I should probably document this, but for now
  it's undefined behavior for the mapping function or parallel foreach loop
  body to submit jobs to the task pool and wait on them, and in practice will
  likely result in deadlocks.
 You want to document that as undefined behavior? It can be made to work.

Ok, I thought of my use case and an easy fix.  It will probably be fixed soon.


Re: parallelFuture

2009-10-21 Thread Tim Matthews

dsimcha wrote:

I've created an alpha release of parallelFuture, a high-level parallelization
library for D2.  Right now, it has a task pool, futures, parallel foreach, and
parallel map.

Here's the (IMHO) coolest example:

auto pool = new ThreadPool();

// Assuming we have a function isPrime(), print all
// prime numbers from 0 to uint.max, testing for primeness
// in parallel.
auto myRange = iota(0, uint.max);
foreach(num; pool.parallel(myRange)) {
    if(isPrime(num)) {
        synchronized writeln(num);
    }
}

The interface is there and it seems to work, although it has not been
extensively stress tested yet.  Some of the implementation details could
admittedly use some cleaning up, and I would appreciate help from some
threading gurus on improving my queue (right now it's a naive synchronized
singly-linked list) and getting condition mutexes to work properly.  (Right
now, I'm using atomic polling followed by sleeping for 1 millisecond in a lot
of places.  It's a kludge, but it seems to work reasonably well in practice.)

The code is at:

http://dsource.org/projects/scrapple/browser/trunk/parallelFuture/parallelFuture.d

The docs are at:

http://cis.jhu.edu/~dsimcha/parallelFuture.html


Nice. About the tasks:

In .NET a worker thread is created for each CPU core. Each worker thread
has its own local queue of tasks, which it initially retrieves from the
global queue, but if a task creates new tasks it adds them to its local
queue directly for less contention. When it finishes a task it takes the
next one from the back of its local queue to take advantage of the cache
(like a stack). Tasks at the front of the queue can be stolen by another
worker thread whose local queue and the global queue are both empty, to
maximize CPU usage.


Also, waiting on a task is only considered complete once all of the tasks it
created are complete too (kind of like recursion).


What do the implementation or plans look like for this?
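
For reference, the per-worker queue policy I'm describing boils down to
something like this.  It's a sketch of the policy only; the real .NET
implementation uses a lock-free deque rather than one big lock:

alias void delegate() Job;

// Per-worker deque: the owning thread pushes and pops at the back,
// idle workers steal from the front.  (Not .NET's or parallelFuture's
// actual code.)
class WorkStealingQueue
{
    private Job[] jobs;

    // Owner thread: newly created tasks go on the back...
    void push(Job j)
    {
        synchronized(this) { jobs ~= j; }
    }

    // ...and are taken from the back first (LIFO, cache-friendly).
    Job pop()
    {
        synchronized(this)
        {
            if(jobs.length == 0) return null;
            auto j = jobs[$ - 1];
            jobs = jobs[0 .. $ - 1];
            return j;
        }
    }

    // Workers whose own queue and the global queue are both empty steal
    // the oldest job from the front to keep every core busy.
    Job steal()
    {
        synchronized(this)
        {
            if(jobs.length == 0) return null;
            auto j = jobs[0];
            jobs = jobs[1 .. $];
            return j;
        }
    }
}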