On 9/10/2010 8:05 PM, Michel Fortin wrote:
> On 2010-09-10, at 17:13, David Simcha wrote:
>
>> As far as I can tell, your needs might be better served by std.concurrency.
>
> From what I can see, your parallel foreach is basically some syntactic sugar
> for queuing tasks inside a loop and then blocking until the result is ready.
> While I'll admit I'm not sure I need that sugar or the blocking wait for the
> result, queuing tasks in a loop is certainly something I need.
It's slightly more complicated than that under the hood because:

1. If your range has a huge amount of stuff, you want to lazily add it to the queue, not add it all upfront. Parallel foreach does some magic under the hood so that you can parallel foreach over a range of size N in O(1) memory even if you want small work units. Modulo the workaround for a Linux-specific compiler bug, parallel foreach doesn't even heap allocate.

2. Parallel foreach works with non-random-access ranges by buffering small work units of data in an array.
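To make point 1 concrete, here's a minimal sketch of the usage I have in mind, using the `parallel` helper and work-unit-size parameter from my current std.parallelism draft (exact names may still change):

```d
import std.math : sqrt;
import std.parallelism : parallel;
import std.range : iota;

double[] parSqrt(size_t n)
{
    auto results = new double[n];

    // Workers pull indices in work units of 100 at a time; the range is
    // consumed lazily, so memory overhead stays O(1) in the range length
    // instead of queuing all n iterations upfront.
    foreach (i; parallel(iota(n), 100))
        results[i] = sqrt(cast(double) i);

    return results;
}

void main()
{
    assert(parSqrt(10_000)[81] == 9.0);
}
```

The small work-unit size is the interesting case: it's where eagerly queuing one task per element would hurt most, and where the lazy buffering pays off.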

> With my app I can easily have 1000 of these tasks queued at a given time (I
> effectively have a couple of loops that can add tasks to a queue). They mostly
> read and parse files to extract some pieces of data. At the API level,
> std.concurrency looks like it could do that, except it'd be creating one thread
> for each task. I don't want to create one thread for each task, so I need some
> sort of task queue and a thread pool.
>
> But maybe you're right, and maybe the thread pool should go in std.concurrency,
> where creating and queuing a task could work like spawning a thread, perhaps
> like this:

>         // send task to a specific thread to be executed there
>         tid.perform(&taskFunc, "hello world");
>
>         // queue task for execution in a thread pool
>         tpool.dispatch(&taskFunc, "hello world");
>
> Those two things I'd find quite useful. And it'd be pretty much trivial to
> build a parallel foreach on top of this.

This is getting me thinking. I had given up on making most of std.parallelism safe. Parallel foreach is the hardest thing to make safe and, for me personally, the most useful part of std.parallelism. I wonder, though, if I can make Task @safe/@trusted provided that:

1. The input arguments are either indirection-free, immutable, or shared.

2. The callable is a function pointer, not a delegate, an alias, or a class/struct with an overloaded opCall.

3. The return type is either indirection-free, immutable, or shared. (This is, unfortunately, necessary because the worker thread could in theory hold onto a reference to it in TLS after returning, even though doing so would be thoroughly idiotic in most cases.)

I'm thinking I may add a safeTask() function that is marked @trusted and creates a Task object iff these constraints are satisfied (and fails to compile otherwise). I think the only sane way to do this is to have a separate safe function for creating tasks in addition to the more lenient "here be dragons" one. The only major thing I don't like about this is the idea of sprinkling a few safe functions into a mostly "here be dragons" module. It seems like it would complicate code reviews.
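The constraint check itself could look something like the following. This is only a hypothetical sketch: the safeTask name and the exact traits are assumptions, not a final API, and the real version would construct a Task rather than call inline. The point is that all three rules above are checkable at compile time.

```d
import std.traits : hasUnsharedAliasing, isFunctionPointer, ReturnType;

// Hypothetical: @trusted factory that only compiles when no unshared
// data can leak across the thread boundary.
@trusted auto safeTask(F, Args...)(F fn, Args args)
    if (isFunctionPointer!F                      // rule 2: function pointer only
        && !hasUnsharedAliasing!Args             // rule 1: args indirection-free,
                                                 //         immutable, or shared
        && !hasUnsharedAliasing!(ReturnType!F))  // rule 3: same for the return type
{
    // Stand-in body: just run the callable so the constraint can be shown.
    return fn(args);
}

void main()
{
    static int twice(int x) { return 2 * x; }
    assert(safeTask(&twice, 21) == 42);

    // A mutable pointer argument is rejected at compile time:
    static int deref(int* p) { return *p; }
    static assert(!__traits(compiles, safeTask(&deref, new int)));
}
```

With the check expressed as a template constraint, the unsafe "here be dragons" task() and the safe factory can share the same implementation underneath.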

> And just to add weight to the argument that task-based concurrency is used
> pretty much everywhere: I worked before on some industrial software that had
> this too. It basically had to perform some analysis every time new data came
> in, in real-time. A new task was created for each piece of data and dispatched
> to a thread pool, then a few seconds later the result was sent to another
> thread that'd take some action based on the analysis.

Glad to hear that this might be useful outside scientific computing.
_______________________________________________
phobos mailing list
[email protected]
http://lists.puremagic.com/mailman/listinfo/phobos
