On 9/10/2010 8:05 PM, Michel Fortin wrote:
> On 2010-09-10 at 17:13, David Simcha wrote:
>> As far as I can tell, your needs might be better served by std.concurrency.
>
> From what I can see, your parallel foreach is basically some syntactic sugar
> for queuing tasks inside a loop and then blocking until the result is ready.
> While I'll admit I'm not sure I need that sugar or to block waiting for the
> result, queuing tasks in a loop is certainly something I need.

It's slightly more complicated than that under the hood because:
1. If your range has a huge amount of stuff, you want to lazily add it
to the queue, not add it all upfront. Parallel foreach does some magic
under the hood so that you can parallel foreach over a range of size N
in O(1) memory even if you want small work units. Modulo the workaround
for a Linux-specific compiler bug, parallel foreach doesn't even heap
allocate.
2. The parallel foreach works with non-random access ranges by buffering
the data for small work units in an array.
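
[Editor's sketch] To illustrate the two points above, here is a minimal example of a parallel foreach over a range that is not random access. It assumes the std.parallelism API as it later shipped in Phobos (`taskPool.parallel(range, workUnitSize)`); the draft discussed in this thread may have differed. The function name `sumEvensInParallel` is hypothetical.

```d
import core.atomic : atomicLoad, atomicOp;
import std.algorithm.iteration : filter;
import std.parallelism : taskPool;
import std.range : iota;

// Hypothetical helper: sums the even numbers in [1, upTo] in parallel.
long sumEvensInParallel(int upTo, size_t workUnitSize)
{
    // filter yields a forward range, not a random access one, so the
    // pool has to buffer elements into work units as it walks the range.
    auto evens = iota(1, upTo + 1).filter!(n => n % 2 == 0);

    shared long total;
    foreach (n; taskPool.parallel(evens, workUnitSize))
    {
        // The loop body runs on several threads; update the sum atomically.
        atomicOp!"+="(total, n);
    }
    return atomicLoad(total);
}

void main()
{
    // 2 + 4 + ... + 1000 = 250_500
    assert(sumEvensInParallel(1000, 10) == 250_500);
}
```

The range is consumed lazily, one batch of work units at a time, so it is never materialized in full.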

> With my app I can easily have 1000 of these tasks queued at a given time (I
> effectively have a couple of loops that can add tasks to a queue). They mostly
> read and parse files to extract some pieces of data. At the API level,
> std.concurrency looks like it could do that, except it'd be creating one thread
> for each task. I don't want to create one thread for each task, so I need some
> sort of task queue and a thread pool.
>
> But maybe you're right, and maybe the thread pool should go in std.concurrency
> where creating and queuing a task could work like spawning a thread, perhaps
> like this:
>
>     // send task to a specific thread to be executed there
>     tid.perform(&taskFunc, "hello world");
>
>     // queue task for execution in a thread pool
>     tpool.dispatch(&taskFunc, "hello world");
>
> Those two things I'd find quite useful. And it'd be pretty much trivial to
> build a parallel foreach on top of this.

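
[Editor's sketch] The `tpool.dispatch` call above is a proposal, not an existing API. For comparison, queuing many small tasks on a shared pool can be sketched with `task` and `taskPool.put` from std.parallelism as it later shipped in Phobos, which may not match the draft under discussion. `countChars` and `dispatchMany` are hypothetical names standing in for the file-parsing work described above.

```d
import std.parallelism : task, taskPool;

// Hypothetical stand-in for "read and parse a file".
size_t countChars(string s)
{
    return s.length;
}

// Queues `count` tasks on the shared pool, then collects their results.
size_t dispatchMany(size_t count, string payload)
{
    // task(&fn, args) heap-allocates a Task and returns a pointer to it.
    typeof(task(&countChars, ""))[] pending;

    foreach (i; 0 .. count)
    {
        auto t = task(&countChars, payload);
        taskPool.put(t); // queued; a pool thread picks it up
        pending ~= t;
    }

    size_t total;
    foreach (t; pending)
        total += t.yieldForce; // block until this task has finished
    return total;
}

void main()
{
    // 1000 tasks, each returning the length of "hello world" (11).
    assert(dispatchMany(1000, "hello world") == 11_000);
}
```

The number of threads is fixed by the pool, so queuing 1000 tasks does not create 1000 threads.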
This is getting me thinking. I've given up making most of
std.parallelism safe. Parallel foreach is the hardest thing to make
safe, and for me personally the most useful part of std.parallelism. I
wonder, though, if I can make Task @safe/@trusted provided:
1. The input args are either indirection-free, immutable, or shared.
2. The callable is a function pointer, not a delegate, alias or class
with overloaded opCall.
3. The return type is either indirection-free, immutable or shared.
(This is, unfortunately, necessary b/c the worker thread could in theory
hold onto a reference to it in TLS after returning, even though doing so
would be thoroughly idiotic in most cases.)
I'm thinking I may add a safeTask() function that is marked @trusted,
and creates a Task object iff these constraints are satisfied (and
otherwise doesn't compile). I think the only sane way to do this is to
have a separate safe function for creating tasks in addition to the more
lenient "here be dragons" one. The only major thing I don't like about
this is the idea of sprinkling a few safe functions in a mostly "here be
dragons" module. It seems like it would complicate code reviews.
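
[Editor's sketch] The three constraints above could be expressed as a template constraint along the following lines. This is only an illustration: `isSafeTaskCandidate` and `safeTaskSketch` are hypothetical names, and using `hasUnsharedAliasing` from std.traits to approximate "indirection-free, immutable, or shared" is an assumption, not the author's design.

```d
import std.parallelism : task, taskPool;
import std.traits : hasUnsharedAliasing, isFunctionPointer, ReturnType;

// Hypothetical predicate covering conditions 1-3:
// function pointer only, no unshared mutable aliasing in args or result.
enum bool isSafeTaskCandidate(F, Args...) =
    isFunctionPointer!F
    && !hasUnsharedAliasing!Args
    && !hasUnsharedAliasing!(ReturnType!F);

// Hypothetical wrapper: compiles only when the predicate holds.
@trusted auto safeTaskSketch(F, Args...)(F fun, Args args)
    if (isSafeTaskCandidate!(F, Args) && is(typeof(fun(args))))
{
    return task(fun, args);
}

int square(int x) { return x * x; }
int[] makeArray(int n) { return new int[](n); }
size_t len(string s) { return s.length; }

// Indirection-free argument and result: accepted.
static assert(isSafeTaskCandidate!(typeof(&square), int));
// string is immutable(char)[], so it counts as immutable: accepted.
static assert(isSafeTaskCandidate!(typeof(&len), string));
// Mutable array result: rejected.
static assert(!isSafeTaskCandidate!(typeof(&makeArray), int));
// Delegates are not function pointers: rejected.
static assert(!isSafeTaskCandidate!(int delegate(int), int));

void main()
{
    auto t = safeTaskSketch(&square, 7);
    taskPool.put(t);
    assert(t.yieldForce == 49);

    // The lenient task() would accept this; the checked wrapper does not.
    static assert(!__traits(compiles, safeTaskSketch(&makeArray, 3)));
}
```

Keeping the check in a separate entry point mirrors the idea of a strict function alongside the lenient one.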

> And just to add weight to the argument that task based concurrency is used
> pretty much everywhere: I worked before on some industrial software that had
> this too. It basically had to perform some analysis every time new data came
> in, in real-time. A new task was created for each piece of data and dispatched
> to a thread pool, then a few seconds later the result was sent to another
> thread that'd take some action based on the analysis.

Glad to hear that this might be useful outside scientific computing.