As a newcomer to Rust, it seems to me that expressing an iterator using a port, a channel and a task would be a simplification: it would re-use existing parts of the compiler instead of making iterators a special case. The iterator code (notionally) runs as a task and passes values back over the channel, and the foreach loop pulls those values in one by one from the port. (The task creation could be hidden behind the 'iter' calling syntax.)
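As a rough sketch of what I mean -- using std::thread and std::sync::mpsc purely as stand-ins, since I don't know what the real task/port/channel syntax will end up looking like -- the idea is something like:

    use std::sync::mpsc;
    use std::thread;

    // The "iterator" runs as its own task and sends each value back
    // over a channel; the consumer pulls them from the receiving end.
    fn range_iter(lo: u32, hi: u32) -> mpsc::Receiver<u32> {
        let (tx, rx) = mpsc::channel();
        thread::spawn(move || {
            for i in lo..hi {
                // Stop quietly if the consuming loop has gone away.
                if tx.send(i).is_err() {
                    break;
                }
            }
        });
        rx
    }

    fn main() {
        // The foreach-style loop pulls values one by one from the "port".
        for v in range_iter(0, 10) {
            println!("{}", v);
        }
    }

Spelled out like this it is obviously far too heavyweight for a simple counting loop, which is exactly why the compiler would have to be able to optimise the task and channel away, as discussed below.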
However, this would force the compiler to do quite a lot of optimisation if it is to work efficiently (e.g. as efficiently as a conventional for loop). It still makes sense to consider, though, because there are other uses of ports/channels/tasks which aren't iterators but which could also benefit from this optimisation. I have thought for a while about tasklet-style implementations (Erlang/Go/whatever), and the big problem I still don't have a clear answer to is how to schedule and optimise all those tasks. Hopefully Rust has some kind of rough plan for dealing with this? It seems to me there are several options:

- Statically optimise the task to run tightly coupled alongside the spawning task, like the iterator example above.

- Run the task within the same thread, but independent and managed by the runtime scheduler. (Message passing is cheaper inside one thread.)

- Run the task on this thread or possibly another thread, again managed by the runtime scheduler. (But this needs inter-thread synchronisation.)

I see the whole task/port/channel thing as just notation, and how it is translated is up to the compiler. It doesn't actually have to run as an independent task; it could be optimised down to inline code. The minimum implementation of a called task -- with one outgoing channel, one calling task and only local (scoped) use of the called task -- could be a single pointer variable for the channel buffer, a nested stack for the called task allocated on the caller's stack, and switching between the two tasks as coroutines. This would take care of the iterator.

I think that if tasks, channels etc. can't be optimised down like this, they're not going to be fast enough for widespread use in algorithms, and programmers will have to ask themselves at each point whether they are a good choice or not.

For example, let's say I have a loop from 1 to 100,000, and there is a small chunk of work I need to do for each value. The obvious approach is to spawn a task for each of these 100,000 values in order to make use of all my cores. How is Rust going to deal with this?

- Allocate 100,000 stack frames for all the tasks in a global pool, and work through them in parallel on N cores? (Memory explosion.)

- Allow the loop to spawn only 100 or so, and wait for those to finish executing before letting more be spawned?

- Inline the sub-task execution into the loop, and then fork the loop over my N cores (for example) so that 100,000/N of those tasks are run on each core. This would be very hard for a compiler to do, but it is what I might implement if I was hand-coding it.

- Or: recommend that the programmer doesn't create so many tasks, and splits the workload over at most 100 or so tasks, to ensure that the inefficiency of scheduling so many tasks doesn't overwhelm the processing time.

The final option is like giving up, though -- saying that the task scheduling/optimisation problem can't be solved by the compiler. It would be better if the compiler could take full charge of this problem, because if you ask the programmer to optimise this, they are not likely to do a good job.

For example: say my top-level task A spawns 10 parallel B tasks, each of which spawns 10 C tasks, each of which spawns 10 D tasks. In total we have potentially 1000+ concurrent tasks. The level C tasks don't know that there are 100 of them already running in parallel, already saturating all the cores. Each of their 10 D tasks could be run inline (tightly coupled) very much more efficiently than by adding more tasks to the pool.
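To make that inline-versus-spawn trade-off concrete, here is a rough sketch of what a level C task would have to do, again with std::thread standing in for tasks. The scheduler_is_saturated() check is purely hypothetical -- in the scheme I'm suggesting, that decision would belong to the compiler/runtime rather than to user code like this:

    use std::thread;

    // Purely hypothetical load check -- stands in for a query that only
    // the runtime scheduler could really answer.
    fn scheduler_is_saturated() -> bool {
        true // hard-coded for the sketch
    }

    // Stand-in for a small chunk of "level D" work.
    fn level_d(x: u64) -> u64 {
        x * x
    }

    // A "level C" task deciding how to run its 10 "level D" tasks.
    fn run_level_c(work: Vec<u64>) -> u64 {
        if scheduler_is_saturated() {
            // Cores already busy: run the D work inline, tightly coupled.
            work.iter().map(|&x| level_d(x)).sum()
        } else {
            // Cores idle: spawn a task (here, a thread) per D item.
            let handles: Vec<_> = work
                .into_iter()
                .map(|x| thread::spawn(move || level_d(x)))
                .collect();
            handles.into_iter().map(|h| h.join().unwrap()).sum()
        }
    }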
On the other hand, if there were only one level C task running, then spawning 10 D tasks might be a good idea to make use of parallelism across the cores. This is an example the programmer can't optimise for us, because the programmer doesn't know in what context task C will be called. Only the runtime knows.

So should there be two compiled versions of each task, one that inlines everything possible and another that spawns as many tasks as possible? (Or maybe a single version with more fine-grained switches.) If so, the version to run could be selected by the runtime according to how saturated the cores are with existing work.

What kind of plan does Rust have regarding these questions? This seems like the hardest part of the design to me. Reading the Wiki, it looks as though the task/port/channel-related stuff was being moved out to a library, which would mean the compiler can't optimise it -- or was that just the old design?

Maybe I am imagining a level of optimisation and abstraction of tasks far beyond what is realistically feasible, though. Maybe this really is a problem that has to be pushed back to the programmer. Maybe we can say: tasks are too expensive for implementing an iterator, but cheap compared to something that will take 1ms or more of processing time, so the programmer has to judge that for themselves. Is that the approach?

Jim

P.S. I'm interested in Rust because the design decisions make a lot of sense to me. However, having an ASM/Forth/C/Java background, the ML-inspired parts of Rust are giving me trouble, so I think I'm going to learn OCaml first to get a better grounding in that before going much further.

-- 
 Jim Peters                  (_)/=\~/_(_)                 [email protected]
                          (_)  /=\  ~/_  (_)
 Uazú                    (_)    /=\    ~/_    (_)                http://
 in Peru             (_) ____ /=\ ____ ~/_ ____ (_)             uazu.net
