As a newcomer to Rust, it seems to me that it would be a
simplification to express an iterator using a port, channel and task,
to re-use existing parts of the compiler instead of making an iterator
a special case.  So the iterator code (notionally) runs as a task and
passes back values over the channel, and the foreach loop pulls in
those values one by one from the port.  (The task creation could be
hidden behind the 'iter' calling syntax.)
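
To make the idea concrete, here is a rough sketch in today's Rust terms, using an OS thread and std::sync::mpsc as stand-ins for the task and port/channel (counting_iter is a name I've invented for illustration):

```rust
use std::sync::mpsc;
use std::thread;

// The "iterator body" runs as its own task (here: a thread) and
// passes values back over a channel; the consumer pulls them from
// the receiving end one by one, just like a foreach loop would.
fn counting_iter(n: u32) -> impl Iterator<Item = u32> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        for i in 0..n {
            // If the consumer stops early and drops the receiver,
            // just stop producing.
            if tx.send(i).is_err() {
                break;
            }
        }
    });
    rx.into_iter()
}

fn main() {
    let collected: Vec<u32> = counting_iter(5).collect();
    assert_eq!(collected, vec![0, 1, 2, 3, 4]);
}
```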

However, this would force the compiler to do quite a lot of
optimisation if it is to work efficiently (e.g. as efficiently as a
conventional for loop).

But it makes sense to consider this, because there are other uses of
ports/channels/tasks which aren't iterators but which could also
benefit from this optimisation.

I have thought for a while about tasklet style implementations
(Erlang/Go/whatever), and the big problem I still don't have a clear
answer to is about scheduling and optimising all those tasks.
Hopefully Rust has some kind of a rough plan for dealing with this?

It seems to me there are several options:

- Statically optimise the task to run tightly-coupled alongside the
  spawning task, like the iterator example above.

- Run the task within the same thread, but independent and managed by
  the runtime scheduler.  (Message passing is cheaper inside one
  thread.)

- Run the task on this thread or possibly another thread, again
  managed by the runtime scheduler.  (But needs inter-thread
  synchronization.)

I see the whole task/port/channel thing as just notation, and how that
is translated is up to the compiler.  It doesn't actually have to run
as an independent task; it could be optimised down to inline code.

The minimum implementation of a called task -- with one outgoing
channel, one calling task and only local (scoped) use of the called
task -- could be a single pointer variable for the channel buffer, a
nested stack for the called task allocated on the caller's stack, and
switching between the two tasks as coroutines.  This would take care
of the iterator.
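
For comparison, here is a sketch (again in current Rust) of what the channel version above could optimise down to once the coroutine is fully inlined: the channel buffer shrinks to one captured variable, and the task switch becomes an ordinary function call:

```rust
// What the task/channel notation could compile down to: no thread,
// no channel -- the "channel buffer" is the single captured variable
// `next`, and each "task switch" is just a closure call.
fn counting_iter_inline(n: u32) -> impl Iterator<Item = u32> {
    let mut next = 0;
    std::iter::from_fn(move || {
        if next < n {
            let v = next;
            next += 1;
            Some(v)
        } else {
            None
        }
    })
}

fn main() {
    // Sum of 0..100 is 4950.
    let total: u32 = counting_iter_inline(100).sum();
    assert_eq!(total, 4950);
}
```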

I think if tasks, channels, etc. can't be optimised down like this,
they're not going to be fast enough for widespread use in algorithms,
and programmers will have to ask themselves at each point whether they
are a good choice or not.

For example, let's say I have a loop from 1 to 100,000, and there is a
small chunk of work I need to do for each value.  The obvious approach
is to spawn a task for each of these 100,000 values in order to make
use of all my cores.  How is Rust going to deal with this?

- Allocate 100,000 stack frames for all the tasks in a global pool,
  and work through them in parallel on N cores?  (Memory explosion)

- Allow the loop to only spawn 100 or so, and wait for those to finish
  executing before letting more be spawned?

- Inline the sub-task execution into the loop, and then fork the loop
  over my N cores (for example) so that 100,000/N of those tasks are
  run on each core.  This would be very hard for a compiler to do, but
  it is what I might implement if I was hand-coding it.

- Or: recommend to the programmer that they don't create so many
  tasks, and split their work-load over max 100 or so tasks to ensure
  that the inefficiencies of scheduling so many tasks don't
  overwhelm the processing time.
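
A hedged sketch of the third option -- the one I might hand-code -- using plain OS threads (small_chunk_of_work and parallel_sum are invented names; the per-item work is a stand-in):

```rust
use std::thread;

// Stand-in for the small chunk of work done per loop value.
fn small_chunk_of_work(i: u64) -> u64 {
    i * i % 7
}

// Inline the per-item "sub-task" into the loop body, then fork the
// *loop* over `workers` threads, so each thread handles roughly
// n / workers of the items (here: a strided partition).
fn parallel_sum(n: u64, workers: u64) -> u64 {
    thread::scope(|s| {
        let handles: Vec<_> = (0..workers)
            .map(|w| {
                s.spawn(move || {
                    (w..n)
                        .step_by(workers as usize)
                        .map(small_chunk_of_work)
                        .sum::<u64>()
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}

fn main() {
    let serial: u64 = (0..100_000).map(small_chunk_of_work).sum();
    assert_eq!(parallel_sum(100_000, 4), serial);
}
```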

The final option is like giving up, though, saying that the task
scheduling/optimisation problem can't be solved by the compiler.  It
would be better if the compiler could take full charge of this
problem, because if you ask the programmer to optimise this, they are
not likely to do a good job.  For example:

Say my top-level task A spawns 10 parallel B tasks, each of which
spawns 10 C tasks, each of which spawns 10 D tasks.  In total we have
potentially 1000+ concurrent tasks.  The level C tasks don't know that
there are 100 of them already running in parallel, already saturating
all the cores.  Their 10 D tasks each could be run inline (tightly
coupled) very much more efficiently than adding more tasks to the
pool.  However, if there were only one level C task running, then
spawning 10 D tasks might be a good idea to make use of parallelism
across the cores.

This is an example where the programmer can't optimise this for us,
because the programmer doesn't know in what context task C will be
called.  Only the runtime knows.

So should there be two compiled versions of each task, one that
inlines everything possible, and the other that spawns as many tasks
as possible?  (Or maybe a single version with more fine-grained
switches.)  If so, the version to run could be selected by the runtime
according to how saturated the cores are with existing work.
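
As a rough illustration of that runtime switch, assuming the runtime keeps a global count of in-flight tasks (maybe_spawn, ACTIVE and max_parallel are all hypothetical names, and real schedulers track load more carefully than this):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// Hypothetical global count of tasks currently in flight.
static ACTIVE: AtomicUsize = AtomicUsize::new(0);

// If the cores aren't saturated, spawn the task body as a real task
// (here: a thread); otherwise run it inline, "tightly coupled", in
// the caller.  Returns the handle only when a task was spawned.
fn maybe_spawn<F>(max_parallel: usize, f: F) -> Option<thread::JoinHandle<()>>
where
    F: FnOnce() + Send + 'static,
{
    if ACTIVE.fetch_add(1, Ordering::SeqCst) < max_parallel {
        Some(thread::spawn(move || {
            f();
            ACTIVE.fetch_sub(1, Ordering::SeqCst);
        }))
    } else {
        ACTIVE.fetch_sub(1, Ordering::SeqCst);
        f(); // saturated: run inline instead of adding to the pool
        None
    }
}

fn main() {
    use std::sync::Arc;
    let ran = Arc::new(AtomicUsize::new(0));
    let r = ran.clone();
    // max_parallel = 0 forces the inline path.
    let h = maybe_spawn(0, move || {
        r.fetch_add(1, Ordering::SeqCst);
    });
    assert!(h.is_none());
    assert_eq!(ran.load(Ordering::SeqCst), 1);
}
```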

What kind of plan does Rust have regarding these questions?  This
seems like the hardest part of the design to me.

Reading the Wiki, I see it suggests that the task/port/channel-related
stuff was being moved out to a library, which would mean the compiler
can't optimise it -- or was that just the old design?

Maybe I am imagining a level of optimisation and abstraction of tasks
far beyond what is realistically feasible, though.  Maybe this really
is a problem that has to be pushed back to the programmer.

Maybe we can say: tasks are too expensive for implementing an
iterator, but cheap compared to something that will take 1ms or more
of processing time?  So the programmer has to judge that for
themselves.  Is that the approach?

Jim

P.S.  I'm interested in Rust because the design decisions make a lot
of sense to me.  However, having an ASM/Forth/C/Java background, the
ML-inspired parts of Rust are giving me trouble, so I think I'm going
to learn OCaml first to get a better grounding in that before going
much further.

-- 
 Jim Peters                  (_)/=\~/_(_)                 [email protected]
                          (_)  /=\  ~/_  (_)
 UazĂș                  (_)    /=\    ~/_    (_)                http://
 in Peru            (_) ____ /=\ ____ ~/_ ____ (_)            uazu.net
_______________________________________________
Rust-dev mailing list
[email protected]
https://mail.mozilla.org/listinfo/rust-dev