On 11-04-14 05:31 PM, Rafael Avila de Espindola wrote:
I have been thinking about the costs and benefits we get from tasks. I
had discussed some of them with Graydon both by email and on IRC. This
email is a quick summary to open the discussion.
Thanks for the write-up. I'll try to be more coherent here :)
*) The idea of using a special calling convention for Rust-to-C calls
only works if the C stack is in a really easy-to-find place, like a
pinned register. We could do better than we do now by replacing the
upcall functions with intrinsic functions, but that is still not ideal.
I think we need something here in any case, since even with linked
stacks we might be calling into C code that was compiled without them.
It will overrun the current segment and crash if it goes too deep. We
need to switch stacks, sometimes :(
But aside from that I agree with the other points on linked stacks. It's
worth trying them; they look better than the approach rustboot tried.
Given this and the fact that there is already interest in having LLVM
support linked stacks, for the rest of the email I will assume we will
use stack linking instead of copying.
Yup.
The way I see it, the big advantage of tasks would be if they could be
used like Erlang processes or goroutines. The programmer can just create
lots of them, use blocking APIs, and they get scheduled as needed.
Here is probably where we part ways. You're glossing over a lot of
detail by saying this:
- "Like Erlang" means that a runtime-provided, very careful async
I/O manager is standing in the way of every "blocking"-looking API
and intercepting it, multiplexing it through I/O facilities. This
is *largely* what I've been proposing we do, with exceptions for
unsafe calls when the user really wants the option to shoot their
own foot.
- "Like Go" only works in Go because they are willing to let all
goroutines get remapped to threads at will and race on shared access
to memory. We're not.
So let's please keep in mind that nobody, at the moment, has a "general"
solution to the speed/complexity/safety tradeoff: those that pick safety
have to compensate with a certain amount of runtime machinery for I/O
and possibly lose speed when dealing with concurrent access
(particularly with mutable things); those that pick speed wind up
(usually) losing safety.
I'd very much like to chart a path towards a nice balance between these
tensions, but we have to be honest about the tensions existing.
Unfortunately, that model *cannot* be implemented in Rust. A task cannot
move from one thread to another, so two tasks that could both be
executing may end up stuck on the same thread.
Agreed. If you make an unsafe, blocking C OS API call in the current
model, it will block your thread and every task in it. Erlang can dodge
this bullet because there's *no* sharing between tasks; you can always
reschedule all tasks -- in their entirety -- to other threads, so one
thread blocking is no big deal.
But that's also rare; Erlang discourages you *heavily* from making any
blocking C calls directly. The point in that model (and in many dozens
of other systems, from Node to win32 IOCP) is that the work you do
saturating the CPU with "calculation" and the work you do feeding data
in and out of I/O facilities use such dramatically different OS
interfaces and concurrency requirements that they tend to run on
different threads and communicate using queues local to the process.
Separate IO thread pool and CPU thread pool. That's how we've been
proceeding so far (with a further dividing-up of memory into tasks to
provide multiplexed control within a thread). It's not strictly
necessary but it's also not completely made up or novel.
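To make that shape concrete, here is a rough sketch in a Rust-flavoured
pseudo-syntax (the thread/channel primitives and the IoRequest /
blocking_read names are purely illustrative, and a real I/O manager would
multiplex over select/epoll/kqueue rather than serving requests one at a
time):

    use std::fs;
    use std::sync::mpsc;
    use std::thread;

    // A request from a task to the I/O manager: a path to read, plus a
    // private channel on which the reply should be delivered.
    struct IoRequest {
        path: String,
        reply: mpsc::Sender<std::io::Result<Vec<u8>>>,
    }

    // Looks like a blocking call from the task's point of view; it is
    // really just a queue round-trip through the I/O manager.
    fn blocking_read(io: &mpsc::Sender<IoRequest>, path: &str)
        -> std::io::Result<Vec<u8>>
    {
        let (reply_tx, reply_rx) = mpsc::channel();
        io.send(IoRequest { path: path.to_string(), reply: reply_tx }).unwrap();
        reply_rx.recv().unwrap()
    }

    fn main() {
        let (io_tx, io_rx) = mpsc::channel::<IoRequest>();

        // The I/O manager thread: owns all I/O, services every task.
        let io_manager = thread::spawn(move || {
            for req in io_rx {
                // Stand-in for an event-driven multiplexer.
                let _ = req.reply.send(fs::read(&req.path));
            }
        });

        // CPU-side "tasks": straight-line code, no hand-written state machine.
        let workers: Vec<_> = ["/etc/hostname", "/etc/hosts"]
            .iter()
            .map(|path| {
                let io = io_tx.clone();
                let path = path.to_string();
                thread::spawn(move || match blocking_read(&io, &path) {
                    Ok(bytes) => println!("{}: {} bytes", path, bytes.len()),
                    Err(e) => println!("{}: {}", path, e),
                })
            })
            .collect();

        for w in workers {
            w.join().unwrap();
        }
        drop(io_tx); // dropping the last sender lets the I/O manager exit
        io_manager.join().unwrap();
    }

The point is only the shape: the CPU-side tasks are written as
straight-line "blocking" code, and all of the actual waiting gets funneled
through one place over plain queues.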
Consider the example of a browser that wants to fetch many objects and
handle them. It would be very tempting to create one task for each of
the objects, but we cannot do that. The task creation would happen
before the network request and we would already be pinned to a thread
before knowing which resource would be available first.
A similar problem happens for pure IO, like a static HTTP server. Open a
task per request and you don't know which read will finish first. Some of
this can be avoided by having a clever IO library where read just sends a
message to an IO thread that uses select. Unfortunately, this will not
work when using mmap, for example.
For these reasons it looks to me as if tasks add a lot of cost for a
small benefit.
I disagree, obviously. The benefit has to do with conceptual simplicity.
In concurrent programs, you often have two Very Hard Reasoning Problems
that are typically Done Wrong:
- Multiplexing ownership of memory (usually via locks, atomic
refcounts or concurrent GC) between multiple cores.
- Multiplexing blocking serial execution (usually via state machines)
between multiple I/O endpoints.
A task unifies both these problems into a simple abstraction that a user
is likely to get right, while keeping costs .. low. Depending on which
cost you want to measure. I agree there's a price -- implementing a
scheduler, occasional mis-assignment of tasks to threads so you starve a
runnable task, writing a decent I/O multiplexing library -- but there
are major benefits to be had in providing a simplified cognitive model.
This is why we have OS processes with control stacks in the first place,
rather than a single address space and 'goto'.
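As a rough illustration of what I mean by "a simple abstraction a user is
likely to get right" (same illustrative pseudo-syntax as above,
approximating one task per work item with one spawned thread per item):

    use std::sync::mpsc;
    use std::thread;

    fn main() {
        let (results_tx, results_rx) = mpsc::channel();

        for item in vec![String::from("alpha"), String::from("beta")] {
            let results = results_tx.clone();
            // `move` transfers ownership of `item` into the task; nothing
            // else can touch it while the task runs.
            thread::spawn(move || {
                let processed = item.to_uppercase(); // stand-in for real work
                results.send(processed).unwrap();
            });
        }
        drop(results_tx); // the receive loop ends once every task is done

        for r in results_rx {
            println!("done: {}", r);
        }
    }

Each item is moved into exactly one task, the task body is ordinary serial
code, and results come back over a queue; there is nothing to lock and no
state machine to hand-interleave.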
--------------------------------------------------------------------
Let's implement just processes and threads for now. With these in place we
can see how far we can go. Once we have a need for more abstraction, we
can revisit what a task is and implement it.
--------------------------------------------------------------------
My feeling is we're *starting* with a need for more abstraction: we are
presently shooting our feet regularly trying to write safe programs that
do concurrent memory access and blocking I/O using just threads and
processes, in C. It's too hard in general. People always get it wrong.
And some different implementation ideas for when we do decide to
implement tasks:
* Use an OS thread for each task. What we currently call a thread in Rust
will then just be a control for which tasks can run in parallel. A coarse
and easy-to-use form of parallelism that the user can refine if they find
contention.
I agree that there might be a way to make this model of a task work if
threads are sufficiently cheap -- on many unixes they are now, not sure
how win32 is doing these days -- and we can make guarantees about
exclusive ownership of memory. Even if it only works on some platforms,
it may well be worth investigating as a way to instantiate the model.
This solves the "task blocking because of unrelated task" problem with
no extra code, even for memory mapped IO.
Agreed. That's a very desirable feature of it.
Another advantage of this implementation is that we can expose OS-level
services to the tasks. For example, we can deliver signals without having
to demultiplex them.
To the extent that the OS service does not rely on running on our stack,
maybe. Even signals tend to want something for that (see sigaltstack).
But in some cases I imagine you are right. The OS generally presents
APIs that are ... if not thread *friendly*, at least thread *aware*.
We'd probably get a few for free.
This is not as expensive as it looks, since we would still be using
small stacks. It is hard to imagine a case where this is too expensive but
the existing proposal is not.
It's not the expense of thread-per-task that concerns me. That's just a
question of arithmetic: either it is or it isn't. We can simply measure:
how much is N thousand threads on each OS? Easy to research and figure
out threshold values.
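Something like this throwaway harness would do (illustrative pseudo-syntax
again; the 32 KB stack size and the 10,000-thread count are arbitrary
knobs to turn per OS):

    use std::thread;
    use std::time::Instant;

    fn main() {
        const N: usize = 10_000;
        const STACK_BYTES: usize = 32 * 1024; // deliberately small stacks

        let start = Instant::now();
        let handles: Vec<_> = (0..N)
            .map(|_| {
                thread::Builder::new()
                    .stack_size(STACK_BYTES)
                    // each thread just waits to be released
                    .spawn(|| thread::park())
                    .expect("thread creation failed")
            })
            .collect();
        println!("spawned {} threads in {:?}", N, start.elapsed());

        for h in &handles {
            h.thread().unpark();
        }
        for h in handles {
            h.join().unwrap();
        }
    }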
My concern is with preserving the abstractions of multiplexed sequential
I/O (rather than manual continuation-passing / interleaved state-machine
style) and non-contended ownership of a private piece of memory (or the
functional equivalent, say "only immutable memory"). I want to hold on
to those, because those are the parts that are hardest about concurrent
programming today.
* Go with an even lighter notion of what a task is. The idea is to
implement something like GCD. In the current implementation of GCD (as
in most C code), the burden of safety is always on the programmer. We
can probably do a bit better for Rust in the common case.
Maybe so. I'm interested in exploring this sort of interface more when
we have implemented unique pointers, so that we can talk about a type
kind that is known to require neither GC nor atomic refcounting. It
seems like a safe(r) GCD workalike could be written there.
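Roughly the shape I have in mind, sketched with a made-up Queue type in
the same illustrative pseudo-syntax (the boxed, sendable FnOnce job type
below stands in for "a closure that uniquely owns everything it touches"):

    use std::sync::mpsc;
    use std::thread;

    // A job owns everything it needs and can be sent to another thread.
    type Job = Box<dyn FnOnce() + Send + 'static>;

    struct Queue {
        tx: Option<mpsc::Sender<Job>>,
        worker: Option<thread::JoinHandle<()>>,
    }

    impl Queue {
        fn new() -> Queue {
            let (tx, rx) = mpsc::channel::<Job>();
            let worker = thread::spawn(move || {
                for job in rx {
                    job(); // run each job to completion, in submission order
                }
            });
            Queue { tx: Some(tx), worker: Some(worker) }
        }

        // dispatch_async-flavoured: the job must own everything it touches.
        fn dispatch(&self, job: impl FnOnce() + Send + 'static) {
            self.tx.as_ref().unwrap().send(Box::new(job)).unwrap();
        }
    }

    impl Drop for Queue {
        fn drop(&mut self) {
            drop(self.tx.take()); // closing the channel lets the worker drain
            if let Some(w) = self.worker.take() {
                let _ = w.join();
            }
        }
    }

    fn main() {
        let q = Queue::new();
        let data = vec![1, 2, 3]; // uniquely owned; moved into the job below
        q.dispatch(move || println!("sum = {}", data.iter().sum::<i32>()));
        q.dispatch(|| println!("second job runs after the first"));
        // dropping `q` at the end of main waits for the queue to drain
    }

Because each job owns its data outright, handing it to the queue's worker
thread can't create shared mutable state; that is exactly the property
unique pointers would let the compiler check.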
But I feel there remains a strong need for an abstraction that lets you
trade some portion of maximum possible throughput (in terms of CPU *
I/O) for a simplified mental model that multiplexes I/O and control flow
safely and automatically (if not optimally). I agree that this necessarily
entails a good AIO multiplexing library, somewhat baked into the
standard library.
I think you overestimate the degree to which normal human programmers
are currently able to write a correct program that *just* handles lots
of concurrent I/O, even setting aside the "saturate the CPU" problem.
AIO libraries are in the dark ages. Node and Erlang both achieved what
notoriety they did mostly by providing *any* kind of saner abstraction
over it (they didn't even pick the same abstraction; node doesn't do
coroutines). Simply exposing the AIO interfaces in a non-crazy way is a
huge win, and (IMO) providing a coroutine library on top is better yet.
Cognitive and correctness costs are real.
-Graydon
_______________________________________________
Rust-dev mailing list
[email protected]
https://mail.mozilla.org/listinfo/rust-dev