Forwarding message from last night to list...
-------- Original Message --------
Subject: Re: [rust-dev] cost/benefits of tasks
Date: Thu, 14 Apr 2011 22:36:45 -0700
From: Graydon Hoare <[email protected]>
To: Rafael Ávila de Espíndola <[email protected]>
I am replying to you since you replied privately to me; if you meant to
reply to the list, say so and I'll send this back there. Not sure if you
wanted this to be a private diversion in the conversation.
On 14/04/2011 7:52 PM, Rafael Ávila de Espíndola wrote:
> Yes, this is a problem for both implementations. On Linux at least we
> can also use a global thread-local variable to hold the C stack for
> the OS thread. I have no idea what we should do for OS X here.
No TLS on OSX? I gather they have a rather different threading system
than Linux (and Windows); I don't know the exact details.
This is important research to do before jumping to conclusions:
different OSs provide quite different facilities, some cheap, some
expensive, and with very different scalability curves. Win32 looks like
it eats 12kb of kernel stack for each thread -- not free, though not
huge -- but I have no idea what the scheduler does when a few thousand
of them issue sync IO requests at once. OSX will cap an 8gb machine to
12,500 threads (weirdly: http://support.apple.com/kb/HT3854), not sure
if those are posix or mach threads. There are things to study.
We need to quantify costs and qualify portability if we're going to have
such horse-trading arguments about implementation effort and focus.
I suspect we won't get stacks smaller than a page. Tasks can be smaller,
but then if we're recycling stack segments between tasks-or-threads,
pages might make the most sense anyways.
> I agree here. It might just be that the decision to have no
> user-visible global mutable state makes tasks that can share
> refcounted objects too limited to be useful. We might have to choose one.
Yeah. And be careful to differentiate uniquely owned from merely
immutable. Immutable stuff still needs to be collected somehow if it's
DAG-shaped or worse.
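(Editor's aside: in today's Rust, which postdates this thread by years, the within-a-thread sharing described here is what `std::rc::Rc` provides -- non-atomic refcounting of immutable, possibly DAG-shaped data, confined to one thread because `Rc` is not `Send`. A hypothetical sketch:)

```rust
use std::rc::Rc;

// A DAG-shaped immutable structure: two parents sharing one child node.
// Rc uses non-atomic reference counting, so sharing is cheap but the
// pointers cannot cross threads (Rc is not Send).
fn shared_subtree() -> (Rc<Vec<i32>>, Rc<Vec<i32>>) {
    let child = Rc::new(vec![1, 2, 3]);
    // Both returned handles point at the same allocation; no copy is made.
    (Rc::clone(&child), Rc::clone(&child))
}

fn main() {
    let (a, b) = shared_subtree();
    assert!(Rc::ptr_eq(&a, &b));          // same node, shared, not copied
    assert_eq!(Rc::strong_count(&a), 2);  // collected when the count hits 0
    println!("shared node: {:?}", a);
}
```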
> I understand the abstraction. In fact, I like it. The problem is that
> it is not what we are implementing. You can implement it by giving the
> language enough information to move tasks (expensive for all the
> implementation options I can think of) or by exposing the problem to
> the user (undesirable in general, though still needed in unsafe blocks).
I disagree. The abstraction is "a coroutine that's properly multiplexed
so long as you don't hit the blocking OS interfaces directly", and it's
properly implemented by soft coroutines with shared memory. Better yet,
you can share immutable DAG-shaped stuff between them, within a thread,
using non-atomic refcounting. And if you want to hit the blocking OS
interfaces directly, use a thread. That is: change
"spawn foo()"
to
"spawn thread foo()"
It's not so bad, eh? If a user sees too much jank or stalled tasks or
something, they can analyze the problem, decide, and possibly insert the
word 'thread'.
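(Editor's aside: the `spawn thread foo()` syntax here is from the 2011 design. The thread half of the distinction is roughly what `std::thread::spawn` later became -- a real OS thread that can safely hit blocking interfaces. A sketch in today's std:)

```rust
use std::thread;

// Some work the task would do; safe to run on a blocking OS thread.
fn foo() -> u64 {
    (1..=10).sum()
}

fn main() {
    // The "spawn thread foo()" case: a dedicated OS thread, joined for
    // its result. Blocking syscalls inside foo() stall only this thread.
    let handle = thread::spawn(foo);
    let result = handle.join().expect("thread panicked");
    assert_eq!(result, 55);
}
```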
> Can we just give them better tools? Consider what Rust would look like
> right now without tasks.
> * Creating a thread is more expensive than creating a task.
> * Scheduling would work as expected.
> * Passing data in a safe way would imply:
>   * immutable data, *or*
>   * copy, *or*
>   * move semantics, if we can figure out how to do it.
> * Use of an unsafe block is a natural extension.
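(Editor's aside: two of the three passing disciplines listed above -- immutable sharing and moves -- are expressible in today's Rust, long after this exchange. A hypothetical sketch using `Arc` for the immutable case and closure capture for the move case:)

```rust
use std::sync::Arc;
use std::thread;

fn main() {
    // Move semantics: the closure takes ownership of the buffer; nothing
    // is copied, and the spawning thread can no longer touch it.
    let owned = vec![1, 2, 3];
    let t1 = thread::spawn(move || owned.len());
    assert_eq!(t1.join().unwrap(), 3);

    // Immutable sharing: Arc hands out read-only views of one allocation,
    // using atomic refcounts so the views may cross threads.
    let shared = Arc::new(vec![4, 5, 6]);
    let view = Arc::clone(&shared);
    let t2 = thread::spawn(move || view.iter().sum::<i32>());
    assert_eq!(t2.join().unwrap(), 15);
    assert_eq!(shared.len(), 3); // original handle still usable here
}
```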
That's exactly what "spawn thread foo()" gives you. I'm not suggesting
we take it away. Merely that we maintain a 3rd option that's lighter
still, that gives you:
* Free sharing of immutable substructures between tasks.
* Potentially (though I'll grant, not necessarily) cheaper spawn
and kill events, other per-task overheads, switching speed
(probably depends on platform, good to quantify).
* Easier concurrent debugging, if you like, since the scheduler is
down in userspace. Longer timeslices, manual yielding, etc.
Keep in mind, 3 is not the limit of options I'm hoping to support. A
GCD-like thing that uses frozen-unique values it owns and saturates the
CPU using a queue of worker callbacks is definitely reasonable. As is a
"parallel for loop" that does fork/join, or a vectorizing / SIMD
construct. Parallelism and concurrency structures are a bit
heterogeneous, just like control and data structures :)
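(Editor's aside: the GCD-like option -- a queue of work items drained by a fixed set of worker threads -- can be sketched with today's `std::sync::mpsc` channels, which did not exist in this form in 2011. `run_pool` and its shape are hypothetical, not the design under discussion:)

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

// A minimal GCD-like pool: jobs go into a shared queue, a fixed number
// of worker threads drain it and emit results.
fn run_pool(jobs: Vec<i32>, workers: usize) -> i32 {
    let (tx, rx) = mpsc::channel();
    let rx = Arc::new(Mutex::new(rx)); // share one receiver among workers
    let (out_tx, out_rx) = mpsc::channel();

    for job in jobs {
        tx.send(job).unwrap();
    }
    drop(tx); // close the queue so workers know when to stop

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let rx = Arc::clone(&rx);
            let out = out_tx.clone();
            thread::spawn(move || loop {
                // Take the next job; Err means the queue is closed and empty.
                let job = rx.lock().unwrap().recv();
                match job {
                    Ok(n) => out.send(n * n).unwrap(),
                    Err(_) => break,
                }
            })
        })
        .collect();
    drop(out_tx);

    for h in handles {
        h.join().unwrap();
    }
    out_rx.iter().sum()
}

fn main() {
    // squares of 1..=4 summed: 1 + 4 + 9 + 16 = 30
    assert_eq!(run_pool(vec![1, 2, 3, 4], 2), 30);
}
```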
> I would love to have both at a reasonable price, but the proposed
> implementation is not there. It looks like it when you write the code,
> but you do not really have multiplexed sequential threads. You can get
> basic (non-memory-mapped) IO multiplexing with a lot of code, but that
> is it.
It's worth differentiating cases. Not every task is doing mmap'ed I/O,
and that's the only case where this abstraction doesn't hold up.
Everything else we can indirect through an AIO interface, with the added
benefit that we can make the user-facing side of it much less crazy than
the C API.
(And .. I think you're rarely doing mmap'ed network I/O anyways)
> It is always nice to remember that one of the problems of C++ is that
> its higher-level features are too expensive. Firefox is built with
> exceptions disabled, for example.
That is not only out of cost concerns; but point taken.
> Yes, but given our target market (build the best browser), the
> implementation cost is a go/no-go.
The best browser may well have lots of internal actions that are not
doing mmap'ed I/O. Coroutines make sense for them.
The argument seems a bit like you want it both ways. You're insisting
that the cost of doing non-mmap'ed I/O (that is, copying buffers) is
always unacceptable, but then saying it's .. ok to limit ourselves to a
tasking model that may require deep-copying a lot of messages to
communicate. What gives? Can't we just give the user a vocabulary to
decide when they want which kinds of copying?
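(Editor's aside: that vocabulary is roughly what Rust eventually shipped -- a channel send moves ownership of a message, so no bytes are deep-copied, and a deep copy happens only when the user writes `clone()`. A hypothetical sketch in today's std:)

```rust
use std::sync::mpsc;
use std::thread;

fn main() {
    let (tx, rx) = mpsc::channel();
    let big = vec![0u8; 1 << 20]; // a megabyte of message payload
    let ptr_before = big.as_ptr() as usize;

    // Sending moves the buffer: only the (pointer, len, capacity) triple
    // crosses the channel, not the megabyte itself.
    tx.send(big).unwrap();
    let received = thread::spawn(move || rx.recv().unwrap())
        .join()
        .unwrap();
    assert_eq!(received.as_ptr() as usize, ptr_before); // same allocation

    // Deep copy only when the user explicitly asks for one.
    let copy = received.clone();
    assert_ne!(copy.as_ptr(), received.as_ptr());
}
```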
> If not copying stacks, tasks at least don't have the problem of making
> every program pay for what it doesn't use. My only real concern is that
> we would be implementing the next RTTI: a fancy feature that is not
> used in practice.
Oh, "the next RTTI" -- that stings!
> That is why my suggestion right now is to go with just threads (and
> stack linking to make creating them cheap). If it turns out that users
> have to use unsafe blocks too often because they couldn't get a safe
> move and copying is too expensive, we now need to provide something else.
I'm not fond of this strategy. I think that if we don't spend the time
making "something else" work from the beginning, nobody ever will.
System architecture changes will get *harder* over time, not easier.
-Graydon
_______________________________________________
Rust-dev mailing list
[email protected]
https://mail.mozilla.org/listinfo/rust-dev