On 18/12/2010 9:03 AM, Sebastian Sylvan wrote:
This thread brings to mind a couple of issues that I'd like to just
float for your consideration.
Sure! I appreciate the comments, even if this is just a forum to air some
design discussion. I realize there's always going to be a bit of an
imbalance between what's written down and what's discussed/planned, so I'm
happy to shed some light. I should do some roadmap-writing at some point too.
1) Many apps build up massive immutable data structures that need to be
accessed by tons of tasks (e.g. video games, my field). Copying these
values would be prohibitively expensive, so any general-purpose
language /must/ support real sharing of immutable data somehow.
Realistically, before Rust sees mainstream use we'll have >64 cores to
keep busy, so anything that imposes limits on read-only data sharing
between actual /cores/ is going to be a big burden too.
Yeah. There's some truth to this. Games and browser rendering trees both :)
Keep in mind, however, that there are multiple forms of parallelism.
Task parallelism is essentially MIMD; we're building task parallelism
into the language, but it's not the only option. Rust also tracks
function purity as an effect, so it's quite conceivable that we might
introduce an OpenMP-like parallel for loop: if we played some games with
deferring refcounts in the loop body, we could do SIMD-like fork/join on
(say) pure loop bodies. Semantically plausible, and possibly a better fit
than forcing everything into the task model.
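To make that concrete, here's roughly the fork/join shape I have in mind,
sketched as ordinary Rust with an explicit scoped-threads helper
(std::thread::scope) standing in for the hypothetical compiler-managed
parallel loop; illustration only, not the proposed feature:

    // Fork/join over pure loop bodies: each iteration only writes its own
    // slot, so the chunks can run in parallel and all join at the end. The
    // hypothetical language feature would arrange this automatically for
    // bodies it can prove pure; here the split is done by hand.
    use std::thread;

    fn parallel_square(data: &mut [u64]) {
        let n_workers = 4;
        let chunk = (data.len() / n_workers).max(1);
        thread::scope(|s| {
            for slice in data.chunks_mut(chunk) {
                // Each worker owns a disjoint slice and has no other effects.
                s.spawn(move || {
                    for x in slice.iter_mut() {
                        *x = *x * *x;
                    }
                });
            }
        }); // all workers are joined here
    }

    fn main() {
        let mut v: Vec<u64> = (0..16u64).collect();
        parallel_square(&mut v);
        println!("{:?}", v);
    }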
I have a couple ideas for the task model as well, of course. More below:
2) I'd also argue that sharing of immutable data is the simpler and more
obvious semantics. Especially when you take destructors into account. I
think you should therefore have to opt in to "copy, don't share"
semantics as an optimization when you have data that ends up getting
loads of incref/decref operations on it.
The dtor story is a bit in flux at the moment, but we'll probably wind
up shifting it around so it's only viable on non-shared values. At that
point Rafael is right that it's semantically equivalent (though
performance-observable) whether a true copy is made. There's some leeway
in implementation, however...
3) The cost of reference counting in a multi-core scenario is a concern.
Rust already limits the number of reference operations using aliases,
which presumably gets rid of a lot of the cost. Is Rust wedded to the
idea of having accurate refcounts at all times? As far as I can tell,
modern ref-counting algorithms seem to be about on par with modern GC
algorithms for performance (just barely) even in the context of
multithreading, but they achieve this feat through deferred ref-counting
and things like that. Should Rust not do that instead? I kind of think
refcounts may be the way forward, because the cost of GC'ing in a
ref-count system is proportional to the amount of actual garbage rather
than to the size of the heap, but the consensus on this issue in the
literature seems to be that refcounts are only performant when they're
not being constantly kept up to date. Am I wrong?
You make three points here and I'll address them independently.
1. multi-threaded RC operations are going to be atomic -> expensive.
Yes. We're trying to avoid atomic RC, though as Rafael points out we can
conceivably fall back to it in cases where we have performance evidence
that the hurt will be less than the hurt of doing a full copy. There's a
knob to turn. But I'd prefer not to turn it at all. There are a couple
other options (more below, honest!).
2. It's true that on massive heaps, the GC-research consensus I've seen
lately is that you want to do primary RC with secondary cycle collection,
since it walks the heap less. And that fits with our model, generally,
in a performance-characteristic sense.
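(As a reminder of why the secondary cycle collector is needed at all, the
classic failure mode, sketched as a small illustrative refcounting program;
nothing here is specific to our runtime:)

    // Two refcounted nodes that point at each other: after `main`'s handles
    // drop, each node is still referenced by the other, the counts never
    // reach zero, and the destructor never runs. Plain RC leaks the cycle;
    // a cycle collector (or a tracing pass) is needed to reclaim it.
    use std::cell::RefCell;
    use std::rc::Rc;

    struct Node {
        next: RefCell<Option<Rc<Node>>>,
    }

    impl Drop for Node {
        fn drop(&mut self) {
            println!("node dropped"); // never printed: the cycle leaks
        }
    }

    fn main() {
        let a = Rc::new(Node { next: RefCell::new(None) });
        let b = Rc::new(Node { next: RefCell::new(Some(a.clone())) });
        *a.next.borrow_mut() = Some(b.clone());
        // a and b go out of scope here, but the heap nodes survive.
    }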
3. Deferred RC is definitely an option we've discussed. There are multiple
schemes for it, with different relationships to whatever other GC it's
interacting with. It's not (IMO) essential that we "always have 100%
accurate refcounts at all times", merely that we can say with certainty
when we will next have them. It's similar to our preemption story: we're
going to say that two tasks on one thread will only preempt at particular
(well-defined, easily controlled) points, not at "any single instruction".
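To illustrate the deferred-RC idea, a toy model (not our runtime; the
object-id bookkeeping below is purely for the sketch): mutator code just
appends +1/-1 entries to a local log, and only at a well-defined
reconciliation point are the true counts updated and zero-count objects
collected, so the collection work tracks the garbage rather than the heap.

    use std::collections::HashMap;

    #[derive(Default)]
    struct DeferredRc {
        counts: HashMap<usize, isize>, // object id -> true refcount
        pending: Vec<(usize, isize)>,  // deferred increments/decrements
    }

    impl DeferredRc {
        fn alloc(&mut self, obj: usize) { self.counts.insert(obj, 1); }
        fn incref(&mut self, obj: usize) { self.pending.push((obj, 1)); }
        fn decref(&mut self, obj: usize) { self.pending.push((obj, -1)); }

        // Called only at known points (e.g. task yields); returns what died.
        fn reconcile(&mut self) -> Vec<usize> {
            let mut touched = Vec::new();
            for (obj, delta) in self.pending.drain(..) {
                if let Some(c) = self.counts.get_mut(&obj) {
                    *c += delta;
                    touched.push(obj);
                }
            }
            // Only objects whose count actually changed need inspection.
            let mut dead = Vec::new();
            for obj in touched {
                if self.counts.get(&obj).map_or(false, |&c| c <= 0) {
                    self.counts.remove(&obj);
                    dead.push(obj);
                }
            }
            dead
        }
    }

    fn main() {
        let mut rc = DeferredRc::default();
        rc.alloc(7);    // new object starts with one reference
        rc.incref(7);   // alias created...
        rc.decref(7);   // ...and released
        rc.decref(7);   // last reference released
        println!("collected at safepoint: {:?}", rc.reconcile()); // [7]
    }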
4) If the cost of atomic ref counts is still too high, perhaps all
allocations should be task-local unless you explicitly tag them as
shared (with task-local data not being allowed over channels)? This
seems conceptually simple and impossible to mess up. You tag shared data
in some special way, and only for that data do you pay the extra cost of
more expensive ref-counting. If you forget to tag something as shared,
then you'll get a compile-time error when you try to send it to another
task. Note that you're still sharing data here, so you support the
scenario in 1); you're just not incurring the cost for everything. I'd
prefer it if this weren't necessary (i.e. if refcount operations could be
statically or dynamically elided often enough that any immutable data
can be sent over a channel), but you could always make the "shared"
keyword a no-op in the future if that materializes.
Finally, the "more below" issue! Yes. Assuming we're just talking about
independent control paths (so *some* task parallelism), and assuming we
want to avoid heavy atomic RC (both also possibilities, but discussed
above), we can *also* twiddle the semantics of sharing a bit to talk
about thread-sharing vs. non-thread-sharing.
The scheme you propose would entail two things I'd prefer to avoid
(though it'd work): lots of atomic RC on the shared bits, and another
layer in the memory-layer system (the "shared" layer).
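For concreteness, here is roughly the line your proposal would draw,
sketched in illustrative Rust where a non-atomically refcounted box (Rc)
stands in for task-local data and an atomically refcounted one (Arc) for
the opt-in "shared" tag; trying to move the untagged data's channel to
another thread is rejected at compile time:

    use std::rc::Rc;
    use std::sync::{mpsc, Arc};
    use std::thread;

    fn main() {
        // Explicitly "shared" data pays for the atomic refcount and is
        // allowed to cross to another thread over a channel.
        let (tx, rx) = mpsc::channel();
        let shared = Arc::new(vec![1, 2, 3]);
        tx.send(shared.clone()).unwrap();

        // Task-local data uses the cheap non-atomic refcount. It moves
        // around freely within this thread, but the commented lines below
        // do not compile: Rc<Vec<i32>> is not sendable across threads, so
        // forgetting to "tag" the data as shared is caught at compile time.
        let local = Rc::new(vec![4, 5, 6]);
        let (tx2, rx2) = mpsc::channel();
        tx2.send(local).unwrap();
        // thread::spawn(move || println!("local? {:?}", rx2.recv().unwrap()))
        //     .join()
        //     .unwrap();
        drop(rx2);

        thread::spawn(move || println!("received {:?}", rx.recv().unwrap()))
            .join()
            .unwrap();
    }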
I'll describe a scheme I'd prefer to explore; you tell me if you think
it'd be tolerable:
We make a native data type called pinned[T]. A pinned[T] value holds a T
value as well as a list of viewers (in C code; pinned[T] is opaque to
Rust clients). When you put a T value into a pinned[T], the system walks
the data structure (once) and does a detachment operation (making sure
each node is singly referenced, copying as required; this is necessary
for the 'freeze' operator to work), then writes the magic constant
refcount (which already exists, to handle compile-time constant data)
into every rc field in the structure. The structure is now "pinned": it can be safely
shared with multiple concurrent readers. It's more than just frozen;
uninformed parties will think it's in read-only memory!
From a given pinned[T] value, multiple view[T] values can be
manufactured (helper function, also native). A view[T] is atomically
added to the pinned[T] 'viewer' list, and when a pinned[T] is destructed
it enters an "expiring" state that walks the viewer list, invalidates
all inactive views, then waits for the last active view to end, and
destructs the T. All view/pinned synchronization is atomic (or
effectively so; at best carefully-reasoned lockless C code).
Meanwhile, if I send a view[T] to some other thread, that thread can
pull (via an iterator / one-shot reference-returning accessor, as Dave
and Patrick have been discussing) an &option[T] out of the view[T]. If
the underlying pinned[T] is dead (the view[T] has been invalidated), then
the option[T] will come back none. Sorry. No data to view. But if it
comes back as &some[T](?t), then the viewing thread can work with the
target 't' data "as though it's const". No rc traffic to keep
reconciled. It's working as though the data is a compile-time constant
in read-only memory.
There are atomic operations in this scheme, but only at the "root" of
the data structure: the pinned[T] / view[T] values do atomic ops, and
everything else works with aliases-and-constants while the view[T] is
"in use".
This scheme has the added advantage that you can do "dataflow"
multi-version publish/subscribe with it as well: the pinned value can be
updated to a new one and the viewers -- if they're pulling from a
view[T] via an iterator -- could just "get the next version" next time
around the loop, after the writer updates.
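Sketched as a tiny version-swapping loop (again purely illustrative: a
lock-protected slot stands in for the pinned root, and each pass of the
reader just picks up whatever version is current):

    use std::sync::{Arc, RwLock};
    use std::thread;
    use std::time::Duration;

    fn main() {
        // The "publish slot": readers clone out the current version, writers
        // swap in a new one. Old versions die when their last reader drops.
        let current: Arc<RwLock<Arc<Vec<u32>>>> =
            Arc::new(RwLock::new(Arc::new(vec![0])));

        let reader = {
            let current = current.clone();
            thread::spawn(move || {
                for _ in 0..5 {
                    // Atomic work only at the root, once per pass; the
                    // snapshot itself is read as plain immutable data.
                    let snapshot: Arc<Vec<u32>> = current.read().unwrap().clone();
                    println!("reader sees a version of length {}", snapshot.len());
                    thread::sleep(Duration::from_millis(10));
                }
            })
        };

        // The writer publishes a sequence of new versions.
        for n in 1..=5u32 {
            let next = Arc::new((0..n).collect::<Vec<u32>>());
            *current.write().unwrap() = next;
            thread::sleep(Duration::from_millis(10));
        }

        reader.join().unwrap();
    }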
Again, as I said up top, there are multiple forms of parallelism, and
I'm not sure it'll be necessary to force everything into the MIMD
task-parallel model. I want to support the task-parallel variant *well*,
because even when running serially/multiplexed, I think it's an
essential ingredient in correctness: isolating tasks as a basic
mechanism for decoupling their effects, isolating their failures. But
it's not the only way; I've sketched in this email some variants we
could explore to support any/all of:
SIMD - some kind of OpenMP-like pure-parallel loop
MISD - some kind of pin/view, publish/subscribe dataflow model
MIMD - existing task model
(And of course lonely old serial SISD, which we already do just fine)
-Graydon