On 18/12/2010 9:03 AM, Sebastian Sylvan wrote:

This thread brings to mind a couple of issues that I'd like to just
float for your consideration.

Sure! I appreciate the comments, even if this is just a forum to air some design discussion. I realize there's always going to be a bit of an imbalance between what's written down and what's been discussed or planned, so I'm happy to shed some light. I should do some roadmap-writing at some point too.

1) Many apps build up massive immutable data structures that need to be
accessed by tons of tasks (e.g. video games, my field). Copying these
values would be prohibitively impractical, so any general-purpose
language /must/ support real sharing of immutable data somehow.
Realistically, before Rust sees mainstream use we'll have >64 cores to
keep busy, so anything that imposes limits on read-only data sharing
between actual /cores/ is going to be a big burden too.

Yeah. There's some truth to this. Games and browser rendering trees both :)

Keep in mind, however, that there are multiple forms of parallelism. Task parallelism is essentially MIMD; we're building task parallelism into the language, but it's not the only option. Rust also tracks function purity as an effect, so it's quite conceivable that we might introduce an OpenMP-like parallel for loop: if we played some games with deferring refcounts in the loop body, we could do SIMD-like fork/join on (say) pure loop bodies. That's semantically plausible, and possibly a better fit than forcing everything into the task model.
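(Purely for illustration, here's a rough sketch of what such a pure parallel loop could feel like, written in later Rust syntax with the rayon library standing in for the hypothetical OpenMP-like construct; none of this is committed design, and the names are illustrative only.)

    // Illustrative only: rayon's par_iter() stands in for a built-in
    // parallel-for over a pure loop body.
    use rayon::prelude::*;

    fn main() {
        let inputs: Vec<u64> = (0..1_000_000).collect();

        // The per-element body is pure: it touches no shared mutable
        // state and generates no per-element refcount traffic, so
        // fork/join across cores is safe.
        let total: u64 = inputs
            .par_iter()
            .map(|&x| x * x % 7919) // hypothetical pure work item
            .sum();

        println!("total = {}", total);
    }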

I have a couple ideas for the task model as well, of course. More below:

2) I'd also argue that sharing of immutable data is the simpler and more
obvious semantics. Especially when you take destructors into account. I
think you should therefore have to opt-in for "copy, don't share"
semantics as an optimization when you have data that ends up getting
loads of incref/decref operations on it.

The dtor story is a bit in flux at the moment, but we'll probably wind up shifting it around so it's only viable on non-shared values. At that point Rafael is right that it's semantically equivalent (though performance-observable) whether a true copy is made. There's some leeway in implementation, however...

3) The cost of reference counting in a multi-core scenario is a concern.
Rust already limits the number of reference operations using aliases,
which presumably gets rid of a lot of the cost. Is Rust wedded to the
idea of having accurate refcounts at all times? As far as I can tell,
modern ref-counting algorithms seem to be about on par with modern GC
algorithms for performance (just barely) even in the context of
multithreading, but they achieve this feat through deferred ref-counting
and things like that. Should Rust not do that instead? I kind of think
refcounts may be the way forward in the future, because the cost of
GC'ing in a ref-count system is proportional to the amount of actual
garbage, rather than being proportional to the size of the heap, but it
seems like the consensus on this issue in the literature is that
refcounts are only performant when they're not being constantly kept up
to date. Am I wrong?

You make three points here and I'll address them independently.

1. Multi-threaded RC operations are going to be atomic -> expensive. Yes. We're trying to avoid atomic RC, though as Rafael points out we can conceivably fall back to it in cases where we have performance evidence that the hurt will be less than the hurt of doing a full copy. There's a knob to turn there, but I'd prefer not to turn it at all. There are a couple of other options (more below, honest!).
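(For concreteness, here's a tiny sketch of the distinction, written in later Rust syntax; Rc/Arc are stand-ins for per-thread vs. atomic counting, not anything currently in the compiler.)

    use std::rc::Rc;
    use std::sync::Arc;
    use std::thread;

    fn main() {
        // Plain (non-atomic) counting, confined to a single thread.
        let local = Rc::new(vec![1, 2, 3]);
        let local2 = Rc::clone(&local);      // cheap increment
        assert_eq!(Rc::strong_count(&local2), 2);

        // Atomic counting, the only kind allowed to cross threads.
        let shared = Arc::new(vec![1, 2, 3]);
        let shared2 = Arc::clone(&shared);   // atomic increment
        let handle = thread::spawn(move || shared2.len());
        assert_eq!(handle.join().unwrap(), 3);

        // thread::spawn(move || local.len()); // rejected: Rc isn't Send
    }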

2. It's true that on massive heaps, the GC-research consensus I've seen lately is that you want to do primary RC plus secondary cycle collection, since you walk the heap less that way. And that fits with our model, generally, in a performance-characteristic sense.

3. Deferred RC is definitely an option we've discussed. There are multiple schemes for it, with different relationships to whatever other GC it's interacting with. It's not (IMO) essential that we "always have 100% accurate refcounts at all times", merely that we can say with certainty when we will next have them. It's similar to our preemption story: we're going to say that two tasks on one thread will only preempt one another at particular (well-defined, easily controlled) points, not at "any single instruction".
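(To make the deferral idea concrete, here's a minimal sketch, again in later Rust syntax with hypothetical names, deferring only the decrement half for simplicity: retired references are parked in a buffer and the count adjustments only happen at explicit, well-defined points, so counts are only guaranteed accurate at those points.)

    use std::rc::Rc;

    /// Hypothetical sketch of deferred decrements: retiring a
    /// reference just parks it; the actual count adjustments happen
    /// at an explicit "safe point".
    struct DeferBuffer<T> {
        pending: Vec<Rc<T>>,
    }

    impl<T> DeferBuffer<T> {
        fn new() -> Self {
            DeferBuffer { pending: Vec::new() }
        }

        /// Retire a reference without touching the count yet.
        fn retire(&mut self, r: Rc<T>) {
            self.pending.push(r);
        }

        /// Safe point: perform all deferred decrements at once.
        fn flush(&mut self) {
            self.pending.clear();
        }
    }

    fn main() {
        let shared = Rc::new(vec![1, 2, 3]);
        let mut buf = DeferBuffer::new();
        for _ in 0..4 {
            let alias = Rc::clone(&shared); // increment on clone
            // ... do some work with `alias` ...
            buf.retire(alias);              // decrement deferred
        }
        buf.flush();                        // counts accurate again here
        assert_eq!(Rc::strong_count(&shared), 1);
    }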

4) If the cost of atomic ref counts is still too high, perhaps all
allocations should be task-local unless you explicitly tag them as being
shared (with task-local data not being allowed over channels)? This
seems conceptually simple and impossible to mess up. You tag shared data
in some special way and only for that data you pay the extra cost of
more expensive ref-counting. If you forget to tag something as shared,
then you'll get a compile-time error when you try to send it to another
task. Note that you're still sharing data here, so you support the
scenario in 1), you're just not incurring the cost for everything. I'd
prefer if this wasn't necessary (i.e. if refcount operations could be
statically or dynamically elided often enough that any immutable data
can be sent over a channel), but you could always make the "shared"
keyword a no-op in the future if that materializes.

Finally, the "more below" issue! Yes. Assuming we're just talking about independent control paths (so *some* task-parallelism), and assuming we want to avoid heavy atomic RC (other choices are also possibilities, but those were discussed above), we can *also* twiddle the semantics of sharing a bit, to distinguish thread-sharing from non-thread-sharing.

The scheme you propose would entail two things I'd prefer to avoid (though it'd work): lots of atomic RC on the shared bits, and another layer in the memory-layering system (the "shared" layer).

I'll describe a scheme I'd prefer to explore; you tell me if you think it'd be tolerable:

We make a native data type called pinned[T]. A pinned[T] value holds a T value as well as a list of viewers (in C code; pinned[T] is opaque to Rust clients). When you put a T value into a pinned[T], the system walks the data structure (once) and does a detachment operation (making sure each node is singly referenced, copying as required; this is necessary for the 'freeze' operator to work), then writes the magic constant-refcount (which already exists, to handle compile-time constant data) into every rc field in the structure. The structure is now "pinned": it can be safely shared with multiple concurrent readers. It's more than just frozen; uninformed parties will think it's in read-only memory!

From a given pinned[T] value, multiple view[T] values can be manufactured (helper function, also native). A view[T] is atomically added to the pinned[T] 'viewer' list, and when a pinned[T] is destructed it enters an "expiring" state that walks the viewer list, invalidates all inactive views, then waits for the last active view to end, and destructs the T. All view/pinned synchronization is atomic (or effectively so; at best carefully-reasoned lockless C code).

Meanwhile, if I send a view[T] to some other thread, that thread can pull (via an iterator / one-shot reference-returning accessor, as Dave and Patrick have been discussing) an &option[T] out of the view[T]. If the underlying pinned[T] is dead, the view[T] has been invalidated, and the option[T] will come back none. Sorry, no data to view. But if it comes back as &some[T](?t), then the viewing thread can work with the target 't' data "as though it's const": no rc traffic to keep reconciled. It's working as though the data were a compile-time constant in read-only memory.

There are atomic operations in this scheme, but only at the "root" of the data structure: the pinned[T] / view[T] values do atomic ops, and everything else works with aliases-and-constants while the view[T] is "in use".
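(Here's a hedged approximation of the pinned[T]/view[T] idea in later Rust syntax, using Arc/Weak as stand-ins for the hand-written atomic root and with Pinned/View as purely hypothetical names. The lifetime details differ slightly from the scheme above -- an upgraded view keeps the value alive rather than the root waiting on it -- but the shape is the same: atomic ops only at the root, plain aliases inside.)

    use std::sync::{Arc, Weak};
    use std::thread;

    // Hypothetical names; Arc/Weak stand in for the pinned root and
    // its viewer bookkeeping.
    struct Pinned<T>(Arc<T>);
    struct View<T>(Weak<T>);

    impl<T> Pinned<T> {
        fn new(value: T) -> Self {
            // In the scheme above this is where the structure would be
            // detached/frozen; here we just take ownership.
            Pinned(Arc::new(value))
        }
        fn view(&self) -> View<T> {
            View(Arc::downgrade(&self.0)) // atomic op at the root only
        }
    }

    impl<T> View<T> {
        // Pull access: None if the pinned value is already gone.
        fn pull(&self) -> Option<Arc<T>> {
            self.0.upgrade()
        }
    }

    fn main() {
        let pinned = Pinned::new(vec![1, 2, 3, 4]);
        let view = pinned.view();

        let reader = thread::spawn(move || match view.pull() {
            // Interior reads are plain aliases: no per-node refcount
            // traffic, just the one atomic op in pull().
            Some(data) => data.iter().sum::<i32>(),
            None => 0, // the pinned value was already destructed
        });

        println!("sum = {}", reader.join().unwrap());
    }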

This scheme has the added advantage that you can do "dataflow" multi-version publish/subscribe with it as well: the pinned value can be updated to a new one and the viewers -- if they're pulling from a view[T] via an iterator -- could just "get the next version" next time around the loop, after the writer updates.
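(And a hedged sketch of that publish/subscribe flavour, again in later Rust syntax with std primitives standing in for the lockless C described above: the publisher swaps whole immutable snapshots into a slot, and a subscriber picks up whatever version is current each time around its loop.)

    use std::sync::{Arc, RwLock};
    use std::thread;
    use std::time::Duration;

    fn main() {
        // A slot holding the current immutable snapshot.
        let published = Arc::new(RwLock::new(Arc::new(vec![0i32; 4])));

        // Subscriber: grab the current snapshot, read it freely,
        // then loop around and see whatever version is current next.
        let slot = Arc::clone(&published);
        let reader = thread::spawn(move || {
            for _ in 0..5 {
                let guard = slot.read().unwrap();
                let snapshot = Arc::clone(&*guard); // clone the Arc, not the data
                drop(guard);                        // lock released before reading
                println!("reader sees version {}", snapshot[0]);
                thread::sleep(Duration::from_millis(10));
            }
        });

        // Publisher: build a fresh immutable value and swap it in whole.
        for version in 1..=5 {
            *published.write().unwrap() = Arc::new(vec![version; 4]);
            thread::sleep(Duration::from_millis(10));
        }

        reader.join().unwrap();
    }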

Again, as I said up top, there are multiple forms of parallelism, and I'm not sure it'll be necessary to force everything into the MIMD task-parallel model. I want to support the task-parallel variant *well*, because even when running serially/multiplexed, I think it's an essential ingredient in correctness: isolating tasks is a basic mechanism for decoupling their effects and isolating their failures. But it's not the only way; I've sketched in this email some variants we could explore to support any/all of:

SIMD - some kind of OpenMP-like pure-parallel loop
MISD - some kind of pin/view, publish/subscribe dataflow model
MIMD - existing task model

(And of course lonely old serial SISD, which we already do just fine)

-Graydon