On Mon, Dec 20, 2010 at 4:25 PM, Graydon Hoare <[email protected]> wrote:
>
>> 4) If the cost of atomic ref counts is still too high, perhaps all
>> allocations should be task-local unless you explicitly tag them as
>> shared (with task-local data not being allowed over channels)? This
>> seems conceptually simple and impossible to mess up. You tag shared data
>> in some special way and only for that data you pay the extra cost of
>> more expensive ref-counting. If you forget to tag something as shared,
>> then you'll get a compile-time error when you try to send it to another
>> task. Note that you're still sharing data here, so you support the
>> scenario in 1), you're just not incurring the cost for everything. I'd
>> prefer if this wasn't necessary (i.e. if refcount operations could be
>> statically or dynamically elided often enough that any immutable data
>> can be sent over a channel), but you could always make the "shared"
>> keyword a no-op in the future if that materializes.
>
>
> I'll describe a scheme I'd prefer to explore; you tell me if you think it'd
> be tolerable:
>
> We make a native data type called pinned[T]. A pinned[T] value holds a T
> value as well as a list of viewers (in C code; pinned[T] is opaque to rust
> clients). When you put a T value into a pinned[T], the system walks the data
> structure (once) and does a detachment operation (make sure each node is
> singly referenced, copying as required; necessary for the 'freeze' operator
> to work) then writes the magic constant-refcount (already existing; to
> handle compile-time constant data) into every rc field in the structure. The
> structure is now "pinned": it can be safely shared with multiple concurrent
> readers. It's more than just frozen; uninformed parties will think it's in
> read-only memory!
>
> From a given pinned[T] value, multiple view[T] values can be manufactured
> (helper function, also native). A view[T] is atomically added to the
> pinned[T] 'viewer' list, and when a pinned[T] is destructed it enters an
> "expiring" state that walks the viewer list, invalidates all inactive views,
> then waits for the last active view to end, and destructs the T. All
> view/pinned synchronization is atomic (or effectively so; at best
> carefully-reasoned lockless C code).
>
> Meanwhile, if I send a view[T] to some other thread, that thread can pull
> (via an iterator / one-shot reference-returning accessor, as Dave and
> Patrick have been discussing) an &option[T] out of the view[T]. If the
> underlying pinned[T] is dead (the view[T] has been invalidated), then the
> option[T] will come back as none. Sorry. No data to view. But if it comes back
> as &some[T](?t) then the viewing thread can work with the target 't' data
> "as though it's const". No rc traffic to keep reconciled. It's working as
> though the data is a compile-time constant in read-only memory.
>

Not sure I understand this. What happens if I grab the value from a view[T]
and then store a reference to some internal sub-part of that T (e.g. let's
say the data is a tree and I want to keep a reference to some sub-tree)? I
can't increment its ref-count, but does that mean I can't keep hold of a
reference to it at all? I'd have to assume so, since the view[T] only tracks
the root and can only give me a "none" for that, which means I must be
prohibited from keeping a reference to any sub-structure past the access?
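
To make the question concrete, here's roughly what I mean, in a Rust-like
sketch (Tree/View/get are names I've made up for this email, not your
actual API):

    struct Tree { value: u32, left: Option<Box<Tree>>, right: Option<Box<Tree>> }

    // Hypothetical stand-in for a view[T]: registered on the pinned's
    // viewer list, hands out the target for the duration of an access.
    struct View<T> { target: *const T }

    impl<T> View<T> {
        // One-shot accessor: Some(&T) while the pinned value is alive,
        // None once the view has been invalidated.
        fn get(&self) -> Option<&T> {
            unimplemented!("native code: check viewer state, return target")
        }
    }

    fn viewer(v: View<Tree>) {
        if let Some(tree) = v.get() {
            // Fine: read the tree as though it were compile-time constant data.
            println!("root = {}", tree.value);
            // The problem case: holding on to &tree.left beyond this access.
            // Its rc field is the magic constant, so nothing records that
            // I'm still pointing at it.
            let _subtree = tree.left.as_ref();
        }
    }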



> Again, as I said up top, there are multiple forms of parallelism, and I'm
> not sure it'll be necessary to force everything into the MIMD task-parallel
> model. I want to support the task-parallel variant *well*, because even when
> running serially/multiplexed, I think it's an essential ingredient in
> correctness: isolating tasks as a basic mechanism for decoupling their
> effects, isolating their failures. But it's not the only way; I've sketched
> in this email some variants we could explore to support any/all of:
>
> SIMD - some kind of OpenMP-like pure-parallel loop
>

You might want to look at Nested Data Parallel Haskell here. It has the
appealing property that it can flatten/fuse nested data parallel algorithms
for you (e.g. want to walk a binary tree? Just kick off two jobs for the two
children at each node in an almost task-parallel fashion, and the compiler
will flatten it and create one "pass" for each "level" in the tree
automatically).
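
The nested style looks something like this (a Rust-like sketch; spawning an
OS thread per node, as here, would of course be absurd, which is rather the
point: the NDP flattening is what turns this shape into one flat pass per
level for you):

    use std::thread;

    struct Tree { value: u64, left: Option<Box<Tree>>, right: Option<Box<Tree>> }

    // Naive nested parallelism: kick off two jobs per node and join them.
    fn parallel_sum(t: &Tree) -> u64 {
        thread::scope(|s| {
            let l = s.spawn(|| t.left.as_deref().map_or(0, parallel_sum));
            let r = s.spawn(|| t.right.as_deref().map_or(0, parallel_sum));
            t.value + l.join().unwrap() + r.join().unwrap()
        })
    }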

IMO the OpenCL, DirectCompute, CUDA crew are all getting this wrong. The
biggest hurdle with GPGPU isn't that you have to work through a graphics
interface (although that sucks too), which is what they keep spending all
their effort on fixing. The biggest hurdle for me has always been that you
have to manually turn your algorithms inside-out to create a series of
"flat" data parallel passes, rather than have the compiler do that work for
you. It took me something like two days to get a correct version of a simple
odd-even merge sort a few years ago (before there were code samples around
for it!). This is really, really difficult.
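
For reference, here's the kind of thing you end up writing: the standard
iterative formulation of Batcher's network, transliterated into a Rust-like
sketch (serial here; the point is that each inner i-loop is one "flat" pass
whose compare-exchanges are all independent and could run in parallel):

    // Batcher's odd-even merge sort as a series of flat passes.
    // Assumes the length is a power of two.
    fn odd_even_merge_sort(a: &mut [u32]) {
        let n = a.len();
        if n <= 1 { return; }
        assert!(n.is_power_of_two());
        let mut p = 1;
        while p < n {
            let mut k = p;
            while k >= 1 {
                let mut j = k % p;
                while j + k < n {
                    // One flat pass: every iteration is independent.
                    let imax = std::cmp::min(k - 1, n - j - k - 1);
                    for i in 0..=imax {
                        // Only compare within the same 2p-sized block.
                        if (i + j) / (2 * p) == (i + j + k) / (2 * p)
                            && a[i + j] > a[i + j + k]
                        {
                            a.swap(i + j, i + j + k);
                        }
                    }
                    j += 2 * k;
                }
                k /= 2;
            }
            p *= 2;
        }
    }

Getting those index calculations right, and convincing yourself that each
pass really is independent, is exactly the inside-out work I'd rather the
compiler did.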

I realise this is highly "researchy"/"unproven" at the moment, but it might
be worth keeping in mind so you don't make any design decisions that
would rule it out in the future.

-- 
Sebastian Sylvan
_______________________________________________
Rust-dev mailing list
[email protected]
https://mail.mozilla.org/listinfo/rust-dev
