On 20/12/2010 9:18 AM, Sebastian Sylvan wrote:
> Not sure I understand this. What happens if I grab the value from a view[T], and then store a reference to some internal sub-part of that T (e.g. let's say the data is a tree, and I want to keep a reference to some sub-tree)? I can't increment its ref-count, but does that mean I can't keep hold of a reference to it at all? I'd have to assume so since the view[T] only tracks the root and can only give me a "none" for that, which means I must be prohibited from taking a reference to any sub-structure?
There are a few options. One is that we develop a linear (non-copyable) layer, whose semantics Patrick's been working on (it's useful for modeling resources and dtors), and your view data is one of those. No heap-copies allowed, only aliases. Another option is that each pin 'view' acquired by the viewer actually survives until its *domain* dies (for one-shot worker threads, this is plausible). Another is that you use a 'pinned' magic rc that differs from the 'const' magic rc, and causes a local copy (again: detachment) when you do a heap-store of a pointer to a pinned box (aliases are free, though).
As with optimizing rc traffic, there's a pretty wide variety of tricks here, none of which is particularly painful to try out and measure.
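
To make the first option a bit more concrete, here is a minimal, purely illustrative sketch (the ViewGuard name and the syntax are made up for this example; nothing like it exists in the compiler) of a non-copyable view whose contents you can alias freely but never heap-copy or outlive:

    // Illustrative sketch only: a non-copyable "view" guard. It implements
    // neither Copy nor Clone, so the only way to hand the data around is by
    // aliasing it, and every alias is tied to the guard's lifetime.
    struct ViewGuard<'a, T> {
        data: &'a T, // alias into the pinned/shared structure
    }

    impl<'a, T> ViewGuard<'a, T> {
        fn get(&self) -> &T {
            self.data
        }
    }

    fn main() {
        let tree = vec![1, 2, 3]; // stand-in for the shared tree
        let guard = ViewGuard { data: &tree };
        let subtree = &guard.get()[1..]; // aliasing sub-structure is fine
        println!("{:?}", subtree);
        // There is no way to clone the guard or stash `subtree` somewhere
        // that outlives `tree`: "no heap-copies allowed, only aliases".
    }

The 'pinned rc' variant would instead force the copy at the point of the heap-store, but the aliasing story is the same.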
> You might want to look at Nested Data Parallel Haskell here. It has the appealing property that it can flatten/fuse nested data parallel algorithms for you (e.g. want to walk a binary tree? Just kick off two jobs for the two children in each node in an almost task-parallel fashion, and the compiler will flatten it and create one "pass" for each "level" in the tree automatically).
Very different evaluation model, deep changes to semantics, etc. I don't disagree with the research programme -- and it might keep getting better, attracting programmers, etc. -- but it's not the path I want to follow.
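
For what it's worth, here is a hand-rolled sketch (illustrative Rust-ish code with hypothetical names, nothing to do with DPH's actual output) of what that flattening amounts to: a nested per-node tree walk rewritten as one flat pass per level, which is the rewrite the NDP compiler is supposed to derive for you.

    // Illustrative sketch only: summing a tree one level at a time.
    // Each iteration of the while loop is a single "flat" pass over all
    // nodes at that depth; DPH-style flattening derives this shape
    // automatically from the naive recursive walk.
    struct Node {
        value: i32,
        children: Vec<Node>,
    }

    fn sum_by_level(root: &Node) -> i32 {
        let mut total = 0;
        let mut level: Vec<&Node> = vec![root];
        while !level.is_empty() {
            // Flat pass 1: reduce this level's values.
            total += level.iter().map(|n| n.value).sum::<i32>();
            // Flat pass 2: gather the next level.
            level = level.iter().flat_map(|n| n.children.iter()).collect();
        }
        total
    }

    fn main() {
        let tree = Node {
            value: 1,
            children: vec![
                Node { value: 2, children: vec![] },
                Node { value: 3, children: vec![] },
            ],
        };
        println!("{}", sum_by_level(&tree)); // prints 6
    }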
> IMO the OpenCL, DirectCompute, CUDA crew are all getting this wrong. The biggest hurdle with GPGPU isn't that you have to work through a graphics interface (although that sucks too), which is what they keep spending all their effort on fixing. The biggest hurdle for me has always been that you have to manually turn your algorithms inside-out to create a series of "flat" data parallel passes, rather than have the compiler do that work for you. It took me something like two days to get a correct version of a simple odd-even merge sort a few years ago (before there were code samples around for it!). This is really, really difficult.
I don't know how to keep some programming problems from remaining hard for the programmer, and I'm not (yet) aware of completely satisfactory ways to automate that sort of reasoning. We've been waiting for the "magical parallelizing compiler" for 40 years or so. The compilers keep getting better, but there always seems to be a pretty hard limit to the tech.
Maybe the compilers aren't even the issue, and some kind of Reduceron-style Haskell-hardware monster will get us there! I'd love to wind up wrong on some of these tech bets and land in a future with "easy parallelism", because that alternative future would be very pretty. But I'm still betting on the more likely (to me) path: a heterogeneous collection of techniques that ease, but do not "make easy", the parallelism problem.
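
For a taste of what that "inside-out" rewriting looks like, here is a small sketch (illustrative Rust-ish code; it uses the simpler odd-even transposition sort as a stand-in for the Batcher odd-even merge sort mentioned above) of a sort expressed as a series of flat passes, each of which only touches independent pairs and so could run as one data-parallel step:

    // Illustrative sketch only: odd-even transposition sort written as a
    // sequence of flat passes. Within one pass every compare-and-swap
    // touches a disjoint pair, so each pass is a flat data-parallel step;
    // the programmer did the restructuring, not the compiler.
    fn odd_even_sort(data: &mut [i32]) {
        let n = data.len();
        for pass in 0..n {
            let mut i = pass % 2; // alternate between even and odd pairs
            while i + 1 < n {
                if data[i] > data[i + 1] {
                    data.swap(i, i + 1);
                }
                i += 2;
            }
        }
    }

    fn main() {
        let mut v = vec![5, 3, 8, 1, 9, 2];
        odd_even_sort(&mut v);
        println!("{:?}", v); // [1, 2, 3, 5, 8, 9]
    }

Each pass is embarrassingly parallel; the hard part is discovering that structure by hand in the first place.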
> I realise this is highly "researchy"/"unproven" at the moment, but it might be worth keeping it in mind so you don't make any design decisions that would rule it out in the future.
Nah, not ruling it out. I just don't like depending on too many hypotheticals. If they arise and work well (and are simple, well understood and patent-free), a good research result is hard to turn down :)
-Graydon
