On 03/06/2011 7:29 AM, Marijn Haverbeke wrote:
[This is just a rambling e-mail outlining some problems I'm running
into. Though I am stressing these problems to make sure they are not
glossed over, I'm *not* suggesting we give up aliases or anything like
that.]
Much appreciated; if something's to be shot down, earlier is better!
It's already embarrassingly late in the game to be working out these
rules in full. Sadly.
The issue I'm running into is that obj types*, function types, and
type parameters are 'opaque' and, as such, can contain everything.
Ouch. This is quite a wrinkle.
This means that any boxed value returned from a function that took a
stateful obj, parameterized type, or function type (or something that
contains one!) must be suspected of being reachable from that object.
Suspected of, yes. Let's work on eliminating the suspicion. Or limiting
its severity.
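
A minimal sketch of where the suspicion comes from, written in present-day
Rust rather than the syntax of this thread. Everything here is an assumption
made for illustration: the function name lookup is hypothetical, a closure
stands in for an opaque fn type, and Rc<RefCell<...>> stands in for a shared
box holding mutable state.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Takes an opaque function value and returns a shared box.
// Nothing in this signature says whether the result is freshly
// allocated or something the closure was already holding on to.
fn lookup(source: &dyn Fn() -> Rc<RefCell<Vec<i32>>>) -> Rc<RefCell<Vec<i32>>> {
    source()
}

fn main() {
    let hidden = Rc::new(RefCell::new(vec![1, 2, 3]));

    // The closure captures a second handle to the same shared box.
    let captured = Rc::clone(&hidden);
    let opaque = move || Rc::clone(&captured);

    // The caller only sees "a box came back from a function that took
    // an opaque value", yet it is the very box the closure contained.
    let returned = lookup(&opaque);
    assert!(Rc::ptr_eq(&returned, &hidden));

    // Mutation through the returned box is visible through the original.
    returned.borrow_mut().push(4);
    assert_eq!(hidden.borrow().len(), 4);
}
```

From the type of lookup alone, a checker has to assume the worst case shown
in main.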
If we go with function parameter aliasing solution #2 (not
allowed to pass aliasing things) then any function in such a
context-passing module that takes both a context and an alias cannot
be called on any boxed (or box-containing) value that was returned
from another context-taking function. This seems like it'll invalidate
a serious percentage of the code in the current compiler, with no
obvious way to 'fix' it.
Possibly. I mean, your analysis is correct about ways it can go wrong,
but I'm not sure it's always going to be a pervasive problem, or
unfixable. I'd like to continue with solution #2 (let's call this the
"strong induction hypothesis" solution) for a moment and consider ways
of changing the code or helping it reveal its safety:
- When we have down-fn-args (in the form of lambda blocks) we will
be able to turn many-or-most obj field accessor methods into
iter-like constructs, yes? Like, what we do now as:
obj foo { fn get_bar() -> bar; }
may well turn into:
obj foo { fn with_bar(fn(&bar) &f); }
such that clients stop writing:
my_foo.get_bar().do_a_thing();
and start writing:
my_foo.with_bar() {| b | b.do_a_thing(); }
which carries the pleasant performance allowance of letting the
obj keep its bar member either allocated inline or held as a
unique box (which can only alias with another alias, not a
shared box). If this turns out to be very common, consider whether
we want an attribute-like (getter/setter pair) syntax for objs.
- Along those lines: consider what happens when we have unique boxes
in general, and whether returning "shared box" from a function
will be quite so common an operation. If the function's job is to
*construct* values of type foo, then even if boxed it makes much
more sense to return ~foo than @foo since, at the time of function
return, the out-pointer is (probably) the sole reference anyways.
- Further into the unique-ownership line of thinking: when there's a
kind system up and running (which you may find a necessary component
of formulating this analysis properly -- they're closely related!)
it might be possible to constrain an opaque type to
be "unknown but tree-shaped" (not containing shared pointers).
We have discussed always considering obj and fn types as opaque
to the kind system too (and assuming the worst about them) but
perhaps this is too loose. What would happen to the issue if we
could say "this obj type only has tree-kind memory inside it"?
Or further, given the depth of the hazard here: what if we
*required* that for all obj and fn types? Would shared boxes lose
all utility? Would too many idioms stop working?
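
The first two bullets can be sketched in present-day Rust (not the 2011
syntax used above). The names Foo, Bar and make_foo are illustrative
assumptions, a closure plays the role of the lambda block, and Box plays the
role of a unique box (~foo); this is one possible reading of the idiom, not
a definitive design.

```rust
// Hypothetical types, chosen only to mirror foo/bar in the text above.
struct Bar {
    count: u32,
}

impl Bar {
    fn do_a_thing(&self) -> u32 {
        self.count + 1
    }
}

struct Foo {
    // Held inline: no shared box is needed just to expose the member.
    bar: Bar,
}

impl Foo {
    // Instead of handing a bar back to the caller, lend it to the
    // caller's block for the duration of this call only.
    fn with_bar<R>(&self, f: impl FnOnce(&Bar) -> R) -> R {
        f(&self.bar)
    }
}

// A constructor returns a uniquely owned box: at the moment of return,
// the result is the sole reference to the new value.
fn make_foo(count: u32) -> Box<Foo> {
    Box::new(Foo { bar: Bar { count } })
}

fn main() {
    let my_foo = make_foo(41);
    let result = my_foo.with_bar(|b| b.do_a_thing());
    println!("{}", result); // prints 42
}
```

Because the reference to bar cannot outlive the call to with_bar, the
caller never ends up holding a box that might be reachable from my_foo.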
Going with solution #1 (you may pass aliasing aliases to functions)
instead, we'd be in a situation where an obj or parameterized argument
may alias with every alias passed in, which means that after passing
said obj or parameterized argument to any function you can no longer
be sure your alias is still valid. This is worse than the situation
described above.
This is true, and a reason why I still prefer #2. Though again, it may
be mitigated by some of the "different idioms, focused on uniqueness"
stuff I discuss above. Still, I feel like the weaker induction
hypothesis will make too much stuff fall apart; gut feeling but the best
I have to go on. I'd like to try to get #2 to hold together.
A 'distinguish' operation would provide a way out, but if I understand
what you're proposing correctly it'll traverse the values at run-time.
Proving two big (maybe even cyclic) data structures don't share
structure is an arbitrarily expensive operation.
Agreed, this is a ... burdensome operation. Various versions might work
but they all feel like fixing the wrong problem. And adding cognitive
burden. And runtime landmines, as you say.
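
To show why such a check is costly, here is a rough sketch of what a
run-time 'distinguish' might have to do, in present-day Rust. No concrete
design is given in this thread, so the Node type and the function names
reachable and distinct are all hypothetical. The check walks both
structures in full and tracks visited addresses so that cycles terminate.

```rust
use std::cell::RefCell;
use std::collections::HashSet;
use std::rc::Rc;

// Hypothetical node type: shared, possibly cyclic, children.
struct Node {
    children: RefCell<Vec<Rc<Node>>>,
}

fn new_node() -> Rc<Node> {
    Rc::new(Node { children: RefCell::new(Vec::new()) })
}

// Collect the address of every node reachable from `root`.
// The `seen` set doubles as the cycle guard.
fn reachable(root: &Rc<Node>) -> HashSet<*const Node> {
    let mut seen = HashSet::new();
    let mut stack = vec![Rc::clone(root)];
    while let Some(n) = stack.pop() {
        if seen.insert(Rc::as_ptr(&n)) {
            for c in n.children.borrow().iter() {
                stack.push(Rc::clone(c));
            }
        }
    }
    seen
}

// True only if the two structures share no node at all.
// Cost grows with the total size of both structures.
fn distinct(a: &Rc<Node>, b: &Rc<Node>) -> bool {
    reachable(a).is_disjoint(&reachable(b))
}

fn main() {
    let shared = new_node();
    let a = new_node();
    let b = new_node();
    a.children.borrow_mut().push(Rc::clone(&shared));
    b.children.borrow_mut().push(Rc::clone(&shared));

    // A self-referential node, to exercise the cycle guard.
    let c = new_node();
    c.children.borrow_mut().push(Rc::clone(&c));

    assert!(!distinct(&a, &b)); // both reach `shared`
    assert!(distinct(&a, &c)); // no node in common
}
```

Even this simple version visits every reachable node of both arguments on
each call, which is the expense objected to above.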
(Seems we're once again entering uncharted territory. As with the
effects system, that's always dangerous.)
True, and fair point. It's not *wholly* uncharted -- C and C++ both have
a variety of alias-analysis-driven *optimizations* -- but the new part
is making one of those analysis passes airtight enough to serve as a
safety guarantee.
*) Here I mean types defined like 'type foo = obj { ... }' rather than
'obj foo() { ... }'. I saw someone claiming the former syntax was
invalid in the IRC logs this week (it's not), so maybe the distinction
is not widely understood.
The latter implies an in-place definition of the former as well. But
yes, both are valid.
-Graydon