This email is primarily for Marijn, as he's adopting the "alias analysis
pass" task (https://github.com/graydon/rust/issues/408) from the
release-milestone list for now, but I figured I'd write to the list so
everyone else involved can see what we're thinking (and offer comments
or suggestions) too.
The issue here is that Rust's current alias (&) slots permit
circumvention of ... most other rules in the system (type safety, memory
safety, type-state safety). Even outside any proposed "unsafe" sub-dialect.
This email is not fishing for recommendation of "how to get rid of
alias" (though we may rename it "reference" to be more C++-like); rather
it's a discussion of "the rules we will enforce on aliases to make them
safe". The entire point of & is to be "a safe version of & in C++", and
it's not quite safe enough yet. Needs a bit more safety. Unfortunately
with pointers, well, you know how it is: even "a little bit unsafe"
means "can saw a hole in the universe". So & is presently.
The task of an alias-analysis pass is to reject programs it can't prove
to be safely using &. It might additionally involve injecting
safety-ensuring code of sorts I'll discuss below.
First, how things can go wrong: if an alias points into a mutable memory
structure (or substructure of a mutable) that is simultaneously
reachable via any other path (alias or otherwise), then code that
mutates the one may invalidate the other. Invalidation may mean:
- The alias may point to memory that assumes a given typestate
predicate, which may no longer hold (by the mutation).
- This includes the 'init' predicate; in other words the alias
may point to memory that was dropped, causing a UMR.
- The alias may point to tag-variant X which has been changed to
tag-variant Y underfoot, causing data corruption or a UMR.
There may be other cases, but these at least spring to mind. We have to
prohibit all of these. Some of us have discussed this in the past (this
hazard has been evident from the beginning) and have been assuming an
analysis pass should be able to reject (conservatively) such cases.
The strategy we discussed was to attempt to prove pairwise disjointness
of all non-identical lvals (named points-of-reference through which a
read or write may be made). So this means building a list of N*N
pairings for the N lvals in a function, and considering each pair
independently. With something like the following "main" rules:
- A pair of *identical* lvals (resolve to the same def_id) are ok.
- A pair of lvals that point to fully disjoint types (neither type is
a substructure of the other) are ok.
- A pair of lvals that point to immutable types are ok.
- A pair of lvals where neither is rooted in an alias is ok.
- A pair of lvals where either or both are incoming arguments to a
function are ok.
The first few of these are obvious; the last point is the tricky one and
requires some explanation (as well as support machinery). Aliases are
only formed at function-call sites, so there is a sense in which the
incoming aliases to a function are the "induction hypothesis" for all
subsequent judgments in the body of the function: things that we
*assume* hold on the aliases we get, and that we must *prove* to hold
over the aliases we form. We thought of four possible rule-sets
governing the alias formation:
1. We permit a caller to form or pass aliases that point into
other passed arguments, and rely on the first 4 rules in the callee
to check all actions with those aliases (as well as formation of
new ones) are safe.
2. We prohibit callers forming or passing an alias that may point
into another passed argument, making the 5th rule hold in the
callee and rejecting programs that form maybe-overlapping alias
arguments.
3. We permit ruels #1 and #2 but differentiate them by something
like the C99 'restrict' keyword. The 5th "main" rule works for
restricted aliases but not unrestricted aliases; the compiler
must prove them disjoint using only the first four rules.
Restricted-ness becomes a part of the type signature so the caller
knows what it has to prove as well.
4. We permit cases #1 and #2 but provide a "distinguish X" operation
that *dynamically* ensures that X (an alias) doesn't point into
any of the other lvals it's possibly non-disjoint from in the
current scope (under the first four "main" rules). This can either
be automatically injected or manually written by users.
We considered all four options governing formation, and tentatively
settled on the 2nd, IIRC, but the other 3 options remain possible here
as well. In any case we should build one and see how usable or
cognitively burdensome it is.
-Graydon
_______________________________________________
Rust-dev mailing list
[email protected]
https://mail.mozilla.org/listinfo/rust-dev