Reference counting is generally more desirable than garbage collection, since
it is simple and deterministic, and avoids scanning the whole heap of the
program, which causes pauses, destroys caches, prevents effective swapping and
requires to tolerate increasing memory usage by a multiplicative factor to
allow GC to be amortized to be linear in the number of dead objects in the
general case.
However, it cannot support cycles (without destroying its advantages over GC),
so it is a good idea to statically prevent them to avoid leaks.
Currently Rust has several types of reference-counted pointers, with different
strategies to do that:
- The @ and @mut pointer, which seem to allow cycles and cause them to leak
until task termination, although it's sort of supposed to use a GC instead
eventually or be removed.
- The MutexARC type, which allows cycles but requires unsafe code
- The Rc and ARC types, that avoid cycles by not allowing mutations
- The RcMut and RWARC type, that avoid cycles by not allowing rc pointers in
the data
The issue is that none of this is optimal, since for instance it is not
possible to have an RcMut<A> where A contains an RcMut<B> but B contains no
pointers, even though it is perfectly safe to do so, and in general reference
counted pointers cannot be created for some types like the A in this example.
However, there seems to be a good solution: the following rule extends both the
"no mutations" and "no rc pointers" cases while allowing the above, and allows
creating rc pointers to any type at the expense of forbidding some mutations:
*** If there is a cycle of pointers (of any kind, including raw, owned and
borrowed, excluding raw pointers annotated as weak references) or enums
containing at least one reference-counted pointer, forbid mutating all those
pointers (even with an &mut) unless the object being assigned is the result of
an expression that doesn't include pointers to previously-created objects (and
thus cannot point to the pointer), or the value containing the pointer is on
the stack or owned by a stack slot (and thus is not in an rc box or owned by
it, and so cannot be pointed to by the assigned value) ***
Here a cycle means a cyclic sequence of structure fields such that each can
point to an instance of the structure of the next field in the sequence.
So for instance this case would be allowed:
struct A {b: Rc<B>}
struct B {a: Option<Rc<A>>}
but the A.a and B.b fields would be immutable even with &mut A or &mut B
pointers, and would only be mutable if the A or B structure is on the stack or
owned from the stack (and thus has no incoming rc pointers).
On the other hand, this would be allowed with no restrictions since the
points-to relationship of the types is acyclic:
struct A {b: RcMut<B>)
struct B {c: Rc<C>}
struct C {x: u32}
To implement this, one would need to extend the language with an annotation to
tell the compiler that the raw pointers inside of Rc/RcMut/ARC/RWARC/etc. are
to be treated as "reference-counted" pointers.
The compiler can then perform the static analysis required (do an SCC
decomposition of a graph containing types, enum members and pointer fields,
with edges from types to contained types, enum members and pointer fields, from
pointer fields to the pointed type, from traits to all possible implementing
types, from enum to enum member, and then forbid modification to pointers in
any SCC with at least one
reference-counted pointer)
There are at least two complexities: public trait pointers needs to be assumed
to be able to point to anything (unless one can see the whole program), while
generic types will need to be analyzed in their specialized versions, which
means that some methods would be valid to call only for some values of generic
type parameters, and that the analysis needs to be done from scratch globally
for every module compilation.
In addition to all this, weak versions of all rc pointers should be supported
(to allow weak "parent" references, for instance), which would require a
further annotation telling the compiler that the contained unsafe pointer is
"weak" and thus does not participate in the pointer cycle analysis.
Also, a keyword can be provided to allow the forbidden mutations at the cost of
doing a full run-time scan of the assigned value to ensure it doesn't point to
the object with the field being assigned, and failing or returning an error if
the check fails (this probably needs to be explicit since it can of course be
very expensive to do the scan)
Some structures like mutable strong doubly-linked lists would not be allowed by
this and would instead require an ad-hoc implementation with raw pointers
(where the list node smart pointer will increase the reference count on both
the node, and the whole list, and splicing will be O(n)).
This infrastructure should be able to handle most programs without using
garbage collection, except those that are interpreters of garbage-collected
languages or implement interfaces that require garbage collection (e.g. AF_UNIX
sockets with fd passing, where the fd can be an unix socket itself).
It's also possible that there are even better approaches in the literature or
other languages.
_______________________________________________
Rust-dev mailing list
[email protected]
https://mail.mozilla.org/listinfo/rust-dev