Reference counting is generally more desirable than garbage collection, since 
it is simple and deterministic, and avoids scanning the whole heap of the 
program, which causes pauses, destroys caches, prevents effective swapping and 
requires to tolerate increasing memory usage by a multiplicative factor to 
allow GC to be amortized to be linear in the number of dead objects in the 
general case.

However, it cannot support cycles (without destroying its advantages over GC), 
so it is a good idea to statically prevent them to avoid leaks.

Currently Rust has several types of reference-counted pointers, with different 
strategies to do that:
- The @ and @mut pointer, which seem to allow cycles and cause them to leak 
until task termination, although it's sort of supposed to use a GC instead 
eventually or be removed.
- The MutexARC type, which allows cycles but requires unsafe code
- The Rc and ARC types, that avoid cycles by not allowing mutations
- The RcMut and RWARC type, that avoid cycles by not allowing rc pointers in 
the data

The issue is that none of this is optimal, since for instance it is not 
possible to have an RcMut<A> where A contains an RcMut<B> but B contains no 
pointers, even though it is perfectly safe to do so, and in general reference 
counted pointers cannot be created for some types like the A in this example.

However, there seems to be a good solution: the following rule extends both the 
"no mutations" and "no rc pointers" cases while allowing the above, and allows 
creating rc pointers to any type at the expense of forbidding some mutations:
*** If there is a cycle of pointers (of any kind, including raw, owned and 
borrowed, excluding raw pointers annotated as weak references) or enums 
containing at least one reference-counted pointer, forbid mutating all those 
pointers (even with an &mut) unless the object being assigned is the result of 
an expression that doesn't include pointers to previously-created objects (and 
thus cannot point to the pointer), or the value containing the pointer is on 
the stack or owned by a stack slot (and thus is not in an rc box or owned by 
it, and so cannot be pointed to by the assigned value) ***

Here a cycle means a cyclic sequence of structure fields such that each can 
point to an instance of the structure of the next field in the sequence.

So for instance this case would be allowed:
struct A {b: Rc<B>}
struct B {a: Option<Rc<A>>}

but the A.a and B.b fields would be immutable even with &mut A or &mut B 
pointers, and would only be mutable if the A or B structure is on the stack or 
owned from the stack (and thus has no incoming rc pointers).

On the other hand, this would be allowed with no restrictions since the 
points-to relationship of the types is acyclic:
struct A {b: RcMut<B>)
struct B {c: Rc<C>}
struct C {x: u32}

To implement this, one would need to extend the language with an annotation to 
tell the compiler that the raw pointers inside of Rc/RcMut/ARC/RWARC/etc. are 
to be treated as "reference-counted" pointers.

The compiler can then perform the static analysis required (do an SCC 
decomposition of a graph containing types, enum members and pointer fields, 
with edges from types to contained types, enum members and pointer fields, from 
pointer fields to the pointed type, from traits to all possible implementing 
types, from enum to enum member, and then forbid modification to pointers in 
any SCC with at least one
 reference-counted pointer)

There are at least two complexities: public trait pointers needs to be assumed 
to be able to point to anything (unless one can see the whole program), while 
generic types will need to be analyzed in their specialized versions, which 
means that some methods would be valid to call only for some values of generic 
type parameters, and that the analysis needs to be done from scratch globally 
for every module compilation.

In addition to all this, weak versions of all rc pointers should be supported 
(to allow weak "parent" references, for instance), which would require a 
further annotation telling the compiler that the contained unsafe pointer is 
"weak" and thus does not participate in the pointer cycle analysis.

Also, a keyword can be provided to allow the forbidden mutations at the cost of 
doing a full run-time scan of the assigned value to ensure it doesn't point to 
the object with the field being assigned, and failing or returning an error if 
the check fails (this probably needs to be explicit since it can of course be 
very expensive to do the scan)

Some structures like mutable strong doubly-linked lists would not be allowed by 
this and would instead require an ad-hoc implementation with raw pointers 
(where the list node smart pointer will increase the reference count on both 
the node, and the whole list, and splicing will be O(n)).

This infrastructure should be able to handle most programs without using 
garbage collection, except those that are interpreters of garbage-collected 
languages or implement interfaces that require garbage collection (e.g. AF_UNIX 
sockets with fd passing, where the fd can be an unix socket itself).

It's also possible that there are even better approaches in the literature or 
other languages.

                                          
_______________________________________________
Rust-dev mailing list
[email protected]
https://mail.mozilla.org/listinfo/rust-dev

Reply via email to