On Sun, Jun 5, 2011 at 9:45 PM, Patrick Walton <[email protected]> wrote:

> Hi everyone,
>
> This is a proposal that I've been kicking around for maybe a month now, and
> last week I talked it over with Dave and Paul.
>
> The idea is to have record types be nominal, just as tags are. In this
> scheme, record types are declared using a syntax akin to that used to
> declare tags:
>
> rec point3 {
>    x: int;
>    y: int;
>    z: int;
> }
>
> Here "point3" becomes the name of the record. To construct an instance of
> this record, the user does the following:
>
> import foo::point3;     // if not already in scope
> auto pt = { x: 10, y: 20, z: 30 };
>
> Or, if either the import is not desired or the combination of field names
> is ambiguous:
>
> auto a = rec foo::point3 { x: 10, y: 20, z: 30 };
>
> (The leading "rec" makes parsing easier. Might not be needed.)
>
> It's valid to have two records in scope with overlapping field names. The
> combination of field names is used to determine which record is meant when
> record literal syntax is used.
>
> rec point2 {
>    x: int;
>    y: int;
> }
>
> import foo::point2;                     // if not already in scope
> auto b = { x: 10, y: 20 };              // constructs a point2
> auto c = { x: 10, y: 20, z: 30 };       // constructs a point3
>
> Selecting a field from a record requires neither the record name nor the
> module the record is declared in to be specified:
>
> log b.x;        // just works
> log c.x;        // works too
>
> OCaml requires that field names be unique and that record fields be fully
> qualified if not in scope so that its type inference engine can uniquely
> determine a type for the LHS of a field expression ("b" in "b.x" above). In
> Rust, this is not needed because we require that the LHS of a field
> expression already have a fully-resolved type by the time we encounter it
> during typechecking. Importantly, this is not a new restriction; automatic
> dereference demands this rule already.
>
> Record constructors wouldn't require that the fields are supplied in the
> same order that the record declaration specifies. The declaration of the
> record supplies the canonical ordering for memory layout purposes. For
> example:
>
> auto pt4 = { z: 30, y: 20, x: 10 }; // constructs a value identical to
>                                     // "pt" above
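
The order-independence described here has the same shape in present-day Rust, where `struct` plays the role of the proposed `rec`. A minimal sketch, assuming modern syntax rather than the 2011 syntax used in this thread; `Point3` and the two helper functions are illustrative names:

```rust
// Present-day `struct` stands in for the proposed `rec` (an assumption of
// this sketch; the thread predates this syntax).
#[derive(Debug, PartialEq)]
struct Point3 {
    x: i32,
    y: i32,
    z: i32,
}

// Fields written in the same order as the declaration.
fn in_declared_order() -> Point3 {
    Point3 { x: 10, y: 20, z: 30 }
}

// Same fields, written in reverse order in the literal.
fn in_reverse_order() -> Point3 {
    Point3 { z: 30, y: 20, x: 10 }
}
```

Both functions produce equal values. Unlike the proposal, modern Rust always names the type in the literal instead of inferring it from the set of field names.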
>
> Now there are a few obvious drawbacks with this proposal:
>
> (1) Anonymous records are no longer allowed. All records must have their
> types declared up front, potentially increasing programmer burden.
>
> (2) Ad-hoc sharing of records is no longer possible; if module A defines a
> "point3" with fields { x: int, y: int, z: int } and module B independently
> defines a "point3" with the same fields, the two modules no longer export
> compatible types.
>
> (3) If two records are in scope and all their field names and types are
> identical, extra work is required to disambiguate them.
>
> However, there are a number of benefits, roughly in decreasing order of
> importance:
>
> (1) Recursive records are now easy to handle without having to create a tag
> in between. Paul encountered an issue recently in which a record was unable
> to contain a function that took the same record as an argument. The
> workaround--to create a singleton tag--is somewhat awkward and requires the
> creation of helper functions to make it usable. I imagine that this isn't the
> last time we'll be in this situation.
>
> (2) Ordering of fields in record constructors is no longer significant.
> This simplifies maintenance; for example, a programmer could experiment with
> different memory layouts for a record to see which yields the best
> performance without having to rewrite every record literal. It also means
> that the cognitive overhead of remembering the right order for record fields
> is reduced.
>
> (3) Type errors are more helpful. A record with the wrong types, for
> example, generates an error immediately at the site of construction instead of
> farther down. Moreover, no complicated diffing logic is needed to make type
> mismatches between large record types sensible to the user.
>
> (4) Typechecking should speed up significantly. Much of the time spent in
> typechecking is spent unifying large record types.
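
Benefit (1) can be sketched in present-day Rust, whose `struct` types are nominal: a record may hold a function that takes the same record as an argument, with no intermediate tag. The names `Counter`, `add_one`, and `advance` are hypothetical, chosen only for illustration:

```rust
// A nominal record may mention its own name inside a field type.
struct Counter {
    value: i32,
    // A function that takes the same record type as its argument.
    step: fn(&Counter) -> i32,
}

// One possible `step` function.
fn add_one(c: &Counter) -> i32 {
    c.value + 1
}

// Apply the record's own function to itself to build the next record.
fn advance(c: &Counter) -> Counter {
    Counter { value: (c.step)(c), step: c.step }
}
```

For example, `advance(&Counter { value: 1, step: add_one })` yields a `Counter` whose `value` is 2.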
>
> And in practice I think that the drawbacks mentioned above are not
> significant:
>
> (1) In Rust, truly anonymous record types seem to hardly ever be used in
> practice. Every record I know of in the standard library and in rustc has an
> associated typedef. This is due to the fact that functions require type
> annotations; sooner or later practically every record type that gets used
> tends to end up as part of the signature of a function, at which point its
> type must be specified in full. So, in practice, requiring the programmer to
> specify the types of every field up front is no more of a burden than the
> status quo.
>
> (2) Ad-hoc sharing of records seems rare to me, and we have tuples for
> that. In fact, I think simple "point"-like types, which are the ones in
> which ad-hoc sharing is most common, may well be better specified as tuples
> for this exact reason. Tuples are less fragile than records anyway; in the
> current scheme, { x: int, y: int } and { x: int, y: int } exported by two
> modules happen to be type-compatible, but what if the two modules used { x:
> int, y: int } and { xcoord: int, ycoord: int } instead? Tuples don't have
> this problem, so it seems to me that most of the cases in which ad-hoc
> sharing is desired would be better served by using tuple types instead.
>
> (3) Having two identically-structured records in scope does require extra
> work to disambiguate. But this is no worse than having functions with
> identical names in scope. It's a hazard to be sure, but I suspect it'll be
> rare enough for the benefits to outweigh this drawback.
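
The trade-off in point (2) can be illustrated in present-day Rust, where tuples are structural and `struct` types are nominal. This is only a sketch under that assumption; the modules `a` and `b` and every function name are hypothetical:

```rust
mod a {
    pub struct Point2 { pub x: i32, pub y: i32 }
    pub fn corner() -> (i32, i32) { (3, -4) }
}

mod b {
    // Same field names and types as a::Point2, but a distinct nominal type.
    pub struct Point2 { pub x: i32, pub y: i32 }
    pub fn manhattan(p: (i32, i32)) -> i32 { p.0.abs() + p.1.abs() }
    pub fn manhattan_rec(p: &Point2) -> i32 { p.x.abs() + p.y.abs() }
}

// Tuples are shared ad hoc: a's result feeds b's function directly.
fn shared_via_tuple() -> i32 {
    b::manhattan(a::corner())
}

// Passing an a::Point2 to b::manhattan_rec would not typecheck, so the
// value is converted field by field.
fn shared_via_conversion(p: a::Point2) -> i32 {
    b::manhattan_rec(&b::Point2 { x: p.x, y: p.y })
}
```

The explicit conversion is the cost named in drawback (2); the tuple version shows the ad-hoc sharing that remains available.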
>
> Anyway, that's quite enough for one email. Opinions?
>
I am sad to see anonymous records go since I use them frequently in my ML
and JavaScript programs where giving a type a name doesn't make sense
(perhaps if there's no type inference at function boundaries this doesn't
matter). Tuples obviously make sense in the vector example you gave (which
is what I usually point to as a use case for anonymous records) but tuples
don't handle the cases where the type changes over time (specifically
dropping fields). Also, sometimes it's just awkward to give a name to a type;
this is one of my frustrations with C-like languages.

Would it be possible to support both named and anonymous record types? For
C0, I elaborated its nominal records into structural ones by inserting a
unique field derived from the name into the definition. Perhaps something
like this would work for Rust?
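
A minimal sketch of that kind of elaboration, assuming a toy type representation. The `Ty` enum, the helper names, and the `__brand_` prefix are hypothetical and are not taken from C0 or rustc:

```rust
use std::collections::BTreeMap;

// A toy type language with just enough to show the idea.
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int,
    Unit,
    // Structural record: equal when the same field names map to equal types.
    Record(BTreeMap<String, Ty>),
}

// Build an anonymous (purely structural) record type.
fn record(fields: &[(&str, Ty)]) -> Ty {
    Ty::Record(fields.iter().map(|(n, t)| (n.to_string(), t.clone())).collect())
}

// Elaborate a named record declaration into a structural record by adding
// one extra field whose name is derived from the declared name.
fn elaborate(name: &str, fields: &[(&str, Ty)]) -> Ty {
    let mut map: BTreeMap<String, Ty> =
        fields.iter().map(|(n, t)| (n.to_string(), t.clone())).collect();
    map.insert(format!("__brand_{}", name), Ty::Unit);
    Ty::Record(map)
}
```

With this encoding, two named records with identical fields elaborate to different structural types, while anonymous records with the same fields stay equal regardless of field order.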

-Rob
_______________________________________________
Rust-dev mailing list
[email protected]
https://mail.mozilla.org/listinfo/rust-dev
