A recent and very welcome pull request [1] pointed out Yet Another Ambiguity around struct syntax. If you have something like this:
    ... match x { ...
is that `match (x {})`, where `x` is the name of a struct and `x {}` is a struct literal, or `match x {`, where `x` is the variable being matched and what follows are the arms?

Before I go any further, I want to emphasize that I am not picking on the author of the pull request. As I said, it's excellent work and the author made a logical decision on how to proceed with the ambiguity. However, since it is dealing with our grammar, it seems like we should decide how to resolve this with more discussion than a review on a pull request, so I thought I'd write up an e-mail describing the issue and gather some feedback.

Now, to some extent, you can resolve this if there are fields present because the code would look like:
    ... match x { field: ...
However, this breaks down if you have empty structs, which were not allowed before but currently are. It also requires more lookahead, though only a bounded amount.
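To make the lookahead concrete, here is a minimal sketch written in present-day Rust syntax. The names `Token`, `Guess`, and `classify_brace` are hypothetical and invented for this illustration; this is not the compiler's actual lexer or parser. It only assumes that a `{` followed by `identifier :` can be read as the start of a struct literal.

```rust
/// Hypothetical token type, for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Ident(String),
    LBrace,
    RBrace,
    Colon,
    Other,
}

/// What the tokens after `match x` look like.
#[derive(Debug, PartialEq)]
enum Guess {
    StructLiteral, // `{ field: ...`
    Block,         // anything else, e.g. the arms of the match
    Ambiguous,     // `{ }`: empty struct literal or empty body?
}

/// Classify the tokens following `match x`; `rest` starts at the `{`.
/// Only a fixed number of tokens (three) is ever inspected.
fn classify_brace(rest: &[Token]) -> Guess {
    match rest {
        [Token::LBrace, Token::Ident(_), Token::Colon, ..] => Guess::StructLiteral,
        [Token::LBrace, Token::RBrace, ..] => Guess::Ambiguous,
        _ => Guess::Block,
    }
}
```

The `Ambiguous` case is exactly the empty-struct problem: no amount of peeking at field names helps when there are no fields.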

The pull request took the approach of parsing `match x {}` as an empty struct literal, so to write a match with no arms (an admittedly bizarre thing to write) one must write `match (x) {}`. This is reasonable, but I personally find it somewhat surprising that `match x { }` would not parse (...and then likely lead to an exhaustiveness checking failure).

However, this same ambiguity arises in a lot of places: if/else-if expressions, match expressions, `do` and `for` expressions, and perhaps a few others. Currently I *think* we use lookahead for field names to resolve the ambiguity that arises with struct literals, but of course this doesn't work with empty structs. I'd like it if we could resolve this in a uniform way.

I see various options:

1. Treat `Foo {}` as a struct literal, requiring parentheses to disambiguate in some cases (e.g., `if (x) {}`). This is what the pull request does.

2. Declare that `Foo { ... }` literals must always have at least one field, and use newtype structs for the empty struct case.

3. Place a parser restriction on those contexts where `{` terminates the expression and say that struct literals cannot appear there unless they are in parentheses.

Some details follow.

### Treat `Foo {}` as a struct literal

I don't have anything more to say about this approach. =)

### Treat empty structs the way we treat enum variants?

Perhaps we should just not parse a declaration like:
    struct X {}
Instead, one would write something like:
    struct X;
or
    struct X();
Much as you write
    enum Foo { Y }
This would be a "new-type" struct so X would also serve as a value, just like the constant `Y` in the enum case. This would mean that one never writes a struct literal `Foo {}` but instead just `Foo`.
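As a rough sketch of how code would read under this option, written in present-day Rust syntax and assuming the brace-free `struct X;` declaration form from above (`make_x` and `is_x` are hypothetical helper names for illustration):

```rust
// Empty struct declared without braces (assumed form from this option).
#[derive(Debug, PartialEq)]
struct X;

fn make_x() -> X {
    // The type name doubles as the value; no `X {}` literal is written.
    X
}

fn is_x(x: X) -> bool {
    // With no empty-brace literal in the language, the `{` after `x`
    // can only open the body of the match.
    match x {
        X => true,
    }
}
```

Since a braced literal would always contain at least one field, a `{` immediately followed by `}` would never be a struct literal.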

### Restrict where struct literals can appear

We could also just have a subclass of expressions that can appear in `if`, `do`, etc. This subclass would not permit struct literals. That means that something like `if Foo { x: 10 }.is_true() { ... }` would have to be written `if (Foo { x: 10 }.is_true()) { ... }`. This rule implies that very little lookahead is needed. Such rules can be a pain for the pretty printer, however. To some extent we already have a rule like this for `do` and `for`, since we will parse:
    ...for x.each |y...
as a method call with one argument and not `(x.each | y)`. Since this rule would presumably not apply to `if` etc., there would actually be three classes of expressions: those that can appear in `if`, those that can appear in `do`/`for`, and full expressions.
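To illustrate the restriction, here is a toy recursive-descent sketch in present-day Rust syntax. Everything in it (`Tok`, `Expr`, `Parser`, the `allow_struct_lit` flag) is hypothetical and made up for this example; it is not how the real parser is written, and it only handles identifiers, parentheses, and the empty literal `Name {}`.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Tok { Ident(String), LBrace, RBrace, LParen, RParen }

#[derive(Debug, PartialEq)]
enum Expr {
    Path(String),      // a plain name such as `x`
    StructLit(String), // `Name {}` (fields omitted for brevity)
    Paren(Box<Expr>),  // `( expr )`
}

struct Parser { toks: Vec<Tok>, pos: usize }

impl Parser {
    fn peek(&self) -> Option<&Tok> { self.toks.get(self.pos) }

    /// Parse one expression. `allow_struct_lit` is false in positions
    /// where a `{` ends the expression (the head of `if`, `match`, ...).
    fn parse_expr(&mut self, allow_struct_lit: bool) -> Option<Expr> {
        let tok = self.peek()?.clone();
        match tok {
            Tok::LParen => {
                self.pos += 1;
                // Parentheses lift the restriction.
                let inner = self.parse_expr(true)?;
                if self.peek() != Some(&Tok::RParen) { return None; }
                self.pos += 1;
                Some(Expr::Paren(Box::new(inner)))
            }
            Tok::Ident(name) => {
                self.pos += 1;
                if allow_struct_lit
                    && self.peek() == Some(&Tok::LBrace)
                    && self.toks.get(self.pos + 1) == Some(&Tok::RBrace)
                {
                    self.pos += 2;
                    Some(Expr::StructLit(name))
                } else {
                    Some(Expr::Path(name))
                }
            }
            _ => None,
        }
    }

    /// Parse the `<expr>` in `match <expr> { ... }`, stopping at the `{`
    /// that opens the body.
    fn parse_match_head(&mut self) -> Option<Expr> {
        let head = self.parse_expr(false)?;
        if self.peek() == Some(&Tok::LBrace) { Some(head) } else { None }
    }
}
```

In this sketch, the tokens for `match x {}` yield the head `Path("x")` with the braces left over as the body, while `match (x {}) {}` yields `Paren(StructLit("x"))`. The decision depends only on the flag, so the head position needs no lookahead at all.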

### My personal opinion

I started out preferring the final option, but I am now leaning towards option #2, which seems to simplify the grammar overall and still requires only fixed lookahead to disambiguate.


Niko

[1] https://github.com/mozilla/rust/pull/5137
_______________________________________________
Rust-dev mailing list
https://mail.mozilla.org/listinfo/rust-dev
