A recent and very welcome pull request [1] pointed out Yet Another
Ambiguity around struct syntax. If you have something like this:

    ... match x { ...

is that `match (x {})`, where `x {}` is a struct literal, or
`match x {` where `x` is the variable being matched and what follows are
the arms?

Before I go any further, I want to emphasize that I am not picking on
the author of the pull request. As I said, it's excellent work and the
author made a logical decision on how to proceed with the ambiguity.
However, since it is dealing with our grammar, it seems like we should
decide how to resolve this with more discussion than a review on a pull
request, so I thought I'd write up an e-mail describing the issue and
gather some feedback.

Now, to some extent, you can resolve this if there are fields present
because the code would look like:

    ... match x { field: ...

However, this breaks down if you have empty structs, which did not use
to be allowed but now are. It also requires more lookahead, though only
a fixed amount.

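
As a rough sketch of the lookahead described above (the `Tok` and `Guess` types and the `classify` function are invented for illustration and are not taken from the actual parser), one could peek at the tokens following `match x`:

```rust
// Hypothetical token type; a real lexer would carry far more detail.
#[derive(Debug, PartialEq)]
enum Tok {
    Ident,
    LBrace,
    RBrace,
    Colon,
    Other,
}

// What the parser concludes about the `{` that follows `match x`.
#[derive(Debug, PartialEq)]
enum Guess {
    StructLiteral,
    Arms,
    Ambiguous,
}

// `rest` holds the tokens that follow `match x`, starting at the `{`.
fn classify(rest: &[Tok]) -> Guess {
    match rest {
        // `{ field: ...` is taken to start a struct literal.
        [Tok::LBrace, Tok::Ident, Tok::Colon, ..] => Guess::StructLiteral,
        // `{ }` could be an empty struct literal or an empty list of arms.
        [Tok::LBrace, Tok::RBrace, ..] => Guess::Ambiguous,
        // Anything else is assumed to be the start of the match arms.
        _ => Guess::Arms,
    }
}

fn main() {
    println!("{:?}", classify(&[Tok::LBrace, Tok::Ident, Tok::Colon]));
    println!("{:?}", classify(&[Tok::LBrace, Tok::RBrace]));
    println!("{:?}", classify(&[Tok::LBrace, Tok::Ident, Tok::Other]));
}
```

The sketch needs at most three tokens of lookahead, and the empty-braces case is exactly where it cannot decide.
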
The pull request took the approach of parsing `match x {}` as an empty
struct literal, so to write a match with no arms (an admittedly
bizarre thing to write) one must write `match (x) {}`. This is
reasonable, but I personally find it somewhat surprising that `match x {
}` would not parse (...and then likely lead to an exhaustiveness
checking failure).

However, this same ambiguity arises in a lot of places: if/else-if
expressions, match expressions, `do` and `for` expressions, and perhaps
a few others. Currently I *think* we use lookahead for field names to
resolve the ambiguity that arises with struct literals, but of course
this doesn't work with empty structs. I'd like it if we could resolve
this in a uniform way.

I see various options:

1. Treat `Foo {}` as a struct literal, requiring parentheses to
   disambiguate in some cases (e.g., `if (x) {}`). This is what the pull
   request does.
2. Declare that `Foo { ... }` literals must always have at least one
   field, and use newtype structs for the empty struct case.
3. Place a parser restriction on those contexts where `{` terminates the
   expression and say that struct literals cannot appear there unless they
   are in parentheses.

Some details follow.

### Treat `Foo {}` as a struct literal
I don't have anything more to say about this approach. =)
### Treat empty structs the way we treat enum variants?
Perhaps we should just not parse a declaration like:

    struct X {}

Instead, one would write something like:

    struct X;

or

    struct X();

much as you write

    enum Foo { Y }

This would be a "newtype" struct, so `X` would also serve as a value, just
like the constant `Y` in the enum case. This would mean that one never
writes a struct literal `Foo {}` but instead just `Foo`.
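
A small sketch of what this option looks like in use, written in present-day Rust syntax rather than the syntax of the time (so the variant is spelled `Foo::Y`); the `describe` function is a hypothetical helper added only for illustration:

```rust
// Empty struct declared without braces, as proposed above.
#[derive(Debug, PartialEq)]
struct X;

// The enum from the text; `Y` is a variant that is also a value.
#[derive(Debug, PartialEq)]
enum Foo {
    Y,
}

// Hypothetical helper: since `X` is written without braces, the `{`
// after `match x` is not preceded by anything that looks like a literal.
fn describe(x: X) -> &'static str {
    match x {
        X => "unit struct",
    }
}

fn main() {
    let x = X; // the bare name is the value; no `X {}` is written
    let y = Foo::Y;
    println!("{} {:?}", describe(x), y);
}
```
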
### Restrict where struct literals can appear
We could also just have a subclass of expressions which can appear in
`if`, `do`, etc. This subclass would not permit struct literals. That
means that something like `if Foo { x: 10 }.is_true() {}` would have to be
written `if (Foo { x: 10 }.is_true()) { ... }`. This rule implies that
very little lookahead is needed. Such rules can be a pain for the
pretty printer, however. To some extent we already have a rule like
this for `do` and `for`, since we will parse:

    ...for x.each |y...

as a method call with one argument and not `(x.each | y)`. Since this
rule would presumably not apply to `if` etc., there would actually be
three classes of expressions: those that can appear in `if`, those that
can appear in `do`/`for`, and full expressions.
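
A sketch of the parenthesized form, again in present-day Rust syntax; the body of `is_true` and the `check` helper are assumptions made up for illustration:

```rust
struct Foo {
    x: i32,
}

impl Foo {
    // Hypothetical predicate; the text only names the method.
    fn is_true(&self) -> bool {
        self.x != 0
    }
}

// Hypothetical helper wrapping the `if` from the text.
fn check(n: i32) -> &'static str {
    // The struct literal sits inside parentheses, so the `{` after the
    // closing `)` can only open the body of the `if`.
    if (Foo { x: n }.is_true()) {
        "true"
    } else {
        "false"
    }
}

fn main() {
    println!("{}", check(10));
    println!("{}", check(0));
}
```

Without the parentheses, the parser would have to decide whether the `{` after `Foo` opens a struct literal or the body of the `if`, which is the decision this option removes.
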
### My personal opinion
I started out preferring the final option, but I am now leaning towards
option #2, which seems to simplify the grammar overall and still
requires only fixed lookahead to disambiguate.
Niko
[1] https://github.com/mozilla/rust/pull/5137
_______________________________________________
Rust-dev mailing list
https://mail.mozilla.org/listinfo/rust-dev