Re: [rust-dev] LL(1) problems

2013-04-29 Thread Graydon Hoare
On 13-04-25 09:16 AM, Patrick Walton wrote:

 Are you concerned about the left-factoring needed to make the LL(1)
 grammar work? To me that's the biggest issue: the resulting grammar is
 kind of messy, and a tool that uses the LL(1) grammar is going to have a
 fun time reconstructing the first argument to method signatures (for
 example)...

Somewhat concerned. But also a bit surprised if it'd be that hard to
extract; after all many rules have a just propagate the thing found
within it semantic action in typical generated parsers. Eg. the
multi-layered stack of rules for binops-with-precedence.

If it's truly catastrophic for grammar users, or the docs for that m
tter, I suppose we can go with See appendix G for the LL(1) version of
this rule -- in a grammar we also regularly test against the codebase --
or this LL(2) version here.

-Graydon

___
Rust-dev mailing list
Rust-dev@mozilla.org
https://mail.mozilla.org/listinfo/rust-dev


Re: [rust-dev] LL(1) problems

2013-04-26 Thread Niko Matsakis
Patrick,

Nice work! I'm still digesting your full mail.

With respect to the pattern-names-as-type problem, I don't know if
restricting the pattern name to be an identifier is the right thing to do.
Would we be adding that restriction for all fn items, or just those that
appear in traits? There are default methods to consider.  I think it'd be
best to have the same rules for all fn items (and maybe those rules should
be: no patterns, those can only appear in closures).

The other option is just to require parameter names on all method
declarations, which would be more consistent, but would also sometimes be
annoying for lightweight traits where the parameter names are uninteresting.

Finally, one could imagine requiring that patterns be parenthesized or
something like that? I confess that I sometimes find the current syntax
hard to parse, but I think that's at least partially because of the old
mode syntax (which is indeed ambiguous), parentheses might make it clearer.
But I think I wouldn't want them to be required in the `|...|` closure
syntax, so for consistency this option is probably out.



Niko



On Wed, Apr 24, 2013 at 10:06 PM, Patrick Walton pwal...@mozilla.comwrote:

 I've refactored the Rust grammar to get it as close to LL(1) as I could. I
 made some minor language changes along the way which should not break code,
 but there are a few things that are still preventing me from making it
 LL(1):

 1. Explicit self and patterns-as-arguments require two tokens of lookahead
 to find the `self` (or `mut` or lifetime).

 fn foo(self, ...) vs. fn foo(a: int) { ... }

 This is because arguments are patterns, and `a` is a valid pattern.

 2. There is a production called maybe-named argument, which is part of
 the trait declaration form. (It also shows up in function types.) This
 allows you to write:

 trait Foo {
 fn bar(int, int);
 }

 Instead of:

 trait Foo {
 fn bar(x: int, y: int);
 }

 However, because arguments can be patterns, lookahead is required to
 distinguish between that and this:

 trait Foo {
 fn bar(x: int, y: int);
 }

 I believe this can actually be *unbounded* lookahead. Consider:

 trait Foo {
 fn bar(x: int);
 }

 Versus:

 trait Foo {
 fn bar(int);
 }

 This has a relatively straightforward fix though: just restrict the
 argument name to be an identifier. There is little need for the pattern
 form here. This reduces the complexity to LL(2); I haven't investigated
 whether it can be reduced further.

 3. `unsafe` blocks and `unsafe` function declarations are both allowed
 inside blocks, and therefore two tokens of lookahead are required.

 fn foo() {
 unsafe fn bar() { ... }
 }

 Versus:

 fn foo() {
 unsafe { ... }
 }

 The parser needs to look ahead two tokens to find the `fn` or `{` keyword.

 These were the only warnings that the Python yapps2 module emitted when
 compiling the Rust grammar, after suitable refactoring. My general feeling
 is that, given how valuable explicit self and argument patterns are, LL(2)
 is okay, especially if we turn out to be LALR(1). Others may feel
 differently, however; thoughts?

 Patrick
 __**_
 Rust-dev mailing list
 Rust-dev@mozilla.org
 https://mail.mozilla.org/**listinfo/rust-devhttps://mail.mozilla.org/listinfo/rust-dev

___
Rust-dev mailing list
Rust-dev@mozilla.org
https://mail.mozilla.org/listinfo/rust-dev


Re: [rust-dev] LL(1) problems

2013-04-26 Thread Benjamin Striegel
 why is continue spelled loop in Rust?

continue violates Rust's six-character policy, and the original keyword,
cont, was undesirable for obvious reasons. Considering how rarely-used it
is (there are eight uses in the whole compiler), the decision was made to
simply reuse loop and save a keyword.


On Fri, Apr 26, 2013 at 3:45 AM, Erik S sw...@earthling.net wrote:

  On 4/25/2013 10:53 AM, Patrick Walton wrote:

 I'm not sure we can do the latter. There are too many issues relating to
 `unsafe`, `loop`, the `self` argument, etc. to make the LL(1) derivable
 from the human-readable grammar in an automated fashion, in my eyes.

 Total bikeshed... but why is continue spelled loop in Rust? Especially
 if it's causing problems? Rust is the only language [1]
 http://rigaux.org/language-study/syntax-across-languages/CntrFlow.html#CntrFlowBrkCntFlothat
 uses loop to mean continue. It seems like an arbitrary incompatibility
 with C/C++/Java/JS/etc.

 Erik

 [1]
 http://rigaux.org/language-study/syntax-across-languages/CntrFlow.html#CntrFlowBrkCntFlo

 ___
 Rust-dev mailing list
 Rust-dev@mozilla.org
 https://mail.mozilla.org/listinfo/rust-dev


___
Rust-dev mailing list
Rust-dev@mozilla.org
https://mail.mozilla.org/listinfo/rust-dev


Re: [rust-dev] LL(1) problems

2013-04-26 Thread Benjamin Striegel
 The other option is just to require parameter names on all method
declarations

I honestly wasn't even aware that it was possible to omit the parameter
names when declaring traits (it's not mentioned in the tutorial or in any
code I've ever seen). It makes sense in retrospect, but I wouldn't really
miss it.


On Fri, Apr 26, 2013 at 12:01 PM, Patrick Walton pwal...@mozilla.comwrote:

 On 4/26/13 2:48 AM, Niko Matsakis wrote:

 Patrick,

 Nice work! I'm still digesting your full mail.

 With respect to the pattern-names-as-type problem, I don't know if
 restricting the pattern name to be an identifier is the right thing to
 do. Would we be adding that restriction for all fn items, or just those
 that appear in traits? There are default methods to consider.


 Oh yes, you're right. This option is out then.


  The other option is just to require parameter names on all method
 declarations, which would be more consistent, but would also sometimes
 be annoying for lightweight traits where the parameter names are
 uninteresting.


 I think we have to do this to avoid unbounded lookahead.


  Finally, one could imagine requiring that patterns be parenthesized or
 something like that? I confess that I sometimes find the current syntax
 hard to parse, but I think that's at least partially because of the old
 mode syntax (which is indeed ambiguous), parentheses might make it
 clearer. But I think I wouldn't want them to be required in the `|...|`
 closure syntax, so for consistency this option is probably out.


 Yeah, I don't really like this idea, for the reason you gave.


 Patrick

 __**_
 Rust-dev mailing list
 Rust-dev@mozilla.org
 https://mail.mozilla.org/**listinfo/rust-devhttps://mail.mozilla.org/listinfo/rust-dev

___
Rust-dev mailing list
Rust-dev@mozilla.org
https://mail.mozilla.org/listinfo/rust-dev


Re: [rust-dev] LL(1) problems

2013-04-25 Thread Lindsey Kuper
On Thu, Apr 25, 2013 at 1:25 AM, Patrick Walton pwal...@mozilla.com wrote:
 I have an untested grammar for Rust here,
 with the token regexes not yet filled in:

 https://gist.github.com/anonymous/5457664

 yapps2 reports that this grammar is LL(1).

 Note that the refactorings I made resulted in a grammar which isn't that
 great for tooling or parsing in many places. In particular the contortions
 needed to make `self_ty_and_maybenamed_args` result in an AST that combines
 the self type and the type of the first argument together in bizarre ways.

 Since this is untested, I'm sure there are bugs, but it's probably a good
 sign that I could at least refactor the grammar to what I think is LL(1).

This is really cool, but I'm sort of confused about the apparent
multiple ongoing efforts toward having a precise and machine-readable
Rust grammar.  Should we consider one of these the real grammar?

Lindsey
___
Rust-dev mailing list
Rust-dev@mozilla.org
https://mail.mozilla.org/listinfo/rust-dev


Re: [rust-dev] LL(1) problems

2013-04-25 Thread Alex Bradbury
On 25 April 2013 06:25, Patrick Walton pwal...@mozilla.com wrote:
 Note that the refactorings I made resulted in a grammar which isn't that
 great for tooling or parsing in many places. In particular the contortions
 needed to make `self_ty_and_maybenamed_args` result in an AST that combines
 the self type and the type of the first argument together in bizarre ways.

So that implies disallowing self as an identifier might be useful. Are
there other changes of a similar nature that should be considered?

Alex
___
Rust-dev mailing list
Rust-dev@mozilla.org
https://mail.mozilla.org/listinfo/rust-dev


Re: [rust-dev] LL(1) problems

2013-04-25 Thread Graydon Hoare

On 13-04-24 11:39 PM, Lindsey Kuper wrote:

On Thu, Apr 25, 2013 at 1:25 AM, Patrick Walton pwal...@mozilla.com wrote:

I have an untested grammar for Rust here,
with the token regexes not yet filled in:

https://gist.github.com/anonymous/5457664

yapps2 reports that this grammar is LL(1).

Note that the refactorings I made resulted in a grammar which isn't that
great for tooling or parsing in many places. In particular the contortions
needed to make `self_ty_and_maybenamed_args` result in an AST that combines
the self type and the type of the first argument together in bizarre ways.

Since this is untested, I'm sure there are bugs, but it's probably a good
sign that I could at least refactor the grammar to what I think is LL(1).


This is really cool, but I'm sort of confused about the apparent
multiple ongoing efforts toward having a precise and machine-readable
Rust grammar.  Should we consider one of these the real grammar?


This is a translation of the one John just finished (which was for 
antlr4, a very flexible any LL(k) grammar) into input for yapps2, 
which (as far as I know) has the distinguishing features of 
being-in-python and being-able-to-tell-us-about-LL(1)-conflicts. I 
believe it's still the same grammar as John did, just massaged to 
target a different tool. At some point I'd like a library that can 
consume/emit the 97 different equivalent dialects of EBNF used by 
different tools, so we can do this sort of exercise a little more readily.


In any case that is (as far as I know) the only multiplicity of grammars 
floating about. The previous attempt here was one I made, targeting 
llnextgen, but it was abandoned mid-way-through (the results of which 
are currently on display in the reference manual).


Are there others?

-Graydon

___
Rust-dev mailing list
Rust-dev@mozilla.org
https://mail.mozilla.org/listinfo/rust-dev


Re: [rust-dev] LL(1) problems

2013-04-25 Thread John Clements

On Apr 25, 2013, at 7:04 AM, Graydon Hoare wrote:

 On 13-04-24 11:39 PM, Lindsey Kuper wrote:
 On Thu, Apr 25, 2013 at 1:25 AM, Patrick Walton pwal...@mozilla.com wrote:
 I have an untested grammar for Rust here,
 with the token regexes not yet filled in:
 
 https://gist.github.com/anonymous/5457664
 
 yapps2 reports that this grammar is LL(1).
 
 Note that the refactorings I made resulted in a grammar which isn't that
 great for tooling or parsing in many places. In particular the contortions
 needed to make `self_ty_and_maybenamed_args` result in an AST that combines
 the self type and the type of the first argument together in bizarre ways.
 
 Since this is untested, I'm sure there are bugs, but it's probably a good
 sign that I could at least refactor the grammar to what I think is LL(1).
 
 This is really cool, but I'm sort of confused about the apparent
 multiple ongoing efforts toward having a precise and machine-readable
 Rust grammar.  Should we consider one of these the real grammar?
 
 This is a translation of the one John just finished (which was for antlr4, a 
 very flexible any LL(k) grammar) into input for yapps2, which (as far as I 
 know) has the distinguishing features of being-in-python and 
 being-able-to-tell-us-about-LL(1)-conflicts. I believe it's still the same 
 grammar as John did, just massaged to target a different tool. At some point 
 I'd like a library that can consume/emit the 97 different equivalent dialects 
 of EBNF used by different tools, so we can do this sort of exercise a little 
 more readily.
 
 In any case that is (as far as I know) the only multiplicity of grammars 
 floating about. The previous attempt here was one I made, targeting 
 llnextgen, but it was abandoned mid-way-through (the results of which are 
 currently on display in the reference manual).

FWIW, I'm (mildly) concerned too.  In particular, I'm disappointed to discover 
that in its present form (and using my present caveman-like invocation style), 
ANTLR parses source files so slowly that it's impossible to use directly as a 
validation tool; I would very much like to directly validate the grammar used 
for documentation purposes against the existing sources.  I haven't yet asked 
for help from the ANTLR folks, because I don't yet feel like I've finished due 
diligence on RTFMing ANTLR, which I would prefer to do before dumping the 
problem in their lap.

John

___
Rust-dev mailing list
Rust-dev@mozilla.org
https://mail.mozilla.org/listinfo/rust-dev


Re: [rust-dev] LL(1) problems

2013-04-25 Thread Patrick Walton

On 4/25/13 12:08 AM, Alex Bradbury wrote:

On 25 April 2013 06:25, Patrick Walton pwal...@mozilla.com wrote:

Note that the refactorings I made resulted in a grammar which isn't that
great for tooling or parsing in many places. In particular the contortions
needed to make `self_ty_and_maybenamed_args` result in an AST that combines
the self type and the type of the first argument together in bizarre ways.


So that implies disallowing self as an identifier might be useful. Are
there other changes of a similar nature that should be considered?


In general I needed to disable keywords as identifiers to make it work. 
There were a few other minor changes: reducing patterns to literals as 
discussed in the other thread, and removing the ability to call methods 
on statements without parentheses.


Patrick

___
Rust-dev mailing list
Rust-dev@mozilla.org
https://mail.mozilla.org/listinfo/rust-dev


Re: [rust-dev] LL(1) problems

2013-04-25 Thread Graydon Hoare

On 13-04-25 08:37 AM, John Clements wrote:


FWIW, I'm (mildly) concerned too.  In particular, I'm disappointed to discover 
that in its present form (and using my present caveman-like invocation style), 
ANTLR parses source files so slowly that it's impossible to use directly as a 
validation tool; I would very much like to directly validate the grammar used 
for documentation purposes against the existing sources.  I haven't yet asked 
for help from the ANTLR folks, because I don't yet feel like I've finished due 
diligence on RTFMing ANTLR, which I would prefer to do before dumping the 
problem in their lap.


I'm sorry for the confusion; I don't think patrick's work here 
represents a divergence from yours so much as a continuation of it, in a 
direction that answers a question I've asked repeatedly while you were 
working on your grammar: can we actually find an LL(1) factoring?. 
Also can any other non-antlr tools consume this grammar?


Since you chose antlr4 rather than antlr3, the LL(1) question in 
particular was obscured under the antlr4 will parse anything! sales 
pitch[1]. Which is fine as far as getting the grammar roughed out and 
running -- I'm not criticizing that decision, it was yours to make, as 
was the choice of antlr in the first place. But it _did_ dodge a 
question I've been quite persistent about asking; one which I wanted to 
have an answer for before considering the grammar done.


Longer term, I would like whatever grammar we wind up denoting as 
canonical / documented / spec'ed to be as (re)target-able as possible. 
I've been relatively insistent on LL(1) since it is a nice 
intersection-of-inputs, practically guaranteed to parse under any 
framework we retarget it to. IOW I do _not_ want to force anyone working 
with rust grammars in the future to use antlr (3, 4, or anything else). 
That's too tool-specific[2]. A grammar that is trivally translatable 
between antlr4, antlr3, yapp2, llgen, llnextgen, coco, javacc, parsec, 
spirit, some rust parser-generator, and so forth is my eventual goal 
here.


-Graydon

[1]: We parse any grammar is unfortunately common in parser-generator 
sales pitches these days, with a profusion GLR and LL(*) things. As a 
downstream consumer of parser-generator technology, let me point out 
that while I appreciate broad guarantees by tool-makers, I very much 
_dislike_ a tool that offers broad guarantees at the expense of being 
able to make promises about efficiency, grammar class and algorithmic 
complexity. IOW I actually prefer tools that can tell me what I need to 
leave out (or change) in my grammar in order to arrive at an efficient 
parser in a given complexity class. Don't worry about it is the wrong 
answer here. I want to worry about it.


[2]: we also seem to be most-invested in python for in-tree 
maintainer-mode tools associated with rust development; it seems like 
a lot to ask to install a JDK in order to verify the grammar. If the 
grammar-check _can_ be done in a python module, I'm happy to shift over 
to using it. Unless antlr-ness is an important part of the grammar in 
some way I'm not perceiving; do you have a strong preference for keeping 
the java + antlr dependency?

___
Rust-dev mailing list
Rust-dev@mozilla.org
https://mail.mozilla.org/listinfo/rust-dev


Re: [rust-dev] LL(1) problems

2013-04-25 Thread Wojciech Matyjewicz

Hello,


FWIW, I'm (mildly) concerned too.  In particular, I'm disappointed to discover 
that in its present form (and using my present caveman-like invocation style), 
ANTLR parses source files so slowly that it's impossible to use directly as a 
validation tool; I would very much like to directly validate the grammar used 
for documentation purposes against the existing sources.  I haven't yet asked 
for help from the ANTLR folks, because I don't yet feel like I've finished due 
diligence on RTFMing ANTLR, which I would prefer to do before dumping the 
problem in their lap.


I have not yet tried ANTLR v4, but I have used ANTLR v3 for a few 
grammars. From what I have read in Why do we need ANTLR v4 [1], the 
new version focuses on ease-of-use rather than performance by accepting 
any grammar we care to write and pushing all the grammar analysis 
effort to runtime. It seems to me that even the grammar option bounding 
the LL-lookahead (k=xxx) does not longer exist in v4. May it be the 
reason why the parser is so slow?


Have you considered trying v3 instead of v4? It would allow to set the 
LL-lookahead even per rule basis.


Wojtek

[1] http://www.antlr.org/wiki/pages/viewpage.action?pageId=29130850
___
Rust-dev mailing list
Rust-dev@mozilla.org
https://mail.mozilla.org/listinfo/rust-dev


Re: [rust-dev] LL(1) problems

2013-04-25 Thread Patrick Walton

On 4/25/13 9:12 AM, Graydon Hoare wrote:

Longer term, I would like whatever grammar we wind up denoting as
canonical / documented / spec'ed to be as (re)target-able as possible.
I've been relatively insistent on LL(1) since it is a nice
intersection-of-inputs, practically guaranteed to parse under any
framework we retarget it to. IOW I do _not_ want to force anyone working
with rust grammars in the future to use antlr (3, 4, or anything else).
That's too tool-specific[2]. A grammar that is trivally translatable
between antlr4, antlr3, yapp2, llgen, llnextgen, coco, javacc, parsec,
spirit, some rust parser-generator, and so forth is my eventual goal
here.


Are you concerned about the left-factoring needed to make the LL(1) 
grammar work? To me that's the biggest issue: the resulting grammar is 
kind of messy, and a tool that uses the LL(1) grammar is going to have a 
fun time reconstructing the first argument to method signatures (for 
example)...


Patrick

___
Rust-dev mailing list
Rust-dev@mozilla.org
https://mail.mozilla.org/listinfo/rust-dev


Re: [rust-dev] LL(1) problems

2013-04-25 Thread Graydon Hoare

On 13-04-25 08:33 AM, Paul Stansifer wrote:


(For example: suppose that some statements and some expressions shared a
common prefix that had to be merged in order to make the grammar LL(1);
instead of having `stmt` and `expr` nonterminals in macros, we'd need to
split them into `most_stmt` `most_expr` and `some_stmt_and_expr`.)


That would of course be undesirable. It is not clear to me that the 
factorings patrick was struggling with had that property. I think both 
suffixes of 'unsafe' were inside $stmt (branching to sub-rules shared by 
$item and $expr) and both suffixes of '' were branching to sub-rules of 
$ty and $pat. But I might be misreading. I agree that we want to keep 
relatively pithy named entrypoints in the grammar for the macro 
nonterminal parser to hook in at (even if they're artificial). Difficult 
balancing act, but I think it may be possible. Or at least it's not 
evident to me yet that it's not.


-Graydon

___
Rust-dev mailing list
Rust-dev@mozilla.org
https://mail.mozilla.org/listinfo/rust-dev


Re: [rust-dev] LL(1) problems

2013-04-25 Thread Felix S. Klock II

On 25/04/2013 18:12, Graydon Hoare wrote:
I've been relatively insistent on LL(1) since it is a nice 
intersection-of-inputs, practically guaranteed to parse under any 
framework we retarget it to.
I'm a fan of this choice too, if only because the simplest efficient 
parser-generators and/or parser-composition methodologies I know of take 
an LL(1) grammar as input.


However, Paul's earlier plea on this thread (Please don't do this 
[grammar factoring] to the official parser!) raised the following 
question in my mind:


Are we allowing for the possibility of choosing the semi-middle ground 
of: There *exists* an LL(1) grammar for Rust that is derivable from the 
non-LL(1)-but-official grammar for Rust. ?  Or do we want to go all the 
way to ensuring that our own grammar that we e.g. use for defining the 
syntactic classes of the macro system etc is strictly LL(1) (or perhaps 
LL(k) for some small but known fixed k)?


(I'd have to go review my compiler text books at home to review how much 
this would actually buy us.)


If we've already discussed the latter, mea culpa.  :)

Cheers,
-Felix

On 25/04/2013 18:12, Graydon Hoare wrote:

On 13-04-25 08:37 AM, John Clements wrote:

FWIW, I'm (mildly) concerned too.  In particular, I'm disappointed to 
discover that in its present form (and using my present caveman-like 
invocation style), ANTLR parses source files so slowly that it's 
impossible to use directly as a validation tool; I would very much 
like to directly validate the grammar used for documentation purposes 
against the existing sources.  I haven't yet asked for help from the 
ANTLR folks, because I don't yet feel like I've finished due 
diligence on RTFMing ANTLR, which I would prefer to do before dumping 
the problem in their lap.


I'm sorry for the confusion; I don't think patrick's work here 
represents a divergence from yours so much as a continuation of it, in 
a direction that answers a question I've asked repeatedly while you 
were working on your grammar: can we actually find an LL(1) 
factoring?. Also can any other non-antlr tools consume this grammar?


Since you chose antlr4 rather than antlr3, the LL(1) question in 
particular was obscured under the antlr4 will parse anything! sales 
pitch[1]. Which is fine as far as getting the grammar roughed out and 
running -- I'm not criticizing that decision, it was yours to make, as 
was the choice of antlr in the first place. But it _did_ dodge a 
question I've been quite persistent about asking; one which I wanted 
to have an answer for before considering the grammar done.


Longer term, I would like whatever grammar we wind up denoting as 
canonical / documented / spec'ed to be as (re)target-able as possible. 
I've been relatively insistent on LL(1) since it is a nice 
intersection-of-inputs, practically guaranteed to parse under any 
framework we retarget it to. IOW I do _not_ want to force anyone 
working with rust grammars in the future to use antlr (3, 4, or 
anything else). That's too tool-specific[2]. A grammar that is 
trivally translatable between antlr4, antlr3, yapp2, llgen, llnextgen, 
coco, javacc, parsec, spirit, some rust parser-generator, and so 
forth is my eventual goal here.


-Graydon

[1]: We parse any grammar is unfortunately common in 
parser-generator sales pitches these days, with a profusion GLR and 
LL(*) things. As a downstream consumer of parser-generator technology, 
let me point out that while I appreciate broad guarantees by 
tool-makers, I very much _dislike_ a tool that offers broad guarantees 
at the expense of being able to make promises about efficiency, 
grammar class and algorithmic complexity. IOW I actually prefer tools 
that can tell me what I need to leave out (or change) in my grammar in 
order to arrive at an efficient parser in a given complexity class. 
Don't worry about it is the wrong answer here. I want to worry about 
it.


[2]: we also seem to be most-invested in python for in-tree 
maintainer-mode tools associated with rust development; it seems 
like a lot to ask to install a JDK in order to verify the grammar. If 
the grammar-check _can_ be done in a python module, I'm happy to shift 
over to using it. Unless antlr-ness is an important part of the 
grammar in some way I'm not perceiving; do you have a strong 
preference for keeping the java + antlr dependency?

___
Rust-dev mailing list
Rust-dev@mozilla.org
https://mail.mozilla.org/listinfo/rust-dev



--
irc: pnkfelix on irc.mozilla.org
email: {fklock, pnkfelix}@mozilla.org

___
Rust-dev mailing list
Rust-dev@mozilla.org
https://mail.mozilla.org/listinfo/rust-dev


Re: [rust-dev] LL(1) problems

2013-04-25 Thread Graydon Hoare

On 13-04-25 09:23 AM, Felix S. Klock II wrote:


Are we allowing for the possibility of choosing the semi-middle ground
of: There *exists* an LL(1) grammar for Rust that is derivable from the
non-LL(1)-but-official grammar for Rust. ?  Or do we want to go all the
way to ensuring that our own grammar that we e.g. use for defining the
syntactic classes of the macro system etc is strictly LL(1) (or perhaps
LL(k) for some small but known fixed k)?


We are (or at least, I am) allowing for this (and related) 
possibilities. This is an aspirational conversation. I can't rule out 
anything really; if we found that there were grammar bits that were 
essential to our expressiveness and put us in nothing short of GLR, 
that would be ... sad, but not ship-sinking. Until this week I had no 
idea where we were, grammar-wise; it might have required a turing 
complete recognizer for all I knew. Hand written parsers often contain 
gruesome hacks. Ours certainly has in the past!


I'm just trying to find out how good we can get things (without major 
surgery). LL(1) is icing on the cake. But who doesn't like icing on 
their cake?


-Graydon

___
Rust-dev mailing list
Rust-dev@mozilla.org
https://mail.mozilla.org/listinfo/rust-dev


Re: [rust-dev] LL(1) problems

2013-04-25 Thread Lindsey Kuper
On Thu, Apr 25, 2013 at 10:04 AM, Graydon Hoare gray...@mozilla.com wrote:
 On 13-04-24 11:39 PM, Lindsey Kuper wrote:
 This is really cool, but I'm sort of confused about the apparent
 multiple ongoing efforts toward having a precise and machine-readable
 Rust grammar.  Should we consider one of these the real grammar?


 This is a translation of the one John just finished (which was for antlr4, a
 very flexible any LL(k) grammar) into input for yapps2, which (as far as I
 know) has the distinguishing features of being-in-python and
 being-able-to-tell-us-about-LL(1)-conflicts. I believe it's still the same
 grammar as John did, just massaged to target a different tool. At some point
 I'd like a library that can consume/emit the 97 different equivalent
 dialects of EBNF used by different tools, so we can do this sort of exercise
 a little more readily.

 In any case that is (as far as I know) the only multiplicity of grammars
 floating about. The previous attempt here was one I made, targeting
 llnextgen, but it was abandoned mid-way-through (the results of which are
 currently on display in the reference manual).

 Are there others?

OK, thanks for explaining!  (By multiple I just meant two.  I'm
not aware of others.)

Lindsey
___
Rust-dev mailing list
Rust-dev@mozilla.org
https://mail.mozilla.org/listinfo/rust-dev


Re: [rust-dev] LL(1) problems

2013-04-25 Thread Patrick Walton

On 4/25/13 9:23 AM, Felix S. Klock II wrote:

On 25/04/2013 18:12, Graydon Hoare wrote:

I've been relatively insistent on LL(1) since it is a nice
intersection-of-inputs, practically guaranteed to parse under any
framework we retarget it to.

I'm a fan of this choice too, if only because the simplest efficient
parser-generators and/or parser-composition methodologies I know of take
an LL(1) grammar as input.

However, Paul's earlier plea on this thread (Please don't do this
[grammar factoring] to the official parser!) raised the following
question in my mind:

Are we allowing for the possibility of choosing the semi-middle ground
of: There *exists* an LL(1) grammar for Rust that is derivable from the
non-LL(1)-but-official grammar for Rust. ?  Or do we want to go all the
way to ensuring that our own grammar that we e.g. use for defining the
syntactic classes of the macro system etc is strictly LL(1) (or perhaps
LL(k) for some small but known fixed k)?


I'm not sure we can do the latter. There are too many issues relating to 
`unsafe`, `loop`, the `self` argument, etc. to make the LL(1) derivable 
from the human-readable grammar in an automated fashion, in my eyes. At 
least, I'm pretty sure that if we did want to go down that route, we'd 
probably be doing months of parser research (and I do mean *research*, 
as far as I know).


Patrick

___
Rust-dev mailing list
Rust-dev@mozilla.org
https://mail.mozilla.org/listinfo/rust-dev


Re: [rust-dev] LL(1) problems

2013-04-25 Thread Matthieu Monrocq
On Thu, Apr 25, 2013 at 6:53 PM, Patrick Walton pwal...@mozilla.com wrote:

 On 4/25/13 9:23 AM, Felix S. Klock II wrote:

 On 25/04/2013 18:12, Graydon Hoare wrote:

 I've been relatively insistent on LL(1) since it is a nice
 intersection-of-inputs, practically guaranteed to parse under any
 framework we retarget it to.

 I'm a fan of this choice too, if only because the simplest efficient
 parser-generators and/or parser-composition methodologies I know of take
 an LL(1) grammar as input.

 However, Paul's earlier plea on this thread (Please don't do this
 [grammar factoring] to the official parser!) raised the following
 question in my mind:

 Are we allowing for the possibility of choosing the semi-middle ground
 of: There *exists* an LL(1) grammar for Rust that is derivable from the
 non-LL(1)-but-official grammar for Rust. ?  Or do we want to go all the
 way to ensuring that our own grammar that we e.g. use for defining the
 syntactic classes of the macro system etc is strictly LL(1) (or perhaps
 LL(k) for some small but known fixed k)?


 I'm not sure we can do the latter. There are too many issues relating to
 `unsafe`, `loop`, the `self` argument, etc. to make the LL(1) derivable
 from the human-readable grammar in an automated fashion, in my eyes. At
 least, I'm pretty sure that if we did want to go down that route, we'd
 probably be doing months of parser research (and I do mean *research*, as
 far as I know).

 Patrick


 On the other hand, should you content yourself with LL(2), and actually
have a tool like yapp2 guarantee that it is indeed LL(2) (and does not
degenerate), would it not be sufficient ?

(in case LL(1) really is gruesome compared to LL(2))

-- Matthieu


 __**_
 Rust-dev mailing list
 Rust-dev@mozilla.org
 https://mail.mozilla.org/**listinfo/rust-devhttps://mail.mozilla.org/listinfo/rust-dev

___
Rust-dev mailing list
Rust-dev@mozilla.org
https://mail.mozilla.org/listinfo/rust-dev


Re: [rust-dev] LL(1) problems

2013-04-24 Thread Graydon Hoare

On 13-04-24 07:06 PM, Patrick Walton wrote:


These were the only warnings that the Python yapps2 module emitted when
compiling the Rust grammar, after suitable refactoring. My general
feeling is that, given how valuable explicit self and argument patterns
are, LL(2) is okay, especially if we turn out to be LALR(1). Others may
feel differently, however; thoughts?


My skill at grammar-factoring is ... minimal, but I thought you could 
generally factor anything that's just a common-prefix situation like 
these by introducing new rules that go one (common) token further and 
then decide between the two remaining possible branches. Not to say it's 
pretty, but it ... ought to work, I think?


(TBH I don't actually understand any of the examples I can find of LL(2) 
grammars that can't be factored into LL(1); the compiler texts I have 
all suggest that you can factor most reasonable LL(*) things into LL(1) 
with sufficiently many intermediate nodes. This page for example shows 
one 
http://stackoverflow.com/questions/10634197/ll2-language-that-is-not-ll1 
but I can't tell exactly what's going on there, something to do with 
having to look 1 token past a choice point that may expand infinitely? 
It seems quite rare / unlikely.)


If we absolutely, resolutely can't factor the current grammar down to 
LL(1) I guess I'll live. LL(2) is not the end of the world. But it seems 
awfully close, given the results so far! 2 or 3 remaining conflicts? 
That's _amazing_. I thought we'd be fighting this for months.


I agree that removing pattern arguments and explicit self would be a 
high price, probably too high, if that's what it comes down to.


-Graydon
___
Rust-dev mailing list
Rust-dev@mozilla.org
https://mail.mozilla.org/listinfo/rust-dev


Re: [rust-dev] LL(1) problems

2013-04-24 Thread Patrick Walton

On 4/24/13 8:04 PM, Graydon Hoare wrote:

My skill at grammar-factoring is ... minimal, but I thought you could
generally factor anything that's just a common-prefix situation like
these by introducing new rules that go one (common) token further and
then decide between the two remaining possible branches. Not to say it's
pretty, but it ... ought to work, I think?


I tried harder to refactor it. I have an untested grammar for Rust here, 
with the token regexes not yet filled in:


https://gist.github.com/anonymous/5457664

yapps2 reports that this grammar is LL(1).

Note that the refactorings I made resulted in a grammar which isn't that 
great for tooling or parsing in many places. In particular the 
contortions needed to make `self_ty_and_maybenamed_args` result in an 
AST that combines the self type and the type of the first argument 
together in bizarre ways.


Since this is untested, I'm sure there are bugs, but it's probably a 
good sign that I could at least refactor the grammar to what I think is 
LL(1).


Patrick

___
Rust-dev mailing list
Rust-dev@mozilla.org
https://mail.mozilla.org/listinfo/rust-dev