Re: [rust-dev] LL(1) problems
On 13-04-25 09:16 AM, Patrick Walton wrote: Are you concerned about the left-factoring needed to make the LL(1) grammar work? To me that's the biggest issue: the resulting grammar is kind of messy, and a tool that uses the LL(1) grammar is going to have a fun time reconstructing the first argument to method signatures (for example)... Somewhat concerned. But also a bit surprised if it'd be that hard to extract; after all many rules have a just propagate the thing found within it semantic action in typical generated parsers. Eg. the multi-layered stack of rules for binops-with-precedence. If it's truly catastrophic for grammar users, or the docs for that m tter, I suppose we can go with See appendix G for the LL(1) version of this rule -- in a grammar we also regularly test against the codebase -- or this LL(2) version here. -Graydon ___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
Re: [rust-dev] LL(1) problems
Patrick, Nice work! I'm still digesting your full mail. With respect to the pattern-names-as-type problem, I don't know if restricting the pattern name to be an identifier is the right thing to do. Would we be adding that restriction for all fn items, or just those that appear in traits? There are default methods to consider. I think it'd be best to have the same rules for all fn items (and maybe those rules should be: no patterns, those can only appear in closures). The other option is just to require parameter names on all method declarations, which would be more consistent, but would also sometimes be annoying for lightweight traits where the parameter names are uninteresting. Finally, one could imagine requiring that patterns be parenthesized or something like that? I confess that I sometimes find the current syntax hard to parse, but I think that's at least partially because of the old mode syntax (which is indeed ambiguous), parentheses might make it clearer. But I think I wouldn't want them to be required in the `|...|` closure syntax, so for consistency this option is probably out. Niko On Wed, Apr 24, 2013 at 10:06 PM, Patrick Walton pwal...@mozilla.comwrote: I've refactored the Rust grammar to get it as close to LL(1) as I could. I made some minor language changes along the way which should not break code, but there are a few things that are still preventing me from making it LL(1): 1. Explicit self and patterns-as-arguments require two tokens of lookahead to find the `self` (or `mut` or lifetime). fn foo(self, ...) vs. fn foo(a: int) { ... } This is because arguments are patterns, and `a` is a valid pattern. 2. There is a production called maybe-named argument, which is part of the trait declaration form. (It also shows up in function types.) This allows you to write: trait Foo { fn bar(int, int); } Instead of: trait Foo { fn bar(x: int, y: int); } However, because arguments can be patterns, lookahead is required to distinguish between that and this: trait Foo { fn bar(x: int, y: int); } I believe this can actually be *unbounded* lookahead. Consider: trait Foo { fn bar(x: int); } Versus: trait Foo { fn bar(int); } This has a relatively straightforward fix though: just restrict the argument name to be an identifier. There is little need for the pattern form here. This reduces the complexity to LL(2); I haven't investigated whether it can be reduced further. 3. `unsafe` blocks and `unsafe` function declarations are both allowed inside blocks, and therefore two tokens of lookahead are required. fn foo() { unsafe fn bar() { ... } } Versus: fn foo() { unsafe { ... } } The parser needs to look ahead two tokens to find the `fn` or `{` keyword. These were the only warnings that the Python yapps2 module emitted when compiling the Rust grammar, after suitable refactoring. My general feeling is that, given how valuable explicit self and argument patterns are, LL(2) is okay, especially if we turn out to be LALR(1). Others may feel differently, however; thoughts? Patrick __**_ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/**listinfo/rust-devhttps://mail.mozilla.org/listinfo/rust-dev ___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
Re: [rust-dev] LL(1) problems
why is continue spelled loop in Rust? continue violates Rust's six-character policy, and the original keyword, cont, was undesirable for obvious reasons. Considering how rarely-used it is (there are eight uses in the whole compiler), the decision was made to simply reuse loop and save a keyword. On Fri, Apr 26, 2013 at 3:45 AM, Erik S sw...@earthling.net wrote: On 4/25/2013 10:53 AM, Patrick Walton wrote: I'm not sure we can do the latter. There are too many issues relating to `unsafe`, `loop`, the `self` argument, etc. to make the LL(1) derivable from the human-readable grammar in an automated fashion, in my eyes. Total bikeshed... but why is continue spelled loop in Rust? Especially if it's causing problems? Rust is the only language [1] http://rigaux.org/language-study/syntax-across-languages/CntrFlow.html#CntrFlowBrkCntFlothat uses loop to mean continue. It seems like an arbitrary incompatibility with C/C++/Java/JS/etc. Erik [1] http://rigaux.org/language-study/syntax-across-languages/CntrFlow.html#CntrFlowBrkCntFlo ___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev ___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
Re: [rust-dev] LL(1) problems
The other option is just to require parameter names on all method declarations I honestly wasn't even aware that it was possible to omit the parameter names when declaring traits (it's not mentioned in the tutorial or in any code I've ever seen). It makes sense in retrospect, but I wouldn't really miss it. On Fri, Apr 26, 2013 at 12:01 PM, Patrick Walton pwal...@mozilla.comwrote: On 4/26/13 2:48 AM, Niko Matsakis wrote: Patrick, Nice work! I'm still digesting your full mail. With respect to the pattern-names-as-type problem, I don't know if restricting the pattern name to be an identifier is the right thing to do. Would we be adding that restriction for all fn items, or just those that appear in traits? There are default methods to consider. Oh yes, you're right. This option is out then. The other option is just to require parameter names on all method declarations, which would be more consistent, but would also sometimes be annoying for lightweight traits where the parameter names are uninteresting. I think we have to do this to avoid unbounded lookahead. Finally, one could imagine requiring that patterns be parenthesized or something like that? I confess that I sometimes find the current syntax hard to parse, but I think that's at least partially because of the old mode syntax (which is indeed ambiguous), parentheses might make it clearer. But I think I wouldn't want them to be required in the `|...|` closure syntax, so for consistency this option is probably out. Yeah, I don't really like this idea, for the reason you gave. Patrick __**_ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/**listinfo/rust-devhttps://mail.mozilla.org/listinfo/rust-dev ___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
Re: [rust-dev] LL(1) problems
On Thu, Apr 25, 2013 at 1:25 AM, Patrick Walton pwal...@mozilla.com wrote: I have an untested grammar for Rust here, with the token regexes not yet filled in: https://gist.github.com/anonymous/5457664 yapps2 reports that this grammar is LL(1). Note that the refactorings I made resulted in a grammar which isn't that great for tooling or parsing in many places. In particular the contortions needed to make `self_ty_and_maybenamed_args` result in an AST that combines the self type and the type of the first argument together in bizarre ways. Since this is untested, I'm sure there are bugs, but it's probably a good sign that I could at least refactor the grammar to what I think is LL(1). This is really cool, but I'm sort of confused about the apparent multiple ongoing efforts toward having a precise and machine-readable Rust grammar. Should we consider one of these the real grammar? Lindsey ___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
Re: [rust-dev] LL(1) problems
On 25 April 2013 06:25, Patrick Walton pwal...@mozilla.com wrote: Note that the refactorings I made resulted in a grammar which isn't that great for tooling or parsing in many places. In particular the contortions needed to make `self_ty_and_maybenamed_args` result in an AST that combines the self type and the type of the first argument together in bizarre ways. So that implies disallowing self as an identifier might be useful. Are there other changes of a similar nature that should be considered? Alex ___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
Re: [rust-dev] LL(1) problems
On 13-04-24 11:39 PM, Lindsey Kuper wrote: On Thu, Apr 25, 2013 at 1:25 AM, Patrick Walton pwal...@mozilla.com wrote: I have an untested grammar for Rust here, with the token regexes not yet filled in: https://gist.github.com/anonymous/5457664 yapps2 reports that this grammar is LL(1). Note that the refactorings I made resulted in a grammar which isn't that great for tooling or parsing in many places. In particular the contortions needed to make `self_ty_and_maybenamed_args` result in an AST that combines the self type and the type of the first argument together in bizarre ways. Since this is untested, I'm sure there are bugs, but it's probably a good sign that I could at least refactor the grammar to what I think is LL(1). This is really cool, but I'm sort of confused about the apparent multiple ongoing efforts toward having a precise and machine-readable Rust grammar. Should we consider one of these the real grammar? This is a translation of the one John just finished (which was for antlr4, a very flexible any LL(k) grammar) into input for yapps2, which (as far as I know) has the distinguishing features of being-in-python and being-able-to-tell-us-about-LL(1)-conflicts. I believe it's still the same grammar as John did, just massaged to target a different tool. At some point I'd like a library that can consume/emit the 97 different equivalent dialects of EBNF used by different tools, so we can do this sort of exercise a little more readily. In any case that is (as far as I know) the only multiplicity of grammars floating about. The previous attempt here was one I made, targeting llnextgen, but it was abandoned mid-way-through (the results of which are currently on display in the reference manual). Are there others? -Graydon ___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
Re: [rust-dev] LL(1) problems
On Apr 25, 2013, at 7:04 AM, Graydon Hoare wrote: On 13-04-24 11:39 PM, Lindsey Kuper wrote: On Thu, Apr 25, 2013 at 1:25 AM, Patrick Walton pwal...@mozilla.com wrote: I have an untested grammar for Rust here, with the token regexes not yet filled in: https://gist.github.com/anonymous/5457664 yapps2 reports that this grammar is LL(1). Note that the refactorings I made resulted in a grammar which isn't that great for tooling or parsing in many places. In particular the contortions needed to make `self_ty_and_maybenamed_args` result in an AST that combines the self type and the type of the first argument together in bizarre ways. Since this is untested, I'm sure there are bugs, but it's probably a good sign that I could at least refactor the grammar to what I think is LL(1). This is really cool, but I'm sort of confused about the apparent multiple ongoing efforts toward having a precise and machine-readable Rust grammar. Should we consider one of these the real grammar? This is a translation of the one John just finished (which was for antlr4, a very flexible any LL(k) grammar) into input for yapps2, which (as far as I know) has the distinguishing features of being-in-python and being-able-to-tell-us-about-LL(1)-conflicts. I believe it's still the same grammar as John did, just massaged to target a different tool. At some point I'd like a library that can consume/emit the 97 different equivalent dialects of EBNF used by different tools, so we can do this sort of exercise a little more readily. In any case that is (as far as I know) the only multiplicity of grammars floating about. The previous attempt here was one I made, targeting llnextgen, but it was abandoned mid-way-through (the results of which are currently on display in the reference manual). FWIW, I'm (mildly) concerned too. In particular, I'm disappointed to discover that in its present form (and using my present caveman-like invocation style), ANTLR parses source files so slowly that it's impossible to use directly as a validation tool; I would very much like to directly validate the grammar used for documentation purposes against the existing sources. I haven't yet asked for help from the ANTLR folks, because I don't yet feel like I've finished due diligence on RTFMing ANTLR, which I would prefer to do before dumping the problem in their lap. John ___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
Re: [rust-dev] LL(1) problems
On 4/25/13 12:08 AM, Alex Bradbury wrote: On 25 April 2013 06:25, Patrick Walton pwal...@mozilla.com wrote: Note that the refactorings I made resulted in a grammar which isn't that great for tooling or parsing in many places. In particular the contortions needed to make `self_ty_and_maybenamed_args` result in an AST that combines the self type and the type of the first argument together in bizarre ways. So that implies disallowing self as an identifier might be useful. Are there other changes of a similar nature that should be considered? In general I needed to disable keywords as identifiers to make it work. There were a few other minor changes: reducing patterns to literals as discussed in the other thread, and removing the ability to call methods on statements without parentheses. Patrick ___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
Re: [rust-dev] LL(1) problems
On 13-04-25 08:37 AM, John Clements wrote: FWIW, I'm (mildly) concerned too. In particular, I'm disappointed to discover that in its present form (and using my present caveman-like invocation style), ANTLR parses source files so slowly that it's impossible to use directly as a validation tool; I would very much like to directly validate the grammar used for documentation purposes against the existing sources. I haven't yet asked for help from the ANTLR folks, because I don't yet feel like I've finished due diligence on RTFMing ANTLR, which I would prefer to do before dumping the problem in their lap. I'm sorry for the confusion; I don't think patrick's work here represents a divergence from yours so much as a continuation of it, in a direction that answers a question I've asked repeatedly while you were working on your grammar: can we actually find an LL(1) factoring?. Also can any other non-antlr tools consume this grammar? Since you chose antlr4 rather than antlr3, the LL(1) question in particular was obscured under the antlr4 will parse anything! sales pitch[1]. Which is fine as far as getting the grammar roughed out and running -- I'm not criticizing that decision, it was yours to make, as was the choice of antlr in the first place. But it _did_ dodge a question I've been quite persistent about asking; one which I wanted to have an answer for before considering the grammar done. Longer term, I would like whatever grammar we wind up denoting as canonical / documented / spec'ed to be as (re)target-able as possible. I've been relatively insistent on LL(1) since it is a nice intersection-of-inputs, practically guaranteed to parse under any framework we retarget it to. IOW I do _not_ want to force anyone working with rust grammars in the future to use antlr (3, 4, or anything else). That's too tool-specific[2]. A grammar that is trivally translatable between antlr4, antlr3, yapp2, llgen, llnextgen, coco, javacc, parsec, spirit, some rust parser-generator, and so forth is my eventual goal here. -Graydon [1]: We parse any grammar is unfortunately common in parser-generator sales pitches these days, with a profusion GLR and LL(*) things. As a downstream consumer of parser-generator technology, let me point out that while I appreciate broad guarantees by tool-makers, I very much _dislike_ a tool that offers broad guarantees at the expense of being able to make promises about efficiency, grammar class and algorithmic complexity. IOW I actually prefer tools that can tell me what I need to leave out (or change) in my grammar in order to arrive at an efficient parser in a given complexity class. Don't worry about it is the wrong answer here. I want to worry about it. [2]: we also seem to be most-invested in python for in-tree maintainer-mode tools associated with rust development; it seems like a lot to ask to install a JDK in order to verify the grammar. If the grammar-check _can_ be done in a python module, I'm happy to shift over to using it. Unless antlr-ness is an important part of the grammar in some way I'm not perceiving; do you have a strong preference for keeping the java + antlr dependency? ___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
Re: [rust-dev] LL(1) problems
Hello, FWIW, I'm (mildly) concerned too. In particular, I'm disappointed to discover that in its present form (and using my present caveman-like invocation style), ANTLR parses source files so slowly that it's impossible to use directly as a validation tool; I would very much like to directly validate the grammar used for documentation purposes against the existing sources. I haven't yet asked for help from the ANTLR folks, because I don't yet feel like I've finished due diligence on RTFMing ANTLR, which I would prefer to do before dumping the problem in their lap. I have not yet tried ANTLR v4, but I have used ANTLR v3 for a few grammars. From what I have read in Why do we need ANTLR v4 [1], the new version focuses on ease-of-use rather than performance by accepting any grammar we care to write and pushing all the grammar analysis effort to runtime. It seems to me that even the grammar option bounding the LL-lookahead (k=xxx) does not longer exist in v4. May it be the reason why the parser is so slow? Have you considered trying v3 instead of v4? It would allow to set the LL-lookahead even per rule basis. Wojtek [1] http://www.antlr.org/wiki/pages/viewpage.action?pageId=29130850 ___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
Re: [rust-dev] LL(1) problems
On 4/25/13 9:12 AM, Graydon Hoare wrote: Longer term, I would like whatever grammar we wind up denoting as canonical / documented / spec'ed to be as (re)target-able as possible. I've been relatively insistent on LL(1) since it is a nice intersection-of-inputs, practically guaranteed to parse under any framework we retarget it to. IOW I do _not_ want to force anyone working with rust grammars in the future to use antlr (3, 4, or anything else). That's too tool-specific[2]. A grammar that is trivally translatable between antlr4, antlr3, yapp2, llgen, llnextgen, coco, javacc, parsec, spirit, some rust parser-generator, and so forth is my eventual goal here. Are you concerned about the left-factoring needed to make the LL(1) grammar work? To me that's the biggest issue: the resulting grammar is kind of messy, and a tool that uses the LL(1) grammar is going to have a fun time reconstructing the first argument to method signatures (for example)... Patrick ___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
Re: [rust-dev] LL(1) problems
On 13-04-25 08:33 AM, Paul Stansifer wrote: (For example: suppose that some statements and some expressions shared a common prefix that had to be merged in order to make the grammar LL(1); instead of having `stmt` and `expr` nonterminals in macros, we'd need to split them into `most_stmt` `most_expr` and `some_stmt_and_expr`.) That would of course be undesirable. It is not clear to me that the factorings patrick was struggling with had that property. I think both suffixes of 'unsafe' were inside $stmt (branching to sub-rules shared by $item and $expr) and both suffixes of '' were branching to sub-rules of $ty and $pat. But I might be misreading. I agree that we want to keep relatively pithy named entrypoints in the grammar for the macro nonterminal parser to hook in at (even if they're artificial). Difficult balancing act, but I think it may be possible. Or at least it's not evident to me yet that it's not. -Graydon ___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
Re: [rust-dev] LL(1) problems
On 25/04/2013 18:12, Graydon Hoare wrote: I've been relatively insistent on LL(1) since it is a nice intersection-of-inputs, practically guaranteed to parse under any framework we retarget it to. I'm a fan of this choice too, if only because the simplest efficient parser-generators and/or parser-composition methodologies I know of take an LL(1) grammar as input. However, Paul's earlier plea on this thread (Please don't do this [grammar factoring] to the official parser!) raised the following question in my mind: Are we allowing for the possibility of choosing the semi-middle ground of: There *exists* an LL(1) grammar for Rust that is derivable from the non-LL(1)-but-official grammar for Rust. ? Or do we want to go all the way to ensuring that our own grammar that we e.g. use for defining the syntactic classes of the macro system etc is strictly LL(1) (or perhaps LL(k) for some small but known fixed k)? (I'd have to go review my compiler text books at home to review how much this would actually buy us.) If we've already discussed the latter, mea culpa. :) Cheers, -Felix On 25/04/2013 18:12, Graydon Hoare wrote: On 13-04-25 08:37 AM, John Clements wrote: FWIW, I'm (mildly) concerned too. In particular, I'm disappointed to discover that in its present form (and using my present caveman-like invocation style), ANTLR parses source files so slowly that it's impossible to use directly as a validation tool; I would very much like to directly validate the grammar used for documentation purposes against the existing sources. I haven't yet asked for help from the ANTLR folks, because I don't yet feel like I've finished due diligence on RTFMing ANTLR, which I would prefer to do before dumping the problem in their lap. I'm sorry for the confusion; I don't think patrick's work here represents a divergence from yours so much as a continuation of it, in a direction that answers a question I've asked repeatedly while you were working on your grammar: can we actually find an LL(1) factoring?. Also can any other non-antlr tools consume this grammar? Since you chose antlr4 rather than antlr3, the LL(1) question in particular was obscured under the antlr4 will parse anything! sales pitch[1]. Which is fine as far as getting the grammar roughed out and running -- I'm not criticizing that decision, it was yours to make, as was the choice of antlr in the first place. But it _did_ dodge a question I've been quite persistent about asking; one which I wanted to have an answer for before considering the grammar done. Longer term, I would like whatever grammar we wind up denoting as canonical / documented / spec'ed to be as (re)target-able as possible. I've been relatively insistent on LL(1) since it is a nice intersection-of-inputs, practically guaranteed to parse under any framework we retarget it to. IOW I do _not_ want to force anyone working with rust grammars in the future to use antlr (3, 4, or anything else). That's too tool-specific[2]. A grammar that is trivally translatable between antlr4, antlr3, yapp2, llgen, llnextgen, coco, javacc, parsec, spirit, some rust parser-generator, and so forth is my eventual goal here. -Graydon [1]: We parse any grammar is unfortunately common in parser-generator sales pitches these days, with a profusion GLR and LL(*) things. As a downstream consumer of parser-generator technology, let me point out that while I appreciate broad guarantees by tool-makers, I very much _dislike_ a tool that offers broad guarantees at the expense of being able to make promises about efficiency, grammar class and algorithmic complexity. IOW I actually prefer tools that can tell me what I need to leave out (or change) in my grammar in order to arrive at an efficient parser in a given complexity class. Don't worry about it is the wrong answer here. I want to worry about it. [2]: we also seem to be most-invested in python for in-tree maintainer-mode tools associated with rust development; it seems like a lot to ask to install a JDK in order to verify the grammar. If the grammar-check _can_ be done in a python module, I'm happy to shift over to using it. Unless antlr-ness is an important part of the grammar in some way I'm not perceiving; do you have a strong preference for keeping the java + antlr dependency? ___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev -- irc: pnkfelix on irc.mozilla.org email: {fklock, pnkfelix}@mozilla.org ___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
Re: [rust-dev] LL(1) problems
On 13-04-25 09:23 AM, Felix S. Klock II wrote: Are we allowing for the possibility of choosing the semi-middle ground of: There *exists* an LL(1) grammar for Rust that is derivable from the non-LL(1)-but-official grammar for Rust. ? Or do we want to go all the way to ensuring that our own grammar that we e.g. use for defining the syntactic classes of the macro system etc is strictly LL(1) (or perhaps LL(k) for some small but known fixed k)? We are (or at least, I am) allowing for this (and related) possibilities. This is an aspirational conversation. I can't rule out anything really; if we found that there were grammar bits that were essential to our expressiveness and put us in nothing short of GLR, that would be ... sad, but not ship-sinking. Until this week I had no idea where we were, grammar-wise; it might have required a turing complete recognizer for all I knew. Hand written parsers often contain gruesome hacks. Ours certainly has in the past! I'm just trying to find out how good we can get things (without major surgery). LL(1) is icing on the cake. But who doesn't like icing on their cake? -Graydon ___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
Re: [rust-dev] LL(1) problems
On Thu, Apr 25, 2013 at 10:04 AM, Graydon Hoare gray...@mozilla.com wrote: On 13-04-24 11:39 PM, Lindsey Kuper wrote: This is really cool, but I'm sort of confused about the apparent multiple ongoing efforts toward having a precise and machine-readable Rust grammar. Should we consider one of these the real grammar? This is a translation of the one John just finished (which was for antlr4, a very flexible any LL(k) grammar) into input for yapps2, which (as far as I know) has the distinguishing features of being-in-python and being-able-to-tell-us-about-LL(1)-conflicts. I believe it's still the same grammar as John did, just massaged to target a different tool. At some point I'd like a library that can consume/emit the 97 different equivalent dialects of EBNF used by different tools, so we can do this sort of exercise a little more readily. In any case that is (as far as I know) the only multiplicity of grammars floating about. The previous attempt here was one I made, targeting llnextgen, but it was abandoned mid-way-through (the results of which are currently on display in the reference manual). Are there others? OK, thanks for explaining! (By multiple I just meant two. I'm not aware of others.) Lindsey ___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
Re: [rust-dev] LL(1) problems
On 4/25/13 9:23 AM, Felix S. Klock II wrote: On 25/04/2013 18:12, Graydon Hoare wrote: I've been relatively insistent on LL(1) since it is a nice intersection-of-inputs, practically guaranteed to parse under any framework we retarget it to. I'm a fan of this choice too, if only because the simplest efficient parser-generators and/or parser-composition methodologies I know of take an LL(1) grammar as input. However, Paul's earlier plea on this thread (Please don't do this [grammar factoring] to the official parser!) raised the following question in my mind: Are we allowing for the possibility of choosing the semi-middle ground of: There *exists* an LL(1) grammar for Rust that is derivable from the non-LL(1)-but-official grammar for Rust. ? Or do we want to go all the way to ensuring that our own grammar that we e.g. use for defining the syntactic classes of the macro system etc is strictly LL(1) (or perhaps LL(k) for some small but known fixed k)? I'm not sure we can do the latter. There are too many issues relating to `unsafe`, `loop`, the `self` argument, etc. to make the LL(1) derivable from the human-readable grammar in an automated fashion, in my eyes. At least, I'm pretty sure that if we did want to go down that route, we'd probably be doing months of parser research (and I do mean *research*, as far as I know). Patrick ___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
Re: [rust-dev] LL(1) problems
On Thu, Apr 25, 2013 at 6:53 PM, Patrick Walton pwal...@mozilla.com wrote: On 4/25/13 9:23 AM, Felix S. Klock II wrote: On 25/04/2013 18:12, Graydon Hoare wrote: I've been relatively insistent on LL(1) since it is a nice intersection-of-inputs, practically guaranteed to parse under any framework we retarget it to. I'm a fan of this choice too, if only because the simplest efficient parser-generators and/or parser-composition methodologies I know of take an LL(1) grammar as input. However, Paul's earlier plea on this thread (Please don't do this [grammar factoring] to the official parser!) raised the following question in my mind: Are we allowing for the possibility of choosing the semi-middle ground of: There *exists* an LL(1) grammar for Rust that is derivable from the non-LL(1)-but-official grammar for Rust. ? Or do we want to go all the way to ensuring that our own grammar that we e.g. use for defining the syntactic classes of the macro system etc is strictly LL(1) (or perhaps LL(k) for some small but known fixed k)? I'm not sure we can do the latter. There are too many issues relating to `unsafe`, `loop`, the `self` argument, etc. to make the LL(1) derivable from the human-readable grammar in an automated fashion, in my eyes. At least, I'm pretty sure that if we did want to go down that route, we'd probably be doing months of parser research (and I do mean *research*, as far as I know). Patrick On the other hand, should you content yourself with LL(2), and actually have a tool like yapp2 guarantee that it is indeed LL(2) (and does not degenerate), would it not be sufficient ? (in case LL(1) really is gruesome compared to LL(2)) -- Matthieu __**_ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/**listinfo/rust-devhttps://mail.mozilla.org/listinfo/rust-dev ___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
Re: [rust-dev] LL(1) problems
On 13-04-24 07:06 PM, Patrick Walton wrote: These were the only warnings that the Python yapps2 module emitted when compiling the Rust grammar, after suitable refactoring. My general feeling is that, given how valuable explicit self and argument patterns are, LL(2) is okay, especially if we turn out to be LALR(1). Others may feel differently, however; thoughts? My skill at grammar-factoring is ... minimal, but I thought you could generally factor anything that's just a common-prefix situation like these by introducing new rules that go one (common) token further and then decide between the two remaining possible branches. Not to say it's pretty, but it ... ought to work, I think? (TBH I don't actually understand any of the examples I can find of LL(2) grammars that can't be factored into LL(1); the compiler texts I have all suggest that you can factor most reasonable LL(*) things into LL(1) with sufficiently many intermediate nodes. This page for example shows one http://stackoverflow.com/questions/10634197/ll2-language-that-is-not-ll1 but I can't tell exactly what's going on there, something to do with having to look 1 token past a choice point that may expand infinitely? It seems quite rare / unlikely.) If we absolutely, resolutely can't factor the current grammar down to LL(1) I guess I'll live. LL(2) is not the end of the world. But it seems awfully close, given the results so far! 2 or 3 remaining conflicts? That's _amazing_. I thought we'd be fighting this for months. I agree that removing pattern arguments and explicit self would be a high price, probably too high, if that's what it comes down to. -Graydon ___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
Re: [rust-dev] LL(1) problems
On 4/24/13 8:04 PM, Graydon Hoare wrote: My skill at grammar-factoring is ... minimal, but I thought you could generally factor anything that's just a common-prefix situation like these by introducing new rules that go one (common) token further and then decide between the two remaining possible branches. Not to say it's pretty, but it ... ought to work, I think? I tried harder to refactor it. I have an untested grammar for Rust here, with the token regexes not yet filled in: https://gist.github.com/anonymous/5457664 yapps2 reports that this grammar is LL(1). Note that the refactorings I made resulted in a grammar which isn't that great for tooling or parsing in many places. In particular the contortions needed to make `self_ty_and_maybenamed_args` result in an AST that combines the self type and the type of the first argument together in bizarre ways. Since this is untested, I'm sure there are bugs, but it's probably a good sign that I could at least refactor the grammar to what I think is LL(1). Patrick ___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev