If you haven’t seen Perl 6 grammars, I highly suggest taking a look. https://perl6advent.wordpress.com/2009/12/21/day-21-grammars-and-actions/ <https://perl6advent.wordpress.com/2009/12/21/day-21-grammars-and-actions/>
https://docs.perl6.org/language/grammars <https://docs.perl6.org/language/grammars> My off-the-cuff straw-man attempt to apply this to Swift. (Fair warning, I’m no expert on Perl 6 grammars, just an enthusiast) grammar FooGrammar { token top = { <string> | <number> } token string { pattern { '\"' <literal>\w+ '\"' } matched { // literal is in scope and holds the match // manipulate the AST return SomeTypeAdoptingAstNodeProtocol(x: y, value: literal) } } token number { pattern { /regex here/ } matched { } } } // produces an AST: let ast = FooGrammar.parse("some input") The key is that you use regexes for the basic tokens but you combine tokens more or less along BNF lines. When a token matches, the matched() block executes and you can manipulate the resulting AST. Presumably the AST types would come from the standard library. That is also a stepping stone to a hygienic macro system where the macro invocation can receive an AST representing the syntax inside the macro. You could even require a macro to define a grammar so the compiler could at least validate the syntax of a macro is well-formed. Russ > On Jan 25, 2017, at 7:27 AM, Chris Eidhof <ch...@eidhof.nl> wrote: > > Great stuff! > > I wonder if built-in grammars (that's what Perl calls them) would work only > for things that are backed by string literals, or if it's worth the > time/effort to make them work for other kind of data as well. For example, > what if you write a grammar to tokenize (yielding some sequence of `Token`s), > and then want to parse those `Token`s? Or what if you want to parse other > kinds of data? Or should we try to make the 80% case work (only provide > grammar/regex literals for Strings) to avoid complexity? > > I think it's worth looking at parser combinators. The really cool thing about > them is that they provide a few basic elements, and well-defined ways to > combine those elements. Specifically, it's interesting to look at the > problems surrounding parser combinators and seeing how we could do better: > > - Error messages can be cryptic > - Because it's built on top of the language, you can't really do stuff like > binding subexpressions to local variables > - Once you allow something like flatMap in a parser combinator library, you > lose the possibility to optimize by rewriting the parser structure. This is > because the parser that comes out of the rhs of a flatMap can depend on the > parsed output of the lhs. > > To me it seems like there's a lot of (exciting) work to be done to get this > right :). > > > On Wed, Jan 25, 2017 at 6:35 AM, Chris Lattner <sa...@nondot.org > <mailto:sa...@nondot.org>> wrote: > On Jan 24, 2017, at 12:05 AM, Chris Eidhof via swift-evolution > <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote: >> >> I agree that being able to implement parsers in a nice way can be a huge >> step forward in being really good at string processing. > > +1 from me as well, I agree with Joe that Swift can learn a lot from Perl 6 > grammar’s and we should take the time to do it right. Below I say “regex” a > lot, but I really mean a more general grammar system (and even Perl 5 regex’s > aren’t regular :-) > >> There are a couple of possibilities that come to mind directly: >> >> 1. Build parsers right into the language (like Perl 6 grammars) >> 2. Provide a parser combinator language (e.g. >> https://github.com/davedufresne/SwiftParsec >> <https://github.com/davedufresne/SwiftParsec>). >> 3. Rely on external tools like bison/yacc/etc. >> 4. Make it easy for people to write hand-written parsers (e.g. by providing >> an NSScanner alternative). > > > My opinion is that #1 is the right path to start with, but it wouldn’t > preclude doing #2. Here’s my rationale / half-baked thought process: > > There are two important use cases for regex's: the literal case (e.g. > /aa+b*/) and the dynamically computed case. The former is really what we’re > talking about here, the latter should obviously be handled with some sort of > Regex type which can be formed from string values or whatever. Regex > literals in an expression context should default to producing the Regex type > of course. > > This means that when you pass a regex literal into an API call (e.g. split on > a string), it is really just creating something of Regex type, and passing it > down. If you wanted to introduce a parser combinator DSL, you could totally > plug it into the system, by having the combinators produce something of the > Regex type. > > So why bless regex literals with language support at all? I see several > reasons: > > 1. Diagnostics: These will be heavily used by people, and you want to have > good compiler error and warning messages for them. You want to be able to > validate the regex at compile time, not wait until runtime to detect > syntactic mistakes like unbalanced parens. > > 2. Syntax Familiarity: To take advantage of people’s familiarity with other > languages, we should strive to make the basic regex syntax familiar and > obvious. I’d argue that /aa+b*/ should “just work” and do the thing you > think it does. Relying on a combinator library to do that would be crazy. > > 3. Performance: Many regex’s are actually regular, so they can be trivially > compiled into DFAs. There is a well understood body of work that can be > simply dropped into the compiler to do this. Regex’s that are not regular can > be compiled into hybrid DFA/NFA+backtracking schemes, and allowing a divide > and conquer style of compiler optimization to do this is the path that makes > the most sense (to me at least). Further, if you switch on a string and have > a bunch of cases that are regex’s, you’d obviously want the compiler to > generate a single state machine (like a lexer), not check each pattern in > series. > > 4. Pattern matching greatness: One of the most obnoxious/error prone aspects > of regex’s in many languages is that when you match a pattern, the various > matches are dumped into numbered result values (often by the order of the > parens in the pattern). This is totally barbaric: it begs for off by one > errors, often breaks as the program is being evolved/maintained, etc. It is > just as bad as printf/scanf! > > You should instead be able to directly bind subexpressions into local > variables. For example if you were trying to match something like “42: > Chris”, you should be able to use straw man syntax like this: > > case /(let id: \d+): (let name: \w+)/: print(id); print(name) > > Unless we were willing to dramatically expand how patterns work, this > requires baking support into the language. > > 5. Scanner/“Formatter" integration: Taking the above one step farther, we > could have default patterns for known types (and make it extensible to user > defined types of course). For example, \d+ is the obvious pattern for > integers, so you should be able to write the above like this (in principle): > > case /(let id: Int): (let name: \w+)/: print(id); print(name) > > In addition to avoiding having to specify \d+ all the time, this eliminates > the need for a “string to int” conversion after the pattern is matched, > because id would be bound as type Int already. > > > Anyway, to summarize, I think that getting regex’s into the language is > really important and expect them to be widely used. As such, I think it is > worth burning compiler/language complexity to make them be truly great in > Swift. > > -Chris > > > > > > -- > Chris Eidhof
_______________________________________________ swift-evolution mailing list swift-evolution@swift.org https://lists.swift.org/mailman/listinfo/swift-evolution