Great stuff! I wonder if built-in grammars (that's what Perl calls them) would work only for things that are backed by string literals, or if it's worth the time/effort to make them work for other kind of data as well. For example, what if you write a grammar to tokenize (yielding some sequence of `Token`s), and then want to parse those `Token`s? Or what if you want to parse other kinds of data? Or should we try to make the 80% case work (only provide grammar/regex literals for Strings) to avoid complexity?
I think it's worth looking at parser combinators. The really cool thing about them is that they provide a few basic elements, and well-defined ways to combine those elements. Specifically, it's interesting to look at the problems surrounding parser combinators and seeing how we could do better: - Error messages can be cryptic - Because it's built on top of the language, you can't really do stuff like binding subexpressions to local variables - Once you allow something like flatMap in a parser combinator library, you lose the possibility to optimize by rewriting the parser structure. This is because the parser that comes out of the rhs of a flatMap can depend on the parsed output of the lhs. To me it seems like there's a lot of (exciting) work to be done to get this right :). On Wed, Jan 25, 2017 at 6:35 AM, Chris Lattner <[email protected]> wrote: > On Jan 24, 2017, at 12:05 AM, Chris Eidhof via swift-evolution < > [email protected]> wrote: > > > I agree that being able to implement parsers in a nice way can be a huge > step forward in being really good at string processing. > > > +1 from me as well, I agree with Joe that Swift can learn a lot from Perl > 6 grammar’s and we should take the time to do it right. Below I say > “regex” a lot, but I really mean a more general grammar system (and even > Perl 5 regex’s aren’t regular :-) > > There are a couple of possibilities that come to mind directly: > > 1. Build parsers right into the language (like Perl 6 grammars) > 2. Provide a parser combinator language (e.g. https://github.com/ > davedufresne/SwiftParsec). > 3. Rely on external tools like bison/yacc/etc. > 4. Make it easy for people to write hand-written parsers (e.g. by > providing an NSScanner alternative). > > > My opinion is that #1 is the right path to start with, but it wouldn’t > preclude doing #2. Here’s my rationale / half-baked thought process: > > There are two important use cases for regex's: the literal case (e.g. > /aa+b*/) and the dynamically computed case. The former is really what > we’re talking about here, the latter should obviously be handled with some > sort of Regex type which can be formed from string values or whatever. > Regex literals in an expression context should default to producing the > Regex type of course. > > This means that when you pass a regex literal into an API call (e.g. split > on a string), it is really just creating something of Regex type, and > passing it down. If you wanted to introduce a parser combinator DSL, you > could totally plug it into the system, by having the combinators produce > something of the Regex type. > > So why bless regex literals with language support at all? I see several > reasons: > > 1. Diagnostics: These will be heavily used by people, and you want to have > good compiler error and warning messages for them. You want to be able to > validate the regex at compile time, not wait until runtime to detect > syntactic mistakes like unbalanced parens. > > 2. Syntax Familiarity: To take advantage of people’s familiarity with > other languages, we should strive to make the basic regex syntax familiar > and obvious. I’d argue that /aa+b*/ should “just work” and do the thing > you think it does. Relying on a combinator library to do that would be > crazy. > > 3. Performance: Many regex’s are actually regular, so they can be > trivially compiled into DFAs. There is a well understood body of work that > can be simply dropped into the compiler to do this. Regex’s that are not > regular can be compiled into hybrid DFA/NFA+backtracking schemes, and > allowing a divide and conquer style of compiler optimization to do this is > the path that makes the most sense (to me at least). Further, if you > switch on a string and have a bunch of cases that are regex’s, you’d > obviously want the compiler to generate a single state machine (like a > lexer), not check each pattern in series. > > 4. Pattern matching greatness: One of the most obnoxious/error prone > aspects of regex’s in many languages is that when you match a pattern, the > various matches are dumped into numbered result values (often by the order > of the parens in the pattern). This is totally barbaric: it begs for off > by one errors, often breaks as the program is being evolved/maintained, > etc. It is just as bad as printf/scanf! > > You should instead be able to directly bind subexpressions into local > variables. For example if you were trying to match something like “42: > Chris”, you should be able to use straw man syntax like this: > > case /(let id: \d+): (let name: \w+)/: print(id); print(name) > > Unless we were willing to dramatically expand how patterns work, this > requires baking support into the language. > > 5. Scanner/“Formatter" integration: Taking the above one step farther, we > could have default patterns for known types (and make it extensible to user > defined types of course). For example, \d+ is the obvious pattern for > integers, so you should be able to write the above like this (in principle): > > case /(let id: Int): (let name: \w+)/: print(id); print(name) > > In addition to avoiding having to specify \d+ all the time, this > eliminates the need for a “string to int” conversion after the pattern is > matched, because id would be bound as type Int already. > > > Anyway, to summarize, I think that getting regex’s into the language is > really important and expect them to be widely used. As such, I think it is > worth burning compiler/language complexity to make them be truly great in > Swift. > > -Chris > > > -- Chris Eidhof
_______________________________________________ swift-evolution mailing list [email protected] https://lists.swift.org/mailman/listinfo/swift-evolution
