If you haven’t seen Perl 6 grammars, I highly suggest taking a look.

https://perl6advent.wordpress.com/2009/12/21/day-21-grammars-and-actions/

https://docs.perl6.org/language/grammars



My off-the-cuff straw-man attempt to apply this to Swift.  (Fair warning, I’m 
no expert on Perl 6 grammars, just an enthusiast)

grammar FooGrammar {
    token top {
        <string> |
        <number>
    }
    token string {
        pattern {
            '"' <literal>\w+ '"'
        }
        matched {
            // literal is in scope and holds the match
            // manipulate the AST
            return SomeTypeAdoptingAstNodeProtocol(x: y, value: literal)
        }
    }
    token number {
        pattern {
            /regex here/
        }
        matched {
            // build and return a number AST node here
        }
    }
}

// produces an AST:
let ast = FooGrammar.parse("some input")


The key is that you use regexes for the basic tokens but you combine tokens 
more or less along BNF lines. When a token matches, the matched() block 
executes and you can manipulate the resulting AST. Presumably the AST types 
would come from the standard library. 
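Since the grammar syntax above is only a straw man, here is a rough sketch of the same shape in Python (the Swift feature doesn't exist yet): regex-backed tokens, an alternation for top, and a matched action per token that builds an AST node. All names here (StringNode, NumberNode, parse) are placeholders for illustration, not a proposed API.

```python
import re

# Minimal AST node types standing in for "SomeTypeAdoptingAstNodeProtocol".
class StringNode:
    def __init__(self, value): self.value = value

class NumberNode:
    def __init__(self, value): self.value = value

# Each token pairs a regex "pattern" with a "matched" action
# that receives the match and returns an AST node.
TOKENS = [
    (re.compile(r'"(\w+)"'), lambda m: StringNode(m.group(1))),
    (re.compile(r'\d+'),     lambda m: NumberNode(int(m.group(0)))),
]

def parse(text):
    # "top" is the alternation of the two tokens, tried in order.
    for pattern, matched in TOKENS:
        m = pattern.fullmatch(text)
        if m:
            return matched(m)  # the action builds the AST node
    raise SyntaxError(f"no token matched {text!r}")
```

So parse('"abc"') yields a StringNode and parse('42') a NumberNode; the real feature would of course do this with compiler support rather than a runtime loop.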

That is also a stepping stone to a hygienic macro system where the macro 
invocation can receive an AST representing the syntax inside the macro. You 
could even require a macro to define a grammar so that the compiler could at 
least validate that a macro invocation is syntactically well-formed.


Russ


> On Jan 25, 2017, at 7:27 AM, Chris Eidhof <ch...@eidhof.nl> wrote:
> 
> Great stuff!
> 
> I wonder if built-in grammars (that's what Perl calls them) would work only 
> for things that are backed by string literals, or if it's worth the 
> time/effort to make them work for other kinds of data as well. For example, 
> what if you write a grammar to tokenize (yielding some sequence of `Token`s), 
> and then want to parse those `Token`s? Or what if you want to parse other 
> kinds of data? Or should we try to make the 80% case work (only provide 
> grammar/regex literals for Strings) to avoid complexity?
> 
> I think it's worth looking at parser combinators. The really cool thing about 
> them is that they provide a few basic elements, and well-defined ways to 
> combine those elements. Specifically, it's interesting to look at the 
> problems surrounding parser combinators and seeing how we could do better:
> 
> - Error messages can be cryptic
> - Because it's built on top of the language, you can't really do stuff like 
> binding subexpressions to local variables
> - Once you allow something like flatMap in a parser combinator library, you 
> lose the ability to optimize by rewriting the parser structure, because the 
> parser that comes out of the rhs of a flatMap can depend on the parsed 
> output of the lhs. 
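To make the flatMap point concrete, here is a toy sketch in Python (char, alt, and flat_map are illustrative names, not any real combinator library):

```python
# A parser here is a function: text -> (value, rest) or None.
def char(c):
    return lambda s: (c, s[1:]) if s.startswith(c) else None

def alt(p, q):
    # Static combination: both alternatives are known up front,
    # so a compiler could merge them into one state machine.
    return lambda s: p(s) or q(s)

def flat_map(p, f):
    # Dynamic combination: the next parser is computed from the value
    # p produced, so it cannot be inspected before parsing runs.
    def run(s):
        r = p(s)
        if r is None:
            return None
        value, rest = r
        return f(value)(rest)
    return run

# The digit parsed first decides which letter must follow.
digit = alt(char("0"), char("1"))
dependent = flat_map(digit, lambda d: char("a") if d == "0" else char("b"))
```

alt's two branches are visible before anything runs, so they could in principle be rewritten and merged; the parser that flat_map runs next only exists after digit has produced a value, which is exactly what blocks that rewriting.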
> 
> To me it seems like there's a lot of (exciting) work to be done to get this 
> right :).
> 
> 
> On Wed, Jan 25, 2017 at 6:35 AM, Chris Lattner <sa...@nondot.org> wrote:
> On Jan 24, 2017, at 12:05 AM, Chris Eidhof via swift-evolution 
> <swift-evolution@swift.org> wrote:
>> 
>> I agree that being able to implement parsers in a nice way can be a huge 
>> step forward in being really good at string processing.
> 
> +1 from me as well, I agree with Joe that Swift can learn a lot from Perl 6 
> grammars and we should take the time to do it right.  Below I say “regex” a 
> lot, but I really mean a more general grammar system (and even Perl 5 regexes 
> aren’t regular :-)
> 
>> There are a couple of possibilities that come to mind directly:
>> 
>> 1. Build parsers right into the language (like Perl 6 grammars)
>> 2. Provide a parser combinator language (e.g. 
>> https://github.com/davedufresne/SwiftParsec). 
>> 3. Rely on external tools like bison/yacc/etc.
>> 4. Make it easy for people to write hand-written parsers (e.g. by providing 
>> an NSScanner alternative).
> 
> 
> My opinion is that #1 is the right path to start with, but it wouldn’t 
> preclude doing #2.  Here’s my rationale / half-baked thought process:
> 
> There are two important use cases for regexes: the literal case (e.g. 
> /aa+b*/) and the dynamically computed case.  The former is really what we’re 
> talking about here, the latter should obviously be handled with some sort of 
> Regex type which can be formed from string values or whatever.  Regex 
> literals in an expression context should default to producing the Regex type 
> of course.
> 
> This means that when you pass a regex literal into an API call (e.g. split on 
> a string), it is really just creating something of Regex type, and passing it 
> down.  If you wanted to introduce a parser combinator DSL, you could totally 
> plug it into the system, by having the combinators produce something of the 
> Regex type.
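As a sketch of how combinators could "produce something of the Regex type": in Python terms, each combinator can return plain regex source, and only the boundary compiles the whole expression into a single pattern object (these helper names are made up for illustration):

```python
import re

# Each combinator returns regex source; only the boundary compiles it,
# so the whole combinator expression collapses into one Regex value.
def lit(s):   return re.escape(s)
def seq(*ps): return "".join(f"(?:{p})" for p in ps)
def alt(*ps): return "|".join(f"(?:{p})" for p in ps)
def plus(p):  return f"(?:{p})+"
def many(p):  return f"(?:{p})*"

# The /aa+b*/ example, built from combinators instead of a literal.
aa_b = re.compile(seq(lit("a"), plus(lit("a")), many(lit("b"))))
```

The value an API call receives is the same compiled pattern whether it came from a literal or from combinators, which is the plug-in point described above.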
> 
> So why bless regex literals with language support at all?  I see several 
> reasons:
> 
> 1. Diagnostics: These will be heavily used by people, and you want to have 
> good compiler error and warning messages for them.  You want to be able to 
> validate the regex at compile time, not wait until runtime to detect 
> syntactic mistakes like unbalanced parens.
> 
> 2. Syntax Familiarity: To take advantage of people’s familiarity with other 
> languages, we should strive to make the basic regex syntax familiar and 
> obvious.  I’d argue that /aa+b*/ should “just work” and do the thing you 
> think it does.  Relying on a combinator library to do that would be crazy.
> 
> 3. Performance: Many regexes are actually regular, so they can be trivially 
> compiled into DFAs.  There is a well understood body of work that can be 
> simply dropped into the compiler to do this. Regexes that are not regular can 
> be compiled into hybrid DFA/NFA+backtracking schemes, and allowing a divide 
> and conquer style of compiler optimization to do this is the path that makes 
> the most sense (to me at least).  Further, if you switch on a string and have 
> a bunch of cases that are regexes, you’d obviously want the compiler to 
> generate a single state machine (like a lexer), not check each pattern in 
> series.
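The "single state machine" point can be mimicked today by merging the case patterns into one alternation with named groups and dispatching on which group matched, one pass instead of trying each pattern in series. (Python's re engine backtracks rather than building a true DFA, so this sketches the interface, not the compiled machine; the names are invented for illustration.)

```python
import re

# Merge the would-be switch cases into a single alternation,
# then dispatch on which named group actually matched.
CASES = {
    "number": r"\d+",
    "word":   r"[A-Za-z]+",
    "space":  r"\s+",
}
MERGED = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in CASES.items()))

def classify(s):
    m = MERGED.fullmatch(s)
    return m.lastgroup if m else None
```

This is essentially how generated lexers work: one machine, and the winning case falls out of the match rather than being probed one pattern at a time.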
> 
> 4. Pattern matching greatness: One of the most obnoxious/error prone aspects 
> of regexes in many languages is that when you match a pattern, the various 
> matches are dumped into numbered result values (often by the order of the 
> parens in the pattern).  This is totally barbaric: it begs for off by one 
> errors, often breaks as the program is being evolved/maintained, etc.  It is 
> just as bad as printf/scanf!  
> 
> You should instead be able to directly bind subexpressions into local 
> variables.  For example if you were trying to match something like “42: 
> Chris”, you should be able to use straw man syntax like this:
> 
>    case /(let id: \d+): (let name: \w+)/: print(id); print(name)
> 
> Unless we were willing to dramatically expand how patterns work, this 
> requires baking support into the language.
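Named capture groups are the closest existing analogue to binding subexpressions; the Python spelling of the example shows the idea, though it still lacks real local-variable binding:

```python
import re

# Named groups instead of numbered ones: bindings survive
# reordering and maintenance far better than \1, \2, ...
pattern = re.compile(r"(?P<id>\d+): (?P<name>\w+)")

def match_entry(s):
    m = pattern.fullmatch(s)
    if m is None:
        return None
    return m.group("id"), m.group("name")
```

The straw-man `let id:` syntax goes one step further by making the bindings actual typed locals in the case body, which is what needs language support.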
> 
> 5. Scanner/“Formatter” integration:  Taking the above one step further, we 
> could have default patterns for known types (and make it extensible to user 
> defined types of course). For example, \d+ is the obvious pattern for 
> integers, so you should be able to write the above like this (in principle):
> 
>    case /(let id: Int): (let name: \w+)/: print(id); print(name)
> 
> In addition to avoiding having to specify \d+ all the time, this eliminates 
> the need for a “string to int” conversion after the pattern is matched, 
> because id would be bound as type Int already.
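A runtime approximation of those type-driven patterns, in Python: a table maps each type to its default sub-pattern and its converter, so a bound value comes back already converted (typed_match and TYPE_PATTERNS are invented names for illustration):

```python
import re

# Each type supplies its default regex and its conversion, so a
# field declared as int comes back as an int, not a string.
TYPE_PATTERNS = {
    int: (r"\d+", int),
    str: (r"\w+", str),
}

def typed_match(fields, text):
    # fields: list of (name, type) pairs, matched as "a: b" style text.
    src = ": ".join(f"(?P<{name}>{TYPE_PATTERNS[ty][0]})" for name, ty in fields)
    m = re.fullmatch(src, text)
    if m is None:
        return None
    return {name: TYPE_PATTERNS[ty][1](m.group(name)) for name, ty in fields}
```

Here `typed_match([("id", int), ("name", str)], "42: Chris")` binds id as an int directly, eliminating the separate string-to-int step described above.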
> 
> 
> Anyway, to summarize, I think that getting regexes into the language is 
> really important and expect them to be widely used.  As such, I think it is 
> worth burning compiler/language complexity to make them be truly great in 
> Swift.
> 
> -Chris
> 
> 
> 
> 
> 
> -- 
> Chris Eidhof

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution
