On Jan 24, 2017, at 12:05 AM, Chris Eidhof via swift-evolution 
<[email protected]> wrote:
> 
> I agree that being able to implement parsers in a nice way can be a huge step 
> forward in being really good at string processing.

+1 from me as well, I agree with Joe that Swift can learn a lot from Perl 6 
grammar’s and we should take the time to do it right.  Below I say “regex” a 
lot, but I really mean a more general grammar system (and even Perl 5 regex’s 
aren’t regular :-)

> There are a couple of possibilities that come to mind directly:
> 
> 1. Build parsers right into the language (like Perl 6 grammars)
> 2. Provide a parser combinator language (e.g. 
> https://github.com/davedufresne/SwiftParsec 
> <https://github.com/davedufresne/SwiftParsec>). 
> 3. Rely on external tools like bison/yacc/etc.
> 4. Make it easy for people to write hand-written parsers (e.g. by providing 
> an NSScanner alternative).


My opinion is that #1 is the right path to start with, but it wouldn’t preclude 
doing #2.  Here’s my rationale / half-baked thought process:

There are two important use cases for regex's: the literal case (e.g. /aa+b*/) 
and the dynamically computed case.  The former is really what we’re talking 
about here, the latter should obviously be handled with some sort of Regex type 
which can be formed from string values or whatever.  Regex literals in an 
expression context should default to producing the Regex type of course.

This means that when you pass a regex literal into an API call (e.g. split on a 
string), it is really just creating something of Regex type, and passing it 
down.  If you wanted to introduce a parser combinator DSL, you could totally 
plug it into the system, by having the combinators produce something of the 
Regex type.

So why bless regex literals with language support at all?  I see several 
reasons:

1. Diagnostics: These will be heavily used by people, and you want to have good 
compiler error and warning messages for them.  You want to be able to validate 
the regex at compile time, not wait until runtime to detect syntactic mistakes 
like unbalanced parens.

2. Syntax Familiarity: To take advantage of people’s familiarity with other 
languages, we should strive to make the basic regex syntax familiar and 
obvious.  I’d argue that /aa+b*/ should “just work” and do the thing you think 
it does.  Relying on a combinator library to do that would be crazy.

3. Performance: Many regex’s are actually regular, so they can be trivially 
compiled into DFAs.  There is a well understood body of work that can be simply 
dropped into the compiler to do this. Regex’s that are not regular can be 
compiled into hybrid DFA/NFA+backtracking schemes, and allowing a divide and 
conquer style of compiler optimization to do this is the path that makes the 
most sense (to me at least).  Further, if you switch on a string and have a 
bunch of cases that are regex’s, you’d obviously want the compiler to generate 
a single state machine (like a lexer), not check each pattern in series.

4. Pattern matching greatness: One of the most obnoxious/error prone aspects of 
regex’s in many languages is that when you match a pattern, the various matches 
are dumped into numbered result values (often by the order of the parens in the 
pattern).  This is totally barbaric: it begs for off by one errors, often 
breaks as the program is being evolved/maintained, etc.  It is just as bad as 
printf/scanf!  

You should instead be able to directly bind subexpressions into local 
variables.  For example if you were trying to match something like “42: Chris”, 
you should be able to use straw man syntax like this:

   case /(let id: \d+): (let name: \w+)/: print(id); print(name)

Unless we were willing to dramatically expand how patterns work, this requires 
baking support into the language.

5. Scanner/“Formatter" integration:  Taking the above one step farther, we 
could have default patterns for known types (and make it extensible to user 
defined types of course). For example, \d+ is the obvious pattern for integers, 
so you should be able to write the above like this (in principle):

   case /(let id: Int): (let name: \w+)/: print(id); print(name)

In addition to avoiding having to specify \d+ all the time, this eliminates the 
need for a “string to int” conversion after the pattern is matched, because id 
would be bound as type Int already.


Anyway, to summarize, I think that getting regex’s into the language is really 
important and expect them to be widely used.  As such, I think it is worth 
burning compiler/language complexity to make them be truly great in Swift.

-Chris


_______________________________________________
swift-evolution mailing list
[email protected]
https://lists.swift.org/mailman/listinfo/swift-evolution

Reply via email to