> On Jan 11, 2018, at 9:49 AM, John Holdsworth <m...@johnholdsworth.com> wrote:
> Hi Michael,
> Thanks for sending this through. It’s an interesting read. One section gave 
> me pause however. I feel Swift should resist the siren call of combining 
> Swift Syntax with Regex syntax as it falls on the wrong side of Occam's 
> razor. ISO Regex syntax is plenty complex enough without trying to 
> incorporate named capture, desirable as it may be. Also, if I was to go down 
> that route, I’d move away from / as the delimiter which is a carry over from 
> Perl to something like e”I am a regex” to give the lexer more to go on which 
> could represent say, a cached instance of NSRegularExpression.

Sorry for the confusion: in no way is the syntax of regex literals tied to any 
syntactic standard or historical baggage. It might happen to align where that is 
obvious or beneficial to common practice, e.g. by using the same basic 
metacharacters and built-in character classes. This is important, as the literals 
certainly wouldn’t honor any standard semantically, other than perhaps UTS-18 
level-2 by coincidence (which doesn’t dictate preference of ambiguous matches, AFAIK).

As a downside, this does open a huge domain of bike shedding ;-)

The approach mentioned would allow someone (e.g. an SPM package) to provide 
functions such as these (ignoring style), which compile other regex syntaxes 
for execution on Swift’s regex engine:

// Or perhaps Regex<[Substring]>, or Regex<POSIXMatch>, details...
func compilePOSIX(_: String) throws -> Regex<[Any]>
func compileRE2(_: String) -> Regex<[Any]> // ditto
// … PCRE, ICU, JS, Perl 5, Perl 6, etc. ...

(Note that we can't use NSRegularExpression as an execution engine 
out-of-the-box, as it relies on ICU which doesn’t provide matching modulo 
canonical equivalence. Not to mention the performance issues….)

> And now for something completely different...
> Common usage patterns for a regex fall into 4 categories: deconstruction, 
> replacement, iteration and switch/case.

Could you elaborate more on this breakdown? What are the differences between 
deconstruction, iteration, and switch/case?

> Ideally the representation of a regex match would be the same for all four of 
> these categories and I’d like to argue a set of expressive regex primitives 
> can be created without building them into the language.

BTW, the “built into the language” would be confined to the regex literal 
syntax. The Regex<T> type wouldn’t necessarily need to be built-in, and could 
be constructed through other means.

> I’ve talked before about a regex match being coded as a string/regex 
> subscripting into a string and I’ve been able to move this forward since last 
> year. While this seems like an arbitrary operator to use it has some semantic 
> sense in that you are addressing a sub-part of the string with pattern as you 
> might use an index or a key. Subscripts also have some very interesting 
> properties in Swift compared to other operators or functions: You don’t have 
> to worry about precedence, they can be assigned to, used as an iterator, and 
> I've learned since my last email on this topic that the Swift type checker 
> will disambiguate multiple subscript overloads on the basis of the type of 
> the variable being assigned to.

Why do you use String as a regex rather than a new type, which could be 
ExpressibleByStringLiteral? That might help with overloading or ambiguities, 
and a new type is something we can extend with regex-specific functionality.

String could have a generic subscript from Regex<T> to T. Perhaps it could also 
be done as a setter, assigning a value of T (which may have to be string 
convertible... details).
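
To make the generic-subscript idea concrete, here is a minimal sketch. 
Everything in it is an assumption for illustration: TypedRegex<T> is a 
hypothetical stand-in for Regex<T>, the transform closure is an invented way 
to produce a T, and NSRegularExpression is used only so the sketch runs today 
(it is not the engine being discussed).

```swift
import Foundation

// Hypothetical stand-in for the Regex<T> discussed above.
struct TypedRegex<T> {
    let pattern: String
    // Turns the groups of the first match (index 0 is the whole match) into a T.
    let transform: ([Substring?]) -> T
}

extension String {
    // Generic subscript from TypedRegex<T> to T?.
    // Returns nil when the pattern is invalid or nothing matches.
    subscript<T>(regex: TypedRegex<T>) -> T? {
        guard let engine = try? NSRegularExpression(pattern: regex.pattern, options: []),
              let match = engine.firstMatch(in: self, options: [],
                                            range: NSRange(startIndex..., in: self))
        else { return nil }
        // A group that did not participate in the match becomes nil.
        let groups: [Substring?] = (0..<match.numberOfRanges).map { index in
            Range(match.range(at: index), in: self).map { self[$0] }
        }
        return regex.transform(groups)
    }
}

// The receiving type is fixed by the regex value, not by the call site.
let keyValue = TypedRegex<(key: Substring, value: Substring)>(
    pattern: "(\\w+)\\s*=\\s*(.*)"
) { groups in
    (key: groups[1] ?? "", value: groups[2] ?? "")
}
let pair = "name1 = value1"[keyValue]
```

Carrying T on the regex value means the subscript needs only one overload, 
rather than one per receiving type.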

> An extension to String can now realise the common use cases by judicious use 
> of types:
> var input = "Now is the time for all good men to come to the aid of the party"
> if input["\\w+"] {
>     print("match")
> }
> // receiving type controls data you get
> if let firstMatch: Substring = input["\\w+"] {
>     print("match: \(firstMatch)")
> }
> if let groupsOfFirstMatch: [Substring?] = input["(all) (\\w+)"] {
>     print("groups: \(groupsOfFirstMatch)")
> }
> // "splat" out up to N groups of first match
> if let (group1, group2): (String, String) = input["(all) (\\w+)"] {
>     print("group1: \(group1), group2: \(group2)")
> }
> if let allGroupsOfAllMatches: [[Substring?]] = input["(\\w)(\\w*)"] {
>     print("allGroups: \(allGroupsOfAllMatches)")
> }

I’m interested in how you view the tradeoffs of not introducing a new type. If 
there was a Regex<T>, it could have computed properties for, e.g. an eager 
allMatches, lazy allMatches, firstMatch (given some ordering semantics), 
ignoringCaptures, caseInsensitive, …. then you don’t need your “(all) ” 

> // regex replace by assignment
> input["men"] = "folk"
> print(input)

Ok, you’re starting to sell me on the subscript-setter that takes a Regex ;-). 
The setter value wouldn’t be able to use information from the captures, so we 
would probably still want a substitute API that takes a closure receiving 
captures, but this looks nice for simple usage.
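
As a rough sketch of what a closure-based substitute API could look like: the 
name replacingMatches(of:with:) and the [Substring?] capture shape are 
assumptions, and NSRegularExpression again stands in for the engine so that 
the code runs as written.

```swift
import Foundation

extension String {
    // Hypothetical helper: replaces every match of `pattern`, handing the
    // groups of each match (index 0 is the whole match) to `replacement`.
    func replacingMatches(of pattern: String,
                          with replacement: ([Substring?]) -> String) throws -> String {
        let engine = try NSRegularExpression(pattern: pattern, options: [])
        let fullRange = NSRange(startIndex..., in: self)
        var result = ""
        var cursor = startIndex
        for match in engine.matches(in: self, options: [], range: fullRange) {
            guard let whole = Range(match.range, in: self) else { continue }
            let groups: [Substring?] = (0..<match.numberOfRanges).map { index in
                Range(match.range(at: index), in: self).map { self[$0] }
            }
            result += self[cursor..<whole.lowerBound]   // text before this match
            result += replacement(groups)               // computed replacement
            cursor = whole.upperBound
        }
        result += self[cursor...]                       // text after the last match
        return result
    }
}

// The replacement can depend on the captures, which a plain setter cannot do.
let doubled = try? "3 apples and 12 pears".replacingMatches(of: "(\\d+) (\\w+)") { groups in
    let count = Int(groups[1] ?? "") ?? 0
    let noun = groups[2] ?? ""
    return "\(count * 2) \(noun)"
}
```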

Transcribing into the presented approach (Using 「 and 」 as delimiters in a 
surely-futile effort to not focus on specific syntax):

input[「\d+」.firstMatch] = 123
input[「\d+」.allMatches] = sequence(first: 42) { return $0 + 1 }

> // parsing a properties file using regex as iterator
> let props = """
>     name1 = value1
>     name2 = value2
>     """
> var params = [String: String]()
> for groups in props["(\\w+)\\s*=\\s*(.*)"] {
>     params[String(groups[1]!)] = String(groups[2]!)
> }
> print(params)

Translating this over to the literal style:

for (name, value) in props[「(let _ = \w+) \s* = \s* (let _ = .*)」.lineByLine] {
  print(name, value)
}

Or even better, give it a name!

// Regex<(name: Substring, value: Substring)>
let propertyPattern = 「(let name = \w+) \s* = \s* (let value = .*)」
for (name, value) in props[propertyPattern.lineByLine] {
  print(name, value)
}

> The case for switches is slightly more opaque in order to avoid executing the 
> match twice but viable.
> let match = RegexMatch()
> switch input {
> case RegexPattern("(\\w)(\\w*)", capture: match):
>     let (first, rest) = input[match]
>     print("\(first) \(rest)")
> default:
>     break
> }

Using the literal approach:

// Regex<(leading: Substring, trailing: Substring)>, or perhaps
// Regex<(leading: Character, trailing: Substring)>, details….
let peelFirstWordChar = 「(let leading = \w)(let trailing = \w+)」
switch input {
case let (first, rest) <- peelFirstWordChar:
  print("\(first) \(rest)")
default:
  break
}
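
For comparison, this is roughly how far today's language gets without new 
syntax: a sketch that overloads the pattern-match operator ~= for a wrapper 
type. The names Matching and classify are hypothetical, and the matching is 
delegated to Foundation's range(of:options:) with .regularExpression.

```swift
import Foundation

// Hypothetical wrapper so that a pattern string can appear in a `case`.
struct Matching {
    let pattern: String
    init(_ pattern: String) { self.pattern = pattern }
}

// `switch` evaluates `pattern ~= value` for expression patterns.
// True when the regex matches anywhere in `value`.
func ~= (matcher: Matching, value: String) -> Bool {
    return value.range(of: matcher.pattern, options: .regularExpression) != nil
}

func classify(_ input: String) -> String {
    switch input {
    case Matching("^\\d+$"): return "number"
    case Matching("^\\w+$"): return "word"
    default: return "other"
    }
}
```

Since ~= can only answer yes or no, the captures have to travel through a 
side channel, such as the RegexMatch object in the quoted example. Generalized 
destructuring is meant to close that gap.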

> This is explored in the attached playground (repo: 
> https://github.com/johnno1962/SwiftRegex4)
> <SwiftRegex4.playground.zip>
> I’m not sure I really expect this to take off as an idea but I’d like to make 
> sure it's out there as an option and it certainly qualifies as “out there”.

I think it’s very interesting! Thanks for sharing. Do you have more usage 

> John
>> On 10 Jan 2018, at 19:58, Michael Ilseman via swift-evolution 
>> <swift-evolution@swift.org> wrote:
>> Hello, I just sent an email to swift-dev titled "State of String: ABI, 
>> Performance, Ergonomics, and You!" at 
>> https://lists.swift.org/pipermail/swift-dev/Week-of-Mon-20180108/006407.html,
>> whose gist can be found at 
>> https://gist.github.com/milseman/bb39ef7f170641ae52c13600a512782f. I 
>> posted to swift-dev as much of the content is from an implementation 
>> perspective, but it also addresses many areas of potential evolution. Please 
>> refer to that email for details; here’s the recap from it:
>> ### Recap: Potential Additions for Swift 5
>> * Some form of unmanaged or unsafe Strings, and corresponding APIs
>> * Exposing performance flags, and some way to request a scan to populate them
>> * API gaps
>> * Character and UnicodeScalar properties, such as isNewline
>> * Generalizing, and optimizing, String interpolation
>> * Regex literals, Regex type, and generalized pattern match destructuring
>> * Substitution APIs, in conjunction with Regexes.
>> _______________________________________________
>> swift-evolution mailing list
>> swift-evolution@swift.org
>> https://lists.swift.org/mailman/listinfo/swift-evolution
