> On Jan 16, 2018, at 2:18 PM, Michael Ilseman via swift-evolution 
> <swift-evolution@swift.org> wrote:
> 
> (Replying to both Eneko and George at once)
> 
>>>>> I wonder if it is worth considering (for lack of a better word) *verbose* 
>>>>> regular expression for Swift.
> 
>>>> 
> 
> 
> It is certainly worth thought; even if we don’t go down that path there are 
> lessons to pick up along the way. I believe “verbal expressions” is basically 
> what you’re describing: 
> https://github.com/VerbalExpressions/SwiftVerbalExpressions
> 
> 
>> On Jan 16, 2018, at 11:24 AM, Eneko Alonso via swift-evolution 
>> <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:
>> 
>> Thank you for the reply. The part I didn’t understand is whether giving names 
>> to the captured groups would be mandatory. Hopefully not.
>> 
>> Assuming the user does not need names, the groups could be captured in an 
>> unlabeled tuple.
>> 
> 
> I mention this through use of ‘_’.
> 
> A construct like (let _ = \d+) could produce an unlabeled tuple element.
> 
> 
> 
> Thinking about explicit capture names, etc., is all subject to change based 
> on more investigation and playing around with examples. See my email exchange 
> with John Holdsworth, where most names end up being redundant with 
> destructuring at their only use site. That may have just been overly 
> simplistic examples, but maybe not.
> 
> 
>> Digits could always be inferred to be numeric (Int) and they should always 
>> be “exact” (to match "\d"):
>> 
>> let usPhoneNumber: Regex = (.digits(3) + "-").oneOrZero + .digits(3) + "-" + 
>> .digits(4)
>> 
> 
> What if you want to match a sequence of digits that are too large to fit in 
> an Int? For example, the market cap of any stock in the S&P 500 would 
> overflow Int on 32-bit platforms. Having the default represent a portion of 
> the input (whether that be Substring or just a Range) is more faithful to the 
> purposes of captures, which is matching parts of text. Explicitly specifying 
> a type is syntax for passing the capture into an init that serves as both a 
> capture-validator as well as a value constructor, which is really just yet 
> another kind of Pattern. (This might be generalizable to use beyond regexes, 
> but that’s a whole other digression.) This also aids discovery, as you know 
> what type’s conformance to RegexSubmatchableiblewobble to check.
> 
> (Note that some way to get slices or ranges will always be important for 
> things like case-insensitive matching: changing case can change the number of 
> graphemes in a string).
> 
> 
>> Personally, I like the `.optional` better than `.oneOrZero`:
>> 
>> let usPhoneNumber = Regex.optional(.digits(3) + "-") + .digits(3) + "-" + 
>> .digits(4)
>> 
>> Would it be possible to support both condensed and extended syntax? 
>> 
>> let usPhoneNumber = / (\d{3} + "-")? + (\d{3}) + "-" + (\d{4}) /
>> 
>> Maybe only extended (verbose) syntax would support named groups?
>> 
> 
> “\d” is just syntax for a built-in character class named “digit”. There will 
> be some way to use a character class, whether built-in or user-defined, in a 
> regex.
> 
> For example, in Perl 6, you can say “\d” or “<digit>”, both of which are 
> equivalent. Shortcuts for some built-in character classes are convenient and 
> leverage the collective understanding of regexes amongst developers, and I 
> don’t think they cause harm.
> 
>> Eneko
>> 
>> 
>>> On Jan 16, 2018, at 10:01 AM, George Leontiev <georgeleont...@gmail.com 
>>> <mailto:georgeleont...@gmail.com>> wrote:
>>> 
>>> @Eneko While it sure seems possible to specify the type, I think this would 
>>> go against the salient point “If something’s worth capturing, it’s worth 
>>> giving it a name.” Putting the name further away seems like a step backward.
>>> 
>>> 
>>> I could imagine a slightly more succinct syntax where things like 
>>> .numberFromDigits are replaced by protocol conformance of the bound type:
>>> ```
>>> extension Int: Regexable {
>>>     func baseRegex<T>() -> Regex<T, Int>
>>> }
>>> let usPhoneNumber = (/let area: Int/.exactDigits(3) + "-").oneOrZero +
>>>                     /let routing: Int/.exactDigits(3) + "-" +
>>>                     /let local: Int/.exactDigits(4)
>>> ```
>>> 
>>> In this model, the `//` syntax will only be used for initial binding and 
>>> swifty transformations will build the final regex.
>>> 
>>> 
>>>> On Jan 16, 2018, at 9:20 AM, Eneko Alonso via swift-evolution 
>>>> <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:
>>>> 
>>>> Could it be possible to specify the regex type ahead avoiding having to 
>>>> specify the type of each captured group?
>>>> 
>>>> let usPhoneNumber: Regex<UnicodeScalar, (area: Int?, routing: Int, local: 
>>>> Int)> = /
>>>>   (\d{3}?) -
>>>>   (\d{3}) -
>>>>   (\d{4}) /
>>>> 
>>>> “Verbose” alternative:
>>>> 
>>>> let usPhoneNumber: Regex<UnicodeScalar, (area: Int?, routing: Int, local: 
>>>> Int)> = / 
>>>>   .optional(.numberFromDigits(.exactly(3)) + "-") +
>>>>   .numberFromDigits(.exactly(3)) + "-" +
>>>>   .numberFromDigits(.exactly(4)) /
>>>> print(type(of: usPhoneNumber))
>>>> // => Regex<UnicodeScalar, (area: Int?, routing: Int, local: Int)>
>>>> 
>>>> 
>>>> Thanks,
>>>> Eneko
>>>> 
>>>> 
>>>>> On Jan 16, 2018, at 8:52 AM, George Leontiev via swift-evolution 
>>>>> <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:
>>>>> 
>>>>> Thanks, Michael. This is very interesting!
>>>>> 
>>>>> I wonder if it is worth considering (for lack of a better word) *verbose* 
>>>>> regular expression for Swift.
>>>>> 
>>>>> For instance, your example:
>>>>> ```
>>>>> let usPhoneNumber = /
>>>>>   (let area: Int? <- \d{3}?) -
>>>>>   (let routing: Int <- \d{3}) -
>>>>>   (let local: Int <- \d{4}) /
>>>>> ```
>>>>> would become something like (strawman syntax):
>>>>> ```
>>>>> let usPhoneNumber = /let area: Int? <- .numberFromDigits(.exactly(3))/ + "-" +
>>>>>                     /let routing: Int <- .numberFromDigits(.exactly(3))/ + "-" +
>>>>>                     /let local: Int <- .numberFromDigits(.exactly(4))/
>>>>> ```
>>>>> With this format, I also noticed that your code wouldn't match 
>>>>> "555-5555", only "-555-5555", so maybe it would end up being something 
>>>>> like:
>>>>> ```
>>>>> let usPhoneNumber = .optional(/let area: Int <- .numberFromDigits(.exactly(3))/ + "-") +
>>>>>                     /let routing: Int <- .numberFromDigits(.exactly(3))/ + "-" +
>>>>>                     /let local: Int <- .numberFromDigits(.exactly(4))/
>>>>> ```
>>>>> Notice that `area` is initially a non-optional `Int`, but becomes 
>>>>> optional when transformed by the `optional` directive.
> 
> That is a good catch and illustrates some of the pitfalls of regexes and the 
> need to pick the right syntax. BTW, when you say optional, does it mean the 
> match didn’t happen or the capture-validation didn’t succeed? In this 
> example, it seems like the inclusive-or of both.

Yes, it would be inclusive-or. This is a good example of your point above about 
how capture-validation and matching can be conflated. I can’t immediately think 
of a good way to make this explicit, but being able to do /let area: Int/ to 
match “something that can decode to Int” feels very convenient.
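
To make the distinction concrete, here is a rough sketch in plain Swift with no regex syntax at all. Every name in it (`Capture`, `captureInt`) is hypothetical and only for illustration; nothing here is proposed API. It keeps "the pattern did not match" apart from "the text matched, but Int's failable init rejected it", which are the two cases an `Int?` capture would collapse into nil:

```swift
// Hypothetical illustration only; these names are not part of any proposal.
enum Capture<Value> {
    case noMatch               // the pattern did not match at all
    case invalid(Substring)    // the text matched, but the type's init rejected it
    case value(Value)          // the text matched and was converted
}

/// Matches a leading run of ASCII digits, then validates the matched text
/// by handing it to Int's failable initializer.
func captureInt(from input: String) -> Capture<Int> {
    let digits = input.prefix(while: { $0.isASCII && $0.isNumber })
    if digits.isEmpty { return .noMatch }
    guard let number = Int(digits) else { return .invalid(digits) }
    return .value(number)
}

let matched = captureInt(from: "555-1234")                 // digits found and converted
let unmatched = captureInt(from: "abc")                    // no digits at the front
let overflowed = captureInt(from: "99999999999999999999999") // digits found, too large for Int
```

The last call is the overflow case from earlier in the thread: the digits are there, so the match succeeds, but the value cannot be represented. Whether the surface syntax should expose that difference or fold it into an Optional is exactly the open question.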

> 
>>>>> Other directives may be:
>>>>> ```
>>>>> let decimal = /let beforeDecimalPoint: Int <-- .numberFromDigits(.oneOrMore)/ +
>>>>>               .optional("." + /let afterDecimalPoint: Int <-- .numberFromDigits(.oneOrMore)/)
>>>>> ```
>>>>> 
>>>>> In this world, the `/<--/` format will only be used for explicit binding, 
>>>>> and the rest will be inferred from generic `+` operators.
>>>>> 
>>>>> 
>>>>> I also think it would be helpful if `Regex` was generic over all sequence 
>>>>> types.
>>>>> Going back to the phone example, this would look something like:
>>>>> ```
>>>>> let usPhoneNumber = .optional(/let area: Int <- .numberFromDigits(.exactly(3))/ + "-") +
>>>>>                     /let routing: Int <- .numberFromDigits(.exactly(3))/ + "-" +
>>>>>                     /let local: Int <- .numberFromDigits(.exactly(4))/
>>>>> print(type(of: usPhoneNumber))
>>>>> // => Regex<UnicodeScalar, (area: Int?, routing: Int, local: Int)>
>>>>> ```
>>>>> Note the addition of `UnicodeScalar` to the signature of `Regex`. Other 
>>>>> interesting signatures are `Regex<JSONToken, JSONEnumeration>` or 
>>>>> `Regex<HTTPRequestHeaderToken, HTTPRequestHeader>`. Building parsers 
>>>>> becomes fun!
>>>>> 
> 
> I think I missed something. What does the `UnicodeScalar` type parameter do?

I was just commenting here that we may want to regex over non-strings. 
Regex<UnicodeScalar, T> would operate over strings (sequences of 
UnicodeScalar), but being able to create Regexes for arbitrary sequences 
(non-strings) may be useful as well.
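
As a very rough sketch of what I mean, here is a toy combinator that is generic over the element type. The names (`Pattern`, `one`, `oneOrZero`) are made up for this email and are not proposed API; it does no capturing and no backtracking, and it works on `ArraySlice` only to keep the example short:

```swift
// Hypothetical sketch; `Pattern` and its members are illustrative names only.
struct Pattern<Element> {
    // Tries to match at the front of the input and returns the unconsumed
    // remainder, or nil if the match fails.
    let match: (ArraySlice<Element>) -> ArraySlice<Element>?
}

extension Pattern {
    /// Matches exactly one element satisfying `predicate`.
    static func one(_ predicate: @escaping (Element) -> Bool) -> Pattern {
        return Pattern { input in
            guard let first = input.first, predicate(first) else { return nil }
            return input.dropFirst()
        }
    }

    /// Sequencing: match `lhs`, then match `rhs` on whatever is left.
    static func + (lhs: Pattern, rhs: Pattern) -> Pattern {
        return Pattern { input in lhs.match(input).flatMap(rhs.match) }
    }

    /// Zero-or-one: never fails, and consumes input only if the inner pattern matches.
    var oneOrZero: Pattern {
        return Pattern { input in self.match(input) ?? input }
    }
}

// The same combinators over a non-string sequence (Ints here).
let even = Pattern<Int>.one { $0 % 2 == 0 }
let odd = Pattern<Int>.one { $0 % 2 != 0 }
let evenThenTwoOdds = even.oneOrZero + odd + odd

let leftover = evenThenTwoOdds.match(ArraySlice([2, 3, 5, 8]))   // consumes 2, 3, 5
let noLeadingEven = evenThenTwoOdds.match(ArraySlice([3, 5]))    // the optional part is skipped
let failed = evenThenTwoOdds.match(ArraySlice([2, 4]))           // 4 is not odd, so no match
```

Nothing in `Pattern` knows about characters, so the element type could just as well be a token enum, which is the JSON or HTTP-header case I had in mind.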

> 
>>>>> - George
>>>>> 
>>>>>> On Jan 10, 2018, at 11:58 AM, Michael Ilseman via swift-evolution 
>>>>>> <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:
>>>>>> 
>>>>>> Hello, I just sent an email to swift-dev titled “State of String: ABI, 
>>>>>> Performance, Ergonomics, and You!” at 
>>>>>> https://lists.swift.org/pipermail/swift-dev/Week-of-Mon-20180108/006407.html,
>>>>>> whose gist can be found at 
>>>>>> https://gist.github.com/milseman/bb39ef7f170641ae52c13600a512782f. I 
>>>>>> posted to swift-dev as much of the content is from an implementation 
>>>>>> perspective, but it also addresses many areas of potential evolution. 
>>>>>> Please refer to that email for details; here’s the recap from it:
>>>>>> 
>>>>>> ### Recap: Potential Additions for Swift 5
>>>>>> 
>>>>>> * Some form of unmanaged or unsafe Strings, and corresponding APIs
>>>>>> * Exposing performance flags, and some way to request a scan to populate 
>>>>>> them
>>>>>> * API gaps
>>>>>> * Character and UnicodeScalar properties, such as isNewline
>>>>>> * Generalizing, and optimizing, String interpolation
>>>>>> * Regex literals, Regex type, and generalized pattern match destructuring
>>>>>> * Substitution APIs, in conjunction with Regexes.
>>>>>> 
>>>>>> _______________________________________________
>>>>>> swift-evolution mailing list
>>>>>> swift-evolution@swift.org <mailto:swift-evolution@swift.org>
>>>>>> https://lists.swift.org/mailman/listinfo/swift-evolution 
>>>>>> <https://lists.swift.org/mailman/listinfo/swift-evolution>
>>>>> 