> On Jan 16, 2018, at 2:18 PM, Michael Ilseman via swift-evolution
> <swift-evolution@swift.org> wrote:
>
> (Replying to both Eneko and George at once)
>
>>>>> I wonder if it is worth considering (for lack of a better word) *verbose*
>>>>> regular expression for Swift.
>
> It is certainly worth thought; even if we don’t go down that path there are
> lessons to pick up along the way. I believe “verbal expressions” is basically
> what you’re describing:
> https://github.com/VerbalExpressions/SwiftVerbalExpressions
>
>> On Jan 16, 2018, at 11:24 AM, Eneko Alonso via swift-evolution
>> <swift-evolution@swift.org> wrote:
>>
>> Thank you for the reply. The part I didn’t understand is whether giving names
>> to the captured groups would be mandatory. Hopefully not.
>>
>> Assuming the user does not need names, the groups could be captured in an
>> unlabeled tuple.
>
> I mention this through use of ‘_’.
>
> A construct like (let _ = \d+) could produce an unlabeled tuple element.
>
> Thinking about explicit capture names, etc., is all subject to change based
> on more investigation and playing around with examples. See my email exchange
> with John Holdsworth, where most names end up being redundant with
> destructuring at their only use site. That may have just been overly
> simplistic examples, but maybe not.
>
>> Digits could always be inferred to be numeric (Int) and they should always
>> be “exact” (to match "\d"):
>>
>> let usPhoneNumber: Regex = (.digits(3) + "-").oneOrZero + .digits(3) + "-" +
>> .digits(4)
>
> What if you want to match a sequence of digits that is too large to fit in
> an Int? For example, the market cap of any stock in the S&P 500 would
> overflow Int on 32-bit platforms. Having the default represent a portion of
> the input (whether that be Substring or just a Range) is more faithful to the
> purpose of captures, which is matching parts of text. Explicitly specifying
> a type is syntax for passing the capture into an init that serves as both a
> capture-validator and a value constructor, which is really just yet
> another kind of Pattern. (This might be generalizable to use beyond regexes,
> but that’s a whole other digression.) This also aids discovery, as you know
> what type’s conformance to RegexSubmatchableiblewobble to check.
>
> (Note that some way to get slices or ranges will always be important for
> things like case-insensitive matching: changing case can change the number of
> graphemes in a string.)
>
>> Personally, I like the `.optional` better than `.oneOrZero`:
>>
>> let usPhoneNumber = Regex.optional(.digits(3) + "-") + .digits(3) + "-" +
>> .digits(4)
>>
>> Would it be possible to support both condensed and extended syntax?
>>
>> let usPhoneNumber = / (\d{3} + "-")? + (\d{3}) + "-" + (\d{4}) /
>>
>> Maybe only extended (verbose) syntax would support named groups?
>
> “\d” is just syntax for a built-in character class named “digit”. There will
> be some way to use a character class, whether built-in or user-defined, in a
> regex.
>
> For example, in Perl 6, you can say “\d” or “<digit>”, both of which are
> equivalent. Shortcuts for some built-in character classes are convenient and
> leverage the collective understanding of regexes amongst developers, and I
> don’t think they cause harm.
>
>> Eneko
>>
>>> On Jan 16, 2018, at 10:01 AM, George Leontiev <georgeleont...@gmail.com>
>>> wrote:
>>>
>>> @Eneko While it sure seems possible to specify the type, I think this would
>>> go against the salient point “If something’s worth capturing, it’s worth
>>> giving it a name.” Putting the name further away seems like a step backward.
>>>
>>> I could imagine a slightly more succinct syntax where things like
>>> .numberFromDigits are replaced by protocol conformance of the bound type:
>>> ```
>>> extension Int: Regexable {
>>>     func baseRegex<T>() -> Regex<T, Int>
>>> }
>>> let usPhoneNumber = (/let area: Int/.exactDigits(3) + "-").oneOrZero +
>>>                     /let routing: Int/.exactDigits(3) + "-" +
>>>                     /let local: Int/.exactDigits(4)
>>> ```
>>>
>>> In this model, the `//` syntax will only be used for initial binding and
>>> swifty transformations will build the final regex.
>>>
>>>> On Jan 16, 2018, at 9:20 AM, Eneko Alonso via swift-evolution
>>>> <swift-evolution@swift.org> wrote:
>>>>
>>>> Could it be possible to specify the regex type up front, to avoid having to
>>>> specify the type of each captured group?
>>>>
>>>> let usPhoneNumber: Regex<UnicodeScalar, (area: Int?, routing: Int, local:
>>>> Int)> = /
>>>> (\d{3}?) -
>>>> (\d{3}) -
>>>> (\d{4}) /
>>>>
>>>> “Verbose” alternative:
>>>>
>>>> let usPhoneNumber: Regex<UnicodeScalar, (area: Int?, routing: Int, local:
>>>> Int)> = /
>>>> .optional(.numberFromDigits(.exactly(3)) + "-") +
>>>> .numberFromDigits(.exactly(3)) + "-" +
>>>> .numberFromDigits(.exactly(4)) /
>>>> print(type(of: usPhoneNumber))
>>>> // => Regex<UnicodeScalar, (area: Int?, routing: Int, local: Int)>
>>>>
>>>> Thanks,
>>>> Eneko
>>>>
>>>>> On Jan 16, 2018, at 8:52 AM, George Leontiev via swift-evolution
>>>>> <swift-evolution@swift.org> wrote:
>>>>>
>>>>> Thanks, Michael. This is very interesting!
>>>>>
>>>>> I wonder if it is worth considering (for lack of a better word) *verbose*
>>>>> regular expression for Swift.
>>>>>
>>>>> For instance, your example:
>>>>> ```
>>>>> let usPhoneNumber = /
>>>>> (let area: Int? <- \d{3}?) -
>>>>> (let routing: Int <- \d{3}) -
>>>>> (let local: Int <- \d{4}) /
>>>>> ```
>>>>> would become something like (strawman syntax):
>>>>> ```
>>>>> let usPhoneNumber = /let area: Int? <- .numberFromDigits(.exactly(3))/ +
>>>>> "-" +
>>>>> /let routing: Int <- .numberFromDigits(.exactly(3))/
>>>>> + "-" +
>>>>> /let local: Int <- .numberFromDigits(.exactly(4))/
>>>>> ```
>>>>> With this format, I also noticed that your code wouldn't match
>>>>> "555-5555", only "-555-5555", so maybe it would end up being something
>>>>> like:
>>>>> ```
>>>>> let usPhoneNumber = .optional(/let area: Int <-
>>>>> .numberFromDigits(.exactly(3))/ + "-") +
>>>>> /let routing: Int <- .numberFromDigits(.exactly(3))/
>>>>> + "-" +
>>>>> /let local: Int <- .numberFromDigits(.exactly(4))/
>>>>> ```
>>>>> Notice that `area` is initially a non-optional `Int`, but becomes
>>>>> optional when transformed by the `optional` directive.
>
> That is a good catch and illustrates some of the pitfalls of regexes and the
> need to pick the right syntax. BTW, when you say optional, does it mean the
> match didn’t happen or the capture-validation didn’t succeed? In this
> example, it seems like the inclusive-or of both.

Yes, it would be the inclusive-or. This is a good example of your point above
about how capture-validation and matching can be conflated. I can’t
immediately think of a good way to make this explicit, but being able to write
/let area: Int/ to match “something that can decode to Int” feels very
convenient.

>>>>> Other directives may be:
>>>>> ```
>>>>> let decimal = /let beforeDecimalPoint: Int <--
>>>>> .numberFromDigits(.oneOrMore)/ +
>>>>> .optional("." + /let afterDecimalPoint: Int <--
>>>>> .numberFromDigits(.oneOrMore)/)
>>>>> ```
>>>>>
>>>>> In this world, the `/<--/` format will only be used for explicit binding,
>>>>> and the rest will be inferred from generic `+` operators.
>>>>>
>>>>> I also think it would be helpful if `Regex` was generic over all sequence
>>>>> types.
>>>>> Going back to the phone example, this would look something like:
>>>>> ```
>>>>> let usPhoneNumber = .optional(/let area: Int <-
>>>>> .numberFromDigits(.exactly(3))/ + "-") +
>>>>> /let routing: Int <- .numberFromDigits(.exactly(3))/
>>>>> + "-" +
>>>>> /let local: Int <- .numberFromDigits(.exactly(4))/
>>>>> print(type(of: usPhoneNumber))
>>>>> // => Regex<UnicodeScalar, (area: Int?, routing: Int, local: Int)>
>>>>> ```
>>>>> Note the addition of `UnicodeScalar` to the signature of `Regex`. Other
>>>>> interesting signatures are `Regex<JSONToken, JSONEnumeration>` or
>>>>> `Regex<HTTPRequestHeaderToken, HTTPRequestHeader>`. Building parsers
>>>>> becomes fun!
>
> I think I missed something. What does the `UnicodeScalar` type parameter do?

I was just commenting here that we may want to regex over non-strings.
Regex<UnicodeScalar, T> would operate over strings (sequences of
UnicodeScalar), but being able to create Regexes for arbitrary sequences
(non-strings) may be useful as well.
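
To make the non-string case a little more concrete, here is a minimal sketch
of a pattern type that is generic over the element type. This is not the
design under discussion: `Pattern`, `element(where:)`, `then`, `oneOrZero`,
and the `Token` enum are all hypothetical names made up for illustration. It
matches only a prefix of an `ArraySlice` and does no backtracking.

```swift
// Hypothetical sketch: none of these names exist in the standard library
// or in the proposal being discussed.

/// A pattern over any element type that produces a typed capture.
struct Pattern<Element, Capture> {
    /// Tries to match a prefix of `input`; returns the capture and the
    /// unconsumed remainder, or nil if the prefix does not match.
    let match: (ArraySlice<Element>) -> (capture: Capture, rest: ArraySlice<Element>)?
}

/// Matches one element satisfying `predicate` and captures it.
func element<E>(where predicate: @escaping (E) -> Bool) -> Pattern<E, E> {
    return Pattern<E, E>(match: { input in
        guard let first = input.first, predicate(first) else { return nil }
        return (capture: first, rest: input.dropFirst())
    })
}

extension Pattern {
    /// Sequencing: match `self`, then `next`, pairing the two captures.
    func then<Next>(_ next: Pattern<Element, Next>) -> Pattern<Element, (Capture, Next)> {
        return Pattern<Element, (Capture, Next)>(match: { input in
            guard let a = self.match(input), let b = next.match(a.rest) else { return nil }
            return (capture: (a.capture, b.capture), rest: b.rest)
        })
    }

    /// Optionality: if `self` fails, capture nil and consume nothing.
    var oneOrZero: Pattern<Element, Capture?> {
        return Pattern<Element, Capture?>(match: { input in
            if let m = self.match(input) { return (capture: m.capture, rest: m.rest) }
            return (capture: nil, rest: input)
        })
    }
}

// A made-up token type standing in for something like JSONToken.
enum Token: Equatable { case number(Int), dot }

// Captures the payload of a `.number` token as an Int.
let number = Pattern<Token, Int>(match: { input in
    guard case .number(let n)? = input.first else { return nil }
    return (capture: n, rest: input.dropFirst())
})
let dot = element(where: { (t: Token) in t == .dot })

// Roughly the `decimal` example, but over tokens instead of characters.
let decimal = number.then(dot.then(number).oneOrZero)

let tokens: [Token] = [.number(3), .dot, .number(14)]
if let m = decimal.match(tokens[...]) {
    print(m.capture.0, m.capture.1?.1 ?? 0)
}
```

In this sketch the capture of the optional part is nil only when the inner
match fails; a validating capture (say, digits that overflow Int) would need
its own failure path, which is the conflation mentioned above.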
>
>>>>> - George
>>>>>
>>>>>> On Jan 10, 2018, at 11:58 AM, Michael Ilseman via swift-evolution
>>>>>> <swift-evolution@swift.org> wrote:
>>>>>>
>>>>>> Hello, I just sent an email to swift-dev titled “State of String: ABI,
>>>>>> Performance, Ergonomics, and You!” at
>>>>>> https://lists.swift.org/pipermail/swift-dev/Week-of-Mon-20180108/006407.html,
>>>>>> whose gist can be found at
>>>>>> https://gist.github.com/milseman/bb39ef7f170641ae52c13600a512782f. I
>>>>>> posted to swift-dev as much of the content is from an implementation
>>>>>> perspective, but it also addresses many areas of potential evolution.
>>>>>> Please refer to that email for details; here’s the recap from it:
>>>>>>
>>>>>> ### Recap: Potential Additions for Swift 5
>>>>>>
>>>>>> * Some form of unmanaged or unsafe Strings, and corresponding APIs
>>>>>> * Exposing performance flags, and some way to request a scan to populate
>>>>>>   them
>>>>>> * API gaps
>>>>>> * Character and UnicodeScalar properties, such as isNewline
>>>>>> * Generalizing, and optimizing, String interpolation
>>>>>> * Regex literals, Regex type, and generalized pattern match destructuring
>>>>>> * Substitution APIs, in conjunction with Regexes.
_______________________________________________ swift-evolution mailing list swift-evolution@swift.org https://lists.swift.org/mailman/listinfo/swift-evolution