> On Mar 30, 2017, at 2:36 PM, Ben Cohen <[email protected]> wrote:
>
> The big win for Unicode is it is short. We want to encourage people to write
> their extensions on this protocol. We want people who previously extended
> String to feel very comfortable extending Unicode. It also helps emphasis how
> important the Unicode-ness of Swift.String is. I like the idea of
> Unicode.Collection, but it is a little intimidating and making it even a tiny
> bit intimidating is worrying to me from an adoption perspective.

Yeah, I understand why "Collection" might be intimidating. But I think "Unicode" would be too: it's opaque enough that people wouldn't be entirely sure whether they were extending the right thing. I did a quick run-through of different languages and the protocols/interfaces/whatever their string types conform to, but most don't seem to have anything that abstracts string types. The only similar things I could find were `CharSequence` in Java, `StringLike` in Scala...and `Stringy` in Perl 6. And I'm sure you thought you were joking!

Honestly, I'd recommend just going with `StringProtocol` unless you can come up with an adjective form you like (`Stringlike`? `Textual`?). It's a bit clumsy, but it's crystal clear. Stupid name, but you'll never forget it.

>> I'm a little worried about this because it seems to imply that the protocol
>> cannot include any mutation operations that aren't in
>> `RangeReplaceableCollection`. For instance, it won't be possible to include
>> an in-place `applyTransform` method in the protocol. Do you anticipate that
>> being an issue? Might it be a good idea to define a parallel `Mutable` or
>> `RangeReplaceable` protocol?
>>
>
> You can always assign to self. Then provide more efficient implementations
> where RangeReplaceableCollection. We do this elsewhere in the std lib with
> collections e.g.
> https://github.com/apple/swift/blob/master/stdlib/public/core/Collection.swift#L1277.
>
> Proliferating protocol combinations is problematic (looking at you,
> BidirectionalMutableRandomAccessSlice).

Nobody likes proliferation, but in this case it'd be because there genuinely were additional semantics that were only available on mutable strings.

(Once upon a time, I think I requested the ability to write `func index(of elem: Iterator.Element) -> Index? where Iterator.Element: Equatable`. Could such a feature be used for this?
`func apply(_ transform: StringTransform, reverse: Bool) where Self: RangeReplaceableCollection`?)

>>> The C string interop methods will be updated to those described here: a
>>> single withCString operation and two init(cString:) constructors, one for
>>> UTF8 and one for arbitrary encodings.
>>
>> Sorry if I'm repeating something that was already discussed, but is there a
>> reason you don't include a `withCString` variant for arbitrary encodings? It
>> seems like an odd asymmetry.
>
> Hmm. Is this a common use-case people have? Symmetry for the sake of it
> doesn’t seem enough. If uncommon, you can do it via an Array that you
> nul-terminate manually.

Is `init(cString:encoding:)` a common use case? If it is, I'm not sure why the opposite wouldn't be.

> Yeah, it’s tempting to make ParseResult general, and the only reason we held
> off is because we don’t want making sure it’s generally useful to be a
> distraction.

Understandable. I wonder if some part of the parsing algorithm could somehow be generalized so it was suitable for many purposes and then put on `Collection`, with the `UnicodeEncoding` then being passed as a parameter to it. If so, that would justify making `ParseResult` a top-level type.

> Ah, yes. Here it is:
>
> public protocol EncodedScalarProtocol : RandomAccessCollection {
>   init?(_ scalarValue: UnicodeScalar)
>   var utf8: UTF8.EncodedScalar { get }
>   var utf16: UTF16.EncodedScalar { get }
>   var utf32: UTF32.EncodedScalar { get }
> }

What is the `Element` type expected to be here?

I think what's missing is a holistic overview of the encoding system.
So, please help me write this function:

    func unicodeScalars<Encoding: UnicodeEncoding>(in data: Data, using encoding: Encoding.Type) -> [UnicodeScalar] {
        var scalars: [UnicodeScalar] = []

        data.withUnsafeBytes { (bytes: UnsafePointer<$ParseInputElement>) in
            let buffer = UnsafeBufferPointer(start: bytes, count: data.count / MemoryLayout<$ParseInputElement>.size)
            encoding.parseForward(buffer) { encodedScalar in
                let unicodeScalar: UnicodeScalar = $doSomething(encodedScalar)
                scalars.append(unicodeScalar)
            }
        }

        return scalars
    }

What type would I put for $ParseInputElement? What function or initializer do I call for $doSomething?

>>> @discardableResult
>>> public static func parseForward<C: Collection>(
>>>   _ input: C,
>>>   repairingIllFormedSequences makeRepairs: Bool = true,
>>>   into output: (EncodedScalar) throws->Void
>>> ) rethrows -> (remainder: C.SubSequence, errorCount: Int)
>>
>> Are there constraints missing on `parseForward`?
>>
>
> Yep – see the note that appears a little later. They’re really implementation
> details – so not something to capture in the proposal – which may or may not
> be needed depending on whether this lands before or after the generics
> features that make them redundant.

No, I mean because this says nothing about `C`'s element type. Presumably you can't parse a bunch of `UIView`s into Unicode scalars, so there must be some kind of constraint on the collection's elements. What is it?

...oh, I notice that `parseScalarForward(_:knownCount:)` has the clause `where C.Iterator.Element == EncodedScalar.Iterator.Element` attached. Should that also be attached to `parseForward(_:repairingIllFormedSequences:into:)`?

>> What do these do if `makeRepairs` is false? Would it be clearer if we made
>> an enum that described the behaviors and changed the label to something like
>> `ifIllFormed:`?
>
> The Unicode standard specifies values to substitute when making repairs.

I'm asking what happens if you *don't* want to make repairs.
Does it, say, stop immediately, returning an `errorCount` of `1` and a `remainder` that starts at the site of the error? If so, would we be better off having that parameter be something like `ifIllFormed: .stop` or `ifIllFormed: .repair`, rather than `repairingIllFormedSequences: false` or `repairingIllFormedSequences: true`?

>>> Due to the change in internal implementation, this means that these
>>> operations will be O(n) rather than O(1). This is not expected to be a
>>> major concern, based on experiences from a similar change made to Java, but
>>> projects will be able to work around performance issues without upgrading
>>> to Swift 4 by explicitly typing slices as Substring, which will call the
>>> Swift 4 variant, and which will be available but not invoked by default in
>>> Swift 3 mode.
>>
>> Will there be a way to make this also work with a real Swift 3 compiler? For
>> instance, can you define `typealias Substring = String` in such a way that
>> real Swift 3 will parse and use it, but Swift 4 in Swift 3 mode will ignore
>> it?
>
> Are you talking about this as a way for people to change their code, while
> still being able to compile their code with the old compiler? Yes, that might
> be a good strategy, will think about that.

Yes, that's what I'm talking about. I guess the actual question is, does `#if swift(>=4)` come out as `true` for Swift 4 in Swift 3 mode? If not, is there some way to detect that you're using Swift 4 in Swift 3 mode? (I suppose one answer is "yes, Swift 4 in Swift 3 mode is called Swift 3.2"; I just haven't heard anyone mention anything like that yet.)

In either case, if there's some way to distinguish, you could say:

    #if thisIsRealSwift3NotSwift4PretendingToBeSwift3()
    typealias Substring = String
    #endif

And then you could write the rest of your code using `Substring` and it would compile using both Swift 3 and Swift 4 toolchains, never forcing an implicit copy.

--
Brent Royal-Gordon
Architechies

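For illustration, here is a minimal sketch of the `typealias Substring = String` idea discussed above. It rests on one assumption that the thread itself leaves open: that a Swift 4 compiler running in Swift 3 mode reports its language version as 3.2, so that `#if !swift(>=3.2)` is true only on a real Swift 3.0/3.1 toolchain. The helper `head(of:upTo:)` is a hypothetical name used only for this example and is not part of the proposal.

```swift
// Assumption: Swift 4 in Swift 3 mode identifies itself as Swift 3.2, so this
// branch is compiled only by a real Swift 3.0/3.1 compiler, which has no
// Substring type of its own.
#if !swift(>=3.2)
typealias Substring = String
#endif

// Hypothetical helper: returns the part of `text` before `end`.
// On toolchains with a real Substring, range subscripting yields a Substring;
// on a real Swift 3 compiler it yields a String, which the alias above lets us
// spell as Substring too.
func head(of text: String, upTo end: String.Index) -> Substring {
    return text[text.startIndex..<end]
}

let greeting = "hello world"
let end = greeting.index(greeting.startIndex, offsetBy: 5)
let slice = head(of: greeting, upTo: end)
print(slice)  // prints "hello"
```

Code written this way names the slice type as `Substring` everywhere, so the same source should type-check whether or not the toolchain ships a distinct `Substring`. Whether it actually avoids the implicit copy on a given compiler depends on how that compiler implements slicing.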
_______________________________________________
swift-evolution mailing list
[email protected]
https://lists.swift.org/mailman/listinfo/swift-evolution
