on Tue Jan 24 2017, Zach Waldowski <[email protected]> wrote:
> I'll use Karl's point here as a minor jumping-off point for a
> semi-related train of thought… I'm excited by the content of the
> original manifesto, including a powerful Unicode namespace and types.
> But as I've continued down the thread, I've had growing concern about
> modeling strings breadthwise in the type system i.e., with UTF8String
> and so on.
>
> I strongly want Swift to have world-class string processing, but I
> believe even more strongly in the language's spirit of progressive
> disclosure. Newcomers to Swift's current String API find it difficult
> (something I personally disagree with, but that's neither here nor
> there); I don't think that difficulty is solved by aggressively
> use-specific type modeling. I instead think it gives rise to the same
> severe cargo-culting that gets us the scarily prevalent
> String.Index.init(offset:) extensions in the current model.

I think you're overplaying the impact these other types will have on
the user experience. String will still be the common-currency
vocabulary type most users will handle. Other models of Unicode *will*
exist for cases where the highest performance matters, and will
interoperate smoothly with String, but most users will never know about
them.

>
> Best
>
> Zach Waldowski
> [email protected]
>
> On Tue, Jan 24, 2017, at 10:15 PM, Karl Wagner via swift-evolution wrote:
>>
>>>> I hope I am correct about the no-copy thing, and I would also like
>>>> to permit promoting C strings to Swift strings without validation.
>>>> This is obviously unsafe in general, but I know my strings... and I
>>>> care about performance. ;)
>>>
>>> We intend to support that use-case. That's part of the reason for
>>> the ValidUTF8 and ValidUTF16 encodings you see here:
>>> https://github.com/apple/swift/blob/unicode-rethink/stdlib/public/core/Unicode2.swift#L598
>>> and here:
>>> https://github.com/apple/swift/blob/unicode-rethink/stdlib/public/core/Unicode2.swift#L862
>>
>> It seems a little strange to me that a pre-validated UTF8 string from
>> C would have different types to a UTF8String (i.e. using ValidUTF8 vs
>> UTF8). It defeats the point of having the encoding represented in the
>> type-system.
>>
>> For example, if I write a generic function:
>>
>>> func sendMessage<Source: Unicode where Source.Encoding == UTF8>(from: Source)
>>
>> I would only be able to accept UTF-8 text which hasn’t already been
>> validated.
>>
>> What about if we allowed each encoding to provide multiple kinds of
>> decoder? That would also allow us to substitute our own decoders in,
>> if there are application-specific shortcuts we can take.
>>
>>> protocol UnicodeEncoding {
>>>   associatedtype CodeUnit
>>>
>>>   associatedtype ValidatingDecoder: UnicodeDecoder
>>>   associatedtype NonValidatingDecoder: UnicodeDecoder
>>> }
>>>
>>> protocol UnicodeDecoder {
>>>   associatedtype Encoding: UnicodeEncoding
>>>   associatedtype DecodedScalar: RandomAccessCollection where Iterator.Element == Encoding.CodeUnit
>>>
>>>   static func parse1Forward<C>(…) -> ParseResult<DecodedScalar, C.Index>
>>>   static func parse1Backward<C>(…) -> ParseResult<DecodedScalar, C.Index>
>>> }
>>> // Not shown: UnicodeEncoder protocol, with transcodeScalar<T> function.
>>>
>>> struct UTF8: UnicodeEncoding {
>>>   typealias CodeUnit = UInt8
>>>   typealias ValidatingDecoder = ValidatingUTF8Decoder
>>>   typealias NonValidatingDecoder = NonValidatingUTF8Decoder
>>> }
>>>
>>> struct NonValidatingUTF8Decoder: UnicodeDecoder {
>>>   typealias Encoding = UTF8
>>>   struct DecodedScalar: RandomAccessCollection { … }
>>>   // Parsing functions
>>> }
>>>
>>> struct ValidatingUTF8Decoder: UnicodeDecoder {
>>>   typealias Encoding = UTF8
>>>   typealias DecodedScalar = NonValidatingUTF8Decoder.DecodedScalar // newtype would be cool here
>>>   // Parsing functions
>>> }
>>>
>>> struct String {
>>>   init<C, Encoding, Decoder>(from: C, encodedAs: Encoding, using: Decoder = Encoding.ValidatingDecoder)
>>>     where C: Collection, C.Iterator.Element == Encoding.CodeUnit, Decoder.Encoding == Encoding {
>>>
>>>     // transcode to native String encoding using ‘Decoder’ we were given
>>>   }
>>> }
>>
>> - Karl
>> _________________________________________________
>> swift-evolution mailing list
>> [email protected]
>> https://lists.swift.org/mailman/listinfo/swift-evolution

--
-Dave
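
For anyone skimming the thread, Karl's idea of one encoding offering
several decoders can be approximated in Swift as it exists today. This
is only a rough sketch, not the prototype on the unicode-rethink branch:
SketchEncoding, SketchDecoder, SketchUTF8, ValidatingUTF8Decoder,
RepairingUTF8Decoder and makeString are all hypothetical names invented
for illustration. A truly non-validating decoder would skip checks
entirely; here a repairing decoder stands in for it, since the standard
library does not expose an unchecked path.

```swift
// Hypothetical sketch: none of these protocols or types are standard library API.
protocol SketchDecoder {
    associatedtype CodeUnit
    /// Returns the decoded text, or nil if this decoder rejects the input.
    static func decode<C: Collection>(_ units: C) -> String? where C.Element == CodeUnit
}

protocol SketchEncoding {
    associatedtype CodeUnit
    associatedtype ValidatingDecoder: SketchDecoder where ValidatingDecoder.CodeUnit == CodeUnit
    associatedtype NonValidatingDecoder: SketchDecoder where NonValidatingDecoder.CodeUnit == CodeUnit
}

/// Rejects ill-formed UTF-8 by walking the input with the standard UTF-8 codec.
struct ValidatingUTF8Decoder: SketchDecoder {
    typealias CodeUnit = UInt8
    static func decode<C: Collection>(_ units: C) -> String? where C.Element == UInt8 {
        var codec = Unicode.UTF8()
        var iterator = units.makeIterator()
        var result = ""
        while true {
            switch codec.decode(&iterator) {
            case .scalarValue(let scalar): result.unicodeScalars.append(scalar)
            case .emptyInput: return result
            case .error: return nil
            }
        }
    }
}

/// Stand-in for a non-validating decoder (an assumption of this sketch):
/// it never fails, and replaces ill-formed sequences with U+FFFD.
struct RepairingUTF8Decoder: SketchDecoder {
    typealias CodeUnit = UInt8
    static func decode<C: Collection>(_ units: C) -> String? where C.Element == UInt8 {
        return String(decoding: units, as: Unicode.UTF8.self)
    }
}

/// One encoding type, two decoders: callers constrain on the encoding only.
enum SketchUTF8: SketchEncoding {
    typealias CodeUnit = UInt8
    typealias ValidatingDecoder = ValidatingUTF8Decoder
    typealias NonValidatingDecoder = RepairingUTF8Decoder
}

/// Caller picks the decoder explicitly.
func makeString<E: SketchEncoding, D: SketchDecoder, C: Collection>(
    from units: C, encodedAs encoding: E.Type, using decoder: D.Type
) -> String? where C.Element == E.CodeUnit, D.CodeUnit == E.CodeUnit {
    return D.decode(units)
}

/// Overload that falls back to the encoding's validating decoder.
func makeString<E: SketchEncoding, C: Collection>(
    from units: C, encodedAs encoding: E.Type
) -> String? where C.Element == E.CodeUnit {
    return E.ValidatingDecoder.decode(units)
}

let wellFormed: [UInt8] = [0x68, 0x69]   // "hi"
let malformed: [UInt8] = [0x68, 0xFF]    // 0xFF never occurs in well-formed UTF-8

let strict = makeString(from: malformed, encodedAs: SketchUTF8.self)   // nil
let lenient = makeString(from: malformed, encodedAs: SketchUTF8.self,
                         using: RepairingUTF8Decoder.self)             // "h\u{FFFD}"
let plain = makeString(from: wellFormed, encodedAs: SketchUTF8.self)   // "hi"
```

The point of the arrangement is that a generic function constrained on
the encoding alone accepts text regardless of which decoder produced
it, which is the concern Karl raised about ValidUTF8 and UTF8 being
distinct encodings. Two overloads are used in place of the defaulted
`using:` parameter in his version, purely to keep the sketch simple.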
