Re: [swift-evolution] [Pitch] String revision proposal #1

Ben Cohen via swift-evolution Wed, 05 Apr 2017 12:16:46 -0700

Hi Brent,

Sorry, I realized I failed to reply to these at the time. See below.


> On Mar 30, 2017, at 6:52 PM, Brent Royal-Gordon <[email protected]> wrote:
> 
>> On Mar 30, 2017, at 2:36 PM, Ben Cohen <[email protected]> wrote:
>> 
>> The big win for Unicode is it is short. We want to encourage people to write 
>> their extensions on this protocol. We want people who previously extended 
>> String to feel very comfortable extending Unicode. It also helps emphasize 
>> how important the Unicode-ness of Swift.String is. I like the idea of 
>> Unicode.Collection, but it is a little intimidating and making it even a 
>> tiny bit intimidating is worrying to me from an adoption perspective. 
> 
> Yeah, I understand why "Collection" might be intimidating. But I think 
> "Unicode" would be too—it's opaque enough that people wouldn't be entirely 
> sure whether they were extending the right thing.
> 
> I did a quick run-through of different languages and the 
> protocols/interfaces/whatever their string types conform to, but most don't 
> seem to have anything that abstracts string types. The only similar things I 
> could find were `CharSequence` in Java, `StringLike` in Scala...and `Stringy` 
> in Perl 6. And I'm sure you thought you were joking!
> 

Ha!

> Honestly, I'd recommend just going with `StringProtocol` unless you can come 
> up with an adjective form you like (`Stringlike`? `Textual`?). It's a bit 
> clumsy, but it's crystal clear. Stupid name, but you'll never forget it.
> 

I think it’s kind of evenly balanced between Unicode and StringProtocol. 
Neither is perfect.

>>> I'm a little worried about this because it seems to imply that the protocol 
>>> cannot include any mutation operations that aren't in 
>>> `RangeReplaceableCollection`. For instance, it won't be possible to include 
>>> an in-place `applyTransform` method in the protocol. Do you anticipate that 
>>> being an issue? Might it be a good idea to define a parallel `Mutable` or 
>>> `RangeReplaceable` protocol?
>>> 
>> 
>> You can always assign to self, then provide more efficient implementations 
>> where Self: RangeReplaceableCollection. We do this elsewhere in the std lib 
>> with collections, e.g. 
>> https://github.com/apple/swift/blob/master/stdlib/public/core/Collection.swift#L1277
>> 
>> Proliferating protocol combinations is problematic (looking at you, 
>> BidirectionalMutableRandomAccessSlice).
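A minimal sketch of the "assign to self, then specialize" pattern described above. `TextLike` is a hypothetical stand-in for the proposed `Unicode`/`StringProtocol`, and `FrozenText` and `removeSpaces()` are hypothetical names for illustration. The unconstrained extension mutates by building a new value and assigning it to `self`; a constrained extension supplies a cheaper in-place version when `Self` is also a `RangeReplaceableCollection`, and overload resolution picks the more specialized one for concrete types.

```swift
// Hypothetical stand-in for the proposed string protocol.
protocol TextLike: Collection where Element == Character {
    init<S: Sequence>(_ characters: S) where S.Element == Character
}

// String already provides a matching initializer.
extension String: TextLike {}

// Hypothetical type that conforms but is not range-replaceable.
struct FrozenText: TextLike {
    private let storage: [Character]
    init<S: Sequence>(_ characters: S) where S.Element == Character {
        storage = Array(characters)
    }
    var startIndex: Int { return storage.startIndex }
    var endIndex: Int { return storage.endIndex }
    subscript(position: Int) -> Character { return storage[position] }
    func index(after i: Int) -> Int { return i + 1 }
}

extension TextLike {
    // Fallback for every conforming type: rebuild, then assign to self.
    mutating func removeSpaces() {
        self = Self(filter { $0 != " " })
    }
}

extension TextLike where Self: RangeReplaceableCollection {
    // More specialized overload: edits in place instead of rebuilding.
    mutating func removeSpaces() {
        removeAll(where: { $0 == " " })
    }
}

var s = "a b c"
s.removeSpaces()          // String: uses the RangeReplaceableCollection overload
var f = FrozenText("x y")
f.removeSpaces()          // FrozenText: uses the assign-to-self fallback
```

Note that this dispatch is static: from inside a generic `TextLike` context only the fallback is visible, which is why the standard library sometimes makes such methods protocol requirements instead.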
> 
> Nobody likes proliferation, but in this case it'd be because there genuinely 
> were additional semantics that were only available on mutable strings.
> 
> (Once upon a time, I think I requested the ability to write `func index(of 
> elem: Iterator.Element) -> Index? where Iterator.Element: Equatable`. Could 
> such a feature be used for this? `func apply(_ transform: StringTransform, 
> reverse: Bool) where Self: RangeReplaceableCollection`?)
> 
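For what it's worth, later Swift versions (5.3 and later, via SE-0267, if I recall the proposal number correctly) accept a `where` clause directly on a member declared in a protocol extension, which is close to what Brent asks for here. A small sketch, again assuming a hypothetical `TextLike` protocol and a hypothetical `CharacterTransform` type standing in for `StringTransform`:

```swift
// Hypothetical stand-in for the proposed string protocol.
protocol TextLike: Collection where Element == Character {}
extension String: TextLike {}

// Hypothetical transform: maps each character to a replacement.
struct CharacterTransform {
    let apply: (Character) -> Character
}

extension TextLike {
    // Non-mutating form: available on every TextLike type.
    func applying(_ transform: CharacterTransform) -> [Character] {
        return map(transform.apply)
    }

    // Member-level constraint: only callable when Self is also
    // a RangeReplaceableCollection (needs Swift 5.3 or later).
    mutating func applyTransform(_ transform: CharacterTransform)
        where Self: RangeReplaceableCollection {
        let transformed = map(transform.apply)
        replaceSubrange(startIndex..<endIndex, with: transformed)
    }
}

let dashesForSpaces = CharacterTransform { $0 == " " ? "-" : $0 }
var title = "string revision proposal"
title.applyTransform(dashesForSpaces)
```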
>>>> The C string interop methods will be updated to those described here: a 
>>>> single withCString operation and two init(cString:) constructors, one for 
>>>> UTF8 and one for arbitrary encodings.
>>> 
>>> Sorry if I'm repeating something that was already discussed, but is there a 
>>> reason you don't include a `withCString` variant for arbitrary encodings? 
>>> It seems like an odd asymmetry.
>> 
>> Hmm. Is this a common use-case people have? Symmetry for the sake of it 
>> doesn’t seem enough. If uncommon, you can do it via an Array that you 
>> nul-terminate manually.
> 
> Is `init(cString:encoding:)` a common use case? If it is, I'm not sure why 
> the opposite wouldn't be.
> 

This + another use case has convinced me that yes, we should have a matching 
withCString version.
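The workaround mentioned above (an Array you NUL-terminate yourself) can be written generically over an encoding. A minimal sketch, assuming the `Unicode.Encoding` protocol with its `encode(_:)` and `encodedReplacementCharacter` members as they exist in current Swift; `withNulTerminatedCodeUnits` is a hypothetical name:

```swift
// Hypothetical helper: encodes `string` in `encoding`, appends a NUL code
// unit, and passes `body` a pointer to the result, in the spirit of withCString.
func withNulTerminatedCodeUnits<E: Unicode.Encoding, Result>(
    _ string: String,
    as encoding: E.Type,
    _ body: (UnsafePointer<E.CodeUnit>) throws -> Result
) rethrows -> Result {
    var units: [E.CodeUnit] = []
    for scalar in string.unicodeScalars {
        // encode(_:) returns nil if the target encoding cannot represent
        // the scalar; substitute the encoding's replacement character.
        let encoded = E.encode(scalar) ?? E.encodedReplacementCharacter
        units.append(contentsOf: encoded)
    }
    units.append(0) // NUL terminator
    return try units.withUnsafeBufferPointer { buffer in
        try body(buffer.baseAddress!) // never nil: the array holds at least the NUL
    }
}
```

String in Swift 4 and later also ships `withCString(encodedAs:_:)`, which covers this case without the manual array.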

>> Yeah, it’s tempting to make ParseResult general; the only reason we held 
>> off is that we don’t want the work of making it generally useful to become 
>> a distraction.
> 
> Understandable.
> 
> I wonder if some part of the parsing algorithm could somehow be generalized 
> so it was suitable for many purposes and then put on `Collection`, with the 
> `UnicodeEncoding` then being passed as a parameter to it. If so, that would 
> justify making `ParseResult` a top-level type.
> 
>> Ah, yes. Here it is:
>> 
>> public protocol EncodedScalarProtocol : RandomAccessCollection {
>>  init?(_ scalarValue: UnicodeScalar)
>>  var utf8: UTF8.EncodedScalar { get }
>>  var utf16: UTF16.EncodedScalar { get }
>>  var utf32: UTF32.EncodedScalar { get }
>> }
> 
> What is the `Element` type expected to be here?
> 
> I think what's missing is a holistic overview of the encoding system. So, 
> please help me write this function:
> 
>       func unicodeScalars<Encoding: UnicodeEncoding>(in data: Data, using encoding: Encoding.Type) -> [UnicodeScalar] {
>               var scalars: [UnicodeScalar] = []
>
>               data.withUnsafeBytes { (bytes: UnsafePointer<$ParseInputElement>) in
>                       let buffer = UnsafeBufferPointer(start: bytes, count: data.count / MemoryLayout<$ParseInputElement>.size)
>                       encoding.parseForward(buffer) { encodedScalar in
>                               let unicodeScalar: UnicodeScalar = $doSomething(encodedScalar)
>                               scalars.append(unicodeScalar)
>                       }
>               }
>
>               return scalars
>       }
> 
> What type would I put for $ParseInputElement? What function or initializer do 
> I call for $doSomething?
> 

Will come back on this.
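The question is left open here. In the API as it later shipped (Swift 4 and later), the answer appears to be that `$ParseInputElement` is `Encoding.CodeUnit` and `$doSomething` is `Encoding.decode(_:)`; an `EncodedScalar`'s elements are likewise the encoding's code units. A sketch assuming that shipped API (`Unicode.Encoding`, its `ForwardParser`, and `Unicode.ParseResult`) rather than the `parseForward` spelling in this thread, and taking a sequence of code units instead of `Data` to stay Foundation-free:

```swift
// Decodes `codeUnits` in `encoding` into scalars, repairing ill-formed
// sequences with U+FFFD.
func unicodeScalars<Encoding: Unicode.Encoding, Input: Sequence>(
    in codeUnits: Input,
    using encoding: Encoding.Type
) -> [Unicode.Scalar] where Input.Element == Encoding.CodeUnit {
    let replacement: Unicode.Scalar = "\u{FFFD}"
    var scalars: [Unicode.Scalar] = []
    var parser = Encoding.ForwardParser()
    var iterator = codeUnits.makeIterator()
    parsing: while true {
        switch parser.parseScalar(from: &iterator) {
        case .valid(let encodedScalar):
            // encodedScalar is a collection of Encoding.CodeUnit values.
            scalars.append(Encoding.decode(encodedScalar))
        case .error:
            scalars.append(replacement)
        case .emptyInput:
            break parsing
        }
    }
    return scalars
}
```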

>>>> @discardableResult
>>>> public static func parseForward<C: Collection>(
>>>>   _ input: C,
>>>>   repairingIllFormedSequences makeRepairs: Bool = true,
>>>>   into output: (EncodedScalar) throws->Void
>>>> ) rethrows -> (remainder: C.SubSequence, errorCount: Int)
>>> 
>>> Are there constraints missing on `parseForward`?
>>> 
>> 
>> Yep – see the note that appears a little later. They’re really 
>> implementation details – so not something to capture in the proposal – which 
>> may or may not be needed depending on whether this lands before or after the 
>> generics features that make them redundant.
> 
> No, I mean because this says nothing about `C`'s element type. Presumably you 
> can't parse a bunch of `UIView`s into Unicode scalars, so there must be some 
> kind of constraint on the collection's elements. What is it?
> 
> ...oh, I notice that `parseScalarForward(_:knownCount:)` has the clause 
> `where C.Iterator.Element == EncodedScalar.Iterator.Element` attached. Should 
> that also be attached to `parseForward(_:repairingIllFormedSequences:into:)`?
> 
>>> What do these do if `makeRepairs` is false? Would it be clearer if we made 
>>> an enum that described the behaviors and changed the label to something 
>>> like `ifIllFormed:`?
>> 
>> The Unicode standard specifies values to substitute when making repairs.
> 
> I'm asking what happens if you *don't* want to make repairs. Does it, say, 
> stop immediately, returning an `errorCount` of `1` and a `remainder` that 
> starts at the site of the error? If so, would we better off having that 
> parameter be something like `ifIllFormed: .stop` or `ifIllFormed: .repair`, 
> rather than `repairingIllFormedSequences: false` or 
> `repairingIllFormedSequences: true`?
> 

The idea is, if you don’t want to make repairs, you use the transcoding 
primitives instead. The belief is that the old non-repairing versions (return 
nil if repairs needed) weren’t useful.
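To make the constraint question concrete: a `parseForward`-shaped function needs the collection's elements to be the encoding's code units, which is what rules out parsing a collection of `UIView`s. The sketch below is a hypothetical reconstruction, not the proposal's implementation, built on current Swift's `Unicode.Encoding` parser API. Following the answer above, it always repairs, passing the encoding's replacement character to `output` and counting the errors.

```swift
// Hypothetical parseForward with the element constraint spelled out.
// Callers who want to stop on errors would use transcoding instead.
@discardableResult
func parseForward<E: Unicode.Encoding, C: Collection>(
    _ input: C,
    as encoding: E.Type,
    into output: (E.EncodedScalar) throws -> Void
) rethrows -> (remainder: C.SubSequence, errorCount: Int)
    where C.Element == E.CodeUnit {
    var parser = E.ForwardParser()
    var iterator = input.makeIterator()
    var errorCount = 0
    parsing: while true {
        switch parser.parseScalar(from: &iterator) {
        case .valid(let scalar):
            try output(scalar)
        case .error:
            errorCount += 1
            try output(E.encodedReplacementCharacter)
        case .emptyInput:
            break parsing
        }
    }
    // Repairing consumes everything, so the remainder is always empty.
    return (input[input.endIndex...], errorCount)
}
```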

>>>> Due to the change in internal implementation, this means that these 
>>>> operations will be O(n) rather than O(1). This is not expected to be a 
>>>> major concern, based on experiences from a similar change made to Java, 
>>>> but projects will be able to work around performance issues without 
>>>> upgrading to Swift 4 by explicitly typing slices as Substring, which will 
>>>> call the Swift 4 variant, and which will be available but not invoked by 
>>>> default in Swift 3 mode.
>>> 
>>> Will there be a way to make this also work with a real Swift 3 compiler? 
>>> For instance, can you define `typealias Substring = String` in such a way 
>>> that real Swift 3 will parse and use it, but Swift 4 in Swift 3 mode will 
>>> ignore it?
>> 
>> Are you talking about this as a way for people to change their code, while 
>> still being able to compile their code with the old compiler? Yes, that 
>> might be a good strategy, will think about that.
> 
> Yes, that's what I'm talking about.
> 
> I guess the actual question is, does `#if swift(>=4)` come out as `true` for 
> Swift 4 in Swift 3 mode? If not, is there some way to detect that you're 
> using Swift 4 in Swift 3 mode? (I suppose one answer is "yes, Swift 4 in 
> Swift 3 mode is called Swift 3.2"; I just haven't heard anyone mention 
> anything like that yet.) In either case, if there's some way to distinguish, 
> you could say:
> 
>       #if thisIsRealSwift3NotSwift4PretendingToBeSwift3()
>       typealias Substring = String
>       #endif
> 
> And then you could write the rest of your code using `Substring` and it would 
> compile using both Swift 3 and Swift 4 toolchains, never forcing an implicit 
> copy.
> 

Ah right. Unfortunately as things are currently envisioned, this won’t work – 
you won’t be able to distinguish “true” Swift 3 from Swift 3 compatibility mode.
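For reference, the migration workaround discussed above (typing slices explicitly as `Substring`) looks like this in Swift 4 and later; `firstWord(of:)` is a hypothetical helper. The slice shares storage with the original string, and the O(n) copy only happens when you explicitly convert back to `String`.

```swift
// Returns the first space-delimited word without copying: the Substring
// shares storage with `line`.
func firstWord(of line: String) -> Substring {
    let end = line.firstIndex(of: " ") ?? line.endIndex
    return line[line.startIndex..<end]
}

let line = "String revision proposal"
let word = firstWord(of: line)   // O(1): a view into `line`
let copy = String(word)          // O(n): an explicit, independent copy
```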

> -- 
> Brent Royal-Gordon
> Architechies
>

_______________________________________________
swift-evolution mailing list
[email protected]
https://lists.swift.org/mailman/listinfo/swift-evolution
