Re: [swift-evolution] [Pitch] String revision proposal #1

Brent Royal-Gordon via swift-evolution Thu, 30 Mar 2017 18:52:45 -0700

> On Mar 30, 2017, at 2:36 PM, Ben Cohen wrote:
> 
> The big win for Unicode is it is short. We want to encourage people to write 
> their extensions on this protocol. We want people who previously extended 
> String to feel very comfortable extending Unicode. It also helps emphasize how 
> important the Unicode-ness of Swift.String is. I like the idea of 
> Unicode.Collection, but it is a little intimidating and making it even a tiny 
> bit intimidating is worrying to me from an adoption perspective.


Yeah, I understand why "Collection" might be intimidating. But I think 
"Unicode" would be too—it's opaque enough that people wouldn't be entirely sure 
whether they were extending the right thing.

I did a quick run-through of different languages and the 
protocols/interfaces/whatever their string types conform to, but most don't 
seem to have anything that abstracts string types. The only similar things I 
could find were `CharSequence` in Java, `StringLike` in Scala...and `Stringy` 
in Perl 6. And I'm sure you thought you were joking!

Honestly, I'd recommend just going with `StringProtocol` unless you can come up 
with an adjective form you like (`Stringlike`? `Textual`?). It's a bit clumsy, 
but it's crystal clear. Stupid name, but you'll never forget it.

>> I'm a little worried about this because it seems to imply that the protocol 
>> cannot include any mutation operations that aren't in 
>> `RangeReplaceableCollection`. For instance, it won't be possible to include 
>> an in-place `applyTransform` method in the protocol. Do you anticipate that 
>> being an issue? Might it be a good idea to define a parallel `Mutable` or 
>> `RangeReplaceable` protocol?
>> 
> 
> You can always assign to self. Then provide more efficient implementations 
> where RangeReplaceableCollection. We do this elsewhere in the std lib with 
> collections e.g. 
> <https://github.com/apple/swift/blob/master/stdlib/public/core/Collection.swift#L1277>.
> 
> Proliferating protocol combinations is problematic (looking at you, 
> BidirectionalMutableRandomAccessSlice).

Nobody likes proliferation, but in this case it'd be because there genuinely 
were additional semantics that were only available on mutable strings.

(Once upon a time, I think I requested the ability to write `func index(of 
elem: Iterator.Element) -> Index? where Iterator.Element: Equatable`. Could 
such a feature be used for this? `func apply(_ transform: StringTransform, 
reverse: Bool) where Self: RangeReplaceableCollection`?)
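A minimal sketch of the "assign to self by default, do better where `RangeReplaceableCollection`" pattern, written with constrained protocol extensions rather than a per-method `where` clause. `TextProtocol`, `FrozenText`, and `removeSpaces()` are hypothetical names made up for illustration; none of them come from the proposal.

```swift
// Hypothetical string-like protocol; not the one in the proposal.
protocol TextProtocol: Collection where Element == Character {
    init(characters: [Character])
}

extension TextProtocol {
    // Default: build a filtered copy and assign it to self.
    mutating func removeSpaces() {
        self = Self(characters: filter { $0 != " " })
    }
}

extension TextProtocol where Self: RangeReplaceableCollection {
    // More constrained overload: mutate the storage in place.
    mutating func removeSpaces() {
        removeAll { $0 == " " }
    }
}

// A type that is not range-replaceable gets the default.
struct FrozenText: TextProtocol {
    private var storage: [Character]
    init(characters: [Character]) { storage = characters }
    var startIndex: Int { return storage.startIndex }
    var endIndex: Int { return storage.endIndex }
    subscript(i: Int) -> Character { return storage[i] }
    func index(after i: Int) -> Int { return i + 1 }
}

// String is range-replaceable, so it picks up the in-place version.
extension String: TextProtocol {
    init(characters: [Character]) { self.init(characters) }
}
```

Both call sites read the same (`value.removeSpaces()`); overload resolution picks the more constrained extension when it applies.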

>>> The C string interop methods will be updated to those described here: a 
>>> single withCString operation and two init(cString:) constructors, one for 
>>> UTF8 and one for arbitrary encodings.
>> 
>> Sorry if I'm repeating something that was already discussed, but is there a 
>> reason you don't include a `withCString` variant for arbitrary encodings? It 
>> seems like an odd asymmetry.
> 
> Hmm. Is this a common use-case people have? Symmetry for the sake of it 
> doesn’t seem enough. If uncommon, you can do it via an Array that you 
> nul-terminate manually.

Is `init(cString:encoding:)` a common use case? If it is, I'm not sure why the 
opposite wouldn't be.
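For reference, the manual workaround suggested in the quoted reply might look roughly like this for UTF-16. `withNulTerminatedUTF16` is a hypothetical helper name, and the sketch assumes the string contains no embedded nul (which would cut the C string short).

```swift
// Hypothetical helper: hand a nul-terminated UTF-16 buffer to `body`.
func withNulTerminatedUTF16<Result>(
    _ string: String,
    _ body: (UnsafePointer<UInt16>) throws -> Result
) rethrows -> Result {
    var units = Array(string.utf16)
    units.append(0)  // manual nul terminator
    return try units.withUnsafeBufferPointer { buffer in
        // The buffer holds at least the terminator, so baseAddress is non-nil.
        try body(buffer.baseAddress!)
    }
}

// Usage: count code units up to the terminator, as a C API would.
let unitCount = withNulTerminatedUTF16("hello") { pointer -> Int in
    var n = 0
    while pointer[n] != 0 { n += 1 }
    return n
}
```

The pointer is only valid inside the closure, the same restriction `withCString` has.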

> Yeah, it’s tempting to make ParseResult general, and the only reason we held 
> off is because we don’t want making sure it’s generally useful to be a 
> distraction.

Understandable.

I wonder if some part of the parsing algorithm could somehow be generalized so 
it was suitable for many purposes and then put on `Collection`, with the 
`UnicodeEncoding` then being passed as a parameter to it. If so, that would 
justify making `ParseResult` a top-level type.

> Ah, yes. Here it is:
> 
> public protocol EncodedScalarProtocol : RandomAccessCollection {
>  init?(_ scalarValue: UnicodeScalar)
>  var utf8: UTF8.EncodedScalar { get }
>  var utf16: UTF16.EncodedScalar { get }
>  var utf32: UTF32.EncodedScalar { get }
> }

What is the `Element` type expected to be here?

I think what's missing is a holistic overview of the encoding system. So, 
please help me write this function:

        func unicodeScalars<Encoding: UnicodeEncoding>(in data: Data, using encoding: Encoding.Type) -> [UnicodeScalar] {
                var scalars: [UnicodeScalar] = []
                
                data.withUnsafeBytes { (bytes: UnsafePointer<$ParseInputElement>) in
                        let buffer = UnsafeBufferPointer(start: bytes, count: data.count / MemoryLayout<$ParseInputElement>.size)
                        encoding.parseForward(buffer) { encodedScalar in
                                let unicodeScalar: UnicodeScalar = $doSomething(encodedScalar)
                                scalars.append(unicodeScalar)
                        }
                }
                
                return scalars
        }

What type would I put for $ParseInputElement? What function or initializer do I 
call for $doSomething?
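For comparison, here is a sketch of the same function written against the `UnicodeCodec` API that already exists in the standard library, not the proposed `parseForward`. It takes a sequence of code units instead of `Data` so that it needs no Foundation import, and it substitutes U+FFFD on errors; both choices are assumptions of the sketch, not anything the proposal specifies.

```swift
func unicodeScalars<Codec: UnicodeCodec, S: Sequence>(
    in codeUnits: S,
    using codec: Codec.Type
) -> [Unicode.Scalar] where S.Element == Codec.CodeUnit {
    var scalars: [Unicode.Scalar] = []
    var decoder = Codec()
    var iterator = codeUnits.makeIterator()

    decoding: while true {
        switch decoder.decode(&iterator) {
        case .scalarValue(let scalar):
            scalars.append(scalar)
        case .emptyInput:
            break decoding          // no more code units
        case .error:
            scalars.append("\u{FFFD}")  // repair with the replacement character
        }
    }
    return scalars
}

// Usage: decode two ASCII bytes as UTF-8.
let decoded = unicodeScalars(in: [0x48, 0x69] as [UInt8], using: UTF8.self)
```

Here the element-type question is answered by the `where S.Element == Codec.CodeUnit` clause, and the decoder yields `Unicode.Scalar` values directly, so there is no separate conversion step.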

>>> @discardableResult
>>> public static func parseForward<C: Collection>(
>>>   _ input: C,
>>>   repairingIllFormedSequences makeRepairs: Bool = true,
>>>   into output: (EncodedScalar) throws->Void
>>> ) rethrows -> (remainder: C.SubSequence, errorCount: Int)
>> 
>> Are there constraints missing on `parseForward`?
>> 
> 
> Yep – see the note that appears a little later. They’re really implementation 
> details – so not something to capture in the proposal – which may or may not 
> be needed depending on whether this lands before or after the generics 
> features that make them redundant.

No, I mean because this says nothing about `C`'s element type. Presumably you 
can't parse a bunch of `UIView`s into Unicode scalars, so there must be some 
kind of constraint on the collection's elements. What is it?

...oh, I notice that `parseScalarForward(_:knownCount:)` has the clause `where 
C.Iterator.Element == EncodedScalar.Iterator.Element` attached. Should that 
also be attached to `parseForward(_:repairingIllFormedSequences:into:)`?

>> What do these do if `makeRepairs` is false? Would it be clearer if we made 
>> an enum that described the behaviors and changed the label to something like 
>> `ifIllFormed:`?
> 
> The Unicode standard specifies values to substitute when making repairs.

I'm asking what happens if you *don't* want to make repairs. Does it, say, stop 
immediately, returning an `errorCount` of `1` and a `remainder` that starts at 
the site of the error? If so, would we be better off having that parameter be 
something like `ifIllFormed: .stop` or `ifIllFormed: .repair`, rather than 
`repairingIllFormedSequences: false` or `repairingIllFormedSequences: true`?
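To make the enum idea concrete, here is a rough sketch built on the existing `UTF8` codec. `IllFormedPolicy` and `decodeUTF8(_:ifIllFormed:)` are hypothetical names, and the "stop at the first error" behavior is only a guess at what not repairing would mean. Unlike the proposed API, it does not return a remainder.

```swift
// Hypothetical policy enum replacing the Bool parameter.
enum IllFormedPolicy {
    case stop    // give up at the first ill-formed sequence
    case repair  // substitute U+FFFD and keep going
}

func decodeUTF8(
    _ bytes: [UInt8],
    ifIllFormed policy: IllFormedPolicy
) -> (scalars: [Unicode.Scalar], errorCount: Int) {
    var scalars: [Unicode.Scalar] = []
    var errorCount = 0
    var decoder = UTF8()
    var iterator = bytes.makeIterator()

    decoding: while true {
        switch decoder.decode(&iterator) {
        case .scalarValue(let scalar):
            scalars.append(scalar)
        case .emptyInput:
            break decoding
        case .error:
            errorCount += 1
            switch policy {
            case .stop:
                break decoding
            case .repair:
                scalars.append("\u{FFFD}")
            }
        }
    }
    return (scalars, errorCount)
}
```

At the call site, `decodeUTF8(bytes, ifIllFormed: .stop)` says what happens on bad input without the reader having to know what `false` means.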

>>> Due to the change in internal implementation, this means that these 
>>> operations will be O(n) rather than O(1). This is not expected to be a 
>>> major concern, based on experiences from a similar change made to Java, but 
>>> projects will be able to work around performance issues without upgrading 
>>> to Swift 4 by explicitly typing slices as Substring, which will call the 
>>> Swift 4 variant, and which will be available but not invoked by default in 
>>> Swift 3 mode.
>> 
>> Will there be a way to make this also work with a real Swift 3 compiler? For 
>> instance, can you define `typealias Substring = String` in such a way that 
>> real Swift 3 will parse and use it, but Swift 4 in Swift 3 mode will ignore 
>> it?
> 
> Are you talking about this as a way for people to change their code, while 
> still being able to compile their code with the old compiler? Yes, that might 
> be a good strategy, will think about that.

Yes, that's what I'm talking about.

I guess the actual question is, does `#if swift(>=4)` come out as `true` for 
Swift 4 in Swift 3 mode? If not, is there some way to detect that you're using 
Swift 4 in Swift 3 mode? (I suppose one answer is "yes, Swift 4 in Swift 3 mode 
is called Swift 3.2"; I just haven't heard anyone mention anything like that 
yet.) In either case, if there's some way to distinguish, you could say:

        #if thisIsRealSwift3NotSwift4PretendingToBeSwift3()
        typealias Substring = String
        #endif

And then you could write the rest of your code using `Substring` and it would 
compile using both Swift 3 and Swift 4 toolchains, never forcing an implicit 
copy.
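As a sketch, assuming the guess above holds and the Swift 4 compiler reports itself as 3.2 when running in Swift 3 mode, the placeholder condition could be spelled with a plain version check. `utf16Length(of:)` is a hypothetical function, there only to show a declaration written in terms of `Substring`.

```swift
// Assumption: only a real Swift 3 toolchain fails the `swift(>=3.2)` check.
#if !swift(>=3.2)
typealias Substring = String
#endif

// Code written against Substring compiles either way.
func utf16Length(of slice: Substring) -> Int {
    return slice.utf16.count
}

let slice = Substring("hello world")
let length = utf16Length(of: slice)
```

On a toolchain where the standard library already defines `Substring`, the `#if` block is skipped and the real type is used.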

-- 
Brent Royal-Gordon
Architechies
