I don't have many non-nitpick issues that I care much about; I'm in favor of
this.
My only request: it's currently painful to create a String from a fixed-size C
array. For instance, if I have a pointer to a `struct foo { char name[16]; }`
in Swift where the last character doesn't have to be a NUL, it's hard to create
a String from it. Real-world examples of this are Mach-O LC_SEGMENT and
LC_SEGMENT_64 commands.
The generally accepted wisdom <http://stackoverflow.com/a/27456220/251153> is
that you take a pointer to the CChar tuple that represents the fixed-size
array, but this still requires the string to be NUL-terminated. What do we
think of an additional init(cString:) overload that takes an
UnsafeBufferPointer and reads up to the first NUL or the end of the buffer,
whichever comes first?
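To make the request concrete, here is a minimal sketch of what such an overload could look like. It is not part of the proposal: the signature is my assumption, it only handles UTF-8, and `fixedSizeName` is a hypothetical helper that stands in for reading a `char name[16]` field.

```swift
extension String {
    /// Hypothetical overload (not in the proposal): decodes UTF-8 from
    /// `buffer`, stopping at the first NUL or at the end of the buffer,
    /// whichever comes first.
    init(cString buffer: UnsafeBufferPointer<CChar>) {
        // Keep only the code units before the first NUL, or all of them
        // if the buffer contains no NUL at all.
        let codeUnits = buffer
            .prefix(while: { $0 != 0 })
            .map { UInt8(truncatingIfNeeded: $0) }
        // Ill-formed UTF-8 is repaired with U+FFFD.
        self = String(decoding: codeUnits, as: UTF8.self)
    }
}

/// Hypothetical helper: treats `field` as the contents of a fixed-size
/// C character array that may or may not be NUL-terminated.
func fixedSizeName(_ field: [CChar]) -> String {
    return field.withUnsafeBufferPointer { String(cString: $0) }
}
```

With this, a 16-byte segment name that uses all 16 bytes and a shorter, NUL-padded one both decode without reading past the end of the field.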
> On March 30, 2017, at 02:48, Brent Royal-Gordon via swift-evolution
> <[email protected]> wrote:
>
>> On Mar 29, 2017, at 5:32 PM, Ben Cohen via swift-evolution
>> <[email protected]> wrote:
>>
>> Hi Swift Evolution,
>>
>> Below is a pitch for the first part of the String revision. This covers a
>> number of changes that would allow the basic internals to be overhauled.
>>
>> Online version here:
>> https://github.com/airspeedswift/swift-evolution/blob/3a822c799011ace682712532cfabfe32e9203fbb/proposals/0161-StringRevision1.md
>
> Really great stuff, guys. Thanks for your work on this!
>
>> In order to be able to write extensions across both String and Substring, a
>> new Unicode protocol to which the two types will conform will be introduced.
>> For the purposes of this proposal, Unicode will be defined as a protocol to
>> be used whenever you would previously extend String. It should be possible to
>> substitute extension Unicode { ... } in Swift 4 wherever extension String {
>> ... } was written in Swift 3, with one exception: any passing of self into
>> an API that takes a concrete String will need to be rewritten as
>> String(self). If Self is a String then this should effectively optimize to a
>> no-op, whereas if Self is a Substring then this will force a copy, helping
>> to avoid the “memory leak” problems described above.
>
> I continue to feel that `Unicode` is the wrong name for this protocol,
> essentially because it sounds like a protocol for, say, a version of Unicode
> or some kind of encoding machinery instead of a Unicode string. I won't
> rehash that argument since I made it already in the manifesto thread, but I
> would like to make a couple new suggestions in this area.
>
> Later on, you note that it would be nice to namespace many of these types:
>
>> Several of the types related to String, such as the encodings, would ideally
>> reside inside a namespace rather than live at the top level of the standard
>> library. The best namespace for this is probably Unicode, but this is also
>> the name of the protocol. At some point if we gain the ability to nest enums
>> and types inside protocols, they should be moved there. Putting them inside
>> String or some other enum namespace is probably not worthwhile in the
>> mean-time.
>
> Perhaps we should use an empty enum to create a `Unicode` namespace and then
> nest the protocol within it via typealias. If we do that, we can consider
> names like `Unicode.Collection` or even `Unicode.String` which would shadow
> existing types if they were top-level.
>
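A minimal sketch of the empty-enum namespace idea, under stated assumptions: the names `_UnicodeStringProtocol`, `UnicodeNS`, and `occurrences(of:in:)` are all hypothetical, and the protocol has no requirements of its own. The enum is called `UnicodeNS` here only so the example does not depend on whether the standard library already defines a `Unicode` type.

```swift
// Hypothetical top-level protocol with an underscored name; users would
// normally spell it through the namespace below instead.
protocol _UnicodeStringProtocol: BidirectionalCollection
    where Element == Character {}

// A caseless enum cannot be instantiated, so it acts purely as a namespace.
enum UnicodeNS {
    // Nest the protocol in the namespace via a typealias.
    typealias Collection = _UnicodeStringProtocol
}

// Both string types conform through the namespaced name.
extension String: UnicodeNS.Collection {}
extension Substring: UnicodeNS.Collection {}

// Generic code can constrain on the namespaced name as well.
func occurrences<S: UnicodeNS.Collection>(of character: Character,
                                          in text: S) -> Int {
    return text.filter { $0 == character }.count
}
```

The same function then accepts a `String` or a slice of one, which is the kind of code the shared protocol is meant to enable.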
> If not, then given this:
>
>> The exact nature of the protocol – such as which methods should be protocol
>> requirements vs which can be implemented as protocol extensions, are
>> considered implementation details and so not covered in this proposal.
>
> We may simply want to wait to choose a name. As the protocol develops, we may
> discover a theme in its requirements which would suggest a good name. For
> instance, we may realize that the core of what the protocol abstracts is
> grouping code units into characters, which might suggest a name like
> `Characters`, or `Unicode.Characters`, or `CharacterCollection`, or
> what-have-you.
>
> (By the way, I hope that the eventual protocol requirements will be put
> through the review process, if only as an amendment, once they're determined.)
>
>> Unicode will conform to BidirectionalCollection. RangeReplaceableCollection
>> conformance will be added directly onto the String and Substring types, as
>> it is possible future Unicode-conforming types might not be
>> range-replaceable (e.g. an immutable type that wraps a const char *).
>
> I'm a little worried about this because it seems to imply that the protocol
> cannot include any mutation operations that aren't in
> `RangeReplaceableCollection`. For instance, it won't be possible to include
> an in-place `applyTransform` method in the protocol. Do you anticipate that
> being an issue? Might it be a good idea to define a parallel `Mutable` or
> `RangeReplaceable` protocol?
>
>> The C string interop methods will be updated to those described here: a
>> single withCString operation and two init(cString:) constructors, one for
>> UTF8 and one for arbitrary encodings.
>
> Sorry if I'm repeating something that was already discussed, but is there a
> reason you don't include a `withCString` variant for arbitrary encodings? It
> seems like an odd asymmetry.
>
>> The standard library currently lacks a Latin1 codec, so an enum Latin1:
>> UnicodeEncoding type will be added.
>
> Nice. I wrote one of those once; I'll enjoy deleting it.
>
>> A new protocol, UnicodeEncoding, will be added to replace the current
>> UnicodeCodec protocol:
>>
>> public enum UnicodeParseResult<T, Index> {
>
> Either `T` should be given a more specific name, or the enum should be given
> a less specific one, becoming `ParseResult` and being oriented towards
> incremental parsing of anything from any kind of collection.
>
>> /// Indicates valid input was recognized.
>> ///
>> /// `resumptionPoint` is the end of the parsed region
>> case valid(T, resumptionPoint: Index) // FIXME: should these be reordered?
>
> No, I think this is the right order. The thing that's valid is the code point.
>
>> /// Indicates invalid input was recognized.
>> ///
>> /// `resumptionPoint` is the next position at which to continue parsing after
>> /// the invalid input is repaired.
>> case error(resumptionPoint: Index)
>
> I know this is abbreviated documentation, but I hope the full version
> includes a good usage example demonstrating, among other things, how to
> detect partial characters and defer processing of them instead of rejecting
> them as erroneous.
>
>> /// An encoding for text with UnicodeScalar as a common currency type
>> public protocol UnicodeEncoding {
>> /// The maximum number of code units in an encoded unicode scalar value
>> static var maxLengthOfEncodedScalar: Int { get }
>>
>> /// A type that can represent a single UnicodeScalar as it is encoded in
>> /// this encoding.
>> associatedtype EncodedScalar : EncodedScalarProtocol
>
> There's an `EncodedScalarProtocol`-shaped hole in this proposal. What does it
> do? What are its semantics? How does `EncodedScalar` relate to the old
> `CodeUnit`?
>
>> @discardableResult
>> public static func parseForward<C: Collection>(
>> _ input: C,
>> repairingIllFormedSequences makeRepairs: Bool = true,
>> into output: (EncodedScalar) throws->Void
>> ) rethrows -> (remainder: C.SubSequence, errorCount: Int)
>>
>> @discardableResult
>> public static func parseReverse<C: BidirectionalCollection>(
>> _ input: C,
>> repairingIllFormedSequences makeRepairs: Bool = true,
>> into output: (EncodedScalar) throws->Void
>> ) rethrows -> (remainder: C.SubSequence, errorCount: Int)
>> where C.SubSequence : BidirectionalCollection,
>> C.SubSequence.SubSequence == C.SubSequence,
>> C.SubSequence.Iterator.Element == EncodedScalar.Iterator.Element
>> }
>
> Are there constraints missing on `parseForward`?
>
> What do these do if `makeRepairs` is false? Would it be clearer if we made an
> enum that described the behaviors and changed the label to something like
> `ifIllFormed:`?
>
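To illustrate the enum-instead-of-Bool suggestion, here is a rough sketch built on the existing `UTF8` codec's `decode(_:)` rather than on the proposed `parseForward`. The enum `IllFormedSequencePolicy`, its two cases, and the function `decodeUTF8(_:ifIllFormed:)` are all hypothetical. In particular, "stop at the first error" is only a guess at what `makeRepairs: false` might mean.

```swift
/// Hypothetical policy enum; neither the name nor the cases come from
/// the proposal.
enum IllFormedSequencePolicy {
    /// Substitute U+FFFD for each ill-formed sequence and keep going.
    case repair
    /// Stop decoding at the first ill-formed sequence.
    case stop
}

/// Decodes UTF-8 bytes into scalars, applying `policy` on ill-formed input.
func decodeUTF8(
    _ bytes: [UInt8],
    ifIllFormed policy: IllFormedSequencePolicy = .repair
) -> (scalars: [Unicode.Scalar], errorCount: Int) {
    var decoder = UTF8()
    var input = bytes.makeIterator()
    var scalars: [Unicode.Scalar] = []
    var errorCount = 0
    decoding: while true {
        switch decoder.decode(&input) {
        case .scalarValue(let scalar):
            scalars.append(scalar)
        case .emptyInput:
            break decoding
        case .error:
            errorCount += 1
            switch policy {
            case .repair:
                scalars.append("\u{FFFD}")
            case .stop:
                break decoding
            }
        }
    }
    return (scalars, errorCount)
}
```

At the call site, `decodeUTF8(bytes, ifIllFormed: .stop)` states the behavior outright, where a bare `false` would leave the reader guessing.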
>> Due to the change in internal implementation, this means that these
>> operations will be O(n) rather than O(1). This is not expected to be a major
>> concern, based on experiences from a similar change made to Java, but
>> projects will be able to work around performance issues without upgrading to
>> Swift 4 by explicitly typing slices as Substring, which will call the Swift
>> 4 variant, and which will be available but not invoked by default in Swift 3
>> mode.
>
> Will there be a way to make this also work with a real Swift 3 compiler? For
> instance, can you define `typealias Substring = String` in such a way that
> real Swift 3 will parse and use it, but Swift 4 in Swift 3 mode will ignore
> it?
>
>> This proposal does not yet introduce an implicit conversion from Substring
>> to String. The decision on whether to add this will be deferred pending
>> feedback on the initial implementation. The intention is to make a preview
>> toolchain available for feedback, including on whether this implicit
>> conversion is necessary, prior to the release of Swift 4.
>
> This is a sensible approach.
>
> Thank you for developing this into a full proposal. I discussed the plans for
> Swift 4 with a local group of programmers recently, and everyone was pleased
> to hear that `String` would get an overhaul, that the `characters` view would
> be integrated into the string, etc. We even talked a little about `Substring`
> and people thought it was a good idea. This proposal is shaping up to impact
> a lot of people, but in a good way!
>
> --
> Brent Royal-Gordon
> Architechies
>
> _______________________________________________
> swift-evolution mailing list
> [email protected]
> https://lists.swift.org/mailman/listinfo/swift-evolution