You could even argue that what we need is a Collection wrapper that turns a pointer plus a terminating sentinel into a Collection… but creating a String from a C string is such a common operation that it deserves a dedicated shorthand. Creation from a non-null-terminated buffer probably doesn’t.
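
To make the pointer + length case concrete, here is a minimal sketch, assuming a Swift 4 or later toolchain where `String(decoding:as:)` is available. `makeString(fromFixedSizeBuffer:)` is a hypothetical helper name used only for illustration, not a standard library or proposed API. It treats the buffer as UTF-8 and reads up to the first NUL or the end of the buffer, whichever comes first.

```swift
// Hypothetical helper (not a standard library API): build a String from a
// fixed-size byte buffer that may or may not contain a terminating NUL.
func makeString(fromFixedSizeBuffer buffer: UnsafeBufferPointer<UInt8>) -> String {
    // Take the code units before the first NUL; if there is no NUL,
    // this is the whole buffer.
    let codeUnits = buffer.prefix(while: { $0 != 0 })
    // Decode as UTF-8. Ill-formed sequences are repaired with U+FFFD.
    return String(decoding: codeUnits, as: UTF8.self)
}

// A 16-byte field, like `char name[16]`, completely filled with no NUL.
let full: [UInt8] = Array("0123456789abcdef".utf8)
let s1 = full.withUnsafeBufferPointer { makeString(fromFixedSizeBuffer: $0) }

// The same field size, holding a short name padded with NULs.
var padded = [UInt8](repeating: 0, count: 16)
padded.replaceSubrange(0..<6, with: Array("__TEXT".utf8))
let s2 = padded.withUnsafeBufferPointer { makeString(fromFixedSizeBuffer: $0) }

print(s1)
print(s2)
```

Nothing here is specific to C interop: the only ingredients are a Collection of code units and an encoding. Buffers imported from C as `CChar` would first need to be viewed as `UInt8`, which this sketch leaves out.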
> On Mar 31, 2017, at 8:03 AM, Ben Cohen <[email protected]> wrote:
>
> When you have a pointer and a length, you can create a fully functional
> Collection using UnsafeBufferPointer. This means you don't need something
> that’s C interop-specific any more – just the ability to create a String from
> a Collection of code units of some encoding.
>
> We’ll add something to the proposal making it clear this will be possible.
>
>> On Mar 31, 2017, at 4:01 AM, Jean-Daniel via swift-evolution
>> <[email protected]> wrote:
>>
>> I’m with you for a C interop API that supports taking a non-null-terminated
>> string. I often work with APIs that support efficient parsing by providing
>> a pointer to a global buffer + length to report parsed strings.
>>
>> Without a way to create a Swift string from buffer + length, interop with
>> such APIs will be difficult for no good reason, as Swift strings don’t even
>> have to be null terminated.
>>
>>> On Mar 30, 2017, at 18:35, Félix Cloutier via swift-evolution
>>> <[email protected]> wrote:
>>>
>>> I don't have many non-nitpick issues that I greatly care about; I'm in
>>> favor of this.
>>>
>>> My only request: it's currently painful to create a String from a
>>> fixed-size C array. For instance, if I have a pointer to a `struct foo {
>>> char name[16]; }` in Swift where the last character doesn't have to be a
>>> NUL, it's hard to create a String from it. Real-world examples of this are
>>> Mach-O LC_SEGMENT and LC_SEGMENT_64 commands.
>>>
>>> The generally-accepted wisdom <http://stackoverflow.com/a/27456220/251153>
>>> is that you take a pointer to the CChar tuple that represents the
>>> fixed-size array, but this still requires the string to be NUL-terminated.
>>> What do we think of an additional init(cString:) overload that takes an
>>> UnsafeBufferPointer and reads up to the first NUL or the end of the buffer,
>>> whichever comes first?
>>>
>>>> On Mar 30, 2017, at 02:48, Brent Royal-Gordon via swift-evolution
>>>> <[email protected]> wrote:
>>>>
>>>>> On Mar 29, 2017, at 5:32 PM, Ben Cohen via swift-evolution
>>>>> <[email protected]> wrote:
>>>>>
>>>>> Hi Swift Evolution,
>>>>>
>>>>> Below is a pitch for the first part of the String revision. This covers a
>>>>> number of changes that would allow the basic internals to be overhauled.
>>>>>
>>>>> Online version here:
>>>>> https://github.com/airspeedswift/swift-evolution/blob/3a822c799011ace682712532cfabfe32e9203fbb/proposals/0161-StringRevision1.md
>>>>
>>>> Really great stuff, guys. Thanks for your work on this!
>>>>
>>>>> In order to be able to write extensions across both String and
>>>>> Substring, a new Unicode protocol to which the two types will conform
>>>>> will be introduced. For the purposes of this proposal, Unicode will be
>>>>> defined as a protocol to be used whenever you would previously extend
>>>>> String. It should be possible to substitute extension Unicode { ... } in
>>>>> Swift 4 wherever extension String { ... } was written in Swift 3, with
>>>>> one exception: any passing of self into an API that takes a concrete
>>>>> String will need to be rewritten as String(self). If Self is a String
>>>>> then this should effectively optimize to a no-op, whereas if Self is a
>>>>> Substring then this will force a copy, helping to avoid the “memory leak”
>>>>> problems described above.
>>>>
>>>> I continue to feel that `Unicode` is the wrong name for this protocol,
>>>> essentially because it sounds like a protocol for, say, a version of
>>>> Unicode or some kind of encoding machinery instead of a Unicode string.
>>>> I won't rehash that argument since I made it already in the manifesto
>>>> thread, but I would like to make a couple new suggestions in this area.
>>>>
>>>> Later on, you note that it would be nice to namespace many of these types:
>>>>
>>>>> Several of the types related to String, such as the encodings, would
>>>>> ideally reside inside a namespace rather than live at the top level of
>>>>> the standard library. The best namespace for this is probably Unicode,
>>>>> but this is also the name of the protocol. At some point if we gain the
>>>>> ability to nest enums and types inside protocols, they should be moved
>>>>> there. Putting them inside String or some other enum namespace is
>>>>> probably not worthwhile in the mean-time.
>>>>
>>>> Perhaps we should use an empty enum to create a `Unicode` namespace and
>>>> then nest the protocol within it via typealias. If we do that, we can
>>>> consider names like `Unicode.Collection` or even `Unicode.String` which
>>>> would shadow existing types if they were top-level.
>>>>
>>>> If not, then given this:
>>>>
>>>>> The exact nature of the protocol – such as which methods should be
>>>>> protocol requirements vs which can be implemented as protocol extensions,
>>>>> are considered implementation details and so not covered in this proposal.
>>>>
>>>> We may simply want to wait to choose a name. As the protocol develops, we
>>>> may discover a theme in its requirements which would suggest a good name.
>>>> For instance, we may realize that the core of what the protocol abstracts
>>>> is grouping code units into characters, which might suggest a name like
>>>> `Characters`, or `Unicode.Characters`, or `CharacterCollection`, or
>>>> what-have-you.
>>>>
>>>> (By the way, I hope that the eventual protocol requirements will be put
>>>> through the review process, if only as an amendment, once they're
>>>> determined.)
>>>>
>>>>> Unicode will conform to BidirectionalCollection.
>>>>> RangeReplaceableCollection conformance will be added directly onto the
>>>>> String and Substring types, as it is possible future Unicode-conforming
>>>>> types might not be range-replaceable (e.g. an immutable type that wraps a
>>>>> const char *).
>>>>
>>>> I'm a little worried about this because it seems to imply that the
>>>> protocol cannot include any mutation operations that aren't in
>>>> `RangeReplaceableCollection`. For instance, it won't be possible to
>>>> include an in-place `applyTransform` method in the protocol. Do you
>>>> anticipate that being an issue? Might it be a good idea to define a
>>>> parallel `Mutable` or `RangeReplaceable` protocol?
>>>>
>>>>> The C string interop methods will be updated to those described here: a
>>>>> single withCString operation and two init(cString:) constructors, one for
>>>>> UTF8 and one for arbitrary encodings.
>>>>
>>>> Sorry if I'm repeating something that was already discussed, but is there
>>>> a reason you don't include a `withCString` variant for arbitrary
>>>> encodings? It seems like an odd asymmetry.
>>>>
>>>>> The standard library currently lacks a Latin1 codec, so an enum Latin1:
>>>>> UnicodeEncoding type will be added.
>>>>
>>>> Nice. I wrote one of those once; I'll enjoy deleting it.
>>>>
>>>>> A new protocol, UnicodeEncoding, will be added to replace the current
>>>>> UnicodeCodec protocol:
>>>>>
>>>>> public enum UnicodeParseResult<T, Index> {
>>>>
>>>> Either `T` should be given a more specific name, or the enum should be
>>>> given a less specific one, becoming `ParseResult` and being oriented
>>>> towards incremental parsing of anything from any kind of collection.
>>>>
>>>>>   /// Indicates valid input was recognized.
>>>>>   ///
>>>>>   /// `resumptionPoint` is the end of the parsed region
>>>>>   case valid(T, resumptionPoint: Index)  // FIXME: should these be reordered?
>>>>
>>>> No, I think this is the right order. The thing that's valid is the code
>>>> point.
>>>>
>>>>>   /// Indicates invalid input was recognized.
>>>>>   ///
>>>>>   /// `resumptionPoint` is the next position at which to continue parsing after
>>>>>   /// the invalid input is repaired.
>>>>>   case error(resumptionPoint: Index)
>>>>
>>>> I know this is abbreviated documentation, but I hope the full version
>>>> includes a good usage example demonstrating, among other things, how to
>>>> detect partial characters and defer processing of them instead of
>>>> rejecting them as erroneous.
>>>>
>>>>> /// An encoding for text with UnicodeScalar as a common currency type
>>>>> public protocol UnicodeEncoding {
>>>>>   /// The maximum number of code units in an encoded unicode scalar value
>>>>>   static var maxLengthOfEncodedScalar: Int { get }
>>>>>
>>>>>   /// A type that can represent a single UnicodeScalar as it is encoded in this
>>>>>   /// encoding.
>>>>>   associatedtype EncodedScalar : EncodedScalarProtocol
>>>>
>>>> There's an `EncodedScalarProtocol`-shaped hole in this proposal. What does
>>>> it do? What are its semantics? How does `EncodedScalar` relate to the old
>>>> `CodeUnit`?
>>>>
>>>>>   @discardableResult
>>>>>   public static func parseForward<C: Collection>(
>>>>>     _ input: C,
>>>>>     repairingIllFormedSequences makeRepairs: Bool = true,
>>>>>     into output: (EncodedScalar) throws->Void
>>>>>   ) rethrows -> (remainder: C.SubSequence, errorCount: Int)
>>>>>
>>>>>   @discardableResult
>>>>>   public static func parseReverse<C: BidirectionalCollection>(
>>>>>     _ input: C,
>>>>>     repairingIllFormedSequences makeRepairs: Bool = true,
>>>>>     into output: (EncodedScalar) throws->Void
>>>>>   ) rethrows -> (remainder: C.SubSequence, errorCount: Int)
>>>>>   where C.SubSequence : BidirectionalCollection,
>>>>>         C.SubSequence.SubSequence == C.SubSequence,
>>>>>         C.SubSequence.Iterator.Element == EncodedScalar.Iterator.Element
>>>>> }
>>>>
>>>> Are there constraints missing on `parseForward`?
>>>>
>>>> What do these do if `makeRepairs` is false?
>>>> Would it be clearer if we made an enum that described the behaviors and
>>>> changed the label to something like `ifIllFormed:`?
>>>>
>>>>> Due to the change in internal implementation, this means that these
>>>>> operations will be O(n) rather than O(1). This is not expected to be a
>>>>> major concern, based on experiences from a similar change made to Java,
>>>>> but projects will be able to work around performance issues without
>>>>> upgrading to Swift 4 by explicitly typing slices as Substring, which will
>>>>> call the Swift 4 variant, and which will be available but not invoked by
>>>>> default in Swift 3 mode.
>>>>
>>>> Will there be a way to make this also work with a real Swift 3 compiler?
>>>> For instance, can you define `typealias Substring = String` in such a way
>>>> that real Swift 3 will parse and use it, but Swift 4 in Swift 3 mode will
>>>> ignore it?
>>>>
>>>>> This proposal does not yet introduce an implicit conversion from
>>>>> Substring to String. The decision on whether to add this will be deferred
>>>>> pending feedback on the initial implementation. The intention is to make
>>>>> a preview toolchain available for feedback, including on whether this
>>>>> implicit conversion is necessary, prior to the release of Swift 4.
>>>>
>>>> This is a sensible approach.
>>>>
>>>> Thank you for developing this into a full proposal. I discussed the plans
>>>> for Swift 4 with a local group of programmers recently, and everyone was
>>>> pleased to hear that `String` would get an overhaul, that the `characters`
>>>> view would be integrated into the string, etc. We even talked a little
>>>> about `Substring` and people thought it was a good idea. This proposal is
>>>> shaping up to impact a lot of people, but in a good way!
>>>>
>>>> --
>>>> Brent Royal-Gordon
>>>> Architechies
>>>>
>>>> _______________________________________________
>>>> swift-evolution mailing list
>>>> [email protected]
>>>> https://lists.swift.org/mailman/listinfo/swift-evolution
