When you have a pointer and a length, you can create a fully functional Collection using UnsafeBufferPointer. This means you no longer need anything C interop-specific, just the ability to create a String from a Collection of code units in some encoding.
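As a rough sketch of how that could look for the fixed-size `char name[16]` case raised in this thread: the code below uses `String(decoding:as:)` as it exists in current Swift, which may not match the API the proposal ends up with, and `makeString(from:count:)` is a hypothetical helper name used only for illustration.

```swift
// Hypothetical helper, not part of the proposal: builds a String from a
// pointer and a length, stopping at the first NUL if there is one.
func makeString(from start: UnsafePointer<UInt8>, count: Int) -> String {
    // UnsafeBufferPointer is a Collection of UInt8, so it can be sliced
    // like any other collection of code units.
    let buffer = UnsafeBufferPointer(start: start, count: count)
    let codeUnits = buffer.prefix(while: { $0 != 0 })
    // Ill-formed UTF-8 is repaired with U+FFFD rather than rejected.
    return String(decoding: codeUnits, as: UTF8.self)
}

// A 16-byte field that is completely full, so it has no NUL terminator.
let unterminated: [UInt8] = Array("0123456789abcdef".utf8)
let full = unterminated.withUnsafeBufferPointer {
    makeString(from: $0.baseAddress!, count: $0.count)
}

// A 16-byte field holding a shorter name padded with NULs.
let padded: [UInt8] = Array("__TEXT".utf8) + [UInt8](repeating: 0, count: 10)
let short = padded.withUnsafeBufferPointer {
    makeString(from: $0.baseAddress!, count: $0.count)
}
```

Both cases go through the same path because the buffer is just a Collection; the NUL handling is an ordinary `prefix(while:)` rather than anything specific to C strings.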
We’ll add something to the proposal making it clear this will be possible.

> On Mar 31, 2017, at 4:01 AM, Jean-Daniel via swift-evolution <[email protected]> wrote:
>
> I’m with you on a C interop API that supports taking a non-null-terminated string. I often work with APIs that support efficient parsing by providing a pointer to a global buffer plus a length to report parsed strings.
>
> Without a way to create a Swift string from a buffer plus a length, interop with such APIs will be difficult for no good reason, as Swift strings don’t even have to be null-terminated.
>
>> On Mar 30, 2017, at 6:35 PM, Félix Cloutier via swift-evolution <[email protected]> wrote:
>>
>> I don't have many non-nitpick issues that I greatly care about; I'm in favor of this.
>>
>> My only request: it's currently painful to create a String from a fixed-size C array. For instance, if I have a pointer to a `struct foo { char name[16]; }` in Swift where the last character doesn't have to be a NUL, it's hard to create a String from it. Real-world examples of this are Mach-O LC_SEGMENT and LC_SEGMENT_64 commands.
>>
>> The generally-accepted wisdom <http://stackoverflow.com/a/27456220/251153> is that you take a pointer to the CChar tuple that represents the fixed-size array, but this still requires the string to be NUL-terminated. What do we think of an additional init(cString:) overload that takes an UnsafeBufferPointer and reads up to the first NUL or the end of the buffer, whichever comes first?
>>
>>> On Mar 30, 2017, at 2:48 AM, Brent Royal-Gordon via swift-evolution <[email protected]> wrote:
>>>
>>>> On Mar 29, 2017, at 5:32 PM, Ben Cohen via swift-evolution <[email protected]> wrote:
>>>>
>>>> Hi Swift Evolution,
>>>>
>>>> Below is a pitch for the first part of the String revision. This covers a number of changes that would allow the basic internals to be overhauled.
>>>>
>>>> Online version here: https://github.com/airspeedswift/swift-evolution/blob/3a822c799011ace682712532cfabfe32e9203fbb/proposals/0161-StringRevision1.md
>>>
>>> Really great stuff, guys. Thanks for your work on this!
>>>
>>>> In order to be able to write extensions across both String and Substring, a new Unicode protocol to which the two types will conform will be introduced. For the purposes of this proposal, Unicode will be defined as a protocol to be used whenever you would previously extend String. It should be possible to substitute extension Unicode { ... } in Swift 4 wherever extension String { ... } was written in Swift 3, with one exception: any passing of self into an API that takes a concrete String will need to be rewritten as String(self). If Self is a String then this should effectively optimize to a no-op, whereas if Self is a Substring then this will force a copy, helping to avoid the “memory leak” problems described above.
>>>
>>> I continue to feel that `Unicode` is the wrong name for this protocol, essentially because it sounds like a protocol for, say, a version of Unicode or some kind of encoding machinery instead of a Unicode string. I won't rehash that argument since I made it already in the manifesto thread, but I would like to make a couple new suggestions in this area.
>>>
>>> Later on, you note that it would be nice to namespace many of these types:
>>>
>>>> Several of the types related to String, such as the encodings, would ideally reside inside a namespace rather than live at the top level of the standard library. The best namespace for this is probably Unicode, but this is also the name of the protocol.
>>>> At some point if we gain the ability to nest enums and types inside protocols, they should be moved there. Putting them inside String or some other enum namespace is probably not worthwhile in the mean-time.
>>>
>>> Perhaps we should use an empty enum to create a `Unicode` namespace and then nest the protocol within it via typealias. If we do that, we can consider names like `Unicode.Collection` or even `Unicode.String` which would shadow existing types if they were top-level.
>>>
>>> If not, then given this:
>>>
>>>> The exact nature of the protocol – such as which methods should be protocol requirements vs which can be implemented as protocol extensions, are considered implementation details and so not covered in this proposal.
>>>
>>> We may simply want to wait to choose a name. As the protocol develops, we may discover a theme in its requirements which would suggest a good name. For instance, we may realize that the core of what the protocol abstracts is grouping code units into characters, which might suggest a name like `Characters`, or `Unicode.Characters`, or `CharacterCollection`, or what-have-you.
>>>
>>> (By the way, I hope that the eventual protocol requirements will be put through the review process, if only as an amendment, once they're determined.)
>>>
>>>> Unicode will conform to BidirectionalCollection. RangeReplaceableCollection conformance will be added directly onto the String and Substring types, as it is possible future Unicode-conforming types might not be range-replaceable (e.g. an immutable type that wraps a const char *).
>>>
>>> I'm a little worried about this because it seems to imply that the protocol cannot include any mutation operations that aren't in `RangeReplaceableCollection`. For instance, it won't be possible to include an in-place `applyTransform` method in the protocol. Do you anticipate that being an issue?
>>> Might it be a good idea to define a parallel `Mutable` or `RangeReplaceable` protocol?
>>>
>>>> The C string interop methods will be updated to those described here: a single withCString operation and two init(cString:) constructors, one for UTF8 and one for arbitrary encodings.
>>>
>>> Sorry if I'm repeating something that was already discussed, but is there a reason you don't include a `withCString` variant for arbitrary encodings? It seems like an odd asymmetry.
>>>
>>>> The standard library currently lacks a Latin1 codec, so an enum Latin1: UnicodeEncoding type will be added.
>>>
>>> Nice. I wrote one of those once; I'll enjoy deleting it.
>>>
>>>> A new protocol, UnicodeEncoding, will be added to replace the current UnicodeCodec protocol:
>>>>
>>>> public enum UnicodeParseResult<T, Index> {
>>>
>>> Either `T` should be given a more specific name, or the enum should be given a less specific one, becoming `ParseResult` and being oriented towards incremental parsing of anything from any kind of collection.
>>>
>>>>   /// Indicates valid input was recognized.
>>>>   ///
>>>>   /// `resumptionPoint` is the end of the parsed region
>>>>   case valid(T, resumptionPoint: Index)  // FIXME: should these be reordered?
>>>
>>> No, I think this is the right order. The thing that's valid is the code point.
>>>
>>>>   /// Indicates invalid input was recognized.
>>>>   ///
>>>>   /// `resumptionPoint` is the next position at which to continue parsing after
>>>>   /// the invalid input is repaired.
>>>>   case error(resumptionPoint: Index)
>>>
>>> I know this is abbreviated documentation, but I hope the full version includes a good usage example demonstrating, among other things, how to detect partial characters and defer processing of them instead of rejecting them as erroneous.
>>>
>>>> /// An encoding for text with UnicodeScalar as a common currency type
>>>> public protocol UnicodeEncoding {
>>>>   /// The maximum number of code units in an encoded unicode scalar value
>>>>   static var maxLengthOfEncodedScalar: Int { get }
>>>>
>>>>   /// A type that can represent a single UnicodeScalar as it is encoded in this
>>>>   /// encoding.
>>>>   associatedtype EncodedScalar : EncodedScalarProtocol
>>>
>>> There's an `EncodedScalarProtocol`-shaped hole in this proposal. What does it do? What are its semantics? How does `EncodedScalar` relate to the old `CodeUnit`?
>>>
>>>>   @discardableResult
>>>>   public static func parseForward<C: Collection>(
>>>>     _ input: C,
>>>>     repairingIllFormedSequences makeRepairs: Bool = true,
>>>>     into output: (EncodedScalar) throws->Void
>>>>   ) rethrows -> (remainder: C.SubSequence, errorCount: Int)
>>>>
>>>>   @discardableResult
>>>>   public static func parseReverse<C: BidirectionalCollection>(
>>>>     _ input: C,
>>>>     repairingIllFormedSequences makeRepairs: Bool = true,
>>>>     into output: (EncodedScalar) throws->Void
>>>>   ) rethrows -> (remainder: C.SubSequence, errorCount: Int)
>>>>   where C.SubSequence : BidirectionalCollection,
>>>>         C.SubSequence.SubSequence == C.SubSequence,
>>>>         C.SubSequence.Iterator.Element == EncodedScalar.Iterator.Element
>>>> }
>>>
>>> Are there constraints missing on `parseForward`?
>>>
>>> What do these do if `makeRepairs` is false? Would it be clearer if we made an enum that described the behaviors and changed the label to something like `ifIllFormed:`?
>>>
>>>> Due to the change in internal implementation, this means that these operations will be O(n) rather than O(1).
>>>> This is not expected to be a major concern, based on experiences from a similar change made to Java, but projects will be able to work around performance issues without upgrading to Swift 4 by explicitly typing slices as Substring, which will call the Swift 4 variant, and which will be available but not invoked by default in Swift 3 mode.
>>>
>>> Will there be a way to make this also work with a real Swift 3 compiler? For instance, can you define `typealias Substring = String` in such a way that real Swift 3 will parse and use it, but Swift 4 in Swift 3 mode will ignore it?
>>>
>>>> This proposal does not yet introduce an implicit conversion from Substring to String. The decision on whether to add this will be deferred pending feedback on the initial implementation. The intention is to make a preview toolchain available for feedback, including on whether this implicit conversion is necessary, prior to the release of Swift 4.
>>>
>>> This is a sensible approach.
>>>
>>> Thank you for developing this into a full proposal. I discussed the plans for Swift 4 with a local group of programmers recently, and everyone was pleased to hear that `String` would get an overhaul, that the `characters` view would be integrated into the string, etc. We even talked a little about `Substring` and people thought it was a good idea. This proposal is shaping up to impact a lot of people, but in a good way!
>>>
>>> --
>>> Brent Royal-Gordon
>>> Architechies
>>>
>>> _______________________________________________
>>> swift-evolution mailing list
>>> [email protected]
>>> https://lists.swift.org/mailman/listinfo/swift-evolution
