Re: [swift-evolution] [Pitch] String revision proposal #1

2017-04-05 Thread Brent Royal-Gordon via swift-evolution
> On Apr 5, 2017, at 12:16 PM, Ben Cohen  wrote:
> 
> The idea is, if you don’t want to make repairs, you use the transcoding 
> primitives instead. The belief is that the old non-repairing versions (return 
> nil if repairs needed) weren’t useful.

Yes—I was asking about the transcoding primitives here. Currently the call 
looks like one of these:

let (remainder, errorCount) = UTF8.parseForward(bytes, repairingIllFormedSequences: false) { … }
let (remainder, errorCount) = UTF8.parseForward(bytes, repairingIllFormedSequences: true) { … }

I'm saying, would it be clearer if it looked like this instead?

let (remainder, errorCount) = UTF8.parseForward(bytes, ifIllFormed: .stop) { … }
let (remainder, errorCount) = UTF8.parseForward(bytes, ifIllFormed: .repair) { … }
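
A minimal sketch of how the enum-based spelling could read in practice. `IllFormedPolicy` and `decodeScalars` are hypothetical names invented for illustration; they are not the `parseForward` API from the pitch. The sketch relies only on the standard library's existing `UTF8` codec and its `decode(_:)` method.

```swift
// Hypothetical policy enum standing in for the `ifIllFormed:` parameter.
enum IllFormedPolicy {
    case stop
    case repair
}

// Hypothetical helper: decodes UTF-8 bytes into scalars, counting errors.
func decodeScalars(
    _ bytes: [UInt8],
    ifIllFormed policy: IllFormedPolicy
) -> (scalars: [Unicode.Scalar], errorCount: Int) {
    var scalars: [Unicode.Scalar] = []
    var errorCount = 0
    var iterator = bytes.makeIterator()
    var decoder = UTF8()

    decoding: while true {
        switch decoder.decode(&iterator) {
        case .scalarValue(let scalar):
            scalars.append(scalar)
        case .emptyInput:
            break decoding
        case .error:
            errorCount += 1
            switch policy {
            case .stop:
                // Give up at the first ill-formed sequence.
                break decoding
            case .repair:
                // Substitute U+FFFD REPLACEMENT CHARACTER and carry on.
                scalars.append("\u{FFFD}")
            }
        }
    }
    return (scalars, errorCount)
}

let input: [UInt8] = [0x61, 0xFF, 0x62]   // "a", an invalid byte, "b"
let stopped = decodeScalars(input, ifIllFormed: .stop)
let repaired = decodeScalars(input, ifIllFormed: .repair)
```

The point of the enum is that the call site names the behavior (`.stop` or `.repair`) instead of asking the reader to remember what `true` and `false` mean.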

-- 
Brent Royal-Gordon
Architechies

___
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


Re: [swift-evolution] [Pitch] String revision proposal #1

2017-04-05 Thread Ben Cohen via swift-evolution

> On Mar 31, 2017, at 2:00 AM, Brent Royal-Gordon  
> wrote:
> 
>> On Mar 30, 2017, at 2:42 PM, Ben Cohen via swift-evolution wrote:
>> 
>> (or rather, given substrings will be 2-3 words, the not-even-that-small 
>> string optimization)
> 
> Wait, if you're using the `startIndex` and `endIndex` words for the small 
> substring optimization, how are you keeping the indices interchangeable with 
> the parent string's?
> 

This will require the indices to know more than just a position in a buffer. 
It’s not yet certain that this performance trade-off is worth it; it is still 
being experimented with.

> -- 
> Brent Royal-Gordon
> Architechies
> 



Re: [swift-evolution] [Pitch] String revision proposal #1

2017-04-05 Thread Ben Cohen via swift-evolution
Hi Brent,

Sorry, I realized I failed to reply to these at the time. See below.

> On Mar 30, 2017, at 6:52 PM, Brent Royal-Gordon  
> wrote:
> 
>> On Mar 30, 2017, at 2:36 PM, Ben Cohen wrote:
>> 
>> The big win for Unicode is it is short. We want to encourage people to write 
>> their extensions on this protocol. We want people who previously extended 
>> String to feel very comfortable extending Unicode. It also helps emphasize 
>> how important the Unicode-ness of Swift.String is. I like the idea of 
>> Unicode.Collection, but it is a little intimidating and making it even a 
>> tiny bit intimidating is worrying to me from an adoption perspective. 
> 
> Yeah, I understand why "Collection" might be intimidating. But I think 
> "Unicode" would be too—it's opaque enough that people wouldn't be entirely 
> sure whether they were extending the right thing.
> 
> I did a quick run-through of different languages and the 
> protocols/interfaces/whatever their string types conform to, but most don't 
> seem to have anything that abstracts string types. The only similar things I 
> could find were `CharSequence` in Java, `StringLike` in Scala...and `Stringy` 
> in Perl 6. And I'm sure you thought you were joking!
> 

Ha!

> Honestly, I'd recommend just going with `StringProtocol` unless you can come 
> up with an adjective form you like (`Stringlike`? `Textual`?). It's a bit 
> clumsy, but it's crystal clear. Stupid name, but you'll never forget it.
> 

I think it’s fairly evenly balanced between Unicode and StringProtocol. 
Neither is perfect.

>>> I'm a little worried about this because it seems to imply that the protocol 
>>> cannot include any mutation operations that aren't in 
>>> `RangeReplaceableCollection`. For instance, it won't be possible to include 
>>> an in-place `applyTransform` method in the protocol. Do you anticipate that 
>>> being an issue? Might it be a good idea to define a parallel `Mutable` or 
>>> `RangeReplaceable` protocol?
>>> 
>> 
>> You can always assign to self, then provide more efficient implementations 
>> where Self is a RangeReplaceableCollection. We do this elsewhere in the std 
>> lib with collections, e.g. 
>> https://github.com/apple/swift/blob/master/stdlib/public/core/Collection.swift#L1277
>> 
>> Proliferating protocol combinations is problematic (looking at you, 
>> BidirectionalMutableRandomAccessSlice).
> 
> Nobody likes proliferation, but in this case it'd be because there genuinely 
> were additional semantics that were only available on mutable strings.
> 
> (Once upon a time, I think I requested the ability to write `func index(of 
> elem: Iterator.Element) -> Index? where Iterator.Element: Equatable`. Could 
> such a feature be used for this? `func apply(_ transform: StringTransform, 
> reverse: Bool) where Self: RangeReplaceableCollection`?)
> 
>>>> The C string interop methods will be updated to those described here: a 
>>>> single withCString operation and two init(cString:) constructors, one for 
>>>> UTF8 and one for arbitrary encodings.
>>> 
>>> Sorry if I'm repeating something that was already discussed, but is there a 
>>> reason you don't include a `withCString` variant for arbitrary encodings? 
>>> It seems like an odd asymmetry.
>> 
>> Hmm. Is this a common use-case people have? Symmetry for the sake of it 
>> doesn’t seem enough. If uncommon, you can do it via an Array that you 
>> nul-terminate manually.
> 
> Is `init(cString:encoding:)` a common use case? If it is, I'm not sure why 
> the opposite wouldn't be.
> 

This + another use case has convinced me that yes, we should have a matching 
withCString version.
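
For reference, current Swift toolchains ship such a pair: `withCString(encodedAs:_:)` and `String(decodingCString:as:)`. The following is a minimal round-trip sketch assuming a recent standard library; whether these match the exact shape discussed in this thread is an assumption.

```swift
let original = "héllo"

// Encode as NUL-terminated UTF-16, then decode the same buffer back.
let roundTripped = original.withCString(encodedAs: UTF16.self) { pointer in
    String(decodingCString: pointer, as: UTF16.self)
}
```

The pointer is only valid inside the closure, so the decoded `String` is built before the closure returns.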

>> Yeah, it’s tempting to make ParseResult general, and the only reason we held 
>> off is because we don’t want making sure it’s generally useful to be a 
>> distraction.
> 
> Understandable.
> 
> I wonder if some part of the parsing algorithm could somehow be generalized 
> so it was suitable for many purposes and then put on `Collection`, with the 
> `UnicodeEncoding` then being passed as a parameter to it. If so, that would 
> justify making `ParseResult` a top-level type.
> 
>> Ah, yes. Here it is:
>> 
>> public protocol EncodedScalarProtocol : RandomAccessCollection {
>>  init?(_ scalarValue: UnicodeScalar)
>>  var utf8: UTF8.EncodedScalar { get }
>>  var utf16: UTF16.EncodedScalar { get }
>>  var utf32: UTF32.EncodedScalar { get }
>> }
> 
> What is the `Element` type expected to be here?
> 
> I think what's missing is a holistic overview of the encoding system. So, 
> please help me write this function:
> 
>   func unicodeScalars(in data: Data, using 
> encoding: Encoding.Type) -> [UnicodeScalar] {
>   var scalars: [UnicodeScalar] = []
>   
>   data.withUnsafeBytes { (bytes: 
> UnsafePointer<$ParseInputElement>) in
> 

Re: [swift-evolution] [Pitch] String revision proposal #1

2017-04-02 Thread Ben Rimmington via swift-evolution
Re: 

### C String Interop

Will the `init(cString: UnsafePointer)` API be deprecated in Swift 4 
mode?



It was added by SE-0107 to avoid unsafe pointer conversions:



### C Primitive Types

Will the `CChar` typealias be *unsigned* on some platforms?



Could some of those typealiases in CTypes.swift be moved to a C header, so that 
they're always imported as the correct types for each platform?

//===--- CTypes.h --===//
typedef char               CChar;
typedef signed char        CSignedChar;
typedef unsigned char      CUnsignedChar;
typedef short              CShort;
typedef unsigned short     CUnsignedShort;
typedef int                CInt;
typedef unsigned int       CUnsignedInt;
typedef long               CLong;
typedef unsigned long      CUnsignedLong;
typedef long long          CLongLong;
typedef unsigned long long CUnsignedLongLong;
typedef float              CFloat;
typedef double             CDouble;

For example, CTypes.swift for 64-bit Windows currently has:
* `CLong = Int32` versus `CUnsignedLong = UInt`,
* `CLongLong = Int`   versus `CUnsignedLongLong = UInt64`.
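
A small sketch of how such mismatches could be checked from Swift. The `describe` helper is a hypothetical name introduced here for illustration. On a platform where the typealiases are consistent, each signed/unsigned pair should report the same bit width.

```swift
// Hypothetical helper: summarizes the width and signedness of an integer type.
func describe<T: FixedWidthInteger>(_ type: T.Type, name: String) -> String {
    let signedness = T.isSigned ? "signed" : "unsigned"
    return "\(name): \(T.bitWidth) bits, \(signedness)"
}

// Whether CChar is signed is platform-dependent, so no output is assumed here.
print(describe(CChar.self, name: "CChar"))
print(describe(CLong.self, name: "CLong"))
print(describe(CUnsignedLong.self, name: "CUnsignedLong"))
print(describe(CLongLong.self, name: "CLongLong"))
print(describe(CUnsignedLongLong.self, name: "CUnsignedLongLong"))
```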



Re: [swift-evolution] [Pitch] String revision proposal #1

2017-03-31 Thread Ben Cohen via swift-evolution
You could even argue that what we need is a Collection wrapper that turns a 
pointer + a terminating sigil into a Collection… but from-C-string-creation is 
such a common operation that it deserves a dedicated shorthand. 
Non-null-terminated creation probably doesn’t.

> On Mar 31, 2017, at 8:03 AM, Ben Cohen  wrote:
> 
> 
> When you have a pointer and a length, you can create a fully functional 
> Collection using UnsafeBufferPointer. This means you no longer need something 
> that’s C interop-specific – just the ability to create a String from 
> a Collection of code units of some encoding.
> 
> We’ll add something to the proposal making it clear this will be possible.
> 
>> On Mar 31, 2017, at 4:01 AM, Jean-Daniel via swift-evolution wrote:
>> 
>> I’m with you on a C interop API that supports taking a non-null-terminated 
>> string. I often work with APIs that support efficient parsing by providing 
>> a pointer to a global buffer + length to report parsed strings.
>> 
>> Without a way to create a Swift string from buffer + length, interop with 
>> such APIs will be difficult for no good reason, as Swift strings don’t even 
>> have to be null terminated.
>> 
>>> On 30 Mar 2017, at 18:35, Félix Cloutier via swift-evolution wrote:
>>> 
>>> I don't have many non-nitpick issues that I greatly care about; I'm in 
>>> favor of this.
>>> 
>>> My only request: it's currently painful to create a String from a 
>>> fixed-size C array. For instance, if I have a pointer to a `struct foo { 
>>> char name[16]; }` in Swift where the last character doesn't have to be a 
>>> NUL, it's hard to create a String from it. Real-world examples of this are 
>>> Mach-O LC_SEGMENT and LC_SEGMENT_64 commands.
>>> 
>>> The generally-accepted wisdom  
>>> is that you take a pointer to the CChar tuple that represents the 
>>> fixed-size array, but this still requires the string to be NUL-terminated. 
>>> What do we think of an additional init(cString:) overload that takes an 
>>> UnsafeBufferPointer and reads up to the first NUL or the end of the buffer, 
>>> whichever comes first?
>>> 
>>>> On 30 Mar 2017, at 02:48, Brent Royal-Gordon via swift-evolution wrote:
>>>> 
>>>>> On Mar 29, 2017, at 5:32 PM, Ben Cohen via swift-evolution wrote:
>>>>> 
>>>>> Hi Swift Evolution,
>>>>> 
>>>>> Below is a pitch for the first part of the String revision. This covers a 
>>>>> number of changes that would allow the basic internals to be overhauled.
>>>>> 
>>>>> Online version here: 
>>>>> https://github.com/airspeedswift/swift-evolution/blob/3a822c799011ace682712532cfabfe32e9203fbb/proposals/0161-StringRevision1.md
>>>>> 
>>>> 
>>>> Really great stuff, guys. Thanks for your work on this!
>>>> 
>>>>> In order to be able to write extensions across both String and 
>>>>> Substring, a new Unicode protocol to which the two types will conform 
>>>>> will be introduced. For the purposes of this proposal, Unicode will be 
>>>>> defined as a protocol to be used whenever you would previously extend 
>>>>> String. It should be possible to substitute extension Unicode { ... } in 
>>>>> Swift 4 wherever extension String { ... } was written in Swift 3, with 
>>>>> one exception: any passing of self into an API that takes a concrete 
>>>>> String will need to be rewritten as String(self). If Self is a String 
>>>>> then this should effectively optimize to a no-op, whereas if Self is a 
>>>>> Substring then this will force a copy, helping to avoid the “memory leak” 
>>>>> problems described above.
>>>> 
>>>> I continue to feel that `Unicode` is the wrong name for this protocol, 
>>>> essentially because it sounds like a protocol for, say, a version of 
>>>> Unicode or some kind of encoding machinery instead of a Unicode string. I 
>>>> won't rehash that argument since I made it already in the manifesto 
>>>> thread, but I would like to make a couple new suggestions in this area.
>>>> 
>>>> Later on, you note that it would be nice to namespace many of these types:
>>>> 
>>>>> Several of the types related to String, such as the encodings, would 
>>>>> ideally reside inside a namespace rather than live at the top level of 
>>>>> the standard library. The best namespace for this is probably Unicode, 
>>>>> but this is also the name of the protocol. At some point if we gain the 
>>>>> ability to nest enums and types inside protocols, they should be moved 
>>>>> there. Putting them inside String or some other enum namespace is 
>>>>> probably not worthwhile in the mean-time.
>>>> 
>>>> Perhaps we 

Re: [swift-evolution] [Pitch] String revision proposal #1

2017-03-31 Thread Ben Cohen via swift-evolution

When you have a pointer and a length, you can create a fully functional 
Collection using UnsafeBufferPointer. This means you no longer need something 
that’s C interop-specific – just the ability to create a String from a 
Collection of code units of some encoding.

We’ll add something to the proposal making it clear this will be possible.
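
A minimal sketch of the pointer + length case using today's standard library. `makeString(from:count:)` is a hypothetical helper name; `String(decoding:as:)` is the initializer that exists in current Swift, and treating it as the one the proposal had in mind is an assumption.

```swift
// Hypothetical helper: builds a String from a UTF-8 pointer and a byte count.
// No NUL terminator is required.
func makeString(from pointer: UnsafePointer<UInt8>, count: Int) -> String {
    let buffer = UnsafeBufferPointer(start: pointer, count: count)
    return String(decoding: buffer, as: UTF8.self)
}

let bytes: [UInt8] = Array("hello, world".utf8)
let greeting = bytes.withUnsafeBufferPointer { buffer in
    // Only the first five bytes are handed over.
    makeString(from: buffer.baseAddress!, count: 5)
}
```

Because `UnsafeBufferPointer` is itself a `Collection` of code units, nothing in the helper is specific to C interop.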

> On Mar 31, 2017, at 4:01 AM, Jean-Daniel via swift-evolution 
>  wrote:
> 
> I’m with you on a C interop API that supports taking a non-null-terminated 
> string. I often work with APIs that support efficient parsing by providing 
> a pointer to a global buffer + length to report parsed strings.
> 
> Without a way to create a Swift string from buffer + length, interop with 
> such APIs will be difficult for no good reason, as Swift strings don’t even 
> have to be null terminated.
> 
>> On 30 Mar 2017, at 18:35, Félix Cloutier via swift-evolution wrote:
>> 
>> I don't have many non-nitpick issues that I greatly care about; I'm in favor 
>> of this.
>> 
>> My only request: it's currently painful to create a String from a fixed-size 
>> C array. For instance, if I have a pointer to a `struct foo { char name[16]; 
>> }` in Swift where the last character doesn't have to be a NUL, it's hard to 
>> create a String from it. Real-world examples of this are Mach-O LC_SEGMENT 
>> and LC_SEGMENT_64 commands.
>> 
>> The generally-accepted wisdom  
>> is that you take a pointer to the CChar tuple that represents the fixed-size 
>> array, but this still requires the string to be NUL-terminated. What do we 
>> think of an additional init(cString:) overload that takes an 
>> UnsafeBufferPointer and reads up to the first NUL or the end of the buffer, 
>> whichever comes first?
>> 
>>> On 30 Mar 2017, at 02:48, Brent Royal-Gordon via swift-evolution wrote:
>>> 
>>>> On Mar 29, 2017, at 5:32 PM, Ben Cohen via swift-evolution wrote:
>>>> 
>>>> Hi Swift Evolution,
>>>> 
>>>> Below is a pitch for the first part of the String revision. This covers a 
>>>> number of changes that would allow the basic internals to be overhauled.
>>>> 
>>>> Online version here: 
>>>> https://github.com/airspeedswift/swift-evolution/blob/3a822c799011ace682712532cfabfe32e9203fbb/proposals/0161-StringRevision1.md
>>>> 
>>> 
>>> Really great stuff, guys. Thanks for your work on this!
>>> 
>>>> In order to be able to write extensions across both String and Substring, 
>>>> a new Unicode protocol to which the two types will conform will be 
>>>> introduced. For the purposes of this proposal, Unicode will be defined as 
>>>> a protocol to be used whenever you would previously extend String. It 
>>>> should be possible to substitute extension Unicode { ... } in Swift 4 
>>>> wherever extension String { ... } was written in Swift 3, with one 
>>>> exception: any passing of self into an API that takes a concrete String 
>>>> will need to be rewritten as String(self). If Self is a String then this 
>>>> should effectively optimize to a no-op, whereas if Self is a Substring 
>>>> then this will force a copy, helping to avoid the “memory leak” problems 
>>>> described above.
>>> 
>>> I continue to feel that `Unicode` is the wrong name for this protocol, 
>>> essentially because it sounds like a protocol for, say, a version of 
>>> Unicode or some kind of encoding machinery instead of a Unicode string. I 
>>> won't rehash that argument since I made it already in the manifesto thread, 
>>> but I would like to make a couple new suggestions in this area.
>>> 
>>> Later on, you note that it would be nice to namespace many of these types:
>>> 
>>>> Several of the types related to String, such as the encodings, would 
>>>> ideally reside inside a namespace rather than live at the top level of the 
>>>> standard library. The best namespace for this is probably Unicode, but 
>>>> this is also the name of the protocol. At some point if we gain the 
>>>> ability to nest enums and types inside protocols, they should be moved 
>>>> there. Putting them inside String or some other enum namespace is probably 
>>>> not worthwhile in the mean-time.
>>> 
>>> Perhaps we should use an empty enum to create a `Unicode` namespace and 
>>> then nest the protocol within it via typealias. If we do that, we can 
>>> consider names like `Unicode.Collection` or even `Unicode.String` which 
>>> would shadow existing types if they were top-level.
>>> 
>>> If not, then given this:
>>> 
>>>> The exact nature of the protocol – such as which methods should be 
>>>> protocol requirements vs which can be implemented as protocol extensions, 
>>>> are 

Re: [swift-evolution] [Pitch] String revision proposal #1

2017-03-31 Thread Jean-Daniel via swift-evolution
I’m with you on a C interop API that supports taking a non-null-terminated 
string. I often work with APIs that support efficient parsing by providing 
a pointer to a global buffer + length to report parsed strings.

Without a way to create a Swift string from buffer + length, interop with such 
APIs will be difficult for no good reason, as Swift strings don’t even have to 
be null terminated.

> On 30 Mar 2017, at 18:35, Félix Cloutier via swift-evolution wrote:
> 
> I don't have many non-nitpick issues that I greatly care about; I'm in favor 
> of this.
> 
> My only request: it's currently painful to create a String from a fixed-size 
> C array. For instance, if I have a pointer to a `struct foo { char name[16]; 
> }` in Swift where the last character doesn't have to be a NUL, it's hard to 
> create a String from it. Real-world examples of this are Mach-O LC_SEGMENT 
> and LC_SEGMENT_64 commands.
> 
> The generally-accepted wisdom  is 
> that you take a pointer to the CChar tuple that represents the fixed-size 
> array, but this still requires the string to be NUL-terminated. What do we 
> think of an additional init(cString:) overload that takes an 
> UnsafeBufferPointer and reads up to the first NUL or the end of the buffer, 
> whichever comes first?
> 
>> On 30 Mar 2017, at 02:48, Brent Royal-Gordon via swift-evolution wrote:
>> 
>>> On Mar 29, 2017, at 5:32 PM, Ben Cohen via swift-evolution wrote:
>>> 
>>> Hi Swift Evolution,
>>> 
>>> Below is a pitch for the first part of the String revision. This covers a 
>>> number of changes that would allow the basic internals to be overhauled.
>>> 
>>> Online version here: 
>>> https://github.com/airspeedswift/swift-evolution/blob/3a822c799011ace682712532cfabfe32e9203fbb/proposals/0161-StringRevision1.md
>>>  
>>> 
>> 
>> Really great stuff, guys. Thanks for your work on this!
>> 
>>> In order to be able to write extensions across both String and Substring, 
>>> a new Unicode protocol to which the two types will conform will be 
>>> introduced. For the purposes of this proposal, Unicode will be defined as a 
>>> protocol to be used whenever you would previously extend String. It should 
>>> be possible to substitute extension Unicode { ... } in Swift 4 wherever 
>>> extension String { ... } was written in Swift 3, with one exception: any 
>>> passing of self into an API that takes a concrete String will need to be 
>>> rewritten as String(self). If Self is a String then this should effectively 
>>> optimize to a no-op, whereas if Self is a Substring then this will force a 
>>> copy, helping to avoid the “memory leak” problems described above.
>> 
>> I continue to feel that `Unicode` is the wrong name for this protocol, 
>> essentially because it sounds like a protocol for, say, a version of Unicode 
>> or some kind of encoding machinery instead of a Unicode string. I won't 
>> rehash that argument since I made it already in the manifesto thread, but I 
>> would like to make a couple new suggestions in this area.
>> 
>> Later on, you note that it would be nice to namespace many of these types:
>> 
>>> Several of the types related to String, such as the encodings, would 
>>> ideally reside inside a namespace rather than live at the top level of the 
>>> standard library. The best namespace for this is probably Unicode, but this 
>>> is also the name of the protocol. At some point if we gain the ability to 
>>> nest enums and types inside protocols, they should be moved there. Putting 
>>> them inside String or some other enum namespace is probably not worthwhile 
>>> in the mean-time.
>> 
>> Perhaps we should use an empty enum to create a `Unicode` namespace and then 
>> nest the protocol within it via typealias. If we do that, we can consider 
>> names like `Unicode.Collection` or even `Unicode.String` which would shadow 
>> existing types if they were top-level.
>> 
>> If not, then given this:
>> 
>>> The exact nature of the protocol – such as which methods should be protocol 
>>> requirements vs which can be implemented as protocol extensions, are 
>>> considered implementation details and so not covered in this proposal.
>> 
>> We may simply want to wait to choose a name. As the protocol develops, we 
>> may discover a theme in its requirements which would suggest a good name. 
>> For instance, we may realize that the core of what the protocol abstracts is 
>> grouping code units into characters, which might suggest a name like 
>> `Characters`, or `Unicode.Characters`, or `CharacterCollection`, or 
>> what-have-you.
>> 
>> (By the way, I hope that the eventual protocol requirements will be put 
>> through the review process, 

Re: [swift-evolution] [Pitch] String revision proposal #1

2017-03-31 Thread Brent Royal-Gordon via swift-evolution
> On Mar 30, 2017, at 2:42 PM, Ben Cohen via swift-evolution 
>  wrote:
> 
> (or rather, given substrings will be 2-3 words, the not-even-that-small 
> string optimization)

Wait, if you're using the `startIndex` and `endIndex` words for the small 
substring optimization, how are you keeping the indices interchangeable with 
the parent string's?
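
For context, a short sketch of the property being asked about, as it behaves in shipping Swift: a `Substring` shares its indices with the string it was sliced from. This only illustrates the interchangeability; it says nothing about how a small-substring optimization would preserve it.

```swift
let parent = "Hello, Swift"
let start = parent.firstIndex(of: "S")!

// Slicing yields a Substring whose indices are positions in `parent`.
let sub = parent[start...]

// An index taken from the substring can be used directly on the parent.
let characterInParent = parent[sub.startIndex]
```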

-- 
Brent Royal-Gordon
Architechies



Re: [swift-evolution] [Pitch] String revision proposal #1

2017-03-30 Thread Félix Cloutier via swift-evolution
Does it? According to the documentation for the current decodeCString, it 
seems to accept an UnsafePointer, not a buffer pointer, and expects the string 
to be null-terminated. Am I missing another overload?
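
For illustration, the behavior requested in the quoted message below (read up to the first NUL or the end of the buffer, whichever comes first) can be sketched as follows. `string(fromFixedSizeField:)` is a hypothetical helper, not an existing or proposed standard library API, and the sketch assumes the bytes are UTF-8.

```swift
// Hypothetical helper: decodes a fixed-size field that may or may not be
// NUL-terminated, stopping at the first NUL or at the end of the buffer.
func string(fromFixedSizeField field: UnsafeBufferPointer<UInt8>) -> String {
    let end = field.firstIndex(of: 0) ?? field.endIndex
    return String(decoding: field[..<end], as: UTF8.self)
}

// A 16-byte field padded with NULs, like a Mach-O segment name.
let padded: [UInt8] = Array("__TEXT".utf8) + [UInt8](repeating: 0, count: 10)
let paddedName = padded.withUnsafeBufferPointer { string(fromFixedSizeField: $0) }

// A 16-byte field that uses every byte and has no terminator.
let full: [UInt8] = Array("0123456789abcdef".utf8)
let fullName = full.withUnsafeBufferPointer { string(fromFixedSizeField: $0) }
```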

> On 30 Mar 2017, at 17:27, Zach Waldowski via swift-evolution wrote:
> 
> On Thu, Mar 30, 2017, at 12:35 PM, Félix Cloutier via swift-evolution wrote:
>> I don't have many non-nitpick issues that I greatly care about; I'm in favor 
>> of this.
>> 
>> My only request: it's currently painful to create a String from a fixed-size 
>> C array. For instance, if I have a pointer to a `struct foo { char name[16]; 
>> }` in Swift where the last character doesn't have to be a NUL, it's hard to 
>> create a String from it. Real-world examples of this are Mach-O LC_SEGMENT 
>> and LC_SEGMENT_64 commands.
>> 
>> 
>> The generally-accepted wisdom  
>> is that you take a pointer to the CChar tuple that represents the fixed-size 
>> array, but this still requires the string to be NUL-terminated. What do we 
>> think of an additional init(cString:) overload that takes an 
>> UnsafeBufferPointer and reads up to the first NUL or the end of the buffer, 
>> whichever comes first?
> 
> Today's String already supports this through 
> `String.decodeCString(_:as:repairingInvalidCodeUnits:)`, passing a buffer 
> pointer.
> 
> Best,
>   Zachary Waldowski
>   z...@waldowski.me 
> 
> 
> 



Re: [swift-evolution] [Pitch] String revision proposal #1

2017-03-30 Thread Brent Royal-Gordon via swift-evolution
> On Mar 30, 2017, at 2:36 PM, Ben Cohen  wrote:
> 
> The big win for Unicode is it is short. We want to encourage people to write 
> their extensions on this protocol. We want people who previously extended 
> String to feel very comfortable extending Unicode. It also helps emphasize how 
> important the Unicode-ness of Swift.String is. I like the idea of 
> Unicode.Collection, but it is a little intimidating and making it even a tiny 
> bit intimidating is worrying to me from an adoption perspective. 

Yeah, I understand why "Collection" might be intimidating. But I think 
"Unicode" would be too—it's opaque enough that people wouldn't be entirely sure 
whether they were extending the right thing.

I did a quick run-through of different languages and the 
protocols/interfaces/whatever their string types conform to, but most don't 
seem to have anything that abstracts string types. The only similar things I 
could find were `CharSequence` in Java, `StringLike` in Scala...and `Stringy` 
in Perl 6. And I'm sure you thought you were joking!

Honestly, I'd recommend just going with `StringProtocol` unless you can come up 
with an adjective form you like (`Stringlike`? `Textual`?). It's a bit clumsy, 
but it's crystal clear. Stupid name, but you'll never forget it.
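
For context, the standard library did eventually ship a protocol named `StringProtocol`, to which both `String` and `Substring` conform. A minimal sketch of the "extend the protocol instead of `String`" pattern from the pitch follows; `shout` and `shouted` are hypothetical names used only for illustration.

```swift
// An API that takes a concrete String.
func shout(_ text: String) -> String {
    return text.uppercased() + "!"
}

extension StringProtocol {
    // Available on both String and Substring. Passing self to an API that
    // takes a concrete String requires an explicit String(self).
    var shouted: String {
        return shout(String(self))
    }
}

let sentence = "hello world"
let firstWord = sentence.prefix(5)   // a Substring
let shoutedWord = firstWord.shouted
let shoutedSentence = sentence.shouted
```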

>> I'm a little worried about this because it seems to imply that the protocol 
>> cannot include any mutation operations that aren't in 
>> `RangeReplaceableCollection`. For instance, it won't be possible to include 
>> an in-place `applyTransform` method in the protocol. Do you anticipate that 
>> being an issue? Might it be a good idea to define a parallel `Mutable` or 
>> `RangeReplaceable` protocol?
>> 
> 
> You can always assign to self, then provide more efficient implementations 
> where Self is a RangeReplaceableCollection. We do this elsewhere in the std 
> lib with collections, e.g. 
> https://github.com/apple/swift/blob/master/stdlib/public/core/Collection.swift#L1277
> 
> Proliferating protocol combinations is problematic (looking at you, 
> BidirectionalMutableRandomAccessSlice).

Nobody likes proliferation, but in this case it'd be because there genuinely 
were additional semantics that were only available on mutable strings.

(Once upon a time, I think I requested the ability to write `func index(of 
elem: Iterator.Element) -> Index? where Iterator.Element: Equatable`. Could 
such a feature be used for this? `func apply(_ transform: StringTransform, 
reverse: Bool) where Self: RangeReplaceableCollection`?)
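
A minimal sketch of the "assign to self, then specialize" pattern from the quoted reply. `TextLike`, `FrozenText`, and `removeVowels` are hypothetical names invented here; this is not the pitched protocol, only an illustration of how a general fallback and a `RangeReplaceableCollection` fast path can coexist.

```swift
// Hypothetical stand-in for the pitched string protocol.
protocol TextLike: Collection where Element == Character {
    init(characters: [Character])
}

extension TextLike {
    // General fallback: build a new value and assign it to self.
    mutating func removeVowels() {
        self = Self(characters: filter { !"aeiouAEIOU".contains($0) })
    }
}

extension TextLike where Self: RangeReplaceableCollection {
    // More efficient variant, chosen when in-place mutation is available.
    mutating func removeVowels() {
        removeAll { "aeiouAEIOU".contains($0) }
    }
}

// A conforming type with no in-place mutation API: it gets the fallback.
struct FrozenText: TextLike {
    private let storage: [Character]
    init(characters: [Character]) { storage = characters }
    var startIndex: Int { return storage.startIndex }
    var endIndex: Int { return storage.endIndex }
    subscript(position: Int) -> Character { return storage[position] }
    func index(after i: Int) -> Int { return storage.index(after: i) }
}

// String is range-replaceable, so it gets the in-place variant.
extension String: TextLike {
    init(characters: [Character]) { self.init(characters) }
}

var frozen = FrozenText(characters: Array("Swift String"))
frozen.removeVowels()

var mutable = "Unicode"
mutable.removeVowels()
```

Both call sites look identical; overload resolution picks the more constrained extension when the conforming type supports it.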

>>> The C string interop methods will be updated to those described here: a 
>>> single withCString operation and two init(cString:) constructors, one for 
>>> UTF8 and one for arbitrary encodings.
>> 
>> Sorry if I'm repeating something that was already discussed, but is there a 
>> reason you don't include a `withCString` variant for arbitrary encodings? It 
>> seems like an odd asymmetry.
> 
> Hmm. Is this a common use-case people have? Symmetry for the sake of it 
> doesn’t seem enough. If uncommon, you can do it via an Array that you 
> nul-terminate manually.

Is `init(cString:encoding:)` a common use case? If it is, I'm not sure why the 
opposite wouldn't be.

> Yeah, it’s tempting to make ParseResult general, and the only reason we held 
> off is because we don’t want making sure it’s generally useful to be a 
> distraction.

Understandable.

I wonder if some part of the parsing algorithm could somehow be generalized so 
it was suitable for many purposes and then put on `Collection`, with the 
`UnicodeEncoding` then being passed as a parameter to it. If so, that would 
justify making `ParseResult` a top-level type.

> Ah, yes. Here it is:
> 
> public protocol EncodedScalarProtocol : RandomAccessCollection {
>  init?(_ scalarValue: UnicodeScalar)
>  var utf8: UTF8.EncodedScalar { get }
>  var utf16: UTF16.EncodedScalar { get }
>  var utf32: UTF32.EncodedScalar { get }
> }

What is the `Element` type expected to be here?

I think what's missing is a holistic overview of the encoding system. So, 
please help me write this function:

func unicodeScalars<Encoding: UnicodeEncoding>(in data: Data, using encoding: Encoding.Type) -> [UnicodeScalar] {
    var scalars: [UnicodeScalar] = []

    data.withUnsafeBytes { (bytes: UnsafePointer<$ParseInputElement>) in
        let buffer = UnsafeBufferPointer(start: bytes, count: data.count / MemoryLayout<$ParseInputElement>.size)
        encoding.parseForward(buffer) { encodedScalar in
            let unicodeScalar: UnicodeScalar = $doSomething(encodedScalar)
            scalars.append(unicodeScalar)
        }
    }

    return scalars
}

What type would I put for $ParseInputElement? What function or initializer do I 
call for 

Re: [swift-evolution] [Pitch] String revision proposal #1

2017-03-30 Thread Zach Waldowski via swift-evolution
On Thu, Mar 30, 2017, at 12:35 PM, Félix Cloutier via swift-evolution wrote:
> I don't have many non-nitpick issues that I greatly care about; I'm in
> favor of this.
> 
> My only request: it's currently painful to create a String from a
> fixed-size C array. For instance, if I have a pointer to a `struct foo {
> char name[16]; }` in Swift where the last character doesn't have to be
> a NUL, it's hard to create a String from it. Real-world examples of
> this are Mach-O LC_SEGMENT and LC_SEGMENT_64 commands.
> 
> The generally-accepted wisdom[1] is that you take a pointer to the
> CChar tuple that represents the fixed-size array, but this still
> requires the string to be NUL-terminated. What do we think of an
> additional init(cString:) overload that takes an UnsafeBufferPointer
> and reads up to the first NUL or the end of the buffer, whichever
> comes first?


Today's String already supports this through
`String.decodeCString(_:as:repairingInvalidCodeUnits:)`, passing a
buffer pointer.


Best,
  Zachary Waldowski
  z...@waldowski.me

Links:

  1. http://stackoverflow.com/a/27456220/251153


Re: [swift-evolution] [Pitch] String revision proposal #1

2017-03-30 Thread Karimnassar via swift-evolution
Makes sense. Thanks for the explanation. 

--Karim

> On Mar 30, 2017, at 7:43 PM, Ben Cohen <ben_co...@apple.com> wrote:
> 
> 
>> On Mar 30, 2017, at 10:05 AM, Karim Nassar via swift-evolution 
>> <swift-evolution@swift.org> wrote:
>> 
>> 
>>> Message: 12
>>> Date: Thu, 30 Mar 2017 12:23:13 +0200
>>> From: Adrian Zubarev <adrian.zuba...@devandartist.com>
>>> To: Ben Cohen <ben_co...@apple.com>
>>> Cc: swift-evolution@swift.org
>>> Subject: Re: [swift-evolution] [Pitch] String revision proposal #1
>>> Message-ID: <etpan.58dcdc91.583a4c4b.1...@devandartist.com>
>>> Content-Type: text/plain; charset="utf-8"
>>> 
>>> I haven’t followed the topic and while reading the proposal I found it a 
>>> little confusing that we have inconsistent type names. I’m not a native 
>>> English speaker, so that might be the main cause of my confusion here, and 
>>> I’d appreciate any clarification. ;-)
>>> 
>>> SubSequence vs. Substring and not SubString.
>>> 
>>> The word substring is an English word, but so is subsequence (I double 
>>> checked here).
>>> 
>>> So where exactly is the issue here? Is it SubSequence which is written in 
>>> camel case or is it Substring which is not?
>>> 
>>> 
>>> 
>>> -- 
>>> Adrian Zubarev
>>> Sent with Airmail
>> 
>> 
>> I’d also be curious if StringSlice was considered, since we have already the 
>> well-established and seemingly-parallel ArraySlice.
>> 
> 
> Yup we considered it (I should probably have put it in the Alternatives 
> Considered section). Also considered: having nothing but String.SubSequence. 
> In fact it may be Substring is just a typealias for String.SubSequence, 
> though this would be an implementation detail rather than part of the 
> proposal.
> 
> The current hope (unrelated to String) is that ArraySlice could be 
> eliminated, by introducing some kind of “ContiguouslyStored” protocol and 
> then extending the general Slice when it slices something that conforms to 
> that. But this can’t be done with Substring because of the small string 
> optimization.
> 
> The other difference is there’s no common term of art for a slice of an 
> Array, whereas there is for a slice of a String.
> 
>> —Karim


Re: [swift-evolution] [Pitch] String revision proposal #1

2017-03-30 Thread Xiaodi Wu via swift-evolution
On Thu, Mar 30, 2017 at 10:38 AM, Ben Cohen  wrote:

>
> On Mar 29, 2017, at 6:59 PM, Xiaodi Wu  wrote:
>
> This looks great. The restored conformances to *Collection will be huge.
>
> Is this to be the first of several or the only major part of the manifesto
> to be delivered in Swift 4?
>
>
> First of several. This lays the groundwork for the changes to the
> underlying implementation. Other changes will mostly be additive on top.
>
> Nits on naming: are we calling it Substring or SubString (à la
> SubSequence)?
>
>
> This is venturing into subjective territory, so these are just my feelings
> rather than something definitive (Dave may differ) but:
>
> It should definitely be Substring. My rule of thumb: if you might
> hyphenate it, you can capitalize it. I don’t think anyone spells it
> "sub-string". OTOH one *might* write "sub-sequence". Generally hyphens
> disappear in English as things come into common usage, i.e. it used to be
> e-mail but now it’s mostly just email.  Substring is enough of a term of
> art in programming that this has happened. Admittedly, Subsequence is a
> term of art too – unfortunately one that has a different meaning to ours
> ("a sequence that can be derived from another sequence by deleting some
> elements without changing the order of the remaining elements" e.g. 
> is a Subsequence of  – see https://en.wikipedia.org/
> wiki/Subsequence). Even worse, the mathematical term for what we are
> calling a subsequence is a Substring!
>
> If we were to change anything, my vote would be to lowercase Subsequence. We
> can typealias SubSequence = Subsequence to aid migration, with a slow burn
> on deprecating it since it’ll be quite a footling deprecation. I don’t know
> if it’s worth it though – the main use of “SubSequence” is currently in
> those pesky where clauses you have to put on all your Collection extensions
> if you want to use slicing, and many of these will be eliminated once we
> have the ability to put where clauses on associated types.
>

I regret bringing this up. `Substring` is totally fine. `SubSequence` is
too. Just wanted to get some clarification that this was the proposed
spelling. I doubt it's worth a whole migration to change the capitalization
of `SubSequence`, which after all prevents the word from being read like
"consequence."

> and shouldn't it be UnicodeParsedResult rather than UnicodeParseResult?
>
>
> I think Parse. As in, this is the result of a parse, not these are the
> parsed results (though it does contain parsed results in some cases, but
> not all).
>

Ah, then `UnicodeParsingResult`, maybe? Something about nouning that verb
doesn't sit right. OK, done with bikeshedding.


> On Wed, Mar 29, 2017 at 19:32 Ben Cohen via swift-evolution <
> swift-evolution@swift.org> wrote:
>
> Hi Swift Evolution,
>
> Below is a pitch for the first part of the String revision. This covers a
> number of changes that would allow the basic internals to be overhauled.
>
> Online version here: https://github.com/airspeedswift/swift-evolution/
> blob/3a822c799011ace682712532cfabfe32e9203fbb/proposals/0161-
> StringRevision1.md
>
>
> String Revision: Collection Conformance, C Interop, Transcoding
>
>- Proposal: SE-0161
>- Authors: Ben Cohen, Dave Abrahams
>- Review Manager: TBD
>- Status: *Awaiting review*
>
> Introduction
>
> This proposal is to implement a subset of the changes from the Swift 4
> String Manifesto.
>
> Specifically:
>
>- Make String conform to BidirectionalCollection
>- Make String conform to RangeReplaceableCollection
>- Create a Substring type for String.SubSequence
>- Create a Unicode protocol to allow for generic operations over both
>types.
>- Consolidate on a concise set of C interop methods.
>- Revise the transcoding infrastructure.
>
> Other existing aspects of String remain unchanged for the purposes of
> this proposal.
> Motivation
>
> This proposal follows up on a number of recommendations found in the
> manifesto:
>
> Collection conformance was dropped from String in Swift 2. After
> reevaluation, the feeling is that the minor semantic discrepancies (mainly
> with RangeReplaceableCollection) are outweighed by the significant
> benefits of restoring these conformances. For more detail on the reasoning,
> see here
> 
>
> While it is not a collection, the Swift 3 string does have slicing
> operations. String is currently serving as its own subsequence, allowing
> substrings to share storage with their “owner”. This can lead to memory
> leaks when small substrings of larger strings are stored long-term (see
> here
> 

Re: [swift-evolution] [Pitch] String revision proposal #1

2017-03-30 Thread Ben Cohen via swift-evolution

> On Mar 30, 2017, at 10:05 AM, Karim Nassar via swift-evolution 
> <swift-evolution@swift.org> wrote:
> 
> 
>> Message: 12
>> Date: Thu, 30 Mar 2017 12:23:13 +0200
>> From: Adrian Zubarev <adrian.zuba...@devandartist.com>
>> To: Ben Cohen <ben_co...@apple.com>
>> Cc: swift-evolution@swift.org
>> Subject: Re: [swift-evolution] [Pitch] String revision proposal #1
>> Message-ID: <etpan.58dcdc91.583a4c4b.1...@devandartist.com>
>> Content-Type: text/plain; charset="utf-8"
>> 
>> I haven’t followed the topic, and while reading the proposal I found it a 
>> little confusing that we have inconsistent type names. I’m not a native 
>> English speaker, so that might be the main cause of my confusion here, and 
>> I’d appreciate any clarification. ;-)
>> 
>> SubSequence vs. Substring and not SubString.
>> 
>> The word substring is an English word, but so is subsequence (I double 
>> checked here).
>> 
>> So where exactly is the issue here? Is it SubSequence which is written in 
>> camel case or is it Substring which is not?
>> 
>> 
>> 
>> -- 
>> Adrian Zubarev
>> Sent with Airmail
> 
> 
> I’d also be curious if StringSlice was considered, since we have already the 
> well-established and seemingly-parallel ArraySlice.
> 

Yup we considered it (I should probably have put it in the Alternatives 
Considered section). Also considered: having nothing but String.SubSequence. In 
fact it may be Substring is just a typealias for String.SubSequence, though 
this would be an implementation detail rather than part of the proposal.

The current hope (unrelated to String) is that ArraySlice could be eliminated, 
by introducing some kind of “ContiguouslyStored” protocol and then extending 
the general Slice when it slices something that conforms to that. But this 
can’t be done with Substring because of the small string optimization.

The other difference is there’s no common term of art for a slice of an Array, 
whereas there is for a slice of a String.

> —Karim


Re: [swift-evolution] [Pitch] String revision proposal #1

2017-03-30 Thread Ben Cohen via swift-evolution

> On Mar 30, 2017, at 2:03 PM, Karl Wagner via swift-evolution 
>  wrote:
> 
> So, running with the parallel, why not add a conditional conformance: "Slice: 
> Unicode where Base: Unicode”?
> 

Primarily because this would rule out giving substrings the “small string 
optimization" where we pack the characters into the struct directly when 
they’ll fit.

(or rather, given substrings will be 2-3 words, the not-even-that-small string 
optimization)
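
To make the trade-off concrete, here is a rough sketch of the conditional conformance Karl suggests, assuming Swift 4.1 or later (where conditional conformances are available). `TextLike` and `shout` are hypothetical names standing in for the pitched `Unicode` protocol and for generic client code; neither is part of the standard library or the proposal.

```swift
// Hypothetical stand-in for the pitched protocol.
protocol TextLike: BidirectionalCollection where Element == Character {}

extension String: TextLike {}

// The generic Slice picks up the conformance whenever its base has it.
extension Slice: TextLike where Base: TextLike {}

// Generic code written once against the protocol.
func shout<T: TextLike>(_ text: T) -> String {
    return String(text).uppercased()
}

let greeting = "hello world"
let end = greeting.index(greeting.startIndex, offsetBy: 5)
let slice = Slice(base: greeting, bounds: greeting.startIndex..<end)
let shouted = shout(slice)   // "HELLO"
```

A `Slice` always stores its base collection plus a range, which is why, per the reply above, it leaves no room for a dedicated substring type to pack short contents directly into the struct.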




Re: [swift-evolution] [Pitch] String revision proposal #1

2017-03-30 Thread Ben Cohen via swift-evolution
Hi Brent,

Thanks for the notes. Replies inline.

> On Mar 30, 2017, at 2:48 AM, Brent Royal-Gordon  
> wrote:
> 
>> On Mar 29, 2017, at 5:32 PM, Ben Cohen via swift-evolution 
>>  wrote:
>> 
>> Hi Swift Evolution,
>> 
>> Below is a pitch for the first part of the String revision. This covers a 
>> number of changes that would allow the basic internals to be overhauled.
>> 
>> Online version here: 
>> https://github.com/airspeedswift/swift-evolution/blob/3a822c799011ace682712532cfabfe32e9203fbb/proposals/0161-StringRevision1.md
> 
> Really great stuff, guys. Thanks for your work on this!
> 
>> In order to be able to write extensions across both String and Substring, a 
>> new Unicode protocol to which the two types will conform will be introduced. 
>> For the purposes of this proposal, Unicode will be defined as a protocol to 
>> be used whenever you would previously extend String. It should be possible to 
>> substitute extension Unicode { ... } in Swift 4 wherever extension String { 
>> ... } was written in Swift 3, with one exception: any passing of self into 
>> an API that takes a concrete String will need to be rewritten as 
>> String(self). If Self is a String then this should effectively optimize to a 
>> no-op, whereas if Self is a Substring then this will force a copy, helping 
>> to avoid the “memory leak” problems described above.
> 
> I continue to feel that `Unicode` is the wrong name for this protocol, 
> essentially because it sounds like a protocol for, say, a version of Unicode 
> or some kind of encoding machinery instead of a Unicode string. I won't 
> rehash that argument since I made it already in the manifesto thread, but I 
> would like to make a couple new suggestions in this area.
> 
> Later on, you note that it would be nice to namespace many of these types:
> 
>> Several of the types related to String, such as the encodings, would ideally 
>> reside inside a namespace rather than live at the top level of the standard 
>> library. The best namespace for this is probably Unicode, but this is also 
>> the name of the protocol. At some point if we gain the ability to nest enums 
>> and types inside protocols, they should be moved there. Putting them inside 
>> String or some other enum namespace is probably not worthwhile in the 
>> mean-time.
> 
> Perhaps we should use an empty enum to create a `Unicode` namespace and then 
> nest the protocol within it via typealias. If we do that, we can consider 
> names like `Unicode.Collection` or even `Unicode.String` which would shadow 
> existing types if they were top-level.
> 

We’re a bit on the fence about whether Unicode or StringProtocol is the better 
name.

The big win for Unicode is it is short. We want to encourage people to write 
their extensions on this protocol. We want people who previously extended 
String to feel very comfortable extending Unicode. It also helps emphasize how 
important the Unicode-ness of Swift.String is. I like the idea of 
Unicode.Collection, but it is a little intimidating and making it even a tiny 
bit intimidating is worrying to me from an adoption perspective. 
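
As a sketch of the empty-enum namespace idea discussed above, assuming Swift 4 or later: the protocol is declared at top level and surfaced inside a caseless enum through a typealias. `TextNamespace`, `_TextCollection`, and `characterCount` are hypothetical names chosen for illustration, picked so they do not clash with anything in the standard library.

```swift
// Hypothetical protocol, declared at top level and exposed through the
// namespace below.
protocol _TextCollection: BidirectionalCollection where Element == Character {}

// A caseless enum cannot be instantiated, so it acts purely as a namespace.
enum TextNamespace {
    typealias Collection = _TextCollection
}

extension String: TextNamespace.Collection {}
extension Substring: TextNamespace.Collection {}

// Client code spells the constraint through the namespace.
func characterCount<T: TextNamespace.Collection>(_ text: T) -> Int {
    return text.count
}

let word = "hello"
let whole = characterCount(word)              // 5
let tail = characterCount(word.dropFirst())   // 4
```

The nested name reads well at use sites, though it is longer to type than a single top-level name, which is the adoption concern raised above.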


> If not, then given this:
> 
>> The exact nature of the protocol – such as which methods should be protocol 
>> requirements vs which can be implemented as protocol extensions, are 
>> considered implementation details and so not covered in this proposal.
> 
> We may simply want to wait to choose a name. As the protocol develops, we may 
> discover a theme in its requirements which would suggest a good name. For 
> instance, we may realize that the core of what the protocol abstracts is 
> grouping code units into characters, which might suggest a name like 
> `Characters`, or `Unicode.Characters`, or `CharacterCollection`, or 
> what-have-you.
> 
> (By the way, I hope that the eventual protocol requirements will be put 
> through the review process, if only as an amendment, once they're determined.)
> 

Definitely. We just want to minimize churn on the group to keep the discussion 
followable on the broader principles for as many as possible. Once it’s firmed 
up and we’ve had implementation/usability/performance feedback, we’ll be back.

>> Unicode will conform to BidirectionalCollection. RangeReplaceableCollection 
>> conformance will be added directly onto the String and Substring types, as 
>> it is possible future Unicode-conforming types might not be 
>> range-replaceable (e.g. an immutable type that wraps a const char *).
> 
> I'm a little worried about this because it seems to imply that the protocol 
> cannot include any mutation operations that aren't in 
> `RangeReplaceableCollection`. For instance, it won't be possible to include 
> an in-place `applyTransform` method in the protocol. Do you anticipate that 
> being an issue? Might it be a good idea to define a parallel `Mutable` or 
> `RangeReplaceable` protocol?
> 

You can always assign to self. Then provide more efficient 

Re: [swift-evolution] [Pitch] String revision proposal #1

2017-03-30 Thread Karl Wagner via swift-evolution

> On 30 Mar 2017, at 21:39, Robert Bennett via swift-evolution 
>  wrote:
> 
> Adrian,
> 
> I think there are a few competing claims here.
> 
> 1) Substring is a term of art used universally throughout computing, and 
> camel-casing it would run counter to that.
> 2) While subsequence is a word, its precise mathematical meaning differs 
> from what it means in Swift. In Swift a SubSequence contains consecutive 
> elements of a sequence, whereas in math a subsequence may contain any subset 
> of the elements, ordered correctly. Hence Subsequence would be technically 
> incorrect (not a big issue IMO).
> 3) We want Sub[sS]tring and Sub[sS]equence to have the same capitalization.
> 
> I'd prefer ignoring 2 and satisfying 1 and 3, since there's no reason Swift's 
> names must exactly coincide with mathematical objects with the same name (for 
> instance, mathematical sets may contain anything at all, including 
> themselves). In addition, the prefix "Sub" does not usually (ever?) produce a 
> Sequence with wholly different mechanics, as do "Enumeration", "Zip2", etc. 
> -- the Subsequence really belongs to the owning Sequence. So my vote is for 
> Substring and Subsequence.
> 
> As for why not StringSlice... Substring is certainly a more familiar word. 
> That said, StringSlice would make it clearer that the slice is a view into a 
> String and is meant for temporary use only, which Substring might not convey. 
> If we choose StringSlice, I see no reason to change SubSequence to 
> Subsequence.

+ 1 to StringSlice. In fact, I’m not sure if we even need Substring/StringSlice 
at all. We already have a Slice type.

The proposal mentions the parallel to ArraySlice, but ArraySlice is set to go: 
https://bugs.swift.org/browse/SR-3631

So, running with the parallel, why not add a conditional conformance: "Slice: 
Unicode where Base: Unicode”?

- Karl


Re: [swift-evolution] [Pitch] String revision proposal #1

2017-03-30 Thread Karl Wagner via swift-evolution

> In order to be able to write extensions across both String and Substring, a 
> new Unicode protocol to which the two types will conform will be introduced. 
> For the purposes of this proposal, Unicode will be defined as a protocol to 
> be used whenever you would previously extend String. It should be possible to 
> substitute extension Unicode { ... } in Swift 4 wherever extension String { 
> ... } was written in Swift 3, with one exception: any passing of self into an 
> API that takes a concrete String will need to be rewritten as String(self). 
> If Self is a String then this should effectively optimize to a no-op, whereas 
> if Self is a Substring then this will force a copy, helping to avoid the 
> “memory leak” problems described above.
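
A minimal sketch of the pattern described in the quoted paragraph, assuming Swift 5. It uses `StringProtocol`, the name the standard library has today, in place of the pitched `Unicode`; `store` and `storedLength` are hypothetical names for illustration.

```swift
// Takes a concrete String, as much existing API does.
func store(_ value: String) -> Int {
    return value.count
}

extension StringProtocol {
    // Available on both String and Substring receivers.
    func storedLength() -> Int {
        // The conversion is needed because `store` wants a concrete String.
        // For a Substring it copies the characters out of the parent's storage.
        return store(String(self))
    }
}

let line = "key=value"
let separator = line.firstIndex(of: "=")!
let key = line[..<separator]   // Substring sharing storage with `line`

let keyLength = key.storedLength()     // 3
let lineLength = line.storedLength()   // 9
```

Writing `String(self)` makes the copy explicit at the point where a slice would otherwise escape into long-lived storage.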

Did you consider an AnyUnicode wrapper? Then we could have a 
typealias called “AnyString”.

Also, regarding naming: “Unicode” is great if this was a namespace, and this 
proposal is a great example of why protocol nesting is badly needed in Swift 
code which defines (not even very complex) protocols. However, absent protocol 
nesting, I think “UnicodeEncoded” is better. It doesn’t roll off the tongue as 
nicely, perhaps, but it also doesn’t look as weird when written in code.

> The exact nature of the protocol – such as which methods should be protocol 
> requirements vs which can be implemented as protocol extensions, are 
> considered implementation details and so not covered in this proposal.
> 
I’d hope they do get a proposal at some stage, though. There are cases where 
I’d like to be able to write my own “Unicode” type and take advantage of 
generic (and existential when we can) text processing.

For example, maybe the thing I want to present as a single block of text is 
actually pieced together from multiple discontiguous regions of a buffer (i.e. 
the “buffer-gap” approach for faster random insertions/deletions, if I expect 
my code to be doing lots of that).

You could imagine that if something like CoreText (can’t speak for them, of 
course) were being rewritten in Swift, it would be able to compute layouts and 
render glyphs from any provider of unicode data and not just String or 
Substring. I mean, that’s my dream, anyway. It would mean you could go directly 
from a buffer-gap String to a rendered bitmap suitable for UI.

> Unicode will conform to BidirectionalCollection. RangeReplaceableCollection 
> conformance will be added directly onto the String and Substring types, as it 
> is possible future Unicode-conforming types might not be range-replaceable 
> (e.g. an immutable type that wraps a const char *).
> 
+1. Keep the protocol focussed.

> The standard library currently lacks a Latin1 codec, so a enum Latin1: 
> UnicodeEncoding type will be added.
> 
I feel this is a call for better naming somewhere.
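
For context on what a Latin1 codec has to do, here is a small sketch assuming Swift 4 or later. ISO 8859-1 code units coincide with the first 256 Unicode code points, so decoding is a one-to-one mapping from byte to scalar. `decodeLatin1` is a hypothetical helper, not the proposed `Latin1` type.

```swift
// Hypothetical helper, not the proposed Latin1 codec type.
func decodeLatin1(_ bytes: [UInt8]) -> String {
    var scalars = String.UnicodeScalarView()
    for byte in bytes {
        // Every Latin-1 byte value is the same-numbered Unicode scalar.
        scalars.append(Unicode.Scalar(byte))
    }
    return String(scalars)
}

// 0xE9 is "é" in Latin-1 and U+00E9 in Unicode.
let cafe = decodeLatin1([0x63, 0x61, 0x66, 0xE9])
```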

>   init(
> cString nulTerminatedCodeUnits: UnsafePointer,
> encoding: Encoding)

So will this replace the stuff which Foundation puts in to String, which also 
decodes a C string in to Swift string?

Foundation includes more encodings (and also nests an “Encoding” enum in String 
itself, which makes things even more confusing), but totally ignores standard 
library decodes in favour of CF ones.

- Karl


Re: [swift-evolution] [Pitch] String revision proposal #1

2017-03-30 Thread Nevin Brackett-Rozinsky via swift-evolution
On Thu, Mar 30, 2017 at 3:39 PM, Robert Bennett via swift-evolution <
swift-evolution@swift.org> wrote:
>
>
> (for instance, mathematical sets may contain anything at all, including
> themselves)


Well actually they can’t,
for…reasons :-)

Nevin


Re: [swift-evolution] [Pitch] String revision proposal #1

2017-03-30 Thread Ben Cohen via swift-evolution

> On Mar 30, 2017, at 11:20 AM, Brent Royal-Gordon  
> wrote:
> 
> (That's why there's no adjective form of "string", which makes naming the 
> protocol difficult.)

We-eelll, there is “Stringy”….

As tempting as it is to call the protocol this, it’s probably not a good idea.

(then again, if we called it Text instead of String, we could then call the 
subsequence Subtext…)




[swift-evolution] [Pitch] String revision proposal #1

2017-03-30 Thread Robert Bennett via swift-evolution
Adrian,

I think there are a few competing claims here.

1) Substring is a term of art used universally throughout computing, and 
camel-casing it would run counter to that.
2) While subsequence is a word, its precise mathematical meaning differs from 
what it means in Swift. In Swift a SubSequence contains consecutive elements of 
a sequence, whereas in math a subsequence may contain any subset of the 
elements, ordered correctly. Hence Subsequence would be technically incorrect 
(not a big issue IMO).
3) We want Sub[sS]tring and Sub[sS]equence to have the same capitalization.
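
The distinction in point 2 can be sketched in code, assuming Swift 4 or later. `isMathematicalSubsequence` is a hypothetical helper written for this illustration; the slicing shown is ordinary `String` subscripting.

```swift
// True if `candidate` can be obtained from `source` by deleting zero or
// more elements without reordering (the mathematical sense of the word).
func isMathematicalSubsequence(_ candidate: String, of source: String) -> Bool {
    var remaining = candidate[...]
    for character in source {
        if remaining.first == character {
            remaining = remaining.dropFirst()
        }
    }
    return remaining.isEmpty
}

let source = "abcde"

// A Swift SubSequence is always a contiguous run of elements.
let end = source.index(source.startIndex, offsetBy: 3)
let contiguous = source[source.startIndex..<end]   // "abc"

// "ace" skips elements, so it is a subsequence only in the mathematical sense.
let skipsElements = isMathematicalSubsequence("ace", of: source)   // true
let wrongOrder = isMathematicalSubsequence("aec", of: source)      // false
```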

I'd prefer ignoring 2 and satisfying 1 and 3, since there's no reason Swift's 
names must exactly coincide with mathematical objects with the same name (for 
instance, mathematical sets may contain anything at all, including themselves). 
In addition, the prefix "Sub" does not usually (ever?) produce a Sequence with 
wholly different mechanics, as do "Enumeration", "Zip2", etc. -- the 
Subsequence really belongs to the owning Sequence. So my vote is for Substring 
and Subsequence.

As for why not StringSlice... Substring is certainly a more familiar word. That 
said, StringSlice would make it clearer that the slice is a view into a String 
and is meant for temporary use only, which Substring might not convey. If we 
choose StringSlice, I see no reason to change SubSequence to Subsequence.


Re: [swift-evolution] [Pitch] String revision proposal #1

2017-03-30 Thread Ben Cohen via swift-evolution

> On Mar 30, 2017, at 8:59 AM, Adrian Zubarev via swift-evolution 
>  wrote:
> 
> We cannot rename SubSequence to Subsequence, because that would be odd 
> compared to all other types containing Sequence.
> 
> 
There is a difference between subsequence, which is one word, and the others, 
which are noun phrases (i.e. “any sequence”, “lazy sequence”). The issue is 
whether it’s "sub-sequence" (capitalization reasonable) or subsequence (no 
reason for caps).




Re: [swift-evolution] [Pitch] String revision proposal #1

2017-03-30 Thread Brent Royal-Gordon via swift-evolution
> On Mar 30, 2017, at 8:38 AM, Ben Cohen via swift-evolution 
>  wrote:
> 
> It should definitely be Substring. My rule of thumb: if you might hyphenate 
> it, you can capitalize it.

I was going to make the same argument, but you beat me to it. 

"String" and "Substring" are both terms of art. (That's why there's no 
adjective form of "string", which makes naming the protocol difficult.) And 
they're probably the most widely-used terms of art in programming. "Substring" 
is inconsistent with other parts of the language, but for a good reason. 

Keep it. 

-- 
Brent Royal-Gordon
Sent from my iPhone



Re: [swift-evolution] [Pitch] String revision proposal #1

2017-03-30 Thread Karim Nassar via swift-evolution

> Message: 12
> Date: Thu, 30 Mar 2017 12:23:13 +0200
> From: Adrian Zubarev <adrian.zuba...@devandartist.com>
> To: Ben Cohen <ben_co...@apple.com>
> Cc: swift-evolution@swift.org
> Subject: Re: [swift-evolution] [Pitch] String revision proposal #1
> Message-ID: <etpan.58dcdc91.583a4c4b.1...@devandartist.com>
> Content-Type: text/plain; charset="utf-8"
> 
> I haven’t followed the topic, and while reading the proposal I found it a 
> little confusing that we have inconsistent type names. I’m not a native 
> English speaker, so that might be the main cause of my confusion here, and 
> I’d appreciate any clarification. ;-)
> 
> SubSequence vs. Substring and not SubString.
> 
> The word substring is an English word, but so is subsequence (I double 
> checked here).
> 
> So where exactly is the issue here? Is it SubSequence which is written in 
> camel case or is it Substring which is not?
> 
> 
> 
> -- 
> Adrian Zubarev
> Sent with Airmail


I’d also be curious if StringSlice was considered, since we have already the 
well-established and seemingly-parallel ArraySlice.

—Karim



Re: [swift-evolution] [Pitch] String revision proposal #1

2017-03-30 Thread Félix Cloutier via swift-evolution
I don't have many non-nitpick issues that I greatly care about; I'm in favor of 
this.

My only request: it's currently painful to create a String from a fixed-size C 
array. For instance, if I have a pointer to a `struct foo { char name[16]; }` 
in Swift where the last character doesn't have to be a NUL, it's hard to create 
a String from it. Real-world examples of this are Mach-O LC_SEGMENT and 
LC_SEGMENT_64 commands.

The generally-accepted wisdom is 
that you take a pointer to the CChar tuple that represents the fixed-size 
array, but this still requires the string to be NUL-terminated. What do we 
think of an additional init(cString:) overload that takes an 
UnsafeBufferPointer and reads up to the first NUL or the end of the buffer, 
whichever comes first?
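
The behaviour being requested could look roughly like this, assuming Swift 5. `stringFromFixedSizeField` is a hypothetical free function written for illustration, not the proposed `init(cString:)` overload, and the sample bytes are made up.

```swift
// Hypothetical helper, not a proposed or existing initializer.
func stringFromFixedSizeField(_ field: UnsafeBufferPointer<UInt8>) -> String {
    // Stop at the first NUL if there is one, otherwise use the whole buffer.
    let end = field.firstIndex(of: 0) ?? field.endIndex
    // String(decoding:as:) replaces ill-formed UTF-8 with U+FFFD.
    return String(decoding: field[..<end], as: UTF8.self)
}

// A 16-byte field padded with NULs, and one that is completely full.
let padded: [UInt8] = Array("__TEXT".utf8) + [UInt8](repeating: 0, count: 10)
let full: [UInt8] = Array("0123456789abcdef".utf8)

let paddedName = padded.withUnsafeBufferPointer { stringFromFixedSizeField($0) }
let fullName = full.withUnsafeBufferPointer { stringFromFixedSizeField($0) }
```

Because the buffer carries its own count, the full 16-byte case is read safely without relying on a terminator.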

> Le 30 mars 2017 à 02:48, Brent Royal-Gordon via swift-evolution 
>  a écrit :
> 
>> On Mar 29, 2017, at 5:32 PM, Ben Cohen via swift-evolution 
>>  wrote:
>> 
>> Hi Swift Evolution,
>> 
>> Below is a pitch for the first part of the String revision. This covers a 
>> number of changes that would allow the basic internals to be overhauled.
>> 
>> Online version here: 
>> https://github.com/airspeedswift/swift-evolution/blob/3a822c799011ace682712532cfabfe32e9203fbb/proposals/0161-StringRevision1.md
> 
> Really great stuff, guys. Thanks for your work on this!
> 
>> In order to be able to write extensions across both String and Substring, a 
>> new Unicode protocol to which the two types will conform will be introduced. 
>> For the purposes of this proposal, Unicode will be defined as a protocol to 
>> be used whenever you would previously extend String. It should be possible to 
>> substitute extension Unicode { ... } in Swift 4 wherever extension String { 
>> ... } was written in Swift 3, with one exception: any passing of self into 
>> an API that takes a concrete String will need to be rewritten as 
>> String(self). If Self is a String then this should effectively optimize to a 
>> no-op, whereas if Self is a Substring then this will force a copy, helping 
>> to avoid the “memory leak” problems described above.
> 
> I continue to feel that `Unicode` is the wrong name for this protocol, 
> essentially because it sounds like a protocol for, say, a version of Unicode 
> or some kind of encoding machinery instead of a Unicode string. I won't 
> rehash that argument since I made it already in the manifesto thread, but I 
> would like to make a couple new suggestions in this area.
> 
> Later on, you note that it would be nice to namespace many of these types:
> 
>> Several of the types related to String, such as the encodings, would ideally 
>> reside inside a namespace rather than live at the top level of the standard 
>> library. The best namespace for this is probably Unicode, but this is also 
>> the name of the protocol. At some point if we gain the ability to nest enums 
>> and types inside protocols, they should be moved there. Putting them inside 
>> String or some other enum namespace is probably not worthwhile in the 
>> mean-time.
> 
> Perhaps we should use an empty enum to create a `Unicode` namespace and then 
> nest the protocol within it via typealias. If we do that, we can consider 
> names like `Unicode.Collection` or even `Unicode.String` which would shadow 
> existing types if they were top-level.
> 
> If not, then given this:
> 
>> The exact nature of the protocol – such as which methods should be protocol 
>> requirements vs which can be implemented as protocol extensions, are 
>> considered implementation details and so not covered in this proposal.
> 
> We may simply want to wait to choose a name. As the protocol develops, we may 
> discover a theme in its requirements which would suggest a good name. For 
> instance, we may realize that the core of what the protocol abstracts is 
> grouping code units into characters, which might suggest a name like 
> `Characters`, or `Unicode.Characters`, or `CharacterCollection`, or 
> what-have-you.
> 
> (By the way, I hope that the eventual protocol requirements will be put 
> through the review process, if only as an amendment, once they're determined.)
> 
>> Unicode will conform to BidirectionalCollection. RangeReplaceableCollection 
>> conformance will be added directly onto the String and Substring types, as 
>> it is possible future Unicode-conforming types might not be 
>> range-replaceable (e.g. an immutable type that wraps a const char *).
> 
> I'm a little worried about this because it seems to imply that the protocol 
> cannot include any mutation operations that aren't in 
> `RangeReplaceableCollection`. For instance, it won't be possible to include 
> an in-place `applyTransform` method in the protocol. Do you anticipate that 
> being an issue? Might it be a good idea to define a parallel `Mutable` or 
> `RangeReplaceable` protocol?
> 
>> The C string interop methods will be 

Re: [swift-evolution] [Pitch] String revision proposal #1

2017-03-30 Thread Adrian Zubarev via swift-evolution
If I had to choose as a non-native English speaker I’d go for SubString just 
for the camel case consistency across all other types.

We cannot rename SubSequence to Subsequence, because that would be odd compared 
to all other types containing Sequence.

- AnySequence
- LazyPrefixWhileSequence
- LazySequence
- EnumeratedSequence
- etc.

This won’t break anything or create any other inconsistency.



-- 
Adrian Zubarev
Sent with Airmail

Am 30. März 2017 um 17:51:09, Joshua Alvarado via swift-evolution 
(swift-evolution@swift.org) schrieb:

...my vote would be to lowercase Subsequence. We can typealias SubSequence = 
Subsequence to aid migration
 
+1 didn't think that was an option. A good solution would be to have them 
either camel case (SubString, SubSequence) or just capitalized (Substring, 
Subsequence) either would be nice as long as they were matching.

On Thu, Mar 30, 2017 at 9:38 AM, Ben Cohen via swift-evolution 
 wrote:

On Mar 29, 2017, at 6:59 PM, Xiaodi Wu  wrote:

This looks great. The restored conformances to *Collection will be huge.

Is this to be the first of several or the only major part of the manifesto to 
be delivered in Swift 4?


First of several. This lays the groundwork for the changes to the underlying 
implementation. Other changes will mostly be additive on top.

Nits on naming: are we calling it Substring or SubString (à la SubSequence)?

This is venturing into subjective territory, so these are just my feelings 
rather than something definitive (Dave may differ) but:

It should definitely be Substring. My rule of thumb: if you might hyphenate it, 
you can capitalize it. I don’t think anyone spells it "sub-string". OTOH one 
might write "sub-sequence". Generally hyphens disappear in English as things 
come into common usage i.e. it used to be e-mail but now it’s mostly just 
email.  Substring is enough of a term of art in programming that this has 
happened. Admittedly, Subsequence is a term of art too – unfortunately one that 
has a different meaning to ours ("a sequence that can be derived from another 
sequence by deleting some elements without changing the order of the remaining 
elements" e.g.  is a Subsequence of  – see 
https://en.wikipedia.org/wiki/Subsequence). Even worse, the mathematical term 
for what we are calling a subsequence is a Substring!

If we were to change anything, my vote would be to lowercase Subsequence. We can 
typealias SubSequence = Subsequence to aid migration, with a slow burn on 
deprecating it since it’ll be quite a footling deprecation. I don’t know if 
it’s worth it though – the main use of “SubSequence” is currently in those 
pesky where clauses you have to put on all your Collection extensions if you 
want to use slicing, and many of these will be eliminated once we have the 
ability to put where clauses on associated types.

and shouldn't it be UnicodeParsedResult rather than UnicodeParseResult?


I think Parse. As in, this is the result of a parse, not these are the parsed 
results (though it does contain parsed results in some cases, but not all).


On Wed, Mar 29, 2017 at 19:32 Ben Cohen via swift-evolution wrote:
Hi Swift Evolution,

Below is a pitch for the first part of the String revision. This covers a 
number of changes that would allow the basic internals to be overhauled.

Online version here: 
https://github.com/airspeedswift/swift-evolution/blob/3a822c799011ace682712532cfabfe32e9203fbb/proposals/0161-StringRevision1.md


String Revision: Collection Conformance, C Interop, Transcoding

Proposal: SE-0161
Authors: Ben Cohen, Dave Abrahams
Review Manager: TBD
Status: Awaiting review
Introduction

This proposal is to implement a subset of the changes from the Swift 4 String 
Manifesto.

Specifically:

Make String conform to BidirectionalCollection
Make String conform to RangeReplaceableCollection
Create a Substring type for String.SubSequence
Create a Unicode protocol to allow for generic operations over both types.
Consolidate on a concise set of C interop methods.
Revise the transcoding infrastructure.
Other existing aspects of String remain unchanged for the purposes of this 
proposal.

Motivation

This proposal follows up on a number of recommendations found in the manifesto:

Collection conformance was dropped from String in Swift 2. After reevaluation, 
the feeling is that the minor semantic discrepancies (mainly with 
RangeReplaceableCollection) are outweighed by the significant benefits of 
restoring these conformances. For more detail on the reasoning, see here

While it is not a collection, the Swift 3 string does have slicing operations. 
String is currently serving as its own subsequence, allowing substrings to 
share storage with their “owner”. This can lead to memory leaks when small 
substrings of larger strings are stored long-term (see here for more detail on 
this problem). Introducing a separate type of Substring to serve as 

Re: [swift-evolution] [Pitch] String revision proposal #1

2017-03-30 Thread Joshua Alvarado via swift-evolution
>
> ...my vote would be to lowercase Subsequence. We can typealias
> SubSequence = Subsequence to aid migration


+1, I didn't think that was an option. A good solution would be to have them
either both camel case (SubString, SubSequence) or both just capitalized
(Substring, Subsequence); either would be nice as long as they match.

On Thu, Mar 30, 2017 at 9:38 AM, Ben Cohen via swift-evolution <
swift-evolution@swift.org> wrote:

>
> On Mar 29, 2017, at 6:59 PM, Xiaodi Wu  wrote:
>
> This looks great. The restored conformances to *Collection will be huge.
>
> Is this to be the first of several or the only major part of the manifesto
> to be delivered in Swift 4?
>
>
> First of several. This lays the groundwork for the changes to the
> underlying implementation. Other changes will mostly be additive on top.
>
> Nits on naming: are we calling it Substring or SubString (à la
> SubSequence)?
>
>
> This is venturing into subjective territory, so these are just my feelings
> rather than something definitive (Dave may differ) but:
>
> It should definitely be Substring. My rule of thumb: if you might
> hyphenate it, you can capitalize it. I don’t think anyone spells it
> "sub-string". OTOH one *might* write "sub-sequence". Generally hyphens
> disappear in English as things come into common usage i.e. it used to be
> e-mail but now it’s mostly just email.  Substring is enough of a term of
> art in programming that this has happened. Admittedly, Subsequence is a
> term of art too – unfortunately one that has a different meaning to ours
> ("a sequence that can be derived from another sequence by deleting some
> elements without changing the order of the remaining elements" e.g. 
> is a Subsequence of  – see https://en.wikipedia.org/wiki/Subsequence).
> Even worse, the mathematical term for what we are
> calling a subsequence is a Substring!
>
> If we were to change anything, my vote would be to lowercase Subsequence. We
> can typealias SubSequence = Subsequence to aid migration, with a slow burn
> on deprecating it since it’ll be quite a footling deprecation. I don’t know
> if it’s worth it though – the main use of “SubSequence” is currently in
> those pesky where clauses you have to put on all your Collection extensions
> if you want to use slicing, and many of these will be eliminated once we
> have the ability to put where clauses on associated types.
>
> and shouldn't it be UnicodeParsedResult rather than UnicodeParseResult?
>
>
> I think Parse. As in, this is the result of a parse, not these are the
> parsed results (though it does contain parsed results in some cases, but
> not all).
>
>
> On Wed, Mar 29, 2017 at 19:32 Ben Cohen via swift-evolution <
> swift-evolution@swift.org> wrote:
>
> Hi Swift Evolution,
>
> Below is a pitch for the first part of the String revision. This covers a
> number of changes that would allow the basic internals to be overhauled.
>
> Online version here:
> https://github.com/airspeedswift/swift-evolution/blob/3a822c799011ace682712532cfabfe32e9203fbb/proposals/0161-StringRevision1.md
>
>
> String Revision: Collection Conformance, C Interop, Transcoding
>
>    - Proposal: SE-0161
>    - Authors: Ben Cohen, Dave Abrahams
>    - Review Manager: TBD
>    - Status: *Awaiting review*
>
> Introduction
>
> This proposal is to implement a subset of the changes from the Swift 4
> String Manifesto.
>
> Specifically:
>
>    - Make String conform to BidirectionalCollection
>    - Make String conform to RangeReplaceableCollection
>    - Create a Substring type for String.SubSequence
>    - Create a Unicode protocol to allow for generic operations over both
>    types.
>    - Consolidate on a concise set of C interop methods.
>    - Revise the transcoding infrastructure.
>
> Other existing aspects of String remain unchanged for the purposes of
> this proposal.
> Motivation
>
> This proposal follows up on a number of recommendations found in the
> manifesto:
>
> Collection conformance was dropped from String in Swift 2. After
> reevaluation, the feeling is that the minor semantic discrepancies (mainly
> with RangeReplaceableCollection) are outweighed by the significant
> benefits of restoring these conformances. For more detail on the reasoning,
> see here
>
> While it is not a collection, the Swift 3 string does have slicing
> operations. String is currently serving as its own subsequence, allowing
> substrings to share storage with their “owner”. This can lead to memory
> leaks when small substrings of larger strings are stored long-term (see
> here for more detail on this problem). Introducing a separate type of 

Re: [swift-evolution] [Pitch] String revision proposal #1

2017-03-30 Thread Zach Waldowski via swift-evolution
Loving it so far.



`encode` and `parseScalar[Forward|Backward]` feel asymmetric. What's
wrong with `decode[Forward|Backward]`?


`UnicodeParseResult` really feels like it could/should be
defined as `UnicodeEncoding.ParseResult` (or `DecodeResult`,
given the above). I can't remember if that generics limitation was
being lifted?


Best,

  Zachary Waldowski

  z...@waldowski.me





On Wed, Mar 29, 2017, at 08:32 PM, Ben Cohen via swift-evolution wrote:
> Hi Swift Evolution,

> 

> Below is a pitch for the first part of the String revision. This
> covers a number of changes that would allow the basic internals to be
> overhauled.
> 

> Online version here:
> https://github.com/airspeedswift/swift-evolution/blob/3a822c799011ace682712532cfabfe32e9203fbb/proposals/0161-StringRevision1.md
> 

> 

> String Revision: Collection Conformance, C Interop, Transcoding


>  * Proposal: SE-0161
>  * Authors: Ben Cohen[1], Dave Abrahams[2]
>  * Review Manager: TBD
>  * Status: *Awaiting review*
> Introduction



> This proposal is to implement a subset of the changes from the Swift 4
> String Manifesto[3].
> Specifically:


>  * Make String conform to BidirectionalCollection
>  * Make String conform to RangeReplaceableCollection
>  * Create a Substring type for String.SubSequence
>  * Create a Unicode protocol to allow for generic operations over both
>    types.
>  * Consolidate on a concise set of C interop methods.
>  * Revise the transcoding infrastructure.
> Other existing aspects of String remain unchanged for the purposes of
> this proposal.
> Motivation



> This proposal follows up on a number of recommendations found in the
> manifesto:
> Collection conformance was dropped from String in Swift 2. After
> reevaluation, the feeling is that the minor semantic discrepancies
> (mainly with RangeReplaceableCollection) are outweighed by the
> significant benefits of restoring these conformances. For more detail
> on the reasoning, see here[4]
> While it is not a collection, the Swift 3 string does have slicing
> operations. String is currently serving as its own subsequence,
> allowing substrings to share storage with their “owner”. This can lead
> to memory leaks when small substrings of larger strings are stored
> long-term (see here[5] for more detail on this problem). Introducing a
> separate type of Substring to serve as String.Subsequence is
> recommended to resolve this issue, in a similar fashion to ArraySlice.
> As noted in the manifesto, support for interoperation with
> nul-terminated C strings in Swift 3 is scattered and incoherent, with 6
> ways to transform a C string into a String and four ways to do the
> inverse. These APIs should be replaced with a simpler set of methods
> on String.
> Proposed solution



> A new type, Substring, will be introduced. Similar to ArraySlice it
> will be documented as only for short- to medium-term storage:
>> *Important*



>> Long-term storage of Substring instances is discouraged. A substring
>> holds a reference to the entire storage of a larger string, not just
>> to the portion it presents, even after the original string’s lifetime
>> ends. Long-term storage of a substring may therefore prolong the
>> lifetime of elements that are no longer otherwise accessible, which
>> can appear to be memory leakage.
> Aside from minor differences, such as having a SubSequence of Self and
> a larger size to describe the range of the subsequence, Substring will
> be near-identical from a user perspective.
> In order to be able to write extensions across both String and
> Substring, a new Unicode protocol to which the two types will conform
> will be introduced. For the purposes of this proposal, Unicode will be
> defined as a protocol to be used whenever you would previously extend
> String. It should be possible to substitute extension Unicode { ... }
> in Swift 4 wherever extension String { ... } was written in Swift 3,
> with one exception: any passing of self into an API that takes a
> concrete String will need to be rewritten as String(self). If Self is
> a String then this should effectively optimize to a no-op, whereas if
> Self is a Substring then this will force a copy, helping to avoid the
> “memory leak” problems described above.
> The exact nature of the protocol – such as which methods should be
> protocol requirements vs which can be implemented as protocol
> extensions, are considered implementation details and so not covered
> in this proposal.
> Unicode will conform to BidirectionalCollection.
> RangeReplaceableCollection conformance will be added directly onto the
> String and Substring types, as it is possible future
> Unicode-conforming types might not be range-replaceable (e.g. an immutable
> type that wraps a const char *).
> The C string interop methods will be updated to those described
> here[6]: a single withCString operation and two init(cString:)
> constructors, one for UTF8 and one for arbitrary encodings. The
> primary change is to 

Re: [swift-evolution] [Pitch] String revision proposal #1

2017-03-30 Thread Ben Cohen via swift-evolution

> On Mar 29, 2017, at 6:59 PM, Xiaodi Wu  wrote:
> 
> This looks great. The restored conformances to *Collection will be huge.
> 
> Is this to be the first of several or the only major part of the manifesto to 
> be delivered in Swift 4?
> 

First of several. This lays the groundwork for the changes to the underlying 
implementation. Other changes will mostly be additive on top.

> Nits on naming: are we calling it Substring or SubString (à la SubSequence)?

This is venturing into subjective territory, so these are just my feelings 
rather than something definitive (Dave may differ) but:

It should definitely be Substring. My rule of thumb: if you might hyphenate it, 
you can capitalize it. I don’t think anyone spells it "sub-string". OTOH one 
might write "sub-sequence". Generally hyphens disappear in English as things 
come into common usage i.e. it used to be e-mail but now it’s mostly just 
email.  Substring is enough of a term of art in programming that this has 
happened. Admittedly, Subsequence is a term of art too – unfortunately one that 
has a different meaning to ours ("a sequence that can be derived from another 
sequence by deleting some elements without changing the order of the remaining 
elements" e.g.  is a Subsequence of  – see 
https://en.wikipedia.org/wiki/Subsequence). Even worse, the mathematical term 
for what we are calling a subsequence is a Substring!
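
To make the distinction concrete, here is a minimal Swift sketch. The sample letters and the variable names are illustrative assumptions, not anything from the proposal. A slice keeps a contiguous run, which is what the standard library calls a SubSequence, whereas a subsequence in the mathematical sense may skip elements as long as the order is preserved:

```swift
// Illustrative data only; any ordered collection would do.
let letters = Array("ABCDEF")

// What Swift calls a SubSequence: a contiguous slice of the original.
let contiguous = String(letters[1...3])

// What mathematics calls a subsequence: elements may be skipped,
// but the survivors keep their relative order.
let scattered = String(
    letters.enumerated()
        .filter { $0.offset % 2 == 0 }   // keep offsets 0, 2 and 4
        .map { $0.element }
)

print(contiguous)  // BCD
print(scattered)   // ACE
```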

If we were to change anything, my vote would be to lowercase Subsequence. We can 
typealias SubSequence = Subsequence to aid migration, with a slow burn on 
deprecating it since it’ll be quite a footling deprecation. I don’t know if 
it’s worth it though – the main use of “SubSequence” is currently in those 
pesky where clauses you have to put on all your Collection extensions if you 
want to use slicing, and many of these will be eliminated once we have the 
ability to put where clauses on associated types.
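
For anyone who has not run into this, the kind of where clause in question looks roughly like the sketch below. The `tailElements` helper is a hypothetical name made up for illustration; the constraints are the ones a Swift 3 extension typically has to spell out before a slice can be treated as a collection of the same elements.

```swift
// The constraints on SubSequence are written out by hand so that
// the result of slicing can be used as a collection of the same elements.
extension Collection
    where SubSequence: Collection,
          SubSequence.Iterator.Element == Iterator.Element {

    /// Hypothetical helper: everything after the first element,
    /// or nil when the collection is empty.
    func tailElements() -> [Iterator.Element]? {
        guard !isEmpty else { return nil }
        return Array(dropFirst())
    }
}

let numbers: [Int] = [1, 2, 3]
let emptyNumbers: [Int] = []
print(numbers.tailElements() ?? [])   // [2, 3]
print(emptyNumbers.tailElements() == nil)
```

Once the associated type itself can carry these constraints, the where clause on the extension should no longer be needed.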

> and shouldn't it be UnicodeParsedResult rather than UnicodeParseResult?
> 

I think Parse. As in, this is the result of a parse, not these are the parsed 
results (though it does contain parsed results in some cases, but not all).

> 
> On Wed, Mar 29, 2017 at 19:32 Ben Cohen via swift-evolution wrote:
> Hi Swift Evolution,
> 
> Below is a pitch for the first part of the String revision. This covers a 
> number of changes that would allow the basic internals to be overhauled.
> 
> Online version here: 
> https://github.com/airspeedswift/swift-evolution/blob/3a822c799011ace682712532cfabfe32e9203fbb/proposals/0161-StringRevision1.md
>  
> 
> 
> 
> String Revision: Collection Conformance, C Interop, Transcoding
> 
> Proposal: SE-0161
> Authors: Ben Cohen, Dave Abrahams
> Review Manager: TBD
> Status: Awaiting review
> Introduction
> 
> This proposal is to implement a subset of the changes from the Swift 4 String 
> Manifesto.
> 
> Specifically:
> 
> Make String conform to BidirectionalCollection
> Make String conform to RangeReplaceableCollection
> Create a Substring type for String.SubSequence
> Create a Unicode protocol to allow for generic operations over both types.
> Consolidate on a concise set of C interop methods.
> Revise the transcoding infrastructure.
> Other existing aspects of String remain unchanged for the purposes of this 
> proposal.
> 
> Motivation
> 
> This proposal follows up on a number of recommendations found in the 
> manifesto:
> 
> Collection conformance was dropped from String in Swift 2. After 
> reevaluation, the feeling is that the minor semantic discrepancies (mainly 
> with RangeReplaceableCollection) are outweighed by the significant benefits 
> of restoring these conformances. For more detail on the reasoning, see here 
> 
> While it is not a collection, the Swift 3 string does have slicing 
> operations. String is currently serving as its own subsequence, allowing 
> substrings to share storage with their “owner”. This can lead to memory leaks 
> when small substrings of larger strings are stored long-term (see here
> for more detail on this problem). Introducing a separate type of Substring 
> to serve as String.Subsequence is recommended to resolve this issue, in a 
> similar fashion to ArraySlice.
> 
> As noted in the manifesto, support for interoperation with nul-terminated C 
> strings in Swift 3 is scattered and incoherent, with 6 ways to transform a C 
> string into a 

Re: [swift-evolution] [Pitch] String revision proposal #1

2017-03-30 Thread Joshua Alvarado via swift-evolution
Restoring Collection conformance to String is a big win for Swift!
This revision looks great, but I agree with the naming nit: I believe it should
be SubString, not Substring. SubString looks odd written out compared to
Substring, but it keeps the convention of SubSequence.

On Wed, Mar 29, 2017 at 7:59 PM, Xiaodi Wu via swift-evolution <
swift-evolution@swift.org> wrote:

> This looks great. The restored conformances to *Collection will be huge.
>
> Is this to be the first of several or the only major part of the manifesto
> to be delivered in Swift 4?
>
> Nits on naming: are we calling it Substring or SubString (à la
> SubSequence)? and shouldn't it be UnicodeParsedResult rather than
> UnicodeParseResult?
>
>
> On Wed, Mar 29, 2017 at 19:32 Ben Cohen via swift-evolution <
> swift-evolution@swift.org> wrote:
>
> Hi Swift Evolution,
>
> Below is a pitch for the first part of the String revision. This covers a
> number of changes that would allow the basic internals to be overhauled.
>
> Online version here:
> https://github.com/airspeedswift/swift-evolution/blob/3a822c799011ace682712532cfabfe32e9203fbb/proposals/0161-StringRevision1.md
>
>
> String Revision: Collection Conformance, C Interop, Transcoding
>
>    - Proposal: SE-0161
>    - Authors: Ben Cohen, Dave Abrahams
>    - Review Manager: TBD
>    - Status: *Awaiting review*
>
> Introduction
>
> This proposal is to implement a subset of the changes from the Swift 4
> String Manifesto.
>
> Specifically:
>
>    - Make String conform to BidirectionalCollection
>    - Make String conform to RangeReplaceableCollection
>    - Create a Substring type for String.SubSequence
>    - Create a Unicode protocol to allow for generic operations over both
>    types.
>    - Consolidate on a concise set of C interop methods.
>    - Revise the transcoding infrastructure.
>
> Other existing aspects of String remain unchanged for the purposes of
> this proposal.
> Motivation
>
> This proposal follows up on a number of recommendations found in the
> manifesto:
>
> Collection conformance was dropped from String in Swift 2. After
> reevaluation, the feeling is that the minor semantic discrepancies (mainly
> with RangeReplaceableCollection) are outweighed by the significant
> benefits of restoring these conformances. For more detail on the reasoning,
> see here
>
> While it is not a collection, the Swift 3 string does have slicing
> operations. String is currently serving as its own subsequence, allowing
> substrings to share storage with their “owner”. This can lead to memory
> leaks when small substrings of larger strings are stored long-term (see
> here for more detail on this problem). Introducing a separate type of
> Substring to
> serve as String.Subsequence is recommended to resolve this issue, in a
> similar fashion to ArraySlice.
>
> As noted in the manifesto, support for interoperation with nul-terminated
> C strings in Swift 3 is scattered and incoherent, with 6 ways to transform
> a C string into a String and four ways to do the inverse. These APIs
> should be replaced with a simpler set of methods on String.
> Proposed solution
>
> A new type, Substring, will be introduced. Similar to ArraySlice it will
> be documented as only for short- to medium-term storage:
>
> *Important*
> Long-term storage of Substring instances is discouraged. A substring
> holds a reference to the entire storage of a larger string, not just to the
> portion it presents, even after the original string’s lifetime ends.
> Long-term storage of a substring may therefore prolong the lifetime of
> elements that are no longer otherwise accessible, which can appear to be
> memory leakage.
>
> Aside from minor differences, such as having a SubSequence of Self and a
> larger size to describe the range of the subsequence, Substring will be
> near-identical from a user perspective.
>
> In order to be able to write extensions across both String and Substring,
> a new Unicode protocol to which the two types will conform will be
> introduced. For the purposes of this proposal, Unicode will be defined as
> a protocol to be used whenever you would previously extend String. It
> should be possible to substitute extension Unicode { ... } in Swift 4
> wherever extension String { ... } was written in Swift 3, with one
> exception: any passing of self into an API that takes a concrete String will
> need to be rewritten as String(self). If Self is a String then this
> should effectively optimize to a no-op, whereas if Self is a Substring then
> this will force a copy, helping to avoid the “memory leak” problems
> described above.
>
> The exact nature of the protocol 

Re: [swift-evolution] [Pitch] String revision proposal #1

2017-03-30 Thread Adrian Zubarev via swift-evolution
I haven’t followed the topic, and while reading the proposal I found it a little 
confusing that we have inconsistent type names. I’m not a native English 
speaker, so that might be the main cause of my confusion here; I’d 
appreciate any clarification. ;-)

SubSequence vs. Substring and not SubString.

The word substring is an English word, but so is subsequence (I double checked 
here).

So where exactly is the issue here? Is it SubSequence which is written in camel 
case or is it Substring which is not?



-- 
Adrian Zubarev

On 30 March 2017 at 02:32:39, Ben Cohen via swift-evolution 
(swift-evolution@swift.org) wrote:

Hi Swift Evolution,

Below is a pitch for the first part of the String revision. This covers a 
number of changes that would allow the basic internals to be overhauled.

Online version here: 
https://github.com/airspeedswift/swift-evolution/blob/3a822c799011ace682712532cfabfe32e9203fbb/proposals/0161-StringRevision1.md


String Revision: Collection Conformance, C Interop, Transcoding

Proposal: SE-0161
Authors: Ben Cohen, Dave Abrahams
Review Manager: TBD
Status: Awaiting review
Introduction

This proposal is to implement a subset of the changes from the Swift 4 String 
Manifesto.

Specifically:

Make String conform to BidirectionalCollection
Make String conform to RangeReplaceableCollection
Create a Substring type for String.SubSequence
Create a Unicode protocol to allow for generic operations over both types.
Consolidate on a concise set of C interop methods.
Revise the transcoding infrastructure.
Other existing aspects of String remain unchanged for the purposes of this 
proposal.

Motivation

This proposal follows up on a number of recommendations found in the manifesto:

Collection conformance was dropped from String in Swift 2. After reevaluation, 
the feeling is that the minor semantic discrepancies (mainly with 
RangeReplaceableCollection) are outweighed by the significant benefits of 
restoring these conformances. For more detail on the reasoning, see here

While it is not a collection, the Swift 3 string does have slicing operations. 
String is currently serving as its own subsequence, allowing substrings to 
share storage with their “owner”. This can lead to memory leaks when small 
substrings of larger strings are stored long-term (see here for more detail on 
this problem). Introducing a separate type of Substring to serve as 
String.Subsequence is recommended to resolve this issue, in a similar fashion 
to ArraySlice.

As noted in the manifesto, support for interoperation with nul-terminated C 
strings in Swift 3 is scattered and incoherent, with 6 ways to transform a C 
string into a String and four ways to do the inverse. These APIs should be 
replaced with a simpler set of methods on String.

Proposed solution

A new type, Substring, will be introduced. Similar to ArraySlice it will be 
documented as only for short- to medium-term storage:

Important

Long-term storage of Substring instances is discouraged. A substring holds a 
reference to the entire storage of a larger string, not just to the portion it 
presents, even after the original string’s lifetime ends. Long-term storage of 
a substring may therefore prolong the lifetime of elements that are no longer 
otherwise accessible, which can appear to be memory leakage.
Aside from minor differences, such as having a SubSequence of Self and a larger 
size to describe the range of the subsequence, Substring will be near-identical 
from a user perspective.

In order to be able to write extensions across both String and Substring, a 
new Unicode protocol to which the two types will conform will be introduced. 
For the purposes of this proposal, Unicode will be defined as a protocol to be 
used whenever you would previously extend String. It should be possible to 
substitute extension Unicode { ... } in Swift 4 wherever extension String { ... 
} was written in Swift 3, with one exception: any passing of self into an API 
that takes a concrete String will need to be rewritten as String(self). If Self 
is a String then this should effectively optimize to a no-op, whereas if Self 
is a Substring then this will force a copy, helping to avoid the “memory leak” 
problems described above.

The exact nature of the protocol – such as which methods should be protocol 
requirements vs which can be implemented as protocol extensions, are considered 
implementation details and so not covered in this proposal.

Unicode will conform to BidirectionalCollection. RangeReplaceableCollection 
conformance will be added directly onto the String and Substring types, as it 
is possible future Unicode-conforming types might not be range-replaceable 
(e.g. an immutable type that wraps a const char *).

The C string interop methods will be updated to those described here: a single 
withCString operation and two init(cString:) constructors, one for UTF8 and one 
for arbitrary encodings. The primary change is to remove 

Re: [swift-evolution] [Pitch] String revision proposal #1

2017-03-30 Thread Brent Royal-Gordon via swift-evolution
> On Mar 29, 2017, at 5:32 PM, Ben Cohen via swift-evolution wrote:
> 
> Hi Swift Evolution,
> 
> Below is a pitch for the first part of the String revision. This covers a 
> number of changes that would allow the basic internals to be overhauled.
> 
> Online version here: 
> https://github.com/airspeedswift/swift-evolution/blob/3a822c799011ace682712532cfabfe32e9203fbb/proposals/0161-StringRevision1.md

Really great stuff, guys. Thanks for your work on this!

> In order to be able to write extensions across both String and Substring, a 
> new Unicode protocol to which the two types will conform will be introduced. 
> For the purposes of this proposal, Unicode will be defined as a protocol to 
> be used whenever you would previously extend String. It should be possible to 
> substitute extension Unicode { ... } in Swift 4 wherever extension String { 
> ... } was written in Swift 3, with one exception: any passing of self into an 
> API that takes a concrete String will need to be rewritten as String(self). 
> If Self is a String then this should effectively optimize to a no-op, whereas 
> if Self is a Substring then this will force a copy, helping to avoid the 
> “memory leak” problems described above.

I continue to feel that `Unicode` is the wrong name for this protocol, 
essentially because it sounds like a protocol for, say, a version of Unicode or 
some kind of encoding machinery instead of a Unicode string. I won't rehash 
that argument since I made it already in the manifesto thread, but I would like 
to make a couple new suggestions in this area.

Later on, you note that it would be nice to namespace many of these types:

> Several of the types related to String, such as the encodings, would ideally 
> reside inside a namespace rather than live at the top level of the standard 
> library. The best namespace for this is probably Unicode, but this is also 
> the name of the protocol. At some point if we gain the ability to nest enums 
> and types inside protocols, they should be moved there. Putting them inside 
> String or some other enum namespace is probably not worthwhile in the 
> mean-time.

Perhaps we should use an empty enum to create a `Unicode` namespace and then 
nest the protocol within it via typealias. If we do that, we can consider names 
like `Unicode.Collection` or even `Unicode.String` which would shadow existing 
types if they were top-level.
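
A rough sketch of what that might look like follows. Every name except `Unicode` is a placeholder invented for illustration (`UnicodeStringProtocol`, `scalarCount`, `countScalars`), and the single requirement is a stand-in, since the real requirements are not yet specified:

```swift
// Hypothetical top-level protocol; the name and its one requirement
// are placeholders, not the proposal's design.
protocol UnicodeStringProtocol {
    var scalarCount: Int { get }
}

// An uninhabited enum used purely as a namespace. Declaring it in your
// own module shadows any same-named declaration imported from elsewhere.
enum Unicode {
    // The protocol cannot be declared inside the enum,
    // so it is exposed under the namespace via a typealias.
    typealias Collection = UnicodeStringProtocol
}

extension String: Unicode.Collection {
    var scalarCount: Int { return unicodeScalars.count }
}

extension Substring: Unicode.Collection {
    var scalarCount: Int { return unicodeScalars.count }
}

// Generic code can then be written against the namespaced name.
func countScalars<S: Unicode.Collection>(in text: S) -> Int {
    return text.scalarCount
}

let whole: String = "hello"
let part: Substring = whole.prefix(2)
print(countScalars(in: whole), countScalars(in: part))  // 5 2
```

Inside the namespace, `Collection` refers to the alias rather than the standard collection protocol, which is the shadowing mentioned above.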

If not, then given this:

> The exact nature of the protocol – such as which methods should be protocol 
> requirements vs which can be implemented as protocol extensions, are 
> considered implementation details and so not covered in this proposal.

We may simply want to wait to choose a name. As the protocol develops, we may 
discover a theme in its requirements which would suggest a good name. For 
instance, we may realize that the core of what the protocol abstracts is 
grouping code units into characters, which might suggest a name like 
`Characters`, or `Unicode.Characters`, or `CharacterCollection`, or 
what-have-you.

(By the way, I hope that the eventual protocol requirements will be put through 
the review process, if only as an amendment, once they're determined.)

> Unicode will conform to BidirectionalCollection. RangeReplaceableCollection 
> conformance will be added directly onto the String and Substring types, as it 
> is possible future Unicode-conforming types might not be range-replaceable 
> (e.g. an immutable type that wraps a const char *).

I'm a little worried about this because it seems to imply that the protocol 
cannot include any mutation operations that aren't in 
`RangeReplaceableCollection`. For instance, it won't be possible to include an 
in-place `applyTransform` method in the protocol. Do you anticipate that being 
an issue? Might it be a good idea to define a parallel `Mutable` or 
`RangeReplaceable` protocol?
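
To illustrate the shape I have in mind, here is a minimal sketch. All of the names (`UnicodeText`, `MutableUnicodeText`, `StaticText`, the `text` property) and the closure-based signature of `applyTransform` are assumptions made up for this example, not anything in the proposal:

```swift
// Hypothetical stand-in for the read-only protocol.
protocol UnicodeText {
    var text: String { get }
}

// Hypothetical parallel protocol that carries the in-place mutations.
protocol MutableUnicodeText: UnicodeText {
    mutating func applyTransform(_ transform: (Character) -> Character)
}

extension String: MutableUnicodeText {
    var text: String { return self }

    mutating func applyTransform(_ transform: (Character) -> Character) {
        // Rebuild the string from the transformed characters.
        self = String(self.map(transform))
    }
}

// An immutable conformer adopts only the read-only protocol.
struct StaticText: UnicodeText {
    let text: String
}

var greeting: String = "abc"
greeting.applyTransform { Character(String($0).uppercased()) }
print(greeting)  // ABC
```

With a split like this, an immutable wrapper type is never asked to implement mutations, while String and Substring can still share them generically.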

> The C string interop methods will be updated to those described here: a 
> single withCString operation and two init(cString:) constructors, one for 
> UTF8 and one for arbitrary encodings.

Sorry if I'm repeating something that was already discussed, but is there a 
reason you don't include a `withCString` variant for arbitrary encodings? It 
seems like an odd asymmetry.

> The standard library currently lacks a Latin1 codec, so an enum Latin1: 
> UnicodeEncoding type will be added.

Nice. I wrote one of those once; I'll enjoy deleting it.

> A new protocol, UnicodeEncoding, will be added to replace the current 
> UnicodeCodec protocol:
> 
> public enum UnicodeParseResult<T, Index> {

Either `T` should be given a more specific name, or the enum should be given a 
less specific one, becoming `ParseResult` and being oriented towards 
incremental parsing of anything from any kind of collection.
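
As a sketch of the second option, a less specific `ParseResult` might look something like this. Only the `valid` case is taken from the excerpt quoted here; the `emptyInput` and `error` cases, the `Output` parameter name, and the `parseDigits` function are assumptions added for illustration:

```swift
// A general-purpose incremental parse result, not tied to Unicode.
enum ParseResult<Output, Index> {
    /// Valid input was recognized; resumptionPoint is the end of the parsed region.
    case valid(Output, resumptionPoint: Index)
    /// Assumed case: there was nothing left to parse.
    case emptyInput
    /// Assumed case: the input was malformed; resume from resumptionPoint.
    case error(resumptionPoint: Index)
}

// Hypothetical non-Unicode use: parse one run of ASCII digits.
func parseDigits(_ input: [Character], from start: Int) -> ParseResult<Int, Int> {
    guard start < input.count else { return .emptyInput }
    var index = start
    var value = 0
    while index < input.count,
          input[index].isASCII,
          let digit = input[index].wholeNumberValue {
        value = value * 10 + digit
        index += 1
    }
    // No digits consumed: report an error and skip one element.
    if index == start { return .error(resumptionPoint: start + 1) }
    return .valid(value, resumptionPoint: index)
}

if case .valid(let number, resumptionPoint: let next) = parseDigits(Array("42x"), from: 0) {
    print(number, next)  // 42 2
}
```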

> /// Indicates valid input was recognized.
> ///
> /// `resumptionPoint` is the end of the parsed region
> case valid(T, resumptionPoint: Index)  // FIXME: should these be 

Re: [swift-evolution] [Pitch] String revision proposal #1

2017-03-29 Thread Xiaodi Wu via swift-evolution
This looks great. The restored conformances to *Collection will be huge.

Is this to be the first of several or the only major part of the manifesto
to be delivered in Swift 4?

Nits on naming: are we calling it Substring or SubString (à la
SubSequence)? and shouldn't it be UnicodeParsedResult rather than
UnicodeParseResult?


On Wed, Mar 29, 2017 at 19:32 Ben Cohen via swift-evolution <
swift-evolution@swift.org> wrote:

Hi Swift Evolution,

Below is a pitch for the first part of the String revision. This covers a
number of changes that would allow the basic internals to be overhauled.

Online version here:
https://github.com/airspeedswift/swift-evolution/blob/3a822c799011ace682712532cfabfe32e9203fbb/proposals/0161-StringRevision1.md


String Revision: Collection Conformance, C Interop, Transcoding

   - Proposal: SE-0161
   - Authors: Ben Cohen, Dave Abrahams
   - Review Manager: TBD
   - Status: *Awaiting review*

Introduction

This proposal is to implement a subset of the changes from the Swift 4
String Manifesto.

Specifically:

   - Make String conform to BidirectionalCollection
   - Make String conform to RangeReplaceableCollection
   - Create a Substring type for String.SubSequence
   - Create a Unicode protocol to allow for generic operations over both
   types.
   - Consolidate on a concise set of C interop methods.
   - Revise the transcoding infrastructure.

Other existing aspects of String remain unchanged for the purposes of this
proposal.

Motivation

This proposal follows up on a number of recommendations found in the
manifesto:

Collection conformance was dropped from String in Swift 2. After
reevaluation, the feeling is that the minor semantic discrepancies (mainly
with RangeReplaceableCollection) are outweighed by the significant benefits
of restoring these conformances. For more detail on the reasoning, see here


While it is not a collection, the Swift 3 string does have slicing
operations. String is currently serving as its own subsequence, allowing
substrings to share storage with their “owner”. This can lead to memory
leaks when small substrings of larger strings are stored long-term (see here
for more detail on this problem). Introducing a separate type of Substring to
serve as String.SubSequence is recommended to resolve this issue, in a
similar fashion to ArraySlice.

As noted in the manifesto, support for interoperation with nul-terminated C
strings in Swift 3 is scattered and incoherent, with six ways to transform a
C string into a String and four ways to do the inverse. These APIs should
be replaced with a simpler set of methods on String.

Proposed solution

A new type, Substring, will be introduced. Similar to ArraySlice it will be
documented as only for short- to medium-term storage:

*Important*
Long-term storage of Substring instances is discouraged. A substring holds
a reference to the entire storage of a larger string, not just to the
portion it presents, even after the original string’s lifetime ends.
Long-term storage of a substring may therefore prolong the lifetime of
elements that are no longer otherwise accessible, which can appear to be
memory leakage.

Aside from minor differences, such as having a SubSequence of Self and a
larger size to describe the range of the subsequence, Substring will be
near-identical from a user perspective.

In order to be able to write extensions across both String and Substring,
a new Unicode protocol to which the two types will conform will be
introduced. For the purposes of this proposal, Unicode will be defined as a
protocol to be used whenever you would previously extend String. It should
be possible to substitute extension Unicode { ... } in Swift 4 wherever
extension String { ... } was written in Swift 3, with one exception: any
passing of self into an API that takes a concrete String will need to be
rewritten as String(self). If Self is a String then this should effectively
optimize to a no-op, whereas if Self is a Substring then this will force a
copy, helping to avoid the “memory leak” problems described above.

The exact nature of the protocol (such as which methods should be protocol
requirements and which can be implemented as protocol extensions) is
considered an implementation detail and so is not covered in this proposal.

Unicode will conform to BidirectionalCollection. RangeReplaceableCollection
conformance will be added directly onto the String and Substring types, as
it is possible future Unicode-conforming types might not be range-replaceable
(e.g. an immutable type that wraps a const char *).

The C string interop methods will be updated to those described here

[swift-evolution] [Pitch] String revision proposal #1

2017-03-29 Thread Ben Cohen via swift-evolution
Hi Swift Evolution,

Below is a pitch for the first part of the String revision. This covers a 
number of changes that would allow the basic internals to be overhauled.

Online version here: 
https://github.com/airspeedswift/swift-evolution/blob/3a822c799011ace682712532cfabfe32e9203fbb/proposals/0161-StringRevision1.md
 



String Revision: Collection Conformance, C Interop, Transcoding

Proposal: SE-0161
Authors: Ben Cohen, Dave Abrahams
Review Manager: TBD
Status: Awaiting review

Introduction

This proposal is to implement a subset of the changes from the Swift 4 String 
Manifesto.

Specifically:

   - Make String conform to BidirectionalCollection
   - Make String conform to RangeReplaceableCollection
   - Create a Substring type for String.SubSequence
   - Create a Unicode protocol to allow for generic operations over both types.
   - Consolidate on a concise set of C interop methods.
   - Revise the transcoding infrastructure.

Other existing aspects of String remain unchanged for the purposes of this 
proposal.

Motivation

This proposal follows up on a number of recommendations found in the manifesto:

Collection conformance was dropped from String in Swift 2. After reevaluation, 
the feeling is that the minor semantic discrepancies (mainly with 
RangeReplaceableCollection) are outweighed by the significant benefits of 
restoring these conformances. For more detail on the reasoning, see here 

While it is not a collection, the Swift 3 string does have slicing operations. 
String is currently serving as its own subsequence, allowing substrings to 
share storage with their “owner”. This can lead to memory leaks when small 
substrings of larger strings are stored long-term (see here for more detail on 
this problem). Introducing a separate type of Substring to serve as 
String.SubSequence is recommended to resolve this issue, in a similar fashion 
to ArraySlice.

As noted in the manifesto, support for interoperation with nul-terminated C 
strings in Swift 3 is scattered and incoherent, with six ways to transform a C 
string into a String and four ways to do the inverse. These APIs should be 
replaced with a simpler set of methods on String.

Proposed solution

A new type, Substring, will be introduced. Similar to ArraySlice it will be 
documented as only for short- to medium-term storage:

Important

Long-term storage of Substring instances is discouraged. A substring holds a 
reference to the entire storage of a larger string, not just to the portion it 
presents, even after the original string’s lifetime ends. Long-term storage of 
a substring may therefore prolong the lifetime of elements that are no longer 
otherwise accessible, which can appear to be memory leakage.

Aside from minor differences, such as having a SubSequence of Self and a larger 
size to describe the range of the subsequence, Substring will be near-identical 
from a user perspective.
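
As a hedged illustration of the storage note above (a minimal sketch, not part of the proposal text): it assumes slicing operations such as suffix(_:) return the proposed Substring, and the variable names are made up for the example.

```swift
// A large string whose storage we do not want to keep alive indefinitely.
let log = String(repeating: "x", count: 10_000) + "needle"

// Slicing yields a Substring, which refers to the storage of `log`.
let slice = log.suffix(6)

// Converting to String copies only the six characters, so `kept`
// can be stored long-term without holding on to the whole buffer.
let kept = String(slice)
```

Storing `kept` rather than `slice` is the pattern the documentation note is steering users towards.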

In order to be able to write extensions across both String and Substring, a 
new Unicode protocol to which the two types will conform will be introduced. 
For the purposes of this proposal, Unicode will be defined as a protocol to be 
used whenever you would previously extend String. It should be possible to 
substitute extension Unicode { ... } in Swift 4 wherever extension String 
{ ... } was written in Swift 3, with one exception: any passing of self into an 
API that takes a concrete String will need to be rewritten as String(self). If 
Self is a String then this should effectively optimize to a no-op, whereas if 
Self is a Substring then this will force a copy, helping to avoid the “memory 
leak” problems described above.
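
The substitution described above might look roughly like the following. This is a sketch under assumptions: the proposal spells the protocol Unicode, but the code uses StringProtocol, the name under which the protocol later shipped, only so that it compiles with a current toolchain. The function characterCount(of:) and the property reportedLength are hypothetical names.

```swift
// An API that takes a concrete String (hypothetical, for illustration).
func characterCount(of text: String) -> Int {
    return text.count
}

// The proposal would spell this `extension Unicode { ... }`.
extension StringProtocol {
    var reportedLength: Int {
        // `self` may be a String or a Substring, so convert explicitly
        // before handing it to an API that wants a concrete String.
        return characterCount(of: String(self))
    }
}

let whole = "hello world"
let part = whole.prefix(5)   // a Substring
```

Both `whole.reportedLength` and `part.reportedLength` go through the same extension; only the Substring case needs a real copy.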

The exact nature of the protocol (such as which methods should be protocol 
requirements and which can be implemented as protocol extensions) is considered 
an implementation detail and so is not covered in this proposal.

Unicode will conform to BidirectionalCollection. RangeReplaceableCollection 
conformance will be added directly onto the String and Substring types, as it 
is possible future Unicode-conforming types might not be range-replaceable 
(e.g. an immutable type that wraps a const char *).
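
To make the split concrete, here is a small sketch (function names are invented for illustration) of generic code that needs only the bidirectional, read-only capability, next to code that additionally requires range-replaceability. A hypothetical immutable conforming type could be passed to the first function but not the second.

```swift
// Needs only BidirectionalCollection: walks backwards from the end.
func trimmingTrailingSpaces<S: BidirectionalCollection>(_ text: S) -> S.SubSequence
    where S.Element == Character
{
    var end = text.endIndex
    while end != text.startIndex {
        let previous = text.index(before: end)
        if text[previous] != " " { break }
        end = previous
    }
    return text[text.startIndex..<end]
}

// Needs RangeReplaceableCollection: mutates its argument in place.
func appendEllipsis<S: RangeReplaceableCollection>(_ text: inout S)
    where S.Element == Character
{
    text.append(contentsOf: "...")
}

let trimmed = String(trimmingTrailingSpaces("abc  "))
var title = "abc"
appendEllipsis(&title)
```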

The C string interop methods will be updated to those described here: a single 
withCString operation and two init(cString:) constructors, one for 
UTF8 and one for arbitrary encodings. The primary change is to remove 
“non-repairing” variants of construction from nul-terminated C strings. In both 
of the construction