Re: [swift-evolution] Strings in Swift 4

Ben Cohen via swift-evolution Wed, 25 Jan 2017 13:08:59 -0800

> On Jan 24, 2017, at 7:02 PM, Félix Cloutier via swift-evolution 
> <[email protected]> wrote:
> 
> 
>> Le 24 janv. 2017 à 11:33, Dave Abrahams via swift-evolution 
>> <[email protected]> a écrit :
>>>>> I've never seen anyone start a string with a combining character on 
>>>>> purpose, 
>>>> 
>>>> It will occur as a byproduct of the process of attaching a diacritic
>>>> to a base character.
>>> 
>>> Unless you're in the business of writing a text editor, I don't know
>>> if that's a common use case.
>> 
>> I don't either, to be honest.  But the experts I consult with keep
>> reassuring me that it's an important one.
> 
> Would it be possible that the Unicode experts' use cases are different from 
> non-experts' use cases? It would make sense to put people who know a lot 
> about Unicode in charge of handling complex Unicode operations, and that 
> makes that use case very important to them, but through their hard work no 
> one else needs to care about it.
> 
>>>>> though I'm familiar with just one natural language that needs
>>>>> combining characters. I can imagine that it could be a convenient
>>>>> feature in other natural languages.
>>>>> 
>>>>> However, if Swift Strings are now designed for machine processing
>>>>> and less for human language convenience, for me, it's easy enough to
>>>>> justify a safe default in the context of machine processing: `a+b`
>>>>> will not combine the end of `a` with the start of `b`. You could do
>>>>> this by inserting a ◌ that `b` could combine with if necessary.
>>>> 
>>>> You can do it, but it trades one semantic problem for a usability
>>>> problem, without solving all the semantic problems: you end up with
>>>> a.count + b.count == (a+b).count, sure, but you still don't satisfy
>>>> the usual law of collections that (a+b).contains(b.first!) if b is
>>>> non-empty, and now you've made it difficult to attach diacritics to
>>>> base characters.
>>> 
>>> "Difficult".
>>> 
>>> What kind of processing would you suggest on a variable "b" in the
>>> expression "\(a),\(b)" to ensure that the result can be split with a
>>> comma?
>> 
>> I'm sorry, I don't understand what you're driving at, here.
> 
> Okay, so I'm serializing two strings "a" and "b", and later on I want to 
> deserialize them. I control "a", and the user controls "b". I know that I'll 
> never have a comma in "a", so one obvious way to serialize the two strings is 
> with "\(a),\(b)", and the most obvious way to deserialize them is with 
> string.split(maxSplits: 2) { $0 == "," }.
> 
> For the example, string "a" is "hello", and the user put in "\u{0301}screw 
> you" for "b". This makes the result "hello,́screw you". Now split misses the 
> comma.
> 
> How do I fix it?
>


One option (once Character acquires a unicodeScalars view similar to String’s) 
would be:

s.split { $0.unicodeScalars.first == "," }

There’s probably also a case to be made for a String-specific overload 
split(separator: UnicodeScalar), in which case you’d pass in the scalar of “,”. 
This would replicate the behavior of languages that use code points as 
their “character”.
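
A rough sketch of both ideas, assuming Swift 5 or later (where Character does have a unicodeScalars view). The proposed split(separator: UnicodeScalar) overload on String is not assumed to exist here; the last part approximates it by splitting the unicodeScalars view directly. The variable names are illustrative only.

```swift
let a = "hello"
let b = "\u{0301}screw you"   // user input starting with U+0301 COMBINING ACUTE ACCENT
let joined = "\(a),\(b)"

// Character-level comparison: "," plus U+0301 form one grapheme cluster,
// which is not equal to a plain ",", so no split happens.
let naive: [Substring] = joined.split(maxSplits: 1) { $0 == "," }

// Compare only the first scalar of each Character. The whole separator
// Character is dropped, so the combining accent goes with it.
let byFirstScalar: [Substring] = joined.split(maxSplits: 1) {
    $0.unicodeScalars.first == ","
}

// Split at the scalar level, as a code-point-based language would.
// The accent stays at the front of the second field.
let scalarPieces = joined.unicodeScalars.split(
    separator: ",", maxSplits: 1, omittingEmptySubsequences: false)
let byScalar = scalarPieces.map { String($0) }

print(naive.count, byFirstScalar.count, byScalar.count)
```

Note the difference between the two working variants: the Character-based predicate discards the accent along with the comma, while the scalar-level split returns the second field exactly as the user supplied it.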

Alternatively, the right solution is to sanitize your input before the 
interpolation. Sanitization is a big topic, of which this is just one example. 
Essentially, you are asking for this kind of sanitization to be automatically 
applied for all range-replaceable operations on strings for this specific use 
case. I’m not sure that’s a good precedent to set. There are other ways in 
which Unicode can be abused that wouldn’t be covered; should we be sanitizing 
for those too on all low-level operations?
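
One possible shape for sanitizing before interpolation, as a minimal sketch assuming Swift 5 or later. The helper name sanitizedForJoining is hypothetical, and the choice to prefix a dotted circle (U+25CC, the spacer suggested earlier in the thread) is only one policy; rejecting the input outright would be another.

```swift
/// Hypothetical helper: if the field starts with a scalar that would attach
/// to whatever precedes it, give it a dotted circle to attach to instead.
func sanitizedForJoining(_ field: String) -> String {
    guard let first = field.unicodeScalars.first,
          first.properties.isGraphemeExtend else {
        return field
    }
    return "\u{25CC}" + field
}

let a = "hello"
let b = "\u{0301}screw you"   // user-controlled input

let record = "\(a),\(sanitizedForJoining(b))"

// The comma is now its own Character, so a Character-level split finds it.
let fields = record
    .split(maxSplits: 1, omittingEmptySubsequences: false) { $0 == "," }
    .map { String($0) }

print(fields.count)
```

This changes the stored data, so whoever reads the second field back has to know that a leading dotted circle may have been added.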

This would also have pretty far-reaching implications across lots of different 
types and operations. For example, it’s not just on append:

var s = "pokemon"
let i = s.index(of: "m")!
// insert not just \u{0301} but also a separator?
s.insert("\u{0301}", at: i)
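
For reference, here is how that insert behaves in shipping Swift, as a sketch assuming Swift 5 or later (where index(of:) is spelled firstIndex(of:)). No separator is inserted, so the accent attaches to the preceding "e".

```swift
var s = "pokemon"
let i = s.firstIndex(of: "m")!
let countBefore = s.count

// Inserting a lone combining accent before "m" attaches it to the "e".
s.insert("\u{0301}", at: i)

// The string has one more scalar but the same number of Characters.
print(countBefore, s.count, s.unicodeScalars.count)
```

The result compares equal to the precomposed spelling "pok\u{E9}mon", since String equality is based on canonical equivalence.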

It also would apply to in-place mutation on slices, given you can do this:

var a = [1,2,3,4]
a[0...2].append(99)
a // [1,2,3,99,4]

In this case, suppose you appended "e" to a slice that ended between "m" and 
"\u{0301}". The append operation on the substring would need to look into the 
outer string, see that the next scalar is a combining character, and then 
insert a spacer element in between them.
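
To make the scenario concrete, here is a sketch assuming Swift 5 or later. It simulates the slice append by inserting into the unicodeScalars view at the boundary between "m" and the accent. With no spacer inserted, the accent silently moves from the "m" to the new "e".

```swift
var s = "pokem\u{0301}on"      // the accent is attached to the "m"
let countBefore = s.count

// The position between "m" and U+0301 only exists at the scalar level.
let boundary = s.unicodeScalars.firstIndex(of: "\u{0301}")!

// Equivalent in effect to appending "e" to a slice that ends at `boundary`.
s.unicodeScalars.insert("e", at: boundary)

// The accent now follows the "e", so "m" is a plain Character again.
print(countBefore, s.count)
```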

We would still need the ability to append modifiers to characters legitimately. 
If users could not do this by inserting/appending these modifiers into String, 
we would have to put this logic onto Character, which would need to have the 
ability to range-replace within its scalars, which adds a lot to the 
complexity of that type. It would also be fiddly to use, given that String is 
not going to conform to MutableCollection (because mutation on an element 
cannot be done in constant time). So you couldn’t do it in place, i.e. 
s[i].unicodeScalars.append("\u{0301}") wouldn’t work.
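
As an illustration of why this would be fiddly, here is a sketch (assuming Swift 5 or later) of the workaround when you cannot mutate the element in place: build the replacement text separately, then range-replace it in the String.

```swift
var s = "pokemon"
let i = s.firstIndex(of: "e")!

// s[i] is a read-only Character, so build the accented text separately...
let accented = String(s[i]) + "\u{0301}"

// ...and swap it in through the String's range-replaceable interface.
s.replaceSubrange(i...i, with: accented)

print(s.count, s.unicodeScalars.count)
```

Note that `i` should not be reused after the replacement, since mutating a String can invalidate its indices.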

> Félix
> 
> _______________________________________________
> swift-evolution mailing list
> [email protected]
> https://lists.swift.org/mailman/listinfo/swift-evolution

