Hello Swift Community,

I’ve found a problem with the Swift String API when handling Thai text. In Thai,
we have 44 consonants, 32 vowels and 5 tone marks. A special attribute of Thai
vowels is that they can be placed anywhere around a consonant: some are placed
after the consonant (ชา), some before (แช), some above (ชี) and some below (ชุ).
Since every vowel must accompany a consonant, but vowels sit in different
positions around it, the Unicode standard classifies some Thai vowels as
Grapheme Base and others as Grapheme Extend.
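
To make that split concrete, here is a minimal sketch that inspects the scalars used in the examples above. It assumes Swift 5 or later, where `Unicode.Scalar.Properties` exposes `isGraphemeBase` and `isGraphemeExtend`; the `describe` helper is a hypothetical name used only for illustration.

```swift
// Hypothetical helper: classifies a scalar by its Unicode grapheme properties.
func describe(_ scalar: Unicode.Scalar) -> String {
    let properties = scalar.properties
    if properties.isGraphemeExtend { return "Grapheme_Extend" }
    if properties.isGraphemeBase { return "Grapheme_Base" }
    return "neither"
}

// Scalars are written as escapes so that combining marks stay readable.
let samples: [(name: String, scalar: Unicode.Scalar)] = [
    ("CHO CHANG (consonant)", "\u{0E0A}"),
    ("SARA AA (vowel, after)", "\u{0E32}"),
    ("SARA AE (vowel, before)", "\u{0E41}"),
    ("SARA II (vowel, above)", "\u{0E35}"),
    ("SARA U (vowel, below)", "\u{0E38}"),
]

for sample in samples {
    print(sample.name, "->", describe(sample.scalar))
}
```

The vowels written above and below the consonant are the ones reported as Grapheme_Extend; the ones written before and after it are Grapheme_Base, like the consonant itself.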

Because Swift String is fully Unicode compliant, the fact that some vowels are
Grapheme Extend makes those Thai vowels behave unexpectedly in some respects.
For example, the words “ชี” (a nun) and “ชา” (tea) each have one consonant (ช in
both cases) and one vowel (ี and า respectively). When we ask how many
characters each word has, or whether the word contains the character ช, we
should get the same answers for both words (2 characters, and it contains ช).
However, I found that the Swift String API gives different answers to those
questions for the two words.


// You can try this code snippet in a Swift Playground
import Foundation  // needed for contains(_: String) outside a Playground

let chi = "ชี"
let cha = "ชา"

// The values of the 2 lines below should both be 2
chi.characters.count
cha.characters.count

// The values of the 3 lines below should all be true
chi.contains("ช")
cha.contains("ช")
chi.characters.contains("ช")

// end of code snippet
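
For comparison, here is a small sketch of how the same two words look at the grapheme level versus the Unicode scalar level. It assumes Swift 4 or later (where `String` is itself a collection of `Character`) and current Unicode grapheme-breaking rules; the variable names are illustrative only.

```swift
// The same two words, written with explicit scalars.
let chiWord = "\u{0E0A}\u{0E35}"  // ชี : CHO CHANG + SARA II (Grapheme_Extend)
let chaWord = "\u{0E0A}\u{0E32}"  // ชา : CHO CHANG + SARA AA (Grapheme_Base)
let choChangScalar: Unicode.Scalar = "\u{0E0A}"
let choChangCharacter = Character("\u{0E0A}")

// Grapheme-cluster view: SARA II merges into the consonant's cluster.
print(chiWord.count, chaWord.count)
print(chiWord.contains(choChangCharacter), chaWord.contains(choChangCharacter))

// Unicode-scalar view: both words are two scalars and both contain CHO CHANG.
print(chiWord.unicodeScalars.count, chaWord.unicodeScalars.count)
print(chiWord.unicodeScalars.contains(choChangScalar),
      chaWord.unicodeScalars.contains(choChangScalar))
```

The `unicodeScalars` view gives the answers I expected for both words, while the default `Character` view treats ชี as a single character.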


I’m not sure whether the Swift team is aware of this problem or has any opinion
on it. I know that Unicode is very hard, and I’m aware that the String API is
going to be revamped in Swift 4, so I’d like to open a discussion about this
before Swift 4 is released.


Thank you,
Bank (Pitiphong)


_______________________________________________
swift-evolution mailing list
[email protected]
https://lists.swift.org/mailman/listinfo/swift-evolution
