On Tuesday, 13 October 2015 at 23:37, Richard Wordingham wrote:

> If you are referring to indexing, I suspect the issue is performance.
> UTF-32 feels wasteful, and if the underlying character text is UTF-8 or
> UTF-16 we need an auxiliary array to convert character number to byte
> offset if we are to have O(1) time for access.
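To make the quoted trade-off concrete, here is a minimal Python sketch of such an auxiliary array over UTF-8 text. The class name `IndexedUtf8` and the `stride` parameter are hypothetical, chosen only for illustration. With `stride=1` the table holds one byte offset per code point and access takes O(1) time; a larger stride keeps only every `stride`-th offset and scans forward over at most `stride - 1` code points, which is one possible way to cut the memory cost.

```python
class IndexedUtf8:
    """UTF-8 bytes plus a table of byte offsets, one per `stride` code points.

    Illustrative sketch only: the name and the `stride` parameter are
    assumptions, not taken from any particular library.
    """

    def __init__(self, text: str, stride: int = 1):
        if stride < 1:
            raise ValueError("stride must be >= 1")
        self.data = text.encode("utf-8")
        self.stride = stride
        self.offsets = []   # byte offset of code points 0, stride, 2*stride, ...
        self.length = 0     # number of code points seen so far
        for pos, byte in enumerate(self.data):
            # Continuation bytes look like 0b10xxxxxx; any other byte
            # starts a new code point.
            if byte & 0xC0 != 0x80:
                if self.length % stride == 0:
                    self.offsets.append(pos)
                self.length += 1

    def __len__(self) -> int:
        return self.length

    def __getitem__(self, i: int) -> str:
        if not 0 <= i < self.length:
            raise IndexError(i)
        # Jump to the nearest recorded offset at or before code point i ...
        pos = self.offsets[i // self.stride]
        # ... then step over at most stride - 1 code points.
        for _ in range(i % self.stride):
            pos = self._next(pos)
        return self.data[pos:self._next(pos)].decode("utf-8")

    def _next(self, pos: int) -> int:
        """Return the byte offset of the code point following the one at pos."""
        pos += 1
        while pos < len(self.data) and self.data[pos] & 0xC0 == 0x80:
            pos += 1
        return pos


s = IndexedUtf8("aé€😀z")             # 1-, 2-, 3-, 4- and 1-byte sequences
sparse = IndexedUtf8("aé€😀z", stride=2)
```

Note that this indexes code points, not grapheme clusters, so it answers only the narrower question raised in the quote.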
If UTF-32 feels wasteful, there are various smart ways of providing
direct indexing at a reasonable cost, provided your language has at
least minimal support for datatype definition and abstraction.

Also, I personally find indexing to be rarely useful in string
processing, so it may not be the operation you want to optimize for.
Having iterator-like functions, as you suggest, and a datatype to
represent substrings often seems a better fit than doing index
arithmetic.

Note that the Swift programming language seems to have gone even further
than I would have: its notion of character is a grapheme cluster, tested
for equality using canonical equivalence, and that is what strings are
indexed by; see [1]. I don't know how well that works in practice, as I
have never used it myself, but it feels like the ultimate Unicode string
model to offer to the programmer with zero Unicode knowledge (at least
for alphabetic scripts).

Best,

Daniel

[1] https://developer.apple.com/library/ios/documentation/Swift/Conceptual/Swift_Programming_Language/StringsAndCharacters.html

