M.-A. Lemburg:
> You mean a slice that slices out the next <indextype> ?

Yes.

> This sounds a lot like you'd want iterators for the various
> index types. Should be possible to implement on top of the
> proposed APIs, e.g. itergraphemes(u), itercodepoints(u), etc.

Iterators may be helpful, but can also be too restrictive when the
processing is not completely iterative, such as peeking ahead or
looking behind to wrap at a word boundary in the display example.
It was more that there may be less scope for error if there were a
move away from indexes to slices. The PEP provides ways to specify
what you want to examine or modify, but it looks to me like returning
indexes will lead to code repetition or additional variables, with an
increase in fragility.

> Note that what most people refer to as "character" is a
> grapheme in Unicode speak.

A grapheme-oriented string type may be worthwhile, although you'd
probably have to choose a particular normalisation form to ease
processing.

> Given that interpretation,
> "breaking" Unicode "characters" is something you won't
> ever work around with by using larger code units such
> as UCS4 compatible ones.

I still think we can reduce the scope for errors.

> Furthermore, you should also note that surrogates (two
> code units encoding one code point) are part of Unicode
> life. While you don't need them when storing Unicode
> in UCS4 code units, they can still be part of the
> Unicode data and the programmer has to be aware of
> these.

Many programmers can and will ignore surrogates. One day that may
bite them, but we can't close off text processing to those who have
no idea of what surrogates are, or directional marks, or that sorting
is locale dependent, or who have no understanding of the difference
between NFC and NFKD normalization forms.

Neil
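
To make the slice-versus-index point concrete, here is a minimal
sketch. It is not anything taken from the PEP. It models the text as a
list of UTF-16 code units, and to_utf16_units and next_codepoint_slice
are hypothetical helper names invented for illustration. itercodepoints
borrows the name suggested in the quoted text, but its behaviour here
is only an assumption.

```python
def to_utf16_units(text):
    """Hypothetical helper: encode text as a list of 16-bit code units."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


def next_codepoint_slice(units, start):
    """Hypothetical helper: slice covering the code point at units[start].

    A high surrogate followed by a low surrogate is kept together, so
    the caller never has to compute the end index itself.
    """
    end = start + 1
    is_high = 0xD800 <= units[start] <= 0xDBFF
    if is_high and end < len(units) and 0xDC00 <= units[end] <= 0xDFFF:
        end += 1
    return slice(start, end)


def itercodepoints(units):
    """Assumed behaviour: yield one slice per code point, in order."""
    pos = 0
    while pos < len(units):
        piece = next_codepoint_slice(units, pos)
        yield piece
        pos = piece.stop


# U+10400 lies outside the BMP, so it needs a surrogate pair in UTF-16.
units = to_utf16_units("a\U00010400b")
pieces = [units[s] for s in itercodepoints(units)]
```

Because next_codepoint_slice takes a start position, a caller can still
peek ahead from an arbitrary point without going through the iterator,
and the surrogate pair stays in one piece either way.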