David Hopwood wrote:
>> Assuming my Unicode lingo is right and code point represents a
>> letter/character/digraph/whatever, then it will be a code point. Doing one
>> of my rare channels of Guido, I *really* doubt he wants to expose the
>> technical details of Unicode to the point of having people need to realize
>> that UTF-8 takes two bytes to represent "ö".
>
> The argument used here is not valid. People do need to realize that *all*
> Unicode encodings are variable-length, in the sense that abstract characters
> can be represented by multiple code points.
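To make the two claims above concrete, here is a small Python sketch using only the standard `unicodedata` module. It shows that "ö" takes two bytes in UTF-8, and that the same abstract character can be spelled as one code point (U+00F6) or two (U+006F U+0308). `safe_prefix` is a hypothetical helper, not part of any library; it only backs off over code points with a nonzero combining class and does not implement the full grapheme cluster rules of UAX #29.

```python
import unicodedata

# Two spellings of the same abstract character "ö"
precomposed = "\u00f6"   # U+00F6 LATIN SMALL LETTER O WITH DIAERESIS
decomposed = "o\u0308"   # U+006F followed by U+0308 COMBINING DIAERESIS

print(len(precomposed.encode("utf-8")))   # 2 (bytes in UTF-8)
print(len(precomposed), len(decomposed))  # 1 2 (code points)
print(precomposed == decomposed)          # False: compared code point by code point

# Normalization maps between the two forms
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True


def safe_prefix(s: str, n: int) -> str:
    """Hypothetical helper: return at most n code points of s without
    separating a base character from the combining marks that follow it.
    Simplified: only checks combining classes, not full UAX #29 clusters."""
    while 0 < n < len(s) and unicodedata.combining(s[n]):
        n -= 1  # back off so the whole base + marks sequence is excluded
    return s[:n]


print(repr(decomposed[:1]))              # 'o' (naive slice drops the diaeresis)
print(repr(safe_prefix(decomposed, 1)))  # '' (sequence kept intact)
```

Whether every program needs something like `safe_prefix` is exactly what is disputed in the reply; the sketch only shows what the extra care would involve.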

Brett did not make such an argument. He argued that users should not need to
care that "ö" in UTF-8 is two bytes. And I agree: users should not have to
worry about this with respect to the internal representation.

> For example, "ö" can be represented either as the precomposed character
> U+00F6, or as "o" followed by a combining diaeresis (U+006F U+0308).
> Programs must avoid splitting sequences of code points that represent a
> single abstract character.

Why is that? Many programs never encounter cases where this would matter,
so why do such programs have to operate correctly if that case were
encountered?

> It simply is not possible to do correct string processing in Unicode that
> will "work the way [programmers] are used to when compared to working in
> ASCII".

Brett didn't say that this was a goal.

> Should we nevertheless try to avoid making the use of Unicode strings
> unnecessarily difficult for people who have minimal knowledge of Unicode?
> Absolutely, but not at the expense of making basic operations on strings
> asymptotically less efficient. O(1) indexing and slicing is a basic
> requirement, even if it has to be done using code units.

It's not possible to implement slicing in constant time unless string views
are introduced. Currently, slicing takes time linear in the length of the
result string.

Regards,
Martin
_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000
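Martin's remark that constant-time slicing would require string views can be illustrated with a minimal sketch. `StrView` is a hypothetical class, not a Python built-in: slicing a view only adjusts two offsets, which is O(1), while `str()` copies the characters, which is linear in the length of the result, just as built-in `str` slicing is. The sketch assumes indexing by code point on a fixed-width representation and supports only step-1 slices.

```python
from typing import Optional


class StrView:
    """Hypothetical read-only view onto part of a str (not a stdlib type)."""

    __slots__ = ("_base", "_start", "_stop")

    def __init__(self, base: str, start: int = 0, stop: Optional[int] = None):
        self._base = base
        self._start = start
        self._stop = len(base) if stop is None else stop

    def __len__(self) -> int:
        return self._stop - self._start

    def __getitem__(self, key):
        if isinstance(key, slice):
            # Resolve omitted/negative bounds relative to this view
            start, stop, step = key.indices(len(self))
            if step != 1:
                raise ValueError("only step-1 slices are supported in this sketch")
            stop = max(stop, start)  # empty slice when stop < start
            # O(1): no characters are copied, only offsets change
            return StrView(self._base, self._start + start, self._start + stop)
        # Single index: O(1) on a fixed-width code point representation
        if key < 0:
            key += len(self)
        if not 0 <= key < len(self):
            raise IndexError("StrView index out of range")
        return self._base[self._start + key]

    def __str__(self) -> str:
        # Materializing copies the characters: O(len(self))
        return self._base[self._start:self._stop]


v = StrView("hello, world")[7:]
print(len(v), str(v))  # 5 world
print(str(v[1:3]))     # or
```

One cost of this design is that even a tiny view keeps the whole base string alive, so views trade copying time for retained memory.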