On Thu, Jul 3, 2008 at 9:35 AM, Steve Holden <[EMAIL PROTECTED]> wrote:
> Paul Moore wrote:
>>
>> On 03/07/2008, Guido van Rossum <[EMAIL PROTECTED]> wrote:
>>>
>>> I don't see an answer there to the question of whether the length()
>>> method of a Java String object containing a single surrogate pair
>>> returns 1 or 2; I suspect it returns 2.
>>
>> It appears you're right:
>>
>>> type testucs.java
>>
>> class testucs {
>>     public static void main(String[] args) {
>>         StringBuilder s = new StringBuilder("Hello, ");
>>         s.appendCodePoint(0x2F81A);
>>         System.out.println(s); // Display the string.
>>         System.out.println(s.length());
>>     }
>> }
>>
>>> java testucs
>>
>> Hello, ?
>> 9
>>
>>> java -version
>>
>> java version "1.6.0_05"
>> Java(TM) SE Runtime Environment (build 1.6.0_05-b13)
>> Java HotSpot(TM) Client VM (build 10.0-b19, mixed mode, sharing)
>>
>>> Python 3 supports things like
>>> chr(0x12345) and ord("\U00012345"). (And so does Python 2, using
>>> unichr and unicode literals.)
>>
>> And Java doesn't appear to - that appendCodePoint() method was
>> wonderfully hard to find :-)
>>
> There's also the issue of indexing the Unicode strings. If we are going to
> insist that len(u) counts surrogate pairs as one character then random
> access to the characters of a string is going to be an extremely inefficient
> operation.

But my whole point is that len(u) should count surrogate pairs as TWO!

> Surely it's desirable under all circumstances that
>
> len(u) == sum(1 for c in u)
>
> and that
>
> [c for c in u] == [u[i] for i in range(len(u))]
>
> How would that play under Jeroen's proposed change?

I am not considering such a change. At best there will be some helper
function in unicodedata, or perhaps a helper method on the 3.0 str
type to iterate over characters instead of 16-bit values. Whether that
iterator should yield 21-bit integer values or strings containing one
character (i.e. perhaps a surrogate pair), and what it would do with
lone surrogate halves, are up to the committee that designs this API.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
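
For illustration only, here is a rough sketch of what a helper that iterates
over characters instead of 16-bit values might look like. The names
iter_code_points and utf16_units are hypothetical; nothing like them is
proposed in this thread or exists in unicodedata. The sketch assumes the
iterator yields integers and passes lone surrogate halves through unchanged,
which is only one of the possible policies left open above.

```python
def iter_code_points(units):
    """Yield code points from an iterable of 16-bit UTF-16 code units.

    A well-formed surrogate pair is combined into one code point. A lone
    surrogate half is yielded unchanged (an assumed policy, not a decided one).
    """
    pending = None  # high surrogate still waiting for its low half
    for unit in units:
        if pending is not None:
            if 0xDC00 <= unit <= 0xDFFF:
                # High half followed by low half: combine into one code point.
                yield 0x10000 + ((pending - 0xD800) << 10) + (unit - 0xDC00)
                pending = None
                continue
            # The high half was not followed by a low half: emit it alone.
            yield pending
            pending = None
        if 0xD800 <= unit <= 0xDBFF:
            pending = unit
        else:
            yield unit
    if pending is not None:
        # Input ended right after a high half.
        yield pending


def utf16_units(text):
    """Return the UTF-16 code units of a str as a list of ints."""
    data = text.encode("utf-16-le", "surrogatepass")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


units = utf16_units("Hello, \U0002F81A")
code_points = list(iter_code_points(units))
print(len(units))        # 9, the same count Java's length() reported
print(len(code_points))  # 8, one entry per character
```

The length and indexing of the sequence of 16-bit values stay cheap and
consistent with each other, while the per-character view is something a caller
asks for explicitly and pays for with a linear scan.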