On [20080703 15:58], Guido van Rossum wrote:
>You seem to be suggesting that len(u"\U00012345") should return 1 on
>a system that internally uses UTF-16 and hence represents this string
>as a surrogate pair.
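To make the distinction concrete: the string in question holds a single code point, but UTF-16 needs two 16-bit code units (a surrogate pair) to store it. The sketch below assumes Python 3.3 or later, where str always stores code points (PEP 393), so len() already returns 1. On the narrow (UTF-16) builds under discussion here, len(u"\U00012345") returned 2. The helper utf16_code_units is hypothetical, written only for this illustration.

```python
# Minimal sketch (Python 3.3+): code points vs. UTF-16 code units.

def utf16_code_units(s: str) -> int:
    """Number of 16-bit code units needed to store s as UTF-16."""
    # "utf-16-le" writes no byte-order mark, so every 2 bytes is one unit.
    return len(s.encode("utf-16-le")) // 2


s = "\U00012345"
print(len(s))               # 1 -> one code point
print(utf16_code_units(s))  # 2 -> stored as a surrogate pair
```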
From a Unicode and UTF-16 point of view that makes the most sense. So yes, I
am suggesting that.

>This is not going to happen. You may as well complain to the authors
>of the Java standard about the corresponding problem there.

Why would I need to complain to them? They already fixed it, as of 1.5.0.

Java 1.5.0's release notes
(http://java.sun.com/developer/technicalArticles/releases/j2se15/):

  Supplementary Character Support

  32-bit supplementary character support has been carefully added to the
  platform as part of the transition to Unicode 4.0 support. Supplementary
  characters are encoded as a special pair of UTF16 values to generate a
  different character, or codepoint. A surrogate pair is a combination of a
  high UTF16 value and a following low UTF16 value. The high and low values
  are from a special range of UTF16 values.

  In general, when using a String or sequence of characters, the core API
  libraries will transparently handle the new supplementary characters for
  you.

See also http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Character.html

  The methods that accept an int value support all Unicode characters,
  including supplementary characters. For example,
  Character.isLetter(0x2F81A) returns true because the code point value
  represents a letter (a CJK ideograph).

-- 
Jeroen Ruigrok van der Werven <asmodai(-at-)in-nomine.org> / asmodai
イェルーン ラウフロック ヴァン デル ウェルヴェン
http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B
Life can only be understood backwards, but it must be lived forwards...
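The surrogate-pair encoding described in the Java release notes quoted above can be sketched as follows. This is a minimal illustration in Python (3.9 or later, for the built-in generic type hints), not Java's or CPython's actual implementation. The names to_surrogates, from_surrogates, and count_code_points are hypothetical. The last one plays roughly the role that Java 1.5's String.codePointCount plays for a UTF-16 sequence, counting each high+low pair as one character and each unpaired surrogate as one.

```python
# Minimal sketch of UTF-16 surrogate-pair arithmetic (Python 3.9+).
# The helper names are hypothetical, for illustration only.

def to_surrogates(cp: int) -> tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into a surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:04X} is not a supplementary code point")
    offset = cp - 0x10000             # a 20-bit value
    high = 0xD800 + (offset >> 10)    # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits -> low surrogate
    return high, low


def from_surrogates(high: int, low: int) -> int:
    """Recombine a high surrogate and the following low surrogate."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def count_code_points(units: list[int]) -> int:
    """Count code points in a list of UTF-16 code units, treating each
    high+low surrogate pair as one character."""
    count = 0
    i = 0
    while i < len(units):
        is_pair = (
            0xD800 <= units[i] <= 0xDBFF
            and i + 1 < len(units)
            and 0xDC00 <= units[i + 1] <= 0xDFFF
        )
        i += 2 if is_pair else 1  # unpaired surrogates count as one each
        count += 1
    return count


units = [*to_surrogates(0x12345), 0x41]      # U+12345 followed by "A"
print([hex(u) for u in units])               # ['0xd808', '0xdf45', '0x41']
print(len(units), count_code_points(units))  # 3 2
# Code-point-based property check, cf. Character.isLetter(0x2F81A):
print(chr(0x2F81A).isalpha())
```

The design mirrors the quoted Java approach: storage stays in 16-bit units, while the counting and property functions work on whole code points.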