On 9/20/06, Michael Chermside <[EMAIL PROTECTED]> wrote:
> I wrote:
> >>> msg = u'The ancient greeks used the letter "\U00010143" for the number 5.'
> >>> msg[35:-18]
> u'"\U00010143"'
> >>> greek_five = msg[36:-19]
> >>> len(greek_five)
> 2
>
> After posting, I realized that it's worse than that. I suspect that if
> I tried this on a CPython compiled with wide characters, then
> len(greek_five) would be 1.
>
> What should it be? 2? 1? Implementation-dependent?

This has all been rehashed endlessly. It's implementation- (and
platform- and compilation-option-) dependent because there are good
reasons for both choices. Even if CPython 3.0 supports a dynamic choice
(which some are proposing), the *language* will still make it
implementation-dependent because of Jython and IronPython, where the
only choice is UTF-16 (or UCS-2, depending on the attitude towards
surrogates).

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
_______________________________________________
Python-3000 mailing list
http://mail.python.org/mailman/listinfo/python-3000
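
For readers who want to see both answers side by side, here is a minimal sketch. It assumes a current CPython (3.3 or later), where str is indexed by code point, so len() gives the "wide" answer directly. The "narrow" answer is approximated by counting UTF-16 code units after encoding. The helper names code_points and utf16_code_units are made up for this illustration and do not come from the thread.

```python
# Illustrative sketch; helper names are hypothetical, not from the thread.
# Assumes CPython 3.3+, where str is a sequence of code points.

msg = 'The ancient greeks used the letter "\U00010143" for the number 5.'


def code_points(s):
    """Length as a wide (UCS-4) build would see it: one unit per code point."""
    return len(s)


def utf16_code_units(s):
    """Length as a UTF-16 based string type would see it.

    Each code point above U+FFFF is stored as a surrogate pair,
    so it contributes two 16-bit units instead of one.
    """
    return len(s.encode('utf-16-le')) // 2


# With code-point indexing the quoted slice selects exactly one character.
greek_five = msg[36:-19]

print(code_points(greek_five))       # the "wide" answer
print(utf16_code_units(greek_five))  # the "narrow" / UTF-16 answer
```

The two functions disagree only for strings containing characters outside the Basic Multilingual Plane, which is exactly the case in the quoted session.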
