Hi,

The unicode_internal decoder doesn't decode surrogate pairs, so test_unicode.UnicodeTest.test_codecs() is failing on Windows (16-bit wchar_t). I don't know if this codec is still relevant with PEP 393, because the internal representation now depends on the maximum character (Py_UCS1*, Py_UCS2* or Py_UCS4*), whereas it had a fixed size with Python <= 3.2 (Py_UNICODE*).
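To make "decode surrogate pairs" concrete, here is a minimal pure-Python sketch of the step the decoder would need when its input is a sequence of 16-bit code units. It is only an illustration, not the C code of the codec, and join_surrogates is a hypothetical helper name that does not exist in CPython.

```python
def join_surrogates(units):
    """Combine UTF-16 surrogate pairs found in a sequence of 16-bit code units.

    Hypothetical helper for illustration only; not part of CPython.
    Lone surrogates are passed through unchanged.
    """
    chars = []
    i = 0
    while i < len(units):
        high = units[i]
        is_high = 0xD800 <= high <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            low = units[i + 1]
            # Each surrogate carries 10 bits of the code point offset from 0x10000.
            code_point = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
            chars.append(chr(code_point))
            i += 2
        else:
            chars.append(chr(high))
            i += 1
    return "".join(chars)


# The first pair from the failing test, 0xD800 0xDC01, is one non-BMP character.
decoded = join_surrogates([0xD800, 0xDC01])
```

With this step applied, the pair 0xD800 0xDC01 becomes the single character U+10001 instead of two separate surrogate code points.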
Should we:

 * Drop this codec (public and documented, but I don't know if it is used)
 * Use wchar_t* (Py_UNICODE*) to provide a result similar to Python 3.2, and so fix the decoder to handle surrogate pairs
 * Use the real representation (Py_UCS1*, Py_UCS2* or Py_UCS4* string)

?

The failure on Windows:

FAIL: test_codecs (test.test_unicode.UnicodeTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "D:\Buildslave\3.x.moore-windows\build\lib\test\test_unicode.py", line 1408, in test_codecs
    self.assertEqual(str(u.encode(encoding),encoding), u)
AssertionError: '\ud800\udc01\ud840\udc02\ud880\udc03\ud8c0\udc04\ud900\udc05' != '\U00030003\U00040004\U00050005'

Victor