Lino Mastrodomenico wrote:
Only for the new utf-8b encoding (if Martin agrees), while the existing utf-8 is fine as is (or at least waaay outside the scope of this PEP).
This is questionable. The consequence would be that \udcxx in a Python string would sometimes mean a surrogate and sometimes mean raw bytes, depending on the history of the string.
By contrast, if the new utf-8b codec were to *supersede* the old one, \udcxx would always mean raw bytes (at least on UCS-4 builds, where surrogates are unused). The ambiguity could thus be avoided.
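To make the ambiguity concrete, here is a minimal sketch. It assumes a Python 3 interpreter and uses the "surrogateescape" and "surrogatepass" error handlers as stand-ins for the utf-8b codec under discussion, which may not match whatever the proposal ends up specifying; the variable names are only illustrative.

```python
# Assumption: Python 3's "surrogateescape" error handler stands in for utf-8b.
raw = b"\xff"  # a byte that is not valid UTF-8

# History 1: the string came from undecodable bytes.
from_bytes = raw.decode("utf-8", "surrogateescape")

# History 2: the string was written with a lone surrogate directly.
literal = "\udcff"

# The two strings compare equal, so their history cannot be recovered.
same = from_bytes == literal

# Interpretation A: \udcff is an escaped raw byte.
as_raw = literal.encode("utf-8", "surrogateescape")

# Interpretation B: \udcff is a surrogate code point in its own right.
as_surrogate = literal.encode("utf-8", "surrogatepass")

print(same, as_raw, as_surrogate)
```

The same string yields two different byte sequences depending on which interpretation the encoder picks, which is the problem with keeping both codecs side by side.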
Baptiste