Ronald Oussoren <ronaldousso...@mac.com> added the comment:
BTW, unicodeFromTclStringAndSize() essentially undoes the special treatment of \0 in Modified UTF-8 [1]. That page says that all known implementations of MUTF-8 treat surrogate pairs the same way as CESU-8 [2]. CESU-8 is UTF-8 in which each character outside the BMP is first split into a UTF-16 surrogate pair, and each surrogate is then encoded as UTF-8. Neither encoding is currently supported by Python.

[1] https://en.wikipedia.org/wiki/UTF-8#Modified_UTF-8
[2] https://en.wikipedia.org/wiki/CESU-8

----------
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue42318>
Python-bugs-list mailing list
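Here is a minimal Python sketch that makes the two byte-level rules concrete. Python has no codec for either encoding, so `encode_mutf8` and `decode_mutf8` are hypothetical helpers. They are built only from the stdlib `utf-8` and `utf-16-le` codecs with the `surrogatepass` error handler. The first decoding step, mapping `C0 80` back to `\0`, corresponds to what unicodeFromTclStringAndSize() is described as doing above. The surrogate-pair recombination is an assumption about what a complete MUTF-8/CESU-8 decoder would also need, not a description of Tcl or CPython internals.

```python
# Hypothetical helpers (not part of CPython or the stdlib): a sketch of
# Modified UTF-8 / CESU-8 built on top of the standard codecs.

def _split_non_bmp(text: str) -> str:
    """Replace each non-BMP character with its UTF-16 surrogate pair."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            cp -= 0x10000
            out.append(chr(0xD800 + (cp >> 10)))    # high surrogate
            out.append(chr(0xDC00 + (cp & 0x3FF)))  # low surrogate
        else:
            out.append(ch)
    return "".join(out)


def encode_mutf8(text: str) -> bytes:
    # CESU-8 step: each surrogate becomes its own 3-byte UTF-8 sequence.
    data = _split_non_bmp(text).encode("utf-8", "surrogatepass")
    # MUTF-8 step: U+0000 becomes the overlong two-byte form C0 80, so the
    # output never contains a 0x00 byte. (0x00 only ever encodes U+0000.)
    return data.replace(b"\x00", b"\xc0\x80")


def decode_mutf8(data: bytes) -> str:
    # Undo the special treatment of \0. C0 is never a valid UTF-8 byte, so
    # C0 80 can only be an encoded NUL here.
    data = data.replace(b"\xc0\x80", b"\x00")
    # Decode, letting 3-byte surrogate sequences through as lone surrogates.
    text = data.decode("utf-8", "surrogatepass")
    # Recombine surrogate pairs into real non-BMP characters by round-tripping
    # through UTF-16; a strict decode rejects unpaired surrogates.
    return text.encode("utf-16-le", "surrogatepass").decode("utf-16-le")


if __name__ == "__main__":
    print(encode_mutf8("a\x00b"))        # b'a\xc0\x80b'
    print(encode_mutf8("\U0001F600"))    # b'\xed\xa0\xbd\xed\xb8\x80'
    print("\U0001F600".encode("utf-8"))  # b'\xf0\x9f\x98\x80' (standard UTF-8)
```

The printed bytes show the two differences from standard UTF-8. NUL is encoded as `C0 80`, and U+1F600 takes six bytes (two 3-byte surrogates) instead of the usual four.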