[issue42318] [tkinter] surrogate pairs in Tcl/Tk string when pasting an emoji in a text widget

Ronald Oussoren Fri, 13 Nov 2020 09:05:25 -0800


Ronald Oussoren <ronaldousso...@mac.com> added the comment:


BTW, unicodeFromTclStringAndSize() basically undoes the special treatment 
of \0 in Modified UTF-8 (MUTF-8) [1]. That page says that all known 
implementations of MUTF-8 treat surrogate pairs the same way as CESU-8 [2]: 
characters outside the BMP are first encoded as UTF-16 surrogate pairs, and 
each surrogate is then encoded as a three-byte UTF-8 sequence.

Neither encoding is currently supported by Python.
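
For illustration only, here is a minimal sketch of how such bytes could be 
decoded in pure Python by combining existing pieces. The function name 
decode_mutf8 is hypothetical, and this is not what 
unicodeFromTclStringAndSize() does internally. It assumes the input is 
well-formed MUTF-8/CESU-8, so that the byte pair C0 80 can only be an 
encoded \0.

```python
def decode_mutf8(data: bytes) -> str:
    """Decode MUTF-8/CESU-8 bytes to a str (hypothetical helper, a sketch)."""
    # MUTF-8 encodes U+0000 as the overlong pair C0 80; map it back to \0.
    data = data.replace(b"\xc0\x80", b"\x00")
    # 'surrogatepass' lets the UTF-8 codec accept the three-byte sequences
    # that encode individual surrogates (as used by CESU-8).
    text = data.decode("utf-8", "surrogatepass")
    # A round trip through UTF-16 joins each high/low surrogate pair into
    # a single non-BMP code point; lone surrogates pass through unchanged.
    return text.encode("utf-16-le", "surrogatepass").decode(
        "utf-16-le", "surrogatepass"
    )


# U+1F600 as CESU-8: surrogates D83D and DE00, three bytes each.
emoji = decode_mutf8(b"\xed\xa0\xbd\xed\xb8\x80")
print(len(emoji), hex(ord(emoji)))
```

The UTF-16 round trip is used here because the UTF-8 codec with 
'surrogatepass' yields two separate surrogate code points rather than one 
non-BMP character.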

[1] https://en.wikipedia.org/wiki/UTF-8#Modified_UTF-8
[2] https://en.wikipedia.org/wiki/CESU-8

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue42318>
_______________________________________