Eryk Sun added the comment: Probably Python 2's UTF-16 decoder should be as broken as the encoder, which will match the broken behavior of the UTF-8 and UTF-32 codecs:
>>> u'\ud83d\uda12'.encode('utf-8').decode('utf-8') u'\ud83d\uda12' >>> u'\ud83d\uda12'.encode('utf-32-le').decode('utf-32-le') u'\ud83d\uda12' Lone surrogate codes aren't valid Unicode. In Python 3 they get used internally for tricks like the "surrogateescape" error handler. In Python 3.4+. the 'surrogatepass' error handler allows encoding and decoding lone surrogates: >>> u'\ud83d\uda12'.encode('utf-16le', 'surrogatepass') b'=\xd8\x12\xda' >>> _.decode('utf-16le', 'surrogatepass') '\ud83d\uda12' ---------- nosy: +eryksun _______________________________________ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue27971> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com