Max Bolingbroke <batterseapo...@hotmail.com> added the comment:

As of Python 3.7.9 this also affects \xf9\xd6, which should decode to \u7881 
(碁). This is the second character of 宏碁, the name of the Taiwanese 
electronics manufacturer Acer.

You can work around the issue using big5hkscs just like with the original 
\xf9\xd8 problem.
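
A minimal sketch of that workaround, assuming you would rather try the strict 
big5 codec first and only fall back to big5hkscs when decoding fails. The 
helper name decode_big5_lenient is hypothetical and not part of any library:

```python
def decode_big5_lenient(data: bytes) -> str:
    """Decode Big5 bytes, falling back to big5hkscs if strict big5 fails.

    Hypothetical helper for illustration only.
    """
    try:
        return data.decode("big5")
    except UnicodeDecodeError:
        # Assumption: the failure comes from an ETen-extension byte pair
        # such as \xf9\xd6, which the big5hkscs codec does map.
        return data.decode("big5hkscs")


acer = "\u5b8f\u7881"           # 宏碁
raw = acer.encode("big5hkscs")  # encode with the codec known to map \u7881
print(decode_big5_lenient(raw) == acer)
```

Trying big5 first leaves the result unchanged for documents that already 
decode; big5hkscs is only consulted for input that big5 rejects. The fallback 
re-decodes the whole input, so any byte pairs that the two codecs map 
differently would follow big5hkscs in that case.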

It looks like the F9D6–F9FE characters all come from the Big5-ETen extension 
(https://en.wikipedia.org/wiki/Big5#ETEN_extensions, 
https://moztw.org/docs/big5/table/eten.txt), which is so popular that it is a 
de facto standard. Big5-2003 (mentioned in a comment below) seems to be an 
extension of Big5-ETen. For what it's worth, WHATWG includes these mappings in 
its own big5 reference tables: https://encoding.spec.whatwg.org/big5.html

Unfortunately Big5 is still in common use in Taiwan. It's pretty funny that 
Python fails to decode Big5 documents containing the name of one of Taiwan's 
largest multinationals :-)

----------
nosy: +batterseapower

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue7856>
_______________________________________