Marc-Andre Lemburg added the comment:
On 27.12.2015 02:05, Serhiy Storchaka wrote:
>
>> I wonder why this does not trigger the exception.
>
> Because in the case of utf-8 and iso-8859-1, the decoding and encoding
> steps are omitted.
>
> In the general case the input is decoded from the specified encoding and then
> encoded to UTF-8 for the parser. But for the utf-8 and iso-8859-1 encodings
> the parser gets the raw data.
Right, but since the tokenizer doesn't know about "utf8", it
should reach out to the codec registry to get a properly encoded
version of the source code (even though this is an unnecessary
round-trip).
There are a few other aliases for UTF-8 which would likely trigger
the same problem:
# utf_8 codec
'u8' : 'utf_8',
'utf' : 'utf_8',
'utf8' : 'utf_8',
'utf8_ucs2' : 'utf_8',
'utf8_ucs4' : 'utf_8',
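As a quick illustration, the codec registry itself already normalizes
these aliases; a sketch of how a lookup resolves them to the canonical
codec name (alias set may vary between Python versions):

```python
import codecs

# The codec registry maps aliases like "u8", "utf" and "utf8" to the
# canonical utf_8 codec; CodecInfo.name reports the canonical name.
for alias in ("u8", "utf", "utf8"):
    info = codecs.lookup(alias)
    print(alias, "->", info.name)  # each resolves to 'utf-8'
```

This is the normalization the tokenizer bypasses when it special-cases
the literal strings "utf-8" and "iso-8859-1" instead of consulting the
registry.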
----------
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue25937>
_______________________________________