Marc-Andre Lemburg added the comment:
On 27.12.2015 02:05, Serhiy Storchaka wrote:
>
>> I wonder why this does not trigger the exception.
>
> Because in the case of utf-8 and iso-8859-1, the decoding and encoding
> steps are omitted.
>
> In the general case the input is decoded from the specified encoding and then
> encoded to UTF-8 for the parser. But for the utf-8 and iso-8859-1 encodings
> the parser gets the raw data.
Right, but since the tokenizer doesn't know about "utf8", it
should reach out to the codec registry to get a properly encoded
version of the source code (even though this is an unnecessary
round-trip).
There are a few other aliases for UTF-8 which would likely trigger
the same problem:
# utf_8 codec
'u8' : 'utf_8',
'utf' : 'utf_8',
'utf8' : 'utf_8',
'utf8_ucs2' : 'utf_8',
'utf8_ucs4' : 'utf_8',
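As a quick illustration, the codec registry itself already normalizes
these aliases; a sketch of how a lookup resolves them to the canonical
codec name (alias set may vary between Python versions):

```python
import codecs

# The codec registry maps aliases like "u8", "utf" and "utf8" to the
# canonical utf_8 codec; CodecInfo.name reports the canonical name.
for alias in ("u8", "utf", "utf8"):
    info = codecs.lookup(alias)
    print(alias, "->", info.name)  # each resolves to 'utf-8'
```

This is the normalization the tokenizer bypasses when it special-cases
the literal strings "utf-8" and "iso-8859-1" instead of consulting the
registry.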
----------
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue25937>
_______________________________________