Re: [Python-Dev] What does a double coding cookie mean?

Serhiy Storchaka Sat, 19 Mar 2016 01:51:38 -0700

On 17.03.16 19:23, M.-A. Lemburg wrote:

On 17.03.2016 15:02, Serhiy Storchaka wrote:

On 17.03.16 15:14, M.-A. Lemburg wrote:

On 17.03.2016 01:29, Guido van Rossum wrote:

Should we recommend that everyone use tokenize.detect_encoding()?


I'd prefer a separate utility for this somewhere, since
tokenize.detect_encoding() is not available in Python 2.

I've attached an example implementation with tests, which works
in Python 2.7 and 3.


Sorry, but this code doesn't match the behaviour of the Python interpreter,
nor that of other tools. I suggest backporting tokenize.detect_encoding() (but
be aware that the default encoding in Python 2 is ASCII, not UTF-8).


Yes, I got the default for Python 3 wrong. I'll fix that. Thanks
for the note.

What other aspects differ from what Python implements?


1. If there is a BOM and coding cookie, the source encoding is "utf-8-sig".

2. If there is a BOM and the coding cookie is not 'utf-8', this is an error.

3. If the first line is not a blank or comment line, the coding cookie is not searched for in the second line.

4. The encoding name should be canonicalized. "UTF8", "utf8", "utf_8" and "utf-8" are the same encoding (and all are changed to "utf-8-sig" with a BOM).

5. There is no limit of 400 bytes. Actually, there is a bug in the handling of long lines in the current code, but even with this bug the limit is larger.


6. I made a mistake in the regular expression: I missed the underscore.

tokenize.detect_encoding() is the closest imitation of the behavior of the Python interpreter.
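
For illustration, here is a minimal sketch (Python 3 only) of how rules 1 to 4 above can be checked against tokenize.detect_encoding(). The helper name source_encoding and the sample byte strings are made up for this example; only tokenize.detect_encoding(), io.BytesIO and codecs.BOM_UTF8 come from the standard library. It is not a backport for Python 2.

```python
import codecs
import io
import tokenize


def source_encoding(source: bytes) -> str:
    """Return the encoding tokenize.detect_encoding() reports for `source`.

    `source_encoding` is a hypothetical helper for this example.
    detect_encoding() takes a readline callable that yields bytes and
    returns (encoding, lines_read); only the encoding is kept here.
    """
    encoding, _lines_read = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding


if __name__ == "__main__":
    bom = codecs.BOM_UTF8

    # Rule 1: a BOM plus a utf-8 cookie is reported as "utf-8-sig".
    print(source_encoding(bom + b"# coding: utf-8\n"))

    # Rule 2: a BOM plus a cookie that is not utf-8 raises SyntaxError.
    try:
        source_encoding(bom + b"# coding: latin-1\n")
    except SyntaxError as exc:
        print("error:", exc)

    # Rule 3: the first line is code, so the cookie on the second line
    # is not searched for and the Python 3 default is returned.
    print(source_encoding(b"x = 1\n# coding: latin-1\n"))

    # A blank first line still lets the second line carry the cookie.
    print(source_encoding(b"\n# coding: latin-1\n"))

    # Rule 4: spelling variants such as "UTF_8" are normalized.
    print(source_encoding(b"# -*- coding: UTF_8 -*-\n"))
```

The sample inputs deliberately use only the "utf-8" and "UTF_8" spellings; how other aliases are reported may depend on the Python version, so check them against the interpreter you target.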


