Re: [Python-Dev] Improve open() to support reading file starting with an unicode BOM

Guido van Rossum Fri, 08 Jan 2010 07:56:29 -0800

On Thu, Jan 7, 2010 at 11:55 PM, Glyph Lefkowitz
<gl...@twistedmatrix.com> wrote:
> I'm saying that the BOM itself isn't enough to detect that the file is 
> actually UTF-8.


And I'm saying that it is, with as much certainty as we can ever have
when guessing the encoding of a file.

> If (for whatever reason: explicitly specified, guessed in some other way) the 
> file's encoding is determined to be something else, the bytes comprising the 
> BOM should be decoded as normal.  It's just that the UTF-8 decoding of the 
> BOM at the start of a file should be "".

Sure, a Latin-1-encoded file could start with the same pattern that is
a UTF-8-encoded BOM. But at that point, a UTF-16-encoded file is also
valid Latin-1.
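
To make the Latin-1 point concrete, here is a small sketch using only
the standard codecs. The variable names are illustrative and not part
of any proposal in this thread. It shows that the UTF-8 BOM bytes
decode without error as Latin-1 (to three visible characters), that
the "utf-8-sig" codec decodes the same bytes to the empty string, and
that UTF-16 data also decodes as Latin-1 without raising:

```python
import codecs

# The UTF-8 encoded BOM is the three bytes EF BB BF.
bom = codecs.BOM_UTF8

# Latin-1 maps every byte to a code point, so this never fails;
# the result is three ordinary characters.
as_latin1 = bom.decode("latin-1")

# The "utf-8-sig" codec treats a leading BOM as a signature and drops it.
as_utf8_sig = bom.decode("utf-8-sig")

# Any byte string is valid Latin-1, including UTF-16 data
# (BOM plus two bytes per character here).
utf16_bytes = "hi".encode("utf-16")
utf16_as_latin1 = utf16_bytes.decode("latin-1")
```

So "decodes without error as Latin-1" says nothing about whether a
file really is Latin-1; it holds for every possible input.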

The question was in the context of encoding-guessing; if we're
guessing, a UTF-8-encoded BOM cannot signify anything else but UTF-8.
(Ditto for UTF-16 and UTF-32 BOMs.)
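
A minimal sketch of BOM-based guessing, assuming we only look at the
first few bytes of the file. The function name guess_encoding_from_bom
and its default parameter are hypothetical, not an existing or proposed
API; the BOM constants and codec names come from the standard codecs
module. UTF-32 is checked before UTF-16 because the UTF-32-LE BOM
begins with the same two bytes as the UTF-16-LE BOM:

```python
import codecs

# Check longer BOMs first: BOM_UTF32_LE (FF FE 00 00) starts with
# BOM_UTF16_LE (FF FE), so the order of this table matters.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding_from_bom(data, default=None):
    """Return a codec name if data starts with a known BOM, else default.

    Hypothetical helper for illustration. The returned codecs
    ("utf-8-sig", "utf-16", "utf-32") consume the BOM when decoding.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return default


# Usage: guess, then decode with the guessed codec.
raw = codecs.BOM_UTF8 + "text".encode("utf-8")
encoding = guess_encoding_from_bom(raw, default="latin-1")
text = raw.decode(encoding)
```

This is still a guess: a UTF-16-LE file whose first character after
the BOM is U+0000 would be taken for UTF-32 by this table.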

-- 
--Guido van Rossum (python.org/~guido)
