On Thu, Jan 7, 2010 at 11:55 PM, Glyph Lefkowitz <gl...@twistedmatrix.com> wrote:
> I'm saying that the BOM itself isn't enough to detect that the file is
> actually UTF-8.

And I'm saying that it is, with as much certainty as we can ever guess the
encoding of a file.

> If (for whatever reason: explicitly specified, guessed in some other way) the
> file's encoding is determined to be something else, the bytes comprising the
> BOM should be decoded as normal. It's just that the UTF-8 decoding of the
> BOM at the start of a file should be "".

Sure, a Latin-1-encoded file could start with the same pattern that is a
UTF-8-encoded BOM. But at that point, a UTF-16-encoded file is also valid
Latin-1. The question was in the context of encoding-guessing; if we're
guessing, a UTF-8-encoded BOM cannot signify anything else but UTF-8.
(Ditto for UTF-16 and UTF-32 BOMs.)

-- 
--Guido van Rossum (python.org/~guido)
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
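
For illustration only, here is a minimal sketch of the BOM-based guessing
discussed in this message. It is not code from the thread or from any
proposed patch. The names guess_encoding_from_bom and decode_guessing are
hypothetical, and the Latin-1 fallback is an assumption chosen only because
Latin-1 accepts every byte sequence. The BOM constants and the "utf-8-sig",
"utf-16" and "utf-32" codecs are from the standard library.

```python
import codecs

# Check longer BOMs first: the UTF-32 LE BOM starts with the same two
# bytes as the UTF-16 LE BOM, so order matters.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding_from_bom(data):
    """Return a codec name if data starts with a known BOM, else None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


def decode_guessing(data, fallback="latin-1"):
    """Decode bytes, trusting a leading BOM when one is present.

    The codecs chosen above consume the BOM, so it does not appear in
    the decoded text. Without a BOM, the (assumed) fallback is used.
    """
    return data.decode(guess_encoding_from_bom(data) or fallback)
```

When the encoding is specified explicitly instead of guessed, the same bytes
are decoded as ordinary characters: b"\xef\xbb\xbf".decode("latin-1") gives
three Latin-1 characters, not an empty string.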