Jason Orendorff wrote:
> On 9/13/06, John S. Yates, Jr. <[EMAIL PROTECTED]> wrote:
>
>>It is a mistake on Microsoft's part to fail to strip the BOM
>>during conversion to UTF-8.
>
> John, you're mistaken about the reason this BOM is here.
>
> In Notepad at least, the BOM is intentionally generated when writing
> the file.  It's not a "mistake" or "laziness".  It's metadata.  (I
> admit the BOM was not originally invented for this purpose.)
>
>>There is no MEANINGFUL definition of BOM in a UTF-8
>>string.
>
> This thread is about files, not strings.  At the start of a file, a
> UTF-8 BOM is meaningful.  It means the file is UTF-8.
>
> On Windows, there's a system default encoding, and it's never UTF-8.

The Windows system encoding can be UTF-8, but only for some locales
recently added in Windows 2000/XP, where there was no compatibility
constraint to use a non-Unicode encoding.

You're correct about the use of a BOM as a signature. All
Unicode-conformant applications should accept this use of a BOM in
UTF-8 (although they need not generate it); the standard is quite
clear on that.

-- 
David Hopwood <[EMAIL PROTECTED]>

_______________________________________________
Python-3000 mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-3000
Unsubscribe: http://mail.python.org/mailman/options/python-3000/archive%40mail-archive.com
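
For illustration only, here is a minimal Python sketch of the "accept a
BOM signature on input, but need not generate one" behaviour described
above. The function names decode_utf8_accepting_bom and encode_utf8 are
hypothetical, made up for this example; only codecs.BOM_UTF8 and the
bytes/str methods come from the standard library.

```python
import codecs


def decode_utf8_accepting_bom(data):
    """Decode UTF-8 bytes, dropping a leading BOM signature if present.

    Hypothetical helper for illustration; not part of any library.
    """
    if data.startswith(codecs.BOM_UTF8):
        # The three bytes EF BB BF at the start act only as a signature.
        data = data[len(codecs.BOM_UTF8):]
    return data.decode("utf-8")


def encode_utf8(text, with_signature=False):
    """Encode text as UTF-8; writing the BOM signature is optional."""
    payload = text.encode("utf-8")
    if with_signature:
        return codecs.BOM_UTF8 + payload
    return payload
```

Python's built-in "utf-8-sig" codec behaves in a comparable way: it
skips a leading BOM when decoding and writes one when encoding.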
