On Jan 8, 2010, at 4:14 PM, Tres Seaver wrote:
I understood this proposal as a general processing guideline, not
something the io library should do (but, say, a text editor).

FWIW, I'm personally in favor of using the UTF-8 signature. If people
consider it crazy talk, that may be because UTF-8 can't possibly have a byte order - hence I call it a signature, not a BOM. As a signature,
I don't consider it crazy at all. There is a long tradition of having
magic bytes in files (executable files, PostScript, PDF, ... - see
/etc/magic). Having a magic byte sequence for plain text to denote the encoding is useful and helps reduce mojibake. This is the reason it's used on Windows: Notepad would normally assume that text is in the ANSI code page, and for compatibility, it can't stop doing that. So the UTF-8
signature gives them an exit strategy.

Agreed. Having that marker at the start of the file makes interop with
other tools *much* easier.

Putting the BOM at the beginning of UTF-8 text files is not a good idea: it makes interop much *worse* on a Unix system, not better. Without the BOM, most commands do the right thing with UTF-8 text. E.g., to concatenate two files:

$ cat file-1 file-2 > file-3

With a BOM at the beginning of each file, this won't work right: the second file's BOM ends up in the middle of file-3. Of course, you could modify "cat" (and every other stream processing command) to know how to consume and emit BOMs, and omit the extra one that would show up in the middle of the stream... but even that can't work; what about:

$ (cat file-1; cat file-2) > file-3

Should the shell now know that when you run multiple commands, it should eat the BOM emitted from the second command?
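
To make the concatenation problem concrete, here is a minimal Python sketch. The byte strings file_1, file_2 and file_3 are hypothetical stand-ins for the contents of the files in the commands above, and plain byte concatenation is assumed to be a fair model of what "cat" does:

```python
import codecs

# Hypothetical contents of file-1 and file-2, each saved with a UTF-8 signature.
file_1 = codecs.BOM_UTF8 + "first\n".encode("utf-8")
file_2 = codecs.BOM_UTF8 + "second\n".encode("utf-8")

# Byte-wise concatenation, standing in for: cat file-1 file-2 > file-3
file_3 = file_1 + file_2

# The "utf-8-sig" codec strips only a signature at the very start of the data,
# so the second file's signature survives as U+FEFF in the middle of the text.
text = file_3.decode("utf-8-sig")
print(repr(text))
```

The stray U+FEFF is invisible in most terminals and editors, which is part of why it is easy to miss until a downstream tool chokes on it.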

Basically, using a BOM in a UTF-8 file is just not a good idea: it completely ruins interop with every standard Unix tool.

This is not to say that Python shouldn't have a way to read a file with a UTF-8 BOM: it just shouldn't encourage you to *write* such files.
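
One way this "accept on read, don't emit on write" policy might look in Python 3 is sketched below. The helper names read_text and write_text are made up for illustration; the "utf-8-sig" and "utf-8" codecs are the standard ones:

```python
import codecs
import os
import tempfile


def read_text(path):
    # "utf-8-sig" decodes UTF-8 and silently drops a leading signature,
    # so files with and without the signature read back the same.
    with open(path, encoding="utf-8-sig") as f:
        return f.read()


def write_text(path, text):
    # Plain "utf-8" never emits a signature.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


# Usage: a file that arrives with a signature is read, then rewritten without one.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "example.txt")
with open(path, "wb") as f:
    f.write(codecs.BOM_UTF8 + "hello".encode("utf-8"))

content = read_text(path)
write_text(path, content)

with open(path, "rb") as f:
    raw = f.read()
print(repr(content), raw)
```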

James