On Fri, Feb 15, 2002 at 09:47:54AM -0800, Rick Cameron wrote:
> If there is a file on disc called foo.txt, it is clearly not typed data.
> Thus, it appears to be Mr Davis' opinion that when such a file contains
> UTF-8 data, it is quite appropriate for there to be a BOM at the start.

In a global sense, it may be appropriate for a UTF-8 file to have a BOM.
However, in a Unix context - and UTF-8 was originally designed for Unix
and Unix-like systems - it is worthless and annoying. Take, for example,
three files:

    A: <BOM>C<LF>AB<LF>
    B: <BOM>ABC<LF>AB<LF>
    C: <BOM>ABCDEFG<LF>

and the operation

    grep "AB" A B > file; cat C >> file

you'll end up with

    file: A: AB<LF>B: <BOM>ABC<LF>B: AB<LF><BOM>ABCDEFG<LF>

That's a document with two BOMs, and none at the start of the file.
There's no simple way to fix this; grep doesn't know if it's working on
UTF-8 text or raw binary or Latin-1 (I frequently do
grep foo file | recode l1..utf-8), and it doesn't know whether its output
is going to the screen or a file or the tail of a file or the input of
another program.

Again, while globally, UTF-8 BOMs might work, in Unix they will be more
of a nuisance than a help.

-- 
David Starner / Давид Старнэр - [EMAIL PROTECTED]
Pointless website: http://dvdeug.dhis.org
What we've got is a blue-light special on truth. It's the hottest thing
with the youth. -- Information Society, "Peace and Love, Inc."
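
The grep-and-cat example above can be reproduced at the byte level. What follows is only a rough sketch, not real grep: the `grep` function and the in-memory `files` dictionary are hypothetical stand-ins, assumed here so the example runs without touching the filesystem. It matches raw bytes and prefixes each hit with the file name, as grep does when given more than one file (real grep writes `A:AB`, with no space after the colon).

```python
import codecs

BOM = codecs.BOM_UTF8  # the three bytes EF BB BF

# The three example files from the message, held in memory as raw bytes.
# (Hypothetical stand-in for files on disk.)
files = {
    "A": BOM + b"C\nAB\n",
    "B": BOM + b"ABC\nAB\n",
    "C": BOM + b"ABCDEFG\n",
}


def grep(pattern, names):
    """Simplified, byte-oriented stand-in for `grep PATTERN FILE...`.

    Every line containing `pattern` is emitted with a `name:` prefix.
    Like grep, it has no idea what encoding the bytes are in.
    """
    out = b""
    for name in names:
        for line in files[name].splitlines(keepends=True):
            if pattern in line:
                out += name.encode("ascii") + b":" + line
    return out


# Equivalent of: grep "AB" A B > file; cat C >> file
result = grep(b"AB", ["A", "B"]) + files["C"]

print(result)
print("BOMs in output:", result.count(BOM))
print("output starts with a BOM:", result.startswith(BOM))
```

Under these assumptions the output carries two BOMs, neither at offset zero: one rides along behind the `B:` prefix and one arrives with the appended copy of C. The BOM at the start of A is lost altogether, because the first line of A does not match the pattern.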

