Doug Ewell wrote:
> fine (as are LF->CRLF, stripped BOM's, and maybe even some edge cases
> like converting between tabs and spaces). If there are any security or
> spoofing concerns, it's best to leave everything completely untouched.
I see this as a good reason NOT to use a BOM in UTF-8 files.

CRLF is a major nuisance that many Windows programmers have to deal with. It forces a choice between text and binary mode when opening files, and the size of a file does not match the number of characters written or read. UNIX programs usually don't need to bother with any of that. Now, expecting UNIX programs to deal with BOMs would introduce a similar problem.

One could say that they will need to anyway, in order to read UTF-16 files. But I don't believe that will ever happen. UTF-8 is the perfect solution for UNIX, and UTF-16 files will be handled by converting the entire file, never by processing it directly (at least as far as simple grep-like programs are concerned).

Lars Kristan
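The two problems raised in the message can be made concrete with a minimal Python sketch. It assumes a naive byte-oriented line matcher as a stand-in for a simple grep-like tool; `naive_grep`, `strip_utf8_bom` and `text_mode_read` are hypothetical helpers written for illustration, not part of any real tool. The sketch shows that a UTF-8 BOM (the bytes EF BB BF, available as `codecs.BOM_UTF8`) adds three bytes and defeats a start-of-line match unless it is stripped. It also shows that reading CRLF data in text mode returns fewer characters than the data has bytes.

```python
import codecs
import io
from typing import List


def strip_utf8_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 BOM (EF BB BF) if present; otherwise return data unchanged."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def naive_grep(prefix: bytes, data: bytes) -> List[bytes]:
    """Hypothetical grep-like helper: return lines that start with prefix (like an anchored ^ match)."""
    return [line for line in data.split(b"\n") if line.startswith(prefix)]


def text_mode_read(data: bytes) -> str:
    """Read bytes the way a text-mode file would be read (universal newlines: CRLF becomes LF)."""
    return io.TextIOWrapper(io.BytesIO(data), encoding="utf-8").read()


plain = b"#!/bin/sh\necho hi\n"
with_bom = codecs.BOM_UTF8 + plain

# The BOM adds three bytes in front of the first line.
print(len(plain), len(with_bom))

# An anchored match on the first line fails when a BOM is present...
print(naive_grep(b"#!", with_bom))
# ...and succeeds again once the BOM is stripped.
print(naive_grep(b"#!", strip_utf8_bom(with_bom)))

# CRLF data: the byte count and the character count read in text mode differ.
crlf = b"a\r\nb\r\n"
print(len(crlf), len(text_mode_read(crlf)))
```

Python's `utf-8-sig` codec does the BOM stripping at decode time (`with_bom.decode("utf-8-sig")`), but a tool that works on raw bytes, as many simple UNIX filters do, has to handle it explicitly, which is the extra burden the message argues against.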

