Yes, the software business is largely about dealing with the BADLY WRITTEN, the TRIVIAL, and the BRAIN-DEAD. Your point?
Newline problems are a good analogy. They still require bookkeeping of different formats and attention in any new coding and cause new bugs, even though the problem has been around for decades. Nobody is holding their breath for any of the platforms to change their newline convention to match the others or even update all their tools to deal with the differences - bare LF still doesn't work in Notepad.

-----Original Message-----
From: Edward H Trager [mailto:ehtrager@;umich.edu]
Sent: Monday, November 04, 2002 9:19 AM
To: Unicode Mailing List
Subject: Re: PRODUCING and DESCRIBING UTF-8 with and without BOM

Hi, everyone,

It's almost unbelievable to me how many email postings are wasted on discussions such as this UTF-8 BOM issue ... I guess it means that there is a lot of BADLY WRITTEN software out there in the world ;-)

With regard to READING incoming UTF-8 text streams, surely any good software designer will do exactly as Michael Michka has suggested here:

> INCOMING TEXT: Trivial to simply check. I say (once again) its THREE
> BYTES.

With regard to EMITTING outgoing UTF-8 text streams, IMHO the default should be to do what is simplest, which is *not* output the BOM. It is superfluous to have it on UTF-8 streams. There's no harm in having a global option to turn BOM outputting on for the benefit of BRAIN-DEAD programs that are going to read the text:

> EMITTING: They could simply choose globally whether to emit the BOM or
> not. If they wanted to get "fancy" they could have a command line
> option which said whether to emit the bytes or not. But that is
> optional.

The whole issue is analogous to the CR\LF issue in ASCII texts across different platforms. Well-written software is able to READ the text properly regardless of whether lines end in CR, LF, or CR\LF.
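For anyone who wants the reading and emitting rules from this thread in concrete form, here is a minimal Python sketch. It is an illustration only, not code from any poster: the function names (decode_utf8, encode_utf8, split_lines) and the emit_bom parameter are made up for this example. It checks for the three BOM bytes on input, leaves the BOM off on output unless asked, and accepts CR, LF, or CR LF line endings.

```python
import codecs

# The UTF-8 encoding of U+FEFF: the three bytes EF BB BF.
UTF8_BOM = codecs.BOM_UTF8


def decode_utf8(data):
    """Decode UTF-8 bytes, dropping a leading BOM if one is present."""
    if data.startswith(UTF8_BOM):
        data = data[len(UTF8_BOM):]
    return data.decode("utf-8")


def encode_utf8(text, emit_bom=False):
    """Encode text as UTF-8; the BOM is written only when emit_bom is True."""
    payload = text.encode("utf-8")
    return UTF8_BOM + payload if emit_bom else payload


def split_lines(text):
    """Split on CR LF, bare CR, or bare LF (CR LF is handled first)."""
    return text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
```

A caller would typically wire emit_bom to a global setting or a command-line option, as suggested in the quoted text. Python's built-in "utf-8-sig" codec also strips the BOM on decode, so the explicit check above is mainly there to show how little is involved.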

