RE: PRODUCING and DESCRIBING UTF-8 with and without BOM

Joseph Boyle Mon, 04 Nov 2002 20:26:27 -0800

Yes, the software business is largely about dealing with the BADLY WRITTEN,
the TRIVIAL, and the BRAIN-DEAD. Your point?


Newline problems are a good analogy. They still require bookkeeping of
different formats and attention in any new coding and cause new bugs, even
though the problem has been around for decades. Nobody is holding their
breath for any of the platforms to change their newline convention to match
the others or even update all their tools to deal with the differences -
bare LF still doesn't work in Notepad.

-----Original Message-----
From: Edward H Trager [mailto:ehtrager@umich.edu]
Sent: Monday, November 04, 2002 9:19 AM
To: Unicode Mailing List
Subject: Re: PRODUCING and DESCRIBING UTF-8 with and without BOM



Hi, everyone,

It's almost unbelievable to me how many email postings are wasted on
discussions such as this UTF-8 BOM issue ... I guess it means that there is
a lot of BADLY WRITTEN software out there in the world ;-)

With regard to READING incoming UTF-8 text streams, surely any good software
designer will do exactly as Michael Michka has suggested here:

> INCOMING TEXT: Trivial to simply check. I say (once again) it's THREE
> BYTES.
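
To make the "three bytes" check concrete, here is a minimal sketch in
Python. The function names strip_utf8_bom and decode_utf8 are made up for
illustration; codecs.BOM_UTF8 is the standard library constant for the
bytes EF BB BF. It assumes the whole input is already in memory as bytes.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def decode_utf8(data: bytes) -> str:
    """Decode UTF-8 bytes, accepting input with or without a leading BOM."""
    return strip_utf8_bom(data).decode("utf-8")
```

Only a BOM at the very start is dropped; a U+FEFF appearing later in the
text is left alone. Python's built-in "utf-8-sig" codec behaves much the
same way when decoding.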

With regard to EMITTING outgoing UTF-8 text streams, IMHO the default should
be to do what is simplest, which is *not* output the BOM.  It is superfluous
to have it on UTF-8 streams.  There's no harm in having a global option to
turn BOM outputting on for the benefit of BRAIN-DEAD programs that are going
to read the text:

> EMITTING: They could simply choose globally whether to emit the BOM or 
> not. If they wanted to get "fancy" they could have a command line 
> option which said whether to emit the bytes or not. But that is 
> optional.
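
A rough sketch of that "global option" idea in Python follows. EMIT_BOM
and encode_utf8 are hypothetical names chosen for this example, not part
of any existing tool; the default is no BOM, as suggested above.

```python
import codecs

# Hypothetical global default: do not write a BOM on UTF-8 output.
EMIT_BOM = False


def encode_utf8(text, emit_bom=None):
    """Encode text as UTF-8, prepending a BOM only when asked to.

    emit_bom=None means "use the global EMIT_BOM setting".
    """
    if emit_bom is None:
        emit_bom = EMIT_BOM
    payload = text.encode("utf-8")
    if emit_bom:
        return codecs.BOM_UTF8 + payload
    return payload
```

A command-line flag, if one is wanted, would only need to set EMIT_BOM
(or pass emit_bom=True) before any output is written.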

The whole issue is analogous to the CR/LF issue in ASCII texts across
different platforms.  Well-written software is able to READ the text
properly regardless of whether lines end in CR, LF, or CR/LF.
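
As an illustration of tolerant reading, the sketch below splits text on
any of the three line endings. The name split_lines is invented for this
example, and it assumes the text has already been decoded to a string.

```python
import re

# Match CR LF first so that a CR LF pair counts as one line break, not two.
_NEWLINE = re.compile(r"\r\n|\r|\n")


def split_lines(text):
    """Split text into lines, treating CR, LF, and CR LF all as line breaks."""
    return _NEWLINE.split(text)
```

Note that text ending in a line break yields an empty final element; a
caller that does not want it can drop it.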
