Re: PRODUCING and DESCRIBING UTF-8 with and without BOM

Edward H Trager Mon, 04 Nov 2002 09:50:48 -0800

Hi, everyone,

It's almost unbelievable to me how many email postings are wasted on
discussions such as this UTF-8 BOM issue ... I guess it means that there
is a lot of BADLY WRITTEN software out there in the world ;-)


With regard to READING incoming UTF-8 text streams, surely any good
software designer will do exactly as Michael Michka has suggested here:

> INCOMING TEXT: Trivial to simply check. I say (once again) its THREE
> BYTES.
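
For what it's worth, here is a minimal sketch of that three-byte check in
Python. The function name strip_utf8_bom is my own invention for
illustration; codecs.BOM_UTF8 is the standard library's constant for the
bytes EF BB BF.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        # Skip exactly the three BOM bytes and keep everything after them.
        return data[len(codecs.BOM_UTF8):]
    # No BOM at the start: pass the bytes through untouched.
    return data
```

Python's built-in "utf-8-sig" codec performs the same skip when decoding, so
in practice data.decode("utf-8-sig") would probably do the job as well.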

With regard to EMITTING outgoing UTF-8 text streams, IMHO the default
should be to do what is simplest, which is *not* to output the BOM.  It is
superfluous on UTF-8 streams.  There's no harm in having a global option
to turn BOM output on for the benefit of BRAIN-DEAD programs that are
going to read the text:

> EMITTING: They could simply choose globally whether to emit the BOM or not.
> If they wanted to get "fancy" they could have a command line option which
> said whether to emit the bytes or not. But that is optional.
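
A sketch of such an option, again in Python and again with made-up names
(encode_utf8 and its emit_bom parameter are hypothetical, not taken from any
particular program). The default is the simple case, no BOM:

```python
import codecs


def encode_utf8(text: str, emit_bom: bool = False) -> bytes:
    """Encode text as UTF-8, prepending the BOM only when explicitly requested."""
    data = text.encode("utf-8")
    if emit_bom:
        # Opt-in only: prepend EF BB BF for readers that insist on a signature.
        return codecs.BOM_UTF8 + data
    return data
```

A command-line program could wire a flag of its choosing straight to emit_bom
and leave the rest of its output code unchanged.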

The whole issue is analogous to the CRLF issue in ASCII texts across
different platforms.  Well-written software is able to READ the text
properly regardless of whether lines end in CR, LF, or CRLF.
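
To make the analogy concrete, here is a rough Python sketch of a reader that
accepts all three line endings. The name split_lines_any_eol is hypothetical,
and this is only one simple way to do it:

```python
def split_lines_any_eol(text: str) -> list:
    """Split text into lines, treating CR, LF, and CRLF all as line endings."""
    # Collapse CRLF first so the pair is not counted as two separate line breaks,
    # then turn any remaining lone CR into LF.
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    return normalized.split("\n")
```

I spell out the replacements rather than use str.splitlines() because that
method also breaks on several other characters (form feed, U+2028, and so
on), which is more than the CR/LF case calls for.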
