Working in a large organization whose product includes a large number of configuration 
and data files in text formats, I can report what we have found to work 
during development, localization, and release engineering, across multiple platforms.

We have eliminated UTF-16 text file formats in favor of UTF-8 because the standard Unix 
toolkit and other Unix-based tools handle UTF-16 poorly. On the other 
hand, the BOM on UTF-8 has been useful and has not caused problems in Unix tool 
processing, including pipe sequences. Raw concatenation of files, which would produce 
internal ZWNBSPs (a U+FEFF left in the middle of the stream), is not part of any of our 
processing as far as I know.
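To make the concatenation caveat concrete, here is a minimal Python sketch (my illustration, not part of our build tooling). A UTF-8 file saved with a signature starts with the bytes EF BB BF; joining two such files byte for byte, as `cat a b` would, leaves the second signature inside the stream, where it decodes as U+FEFF (ZWNBSP). The helpers `raw_concat`, `strip_bom`, and `concat_utf8` are hypothetical names chosen for this example.

```python
import codecs

BOM = codecs.BOM_UTF8  # b'\xef\xbb\xbf', the UTF-8 encoding of U+FEFF


def raw_concat(*chunks: bytes) -> bytes:
    """Byte-for-byte concatenation, like `cat a b`."""
    return b"".join(chunks)


def strip_bom(data: bytes) -> bytes:
    """Remove one leading UTF-8 signature, if present."""
    return data[len(BOM):] if data.startswith(BOM) else data


def concat_utf8(*chunks: bytes, keep_signature: bool = True) -> bytes:
    """Hypothetical BOM-aware concatenation: strip every input's signature,
    then optionally emit a single signature at the very start of the output."""
    body = b"".join(strip_bom(c) for c in chunks)
    return BOM + body if keep_signature else body


a = BOM + "alpha\n".encode("utf-8")
b = BOM + "beta\n".encode("utf-8")

# 'utf-8-sig' drops only a signature at the very start of the data,
# so the second file's signature survives as an internal U+FEFF.
raw = raw_concat(a, b).decode("utf-8-sig")
safe = concat_utf8(a, b).decode("utf-8-sig")

assert raw == "alpha\n\ufeffbeta\n"  # internal ZWNBSP
assert safe == "alpha\nbeta\n"       # clean join
```

The design choice here is to treat the signature as a property of the whole output file rather than of each input, which is why every input is stripped and at most one signature is written back.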

-----Original Message-----
From: David Starner [mailto:[EMAIL PROTECTED]] 
Sent: Thursday, November 07, 2002 12:14 PM
To: Markus Scherer
Cc: unicode
Subject: Re: Names for UTF-8 with and without BOM - pragmatic


On Wed, Nov 06, 2002 at 09:47:43AM -0800, Markus Scherer wrote:
> The fact is that Windows uses UTF-8 and UTF-16 plain text files with 
> signatures (BOMs) very simply, gracefully, and successfully. It has 
> applied what I called the "pragmatic" approach here for about 10 
> years. It just works.

It just works in an environment where relatively few documents are plain text and 
where pipes of text are not used as universal glue. C has been described as a 
(C)haracter processing language; whether or not that’s accurate, Awk and Perl 
certainly are. These are all Unix programming languages, and they are at the heart of what Unix 
is. The simple Unix program has a stream of text coming in and a stream of text going 
out, whereas the simple Windows program has a window. What works for Windows may very 
well not work for Unix. 
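A minimal Python sketch of the stream-in, stream-out filter shape, showing one way a UTF-8 signature can interfere (an illustration under assumed input, not something from the thread). The hypothetical filter keeps lines that start with `#`, much as `grep '^#'` would. If the input carries a BOM and is decoded as plain UTF-8, the first line begins with U+FEFF and is silently dropped; decoding with Python's `utf-8-sig` codec consumes a leading signature so the first line matches.

```python
import io
from typing import Iterable, Iterator, TextIO


def comment_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the lines that start with '#'."""
    for line in lines:
        if line.startswith("#"):
            yield line


def run_filter(src: TextIO, dst: TextIO) -> None:
    """Text stream in, text stream out: the shape of a simple Unix filter."""
    for line in comment_lines(src):
        dst.write(line)


# Hypothetical input: a small file saved with a UTF-8 signature.
data = "\ufeff# header\nvalue\n# note\n".encode("utf-8")

# Plain UTF-8 keeps the BOM as the first character, so '# header' is missed.
plain_out = io.StringIO()
run_filter(io.TextIOWrapper(io.BytesIO(data), encoding="utf-8"), plain_out)

# 'utf-8-sig' strips a leading BOM, so both comment lines are kept.
sig_out = io.StringIO()
run_filter(io.TextIOWrapper(io.BytesIO(data), encoding="utf-8-sig"), sig_out)

assert plain_out.getvalue() == "# note\n"
assert sig_out.getvalue() == "# header\n# note\n"
```

As an actual pipeline stage, the same `run_filter` could be fed `io.TextIOWrapper(sys.stdin.buffer, encoding="utf-8-sig")` and `sys.stdout`; whether every stage in a pipe makes that choice is exactly the coordination problem a signature introduces.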

-- 
David Starner - [EMAIL PROTECTED]
Great is the battle-god, great, and his kingdom--
A field where a thousand corpses lie. 
  -- Stephen Crane, "War is Kind"

