{{ But a BOM in every UTF-16 plain text file would make this completely 
hopeless. If we ever think we might want to do UNIX-style text processing on 
UTF-16, we have to resist that! }}

If you're going to take the trouble of making text tools 16-bit aware, then 
you can afford to make them BOM-aware too.
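As a rough illustration of what "BOM-aware" could mean for such a tool, here is a minimal Python sketch. The names `decode_text` and `count_lines` are hypothetical, and it assumes that input without a BOM is UTF-8. The tool looks at the first bytes, chooses the encoding from the BOM, and drops the BOM before doing any line-oriented work:

```python
import codecs

# Known BOMs and the codec that reads the bytes following each one.
# None of these three byte sequences is a prefix of another, so order is safe.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),        # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
)

def decode_text(data: bytes) -> str:
    """Decode bytes, honoring and removing a leading BOM if there is one."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(encoding)
    # Assumption: with no BOM, treat the data as UTF-8.
    return data.decode("utf-8")

def count_lines(data: bytes) -> int:
    """A 'wc -l'-style count that works the same for UTF-8 and UTF-16 input."""
    return len(decode_text(data).splitlines())
```

The BOM only has to be handled once, at the point where bytes become text. Everything downstream operates on decoded strings and never sees it.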

type a.txt b.txt c.txt > d.txt

on Windows 2000, assuming that they are all UTF-16 (each beginning with FFFE, as is usual in MS-Windows Unicode files), strips the redundant BOMs, so that d.txt has only the usual single FFFE at the beginning. So the BOM is not an immovable obstacle.
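The same behaviour is easy to get in a BOM-aware concatenation tool. Below is a minimal Python sketch that works on the files' bytes. The name `concat_utf16le` is hypothetical, and it assumes every input is UTF-16 little-endian (BOM bytes FF FE). It is not a description of how `type` is implemented:

```python
import codecs

BOM_LE = codecs.BOM_UTF16_LE  # the bytes FF FE

def concat_utf16le(chunks):
    """Join UTF-16LE byte strings so the result has exactly one leading BOM."""
    out = bytearray(BOM_LE)  # the single BOM for the combined output
    for data in chunks:
        # Drop each input's own BOM; only the one written above survives.
        if data.startswith(BOM_LE):
            data = data[len(BOM_LE):]
        out += data
    return bytes(out)
```

To mimic the `type` example, read a.txt, b.txt and c.txt in binary mode, pass their contents to `concat_utf16le`, and write the result to d.txt. One caveat: a file whose text really does begin with U+FEFF used as a zero-width no-break space would lose that character. That is the usual price of treating a leading FFFE as a signature.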

Concerning text files: nearly all the plain-text Unicode I've ever seen is in UTF-8. However, the ubiquitous MS Office documents, from Office 2000 onwards, all store their text in UTF-16 (little-endian, without a BOM).
