On 24 Apr 2002, at 14:38, Jungshik Shin wrote:

> We don't expect text tools
> to work on files in UTF-16 the same way as we would expect them to work
> on files in UTF-8 or other ASCII-compatible encodings.

   But it might well be desirable to have UNIX-like tools that work on UTF-16  
files, in a way analogous to the way that the existing tools work with ASCII. 
The underlying philosophy of the UNIX toolset can clearly be applied with equal 
success in a world where "plain text" is UTF-16 everywhere:

      cat16 f1 f2 f3 f4 | sort16 | uniq16 | sed16 '....' > f5

   As we see, we need different versions of all the text tools. This is 
inconvenient, but not an insurmountable problem. (Maybe they could even be 
derived from the same source code as the 8-bit varieties. Maybe some future 
system will have *only* 16-bit text tools.)
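
   As a rough illustration only, the pipeline above could be modeled as
functions over byte strings. This is a minimal sketch, not an existing
toolset: the names cat16, sort16 and uniq16 are the hypothetical ones from
the example, and the code assumes BOM-less little-endian UTF-16 input.

```python
ENCODING = "utf-16-le"  # assumption: BOM-less, little-endian UTF-16


def cat16(*files: bytes) -> bytes:
    """Concatenate byte streams unchanged, exactly as cat does."""
    return b"".join(files)


def sort16(data: bytes) -> bytes:
    """Sort lines by code point (a simplification; no locale collation)."""
    lines = data.decode(ENCODING).splitlines(keepends=True)
    return "".join(sorted(lines)).encode(ENCODING)


def uniq16(data: bytes) -> bytes:
    """Drop adjacent duplicate lines, as uniq does."""
    out = []
    for line in data.decode(ENCODING).splitlines(keepends=True):
        if not out or out[-1] != line:
            out.append(line)
    return "".join(out).encode(ENCODING)


f1 = "b\na\n".encode(ENCODING)
f2 = "a\nc\n".encode(ENCODING)
result = uniq16(sort16(cat16(f1, f2)))
```

   Note that cat16 does nothing encoding-specific at all; it only works
because the inputs are assumed to carry no per-file marker.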

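   But a BOM in every UTF-16 plain text file would make this completely 
hopeless: a tool like cat16 would leave a BOM in the middle of its output 
wherever one file ends and the next begins. If we ever think we might want to 
do UNIX-style text processing on UTF-16, we have to resist that!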
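
   The problem can be sketched in a few lines. This is only an illustration
under stated assumptions (little-endian UTF-16, byte-wise concatenation as
cat would do it); make_file and stray_boms are hypothetical helper names,
not part of any real tool.

```python
import codecs

BOM = codecs.BOM_UTF16_LE  # the bytes FF FE, i.e. U+FEFF in little-endian


def make_file(text: str, with_bom: bool) -> bytes:
    """Build the contents of a UTF-16LE 'file', optionally BOM-prefixed."""
    body = text.encode("utf-16-le")
    return BOM + body if with_bom else body


def stray_boms(stream: bytes) -> int:
    """Count U+FEFF characters anywhere after the first character."""
    text = stream.decode("utf-16-le")
    return text[1:].count("\ufeff")


# Byte-wise concatenation, as a cat-style tool would perform it.
plain = make_file("b\n", False) + make_file("a\n", False)
marked = make_file("b\n", True) + make_file("a\n", True)

plain_strays = stray_boms(plain)    # no marker ends up mid-stream
marked_strays = stray_boms(marked)  # the second file's BOM is now mid-stream
```

   Away from the start of a stream, U+FEFF is no longer a signature but an
ordinary character, so every downstream tool (sort16, uniq16, sed16) would
have to know about it and strip or skip it.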

        /|
 o o o (_|/
        /|
       (_/
