Re: Encoding issues

Joseph Kesselman 3 Dec 2002 13:59:41 -0000

UTF-8 can represent any Unicode character... but it does so by turning 
some of them into multiple-byte sequences, and in order to do so it has to 
reserve the byte values above 0x7F for that purpose. If your files use those 
bytes as single-byte characters in their own right, decoding them as UTF-8 
will fail. See the UTF-8 RFC for more detail; it's not hard to find with a 
web search.
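
To make the point about reserved bytes concrete, here is a small Python sketch (not part of the original discussion; the helper names utf8_bytes and is_valid_utf8 are made up for illustration). It shows that ASCII characters stay single bytes, that other characters become sequences of bytes above 0x7F, and that a lone high byte such as an ISO-8859-1 e-acute is rejected by a UTF-8 decoder.

```python
# Illustrative sketch only: helper names are hypothetical.

def utf8_bytes(ch):
    """Return the UTF-8 encoding of a string as a list of hex byte strings."""
    return ["0x{:02X}".format(b) for b in ch.encode("utf-8")]


def is_valid_utf8(data):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# ASCII characters are unchanged: one byte, below 0x80.
print(utf8_bytes("A"))

# U+00E9 (e-acute) becomes a two-byte sequence; both bytes are above 0x7F.
print(utf8_bytes("\u00e9"))

# U+20AC (euro sign) becomes a three-byte sequence.
print(utf8_bytes("\u20ac"))

# In ISO-8859-1 the single byte 0xE9 means e-acute. On its own it is not a
# complete UTF-8 sequence, so a UTF-8 decoder rejects the data.
latin1_data = b"caf\xe9"
print(is_valid_utf8(latin1_data))

# The same text, properly encoded as UTF-8, decodes without trouble.
print(is_valid_utf8("caf\u00e9".encode("utf-8")))
```

The failing case is the one that typically bites: a file written in a single-byte encoding but read as though it were UTF-8.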


There is probably an encoding that would work for your files -- but you'll 
have to determine what it is and explicitly specify it.
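
As a rough sketch of what "determine and explicitly specify" might look like in practice, the Python below tries a list of candidate encodings and then decodes with one named explicitly. The function name encodings_that_decode, the sample bytes, and the candidate list are all assumptions for illustration. Note that a clean decode does not prove an encoding is the right one (ISO-8859-1, for instance, accepts any byte sequence), so the result still has to be checked by eye against the expected text.

```python
# Illustrative sketch only: the function name and sample data are hypothetical.

def encodings_that_decode(data, candidates):
    """Return the candidate encodings that decode `data` without error.

    This narrows the field; it does not identify the correct encoding.
    """
    ok = []
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        ok.append(name)
    return ok


# Sample bytes standing in for the contents of a problem file.
sample = b"caf\xe9"

# UTF-8 is ruled out; the single-byte encodings remain as possibilities.
print(encodings_that_decode(sample, ["utf-8", "iso-8859-1", "cp1252"]))

# Once the encoding is known, name it explicitly instead of relying on a
# UTF-8 default.
text = sample.decode("iso-8859-1")
print(text == "caf\u00e9")
```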

______________________________________
Joe Kesselman  / IBM Research

