On Monday 02 January 2006 02:36, Shriramana Sharma wrote:
> Hello.
>
> Opening OOo (first Writer then Calc) I entered the Unicode sequence:
>
> 0928 092e 0938 094d 0924 0947
>
> (Devanagari script for namaste = "I bow to you" = greeting)
>
> but I find that OOo (both Writer and Calc) stores it as the following
> sequence in content.xml -
>
> e0 a4 a8 e0 a4 ae e0 a4 b8 e0 a5 8d e0 a4 a4 e0 a5 87

You need to distinguish between the code point of a Unicode character (the number you typed in) and the encoding used to transport it between applications (or to store it, send it to a printer etc.)

The internal representation on Linux is typically 32 bits per character (wchar_t in glibc); I'm guessing that OOo does something similar, though I haven't checked. The Unicode Standard itself (maintained by the Unicode Consortium in step with ISO/IEC 10646, not by the W3C) now defines code points from U+0000 to U+10FFFF, which needs 21 bits. Windows works in 16-bit units, so anything above U+FFFF has to be represented there as a pair of surrogates (UTF-16).

To store or transmit this data the encoding is by default UTF-8, which does not depend on the endian-ness of the implementation. It is a stream of bytes in which the high bits of each byte are flag bits and the remaining bits carry the data (roughly speaking): a lead byte says how many bytes the character occupies, and each following byte starts with the bits 10.

For the Devanagari part of the spectrum (U+0900 to U+097F), UTF-8 needs three bytes per character, and you can see this in the sequence you gave by dividing it into triplets. The first triplet, e0 a4 a8, is simply U+0928. So OOo is storing exactly what you typed.

You might find these useful:

ftp://ftp.ilog.fr/pub/Users/haible/utf8/Unicode-HOWTO.html

and

http://www.alanwood.net/unicode/

or, as you are talking about Linux, try

man 7 utf-8

Andy.
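
The triplet structure described above can be checked with a few lines of Python. This is only a rough sketch and not something from the original thread: the names code_points, encode_3byte and triplets are made up for illustration, and the hand-rolled encoder deliberately covers only the three-byte range U+0800 to U+FFFF (surrogates are not filtered out, since the code points used here are nowhere near them).

```python
# Code points from the original message, as typed into OOo.
code_points = [0x0928, 0x092E, 0x0938, 0x094D, 0x0924, 0x0947]


def encode_3byte(cp):
    """Hand-rolled UTF-8 for the three-byte range only (U+0800..U+FFFF).

    Layout: 1110xxxx 10xxxxxx 10xxxxxx, where the x bits are the
    16 bits of the code point, most significant first.
    """
    if not 0x0800 <= cp <= 0xFFFF:
        raise ValueError("this sketch only handles U+0800..U+FFFF")
    return bytes([
        0xE0 | (cp >> 12),          # flag bits 1110 + top 4 data bits
        0x80 | ((cp >> 6) & 0x3F),  # flag bits 10 + middle 6 data bits
        0x80 | (cp & 0x3F),         # flag bits 10 + low 6 data bits
    ])


# One triplet per character, formatted as in content.xml.
triplets = [
    " ".join(f"{b:02x}" for b in encode_3byte(cp)) for cp in code_points
]

# The same result using Python's built-in encoder.
text = "".join(chr(cp) for cp in code_points)
utf8_hex = " ".join(f"{b:02x}" for b in text.encode("utf-8"))

for cp, triplet in zip(code_points, triplets):
    print(f"U+{cp:04X} -> {triplet}")
print(utf8_hex)
```

The last line printed should be the byte sequence quoted from content.xml, and each per-character line shows which triplet belongs to which code point.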
