[users] Help reqd on how OOo 2 stores Unicode in content.xml

Shriramana Sharma Sun, 01 Jan 2006 21:32:31 -0800

Hello.

Opening OOo (first Writer then Calc) I entered the Unicode sequence: 
 
0928 092e 0938 094d 0924 0947


(Devanagari script for namaste = "I bow to you" = greeting)

but I find that OOo (both Writer and Calc) stores it as the following sequence 
in content.xml - 
 
e0 a4 a8 e0 a4 ae e0 a4 b8 e0 a5 8d e0 a4 a4 e0 a5 87 
 
A friend told me that on Windows the bytes are stored in little-endian order, 
and sure enough Windows Notepad saved the following in a Unicode text file: 
 
28 09 2e 09 38 09 4d 09 24 09 47 09 
 
But I fail to see any connection between the above text and what OOo stored. 

Kate on Linux stored the text the same way as OOo:

e0 a4 a8 e0 a4 ae e0 a4 b8 e0 a5 8d e0 a4 a4 e0 a5 87

which is again different from the original Unicode sequence.

I observe that e0 occupies positions 1, 4, 7 etc, and the length of the Kate / 
OOo text in bytes (18) is exactly one and a half times that of the Notepad 
text (12). Apart from that, I fail to identify any pattern relating the Notepad 
text (original Unicode sequence) to the Kate / OOo text.
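
For what it is worth, both byte sequences can be reproduced from the same 
code points, assuming the Kate / OOo bytes are UTF-8 and the Notepad bytes are 
UTF-16 little-endian (with any leading byte-order mark left out). A minimal 
Python sketch under that assumption; the variable names are only illustrative:

```python
# Code points entered in Writer / Calc (Devanagari "namaste").
code_points = [0x0928, 0x092E, 0x0938, 0x094D, 0x0924, 0x0947]
text = "".join(chr(cp) for cp in code_points)

# Assumption: Kate and OOo write UTF-8, while Notepad's "Unicode"
# format is UTF-16 little-endian (byte-order mark not shown here).
utf8_hex = text.encode("utf-8").hex(" ")        # needs Python 3.8+
utf16le_hex = text.encode("utf-16-le").hex(" ")

print(utf8_hex)     # compare with the Kate / OOo bytes
print(utf16le_hex)  # compare with the Notepad bytes
```

If the assumption holds, the first line printed should match the content.xml 
bytes and the second the Notepad bytes.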

Can anyone please elucidate this situation and why OOo stores the Unicode text 
in a different way from the actual Unicode sequence? What is the exact 
algorithm OOo / Kate uses to do this change?
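
A possible answer, assuming the stored bytes are UTF-8: each code point from 
U+0800 to U+FFFF is split into 4 + 6 + 6 bits and packed into three bytes of 
the form 1110xxxx 10xxxxxx 10xxxxxx. The sketch below covers only that 
three-byte range, and the function name utf8_three_bytes is made up for 
illustration; it is not taken from OOo or Kate.

```python
def utf8_three_bytes(cp: int) -> bytes:
    """Encode one code point in U+0800..U+FFFF as three UTF-8 bytes."""
    # Surrogates (U+D800..U+DFFF) are not valid code points to encode.
    if not 0x0800 <= cp <= 0xFFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("outside the three-byte UTF-8 range")
    return bytes([
        0xE0 | (cp >> 12),          # 1110xxxx: top 4 bits
        0x80 | ((cp >> 6) & 0x3F),  # 10xxxxxx: middle 6 bits
        0x80 | (cp & 0x3F),         # 10xxxxxx: low 6 bits
    ])


# Example: the first code point of the sequence, U+0928.
print(utf8_three_bytes(0x0928).hex(" "))
```

Since every Devanagari code point lies in U+0900..U+097F, the top 4 bits are 
always 0000, which would explain why each group of three bytes starts with e0.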

I would like to write to an ODF file directly from an external application, 
which is why I ask.

Thanks.

P.S: First I sent this before subscribing, so it may have reached the 
moderator's desk. I apologise to him/her for the duplicate.

-- 

Penguin #395953 resides at http://samvit.org
subsisting on SUSE Linux 10.0 with KDE 3.5

