Hi,
I'm not changing the text, just reading it. My problem occurs whenever
there is a TextCharsAtom, because the platform I am using doesn't
support Unicode, only ISO-8859-1. So I had to change the code, replacing
UTF-16LE with ISO-8859-1.
So I think I have no choice but to show the text without styles.
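One way to avoid the doubled length is to decode the two-byte pairs by hand rather than substituting ISO-8859-1 for UTF-16LE. The helper below is a hypothetical sketch, not part of HSLF. It assumes the raw TextCharsAtom bytes are UTF-16LE (as explained later in the thread), that the JVM on the target platform still has 16-bit `char` even if it lacks a UTF-16LE charset, and that the style runs count characters rather than bytes. Because it emits exactly one char per two bytes, offsets into the result should stay aligned with the style information; anything outside ISO-8859-1 is replaced with '?'.

```java
// Hypothetical helper (not an HSLF API): decodes UTF-16LE bytes without
// relying on the platform's charset support.
public class Utf16LeToLatin1 {

    /**
     * Returns one char per 16-bit code unit, so character offsets match the
     * original text. Code units above 0xFF (including each half of a
     * surrogate pair) are replaced with '?'.
     */
    public static String decode(byte[] raw) {
        if (raw.length % 2 != 0) {
            throw new IllegalArgumentException("UTF-16LE data must have an even length");
        }
        StringBuilder sb = new StringBuilder(raw.length / 2);
        for (int i = 0; i < raw.length; i += 2) {
            // Little-endian: the low-order byte comes first.
            int unit = (raw[i] & 0xFF) | ((raw[i + 1] & 0xFF) << 8);
            sb.append(unit <= 0xFF ? (char) unit : '?');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "Hi" followed by U+20AC (euro sign), encoded as UTF-16LE
        byte[] raw = {0x48, 0x00, 0x69, 0x00, (byte) 0xAC, 0x20};
        String text = decode(raw);
        System.out.println(text + " (" + text.length() + " chars)"); // Hi? (3 chars)
    }
}
```

Mapping unsupported characters to '?' rather than dropping them is deliberate: dropping would shift every later offset and break the style runs again.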
Thanks a lot,
--
Tales Paiva
Nick Burch wrote:
On Tue, 5 Dec 2006, Tales Paiva Nogueira wrote:
When PowerPoint stores text in Unicode, an unknown char (byte value =
0) is placed between every "normal" char, making the text twice as
long as it really is.
TextCharsAtoms, and other Unicode-containing fields in PowerPoint
files, are stored as UTF-16. That means two bytes are used to store
every character. US-ASCII (indeed anything in ISO-8859-1) will be stored
with the second byte zero, but characters beyond that range need the
second byte.
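To see the byte layout concretely, the sketch below encodes a couple of strings as UTF-16LE using only the standard library (the little-endian byte order is the assumption here; no HSLF classes are involved). ASCII letters, and in fact every character up to U+00FF, come out with a zero second byte, which is where the apparent "garbage" zeros come from; a character such as U+20AC (euro sign) uses both bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf16LeBytes {

    // UTF-16LE encoding: two bytes per BMP character, low-order byte first.
    // StandardCharsets.UTF_16LE writes no byte-order mark.
    public static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        // ASCII: the second byte of each pair is zero
        System.out.println(Arrays.toString(encode("Hi")));     // [72, 0, 105, 0]
        // U+20AC needs the second byte: 0xAC, 0x20 (0xAC prints as -84)
        System.out.println(Arrays.toString(encode("\u20AC"))); // [-84, 32]
    }
}
```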
If you call getText() on a TextCharsAtom, it'll convert it to a string
for you. You should really be using that, not getting the bytes directly.
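On a JVM that does provide the UTF-16LE charset, the decoding step is a one-liner. The sketch below is assumed to be roughly equivalent to what getText() does for a TextCharsAtom (decode the bytes as UTF-16LE); it is plain Java, not the HSLF implementation. Four bytes become two characters, with no zero "garbage" left over, so character offsets line up with the text.

```java
import java.nio.charset.StandardCharsets;

public class DecodeTextChars {

    // Decodes UTF-16LE bytes (the TextCharsAtom layout) into a Java String.
    public static String decode(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        byte[] raw = {0x48, 0x00, 0x69, 0x00};
        String text = decode(raw);
        System.out.println(text + " " + text.length()); // Hi 2
    }
}
```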
Is there any way to keep the style information and get the text as a
TextBytesAtom, instead of TextCharsAtom?
Why? PowerPoint decided to make it a TextCharsAtom, rather than a
TextBytesAtom, since your string contained at least one character that
couldn't be represented in a TextBytesAtom.
HSLF supports upgrading a TextBytesAtom to a TextCharsAtom if you try
to set text that can't be held in a TextBytesAtom. It doesn't do the
other way around.
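The decision of whether text can live in a TextBytesAtom comes down to whether every character fits in one byte. The check below is a hypothetical sketch, assuming a TextBytesAtom holds one byte per character in the U+0000..U+00FF (ISO-8859-1) range; it is not the HSLF method that performs the upgrade, just an illustration of the test involved.

```java
public class TextAtomChooser {

    /**
     * Hypothetical check: true if every character is in U+0000..U+00FF,
     * i.e. the text could be stored one byte per character.
     */
    public static boolean fitsInBytesAtom(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false; // needs a TextCharsAtom
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(fitsInBytesAtom("caf\u00E9"));  // true
        System.out.println(fitsInBytesAtom("100 \u20AC")); // false
    }
}
```

A single character outside that range is enough to force the Unicode atom, which matches the behaviour described above.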
If you really want just the low-order bytes, call getText() on the
TextCharsAtom, and mangle the string yourself. Not sure why you'd want
to, though...
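"Mangling the string yourself" could look like the sketch below: take the String returned by getText() and keep one byte per character. The helper name is hypothetical, and replacing characters above U+00FF with '?' is an arbitrary choice for this sketch; a raw `(byte) c` cast would instead silently turn them into unrelated Latin-1 characters. Since the output has exactly one byte per character, offsets should still match the original text.

```java
import java.util.Arrays;

public class LowOrderBytes {

    // Hypothetical helper: one byte per character; characters that do not
    // fit in ISO-8859-1 (above U+00FF) become '?'.
    public static byte[] toLowBytes(String text) {
        byte[] out = new byte[text.length()];
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            out[i] = (byte) (c <= 0xFF ? c : '?');
        }
        return out;
    }

    public static void main(String[] args) {
        // 'H' = 72, 'i' = 105, euro sign replaced by '?' = 63
        System.out.println(Arrays.toString(toLowBytes("Hi\u20AC"))); // [72, 105, 63]
    }
}
```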
Nick
Yegor Kozlov wrote:
Hi,
Could you provide a test case?
As I understand it, you did something like this:
- take a PPT file containing text
- programmatically change the text using the HSLF API
- save the file
- the style information is wrong after saving.
Is that correct?
Yegor
TPN> Hi List,
TPN> When PowerPoint stores text in Unicode, an unknown char (byte value =
TPN> 0) is placed between every "normal" char, making the text twice as long
TPN> as it really is. I can ignore these garbage chars, but then I lose the text
TPN> style information, as its indexes are based on the original Unicode
TPN> text with all that Unicode trash. :(
TPN> Is there any way to keep the style information and get the text as a
TPN> TextBytesAtom, instead of TextCharsAtom?
TPN> Thank you very much.
TPN> --
TPN> Tales Paiva
---------------------------------------------------------------------
Mailing List: http://jakarta.apache.org/site/mail2.html#poi
The Apache Jakarta Poi Project: http://jakarta.apache.org/poi/