I want to store Unicode characters in a Word document, but when I store Russian
characters, only "?" is displayed (these characters do exist in Unicode).
I think the characters are correctly encoded as Unicode, because they are
displayed correctly when I print them to stdout.
This sample gets the text from the document and prints it to stdout:
System.out.println("#########");
TextPiece piece;
Iterator textPieces =
    mydoc_output.getTextTable().getTextPieces().iterator();
String text1;
StringBuffer buffer = new StringBuffer();
// Decode the raw bytes of every text piece as UTF-16LE and concatenate them.
while (textPieces.hasNext()) {
    piece = (TextPiece) textPieces.next();
    try {
        text1 = new String(piece.getRawBytes(), "UTF-16LE");
        buffer.append(text1);
    } catch (UnsupportedEncodingException e) {
        throw new InternalError("Standard Encoding " + "UTF-16LE" +
            " not found, JVM broken");
    }
}
text1 = buffer.toString();
System.out.println(text1);
System.out.println("+#+#+#+#+#+");
e.g.
#########
ﻱﺑẬ
"April"
"Апрель"
+#+#+#+#+#+
When I then add text1 to the range, I get only "?" for the Russian characters.
--begin output word doc
???
"April"
"??????"
-- end word doc
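
To rule out a console or font effect, the string can be checked by dumping its
code points before it is handed to the range. This is only a minimal sketch
using the JDK (Java 8 or later is assumed); the class and method names are made
up for illustration and are not part of POI:

```java
public class CodePointDump {
    // Returns the code points of s as space-separated U+XXXX values.
    public static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // "Апрель" written with escapes so the source file encoding does not matter.
        String april = "\u0410\u043F\u0440\u0435\u043B\u044C";
        System.out.println(dump(april));    // Cyrillic code points, U+04xx
        System.out.println(dump("??????")); // real question marks, U+003F
    }
}
```

If the dump of text1 shows values in the U+04xx range, the Java string itself is
fine and the characters are lost later, when the document is written.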
dops
> -----Original Message-----
> From: MSB [mailto:[email protected]]
> Sent: Friday, 22 January 2010 15:16
> To: [email protected]
> Subject: Re: AW: AW: how to set character encoding in new doc file
>
>
> Hello Andreas,
>
> I think that Nick is referring to explicitly encoding the
> Strings using the required character encoding; there
> are constructors for the java.lang.String class that allow
> you to specify the character encoding of the bytes you
> obtain from the String you have read.
>
> Remember that HWPF is still very immature as an API and it
> could well be that the sort of thing you are asking for has
> not yet been included. The best long term solution may be to
> join the development team and contribute.
>
> Yours
>
> Mark B
>
>
> Doppelhofer Andreas wrote:
> >
> > I use HWPFDocument(...) to read the document. When I print the string
> > (some text in the doc) to stdout/stderr, all characters are displayed
> > correctly, but when I write it to a new doc file, all Russian
> > characters are stored as "?".
> >
> > This works:
> > System.out.println(line);
> >
> > This does not work (seen after opening the file in Word):
> > range.insertAfter(line);
> >
> > dops
> >
> >> -----Original Message-----
> >> From: Nick Burch [mailto:[email protected]]
> >> Sent: Friday, 22 January 2010 11:20
> >> To: POI Users List
> >> Subject: Re: AW: how to set character encoding in new doc file
> >>
> >> On Fri, 22 Jan 2010, Doppelhofer Andreas wrote:
> >> > Can anybody help me with this problem?
> >>
> >> Word (plus Excel, PowerPoint etc.) can store strings as Unicode or
> >> non-Unicode. POI works only with Java Unicode strings, and handles
> >> reading and writing the strings to the appropriate kinds of bytes for
> >> you.
> >>
> >> Make sure you're correctly passing your strings as Unicode into Java,
> >> converting the encoding as needed.
> >>
> >> Nick
> >>
> >>
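
The advice quoted above (decode bytes with an explicit charset, pass Java
Unicode strings on) can be sketched with plain JDK calls. The second method
shows one way a "?" can arise: encoding Cyrillic text with a single-byte
charset that cannot represent it. Whether HWPF does something like this when
writing is an assumption and has not been verified; the class and method names
are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class ExplicitCharset {
    // Decodes raw bytes with an explicitly named charset instead of
    // relying on the platform default.
    public static String decodeUtf16le(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_16LE);
    }

    // Counts the '?' bytes produced when s is encoded as ISO-8859-1.
    // String.getBytes substitutes the charset's replacement byte for
    // every character the charset cannot represent.
    public static int questionMarks(String s) {
        int n = 0;
        for (byte b : s.getBytes(StandardCharsets.ISO_8859_1)) {
            if (b == '?') {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        String april = "\u0410\u043F\u0440\u0435\u043B\u044C"; // "Апрель"
        byte[] raw = april.getBytes(StandardCharsets.UTF_16LE);
        // Round trip through UTF-16LE keeps the text intact.
        System.out.println(decodeUtf16le(raw).equals(april));
        // Every Cyrillic letter is unmappable in ISO-8859-1.
        System.out.println(questionMarks(april));
    }
}
```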