Hi all,
I have a question about getting the character encoding (ASCII,
Unicode, ISO-8859-5, ...) of each character in a Word document.
The following code snippet extracts the text and encodes it into a byte
buffer using a "hard-coded" charset.
Is there a way to get the correct character encoding dynamically?
Say the first character "a" is ISO-8859-1, the second is a Russian
character (e.g. ISO-8859-5), and so on.
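import java.io.FileInputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.Paragraph;
import org.apache.poi.hwpf.usermodel.Range;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

POIFSFileSystem fs = new POIFSFileSystem(new FileInputStream("test.doc"));
HWPFDocument mydoc = new HWPFDocument(fs);
Range myrange = mydoc.getRange();
for (int i = 0; i < myrange.numParagraphs(); i++) {
    Paragraph myparagraph = myrange.getParagraph(i);
    String mytext = myparagraph.text();
    Charset charset = Charset.forName("ISO-8859-5"); // "hard coded" :-(
    // turning text into bytes needs an encoder, not a decoder
    CharsetEncoder encoder = charset.newEncoder();
    // encode() throws CharacterCodingException on characters the charset cannot represent
    ByteBuffer bbuf = encoder.encode(CharBuffer.wrap(mytext));
    // do something with bbuf
}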
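
To make clearer what I am after: as far as I understand, text() hands back
a plain Java String, which is already Unicode (UTF-16), so the original
per-character encoding is probably not visible at this level. What I could
do myself is ask, for each character, which charset from a list is able to
encode it. Below is a rough sketch of that idea using only the JDK. The
class name CharsetGuess, the method firstCharsetFor, the candidate list and
the UTF-8 fallback are all my own assumptions for illustration, not anything
POI provides.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

class CharsetGuess {
    // Candidate charsets, tried in this order (an assumption for illustration).
    static final String[] CANDIDATES = {"US-ASCII", "ISO-8859-1", "ISO-8859-5"};

    // Returns the name of the first candidate charset that can encode c,
    // or "UTF-8" as a fallback when none of the candidates can.
    // Works on single UTF-16 chars, so surrogate pairs are not handled here.
    static String firstCharsetFor(char c) {
        for (String name : CANDIDATES) {
            // Skip charsets this JVM does not ship.
            if (!Charset.isSupported(name)) {
                continue;
            }
            CharsetEncoder encoder = Charset.forName(name).newEncoder();
            if (encoder.canEncode(c)) {
                return name;
            }
        }
        return "UTF-8";
    }

    public static void main(String[] args) {
        // 'a', a-umlaut (U+00E4), Cyrillic capital ZHE (U+0416)
        String text = "a\u00e4\u0416";
        for (char c : text.toCharArray()) {
            System.out.println(c + " -> " + firstCharsetFor(c));
        }
    }
}
```

This only tells me which charsets *could* represent a character, not which
one Word actually used when the file was saved, and that is the part I am
still missing.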
Thx dops
--
Salomon Automation GmbH - Friesachstrasse 15 - A-8114 Friesach bei Graz
Sitz der Gesellschaft: Friesach bei Graz
UID-NR:ATU28654300 - Firmenbuchnummer: 49324 K
Firmenbuchgericht: Landesgericht für Zivilrechtssachen Graz