Hi all! I extract the text from MS Word documents using this code:
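import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.CharacterRun;
import org.apache.poi.hwpf.usermodel.Paragraph;
import org.apache.poi.hwpf.usermodel.Range;
import org.apache.poi.hwpf.usermodel.Section;

// stream (input over the .doc file) and output are opened elsewhere
HWPFDocument wdoc = new HWPFDocument(stream);
Range r = wdoc.getRange();
for (int x = 0; x < r.numSections(); x++) {
    Section s = r.getSection(x);
    for (int y = 0; y < s.numParagraphs(); y++) {
        Paragraph p = s.getParagraph(y);
        for (int z = 0; z < p.numCharacterRuns(); z++) {
            // character run
            CharacterRun run = p.getCharacterRun(z);
            // character run text
            String text = run.text();
            // note: getBytes() uses the platform default charset
            byte[] b1 = text.getBytes();
            // show us the text
            output.write(b1);
        }
    }
}
output.close();
stream.close();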
The problem is that I also get text from Word's internal field information,
for example the hyperlink and page-reference fields, like this:
"4.1- Introducción PAGEREF _Toc142772733 \h 31
HYPERLINK \l "_Toc142772734" 4.2- Apple webobjects PAGEREF _Toc142772734
\h 32"
Can you suggest a solution?
Thanks in advance.
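
One possible direction is sketched below. It assumes the raw text carries the standard Word field marker characters: 0x13 where a field begins, 0x14 between the field code and its displayed result, and 0x15 where the field ends. The class and method names (FieldCodeStripper, stripFieldCodes) are hypothetical helpers for illustration, not part of POI. The helper drops the field code (the "PAGEREF _Toc... \h" part) and keeps only the displayed result.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper (not part of POI): removes Word field codes from raw
// text while keeping each field's displayed result.
public final class FieldCodeStripper {
    // Assumed marker characters in the raw text stream.
    private static final char FIELD_BEGIN = 0x13;
    private static final char FIELD_SEPARATOR = 0x14;
    private static final char FIELD_END = 0x15;

    private FieldCodeStripper() {}

    public static String stripFieldCodes(String text) {
        StringBuilder out = new StringBuilder(text.length());
        // One entry per open field: true while we are still in its code part.
        Deque<Boolean> openFields = new ArrayDeque<>();
        // Number of open fields whose code part we are currently inside.
        int codeDepth = 0;

        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == FIELD_BEGIN) {
                openFields.push(Boolean.TRUE);
                codeDepth++;
            } else if (c == FIELD_SEPARATOR) {
                // Switch the innermost field from its code part to its result.
                if (!openFields.isEmpty() && openFields.peek()) {
                    openFields.pop();
                    openFields.push(Boolean.FALSE);
                    codeDepth--;
                }
            } else if (c == FIELD_END) {
                // A field with no separator ends while still in its code part.
                if (!openFields.isEmpty() && openFields.pop()) {
                    codeDepth--;
                }
            } else if (codeDepth == 0) {
                // Ordinary text, or the result of a field: keep it.
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Because a field's begin and end markers may fall in different character runs, it is probably safer to apply this to the whole paragraph text (p.text()) than to each run.text() separately.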