Hi all! I extract the text from MS Word documents using this code:

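import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.CharacterRun;
import org.apache.poi.hwpf.usermodel.Paragraph;
import org.apache.poi.hwpf.usermodel.Range;
import org.apache.poi.hwpf.usermodel.Section;

HWPFDocument wdoc = new HWPFDocument(stream);
Range r = wdoc.getRange();

// Walk sections -> paragraphs -> character runs and write out each run's text
for (int x = 0; x < r.numSections(); x++) {
    Section s = r.getSection(x);
    for (int y = 0; y < s.numParagraphs(); y++) {
        Paragraph p = s.getParagraph(y);
        for (int z = 0; z < p.numCharacterRuns(); z++) {
            CharacterRun run = p.getCharacterRun(z);
            String text = run.text();
            // getBytes() encodes with the platform default charset
            output.write(text.getBytes());
        }
    }
}
output.close();
stream.close();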

The problem is that the output also contains Word's internal field
information, for example hyperlink and page-reference codes like this:

   "4.1- Introducción PAGEREF _Toc142772733 \h 31
HYPERLINK \l "_Toc142772734" 4.2- Apple webobjects PAGEREF _Toc142772734
\h 32"
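
One direction that might work (a sketch only, not tested against HWPF): as
far as I know, Word stores a field in the text stream between the control
characters 0x13 (field begin), 0x14 (separator between the field code and its
displayed result) and 0x15 (field end). Assuming run.text() passes those
characters through unchanged, the codes could be filtered out roughly like
this. The class and method names are made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper, not part of POI.
final class FieldCodeFilter {
    private static final char FIELD_BEGIN = 0x13;     // field code follows
    private static final char FIELD_SEPARATOR = 0x14; // displayed result follows
    private static final char FIELD_END = 0x15;       // field is closed

    /** Drops field codes and marker characters, keeps field results. */
    static String stripFieldCodes(String text) {
        StringBuilder out = new StringBuilder(text.length());
        // One entry per open field: true while still inside its code part.
        Deque<Boolean> openFields = new ArrayDeque<>();
        int hidden = 0; // number of open fields currently in their code part

        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == FIELD_BEGIN) {
                openFields.push(Boolean.TRUE);
                hidden++;
            } else if (c == FIELD_SEPARATOR) {
                // Innermost field switches from code part to result part.
                if (!openFields.isEmpty() && openFields.peek()) {
                    openFields.pop();
                    openFields.push(Boolean.FALSE);
                    hidden--;
                }
            } else if (c == FIELD_END) {
                // A field without a separator ends while still in its code part.
                if (!openFields.isEmpty() && openFields.pop()) {
                    hidden--;
                }
            } else if (hidden == 0) {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Since a field can span several character runs, I suppose the filter would
have to be applied to the concatenated text (or keep its state between runs)
rather than to each run separately.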


Can anyone suggest a solution?

Thanks in advance.
