Hello,
I am using the org.apache.poi.hwpf.extractor.WordExtractor class to
extract the text from MS Word documents. The problem is that the output
includes not only the text of interest but also keywords from Word field
codes, e.g. TOC, HYPERLINK, REF. Is there any way to recognize and exclude
these keywords?
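For illustration, here is a minimal sketch of one way to drop field instructions, assuming the extracted text still contains Word's field delimiter characters: 0x13 (field begin), 0x14 (separator) and 0x15 (field end). Word stores a field as instruction text such as HYPERLINK "..." between the begin and separator marks, followed by the displayed result up to the end mark. Whether WordExtractor's output keeps these markers depends on the POI version. Newer POI releases also provide a static Range.stripFields(String) helper for this purpose, so check the Javadoc of the version you use. The class and method names below (FieldCodeStripper, stripFieldCodes) are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public final class FieldCodeStripper {
    // Control characters Word's binary format uses to delimit fields.
    private static final char FIELD_BEGIN = '\u0013';
    private static final char FIELD_SEPARATOR = '\u0014';
    private static final char FIELD_END = '\u0015';

    private FieldCodeStripper() {}

    /** Removes field instructions and delimiters, keeping each field's displayed result. */
    public static String stripFieldCodes(String text) {
        StringBuilder out = new StringBuilder(text.length());
        // One entry per open field: TRUE while still inside its instruction part.
        Deque<Boolean> inInstruction = new ArrayDeque<>();
        // Number of open fields currently in their instruction part.
        int hiddenDepth = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == FIELD_BEGIN) {
                inInstruction.push(Boolean.TRUE);
                hiddenDepth++;
            } else if (c == FIELD_SEPARATOR) {
                // Switch the innermost field from instruction to result text.
                if (!inInstruction.isEmpty() && inInstruction.peek()) {
                    inInstruction.pop();
                    inInstruction.push(Boolean.FALSE);
                    hiddenDepth--;
                }
            } else if (c == FIELD_END) {
                // A field with no separator has no result; its text stays hidden.
                if (!inInstruction.isEmpty() && inInstruction.pop()) {
                    hiddenDepth--;
                }
            } else if (hiddenDepth == 0) {
                // Keep text only when no enclosing field is in its instruction part.
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "See \u0013 HYPERLINK \"http://example.com\" \u0014the site\u0015 for details.";
        System.out.println(stripFieldCodes(raw)); // prints: See the site for details.
    }
}
```

The stack handles nested fields, such as a REF whose instruction contains another field. Text is emitted only when every open field has already passed its separator.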
I used the getIstd() method of org.apache.poi.hwpf.model.PAPX to
access the sti codes of individual paragraphs. However, I did not find a
similar class or method that can be applied to individual words.
Any help is much appreciated.
Regards,
Leila