On Thu, 10 Aug 2006, Nir Nußbaum wrote:
I am trying to extract plain text from Word documents (to index into Lucene).
I did:
    org.apache.poi.hwpf.extractor.WordExtractor we =
        new org.apache.poi.hwpf.extractor.WordExtractor(is);
    bodyText = we.getText();
*snip*
    at org.apache.poi.hdf.extractor.WordDocument.<init>(WordDocument.java:193)
Which are you using, hdf or hwpf? Your snippet uses
org.apache.poi.hwpf, but the stack trace comes from org.apache.poi.hdf.
You will probably have more luck with hwpf than hdf.
My best guess, though, is that these aren't Word documents. Try opening them
in Word and see what they really are (e.g. RTF).
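If you have many files, you can check this programmatically before handing
the stream to POI. The sketch below relies only on well-known file
signatures: binary .doc files are OLE2 compound documents whose first eight
bytes are D0 CF 11 E0 A1 B1 1A E1, while RTF files begin with the text
"{\rtf". The class WordFormatSniffer and its methods are hypothetical
illustrations, not part of POI. Note that an OLE2 header only shows the file
is a compound document (Excel and PowerPoint files share it), not that it is
necessarily a Word file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper (not a POI class): guesses a file's format from its
// leading "magic" bytes.
final class WordFormatSniffer {
    enum Format { OLE2, RTF, UNKNOWN }

    // OLE2 compound document signature, used by binary .doc (and .xls, .ppt).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, (byte) 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, (byte) 0x1A, (byte) 0xE1
    };
    // RTF files start with the ASCII text {\rtf
    private static final byte[] RTF_MAGIC =
        "{\\rtf".getBytes(StandardCharsets.US_ASCII);

    private WordFormatSniffer() {}

    // True if data begins with every byte of prefix.
    static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    // Classifies an already-read header (at least 8 bytes for OLE2 detection).
    static Format detect(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) return Format.OLE2;
        if (startsWith(header, RTF_MAGIC)) return Format.RTF;
        return Format.UNKNOWN;
    }

    // Reads up to 8 bytes from the stream and classifies them.
    // Consumes those bytes: open a fresh stream before passing it to POI.
    static Format detect(InputStream in) throws IOException {
        byte[] buf = new byte[OLE2_MAGIC.length];
        int n = 0;
        while (n < buf.length) {
            int r = in.read(buf, n, buf.length - n);
            if (r < 0) break; // end of stream: file shorter than 8 bytes
            n += r;
        }
        return detect(Arrays.copyOf(buf, n));
    }
}
```

Files that come back as RTF (or UNKNOWN) can then be skipped or routed to a
different extractor instead of WordExtractor. Because detect(InputStream)
consumes the first bytes, either reopen the file afterwards or wrap the
stream in a BufferedInputStream and use mark()/reset() around the check.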
Nick