On Thu, 10 Aug 2006, Nir Nußbaum wrote:
I am trying to extract pure text from Word (to index into Lucene):
I did:
      org.apache.poi.hwpf.extractor.WordExtractor we =
          new org.apache.poi.hwpf.extractor.WordExtractor(is);
      bodyText = we.getText();

*snip*

      at org.apache.poi.hdf.extractor.WordDocument.<init>(WordDocument.java:193)

Which are you using, hdf or hwpf? The code you posted uses hwpf, but the stack trace comes from hdf. You will probably have more luck with hwpf than with hdf.


My best guess, though, is that these aren't Word documents. Try opening them in Word and see what they really are (e.g. RTF).
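
If opening every file in Word by hand is impractical, a rough check of the first few bytes can tell the two cases apart. The following is only a sketch, not part of POI: the class name SniffFormat and its methods are made up for illustration. It relies on two facts: binary .doc files are OLE2 compound files, which start with the bytes D0 CF 11 E0 A1 B1 1A E1, and RTF files start with the text {\rtf. Note that an OLE2 header only says the container is right; it could still be an Excel or PowerPoint file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper (not a POI class): guesses a file's real format
// from its leading bytes.
public class SniffFormat {
    // Signature at the start of every OLE2 compound file (binary .doc, .xls, .ppt).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // RTF files are plain text beginning with "{\rtf".
    private static final byte[] RTF_MAGIC =
        "{\\rtf".getBytes(StandardCharsets.US_ASCII);

    // Returns "ole2", "rtf", or "unknown" for the given leading bytes.
    static String sniff(byte[] head) {
        if (startsWith(head, OLE2_MAGIC)) return "ole2";
        if (startsWith(head, RTF_MAGIC)) return "rtf";
        return "unknown";
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    // Usage: java SniffFormat file1.doc file2.doc ...
    public static void main(String[] args) throws IOException {
        for (String name : args) {
            byte[] head = new byte[8];
            int n = 0;
            try (InputStream in = Files.newInputStream(Paths.get(name))) {
                int r;
                // Read up to 8 bytes; stop early at end of file.
                while (n < head.length
                        && (r = in.read(head, n, head.length - n)) > 0) {
                    n += r;
                }
            }
            System.out.println(name + ": " + sniff(Arrays.copyOf(head, n)));
        }
    }
}
```

Files reported as "rtf" or "unknown" are the ones I would not expect either hdf or hwpf to read.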

Nick