Thanks Nick,
As you see from:
org.apache.poi.hwpf.extractor.WordExtractor
I used hwpf.
It is Word, not RTF, albeit from Word 2000.
By the way, I tried to convert with AbiWord command-line and it all went well, more or less.
I am attaching one of the documents that cannot be converted. Thanks again.
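
For anyone who wants to double-check the file type before handing a file to POI, here is a minimal sketch that looks at the first few bytes. It relies on two facts: OLE2 compound files (the container used by binary .doc files) start with the bytes D0 CF 11 E0 A1 B1 1A E1, and RTF files start with the text "{\rtf". The class name FormatSniffer and the method sniff are made up for this example and are not part of POI. An "OLE2" result only says the container is right, not that the file is definitely a Word document.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper (not part of POI): guesses a file's format from its leading bytes.
public class FormatSniffer {
    // Signature at the start of every OLE2 compound file (binary .doc, .xls, ...).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // RTF files begin with the ASCII text "{\rtf".
    private static final byte[] RTF_MAGIC = {'{', '\\', 'r', 't', 'f'};

    // Returns "OLE2", "RTF", or "UNKNOWN" for the given leading bytes.
    public static String sniff(byte[] head) {
        if (startsWith(head, OLE2_MAGIC)) return "OLE2";
        if (startsWith(head, RTF_MAGIC)) return "RTF";
        return "UNKNOWN";
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java FormatSniffer <file>");
            return;
        }
        byte[] head = new byte[8];
        int total = 0;
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            // read() may return fewer bytes than requested, so loop until full or EOF.
            while (total < head.length) {
                int n = in.read(head, total, head.length - total);
                if (n < 0) break;
                total += n;
            }
        }
        System.out.println(sniff(Arrays.copyOf(head, total)));
    }
}
```

If this reports RTF for a file with a .doc extension, that would support Nick's guess; if it reports OLE2, the problem is more likely inside the document itself.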

 
2006/8/10, Nick Burch <[EMAIL PROTECTED]>:
On Thu, 10 Aug 2006, Nir Nußbaum wrote:
> I am trying to extract pure text from Word (to index into Lucene):
> I did:
>     org.apache.poi.hwpf.extractor.WordExtractor we =
>         new org.apache.poi.hwpf.extractor.WordExtractor(is);
>     bodyText = we.getText();

*snip*

>       at org.apache.poi.hdf.extractor.WordDocument.<init>(WordDocument.java:193)

Which are you using, hdf or hwpf? You will probably have more luck with
hwpf than hdf.


My best guess, though, is that these aren't Word documents. Try opening them
in Word and see what they really are (e.g. RTF).

Nick


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
Mailing List:     http://jakarta.apache.org/site/mail2.html#poi
The Apache Jakarta Poi Project:   http://jakarta.apache.org/poi/




--
----------------------------------------------
Nir Nußbaum