Henry,
 
There are a few things you can try.
 
1. Take a look at org.textmining's Word text extractor:

org.textmining.text.extraction.WordExtractor

All you have to do is this (inputStream is an InputStream on the .doc file):

String text = new WordExtractor().extractText(inputStream);

2. There is also the POI extractor:

org.apache.poi.hdf.extractor.WordDocument

All you do is (where "is" is an InputStream on the .doc file):

WordDocument wd = new WordDocument(is);
StringWriter docTextWriter = new StringWriter();
PrintWriter out = new PrintWriter(docTextWriter);
wd.writeAllText(out);
out.flush();
String text = docTextWriter.toString();

3. I'd also check out the following:

org.semanticdesktop.aperture.extractor.word.WordExtractor

The Javadoc is here: http://aperture.sourceforge.net/doc/javadoc/index.html
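
Since you asked for the text line by line: the extractors in 1 and 2 give you the document text in one piece, so you would split it yourself afterwards. Here is a minimal sketch using only the JDK. It assumes the extracted text separates paragraphs with \r, \n, or \r\n, which you should verify against your own documents. The class and method names (ExtractedTextLines, toLines) are made up for illustration, and the sample string only stands in for real extractor output.

```java
import java.util.ArrayList;
import java.util.List;

class ExtractedTextLines {

    // Splits already-extracted document text into non-blank, trimmed lines.
    // Assumption: paragraphs are separated by \r\n, \r, or \n.
    static List<String> toLines(String text) {
        List<String> lines = new ArrayList<>();
        for (String line : text.split("\r\n|\r|\n")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                lines.add(trimmed);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Stand-in for the String produced by one of the extractors above.
        String text = "First paragraph\r\n\r\nSecond paragraph\rThird paragraph\n";
        for (String line : toLines(text)) {
            System.out.println(line);
        }
    }
}
```

Blank lines are dropped here because empty paragraphs are rarely useful for plain text; remove the isEmpty() check if you need them.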

Hope this helps,
- Dmitry


________________________________

From: Henry Lu [mailto:[EMAIL PROTECTED]
Sent: Thu 5/17/2007 1:19 PM
To: poi-user@jakarta.apache.org
Subject: reading MS word file



Is there an example or code for reading an MS Word file as text, line by line?
All I am interested in is the text, regardless of format, style, font...

-Henry

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
Mailing List:     http://jakarta.apache.org/site/mail2.html#poi
The Apache Jakarta Poi Project:  http://jakarta.apache.org/poi/


