On Thu, 2 Feb 2012, Lawrence Tsang wrote:
> As a newbie of Apache POI, I use the "org.apache.poi.hwpf.Word2Forrest" class to extract text in a MS Word 2003 document.

I wouldn't recommend using that class for text extraction, unless you really need the output to come out in the Forrest format.

Instead, you should use one of:
 * org.apache.poi.hwpf.extractor.WordExtractor
 * org.apache.poi.hwpf.converter.WordToTextConverter (or WordToHtmlConverter or WordToFoConverter)
 * Apache Tika

Which one to pick depends on whether you want plain text, clean HTML, HTML with full document styling, etc.
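
For the plain text case, a minimal sketch using WordExtractor might look something like the following. The class name ExtractWordText, the extractText helper and the command line argument are made up for illustration; it assumes a binary .doc file (not .docx) and that the POI jar and the scratchpad jar, which holds HWPF, are on the classpath.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.poi.hwpf.extractor.WordExtractor;

// Hypothetical helper class, not part of POI itself.
class ExtractWordText {

    /** Reads a binary .doc file and returns its text as a single string. */
    static String extractText(String path) throws IOException {
        // FileInputStream fails with an IOException if the file is missing
        try (InputStream in = new FileInputStream(path)) {
            // WordExtractor reads the whole document from the stream
            WordExtractor extractor = new WordExtractor(in);
            return extractor.getText();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: ExtractWordText <file.doc>");
            return;
        }
        System.out.println(extractText(args[0]));
    }
}
```

If you need the text one paragraph at a time rather than as one big string, WordExtractor also offers getParagraphText(), which returns a String array.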

Nick
