POI WordExtractor Not Extracting Entire Document

Ahmed, Sana R (IS) Mon, 24 Aug 2009 13:01:48 -0700

Hi.
 
We are using POI 3.5 beta 6 in production to extract text from Office
documents.  We came across a document that was not extracted completely: the
extracted text is missing a couple of paragraphs from the middle of the
document.
 
Here is a link to the document.  
http://www.mediafire.com/?sharekey=2e6a7badb4ab32e07f7ec40ad
 
The following is the snippet of code we are using to extract the document.
 
   WordExtractor we = new WordExtractor(new FileInputStream(args[0]));
   BufferedWriter bw = new BufferedWriter(
       new OutputStreamWriter(new FileOutputStream(outputFile), "UTF-8"));
   bw.write(we.getText().replaceAll("\n", System.getProperty("line.separator")));
   bw.flush();
   bw.close();
 
This is a major production problem, so please respond as soon as possible.  
 
Thanks!
