Hi,

We are using POI 3.5 beta 6 in production to extract text from Office documents. We came across a Word document that is not extracted completely: the extracted text is missing a couple of paragraphs from the middle of the document.

Here is a link to the document:
http://www.mediafire.com/?sharekey=2e6a7badb4ab32e07f7ec40ad

This is the code we use to extract the text:

    WordExtractor we = new WordExtractor(new FileInputStream(args[0]));
    BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(outputFile), "UTF-8"));
    bw.write(we.getText().replaceAll("\n", System.getProperty("line.separator")));
    bw.flush();
    bw.close();

This is a major production problem for us, so please respond as soon as possible.

Thanks!
