Hello:

 

I'm a brand-new PDFBox user with a problem. Text extraction logs the following error:

 

   Aug 12, 2011 11:57:52 AM org.apache.pdfbox.filter.FlateFilter decode
   SEVERE: Stop reading corrupt stream

 

I've found these possibly related issues:

 

   https://issues.apache.org/jira/browse/PDFBOX-872   [resolved]
   https://issues.apache.org/jira/browse/PDFBOX-697   [unresolved; mentioned as a possible duplicate of 872]

 

I'm using PDFBox 1.6 on Windows XP with Java 7. The two PDF documents in
question are both PDF version 1.4. One was created by the PDFComplete
plugin for Microsoft Word 2010. The other was created by OpenOffice 3.x
Writer from the original Word .docx file.

 

The comments in the issues above seem to concern encryption/decryption,
but these documents have not been encrypted (unless the producing tools
do so implicitly). Both files can be viewed in Adobe Acrobat Reader
without a password.
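
As a rough cross-check on the encryption question, something like the sketch below could be run against both files. It does not use PDFBox at all: it only scans the raw bytes for the /Encrypt name, which an encrypted PDF carries in its trailer dictionary. The class and method names (EncryptKeyCheck, mentionsEncrypt) are made up for illustration. The check is only a heuristic: a hit is a hint rather than proof, since the same bytes could occur elsewhere in the file, and it assumes the trailer is stored uncompressed, which should hold for PDF 1.4 files.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of PDFBox.
class EncryptKeyCheck {

    // Returns true if the raw file bytes contain the "/Encrypt" name anywhere.
    static boolean mentionsEncrypt(Path pdf) throws IOException {
        byte[] bytes = Files.readAllBytes(pdf);
        // ISO-8859-1 maps every byte to exactly one char, so binary data
        // survives the conversion and can be searched as a string.
        String raw = new String(bytes, StandardCharsets.ISO_8859_1);
        return raw.contains("/Encrypt");
    }

    public static void main(String[] args) throws IOException {
        // Usage: java EncryptKeyCheck first.pdf second.pdf
        for (String name : args) {
            System.out.println(name + ": " + mentionsEncrypt(Paths.get(name)));
        }
    }
}
```

If neither file reports a match, encryption is probably not the cause here and the issues above may not apply.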

 

The code in question looks like this:

 

    String result = null;
    try ( FileInputStream fis = new FileInputStream( file ); ) {
      PDFParser parser = new PDFParser( fis );
      parser.parse();
      COSDocument cd = parser.getDocument();
      PDDocument  pd = new PDDocument( cd );
      cd.close();
      PDFTextStripper stripper = new PDFTextStripper();
      result = stripper.getText( pd );
      pd.close();
    }
    catch ( FileNotFoundException ex ) { ...
    }
    catch ( IOException ex ) { ...
    }

 

Any hints or suggestions on how to proceed?

 

Thanks.
