Hello,

I'm trying to use PDFBox to determine whether a PDF file is corrupted. We are automating a process that loops through a given folder and counts how many of the PDF files in it are corrupted. The program works fine in a Windows XP environment (OS version: x86 Windows XP 5.1; Java version: Java HotSpot(tm) Client VM 1.5.0-15-b04). When we run the same application on a UNIX box (OS version: PA_RISC2.0 HP-UX B.11.23; Java version: Java HotSpot(tm) Client VM 1.5.0.11 jinteg:11.07.07-09:52 PA2.0(aCC_AP)), it throws the following error:

NOTE: The error does not happen every time; it is thrown only for some of the PDF files. Those files are not corrupted: I can open them manually and they display fine.

java.io.EOFException: Unexpected end of ZLIB input stream
        at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:216)
        at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:134)
        at org.pdfbox.filter.FlateFilter.decode(FlateFilter.java:97)
        at org.pdfbox.cos.COSStream.doDecode(COSStream.java:290)
        at org.pdfbox.cos.COSStream.doDecode(COSStream.java:235)
        at org.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:170)
        at org.pdfbox.pdmodel.common.COSStreamArray.getUnfilteredStream(COSStreamArray.java:200)
        at org.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:101)
        at ProcessDefinitions.RunAuditProcess.RunAuditProcessGenerateAuditLogMessage.invoke(RunAuditProcessGenerateAuditLogMessage.java:212)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at com.tibco.plugin.java.JavaActivity.eval(JavaActivity.java:383)
        at com.tibco.pe.plugin.Activity.eval(Activity.java:209)
        at com.tibco.pe.core.TaskImpl.eval(TaskImpl.java:540)
        at com.tibco.pe.core.Job.a(Job.java:712)
        at com.tibco.pe.core.Job.k(Job.java:501)
        at com.tibco.pe.core.JobDispatcher$JobCourier.a(JobDispatcher.java:249)
        at com.tibco.pe.core.JobDispatcher$JobCourier.run(JobDispatcher.java:200)
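
The first frames of the trace show where the exception originates: PDFBox's FlateFilter.decode reads the content stream through java.util.zip.InflaterInputStream, and InflaterInputStream.fill throws this EOFException when the underlying input runs out before the zlib stream is complete. The self-contained sketch below uses only the JDK (no PDFBox) to reproduce the same exception by cutting the 4-byte checksum off a zlib stream, and, for comparison, decodes the same bytes with java.util.zip.Inflater while keeping whatever was inflated before the input ran out. The class and method names are made up for illustration, and whether the affected files (or the streams the program receives on the HP-UX box) are actually truncated like this is an assumption the sketch does not establish.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;
import java.util.zip.InflaterInputStream;

final class TruncatedFlateDemo {

    /** Compresses data in zlib format, the format used by /FlateDecode streams. */
    static byte[] deflate(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Strict decoding via InflaterInputStream: throws EOFException if the input ends early. */
    static byte[] inflateStrict(byte[] compressed) throws IOException {
        InflaterInputStream in = new InflaterInputStream(new ByteArrayInputStream(compressed));
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            in.close();
        }
    }

    /** Lenient decoding via Inflater: keeps whatever was inflated before the input ran out. */
    static byte[] inflateLenient(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0) {
                    // Needs more input (truncated data) or a preset dictionary: stop here
                    break;
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) throws Exception {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            sb.append("BT /F1 12 Tf (line ").append(i).append(") Tj ET\n");
        }
        byte[] original = sb.toString().getBytes("US-ASCII");
        byte[] compressed = deflate(original);
        // Drop the 4-byte Adler-32 trailer to simulate a stream that ends early
        byte[] truncated = Arrays.copyOf(compressed, compressed.length - 4);

        try {
            inflateStrict(truncated);
            System.out.println("strict: decoded without error");
        } catch (EOFException e) {
            System.out.println("strict: " + e);
        }
        byte[] partial = inflateLenient(truncated);
        System.out.println("lenient: recovered " + partial.length + " of " + original.length + " bytes");
    }
}
```

A tolerant reader of this kind is one possible reason a file can display in a viewer while a strict decoder rejects it, but that remains a hypothesis here, not something the trace proves.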

Here is a sample of the code I use for the check:

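PDDocument document = null;
try {
    // input: the InputStream for the PDF being checked (not shown here)
    document = PDDocument.load(input);
    List pages = document.getDocumentCatalog().getAllPages();

    // Check the content stream of every page
    for (int i = 0; i < pages.size(); i++) {
        PDPage page = (PDPage) pages.get(i);
        PDStream contents = page.getContents();
        if (contents == null) {
            continue; // page without a content stream: nothing to decode
        }
        try {
            // The constructor decodes (inflates) the page's content stream,
            // which is where the EOFException above is thrown
            PDFStreamParser parser = new PDFStreamParser(contents.getStream());
        } catch (Exception e) {
            System.err.println("This PDF cannot be read. It is most likely corrupted: "
                    + pdfFileName);
            break; // one unreadable page is enough to flag the file
        }
    }
} finally {
    if (document != null) {
        document.close();
    }
}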
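
For the folder-level part of the process (loop through a folder and count how many PDF files fail), a small harness can keep the per-file check separate from the looping and counting. This is a sketch only: PdfFolderAudit, PdfCheck, HEADER_CHECK and hasPdfHeader are hypothetical names, and the built-in check merely verifies that a file begins with the %PDF- header, which is much weaker than decoding the content streams as in the snippet above. The intent is that a PDFBox-based check would be supplied as another PdfCheck implementation. It uses only the JDK and avoids lambdas and @Override on interface methods, so it should also compile on Java 5.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

final class PdfFolderAudit {

    /** Hypothetical hook: returns true if the file can be read. A PDFBox-based check would go here. */
    interface PdfCheck {
        boolean isReadable(File pdf);
    }

    /** Placeholder check (assumption): only verifies that the file starts with "%PDF-". */
    static final PdfCheck HEADER_CHECK = new PdfCheck() {
        public boolean isReadable(File pdf) {
            return hasPdfHeader(pdf);
        }
    };

    static boolean hasPdfHeader(File pdf) {
        byte[] expected = {'%', 'P', 'D', 'F', '-'};
        InputStream in = null;
        try {
            in = new FileInputStream(pdf);
            for (int i = 0; i < expected.length; i++) {
                if (in.read() != expected[i]) {
                    return false; // wrong byte, or file shorter than the header
                }
            }
            return true;
        } catch (IOException e) {
            return false; // treat files that cannot be opened as failures
        } finally {
            if (in != null) {
                try {
                    in.close();
                } catch (IOException ignored) {
                    // nothing useful to do if close fails
                }
            }
        }
    }

    /** Returns the sorted names of the *.pdf files in folder that fail the check. */
    static List<String> findUnreadable(File folder, PdfCheck check) {
        List<String> failed = new ArrayList<String>();
        File[] files = folder.listFiles();
        if (files == null) {
            return failed; // not a directory, or it could not be listed
        }
        for (File f : files) {
            boolean isPdf = f.getName().toLowerCase(Locale.ENGLISH).endsWith(".pdf");
            if (f.isFile() && isPdf && !check.isReadable(f)) {
                failed.add(f.getName());
            }
        }
        Collections.sort(failed);
        return failed;
    }

    public static void main(String[] args) {
        File folder = new File(args.length > 0 ? args[0] : ".");
        List<String> failed = findUnreadable(folder, HEADER_CHECK);
        System.out.println(failed.size() + " unreadable PDF file(s) in " + folder);
        for (String name : failed) {
            System.out.println("  " + name);
        }
    }
}
```

Pass the folder path as the first argument; with no argument it checks the current directory.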

Could somebody shed some light on this one?

Thank you.
