Hello,
I'm trying to use PDFBox to determine whether a PDF file is corrupted. We are
automating a process that loops through a given folder and counts how many of
the PDF files in it are corrupted. The program works fine in a Windows XP
environment (OS version: x86 Windows XP 5.1, Java version: Java HotSpot(tm)
Client VM 1.5.0-15-b04). When we ran the application on a UNIX box (OS
version: PA_RISC2.0 HP-UX B.11.23, Java version: Java HotSpot(tm) Client VM
1.5.0.11 jinteg:11.07.07-09:52 PA2.0(aCC_AP)), it threw the following error.
NOTE: This error does not happen every time. It is thrown only for some of
the PDF files. Those files are not corrupted: I can open them manually without
any problem.
java.io.EOFException: Unexpected end of ZLIB input stream
    at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:216)
    at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:134)
    at org.pdfbox.filter.FlateFilter.decode(FlateFilter.java:97)
    at org.pdfbox.cos.COSStream.doDecode(COSStream.java:290)
    at org.pdfbox.cos.COSStream.doDecode(COSStream.java:235)
    at org.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:170)
    at org.pdfbox.pdmodel.common.COSStreamArray.getUnfilteredStream(COSStreamArray.java:200)
    at org.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:101)
    at ProcessDefinitions.RunAuditProcess.RunAuditProcessGenerateAuditLogMessage.invoke(RunAuditProcessGenerateAuditLogMessage.java:212)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at com.tibco.plugin.java.JavaActivity.eval(JavaActivity.java:383)
    at com.tibco.pe.plugin.Activity.eval(Activity.java:209)
    at com.tibco.pe.core.TaskImpl.eval(TaskImpl.java:540)
    at com.tibco.pe.core.Job.a(Job.java:712)
    at com.tibco.pe.core.Job.k(Job.java:501)
    at com.tibco.pe.core.JobDispatcher$JobCourier.a(JobDispatcher.java:249)
    at com.tibco.pe.core.JobDispatcher$JobCourier.run(JobDispatcher.java:200)
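For reference, the exception at the top of this trace can be reproduced with
the JDK alone. A PDF FlateDecode stream carries zlib data, and
InflaterInputStream throws this EOFException when its input ends before the
zlib stream is complete. The sketch below only illustrates that behaviour; it
is not the PDFBox code path and not a diagnosis of the HP-UX failure. The
class TruncatedZlibDemo, its helper methods, and the sample data are made up
for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical demo class, written in Java 5 style to match the VM in this post.
class TruncatedZlibDemo {

    /** Builds made-up text standing in for a page content stream. */
    static byte[] sampleData() {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < 500; i++) {
            sb.append("BT /F1 12 Tf (line ").append(i).append(") Tj ET\n");
        }
        try {
            return sb.toString().getBytes("US-ASCII");
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Compresses data into zlib format. */
    static byte[] deflate(byte[] data) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            DeflaterOutputStream out = new DeflaterOutputStream(buffer);
            out.write(data);
            out.close(); // close() finishes the zlib stream
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return buffer.toByteArray();
    }

    /** Inflates the first 'length' bytes of 'compressed' and describes the outcome. */
    static String inflate(byte[] compressed, int length) {
        InputStream in = new InflaterInputStream(
                new ByteArrayInputStream(compressed, 0, length));
        byte[] chunk = new byte[1024];
        int total = 0;
        try {
            int n;
            while ((n = in.read(chunk)) != -1) {
                total += n;
            }
            return "inflated " + total + " bytes";
        } catch (EOFException e) {
            // Input ran out before the zlib stream was complete.
            return "EOFException: " + e.getMessage();
        } catch (IOException e) {
            return "IOException: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        byte[] compressed = deflate(sampleData());
        System.out.println("complete:  " + inflate(compressed, compressed.length));
        // Feed only the first half of the compressed bytes to simulate truncation.
        System.out.println("truncated: " + inflate(compressed, compressed.length / 2));
    }
}
```

If this is what is happening here, it may be worth checking whether the
affected files are byte-for-byte identical on both machines, for example by
comparing sizes or checksums.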
Here is a sample of the code I use for this task:
PDDocument document = PDDocument.load(inputStream); // inputStream: stream opened on the PDF file
List pages = document.getDocumentCatalog().getAllPages();
if (pages != null && pages.size() > 0) {
    // Try to parse the content stream of every page.
    for (int i = 0; i < pages.size(); i++) {
        PDPage page = (PDPage) pages.get(i);
        PDStream contents = page.getContents();
        PDFStreamParser parser = null;
        try {
            parser = new PDFStreamParser(contents.getStream());
        } catch (Exception e) {
            System.err.println("This PDF cannot be read. Most likely it is corrupted. "
                    + pdfFileName);
        }
    }
}
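The folder loop described at the top of this message could be sketched
roughly as follows. This is a hedged outline, not our actual code:
PdfFolderAudit, PdfCheck, and HeaderCheck are hypothetical names, and
HeaderCheck is only a stand-in that looks for the "%PDF-" file header so the
example runs without PDFBox. In the real process, the PDFBox parsing from the
snippet above would go inside PdfCheck.check.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch in Java 5 style; none of these names come from PDFBox.
class PdfFolderAudit {

    /** Hypothetical hook: throws if the file cannot be read as a PDF. */
    interface PdfCheck {
        void check(File pdf) throws Exception;
    }

    /** Stand-in check (an assumption, not PDFBox): verifies only the "%PDF-" header. */
    static class HeaderCheck implements PdfCheck {
        public void check(File pdf) throws Exception {
            InputStream in = new FileInputStream(pdf);
            try {
                byte[] head = new byte[5];
                int n = in.read(head);
                if (n < 5 || !"%PDF-".equals(new String(head, "US-ASCII"))) {
                    throw new IOException("missing %PDF- header");
                }
            } finally {
                in.close();
            }
        }
    }

    /** Runs the check on every *.pdf file in the folder; returns the names that failed. */
    static List<String> findUnreadable(File folder, PdfCheck check) {
        List<String> failed = new ArrayList<String>();
        File[] files = folder.listFiles();
        if (files == null) {
            return failed; // not a directory, or not readable
        }
        for (int i = 0; i < files.length; i++) {
            File f = files[i];
            if (!f.isFile() || !f.getName().toLowerCase().endsWith(".pdf")) {
                continue;
            }
            try {
                check.check(f);
            } catch (Exception e) {
                // Log the exception type so an EOFException stands out from other failures.
                System.err.println(f.getName() + ": " + e);
                failed.add(f.getName());
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: PdfFolderAudit <folder>");
            return;
        }
        List<String> failed = findUnreadable(new File(args[0]), new HeaderCheck());
        System.out.println(failed.size() + " unreadable PDF file(s): " + failed);
    }
}
```

Catching the exception per file keeps one bad file from stopping the whole
run, and logging the exception class makes it easier to tell a truncated
Flate stream apart from other kinds of failure.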
Could somebody shed some light on this one?
Thank you.