Re: Problem loading large pdf files

Lachezar Dobrev Wed, 30 Oct 2013 08:55:53 -0700

This smells of exception overwriting.
BaseParser.java:610 is actually a clean-up step (the stream close() in
your trace), and if that crashes it's quite possible that the original
error is lost.


I have a gut feeling that there is an OutOfMemoryError (OOME) somewhere
above that gets wiped out by the crashed clean-up procedure.
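
To illustrate what I mean by overwriting, here is a minimal standalone
sketch. It is not PDFBox code: FailingResource and both methods are
hypothetical names made up for the demo, and an IllegalStateException
stands in for whatever the real original error was. It only shows the
general Java behaviour: an exception thrown from a finally block
replaces the one already in flight, while try-with-resources (Java 7+)
keeps the original and attaches the clean-up failure as suppressed.

```java
public class MaskedExceptionDemo {

    /** Hypothetical resource whose close() fails, standing in for a clean-up step that crashes. */
    static class FailingResource implements AutoCloseable {
        @Override
        public void close() {
            throw new IndexOutOfBoundsException("clean-up failed");
        }
    }

    /** Classic try/finally: the exception thrown by close() replaces the original one. */
    public static Throwable withFinally() {
        try {
            FailingResource r = new FailingResource();
            try {
                throw new IllegalStateException("original failure");
            } finally {
                r.close(); // throws, so "original failure" is discarded
            }
        } catch (RuntimeException e) {
            return e;
        }
    }

    /** try-with-resources: the original exception survives, close() failure is suppressed. */
    public static Throwable withTryWithResources() {
        try (FailingResource r = new FailingResource()) {
            throw new IllegalStateException("original failure");
        } catch (RuntimeException e) {
            return e;
        }
    }

    public static void main(String[] args) {
        Throwable masked = withFinally();
        System.out.println("try/finally reports:        " + masked);

        Throwable kept = withTryWithResources();
        System.out.println("try-with-resources reports: " + kept);
        System.out.println("suppressed:                 " + kept.getSuppressed()[0]);
    }
}
```

In the first variant only the IndexOutOfBoundsException is visible to
the caller, which is the pattern I suspect in your stack trace.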

That said: did you give your application at least 2G of heap memory?
The amount is arbitrary, but I suspect it will require as a bare
minimum the size of the file and then some, and possibly even more
(for object references and other overhead). I would start with -Xmx2G.
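
If you are not sure the setting is actually reaching your JVM, a quick
check along these lines may help. This is only a sketch: HeapCheck and
hasHeadroom are names I made up, and the comparison encodes nothing
more than my rough "at least the size of the file" guess above, so
passing it does not guarantee the load will succeed.

```java
import java.io.File;

public class HeapCheck {

    /** Maximum heap the JVM will attempt to use, in bytes (reflects -Xmx). */
    public static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    /** Rough lower bound only (an assumption): the heap should at least exceed the file size. */
    public static boolean hasHeadroom(long fileBytes, long heapBytes) {
        return heapBytes > fileBytes;
    }

    public static void main(String[] args) {
        // Pass the PDF path as the first argument; without one the file size is treated as 0.
        long fileBytes = args.length > 0 ? new File(args[0]).length() : 0L;
        long heapBytes = maxHeapBytes();

        System.out.println("max heap (MiB):  " + heapBytes / (1024L * 1024L));
        System.out.println("file size (MiB): " + fileBytes / (1024L * 1024L));
        System.out.println("heap larger than file: " + hasHeadroom(fileBytes, heapBytes));
    }
}
```

Run it with the same options as your application, for example
java -Xmx2G HeapCheck c:\some.pdf, and compare the two numbers.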

2013/10/30 Brent Pathakis:
> Hi,
>
>   I'm trying to use PDFBox to load a large PDF document (>1 GB):
> [CODE]
> File inputPdf = new File("c:\\some.pdf");
> PDFTextStripper stop = new PDFTextStripper();
>
> FileInputStream fis = null;
> fis = new FileInputStream(inputPdf);
> pd = PDDocument.load(fis, true);
> [/CODE]
>
>   This code works fine for smaller PDFs, but on larger ones I'm getting:
>
>   org.apache.pdfbox.exceptions.WrappedIOException
> at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:245)
> at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1192)
> at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1159)
> at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1130)
> at PDFRedact.main(PDFRedact.java:19)
> Caused by: java.lang.IndexOutOfBoundsException: Index: 15625, Size: 15625
> at java.util.ArrayList.RangeCheck(Unknown Source)
> at java.util.ArrayList.get(Unknown Source)
> at org.apache.pdfbox.io.RandomAccessBuffer.seek(RandomAccessBuffer.java:84)
> at org.apache.pdfbox.io.RandomAccessFileOutputStream.write(RandomAccessFileOutputStream.java:106)
> at java.io.BufferedOutputStream.flushBuffer(Unknown Source)
> at java.io.BufferedOutputStream.flush(Unknown Source)
> at java.io.FilterOutputStream.close(Unknown Source)
> at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:610)
> at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:568)
> at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:188)
> ... 4 more
>
>
>    Any ideas or help would be appreciated.
>
> *Brent Pathakis*
> 801 536 0041
