Thanks. Do you have an example of code using the scratch file?

On Oct 30, 2013 9:30 AM, "Gilad Denneboom" <[email protected]> wrote:
> Try using a scratch file in the load method of PDDocument.
>
>
> On Wed, Oct 30, 2013 at 3:48 PM, Brent Pathakis <[email protected]> wrote:
>
> > Hi,
> >
> > I'm trying to use PDFBox to load a large PDF document (>1 GB):
> >
> > [CODE]
> > File inputPdf = new File("c:\\some.pdf");
> > PDFTextStripper stop = new PDFTextStripper ();
> >
> > FileInputStream fis=null;
> > fis=new FileInputStream(inputPdf);
> > pd = PDDocument.load(fis,true);[/CODE]
> >
> > This code works fine for smaller PDFs, but on larger ones I'm getting:
> >
> > org.apache.pdfbox.exceptions.WrappedIOException
> > at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:245)
> > at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1192)
> > at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1159)
> > at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1130)
> > at PDFRedact.main(PDFRedact.java:19)
> > Caused by: java.lang.IndexOutOfBoundsException: Index: 15625, Size: 15625
> > at java.util.ArrayList.RangeCheck(Unknown Source)
> > at java.util.ArrayList.get(Unknown Source)
> > at org.apache.pdfbox.io.RandomAccessBuffer.seek(RandomAccessBuffer.java:84)
> > at org.apache.pdfbox.io.RandomAccessFileOutputStream.write(RandomAccessFileOutputStream.java:106)
> > at java.io.BufferedOutputStream.flushBuffer(Unknown Source)
> > at java.io.BufferedOutputStream.flush(Unknown Source)
> > at java.io.FilterOutputStream.close(Unknown Source)
> > at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:610)
> > at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:568)
> > at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:188)
> > ... 4 more
> >
> >
> > Any ideas or help would be appreciated.
> >
> > *Brent Pathakis*
> > 801 536 0041
> >
>

