Re: Problem loading large pdf files

Gilad Denneboom Wed, 30 Oct 2013 10:00:07 -0700

I used this code on one occasion:

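        // Needs java.io.File, org.apache.pdfbox.io.RandomAccess,
        // org.apache.pdfbox.io.RandomAccessFile and
        // org.apache.pdfbox.pdmodel.PDDocument.
        String tmpFilePath =
                System.getProperty("java.io.tmpdir") + File.separator + "scratch.tmp";
        File tmpFile = new File(tmpFilePath);
        // Remove any scratch file left over from a previous run.
        if (tmpFile.exists())
            tmpFile.delete();
        RandomAccess scratchFile = new RandomAccessFile(tmpFile, "rw");

        // filePath is the path of the PDF to load.
        PDDocument doc = PDDocument.load( filePath, scratchFile );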
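
One caveat with the fixed name scratch.tmp: two runs at the same time would
end up sharing one scratch file. A minimal sketch of a way around that, using
only the JDK, is to let java.nio.file.Files pick a unique name. The class
name ScratchFiles, the method name createScratchFile and the
"pdfbox-scratch-" prefix are made up for illustration and are not part of
PDFBox.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

class ScratchFiles {
    /**
     * Creates an empty, uniquely named scratch file in the default
     * temporary directory (java.io.tmpdir). The prefix and suffix are
     * arbitrary choices for this sketch.
     */
    static File createScratchFile() throws IOException {
        File scratch = Files.createTempFile("pdfbox-scratch-", ".tmp").toFile();
        // Best-effort cleanup when the JVM shuts down normally.
        scratch.deleteOnExit();
        return scratch;
    }
}
```

The returned File could then be passed to new RandomAccessFile(..., "rw") in
place of tmpFile. Closing the document and deleting the scratch file
explicitly once processing is done is probably still worthwhile, since
deleteOnExit only runs on a normal JVM shutdown.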



On Wed, Oct 30, 2013 at 5:31 PM, Brent Pathakis <[email protected]> wrote:

> Thanks. Do you have an example of code using the scratch file?
> On Oct 30, 2013 9:30 AM, "Gilad Denneboom" <[email protected]>
> wrote:
>
> > Try using a scratch file in the load method of PDDocument.
> >
> >
> > On Wed, Oct 30, 2013 at 3:48 PM, Brent Pathakis <[email protected]>
> > wrote:
> >
> > > Hi,
> > >
> > >   I'm trying to use PDFBox to load a large PDF document (>1 GB):
> > >
> > >     File inputPdf = new File("c:\\some.pdf");
> > >     PDFTextStripper stop = new PDFTextStripper();
> > >
> > >     FileInputStream fis = null;
> > >     fis = new FileInputStream(inputPdf);
> > >     pd = PDDocument.load(fis, true);
> > >
> > >   This code works fine for smaller PDFs, but on larger ones I'm getting:
> > >
> > >   org.apache.pdfbox.exceptions.WrappedIOException
> > > at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:245)
> > > at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1192)
> > > at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1159)
> > > at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1130)
> > > at PDFRedact.main(PDFRedact.java:19)
> > > Caused by: java.lang.IndexOutOfBoundsException: Index: 15625, Size: 15625
> > > at java.util.ArrayList.RangeCheck(Unknown Source)
> > > at java.util.ArrayList.get(Unknown Source)
> > > at org.apache.pdfbox.io.RandomAccessBuffer.seek(RandomAccessBuffer.java:84)
> > > at org.apache.pdfbox.io.RandomAccessFileOutputStream.write(RandomAccessFileOutputStream.java:106)
> > > at java.io.BufferedOutputStream.flushBuffer(Unknown Source)
> > > at java.io.BufferedOutputStream.flush(Unknown Source)
> > > at java.io.FilterOutputStream.close(Unknown Source)
> > > at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:610)
> > > at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:568)
> > > at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:188)
> > > ... 4 more
> > >
> > >
> > >    Any ideas or help would be appreciated.
> > >
> > > *Brent Pathakis*
> > > 801 536 0041
> > >
> >
>
