Hi Pierre,

If you load from an InputStream, a temporary file will be created. Try loading 
from a java.io.File, or pass the filename, instead. You also do not have to 
provide a scratch file, but in that case your memory consumption will be much 
higher.

In addition, the NonSequentialPDFParser supports the system property 
org.apache.pdfbox.pdfparser.nonSequentialPDFParser.parseMinimal. When it is set 
to 'true', object references in the catalog are not followed. That might help, 
depending on your use case (I have never used it myself, I only looked it up in 
the sources).
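
For what it's worth, a minimal sketch of how that property could be set from 
code. The property name is the one quoted above; the class and method names 
are made up for illustration. I am assuming the property has to be in place 
before PDFBox starts parsing, so set it as early as possible (passing it as a 
-D option on the java command line should work as well).

```java
// Hypothetical helper, not part of PDFBox: it only sets the JVM system property.
final class ParseMinimalSketch {

    // Property name as quoted from the NonSequentialPDFParser sources.
    static final String PARSE_MINIMAL =
            "org.apache.pdfbox.pdfparser.nonSequentialPDFParser.parseMinimal";

    // Call this before any PDFBox loading code runs (assumption, see above).
    static void enableParseMinimal() {
        System.setProperty(PARSE_MINIMAL, "true");
    }

    public static void main(String[] args) {
        enableParseMinimal();
        // Print the value back so it is easy to check that it was applied.
        System.out.println(PARSE_MINIMAL + "=" + System.getProperty(PARSE_MINIMAL));
    }
}
```

Whether this actually reduces the load time for a file of that size is 
something you would have to measure.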

What are you trying to do with the file? Which information are you looking for?

Maruan Sahyoun

On 21.03.2013 at 09:41, Pierre Huttin <[email protected]> wrote:

> Hello,
> 
> I'm trying to work on a very large PDF file (21 GB), and I want to extract 
> some pages. The problem is that when I load the file into a PDDocument, it 
> creates a scratch file of around the same size as the file, and yesterday 
> evening it took 3h30 just to load the file.
> 
> PDDocument.loadNonSeq (method)
> 
> Is it possible to open the file in "Read Only" and "Read All from disk" mode? 
> I don't really understand why I need to load the complete file into a 
> scratch file just for reading.
> 
> Thanks for your answers/comments/ideas on how to solve this.
> 
> Pierre Huttin
