Yes, I saw that DefaultResourceCache uses SoftReference; however, for some 
reason XObjects are still prevalent when I look at the heap dump.  
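
For context on why softly referenced objects can still dominate a heap dump: the JVM clears soft references at the collector's discretion, and is only required to have cleared them before it throws an OutOfMemoryError. Below is a minimal sketch of a soft-reference cache. SoftCacheSketch is a hypothetical stand-in written for illustration; it is not the code of DefaultResourceCache.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a soft-reference cache (not PDFBox code).
final class SoftCacheSketch<K, V> {
    private final Map<K, SoftReference<V>> entries = new HashMap<>();

    // Wraps the value in a SoftReference so the collector may reclaim it
    // under memory pressure.
    void put(K key, V value) {
        entries.put(key, new SoftReference<>(value));
    }

    // Returns the cached value, or null if the key was never stored or
    // the collector has already cleared the soft reference.
    V get(K key) {
        SoftReference<V> ref = entries.get(key);
        return ref == null ? null : ref.get();
    }

    // Number of map entries, including ones whose referent was cleared.
    int size() {
        return entries.size();
    }
}
```

Until the collector decides to clear them, the referents stay on the heap, so a heap dump taken before that point would still show them.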
The TIFF images are saved to file, using ImageIOUtil.write, and I simply keep 
the File reference for downstream processing. 
The change made to PDResources.isAllowedCache is:

    COSBase image = xobject.getCOSObject().getDictionaryObject(COSName.SUBTYPE);
    if (image instanceof COSName && ((COSName) image).equals(COSName.IMAGE))
    {
        return false;
    }

After sending my earlier e-mail, I realized that I may not be able to add a 
user-defined filter to PDResources (still figuring out the code), and that it 
might instead need to be added to DefaultResourceCache.
I would like to be able to get a fix for the issue in the next release - there 
may be a better fix to the problem than I have provided above, as I am 
unfamiliar with the code base.
Thanks
- viraf

      From: "[email protected]" <[email protected]>
 To: "[email protected]" <[email protected]> 
 Sent: Thursday, February 23, 2017 9:07 AM
 Subject: OutOfMemoryException converting PDF to TIFF Images
   
I am using PDFBox to convert PDF documents to a series of TIFF images (one for 
each page).  The implementation uses PDFRenderer to render each page.  Things 
work fine when I am processing a single document in a single thread; however, 
when I try to process multiple documents (each in its own thread) I get an 
OutOfMemoryError.
In analyzing the heap dump, I see that this is caused by the images cached in 
DefaultResourceCache.  Objects are added to the cache in PDResources, which 
includes a method private boolean isAllowedCache(PDXObject xobject) that is 
used to determine whether a PDXObject can be cached.  I have extended this 
to filter out COSName.IMAGE, and am now able to process multiple documents in 
parallel.
I'd like to contribute this change back to the community.  However, before 
submitting it, I thought some feedback on the filtering mechanism would be 
appropriate.  Some options include:
   
   - Always exclude images
   - Allow the user to specify whether images should be cached or not (add a 
method to PDResources to toggle filtering of images).  The default would be to 
cache images, for backwards compatibility.
   - Defer the image caching decision to the user through a callback.  The 
default callback would cache all images, for backwards compatibility.
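
To make the callback option concrete, here is a rough sketch in plain Java. Every type in it (XObjectKind, XObjectStub, PolicyCacheSketch) is a hypothetical stand-in that I made up for illustration; none of them exist in PDFBox, and a real patch would have to fit the actual PDResources and ResourceCache code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical stand-ins for illustration only (not PDFBox types).
enum XObjectKind { IMAGE, FORM }

final class XObjectStub {
    final String name;
    final XObjectKind kind;

    XObjectStub(String name, XObjectKind kind) {
        this.name = name;
        this.kind = kind;
    }
}

final class PolicyCacheSketch {
    // Default callback: cache everything, matching the existing behaviour.
    static final Predicate<XObjectStub> CACHE_ALL = x -> true;
    // Alternative callback: cache everything except images.
    static final Predicate<XObjectStub> SKIP_IMAGES =
            x -> x.kind != XObjectKind.IMAGE;

    private final Map<String, XObjectStub> cache = new HashMap<>();
    private final Predicate<XObjectStub> allowCache;

    // Uses the backwards-compatible default callback.
    PolicyCacheSketch() {
        this(CACHE_ALL);
    }

    // Lets the caller supply the caching decision.
    PolicyCacheSketch(Predicate<XObjectStub> allowCache) {
        this.allowCache = allowCache;
    }

    // Stores the object only if the callback allows it.
    void offer(XObjectStub xobject) {
        if (allowCache.test(xobject)) {
            cache.put(xobject.name, xobject);
        }
    }

    boolean isCached(String name) {
        return cache.containsKey(name);
    }
}
```

With this shape, the simple toggle from the second option reduces to choosing between the two predefined callbacks, and "always exclude images" amounts to hard-wiring SKIP_IMAGES.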

I also wanted to know how best to submit my patch for inclusion.
Thanks
- viraf



   
