On 23.02.2017 at 15:07, [email protected] wrote:
I am using PDFBox to convert PDF documents to a series of TIFF images (one for
each page). The implementation uses PDFRenderer to render each page. Things
work fine when I process a single document in a single thread. However,
when I try to process multiple documents (each in its own thread) I get an
OutOfMemoryError.
In analyzing the heap dump, I see that this is caused by the images cached in
DefaultResourceCache. Objects are added to the cache in PDResources,
which includes a method private boolean isAllowedCache(PDXObject xobject) that
is used to determine whether a PDXObject can be cached. I have extended this
to filter out COSName.IMAGE, and am now able to process multiple documents in
parallel.
I'd like to contribute this change back to the community. However, before
submitting it, I thought some feedback on the filtering mechanism would be
appropriate. Some options include:
- Always exclude images
- Allow the user to specify whether images should be cached (add a
method to PDResources to toggle filtering of images). The default would be
to cache images, to stay backwards compatible.
- Defer the image caching decision to the user through a callback. The default
callback would cache all images to provide backwards compatibility.
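A minimal sketch of the second option (a user-settable toggle), written against
stand-in types rather than the real PDFBox classes. ResourceKind,
ToggleCacheSketch and setCacheImages are hypothetical names that only illustrate
the shape of the toggle; the real check lives in
PDResources.isAllowedCache(PDXObject).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the kinds of resource that could be cached.
enum ResourceKind { IMAGE, FORM, FONT }

// Hypothetical sketch: a cache with a toggle that decides whether images are cached.
class ToggleCacheSketch {
    private final Map<String, Object> cache = new HashMap<>();
    // Defaults to true so that existing behaviour (cache everything) is unchanged.
    private boolean cacheImages = true;

    void setCacheImages(boolean cacheImages) {
        this.cacheImages = cacheImages;
    }

    // Images are allowed only while the toggle is on; everything else always is.
    boolean isAllowedCache(ResourceKind kind) {
        return kind != ResourceKind.IMAGE || cacheImages;
    }

    void put(String key, ResourceKind kind, Object value) {
        if (isAllowedCache(kind)) {
            cache.put(key, value);
        }
    }

    Object get(String key) {
        return cache.get(key);
    }

    int size() {
        return cache.size();
    }
}
```

With the toggle left at its default, callers see no change; calling
setCacheImages(false) before rendering skips images only.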
I also wanted to know how best to submit my patch for inclusion.
Thanks
In theory, the cache should drop its contents when memory becomes low,
thanks to the SoftReference. So the real question is whether that
mechanism works at all.
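As a rough illustration of that idea (a simplified stand-in, not the actual
DefaultResourceCache code; the class name SoftCacheSketch is made up), a cache
that holds its values only through SoftReference looks something like this. The
JVM guarantees that softly reachable objects are cleared before it throws an
OutOfMemoryError, which is why such a cache should not be able to exhaust the
heap on its own.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a soft-reference cache; not thread-safe.
class SoftCacheSketch<K, V> {
    private final Map<K, SoftReference<V>> map = new HashMap<>();

    void put(K key, V value) {
        // The cache holds the value only softly, so the GC may reclaim it.
        map.put(key, new SoftReference<>(value));
    }

    // Returns the cached value, or null if it was never cached or was reclaimed.
    V get(K key) {
        SoftReference<V> ref = map.get(key);
        if (ref == null) {
            return null;
        }
        V value = ref.get();
        if (value == null) {
            // The referent was cleared by the garbage collector; drop the stale entry.
            map.remove(key);
        }
        return value;
    }
}
```

If memory still runs out with a cache like this, something else is holding a
strong reference to the images.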
About your own code - are you "losing" any reference to the TIFF you
produce after each page? Or are they in an array of images?
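To make the question concrete: a loop along these lines keeps only one rendered
page alive at a time. It is a sketch with hypothetical names (PageWriterSketch,
writePages, and a renderer function standing in for whatever calls
PDFRenderer), not your code and not anything from PDFBox. Whether a given
format such as "tiff" can be written depends on the ImageIO writers installed
in your JVM, so the sketch checks the return value of ImageIO.write.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.function.IntFunction;
import javax.imageio.ImageIO;

// Hypothetical sketch: render, write, and forget one page at a time.
class PageWriterSketch {
    // Writes pageCount images into outputDir and returns how many were written.
    static int writePages(int pageCount, IntFunction<BufferedImage> renderer,
                          File outputDir, String format) throws IOException {
        int written = 0;
        for (int page = 0; page < pageCount; page++) {
            // 'image' is the only strong reference to this page's pixels.
            BufferedImage image = renderer.apply(page);
            File target = new File(outputDir, "page-" + (page + 1) + "." + format);
            // ImageIO.write returns false when no writer exists for the format.
            if (!ImageIO.write(image, format, target)) {
                throw new IOException("No ImageIO writer for format: " + format);
            }
            written++;
            // 'image' goes out of scope here, so the page can be garbage collected.
        }
        return written;
    }
}
```

If instead each BufferedImage is added to a list or array that outlives the
loop, every page stays strongly reachable and no cache setting will help.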
Regarding submission, see
https://pdfbox.apache.org/codingconventions.html , then open an issue in JIRA
and attach a .diff / .patch file. Be aware that we don't accept every
submission. I'm skeptical about callbacks; we don't do such a thing
anywhere.
Tilman