PDFont.cmapObjects map memory leak ???

Antoni Mylka Thu, 12 May 2011 09:20:04 -0700

Hello,

I used the current pdfbox trunk (1101911) to extract text and pictures from a collection of 65 thousand PDF files of various sizes. Didn't use pdfbox 1.5.0 because I experienced the performance regression described in PDFBOX-1005. The performance of the current trunk is MUCH better than 1.5.0.

The extraction failed after some time with an OutOfMemoryError. When I analyzed the heap dump, it turned out that the PDFont.cmapObjects map takes more than 750 megabytes of memory.
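The suspected growth pattern can be illustrated with a minimal sketch. It assumes, without checking the PDFBox source, that cmapObjects is a static map keyed by CMap name that is never evicted. If so, every distinct embedded CMap seen across a long batch run stays reachable until the JVM exits. `CMapCacheSketch`, `parseCMap`, `getUnbounded`, `getBounded` and `clearCaches` are hypothetical names, not PDFBox API. The bounded variant uses a standard access-ordered `LinkedHashMap` as an LRU cache.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of a static font-resource cache; not PDFBox code. */
final class CMapCacheSketch {

    // Unbounded pattern: the map is static and nothing removes entries,
    // so every distinct key stays reachable for the lifetime of the JVM.
    static final Map<String, Object> UNBOUNDED = new HashMap<>();

    // Bounded alternative: access-ordered LinkedHashMap that evicts the
    // least recently used entry once MAX_ENTRIES is exceeded.
    static final int MAX_ENTRIES = 100;
    static final Map<String, Object> BOUNDED =
        new LinkedHashMap<String, Object>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Object> eldest) {
                return size() > MAX_ENTRIES;
            }
        };

    // Stand-in for an expensive CMap parse (hypothetical).
    static Object parseCMap(String name) {
        return new byte[1024]; // pretend each parsed CMap costs about 1 KB
    }

    // Look up or parse-and-store; the map only ever grows.
    static synchronized Object getUnbounded(String name) {
        Object v = UNBOUNDED.get(name);
        if (v == null) {
            v = parseCMap(name);
            UNBOUNDED.put(name, v);
        }
        return v;
    }

    // Same lookup, but put() may evict the eldest entry via removeEldestEntry.
    static synchronized Object getBounded(String name) {
        Object v = BOUNDED.get(name);
        if (v == null) {
            v = parseCMap(name);
            BOUNDED.put(name, v);
        }
        return v;
    }

    // Explicit hook a batch job could call between documents (hypothetical).
    static synchronized void clearCaches() {
        UNBOUNDED.clear();
        BOUNDED.clear();
    }

    public static void main(String[] args) {
        // Simulate many documents, each embedding a CMap with a distinct name.
        for (int i = 0; i < 10_000; i++) {
            String name = "Embedded-CMap-" + i;
            getUnbounded(name);
            getBounded(name);
        }
        System.out.println("unbounded entries: " + UNBOUNDED.size());
        System.out.println("bounded entries: " + BOUNDED.size());
        clearCaches();
        System.out.println("after clear: " + UNBOUNDED.size());
    }
}
```

Running `main` leaves 10,000 entries in the unbounded map but at most 100 in the bounded one. The methods are `synchronized` because neither `HashMap` nor `LinkedHashMap` is thread-safe, and a shared static cache may be hit from several extraction threads.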

1. Is it known already? (I'm not subscribed to the dev list, and it seems like a user issue.)
2. Is there any user-available way to clear this map periodically? It seems to me like a cache of some sort.

If not, I'll try to investigate and submit a patch. I just wanted to ask so I'm not reinventing the wheel.


Antoni Myłka