Thanks, tried 3.0.1-SNAPSHOT and it does seem fixed. Just in case, here is a basic example (cleanup etc. simplified):
> InputStream is = new FileInputStream(new File("/tests/big.pdf"));
> PDDocument doc = ...;
> // PDDocument.load(is); //2.0.x
> // Loader.loadPDF(new RandomAccessReadBuffer(is)); //3.0.x
>
> List<PDDocument> docs = new Splitter().split(doc); //timings here

With a ~70MB PDF file of 600 pages (created by joining a PDF with a full-page image N times):
- 2.0.29 = ~0.5 sec, ~300MB
- 3.0.0 = ~7 sec, ~3500MB
- 3.0.1 = ~0.9 sec, ~130MB

With a ~900MB PDF of 9600 pages (uncommon, but a real file sent by a client):
- 2.0.29 = ~3.5 sec, ~3800MB
- 3.0.0 = out of memory exception after ~30 sec
- 3.0.1 = ~0.9 sec, ~330MB

These are not exact timings, but they are good enough to compare (they would vary/increase after handling the List, but that is not relevant here).

The high CPU probably depended on the Java/SDK version: I assume it was linked to GC calls for the extra objects, whose frequency etc. would vary per system, so it was indirectly fixed as well.

***

Also, for 2.0 we typically use:
- PDDocument.load(is, MemoryUsageSetting.setupMixed(MAX_BYTES))

which seems to reduce/control memory a bit (at the cost of some CPU etc.). Does 3.0 have a direct equivalent? I tried things like:
- Loader.loadPDF(rarb, null, null, null, MemoryUsageSetting.setupMixed(MAX_BYTES).streamCache)

but it doesn't seem to change much. 2.0 may be using ScratchFile internally, but I'm not sure how to set that up in 3.0.

Thanks.
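
P.S. In case anyone wants to reproduce the comparison, here is a rough sketch of how figures like the ones above could be collected. It is not necessarily how they were measured here. It uses only the plain JDK with no PDFBox calls; `SplitTiming`, `Result`, `measure` and the dummy workload are hypothetical names for illustration. The heap delta is only approximate, since `System.gc()` is just a hint and GC timing varies per system.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

class SplitTiming {

    /** Outcome of one measured call: return value, elapsed ms, approximate heap growth. */
    static final class Result<T> {
        final T value;
        final long millis;
        final long heapDeltaBytes;

        Result(T value, long millis, long heapDeltaBytes) {
            this.value = value;
            this.millis = millis;
            this.heapDeltaBytes = heapDeltaBytes;
        }
    }

    /** Heap currently in use, as reported by the JVM (a rough figure). */
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Runs the task once and records wall-clock time and heap growth around it. */
    static <T> Result<T> measure(Callable<T> task) throws Exception {
        System.gc(); // only a hint to the JVM; the baseline is not guaranteed to be clean
        long heapBefore = usedHeap();
        long start = System.nanoTime();
        T value = task.call();
        long millis = (System.nanoTime() - start) / 1_000_000;
        long heapDelta = usedHeap() - heapBefore;
        return new Result<>(value, millis, heapDelta);
    }

    public static void main(String[] args) throws Exception {
        // Stand-in workload that allocates 600 buffers. In a real comparison this
        // lambda would wrap the call being timed, e.g. the split(doc) line above.
        Result<List<byte[]>> r = measure(() -> {
            List<byte[]> pages = new ArrayList<>();
            for (int i = 0; i < 600; i++) {
                pages.add(new byte[16 * 1024]);
            }
            return pages;
        });
        System.out.printf("%d items, ~%d ms, ~%d KB heap delta%n",
                r.value.size(), r.millis, r.heapDeltaBytes / 1024);
    }
}
```

`Callable` is used rather than `Supplier` so that the measured call may throw a checked exception such as `IOException`. Keeping the returned value referenced in `Result` means the allocated objects stay reachable while the heap delta is read.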