Hello,

We parse arbitrary PDF files, some of which contain large images (5000x8000) with filters, and I noticed a regression in our CI with this test.

This seems related to [PDFBOX-4836] "Reduce the usage of ScatchFileBuffer when parsing a pdf" (ASF JIRA), and in particular to this commit: "PDFBOX-4836: don't use ScratchFile within COSInputStream any more" (apache/pdfbox@6b9dd61, git-svn-id: https://svn.apache.org/repos/asf/pdfbox/trunk@1881870).

PDFBox 2 used a scratch file for this (heavy) processing. This is no longer the case, hence our OutOfMemoryError.

This is quite surprising, given that the PDDocument was opened with:

    Loader.loadPDF(pdfInputStream, MemoryUsageSetting.setupTempFileOnly());

Looking at the code, it seems that the InputStream is always read completely into memory by this Loader. Is that correct? So what is the purpose of defining a MemoryUsageSetting if it is ignored in lower layers?

This looks like a blocker for us: we need to cap PDFBox memory usage somehow. Is there a workaround?

Thank you in advance for your responses.

M.