High memory usage with pdfbox 3

Maison Mo Wed, 27 Jul 2022 07:11:05 -0700

Hello,
We parse arbitrary PDF files, some of which contain large images (5000x8000) 
with filters, and I noticed a regression in our CI with this test. It seems 
related to [PDFBOX-4836] Reduce the usage of ScatchFileBuffer when parsing a 
pdf - ASF JIRA



and in particular this commit: PDFBOX-4836: don't use ScratchFile within 
COSInputStream any more · apache/pdfbox@6b9dd61

| 
| 
| 
|  |  |

 |

 |
| 
|  | 
PDFBOX-4836: don't use ScratchFile within COSInputStream any more · apac...

git-svn-id: https://svn.apache.org/repos/asf/pdfbox/trunk@1881870 
13f79535-47bb-0310-9956-ffa450edef68
 |

 |

 |



PDFBox 2 used a scratch file for this (heavy) processing; this is no longer 
the case, hence our OutOfMemoryError.
This is quite surprising, given that the PDDocument was opened with:
Loader.loadPDF( pdfInputStream, MemoryUsageSetting.setupTempFileOnly() );
Looking at the code, it seems that the InputStream is always read completely 
into memory by this Loader. Is that correct? If so, what is the purpose of 
defining a MemoryUsageSetting if it is ignored in the lower layers?
This looks like a blocker for us: we need to cap PDFBox memory usage 
somehow. Is there a workaround for this?
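One possible direction, as a minimal sketch only: spool the incoming stream to 
a temporary file with plain JDK calls, so that the heap holds only a small copy 
buffer rather than the whole document. StreamSpooler and spoolToTempFile are 
hypothetical names, not part of PDFBox. Whether a File-based load in PDFBox 3 
then avoids buffering the full document is an assumption that has not been 
verified here, and this does not address memory used while decoding large 
filtered image streams.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper (not a PDFBox class): copies an InputStream to disk.
final class StreamSpooler {

    private StreamSpooler() {}

    /**
     * Copies the stream to a newly created temporary file and returns its path.
     * The caller is responsible for deleting the file when done.
     */
    static Path spoolToTempFile(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("pdf-spool-", ".pdf");
        try {
            // Files.copy streams the data; it does not load it all into memory.
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            return tmp;
        } catch (IOException e) {
            // Do not leave a partial file behind if the copy fails.
            Files.deleteIfExists(tmp);
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the real PDF input stream.
        InputStream in = new ByteArrayInputStream(new byte[1024]);
        Path tmp = spoolToTempFile(in);
        try {
            System.out.println("Spooled " + Files.size(tmp) + " bytes to " + tmp);
            // Assumption: tmp.toFile() would be passed to a File-based
            // loader at this point instead of the original InputStream.
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

The temporary file is deleted in a finally block so that a failed parse does 
not leak disk space.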
Thank you in advance for your responses.

M.
