Hi PDFBox Team,
I have identified a potential bug in Apache PDFBox and would like to report it. 
Below are the details:


- **PDFBox Version**: 2.0.32, 3.0.0
- **Java Version**: 11


When there are a large number of sources (e.g., thousands), the `tobeclosed` 
list keeps every loaded source PDF document open until the whole merge has 
finished, so all of them stay in memory at once. This poses a risk of 
Out-of-Memory (OOM) errors during the merge process.
The following adjustment can be made in 
org.apache.pdfbox.multipdf.PDFMergerUtility#legacyMergeDocuments, so that each 
source document is closed right after it has been appended:

    for (Object sourceObject : sources)
    {
        PDDocument sourceDoc = null;
        if (sourceObject instanceof File)
        {
            sourceDoc = PDDocument.load((File) sourceObject, partitionedMemSetting);
        }
        else
        {
            sourceDoc = PDDocument.load((InputStream) sourceObject,
                    partitionedMemSetting);
        }
        try
        {
            appendDocument(destination, sourceDoc);
        }
        finally
        {
            // close this source immediately instead of collecting it for later
            IOUtils.closeAndLogException(sourceDoc, LOG, "PDDocument", null);
        }
    }
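To illustrate why the point at which sources are closed matters, here is a 
minimal, self-contained sketch. It does not use PDFBox at all: `FakeDoc` is a 
hypothetical stand-in for `PDDocument` that only counts how many instances are 
open at the same time, and the class and method names are my own. It compares 
the "collect and close at the end" pattern with the "close after each append" 
pattern.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;

public class CloseEarlyDemo {

    /** Hypothetical stand-in for PDDocument; tracks open instances. */
    static final class FakeDoc implements Closeable {
        static int open = 0;
        static int peak = 0;

        FakeDoc() {
            open++;
            peak = Math.max(peak, open);
        }

        @Override
        public void close() {
            open--;
        }
    }

    static void reset() {
        FakeDoc.open = 0;
        FakeDoc.peak = 0;
    }

    /** Every source is kept in a list and closed only after the loop. */
    static int peakWithDeferredClose(int sources) {
        reset();
        List<FakeDoc> toBeClosed = new ArrayList<>();
        try {
            for (int i = 0; i < sources; i++) {
                FakeDoc doc = new FakeDoc();
                toBeClosed.add(doc);
                // appendDocument(destination, doc) would run here
            }
        } finally {
            for (FakeDoc doc : toBeClosed) {
                doc.close();
            }
        }
        return FakeDoc.peak;
    }

    /** Each source is closed as soon as it has been appended. */
    static int peakWithEagerClose(int sources) {
        reset();
        for (int i = 0; i < sources; i++) {
            try (FakeDoc doc = new FakeDoc()) {
                // appendDocument(destination, doc) would run here
            }
        }
        return FakeDoc.peak;
    }

    public static void main(String[] args) {
        System.out.println("deferred close peak: " + peakWithDeferredClose(200)); // 200
        System.out.println("eager close peak:    " + peakWithEagerClose(200));    // 1
    }
}
```

With 200 sources, the deferred pattern has all 200 stand-in documents open at 
its peak, while the eager pattern never has more than one open.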
One test case:




Comparison of memory usage before and after the modification (merging a 
16.8 MB file 200 times, with the JVM heap size limited to 2 GB):
- **Before Modification**: An OutOfMemoryError (OOM) occurred after just over 1 
minute of operation. Because heap memory was insufficient, Full GC (full garbage 
collection) was triggered frequently, which can be observed from the CPU usage 
curve on the left.


- **After Modification**: Heap memory is reclaimed normally and no OOM occurs.




Thank you for your attention. Please let me know if you need any further 
information.


Best regards




