---------- Forwarded message ---------
From: Joshua <[email protected]>
Date: Tue, May 19, 2026, 10:37
Subject: High object numbers trigger OOME during save operation
To: <[email protected]>

Hi there,

We recently encountered a PDF document that contains unusually high object numbers in its source. Here is a non-contiguous excerpt:

>
> <</Info 2 0 R /Root 1 0 R /Encrypt 1151 0 R /Prev 213301232 0 obj
> 0021353438 0000087 0 obj
> 002135350785 0 obj
> 0021353501209 0 obj
> 11521216 0 obj
> 000001241 0 obj
> 0000000000 65531225 0 obj
> 00213543971214 0 obj

The PDF has the following restrictions:

> PDF Version: 1.7 extension level 8
> R = 6
> P = -1052
> User password =
> Supplied password is user password
> extract for accessibility: allowed
> extract for any purpose: not allowed
> print low resolution: allowed
> print high resolution: allowed
> modify document assembly: not allowed
> modify forms: allowed
> modify annotations: allowed
> modify other: not allowed
> modify anything: not allowed
> stream encryption method: AESv3
> string encryption method: AESv3
> file encryption method: AESv3
> File is not linearized
> No syntax or stream encoding errors found; the file may still contain
> errors that qpdf cannot detect

The PDF contains:

- Several hundred pages
- 1282 objects
- Size: ~25MB

We are using PDFBox (currently version 3.0.3) to remove restrictions and save the file as unrestricted:

    document.setAllSecurityToBeRemoved(true);
    document.save(unrestrictedFile, CompressParameters.NO_COMPRESSION);

For this type of document, saving consistently triggers an OutOfMemoryError in the JVM, even with more than 100 GB of RAM.

Here is the stack trace:

> java.lang.OutOfMemoryError: Java heap space
>     at java.base/java.util.Arrays.copyOf(Arrays.java:3481)
>     at java.base/java.util.ArrayList.grow(ArrayList.java:237)
>     at java.base/java.util.ArrayList.grow(ArrayList.java:244)
>     at java.base/java.util.ArrayList.add(ArrayList.java:454)
>     at java.base/java.util.ArrayList.add(ArrayList.java:467)
>     at org.apache.pdfbox.pdfwriter.COSWriter.fillGapsWithFreeEntries(COSWriter.java:820)
>     at org.apache.pdfbox.pdfwriter.COSWriter.doWriteXRefTable(COSWriter.java:761)
>     at org.apache.pdfbox.pdfwriter.COSWriter.visitFromDocument(COSWriter.java:1326)
>     at org.apache.pdfbox.cos.COSDocument.accept(COSDocument.java:429)
>     at org.apache.pdfbox.pdfwriter.COSWriter.write(COSWriter.java:1586)
>     at org.apache.pdfbox.pdfwriter.COSWriter.write(COSWriter.java:1462)
>     at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1040)
>     at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:990)

Due to the malformed object numbering in the PDF, the freeNumbers ArrayList in COSWriter grows excessively, as it attempts to store every integer up to the highest object number. This eventually causes memory allocation to exceed available heap space.

We understand that the PDF itself is malformed. However, we would like to ask whether it would be possible to add a pre-check in PDFBox to prevent implausible object-number ranges from causing uncontrolled OOM errors.

From our perspective, this behavior represents a potential attack surface: specially crafted documents could be used to trigger a denial-of-service condition and potentially disrupt an entire system.

Thank you for your work on PDFBox and for considering this request.

Best regards,
Joshua
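
For illustration, the kind of pre-check requested above might look roughly like the following. This is a minimal, standalone sketch and not PDFBox code: the class `ObjectNumberPreCheck`, its methods, and the limit `MAX_FREE_ENTRIES` are hypothetical names and values chosen for the example. It counts how many free entries would be needed to fill the gaps below the highest object number, without allocating one list element per missing number, and rejects the document if that count is implausibly large.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper for illustration only; not part of PDFBox. */
final class ObjectNumberPreCheck {

    /** Assumed upper bound on free entries a save is allowed to create. */
    static final long MAX_FREE_ENTRIES = 10_000_000L;

    private ObjectNumberPreCheck() {
    }

    /**
     * Returns how many object numbers in [1, highest] are unused.
     * Memory use is proportional to the number of used objects,
     * not to the highest object number.
     */
    static long countGaps(Collection<Long> usedObjectNumbers) {
        Set<Long> distinct = new HashSet<>();
        long highest = 0;
        for (long n : usedObjectNumbers) {
            if (n <= 0) {
                throw new IllegalArgumentException("object numbers must be positive: " + n);
            }
            distinct.add(n);
            highest = Math.max(highest, n);
        }
        return highest - distinct.size();
    }

    /** Fails fast instead of letting a gap list grow until the heap is exhausted. */
    static void requirePlausible(Collection<Long> usedObjectNumbers) {
        long gaps = countGaps(usedObjectNumbers);
        if (gaps > MAX_FREE_ENTRIES) {
            throw new IllegalStateException(
                    "implausible object numbering: " + gaps + " free entries would be required");
        }
    }
}
```

For a document with 1282 objects whose highest object number is, say, around two billion (an illustrative value), `countGaps` would return roughly that number and `requirePlausible` would throw before any large list is built. Where exactly such a check would belong in the save path, and what limit is reasonable, is left open here.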

