Hi,

First of all, try a more recent version of PDFBox. The current one is 3.0.7.

Have you tried saving the PDF with compressed object streams? The code in question may not be triggered that way.

Andreas

On 19.05.26 at 11:39, Joshua wrote:
---------- Forwarded message ---------
From: Joshua <[email protected]>
Date: Tue, 19 May 2026, 10:37
Subject: High object numbers trigger OOME during save operation
To: <[email protected]>


Hi there,

We recently encountered a PDF document that contains unusually high object
numbers in its source. Here is a non-contiguous excerpt:

<</Info 2 0 R /Root 1 0 R /Encrypt 1151 0 R /Prev 213301232 0 obj
0021353438 0000087 0 obj
002135350785 0 obj
0021353501209 0 obj
11521216 0 obj
000001241 0 obj
0000000000 65531225 0 obj
00213543971214 0 obj


The PDF has the following restrictions:

PDF Version: 1.7 extension level 8
R = 6
P = -1052
User password =
Supplied password is user password
extract for accessibility: allowed
extract for any purpose: not allowed
print low resolution: allowed
print high resolution: allowed
modify document assembly: not allowed
modify forms: allowed
modify annotations: allowed
modify other: not allowed
modify anything: not allowed
stream encryption method: AESv3
string encryption method: AESv3
file encryption method: AESv3
File is not linearized
No syntax or stream encoding errors found; the file may still contain
errors that qpdf cannot detect


The PDF contains:

    - Several hundred pages
    - 1282 objects
    - Size: ~25MB


We are using PDFBox (currently version 3.0.3) to remove restrictions and
save the file as unrestricted:
document.setAllSecurityToBeRemoved(true);
document.save(unrestrictedFile, CompressParameters.NO_COMPRESSION);

For this type of document, saving consistently triggers an OutOfMemoryError
in the JVM, even with more than 100 GB of RAM. Here is the stack trace:

java.lang.OutOfMemoryError: Java heap space
at java.base/java.util.Arrays.copyOf(Arrays.java:3481)
at java.base/java.util.ArrayList.grow(ArrayList.java:237)
at java.base/java.util.ArrayList.grow(ArrayList.java:244)
at java.base/java.util.ArrayList.add(ArrayList.java:454)
at java.base/java.util.ArrayList.add(ArrayList.java:467)
at org.apache.pdfbox.pdfwriter.COSWriter.fillGapsWithFreeEntries(COSWriter.java:820)
at org.apache.pdfbox.pdfwriter.COSWriter.doWriteXRefTable(COSWriter.java:761)
at org.apache.pdfbox.pdfwriter.COSWriter.visitFromDocument(COSWriter.java:1326)
at org.apache.pdfbox.cos.COSDocument.accept(COSDocument.java:429)
at org.apache.pdfbox.pdfwriter.COSWriter.write(COSWriter.java:1586)
at org.apache.pdfbox.pdfwriter.COSWriter.write(COSWriter.java:1462)
at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1040)
at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:990)


Due to the malformed object numbering in the PDF, the freeNumbers ArrayList
in COSWriter grows excessively, as it attempts to store every integer up to
the highest object number. This eventually causes memory allocation to
exceed available heap space.
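
To illustrate the growth described above, here is a minimal sketch in plain Java. It is not PDFBox's actual code; the class name, method name and sample numbers are hypothetical. It counts how many free entries a gap-filling pass would need by arithmetic alone, instead of storing one list element per missing object number.

```java
import java.util.Arrays;

// Hypothetical illustration only; this is not code from PDFBox.
final class XrefGapSketch
{
    // Returns how many object numbers between 1 and the highest used number
    // are unused, i.e. how many free entries a gap-filling pass would need.
    static long countGapEntries(long[] usedObjectNumbers)
    {
        long[] sorted = usedObjectNumbers.clone();
        Arrays.sort(sorted);
        long gaps = 0;
        long previous = 0; // object number 0 is reserved as the head of the free list
        for (long number : sorted)
        {
            if (number > previous + 1)
            {
                // every number strictly between previous and number is unused
                gaps += number - previous - 1;
            }
            previous = number;
        }
        return gaps;
    }
}
```

With used numbers 1, 2 and 2,000,000,000, this reports 1,999,999,997 missing numbers. A list holding one element for each of them is the kind of allocation that would exhaust the heap.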

We understand that the PDF itself is malformed. However, we would like to
ask whether it would be possible to add a pre-check in PDFBox to prevent
implausible object-number ranges from causing uncontrolled OOM errors. From
our perspective, this behavior represents a potential attack surface:
specially crafted documents could be used to trigger a denial-of-service
condition and potentially disrupt an entire system.
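
One possible shape for such a pre-check, sketched in plain Java under stated assumptions: the class name, the method names and the factor of 1000 are hypothetical and not part of the PDFBox API, and a real limit would have to be chosen by the maintainers. The idea is to compare the highest object number against the number of objects actually present and to fail early if the range is implausibly sparse.

```java
import java.io.IOException;

// Hypothetical pre-check; names and threshold are assumptions, not PDFBox API.
final class ObjectNumberGuard
{
    // Assumed limit: the highest object number may be at most this many
    // times the number of objects actually present in the document.
    static final long MAX_SPARSITY_FACTOR = 1000L;

    static boolean isPlausible(long highestObjectNumber, long objectCount)
    {
        if (highestObjectNumber < 0 || objectCount <= 0)
        {
            return false;
        }
        // Saturate instead of overflowing for very large object counts.
        long limit = objectCount > Long.MAX_VALUE / MAX_SPARSITY_FACTOR
                ? Long.MAX_VALUE
                : objectCount * MAX_SPARSITY_FACTOR;
        return highestObjectNumber <= limit;
    }

    // Fails with a regular exception before any large allocation is attempted.
    static void requirePlausible(long highestObjectNumber, long objectCount) throws IOException
    {
        if (!isPlausible(highestObjectNumber, objectCount))
        {
            throw new IOException("Implausible object number range: highest="
                    + highestObjectNumber + ", objects=" + objectCount);
        }
    }
}
```

For a document with 1282 objects, the assumed limit would be 1,282,000, so a highest object number in the billions would be rejected with an IOException rather than ending in an OutOfMemoryError.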

Thank you for your work on PDFBox and for considering this request.

Best regards,
Joshua


