I'm splitting a document into groups of 20 pages using the Splitter (PDFBox 3.0.3). It works as expected: the sum of the group sizes (~77MB) is similar to the full document size (~64MB). *But if I remove the annotations from each page before splitting,* many of the resulting groups are 64MB each, and the sum of sizes (~660MB) is huge compared to the original document (~64MB).
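For anyone who wants to reproduce the size comparison, a minimal sketch of the check using only the JDK is below. `SplitSizeCheck`, `totalSize` and `blowUp` are hypothetical names made up for this illustration; they are not part of PDFBox or of our code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: compares the summed size of the split files
// with the size of the original document.
class SplitSizeCheck {

    // Sums the on-disk sizes (in bytes) of the given files.
    static long totalSize(List<Path> parts) {
        long total = 0;
        for (Path part : parts) {
            try {
                total += Files.size(part);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return total;
    }

    // Ratio between the summed size of the parts and the original size.
    static double blowUp(long originalBytes, long totalBytes) {
        return (double) totalBytes / originalBytes;
    }

    // Usage: java SplitSizeCheck original.pdf part0.pdf part1.pdf ...
    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: SplitSizeCheck <original> <part>...");
            return;
        }
        long original = Files.size(Path.of(args[0]));
        List<Path> parts = new ArrayList<>();
        for (int i = 1; i < args.length; i++) {
            parts.add(Path.of(args[i]));
        }
        long total = totalSize(parts);
        System.out.printf("original=%d bytes, parts=%d bytes, ratio=%.2f%n",
                original, total, blowUp(original, total));
    }
}
```

A ratio close to 1 is what we normally see; a much larger ratio would suggest that the split files each carry resources that belong to other pages.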

*Result without removing annotations:*

    Permissions Size User Date Modified Name
    .rw-rw-r--   10M joan 23 dic 16:00  'test 0.pdf'
    .rw-rw-r--  7,9M joan 23 dic 16:00  'test 1.pdf'
    .rw-rw-r--  6,9M joan 23 dic 16:00  'test 2.pdf'
    .rw-rw-r--  6,2M joan 23 dic 16:00  'test 3.pdf'
    .rw-rw-r--  3,1M joan 23 dic 16:00  'test 4.pdf'
    .rw-rw-r--  6,5M joan 23 dic 16:00  'test 5.pdf'
    .rw-rw-r--  6,8M joan 23 dic 16:00  'test 6.pdf'
    .rw-rw-r--  4,3M joan 23 dic 16:00  'test 7.pdf'
    .rw-rw-r--  5,0M joan 23 dic 16:00  'test 8.pdf'
    .rw-rw-r--  2,8M joan 23 dic 16:00  'test 9.pdf'
    .rw-rw-r--  5,4M joan 23 dic 16:00  'test 10.pdf'
    .rw-rw-r--  4,7M joan 23 dic 16:00  'test 11.pdf'
    .rw-rw-r--  3,5M joan 23 dic 16:00  'test 12.pdf'
    .rw-rw-r--  3,4M joan 23 dic 16:00  'test 13.pdf'
    .rw-rw-r--  815k joan 23 dic 16:00  'test 14.pdf'

*Result removing annotations:*

    Permissions Size User Date Modified Name
    .rw-rw-r--   10M joan 23 dic 16:53  'test 0.pdf'
    .rw-rw-r--   64M joan 23 dic 16:53  'test 1.pdf'
    .rw-rw-r--   64M joan 23 dic 16:53  'test 2.pdf'
    .rw-rw-r--   64M joan 23 dic 16:53  'test 3.pdf'
    .rw-rw-r--   64M joan 23 dic 16:53  'test 4.pdf'
    .rw-rw-r--   64M joan 23 dic 16:53  'test 5.pdf'
    .rw-rw-r--   64M joan 23 dic 16:53  'test 6.pdf'
    .rw-rw-r--   64M joan 23 dic 16:53  'test 7.pdf'
    .rw-rw-r--   64M joan 23 dic 16:53  'test 8.pdf'
    .rw-rw-r--   64M joan 23 dic 16:53  'test 9.pdf'
    .rw-rw-r--   64M joan 23 dic 16:53  'test 10.pdf'
    .rw-rw-r--  4,7M joan 23 dic 16:53  'test 11.pdf'
    .rw-rw-r--  3,5M joan 23 dic 16:53  'test 12.pdf'
    .rw-rw-r--  3,4M joan 23 dic 16:53  'test 13.pdf'
    .rw-rw-r--  833k joan 23 dic 16:53  'test 14.pdf'

*Related code:*

    private static List<Path> splitPdfByCleanAnnotations(Path fileToSplit, Supplier<Path> pathSupplier, int splitAtPage) throws IOException {
        Splitter splitter = new Splitter();
        splitter.setSplitAtPage(splitAtPage);
        try (var document = Loader.loadPDF(fileToSplit.toFile())) {
            clearAnnotations(document); // the call that triggers the size increase
            return splitAndSave(pathSupplier, splitter, document);
        }
    }

    // Removes every annotation from every page of the document.
    private static void clearAnnotations(PDDocument document) throws IOException {
        for (int i = 0; i < document.getNumberOfPages(); i++) {
            document.getPage(i).getAnnotations().clear();
        }
    }

    // Splits the document and saves each part to a path from the supplier.
    private static List<Path> splitAndSave(Supplier<Path> pathSupplier, Splitter splitter, PDDocument document) throws IOException {
        return splitter.split(document).stream()
                .map(d -> callOrLog(() -> {
                    try (d) {
                        Path path = pathSupplier.get();
                        d.save(path.toFile());
                        return path;
                    }
                }))
                .toList();
    }

Here is the link to the PDF: https://file.io/KI2CFBB87H4c

Any idea why this is happening with this PDF?

Thanks!

P.S.: We split hundreds of PDFs each day and this is the first time we have seen this issue.