Duplicate Resources - How to find them?

Richard Kwasnicki Tue, 22 Jul 2025 06:58:24 -0700

Hey,

I have a PDF file with 281 pages; each page is basically just one big image.
I load it with PDFBox, and my aim is to compress the images to make them 
smaller.
My approach is to load the document, iterate over every page, and check each 
resource on the page to see whether it is a PDImageXObject. If it is, I 
compress it.


The crazy thing is that, in my file, every page's resources somehow reference 
every image. It seems that all the images are shared across all pages, so my 
program now performs 281 * 281 (about 79,000) compressions, which is really slow.

I'm not sure what the best way to detect shared resources is. Is there an easy 
way? Also, if you see other approaches that serve the same purpose of 
compressing large images, I would be interested.
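
One possible way to detect the sharing, as a minimal sketch rather than a confirmed PDFBox recipe: remember which underlying objects have already been processed and skip them on later pages. The sketch below uses only the JDK and plain Object instances as stand-ins for images; the class name SharedResourceDedup and the method forEachDistinct are hypothetical. It compares by reference identity (==) instead of equals(), on the assumption that a shared resource is the same in-memory object on every page. In PDFBox, the key would presumably be the COSStream returned by PDImageXObject.getCOSObject(), because PDResources.getXObject(...) may return a fresh wrapper around the same stream on each call.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

class SharedResourceDedup {

    /**
     * Runs the action once per distinct resource, even when many pages
     * reference the same one. Distinctness is reference identity (==),
     * not equals(). Returns the number of resources actually processed.
     */
    static <T> int forEachDistinct(List<List<T>> pages, Consumer<? super T> action) {
        // Identity-based set: two equal but separate objects count as two.
        Set<T> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        int processed = 0;
        for (List<T> pageResources : pages) {
            for (T resource : pageResources) {
                // add() returns false if this exact object was seen before.
                if (seen.add(resource)) {
                    action.accept(resource);
                    processed++;
                }
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        // Stand-in for the file described above: 281 pages, each of which
        // references all 281 images.
        List<Object> images = new ArrayList<>();
        for (int i = 0; i < 281; i++) {
            images.add(new Object());
        }
        List<List<Object>> pages = new ArrayList<>();
        for (int i = 0; i < 281; i++) {
            pages.add(images);
        }
        int n = forEachDistinct(pages, img -> { /* compression would go here */ });
        System.out.println(n + " compressions instead of " + (281 * 281));
        // prints "281 compressions instead of 78961"
    }
}
```

With this shape, the expensive step runs once per image and every later reference to the same object is skipped. Whether the replacement image then has to be written back into each page's resources, or only once, depends on how the file shares its resource dictionaries, which this sketch does not cover.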

Best, Richard

Richard Kwasnicki
Software Developer
Phone: +49 351 215 908 34
E-Mail: [email protected]
Website<https://avantgarde-labs.com/> · 
LinkedIn<https://www.linkedin.com/company/avantgarde-labs-gmbh/> · 
Privacy Policy<https://avantgarde-labs.com/de/datenschutzbestimmungen/>
Avantgarde Labs GmbH · Theresienstr. 9 · 01097 Dresden
Managing Directors: Robert Glaß, Torsten Hartmann, Sandy Lucka, Sven Rega
Registered office: Dresden · Dresden District Court · HRB 31215 · VAT ID DE283937395
Avantgarde Labs · We love development.
