Hello,

I'm experimenting with getting all images from a PDF using the approach presented in

http://kickjava.com/src/org/pdfbox/ExtractImages.java.htm

I'm getting a lot of duplicates; it seems that the same physical images are reused many times within a PDF file. I have two questions.

1. Is there some image pool in a PDF, so that I can iterate over a single data structure?
2. If there is not (or, more probably, PDF allows all kinds of structures), how can I get the byte offset from a PDXObjectImage instance, so that I can store the offsets of already visited images?
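
To make the second question concrete, here is a rough sketch of the bookkeeping I have in mind. Since I don't know how to obtain an offset, it keys on a SHA-256 digest of the image bytes instead, which is a substitute for the offset idea and not something taken from PDFBox. The class name ImageDeduplicator and the method isNew are made up for illustration, and getting the raw bytes out of a PDXObjectImage is deliberately not shown.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: remembers which image payloads have been seen so far.
class ImageDeduplicator {
    // Hex-encoded SHA-256 digests of every payload passed to isNew().
    private final Set<String> seen = new HashSet<>();

    // Returns true the first time a given byte sequence is offered,
    // false for every later byte-identical payload.
    boolean isNew(byte[] imageBytes) {
        return seen.add(toHex(sha256(imageBytes)));
    }

    // Number of distinct payloads recorded so far.
    int distinctCount() {
        return seen.size();
    }

    private static byte[] sha256(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(data);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }
}
```

The extraction loop would call isNew(bytes) for each image and write the file only when it returns true. This catches byte-identical copies, but it means reading every image's data at least once, which is why an offset or object identifier would be preferable if one is available.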

Antoni Mylka