Hello,
I'm experimenting with getting all images from a PDF using the approach
presented in
http://kickjava.com/src/org/pdfbox/ExtractImages.java.htm
I'm getting a lot of duplicates; it seems that the same physical images
are reused many times within a PDF file. I have two questions.
1. Is there some image pool in a PDF, so that I can iterate over a
single data structure?
2. If there is not (or, more likely, PDF allows all kinds of
structures), how can I get the byte offset of a PDXObjectImage
instance, so that I can store the offsets of already-visited images?
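To illustrate the bookkeeping I have in mind for question 2, here is a minimal sketch in plain Java. It makes no PDFBox calls. Since I don't know how to obtain a byte offset, it assumes the raw bytes of each image are available and uses a SHA-256 digest of those bytes as the key instead of an offset; the class name SeenImages and the method markIfNew are made up for this example.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: remembers which images have already been visited.
// Assumption: the key is a SHA-256 digest of the image bytes, standing in
// for the byte offset asked about above.
public class SeenImages {
    private final Set<String> seen = new HashSet<>();

    // Returns true the first time these bytes are seen, false for duplicates.
    public boolean markIfNew(byte[] imageBytes) {
        return seen.add(hex(sha256(imageBytes)));
    }

    // Number of distinct images recorded so far.
    public int size() {
        return seen.size();
    }

    private static byte[] sha256(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(data);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    private static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        SeenImages visited = new SeenImages();
        byte[] first = {1, 2, 3};
        byte[] sameContent = {1, 2, 3};
        byte[] other = {4, 5};
        System.out.println(visited.markIfNew(first));       // new image
        System.out.println(visited.markIfNew(sameContent)); // duplicate
        System.out.println(visited.markIfNew(other));       // new image
    }
}
```

In the extraction loop I would write an image to disk only when markIfNew returns true. Hashing means reading every image's bytes, which is why I would prefer an offset or some other cheap identifier if one exists.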
Antoni Mylka