Getting all images from a PDF file

Antoni Mylka Fri, 07 May 2010 03:02:46 -0700

Hello,

I'm experimenting with getting all images from a PDF using the approach presented in


http://kickjava.com/src/org/pdfbox/ExtractImages.java.htm

I'm getting a lot of duplicates; it seems that the same physical images are reused many times in a PDF file. I have two questions.

1. Is there some image pool in a PDF, so that I can iterate over a single data structure?
2. If there is not (or, most probably, the PDF allows all kinds of structures), then how can I get the byte offset from a PDXObjectImage instance, so that I can store the offsets of already visited images?
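
To make the bookkeeping in question 2 concrete, here is a rough sketch of the "already visited" check. It is only an illustration under assumptions: it keys on a SHA-256 digest of the raw image bytes instead of a byte offset (a swapped-in technique, since it is not clear whether PDFBox exposes an offset), and SeenImages, firstVisit and uniqueCount are hypothetical names, not part of PDFBox. How the bytes are obtained from a PDXObjectImage is left out on purpose.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper (not a PDFBox class): remembers which image payloads were already seen. */
final class SeenImages {
    // Hex-encoded SHA-256 digests of every payload offered so far.
    private final Set<String> digests = new HashSet<>();

    /** Returns true the first time a given byte payload is offered, false for repeats. */
    boolean firstVisit(byte[] imageBytes) {
        return digests.add(sha256Hex(imageBytes));
    }

    /** Number of distinct payloads seen so far. */
    int uniqueCount() {
        return digests.size();
    }

    private static String sha256Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this should not happen.
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```

The extraction loop would then write an image out only when firstVisit returns true. Hashing costs a full read of each image, which is why an offset or object identifier would be preferable if one is available.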


Antoni Mylka
