Embedded images in PDF - detect, extract and/or OCR

Stefan Alder Wed, 13 May 2015 12:04:07 -0700

Ultimately I'm trying to (1) determine whether images, particularly
full-page images, are embedded in a PDF, (2) extract the images, and/or
(3) OCR the text.


Does tika-app support this?  When I run java -jar tika-app-1.8.jar
test.pdf, I get all of the metadata and see <page></page> tags, but no
images.

Running with -z doesn't output any images.
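
As a rough illustration of goal (1) only, and independent of Tika, the sketch below scans the raw PDF bytes for image XObject dictionaries, which are marked with "/Subtype /Image". The class name PdfImageProbe and the method countImageMarkers are hypothetical names chosen for this example, not part of any library. It is a heuristic under stated assumptions: it would miss inline images embedded directly in page content streams, and it says nothing about whether an image covers a full page.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: counts "/Subtype /Image" markers in a PDF's raw bytes.
public class PdfImageProbe {

    // Image XObject dictionaries contain "/Subtype /Image"; the whitespace
    // between the two names is optional. The trailing \b avoids matching
    // longer names such as "/ImageMask".
    private static final Pattern IMAGE_MARKER =
            Pattern.compile("/Subtype\\s*/Image\\b");

    static int countImageMarkers(byte[] pdfBytes) {
        // ISO-8859-1 maps every byte to exactly one char, so binary
        // stream data passes through decoding without being altered.
        String raw = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        Matcher m = IMAGE_MARKER.matcher(raw);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PdfImageProbe <file.pdf>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(countImageMarkers(data) + " image marker(s) found");
    }
}
```

Running java PdfImageProbe test.pdf prints the number of markers found. A non-zero count suggests embedded images that an extraction tool should be able to reach; zero is not conclusive, given the limitations above.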
