Ultimately I'm trying to (1) determine whether images, particularly
full-page images, are embedded in a PDF, and (2) extract the images and/or
(3) OCR the text.

Does tika-app support this?  When I run java -jar tika-app-1.8.jar
test.pdf, I get all of the metadata and see <page></page> tags, but no
images.

Running with -z doesn't output any images.
