Ultimately I'm trying to (1) determine whether images, particularly full-page images, are embedded in a PDF, (2) extract those images, and/or (3) OCR the text.
Does tika-app support this? When I run java -jar tika-app-1.8.jar test.pdf, I get all of the metadata and see <page></page> tags, but no images. Running with -z doesn't output any images either.
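For part (1) alone, one crude check that needs neither Tika nor any other library is to scan the raw PDF bytes for image XObject dictionaries (/Subtype /Image). This is only a rough sketch, not how Tika does it, and the class name PdfImageSniffer is hypothetical. It has real blind spots. It misses images whose dictionaries sit inside compressed object streams (PDF 1.5+), and it misses inline images in content streams. It also cannot tell whether an image covers the whole page, because that would require comparing the image's placement with the page's MediaBox.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: a heuristic, stdlib-only scan for image XObjects in a PDF.
public class PdfImageSniffer {
    // "/Subtype /Image" with optional whitespace between the two names;
    // the lookahead rejects a longer name that merely starts with "Image".
    private static final Pattern IMAGE_XOBJECT =
            Pattern.compile("/Subtype\\s*/Image(?![A-Za-z0-9])");

    /** Counts image XObject dictionaries visible in the raw, uncompressed PDF bytes. */
    public static int countImageXObjects(byte[] pdfBytes) {
        // ISO-8859-1 maps each byte to exactly one char, so binary stream data decodes safely.
        String raw = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        Matcher m = IMAGE_XOBJECT.matcher(raw);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java PdfImageSniffer file.pdf");
            return;
        }
        Path pdf = Paths.get(args[0]);
        int n = countImageXObjects(Files.readAllBytes(pdf));
        if (n > 0) {
            System.out.println("Found " + n + " image XObject(s)");
        } else {
            // A zero count is not conclusive: dictionaries in compressed object streams are invisible here.
            System.out.println("No image XObjects found in uncompressed objects");
        }
    }
}
```

For example, java PdfImageSniffer test.pdf prints a count. A nonzero result strongly suggests that embedded images exist. A zero result only means that none were found in uncompressed objects.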
