On Wed, 13 Apr 2016, ron.vandenbranden wrote:
Is it possible to disable text extraction from images inside a PDF file? I'm testing with the Tika CLI app, which has "extractInlineImages" set to false by default, if I'm not mistaken. Yet the text of the images is still present in the generated HTML output. Am I missing something obvious?

Yup, see "Disable Tika OCR" in https://wiki.apache.org/tika/TikaOCR (or remove tesseract from your path!)

Nick
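
For the "remove tesseract from your path" route, here is a minimal Python sketch. It assumes Tika finds the tesseract executable through PATH, and that no explicit tesseract location is set in a Tika config. The helper names, tika-app.jar and the PDF filename are hypothetical placeholders, not anything from the Tika docs.

```python
import os
import shutil
import subprocess


def path_without_tesseract(path_value, exe="tesseract"):
    """Return path_value minus every directory that contains `exe`.

    Empty PATH entries are dropped as well.
    """
    kept = []
    for directory in path_value.split(os.pathsep):
        # shutil.which with a single-directory path tells us whether
        # an executable named `exe` lives in that directory.
        if directory and shutil.which(exe, path=directory) is None:
            kept.append(directory)
    return os.pathsep.join(kept)


def run_tika_without_ocr(jar, pdf):
    """Run the Tika app with tesseract hidden from its PATH.

    `jar` and `pdf` are placeholder paths supplied by the caller.
    Assumes `java` itself is not in the same directory as tesseract.
    """
    env = dict(os.environ)
    env["PATH"] = path_without_tesseract(env.get("PATH", ""))
    # --html asks the Tika app for HTML output.
    return subprocess.run(
        ["java", "-jar", jar, "--html", pdf],
        env=env,
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    # Quick check: is tesseract currently visible on PATH?
    print("tesseract on PATH:", shutil.which("tesseract") is not None)
```

This only hides tesseract for the one child process, so the rest of the system can keep using it. The config-based approach on the wiki page is the more permanent fix.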
