Re: disable extraction of images

Nick Burch Wed, 13 Apr 2016 03:57:00 -0700

On Wed, 13 Apr 2016, ron.vandenbranden wrote:

Is it possible to disable text extraction from images inside a PDF file? I'm testing with the CLI tika app, which has "extractInlineImages" set to false by default, if I'm not mistaken. Yet, the text of the images still is present in the generated HTML output. Am I missing something obvious?

Yup, see "Disable Tika OCR" in https://wiki.apache.org/tika/TikaOCR (or remove tesseract from your PATH!)
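The "remove tesseract from your PATH" option can also be applied to a single run, leaving the system PATH unchanged. Below is a minimal Python sketch. It assumes that Tika locates the `tesseract` executable through PATH when no explicit Tesseract path is configured. The helper names and the jar location `tika-app.jar` are hypothetical. `--html` is the tika-app flag for HTML output.

```python
import os
import shutil
import subprocess


def path_without_tesseract(path=None, exe="tesseract"):
    """Return a PATH string with every directory that contains `exe` removed.

    Assumption: Tika finds Tesseract via PATH when no tesseractPath is
    configured, so hiding it should keep OCR from running for that process.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    kept = []
    for directory in path.split(os.pathsep):
        # shutil.which with path=<one dir> checks only that directory.
        if directory and shutil.which(exe, path=directory) is not None:
            continue  # this directory provides tesseract; drop it
        kept.append(directory)
    return os.pathsep.join(kept)


def run_tika_html(pdf, jar="tika-app.jar"):
    """Run the tika-app CLI (hypothetical jar location) with Tesseract hidden."""
    # Resolve java with the ORIGINAL PATH, since its directory may be dropped.
    java = shutil.which("java")
    if java is None:
        raise FileNotFoundError("java not found on PATH")
    env = dict(os.environ, PATH=path_without_tesseract())
    result = subprocess.run(
        [java, "-jar", jar, "--html", pdf],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout
```

One caveat: if tesseract lives in a shared directory such as /usr/bin, dropping that directory also hides every other tool in it. In that case, the configuration-based approach on the wiki page is the cleaner choice.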


Nick
