Sorry, and yes: the only way to turn off OCR is to remove the
TesseractOCRParser or to configure it to point to a made-up path that
_does not_ contain "tesseract".
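
For the first option, here is a minimal sketch of a Tika config that
excludes the parser. It assumes you otherwise want the DefaultParser with
everything else left in place; the file name tika-config.xml is just an
example, not something Tika requires.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- tika-config.xml (example name): keep the default parsers, minus OCR -->
<properties>
  <parsers>
    <parser class="org.apache.tika.parser.DefaultParser">
      <!-- Exclude the Tesseract parser so image files are not sent to OCR -->
      <parser-exclude class="org.apache.tika.parser.ocr.TesseractOCRParser"/>
    </parser>
  </parsers>
</properties>
```

You would then load that file into a TikaConfig and build your
AutoDetectParser from it (or pass it to tika-app with --config). With the
parser excluded, OCR should not be attempted even when Tesseract is on
the path.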

On Mon, Jan 11, 2021 at 12:50 PM Peter Kronenberg <[email protected]>
wrote:

> Is the EnableImageProcessing flag in TesseractOCRConfig honored? It seems
> to always do OCR. In fact, as long as it finds Tesseract on the path, I
> get this message:
>
> [main] WARN org.apache.tika.parser.ocr.TesseractOCRParser - Tesseract OCR
> is installed and will be automatically applied to image files unless
> you've excluded the TesseractOCRParser from the default parser.
> Tesseract may dramatically slow down content extraction (TIKA-2359).
> As of Tika 1.15 (and prior versions), Tesseract is automatically called.
> In future versions of Tika, users may need to turn the TesseractOCRParser
> on via TikaConfig.
>
> Is the only way to turn off image processing to remove the OCR parser?
> Can I enable/disable it programmatically?
>
> The easiest way I found to disable it is to provide a bogus path (thanks
> to the hint in TesseractOCRParser#checkInitialization), but that still
> issues the above message (not sure why it can’t check first if the path is
> valid)
>
> Is there a better way to do this?
>
