Thanks. It would be nice to have a control for this instead of having to create a 
separate TikaConfig to turn it off.

From: Tim Allison <[email protected]>
Sent: Monday, January 11, 2021 3:39 PM
To: [email protected]
Subject: Re: Turning off ImageProcessing

Sorry and yes, the only way to turn off OCR is by removing the OCRParser or 
configuring it to point to a made-up path that _does not_ contain "tesseract".
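
For the first option (removing the parser), here is a minimal sketch of a config file that keeps DefaultParser but excludes TesseractOCRParser. It is not taken from this thread. The class name NoOcrConfigSketch and the temp-file name are made up for illustration, and the code uses only the JDK so it runs without Tika on the classpath; the Tika calls appear in comments only and assume the usual TikaConfig(Path) and AutoDetectParser(TikaConfig) constructors.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class NoOcrConfigSketch {

    // Tika config that keeps DefaultParser but excludes the Tesseract OCR parser.
    public static final String NO_OCR_CONFIG = String.join("\n",
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>",
        "<properties>",
        "  <parsers>",
        "    <parser class=\"org.apache.tika.parser.DefaultParser\">",
        "      <parser-exclude class=\"org.apache.tika.parser.ocr.TesseractOCRParser\"/>",
        "    </parser>",
        "  </parsers>",
        "</properties>",
        "");

    // Writes the config to a temp file (hypothetical name) and returns its path.
    public static Path writeConfig() throws IOException {
        Path path = Files.createTempFile("tika-no-ocr", ".xml");
        Files.write(path, NO_OCR_CONFIG.getBytes(StandardCharsets.UTF_8));
        return path;
    }

    // Sanity check: parse the file with the JDK XML parser and return the
    // class name listed in the first <parser-exclude> element.
    public static String excludedParser(Path path) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(path.toFile());
        return doc.getElementsByTagName("parser-exclude")
                .item(0).getAttributes().getNamedItem("class").getNodeValue();
    }

    public static void main(String[] args) throws Exception {
        Path path = writeConfig();
        System.out.println(path + " excludes " + excludedParser(path));
        // With Tika on the classpath, the file would then be loaded roughly as:
        //   TikaConfig config = new TikaConfig(path);
        //   Parser parser = new AutoDetectParser(config);
    }
}
```

This is the "separate TikaConfig" route, so it does not give a per-call switch; it only removes the OCR parser from the default parser chain for whatever parser is built from that config.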

On Mon, Jan 11, 2021 at 12:50 PM Peter Kronenberg 
<[email protected]> wrote:
Is the EnableImageProcessing flag in TesseractOCRConfig honored?  It seems to 
always do OCR.  In fact, as long as it finds Tesseract on the path, I get this 
message:
[main] WARN org.apache.tika.parser.ocr.TesseractOCRParser - Tesseract OCR is 
installed and will be automatically applied to image files unless
you've excluded the TesseractOCRParser from the default parser.
Tesseract may dramatically slow down content extraction (TIKA-2359).
As of Tika 1.15 (and prior versions), Tesseract is automatically called.
In future versions of Tika, users may need to turn the TesseractOCRParser on 
via TikaConfig.

Is the only way to turn off image processing to remove the OCR parser?  Can I 
enable/disable it programmatically?
The easiest way I found to disable it is to provide a bogus path (thanks to the 
hint in TesseractOCRParser#checkInitialization), but that still issues the 
above message (not sure why it can’t check first whether the path is valid).

Is there a better way to do this?
