Sorry, and yes, the only way to turn off OCR is to remove the OCR parser (TesseractOCRParser) or to configure it to point to a made-up path that _does not_ contain "tesseract".
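
For reference, removing the parser is usually done through a Tika config file rather than in code. The fragment below is a minimal sketch using the parser-exclude element; the file name tika-config.xml is only an assumption, and it has not been checked against the exact Tika version discussed in this thread.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical tika-config.xml: keep the default parsers, minus Tesseract OCR -->
<properties>
  <parsers>
    <parser class="org.apache.tika.parser.DefaultParser">
      <!-- Exclude the OCR parser so it is never selected for image files -->
      <parser-exclude class="org.apache.tika.parser.ocr.TesseractOCRParser"/>
    </parser>
  </parsers>
</properties>
```

A file like this would typically be loaded with new TikaConfig(...) and passed to an AutoDetectParser. Whether it also suppresses the startup warning quoted below is not something this sketch establishes.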

On Mon, Jan 11, 2021 at 12:50 PM Peter Kronenberg <[email protected]> wrote:

> Is the EnableImageProcessing flag in TesseractOCRConfig honored? It seems
> to always do OCR. And in fact, as long as it finds it in the path, I get
> this message
>
> *[main] WARN org.apache.tika.parser.ocr.TesseractOCRParser - Tesseract OCR
> is installed and will be automatically applied to image files unless*
>
> *you've excluded the TesseractOCRParser from the default parser.*
>
> *Tesseract may dramatically slow down content extraction (TIKA-2359).*
>
> *As of Tika 1.15 (and prior versions), Tesseract is automatically called.*
>
> *In future versions of Tika, users may need to turn the TesseractOCRParser
> on via TikaConfig.*
>
> Is the only way to turn off image processing to remove the OCR parser?
> Can I enable/disable it programmatically?
>
> The easiest way I found to disable it is to provide a bogus path (thanks
> to the hint in TesseractOCRParser#checkInitialization), but that still
> issues the above message (not sure why it can’t check first if the path is
> valid)
>
> Is there a better way to do this?
