Re: Turning off ImageProcessing

Tim Allison Mon, 11 Jan 2021 12:37:54 -0800

We should change "enableImageProcessing" to "enableImagePreprocessing".
That flag covers only the rotation.py and ImageMagick preprocessing, NOT
the OCR itself.


As for the warnings, I'm trying to figure out how to defer them until
tesseract is actually executed, so that people who run tesseract get the
warning, but those whose files never go down that path don't.
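
A minimal sketch of that idea, assuming a one-time flag checked at the point
where OCR is invoked. The class and method names below (LazyWarningOcrParser,
runOcr, parseText) are hypothetical and are not Tika's actual
TesseractOCRParser code; the list stands in for a real logger.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for an OCR parser; not Tika's TesseractOCRParser.
class LazyWarningOcrParser {
    static final String WARNING =
            "Tesseract OCR is being applied and may slow down content extraction.";

    // Flipped to true the first time OCR actually runs.
    private final AtomicBoolean warned = new AtomicBoolean(false);

    // Stand-in for a logger so the behavior is easy to inspect.
    final List<String> log = new ArrayList<>();

    // Called only when a file actually reaches the OCR step.
    String runOcr(String fileName) {
        // compareAndSet succeeds exactly once, so the warning is emitted once.
        if (warned.compareAndSet(false, true)) {
            log.add(WARNING);
        }
        return "ocr:" + fileName; // placeholder for invoking tesseract
    }

    // Files that never need OCR take this path and never trigger the warning.
    String parseText(String fileName) {
        return "text:" + fileName;
    }
}
```

With this shape, constructing the parser or calling parseText logs nothing;
the warning shows up only on the first runOcr call and is not repeated.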

On Mon, Jan 11, 2021 at 12:50 PM Peter Kronenberg <[email protected]>
wrote:

> Is the enableImageProcessing flag in TesseractOCRConfig honored?  It seems
> to always do OCR.  In fact, as long as Tesseract is found on the path, I
> get this message:
>
> [main] WARN org.apache.tika.parser.ocr.TesseractOCRParser - Tesseract OCR
> is installed and will be automatically applied to image files unless
> you've excluded the TesseractOCRParser from the default parser.
> Tesseract may dramatically slow down content extraction (TIKA-2359).
> As of Tika 1.15 (and prior versions), Tesseract is automatically called.
> In future versions of Tika, users may need to turn the TesseractOCRParser
> on via TikaConfig.
>
> Is the only way to turn off image processing to remove the OCR parser?
> Can I enable/disable it programmatically?
>
> The easiest way I found to disable it is to provide a bogus path (thanks
> to the hint in TesseractOCRParser#checkInitialization), but that still
> issues the above message (not sure why it can’t first check whether the
> path is valid).
>
>
>
> Is there a better way to do this?
>
