Re: disable extraction of images

ron.vandenbranden Wed, 13 Apr 2016 05:53:21 -0700

Hi again,

On 13/04/2016 13:18, ron.vandenbranden wrote:

I wasn't aware of tesseract; I definitely don't have it on my classpath. I'm just testing with the stand-alone tika jar file. My Java skills are close to zero (apart from copy/paste and recompiling things). Could you tell me how to configure this for the standalone jar file, please?

Ok, answering my own question: per the documentation at https://tika.apache.org/1.12/gettingstarted.html, I got the CLI app working with a configuration file, using the following command-line arguments:


  java -jar tika-app-1.12.jar --gui --config=tika-config.xml

I'm using the example configuration file from https://wiki.apache.org/tika/TikaOCR#Disable_Tika_OCR, excluding the TesseractOCRParser.
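
For reference, a configuration file of this kind looks roughly like the sketch below. It assumes the layout of the wiki example (a DefaultParser entry with a parser-exclude child), so the exact contents of the file used here may differ:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- tika-config.xml: sketch, assuming the structure of the wiki example -->
<properties>
  <parsers>
    <!-- Load all default parsers ... -->
    <parser class="org.apache.tika.parser.DefaultParser">
      <!-- ... except the Tesseract-based OCR parser -->
      <parser-exclude class="org.apache.tika.parser.ocr.TesseractOCRParser"/>
    </parser>
  </parsers>
</properties>
```

Note that this only excludes the OCR parser; as far as I understand, other parsers that handle image files (for metadata, say) would still be loaded.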

Yet this does not seem to change anything: the image content is still extracted. Any idea what could be wrong?


Best,

Ron
