Hi again,

On 13/04/2016 13:18, ron.vandenbranden wrote:

I wasn't aware of tesseract; I definitely don't have it on my classpath. I'm just testing with the stand-alone tika jar file. My Java skills are close to zero (apart from copy/paste and recompiling things). Could you tell me how to configure this for the standalone jar file, please?


OK, answering my own question: following the documentation at https://tika.apache.org/1.12/gettingstarted.html, I got the Tika app to start with a configuration file, using the following command-line arguments:

  java -jar tika-app-1.12.jar --gui --config=tika-config.xml

I'm using the example configuration file from https://wiki.apache.org/tika/TikaOCR#Disable_Tika_OCR, excluding the TesseractOCRParser.
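For reference, a configuration of that kind looks roughly like the sketch below. It assumes the approach described in that wiki section, namely wrapping DefaultParser and excluding org.apache.tika.parser.ocr.TesseractOCRParser. It is not a verbatim copy of the file used here, whose exact contents are not shown in this message.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch of a tika-config.xml that keeps the default parsers
     but excludes the Tesseract OCR parser (assumed layout). -->
<properties>
  <parsers>
    <parser class="org.apache.tika.parser.DefaultParser">
      <parser-exclude class="org.apache.tika.parser.ocr.TesseractOCRParser"/>
    </parser>
  </parsers>
</properties>
```

The file name tika-config.xml is simply the one passed to --config above; any path should work as long as the two match.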

Yet, this does not seem to change anything: the image content is still extracted. Any idea what could be wrong?

Best,

Ron
