Hi again,
On 13/04/2016 13:18, ron.vandenbranden wrote:
> I wasn't aware of tesseract; I definitely don't have it on my
> classpath. I'm just testing with the stand-alone tika jar file. My
> Java skills are close to zero (apart from copy/paste and recompiling
> things). Could you tell me how to configure this for the standalone
> jar file, please?
OK, answering my own question: following the documentation at
https://tika.apache.org/1.12/gettingstarted.html, I got the CLI app
working with a configuration file, using the following command-line arguments:
java -jar tika-app-1.12.jar --gui --config=tika-config.xml
I'm using the example configuration file from
https://wiki.apache.org/tika/TikaOCR#Disable_Tika_OCR, which excludes the
TesseractOCRParser.
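
For reference, a configuration of that kind would look roughly like the sketch below. It assumes the parser-exclude syntax described on that wiki page; the exact contents of the tika-config.xml used here may differ.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch of a tika-config.xml that keeps the default parsers
     but excludes the Tesseract-based OCR parser. -->
<properties>
  <parsers>
    <parser class="org.apache.tika.parser.DefaultParser">
      <!-- Assumed class name of the OCR parser to exclude -->
      <parser-exclude class="org.apache.tika.parser.ocr.TesseractOCRParser"/>
    </parser>
  </parsers>
</properties>
```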
Yet, this does not seem to change anything: the image content is still
extracted. Any idea what could be wrong?
Best,
Ron