Inline OCR Unit tests fail on Windows (Tika 1.7)

Ulrich Lang Mon, 19 Feb 2018 20:18:34 -0800

Hello!

Tika is awesome! I can build it with no problems if I skip the tests.
However, OCR of inline images fails, for example images embedded in PDFs.
OCR of standalone image files works; only embedded images fail. I see the
same issue in the GUI app (plain PDFs are OK and images are OK, but PDFs
containing images are not), and in my own application.


Is there a trick to make it work? The unit tests for inline images all fail
as well, so I assume there is a configuration issue. I have set tesseractPath,
tessdataPath, and the path to magick.exe in the properties file in the
Maven project (Tika 1.7), in case it needs those paths:

tesseractPath="C:/Program Files (x86)/Tesseract-OCR"
tessdataPath="C:/Program Files (x86)/Tesseract-OCR/tessdata"
ImageMagickPath="C:/Program Files (x86)/ImageMagick"
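
One thing that may be worth ruling out with values like these: if the file is read with java.util.Properties (an assumption about how these settings are loaded), the surrounding double quotes are not stripped and become part of the value. A minimal sketch of a sanity check follows; the class name OcrPathCheck and the helpers parse and unquote are hypothetical and not part of Tika.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical helper, not part of Tika: shows what java.util.Properties
// returns for quoted values and whether the unquoted paths exist on disk.
public class OcrPathCheck {

    // Removes one pair of surrounding double quotes, if present.
    static String unquote(String value) {
        String v = value.trim();
        if (v.length() >= 2 && v.startsWith("\"") && v.endsWith("\"")) {
            return v.substring(1, v.length() - 1);
        }
        return v;
    }

    // Parses properties text the same way Properties.load reads a file.
    static Properties parse(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return props;
    }

    public static void main(String[] args) throws IOException {
        // Same three entries as in the message, quotes included.
        String text =
              "tesseractPath=\"C:/Program Files (x86)/Tesseract-OCR\"\n"
            + "tessdataPath=\"C:/Program Files (x86)/Tesseract-OCR/tessdata\"\n"
            + "ImageMagickPath=\"C:/Program Files (x86)/ImageMagick\"\n";

        Properties props = parse(text);
        for (String key : props.stringPropertyNames()) {
            String raw = props.getProperty(key);   // still contains the quotes
            String clean = unquote(raw);           // quotes removed
            boolean exists = Files.isDirectory(Paths.get(clean));
            System.out.println(key + ": raw=" + raw
                + " | unquoted=" + clean
                + " | directory exists=" + exists);
        }
    }
}
```

If the raw values printed here still carry the quotes, writing the entries without quotes in the properties file would be a cheap thing to try before looking further.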

Is there anything specific I need to configure to make inline OCR work? Or is
this maybe a Windows-specific issue?

Best,
Ulrich
ObjectSecurity
