Tika is awesome! I can build with no problems if I skip the tests.
However, OCR of inline images fails, for example images embedded in PDFs.
OCR of standalone images works, just not of embedded ones. I see the same
issue in the GUI app (PDFs are OK and images are OK, but PDFs containing
images are not). The same thing happens in my own application.
Is there a trick to make it work? The unit tests for inline images also all
fail, so I assume there is a config issue. I have set tesseractPath,
tessdataPath and the path to magick.exe in the properties file in the
Maven project (Tika 1.7), in case it needs those paths:
tesseractPath="C:/Program Files (x86)/Tesseract-OCR"
tessdataPath="C:/Program Files (x86)/Tesseract-OCR/tessdata"
ImageMagickPath="C:/Program Files (x86)/ImageMagick"
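One thing I am unsure about is the quoting. As far as I know, java.util.Properties keeps surrounding double quotes as part of the value, and I do not know whether Tika strips them when it reads this file. Below is a minimal sketch of how I would sanity-check the three values. It uses only the JDK; the class OcrPathCheck and its methods are hypothetical helpers of mine, not part of Tika.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical helper, not part of Tika: inspects the path properties above.
class OcrPathCheck {

    // Parse properties text the same way java.util.Properties would read a file.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Remove one pair of surrounding double quotes, if present.
    static String stripQuotes(String value) {
        String v = value.trim();
        if (v.length() >= 2 && v.startsWith("\"") && v.endsWith("\"")) {
            return v.substring(1, v.length() - 1);
        }
        return v;
    }

    // True if the raw value is wrapped in double quotes.
    static boolean isQuoted(String value) {
        return !stripQuotes(value).equals(value.trim());
    }

    public static void main(String[] args) throws IOException {
        // Pass the properties file as the first argument, or fall back to a sample.
        String text = args.length > 0
                ? new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.ISO_8859_1)
                : "tesseractPath=\"C:/Program Files (x86)/Tesseract-OCR\"\n";
        Properties props = parse(text);
        for (String key : new String[] {"tesseractPath", "tessdataPath", "ImageMagickPath"}) {
            String raw = props.getProperty(key);
            if (raw == null) {
                System.out.println(key + ": not set");
                continue;
            }
            String path = stripQuotes(raw);
            System.out.println(key + ": quoted=" + isQuoted(raw)
                    + ", directory exists=" + Files.isDirectory(Paths.get(path)));
        }
    }
}
```

If the values turn out to be quoted, removing the quotes from the properties file would be the first thing I would try, although I have not confirmed that this is the cause.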
Is there anything specific I need to configure to make inline OCR work? Is
this maybe a Windows thing?