Brian, I submitted a patch for this bug that was accepted by the team - https://github.com/apache/tika/pull/56
I don't think it has made it into any release version yet.

On Wed, Sep 9, 2015 at 3:55 PM, Brian Young <[email protected]> wrote:

> Hello,
>
> On OS X at least, tesseract and tessdata may not be under a common root.
> e.g.:
>
> /opt/local/share/tessdata
> /opt/local/bin/tesseract
>
> Unfortunately it looks like TesseractOCRParser does not accommodate for
> this since there is only one configuration value that is used for finding
> the binary as well as setting the TESSDATA_PREFIX environment var.
>
> Now, TESSDATA_PREFIX does not get set if I do not pass in the path on the
> config object. However, even though tesseract is in my path, it isn't
> found when the ProcessBuilder executes unless I've given it the full
> path... which of course sets the TESSDATA_PREFIX to the wrong thing.
>
> It seems like maybe it would be best to handle these as two separate
> configuration values? But short of that and a new version of Tika, does
> anyone have any other advice?
>
> Thank you
> Brian
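
The idea of handling the binary location and TESSDATA_PREFIX as two separate values can be sketched outside Tika with a plain ProcessBuilder. This is only an illustration, not Tika's API and not necessarily how the patch does it: the class TesseractLauncher, the build method and its parameter names are all hypothetical. Which directory TESSDATA_PREFIX should name (the tessdata directory itself or its parent) depends on the tesseract version, so check that against your install.

```java
// Hypothetical helper, not part of Tika: shows the binary path and
// TESSDATA_PREFIX being configured independently of each other.
final class TesseractLauncher {

    /**
     * Builds (but does not start) a tesseract process.
     *
     * @param tesseractBinary full path to the executable, e.g. /opt/local/bin/tesseract
     * @param tessdataPrefix  value for TESSDATA_PREFIX, or null to leave the
     *                        inherited environment untouched
     * @param inputImage      image file to run OCR on
     * @param outputBase      output file name without extension
     */
    static ProcessBuilder build(String tesseractBinary, String tessdataPrefix,
                                String inputImage, String outputBase) {
        // The command line uses only the binary path.
        ProcessBuilder pb = new ProcessBuilder(tesseractBinary, inputImage, outputBase);

        // The data location is set separately, and only when one was supplied.
        if (tessdataPrefix != null) {
            pb.environment().put("TESSDATA_PREFIX", tessdataPrefix);
        }

        // Merge stderr into stdout so a caller has a single stream to read.
        pb.redirectErrorStream(true);
        return pb;
    }

    public static void main(String[] args) {
        // Paths taken from the example in the quoted message; adjust as needed.
        ProcessBuilder pb = build("/opt/local/bin/tesseract", "/opt/local/share",
                                  "page.png", "page");
        System.out.println(pb.command());
        System.out.println(pb.environment().get("TESSDATA_PREFIX"));
    }
}
```

Calling start() on the returned ProcessBuilder would launch tesseract with the full binary path while the data prefix points at a different root, which is the combination a single shared configuration value cannot express.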
