You can build from source if you have an interest (and the bandwidth, time and
disk space) or pull a nightly build if you don’t want to wait for 1.11, for
example:
https://builds.apache.org/view/Tika/job/tika-trunk-jdk1.7/849/org.apache.tika$tika-app/
Thank you, Christian!
Best,
Tim
From: Brian Young [mailto:[email protected]]
Sent: Wednesday, September 09, 2015 4:09 PM
To: [email protected]
Subject: Re: tesseract issue
Ah, that is very good, thank you. Looks like it will be in 1.11.
On Wed, Sep 9, 2015 at 4:00 PM, Christian Wolfe
<[email protected]> wrote:
Brian,
I submitted a patch for this bug that was accepted by the team -
https://github.com/apache/tika/pull/56
I don't think it has made it into any release version yet.
On Wed, Sep 9, 2015 at 3:55 PM, Brian Young
<[email protected]> wrote:
Hello,
On OS X at least, tesseract and tessdata may not be under a common root. e.g.:
/opt/local/share/tessdata
/opt/local/bin/tesseract
Unfortunately, it looks like TesseractOCRParser does not accommodate this,
since there is only one configuration value, which is used both for finding the
binary and for setting the TESSDATA_PREFIX environment variable.
Now, TESSDATA_PREFIX does not get set if I do not pass in the path on the
config object. However, even though tesseract is in my path, it isn't found
when the ProcessBuilder executes unless I've given it the full path... which of
course sets the TESSDATA_PREFIX to the wrong thing.
It seems like maybe it would be best to handle these as two separate
configuration values? But short of that and a new version of Tika, does anyone
have any other advice?
Thank you
Brian
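
For anyone reading the thread later, the "two separate configuration values" idea from the original message can be sketched outside of Tika with a plain ProcessBuilder. This is only an illustration under stated assumptions, not the code from the accepted patch: the class name TesseractLauncher, the method build, and its parameters are hypothetical, and the paths in the usage note are simply the ones from Brian's OS X example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Tika: it keeps the location of the
// tesseract binary and the value of TESSDATA_PREFIX as two independent settings.
class TesseractLauncher {

    static ProcessBuilder build(String tesseractBinary, String tessdataPrefix,
                                String inputImage, String outputBase) {
        // Use the full path to the executable so the lookup does not depend
        // on the PATH seen by the Java process.
        List<String> command = Arrays.asList(tesseractBinary, inputImage, outputBase);
        ProcessBuilder pb = new ProcessBuilder(command);

        // Set TESSDATA_PREFIX only when a value was supplied, and never
        // derive it from the binary's location.
        if (tessdataPrefix != null && !tessdataPrefix.isEmpty()) {
            pb.environment().put("TESSDATA_PREFIX", tessdataPrefix);
        }

        // Merge stderr into stdout so the caller only has one stream to drain.
        pb.redirectErrorStream(true);
        return pb;
    }
}
```

Calling TesseractLauncher.build("/opt/local/bin/tesseract", "/opt/local/share/", "page.png", "page").start() would then launch the binary by its full path with the environment variable set separately. Whether TESSDATA_PREFIX should point at the tessdata directory itself or at its parent appears to depend on the Tesseract version, so treat the value above as an assumption and check it against your install.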