This would be a very welcome change, Christian. Please create a JIRA
issue at:

http://issues.apache.org/jira/browse/TIKA

Please also update the wiki page at http://wiki.apache.org/tika/TikaOCR

I would be happy for you to contribute via SVN and a JIRA patch, and/or
via GitHub as described here:

http://github.com/apache/tika/#contributing

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: [email protected]
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++





-----Original Message-----
From: Christian Wolfe <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Wednesday, July 22, 2015 at 7:11 PM
To: "[email protected]" <[email protected]>
Subject: TesseractOCRParser on Linux

>Hi folks,
>
>It looks to me that TesseractOCRParser doesn't work on Linux unless the
>Tesseract executable and the 'tessdata' folder are in the same location
>on the filesystem. This makes sense in a Windows environment (where
>everything is installed together by default), but on Linux, package
>managers (*and* source code installations) tend to split the files up
>across the filesystem.
>
>
>I believe this could be alleviated by creating a second property in
>TesseractOCRConfig that points to the 'tessdata' folder separately from
>the Tesseract executable. That, or a bit of documentation that
>clarifies that the files need to be together.
>
>
>I would be more than willing to work on either solution, but only if the
>team considered it worthwhile.
>
>
>Anyway, thanks for making a great library, and for taking time to read
>this.
>
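
To make the "second property" idea from the quoted message concrete, here
is a minimal sketch in Java. It assumes the data location is handed to
Tesseract through the TESSDATA_PREFIX environment variable; whether that
variable should name the 'tessdata' folder itself or its parent depends
on the Tesseract version, so check its documentation. The class
OcrPathsSketch and its methods are hypothetical and are not Tika's
TesseractOCRConfig API; the paths in main are only examples.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a config class; NOT Tika's TesseractOCRConfig.
class OcrPathsSketch {
    // Directory holding the tesseract executable; empty means "rely on PATH".
    private String tesseractPath = "";
    // Location of the language data, configured separately from the executable;
    // empty means "let Tesseract use its own default".
    private String tessdataPath = "";

    void setTesseractPath(String path) {
        // Add a trailing separator so the executable name can be appended directly.
        if (!path.isEmpty() && !path.endsWith(File.separator)) {
            path += File.separator;
        }
        this.tesseractPath = path;
    }

    void setTessdataPath(String path) {
        this.tessdataPath = path;
    }

    // Builds (but does not start) the process that would run
    // "tesseract <inputImage> <outputBase>".
    ProcessBuilder buildProcess(String inputImage, String outputBase) {
        List<String> command = new ArrayList<>();
        command.add(tesseractPath + "tesseract");
        command.add(inputImage);
        command.add(outputBase);

        ProcessBuilder pb = new ProcessBuilder(command);
        if (!tessdataPath.isEmpty()) {
            // Assumption: the data location is passed via TESSDATA_PREFIX.
            pb.environment().put("TESSDATA_PREFIX", tessdataPath);
        }
        return pb;
    }

    public static void main(String[] args) {
        OcrPathsSketch config = new OcrPathsSketch();
        config.setTesseractPath("/usr/bin");                  // example path
        config.setTessdataPath("/usr/share/tesseract-ocr");   // example path

        ProcessBuilder pb = config.buildProcess("scan.png", "scan-out");
        System.out.println(pb.command());
        System.out.println(pb.environment().get("TESSDATA_PREFIX"));
    }
}
```

Leaving the data path empty keeps the current behaviour, so existing
setups where everything lives in one place would not need to change.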
