Thanks Konstantin!
++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@na
JFYI, there's no tesseract & leptonica for centos6/rhel6 (even in epel), so
I have specs for building tesseract and leptonica (its dependency) on
github (https://github.com/grossws/tesseract-ocr-specs). Feel free to use
if you're on centos/rhel.
Also, tesseract language packs are trained for one l
Yes that is fixed.
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
> -Original Message-
> From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov]
> Sent: Monday, April 27, 2015 4:29 PM
> To: user@tika.apache.org
> Cc:
Hi,
Tika OCR definitely works automatically with Solr 5.x.
The important part is to install Tesseract OCR (the native tool that does
the actual work) and have it on the PATH. On Ubuntu Linux, this should be quite
simple ("apt-get install tesseract-ocr" or similar). You may also need to install
add
It should work out of the box in Solr as long as Tesseract is
installed and on the PATH. Solr had an issue with it because
Tika sends two startDocument calls, but I fixed that with Uwe, and
the fix shipped in 4.10.4 and in 5.x, I think.
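
A quick way to check the "installed and on the PATH" part before digging into Solr or Tika configuration is a small script like the one below. This is only a sketch: the helper names (find_tesseract, tesseract_version) are made up for illustration, and it assumes the executable is called "tesseract", which is the usual name but may differ on your system.

```python
import shutil
import subprocess


def find_tesseract(binary="tesseract"):
    """Return the full path of the Tesseract executable, or None if it is not on PATH."""
    # shutil.which performs the same PATH lookup the shell would.
    return shutil.which(binary)


def tesseract_version(path):
    """Run `tesseract --version` and return the first line of its output."""
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    # Some Tesseract releases print the version banner to stderr, so check both.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else ""


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("tesseract not found on PATH; install it first")
    else:
        print(f"found {path}: {tesseract_version(path)}")
```

Run it as the same user that runs Solr, since the PATH seen by that process is the one that matters.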
++
Trung,
I haven't experimented with our OCR parser yet, but this should give a good
start: https://wiki.apache.org/tika/TikaOCR .
Have you installed tesseract?
Tika colleagues,
Any other tips? What else has to be configured and how?
-Original Message-
From: trung.ht [mailto:trung...@