Re: TIKA OCR not working

2015-04-27 Thread Mattmann, Chris A (3980)
Thanks Konstantin! ++ Chris Mattmann, Ph.D. Chief Architect Instrument Software and Science Data Systems Section (398) NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 168-519, Mailstop: 168-527 Email: chris.a.mattm...@na

Re: TIKA OCR not working

2015-04-27 Thread Konstantin Gribov
JFYI, there are no tesseract or leptonica packages for CentOS 6/RHEL 6 (not even in EPEL), so I keep specs for building tesseract and leptonica (its dependency) on GitHub (https://github.com/grossws/tesseract-ocr-specs). Feel free to use them if you're on CentOS/RHEL. Also, tesseract language packs are trained for one l

RE: TIKA OCR not working

2015-04-27 Thread Uwe Schindler
Yes that is fixed. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov] > Sent: Monday, April 27, 2015 4:29 PM > To: user@tika.apache.org > Cc:

RE: TIKA OCR not working

2015-04-27 Thread Uwe Schindler
Hi, TIKA OCR definitely works automatically with Solr 5.x. The important part is to install Tesseract OCR (the native tool that does the actual work) and make sure it is on the PATH. On Ubuntu Linux, this should be quite simple ("apt-get install tesseract-ocr" or similar). You may also need to install add
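A quick way to confirm the point above, that the native tool is reachable on the PATH of the user running Solr or Tika, is a small check like the following. This is only a sketch: it assumes the executable is named "tesseract" (the usual name on Linux packages), and the helper names find_tesseract and tesseract_version are made up for illustration. They are not part of Tika or Solr.

```python
import shutil
import subprocess


def find_tesseract(binary="tesseract"):
    """Return the absolute path of the executable, or None if it is not on PATH.

    The default name "tesseract" is an assumption; adjust it if your
    distribution installs the tool under a different name.
    """
    return shutil.which(binary)


def tesseract_version(path):
    """Return the first line printed by `<path> --version`.

    Some Tesseract releases print the version banner to stderr rather than
    stdout, so both streams are captured and whichever is non-empty is used.
    """
    result = subprocess.run(
        [path, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else ""


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("tesseract not found on PATH; install it before enabling OCR")
    else:
        print("found:", path)
        print("version:", tesseract_version(path))
```

Run it as the same user (and with the same environment) that starts Solr or Tika, since the PATH seen by a service can differ from the one in your login shell.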

Re: TIKA OCR not working

2015-04-27 Thread Mattmann, Chris A (3980)
It should work out of the box in Solr as long as Tesseract is installed and on the PATH. Solr had an issue with it because Tika sends two startDocument calls, but I fixed that with Uwe, and the fix shipped in 4.10.4 and in 5.x, I think. ++

FW: TIKA OCR not working

2015-04-27 Thread Allison, Timothy B.
Trung, I haven't experimented with our OCR parser yet, but this should give a good start: https://wiki.apache.org/tika/TikaOCR . Have you installed tesseract? Tika colleagues, Any other tips? What else has to be configured and how? -Original Message- From: trung.ht [mailto:trung...@