RE: Tesseract - OCR and Tika

Allison, Timothy B. Tue, 20 Jun 2017 08:16:01 -0700

Bouncing to user@

Are you able to share the document?


How exactly are you running OCR:
1) running OCR on the extracted inline images, or
2) rendering each page and then running OCR on the rendered image?

What is the quality of the image?

Are you using the right language pack for the language?
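
To check the language pack question outside of Tika, a rough sketch along these lines might help. It assumes the tesseract command-line tool is on the PATH and that one page has already been exported as an image; the file name page-1.png and both helper function names are hypothetical, used only for illustration. It calls tesseract directly and does not go through Tika.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, lang="eng"):
    """Return the argument list for OCR-ing one image and printing the text."""
    # "stdout" as the output base makes tesseract print the recognized text
    # instead of writing a .txt file; -l selects the language pack.
    return ["tesseract", image_path, "stdout", "-l", lang]


def installed_languages():
    """Return the language packs tesseract reports, or [] if it is not on PATH."""
    if shutil.which("tesseract") is None:
        return []
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        capture_output=True, text=True, check=False,
    )
    # Depending on the tesseract version the list goes to stdout or stderr.
    lines = (result.stdout or result.stderr).splitlines()
    # The first line is a header; the remaining lines are language codes.
    return [line.strip() for line in lines[1:] if line.strip()]


if __name__ == "__main__":
    # page-1.png is a hypothetical file name for an exported page image.
    print("Installed language packs:", installed_languages())
    print("Command:", " ".join(build_tesseract_command("page-1.png", "eng")))
```

Running the printed command by hand on a single page image and comparing its output with what Tika returns can help narrow down whether the problem lies with the image itself or with how OCR is being invoked.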

-----Original Message-----
From: Mattmann, Chris A (3010) [mailto:[email protected]] 
Sent: Tuesday, June 20, 2017 10:02 AM
To: [email protected]
Cc: Ravi Gadapa <[email protected]>
Subject: Re: Tesseract - OCR and Tika

FWD’ing to the Tika list (note TO: address change)


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Principal Data Scientist, Engineering Administrative Office (3010) Manager, NSF 
& Open Source Projects Formulation and Development Offices (8212) NASA Jet 
Propulsion Laboratory Pasadena, CA 91109 USA
Office: 180-503E, Mailstop: 180-503
Email: [email protected]
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Director, Information Retrieval and Data Science Group (IRDS) Adjunct Associate 
Professor, Computer Science Department University of Southern California, Los 
Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


From: Ravi Gadapa <[email protected]>
Date: Monday, June 19, 2017 at 8:56 PM
To: "[email protected]" <[email protected]>
Subject: Tesseract - OCR and Tika

I have been using it for our project and I seem to have a problem extracting the 
data from PDF documents. Below is a sample of how it extracts.

'EldAJ. iNEIWEI‘IEI ‘IVHG El‘c'l TIVHS SEIHOJJMS TIV "8 'NOILVGNEIWINOOEIEI 
ElElElﬂiOVdﬂNVW iNEIWdIﬂOEI ElElcl SV 3|in EIWVN S.J_NE|V\ld|ﬂOE| NO GEISVEI 
EIEI TIVHS HOJJMS iOEINNOOSIG iNEIWdIﬂOEI HO:| EIZIS ElSﬂzl TIV 'Z 'GEliON 
EISIMEIEIHLO SSEI‘INH ‘EldAJ. EltlﬂSO‘IONEI HS VINEIN NI EIEI TIVHS SEIHOJJMS 
iOEINNOOSIG HOOGiﬂO TIV 'L


Any suggestions?

Thanks
