Bouncing to user@

Are you able to share the document?
How are you running OCR exactly:
1) running OCR on extracted inline images
2) rendering the page and then running OCR on the rendered image

What is the quality of the image? Are you using the right language pack for the language?

-----Original Message-----
From: Mattmann, Chris A (3010) [mailto:[email protected]]
Sent: Tuesday, June 20, 2017 10:02 AM
To: [email protected]
Cc: Ravi Gadapa <[email protected]>
Subject: Re: Tesseract - OCR and Tika

FWD’ing to the Tika list (note TO: address change)

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Principal Data Scientist, Engineering Administrative Office (3010)
Manager, NSF & Open Source Projects Formulation and Development Offices (8212)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 180-503E, Mailstop: 180-503
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

From: Ravi Gadapa <[email protected]>
Date: Monday, June 19, 2017 at 8:56 PM
To: "[email protected]" <[email protected]>
Subject: Tesseract - OCR and Tika

I have been using it for our project and I seem to have a problem extracting the data from PDF documents. Below is a sample of how it extracts.

'EldAJ. iNEIWEI‘IEI ‘IVHG El‘c'l TIVHS SEIHOJJMS TIV "8 'NOILVGNEIWINOOEIEI ElElElfliOVdflNVW iNEIWdIflOEI ElElcl SV 3|in EIWVN S.J_NE|V\ld|flOE| NO GEISVEI EIEI TIVHS HOJJMS iOEINNOOSIG iNEIWdIflOEI HO:| EIZIS ElSflzl TIV 'Z 'GEliON EISIMEIEIHLO SSEI‘INH ‘EldAJ. EltlflSO‘IONEI HS VINEIN NI EIEI TIVHS SEIHOJJMS iOEINNOOSIG HOOGiflO TIV 'L

Any suggestions?

Thanks
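
For reference, the two OCR approaches asked about at the top of this thread can be compared outside of Tika with a rough sketch like the one below. It is not how Tika drives Tesseract internally; it assumes the poppler command-line tools (`pdfimages`, `pdftoppm`) and the `tesseract` binary are installed and on the PATH, and the helper names, the `img` file prefix, and the 300 dpi default are made up for illustration.

```python
import subprocess
from pathlib import Path


def extract_inline_images_cmd(pdf, out_prefix):
    # Approach 1: pull the embedded images out of the PDF as stored (poppler's pdfimages).
    return ["pdfimages", "-png", str(pdf), str(out_prefix)]


def render_pages_cmd(pdf, out_prefix, dpi=300):
    # Approach 2: rasterize each whole page at the given resolution (poppler's pdftoppm).
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf), str(out_prefix)]


def tesseract_cmd(image, lang="eng"):
    # "stdout" as the output base makes tesseract print the recognized text;
    # -l selects the language pack, which must be installed separately.
    return ["tesseract", str(image), "stdout", "-l", lang]


def ocr_pdf(pdf, workdir, strategy="render", lang="eng", dpi=300):
    """Run OCR on a PDF using either the "inline" or the "render" strategy."""
    if strategy == "inline":
        prepare = extract_inline_images_cmd
        extra = ()
    elif strategy == "render":
        prepare = render_pages_cmd
        extra = (dpi,)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")

    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    prefix = workdir / "img"

    # Step 1: produce PNG files from the PDF.
    subprocess.run(prepare(pdf, prefix, *extra), check=True)

    # Step 2: OCR every PNG that step 1 wrote and join the text.
    texts = []
    for image in sorted(workdir.glob("img*.png")):
        result = subprocess.run(
            tesseract_cmd(image, lang),
            check=True,
            capture_output=True,
            text=True,
        )
        texts.append(result.stdout)
    return "\n".join(texts)
```

Calling `ocr_pdf("sample.pdf", "out_inline", strategy="inline")` and `ocr_pdf("sample.pdf", "out_render", strategy="render")` on the same file (here `sample.pdf` is a placeholder) and comparing the two outputs, along with the PNG files left in each directory, may help show whether the problem lies in the image itself, the way it is obtained, or the language pack.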
