Re: Long time with OCR

2018-02-20 Thread Mark Kerzner
help improve your speed, but I thought I’d share. From: Chris Mattmann [mailto:mattm...@apache.org] Sent: Tuesday, February 20, 2018 12:31 PM To: user@tika.apache.org Subject: Re: Long time with OCR Updated the wiki page with

RE: Long time with OCR

2018-02-20 Thread Allison, Timothy B.
won’t help improve your speed, but I thought I’d share. From: Chris Mattmann [mailto:mattm...@apache.org] Sent: Tuesday, February 20, 2018 12:31 PM To: user@tika.apache.org Subject: Re: Long time with OCR Updated the wiki page with this info, thanks Nick! From: Mark Kerzner mailto:mar

Re: Long time with OCR

2018-02-20 Thread Chris Mattmann
Updated the wiki page with this info, thanks Nick! From: Mark Kerzner Reply-To: "user@tika.apache.org" Date: Tuesday, February 20, 2018 at 6:36 AM To: Tika User Subject: Re: Long time with OCR Hi, Nick, Thank you very much. Mark Mark Kerzner, SHMsoft, B

Re: Long time with OCR

2018-02-20 Thread Mark Kerzner
Hi, Nick, Thank you very much. Mark Mark Kerzner, SHMsoft Mobile: 713-724-2534 Skype: mark.kerzner1 On Tue, Feb 20, 2018 at 6:59 AM, Nick Burch wrote: > On Mon, 19 Feb 2018, Mark Kerzner

Re: Long time with OCR

2018-02-20 Thread Nick Burch
On Mon, 19 Feb 2018, Mark Kerzner wrote: Is that a good approach? Is the 10 seconds time normal? I am using the latest most powerful Mac and I get similar results on an i7 processor in Ubuntu. Tika uses the open source Tesseract OCR engine. Tesseract is optimised for ease of contributions and

Long time with OCR

2018-02-19 Thread Mark Kerzner
Hi, all, I am doing OCR on a PDF with more than 500 pages. Since it takes a long time, I broke the PDF into individual pages, so that I can better track progress. It works, but I get 10 seconds per page. These pages are hard because they have different fonts and maybe other complications.
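
For anyone reading this thread later, the per-page approach described above can be sketched roughly as follows. This is only an illustration, not code from the thread: `ocr_page` is a hypothetical callable standing in for whatever actually runs OCR on one single-page file (for example a call into Tika), and the function names are made up here. The sketch times each page to track progress, and the small helper does the arithmetic for the figures quoted in the message (500 pages taken as a lower bound, 10 seconds per page).

```python
import time
from typing import Callable, Iterable, List, Tuple


def estimate_total_seconds(pages: int, seconds_per_page: float) -> float:
    """Serial estimate: every page takes the same time, one after another."""
    return pages * seconds_per_page


def ocr_pages_with_progress(
    page_paths: Iterable[str],
    ocr_page: Callable[[str], str],  # hypothetical: OCR one single-page file
    clock: Callable[[], float] = time.perf_counter,
) -> List[Tuple[str, str, float]]:
    """Run ocr_page on each path in order and report progress per page.

    Returns a list of (path, extracted_text, elapsed_seconds) tuples.
    """
    paths = list(page_paths)
    results = []
    for index, path in enumerate(paths, start=1):
        start = clock()
        text = ocr_page(path)
        elapsed = clock() - start
        results.append((path, text, elapsed))
        print(f"[{index}/{len(paths)}] {path}: {elapsed:.1f}s")
    return results
```

With the numbers from the message, `estimate_total_seconds(500, 10.0)` gives 5000 seconds, a bit under an hour and a half if the pages are processed one at a time. Whether 10 seconds per page is normal depends on the pages and on the OCR settings, which the sketch does not address.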