Re: Long time with OCR

Mark Kerzner Tue, 20 Feb 2018 06:36:56 -0800

Hi, Nick,

Thank you very much.


Mark

Mark Kerzner, SHMsoft <http://shmsoft.com/>,
Book a call with me here <http://www.meetme.so/markkerzner>

Mobile: 713-724-2534
Skype: mark.kerzner1

On Tue, Feb 20, 2018 at 6:59 AM, Nick Burch <[email protected]> wrote:

> On Mon, 19 Feb 2018, Mark Kerzner wrote:
>
>> Is that a good approach? Is a time of 10 seconds normal? I am using the
>> latest, most powerful Mac, and I get similar results on an i7 processor
>> in Ubuntu.
>>
>
> Tika uses the open source Tesseract OCR engine. Tesseract is optimised for
> ease of contribution and ease of implementing new approaches, rather than
> for performance, because as an (ex?-) academic project that's what its
> developers consider more important.
>
> There's some advice on the Tesseract github issues + wiki on ways to speed
> it up, eg https://github.com/tesseract-ocr/tesseract/issues/263 and
> https://github.com/tesseract-ocr/tesseract/issues/1171 and
> https://github.com/tesseract-ocr/tesseract/wiki/4.0-Accuracy-and-Performance
>
> Otherwise you'd need to switch to a proprietary OCR tool. I understand
> that the Google Cloud OCR is pretty good, if you don't mind pushing all
> your files up to Google and paying per file.
>
> Nick
>
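
Before trying the tuning advice linked above, it may help to check how much of the 10 seconds is spent in Tesseract itself rather than in Tika. The following is only a rough sketch under assumptions: it assumes the `tesseract` command-line tool is installed and on the PATH, `page.png` is a placeholder file name, and the helper names `tesseract_command` and `time_command` are made up for this illustration.

```python
import os
import subprocess
import time


def tesseract_command(image_path, lang="eng"):
    """Build a Tesseract CLI call that writes recognised text to stdout."""
    return ["tesseract", image_path, "stdout", "-l", lang]


def time_command(cmd, extra_env=None):
    """Run cmd once and return (elapsed_seconds, exit_code)."""
    env = dict(os.environ)
    env.update(extra_env or {})  # optional environment overrides for the run
    start = time.perf_counter()
    result = subprocess.run(
        cmd,
        env=env,
        stdout=subprocess.DEVNULL,  # discard the OCR text, only timing matters
        stderr=subprocess.DEVNULL,
    )
    return time.perf_counter() - start, result.returncode


# Example use (page.png is a placeholder, not a file from this thread):
# elapsed, code = time_command(tesseract_command("page.png"))
# print(f"{elapsed:.1f}s, exit code {code}")
```

If the standalone run takes roughly as long as the run through Tika, the time is most likely going into OCR itself, and the Tesseract-side suggestions are the place to look.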
