I created a large (1800-page) multi-page TIFF and am feeding it to 
Tesseract via the command line (on Ubuntu), so that I am testing the 
performance of Tesseract itself.  I am still getting about 5-6 PPM.  I will 
run the test on another machine to see if the performance is the same.  Is 
this the performance that you are seeing for similar pages (details in 
thread above)?  This is about 25% of the performance of a commercial engine 
that I am evaluating (it gets about 24 PPM with 2 cores on my laptop), and 
the commercial engine's accuracy is also significantly better.
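
For reference, here is a rough Java sketch of one way to process pages in parallel while creating only one engine per worker thread, following the advice quoted below about not initializing and disposing of an instance for every page. The OcrEngine interface and the PooledOcr class are hypothetical stand-ins, not part of Tess4J; in a real setup the engine would presumably wrap a Tess4J instance, and I am assuming that an engine instance should not be shared between threads.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Supplier;

class PooledOcr {

    /** Hypothetical stand-in for an OCR engine; NOT a Tess4J type. */
    interface OcrEngine {
        String recognize(String pagePath);
    }

    /**
     * OCRs every page using a fixed pool of worker threads. Each worker
     * lazily creates one engine on first use and reuses it for all pages
     * it handles. Results come back in the same order as the input.
     */
    static List<String> ocrAll(List<String> pages, int threads,
                               Supplier<OcrEngine> engineFactory)
            throws InterruptedException, ExecutionException {
        // One engine per worker thread, created on that thread's first page.
        ThreadLocal<OcrEngine> perThreadEngine = ThreadLocal.withInitial(engineFactory);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String page : pages) {
                futures.add(pool.submit(() -> perThreadEngine.get().recognize(page)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get()); // blocks until that page is done
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Dummy engine that only echoes the page name (assumption for the demo).
        List<String> pages = Arrays.asList("page-1.tif", "page-2.tif", "page-3.tif");
        List<String> text = ocrAll(pages, 2, () -> page -> "text of " + page);
        text.forEach(System.out::println);
    }
}
```

With this shape the number of engines created is bounded by the thread count rather than the page count. Whether it actually helps depends on how much of the per-page time is engine start-up versus recognition, which I have not measured.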

- viraf

On Friday, February 19, 2016 at 7:50:09 AM UTC-5, viraf wrote:
>
> Thanks - I will investigate further.  The initial test that I ran based on 
> Tom's input showed around the same performance (I used a multi-page TIFF); 
> however, the article you referenced indicated a speedup factor of 2x.  
>
> Is there a way to have Tesseract process the pages in parallel?
>
> On Thursday, February 18, 2016 at 9:58:12 PM UTC-5, Quan Nguyen wrote:
>>
>> If you can reduce or minimize initializing and disposing of Tesseract 
>> native instances for every run, you can achieve a significant performance 
>> increase.
>>
>> https://sourceforge.net/p/tess4j/discussion/1202294/thread/d32bd579/ 
>>
>> On Sunday, February 14, 2016 at 10:15:12 AM UTC-6, viraf wrote:
>>>
>>> I am new to tesseract and using it through Tess4J.  I am trying to OCR 
>>> faxes where pages are represented as TIFF (CCITT T.6) images - 2509 x 3530 
>>> @ 300 dpi (1 bit - i.e. BW).  
>>>
>>> I have two sets of questions
>>>
>>> *Speed*
>>> On an Intel i7-4800MQ @ 2.7GHz I am getting approximately 6 PPM using 1 
>>> thread.  I was looking for suggestions on how to speed up page processing. 
>>>  I use parallelStream to process each page in a separate thread,
>>>
>>>
>>> - viraf
>>>
>>
