On Wednesday, February 17, 2016 at 5:50:20 PM UTC-5, viraf wrote:
>
> I am exploring Tesseract 3.0.4 (using Tess4J 3.0) and wanted to poll the 
> community on the performance (speed) that they are observing for fax 
> documents.
>
> I am processing TIFF (CCITT T.6) images, 2509 x 3530 @ 300 dpi (1 bit, 
> i.e. BW), that are in English (a mixture of forms, letters and 
> diagrams) on an Intel i7-4800MQ @ 2.7GHz and observing approximately 6 
> PPM using a single thread and no GPU usage.  I observed similar results on 
> both Windows and Linux.
>
> Is this representative of the performance that you are observing? 
>

I recently benchmarked using the UNLV 300 dpi bitonal document sets and got 
the following results with 3.04.01 running from the command line with a new 
invocation of Tesseract for every page. The times below are for a single 
thread on a 2.6 GHz Core i7 MacBook Pro:

bus.3B - business letters, 200 files, 367 seconds
doe3.3B - tech docs, reports & other DOE images, 785 files, 1650 seconds
news.3B - newspaper articles, 200 files, 618 seconds

In summary, about 1.8-3.1 seconds/page, depending on the content. Dense 
text, noisy scans, small fonts, etc. will all increase processing time.
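
For anyone who wants to compare against their own numbers, the arithmetic 
behind that summary can be sketched as below. The totals are the ones 
reported above; the function names are just illustrative. By the same 
conversion, 6 PPM works out to 10 seconds/page, though the page sizes and 
hardware differ, so the figures are not directly comparable.

```python
# Totals reported above: (set name, number of files, total seconds).
RESULTS = [
    ("bus.3B", 200, 367),
    ("doe3.3B", 785, 1650),
    ("news.3B", 200, 618),
]


def seconds_per_page(files, total_seconds):
    """Average wall-clock seconds spent on each page."""
    return total_seconds / files


def pages_per_minute(files, total_seconds):
    """Throughput in pages per minute (PPM)."""
    return 60.0 * files / total_seconds


def summarize(results):
    """Return (name, seconds/page, pages/minute) for each document set."""
    return [
        (name, seconds_per_page(files, secs), pages_per_minute(files, secs))
        for name, files, secs in results
    ]


if __name__ == "__main__":
    for name, spp, ppm in summarize(RESULTS):
        print(f"{name:8s} {spp:.2f} s/page  {ppm:.1f} pages/min")
```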

The benchmark files are all available for download if anyone wants to do 
their own testing:
    https://code.google.com/archive/p/isri-ocr-evaluation-tools/downloads
and there are some basic directions here:
    https://github.com/tesseract-ocr/tesseract/wiki/TestingTesseract
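
A rough sketch of timing one Tesseract invocation per page, similar in 
spirit to the runs above, might look like the following. This is not the 
script I used; it assumes the `tesseract` binary is on the PATH, that it is 
called as `tesseract <image> <output base>`, and that the page images end 
in `.tif`. The helper names are made up for illustration. Output goes to a 
temporary directory so nothing next to the images gets overwritten.

```python
import subprocess
import sys
import tempfile
import time
from pathlib import Path


def tesseract_cmd(page, out_dir):
    # Assumed CLI form: tesseract <image> <output base>.
    # The output base is placed in out_dir, away from the source images.
    return ["tesseract", str(page), str(Path(out_dir) / Path(page).stem)]


def time_pages(pages, make_cmd=tesseract_cmd):
    """Run one new process per page; return a list of (page, seconds)."""
    results = []
    with tempfile.TemporaryDirectory() as out_dir:
        for page in pages:
            start = time.perf_counter()
            subprocess.run(
                make_cmd(page, out_dir),
                check=True,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
            results.append((page, time.perf_counter() - start))
    return results


if __name__ == "__main__":
    # Directory of page images; defaults to the current directory.
    directory = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    timings = time_pages(sorted(directory.glob("*.tif")))
    if timings:
        total = sum(seconds for _, seconds in timings)
        print(f"{len(timings)} files, {total:.0f} seconds, "
              f"{total / len(timings):.2f} s/page")
```

Because each page starts a fresh process, the measured time includes 
process startup and language data loading, not just recognition.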

Tom
