[tesseract-ocr] Re: Tesseract performance (speed and accuracy)

Tomy Chacko Sun, 15 Jan 2017 21:33:07 -0800

Hi All,

    I have been following this thread regarding Tesseract performance. We
are processing large PDFs (100s of pages, each page converted to BMP) and
sending the pages to Tesseract one by one.
    I am only interested in identifying the orientation of the text in each
image and rotating the image accordingly.


    I see that each image takes nearly 3 seconds on average, so a
hundred-page PDF takes around 275 - 300 seconds. Isn't this a bit too
high?

    I am currently using the .NET Tesseract wrapper 3.0.2. Is a newer
release available, and would it improve performance?

    Also, all of my Tesseract functionality is implemented in a .NET
assembly (DLL), which is then called from our Delphi client.

    I understand that the Tesseract init process is a bit costly, but I am
wondering how to initialize the engine only once in the .NET assembly (DLL)
and reuse it for all pages of the PDF, so that I can save time when sending
subsequent pages from Delphi to the .NET assembly for processing.
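
    To make the init-once idea concrete, here is a minimal sketch in Python
rather than .NET. FakeOrientationEngine is a hypothetical stand-in for the
real engine, and get_engine / process_page are made-up names, not the
wrapper's actual API. It only illustrates the pattern of creating the engine
on the first call and handing back the same instance afterwards.

```python
import threading


class FakeOrientationEngine:
    """Hypothetical stand-in for an OCR engine with a costly start-up."""

    init_count = 0  # how many times the costly start-up has run

    def __init__(self):
        # A real engine would load its language data here (the slow part).
        FakeOrientationEngine.init_count += 1

    def detect_orientation(self, page_bytes):
        # Placeholder result; a real engine would inspect the image.
        return 0


_engine = None
_engine_lock = threading.Lock()


def get_engine():
    """Create the engine on first use, then return the same instance."""
    global _engine
    with _engine_lock:
        if _engine is None:
            _engine = FakeOrientationEngine()
        return _engine


def process_page(page_bytes):
    """Entry point the client would call once per page."""
    return get_engine().detect_orientation(page_bytes)
```

    Calling process_page for every page of a document runs the costly
constructor only on the first call. This presumably helps only while the
host process keeps the module (or, in my case, the assembly) loaded between
calls.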

Ta
Tomy


On Sunday, 14 February 2016 21:45:12 UTC+5:30, viraf wrote:
>
> I am new to tesseract and using it through Tess4J.  I am trying to OCR 
> faxes where pages are represented as TIFF (CCITT T.6) images - 2509 x 3530 
> @ 300 dpi (1 bit - i.e. BW).  
>
> I have two sets of questions
>
> *Speed*
> On an Intel i7-4800MQ @ 2.7GHz I am getting approximately 6 PPM using 1 
> thread.  I was looking for suggestions on how to speed up page processing. 
>  I use parallelStream to process each page in a separate thread.
>
> *Training*
> I am trying to learn about training Tesseract for improved accuracy. 
> Given that the fonts / box files used to generate eng.traineddata are not 
> available, can one specify the fonts used for English?  
> Also, is there a description of the various training artifacts?  I used 
> "combine_tessdata -u" to unpack eng.traineddata and "dawg2wordlist" to 
> extract the wordlist, but I am looking for documentation to better 
> understand the various training artifacts.
>
> Thanks
>
> - viraf
>
