Hi community,
I use Tesseract for text extraction, but I find it slow, so I have some 
questions to find out where I could contribute to make it faster:

- Does Tesseract do any image processing or preprocessing/cleanup at the 
start (which would explain the need for Leptonica)? If so, what are those 
steps, how much time do you think they take, and how could we disable 
them?
- Does Tesseract convert every image to TIFF before processing it?
- Which part of Tesseract is the most time-consuming, and which functions 
do you think we could remove or disable to make it faster?
- I found this article, which proposes parallelizing some functions to 
speed it up: **[Performance Characterization and Parallelization of 
Tesseract Optical Character Recognition on Multicore 
Architectures](https://pdfs.semanticscholar.org/dab1/23de2a9c25eaeaf7b6456116cea1e509f3f7.pdf)**. 
Is it implemented?
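
To make "slow" concrete, here is a minimal sketch of how a run could be timed from Python. It is only an illustration: the helper name `time_command` is made up for this example, and it assumes the command being timed is available on PATH.

```python
import subprocess
import sys
import time


def time_command(cmd, repeats=3):
    """Run cmd `repeats` times and return the wall-clock durations in seconds.

    Hypothetical helper for illustration; it measures the whole process,
    including start-up and image loading, not individual Tesseract stages.
    """
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        # Discard the command's output so that printing does not skew timing.
        subprocess.run(
            cmd,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        durations.append(time.perf_counter() - start)
    return durations
```

With the `tesseract` command-line tool installed, `time_command(["tesseract", "page.png", "stdout"])` would time three full runs on a hypothetical input file `page.png`. This only gives end-to-end numbers; finding which internal stage dominates would still need a profiler.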

Thanks 
