Hey community, I use Tesseract for text extraction but I find it slow, so I have some questions to find out where I can contribute to make it faster:
- Does Tesseract perform any image processing or preprocessing/cleanup at the start (the part that needs Leptonica)? If so, what are those steps, how much time do you think they consume, and how could we disable them?
- Does Tesseract convert every image to TIFF before processing it?
- Which part of Tesseract is the most time-consuming, and which functions do you think we could remove or disable to make it faster?
- I found this article, which proposes parallelizing some functions to speed it up: **[Performance Characterization and Parallelization of Tesseract Optical Character Recognition on Multicore Architectures](https://pdfs.semanticscholar.org/dab1/23de2a9c25eaeaf7b6456116cea1e509f3f7.pdf)**. Is it implemented?

Thanks

