Hi everyone! I work for a company that scans files and sends them back to the client, processed and organized the way the client wants. We currently use Tesseract to OCR the files when a client needs it.
Our internal system is written in VB .NET, but the .NET wrapper is a bit outdated compared with the current version of Tesseract (the latest wrapper targets Tesseract 3.02). So what I'm doing is calling Tesseract 3.04 from the command line in our system. I have a loop that picks up each page of a file and runs the OCR via the command line, generating an OCRed PDF per page, and at the end I merge all the PDFs. I don't wait for each process to finish, so I can have up to 8 Tesseract processes running at the same time, and my current speed is roughly one to three pages per second. When using the command line, is there any way to increase the speed? Our scanners have color detection, so the pages are sometimes B&W and sometimes color. We have PDF files with up to 3000 pages, and these files take around 40 minutes to an hour to OCR, depending on the number of color pages. Any suggestions to improve speed? Thank you very much!!
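To make the setup concrete, here is a rough sketch of the workflow described above: one Tesseract process per page, with at most 8 running at once. It is written in Python rather than our VB .NET code and is only an illustration, not our actual implementation. The names `build_command`, `ocr_pages`, and the `runner` parameter are hypothetical, the pages are assumed to already exist as one image file each, and the merge step is left out.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_command(page, out_dir):
    """Return the tesseract command line for one page image (hypothetical helper)."""
    # tesseract takes an output base name and appends ".pdf" itself
    out_base = Path(out_dir) / Path(page).stem
    # "pdf" is the config name that asks tesseract for a searchable PDF
    return ["tesseract", str(page), str(out_base), "pdf"]


def ocr_pages(pages, out_dir, max_workers=8, runner=subprocess.run):
    """OCR every page with at most max_workers tesseract processes at once.

    Returns the expected per-page PDF paths in page order, ready for merging.
    The runner argument exists only so the process launch can be swapped out.
    """
    def ocr_one(page):
        cmd = build_command(page, out_dir)
        runner(cmd, check=True)  # with subprocess.run, raises on a non-zero exit
        return Path(cmd[2] + ".pdf")

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so the merge step sees pages in sequence
        return list(pool.map(ocr_one, pages))
```

Calling something like `ocr_pages(sorted(Path("pages").glob("*.tif")), "out")` would then return the list of per-page PDFs to merge. The threads only wait on the external processes, so the limit of 8 is effectively a cap on concurrent Tesseract instances.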

