Hi everyone!

I work for a company that scans files and sends them back to the client, 
processed and organized the way the client wants.
We currently use Tesseract to OCR the files when a client needs it.

Our internal system is written in VB .Net, but the .Net wrapper is somewhat 
outdated compared with the current version of Tesseract (the latest wrapper 
targets Tesseract 3.02).
So instead I am calling Tesseract 3.04 from the command line in our system. 
I have a loop that picks up each page of our files and runs the OCR from the 
command line, generating one OCRed PDF per page, and at the end I merge all 
the PDFs. I don't wait for each process to finish, so I can have up to 8 
tesseract processes running at the same time. My current speed is around 
one to three pages per second.
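
To make the setup concrete, the per-page workflow described above could be 
sketched roughly like this. It is an illustration in Python, not the actual 
VB .Net code: the function names, the `eng` language, and the file layout are 
all assumptions, and the final PDF merge step is left out. It only relies on 
the standard `tesseract <image> <outputbase> -l <lang> pdf` invocation.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_command(image_path, output_base, lang="eng"):
    """Return the argv for one page: tesseract <image> <outputbase> -l <lang> pdf."""
    return ["tesseract", str(image_path), str(output_base), "-l", lang, "pdf"]


def ocr_page(image_path, out_dir, run=subprocess.run):
    """OCR a single page and return the path of the PDF tesseract should write."""
    # Hypothetical layout: one output PDF per page, named after the input image.
    output_base = Path(out_dir) / Path(image_path).stem
    run(build_command(image_path, output_base), check=True)
    # tesseract appends ".pdf" to the output base when the "pdf" config is used.
    return Path(str(output_base) + ".pdf")


def ocr_pages(image_paths, out_dir, max_workers=8, run=subprocess.run):
    """Run up to max_workers tesseract processes at once; results keep page order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda p: ocr_page(p, out_dir, run), image_paths))
```

Each worker thread just blocks on one external tesseract process, so 
`max_workers=8` corresponds to the "up to 8 processes" limit mentioned above.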

Is there any way to increase the speed when using the command line? Our 
scanners have color detection, so pages are sometimes B&W and sometimes color.
We have PDF files with up to 3000 pages, and these files take around 40 
minutes to one hour to OCR, depending on the number of color pages.

Any suggestions to improve speed?

Thank you very much!!
