On Wednesday, April 27, 2016 at 7:29:53 AM UTC-4, Yeska wrote:
>
> Tesseract takes up to 20 seconds to process this image, I know that
> colorful images with more than one column might be slower to process, but
> 20 seconds is too much.
>
> Can I do something to make processing this image faster?
>
> I'm sending the raw image to Tesseract, I prefer not to preprocess the
> images because they are sent by the user so I can't be sure that my
> preprocessing will be good for all the cases, but if you have some
> "general" preprocessing ideas which will help in most of the cases I would
> be grateful.
>
The more "stuff" (<-- technical term) there is in an image, the more time it's going to take to process. You could do some simple manual testing with a photo editor, whiting out various parts of the page, to see what is causing the increased processing time, or you could profile Tesseract to see where it's spending its time. You could also ask Tess to dump the thresholded image so that you can see what it's actually working with.

My first suspicion would be the bank note engraving with all its high-frequency noise. My second guess would be the clip art on the left or the gradients top and bottom, but those are just guesses.

If you don't have any additional domain knowledge that you can apply to the image pre-processing for your particular application, you may need to live with Tesseract's pre-processing (or attempt to improve its general algorithms for your case without degrading other, more common cases).

Tom
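
For what it's worth, the timing comparison and the thresholded-image dump described above could be scripted roughly like this. It is only a sketch under assumptions: it assumes the `tesseract` command-line tool is on your PATH, and that the `tessedit_write_images` config variable makes it save the thresholded input (typically as `tessinput.tif` in the working directory). The function names and any file names you pass in are made up for illustration.

```python
import subprocess
import time


def build_tesseract_cmd(image_path, out_base, dump_threshold=True):
    """Build a tesseract CLI invocation as an argument list."""
    cmd = ["tesseract", image_path, out_base]
    if dump_threshold:
        # Assumption: this variable asks Tesseract to write out the
        # thresholded image it actually runs recognition on.
        cmd += ["-c", "tessedit_write_images=true"]
    return cmd


def time_ocr(image_path, out_base="out", dump_threshold=False):
    """Run tesseract once on image_path and return wall-clock seconds."""
    cmd = build_tesseract_cmd(image_path, out_base, dump_threshold)
    start = time.perf_counter()
    subprocess.run(cmd, check=True, capture_output=True)
    return time.perf_counter() - start


def compare_variants(paths, runner=time_ocr):
    """Time each image variant and return {path: seconds}.

    The variants are meant to be copies of the same page with one
    region whited out in a photo editor (hypothetical file names).
    """
    return {path: runner(path) for path in paths}
```

Calling `compare_variants` on the original page plus a few whited-out copies should show which region accounts for most of the time. A single run per image is noisy, so repeat it a few times before drawing conclusions.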

