Hello everyone. I just started using Tesseract-OCR 3.02 to recognise numbers only.
The numbers themselves are *probably* in Futura Bold, styled in a particular manner (see images). Using the "digits" parameter, Tesseract-OCR either gets them perfectly or fails completely (returns a blank). After quite a bit of testing, it appears that the crop of the image is what makes or breaks it. For instance:

<https://lh3.googleusercontent.com/-I6vx1-5KxGY/VepwFvh_OmI/AAAAAAAAABw/kSXSI8qsJiU/s1600/Test1.png>

When the image is poorly cropped as above, with a lot of horizontal and vertical blank space, the engine always fails to return anything.

<https://lh3.googleusercontent.com/-8IMD05QoIYY/VepweKPrTxI/AAAAAAAAAB4/EFfQGgoD4CM/s1600/Test2.png>

A crop like this, with some space left for extra digits, fails in this particular example but succeeds at times.

<https://lh3.googleusercontent.com/--fH0jI8pEeQ/VepyLQAw6zI/AAAAAAAAACE/Qm22VlnbqGI/s1600/Test3.png>

A crop like this has so far always worked.

The problem is that I am capturing the image automatically and need to cover a range of at least 5-7 digits. I would never need to crop as badly as in the first example, but I do need more leeway than the last one allows. Is there anything I could try to make something like the middle crop work more reliably?

Thanks.
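One idea that may be worth testing is to keep the generous capture area but normalize it before OCR: find the bounding box of the "ink" pixels, then re-add a fixed, small margin, so every image handed to Tesseract looks like the tight crop that always worked, whether it holds 5 or 7 digits. The sketch below is only an illustration under stated assumptions: the image is a grayscale grid of 0-255 values, the digits are dark on a light background (invert first if not), and `threshold=128` and `margin=10` are hypothetical values to tune. Loading and saving the pixels is left out, as are the function names `ink_bbox` and `normalize_crop`, which are hypothetical helpers, not part of Tesseract.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: grayscale rows, 0 = black ink, 255 = white background.
Image = List[List[int]]


def ink_bbox(img: Image, threshold: int = 128) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right), inclusive, of pixels darker than threshold.

    Returns None when the image contains no dark pixels at all.
    """
    rows = [y for y, row in enumerate(img) if any(p < threshold for p in row)]
    if not rows:
        return None
    cols = [x for x in range(len(img[0])) if any(row[x] < threshold for row in img)]
    return rows[0], cols[0], rows[-1], cols[-1]


def normalize_crop(img: Image, margin: int = 10, threshold: int = 128,
                   background: int = 255) -> Image:
    """Crop tightly around the dark pixels, then pad with a uniform border.

    Assumes dark digits on a light background; margin and threshold are
    illustrative defaults, not values known to suit Tesseract 3.02.
    """
    box = ink_bbox(img, threshold)
    if box is None:
        return [row[:] for row in img]  # blank image: nothing to crop, return a copy
    top, left, bottom, right = box
    width = right - left + 1 + 2 * margin
    out = [[background] * width for _ in range(margin)]  # top border
    for y in range(top, bottom + 1):
        # Tight slice of this row with `margin` background pixels on each side.
        out.append([background] * margin + img[y][left:right + 1] + [background] * margin)
    out.extend([background] * width for _ in range(margin))  # bottom border
    return out
```

For a tiny 6x8 white image with two dark pixels at (2, 3) and (3, 4), `normalize_crop(img, margin=1)` returns a 4x4 grid with the two dark pixels surrounded by a one-pixel white border. The normalized image would then be passed to Tesseract with the same "digits" configuration as before.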

