Are you doing any pre-processing besides cropping? If those images are representative and the colors are constant, I'd replace the orange background with black and then invert the image to give black digits with no border on a white background. Also use the page segmentation mode for a single line of text (mode 7, i.e. `-psm 7` on the command line in 3.02).
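A rough sketch of that pre-processing, in case it helps. It is only an illustration under assumptions: the `ORANGE` value and `TOLERANCE` are hypothetical and should be sampled from the real captures, the function names are made up for this example, and pixels are plain lists of `(r, g, b)` tuples so the snippet does not depend on any particular imaging library. Loading and saving the image is left to whatever library you already use.

```python
# Hypothetical values: sample the real background colour from your captures.
ORANGE = (255, 140, 0)
TOLERANCE = 60  # allowed per-channel distance from ORANGE


def is_background(pixel, background=ORANGE, tolerance=TOLERANCE):
    """True if every channel is within `tolerance` of the background colour."""
    return all(abs(c - b) <= tolerance for c, b in zip(pixel, background))


def replace_background(rows, background=ORANGE, tolerance=TOLERANCE):
    """Step 1: paint background pixels black, leave everything else alone."""
    return [
        [(0, 0, 0) if is_background(p, background, tolerance) else p for p in row]
        for row in rows
    ]


def invert_to_gray(rows):
    """Step 2: convert to grayscale (channel average) and invert."""
    return [[255 - (r + g + b) // 3 for (r, g, b) in row] for row in rows]


def preprocess(rows):
    """Background -> black, then invert: light digits become black on white."""
    return invert_to_gray(replace_background(rows))


def tesseract_command(image_path, output_base):
    """Command line for a single text line, digits only.

    Tesseract 3.02 spells the flag `-psm`; newer releases use `--psm`.
    """
    return ["tesseract", image_path, output_base, "-psm", "7", "digits"]
```

After saving the pre-processed image, the list from `tesseract_command("digits.png", "out")` can be passed to `subprocess.run`, and the result should land in `out.txt`. The tolerance check is deliberately crude; if the orange varies between captures, a threshold on a single channel may be more robust.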
Tom

On Saturday, September 5, 2015 at 6:37:04 AM UTC-4, AxB wrote:
>
> Hello everyone.
>
> I just started using Tesseract-OCR 3.02 to recognise numbers only.
>
> The numbers themselves are *probably* in Futura Bold font, styled in a
> particular manner (see images).
>
> Using the "digits" parameter, Tesseract-OCR would either get it perfectly
> or fail completely (return a blank).
>
> After quite a bit of testing, it appears that the "crop" of the image is
> what makes or breaks it. For instance:
>
> <https://lh3.googleusercontent.com/-I6vx1-5KxGY/VepwFvh_OmI/AAAAAAAAABw/kSXSI8qsJiU/s1600/Test1.png>
> When poorly cropped as above, with quite a bit of horizontal and vertical
> blank space, the engine will always fail to return anything.
>
> <https://lh3.googleusercontent.com/-8IMD05QoIYY/VepweKPrTxI/AAAAAAAAAB4/EFfQGgoD4CM/s1600/Test2.png>
> A crop like this, with some space for extra digits, fails in this
> particular example but succeeds at times.
>
> <https://lh3.googleusercontent.com/--fH0jI8pEeQ/VepyLQAw6zI/AAAAAAAAACE/Qm22VlnbqGI/s1600/Test3.png>
> A crop like this has so far always worked.
>
> The problem is that I am capturing the image automatically and need to
> cover for a range of at least 5-7 digits.
>
> I would never need to crop as badly as the first example, but I do need
> more leeway than the last one allows.
>
> Is there anything I could try to make something like the middle crop work
> better?
>
> Thanks.
>

