The main problem with your images is uneven illumination, so you'll need to pre-process them to correct it. Alternatively, apply adaptive binarization (Sauvola, Niblack) yourself and feed Tesseract black-and-white images: Tesseract internally uses a global binarization method (Otsu), which cannot cope with uneven illumination. That said, the font looks fine for recognition without any extra training.
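To make the adaptive binarization idea concrete, here is a minimal pure-Python sketch of Sauvola thresholding. It is only an illustration, not what Tesseract does internally. The function name `sauvola_binarize`, the 15-pixel window, and the values k = 0.2 and R = 128 are assumptions chosen for the example, as is the synthetic test image standing in for a web-cam frame.

```python
import math


def sauvola_binarize(gray, window=15, k=0.2, r=128.0):
    """Binarize a grayscale image (list of rows, values 0..255).

    Each pixel is compared against a local threshold
        T = mean * (1 + k * (std / r - 1))
    computed over a window centred on it. Returns rows of 0 (ink) / 255.
    """
    h, w = len(gray), len(gray[0])

    # Integral images of the values and of the squared values, with a
    # zero border, so any window sum costs four lookups.
    s1 = [[0] * (w + 1) for _ in range(h + 1)]
    s2 = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = row_sq = 0
        for x in range(w):
            v = gray[y][x]
            row_sum += v
            row_sq += v * v
            s1[y + 1][x + 1] = s1[y][x + 1] + row_sum
            s2[y + 1][x + 1] = s2[y][x + 1] + row_sq

    half = window // 2
    out = [[255] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - half), min(h, y + half + 1)
        for x in range(w):
            # The window is clipped at the image borders.
            x0, x1 = max(0, x - half), min(w, x + half + 1)
            n = (y1 - y0) * (x1 - x0)
            total = s1[y1][x1] - s1[y0][x1] - s1[y1][x0] + s1[y0][x0]
            total_sq = s2[y1][x1] - s2[y0][x1] - s2[y1][x0] + s2[y0][x0]
            mean = total / n
            std = math.sqrt(max(total_sq / n - mean * mean, 0.0))
            threshold = mean * (1.0 + k * (std / r - 1.0))
            if gray[y][x] <= threshold:
                out[y][x] = 0
    return out


# Synthetic 40x40 image: background brightens from left to right
# (uneven illumination), with two dark one-pixel-wide vertical strokes.
image = [[80 + 4 * x for x in range(40)] for _ in range(40)]
for y in range(10, 30):
    image[y][5] -= 60
    image[y][30] -= 60

# The right-hand stroke (value 140) is exactly as bright as the plain
# background at column 15 (also 140), so no single global threshold can
# separate ink from background here.
binary = sauvola_binarize(image)
```

For real web-cam frames a library routine would likely be faster; scikit-image, for instance, provides `threshold_sauvola` and `threshold_niblack`. The window size and k usually need tuning to the stroke width of the digits.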

Warm regards,
Dmitri Silaev
www.CustomOCR.com

On Thu, Aug 18, 2011 at 12:09 AM, Andriy Malovanyy <[email protected]> wrote:
> Hi,
>
> I am trying to write a simple program that takes pictures captured from a
> web-cam every 10 sec. by another program, recognises the text with OCR and
> logs the data into a text file. Everything seems to be working fine except
> that Tesseract does not recognize the pictures that are taken. If I "feed"
> Tesseract pictures created with Photoshop, it works better, but sometimes
> it still cannot recognize very simple and obvious text (numbers).
>
> I attach 3 files taken by a web-cam and 1 created with Photoshop. None of
> them is recognized well. The first two web-cam pictures return garbage
> text; the third one (the best quality, I think) returns "Empty page
> message". The Photoshop picture returns "1234.018" instead of "1234.0.18".
>
> I use Tesseract-OCR 3.0 with the language files that came with the package
> (English only). Do I need to train Tesseract to recognise the pictures?
> What is the best way to do that? Take several pictures with the web-cam
> and make a training file from them with the numbers 0 to 9 and points? I
> have started to read how to do that, it seems sooo complicated..
>
> Any advice appreciated.
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en

