Hi Andres,

You need a minimum resolution for recognition to be effective, and an overly high resolution will also throw the results off. Internally the JPG is converted to a bitmap format. The images are currently only 72 dpi, so you'll need to resize (scale) them at least. The low contrast (black on gray) will also be a problem, so ImageMagick or something similar could help.

--Sven
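To make the two adjustments above concrete, here is a minimal sketch in plain Python. It only illustrates the arithmetic: how much a 72 dpi image would have to be scaled to reach roughly 300 dpi, and how a linear contrast stretch pushes a gray background toward white. The function names and the pixel values (150 for the gray background, 40 for the dark text) are assumptions for illustration and are not measured from the attached pictures. In practice a tool such as ImageMagick would do the resizing and level adjustment on the real files.

```python
def scale_factor(current_dpi, target_dpi):
    """Return the factor by which to enlarge an image from current_dpi to target_dpi."""
    if current_dpi <= 0:
        raise ValueError("current_dpi must be positive")
    return target_dpi / current_dpi


def stretch_contrast(pixels, black_point, white_point):
    """Linearly remap 8-bit gray values so black_point -> 0 and white_point -> 255.

    Values outside the [black_point, white_point] range are clamped.
    `pixels` is a list of rows, each row a list of integers in 0..255.
    """
    if white_point <= black_point:
        raise ValueError("white_point must be greater than black_point")
    span = white_point - black_point
    return [
        [max(0, min(255, round((p - black_point) * 255 / span))) for p in row]
        for row in pixels
    ]


# Scaling a 72 dpi image up to about 300 dpi needs a factor of roughly 4.17.
factor = scale_factor(72, 300)

# Hypothetical 2x3 crop: gray background (150) with dark text pixels (40).
image = [[150, 150, 40],
         [150, 40, 150]]

# After stretching, the background becomes white (255) and the text black (0).
cleaned = stretch_contrast(image, black_point=40, white_point=150)
print(factor, cleaned)
```

The black and white points would have to be chosen from the actual pictures, for example by looking at the darkest text pixels and the typical background value.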
On Thu, Aug 18, 2011 at 9:39 AM, Andres <[email protected]> wrote:
> Sven:
> What exactly do you mean by 200-300 dpi? Are the dpi attributes in the
> jpg files being evaluated? Or are you referring to some scaling
> of the images?
>
> Andriy:
> If you continue having problems with this and if the camera is in a
> fixed position with respect to your display and the font is always the
> same, it should be very easy for you to avoid using tesseract and just
> recognize the characters by evaluating some pixels after
> thresholding. (I would threshold just the evaluated pixels.)
>
> Regards,
>
> Andres
>
>
>
> 2011/8/18 Sven Pedersen <[email protected]>:
>> You should not need to retrain. You need to convert the images to
>> grayscale or B&W at 200-300 dpi and get the background (which seems to be
>> gray) closer to white. You can do that kind of cleanup
>> transformation with ImageMagick.
>> --Sven
>>
>>
>> On Wed, Aug 17, 2011 at 3:09 PM, Andriy Malovanyy <[email protected]>
>> wrote:
>>> Hi,
>>>
>>> I am trying to write a simple program that takes pictures, which are
>>> captured from a web-cam every 10 sec. by another program, recognises the
>>> text with OCR and logs the data to a text file. Everything seems to be
>>> working fine except that tesseract does not recognize the pictures that
>>> are taken. If I "feed" tesseract pictures created with Photoshop, it works
>>> better, but it sometimes still cannot recognize very simple and obvious
>>> text (numbers).
>>>
>>> I attach 3 files taken by the web-cam and 1 created with Photoshop. None
>>> of them is recognized well. The first two web-cam pictures return garbage
>>> text, the third one (the best quality, I think) returns "Empty page message".
>>> The Photoshop picture returns "1234.018" instead of "1234.0.18".
>>>
>>> I use Tesseract-OCR 3.0 with the language files that came with the package
>>> (English only). Do I need to train Tesseract to recognise the pictures??
>>> If so, what is the best way to do it??
>>> Take several pictures taken with a web-cam, and
>>> from them make a training file with numbers from 0 to 9 and points? I have
>>> started to read how to do that, it seems sooo complicated..
>>>
>>> Any advice appreciated.
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To post to this group, send email to [email protected]
>>> To unsubscribe from this group, send email to
>>> [email protected]
>>> For more options, visit this group at
>>> http://groups.google.com/group/tesseract-ocr?hl=en
>>>
>>
>> --
>> ``All that is gold does not glitter,
>> not all those who wander are lost;
>> the old that is strong does not wither,
>> deep roots are not reached by the frost.
>> From the ashes a fire shall be woken,
>> a light from the shadows shall spring;
>> renewed shall be blade that was broken,
>> the crownless again shall be king.”
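Andres's suggestion in the quoted thread (fixed camera, fixed font, evaluate a few pixels after thresholding instead of running tesseract) could be sketched roughly as follows. This is only an illustration under stated assumptions: the 3x5 glyph shapes, the four sample points, the gray levels, and the names `sample_bits`, `classify`, `render`, `POINTS` and `TEMPLATES` are all hypothetical and would have to be replaced by positions and patterns taken from the real display. Matching by smallest Hamming distance is one simple choice and is not something the thread prescribes.

```python
def sample_bits(image, points, threshold=128):
    """Threshold only the sampled pixels: 1 = dark (ink), 0 = light (background)."""
    return tuple(1 if image[y][x] < threshold else 0 for (x, y) in points)


def classify(image, points, templates, threshold=128):
    """Return the template character whose bit pattern differs least
    (smallest Hamming distance) from the sampled pixels."""
    bits = sample_bits(image, points, threshold)

    def distance(ch):
        return sum(a != b for a, b in zip(bits, templates[ch]))

    return min(templates, key=distance)


def render(rows, ink=40, background=150):
    """Build a tiny grayscale test image from '#' (ink) and '.' (background)."""
    return [[ink if c == "#" else background for c in row] for row in rows]


# Hypothetical 3x5 glyphs standing in for one character cell of the display.
ZERO = ["###",
        "#.#",
        "#.#",
        "#.#",
        "###"]
ONE = [".#.",
       ".#.",
       ".#.",
       ".#.",
       ".#."]

# Hypothetical (x, y) sample points: left-middle, centre, top-centre, right-middle.
POINTS = [(0, 2), (1, 2), (1, 0), (2, 2)]

# Expected bit pattern at those points for each character.
TEMPLATES = {
    "0": (1, 0, 1, 1),
    "1": (0, 1, 1, 0),
}

print(classify(render(ZERO), POINTS, TEMPLATES))  # prints 0
print(classify(render(ONE), POINTS, TEMPLATES))   # prints 1
```

With ten digits and a decimal point, more sample points would be needed so that every character gets a distinct pattern, and each character cell of the display would be classified separately.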

