Thank you all for your support. I finally managed to make Tesseract work. All I did was install version 2.04 together with the language data for Basque. After that I began to get much better results: most of the files with a fairly good background were recognized well. Since I use the Yawcam program to take pictures every X seconds, and it supports only jpg, gif and png, I wanted to make it work with version 3 of Tesseract. I tried different languages but got the same bad result. Then I downloaded the bugfix version 3.01 from here: http://code.google.com/p/tesseract-ocr/downloads/detail?name=tesseract-3.00.1.exe.zip&can=2&q= and everything started to work!!
Finally, I still need to work with an older version of Tesseract (2.01) because I use an old computer with Win98 for logging and newer versions do not support it. So, I need to use image conversion software. In the end, the total logging process will look like this:

1. Set Yawcam to take a picture every X sec.
2. Convert jpg to tif.
3. Recognise the image with Tesseract 2.01 and write the result into a text file.
4. Read from the text file and log to another text file together with date and time.

Steps 2-4 will be run from a C++ program which I am writing right now. If everything works successfully, the program will also use the current readings to control the electricity supply to one unit through a parallel port and relay. I will definitely report on the results!!

Thanks again for your help and good luck!

On 20 Aug, 14:19, Dmitri Silaev wrote:
> As a rule of thumb, usually one can obtain good recognition
> results for all standard regular fonts of 11-16pt size, be it a
> screenshot or a 300 DPI scanned image. Should font size, resolution,
> etc. differ significantly from these numbers, recognition quality
> becomes a matter of experimentation.
>
> Warm regards,
> Dmitri Silaev
> www.CustomOCR.com
>
> On Sat, Aug 20, 2011 at 2:14 PM, Sriranga(78yrsold) wrote:
> > Dmitri,
> > really the issue is very complex/complicated for a layman user to
> > understand. For training purposes in tesseract-ocr, what guidance
> > would you, as an expert, give to users who generally depend on a
> > scanner and the "Print Screen" key of the computer?
> > 1) For scanning typed text: (a) the font size that should be used
> > in the text; (b) the resolution to be set in the scanner.
> > 2) For a screenshot of a typed text file: with the help of
> > IrfanView, ImageMagick, etc., the resolution should be increased
> > from 96 to 300 dpi for any image format like tif, png, etc.
> > With regards,
> > -sriranga(78yrs)
> >
> > On Sat, Aug 20, 2011 at 1:33 PM, Dmitri Silaev wrote:
> >
> >> There are different cases of how the pixel height of a font's
> >> character should be calculated. If you're trying to recognize a
> >> screenshot, you may deem one pt to be equal to one pixel when
> >> typing it in Windows Paint. However, this might not be true for
> >> more complex editors like Photoshop. Also, this depends on the
> >> physical size of the screen's pixel and the current video mode
> >> resolution. Another case is a scanned image; here pixel height
> >> depends on scanning resolution. Still another case, where imho
> >> trying to relate pixel height to the font's point size makes
> >> absolutely no sense (however, it is possible via some
> >> multi-parameter formulas), is a photographic or video frame
> >> image; here pixel height varies depending on the camera position
> >> and can even vary within a single line of text.
> >>
> >> All in all, Tesseract does not bother itself with DPIs, pt sizes,
> >> etc.; only pixel size is important for recognition. You can use
> >> this formula for scanned images to roughly determine font pixel
> >> height:
> >>
> >> pixels = DPI * pts / 72
> >>
> >> where pixels - pixel height to be found, DPI - scanning
> >> resolution, pts - size of font in typographic points
> >>
> >> However, the most reliable way is to scan a test page and
> >> manually count pixels.
> >>
> >> For those willing to understand everything, here are the links:
> >> http://en.wikipedia.org/wiki/Dots_per_inch
> >> http://en.wikipedia.org/wiki/Point_%28typography%29
> >> http://en.wikipedia.org/wiki/X-height
> >>
> >> Warm regards,
> >> Dmitri Silaev
> >> www.CustomOCR.com
> >>
> >> On Sat, Aug 20, 2011 at 7:48 AM, Sriranga(78yrsold) wrote:
> >> > Dmitri,
> >> > Thanks for the valuable guidance.
> >> > I seek some clarification as follows:
> >> > (1) "Tesseract, trained with ordinary fonts, proved good with
> >> > fonts of 12-64 pixel height": it would be nice if you indicated
> >> > the equivalent font size for a pixel height of 12-64. For 10 or
> >> > 20 pt size of a regular (ordinary) font, what is the pixel
> >> > height used in Notepad?
> >> > I am neither a programmer nor a developer, so I am seeking
> >> > guidance as a user.
> >> > BTW, is it possible to count the pixels of any size, say 20 pt
> >> > regular, in Paintbrush, which has a grid (graph-like)? I just
> >> > tested this in Paintbrush (see the attached screenshot):
> >> > letters were typed using Arial 20 and, when I counted the
> >> > pixels, there were 20.
> >> >
> >> > Thus it is presumed that 12-64 pixel height is equivalent to
> >> > 12-64 point size of the ordinary font - kindly confirm.
> >> > With warmest regards,
> >> > -sriranga(78yrs)
> >> >
> >> > On Sat, Aug 20, 2011 at 1:00 AM, Dmitri Silaev wrote:
> >> >
> >> >> The DPI measure is confusing for Tesseract's OCR, forget about
> >> >> it. The big thing is the within-image font's x-height, measured
> >> >> in pixels. Tesseract, trained with ordinary fonts, proved good
> >> >> with fonts of 12-64 pixel height. If you have bigger
> >> >> characters, scale them down. If you have a font that's bold,
> >> >> use morphology and erode characters after binarization.
> >> >> Experiment. Removing "greyness" won't help as it's not a
> >> >> generic way of getting rid of uneven illumination; you need to
> >> >> use more sophisticated algorithms. Just using Photoshop won't
> >> >> let you achieve much.
> >> >>
> >> >> Warm regards,
> >> >> Dmitri Silaev
> >> >> www.CustomOCR.com
> >> >>
> >> >> On Fri, Aug 19, 2011 at 8:18 PM, Andriy Malovanyy wrote:
> >> >> > To Zdenko:
> >> >> > I think I have version 3.0 installed, so maybe I should
> >> >> > reinstall the new version and try it. Thanks for the
> >> >> > description of psm.
> >> >> > Did you try to recognize the other unedited images which I
> >> >> > attached to the first post??
> >> >> >
> >> >> > To Rob:
> >> >> > Initially I had a 640x480 image at 72 dpi with the number
> >> >> > occupying almost all of the image. What I did was open the
> >> >> > image in Photoshop, go to the image size menu, change the
> >> >> > resolution to 300 dpi (the image increased in size) and set
> >> >> > the image size back to 640x480. So, with that I got a
> >> >> > 640x480 image with 300 dpi resolution.
> >> >> >
> >> >> > On 19 Aug, 17:56, Robert Komar wrote:
> >> >> >> On Fri, 19 Aug 2011, Andriy Malovanyy wrote:
> >> >> >> > To sriranga:
> >> >> >> > I tried changing dpi (check the previous post). It
> >> >> >> > doesn't work.
> >> >> >>
> >> >> >> Did you rescale the image from 72 dpi to 300 dpi, or just
> >> >> >> change the tag on the original image to say 300 dpi? The
> >> >> >> latter won't work.
> >> >> >> Tesseract seems to be tuned to work best for scans at 300
> >> >> >> dpi (although I've often successfully used 600 dpi). Scans
> >> >> >> done at 72 dpi usually get very poor results from
> >> >> >> tesseract.
> >> >> >>
> >> >> >> Cheers,
> >> >> >> Rob Komar
> >> >> >
> >> >> > --
> >> >> > You received this message because you are subscribed to the
> >> >> > Google Groups "tesseract-ocr" group.
> >> >> > To post to this group, send email to [email protected]
> >> >> > To unsubscribe from this group, send email to
> >> >> > [email protected]
> >> >> > For more options, visit this group at
> >> >> > http://groups.google.com/group/tesseract-ocr?hl=en
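
For anyone who wants to try the rule of thumb quoted in this thread, pixels = DPI * pts / 72, here is a minimal C++ sketch. The function names fontPixelHeight and inReportedRange are made up for illustration, and the 12-64 pixel bounds are only the empirical range reported in the thread (there stated for x-height, which is smaller than the full point size), not a Tesseract constant. Treat the result as a rough estimate; counting pixels on a test scan remains the more reliable check.

```cpp
#include <cassert>

// Rough pixel height of a font in a scanned image, using the rule of
// thumb from the thread: pixels = DPI * pts / 72
// (there are 72 typographic points per inch).
double fontPixelHeight(double dpi, double points) {
    assert(dpi > 0.0 && points > 0.0);  // both inputs must be positive
    return dpi * points / 72.0;
}

// True if a pixel height lies inside the range that the thread reports
// as working well. The bounds are parameters because they are an
// observation from the thread, not a value defined by Tesseract.
bool inReportedRange(double pixels, double minPx = 12.0, double maxPx = 64.0) {
    return pixels >= minPx && pixels <= maxPx;
}
```

For instance, fontPixelHeight(300, 12) evaluates to 50, which is inside the reported range, while a 20 pt font scanned at 300 DPI comes out at roughly 83 pixels and would, by the advice above, be a candidate for scaling down.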

