2012/4/6 Zdenko Podobný <[email protected]>:
> On 06.04.2012 17:35, Rufus wrote:
>> Thanks for the reply.
>>
>> I've tried another image (bad2.tiff), which is still a bit different from
>> good.tiff, and is of the same order regarding the compression ratio.
>> However, tesseract still doesn't output anything for bad2.tiff.
>> I then tried to feed tesseract with only the first character, and there it
>> works for bad_char.tiff (from bad.tiff) but it doesn't work for
>> bad2_char.tiff (from bad2.tiff).
>>
>> Commands:
>> tesseract bad_char.tiff bad_char -l eng -psm 10 nobatch digits
>> tesseract bad2_char.tiff bad2_char -l eng -psm 10 nobatch digits
>>
>>
>> All the images attached are actually thresholded. I guess there is not much
>> room for improvement there. I've also tried training tesseract with a
>> new language consisting only of digits in a particular font (font: Impact
>> .... looks like the font in the images). Do you also experience these
>> problems when using tesseract?
>>
> I think the problem is with the size of the text, the resolution, and the
> missing border. I tried this:
> convert -border 500 -resample 300 -density 300 -resize 50 bad2.tiff bad2.png
> and
> tesseract bad2.png bad2
> produced results.

bad.tiff is grayscale and no DPI is specified in the file. good.tiff is b&w
and claims it's 72 dpi (which is probably wrong?). Perhaps just setting
bad.tiff's DPI to 72 would fix the problem (without doing any resampling).

Searching the source for "resolution", "credibleresolution",
"defaultresolution" shows that libtesseract will do different things if an
image doesn't specify a DPI or the DPI is less than "credible" (70 dpi). In
particular, it looks like it sets the DPI to 300 in some parts of the code,
and to 70 DPI (minCredibleResolution) in others.
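
To check whether a TIFF declares a resolution at all, one option is to read the baseline TIFF resolution tags directly. The following is only a rough sketch in Python using the standard library; it is not taken from the tesseract source. The names `read_tiff_dpi` and `is_credible` are made up for illustration, and the `min_credible=70` default is an assumption based on the 70 dpi threshold mentioned above. It looks only at the first IFD and assumes a classic (non-BigTIFF) file.

```python
import struct

# Baseline TIFF tag numbers and field types.
X_RESOLUTION, Y_RESOLUTION, RESOLUTION_UNIT = 282, 283, 296
TYPE_SHORT, TYPE_RATIONAL = 3, 5


def read_tiff_dpi(data):
    """Return (x_dpi, y_dpi) from the first IFD, or None if no usable DPI is declared."""
    if data[:2] == b"II":
        endian = "<"  # little-endian file
    elif data[:2] == b"MM":
        endian = ">"  # big-endian file
    else:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF file")

    (entry_count,) = struct.unpack_from(endian + "H", data, ifd_offset)
    resolution = {}
    unit = 2  # ResolutionUnit defaults to inches when the tag is absent
    for i in range(entry_count):
        entry = ifd_offset + 2 + 12 * i
        tag, field_type, _count = struct.unpack_from(endian + "HHI", data, entry)
        if tag in (X_RESOLUTION, Y_RESOLUTION) and field_type == TYPE_RATIONAL:
            # A RATIONAL does not fit in the entry, so the entry stores an offset.
            (value_offset,) = struct.unpack_from(endian + "I", data, entry + 8)
            num, den = struct.unpack_from(endian + "II", data, value_offset)
            if den:
                resolution[tag] = num / den
        elif tag == RESOLUTION_UNIT and field_type == TYPE_SHORT:
            # A SHORT is stored inline at the start of the value field.
            (unit,) = struct.unpack_from(endian + "H", data, entry + 8)

    if X_RESOLUTION not in resolution or Y_RESOLUTION not in resolution:
        return None
    if unit == 1:  # "no absolute unit": the values are not a real DPI
        return None
    scale = 2.54 if unit == 3 else 1.0  # unit 3 is pixels per centimeter
    return (resolution[X_RESOLUTION] * scale, resolution[Y_RESOLUTION] * scale)


def is_credible(dpi, min_credible=70):
    """Hypothetical check: True if a DPI is declared and is at least min_credible."""
    return dpi is not None and min(dpi) >= min_credible
```

Calling `read_tiff_dpi(open("bad.tiff", "rb").read())` should return `None` for a file that carries no resolution tags, and `is_credible` then gives a quick way to see whether a file would fall below the threshold discussed above.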

