I don't want to distract from the original topic too much, but when I tried recognizing both "good.tiff" and "bad.tiff" with my 3.02 version, only a few settings yielded anything (primarily -psm 8, with "good.tiff"). The other times I got completely BLANK results.
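For anyone who wants to repeat that kind of experiment, here is a minimal sketch of sweeping page segmentation modes and flagging blank results. The helper names (`build_command`, `is_blank`, `sweep_psm`) and the default list of modes are my own choices for illustration, not part of tesseract; the command line simply follows the single-dash `-psm` form used in this thread, and it assumes `tesseract` is on the PATH and writes `<outputbase>.txt`.

```python
import subprocess
from pathlib import Path


def build_command(image, out_base, psm, lang="eng", configs=()):
    """Build a command line in the form used in this thread:
    tesseract <image> <outputbase> -l <lang> -psm <n> [configs...]"""
    return ["tesseract", str(image), str(out_base),
            "-l", lang, "-psm", str(psm), *configs]


def is_blank(text):
    """True when the OCR output holds nothing but whitespace."""
    return not text.strip()


def sweep_psm(image, modes=(3, 6, 7, 8, 10), lang="eng", configs=()):
    """Run tesseract once per mode and return {psm: recognized text}.

    The default modes are an arbitrary selection for illustration.
    A run that produces no output file is recorded as an empty string.
    """
    results = {}
    for psm in modes:
        out_base = f"{Path(image).stem}_psm{psm}"
        subprocess.run(build_command(image, out_base, psm, lang, configs),
                       capture_output=True, check=False)
        out_file = Path(out_base + ".txt")
        if out_file.exists():
            results[psm] = out_file.read_text(encoding="utf-8", errors="replace")
        else:
            results[psm] = ""
    return results


# Example use (needs tesseract installed and good.tiff in the current directory):
#   for psm, text in sweep_psm("good.tiff", configs=("nobatch", "digits")).items():
#       print(psm, "BLANK" if is_blank(text) else repr(text.strip()))
```

This only reports which modes come back empty; it does nothing to make tesseract guess when it would otherwise return a blank.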
Coincidentally, I realized just yesterday that tesseract sometimes delivers blanks. Is this true? It gives up and doesn't offer ANYTHING for certain glyphs; it does not offer "the best non-blank it can match". This caught me off guard and bewildered me. Or am I wrong, and this is a temporary regression in 3.02? Can you FORCE tesseract to make a non-blank guess, with some config, to "the best of its ability"?

(I'll move this to a separate thread if necessary.)

On Apr 6, 5:38 pm, TP <[email protected]> wrote:
> 2012/4/6 Zdenko Podobný <[email protected]>:
> > On 06.04.2012 17:35, Rufus wrote:
> >> Thanks for the reply.
> >>
> >> I've tried another image (bad2.tiff), which is still a bit different from
> >> good.tiff, and is of the same order regarding the compression ratio.
> >> However, tesseract still doesn't output anything for bad2.tiff.
> >> I then tried to feed tesseract with only the first character, and there it
> >> works for bad_char.tiff (from bad.tiff) but it doesn't work for
> >> bad2_char.tiff (from bad2.tiff).
> >>
> >> Commands:
> >> tesseract bad_char.tiff bad_char -l eng -psm 10 nobatch digits
> >> tesseract bad2_char.tiff bad2_char -l eng -psm 10 nobatch digits
> >>
> >> All the images attached are actually thresholded. I guess there is not much
> >> room for improvement there. I've also tried by training tesseract with a
> >> new language consisting only of digits with a particular font (font: Impact
> >> .... looks like the font in the images). Do you also experience these
> >> problems when using tesseract?
> >
> > I think the problem is with the size of the text, the resolution and the
> > missing border. I tried this:
> > convert -border 500 -resample 300 -density 300 -resize 50 bad2.tiff bad2.png
> > and
> > tesseract bad2.png bad2
> > produced results.
>
> bad.tiff is grayscale and no DPI is specified in the file. good.tiff
> is b&w and claims it's 72dpi (which is probably wrong?).
> Perhaps just setting bad.tiff's dpi to 72dpi would fix the problem
> (without doing any resampling).
>
> Searching the source for "resolution", "credibleresolution",
> "defaultresolution" shows that libtesseract will do different things
> if an image doesn't specify a DPI or the DPI is less than "credible"
> (70 dpi). In particular it looks like it sets the DPI to 300 in some
> parts of the code, and 70 DPI (minCredibleResolution) in others.
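On the point about files that don't specify a DPI: one way to see what a TIFF actually declares, without any imaging library, is to read the resolution tags directly. This is a minimal sketch only. The function names are hypothetical, it reads just the first IFD of a classic (non-BigTIFF) file, and the 70 dpi threshold is simply the figure quoted above, which I have not verified against the tesseract source.

```python
import struct

# Tag numbers from the TIFF 6.0 specification.
X_RESOLUTION, Y_RESOLUTION, RESOLUTION_UNIT = 282, 283, 296
RATIONAL, SHORT = 5, 3


def tiff_resolution(data):
    """Return (x_res, y_res, unit) from the first IFD of a TIFF byte string.

    Missing tags come back as None; unit is "none", "inch" or "cm".
    """
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None or struct.unpack(order + "H", data[2:4])[0] != 42:
        raise ValueError("not a classic TIFF file")
    (ifd,) = struct.unpack(order + "I", data[4:8])
    (n_entries,) = struct.unpack(order + "H", data[ifd:ifd + 2])
    found = {}
    for i in range(n_entries):
        start = ifd + 2 + 12 * i  # each IFD entry is 12 bytes
        tag, typ, _count = struct.unpack(order + "HHI", data[start:start + 8])
        value_field = data[start + 8:start + 12]
        if tag in (X_RESOLUTION, Y_RESOLUTION) and typ == RATIONAL:
            # A RATIONAL is stored elsewhere in the file as numerator/denominator.
            (offset,) = struct.unpack(order + "I", value_field)
            num, den = struct.unpack(order + "II", data[offset:offset + 8])
            found[tag] = num / den if den else None
        elif tag == RESOLUTION_UNIT and typ == SHORT:
            found[tag] = struct.unpack(order + "H", value_field[:2])[0]
    unit = {1: "none", 2: "inch", 3: "cm"}.get(found.get(RESOLUTION_UNIT))
    return found.get(X_RESOLUTION), found.get(Y_RESOLUTION), unit


def declares_credible_dpi(data, minimum=70):
    """True if the file declares a resolution of at least `minimum` dpi.

    `minimum` defaults to the 70 dpi figure quoted in this thread (an assumption).
    An absent unit tag is treated as inches, the TIFF default.
    """
    x_res, y_res, unit = tiff_resolution(data)
    if x_res is None or y_res is None or unit == "none":
        return False
    scale = 2.54 if unit == "cm" else 1.0  # pixels/cm -> pixels/inch
    return min(x_res, y_res) * scale >= minimum


# Example use:
#   with open("bad.tiff", "rb") as f:
#       print(tiff_resolution(f.read()))
```

It only reports what the file says about itself; whether the declared value is right for the scan is a separate question.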

