I don't really want to distract from the original topic too much, but
when I tried recognizing both "good.tiff" and "bad.tiff" with my 3.02
version, only a few settings yielded anything (mainly -psm 8 with
"good.tiff"). Every other time I got a completely BLANK result.

Coincidentally, I realized just yesterday that tesseract sometimes
returns blanks. Is this true? It gives up and doesn't offer ANYTHING
for certain glyphs; it does not offer "the best non-blank match it
can find". This caught me off guard and bewildered me.

Or am I wrong, and this is a temporary regression in 3.02?
Can you FORCE tesseract to make a non-blank guess, "to the best of its
ability", with some config setting?

(I'll move this to a separate thread if necessary)

On Apr 6, 5:38 pm, TP <[email protected]> wrote:
> 2012/4/6 Zdenko Podobný <[email protected]>:
>
> > On 06.04.2012 17:35, Rufus wrote:
> >> Thanks for the reply.
>
> >> I've tried another image (bad2.tiff), which is still a bit different from
> >> good.tiff but has a compression ratio of the same order.
> >> However, tesseract still doesn't output anything for bad2.tiff.
> >> I then tried feeding tesseract only the first character: it
> >> works for bad_char.tiff (cut from bad.tiff) but not for
> >> bad2_char.tiff (cut from bad2.tiff).
>
> >> Commands:
> >> tesseract bad_char.tiff bad_char -l eng -psm 10 nobatch digits
> >> tesseract bad2_char.tiff bad2_char -l eng -psm 10 nobatch digits
>
> >> All the attached images are already thresholded, so I guess there is not
> >> much room for improvement there. I've also tried training tesseract on a
> >> new language consisting only of digits in a particular font (Impact,
> >> which looks like the font in the images). Do you also experience these
> >> problems when using tesseract?
>
> > I think the problem is the size of the text, the resolution and the
> > missing border. I tried this:
> > convert -border 500 -resample 300 -density 300 -resize 50 bad2.tiff bad2.png
> > and
> > tesseract bad2.png bad2
> > produced results.
>
> bad.tiff is grayscale and no DPI is specified in the file. good.tiff
> is b&w and claims it's 72 dpi (which is probably wrong?). Perhaps just
> setting bad.tiff's DPI to 72 would fix the problem (without doing
> any resampling).
>
> Searching the source for "resolution", "credibleresolution" and
> "defaultresolution" shows that libtesseract behaves differently
> if an image doesn't specify a DPI or if the DPI is below the "credible"
> threshold (70 dpi). In particular, it looks like it sets the DPI to 300 in
> some parts of the code and to 70 DPI (minCredibleResolution) in others.
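The two preprocessing steps mentioned in the quoted messages, feeding tesseract only the first character and adding a border around the text, can be sketched in plain Python on an already thresholded bitmap. This is a minimal illustration, not how tesseract or ImageMagick do it: it assumes the image is a list of rows where 1 is ink and 0 is background, and the helper names `crop_first_glyph` and `add_border` are hypothetical.

```python
from typing import List

# Assumed encoding: a thresholded image as a list of rows, 1 = ink, 0 = background.
Bitmap = List[List[int]]


def crop_first_glyph(bitmap: Bitmap) -> Bitmap:
    """Return the leftmost group of adjacent inked columns, trimmed vertically.

    Glyphs are split only on completely blank columns, so touching or
    overlapping characters will not be separated.
    """
    if not bitmap:
        return []
    width = len(bitmap[0])
    # A column is inked if any pixel in it is 1.
    inked = [any(row[x] for row in bitmap) for x in range(width)]
    if True not in inked:
        return []
    start = inked.index(True)
    end = start
    while end < width and inked[end]:
        end += 1
    columns = [row[start:end] for row in bitmap]
    # Trim blank rows above and below, but keep internal gaps (e.g. in ":" or "i").
    inked_rows = [y for y, row in enumerate(columns) if any(row)]
    return columns[inked_rows[0]:inked_rows[-1] + 1]


def add_border(bitmap: Bitmap, size: int, background: int = 0) -> Bitmap:
    """Surround the bitmap with `size` pixels of `background` on every side."""
    width = len(bitmap[0]) if bitmap else 0
    blank = [background] * (width + 2 * size)
    padded = [list(blank) for _ in range(size)]
    for row in bitmap:
        padded.append([background] * size + list(row) + [background] * size)
    padded.extend(list(blank) for _ in range(size))
    return padded


if __name__ == "__main__":
    # Two "glyphs" separated by a blank column.
    image = [
        [0, 0, 0, 0, 0, 0],
        [0, 1, 1, 0, 1, 0],
        [0, 1, 0, 0, 1, 0],
        [0, 0, 0, 0, 0, 0],
    ]
    glyph = crop_first_glyph(image)
    for row in add_border(glyph, 2):
        print("".join("#" if px else "." for px in row))
```

If the thresholded images use 0 for black ink, invert them first or adapt the `any(...)` test. Note also that ImageMagick's `-border` draws its margin in the `-bordercolor` colour, so the margin produced by the quoted `convert` command depends on that setting.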

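TP's theory about missing or low DPI values can be checked without extra tools. The sketch below reads the `XResolution` (tag 282) and `ResolutionUnit` (tag 296) fields from the first IFD of a classic TIFF using only the standard library, then applies a fallback modelled on the behaviour described in the thread: if no DPI is stored, or it is below 70, a default is substituted. The function names and the choice of 300 as the default are assumptions for illustration; this is not tesseract's code, and per the thread tesseract itself uses 300 in some places and 70 in others.

```python
import struct
import sys
from typing import Optional

MIN_CREDIBLE_DPI = 70   # the "credible" threshold quoted in the thread
FALLBACK_DPI = 300      # assumption: one of the two defaults the thread mentions

XRESOLUTION = 282       # TIFF tag, type RATIONAL
RESOLUTION_UNIT = 296   # TIFF tag, type SHORT: 1 = none, 2 = inch, 3 = cm


def tiff_dpi(data: bytes) -> Optional[float]:
    """Return the horizontal DPI stored in the first IFD of a classic TIFF, or None."""
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None:
        raise ValueError("not a TIFF file")
    magic, ifd = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic (non-BigTIFF) TIFF file")
    (count,) = struct.unpack(order + "H", data[ifd:ifd + 2])
    xres: Optional[float] = None
    unit = 2  # TIFF default when ResolutionUnit is absent: inches
    for i in range(count):
        entry = ifd + 2 + 12 * i
        tag, typ, _n = struct.unpack(order + "HHI", data[entry:entry + 8])
        if tag == XRESOLUTION and typ == 5:
            # A RATIONAL is 8 bytes, so the entry holds an offset to it.
            (offset,) = struct.unpack(order + "I", data[entry + 8:entry + 12])
            num, den = struct.unpack(order + "II", data[offset:offset + 8])
            xres = num / den if den else None
        elif tag == RESOLUTION_UNIT and typ == 3:
            # A single SHORT is stored inline, left-justified in the value field.
            (unit,) = struct.unpack(order + "H", data[entry + 8:entry + 10])
    if xres is None or unit == 1:
        return None  # missing, or no absolute unit: nothing trustworthy
    return xres * 2.54 if unit == 3 else xres  # convert per-cm to per-inch


def effective_dpi(declared: Optional[float]) -> float:
    """Hypothetical fallback: replace missing or implausibly low values."""
    if declared is None or declared < MIN_CREDIBLE_DPI:
        return FALLBACK_DPI
    return declared


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            declared = tiff_dpi(f.read())
        print(f"{path}: declared={declared} effective={effective_dpi(declared)}")
```

Saved as a script (the name is arbitrary), it can be run as `python check_dpi.py good.tiff bad.tiff`. A file with no stored resolution, as TP describes for bad.tiff, yields `declared=None`. The sketch ignores `YResolution`, later IFDs and BigTIFF.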
-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
