Something can probably be done to avoid 8 <-> : (and similar) recognition
errors. For example, you can add an extra character or two to every input
image. This might help outweigh Tesseract's confidence in colons and dots
and make it recognize your single-character text correctly. Afterwards you
can ignore those extra characters and keep the one you need.
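
The padding workaround above could be scripted roughly as follows. This is a minimal sketch, not a tested recipe. It assumes that ImageMagick's `convert` and a Tesseract 3.x binary (which takes the single-dash `-psm` option used in this thread) are on PATH, that the padding image is a hypothetical picture of known text rendered in a font and size similar to your input, and that the `mydigits` config discussed in the quoted reply below exists. The padding characters must themselves be in the whitelist. The helper names `strip_known_prefix` and `ocr_single_char` are made up for illustration.

```shell
#!/bin/sh
# Sketch of the "pad the image with known characters" workaround.
# Assumes ImageMagick's `convert`, a Tesseract 3.x `tesseract` (single-dash
# -psm option) and a "mydigits" config are available; names are hypothetical.

# Print the OCR result with the known padding removed. Whitespace is dropped
# first because Tesseract may put a space between padding and target.
# Returns 1 if the padding itself was misread (or nothing follows it),
# since the remaining character is then suspect too.
strip_known_prefix() {
  text=$(printf '%s' "$1" | tr -d ' \t\r\n')
  case $text in
    "$2"?*) printf '%s\n' "${text#"$2"}" ;;
    *) return 1 ;;
  esac
}

# Usage: ocr_single_char TARGET_IMG PAD_IMG PAD_TEXT
ocr_single_char() {
  tmp=$(mktemp -d) || return 1
  # Put the padding image to the left of the target glyph.
  # -psm 7 treats the combined image as a single text line (it no longer
  # holds just one character, so -psm 10 would be the wrong mode).
  if convert "$2" "$1" +append "$tmp/padded.png" &&
     tesseract "$tmp/padded.png" "$tmp/out" -psm 7 mydigits >/dev/null 2>&1
  then
    strip_known_prefix "$(head -n 1 "$tmp/out.txt")" "$3"
    status=$?
  else
    status=1
  fi
  rm -rf "$tmp"
  return "$status"
}
```

For example, `ocr_single_char 8.png pad12.png 12` (with `pad12.png` a hypothetical image of "12") would print `8` if Tesseract reads the padded image as "128" or "12 8", and would return a non-zero status if the padding itself comes back garbled.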

The background to this issue can be found here:
http://code.google.com/p/tesseract-ocr/issues/detail?id=446
The problem hasn't been a priority for me for a while, so I haven't
followed its progress, but it seems it hasn't been a priority on the
Tesseract roadmap either: nothing appears to have been done about it since
early 2011. So, as far as I know, there is generally no standard way to
handle single-character text; there are only custom corrections to the
Tesseract code and clumsy workarounds.

HTH

Best regards,
Dmitri Silaev
www.CustomOCR.com

On Mon, Jan 27, 2014 at 7:54 PM, Nick White <[email protected]> wrote:

> Hi again,
>
> Thanks for the feedback, I'm glad it's helpful.
>
> > I also need to get /, but luckily I don't need : yet.
>
> To add '/' you can create a copy of the 'digits' config file (e.g.
> called 'mydigits') and add the '/' to the end of tessedit_char_whitelist
> entry. You can then run something like this:
>   tesseract 8.png test -psm 10 mydigits
>
> > The question is, what if I need to get : later?
>
> You'd have to add that to the whitelist as well. It may sometimes
> misrecognise 8 as :; unfortunately, that's probably unavoidable.
>
> Hope that helps :)
>
> Nick
>
> --
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>
> ---
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/groups/opt_out.
>
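
Nick's suggestion in the quoted reply, copying the stock `digits` config to `mydigits` and extending its `tessedit_char_whitelist` entry, can be scripted. The following is a minimal sketch assuming the config is a plain-text file with the whitelist on a line starting with `tessedit_char_whitelist`; the helper name `add_to_whitelist` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: copy a Tesseract config file and append characters to
# the end of its tessedit_char_whitelist line; other lines are copied as-is.
# Trailing whitespace on that line is removed first so the new characters
# join the existing value. CHARS must not contain '|', '&' or '\', which are
# special in this sed expression.
# Usage: add_to_whitelist SRC DST CHARS
add_to_whitelist() {
  sed "/^tessedit_char_whitelist/s|[[:space:]]*\$|$3|" "$1" > "$2"
}
```

For example, pointing SRC at the stock `digits` file in your `tessdata/configs` directory (its location varies between installs), DST at `mydigits` in the same directory, and passing `/` (or `/:` if you later need the colon) as CHARS should give the config used in Nick's `tesseract 8.png test -psm 10 mydigits` command.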
