Something can probably be done to avoid 8 <-> : (and similar) recognition errors. For example, you can add an extra character or two to every input image. This might help offset Tesseract's bias toward punctuation such as colons and dots and make it recognize your single-character text correctly. Afterwards you can discard the extra characters and keep the one you need.
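The last step of that workaround (throwing the extra characters away) could look roughly like the sketch below. It assumes you drew a known padding character on each side of the glyph, here a hypothetical "A" on the left and "B" on the right, and that you already have Tesseract's text output for the padded image. Compositing the image and calling Tesseract are left out, and the function name is made up for illustration.

```python
def strip_padding(ocr_text: str, prefix: str, suffix: str) -> str:
    """Return the text found between the known padding characters.

    Returns "" when the padding was not recognized as expected, so the
    caller can tell that this OCR result should not be trusted.
    """
    # Tesseract usually appends newlines and may insert spaces; drop them all.
    text = "".join(ocr_text.split())
    if len(text) < len(prefix) + len(suffix):
        return ""
    if not (text.startswith(prefix) and text.endswith(suffix)):
        return ""
    return text[len(prefix):len(text) - len(suffix)]


# Hypothetical padding: "A" drawn to the left, "B" to the right of the glyph.
print(strip_padding("A8B\n\n", "A", "B"))
```

Note that a padded image no longer holds a single character, so a single-word or single-line page segmentation mode is probably a better fit for it than -psm 10.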
The beginning of the story can be viewed here: http://code.google.com/p/tesseract-ocr/issues/detail?id=446

The problem was not of major importance to me for a while and I haven't followed its progress, but apparently the same goes for the Tesseract roadmap: nothing seems to have been done about it since early 2011. So AFAICT there is generally no conventional way to work with single-character texts, only custom corrections to the Tesseract code and clumsy workarounds.

HTH

Best regards,
Dmitri Silaev
www.CustomOCR.com

On Mon, Jan 27, 2014 at 7:54 PM, Nick White <[email protected]> wrote:
> Hi again,
>
> Thanks for the feedback, I'm glad it's helpful.
>
> > I also need to get /, but lucky I am, I don't need : yet.
>
> To add '/' you can create a copy of the 'digits' config file (e.g.
> called 'mydigits') and add the '/' to the end of tessedit_char_whitelist
> entry. You can then run something like this:
> tesseract 8.png test -psm 10 mydigits
>
> > The question is, what if I need to get : later?
>
> You'd have to add that to the whitelist as well. It may sometimes
> misrecognise 8 as :, unfortunately that's probably unavoidable.
>
> Hope that helps :)
>
> Nick
>
> --
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>
> ---
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/groups/opt_out.
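For the whitelist suggestion quoted above, a minimal sketch of deriving the text of a 'mydigits' config from the stock 'digits' one. It assumes the config is plain "name value" lines and that 'digits' holds a single tessedit_char_whitelist entry; the sample content below is an assumption, so check your own copy under tessdata/configs. The function name is hypothetical.

```python
def extend_whitelist(config_text: str, extra_chars: str) -> str:
    """Append extra_chars to the tessedit_char_whitelist line of a config."""
    key = "tessedit_char_whitelist"
    out = []
    for line in config_text.splitlines():
        parts = line.split(None, 1)
        if parts and parts[0] == key:
            current = parts[1].rstrip() if len(parts) > 1 else ""
            # Add only characters that are not whitelisted yet.
            for ch in extra_chars:
                if ch not in current:
                    current += ch
            line = f"{key} {current}"
        out.append(line)
    return "\n".join(out) + "\n"


# Assumed content of the stock 'digits' config; verify against your install.
stock = "tessedit_char_whitelist 0123456789-.\n"
print(extend_whitelist(stock, "/:"), end="")
```

Saving the result as 'mydigits' next to 'digits' should then let the quoted command, tesseract 8.png test -psm 10 mydigits, pick it up.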

