On Wed, Jul 02, 2014 at 10:26:16PM -0700, Meenal Goyal wrote:
> The post about "question about training tesseract" only suggests some
> pre-processing steps which include binarisation and I have already tried
> them.
> I wanted to know if anything can be done to improve output at later stage,
> something like adding the words to the dictionary used by tesseract.
OK, I see. The reason I recommended binarisation is that I suspect you'll
have a lot more luck with that than with anything else, for your problems.

> I have tried listing words in eng.user-words but it wasn't much useful. Can
> you suggest anything of this sort which can train tesseract over the time and
> help improve the output.

If you're sure that all the words you will encounter will be in the
dictionary, this should help somewhat:

https://code.google.com/p/tesseract-ocr/wiki/FAQ#How_to_increase_the_trust_in/strength_of_the_dictionary?

Nick
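
For illustration, here is a minimal sketch of what following that FAQ entry might look like. As far as I recall, the entry points at two config variables, language_model_penalty_non_dict_word and language_model_penalty_non_freq_dict_word; treat those names as an assumption and check them against your Tesseract version. The penalty values, the file name dict_strength.cfg, and the helper functions are hypothetical and only show the shape of the approach: write the variables to a config file and pass that file as the last argument to tesseract.

```python
from pathlib import Path


def dictionary_config_text(non_dict_penalty, non_freq_penalty):
    """Return config-file text with one 'variable value' pair per line.

    The variable names are assumed from the FAQ entry; verify them
    against the Tesseract version you are running.
    """
    lines = [
        f"language_model_penalty_non_dict_word {non_dict_penalty}",
        f"language_model_penalty_non_freq_dict_word {non_freq_penalty}",
    ]
    return "\n".join(lines) + "\n"


def build_command(image, output_base, config_path, lang="eng"):
    """Build the tesseract command line: image, output base, language, config file."""
    return ["tesseract", str(image), str(output_base), "-l", lang, str(config_path)]


if __name__ == "__main__":
    # Illustrative values only; they need tuning against your own images.
    config_path = Path("dict_strength.cfg")  # hypothetical file name
    config_path.write_text(dictionary_config_text(0.5, 0.3))
    # Print the command rather than running it, so the sketch works
    # even where tesseract is not installed.
    print(" ".join(build_command("page.png", "out", config_path)))
```

Higher penalties make the recogniser less willing to accept words that are not in the dictionary, so this only helps if the dictionary really does cover your text; otherwise correct non-dictionary words may get replaced by wrong dictionary ones.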

