Re: [tesseract-ocr] retrieve words not matching the dictionary

Nick White Thu, 03 Jul 2014 14:19:25 -0700

On Wed, Jul 02, 2014 at 10:26:16PM -0700, Meenal Goyal wrote: 
> The post about "question about training tesseract" only suggests some
> pre-processing steps which include binarisation, and I have already tried
> them.
> I wanted to know if anything can be done to improve output at later stage,
> something like adding the words to the dictionary used by tesseract.


OK, I see. I recommended binarisation because I suspect you'll have
a lot more luck with that than with anything else for the problems
you describe.

> I have tried listing words in eng.user-words but it wasn't much use. Can
> you suggest anything of this sort which can train tesseract over time and
> help improve the output?

If you're sure that every word you will encounter is in the
dictionary, this should help somewhat:
https://code.google.com/p/tesseract-ocr/wiki/FAQ#How_to_increase_the_trust_in/strength_of_the_dictionary?
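
As a rough sketch of what that FAQ entry amounts to: it suggests raising
the language_model_penalty_non_dict_word and
language_model_penalty_non_freq_dict_word variables, which you can put in
a config file and pass as the last argument to tesseract. The values, the
file name strong_dict.config and the image name page.png below are
placeholders I made up, not recommendations; I believe the stock values
are lower than these, but check your own build and tune against your own
pages.

```python
from pathlib import Path

# The two variable names are Tesseract 3.x parameters named in the FAQ entry.
# The numeric values are illustrative assumptions only; tune them yourself.
DEFAULT_PENALTIES = {
    "language_model_penalty_non_dict_word": 0.3,
    "language_model_penalty_non_freq_dict_word": 0.2,
}


def build_config_text(penalties=DEFAULT_PENALTIES):
    """Return config-file text: one 'name value' pair per line."""
    return "".join(f"{name} {value}\n" for name, value in penalties.items())


def build_command(image, output_base, config_path, lang="eng"):
    """Return the tesseract argument list; the config file goes last."""
    return ["tesseract", str(image), str(output_base), "-l", lang, str(config_path)]


if __name__ == "__main__":
    # Hypothetical file names, used only for illustration.
    config_path = Path("strong_dict.config")
    config_path.write_text(build_config_text())
    # Print the command instead of running it, so this works without tesseract.
    print(" ".join(build_command("page.png", "page", config_path)))
```

Running the printed command should write page.txt using the stronger
dictionary weighting. If some of your words are not in the dictionary,
higher penalties can make things worse, since tesseract will be pushed
towards the wrong dictionary word.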

Nick

