Re: [tesseract-ocr] retrieve words not matching the dictionary

Nick White Fri, 04 Jul 2014 07:16:18 -0700

On Fri, Jul 04, 2014 at 02:08:46AM -0700, Meenal Goyal wrote: 
>     If you're sure that all the words you will encounter will be in the
>     dictionary this should help somewhat:
>     https://code.google.com/p/tesseract-ocr/wiki/FAQ#How_to_increase_the_trust_in/strength_of_the_dictionary?
>  
> The words won't always be in the dictionary, so I tried adding them to the
> file eng.user-words, but I'm confused about the weight given to this file
> relative to the already defined dictionaries.
> Also, I read that post earlier about strengthening the dictionary and
> tried modifying some variables in the configuration file.  But then it starts
> recognizing wrong words; maybe it's a case of over-correcting.


Yes, that's the problem with just emphasising the dictionary.  
Ultimately if you're giving Tesseract a lot of noise, it's going to 
be very hard to stop it producing garbage output. So I'm afraid 
better binarisation is all I can recommend.
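
To give a rough idea of what I mean by binarisation, here is a minimal 
sketch of Otsu's global threshold in plain Python. It is only an 
illustration, not anything taken from Tesseract: the function names 
otsu_threshold and binarise are made up for this example, and it 
assumes the page is already an 8-bit greyscale image held as a list of 
rows of integers (0 = black, 255 = white). Scans with uneven lighting 
often do better with a local/adaptive threshold from an image library.

```python
def otsu_threshold(pixels):
    """Return the grey level (0-255) that best separates dark from light.

    Hypothetical helper: picks the threshold that maximises the
    between-class variance of the two groups (Otsu's method).
    """
    # Histogram of grey levels.
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0      # number of pixels at or below the candidate threshold
    sum_bg = 0    # sum of their grey levels
    for t in range(256):
        w_bg += hist[t]
        if w_bg == 0:
            continue          # nothing in the dark class yet
        w_fg = total - w_bg
        if w_fg == 0:
            break             # nothing left in the light class
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance, up to a constant factor.
        var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarise(rows):
    """Map every pixel to 0 (ink) or 255 (paper) using one global threshold."""
    flat = [p for row in rows for p in row]
    t = otsu_threshold(flat)
    return [[255 if p > t else 0 for p in row] for row in rows]


if __name__ == "__main__":
    # Tiny made-up "image": light paper with a few dark ink pixels.
    page = [[200, 210, 30],
            [20, 205, 25]]
    print(binarise(page))
```

The cleaned-up image is what you would then save and hand to Tesseract 
in place of the noisy original.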
