Re: [tesseract-ocr] retrieve words not matching the dictionary

Meenal Goyal Mon, 07 Jul 2014 00:02:27 -0700

Hi Nick,

I am using this technique for binarisation:
http://liris.cnrs.fr/christian.wolf/software/binarize/
Could you recommend anything better than this one?
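
For anyone comparing approaches, here is a minimal pure-Python sketch of Sauvola-style local thresholding, one of the classic local methods in the same family as the code on the page linked above. It is not that page's implementation. The function name sauvola_binarize, the window size, and the values k=0.5 and R=128 are assumptions chosen for illustration, and the nested loops are only practical for small test images.

```python
import math

def sauvola_binarize(gray, window=15, k=0.5, R=128.0):
    """Binarise a greyscale image given as a list of rows of 0..255 values.

    Hypothetical helper for illustration. Each pixel is compared with a
    local threshold T = mean * (1 + k * (std / R - 1)) computed over a
    window centred on the pixel and clipped at the image border.
    Returns rows of 0 (ink) and 255 (background).
    """
    h = len(gray)
    w = len(gray[0]) if h else 0
    half = window // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Neighbourhood values, clipped at the image border.
            vals = [gray[j][i]
                    for j in range(max(0, y - half), min(h, y + half + 1))
                    for i in range(max(0, x - half), min(w, x + half + 1))]
            mean = sum(vals) / len(vals)
            std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
            threshold = mean * (1.0 + k * (std / R - 1.0))
            row.append(0 if gray[y][x] < threshold else 255)
        out.append(row)
    return out

# Tiny example: a light 5x5 page with one dark pixel in the centre.
page = [[200] * 5 for _ in range(5)]
page[2][2] = 20
binary = sauvola_binarize(page, window=3)
```

Running the output of a sketch like this through Tesseract next to the current binarisation, on the same scans, is probably the quickest way to tell whether a different local method helps.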


Thanks.

On Friday, July 4, 2014 7:45:54 PM UTC+5:30, Nick White wrote:
>
> On Fri, Jul 04, 2014 at 02:08:46AM -0700, Meenal Goyal wrote: 
> >     If you're sure that all the words you will encounter will be in the 
> >     dictionary this should help somewhat: 
> >     https://code.google.com/p/tesseract-ocr/wiki/FAQ#How_to_increase_the_trust_in/strength_of_the_dictionary? 
> >   
> > The words won't always be in the dictionary, so I tried adding them to the
> > file eng.user-words, but I'm confused about the weight given to this file
> > against the already defined dictionaries. 
> > Also, I read that post earlier about strengthening the dictionary and
> > tried to modify some variables in the configuration file. But then it
> > starts recognizing wrong words; maybe it's a case of over-correcting. 
>
> Yes, that's the problem with just emphasising the dictionary.   
> Ultimately if you're giving Tesseract a lot of noise, it's going to 
> be very hard to stop it producing garbage output. So I'm afraid 
> better binarisation is all I can recommend. 
>
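
The over-correction trade-off discussed in the quoted exchange can be illustrated with a toy model. This is not Tesseract's actual language model: pick_word, the candidate costs, and the penalty values are all hypothetical, and the only idea borrowed from the thread is that a word missing from the dictionary is made more expensive.

```python
def pick_word(candidates, dictionary, non_dict_penalty):
    """Return the cheapest candidate word (toy model, hypothetical helper).

    candidates: list of (word, classifier_cost) pairs, lower cost is better.
    A fixed non_dict_penalty is added to every word not in the dictionary.
    """
    def total_cost(item):
        word, cost = item
        in_dict = word.lower() in dictionary
        return cost + (0.0 if in_dict else non_dict_penalty)
    return min(candidates, key=total_cost)[0]

# Made-up example: the page really says "Goyal", which is not a dictionary
# word, and the classifier slightly prefers it over the dictionary word "Goal".
dictionary = {"goal"}
candidates = [("Goyal", 0.20), ("Goal", 0.35)]

weak = pick_word(candidates, dictionary, non_dict_penalty=0.1)
strong = pick_word(candidates, dictionary, non_dict_penalty=0.5)
```

With the small penalty the correct non-dictionary word survives; with the large one it is replaced by the nearest dictionary word. The noisier the input, the closer the classifier costs get, so even a modest penalty starts flipping words.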
