On Fri, Jul 04, 2014 at 02:08:46AM -0700, Meenal Goyal wrote:
> If you're sure that all the words you will encounter will be in the
> dictionary this should help somewhat:
> https://code.google.com/p/tesseract-ocr/wiki/FAQ#How_to_increase_the_trust_in/strength_of_the_dictionary?
>
> The words won't always be in dictionary so I tried adding them in file
> eng.user-words but I'm confused about the weightage given to this file against
> the already defined dictionaries.
> Also, I have read that post earlier about strengthening the dictionary and
> tried to modify some variables in the configuration file. But then it starts
> recognizing wrong words, maybe it's the case of over-correcting.

Yes, that's the problem with just emphasising the dictionary. Ultimately if you're giving Tesseract a lot of noise, it's going to be very hard to stop it producing garbage output. So I'm afraid better binarisation is all I can recommend.
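
To make "better binarisation" a bit more concrete, here is a minimal sketch of global Otsu thresholding applied before handing the image to Tesseract. It is only an illustration: the function names `otsu_threshold` and `binarise` are made up for this example, it assumes 8-bit greyscale pixels held in plain Python lists, and a single global threshold may well not suit unevenly lit or very noisy scans, where an adaptive method could do better.

```python
def otsu_threshold(pixels):
    """Return the grey level t (0-255) that maximises between-class variance.

    `pixels` is a flat, non-empty iterable of 8-bit grey values. Values <= t
    are treated as one class (dark), values > t as the other (light).
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the dark class so far
    sum0 = 0    # sum of grey values in the dark class so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue        # no dark pixels yet
        if w1 == 0:
            break           # every pixel is already in the dark class
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2.
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarise(rows, threshold):
    """Map each pixel to 0 (black) or 255 (white) using the given threshold."""
    return [[255 if p > threshold else 0 for p in row] for row in rows]


# Hypothetical 2x3 image: dark text on the left, light background on the right.
image = [[30, 35, 200],
         [40, 210, 220]]
t = otsu_threshold(p for row in image for p in row)
clean = binarise(image, t)
```

The cleaned image would then be saved and passed to Tesseract in place of the original. Whether this helps depends entirely on the kind of noise in the scans, so it is worth comparing the output by eye first.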

