Hi Nick,

I am using this technique for binarisation: http://liris.cnrs.fr/christian.wolf/software/binarize/

Could you recommend anything better than this one?
Thanks.

On Friday, July 4, 2014 7:45:54 PM UTC+5:30, Nick White wrote:
>
> On Fri, Jul 04, 2014 at 02:08:46AM -0700, Meenal Goyal wrote:
> >     If you're sure that all the words you will encounter will be in the
> >     dictionary this should help somewhat:
> >     https://code.google.com/p/tesseract-ocr/wiki/FAQ#How_to_increase_the_trust_in/strength_of_the_dictionary?
> >
> > The words won't always be in the dictionary, so I tried adding them to the
> > file eng.user-words, but I'm confused about the weightage given to this
> > file against the already defined dictionaries.
> > Also, I have read that post earlier about strengthening the dictionary and
> > tried to modify some variables in the configuration file. But then it
> > starts recognizing wrong words; maybe it's a case of over-correcting.
>
> Yes, that's the problem with just emphasising the dictionary.
> Ultimately if you're giving Tesseract a lot of noise, it's going to
> be very hard to stop it producing garbage output. So I'm afraid
> better binarisation is all I can recommend.

