Hi,

The wordlists are only one of the elements Tesseract uses when judging what the best output for characters and words should be. So just adding a few more likely words to freq-words, while helpful, is unlikely to make an enormous difference to the quality of your output.

If you're confident that all of the words you're going to be recognising are going to be in your wordlists, you could try increasing the weighting Tesseract gives to words in the wordlists, but this can make "false positives" more common when you come to words that aren't in the lists. See this entry from the FAQ for more details on that:

https://code.google.com/p/tesseract-ocr/wiki/FAQ#How_to_increase_the_trust_in/strength_of_the_dictionary?

Nick

On Tue, Mar 25, 2014 at 07:09:34AM -0700, temp name wrote:
> Hello,
>
> I trained Tesseract for a new language. In my testing I didn't get
> accurate results, so I added a dictionary to my training data file. I
> created lang.word-dawg and lang.freq-word-dawg and combined them into the
> training file. In testing with the new trained data files I got similar
> results. I can see no change in recognition of the words which are
> present in the dictionary.
>
> Does anyone know how Tesseract uses the dictionary?
> How does Tesseract select a word from the dictionary?
> And how can I improve accuracy using the dictionary?
>
> Thanks!
>
> --
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>
> ---
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
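
For anyone wanting to try the dictionary weighting mentioned in the reply above, here is a rough sketch rather than a definitive recipe. It assumes the relevant variables are language_model_penalty_non_dict_word and language_model_penalty_non_freq_dict_word, which exist in Tesseract 3.x; check the FAQ entry against your own version. The penalty values, the file names (dict_weight.cfg, page.png) and the language code "lang" are all placeholders made up for illustration. The script only writes a config file and prints the command to run; it does not call Tesseract itself.

```python
from pathlib import Path

# Placeholder values, chosen only for illustration. Larger penalties make
# Tesseract prefer dictionary words more strongly, at the cost of more
# "false positives" on words that are not in the wordlists.
DICT_PENALTIES = {
    "language_model_penalty_non_dict_word": 0.5,
    "language_model_penalty_non_freq_dict_word": 0.3,
}


def config_text(params):
    """Render config-file lines: one 'name value' pair per line."""
    return "".join(f"{name} {value}\n" for name, value in params.items())


def tesseract_command(image, output_base, lang, config_path):
    """Build the argument list; config files come after the output base."""
    return ["tesseract", str(image), str(output_base), "-l", lang, str(config_path)]


if __name__ == "__main__":
    # Hypothetical file names; replace them with your own.
    config_path = Path("dict_weight.cfg")
    config_path.write_text(config_text(DICT_PENALTIES))
    print(" ".join(tesseract_command("page.png", "page", "lang", config_path)))
```

Whether a config file in the current directory is picked up may depend on your Tesseract version; if it is ignored, placing it under tessdata/configs/ and passing just its name is the usual alternative. It is worth comparing output with and without the config on a held-out set of pages, since the right values depend on how complete your wordlists are.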

