Hello, I'm using Tesseract 3.02 on Windows 7, starting from the eng.traineddata that was distributed with 3.02. Tesseract keeps misreading some symbols, specifically 6 instead of G, I-I instead of H, and a few others, so I'm getting 6od instead of God, I-Iercules instead of Hercules, and so on. I was hoping that the dictionary would help with this so I wouldn't have to retrain, since it's just these few symbols, but nothing seems to help. So far I've tried:

- Cranking up the language_model_penalty_non_dict_word and language_model_penalty_non_freq_dict_word values in the config file.
- Adding "load_system_dawg T" and "load_freq_dawg T" to the config file (even though it's supposed to do that by default).
- Adding the 6 -> G rule to unicharambigs (as "1 6 1 G 0") and recombining. The I-I -> H rule was already there.
- Adding the words God and Hercules to the frequent word list (eng.freq-dawg) and recombining.
- Emptying both the word list (eng.word-dawg) and the frequent word list (eng.freq-dawg), putting just these two words in, and recombining, just to see if it would make a difference. It didn't.

Nothing I've done so far has helped, but it seems to me that the point of using the dictionary is to deal with exactly this type of situation, so I feel like I must be missing something. Have I maybe missed a configuration step?

Thanks

