Yes, I can pre-process each individual image to make it work, but unfortunately I've been unable to come up with a consistent pre-processing method that works in general, and I've been trying for a while now. I've known from the beginning that retraining is an option, but I'm concerned that it may fix some problems and introduce others. The default eng.traineddata works pretty well, except that every once in a while a character is misread. I've just downloaded and tried VietOCR 4 beta, and while it does get this one right, it regrettably still misses quite a few others.
What I really need is a dictionary lookup for every non-word or garbage word Tesseract finds that would return the best dictionary match. I'm thinking about writing my own, but that would be absurd if Tesseract is supposed to already contain this functionality. I understand from Ray's explanation here <https://groups.google.com/forum/#!searchin/tesseract-ocr/dictionary/tesseract-ocr/VJXE40iksnI/tr-_9O4F5OcJ> that the correct character choice is not ranked high enough to be considered for a dictionary match, and that would make sense if I didn't have an ambigs rule for it. But if I have an explicit unicharambigs rule that says to consider replacing this character with another when looking for a dictionary match, I don't see how Tesseract still ends up preferring a non-word over a dictionary match. I keep thinking I must be missing some obscure config setting. I've already tried tweaking a whole bunch of them from this list <http://www.sk-spell.sk.cx/tesseract-ocr-parameters-in-302-version> but to no avail.
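
For reference, here is a rough sketch of the kind of do-it-yourself dictionary lookup mentioned above, done as a post-processing step on the OCR text. It is not a Tesseract feature and does not use any Tesseract API. It assumes Python's standard difflib module, and the word list, the function names and the 0.8 similarity cutoff are all made up for illustration; a real word list would be loaded from a dictionary file.

```python
import difflib

# Hypothetical word list for illustration only; a real one would come from a dictionary file.
WORDS = ["character", "dictionary", "tesseract", "image", "garbage"]


def best_match(token, words=WORDS, cutoff=0.8):
    """Return the closest dictionary word, or the token unchanged if nothing is close enough."""
    lowered = token.lower()
    if lowered in words:
        # Already a dictionary word: leave it alone.
        return token
    # get_close_matches ranks candidates by SequenceMatcher similarity ratio.
    matches = difflib.get_close_matches(lowered, words, n=1, cutoff=cutoff)
    return matches[0] if matches else token


def correct_text(text, words=WORDS, cutoff=0.8):
    """Apply best_match to every whitespace-separated token of the OCR output."""
    return " ".join(best_match(t, words, cutoff) for t in text.split())


if __name__ == "__main__":
    print(correct_text("tesseiact image"))
```

The cutoff is the main thing to tune: too low and valid but unlisted words (names, numbers) get "corrected" into dictionary words, too high and single-character misreads in short words are left as they are.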

