A great addition to training would be a single dictionary file that combines freq-words and all-words and gives each word a relative frequency (probability) score. Scoring could then be more fine-grained, based on exactly how likely each word is to appear, rather than on the current two-way split between frequent words and everything else.
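To make the idea concrete, here is a rough sketch of how such a list could be built from a corpus. The tab-separated "word, probability" layout and the function names are my own assumptions for illustration; this is not a format Tesseract currently reads.

```python
import re
from collections import Counter


def relative_frequencies(text):
    """Map each word to its share of all word tokens in the text."""
    # Lowercase, then keep runs of letters only (no digits or underscores).
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(words)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def format_dictionary(freqs):
    """Render one 'word<TAB>probability' line per word, most frequent first."""
    # Hypothetical file layout, chosen only for this example.
    ordered = sorted(freqs.items(), key=lambda item: (-item[1], item[0]))
    return "\n".join(f"{word}\t{prob:.6f}" for word, prob in ordered)


if __name__ == "__main__":
    sample = "the cat sat on the mat and the dog sat too"
    print(format_dictionary(relative_frequencies(sample)))
```

With a real corpus the same counting would run over the full text, and the resulting probabilities would take the place of the freq-words/all-words distinction.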
Obviously, in many cases such word frequency scores might be hard to generate, but in others (such as mine) it isn't hard at all, because the word list is generated from a large corpus of existing text. Would others find such a feature useful? Also, would I be better off posting this to the bug tracker?

Nick

