A useful addition to training would be a single dictionary file that
combines freq-words and all-words and gives each word a relative
frequency (probability) score. This would allow finer-grained scoring
based on how likely each word is to appear, rather than only on which
of the two lists it is in.

Obviously, in many cases such word frequency scores might be hard to
generate, but in others (such as mine) it is easy, since the word
list is generated from a large corpus of existing text.
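
To make the idea concrete, here is a rough sketch of how such scores
could be derived from a corpus. The function names and the
tab-separated "word, frequency" output are my own assumptions for
illustration only; this is not a format that Tesseract reads.

```python
import re
from collections import Counter


def relative_frequencies(text):
    """Return {word: count / total_words} for the given corpus text."""
    # Lowercase and keep runs of letters only (hypothetical tokenisation;
    # a real corpus may need language-specific handling).
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(words)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in counts.items()}


def format_entries(freqs):
    """Build one 'word<TAB>frequency' line per word, most frequent first."""
    # Ties are broken alphabetically so the output is deterministic.
    ordered = sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f"{word}\t{freq:.6f}" for word, freq in ordered]


sample = "the cat sat on the mat the end"
freqs = relative_frequencies(sample)
lines = format_entries(freqs)
print("\n".join(lines))
```

The scores sum to 1 over the corpus vocabulary, so they can be read
directly as the probability of drawing each word at random from the
text.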

Would others find such a feature useful? Also, would I be better off
posting this to the bug tracker?

Nick
