Yes, I think that would be very useful. For most books and scientific papers, word frequencies could easily be determined. It might also be possible to choose a literary style or category and produce different settings within a language.

--Sven
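As a rough illustration of how such relative frequency scores might be derived from a corpus, here is a minimal Python sketch. The function names, the tokenizer, and the tab-separated output format are all assumptions made for illustration; this is not a format that Tesseract's training tools are known to accept.

```python
import re
from collections import Counter


def relative_frequencies(text):
    """Map each lowercased word in `text` to its share of all word tokens."""
    # Hypothetical tokenizer: runs of letters, allowing internal apostrophes.
    words = re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)*", text.lower())
    counts = Counter(words)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: count / total for word, count in counts.items()}


def format_word_list(freqs):
    """Render one 'word<TAB>frequency' line per word, most frequent first."""
    # Ties are broken alphabetically so the output is deterministic.
    ranked = sorted(freqs.items(), key=lambda item: (-item[1], item[0]))
    return "\n".join(f"{word}\t{freq:.6f}" for word, freq in ranked)


sample = "the cat sat on the mat and the dog sat too"
freqs = relative_frequencies(sample)
print(format_word_list(freqs))
```

Running the same function over separate corpora (for example, one per literary style or category) would give one frequency table per style within a language.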
On Thu, Aug 23, 2012 at 6:08 AM, Nick White <[email protected]> wrote:
> A great addition to training would be if one dictionary file was
> used, combining freq-words and all-words, and a relative frequency
> probability score was given to each word. This would allow more
> fine-grained scoring based on exactly how likely the word is to
> appear, which would be a win.
>
> Obviously for many cases such word frequency scores might be hard to
> generate, but for others (such as mine) it isn't at all, if the word
> list is generated from a large corpus of existing text.
>
> Would others find such a feature useful? Also, would I be better off
> posting this to the bug tracker?
>
> Nick
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en

--
"All that is gold does not glitter,
not all those who wander are lost;
the old that is strong does not wither,
deep roots are not reached by the frost.
From the ashes a fire shall be woken,
a light from the shadows shall spring;
renewed shall be blade that was broken,
the crownless again shall be king."

