Yes, I think that would be very useful. For most books and scientific
papers, the word frequencies could easily be determined. It might even
be possible to choose a literary style or category and produce
different frequency settings within a language.
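
As a rough illustration of how such scores could be derived from a
corpus, here is a minimal Python sketch. The function name, the crude
tokenizer, and the tab-separated output are all assumptions made for
illustration; this is not a format that Tesseract's training tools are
known to accept.

```python
import re
from collections import Counter

def relative_frequencies(text):
    """Return a dict mapping each word to its share of all words in `text`."""
    # Crude tokenizer (an assumption): lowercase, keep runs of letters only.
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(words)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in counts.items()}

# Tiny stand-in corpus; in practice this would be a large body of text.
corpus = "the cat sat on the mat and the dog sat too"
freqs = relative_frequencies(corpus)

# Print one "word<TAB>score" line per word, most frequent first.
for word, score in sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{word}\t{score:.4f}")
```

Running the same function over corpora from different categories
(novels, scientific papers, and so on) would give one frequency list
per category.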
--Sven

On Thu, Aug 23, 2012 at 6:08 AM, Nick White <[email protected]> wrote:
> A great addition to training would be if one dictionary file was
> used, combining freq-words and all-words, and a relative frequency
> probability score was given to each word. This would allow more
> fine-grained scoring based on exactly how likely the word is to
> appear, which would be a win.
>
> Obviously for many cases such word frequency scores might be hard to
> generate, but for others (such as mine) it isn't at all, if the word
> list is generated from a large corpus of existing text.
>
> Would others find such a feature useful? Also, would I be better off
> posting this to the bug tracker?
>
> Nick
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en



-- 
``All that is gold does not glitter,
  not all those who wander are lost;
the old that is strong does not wither,
  deep roots are not reached by the frost.
From the ashes a fire shall be woken,
  a light from the shadows shall spring;
renewed shall be blade that was broken,
  the crownless again shall be king.''
