Re: Character frequency in training data

Sven Pedersen Sun, 08 Jul 2012 07:54:02 -0700

First, make sure that every letter combination is accounted for in the
training data. Then, if you use representative frequencies as you
suggest, it should help.
--Sven
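
A minimal sketch of how one might check both points (coverage first, then
frequencies), assuming the usual box-file layout of
"<symbol> <left> <bottom> <right> <top> <page>" per line. The function
names are hypothetical and are not part of Tesseract; the reference text
stands for any sample you consider representative of the language.

```python
from collections import Counter


def relative_frequencies(text):
    """Return {char: share of all non-whitespace chars} for the given text."""
    counts = Counter(ch for ch in text if not ch.isspace())
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()} if total else {}


def box_symbols(box_lines):
    """Collect the symbol (first field) from each box-file line.

    Assumes the symbol is the first whitespace-separated field.
    """
    symbols = []
    for line in box_lines:
        fields = line.split()
        if fields:
            symbols.append(fields[0])
    return "".join(symbols)


def compare(training_text, reference_text):
    """Return (characters missing from training, per-character frequency gap).

    A positive gap means the character is over-represented in the training
    text relative to the reference; a negative gap means under-represented.
    """
    train = relative_frequencies(training_text)
    ref = relative_frequencies(reference_text)
    missing = sorted(set(ref) - set(train))
    gaps = {ch: train.get(ch, 0.0) - ref[ch] for ch in ref}
    return missing, gaps
```

To use it, read the lines of a .box file, pass them through box_symbols(),
and call compare() with that string and the reference text. Anything in the
"missing" list is not covered at all, which is the first thing to fix; the
gaps only matter after that.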


On Sun, Jul 8, 2012 at 1:17 AM, shahin youssefi <[email protected]> wrote:
> Hello dear friends,
> Today I was checking the sample tif+box pairs in the download section, and I
> found that the letter frequencies in the samples (I checked English and
> German) are very similar to the actual letter frequencies of those
> languages. I wonder whether preparing tif+box pairs this way for the language
> I'm trying to teach Tesseract could help me get better OCR accuracy. Can
> anyone shed some light on this?
>
> Thanks in advance.
>
> P.S. The attached files are the charts.
>
>
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en



-- 
“All that is gold does not glitter,
  not all those who wander are lost;
the old that is strong does not wither,
  deep roots are not reached by the frost.
From the ashes a fire shall be woken,
  a light from the shadows shall spring;
renewed shall be blade that was broken,
  the crownless again shall be king.”

