First, make sure that every letter combination is accounted for. Once that is done, using representative letter frequencies as you suggest should help.

--Sven
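As a rough way to check both points before building the tif+box pairs, here is a minimal Python sketch. It is not part of Tesseract. The function names, the use of adjacent letter pairs as a stand-in for "letter combinations", and the two sample strings are all assumptions made for illustration. It compares a candidate training text against a larger reference sample of the language and reports which characters and pairs are missing, plus the largest gap in relative character frequency.

```python
from collections import Counter


def char_frequencies(text):
    """Relative frequency of each non-whitespace character in text."""
    counts = Counter(ch for ch in text if not ch.isspace())
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}


def bigrams(text):
    """Set of adjacent character pairs that occur inside words."""
    pairs = set()
    for word in text.split():
        pairs.update(word[i:i + 2] for i in range(len(word) - 1))
    return pairs


def coverage_report(training_text, reference_text):
    """Compare a training text with a reference sample of the language.

    Returns (missing characters, missing pairs, largest frequency gap).
    """
    train_freq = char_frequencies(training_text)
    ref_freq = char_frequencies(reference_text)
    missing_chars = sorted(set(ref_freq) - set(train_freq))
    missing_pairs = sorted(bigrams(reference_text) - bigrams(training_text))
    # Largest absolute difference in relative frequency for any character
    # seen in the reference; 0.0 if the reference is empty.
    max_gap = max(
        (abs(ref_freq[ch] - train_freq.get(ch, 0.0)) for ch in ref_freq),
        default=0.0,
    )
    return missing_chars, missing_pairs, max_gap


# Hypothetical sample strings; in practice, read both from text files.
reference = "the quick brown fox jumps over the lazy dog"
training = "the quick brown fox"

missing_chars, missing_pairs, max_gap = coverage_report(training, reference)
print("missing characters:", missing_chars)
print("missing pairs:", missing_pairs)
print("largest frequency gap:", round(max_gap, 3))
```

An empty "missing" list together with a small frequency gap would suggest the training text is a reasonable sample; how small is small enough is a judgment call that this sketch does not settle.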
On Sun, Jul 8, 2012 at 1:17 AM, shahin youssefi <[email protected]> wrote:
> Hello dear friends,
> Today I was checking the sample tif+box pairs in the download section, and I
> found that the letter frequencies in the samples (I've checked English and
> German) are very similar to the actual letter frequencies of those
> languages. I wonder whether preparing tif+box pairs in this way for the
> language I'm trying to teach Tesseract could help me get better OCR
> accuracy. Can anyone shed some light on this?
>
> Thanks in advance.
>
> P.S. The attached files are the charts.
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en

--
"All that is gold does not glitter,
not all those who wander are lost;
the old that is strong does not wither,
deep roots are not reached by the frost.
From the ashes a fire shall be woken,
a light from the shadows shall spring;
renewed shall be blade that was broken,
the crownless again shall be king."

