On 12 July 2010 16:33, Rippalka <[email protected]> wrote:
> Thanks for the answer.
>
> Even when I don't reach the "50" and I don't add it, it doesn't give
> any better results.

OK. I can't remember the details of the spacing issue, but that
particular case happened to be a clear example of it. What's less
obvious is that training requires quite a lot of space between
characters; if the spacing were too tight, though, you should be
getting errors about overlapping boxes.
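
If you want to check for overlaps yourself before training, something
like the following might help. It is only a rough sketch: it assumes
each line of the box file has the form
"<char> <left> <bottom> <right> <top>" (optionally followed by a page
number), and the function names are my own, not part of Tesseract. The
trainer's own overlap test may well be stricter or looser than this
plain rectangle intersection.

```python
from typing import List, NamedTuple, Tuple


class Box(NamedTuple):
    char: str
    left: int
    bottom: int
    right: int
    top: int


def parse_box_lines(lines: List[str]) -> List[Box]:
    """Parse lines assumed to look like '<char> <left> <bottom> <right> <top> [page]'."""
    boxes = []
    for line in lines:
        parts = line.split()
        if len(parts) < 5:
            continue  # skip blank or malformed lines
        left, bottom, right, top = (int(p) for p in parts[1:5])
        boxes.append(Box(parts[0], left, bottom, right, top))
    return boxes


def overlaps(a: Box, b: Box) -> bool:
    """True if the two rectangles share some interior area."""
    return (a.left < b.right and b.left < a.right
            and a.bottom < b.top and b.bottom < a.top)


def find_overlaps(boxes: List[Box]) -> List[Tuple[Box, Box]]:
    """Return every pair of boxes that intersect (simple O(n^2) comparison)."""
    return [(a, b)
            for i, a in enumerate(boxes)
            for b in boxes[i + 1:]
            if overlaps(a, b)]
```

Feeding it the lines of your box file and printing the returned pairs
should show which characters are too close together.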

> I tried to detect the characters before with another language and it
> gave almost perfect results, so I was expecting excellent results by
> creating my own language. It's true that the definition of my TIFF
> file is extremely poor.

I'm not sure if you're doing it, but you should have multiple examples
of each character, preferably in proportion to the frequency of
occurrence of each character in your language. (Just for my own
curiosity: which language?)
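
As a rough illustration of what I mean by "in proportion", here is a
small sketch that works out how many samples of each character to put
in the training image from a sample text in your language. The function
name, the total of samples, and the minimum per character are all
assumptions of mine for the example, not anything Tesseract prescribes.

```python
from collections import Counter
from typing import Dict


def sample_counts(corpus: str, total_samples: int, minimum: int = 5) -> Dict[str, int]:
    """Suggest a number of training samples per character.

    Counts are proportional to each character's frequency in `corpus`,
    but never below `minimum` (a hypothetical floor so that rare
    characters still get several examples).
    """
    counts = Counter(ch for ch in corpus if not ch.isspace())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {ch: max(minimum, round(total_samples * n / total))
            for ch, n in counts.items()}
```

Because of the floor for rare characters, the suggested counts can add
up to somewhat more than total_samples.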

> I only generated the basic files for my new language. For the DAWG
> files, it took them from the English.

Well, the DAWG files do make a difference at recognition time: they
help to establish confidence in 'good' words, and the characters from
those words are then used by the adaptive classifier. I don't think
they make much difference in training, though.

> Because I can find an "O" instead of an "I". So I'm really confused.
> Maybe I missed another important file.
> It's always the same font. So if I create the language with a 'good
> quality' TIFF file, will I obtain better results with my poor quality
> letters?

Well, for training, it's much better to have a mix of good and bad
quality, so the classifier can learn the correct set of features,

-- 
<Leftmost> jimregan, that's because deep inside you, you are evil.
<Leftmost> Also not-so-deep inside you.
