On 12 July 2010 16:33, Rippalka <[email protected]> wrote:
> Thanks for the answer.
>
> Even when I don't reach the "50" and I don't add it, it doesn't give
> any better results.

OK, I can't remember the details of the spacing issue, but that particular example happened to be a clear case of it. What's less obvious is that training requires quite a lot of space between characters; if that were the problem, though, you should be getting errors about overlapping boxes.

> I tried to detect the characters before with another language and it
> gave almost perfect results, so I was expecting excellent results by
> creating my own language. That's true that the definition of my TIFF
> file is extremely poor.

I'm not sure if you're doing it, but you should have multiple examples of each character, preferably in proportion to the frequency of occurrence of each character in your language. (Just for my own curiosity: which language?)

> I only generated the basic files for my new language. For the DAWG
> files, it took them from the English.

Well, the DAWG files do make a difference, as they help to establish confidence in 'good' words, the characters from which are used in the adaptive classifier, but I don't think they'll make much difference in training.

> Because i can find a "O" instead of a "I". So I'm really confused.
> Maybe I missed another important file.
> It's always the same font. So if I create the language with a 'good
> quality' TIFF file, will I obtain better results with my poor quality
> letters ?

Well, for training, it's much better to have a mix of good and bad quality, so the classifier can learn the correct set of features.

-- 
<Leftmost> jimregan, that's because deep inside you, you are evil.
<Leftmost> Also not-so-deep inside you.

