On 19 July 2010 17:09, Andres <[email protected]> wrote:
> Hello people,
>
> I'm trying to distinguish between 0 (number) and O (vowel).
>
> O vowel is in uppercase.
>
> In my training tif image, I included lots of zeros and lots of Os, like
> this: O0O0O0O0O0 OOOO 0000
>
> Boxes and all the training procedure is ok, the log with no errors, but when
> it reads this line O0O0O0O0O0 all of these characters are read as O vowels.
It's a classification problem: to the OCR engine, 0 and O look identical (as do 1, I and l). There's also a post-processing step that normalises 'words' containing a mix of digits and letters, which is what's happening here.

> Could you people have some tip for this ?
>
> Thanks,
>
> Andres
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en.
>

--
<Leftmost> jimregan, that's because deep inside you, you are evil.
<Leftmost> Also not-so-deep inside you.
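To make the effect of such a normalisation step concrete, here is a minimal sketch in Python of how a rule of that kind could turn O0O0O0O0O0 into all Os. This is a hypothetical illustration only, not Tesseract's actual code: the function name `normalise_token`, the two look-alike tables and the tie-breaking rule (ties go to letters) are all assumptions made for the example.

```python
# Hypothetical illustration; NOT Tesseract's implementation.
# Look-alike tables (assumed for this sketch).
DIGIT_TO_LETTER = {"0": "O", "1": "I"}
LETTER_TO_DIGIT = {"O": "0", "I": "1", "l": "1"}


def normalise_token(token: str) -> str:
    """Make look-alike characters in a token agree with its majority class.

    If the token has at least as many letters as digits, ambiguous digits
    are rewritten as letters; otherwise ambiguous letters become digits.
    """
    letters = sum(ch.isalpha() for ch in token)
    digits = sum(ch.isdigit() for ch in token)
    if letters >= digits:
        # Token is treated as a word: 0 -> O, 1 -> I.
        return "".join(DIGIT_TO_LETTER.get(ch, ch) for ch in token)
    # Token is treated as a number: O -> 0, I/l -> 1.
    return "".join(LETTER_TO_DIGIT.get(ch, ch) for ch in token)


if __name__ == "__main__":
    for sample in ("O0O0O0O0O0", "OOOO", "0000", "20I0"):
        print(sample, "->", normalise_token(sample))
```

Under these assumed rules the alternating token is treated as a word and every 0 is rewritten as O, while the pure runs OOOO and 0000 pass through unchanged. That matches the behaviour described in the original question, but the real rules inside the engine may well differ.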

