Hi there!

I'm experimenting with Tesseract 3.02 and need accurate recognition of capital letters. Unfortunately, my files are full of all-caps and small-caps text.

When I included such words in the training sample, I got random capitals in the rest of the text. I then tried putting them into a separate font, with the same result. Adding them to the dictionary files helped somewhat, but the letters o, u, v, etc. are still problematic: I get HELLo WoRLD and the like, despite having HELLO WORLD in the dictionary. It's quite similar to this issue: https://code.google.com/p/tesseract-ocr/issues/detail?id=691

What is your experience? How should I train Tesseract for caps? Is it better in later versions? Is there a configuration parameter I can set?

Thanks!
Kazi
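To make the symptom concrete, here is a minimal Python sketch that flags words in OCR output that look like all-caps words with stray lowercase letters (HELLo, WoRLD). It is not part of Tesseract; the function name `suspicious_caps` and the set of "case-ambiguous" letters are assumptions chosen for illustration. Something like this could be used to count such errors when comparing training setups or versions.

```python
import re

# Lowercase letters whose shape closely matches their capital form.
# This set is an assumption for illustration; adjust it to your font.
AMBIGUOUS = set("cosuvwxz")

def suspicious_caps(text):
    """Return words that are mostly uppercase but contain stray
    lowercase letters drawn only from the ambiguous set."""
    hits = []
    for word in re.findall(r"[A-Za-z]+", text):
        lower = [ch for ch in word if ch.islower()]
        upper = [ch for ch in word if ch.isupper()]
        # Mostly capitals, and every lowercase letter is shape-ambiguous.
        if lower and len(upper) > len(lower) and all(ch in AMBIGUOUS for ch in lower):
            hits.append(word)
    return hits

print(suspicious_caps("HELLo WoRLD & friends"))
```

Ordinary words such as "Hello" or "friends" are not flagged, because their capitals do not outnumber their lowercase letters.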

