[tesseract-ocr] All-caps, small-caps

bácsi Kazi Mon, 14 Dec 2015 23:34:28 -0800

Hi there!

I'm experimenting with Tesseract 3.02 and need accurate recognition of 
capital letters. Unfortunately, my files are full of all-caps and small-caps 
text. When I included such words in the training sample, I got random 
capitals in the rest of the text. I then tried putting them into a separate 
font, with the same result. Adding them to the dictionary files helped 
somewhat, but letters such as o, u and v are still problematic: I get output 
like HELLo WoRLD and similar, despite having HELLO WORLD in the dictionary.
The problem looks quite similar to this issue:
https://code.google.com/p/tesseract-ocr/issues/detail?id=691
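
To make the symptom concrete, here is a minimal Python sketch of the kind of 
post-processing clean-up that could paper over it. It is not a Tesseract 
feature and does not fix the recognition itself. The function names are 
hypothetical, and the set of "ambiguous" letters beyond o, u and v is my own 
assumption (letters whose lowercase and uppercase shapes differ mostly in size).

```python
import re

# Letters whose lowercase and uppercase forms look alike apart from size.
# o, u, v come from the observed errors; the rest are an assumption.
AMBIGUOUS = set("cosuvwxz")


def fix_caps(word: str) -> str:
    """Uppercase a word that looks like all caps with case-confused letters.

    A word is rewritten only if it has at least two uppercase letters and
    every lowercase letter in it belongs to AMBIGUOUS.
    """
    lowers = [ch for ch in word if ch.islower()]
    uppers = [ch for ch in word if ch.isupper()]
    if len(uppers) >= 2 and lowers and all(ch in AMBIGUOUS for ch in lowers):
        return word.upper()
    return word


def fix_line(line: str) -> str:
    """Apply fix_caps to every run of letters in a line of OCR output."""
    return re.sub(r"[^\W\d_]+", lambda m: fix_caps(m.group(0)), line)


print(fix_line("HELLo WoRLD & friends"))  # HELLO WORLD & friends
```

Ordinary mixed-case words such as "Hello" or "McDonald" are left alone, but 
a heuristic like this can still misfire on short words, so it is only a 
stopgap.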
What is your experience? How should I train Tesseract for caps? Is this 
handled better in later versions? Is there a configuration parameter I can set?
Thanks!


Kazi

