Hi All,
I am a new tesseract user, so forgive me if my question is naive.

My problem is similar to what is described here 
<https://groups.google.com/forum/#!searchin/tesseract-ocr/convert%7Csort:relevance/tesseract-ocr/1dcA1D8qdZw/omUMk6ajt-8J>.
 
I generate perfect, hi-res images of text using ImageMagick's *convert* 
command line tool and then give the result as input to *tesseract*, but the 
output I get is of very poor quality. Lowercase "w" becomes uppercase, 
uppercase "X" becomes lowercase "h", etc. I've tested a few fonts (including 
OCR-A), tried different color spaces, and configured tesseract to ignore 
language dictionaries, but I can't find settings that assure a seamless 
conversion. However, I haven't tried any training yet.
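
For illustration, a render-then-recognize pipeline of this kind could be sketched roughly as below. This is only a sketch under assumptions, not the exact commands used: the font name, point size, file names and page segmentation mode are hypothetical, `convert` is assumed to be the ImageMagick 6 style binary (ImageMagick 7 calls it `magick`), and `--psm` is the Tesseract 4+ spelling (3.x uses `-psm`). The two `-c` settings are the usual way to switch off the dictionary word lists.

```python
import shutil
import subprocess


def render_command(text, png_path, font="Courier", pointsize=48):
    """Build an ImageMagick command that renders `text` as black on white."""
    return [
        "convert",
        "-background", "white",
        "-fill", "black",
        "-font", font,                 # hypothetical font; must be known to ImageMagick
        "-pointsize", str(pointsize),  # hypothetical size; larger glyphs usually help OCR
        f"label:{text}",               # note: "@" and "%" have special meaning in label:
        png_path,
    ]


def ocr_command(png_path, out_base, psm=7):
    """Build a tesseract command with the dictionary word lists disabled."""
    return [
        "tesseract", png_path, out_base,
        "--psm", str(psm),             # 7 = treat the image as a single text line
        "-c", "load_system_dawg=0",    # do not load the main dictionary
        "-c", "load_freq_dawg=0",      # do not load the frequent-words list
    ]


def round_trip(text, png_path="sample.png", out_base="sample"):
    """Render `text`, OCR it, and return the recognized string.

    Returns None when either tool is not installed.
    """
    if shutil.which("convert") is None or shutil.which("tesseract") is None:
        return None
    subprocess.run(render_command(text, png_path), check=True)
    subprocess.run(ocr_command(png_path, out_base), check=True)
    # tesseract writes its result to <out_base>.txt
    with open(out_base + ".txt", encoding="utf-8") as fh:
        return fh.read().strip()
```

Keeping the command construction separate from execution makes it easy to print the exact command lines and vary one setting at a time (font, size, segmentation mode) when hunting for the cause of a misread.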

What am I missing? Is it about training? In your experience, have you found 
anything that assures no error while keeping the text human readable and 
using a non-copyrighted font?
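
One way to check whether a given font and configuration really is error free is to compare the known input text with the OCR output character by character. The sketch below is an assumption about how that check might be done, using Python's standard `difflib`; the function name `char_mismatches` is hypothetical.

```python
from difflib import SequenceMatcher


def char_mismatches(expected, recognized):
    """Return (expected_fragment, recognized_fragment) pairs where the strings differ."""
    matcher = SequenceMatcher(None, expected, recognized, autojunk=False)
    return [
        (expected[i1:i2], recognized[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # keep replacements, insertions and deletions only
    ]
```

An empty list means the round trip was lossless for that sample; anything else lists the fragments that were misread, such as a "w" recognized as "W".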

Thanks!

Giacecco

