Hi All, I am a new tesseract user, so forgive me if my question is naive. My problem is similar to what is described here <https://groups.google.com/forum/#!searchin/tesseract-ocr/convert%7Csort:relevance/tesseract-ocr/1dcA1D8qdZw/omUMk6ajt-8J>. I generate perfect, hi-res text using ImageMagick's *convert* command line tool and then give the result as input to *tesseract*, but the output I get is of very bad quality: lowercase "w" becomes uppercase, uppercase "X" becomes lowercase "h", etc. I've tested a few fonts (including OCR-A), tried different color spaces, configured tesseract to ignore language dictionaries, etc., but I can't find a setting that ensures an error-free conversion. However, I haven't used any training yet.
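For concreteness, the round trip described above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the font name `DejaVu-Sans-Mono`, the point size, the file name `sample.png`, and the helper names are all hypothetical and are not the settings from the tests above. It also assumes a tesseract build that accepts `-c` options on the command line; older builds may need a config file instead.

```python
import difflib
import subprocess


def render_cmd(text, png_path, font="DejaVu-Sans-Mono", pointsize=48):
    """Build an ImageMagick command that renders `text` as black on white.

    The font name is an assumption; `convert -list font` shows what is
    actually installed on a given machine.
    """
    return ["convert", "-background", "white", "-fill", "black",
            "-font", font, "-pointsize", str(pointsize),
            f"label:{text}", png_path]


def ocr_cmd(png_path):
    """Build a tesseract command that prints the recognized text to stdout
    with the language dictionaries switched off."""
    return ["tesseract", png_path, "stdout",
            "-c", "load_system_dawg=0", "-c", "load_freq_dawg=0"]


def mismatches(expected, actual):
    """Return (expected_fragment, recognized_fragment) pairs that differ."""
    matcher = difflib.SequenceMatcher(None, expected, actual, autojunk=False)
    return [(expected[i1:i2], actual[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


def round_trip(text, png_path="sample.png"):
    """Render `text`, OCR the image, and report where the result differs.

    Requires both `convert` and `tesseract` to be on the PATH.
    """
    subprocess.run(render_cmd(text, png_path), check=True)
    result = subprocess.run(ocr_cmd(png_path), check=True,
                            capture_output=True, text=True)
    return mismatches(text, result.stdout.strip())
```

Calling `round_trip("wXyz")` for each candidate font would list exactly which characters get swapped, which makes it easier to compare fonts and settings than reading the output by eye.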
What am I missing? Is it about training? In your experience, have you found anything that assures no error while keeping the text human readable and using a non-copyrighted font? Thanks! Giacecco

