I am quite new to OCR and to Tesseract.
So far I have a working script that extracts fairly good text from images. My question is whether it is possible to train Tesseract to return only the words/characters that appear in some kind of dictionary file. For example, I have a .txt file with a big list of person names, and I want to teach Tesseract that "SONIA" is not "50NlA" and "YANNICK" is not "VANNlD", etc. If it had a list of (imagine) all the names, would it give better accuracy? Sorry if this is a stupid question. I would like to know the best approach, or any tutorials, if this is possible.

I have read this https://groups.google.com/forum/#!topic/tesseract-ocr/r5qkHxQOT98 and the manual http://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html and created the eng.user-words and bazaar files... What should be the next step?

Thanks so much for your time and patience.
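A complementary option for the name-list idea is to post-correct Tesseract's output against the list. This is a different technique from Tesseract's own user-words mechanism and is not described in the linked thread or manual. It does not change how Tesseract recognizes text; it only snaps each recognized token to the nearest known name. The sketch below is a minimal illustration under stated assumptions. The confusion map (5 to S, 0 to O, 1/l/| to I) is hypothetical, built from the examples in this post. `correct_name` and `load_names` are hypothetical helpers. Fuzzy matching uses the standard library's `difflib.get_close_matches`.

```python
import difflib

# Hypothetical confusion map (an assumption based on the "50NlA"/"VANNlD"
# examples): characters OCR may emit in place of letters in all-caps names.
# Lowercase "l" is mapped to "I" because the names are assumed to be all caps.
CONFUSIONS = str.maketrans({"5": "S", "0": "O", "1": "I", "l": "I", "|": "I"})


def correct_name(ocr_token, names, cutoff=0.6):
    """Snap an OCR token to the closest entry in `names`.

    Returns the original token unchanged if no entry in `names` reaches
    `cutoff` similarity (difflib's SequenceMatcher ratio).
    """
    # Undo likely digit/letter confusions first, then compare in upper case.
    normalized = ocr_token.translate(CONFUSIONS).upper()
    matches = difflib.get_close_matches(normalized, names, n=1, cutoff=cutoff)
    return matches[0] if matches else ocr_token


def load_names(path):
    """Read one name per line from a .txt file, upper-cased, skipping blanks."""
    with open(path, encoding="utf-8") as f:
        return [line.strip().upper() for line in f if line.strip()]


if __name__ == "__main__":
    names = ["SONIA", "YANNICK", "MARIA"]  # stand-in for the real name list
    for token in ["50NlA", "VANNlD", "12345"]:
        print(token, "->", correct_name(token, names))
```

With this small list, "50NlA" becomes "SONIA" and "VANNlD" becomes "YANNICK". "12345" is left unchanged because it is not close enough to any name. The `cutoff` is a trade-off: set it too low and similar names can be swapped for each other, set it too high and genuine misreads are not corrected.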

