[tesseract-ocr] Python tesseract ocr training to a specific list of words

Inês Martins Fri, 12 Jun 2015 06:11:53 -0700

 
  
I am quite new to OCR and to Tesseract.


So far I have a working script that extracts text from images quite 
well.


My question is whether it is possible to train Tesseract to return only 
words/chars present in some kind of dictionary file.


For example, I have a .txt file with a big list of person names, and I want 
to teach Tesseract that "SONIA" is not "50NlA" and "YANNICK" is not "VANNlD", 
etc.


If it had a list of, say, all the names, would it be able to give better 
accuracy? Sorry if this is a stupid question. I would like to know the best 
approach, or any tutorials, if this is possible.
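
To make the goal concrete, here is a minimal sketch of one possible 
workaround: correcting the OCR output against the name list after 
recognition, instead of training Tesseract. This is not a Tesseract feature; 
it uses Python's standard difflib module. The character mapping, the helper 
names load_names and correct_word, and the 0.6 cutoff are assumptions chosen 
for illustration, and the sketch assumes the names are upper-case.

```python
import difflib

# Assumed mapping of characters that OCR tends to confuse in upper-case text.
CONFUSABLES = str.maketrans({"5": "S", "0": "O", "1": "I", "l": "I", "|": "I"})


def load_names(path):
    """Read one name per line from a text file (hypothetical file layout)."""
    with open(path, encoding="utf-8") as fh:
        return [line.strip().upper() for line in fh if line.strip()]


def correct_word(word, names, cutoff=0.6):
    """Return the closest known name, or the word unchanged if none is close."""
    # Translate before upper-casing so a lower-case "l" becomes "I", not "L".
    candidate = word.translate(CONFUSABLES).upper()
    if candidate in names:
        return candidate
    matches = difflib.get_close_matches(candidate, names, n=1, cutoff=cutoff)
    return matches[0] if matches else word


names = ["SONIA", "YANNICK"]  # in practice: load_names("names.txt")
print(correct_word("50NlA", names))
print(correct_word("VANNlD", names))
```

A lower cutoff accepts more distant matches at the risk of mapping unrelated 
words onto names, so it would need tuning against real output.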


I have read this thread 
https://groups.google.com/forum/#!topic/tesseract-ocr/r5qkHxQOT98 and the 
manual http://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html 
and created the eng.user-words and bazaar files. What should the next step 
be?
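
For reference, this is roughly how I understand the files would be used from 
Python: a sketch that builds the command line with the config file name as 
the last argument, as described in the manual. It assumes eng.user-words is 
in the tessdata directory and bazaar is in tessdata/configs; the image and 
output names are placeholders, and build_tesseract_command is a hypothetical 
helper, not part of any library.

```python
import subprocess


def build_tesseract_command(image, outbase, lang="eng", config="bazaar"):
    """Build the argument list: tesseract image outbase -l lang configfile."""
    return ["tesseract", image, outbase, "-l", lang, config]


cmd = build_tesseract_command("names.png", "out")  # placeholder file names
print(" ".join(cmd))

# To actually run it (requires tesseract on PATH), uncomment:
# subprocess.run(cmd, check=True)
```

Whether the user-words list alone is enough to improve the results is 
exactly what I am unsure about.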


Thanks so much for your time and patience.

