I'm very new to OCR and image processing in general, so please excuse me if this question is a FAQ - I haven't been able to track down any recommendations yet.
I'm looking to identify words in images, where the words to be recognized come from a limited pool of known words (~5000 words). The words will all be in very similar fonts, but the images will generally be of poor quality. What would be the recommended approach?

1) Use tesseract as-is and post-process its output to work out the intended words (using Levenshtein, Jaro-Winkler, or a similar string distance)
2) Train tesseract with the known set of words
3) Something else?
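To make option 1 concrete, here is a minimal sketch of the kind of post-processing I have in mind: take whatever string tesseract returns for a word image and snap it to the nearest entry in the known word list by Levenshtein distance. The names `closest_word` and `max_distance`, the threshold of 2, and the sample words are all my own assumptions for illustration, and the OCR step itself is left out.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character insertions, deletions, substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # skip a character of a
                current[j - 1] + 1,      # skip a character of b
                previous[j - 1] + cost,  # match or substitute
            ))
        previous = current
    return previous[-1]


def closest_word(ocr_text, lexicon, max_distance=2):
    """Return the lexicon word nearest to ocr_text, or None if none is within max_distance.

    max_distance is a hypothetical cut-off; it would need tuning on real OCR output.
    """
    token = ocr_text.strip().lower()
    best_word, best_dist = None, max_distance + 1
    for word in lexicon:
        dist = levenshtein(token, word.lower())
        if dist < best_dist:
            best_word, best_dist = word, dist
    return best_word


if __name__ == "__main__":
    # Made-up lexicon and OCR strings, standing in for the ~5000 known words.
    known_words = ["invoice", "receipt", "total"]
    for raw in ["inv0ice", "rece1pt\n", "zzzzzz"]:
        print(repr(raw), "->", closest_word(raw, known_words))
```

A linear scan like this should be fine for a pool of about 5000 words; the part I'm unsure about is whether the raw output on poor-quality images will be close enough to the true word for any distance threshold to work.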

