I have this image I want to turn into text:
<https://lh3.googleusercontent.com/-CQevnMSjYeM/WtqJNMUuI1I/AAAAAAAAAGY/_0vwKc52EMoAKeDcuyGrgWIPqb22raMfACLcBGAs/s1600/names.png>

To clean it up, I used Fred's textcleaner script (http://www.fmwconcepts.com/imagemagick/textcleaner/index.php) and ran

    ./textcleaner -i 2 names.png result.png

on the image. The result is now:
<https://lh3.googleusercontent.com/-et8RIpYuVb8/WtqJxA3eEsI/AAAAAAAAAGg/I4TXRy4AzaIB2QVntxU28XUV3ZFBbGiEQCLcBGAs/s1600/result.png>

It looks a lot cleaner, so now I use tesseract to turn it into text:

    tesseract result.png stdout -psm 7 -l eng --user-words /path/to/eng.user-words --user-patterns /path/to/eng.user-patterns

with the following files.

eng.user-words:

    BLAZIKEN
    RAPIDASH
    VICTREEBEL
    SHARPEDO
    PORYGON-Z
    AZELF

eng.user-patterns:

    -M

and /path/to/configs/bazaar:

    load_system_dawg F
    load_freq_dawg F
    user_words_suffix user-words
    user_patterns_suffix user-patterns

Yet my output is:

    Bl*H*ZIKEN-M
    R*H*PID*H*SH-M
    V*lE*TREEBEl-M
    SH*H*RPE*IIIJ*-M
    P*U*RY*Efl*N-Z-M
    *H*ZELF-M

Since case isn't an issue for me, the only problems are "A" showing up as "H", "C" showing up as "LE", "DO" showing up as "IIIJ", and "GO" showing up as "Efl" (with "fl" being one character). I'm not sure how to make the image any clearer, if that is even possible, or whether I'm doing something wrong with tesseract. Any help is appreciated.
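
Because only the six names in eng.user-words are expected, one possible stopgap is to snap each OCR line to the closest known name after the fact. The sketch below uses Python's standard difflib for that; it is not a Tesseract feature and does not fix the recognition itself. The helper name snap_to_known_name and the 0.5 cutoff are assumptions chosen for these six sample lines only, and the sample strings assume the asterisks in the output above are mail emphasis markers rather than OCR characters.

```python
import difflib

# Known names, copied from eng.user-words in the post.
KNOWN_NAMES = ["BLAZIKEN", "RAPIDASH", "VICTREEBEL", "SHARPEDO", "PORYGON-Z", "AZELF"]

# Suffix taken from the user-patterns file; every OCR line in the post ends with it.
SUFFIX = "-M"


def snap_to_known_name(ocr_line, names=KNOWN_NAMES, cutoff=0.5):
    """Return the known name most similar to ocr_line, or None if none is close.

    Hypothetical helper. cutoff=0.5 is an assumption tuned to the sample
    lines; difflib's default of 0.6 would reject the SHARPEDO line.
    """
    text = ocr_line.strip().upper()  # case does not matter here
    if text.endswith(SUFFIX):
        text = text[: -len(SUFFIX)]  # compare the name part only
    matches = difflib.get_close_matches(text, names, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# OCR output from the post, with the emphasis asterisks removed and the
# single "fl" ligature typed as two ASCII letters (both are assumptions).
ocr_lines = [
    "BlHZIKEN-M",
    "RHPIDHSH-M",
    "VlETREEBEl-M",
    "SHHRPEIIIJ-M",
    "PURYEflN-Z-M",
    "HZELF-M",
]

for line in ocr_lines:
    print(line, "->", snap_to_known_name(line))
```

This only works while the set of expected names is small and the names are clearly distinct from one another; a lower cutoff accepts noisier lines but also raises the risk of snapping unrelated text to a name.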