I have this image I want to turn into text:

<https://lh3.googleusercontent.com/-CQevnMSjYeM/WtqJNMUuI1I/AAAAAAAAAGY/_0vwKc52EMoAKeDcuyGrgWIPqb22raMfACLcBGAs/s1600/names.png>
To clean it up, I used Fred's textcleaner script
(http://www.fmwconcepts.com/imagemagick/textcleaner/index.php) and ran

./textcleaner -i 2 names.png result.png

on the image. The result is now:

<https://lh3.googleusercontent.com/-et8RIpYuVb8/WtqJxA3eEsI/AAAAAAAAAGg/I4TXRy4AzaIB2QVntxU28XUV3ZFBbGiEQCLcBGAs/s1600/result.png>
It looks a lot cleaner, so now I use tesseract to turn it into text:

tesseract result.png stdout -psm 7 -l eng --user-words /path/to/eng.user-words --user-patterns /path/to/eng.user-patterns


with the following files. eng.user-words:

BLAZIKEN
RAPIDASH
VICTREEBEL
SHARPEDO
PORYGON-Z
AZELF


eng.user-patterns:

-M

 
and /path/to/configs/bazaar:

load_system_dawg     F
load_freq_dawg       F
user_words_suffix    user-words
user_patterns_suffix user-patterns


Yet my output is:

Bl*H*ZIKEN-M R*H*PID*H*SH-M V*lE*TREEBEl-M SH*H*RPE*IIIJ*-M P*U*RY*Efl*N-Z-M *H*ZELF-M


Since case isn't an issue for me, the only problems are "A" showing up as 
"H", "C" showing up as "LE", "DO" showing up as "IIIJ", "O" showing up as 
"U" in PORYGON, and "GO" showing up as "Efl" (with "fl" being one character).
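
As a stopgap while the recognition itself is still wrong, the garbled tokens could be snapped to the closest entry in eng.user-words, since the set of expected names is known. The following is only a minimal Python sketch of that idea using the standard library's difflib. It assumes the asterisks in the output above are just emphasis markers and not part of what tesseract printed, that every token ends in "-M", and that a similarity cutoff of 0.5 is acceptable. The function name snap_to_known is hypothetical and not part of tesseract.

```python
import difflib

# Known names, copied from the eng.user-words file shown above.
KNOWN_NAMES = ["BLAZIKEN", "RAPIDASH", "VICTREEBEL", "SHARPEDO", "PORYGON-Z", "AZELF"]


def snap_to_known(token, names=KNOWN_NAMES, suffix="-M", cutoff=0.5):
    """Return the closest known name for one OCR token, or None if nothing is close.

    Hypothetical helper: post-processes tesseract output, does not change OCR itself.
    """
    word = token.upper()  # case does not matter here
    if word.endswith(suffix):
        word = word[:-len(suffix)]  # drop the trailing "-M" before comparing
    matches = difflib.get_close_matches(word, names, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# The OCR output from above, with the emphasis asterisks removed (assumption).
ocr_output = "BlHZIKEN-M RHPIDHSH-M VlETREEBEl-M SHHRPEIIIJ-M PURYEflN-Z-M HZELF-M"
corrected = [snap_to_known(tok) for tok in ocr_output.split()]
print(corrected)
```

The cutoff is set below difflib's default of 0.6 because "SHHRPEIIIJ" is too mangled to match "SHARPEDO" at the default. This only works when every expected word is known in advance, and it does not fix the underlying misreads.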

I'm not sure whether the image can be made any clearer, or whether I'm 
doing something wrong with tesseract. Any help is appreciated. 
