I'm using:

tesseract 3.04.01
 leptonica-1.73
  libgif 5.1.2 : libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.4 : libopenjp2 2.1.0

On Saturday, April 21, 2018 at 2:48:15 AM UTC-6, shree wrote:
>
> BLAZIKEN-M RAPIDASH-M VICTREEBEL-M SHRRPEDO-M PORYGON-I-M RAZELF-M
>
> with
>
> tesseract -v
> tesseract 4.0.0-beta.1-133-g5435c
>  leptonica-1.76.0
>   libjpeg 8d (libjpeg-turbo 1.3.0) : libpng 1.2.50 : libtiff 4.0.3 : zlib 1.2.8 : libopenjp2 2.3.0
>  Found AVX
>  Found SSE
>
> tesseract names.png - --tessdata-dir ./tessdata_best
> Warning. Invalid resolution 0 dpi. Using 70 instead.
> Estimating resolution as 547
> BLAZIKEN-M RAPIDASH-M VICTREEBEL-M SHRRPEDO-M PORYGON-I-M RAZELF-M
>
> Which version of tesseract are you using?
>
> ShreeDevi
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> On Sat, Apr 21, 2018 at 6:32 AM, 'DR' via tesseract-ocr <tesser...@googlegroups.com> wrote:
>
>> I have this image I want to turn into text:
>>
>> <https://lh3.googleusercontent.com/-CQevnMSjYeM/WtqJNMUuI1I/AAAAAAAAAGY/_0vwKc52EMoAKeDcuyGrgWIPqb22raMfACLcBGAs/s1600/names.png>
>>
>> To clean it up, I've used Fred's textcleaner script (http://www.fmwconcepts.com/imagemagick/textcleaner/index.php) and ran
>>
>>> ./textcleaner -i 2 names.png result.png
>>
>> on the image, the result is now:
>>
>> <https://lh3.googleusercontent.com/-et8RIpYuVb8/WtqJxA3eEsI/AAAAAAAAAGg/I4TXRy4AzaIB2QVntxU28XUV3ZFBbGiEQCLcBGAs/s1600/result.png>
>>
>> It looks a lot cleaner, so now I use tesseract to turn it into text:
>>
>>> tesseract result.png stdout -psm 7 -l eng --user-words /path/to/eng.user-words --user-patterns /path/to/eng.user-patterns
>>
>> with the following files, eng.user-words:
>>
>>> BLAZIKEN
>>> RAPIDASH
>>> VICTREEBEL
>>> SHARPEDO
>>> PORYGON-Z
>>> AZELF
>>
>> eng.user-pattern:
>>
>>> -M
>>
>> & /path/to/configs/bazaar:
>>
>>> load_system_dawg F
>>> load_freq_dawg F
>>> user_words_suffix user-words
>>> user_patterns_suffix user-patterns
>>
>> Yet my output is:
>>
>>> Bl*H*ZIKEN-M R*H*PID*H*SH-M V*lE*TREEBEl-M SH*H*RPE*IIIJ*-M P*U*RY*Efl*N-Z-M *H*ZELF-M
>>
>> Since case isn't an issue for me, the only problems are "A" showing up as "H", "C" showing up as "LE", "DO" showing up as "IIIJ", and "GO" showing up as "Efl" (with "fl" being one character).
>>
>> I'm not sure how to make the image any clearer if possible or if I'm doing something wrong with tesseract. Any help is appreciated.
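
Since the 4.0.0-beta output quoted above is only off by a character or so in three of the names, one possible workaround is to snap each OCR token to the closest entry in the eng.user-words list after tesseract has run. The following is a rough sketch of that idea, not something tesseract does itself: it assumes Python's standard difflib module, and the helper name snap_to_known and the 0.6 cutoff are made up for illustration.

```python
import difflib

# Known names, copied from the eng.user-words list quoted above.
KNOWN_WORDS = ["BLAZIKEN", "RAPIDASH", "VICTREEBEL", "SHARPEDO", "PORYGON-Z", "AZELF"]
SUFFIX = "-M"  # every name in the image ends with this marker


def snap_to_known(token, words=KNOWN_WORDS, cutoff=0.6):
    """Return the closest known word plus suffix, or the token unchanged.

    Hypothetical helper: cutoff is an arbitrary similarity threshold (0..1).
    """
    # Strip the trailing "-M" so only the name itself is compared.
    stem = token[:-len(SUFFIX)] if token.endswith(SUFFIX) else token
    matches = difflib.get_close_matches(stem.upper(), words, n=1, cutoff=cutoff)
    return matches[0] + SUFFIX if matches else token


# OCR result reported for tesseract 4.0.0-beta.1 in the quoted reply.
ocr_line = "BLAZIKEN-M RAPIDASH-M VICTREEBEL-M SHRRPEDO-M PORYGON-I-M RAZELF-M"
print(" ".join(snap_to_known(t) for t in ocr_line.split()))
# prints: BLAZIKEN-M RAPIDASH-M VICTREEBEL-M SHARPEDO-M PORYGON-Z-M AZELF-M
```

Tokens that are too badly garbled to clear the cutoff are returned unchanged, so this only helps once the raw OCR is already close to the expected names.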