Re: [tesseract-ocr] Unsure why tesseract isn't returning the correct text

2018-04-22 Thread ShreeDevi Kumar

Yes, please use the latest code from the GitHub master branch for building.
That way you will have all the bug fixes and updates.
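A minimal sketch of such a build, assuming the autotools route (`autogen.sh`, `configure`, `make`) and that Leptonica, a C++ compiler and the autotools are already installed. The `run` helper and the `DRY_RUN`/`SRC_DIR` variables are hypothetical conveniences; by default the script only prints each command. The version string ShreeDevi reports, 4.0.0-beta.1-133-g5435c, has the shape of `git describe` output (133 commits after the 4.0.0-beta.1 tag), which suggests a build from a branch checkout rather than from the tag itself.

```shell
#!/usr/bin/env bash
# Sketch: build Tesseract from the repository's default branch with autotools.
# Assumes git, a C++ compiler, autoconf/automake/libtool, pkg-config and
# Leptonica are installed. DRY_RUN=1 (default) only prints each command;
# run with DRY_RUN=0 to execute them.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"
SRC_DIR="${SRC_DIR:-tesseract}"   # hypothetical checkout directory

# Print the command in dry-run mode, otherwise execute it in this shell
# (so "run cd" really changes the working directory).
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

run git clone https://github.com/tesseract-ocr/tesseract.git "$SRC_DIR"
run cd "$SRC_DIR"
# Optional: build the tagged beta instead of the default branch.
# run git checkout 4.0.0-beta.1
run ./autogen.sh
run ./configure
run make
run sudo make install
run sudo ldconfig        # refresh the shared-library cache (Linux)
run tesseract -v         # confirm which version is now on PATH
```

Run it with `DRY_RUN=0` to perform the build; uncommenting the `git checkout` line switches to the tag DR mentions.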

On Sun 22 Apr, 2018, 2:42 AM 'DR' via tesseract-ocr, <
tesseract-ocr@googlegroups.com> wrote:

> I double checked, there seems to be a 4.0.0-beta.1 tag. I assume you
> installed that using git?
>
>
> On Saturday, April 21, 2018 at 2:40:20 PM UTC-6, zdenop wrote:
>>
>> Really? Did you check it before writing to the forum?
>>
>> Zdenko

Re: [tesseract-ocr] Unsure why tesseract isn't returning the correct text

2018-04-21 Thread 'DR' via tesseract-ocr

Where can I find tesseract 4 beta? The github repo goes up to 4 alpha.

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to tesseract-ocr@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.

Re: [tesseract-ocr] Unsure why tesseract isn't returning the correct text

2018-04-21 Thread Zdenko Podobny

Time for upgrade?

Zdenko

Re: [tesseract-ocr] Unsure why tesseract isn't returning the correct text

2018-04-21 Thread 'DR' via tesseract-ocr

I'm using:

tesseract 3.04.01
 leptonica-1.73
  libgif 5.1.2 : libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.4 : libopenjp2 2.1.0
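Whether an upgrade is needed comes down to the major version. A sketch, assuming bash and that the first line of `tesseract -v` looks like the two outputs quoted in this thread; `tess_major` is a hypothetical helper. Both output streams are read because older builds may print the version on stderr.

```shell
#!/usr/bin/env bash
# Sketch: report whether the installed tesseract predates the 4.x series.
# tess_major is a hypothetical helper; it expects a line such as
# "tesseract 3.04.01" or "tesseract 4.0.0-beta.1-133-g5435c".

# Print the major version number taken from one `tesseract -v` line.
tess_major() {
  local ver="${1#tesseract }"   # drop the leading program name
  ver="${ver#v}"                # tolerate a "v" prefix, e.g. "v5.0.0"
  printf '%s\n' "${ver%%.*}"    # keep everything before the first dot
}

if command -v tesseract >/dev/null 2>&1; then
  # Merge stderr into stdout, then keep only the first line.
  first_line="$( { tesseract -v 2>&1 || true; } | sed -n '1p')"
  major="$(tess_major "$first_line")"
  if [[ "$major" =~ ^[0-9]+$ ]] && (( major < 4 )); then
    echo "Found $first_line: older than 4.x, consider upgrading."
  else
    echo "Found $first_line"
  fi
else
  echo "tesseract is not on PATH"
fi
```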



Re: [tesseract-ocr] Unsure why tesseract isn't returning the correct text

2018-04-21 Thread ShreeDevi Kumar

BLAZIKEN-M RAPIDASH-M VICTREEBEL-M SHRRPEDO-M PORYGON-I-M  RAZELF-M

with

 tesseract -v
tesseract 4.0.0-beta.1-133-g5435c
 leptonica-1.76.0
  libjpeg 8d (libjpeg-turbo 1.3.0) : libpng 1.2.50 : libtiff 4.0.3 : zlib 1.2.8 : libopenjp2 2.3.0
 Found AVX
 Found SSE

tesseract names.png - --tessdata-dir ./tessdata_best
Warning. Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 547
BLAZIKEN-M RAPIDASH-M VICTREEBEL-M SHRRPEDO-M PORYGON-I-M  RAZELF-M
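To repeat this run locally, `./tessdata_best` has to contain `eng.traineddata` from the tesseract-ocr/tessdata_best repository. A sketch, assuming curl is installed and that the model sits at the top level of that repository's `main` branch (the branch name is an assumption and may need adjusting). `run`, `DRY_RUN`, `MODEL_DIR` and `MODEL_URL` are hypothetical names, and nothing is executed unless `DRY_RUN=0`.

```shell
#!/usr/bin/env bash
# Sketch: fetch the "best" English model and repeat the 4.x run from this thread.
# Assumptions: curl is installed, names.png is the poster's image, and the
# model URL below (branch name "main") is correct. DRY_RUN=1 (default) only prints.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"
MODEL_DIR="./tessdata_best"
MODEL_URL="https://github.com/tesseract-ocr/tessdata_best/raw/main/eng.traineddata"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

run mkdir -p "$MODEL_DIR"
# -L follows GitHub's redirect to the raw file; -f fails on HTTP errors.
run curl -fL -o "$MODEL_DIR/eng.traineddata" "$MODEL_URL"
# Same invocation as in the reply: "-" as the output base writes text to stdout.
run tesseract names.png - --tessdata-dir "$MODEL_DIR"
```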


Which version of tesseract are you using?

ShreeDevi

Bhajan - Kirtan - Aarti @ http://bhajans.ramparivar.com

[tesseract-ocr] Unsure why tesseract isn't returning the correct text

2018-04-21 Thread 'DR' via tesseract-ocr

I have this image I want to turn into text:


To clean it up, I've used Fred's textcleaner script 
(http://www.fmwconcepts.com/imagemagick/textcleaner/index.php) and ran  

./textcleaner -i 2 names.png result.png

on the image, the result is now:


It looks a lot cleaner, so now I use tesseract to turn it into text:

tesseract result.png stdout -psm 7 -l eng --user-words /path/to/eng.user-words --user-patterns /path/to/eng.user-patterns


with the following files, eng.user-words:

BLAZIKEN
RAPIDASH
VICTREEBEL
SHARPEDO
PORYGON-Z
AZELF


eng.user-patterns:

-M

& /path/to/configs/bazaar:

load_system_dawg F
load_freq_dawg   F
user_words_suffix user-words
user_patterns_suffix user-patterns
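A sketch that writes the three files as listed and assembles the 3.04-style command. Assumptions not stated in the post: the files go into a scratch directory (`WORKDIR`, a fresh temporary directory by default), and the bazaar config is appended by path as a trailing config argument, because a config file such as bazaar takes effect when it is named on the command line and the command quoted above does not name it. Given a bare name such as `bazaar`, Tesseract looks first under the tessdata `configs` directory. `DRY_RUN` is a hypothetical switch; by default the command is printed, not run.

```shell
#!/usr/bin/env bash
# Sketch: write the user-words, user-patterns and bazaar files from the post,
# then assemble the Tesseract 3.04 command line. WORKDIR is a hypothetical
# scratch directory; DRY_RUN=1 (default) only prints the command.
set -euo pipefail

WORKDIR="${WORKDIR:-$(mktemp -d)}"
DRY_RUN="${DRY_RUN:-1}"
mkdir -p "$WORKDIR/configs"

# One word per line.
cat > "$WORKDIR/eng.user-words" <<'EOF'
BLAZIKEN
RAPIDASH
VICTREEBEL
SHARPEDO
PORYGON-Z
AZELF
EOF

# The single pattern from the post.
cat > "$WORKDIR/eng.user-patterns" <<'EOF'
-M
EOF

# Each line is "parameter value", separated by whitespace.
cat > "$WORKDIR/configs/bazaar" <<'EOF'
load_system_dawg     F
load_freq_dawg       F
user_words_suffix    user-words
user_patterns_suffix user-patterns
EOF

# 3.0x spells the page segmentation option -psm; 4.x documents --psm.
# Mode 7 treats the image as a single text line.
cmd=(tesseract result.png stdout -psm 7 -l eng
     --user-words "$WORKDIR/eng.user-words"
     --user-patterns "$WORKDIR/eng.user-patterns"
     "$WORKDIR/configs/bazaar")   # config files go after all options

if [ "$DRY_RUN" = "1" ]; then
  printf '%q ' "${cmd[@]}"; echo
else
  "${cmd[@]}"
fi
```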


Yet my output is:

Bl*H*ZIKEN-M R*H*PID*H*SH-M V*lE*TREEBEl-M SH*H*RPE*IIIJ*-M P*U*RY*Eﬂ*N-Z-M *H*ZELF-M


Since case isn't an issue for me, the only problems are "A" showing up as 
"H", "C" showing up as "LE", "DO" showing up as "IIIJ", and "GO" showing up 
as "Efl" (with "fl" being one character).
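Tallying the wrong words by hand gets tedious when comparing engines or settings. A sketch of a case-insensitive, word-by-word comparison for bash 4 or later (needed for the `${var^^}` upper-casing). `compare_words` is a hypothetical helper. The expected string is built from the user-words list plus the `-M` suffix, which assumes that is what the image actually contains.

```shell
#!/usr/bin/env bash
# Sketch: list words whose OCR result differs from the expected text,
# ignoring case. compare_words is a hypothetical helper (needs bash >= 4
# for the ${var^^} upper-casing).

# Print "expected<TAB>got" for each word position where the two differ.
compare_words() {
  local -a exp got
  read -r -a exp <<< "$1"
  read -r -a got <<< "$2"
  local n=${#exp[@]} i e g
  if (( ${#got[@]} > n )); then n=${#got[@]}; fi
  for (( i = 0; i < n; i++ )); do
    e="${exp[i]:-}"
    g="${got[i]:-}"
    if [[ "${e^^}" != "${g^^}" ]]; then
      printf '%s\t%s\n' "$e" "$g"
    fi
  done
}

expected="BLAZIKEN-M RAPIDASH-M VICTREEBEL-M SHARPEDO-M PORYGON-Z-M AZELF-M"
# Output of the 4.0.0-beta.1 run quoted in this thread.
ocr="BLAZIKEN-M RAPIDASH-M VICTREEBEL-M SHRRPEDO-M PORYGON-I-M  RAZELF-M"
compare_words "$expected" "$ocr"
```

For these two strings it prints three tab-separated pairs: SHARPEDO-M/SHRRPEDO-M, PORYGON-Z-M/PORYGON-I-M and AZELF-M/RAZELF-M. It compares positions only, so a dropped or merged word shifts every later comparison.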

I'm not sure how to make the image any clearer if possible or if I'm doing 
something wrong with tesseract. Any help is appreciated. 
