[tesseract-ocr] Re: Optimal image resolution (dpi/ppi) for Tesseract 4.0.0 and eng.traineddata?

2018-12-31 Thread Willus Dotkom
So I did some more experimenting and convinced myself that the "xres" and 
"yres" values in the PIX structure passed to Tesseract have virtually no 
impact on the results unless the resolution is so poor that the error 
rate is very high.  Using that information, I re-ran my tests more 
systematically on both Tesseract 4 (with the "TessBest" English training 
data file, 14.7 MiB) and Tesseract 3.05 (with CUBE).  The results below 
show the error rate averaged over all six fonts, and then again excluding 
Bookman-Demi and Helvetica-Narrow, since they're a little out of the 
ordinary.  The error rate is plotted against the height of a capital letter 
in pixels, as before.  A couple of things to note:
1. Tess v4.0.0 does far better at the lower resolutions (fewer pixels in a 
capital letter).
2. Tess v4.0.0 is more consistent across a broader font selection than Tess 
v3.05.  This is very good to see.
3. However, if I exclude Bookman-Demi and Helvetica-Narrow, Tess v3.05 does 
better for the higher resolutions (40-140 pixel heights).  Tess v4.0.0 
definitely has a consistent issue with high-res fonts which should be 
addressed, as I stated in my earlier posts.
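
For readers who think in point size and scan dpi rather than pixels, the 
x-axis used here (capital-letter height in pixels) can be estimated with a 
little arithmetic. This is only a rough sketch, not the method used for the 
tests above: the function name and the cap-height ratio of 0.7 are 
assumptions, and the real ratio varies from font to font.

```python
def cap_height_px(point_size: float, dpi: float, cap_ratio: float = 0.7) -> float:
    """Approximate height of a capital letter in pixels.

    One typographic point is 1/72 inch, so the em size in pixels is
    point_size * dpi / 72. cap_ratio (cap height divided by em size) is an
    assumed typical value, not a measured one; it differs between fonts.
    """
    return point_size * dpi / 72.0 * cap_ratio


if __name__ == "__main__":
    # Print the estimated capital-letter height for 12 pt text at a few
    # common scan resolutions.
    for dpi in (72, 150, 300, 600):
        print(f"12 pt at {dpi} dpi: about {cap_height_px(12, dpi):.0f} px")
```

Under these assumptions, 12 pt text scanned at 300 dpi lands at roughly 35 
pixels per capital letter, just below the 40-140 pixel range discussed in 
point 3.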

6-font average:
[image: tess_accuracy_6fonts.png]

Without Bookman-Demi and Helvetica-Narrow:
[image: tess_accuracy_4fonts.png]





-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to tesseract-ocr@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/baab64da-5125-4525-8691-fd03f7a0759c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [tesseract-ocr] Failing to run on OSX after installation with brew

2018-12-31 Thread Seokbong Choi
You need to install the prerequisite libraries.

https://gist.github.com/fractaledmind/cd2fc4125bef57bcb3e2
Please refer to lines 17-19. Thanks. Happy new year!
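
As a quick diagnostic before reinstalling anything, a small script can pull 
the missing library path out of a dyld message like the one quoted below and 
check whether that file is actually present. This is only a sketch: the 
function name is hypothetical, and the message format is assumed to match the 
"Library not loaded: <path>" line shown in this thread.

```python
import os
import re
from typing import Optional


def missing_library_path(dyld_output: str) -> Optional[str]:
    """Return the path named in a 'Library not loaded:' line, if any."""
    match = re.search(r"Library not loaded:\s*(\S+)", dyld_output)
    return match.group(1) if match else None


if __name__ == "__main__":
    # The error text as reported in this thread.
    message = (
        "dyld: Library not loaded: /usr/local/opt/leptonica/lib/liblept.5.dylib\n"
        "  Referenced from: /usr/local/bin/tesseract\n"
        "  Reason: image not found\n"
    )
    path = missing_library_path(message)
    if path is None:
        print("No missing library reported.")
    elif os.path.exists(path):
        print(f"{path} exists; the problem is probably elsewhere.")
    else:
        print(f"{path} is missing; the library that provides it needs installing.")
```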






On Mon, Dec 31, 2018 at 6:49 AM Bernard Pochet wrote:

> After installing (and reinstalling ...) with brew, I receive this message ...
>
> dyld: Library not loaded: /usr/local/opt/leptonica/lib/liblept.5.dylib
>   Referenced from: /usr/local/bin/tesseract
>   Reason: image not found
> Abort trap: 6
>
> I need help.
>
> Thanks (and happy new year)
>
> B
>



Re: [tesseract-ocr] o recognized as 0 on simple image (no captcha style text)

2018-12-31 Thread Thomas Johanns
I'm using thiagoalessio/tesseract-ocr-for-php, which is limited to 
Tesseract version 3.

On Monday, 31 December 2018 14:30:15 UTC+1, zdenop wrote:
>
> I got this result with tesseract 4.0.0
>  leptonica-1.76.0 (Dec 14 2018, 15:34:47) [MSC v.1916 LIB Release x64]
>   libgif 5.1.4 : libjpeg 9b : libpng 1.6.35 : libtiff 4.0.9 : zlib 1.2.11 
> : libwebp 0.6.1 : libopenjp2 2.3.0
>  Found AVX
>  Found SSE
>
> >tesseract.exe cropped.png -
> Warning: Invalid resolution 0 dpi. Using 70 instead.
> Estimating resolution as 439
> 6716993 - 1 of 1
>
> Which is IMO 100% recognition without doing anything special... 
>
> Zdenko
>
>
> On Mon, 31 Dec 2018 at 12:49, Thomas Johanns wrote:
>
>> Hi,
>>
>> I have a simple image that contains an order number, a page number and 
>> page count.
>> The format expected will always be : 1234567 - 1 of 3
>> I tried with 3 different images, only one gave the correct result, the 
>> other two got "10f2" instead of "1of2".
>>
>> This is weird since the o is lower case and doesn't resemble a zero.
>>
>> What is the best way to handle this?
>> Is there an option I don't know about? Can I specify the font that 
>> tesseract will be analyzing, etc.?
>>
>



Re: [tesseract-ocr] o recognized as 0 on simple image (no captcha style text)

2018-12-31 Thread Zdenko Podobny
I got this result with tesseract 4.0.0
 leptonica-1.76.0 (Dec 14 2018, 15:34:47) [MSC v.1916 LIB Release x64]
  libgif 5.1.4 : libjpeg 9b : libpng 1.6.35 : libtiff 4.0.9 : zlib 1.2.11 :
libwebp 0.6.1 : libopenjp2 2.3.0
 Found AVX
 Found SSE

>tesseract.exe cropped.png -
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 439
6716993 - 1 of 1

Which is IMO 100% recognition without doing anything special...

Zdenko


On Mon, 31 Dec 2018 at 12:49, Thomas Johanns wrote:

> Hi,
>
> I have a simple image that contains an order number, a page number and
> page count.
> The format expected will always be : 1234567 - 1 of 3
> I tried with 3 different images, only one gave the correct result, the
> other two got "10f2" instead of "1of2".
>
> This is weird since the o is lower case and doesn't resemble a zero.
>
> What is the best way to handle this?
> Is there an option I don't know about? Can I specify the font that
> tesseract will be analyzing, etc.?
>



[tesseract-ocr] Re: o recognized as 0 on simple image (no captcha style text)

2018-12-31 Thread Thomas Johanns
I found a cheap solution: in my code, I set the regex to [0o]f, so that it 
matches both "of" and "0f" in Tesseract's output.
I'm still interested in solving the cause of the issue instead of treating 
the symptom.
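
For anyone hitting the same problem, the workaround can be sketched as 
follows. This is an illustration in Python rather than the PHP code actually 
used here; only the [0o]f part comes from the description above, while the 
rest of the pattern, the function name, and the tolerance for dropped spaces 
are assumptions based on the expected "1234567 - 1 of 3" format.

```python
import re
from typing import Optional, Tuple

# Expected shape: "<order number> - <page> of <count>", e.g. "1234567 - 1 of 3".
# [oO0]f accepts a letter o or a digit zero where "of" should be, and \s*
# tolerates output with the spaces dropped, such as "10f2".
PAGE_LINE = re.compile(r"^\s*(\d+)\s*-\s*(\d+)\s*[oO0]f\s*(\d+)\s*$")


def parse_page_line(text: str) -> Optional[Tuple[int, int, int]]:
    """Return (order_number, page, page_count), or None if the text does not fit."""
    match = PAGE_LINE.match(text)
    if match is None:
        return None
    order_number, page, page_count = (int(group) for group in match.groups())
    return order_number, page, page_count


if __name__ == "__main__":
    # A clean line, a line with the o misread as 0, and a line that does not fit.
    for line in ("1234567 - 1 of 3", "1234567 - 10f2", "not a page line"):
        print(repr(line), "->", parse_page_line(line))
```

Because the fixed format is known in advance, validating the whole line this 
way also catches other misreads, not just the o/0 confusion.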


On Monday, 31 December 2018 12:50:00 UTC+1, Thomas Johanns wrote:
>
> Hi,
>
> I have a simple image that contains an order number, a page number and 
> page count.
> The format expected will always be : 1234567 - 1 of 3
> I tried with 3 different images, only one gave the correct result, the 
> other two got "10f2" instead of "1of2".
>
> This is weird since the o is lower case and doesn't resemble a zero.
>
> What is the best way to handle this?
> Is there an option I don't know about? Can I specify the font that 
> tesseract will be analyzing, etc.?
>



[tesseract-ocr] o recognized as 0 on simple image (no captcha style text)

2018-12-31 Thread Thomas Johanns
Hi,

I have a simple image that contains an order number, a page number and page 
count.
The format expected will always be : 1234567 - 1 of 3
I tried with 3 different images, only one gave the correct result, the 
other two got "10f2" instead of "1of2".

This is weird since the o is lower case and doesn't resemble a zero.

What is the best way to handle this?
Is there an option I don't know about? Can I specify the font that 
tesseract will be analyzing, etc.?



[tesseract-ocr] Failing to run on OSX after installation with brew

2018-12-31 Thread Bernard Pochet
After installing (and reinstalling ...) with brew, I receive this message ...

dyld: Library not loaded: /usr/local/opt/leptonica/lib/liblept.5.dylib
  Referenced from: /usr/local/bin/tesseract
  Reason: image not found
Abort trap: 6

I need help.

Thanks (and happy new year)

B
