Re: [tesseract-ocr] Tesseract does not recognise these numbers

2021-06-18 Thread Juanjo Gómez Navarro
Thanks for the hint. A little bit of blur and resizing indeed helped.


-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/3785f285-0c0a-46ac-ba1c-0f7ef76c107en%40googlegroups.com.


Re: [tesseract-ocr] Tesseract does not recognise these numbers

2021-06-18 Thread Zdenko Podobny
With the tessdata models from [1] and the legacy engine (--oem 0), you can get:

tesseract unnamed.png - --psm 7 --oem 0
09:41 Dm

Otherwise:

tesseract unnamed.png - --psm 7
0%:41 pm

With a little preprocessing (blur and resize, so that the letters are around
30 points high), you can get:

tesseract time.png - --psm 7
09:41 pm
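
The preprocessing step above might be sketched roughly as follows. This is only an illustration, not the exact procedure used for time.png: it assumes Pillow is installed, reads "around 30 points" as a letter height of about 30 pixels, and expects you to measure the current letter height yourself. The helper names (scale_factor, tesseract_command, preprocess) and the blur radius are hypothetical choices made for this example.

```python
TARGET_HEIGHT = 30  # assumed target letter height in pixels ("around 30" in the reply)


def scale_factor(letter_height, target=TARGET_HEIGHT):
    """Return the resize factor that brings letters to roughly `target` pixels high."""
    if letter_height <= 0:
        raise ValueError("letter_height must be positive")
    return target / letter_height


def tesseract_command(image_path, psm=7, oem=None):
    """Build the argument list for the CLI calls shown in this thread."""
    cmd = ["tesseract", image_path, "-", "--psm", str(psm)]
    if oem is not None:
        cmd += ["--oem", str(oem)]
    return cmd


def preprocess(src, dst, letter_height, blur_radius=1.0):
    """Resize and lightly blur `src`, saving the result to `dst`.

    `letter_height` is the measured height of the letters in `src`, in pixels.
    `blur_radius` is a guess; tune it for your own images.
    """
    # Imported lazily so the helpers above work without Pillow installed.
    from PIL import Image, ImageFilter

    factor = scale_factor(letter_height)
    with Image.open(src) as img:
        new_size = (
            max(1, round(img.width * factor)),
            max(1, round(img.height * factor)),
        )
        resized = img.convert("L").resize(new_size, Image.LANCZOS)
        resized.filter(ImageFilter.GaussianBlur(blur_radius)).save(dst)
```

For instance, if the letters in unnamed.png are about 12 pixels high, preprocess("unnamed.png", "time.png", 12) writes an image enlarged 2.5 times, and subprocess.run(tesseract_command("time.png"), capture_output=True, text=True) then runs the same command as above.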


[1] https://github.com/tesseract-ocr/tessdata

Zdenko





[tesseract-ocr] Tesseract does not recognise these numbers

2021-06-17 Thread Juanjo Gómez Navarro
I have this simple image with a date:
[image: test.png]
Tesseract produces the output: 

$ tesseract test.png -
Estimating resolution as 233
03:41 pm

In similar images, Tesseract confuses 1s with 7s and vice versa. How can I
help it recognise these characters?

My version of Tesseract is:

$ tesseract -v
tesseract 5.0.0-alpha-20210401-130-g7a308
 leptonica-1.79.0
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 2.0.3) : libpng 1.6.37 : libtiff 4.1.0 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.1
 Found AVX2
 Found AVX
 Found FMA
 Found SSE4.1
 Found OpenMP 201511
