Re: [tesseract-ocr] Why does Tesseract fail on this image?

2020-06-12 Thread Zdenko Podobny
Search the forum/issue tracker: there is an explanation there of why the LSTM
engine cannot provide exact character box coordinates.
If you need exact character boxes, IMO you need to use the legacy engine (but
it could have other problems).
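Switching to the legacy engine from pytesseract is done through the `config` string, which pytesseract forwards to the tesseract CLI; `--oem 0` selects "legacy engine only". A minimal sketch, assuming pytesseract and the tesseract binary are installed and that the language's `.traineddata` actually contains legacy-engine data (the models from tessdata_best and tessdata_fast are LSTM-only, so files from the main tessdata repository would be needed). The helper names `engine_config` and `legacy_char_boxes` are hypothetical.

```python
from typing import Optional

# Engine modes as listed by `tesseract --help-oem`.
OEM_LEGACY_ONLY = 0
OEM_LSTM_ONLY = 1
OEM_LEGACY_AND_LSTM = 2
OEM_DEFAULT = 3


def engine_config(oem: int, extra: Optional[str] = None) -> str:
    """Build the `config` string that pytesseract passes to the tesseract CLI."""
    parts = [f"--oem {oem}"]
    if extra:
        parts.append(extra)  # e.g. a page segmentation mode such as "--psm 7"
    return " ".join(parts)


def legacy_char_boxes(image) -> str:
    """Run image_to_boxes with the legacy (non-LSTM) engine only.

    Assumes pytesseract is installed and legacy data is present in the
    traineddata file for the chosen language.
    """
    import pytesseract  # imported here so engine_config() is usable without it

    return pytesseract.image_to_boxes(image, config=engine_config(OEM_LEGACY_ONLY))
```

As the reply warns, the legacy engine may recognise some text less accurately than LSTM, so it may be worth comparing both outputs on your own images.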

Zdenko


On Fri, 12 June 2020 at 12:31 'Tariq Ahmad' via tesseract-ocr <
tesseract-ocr@googlegroups.com> wrote:

>
> Many thanks for your reply - useful to know.
>
> I now find that pytesseract is returning the wrong coordinates for
> individual characters. For example, for this image (which has a 10-pixel
> border):
>
> image_to_boxes returns:
>
> A: 17 32 10 22
> L: 17 32 24 33
> etc
> etc
>
> I believe these are interpreted as (left, bottom, right, top), and when I
> extract the image for the letter A, I get:
>
>
> However, the same code works correctly for:
>
>
> On Thursday, 11 June 2020 19:30:50 UTC+1, zdenop wrote:
>>
>>
>> https://github.com/tesseract-ocr/tessdoc/blob/master/ImproveQuality.md#missing-borders
>>
>>
>> Zdenko
>>
>>
>> On Wed, 10 June 2020 at 18:50 'Tariq Ahmad' via tesseract-ocr <
>> tesser...@googlegroups.com> wrote:
>>
>>> I cannot understand why Tesseract fails on this (cropped) image:
>>>
>>>
>>> Yet if I add a random white border, it works:
>>>
>>>
>>> Can anyone shed any light please?
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to tesser...@googlegroups.com.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/280cee80-aad1-4245-8346-25d87d447730o%40googlegroups.com
>>> 
>>> .
>>>



Re: [tesseract-ocr] Why does Tesseract fail on this image?

2020-06-12 Thread 'Tariq Ahmad' via tesseract-ocr

Many thanks for your reply - useful to know. 

I now find that pytesseract is returning the wrong coordinates for 
individual characters. For example, for this image (which has a 10-pixel 
border):

image_to_boxes returns:

A: 17 32 10 22
L: 17 32 24 33
etc
etc

I believe these are interpreted as (left, bottom, right, top), and when I 
extract the image for the letter A, I get:


However, the same code works correctly for:

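One frequent cause of character crops that look wrong, separate from the LSTM limitation mentioned in the reply, is the coordinate origin: box output measures y from the bottom edge of the image, while Pillow's `Image.crop` expects `(left, upper, right, lower)` measured from the top. This thread does not confirm that this is the cause here. A minimal sketch, assuming pytesseract's default string output of one `<char> <left> <bottom> <right> <top> <page>` line per symbol; `CharBox`, `parse_boxes` and `to_crop_box` are hypothetical helper names, and the sample numbers in the usage are made up.

```python
from typing import List, NamedTuple, Tuple


class CharBox(NamedTuple):
    char: str
    left: int
    bottom: int
    right: int
    top: int


def parse_boxes(text: str) -> List[CharBox]:
    """Parse image_to_boxes text: '<char> <left> <bottom> <right> <top> <page>' per line."""
    boxes = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # skip empty or malformed lines
        left, bottom, right, top = (int(v) for v in fields[1:5])
        boxes.append(CharBox(fields[0], left, bottom, right, top))
    return boxes


def to_crop_box(box: CharBox, image_height: int) -> Tuple[int, int, int, int]:
    """Convert bottom-left-origin box coordinates into the top-left-origin
    (left, upper, right, lower) tuple used by PIL.Image.crop."""
    return (box.left, image_height - box.top, box.right, image_height - box.bottom)


# Hypothetical usage with Pillow and pytesseract:
# img = Image.open("word.png")
# for b in parse_boxes(pytesseract.image_to_boxes(img)):
#     glyph = img.crop(to_crop_box(b, img.height))
```

For a 100-pixel-tall image, a box with bottom 32 and top 50 becomes rows 50 to 68 when counted from the top.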

On Thursday, 11 June 2020 19:30:50 UTC+1, zdenop wrote:
>
>
> https://github.com/tesseract-ocr/tessdoc/blob/master/ImproveQuality.md#missing-borders
>  
>  
> Zdenko
>
>
> On Wed, 10 June 2020 at 18:50 'Tariq Ahmad' via tesseract-ocr <
> tesser...@googlegroups.com> wrote:
>
>> I cannot understand why Tesseract fails on this (cropped) image:
>>
>>
>> Yet if I add a random white border, it works:
>>
>>
>> Can anyone shed any light please?
>>
>



Re: [tesseract-ocr] Why does Tesseract fail on this image?

2020-06-11 Thread Zdenko Podobny
https://github.com/tesseract-ocr/tessdoc/blob/master/ImproveQuality.md#missing-borders
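The linked section is about text that is cropped with no surrounding margin, which matches the observation that adding a white border makes recognition work. A minimal sketch of adding that border with Pillow before calling pytesseract, assuming Pillow is installed; the helper name `add_white_border` and the default of 10 pixels are illustrative choices, not values taken from the linked page.

```python
from PIL import Image, ImageOps


def add_white_border(image: Image.Image, border: int = 10) -> Image.Image:
    """Return a copy of `image` with `border` pixels of white padding on every side."""
    # fill="white" is resolved for the image's own mode,
    # e.g. 255 for "L" and (255, 255, 255) for "RGB".
    return ImageOps.expand(image, border=border, fill="white")


# Hypothetical usage (requires the tesseract binary and pytesseract):
# import pytesseract
# text = pytesseract.image_to_string(add_white_border(Image.open("cropped.png")))
```

Note that padding shifts every box returned for the padded image by `border` pixels in x and y relative to the original crop.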


Zdenko


On Wed, 10 June 2020 at 18:50 'Tariq Ahmad' via tesseract-ocr <
tesseract-ocr@googlegroups.com> wrote:

> I cannot understand why Tesseract fails on this (cropped) image:
>
>
> Yet if I add a random white border, it works:
>
>
> Can anyone shed any light please?
>



[tesseract-ocr] Why does Tesseract fail on this image?

2020-06-10 Thread 'Tariq Ahmad' via tesseract-ocr
I cannot understand why Tesseract fails on this (cropped) image:


Yet if I add a random white border, it works:


Can anyone shed any light please?
