Hi,
I guess the problem is that:
- the word is short
- it is surrounded by graphic elements (a button).
Tesseract does only a little image preprocessing itself, e.g. converting
the image to a binary (black & white) image, and it tries to ignore
pictures in the provided image. The rest of the preprocessing is the
user's responsibility ;-)
Based on your images, you can remove the button with the following Python
code:
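import cv2
import numpy as np
# Load the image as grayscale.
img = cv2.imread('Images/notDetected.png', cv2.IMREAD_GRAYSCALE)
# Inverse threshold: bright pixels (> 200, i.e. white text and white edges)
# become 0, everything else (the button background) becomes 255.
_, im_th = cv2.threshold(img, 200, 255, cv2.THRESH_BINARY_INV)
im_floodfill = im_th.copy()
# floodFill needs a mask 2 pixels larger than the image in each dimension.
h, w = im_th.shape[:2]
mask = np.zeros((h+2, w+2), np.uint8)
# Fill the region connected to the top-left corner (the white edge in your
# image) with white, leaving black text on a white background.
cv2.floodFill(im_floodfill, mask, (0, 0), 255)
cv2.imwrite('improved.png', im_floodfill)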
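To show what the threshold and flood fill steps do, here is a small sketch
in plain Python on a toy 5x7 "button". The helper names threshold_inv and
flood_fill and the pixel values are made up for illustration. They only
mimic cv2.threshold with THRESH_BINARY_INV and cv2.floodFill with its
default 4-connectivity; they are not OpenCV's implementation.

```python
from collections import deque


def threshold_inv(gray, thresh=200):
    """Mimic THRESH_BINARY_INV: pixels above thresh become 0, the rest 255."""
    return [[0 if px > thresh else 255 for px in row] for row in gray]


def flood_fill(img, seed, new_value):
    """Return a copy with the 4-connected region around seed set to new_value."""
    h, w = len(img), len(img[0])
    r0, c0 = seed
    old = img[r0][c0]
    out = [row[:] for row in img]
    if old == new_value:
        return out
    out[r0][c0] = new_value
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < h and 0 <= nc < w and out[nr][nc] == old:
                out[nr][nc] = new_value
                queue.append((nr, nc))
    return out


# Hypothetical toy image: E = white edge, B = button colour, T = white text.
E, B, T = 255, 90, 250
button = [
    [E, E, E, E, E, E, E],
    [E, B, B, B, B, B, E],
    [E, B, T, B, T, B, E],
    [E, B, B, B, B, B, E],
    [E, E, E, E, E, E, E],
]

# After the inverse threshold, edge and text are 0 and the button is 255.
binary = threshold_inv(button)
# Filling from the top-left corner turns the edge white; the text stays 0.
cleaned = flood_fill(binary, (0, 0), 255)

for row in cleaned:
    print("".join("#" if px == 0 else "." for px in row))
```

Only the two "text" pixels stay black, because the button background
separates them from the region connected to the corner. If the text touched
the white edge, the same fill would erase it as well.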
Zdenko
On Mon, Oct 7, 2024 at 8:47 PM L ht <[email protected]> wrote:
> Hi Zdenko,
> Thanks
> I managed to get the OCR result by cropping out all the white edges from
> the "SIGN IN" image. Interestingly, I didn’t need to crop the white edges
> from the "SIGN IN WITH FACEBOOK" image to achieve a successful result.
> Although both images appear quite similar—white text on a dark background,
> with some white edges—the results differed between them. Could you please
> provide more details about how you handled this, so I can better manage
> similar cases in the future?
>
> On Sun, Oct 6, 2024 at 5:39 AM Zdenko Podobny <[email protected]> wrote:
>
>> Great! So you know what preprocessing steps you need to take to get
>> correct results.
>>
>> >tesseract improved.png - --psm 7
>> SIGNIN
>>
>> Zdenko
>>
>>
>> On Sat, Oct 5, 2024 at 8:32 PM L ht <[email protected]> wrote:
>>
>>> Yes, I did.
>>>
>>> On Sat, Oct 5, 2024 at 11:05 AM Zdenko Podobny <[email protected]> wrote:
>>>
>>>> Did you read the tesseract documentation (before asking the forum)?
>>>>
>>>> Zdenko
>>>>
>>>>
>>>> On Wed, Oct 2, 2024 at 7:35 AM L ht <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I cropped two buttons from a screenshot. The images contain white text
>>>>> on blue or green backgrounds, but I’m getting different results.
>>>>> I'm using Tesseract 5.4.0 on Ubuntu with the following commands:
>>>>> tesseract detected.png -      (returns "SIGNIN WITH FACEBOOK")
>>>>> tesseract undetected.png -    (returns an empty string)
>>>>>
>>>>> Can anyone explain why this is happening and how to detect both?
>>>>> Thanks.
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "tesseract-ocr" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to [email protected].
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/tesseract-ocr/afe1dc25-cebc-43c9-94bc-dcabe6d43c52n%40googlegroups.com
>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/afe1dc25-cebc-43c9-94bc-dcabe6d43c52n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>> .
>>>>>