I suppose this means that the image is always binarized. Is that correct?

Is there any way to avoid it?

Does this binarization also happen by default during training?

I fine-tuned a few models using grayscale images. Do you think the neural
network received binary black/white pixels or the gray ones?


Thanks, bye

Lorenzo

On Wed, Jan 30, 2019 at 13:28 Zdenko Podobny <[email protected]> wrote:

> Try:
>  tesseract image - get.images
> which calls GetThresholdedImage()
> <https://github.com/tesseract-ocr/tesseract/blob/12c1abcb6b4ef90cfafe316a3b40753ee5e9b9ef/src/api/baseapi.cpp#L638>
>
>
> Zdenko
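A minimal sketch of doing the same check from Python, assuming Tesseract 4.x is on PATH. It assumes that the `get.images` config simply sets `tessedit_write_images` (so the sketch passes that parameter directly with `-c`) and that Tesseract then saves the thresholded page as `tessinput.tif` in the working directory. The helper names `build_dump_command` and `dump_thresholded_image` are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def build_dump_command(image_path: str) -> List[str]:
    """Argument list that runs tesseract with image dumping switched on.

    Assumption: Tesseract 4.x, where tessedit_write_images=true (the setting
    the get.images config enables) saves the thresholded page as
    tessinput.tif in the working directory. "-" sends the OCR text to stdout.
    """
    return ["tesseract", image_path, "-", "-c", "tessedit_write_images=true"]


def dump_thresholded_image(image_path: str, workdir: str = ".") -> Path:
    """Run tesseract (assumed to be on PATH) and return the dumped image path."""
    # Resolve first, because the subprocess runs with cwd=workdir.
    cmd = build_dump_command(str(Path(image_path).resolve()))
    subprocess.run(cmd, cwd=workdir, check=True, stdout=subprocess.DEVNULL)
    return Path(workdir) / "tessinput.tif"
```

For example, `dump_thresholded_image("test3.jpg")` should leave a `tessinput.tif` in the current directory. Opening it shows whether the page Tesseract worked on was reduced to pure black and white.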
>
>
> On Wed, Jan 30, 2019 at 11:17 Lorenzo Bolzani <[email protected]> wrote:
>
>>
>> Zdenko, are you 100% sure that the image is binarized before being fed to
>> the neural network? It looks like a big waste of information to me.
>>
>>
>> On Wed, Jan 30, 2019 at 07:56 Zdenko Podobny <[email protected]> wrote:
>>
>>> That is not true: you do not need to convert the image to grayscale. Any
>>> input image that is not already binary is eventually binarized by
>>> Tesseract (using Otsu's method).
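For readers unfamiliar with Otsu's method: it picks the single global threshold that maximizes the between-class variance of the gray-level histogram. Below is a minimal pure-Python sketch for illustration only. It is not Tesseract's own code, which may differ in details such as how color channels are handled. The function names are hypothetical.

```python
from typing import List


def otsu_threshold(hist: List[int]) -> int:
    """Return the gray level t that maximizes between-class variance.

    hist[i] is the number of pixels with gray value i. Pixels <= t form
    one class (black), pixels > t the other (white).
    """
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # pixel count of the class <= t
    sum0 = 0  # sum of gray values in the class <= t
    for t, h in enumerate(hist):
        w0 += h
        sum0 += t * h
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, variance undefined
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Unnormalized between-class variance; same argmax as the normalized one.
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels: List[int], levels: int = 256) -> List[int]:
    """Map a flat list of 8-bit gray pixels to 0/255 using Otsu's threshold."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    t = otsu_threshold(hist)
    return [255 if p > t else 0 for p in pixels]
```

Because one threshold is used for the whole page, this works well on evenly lit scans. It can lose faint strokes when the lighting or background varies.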
>>>
>>> BUT: preprocessing the image (e.g. with custom binarization) will help. See
>>> https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality
>>>
>>> Zdenko
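Here is one example of "custom binarization". It is not taken from the linked wiki page. A local (adaptive) threshold compares each pixel with the mean of its own neighborhood instead of one global value, which tends to cope better with shadows and uneven lighting. The sketch below is plain Python on a list of rows of 8-bit gray values. The function name and the `radius`/`offset` defaults are hypothetical and would need tuning for real images.

```python
from typing import List

Image2D = List[List[int]]  # rows of 8-bit gray values


def adaptive_mean_threshold(img: Image2D, radius: int = 7, offset: int = 10) -> Image2D:
    """Binarize each pixel against the mean of its (2*radius+1)^2 window.

    A pixel becomes black (0) if it is darker than local_mean - offset,
    otherwise white (255). Windows are clipped at the image border.
    """
    h, w = len(img), len(img[0])
    # integral[y][x] = sum of img[0..y-1][0..x-1], so each window sum is O(1).
    integral = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            integral[y + 1][x + 1] = integral[y][x + 1] + row_sum

    out = [[255] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        for x in range(w):
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            window = (integral[y1][x1] - integral[y0][x1]
                      - integral[y1][x0] + integral[y0][x0])
            mean = window / ((y1 - y0) * (x1 - x0))
            if img[y][x] < mean - offset:
                out[y][x] = 0
    return out
```

Since the output is already black and white, Tesseract should not need to binarize it again before recognition.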
>>>
>>>
>>> On Wed, Jan 30, 2019 at 7:36 <[email protected]> wrote:
>>>
>>>> Not a solution, but the image needs to be converted to grayscale etc.
>>>> (using OpenCV), since OCR works best with grayscale images at a
>>>> resolution of 300 dpi.
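The two steps suggested here boil down to simple arithmetic, sketched below in plain Python. Zdenko's reply disputes that the grayscale step is required. The grayscale formula uses the ITU-R 601-2 luma weights that Pillow documents for `Image.convert("L")`. In OpenCV the equivalent call would be `cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)`. The rescale helper assumes you know the scan's current dpi. Both function names are hypothetical.

```python
from typing import Tuple


def to_gray(r: int, g: int, b: int) -> int:
    """ITU-R 601-2 luma with integer weights (299, 587, 114) / 1000."""
    return (r * 299 + g * 587 + b * 114) // 1000


def rescale_to_dpi(width: int, height: int, dpi: float,
                   target_dpi: float = 300) -> Tuple[int, int]:
    """Pixel size that keeps the same physical size at target_dpi."""
    factor = target_dpi / dpi
    return round(width * factor), round(height * factor)
```

For example, an 8.5 x 11 inch page scanned at 100 dpi (850 x 1100 pixels) would be resized to 2550 x 3300 pixels to reach 300 dpi.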
>>>>
>>>> On Tuesday, June 12, 2018 at 12:44:21 PM UTC+5:30, Vidur Malhotra wrote:
>>>>>
>>>>> I tried running Tesseract on the attached image, but I am not getting
>>>>> the desired output. My sample code:
>>>>>
>>>>>
>>>>> from PIL import Image
>>>>> import pytesseract
>>>>>
>>>>> text = pytesseract.image_to_string(Image.open('test3.jpg'), lang='eng')
>>>>> print(text)
>>>>>
>>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> To post to this group, send email to [email protected].
>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/tesseract-ocr/43a5a754-4227-43b6-aec1-0261403b2029%40googlegroups.com
>>>> <https://groups.google.com/d/msgid/tesseract-ocr/43a5a754-4227-43b6-aec1-0261403b2029%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
