I suppose this means that the image is always binarized. Is this correct? Is there any way to avoid it?
Does this binarization happen by default during training too? I fine-tuned a few models using grayscale images. Do you think the neural network received binary black/white pixels or the gray ones?

Thanks, bye

Lorenzo

On Wed, 30 Jan 2019 at 13:28, Zdenko Podobny <[email protected]> wrote:

> try:
> tesseract image - get.image
> which calls GetThresholdedImage()
> <https://github.com/tesseract-ocr/tesseract/blob/12c1abcb6b4ef90cfafe316a3b40753ee5e9b9ef/src/api/baseapi.cpp#L638>
>
> Zdenko
>
> On Wed, 30 Jan 2019 at 11:17, Lorenzo Bolzani <[email protected]> wrote:
>
>> Zdenko, are you 100% sure that the image is binarized before being fed to
>> the neural network? It looks like a big waste of information to me.
>>
>> On Wed, 30 Jan 2019 at 07:56, Zdenko Podobny <[email protected]> wrote:
>>
>>> That is not true: you do not need to transform the image to grayscale. Any
>>> image is binarized in the end (if the input image is not already binarized)
>>> by tesseract (Otsu).
>>>
>>> BUT: preprocessing the image (e.g. custom binarization) will help. See
>>> https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality
>>>
>>> Zdenko
>>>
>>> On Wed, 30 Jan 2019 at 7:36, <[email protected]> wrote:
>>>
>>>> Not a solution, but the image needs to be transformed into
>>>> grayscale etc. (using OpenCV), since OCR works best with gray images and
>>>> images with a resolution of 300 dpi
>>>>
>>>> On Tuesday, June 12, 2018 at 12:44:21 PM UTC+5:30, Vidur Malhotra wrote:
>>>>>
>>>>> I tried running the tesseract on the attached image. But not getting
>>>>> the desired output. My sample code:
>>>>>
>>>>>
>>>>> import PIL
>>>>> from PIL import Image
>>>>> import pytesseract
>>>>>
>>>>> text = pytesseract.image_to_string(Image.open('test3.jpg'), lang='eng')
>>>>> print(text)
>>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> To post to this group, send email to [email protected].
>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/tesseract-ocr/43a5a754-4227-43b6-aec1-0261403b2029%40googlegroups.com
>>>> <https://groups.google.com/d/msgid/tesseract-ocr/43a5a754-4227-43b6-aec1-0261403b2029%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>> For more options, visit https://groups.google.com/d/optout.
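
For readers following the thread: the "Otsu" binarization mentioned in the quoted reply picks one global gray level that best separates dark from light pixels, then maps every pixel to black or white. The sketch below is a plain, dependency-free illustration of that general technique on a flat list of 8-bit gray values. It is not Tesseract's actual thresholding code, and the function names `otsu_threshold` and `binarize` are made up for this example.

```python
def otsu_threshold(pixels):
    """Return a global Otsu threshold for an iterable of 8-bit gray values.

    Illustrative only: Tesseract's internal thresholder may differ.
    """
    # Histogram of gray levels 0..255.
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_between = 0, -1.0
    w_bg = 0    # number of pixels with value <= t (background class)
    sum_bg = 0  # sum of gray values in the background class
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue          # no pixels in the background class yet
        w_fg = total - w_bg
        if w_fg == 0:
            break             # every pixel is already in the background class
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance (up to a constant factor); Otsu maximizes it.
        between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between > best_between:
            best_between, best_t = between, t
    return best_t


def binarize(pixels, threshold):
    """Map each gray value to 0 (<= threshold) or 255 (> threshold)."""
    return [0 if p <= threshold else 255 for p in pixels]


if __name__ == "__main__":
    gray = [10, 12, 11, 200, 210, 205]   # toy "image": dark text, light paper
    t = otsu_threshold(gray)
    print("threshold:", t)
    print("binary:", binarize(gray, t))
```

In the toy input, the distinct gray values 200, 205 and 210 all collapse to 255 after thresholding, which is the loss of information the question above is worried about. To see what Tesseract itself produces for a real image, the `tesseract image - get.image` command quoted in the thread is the thing to try.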

