I am on Tesseract 5. The documentation says:

Inverting images
While tesseract version 3.05 (and older) handle inverted image (dark
background and light text) without problem, for 4.x version use dark text
on light background.
Isn't that the same as:
(thresh, im_bw) = cv2.threshold(gray, 128, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
im_bw = cv2.bitwise_not(im_bw)
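
A minimal sketch of what the quoted paragraph asks for: the final image should have dark text on a light background, which an unconditional `cv2.bitwise_not` only guarantees when the thresholded image started out light-on-dark. The helper name `ensure_dark_on_light` and the "background covers most pixels" heuristic are assumptions for illustration, and the synthetic array stands in for `im_bw`.

```python
import numpy as np


def ensure_dark_on_light(binary: np.ndarray) -> np.ndarray:
    """Return an 8-bit binary image with dark text on a light background.

    Assumption: the background covers more pixels than the text, so a
    mostly dark image is treated as light-on-dark and gets inverted.
    """
    if binary.mean() < 127.5:
        # For uint8 images this gives the same result as cv2.bitwise_not.
        return 255 - binary
    return binary


# Synthetic stand-in for im_bw: a light "text" stripe on a dark background.
light_on_dark = np.zeros((20, 60), dtype=np.uint8)
light_on_dark[8:12, 10:50] = 255

fixed = ensure_dark_on_light(light_on_dark)  # inverted: dark stripe on white
already_ok = ensure_dark_on_light(fixed)     # left unchanged
```

On a real crop the check would go right after the `cv2.threshold` call, in place of the unconditional inversion.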
As for resizing: I take my picture in full HD. Would increasing the
resolution allow Tesseract to OCR it better?
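
On resizing, what probably matters is less the capture resolution than how tall the digits are, in pixels, in the crop handed to Tesseract (the pH crop in the script below is only 109 px high). A rough sketch for choosing an upscale factor follows; the function names, the 16 px glyph height, and the 32 px target are illustrative assumptions, not values defined by Tesseract.

```python
def upscale_factor(glyph_height_px: float, target_height_px: float = 32.0) -> float:
    """Factor that brings the glyph height up to the target (never below 1.0)."""
    if glyph_height_px <= 0:
        raise ValueError("glyph_height_px must be positive")
    return max(1.0, target_height_px / glyph_height_px)


def scaled_size(width: int, height: int, factor: float) -> tuple:
    """New (width, height) in pixels after scaling by factor."""
    return (round(width * factor), round(height * factor))


# Hypothetical measurement: digits about 16 px tall in a 199 x 109 crop.
factor = upscale_factor(16)
new_size = scaled_size(199, 109, factor)
```

The factor could then be passed as `fx` and `fy` to `cv2.resize`, for example with `interpolation=cv2.INTER_CUBIC`, before thresholding.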
Thanks
On Saturday, 25 June 2022 at 11:25:50 UTC+2, zdenop wrote:
> Why did you not try the more relevant hints, like inverting and resizing?
>
> Zdenko
>
>
> On Sat, 25 Jun 2022 at 10:56, Hervé <[email protected]> wrote:
>
>> I tried a grayscale image and a black-and-white image, and I used
>>
>> custom_psm = r'--psm 7'
>>
>> I didn't try any other parameters.
>> On Saturday, 25 June 2022 at 10:32:14 UTC+2, zdenop wrote:
>>
>>>
>>>
>>> On Sat, 25 Jun 2022 at 8:15, Hervé <[email protected]> wrote:
>>>
>>>> Hi
>>>> I just tried some of them, without real success.
>>>>
>>> Please be specific: what did you try and what was the result?
>>>
>>>
>>>
>>>> Could I train it on digits from pictures? Maybe this font is not well
>>>> recognized.
>>>>
>>>
>>> Any training is useless if the failure is at the image preprocessing
>>> stage.
>>>
>>>
>>>> thanks
>>>>
>>>> On Friday, 24 June 2022 at 17:12:44 UTC+2, zdenop wrote:
>>>>
>>>>> Did you try to implement the suggestions from the documentation?
>>>>> https://github.com/tesseract-ocr/tessdoc/blob/main/ImproveQuality.md
>>>>>
>>>>>
>>>>> Zdenko
>>>>>
>>>>>
>>>>> On Fri, 24 Jun 2022 at 16:59, Hervé <[email protected]> wrote:
>>>>>
>>>>>> Hi, I need some help getting Tesseract OCR to recognize digits. I can't
>>>>>> get it to work with this image:
>>>>>>
>>>>>>
>>>>>> https://img.super-h.fr/images/2022/06/24/9a03414616bc4c6bd6e4bdb78e9d6783.jpg
>>>>>>
>>>>>>
>>>>>> Here is my code:
>>>>>>
>>>>>>
>>>>>>
>>>>>> import cv2
>>>>>> import pytesseract
>>>>>>
>>>>>> pytesseract.pytesseract.tesseract_cmd = "C:\\Program Files\\Tesseract-OCR\\tesseract.exe"
>>>>>>
>>>>>> def process_image(img):
>>>>>>     #cv2.imshow('Img',img)
>>>>>>     #cv2.waitKey(0)
>>>>>>
>>>>>>     ### convert to grayscale
>>>>>>     gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
>>>>>>     #cv2.imshow('Img',gray)
>>>>>>     #cv2.waitKey(0)
>>>>>>
>>>>>>     ### run OCR on the grayscale image
>>>>>>     valeur = pytesseract.image_to_string(gray)
>>>>>>     print(valeur)
>>>>>>
>>>>>>     ### convert to black and white (with THRESH_OTSU the 128 is ignored
>>>>>>     ### and the threshold is computed automatically), then invert
>>>>>>     (thresh, im_bw) = cv2.threshold(gray, 128, 255,
>>>>>>                                     cv2.THRESH_BINARY | cv2.THRESH_OTSU)
>>>>>>     im_bw = cv2.bitwise_not(im_bw)
>>>>>>     #cv2.imshow('Img',im_bw)
>>>>>>     #cv2.waitKey(0)
>>>>>>     # cv2.imwrite('ph.png',im_bw)
>>>>>>     print(pytesseract.image_to_string(im_bw))
>>>>>>
>>>>>>
>>>>>> ### open the image
>>>>>> img = cv2.imread('ocr5.png')
>>>>>> # cv2.imshow('Img',imgcoupee)
>>>>>>
>>>>>>
>>>>>> ### crop
>>>>>> imgcoupee = img[1056:1517,950:1862]
>>>>>> #img = cv2.imwrite('ocrcoupee.png',imgcoupee)
>>>>>> # cv2.imshow('Img',imgcoupee)
>>>>>>
>>>>>> ### cut out the region corresponding to the pH
>>>>>> ph = img[516:625, 616:815]
>>>>>>
>>>>>> #cv2.imwrite('pH.jpg', image_pH)
>>>>>>
>>>>>> ### chlorine region
>>>>>> cl = img[516:625, 882:1056]
>>>>>>
>>>>>> ### flow fault region
>>>>>> #flow= img[1302:1398,1054:1400]
>>>>>>
>>>>>> ### process
>>>>>> #process_image(imgcoupee)
>>>>>> process_image(ph)
>>>>>> process_image(cl)
>>>>>> #process_image(flow)
>>>>>>
>>>>>> The digits seem clear enough, but it doesn't work. Could someone
>>>>>> help me?
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "tesseract-ocr" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>> send an email to [email protected].
>>>>>> To view this discussion on the web visit
>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/a05712a5-e6ed-411f-a072-e389ea7095efn%40googlegroups.com
>>>>>>
>>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/a05712a5-e6ed-411f-a072-e389ea7095efn%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>> .
>>>>>>