The decimal point is not a problem; I can divide by 100 or 10 and it works :)
Could you share the whole code with me? Thanks
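
For reference, the divide-by-10-or-100 workaround could be sketched as below. This is only a sketch: `restore_decimal` and its `decimals` argument are names made up for illustration, and it assumes every reading has a fixed, known number of decimal places (two is just an example here).

```python
import re

def restore_decimal(ocr_text, decimals=2):
    """Convert OCR output that lost its decimal points (e.g. '733 124')
    into floats, assuming a fixed number of decimal places.
    Tokens that already contain a dot are parsed as they are."""
    values = []
    for token in re.findall(r"\d+(?:\.\d+)?", ocr_text):
        if "." in token:
            values.append(float(token))
        else:
            values.append(int(token) / 10 ** decimals)
    return values

# '733 124' is the OCR output quoted in this thread.
print(restore_decimal("733 124"))
print(restore_decimal("733", decimals=1))
```

If the number of decimals differs per display field, calling the function once per cropped region with its own `decimals` value is probably the simplest option.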
On Monday, 27 June 2022 at 20:44:42 UTC+2, zdenop wrote:
> Not sure what you are doing, but try something like this:
>
> import cv2
> import numpy as np
> import pytesseract
>
> def autoinvert(binarized_img, tresh=0.5):
>     """Invert the binarized image if the share of white pixels is
>     lower than tresh (i.e. the image is mostly black)."""
>     height, width = binarized_img.shape
>     non_zero = cv2.countNonZero(binarized_img)
>     white_rate = non_zero / (height * width)
>     if white_rate < tresh:
>         return ~binarized_img
>     else:
>         return binarized_img
>
> filename = 'default.png'
> test = cv2.imread(filename, cv2.IMREAD_GRAYSCALE)
> binarized = cv2.threshold(test, 0, 255,
>                           cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
> kernel = np.ones((5,5), np.uint8)
> # Dilating the white background thins the dark text
> # (an erosion from the text's point of view).
> img_erosion = cv2.dilate(autoinvert(binarized), kernel, iterations=1)
> # Scale the image so that it is about 40 px high.
> ratio = round(40/img_erosion.shape[0], 2)
> ocr_image = cv2.resize(img_erosion, (0,0), fx=ratio, fy=ratio)
>
> # tessdata: path to your tessdata directory, defined elsewhere.
> output = pytesseract.image_to_string(ocr_image,
>     config=f'--tessdata-dir "{tessdata}" --psm 6')
> print(output)
>
> Which produces '733 124', so there is still a problem with the decimal
> point...
>
> Zdenko
>
>
> On Mon, 27 Jun 2022 at 13:00, Hervé <[email protected]> wrote:
>
>> Hi
>>
>> I can't manage to get a 300 dpi image. I tried increasing the picam
>> resolution, but I only get 96. I tried with
>>
>> img = cv2.resize(img, None, fx=1.5, fy=1.5, interpolation=cv2.INTER_AREA)
>>
>> but it only increases the image size, not the DPI.
>>
>> Thanks
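
A note on DPI versus size: an OpenCV image is just an array of pixels and carries no DPI value at all, so resizing is the lever that is available here, and the height of the characters in pixels is the number to watch. A minimal sketch of picking the scale factor, assuming you can measure how tall a digit currently is in pixels; `scale_for_text_height`, `scaled_size` and the 35 px target (the middle of the 30-40 range mentioned in this thread) are illustrative choices, not part of Tesseract or OpenCV.

```python
def scale_for_text_height(current_height_px, target_height_px=35):
    """Return the factor to pass as fx/fy to cv2.resize so that text that is
    current_height_px tall ends up target_height_px tall."""
    if current_height_px <= 0:
        raise ValueError("current_height_px must be positive")
    return target_height_px / current_height_px

def scaled_size(width, height, factor):
    """Pixel dimensions (width, height) of an image after resizing by factor."""
    return max(1, round(width * factor)), max(1, round(height * factor))

# Hypothetical example: digits measured at 70 px tall in a 200x100 crop.
factor = scale_for_text_height(70)
print(factor, scaled_size(200, 100, factor))

# With OpenCV the factor would be used like this:
#   resized = cv2.resize(img, None, fx=factor, fy=factor,
#                        interpolation=cv2.INTER_AREA)
```

Note that the factor can come out below 1: for a tight crop of a full HD photo, the digits may already be taller than the target, in which case shrinking rather than enlarging is what the guideline asks for.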
>>
>>
>> On Sunday, 26 June 2022 at 15:24:01 UTC+2, zdenop wrote:
>>
>>> Check your tesseract version (tesseract -v). Here is mine:
>>>
>>> tesseract 5.1.0-70-g0df5
>>> leptonica-1.83.0 (Jun 24 2022, 17:48:50) [MSC v.1929 LIB Release x64]
>>> libgif 5.2.1 : libjpeg 6b (libjpeg-turbo 2.0.91) : libpng 1.6.37 :
>>> libtiff 4.4.0 : zlib 1.2.12 : libwebp 1.2.2 : libopenjp2 2.5.0
>>> Found AVX2
>>> Found AVX
>>> Found FMA
>>> Found SSE4.1
>>> Found libarchive 3.5.1 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.6
>>> libzstd/1.4.9
>>> Found libcurl/7.75.0 zlib/1.2.12 libssh2/1.10.1_DEV
>>>
>>>
>>> Also try to use the (eng) data file from tessdata_best[1] (plain
>>> tessdata[2] also produces a result).
>>>
>>> Regarding the image:
>>>
>>> 1. I took the output from your code "cv2.imwrite('pH.jpg', ph)" (JPG is
>>> not a good format for OCR).
>>> 2. I opened it as grayscale and I see 2 problems covered by the
>>> documentation:
>>> - it needs to be inverted
>>> - it needs to be resized so that the height of the letters is
>>> between 30 and 40 points.
>>> 3. I guess sharpening (to increase the space between the dot and the 3)
>>> would help to recognize the dot.
>>> 4. Binarize/threshold the image yourself. Tesseract has some binarization
>>> algorithms, but you can use another one that better fits your case.
>>>
>>> I suggest doing the image preprocessing in an image editor (to check what
>>> helps) and then implementing it in code.
>>>
>>> [1] https://github.com/tesseract-ocr/tessdata_best
>>> [2] https://github.com/tesseract-ocr/tessdata
>>>
>>> Zdenko
>>>
>>>
>>> On Sun, 26 Jun 2022 at 0:23, Hervé <[email protected]> wrote:
>>>
>>>> Sorry, I am a real beginner.
>>>>
>>>> When I run: tesseract pH_treshr.png -
>>>> I get:
>>>> Empty page!!
>>>> Empty page!!
>>>>
>>>> How did you obtain this image? And why can't I run tesseract on it
>>>> like you do? I am on buster with tesseract 5.1.
>>>>
>>>> Is there a way to discuss this? Discord?
>>>>
>>>> Thanks for your patience and help.
>>>>
>>>> On Saturday, 25 June 2022 at 14:34:06 UTC+2, zdenop wrote:
>>>>
>>>>> Sorry - I mean Rescaling:
>>>>>
>>>>> Tesseract works best on images which have a DPI of at least 300 dpi,
>>>>> so it may be beneficial to resize images. For more information see the
>>>>> FAQ.
>>>>> "Willus Dotkom" made an interesting test of optimal image resolution,
>>>>> with a suggestion for the optimal height of capital letters in pixels:
>>>>> https://groups.google.com/g/tesseract-ocr/c/Wdh_JJwnw94/m/24JHDYQbBQAJ
>>>>>
>>>>>
>>>>> After that, you can get output (but the dot is missing) with the
>>>>> command line: "tesseract pH_treshr.png -"
>>>>>
>>>>> I was able to get the decimal point separator with the letsgodigital
>>>>> data file
>>>>> https://github.com/arturaugusto/display_ocr/blob/master/letsgodigital/letsgodigital.traineddata
>>>>> tesseract pH_treshr.png - -l letsgodigital
>>>>>
>>>>> Or have a look at SSD https://github.com/Shreeshrii/tessdata_ssd
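
The two command lines above could be driven from Python roughly as follows. This is a sketch, not a tested recipe: `tesseract_args` and `run_tesseract` are hypothetical helpers, and it assumes `tesseract` is on the PATH and that `letsgodigital.traineddata` has been copied into the tessdata directory.

```python
import subprocess

def tesseract_args(image_path, lang=None, psm=None):
    """Build the argument list for: tesseract IMAGE - [-l LANG] [--psm N].
    The '-' output base makes tesseract print the text to stdout."""
    args = ["tesseract", image_path, "-"]
    if lang:
        args += ["-l", lang]
    if psm is not None:
        args += ["--psm", str(psm)]
    return args

def run_tesseract(image_path, lang=None, psm=None):
    """Run tesseract and return the recognized text.
    Requires a working tesseract installation; raises on a non-zero exit."""
    result = subprocess.run(tesseract_args(image_path, lang, psm),
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()

# Only builds and prints the command; nothing is executed here.
print(tesseract_args("pH_treshr.png", lang="letsgodigital"))
```

Keeping the command building separate from the execution makes it easy to print the exact command and try it by hand in a terminal when the Python result differs from the command-line one.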
>>>>>
>>>>> Zdenko
>>>>>
>>>>>
>>>>> On Sat, 25 Jun 2022 at 12:17, Hervé <[email protected]> wrote:
>>>>>
>>>>>> I am on tesseract 5
>>>>>>
>>>>>> Inverting images
>>>>>>
>>>>>> While tesseract version 3.05 (and older) handle inverted image (dark
>>>>>> background and light text) without problem, for 4.x version use dark
>>>>>> text on light background.
>>>>>>
>>>>>> Isn't that the same as:
>>>>>> (thresh, im_bw) = cv2.threshold(gray, 128, 255, cv2.THRESH_BINARY
>>>>>> | cv2.THRESH_OTSU)
>>>>>> im_bw = cv2.bitwise_not(im_bw)
>>>>>>
>>>>>> As for resizing, I take my picture in full HD. Would increasing the
>>>>>> resolution allow tesseract to OCR it better?
>>>>>>
>>>>>> thanks
>>>>>>
>>>>>>
>>>>>> On Saturday, 25 June 2022 at 11:25:50 UTC+2, zdenop wrote:
>>>>>>
>>>>>>> Why did you not try the more relevant hints, like inverting and resizing?
>>>>>>>
>>>>>>> Zdenko
>>>>>>>
>>>>>>>
>>>>>>> On Sat, 25 Jun 2022 at 10:56, Hervé <[email protected]> wrote:
>>>>>>>
>>>>>>>> I tried a gray image and a black-and-white one, and I use
>>>>>>>>
>>>>>>>> custom_psm = r'--psm 7'
>>>>>>>>
>>>>>>>> I didn't try other parameters.
>>>>>>>> On Saturday, 25 June 2022 at 10:32:14 UTC+2, zdenop wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sat, 25 Jun 2022 at 8:15, Hervé <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi
>>>>>>>>>> I just tried some, without real success.
>>>>>>>>>>
>>>>>>>>> Please be specific: what did you try and what was the result?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Could I train on digits from pictures? Maybe this font is not
>>>>>>>>>> well recognized.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Any training is useless if the failure is at the image
>>>>>>>>> preprocessing stage.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>>
>>>>>>>>>> On Friday, 24 June 2022 at 17:12:44 UTC+2, zdenop wrote:
>>>>>>>>>>
>>>>>>>>>>> Did you try to implement the suggestions from the documentation?
>>>>>>>>>>>
>>>>>>>>>>> https://github.com/tesseract-ocr/tessdoc/blob/main/ImproveQuality.md
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Zdenko
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, 24 Jun 2022 at 16:59, Hervé <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi, I need some help to make tesseract-OCR recognize digits:
>>>>>>>>>>>> I can't manage to make this work with
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> https://img.super-h.fr/images/2022/06/24/9a03414616bc4c6bd6e4bdb78e9d6783.jpg
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Here is my code:
>>>>>>>>>>>>
>>>>>>>>>>>> import cv2
>>>>>>>>>>>> import pytesseract
>>>>>>>>>>>>
>>>>>>>>>>>> pytesseract.pytesseract.tesseract_cmd = "C:\\Program Files\\Tesseract-OCR\\tesseract.exe"
>>>>>>>>>>>>
>>>>>>>>>>>> def process_image(img):
>>>>>>>>>>>>     #cv2.imshow('Img',img)
>>>>>>>>>>>>     #cv2.waitKey(0)
>>>>>>>>>>>>
>>>>>>>>>>>>     ### convert to grayscale
>>>>>>>>>>>>     gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
>>>>>>>>>>>>     #cv2.imshow('Img',gray)
>>>>>>>>>>>>     #cv2.waitKey(0)
>>>>>>>>>>>>
>>>>>>>>>>>>     ### OCR the grayscale image
>>>>>>>>>>>>     valeur = pytesseract.image_to_string(gray)
>>>>>>>>>>>>     print(valeur)
>>>>>>>>>>>>
>>>>>>>>>>>>     ### convert to black and white
>>>>>>>>>>>>     (thresh, im_bw) = cv2.threshold(gray, 128, 255,
>>>>>>>>>>>>                                     cv2.THRESH_BINARY | cv2.THRESH_OTSU)
>>>>>>>>>>>>     im_bw = cv2.bitwise_not(im_bw)
>>>>>>>>>>>>     #cv2.imshow('Img',im_bw)
>>>>>>>>>>>>     #cv2.waitKey(0)
>>>>>>>>>>>>     # cv2.imwrite('ph.png',im_bw)
>>>>>>>>>>>>     print(pytesseract.image_to_string(im_bw))
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ### open the image
>>>>>>>>>>>> img = cv2.imread('ocr5.png')
>>>>>>>>>>>> # cv2.imshow('Img',imgcoupee)
>>>>>>>>>>>>
>>>>>>>>>>>> ### crop
>>>>>>>>>>>> imgcoupee = img[1056:1517,950:1862]
>>>>>>>>>>>> #img = cv2.imwrite('ocrcoupee.png',imgcoupee)
>>>>>>>>>>>> # cv2.imshow('Img',imgcoupee)
>>>>>>>>>>>>
>>>>>>>>>>>> ### cut out the region corresponding to the pH
>>>>>>>>>>>> ph = img[516:625, 616:815]
>>>>>>>>>>>>
>>>>>>>>>>>> #cv2.imwrite('pH.jpg', image_pH)
>>>>>>>>>>>>
>>>>>>>>>>>> ### chlorine region
>>>>>>>>>>>> cl = img[516:625, 882:1056]
>>>>>>>>>>>>
>>>>>>>>>>>> ### flow fault region
>>>>>>>>>>>> #flow = img[1302:1398,1054:1400]
>>>>>>>>>>>>
>>>>>>>>>>>> ### process
>>>>>>>>>>>> #process_image(imgcoupee)
>>>>>>>>>>>> process_image(ph)
>>>>>>>>>>>> process_image(cl)
>>>>>>>>>>>> #process_image(flow)
>>>>>>>>>>>>
>>>>>>>>>>>> The digits seem clear enough, but it doesn't work. Could
>>>>>>>>>>>> someone help me?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> You received this message because you are subscribed to the
>>>>>>>>>>>> Google Groups "tesseract-ocr" group.
>>>>>>>>>>>> To unsubscribe from this group and stop receiving emails from
>>>>>>>>>>>> it, send an email to [email protected].
>>>>>>>>>>>> To view this discussion on the web visit
>>>>>>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/a05712a5-e6ed-411f-a072-e389ea7095efn%40googlegroups.com
>>>>>>>>>>>>
>>>>>>>>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/a05712a5-e6ed-411f-a072-e389ea7095efn%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>>>>>>> .
>>>>>>>>>>>>