Re: [tesseract-ocr] Unable detect number in box

Lorenzo Bolzani Fri, 27 Mar 2020 03:19:01 -0700

Hi,
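an easy trick to remove closed borders is to flood-fill the outside area
with the border color and then with the opposite (background) color. After
the first fill, the outside and the border form one region of the same
color, so the second fill erases both. See the attached example.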


For image 2 it is more complex. You can crop the image slightly to remove
the external borders and paint a filled rectangle over the middle line, if
its location is approximately fixed.

Otherwise, use morphological transformations to merge each number into a blob:

https://www.geeksforgeeks.org/erosion-dilation-images-using-opencv-python/

Dilate to join the digits, then erode to delete the lines (or the
opposite, depending on whether the background is black or white).

Now do a connected-component analysis to find the remaining blobs and crop
those regions, with some margin, from the original image.

Now you have the numbers, but I do not know a simple, reliable way to fill
them. Maybe Tesseract is able to read them as they are. Otherwise I would
try a slight dilation to make the strokes thicker; it might help.


Bye

Lorenzo

On Thu, 26 Mar 2020 at 20:03, smarty pokemon <
[email protected]> wrote:

> Hi All,
>
> I am trying to convert the following images into text with Tesseract,
> but I have been unable to do so after multiple attempts.
> I tried different images, binarizing them and inverting the colors.
> I only want to extract the number in the box,
> but I have had no luck after several attempts. I am using:
> Ubuntu 16.04 server
> tesseract 3.04.01
> leptonica-1.73
> libgif 5.1.2 : libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff
> 4.0.6 : zlib 1.2.8 : libwebp 0.4.4 : libopenjp2 2.1.2
> locale set to eng us
> I am using the CLI command `tesseract image_1.jpg stdout`
> and tried all -psm values as well.
>
> Can someone help me understand what I am doing wrong, or whether the image
> has some issue?
> Thanks in advance.
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/14602759-68be-4a71-b6d2-43fa9b2a8081%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/14602759-68be-4a71-b6d2-43fa9b2a8081%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>


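import sys
import cv2

if len(sys.argv) < 2:
    sys.exit("usage: python %s IMAGE" % sys.argv[0])

img = cv2.imread(sys.argv[1])
if img is None:
    sys.exit("cannot read image: " + sys.argv[1])

# Work in grayscale: a plain number passed as loDiff/upDiff only sets the
# tolerance of the first channel, which is probably why the tolerance
# "seems not to work" on color images (pass (th, th, th) there instead).
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

cornerPx = img[0, 0]  # the seed (0, 0) must lie in the white outside area
if cornerPx < 128:
    print("warning: top-left pixel is not white:", cornerPx)

cv2.imshow("input", img)

th = 30  # max brightness difference between neighboring pixels in the flood

# 1) Fill the white outside with black, the border color: the border and
#    the outside now form one connected dark region.
_, img, _, _ = cv2.floodFill(img, None, (0, 0), 0, loDiff=th, upDiff=th)
cv2.imshow("flood black", img)

# 2) Fill that region with white from the same seed, erasing the border.
_, img, _, _ = cv2.floodFill(img, None, (0, 0), 255, loDiff=th, upDiff=th)
cv2.imshow("flood white", img)

cv2.waitKey(0)
cv2.destroyAllWindows()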
