I have a similar issue.
The only thing that helped me: the confidence level for those "words" is very
low (about 0), so I could filter them out (that was acceptable in my case).
The same issue arises when there are multiple dots (more than 3) after normal text.
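A minimal sketch of that filtering. It assumes the OCR output comes from `pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)`, which returns parallel lists under keys such as `"text"` and `"conf"`. The helper name `confident_words` and the default threshold of 10 are hypothetical choices, not values from this thread, so tune the threshold for your own images.

```python
# Hypothetical helper: keep only OCR "words" whose confidence reaches a threshold.
# Assumes `data` has the shape returned by
# pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT),
# i.e. parallel lists under the keys "text" and "conf".


def confident_words(data, min_conf=10.0):
    """Return (word, confidence) pairs whose confidence is >= min_conf.

    Depending on the Tesseract/pytesseract version, "conf" entries may be
    strings such as "96" or "-1", or numbers, so each one goes through float().
    Rows with empty text (page/block/line rows, usually conf -1) are skipped.
    The default threshold of 10 is an arbitrary, hypothetical choice.
    """
    words = []
    for text, conf in zip(data["text"], data["conf"]):
        conf = float(conf)
        if text.strip() and conf >= min_conf:
            words.append((text, conf))
    return words
```

Usage would look like `confident_words(pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT))`. Because it simply discards low-confidence tokens, it only helps when losing those tokens is acceptable.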

On Saturday, March 2, 2019 at 17:02:34 UTC+10:30, [email protected] wrote:
>
> I want to extract text along with the *** symbols. I tried the following code:
>
> import cv2
> import pytesseract
> import numpy as np
>
>
> def image_resize(image, width=None, height=None, inter=cv2.INTER_LINEAR):
>     # initialize the dimensions of the image to be resized and
>     # grab the image size
>     dim = None
>     (h, w) = image.shape[:2]
>
>     # if both the width and height are None, then return the
>     # original image
>     if width is None and height is None:
>         return image
>
>     # check to see if the width is None
>     if width is None:
>         # calculate the ratio of the height and construct the
>         # dimensions
>         r = height / float(h)
>         dim = (int(w * r), height)
>
>     # otherwise, the height is None
>     else:
>         # calculate the ratio of the width and construct the
>         # dimensions
>         r = width / float(w)
>         dim = (width, int(h * r))
>
>     # resize the image using the requested interpolation method
>     resized = cv2.resize(image, dim, interpolation=inter)
>
>     # return the resized image
>     return resized
>
>
> img = cv2.imread('test.jpg', cv2.IMREAD_GRAYSCALE)
> img = image_resize(img, height=4000)
>
>
> # each Tesseract parameter needs its own -c flag, written NAME=VALUE
> config = ('-c textord_heavy_nr=0 -c textord_noise_area_ratio=100 '
>           '-c textord_max_noise_size=154 --psm 11')
> print(pytesseract.image_to_string(img, config=config))
>
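On the tesseract command line, each `-c` flag takes exactly one `NAME=VALUE` pair with no spaces around `=`, and pytesseract passes the `config` string through as extra command-line arguments. A config string that lists several variables after a single `-c`, or puts spaces around `=`, therefore does not set all of them. Below is a small sketch that builds the string with one `-c` per parameter. The helper name `tesseract_config` is hypothetical, and the parameter values are simply the ones from the quoted code.

```python
# Hypothetical helper: build a Tesseract config string in which every
# parameter gets its own "-c NAME=VALUE" pair (no spaces around "="),
# optionally followed by a page segmentation mode.


def tesseract_config(params, psm=None):
    """Return a config string such as '-c a=1 -c b=2 --psm 11'."""
    parts = [f"-c {name}={value}" for name, value in params.items()]
    if psm is not None:
        parts.append(f"--psm {psm}")
    return " ".join(parts)
```

The result can be passed directly as `pytesseract.image_to_string(img, config=tesseract_config({...}, psm=11))`. Building the string from a dict keeps each variable paired with its own `-c` flag.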

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/fcc40cd4-bb62-41e2-8618-e3b0bf7d441d%40googlegroups.com.
