I have similar issues. The only thing that helped me: the confidence level for those "words" is very low (about 0), so I could filter them out (that was acceptable in my case). The same issue arises when there are more than three dots in a row after normal text.
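Here is a minimal sketch of that filtering in Python. It assumes the word-level dictionary that `pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT)` returns, which has parallel `text` and `conf` lists. The function name `filter_low_confidence` and the default threshold of 10 are my own illustrative choices, not part of Tesseract or pytesseract, so tune the cut-off for your images.

```python
"""Drop low-confidence "words" from Tesseract word-level output.

Assumes `data` is shaped like the dict returned by
pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT):
parallel lists under the keys "text" and "conf".
"""

from typing import Dict, List, Sequence


def filter_low_confidence(
    data: Dict[str, Sequence],
    min_conf: float = 10.0,  # assumed threshold, not a Tesseract default
) -> List[str]:
    """Return the recognised words whose confidence is at least min_conf.

    Tesseract reports conf = -1 for non-word rows (page, block, line),
    which also have empty text. Depending on the pytesseract version,
    conf may be a str, int or float, so it is converted with float().
    """
    kept = []
    for text, conf in zip(data["text"], data["conf"]):
        word = str(text).strip()
        if not word:
            continue  # empty entries belong to layout rows, not words
        if float(conf) >= min_conf:
            kept.append(word)
    return kept


if __name__ == "__main__":
    # Hand-made sample in the image_to_data dict layout (not real output).
    sample = {
        "text": ["", "Total", "***", "`'", "....", "42"],
        "conf": ["-1", "91.2", "77", "0", "0.5", 88],
    }
    print(filter_low_confidence(sample))  # ['Total', '***', '42']
```

With Tesseract installed, the input would come from something like `data = pytesseract.image_to_data(img, config='--psm 11', output_type=pytesseract.Output.DICT)`, and `' '.join(filter_low_confidence(data))` then gives the filtered text. Keep in mind that this also drops any real symbol Tesseract was unsure about, which is why it only works when losing those is acceptable.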
On Saturday, March 2, 2019 at 17:02:34 UTC+10:30, [email protected] wrote:
>
> I tried the following code. I want to extract the text along with the *** symbols.
>
> import cv2
> import pytesseract
>
>
> def image_resize(image, width=None, height=None, inter=cv2.INTER_LINEAR):
>     # initialize the dimensions of the image to be resized and
>     # grab the image size
>     dim = None
>     (h, w) = image.shape[:2]
>
>     # if both the width and height are None, then return the
>     # original image
>     if width is None and height is None:
>         return image
>
>     # check to see if the width is None
>     if width is None:
>         # calculate the ratio of the height and construct the
>         # dimensions
>         r = height / float(h)
>         dim = (int(w * r), height)
>
>     # otherwise, the height is None
>     else:
>         # calculate the ratio of the width and construct the
>         # dimensions
>         r = width / float(w)
>         dim = (width, int(h * r))
>
>     # resize the image with the requested interpolation
>     resized = cv2.resize(image, dim, interpolation=inter)
>
>     # return the resized image
>     return resized
>
>
> img = cv2.imread('test.jpg', cv2.IMREAD_GRAYSCALE)
> img = image_resize(img, height=4000)
>
> # each Tesseract variable needs its own -c VAR=VALUE, with no spaces around '='
> config = ('-c textord_heavy_nr=0 -c textord_noise_area_ratio=100 '
>           '-c textord_max_noise_size=154 --psm 11')
> print(pytesseract.image_to_string(img, config=config))

-- 
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/fcc40cd4-bb62-41e2-8618-e3b0bf7d441d%40googlegroups.com.

