I was able to get better results by changing the page segmentation mode (--psm):

tesseract --psm 12 -l eng file.jpg output
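
Since the right mode depends on the image, it can help to run the same image through several --psm values and compare the output side by side. Below is a minimal sketch of that idea from Python. The helper names psm_config and sweep_psm, the default list of modes, and the optional ocr argument are my own choices for illustration and are not part of pytesseract. It assumes pytesseract and the tesseract binary are installed.

```python
def psm_config(psm):
    """Build the config string that selects a page segmentation mode."""
    return f"--psm {psm}"


def sweep_psm(image, psms=(3, 6, 11, 12), ocr=None):
    """Run OCR once per page segmentation mode and collect the results.

    `ocr` is a hypothetical hook so the function can be exercised without
    tesseract installed; by default it is pytesseract.image_to_string.
    """
    if ocr is None:
        import pytesseract  # imported lazily, only needed for real OCR
        ocr = pytesseract.image_to_string
    return {psm: ocr(image, config=psm_config(psm)) for psm in psms}


# Example usage (requires tesseract; 'file.jpg' is the image from the command above):
#   import cv2
#   img = cv2.imread('file.jpg', 0)
#   for psm, text in sweep_psm(img).items():
#       print(f"--- psm {psm} ---")
#       print(text)
```

Modes 11 and 12 are the sparse-text modes (12 adds orientation and script detection), 6 assumes a single uniform block of text, and 3 is the fully automatic default.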

On Saturday, March 2, 2019 at 4:42:05 PM UTC-8, [email protected] wrote:
>
> I have similar issues.
> The only thing that helped me: the confidence level for those "words" is very 
> low (about 0), so I could filter them out (that was acceptable in my case).
> The same issue arises when more than three dots follow normal text.
>
> On Saturday, March 2, 2019 at 17:02:34 UTC+10:30, 
> [email protected] wrote:
>>
>> I tried the following code. I want to extract the text along with the *** 
>> symbols.
>>
>> import cv2
>> import pytesseract
>> import numpy as np
>>
>>
>> def image_resize(image, width=None, height=None, inter=cv2.INTER_AREA):
>>     # initialize the dimensions of the image to be resized and
>>     # grab the image size
>>     dim = None
>>     (h, w) = image.shape[:2]
>>
>>     # if both the width and height are None, then return the
>>     # original image
>>     if width is None and height is None:
>>         return image
>>
>>     # check to see if the width is None
>>     if width is None:
>>         # calculate the ratio of the height and construct the
>>         # dimensions
>>         r = height / float(h)
>>         dim = (int(w * r), height)
>>
>>     # otherwise, the height is None
>>     else:
>>         # calculate the ratio of the width and construct the
>>         # dimensions
>>         r = width / float(w)
>>         dim = (width, int(h * r))
>>
>>     # resize the image
>>     resized = cv2.resize(image, dim, interpolation = inter)
>>
>>     # return the resized image
>>     return resized
>>
>>
>> img = cv2.imread('test.jpg', 0)  # 0 = load as grayscale
>> img = image_resize(img, height = 4000)
>>
>>
>> # each variable needs its own -c flag and no spaces around '='
>> print(pytesseract.image_to_string(
>>     img,
>>     config='-c textord_heavy_nr=0 -c textord_noise_area_ratio=100 '
>>            '-c textord_max_noise_size=154 --psm 11'))
>>
>
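
The confidence filtering described in the quoted reply above can be sketched roughly as follows. This is only an illustration: the function names and the threshold of 30 are assumptions of mine, not values from the thread. It works on the dictionary that pytesseract.image_to_data returns with output_type=pytesseract.Output.DICT, which has parallel 'text' and 'conf' lists; depending on the pytesseract version the confidences may come back as strings, ints, or floats, so they are converted first.

```python
def filter_by_confidence(data, min_conf=30.0):
    """Keep only words whose confidence is at least min_conf.

    `data` is a dict with parallel 'text' and 'conf' lists, as produced by
    pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT).
    The default threshold is an arbitrary assumption; tune it per image set.
    """
    kept = []
    for text, conf in zip(data["text"], data["conf"]):
        try:
            score = float(conf)  # conf may be a string or a number
        except (TypeError, ValueError):
            continue
        # non-word boxes are reported with conf -1 and empty text
        if score >= min_conf and text.strip():
            kept.append(text)
    return kept


def words_from_image(image, min_conf=30.0, config="--psm 11"):
    """Hypothetical wrapper: OCR an image and drop low-confidence words."""
    import pytesseract  # imported lazily, only needed for real OCR
    data = pytesseract.image_to_data(
        image, config=config, output_type=pytesseract.Output.DICT)
    return filter_by_confidence(data, min_conf)
```

Note that this discards the noisy "words" rather than recovering them, so it only fits cases where losing those symbols is acceptable.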
