It might be worth trying a black-and-white rendering with a PSM of 11 (sparse 
text), since your input images are of such good quality. This mode is less 
likely to miss words or letters, though some stray artifacts may slip through. 
Something like this seems to get decent results:

import cv2
import pytesseract

TESSERACT_CONFIG = r'--psm 11'

def showResults(region):
    results = pytesseract.image_to_data(region,
        config=TESSERACT_CONFIG,
        output_type=pytesseract.Output.DICT)

    tlen = len(results['text'])

    for i in range(tlen):
        #use conf to weed out some of the cruft
        if float(results['conf'][i]) > 0:
            print("WORD:",results['text'][i])
            print("left:",results['left'][i])
            print("top:",results['top'][i])
            print("width:",results['width'][i])
            print("height:",results['height'][i])
            print("conf:",results['conf'][i])

# read as grayscale to mute colors
gray = cv2.imread("mina.png", cv2.IMREAD_GRAYSCALE)

# convert to 2-color black & white
im = cv2.threshold(gray, 128, 255, cv2.THRESH_BINARY)[1]

_, w = im.shape

#crop and ocr top region (as per coords in email)
region1 = im[55:110,0:w]
cv2.imwrite('region1.png', region1)
showResults(region1)

#crop and ocr bottom region
region2 = im[312:360,0:w]
cv2.imwrite('region2.png', region2)
showResults(region2)
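
If you want to reuse the words rather than just print them, the same confidence filter can collect them into a list. This is only a rough sketch: it assumes a dict laid out like the one image_to_data returns with Output.DICT, and both the collect_words helper and the sample values are made up for illustration (they are not real OCR output).

```python
def collect_words(results, min_conf=0.0):
    """Return one record per word whose confidence exceeds min_conf."""
    words = []
    for i, text in enumerate(results['text']):
        # Structural rows (block, paragraph, line) carry conf -1 and empty text,
        # so the confidence test plus a non-blank check drops them.
        if float(results['conf'][i]) > min_conf and text.strip():
            words.append({
                'text': text,
                'left': results['left'][i],
                'top': results['top'][i],
                'width': results['width'][i],
                'height': results['height'][i],
                'conf': float(results['conf'][i]),
            })
    return words

# Hypothetical sample, hand-written to mimic the dict layout.
sample = {
    'text':   ['', 'MINA', ' ', 'x'],
    'conf':   ['-1', '96', '95', '0'],
    'left':   [0, 12, 80, 90],
    'top':    [0, 8, 8, 9],
    'width':  [200, 60, 4, 3],
    'height': [55, 20, 20, 5],
}
words = collect_words(sample)
print(words)
```

Raising min_conf trades missed words for fewer artifacts, which may matter with PSM 11 since it tends to pick up more stray marks.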

I think maybe you are cropping at a more granular level than in this example 
but the basic approach would be the same.
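
One thing to watch with finer crops: the left and top values reported are relative to the cropped region, not the original image. If you need positions in the full image, add the crop offsets back. A small sketch, where to_full_image and the sample box are hypothetical names and values chosen for illustration:

```python
def to_full_image(box, crop_left, crop_top):
    """Shift a crop-relative box back into full-image coordinates."""
    shifted = dict(box)  # copy so the original box is left untouched
    shifted['left'] = box['left'] + crop_left
    shifted['top'] = box['top'] + crop_top
    return shifted

# Hypothetical word box, as it might be reported for a crop like im[55:110, 0:w].
box = {'text': 'MINA', 'left': 12, 'top': 8, 'width': 60, 'height': 20}

# The crop started at row 55 and column 0 of the full image.
full = to_full_image(box, crop_left=0, crop_top=55)
print(full['left'], full['top'])
```

Width and height are unchanged by cropping, so only the origin needs shifting.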

art
