[tesseract-ocr] Improve image so I can OCR it better

eliav schmulewitz Tue, 06 Jun 2017 23:20:07 -0700

Hi

I posted this on Stack Overflow but got no response.

I am trying to read subtitles from a frame of a news broadcast using
Tesseract from Python.
For some reason, I get better results when I save the cropped image with
matplotlib (plt) and have Tesseract read the saved file than when I pass the
array to Tesseract directly.

   1. Why is that?
   2. How can I improve the results further using OpenCV (cv2)?
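One plausible explanation for question 1 (an assumption, not verified against this particular frame): plt.savefig does not write the crop at its native size, it writes the rendered figure. With matplotlib's default settings the figure is 6.4 x 4.8 inches at 100 dpi, so plt.png is 640 x 480 pixels and the 187-pixel-wide crop is drawn roughly 500 pixels wide on a white background. Tesseract tends to struggle with very small glyphs and with text touching the image edge (the Tesseract documentation on improving output quality discusses both rescaling and borders), so the enlarged, padded PNG may simply be an easier input. The sketch below reproduces that effect explicitly with Pillow instead of relying on matplotlib's rendering; the function name upscale_with_margin and the values scale=3 and margin=20 are hypothetical choices for illustration.

```python
import numpy as np
from PIL import Image, ImageOps


def upscale_with_margin(arr, scale=3, margin=20):
    """Enlarge an image array and pad it with a white border.

    Loosely mimics what plt.imshow + plt.savefig does to a small crop,
    but with an explicit (hypothetical) scale factor and margin.
    """
    img = Image.fromarray(arr)
    # Bicubic resampling smooths the edges of the enlarged glyphs
    img = img.resize((img.width * scale, img.height * scale), Image.BICUBIC)
    # White padding so the characters do not touch the image edge
    return ImageOps.expand(img, border=margin, fill="white")
```

Usage, assuming new_frame is the crop from the script: pytesseract.image_to_string(upscale_with_margin(new_frame), lang='eng'). Controlling the scale and border yourself makes the result reproducible, whereas the matplotlib output depends on figure size, dpi and interpolation settings.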

import urllib3
import numpy as np
import pytesseract
import matplotlib.pyplot as plt
from PIL import Image


def downloadFile():
    url = 'https://drive.google.com/uc?export=download&id=0B7t_yZLolnbiaVpicnEwbDRjTmc'
    http = urllib3.PoolManager()
    r = http.request('GET', url)
    # Close the file (via "with") so the data is flushed before np.load reads it
    with open('testing.npy', 'wb') as f:
        f.write(r.data)


downloadFile()
frame = np.load('testing.npy')
# Crop the region that contains the subtitle text
new_frame = frame[170:210, 8:195]

# Render the crop with matplotlib and save the figure as a PNG
plt.imshow(new_frame)
plt.axis('off')
plt.savefig('plt.png')

# OCR the raw array and the saved PNG to compare the two results
print('from array: ' + pytesseract.image_to_string(Image.fromarray(new_frame), lang='eng'))
print('from plt: ' + pytesseract.image_to_string(Image.open('plt.png'), lang='eng'))

Thank you!

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.