Hello all,
I am trying to extract text from the attached image (010003.bin.png) using
tesserocr (a Python wrapper for the Tesseract 3.04 API). When I run the script
TestAdapttoWord.py (attached) with lines 18 and 19 commented out, my console
output looks like output1.png (attached); when I uncomment lines 18 and 19, it
looks like output2.png (attached).
According to the AdaptToWordStr documentation, it returns true if it was
able to adapt to the given word. It does return true, but when I then call
GetUTF8Text I get an empty result. I was hoping it would give the correct
result once AdaptToWordStr had returned true.
I am not sure whether I am using AdaptToWordStr correctly, because the
documentation doesn't say much. Is my interpretation of AdaptToWordStr
correct?
I am on Ubuntu 16 using Tesseract 3.04.
Thanks
from PIL import Image
from tesserocr import PyTessBaseAPI, RIL, PSM
import tesserocr
#print(tesserocr.tesseract_version())  # print tesseract-ocr version
image = Image.open('010003.bin.png')
with PyTessBaseAPI() as api:
    api.SetImage(image)
    api.SetDebugVariable("debug_file", "debug.txt")
    boxes = api.GetComponentImages(RIL.WORD, True)
    print('Found {} word image components.'.format(len(boxes)))
    words = ['( b )', 'S a l e s', 'o f', 'T r a d e d', 'G o o d s']  # expected text per word box
    for i, (im, box, _, _) in enumerate(boxes):
        #im.show()
        api.SetPageSegMode(PSM.SINGLE_WORD)  # PSM 8: treat the region as a single word
        api.SetRectangle(box['x'], box['y'], box['w'], box['h'])
        b = api.AdaptToWordStr(psm=PSM.SINGLE_WORD, word=words[i])
        print(b)
        ocrResult = api.GetUTF8Text()
        print("Word" + str(i) + " Text:" + ocrResult)
        conf = api.MeanTextConf()
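
One thing that may be worth ruling out: the script looks up the expected word by box index, which assumes Tesseract finds exactly five word boxes, in the same order as the expected words. Below is a minimal sketch of a guard for that assumption. pair_boxes_with_words is a hypothetical helper (it is not part of tesserocr), and the box values are made-up stand-ins shaped like the tuples GetComponentImages returns.

```python
def pair_boxes_with_words(boxes, words):
    """Pair each detected word box with its expected text, in order.

    `boxes` is assumed to be shaped like the list returned by
    GetComponentImages: tuples of (image, box_dict, block_id, paragraph_id).
    Raises ValueError when the counts differ, since positional pairing
    would then attach the wrong text to some boxes.
    """
    if len(boxes) != len(words):
        raise ValueError(
            "found {} word boxes but {} expected words".format(len(boxes), len(words))
        )
    return [(box, word) for (_, box, _, _), word in zip(boxes, words)]


# Stand-in data for illustration only; real boxes would come from tesserocr.
fake_boxes = [
    (None, {"x": 0, "y": 0, "w": 40, "h": 12}, 0, 0),
    (None, {"x": 50, "y": 0, "w": 60, "h": 12}, 0, 0),
]
pairs = pair_boxes_with_words(fake_boxes, ["( b )", "S a l e s"])
for box, word in pairs:
    print(box["x"], box["w"], word)
```

In the real script the loop would iterate over the returned pairs instead of indexing the word list with i, so a count mismatch fails loudly before any adaptation call is made.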