[tesseract-ocr] Tesseract AdaptToWordStr usage?

Syed Uzair Mon, 31 Jul 2017 04:40:08 -0700

Hello all

I am trying to extract text from the attached image (010003.bin.png) using 
tesserocr (a Python wrapper for the Tesseract 3.04 API). When I run the script 
TestAdapttoWord.py (attached) with lines 18 and 19 commented out, my console 
output looks like output1.png (attached); when I uncomment lines 18 and 19, it 
looks like output2.png (attached).
According to the AdaptToWordStr documentation, it returns true if it was 
able to adapt to the given word. I am getting true, but when I then call 
GetUTF8Text I get empty results. I was hoping it would give the correct result 
after AdaptToWordStr returns true.


I am not sure whether I am using AdaptToWordStr correctly, because 
the documentation doesn't say much. Is my interpretation of AdaptToWordStr 
correct?
I am on Ubuntu 16 using Tesseract 3.04.

Thanks



from PIL import Image
from tesserocr import PyTessBaseAPI, RIL, PSM
import tesserocr

#print(tesserocr.tesseract_version())  # print tesseract-ocr version

image = Image.open('010003.bin.png')
with PyTessBaseAPI() as api:
    api.SetImage(image)
    api.SetDebugVariable("debug_file", "debug.txt")  # write Tesseract debug output to debug.txt
    boxes = api.GetComponentImages(RIL.WORD, True)  # (image, box, block id, paragraph id) per word
    print('Found {} word image components.'.format(len(boxes)))
    words = ['( b )', 'S a l e s', 'o f', 'T r a d e d', 'G o o d s']  # expected text per word box, in order
    for i, (im, box, _, _) in enumerate(boxes):
        #im.show()
        api.SetPageSegMode(PSM.SINGLE_WORD)  # PSM 8: treat the rectangle as a single word
        api.SetRectangle(box['x'], box['y'], box['w'], box['h'])
        b = api.AdaptToWordStr(psm=PSM.SINGLE_WORD, word=words[i])
        print(b)
        ocrResult = api.GetUTF8Text()
        print("Word" + str(i) + " Text:" + ocrResult)
        conf = api.MeanTextConf()
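
The comparison described above (output with and without the AdaptToWordStr call) could also be made in a single run. The following is only a sketch under stated assumptions, not part of the attached script and not a confirmed fix: the helper name compare_before_after is hypothetical, `api` is assumed to behave like tesserocr's PyTessBaseAPI with an image already set, and the rectangle is re-selected before every call on the assumption that this gives each call a fresh start. It uses only the calls that already appear in the script.

```python
def compare_before_after(api, boxes, words, psm=8):
    """Recognize each word box before and after AdaptToWordStr.

    api   -- object with the PyTessBaseAPI methods used in the script
             (assumed to already have an image set)
    boxes -- box dicts with 'x', 'y', 'w', 'h' keys, as returned inside
             the tuples from GetComponentImages
    words -- expected text for each box, in the same order
    psm   -- page segmentation mode; 8 is the value used in the script
    """
    api.SetPageSegMode(psm)
    results = []
    # zip stops at the shorter sequence, so a mismatch between the number
    # of detected boxes and supplied words cannot raise IndexError.
    for box, word in zip(boxes, words):
        rect = (box['x'], box['y'], box['w'], box['h'])

        # Recognition before any explicit adaptation.
        api.SetRectangle(*rect)
        before = api.GetUTF8Text()

        # Ask the engine to adapt to the expected word.
        api.SetRectangle(*rect)
        adapted = api.AdaptToWordStr(psm, word)

        # Recognition after adaptation, on the same rectangle.
        api.SetRectangle(*rect)
        after = api.GetUTF8Text()

        results.append((word, before, adapted, after))
    return results
```

With the script's variables this might be called as compare_before_after(api, [box for _, box, _, _ in boxes], words). Note that the "before" recognition pass may itself change the engine's adaptive state, so the result is not guaranteed to match two separate runs of the script.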
