[tesseract-ocr] Training from actual user feedback

Fred Beer Wed, 30 Apr 2014 10:11:39 -0700

I have forms with many fields on them, always in the same positions, that 
need to be scanned and read by Tesseract. My idea is to draw a rectangle 
around each "field", crop that region, and feed the cropped image to 
Tesseract. If the confidence is high, the result is accepted as the 
recognized text; if it is low, the field goes to a human for manual 
correction. My question is whether I can then feed that correction (the 
image of the text plus the actual text) back to Tesseract to train it and 
improve future results. That way the system would be trained through actual 
usage rather than through a separate training exercise.
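
The routing part of this workflow can be sketched roughly as below. This is 
only a minimal illustration under stated assumptions, not a Tesseract API: 
the names read_form, FieldResult and FeedbackLog, the ocr callable, and the 
threshold of 80 are all hypothetical. The ocr callable stands in for whatever 
crops the field and returns text plus a 0-100 confidence (with the 
pytesseract wrapper, image_to_data reports per-word confidences that could 
be used for this). The sketch only collects the corrected pairs; whether and 
how they can be fed into Tesseract training is the open question.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# (left, top, right, bottom) in pixels; fixed per form template.
Box = Tuple[int, int, int, int]
# Hypothetical OCR hook: takes a field box, returns (text, confidence 0-100).
OcrFn = Callable[[Box], Tuple[str, float]]


@dataclass
class FieldResult:
    name: str
    box: Box
    text: str
    confidence: float
    needs_review: bool  # True when confidence fell below the threshold


@dataclass
class FeedbackLog:
    """Collects (field name, box, corrected text) for a later training run."""
    samples: List[Tuple[str, Box, str]] = field(default_factory=list)

    def record(self, result: FieldResult, corrected_text: str) -> None:
        # Store the human-verified text next to the region it came from.
        self.samples.append((result.name, result.box, corrected_text))


def read_form(fields: Dict[str, Box], ocr: OcrFn,
              threshold: float = 80.0) -> List[FieldResult]:
    """OCR each field and flag low-confidence results for human review."""
    results = []
    for name, box in fields.items():
        text, conf = ocr(box)
        results.append(
            FieldResult(name, box, text.strip(), conf, conf < threshold)
        )
    return results
```

Fields with needs_review set would be shown to a person, and each correction 
passed to FeedbackLog.record so that the image region and the true text are 
kept together.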


We are having an internal disagreement about how to do this. I read about 
Make Box Files, which seems promising, but one of our team says that won't 
work. Could someone point us in the right direction and tell us whether this 
is possible? Any help is appreciated. Thanks.

