I have forms with many fields, always in the same positions, that need to be scanned and read by Tesseract. My idea is to draw a rectangle around each field, crop that region, and feed the cropped image to Tesseract. If the confidence is high, I accept the result as the recognized text; if it is low, I send it to a human to correct manually. My question is: can I then feed the corrected result (the image of the text plus the actual text) back to Tesseract to train it and improve future recognition? That way the system would be trained through actual usage rather than through a separate training exercise.
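To make the idea concrete, here is a rough sketch of the flow I have in mind. It is only an illustration: the names `read_fields`, `FieldResult`, `collect_training_pair`, and `tesseract_ocr` are made up for this post, the threshold of 80 is an arbitrary guess, and the Tesseract call assumes the Pillow and pytesseract packages. Whether the collected pairs can actually be used for training is exactly what I am asking.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# (left, upper, right, lower) pixel box of one field on the scanned form
Box = Tuple[int, int, int, int]
# Hypothetical OCR callable: takes a page image and a box,
# returns (recognized text, confidence on a 0-100 scale)
OcrFn = Callable[[object, Box], Tuple[str, float]]


@dataclass
class FieldResult:
    name: str
    text: str
    confidence: float
    needs_review: bool  # True when a human should check the text


def read_fields(page, fields: Dict[str, Box], ocr: OcrFn,
                threshold: float = 80.0) -> List[FieldResult]:
    """OCR every field box and flag low-confidence results for review."""
    results = []
    for name, box in fields.items():
        text, conf = ocr(page, box)
        results.append(FieldResult(name, text, conf, conf < threshold))
    return results


def collect_training_pair(pairs: list, result: FieldResult, corrected: str) -> None:
    """Remember (field name, corrected text) so it could be paired with the crop later."""
    pairs.append((result.name, corrected))


def tesseract_ocr(page, box: Box) -> Tuple[str, float]:
    """One possible OcrFn, assuming `page` is a PIL image and pytesseract is installed."""
    import pytesseract  # imported lazily so the rest runs without it

    crop = page.crop(box)
    data = pytesseract.image_to_data(crop, output_type=pytesseract.Output.DICT)
    # Keep real words only; Tesseract reports conf = -1 for non-text blocks.
    words = [(w, float(c)) for w, c in zip(data["text"], data["conf"])
             if w.strip() and float(c) >= 0]
    if not words:
        return "", 0.0
    # Use the weakest word as the field's confidence (a conservative choice).
    return " ".join(w for w, _ in words), min(c for _, c in words)
```

With a real scan this would be called as something like `read_fields(Image.open("form.png"), {"name": (40, 120, 400, 160)}, tesseract_ocr)`, where the file name and box coordinates are placeholders. Anything with `needs_review` set would go to a person, and their corrected text would be passed to `collect_training_pair`.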
We disagree internally on how to do this. I read about "Make Box Files", which seems promising, but one of our team members says that won't work. Could someone point us in the right direction and tell us whether this is possible? Any help is appreciated. Thanks.

