Dear Tesseract community,

First of all -- my sincere gratitude to the devs (Tom and others!) for
this fantastic piece of open source software. I'm presently working on
a project to detect and recognize text on a phone in real time for
blind people. My background is in CS/vision/machine learning. That
said, I have a few questions that would help me better understand how
Tesseract works without diving deep into the training code.

Run Time Setup:

We have compiled Tesseract 3.01 along with Leptonica (whose Pix type
we use for images) and deployed it on an iPhone 4/4S. We also have a
patented text-detection algorithm that detects text in the wild and
returns bounding boxes. These boxes are used to crop the image, and
the crops are then passed to Tesseract.
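For concreteness, here is a minimal sketch of how a detected box can
be handed to Tesseract with the 3.01 C++ API; the function name and
coordinates are illustrative, not our actual detector code, and we use
SetRectangle rather than cropping a separate Pix:

    // Minimal sketch (Tesseract 3.01 C++ API + Leptonica).
    // RecognizeRegion and its arguments are illustrative names.
    #include <leptonica/allheaders.h>
    #include <tesseract/baseapi.h>

    char *RecognizeRegion(PIX *frame, int x, int y, int w, int h)
    {
        tesseract::TessBaseAPI api;
        if (api.Init(NULL, "eng") != 0)  // NULL datapath: use TESSDATA_PREFIX
            return NULL;

        api.SetImage(frame);             // Tesseract accepts a Pix directly
        api.SetRectangle(x, y, w, h);    // restrict OCR to the detected box
        char *text = api.GetUTF8Text();  // caller must delete [] the result
        api.End();
        return text;
    }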
Tesseract seems to perform better on text from documents than on text
from "urban scenes" such as signs, posts, etc. Is this an inherent
constraint of the system, or can we do something to help it do better?
In controlled experiments with just one word on a white background, we
find that recognition is extremely sensitive: small changes in the
input determine whether we get the right word. Do people have
suggestions/hypotheses about this? Our controls included reducing how
much of the image was passed to Tesseract (e.g., limiting it to the
top 100 pixels of the image) and further confining the single word on
the sheet of paper to within those 100 pixels.
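In case anyone wants to reproduce this control, a sketch of the
relevant settings, assuming the TessBaseAPI instance from the sketch
above:

    // Sketch of the one-word control: PSM_SINGLE_WORD tells Tesseract
    // to expect exactly one word and skip its page-layout analysis.
    api.SetPageSegMode(tesseract::PSM_SINGLE_WORD);
    api.SetImage(frame);
    api.SetRectangle(0, 0, pixGetWidth(frame), 100);  // top 100 rows only
    char *word = api.GetUTF8Text();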
On some occasions, Tesseract gets very close to the original word, but
some characters are replaced by variants of the ground truth, e.g.,
"ã" or "á" instead of "a". We hypothesize that small amounts of
erroneous foreground are nudging the confidence for the correct
character slightly lower.
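If it is useful to others, one knob that suppresses these variants is
the character whitelist; a sketch, again assuming the api object from
above (the whitelist string is just an example):

    // Sketch: restrict the output alphabet so variants such as
    // "ã"/"á" cannot outscore a plain "a".
    api.SetVariable("tessedit_char_whitelist",
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789");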
Pre-processing and other specifics:

- What sort of pre-processing does Tesseract perform: scaling,
rotation, de-noising, dilation, binarization, connected-component
analysis? (A sketch of doing the binarization step externally with
Leptonica follows after these questions.)
- How invariant is Tesseract supposed to be to such transformations
(given both the model and the default training)?
- Can anyone elaborate on the actual classification model/algorithm
used for the character-level training?
- Would people recommend that we retrain Tesseract to include more
"wild text" in order to see a marked improvement in performance?
- What other information would the experts in this group share to help
us build a more robust system?
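As referenced above, a minimal sketch of doing the binarization step
ourselves with Leptonica before calling Tesseract; the helper name
BinarizeForOcr and the tile sizes are illustrative choices, not
anything taken from Tesseract's internals:

    // Sketch: grayscale conversion followed by Otsu binarization.
    #include <leptonica/allheaders.h>

    PIX *BinarizeForOcr(PIX *pixs)
    {
        // Reduce to 8 bpp grayscale if the input is 32 bpp RGB
        // (0,0,0 weights select Leptonica's default channel weights).
        PIX *gray = (pixGetDepth(pixs) == 8)
                        ? pixClone(pixs)
                        : pixConvertRGBToGray(pixs, 0.0f, 0.0f, 0.0f);
        PIX *binary = NULL;
        // Large tiles make the Otsu threshold effectively global.
        pixOtsuAdaptiveThreshold(gray, 2000, 2000, 0, 0, 0.0f,
                                 NULL, &binary);
        pixDestroy(&gray);
        return binary;  // 1 bpp; caller owns and must pixDestroy()
    }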

My group and I are happy to share our findings and some code with the
Tesseract community. Your efforts/contributions will go a long way
toward helping the blind as well. Your time and effort are greatly
appreciated.

Regards,
Mayur Mudigonda

Research Scientist,
Blindsight Corporation
