Hi everybody,
I'm writing an application to automatically scan large numbers of
postal orders, using the TessNet2 library from C#. Tesseract is great
and recognizes just about everything on the postal order. But because
some fields contain only numbers and others only letters, I want to
process individual subimages of the whole picture, adjusting the
tessedit_char_blacklist and tessedit_char_whitelist variables for each
of them.
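
To make the per-field idea concrete, here is a minimal sketch in plain
Python (standard library only, so it does not call TessNet2 or
Tesseract). The field names, pixel boxes, and helper functions are all
hypothetical and only illustrate the setup: each region of the postal
order carries its own whitelist, which would be applied as
tessedit_char_whitelist before that region is recognized.

```python
import string
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str       # hypothetical field label
    box: tuple      # (left, top, right, bottom) in pixels; made-up values
    whitelist: str  # characters this field is allowed to contain


# Hypothetical layout: one numeric field and one letters-only field.
FIELDS = [
    Field("amount", (400, 50, 620, 90), string.digits + ",."),
    Field("payee", (40, 120, 620, 160), string.ascii_uppercase + " "),
]


def tesseract_variables(field):
    """Return the variable settings to apply before recognizing this field."""
    return {"tessedit_char_whitelist": field.whitelist}


def violations(field, text):
    """List the characters of an OCR result that fall outside the whitelist."""
    return [c for c in text if c not in field.whitelist]


if __name__ == "__main__":
    for field in FIELDS:
        print(field.name, field.box, tesseract_variables(field))
    # A letter 'O' in a numeric field is flagged as out of place.
    print(violations(FIELDS[0], "12O,50"))
```

The violations helper is only a sanity check on results obtained from
the whole picture: it flags the '0' versus 'O' kind of confusion so that
those fields can be singled out for a second pass.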
But while processing the entire picture gives great results (apart
from a few letters recognized as digits, such as '0' instead of 'O'),
processing a single subimage, this one in particular, gives no usable
result at all: http://www.francescovannini.com/pub/importo.jpg
The library detects only a tilde in this image, strangely with a
confidence of 100/255. Unfortunately this is the only part of the
postal order image that I can publish, because of sensitive-data
concerns.
Is there something that I can tune? Presumably processing the entire
picture gives Tesseract more information about font features than
processing this subimage does. That's the only explanation that seems
plausible to me. But how can I process a subimage with a particular
whitelist while achieving the same accuracy that processing the entire
picture gives?
Thank you in advance.
