Hi everybody,

I'm writing an application to automatically scan a large number of postal orders, using the TessNet2 library from C#. Tesseract is great and recognizes almost everything on the postal order. However, because some fields contain only numbers and others only letters, I want to process individual subimages of the whole picture, adjusting the tessedit_char_blacklist and tessedit_char_whitelist variables for each one.

Processing the entire picture gives great results (though some letters are still recognized as numbers, such as '0' instead of 'O'). Processing a single subimage, particularly this one, gives no results at all:

http://www.francescovannini.com/pub/importo.jpg

The library detects only a tilde in this image, strangely with a confidence of 100/255. Unfortunately, this is the only part of the postal order image that I can publish, because of sensitive data concerns.

Is there something that I can tune? Presumably, processing the entire picture gives Tesseract more information about font features than processing this subimage alone. That is the only explanation I can think of. But how can I process a subimage with a particular whitelist while achieving the same accuracy that processing the entire picture gives?

Thank you in advance.
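
P.S. For reference, here is a minimal sketch of the per-field setup described above. The file name, the field rectangle, the tessdata path, the language and the whitelist contents are all placeholders rather than real values, and the call order simply follows the TessNet2 sample code. It passes the field's rectangle to DoOCR on the full page instead of cropping a separate bitmap first; whether that behaves any differently from cropping is something I have not verified.

```csharp
using System;
using System.Collections.Generic;
using System.Drawing;

class FieldOcrSketch
{
    static void Main()
    {
        // Placeholder input file; the real scans cannot be published.
        using (Bitmap page = new Bitmap("postal_order.jpg"))
        {
            // Placeholder coordinates for a numeric field on the form.
            Rectangle amountField = new Rectangle(400, 120, 300, 60);

            tessnet2.Tesseract ocr = new tessnet2.Tesseract();

            // Restrict recognition to the characters expected in this field.
            ocr.SetVariable("tessedit_char_whitelist", "0123456789,.");

            // Placeholder tessdata folder and language.
            ocr.Init(@"C:\tessdata", "eng", false);

            // OCR only the field's region of the full page.
            List<tessnet2.Word> words = ocr.DoOCR(page, amountField);
            foreach (tessnet2.Word word in words)
            {
                Console.WriteLine("{0} : {1}", word.Confidence, word.Text);
            }
        }
    }
}
```

A letters-only field would use the same code with a different rectangle and whitelist.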

