Could you perhaps show us an example image that gives you bad results? It might also help to use a different technique for image binarization. Tesseract uses Otsu's method; I would suggest a method like the one by Kasar et al. (<http://www.imlab.jp/cbdar2007/proceedings/papers/O1-1.pdf>). It can help with colored images and with white text on black or colored backgrounds.
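For reference, here is a minimal pure-Python sketch of Otsu's global threshold, the method Tesseract uses as mentioned above. It also includes a polarity heuristic that flips light-on-dark text so the output is always dark text on a light background. That heuristic is an assumption of this sketch ("text covers less area than background") and is not part of Otsu's method or of the Kasar et al. algorithm. A single global threshold like this one tends to struggle on colored packaging, which is one reason a method designed for colored and inverted text, like the one linked above, can do better.

```python
def otsu_threshold(pixels):
    """Return Otsu's threshold for a flat sequence of 8-bit gray values (0..255).

    Pixels <= threshold form one class, pixels > threshold the other.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    sum_bg = 0      # sum of gray values in the "<= t" class
    weight_bg = 0   # number of pixels in the "<= t" class
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Between-class variance, up to a constant factor of 1 / total**2.
        var_between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(gray):
    """Binarize rows of gray values into 0 (text) / 255 (background).

    Assumption (heuristic, not from Otsu or Kasar et al.): if the dark class
    is the majority, the text is probably light on a dark background, so the
    classes are swapped to always yield dark text on a light background.
    """
    flat = [p for row in gray for p in row]
    t = otsu_threshold(flat)
    dark = sum(1 for p in flat if p <= t)
    light_text = dark > len(flat) - dark
    if light_text:
        return [[0 if p > t else 255 for p in row] for row in gray]
    return [[0 if p <= t else 255 for p in row] for row in gray]
```

With this heuristic, a dark word on a white label and the same word printed white on a dark band produce the same binary image, which is the kind of input Tesseract handles best.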
Your idea of adding a drug dictionary could also help. You don't necessarily need to start a new training, though. Using the "bazaar" config with your own "eng.user-words" file might be enough (see http://tesseract-ocr.googlecode.com/svn-history/r1116/trunk/doc/tesseract.1.html).

On Wednesday, 11 June 2014 12:49:34 UTC+2, elena bresciani wrote:
> Hello everybody,
>
> for the project I'm working on I need to automatically recognize a drug
> from an image of its package.
> I tried Tesseract but without good results. In particular, some words
> (especially the drug names) are completely misinterpreted, and other
> words (even ones printed in big fonts) are missing.
>
> How can I resolve these issues?
> Maybe I have to train Tesseract with a "drug dictionary"?
> And how can I solve the problem of completely missing words?
>
> Thank you in advance
>
> Cheers
> Elena
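The user-words suggestion in the reply can be sketched as follows, assuming the Tesseract 3.x behaviour described in the linked man page, where the "bazaar" config makes Tesseract load a "<lang>.user-words" file (one word per line) from the tessdata directory. The helper names write_user_words and tesseract_command, the tessdata_dir argument, and the example drug names are hypothetical. The command is only built here, not run, so you can check it against your installed version first.

```python
import os


def write_user_words(words, tessdata_dir, lang="eng"):
    """Write one unique, non-empty word per line to <tessdata_dir>/<lang>.user-words.

    tessdata_dir is hypothetical: point it at your installation's tessdata folder.
    Returns the path of the written file.
    """
    path = os.path.join(tessdata_dir, f"{lang}.user-words")
    unique = sorted({w.strip() for w in words if w.strip()})
    with open(path, "w", encoding="utf-8") as fh:
        for w in unique:
            fh.write(w + "\n")
    return path


def tesseract_command(image_path, output_base, lang="eng"):
    """Build (but do not run) a Tesseract 3.x command line using the 'bazaar' config."""
    return ["tesseract", image_path, output_base, "-l", lang, "bazaar"]
```

Usage, once the paths are adapted: call write_user_words(["ibuprofen", "paracetamol"], "/path/to/tessdata"), then run the command with subprocess.run(tesseract_command("package.png", "out"), check=True); the recognized text ends up in out.txt.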

