[tesseract-ocr] Re: Pharmaceutics OCR recognition project

Paul Sat, 14 Jun 2014 07:12:20 -0700

Could you show us an example image that gives you bad results?

It would probably help to use a different technique for image
binarization.
Tesseract uses Otsu's method. I would suggest trying a method like this one
<http://www.imlab.jp/cbdar2007/proceedings/papers/O1-1.pdf> by Kasar et al.
It can help with colored imagery and with white text on a black or colored
background.
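
To make the difference concrete, here is a minimal sketch of Otsu's global
threshold in plain Python. It is only an illustration of the idea, not
Tesseract's actual code and not the Kasar et al. method; the function names
and the flat list of 0..255 gray values are assumptions made for the example.

```python
def otsu_threshold(pixels):
    """Return the gray level t (0..255) that maximises the between-class
    variance when pixels <= t form one class and pixels > t the other."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the low class
    sum0 = 0  # sum of gray values in the low class
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue          # low class still empty
        w1 = total - w0
        if w1 == 0:
            break             # high class empty, nothing left to split
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2.
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(pixels, threshold):
    """Map each pixel to 1 (brighter than threshold) or 0 (otherwise)."""
    return [1 if p > threshold else 0 for p in pixels]
```

Because one threshold is chosen for the whole image, a package with several
background colors, or with both dark-on-light and light-on-dark text, can
lose some of its text to the wrong side of the threshold. A locally
adaptive method is meant to avoid exactly that.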


Your idea of adding a drug dictionary could also be beneficial. You don't 
necessarily need to start a new training, though.
Using the "bazaar" config file with your own "eng.user-words" file might be 
enough (see 
http://tesseract-ocr.googlecode.com/svn-history/r1116/trunk/doc/tesseract.1.html).
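
As a rough sketch of that workflow: the user-words file is plain text with
one word per line, placed next to the language data. The helper names, the
tessdata path and the sample words below are assumptions for illustration.
The command line follows my reading of the man page linked above, where the
config name is passed as the last argument.

```python
import os


def write_user_words(words, tessdata_dir, lang="eng"):
    """Write one word per line to <tessdata_dir>/<lang>.user-words
    and return the path of the file that was written."""
    path = os.path.join(tessdata_dir, lang + ".user-words")
    # Drop blanks and duplicates so the list stays clean.
    unique = sorted(set(w.strip() for w in words if w.strip()))
    with open(path, "w", encoding="utf-8") as f:
        for w in unique:
            f.write(w + "\n")
    return path


def tesseract_command(image, output_base, lang="eng"):
    """Build the argument list for a run that uses the "bazaar" config."""
    return ["tesseract", image, output_base, "-l", lang, "bazaar"]
```

For example, write_user_words(["Aspirin", "ibuprofen"], "/path/to/tessdata")
creates the word list, and the list returned by
tesseract_command("package.png", "out") can be handed to subprocess.run.
Whether the extra words actually improve recognition of drug names would
need to be checked on your own images.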


On Wednesday, 11 June 2014 12:49:34 UTC+2, elena bresciani wrote:
>
> Hello to everybody,
>
> for the project I'm working on I need to automatically recognize a drug 
> from an image of its package. 
> I tried tesseract, but the results were not very good. In particular, 
> certain words (especially the drug names) are sometimes badly misread, and 
> other words (even ones printed in big fonts) are missing entirely.
>
> How can I resolve my issues?
> Maybe I have to train tesseract with a "drug-dictionary"?
> And how can I resolve the problem of completely missing words?
>
> Thank you in advance
>
> Cheers
> Elena
>

