When you say that the binarized image looked perfect but the accuracy was still poor, my best guess is that the font used on those tickets is the culprit. You could try creating training data specifically for this font.
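For reference, here is a minimal sketch of what the adaptive (local mean) thresholding step computes, written in plain Python so it runs without any library. The function name `adaptive_mean_threshold` and the parameters `window` and `offset` are my own placeholders, not part of Tesseract, Leptonica, or OpenCV, and this is not necessarily the exact variant you used. As far as I know, OpenCV's `cv2.adaptiveThreshold` with `ADAPTIVE_THRESH_MEAN_C` follows the same idea, though its border handling differs.

```python
def adaptive_mean_threshold(gray, window=15, offset=10):
    """Binarize a grayscale image given as a list of rows of 0-255 ints.

    A pixel becomes 255 (background) if it is brighter than the mean of its
    window x window neighbourhood minus `offset`; otherwise it becomes 0 (ink).
    `window` and `offset` are illustrative defaults, not tuned values.
    """
    h, w = len(gray), len(gray[0])

    # Integral image with a zero border: integ[y][x] = sum of gray[0:y][0:x].
    integ = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += gray[y][x]
            integ[y + 1][x + 1] = integ[y][x + 1] + row_sum

    r = window // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        # Clip the window at the image border instead of padding.
        y0, y1 = max(0, y - r), min(h, y + r + 1)
        for x in range(w):
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            area = (y1 - y0) * (x1 - x0)
            total = (integ[y1][x1] - integ[y0][x1]
                     - integ[y1][x0] + integ[y0][x0])
            local_mean = total / area
            out[y][x] = 255 if gray[y][x] > local_mean - offset else 0
    return out
```

Because the threshold follows the local mean, uneven lighting does not turn whole regions black the way a single global threshold can. It does not change the glyph shapes, though, so an unusual font can still be misrecognized even when the binarized image looks clean.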
On Wednesday, January 1, 2014 at 13:22:31 UTC+1, Muhammad Muaz wrote:
>
> Hello, I am trying to recognize characters from images taken with a
> *mobile camera* at *72dpi* resolution, within 2-2.5 secs including
> complete processing. The images can be found at the following link:
>
> Tickets for OCR<https://picasaweb.google.com/107072433218124342258/TicketsForOCR?authuser=0&feat=directlink>
>
> The tickets have
>
> - somewhat bad lighting
> - non-text areas
> - low resolution
>
> I tried feeding the image directly to the tesseract API and it gives me
> 70% good results in 1 sec on average. But I want to increase the accuracy
> while keeping the time factor in mind.
> So far I have tried
>
> 1. Detecting edges of the image
> 2. Blob analysis for blobs
> 3. Binarizing the ticket using adaptive thresholding
>
> Then I tried feeding those binarized images to tesseract; the accuracy
> dropped to 50-60% or less, though the binarized images look perfect. I
> also looked into a few research papers
>
> - http://www.vincent-net.com/luc/papers/10wiley_morpho_DIAapps.pdf
> - http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.193.6347&rep=rep1&type=pdf
> - http://iit.demokritos.gr/~bgat/PatRec2006.pdf
> - http://psych.stanford.edu/~jlm/pdfs/Sternberg67.pdf
>
> but no luck. Kindly help me with this, and sorry if my question is so basic.
> Also, I am trying not to use a command line solution; I would prefer
> *Leptonica* and *OpenCV*.
>

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en

