Have you tested with the English traineddata from the git tessdata repo?
Please see https://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html

Try with patterns like these in /path/to/eng.user-patterns:

    1-\d\d\d-GOOG-411
    www.\n\\\*.com

I haven't tried this personally, though.

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Tue, Nov 11, 2014 at 10:20 PM, <[email protected]> wrote:

> I am working on getting Tesseract to recognize VINs for an application I
> am developing. I have a clean VIN image (worked around to be black text
> on a white background). As a first attempt, I have traineddata built from
> the fonts Courier, HelveticaNeue, LatoBold, LatoLight, OpenSans, and
> RobotoSlab. I've also limited the unicharset to 0-9 and A-Z except I and O.
>
> The result is not very good. It returns far more characters than are
> actually present (17). Is there a way to limit Tesseract to detecting
> only a single 17-character word on one line? I'd also like to have
> Tesseract prefer, but not require, the last 5 characters to be digits.
> There are a few other preferences that may help too, but I want to start
> with these. I'm not sure how to go about setting up those preferences.
>
> Also, any suggestions beyond these for cleaning up the OCR so that it
> reads more correctly would be helpful. I can't post full data and images
> here (they're VINs, and I'd need permission to do so), but I can say that
> in one instance WM is coming back as 6W6M, and in another the digits
> 67258 are coming back as 572S5.
>
> Any guidance would be appreciated. I'll provide whatever information I can.
>
> Thanks!
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/1766c3a2-f13d-407b-a474-ad1fa8c7868c%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
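
For the 17-character limit and the "prefer, but not require, digits at the end" preference, one option is to post-process the OCR string outside Tesseract. The following is only a minimal Python sketch of that idea, not a Tesseract feature: the function names are hypothetical, the alphabet is the one described in the question (0-9 and A-Z without I and O), and the scoring rule (count the digits among the last five characters) is an assumption chosen for illustration.

```python
# Alphabet taken from the question: digits plus A-Z without I and O.
VIN_ALPHABET = "0123456789ABCDEFGHJKLMNPQRSTUVWXYZ"
VIN_LENGTH = 17


def vin_candidates(ocr_text):
    """Return every 17-character window of allowed characters in the OCR text."""
    # Upper-case the text and drop whitespace and any character outside the alphabet.
    cleaned = "".join(ch for ch in ocr_text.upper() if ch in VIN_ALPHABET)
    # Slide a 17-character window over what is left (empty list if too short).
    return [cleaned[i:i + VIN_LENGTH]
            for i in range(len(cleaned) - VIN_LENGTH + 1)]


def trailing_digit_score(candidate):
    """Count the digits among the last five characters (higher is preferred)."""
    return sum(ch.isdigit() for ch in candidate[-5:])


def best_vin(ocr_text):
    """Pick the candidate with the most digits in its last five characters.

    Returns None when the text holds fewer than 17 allowed characters.
    Ties go to the earliest window, because max() keeps the first maximum.
    """
    candidates = vin_candidates(ocr_text)
    if not candidates:
        return None
    return max(candidates, key=trailing_digit_score)
```

Calling best_vin() on the raw Tesseract output returns a single 17-character string, or None if too few valid characters were recognized. This only trims and ranks what was already recognized; it cannot repair substitutions such as 67258 being read as 572S5.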

