Also see https://groups.google.com/forum/#!topic/tesseract-ocr/et7bS5QRf2o

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Tue, Nov 11, 2014 at 11:02 PM, ShreeDevi Kumar <[email protected]> wrote:

> Have you tested with the English traineddata from the git tessdata repo?
>
> Please see
> https://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html
>
> try with these,
>
> /path/to/eng.user-patterns:
>
> 1-\d\d\d-GOOG-411
> www.\n\\\*.com
>
> I haven't tried this personally though
>
> ShreeDevi
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> On Tue, Nov 11, 2014 at 10:20 PM, <[email protected]> wrote:
>
>> I am working on getting Tesseract to recognize VINs for an application I
>> am developing. I have a clean VIN image (worked around to be black text on
>> white background). Have traineddata using fonts Courier, HelveticaNeue,
>> LatoBold, LatoLight, OpenSans, and RobotoSlab as a first attempt. I've also
>> limited the unicharset to A-Z except I and O and 0-9.
>>
>> The result is not very good. It returns a great deal of characters that
>> surpass the number of characters present (17). Is there a way to limit
>> tesseract to only detecting a 17 character word in one line? I'd also like
>> to have tesseract prefer, but not require, the last 5 characters to be
>> digits. There are a few other preferences that may help too, but I want to
>> start with these. I'm not sure how to go about setting up those preferences.
>>
>> Also, any suggestions past these on being able to clean up the OCR to
>> read more correctly would be helpful. I can't post full data and image here
>> (they're VINs. I'd need permission to do so), but I can say that in one
>> instance WM is coming back as 6W6M and that the digits 67258 are coming
>> back as 572S5 in another.
>>
>> Any guidance would be appreciated. I'll provide whatever information I
>> can.
>>
>> Thanks!
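
The two preferences in the question (exactly 17 characters, last 5 preferably digits) can also be approximated after OCR, outside Tesseract. The following is only a minimal post-processing sketch in Python, not a Tesseract feature and not something tested in this thread. The function name `vin_candidates` is hypothetical, and the allowed alphabet is taken from the question (0-9 and A-Z without I and O). It does not change what Tesseract recognizes; it only filters and ranks the returned text.

```python
import re

# Alphabet from the question: digits plus A-Z without I and O.
# Assumption: any other character cannot be part of the VIN.
VIN_CHARS = "0-9A-HJ-NP-Z"

# A lookahead makes the matches overlap, so every 17-character
# window inside a longer run of valid characters is found.
VIN_RE = re.compile(rf"(?=([{VIN_CHARS}]{{17}}))")


def vin_candidates(ocr_text):
    """Return all 17-character windows of valid characters, best first."""
    # Drop whitespace and normalise case before searching.
    cleaned = re.sub(r"\s+", "", ocr_text.upper())
    windows = [m.group(1) for m in VIN_RE.finditer(cleaned)]
    # Soft preference, not a requirement: windows with more digits
    # among their last five characters are ranked first.
    return sorted(windows, key=lambda w: -sum(c.isdigit() for c in w[-5:]))
```

If the OCR output contains one character too many, this returns both possible 17-character windows and puts the one ending in five digits first. Misreads inside the string (such as S for 5) still need to be handled separately.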

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAG2NduUKtHbCU0BoM%3DxegRn5Mkdk9JiJ_kj9H0K6Bm7r4pqakg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

