Have you tested with the English traineddata from the git tessdata repo?

Please see
https://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html

Try these patterns:

/path/to/eng.user-patterns:

1-\d\d\d-GOOG-411
www.\n\\\*.com

I haven't tried this personally, though.
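A minimal, untested sketch of how a patterns file might be combined with single-line page segmentation and a character whitelist for the VIN case. Several things here are assumptions. The pattern (12 × `\n` followed by 5 × `\d`) is a hypothetical encoding of "the last 5 characters are digits"; my reading of the man page is that `\n` matches an alphanumeric and `\d` a digit. The option spellings `--psm` and `--user-patterns` are those of newer Tesseract releases: 3.x used `-psm` and enabled patterns through the `user_patterns_suffix` config variable, so check the man page for your version. Whitelist support may also depend on the engine and version. As I understand it, user patterns bias recognition rather than force it, which fits the "prefer, but not require" requirement.

```python
from __future__ import annotations

import subprocess
from pathlib import Path

# Whitelist from the question: A-Z except I and O, plus 0-9.
VIN_CHARS = "ABCDEFGHJKLMNPQRSTUVWXYZ0123456789"


def vin_pattern() -> str:
    """Hypothetical pattern: 12 alphanumerics (\\n) followed by 5 digits (\\d)."""
    return "\\n" * 12 + "\\d" * 5


def write_patterns(path: Path) -> None:
    # One pattern per line, in the same style as the eng.user-patterns example.
    path.write_text(vin_pattern() + "\n", encoding="utf-8")


def build_command(image: str, out_base: str, patterns: str) -> list[str]:
    return [
        "tesseract", image, out_base,
        "-l", "eng",                  # or the name of your custom traineddata
        "--psm", "7",                 # treat the image as a single text line
        "--user-patterns", patterns,  # newer releases; 3.x differs (see above)
        "-c", f"tessedit_char_whitelist={VIN_CHARS}",
    ]


def run_ocr(image: str, out_base: str, patterns: str) -> str:
    subprocess.run(build_command(image, out_base, patterns), check=True)
    # Tesseract writes its plain-text result to <out_base>.txt
    return Path(out_base + ".txt").read_text(encoding="utf-8").strip()
```

For example, `write_patterns(Path("vin.user-patterns"))` followed by `run_ocr("vin.png", "out", "vin.user-patterns")` would OCR one line with the pattern and whitelist applied. The file names are placeholders.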


ShreeDevi

On Tue, Nov 11, 2014 at 10:20 PM, <[email protected]> wrote:

> I am working on getting Tesseract to recognize VINs for an application I
> am developing. I have a clean VIN image (preprocessed into black text on a
> white background). As a first attempt, I built traineddata using the fonts
> Courier, HelveticaNeue, LatoBold, LatoLight, OpenSans, and RobotoSlab. I've
> also limited the unicharset to A-Z (except I and O) and 0-9.
>
> The result is not very good. It returns many more characters than are
> actually present (17). Is there a way to limit Tesseract to detecting only
> a single 17-character word on one line? I'd also like Tesseract to prefer,
> but not require, the last 5 characters to be digits. There are a few other
> preferences that may help too, but I want to start with these. I'm not sure
> how to go about setting up those preferences.
>
> Also, any suggestions beyond these for cleaning up the OCR so it reads
> more accurately would be helpful. I can't post the full data and images here
> (they're VINs; I'd need permission to do so), but I can say that in one
> instance WM comes back as 6W6M, and in another the digits 67258 come
> back as 572S5.
>
> Any guidance would be appreciated. I'll provide whatever information I can.
>
> Thanks!
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/1766c3a2-f13d-407b-a474-ad1fa8c7868c%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/1766c3a2-f13d-407b-a474-ad1fa8c7868c%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>
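On the question of forcing an exact 17-character result, one workaround that does not depend on any Tesseract setting is to post-process the output. The sketch below is plain Python, not part of Tesseract, and works in four steps:

1. It strips everything outside A-Z/0-9 and reads I, O and Q as 1, 0 and 0, since VINs never use those letters.
2. It slides a 17-character window over what is left.
3. For each window it prefers a reading whose last 5 characters have look-alike letters (B, S, Z, G, D) swapped for digits.
4. It accepts the first window whose 9th character matches the North American VIN check digit.

Several limits apply. The look-alike table is an assumption to tune for your fonts. The check digit is not used by every country's VINs. This cannot repair dropped or reordered characters such as 67258 → 572S5.

```python
import re

VIN_LEN = 17
DIGITS = "0123456789"
# VINs never use I, O or Q, so read them as the digits they are usually confused with.
NO_IOQ = str.maketrans("IOQ", "100")
# Assumed letter -> digit look-alikes, tried only on the last 5 positions.
TAIL_TO_DIGIT = str.maketrans("BSZGD", "85260")
# Transliteration values and position weights for the North American check digit.
TRANSLIT = dict(zip("ABCDEFGHJKLMNPRSTUVWXYZ",
                    [1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 7, 9, 2, 3, 4, 5, 6, 7, 8, 9]))
WEIGHTS = [8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2]


def char_value(c: str) -> int:
    return int(c) if c in DIGITS else TRANSLIT[c]


def check_digit(vin: str) -> str:
    """Expected 9th character: weighted sum mod 11, with 10 written as 'X'."""
    r = sum(char_value(c) * w for c, w in zip(vin, WEIGHTS)) % 11
    return "X" if r == 10 else str(r)


def passes_check(vin: str) -> bool:
    return (len(vin) == VIN_LEN
            and all(c in DIGITS or c in TRANSLIT for c in vin)
            and check_digit(vin) == vin[8])


def coerce_tail(window: str) -> str:
    """Swap look-alike letters for digits in the last 5 characters only."""
    return window[:-5] + window[-5:].translate(TAIL_TO_DIGIT)


def best_vin(raw: str):
    """Return (candidate, passed_check), or None if fewer than 17 usable chars."""
    cleaned = re.sub(r"[^A-Z0-9]", "", raw.upper()).translate(NO_IOQ)
    windows = [cleaned[i:i + VIN_LEN] for i in range(len(cleaned) - VIN_LEN + 1)]
    for w in windows:
        # Prefer the digit-tail reading, but fall back to the raw window.
        for cand in (coerce_tail(w), w):
            if passes_check(cand):
                return cand, True
    return (coerce_tail(windows[0]), False) if windows else None
```

For example, `best_vin("VIN: 1M8GDM9AXKP042788.")` returns `("1M8GDM9AXKP042788", True)`, a widely used sample VIN whose check digit is X. A `False` flag means no window validated, so that result is better sent for manual review than trusted.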
