Also see https://groups.google.com/forum/#!topic/tesseract-ocr/et7bS5QRf2o



ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Tue, Nov 11, 2014 at 11:02 PM, ShreeDevi Kumar <[email protected]>
wrote:

> Have you tested with the English traineddata from the git tessdata repo?
>
>
> Please see
> https://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html
>
> Try with these:
>
> /path/to/eng.user-patterns:
>
> 1-\d\d\d-GOOG-411
> www.\n\\\*.com
>
>
>
> I haven't tried this personally, though.
>
>
>
> On Tue, Nov 11, 2014 at 10:20 PM, <[email protected]> wrote:
>
>> I am working on getting Tesseract to recognize VINs for an application I
>> am developing. I have a clean VIN image (preprocessed to be black text on a
>> white background). As a first attempt, I have built traineddata using the
>> fonts Courier, HelveticaNeue, LatoBold, LatoLight, OpenSans, and RobotoSlab.
>> I've also limited the unicharset to A-Z (except I and O) and 0-9.
>>
>> The result is not very good. It returns far more characters than the 17
>> that are actually present. Is there a way to limit Tesseract to detecting
>> only a single 17-character word on one line? I'd also like Tesseract to
>> prefer, but not require, the last 5 characters to be digits. There are a
>> few other preferences that may help too, but I want to start with these.
>> I'm not sure how to go about setting up those preferences.
>>
>> Also, any suggestions beyond these for cleaning up the OCR so that it
>> reads more correctly would be helpful. I can't post the full data and image
>> here (they're VINs, and I'd need permission to do so), but I can say that in
>> one instance WM is coming back as 6W6M, and in another the digits 67258 are
>> coming back as 572S5.
>>
>> Any guidance would be appreciated. I'll provide whatever information I
>> can.
>>
>> Thanks!
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at http://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/1766c3a2-f13d-407b-a474-ad1fa8c7868c%40googlegroups.com
>> <https://groups.google.com/d/msgid/tesseract-ocr/1766c3a2-f13d-407b-a474-ad1fa8c7868c%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
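
For the user-patterns idea quoted above, here is a minimal, untested sketch of
how a VIN-shaped pattern file could be generated. It assumes the token meanings
described in Tesseract's own documentation of user patterns (\d for a digit, \n
for a digit or letter); the helper names, the "eng" prefix, and the output
directory are hypothetical, and how strongly the engine actually favours such a
pattern is something you would have to check on your own images. The man page
linked above also describes a user_patterns_suffix config variable for loading
such a file.

```python
from pathlib import Path


def vin_pattern(total_length=17, trailing_digits=5):
    """Build a fixed-length user pattern: alphanumerics followed by digits.

    Assumption: in a Tesseract user-patterns file, \\n stands for one
    digit-or-letter and \\d for one digit.
    """
    alnum = r"\n" * (total_length - trailing_digits)
    digits = r"\d" * trailing_digits
    return alnum + digits


def write_user_patterns(directory, lang="eng"):
    """Write <lang>.user-patterns into `directory` (hypothetical location)."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{lang}.user-patterns"
    path.write_text(vin_pattern() + "\n", encoding="utf-8")
    return path


if __name__ == "__main__":
    print(vin_pattern())
```

Note that a pattern like this forces the last 5 positions to be digits within
the pattern itself; whether a non-matching reading is still allowed through
depends on how Tesseract weighs dictionary matches, which I have not verified.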

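Independently of Tesseract's own settings, the two constraints in the original
question (exactly 17 characters, last 5 preferably digits) can also be applied
after recognition. The following is only a sketch of that post-processing idea,
not something from the Tesseract API: it drops characters outside the alphabet
given in the question, slides a 17-character window over what remains, and
prefers the window with the most digits in its last 5 positions. All function
names are made up for illustration.

```python
# Alphabet from the original post: A-Z without I and O, plus 0-9.
VIN_ALPHABET = set("ABCDEFGHJKLMNPQRSTUVWXYZ0123456789")
VIN_LENGTH = 17
SERIAL_LENGTH = 5  # trailing positions that are preferred to be digits


def vin_candidates(raw_text):
    """Yield every 17-character window over the allowed characters."""
    cleaned = "".join(ch for ch in raw_text.upper() if ch in VIN_ALPHABET)
    for start in range(len(cleaned) - VIN_LENGTH + 1):
        yield cleaned[start:start + VIN_LENGTH]


def trailing_digit_score(candidate):
    """Count the digits among the last SERIAL_LENGTH characters."""
    return sum(ch.isdigit() for ch in candidate[-SERIAL_LENGTH:])


def best_vin(raw_text):
    """Return the window with the most trailing digits, or None if too short.

    Digits are preferred but not required; ties go to the earliest window.
    """
    candidates = list(vin_candidates(raw_text))
    if not candidates:
        return None
    return max(candidates, key=trailing_digit_score)


if __name__ == "__main__":
    # Made-up string, not a real VIN.
    print(best_vin("6W 1ABCDEFGH2JK34567"))
```

This only trims extra characters at the two ends of the string. It does not
remove spurious characters inserted in the middle (such as WM read as 6W6M),
which would need a separate step.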
