On Wed, Nov 12, 2014 at 2:13 AM, <[email protected]> wrote:

>
>
> The user-patterns file looks helpful, but I can't find any documentation on
> its format or how it works. Is this documented somewhere?
>


Did you see the man page? I had also sent a link to a related discussion in
the past. Search the archives for other tips.

The man page at
https://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html
says:
"if you pass the word *bazaar* as a trailing command line parameter to
Tesseract, Tesseract will not bother loading the system dictionary nor the
dictionary of frequent words and will load and use the eng.user-words and
eng.user-patterns files you provided. The former is a simple word list, one
per line. The format of the latter is documented in dict/trie.h on
read_pattern_list()."
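
As a rough illustration of the invocation the man page describes, here is a
minimal Python sketch that builds the command line with bazaar as the trailing
parameter. The function names and the file names in the usage note are
placeholders of mine. The optional -psm value is also my assumption and is not
mentioned in the man page excerpt above; check `tesseract --help` for the
spelling your version accepts.

```python
import subprocess


def build_tesseract_command(image, output_base, lang="eng", psm=None):
    """Return the argv list for a tesseract run that uses the bazaar config."""
    cmd = ["tesseract", image, output_base, "-l", lang]
    if psm is not None:
        # Assumption: the 3.x command line tool spells this flag "-psm".
        cmd += ["-psm", str(psm)]
    # "bazaar" goes last, as a trailing command line parameter.
    cmd.append("bazaar")
    return cmd


def run_tesseract(image, output_base, **kwargs):
    """Run tesseract; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(build_tesseract_command(image, output_base, **kwargs), check=True)
```

Something like run_tesseract("vin.png", "vin_out", psm=7) would then write the
result to vin_out.txt, assuming eng.user-words and eng.user-patterns are
already where Tesseract expects them (the tessdata directory, as far as I
know).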

See lines 199-232 of
https://code.google.com/p/tesseract-ocr/source/browse/dict/trie.h
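
For a VIN, a pattern file could be generated along these lines. This is only a
sketch: it assumes, from my reading of the comment in trie.h, that \n stands
for an alphanumeric character and \d for a digit, so please verify against the
header before relying on it. The helper names are mine. As far as I
understand, patterns act as dictionary hints rather than hard limits on the
output.

```python
from pathlib import Path

VIN_LENGTH = 17       # characters in a VIN, per the original question
TRAILING_DIGITS = 5   # the poster prefers the last 5 characters to be digits


def vin_pattern(length=VIN_LENGTH, trailing_digits=TRAILING_DIGITS):
    """Build one pattern line: alphanumerics followed by trailing digits.

    Assumption: in the user-patterns syntax, a backslash followed by n
    means "letter or digit" and a backslash followed by d means "digit".
    """
    if not 0 <= trailing_digits <= length:
        raise ValueError("trailing_digits must be between 0 and length")
    return r"\n" * (length - trailing_digits) + r"\d" * trailing_digits


def write_user_patterns(path, patterns):
    """Write the patterns to a UTF-8 text file, one per line."""
    Path(path).write_text("\n".join(patterns) + "\n", encoding="utf-8")
```

Calling write_user_patterns("eng.user-patterns", [vin_pattern()]) would
produce a one-line file holding twelve \n tokens followed by five \d tokens.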



>
>
> On Tuesday, November 11, 2014 10:50:57 AM UTC-6, [email protected] wrote:
>>
>> I am working on getting Tesseract to recognize VINs for an application I
>> am developing. I have a clean VIN image (converted to black text on a
>> white background). I have traineddata built with the fonts Courier,
>> HelveticaNeue, LatoBold, LatoLight, OpenSans, and RobotoSlab as a first
>> attempt. I've also limited the unicharset to A-Z (except I and O) and 0-9.
>>
>> The result is not very good. It returns far more characters than the 17
>> that are actually present. Is there a way to limit Tesseract to detecting
>> a single 17-character word on one line? I'd also like Tesseract to prefer,
>> but not require, that the last 5 characters be digits. There are a few
>> other preferences that may help too, but I want to start with these. I'm
>> not sure how to go about setting up those preferences.
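
Independently of anything Tesseract itself can be told, the length limit and
the digit preference could also be applied to the raw OCR text afterwards.
The sketch below is one possible post-filter, not a Tesseract feature: it
slides a 17-character window over the cleaned output, keeps windows that use
only the character set described above (A-Z except I and O, plus 0-9), and
prefers the window with the most digits among its last 5 characters. All names
and the sample string are made up for illustration.

```python
import re

# A-Z without I and O, plus the digits, as described in the question.
ALLOWED = "ABCDEFGHJKLMNPQRSTUVWXYZ0123456789"
VIN_LENGTH = 17
# The lookahead makes overlapping 17-character windows visible to finditer.
WINDOW_RE = re.compile(f"(?=([{ALLOWED}]{{{VIN_LENGTH}}}))")


def candidates(ocr_text):
    """Return every 17-character window made only of allowed characters."""
    cleaned = re.sub(r"\s+", "", ocr_text.upper())
    return [m.group(1) for m in WINDOW_RE.finditer(cleaned)]


def digit_score(candidate, tail=5):
    """Count digits among the last `tail` characters (a soft preference)."""
    return sum(ch.isdigit() for ch in candidate[-tail:])


def best_candidate(ocr_text):
    """Pick the window with the most trailing digits, or None if none fit."""
    found = candidates(ocr_text)
    return max(found, key=digit_score) if found else None


# Made-up 18-character input: the second window ends in five digits.
print(best_candidate("ABCDEFGHJKLMN12345"))  # BCDEFGHJKLMN12345
```

Because the score only ranks candidates, a result whose last 5 characters are
not all digits is still returned when nothing better is available.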
>>
>> Also, any suggestions beyond these for cleaning up the OCR so that it
>> reads more accurately would be helpful. I can't post full data and images
>> here (they're VINs, so I'd need permission to do so), but I can say that
>> in one instance WM is coming back as 6W6M and that in another the digits
>> 67258 are coming back as 572S5.
>>
>> Any guidance would be appreciated. I'll provide whatever information I
>> can.
>>
>> Thanks!
>>
>  --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/065a4b64-bcba-4d02-bc81-461d9ae11655%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/065a4b64-bcba-4d02-bc81-461d9ae11655%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

