That may work then. Is there any documentation on patterns that you know
of? Syntax, format, anything? I'm not sure how to go about formatting my
patterns.
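
For anyone following along, here is a rough sketch (Python, just for
illustration) of the config file format quoted from the man page further down:
one variable per line, with a space separating variable from value. The
function name parse_tesseract_config is made up for this example, and the
sample variable names other than tessedit_char_whitelist are what I believe
the bazaar config sets, so check them against the linked file before relying
on them.

```python
def parse_tesseract_config(text):
    """Parse config text: one variable per line, whitespace between name and value.

    Hypothetical helper for illustration; it is not part of Tesseract.
    """
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first run of whitespace only, so a value may contain spaces.
        parts = line.split(None, 1)
        key = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        settings[key] = value
    return settings


# Sample config text. The last four names are assumed to match the bazaar config.
example = """tessedit_char_whitelist 0123456789
load_system_dawg     F
load_freq_dawg       F
user_words_suffix    user-words
user_patterns_suffix user-patterns
"""

settings = parse_tesseract_config(example)
for key, value in settings.items():
    print(key, "=", value)
```

Each resulting key/value pair could then be handed to a call like
setVariableValue:forKey: instead of loading a config file, assuming the
wrapper forwards those variables to Tesseract before initialization.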


On Wed, Nov 12, 2014 at 10:12 AM, ShreeDevi Kumar <[email protected]>
wrote:

> bazaar is nothing but a config file which sets values for a set of config
> variables, please see
>
>
> https://code.google.com/p/tesseract-ocr/source/browse/tessdata/configs/bazaar
>
> So, if patterns are helpful, you can set that as a config.
>
> ShreeDevi
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> On Wed, Nov 12, 2014 at 9:09 PM, Steven Norris <[email protected]> wrote:
>
>> In a way. I can set values for keys that would appear in a config file.
>> Like the below:
>>
>> [tesseract setVariableValue:@"0123456789" forKey:@"tessedit_char_whitelist"];
>>
>>
>> On Wed, Nov 12, 2014 at 12:30 AM, ShreeDevi Kumar <[email protected]>
>> wrote:
>>
>>> Are you able to pass a configuration variable with iOS CocoaPod ?
>>>
>>> *-c configvar=value*
>>>
>>> Set value for control parameter. Multiple -c arguments are allowed.
>>>
>>>
>>> *configfile*
>>>
>>> The name of a config to use. A config is a plaintext file which contains
>>> a list of variables and their values, one per line, with a space separating
>>> variable from value.
>>>
>>> ShreeDevi
>>>
>>> On Wed, Nov 12, 2014 at 10:33 AM, Steven Norris <[email protected]>
>>> wrote:
>>>
>>>> I did see that. Unfortunately I cannot use bazaar, as the final version
>>>> of what I'm using will be using an iOS CocoaPod that does not support the
>>>> bazaar functionality of Tesseract.
>>>>
>>>> On Tue, Nov 11, 2014 at 8:51 PM, ShreeDevi Kumar <[email protected]>
>>>> wrote:
>>>>
>>>>> On Wed, Nov 12, 2014 at 2:13 AM, <[email protected]> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> The user-patterns looks helpful, but I can't find any documentation
>>>>>> on formatting or how it works. Is there documentation on this somewhere?
>>>>>>
>>>>>
>>>>>
>>>>> Did you see the man page? I had also sent a link to a related
>>>>> discussion in the past. Search the archives for other tips.
>>>>>
>>>>> https://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html
>>>>> says
>>>>> "if you pass the word *bazaar* as a trailing command line parameter
>>>>> to Tesseract, Tesseract will not bother loading the system dictionary nor
>>>>> the dictionary of frequent words and will load and use the eng.user-words
>>>>> and eng.user-patterns files you provided. The former is a simple word list,
>>>>> one per line. The format of the latter is documented in dict/trie.h on
>>>>> read_pattern_list()."
>>>>>
>>>>> https://code.google.com/p/tesseract-ocr/source/browse/dict/trie.h
>>>>> see lines 199-232
>>>>>
>>>>>>
>>>>>>
>>>>>> On Tuesday, November 11, 2014 10:50:57 AM UTC-6, [email protected]
>>>>>> wrote:
>>>>>>>
>>>>>>> I am working on getting Tesseract to recognize VINs for an
>>>>>>> application I am developing. I have a clean VIN image (worked around to
>>>>>>> be black text on a white background). I have traineddata using the fonts
>>>>>>> Courier, HelveticaNeue, LatoBold, LatoLight, OpenSans, and RobotoSlab as
>>>>>>> a first attempt. I've also limited the unicharset to A-Z (except I and O)
>>>>>>> and 0-9.
>>>>>>>
>>>>>>> The result is not very good. It returns far more characters than the
>>>>>>> number actually present (17). Is there a way to limit Tesseract to
>>>>>>> detecting only a single 17-character word on one line? I'd also like to
>>>>>>> have Tesseract prefer, but not require, the last 5 characters to be
>>>>>>> digits. There are a few other preferences that may help too, but I want
>>>>>>> to start with these. I'm not sure how to go about setting up those
>>>>>>> preferences.
>>>>>>>
>>>>>>> Also, any suggestions beyond these for cleaning up the OCR so it reads
>>>>>>> more correctly would be helpful. I can't post full data and images here
>>>>>>> (they're VINs; I'd need permission to do so), but I can say that in one
>>>>>>> instance WM is coming back as 6W6M and that the digits 67258 are coming
>>>>>>> back as 572S5 in another.
>>>>>>>
>>>>>>> Any guidance would be appreciated. I'll provide whatever information
>>>>>>> I can.
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>  --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "tesseract-ocr" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>> send an email to [email protected].
>>>>>> To post to this group, send email to [email protected].
>>>>>> Visit this group at http://groups.google.com/group/tesseract-ocr.
>>>>>> To view this discussion on the web visit
>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/065a4b64-bcba-4d02-bc81-461d9ae11655%40googlegroups.com.
>>>>>>
>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>
>>>>>
>>>>>  --
>>>>> You received this message because you are subscribed to a topic in the
>>>>> Google Groups "tesseract-ocr" group.
>>>>> To unsubscribe from this topic, visit
>>>>> https://groups.google.com/d/topic/tesseract-ocr/AyCNiju1x1Y/unsubscribe.
>>>>> To unsubscribe from this group and all its topics, send an email to
>>>>> [email protected].
>>>>> To post to this group, send email to [email protected].
>>>>> Visit this group at http://groups.google.com/group/tesseract-ocr.
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/tesseract-ocr/CAG2NduWoMKQg7enZUxOBfe35fCthkMOLvA6MmnwtqnuiFjacEw%40mail.gmail.com.
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> *Steven T. Norris*
>>>> *Software Engineer - Forty AU*
>>>>
>>>> *p: (615)997-0836*
>>>> *e: [email protected]*
>>>> *w: http://www.linkedin.com/in/steventnorris*
>>>>
>>>>
>>>
>>>
>>
>>
>>
>>
>
>



-- 
*Steven T. Norris*
*Software Engineer - Forty AU*

*p: (615)997-0836*
*e: [email protected]*
*w: http://www.linkedin.com/in/steventnorris*
