Re: [tesseract-ocr] Does the number in the .exp# file type matter?

ShreeDevi Kumar Sun, 24 Sep 2017 09:08:47 -0700

Please read tesstrain_utils.sh if you want to know the details.
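
On the naming question quoted below: as far as I can tell, the script names each generated tif/box pair lang.fontname.expN, where N is the exposure value itself, so the exposure number in the name should match what was rendered. A minimal sketch of that scheme follows. box_basename is a hypothetical helper, not part of tesstrain, and the font-name cleanup (spaces to underscores, commas dropped) is my assumption about what the script does.

```shell
# Hypothetical helper: build the base name for one font/exposure pair.
# Assumption: spaces in the font name become underscores and commas are dropped.
box_basename() {
    lang="$1"; font="$2"; exposure="$3"
    fontname=$(printf '%s' "$font" | tr ' ' '_' | sed 's/,//g')
    printf '%s.%s.exp%s\n' "$lang" "$fontname" "$exposure"
}

# One base name per exposure for a single font.
for exposure in -1 0 1; do
    box_basename eng "Times New Roman," "$exposure"
done
```

Under these assumptions the three exposures of one font differ only in the .expN suffix, and the font part of the name stays the same.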

Dictionary files are built from your sources in langdata. Unicharset is
also built from your training_text in langdata.
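
So adding dictionary data should come down to putting the source files in langdata before running tesstrain.sh. Here is a rough sketch for checking what is present. check_langdata is a hypothetical helper, and the file suffixes (training_text, wordlist, numbers, punc) are my assumption based on the layout of the langdata repository, so verify them against tesstrain_utils.sh.

```shell
# Hypothetical helper: report which langdata inputs exist for one language.
# Assumption: files follow the <lang>/<lang>.<suffix> layout of the langdata repository.
check_langdata() {
    langdata_dir="$1"; lang="$2"
    for suffix in training_text wordlist numbers punc; do
        file="$langdata_dir/$lang/$lang.$suffix"
        if [ -f "$file" ]; then
            echo "found   $file"
        else
            echo "missing $file"
        fi
    done
}

# Same --langdata_dir value as in the tesstrain.sh command quoted below.
check_langdata ../langdata eng
```

Any file reported as missing is one the script would have nothing to build from.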


On 24-Sep-2017 7:05 PM, "Dan9er" <[email protected]> wrote:

> That answer doesn't help me.
>
> How can I add dictionary files to tesstrain?
>
> On Saturday, September 23, 2017 at 12:05:37 PM UTC-4, shree wrote:
>>
>> You cannot use a random unicharset, it needs to be the same one used for
>> training the model.
>>
>> For multiple exposures, use the following method:
>>
>> training/tesstrain.sh \
>>   --fonts_dir /mnt/c/Windows/Fonts \
>>   --lang eng \
>>   --noextract_font_properties --linedata_only \
>>   --exposures "-1, 0, 1" \
>>   --langdata_dir ../langdata \
>>   --tessdata_dir ../tessdata \
>>   --fontlist \
>>     "Arial" \
>>     "Tahoma" \
>>     "Times New Roman," \
>>     "Sanskrit 2003," \
>>     "FreeSerif Italic" \
>>     "Times New Roman, Italic" \
>>   --output_dir ../tesstutorial/eng
>>
>>
>> ShreeDevi
>> ____________________________________________________________
>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>
>> On Sat, Sep 23, 2017 at 8:46 PM, Dan9er <[email protected]> wrote:
>>
>>> I'm making a unicharset file so that I can compile DAWG dictionary files
>>> for use with tesstrain.sh. I want to use multiple exposures (-1, 0, 1)
>>> for the tiff/box pairs. How should I name them to separate the
>>> different exposures?
>>>
>>> Can I do this?:
>>>
>>> lang.Arial.exp0
>>> lang.Arial.exp1
>>> lang.Arial.exp2
>>>
>>> Or will changing the file numbers screw things up? As an alternative,
>>> can I do this?:
>>>
>>> lang.Arial0.exp0
>>> lang.Arial1.exp0
>>> lang.Arial2.exp0
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit https://groups.google.com/d/ms
>>> gid/tesseract-ocr/6e9f4a45-5dde-41f6-8a41-a403778aef54%40goo
>>> glegroups.com
>>> <https://groups.google.com/d/msgid/tesseract-ocr/6e9f4a45-5dde-41f6-8a41-a403778aef54%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>

