Thank you for the follow-up. Is it possible to fine-tune Tesseract without
font files if I can't accurately source them (and without training from
scratch)?



On Tue, Jan 29, 2019 at 12:51 PM Shree Devi Kumar <[email protected]>
wrote:

> >I’m trying to train Tesseract 4 using images (and associated box files).
> I can’t pinpoint the font name and prefer to avoid sourcing the font itself.
>
> LSTM training is much easier with font files since a large amount of
> training data is needed. You could try https://www.whatfontis.com/ to
> identify the font and then fine-tune with it.
>
> The box files generated by Tesseract are NOT in the format needed for LSTM
> training. They will need to be modified by hand before they can be used.
>
> I am attaching a modified version of the tesstrain bash scripts, which adds an
> OPTIONAL flag for specifying a directory with user-specified box/tiff pairs.
> Files should be named similar to
> ${LANG_CODE}.${fontname}.exp${EXPOSURE}.box/tif
>
>      --my_boxtiff_dir MY_BOXTIFF_DIR # Location of user specified box/tiff files.
>
> On Sat, Jan 26, 2019 at 10:45 PM <[email protected]> wrote:
>
>> Thank you for the suggestion, but I have tried OCR-D Train previously
>> and could not get even the training example to run. I get errors from
>> make, as well as ASCII encoding errors (likely from the included
>> Python script). Might you have advice for accomplishing my initial goal
>> without the helper app?
>>
>> On Jan 26, 2019, at 3:28 AM, Shree Devi Kumar <[email protected]>
>> wrote:
>>
>> Check out
>>
>> https://github.com/OCR-D/ocrd-train
>>
>>
>> On Sat, 26 Jan 2019, 13:36 <[email protected]> wrote:
>>
>>> Hello,
>>>
>>> I’m trying to train Tesseract 4 using images (and associated box files).
>>> I can’t pinpoint the font name and prefer to avoid sourcing the font itself.
>>>
>>> I’m currently trying to train on macOS High Sierra, but I also have
>>> access to Trisquel and Windows 8.1.
>>>
>>> I find that the directions on the wiki assume either considerable prior
>>> knowledge of the structure of the training terminal commands (and why
>>> each of them is important) or a leap of faith with trial and error.
>>>
>>> Any help would be most appreciated.
>>>
>>> -Aaron
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/C3A0C93E-FFBC-474C-87DC-C3F53F3F0F70%40stonybrook.edu
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>
>
> --
>
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
