On Tuesday, February 7, 2017 at 9:34:11 AM UTC-6, shree wrote:
>
> For LSTM training, box files need to have an additional line for each
> text line with the tab character to indicate a new line.
>
> If you have existing box/tiff pairs, you can use a box editor (such as
> jTessBoxEditor) to insert a box at the end of each line and put a tab
> character in it.
>
The jTessBoxEditor beta version has a new Mark EOL function that does just
that.
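
For box files that have no end-of-line markers yet, the same step could
also be scripted. The following is only a rough sketch, not what
jTessBoxEditor does internally. The function name mark_eol is made up, it
assumes left-to-right text in a single column (a new text line is assumed
to start wherever a box's left edge jumps back to the left), and the
coordinates of the inserted tab box are a placeholder choice (a
1-pixel-wide box at the right edge of the last glyph of the line).

```shell
# mark_eol FILE: print FILE with a tab "end of line" box after each text line.
# Assumptions: "symbol left bottom right top page" box lines, left-to-right
# text, and no end-of-line markers in the input yet.
mark_eol() {
  awk '
    function emit_eol() {
      # Tab symbol, then a thin box just past the last glyph of the line.
      printf "\t %d %d %d %d %d\n", r, b, r + 1, t, p
    }
    NF >= 6 {
      # A left edge smaller than the previous one is taken as a new text line.
      if (seen && $2 + 0 < l) emit_eol()
      print
      l = $2 + 0; b = $3; r = $4; t = $5; p = $6; seen = 1
      next
    }
    { print }                      # pass anything else through unchanged
    END { if (seen) emit_eol() }   # close the last text line
  ' "$1"
}

# Usage (hypothetical file names):
#   mark_eol page1.box > page1.eol.box
```

The result should still be checked in a box editor, since the left-edge
heuristic can misfire on indented or multi-column text.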
>
> On the toolbar, the Character textbox has a built-in conversion
> function. If you enter U+0009 and press Enter or click the adjacent
> Tool icon, the escape sequence is converted to the Unicode character. You
> can also enter the tab character with Alt+09 on the numpad on Windows.
>
> Or add a dummy sequence such as @@@ and then replace it with a tab
> character in a text editor.
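
The placeholder replacement can also be done from the command line instead
of a text editor. A minimal sketch, assuming the dummy sequence is exactly
@@@ and sits at the start of the box line as the symbol; the function name
placeholder_to_tab is made up for illustration.

```shell
# placeholder_to_tab FILE: print FILE with a leading "@@@" symbol replaced by
# a tab character. awk is used because "\t" in sed replacements is not
# portable across sed implementations.
placeholder_to_tab() {
  awk '{ sub(/^@@@/, "\t"); print }' "$1"
}

# Usage (hypothetical file names):
#   placeholder_to_tab page1.box > page1.tab.box
```

Lines that do not start with @@@ are passed through unchanged.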
>
> See attached files as a sample.
>
> Then modify tesstrain.sh to copy the box/tiff pairs to the training
> directory before starting training:
>
>
>
> mkdir -p ${TRAINING_DIR}
> tlog "\n=== Starting training for language '${LANG_CODE}'"
>
> cp ./*.box "${TRAINING_DIR}/"
> cp ./*.tif "${TRAINING_DIR}/"
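
A slightly more defensive variant of the two cp lines is sketched below. It
is not part of tesstrain.sh; the function name copy_box_tiff_pairs is made
up, and it assumes each page is a NAME.box / NAME.tif pair in the same
directory. It copies only complete pairs and reports box files without an
image.

```shell
# copy_box_tiff_pairs SRC_DIR DEST_DIR: copy each NAME.box together with its
# NAME.tif, skipping (and reporting on stderr) box files with no image.
copy_box_tiff_pairs() {
  src_dir=$1
  dest_dir=$2
  mkdir -p "$dest_dir"
  for box in "$src_dir"/*.box; do
    [ -e "$box" ] || continue        # glob matched nothing
    tif="${box%.box}.tif"
    if [ -f "$tif" ]; then
      cp "$box" "$tif" "$dest_dir"/
    else
      echo "skipping $box: no matching .tif" >&2
    fi
  done
}

# Usage inside the modified script:
#   copy_box_tiff_pairs . "${TRAINING_DIR}"
```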
>
>
> On Tue, Feb 7, 2017 at 8:27 PM, Kay-Michael Würzner <[email protected]>
> wrote:
>
>> +1 for this question. The training documentation for Tesseract 4.0 currently
>> only covers training with font files (synthetic materials). What is missing
>> is information on training with real data (i.e. manually aligned ground
>> truth).
>> Any hints on that matter are greatly appreciated.
>>
>> Cheers,
>> Kay
>>
>> On Wednesday, January 18, 2017 at 12:31:54 AM UTC+1, [email protected]
>> wrote:
>>>
>>> I have a bunch of images containing English words.
>>> I would like to generate training data from these images and run the
>>> training.
>>> How should I do this?
>>>
>>> Thanks a lot.
>>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/7bffab95-3e6b-4165-929e-a152f1799703%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>