Thanks, Quan

- excuse the brevity, sent from mobile

On 08-Feb-2017 7:33 PM, "Quan Nguyen" <[email protected]> wrote:

>
>
> On Tuesday, February 7, 2017 at 9:34:11 AM UTC-6, shree wrote:
>>
>> For LSTM training, box files need an additional entry at the end of each
>> text line, containing the tab character, to indicate a new line.
>>
>> If you have existing box/tiff pairs, you can use a box editor (such as
>> jTessBoxEditor) to insert a box at the end of each line and put a tab
>> character in it.
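
A quick way to check whether an existing box file already carries these end-of-line entries is to count the lines that begin with a tab. This is only a minimal sketch: the function name count_eol_markers is hypothetical, and it assumes a marker entry is a tab followed by a space and the box coordinates.

```shell
# Count end-of-line marker entries in a box file read from stdin.
# Assumption: a marker entry is a line that starts with a tab character,
# followed by a space and the box coordinates.
count_eol_markers() {
    tab="$(printf '\t')"
    # grep -c prints the number of matching lines; "|| true" keeps a count
    # of zero from being reported as a failure.
    grep -c "^${tab} " || true
}
```

For example, `count_eol_markers < sample.box` (file name hypothetical) should print a number equal to the number of text lines on the page once every line has been marked.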
>>
>
> The jTessBoxEditor beta version has a new Mark EOL function that does just
> that.
>
>
>>
>> > On the toolbar, the Character textbox has a built-in conversion
>> function. If you enter U+0009 and hit the Enter key or click on the adjacent
>> Tool icon, the escape sequence will be converted to Unicode. You can also
>> enter the tab character via the Alt+09 numpad keys on Windows.
>>
>> Or add a dummy sequence such as @@@ and then replace it with a tab
>> character in a text editor.
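
The placeholder replacement can also be scripted instead of done by hand in a text editor. The following is a rough sketch, not part of tesstrain.sh: the function name mark_eol_tabs is hypothetical, and it assumes the dummy sequence sits at the start of its box line and contains no characters that are special in a sed pattern.

```shell
# Replace a placeholder at the start of a box line with a literal tab.
# Reads box-file text on stdin and writes the converted text to stdout.
# Assumption: the placeholder (default @@@) is the first field of the line
# and has no special meaning in a sed regular expression.
mark_eol_tabs() {
    placeholder="${1:-@@@}"
    # printf is used to get a real tab, since not every sed understands \t.
    tab="$(printf '\t')"
    sed "s/^${placeholder} /${tab} /"
}
```

A possible invocation would be `mark_eol_tabs < sample.box > sample.fixed.box`, where both file names are made up for illustration.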
>>
>> See the attached files for a sample.
>>
>> Then modify tesstrain.sh to copy the box/tiff pairs to the training
>> directory before training starts:
>>
>>
>>
>> mkdir -p ${TRAINING_DIR}
>> tlog "\n=== Starting training for language '${LANG_CODE}'"
>>
>> # Copy the existing box/tiff pairs into the training directory.
>> cp ./*.box "${TRAINING_DIR}/"
>> cp ./*.tif "${TRAINING_DIR}/"
>>
>>
>> On Tue, Feb 7, 2017 at 8:27 PM, Kay-Michael Würzner <[email protected]>
>> wrote:
>>
>>> +1 for this question. So far, the training documentation for Tesseract 4.0
>>> only covers training with font files (synthetic material). What is
>>> missing is information on training with real data (i.e. manually aligned
>>> ground truth).
>>> Any hints on that matter would be greatly appreciated.
>>>
>>> Cheers,
>>> Kay
>>>
>>> On Wednesday, January 18, 2017 at 12:31:54 AM UTC+1, [email protected]
>>> wrote:
>>>>
>>>> I have a bunch of images containing English words.
>>>> I would like to generate training data from these images and run the
>>>> training.
>>>> How should I do this?
>>>>
>>>> Thanks a lot.
>>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/7bffab95-3e6b-4165-929e-a152f1799703%40googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
