Thanks from my side as well. I'll take a look at the jTessBoxEditor beta, 
try to set up training, and get back to you.

Kay

On Wednesday, February 8, 2017 at 3:52:58 PM UTC+1, shree wrote:
>
> Thanks, Quan
>
> - excuse the brevity, sent from mobile
>
> On 08-Feb-2017 7:33 PM, "Quan Nguyen" <[email protected]> wrote:
>
>>
>>
>> On Tuesday, February 7, 2017 at 9:34:11 AM UTC-6, shree wrote:
>>>
>>> For LSTM training, box files need an additional line at the end of each 
>>> text line, containing a tab character, to mark the end of that line.
>>>
>>> If you have existing box/tiff pairs, you can use a box editor (such as 
>>> jTessBoxEditor) to insert a box at the end of each line and put a tab 
>>> character in it.
>>>
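For anyone who prefers to script this step, here is a minimal sketch of the idea. It is not part of Tesseract or jTessBoxEditor: the function name `mark_eol` is made up, the box line layout is assumed to be `symbol left bottom right top page`, and a new text line is assumed to start whenever the left edge jumps back to the left (left-to-right, single-column text) or the page number changes. The coordinates given to the tab box (a one-pixel-wide box just right of the last glyph) are also an assumption.

```shell
#!/usr/bin/env bash
# mark_eol FILE: print FILE with a tab box appended after each text line.
# Heuristic sketch only; see the assumptions stated above.
mark_eol() {
  awk '
    function emit_tab() {
      # Thin box right after the last glyph of the line (assumed coordinates).
      printf "\t %d %d %d %d %d\n", p_right, p_bottom, p_right + 1, p_top, p_page
    }
    # An existing end-of-line marker: keep it and do not add a second one.
    /^\t/ { print; have_prev = 0; next }
    NF >= 5 {
      # Count fields from the end so a symbol that is itself a space still parses.
      left = $(NF - 4) + 0
      if (have_prev && (left < p_left || $NF != p_page)) emit_tab()
      p_left = left
      p_bottom = $(NF - 3); p_right = $(NF - 2); p_top = $(NF - 1); p_page = $NF
      have_prev = 1
    }
    { print }
    END { if (have_prev) emit_tab() }
  ' "$@"
}

# Usage (file names are placeholders):
#   mark_eol page1.box > page1.lstm.box
```

The left-edge test will miss a line break when a short line is followed by an indented one, so the result is worth checking in a box editor before training.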
>>
>> The jTessBoxEditor beta version has a new Mark EOL function that does 
>> just that.
>>  
>>
>>>
>>> >On the toolbar, the Character textbox has a built-in conversion 
>>> function. If you enter U+0009 and hit the Enter key or click on the adjacent 
>>> Tool icon, the escape sequence will be converted to Unicode. You can also 
>>> enter the tab character via the Alt+09 numpad keys on Windows.
>>>
>>> Or add a dummy sequence such as @@@ and then replace it with a tab 
>>> character in a text editor.
>>>
>>> See attached files as a sample.
>>>
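The dummy-sequence route can also be done from the command line instead of a text editor. A minimal sketch, assuming the placeholder `@@@` was entered as the symbol of each end-of-line box, so it sits at the start of the line and is followed by a space; the function name `placeholder_to_tab` is made up for illustration.

```shell
#!/usr/bin/env bash
# placeholder_to_tab FILE: print FILE with a leading "@@@" symbol replaced
# by a literal tab character. Other lines pass through unchanged.
placeholder_to_tab() {
  local tab=$'\t'   # bash ANSI-C quoting, avoids relying on "\t" support in sed
  sed "s/^@@@ /${tab} /" "$1"
}

# Usage (file names are placeholders):
#   placeholder_to_tab page1.box > page1.lstm.box
```

Writing to a new file rather than editing in place keeps the original box file around in case the placeholder also occurs as real text somewhere.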
>>> Then modify tesstrain.sh to copy the box/tiff pairs to the training 
>>> directory before training starts:
>>>
>>>
>>>
>>> mkdir -p "${TRAINING_DIR}"
>>> tlog "\n=== Starting training for language '${LANG_CODE}'"
>>>
>>> # Copy the existing box/tiff pairs into the training directory.
>>> cp ./*.box "${TRAINING_DIR}/"
>>> cp ./*.tif "${TRAINING_DIR}/"
>>>
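A slightly more defensive variant of the copy step, as a sketch only: it copies a box file only when a .tif with the same base name exists, and warns otherwise. The function name `copy_box_tiff_pairs` is hypothetical and not part of tesstrain.sh; `TRAINING_DIR` is the variable from the snippet above.

```shell
#!/usr/bin/env bash
# copy_box_tiff_pairs SRC_DIR DEST_DIR
# Copy every SRC_DIR/NAME.box together with SRC_DIR/NAME.tif into DEST_DIR.
# Box files without a matching .tif are skipped with a warning on stderr.
copy_box_tiff_pairs() {
  local src_dir=$1 dest_dir=$2 box base
  mkdir -p "$dest_dir"
  for box in "$src_dir"/*.box; do
    [ -e "$box" ] || continue          # the glob matched nothing
    base=${box%.box}
    if [ -e "$base.tif" ]; then
      cp "$box" "$base.tif" "$dest_dir/"
    else
      echo "skipping $box: no matching .tif" >&2
    fi
  done
}

# Usage inside the modified script:
#   copy_box_tiff_pairs . "${TRAINING_DIR}"
```

Compared with two separate `cp` globs, this avoids ending up with an unpaired box or image file in the training directory.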
>>>
>>> On Tue, Feb 7, 2017 at 8:27 PM, Kay-Michael Würzner <[email protected]> 
>>> wrote:
>>>
>>>> +1 for this question. The training documentation for Tesseract 4.0 so 
>>>> far only covers training with font files (synthetic material). What is 
>>>> missing is information on training with real data (i.e. manually aligned 
>>>> ground truth).
>>>> Any hints on this matter would be greatly appreciated.
>>>>
>>>> Cheers,
>>>> Kay
>>>>
>>>> On Wednesday, January 18, 2017 at 12:31:54 AM UTC+1, [email protected] 
>>>> wrote:
>>>>>
>>>>> I have a bunch of images containing English words.
>>>>> I would like to generate training data from these images and run the 
>>>>> training.
>>>>> How should I go about it?
>>>>>
>>>>> Thanks a lot.
>>>>>
>>>> -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to [email protected].
>>>> To post to this group, send email to [email protected].
>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>> To view this discussion on the web visit 
>>>> https://groups.google.com/d/msgid/tesseract-ocr/7bffab95-3e6b-4165-929e-a152f1799703%40googlegroups.com
>>>>  
>>>> <https://groups.google.com/d/msgid/tesseract-ocr/7bffab95-3e6b-4165-929e-a152f1799703%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>>
>
