Re: [tesseract-ocr] Re: How can I do the training using my own image in Tesseract 4.0

[email protected] Sat, 22 Aug 2020 12:57:00 -0700

Hi Sir/Madam,
I have the following questions:
Can we print the input shape and output shape of each layer during 
training?
Can you explain how an image is fed into the LSTM cells as time steps 
during training?
How does the Tesseract language model work during training?



On Saturday, August 8, 2020 at 10:06:14 AM UTC+5:30 [email protected] wrote:

> This post is from 2017. 
> Do I have to change anything for Tesseract 5.0?
>
> I ask because the command already specifies the langdata directory 
> (--langdata_dir ~/tesstutorial/langdata).
>
> My tesstrain.sh invocation is as follows:
>
> [image: Screen Shot 2020-08-08 at 11.25.56.png]
>
>
> PANGOCAIRO_BACKEND=fc \
> ~/tesseract/src/training/tesstrain.sh \
> --fonts_dir /Library/Fonts \
> --lang vie \
> --linedata_only \
> --noextract_font_properties \
> --exposures "0" \
> --langdata_dir ~/tesstutorial/langdata \
> --tessdata_dir ~/tesstutorial/tesseract/tessdata \
> --fontlist "Times New Roman" \
> --output_dir ~/tesstutorial/vietrain
>
> Best regards,
>
> TuPM
>
> On Tuesday, February 7, 2017 at 10:34:11 PM UTC+7 shree wrote:
>
>> For LSTM training, box files need to have an additional line for each 
>> text line with the tab character to indicate a new line.
>>
>> If you have existing box/tiff pairs, you can use a box editor (such as 
>> jTessBoxEditor) to insert a box at the end of each line and put a tab 
>> character in it.
>>
>> >On the toolbar, the Character textbox has a built-in conversion 
>> function. If you enter U+0009 and hit Enter key or click on the adjacent 
>> Tool icon, the escape sequences will be converted to Unicode. You can also 
>> enter the tab character via Alt+09 numpad keys on Windows.
>>
>> Or add a dummy sequence such as @@@ and then replace it with a tab 
>> character in a text editor.
>> See attached files as a sample.
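The placeholder replacement can also be scripted rather than done by hand in a text editor. This is only a minimal sketch: the function name placeholder_to_tab, the file names, and the box coordinates are made up for illustration. It uses awk because "\t" in sed is not portable to the BSD sed shipped with macOS.

```shell
#!/bin/sh
# Hypothetical helper (not part of Tesseract): replace the @@@ placeholder
# with a real tab character in a box file.
placeholder_to_tab() {
    # $1: input box file, $2: output box file
    awk '{ gsub(/@@@/, "\t"); print }' "$1" > "$2"
}

# Demo on a made-up two-line box file; the coordinates are invented.
demo_dir=$(mktemp -d)
printf 'a 10 20 30 40 0\n@@@ 31 20 32 40 0\n' > "$demo_dir/in.box"
placeholder_to_tab "$demo_dir/in.box" "$demo_dir/out.box"

# Print the result with the tab made visible as <TAB>.
awk '{ gsub(/\t/, "<TAB>"); print }' "$demo_dir/out.box"
```

Writing to a separate output file keeps the original box file untouched in case the placeholder also occurs in the real text.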
>>
>> Then modify tesstrain.sh to copy the box tiff pairs to the training 
>> directory before starting training
>>
>>
>>
>> mkdir -p ${TRAINING_DIR}
>> tlog "\n=== Starting training for language '${LANG_CODE}'"
>>
>> cp  ./*.box "${TRAINING_DIR}/"
>> cp  ./*.tif "${TRAINING_DIR}/" 
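Before copying, it may help to confirm that every .tif has a matching .box file, since an unpaired image would be copied without its ground truth. The check below is a sketch under that assumption. The function name missing_boxes and the demo file names are hypothetical and are not part of tesstrain.sh.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of tesstrain.sh): print every
# .tif file in a directory that has no same-named .box file next to it.
missing_boxes() {
    # $1: directory holding the box/tiff pairs
    for tif in "$1"/*.tif; do
        [ -e "$tif" ] || continue            # the glob matched nothing
        [ -e "${tif%.tif}.box" ] || printf '%s\n' "$tif"
    done
}

# Demo with made-up file names: page2.tif has no box file.
pairs_dir=$(mktemp -d)
touch "$pairs_dir/page1.tif" "$pairs_dir/page1.box" "$pairs_dir/page2.tif"
missing=$(missing_boxes "$pairs_dir")
echo "$missing"
```

If the function prints nothing, all pairs are complete and the cp lines above can run as they are.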
>>
>>
>> On Tue, Feb 7, 2017 at 8:27 PM, Kay-Michael Würzner <[email protected]> 
>> wrote:
>>
>>> +1 for this question. The training documentation for Tesseract 4.0 so 
>>> far only covers training with font files (synthetic materials). What is 
>>> missing is information on training with real data (i.e. manually aligned 
>>> ground truth).
>>> Any hints on that matter would be greatly appreciated.
>>>
>>> Cheers,
>>> Kay
>>>
>>> On Wednesday, January 18, 2017 at 12:31:54 AM UTC+1, [email protected] 
>>> wrote:
>>>>
>>>> I have a bunch of images containing English words.
>>>> I would like to generate training data from these images and run the 
>>>> training.
>>>> How should I do this?
>>>>
>>>> Thanks a lot.
>>>>

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/1c775021-c053-4d21-aebf-c0a5a495b6b6n%40googlegroups.com.
