*I got the following output while trying to create the starter training data:*

Rendered page 31 to file /tmp/ben-2019-05-29.K90/ben.SolaimanLipi.exp0.tif
Stripped 1 unrenderable words
Rendered page 31 to file /tmp/ben-2019-05-29.K90/ben.Nikosh.exp0.tif
Stripped 1 unrenderable words
Rendered page 37 to file /tmp/ben-2019-05-29.K90/ben.Mukti_Narrow.exp0.tif
Stripped 1 unrenderable words
Rendered page 38 to file /tmp/ben-2019-05-29.K90/ben.Lohit_Bengali.exp0.tif
Stripped 2 unrenderable words
Rendered page 32 to file /tmp/ben-2019-05-29.K90/ben.SolaimanLipi.exp0.tif
Stripped 6 unrenderable words
Rendered page 32 to file /tmp/ben-2019-05-29.K90/ben.Nikosh.exp0.tif
Stripped 1 unrenderable words
Rendered page 38 to file /tmp/ben-2019-05-29.K90/ben.Mukti_Narrow.exp0.tif
Stripped 1 unrenderable words
Rendered page 39 to file /tmp/ben-2019-05-29.K90/ben.Lohit_Bengali.exp0.tif
Rendered page 33 to file /tmp/ben-2019-05-29.K90/ben.SolaimanLipi.exp0.tif
Stripped 5 unrenderable words
Rendered page 33 to file /tmp/ben-2019-05-29.K90/ben.Nikosh.exp0.tif
Stripped 1 unrenderable words
Rendered page 39 to file /tmp/ben-2019-05-29.K90/ben.Mukti_Narrow.exp0.tif
Stripped 1 unrenderable words
Rendered page 40 to file /tmp/ben-2019-05-29.K90/ben.Lohit_Bengali.exp0.tif
Rendered page 34 to file /tmp/ben-2019-05-29.K90/ben.SolaimanLipi.exp0.tif
Rendered page 34 to file /tmp/ben-2019-05-29.K90/ben.Nikosh.exp0.tif
Rendered page 40 to file /tmp/ben-2019-05-29.K90/ben.Mukti_Narrow.exp0.tif
Stripped 1 unrenderable words
......
*and then*
.......
Invalid start of grapheme sequence:M=0x9c7
Normalization failed for string 'পাে'
Invalid start of grapheme sequence:M=0x9c7
Normalization failed for string 'জাে'
Invalid start of grapheme sequence:M=0x9bf
Normalization failed for string 'গাি'
Invalid start of grapheme sequence:M=0x9bf
Normalization failed for string 'রীি'
Invalid start of grapheme sequence:M=0x9c7
Normalization failed for string 'ভাে'
Invalid start of grapheme sequence:M=0x9bf
Normalization failed for string 'জাি'
Invalid start of grapheme sequence:M=0x9c7
Normalization failed for string 'থাে'
Invalid start of grapheme sequence:M=0x9c7
Normalization failed for string 'হাে'
Invalid start of grapheme sequence:M=0x9c7
Normalization failed for string 'পুে'
Invalid start of grapheme sequence:M=0x9bf
Normalization failed for string 'পুি'
Invalid start of grapheme sequence:H=0x9cd
Normalization failed for string 'অ্যা'
Invalid start of grapheme sequence:M=0x9c7
Normalization failed for string 'খাে'
Invalid start of grapheme sequence:M=0x9c7
Normalization failed for string 'চুে'
Invalid start of grapheme sequence:M=0x9bf
Normalization failed for string 'ঢাি'
Invalid start of grapheme sequence:M=0x9c7
Normalization failed for string 'তাে'
Invalid start of grapheme sequence:M=0x9c7
Normalization failed for string 'উে'
Invalid start of grapheme sequence:M=0x9bf
Normalization failed for string 'উি'
Invalid start of grapheme sequence:M=0x9c7
Normalization failed for string 'থাে'
Invalid start of grapheme sequence:M=0x9bf
Normalization failed for string 'তাি'
Invalid start of grapheme sequence:M=0x9bf


*But finally I got:*

=== Moving lstmf files for training data ===
Moving /tmp/ben-2019-05-29.K90/ben.Bangla_Medium.exp0.lstmf to
/home/guest/tesstutorial/train_wa/Eval_wa
Moving /tmp/ben-2019-05-29.K90/ben.Lohit_Bengali.exp0.lstmf to
/home/guest/tesstutorial/train_wa/Eval_wa
Moving /tmp/ben-2019-05-29.K90/ben.Mukti_Narrow.exp0.lstmf to
/home/guest/tesstutorial/train_wa/Eval_wa
Moving /tmp/ben-2019-05-29.K90/ben.Nikosh.exp0.lstmf to
/home/guest/tesstutorial/train_wa/Eval_wa
Moving /tmp/ben-2019-05-29.K90/ben.SolaimanLipi.exp0.lstmf to
/home/guest/tesstutorial/train_wa/Eval_wa

Created starter traineddata for LSTM training of language 'ben'


Run 'lstmtraining' command to continue LSTM training for language 'ben'


*There was no error at the end. Will this training data be good? I am asking
because I feel that a lot of things are not happening the way they should,
e.g. it says "Normalization failed" and "unrenderable".*

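For anyone puzzled by the "Invalid start of grapheme sequence" lines, a rough way to see what is inside those strings is to look at their code points. The sketch below is not Tesseract's normalizer. It is a simplified check, assuming the failing strings consist of the code points written out as escapes, and find_stray_marks is a hypothetical helper name. It flags a Bengali vowel sign or virama whose preceding character is not a consonant (or nukta), which appears to be the pattern in the log above, for example a vowel sign U+09C7 directly after another vowel sign U+09BE.

```python
import unicodedata

# Simplified Bengali character classes (an assumption for illustration,
# not the tables Tesseract itself uses).
CONSONANTS = set(range(0x0995, 0x09BA)) | {0x09CE, 0x09DC, 0x09DD, 0x09DF}
VOWEL_SIGNS = set(range(0x09BE, 0x09C5)) | {0x09C7, 0x09C8, 0x09CB, 0x09CC, 0x09D7}
VIRAMA = 0x09CD
NUKTA = 0x09BC


def find_stray_marks(text):
    """Return (index, code point) for each vowel sign or virama
    that does not directly follow a consonant or nukta."""
    problems = []
    prev = None
    for i, ch in enumerate(text):
        cp = ord(ch)
        if cp in VOWEL_SIGNS or cp == VIRAMA:
            if prev not in CONSONANTS and prev != NUKTA:
                problems.append((i, cp))
        prev = cp
    return problems


# Strings assumed to match some of those in the log, written as escapes.
samples = [
    "\u09aa\u09be\u09c7",        # PA + vowel sign AA + vowel sign E
    "\u0985\u09cd\u09af\u09be",  # letter A + virama + YA + vowel sign AA
    "\u09aa\u09be",              # PA + vowel sign AA (well formed)
]

for s in samples:
    names = [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in s]
    flagged = [hex(cp) for _, cp in find_stray_marks(s)]
    print(names, "->", flagged)
```

Running a check like this over the training text and counting how often each mark is flagged may help judge whether the text needs cleaning before the data is regenerated.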
On Tue, May 28, 2019 at 6:27 PM Jennil Thiyam <[email protected]>
wrote:

> okay, now i understand, thank you shree
>
> On Tue, May 28, 2019 at 6:22 PM Shree Devi Kumar <[email protected]>
> wrote:
>
>> It is using a different set of fonts. So training is being done on one
>> set of fonts and eval on others.
>>
>> Alternatively, you can use a smaller text file for eval and use the same set
>> of fonts.
>>
>> It all depends on what you want to accomplish with training.
>>
>> On Tue, May 28, 2019 at 5:59 PM Jennil Thiyam <[email protected]>
>> wrote:
>>
>>> training/tesstrain.sh \
>>>   --fonts_dir /c/Windows/Fonts \
>>>   --tessdata_dir ./tessdata \
>>>   --training_text ../langdata/ara/ara.training_text \
>>>   --langdata_dir ../langdata \
>>>   --lang ara  \
>>>   --linedata_only \
>>>   --noextract_font_properties \
>>>   --exposures "0"    \
>>>   --fontlist "Arial" \
>>>   --output_dir ~/tesstutorial/aratest
>>>
>>> training/tesstrain.sh \
>>>   --fonts_dir /c/Windows/Fonts \
>>>   --tessdata_dir ./tessdata \
>>>   --training_text ../langdata/ara/ara.training_text \
>>>   --langdata_dir ../langdata \
>>>   --lang ara  \
>>>   --linedata_only \
>>>   --noextract_font_properties \
>>>   --exposures "0"    \
>>>   --fontlist "Arial" \
>>>   "Arial Unicode MS" \
>>>   "Calibri" \
>>>   "Courier New" \
>>>   --output_dir ~/tesstutorial/araeval
>>>
>>> Can anyone tell me why we need to create this eval data? I mean, it is
>>> also going to be the same as the training data.
>>>
>>>
>>> On Tue, May 28, 2019 at 10:46 AM Jennil Thiyam <[email protected]>
>>> wrote:
>>>
>>>> okay, thank you
>>>>
>>>> On Tue, May 28, 2019 at 10:30 AM Shree Devi Kumar <[email protected]>
>>>> wrote:
>>>>
>>>>> The old traineddata and the lstm file need to be in sync. So you
>>>>> should extract the lstm file after downloading the traineddata and use
>>>>> those files. The rest of the files don't need to be regenerated.
>>>>>
>>>>> On Tue, May 28, 2019 at 10:26 AM Jennil Thiyam <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Do you mean I should change only the path of this old traineddata (in the
>>>>>> command, which I underlined) to the path of the ben.traineddata that I am
>>>>>> going to download from tessdata_best? Or do I need to perform the whole
>>>>>> process with this (to be downloaded) ben.traineddata?
>>>>>>
>>>>>>  lstmtraining --model_output /model  \
>>>>>> --continue_from  /ben_extract/ben.lstm  \
>>>>>> --traineddata  /tesstutorial_output/ben/ben.traineddata  \
>>>>>> *--old_traineddata
>>>>>> /usr/share/tesseract-ocr/4.00/tessdata/ben.traineddata  \*
>>>>>> --train_listfile  /tesstutorial_output/ben.training_files.txt  \
>>>>>> --max_iterations 1500
>>>>>>
>>>>>> Do you have any idea about the estimated time it will take for 1500
>>>>>> iterations?
>>>>>>
>>>>>> Thank you
>>>>>>
>>>>>> On Mon, May 27, 2019 at 10:20 PM Shree Devi Kumar <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> You can download ben.traineddata from tessdata_best in a different
>>>>>>> location and use that as part of lstmtraining command
>>>>>>>
>>>>>>> On Mon, May 27, 2019 at 6:24 PM Jennil Thiyam <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> I installed it with the command on Ubuntu 18, so I didn't install
>>>>>>>> from the git repository. If I install from the git repository, will
>>>>>>>> this work?
>>>>>>>>
>>>>>>>> On Mon 27 May, 2019, 5:43 PM Shree Devi Kumar <[email protected]
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Is  /usr/share/tesseract-ocr/4.00/tessdata/ben.traineddata from
>>>>>>>>> tessdata_best repo? Only those models can be used for finetuning.
>>>>>>>>>
>>>>>>>>> On Mon, May 27, 2019 at 4:25 PM Jennil Thiyam <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> yes...i extracted with the command combine_tessdata
>>>>>>>>>>
>>>>>>>>>> On Mon 27 May, 2019, 4:23 PM Shree Devi Kumar <
>>>>>>>>>> [email protected] wrote:
>>>>>>>>>>
>>>>>>>>>>> Has  /ben_extract/ben.lstm been extracted from
>>>>>>>>>>> /usr/share/tesseract-ocr/4.00/tessdata/ben.traineddata ?
>>>>>>>>>>>
>>>>>>>>>>> On Mon, May 27, 2019 at 2:55 PM Jennil Thiyam <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I got an error while trying to perform fine-tuning. The command I
>>>>>>>>>>>> used is below:
>>>>>>>>>>>>
>>>>>>>>>>>>  lstmtraining --model_output /model  \
>>>>>>>>>>>> --continue_from  /ben_extract/ben.lstm  \
>>>>>>>>>>>> --traineddata  /tesstutorial_output/ben/ben.traineddata  \
>>>>>>>>>>>> --old_traineddata  /usr/share/tesseract-ocr/4.00/tessdata/ben.traineddata  \
>>>>>>>>>>>> --train_listfile  /tesstutorial_output/ben.training_files.txt  \
>>>>>>>>>>>> --max_iterations 1500
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I have read the discussion about the same error, but the
>>>>>>>>>>>> solutions provided there were all about changing the path, and I am
>>>>>>>>>>>> sure my path is right. Please help me out.
>>>>>>>>>>>> --
>>>>>>>>>>>> You received this message because you are subscribed to the
>>>>>>>>>>>> Google Groups "tesseract-ocr" group.
>>>>>>>>>>>> To unsubscribe from this group and stop receiving emails from
>>>>>>>>>>>> it, send an email to [email protected]
>>>>>>>>>>>> .
>>>>>>>>>>>> To post to this group, send email to
>>>>>>>>>>>> [email protected].
>>>>>>>>>>>> Visit this group at
>>>>>>>>>>>> https://groups.google.com/group/tesseract-ocr.
>>>>>>>>>>>> To view this discussion on the web visit
>>>>>>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/0958d266-6f2f-4d10-9104-ee8145a4f005%40googlegroups.com
>>>>>>>>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/0958d266-6f2f-4d10-9104-ee8145a4f005%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>>>>>>> .
>>>>>>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>>
>>>>>>>>>>> ____________________________________________________________
>>>>>>>>>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

