It seems the problem was copying langdata from Windows to Linux. I re-downloaded
it on Linux and it worked; I will retry.
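
For anyone else who sees "Error during conversion of wordlists to DAWGs!!" after copying langdata from Windows, one plausible cause is Windows (CRLF) line endings in files such as eng.wordlist. This thread does not confirm that; it is an assumption. Below is a minimal bash sketch. It lists langdata files that contain carriage returns and strips them in place. It also checks that tesstrain.sh actually produced the two files the later lstmtraining command reads, eng/eng.traineddata and eng.training_files.txt, before you retry. The helper function names are hypothetical and are not part of Tesseract. The paths in the usage comment are the ones used in this thread.

```shell
#!/usr/bin/env bash
# Sketch only: these helper names are hypothetical, not part of Tesseract.
# Assumption: CRLF line endings from a Windows copy can break langdata files.
set -euo pipefail

# Print every text file under directory $1 that contains a carriage return.
find_crlf_files() {
  # -r recurse, -l list file names only, -I skip binary files.
  grep -rlI $'\r' "$1" || true
}

# Remove a trailing carriage return from each line, in place (GNU sed).
strip_crlf() {
  local f
  for f in "$@"; do
    sed -i $'s/\r$//' "$f"
  done
}

# Succeed only if tesstrain.sh left the files that lstmtraining reads.
# $1 is the tesstrain.sh --output_dir, $2 is the language code.
check_training_outputs() {
  local out="$1" lang="$2" status=0 f
  for f in "$out/$lang/$lang.traineddata" "$out/$lang.training_files.txt"; do
    if [[ ! -f "$f" ]]; then
      echo "missing: $f" >&2
      status=1
    fi
  done
  return "$status"
}

# Runs only when both directories are passed, e.g.
#   ./check_langdata.sh /home/sw/repo/langdata ~/tesstutorial/trainplusminus
if [[ $# -ge 2 ]]; then
  mapfile -t bad < <(find_crlf_files "$1")
  if (( ${#bad[@]} > 0 )); then
    printf 'CRLF found in: %s\n' "${bad[@]}"
    strip_crlf "${bad[@]}"
  fi
  check_training_outputs "$2" eng
fi
```

Re-downloading langdata directly on Linux, as done here, avoids the problem altogether. The script is only meant as a quick diagnostic before rerunning tesstrain.sh.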

On Tue, 18 Jun 2019, 5:21 pm fady taher, <[email protected]> wrote:

> The output of
>
> src/training/tesstrain.sh --fontlist "Times New Roman" --lang eng
> --linedata_only --noextract_font_properties --langdata_dir
> /home/sw/repo/langdata --tessdata_dir /home/sw/repo/tessdata --output_dir
> ~/tesstutorial/trainplusminus
>
> is
>
> ....
> ....
>
> [Tue Jun 18 17:19:46 EET 2019] /usr/local/bin/combine_lang_model
> --input_unicharset /tmp/eng-2019-06-18.baG/eng.unicharset --script_dir
> /home/sw/repo/langdata --words /home/sw/repo/langdata/eng/eng.wordlist
> --numbers /home/sw/repo/langdata/eng/eng.numbers --puncs
> /home/sw/repo/langdata/eng/eng.punc --output_dir
> /home/sw/tesstutorial/trainplusminus --lang eng
> Loaded unicharset of size 111 from file /tmp/eng-2019-06-18.baG/eng.unicharset
> Setting unichar properties
> Other case É of é is not in unicharset
> Setting script properties
> Warning: properties incomplete for index 95 = ~
> Config file is optional, continuing...
> Failed to read data from: /home/sw/repo/langdata/eng/eng.config
> Null char=2
> Reducing Trie to SquishedDawg
> Error during conversion of wordlists to DAWGs!!
>
> On Tue, Jun 18, 2019 at 5:18 PM Shree Devi Kumar <[email protected]>
> wrote:
>
>> That means
>>
>> src/training/tesstrain.sh  --fontlist "Times New Roman" --lang eng
>> --linedata_only   --noextract_font_properties --langdata_dir
>> /home/sw/repo/langdata   --tessdata_dir /home/sw/repo/tessdata --output_dir
>> ~/tesstutorial/trainplusminus
>>
>> did not complete correctly.
>>
>> On Tue, Jun 18, 2019 at 8:46 PM fady taher <[email protected]> wrote:
>>
>>> Nope, this file doesn't exist yet. The directory only contains:
>>>
>>> eng.charset_size=110.txt
>>> eng.unicharset
>>>
>>>
>>> On Tue, Jun 18, 2019 at 4:46 PM Shree Devi Kumar <[email protected]>
>>> wrote:
>>>
>>>> Check ~/tesstutorial/trainplusminus
>>>> Did your earlier training complete correctly? Does
>>>> ~/tesstutorial/trainplusminus/eng/eng.traineddata exist?
>>>>
>>>> On Tue, Jun 18, 2019 at 8:11 PM fady taher <[email protected]>
>>>> wrote:
>>>>
>>>>> I'm trying to fine-tune Tesseract, but I keep getting the error
>>>>>
>>>>> mgr_.Init(traineddata_path.c_str()):Error:Assert failed:in file
>>>>> ../../src/lstm/lstmtrainer.h, line 110
>>>>>
>>>>> on the lstmtraining command.
>>>>>
>>>>> My script looks as follows:
>>>>>
>>>>> cd /home/sw/repo/tesseract-ocr
>>>>>
>>>>> mkdir -p ~/tesstutorial/
>>>>> mkdir -p ~/tesstutorial/trainplusminus
>>>>> mkdir -p ~/tesstutorial/evalplusminus
>>>>>
>>>>> src/training/tesstrain.sh --fontlist "Times New Roman" --lang eng \
>>>>>   --linedata_only --noextract_font_properties \
>>>>>   --langdata_dir /home/sw/repo/langdata \
>>>>>   --tessdata_dir /home/sw/repo/tessdata \
>>>>>   --output_dir ~/tesstutorial/trainplusminus
>>>>>
>>>>> # --langdata_dir is the langdata root here too, not .../langdata/eng
>>>>> src/training/tesstrain.sh --fontlist "Times New Roman" --lang eng \
>>>>>   --linedata_only --noextract_font_properties \
>>>>>   --langdata_dir /home/sw/repo/langdata \
>>>>>   --tessdata_dir /home/sw/repo/tessdata \
>>>>>   --output_dir ~/tesstutorial/evalplusminus
>>>>>
>>>>> # eng.lstm file gets extracted correctly
>>>>> src/training/combine_tessdata -e /home/sw/repo/tessdata/eng.traineddata \
>>>>>   ~/tesstutorial/trainplusminus/eng.lstm
>>>>>
>>>>> # this command fails and throws the error
>>>>> src/training/lstmtraining --model_output ~/tesstutorial/trainplusminus/plusminus \
>>>>>   --continue_from ~/tesstutorial/trainplusminus/eng.lstm \
>>>>>   --traineddata ~/tesstutorial/trainplusminus/eng/eng.traineddata \
>>>>>   --old_traineddata /home/sw/repo/tessdata/eng.traineddata \
>>>>>   --train_listfile ~/tesstutorial/trainplusminus/eng.training_files.txt \
>>>>>   --max_iterations 400
>>>>>
>>>>> src/training/lstmtraining --stop_training \
>>>>>   --continue_from ~/tesstutorial/trainplusminus/plusminus_checkpoint \
>>>>>   --traineddata ~/tesstutorial/trainplusminus/eng/eng.traineddata \
>>>>>   --model_output ~/tesstutorial/eng_final.traineddata
>>>>>
>>>>> cp ~/tesstutorial/eng_final.traineddata \
>>>>>   /usr/share/tesseract/4/tessdata/eng.traineddata
>>>>>
>>>>>
>>>>> I downloaded eng.traineddata from the "best" repo, though. Can anyone
>>>>> help?
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "tesseract-ocr" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to [email protected].
>>>>> To post to this group, send email to [email protected].
>>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/tesseract-ocr/00310d99-1fc9-402f-b0fa-d048486d77b2%40googlegroups.com
>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/00310d99-1fc9-402f-b0fa-d048486d77b2%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>> .
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> ____________________________________________________________
>>>> Bhajan - Kirtan - Aarti @ http://bhajans.ramparivar.com
>>>
>>
>>
>
