Hi @shreeshrii,
Attached is the bash script, as described in the following comment:
https://github.com/tesseract-ocr/tesseract/issues/2695#issuecomment-539412948

When I change line #51

--traineddata ~/tesstutorial/tesseract/tessdata/best/ara.traineddata \

to be

--traineddata ~/tesstutorial/araeval/ara/ara.traineddata

it now works without error.
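
To make this mismatch harder to reintroduce, the starter traineddata path could be kept in one variable and reused by both lstmtraining calls. This is only a sketch: the variable names and the run() dry-run helper are my own illustrative assumptions and are not in the original script; the flags and paths are the ones the script already uses.

```shell
#!/bin/bash
# Sketch only: the variable names and the run() helper are hypothetical.
# With DRY_RUN=1 (the default here) each command is printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

BASE="$HOME/tesstutorial"
# Starter traineddata produced by tesstrain.sh; used by BOTH lstmtraining calls.
NEW_TRAINEDDATA="$BASE/araeval/ara/ara.traineddata"
# Original best model; only passed as --old_traineddata during fine-tuning.
BEST_TRAINEDDATA="$BASE/tesseract/tessdata/best/ara.traineddata"

train_cmd() {
  run lstmtraining \
    --model_output "$BASE/ara_from_full/PLUS" \
    --continue_from "$BASE/ara_from_full/ara.lstm" \
    --traineddata "$NEW_TRAINEDDATA" \
    --old_traineddata "$BEST_TRAINEDDATA" \
    --train_listfile "$BASE/araeval/ara.training_files.txt" \
    --max_iterations 3600
}

stop_cmd() {
  run lstmtraining \
    --stop_training \
    --continue_from "$BASE/ara_from_full/PLUS_checkpoint" \
    --traineddata "$NEW_TRAINEDDATA" \
    --model_output "$BASE/ara_from_full/ara.Andalus.PLUS.traineddata"
}

train_cmd
stop_cmd
```

Running it with DRY_RUN=0 would execute the two commands instead of printing them.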
But I have another question:
the character set of the best model has 85 entries, while the newly 
generated character set contains only 74.
How can I keep the unicharset size at 85, as in the best model?
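
To see which characters account for the difference, both traineddata files could be unpacked and their LSTM unicharsets compared. The sketch below assumes the unicharset text format in which the first line holds the entry count and every following line starts with the character; the helper names and the /tmp output prefixes are hypothetical. I am also assuming that the 74 and 85 in the error message track the character-set size, which may not hold exactly.

```shell
#!/bin/bash
# Sketch only: helper names and output prefixes are hypothetical.

# Print the entry count stored on the first line of a unicharset file.
unicharset_count() {
  head -n 1 "$1"
}

# Print the first field (the character) of every entry, sorted bytewise.
unicharset_chars() {
  tail -n +2 "$1" | cut -d ' ' -f 1 | LC_ALL=C sort
}

# Print characters present in the first unicharset but absent from the second.
missing_chars() {
  _mc_a="$(mktemp)"
  _mc_b="$(mktemp)"
  unicharset_chars "$1" > "$_mc_a"
  unicharset_chars "$2" > "$_mc_b"
  LC_ALL=C comm -23 "$_mc_a" "$_mc_b"
  rm -f "$_mc_a" "$_mc_b"
}

# Usage, assuming combine_tessdata is installed and the script's paths exist:
#   combine_tessdata -u ~/tesstutorial/tesseract/tessdata/best/ara.traineddata /tmp/best.
#   combine_tessdata -u ~/tesstutorial/araeval/ara/ara.traineddata /tmp/new.
#   unicharset_count /tmp/best.lstm-unicharset
#   unicharset_count /tmp/new.lstm-unicharset
#   missing_chars /tmp/best.lstm-unicharset /tmp/new.lstm-unicharset
```

If the missing characters simply never occur in the training text, that would explain why the generated unicharset is smaller.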


On Sunday, March 29, 2020 at 5:06:16 AM UTC+2, shree wrote:
>
> See https://github.com/Shreeshrii/tess4training/blob/master/6-plusminus.sh
>
> lstmtraining --model_output ../tesstutorial/trainplusminus/plusminus \
>   --continue_from ../tesstutorial/trainplusminus/eng.lstm \
>   --traineddata ../tesstutorial/trainplusminus/eng/eng.traineddata \
>   --old_traineddata tessdata/best/eng.traineddata \
>   --train_listfile ../tesstutorial/trainplusminus/eng.training_files.txt \
>   --max_iterations 3600
>
> ...
>
>
> lstmtraining \
>   --stop_training \
>   --continue_from ../tesstutorial/trainplusminus/plusminus_checkpoint \
>   --traineddata ../tesstutorial/trainplusminus/eng/eng.traineddata \
>   --model_output ../tesstutorial/trainplusminus/eng_plusminus.traineddata
>
>     --traineddata needs to be the same in both commands.
>
> On Sun, Mar 29, 2020 at 6:45 AM Shree Devi Kumar <[email protected]> wrote:
>
>> Please check that you have used the correct path for the traineddata file.
>>
>> Please share the lstmtraining command that you used before this for 
>> training.
>>
>> On Sat, Mar 28, 2020, 22:56 Essam Zaky <[email protected]> wrote:
>>
>>> Dear @Shreeshrii,
>>> I followed your bash script to add the Andalus font to the Arabic 
>>> language. Here is the script URL:
>>>
>>> https://github.com/tesseract-ocr/tesseract/issues/2695#issuecomment-539412948
>>>
>>> All steps work except the last one, which generates the traineddata. 
>>> Here is the error:
>>>
>>> osboxes@osboxes:~/tesstutorial/tesseract$ time lstmtraining \
>>> >   --stop_training \
>>> >   --continue_from ~/tesstutorial/ara_from_full/PLUS_checkpoint \
>>> >   --traineddata ~/tesstutorial/tesseract/tessdata/best/ara.traineddata \
>>> >   --model_output ~/tesstutorial/ara_from_full/ara.Andalus.PLUS.traineddata
>>> Loaded file /home/osboxes/tesstutorial/ara_from_full/PLUS_checkpoint, unpacking...
>>> Code range changed from 74 to 85!
>>> Must supply the old traineddata for code conversion!
>>> Failed to read continue from: /home/osboxes/tesstutorial/ara_from_full/PLUS_checkpoint
>>>
>>>
>>> Best Regards
>>> Essam
>>>
>>> -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to [email protected].
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/tesseract-ocr/0c9123f5-8e80-447c-9bf1-2c6ec9831238%40googlegroups.com.
>>>
>>
>
> -- 
>
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>

#!/bin/bash

time tesstrain.sh \
  --fonts_dir ~/.fonts \
  --lang ara --linedata_only \
  --noextract_font_properties \
  --langdata_dir ~/tesstutorial/langdata \
  --tessdata_dir ~/tesstutorial/tesseract/tessdata \
  --fontlist "Andalus" \
  --training_text ~/tesstutorial/langdata/ara/ara.training_text \
  --workspace_dir ~/tesstutorial/tmp/ \
  --save_box_tiff \
  --output_dir ~/tesstutorial/araeval

echo -e "\n ****** Fine-tune one of the fully trained existing models: ***********"

mkdir -p ~/tesstutorial/ara_from_full

combine_tessdata -e ~/tesstutorial/tesseract/tessdata/best/ara.traineddata \
  ~/tesstutorial/ara_from_full/ara.lstm

lstmtraining \
  --model_output ~/tesstutorial/ara_from_full/PLUS \
  --continue_from ~/tesstutorial/ara_from_full/ara.lstm \
  --traineddata ~/tesstutorial/araeval/ara/ara.traineddata \
  --old_traineddata ~/tesstutorial/tesseract/tessdata/best/ara.traineddata \
  --train_listfile ~/tesstutorial/araeval/ara.training_files.txt \
  --debug_interval -1 \
   --max_iterations 3600 &>~/tesstutorial/ara_from_full/plustrain.log

tail ~/tesstutorial/ara_from_full/plustrain.log

echo -e "\n****************************  ******\n"

lstmeval \
  --model ~/tesstutorial/tesseract/tessdata/best/ara.traineddata \
  --eval_listfile ~/tesstutorial/araeval/ara.training_files.txt
  
echo -e "\n****************************  ******\n"

lstmeval \
  --model ~/tesstutorial/ara_from_full/PLUS_checkpoint \
  --traineddata ~/tesstutorial/araeval/ara/ara.traineddata \
  --eval_listfile ~/tesstutorial/araeval/ara.training_files.txt

echo -e "\n****************************  ******\n"

time lstmtraining \
  --stop_training \
  --continue_from ~/tesstutorial/ara_from_full/PLUS_checkpoint \
  --traineddata ~/tesstutorial/tesseract/tessdata/best/ara.traineddata \
  --model_output ~/tesstutorial/ara_from_full/ara.Andalus.PLUS.traineddata
