The unicharset is built from the training text you use, so please make sure
the text contains every character you need.
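
As a rough way to check this before training, something like the sketch below lists which required characters never occur in the training text. The function name `missing_from_text` and the idea of keeping the required characters one per line in a separate file are my own assumptions for illustration; neither is part of the Tesseract tooling.

```shell
# Sketch: report required characters that never appear in a training text.
# Assumption: "$1" lists the required characters, one per line (hypothetical file),
# and "$2" is the plain-text training text.
missing_from_text() {
  while IFS= read -r ch; do
    # Skip empty lines in the list of required characters.
    [ -n "$ch" ] || continue
    # grep -F does a fixed-string match, so multi-byte characters are
    # matched as-is regardless of the current locale.
    grep -F -q -e "$ch" "$2" || printf '%s\n' "$ch"
  done < "$1"
}
```

For example, `missing_from_text required_chars.txt ara.training_text` (both file names are placeholders) prints one missing character per line, and prints nothing when the text covers them all.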

Fine-tuning for impact works with the unicharset of the best traineddata
file, but then you cannot add any characters to it.
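
To see which characters were dropped, you could unpack both traineddata files with `combine_tessdata -u` and compare the resulting `.lstm-unicharset` files. The sketch below assumes the usual unicharset layout (entry count on the first line, then one entry per line, each starting with the character itself); the helper names are hypothetical.

```shell
# Sketch: compare two unpacked unicharset files.
# Assumption: line 1 holds the entry count, and every later line starts
# with the character followed by its properties.

# Print the entry count recorded on the first line of a unicharset file.
unicharset_count() {
  head -n 1 "$1"
}

# Print the characters listed in the first file but absent from the second.
unicharset_missing() {
  # awk reads "$2" first and remembers its characters, then walks "$1"
  # and prints every character that was not seen in "$2".
  awk 'NR == FNR { if (FNR > 1) seen[$1] = 1; next }
       FNR > 1 && !($1 in seen) { print $1 }' "$2" "$1"
}
```

With the best model unpacked to, say, `best/ara.lstm-unicharset` and the new one to `new/ara.lstm-unicharset` (placeholder paths), `unicharset_missing best/ara.lstm-unicharset new/ara.lstm-unicharset` should list the entries that your training text did not produce.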

On Sun, Mar 29, 2020, 11:08 Essam Zaky <[email protected]> wrote:

> Hi @shreeshrii,
> Attached is the bash script described on the following page:
>
> https://github.com/tesseract-ocr/tesseract/issues/2695#issuecomment-539412948
>
> When I change line #51 from
>
> --traineddata ~/tesstutorial/tesseract/tessdata/best/ara.traineddata \
>
> to
>
> --traineddata ~/tesstutorial/araeval/ara/ara.traineddata
>
> it now works without error.
> But I have another question:
> the character set in the best traineddata has 85 entries, while the newly
> generated character set contains only 74.
> How can I keep the unicharset at 85 entries, as in best?
>
>
> On Sunday, March 29, 2020 at 5:06:16 AM UTC+2, shree wrote:
>>
>> See
>> https://github.com/Shreeshrii/tess4training/blob/master/6-plusminus.sh
>>
>> lstmtraining --model_output ../tesstutorial/trainplusminus/plusminus \
>>   --continue_from ../tesstutorial/trainplusminus/eng.lstm \
>>   --traineddata ../tesstutorial/trainplusminus/eng/eng.traineddata \
>>   --old_traineddata tessdata/best/eng.traineddata \
>>   --train_listfile ../tesstutorial/trainplusminus/eng.training_files.txt \
>>   --max_iterations 3600
>>
>> ...
>>
>>
>> lstmtraining \
>>   --stop_training \
>>   --continue_from ../tesstutorial/trainplusminus/plusminus_checkpoint \
>>   --traineddata ../tesstutorial/trainplusminus/eng/eng.traineddata \
>>   --model_output ../tesstutorial/trainplusminus/eng_plusminus.traineddata
>>
>> --traineddata needs to be the same in both commands.
>>
>> On Sun, Mar 29, 2020 at 6:45 AM Shree Devi Kumar <[email protected]>
>> wrote:
>>
>>> Please check that you have used the correct path for the traineddata
>>> file.
>>>
>>> Please share the lstmtraining command that you used before this for
>>> training.
>>>
>>> On Sat, Mar 28, 2020, 22:56 Essam Zaky <[email protected]> wrote:
>>>
>>>> Dear @Shreeshrii,
>>>> I followed your bash script to add the Andalus font to the Arabic
>>>> language. Here is the script URL:
>>>>
>>>> https://github.com/tesseract-ocr/tesseract/issues/2695#issuecomment-539412948
>>>>
>>>> All steps work except the last one, which generates the
>>>> traineddata. Here is the error:
>>>>
>>>> osboxes@osboxes:~/tesstutorial/tesseract$ time lstmtraining \
>>>> >   --stop_training \
>>>> >   --continue_from ~/tesstutorial/ara_from_full/PLUS_checkpoint \
>>>> >   --traineddata ~/tesstutorial/tesseract/tessdata/best/ara.traineddata \
>>>> >   --model_output ~/tesstutorial/ara_from_full/ara.Andalus.PLUS.traineddata
>>>> Loaded file /home/osboxes/tesstutorial/ara_from_full/PLUS_checkpoint, unpacking...
>>>> Code range changed from 74 to 85!
>>>> Must supply the old traineddata for code conversion!
>>>> Failed to read continue from: /home/osboxes/tesstutorial/ara_from_full/PLUS_checkpoint
>>>>
>>>>
>>>> Best Regards
>>>> Essam
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/tesseract-ocr/0c9123f5-8e80-447c-9bf1-2c6ec9831238%40googlegroups.com
>>>> <https://groups.google.com/d/msgid/tesseract-ocr/0c9123f5-8e80-447c-9bf1-2c6ec9831238%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>>
>>>
>>
>> --
>>
>> ____________________________________________________________
>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>
