Thank you very much, sir.

On Thu, Jun 21, 2018, 2:26 AM Shree Devi Kumar <[email protected]> wrote:
> Here are the bash script files:
>
> 1. for finetune for impact training - add a font
> 2. for finetune plus-minus training - for adding a new character
>
> On Thu, Jun 21, 2018 at 1:40 AM Shree Devi Kumar <[email protected]>
> wrote:
>
>> Attached is a BASH script for Finetune training for 'Impact' (refer to
>> Ray's tutorial in wiki for more details).
>> Use this when you want to finetune a model for a single new font.
>>
>> You will need to change the paths for directories and filenames based on
>> your system.
>>
>> The script assumes that you have tesseract 4.0.0-beta installed along with
>> training tools. Refer to wiki main page for info on how to download latest
>> version of code from PPA etc.
>>
>> Please read through the script first, change as needed, create the
>> required training texts and then run the script.
>>
>> #!/bin/bash
>> #####################################################
>> # Script to finetune a language traineddata file for one new font
>> # for tesseract4.0.0-beta
>> # Modify directory paths and filenames as required for your setup.
>> #####################################################
>> # Choose which parts of script are to be run?
>> MakeData=yes
>> RunTraining=yes
>> RunEval=yes
>> #####################################################
>>
>> # Language
>> Lang=eng
>>
>> # downloaded directory with language data
>> langdata_dir=~/langdata
>>
>> # Make about 150 lines of representative training text for finetuning
>> finetune_training_text=$langdata_dir/$Lang/$Lang.finetune.training_text
>>
>> # Make about 150 lines of representative training text for evaluation
>> eval_training_text=$langdata_dir/$Lang/$Lang.eval.training_text
>>
>> # fonts directory for this system
>> fonts_dir=~/.fonts
>>
>> # Finetune training for IMPACT - ONE font ONLY
>> fonts_for_training=" \
>> 'Alanis Hand' \
>> "
>>
>> # directory with the old 'best' language training set to continue from
>> # eg. ara, eng, san
>> bestdata_dir=~/tessdata_best
>>
>> # tessdata-dir which has osd.traineddata, eng.traineddata, config and
>> # tessconfigs folder and pdf.ttf
>> tessdata_dir=~/tessdata
>>
>> # directory with training scripts - tesstrain.sh etc.
>> tesstrain_dir=~/tesseract/src/training
>>
>> # output directories for this run
>> trained_output_dir=./$Lang-finetune-impact
>> eval_output_dir=./$Lang-finetune-impact-eval
>>
>> if [ $MakeData = "yes" ]; then
>>
>> echo "###### MAKING EVAL DATA ######"
>> rm -rf $eval_output_dir
>> # create the eval directory (the original created $trained_output_dir here)
>> mkdir $eval_output_dir
>>
>> echo "#### running tesstrain.sh for eval text ####"
>>
>> eval bash $tesstrain_dir/tesstrain.sh \
>>   --lang $Lang \
>>   --linedata_only \
>>   --noextract_font_properties \
>>   --exposures "0" \
>>   --fonts_dir $fonts_dir \
>>   --fontlist $fonts_for_training \
>>   --langdata_dir $langdata_dir \
>>   --tessdata_dir $tessdata_dir \
>>   --training_text $eval_training_text \
>>   --output_dir $eval_output_dir
>>
>> echo "###### MAKING TRAINING DATA ######"
>> rm -rf $trained_output_dir
>> mkdir $trained_output_dir
>>
>> echo "#### running tesstrain.sh for training text ####"
>>
>> eval bash $tesstrain_dir/tesstrain.sh \
>>   --lang $Lang \
>>   --linedata_only \
>>   --noextract_font_properties \
>>   --exposures "0" \
>>   --fonts_dir $fonts_dir \
>>   --fontlist $fonts_for_training \
>>   --langdata_dir $langdata_dir \
>>   --tessdata_dir $tessdata_dir \
>>   --training_text $finetune_training_text \
>>   --output_dir $trained_output_dir
>>
>> echo "#### running combine_tessdata to extract lstm model from 'tessdata_best' for $Lang ####"
>>
>> combine_tessdata -e $bestdata_dir/$Lang.traineddata \
>>   $bestdata_dir/$Lang.lstm
>>
>> fi
>>
>> if [ $RunTraining = "yes" ]; then
>>
>> echo "###### LSTM TRAINING ######"
>>
>> echo "#### running lstmtraining for finetuning from $bestdata_dir/$Lang.traineddata #####"
>>
>> lstmtraining \
>>   --continue_from $bestdata_dir/$Lang.lstm \
>>   --traineddata $bestdata_dir/$Lang.traineddata \
>>   --max_iterations 1000 \
>>   --debug_interval 0 \
>>   --train_listfile $trained_output_dir/$Lang.training_files.txt \
>>   --model_output $trained_output_dir/finetune
>>
>> echo "###### BUILD FINETUNED MODEL ######"
>>
>> echo "#### Building final trained file $Lang-finetune-$Lang.traineddata ####"
>>
>> lstmtraining \
>>   --stop_training \
>>   --continue_from $trained_output_dir/finetune_checkpoint \
>>   --traineddata $bestdata_dir/$Lang.traineddata \
>>   --model_output "$trained_output_dir/$Lang-finetune-$Lang.traineddata"
>>
>> fi
>>
>> if [ $RunEval = "yes" ]; then
>>
>> echo "###### EVAL ORIGINAL MODEL ######"
>>
>> lstmeval \
>>   --model $bestdata_dir/$Lang.traineddata \
>>   --eval_listfile $eval_output_dir/$Lang.training_files.txt \
>>   --verbosity 0
>>
>> echo "###### EVAL FINETUNED MODEL ######"
>>
>> lstmeval \
>>   --model $trained_output_dir/$Lang-finetune-$Lang.traineddata \
>>   --eval_listfile $eval_output_dir/$Lang.training_files.txt \
>>   --verbosity 0
>>
>> fi
>>
>>
>> On Wed, Jun 20, 2018 at 9:14 PM Shree Devi Kumar <[email protected]>
>> wrote:
>>
>>> https://github.com/tesseract-ocr/tesseract/wiki/Training-Tesseract-3.03%E2%80%933.05
>>>
>>> https://github.com/tesseract-ocr/tesseract/wiki/Training-Tesseract-%E2%80%93-tesstrain.sh
>>>
>>> I haven't trained with tesseract 3 for a while. I will post instructions
>>> for tesseract4 later.
>>>
>>> On Wed, Jun 20, 2018 at 9:05 PM Navaneetha Bitla <[email protected]>
>>> wrote:
>>>
>>>> Can you help us by explaining how to train with tesstrain.sh?
>>>>
>>>> It will help all of us; we are thankful to you.
>>>>
>>>> On Wed, Jun 20, 2018 at 8:59 PM, Shree Devi Kumar <[email protected]>
>>>> wrote:
>>>>
>>>>> You will have better control on training if you use tesstrain.sh
>>>>> provided with tesseract.
>>>>>
>>>>> On Wed, Jun 20, 2018 at 8:52 PM Navaneetha Bitla <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> http://www.1001fonts.com/handwritten-fonts.html.
>>>>>>
>>>>>> The above link has 1900+ fonts. From that site I downloaded the
>>>>>> ttf files of the fonts and converted them to tiff files online.
>>>>>>
>>>>>> Then I trained on the tiff files (fonts) using serak trainer.
>>>>>>
>>>>>> If you get good accuracy, just forward the results so everyone can know
>>>>>> and follow you.
>>>>>>
>>>>>> Thank you
>>>>>>
>>>>>> On Wed, Jun 20, 2018 at 3:13 PM, James Q <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> I'm going to be using tesseract 4 and using the tesstrain.sh script.
>>>>>>> If I come across things that improve accuracy though I will let you
>>>>>>> know.
>>>>>>>
>>>>>>> Where did you find 1300 handwriting fonts?
>>>>>>>
>>>>>>> On Tuesday, June 19, 2018 at 5:19:54 PM UTC+1, Navaneetha Bitla
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> I am training tesseract 3.5 using serak trainer.
>>>>>>>>
>>>>>>>> On Tue, Jun 19, 2018 at 9:29 PM, James Q <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Navaneetha
>>>>>>>>> I am also looking to start training tesseract using handwritten
>>>>>>>>> fonts and am about to start setting up my training environment. Are
>>>>>>>>> you training tesseract 4 by following the guide at
>>>>>>>>> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00
>>>>>>>>> ?
>>>>>>>>>
>>>>>>>>> If so are you fine tuning the existing english model, retraining
>>>>>>>>> just the top layer(s) or training from scratch with your additional
>>>>>>>>> fonts?
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Jim
>>>>>>>>>
>>>>>>>>> On Tuesday, June 19, 2018 at 10:30:30 AM UTC+1, Navaneetha Bitla
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi, this is Navaneetha
>>>>>>>>>>
>>>>>>>>>> I'm working on a handwritten character recognition project.
>>>>>>>>>>
>>>>>>>>>> I have trained 1300 different handwritten fonts of English and
>>>>>>>>>> moved the files into tessdata directory.
>>>>>>>>>>
>>>>>>>>>> I tested tesseract using the commands below:
>>>>>>>>>>
>>>>>>>>>> $ convert -density 300 input.png -depth 8 -strip -background white -alpha off out.tiff
>>>>>>>>>>
>>>>>>>>>> $ tesseract out.tiff eng
>>>>>>>>>>
>>>>>>>>>> The input.png is in the Alanis Hand font and I have trained this
>>>>>>>>>> font, but I'm not getting even 40% accuracy.
>>>>>>>>>>
>>>>>>>>>> Can someone help me?
>>>>>>>>>>
>>>>>>>>>> Thanks in advance.
>>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> You received this message because you are subscribed to the Google
>>>>>>>>> Groups "tesseract-ocr" group.
>>>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>>>> send an email to [email protected].
>>>>>>>>> To post to this group, send email to [email protected].
>>>>>>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>>>>>>> To view this discussion on the web visit
>>>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/253906ac-fedf-4364-ad70-e745b8786c0d%40googlegroups.com.
>>>>>>>>>
>>>>>>>>> For more options, visit https://groups.google.com/d/optout.
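The finetune script quoted in this thread assumes that the two training texts, the 'best' traineddata file and tesstrain.sh already exist at the configured paths. A minimal sketch of a pre-flight check follows; check_finetune_inputs is a hypothetical helper written for illustration (it is not part of tesseract or its training tools), and the four paths it tests are taken from the variables the script defines.

```shell
#!/bin/bash
# Illustrative sketch: report which inputs of the finetune script are
# missing before any training data is generated.
# check_finetune_inputs is a hypothetical helper, not a tesseract tool.
check_finetune_inputs() {
    local lang=$1 langdata_dir=$2 bestdata_dir=$3 tesstrain_dir=$4
    local missing=0 path
    for path in \
        "$langdata_dir/$lang/$lang.finetune.training_text" \
        "$langdata_dir/$lang/$lang.eval.training_text" \
        "$bestdata_dir/$lang.traineddata" \
        "$tesstrain_dir/tesstrain.sh"
    do
        if [ ! -f "$path" ]; then
            echo "missing: $path"
            missing=$((missing + 1))
        fi
    done
    # Succeed only when every required file was found.
    [ "$missing" -eq 0 ]
}

# Possible use near the top of the script, after the variables are set:
#   check_finetune_inputs "$Lang" "$langdata_dir" "$bestdata_dir" "$tesstrain_dir" || exit 1
```

The function only tests for files; it does not check that the font named in fonts_for_training is installed in fonts_dir, so that still has to be confirmed separately.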

